mindcraft

mirror of https://github.com/kolbytn/mindcraft.git synced 2025-08-03 05:45:36 +02:00

Author	SHA1	Message	Date
Sweaterdog	d116e90126	Update prompter.js Fixed spacing and logging	2025-06-07 17:17:51 -07:00
Sweaterdog	989664d1be	Update openrouter.js Fixed some logging	2025-06-07 17:16:42 -07:00
Sweaterdog	3ea4c2df5d	Update local.js Fixed some logging	2025-06-07 17:15:57 -07:00
Sweaterdog	ba1b0ea22f	Update hyperbolic.js Fixed some logging	2025-06-07 17:15:37 -07:00
Sweaterdog	bdb3b1788a	Update groq.js Fixed some logging	2025-06-07 17:15:03 -07:00
Sweaterdog	8e558a10ad	Update grok.js Fixed some logging	2025-06-07 17:14:34 -07:00
Sweaterdog	63ff3e4c1f	Update gpt.js Fixed some logging	2025-06-07 17:03:16 -07:00
Sweaterdog	69332f6a19	Update glhf.js Fixed some logging	2025-06-07 17:02:42 -07:00
Sweaterdog	6ae7b82a53	Update gemini.js Fixed some logging	2025-06-07 17:02:21 -07:00
Sweaterdog	f6b276b3cf	Update deepseek.js fixed logging	2025-06-07 17:01:58 -07:00
Sweaterdog	237f7ce915	Update claude.js Fixed some logging	2025-06-07 17:01:34 -07:00
Sweaterdog	f0da49403c	Update logger.js Fixed some bugs after testing	2025-06-07 16:59:50 -07:00
Sweaterdog	44be97adc4	Update huggingface.js Fixed an accidental enter	2025-06-07 16:35:46 -07:00
Sweaterdog	296fb1323c	Update settings.js fixed a comma	2025-06-07 16:17:00 -07:00
google-labs-jules[bot]	19b69efd67	Fix: Use mic as fallback for STT if naudiodon fails This commit addresses an issue where Speech-to-Text (STT) functionality would be disabled if the `naudiodon` package failed to build during installation. The `src/process/tts_process.js` file (which handles STT) has been modified to: 1. Attempt to load `naudiodon` first. 2. If `naudiodon` fails to load, attempt to load the `mic` package as an alternative. 3. The audio recording logic has been adapted to work with both `naudiodon` and `mic` APIs. Additionally, `package.json` has been updated to move `mic` from `dependencies` to `optionalDependencies`, making its behavior consistent with `naudiodon`. This change provides a fallback mechanism for audio recording, increasing the robustness of the STT feature across different platforms and environments where `naudiodon` might have build issues.	2025-06-07 23:13:07 +00:00
Sweaterdog	98b9284b44	Merge pull request #9 from Sweaterdog/vision-logging-enhancements Fix: Make naudiodon optional and document prerequisites	2025-06-07 16:01:47 -07:00
google-labs-jules[bot]	990ef03dca	Fix: Make naudiodon optional and document prerequisites This commit addresses build failures related to the `naudiodon` package encountered during `npm install`. Changes Made: 1. `naudiodon` as Optional Dependency: * Moved `naudiodon` from `dependencies` to `optionalDependencies` in `package.json`. This allows `npm install` to succeed even if `naudiodon` fails to build on your system, preventing the installation from being blocked. 2. Graceful Handling of `naudiodon` Absence: * Modified `src/process/tts_process.js` to dynamically import `naudiodon`. * If `naudiodon` is not found or fails to load, the Speech-to-Text (STT) functionality that relies on it for microphone input will be gracefully disabled. * The application will log a warning in this case but will otherwise start and run normally. 3. Documentation of Prerequisites: * Updated `README.md` with a new section detailing the system prerequisites for building `naudiodon` successfully on Linux, Windows, and macOS. This includes commands for installing necessary C++ compilers, development tools, and PortAudio libraries. * Added notes to the README explaining that `naudiodon` is used for STT and is optional. Summary of Approach: The primary goal was to resolve the `npm install` error caused by `naudiodon`. By making it an optional dependency and ensuring the application handles its absence, you can now install and run the core application without needing to immediately troubleshoot `naudiodon` build issues. If you wish to use the STT feature, you can refer to the updated README for guidance on installing the necessary system dependencies for `naudiodon`. Note on Your Feedback (STT Alternatives): You expressed a desire for STT to work even without `naudiodon`, possibly using alternative packages. While this commit ensures the application no longer errors out due to `naudiodon` and makes STT optionally functional, it does not replace `naudiodon` with an alternative for STT audio input. Exploring and integrating alternative cross-platform audio input libraries for STT would be a separate task. This set of changes should improve the installation experience across different platforms.	2025-06-07 23:01:17 +00:00
Sweaterdog	15578595f1	Merge pull request #8 from Sweaterdog/vision-logging-enhancements Fix: Improve vision logging and add comments	2025-06-07 15:29:55 -07:00
google-labs-jules[bot]	4577a68dfd	Fix: Improve vision logging and add comments This commit addresses several aspects of the vision logging system: 1. Always Active Vision Logging: * Ensures that when `settings.vision_mode` is 'always', a vision log entry is created each time a message is handled. * The full conversation history is now correctly formatted into a JSON string and passed as the `visionMessage` (4th argument) to `logger.logVision`. This ensures the entire input context is logged for these "always active" vision captures, similar to 'normal' and 'reasoning' text logs. * I implemented this by adding a `formatHistoryForVisionLog` helper function to `Agent.js` and calling it within `handleMessage` to prepare the history string. This approach was chosen due to difficulties in directly modifying `logger.js` to always use its internal full history formatter. 2. Comments: * I added detailed comments in `agent.js` to explain the `formatHistoryForVisionLog` helper function and the logic for "always active" vision logging, including the rationale for the approach. * I clarified how `latestScreenshotPath` is managed in relation to "always active" logs and other history entries. 3. General Code Health: * I ensured necessary imports (`fs`, `path`, `logger`) are present in `agent.js`. I tested the changes by simulating the "always active" vision scenario and verifying that `logger.logVision` was called with the correct arguments, including the complete formatted history string.	2025-06-07 22:29:19 +00:00
Sweaterdog	4efb5c304f	Merge pull request #7 from Sweaterdog/Speech-to-Text Speech to text	2025-06-07 14:59:42 -07:00
Sweaterdog	da0722a8fb	Merge branch 'main' into Speech-to-Text	2025-06-07 14:59:35 -07:00
Sweaterdog	d58633640f	Merge branch 'kolbytn:main' into Speech-to-Text	2025-06-07 14:57:26 -07:00
Sweaterdog	e87e615f0c	Merge pull request #6 from Sweaterdog/always-active-vision Always active vision	2025-06-07 14:57:07 -07:00
Sweaterdog	131dd45c9f	Merge branch 'main' into always-active-vision	2025-06-07 14:56:59 -07:00
Sweaterdog	c75ac9495c	Merge pull request #5 from Sweaterdog/advanced-logging Advanced logging	2025-06-07 13:59:52 -07:00
Sweaterdog	ae475955d8	Merge pull request #4 from Sweaterdog/refactor-logging-and-remove-features Refactor logging and remove features	2025-06-07 13:58:21 -07:00
Sweaterdog	d106791c76	Update openrouter.js Added reasoning for a fixed comment	2025-06-07 13:54:32 -07:00
Sweaterdog	b4f6ad8835	Update settings.js Removed unnecessary comments made by Jules	2025-06-07 13:52:28 -07:00
google-labs-jules[bot]	857d14e64c	I've enhanced logging, transformed thinking tags, and cleaned comments. - I implemented universal logging for all API providers in src/models/, ensuring calls to logger.js for text and vision logs. - I added transformation of <thinking>...</thinking> tags to <think>...</think> in all provider responses before logging, for correct categorization by logger.js. - I standardized the input to logger.js's log() function to be a JSON string of the message history (system prompt + turns). - I removed unnecessary comments from most API provider files, settings.js, and prompter.js to improve readability. Note: I encountered some issues that prevented final comment cleanup for qwen.js, vllm.js, and logger.js. Their core logging functionality and tag transformations (for qwen.js and vllm.js) are in place from previous steps.	2025-06-07 20:47:26 +00:00
google-labs-jules[bot]	62bcb1950c	I've integrated universal logging and applied some refactors. I implemented comprehensive logging across all API providers in src/models/ using logger.js. This includes: - Adding log() and logVision() calls to each provider (Claude, DeepSeek, Gemini, GLHF, GPT, Grok, Groq, HuggingFace, Hyperbolic, Local, Mistral, Novita, Qwen, Replicate, VLLM). - Ensuring logging respects 'log_normal_data', 'log_reasoning_data', and 'log_vision_data' flags in settings.js, which I added. - I deprecated 'log_all_prompts' in settings.js and updated prompter.js accordingly. I refactored openrouter.js and prompter.js: - I removed the experimental reasoning prompt functionality ($REASONING) from openrouter.js. - I removed a previously implemented (and then reverted) personality injection feature ($PERSONALITY) from prompter.js, openrouter.js, and profile files. I had to work around some issues: - I replaced the full file content for glhf.js and hyperbolic.js due to persistent errors with applying changes. Something I still need to do: - Based on your latest feedback, model responses containing <thinking>...</thinking> tags need to be transformed to <think>...</think> tags before being passed to logger.js to ensure they are categorized into reasoning_logs.csv. This change is not included in this update.	2025-06-07 10:18:04 +00:00
google-labs-jules[bot]	fa35e03ec5	Refactor logging and remove unused features. - Unified logging for `prompter.js` to use granular settings from `settings.js` (e.g., `log_normal_data`) instead of `log_all_prompts`, which has been deprecated. - Removed the experimental reasoning prompt functionality (formerly triggered by `$REASONING`) from `openrouter.js`. - Reverted the recently added personality injection feature (`$PERSONALITY` and `getRandomPersonality`) from `prompter.js`, `openrouter.js`, and profile files as per your request. - Verified that `openrouter.js` correctly utilizes `logger.js` for standard and vision logs.	2025-06-07 10:01:18 +00:00
Sweaterdog	b70c3bb03a	Added example logging with openrouter.js	2025-06-07 02:47:07 -07:00
Sweaterdog	068f1009be	Add files via upload	2025-06-07 02:46:12 -07:00
Sweaterdog	0db80cfc56	Merge pull request #3 from Jules' work Jules wip 2192516976139170352	2025-06-07 02:33:05 -07:00
google-labs-jules[bot]	be38f56f12	I've implemented enhanced vision modes with bug fixes and extended API support. This update finalizes the implementation of three distinct vision modes: - "off": This disables all my vision capabilities. - "prompted": (Formerly "on") This allows me to use vision via explicit commands from you (e.g., !lookAtPlayer), and I will then summarize the image. - "always": (Formerly "always_active") I will automatically take a screenshot every time you send a prompt and send it with your prompt to a multimodal LLM. If you use a look command in this mode, I will only update my view and take a screenshot for the next interaction if relevant, without immediate summarization. Here are the key changes and improvements: 1. Bug Fix (Image Path ENOENT): * I've corrected `Camera.capture()` so it returns filenames with the `.jpg` extension. * I've updated `VisionInterpreter.analyzeImage()` to handle full filenames. * This resolves the `ENOENT` error that was previously happening in `Prompter.js`. 2. Vision Mode Renaming: * I've renamed the modes in `settings.js` and throughout the codebase: "on" is now "prompted", and "always_active" is now "always". 3. Core Framework (from previous work, now integrated): * I've added `vision_mode` to `settings.js`. * `Agent.js` now manages `latestScreenshotPath` and initializes `VisionInterpreter` with `vision_mode`. * `VisionInterpreter.js` handles different behaviors for each mode. * My vision commands (`!lookAt...`) respect the `off` mode. * `History.js` stores `imagePath` with turns, and `Agent.js` manages this path's lifecycle. * `Prompter.js` reads image files when I'm in "always" mode and passes `imageData` to model wrappers. 4. Extended Multimodal API Support: * `gemini.js`, `gpt.js`, `claude.js`, `local.js` (Ollama), `qwen.js`, and `deepseek.js` have been updated to accept `imageData` in their `sendRequest` method and format it for their respective multimodal APIs. They now include `supportsRawImageInput = true`. * Other model wrappers (`mistral.js`, `glhf.js`, `grok.js`, etc.) now safely handle the `imageData` parameter in `sendRequest` (by ignoring it and logging a warning) and have `supportsRawImageInput = false` for that method, ensuring consistent behavior. 5. Testing: I have a comprehensive plan to verify all modes and functionalities. This set of changes provides a robust and flexible vision system for me, catering to different operational needs and supporting various multimodal LLMs.	2025-06-07 09:07:02 +00:00
Sweaterdog	5c1a8c46b2	Fixed Agent.js error caused by Jules	2025-06-07 01:49:11 -07:00
google-labs-jules[bot]	e9160d928e	feat: Implement framework for new vision modes and Gemini support This commit introduces a comprehensive framework for three new vision modes: 'off', 'on', and 'always_active'. Key changes include: 1. Settings (`settings.js`): Added a `vision_mode` setting. 2. Agent State (`src/agent/agent.js`): * Added `latestScreenshotPath` to store the most recent screenshot. * Updated `VisionInterpreter` initialization to use `vision_mode`. 3. Screenshot Handling: * `VisionInterpreter` now updates `agent.latestScreenshotPath` after look commands. * `Agent.handleMessage` captures screenshots in `always_active` mode for your messages. 4. VisionInterpreter (`src/agent/vision/vision_interpreter.js`): * Refactored to support distinct behaviors for `off` (disabled), `on` (summarize), and `always_active` (capture-only, no summarization for look commands). 5. Vision Commands (`src/agent/commands/actions.js`): * `!lookAtPlayer` and `!lookAtPosition` now respect `vision_mode: 'off'` and camera availability. 6. History Storage (`src/agent/history.js`): * `History.add` now supports an `imagePath` for each turn. * `Agent.js` correctly passes `latestScreenshotPath` for relevant turns in `always_active` mode and manages its lifecycle. 7. Prompter Logic (`src/models/prompter.js`): * `Prompter.promptConvo` now reads image files specified in history for `always_active` mode and passes `imageData` to the chat model. 8. Model API Wrappers (Example: `src/models/gemini.js`): * `gemini.js` updated to accept `imageData` in `sendRequest`. * Added `supportsRawImageInput` flag to `gemini.js`. The system is now structured to support these vision modes. The `always_active` mode, where raw images are sent with prompts, is fully implemented for the Gemini API. Further work will involve extending this raw image support in `always_active` mode to all other capable multimodal API providers as per your feedback.	2025-06-07 08:41:24 +00:00
google-labs-jules[bot]	ffe3b0e528	Jules was unable to complete the task in time. Please review the work done so far and provide feedback for Jules to continue.	2025-06-07 08:39:05 +00:00
Sweaterdog	21481a7861	Merge branch 'kolbytn:main' into Make-Andy-4-Default-Ollama-Model	2025-05-25 14:57:10 -07:00
Max Robinson	f2f06fcf3f	Merge pull request #540 from icwhite/main Small Fixes and lots of Task reworking	2025-05-24 12:30:33 -06:00
Isadora White	fa02028b8b	remove unnecessary changes	2025-05-23 12:02:23 -07:00
Isadora White	b55f92800f	restore settings.js	2025-05-23 11:56:40 -07:00
Isadora White	f7e4fee249	update README and remove useless tasks	2025-05-23 11:54:53 -07:00
Isadora White	77535f97d5	fix goal string issues	2025-05-23 11:49:51 -07:00
Sweaterdog	da6c0bef23	Merge branch 'kolbytn:main' into Speech-to-Text	2025-05-22 19:14:20 -07:00
Sweaterdog	d32dcdc887	Update local.js Made Andy-4 the default model if the Ollama API is the only thing specified	2025-05-22 19:13:52 -07:00
Sweaterdog	d2a3e11fdd	Merge branch 'kolbytn:main' into Make-Andy-4-Default-Ollama-Model	2025-05-22 19:12:59 -07:00
Kolby Nottingham	c4e23ea387	Merge pull request #550 from rajammanabrolu/main Update README.md with bib for arxiv paper	2025-05-21 09:50:38 -07:00
Prithviraj Ammanabrolu	0fabaa8e90	smol	2025-05-21 09:48:28 -07:00
Prithviraj Ammanabrolu	99af6506aa	Update README.md with bib	2025-05-21 09:44:47 -07:00

1 2 3 4 5 ...

1423 commits