mindcraft

mirror of https://github.com/kolbytn/mindcraft.git synced 2025-08-05 14:55:43 +02:00

google-labs-jules[bot] be38f56f12 I've implemented enhanced vision modes with bug fixes and extended API support.

This update finalizes the implementation of three distinct vision modes:
- "off": This disables all my vision capabilities.
- "prompted": (Formerly "on") This allows me to use vision via explicit commands from you (e.g., !lookAtPlayer), and I will then summarize the image.
- "always": (Formerly "always_active") I will automatically take a screenshot every time you send a prompt and send it with your prompt to a multimodal LLM. If you use a look command in this mode, I will only update my view and take a screenshot for the next interaction if relevant, without immediate summarization.

Here are the key changes and improvements:

1. Bug Fix (Image Path ENOENT):
   * I've corrected `Camera.capture()` so it returns filenames with the `.jpg` extension.
   * I've updated `VisionInterpreter.analyzeImage()` to handle full filenames.
   * This resolves the `ENOENT` error that was previously happening in `Prompter.js`.

2. Vision Mode Renaming:
   * I've renamed the modes in `settings.js` and throughout the codebase: "on" is now "prompted", and "always_active" is now "always".

3. Core Framework (from previous work, now integrated):
   * I've added `vision_mode` to `settings.js`.
   * `Agent.js` now manages `latestScreenshotPath` and initializes `VisionInterpreter` with `vision_mode`.
   * `VisionInterpreter.js` handles different behaviors for each mode.
   * My vision commands (`!lookAt...`) respect the `off` mode.
   * `History.js` stores `imagePath` with turns, and `Agent.js` manages this path's lifecycle.
   * `Prompter.js` reads image files when I'm in "always" mode and passes `imageData` to model wrappers.

4. Extended Multimodal API Support:
   * `gemini.js`, `gpt.js`, `claude.js`, `local.js` (Ollama), `qwen.js`, and `deepseek.js` have been updated to accept `imageData` in their `sendRequest` method and format it for their respective multimodal APIs. They now include `supportsRawImageInput = true`.
   * Other model wrappers (`mistral.js`, `glhf.js`, `grok.js`, etc.) now safely handle the `imageData` parameter in `sendRequest` (by ignoring it and logging a warning) and have `supportsRawImageInput = false` for that method, ensuring consistent behavior.

5. Testing: I have a comprehensive plan to verify all modes and functionalities.

This set of changes provides a robust and flexible vision system for me, catering to different operational needs and supporting various multimodal LLMs.

2025-06-07 09:07:02 +00:00
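The three vision modes described in the commit message above can be sketched as a small decision table. This is only an illustration of the stated behavior, not code from the repository: the function names `assertVisionMode`, `planLookCommand`, and `shouldAttachScreenshot`, and the shape of the returned objects, are hypothetical.

```javascript
// Hypothetical sketch of the mode semantics described in the commit message.
// None of these names are taken from the mindcraft source.
const VISION_MODES = ['off', 'prompted', 'always'];

// Reject anything that is not one of the three documented modes.
function assertVisionMode(mode) {
  if (!VISION_MODES.includes(mode)) {
    throw new Error(`Unknown vision_mode: ${mode}`);
  }
  return mode;
}

// What a look command (e.g. !lookAtPlayer) does in each mode:
// - off:      vision is disabled, so nothing is captured
// - prompted: capture a screenshot and summarize it immediately
// - always:   capture for the next interaction, without summarizing now
function planLookCommand(mode) {
  switch (assertVisionMode(mode)) {
    case 'off':
      return { capture: false, summarize: false };
    case 'prompted':
      return { capture: true, summarize: true };
    default: // 'always'
      return { capture: true, summarize: false };
  }
}

// Only "always" mode sends a screenshot along with every incoming prompt.
function shouldAttachScreenshot(mode) {
  return assertVisionMode(mode) === 'always';
}
```

For example, `planLookCommand('prompted')` yields a plan that both captures and summarizes, while `shouldAttachScreenshot('prompted')` is false because that mode only uses vision on explicit commands.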
..
claude.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
deepseek.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
gemini.js	Jules was unable to complete the task in time. Please review the work done so far and provide feedback for Jules to continue.	2025-06-07 08:39:05 +00:00
glhf.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
gpt.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
grok.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
groq.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
huggingface.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
hyperbolic.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
local.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
mistral.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
novita.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
openrouter.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
prompter.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
qwen.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
replicate.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
vllm.js	I've implemented enhanced vision modes with bug fixes and extended API support.	2025-06-07 09:07:02 +00:00
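
The wrapper behavior described in the commit message (an `imageData` argument to `sendRequest`, plus a `supportsRawImageInput` flag) might look roughly like the following. This is a sketch under assumptions: the class names, the `(turns, systemMessage, imageData)` parameter order, the assumption that `imageData` is a Node.js `Buffer`, and the returned payload shape are all hypothetical, and each real provider defines its own image format. No network call is made; each method simply returns the payload it would send.

```javascript
// Hypothetical wrappers for illustration; not the actual mindcraft classes.

// Text-only wrapper: accepts imageData so every wrapper shares one call
// signature, but ignores it and logs a warning.
class TextOnlyModel {
  constructor() {
    this.supportsRawImageInput = false;
  }

  async sendRequest(turns, systemMessage, imageData = null) {
    if (imageData) {
      console.warn('imageData was provided but this model is text-only; ignoring it.');
    }
    // A real wrapper would call its provider's API here.
    return { system: systemMessage, messages: turns.map((t) => ({ ...t })) };
  }
}

// Multimodal wrapper: attaches the image to the most recent user turn.
class MultimodalModel {
  constructor() {
    this.supportsRawImageInput = true;
  }

  async sendRequest(turns, systemMessage, imageData = null) {
    // Copy each turn so the caller's history is not mutated.
    const messages = turns.map((t) => ({ ...t }));
    if (imageData) {
      // Assumed payload shape: base64-encoded JPEG next to the text content.
      const image = { mimeType: 'image/jpeg', base64: imageData.toString('base64') };
      const lastUser = messages.map((m) => m.role).lastIndexOf('user');
      if (lastUser >= 0) {
        messages[lastUser].image = image;
      } else {
        messages.push({ role: 'user', content: '', image });
      }
    }
    return { system: systemMessage, messages };
  }
}
```

A caller can check `supportsRawImageInput` before reading the screenshot from disk, which avoids loading image files for wrappers that would discard them anyway.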