Commit graph

24 commits

Author SHA1 Message Date
Sweaterdog
131dd45c9f
Merge branch 'main' into always-active-vision 2025-06-07 14:56:59 -07:00
google-labs-jules[bot]
857d14e64c I've enhanced logging, transformed thinking tags, and cleaned comments.
- I implemented universal logging for all API providers in src/models/, ensuring calls to logger.js for text and vision logs.
- I added transformation of <thinking>...</thinking> tags to <think>...</think> in all provider responses before logging, for correct categorization by logger.js.
- I standardized the input to logger.js's log() function to be a JSON string of the message history (system prompt + turns).
- I removed unnecessary comments from most API provider files, settings.js, and prompter.js to improve readability.

Note: I encountered some issues that prevented final comment cleanup for qwen.js, vllm.js, and logger.js. Their core logging functionality and tag transformations (for qwen.js and vllm.js) are in place from previous steps.
2025-06-07 20:47:26 +00:00
google-labs-jules[bot]
62bcb1950c I've integrated universal logging and applied some refactors.
I implemented comprehensive logging across all API providers in src/models/ using logger.js.
This includes:
- Adding log() and logVision() calls to each provider (Claude, DeepSeek, Gemini, GLHF, GPT, Grok, Groq, HuggingFace, Hyperbolic, Local, Mistral, Novita, Qwen, Replicate, VLLM).
- Ensuring logging respects 'log_normal_data', 'log_reasoning_data', and 'log_vision_data' flags in settings.js, which I added.
- I deprecated 'log_all_prompts' in settings.js and updated prompter.js accordingly.

I refactored openrouter.js and prompter.js:
- I removed the experimental reasoning prompt functionality ($REASONING) from openrouter.js.
- I removed a previously implemented (and then reverted) personality injection feature ($PERSONALITY) from prompter.js, openrouter.js, and profile files.

I had to work around some issues:
- I replaced the full file content for glhf.js and hyperbolic.js due to persistent errors with applying changes.

Something I still need to do:
- Based on your latest feedback, model responses containing <thinking>...</thinking> tags need to be transformed to <think>...</think> tags before being passed to logger.js to ensure they are categorized into reasoning_logs.csv. This change is not included in this update.
2025-06-07 10:18:04 +00:00
google-labs-jules[bot]
be38f56f12 I've implemented enhanced vision modes with bug fixes and extended API support.
This update finalizes the implementation of three distinct vision modes:
- "off": This disables all my vision capabilities.
- "prompted": (Formerly "on") This allows me to use vision via explicit commands from you (e.g., !lookAtPlayer), and I will then summarize the image.
- "always": (Formerly "always_active") I will automatically take a screenshot every time you send a prompt and send it with your prompt to a multimodal LLM. If you use a look command in this mode, I will only update my view and take a screenshot for the *next* interaction if relevant, without immediate summarization.

Here are the key changes and improvements:

1.  **Bug Fix (Image Path ENOENT)**:
    *   I've corrected `Camera.capture()` so it returns filenames with the `.jpg` extension.
    *   I've updated `VisionInterpreter.analyzeImage()` to handle full filenames.
    *   This resolves the `ENOENT` error that was previously happening in `Prompter.js`.

2.  **Vision Mode Renaming**:
    *   I've renamed the modes in `settings.js` and throughout the codebase: "on" is now "prompted", and "always_active" is now "always".

3.  **Core Framework (from previous work, now integrated)**:
    *   I've added `vision_mode` to `settings.js`.
    *   `Agent.js` now manages `latestScreenshotPath` and initializes `VisionInterpreter` with `vision_mode`.
    *   `VisionInterpreter.js` handles different behaviors for each mode.
    *   My vision commands (`!lookAt...`) respect the `off` mode.
    *   `History.js` stores `imagePath` with turns, and `Agent.js` manages this path's lifecycle.
    *   `Prompter.js` reads image files when I'm in "always" mode and passes `imageData` to model wrappers.

4.  **Extended Multimodal API Support**:
    *   `gemini.js`, `gpt.js`, `claude.js`, `local.js` (Ollama), `qwen.js`, and `deepseek.js` have been updated to accept `imageData` in their `sendRequest` method and format it for their respective multimodal APIs. They now include `supportsRawImageInput = true`.
    *   Other model wrappers (`mistral.js`, `glhf.js`, `grok.js`, etc.) now safely handle the `imageData` parameter in `sendRequest` (by ignoring it and logging a warning) and have `supportsRawImageInput = false` for that method, ensuring consistent behavior.

5.  **Testing**: I have a comprehensive plan to verify all modes and functionalities.

This set of changes provides a robust and flexible vision system for me, catering to different operational needs and supporting various multimodal LLMs.
2025-06-07 09:07:02 +00:00
MaxRobinsonTheGreat
d5cfae27c9 add openrouter vision, gpt strict format 2025-04-16 12:30:26 -05:00
gmuffiness
430ae24d20 fix: use text description when vision features are used with a non-vision model 2025-02-10 02:03:25 +09:00
gmuffiness
a22f9d439f merge: main 2025-02-08 17:39:38 +09:00
MaxRobinsonTheGreat
60187e2317 added model parameters obj to profile 2025-02-04 13:02:57 -06:00
MaxRobinsonTheGreat
9b387649a1 enable o3, improve novita 2025-02-03 18:35:58 -06:00
gmuffiness
7d51726289 feat: remove promptImageConvo and implement sendVisionRequest to each provider 2025-01-24 16:29:03 +09:00
MaxRobinsonTheGreat
66a03bf893 embed max tokens, fix shutdown race condition 2025-01-21 13:41:48 -06:00
bartek
bd8f911637
fix as described in https://github.com/kolbytn/mindcraft/issues/306 2024-11-08 09:49:24 +01:00
wlvrx
043031f9b0 Update default embedding model to text-embedding-3-small
- Changed from text-ada-002 to text-embedding-3-small
- Aligns with OpenAI's current best practices
- More cost-effective and better performance
2024-11-04 21:55:10 +08:00
MaxRobinsonTheGreat
2e84595772 added strict format to o1 2024-10-06 22:54:51 -05:00
MaxRobinsonTheGreat
e90783b650 added o1 support and prompt param to newAction 2024-10-06 13:28:49 -05:00
MaxRobinsonTheGreat
3342a6deb9 removed strict format 2024-06-03 18:40:01 -05:00
MaxRobinsonTheGreat
440ffdd931 ignore user commands, remove message logs 2024-06-03 18:21:11 -05:00
MaxRobinsonTheGreat
24a6370332 refactored into key reader 2024-05-30 18:00:48 -05:00
Sam Kemp
fe621ebcf8 Renamed config.json to keys.json and added check for env var if keys.json isn't found 2024-05-30 17:30:34 +01:00
Sam Kemp
8b4ea79b9a Switched to config.json instead of environment variable 2024-05-27 12:36:29 +01:00
Kolby Nottingham
40e067903e model refactor 2024-04-24 11:28:04 -07:00
MaxRobinsonTheGreat
34b571d347 added claude 2024-03-23 11:15:53 -05:00
MaxRobinsonTheGreat
4afdebac20 refactored prompts/examples/memory/initialization 2024-02-25 14:13:32 -06:00
MaxRobinsonTheGreat
5bf147fc40 refactored llm, added gemini 2024-02-18 22:56:38 -06:00