Commit graph

1423 commits

Author SHA1 Message Date
Sweaterdog
d116e90126
Update prompter.js
Fixed spacing and logging
2025-06-07 17:17:51 -07:00
Sweaterdog
989664d1be
Update openrouter.js
Fixed some logging
2025-06-07 17:16:42 -07:00
Sweaterdog
3ea4c2df5d
Update local.js
Fixed some logging
2025-06-07 17:15:57 -07:00
Sweaterdog
ba1b0ea22f
Update hyperbolic.js
Fixed some logging
2025-06-07 17:15:37 -07:00
Sweaterdog
bdb3b1788a
Update groq.js
Fixed some logging
2025-06-07 17:15:03 -07:00
Sweaterdog
8e558a10ad
Update grok.js
Fixed some logging
2025-06-07 17:14:34 -07:00
Sweaterdog
63ff3e4c1f
Update gpt.js
Fixed some logging
2025-06-07 17:03:16 -07:00
Sweaterdog
69332f6a19
Update glhf.js
Fixed some logging
2025-06-07 17:02:42 -07:00
Sweaterdog
6ae7b82a53
Update gemini.js
Fixed some logging
2025-06-07 17:02:21 -07:00
Sweaterdog
f6b276b3cf
Update deepseek.js
fixed logging
2025-06-07 17:01:58 -07:00
Sweaterdog
237f7ce915
Update claude.js
Fixed some logging
2025-06-07 17:01:34 -07:00
Sweaterdog
f0da49403c
Update logger.js
Fixed some bugs after testing
2025-06-07 16:59:50 -07:00
Sweaterdog
44be97adc4
Update huggingface.js
Fixed an accidental enter
2025-06-07 16:35:46 -07:00
Sweaterdog
296fb1323c
Update settings.js
fixed a comma
2025-06-07 16:17:00 -07:00
google-labs-jules[bot]
19b69efd67 Fix: Use mic as fallback for STT if naudiodon fails
This commit addresses an issue where Speech-to-Text (STT) functionality would be disabled if the `naudiodon` package failed to build during installation.

The `src/process/tts_process.js` file (which handles STT) has been modified to:
1. Attempt to load `naudiodon` first.
2. If `naudiodon` fails to load, attempt to load the `mic` package as an alternative.
3. The audio recording logic has been adapted to work with both `naudiodon` and `mic` APIs.

Additionally, `package.json` has been updated to move `mic` from `dependencies` to `optionalDependencies`, making its behavior consistent with `naudiodon`.

This change provides a fallback mechanism for audio recording, increasing the robustness of the STT feature across different platforms and environments where `naudiodon` might have build issues.
2025-06-07 23:13:07 +00:00
Sweaterdog
98b9284b44
Merge pull request #9 from Sweaterdog/vision-logging-enhancements
Fix: Make naudiodon optional and document prerequisites
2025-06-07 16:01:47 -07:00
google-labs-jules[bot]
990ef03dca Fix: Make naudiodon optional and document prerequisites
This commit addresses build failures related to the `naudiodon` package encountered during `npm install`.

Changes Made:

1.  **`naudiodon` as Optional Dependency:**
    *   Moved `naudiodon` from `dependencies` to `optionalDependencies` in `package.json`. This allows `npm install` to succeed even if `naudiodon` fails to build on your system, preventing the installation from being blocked.

2.  **Graceful Handling of `naudiodon` Absence:**
    *   Modified `src/process/tts_process.js` to dynamically import `naudiodon`.
    *   If `naudiodon` is not found or fails to load, the Speech-to-Text (STT) functionality that relies on it for microphone input will be gracefully disabled.
    *   The application will log a warning in this case but will otherwise start and run normally.

3.  **Documentation of Prerequisites:**
    *   Updated `README.md` with a new section detailing the system prerequisites for building `naudiodon` successfully on Linux, Windows, and macOS. This includes commands for installing necessary C++ compilers, development tools, and PortAudio libraries.
    *   Added notes to the README explaining that `naudiodon` is used for STT and is optional.

**Summary of Approach:**

The primary goal was to resolve the `npm install` error caused by `naudiodon`. By making it an optional dependency and ensuring the application handles its absence, you can now install and run the core application without needing to immediately troubleshoot `naudiodon` build issues. If you wish to use the STT feature, you can refer to the updated README for guidance on installing the necessary system dependencies for `naudiodon`.

**Note on Your Feedback (STT Alternatives):**
You expressed a desire for STT to work even without `naudiodon`, possibly using alternative packages. While this commit ensures the application no longer errors out due to `naudiodon` and makes STT optionally functional, it does not replace `naudiodon` with an alternative for STT audio input. Exploring and integrating alternative cross-platform audio input libraries for STT would be a separate task.

This set of changes should improve the installation experience across different platforms.
2025-06-07 23:01:17 +00:00
Sweaterdog
15578595f1
Merge pull request #8 from Sweaterdog/vision-logging-enhancements
Fix: Improve vision logging and add comments
2025-06-07 15:29:55 -07:00
google-labs-jules[bot]
4577a68dfd Fix: Improve vision logging and add comments
This commit addresses several aspects of the vision logging system:

1.  **Always Active Vision Logging:**
    *   Ensures that when `settings.vision_mode` is 'always', a vision log entry is created each time a message is handled.
    *   The full conversation history is now correctly formatted into a JSON string and passed as the `visionMessage` (4th argument) to `logger.logVision`. This ensures the entire input context is logged for these "always active" vision captures, similar to 'normal' and 'reasoning' text logs.
    *   I implemented this by adding a `formatHistoryForVisionLog` helper function to `Agent.js` and calling it within `handleMessage` to prepare the history string. This approach was chosen due to difficulties in directly modifying `logger.js` to always use its internal full history formatter.

2.  **Comments:**
    *   I added detailed comments in `agent.js` to explain the `formatHistoryForVisionLog` helper function and the logic for "always active" vision logging, including the rationale for the approach.
    *   I clarified how `latestScreenshotPath` is managed in relation to "always active" logs and other history entries.

3.  **General Code Health:**
    *   I ensured necessary imports (`fs`, `path`, `logger`) are present in `agent.js`.

I tested the changes by simulating the "always active" vision scenario and verifying that `logger.logVision` was called with the correct arguments, including the complete formatted history string.
2025-06-07 22:29:19 +00:00
Sweaterdog
4efb5c304f
Merge pull request #7 from Sweaterdog/Speech-to-Text
Speech to text
2025-06-07 14:59:42 -07:00
Sweaterdog
da0722a8fb
Merge branch 'main' into Speech-to-Text 2025-06-07 14:59:35 -07:00
Sweaterdog
d58633640f
Merge branch 'kolbytn:main' into Speech-to-Text 2025-06-07 14:57:26 -07:00
Sweaterdog
e87e615f0c
Merge pull request #6 from Sweaterdog/always-active-vision
Always active vision
2025-06-07 14:57:07 -07:00
Sweaterdog
131dd45c9f
Merge branch 'main' into always-active-vision 2025-06-07 14:56:59 -07:00
Sweaterdog
c75ac9495c
Merge pull request #5 from Sweaterdog/advanced-logging
Advanced logging
2025-06-07 13:59:52 -07:00
Sweaterdog
ae475955d8
Merge pull request #4 from Sweaterdog/refactor-logging-and-remove-features
Refactor logging and remove features
2025-06-07 13:58:21 -07:00
Sweaterdog
d106791c76
Update openrouter.js
Added reasoning for a fixed comment
2025-06-07 13:54:32 -07:00
Sweaterdog
b4f6ad8835
Update settings.js
Removed unnecessary comments made by Jules
2025-06-07 13:52:28 -07:00
google-labs-jules[bot]
857d14e64c I've enhanced logging, transformed thinking tags, and cleaned comments.
- I implemented universal logging for all API providers in src/models/, ensuring calls to logger.js for text and vision logs.
- I added transformation of <thinking>...</thinking> tags to <think>...</think> in all provider responses before logging, for correct categorization by logger.js.
- I standardized the input to logger.js's log() function to be a JSON string of the message history (system prompt + turns).
- I removed unnecessary comments from most API provider files, settings.js, and prompter.js to improve readability.

Note: I encountered some issues that prevented final comment cleanup for qwen.js, vllm.js, and logger.js. Their core logging functionality and tag transformations (for qwen.js and vllm.js) are in place from previous steps.
2025-06-07 20:47:26 +00:00
google-labs-jules[bot]
62bcb1950c I've integrated universal logging and applied some refactors.
I implemented comprehensive logging across all API providers in src/models/ using logger.js.
This includes:
- Adding log() and logVision() calls to each provider (Claude, DeepSeek, Gemini, GLHF, GPT, Grok, Groq, HuggingFace, Hyperbolic, Local, Mistral, Novita, Qwen, Replicate, VLLM).
- Ensuring logging respects 'log_normal_data', 'log_reasoning_data', and 'log_vision_data' flags in settings.js, which I added.
- I deprecated 'log_all_prompts' in settings.js and updated prompter.js accordingly.

I refactored openrouter.js and prompter.js:
- I removed the experimental reasoning prompt functionality ($REASONING) from openrouter.js.
- I removed a previously implemented (and then reverted) personality injection feature ($PERSONALITY) from prompter.js, openrouter.js, and profile files.

I had to work around some issues:
- I replaced the full file content for glhf.js and hyperbolic.js due to persistent errors with applying changes.

Something I still need to do:
- Based on your latest feedback, model responses containing <thinking>...</thinking> tags need to be transformed to <think>...</think> tags before being passed to logger.js to ensure they are categorized into reasoning_logs.csv. This change is not included in this update.
2025-06-07 10:18:04 +00:00
google-labs-jules[bot]
fa35e03ec5 Refactor logging and remove unused features.
- Unified logging for `prompter.js` to use granular settings from `settings.js` (e.g., `log_normal_data`) instead of `log_all_prompts`, which has been deprecated.
- Removed the experimental reasoning prompt functionality (formerly triggered by `$REASONING`) from `openrouter.js`.
- Reverted the recently added personality injection feature (`$PERSONALITY` and `getRandomPersonality`) from `prompter.js`, `openrouter.js`, and profile files as per your request.
- Verified that `openrouter.js` correctly utilizes `logger.js` for standard and vision logs.
2025-06-07 10:01:18 +00:00
Sweaterdog
b70c3bb03a
Added example logging with openrouter.js 2025-06-07 02:47:07 -07:00
Sweaterdog
068f1009be
Add files via upload 2025-06-07 02:46:12 -07:00
Sweaterdog
0db80cfc56
Merge pull request #3 from Jules' work
Jules wip 2192516976139170352
2025-06-07 02:33:05 -07:00
google-labs-jules[bot]
be38f56f12 I've implemented enhanced vision modes with bug fixes and extended API support.
This update finalizes the implementation of three distinct vision modes:
- "off": This disables all my vision capabilities.
- "prompted": (Formerly "on") This allows me to use vision via explicit commands from you (e.g., !lookAtPlayer), and I will then summarize the image.
- "always": (Formerly "always_active") I will automatically take a screenshot every time you send a prompt and send it with your prompt to a multimodal LLM. If you use a look command in this mode, I will only update my view and take a screenshot for the *next* interaction if relevant, without immediate summarization.

Here are the key changes and improvements:

1.  **Bug Fix (Image Path ENOENT)**:
    *   I've corrected `Camera.capture()` so it returns filenames with the `.jpg` extension.
    *   I've updated `VisionInterpreter.analyzeImage()` to handle full filenames.
    *   This resolves the `ENOENT` error that was previously happening in `Prompter.js`.

2.  **Vision Mode Renaming**:
    *   I've renamed the modes in `settings.js` and throughout the codebase: "on" is now "prompted", and "always_active" is now "always".

3.  **Core Framework (from previous work, now integrated)**:
    *   I've added `vision_mode` to `settings.js`.
    *   `Agent.js` now manages `latestScreenshotPath` and initializes `VisionInterpreter` with `vision_mode`.
    *   `VisionInterpreter.js` handles different behaviors for each mode.
    *   My vision commands (`!lookAt...`) respect the `off` mode.
    *   `History.js` stores `imagePath` with turns, and `Agent.js` manages this path's lifecycle.
    *   `Prompter.js` reads image files when I'm in "always" mode and passes `imageData` to model wrappers.

4.  **Extended Multimodal API Support**:
    *   `gemini.js`, `gpt.js`, `claude.js`, `local.js` (Ollama), `qwen.js`, and `deepseek.js` have been updated to accept `imageData` in their `sendRequest` method and format it for their respective multimodal APIs. They now include `supportsRawImageInput = true`.
    *   Other model wrappers (`mistral.js`, `glhf.js`, `grok.js`, etc.) now safely handle the `imageData` parameter in `sendRequest` (by ignoring it and logging a warning) and have `supportsRawImageInput = false` for that method, ensuring consistent behavior.

5.  **Testing**: I have a comprehensive plan to verify all modes and functionalities.

This set of changes provides a robust and flexible vision system for me, catering to different operational needs and supporting various multimodal LLMs.
2025-06-07 09:07:02 +00:00
Sweaterdog
5c1a8c46b2
Fixed Agent.js error caused by Jules 2025-06-07 01:49:11 -07:00
google-labs-jules[bot]
e9160d928e feat: Implement framework for new vision modes and Gemini support
This commit introduces a comprehensive framework for three new vision modes: 'off', 'on', and 'always_active'.

Key changes include:

1.  **Settings (`settings.js`)**: Added a `vision_mode` setting.
2.  **Agent State (`src/agent/agent.js`)**:
    *   Added `latestScreenshotPath` to store the most recent screenshot.
    *   Updated `VisionInterpreter` initialization to use `vision_mode`.
3.  **Screenshot Handling**:
    *   `VisionInterpreter` now updates `agent.latestScreenshotPath` after look commands.
    *   `Agent.handleMessage` captures screenshots in `always_active` mode for your messages.
4.  **VisionInterpreter (`src/agent/vision/vision_interpreter.js`)**:
    *   Refactored to support distinct behaviors for `off` (disabled), `on` (summarize), and `always_active` (capture-only, no summarization for look commands).
5.  **Vision Commands (`src/agent/commands/actions.js`)**:
    *   `!lookAtPlayer` and `!lookAtPosition` now respect `vision_mode: 'off'` and camera availability.
6.  **History Storage (`src/agent/history.js`)**:
    *   `History.add` now supports an `imagePath` for each turn.
    *   `Agent.js` correctly passes `latestScreenshotPath` for relevant turns in `always_active` mode and manages its lifecycle.
7.  **Prompter Logic (`src/models/prompter.js`)**:
    *   `Prompter.promptConvo` now reads image files specified in history for `always_active` mode and passes `imageData` to the chat model.
8.  **Model API Wrappers (Example: `src/models/gemini.js`)**:
    *   `gemini.js` updated to accept `imageData` in `sendRequest`.
    *   Added `supportsRawImageInput` flag to `gemini.js`.

The system is now structured to support these vision modes. The `always_active` mode, where raw images are sent with prompts, is fully implemented for the Gemini API.

Further work will involve extending this raw image support in `always_active` mode to all other capable multimodal API providers as per your feedback.
2025-06-07 08:41:24 +00:00
google-labs-jules[bot]
ffe3b0e528 Jules was unable to complete the task in time. Please review the work done so far and provide feedback for Jules to continue. 2025-06-07 08:39:05 +00:00
Sweaterdog
21481a7861
Merge branch 'kolbytn:main' into Make-Andy-4-Default-Ollama-Model 2025-05-25 14:57:10 -07:00
Max Robinson
f2f06fcf3f
Merge pull request #540 from icwhite/main
Small Fixes and lots of Task reworking
2025-05-24 12:30:33 -06:00
Isadora White
fa02028b8b remove unnecessary changes 2025-05-23 12:02:23 -07:00
Isadora White
b55f92800f restore settings.js 2025-05-23 11:56:40 -07:00
Isadora White
f7e4fee249 update README and remove useless tasks 2025-05-23 11:54:53 -07:00
Isadora White
77535f97d5 fix goal string issues 2025-05-23 11:49:51 -07:00
Sweaterdog
da6c0bef23
Merge branch 'kolbytn:main' into Speech-to-Text 2025-05-22 19:14:20 -07:00
Sweaterdog
d32dcdc887
Update local.js
Made Andy-4 the default model if the Ollama API is the only thing specified
2025-05-22 19:13:52 -07:00
Sweaterdog
d2a3e11fdd
Merge branch 'kolbytn:main' into Make-Andy-4-Default-Ollama-Model 2025-05-22 19:12:59 -07:00
Kolby Nottingham
c4e23ea387
Merge pull request #550 from rajammanabrolu/main
Update README.md with bib for arxiv paper
2025-05-21 09:50:38 -07:00
Prithviraj Ammanabrolu
0fabaa8e90
smol 2025-05-21 09:48:28 -07:00
Prithviraj Ammanabrolu
99af6506aa
Update README.md with bib 2025-05-21 09:44:47 -07:00