feat: Implement framework for new vision modes and Gemini support

This commit introduces a framework for three new vision modes: `off`, `on`, and `always_active`.

Key changes include:

1.  **Settings (`settings.js`)**: Added a `vision_mode` setting.
2.  **Agent State (`src/agent/agent.js`)**:
    *   Added `latestScreenshotPath` to store the most recent screenshot.
    *   Updated `VisionInterpreter` initialization to use `vision_mode`.
3.  **Screenshot Handling**:
    *   `VisionInterpreter` now updates `agent.latestScreenshotPath` after look commands.
    *   `Agent.handleMessage` captures a screenshot for each incoming user message in `always_active` mode.
4.  **VisionInterpreter (`src/agent/vision/vision_interpreter.js`)**:
    *   Refactored to support distinct behaviors for `off` (disabled), `on` (summarize), and `always_active` (capture-only, no summarization for look commands).
5.  **Vision Commands (`src/agent/commands/actions.js`)**:
    *   `!lookAtPlayer` and `!lookAtPosition` now respect `vision_mode: 'off'` and camera availability.
6.  **History Storage (`src/agent/history.js`)**:
    *   `History.add` now supports an `imagePath` for each turn.
    *   `agent.js` passes `latestScreenshotPath` for relevant turns in `always_active` mode and manages its lifecycle.
7.  **Prompter Logic (`src/models/prompter.js`)**:
    *   `Prompter.promptConvo` now reads image files specified in history for `always_active` mode and passes `imageData` to the chat model.
8.  **Model API Wrappers (Example: `src/models/gemini.js`)**:
    *   `gemini.js` updated to accept `imageData` in `sendRequest`.
    *   Added `supportsRawImageInput` flag to `gemini.js`.

The system is now structured to support these vision modes. The `always_active` mode, where raw images are sent with prompts, is fully implemented for the Gemini API.
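
As a rough illustration of the three modes listed above, the sketch below shows one way a look command could branch on `vision_mode`. It is not the code from this commit: `handleLook`, the `camera.capture()` stub, the `summarize` callback, and the returned strings are all hypothetical names chosen for the example.

```javascript
// Hypothetical sketch: function names and messages are illustrative only.
const VISION_MODES = ['off', 'on', 'always_active'];

/**
 * Decide what a look command does under each vision mode.
 * `camera.capture()` is assumed to return a screenshot path;
 * `summarize(path)` is assumed to return a text description.
 */
function handleLook(visionMode, camera, summarize, agentState) {
    if (!VISION_MODES.includes(visionMode)) {
        throw new Error(`Unknown vision_mode: ${visionMode}`);
    }
    if (visionMode === 'off') {
        return 'Vision is disabled.';
    }
    if (!camera) {
        return 'Camera is not available.';
    }
    const screenshotPath = camera.capture();
    // Remember the most recent screenshot for later turns.
    agentState.latestScreenshotPath = screenshotPath;
    if (visionMode === 'always_active') {
        // Capture only: no summarization for look commands in this mode.
        return `Screenshot captured: ${screenshotPath}`;
    }
    // 'on': return a text summary of the screenshot.
    return summarize(screenshotPath);
}

// Usage with stubs standing in for the real camera and vision model.
const agentState = { latestScreenshotPath: null };
const camera = { capture: () => 'shot_001.jpg' };
const summarize = (p) => `Summary of ${p}`;
console.log(handleLook('on', camera, summarize, agentState));
console.log(handleLook('always_active', camera, summarize, agentState));
console.log(handleLook('off', camera, summarize, agentState));
```

Checking `off` before camera availability means a disabled mode never touches the camera, which matches the order in which the commit lists the two conditions for `!lookAtPlayer` and `!lookAtPosition`.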

Further work will extend raw image support in `always_active` mode to the other capable multimodal API providers, per your feedback.
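
One way the `supportsRawImageInput` flag could let other provider wrappers opt in is sketched below. This is an assumption-laden example, not the actual `Prompter.promptConvo` or `gemini.js` code: `StubModel`, `findLatestImagePath`, the `sendRequest` parameter order, and the choice to send only the most recent image are all hypothetical. It uses CommonJS `require` so it runs standalone.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical stand-in for a provider wrapper; the real sendRequest
// signature is not shown in this commit message.
class StubModel {
    constructor() {
        this.supportsRawImageInput = true;
    }
    sendRequest(turns, systemMessage, imageData = null) {
        const size = imageData ? imageData.length : 0;
        return `turns=${turns.length}, imageBytes=${size}`;
    }
}

// Assumption: only the most recent turn that carries an image is used.
function findLatestImagePath(turns) {
    for (let i = turns.length - 1; i >= 0; i--) {
        if (turns[i].imagePath) return turns[i].imagePath;
    }
    return null;
}

// Read the image from disk only when the mode and the model both allow it.
function promptConvo(model, turns, systemMessage, visionMode) {
    let imageData = null;
    if (visionMode === 'always_active' && model.supportsRawImageInput) {
        const imagePath = findLatestImagePath(turns);
        if (imagePath && fs.existsSync(imagePath)) {
            imageData = fs.readFileSync(imagePath); // Buffer of raw bytes
        }
    }
    return model.sendRequest(turns, systemMessage, imageData);
}

// Usage: write a tiny placeholder file and attach it to a history turn.
const tmpImage = path.join(os.tmpdir(), `vision_sketch_${process.pid}.jpg`);
fs.writeFileSync(tmpImage, Buffer.from([0xff, 0xd8, 0xff, 0xd9]));
const turns = [
    { role: 'user', content: 'What do you see?', imagePath: tmpImage },
    { role: 'assistant', content: 'Looking now.', imagePath: null },
];
console.log(promptConvo(new StubModel(), turns, 'system', 'always_active'));
console.log(promptConvo(new StubModel(), turns, 'system', 'on'));
fs.unlinkSync(tmpImage);
```

Under this design, a wrapper that has not yet been updated simply leaves the flag unset and keeps receiving text-only requests.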
google-labs-jules[bot] 2025-06-07 08:41:24 +00:00
parent ffe3b0e528
commit e9160d928e