Sweaterdog 2025-06-20 10:38:35 +02:00 committed by GitHub
commit 48d2aff76a
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
37 changed files with 4612 additions and 599 deletions


@ -16,13 +16,13 @@ Do not connect this bot to public servers with coding enabled. This project allo
## Install and Run
1. Make sure you have the requirements above.
1. Make sure you have the requirements above. If you plan to use the STT (Speech-to-Text) feature, also review the "Installation Prerequisites" section regarding `naudiodon`.
2. Clone or download this repository (big green button): `git clone https://github.com/kolbytn/mindcraft.git`
3. Rename `keys.example.json` to `keys.json` and fill in your API keys (you only need one). The desired model is set in `andy.json` or other profiles. For other models refer to the table below.
4. In terminal/command prompt, run `npm install` from the installed directory
4. In terminal/command prompt, run `npm install` from the installed directory. (Note: If `naudiodon` fails to build and you don't need STT, you can usually proceed.)
5. Start a minecraft world and open it to LAN on localhost port `55916`
@ -53,7 +53,7 @@ You can configure the agent's name, model, and prompts in their profile like `an
| `anthropic` | `ANTHROPIC_API_KEY` | `claude-3-haiku-20240307` | [docs](https://docs.anthropic.com/claude/docs/models-overview) |
| `xai` | `XAI_API_KEY` | `grok-2-1212` | [docs](https://docs.x.ai/docs) |
| `deepseek` | `DEEPSEEK_API_KEY` | `deepseek-chat` | [docs](https://api-docs.deepseek.com/) |
| `ollama` (local) | n/a | `ollama/llama3.1` | [docs](https://ollama.com/library) |
| `ollama` (local) | n/a | `ollama/sweaterdog/andy-4` | [docs](https://ollama.com/library) |
| `qwen` | `QWEN_API_KEY` | `qwen-max` | [Intl.](https://www.alibabacloud.com/help/en/model-studio/developer-reference/use-qwen-by-calling-api)/[cn](https://help.aliyun.com/zh/model-studio/getting-started/models) |
| `mistral` | `MISTRAL_API_KEY` | `mistral-large-latest` | [docs](https://docs.mistral.ai/getting-started/models/models_overview/) |
| `replicate` | `REPLICATE_API_KEY` | `replicate/meta/meta-llama-3-70b-instruct` | [docs](https://replicate.com/collections/language-models) |
@ -66,7 +66,25 @@ You can configure the agent's name, model, and prompts in their profile like `an
| `vllm` | n/a | `vllm/llama3` | n/a |
If you use Ollama, to install the models used by default (generation and embedding), execute the following terminal command:
`ollama pull llama3.1 && ollama pull nomic-embed-text`
`ollama pull sweaterdog/andy-4 && ollama pull nomic-embed-text`
<details>
<summary>Additional info about Andy-4...</summary>
![image](https://github.com/user-attachments/assets/215afd01-3671-4bb6-b53f-4e51e710239a)
Andy-4 is a community-made, open-source model created by Sweaterdog to play Minecraft.
Since Andy-4 is open-source, you can download the model and play with it offline and for free.
The Andy-4 collection of models has reasoning and non-reasoning modes; sometimes the model will reason automatically without being prompted.
If you want to specifically enable reasoning, use the `andy-4-reasoning.json` profile.
Some Andy-4 models may not be able to disable reasoning, no matter what profile is used.
Andy-4 comes in several model sizes.
For more information about which model size is best for you, check [Sweaterdog's Ollama page](https://ollama.com/Sweaterdog/Andy-4).
If you run into any issues, join the Mindcraft server and ping `@Sweaterdog`, or open an issue on the [Andy-4 Hugging Face repo](https://huggingface.co/Sweaterdog/Andy-4/discussions/new).
</details>
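To run Andy-4 through Mindcraft, add or uncomment an Andy-4 entry in the `profiles` list of `settings.js`; this commit adds a commented-out `./profiles/andy-4.json` entry there. A minimal sketch (the reasoning profile's path is an assumption):

```javascript
// settings.js (excerpt) — sketch; keep or remove the other profile entries as you like
const settings = {
    "profiles": [
        "./profiles/andy-4.json",
        // "./profiles/andy-4-reasoning.json", // path assumed for the reasoning-enabled profile
    ],
    // ...rest of settings unchanged
}
```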
### Online Servers
To connect to online servers, your bot will need an official Microsoft/Minecraft account. You can use your own personal account, but you will need a second account if you want to connect alongside the bot and play with it. To connect, change these lines in `settings.js`:
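The exact snippet is not reproduced in this diff; a hedged sketch of the kind of fields involved (field names assumed from the default `settings.js`):

```javascript
// settings.js (excerpt) — sketch; confirm the field names against your settings.js
"host": "111.222.333.444", // the server's IP address or hostname
"port": 25565,             // the server's port (25565 is the Minecraft default)
"auth": "microsoft",       // use Microsoft authentication for online servers
```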
@ -102,6 +120,21 @@ When running in docker, if you want the bot to join your local minecraft server,
To connect to an unsupported minecraft version, you can try to use [viaproxy](services/viaproxy/README.md)
## STT in Mindcraft
STT allows you to speak to the model if you have a microphone.
STT can be enabled in `settings.js` under the section that looks like this:
```javascript
"stt_transcription": true, // Change this to "true" to enable STT
"stt_username": "SYSTEM",
"stt_agent_name": ""
```
The speech-to-text engine will begin listening on the system's default input device. **Note:** Successful STT operation depends on the `naudiodon` package, which is an optional dependency. If `naudiodon` failed to install or build (see "Installation Prerequisites" for troubleshooting), STT will be disabled.
When using STT, you **need** a [GroqCloud API key](https://console.groq.com/keys), as Groq is used for audio transcription.
# Bot Profiles
Bot profiles are json files (such as `andy.json`) that define:
@ -155,6 +188,22 @@ Supported Embedding APIs: `openai`, `google`, `replicate`, `huggingface`, `novit
If you try to use an unsupported model, it will default to a simple word-overlap method. Expect reduced performance; we recommend mixing APIs to ensure embedding support.
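For example, a profile can take its chat model from one provider and its embeddings from another, using the same `model` and `embedding` fields as the profiles in this repository (the pairing below is just an illustration):

```json
{
  "name": "andy",
  "model": "ollama/sweaterdog/andy-4",
  "embedding": "openai"
}
```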
## Dataset collection
Mindcraft can collect data from your play sessions with the bots, which can be used to generate training data for fine-tuning models such as Andy-4. To do this, enable logging inside `settings.js`, then navigate to the `logs` folder.
Inside the `logs` folder, after installing the dependencies (see `logs/requirements.txt`), you will find a file named `generate_usernames.py`; run it to convert your collected data into a usable dataset. It generates a pool of random names used to replace your bot's name and your username, both of which improve performance later on.
To run it, use `python generate_usernames.py`. Generating the maximum number of usernames would take up multiple terabytes of data; if for some reason you want to do this, run it with the `--make_all` flag.
Next, set up `convert.py` to include every username that interacted with the bot, as well as the bot's own username, by adding or changing the usernames in the `ORIGINAL_USERNAMES` list.
After this, you are all set for conversion! Since you might not want to convert all of your data at once, rename the `.csv` file(s) you want to convert to `Andy_pre1`. For additional files, increment the number (`Andy_pre2`, `Andy_pre3`, and so on); this value can be as high as you want.
To convert, run `python convert.py`. If you get a dependency error, ensure you are in a Python virtual environment rather than the global one.
To build vision datasets, run `convert.py` with the `--vision` flag; this performs the same conversion but outputs the data in an image-friendly format.
## Specifying Profiles via Command Line
By default, the program will use the profiles specified in `settings.js`. You can specify one or more agent profiles using the `--profiles` argument: `node main.js --profiles ./profiles/andy.json ./profiles/jill.json`


@ -1,17 +1,17 @@
{
"OPENAI_API_KEY": "",
"OPENAI_ORG_ID": "",
"GEMINI_API_KEY": "",
"ANTHROPIC_API_KEY": "",
"REPLICATE_API_KEY": "",
"GROQCLOUD_API_KEY": "",
"HUGGINGFACE_API_KEY": "",
"QWEN_API_KEY": "",
"XAI_API_KEY": "",
"MISTRAL_API_KEY": "",
"DEEPSEEK_API_KEY": "",
"GHLF_API_KEY": "",
"HYPERBOLIC_API_KEY": "",
"NOVITA_API_KEY": "",
"OPENROUTER_API_KEY": ""
}

logger.js Normal file (432 lines)

@ -0,0 +1,432 @@
import { writeFileSync, mkdirSync, existsSync, appendFileSync, readFileSync } from 'fs';
import { join } from 'path';
import settings from './settings.js'; // Import settings
import path from 'path'; // Needed for path operations
// --- Configuration ---
const LOGS_DIR = './logs';
const VISION_DATASET_DIR = join(LOGS_DIR, 'vision_dataset'); // HuggingFace dataset format
const VISION_IMAGES_DIR = join(VISION_DATASET_DIR, 'images'); // Images subdirectory
// --- Log File Paths ---
const REASONING_LOG_FILE = join(LOGS_DIR, 'reasoning_logs.csv');
const NORMAL_LOG_FILE = join(LOGS_DIR, 'normal_logs.csv');
const VISION_METADATA_FILE = join(VISION_DATASET_DIR, 'metadata.jsonl'); // HF metadata format
// --- Log Headers ---
const TEXT_LOG_HEADER = 'input,output\n';
// --- Log Counters ---
let logCounts = {
normal: 0,
reasoning: 0,
vision: 0,
total: 0,
skipped_disabled: 0,
skipped_empty: 0,
vision_images_saved: 0,
};
// --- Helper Functions ---
function ensureDirectoryExistence(dirPath) {
if (!existsSync(dirPath)) {
try {
mkdirSync(dirPath, { recursive: true });
console.log(`[Logger] Created directory: ${dirPath}`);
} catch (error) {
console.error(`[Logger] Error creating directory ${dirPath}:`, error);
return false;
}
}
return true;
}
function countLogEntries(logFile) {
if (!existsSync(logFile)) return 0;
try {
const data = readFileSync(logFile, 'utf8');
const lines = data.split('\n').filter(line => line.trim());
// Check if the first line looks like a header before subtracting
const hasHeader = lines.length > 0 && lines[0].includes(',');
return Math.max(0, hasHeader ? lines.length - 1 : lines.length);
} catch (err) {
console.error(`[Logger] Error reading log file ${logFile}:`, err);
return 0;
}
}
function ensureLogFile(logFile, header) {
if (!ensureDirectoryExistence(path.dirname(logFile))) return false; // Ensure parent dir exists
if (!existsSync(logFile)) {
try {
writeFileSync(logFile, header);
console.log(`[Logger] Created log file: ${logFile}`);
} catch (error) {
console.error(`[Logger] Error creating log file ${logFile}:`, error);
return false;
}
} else {
try {
const content = readFileSync(logFile, 'utf-8');
const headerLine = header.split('\n')[0];
// If file is empty or header doesn't match, overwrite/create header
if (!content.trim() || !content.startsWith(headerLine)) {
// Attempt to prepend header if file has content but wrong/no header
if(content.trim() && !content.startsWith(headerLine)) {
console.warn(`[Logger] Log file ${logFile} seems to be missing or has an incorrect header. Prepending correct header.`);
writeFileSync(logFile, header + content);
} else {
// File is empty or correctly headed, just ensure header is there
writeFileSync(logFile, header);
}
console.log(`[Logger] Ensured header in log file: ${logFile}`);
}
} catch (error) {
console.error(`[Logger] Error checking/writing header for log file ${logFile}:`, error);
// Proceed cautiously, maybe log an error and continue?
}
}
return true;
}
function writeToLogFile(logFile, csvEntry) {
try {
appendFileSync(logFile, csvEntry);
// console.log(`[Logger] Logged data to ${logFile}`); // Keep console less noisy
} catch (error) {
console.error(`[Logger] Error writing to CSV log file ${logFile}:`, error);
}
}
// --- Auto-Detection for Log Type (Based on Response Content) ---
function determineLogType(response) {
// Reasoning check: needs <think>...</think> but ignore the specific 'undefined' placeholder
const isReasoning = response.includes('<think>') && response.includes('</think>') && !response.includes('<think>\nundefined</think>');
if (isReasoning) {
return 'reasoning';
} else {
return 'normal';
}
}
function sanitizeForCsv(value) {
if (typeof value !== 'string') {
value = String(value);
}
// Escape double quotes by doubling them and enclose the whole string in double quotes
return `"${value.replace(/"/g, '""')}"`;
}
// Helper function to clean reasoning markers from input
function cleanReasoningMarkers(input) {
if (typeof input !== 'string') {
return input;
}
// Remove /think and /no_think markers
return input.replace(/\/think/g, '').replace(/\/no_think/g, '').trim();
}
// Helper function to clean imagePath from messages for text logs
function cleanImagePathFromMessages(input) {
if (typeof input !== 'string') {
return input;
}
try {
const parsed = JSON.parse(input);
if (Array.isArray(parsed)) {
const cleaned = parsed.map(msg => {
let cleanedMsg = { ...msg }; // Clone message
// Remove top-level imagePath
if (cleanedMsg.imagePath !== undefined) {
delete cleanedMsg.imagePath;
}
// Remove image_url from content array
if (Array.isArray(cleanedMsg.content)) {
cleanedMsg.content = cleanedMsg.content.filter(part =>
part.type !== 'image_url' &&
!(part.type === 'image' && part.source) // Also filter Claude-style image parts
);
// If content becomes empty after filtering, remove it or set to empty string
if (cleanedMsg.content.length === 0) {
cleanedMsg.content = "";
} else if (cleanedMsg.content.length === 1 &&
cleanedMsg.content[0].type === 'text' &&
!cleanedMsg.content[0].text?.trim()) {
cleanedMsg.content = "";
}
}
return cleanedMsg;
});
return JSON.stringify(cleaned);
}
} catch (e) {
// If not valid JSON, return as-is
return input;
}
return input;
}
// --- Main Logging Function (for text-based input/output) ---
export function log(input, response) {
const trimmedInputStr = input ? (typeof input === 'string' ? input.trim() : JSON.stringify(input)) : "";
const trimmedResponse = response ? String(response).trim() : ""; // Ensure response is a string
// Clean reasoning markers from input before logging
let cleanedInput = cleanReasoningMarkers(trimmedInputStr);
// Clean imagePath from messages for text logs (normal/reasoning)
cleanedInput = cleanImagePathFromMessages(cleanedInput);
// Basic filtering
if (!cleanedInput && !trimmedResponse) {
logCounts.skipped_empty++;
return;
}
if (cleanedInput === trimmedResponse) {
logCounts.skipped_empty++;
return;
}
// Avoid logging common error messages that aren't useful training data
const errorMessages = [
"My brain disconnected, try again.",
"My brain just kinda stopped working. Try again.",
"I thought too hard, sorry, try again.",
"*no response*",
"No response received.",
"No response data.",
"Failed to send", // Broader match
"Error:", // Broader match
"Vision is only supported",
"Context length exceeded",
"Image input modality is not enabled",
"An unexpected error occurred",
// Add more generic errors/placeholders as needed
];
// Also check for responses that are just the input repeated (sometimes happens with errors)
if (errorMessages.some(err => trimmedResponse.includes(err)) || trimmedResponse === cleanedInput) {
logCounts.skipped_empty++;
// console.warn(`[Logger] Skipping log due to error/placeholder/repeat: "${trimmedResponse.substring(0, 70)}..."`);
return;
}
const logType = determineLogType(trimmedResponse);
let logFile;
let header;
let settingFlag;
switch (logType) {
case 'reasoning':
logFile = REASONING_LOG_FILE;
header = TEXT_LOG_HEADER;
settingFlag = settings.log_reasoning_data;
break;
case 'normal':
default:
logFile = NORMAL_LOG_FILE;
header = TEXT_LOG_HEADER;
settingFlag = settings.log_normal_data;
break;
}
// Check if logging for this type is enabled
if (!settingFlag) {
logCounts.skipped_disabled++;
return;
}
// Ensure directory and file exist
if (!ensureLogFile(logFile, header)) return; // ensureLogFile now checks parent dir too
// Prepare the CSV entry using the sanitizer with cleaned input
const safeInput = sanitizeForCsv(cleanedInput);
const safeResponse = sanitizeForCsv(trimmedResponse);
const csvEntry = `${safeInput},${safeResponse}\n`;
// Write to the determined log file
writeToLogFile(logFile, csvEntry);
// Update counts
logCounts[logType]++;
logCounts.total++; // Total here refers to text logs primarily
// Display summary periodically (based on total text logs)
if (logCounts.normal + logCounts.reasoning > 0 && (logCounts.normal + logCounts.reasoning) % 20 === 0) {
printSummary();
}
}
// --- Enhanced Vision Logging Function for HuggingFace Dataset Format ---
export function logVision(conversationHistory, imageBuffer, response, visionMessage = null) {
if (!settings.log_vision_data) {
logCounts.skipped_disabled++;
return;
}
const trimmedResponse = response ? String(response).trim() : "";
if (!conversationHistory || conversationHistory.length === 0 || !trimmedResponse || !imageBuffer) {
logCounts.skipped_empty++;
return;
}
// Filter out error messages
const errorMessages = [
"My brain disconnected, try again.",
"My brain just kinda stopped working. Try again.",
"I thought too hard, sorry, try again.",
"*no response*",
"No response received.",
"No response data.",
"Failed to send",
"Error:",
"Vision is only supported",
"Context length exceeded",
"Image input modality is not enabled",
"An unexpected error occurred",
"Image captured for always active vision", // Filter out placeholder responses
];
if (errorMessages.some(err => trimmedResponse.includes(err))) {
logCounts.skipped_empty++;
return;
}
// Ensure directories exist
if (!ensureDirectoryExistence(VISION_DATASET_DIR)) return;
if (!ensureDirectoryExistence(VISION_IMAGES_DIR)) return;
try {
// Generate unique filename for the image
const timestamp = Date.now();
const randomSuffix = Math.random().toString(36).substring(2, 8);
const imageFilename = `vision_${timestamp}_${randomSuffix}.jpg`;
const imagePath = join(VISION_IMAGES_DIR, imageFilename);
const relativeImagePath = `images/${imageFilename}`; // Relative path for metadata
// Save the image
writeFileSync(imagePath, imageBuffer);
logCounts.vision_images_saved++;
// Clean the conversation history to remove imagePath and image data before logging
const cleanedConversationHistory = JSON.parse(cleanImagePathFromMessages(JSON.stringify(conversationHistory)));
// Format the complete input as JSON (cleaned conversation history)
const inputData = JSON.stringify(cleanedConversationHistory);
// Create metadata entry in JSONL format for HuggingFace
const metadataEntry = {
file_name: relativeImagePath,
input: inputData, // Cleaned JSON conversation history
response: trimmedResponse, // Actual model response, not placeholder
timestamp: timestamp
};
// Append to metadata JSONL file
const jsonlLine = JSON.stringify(metadataEntry) + '\n';
appendFileSync(VISION_METADATA_FILE, jsonlLine);
logCounts.vision++;
logCounts.total++;
// Display summary periodically
if (logCounts.vision > 0 && logCounts.vision % 10 === 0) {
printSummary();
}
} catch (error) {
console.error(`[Logger] Error logging vision data:`, error);
}
}
// Helper function to format conversation history as fallback
function formatConversationInput(conversationHistory) {
if (!conversationHistory || conversationHistory.length === 0) return '';
const formattedHistory = [];
for (const turn of conversationHistory) {
const formattedTurn = {
role: turn.role || 'user',
content: []
};
// Handle different content formats
if (typeof turn.content === 'string') {
formattedTurn.content.push({
type: 'text',
text: turn.content
});
} else if (Array.isArray(turn.content)) {
// Already in the correct format
formattedTurn.content = turn.content;
} else if (turn.content && typeof turn.content === 'object') {
// Convert object to array format
if (turn.content.text) {
formattedTurn.content.push({
type: 'text',
text: turn.content.text
});
}
if (turn.content.image) {
formattedTurn.content.push({
type: 'image',
image: turn.content.image
});
}
}
formattedHistory.push(formattedTurn);
}
return JSON.stringify(formattedHistory);
}
function printSummary() {
const totalStored = logCounts.normal + logCounts.reasoning + logCounts.vision;
console.log('\n' + '='.repeat(60));
console.log('LOGGER SUMMARY');
console.log('-'.repeat(60));
console.log(`Normal logs stored: ${logCounts.normal}`);
console.log(`Reasoning logs stored: ${logCounts.reasoning}`);
console.log(`Vision logs stored: ${logCounts.vision} (Images saved: ${logCounts.vision_images_saved})`);
console.log(`Skipped (disabled): ${logCounts.skipped_disabled}`);
console.log(`Skipped (empty/err): ${logCounts.skipped_empty}`);
console.log('-'.repeat(60));
console.log(`Total logs stored: ${totalStored}`);
console.log('='.repeat(60) + '\n');
}
// Initialize counts at startup
function initializeCounts() {
logCounts.normal = countLogEntries(NORMAL_LOG_FILE);
logCounts.reasoning = countLogEntries(REASONING_LOG_FILE);
logCounts.vision = countVisionEntries(VISION_METADATA_FILE);
// Total count will be accumulated during runtime
console.log(`[Logger] Initialized log counts: Normal=${logCounts.normal}, Reasoning=${logCounts.reasoning}, Vision=${logCounts.vision}`);
}
function countVisionEntries(metadataFile) {
if (!existsSync(metadataFile)) return 0;
try {
const data = readFileSync(metadataFile, 'utf8');
const lines = data.split('\n').filter(line => line.trim());
return lines.length;
} catch (err) {
console.error(`[Logger] Error reading vision metadata file ${metadataFile}:`, err);
return 0;
}
}
// Initialize counts at startup
initializeCounts();
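A minimal usage sketch of the two exported functions (message shapes and paths taken from the code above; the corresponding `log_*` flags in `settings.js` must be enabled, and the screenshot path is a placeholder):

```javascript
// usage sketch — assumes logger.js sits in the project root, as in this commit
import { log, logVision } from './logger.js';
import { readFileSync } from 'fs';

// Text logging: routed to reasoning_logs.csv or normal_logs.csv depending on
// whether the response contains <think>...</think>.
log('What block is below you?', '<think>Checking my surroundings...</think> Stone.');

// Vision logging: saves the image under logs/vision_dataset/images and appends
// a metadata.jsonl entry containing the cleaned conversation history.
const history = [{ role: 'user', content: 'Describe what you see.' }];
const screenshot = readFileSync('./example_screenshot.jpg'); // placeholder image
logVision(history, screenshot, 'I see a small oak house next to a river.');
```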

logs/convert.py Normal file (964 lines)

@ -0,0 +1,964 @@
import csv
import json
import logging
import sys
import os
import random
from typing import List, Dict
import pandas as pd
from USERNAMES import Get_Usernames
from transformers import AutoTokenizer
from tqdm import tqdm
import torch
from PIL import Image
import base64
from io import BytesIO
# Try to import pandas-image-methods for vision data handling
try:
from pandas_image_methods import PILMethods
PANDAS_IMAGE_METHODS_AVAILABLE = True
# Enable PIL methods for pandas
pd.api.extensions.register_series_accessor("pil")(PILMethods)
except ImportError:
PANDAS_IMAGE_METHODS_AVAILABLE = False
logging.warning("pandas-image-methods not available. Install with: pip install pandas-image-methods")
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Increase CSV field size limit to avoid errors with very large fields.
maxInt = sys.maxsize
while True:
try:
csv.field_size_limit(maxInt)
break
except OverflowError:
maxInt = int(maxInt/10)
# Define the original usernames.
ORIGINAL_USERNAMES = [
"Your_username", "Andy"
]
# Define outputs that should cause the conversation to be deleted.
BAD_OUTPUTS = {
"My brain just kinda stopped working. Try again.",
"My brain disconnected, try again.",
"Vision is only supported",
"Context length exceeded",
"Image input modality is not enabled",
"An unexpected error occurred",
}
MINECRAFT_USERNAMES = list(set(Get_Usernames())) # Remove duplicates
duplicate_count = len(Get_Usernames()) - len(MINECRAFT_USERNAMES)
available_minecraft_usernames = list(MINECRAFT_USERNAMES) # Create a copy for tracking
global username_replaced_count
global reasoning_replaced_count
username_replaced_count = 0
reasoning_replaced_count = 0
def replace_reasoning_prompt(text: str) -> str:
global reasoning_replaced_count
replaced = False
# Optionally, replace the reasoning prompt if needed.
if replaced:
reasoning_replaced_count += 1
return text
def parse_json_safely(text: str) -> List[Dict[str, str]]:
try:
if text.startswith('[') and '],' in text:
parts = text.split('],')
text = parts[0] + ']'
if text.startswith('"') and text.endswith('"'):
text = text[1:-1]
text = text.replace('""', '"')
data = json.loads(text)
if isinstance(data, list) and len(data) > 0 and isinstance(data[0], list):
data = data[0]
converted_messages = []
for msg in data:
if isinstance(msg, dict) and 'role' in msg and 'content' in msg:
converted_messages.append({
"from": "human" if msg['role'] in ("system", "user") else "gpt",
"value": msg['content']
})
return converted_messages
except Exception as e:
logger.debug(f"Error parsing JSON: {e}") # Suppressed error level
return [{
"from": "human",
"value": text
}]
def create_conversation_thread(row: Dict[str, str]) -> List[Dict[str, str]]:
messages = []
conversation_replacements = {} # Track username replacements for this conversation ONLY
def replace_usernames_in_message(text: str) -> str:
global username_replaced_count
global available_minecraft_usernames
replaced = False
if not MINECRAFT_USERNAMES:
return text
for orig_name in ORIGINAL_USERNAMES:
if orig_name in text:
if orig_name not in conversation_replacements:
# If we've used all available names, reset the list
if not available_minecraft_usernames:
available_minecraft_usernames = list(MINECRAFT_USERNAMES)
# Get a random name from the available ones
replacement = random.choice(available_minecraft_usernames)
available_minecraft_usernames.remove(replacement)
conversation_replacements[orig_name] = replacement
replaced = True
# Use existing replacement for this conversation
text = text.replace(orig_name, conversation_replacements[orig_name])
if replaced:
username_replaced_count += 1
return text
if row.get("input"):
messages = parse_json_safely(str(row["input"]))
# Apply consistent username replacements to all messages
for msg in messages:
msg["value"] = replace_usernames_in_message(msg["value"])
if row.get("output"):
output_text = str(row["output"]).strip()
output_text = replace_usernames_in_message(output_text)
output_text = replace_reasoning_prompt(output_text)
messages.append({
"from": "gpt",
"value": output_text
})
return messages
def conversation_has_bad_output(messages: List[Dict[str, str]]) -> bool:
for msg in messages:
if msg["from"] == "gpt" and msg["value"].strip() in BAD_OUTPUTS:
return True
return False
def load_image_from_base64(base64_string: str):
"""Convert base64 string to PIL Image"""
try:
if base64_string.startswith('data:'):
base64_string = base64_string.split(',')[1]
image_bytes = base64.b64decode(base64_string)
image = Image.open(BytesIO(image_bytes))
if image.mode in ('RGBA', 'LA', 'P'):
image = image.convert('RGB')
return image
except Exception as e:
logger.debug(f"Error loading image from base64: {e}")
return Image.new('RGB', (224, 224), color='gray')
def pil_image_to_parquet_dict(image: Image.Image, filename: str) -> Dict:
"""Converts a PIL Image to the dictionary format {bytes, path} for Parquet."""
img_byte_arr = BytesIO()
# Determine a suitable save format
save_format = image.format if image.format and image.format in Image.SAVE else 'PNG'
# Handle specific mode conversions if necessary for the chosen format
if save_format == 'PNG' and image.mode not in ['RGB', 'RGBA', 'L', 'P', 'I', 'F']: # Common PNG modes
# Convert to a mode PNG supports, e.g., RGBA to preserve transparency
image_to_save = image.convert("RGBA")
elif save_format == 'JPEG' and image.mode not in ['RGB', 'L', 'CMYK']:
# Convert to a mode JPEG supports
image_to_save = image.convert("RGB")
else:
image_to_save = image
try:
image_to_save.save(img_byte_arr, format=save_format)
except Exception as e:
logger.warning(f"Could not save image {filename} in format {save_format} (Error: {e}). Attempting PNG.")
save_format = 'PNG'
if image_to_save.mode not in ['RGB', 'RGBA', 'L', 'P', 'I', 'F']:
image_to_save = image.convert("RGBA") # Default to RGBA for PNG
image_to_save.save(img_byte_arr, format=save_format)
return {"bytes": img_byte_arr.getvalue(), "path": filename}
def extract_vision_data_from_jsonl(jsonl_path: str) -> List[Dict]:
"""Extract vision data from HuggingFace JSONL metadata format"""
if not os.path.isfile(jsonl_path):
logger.error(f"JSONL file not found: {jsonl_path}")
return []
logger.info(f"Reading vision metadata: {jsonl_path}")
# Get the directory containing the JSONL file (should contain images folder)
base_dir = os.path.dirname(jsonl_path)
images_dir = os.path.join(base_dir, 'images')
if not os.path.isdir(images_dir):
logger.error(f"Images directory not found: {images_dir}")
return []
vision_data = []
with open(jsonl_path, 'r', encoding='utf-8') as f:
for line_num, line in enumerate(f, 1):
line = line.strip()
if not line:
continue
try:
entry = json.loads(line)
# Extract required fields - logger.js uses 'input' and 'response', not 'text'
file_name = entry.get('file_name', '')
input_data = entry.get('input', '')
response = entry.get('response', '')
if not all([file_name, input_data, response]):
logger.warning(f"Line {line_num}: Missing required fields (file_name, input, response)")
continue
# Check for bad outputs
if response.strip() in BAD_OUTPUTS:
logger.debug(f"Line {line_num}: Skipping bad output")
continue
# Load the image
image_path = os.path.join(base_dir, file_name)
if not os.path.isfile(image_path):
logger.warning(f"Line {line_num}: Image file not found: {image_path}")
continue
try:
image = Image.open(image_path)
if image.mode in ('RGBA', 'LA', 'P') and image.format != 'PNG': # PNG handles these modes well
image = image.convert('RGB') # Convert to RGB if not PNG to simplify, or handle more modes in pil_image_to_parquet_dict
except Exception as e:
logger.warning(f"Line {line_num}: Error loading image {image_path}: {e}")
continue
# Convert PIL image to parquet-compatible dict
relative_image_path_for_dict = file_name # Use the relative path from metadata
image_dict = pil_image_to_parquet_dict(image, relative_image_path_for_dict)
# Create a separate conversation_replacements for each vision entry
entry_conversation_replacements = {}
# Replace usernames consistently within this single entry
def replace_usernames_in_text(text: str) -> str:
global username_replaced_count
global available_minecraft_usernames
replaced = False
if not MINECRAFT_USERNAMES:
return text
for orig_name in ORIGINAL_USERNAMES:
if orig_name in text:
if orig_name not in entry_conversation_replacements:
if not available_minecraft_usernames:
available_minecraft_usernames = list(MINECRAFT_USERNAMES)
replacement = random.choice(available_minecraft_usernames)
available_minecraft_usernames.remove(replacement)
entry_conversation_replacements[orig_name] = replacement
replaced = True
text = text.replace(orig_name, entry_conversation_replacements[orig_name])
if replaced:
username_replaced_count += 1
return text
# Parse the input data (conversation history) and build conversation
try:
# The input_data should be JSON string of conversation history
conversation_history = json.loads(input_data)
# Build the conversation in unsloth format
conversation = []
if isinstance(conversation_history, list):
for msg in conversation_history:
if isinstance(msg, dict) and 'role' in msg:
role = msg['role']
# Map system messages to user role for simplicity
if role == 'system':
role = 'user'
content_parts = []
# Handle different content formats
if 'content' in msg:
content = msg['content']
if isinstance(content, str):
# Simple string content
text_content = replace_usernames_in_text(content)
content_parts.append({"type": "text", "text": text_content})
elif isinstance(content, list):
# Array content (multimodal messages)
for part in content:
if isinstance(part, dict):
if part.get('type') == 'text':
text_content = part.get('text', '')
if text_content:
text_content = replace_usernames_in_text(text_content)
content_parts.append({"type": "text", "text": text_content})
# Skip image parts from history - we'll add the main image to the user message
elif any(key in msg for key in ['text', 'message', 'value']):
# Handle other message formats
text_content = msg.get('text') or msg.get('message') or msg.get('value', '')
if text_content:
text_content = replace_usernames_in_text(str(text_content))
content_parts.append({"type": "text", "text": text_content})
if content_parts:
conversation.append({
"role": role,
"content": content_parts
})
# If no conversation history was parsed or it's empty, create a simple user message
if not conversation:
# Use the raw input data as text
text_content = replace_usernames_in_text(str(input_data).strip())
conversation.append({
"role": "user",
"content": [{"type": "text", "text": text_content}]
})
# Add the image to the last user message (or create one if none exists)
user_msg_found = False
for i in range(len(conversation) - 1, -1, -1):
if conversation[i]["role"] == "user":
# Add image to this user message
conversation[i]["content"].append({"type": "image", "image": image_dict})
user_msg_found = True
break
if not user_msg_found:
# No user message found, create one with just the image
conversation.append({
"role": "user",
"content": [{"type": "image", "image": image_dict}]
})
# Add the assistant response
response_text = replace_usernames_in_text(response)
conversation.append({
"role": "assistant",
"content": [{"type": "text", "text": response_text}]
})
except json.JSONDecodeError:
# If input_data is not valid JSON, create simple conversation
text_content = replace_usernames_in_text(str(input_data).strip())
response_text = replace_usernames_in_text(response)
conversation = [
{
"role": "user",
"content": [
{"type": "text", "text": text_content},
{"type": "image", "image": image_dict}
]
},
{
"role": "assistant",
"content": [{"type": "text", "text": response_text}]
}
]
except Exception as e:
logger.debug(f"Line {line_num}: Error parsing conversation history: {e}")
# Fallback to simple conversation
text_content = replace_usernames_in_text(str(input_data).strip())
response_text = replace_usernames_in_text(response)
conversation = [
{
"role": "user",
"content": [
{"type": "text", "text": text_content},
{"type": "image", "image": image_dict}
]
},
{
"role": "assistant",
"content": [{"type": "text", "text": response_text}]
}
]
vision_data.append(conversation)
except json.JSONDecodeError as e:
logger.warning(f"Line {line_num}: JSON decode error: {e}")
continue
except Exception as e:
logger.warning(f"Line {line_num}: Unexpected error: {e}")
continue
logger.info(f"Successfully processed {len(vision_data)} vision entries")
return vision_data
def extract_vision_conversations_from_csv(csv_input: str) -> List[Dict]:
"""Extract vision data from CSV with input,image,output columns"""
if not os.path.isfile(csv_input):
logger.debug(f"Vision CSV file not found: {csv_input}")
return []
logger.info(f"Reading Vision CSV: {csv_input}")
try:
df = pd.read_csv(csv_input)
required_columns = ['input', 'image', 'output']
if not all(col in df.columns for col in required_columns):
logger.debug(f"Vision CSV missing required columns: {required_columns}")
return []
vision_data = []
for idx, row in df.iterrows():
try:
input_text = str(row['input']).strip()
image_b64 = str(row['image']).strip()
output_text = str(row['output']).strip()
if not all([input_text, image_b64, output_text]):
continue
# Check for bad outputs
if output_text in BAD_OUTPUTS:
continue
# Create separate replacements for each row
row_conversation_replacements = {}
# Replace usernames consistently within this single row
def replace_usernames_in_text(text: str) -> str:
global username_replaced_count
global available_minecraft_usernames
replaced = False
if not MINECRAFT_USERNAMES:
return text
for orig_name in ORIGINAL_USERNAMES:
if orig_name in text:
if orig_name not in row_conversation_replacements:
if not available_minecraft_usernames:
available_minecraft_usernames = list(MINECRAFT_USERNAMES)
replacement = random.choice(available_minecraft_usernames)
available_minecraft_usernames.remove(replacement)
row_conversation_replacements[orig_name] = replacement
replaced = True
text = text.replace(orig_name, row_conversation_replacements[orig_name])
if replaced:
username_replaced_count += 1
return text
input_text = replace_usernames_in_text(input_text)
output_text = replace_usernames_in_text(output_text)
# Load image from base64
image = load_image_from_base64(image_b64)
# Convert PIL image to parquet-compatible dict
image_filename_for_dict = f"image_from_base64_{idx}.png" # Create a placeholder filename
image_dict = pil_image_to_parquet_dict(image, image_filename_for_dict)
# Create conversation in unsloth format
conversation = [
{
"role": "user",
"content": [
{"type": "text", "text": input_text},
{"type": "image", "image": image_dict}
]
},
{
"role": "assistant",
"content": [{"type": "text", "text": output_text}]
}
]
vision_data.append(conversation)
except Exception as e:
logger.warning(f"Row {idx}: Error processing vision data: {e}")
continue
logger.info(f"Successfully processed {len(vision_data)} vision entries from CSV")
return vision_data
except Exception as e:
logger.error(f"Error reading vision CSV {csv_input}: {e}")
return []
def extract_conversations_from_csv(csv_input: str) -> List[List[Dict[str, str]]]:
if not os.path.isfile(csv_input):
logger.debug(f"CSV file not found: {csv_input}")
return []
logger.info(f"Reading CSV: {csv_input}")
valid_rows = []
extra_issue_rows = 0
total_extra_columns = 0
with open(csv_input, newline='', encoding="utf-8") as csvfile:
reader = csv.reader(csvfile)
try:
header = next(reader)
except StopIteration:
logger.debug(f"CSV file {csv_input} is empty.")
return []
header_expected = {"input", "output"}
header_map = {col: idx for idx, col in enumerate(header)}
if not header_expected.issubset(set(header)):
logger.debug(f"CSV header does not contain required columns: {header_expected}")
return []
for idx, row in enumerate(reader, start=2):
non_empty_count = sum(1 for field in row if field.strip() != "")
if non_empty_count > 2:
extra = non_empty_count - 2
extra_issue_rows += 1
total_extra_columns += extra
logger.info(f"Row {idx} has {extra} extra filled column(s); row skipped.")
continue
row_dict = {col: row[header_map[col]] if header_map[col] < len(row) else "" for col in header_expected}
valid_rows.append(row_dict)
logger.info(f"Excluded {extra_issue_rows} row(s) with extra columns (total extra columns: {total_extra_columns}).")
df = pd.DataFrame(valid_rows)
conversations = []
for idx, row in df.iterrows():
conv = create_conversation_thread(row)
if conversation_has_bad_output(conv):
continue
conversations.append(conv)
return conversations
def extract_conversations_from_json(json_input: str) -> List[List[Dict[str, str]]]:
logger.info(f"Reading JSON: {json_input}")
try:
with open(json_input, 'r', encoding='utf-8') as f:
data = json.load(f)
except Exception as e:
logger.debug(f"Error reading {json_input}: {e}")
return []
conversations = []
for conv in data:
messages = []
if "system" in conv and conv["system"]:
system_text = str(conv["system"]).strip()
system_text = replace_reasoning_prompt(system_text)
messages.append({"from": "human", "value": system_text})
if "user" in conv and conv["user"]:
user_text = str(conv["user"]).strip()
user_text = replace_reasoning_prompt(user_text)
messages.append({"from": "human", "value": user_text})
if "assistant" in conv and conv["assistant"]:
assistant_text = str(conv["assistant"]).strip()
assistant_text = replace_reasoning_prompt(assistant_text)
messages.append({"from": "gpt", "value": assistant_text})
if messages and not conversation_has_bad_output(messages):
conversations.append(messages)
return conversations
if __name__ == "__main__":
# Handle vision dataset processing
if '--vision' in sys.argv:
if not PANDAS_IMAGE_METHODS_AVAILABLE:
logger.error("pandas-image-methods is required for --vision flag. Install with: pip install pandas-image-methods")
sys.exit(1)
# Look for vision data files
vision_files = []
# Check for HuggingFace format (metadata.jsonl)
metadata_jsonl = "vision_dataset/metadata.jsonl"
if os.path.isfile(metadata_jsonl):
vision_files.append((metadata_jsonl, 'jsonl'))
# Check for CSV format vision logs
vision_csv = "vision_logs.csv"
if os.path.isfile(vision_csv):
vision_files.append((vision_csv, 'csv'))
# Check for numbered files
i = 1
while True:
jsonl_file = f"vision_dataset{i}/metadata.jsonl"
csv_file = f"vision_logs{i}.csv"
found_any = False
if os.path.isfile(jsonl_file):
vision_files.append((jsonl_file, 'jsonl'))
found_any = True
if os.path.isfile(csv_file):
vision_files.append((csv_file, 'csv'))
found_any = True
if not found_any:
break
i += 1
if not vision_files:
logger.error("No vision dataset files found for --vision flag!")
logger.info("Looking for:")
logger.info(" - vision_dataset/metadata.jsonl (HuggingFace format)")
logger.info(" - vision_logs.csv (CSV format)")
logger.info(" - vision_datasetN/metadata.jsonl")
logger.info(" - vision_logsN.csv")
sys.exit(1)
logger.info(f"Found {len(vision_files)} vision files: {[f for f, _ in vision_files]}")
# Process all vision files
all_vision_data = []
total_count = 0
file_counts = {}
for file_path, file_type in vision_files:
if file_type == 'jsonl':
vision_data = extract_vision_data_from_jsonl(file_path)
else: # csv
vision_data = extract_vision_conversations_from_csv(file_path)
file_counts[file_path] = len(vision_data)
all_vision_data.extend(vision_data)
total_count += len(vision_data)
if not all_vision_data:
logger.error("No valid vision data found!")
sys.exit(1)
# Check for tokenization flags
do_tokenize = '--tokenize' in sys.argv
tokenizer = None
device = "cuda" if torch.cuda.is_available() else "cpu"
if do_tokenize:
logger.info("Loading tokenizer 'unsloth/Llama-3.2-1B-Instruct-bnb-4bit'...")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct-bnb-4bit")
# Tokenize if requested
if do_tokenize and tokenizer:
all_texts = []
# all_vision_data holds conversations (lists of role/content messages),
# so collect the text parts from each message when counting tokens
for conversation in all_vision_data:
    for msg in conversation:
        for part in msg.get("content", []):
            if part.get("type") == "text" and part.get("text"):
                all_texts.append(part["text"])
total_tokens = 0
logger.info("Tokenizing vision data...")
for text in tqdm(all_texts, desc="Tokenizing", unit="msg"):
encoded = tokenizer(text, return_tensors="pt")
input_ids = encoded["input_ids"].to(device)
total_tokens += input_ids.shape[-1]
logger.info(f"Total tokens across all vision data: {total_tokens}")
# Remove duplicates based on conversation content
unique_vision_data = []
seen_keys = set()
for conversation in all_vision_data:
# Create a key from the text content of the conversation
key_parts = []
for msg in conversation:
if msg["role"] in ["user", "assistant"]:
for content_part in msg["content"]:
if content_part["type"] == "text":
key_parts.append(content_part["text"].strip())
key = tuple(key_parts)
if key not in seen_keys:
seen_keys.add(key)
unique_vision_data.append(conversation)
all_vision_data = unique_vision_data
logger.info(f"After deduplication: {len(all_vision_data)} unique vision conversations")
# Shuffle the data
random.shuffle(all_vision_data)
# Images are already in parquet-compatible dict format within all_vision_data
# No further image processing needed here before creating DataFrame
# Create DataFrame with conversations column (unsloth format)
df_final = pd.DataFrame({"conversations": all_vision_data})
output_parquet = "Andy_vision_conversations.parquet"
logger.info(f"Writing vision dataset to {output_parquet}")
try:
df_final.to_parquet(output_parquet, index=False)
abs_path = os.path.abspath(output_parquet)
logger.info(f"Successfully wrote vision dataset to: {abs_path}")
except Exception as e:
logger.error(f"Error writing Parquet file: {e}")
sys.exit(1)
logger.info(
f"\n"
f"--------------------------------------------------------------------------------------\n"
f"Vision conversion complete! Processed {total_count} vision conversations from {len(vision_files)} files.\n"
f"Replaced {username_replaced_count} usernames across conversations.\n"
f"Total usernames available: {len(MINECRAFT_USERNAMES)}\n"
f"Final dataset size: {len(all_vision_data)} unique conversations\n"
f"--------------------------------------------------------------------------------------\n"
)
# Log counts per file
for file_path, count in file_counts.items():
logger.info(f"File '{file_path}' contributed {count} conversations.")
sys.exit(0)
# Regular processing for non-vision data
base_filename = "Andy_pre"
files = []
i = 1
while True:
csv_file = f"{base_filename}{i}.csv"
json_file = f"{base_filename}{i}.json"
if not os.path.isfile(csv_file) and not os.path.isfile(json_file):
break
if os.path.isfile(csv_file):
files.append((csv_file, 'csv'))
if os.path.isfile(json_file):
files.append((json_file, 'json'))
i += 1
if not files:
logger.info("No CSV or JSON files found with pattern Andy_preN.(csv|json)")
sys.exit(1)
# Check for tokenization flags
do_tokenize = '--tokenize' in sys.argv
do_tokenize_largest = '--tokenize_largest' in sys.argv
tokenizer = None
device = "cuda" if torch.cuda.is_available() else "cpu"
if do_tokenize or do_tokenize_largest:
logger.info("Loading tokenizer 'unsloth/Llama-3.2-1B-Instruct-bnb-4bit'...")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct-bnb-4bit")
logger.info(f"Found {len(files)} files: {[f for f, _ in files]}")
combined_conversations = []
total_count = 0
file_conversation_counts = {}
for file, ftype in files:
if ftype == 'csv':
convs = extract_conversations_from_csv(file)
else:
convs = extract_conversations_from_json(file)
file_conversation_counts[file] = len(convs)
combined_conversations.extend(convs)
total_count += len(convs)
# Tokenize all data and count tokens
if do_tokenize:
all_texts = [msg["value"] for conv in combined_conversations for msg in conv]
total_tokens = 0
logger.info("Tokenizing all data with progress bar and GPU acceleration...")
for text in tqdm(all_texts, desc="Tokenizing", unit="msg"):
encoded = tokenizer(text, return_tensors="pt")
input_ids = encoded["input_ids"].to(device)
total_tokens += input_ids.shape[-1]
logger.info(f"Total tokens across all data: {total_tokens}")
# Tokenize 5 largest conversations
if do_tokenize_largest:
conv_token_counts = []
logger.info("Tokenizing largest conversations with progress bar and GPU acceleration...")
for conv in tqdm(combined_conversations, desc="Tokenizing convs", unit="conv"):
text = "\n".join(msg["value"] for msg in conv)
encoded = tokenizer(text, return_tensors="pt")
input_ids = encoded["input_ids"].to(device)
conv_token_counts.append((input_ids.shape[-1], conv))
# sort and take top 5
conv_token_counts.sort(key=lambda x: x[0], reverse=True)
top5 = conv_token_counts[:5]
max_tokens = max(count for count, _ in top5)
for idx, (count, _) in enumerate(top5, 1):
logger.info(f"Top {idx} conversation tokens: {count}")
logger.info(f"Maximum tokens in top 5: {max_tokens}")
# Clean up GPT messages
for conv in combined_conversations:
for msg in conv:
if msg["from"] == "gpt":
msg["value"] = msg["value"].replace("<think>\nundefined</think>\n", "").replace("<think>\nundefined</think>", "").strip()
unique_conversations = []
seen_keys = set()
for conv in combined_conversations:
if len(conv) < 2:
key = tuple(msg["value"] for msg in conv)
else:
key = (conv[0]["value"].strip(), conv[-1]["value"].strip())
if key not in seen_keys:
seen_keys.add(key)
unique_conversations.append(conv)
combined_conversations = unique_conversations
random.shuffle(combined_conversations)
# Handle codeOnly flag
if '--codeOnly' in sys.argv:
coding = []
noncoding = []
for conv in combined_conversations:
has_code = any("```" in msg["value"] for msg in conv) or (
conv and conv[-1]["from"] == "gpt" and "!newAction(" in conv[-1]["value"]
)
if has_code:
coding.append(conv)
else:
noncoding.append(conv)
logger.info(f"Found {len(coding)} coding examples and {len(noncoding)} non-coding examples.")
noncoding_count = int(round(0.15 * len(coding)))
if noncoding_count > len(noncoding):
noncoding_count = len(noncoding)
selected_noncoding = random.sample(noncoding, noncoding_count) if noncoding_count > 0 else []
final_conversations = coding + selected_noncoding
random.shuffle(final_conversations)
combined_conversations = final_conversations
if '--codeOnly' in sys.argv:
df_final = pd.DataFrame({"conversations": combined_conversations})
output_parquet = "Andy_conversations_codeOnly.parquet"
else:
df_final = pd.DataFrame({"conversations": combined_conversations})
output_parquet = "Andy_conversations.parquet"
logger.info(f"Writing output to {output_parquet}")
try:
df_final.to_parquet(output_parquet, index=False)
abs_path = os.path.abspath(output_parquet)
logger.info(f"Successfully wrote output to: {abs_path}")
except Exception as e:
logger.debug(f"Error writing Parquet file: {e}")
sys.exit(1)
logger.info(
f"\n"
f"--------------------------------------------------------------------------------------\n\n"
f"Conversion complete! Processed {total_count} conversations from {len(files)} files. \n"
f"Replaced {username_replaced_count} usernames across {total_count} conversations. \n"
f"Total amount of usernames to choose from: {len(MINECRAFT_USERNAMES)} (removed {duplicate_count} duplicates) \n"
f"--------------------------------------------------------------------------------------\n\n"
)
# Log conversation counts per file.
for file, count in file_conversation_counts.items():
logger.info(f"File '{file}' contributed {count} conversations.")

logs/generate_usernames.py Normal file (1117 lines)

File diff suppressed because it is too large.

logs/requirements.txt Normal file (18 lines)

@ -0,0 +1,18 @@
# Core dependencies for convert.py
pandas>=1.3.0
pandas-image-methods>=0.2.0
transformers>=4.20.0
torch>=1.12.0
tqdm>=4.64.0
pillow>=9.0.0
pyarrow>=10.0.0
# Optional dependencies for enhanced functionality
datasets>=2.0.0
dask[complete]>=2022.7.0
distributed>=2022.7.0
# Additional utility dependencies
numpy>=1.21.0
requests>=2.25.0


@ -3,6 +3,7 @@ import settings from './settings.js';
import yargs from 'yargs';
import { hideBin } from 'yargs/helpers';
import { readFileSync } from 'fs';
import { initTTS } from './src/process/tts_process.js';
function parseArguments() {
return yargs(hideBin(process.argv))
@ -69,4 +70,5 @@ for (let profile of settings.profiles) {
const profile_json = JSON.parse(readFileSync(profile, 'utf8'));
settings.profile = profile_json;
Mindcraft.createAgent(settings);
}
}
initTTS();


@ -9,7 +9,7 @@
"cheerio": "^1.0.0",
"express": "^4.18.2",
"google-translate-api-x": "^10.7.1",
"groq-sdk": "^0.15.0",
"groq-sdk": "^0.5.0",
"minecraft-data": "^3.78.0",
"mineflayer": "^4.29.0",
"mineflayer-armor-manager": "^2.0.1",
@ -28,8 +28,13 @@
"socket.io-client": "^4.7.2",
"three": "^0.128.0",
"vec3": "^0.1.10",
"wav": "^1.0.2",
"yargs": "^17.7.2"
},
"optionalDependencies": {
"naudiodon": "^2.3.6",
"mic": "^2.1.2"
},
"scripts": {
"postinstall": "patch-package",
"start": "node main.js"
@ -40,4 +45,4 @@
"eslint-plugin-no-floating-promise": "^2.0.0",
"globals": "^15.11.0"
}
}
}


@ -1,13 +0,0 @@
diff --git a/node_modules/@google/generative-ai/dist/index.mjs b/node_modules/@google/generative-ai/dist/index.mjs
index 23a175b..aab7e19 100644
--- a/node_modules/@google/generative-ai/dist/index.mjs
+++ b/node_modules/@google/generative-ai/dist/index.mjs
@@ -151,7 +151,7 @@ class GoogleGenerativeAIResponseError extends GoogleGenerativeAIError {
* limitations under the License.
*/
const BASE_URL = "https://generativelanguage.googleapis.com";
-const API_VERSION = "v1";
+const API_VERSION = "v1beta";
/**
* We can't `require` package.json if this runs on web. We will use rollup to
* swap in the version number here at build time.


@ -0,0 +1,14 @@
{
"name": "Andy-4",
"model": "ollama/sweaterdog/andy-4",
"conversing": "You are a playful Minecraft bot named $NAME that can converse with players, see, move, mine, build, and interact with the world by using commands.\n$SELF_PROMPT Act human-like as if you were a typical Minecraft player, rather than an AI. Be very brief in your responses, don't apologize constantly, don't give instructions or make lists unless asked, and don't refuse requests. Think in high amounts before responding. Don't pretend to act, use commands immediately when requested. Do NOT say this: 'Sure, I've stopped.', instead say this: 'Sure, I'll stop. !stop'. Do NOT say this: 'On my way! Give me a moment.', instead say this: 'On my way! !goToPlayer(\"playername\", 3)'. Respond only as $NAME, never output '(FROM OTHER BOT)' or pretend to be someone else. If you have nothing to say or do, respond with an just a tab '\t'. This is extremely important to me, take a deep breath and have fun :)\nSummarized memory:'$MEMORY'\n$STATS\n$INVENTORY\n$COMMAND_DOCS\n$EXAMPLES\nReason before responding. Conversation Begin:",
"coding": "You are an intelligent mineflayer bot $NAME that plays minecraft by writing javascript codeblocks. Given the conversation, use the provided skills and world functions to write a js codeblock that controls the mineflayer bot ``` // using this syntax ```. The code will be executed and you will receive it's output. If an error occurs, write another codeblock and try to fix the problem. Be maximally efficient, creative, and correct. Be mindful of previous actions. Do not use commands !likeThis, only use codeblocks. The code is asynchronous and MUST USE AWAIT for all async function calls, and must contain at least one await. You have `Vec3`, `skills`, and `world` imported, and the mineflayer `bot` is given. Do not import other libraries. Think deeply before responding. Do not use setTimeout or setInterval. Do not speak conversationally, only use codeblocks. Do any planning in comments. This is extremely important to me, think step-by-step, take a deep breath and good luck! \n$SELF_PROMPT\nSummarized memory:'$MEMORY'\n$STATS\n$INVENTORY\n$CODE_DOCS\n$EXAMPLES\nConversation:",
"saving_memory": "You are a minecraft bot named $NAME that has been talking and playing minecraft by using commands. Update your memory by summarizing the following conversation and your old memory in your next response. Prioritize preserving important facts, things you've learned, useful tips, and long term reminders. Do Not record stats, inventory, or docs! Only save transient information from your chat history. You're limited to 500 characters, so be extremely brief, think about what you will summarize before responding, minimize words, and provide your summarization in Chinese. Compress useful information. \nOld Memory: '$MEMORY'\nRecent conversation: \n$TO_SUMMARIZE\nSummarize your old memory and recent conversation into a new memory, and respond only with the unwrapped memory text: ",
"bot_responder": "You are a minecraft bot named $NAME that is currently in conversation with another AI bot. Both of you can take actions with the !command syntax, and actions take time to complete. You are currently busy with the following action: '$ACTION' but have received a new message. Decide whether to 'respond' immediately or 'ignore' it and wait for your current action to finish. Be conservative and only respond when necessary, like when you need to change/stop your action, or convey necessary information. Example 1: You:Building a house! !newAction('Build a house.').\nOther Bot: 'Come here!'\nYour decision: ignore\nExample 2: You:Collecting dirt !collectBlocks('dirt',10).\nOther Bot: 'No, collect some wood instead.'\nYour decision: respond\nExample 3: You:Coming to you now. !goToPlayer('billy',3).\nOther Bot: 'What biome are you in?'\nYour decision: respond\nActual Conversation: $TO_SUMMARIZE\nDecide by outputting ONLY 'respond' or 'ignore', nothing else. Your decision:"
}

profiles/andy-4.json Normal file
View file

@ -0,0 +1,7 @@
{
"name": "andy-4",
"model": "ollama/sweaterdog/andy-4",
"embedding": "ollama"
}

View file

@ -7,4 +7,4 @@
"embedding": "openai"
}
}

View file

@ -18,6 +18,7 @@ const settings = {
// "./profiles/grok.json",
// "./profiles/mistral.json",
// "./profiles/deepseek.json",
// "./profiles/andy-4.json",
// using more than 1 profile requires you to /msg each bot individually
// individual profiles override values from the base profile
@ -26,12 +27,12 @@ const settings = {
"load_memory": false, // load memory from previous session
"init_message": "Respond with hello world and your name", // sends to all on spawn
"only_chat_with": [], // users that the bots listen to and send general messages to. if empty it will chat publicly
"speak": false, // allows all bots to speak through system text-to-speech. works on windows, mac, on linux you need to `apt install espeak`
"language": "en", // translate to/from this language. Supports these language names: https://cloud.google.com/translate/docs/languages
"render_bot_view": false, // show bot's view in browser at localhost:3000, 3001...
"allow_insecure_coding": false, // allows newAction command and model can write/run code on your computer. enable at own risk
"allow_vision": false, // allows vision model to interpret screenshots as inputs
"vision_mode": "off", // "off", "prompted", or "always"
"blocked_actions" : ["!checkBlueprint", "!checkBlueprintLevel", "!getBlueprint", "!getBlueprintLevel"] , // commands to disable and remove from docs. Ex: ["!setMode"]
"code_timeout_mins": -1, // minutes code is allowed to run. -1 for no timeout
"relevant_docs_count": 5, // number of relevant code function docs to select for prompting. -1 for all
@ -42,7 +43,26 @@ const settings = {
"verbose_commands": true, // show full command syntax
"narrate_behavior": true, // chat simple automatic actions ('Picking up item!')
"chat_bot_messages": true, // publicly chat messages to other bots
"log_all_prompts": false, // log ALL prompts to file
"speak": false, // enable text-to-speech
"stt_transcription": false, // enable speech-to-text transcription
"stt_username": "SERVER", // username for STT messages
"stt_agent_name": "", // agent name for STT messages, if empty it will send the STT to all bots
// STT Audio Detection Settings
"stt_rms_threshold": 3000, // Raised from 1000 to reduce false triggers
"stt_silence_duration": 2000, // 2 seconds of silence before stopping
"stt_min_audio_duration": 0.5, // Minimum audio duration in seconds
"stt_max_audio_duration": 45, // Maximum audio duration in seconds
"stt_debug_audio": true, // Enable to see what's happening
"stt_cooldown_ms": 2000, // Minimum time between recordings
"stt_speech_threshold_ratio": 0.05, // Much lower - 5% instead of 15%
"stt_consecutive_speech_samples": 3, // Reduced from 5 to 3
"log_normal_data": false, // Logs all inputs / outputs without reasoning or vision data
"log_reasoning_data": false, // Logs only reasoning inputs / outputs
"log_vision_data": false, // Logs only vision inputs / outputs
}
export default settings;
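The STT detection settings above are consumed by the speech-to-text module. As a rough illustration only (not the project's actual implementation; the helper names and sampling loop are hypothetical), the thresholds might gate recording like this:
import settings from './settings.js';
// Decide whether recent microphone RMS samples look like speech.
function shouldStartRecording(recentRmsSamples) {
    const loud = recentRmsSamples.filter(rms => rms >= settings.stt_rms_threshold);
    return loud.length >= settings.stt_consecutive_speech_samples &&
        loud.length / recentRmsSamples.length >= settings.stt_speech_threshold_ratio;
}
// Decide whether an in-progress recording should stop.
function shouldStopRecording(msSinceLastSpeech, recordedSeconds) {
    return msSinceLastSpeech >= settings.stt_silence_duration ||
        recordedSeconds >= settings.stt_max_audio_duration;
}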

View file

@ -1,3 +1,6 @@
import fs from 'fs';
import path from 'path';
import * as logger from '../../logger.js';
import { History } from './history.js';
import { Coder } from './coder.js';
import { VisionInterpreter } from './vision/vision_interpreter.js';
@ -20,7 +23,22 @@ import { say } from './speak.js';
export class Agent {
async start(load_mem=false, init_message=null, count_id=0) {
this.last_sender = null;
// Safely attach agent instance to a global-like object so STT code can access it.
// Works in both Node.js ESM and CommonJS; if "global" doesn't exist, fall back to "globalThis".
const globalObj = (typeof global !== 'undefined') ? global : globalThis;
try {
globalObj.agent = this;
} catch(e) {
console.warn("Failed attaching agent to global object:", e);
}
this.latestScreenshotPath = null;
this.count_id = count_id;
if (!profile_fp) {
throw new Error('No profile filepath provided');
}
console.log('Starting agent initialization with profile:', profile_fp);
// Initialize components with more detailed error handling
this.actions = new ActionManager(this);
@ -99,6 +117,9 @@ export class Agent {
await new Promise((resolve) => setTimeout(resolve, 10000));
this.checkAllPlayersPresent();
console.log('Initializing vision interpreter...');
this.vision_interpreter = new VisionInterpreter(this, settings.vision_mode);
} catch (error) {
console.error('Error in spawn event:', error);
@ -107,6 +128,81 @@ export class Agent {
});
}
/**
* Formats conversation history into a JSON string suitable for vision model logs.
* This replicates formatting logic that would ideally be centralized in `logger.js`;
* it lives here so vision logs stay consistently formatted regardless of logger internals.
* @param {Array<Object>} conversationHistory - The conversation history array.
* @returns {string} A JSON string representing the formatted history.
*/
formatHistoryForVisionLog(conversationHistory) {
if (!conversationHistory || conversationHistory.length === 0) return '';
const formattedHistory = [];
for (const turn of conversationHistory) {
const formattedTurn = {
role: turn.role || 'user', // Default to 'user' if role is missing
content: []
};
if (typeof turn.content === 'string') {
formattedTurn.content.push({
type: 'text',
text: turn.content
});
} else if (Array.isArray(turn.content)) {
// Process array content to ensure it matches the expected structure
turn.content.forEach(contentItem => {
if (typeof contentItem === 'string') { // Handle case where array contains simple strings
formattedTurn.content.push({ type: 'text', text: contentItem });
} else if (contentItem.type === 'text' && contentItem.text) {
formattedTurn.content.push({ type: 'text', text: contentItem.text });
} else if (contentItem.type === 'image_url' && contentItem.image_url && contentItem.image_url.url) {
// Adapt image_url structure if needed, or keep as is if logger handles it
formattedTurn.content.push({ type: 'image', image: contentItem.image_url.url });
} else if (contentItem.type === 'image' && contentItem.image) {
formattedTurn.content.push({ type: 'image', image: contentItem.image });
}
// Add more specific handlers if other content types are expected
});
} else if (turn.content && typeof turn.content === 'object') {
// Handle simple object content (e.g., { text: '...', image: '...' })
if (turn.content.text) {
formattedTurn.content.push({
type: 'text',
text: turn.content.text
});
}
if (turn.content.image) { // Assuming image is a string path or base64
formattedTurn.content.push({
type: 'image',
image: turn.content.image
});
}
// If there's an image_url object within the content object
if (turn.content.image_url && turn.content.image_url.url) {
formattedTurn.content.push({
type: 'image', // Standardize to 'image' type for logger
image: turn.content.image_url.url
});
}
}
// Ensure content is always an array and not empty if there was original content
if (turn.content && formattedTurn.content.length === 0) {
// If original content existed but wasn't processed, stringify it as a fallback
formattedTurn.content.push({ type: 'text', text: JSON.stringify(turn.content) });
}
formattedHistory.push(formattedTurn);
}
return JSON.stringify(formattedHistory);
}
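// A hypothetical before/after for formatHistoryForVisionLog (illustrative values; output wrapped for readability):
//   input:  [ { role: 'user', content: 'hello' },
//             { role: 'user', content: { text: 'look', image: 'vision_123.jpg' } } ]
//   output: '[{"role":"user","content":[{"type":"text","text":"hello"}]},
//             {"role":"user","content":[{"type":"text","text":"look"},{"type":"image","image":"vision_123.jpg"}]}]'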
async _setupEventHandlers(save_data, init_message) {
const ignore_messages = [
"Set own game mode to",
@ -158,7 +254,8 @@ export class Agent {
if (save_data?.self_prompt) {
if (init_message) {
this.history.add('system', init_message);
// Assuming init_message for self_prompt loading doesn't have an image
await this.history.add('system', init_message, null);
}
await this.self_prompter.handleLoad(save_data.self_prompt, save_data.self_prompting_state);
}
@ -231,7 +328,53 @@ export class Agent {
const self_prompt = source === 'system' || source === this.name;
const from_other_bot = convoManager.isOtherAgent(source);
// This block handles capturing and logging images when vision_mode is 'always'.
// It's processed early for any user message to ensure the visual context is captured
// before the message itself is processed further down.
if (!self_prompt && !from_other_bot) { // from user, check for forced commands
if (settings.vision_mode === 'always' && this.vision_interpreter && this.vision_interpreter.camera) {
try {
const screenshotFilename = await this.vision_interpreter.camera.capture();
// latestScreenshotPath stores the filename (e.g., "vision_timestamp_rand.jpg")
// It will be used by logger.logVision and potentially by history.add if the current message
// needs this image associated with it.
this.latestScreenshotPath = screenshotFilename;
console.log(`[${this.name}] Captured screenshot in always_active mode: ${screenshotFilename}`);
const currentHistory = this.history.getHistory(); // Get current history for the log.
let imageBuffer = null;
if (this.latestScreenshotPath && this.vision_interpreter.fp) { // fp is the base folder path for vision files.
try {
const fullImagePath = path.join(this.vision_interpreter.fp, this.latestScreenshotPath);
imageBuffer = fs.readFileSync(fullImagePath);
} catch (err) {
console.error(`[${this.name}] Error reading image for always active log: ${err.message}`);
}
}
if (imageBuffer) {
// Format the history using the agent's local helper function.
const formattedHistoryString = this.formatHistoryForVisionLog(currentHistory);
// Call logger.logVision with: the raw history, the image buffer, a placeholder
// description for this entry, and the pre-formatted history string. Supplying the
// pre-formatted string makes logVision use it as the 'text' field of the metadata
// log instead of re-formatting the history itself.
logger.logVision(currentHistory, imageBuffer, "Image captured for always active vision", formattedHistoryString);
// Note: this.latestScreenshotPath is NOT consumed (set to null) here.
// This allows the same screenshot to be potentially associated with the user's message
// in the main history log if that message immediately follows this capture.
}
} catch (error) {
console.error(`[${this.name}] Error capturing or logging screenshot in always_active mode:`, error);
}
}
const user_command_name = containsCommand(message);
if (user_command_name) {
if (!commandExists(user_command_name)) {
@ -240,9 +383,18 @@ export class Agent {
}
this.routeResponse(source, `*${source} used ${user_command_name.substring(1)}*`);
if (user_command_name === '!newAction') {
// all user-initiated commands are ignored by the bot except for this one
// add the preceding message to the history to give context for newAction
this.history.add(source, message);
let imagePathForNewActionCmd = null;
// An 'always active' screenshot may have just been taken (this.latestScreenshotPath set),
// but it has already been logged with its own context above. !newAction only needs the
// textual command context, so no extra image is attached to this history entry.
// To associate it anyway: imagePathForNewActionCmd = this.latestScreenshotPath;
await this.history.add(source, message, imagePathForNewActionCmd);
// if (imagePathForNewActionCmd) this.latestScreenshotPath = null; // Consume if used here.
}
let execute_res = await executeCommand(this, message);
if (execute_res)
@ -266,20 +418,34 @@ export class Agent {
if (behavior_log.length > MAX_LOG) {
behavior_log = '...' + behavior_log.substring(behavior_log.length - MAX_LOG);
}
behavior_log = 'Recent behaviors log: \n' + behavior_log;
await this.history.add('system', behavior_log);
behavior_log = 'Recent behaviors log: \n' + behavior_log;
await this.history.add('system', behavior_log, null); // Behavior log unlikely to have an image
}
// Handle other user messages
await this.history.add(source, message);
// Handle other user messages (or initial system messages)
let imagePathForInitialMessage = null;
// If 'always' mode took a screenshot (this.latestScreenshotPath is set) AND this message is from a user,
// associate that screenshot with this message in the history.
if (!self_prompt && !from_other_bot && settings.vision_mode === 'always' && this.latestScreenshotPath) {
imagePathForInitialMessage = this.latestScreenshotPath;
}
await this.history.add(source, message, imagePathForInitialMessage);
if (imagePathForInitialMessage) {
// The screenshot has now been associated with this specific user message in the history.
// We consume it (set to null) so it's not accidentally reused for subsequent unrelated history entries.
// The 'always active' log itself has already been created with this image.
this.latestScreenshotPath = null;
}
this.history.save();
if (!self_prompt && this.self_prompter.isActive()) // message is from user during self-prompting
max_responses = 1; // force only respond to this message, then let self-prompting take over
for (let i=0; i<max_responses; i++) {
if (checkInterrupt()) break;
let history = this.history.getHistory();
let res = await this.prompter.promptConvo(history);
let history_for_prompt = this.history.getHistory(); // get fresh history for each prompt turn
let res = await this.prompter.promptConvo(history_for_prompt);
console.log(`${this.name} full response to ${source}: ""${res}""`);
@ -292,10 +458,12 @@ export class Agent {
if (command_name) { // contains query or command
res = truncCommandMessage(res); // everything after the command is ignored
this.history.add(this.name, res);
// Agent's own message stating the command it will execute
await this.history.add(this.name, res, null);
if (!commandExists(command_name)) {
this.history.add('system', `Command ${command_name} does not exist.`);
// Agent hallucinated a command
await this.history.add('system', `Command ${command_name} does not exist.`, null);
console.warn('Agent hallucinated command:', command_name)
continue;
}
@ -319,13 +487,25 @@ export class Agent {
console.log('Agent executed:', command_name, 'and got:', execute_res);
used_command = true;
if (execute_res)
this.history.add('system', execute_res);
else
if (execute_res) {
let imagePathForCommandResult = null;
// Vision commands might set this.latestScreenshotPath in VisionInterpreter
// (e.g., !lookAtPlayer, !captureFullView).
// If so, associate that image with the command's result in history.
if (command_name && (command_name === '!lookAtPlayer' || command_name === '!lookAtPosition' || command_name === '!captureFullView') && this.latestScreenshotPath) {
imagePathForCommandResult = this.latestScreenshotPath;
}
await this.history.add('system', execute_res, imagePathForCommandResult);
if (imagePathForCommandResult) {
this.latestScreenshotPath = null; // Consume the path
}
}
else { // command execution didn't return anything or failed in a way that implies loop break
break;
}
}
else { // conversation response
this.history.add(this.name, res);
else { // conversation response (no command)
await this.history.add(this.name, res, null); // Agent's text response, no image typically
this.routeResponse(source, res);
break;
}
@ -367,7 +547,7 @@ export class Agent {
}
message = (await handleTranslation(to_translate)).trim() + " " + remaining;
// newlines are interpreted as separate chats, which triggers spam filters. replace them with spaces
message = message.replaceAll('\n', ' ');
message = message.replaceAll('\n', ' ');
if (settings.only_chat_with.length > 0) {
for (let username of settings.only_chat_with) {
@ -473,18 +653,30 @@ export class Agent {
}
cleanKill(msg='Killing agent process...', code=1) {
this.history.add('system', msg);
this.bot.chat(code > 1 ? 'Restarting.': 'Exiting.');
this.history.save();
async cleanKill(msg='Killing agent process...', code=1) {
// Assuming cleanKill messages don't have images
if (this.history) { // Make sure history exists before trying to add to it
await this.history.add('system', msg, null);
this.history.save();
} else {
console.warn("[Agent] History not initialized, cannot save cleanKill message.")
}
if (this.bot) {
this.bot.chat(code > 1 ? 'Restarting.': 'Exiting.');
}
process.exit(code);
}
async checkTaskDone() {
if (this.task.data) {
if (this.task && this.task.data) { // Make sure task and task.data exist
let res = this.task.isDone();
if (res) {
await this.history.add('system', `Task ended with score : ${res.score}`);
await this.history.save();
// Assuming task end messages don't have images
if (this.history) {
await this.history.add('system', `Task ended with score : ${res.score}`, null);
await this.history.save();
} else {
console.warn("[Agent] History not initialized, cannot save task end message.")
}
// await new Promise(resolve => setTimeout(resolve, 3000)); // Wait 3 seconds for save to complete
console.log('Task finished:', res.message);
this.killAll();

View file

@ -428,6 +428,13 @@ export const actionsList = [
}
},
perform: async function(agent, player_name, direction) {
if (agent.vision_interpreter && agent.vision_interpreter.vision_mode === 'off') {
return "Vision commands are disabled as vision mode is 'off'.";
}
// Also check if vision_interpreter or camera is not available if mode is not 'off'
if (agent.vision_interpreter && !agent.vision_interpreter.camera && agent.vision_interpreter.vision_mode !== 'off') {
return "Camera is not available, cannot perform look command.";
}
if (direction !== 'at' && direction !== 'with') {
return "Invalid direction. Use 'at' or 'with'.";
}
@ -448,6 +455,13 @@ export const actionsList = [
'z': { type: 'int', description: 'z coordinate' }
},
perform: async function(agent, x, y, z) {
if (agent.vision_interpreter && agent.vision_interpreter.vision_mode === 'off') {
return "Vision commands are disabled as vision mode is 'off'.";
}
// Also check if vision_interpreter or camera is not available if mode is not 'off'
if (agent.vision_interpreter && !agent.vision_interpreter.camera && agent.vision_interpreter.vision_mode !== 'off') {
return "Camera is not available, cannot perform look command.";
}
let result = "";
const actionFn = async () => {
result = await agent.vision_interpreter.lookAtPosition(x, y, z);

View file

@ -58,7 +58,7 @@ export class History {
}
}
async add(name, content) {
async add(name, content, imagePath = null) {
let role = 'assistant';
if (name === 'system') {
role = 'system';
@ -67,7 +67,7 @@ export class History {
role = 'user';
content = `${name}: ${content}`;
}
this.turns.push({role, content});
this.turns.push({role, content, imagePath});
if (this.turns.length >= this.max_messages) {
let chunk = this.turns.splice(0, this.summary_chunk_size);

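With the extended signature, callers can optionally associate an image with a history entry. A minimal usage sketch (the screenshot filename is hypothetical; add() prefixes user content with the sender's name itself):
// Text-only entry; imagePath defaults to null:
await agent.history.add('system', 'Task started.');
// Entry with an associated screenshot, as the agent does in 'always' vision mode:
await agent.history.add('TheBuilder', 'what do you see?', 'vision_1718875000000_ab12.jpg');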
View file

@ -60,8 +60,8 @@ export class Camera extends EventEmitter {
const buf = await getBufferFromStream(imageStream);
await this._ensureScreenshotDirectory();
await fs.writeFile(`${this.fp}/${filename}.jpg`, buf);
console.log('saved', filename);
return filename;
console.log('saved', filename + '.jpg');
return filename + '.jpg';
}
async _ensureScreenshotDirectory() {

View file

@ -1,21 +1,29 @@
import { Vec3 } from 'vec3';
import { Camera } from "./camera.js";
import fs from 'fs';
import path from 'path';
export class VisionInterpreter {
constructor(agent, allow_vision) {
constructor(agent, vision_mode) {
this.agent = agent;
this.allow_vision = allow_vision;
this.vision_mode = vision_mode;
this.fp = './bots/'+agent.name+'/screenshots/';
if (allow_vision) {
if (this.vision_mode !== 'off') {
this.camera = new Camera(agent.bot, this.fp);
}
}
async lookAtPlayer(player_name, direction) {
if (!this.allow_vision || !this.agent.prompter.vision_model.sendVisionRequest) {
if (this.vision_mode === 'off') {
return "Vision is disabled. Use other methods to describe the environment.";
}
if (!this.camera) {
return "Camera is not initialized. Vision may be set to 'off'.";
}
if (!this.agent.prompter.vision_model.sendVisionRequest && this.vision_mode === 'prompted') {
return "Vision requests are not enabled for the current model. Cannot analyze image.";
}
let result = "";
const bot = this.agent.bot;
const player = bot.players[player_name]?.entity;
@ -26,30 +34,51 @@ export class VisionInterpreter {
let filename;
if (direction === 'with') {
await bot.look(player.yaw, player.pitch);
result = `Looking in the same direction as ${player_name}\n`;
result = `Looking in the same direction as ${player_name}.\n`;
filename = await this.camera.capture();
this.agent.latestScreenshotPath = filename;
} else {
await bot.lookAt(new Vec3(player.position.x, player.position.y + player.height, player.position.z));
result = `Looking at player ${player_name}\n`;
result = `Looking at player ${player_name}.\n`;
filename = await this.camera.capture();
this.agent.latestScreenshotPath = filename;
}
return result + `Image analysis: "${await this.analyzeImage(filename)}"`;
if (this.vision_mode === 'prompted') {
return result + `Image analysis: "${await this.analyzeImage(filename)}"`;
} else if (this.vision_mode === 'always') {
return result + "Screenshot taken and stored.";
}
// Should not be reached if vision_mode is one of the expected values
return "Error: Unknown vision mode.";
}
async lookAtPosition(x, y, z) {
if (!this.allow_vision || !this.agent.prompter.vision_model.sendVisionRequest) {
if (this.vision_mode === 'off') {
return "Vision is disabled. Use other methods to describe the environment.";
}
if (!this.camera) {
return "Camera is not initialized. Vision may be set to 'off'.";
}
if (!this.agent.prompter.vision_model.sendVisionRequest && this.vision_mode === 'prompted') {
return "Vision requests are not enabled for the current model. Cannot analyze image.";
}
let result = "";
const bot = this.agent.bot;
await bot.lookAt(new Vec3(x, y + 2, z));
result = `Looking at coordinate ${x}, ${y}, ${z}\n`;
await bot.lookAt(new Vec3(x, y + 2, z)); // lookAt requires y to be eye level, so +2 from feet
result = `Looking at coordinate ${x}, ${y}, ${z}.\n`;
let filename = await this.camera.capture();
this.agent.latestScreenshotPath = filename;
return result + `Image analysis: "${await this.analyzeImage(filename)}"`;
if (this.vision_mode === 'prompted') {
return result + `Image analysis: "${await this.analyzeImage(filename)}"`;
} else if (this.vision_mode === 'always') {
return result + "Screenshot taken and stored.";
}
// Should not be reached if vision_mode is one of the expected values
return "Error: Unknown vision mode.";
}
getCenterBlockInfo() {
@ -66,7 +95,9 @@ export class VisionInterpreter {
async analyzeImage(filename) {
try {
const imageBuffer = fs.readFileSync(`${this.fp}/${filename}.jpg`);
// filename already includes .jpg from camera.js
const imageFullPath = path.join(this.fp, filename);
const imageBuffer = fs.readFileSync(imageFullPath);
const messages = this.agent.history.getHistory();
const blockInfo = this.getCenterBlockInfo();

View file

@ -1,43 +1,86 @@
import Anthropic from '@anthropic-ai/sdk';
import { strictFormat } from '../utils/text.js';
import { getKey } from '../utils/keys.js';
import { log, logVision } from '../../logger.js';
export class Claude {
constructor(model_name, url, params) {
this.model_name = model_name;
this.params = params || {};
let config = {};
if (url)
config.baseURL = url;
config.apiKey = getKey('ANTHROPIC_API_KEY');
this.anthropic = new Anthropic(config);
this.supportsRawImageInput = true;
}
async sendRequest(turns, systemMessage) {
const messages = strictFormat(turns);
async sendRequest(turns, systemMessage, imageData = null) {
const messages = strictFormat(turns); // Ensure messages are in role/content format
let res = null;
if (imageData) {
const visionModels = ["claude-3-opus-20240229", "claude-3-sonnet-20240229", "claude-3-haiku-20240307"];
if (!visionModels.some(vm => this.model_name.includes(vm))) {
console.warn(`[Claude] Warning: imageData provided for model ${this.model_name}, which is not explicitly a Claude 3 vision model. The image may be ignored or cause an error.`);
}
let lastUserMessageIndex = -1;
for (let i = messages.length - 1; i >= 0; i--) {
if (messages[i].role === 'user') {
lastUserMessageIndex = i;
break;
}
}
if (lastUserMessageIndex !== -1) {
const userMessage = messages[lastUserMessageIndex];
const imagePart = {
type: "image",
source: {
type: "base64",
media_type: "image/jpeg", // Assuming JPEG
data: imageData.toString('base64')
}
};
if (typeof userMessage.content === 'string') {
userMessage.content = [{ type: "text", text: userMessage.content }, imagePart];
} else if (Array.isArray(userMessage.content)) {
// If content is already an array, add the image part.
// This handles cases where a user message might already have multiple parts (e.g. multiple text parts, though less common for this bot).
userMessage.content.push(imagePart);
} else {
// Fallback or error if content is an unexpected type
console.warn('[Claude] Last user message content is not a string or array. Cannot attach image.');
userMessage.content = [imagePart]; // Or create a new message with just the image if appropriate
}
} else {
console.warn('[Claude] imageData provided, but no user message found to attach it to. Image not sent.');
// Optionally, could create a new user message with the image if that's desired behavior.
// messages.push({ role: 'user', content: [imagePart] });
}
}
try {
console.log('Awaiting anthropic api response...')
console.log('Awaiting anthropic api response...');
// console.log('Formatted Messages for API:', JSON.stringify(messages, null, 2));
// console.log('System prompt for API:', systemMessage);
if (!this.params.max_tokens) {
if (this.params.thinking?.budget_tokens) {
this.params.max_tokens = this.params.thinking.budget_tokens + 1000;
// max_tokens must be greater than thinking.budget_tokens
this.params.max_tokens = this.params.thinking.budget_tokens + 1000; // max_tokens must be greater
} else {
this.params.max_tokens = 4096;
}
}
const resp = await this.anthropic.messages.create({
model: this.model_name || "claude-3-sonnet-20240229",
model: this.model_name || "claude-3-sonnet-20240229", // Default to a vision-capable model if none specified
system: systemMessage,
messages: messages,
messages: messages, // messages array is now potentially modified with image data
...(this.params || {})
});
console.log('Received.')
// get first content of type text
const textContent = resp.content.find(content => content.type === 'text');
if (textContent) {
res = textContent.text;
@ -45,8 +88,7 @@ export class Claude {
console.warn('No text content found in the response.');
res = 'No response from Claude.';
}
}
catch (err) {
} catch (err) {
if (err.message.includes("does not support image input")) {
res = "Vision is only supported by certain models.";
} else {
@ -54,30 +96,49 @@ export class Claude {
}
console.log(err);
}
const logMessagesForClaude = [{ role: "system", content: systemMessage }].concat(turns);
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
if (imageData) { // If imageData was part of this sendRequest call
let visionPromptText = ""; // Attempt to find the text prompt associated with the image
if (turns.length > 0) {
const lastTurn = messages[messages.length - 1]; // `messages` is strictFormat(turns)
if (lastTurn.role === 'user' && Array.isArray(lastTurn.content)) {
const textPart = lastTurn.content.find(part => part.type === 'text');
if (textPart) visionPromptText = textPart.text;
} else if (lastTurn.role === 'user' && typeof lastTurn.content === 'string') {
visionPromptText = lastTurn.content;
}
}
logVision(logMessagesForClaude, imageData, res, visionPromptText);
} else {
log(JSON.stringify(logMessagesForClaude), res);
}
return res;
}
async sendVisionRequest(turns, systemMessage, imageBuffer) {
const imageMessages = [...turns];
imageMessages.push({
role: "user",
content: [
{
type: "text",
text: systemMessage
},
{
type: "image",
source: {
type: "base64",
media_type: "image/jpeg",
data: imageBuffer.toString('base64')
}
const visionUserMessageContent = [
{ type: "text", text: systemMessage },
{
type: "image",
source: {
type: "base64",
media_type: "image/jpeg",
data: imageBuffer.toString('base64')
}
]
});
}
];
const turnsForAPIRequest = [...turns, { role: "user", content: visionUserMessageContent }];
return this.sendRequest(imageMessages, systemMessage);
const res = await this.sendRequest(turnsForAPIRequest, systemMessage);
if (imageBuffer && res) {
logVision([{ role: "system", content: systemMessage }].concat(turns), imageBuffer, res, systemMessage);
}
return res;
}
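// Illustrative shape of a user message once an image part has been attached
// (as constructed in sendRequest/sendVisionRequest above; text and base64 data are example values):
// {
//   role: 'user',
//   content: [
//     { type: 'text', text: 'What do you see?' },
//     { type: 'image', source: { type: 'base64', media_type: 'image/jpeg', data: '/9j/4AAQ...' } }
//   ]
// }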
async embed(text) {

View file

@ -1,43 +1,92 @@
import OpenAIApi from 'openai';
import { getKey, hasKey } from '../utils/keys.js';
import { strictFormat } from '../utils/text.js';
import { log, logVision } from '../../logger.js';
export class DeepSeek {
constructor(model_name, url, params) {
this.model_name = model_name;
this.params = params;
let config = {};
config.baseURL = url || 'https://api.deepseek.com';
config.apiKey = getKey('DEEPSEEK_API_KEY');
this.openai = new OpenAIApi(config);
this.supportsRawImageInput = true; // Assuming DeepSeek models used can support this OpenAI-like format
}
async sendRequest(turns, systemMessage, stop_seq='***') {
async sendRequest(turns, systemMessage, imageData = null, stop_seq = '***') {
let messages = [{'role': 'system', 'content': systemMessage}].concat(turns);
messages = strictFormat(messages);
if (imageData) {
console.warn(`[DeepSeek] imageData provided. Ensure the configured DeepSeek model ('${this.model_name || "deepseek-chat"}') is vision-capable.`);
let lastUserMessageIndex = -1;
for (let i = messages.length - 1; i >= 0; i--) {
if (messages[i].role === 'user') {
lastUserMessageIndex = i;
break;
}
}
if (lastUserMessageIndex !== -1) {
const userMessage = messages[lastUserMessageIndex];
const originalContent = userMessage.content; // Should be a string
if (typeof originalContent === 'string') {
userMessage.content = [
{ type: "text", text: originalContent },
{
type: "image_url",
image_url: {
url: `data:image/jpeg;base64,${imageData.toString('base64')}`
}
}
];
} else {
// Content was not a simple string. If it is already an array of parts, append the
// image part; otherwise coerce it into the expected [text, image] structure.
console.warn('[DeepSeek] Last user message content was not a simple string. Attempting to add image, but structure might be unexpected.');
if(Array.isArray(originalContent)) {
originalContent.push({
type: "image_url",
image_url: { url: `data:image/jpeg;base64,${imageData.toString('base64')}` }
});
userMessage.content = originalContent;
} else { // Fallback if it's some other type, just overwrite with new structure
userMessage.content = [
{ type: "text", text: String(originalContent) }, // Attempt to stringify
{
type: "image_url",
image_url: { url: `data:image/jpeg;base64,${imageData.toString('base64')}` }
}
];
}
}
} else {
console.warn('[DeepSeek] imageData provided, but no user message found to attach it to. Image not sent.');
// Or: messages.push({ role: 'user', content: [ { type: "image_url", image_url: { url: ... } } ] });
}
}
const pack = {
model: this.model_name || "deepseek-chat",
messages,
stop: stop_seq,
...(this.params || {})
};
let res = null;
try {
console.log('Awaiting deepseek api response...')
// console.log('Messages:', messages);
let completion = await this.openai.chat.completions.create(pack);
if (completion.choices[0].finish_reason == 'length')
throw new Error('Context length exceeded');
console.log('Received.')
throw new Error('Context length exceeded');
console.log('Received.');
res = completion.choices[0].message.content;
}
catch (err) {
} catch (err) {
if ((err.message == 'Context length exceeded' || err.code == 'context_length_exceeded') && turns.length > 1) {
console.log('Context length exceeded, trying again with shorter context.');
return await this.sendRequest(turns.slice(1), systemMessage, stop_seq);
@ -46,6 +95,27 @@ export class DeepSeek {
res = 'My brain disconnected, try again.';
}
}
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
if (imageData) { // If imageData was part of this sendRequest call
const conversationForLogVision = [{ role: "system", content: systemMessage }].concat(turns);
let visionPromptText = "";
if (turns.length > 0) {
const lastTurn = messages[messages.length - 1]; // `messages` is after image processing
if (lastTurn.role === 'user' && Array.isArray(lastTurn.content)) {
const textPart = lastTurn.content.find(part => part.type === 'text');
if (textPart) visionPromptText = textPart.text;
} else if (lastTurn.role === 'user' && typeof lastTurn.content === 'string') {
// This case might not happen if image is added, as content becomes array
visionPromptText = lastTurn.content;
}
}
logVision(conversationForLogVision, imageData, res, visionPromptText);
} else {
log(JSON.stringify([{ role: "system", content: systemMessage }].concat(turns)), res);
}
return res;
}
@ -53,6 +123,3 @@ export class DeepSeek {
throw new Error('Embeddings are not supported by Deepseek.');
}
}

View file

@ -1,6 +1,7 @@
import { GoogleGenerativeAI } from '@google/generative-ai';
import { toSinglePrompt, strictFormat } from '../utils/text.js';
import { getKey } from '../utils/keys.js';
import { log, logVision } from '../../logger.js';
export class Gemini {
constructor(model_name, url, params) {
@ -8,52 +9,29 @@ export class Gemini {
this.params = params;
this.url = url;
this.safetySettings = [
{
"category": "HARM_CATEGORY_DANGEROUS",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE",
},
{ "category": "HARM_CATEGORY_DANGEROUS", "threshold": "BLOCK_NONE" },
{ "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE" },
{ "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE" },
{ "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE" },
{ "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE" },
];
this.genAI = new GoogleGenerativeAI(getKey('GEMINI_API_KEY'));
this.supportsRawImageInput = true;
}
async sendRequest(turns, systemMessage) {
async sendRequest(turns, systemMessage, imageData = null) {
let model;
const modelConfig = {
model: this.model_name || "gemini-1.5-flash",
// systemInstruction does not work bc google is trash
};
if (this.url) {
model = this.genAI.getGenerativeModel(
modelConfig,
{ baseUrl: this.url },
{ safetySettings: this.safetySettings }
);
model = this.genAI.getGenerativeModel(modelConfig, { baseUrl: this.url }, { safetySettings: this.safetySettings });
} else {
model = this.genAI.getGenerativeModel(
modelConfig,
{ safetySettings: this.safetySettings }
);
model = this.genAI.getGenerativeModel(modelConfig, { safetySettings: this.safetySettings });
}
console.log('Awaiting Google API response...');
const originalTurnsForLog = [{role: 'system', content: systemMessage}, ...turns];
turns.unshift({ role: 'system', content: systemMessage });
turns = strictFormat(turns);
let contents = [];
@ -64,24 +42,32 @@ export class Gemini {
});
}
if (imageData && contents.length > 0) {
const lastContent = contents[contents.length - 1];
if (lastContent.role === 'user') { // Ensure the image is added to a user turn
lastContent.parts.push({
inline_data: {
mime_type: 'image/jpeg',
data: imageData.toString('base64')
}
});
} else {
// imageData should accompany a user turn. If the last content entry is not from a user,
// warn and send the request without the image rather than guessing where to attach it.
console.warn('[Gemini] imageData provided, but the last content entry was not from a user. Image not sent.');
}
}
const result = await model.generateContent({
contents,
generationConfig: {
...(this.params || {})
}
generationConfig: { ...(this.params || {}) }
});
const response = await result.response;
let text;
// Handle "thinking" models since they smart
if (this.model_name && this.model_name.includes("thinking")) {
if (
response.candidates &&
response.candidates.length > 0 &&
response.candidates[0].content &&
response.candidates[0].content.parts &&
response.candidates[0].content.parts.length > 1
) {
if (response.candidates?.length > 0 && response.candidates[0].content?.parts?.length > 1) {
text = response.candidates[0].content.parts[1].text;
} else {
console.warn("Unexpected response structure for thinking model:", response);
@ -90,34 +76,36 @@ export class Gemini {
} else {
text = response.text();
}
console.log('Received.');
if (typeof text === 'string') {
text = text.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
if (imageData) { // If imageData was part of this sendRequest call
let visionPromptText = ""; // Attempt to find the text prompt associated with the image
// `contents` is the array sent to the model
if (contents.length > 0) {
const lastUserTurnParts = contents[contents.length -1].parts;
if (Array.isArray(lastUserTurnParts)) {
const textPart = lastUserTurnParts.find(part => part.text);
if (textPart) visionPromptText = textPart.text;
}
}
logVision(originalTurnsForLog, imageData, text, visionPromptText);
} else {
log(JSON.stringify(originalTurnsForLog), text);
}
return text;
}
async sendVisionRequest(turns, systemMessage, imageBuffer) {
let model;
if (this.url) {
model = this.genAI.getGenerativeModel(
{ model: this.model_name || "gemini-1.5-flash" },
{ baseUrl: this.url },
{ safetySettings: this.safetySettings }
);
model = this.genAI.getGenerativeModel({ model: this.model_name || "gemini-1.5-flash" }, { baseUrl: this.url }, { safetySettings: this.safetySettings });
} else {
model = this.genAI.getGenerativeModel(
{ model: this.model_name || "gemini-1.5-flash" },
{ safetySettings: this.safetySettings }
);
model = this.genAI.getGenerativeModel({ model: this.model_name || "gemini-1.5-flash" }, { safetySettings: this.safetySettings });
}
const imagePart = {
inlineData: {
data: imageBuffer.toString('base64'),
mimeType: 'image/jpeg'
}
};
const imagePart = { inlineData: { data: imageBuffer.toString('base64'), mimeType: 'image/jpeg' } };
const stop_seq = '***';
const prompt = toSinglePrompt(turns, systemMessage, stop_seq, 'model');
let res = null;
@ -127,6 +115,9 @@ export class Gemini {
const response = await result.response;
const text = response.text();
console.log('Received.');
if (imageBuffer && text) {
logVision([{role: 'system', content: systemMessage}, ...turns], imageBuffer, text, prompt);
}
if (!text.includes(stop_seq)) return text;
const idx = text.indexOf(stop_seq);
res = text.slice(0, idx);
@ -137,6 +128,12 @@ export class Gemini {
} else {
res = "An unexpected error occurred, please try again.";
}
const loggedTurnsForError = [{role: 'system', content: systemMessage}, ...turns];
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
// For error cases in vision, still use regular log since there's no image to save
log(JSON.stringify(loggedTurnsForError), res);
}
return res;
}
@ -144,16 +141,10 @@ export class Gemini {
async embed(text) {
let model;
if (this.url) {
model = this.genAI.getGenerativeModel(
{ model: "text-embedding-004" },
{ baseUrl: this.url }
);
model = this.genAI.getGenerativeModel({ model: "text-embedding-004" }, { baseUrl: this.url });
} else {
model = this.genAI.getGenerativeModel(
{ model: "text-embedding-004" }
);
model = this.genAI.getGenerativeModel({ model: "text-embedding-004" });
}
const result = await model.embedContent(text);
return result.embedding.values;
}

View file

@ -1,70 +1,85 @@
import OpenAIApi from 'openai';
import { getKey } from '../utils/keys.js';
export class GLHF {
constructor(model_name, url) {
this.model_name = model_name;
const apiKey = getKey('GHLF_API_KEY');
if (!apiKey) {
throw new Error('API key not found. Please check keys.json and ensure GHLF_API_KEY is defined.');
}
this.openai = new OpenAIApi({
apiKey,
baseURL: url || "https://glhf.chat/api/openai/v1"
});
}
async sendRequest(turns, systemMessage, stop_seq = '***') {
// Construct the message array for the API request.
let messages = [{ role: 'system', content: systemMessage }].concat(turns);
const pack = {
model: this.model_name || "hf:meta-llama/Llama-3.1-405B-Instruct",
messages,
stop: [stop_seq]
};
const maxAttempts = 5;
let attempt = 0;
let finalRes = null;
while (attempt < maxAttempts) {
attempt++;
console.log(`Awaiting glhf.chat API response... (attempt: ${attempt})`);
try {
let completion = await this.openai.chat.completions.create(pack);
if (completion.choices[0].finish_reason === 'length') {
throw new Error('Context length exceeded');
}
let res = completion.choices[0].message.content;
// If there's an open <think> tag without a corresponding </think>, retry.
if (res.includes("<think>") && !res.includes("</think>")) {
console.warn("Partial <think> block detected. Re-generating...");
continue;
}
// If there's a closing </think> tag but no opening <think>, prepend one.
if (res.includes("</think>") && !res.includes("<think>")) {
res = "<think>" + res;
}
finalRes = res.replace(/<\|separator\|>/g, '*no response*');
break; // Valid response obtained.
} catch (err) {
if ((err.message === 'Context length exceeded' || err.code === 'context_length_exceeded') && turns.length > 1) {
console.log('Context length exceeded, trying again with shorter context.');
return await this.sendRequest(turns.slice(1), systemMessage, stop_seq);
} else {
console.error(err);
finalRes = 'My brain disconnected, try again.';
break;
}
}
}
if (finalRes === null) {
finalRes = "I thought too hard, sorry, try again";
}
return finalRes;
}
async embed(text) {
throw new Error('Embeddings are not supported by glhf.');
}
}
import OpenAIApi from 'openai';
import { getKey } from '../utils/keys.js';
import { log, logVision } from '../../logger.js';
export class GLHF {
constructor(model_name, url) {
this.model_name = model_name;
const apiKey = getKey('GHLF_API_KEY');
if (!apiKey) {
throw new Error('API key not found. Please check keys.json and ensure GHLF_API_KEY is defined.');
}
this.openai = new OpenAIApi({
apiKey,
baseURL: url || "https://glhf.chat/api/openai/v1"
});
// Direct image data in sendRequest is not supported by this wrapper.
// Specific vision models/methods should be used if available through the service.
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage, imageData = null, stop_seq = '***') {
if (imageData) {
console.warn(`[GLHF] Warning: imageData provided to sendRequest, but this method in glhf.js does not support direct image data embedding for model ${this.model_name}. The image will be ignored.`);
}
// Construct the message array for the API request.
let messages = [{ role: 'system', content: systemMessage }].concat(turns);
const pack = {
model: this.model_name || "hf:meta-llama/Llama-3.1-405B-Instruct",
messages,
stop: [stop_seq]
};
const maxAttempts = 5;
let attempt = 0;
let finalRes = null;
while (attempt < maxAttempts) {
attempt++;
console.log(`Awaiting glhf.chat API response... (attempt: ${attempt})`);
try {
let completion = await this.openai.chat.completions.create(pack);
if (completion.choices[0].finish_reason === 'length') {
throw new Error('Context length exceeded');
}
let res = completion.choices[0].message.content;
// If there's an open <think> tag without a corresponding </think>, retry.
if (res.includes("<think>") && !res.includes("</think>")) {
console.warn("Partial <think> block detected. Re-generating...");
continue;
}
// If there's a closing </think> tag but no opening <think>, prepend one.
if (res.includes("</think>") && !res.includes("<think>")) {
res = "<think>" + res;
}
finalRes = res.replace(/<\|separator\|>/g, '*no response*');
break; // Valid response obtained.
} catch (err) {
if ((err.message === 'Context length exceeded' || err.code === 'context_length_exceeded') && turns.length > 1) {
console.log('Context length exceeded, trying again with shorter context.');
// Pass imageData along in recursive call, though it will be ignored again
return await this.sendRequest(turns.slice(1), systemMessage, imageData, stop_seq);
} else {
console.error(err);
finalRes = 'My brain disconnected, try again.';
break;
}
}
}
if (finalRes === null) {
finalRes = "I thought too hard, sorry, try again";
}
if (typeof finalRes === 'string') {
finalRes = finalRes.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify([{ role: 'system', content: systemMessage }].concat(turns)), finalRes);
return finalRes;
}
async embed(text) {
throw new Error('Embeddings are not supported by glhf.');
}
}

View file

@ -1,27 +1,58 @@
import OpenAIApi from 'openai';
import { getKey, hasKey } from '../utils/keys.js';
import { strictFormat } from '../utils/text.js';
import { log, logVision } from '../../logger.js';
export class GPT {
constructor(model_name, url, params) {
this.model_name = model_name;
this.params = params;
let config = {};
if (url)
config.baseURL = url;
if (hasKey('OPENAI_ORG_ID'))
config.organization = getKey('OPENAI_ORG_ID');
config.apiKey = getKey('OPENAI_API_KEY');
this.openai = new OpenAIApi(config);
this.supportsRawImageInput = true;
}
async sendRequest(turns, systemMessage, stop_seq='***') {
async sendRequest(turns, systemMessage, imageData = null, stop_seq = '***') {
let messages = [{'role': 'system', 'content': systemMessage}].concat(turns);
messages = strictFormat(messages);
if (imageData) {
const visionModels = ["gpt-4-vision-preview", "gpt-4o", "gpt-4-turbo"];
if (!visionModels.some(vm => this.model_name.includes(vm))) {
console.warn(`[GPT] Warning: imageData provided for model ${this.model_name}, which is not explicitly a vision model. The image may be ignored or cause an error.`);
}
let lastUserMessageIndex = -1;
for (let i = messages.length - 1; i >= 0; i--) {
if (messages[i].role === 'user') {
lastUserMessageIndex = i;
break;
}
}
if (lastUserMessageIndex !== -1) {
const originalContent = messages[lastUserMessageIndex].content;
messages[lastUserMessageIndex].content = [
{ type: "text", text: originalContent },
{
type: "image_url",
image_url: {
url: `data:image/jpeg;base64,${imageData.toString('base64')}`
}
}
];
} else {
// No user message to attach image to, log warning or prepend a new one?
// For now, log a warning. Prompter should ensure user message exists if imagePath is set.
console.warn('[GPT] imageData provided, but no user message found to attach it to. Image not sent.');
}
}
const pack = {
model: this.model_name || "gpt-3.5-turbo",
messages,
@ -31,19 +62,17 @@ export class GPT {
if (this.model_name.includes('o1')) {
delete pack.stop;
}
let res = null;
try {
console.log('Awaiting openai api response from model', this.model_name)
// console.log('Messages:', messages);
console.log('Awaiting openai api response from model', this.model_name);
let completion = await this.openai.chat.completions.create(pack);
if (completion.choices[0].finish_reason == 'length')
throw new Error('Context length exceeded');
console.log('Received.')
throw new Error('Context length exceeded');
console.log('Received.');
res = completion.choices[0].message.content;
}
catch (err) {
} catch (err) {
if ((err.message == 'Context length exceeded' || err.code == 'context_length_exceeded') && turns.length > 1) {
console.log('Context length exceeded, trying again with shorter context.');
return await this.sendRequest(turns.slice(1), systemMessage, stop_seq);
@ -55,25 +84,51 @@ export class GPT {
res = 'My brain disconnected, try again.';
}
}
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
if (imageData) {
const conversationForLogVision = [{ role: "system", content: systemMessage }].concat(turns);
let visionPromptText = "";
if (turns.length > 0) {
const lastTurn = turns[turns.length - 1];
if (lastTurn.role === 'user') {
if (typeof lastTurn.content === 'string') {
visionPromptText = lastTurn.content;
} else if (Array.isArray(lastTurn.content)) {
const textPart = lastTurn.content.find(part => part.type === 'text');
if (textPart) visionPromptText = textPart.text;
}
}
}
logVision(conversationForLogVision, imageData, res, visionPromptText);
} else {
log(JSON.stringify([{ role: "system", content: systemMessage }].concat(turns)), res);
}
return res;
}
async sendVisionRequest(messages, systemMessage, imageBuffer) {
const imageMessages = [...messages];
imageMessages.push({
async sendVisionRequest(original_turns, systemMessage, imageBuffer) {
const imageFormattedTurns = [...original_turns];
imageFormattedTurns.push({
role: "user",
content: [
{ type: "text", text: systemMessage },
{
type: "image_url",
image_url: {
url: `data:image/jpeg;base64,${imageBuffer.toString('base64')}`
}
image_url: { url: `data:image/jpeg;base64,${imageBuffer.toString('base64')}` }
}
]
});
return this.sendRequest(imageMessages, systemMessage);
const res = await this.sendRequest(imageFormattedTurns, systemMessage);
if (imageBuffer && res) {
// The conversationHistory for logVision should be the state *before* this specific vision interaction's prompt was added.
logVision([{ role: "system", content: systemMessage }].concat(original_turns), imageBuffer, res, systemMessage);
}
return res;
}
async embed(text) {
@ -86,8 +141,4 @@ export class GPT {
});
return embedding.data[0].embedding;
}
}

View file

@ -1,5 +1,6 @@
import OpenAIApi from 'openai';
import { getKey } from '../utils/keys.js';
import { log, logVision } from '../../logger.js';
// xAI doesn't supply a SDK for their models, but fully supports OpenAI and Anthropic SDKs
export class Grok {
@ -7,42 +8,41 @@ export class Grok {
this.model_name = model_name;
this.url = url;
this.params = params;
let config = {};
if (url)
config.baseURL = url;
else
config.baseURL = "https://api.x.ai/v1"
config.apiKey = getKey('XAI_API_KEY');
this.openai = new OpenAIApi(config);
// Direct image data in sendRequest is not supported by this wrapper for standard chat.
// Grok may have specific vision capabilities, but this method assumes text-only.
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage, stop_seq='***') {
async sendRequest(turns, systemMessage, imageData = null, stop_seq='***') {
if (imageData) {
console.warn(`[Grok] Warning: imageData provided to sendRequest, but this method in grok.js does not support direct image data embedding for model ${this.model_name}. The image will be ignored.`);
}
let messages = [{'role': 'system', 'content': systemMessage}].concat(turns);
const pack = {
model: this.model_name || "grok-beta",
messages,
stop: [stop_seq],
...(this.params || {})
};
let res = null;
try {
console.log('Awaiting xai api response...')
///console.log('Messages:', messages);
let completion = await this.openai.chat.completions.create(pack);
if (completion.choices[0].finish_reason == 'length')
throw new Error('Context length exceeded');
console.log('Received.')
res = completion.choices[0].message.content;
}
catch (err) {
} catch (err) {
if ((err.message == 'Context length exceeded' || err.code == 'context_length_exceeded') && turns.length > 1) {
console.log('Context length exceeded, trying again with shorter context.');
return await this.sendRequest(turns.slice(1), systemMessage, stop_seq);
return await this.sendRequest(turns.slice(1), systemMessage, imageData, stop_seq);
} else if (err.message.includes('The model expects a single `text` element per message.')) {
console.log(err);
res = 'Vision is only supported by certain models.';
@ -52,31 +52,36 @@ export class Grok {
}
}
// sometimes outputs special token <|separator|>, just replace it
return res.replace(/<\|separator\|>/g, '*no response*');
let finalResponseText = res ? res.replace(/<\|separator\|>/g, '*no response*') : (res === null ? "*no response*" : res);
if (typeof finalResponseText === 'string') {
finalResponseText = finalResponseText.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify([{ role: "system", content: systemMessage }].concat(turns)), finalResponseText);
return finalResponseText;
}
async sendVisionRequest(messages, systemMessage, imageBuffer) {
const imageMessages = [...messages];
imageMessages.push({
async sendVisionRequest(original_turns, systemMessage, imageBuffer) {
const imageFormattedTurns = [...original_turns];
imageFormattedTurns.push({
role: "user",
content: [
{ type: "text", text: systemMessage },
{
type: "image_url",
image_url: {
url: `data:image/jpeg;base64,${imageBuffer.toString('base64')}`
}
image_url: { url: `data:image/jpeg;base64,${imageBuffer.toString('base64')}` }
}
]
});
return this.sendRequest(imageMessages, systemMessage);
const res = await this.sendRequest(imageFormattedTurns, systemMessage);
if (imageBuffer && res) {
logVision([{ role: "system", content: systemMessage }].concat(original_turns), imageBuffer, res, systemMessage);
}
return res;
}
async embed(text) {
throw new Error('Embeddings are not supported by Grok.');
}
}

View file

@ -1,14 +1,14 @@
import Groq from 'groq-sdk'
import fs from "fs";
import { getKey } from '../utils/keys.js';
import { log, logVision } from '../../logger.js';
// THIS API IS NOT TO BE CONFUSED WITH GROK!
// Go to grok.js for that. :)
// Umbrella class for everything under the sun... That GroqCloud provides, that is.
export class GroqCloudAPI {
constructor(model_name, url, params) {
this.model_name = model_name;
this.url = url;
this.params = params || {};
@ -18,21 +18,23 @@ export class GroqCloudAPI {
delete this.params.tools;
// This is just a bit of future-proofing in case we drag Mindcraft in that direction.
// I'm going to do a sneaky ReplicateAPI theft for a lot of this, aren't I?
if (this.url)
console.warn("Groq Cloud has no implementation for custom URLs. Ignoring provided URL.");
this.groq = new Groq({ apiKey: getKey('GROQCLOUD_API_KEY') });
// Direct image data in sendRequest is not supported by this wrapper.
// Groq may offer specific vision models/APIs, but this standard chat method assumes text.
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage, stop_seq = null) {
async sendRequest(turns, systemMessage, imageData = null, stop_seq = null) {
if (imageData) {
console.warn(`[Groq] Warning: imageData provided to sendRequest, but this method in groq.js does not support direct image data embedding for model ${this.model_name}. The image will be ignored.`);
}
// Construct messages array
let messages = [{"role": "system", "content": systemMessage}].concat(turns);
let res = null;
try {
console.log("Awaiting Groq response...");
@ -42,7 +44,6 @@ export class GroqCloudAPI {
this.params.max_completion_tokens = this.params.max_tokens;
delete this.params.max_tokens;
}
if (!this.params.max_completion_tokens) {
this.params.max_completion_tokens = 4000;
}
@ -55,11 +56,15 @@ export class GroqCloudAPI {
...(this.params || {})
});
res = completion.choices[0].message.content;
res = res.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
catch(err) {
let responseText = completion.choices[0].message.content;
if (typeof responseText === 'string') {
responseText = responseText.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify([{ role: "system", content: systemMessage }].concat(turns)), responseText);
// Original cleaning of <think> tags for the *returned* response (not affecting log)
res = responseText.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
return res;
} catch(err) {
if (err.message.includes("content must be a string")) {
res = "Vision is only supported by certain models.";
} else {
@ -67,29 +72,54 @@ export class GroqCloudAPI {
res = "My brain disconnected, try again.";
}
console.log(err);
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify([{ role: "system", content: systemMessage }].concat(turns)), res);
return res;
}
return res;
}
async sendVisionRequest(messages, systemMessage, imageBuffer) {
const imageMessages = messages.filter(message => message.role !== 'system');
async sendVisionRequest(original_turns, systemMessage, imageBuffer) {
const imageMessages = [...original_turns];
imageMessages.push({
role: "user",
content: [
{ type: "text", text: systemMessage },
{
type: "image_url",
image_url: {
url: `data:image/jpeg;base64,${imageBuffer.toString('base64')}`
}
image_url: { url: `data:image/jpeg;base64,${imageBuffer.toString('base64')}` }
}
]
});
return this.sendRequest(imageMessages);
const res = await this.sendRequest(imageMessages, systemMessage);
if (imageBuffer && res) {
logVision([{ role: "system", content: systemMessage }].concat(original_turns), imageBuffer, res, systemMessage);
}
return res;
}
async embed(_) {
throw new Error('Embeddings are not supported by Groq.');
}
}
export class GroqCloudTTS {
constructor() {
this.groq = new Groq({ apiKey: getKey('GROQCLOUD_API_KEY') });
}
async transcribe(filePath, options = {}) {
const transcription = await this.groq.audio.transcriptions.create({
file: fs.createReadStream(filePath),
model: options.model || "distil-whisper-large-v3-en", // or "whisper-large-v3-turbo"
prompt: options.prompt || "",
response_format: options.response_format || "json",
language: options.language || "en",
temperature: options.temperature !== undefined ? options.temperature : 0.0,
});
return transcription.text;
}
}
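GroqCloudTTS is the transcription helper that the new STT loop (src/process/tts_process.js, further below) relies on. A minimal usage sketch, assuming a valid GROQCLOUD_API_KEY in keys.json and an existing 16 kHz mono WAV file (the file path is hypothetical):

import { GroqCloudTTS } from './src/models/groq.js'; // path relative to the repo root (assumption)

const stt = new GroqCloudTTS();
const text = await stt.transcribe('./speech_sample.wav', { // hypothetical recording
    model: 'distil-whisper-large-v3-en',
    language: 'en',
    temperature: 0.0
});
console.log('Transcribed:', text);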

View file

@ -1,31 +1,31 @@
import { toSinglePrompt } from '../utils/text.js';
import { getKey } from '../utils/keys.js';
import { HfInference } from "@huggingface/inference";
import { log, logVision } from '../../logger.js';
export class HuggingFace {
constructor(model_name, url, params) {
// Remove 'huggingface/' prefix if present
this.model_name = model_name.replace('huggingface/', '');
this.url = url;
this.params = params;
if (this.url) {
console.warn("Hugging Face doesn't support custom urls!");
}
this.huggingface = new HfInference(getKey('HUGGINGFACE_API_KEY'));
// Direct image data in sendRequest is not supported by this wrapper.
// HuggingFace Inference API has other methods for vision tasks.
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage) {
async sendRequest(turns, systemMessage, imageData = null) {
if (imageData) {
console.warn(`[HuggingFace] Warning: imageData provided to sendRequest, but this method in huggingface.js does not support direct image data embedding for model ${this.model_name}. The image will be ignored.`);
}
const stop_seq = '***';
// Build a single prompt from the conversation turns
const prompt = toSinglePrompt(turns, null, stop_seq);
// Fallback model if none was provided
const model_name = this.model_name || 'meta-llama/Meta-Llama-3-8B';
// Combine system message with the prompt
const input = systemMessage + "\n" + prompt;
// We'll try up to 5 times in case of partial <think> blocks for DeepSeek-R1 models.
const logInputMessages = [{role: 'system', content: systemMessage}, ...turns];
const input = systemMessage + "\n" + prompt;
const maxAttempts = 5;
let attempt = 0;
let finalRes = null;
@ -35,7 +35,6 @@ export class HuggingFace {
console.log(`Awaiting Hugging Face API response... (model: ${model_name}, attempt: ${attempt})`);
let res = '';
try {
// Consume the streaming response chunk by chunk
for await (const chunk of this.huggingface.chatCompletionStream({
model: model_name,
messages: [{ role: "user", content: input }],
@ -46,36 +45,32 @@ export class HuggingFace {
} catch (err) {
console.log(err);
res = 'My brain disconnected, try again.';
// Break out immediately; we only retry when handling partial <think> tags.
break;
}
// If the model is DeepSeek-R1, check for mismatched <think> blocks.
const hasOpenTag = res.includes("<think>");
const hasCloseTag = res.includes("</think>");
// If there's a partial mismatch, warn and retry the entire request.
if ((hasOpenTag && !hasCloseTag)) {
console.warn("Partial <think> block detected. Re-generating...");
continue;
}
// If both tags are present, remove the <think> block entirely.
if (hasOpenTag && hasCloseTag) {
res = res.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
const hasOpenTag = res.includes("<think>");
const hasCloseTag = res.includes("</think>");
if ((hasOpenTag && !hasCloseTag)) {
console.warn("Partial <think> block detected. Re-generating...");
if (attempt < maxAttempts) continue;
}
if (hasOpenTag && hasCloseTag) {
res = res.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
finalRes = res;
break; // Exit loop if we got a valid response.
break;
}
// If no valid response was obtained after max attempts, assign a fallback.
if (finalRes == null) {
console.warn("Could not get a valid <think> block or normal response after max attempts.");
console.warn("Could not get a valid response after max attempts.");
finalRes = 'I thought too hard, sorry, try again.';
}
console.log('Received.');
console.log(finalRes);
if (typeof finalRes === 'string') {
finalRes = finalRes.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify(logInputMessages), finalRes);
return finalRes;
}
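The loop above retries whenever a response contains an opening <think> without a matching </think>, and strips completed blocks before returning. The same check, pulled out as a small helper purely for illustration (a sketch, not code from this commit):

// Returns { retry: true } for a dangling <think>; otherwise the cleaned text.
function cleanThinkBlocks(res) {
    const hasOpen = res.includes('<think>');
    const hasClose = res.includes('</think>');
    if (hasOpen && !hasClose) return { retry: true, text: res };
    if (hasOpen && hasClose) res = res.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
    return { retry: false, text: res };
}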

View file

@ -1,113 +1,126 @@
import { getKey } from '../utils/keys.js';
export class Hyperbolic {
constructor(modelName, apiUrl) {
this.modelName = modelName || "deepseek-ai/DeepSeek-V3";
this.apiUrl = apiUrl || "https://api.hyperbolic.xyz/v1/chat/completions";
// Retrieve the Hyperbolic API key from keys.js
this.apiKey = getKey('HYPERBOLIC_API_KEY');
if (!this.apiKey) {
throw new Error('HYPERBOLIC_API_KEY not found. Check your keys.js file.');
}
}
/**
* Sends a chat completion request to the Hyperbolic endpoint.
*
* @param {Array} turns - An array of message objects, e.g. [{role: 'user', content: 'Hi'}].
* @param {string} systemMessage - The system prompt or instruction.
* @param {string} stopSeq - A stopping sequence, default '***'.
* @returns {Promise<string>} - The model's reply.
*/
async sendRequest(turns, systemMessage, stopSeq = '***') {
// Prepare the messages with a system prompt at the beginning
const messages = [{ role: 'system', content: systemMessage }, ...turns];
// Build the request payload
const payload = {
model: this.modelName,
messages: messages,
max_tokens: 8192,
temperature: 0.7,
top_p: 0.9,
stream: false
};
const maxAttempts = 5;
let attempt = 0;
let finalRes = null;
while (attempt < maxAttempts) {
attempt++;
console.log(`Awaiting Hyperbolic API response... (attempt: ${attempt})`);
console.log('Messages:', messages);
let completionContent = null;
try {
const response = await fetch(this.apiUrl, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${this.apiKey}`
},
body: JSON.stringify(payload)
});
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const data = await response.json();
if (data?.choices?.[0]?.finish_reason === 'length') {
throw new Error('Context length exceeded');
}
completionContent = data?.choices?.[0]?.message?.content || '';
console.log('Received response from Hyperbolic.');
} catch (err) {
if (
(err.message === 'Context length exceeded' || err.code === 'context_length_exceeded') &&
turns.length > 1
) {
console.log('Context length exceeded, trying again with a shorter context...');
return await this.sendRequest(turns.slice(1), systemMessage, stopSeq);
} else {
console.error(err);
completionContent = 'My brain disconnected, try again.';
}
}
// Check for <think> blocks
const hasOpenTag = completionContent.includes("<think>");
const hasCloseTag = completionContent.includes("</think>");
if ((hasOpenTag && !hasCloseTag)) {
console.warn("Partial <think> block detected. Re-generating...");
continue; // Retry the request
}
if (hasCloseTag && !hasOpenTag) {
completionContent = '<think>' + completionContent;
}
if (hasOpenTag && hasCloseTag) {
completionContent = completionContent.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
finalRes = completionContent.replace(/<\|separator\|>/g, '*no response*');
break; // Valid response obtained—exit loop
}
if (finalRes == null) {
console.warn("Could not get a valid <think> block or normal response after max attempts.");
finalRes = 'I thought too hard, sorry, try again.';
}
return finalRes;
}
async embed(text) {
throw new Error('Embeddings are not supported by Hyperbolic.');
}
}
import { getKey } from '../utils/keys.js';
import { log, logVision } from '../../logger.js';
export class Hyperbolic {
constructor(modelName, apiUrl) {
this.modelName = modelName || "deepseek-ai/DeepSeek-V3";
this.apiUrl = apiUrl || "https://api.hyperbolic.xyz/v1/chat/completions";
this.apiKey = getKey('HYPERBOLIC_API_KEY');
if (!this.apiKey) {
throw new Error('HYPERBOLIC_API_KEY not found. Check your keys.js file.');
}
// Direct image data in sendRequest is not supported by this wrapper.
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage, imageData = null, stopSeq = '***') {
if (imageData) {
console.warn(`[Hyperbolic] Warning: imageData provided to sendRequest, but this method in hyperbolic.js does not support direct image data embedding for model ${this.modelName}. The image will be ignored.`);
}
const messages = [{ role: 'system', content: systemMessage }, ...turns];
const payload = {
model: this.modelName,
messages: messages,
max_tokens: 8192,
temperature: 0.7,
top_p: 0.9,
stream: false
// stop: stopSeq, // Hyperbolic API might not support stop sequences in the same way or at all.
// If it does, it might need to be formatted differently or might not be part of standard payload.
// For now, commenting out if it causes issues or is not standard.
};
if (stopSeq && stopSeq !== '***') { // Only add stop if it's meaningful and not the default placeholder
payload.stop = stopSeq;
}
const maxAttempts = 5;
let attempt = 0;
let finalRes = null;
let rawCompletionContent = null;
while (attempt < maxAttempts) {
attempt++;
console.log(`Awaiting Hyperbolic API response... (attempt: ${attempt})`);
try {
const response = await fetch(this.apiUrl, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${this.apiKey}`
},
body: JSON.stringify(payload)
});
if (!response.ok) {
// Attempt to read error body for more details
let errorBody = "No additional error details.";
try {
errorBody = await response.text();
} catch (e) { /* ignore if error body can't be read */ }
throw new Error(`HTTP error! status: ${response.status}, message: ${errorBody}`);
}
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const data = await response.json();
if (data?.choices?.[0]?.finish_reason === 'length') {
throw new Error('Context length exceeded');
}
rawCompletionContent = data?.choices?.[0]?.message?.content || '';
console.log('Received response from Hyperbolic.');
} catch (err) {
if ((err.message === 'Context length exceeded' || err.code === 'context_length_exceeded') && turns.length > 1) {
console.log('Context length exceeded, trying again with a shorter context...');
return await this.sendRequest(turns.slice(1), systemMessage, imageData, stopSeq);
} else {
console.error(err);
rawCompletionContent = 'My brain disconnected, try again.';
finalRes = rawCompletionContent;
break;
}
}
let processedContent = rawCompletionContent;
const hasOpenTag = processedContent.includes("<think>");
const hasCloseTag = processedContent.includes("</think>");
if ((hasOpenTag && !hasCloseTag)) {
console.warn("Partial <think> block detected. Re-generating...");
if (attempt < maxAttempts) continue;
}
if (hasCloseTag && !hasOpenTag) {
processedContent = '<think>' + processedContent;
}
if (hasOpenTag && hasCloseTag) {
processedContent = processedContent.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
finalRes = processedContent.replace(/<\|separator\|>/g, '*no response*');
if (!(hasOpenTag && !hasCloseTag && attempt < maxAttempts)) {
break;
}
}
if (finalRes == null) {
finalRes = rawCompletionContent || 'I thought too hard, sorry, try again.';
finalRes = finalRes.replace(/<\|separator\|>/g, '*no response*');
}
if (typeof finalRes === 'string') {
finalRes = finalRes.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify([{ role: 'system', content: systemMessage }].concat(turns)), finalRes);
return finalRes;
}
async embed(text) {
throw new Error('Embeddings are not supported by Hyperbolic.');
}
}
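Calling the updated wrapper is unchanged from the other providers; the optional imageData slot exists only to keep the sendRequest signature uniform and is ignored here. A minimal call sketch (assumes HYPERBOLIC_API_KEY is present in keys.json):

const hyperbolic = new Hyperbolic('deepseek-ai/DeepSeek-V3');
const reply = await hyperbolic.sendRequest(
    [{ role: 'user', content: 'Say hi in one word.' }],  // turns
    'You are a helpful Minecraft assistant.',             // system prompt
    null                                                   // imageData: ignored by this wrapper
);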

View file

@ -1,4 +1,5 @@
import { strictFormat } from '../utils/text.js';
import { log, logVision } from '../../logger.js';
export class Local {
constructor(model_name, url, params) {
@ -7,14 +8,37 @@ export class Local {
this.url = url || 'http://127.0.0.1:11434';
this.chat_endpoint = '/api/chat';
this.embedding_endpoint = '/api/embeddings';
// Note: Actual multimodal support depends on the specific Ollama model (e.g., LLaVA, BakLLaVA)
this.supportsRawImageInput = true;
}
async sendRequest(turns, systemMessage) {
let model = this.model_name || 'llama3.1'; // Updated to llama3.1, as it is more performant than llama3
async sendRequest(turns, systemMessage, imageData = null) {
let model = this.model_name || 'sweaterdog/andy-4:latest'; // Changed to Andy-4
let messages = strictFormat(turns);
messages.unshift({ role: 'system', content: systemMessage });
if (imageData) {
console.warn(`[Ollama] imageData provided. Ensure the configured Ollama model ('${model}') is multimodal (e.g., llava, bakllava) to process images.`);
let lastUserMessageIndex = -1;
for (let i = messages.length - 1; i >= 0; i--) {
if (messages[i].role === 'user') {
lastUserMessageIndex = i;
break;
}
}
if (lastUserMessageIndex !== -1) {
if (!messages[lastUserMessageIndex].images) {
messages[lastUserMessageIndex].images = [];
}
messages[lastUserMessageIndex].images.push(imageData.toString('base64'));
} else {
console.warn('[Ollama] imageData provided, but no user message found to attach it to. Image not sent.');
// Or, could create a new user message:
// messages.push({ role: 'user', content: "Image attached.", images: [imageData.toString('base64')] });
}
}
// We'll attempt up to 5 times for models with DeepSeek-R1-style reasoning if the <think> tags are mismatched.
const maxAttempts = 5;
let attempt = 0;
let finalRes = null;
@ -24,14 +48,14 @@ export class Local {
console.log(`Awaiting local response... (model: ${model}, attempt: ${attempt})`);
let res = null;
try {
res = await this.send(this.chat_endpoint, {
let apiResponse = await this.send(this.chat_endpoint, {
model: model,
messages: messages,
stream: false,
...(this.params || {})
});
if (res) {
res = res['message']['content'];
if (apiResponse) {
res = apiResponse['message']['content'];
} else {
res = 'No response data.';
}
@ -43,38 +67,48 @@ export class Local {
console.log(err);
res = 'My brain disconnected, try again.';
}
}
// If the model name includes "deepseek-r1" or "Andy-3.5-reasoning", then handle the <think> block.
const hasOpenTag = res.includes("<think>");
const hasCloseTag = res.includes("</think>");
// If there's a partial mismatch, retry to get a complete response.
if ((hasOpenTag && !hasCloseTag)) {
console.warn("Partial <think> block detected. Re-generating...");
continue;
}
// If </think> is present but <think> is not, prepend <think>
if (hasCloseTag && !hasOpenTag) {
res = '<think>' + res;
}
// Changed this so if the model reasons using <think> and </think> but doesn't start the message with <think>, <think> gets prepended to the message so no errors occur.
// If both tags appear, remove them (and everything inside).
if (hasOpenTag && hasCloseTag) {
res = res.replace(/<think>[\s\S]*?<\/think>/g, '');
}
const hasOpenTag = res.includes("<think>");
const hasCloseTag = res.includes("</think>");
if ((hasOpenTag && !hasCloseTag)) {
console.warn("Partial <think> block detected. Re-generating...");
if (attempt < maxAttempts) continue;
}
if (hasCloseTag && !hasOpenTag) {
res = '<think>' + res;
}
if (hasOpenTag && hasCloseTag) {
res = res.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
finalRes = res;
break; // Exit the loop if we got a valid response.
break;
}
if (finalRes == null) {
console.warn("Could not get a valid <think> block or normal response after max attempts.");
console.warn("Could not get a valid response after max attempts.");
finalRes = 'I thought too hard, sorry, try again.';
}
if (typeof finalRes === 'string') {
finalRes = finalRes.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
if (imageData) { // If imageData was part of this sendRequest call
// `messages` here already includes the system prompt and image data
let visionPromptText = "";
if (messages.length > 0) {
const lastTurn = messages[messages.length -1];
// For Ollama, content is a string, images is a separate array.
if (lastTurn.role === 'user' && typeof lastTurn.content === 'string') {
visionPromptText = lastTurn.content;
}
}
logVision(messages, imageData, finalRes, visionPromptText);
} else {
// messages already includes system prompt if no imageData
log(JSON.stringify(messages), finalRes);
}
return finalRes;
}
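For reference, when imageData is attached the body POSTed to Ollama's /api/chat endpoint ends up shaped roughly like the sketch below; whether the image is actually used depends on the configured model being multimodal (e.g. a LLaVA variant). Field names mirror the code above and the example values are illustrative:

// Sketch of the /api/chat request body built above (base64 abbreviated).
const requestBody = {
    model: 'sweaterdog/andy-4:latest',
    stream: false,
    messages: [
        { role: 'system', content: '<system prompt>' },
        {
            role: 'user',
            content: 'What block am I looking at?',
            images: ['<base64-encoded JPEG>']   // attached to the last user turn
        }
    ]
};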

View file

@ -1,19 +1,17 @@
import { Mistral as MistralClient } from '@mistralai/mistralai';
import { getKey } from '../utils/keys.js';
import { strictFormat } from '../utils/text.js';
import { log, logVision } from '../../logger.js';
export class Mistral {
#client;
constructor(model_name, url, params) {
this.model_name = model_name;
this.params = params;
if (typeof url === "string") {
console.warn("Mistral does not support custom URL's, ignoring!");
}
if (!getKey("MISTRAL_API_KEY")) {
throw new Error("Mistral API Key missing, make sure to set MISTRAL_API_KEY in settings.json")
}
@ -23,37 +21,31 @@ export class Mistral {
apiKey: getKey("MISTRAL_API_KEY")
}
);
this.supportsRawImageInput = false; // Standard chat completions may not support raw images for all models.
// Prevents the following code from running when model not specified
if (typeof this.model_name === "undefined") return;
// get the model name without the "mistral" or "mistralai" prefix
// e.g "mistral/mistral-large-latest" -> "mistral-large-latest"
if (typeof model_name.split("/")[1] !== "undefined") {
this.model_name = model_name.split("/")[1];
if (typeof this.model_name === "string" && typeof this.model_name.split("/")[1] !== "undefined") {
this.model_name = this.model_name.split("/")[1];
}
}
async sendRequest(turns, systemMessage) {
async sendRequest(turns, systemMessage, imageData = null) {
if (imageData) {
console.warn(`[Mistral] Warning: imageData provided to sendRequest, but this method in mistral.js currently does not support direct image data embedding for model ${this.model_name}. The image will be ignored. Use sendVisionRequest for models/endpoints that support vision, or ensure the API/model used by sendRequest can handle images in its standard chat format.`);
// imageData is ignored for now.
}
let result;
const model = this.model_name || "mistral-large-latest";
const messages = [{ role: "system", content: systemMessage }];
messages.push(...strictFormat(turns));
try {
const model = this.model_name || "mistral-large-latest";
const messages = [
{ role: "system", content: systemMessage }
];
messages.push(...strictFormat(turns));
console.log('Awaiting mistral api response...')
const response = await this.#client.chat.complete({
model,
messages,
...(this.params || {})
});
result = response.choices[0].message.content;
} catch (err) {
if (err.message.includes("A request containing images has been given to a model which does not have the 'vision' capability.")) {
@ -63,24 +55,28 @@ export class Mistral {
}
console.log(err);
}
if (typeof result === 'string') {
result = result.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify(messages), result);
return result;
}
async sendVisionRequest(messages, systemMessage, imageBuffer) {
const imageMessages = [...messages];
imageMessages.push({
role: "user",
content: [
{ type: "text", text: systemMessage },
{
type: "image_url",
imageUrl: `data:image/jpeg;base64,${imageBuffer.toString('base64')}`
}
]
async sendVisionRequest(original_turns, systemMessage, imageBuffer) {
const imageFormattedTurns = [...original_turns];
const userMessageContent = [{ type: "text", text: systemMessage }];
userMessageContent.push({
type: "image_url",
imageUrl: `data:image/jpeg;base64,${imageBuffer.toString('base64')}`
});
imageFormattedTurns.push({ role: "user", content: userMessageContent });
return this.sendRequest(imageMessages, systemMessage);
const res = await this.sendRequest(imageFormattedTurns, systemMessage);
if (imageBuffer && res) {
logVision(original_turns, imageBuffer, res, systemMessage);
}
return res;
}
async embed(text) {
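sendVisionRequest for Mistral now appends one user turn whose content mixes a text part and an imageUrl part, reuses sendRequest, and records the exchange with logVision. A sketch of that appended turn (mirrors the code above; base64 abbreviated):

// Shape of the vision turn pushed onto imageFormattedTurns (illustrative values).
const visionTurn = {
    role: "user",
    content: [
        { type: "text", text: '<vision prompt / systemMessage>' },
        { type: "image_url", imageUrl: 'data:image/jpeg;base64,<...>' }
    ]
};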

View file

@ -1,6 +1,7 @@
import OpenAIApi from 'openai';
import { getKey } from '../utils/keys.js';
import { strictFormat } from '../utils/text.js';
import { log, logVision } from '../../logger.js';
// llama, mistral
export class Novita {
@ -16,15 +17,20 @@ export class Novita {
config.apiKey = getKey('NOVITA_API_KEY');
this.openai = new OpenAIApi(config);
// Direct image data in sendRequest is not supported by this wrapper.
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage, stop_seq='***') {
let messages = [{'role': 'system', 'content': systemMessage}].concat(turns);
async sendRequest(turns, systemMessage, imageData = null, stop_seq='***') {
if (imageData) {
console.warn(`[Novita] Warning: imageData provided to sendRequest, but this method in novita.js does not support direct image data embedding for model ${this.model_name}. The image will be ignored.`);
}
let messages = [{'role': 'system', 'content': systemMessage}].concat(turns);
messages = strictFormat(messages);
messages = strictFormat(messages);
const pack = {
const pack = {
model: this.model_name || "meta-llama/llama-3.1-70b-instruct",
messages,
stop: [stop_seq],
@ -43,23 +49,32 @@ export class Novita {
catch (err) {
if ((err.message == 'Context length exceeded' || err.code == 'context_length_exceeded') && turns.length > 1) {
console.log('Context length exceeded, trying again with shorter context.');
return await sendRequest(turns.slice(1), systemMessage, stop_seq);
return await this.sendRequest(turns.slice(1), systemMessage, imageData, stop_seq); // retry via this.sendRequest, forwarding imageData
} else {
console.log(err);
res = 'My brain disconnected, try again.';
}
}
if (res.includes('<think>')) {
let start = res.indexOf('<think>');
let end = res.indexOf('</think>') + 8;
if (start != -1) {
if (end != -1) {
res = res.substring(0, start) + res.substring(end);
} else {
res = res.substring(0, start+7);
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify(messages), res); // Log transformed res
// Existing stripping logic for <think> tags
if (res && typeof res === 'string' && res.includes('<think>')) {
let start = res.indexOf('<think>');
let end = res.indexOf('</think>') + 8; // length of '</think>'
if (start !== -1) { // Ensure '<think>' was found
if (end !== -1 && end > start + 7) { // Ensure '</think>' was found and is after '<think>'
res = res.substring(0, start) + res.substring(end);
} else {
// Malformed or missing end tag, strip from '<think>' onwards or handle as error
// Original code: res = res.substring(0, start+7); This would leave "<think>"
// Let's assume we strip from start if end is not valid.
res = res.substring(0, start);
}
}
}
res = res.trim();
res = res.trim();
}
return res;
}
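The index arithmetic above removes one completed <think>…</think> block and truncates at <think> when the closing tag is missing. A roughly equivalent, more compact form used by other wrappers in this commit, shown only for comparison (a sketch, not part of novita.js):

// Cleanup sketch: drop completed blocks, then truncate at a dangling <think>.
function stripThink(res) {
    if (typeof res !== 'string') return res;
    res = res.replace(/<think>[\s\S]*?<\/think>/g, '');
    const dangling = res.indexOf('<think>');
    if (dangling !== -1) res = res.substring(0, dangling);
    return res.trim();
}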

View file

@ -1,63 +1,105 @@
import OpenAIApi from 'openai';
import { getKey, hasKey } from '../utils/keys.js';
import { strictFormat } from '../utils/text.js';
import { log, logVision } from '../../logger.js';
export class OpenRouter {
constructor(model_name, url) {
this.model_name = model_name;
let config = {};
config.baseURL = url || 'https://openrouter.ai/api/v1';
const apiKey = getKey('OPENROUTER_API_KEY');
if (!apiKey) {
console.error('Error: OPENROUTER_API_KEY not found. Make sure it is set properly.');
}
// Pass the API key to OpenAI compatible Api
config.apiKey = apiKey;
config.apiKey = apiKey;
this.openai = new OpenAIApi(config);
// OpenRouter is a router; individual models might support vision.
// This generic sendRequest does not format for vision. Use sendVisionRequest or specific model logic.
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage, stop_seq='*') {
async sendRequest(turns, systemMessage, imageData = null, stop_seq='*') {
if (imageData) {
console.warn(`[OpenRouter] Warning: imageData provided to sendRequest. While OpenRouter can route to vision models, this generic method does not format for image data. The image will be ignored. Use sendVisionRequest or ensure your model call through OpenRouter is specifically formatted for vision if needed.`);
}
let messages = [{ role: 'system', content: systemMessage }, ...turns];
messages = strictFormat(messages);
// Choose a valid model from openrouter.ai (for example, "openai/gpt-4o")
const pack = {
model: this.model_name,
messages,
stop: stop_seq
include_reasoning: true,
// stop: stop_seq // Commented out since some API providers on Openrouter do not support a stop sequence, such as Grok 3
};
let res = null;
try {
console.log('Awaiting openrouter api response...');
let completion = await this.openai.chat.completions.create(pack);
if (!completion?.choices?.[0]) {
console.error('No completion or choices returned:', completion);
return 'No response received.';
const maxAttempts = 5;
let attempt = 0;
let finalRes = null;
while (attempt < maxAttempts) {
attempt++;
console.info(`Awaiting openrouter API response... (attempt: ${attempt})`);
let res = null;
try {
let completion = await this.openai.chat.completions.create(pack);
if (!completion?.choices?.[0]) {
console.error('No completion or choices returned:', completion);
return 'No response received.';
}
const logMessages = [{ role: "system", content: systemMessage }].concat(turns);
if (completion.choices[0].finish_reason === 'length') {
throw new Error('Context length exceeded');
}
if (completion.choices[0].message.reasoning) {
try{
const reasoning = '<think>\n' + completion.choices[0].message.reasoning + '</think>\n';
const content = completion.choices[0].message.content;
// Standard logging for text-based responses
log(JSON.stringify(logMessages), reasoning + "\n" + content);
res = content;
} catch {}
} else {
try {
res = completion.choices[0].message.content;
// Standard logging for text-based responses
log(JSON.stringify(logMessages), res);
} catch {
console.warn("Unable to log due to unknown error!");
}
}
// Trim <think> blocks from the final response if present.
if (res && res.includes("<think>") && res.includes("</think>")) {
res = res.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
console.info('Received.');
} catch (err) {
console.error('Error while awaiting response:', err);
res = 'My brain disconnected, try again.';
}
if (completion.choices[0].finish_reason === 'length') {
throw new Error('Context length exceeded');
}
console.log('Received.');
res = completion.choices[0].message.content;
} catch (err) {
console.error('Error while awaiting response:', err);
// If the error indicates a context-length problem, we can slice the turns array, etc.
res = 'My brain disconnected, try again.';
finalRes = res;
break; // Exit loop once a valid response is obtained.
}
return res;
if (finalRes == null) {
console.warn("Could not get a valid <think> block or normal response after max attempts.");
finalRes = 'I thought too hard, sorry, try again.';
}
return finalRes;
}
async sendVisionRequest(messages, systemMessage, imageBuffer) {
const imageMessages = [...messages];
imageMessages.push({
async sendVisionRequest(original_turns, systemMessage, imageBuffer) { // Renamed messages to original_turns
const imageFormattedTurns = [...original_turns];
imageFormattedTurns.push({
role: "user",
content: [
{ type: "text", text: systemMessage },
// The systemMessage is used as the text prompt accompanying the image here
{ type: "text", text: systemMessage },
{
type: "image_url",
image_url: {
@ -67,10 +109,20 @@ export class OpenRouter {
]
});
return this.sendRequest(imageMessages, systemMessage);
// Pass the main systemMessage to sendRequest, as it expects a system prompt.
// The image-specific prompt is part of imageFormattedTurns.
const res = await this.sendRequest(imageFormattedTurns, systemMessage, null);
if (imageBuffer && res) {
// For logVision, conversationHistory should be the original turns + system prompt.
// The visionMessage (text prompt for the image) is systemMessage in this context.
logVision([{ role: "system", content: systemMessage }].concat(original_turns), imageBuffer, res, systemMessage);
}
return res;
}
async embed(text) {
throw new Error('Embeddings are not supported by Openrouter.');
}
}
}
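With include_reasoning: true, OpenRouter may return a separate message.reasoning field alongside message.content; the code above logs the reasoning wrapped in <think> tags but returns only the content to the agent. A trimmed sketch of that split, for reference (mirrors the code above):

// Sketch: split a reasoning-bearing completion into what gets logged vs. what is returned.
function splitReasoning(message) {
    if (message.reasoning) {
        const logged = '<think>\n' + message.reasoning + '</think>\n' + '\n' + message.content;
        return { logged, returned: message.content };  // reasoning goes to the log, not back to the agent
    }
    return { logged: message.content, returned: message.content };
}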

View file

@ -341,9 +341,29 @@ export class Prompter {
let prompt = this.profile.conversing;
prompt = await this.replaceStrings(prompt, messages, this.convo_examples);
let generation;
let imageData = null;
if (settings.vision_mode === 'always' && messages.length > 0) {
const lastMessage = messages[messages.length - 1];
// Check if the last message has an imagePath and if the model supports raw image input
if (lastMessage.imagePath && this.chat_model.supportsRawImageInput) {
try {
// Construct the full path to the image file
const agentScreenshotDir = path.join('bots', this.agent.name, 'screenshots');
const imageFullPath = path.join(agentScreenshotDir, lastMessage.imagePath);
console.log(`[Prompter] Attempting to read image for always_active mode: ${imageFullPath}`);
imageData = await fs.readFile(imageFullPath); // Read as buffer
console.log('[Prompter] Image data prepared for chat model.');
} catch (err) {
console.error(`[Prompter] Error reading image file ${lastMessage.imagePath}:`, err);
imageData = null; // Proceed without image data if reading fails
}
}
}
try {
generation = await this.chat_model.sendRequest(messages, prompt);
generation = await this.chat_model.sendRequest(messages, prompt, imageData);
if (typeof generation !== 'string') {
console.error('Error: Generated response is not a string', generation);
throw new Error('Generated response is not a string');
@ -351,6 +371,9 @@ export class Prompter {
console.log("Generated response:", generation);
await this._saveLog(prompt, messages, generation, 'conversation');
// Remove the incorrect logVision call here since sendRequest should handle it
// The model's sendRequest method will call logVision if imageData was provided
} catch (error) {
console.error('Error during message generation or file writing:', error);
continue;
@ -452,8 +475,15 @@ export class Prompter {
}
async _saveLog(prompt, messages, generation, tag) {
if (!settings.log_all_prompts)
return;
switch (tag) {
case 'conversation':
case 'coding': // Assuming coding logs fall under normal data
case 'memSaving':
if (!settings.log_normal_data) return;
break;
default:
return;
}
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
let logEntry;
let task_id = this.agent.task.task_id;
@ -480,6 +510,4 @@ export class Prompter {
logFile = path.join(logDir, logFile);
await fs.appendFile(logFile, String(logEntry), 'utf-8');
}
}
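Both Prompter changes key off flags in settings.js: vision_mode gates the screenshot-to-imageData path, and log_normal_data gates _saveLog for the conversation/coding/memSaving tags. A sketch of the relevant entries (values are illustrative assumptions; the key names come from the code above):

// settings.js excerpt (illustrative values).
export default {
    vision_mode: 'always',    // 'always' makes the Prompter read the last message's imagePath and pass it as imageData
    log_normal_data: true,    // required for 'conversation', 'coding', and 'memSaving' prompt logs
};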

View file

@ -1,6 +1,7 @@
import OpenAIApi from 'openai';
import { getKey, hasKey } from '../utils/keys.js';
import { strictFormat } from '../utils/text.js';
import { log, logVision } from '../../logger.js';
export class Qwen {
constructor(model_name, url, params) {
@ -12,15 +13,51 @@ export class Qwen {
config.apiKey = getKey('QWEN_API_KEY');
this.openai = new OpenAIApi(config);
// Note: Actual multimodal support depends on the specific Qwen model (e.g., qwen-vl-plus)
this.supportsRawImageInput = true;
}
async sendRequest(turns, systemMessage, stop_seq='***') {
async sendRequest(turns, systemMessage, imageData = null, stop_seq = '***') {
let messages = [{'role': 'system', 'content': systemMessage}].concat(turns);
messages = strictFormat(messages);
if (imageData) {
// Qwen VL models include names like "qwen-vl-plus", "qwen-vl-max", "qwen-vl-chat-v1"
if (!this.model_name || !this.model_name.toLowerCase().includes('-vl')) {
console.warn(`[Qwen] Warning: imageData provided for model ${this.model_name}, which does not appear to be a Qwen Vision-Language (VL) model. The image may be ignored or cause an error.`);
}
let lastUserMessageIndex = -1;
for (let i = messages.length - 1; i >= 0; i--) {
if (messages[i].role === 'user') {
lastUserMessageIndex = i;
break;
}
}
if (lastUserMessageIndex !== -1) {
const userMessage = messages[lastUserMessageIndex];
if (typeof userMessage.content === 'string') { // Ensure content is a string before converting
userMessage.content = [
{ "text": userMessage.content },
{ "image": `data:image/jpeg;base64,${imageData.toString('base64')}` }
];
} else if (Array.isArray(userMessage.content)) {
// If content is already an array (e.g. from previous image), add new image
userMessage.content.push({ "image": `data:image/jpeg;base64,${imageData.toString('base64')}` });
} else {
console.warn('[Qwen] Last user message content is not a string or array. Creating new content array for image.');
userMessage.content = [{ "image": `data:image/jpeg;base64,${imageData.toString('base64')}` }];
}
} else {
console.warn('[Qwen] imageData provided, but no user message found to attach it to. Image not sent.');
// Alternative: Create a new user message with the image
// messages.push({ role: 'user', content: [{ "image": `data:image/jpeg;base64,${imageData.toString('base64')}` }] });
}
}
const pack = {
model: this.model_name || "qwen-plus",
model: this.model_name || "qwen-plus", // Default might need to be a VL model if images are common
messages,
stop: stop_seq,
...(this.params || {})
@ -45,6 +82,27 @@ export class Qwen {
res = 'My brain disconnected, try again.';
}
}
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
if (imageData) { // If imageData was part of this sendRequest call
// `messages` here includes system prompt and image data
let visionPromptText = "";
if (messages.length > 0) {
const lastTurn = messages[messages.length - 1];
if (lastTurn.role === 'user' && Array.isArray(lastTurn.content)) {
const textPart = lastTurn.content.find(part => part.text);
if (textPart) visionPromptText = textPart.text;
} else if (lastTurn.role === 'user' && typeof lastTurn.content === 'string'){
visionPromptText = lastTurn.content;
}
}
logVision(messages, imageData, res, visionPromptText);
} else {
// messages already includes system prompt if no imageData
log(JSON.stringify(messages), res);
}
return res;
}
@ -76,4 +134,4 @@ export class Qwen {
throw new Error('Max retries reached, request failed.');
}
}
}
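For Qwen vision-language models the last user turn is rewritten from a plain string into an array of parts, so the request carries the text and the image together. A sketch of the resulting turn (mirrors the code above; base64 abbreviated):

// Last user turn after image attachment (illustrative values).
const lastUserTurn = {
    role: 'user',
    content: [
        { "text": 'What do you see?' },
        { "image": 'data:image/jpeg;base64,<...>' }
    ]
};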

View file

@ -1,6 +1,7 @@
import Replicate from 'replicate';
import { toSinglePrompt } from '../utils/text.js';
import { getKey } from '../utils/keys.js';
import { log, logVision } from '../../logger.js';
// llama, mistral
export class ReplicateAPI {
@ -16,13 +17,20 @@ export class ReplicateAPI {
this.replicate = new Replicate({
auth: getKey('REPLICATE_API_KEY'),
});
// Direct image data in sendRequest is not supported by this wrapper.
// Replicate handles vision models differently, often with specific inputs like "image".
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage) {
async sendRequest(turns, systemMessage, imageData = null) {
if (imageData) {
console.warn(`[ReplicateAPI] Warning: imageData provided to sendRequest, but this method in replicate.js does not support direct image data embedding for model ${this.model_name}. The image will be ignored. Replicate models with vision capabilities usually require specific input fields like 'image' with a URL or base64 string.`);
}
const stop_seq = '***';
const prompt = toSinglePrompt(turns, null, stop_seq);
let model_name = this.model_name || 'meta/meta-llama-3-70b-instruct';
const logInputMessages = [{role: 'system', content: systemMessage}, ...turns];
const input = {
prompt,
system_prompt: systemMessage,
@ -45,6 +53,10 @@ export class ReplicateAPI {
console.log(err);
res = 'My brain disconnected, try again.';
}
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify(logInputMessages), res);
console.log('Received.');
return res;
}

View file

@ -1,9 +1,13 @@
// This code uses Dashscope and HTTP to ensure the latest support for the Qwen model.
// Qwen is also compatible with the OpenAI API format;
// This code uses Dashscope and HTTP to ensure the latest support for the Qwen model.
// Qwen is also compatible with the OpenAI API format;
import OpenAIApi from 'openai';
import { getKey, hasKey } from '../utils/keys.js';
import { strictFormat } from '../utils/text.js';
import { log, logVision } from '../../logger.js';
export class VLLM {
constructor(model_name, url) {
@ -19,9 +23,15 @@ export class VLLM {
vllm_config.apiKey = ""
this.vllm = new OpenAIApi(vllm_config);
// VLLM can serve various models. This generic sendRequest does not format for vision.
// Specific multimodal models served via VLLM might require custom request formatting.
this.supportsRawImageInput = false;
}
async sendRequest(turns, systemMessage, stop_seq = '***') {
async sendRequest(turns, systemMessage, imageData = null, stop_seq = '***') {
if (imageData) {
console.warn(`[VLLM] Warning: imageData provided to sendRequest, but this method in vllm.js does not support direct image data embedding for model ${this.model_name}. The image will be ignored. Ensure the VLLM endpoint is configured for a multimodal model and the request is formatted accordingly if vision is intended.`);
}
let messages = [{ 'role': 'system', 'content': systemMessage }].concat(turns);
if (this.model_name.includes('deepseek') || this.model_name.includes('qwen')) {
@ -48,12 +58,16 @@ export class VLLM {
catch (err) {
if ((err.message == 'Context length exceeded' || err.code == 'context_length_exceeded') && turns.length > 1) {
console.log('Context length exceeded, trying again with shorter context.');
return await this.sendRequest(turns.slice(1), systemMessage, stop_seq);
return await this.sendRequest(turns.slice(1), systemMessage, imageData, stop_seq);
} else {
console.log(err);
res = 'My brain disconnected, try again.';
}
}
if (typeof res === 'string') {
res = res.replace(/<thinking>/g, '<think>').replace(/<\/thinking>/g, '</think>');
}
log(JSON.stringify(messages), res);
return res;
}

470
src/process/tts_process.js Normal file
View file

@ -0,0 +1,470 @@
import settings from '../../settings.js';
import { GroqCloudTTS } from '../models/groq.js';
import wav from 'wav';
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
// Import getIO and our new function getAllInGameAgentNames
import { getIO, getAllInGameAgentNames } from '../server/mind_server.js';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
// Import the audio libraries conditionally
let portAudio;
let AudioIO;
let SampleFormat16Bit;
let mic; // For mic library
let activeAudioLibrary = null; // 'naudiodon' or 'mic'
(async () => {
try {
const naudiodonModule = await import('naudiodon');
portAudio = naudiodonModule.default;
if (portAudio && typeof portAudio.AudioIO === 'function' && typeof portAudio.SampleFormat16Bit !== 'undefined') {
AudioIO = portAudio.AudioIO;
SampleFormat16Bit = portAudio.SampleFormat16Bit;
activeAudioLibrary = 'naudiodon';
console.log('[STT] naudiodon loaded successfully.');
} else if (naudiodonModule.AudioIO && typeof naudiodonModule.SampleFormat16Bit !== 'undefined') {
AudioIO = naudiodonModule.AudioIO;
SampleFormat16Bit = naudiodonModule.SampleFormat16Bit;
portAudio = naudiodonModule;
activeAudioLibrary = 'naudiodon';
console.log('[STT] naudiodon loaded successfully (direct properties).');
} else {
throw new Error('AudioIO or SampleFormat16Bit not found in naudiodon module exports.');
}
} catch (err) {
console.warn(`[STT] Failed to load naudiodon. Error: ${err.message}`);
portAudio = null;
AudioIO = null;
SampleFormat16Bit = null;
// Attempt to load mic if naudiodon fails
try {
const micModule = await import('mic');
mic = micModule.default; // Assuming mic is also a CommonJS module typically
if (mic && typeof mic === 'function') { // mic is often a constructor function
activeAudioLibrary = 'mic';
console.log('[STT] mic loaded successfully as an alternative.');
} else if (micModule.Mic) { // Some modules might export it as Mic
mic = micModule.Mic;
activeAudioLibrary = 'mic';
console.log('[STT] mic (Mic) loaded successfully as an alternative.');
}
else {
throw new Error('Mic constructor not found in mic module exports.');
}
} catch (micErr) {
console.warn(`[STT] Failed to load mic as well. Speech-to-Text will be disabled. Error: ${micErr.message}`);
mic = null;
activeAudioLibrary = null;
}
}
// Initialize TTS after attempting to load audio libraries
initTTS();
})();
/**
* Delete leftover speech_*.wav from previous runs
*/
const leftover = fs.readdirSync(__dirname).filter(f => /^speech_\d+\.wav$/.test(f));
for (const file of leftover) {
try {
fs.unlinkSync(path.join(__dirname, file));
} catch (_) {
// ignore errors
}
}
// Configuration from settings
const RMS_THRESHOLD = settings.stt_rms_threshold || 8000;
const SILENCE_DURATION = settings.stt_silence_duration || 2000;
const MIN_AUDIO_DURATION = settings.stt_min_audio_duration || 0.5;
const MAX_AUDIO_DURATION = settings.stt_max_audio_duration || 15;
const DEBUG_AUDIO = settings.stt_debug_audio || false;
const COOLDOWN_MS = settings.stt_cooldown_ms || 2000;
const SPEECH_THRESHOLD_RATIO = settings.stt_speech_threshold_ratio || 0.15;
const CONSECUTIVE_SPEECH_SAMPLES = settings.stt_consecutive_speech_samples || 5;
const SAMPLE_RATE = 16000;
const BIT_DEPTH = 16;
const STT_USERNAME = settings.stt_username || "SERVER";
const STT_AGENT_NAME = settings.stt_agent_name || "";
// Guards to prevent multiple overlapping recordings
let isRecording = false;
let sttRunning = false;
let sttInitialized = false;
let lastRecordingEndTime = 0;
async function recordAndTranscribeOnce() {
// Check cooldown period
const timeSinceLastRecording = Date.now() - lastRecordingEndTime;
if (timeSinceLastRecording < COOLDOWN_MS) {
return null;
}
// If another recording is in progress, just skip
if (isRecording) {
return null;
}
isRecording = true;
const outFile = path.join(__dirname, `speech_${Date.now()}.wav`);
const fileWriter = new wav.FileWriter(outFile, {
channels: 1,
sampleRate: SAMPLE_RATE,
bitDepth: BIT_DEPTH
});
if (!activeAudioLibrary) {
console.warn("[STT] No audio recording library available.");
isRecording = false;
return null;
}
let audioInterface;
let audioStream;
let recording = true;
let hasHeardSpeech = false;
let silenceTimer = null;
let maxDurationTimer = null;
let finished = false;
// Smart speech detection variables
let speechSampleCount = 0;
let totalSampleCount = 0;
let consecutiveSpeechSamples = 0;
let speechLevels = [];
let averageSpeechLevel = 0;
let adaptiveThreshold = RMS_THRESHOLD;
// Helper to reset silence timer
function resetSilenceTimer() {
if (silenceTimer) clearTimeout(silenceTimer);
// Only start silence timer if actual speech has been detected
if (hasHeardSpeech && recording) { // also check `recording` to prevent timer after explicit stop
silenceTimer = setTimeout(() => {
if (DEBUG_AUDIO) console.log('[STT] Silence timeout reached, stopping recording.');
stopRecording();
}, SILENCE_DURATION);
}
}
// Stop recording
function stopRecording() {
if (!recording) return;
recording = false;
if (silenceTimer) clearTimeout(silenceTimer);
if (maxDurationTimer) clearTimeout(maxDurationTimer);
if (activeAudioLibrary === 'naudiodon' && audioInterface) {
try {
audioInterface.quit();
} catch (err) {
// Silent error handling
}
} else if (activeAudioLibrary === 'mic' && audioInterface) {
try {
audioInterface.stop();
} catch (err) {
// Silent error handling
}
}
if (fileWriter && !fileWriter.closed) {
fileWriter.end();
}
}
// We wrap everything in a promise so we can await the transcription
return new Promise((resolve, reject) => {
// Set maximum recording duration timer
maxDurationTimer = setTimeout(() => {
stopRecording();
}, MAX_AUDIO_DURATION * 1000);
if (activeAudioLibrary === 'naudiodon') {
if (!AudioIO || !SampleFormat16Bit) {
isRecording = false;
return reject(new Error("Naudiodon not available"));
}
audioInterface = new AudioIO({
inOptions: {
channelCount: 1,
sampleFormat: SampleFormat16Bit,
sampleRate: SAMPLE_RATE,
deviceId: -1,
closeOnError: true
}
});
audioStream = audioInterface;
audioStream.on('error', (err) => {
cleanupAndResolve(null);
});
} else if (activeAudioLibrary === 'mic') {
audioInterface = new mic({
rate: String(SAMPLE_RATE),
channels: '1',
bitwidth: String(BIT_DEPTH),
endian: 'little',
encoding: 'signed-integer',
device: 'default',
debug: false // Don't use mic's debug, we have our own
});
audioStream = audioInterface.getAudioStream();
audioStream.on('error', (err) => {
cleanupAndResolve(null);
});
audioStream.on('processExitComplete', () => {
// Silent
});
}
// Common event handling for data (applies to both the naudiodon AudioIO stream and the mic stream)
audioStream.on('data', (chunk) => {
if (!recording) return;
fileWriter.write(chunk);
// Calculate RMS for threshold detection
let sumSquares = 0;
const sampleCount = chunk.length / 2;
for (let i = 0; i < chunk.length; i += 2) {
const sample = chunk.readInt16LE(i);
sumSquares += sample * sample;
}
const rms = Math.sqrt(sumSquares / sampleCount);
totalSampleCount++;
// Simplified speech detection logic
if (rms > adaptiveThreshold) {
speechSampleCount++;
consecutiveSpeechSamples++;
speechLevels.push(rms);
// Update adaptive threshold based on actual speech levels
if (speechLevels.length > 10) {
averageSpeechLevel = speechLevels.reduce((a, b) => a + b, 0) / speechLevels.length;
adaptiveThreshold = Math.max(RMS_THRESHOLD, averageSpeechLevel * 0.4); // 40% of average speech level
}
// Trigger speech detection much more easily
if (!hasHeardSpeech) {
// Either consecutive samples OR sufficient ratio
const speechRatio = speechSampleCount / totalSampleCount;
if (consecutiveSpeechSamples >= 3 || speechRatio >= 0.05) { // Much lower thresholds
hasHeardSpeech = true;
console.log(`[STT] Speech detected! (consecutive: ${consecutiveSpeechSamples}, ratio: ${(speechRatio * 100).toFixed(1)}%)`);
}
}
if (hasHeardSpeech) {
resetSilenceTimer();
}
} else {
consecutiveSpeechSamples = 0; // Reset consecutive counter
}
});
fileWriter.on('finish', async () => {
if (finished) return;
finished = true;
lastRecordingEndTime = Date.now();
try {
const stats = fs.statSync(outFile);
const headerSize = 44;
const dataSize = stats.size - headerSize;
const duration = dataSize / (SAMPLE_RATE * (BIT_DEPTH / 8));
const speechPercentage = totalSampleCount > 0 ? (speechSampleCount / totalSampleCount) * 100 : 0;
if (DEBUG_AUDIO) {
console.log(`[STT] Audio processed: ${duration.toFixed(2)}s, speech detected: ${hasHeardSpeech}, speech %: ${speechPercentage.toFixed(1)}%`);
}
if (duration < MIN_AUDIO_DURATION) {
cleanupAndResolve(null);
return;
}
if (!hasHeardSpeech || speechPercentage < 3) { // Lowered from 15% to 3%
cleanupAndResolve(null);
return;
}
const groqTTS = new GroqCloudTTS();
const text = await groqTTS.transcribe(outFile, {
model: "distil-whisper-large-v3-en",
prompt: "",
response_format: "json",
language: "en",
temperature: 0.0
});
if (!text || !text.trim()) {
cleanupAndResolve(null);
return;
}
// Enhanced validation
if (!/[A-Za-z]/.test(text)) {
cleanupAndResolve(null);
return;
}
if (/([A-Za-z])\1{3,}/.test(text)) {
cleanupAndResolve(null);
return;
}
// Filter out common false positives
const falsePositives = ["thank you", "thanks", "bye", ".", ",", "?", "!", "um", "uh", "hmm"];
if (falsePositives.includes(text.trim().toLowerCase())) {
cleanupAndResolve(null);
return;
}
const letterCount = text.replace(/[^A-Za-z]/g, "").length;
const normalizedText = text.trim().toLowerCase();
const allowedGreetings = new Set(["hi", "hello", "hey", "yes", "no", "okay"]);
if (letterCount < 2 && !allowedGreetings.has(normalizedText)) {
cleanupAndResolve(null);
return;
}
// Only log successful transcriptions
console.log("[STT] Transcribed:", text);
const finalMessage = `[${STT_USERNAME}] ${text}`;
if (!STT_AGENT_NAME.trim()) {
const agentNames = getAllInGameAgentNames();
for (const agentName of agentNames) {
getIO().emit('send-message', agentName, finalMessage);
}
} else {
getIO().emit('send-message', STT_AGENT_NAME, finalMessage);
}
cleanupAndResolve(text);
} catch (err) {
cleanupAndResolve(null);
}
});
function cleanupAndResolve(result) {
if (silenceTimer) clearTimeout(silenceTimer);
if (maxDurationTimer) clearTimeout(maxDurationTimer);
try {
if (fs.existsSync(outFile)) {
fs.unlinkSync(outFile);
}
} catch (err) {
// Silent cleanup
}
if (audioStream && typeof audioStream.removeAllListeners === 'function') {
audioStream.removeAllListeners();
}
if (fileWriter && typeof fileWriter.removeAllListeners === 'function') {
fileWriter.removeAllListeners();
}
isRecording = false;
resolve(result);
}
// Start recording
try {
if (activeAudioLibrary === 'naudiodon') {
audioInterface.start();
} else if (activeAudioLibrary === 'mic') {
audioInterface.start();
}
} catch (err) {
cleanupAndResolve(null);
}
});
}
/**
* Runs recording sessions sequentially, so only one at a time
*/
async function continuousLoop() {
if (!activeAudioLibrary) {
console.warn("[STT] No audio recording library available. STT disabled.");
sttRunning = false;
return;
}
console.log("[STT] Speech-to-text active (Groq Whisper)");
let consecutiveErrors = 0;
const maxConsecutiveErrors = 3;
while (sttRunning) {
try {
const result = await recordAndTranscribeOnce();
consecutiveErrors = 0;
// Longer delay between recordings
if (sttRunning) {
await new Promise(res => setTimeout(res, 1000));
}
} catch (err) {
consecutiveErrors++;
if (consecutiveErrors >= maxConsecutiveErrors) {
console.error("[STT] Too many errors, stopping STT.");
sttRunning = false;
break;
}
if (sttRunning) {
const delay = 3000 * consecutiveErrors;
await new Promise(res => setTimeout(res, delay));
}
}
}
}
export function initTTS() {
if (!settings.stt_transcription) {
console.log("[STT] STT transcription is disabled in settings.");
sttRunning = false;
return;
}
if (!activeAudioLibrary) {
console.warn("[STT] No audio recording library available (naudiodon or mic failed to load). STT functionality cannot be initialized.");
sttRunning = false;
return;
}
if (sttRunning || sttInitialized) {
console.log("[STT] STT already initialized; skipping re-init.");
return;
}
console.log("[STT] Initializing STT...");
sttRunning = true;
sttInitialized = true;
setTimeout(() => {
continuousLoop().catch((err) => {
console.error("[STT] continuousLoop crashed unexpectedly:", err);
sttRunning = false;
sttInitialized = false;
});
}, 2000);
}
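Every tunable above falls back to a default when absent, so STT needs little configuration. A sketch of the settings.js keys this file reads, using the fallback values from the constants above (stt_transcription must be truthy for STT to start; everything else is optional):

// settings.js excerpt: STT keys read by tts_process.js (defaults as coded above).
export default {
    stt_transcription: true,            // master switch checked by initTTS()
    stt_username: 'SERVER',             // name prefixed to transcribed chat messages
    stt_agent_name: '',                 // empty => broadcast to all in-game agents
    stt_rms_threshold: 8000,
    stt_silence_duration: 2000,         // ms of silence before a recording stops
    stt_min_audio_duration: 0.5,        // seconds
    stt_max_audio_duration: 15,         // seconds
    stt_debug_audio: false,
    stt_cooldown_ms: 2000,              // ms between recordings
    stt_speech_threshold_ratio: 0.15,
    stt_consecutive_speech_samples: 5,
};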

254
test_agent_vision_log.js Normal file
View file

@ -0,0 +1,254 @@
// Test script for "always active" vision logging in Agent.js
const assert = (condition, message) => {
if (condition) {
console.log(`Assertion PASSED: ${message}`);
} else {
console.error(`Assertion FAILED: ${message}`);
// In a real test runner, we'd throw an error. Here, we'll mark a global failure flag.
global.testFailed = true;
}
};
global.testFailed = false;
// --- Mocks and Stubs ---
const mockSettings = {
vision_mode: 'always',
log_vision_data: true, // Assuming this is checked by logger.js, not directly by agent.js for this part
only_chat_with: [],
max_commands: 10, // Default value
verbose_commands: false,
speak: false,
blocked_actions: [],
};
const mockLogger = {
lastArgs_logVision: null,
logVision: (...args) => {
console.log('[MockLogger] logVision called with:', JSON.stringify(args, null, 2));
mockLogger.lastArgs_logVision = args;
}
};
const mockFs = {
dummyFileContent: Buffer.from("dummy image data"),
filesCreated: {},
readFileSync: (filePath) => {
console.log(`[MockFs] readFileSync called for: ${filePath}`);
if (mockFs.filesCreated[filePath]) {
return mockFs.dummyFileContent;
}
throw new Error(`[MockFs] File not found: ${filePath}`);
},
writeFileSync: (filePath, data) => { // Used by camera.capture simulation
console.log(`[MockFs] writeFileSync called for: ${filePath}`);
mockFs.filesCreated[filePath] = data;
},
existsSync: (filePath) => { // May be needed by History or other parts
return !!mockFs.filesCreated[filePath];
},
mkdirSync: (dirPath) => { // May be needed by History or other parts
console.log(`[MockFs] mkdirSync called for: ${dirPath}`);
}
};
const mockPath = {
join: (...paths) => paths.join('/'), // Simple join for testing
dirname: (p) => p.substring(0, p.lastIndexOf('/')) // simple dirname
};
// Simplified History class for testing
class MockHistory {
constructor(agent) {
this.agent = agent;
this.history = [];
}
add(source, message, imagePath = null) {
this.history.push({ role: source, content: message, image: imagePath });
}
getHistory() {
return [...this.history]; // Return a copy
}
save() { /* no-op for this test */ }
load() { /* no-op for this test */ return null; }
}
// --- Simplified Agent class (copied parts from src/agent/agent.js) ---
// We only need formatHistoryForVisionLog and handleMessage, and their direct dependencies.
class TestAgent {
constructor(name = "TestAgent") {
this.name = name;
this.latestScreenshotPath = null;
this.history = new MockHistory(this);
this.vision_interpreter = {
fp: './test_vision_data/screenshots', // Temporary path for test
camera: {
capture: async () => {
console.log('[MockCamera] capture called');
const filename = `vision_${Date.now()}_test.jpg`;
const fullPath = mockPath.join(this.vision_interpreter.fp, filename);
mockFs.writeFileSync(fullPath, "dummy screenshot data");
return filename; // Return only filename, as in original code
}
}
};
// Mock other dependencies of handleMessage if they are called before the vision logging part
this.prompter = { getName: () => this.name };
this.self_prompter = { isActive: () => false, shouldInterrupt: () => false, handleUserPromptedCmd: () => {} };
this.bot = { modes: { flushBehaviorLog: () => "" }, /* other needed bot mocks */ };
convoManager.isOtherAgent = () => false; // Mock convoManager
this.task = { data: null, isDone: () => false }; // Mock task
this.shut_up = false;
}
// Copied directly from the provided agent.js
formatHistoryForVisionLog(conversationHistory) {
if (!conversationHistory || conversationHistory.length === 0) return '';
const formattedHistory = [];
for (const turn of conversationHistory) {
const formattedTurn = {
role: turn.role || 'user',
content: []
};
if (typeof turn.content === 'string') {
formattedTurn.content.push({ type: 'text', text: turn.content });
} else if (Array.isArray(turn.content)) {
turn.content.forEach(contentItem => {
if (typeof contentItem === 'string') {
formattedTurn.content.push({ type: 'text', text: contentItem });
} else if (contentItem.type === 'text' && contentItem.text) {
formattedTurn.content.push({ type: 'text', text: contentItem.text });
} else if (contentItem.type === 'image_url' && contentItem.image_url && contentItem.image_url.url) {
formattedTurn.content.push({ type: 'image', image: contentItem.image_url.url });
} else if (contentItem.type === 'image' && contentItem.image) {
formattedTurn.content.push({ type: 'image', image: contentItem.image });
}
});
} else if (turn.content && typeof turn.content === 'object') {
if (turn.content.text) {
formattedTurn.content.push({ type: 'text', text: turn.content.text });
}
if (turn.content.image) {
formattedTurn.content.push({ type: 'image', image: turn.content.image });
}
if (turn.content.image_url && turn.content.image_url.url) {
formattedTurn.content.push({ type: 'image', image: turn.content.image_url.url });
}
}
if (turn.content && formattedTurn.content.length === 0) {
formattedTurn.content.push({ type: 'text', text: JSON.stringify(turn.content) });
}
formattedHistory.push(formattedTurn);
}
return JSON.stringify(formattedHistory);
}
// Simplified handleMessage, focusing on the vision logging part
async handleMessage(source, message, max_responses = null) {
const self_prompt = source === 'system' || source === this.name;
const from_other_bot = convoManager.isOtherAgent(source); // Mocked
if (!self_prompt && !from_other_bot) {
if (mockSettings.vision_mode === 'always' && this.vision_interpreter && this.vision_interpreter.camera) {
try {
const screenshotFilename = await this.vision_interpreter.camera.capture();
this.latestScreenshotPath = screenshotFilename;
console.log(`[${this.name}] Captured screenshot in always_active mode: ${screenshotFilename}`);
const currentHistory = this.history.getHistory();
let imageBuffer = null;
if (this.latestScreenshotPath && this.vision_interpreter.fp) {
try {
const fullImagePath = mockPath.join(this.vision_interpreter.fp, this.latestScreenshotPath);
imageBuffer = mockFs.readFileSync(fullImagePath);
} catch (err) {
console.error(`[${this.name}] Error reading image for always active log: ${err.message}`);
}
}
if (imageBuffer) {
const formattedHistoryString = this.formatHistoryForVisionLog(currentHistory);
mockLogger.logVision(currentHistory, imageBuffer, "Image captured for always active vision", formattedHistoryString);
}
} catch (error) {
console.error(`[${this.name}] Error capturing or logging screenshot in always_active mode:`, error);
}
}
// Simplified: No command execution or further processing for this test
}
// Simplified: No further history adding or prompting for this test after vision log
}
}
// Mock global dependencies that Agent might use internally if not fully mocked out
global.settings = mockSettings; // Used by Agent if not passed in
const convoManager = { // Mock for global convoManager if used by Agent directly
isOtherAgent: () => false,
initAgent: () => {},
};
// --- Test Execution ---
async function runTest() {
console.log("--- Starting Test ---");
const agent = new TestAgent();
// Prepare initial history
const sampleHistory = [
{ role: 'user', content: 'Hello bot!' },
{ role: 'assistant', content: 'I am fine, how are you?' } // Corrected: assistant content
];
agent.history.history = [...sampleHistory]; // Directly set history for the test
// Call handleMessage
await agent.handleMessage('testUser', 'Test message from user');
// --- Assertions ---
assert(mockLogger.lastArgs_logVision !== null, "logger.logVision was called.");
if (mockLogger.lastArgs_logVision) {
const args = mockLogger.lastArgs_logVision;
// 1. Check conversationHistory argument (1st arg)
// For simplicity, we'll check length and roles. A deep equal would be better in a real test.
assert(Array.isArray(args[0]) && args[0].length === sampleHistory.length, "logVision: conversationHistory has correct length.");
if (Array.isArray(args[0]) && args[0].length === sampleHistory.length) {
assert(args[0][0].role === sampleHistory[0].role && args[0][0].content === sampleHistory[0].content, "logVision: first history item matches.");
assert(args[0][1].role === sampleHistory[1].role && args[0][1].content === sampleHistory[1].content, "logVision: second history item matches.");
}
// 2. Check imageBuffer argument (2nd arg)
assert(args[1] === mockFs.dummyFileContent, "logVision: imageBuffer is the dummy buffer.");
// 3. Check response string (3rd arg)
assert(args[2] === "Image captured for always active vision", "logVision: response string is correct.");
// 4. Check visionMessage (formattedHistoryString) (4th arg)
const expectedFormattedHistory = agent.formatHistoryForVisionLog(sampleHistory);
assert(args[3] === expectedFormattedHistory, "logVision: visionMessage (formattedHistoryString) is correct.");
if(args[3] !== expectedFormattedHistory) {
console.log("Expected formatted history:", expectedFormattedHistory);
console.log("Actual formatted history:", args[3]);
}
}
// Check if camera.capture was called (implicitly tested by latestScreenshotPath being set for readFileSync)
// Check if fs.readFileSync was called (log output from mockFs)
console.log("--- Test Finished ---");
if (global.testFailed) {
console.error("--- !!! ONE OR MORE ASSERTIONS FAILED !!! ---");
// process.exit(1); // Exit with error code if in a CI environment
} else {
console.log("--- ALL ASSERTIONS PASSED ---");
}
}
runTest().catch(e => {
console.error("Test script error:", e);
global.testFailed = true;
// process.exit(1);
});
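The test script mocks all of its dependencies in-file, so it can be run directly (e.g. node test_agent_vision_log.js); it prints each assertion result and reports overall pass/fail via the global testFailed flag rather than a non-zero exit code.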