Merge pull request #9 from Sweaterdog/vision-logging-enhancements

Fix: Make naudiodon optional and document prerequisites
This commit is contained in:
Sweaterdog 2025-06-07 16:01:47 -07:00 committed by GitHub
commit 98b9284b44
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
3 changed files with 394 additions and 251 deletions

View file

@@ -14,15 +14,80 @@ Do not connect this bot to public servers with coding enabled. This project allo
- [Node.js Installed](https://nodejs.org/) (at least v14)
- One of these: [OpenAI API Key](https://openai.com/blog/openai-api) | [Gemini API Key](https://aistudio.google.com/app/apikey) | [Anthropic API Key](https://docs.anthropic.com/claude/docs/getting-access-to-claude) | [Replicate API Key](https://replicate.com/) | [Hugging Face API Key](https://huggingface.co/) | [Groq API Key](https://console.groq.com/keys) | [Ollama Installed](https://ollama.com/download) | [Mistral API Key](https://docs.mistral.ai/getting-started/models/models_overview/) | [Qwen API Key [Intl.]](https://www.alibabacloud.com/help/en/model-studio/developer-reference/get-api-key)/[[cn]](https://help.aliyun.com/zh/model-studio/getting-started/first-api-call-to-qwen?) | [Novita AI API Key](https://novita.ai/settings?utm_source=github_mindcraft&utm_medium=github_readme&utm_campaign=link#key-management) |
## Installation Prerequisites
### `naudiodon` for Speech-to-Text (STT)
The STT (Speech-to-Text) functionality in Mindcraft uses the `naudiodon` package for audio input. `naudiodon` is a native Node.js addon and might require additional steps to compile correctly during `npm install`.
**`naudiodon` is an optional dependency.** This means:
* If `naudiodon` fails to install or build, the core Mindcraft application will still run.
* However, the Speech-to-Text (STT) feature will be automatically disabled if `naudiodon` is not available. You will see warnings in the console if it fails to load.
* If you wish to use STT and encounter build issues with `naudiodon`, please ensure you have the necessary build tools and libraries listed below for your operating system.
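To see whether the optional dependency actually made it into your install tree, one quick check after `npm install` is shown below (the exact output format varies between npm versions):
```bash
# Lists naudiodon and its version if it was installed;
# shows an empty or missing entry if the optional install was skipped
npm ls naudiodon
```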
**General Requirements for Building `naudiodon`:**
* **Node.js:** Ensure Node.js (v14+) is properly installed and added to your system's PATH.
* **Python:** `node-gyp` (the tool used to build native addons like `naudiodon`) requires Python. Recent versions of `node-gyp` are compatible with Python 3.x. Make sure Python is installed and accessible.
* **C++ Compiler Toolchain:** A C++ compiler (like g++ or MSVC) and related build tools (like `make` or MSBuild) are necessary.
* **PortAudio Library:** `naudiodon` specifically requires the PortAudio library.
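Before running `npm install`, you can sanity-check that these tools are visible on your PATH. A minimal check for Linux/macOS shells (adjust the commands for Windows) might look like:
```bash
# Quick toolchain check before attempting the native build
node --version     # should report v14 or newer
python3 --version  # node-gyp needs a Python 3.x interpreter
g++ --version      # or clang++ / MSVC, depending on your platform
```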
**Operating System Specifics for `PortAudio` (and `naudiodon` build):**
### Linux
* **Debian/Ubuntu:**
```bash
sudo apt-get update
sudo apt-get install build-essential libasound2-dev portaudio19-dev
```
(`build-essential` provides g++, make, etc. `libasound2-dev` is for ALSA, and `portaudio19-dev` provides the PortAudio headers `naudiodon` needs.)
* **Fedora/RHEL/CentOS:**
```bash
# For newer Fedora (using dnf)
sudo dnf groupinstall "Development Tools"
sudo dnf install alsa-lib-devel portaudio-devel
# For older RHEL/CentOS (using yum)
sudo yum groupinstall "Development Tools"
sudo yum install alsa-lib-devel portaudio-devel
```
(`portaudio-devel` is the Fedora/RHEL equivalent of Debian's `portaudio19-dev`.)
### Windows
* **Visual Studio C++ Build Tools:** This is the recommended way.
1. Download the [Visual Studio Installer](https://visualstudio.microsoft.com/downloads/).
2. Run the installer and select "Desktop development with C++" under the "Workloads" tab. This will install the necessary C++ compiler, MSBuild, and Windows SDKs.
3. Ensure that Python is correctly configured for `node-gyp`. If you have multiple Python versions, you might need to tell `npm` which one to use (e.g., `npm config set python C:\path\to\python.exe`, as in the example after this list) or ensure your desired Python version is first in your system's PATH.
* **MSYS2/MinGW:** While possible, this can be more complex. You would need to compile/install PortAudio within the MSYS2 environment and ensure `node-gyp` is configured to use the MinGW toolchain. Using the Visual Studio C++ Build Tools is generally more straightforward for `node-gyp` on Windows.
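For example, if step 3 applies to you, pointing `npm`/`node-gyp` at a specific Python installation and retrying the build might look like this (the path is a placeholder; substitute your own):
```bash
# Tell node-gyp which Python to use (example path only), then retry the native build
npm config set python "C:\path\to\python.exe"
npm install
```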
### macOS
* **Xcode Command Line Tools:**
```bash
xcode-select --install
```
(This installs Clang, make, and other necessary build tools.)
* **PortAudio:**
```bash
brew install portaudio
```
(Homebrew is the easiest way to install PortAudio on macOS.)
* **pkg-config (if needed):**
```bash
brew install pkg-config
```
(Sometimes required for build scripts to find library information.)
If you see warnings or errors related to `naudiodon` during `npm install` and you *do not* intend to use the STT feature, these can typically be ignored. If you *do* want STT, ensure the above prerequisites are met.
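If you are unsure whether `naudiodon` built successfully, a quick runtime check (a minimal sketch, run from the project root after `npm install`) is:
```bash
# Tries to load naudiodon the same way the STT module does (via a dynamic import)
node -e "import('naudiodon').then(() => console.log('naudiodon loaded: STT available')).catch(e => console.log('naudiodon unavailable, STT will be disabled:', e.message))"
```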
## Install and Run
1. Make sure you have the requirements above. If you plan to use the STT (Speech-to-Text) feature, also review the "Installation Prerequisites" section regarding `naudiodon`.
2. Clone or download this repository (big green button)
3. Rename `keys.example.json` to `keys.json` and fill in your API keys (you only need one). The desired model is set in `andy.json` or other profiles. For other models refer to the table below.
4. In terminal/command prompt, run `npm install` from the installed directory. (Note: If `naudiodon` fails to build and you don't need STT, you can usually proceed.)
5. Start a Minecraft world and open it to LAN on localhost port `55916`
@@ -131,7 +196,7 @@ STT can be enabled in `settings.js` under the section that looks like this:
"stt_agent_name": ""
```
The Speech-to-Text engine will begin listening on the system default input device. **Note:** Successful STT operation depends on the `naudiodon` package, which is an optional dependency. If `naudiodon` failed to install or build (see "Installation Prerequisites" for troubleshooting), STT will be disabled.
When using STT, you **need** a [GroqCloud API key](https://console.groq.com/keys) as Groq is used for audio transcription.

View file

@@ -18,7 +18,6 @@
"mineflayer-collectblock": "^1.4.1",
"mineflayer-pathfinder": "^2.4.5",
"mineflayer-pvp": "^1.3.2",
"naudiodon": "^2.3.6",
"node-canvas-webgl": "PrismarineJS/node-canvas-webgl", "node-canvas-webgl": "PrismarineJS/node-canvas-webgl",
"openai": "^4.4.0", "openai": "^4.4.0",
"patch-package": "^8.0.0", "patch-package": "^8.0.0",
@@ -33,6 +32,9 @@
"wav": "^1.0.2",
"yargs": "^17.7.2"
},
"optionalDependencies": {
"naudiodon": "^2.3.6"
},
"scripts": { "scripts": {
"postinstall": "patch-package", "postinstall": "patch-package",
"start": "node main.js" "start": "node main.js"

View file

@@ -1,7 +1,7 @@
import settings from '../../settings.js';
import { GroqCloudTTS } from '../models/groq.js';
// import portAudio from 'naudiodon'; // Original static import
// const { AudioIO, SampleFormat16Bit } = portAudio; // Original destructuring
import wav from 'wav';
import fs from 'fs';
import path from 'path';
@@ -13,6 +13,40 @@ import { getIO, getAllInGameAgentNames } from '../server/mind_server.js';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
// --- Conditional Naudiodon Import ---
let portAudio;
let AudioIO;
let SampleFormat16Bit;
(async () => {
try {
const naudiodonModule = await import('naudiodon');
portAudio = naudiodonModule.default; // CommonJS modules often export functionality on 'default' when imported into ES modules
if (portAudio && typeof portAudio.AudioIO === 'function' && typeof portAudio.SampleFormat16Bit !== 'undefined') {
AudioIO = portAudio.AudioIO;
SampleFormat16Bit = portAudio.SampleFormat16Bit;
console.log('[STT] naudiodon loaded successfully.');
} else if (naudiodonModule.AudioIO && typeof naudiodonModule.SampleFormat16Bit !== 'undefined') {
// Fallback if 'default' is not used and properties are directly on the module
AudioIO = naudiodonModule.AudioIO;
SampleFormat16Bit = naudiodonModule.SampleFormat16Bit;
portAudio = naudiodonModule; // Assign the module itself to portAudio for consistency if needed elsewhere
console.log('[STT] naudiodon loaded successfully (direct properties).');
}
else {
throw new Error('AudioIO or SampleFormat16Bit not found in naudiodon module exports.');
}
} catch (err) {
console.warn(`[STT] Failed to load naudiodon, Speech-to-Text will be disabled. Error: ${err.message}`);
portAudio = null;
AudioIO = null;
SampleFormat16Bit = null;
}
// Initialize TTS after attempting to load naudiodon
initTTS();
})();
/**
* Delete leftover speech_*.wav from previous runs
*/
@@ -43,7 +77,7 @@ let sttRunning = false; // Ensures continuousLoop is started only once
async function recordAndTranscribeOnce() {
// If another recording is in progress, just skip
if (isRecording) {
console.log("[STT] Another recording is still in progress; skipping new record attempt.");
return null;
}
isRecording = true;
@@ -54,6 +88,14 @@ async function recordAndTranscribeOnce() {
sampleRate: SAMPLE_RATE,
bitDepth: BIT_DEPTH
});
// This is where AudioIO is crucial
if (!AudioIO || !SampleFormat16Bit) {
console.warn("[STT] AudioIO or SampleFormat16Bit not available. Cannot record audio.");
isRecording = false;
return null;
}
const ai = new AudioIO({
inOptions: {
channelCount: 1,
@@ -110,8 +152,10 @@ async function recordAndTranscribeOnce() {
});
ai.on('error', (err) => {
console.error("[STT] AudioIO error:", err);
cleanupListeners();
// Don't reject here, as continuousLoop should continue. Resolve with null.
resolve(null);
});
fileWriter.on('finish', async () => {
@@ -124,7 +168,7 @@ async function recordAndTranscribeOnce() {
const dataSize = stats.size - headerSize;
const duration = dataSize / (SAMPLE_RATE * (BIT_DEPTH / 8));
if (duration < 2.75) {
console.log("[STT] Audio too short (<2.75s); discarding.");
fs.unlink(outFile, () => {});
cleanupListeners();
return resolve(null);
@@ -144,7 +188,7 @@ async function recordAndTranscribeOnce() {
// Basic check for empty or whitespace
if (!text || !text.trim()) {
console.log("[STT] Transcription empty; discarding.");
cleanupListeners();
return resolve(null);
}
@@ -153,14 +197,14 @@ async function recordAndTranscribeOnce() {
// 1. Ensure at least one alphabetical character
if (!/[A-Za-z]/.test(text)) {
console.log("[STT] Transcription has no letters; discarding.");
cleanupListeners();
return resolve(null);
}
// 2. Check for gibberish repeated sequences
if (/([A-Za-z])\1{3,}/.test(text)) {
console.log("[STT] Transcription looks like gibberish; discarding.");
cleanupListeners();
return resolve(null);
}
@@ -171,12 +215,12 @@ async function recordAndTranscribeOnce() {
const allowedGreetings = new Set(["hi", "hello", "greetings", "hey"]);
if (letterCount < 8 && !allowedGreetings.has(normalizedText)) {
console.log("[STT] Transcription too short and not an allowed greeting; discarding.");
cleanupListeners();
return resolve(null);
}
console.log("[STT] Transcription:", text);
// Format message so it looks like: "[SERVER] message"
const finalMessage = `[${STT_USERNAME}] ${text}`;
@@ -195,17 +239,23 @@ async function recordAndTranscribeOnce() {
cleanupListeners();
resolve(text);
} catch (err) {
console.error("[STT] Error during transcription or sending message:", err);
fs.unlink(outFile, () => {}); // Attempt cleanup even on error
cleanupListeners();
reject(err); // Propagate error for continuousLoop to catch
}
});
ai.start();
function cleanupListeners() {
if (ai && typeof ai.removeAllListeners === 'function') {
ai.removeAllListeners('data');
ai.removeAllListeners('error');
}
if (fileWriter && typeof fileWriter.removeAllListeners === 'function') {
fileWriter.removeAllListeners('finish');
}
if (silenceTimer) clearTimeout(silenceTimer);
// release lock
@@ -218,30 +268,56 @@ async function recordAndTranscribeOnce() {
* Runs recording sessions sequentially, so only one at a time
*/
async function continuousLoop() {
// This check is now more critical as AudioIO might not be available
if (!AudioIO) {
console.warn("[STT] AudioIO not available. STT continuous loop cannot start.");
sttRunning = false; // Ensure this is marked as not running
return;
}
while (sttRunning) { // Check sttRunning to allow loop to terminate if STT is disabled later
try {
await recordAndTranscribeOnce();
} catch (err) {
// Errors from recordAndTranscribeOnce (like transcription errors) are caught here
console.error("[STT Error in continuousLoop]", err);
// Potentially add a longer delay or a backoff mechanism if errors are persistent
}
// short gap, but only if stt is still supposed to be running
if (sttRunning) {
await new Promise(res => setTimeout(res, 1000));
}
}
console.log("[STT] Continuous loop ended.");
}
export function initTTS() {
if (!settings.stt_transcription) {
console.log("[STT] STT transcription is disabled in settings.");
sttRunning = false; // Ensure it's marked as not running
return;
}
// This check is crucial: if AudioIO (from naudiodon) wasn't loaded, STT cannot run.
if (!AudioIO) {
console.warn("[STT] AudioIO is not available (naudiodon might have failed to load). STT functionality cannot be initialized.");
sttRunning = false; // Ensure sttRunning is false if it was somehow true
return;
}
if (sttRunning) {
console.log("[STT] STT loop already running; skipping re-init.");
return;
}
console.log("[STT] Initializing STT...");
sttRunning = true; // Set before starting the loop
continuousLoop().catch((err) => {
console.error("[STT] continuousLoop crashed unexpectedly:", err);
sttRunning = false; // Mark as not running if it crashes
});
}
// Moved initTTS() call into the async IIFE after naudiodon import attempt.
// initTTS();