Merge pull request #9 from Sweaterdog/vision-logging-enhancements

Fix: Make naudiodon optional and document prerequisites
Author: Sweaterdog · 2025-06-07 16:01:47 -07:00 · committed via GitHub · commit 98b9284b44
3 changed files with 394 additions and 251 deletions

**README.md**

@@ -14,15 +14,80 @@ Do not connect this bot to public servers with coding enabled. This project allo
- [Node.js Installed](https://nodejs.org/) (at least v14)
- One of these: [OpenAI API Key](https://openai.com/blog/openai-api) | [Gemini API Key](https://aistudio.google.com/app/apikey) | [Anthropic API Key](https://docs.anthropic.com/claude/docs/getting-access-to-claude) | [Replicate API Key](https://replicate.com/) | [Hugging Face API Key](https://huggingface.co/) | [Groq API Key](https://console.groq.com/keys) | [Ollama Installed](https://ollama.com/download) | [Mistral API Key](https://docs.mistral.ai/getting-started/models/models_overview/) | [Qwen API Key [Intl.]](https://www.alibabacloud.com/help/en/model-studio/developer-reference/get-api-key)/[[cn]](https://help.aliyun.com/zh/model-studio/getting-started/first-api-call-to-qwen?) | [Novita AI API Key](https://novita.ai/settings?utm_source=github_mindcraft&utm_medium=github_readme&utm_campaign=link#key-management)
## Installation Prerequisites
### `naudiodon` for Speech-to-Text (STT)
The STT (Speech-to-Text) functionality in Mindcraft uses the `naudiodon` package for audio input. `naudiodon` is a native Node.js addon and might require additional steps to compile correctly during `npm install`.
**`naudiodon` is an optional dependency.** This means:
* If `naudiodon` fails to install or build, the core Mindcraft application will still run.
* However, the Speech-to-Text (STT) feature is automatically disabled when `naudiodon` is not available; you will see warnings in the console if it fails to load. The fallback mechanism is sketched below.
* If you wish to use STT and encounter build issues with `naudiodon`, ensure you have the build tools and libraries listed below for your operating system.
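
As a condensed sketch of how this fallback works (the full implementation is in the module diff at the bottom of this page): `naudiodon` is loaded with a dynamic `import()` inside a try/catch instead of a static import, so a failed build degrades gracefully rather than crashing startup:

```js
// Minimal sketch of the fallback pattern used by this change; the real
// module also handles non-default exports and starts STT afterwards.
let AudioIO = null;

(async () => {
    try {
        // A dynamic import can fail recoverably, unlike a static
        // `import portAudio from 'naudiodon'` at the top of the file.
        const naudiodon = await import('naudiodon');
        AudioIO = (naudiodon.default ?? naudiodon).AudioIO;
    } catch (err) {
        console.warn(`[STT] naudiodon unavailable, STT disabled: ${err.message}`);
    }
})();
```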
**General Requirements for Building `naudiodon`:**
* **Node.js:** Ensure Node.js (v14+) is properly installed and added to your system's PATH.
* **Python:** `node-gyp` (the tool used to build native addons like `naudiodon`) requires Python. Recent versions of `node-gyp` are compatible with Python 3.x. Make sure Python is installed and accessible.
* **C++ Compiler Toolchain:** A C++ compiler (like g++ or MSVC) and related build tools (like `make` or MSBuild) are necessary.
* **PortAudio Library:** `naudiodon` specifically requires the PortAudio library.
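
A quick way to confirm these tools are visible on your PATH before running `npm install` (POSIX shell shown; on Windows, run the equivalent checks from a Visual Studio Developer Command Prompt):

```bash
node --version     # should report v14 or newer
python3 --version  # node-gyp needs a Python 3.x interpreter
g++ --version      # C++ compiler (on Windows, MSVC's cl.exe instead)
make --version     # build tool (on Windows, MSBuild instead)
```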
**Operating System Specifics for `PortAudio` (and the `naudiodon` build):**
### Linux
* **Debian/Ubuntu:**
```bash
sudo apt-get update
sudo apt-get install build-essential libasound2-dev portaudio19-dev
```
(`build-essential` provides g++, make, etc. `libasound2-dev` is for ALSA, and `portaudio19-dev` supplies the PortAudio headers that `naudiodon` needs.)
* **Fedora/RHEL/CentOS:**
```bash
# For newer Fedora (using dnf)
sudo dnf groupinstall "Development Tools"
sudo dnf install alsa-lib-devel portaudio-devel
# For older RHEL/CentOS (using yum)
sudo yum groupinstall "Development Tools"
sudo yum install alsa-lib-devel portaudio-devel
```
(`portaudio-devel` is the Fedora/RHEL equivalent of Debian's `portaudio19-dev`.)
### Windows
* **Visual Studio C++ Build Tools:** This is the recommended way.
1. Download the [Visual Studio Installer](https://visualstudio.microsoft.com/downloads/).
2. Run the installer and select "Desktop development with C++" under the "Workloads" tab. This will install the necessary C++ compiler, MSBuild, and Windows SDKs.
3. Ensure that Python is correctly configured for `node-gyp`. If you have multiple Python versions, you might need to tell `npm` which one to use (e.g., `npm config set python C:\path\to\python.exe`) or ensure your desired Python version is first in your system's PATH.
* **MSYS2/MinGW:** While possible, this can be more complex. You would need to compile/install PortAudio within the MSYS2 environment and ensure `node-gyp` is configured to use the MinGW toolchain. Using the Visual Studio C++ Build Tools is generally more straightforward for `node-gyp` on Windows.
### macOS
* **Xcode Command Line Tools:**
```bash
xcode-select --install
```
(This installs Clang, make, and other necessary build tools.)
* **PortAudio:**
```bash
brew install portaudio
```
(Homebrew is the easiest way to install PortAudio on macOS.)
* **pkg-config (if needed):**
```bash
brew install pkg-config
```
(Sometimes required for build scripts to find library information.)
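
If a build still cannot locate PortAudio, `pkg-config` can confirm whether the library is visible; PortAudio installs its metadata under the module name `portaudio-2.0`:

```bash
pkg-config --modversion portaudio-2.0     # prints a version (e.g. 19) if found
pkg-config --cflags --libs portaudio-2.0  # the flags build scripts will use
```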
If you see warnings or errors related to `naudiodon` during `npm install` and you *do not* intend to use the STT feature, these can typically be ignored. If you *do* want STT, ensure the above prerequisites are met.
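
If you would rather skip optional dependencies entirely (and therefore STT), npm supports this directly:

```bash
npm install --omit=optional   # npm 8+; use --no-optional on older npm versions
```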
## Install and Run

1. Make sure you have the requirements above. If you plan to use the STT (Speech-to-Text) feature, also review the "Installation Prerequisites" section regarding `naudiodon`.
2. Clone or download this repository (big green button)
3. Rename `keys.example.json` to `keys.json` and fill in your API keys (you only need one). The desired model is set in `andy.json` or other profiles. For other models refer to the table below.
4. In terminal/command prompt, run `npm install` from the installed directory. (Note: if `naudiodon` fails to build and you don't need STT, you can usually proceed.)
5. Start a Minecraft world and open it to LAN on localhost port `55916`
@@ -131,7 +196,7 @@ STT can be enabled in `settings.js` under the section that looks like this:
"stt_agent_name": "" "stt_agent_name": ""
``` ```
The Text to Speech engine will begin listening on the system default input device. The Text to Speech engine will begin listening on the system default input device. **Note:** Successful STT operation depends on the `naudiodon` package, which is an optional dependency. If `naudiodon` failed to install or build (see "Installation Prerequisites" for troubleshooting), STT will be disabled.
When using STT, you **need** a [GroqCloud API key](https://console.groq.com/keys) as Groq is used for Audio transcription When using STT, you **need** a [GroqCloud API key](https://console.groq.com/keys) as Groq is used for Audio transcription
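
For reference, the STT module reads three keys from `settings.js`: `stt_transcription`, `stt_username`, and `stt_agent_name`. A plausible full block, with illustrative values based on the module's defaults (only `stt_agent_name` is shown in the diff above):

```js
// Illustrative values -- key names match what the STT module reads.
"stt_transcription": true,  // master switch checked by initTTS()
"stt_username": "SERVER",   // sender name; the module defaults to "SERVER"
"stt_agent_name": ""        // blank => broadcast transcriptions to all agents
```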

**package.json**

```diff
@@ -18,7 +18,6 @@
     "mineflayer-collectblock": "^1.4.1",
     "mineflayer-pathfinder": "^2.4.5",
     "mineflayer-pvp": "^1.3.2",
-    "naudiodon": "^2.3.6",
     "node-canvas-webgl": "PrismarineJS/node-canvas-webgl",
     "openai": "^4.4.0",
     "patch-package": "^8.0.0",
@@ -33,6 +32,9 @@
     "wav": "^1.0.2",
     "yargs": "^17.7.2"
   },
+  "optionalDependencies": {
+    "naudiodon": "^2.3.6"
+  },
   "scripts": {
     "postinstall": "patch-package",
     "start": "node main.js"
```

**Speech-to-text module** (247 → 323 lines; rewritten so `naudiodon` is loaded dynamically instead of via a static import):
```js
import settings from '../../settings.js';
import { GroqCloudTTS } from '../models/groq.js';
// import portAudio from 'naudiodon'; // Original static import
// const { AudioIO, SampleFormat16Bit } = portAudio; // Original destructuring
import wav from 'wav';
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
// Import getIO and our new function getAllInGameAgentNames
import { getIO, getAllInGameAgentNames } from '../server/mind_server.js';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// --- Conditional Naudiodon Import ---
let portAudio;
let AudioIO;
let SampleFormat16Bit;

(async () => {
    try {
        const naudiodonModule = await import('naudiodon');
        portAudio = naudiodonModule.default; // CommonJS modules often export functionality on 'default' when imported into ES modules
        if (portAudio && typeof portAudio.AudioIO === 'function' && typeof portAudio.SampleFormat16Bit !== 'undefined') {
            AudioIO = portAudio.AudioIO;
            SampleFormat16Bit = portAudio.SampleFormat16Bit;
            console.log('[STT] naudiodon loaded successfully.');
        } else if (naudiodonModule.AudioIO && typeof naudiodonModule.SampleFormat16Bit !== 'undefined') {
            // Fallback if 'default' is not used and properties are directly on the module
            AudioIO = naudiodonModule.AudioIO;
            SampleFormat16Bit = naudiodonModule.SampleFormat16Bit;
            portAudio = naudiodonModule; // Assign the module itself to portAudio for consistency if needed elsewhere
            console.log('[STT] naudiodon loaded successfully (direct properties).');
        } else {
            throw new Error('AudioIO or SampleFormat16Bit not found in naudiodon module exports.');
        }
    } catch (err) {
        console.warn(`[STT] Failed to load naudiodon, Speech-to-Text will be disabled. Error: ${err.message}`);
        portAudio = null;
        AudioIO = null;
        SampleFormat16Bit = null;
    }
    // Initialize TTS after attempting to load naudiodon
    initTTS();
})();

/**
 * Delete leftover speech_*.wav from previous runs
 */
const leftover = fs.readdirSync(__dirname).filter(f => /^speech_\d+\.wav$/.test(f));
for (const file of leftover) {
    try {
        fs.unlinkSync(path.join(__dirname, file));
    } catch (_) {
        // ignore errors
    }
}

// Configuration
const RMS_THRESHOLD = 500;     // Lower threshold for faint audio
const SILENCE_DURATION = 2000; // 2 seconds of silence after speech => stop
const SAMPLE_RATE = 16000;
const BIT_DEPTH = 16;
const STT_USERNAME = settings.stt_username || "SERVER"; // Name that appears as sender
const STT_AGENT_NAME = settings.stt_agent_name || "";   // If blank, broadcast to all

// Guards to prevent multiple overlapping recordings
let isRecording = false; // Ensures only one recordAndTranscribeOnce at a time
let sttRunning = false;  // Ensures continuousLoop is started only once

/**
 * Records one session, transcribes, and sends to MindServer as a chat message
 */
async function recordAndTranscribeOnce() {
    // If another recording is in progress, just skip
    if (isRecording) {
        console.log("[STT] Another recording is still in progress; skipping new record attempt.");
        return null;
    }
    isRecording = true;

    const outFile = path.join(__dirname, `speech_${Date.now()}.wav`);
    const fileWriter = new wav.FileWriter(outFile, {
        channels: 1,
        sampleRate: SAMPLE_RATE,
        bitDepth: BIT_DEPTH
    });

    // This is where AudioIO is crucial
    if (!AudioIO || !SampleFormat16Bit) {
        console.warn("[STT] AudioIO or SampleFormat16Bit not available. Cannot record audio.");
        isRecording = false;
        return null;
    }

    const ai = new AudioIO({
        inOptions: {
            channelCount: 1,
            sampleFormat: SampleFormat16Bit,
            sampleRate: SAMPLE_RATE,
            deviceId: -1,
            closeOnError: true
        }
    });

    let recording = true;
    let hasHeardSpeech = false;
    let silenceTimer = null;
    let finished = false; // Guard to ensure final processing is done only once

    // Helper to reset silence timer
    function resetSilenceTimer() {
        if (silenceTimer) clearTimeout(silenceTimer);
        if (hasHeardSpeech) {
            silenceTimer = setTimeout(() => stopRecording(), SILENCE_DURATION);
        }
    }

    // Stop recording
    function stopRecording() {
        if (!recording) return;
        recording = false;
        ai.quit();
        fileWriter.end();
    }

    // We wrap everything in a promise so we can await the transcription
    return new Promise((resolve, reject) => {
        // Attach event handlers
        ai.on('data', (chunk) => {
            fileWriter.write(chunk);

            // Calculate RMS for threshold detection
            let sumSquares = 0;
            const sampleCount = chunk.length / 2;
            for (let i = 0; i < chunk.length; i += 2) {
                const sample = chunk.readInt16LE(i);
                sumSquares += sample * sample;
            }
            const rms = Math.sqrt(sumSquares / sampleCount);

            // If RMS passes threshold, we've heard speech
            if (rms > RMS_THRESHOLD) {
                if (!hasHeardSpeech) {
                    hasHeardSpeech = true;
                }
                resetSilenceTimer();
            }
        });

        ai.on('error', (err) => {
            console.error("[STT] AudioIO error:", err);
            cleanupListeners();
            // Don't reject here, as continuousLoop should continue. Resolve with null.
            resolve(null);
        });

        fileWriter.on('finish', async () => {
            if (finished) return;
            finished = true;
            try {
                // Check audio duration
                const stats = fs.statSync(outFile);
                const headerSize = 44; // standard WAV header size
                const dataSize = stats.size - headerSize;
                const duration = dataSize / (SAMPLE_RATE * (BIT_DEPTH / 8));
                if (duration < 2.75) {
                    console.log("[STT] Audio too short (<2.75s); discarding.");
                    fs.unlink(outFile, () => {});
                    cleanupListeners();
                    return resolve(null);
                }

                // Transcribe
                const groqTTS = new GroqCloudTTS();
                const text = await groqTTS.transcribe(outFile, {
                    model: "distil-whisper-large-v3-en",
                    prompt: "",
                    response_format: "json",
                    language: "en",
                    temperature: 0.0
                });

                fs.unlink(outFile, () => {}); // cleanup WAV file

                // Basic check for empty or whitespace
                if (!text || !text.trim()) {
                    console.log("[STT] Transcription empty; discarding.");
                    cleanupListeners();
                    return resolve(null);
                }

                // Heuristic checks to determine if the transcription is genuine

                // 1. Ensure at least one alphabetical character
                if (!/[A-Za-z]/.test(text)) {
                    console.log("[STT] Transcription has no letters; discarding.");
                    cleanupListeners();
                    return resolve(null);
                }

                // 2. Check for gibberish repeated sequences
                if (/([A-Za-z])\1{3,}/.test(text)) {
                    console.log("[STT] Transcription looks like gibberish; discarding.");
                    cleanupListeners();
                    return resolve(null);
                }

                // 3. Check transcription length, with allowed greetings
                const letterCount = text.replace(/[^A-Za-z]/g, "").length;
                const normalizedText = text.trim().toLowerCase();
                const allowedGreetings = new Set(["hi", "hello", "greetings", "hey"]);

                if (letterCount < 8 && !allowedGreetings.has(normalizedText)) {
                    console.log("[STT] Transcription too short and not an allowed greeting; discarding.");
                    cleanupListeners();
                    return resolve(null);
                }

                console.log("[STT] Transcription:", text);

                // Format message so it looks like: "[SERVER] message"
                const finalMessage = `[${STT_USERNAME}] ${text}`;

                // If STT_AGENT_NAME is empty, broadcast to all agents
                if (!STT_AGENT_NAME.trim()) {
                    const agentNames = getAllInGameAgentNames(); // from mind_server
                    for (const agentName of agentNames) {
                        getIO().emit('send-message', agentName, finalMessage);
                    }
                } else {
                    // Otherwise, send only to the specified agent
                    getIO().emit('send-message', STT_AGENT_NAME, finalMessage);
                }

                cleanupListeners();
                resolve(text);
            } catch (err) {
                console.error("[STT] Error during transcription or sending message:", err);
                fs.unlink(outFile, () => {}); // Attempt cleanup even on error
                cleanupListeners();
                reject(err); // Propagate error for continuousLoop to catch
            }
        });

        ai.start();

        function cleanupListeners() {
            if (ai && typeof ai.removeAllListeners === 'function') {
                ai.removeAllListeners('data');
                ai.removeAllListeners('error');
            }
            if (fileWriter && typeof fileWriter.removeAllListeners === 'function') {
                fileWriter.removeAllListeners('finish');
            }
            if (silenceTimer) clearTimeout(silenceTimer);

            // release lock
            isRecording = false;
        }
    });
}

/**
 * Runs recording sessions sequentially, so only one at a time
 */
async function continuousLoop() {
    // This check is now more critical as AudioIO might not be available
    if (!AudioIO) {
        console.warn("[STT] AudioIO not available. STT continuous loop cannot start.");
        sttRunning = false; // Ensure this is marked as not running
        return;
    }
    while (sttRunning) { // Check sttRunning to allow loop to terminate if STT is disabled later
        try {
            await recordAndTranscribeOnce();
        } catch (err) {
            // Errors from recordAndTranscribeOnce (like transcription errors) are caught here
            console.error("[STT Error in continuousLoop]", err);
            // Potentially add a longer delay or a backoff mechanism if errors are persistent
        }
        // short gap, but only if stt is still supposed to be running
        if (sttRunning) {
            await new Promise(res => setTimeout(res, 1000));
        }
    }
    console.log("[STT] Continuous loop ended.");
}

export function initTTS() {
    if (!settings.stt_transcription) {
        console.log("[STT] STT transcription is disabled in settings.");
        sttRunning = false; // Ensure it's marked as not running
        return;
    }
    // This check is crucial: if AudioIO (from naudiodon) wasn't loaded, STT cannot run.
    if (!AudioIO) {
        console.warn("[STT] AudioIO is not available (naudiodon might have failed to load). STT functionality cannot be initialized.");
        sttRunning = false; // Ensure sttRunning is false if it was somehow true
        return;
    }
    if (sttRunning) {
        console.log("[STT] STT loop already running; skipping re-init.");
        return;
    }
    console.log("[STT] Initializing STT...");
    sttRunning = true; // Set before starting the loop
    continuousLoop().catch((err) => {
        console.error("[STT] continuousLoop crashed unexpectedly:", err);
        sttRunning = false; // Mark as not running if it crashes
    });
}

// Moved initTTS() call into the async IIFE after naudiodon import attempt.
// initTTS();
```