Mindcraft 🧠⛏️

Crafting minds for Minecraft with LLMs and Mineflayer!

FAQ | Discord Support | Video Tutorial | Blog Post | Contributor TODO | Paper Website | MineCollab

Caution

Do not connect this bot to public servers with coding enabled. This project allows an LLM to write/execute code on your computer. The code is sandboxed, but still vulnerable to injection attacks. Code writing is disabled by default; you can enable it by setting allow_insecure_coding to true in settings.js. Ye be warned.
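
For reference, the relevant entry in settings.js looks roughly like this (a minimal sketch; the surrounding options are omitted and may differ in your copy):

    "allow_insecure_coding": false, // set to true to let the bot write and execute code (at your own risk)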

Requirements

Installation Prerequisites

naudiodon for Speech-to-Text (STT)

The STT (Speech-to-Text) functionality in Mindcraft uses the naudiodon package for audio input. naudiodon is a native Node.js addon and might require additional steps to compile correctly during npm install.

naudiodon is an optional dependency. This means:

  • If naudiodon fails to install or build, the core Mindcraft application will still run.
  • However, the Speech-to-Text (STT) feature will be automatically disabled if naudiodon is not available. You will see warnings in the console if it fails to load.
  • If you wish to use STT and encounter build issues with naudiodon, please ensure you have the necessary build tools and libraries listed below for your operating system.

General Requirements for Building naudiodon:

  • Node.js: Ensure Node.js (v14+) is properly installed and added to your system's PATH.
  • Python: node-gyp (the tool used to build native addons like naudiodon) requires Python. Recent versions of node-gyp are compatible with Python 3.x. Make sure Python is installed and accessible.
  • C++ Compiler Toolchain: A C++ compiler (like g++ or MSVC) and related build tools (like make or MSBuild) are necessary.
  • PortAudio Library: naudiodon specifically requires the PortAudio library.

Operating System Specifics for PortAudio (and naudiodon build):

Linux

  • Debian/Ubuntu:

    sudo apt-get update
    sudo apt-get install build-essential libasound2-dev portaudio19-dev
    

    (build-essential provides g++, make, etc.; libasound2-dev is for ALSA; and portaudio19-dev provides the PortAudio headers naudiodon needs.)

  • Fedora/RHEL/CentOS:

    # For newer Fedora (using dnf)
    sudo dnf groupinstall "Development Tools"
    sudo dnf install alsa-lib-devel portaudio-devel
    
    # For older RHEL/CentOS (using yum)
    sudo yum groupinstall "Development Tools"
    sudo yum install alsa-lib-devel portaudio-devel
    

    (portaudio-devel is the Fedora/RHEL equivalent of Debian's portaudio19-dev.)

Windows

  • Visual Studio C++ Build Tools: This is the recommended way.
    1. Download the Visual Studio Installer.
    2. Run the installer and select "Desktop development with C++" under the "Workloads" tab. This will install the necessary C++ compiler, MSBuild, and Windows SDKs.
    3. Ensure that Python is correctly configured for node-gyp. If you have multiple Python versions, you might need to tell npm which one to use (e.g., npm config set python C:\path\to\python.exe) or ensure your desired Python version is first in your system's PATH.
  • MSYS2/MinGW: While possible, this can be more complex. You would need to compile/install PortAudio within the MSYS2 environment and ensure node-gyp is configured to use the MinGW toolchain. Using the Visual Studio C++ Build Tools is generally more straightforward for node-gyp on Windows.

macOS

  • Xcode Command Line Tools:
    xcode-select --install
    
    (This installs Clang, make, and other necessary build tools.)
  • PortAudio:
    brew install portaudio
    
    (Homebrew is the easiest way to install PortAudio on macOS.)
  • pkg-config (if needed):
    brew install pkg-config
    
    (Sometimes required for build scripts to find library information.)

If you see warnings or errors related to naudiodon during npm install and you do not intend to use the STT feature, these can typically be ignored. If you do want STT, ensure the above prerequisites are met.

Install and Run

  1. Make sure you have the requirements above. If you plan to use the STT (Speech-to-Text) feature, also review the "Installation Prerequisites" section regarding naudiodon.

  2. Clone or download this repository (big green button)

  3. Rename keys.example.json to keys.json and fill in your API keys (you only need one; an illustrative keys.json is shown after this list). The desired model is set in andy.json or other profiles. For other models refer to the table below.

  4. In terminal/command prompt, run npm install from the installed directory. (Note: If naudiodon fails to build and you don't need STT, you can usually proceed.)

  5. Start a minecraft world and open it to LAN on localhost port 55916

  6. Run node main.js from the installed directory
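
As an illustration, a keys.json with a single key filled in could look like the sketch below (the value is a placeholder; the key names match the Config Variable column in the model table further down, and keys.example.json lists the full set):

    {
      "OPENAI_API_KEY": "sk-..."
    }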

If you encounter issues, check the FAQ or find support on Discord. We are currently not very responsive to GitHub issues.

Tasks

Bot performance can be roughly evaluated with Tasks. Tasks automatically initialize bots with a goal to acquire specific items or construct predefined buildings, and remove the bot once the goal is achieved.

To run tasks, you need python, pip, and optionally conda. You can then install dependencies with pip install -r requirements.txt.

Tasks are defined in json files in the tasks folder, and can be run with: python tasks/run_task_file.py --task_path=tasks/example_tasks.json

For full evaluations, you will need to download and install the task suite. Full instructions.

Model Customization

You can configure project details in settings.js. See file.

You can configure the agent's name, model, and prompts in their profile like andy.json with the model field. For comprehensive details, see Model Specifications.

API | Config Variable | Example Model name | Docs
openai | OPENAI_API_KEY | gpt-4o-mini | docs
google | GEMINI_API_KEY | gemini-2.0-flash | docs
anthropic | ANTHROPIC_API_KEY | claude-3-haiku-20240307 | docs
xai | XAI_API_KEY | grok-2-1212 | docs
deepseek | DEEPSEEK_API_KEY | deepseek-chat | docs
ollama (local) | n/a | ollama/sweaterdog/andy-4 | docs
qwen | QWEN_API_KEY | qwen-max | Intl./cn
mistral | MISTRAL_API_KEY | mistral-large-latest | docs
replicate | REPLICATE_API_KEY | replicate/meta/meta-llama-3-70b-instruct | docs
groq (not grok) | GROQCLOUD_API_KEY | groq/mixtral-8x7b-32768 | docs
huggingface | HUGGINGFACE_API_KEY | huggingface/mistralai/Mistral-Nemo-Instruct-2407 | docs
novita | NOVITA_API_KEY | novita/deepseek/deepseek-r1 | docs
openrouter | OPENROUTER_API_KEY | openrouter/anthropic/claude-3.5-sonnet | docs
glhf.chat | GHLF_API_KEY | glhf/hf:meta-llama/Llama-3.1-405B-Instruct | docs
hyperbolic | HYPERBOLIC_API_KEY | hyperbolic/deepseek-ai/DeepSeek-V3 | docs
vllm | n/a | vllm/llama3 | n/a

If you use Ollama, install the models used by default (one for generation, one for embedding) with the following terminal command: ollama pull sweaterdog/andy-4 && ollama pull nomic-embed-text

Additional info about Andy-4...


Andy-4 is a community-made, open-source model created by Sweaterdog to play Minecraft. Because Andy-4 is open-source, you can download the model and play with it offline and for free.

The Andy-4 collection of models has reasoning and non-reasoning modes; sometimes the model will reason automatically without being prompted. If you want to specifically enable reasoning, use the andy-4-reasoning.json profile. Some Andy-4 models may not be able to disable reasoning, no matter what profile is used.

Andy-4 comes in many different models and sizes. For more information about which model size is best for you, check Sweaterdog's Ollama page.

If you have any issues, join the Mindcraft Discord server and ping @Sweaterdog with your issue, or open an issue on the Andy-4 Hugging Face repo.

Online Servers

To connect to online servers your bot will need an official Microsoft/Minecraft account. You can use your own personal account, but you will need a second account if you want to join alongside the bot and play with it. To connect, change these lines in settings.js:

"host": "111.222.333.444",
"port": 55920,
"auth": "microsoft",

// rest is same...

Important

The bot's name in the profile.json must exactly match the Minecraft profile name! Otherwise the bot will spam talk to itself.

Mindcraft will connect with whichever account the Minecraft launcher is currently using. To use a different account, switch accounts in the launcher, run node main.js, and then switch back to your main account after the bot has connected.

Docker Container

If you intend to allow_insecure_coding, it is a good idea to run the app in a docker container to reduce risks of running unknown code. This is strongly recommended before connecting to remote servers.

docker run -i -t --rm -v $(pwd):/app -w /app -p 3000-3003:3000-3003 node:latest node main.js

or simply

docker-compose up

When running in docker, if you want the bot to join your local minecraft server, you have to use a special host address host.docker.internal to call your localhost from inside your docker container. Put this into your settings.js:

"host": "host.docker.internal", // instead of "localhost", to join your local minecraft from inside the docker container

To connect to an unsupported minecraft version, you can try to use viaproxy

STT in Mindcraft

STT allows you to speak to the model if you have a microphone.

STT can be enabled in settings.js under the section that looks like this:

    "stt_transcription": true, // Change this to "true" to enable STT
    "stt_username": "SYSTEM",
    "stt_agent_name": ""

The Speech-to-Text engine will begin listening on the system's default input device. Note: Successful STT operation depends on the naudiodon package, which is an optional dependency. If naudiodon failed to install or build (see "Installation Prerequisites" for troubleshooting), STT will be disabled.

When using STT, you need a GroqCloud API key, as Groq is used for audio transcription.
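
In practice that means keys.json needs a GROQCLOUD_API_KEY entry alongside whatever chat-model key you use, for example (placeholder values):

    {
      "OPENAI_API_KEY": "sk-...",
      "GROQCLOUD_API_KEY": "..."
    }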

Bot Profiles

Bot profiles are json files (such as andy.json) that define the following (a minimal sketch follows the list):

  1. Bot backend LLMs to use for talking, coding, and embedding.
  2. Prompts used to influence the bot's behavior.
  3. Examples that help the bot perform tasks.
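
A minimal profile could therefore look like the sketch below. The model field is documented under Model Specifications; the name field and the values shown are illustrative, and andy.json contains the full set of prompts and examples:

    {
      "name": "andy",
      "model": "gpt-4o-mini"
    }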

Model Specifications

LLM models can be specified simply as "model": "gpt-4o". However, you can use different models for chat, coding, and embeddings. You can pass a string or an object for these fields. A model object must specify an api, and optionally a model, url, and additional params.

"model": {
  "api": "openai",
  "model": "gpt-4o",
  "url": "https://api.openai.com/v1/",
  "params": {
    "max_tokens": 1000,
    "temperature": 1
  }
},
"code_model": {
  "api": "openai",
  "model": "gpt-4",
  "url": "https://api.openai.com/v1/"
},
"vision_model": {
  "api": "openai",
  "model": "gpt-4o",
  "url": "https://api.openai.com/v1/"
},
"embedding": {
  "api": "openai",
  "url": "https://api.openai.com/v1/",
  "model": "text-embedding-ada-002"
}

model is used for chat, code_model is used for newAction coding, vision_model is used for image interpretation, and embedding is used to embed text for example selection. If code_model or vision_model is not specified, model will be used by default. Not all APIs support embeddings or vision.

All apis have default models and urls, so those fields are optional. The params field is optional and can be used to specify additional parameters for the model. It accepts any key-value pairs supported by the api. params is not supported for embedding models.
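
For example, because every api has a default model and url, a bare object with just the api is valid (a sketch; which default model is picked is determined by the code for that api):

    "model": {
      "api": "openai"
    }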

Embedding Models

Embedding models are used to embed and efficiently select relevant examples for conversation and coding.

Supported Embedding APIs: openai, google, replicate, huggingface, novita

If you try to use an unsupported embedding model, it will default to a simple word-overlap method. Expect reduced performance; we recommend mixing APIs to ensure embedding support.
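
For example, a profile can keep a local chat model while pointing the embedding at a supported API (a sketch; swap in whatever models you actually run):

    "model": "ollama/sweaterdog/andy-4",
    "embedding": {
      "api": "openai"
    }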

Specifying Profiles via Command Line

By default, the program will use the profiles specified in settings.js. You can specify one or more agent profiles using the --profiles argument: node main.js --profiles ./profiles/andy.json ./profiles/jill.json

Patches

Some of the node modules that we depend on have bugs in them. To add a patch, change your local node module file and run npx patch-package [package-name]

Citation:

@article{mindcraft2025,
  title = {Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning},
  author = {White*, Isadora and Nottingham*, Kolby and Maniar, Ayush and Robinson, Max and Lillemark, Hansen and Maheshwari, Mehul and Qin, Lianhui and Ammanabrolu, Prithviraj},
  journal = {arXiv preprint arXiv:2504.17950},
  year = {2025},
  url = {https://arxiv.org/abs/2504.17950},
}