Mindcraft 🧠⛏️

Crafting minds for Minecraft with LLMs and Mineflayer!

FAQ | Discord Support | Video Tutorial | Blog Post | Contributor TODO | Paper Website | MineCollab

Caution

Do not connect this bot to public servers with coding enabled. This project allows an LLM to write/execute code on your computer. The code is sandboxed, but still vulnerable to injection attacks. Code writing is disabled by default; you can enable it by setting allow_insecure_coding to true in settings.js. Ye be warned.
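
For reference, the relevant entry in settings.js looks roughly like this (a minimal sketch; the surrounding options are omitted and may differ in your copy):

    "allow_insecure_coding": false, // set to true to let the bot write and execute code (at your own risk)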

Requirements

Installation Prerequisites

naudiodon for Speech-to-Text (STT)

The STT (Speech-to-Text) functionality in Mindcraft uses the naudiodon package for audio input. naudiodon is a native Node.js addon and might require additional steps to compile correctly during npm install.

naudiodon is an optional dependency. This means:

  • If naudiodon fails to install or build, the core Mindcraft application will still run.
  • However, the Speech-to-Text (STT) feature will be automatically disabled if naudiodon is not available. You will see warnings in the console if it fails to load.
  • If you wish to use STT and encounter build issues with naudiodon, please ensure you have the necessary build tools and libraries listed below for your operating system.

General Requirements for Building naudiodon:

  • Node.js: Ensure Node.js (v14+) is properly installed and added to your system's PATH.
  • Python: node-gyp (the tool used to build native addons like naudiodon) requires Python. Recent versions of node-gyp are compatible with Python 3.x. Make sure Python is installed and accessible.
  • C++ Compiler Toolchain: A C++ compiler (like g++ or MSVC) and related build tools (like make or MSBuild) are necessary.
  • PortAudio Library: naudiodon specifically requires the PortAudio library.

Operating System Specifics for PortAudio (and naudiodon build):

Linux

  • Debian/Ubuntu:

    sudo apt-get update
    sudo apt-get install build-essential libasound2-dev portaudio19-dev
    

    (build-essential provides g++, make, etc.; libasound2-dev is for ALSA; and portaudio19-dev provides the PortAudio headers naudiodon needs.)

  • Fedora/RHEL/CentOS:

    # For newer Fedora (using dnf)
    sudo dnf groupinstall "Development Tools"
    sudo dnf install alsa-lib-devel portaudio-devel
    
    # For older RHEL/CentOS (using yum)
    sudo yum groupinstall "Development Tools"
    sudo yum install alsa-lib-devel portaudio-devel
    

    (portaudio-devel is the Fedora/RHEL equivalent of Debian's portaudio19-dev.)

Windows

  • Visual Studio C++ Build Tools: This is the recommended way.
    1. Download the Visual Studio Installer.
    2. Run the installer and select "Desktop development with C++" under the "Workloads" tab. This will install the necessary C++ compiler, MSBuild, and Windows SDKs.
    3. Ensure that Python is correctly configured for node-gyp. If you have multiple Python versions, you might need to tell npm which one to use (e.g., npm config set python C:\path\to\python.exe) or ensure your desired Python version is first in your system's PATH.
  • MSYS2/MinGW: While possible, this can be more complex. You would need to compile/install PortAudio within the MSYS2 environment and ensure node-gyp is configured to use the MinGW toolchain. Using the Visual Studio C++ Build Tools is generally more straightforward for node-gyp on Windows.

macOS

  • Xcode Command Line Tools:
    xcode-select --install
    
    (This installs Clang, make, and other necessary build tools.)
  • PortAudio:
    brew install portaudio
    
    (Homebrew is the easiest way to install PortAudio on macOS.)
  • pkg-config (if needed):
    brew install pkg-config
    
    (Sometimes required for build scripts to find library information.)

If you see warnings or errors related to naudiodon during npm install and you do not intend to use the STT feature, these can typically be ignored. If you do want STT, ensure the above prerequisites are met.

Install and Run

  1. Make sure you have the requirements above. If you plan to use the STT (Speech-to-Text) feature, also review the "Installation Prerequisites" section regarding naudiodon.

  2. Clone or download this repository (big green button)

  3. Rename keys.example.json to keys.json and fill in your API keys (you only need one; an illustrative keys.json is shown after this list). The desired model is set in andy.json or other profiles. For other models refer to the table below.

  4. In terminal/command prompt, run npm install from the installed directory. (Note: If naudiodon fails to build and you don't need STT, you can usually proceed.)

  5. Start a minecraft world and open it to LAN on localhost port 55916

  6. Run node main.js from the installed directory
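
As an illustration, a keys.json with a single key filled in could look like the sketch below (the value is a placeholder; the key names match the Config Variable column in the model table further down, and keys.example.json lists the full set):

    {
      "OPENAI_API_KEY": "sk-..."
    }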

If you encounter issues, check the FAQ or find support on Discord. We are currently not very responsive to GitHub issues.

Tasks

Bot performance can be roughly evaluated with Tasks. Tasks automatically initialize bots with a goal to acquire specific items or construct predefined buildings, and remove the bot once the goal is achieved.

To run tasks, you need python, pip, and optionally conda. You can then install dependencies with pip install -r requirements.txt.

Tasks are defined in json files in the tasks folder, and can be run with: python tasks/run_task_file.py --task_path=tasks/example_tasks.json

For full evaluations, you will need to download and install the task suite. Full instructions.

Model Customization

You can configure project details in settings.js. See file.

You can configure the agent's name, model, and prompts in their profile like andy.json with the model field. For comprehensive details, see Model Specifications.

API | Config Variable | Example Model name | Docs
openai | OPENAI_API_KEY | gpt-4o-mini | docs
google | GEMINI_API_KEY | gemini-2.0-flash | docs
anthropic | ANTHROPIC_API_KEY | claude-3-haiku-20240307 | docs
xai | XAI_API_KEY | grok-2-1212 | docs
deepseek | DEEPSEEK_API_KEY | deepseek-chat | docs
ollama (local) | n/a | ollama/sweaterdog/andy-4 | docs
qwen | QWEN_API_KEY | qwen-max | Intl./cn
mistral | MISTRAL_API_KEY | mistral-large-latest | docs
replicate | REPLICATE_API_KEY | replicate/meta/meta-llama-3-70b-instruct | docs
groq (not grok) | GROQCLOUD_API_KEY | groq/mixtral-8x7b-32768 | docs
huggingface | HUGGINGFACE_API_KEY | huggingface/mistralai/Mistral-Nemo-Instruct-2407 | docs
novita | NOVITA_API_KEY | novita/deepseek/deepseek-r1 | docs
openrouter | OPENROUTER_API_KEY | openrouter/anthropic/claude-3.5-sonnet | docs
glhf.chat | GHLF_API_KEY | glhf/hf:meta-llama/Llama-3.1-405B-Instruct | docs
hyperbolic | HYPERBOLIC_API_KEY | hyperbolic/deepseek-ai/DeepSeek-V3 | docs
vllm | n/a | vllm/llama3 | n/a

If you use Ollama, install the models used by default (one for generation, one for embedding) with the following terminal command: ollama pull sweaterdog/andy-4 && ollama pull nomic-embed-text

Additional info about Andy-4...


Andy-4 is a community-made, open-source model created by Sweaterdog to play Minecraft. Because Andy-4 is open-source, you can download the model and play with it offline and for free.

The Andy-4 collection of models has reasoning and non-reasoning modes; sometimes the model will reason automatically without being prompted. If you want to specifically enable reasoning, use the andy-4-reasoning.json profile. Some Andy-4 models may not be able to disable reasoning, no matter what profile is used.

Andy-4 comes in many different models and sizes. For more information about which model size is best for you, check Sweaterdog's Ollama page.

If you have any issues, join the Mindcraft Discord server and ping @Sweaterdog with your issue, or open an issue on the Andy-4 Hugging Face repo.

Online Servers

To connect to online servers your bot will need an official Microsoft/Minecraft account. You can use your own personal account, but you will need a second account if you want to join alongside the bot and play with it. To connect, change these lines in settings.js:

"host": "111.222.333.444",
"port": 55920,
"auth": "microsoft",

// rest is same...

Important

The bot's name in the profile.json must exactly match the Minecraft profile name! Otherwise the bot will spam talk to itself.

Mindcraft will connect with whichever account the Minecraft launcher is currently using. To use a different account, switch accounts in the launcher, run node main.js, and then switch back to your main account after the bot has connected.

Docker Container

If you intend to allow_insecure_coding, it is a good idea to run the app in a docker container to reduce risks of running unknown code. This is strongly recommended before connecting to remote servers.

docker run -i -t --rm -v $(pwd):/app -w /app -p 3000-3003:3000-3003 node:latest node main.js

or simply

docker-compose up

When running in docker, if you want the bot to join your local minecraft server, you have to use a special host address host.docker.internal to call your localhost from inside your docker container. Put this into your settings.js:

"host": "host.docker.internal", // instead of "localhost", to join your local minecraft from inside the docker container

To connect to an unsupported minecraft version, you can try to use viaproxy

STT in Mindcraft

STT allows you to speak to the model if you have a microphone.

STT can be enabled in settings.js under the section that looks like this:

    "stt_transcription": true, // Change this to "true" to enable STT
    "stt_username": "SYSTEM",
    "stt_agent_name": ""

The Speech-to-Text engine will begin listening on the system's default input device. Note: Successful STT operation depends on the naudiodon package, which is an optional dependency. If naudiodon failed to install or build (see "Installation Prerequisites" for troubleshooting), STT will be disabled.

When using STT, you need a GroqCloud API key, as Groq is used for audio transcription.
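
In practice that means keys.json needs a GROQCLOUD_API_KEY entry alongside whatever chat-model key you use, for example (placeholder values):

    {
      "OPENAI_API_KEY": "sk-...",
      "GROQCLOUD_API_KEY": "..."
    }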

Bot Profiles

Bot profiles are json files (such as andy.json) that define the following (a minimal sketch follows the list):

  1. Bot backend LLMs to use for talking, coding, and embedding.
  2. Prompts used to influence the bot's behavior.
  3. Examples that help the bot perform tasks.
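
A minimal profile could therefore look like the sketch below. The model field is documented under Model Specifications; the name field and the values shown are illustrative, and andy.json contains the full set of prompts and examples:

    {
      "name": "andy",
      "model": "gpt-4o-mini"
    }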

Model Specifications

LLM models can be specified simply as "model": "gpt-4o". However, you can use different models for chat, coding, and embeddings. You can pass a string or an object for these fields. A model object must specify an api, and optionally a model, url, and additional params.

"model": {
  "api": "openai",
  "model": "gpt-4o",
  "url": "https://api.openai.com/v1/",
  "params": {
    "max_tokens": 1000,
    "temperature": 1
  }
},
"code_model": {
  "api": "openai",
  "model": "gpt-4",
  "url": "https://api.openai.com/v1/"
},
"vision_model": {
  "api": "openai",
  "model": "gpt-4o",
  "url": "https://api.openai.com/v1/"
},
"embedding": {
  "api": "openai",
  "url": "https://api.openai.com/v1/",
  "model": "text-embedding-ada-002"
}

model is used for chat, code_model is used for newAction coding, vision_model is used for image interpretation, and embedding is used to embed text for example selection. If code_model or vision_model is not specified, model will be used by default. Not all APIs support embeddings or vision.

All apis have default models and urls, so those fields are optional. The params field is optional and can be used to specify additional parameters for the model. It accepts any key-value pairs supported by the api. params is not supported for embedding models.
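
For example, because every api has a default model and url, a bare object with just the api is valid (a sketch; which default model is picked is determined by the code for that api):

    "model": {
      "api": "openai"
    }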

Embedding Models

Embedding models are used to embed and efficiently select relevant examples for conversation and coding.

Supported Embedding APIs: openai, google, replicate, huggingface, novita

If you try to use an unsupported embedding model, it will default to a simple word-overlap method. Expect reduced performance; we recommend mixing APIs to ensure embedding support.
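
For example, a profile can keep a local chat model while pointing the embedding at a supported API (a sketch; swap in whatever models you actually run):

    "model": "ollama/sweaterdog/andy-4",
    "embedding": {
      "api": "openai"
    }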

Specifying Profiles via Command Line

By default, the program will use the profiles specified in settings.js. You can specify one or more agent profiles using the --profiles argument: node main.js --profiles ./profiles/andy.json ./profiles/jill.json

Patches

Some of the node modules that we depend on have bugs in them. To add a patch, change your local node module file and run npx patch-package [package-name]

Citation:

@article{mindcraft2025,
  title = {Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning},
  author = {White*, Isadora and Nottingham*, Kolby and Maniar, Ayush and Robinson, Max and Lillemark, Hansen and Maheshwari, Mehul and Qin, Lianhui and Ammanabrolu, Prithviraj},
  journal = {arXiv preprint arXiv:2504.17950},
  year = {2025},
  url = {https://arxiv.org/abs/2504.17950},
}