Clarifying instructions for installing tmux

Isadora White 2025-06-09 01:55:47 -05:00 committed by GitHub
parent f2f06fcf3f
commit 3661114321


@ -55,9 +55,20 @@ pip install -r requirements.txt
Then, you can run the evaluation_script **from the project root** using `python tasks/evaluation_script.py --task_path {your-task-path} --model {model you want to use}`.
### Tmux Installation
**MacOS**:
1. If Homebrew isn't already installed, run `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`
2. `brew install tmux`
**Linux**: `apt-get -y install tmux`
**Windows**: You cannot use tmux on Windows, but you can run tasks with the `--no-launch-world` flag. Run
```
cd /tasks/server_data/
java -jar server.jar
```
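Before choosing between the tmux path and the `--no-launch-world` path, it can help to check whether tmux is actually on your `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `tmux_available` is hypothetical and is not part of this repository.

```python
import shutil


def tmux_available() -> bool:
    """Return True if a `tmux` executable can be found on PATH."""
    return shutil.which("tmux") is not None


if __name__ == "__main__":
    if tmux_available():
        print("tmux found: the evaluation script can manage its own tmux shells.")
    else:
        print("tmux not found: install it, or run tasks with --no-launch-world.")
```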
If you want to run with vLLM, be sure to pass `--api vllm --url {your_url_for_vllm} --model {model_name}`. By default, vLLM uses `http://127.0.0.1:8000/v1` as the URL for querying the model.
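If a run fails to reach the model, a quick reachability check against the vLLM URL can narrow the problem down. The sketch below assumes the server is vLLM's OpenAI-compatible server, which lists its served models at `/models` under the base URL, and that it does not require an API key. The helper names are hypothetical and not part of this repository.

```python
import json
import urllib.error
import urllib.request

# Default URL stated in the text above; override it if your server runs elsewhere.
DEFAULT_VLLM_URL = "http://127.0.0.1:8000/v1"


def models_endpoint(base_url: str = DEFAULT_VLLM_URL) -> str:
    """Build the model-listing URL from the base URL passed to --url."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url: str = DEFAULT_VLLM_URL, timeout: float = 5.0) -> list[str]:
    """Return the model ids the server reports, or [] if it cannot be reached."""
    try:
        with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON response.
        return []
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


if __name__ == "__main__":
    models = list_served_models()
    print(models if models else "No models reported; is the vLLM server running?")
```

The value passed to `--model` would normally be one of the ids this returns.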
When running construction tasks, set the `--insecure_coding` flag so that the agents are allowed to write freeform JavaScript code to complete the tasks. When using insecure coding, it is **highly recommended** to run inside a Docker container to avoid damage to your computer.
When running an experiment that requires more than 2 agents, use the `--num_agents` flag to match the number of agents in your task file. For example, if you are running a task file with 3 agents, use `--num_agents 3`.
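The flags described above can be assembled programmatically when launching many runs. This is a sketch only: it assumes `--insecure_coding` is a switch that takes no value, and it adds `--num_agents` only when more than 2 agents are requested. The function name, task path, and model name are placeholders, not names from this repository.

```python
import shlex


def build_eval_command(
    task_path: str,
    model: str,
    num_agents: int = 2,
    insecure_coding: bool = False,
) -> list[str]:
    """Assemble the evaluation command as an argument list (run it from the project root)."""
    cmd = [
        "python", "tasks/evaluation_script.py",
        "--task_path", task_path,
        "--model", model,
    ]
    if num_agents > 2:
        # Must match the number of agents defined in the task file.
        cmd += ["--num_agents", str(num_agents)]
    if insecure_coding:
        # Assumed to be a value-less switch; intended for construction tasks.
        cmd.append("--insecure_coding")
    return cmd


if __name__ == "__main__":
    # Placeholder task path and model name.
    print(shlex.join(build_eval_command(
        "path/to/three_agent_task.json", "your-model",
        num_agents=3, insecure_coding=True,
    )))
```

The resulting list can be passed to `subprocess.run(cmd)` or printed and pasted into a shell.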
@ -81,7 +92,7 @@ python tasks/evaluation_script.py --task_path {path_to_two_agent_construction_ta
When you launch the evaluation script, you will see the Minecraft server being launched. If you want to join this world, you can connect to it at `localhost:55916` the way you would a standard Minecraft server (go to Multiplayer -> Direct Connection -> type in `localhost:55916`). It may take a few minutes for everything to load, because the agents first need to be added to the world and given the correct permissions to use cheats and add inventory. After about 5 minutes everything should be loaded and working. If you wish to kill the experiment, run `tmux kill-server`. Copying the files sometimes fails; if this happens, run the Python script a second time.
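Rather than guessing when the server is up, you can poll the port mentioned above. The sketch below uses only the standard library; the function names are hypothetical. An open port only shows that the server is accepting connections, not that the agents have finished joining and receiving permissions.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_server(
    host: str = "localhost",
    port: int = 55916,
    max_wait: float = 300.0,
    interval: float = 5.0,
) -> bool:
    """Poll until the port accepts connections or max_wait seconds have passed."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if is_port_open(host, port):
            return True
        time.sleep(interval)
    return False
```

The default `max_wait` of 300 seconds mirrors the "about 5 minutes" figure above.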
## Windows Installation (without tmux)
If you are on a machine that can't run tmux (like a Windows PC without WSL), or you only want to run tasks without doing evaluations, you can run the following script:
@ -99,7 +110,7 @@ As you run, the evaluation script will evaluate the performance so far. It will
### Running multiple worlds in parallel
You can use `--num_parallel` to run multiple Minecraft worlds in parallel. This will launch `n` tmux shells, called `server_i` and shell `i`, where `i` corresponds to the ith parallel world. It will also copy worlds into `server_data_i`. On an M3 Mac with 34 GB of RAM, we can normally support up to 4 parallel worlds. When running an open-source model, you are more likely to be constrained by the throughput and memory of your GPUs. On a cluster of 8 H100s you can expect to run 4 experiments in parallel. However, for best performance it is advisable to use only one parallel world.
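To see which tmux shells a parallel run has actually started, you can list the current session names. This sketch assumes the shells are created as tmux sessions and makes no assumption about their exact names beyond what is described above. The helper names are hypothetical.

```python
import subprocess


def parse_session_names(tmux_output: str) -> list[str]:
    """Split `tmux list-sessions -F '#{session_name}'` output into names."""
    return [line.strip() for line in tmux_output.splitlines() if line.strip()]


def list_tmux_sessions() -> list[str]:
    """Return current tmux session names, or [] if tmux is missing or no server is running."""
    try:
        result = subprocess.run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # tmux is not installed.
        return []
    if result.returncode != 0:
        # tmux exits non-zero when no server is running.
        return []
    return parse_session_names(result.stdout)


if __name__ == "__main__":
    for name in list_tmux_sessions():
        print(name)
```

You can attach to any listed session with `tmux attach -t <name>` to watch that world's output.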
### Using an S3 Bucket to store files
To use S3, set the `--s3` flag and pass `--bucket_name` to log all collected files to an S3 bucket. In this case, the `/bots` folder and all of its files are copied as well.