Johnathan Walker
f7947ec3c2
refactor: Eliminate code duplication and enhance development workflow
- Created tasks/experiment_utils.py for shared utility functions
- Streamlined entry point scripts by moving common code to utils
- Enhanced .gitignore with comprehensive Python development patterns
- Validated and fixed documentation links across all markdown files
- Applied final code quality improvements and optimization
2025-06-15 23:12:34 -04:00
Johnathan Walker
3c6649f224
Add npm cache directories to .gitignore to prevent accidental commits
2025-06-15 22:22:30 -04:00
Johnathan Walker
cc51242527
feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting
- Added new evaluation.py with dynamic agent configuration support
- Implemented comprehensive test suite (38 tests, 100% pass rate)
- Enhanced evaluation_script.py with improved error handling and logging
- Updated analysis tools for better outcome reporting and visualization
- Added extensive documentation including architecture guide and user manuals
- Maintained backward compatibility with existing task formats
- Improved performance and reliability for multi-agent evaluations
Key improvements:
- Flexible agent count configuration (1-N agents)
- Rich outcome data structures with detailed metrics
- Comprehensive error handling and recovery mechanisms
- Enhanced logging and debugging capabilities
- Complete test coverage for production readiness
Files added/modified:
- tasks/evaluation.py (new core evaluation engine)
- tasks/test_*.py (comprehensive test suite)
- docs/ (complete documentation suite)
- Updated analysis and visualization tools
2025-06-15 22:01:19 -04:00
Max Robinson
5fe256d10a
Merge pull request #557 from icwhite/main
Human AI Tasks Update
2025-06-10 13:58:04 -05:00
Max Robinson
a33465ce03
better node version
2025-06-10 13:56:56 -05:00
Isadora White
447b906ce3
Update minecollab.md
2025-06-09 02:00:57 -05:00
Isadora White
5a403951d1
Update minecollab.md
2025-06-09 02:00:36 -05:00
Isadora White
00aa14ab5f
Update minecollab.md
2025-06-09 02:00:04 -05:00
Isadora White
3661114321
Clarifying instructions for installing tmux
2025-06-09 01:55:47 -05:00
Isadora White
3a43b3c03c
Fix a small typo
2025-06-09 01:46:08 -05:00
Isadora White
6748b65fcb
Update README.md
2025-06-09 01:45:28 -05:00
Isadora White
088b71a99a
friendlier messages in the python evaluation script so users can more easily understand what is happening
2025-06-09 01:35:18 -05:00
Isadora White
1f11b3bf55
Merge branch 'vllm_debugging'
2025-06-09 01:30:38 -05:00
Isadora White
0503ee3409
remove the conversation thingie from the set agent goal command
2025-06-01 18:46:50 -05:00
Isadora White
0bffe111b1
fixing weird conversation thing maybe
2025-06-01 18:43:28 -05:00
Isadora White
38ec20fd80
Merge branch 'kolbytn:main' into main
2025-05-30 17:47:58 -05:00
Isadora White
1331239830
small changes to tasks
2025-05-30 17:39:47 -05:00
Isadora White
ef9fb74757
cleaning up human ai tasks
2025-05-26 21:25:10 -07:00
Isadora White
6b4b895cc1
human ai tasks
2025-05-26 16:46:42 -07:00
Max Robinson
f2f06fcf3f
Merge pull request #540 from icwhite/main
Small Fixes and lots of Task reworking
2025-05-24 12:30:33 -06:00
Isadora White
fa02028b8b
remove unnecessary changes
2025-05-23 12:02:23 -07:00
Isadora White
b55f92800f
restore settings.js
2025-05-23 11:56:40 -07:00
Isadora White
f7e4fee249
update README and remove useless tasks
2025-05-23 11:54:53 -07:00
Isadora White
77535f97d5
fix goal string issues
2025-05-23 11:49:51 -07:00
Kolby Nottingham
c4e23ea387
Merge pull request #550 from rajammanabrolu/main
Update README.md with bib for arxiv paper
2025-05-21 09:50:38 -07:00
Prithviraj Ammanabrolu
0fabaa8e90
smol
2025-05-21 09:48:28 -07:00
Prithviraj Ammanabrolu
99af6506aa
Update README.md with bib
2025-05-21 09:44:47 -07:00
Isadora White
a1bd99dc43
small changes
2025-05-14 14:27:38 -07:00
Isadora White
87e56092bf
fix inventories for hells kitchen
2025-05-13 16:48:32 -07:00
Isadora White
ef5f7dfe61
remaining tasks
2025-05-13 16:35:36 -07:00
Isadora White
c5490ee024
full cooking tasks
2025-05-13 16:01:06 -07:00
Isadora White
a655357267
all possible hells kitchen tasks and partial plan tasks
2025-05-13 15:55:10 -07:00
Isadora White
748334f7c0
new cooking tasks
2025-05-13 15:18:18 -07:00
Isadora White
c0577a64cb
update cooking profile so they don't hunt around for chests; add try/catch around the get crafting plan
2025-05-12 21:47:47 -07:00
Isadora White
994685496b
better blocked actions and hells kitchen tasks
2025-05-12 20:02:22 -07:00
Isadora White
c1d106de0f
fixing crafting tasks as well
2025-05-12 19:46:49 -07:00
Isadora White
015d38ab69
bone meal is a looping item in 1.21.1
2025-05-12 18:15:12 -07:00
Isadora White
155dbae436
longer timeouts for tasks
2025-05-12 12:32:22 -07:00
Max Robinson
8d016b80f9
Merge pull request #535 from aeromechanic000/get_env_key
return key instead of keys[name]
2025-05-12 12:08:56 -05:00
Isadora White
09595d2f3b
fixing small task timeout bug
2025-05-11 16:43:08 -07:00
Isadora White
a42dc3342d
hells kitchen and blocked access tasks
2025-05-10 18:38:20 -07:00
Isadora White
e049abb708
making more test tasks for cooking
2025-05-10 18:17:06 -07:00
Isadora White
4ae95cba38
collaboration train tasks with 2 items for cooking
2025-05-10 17:07:08 -07:00
Isadora White
82475f7934
adding some small changes to help with human ai results
2025-05-09 15:22:27 -07:00
Isadora White
c2ce6aed0d
new human ai tasks for new cooking tasks
2025-05-08 12:39:32 -07:00
Isadora White
88b974f332
one agent tasks
2025-05-07 17:15:50 -07:00
Isadora White
8233a29dac
longer timeout and more task pruning
2025-05-07 16:06:25 -07:00
Isadora White
3151253246
update 2 item tasks to further require collaboration
2025-05-07 16:01:25 -07:00
Isadora White
3d27399c3b
long timeout for 4 and 5 agents
2025-05-06 18:34:13 -07:00
Isadora White
216a4cde5d
2 and 3 item tasks
2025-05-06 18:28:22 -07:00