Commit graph

1364 commits

Author SHA1 Message Date
Johnathan Walker
f7947ec3c2 refactor: Eliminate code duplication and enhance development workflow
- Created tasks/experiment_utils.py for shared utility functions
- Streamlined entry point scripts by moving common code to utils
- Enhanced .gitignore with comprehensive Python development patterns
- Validated and fixed documentation links across all markdown files
- Applied final code quality improvements and optimization
2025-06-15 23:12:34 -04:00
Johnathan Walker
3c6649f224 Add npm cache directories to .gitignore to prevent accidental commits 2025-06-15 22:22:30 -04:00
Johnathan Walker
cc51242527 feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting
- Added new evaluation.py with dynamic agent configuration support
- Implemented comprehensive test suite (38 tests, 100% pass rate)
- Enhanced evaluation_script.py with improved error handling and logging
- Updated analysis tools for better outcome reporting and visualization
- Added extensive documentation including architecture guide and user manuals
- Maintained backward compatibility with existing task formats
- Improved performance and reliability for multi-agent evaluations

Key improvements:
- Flexible agent count configuration (1-N agents)
- Rich outcome data structures with detailed metrics
- Comprehensive error handling and recovery mechanisms
- Enhanced logging and debugging capabilities
- Complete test coverage for production readiness

Files added/modified:
- tasks/evaluation.py (new core evaluation engine)
- tasks/test_*.py (comprehensive test suite)
- docs/ (complete documentation suite)
- Updated analysis and visualization tools
2025-06-15 22:01:19 -04:00
Max Robinson
5fe256d10a Merge pull request #557 from icwhite/main
Human AI Tasks Update
2025-06-10 13:58:04 -05:00
Max Robinson
a33465ce03 better node version 2025-06-10 13:56:56 -05:00
Isadora White
447b906ce3 Update minecollab.md 2025-06-09 02:00:57 -05:00
Isadora White
5a403951d1 Update minecollab.md 2025-06-09 02:00:36 -05:00
Isadora White
00aa14ab5f Update minecollab.md 2025-06-09 02:00:04 -05:00
Isadora White
3661114321 Clarifying instructions for installing tmux 2025-06-09 01:55:47 -05:00
Isadora White
3a43b3c03c Fix a small typo 2025-06-09 01:46:08 -05:00
Isadora White
6748b65fcb Update README.md 2025-06-09 01:45:28 -05:00
Isadora White
088b71a99a more friendly messages in the python evaluation script to make it more easy for the users to understand what is happening 2025-06-09 01:35:18 -05:00
Isadora White
1f11b3bf55 Merge branch 'vllm_debugging' 2025-06-09 01:30:38 -05:00
Isadora White
0503ee3409 remove the conversation thingie from the set agent goal command 2025-06-01 18:46:50 -05:00
Isadora White
0bffe111b1 fixing weird conversation thing maybe 2025-06-01 18:43:28 -05:00
Isadora White
38ec20fd80 Merge branch 'kolbytn:main' into main 2025-05-30 17:47:58 -05:00
Isadora White
1331239830 small changes to tasks 2025-05-30 17:39:47 -05:00
Isadora White
ef9fb74757 cleaning up human ai tasks 2025-05-26 21:25:10 -07:00
Isadora White
6b4b895cc1 human ai tasks 2025-05-26 16:46:42 -07:00
Max Robinson
f2f06fcf3f Merge pull request #540 from icwhite/main
Small Fixes and lots of Task reworking
2025-05-24 12:30:33 -06:00
Isadora White
fa02028b8b remove unnecessary changes 2025-05-23 12:02:23 -07:00
Isadora White
b55f92800f restore settings.js 2025-05-23 11:56:40 -07:00
Isadora White
f7e4fee249 update README and remove useless tasks 2025-05-23 11:54:53 -07:00
Isadora White
77535f97d5 fix goal string issues 2025-05-23 11:49:51 -07:00
Kolby Nottingham
c4e23ea387 Merge pull request #550 from rajammanabrolu/main
Update README.md with bib for arxiv paper
2025-05-21 09:50:38 -07:00
Prithviraj Ammanabrolu
0fabaa8e90 smol 2025-05-21 09:48:28 -07:00
Prithviraj Ammanabrolu
99af6506aa Update README.md with bib 2025-05-21 09:44:47 -07:00
Isadora White
a1bd99dc43 small changes 2025-05-14 14:27:38 -07:00
Isadora White
87e56092bf fix inventories for hells kitchen 2025-05-13 16:48:32 -07:00
Isadora White
ef5f7dfe61 remaining tasks 2025-05-13 16:35:36 -07:00
Isadora White
c5490ee024 full cooking tasks 2025-05-13 16:01:06 -07:00
Isadora White
a655357267 all possible hells kitchen tasks and partial plan tasks 2025-05-13 15:55:10 -07:00
Isadora White
748334f7c0 new cooking tasks 2025-05-13 15:18:18 -07:00
Isadora White
c0577a64cb update cooking profile so they don't hunt around for chests and try catch loop around the get crafting plan 2025-05-12 21:47:47 -07:00
Isadora White
994685496b better blocked actions and hells kitchen tasks 2025-05-12 20:02:22 -07:00
Isadora White
c1d106de0f fixing crafting tasks as well 2025-05-12 19:46:49 -07:00
Isadora White
015d38ab69 bone meal is a looping item in 1.21.1 2025-05-12 18:15:12 -07:00
Isadora White
155dbae436 longer timeouts for tasks 2025-05-12 12:32:22 -07:00
Max Robinson
8d016b80f9 Merge pull request #535 from aeromechanic000/get_env_key
return key instead of keys[name]
2025-05-12 12:08:56 -05:00
Isadora White
09595d2f3b fixing small task timeout bug 2025-05-11 16:43:08 -07:00
Isadora White
a42dc3342d hells kitchen and blocked access tasks 2025-05-10 18:38:20 -07:00
Isadora White
e049abb708 making more test tasks for cooking 2025-05-10 18:17:06 -07:00
Isadora White
4ae95cba38 collaboration train tasks with 2 items for cooking 2025-05-10 17:07:08 -07:00
Isadora White
82475f7934 adding some small changes to help with human ai results 2025-05-09 15:22:27 -07:00
Isadora White
c2ce6aed0d new human ai tasks for new cooking tasks 2025-05-08 12:39:32 -07:00
Isadora White
88b974f332 one agent tasks 2025-05-07 17:15:50 -07:00
Isadora White
8233a29dac longer timeout and more task pruning 2025-05-07 16:06:25 -07:00
Isadora White
3151253246 update 2 item tasks to require collaboration further 2025-05-07 16:01:25 -07:00
Isadora White
3d27399c3b long timeout for 4 and 5 agents 2025-05-06 18:34:13 -07:00
Isadora White
216a4cde5d 2 and 3 item tasks 2025-05-06 18:28:22 -07:00