Johnathan Walker
cc51242527
feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting
- Added new evaluation.py with dynamic agent configuration support
- Implemented comprehensive test suite (38 tests, 100% pass rate)
- Enhanced evaluation_script.py with improved error handling and logging
- Updated analysis tools for better outcome reporting and visualization
- Added extensive documentation including architecture guide and user manuals
- Maintained backward compatibility with existing task formats
- Improved performance and reliability for multi-agent evaluations
Key improvements:
- Flexible agent count configuration (1-N agents)
- Rich outcome data structures with detailed metrics
- Comprehensive error handling and recovery mechanisms
- Enhanced logging and debugging capabilities
- Complete test coverage for production readiness
Files added/modified:
- tasks/evaluation.py (new core evaluation engine)
- tasks/test_*.py (comprehensive test suite)
- docs/ (complete documentation suite)
- Updated analysis and visualization tools
2025-06-15 22:01:19 -04:00
Isadora White
088b71a99a
friendlier messages in the python evaluation script so users can more easily understand what is happening
2025-06-09 01:35:18 -05:00
Isadora White
1331239830
small changes to tasks
2025-05-30 17:39:47 -05:00
Isadora White
ef9fb74757
cleaning up human ai tasks
2025-05-26 21:25:10 -07:00
Isadora White
6b4b895cc1
human ai tasks
2025-05-26 16:46:42 -07:00
Isadora White
fa02028b8b
remove unnecessary changes
2025-05-23 12:02:23 -07:00
Isadora White
f7e4fee249
update README and remove useless tasks
2025-05-23 11:54:53 -07:00
Isadora White
77535f97d5
fix goal string issues
2025-05-23 11:49:51 -07:00
Isadora White
a1bd99dc43
small changes
2025-05-14 14:27:38 -07:00
Isadora White
87e56092bf
fix inventories for hells kitchen
2025-05-13 16:48:32 -07:00
Isadora White
ef5f7dfe61
remaining tasks
2025-05-13 16:35:36 -07:00
Isadora White
c5490ee024
full cooking tasks
2025-05-13 16:01:06 -07:00
Isadora White
a655357267
all possible hells kitchen tasks and partial plan tasks
2025-05-13 15:55:10 -07:00
Isadora White
748334f7c0
new cooking tasks
2025-05-13 15:18:18 -07:00
Isadora White
c0577a64cb
update cooking profile so they don't hunt around for chests, and add try/catch around the get crafting plan
2025-05-12 21:47:47 -07:00
Isadora White
994685496b
better blocked actions and hells kitchen tasks
2025-05-12 20:02:22 -07:00
Isadora White
c1d106de0f
fixing crafting tasks as well
2025-05-12 19:46:49 -07:00
Isadora White
155dbae436
longer timeouts for tasks
2025-05-12 12:32:22 -07:00
Isadora White
09595d2f3b
fixing small task timeout bug
2025-05-11 16:43:08 -07:00
Isadora White
a42dc3342d
hells kitchen and blocked access tasks
2025-05-10 18:38:20 -07:00
Isadora White
e049abb708
making more test tasks for cooking
2025-05-10 18:17:06 -07:00
Isadora White
4ae95cba38
collaboration train tasks with 2 items for cooking
2025-05-10 17:07:08 -07:00
Isadora White
82475f7934
adding some small changes to help with human ai results
2025-05-09 15:22:27 -07:00
Isadora White
c2ce6aed0d
new human ai tasks for new cooking tasks
2025-05-08 12:39:32 -07:00
Isadora White
88b974f332
one agent tasks
2025-05-07 17:15:50 -07:00
Isadora White
8233a29dac
longer timeout and more task pruning
2025-05-07 16:06:25 -07:00
Isadora White
3151253246
update 2 item tasks to further require collaboration
2025-05-07 16:01:25 -07:00
Isadora White
3d27399c3b
long timeout for 4 and 5 agents
2025-05-06 18:34:13 -07:00
Isadora White
216a4cde5d
2 and 3 item tasks
2025-05-06 18:28:22 -07:00
Isadora White
156e5d87fc
fixing gold ingot issue
2025-05-06 11:44:39 -07:00
Isadora White
057faa6046
make train tasks
2025-05-05 21:08:34 -07:00
Isadora White
31e5b6f9fb
4 and 5 agent examples
2025-05-05 17:30:05 -07:00
Isadora White
4536a33ee8
resolve merge conflicts
2025-05-05 17:22:05 -07:00
Isadora White
8f2fcbe50d
required collaboration cooking tasks
2025-05-05 17:11:48 -07:00
Isadora White
3ef7f89773
construction human ai experiments
2025-05-05 15:36:56 -07:00
Isadora White
7c73afcfed
fixed issue with inventory not being cleared
2025-05-05 15:00:58 -07:00
Isadora White
c42250717d
fixing issues with inventory not being cleared for human users
2025-05-05 14:19:14 -07:00
Isadora White
94388efe89
fix merge issues
2025-05-05 13:52:07 -07:00
Isadora White
147203e36b
merge in main
2025-05-05 13:30:24 -07:00
Isadora White
ae34261e52
new task files and fixing old ones
2025-05-05 12:30:35 -07:00
Isadora White
bafef3bff2
Merge branch 'merge-main' of https://github.com/icwhite/mindcraft into merge-main
2025-05-03 15:06:29 -07:00
Isadora White
29b946b8be
new tasks
2025-05-03 15:06:22 -07:00
Isadora White
fa316e350c
fixing human human experiments
2025-05-03 15:00:48 -07:00
Ayush Maniar
e4eb595a8b
Added code for cooking tasks generation (with dictionary for cooking_items)
2025-05-02 13:09:19 -07:00
Isadora White
e88185602b
fix human evaluations for construction
2025-05-02 12:01:10 -07:00
Isadora White
abf3532433
updating cooking task initiator to be easier to read, with fewer nested functions
2025-05-01 15:08:07 -07:00
Isadora White
f8c61c64ec
add human human evaluation set up
2025-05-01 13:41:54 -07:00
Isadora White
759f861b57
fixed issue with single agent and equal load tasks
2025-04-29 14:26:37 -07:00
Isadora White
aac00bc893
human ai cooking and crafting tasks
2025-04-25 19:16:00 -07:00
Isadora White
181d628033
fixing small issue with defaults
2025-04-25 15:08:34 -07:00