Commit graph

1386 commits

Author SHA1 Message Date
Jhn
cf78c1941d
Merge a6009c50c1 into 00127506b1 2025-06-25 22:11:15 -04:00
Jhn
a6009c50c1
Merge branch 'main' into feature/granular-task-outcomes 2025-06-25 22:11:11 -04:00
Johnathan Walker
7c5a7f8df8 fix: Add missing __init__.py to make tasks directory a Python package
Resolves the ModuleNotFoundError when running evaluation_script.py.
Users can now run the script after installing dependencies:

1. python -m venv venv && source venv/bin/activate
2. pip install -r requirements.txt
3. PYTHONPATH=. python tasks/evaluation_script.py [args]
2025-06-25 19:06:31 -04:00
MaxRobinsonTheGreat
00127506b1 improve ui and default settings 2025-06-16 16:32:40 -05:00
Max Robinson
0eb16cc3ec
Merge pull request #555 from uukelele-scratch/patch-4
bump mineflayer version from 4.26.0 to 4.29.0
2025-06-16 11:27:15 -05:00
Max Robinson
36aa78d5e5
Merge pull request #525 from mrelmida/main
Fixed groq outputting an empty response
2025-06-16 11:26:50 -05:00
Max Robinson
43a5b7a0c7
Merge pull request #563 from kolbytn/develop
API Refactor
2025-06-16 11:26:04 -05:00
Johnathan Walker
18eca2f5d9 fix: Resolve API naming inconsistency in analyse_results module
- Re-export enhanced function as 'aggregate_results' for backward compatibility
- Users can now import aggregate_results and get the enhanced functionality
- Updated architecture documentation to reflect the corrected API
- Maintains intuitive API while providing enhanced model extraction features
2025-06-15 23:21:01 -04:00
Johnathan Walker
f7947ec3c2 refactor: Eliminate code duplication and enhance development workflow
- Created tasks/experiment_utils.py for shared utility functions
- Streamlined entry point scripts by moving common code to utils
- Enhanced .gitignore with comprehensive Python development patterns
- Validated and fixed documentation links across all markdown files
- Applied final code quality improvements and optimization
2025-06-15 23:12:34 -04:00
Johnathan Walker
3c6649f224 Add npm cache directories to .gitignore to prevent accidental commits 2025-06-15 22:22:30 -04:00
Johnathan Walker
cc51242527 feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting
- Added new evaluation.py with dynamic agent configuration support
- Implemented comprehensive test suite (38 tests, 100% pass rate)
- Enhanced evaluation_script.py with improved error handling and logging
- Updated analysis tools for better outcome reporting and visualization
- Added extensive documentation including architecture guide and user manuals
- Maintained backward compatibility with existing task formats
- Improved performance and reliability for multi-agent evaluations

Key improvements:
- Flexible agent count configuration (1-N agents)
- Rich outcome data structures with detailed metrics
- Comprehensive error handling and recovery mechanisms
- Enhanced logging and debugging capabilities
- Complete test coverage for production readiness

Files added/modified:
- tasks/evaluation.py (new core evaluation engine)
- tasks/test_*.py (comprehensive test suite)
- docs/ (complete documentation suite)
- Updated analysis and visualization tools
2025-06-15 22:01:19 -04:00
Max Robinson
121572fabe
Merge pull request #562 from kolbytn/api-refactor
Api refactor
2025-06-14 16:26:13 -05:00
Isadora White
eb09c2f08e fixing issues with shutting down of tasks 2025-06-14 14:23:25 -05:00
MaxRobinsonTheGreat
b2de1cda17 clean settings.js 2025-06-13 13:08:44 -05:00
MaxRobinsonTheGreat
317c01e340 always connect agents to localhost 2025-06-13 13:02:48 -05:00
MaxRobinsonTheGreat
1eea05f576 Merge branch 'develop' into api-refactor 2025-06-13 11:24:21 -05:00
MaxRobinsonTheGreat
7348ddd458 Merge branch 'main' into develop 2025-06-13 11:21:46 -05:00
MaxRobinsonTheGreat
ebf2d4663b add python api prototype 2025-06-11 17:11:11 -05:00
MaxRobinsonTheGreat
0f5dd0cb07 create-agent endpoint from ui 2025-06-11 16:41:54 -05:00
MaxRobinsonTheGreat
8162fc1ab1 major refactor: use mindserver to init settings 2025-06-10 17:52:30 -05:00
Max Robinson
5fe256d10a
Merge pull request #557 from icwhite/main
Human AI Tasks Update
2025-06-10 13:58:04 -05:00
Max Robinson
a33465ce03
better node version 2025-06-10 13:56:56 -05:00
Isadora White
447b906ce3
Update minecollab.md 2025-06-09 02:00:57 -05:00
Isadora White
5a403951d1
Update minecollab.md 2025-06-09 02:00:36 -05:00
Isadora White
00aa14ab5f
Update minecollab.md 2025-06-09 02:00:04 -05:00
Isadora White
3661114321
Clarifying instructions for installing tmux 2025-06-09 01:55:47 -05:00
Isadora White
3a43b3c03c
Fix a small typo 2025-06-09 01:46:08 -05:00
Isadora White
6748b65fcb
Update README.md 2025-06-09 01:45:28 -05:00
Isadora White
088b71a99a friendlier messages in the python evaluation script to make it easier for users to understand what is happening 2025-06-09 01:35:18 -05:00
Isadora White
1f11b3bf55 Merge branch 'vllm_debugging' 2025-06-09 01:30:38 -05:00
Maximus
6f2bf41e6e initial refactor 2025-06-02 13:47:07 -06:00
Isadora White
0503ee3409 remove the conversation thingie from the set agent goal command 2025-06-01 18:46:50 -05:00
Isadora White
0bffe111b1 fixing weird conversation thing maybe 2025-06-01 18:43:28 -05:00
Isadora White
38ec20fd80
Merge branch 'kolbytn:main' into main 2025-05-30 17:47:58 -05:00
Isadora White
1331239830 small changes to tasks 2025-05-30 17:39:47 -05:00
Isadora White
ef9fb74757 cleaning up human ai tasks 2025-05-26 21:25:10 -07:00
Isadora White
6b4b895cc1 human ai tasks 2025-05-26 16:46:42 -07:00
uukelele
4343f78118
bump mineflayer version from 4.26.0 to 4.29.0 2025-05-25 08:06:03 +01:00
Max Robinson
2b6a1115db
Merge pull request #554 from kolbytn/main
update dev
2025-05-24 12:31:29 -06:00
Max Robinson
f2f06fcf3f
Merge pull request #540 from icwhite/main
Small Fixes and lots of Task reworking
2025-05-24 12:30:33 -06:00
Isadora White
fa02028b8b remove unnecessary changes 2025-05-23 12:02:23 -07:00
Isadora White
b55f92800f restore settings.js 2025-05-23 11:56:40 -07:00
Isadora White
f7e4fee249 update README and remove useless tasks 2025-05-23 11:54:53 -07:00
Isadora White
77535f97d5 fix goal string issues 2025-05-23 11:49:51 -07:00
Kolby Nottingham
c4e23ea387
Merge pull request #550 from rajammanabrolu/main
Update README.md with bib for arxiv paper
2025-05-21 09:50:38 -07:00
Prithviraj Ammanabrolu
0fabaa8e90
smol 2025-05-21 09:48:28 -07:00
Prithviraj Ammanabrolu
99af6506aa
Update README.md with bib 2025-05-21 09:44:47 -07:00
Isadora White
a1bd99dc43 small changes 2025-05-14 14:27:38 -07:00
Isadora White
87e56092bf fix inventories for hells kitchen 2025-05-13 16:48:32 -07:00
Isadora White
ef5f7dfe61 remaining tasks 2025-05-13 16:35:36 -07:00