mirror of
https://github.com/kolbytn/mindcraft.git
synced 2025-07-30 03:45:36 +02:00
![]() - Added new evaluation.py with dynamic agent configuration support - Implemented comprehensive test suite (38 tests, 100% pass rate) - Enhanced evaluation_script.py with improved error handling and logging - Updated analysis tools for better outcome reporting and visualization - Added extensive documentation including architecture guide and user manuals - Maintained backward compatibility with existing task formats - Improved performance and reliability for multi-agent evaluations Key improvements: - Flexible agent count configuration (1-N agents) - Rich outcome data structures with detailed metrics - Comprehensive error handling and recovery mechanisms - Enhanced logging and debugging capabilities - Complete test coverage for production readiness Files added/modified: - tasks/evaluation.py (new core evaluation engine) - tasks/test_*.py (comprehensive test suite) - docs/ (complete documentation suite) - Updated analysis and visualization tools |
||
---|---|---|
.. | ||
construction_tasks | ||
cooking_tasks | ||
crafting_tasks | ||
single_agent | ||
analyse_results.py | ||
analyze_construction_tasks.py | ||
analyze_cooking_tasks.py | ||
analyze_crafting_tasks.py | ||
evaluation.py | ||
evaluation_script.py | ||
example_tasks.json | ||
experiment_script.sh | ||
human_ai_tasks.py | ||
human_evaluation.js | ||
multi_data_collection_script.py | ||
multiagent_crafting_tasks.json | ||
new_analyze_construction_tasks.py | ||
run_task_file.py | ||
running_human_ai.md | ||
test_edge_cases.py | ||
test_evaluation.py | ||
test_integration.py | ||
test_production_readiness.py | ||
test_regression.py |