mindcraft

mirror of https://github.com/kolbytn/mindcraft.git synced 2025-07-30 03:45:36 +02:00

History

Johnathan Walker cc51242527 feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting - Added new evaluation.py with dynamic agent configuration support - Implemented comprehensive test suite (38 tests, 100% pass rate) - Enhanced evaluation_script.py with improved error handling and logging - Updated analysis tools for better outcome reporting and visualization - Added extensive documentation including architecture guide and user manuals - Maintained backward compatibility with existing task formats - Improved performance and reliability for multi-agent evaluations Key improvements: - Flexible agent count configuration (1-N agents) - Rich outcome data structures with detailed metrics - Comprehensive error handling and recovery mechanisms - Enhanced logging and debugging capabilities - Complete test coverage for production readiness Files added/modified: - tasks/evaluation.py (new core evaluation engine) - tasks/test_*.py (comprehensive test suite) - docs/ (complete documentation suite) - Updated analysis and visualization tools		2025-06-15 22:01:19 -04:00
..
construction_tasks	cleaning up human ai tasks	2025-05-26 21:25:10 -07:00
cooking_tasks	small changes to tasks	2025-05-30 17:39:47 -05:00
crafting_tasks	small changes to tasks	2025-05-30 17:39:47 -05:00
single_agent	remove task bloat	2025-04-23 14:48:37 -05:00
analyse_results.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00
analyze_construction_tasks.py	fixing crafting tasks as well	2025-05-12 19:46:49 -07:00
analyze_cooking_tasks.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00
analyze_crafting_tasks.py	refactor all python to tasks folder (ai)	2025-04-19 14:49:20 -05:00
evaluation.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00
evaluation_script.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00
example_tasks.json	fix goal string issues	2025-05-23 11:49:51 -07:00
experiment_script.sh	small changes	2025-05-14 14:27:38 -07:00
human_ai_tasks.py	human ai cooking and crafting tasks	2025-04-25 19:16:00 -07:00
human_evaluation.js	new human ai tasks for new cooking tasks	2025-05-08 12:39:32 -07:00
multi_data_collection_script.py	refactor all python to tasks folder (ai)	2025-04-19 14:49:20 -05:00
multiagent_crafting_tasks.json	refactor all python to tasks folder (ai)	2025-04-19 14:49:20 -05:00
new_analyze_construction_tasks.py	fixing crafting tasks as well	2025-05-12 19:46:49 -07:00
run_task_file.py	add script to run all tasks in task file	2025-03-19 13:53:50 -05:00
running_human_ai.md	cleaning up human ai tasks	2025-05-26 21:25:10 -07:00
test_edge_cases.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00
test_evaluation.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00
test_integration.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00
test_production_readiness.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00
test_regression.py	feat: Enhanced task evaluation system with flexible agent support and rich outcome reporting	2025-06-15 22:01:19 -04:00