Benchmarking Agents

This directory provides the ability to benchmark agents (e.g., built using AutoGen) using AgBench. Use the instructions below to prepare your environment for benchmarking. Once done, proceed to the relevant benchmark directory (e.g., benchmarks/GAIA) for further scenario-specific instructions.
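
To preview what a benchmark run looks like once setup is complete, the sketch below shows a typical AgBench invocation. The task file and results directory names are illustrative placeholders; each benchmark's README gives the exact commands for that scenario.

    # Run the scenarios defined in a task file (name varies per benchmark)
    agbench run Tasks/example_tasks.jsonl

    # Summarize the outcomes of the completed run
    agbench tabulate Results/example_tasks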

Setup on WSL

  1. Install Docker Desktop. After installation, a restart is needed. Then open Docker Desktop and, under Settings > Resources > WSL Integration, enable integration with additional distros (e.g., Ubuntu).

  2. Clone autogen and export AUTOGEN_REPO_BASE. This environment variable lets the Docker containers use the correct version of the agents. To make the variable persist across shell sessions, see the first sketch after this list.

    git clone git@github.com:microsoft/autogen.git
    export AUTOGEN_REPO_BASE=<path_to_autogen>
    
  3. Install agbench. AgBench is currently a tool in the AutoGen repo. A quick check that the installation succeeded is shown after this list.

    cd autogen/python/packages/agbench
    pip install -e .
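
The export in step 2 only lasts for the current shell session. A minimal sketch for persisting it, assuming a bash shell and the same placeholder path:

    # Append the export to your shell profile so it survives new sessions
    echo 'export AUTOGEN_REPO_BASE=<path_to_autogen>' >> ~/.bashrc
    source ~/.bashrc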
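
After step 3, you can sanity-check the installation by asking the CLI for its help text; seeing the usage output confirms that agbench is on your PATH.

    # Verify that the agbench CLI is installed and reachable
    agbench --help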