Rename the `ChatMessage` and `AgentEvent` base classes to `BaseChatMessage` and `BaseAgentEvent`.
Bring back `ChatMessage` and `AgentEvent` as unions of the built-in concrete types to avoid breaking existing applications that depend on Pydantic serialization.
Why?
Much existing code uses containers like this:
```python
class AppMessage(BaseModel):
    name: str
    message: ChatMessage

# Serialization looks like this:
m = AppMessage(...)
m.model_dump_json()
# Fields like HandoffMessage.target will be lost because ChatMessage is now
# treated as a base class without content or target fields.
```
The assumption that `ChatMessage` or `AgentEvent` is a union of concrete types may exist in many code bases. So this PR brings back the union types, while keeping method type hints, such as those on `on_messages`, on the `BaseChatMessage` and `BaseAgentEvent` base classes for flexibility.
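The difference can be reproduced with a minimal sketch, assuming Pydantic v2. The message classes below are simplified, hypothetical stand-ins for the real ones (only the class names come from this PR; the `source`, `content`, and `target` fields are illustrative), and the two container names are invented for the comparison.

```python
from typing import Union

from pydantic import BaseModel


# Hypothetical stand-ins for the real message classes.
class BaseChatMessage(BaseModel):
    source: str


class TextMessage(BaseChatMessage):
    content: str


class HandoffMessage(BaseChatMessage):
    content: str
    target: str


# Union of concrete types, as restored by this PR.
ChatMessage = Union[TextMessage, HandoffMessage]


class BaseTypedContainer(BaseModel):
    # Field annotated with the base class: serialization follows the base schema.
    message: BaseChatMessage


class UnionTypedContainer(BaseModel):
    # Field annotated with the union: serialization follows the concrete type.
    message: ChatMessage


handoff = HandoffMessage(source="assistant", content="over to you", target="planner")

base_dump = BaseTypedContainer(message=handoff).model_dump()
union_dump = UnionTypedContainer(message=handoff).model_dump()

print(base_dump["message"])   # subclass-only fields are dropped
print(union_dump["message"])  # all fields of HandoffMessage are kept
```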
This PR allows docker-out-of-docker scenarios to be run in agbench
(e.g., agent teams that rely on the DockerCommandLineExecutor).
This is becoming increasingly important for benchmarking and testing,
since the behaviors of running local executors can diverge in important
ways.
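For context, docker-out-of-docker usually means a container talks to the host's Docker daemon through the mounted socket. The sketch below only illustrates that general pattern. The helper `build_docker_run_args` is hypothetical and is not how agbench is known to implement it.

```python
from typing import List

# Default location of the Docker daemon socket on Linux hosts.
DOCKER_SOCKET = "/var/run/docker.sock"


def build_docker_run_args(
    image: str, command: List[str], share_host_daemon: bool = True
) -> List[str]:
    """Build a `docker run` argument list (hypothetical helper).

    When share_host_daemon is True, the host's Docker socket is bind-mounted
    into the container, so code inside it can start sibling containers.
    """
    args = ["docker", "run", "--rm"]
    if share_host_daemon:
        args += ["-v", f"{DOCKER_SOCKET}:{DOCKER_SOCKET}"]
    args.append(image)
    args += command
    return args


args = build_docker_run_args("python:3.11", ["python", "scenario.py"])
print(" ".join(args))
```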
This pull request introduces a new linting feature to the benchmark
configuration in the `agbench` package. The main changes include adding
a new command to the CLI, implementing the linter functionality, and
integrating it with the existing codebase.
### New Linting Feature:
*
[`python/packages/agbench/src/agbench/cli.py`](diffhunk://#diff-0eafed70ad5e99e6f7319927bf92ee3ce4787d156dd2775b10a61baad7ec1799R10):
Added `lint_cli` import and integrated the new "lint" command into the
`main` function.
[[1]](diffhunk://#diff-0eafed70ad5e99e6f7319927bf92ee3ce4787d156dd2775b10a61baad7ec1799R10)
[[2]](diffhunk://#diff-0eafed70ad5e99e6f7319927bf92ee3ce4787d156dd2775b10a61baad7ec1799R37-R41)
### Linter Implementation:
*
[`python/packages/agbench/src/agbench/linter/__init__.py`](diffhunk://#diff-45842e728e3daad063b3cf84d5857a4fdfe14e6d977fb2054f284eb9f5bb5272R1-R4):
Added necessary imports to initialize the linter module.
*
[`python/packages/agbench/src/agbench/linter/_base.py`](diffhunk://#diff-f7ea2f6706232406b6c727fda6d71f09c568b4573f070af79bb7f3da3514e364R1-R81):
Defined core classes such as `Document`, `Code`, `CodeExample`,
`CodedDocument`, and the `BaseQualitativeCoder` protocol.
*
[`python/packages/agbench/src/agbench/linter/cli.py`](diffhunk://#diff-e6ad1e14dc0df2c10fe62fede5a06d83865ad1961f99ec2d78f9052feb4d663bR1-R86):
Implemented the `lint_cli` function, which includes loading log files,
coding them, and printing the results.
*
[`python/packages/agbench/src/agbench/linter/coders/oai_coder.py`](diffhunk://#diff-5059129410822c8a214f797a6167cbfcfbe31bd6a3b1efcb65a2dd703ef9b331R1-R212):
Implemented the `OAIQualitativeCoder` class to interact with OpenAI for
coding documents and caching results.
Example usage:
<img width="997" alt="image"
src="https://github.com/user-attachments/assets/6718688e-9917-4a43-a2f1-1105b030528d"
/>
<img width="999" alt="image"
src="https://github.com/user-attachments/assets/7fcb9c43-70f2-4fe7-ae29-5ad6a4ef2a16"
/>
> If you are in VSCode Terminal, you can click on the links in the
terminal output to jump to the exact error.
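To make the shape of the linter types easier to picture, here is a rough sketch. Only the class names come from this PR. Every field, the `code_document` method name and signature, and the toy `KeywordCoder` are assumptions made for illustration and may differ from what `_base.py` actually defines.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Document:
    text: str                   # assumed: raw log text to be coded
    name: Optional[str] = None  # assumed: e.g. the log file name


@dataclass(frozen=True)
class CodeExample:
    reason: str  # assumed: why the line was flagged
    line: int    # assumed: 1-based line number in the document


@dataclass
class Code:
    name: str
    definition: str
    examples: List[CodeExample] = field(default_factory=list)


@dataclass
class CodedDocument:
    doc: Document
    codes: List[Code]


class BaseQualitativeCoder(Protocol):
    # Assumed method name and signature.
    def code_document(self, doc: Document) -> CodedDocument: ...


class KeywordCoder:
    """Toy coder that flags lines mentioning a Python traceback."""

    def code_document(self, doc: Document) -> CodedDocument:
        examples = [
            CodeExample(reason="line mentions a traceback", line=number)
            for number, line in enumerate(doc.text.splitlines(), start=1)
            if "Traceback" in line
        ]
        codes = (
            [Code("unhandled-exception", "The log contains a Python traceback.", examples)]
            if examples
            else []
        )
        return CodedDocument(doc=doc, codes=codes)


coder: BaseQualitativeCoder = KeywordCoder()
log = Document(text="step 1 ok\nTraceback (most recent call last):\n", name="console_log.txt")
coded = coder.code_document(log)
for code in coded.codes:
    for example in code.examples:
        print(f"{log.name}:{example.line}: {code.name}: {example.reason}")
```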
---------
Co-authored-by: afourney <adamfo@microsoft.com>
- Updated HumanEval template to use AgentChat
- Update templates to use config.yaml for model and other configuration
- Read environment from ENV.yaml (ENV.json still supported but
deprecated)
- Temporarily removed WebArena and AssistantBench. Neither had viable
templates after `autogen_magentic_one` was removed. The templates need to be
updated to AgentChat (in a future PR; this PR is getting big enough
already)
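The ENV.yaml / ENV.json precedence described above could look roughly like the following sketch. The `load_env` helper is hypothetical, and the YAML branch assumes PyYAML is installed. The actual agbench loader may behave differently.

```python
import json
import os
import warnings
from typing import Any, Dict


def load_env(directory: str) -> Dict[str, Any]:
    """Load environment settings, preferring ENV.yaml over ENV.json (hypothetical helper)."""
    yaml_path = os.path.join(directory, "ENV.yaml")
    json_path = os.path.join(directory, "ENV.json")

    if os.path.isfile(yaml_path):
        import yaml  # PyYAML, imported lazily so JSON-only setups still work

        with open(yaml_path, "rt", encoding="utf-8") as fh:
            return yaml.safe_load(fh) or {}

    if os.path.isfile(json_path):
        # ENV.json is still read, but flagged as deprecated.
        warnings.warn(
            "ENV.json is deprecated; use ENV.yaml instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        with open(json_path, "rt", encoding="utf-8") as fh:
            return json.load(fh)

    # Neither file exists: no extra environment settings.
    return {}
```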
This PR removes the older `autogen_magentic_one` package, and directs
people to use the new AgentChat implementation.
Hopefully this eases confusion.
---------
Co-authored-by: Jack Gerrits <jack@jackgerrits.com>
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
* Fix definition of workspace package, remove uv pin
* add --all-packages
* pin docs uv versions for older project structure
* try old version to verify CI
* Use workflow target
* change syntax
* change check
* try with var in matrix
* add all packages to workspace
* remove project table
1. convert dataclass types to pydantic basemodel
2. add save_state and load_state for ChatAgent
3. state types for AgentChat
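A minimal sketch of the save/load pattern with a Pydantic state type, assuming Pydantic v2. `EchoAgent` and `EchoAgentState` are hypothetical names, and the method signatures are illustrative rather than the actual `ChatAgent` interface.

```python
import asyncio
from typing import Any, List, Mapping

from pydantic import BaseModel, Field


class EchoAgentState(BaseModel):
    """Hypothetical state type: a Pydantic model instead of a dataclass."""

    type: str = "EchoAgentState"
    history: List[str] = Field(default_factory=list)


class EchoAgent:
    """Toy agent that only records the messages it receives."""

    def __init__(self) -> None:
        self._history: List[str] = []

    async def on_message(self, text: str) -> None:
        self._history.append(text)

    async def save_state(self) -> Mapping[str, Any]:
        # Dump to plain Python types so the state can be stored as JSON.
        return EchoAgentState(history=list(self._history)).model_dump()

    async def load_state(self, state: Mapping[str, Any]) -> None:
        # Validate the stored mapping before restoring from it.
        self._history = list(EchoAgentState.model_validate(state).history)


async def demo() -> Mapping[str, Any]:
    first = EchoAgent()
    await first.on_message("hello")
    saved = await first.save_state()

    second = EchoAgent()
    await second.load_state(saved)
    return await second.save_state()


restored = asyncio.run(demo())
print(restored)
```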
---------
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
* add tests to ruff for core
* fmt
* lint
* lint fixes
* fixup more dirs
* dont include non python
* lint fixes
* lint fixes
* fix dir name
* dont relative include
* Migrate to uv and poe for workspace management and task running
* install python
* try fix
* ensure workspace venv is used
* package dir
* move nbqa to mypy task
* separate sync, clarify docs