1. Wrap the JSON list of runs in a trace object (the format we defined)
2. Put the trace files in any folder, for example traces
3. Preprocess the traces with the command below (in the agent-graph repo)
4. Generate the agent graph with the command below
5. Visualise the generated agent graph by serving the portable AgentGraph visualiser
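The first two steps (wrapping the runs and dropping the file into a folder) can be sketched roughly as follows. The trace schema is not spelled out in these notes, so the field names `trace_id` and `runs`, and the helper names `wrap_runs_as_trace` and `save_trace`, are assumptions for illustration only, not the actual agent-graph format.

```python
import json
import uuid
from pathlib import Path


def wrap_runs_as_trace(runs, trace_id=None):
    """Wrap a list of run dicts in one trace object (hypothetical schema)."""
    if not isinstance(runs, list):
        raise TypeError("runs must be a list of run objects")
    return {
        "trace_id": trace_id or uuid.uuid4().hex,  # assumed field name
        "runs": runs,  # assumed field name
    }


def save_trace(trace, folder="traces"):
    """Write the trace to <folder>/<trace_id>.json and return the path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{trace['trace_id']}.json"
    path.write_text(json.dumps(trace, indent=2), encoding="utf-8")
    return path
```

Calling `save_trace(wrap_runs_as_trace(runs))` on a loaded JSON list would leave one file per trace in the folder, ready for the preprocessing step.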
What to change
Tool ideas
- Search
  - line
  - RAG
  - rule based
- Validation
  - Image
  - Rule based
- Graph
  - networkx
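As one possible reading of the line-based search idea, a tool could return matching lines from a trace together with a little surrounding context. This is only a sketch: the function name `search_lines` and the `context` parameter are hypothetical, and it uses plain case-insensitive substring matching rather than anything taken from the agent-graph code.

```python
def search_lines(text, query, context=1):
    """Return (line_number, snippet) for each line containing the query.

    Matching is a case-insensitive substring test; line numbers are 1-based.
    The snippet includes up to `context` lines before and after the hit.
    """
    lines = text.splitlines()
    needle = query.lower()
    hits = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, "\n".join(lines[start:end])))
    return hits
```

An agent could call this on the raw trace text to pull out only the relevant lines instead of reading the whole trace.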
Whole run with 49 samples (gpt-4o-mini)
- With the same model and prompt, at least, performance stays consistent as the sample size grows, regardless of step count or token usage, given the limited data. Changing the prompt may produce better results, but there is still an upper bound on performance.
- Significant differences only appear when preprocessing the trace.
- External tools or data processing algorithms are the only ways to improve results with a fixed model.
- Never judge based on a single result: the results vary randomly with high variance. Rely only on multiple visualisations and metrics.
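To make the last point concrete, a small helper can summarise a metric over repeated runs rather than reading off one number. This is a minimal sketch using only the standard library; the name `summarise_runs` is hypothetical, and the scores passed in would be whatever metric each run produces.

```python
from statistics import mean, stdev


def summarise_runs(scores):
    """Summarise one metric over repeated runs of the same configuration."""
    if len(scores) < 2:
        # A single run says nothing about run-to-run variation.
        raise ValueError("need at least two runs to estimate variation")
    return {
        "n": len(scores),
        "mean": mean(scores),
        "stdev": stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }
```

Comparing two configurations by their means alongside the standard deviations makes it easier to tell whether a difference is larger than the run-to-run noise.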
GPT-4o
OpenAI Agent
Two step agent
baseline single validation agent
two step
direct → validation only
only add validation
external iterative addition strategy without addition (whole generate)
interval validation tool without addition
corrected add and validation tool
direct LLM
Production
slice
proper slice
Seonglae Cho