Agent state graph is an integral of the trace
Todo
check system prompt
support a different visualization for plan changes: traces 10 and 11, which have a plan feedback loop
counterfactual bias, bias, jailbreaking, tool misuse testing
Infer
1. checkSchemaType → merged | langsmith | langfuse
   - IF merged: → mergedToGraph() (existing, done)
   - IF langsmith/langfuse: ↓
2. saveTraceToFile(/tmp/trace.json) ↓
3. jqQuery tool - explore & extract per step:
   - inferStateType
   - inferSubject
   - extractData
   - extractTools
   - inferPrompts
   - extractTimestamp
   ↓
4. Map phase:
   - Direct inference OR
   - Generate function → double-check → use
Ready to start implementing? I'll add minimal tools:
1. createSaveTraceToFileTool()
2. createJqQueryTool()
3. Update checkSchemaType to detect langsmith/langfuse
4. Update prompt for new flow
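A minimal sketch of step 1 of the flow, only to make the branching concrete. The detection heuristics (an `observations` array for langfuse, a `run_type` string for langsmith) are assumptions about those export formats, and `routeTrace` is a hypothetical helper, not the existing checkSchemaType implementation.

```typescript
// Hypothetical sketch: the detection heuristics are assumptions, not the real checkSchemaType.
type SchemaType = "merged" | "langsmith" | "langfuse" | "unknown";
type Route = "mergedToGraph" | "saveTraceToFile -> jqQuery" | "unsupported";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function checkSchemaType(trace: unknown): SchemaType {
  if (!isRecord(trace)) return "unknown";
  // Merged traces carry orchestrator_traces, either at the top level or under `data`.
  const data = isRecord(trace.data) ? trace.data : trace;
  if ("orchestrator_traces" in data) return "merged";
  // Assumption: a langfuse export exposes an `observations` array.
  if (Array.isArray(trace.observations)) return "langfuse";
  // Assumption: a langsmith export exposes a `run_type` string.
  if (typeof trace.run_type === "string") return "langsmith";
  return "unknown";
}

// Maps the detected schema to the branch of the flow that should handle it.
function routeTrace(trace: unknown): Route {
  switch (checkSchemaType(trace)) {
    case "merged":
      return "mergedToGraph";
    case "langsmith":
    case "langfuse":
      return "saveTraceToFile -> jqQuery";
    default:
      return "unsupported";
  }
}
```

Anything that lands in "unsupported" would need a new detection rule before the jq exploration step can run.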
Integration - leftshift
Big win: they are going to develop an open API endpoint to pull traces.
We still need to talk more about that, though. The other good news is that all the unilever teams may use the internal observability platform, which means extending our agentgraph product to all unilever teams takes the least effort. Even better to hear that evalkit is going to be the standardized observability platform.
go for it. all for it
we already have a connection-based integration: much easier and less complex
trace registration: constructed workflow memory, procedure
how large will the trace volume be?
filter: embedding-based trace clustering to prevent long runs
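One way the clustering filter could work, as a rough sketch: greedy clustering on cosine similarity, so only one representative trace per cluster has to go through a long run. The embeddings are assumed to be precomputed number arrays, and the 0.9 threshold and the names `cosine` and `clusterTraces` are illustrative assumptions.

```typescript
type Cluster = { representative: number[]; members: number[] };

// Cosine similarity between two vectors of the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy clustering: a trace joins the first cluster whose representative is
// similar enough; otherwise it starts a new cluster and becomes its representative.
function clusterTraces(embeddings: number[][], threshold = 0.9): Cluster[] {
  const clusters: Cluster[] = [];
  embeddings.forEach((embedding, index) => {
    const home = clusters.find((c) => cosine(c.representative, embedding) >= threshold);
    if (home) home.members.push(index);
    else clusters.push({ representative: embedding, members: [index] });
  });
  return clusters;
}
```

The first member index of each cluster is the trace to process; the remaining members can be skipped or sampled.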
webhook
do you have a centralized system that stores traces, like langfuse or langsmith?
uniform consistent trace schema, connection
some utility agent traces are missing the system prompt
competitor agent and orchestrator traces have different schemas
data schema
share tool and system prompt schema change
redaction / data masking
visualization
model config can be changed: select 3
let the visualization view show the available tools
bias mitigation: already refers to the graph
suggested actions: if consecutive, group them together
42 tests are confusing
unsafe score 0% is weird
why does bias have only 3, not 4?
save frequency and frequency of final state
heatmap agent graph - for state and use frequency
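A small sketch of the counts a heatmap like this would need, assuming each trace has already been reduced to an ordered list of state names. The names `Frequencies` and `countFrequencies` and the `a -> b` key format are assumptions for illustration.

```typescript
type Frequencies = {
  states: Map<string, number>;
  transitions: Map<string, number>;
  finalStates: Map<string, number>;
};

// Increments the counter stored under `key`.
function bump(counter: Map<string, number>, key: string): void {
  counter.set(key, (counter.get(key) ?? 0) + 1);
}

// Counts state visits, transitions, and final states across all runs.
function countFrequencies(runs: string[][]): Frequencies {
  const result: Frequencies = {
    states: new Map(),
    transitions: new Map(),
    finalStates: new Map(),
  };
  for (const run of runs) {
    run.forEach((state, i) => {
      bump(result.states, state);
      const next = run[i + 1];
      if (next !== undefined) bump(result.transitions, `${state} -> ${next}`);
      else bump(result.finalStates, state); // last state of the run
    });
  }
  return result;
}
```

State counts would drive node color, transition counts edge width, and the final-state counts feed the final state frequency noted above.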
failure visualization detection, final state and malfunction detection
basic asset view broken
how to test more accurately
user node
cascading effect
sample trace data
split the dataset into small tests and large tests; for now use the small dataset
Perturbation Testing
why hallucination takes longest > jailbreak > bias > tool misuse
verify that the changed system prompt actually scores lower on the same perturbation testing
why does hallucination have a base prompt / indicate recheck as safe
migrate to S3 because of the 100 KB / 256 KB event bus limit
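A sketch of the usual pattern for this kind of limit: measure the serialized trace, and if it is over the limit, send only a pointer on the bus and upload the body separately. The names `EventEntry`, `toEventEntry`, and `s3Key` are hypothetical, the actual S3 upload is left to the caller, and which of the two limits applies is not settled in these notes, so the limit is a parameter.

```typescript
type EventEntry =
  | { kind: "inline"; detail: string }
  | { kind: "pointer"; detail: string; upload: { key: string; body: string } };

// Decides whether a trace fits on the event bus or must be sent as a pointer.
function toEventEntry(
  trace: Record<string, unknown>,
  key: string,
  limitBytes: number,
): EventEntry {
  const body = JSON.stringify(trace);
  // Measure UTF-8 bytes, not string length: multi-byte characters count more.
  const size = new TextEncoder().encode(body).length;
  if (size <= limitBytes) return { kind: "inline", detail: body };
  return {
    kind: "pointer",
    // Hypothetical pointer shape: the consumer fetches the body by key.
    detail: JSON.stringify({ s3Key: key, size }),
    upload: { key, body },
  };
}

// Usage: the caller uploads `entry.upload.body` under `entry.upload.key`
// before publishing `entry.detail` when kind is "pointer".
const entry = toEventEntry({ messages: [] }, "traces/example.json", 256 * 1024);
```

Keeping the function pure makes the size decision easy to test without touching the bus or the bucket.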
prompt leakage testing
agent mode
trace registration process - let the client interact
jq function tool to identify trace
delete all key tool for agent for trace
rag system/instruction prompt
use current rule based as a function
ref claude code's src
the agent can write a function that extracts tools etc. dynamically, but we need to check the finalized readable graph
hide the graph and connection building with play
how about fine tuning black box with endpoint case? even white box agent system
for ui, select trace to visualize
specific model - temp 0 - user side can be configurable
data flow graph
memory is trace-specific information that could help build a global view during graph generation, but it might not be used for the reusable graph
share video on the channel (deterministic demo)
agent graph visualization → construction
Graph → Prompt Reconstruction (rebuild prompts) → Perturbation Testing (bias/jailbreak) → Causal Analysis (identify risky)
hallucination detection test
number of safe and vulnerable, most vulnerable
tool misuse: two sides
tool call: malicious argument
tool response: expected tool response generated from the tool description
visualizing utility agent flow
ask umar to do work in parallel
play button
separate layer of agent graph schema
mixing low-level and high-level concepts doesn't make sense, such as combining prompts and tasks. this reflects a fundamental design flaw
orchestrator agent is missing, parallel agent for utility agent
rethink stream update - interpret a new trace based on the agent graph by state transition and simulation
graph change approval
visualize stale nodes and edges that could be deleted and show them to the user
graph expansion
Problems
- AgentSteer mlflow trace and unilever mlflow trace are different
- That is fine, but what is the real original form from mlflow?
- Source code limitation
AgentSteer Structure
- MLflow Trace
- Basic Graph
- Detailed Graph
- Reactflow Graph
- Visualization
Instruction
Write a @traverse-agent.ts script that can be run with pnpm shell .src/agentgraph/traverse-agent.ts ˜˜.json, using the vercel ai sdk (refer to @exposure-assessment.ts for how to use it) of mastra agent. It reads the provided json, accesses data.orchestrator_traces.messages (only that), and then traverses the messages while updating the output graph and memory.
@merged_orchestrator_6909ed23e4486dfa127548a1.json (32-44)
1. For example, reading the first message updates the global memory stack (the question) and makes the first edge and node: the edge's subject attribute is user and the target node is, for example, the Asked status.
@merged_orchestrator_6909ed23e4486dfa127548a1.json (45-73)
2. A new edge and a new status node, planned, are created. The edge's subject might be the competitor 360 agent, and the explanation might or might not be added to memory. Obviously the plan should be added to the global memory stack.
@merged_orchestrator_6909ed23e4486dfa127548a1.json (73-101)
3. A new edge with the user as subject and a new state node, approved.
@merged_orchestrator_6909ed23e4486dfa127548a1.json (103-115)
4. The ares subject moves onto the edge and the status moves to the research state.
@merged_orchestrator_6909ed23e4486dfa127548a1.json (116-128)
5. A self-referencing edge with the ares subject.
@merged_orchestrator_6909ed23e4486dfa127548a1.json (129-141)
6. A new state node, retrieved, with an edge whose subject is ares.
@merged_orchestrator_6909ed23e4486dfa127548a1.json (141-160)
7. A new reported state node with the reporter as subject.
If the memory size is huge, there is no need to store the full context, and no need to update memory at all **if it does not help future graph construction or help prevent duplicated nodes or edges**.
rules
- The traverse agent constructs a state graph for an agent trace: traversing and generating a graph.
- A single message can create a single state node and edge.
- The traverse agent has the context of what it is doing, the global memory, and the constructed graph.
- The user or an agent is the executor of a state change (edge); an edge's attribute holds the user, a specific agent, or a subject.
- The agent is given an empty graph or a pre-constructed graph. With a pre-constructed graph, the traverse agent basically traverses it, changing state based on the edges. However, if some trace components cannot be interpreted by the pre-given graph or the current graph (the agent only knows the current graph; it has no concept of pre-existing), it updates or creates a new edge or node to traverse correctly.
for example
- all graphs are DAGs
case 1: if the user rejected the plan, the agent might create a rejected state from planned with the user as subject, or create an edge to the asked state, depending on the implementation.
case 2: if planning is asked of another agent, handed off to the competitor agent, it should create another edge but not a state, just with another subject.
Did you get it, or is there another question?
First you should sort by message.created_at.
answer
1. The graph schema {} and the state node type should be dynamically selected, not fixed.
The memory (context) only contains: the current piece of trace, the current graph, the global memory, instruction rules, etc.
Memory is just key-value storage; please do not make it a fixed type.
For edges, please remove explanation, and the timestamp type is string. Remove timestamp and message id.
You don't get it yet. We are constructing a **generalized graph that is capable of explaining every trace generated from the same agent system**; we are not generating a graph for a single trace.
An edge has additional optional attributes, system_prompt and instruction_prompt, potentially with a template string if you are confident about the semantic variable names in this agent system's context.
You can also update useful global memory, such as a system description written in a skeptical tone, since the agent only sees a piece of the trace. That means it needs to be updated along the trace.
The output graph schema is { states, edges? or transitions? (what is the term used in the finite state machine context?), memory { record string string? } }
There may be several tools for generating and managing the graph:
- create_state
- create_edge?
When the AI agent calls a function, the important thing is that it should not rewrite the whole graph. The agent just writes a state json or an edge json, passes it to the tool, and the tool updates the managed runtime object. (It returns only success or not, not the full graph inserted into the context, since the agent already has it.)
create_memory, update memory, and delete memory work the same way: only the key and value are generated by the LLM as tokens, and the return is success or not.
Any more questions?
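A minimal sketch of the tool side described in this message, assuming the runtime object lives outside the model context and every tool returns only success or failure. The field names `id`, `type`, `from`, and `to`, the tool names `update_memory` and `delete_memory`, and the `GraphRuntime` class are illustrative assumptions; `subject`, `system_prompt`, and `instruction_prompt` follow the notes.

```typescript
type State = { id: string; type: string }; // `type` is free-form, not a fixed enum
type Edge = {
  from: string;
  to: string;
  subject: string; // user or a specific agent
  system_prompt?: string;
  instruction_prompt?: string;
};
type ToolResult = { success: boolean; error?: string };

class GraphRuntime {
  private states = new Map<string, State>();
  private edges = new Map<string, Edge>();
  private memory = new Map<string, string>();

  create_state(state: State): ToolResult {
    if (this.states.has(state.id)) return { success: false, error: "state exists" };
    this.states.set(state.id, state);
    return { success: true };
  }

  create_edge(edge: Edge): ToolResult {
    if (!this.states.has(edge.from) || !this.states.has(edge.to)) {
      return { success: false, error: "unknown state" };
    }
    // Same endpoints with a different subject count as a different edge.
    const key = `${edge.from}|${edge.to}|${edge.subject}`;
    if (this.edges.has(key)) return { success: false, error: "edge exists" };
    this.edges.set(key, edge);
    return { success: true };
  }

  create_memory(key: string, value: string): ToolResult {
    if (this.memory.has(key)) return { success: false, error: "key exists" };
    this.memory.set(key, value);
    return { success: true };
  }

  update_memory(key: string, value: string): ToolResult {
    if (!this.memory.has(key)) return { success: false, error: "unknown key" };
    this.memory.set(key, value);
    return { success: true };
  }

  delete_memory(key: string): ToolResult {
    return { success: this.memory.delete(key) };
  }

  // Read by the host code only; tool results never echo the full graph.
  snapshot(): { states: State[]; edges: Edge[]; memory: Record<string, string> } {
    const memory: Record<string, string> = {};
    this.memory.forEach((value, key) => {
      memory[key] = value;
    });
    return {
      states: Array.from(this.states.values()),
      edges: Array.from(this.edges.values()),
      memory,
    };
  }
}
```

Each method would be wrapped as one tool; the model writes a single state or edge json per call and gets back only the `ToolResult`.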
This is not for the orchestrator; this is for any agent system. Change the schema name.
Use fs promises, and do not use import * for path either.
Never identify message.type manually inside the code. The agent just reads the full json of the message, with no unnecessary fine-tuning to this trace schema. The code only knows that the messages are a list of unknown.
You can define the schema of this trace as the merged azure trace, but never fine-tune on that.
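A sketch of the loading side under these constraints: a named import from fs promises, messages kept as a list of unknown, and the only field the code reads being created_at for the sort requested earlier. The helper names `getPath`, `sortByCreatedAt`, and `loadMessages` are hypothetical, and created_at is assumed to be either a number or a date string that `Date.parse` accepts.

```typescript
import { readFile } from "node:fs/promises";

// Walks a key path through unknown JSON without assuming any schema.
function getPath(root: unknown, keys: string[]): unknown {
  let current: unknown = root;
  for (const key of keys) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Sort key for one message; missing or unparsable values sort last.
function createdAtMs(message: unknown): number {
  const value = getPath(message, ["created_at"]);
  const ms = typeof value === "number" ? value : Date.parse(String(value));
  return Number.isNaN(ms) ? Number.POSITIVE_INFINITY : ms;
}

// Returns a sorted copy; the input array is left untouched.
function sortByCreatedAt(messages: unknown[]): unknown[] {
  return [...messages].sort((a, b) => {
    const x = createdAtMs(a);
    const y = createdAtMs(b);
    return x === y ? 0 : x < y ? -1 : 1;
  });
}

// Reads the trace file and returns its messages in chronological order.
async function loadMessages(file: string): Promise<unknown[]> {
  const json: unknown = JSON.parse(await readFile(file, "utf8"));
  const messages = getPath(json, ["data", "orchestrator_traces", "messages"]);
  if (!Array.isArray(messages)) {
    throw new Error("data.orchestrator_traces.messages is not an array");
  }
  return sortByCreatedAt(messages);
}
```

Each sorted message would then be handed to the agent as raw json, one at a time, without the code branching on message.type.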
Merged architecture
orchestrator_traces
- conversation {}
- turns []
- messages []
- plans []
- events []
- files = []
- prompt_cards []
- user_feedbacks = []
- status_store = []
- stats {}
competitor_traces
- natural_language_query
- competitor_df
- chat_history = []
- standalone_query
- rephrased_query
Orchestrator architecture
- events
- turns
- plans
- prompt_cards
- feedbacks = []
- execution_trackers
- files = []
- Technical Excellence
- Poster & Presentation
- Real-world Impact
- Track A: Performance Optimization / Robustness
- Track B: Observability / Explainability
- Track C: Attack Discovery / Methodology
sycophancy
Seonglae Cho