HAI AgentGraph

Agent state graph is an integral of the trace

Todo

check system prompt

support a different visualization for plan changes: traces 10 and 11 have a plan feedback loop

counterfactual bias, bias, jailbreaking, tool misuse testing

Infer


1. checkSchemaType → merged | langsmith | langfuse

     IF merged:
       → mergedToGraph() (existing, done)

     IF langsmith/langfuse:
       ↓
  2. saveTraceToFile(/tmp/trace.json)
       ↓
  3. jqQuery tool - explore & extract per step:
     - inferStateType
     - inferSubject
     - extractData
     - extractTools
     - inferPrompts
     - extractTimestamp
       ↓
  4. Map phase:
     - Direct inference OR
     - Generate function → double-check → use

  Ready to start implementing? I'll add minimal tools:
  1. createSaveTraceToFileTool()
  2. createJqQueryTool()
  3. Update checkSchemaType to detect langsmith/langfuse
  4. Update prompt for new flow
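
A minimal sketch of step 1 of the flow above. Only the `orchestrator_traces` key (under an optional `data` wrapper) comes from these notes; the marker fields used for langsmith (`run_type`, `dotted_order`) and langfuse (`observations`) are assumptions and should be checked against real exports before use.

```typescript
// Hypothetical schema detector; the langsmith/langfuse marker fields are assumptions.
type SchemaType = "merged" | "langsmith" | "langfuse" | "unknown";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function checkSchemaType(trace: unknown): SchemaType {
  // Unwrap an optional top-level `data` wrapper before inspecting keys.
  const root = isRecord(trace) && isRecord(trace["data"]) ? trace["data"] : trace;
  if (!isRecord(root)) return "unknown";
  if ("orchestrator_traces" in root) return "merged";
  if ("observations" in root) return "langfuse";
  if ("run_type" in root || "dotted_order" in root) return "langsmith";
  return "unknown";
}
```

Returning "unknown" instead of guessing keeps the merged path (mergedToGraph) untouched when a new trace format shows up.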

inferStateType

inferSubject

extractData

extractTools

inferPrompts

extractTimestamp

Integration - leftshift

Big win: they are going to develop an open API endpoint to pull traces.

even though we need more discussion on that. additional good news: all Unilever teams may use the internal observability platform, which means it takes the least effort to extend our AgentGraph product to all Unilever teams. even better to hear that evalkit is going to be the standardized observability platform

go for it. all for it

we already have a connection-based integration. much easier and less complex

trace registration. constructed workflow memory, procedure

how large will the trace volume be

filter: embedding-based trace clustering to prevent long runs
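
One possible shape for the filter, as a sketch: greedy threshold clustering over precomputed trace embeddings, keeping one representative per cluster. The threshold value, the function names, and the assumption that embeddings are non-zero vectors of equal length are illustrative, not decided.

```typescript
// Cosine similarity between two equal-length, non-zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Returns the indices of one representative trace per cluster.
// A trace is skipped when it is close enough to an already chosen representative.
function pickRepresentatives(embeddings: number[][], threshold = 0.9): number[] {
  const reps: number[] = [];
  for (let i = 0; i < embeddings.length; i++) {
    const covered = reps.some((r) => cosine(embeddings[r], embeddings[i]) >= threshold);
    if (!covered) reps.push(i);
  }
  return reps;
}
```

Only the representatives would then be passed to graph construction, which bounds the run length by the number of clusters rather than the number of traces.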

webhook

do you have any centralized system that stores traces, like langfuse or langsmith

uniform consistent trace schema, connection

some utility agent trace missing system prompt

competitor agent orchestrator trace has a different schema

data schema

share tool and system prompt schema change

redaction / data masking
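
A rough sketch of what masking could look like before a trace is stored or sent to a model. The key list, the email pattern, and the `[REDACTED]` token are placeholders; the real list depends on what the trace owners consider sensitive.

```typescript
// Hypothetical key list and mask token; adjust to the real trace schema.
const SENSITIVE_KEYS = new Set(["email", "api_key", "password"]);
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+\.[\w.-]+/g;

// Walks any JSON-like value and returns a masked copy; the input is not mutated.
function redact(value: unknown): unknown {
  if (typeof value === "string") return value.replace(EMAIL_PATTERN, "[REDACTED]");
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      // Mask the whole value for sensitive keys, otherwise recurse.
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```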

visualization

model config can be changed; select 3

let the visualize view show the tools available

bias mitigation: refer to the existing graph

suggested actions: if consecutive, group them together

42 tests are confusing

unsafe score 0% is weird

why bias has only 3 not 4

save frequency and frequency of final state

heatmap agent graph - for state and use frequency
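
A small sketch of the counts a heatmap would need: how often each state is visited and how often each state is the final one. The input shape (one ordered list of state names per trace) is an assumption.

```typescript
// Each path is the ordered list of state names one trace visited (assumed input shape).
function stateFrequencies(paths: string[][]): {
  visits: Map<string, number>;
  finals: Map<string, number>;
} {
  const visits = new Map<string, number>();
  const finals = new Map<string, number>();
  for (const path of paths) {
    for (const state of path) visits.set(state, (visits.get(state) ?? 0) + 1);
    const last = path[path.length - 1];
    // Empty paths contribute no final state.
    if (last !== undefined) finals.set(last, (finals.get(last) ?? 0) + 1);
  }
  return { visits, finals };
}
```

The `visits` map could drive node color intensity, and `finals` could highlight where traces tend to end, including failure states.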

failure visualization detection, final state and malfunction detection

basic asset view broken

how to test more accurately

user node

cascading effect

sample trace data

split the dataset into small tests and large tests; for now use the small dataset

Perturbation Testing

why hallucination takes longest > jailbreak > bias > tool misuse

verify the changed system prompt actually scores lower on the same perturbation test

why hallucination does have base prompt / indicate recheck as safe

migrate to S3 because of the 100 KB / 256 KB event bus limit
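
One way to do this is a claim-check pattern: payloads over the limit go to object storage and the event carries only a pointer. The sketch below uses a hypothetical `BlobStore` interface in place of a real S3 client, and the key layout `traces/<id>.json` is made up; the 256 KB figure is the one from the note and should be verified against the actual bus.

```typescript
// `BlobStore` stands in for an S3 client; it is a hypothetical interface, not the AWS SDK API.
interface BlobStore {
  put(key: string, body: string): Promise<void>;
}

type EventDetail =
  | { kind: "inline"; payload: string }
  | { kind: "pointer"; key: string };

const LIMIT_BYTES = 256 * 1024; // taken from the note above; verify against the actual bus

async function toEventDetail(
  traceId: string,
  trace: unknown,
  store: BlobStore,
): Promise<EventDetail> {
  const payload = JSON.stringify(trace);
  // Measure UTF-8 bytes, not string length, since limits are byte-based.
  const size = new TextEncoder().encode(payload).length;
  if (size <= LIMIT_BYTES) return { kind: "inline", payload };
  const key = `traces/${traceId}.json`; // hypothetical key layout
  await store.put(key, payload);
  return { kind: "pointer", key };
}
```

In practice the limit should leave headroom for the event envelope, since the bus limit applies to the whole event and not just the payload.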

prompt leakage testing

agent mode

trace registration process - let client to interact

jq function tool to identify trace

delete all key tool for agent for trace

rag system/instruction prompt

use current rule based as a function

ref claude code's src

agent can write a function that extracts tools etc. dynamically, but need to check the finalized readable graph

hide the graph and connection building with play

how about fine tuning black box with endpoint case? even white box agent system

for ui, select trace to visualize

specific model - temp 0 - user side can be configurable

data flow graph

memory is a trace specific information that could help a global view of graph generation which might not used for reusable graph

share video on the channel (deterministic demo)

agent graph visualization → construction

Graph → Prompt Reconstruction (rebuild prompts) → Perturbation Testing (bias/jailbreak) → Causal Analysis (identify risky)

hallucination detection test

number of safe and vulnerable, most vulnerable

tool misuse: two sides

tool call: malicious argument

tool response: expected tool response generated from the tool description

visualizing utility agent flow

ask umar to do work in parallel

play button

separate layer of agent graph schema

mixing low-level and high-level concepts doesn't make sense, such as combining prompts and tasks. this reflects a fundamental design flaw

orchestrator agent is missed, parallel agent for utility agent

Adjacency matrix
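
A minimal sketch of building the adjacency matrix from a list of states and edges. The `Edge` shape is an assumption; cell values count parallel edges, which matters here because the same pair of states can be connected by edges with different subjects.

```typescript
interface Edge {
  from: string;
  to: string;
}

// Rows are source states, columns are target states, in the order given by `states`.
function adjacencyMatrix(states: string[], edges: Edge[]): number[][] {
  const index = new Map(states.map((s, i) => [s, i] as const));
  const matrix = states.map(() => states.map(() => 0));
  for (const { from, to } of edges) {
    const r = index.get(from);
    const c = index.get(to);
    if (r === undefined || c === undefined) continue; // skip edges to unknown states
    matrix[r][c] += 1; // counts parallel edges (e.g. different subjects)
  }
  return matrix;
}
```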

rethink stream update - interpret new trace based on agent graph by state transition and simulation

graph change approval

visualize stale nodes and edges that could be deleted, and show them to the user
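
A sketch of one possible staleness policy: an element is stale if no trace has traversed it within the last `window` traces. The policy, the `lastSeen` bookkeeping, and the names are assumptions; nothing here is decided.

```typescript
// `lastSeen` maps a node or edge id to the sequence number of the last trace that used it.
function findStale(
  lastSeen: Map<string, number>,
  currentTraceNo: number,
  window: number,
): string[] {
  const stale: string[] = [];
  lastSeen.forEach((seenAt, id) => {
    if (currentTraceNo - seenAt > window) stale.push(id);
  });
  return stale;
}
```

The returned ids would only be flagged in the UI; deletion would still go through graph change approval.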

graph expansion

Problems

AgentSteer mlflow trace and unilever mlflow trace are different

That is fine, but what is the real original format from MLflow?

Source code limitation

AgentSteer Structure

MLflow Trace

Basic Graph

Detailed Graph

Reactflow Graph

Visualization

Instruction



write a @traverse-agent.ts script which can be run 


pnpm shell .src/agentgraph/traverse-agent.ts ˜˜.json

using the Vercel AI SDK (refer to @exposure-assessment.ts for how to use it)
of mastra agent

it reads the provided JSON and accesses

data.orchestrator_traces.messages (only using that)

and then traverses the messages while updating the output graph and memory

@merged_orchestrator_6909ed23e4486dfa127548a1.json (32-44)  

1. for example, on reading the first message
it updates the global memory stack - question
makes the first edge and node
the edge's subject attribute is user and the target node is the Asked status, for example

@merged_orchestrator_6909ed23e4486dfa127548a1.json (45-73) 
2. a new edge and a new status node, planned, are created. the edge's subject might be the competitor 360 agent, and the explanation might or might not be added to memory.
obviously the plan should be added to the global memory stack

@merged_orchestrator_6909ed23e4486dfa127548a1.json (73-101) 
3. new edge with user subject and new state node approved

@merged_orchestrator_6909ed23e4486dfa127548a1.json (103-115) 
4. new edge with ares subject; the status moves to the research state

@merged_orchestrator_6909ed23e4486dfa127548a1.json (116-128) 
5. self-referencing edge with ares subject

@merged_orchestrator_6909ed23e4486dfa127548a1.json (129-141) 
6. new state node retrieved, with an edge whose subject is ares
@merged_orchestrator_6909ed23e4486dfa127548a1.json (141-160) 
7. new reported state node with the reporter as subject. if the memory is already large, there is no need to store the full context, or even to update memory at all, **if it does not help future graph construction or help prevent duplicated nodes or edges**


rules
- the traverse agent constructs a state graph for agent traces by traversing the trace and generating the graph
- a single message can create a single state node and edge
- the traverse agent has the context of what it is doing, the global memory, and the constructed graph
- the user or an agent is the executor of a state change (edge); an edge's subject attribute holds the user or the specific agent

- the agent is given either an empty graph or a pre-constructed graph. with a pre-constructed graph, the traverse agent basically traverses it, changing state along the edges. however, if some trace components cannot be interpreted by the given graph (the agent only knows the current graph; it has no concept of pre-existing), it updates or creates edges or nodes so it can traverse correctly, for example
- all graphs are DAGs
case 1

if the user rejects the plan, the agent might create a rejected state from planned with the user as subject, or create an edge back to the asked state, depending on the implementation

case 2 

if planning is handed off to another agent, e.g. the competitor agent, it should create another edge with a different subject, but not a new state
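
The two cases can be sketched as a single step function: reuse an existing transition when the current graph already explains the step, otherwise extend the graph. The `Graph` and `Transition` shapes and the function name are illustrative assumptions; in the real flow the agent decides state names by reading the message, not this code.

```typescript
interface Transition {
  from: string;
  to: string;
  subject: string;
}

interface Graph {
  states: string[];
  transitions: Transition[];
}

// Moves from `from` to `to` as `subject`, extending the graph only when needed.
// Returns true when the current graph already explained the step.
function step(graph: Graph, from: string, to: string, subject: string): boolean {
  const known = graph.transitions.some(
    (t) => t.from === from && t.to === to && t.subject === subject,
  );
  if (known) return true;
  // Case 1: a new target state (e.g. rejected) is added together with its edge.
  if (graph.states.indexOf(to) === -1) graph.states.push(to);
  // Case 2: same states, different subject, so only a new edge is added.
  graph.transitions.push({ from, to, subject });
  return false;
}
```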


did you get it, or is there another question?


first you should sort by message.created_at
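
A sketch of the sort, written defensively because the code only knows messages as a list of unknown. Putting messages without a parseable `created_at` at the end is an assumption, not a stated requirement.

```typescript
// Reads `created_at` without assuming anything else about the message shape.
function createdAt(message: unknown): number {
  if (typeof message !== "object" || message === null) return Number.POSITIVE_INFINITY;
  const raw = (message as Record<string, unknown>)["created_at"];
  const time =
    typeof raw === "string" || typeof raw === "number" ? new Date(raw).getTime() : NaN;
  return isNaN(time) ? Number.POSITIVE_INFINITY : time; // undated messages go last
}

// Returns a sorted copy; the input array is left untouched.
function sortByCreatedAt(messages: unknown[]): unknown[] {
  return messages.slice().sort((a, b) => {
    const ta = createdAt(a);
    const tb = createdAt(b);
    return ta === tb ? 0 : ta < tb ? -1 : 1;
  });
}
```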

answer
1. graph schema
{}  and the state node type should be dynamically selected, not fixed. memory (context) only contains: the current piece of the trace, the current graph, global memory, instruction rules, etc.

memory is just key-value storage; do not make it a fixed type
for edge, remove explanation and timestamp
type is string; remove timestamp and message id


you don't get it yet. We are constructing a **generalized graph which is capable of explaining every trace generated from the same agent system**; we are not generating a graph for a single trace.

edge has additional optional attributes:

system_prompt
instruction_prompt
potentially as a template string, if you are confident about semantic variable names in this agent system's context
Also you can update the useful global memory
- system description
in a tentative tone, since the agent only sees a piece of the trace. this means it needs to be updated along the trace

the output graph schema is

{
states
edges? or transitions? what is the term used in a finite state machine context
memory {Record<string, string>?}
}
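
The schema above could be typed roughly as follows. "Transition" is the usual term in a finite state machine context, so the sketch uses it. The `id`, `from`, `to`, and `subject` field names are assumptions; `system_prompt`, `instruction_prompt`, the string-typed state type, and the key-value memory come from these notes.

```typescript
// State types are plain strings so the agent can choose them dynamically.
interface State {
  id: string;
  type: string;
}

interface Transition {
  from: string;
  to: string;
  subject: string; // user or a specific agent
  system_prompt?: string; // optional
  instruction_prompt?: string; // optional, may be a template string
}

interface AgentGraph {
  states: State[];
  transitions: Transition[];
  memory: Record<string, string>; // free-form key-value storage, no fixed keys
}

// Starting point when no pre-constructed graph is given.
const emptyGraph = (): AgentGraph => ({ states: [], transitions: [], memory: {} });
```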


there may be a several tool to generate graph and managing a graph


- create_state
- create_edge?


when the ai agent calls a function, the important thing is that it should not rewrite the whole graph. the agent just writes a state json or edge json and passes it to the tool, which updates the managed runtime object. (the tool returns only success or failure, not the full graph, since the agent already has it in context)

create_memory
update_memory
delete_memory works the same way
only the key and value are generated by the llm as tokens, and the return is success or not
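
A sketch of the runtime object behind those tools, assuming plain methods that would later be wrapped as agent tools. The class name, field names, and error strings are hypothetical; the point is that every call takes a small JSON fragment and returns only success or failure.

```typescript
interface ToolResult {
  success: boolean;
  error?: string;
}
interface StateInput { id: string; type: string }
interface EdgeInput { from: string; to: string; subject: string }

class GraphRuntime {
  private states = new Map<string, StateInput>();
  private edges: EdgeInput[] = [];
  private memory = new Map<string, string>();

  createState(state: StateInput): ToolResult {
    if (this.states.has(state.id)) return { success: false, error: "state exists" };
    this.states.set(state.id, state);
    return { success: true };
  }

  createEdge(edge: EdgeInput): ToolResult {
    if (!this.states.has(edge.from) || !this.states.has(edge.to)) {
      return { success: false, error: "unknown state" };
    }
    this.edges.push(edge);
    return { success: true };
  }

  createMemory(key: string, value: string): ToolResult {
    if (this.memory.has(key)) return { success: false, error: "key exists" };
    this.memory.set(key, value);
    return { success: true };
  }

  updateMemory(key: string, value: string): ToolResult {
    if (!this.memory.has(key)) return { success: false, error: "unknown key" };
    this.memory.set(key, value);
    return { success: true };
  }

  deleteMemory(key: string): ToolResult {
    return this.memory.delete(key) ? { success: true } : { success: false, error: "unknown key" };
  }

  // For the host program only; the tools never send this back to the model.
  snapshot(): { states: StateInput[]; edges: EdgeInput[]; memoryKeys: string[] } {
    return {
      states: Array.from(this.states.values()),
      edges: this.edges.slice(),
      memoryKeys: Array.from(this.memory.keys()),
    };
  }
}
```

Keeping the return value to a success flag keeps the tool output small, so the context grows with the trace pieces and not with repeated copies of the graph.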

any more question?


this is not for the orchestrator. this is for any agent system. change the schema name
use fs promises, not import *; do not use path either
never identify message.type manually inside the code. the agent just reads the full json of the message; no unnecessary fine-tuning to this trace schema


the code only knows that messages is a list of unknown

you can define the schema of this trace as a merged azure trace, but never fine-tune to it

Merged architecture

orchestrator_traces

conversation {}

turns []

messages []

plans []

events []

files = []

prompt_cards []

user_feedbacks = []

status_store = []

stats {}

competitor_traces

natural_language_query

competitor_df

chat_history = []

standalone_query

rephrased_query

Orchestrator architecture

events

turns

plans

prompt_cards

feedbacks = []

execution_trackers

files = []

Technical Excellence

Poster & Presentation

Real-world Impact

Track A: Performance Optimization / Robustness

Track B: Observability / Explainability

Track C: Attack Discovery / Methodology

sycophancy