
HAI AgentGraph

Creator: Seonglae Cho
Created: 2025 Nov 7 11:6
Edited: 2026 Mar 3 17:48

Todo

check system prompt
support a different visualization for plan changes. 10 traces 11 which have plan feedback loop
counterfactual bias, bias, jailbreaking, tool misuse testing

Infer

  • inferStateType
  • inferSubject
  • extractData
  • extractTools
  • inferPrompts
  • extractTimestamp
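
The six helpers above could be sketched as small rule-based functions over a single trace span. Everything here is hypothetical: the span field names (`span_type`, `name`, `inputs`, `outputs`, `start_time_ns`), the state-type labels, and the rules themselves are assumptions for illustration, not the actual AgentGraph code. The function names are snake_case versions of the names in the list.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

Span = Dict[str, Any]  # hypothetical span shape; real trace schemas will differ


def infer_state_type(span: Span) -> str:
    """Map a span to a coarse state type using its (assumed) span_type field."""
    kind = str(span.get("span_type", "")).upper()
    if kind == "TOOL":
        return "tool_call"
    if kind in {"LLM", "CHAT_MODEL"}:
        return "llm_call"
    return "unknown"


def infer_subject(span: Span) -> str:
    """Use the span name as the acting subject, with a placeholder fallback."""
    return str(span.get("name") or "unnamed")


def extract_data(span: Span) -> Dict[str, Any]:
    """Keep only the input and output payloads."""
    return {"inputs": span.get("inputs"), "outputs": span.get("outputs")}


def extract_tools(span: Span) -> List[str]:
    """Collect tool names from an assumed inputs["tools"] list of {"name": ...}."""
    inputs = span.get("inputs")
    tools = inputs.get("tools", []) if isinstance(inputs, dict) else []
    return [t["name"] for t in tools if isinstance(t, dict) and "name" in t]


def infer_prompts(span: Span) -> Dict[str, str]:
    """Pick the first system and user message from an assumed messages list."""
    inputs = span.get("inputs")
    messages = inputs.get("messages", []) if isinstance(inputs, dict) else []
    prompts: Dict[str, str] = {}
    for msg in messages:
        if not isinstance(msg, dict):
            continue
        role = msg.get("role")
        if role in {"system", "user"} and role not in prompts:
            prompts[role] = str(msg.get("content", ""))
    return prompts


def extract_timestamp(span: Span) -> Optional[datetime]:
    """Convert an assumed nanosecond epoch start time to a UTC datetime."""
    ns = span.get("start_time_ns")
    if ns is None:
        return None
    return datetime.fromtimestamp(ns / 1e9, tz=timezone.utc)
```

A span with no system message simply yields no `"system"` key from `infer_prompts`, which makes the "missing system prompt" case easy to detect downstream.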

Integration - leftshift

Big win: they are going to develop an open API endpoint to pull traces,
even though we need to discuss it further. More good news: all Unilever teams may use the internal observability platform, which means we can extend our AgentGraph product to all Unilever teams with the least effort. Even better to hear that evalkit is going to be the standardized observability platform.
go for it. all for it
we already have a connection-based integration. much easier and less complexity
trace registration: constructed workflow memory, procedure
how much trace volume will there be?
filter: embedding-based trace clustering to prevent long runs
webhook
do you have any centralized system that stores traces, like Langfuse or LangSmith?
uniform, consistent trace schema and connection
some utility agent traces are missing the system prompt
the competitor agent and orchestrator traces have different schemas
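
The embedding-based filter mentioned above could work roughly like this: a greedy "leader" clustering where each trace joins the first cluster whose representative is similar enough, so that only one representative per cluster needs a full run. This is a minimal sketch under assumptions: embeddings are taken as already computed, and the `threshold` value and function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_traces(embeddings: Sequence[Vector], threshold: float = 0.9) -> List[List[int]]:
    """Greedy leader clustering over trace embeddings.

    Each cluster is a list of trace indices; the first index is its
    representative. The threshold is an assumed value, not a tuned one.
    """
    clusters: List[List[int]] = []
    for idx, emb in enumerate(embeddings):
        for members in clusters:
            if cosine_similarity(emb, embeddings[members[0]]) >= threshold:
                members.append(idx)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters.append([idx])
    return clusters


def representatives(clusters: List[List[int]]) -> List[int]:
    """Indices of the traces that would actually be processed."""
    return [members[0] for members in clusters]
```

The result depends on input order, which is the usual trade-off of a single-pass approach; it avoids computing all pairwise similarities.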

data schema

share tool and system prompt schema change
redaction and data masking
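
As an illustration of the masking step, here is a minimal sketch that walks a nested trace payload and redacts sensitive values before the trace is shared. The key list, the placeholder strings, and the choice to mask only email addresses in free text are all assumptions; a real policy would come from the data owner.

```python
import re
from typing import Any

# Simple email pattern for illustration only; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical set of keys whose values are always redacted.
SENSITIVE_KEYS = {"api_key", "password", "token"}


def mask(value: Any) -> Any:
    """Return a copy of value with sensitive keys and emails masked."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else mask(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [mask(v) for v in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    return value
```

Because `mask` builds new containers instead of mutating its input, the original trace stays available for internal analysis.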

visualization

model config can be changed; select 3
let the view visualize the available tools
bias mitigation: refer to the existing graph
suggested actions: group them when they are consecutive
42 tests are confusing
unsafe score 0% is weird
why bias has only 3 not 4
save frequency and frequency of final state
heatmap agent graph - for state and use frequency
failure visualization detection, final state and malfunction detection
basic asset view broken
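
The frequency and heatmap items above could be backed by a simple count over state sequences. This sketch assumes each trace has already been reduced to an ordered list of state names (a hypothetical representation) and normalizes counts to the 0 to 1 range for use as heatmap intensity.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def state_frequencies(traces: Sequence[List[str]]) -> Tuple[Counter, Counter]:
    """Count how often each state is visited and how often it is the final state."""
    visits: Counter = Counter()
    finals: Counter = Counter()
    for states in traces:
        visits.update(states)
        if states:
            finals[states[-1]] += 1
    return visits, finals


def heat_intensity(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale counts so that the most frequent state has intensity 1.0."""
    peak = max(counts.values(), default=0)
    return {state: count / peak for state, count in counts.items()} if peak else {}
```

Keeping final-state counts separate from visit counts makes it possible to highlight states where runs tend to end, which is relevant to the failure and malfunction detection item.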

how to test more accurately

user node
cascading effect
sample trace data
split the dataset into small tests and large tests; currently using the small dataset

Perturbation Testing

why hallucination takes longest > jailbreak > bias > tool misuse
verify that the changed system prompt actually scores lower on the same perturbation test
why does hallucination have a base prompt / indicate recheck as safe
migrate to S3 because of the 100 KB / 256 KB event bus limit
prompt leakage testing
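
The S3 migration noted above is essentially a claim-check pattern: small payloads travel inline on the event bus, while large ones are written to object storage and only a reference is sent. The sketch below uses a plain dictionary as a stand-in for the S3 bucket so it stays self-contained; the 256 KB constant is taken from the note, and the key format and envelope fields (`inline`, `ref`, `size`) are hypothetical.

```python
import json
import uuid
from typing import Any, Dict, MutableMapping

MAX_EVENT_BYTES = 256 * 1024  # assumed limit, taken from the note above


def to_event(
    payload: Dict[str, Any],
    store: MutableMapping[str, bytes],
    max_bytes: int = MAX_EVENT_BYTES,
) -> Dict[str, Any]:
    """Return an inline event if it fits, otherwise store the payload and return a reference."""
    inline = {"inline": payload}
    if len(json.dumps(inline).encode("utf-8")) <= max_bytes:
        return inline
    body = json.dumps(payload).encode("utf-8")
    key = f"traces/{uuid.uuid4()}.json"  # hypothetical object key layout
    store[key] = body  # stand-in for an object storage upload
    return {"ref": key, "size": len(body)}


def from_event(event: Dict[str, Any], store: MutableMapping[str, bytes]) -> Dict[str, Any]:
    """Resolve an event back to its payload, fetching from the store if needed."""
    if "inline" in event:
        return event["inline"]
    return json.loads(store[event["ref"]].decode("utf-8"))
```

If the tighter 100 KB limit applies to part of the path, it can be passed as `max_bytes` without changing the rest.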

agent mode

trace registration process - let the client interact
jq function tool to identify trace
delete all key tool for agent for trace
rag system/instruction prompt
use current rule based as a function
ref claude code's src
agent can write a function that extracts tools etc. dynamically, but we need to check the finalized readable graph
hide the graph and connection building with play
how about fine tuning black box with endpoint case? even white box agent system
for ui, select trace to visualize
specific model - temp 0 - user side can be configurable
data flow graph
memory is trace-specific information that could help the global view during graph generation but might not be used for a reusable graph
share video on the channel (deterministic demo)
agent graph visualization → construction
Graph → Prompt Reconstruction (rebuild prompts) → Perturbation Testing (bias/jailbreak) → Causal Analysis (identify risky)
hallucination detection test
number of safe and vulnerable, most vulnerable
tool misuse: two sides
tool call: malicious argument
tool response: expected tool response generated from the tool description
visualizing utility agent flow
ask umar to do work in parallel
play button
separate layer of agent graph schema
mixing low-level and high-level concepts doesn't make sense, such as combining prompts and tasks. this reflects a fundamental design flaw
orchestrator agent is missed, parallel agent for utility agent
rethink stream update - interpret a new trace based on the agent graph by state transition and simulation
graph change approval
stale node and edge visualization which could be deleted show user
graph expansion
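
The stream update, graph change approval, stale node/edge, and graph expansion items above could share one diff step: replay new traces as state transitions against the current graph, then report what is new (candidates for expansion, pending approval) and what was never touched (candidates for deletion, to be shown to the user). This is a sketch under assumptions: the graph is reduced to a set of node names and a set of directed edges, and each trace is an ordered list of state names.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class GraphDiff:
    new_nodes: Set[str] = field(default_factory=set)      # seen in traces, absent from graph
    new_edges: Set[Edge] = field(default_factory=set)
    stale_nodes: Set[str] = field(default_factory=set)    # in graph, never seen in traces
    stale_edges: Set[Edge] = field(default_factory=set)


def diff_against_traces(nodes: Set[str], edges: Set[Edge], traces: List[List[str]]) -> GraphDiff:
    """Compare the current graph with the transitions observed in new traces."""
    seen_nodes: Set[str] = set()
    seen_edges: Set[Edge] = set()
    for states in traces:
        seen_nodes.update(states)
        # Consecutive states form the observed transitions.
        seen_edges.update(zip(states, states[1:]))
    return GraphDiff(
        new_nodes=seen_nodes - nodes,
        new_edges=seen_edges - edges,
        stale_nodes=nodes - seen_nodes,
        stale_edges=edges - seen_edges,
    )
```

The function only proposes changes; applying them would remain a separate, user-approved step.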

Problems

  • AgentSteer MLflow trace and Unilever MLflow trace are different
    • That is fine, but what is the real original format from MLflow?
  • Source code limitation

AgentSteer Structure

  1. MLflow Trace
  2. Basic Graph
  3. Detailed Graph
  4. Reactflow Graph
  5. Visualization
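
The step from a graph to a Reactflow graph can be illustrated with a small conversion. React Flow nodes carry `id`, `position`, and `data`, and edges carry `id`, `source`, and `target`; the input shape (a list of node names and a list of directed pairs), the single-column layout, and the function name are assumptions made for this sketch, not the AgentSteer implementation.

```python
import json
from typing import Any, Dict, List, Tuple


def to_reactflow(
    nodes: List[str],
    edges: List[Tuple[str, str]],
    y_gap: int = 100,
) -> Dict[str, List[Dict[str, Any]]]:
    """Convert a simple node/edge list into React Flow style dictionaries."""
    rf_nodes = [
        # Naive layout: stack nodes in one column, y_gap pixels apart.
        {"id": name, "position": {"x": 0, "y": i * y_gap}, "data": {"label": name}}
        for i, name in enumerate(nodes)
    ]
    rf_edges = [
        {"id": f"{source}->{target}", "source": source, "target": target}
        for source, target in edges
    ]
    return {"nodes": rf_nodes, "edges": rf_edges}


if __name__ == "__main__":
    graph = to_reactflow(["user", "planner", "search"], [("user", "planner"), ("planner", "search")])
    print(json.dumps(graph, indent=2))
```

A real layout would likely come from a graph layout library on the frontend; the fixed column here only keeps the output valid.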

Instruction

Merged architecture

orchestrator_traces
  • conversation {}
  • turns []
  • messages []
    • plans []
    • events []
    • files = []
    • prompt_cards []
    • user_feedbacks = []
    • status_store = []
    • stats {}
competitor_traces
  • natural_language_query
  • competitor_df
  • chat_history = []
  • standalone_query
  • rephrased_query

Orchestrator architecture

  • events
  • turns
  • plans
  • prompt_cards
  • feedbacks = []
  • execution_trackers
  • files = []

  1. Technical Excellence
  2. Poster & Presentation
  3. Real-world Impact
  • Track A: Performance Optimization / Robustness
  • Track B: Observability / Explainability
  • Track C: Attack Discovery / Methodology

sycophancy

Recommendations