Evaluation
Hungarian matching: unmatched entities and relations should degrade the score or be scored as 0/-1
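A minimal sketch of the matching score, assuming a gold-by-predicted similarity matrix is already available and that pairs below a hypothetical `min_sim` threshold count as unmatched. `hungarian` and `matching_score` are illustrative names, not existing project code; `unmatched_penalty` switches between scoring unmatched items as 0 or -1. The same function could be applied to relations given a relation similarity matrix.

```python
from typing import Dict, List


def hungarian(cost: List[List[float]]) -> Dict[int, int]:
    """Minimum-cost assignment for a square cost matrix (O(n^3) Hungarian method).

    Returns {row_index: column_index}.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-based)
    v = [0.0] * (n + 1)  # column potentials (1-based)
    p = [0] * (n + 1)    # p[j]: row currently assigned to column j (0 = none)
    way = [0] * (n + 1)  # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        while True:  # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    return {p[j] - 1: j - 1 for j in range(1, n + 1)}


def matching_score(sim: List[List[float]], min_sim: float = 0.5,
                   unmatched_penalty: float = 0.0) -> float:
    """Score gold (rows) vs predicted (columns); each unmatched item adds unmatched_penalty."""
    if not sim or not sim[0]:
        raise ValueError("sim must be a non-empty gold x predicted matrix")
    n_gold, n_pred = len(sim), len(sim[0])
    size = max(n_gold, n_pred)
    # Pad with zero-similarity dummy rows/columns so the matrix is square.
    padded = [[sim[i][j] if i < n_gold and j < n_pred else 0.0 for j in range(size)]
              for i in range(size)]
    # Maximizing total similarity == minimizing total (1 - similarity).
    assignment = hungarian([[1.0 - s for s in row] for row in padded])
    total, matched, unmatched = 0.0, 0, 0
    for i, j in assignment.items():
        is_gold, is_pred = i < n_gold, j < n_pred
        if is_gold and is_pred and padded[i][j] >= min_sim:
            total += padded[i][j]
            matched += 1
        else:
            # Dummy partner or too dissimilar: each real item in the pair is unmatched.
            unmatched += int(is_gold) + int(is_pred)
    return (total + unmatched_penalty * unmatched) / (matched + unmatched)
```

Padding with zero-similarity dummies lets a square Hungarian routine handle different gold and predicted counts. Averaging over matched plus unmatched items is one possible normalization, not the only one.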
embedding pattern similarity: also benchmark it
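One way to fill the similarity matrix is cosine similarity between entity embeddings. A sketch, assuming the vectors come from whichever embedding model is chosen (toy vectors only); `exact_match_matrix` is a hypothetical string-equality baseline to benchmark the embedding variant against.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_similarity_matrix(gold_vecs: List[Sequence[float]],
                                pred_vecs: List[Sequence[float]]) -> List[List[float]]:
    """Rows = gold entities, columns = predicted entities."""
    return [[cosine(g, p) for p in pred_vecs] for g in gold_vecs]


def exact_match_matrix(gold_names: List[str], pred_names: List[str]) -> List[List[float]]:
    """Baseline: 1.0 for case-insensitive identical names, else 0.0."""
    return [[1.0 if g.strip().lower() == p.strip().lower() else 0.0 for p in pred_names]
            for g in gold_names]
```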
enhance prompt reconstruction performance
add to the report: isolated entity count, invalid relations, score variances
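A sketch of the report numbers, assuming relations are (source, relation_type, target) triples and that validity is decided by a caller-supplied `is_valid` predicate. The function names and report keys are hypothetical; score variance here is the population variance of one metric across repeated runs.

```python
import statistics
from typing import Callable, Dict, Iterable, List, Set, Tuple

Relation = Tuple[str, str, str]  # (source_entity, relation_type, target_entity)


def isolated_entities(entities: Iterable[str], relations: Iterable[Relation]) -> Set[str]:
    """Entities that appear in no relation as source or target."""
    connected: Set[str] = set()
    for src, _, dst in relations:
        connected.update((src, dst))
    return set(entities) - connected


def graph_report(entities: List[str], relations: List[Relation],
                 is_valid: Callable[[Relation], bool],
                 run_scores: List[float]) -> Dict[str, float]:
    """Summary numbers for the evaluation report (keys are illustrative)."""
    return {
        "isolated_entity_count": len(isolated_entities(entities, relations)),
        "invalid_relation_count": sum(1 for r in relations if not is_valid(r)),
        "score_mean": statistics.mean(run_scores),
        # Population variance across repeated runs of the same evaluation.
        "score_variance": statistics.pvariance(run_scores),
    }
```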
generating evaluation dataset
generating agent system traces
generate ground truth dataset
remove invalid relations
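A sketch of invalid-relation removal against an allow-list of (source_type, relation_type, target_type) triples. The `ALLOWED` schema below is a hypothetical example built from relation names in these notes, not the project's actual schema; returning the dropped relations makes it easy to see which cases dominate.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (source_entity, relation_type, target_entity)

# Hypothetical allow-list of (source_type, relation_type, target_type); replace with the real schema.
ALLOWED: Set[Tuple[str, str, str]] = {
    ("agent", "perform", "task"),
    ("input", "provided_to", "task"),
    ("task", "produce", "output"),
}


def is_valid(rel: Relation, entity_types: Dict[str, str],
             allowed: Set[Tuple[str, str, str]] = ALLOWED) -> bool:
    """True if the typed triple of this relation is in the allow-list."""
    src, rel_type, dst = rel
    return (entity_types.get(src), rel_type, entity_types.get(dst)) in allowed


def remove_invalid(relations: List[Relation], entity_types: Dict[str, str],
                   allowed: Set[Tuple[str, str, str]] = ALLOWED
                   ) -> Tuple[List[Relation], List[Relation]]:
    """Split relations into (kept, dropped) according to the allow-list."""
    kept: List[Relation] = []
    dropped: List[Relation] = []
    for rel in relations:
        (kept if is_valid(rel, entity_types, allowed) else dropped).append(rel)
    return kept, dropped
```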
import_crewai_example_csv.py
also add columns such as system_source (e.g. crewai) and system_name to the CSV
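The contents of import_crewai_example_csv.py are not shown here, so this is only a hypothetical helper built on Python's csv module; the function name and the example value `example_crew` are made up.

```python
import csv
import io


def add_system_columns(csv_text: str, system_source: str, system_name: str) -> str:
    """Return csv_text with system_source / system_name columns filled on every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    for col in ("system_source", "system_name"):
        if col not in fieldnames:  # don't duplicate a column that already exists
            fieldnames.append(col)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["system_source"] = system_source
        row["system_name"] = system_name
        writer.writerow(row)
    return out.getvalue()
```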
Graph Generation
instead of consume, use provided_to
remove one relation of each perform / assigned_to pair
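A sketch of both fixes as a post-processing pass, under two loud assumptions: a consume edge is written (task, consume, input) and becomes (input, provided_to, task), and (task, assigned_to, agent) is the inverse of (agent, perform, task), so perform is kept. Both directions are guesses; swap them if the generator uses the opposite convention. `normalize_relations` is a hypothetical name.

```python
from typing import List, Set, Tuple

Relation = Tuple[str, str, str]  # (source_entity, relation_type, target_entity)


def normalize_relations(relations: List[Relation]) -> List[Relation]:
    """Rewrite consume as provided_to and drop assigned_to edges that mirror a perform edge."""
    # ASSUMPTION: consume is written (task, consume, input); it becomes (input, provided_to, task).
    rewritten = [(dst, "provided_to", src) if rel == "consume" else (src, rel, dst)
                 for src, rel, dst in relations]
    # ASSUMPTION: (task, assigned_to, agent) is the inverse of (agent, perform, task); keep perform.
    performs = {(src, dst) for src, rel, dst in rewritten if rel == "perform"}
    result: List[Relation] = []
    seen: Set[Relation] = set()
    for src, rel, dst in rewritten:
        if rel == "assigned_to" and (dst, src) in performs:
            continue
        if (src, rel, dst) not in seen:  # rewriting can create exact duplicates
            seen.add((src, rel, dst))
            result.append((src, rel, dst))
    return result
```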
invalid relations
input/output used in a consume relation -> intermediate (about 80% of invalid relations are this case)
add more valid relations
why does input produce tools and consume tasks? why does tool produce input?
What is the difference between the dotted line and the solid line (two identical review relations)?
LangSmith trace API
Seonglae Cho