Metric
Separate metrics for bias and jailbreak; judge metric with 3 types of categorical judging
Score the agent graph by minimum description length (MDL), since the smallest agent graph that describes the same thing is better
If workflow-based, there would be a task; if model-based, the task can be dynamic and infinite
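
A minimal sketch of the MDL idea for agent graphs, under a loudly assumed encoding that is not from these notes: each node costs log2(role vocabulary size) bits and each directed edge costs 2 * log2(number of nodes) bits. `AgentGraph`, `description_length_bits`, and `pick_smallest` are hypothetical names used only for illustration, and the candidate graphs are assumed to already describe the same behavior.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentGraph:
    """Hypothetical agent graph: role names plus directed edges between them."""
    nodes: tuple[str, ...]
    edges: tuple[tuple[int, int], ...]  # (source index, target index)


def description_length_bits(graph: AgentGraph, role_vocab_size: int) -> float:
    """Assumed code length: bits to name every node role plus bits to list every edge."""
    n = len(graph.nodes)
    node_bits = n * math.log2(role_vocab_size)
    # Each edge names two endpoints out of n nodes; a single-node graph has no edge cost.
    edge_bits = len(graph.edges) * 2 * math.log2(n) if n > 1 else 0.0
    return node_bits + edge_bits


def pick_smallest(graphs: list[AgentGraph], role_vocab_size: int) -> AgentGraph:
    """Among graphs assumed to be equivalent, prefer the shortest description."""
    return min(graphs, key=lambda g: description_length_bits(g, role_vocab_size))


small = AgentGraph(nodes=("planner", "executor"), edges=((0, 1),))
large = AgentGraph(
    nodes=("planner", "critic", "executor", "summarizer"),
    edges=((0, 1), (1, 2), (2, 3), (3, 0)),
)
best = pick_smallest([large, small], role_vocab_size=8)
```

With a role vocabulary of 8, the two-node graph costs 8 bits and the four-node graph costs 28 bits, so `best` is `small`. A different encoding (for example, one that also counts prompt length per node) would change the numbers but not the selection principle.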
Seonglae Cho