Replaced before prompt
METHOD PERFORMANCE:
unified_method:
  Success rate: 7/7
  Avg exact score: 0.107
  Avg semantic score: 0.610
  Avg processing time: 69.50s
  Avg matching score: 0.375
  Avg tokens per run: 250952
  Avg cost per run: $0.0404
  Total cost: $0.2825
  Model: unknown
original_method:
  Success rate: 7/7
  Avg exact score: 0.087
  Avg semantic score: 0.426
  Avg processing time: 47.50s
  Avg matching score: 0.249
  Avg tokens per run: 212647
  Avg cost per run: $0.0346
  Total cost: $0.2419
  Model: unknown
hybrid_method:
  Success rate: 7/7
  Avg exact score: 0.106
  Avg semantic score: 0.646
  Avg processing time: 114.85s 08
  Avg tokens per run: 213449
  Avg cost per run: $0.0358
  Total cost: $0.2505
  Model: unknown
clustering_method:
  Success rate: 7/7
  Avg exact score: 0.143
  Avg semantic score: 0.623
  Avg processing time: 135.22s
  Avg matching score: 0.392
  Avg tokens per run: 231979
  Avg cost per run: $0.0420
  Total cost: $0.2940
  Model: gpt-4o-mini
direct_llm_method:
  Success rate: 7/7
  Avg exact score: 0.085
  Avg semantic score: 0.592
  Avg processing time: 33.04s
  Avg matching score: 0.363
  Avg tokens per run: 80605
  Avg cost per run: $0.0127
  Total cost: $0.0889
  Model: gpt-4o-mini
pydantic_ai_method:
  Success rate: 7/7
  Avg exact score: 0.162
  Avg semantic score: 0.459
  Avg processing time: 34.58s
  Avg matching score: 0.292
  Avg tokens per run: 82415
  Avg cost per run: $0.0133
  Total cost: $0.0928
  Model: unknown
sequential_pydantic:
  Success rate: 7/7
  Avg exact score: 0.080
  Avg semantic score: 0.316
  Avg processing time: 140.56s
  Avg matching score: 0.209
  Avg tokens per run: 82568
  Avg cost per run: $0.0133
  Total cost: $0.0666
  Model: unknown
perfect_method:
  Success rate: 7/7
  Avg exact score: 1.000
  Avg semantic score: 1.000
  Avg processing time: 0.00s
  Avg matching score: 1.000
  Avg tokens per run: 0
  Avg cost per run: $0.0000
  Total cost: $0.0000
  Model: unknown
dumb_method:
  Success rate: 7/7
  Avg exact score: 0.000
  Avg semantic score: 0.000
  Avg processing time: 0.00s
  Avg matching score: 0.000
  Token usage: Not available
CACHE PERFORMANCE:
  Total cache hits: 46
  Total cache misses: 17
  Cache hit rate: 73.0%
METHOD PERFORMANCE:
unified_method:
  Success rate: 7/7
  Avg exact score: 0.122
  Avg semantic score: 0.700
  Avg processing time: 73.35s
  Avg matching score: 0.440
  Avg tokens per run: 318206
  Avg cost per run: $0.0508
  Total cost: $0.3558
  Model: unknown
original_method:
  Success rate: 7/7
  Avg exact score: 0.106
  Avg semantic score: 0.489
  Avg processing time: 77.57s
  Avg matching score: 0.317
  Avg tokens per run: 335146
  Avg cost per run: $0.0555
  Total cost: $0.3883
  Model: unknown
hybrid_method:
  Success rate: 7/7
  Avg exact score: 0.093
  Avg semantic score: 0.720
  Avg processing time: 528.95s
  Avg matching score: 0.445
  Avg tokens per run: 335928
  Avg cost per run: $0.0573
  Total cost: $0.4010
  Model: unknown
clustering_method:
  Success rate: 7/7
  Avg exact score: 0.122
  Avg semantic score: 0.585
  Avg processing time: 209.73s
  Avg matching score: 0.341
  Avg tokens per run: 360825
  Avg cost per run: $0.0651
  Total cost: $0.4556
  Model: gpt-4o-mini
direct_llm_method:
  Success rate: 7/7
  Avg exact score: 0.143
  Avg semantic score: 0.767
  Avg processing time: 111.73s
  Avg matching score: 0.579
  Avg tokens per run: 81373
  Avg cost per run: $0.0131
  Total cost: $0.0916
  Model: gpt-4o-mini
pydantic_ai_method:
  Success rate: 7/7
  Avg exact score: 0.075
  Avg semantic score: 0.385
  Avg processing time: 208.52s
  Avg matching score: 0.231
  Avg tokens per run: 94392
  Avg cost per run: $0.0182
  Total cost: $0.0908
  Model: unknown
sequential_pydantic:
  Success rate: 7/7
  Avg exact score: 0.126
  Avg semantic score: 0.593
  Avg processing time: 58.05s
  Avg matching score: 0.423
  Avg tokens per run: 84436
  Avg cost per run: $0.0139
  Total cost: $0.0976
  Model: unknown
perfect_method:
  Success rate: 7/7
  Avg exact score: 1.000
  Avg semantic score: 1.000
  Avg processing time: 0.00s
  Avg matching score: 1.000
  Avg tokens per run: 0
  Avg cost per run: $0.0000
  Total cost: $0.0000
  Model: unknown
dumb_method:
  Success rate: 7/7
  Avg exact score: 0.000
  Avg semantic score: 0.000
  Avg processing time: 0.00s
  Avg matching score: 0.000
  Token usage: Not available
CACHE PERFORMANCE:
  Total cache hits: 0
  Total cache misses: 63
  Cache hit rate: 0.0%
Trimmed before/after prompt
METHOD PERFORMANCE:
unified_method:
  Success rate: 7/7
  Avg exact score: 0.069
  Avg semantic score: 0.422
  Avg processing time: 25.94s
  Avg matching score: 0.235
  Avg tokens per run: 410098
  Avg cost per run: $0.0636
  Total cost: $0.4451
  Model: unknown
original_method:
  Success rate: 7/7
  Avg exact score: 0.069
  Avg semantic score: 0.375
  Avg processing time: 39.26s
  Avg matching score: 0.200
  Avg tokens per run: 422211
  Avg cost per run: $0.0666
  Total cost: $0.4661
  Model: unknown
hybrid_method:
  Success rate: 7/7
  Avg exact score: 0.089
  Avg semantic score: 0.566
  Avg processing time: 159.75s
  Avg matching score: 0.339
  Avg tokens per run: 425239
  Avg cost per run: $0.0693
  Total cost: $0.4849
  Model: unknown
clustering_method:
  Success rate: 7/7
  Avg exact score: 0.092
  Avg semantic score: 0.551
  Avg processing time: 607.81s
  Avg matching score: 0.322
  Avg tokens per run: 454201
  Avg cost per run: $0.0789
  Total cost: $0.5523
  Model: gpt-4o-mini
direct_llm_method:
  Success rate: 7/7
  Avg exact score: 0.099
  Avg semantic score: 0.474
  Avg processing time: 31.50s
  Avg matching score: 0.244
  Avg tokens per run: 103817
  Avg cost per run: $0.0161
  Total cost: $0.1126
  Model: gpt-4o-mini
pydantic_ai_method:
  Success rate: 7/7
  Avg exact score: 0.077
  Avg semantic score: 0.373
  Avg processing time: 29.15s
  Avg matching score: 0.185
  Avg tokens per run: 105100
  Avg cost per run: $0.0164
  Total cost: $0.1150
  Model: unknown
sequential_pydantic:
  Success rate: 7/7
  Avg exact score: 0.058
  Avg semantic score: 0.346
  Avg processing time: 23.68s
  Avg matching score: 0.168
  Avg tokens per run: 104385
  Avg cost per run: $0.0161
  Total cost: $0.1127
  Model: unknown
perfect_method:
  Success rate: 7/7
  Avg exact score: 1.000
  Avg semantic score: 1.000
  Avg processing time: 0.00s
  Avg matching score: 1.000
  Avg tokens per run: 0
  Avg cost per run: $0.0000
  Total cost: $0.0000
  Model: unknown
dumb_method:
  Success rate: 7/7
  Avg exact score: 0.000
  Avg semantic score: 0.000
  Avg processing time: 0.00s
  Avg matching score: 0.000
  Token usage: Not available
CACHE PERFORMANCE:
  Total cache hits: 63
  Total cache misses: 0
  Cache hit rate: 100.0%
METHOD PERFORMANCE:
unified_method:
  Success rate: 7/7
  Avg exact score: 0.057
  Avg semantic score: 0.531
  Avg processing time: 65.45s
  Avg matching score: 0.321
  Avg tokens per run: 411793
  Avg cost per run: $0.0644
  Total cost: $0.4505
  Model: unknown
original_method:
  Success rate: 7/7
  Avg exact score: 0.050
  Avg semantic score: 0.351
  Avg processing time: 56.29s
  Avg matching score: 0.196
  Avg tokens per run: 422546
  Avg cost per run: $0.0665
  Total cost: $0.4658
  Model: unknown
hybrid_method:
  Success rate: 7/7
  Avg exact score: 0.097
  Avg semantic score: 0.635
  Avg processing time: 124.07s
  Avg matching score: 0.412
  Avg tokens per run: 427415
  Avg cost per run: $0.0701
  Total cost: $0.4908
  Model: unknown
clustering_method:
  Success rate: 7/7
  Avg exact score: 0.083
  Avg semantic score: 0.526
  Avg processing time: 194.26s
  Avg matching score: 0.313
  Avg tokens per run: 459523
  Avg cost per run: $0.0809
  Total cost: $0.5664
  Model: gpt-4o-mini
direct_llm_method:
  Success rate: 7/7
  Avg exact score: 0.076
  Avg semantic score: 0.587
  Avg processing time: 459.17s
  Avg matching score: 0.408
  Avg tokens per run: 104811
  Avg cost per run: $0.0166
  Total cost: $0.1163
  Model: gpt-4o-mini
pydantic_ai_method:
  Success rate: 7/7
  Avg exact score: 0.064
  Avg semantic score: 0.299
  Avg processing time: 112.01s
  Avg matching score: 0.146
  Avg tokens per run: 106761
  Avg cost per run: $0.0168
  Total cost: $0.1010
  Model: unknown
sequential_pydantic:
  Success rate: 7/7
  Avg exact score: 0.067
  Avg semantic score: 0.380
  Avg processing time: 27.02s
  Avg matching score: 0.200
  Avg tokens per run: 105026
  Avg cost per run: $0.0163
  Total cost: $0.1142
  Model: unknown
perfect_method:
  Success rate: 7/7
  Avg exact score: 1.000
  Avg semantic score: 1.000
  Avg processing time: 0.00s
  Avg matching score: 1.000
  Avg tokens per run: 0
  Avg cost per run: $0.0000
  Total cost: $0.0000
  Model: unknown
dumb_method:
  Success rate: 7/7
  Avg exact score: 0.000
  Avg semantic score: 0.000
  Avg processing time: 0.00s
  Avg matching score: 0.000
  Token usage: Not available
CACHE PERFORMANCE:
  Total cache hits: 0
  Total cache misses: 63
  Cache hit rate: 0.0%
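The cache hit rates in these logs appear to be hits divided by total lookups (hits plus misses). As a minimal sketch of that arithmetic, assuming this is how the benchmark derives the figure, the helper below recomputes the rate from the hit and miss counts of the four runs. The name `cache_hit_rate` is hypothetical and not taken from the benchmark code.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Return the hit rate as a percentage, or 0.0 when there were no lookups."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


# (hits, misses) pairs copied from the four CACHE PERFORMANCE blocks in these notes
runs = [(46, 17), (0, 63), (63, 0), (0, 63)]

for hits, misses in runs:
    rate = cache_hit_rate(hits, misses)
    print(f"{hits} hits / {misses} misses -> {rate:.1f}%")
```

Each run totals 63 lookups, so the processing times are only directly comparable between runs with a similar hit rate.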
Seonglae Cho