LibVulnWatch

Creator: Seonglae Cho
Created: 2025 Jun 24 9:33
Edited: 2025 Jul 5 0:51
Done
  1. First-day PR: evaluation script feature, with screenshots
  2. Second-day PR: improved prompt evaluation comparison, showing relative superiority against the manual LLM-judge comparison

Research note

Open Deep Research ICML
OpenSSF as an evaluation
  • should run OpenSSF locally
    • needs a GitHub API token
improve the prompt based on the evaluation
RRF prompt enhancement: not that complex, so just mention it in the paper
change the search API
run multiple experiments on frameworks
How to catch unstable generation from a metrics perspective
human feedback
  • iteration
  • web search
Is there anything I can help with for paper writing? I can make a PR today
  • paper work is due next week
    • feedback this week
      work on it next week
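
A minimal sketch of the RRF idea mentioned above, assuming "RRF" here means Reciprocal Rank Fusion applied to several ranked result lists (for example, results from different search queries). The item names and the constant k=60 are illustrative assumptions, not taken from the project code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one ranking.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical ranked lists, e.g. from three differently worded queries.
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
])
print(fused)
```

Items that rank well in several lists rise to the top even if no single list puts them first, which is why the method needs no score calibration between sources.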

Paper

risk score is weird
popular libraries had higher alignment scores, since information about them is easy to find on the web
bold the highest overall score in the table
change the title: AgentOSSF SSFAgent
case study example in the appendix, taken from the results
all library references

key parts

  1. section writer
  2. query writer
  3. plan writer
What should be prioritized?
  • Automating table format or output in the source code should be first priority, as some data points like GitHub stars, license information, and Active Maintenance status are not directly extracted by the current workflow
  • Adding more libraries to the evaluation
  • Implementing automated submission pipeline using GitHub Actions with cron jobs for better scalability
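
As a rough sketch of the first priority above, the missing data points could be filled from the repository JSON returned by the GitHub REST API (`GET /repos/{owner}/{repo}`). The field names `full_name`, `stargazers_count`, `license.spdx_id`, and `pushed_at` are real fields of that response; the `repo_row` helper, the 90-day "active" threshold, and the sample payload are assumptions for illustration.

```python
from datetime import datetime, timezone


def repo_row(repo, now, active_days=90):
    """Build one table row from a GitHub repository JSON payload."""
    license_info = repo.get("license") or {}  # "license" may be null
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    return {
        "name": repo["full_name"],
        "stars": repo["stargazers_count"],
        "license": license_info.get("spdx_id", "unknown"),
        # Assumption: "actively maintained" means a push within active_days.
        "active": (now - pushed).days <= active_days,
    }


# Hypothetical payload with made-up values, shaped like the API response.
sample = {
    "full_name": "example/library",
    "stargazers_count": 123,
    "license": {"spdx_id": "MIT"},
    "pushed_at": "2025-06-20T12:00:00Z",
}
row = repo_row(sample, now=datetime(2025, 7, 1, tzinfo=timezone.utc))
print(row)
```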

Leaderboard

Remove URL link error
passing a type list still raised errors because of the column changes; solved by setting every column to markdown
fix double-row error caused by the double language
very long page error: maybe Gradio, or the Hugging Face Space?
library type (framework) icon visualization or legend
if there are multiple versions, change to a JSON list
change the GitHub README

Paper

If content needs to be added
add the insights I wrote to the results section
remove the first-page footnote, since the references take up space
add the cost per report: $0.1
cache scorecard tool
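
One way the scorecard caching could look, as a sketch only: results are stored as JSON files keyed by repository, and the OpenSSF Scorecard CLI is invoked only on a cache miss. The `scorecard --repo=... --format=json` invocation and the `GITHUB_AUTH_TOKEN` environment variable follow the Scorecard CLI; the helper names, cache layout, and the fake runner used in the demo are assumptions.

```python
import json
import os
import subprocess
import tempfile
from pathlib import Path


def run_scorecard(repo):
    """Run the Scorecard CLI; needs the binary and GITHUB_AUTH_TOKEN set."""
    cmd = ["scorecard", f"--repo={repo}", "--format=json"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True,
                         env=os.environ.copy())
    return json.loads(out.stdout)


def cached_scorecard(repo, cache_dir, runner=run_scorecard):
    """Return the cached result for repo, calling runner only on a miss."""
    cache_file = Path(cache_dir) / (repo.replace("/", "_") + ".json")
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = runner(repo)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result))
    return result


# Demo with a fake runner so the sketch runs without the CLI or a token.
calls = []


def fake_runner(repo):
    calls.append(repo)
    return {"repo": {"name": repo}, "score": 7.5}  # made-up payload


with tempfile.TemporaryDirectory() as tmp:
    first = cached_scorecard("github.com/example/project", tmp, fake_runner)
    second = cached_scorecard("github.com/example/project", tmp, fake_runner)
print(len(calls))
```

A file cache like this has no expiry; a real pipeline would probably also want to invalidate entries after some time.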
 
 
 

Candidates

ML Frameworks
  • PyTorch
  • TensorFlow
  • JAX
  • Candle ML
Agents Framework
  • CrewAI
  • LangGraph
  • Composio
  • Agent Development Kit
  • SmolAgents
  • MetaGPT
  • Pydantic AI
App Agent
  • Browser Use
  • Stagehand
Prompt Engineering
  • LangChain
  • LlamaIndex
 
Inference Engine
  • SGLang
  • vLLM
  • TensorRT
  • TGI
  • ONNX
 
Columns: Name | Score Metrics (License, Security, Maintenance, Dependencies, Regulatory, Overall) | Model Coverage | Model Seeking
Star ratings are listed in source order; empty cells are not shown, so a rating's position does not identify its Score Metrics column. "-" means no ratings.

ML Frameworks
  • PyTorch | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 88% (15/17) | 8
  • JAX | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐⭐ / ⭐⭐⭐ | 61% (11/18) | 12
  • TensorFlow | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 80% (12/15) | 5
  • ONNX | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 88% (14/16) | 5
  • Candle ML | - | 71% (10/14) | 12
Agents Framework
  • CrewAI | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 71% (10/14) | 13
  • LangGraph | ⭐⭐⭐⭐ / ⭐⭐ | 78% (14/18) | 7
  • Composio | ⭐⭐⭐ / ⭐⭐ | 67% (10/15) | 5
  • Agent Development Kit | ⭐⭐ | 71% (10/14) | 7
  • SmolAgents | ⭐⭐⭐⭐⭐ | 73% (11/15) | 9
  • MetaGPT | ⭐⭐⭐⭐⭐ / ⭐⭐⭐⭐⭐ / ⭐⭐⭐ | 57% (8/14) | 7
  • Pydantic AI | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 88% (15/17) | 10
App Agent
  • Browser Use | ⭐⭐⭐⭐⭐ / ⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 88% (15/17) | 7
  • Stagehand | ⭐⭐⭐ | 47% (7/15) | 6
Prompt Engineering
  • LangChain | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 72% (13/18) | 19
  • LlamaIndex | ⭐⭐⭐ | 47% (8/17) | 7
Inference Engine
  • SGLang | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 73% (11/15) | 5
  • vLLM | ⭐⭐⭐ / ⭐⭐⭐⭐ / ⭐⭐ | 73% (11/15) | 7
  • TensorRT | ⭐⭐⭐⭐⭐ / ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 69% (11/16) | 5
  • TGI | ⭐⭐⭐⭐⭐ / ⭐⭐⭐ / ⭐⭐⭐ | 72% (13/18) | 6
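
The Model Coverage cells above are all consistent with a ratio of two counts rounded to a whole percent (for example, 15/17 gives 88%). A small sketch of that formatting, assuming half-up rounding; the function and parameter names are hypothetical and not from the project code.

```python
def format_coverage(found, total):
    """Format a coverage cell such as '88% (15/17)'."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Half-up rounding to a whole percent (assumed rounding rule).
    percent = int(100 * found / total + 0.5)
    return f"{percent}% ({found}/{total})"


print(format_coverage(15, 17))
print(format_coverage(14, 16))
```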
 
 
Open Deep Research ICMLs
Columns: Name | Baseline Alignment | Novelty Yield | Trust Score (License, Security, Maintenance, Dependencies, Regulatory)
  • Baseline Alignment 88.24% | Novelty Yield 8 | Trust Score: License 5, Security 1, Maintenance 3, Dependencies 1, Regulatory 3
  • Baseline Alignment 61.11% | Novelty Yield 12 | Trust Score: License 5, Security 3, Maintenance 4, Dependencies 1, Regulatory 1
  • Baseline Alignment 72.22% | Novelty Yield 5 | Trust Score: License 5, Security 1, Maintenance 3, Dependencies 1, Regulatory 3
  • Baseline Alignment 87.5% | Novelty Yield 5 | Trust Score: License 5, Security 1, Maintenance 3, Dependencies 1, Regulatory 1
  • Baseline Alignment 76.47% | Novelty Yield 4 | Trust Score: License 5, Security 1, Maintenance 4, Dependencies 1, Regulatory 3
 
 
 
 
 
 
 
 
 
 

Recommendations