LibVulnWatch

First day PR - Evaluation script feature with screenshots

Second day PR - Improved prompt evaluation comparison and relative superiority between manual LLM judge comparison

Research note

Open Deep Research ICML

OpenSSF as an evaluation

should run OpenSSF locally

GitHub API token

improve prompt based on the evaluation

RRF prompt enhancement - not that complex, so just mention it in the paper

change search API

run multiple experiments on frameworks

How to catch unstable generation from a metrics perspective

human feedback

iteration

web search

Is there anything I can help with for paper writing? I can make a PR today

paper work is due next week

feedback this week

work on next week
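
For the question above about catching unstable generation through metrics, one possible approach is to rerun the same prompt several times per library and look at the spread of the resulting scores. The sketch below is only an illustration: the function names, the `max_std` threshold, and the idea of using standard deviation are assumptions, not something the current workflow does.

```python
from statistics import mean, pstdev


def score_stability(runs):
    """Summarize how much one score varies across repeated generations.

    `runs` is a list of numeric scores that the same library received
    in repeated runs of the same prompt (hypothetical input format).
    """
    avg = mean(runs)
    spread = pstdev(runs)  # population standard deviation
    # Coefficient of variation; defined as 0.0 when the mean is 0.
    cv = spread / avg if avg else 0.0
    return {"mean": avg, "std": spread, "cv": cv, "range": max(runs) - min(runs)}


def flag_unstable(per_library_runs, max_std=0.5):
    """Return names whose score std exceeds `max_std` (an assumed threshold)."""
    return sorted(
        name
        for name, runs in per_library_runs.items()
        if score_stability(runs)["std"] > max_std
    )


# Placeholder names and scores, not real results.
print(flag_unstable({"lib_a": [3, 3, 3], "lib_b": [1, 3, 5]}))  # ['lib_b']
```

A high spread on a single dimension would point at the prompt or search step for that dimension rather than at the library itself.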

Paper

risk score is weird

popular libraries had higher alignment scores since it is easy to find information about them on the web

bold the highest overall score in the table

change the title: AgentOSSF SSFAgent

case study example appendix from result

all library references

key parts

section writer

query writer

plan writer

What should be prioritized?

Automating table format or output in the source code should be first priority, as some data points like GitHub stars, license information, and Active Maintenance status are not directly extracted by the current workflow

Adding more libraries to the evaluation

Implementing automated submission pipeline using GitHub Actions with cron jobs for better scalability
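
For the first priority above (GitHub stars, license, and Active Maintenance status not being extracted automatically), a minimal sketch of turning a GitHub repository API response into a table row could look like this. The field names `full_name`, `stargazers_count`, `license.spdx_id`, `pushed_at`, and `archived` come from the GitHub REST repository object; the function names, the 90-day window, and the definition of "active" are assumptions for illustration only.

```python
from datetime import datetime, timezone


def repo_row(repo, now=None, active_days=90):
    """Build one table row from a GitHub repository API response dict."""
    now = now or datetime.now(timezone.utc)
    license_info = repo.get("license") or {}  # the API returns null when no license is detected
    # `pushed_at` is an ISO 8601 timestamp such as "2025-05-01T12:00:00Z".
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    # Assumed definition: not archived and pushed to within `active_days`.
    active = (not repo.get("archived", False)) and (now - pushed).days <= active_days
    return {
        "name": repo["full_name"],
        "stars": repo["stargazers_count"],
        "license": license_info.get("spdx_id") or "unknown",
        "active_maintenance": active,
    }


def to_tsv(rows):
    """Render rows as tab-separated text with a header line."""
    header = ["name", "stars", "license", "active_maintenance"]
    lines = ["\t".join(header)]
    lines += ["\t".join(str(row[h]) for h in header) for row in rows]
    return "\n".join(lines)
```

The response dict itself would come from `GET /repos/{owner}/{repo}` using the GitHub API token mentioned earlier; the request is left out so the sketch stays offline.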

Leaderboard

Remove URL link error

because of the column changes, passing a type list still raised errors; resolved by setting everything to markdown

fix double row error due to the double language

very long page error - maybe Gradio or the Hugging Face Space?

library type (framework) icon visualization or legend

if there are multiple versions, change the field to a JSON list

change the GitHub README
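
For the multiple-versions item above, a small sketch of always storing versions as a JSON list, whether one or several are given. The function name and the string format are assumptions, not the leaderboard's actual schema.

```python
import json


def versions_field(versions):
    """Serialize one or more version strings as a JSON list."""
    if isinstance(versions, str):
        versions = [versions]  # wrap a single version so the type is always a list
    return json.dumps(list(versions))


print(versions_field("1.0"))           # ["1.0"]
print(versions_field(["1.0", "2.0"]))  # ["1.0", "2.0"]
```

Keeping the field a list in every case means the leaderboard code never has to branch on string versus list.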

Paper

if content needs to be added

add the insights I wrote to the results section

remove the first-page footnote, since the references take up space

add the cost per report ($0.1)

cache scorecard tool
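
A minimal sketch of what caching the scorecard tool could look like, assuming its result is JSON-serializable and keyed by repository name. `cached_call`, the on-disk layout, and the one-day TTL are all hypothetical; `fetch` stands in for whatever function actually runs the scorecard.

```python
import json
import time
from pathlib import Path


def cached_call(key, fetch, cache_dir, ttl_seconds=86400):
    """Return fetch(key), reusing an on-disk JSON copy while it is fresh."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Repository names contain "/", which is not safe in a file name.
    path = cache_dir / (key.replace("/", "__") + ".json")
    if path.exists() and time.time() - path.stat().st_mtime < ttl_seconds:
        return json.loads(path.read_text())
    result = fetch(key)  # placeholder for the real scorecard call
    path.write_text(json.dumps(result))
    return result
```

Repeated report generation for the same library would then hit the scorecard only once per TTL window, which should also help with GitHub API rate limits.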

Candidates

ML Frameworks

PyTorch

TensorFlow

Candle ML

Agents Framework

CrewAI

LangGraph

Composio

Agent Development Kit

SmolAgents

MetaGPT

Pydantic AI

App Agent

Browser Use

Stagehand

Prompt Engineering

LangChain

LlamaIndex

Inference Engine

SGLang

vLLM

TensorRT

ONNX

Score metrics: License through Overall. Model metrics: Model Coverage, Model Seeking.

Category	Name	License	Security	Maintenance	Dependencies	Regulatory	Overall	Model Coverage	Model Seeking
ML Frameworks	PyTorch	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐	⭐⭐⭐	⭐⭐⭐	88% (15/17)	8
ML Frameworks	JAX	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐	⭐	⭐⭐⭐	61% (11/18)	12
ML Frameworks	TensorFlow	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐	⭐⭐⭐	⭐⭐⭐	80% (12/15)	5
ML Frameworks	ONNX	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐	⭐	⭐⭐⭐	88% (14/16)	5
ML Frameworks	Candle ML	⭐	⭐	⭐	⭐	⭐	⭐	71% (10/14)	12
Agents Framework	CrewAI	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐	⭐	⭐⭐⭐	71% (10/14)	13
Agents Framework	LangGraph	⭐	⭐	⭐⭐⭐⭐	⭐	⭐	⭐⭐	78% (14/18)	7
Agents Framework	Composio	⭐	⭐	⭐⭐⭐	⭐	⭐	⭐⭐	67% (10/15)	5
Agents Framework	Agent Development Kit	⭐	⭐	⭐	⭐⭐	⭐	⭐	71% (10/14)	7
Agents Framework	SmolAgents	⭐⭐⭐⭐⭐	⭐	⭐	⭐	⭐	⭐	73% (11/15)	9
Agents Framework	MetaGPT	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐⭐⭐	⭐	⭐	⭐⭐⭐	57% (8/14)	7
Agents Framework	Pydantic AI	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐	⭐	⭐⭐⭐	88% (15/17)	10
App Agent	Browser Use	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐⭐⭐	88% (15/17)	7
App Agent	Stagehand	⭐	⭐	⭐⭐⭐	⭐	⭐	⭐	47% (7/15)	6
Prompt Engineering	LangChain	⭐⭐⭐⭐⭐	⭐	⭐	⭐	⭐⭐⭐	⭐⭐⭐	72% (13/18)	19
Prompt Engineering	LlamaIndex	⭐	⭐	⭐⭐⭐	⭐	⭐	⭐	47% (8/17)	7
Inference Engine	SGLang	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐	⭐	⭐⭐⭐	73% (11/15)	5
Inference Engine	vLLM	⭐⭐⭐	⭐	⭐⭐⭐⭐	⭐	⭐	⭐⭐	73% (11/15)	7
Inference Engine	TensorRT	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐⭐⭐	69% (11/16)	5
Inference Engine	TGI	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐	⭐	⭐	⭐⭐⭐	72% (13/18)	6
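
The Model Coverage cells above pair a rounded percentage with a fraction, for example `88% (15/17)`. As a worked check of that arithmetic, the sketch below recomputes the percentage from the fraction. The function name and the two-decimal rounding are assumptions chosen for illustration.

```python
import re


def parse_coverage(cell):
    """Parse a cell like '88% (15/17)' into (matched, total, percent)."""
    m = re.fullmatch(r"\s*\d+%\s*\((\d+)/(\d+)\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized coverage cell: {cell!r}")
    matched, total = int(m.group(1)), int(m.group(2))
    # Recompute from the fraction instead of trusting the rounded percentage.
    return matched, total, round(100 * matched / total, 2)


print(parse_coverage("88% (15/17)"))  # (15, 17, 88.24)
print(parse_coverage("61% (11/18)"))  # (11, 18, 61.11)
```

Recomputing from the fraction keeps the table and any more precise figures reported elsewhere from drifting apart.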

Open Deep Research ICML

Name	Baseline Alignment	Novelty Yield	Trust Score	License	Security	Maintenance	Dependencies	Regulatory
PyTorch	88.24%	-	-	-	-	-	-	-
JAX	61.11%	-	-	-	-	-	-	-
TensorFlow	72.22%	-	-	-	-	-	-	-
ONNX	87.5%	-	-	-	-	-	-	-
Hugging Face Transformers	76.47%	-	-	-	-	-	-	-
