New agent clustering
Was this tested on a synthetic dataset?
We could use embeddings for faster clustering and run the optimization only on ambiguous artifacts.
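A minimal sketch of that triage idea, assuming each artifact and each existing cluster already has an embedding vector. The function name `triage` and the thresholds `min_sim` and `min_margin` are hypothetical, picked only for illustration: an artifact is assigned directly when its best cosine similarity is high and clearly ahead of the runner-up, and is otherwise left for the slower optimization step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def triage(artifact_vecs, cluster_vecs, min_sim=0.8, min_margin=0.1):
    """Split artifacts into confident assignments and ambiguous leftovers.

    Returns (assigned, ambiguous):
      assigned  maps artifact index -> cluster index for confident matches
      ambiguous lists artifact indices that need the expensive step
    Thresholds are illustrative assumptions, not tuned values.
    """
    assigned, ambiguous = {}, []
    for i, vec in enumerate(artifact_vecs):
        # Rank clusters from most to least similar.
        ranked = sorted(
            ((cosine(vec, c), j) for j, c in enumerate(cluster_vecs)),
            reverse=True,
        )
        if not ranked:
            ambiguous.append(i)
            continue
        best_sim, best_cluster = ranked[0]
        runner_up = ranked[1][0] if len(ranked) > 1 else -1.0
        # Confident only if the match is strong AND uncontested.
        if best_sim >= min_sim and best_sim - runner_up >= min_margin:
            assigned[i] = best_cluster
        else:
            ambiguous.append(i)
    return assigned, ambiguous
```

Only the indices in `ambiguous` would then go through the agent or LLM step, which is where the speedup would come from.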
The extracted text of the artifact should be type-independent, for clustering performance and to prevent Azure collapse blocking.
clustering document or paper
train
agent
Improvement idea
Text structure is a modality within the text, not semantics, so keep a separate embedding for it.
- rule-based named entity linking
- multimodal embedding model training
  - pull the same asset close across different modalities
  - push different assets apart within the same modality
- LLM decision
  - yes, or move on to the next closest candidate
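The embedding training objective in the list above can be sketched as a triplet-style hinge loss. This is an assumption: the notes do not name a loss, and the `Item` type, `triplet_loss` function, and `margin` value are hypothetical. Positives are the same asset seen in a different modality; negatives are a different asset in the same modality. The code only evaluates the loss on given vectors; an actual training loop would minimize it with an autodiff framework.

```python
import math
from typing import NamedTuple, Sequence


class Item(NamedTuple):
    asset: str            # hypothetical asset identifier
    modality: str         # e.g. "text" or "image"
    vec: Sequence[float]  # embedding produced by the model


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - (dot / norm if norm else 0.0)


def triplet_loss(items, margin=0.2):
    """Mean hinge loss over (anchor, positive, negative) triplets.

    positive: same asset, different modality  -> should be close
    negative: different asset, same modality  -> should be far
    """
    losses = []
    for a in items:
        positives = [p for p in items
                     if p.asset == a.asset and p.modality != a.modality]
        negatives = [n for n in items
                     if n.asset != a.asset and n.modality == a.modality]
        for p in positives:
            for n in negatives:
                gap = cosine_distance(a.vec, p.vec) - cosine_distance(a.vec, n.vec)
                losses.append(max(0.0, margin + gap))
    return sum(losses) / len(losses) if losses else 0.0
```

The loss is zero once every positive is closer to its anchor than every same-modality negative by at least `margin`, which is the geometry the two sub-bullets describe.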
Agent-based approach
- expose the three tools above to the agent
- read input artifact
- retrieve relevant candidate assets
- match one of them
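The agent steps above, combined with the "yes or next closest" decision, could be sketched as the loop below. Everything named here is hypothetical: `retrieve` stands in for whatever returns candidate asset ids ordered from closest to farthest (embedding search, rule-based linking, or both), and `confirm` stands in for the LLM yes/no decision. Neither is a real library call.

```python
from typing import Callable, Optional, Sequence


def match_artifact(
    artifact_text: str,
    retrieve: Callable[[str], Sequence[str]],
    confirm: Callable[[str, str], bool],
    max_candidates: int = 5,
) -> Optional[str]:
    """Return the first candidate asset the decision step accepts.

    retrieve(artifact_text) -> candidate asset ids, closest first (assumed)
    confirm(artifact_text, asset_id) -> True for "yes", False for "next"
    Returns None when no candidate within max_candidates is accepted.
    """
    candidates = retrieve(artifact_text)[:max_candidates]
    for asset_id in candidates:
        # "yes" stops the search; otherwise fall through to the next closest.
        if confirm(artifact_text, asset_id):
            return asset_id
    return None
```

Capping the walk with `max_candidates` bounds the number of LLM calls per artifact; what to do with a `None` result (for example, opening a new asset) is left to the caller.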
Seonglae Cho