Code Embedding

Creator
Seonglae Cho
Created
2024 Mar 22 9:23
Editor
Seonglae Cho
Edited
2025 May 29 20:41
Refs
Text embedding
Code and natural language are not semantically similar, so a natural-language query often matches raw code poorly. Semantic search over a codebase is easier if each piece of code is first translated into natural language and the embedding vector is generated from that description.
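The describe-then-embed idea can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not a real pipeline: `describe_code` is a hypothetical stand-in for an LLM translation step (here it only reuses the function name and docstring via `ast`), and `embed` is a toy bag-of-words vector standing in for an actual embedding model. `SNIPPETS` is made-up sample data.

```python
import ast
import math
import re
from collections import Counter


def describe_code(source: str) -> str:
    """Hypothetical stand-in for an LLM that translates code into prose.

    Assumption: a real pipeline would call a language model here; this stub
    only reuses the name and docstring of the first function it finds.
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            name_words = node.name.replace("_", " ")
            doc = ast.get_docstring(node) or ""
            return f"{name_words}. {doc}".strip()
    return source  # no function found: fall back to the raw text


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: replaces a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, snippets: list[str]) -> int:
    """Return the index of the snippet whose description best matches the query."""
    q = embed(query)
    scores = [cosine(q, embed(describe_code(s))) for s in snippets]
    return max(range(len(snippets)), key=scores.__getitem__)


# Made-up sample snippets for illustration only.
SNIPPETS = [
    'def rm_dupes(xs):\n    """Remove duplicate items from a list, keeping order."""\n    return list(dict.fromkeys(xs))\n',
    'def fetch(url):\n    """Download the body of a web page."""\n    import urllib.request\n    return urllib.request.urlopen(url).read()\n',
]
```

Calling `search("download a web page", SNIPPETS)` compares the query against the prose descriptions rather than the raw source, which is the point of translating code before embedding it. In practice both stubs would be replaced by real models.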
Code Embedding Models
Codestral Embed
Voyage Code 3
Nomic Code Embed
 
 
 

Code search grep.app

Code Search | Grep by Vercel
Search for code, files, and paths across half a million public GitHub repositories.
https://grep.app/
Codebases are uniquely hard to search semantically
Cosine similarity in code vs. text.
https://www.greptile.com/blog/semantic
Generating similarities for code generation
Got it, so maybe a ‘hybrid’ approach? I.e. encode the code snippets as class/function/interface name + parameter_names + docstrings as a ‘syntactic’ embedding, and then use a code2seq or the like to generate embeddings based on their AST paths (and get the ‘semantic’ meaning as well). Then whatever the user prompts, I can generate an embedding based off of his prompt (whether a textual description or code) and see if I get some good similarity results for relevant coding snippets. Does this make...
https://community.openai.com/t/generating-similarities-for-code-generation/276894/9
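The ‘syntactic’ half of the hybrid approach quoted above (name + parameter names + docstring) might be prepared along these lines. This is only a sketch for Python sources using the standard `ast` module; the helper name `syntactic_text`, the output string format, and the `EXAMPLE` class are assumptions made for illustration. The AST-path (code2seq-style) half is not shown.

```python
import ast


def syntactic_text(source: str) -> list[str]:
    """Build one 'name + parameter names + docstring' string per definition.

    Each returned string is what would be passed to an embedding model.
    Only positional-or-keyword parameters are listed.
    """
    texts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = ", ".join(a.arg for a in node.args.args)
            doc = ast.get_docstring(node) or ""
            texts.append(f"function {node.name}({params}): {doc}".strip())
        elif isinstance(node, ast.ClassDef):
            doc = ast.get_docstring(node) or ""
            texts.append(f"class {node.name}: {doc}".strip())
    return texts


# Made-up example source for illustration only.
EXAMPLE = '''
class Cache:
    """In-memory key-value cache."""

    def get(self, key, default=None):
        """Return the cached value for key, or default."""
        return default
'''
```

Running `syntactic_text(EXAMPLE)` yields one string for the class and one for its method; each could then be embedded alongside a separate structural embedding of the same snippet.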
 
 

 

Copyright Seonglae Cho