Code Retrieval

Creator

Seonglae Cho

Created

2024 Mar 22 9:23

Editor

Seonglae Cho

Edited

2024 Oct 24 22:55

Refs

Text embedding

Code and natural language are not semantically similar, so embeddings of raw code tend to match natural-language queries poorly. Semantic search over a codebase is easier when the code is first translated into natural language and the embedding vectors are generated from that description.
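A minimal sketch of the describe-then-embed idea, under loud assumptions: `embed` is a toy bag-of-words counter standing in for a real embedding model, and the `SNIPPETS` and `DESCRIPTIONS` entries are hypothetical, hand-written examples (in practice the descriptions would likely come from an LLM or from existing docstrings). It only illustrates why a natural-language query tends to land closer to a description than to the raw code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical snippets, keyed by an arbitrary id.
SNIPPETS = {
    "dedupe": "def f(xs):\n    return list(dict.fromkeys(xs))",
    "retry": (
        "def g(fn, n):\n"
        "    for _ in range(n):\n"
        "        try:\n"
        "            return fn()\n"
        "        except Exception:\n"
        "            pass"
    ),
}

# Hand-written descriptions of the same snippets (assumed, not generated).
DESCRIPTIONS = {
    "dedupe": "Remove duplicate items from a list while keeping the original order.",
    "retry": "Call a function again up to n times until it succeeds without an error.",
}


def search(query: str, corpus: dict) -> list:
    """Rank corpus entries by cosine similarity to the query, best first."""
    q = embed(query)
    scored = [(key, cosine(q, embed(text))) for key, text in corpus.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


query = "remove duplicate items from a list"
by_code = search(query, SNIPPETS)                # query vs. raw code
by_description = search(query, DESCRIPTIONS)     # query vs. descriptions
```

Comparing `by_code[0]` with `by_description[0]` shows the same snippet ranked first in both cases, but with a much higher score when the description is indexed instead of the raw code. With a learned embedding model the gap would look different, so treat the numbers as illustrative only.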

Code Embeddings

Code Prompting Tools

yamadashy • Updated 2024 Oct 24 23:20

Codebases are uniquely hard to search semantically

Cosine similarity in code vs. text.

https://www.greptile.com/blog/semantic

Generating similarities for code generation

Got it, so maybe a ‘hybrid’ approach? I.e. encode the code snippets as class/function/interface name + parameter_names + docstrings as a ‘syntactic’ embedding, and then use a code2seq or the like to generate embeddings based on their AST paths (and get the ‘semantic’ meaning as well). Then whatever the user prompts, I can generate an embedding based off of his prompt (whether a textual description or code) and see if I get some good similarity results for relevant coding snippets. Does this make...

https://community.openai.com/t/generating-similarities-for-code-generation/276894/9
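The "syntactic" half of the hybrid approach quoted above could be sketched roughly as follows, assuming the indexed code is Python and using only the standard `ast` module. The names `SOURCE`, `syntactic_text`, and `structural_signature` are hypothetical. The structural part here is just a count of AST node types; it is a crude placeholder and not code2seq's AST-path embedding.

```python
import ast
from collections import Counter

# Hypothetical source to index.
SOURCE = '''
def load_user(user_id, include_orders=False):
    """Fetch a user record and optionally attach their orders."""
    return None
'''


def syntactic_text(source: str) -> list:
    """Build one string per function/class: name + parameter names + docstring."""
    out = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = [a.arg for a in node.args.args]
        elif isinstance(node, ast.ClassDef):
            params = []  # classes have no parameter list of their own
        else:
            continue
        doc = ast.get_docstring(node) or ""
        out.append(" ".join([node.name, *params, doc]).strip())
    return out


def structural_signature(source: str) -> Counter:
    """Crude structural feature: counts of AST node types (not code2seq paths)."""
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(source)))


texts = syntactic_text(SOURCE)
signature = structural_signature(SOURCE)
```

Each string in `texts` would then be passed to whatever text embedding model is in use, while `signature` (or a real AST-path model) would supply the structural view. How the two are combined at query time is left open in the quoted post.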
