Is there an LLM that can assimilate an entire codebase for chatting?
Posted by abbondanzio@reddit | LocalLLaMA | 15 comments
I have a large TypeScript project (a monorepo) and currently use claude-dev to ask questions about my codebase or make changes.
Is there any way to feed my entire codebase to an LLM so that I can chat with it? I had thought of something that indexes the whole codebase and creates embeddings for it.
Claude Projects accepts very few files, the continue.dev API often fails me if I select the whole codebase, and in general both feel too limited.
I would prefer it to be open source. What is the state of the art?
Spirited_Example_341@reddit
resistance is futile
BatsChimera@reddit
Concatenate everything into a single .txt and upload it to NotebookLM; it works for me. I have a script that tags every line with its source file (a "from x.txt" prefix, or whatever the extension is).
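Something like this, roughly (a sketch in Node/TypeScript; the extension filter and the output file name are just placeholders, adapt them to your repo):

```typescript
// Concatenate a repo into one .txt for NotebookLM, prefixing each line
// with the file it came from. Extensions and output name are placeholders.
import { readdirSync, readFileSync, statSync, writeFileSync } from "fs";
import { join, extname } from "path";

const INCLUDE = new Set([".ts", ".tsx", ".js", ".json", ".md"]);

function walk(dir: string, files: string[] = []): string[] {
  for (const entry of readdirSync(dir)) {
    if (entry === "node_modules" || entry === ".git") continue;
    const full = join(dir, entry);
    if (statSync(full).isDirectory()) walk(full, files);
    else if (INCLUDE.has(extname(entry))) files.push(full);
  }
  return files;
}

const lines: string[] = [];
for (const file of walk(process.cwd())) {
  for (const line of readFileSync(file, "utf8").split("\n")) {
    lines.push(`(from ${file}) ${line}`);
  }
}
writeFileSync("codebase.txt", lines.join("\n"));
```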
shaman-warrior@reddit
How large is your codebase?
IdeaEchoChamber@reddit
Try taking a look at the Gemini APIs; they have a very large context window. Or just use Cursor.
guyomes@reddit
[Aider](https://aider.chat) is a nice open source alternative. It handles the project context by feeding a [repository map](https://aider.chat/docs/repomap.html) to the LLM, and it adds specific files to the chat context when they are needed to answer a question. Moreover, it can be connected to all the LLMs supported by [LiteLLM](https://github.com/BerriAI/litellm), including local models.
GradatimRecovery@reddit
Prove your concept on NotebookLM before building your own RAG
PermanentLiminality@reddit
The default context on continue.dev is only 8k. Try turning it up, and make sure the model or your local LLM backend can actually support the larger context.
asankhs@reddit
To fit the entire codebase into the prompt, the model needs a context length larger than the full repo; except for fairly small projects, that won't be the case. If your repo is small enough, you can try using one of the Gemini models. We do that at https://docs.codes/, where we take entire library sources and generate documentation from them using Gemini 1.5 Flash.
If the repo is larger, you have two options. One is to embed the repo in a vector store using one of the embedding models and then chat with it. We have an implementation of that which you can try for free at https://www.patched.codes/; we support multiple repos, and even document stores, that you can query and chat with from a single interface.
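Conceptually, the embed-and-retrieve flow looks something like the sketch below (just the general idea, not the patched.codes implementation; it assumes the OpenAI Node SDK and the text-embedding-3-small model):

```typescript
// Rough sketch of embed-and-retrieve over code chunks.
import OpenAI from "openai";

const client = new OpenAI();

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// chunks: pieces of source files, e.g. one function or ~50 lines each
async function buildIndex(chunks: string[]) {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });
  return res.data.map((d, i) => ({ text: chunks[i], vector: d.embedding }));
}

async function topK(
  index: { text: string; vector: number[] }[],
  query: string,
  k = 5,
) {
  const q = (await client.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  })).data[0].embedding;
  return index
    .map((e) => ({ ...e, score: cosine(q, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
// Feed the top-k chunks to the chat model as context for the question.
```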
The other option is to use a short-term memory and let the LLM access it during inference/chat. I recently implemented short-term memory via a memory plugin in optillm (https://github.com/codelion/optillm), which is an open-source optimizing inference proxy. Using memory, we got some very good results on large-context benchmarks like FRAMES (https://huggingface.co/datasets/google/frames-benchmark). The benchmark doesn't include code retrieval though, so YMMV.
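Since optillm exposes an OpenAI-compatible endpoint, you just point a normal client at the proxy. A rough sketch; the port and the model-name prefix below are illustrative, see the optillm README for the exact way to enable the memory plugin:

```typescript
// Point an OpenAI-compatible client at a local optillm proxy.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8000/v1", // assumed local optillm endpoint
  apiKey: process.env.OPENAI_API_KEY,
});

const reply = await client.chat.completions.create({
  model: "memory-gpt-4o-mini", // illustrative slug for the memory plugin
  messages: [
    { role: "system", content: "You answer questions about the attached codebase." },
    { role: "user", content: "Where is the auth middleware configured?" },
  ],
});
console.log(reply.choices[0].message.content);
```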
tallesl@reddit
Embedding source code files for later retrieval with natural language can be tricky: human language does not quite match programming language. I saw a presentation by Codeium where they mitigate this problem by generating a human-like description of the code (with an LLM, of course) and adding it alongside.
Not a local solution, but a quick way to get this is to put the entire codebase in the context and then chat with it. Gemini's huge context size shines here; combine it with how cheap the Flash model is and you may have a winning combination.
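If you go the full-context route, it's only a few lines with the @google/generative-ai Node SDK (a minimal sketch; the model name and prompt framing are placeholders):

```typescript
// Stuff the whole (concatenated) codebase into Gemini's context and ask away.
import { GoogleGenerativeAI } from "@google/generative-ai";
import { readFileSync } from "fs";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// e.g. the codebase.txt produced by the concatenation script above
const codebase = readFileSync("codebase.txt", "utf8");

const result = await model.generateContent([
  "You are answering questions about the following TypeScript monorepo.",
  codebase,
  "Question: where is the API client configured?",
]);
console.log(result.response.text());
```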
justinrlloyd@reddit
Check out the Seagoat project on GitHub.
Everlier@reddit
continue.dev is among the best of what you can get in open source.
If your codebase is still relatively small, use a tool like repopack to pack it into a single file for full in-context editing.
how_now_brown_cow@reddit
This, plus an Ollama backend with Qwen2.5 or DeepSeek if your GPU can handle it. I use Continue daily.
caphohotain@reddit
I find Cursor's @codebase is very good!
Mundane_Ad8936@reddit
Gemini can do 2M tokens. The main issue is that you can't expect any model to pay attention to all of the code; it will miss things. Your prompting tactics are key.
DangKilla@reddit
I don't think this area is fully solved yet. I've seen papers trying to solve this. Large contexts were the simplest attempt but hallucinations occur. RAG doesn't fully solve it. Synthetic data might work, not sure.
The current paradigm is to load a file or two and work on them that way. Try aider from the shell, or the Cursor IDE.