How do you actually start understanding a large codebase?

Posted by radjeep@reddit | ExperiencedDevs | View on Reddit | 61 comments

I’m trying to become a better engineer and feeling pretty stuck with something basic: reading large codebases.

Quick background: I’ve spent a few years as a data scientist. Built Flask endpoints, Streamlit apps, worked a bit with GCP / Vertex AI. But I haven’t really done heavy engineering work (apart from some early Java bugfixes with a lot of help).

Now I’ve got a chance to work more closely with engineering teams, but the size and complexity of the codebase is intimidating me.

A concrete example: I was asked to implement prefix KV caching. There’s already a KVCache class that I’m supposed to reuse, but I can’t even begin to reason about how it behaves across the different places it’s used. There’s a lot of abstraction (interfaces, dependency injection, etc.) and I get lost trying to follow the flow.

I’ve tried reading top-down, following function calls, even using AI tools to walk through the code, but once things get abstract, I lose track.

I’m not just looking for “ask AI to explain it”, more like -

Also, are there tools (AI or otherwise) that actually help you navigate and map out codebases better?

Right now it feels like everything depends on everything else and I don’t know where to get a foothold.

Would love to hear how others approach this.