RAG from Scratch is now live on GitHub

Posted by purellmagents@reddit | LocalLLaMA

It’s an educational open-source project, inspired by my previous repo AI Agents from Scratch. You can find it here: https://github.com/pguso/rag-from-scratch

The goal is to demystify Retrieval-Augmented Generation (RAG) by letting developers build it step by step. No black boxes, no frameworks, no cloud APIs.

Each folder introduces one clear concept (embeddings, vector stores, retrieval, augmentation, etc.) with tiny runnable JS files, a CODE.md file that explains the code in detail, and a CONCEPT.md file that explains the idea at a less technical level.
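To give a feel for how those concepts fit together, here is a minimal sketch of the embed, store, retrieve, augment flow in plain JS. It is not taken from the repo: `embed`, `InMemoryVectorStore`, and `buildPrompt` are hypothetical names, and the "embedding" is just a word-count vector standing in for a real embedding model.

```javascript
// Toy RAG flow: embed -> store -> retrieve -> augment.
// All names here are illustrative assumptions, not the repo's API.

// Toy "embedding": a sparse word-count vector (Map of word -> count).
// A real pipeline would call a learned embedding model instead.
function embed(text) {
  const counts = new Map();
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const word of words) counts.set(word, (counts.get(word) || 0) + 1);
  return counts;
}

// Cosine similarity between two sparse vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) || 0);
  const norm = (v) =>
    Math.sqrt([...v.values()].reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Minimal vector store: keeps texts with their vectors, brute-force search.
class InMemoryVectorStore {
  constructor() {
    this.entries = [];
  }
  add(text) {
    this.entries.push({ text, vector: embed(text) });
  }
  search(query, k = 2) {
    const queryVector = embed(query);
    return this.entries
      .map((e) => ({ text: e.text, score: cosineSimilarity(queryVector, e.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}

// Augmentation: paste the retrieved chunks into the prompt as context.
function buildPrompt(question, hits) {
  const context = hits.map((h, i) => `[${i + 1}] ${h.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const store = new InMemoryVectorStore();
store.add("Embeddings map text to vectors of numbers.");
store.add("A vector store keeps vectors and finds the nearest ones.");
store.add("Llamas are domesticated animals from South America.");

const question = "What does a vector store do?";
const hits = store.search(question, 2);
const prompt = buildPrompt(question, hits);
console.log(prompt);
```

The resulting `prompt` string is what you would hand to a local model for the generation step. Swapping the word-count `embed` for a real embedding model changes the quality of the matches, but the overall flow stays the same.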

Right now, the project is about halfway implemented: the core RAG building blocks are already there and ready to run, and more advanced topics are being added incrementally.

What’s in so far (roughly first half)

Everything runs fully local using embedded databases and node-llama-cpp for inference, so you can learn RAG without paying for APIs.

Still missing / coming next

Why this exists

At this stage, a good chunk of the pipeline is implemented, but the focus is still on teaching, not tooling.

Feel free to open issues, suggest tweaks, or send PRs - especially if you have small, focused examples that explain one RAG idea really well.

Thanks for checking it out, and stay tuned as the remaining steps (advanced retrieval, prompt engineering, evaluation, observability, etc.) get implemented over time.