SomeOddCodeGuy@reddit (OP)
First "model" in the gif was a workflow just directly hitting Mistral Small 3, and then second was a workflow that injects a wikipedia article from an offline wiki api.
Another example is the below: zero-shot workflow (if you can consider a workflow zero shot) of qwq-32b, Qwen2.5 32b coder and Mistral Small 3 working together.
https://i.redd.it/ko7pt43gcwne1.gif
CheatCodesOfLife@reddit
There's an issue you probably won't encounter if you don't use Linux.
You've got _chatOnly in the configs, but the config file is called _chatonly.json (lowercase)
Most Linux filesystems are case sensitive so this breaks. Easy to fix but given all the complexity of the configuration, it might be worth changing that to _chatonly in the config examples, to make it easier for new or time poor users trying to get started.
SomeOddCodeGuy@reddit (OP)
I've either fixed it, or broke all the things. Shake a magic 8 ball to determine which. =D
So it turns out that macOS is case insensitive by default as well, which is why I completely missed this issue. With that said, I applied the fix at the file loader level, rather than fiddling with all the configs, since I didn't want to cause tons of merge issues for folks who might have just pulled the users down.
I really appreciate you pointing this out; I never would have noticed if not. Now I have extra incentive to set up a Linux box for testing.
CheatCodesOfLife@reddit
Cool, that's a better fix.
Yeah, this caught me off guard a few years ago when I set up MariaDB on my Mac, and the table names weren't case sensitive (file per table).
I listened to your video in the background. I like how you present it almost like a teacher, going into the basics like "don't just run code off github without checking it" and explaining a lot of the fundamentals.
If you want some general feedback with the video/tutorial (from someone who doesn't make in-depth tutorials and is therefore unqualified):
It might be worth doing it without having to pull down 35GB of Wikipedia, or at least stating this requirement up front. I might have missed it, but it seemed like you didn't mention this until the thing was almost ready to run. So if someone with low disk space / bandwidth followed along for half an hour and had to drop out, that'd be frustrating.
I know it's easy to change it, but given you seem to be aiming the tutorial at people who might not know how to git clone, it'd be a shame for them to miss out on that magical moment of receiving the first RAG reply. Kind of like when you get your first "hello world" html displayed from a full web stack lol.
SomeOddCodeGuy@reddit (OP)
Always. Anything I do right is purely by luck alone lol. I'm always very appreciative of feedback
I didn't even think about that. I'll put it in the description, and in future videos I'll make sure to do that. Also, I'm going to try to add in an online wiki call within the next couple of weeks, as that was requested as well. I'll make sure to stick one of those text overlays on the vid when I do, to make sure folks know about it.
I absolutely agree with this, and I'm kicking myself for not having looked at it from that angle before now. Going forward, I'll be more considerate of that kind of thing. Generally I try to be, but the thought never crossed my mind here. "Wilmer's lightweight ... goget65GBofwikiplease... super lightweight!"
Yea I'll see what I can do about that.
SomeOddCodeGuy@reddit (OP)
Ah man, I can't believe I left such a ridiculous bug in there. I really appreciate you pointing that out; as soon as I get off work today I'll get that fixed.
DrViilapenkki@reddit
A simple, straight-to-the-point installation guide for Open WebUI would be greatly appreciated!
SomeOddCodeGuy@reddit (OP)
Gladly! I'll get one up as soon as I can.
Relevant-Draft-7780@reddit
RAG works, but it's inconsistent as hell.
pier4r@reddit
I thought it wasn't underestimated? I mean there are several services (a la perplexity) that live by it (plus other techniques).
SomeOddCodeGuy@reddit (OP)
You'd be surprised at how many folks, especially companies, just completely overlook structured RAG to do things like trying to fine-tune knowledge into their LLMs.
I think that around here we have a bit of an echo chamber of knowledgeable/skilled folks who know better, but as new users come in, and especially outside of this domain? It's far less common than I'd like to run into folks in the wild who are building out AI systems and relying extensively on RAG, vs doing other things that aren't quite as powerful.
pab_guy@reddit
Ooof, you fine tune for behavior, not knowledge. I am sure you know that of course but it’s so frustrating to hear companies are doing that.
With RAG we can validate output comes from source material. Without you can only guess as to hallucinations…
Intraluminal@reddit
You guys seem knowledgeable. What does the hallucination rate look like if you do BOTH? That is, fine-tune it AND give it RAG on (essentially) the same info?
pab_guy@reddit
It's not about the hallucination rate. It's that without grounded data, you can't make a second call to the LLM asking "hey, is the answer provided based on this information:", meaning you can't actually perform a validation step.
Without RAG, you have no grounded context to compare the provided answer to.
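That second-pass validation call could be sketched like this. The prompt wording and the `call_llm` client are stand-ins for whatever model/client you actually use, not a specific library's API:

```python
def build_grounding_check_prompt(question: str, context: str, answer: str) -> str:
    """Second-pass prompt asking the model whether an answer is
    supported by the retrieved context. Prompt text is illustrative."""
    return (
        "You are a strict fact checker.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n\n"
        "Reply with exactly GROUNDED if every claim in the answer is "
        "supported by the context, otherwise reply UNGROUNDED."
    )

def validate_answer(call_llm, question: str, context: str, answer: str) -> bool:
    """call_llm is any callable taking a prompt string and returning text."""
    verdict = call_llm(build_grounding_check_prompt(question, context, answer))
    return verdict.strip().upper().startswith("GROUNDED")
```

Without retrieved context to pass in, there is simply nothing to hand the checker, which is the point being made above.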
Firm-Fix-5946@reddit
just lmao. you must be new here
SomeOddCodeGuy@reddit (OP)
Joined in summer of 2023, a couple of months after Llama 2's release.
Xamanthas@reddit
Disconnected then or in your own echo chamber, the reality is inverse, every man and his dog is doing RAG and neglecting FT. Unsloth folks agree too
zero_proof_fork@reddit
They might be doing that as the context window is not sufficient
pier4r@reddit
Ah yes. Of course: the less knowledge, and/or the more stubborn the approach (i.e. "I read that X is better than Y so I discard Y entirely"), the more wasteful the attempts to produce useful results.
SomeOddCodeGuy@reddit (OP)
Yea, I think that in general finetuning is just a very attractive option. RAG requires a lot of stuff under the hood, and it's easy to imagine it pulling the wrong data. But the concept of finetuning feels magical: "I give it my data, now it knows my data and there's very little chance of it not working."
Unfortunately, it doesn't quite work that way, but a lot of times folks just blame themselves for that and keep trying to make it work, thinking they are just doing something wrong.
I can definitely see the appeal, if you have someone breathing down your neck saying "I want 100% good answers 100% of the time". RAG is fun when you're a hobbyist, but I imagine it's scary when your livelihood is on the line lol
Firm-Fix-5946@reddit
it is far from underestimated, it is all the rage right now and mostly what all the LLM integration consultancies are making their money on. OP is just wildly out of touch or perhaps has some really clueless coworkers and is butthurt about it
CheatCodesOfLife@reddit
You could also just click "Web Search" and Mistral-Small will give you the same answer.
Firm-Fix-5946@reddit
i mean, duh. everyone and their dog is doing RAG. who is so out of touch as to underestimate it? i mean people who are doing real work for real money, not people posting here
monti1979@reddit
I think you overestimate the typical company.
SomeOddCodeGuy@reddit (OP)
There's an unfortunate number of people out there setting up AI solutions for their companies that are trying to finetune knowledge into the models, instead of doing RAG. They get tasked with making an internal chatbot to answer questions, spin up some finetuned version of Llama 3.1 8b that they tried to overfit on company knowledge, and then their users get upset when it isn't doing what they want.
That's why I mention this once in a while. AI companies and startups are one thing, but the internal IT department of non-technical industries like insurance, finance, etc? I think you'd be quite disappointed if you saw what some of the folks being paid to do this in those companies are actually doing.
madaradess007@reddit
imo it is way overestimated, i tried it like 10 times and every time it was worse than extracting strings from pages and putting them into prompt
yukiarimo@reddit
Working on innovative rag approach. Speed up +1000%, Quality: 96%
ViperAMD@reddit
Does it have to be an offline API? Wouldn't that mean you would have to keep wiki dataset up to date?
SomeOddCodeGuy@reddit (OP)
In this case it's specifically the offline one, but it wouldn't have to be. I just prioritized the offline wiki API because I wanted to be able to use this on a laptop on the road. It's true that I have to keep it up to date, though.
Most workflow apps, including Wilmer, generally let you plug a custom Python script into a node, so you could pull from any source you wanted, including actual Wikipedia.
With that said, I'll add an actual wiki API node to the list, just in case anyone else would rather use that and doesn't want to deal with writing their own custom script.
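In the meantime, a custom script node hitting live Wikipedia could be sketched against Wikipedia's public REST summary endpoint like this. The function names are my own for illustration, not a Wilmer API:

```python
import json
import urllib.parse
import urllib.request

WIKI_SUMMARY_ENDPOINT = "https://en.wikipedia.org/api/rest_v1/page/summary/"

def build_summary_url(title: str) -> str:
    # Wikipedia page titles use underscores in place of spaces.
    return WIKI_SUMMARY_ENDPOINT + urllib.parse.quote(title.replace(" ", "_"))

def fetch_wiki_summary(title: str) -> str:
    """Fetch the intro extract of a page from Wikipedia's REST API."""
    req = urllib.request.Request(
        build_summary_url(title),
        # Wikipedia asks API clients to send a descriptive User-Agent.
        headers={"User-Agent": "wiki-node-sketch/0.1"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["extract"]
```

The returned extract could then be injected into the prompt the same way the offline wiki article is, with the trade-off that it needs network access.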
Everlier@reddit
Team workflows, let's go!
SomeOddCodeGuy@reddit (OP)
Wooo! Always happy to see more people using them. The more popular workflows get, the more clever tricks we'll see people do with them that we can try as well =D
GiveMeAegis@reddit
Custom Pipeline or n8n connection?
SomeOddCodeGuy@reddit (OP)
Custom workflow app: WilmerAI. It's been a hobby project I've been banging away at for my own stuff since early last year; not a lot of other folks use it, but I've got a ton of plans for it for my own needs.
You could likely do the same with n8n or dify (just learned about this one)