Llama3.2:1B
Posted by ranoutofusernames__@reddit | LocalLLaMA | View on Reddit | 96 comments
Llama3.2:1B on CPU and 8GB RAM
Great for code generation and one-off requests. It degrades in long conversations; the 3B, although a bit slower in that setup, handles longer chat history better.
cms2307@reddit
Incredible how fast we’ve come since the original ChatGPT launch. 1b models providing answers in the same realm of quality.
ranoutofusernames__@reddit (OP)
Absolutely crazy. Small models are AI for the masses. They’ll be running everywhere soon
vibjelo@reddit
Sadly in that case, the masses shall remain dumb
ranoutofusernames__@reddit (OP)
Why do you say that?
vibjelo@reddit
Because 1B models aren't really useful for anything besides simple autocomplete and similar.
So if the masses use those to educate themselves, we'll be as smart tomorrow as we are today.
Future_Might_8194@reddit
I use Llama 3.2 3B in a chain, and it's better than a one-shot from any model. You know what answers (for example) math questions faster and better than a large model?
A 3B RAG'd up to a calculator.
When you just load models up in a chat app, you're just getting the demo. Start putting agent chains together with outside data and tools, and suddenly an incredibly obedient 3B that doesn't confuse retrieved data with its training data is so much better.
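A minimal sketch of that pattern (all names hypothetical, and the small model's conversational step is stubbed with a template): the chain extracts the arithmetic, hands it to a real calculator that cannot hallucinate, and the model only has to phrase the verified result.

```python
import ast
import operator as op
import re

# Safe arithmetic evaluator -- the "calculator" tool that cannot hallucinate.
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
        ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def calculate(expr: str) -> float:
    """Evaluate a plain arithmetic expression via the AST (no eval())."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def answer_math(question: str) -> str:
    # Step 1: a tool router pulls the arithmetic out of the question.
    expr = re.search(r"\d[\d\.\s\+\-\*/\(\)]*", question).group().strip()
    result = calculate(expr)
    # Step 2: in a real chain, a small instruct model would phrase `result`
    # conversationally; here that call is stubbed with a template.
    return f"The answer to {expr} is {result}."
```

The point of the design is that the number in the final answer comes from the calculator, never from the model's weights.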
cms2307@reddit
Any specific numbers on how much better 3b plus a calculator is than large models without? I’ve been interested in this for a while but it seems like people really aren’t trying this setup, despite what looks to me like obvious advantages
Future_Might_8194@reddit
No. It's just that any model can hallucinate, no matter the size, but a calculator won't. A small, much faster model that is instructed to simply relay outside information in a conversational way will read a calculator more accurately than a large, slower model working the math itself and trying not to hallucinate.
cms2307@reddit
Would you say that small models without calculators are a reliable way to solve math problems? Let’s assume we’re doing basic calculus or something, can they get the answers right 50% of the time? 75%? 90%? I’m very interested to hear about this because I literally can’t find anyone else talking about it
Future_Might_8194@reddit
I mean, try it. I don't think it's ever twisted the answer for me if it's given the right answer from a calculator. I'm sorry, I don't have numbers. I have an agent chain I've been piecing together since Hermes was on Mistral.
treverflume@reddit
Can I do this with a 3B on mobile do you think?
he_he_fajnie@reddit
RAG, search, and summary are actually all you need. It doesn't have to know stuff; it needs to "think" and rephrase without hallucinating, and that's it.
tcika@reddit
It actually doesn't even need to think at all. LLMs have two main issues: hallucinations and an inability to reason. And if you use the model for RAG, its inherent knowledge becomes "toxic" and you really don't want it to fake your RAG data. So small models (like Qwen 2.5 3B or Qwen 2 VL 7B) are all you need. They do the job and they are cheap to host.
I have a custom use-case with a long-living multi-agent system and I found no real difference between smaller and larger models in terms of the end result. The reasoning part is done by a separate module with a bunch of external tools anyways.
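A rough sketch of that RAG pattern (names hypothetical; the retriever is a toy keyword-overlap stand-in for a real vector store): retrieval picks the grounding passage, and the prompt confines the small model to rephrasing it rather than drawing on its own weights.

```python
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy keyword-overlap retriever standing in for a vector store."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Build a prompt that keeps the model's own knowledge out of the answer."""
    context = "\n".join(retrieve(query, docs))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```

With a prompt like this, a 3B model's job reduces to rephrasing the retrieved text, which is exactly the regime where small models hold up.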
cms2307@reddit
Can you give me some more info about that second part? How do you work reasoning into your workflow
tcika@reddit
Let’s start with the fact that the entire system was written from scratch so don’t hit me too hard with your keyboards when I open source it :D
My system is essentially split into several semi-independent modules communicating with each other when out-of-domain actions are needed. One of these modules is what I call the "logic reasoning module", and it is essentially a bunch of narrow specialized agents serving as glue between the task and the bag of algos I found in the wild. One of its purposes is to apply formal logic to check whether the text given to it is correct, and to formally infer properties of some parts of the text (for example, if the text mentions a certain door, the system needs to ensure that the door, given its previously learned properties and a textual description, is indeed a door and has no undesirable properties such as being a broken door or a hard-to-open door). Another thing this module does is decision making. An agent generator creates state evaluation agents and all the other necessary entities from blueprints and then sends their actor references to the algorithm, such as MCTS for example.
But I gave up on making this module work as I wanted and came up with a reasoning habit module instead. That one is a meta-module that essentially keeps track of the entire set of system activities and tries to detect any sorts of patterns, and its sub-module then tries to create a “shortcut”. The thing is, these learned patterns have individual scores w.r.t. the skill they were made for. These patterns essentially compete with each other for the right to be used in their respective cases. Basically, a schizo form of a reinforcement learning approach.
There's much more to it than what I already described, but I'm too sleepy so nope. And yes, you don't really need large LMs for it to function, like at all. Yes, they will give you somewhat better results, but their cost is a big oof.
P.S.: I use a knowledge graph with a few extensions (like the one that resembles frames), and this graph also has a temporal component and simple node-level version control. I just ran out of hobbies and I really wanted to see how exactly my attempt to build all that would fail, so here I am :D
cms2307@reddit
This is really interesting, I think the part about applying formal logic to questions could be really good and should be explored more. Maybe a good way to do it would be to fine tune a model on either restructuring or labeling an input question using formal logic, but I really don’t know the specifics of fine tuning. Good work though!
tcika@reddit
In-context learning is more than enough most of the time, actually. And I'm not a fan of keeping tens of different networks, since it would impose many architectural and computational restrictions for the sake of minor improvements. And the reasoning habit module supersedes the logic module for most tasks after gaining enough experience :-) This one is a good find!
Although it cost me ~$350 (in API credits) to run it for an hour with Qwen 2 VL 72B and Claude 3.5...
Because it spawned ~13k instances of agent (actor) types in total. I'm glad that only 50 were able to run at the same time.
vibjelo@reddit
Have you actually tried 1B models? They can barely form coherent sentences...
cms2307@reddit
Do you see the post you're replying to? That's a 1B model doing more than just forming coherent sentences
qwesz9090@reddit
My experience with llama 3.2:1b was the same, it was pretty incoherent. But llama 3.2:3b seriously impressed me. Still incredibly small and it seemed usefully coherent.
Ok_Cow_8213@reddit
I'm no expert in all of this LocalLLaMA stuff, but in my experience smaller models tend to hallucinate more, refuse to reply, reply with something unrelated, or just reply with the same text that was in the prompt. And the smallest stuff I have tested has been 3B models. It's so bad for me that I really don't understand how people are finding them useful at all at this stage.
cms2307@reddit
They do hallucinate, but they're useful in certain situations where you don't care about 100% accuracy of information. I haven't tested 3.2 1B and 3B very extensively, so I can't say if they're actually at GPT-3.5's level, but conversation-wise they're definitely on par. I don't feel like I have to dumb down my prompts very much, as opposed to something like TinyLlama from way back when.
JFHermes@reddit
But surely in coding situations you do want 100% accuracy. Who wants to sit around trying to get a small model on track? You would just code it yourself at that point.
Other stuff I totally get but coding seems like a poor use case for a small local model.
ggone20@reddit
Do you (or any other human on earth) code with 100% accuracy? No.
That said, the small models are really good at things like summarizing or rewriting in different tones, or taking in context and making inference on the input - think a calendar and ‘what time is my meeting’ or a sales report and ‘how much revenue last quarter’. Or think about realtime conversation advice/coaching when paired with STT where it listens to your conversation and warns of any non-factual comments or biases. Etc, etc, etc.
There are TONS of valuable uses for AI on the edge that don’t require ‘100% accuracy’ as that statement doesn’t even mean anything lots of times. Not only that but 3B can still do function calling, which makes it superhuman anyway.
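As a sketch of what function calling with a small model looks like (tool names are hypothetical, and the model's reply is shown as a literal string rather than a real inference call): the model emits a JSON tool call, and the host code parses and dispatches it.

```python
import json

# Registry of tools the model is allowed to call (hypothetical examples).
TOOLS = {
    "get_meeting_time": lambda calendar, title: calendar.get(title, "not found"),
}

def dispatch(model_output: str, calendar: dict) -> str:
    """Parse a JSON tool call emitted by the model and execute it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(calendar, **call["arguments"])

# A 3B model prompted with the tool schema might reply with something like:
reply = '{"name": "get_meeting_time", "arguments": {"title": "standup"}}'
```

This is the "calendar and 'what time is my meeting'" case above: the model only has to pick a tool and fill in arguments; the actual lookup is exact.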
It’s amazing Meta gives these away for free. Insanity.
draeician@reddit
Can someone give me some examples of when you don't care if the Model is accurate? The only thing I can come up with is Fiction Writing, but even there if you've outlined something you'd want the model to still be accurate to your outline. You wouldn't want the protagonist changing to an alien race, or switching planets, or changing from a rock star to a hermit in the span of a sentence.
ggone20@reddit
You're mistaking 'accurate' for 'factual'. I gave you three perfect examples. Humans aren't 100% accurate or factual, and you work and talk with them, right?
AardvarkFuture4165@reddit
tbh I would say your examples are correct... basically easy lookups that are simple to answer correctly, simple recalls where the answer is plainly there, no need for a big model
Wild_King4244@reddit
What models did you try?
Ok_Cow_8213@reddit
One I can remember off the top of my head that was especially bad is Mini Orca 3B
Various-Operation550@reddit
Dude these models in our field are like saying “i tried computers in 1987 - nothing special”
ConObs62@reddit
1987 was a good year for computers... the obviousness of their utility far exceeded these autocompletion tools
Various-Operation550@reddit
No
TechnoByte_@reddit
That's an ancient model, llama 3.2 3B and Qwen 2.5 3B are much, much better than that
nixed9@reddit
released June 26, 2023
crazy pace in this field
crappleIcrap@reddit
It is true, though. The naysayers have only focused on the doom graphs of increased power and computation of the largest models, saying it outpaces compute. In reality, models of all sizes have become better; so much work has been done that a 1B-parameter model today makes last year's 1B-parameter model look like Cleverbot.
2016YamR6@reddit
I use 1B and 3B models in my chain of prompts for the intermediary decisions that need to be made, so I don't have to make as many calls to the API or load a 34B model
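A sketch of that routing pattern (everything here is hypothetical; in the real chain the small local model would make the call, and a keyword heuristic stands in for its yes/no answer): a cheap model classifies each step, and only hard steps go to the big model or the API.

```python
def route(task: str) -> str:
    """Stub for a small local model deciding if a step needs the big model.
    A keyword heuristic stands in for the 1B/3B classifier's answer."""
    hard = any(w in task.lower() for w in ("prove", "refactor", "analyze"))
    return "api-34b" if hard else "local-3b"

def run_chain(steps: list[str]) -> list[tuple[str, str]]:
    """Assign each intermediary step in the chain to a backend."""
    return [(step, route(step)) for step in steps]
```

The win is purely economic: most intermediary decisions are cheap, so the expensive backend is only touched when the router says so.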
MINIMAN10001@reddit
I should experiment more because I hear this same thing across llama 1b to 8b depending on the particular one shot being asked
Zealousideal-Ask-693@reddit
What are you using to host the LLM? The only local hosting tool I’ve seen is GPT4ALL but I’d like to find something easier to fine tune and custom train.
ranoutofusernames__@reddit (OP)
Ollama + PeerJS
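For anyone curious what talking to Ollama locally looks like, here is a minimal sketch against its REST API (the model name is just an example; actually sending the request assumes an Ollama server running on the default port):

```python
import json
import urllib.request

def build_request(prompt: str, model: str = "llama3.2:1b") -> urllib.request.Request:
    """Build the POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# With an Ollama server running, this would return the completion:
# resp = urllib.request.urlopen(build_request("Write a haiku about RAM"))
# print(json.loads(resp.read())["response"])
```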
Apgocrazy@reddit
Dope!!! you gave me some inspiration
Over-Dragonfruit5939@reddit
What UI are you using?
ranoutofusernames__@reddit (OP)
Something I wrote for the device
Over-Dragonfruit5939@reddit
Nice it looks awesome
masteryoyogi@reddit
Did the code work?
ranoutofusernames__@reddit (OP)
Haven’t tested this specific one yet but I’ve been using it to code in JS this whole week. Pretty good, everything has worked so far.
ButterflySpecialist@reddit
What is the accuracy percentage of the code snippets? Have you figured that out yet/ how do you figure that out lol.
Orolol@reddit
From what I can read, it should work, yes
ventilador_liliana@reddit
It's amazing, and at Q4_K_M it's also very good in Spanish, all in only 800MB
Obvious-Theory-8707@reddit
What is the UI you are using ?
ranoutofusernames__@reddit (OP)
It's a UI I built for the device!
No-Ocelot2450@reddit
I've used a bigger version on a 6GB GPU (even 4GB suffices) using LM Studio or llama.cpp directly. It's not fast enough to use in any "production" task, but acceptable for personal use.
But in terms of generalization capabilities, 3.1 and 3.2 are not impressive: a lack of comprehension and overall logic in the smaller versions. Gemma 2, Qwen 2.5, and even the latest Microsoft Phi are better
ranoutofusernames__@reddit (OP)
Definitely agree. I wouldn’t recommend using stuff it spits out for production. For the average joe though, it’s very good. Especially 3B, at least in my opinion has been a good model to quickly ask about random things, debug etc… The plan is to run “standard” models on a GPU based device but obviously it’ll be way more expensive and larger in size.
Perfect-Campaign9551@reddit
I can run 3.2 1B on my phone....
ranoutofusernames__@reddit (OP)
What phone do you have?
Perfect-Campaign9551@reddit
Moto G 5G 2024. 3.2 1B runs at about 4 t/s using the PocketPal app
cerchez07@reddit
what is this ui you are you using?
ranoutofusernames__@reddit (OP)
Something I made for my AI device project
Thinking about adding the ability to run the code, but I'm not sure people will want that since there are full-featured IDEs.
MoffKalast@reddit
Wrong, the Pi 5 is manufactured in Wales. :P
ggone20@reddit
Yea but the case is 3D printed and components assembled here! Lol
MoffKalast@reddit
I once worked with a company that made their entire product in China, but then sent them to HK where they only uploaded the software so it could be technically labelled as "Made in HK" and get around import restrictions.
The regulators were seemingly totally fine with it so I guess OP is in the clear, haha.
TheOwlHypothesis@reddit
I want hardware that "crystalizes" an LLM, in other words it can only run as the LLM that was flashed to it. I can imagine a dedicated piece of hardware would have performance gains. It would be good for a project like this and all local LLM enthusiasts.
Although I could also see no one doing this because of the cost and inflexible nature of it. I'm not even sure it's possible.
Mescallan@reddit
Veritasium had a video a few years ago on a startup that converted NAND flash modules into analog neural networks.
Analog is the future, but we need to reach a capabilities plateau before it's reasonable to hardcode weights
swagonflyyyy@reddit
I feel like an ASIC would be what you're looking for.
el_isma@reddit
Like an FPGA? But AFAIK they don't have enough RAM (unless you want to run something tiny)
TheOwlHypothesis@reddit
Not exactly. I'm not a hardware person so IDK what exactly to call it. But I imagine it would be a special class of hardware that is similar to a GPU but "hard coded", or I guess hard wired in this case in a way that the LLM weights are the only thing that it runs.
MidnightHacker@reddit
Isn't that kinda what Groq is doing right now?
my_name_isnt_clever@reddit
I think an ASIC might be the idea you're looking for. There are some attempts, the issue right now is that everything is moving so fast it's very risky to hard commit to the transformers architecture when there is a high chance we end up with something better.
ranoutofusernames__@reddit (OP)
That’s my goal for the next next version. Not only dedicated model but dedicated board too. Ground up designed to be lightweight.
That being said, building on a popular platform is very important for this stage for many reasons.
mr_happy_nice@reddit
That's a pretty tasty UI there partner. I love your spacing.
LilaSchneemann@reddit
Maybe Llama 3.3 can teach devs how to waste even more screen space.
ranoutofusernames__@reddit (OP)
Thank you!
Different-Effect-724@reddit
Hey, great taste on the UI. Did you make your own or is this an open-source package I can find?
ranoutofusernames__@reddit (OP)
Hey! Working on it, doing some documentation and cleaning up some code. Some minor things to add. If you put your email on the site, I’ll send an email update when it’s on GitHub. I’ll most likely post it here too though so you don’t have to
upquarkspin@reddit
21.63 t/s on iPhone 13!!!
dazld@reddit
How did you run on iPhone? Have had very little luck using apps so far.
upquarkspin@reddit
https://apps.apple.com/app/id6502579498
punkpeye@reddit
The UI looks interesting. Reminds me of concept art from sci-fi movies.
330d@reddit
Sure, I'll create a neural network in Python!
import neuralnetwork ....
princetrunks@reddit
I should put this on my pi5
my_name_isnt_clever@reddit
When I first used GPT-2 in AI Dungeon it blew my mind and felt like the future. But it was running from some data center somewhere, it was still out of reach. Now we can run better models on a Raspberry Pi. I love technology.
Hungry-Loquat6658@reddit
this UI looks cool
ranoutofusernames__@reddit (OP)
Thanks!
RealBiggly@reddit
If you want to really impress me, ask it to create a simple click-n-play installer for that GUI, for Windows?
I bet ya can't! Betcha?
And I bet you couldn't add lorebooks and character creation to it, with character images n stuff, using normal GGUF files from the same directory as my other apps, I'm betting that's WAY beyond its means...
Like totally?
;)
StyMaar@reddit
LM.rs has a desktop GUI (but there's no pre-compiled binary AFAIK, you'd need to compile it yourself)
RealBiggly@reddit
I use Backyard.ai and was jus' teasin' the fella, but yeah that's a nice GUI...
ranoutofusernames__@reddit (OP)
Heading that way. Already have an electron version for v1 that can be ported to all platforms.
Everything else you mentioned, coming very soon ;)
gami13@reddit
why electron? just use native winui3
ranoutofusernames__@reddit (OP)
True, eventually 100% that’s the goal. But between doing CAD, procurement, shipping, coding and everything else, it’ll take time so having a single codebase for all platforms using electron will be a good stopgap until all native releases. Trying to get this in the hands of as many people as possible as fast as possible.
RealBiggly@reddit
\o/ I like you already! :D
EastSignificance9744@reddit
I run a 13B on my 16GB RAM CPU at that speed. Why's this so slow?
ranoutofusernames__@reddit (OP)
Which CPU? This is running on a Pi
EastSignificance9744@reddit
oh makes sense, I'm on a i7 ice lake
ranoutofusernames__@reddit (OP)
Ah yeah, that’ll do it.
synw_@reddit
Impressive. I've never seen a 1B that can output acceptable code, apart from DeepSeek 1.3B
Expensive-Apricot-25@reddit
thats a crazy UI, it looks so cool
RandiyOrtonu@reddit
damn