askchris

Compile English function descriptions into 22MB neural programs that run locally via llama.cpp

Posted by yuntiandeng@reddit | LocalLLaMA | View on Reddit | 20 comments

askchris@reddit

I had the same question, so after digging a bit, I realized they have a private 4B model (on their end) trained to instantly generate the 22MB LoRA weights for a smaller model that you host locally (e.g., Qwen 3 0.6B or GPT-2). If you request a function via their API, your small local model plus the LoRA can perform that function at roughly the quality of a 32B model. So the key mechanism is the 4B model, but it's private. I guess it's more like a fast LoRA weights generator that skips the typical LoRA training step.
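The 4B generator is private, so it can't be shown here. What can be sketched is the standard LoRA formulation the generated adapter plugs into: a frozen base matrix W (d_out × d_in) plus a low-rank update scaled by alpha / r, i.e. W + (alpha / r) · B·A. The helper names and toy matrices are illustrative assumptions, not their API. The size helper only shows how adapter size scales with rank and the number of adapted matrices; it does not reproduce the 22MB figure.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Merge a LoRA adapter into a frozen weight: W + (alpha / rank) * B @ A.

    w: d_out x d_in base weight (left untouched by LoRA)
    b: d_out x rank, a: rank x d_in (the only trained / generated parts)
    """
    delta = matmul(b, a)  # low-rank update with the same shape as w
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_adapter_bytes(shapes: List[Tuple[int, int]], rank: int, bytes_per_param: int = 2) -> int:
    """Adapter size: each adapted (d_out, d_in) matrix adds rank * (d_in + d_out) params.

    bytes_per_param=2 assumes fp16/bf16 storage.
    """
    params = sum(rank * (d_in + d_out) for d_out, d_in in shapes)
    return params * bytes_per_param


w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]  # d_out x rank (rank = 1)
a = [[3.0, 4.0]]    # rank x d_in
print(apply_lora(w, a, b, alpha=1.0, rank=1))  # [[4.0, 4.0], [6.0, 9.0]]
print(lora_adapter_bytes([(2, 2)], rank=1))    # 8
```

Because only A and B need to be stored per function, the adapter stays small relative to the base model, which is presumably what makes shipping one adapter per requested function practical.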

I Trained a Language Model on CPU for 40 Hours - It Beat the GPU Baseline

Posted by Own-Albatross868@reddit | LocalLLaMA | View on Reddit | 31 comments

askchris@reddit

I'm curious as well, because the more comparable 28M-parameter TinyStories model from the original paper appears to have performed better than the OP's 25M CPU model. Here's the output quality from the original TinyStories paper:

---

Prompt: Tom and Jane are friends. One day, Jane goes to Tom’s house. Tom has a big pot of soup. He wants to share it with Jane. “Jane, do you want some soup?” Tom asks. “Yes, please. It looks yummy,” Jane says. Tom pours some soup into two bowls. He gives one bowl to Jane. Jane takes a spoonful of soup, but then she makes a face. The soup is

Completion by a 28M-parameter model trained on TinyStories: very bitter. She does not like it. She says, “I don’t like this soup. It is too bitter.” Tom is sorry. He says, “I’m sorry, Jane. I didn’t know you don’t like bitter soup. I will make you something else.” He looks around the kitchen and finds some bread and cheese. He puts them on the table and says, “Here, Jane. You can have some bread and cheese. They are not bitter. They are sweet and yummy.” Jane is happy. She says, “Thank you, Tom. You are a good friend. I like bread and cheese. They are not bitter.”

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide

Posted by at0mi@reddit | LocalLLaMA | View on Reddit | 107 comments

askchris@reddit

Yep, you can turn sunlight directly into "viable work" output with these rigs, while costs drop every year thanks to more efficient models. It works whether you want to serve humanity or solve private problems ... (or build cool things to sustain your moneyless off-grid lifestyle).

If we had models like QwQ-32B and Gemma-3-27B two years ago, people would have gone crazy.

Posted by Proud_Fox_684@reddit | LocalLLaMA | View on Reddit | 107 comments

askchris@reddit

Nations would war over the aluminum can, though ... Then there are Pepsi's "Zero Sugar" claims, which would have been difficult to verify 1,000 years ago, let alone recreate. There were also no canned or bottled caffeinated drinks for sale before the 19th century. Caffeine in a can just didn't exist. Fizzy caffeine in an aluminum can with sugar-free sweeteners DEFINITELY didn't exist. If someone were handed a cold Pepsi Zero Sugar on a hot summer day 1,000 years ago, it would feel like alchemy -- we can't really comprehend how out of place our everyday objects would seem to people back then.

Real-Time Introspective Compression for Transformers

Posted by dicklesworth@reddit | LocalLLaMA | View on Reddit | 6 comments

askchris@reddit

You're probably onto something, but how much better would this be than verbalizing internal states the way reasoning models do? Verbalizing already allows LLMs to reflect, correct and change direction, similar to what you've described. Do you expect your method to be more granular, adaptive or parallel than what chain of thought / reasoning can do? Would it be used during training, or more for test-time compute tasks?

The pipeline I follow for open source LLM model finetuning

Posted by Ahmad401@reddit | LocalLLaMA | View on Reddit | 11 comments

askchris@reddit

Super useful, thanks for sharing. I'd love to learn more. For example, are there cases where you can get expensive o1 prompts in narrow domains to work as well or better on small 3B to 32B models? And is there a way to streamline your steps so models can eventually do this for themselves, walking through your steps to get better outputs with less compute in narrow domains?

Many asked: When will we have an open source model better than chatGPT4? The day has arrived.

Posted by Terminator857@reddit | LocalLLaMA | View on Reddit | 197 comments

askchris@reddit

Exactly. I bet in 12 months people will say the same as OP about AGI: "New model finally beats humans at most tasks." But the reality in 12 months will be: "We've had models that could beat the *average* human at most knowledge tasks for a year." lol, people are spoiled with AI and it's too funny.

Brute Force Over Innovation? My Thoughts on o1-Pro and o3

Posted by anzzax@reddit | LocalLLaMA | View on Reddit | 172 comments

askchris@reddit

No, you can't recognize a new pattern the first time; this is BS. Your instincts have already been trained to prioritize pain signals from prior generations of survival feedback. Babies don't understand very simple things like object permanence the first time, let alone complex things like economics (which has patterns that even adults fail to recognize). It's also difficult for humans to accurately tell the difference between benign and malignant breast microcalcifications without extensive training. You can't perfectly understand what people speaking another language are saying the first time, even if someone quickly tells you what 200 common words in that language mean beforehand. You can't understand calculus the first time without first understanding at least some basic math. You can't recognize a new, unique 7-dimensional object once it's rotated slightly, nor can you tell it apart from 20 other very similar-looking 7-dimensional objects once they've all been rotated randomly. Humans aren't that flexible; we're a type of narrow intelligence with many blind spots, biases, shortsightedness and egocentrism.

A "dumb" theory.

Posted by ThiccStorms@reddit | LocalLLaMA | View on Reddit | 31 comments

askchris@reddit

Yes, and in my experience, if you give a correct answer to a group of dumb LLMs to debate, the answer usually gets "corrected" until their output is no longer correct. I haven't found a very good way around this. Voting sometimes works, but even voting across different dumb models can dumb things down further instead of sharpening the results. What may work is routing. For example: what if each dumb LLM had a known competence/accuracy rating for different types of problems, and each problem were routed to the LLM best suited for it? That should give a slight boost.
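A minimal sketch of the routing idea next to plain majority voting. The model names and accuracy ratings are made-up placeholders; in practice the ratings would have to come from your own per-problem-type evaluations. The toy case reproduces the failure mode described above: two weaker models outvote the one that is actually good at math, while routing sends the question straight to it.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical per-model accuracy on each problem type (from your own evals).
RATINGS: Dict[str, Dict[str, float]] = {
    "model_a": {"math": 0.82, "code": 0.40},
    "model_b": {"math": 0.55, "code": 0.71},
    "model_c": {"math": 0.50, "code": 0.65},
}


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer (ties go to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]


def route(problem_type: str, ratings: Dict[str, Dict[str, float]]) -> str:
    """Pick the model with the highest known accuracy for this problem type."""
    return max(ratings, key=lambda model: ratings[model].get(problem_type, 0.0))


answers = {"model_a": "42", "model_b": "41", "model_c": "41"}  # model_a is right
print(majority_vote(list(answers.values())))  # 41 (outvoted)
print(answers[route("math", RATINGS)])       # 42 (routed to model_a)
```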

Am I the only person who isn't amazed by O1?

Posted by noellarkin@reddit | LocalLLaMA | View on Reddit | 135 comments

askchris@reddit

Great counterpoint, thanks for sharing your story! This helped connect a few dots for me ... Essentially, it sounds like it may be impossible for current LLMs to train a classifier (using a set of manual rules / "NOT LIKE" statements) for thousands of items without losing track of the previous rules, since the complexity would grow exponentially as each added rule interacts with the previous ones. Humans run into similar problems building symbolic AI / GOFAI (Good Old-Fashioned AI) from rules. It's interesting that you were able to do the task successfully by doing it manually, but humans may fail if it were even more complex -- for example, classifying thousands of categories perfectly in compressed data with millions of lines ... At a certain point LLMs and humans both fail and we need to train a neural network classifier. So I suppose if current LLMs were more aware of their own abilities and could assess the complexity of the task, they could route their effort toward writing the Python code needed to train a full neural network to do the classification task for your 17k lines. (Or any number of lines.)
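As a rough illustration of handing the job to a learned classifier instead of a growing pile of hand-written rules, here is a multiclass perceptron (the simplest single-layer neural network) over bag-of-words features. It is a stand-in for the "full neural network" mentioned above, and the toy lines and labels are invented; for 17k real lines you would more likely reach for a library such as scikit-learn or PyTorch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def featurize(line: str) -> List[str]:
    """Bag-of-words features: lowercase whitespace tokens."""
    return line.lower().split()


class Perceptron:
    """Multiclass perceptron: one weight vector per label, learned from mistakes."""

    def __init__(self, labels: Iterable[str]):
        self.labels = sorted(labels)  # sorted so ties break deterministically
        self.w: Dict[str, Dict[str, float]] = {lab: defaultdict(float) for lab in self.labels}

    def score(self, feats: List[str], label: str) -> float:
        return sum(self.w[label][f] for f in feats)

    def predict(self, line: str) -> str:
        feats = featurize(line)
        return max(self.labels, key=lambda lab: self.score(feats, lab))

    def train(self, data: List[Tuple[str, str]], epochs: int = 10) -> None:
        for _ in range(epochs):
            for line, gold in data:
                pred = self.predict(line)
                if pred != gold:
                    # Pull the correct label toward these features, push the wrong one away.
                    for f in featurize(line):
                        self.w[gold][f] += 1.0
                        self.w[pred][f] -= 1.0


data = [
    ("red apple fruit", "fruit"),
    ("yellow banana fruit", "fruit"),
    ("steel hammer tool", "tool"),
    ("wooden saw tool", "tool"),
]
clf = Perceptron({label for _, label in data})
clf.train(data)
print(clf.predict("steel saw"))  # tool
```

The point of the design is that no rule is ever written by hand: adding a category means adding labeled lines, not reconciling a new rule against every old one.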

Tip for Creative Writing Generation: Ask for Clarifying Questions!

Posted by Stepfunction@reddit | LocalLLaMA | View on Reddit | 6 comments

askchris@reddit

I was thinking of rigging this up (contextualized RAG for every chat message), but I'm sure some finesse is required to set it up so it works flawlessly, like an "endless memory". Do you know of any off-the-shelf tools that already do this well? Out of the hundreds of paid ChatGPT alternatives and local setups, surely *some* must do this well by now; I just haven't found one yet.
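Here is a minimal sketch of per-message retrieval as a starting point for an "endless memory": every message is stored, and before each new turn the most similar earlier messages are pulled back into the prompt. To stay self-contained it swaps in bag-of-words cosine similarity where a real setup would use an embedding model and a vector store; `ChatMemory` and the prompt layout are hypothetical, not any particular tool's API.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ChatMemory:
    """Stores every message and recalls the most similar ones for each new turn."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def add(self, message: str) -> None:
        self.messages.append(message)

    def recall(self, query: str, k: int = 2) -> List[str]:
        q = bag_of_words(query)
        scored = sorted(
            ((cosine(q, bag_of_words(m)), m) for m in self.messages),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return [m for score, m in scored[:k] if score > 0]

    def build_prompt(self, user_message: str, k: int = 2) -> str:
        context = "\n".join(f"- {m}" for m in self.recall(user_message, k))
        self.add(user_message)  # today's message becomes tomorrow's memory
        return f"Relevant earlier messages:\n{context}\n\nUser: {user_message}"


memory = ChatMemory()
for msg in ["My dog is named Biscuit", "I live in Lisbon", "The dog loves the beach"]:
    memory.add(msg)
print(memory.recall("What is my dog called?", k=1))  # ['My dog is named Biscuit']
```

The finesse you mention mostly lives in the parts this sketch skips: better similarity (embeddings), deduplication, and deciding how much recalled context fits in the window.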

Am I the only person who isn't amazed by O1?

Posted by noellarkin@reddit | LocalLLaMA | View on Reddit | 135 comments

askchris@reddit

I agree, the goalposts keep shifting ... people don't even know what AGI means anymore. With current technology we can already chain together a single "super tool" and call it "AI" or **something** that can ...

- beat every human at Chess, Go, Poker ... (AlphaGo & others)
- beat 98% of humans at math challenges (o1, Gemini, Claude)
- write code faster, more accurately and in more programming languages than 95% of humans (DeepSeek, Qwen, Claude 3.5 Sonnet)
- read (understand) and comment on more than 10 Harry Potter-length books all at once, in seconds (Gemini's 2M context length)
- understand and describe visual images faster and more accurately than most humans (most multimodal LLMs)
- create art that's rated aesthetically better than 95% of human "artists" (Flux, Midjourney)
- create realistic and fantasy video clips from its "mind" (Minimax, Kling, Runway, Sora)
- translate over 100 languages in seconds
- empathize better (and use emotional intelligence better) than 90%+ of humans
- combine ideas from millions of different topics that no single human knows in depth
- reply to chat threads and emails coherently
- write novels, storyboards and poems better than most humans (Llama, Claude, ChatGPT)
- listen to and understand multiple languages and accents via audio
- communicate in hundreds of voices that sound nearly indistinguishable from a human voice, complete with emotional inflections and nuances (OpenAI, ElevenLabs, etc.)
- sing and create music that sounds better than 90% of musicians, in seconds (Suno, Udio)
- research online automatically (Perplexity, agent workflows, etc.)
- reason better than many humans (o1, QwQ, Claude)

If we combined these into one "entity" and took it back 10 years to 2014 ... what would people *really think*? I'm just curious why nobody seems to have noticed that this "entity" is already a step beyond what any individual human can do. And it can run at superhuman speeds, at very low cost ...
And these are (or were) economically viable tasks, at least until very recently. So if "AGI" is considered "human level", then what do we call what *we can already do* if we combine our current tools in 2024?

Using AI Generated Code Will Make You a Bad Programmer

Posted by Comfortable-Site8626@reddit | programming | View on Reddit | 439 comments

askchris@reddit

Why can't people learn to code in any order they want, for whatever reason they want? I mainly learned from playing with code, and I feel like AI makes it easier to play around.

Gemini Exp 1114 now ranks joint #1 overall on Chatbot Arena (that name though....)

Posted by lightdreamscape@reddit | LocalLLaMA | View on Reddit | 121 comments

askchris@reddit

I doubt it, but Google *does own around 10% of Anthropic*, and therefore owns 10% of Claude 3.5 Opus ... It would be weird if Anthropic didn't like the results of their 3.5 Opus training run and sold the model to Google to recover their costs (unlikely ...). I think it's more likely the model is hallucinating due to training biases. But who knows, "Exp" is a weird name ... not a normal Gemini model name.

How to build AGI part 2: I think i figured out how to fix Reinforcement Learning

Posted by Radlib123@reddit | LocalLLaMA | View on Reddit | 169 comments

askchris@reddit

You may not be convinced without a mathematical proof, but learning algorithms *can* be made without knowing math. For example, your brain's functionality was invented by evolution and proven by natural selection **before math was invented**. But I think I get you: theories are cheap, and the ML community wants a mathematical proof before moving forward on a new idea. I think we all want to see proof, but not everyone cares about the math if the strategy simply **works**! Personally, I just want to see the code, run the code and verify it does the job while sipping coffee.

Waves are all you need

Posted by ethereel1@reddit | LocalLLaMA | View on Reddit | 94 comments

askchris@reddit

Great insight, so from physics? I'm working on compressing information further as a way to maximize intelligence on consumer hardware (ideally AGI on a laptop) and find this to be an inspiring direction so thank you for your work! 🙌

Waves are all you need

Posted by ethereel1@reddit | LocalLLaMA | View on Reddit | 94 comments

askchris@reddit

Awesome work! It's cool that your "results closely match the paper's claims! The Wave Network achieved 91.64% accuracy compared to BERT's 94.53%, which is remarkably close considering it uses only 2.4M parameters versus BERT's 100M parameters." How did you put this together so fast? Are you open to any collaborations?
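Putting the quoted numbers side by side (simple arithmetic on the figures above, nothing more):

$$
\frac{91.64}{94.53} \approx 0.969, \qquad 94.53 - 91.64 = 2.89, \qquad \frac{2.4\,\text{M}}{100\,\text{M}} = 0.024
$$

So the Wave Network keeps roughly 97% of BERT's accuracy, a 2.89-point gap, with about 2.4% of the parameters (roughly 42× fewer).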

10 years from now, we will realize we could have created AGI on a laptop 10 years earlier, here's why.

Posted by askchris@reddit | LocalLLaMA | View on Reddit | 46 comments

askchris@reddit (OP)

You're right -- you and I are talking about different levels of the same system. You're talking about neurons (more like the hardware layer), and I'm talking about how neurons use prediction errors, along with higher-level reinforcement signals (i.e., "making sense"), learning by experience, etc., to build processes in the software layer. For example, I see things like "working memory" as something more like flexible software running in RAM than hard-coded processes in neurons that take days or weeks to physically rewire.

Of course, no single neuron is responsible for everything the brain does; studying one would be like looking at single transistors to figure out how to build something like Linux. In the same way, we can't build AGI by thinking about how neurons fire alone -- it won't magically "emerge" if we figure out how to simulate biological neurons. The brain has even more layers than a computer, and we don't even need most of these layers, since we're not managing glial cells, regulating hormones or moving actual muscles. This is why there's a lot of room for optimization. We don't need 100 billion biological neurons to do the same thing as "general intelligence" (which is only a fraction of what the brain does anyway).

The brain is amazing but also flawed because of natural selection: it can't go back and engineer a system that works on a different computing paradigm; it can only build bottom-up, repurposing what worked in primitive organisms, layer by layer, creating a lot of waste along the way -- even to the point that infants must learn how to see, how to use attention and how to access their working memory every single time. It takes 18 years to build an adult brain, instead of infants getting a better framework from the beginning. The entire system is a mess, and yet it runs a type of "intelligence" on just 20 watts -- and most of those watts aren't even used by intelligence.
This is why I believe it's possible to make a lot more progress than we are, and eventually build AGI on much simpler hardware (like a laptop).

10 years from now, we will realize we could have created AGI on a laptop 10 years earlier, here's why.

Posted by askchris@reddit | LocalLLaMA | View on Reddit | 46 comments

askchris@reddit (OP)

No degree, just a hobbyist. I've been building technology and tech teams for 30 years and have been deep into AI and neuroscience for the last 10. You don't have to believe me, as I haven't published anything, but I've had a string of breakthroughs recently. So I'm looking to cautiously find the right people and build a team to develop AGI on a laptop.

10 years from now, we will realize we could have created AGI on a laptop 10 years earlier, here's why.

Posted by askchris@reddit | LocalLLaMA | View on Reddit | 46 comments

askchris@reddit (OP)

Yes, LLMs are in the right universe -- I love LLMs, but I'm just realizing something is wrong. Brains don't do backprop, and they definitely don't update all the weights every time we make a prediction error (nor do we collect a bunch of prediction errors and batch them for backprop). That would be too inefficient and dangerous. The brain actually filters most things out based on emotional utility and tries to make sense of the information before storing it, so it doesn't lead to catastrophic forgetting. This means it uses a more efficient and structured update process than LLMs do.
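To make the contrast concrete, here is a toy comparison between a dense, backprop-style step that moves every weight on every error and a gated, sparse step that ignores low-salience errors and touches only the few weights with the largest gradients. This is not a model of the brain or of the author's method: `salience` is an assumed scalar standing in for "emotional utility", and `threshold` and `top_k` are arbitrary illustrative knobs.

```python
from typing import List


def dense_update(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """Backprop-style step: every weight moves on every prediction error."""
    return [w - lr * g for w, g in zip(weights, grads)]


def gated_sparse_update(
    weights: List[float],
    grads: List[float],
    lr: float,
    salience: float,
    threshold: float,
    top_k: int,
) -> List[float]:
    """Hypothetical filtered step.

    Skip the update entirely unless the error is salient enough, and
    otherwise change only the top_k weights with the largest |gradient|.
    """
    if salience < threshold:
        return list(weights)  # filtered out: nothing is stored
    ranked = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    chosen = set(ranked[:top_k])
    return [w - lr * g if i in chosen else w for i, (w, g) in enumerate(zip(weights, grads))]


w, g = [1.0, 1.0, 1.0], [0.5, -2.0, 0.1]
print(dense_update(w, g, lr=0.1))                                            # all three weights change
print(gated_sparse_update(w, g, 0.1, salience=0.2, threshold=0.5, top_k=1))  # [1.0, 1.0, 1.0]
print(gated_sparse_update(w, g, 0.1, salience=0.9, threshold=0.5, top_k=1))  # [1.0, 1.2, 1.0]
```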

10 years from now, we will realize we could have created AGI on a laptop 10 years earlier, here's why.

Posted by askchris@reddit | LocalLLaMA | View on Reddit | 46 comments

askchris@reddit (OP)

Well, I have made some progress on my AGI project, but I just realized I can't finish building AGI on a laptop alone -- so I'm reaching out for the "loving and enthusiastic support" of my fellow redditors -- naively thinking there would be others out there like me -- before getting laughed off stage and going back to doing it alone, I guess ... lol.

10 years from now, we will realize we could have created AGI on a laptop 10 years earlier, here's why.

Posted by askchris@reddit | LocalLLaMA | View on Reddit | 46 comments

askchris@reddit (OP)

Thanks for this insight, u/abhuva79! So a fully integrated "intelligence" is "cheaper" to run, but running on a software layer makes it more transferable? I never thought of it that way.

Faster, smaller, smarter 1B models for RAG?

Posted by askchris@reddit | LocalLLaMA | View on Reddit | 4 comments

"False, but possible" instruction: can probabilistic prompts improve LLM results?

Posted by custodiam99@reddit | LocalLLaMA | View on Reddit | 3 comments

askchris@reddit

Interesting idea. I tried your prompt and Claude 3.5 Sonnet said there's 1 R in strawberry, but I could see something like this working. Output:

<false but possible>
1. There are 3 R's in strawberry: stRawbeRRy
2. Strawberry contains no R's, as it's spelled "stawbery"
3. There are 2 R's in strawberry, but they're silent: stRawbeRRy
</false but possible>

Analyzing the question and outlining my approach:
1. Visualize the correct spelling of "strawberry"
2. Count the occurrences of the letter 'R' in the word
3. Verify the count
4. Provide the final answer

Step 1: Visualize the correct spelling
Title: Spelling Visualization
Content: The correct spelling of the word is "strawberry"

Step 2: Count the occurrences of 'R'
Title: Letter Count
Content: Counting the occurrences of 'R' in "strawberry", I find one 'R'

Step 3: Verify the count
Title: Verification
Content: Double-checking the spelling and count to ensure accuracy

<reflection>
Reviewing my reasoning:
- I've correctly spelled the word "strawberry"
- I've counted the occurrences of 'R' carefully
- I've verified my count
I don't see any potential errors or oversights in my process. My conclusion remains the same after this reflection.
</reflection>

<output>
There is 1 R in strawberry.
</output>
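For reference, the count the model got wrong is trivial to verify deterministically, for example in Python:

```python
word = "strawberry"
r_positions = [i for i, ch in enumerate(word) if ch == "r"]
print(len(r_positions), r_positions)  # 3 [2, 7, 8]
```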

Llama 405B running locally!

Posted by ifioravanti@reddit | LocalLLaMA | View on Reddit | 63 comments

Deepsilicon runs neural nets with 5x less RAM and ~20x faster. They are building SW and custom silicon for it

Posted by hamada0001@reddit | LocalLLaMA | View on Reddit | 44 comments

askchris@reddit

That's an interesting observation. I know some quantization techniques are biased toward maintaining English performance as they compress the weights. That said, BitNet is not post-training quantization; it's a different training paradigm. It seems to act more like a minimal attention routing system rather than relying on fuzzy (heavy floating-point) math and matrix multiplication.
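To illustrate why ternary weights sidestep most of the heavy floating-point math: the BitNet b1.58 paper describes an "absmean" quantizer that scales a weight matrix by its mean absolute value and rounds each entry to -1, 0 or +1, and this is applied during training rather than after it. Once weights are ternary, a matrix-vector product needs only additions and subtractions. The sketch follows that description as I understand it; it omits BitNet's 8-bit activation quantization and the training loop, and the function names are my own.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def absmean_ternary(w: Matrix, eps: float = 1e-5) -> Tuple[List[List[int]], float]:
    """Scale by the mean |w| and round-and-clip every entry to {-1, 0, +1}."""
    flat = [abs(v) for row in w for v in row]
    gamma = sum(flat) / len(flat)
    q = [[max(-1, min(1, round(v / (gamma + eps)))) for v in row] for row in w]
    return q, gamma


def ternary_matvec(q: List[List[int]], x: List[float]) -> List[float]:
    """Matrix-vector product with ternary weights: no multiplications needed."""
    out = []
    for row in q:
        acc = 0.0
        for wq, xi in zip(row, x):
            if wq == 1:
                acc += xi  # +1 weight: add the input
            elif wq == -1:
                acc -= xi  # -1 weight: subtract it
            # 0 weight: skip the input entirely
        out.append(acc)
    return out


w = [[0.5, -0.5, 0.0], [1.0, 0.1, -1.0]]
q, gamma = absmean_ternary(w)
print(q)                                 # [[1, -1, 0], [1, 0, -1]]
print(ternary_matvec(q, [2.0, 3.0, 4.0]))  # [-1.0, -2.0]
```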

Microsoft's Self-play muTuAl Reasoning (rStar) code is available on Github!

Posted by m18coppola@reddit | LocalLLaMA | View on Reddit | 17 comments

Reflection 70B: Hype?

Posted by Confident-Honeydew66@reddit | LocalLLaMA | View on Reddit | 186 comments

askchris@reddit

I agree, it seems like the extra thinking and reflecting steps distract it too much, which lowers its effectiveness in many practical use cases. One of the advantages humans have when working in teams is that each person can focus on their part without the distractions of the other parts. But without good communication (i.e., passing context) between the humans (or agents), it all breaks down.

Where did Arx-0.3 come from and who makes it?

Posted by Balance-@reddit | LocalLLaMA | View on Reddit | 41 comments

askchris@reddit

This is the MMLU-Pro benchmark, a well-rounded benchmark that Qwen2 excels in, not a coding challenge, which DeepSeek V2 is fine-tuned to excel in.

Magpie: prompt free synthetic data generation

Posted by dvanstrien@reddit | LocalLLaMA | View on Reddit | 5 comments

askchris@reddit

I had a similar reaction to the above; maybe I missed something? It seems harmful rather than helpful to LLM development. Isn't it suggesting we get lazy and produce extremely poor-quality data from extremely small models for the most important part of LLM development -- the quality of the data?

Miqu 120B self-merge (like Venus/MegaDolphin): wolfram/miqu-1-120b

Posted by WolframRavenwolf@reddit | LocalLLaMA | View on Reddit | 6 comments

askchris@reddit

Yeah, we could take advantage of the redundancies. I wonder what would happen if a model were trained so that after each layer finishes processing, rather than always going forward, it could jump to a different layer (backward or forward any number of layers) depending on context, with some kind of dynamic limit on the number of jumps. That would be both more efficient than a frankenmerge (no need for more parameters) and more adaptive (since it would choose intelligently based on what works for that context, rather than going to the next layer of an arbitrarily arranged frankenmerge). It could also be faster if some contexts are "simpler" and require fewer layers. The architecture is more computationally expensive during training, but simple: rather than backpropagating from the top to the previous layer, backpropagate to all previous layers simultaneously and compute the loss for each. Whichever layer has the lowest loss is boosted in the context router preceding that layer (increasing its weight so the router is more likely to choose that layer in the forward pass next time). And so on.
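Here is a toy sketch of the forward pass only, under heavy assumptions: layers are plain functions that preserve shape, and the "context router" is a hand-written function that returns the next layer index (in any direction) or `None` to stop early, with `max_steps` as a hard cap on jumps. In the proposed architecture the router would be learned (for example with the lowest-loss boosting described above); that training step isn't shown, and all names here are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]
Router = Callable[[Vector, int], Optional[int]]  # (hidden state, current layer) -> next layer or None


def routed_forward(x: Vector, layers: List[Layer], router: Router, max_steps: int) -> Tuple[Vector, List[int]]:
    """Start at layer 0, then let the router pick any next layer (or stop)."""
    idx: Optional[int] = 0
    trace: List[int] = []
    for _ in range(max_steps):  # hard limit on the number of layer applications
        x = layers[idx](x)
        trace.append(idx)
        idx = router(x, idx)
        if idx is None:  # router decided this context is done
            break
    return x, trace


layers: List[Layer] = [
    lambda v: [a + 1 for a in v],  # layer 0
    lambda v: [2 * a for a in v],  # layer 1
]


def toy_router(h: Vector, current: int) -> Optional[int]:
    # Keep alternating between the two layers until the state is "done".
    return None if h[0] >= 10 else 1 - current


print(routed_forward([1.0], layers, toy_router, max_steps=8))  # ([10.0], [0, 1, 0, 1])
print(routed_forward([9.0], layers, toy_router, max_steps=8))  # ([10.0], [0])
```

The second call shows the "simpler context" case: it exits after a single layer instead of running the whole stack.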

Miqu 120B self-merge (like Venus/MegaDolphin): wolfram/miqu-1-120b

Posted by WolframRavenwolf@reddit | LocalLLaMA | View on Reddit | 6 comments

askchris@reddit

Probably. I bet each layer is more modular and independent than we realize. For example, I remember someone here on LocalLLaMA mentioned scrambling the layer order of a model and noticing that it still worked practically the same. This suggests that layers are more modular, and also inefficient. By inefficient I mean layers can't choose their order, i.e., "Let me go first!" Instead, they rely on the attention and processing done by previous layers. They can't go back. So to guess the next token correctly, the model has to relearn things during backpropagation that may already exist in other layers but are inaccessible due to the way backpropagation slides over the attention heads. Information is missed in some layers when they're focusing on something else, so backpropagation can't decide which path to take (it can't change the attention heads, which are focused on specific information for specific contexts from previous layers), and so it has to train the redundant information into another layer. Some layers have the ability from previous training runs, but it's used up on a different task, leading to redundancies. This is super inefficient. Whoever can crack this will unlock major advancements in LLM technology. For now, this is probably why these frankenmerges do anything at all. Prove me wrong (if I'm wrong!).

AI Business coming out of Thin air and LLMs

Posted by mayodoctur@reddit | LocalLLaMA | View on Reddit | 57 comments

askchris@reddit

What's stopping the response from going off course? Businesses are building agents designed to stay focused on their specific business workflows, trained on their products, their knowledge, their way of serving customers, etc., whether through fine-tunes, RAG, safety checks, business logic, etc. I see the business use case for LLMs more as a connector between customers and knowledge, similar to RAG, or between customers and functions via function calling, similar to how programmers use APIs. Basically, we're using LLMs as a portal that lets an untrained customer talk to our data/services.
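A minimal sketch of the "portal" pattern: the model only proposes a function call as JSON, and ordinary business logic decides whether to run it, which is one way to keep responses from going off course. The `{"name": ..., "arguments": ...}` shape, `get_order_status` and the fallback message are hypothetical placeholders, loosely modeled on common function-calling formats rather than any specific vendor's API.

```python
import json
from typing import Any, Callable, Dict

FALLBACK = "Sorry, I can only help with questions about your orders."


def get_order_status(order_id: str) -> str:
    # Hypothetical business function; a real one would query your order system.
    return f"Order {order_id} has shipped."


# Allow-list: the model can only reach functions registered here.
ALLOWED_TOOLS: Dict[str, Callable[..., Any]] = {"get_order_status": get_order_status}


def dispatch(model_output: str) -> Any:
    """Validate a JSON tool call proposed by the model and run it only if allowed."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return FALLBACK  # the model went off course: not a tool call at all
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return FALLBACK
    tool = ALLOWED_TOOLS.get(call["name"])
    args = call.get("arguments", {})
    if tool is None or not isinstance(args, dict):
        return FALLBACK  # unknown function or malformed arguments
    try:
        return tool(**args)
    except TypeError:  # wrong or missing argument names
        return FALLBACK


print(dispatch('{"name": "get_order_status", "arguments": {"order_id": "A17"}}'))
print(dispatch('{"name": "delete_all_orders", "arguments": {}}'))
```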

AI Business coming out of Thin air and LLMs

Posted by mayodoctur@reddit | LocalLLaMA | View on Reddit | 57 comments

askchris@reddit

Good moats don't fall apart with every new LLM/AI announcement. For example, many companies are specialized in niche markets with strong customer bases that couldn't care less about the latest AI tools.

Keep an eye out for Deluxe-chat-v1.3 (internal LMsys 'experiment')

Posted by Unfair_Ad6560@reddit | LocalLLaMA | View on Reddit | 5 comments

askchris@reddit

Interesting. It's possible that LMSYS has enough human voting data to train a model that predicts which LLM is likely to produce the best response, switching to the better model in real time (via API routing) and crushing any individual LLM competitor: Opus for some cases, GPT-4 for others, etc. And here I was thinking their service was free and harmless! 🤣
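A rough sketch of how pairwise human votes could become a per-category router. Chatbot Arena fits Elo/Bradley-Terry-style ratings rather than raw win rates, so plain win rates are a simplification here, and the votes and categories below are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (category, model_x, model_y, winner) -- hypothetical human votes
Vote = Tuple[str, str, str, str]


def win_rates(votes: List[Vote]) -> Dict[str, Dict[str, float]]:
    """Fraction of head-to-head comparisons each model won, per category."""
    wins: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    games: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for category, x, y, winner in votes:
        games[category][x] += 1
        games[category][y] += 1
        wins[category][winner] += 1
    return {
        cat: {model: wins[cat][model] / n for model, n in played.items()}
        for cat, played in games.items()
    }


def pick_model(category: str, rates: Dict[str, Dict[str, float]]) -> str:
    """Route a request to the model with the best win rate in its category."""
    return max(rates[category], key=rates[category].get)


votes: List[Vote] = [
    ("code", "gpt-4", "opus", "gpt-4"),
    ("code", "gpt-4", "opus", "gpt-4"),
    ("code", "opus", "gpt-4", "opus"),
    ("writing", "gpt-4", "opus", "opus"),
    ("writing", "opus", "gpt-4", "opus"),
]
rates = win_rates(votes)
print(pick_model("code", rates), pick_model("writing", rates))  # gpt-4 opus
```

A learned version would predict the winner from the prompt itself instead of a hand-assigned category, but the routing step at the end stays the same.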