The LLM is an engine
Posted by localremote762@reddit | LocalLLaMA | 38 comments
I can’t help but feel like the LLMs — Ollama, DeepSeek, OpenAI, Claude — are all engines sitting on a stand. Yes, we see the raw power an engine puts out while it sits on the stand, but we can’t quite conceptually figure out the “body” of the automobile. The car changed the world, but not without the engine coming first.
I’ve been exploring MCP, RAG, and other context servers, and from what I can see, they all suck. ChatGPT’s memory does the best job, but when I’m programming — remembering that I always use a set of includes, or a specific theme — they all do a terrible job.
Please anyone correct me if I’m wrong, but it feels like we have all this raw power just waiting to be unleashed, and I can only tap into the raw power when I’m in an isolated context window, not on the open road.
Iory1998@reddit
u/localremote762 The root of the problem you are discussing is simple to understand: AI development requires tremendous resources and highly qualified personnel. Only a few companies in the world can actually create and develop SOTA LLMs from scratch. And, these companies are in a fierce competition, so fierce they can't allocate resources to build on top of the LLM (i.e. the engine) they create or they would fall behind.
This is especially true as the AI race has become a political issue. They are relying on the OS community to find use cases for the models they create. Well, that's true until they hit the jackpot: a super intelligent model that can do the research they need.
As you correctly pointed out, current software layers built on top of the raw LLMs are either unreliable or very limited. This reminds me of the Bitcoin and Ethereum periods, when everyone rushed to mine as much as possible and very few thought about actually building a strong infrastructure to promote their daily use.
For me, the best analogy for the state of AI today is a CPU without a motherboard, RAM, GPU, I/O, or a long-term storage medium.
Everlier@reddit
An LLM is a picture of intelligence
AdOne8437@reddit
https://en.wikipedia.org/wiki/The_Treachery_of_Images
Everlier@reddit
I think you'll find it amusing that the same reference was brought up before: https://www.reddit.com/r/LocalLLaMA/s/uDghopoNhB
AdOne8437@reddit
:)
__Maximum__@reddit
A car's engine does exactly what it was built for. You understand every part of it, and you know how to fix it if something breaks or needs enhancement.
This new engine is a faulty black box (hallucinations, quadratic cost, etc), and people are trying to fix it. It seems like fixing transformers is very hard, so a paradigm shift is required, which is expected to happen within a few years, considering the amount of resources invested in this field.
Of course, you can build systems accounting for the faulty parts. AlphaEvolve is the best use of this faulty engine I have seen yet. Even if no paradigm shift occurs within the next couple of years, we will see great returns from such systems.
nomorebuttsplz@reddit
What is the giveaway for you that transformers won’t be improved? To me it seems strange to say this when there is a new SOTA model released every 6 weeks or so.
__Maximum__@reddit
I didn't say they aren't improving; I said it seems very hard to fix them, especially hallucinations, weak context, quadratic scaling, weak generalisation... There are advancements in those areas, but none is solved yet without caveats.
The best models today are still unreliable and require huge amounts of memory and compute. Given that years of work and huge resources poured into them have produced no fundamental change, it seems like we need a new paradigm.
nomorebuttsplz@reddit
Can you narrowly enough define "fundamental change" now, so that in six months or a year we can look back and test your hypothesis?
westsunset@reddit
True, but how long has it been available? This is incredible technology developing at breakneck speed, and people want the industry to behave like products that have been in development for decades. ChatGPT was released to the public two years ago!
localremote762@reddit (OP)
Not discounting its value or speed in any way; I'm just staring at an engine trying to imagine the door handle that opens the door, so I can sit in the car with a finely tuned gas pedal that must be throttled just so, moment to moment, to suit each situation. Oh, and I forgot about the brakes: the brake drums, the brake fluid. Not to mention headlights, brake lights, and interior lights for other situational problems.
westsunset@reddit
Yes, I get your point. There is an impatience I see in the general public that is just ridiculous. I think you correctly view the potential along with how much more work we have left to do. I did like your analogy.
localremote762@reddit (OP)
Very well articulated — thank you!
lqstuart@reddit
The "body" is ChatGPT. That's it. That's the product. Someone already built it.
localremote762@reddit (OP)
There will never be only one winner with this.
tezdhar-mk@reddit
I guess give it a couple of years. The rush to ship anything AI is leading to a lot of immature products.
localremote762@reddit (OP)
My thinking exactly. It’s too bleeding edge, but someone will figure out the rest of the car and we’ll all slap our forehead Homer Simpson style.
Ok_Appearance3584@reddit
Absolutely! My thoughts exactly — you are 100% hitting the nail on the head!
I have used the "raw" chat-based LLMs; they are impressive, and smarter than me for sure, within a limited context. The key to good results is to explain the context, which is really hard, to be honest.
MCP, tool calling, RAG — they are... primitive. Impressive, yes, but so primitive they're nearly useless.
My hypothesis is the same as yours: LLMs are already waaaaaay smarter than we think. They are engines on a stand, like you said, and nobody has figured out the mechanics of how to connect one to the wheels, gas pedal, steering, etc. They are trying to get the engine itself to twist the wheels instead of having mechanics and gears do the work.
A simple example: context memory. Take humans — my context window (working memory) is really small. If I'm multitasking, f.ex. household chores, I sometimes switch tasks to do something else and completely forget about the thing I left half-done until my wife (an outsider) reminds me.
What you need is an operating system for LLMs. Instead of a limited chat system, you'd have the incoming message represent the state of the OS. For example, you could have a widget-based text OS:

    <os time="2025-06-03T13:21:58">
      <goal>
        Investigate latest AI papers
        ...
      </goal>
      ... imagine many goals set by you and the LLM
      <task>
        Read and train on the AlphaEvolve paper
      </task>
      ... imagine many tasks created by you and the LLM based on the goals, or just individual, one-off tasks
      ...
      ...
      ...
    </os>
Hmm, let's see, I have two goals in mind. The second goal is collapsed and I cannot view it. Perhaps I have collapsed it because it's not a priority right now. The other remaining goal, with an open task, is about investigating the AlphaEvolve paper and training on it. Let's see, I remember I have a widget with which I can download the latest AI research papers. I also have a widget that summarizes and converts large texts to training data, and a widget to update my neural network with whatever training data I want. Given that I don't see anything else, I should probably finish this task now.
The idea here is to create a real-time OS for LLMs. It would be text based, XML-like as shown in the dummy example. Every token the model outputs is actually fed into the OS, which then updates the state. For example, basic tokens would be fed into thought widgets. So the LLM does not have its own "chat box" where it writes and posts an answer; instead, its output appears "on the screen". You can then add special tokens so the LLM can select, expand, collapse, and otherwise interact with widgets using single tokens.
It requires some training for sure, and you should understand that the example I gave is very simplistic. You could even have a type of widget that contains an image's base64 bytes. As long as the whole thing fits into 128k or 32k tokens, the LLM can basically operate in real time. You can add a message widget where you can post messages; the LLM can respond and take action based on your input. For example, you can ask it to create a new task.
The idea is that the text-based OS could be created in any programming language, like Python, by almost anyone, and you could create any number of widgets. It would provide "guardrails" and a "systematic way of doing things", like an LLM playing a single-player game with a strong narrative.
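The core loop described above — render widget state as XML-like text, let the model act on widgets with single tokens, re-render — can be sketched in plain Python. This is a minimal sketch of the idea, not an existing library: `Widget`, `TextOS`, the tag names, and the action verbs are all invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Widget:
    """One panel of the hypothetical text OS (goal, task, message, ...)."""
    wid: str
    kind: str
    body: str
    collapsed: bool = False
    children: list = field(default_factory=list)

    def render(self, indent=0):
        pad = "  " * indent
        if self.collapsed:
            # Collapsed widgets render as a single self-closing tag,
            # hiding their contents from the model's context window.
            return f'{pad}<{self.kind} id="{self.wid}" collapsed="true"/>'
        lines = [f'{pad}<{self.kind} id="{self.wid}">', f"{pad}  {self.body}"]
        lines += [c.render(indent + 1) for c in self.children]
        lines.append(f"{pad}</{self.kind}>")
        return "\n".join(lines)

class TextOS:
    """Holds widget state; after every model action the full state is
    re-rendered and becomes the model's next input."""
    def __init__(self):
        self.widgets, self.roots = {}, []

    def add(self, w, parent=None):
        self.widgets[w.wid] = w
        if parent is None:
            self.roots.append(w)
        else:
            self.widgets[parent].children.append(w)

    def act(self, action, wid):
        # Single-token actions the LLM could emit, e.g. <collapse:g2>.
        w = self.widgets[wid]
        if action == "collapse":
            w.collapsed = True
        elif action == "expand":
            w.collapsed = False

    def render(self):
        stamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
        body = "\n".join(w.render(1) for w in self.roots)
        return f'<os time="{stamp}">\n{body}\n</os>'

# Rebuild the dummy example from the comment above.
osys = TextOS()
osys.add(Widget("g1", "goal", "Investigate latest AI papers"))
osys.add(Widget("t1", "task", "Read and train on the AlphaEvolve paper"), parent="g1")
osys.add(Widget("g2", "goal", "(a second, lower-priority goal)"))
osys.act("collapse", "g2")   # the model decides g2 is not a priority
state = osys.render()
```

The interesting design choice is that the OS, not the model, owns all state: the model only ever sees a fresh render and emits tiny actions, which is what keeps the whole thing inside a fixed token budget.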
Also: a widget for raw conversation logs/files, some vector database that includes the summarized/compressed info with references to the raw data, some kind of general "post-it notes" widget, a Python console for quick calculations, etc.
Thinking long-term, neural updates are the key here to make it really understand and evolve on its own. LoRA updates whenever something new is to be learned. The system must direct the LLM to update its weights on a schedule (like every night) or even during operation (if single pass update takes only a handful of seconds).
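The scheduling half of this can be sketched separately from any particular fine-tuning library. Everything here is hypothetical: `update_fn` stands in for whatever LoRA update routine you'd plug in, and the class just shows the buffer-then-flush-on-schedule shape described above.

```python
from datetime import datetime

class UpdateScheduler:
    """Collects training examples during operation and hands them to a
    (hypothetical) weight-update routine on a nightly schedule."""
    def __init__(self, update_fn, flush_hour=2):
        self.update_fn = update_fn    # e.g. a LoRA fine-tuning step
        self.flush_hour = flush_hour  # run the update at 2 a.m. here
        self.buffer = []

    def record(self, example):
        # Called whenever the system decides something new should be learned.
        self.buffer.append(example)

    def maybe_flush(self, now=None):
        # Called periodically from the OS loop; flushes only on schedule.
        now = now or datetime.now()
        if now.hour == self.flush_hour and self.buffer:
            batch, self.buffer = self.buffer, []
            self.update_fn(batch)
            return True
        return False

# Usage with a stand-in update function that just records what it was given.
applied = []
sched = UpdateScheduler(update_fn=applied.extend, flush_hour=2)
sched.record({"prompt": "summarize AlphaEvolve", "target": "..."})
flushed = sched.maybe_flush(now=datetime(2025, 6, 3, 2, 0))
```

If a single update pass really did take only seconds, the same `maybe_flush` hook could be called mid-operation instead of nightly — only the schedule check changes.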
I think if this idea were implemented (and I will implement it later this year, probably as an open-source Python library), you'd start to move towards LLMs being able to really operate in the world. And again, I suspect they are already way smarter than we think. I can't solve many math problems in my head, but given a piece of paper and a calculator, it's a different story.
localremote762@reddit (OP)
You’re a genius brother.
Megalion75@reddit
Great ideas.
SmChocolateBunnies@reddit
They aren't an engine; they're a transmission. The engine doesn't exist yet.
localremote762@reddit (OP)
OK, then I feel like I'm staring at electricity: knowing it's hugely valuable, but unable to see the motor that will eventually power the Industrial Revolution.
No_Afternoon_4260@reddit
I think what you need to learn about is AI Agents vs. Agentic AI.
localremote762@reddit (OP)
No doubt I have a lot to learn.
IrisColt@reddit
Just having the most learned, knowledgeable mathematician, a centaur hybrid of Euler, Gauss... at your fingertips isn’t enough?
localremote762@reddit (OP)
No question about its value — just not when there are so many other structures, frameworks, services, applications, etc. that need to be set up to make it something useful. And all of those need specific monitoring, logging, and tracking to be of any value beyond one-off requests.
Only_Situation_4713@reddit
You’re doing it wrong. You need to learn how to scaffold your system
Unlikely_Track_5154@reddit
Please say more about that.
Lazy-Pattern-5171@reddit
I think the commenter OP is referring to the fact that the post OP is trying to find magic in LLMs that can fix the gaps in their own understanding, and that's just not how RAG or RLHF or MCP or any of that works. You cannot abstract away the problem itself. The problem must first exist, in your brain, to then be expressed as a concept, and only then be modeled, mapped, retrieved, stored, conceptualized, shared, conversed about, voiced, pictured, drawn, etc. You cannot say that a car sucks because "for some reason it is unable to take me past this river, I mean how dumb can it be?"
Every time you think you've found a flaw in the system, ask yourself whether it fits the "if my grandmother had wheels, she would be a bike" analogy.
What LLMs give you is just a different layer of abstraction for describing your concept. Yes people are getting oddly good results by throwing vague philosophical concepts and mind games at LLMs but fundamentally that’s not what they are.
Unlikely_Track_5154@reddit
I understand.
I wanted to learn about the guy's system for scaffolding code as well.
Lazy-Pattern-5171@reddit
That’s where software engineering and design come in. And also product design.
ThisWillPass@reddit
Aka build the rest of the car?
hidden2u@reddit
In terms of real-world problem solving, my Roomba is 10x as capable as any frontier LLM.
NCG031@reddit
Allow a self-modifying LLM structure and associative long-term memory? It answers (if it cares enough to answer at all) a few times a year with current hardware...
You_Wen_AzzHu@reddit
You either find an existing half-baked solution for what you want and continue working on it, or you figure it out yourself. That is the beauty of open source.
Feztopia@reddit
It depends on what you want. If you want a robot that enslaves you, maybe you are right. If you want a tool that can do some quick Google searches for you and summarize some text, you already have a pretty good race car.
terminoid_@reddit
when hype meets reality
davidtwaring@reddit
This is a great analogy that I resonate with.