Measuring AI intelligence vs Human intelligence
Posted by kyr0x0@reddit | LocalLLaMA | 26 comments
I was recently thinking about measurable intelligence independent of the "reasoning substrate". AI, as in LLMs, are universal function approximators. Humans are not.
Identifying and measuring intelligence in AI versus humans takes different means, I believe. I should have made it clearer what my point actually was.
LLMs show remarkable "reasoning", but there is no true intelligence there, unless we are willing to call this intelligence: near-perfect recall and know-it-all breadth, plus generalization (aka induction), with a total lack of deduction except for the deduction that humans have written down before (which is then generalized on inductively).
This was my main point. If we want to measure intelligence, we need to see what an LLM does when it faces a problem that is totally out of distribution: it has never seen the problem before, has no written-down deduction about it, and has no clue.
Will it generalize well enough?
And what will a human do? Will they generalize well enough in this case?
Hypothesis: Comparing both results would tell us how far away we are from "AGI".
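A minimal sketch of how that comparison could be scored, assuming you already have pass/fail outcomes for the same problems from both a model and human participants. The function names and the sample outcomes are hypothetical placeholders, not measurements; the "gap" is simply the accuracy drop between familiar and never-seen problems.

```python
from statistics import mean

def accuracy(results):
    """Fraction of solved problems; `results` is a list of booleans."""
    return mean(1.0 if solved else 0.0 for solved in results)

def generalization_gap(in_dist, out_dist):
    """Accuracy drop when moving from familiar to never-seen problems."""
    return accuracy(in_dist) - accuracy(out_dist)

# Placeholder outcomes purely for illustration (not real data).
llm_gap = generalization_gap(
    in_dist=[True, True, True, True],
    out_dist=[True, False, False, False],
)
human_gap = generalization_gap(
    in_dist=[True, True, True, False],
    out_dist=[True, True, False, False],
)

print(f"LLM gap:   {llm_gap:.2f}")
print(f"Human gap: {human_gap:.2f}")
```

A smaller gap would mean performance holds up better out of distribution. Whether the "out of distribution" set really is unseen is the hard part, and this sketch does not address it.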
ortegaalfredo@reddit
If you measure an LLM against things we didn't evolve to do (i.e. math, coding, etc.), the LLM will win, but it's an unfair fight.
Measure a model against something we evolved to do: Grab a banana from a tree. Throw a spear. Build a house.
Models are not even close. That's LeCun's thesis.
kyr0x0@reddit (OP)
Have you seen Unitree robots recently?
ortegaalfredo@reddit
Yes, very impressive, but they don't even have hands.
NeedsSomeSnare@reddit
LLMs are a network to make a guess at the next word. There's no intelligence at all there.
LLMs aren't even real AI. We use buzzwords like neural networks, but they don't resemble how an actual brain or nervous system works at all.
With all due respect, you seem to have bought into the marketing hype, and don't know much about how they work.
a_beautiful_rhind@reddit
You're just as bad as the "ai is sentient" people but in the opposite direction.
NeedsSomeSnare@reddit
What?? That doesn't make any sense. My point is that AI isn't sentient and therefore doesn't have any actual intelligence.
a_beautiful_rhind@reddit
There's a lot in between simple next token predictor and "sentient". You can be functionally intelligent within that spectrum.
NeedsSomeSnare@reddit
There is nothing in nature to suggest that is true. You're just playing with the word 'intelligence'.
a_beautiful_rhind@reddit
Microbial intelligence.
NeedsSomeSnare@reddit
That's not the 'intelligence' we're talking about though. Again, it plays with the definition of the word and is used in a different context.
a_beautiful_rhind@reddit
Why not? You just choose this version of intelligence arbitrarily as the cutoff. You can compare them all. AI intelligence, human intelligence, cockroach intelligence. This is why I say it's a spectrum.
They are even starting to consider that insects have "consciousness" on the definition that it's some sort of subjective experience. That's a whole separate argument too.
VoiceApprehensive893@reddit
do yourself a favor and stop parroting stupid shit someone else said because it's "cool"
NeedsSomeSnare@reddit
What?
LetsGoBrandon4256@reddit
My clanker might not be "INtEllIGEnt" but they are definitely less regarded than some of my co-workers.
justicecurcian@reddit
What makes you think your brain has actual intelligence and is not just guessing the next word while you are thinking? Just applying learned patterns until you get a reward (dopamine)?
-dysangel-@reddit
do yourself a favour and learn about what neural nets can do, and avoid the crap that the internet says
kyr0x0@reddit (OP)
I agree :) maybe I should post this in some ML research subreddit instead
temperature_5@reddit
Generalization is just a function of the organization of the latent space and how it is searched. Finding a similar pattern in another domain is kind of a meta-search (similar sequences of relative vector sequences) that hasn't been engineered into training or attention yet. It should be achievable.
Bitter-Bed-3532@reddit
Maybe intelligence isn't the substrate or the architecture, but the ability to compress reality into transferable abstractions and reuse them in unfamiliar contexts.
kyr0x0@reddit (OP)
Pretty sure. This is what I mean by generalization: abstract, match patterns, then apply general rules and specialize through deduction and trial and error until the prediction matches reality more and more. For this, you need to come up with hypotheses, which are basically predictions with unknown error. You try to minimize the error. In deduction, however, you actually have a good generalized understanding of the concepts. LLMs run into the most stupid errors, like failing to count the characters in a word, because the concept of counting itself is unclear to them until you train them on millions of examples of how to count characters in words (so that next-token prediction works well for this task). That helps with counting in closely related use cases as well, but it doesn't mean counting itself is well understood as a generalized concept. A human who has learned counting can apply this logic zero-shot: count trees, count characters, count stars. If you removed their practice in counting stars, they would still be able to do it correctly, without relearning the counting task itself.
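To make "counting as a generalized concept" concrete, here is a small illustrative sketch: one procedure that contains no per-domain logic and therefore works on anything you can step through. The `count` helper is a name made up for this example; it says nothing about how an LLM represents counting internally.

```python
from typing import Iterable

def count(items: Iterable) -> int:
    """Count by stepping through items one at a time; no per-domain logic."""
    total = 0
    for _ in items:
        total += 1
    return total

# The same rule transfers across domains without "retraining":
print(count("strawberry"))                          # characters in a word -> 10
print(count(["oak", "pine", "birch"]))              # trees -> 3
print(count(c for c in "strawberry" if c == "r"))   # occurrences of "r" -> 3
```

The point of the contrast: the rule is learned once and applied everywhere, rather than relearned from examples for each kind of thing being counted.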
ComplexType568@reddit
I think the edge humans have over AI is a bunch of context specialized in a domain. When we ask a fellow developer about something they are seasoned in, they don't need to compare it with billions of other "facts" they know. They can pinpoint stuff much quicker. Though this is just my 2 cents.
kyr0x0@reddit (OP)
I think it's deduction. LLMs simulate it through next-token prediction on the CoT traces they were trained on.
Monkey_1505@reddit
There is a lot more to human intelligence than out-of-distribution zero-shot learning.
kyr0x0@reddit (OP)
Of course; but how do you measure it? Right now we don't even have any reliable way to measure real generalization in LLMs. With 1T+ input tokens there is always a chance that some relevant knowledge has been picked up. What we measure is task performance.
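One common, rough proxy for "has the model seen this?" is checking n-gram overlap between a test item and the training text. Below is a minimal sketch under loud assumptions: whitespace tokenization, word trigrams, and a corpus small enough to hold in memory. The helper names are hypothetical, and a low score does not prove the underlying knowledge is absent, only that the wording is.

```python
def ngrams(text, n):
    """Set of word n-grams from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(benchmark_item, corpus_docs, n=3):
    """Share of the item's n-grams that also occur somewhere in the corpus."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0  # item shorter than n tokens: nothing to compare
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(item_grams & corpus_grams) / len(item_grams)

# Toy corpus and items, for illustration only.
corpus = ["people ask how many r are in a word"]
seen = overlap_ratio("how many r are in strawberry", corpus)
unseen = overlap_ratio("count the moons orbiting this invented planet", corpus)
print(seen, unseen)
```

A high ratio flags likely contamination of the test item, which is why a score on that item mostly reflects task performance rather than generalization.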
ohthetrees@reddit
Out of distribution is tough to define. Too far out of distribution, and it is useless. Who cares about reasoning on something so out there that it doesn't matter to humanity? Does GPT solving the Erdős math problem earlier this week count as "out of distribution"? No human had managed it yet, so it certainly wasn't in the training data.
https://www.theguardian.com/technology/2026/may/21/openai-paul-erdos-maths-problem-breakthrough
kyr0x0@reddit (OP)
Exactly; the problem is that we have everything in the training data. The only way I could imagine is to actually strip a model of certain capabilities and then measure whether it is still able to generalize. Or whether multi-tool calling would allow it to solve the problem anyway, just like a human would. You don't know about something; maybe you have an assumption or a guess about the abstract systems that surround or influence the issue. Now you go ahead and read. You then deduce hypotheses and test them. How good are recent models at this? Testing this solely with a known absence of knowledge in the model might be interesting.