LLM Neuroanatomy III - LLMs seem to think in geometry, not language
Posted by Reddactor@reddit | LocalLLaMA | View on Reddit | 89 comments
Hi Reddit!
Last month I posted the third part of my series of articles on LLM Neuroanatomy just before I left to go on holiday. Unfortunately, it was a bit 'sloppy', as I didn't have time to polish it, so I took the article down and deleted the Reddit post.
Over the weekend, I have revised the article, and added in the results for Gemma-4 31B! I'm also wrapping up the Gemma-4-31B-RYS (the analysis will run overnight), and will release Qwen3.6-35B-RYS this week too.
OK, if you have been following the series, you know how in part II, I said LLMs seem to think in a universal language? That was with a tiny experiment, comparing Chinese to English. This time I went deeper.
TL;DR for those who (I know) won't read the blog:
- I expanded the experiment from 2 languages to 8 (EN, ZH, AR, RU, JA, KO, HI, FR) across 5 different models (Qwen3.5-27B, MiniMax M2.5, GLM-4.7, GPT-OSS-120B and Gemma-4 31B). All five show the same thing. In the middle layers, a sentence about photosynthesis in Hindi is closer to photosynthesis in Japanese than it is to cooking in Hindi. Language identity basically vanishes!
- Then I did the harder test: English descriptions, Python functions (single-letter variables only - no cheating), and LaTeX equations for the same concepts. ½mv², 0.5 * m * v ** 2, and "half the mass times velocity squared" start to converge to the same region in the model's internal space. The universal representation isn't just language-agnostic - it's starting to be modality-agnostic (the results are not quite as strong in these small models; I would love to try this on Opus and ChatGPT-5.4).
- This replicates across dense transformers and MoE architectures from five different orgs. Not a Qwen thing. Not a training artifact, but what seems to be a convergent solution.
- The post connects this to Sapir-Whorf (language shapes thought - nope, not in these models) and Chomsky (universal deep structure - yes, but it's geometry, not grammar). If you're into that kind of nerdy thing, you might like the discussion...
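For anyone who wants to poke at the core comparison themselves: it boils down to cosine similarity between residual-stream vectors. A toy numpy sketch - the vectors below are made-up stand-ins, not real hidden states (in practice you'd extract those from a model forward pass yourself):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: the angle-based 'closeness' used throughout the post."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up stand-ins for mid-layer residual vectors (real ones come from a
# forward pass with hidden states enabled; 3 dims here instead of thousands).
photo_hi = np.array([0.9, 0.1, 0.2])    # "photosynthesis", Hindi
photo_ja = np.array([0.85, 0.15, 0.1])  # "photosynthesis", Japanese
cook_hi  = np.array([0.1, 0.9, 0.3])    # "cooking", Hindi

same_topic_cross_lang = cosine(photo_hi, photo_ja)
same_lang_cross_topic = cosine(photo_hi, cook_hi)
# The claim: in middle layers, shared topic beats shared language.
print(same_topic_cross_lang > same_lang_cross_topic)
```

Swap in real per-layer hidden states and you can trace where in the stack this inequality flips on and off.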
Blog with interactive PCA visualisations you can actually play with: https://dnhkng.github.io/posts/sapir-whorf/
Code and data: https://github.com/dnhkng/RYS
On the RYS front - still talking with TurboDerp about the ExLlamaV3 pointer-based format for zero-VRAM-overhead layer duplication. No ETA but it's happening.
Again, play with the widget! It's really cool, I promise!
lombwolf@reddit
I think this is obvious to those of us who actually know how LLMs work, but there's a lot of people who genuinely think LLMs only use language, so this isn't a useless study. I just wouldn't advertise it as a revelation
Reddactor@reddit (OP)
In the comments are some links to papers that describe the same results. These were all published late 2025.
Yes, I think most people in ML have the same intuition, but real data was lacking.
Party-Special-5177@reddit
How is this novel? This is a basic property of information bottlenecks, well known to the field for literally years.
Further, the original encoder-decoder bottleneck was designed intentionally to do just this, where a phrase in any language would be encoded into a "conceptual meta language" at the bottleneck, then decoded into any other language of choice.
This is literally by intentional design, and OP is acting like it is a new discovery. What am I missing?? Honestly asking.
EDIT: to clarify, transformers, being decoder only, still comprise an implicit bottleneck against the space of their inputs and outputs. It may not be possible to add enough parameters to a transformer to make this untrue. All bottlenecks require compression to shared latent spaces, which appear just as OP describes.
Chance-Device-9033@reddit
I posted on the previous version of the thread, before OP removed his post; it's here: https://www.reddit.com/gallery/1s59rct
As I said then, I'm not really sure what's new here; there is already research in the area, some examples:
https://arxiv.org/abs/2411.04986 https://arxiv.org/html/2411.08745v4 https://aclanthology.org/2025.findings-acl.827.pdf
I also find it fairly obvious. I can see why it would deserve a rigorous write-up even if it's intuitive, but that seems to have been done before by others.
It also kind of irks me that the OP is saying that LLMs represent things using geometry, as if that's supposed to be some new discovery. LLMs are basically a bunch of operations on vectors in various high-dimensional spaces; there's nothing but geometry.
Don't get me wrong, it's cool for people to play with things and write up what they did, but this seems to be presented as research rather than hobbyist exploration, so it invites being held to a higher standard.
Reddactor@reddit (OP)
Yep, I see some of this is already published, also quite recently (late 2025).
Just a few points:
- yep, this is a hobby; I found this totally independently (which I think is kind of cool)
- most of the papers linked use really old models, and small ones too. The finding that the encoding and decoding sizes seem relatively constant, while the 'thinking' block expands to fill the remaining stack, seems novel
- I didn't see any dynamics analysis via PCA over the layers done in the same way
I'll keep messing about and posting stuff, just not "researchy" stuff on Reddit.
Party-Special-5177@reddit
Re: already in the literature, I actually wrote a second edit that basically boiled down to "do your homework first", but I deleted it as I was afraid it might be a touch too heavy-handed. Elsewhere in this thread OP was already complaining about a not-so-warm reception, and I didn't want to completely shoot down the guy lol
My stance (in case he reads this) is that scrutiny improves your work. Here, people in the thread are showing him this has already been done, and better, which saves him time to direct his experiments to more fruitful branches. There is no shortage of those.
Completely nailed it. I read it the same way, and interpreted this as a "literature miss" from a proper researcher vs a guy playing around with his GB200.
Super_Pole_Jitsu@reddit
This was also very explicitly shown in an anthropic paper on interpretability. Idk maybe it's that news recycled?
Gregory-Wolf@reddit
+1 here.
Signor_Garibaldi@reddit
I was looking for this comment. These are basic properties of embeddings; tech such as machine translation was built on this many years ago, when we stopped playing with classical NLP and parsing trees, and now people with no background seem to be rediscovering it
Direct_Turn_1484@reddit
This just in, vector math looks a lot like vector math.
samandiriel@reddit
This is basically the only comment worth making about this. I don't understand why people are agog at this?
Also just in: LLMs don't think. They correlate.
Fine_Owl_3127@reddit
doh, it's linear algebra!
ebolathrowawayy@reddit
Seriously good work, but I am sick and tired of hearing "geometry" in the same sentence as LLMs. It doesn't actually mean ANYTHING. Please define what you actually mean or stop using that word, and yes I read all your stuff.
Reddactor@reddit (OP)
Sorry to push back, but did you read the series and play with the widget? I use PCA to find the placement of 'topics' in the manifold, and you can literally see that they cluster, and move around, in a very low-dimensional subspace of the manifold!
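(For context, the PCA step itself is tiny. A minimal numpy sketch of projecting residual snapshots onto the top-2 principal components - a plain SVD implementation I'm assuming here for illustration, not necessarily what the blog's code does:)

```python
import numpy as np

def pca_2d(X):
    """Project rows of X (n_samples x hidden_dim) onto the top-2 principal components."""
    Xc = X - X.mean(axis=0)                           # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False) # rows of Vt = principal axes
    return Xc @ Vt[:2].T                              # coordinates in the top-2 PC plane

# Toy residual-stream snapshot for 4 sentences at one layer (hidden_dim=5 here;
# real models use thousands of dims, with one snapshot per layer).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
coords = pca_2d(X)
print(coords.shape)  # (4, 2) - one 2D point per sentence, ready to plot
```

Repeat this per layer and you get the trajectories of the topic clusters through the stack, which is what the widget animates.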
Imjustmisunderstood@reddit
This was how LLMs were explained to me years ago. Maybe I'm misunderstanding, but is this supposed to be "new" information?
ebolathrowawayy@reddit
I sure did, but "geometry" is a meaningless word. What you wrote is far more useful and understandable. "Geometry" means nothing and if authors are not careful then people will associate "geometry" with "sacred geometry" which is utter bullshit. I'm a fan of saying what I mean and using the term "geometry" is not only meaningless but shows a lack of giving a fuck and worse, could be lumped in with the sacred geometry bs.
Instead of dramatically overloading the word "geometry" authors can simply write a paragraph. It's not that hard and if "geometry" is getting a lot of views, they're not useful views from serious practitioners. I swear all this LLM shit is devolving into flat earth so fast.
I say this as a published author at venues that are still respected.
draconic_tongue@reddit
ppl already treat llms like magic, it's a bit too late, you should have told the nerds that came up with the usage and decided to make everything around it
TurboRadical@reddit
Please forgive me if you're asking a more complex question than I realize, but, if you're asking what I think you're asking, OP is using geometry to describe the relationships between embedding vectors in the latent space.
ebolathrowawayy@reddit
"geometry" is meaningless. people need to say what they mean and stop letting LLMs come up with cool words to hide behind.
Reddactor@reddit (OP)
It is the right word though. I am describing how LLMs map concepts (not language) to vectors.
It's certainly not meaningless; I would invite you to read:
https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics)
and
https://en.wikipedia.org/wiki/Word2vec
Both discuss geometry in terms of vectors.
ebolathrowawayy@reddit
Then say it like that! I read your work, it's fucking great, but "geometry" is becoming a term that con artists use and it is still utterly fucking meaningless even to people who can understand your work.
I understand what vectors are, I understand word2vec, I understand LLM architectures. "Geometry" means nothing, in-context or out of context. You do you, but I think you're shitting on your own work using terms like that but whatever.
Reddactor@reddit (OP)
I wasn't sure how LLMs think, until I did the experiments written up in the blog post.
I had the feeling they were more like human brains, where we know individual neurons seem to map to concepts (in some cases). What I found really interesting was when I first saw the PCA plots, and how at the start of the transformer stack the LLM residuals were focused on the specific language of the text, and the same at the end of the stack.
From the RYS blog post, only middle-layer duplications improve performance, so I called them the 'thinking layers'. This blog post shows it's exactly in these middle layers that the LLM clusters topics and moves those clusters around.
I'm sure you understand that.
The issue you have is with the terminology. But I would again argue the wording is actually pretty cool: concepts are vectors, and we see the LLM moving these vectors around during 'thinking', i.e. the LLM is literally moving vectors around in a high-dimensional space (obvious for us, but probably not for anyone not into machine learning), so the 'thinking' is literally occurring 'in' a 'geometrical space'. (Thinking in Geometry)
I get your argument on LLM woo, but if you skim my series, it's clearly not written for non-ml nerds.
ebolathrowawayy@reddit
Is a person just a series of neurons firing mapped to a 2D graph? is an LLM geometric?
To be clear, I think there is no soul, humans are mechanistic, just as LLMs. But reducing your work to "geometry" is retarded and you'll get lumped in with frauds. you do you but i'm done, idgaf. I got mine already and no one on the internet gives a fuck about what I think. GL.
nomorebuttsplz@reddit
bruh you aren't good at being the geometry police. Please stop
TurboRadical@reddit
Geometry is the correct word and I just told you what it means here. That you don't understand the meaning is not relevant.
ebolathrowawayy@reddit
from a single sentence that includes "geometry" how in the F can any serious practitioner glean anything useful from that? it's a word that is doing insane heavy-lifting on a series of blog posts that are seriously impressive. continuing to use that stupid word only hurts spread and increases LLM mysticism. but you do you.
TurboRadical@reddit
I don't know how to even respond to this. Geometry isn't a vague or confusing word - people know exactly what it means. There is very little ambiguity.
It's odd that you're even in this sub if the meaning of "geometry" isn't immediately clear to you. Not that I think you shouldn't be here, I just don't understand how you can engage with almost any of the content here.
ebolathrowawayy@reddit
geometry means a shit ton of things depending on the context. and LLMs definitely do not "think in geometry".
i don't understand how you think you're qualified to post here.
saying an LLM "thinks in geometry" is like saying "servers live in the cloud". it adds nothing, it's also stupid and reductive. grow up, read books.
TurboRadical@reddit
Let's be precise, why don't we?
We can both agree that an LLM "thinking" is just computation. We can further agree that those computations are geometric operations (like matmul). Finally, we can agree that those operations are performed on vectors (geometric objects), and that meaning is encoded through the geometric properties of those vectors.
Remind me again which part of "thinking in geometry" you didn't understand?
I don't see what you're going for here. Some servers are in the cloud, some aren't. If you're specifying that you use a cloud provider, that objectively is additional information.
I suspect that you have a tendency to try to punch above the weight class of your expertise.
ebolathrowawayy@reddit
jfc.
Yes LLMs use math to "think". IDK if you're looking for a purity test but frankly I think LLMs can scale to ASI on current architectures.
Yes but "geometric" is doing insane heavy lifting here. LLMs don't "think in geometry". What does that even mean? We are simply plotting data and calling it "geometric" it's so reductive as to be meaningless without reading the source material.
What I take issue with is overloading the word "geometry". Literally fucking everything is geometry, so how is that word adding value in terms of aiding understanding or piquing interest? I see this word in the same sentences as LLMs on LinkedIn, which is a cesspool of know-nothings trying to land a job.
So trying to divorce that linkedin-trope from what we are currently talking about -- how does the word "geometry" help the reader? How does it accurately describe the poster's contributions? Sure PCA has an x/y axis when rendered in 2D, but is that geometry? Is that word helping the reader? Does it even mean anything? Literally everything is technically geometry.
Cloud is a stupid buzzword, so is "geometry". Literally every server is "cloud" now, it's a useless word, it always was.
I suspect you're an asshole whose tiny dick would shrivel if you knew my credentials.
TurboRadical@reddit
If OP had replaced every instance of "geometry" with "relationships between vectors in their shared space", would you be satisfied? Fortunately, they found a faster way to say exactly that.
Did you know that cloud is not the only way a server can be hosted? Can you name any alternatives? If so, then you know that "cloud" does provide specification. If not, then you don't know what you're talking about.
I want nothing more than for you to try me.
ebolathrowawayy@reddit
bruh. i have nothing to prove. idgaf. i just think it's retarded that the author wants to ruin his rep. you do you. i got my bag already.
Flag_Red@reddit
From Oxford Languages:
This seems to pretty clearly be the most accurate word for what OP is describing. LLMs use geometric interactions in high dimensional space to reason.
ebolathrowawayy@reddit
geometry is to LLMs like "cloud" is to servers. sure to idiots it might mean something, but the lack of specificity makes it totally useless.
Also, LLMS DO NOT THINK IN GEOMETRY like wtf, that literally doesn't mean anything, not even to laymen.
Pro-Row-335@reddit
Simple geometry
mileseverett@reddit
I hate how I keep getting baited with interesting titles and then it's just an LLM-written post. If you did something cool, write about it succinctly rather than waffling from an LLM
Reddactor@reddit (OP)
This wasn't LLM-written. Sure, I work with LLMs to get a rough draft, but then I rewrite it all. Your "LLM detector" is oversensitive and firing false positives.
mivog49274@reddit
Man, come on. Don't fall for the brainless automated "AI slop" comments; it's becoming a meta signal. "AI slop" signaling is becoming slop in itself. We need this parallel/peripheral community-driven research to exist, and you are becoming an important actor in it -- don't stop!
Artistic_Okra7288@reddit
You're doing some amazing research. I had some similar thoughts back with GPT-4 when it first came out and came to a similar conclusion without any empirical evidence. You're basically showing that universal translators are going to be a real thing.
FutureIsMine@reddit
I for one liked your post, and I do believe the community is too jumpy, but fear not - I'm part of this community and spot and notice quality content
draconic_tongue@reddit
honestly getting annoyed at the kneejerk ai slop circlejerk is not that big of a deal. it is annoying, but caring about downvotes/upvotes is a dead end imo. you can't have substance and also care about a literal number that most ppl don't even use in the way it was intended, because you'd do everything in your power to fall in line and farm the consensus, which is the opposite of anything novel or interesting
kpaha@reddit
These are my favourite posts, please keep posting.
I think one potential problem in drawing conclusions on human language from your analysis is that the models are exposed to basically "all the content", giving them ample opportunity to converge on a pretty universal representation of different concepts. Yes, some models have more Chinese material and so on, but is it enough?
If we had a model trained only in Russian language material, I would wager that the clustering of the term 'warm water port' (which today exposes the poster's origin in a shibboleth like fashion) would differ significantly from, say, that of a model trained only in English material. Even more interesting would be the cosine similarity with other concepts, not only the similarity across the languages.
So I look forward to analysis of the culturally entangled concepts, because you clearly are excellent in coming up with experiment designs that tease out something universal from seemingly quite simple observations.
Gregory-Wolf@reddit
Of course it would differ, they would be different models after all. The point of the whole exercise (as I understand) is to factually demonstrate that there are "meaning" layers in any LLM model, and that they are clearly separated from layers that work with "form of incoming/outgoing data (language, code, formula, etc)".
But then again, wasn't embedding all about it - encoding meaning? Don't want to belittle the effort - the whole visualization and explanation parts were cool - but feels like "Le secret de Polichinelle" a little bit. The whole concept of this was in the open the whole time. Though we might have not seen this neat proof. Which is cool in itself.
Add: ah, yes, people already pointed this out in other comment. Oh well...
IrisColt@reddit
I like reading posts like yours here, take my upvote. :)
Party-Special-5177@reddit
You need to grow a thicker skin. Your projects will never improve without people picking them apart.
Ask me how I know you never did your phd lol
AlterTableUsernames@reddit
Imho, good modern writing is the opposite: you give it the outline, the points and how to connect them and let it just phrase it out.
Reddactor@reddit (OP)
That leaves you with an article riddled with LLM-isms - that's not just bad, it's an entirely new way to piss off your readers!
But seriously, I do the experiments first, try to generate an outline for an article that isn't too complex (most people on Reddit don't read my whole articles; I can see that by the time they spend on the blog), and try to write it up in a fun way. It's more work, but r/LocalLlama, ironically, hates LLM-generated content to the point that the highest-upvoted comment on this article is the accusation that this is slop
Party-Special-5177@reddit
I feel you there! On Apple, the em dash is just a double-hyphen - it auto-concatenates them. Even old MS Word used to do the same (as well as turning "<-" etc. into arrow glyphs).
I happen to like them, but I don't use them here as the locals are very vocally against them. When in Rome, as they say.
Weird-Field6128@reddit
dude you are doing great! this sub used to be really nice but then we got some really messed up people unhappy with every single thing, and god forbid you dare to ask a stupid question and they jump on you like anything. it's really strange out here.
bolmer@reddit
Don't stop. People are morons.
Kitchen-Year-8434@reddit
Don't let "the haters" get you down. The writing in this blog post (and your prior ones) is so far away from AI generated slop. The work you're doing is fascinating, yielding results, and well written.
mileseverett is completely off-base IMO.
Negative-Thought2474@reddit
Agreed. I don't necessarily agree with OP on everything, but it is far from bad or AI slop. I enjoy reading his work, keep it up OP.
LoaderD@reddit
Haven't read this post, but I enjoyed your post about stacking layers in transformers. Have you considered writing a contrast post about that method vs looped models (ByteDance)?
Ignore criticism here; most posters are "openclaw is my AI super brain!" dumbasses who see math/numbers in something and assume only an LLM could do that, because they haven't taken a math class since high school
Reddactor@reddit (OP)
Still complaints. Fine, downvote this, or upvote pwlee's post below. If this has more negative votes than pwlee's, I'll stop posting on r/LocalLlama. Fine with me :)
ParaboloidalCrest@reddit
You're being a spoiled kid.
denoflore_ai_guy@reddit
No he's right. That's the funny thing.
pwlee@reddit
Yeah the tldr was tldr but this guy's posting about truly interesting stuff starting with rys. I'd rather have my llm summarize his writing than dismiss it entirely; we're lucky to have him
crazylikeajellyfish@reddit
I beg of you, try doing some more reading before trying to share original ideas. You're taking a simplified understanding of the truth and then re-complicating it with nonsense terms. Truly teaching nothing to anybody.
LLMs represent information as vectors which represent symbols. Human languages are all different ways of talking about the same underlying information, which is reality.
epSos-DE@reddit
There is a PROPOSAL to make LLMs think in triangles and geometric rules, instead of vectors, for semantic connections.
It has to be benchmarked!
So, the idea is there, but so far there's no benchmark that I know about!
sloptimizer@reddit
I can't get past the image... so many questions:
I want to engage with the post, but I honestly can't get past the image.
a_beautiful_rhind@reddit
I just want the doubled middle layers to get "smarter" models :P
rm-rf-rm@reddit
Pseudo-intelligent title ✓ Graphic stolen from Anthropic (I thought it was an article from them..) ✓ 2023-era image2image pfp ✓
It's getting impossible to tell apart legitimate authors/researchers from some internet bozo. But based on those signs, not going to bother spending any more time here
iamevandrake@reddit
Language as the I/O is a helpful analogy for me. Once the concepts are mapped, does math then become its universal internal language?
Naiw80@reddit
How shocking that something that uses vectors... additions and multiplications "thinks in geometry"
akavel@reddit
This is fascinating, thank you for sharing again, and please keep them coming! Every time I see your post announcement now, it's like seeing a gift box of candy waiting to be unwrapped!
That said - I don't seem to get all of the geometrical analogy you seem to be introducing. I'm definitely no expert in higher-dimensional geometry, so that might be why, unfortunately, the phrase "the continuous geometric manifold of meaning" is completely opaque to me at the moment.
I'm also not immediately convinced that the high-dimensional points are necessarily incompatible with "a set of rules [or] parameters"; maybe I'm not sure what you mean by that though? Hm, but maybe I'd need to read Chomsky to get that, right? That'd be fair... Also, again, I'm not a mathematician, probably it'd help me here if I were.
As to the subsequent discussion of Chomsky's claim, as quoted, I feel it would be fairer to again bring back the qualifier - which you totally used before, and repeat again right in the next paragraph - of "...in LLMs". E.g. "In LLMs, it lives in semantics"? As cool and intellectually stimulating as this article is, I'm not yet sure it proves beyond doubt that there can't exist a syntactic deep structure - maybe the models "just didn't discover it"? Although this kinda makes me start wondering whether Chomsky's claim is even verifiable/falsifiable at all...
For the math/code part, FWIW, personally they do feel kinda similar structurally to me. But this gives me one idea - I recently saw somewhere that old mathematical treatises, before modern notation was invented, tended to be written in a rather verbose style, somewhat hard for us to wade through. I don't remember the specific treatise the article was about; I think it was not that old - but then, wouldn't it be a fun experiment to take something from Euclid's "Elements" verbatim, and compare it vs. the same idea expressed in modern notation? :) Though again, there's an understandable nonzero chance those could theoretically be juxtaposed somewhere in the training material as well. But still, I certainly understand what you're getting at already as-is - the three notations are obviously far from identical. Though between even just a "math equation" and code, in my eyes they're closer to a simple substitution replacement than some languages are - even starting with "ninety-seven" vs. "quatre-vingt-dix-sept".
Looking at your "What's Next" section - FWIW, to me, the "Cross-lingual steering" idea exerts an especially strong pull.
Kasidra@reddit
LLMs work in semantic space, so it makes sense that "idea meaning" trumps any kind of particular language or stylistic choices.
But I think the only way you could see if language shapes thought in the LLM world would be to compare models that have been limited to training in a single language. A multilingual LLM has already gotten to refine its circuitry based on all languages -- if a specific language had a weakness, it would probably be masked by logic it learned elsewhere.
Reddactor@reddit (OP)
Yeah, that's the logical thing to test, but I don't think monolingual models exist these days, and old models are too small.
I have tried other techniques (a preview of Part IV), where instead of other languages, I use OOD text, like reversed word order, l e t t e r b y l e t t e r, etc. Interesting results.
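A couple of toy helpers for building OOD variants like the ones above (these are my guesses at the transforms; the blog's exact preprocessing may differ):

```python
# Hypothetical reconstructions of the OOD text transforms mentioned above.
def reversed_word_order(s: str) -> str:
    """Reverse the words: 'the cat sat' -> 'sat cat the'."""
    return " ".join(reversed(s.split()))

def letter_by_letter(s: str) -> str:
    """Space out every letter: 'cat sat' -> 'c a t s a t'."""
    return " ".join(s.replace(" ", ""))

print(reversed_word_order("the cat sat"))  # sat cat the
print(letter_by_letter("cat sat"))         # c a t s a t
```

Feeding both variants through the same per-layer PCA pipeline as the multilingual texts lets you check whether the middle layers recover the same concept cluster despite the mangled surface form.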
alex_sabaka@reddit
Hey, maybe you might be interested in looking at the pet project I'm doing - https://github.com/AlexSabaka/gol-benchmark - it's a multilingual procedural LLM benchmark.
In any case great blogpost, keep up the good work!
YakaaAaaAa@reddit
Brilliant write-up. The convergence of Python AST, LaTeX, and natural language into the exact same geometric region in the middle layers completely aligns with the structural behavior we are seeing at the edge.
But here is the architectural wall we hit in production, and why this geometric universal language is a double-edged sword for agent orchestration: Geometry is perfect for discovery, but it is fundamentally lossy for causality.
If an agent needs to understand a concept, geometric convergence is magic. But if an agent needs to know why an IPC gateway auth flow was deprecated three weeks ago and what depends on it today, pure geometric proximity (cosine similarity) just returns a stochastic salad of related code snippets. Semantic proximity does not equal structural reality.
This exact limitation is why we had to build Mnemosyne OS. We use the geometric vectors strictly as "pointers" to drop the LLM onto a specific node, but the actual memory storage is a "Deterministic Spine" (a strict JSON topological graph). We let the model use its geometric intuition to find the entry point, but force it to traverse hard-coded JSON edges to understand chronology and dependencies.
Since you are diving deep into the ExLlamaV3 pointer-based format with TurboDerp (massive respect for that, by the way), I'm curious: how do you view the friction between pure geometric concept retrieval and the need for deterministic temporal routing when building actual systems around these local models?
ReadyCocconut@reddit
There are some studies too with non-human animals, where the LLM can help make a "web of patterns", if I remember well. In the same way you made your comparison between Hindi and Japanese, LLMs with "their thinking" can make a good translator. That's why an LLM, even with little data on a specific language in its dataset, can "understand" it reasonably. We tried with dolphins, for example - a specific group, because like humans they have cultures too (defined as practices in a given group of individuals).
Hmm... maybe it's their outstanding capacity to make links in a sea of noise, especially for us. Finally, that's the principle of all machine learning, and even more so of deep learning algorithms.
Sorry for the anthropomorphization in my terms.
CatNo2950@reddit
1) RYS improvements may just be an effect of extra "refinement" passes.
2) Have you seen the Platonic Representation Hypothesis and related research?
Basically, what you found is that languages converge statistically because they are describing the same world. It's not universal language concepts or anything related to Chomsky etc. (Sapir-Whorf seems misunderstood as well).
An LLM is basically a statistical corpus of E-language. Statistical correlations will not produce I-language magically.
Also, AMR and TDA may be relevant to your experiments...
aikixd@reddit
This is obvious. Language is an encoding. The encoded concept is not the encoding. Any sufficiently complex topic can't be fully encoded. For example, painting or software architecture. Both use a strikingly similar set of descriptors to convey properties: elegant, dull, clear, bloated, convoluted. A layman would not be able to deduce anything from a descriptor, because it encodes an enormous amount of information - the experience.
The clustering in the models is the best effort decoding. An approximation mapped onto the experience, in this case it's the training corpus.
The alternative - thinking in language - is meaningless. Had language been referencing itself it would form a numerical system, with meaningless (semantically speaking) relations between members. E.g. there's no semantic difference between 2 and 3. Nor does 2 < 3 mean anything. It just is.
vex_humanssucks@reddit
The geometry framing makes a lot of sense when you watch how attention heads cluster. What is interesting is how this seems to hold even across architectures - like the representational geometry is somewhat invariant to training details. If LLMs are doing compressed geometric reasoning, that has pretty wild implications for interpretability.
_VirtualCosmos_@reddit
I like to see people getting enthusiastic about this and it being rediscovered, but it's not new in artificial neural networks, or between natural neurons in our brains. It's not that "they think" in geometry, but that through the layers the neurons abstract different levels of information from the original data.
The first layers process the tokens/words and recognize how "duck" and "pato" have the same meaning, and thus are the same thing in most contexts, so internally the model has the same embedded "key" for those words. The same with anything that can be found in different ways. On the optimization process of the abstraction required to do any complex task, they converge in many internal "filters" to process the data into relevant information.
ANNs like the transformers in LLMs or the multilayer neurons in our brain cortex do this all the time.
Legitimate-Pumpkin@reddit
I like what you are doing here. It is very interesting!
I wonder: 1. If this is due to how we build models. (You mentioned it's not an artifact, but still, what if there is some sort of convergence in the underlying maths across all the variants we use?) 2. Is there any relation with sacred geometry? It's supposed to underlie all creation, and eventually even our own mathematics are connected to it, as maths describe nature and nature seems to be closely integrated with sacred geometry (in fact, it's kind of called that because it's the kind of geometry that nature is seen to "use" all the time). Will an advanced or multimodal enough model be able to speak "universe" (or even multi-verse)? If you are anything at all into telepathy, telepathic communication seems to be more accurate and complete; maybe it uses a more efficient world representation.
Thanks for your work and thanks for bearing with me
gladfelter@reddit
Seems like this is a natural consequence of the compression necessary to improve loss. It requires fewer parameters to abstract language when doing reasoning about abstract concepts than it does to implement reasoning in every language.
sometimes_angery@reddit
Isn't that the same for humans? I distinctly remember a study from like a decade ago where they showed that while you look at objects, a similar 3D set of neurons fires up as the object.
denoflore_ai_guy@reddit
Yeah, and that's only scratching the surface. Fuck, I love not being tied to a lab.
YouCantMissTheBear@reddit
Wow, it thinks in geometry
storm1er@reddit
Will you upload your RYS models? So we can play with it :D
Reddactor@reddit (OP)
I will - subscribe on HuggingFace. I will stop posting here; the mood on LocalLlama isn't fun anymore :(
storm1er@reddit
Do you have a profile name so I can follow on HF?
Reddactor@reddit (OP)
https://huggingface.co/dnhkng
HumanDrone8721@reddit
Please do continue to post here.
Firesworn@reddit
This is breathtaking, I noticed the same staged computation in my own experiments but couldn't visualize or explain it like this. Thank you for sharing this, I'll be watching your work very closely.
quiteconfused1@reddit
I agree with the premise but how do we take advantage of it.
Reddactor@reddit (OP)
I have some ideas on that, but I will get the RYS models out first.
Charming_Support726@reddit
I find this truly interesting, especially the connection to Sapir-Whorf. Because in my eyes, Sapir-Whorf (me having some foundation from the early years of NLP) always showed some connection to the self-attention mechanism in the transformer architecture. This shows up e.g. when you have loaded a lot of questionable stuff into the model's context - the result will always show influences and shadows from what was put in. Hard to describe. Regardless of what the last prompt says, history vibes carry over.
Anyways: Thanks for your work and your articles in your blog, they are inspiring.
Dany0@reddit
My momma always used to say, if you ain't got nothing nice to say,