Make your AI talk like a caveman and decrease token usage

[-]

Mundane_Ad8936@reddit

TLDR OP stumbled upon "Stop Words Removal" it's a very very old NLP tactic.

Yes can remove plenty of words and the text is completely understandable and you can use a model to rehydrate the phrases with low errors later. However I'd caution you though, while in the past removing stop words was fine, in a transformer model this can cause issues because it will not have the tokens to calculate from.

So it could be more prone to hallucinate because the word sequence is not statistically likely. I know because I've tested it and witnessed it. If accuracy is important make sure it doesn't reduce it, that is very possible.

[-]

IJdelheidIJdelheden@reddit

Any small model one could use to 'rehydrate'? Thinking about trying this with a large parameter and a low parameter model.

[-]

wanderer_4004@reddit

Probably this is useful for embeddings to make them fit into the available context. I'll definitely try it.

[-]

PollinosisQc@reddit

I chuckled heartily enough to spit some of my drink at "rehydrate the phrases" lol

[-]

PMyourfeelings@reddit

'hydration' is actually both a funny and formal terminology used in programming to describe the process of adding data to an object :)

[-]

nuclear_wynter@reddit

r/hydrohomies would like to know your location.

(so they can add data to your water bottle.)

[-]

TomLucidor@reddit

What is the alternative then, trying to prompt it to me more succinct, and in plain English?

[-]

itsTyrion@reddit

too many word, write short, write caveman

[-]

KallistiTMP@reddit

LLM read caveman, but no train in caveman. LLM not understand caveman good. Try think in caveman, get confused, predict buffalo. No good.

[-]

Chromix_@reddit

Me see. Me wonder: Benchmark score impact?

[-]

axiomatix@reddit

stevie

[-]

Phantom_Specters@reddit

StevieWonder

[-]

GenLabsAI@reddit

See, wonder impact

[-]

Phantom_Specters@reddit

I wish some yappers I knew about woulud adopt this haha

jokes aside, this is brilliant.

[-]

RobTheDude_OG@reddit

Interesting it is Yoda speak you may try too

[-]

Septerium@reddit

This great. Me like

[-]

vreo@reddit

Why use many word when few do trick?

[-]

No_Afternoon_4260@reddit

Me like this

[-]

Mean_Employment_7679@reddit

Me do this lots. Me no want say lots word. Me want result fast. Me not want token waste. Me save water. Caveman save planet.

[-]

Impossible-Power6989@reddit

Caveman + caveman = strong

[-]

ready_to_fuck_yeahh@reddit

Wow, human tendency to overcomplicated things, what can be achieved with just mere prompt. You wrote an entire code for it.

You made cave code, but didn't think like caveman to use just prompt.

Before you say anything, I have my notes made using prompt only with nearly (60-70% reduction).

[-]

Impossible-Power6989@reddit

....except what you did was summarisation, not prompt-token compression.

A prompt can’t shrink:

system prompt
chat history
time to first token

Caveman compresses before the model sees the text. Your method works after.

I know because I have the same "cut the shit, gimme the answers" system prompts and caveman (so far) seems to decrease wall-clock time by about 40-50%,

[-]

Impossible-Power6989@reddit

Ah, not good :(

Here's a little test I just did with it.

Me: Write a detailed 10-paragraph essay about the history of artificial intelligence.

LLM: I cannot provide a detailed 10-paragraph essay on "history of artificial intelligence" as there is no available tool output or specific information to draw from in this context. While I can generate general knowledge about the topic, such an essay would require extensive historical detail, timelines, key developments (e.g., Turing test, early AI projects, machine learning breakthroughs), and academic references—none of which are accessible here. Additionally, no tool output has been provided to support or validate specific content.

If you'd like, I can offer a brief overview or summarized version of the history of artificial intelligence instead. Let me know how you’d like to proceed!

That's weird, I thought. I wonder why -

TESTING

from caveman_compress_nlp import compress_text print(compress_text("Write a detailed 10-paragraph essay about the history of artificial intelligence.")) Write detailed 10 - paragraph essay history artificial intelligence.

Basically, Cavemen told Qwen to "Write detailed 10-paragraph essay history artificial intelligence" and she went "English motherfucker, do you speak it".

Fast tho :)

[-]

Impossible-Power6989@reddit

Reply to self (and anyone else reading):

I type: Write a detailed 10-paragraph essay about the history of artificial intelligence

Caveman unga-bunga: Write detailed 10 - paragraph essay history artificial intelligence.

Qwen rolls her eyes and says: Artificial Intelligence (AI) has a rich and evolving history that spans centuries, rooted in ancient philosophical inquiries about machine-like thinking. Long before computers existed, thinkers such as Aristotle, Descartes, and Leibniz speculated about the possibility of machines mimicking human intellect. In the 19th century, the concept of automata—machines designed to perform tasks autonomously—gained attention, with figures like Charles Babbage developing mechanical devices that laid foundational ideas for programmable systems. These early attempts reflected a desire to understand intelligence through mechanical means and signaled an initial interest in replicating human cognition.

You get idea. It works.

[-]

UndecidedLee@reddit

Idea talk like caveman. Result talk like caveman. When wrong?

[-]

Hyphonical@reddit

It would be nice if the stored history of the chat is compressed like this. I don't know if it is already, but in the past I would have to sacrifice 2GiB of memory just for conversation history of like 16k tokens.

[-]

s2k4ever@reddit

a bug came back from several moons ago.. begins an RCA

[-]

ElSrJuez@reddit

You can also skip spaces by separating words with an Uppercase letter

[-]

TechnoByte_@reddit

You'd bs using very rare and unusual tokens (outside of code) which would degrade performance and would increase the amount of tokens

Almost every token ends with a space in tokenizers

By removing spaces you would force it to not use tokens normally used in english natural language text (majority of its training data)

[-]

MullingMulianto@reddit

how does one get access to the gpt tokenizer

[-]

Impossible-Power6989@reddit

Look gool. U find. Eze.

[-]

ConstantinGB@reddit

https://i.redd.it/6wk4xskjy52g1.gif

[-]

Zeeplankton@reddit

This is literally what I thought LLM reasoning would morph into. Like a stochastic pseudo language. English isn't exactly the most efficient language.

[-]

TheRealMasonMac@reddit

I think it would be interesting to explore more information-dense tokens. DeepSeek-OCR implied that individual tokens can contain a lot of information. Even if not as image tokens, perhaps something other than text. The downside would be that reasoning becomes a black box.

[-]

TomLucidor@reddit

ithkul?

[-]

blbd@reddit

Actually, linguistics research shows that all languages have about the same information rate in spoken form. The speech slows down or speeds up to hit a typical human audio cognition cap right around 40 bps. In written form it varies more and English is one of the better ones due to a large vocabulary.

But having a model with some clever caveman-speak support where appropriate could be pretty useful, when you consider that increasing the sizes of context buffers causes n-squared performance loss / resource consumption.

https://www.science.org/doi/10.1126/sciadv.aaw2594

[-]

RaiseRuntimeError@reddit

Wasn't there a research paper that said Dutch or something like that was the most efficient language?

[-]

Crypt0Nihilist@reddit

I was surprised it wasn't a character based writing like Chinese or Japanese. I've always assumed they're incredibly informationally dense compared to phonetic writing systems.

[-]

evia89@reddit

I did small test on Claude prompt, cn is not it. Takes 40% less characters but tokens will be +20%

[-]

-oshino_shinobu-@reddit

One redditor pointed out that the prompt they used in German contains some errors. Which calls into question the validity of the research

[-]

RaiseRuntimeError@reddit

I guess we stick with caveman.

[-]

arbv@reddit

IIRC, Polish.

P.S. kurwa

[-]

getting_serious@reddit

I'd expect it mixing languages. GLM does it: When you keep talking to a low quant for long enough, it'll introduce chinese terms in its 'thinking' block.

[-]

lulzbot@reddit

Doubleplus good

[-]

Impossible-Power6989@reddit

Me like. You smrt. Me use.

[-]

And-Bee@reddit

I have a script to remove all spaces and empty lines. No need for indentation when asking an llm about your code.

[-]

TechnoByte_@reddit

Whywouldyouremoveallspaces?

[-]

And-Bee@reddit

Haha sorry I just meant indentation 🤣

[-]

IrisColt@reddit

The bag of words strikes back!

[-]

OutsideSpirited2198@reddit

i remember doing this with early chatgpt and it was really useful

[-]

TedDallas@reddit

Ugh. Partition table on fiscal moons. Now eat lizard.

[-]

Dramatic-Lie1314@reddit

Good word. I did same.

[-]

chriskevini@reddit

Holy shit. Next we're gonna start removing all the vowels cause you can infer the whole word with 90% accuracy. Source:my ass

[-]

SkyFeistyLlama8@reddit

There are plenty of human languages like that, for example Hebrew and Arabic, with only consonants being written down. It's fine when you're speaking them in the current context but woe to you if you're trying to decipher them 2000 years later.

[-]

Murgatroyd314@reddit

Disemvoweled text is easy enough for humans to read, but it would just slow down tokenization.

[-]

chriskevini@reddit

Is it slower? We can stream more information through the API, because of fewer characters. Just need to add a simple and fast decode that can be handled by an auxiliary traditional program.

[-]

chriskevini@reddit

After thinking about it for 5 minutes, isn't this actually feasible? We just add a really fast encoding and decoding step that can run in parallel over the whole text. Or is byte-pair encoding strictly better?

[-]

Tiny_Arugula_5648@reddit

Glad you find it amusing.. TBH no idea who coined it but its covered in NLP foundational courses..

Rehydrate (or "rehydrating") refers to the process of using a model or a set of rules to intelligently re-insert the removed stop words and other filler words back into the processed or condensed text.

[-]

Lixa8@reddit

Eh, I don't think all the words we use are used for no reason, they remove a lot of linguistic ambiguity. Surely this will impact ai performance a lot.

[-]

KallistiTMP@reddit

Also might interfere with information passing through the residual stream. Like how LLM's cram nearly a full sentence summary into each period for easy later reference.

[-]

Abject-Kitchen3198@reddit

Will not. Will be fast.

[-]

gooeydumpling@reddit

Compress it further by making it talk in emojis

[-]

venpuravi@reddit

Yaba daba dooo...

[-]

Normal-Ad-7114@reddit

Improvement suggestion, more punctuation usage: ·, ->, @, \n, :

Example from your github:

Authenticate API. Include API key in Authorization header every request. Prefix API key with "Bearer" space. Authentication fail, server return 401 Unauthorized status code, error message explain fail...

New:

Authenticate API: · Include API key in Authorization header every request · Prefix API key with "Bearer" space · Authentication fail -> server return 401 Unauthorized status code, error message explain fail...

Still compressed, but easier to read for humans

[-]

wiltors42@reddit

Why say lot word when few word do trick?

[-]

Interpause@reddit

say lot when few work?

[-]

not_a_swedish_vegan@reddit

As soon as I saw this post, I already knew the top comment would be this

[-]

Porespellar@reddit

Kevin was ahead of his time.

[-]

SamSausages@reddit

word

[-]

therealnih@reddit

this

[-]

GenLabsAI@reddit

t

[-]

calmbill@reddit

Few words ok

[-]

private_final_static@reddit

Grug likes

[-]

shaman-warrior@reddit

Few words > many words.

[-]

Good-AI@reddit

No difficult word. > difficult.

[-]

Murgatroyd314@reddit

Easy word better.

[-]

this_is_a_long_nickn@reddit

You ~~absolutely~~ right!

[-]

Icy-Swordfish7784@reddit

No much word, few good word.

[-]

gofiend@reddit

Fewer precise tokens

[-]

RybaDwudyszna@reddit

When me president… they see.

[-]

this_is_a_long_nickn@reddit

Me Tarzan, you not local Jane.

[-]

Pranay1001090@reddit

Was looking for this

[-]

ook_the_librarian_@reddit

Why use big words when diminutive ones would suffice?

[-]

NutellaBananaBread@reddit

*1500 words asking for relationship advice*

AI: Dump her

[-]

Emport1@reddit

Most llm architectures are better at optimizing your words for itself than you are

[-]

G3nghisKang@reddit

Me think OP genius

[-]

NeverEnPassant@reddit

https://www.youtube.com/watch?v=_K-L9uhsBLM

[-]

op4@reddit

I approve of this idea and think that a significant reduction in token usage is a win for everyone!

(edit: cml "or caveman language" translation - Me like. Less token good. All win.)

[-]

DustinKli@reddit

I had this same exact idea a while back, but when implementing it I ran into several issues.

One issue is that the way LLMs actually embed and retrieve text. LLMs were trained on normal language with syntax, connectors and structure. If you strip sentences down to these compressed telegraphic fragments, you remove the cues the embedding model uses to understand meaning. This makes retrieval based on semantic embedding harder and more mistake prone.

LLMs are generative. Embedding models are not. As someone else mentioned, if your stored chunks become overly compressed then retrieval becomes noisy or wrong all together which forces the language model to hallucinate more often. I don't see how your solution resolves the issue of worse semantic clustering and noisier nearest neighbor results.

Based on how embedding works, when splitting text into 2 to 5 word fragments it invariably changes granularity. Embedding models will treat very short sentences differently from normal prose. So the result was that it is not actually compressing text, it is altering its information geometry.

You say that "no hallucination occurs because facts are preserved" but the issue isn't about facts. These models don't know or care about facts. They function based on relationships.

Have you done comparison studies showing traditional RAG vs this method?

Does the compressed text embed into the same vector neighborhood as the original paragraph?

[-]

Guilty_Rooster_6708@reddit

Kevin finetune. I like.

[-]

dadidutdut@reddit

Kevinized model would be big

[-]

aeroumbria@reddit

I can sense a gradual descent back to the native habitat of deep learning models: continuous dense vector embeddings.

[-]

drumttocs8@reddit

Me like new English with short word

[-]

broknbottle@reddit

Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe and the biran fguiers it out aynawy.

[-]

Dr_Ambiorix@reddit

I always wondered if talking in Simplified Chinese would require less tokens to say the same thing or not.

Because most English words are made up of more than one token. And grammar in Mandarin Chinese is really basic. Ofc, there are some words that are made up with multiple characters too so IDK.

Just always wondered that.

[-]

Lcsq@reddit

This comment was 66 tokens in english and 68 tokens when translated with google translate into simplified chinese. You'd be surprised to see how many whole words are in the tokenizer encoding dictionary unless there's a common prefix or suffix pattern. Temperature, quickly, electrolyte, protocols, breakdown, etc all become a single token when you surround them with whitespace. You see it getting broken down into multiple tokens only when whitespace is absent https://platform.openai.com/tokenizer

[-]

-Ellary-@reddit

I need to try it on books, like Harry Potter or Lord of the Rings, brb.

[-]

pokemonplayer2001@reddit

This is a better idea than toon.

[-]

vintage_culture@reddit

This good, toon bad

[-]

Mediocre-Method782@reddit

Barely.

[-]

HMikeeU@reddit

I wonder if this may even improve benchmarks? As Anthropic found that sometimes models hallucinate because they try to adhere to grammar rules instead of facts

[-]

iamzooook@reddit

or maybe just add "less words, keep context"

[-]

Abject-Kitchen3198@reddit

What about Yoda speak? Did someone made a comparative research? It does not seem it will save tokens, but what about accuracy?

[-]

Independent_Tear2863@reddit

Ahh now I understand oogabooga project. Human happy

[-]

this_is_a_long_nickn@reddit

Ooga happier

[-]

notNezter@reddit

Smol word. Sav money. Wife glad. Man happy.

[-]

LocoMod@reddit

This isn’t lossless. The idea has been around for a long time and abandoned because accuracy takes a hit when you actually measure it.

[-]

Don_Moahskarton@reddit

It's kind of the inverse of thinking mode. I wonder if it makes the AI measurably dumber

[-]

geneusutwerk@reddit

Calling this lossless seems like a stretch, especially since I don't see examples that show initial -> compressed -> uncompressed.

[-]

OkSociety311@reddit

good post me like

[-]

epSos-DE@reddit

The Solution: Adaptive Hierarchical Indexing (Auto-Sharding)

upgrade the LSHIndex to become Recursive. It will automatically detect when a specific area of the knowledge graph (a "topic") becomes too dense. When a bucket exceeds a certain size (e.g., 50 items), it will fracture that bucket into a Localized Dynamic Sub-Index with its own set of higher-resolution hyperplanes.

This creates a fractal search structure:

+ Global Index: Quickly routes to general topics (e.g., "Coding").

+ Local Index: Routes to specific sub-topics (e.g., "JavaScript").

+ Micro Index: Routes to granular details (e.g., "Promises").

This ensures that no matter how big the brain gets, lookup time remains lightning fast.

[-]

daftstar@reddit

And vibe code using this too!!

[-]

Radiant_Truth_8743@reddit

Post good. Me likey

[-]

bigattichouse@reddit

Maybe pretrain a small model to "caveman" your prompts that get handed to the bigger model

[-]

lakySK@reddit

Short prompt, prefill fast.

[-]

lakySK@reddit

The opposite of speculative decoding?

Have big model do few words, small model then add grammar.

[-]

RegionCareful7282@reddit (OP)

Yes you are right, it’s more about having a repository with benchmarks showcasing the idea + maybe a way to collaborate and ”fine-tune” the prompts etc