Where did Arx-0.3 come from and who makes it?

Posted by Balance-@reddit | LocalLLaMA | View on Reddit | 41 comments

AccountantDry2483@reddit

They just posted this. Woah https://x.com/appliedgeneral/status/1884738566645018932?s=46

FarVision5@reddit

Smells like a handful of AIML employees got shit-canned and wanted some of that easy VC AI-race money with a fake benchmark puff-up exit scam. This is the only website I need to see: [https://arxiv.org/search/?query=Arx-0.3&searchtype=all&source=header](https://arxiv.org/search/?query=Arx-0.3&searchtype=all&source=header) I can't tell you how many AI-sus projects I see out there with young people scamming VCs. Halfway through a new and interesting GitHub project, red flags start going up because none of the code works, it isn't linted for shit, and there are a bunch of fake-ass new contributors with botlike behavior submitting a bunch of stupid PRs with minor changes to generate traffic. You start digging into some of the authors and it's some kid with a couple of repos that all popped up at the exact same time, with a bunch of obviously auto-generated BS code, BS READMEs, and a dizzying landfill of words that would probably impress someone who hasn't worked in the field.

Warm_Iron_273@reddit

Some janky scam that means nothing because the benchmark question set is public.

leadfaarmr@reddit

Source? Evidence? Links? Proof?

Warm_Iron_273@reddit

Lol. You're coming to this 3 days later, on an account with only two comments, and one of your other comments is some wallstreetbets jank. Way to reinforce what I'm saying; you're obviously associated with the project.

leadfaarmr@reddit

Just a scientist trying to learn more, and I don't use Reddit much. Instead of attacking me and scrolling through my comments, why don't you provide your reasoning or evidence for your claims?

Striking_Most_5111@reddit

Does anyone know how to use it?

Airbus_Tom@reddit

By this org (never heard of it before): [ARX (agi-v2.webflow.io)](https://agi-v2.webflow.io/arx)

Warm_Iron_273@reddit

Yeah, so literally a fundraising scam.

Airbus_Tom@reddit

I hate when those orgs do not provide more info about their model.

_supert_@reddit

Cracking website.

Airbus_Tom@reddit

no useful info on the website

bulletsandchaos@reddit

It really reads like a VC pitch: “A path beyond LLMs to a new paradigm for intelligence.”

bulletsandchaos@reddit

Their actual URL is agi.live - the deployment of their website is janky.

CeFurkan@reddit

Until I test and compare it myself, I don't trust these benchmarks one bit. Currently the king is Claude 3.5 Sonnet.

Formal-Narwhal-1610@reddit

iAsk.ai claims 86 percent on MMLU-Pro: https://iask.ai/mmlu-pro

Pojiku@reddit

Wish there was more detail. They are an AI Search company like Perplexity, so they may have been using RAG to answer the questions rather than just the model itself.
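To make the distinction concrete, here is a toy sketch of a closed-book answer versus a retrieval-augmented one. Everything in it is an assumption for illustration: the documents, the question, and the helper names (`retrieve`, `answer_closed_book`, `answer_with_retrieval`) are made up, the "model" is just a stand-in that guesses the first option, and none of it reflects how iAsk.ai actually works.

```python
import re

# Hypothetical mini "search index"; a real system would query a web-scale corpus.
documents = [
    "Austin is the capital of the U.S. state of Texas.",
    "Canberra is the capital of Australia.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs):
    """Return the document sharing the most tokens with the question."""
    q_tokens = tokenize(question)
    return max(docs, key=lambda d: len(q_tokens & tokenize(d)))

def answer_closed_book(question, options):
    """Stand-in for a model that lacks the fact: it just guesses the first option."""
    return options[0]

def answer_with_retrieval(question, options, docs):
    """Pick the option that appears in the retrieved passage, else fall back to guessing."""
    passage = retrieve(question, docs).lower()
    for option in options:
        if option.lower() in passage:
            return option
    return answer_closed_book(question, options)

question = "What is the capital of Australia?"
options = ["Sydney", "Canberra", "Melbourne"]

closed_book = answer_closed_book(question, options)
with_rag = answer_with_retrieval(question, options, documents)
print("closed book:", closed_book)
print("with retrieval:", with_rag)
```

The same guessing "model" gets the question right once retrieval is bolted on, which is why a leaderboard score for a search product may say more about the whole pipeline than about the underlying model.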

Dayder111@reddit

I think storing information in precise databases, in a form that is easy to retrieve and understand, is better than storing it in neural network weights, and is the future. The neural network should have as good a general understanding as possible of the world, of processes, phenomena, associations and relationships, but not of facts. It might still be useful for it to remember some facts, but it should always check them against the precise databases it is tightly combined with. Evolution of biological organisms couldn't create such a symbiosis, couldn't create precise forms of learned data storage (the keyword is learned, during the organism's lifetime). We can.

UnchainedAlgo@reddit

I’m a bit intrigued. From their CTO (Thomas Baker) on LinkedIn: “When we say AGI, we’re talking about a highly opinionated approach that looks beyond LLMs. It means developing these incredible aspects of Ai without needing massive data centers and Nuclear Power Plants to do it! I’ll be excited to share some incredible updates with you all in the coming months.”

VeryRealHuman23@reddit

this reads like it was written by AI or a marketer who has no idea what they are doing.

AnticitizenPrime@reddit

AIs wouldn't randomly capitalize 'nuclear power plants'. :)

vert1s@reddit

That's what you think. I have a prompt set up to trick you by telling it to use bad grammar and capitalise badly.

DesignToWin@reddit

Speech to text, right? Voice keyboards sometimes randomly Capitalize stuff.

AbheekG@reddit

Honestly not really

Hemingbird@reddit

Applied General Intelligence is apparently the company behind the model.

> We recently submitted Arx-0.3 to MMLU-Pro, the latest and most challenging Massive Multitask Language Understanding benchmark to validate our research assumptions and assess our technical approach. This submission will help us track progress toward developing general intelligence capable of understanding, reasoning, and explaining beyond patterns.

> Arx-0.3 operates with coherence-based comprehension via universal language understanding. The system is designed to solve multi-step problems and perform deliberate reasoning across domains. MMLU-Pro's focus on these same capabilities, and alignment with practical applications, makes it ideal to validate our assumptions and direction

Based in Austin, Texas. [Website](https://www.agi.live/) says, "A path beyond LLMs to a new paradigm for intelligence".

Employees include:

- Kurt Bonatz (Co-founder/CEO)
- "Jerry" Xiaolin Zhang (Co-founder/Chief Science Officer)
- Robert Montoya (Software Engineering Leader)
- Thomas Baker (Chief Technology Officer)
- Dapeng Tong (Software Developer)

Their CEO promises full explainability and zero hallucinations. He says in a pitch their model isn't a "black box," so it doesn't sound like a standard neural network approach.

[A Google Groups user with the name Xiaolin Zhang, signing his name as *Jerry* Zhang, asked a series of questions about NELL in 2016](https://groups.google.com/g/cmunell/c/wTFyFU_rafk). NELL (Never-Ending Language Learning) is a semantic machine learning system. Apparently, Jerry was "working toward an entry for IBM's Watson AI XPRIZE Competition". I don't know if this is the same "Jerry" Xiaolin Zhang, but it would be quite the coincidence if not.

So ... LLM + knowledge graph?

Homeschooled316@reddit

> paradigm

[Aren't these just buzzwords that dumb people use to sound important? I'm fired, aren't I?](https://www.youtube.com/watch?v=ea5L2hQurWA)

Crazyscientist1024@reddit

True if huge. Or maybe just training on the test set is all you need.

askchris@reddit

The questions and answers to MMLU pro are public, so it's easy to get 90%-100% with a small model trained on the answers.
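A toy sketch of why that is. The questions below are made up (not MMLU-Pro data), the names `train_memorizer`, `predict`, and `accuracy` are hypothetical, and a dictionary lookup stands in for a small model that has overfit to the public answers. It says nothing about how Arx-0.3 was actually built.

```python
# Made-up stand-in for a public benchmark whose answers anyone can download.
public_benchmark = [
    {"question": "What is 2 + 2?", "options": ["3", "4", "5"], "answer": "4"},
    {"question": "Which planet is closest to the Sun?",
     "options": ["Venus", "Mercury", "Mars"], "answer": "Mercury"},
    {"question": "What is the chemical formula for water?",
     "options": ["CO2", "O2", "H2O"], "answer": "H2O"},
]

# Made-up questions the "model" never saw during training.
held_out = [
    {"question": "What is 3 + 3?", "options": ["5", "6", "7"], "answer": "6"},
    {"question": "Which planet is farthest from the Sun?",
     "options": ["Uranus", "Neptune", "Saturn"], "answer": "Neptune"},
]

def train_memorizer(examples):
    """'Train' by storing each question's answer verbatim (pure memorization)."""
    return {ex["question"]: ex["answer"] for ex in examples}

def predict(memory, example):
    """Return the memorized answer, or guess the first option for unseen questions."""
    return memory.get(example["question"], example["options"][0])

def accuracy(memory, examples):
    """Fraction of examples answered correctly."""
    correct = sum(predict(memory, ex) == ex["answer"] for ex in examples)
    return correct / len(examples)

memory = train_memorizer(public_benchmark)
public_score = accuracy(memory, public_benchmark)
held_out_score = accuracy(memory, held_out)
print(f"public benchmark: {public_score:.0%}")
print(f"held-out questions: {held_out_score:.0%}")
```

The lookup table aces the public set and fails the unseen questions, which is the gap a leaderboard built on public questions cannot show on its own.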

ksym_@reddit

Wasn't the MMLU **Pro** benchmark the one where the questions are actually held out from the public? Did they end up publishing them?

mikael110@reddit

You're likely confusing it with another benchmark like GPQA. [MMLU-Pro](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) is public, and to my knowledge it was never considered secret. The main point of the Pro edition was just to clean up the mistakes in the original benchmark and to be a bit harder.

ihaag@reddit

Qwen2 better than DeepSeek-V2? I don’t think so!

askchris@reddit

This is the MMLU-Pro benchmark, a well-rounded benchmark that Qwen 2 excels in, not a coding challenge, which DeepSeek V2 is fine-tuned to excel in.

Healthy-Nebula-3603@reddit

Qwen 2 72B is very good, and old by today's standards ... they'll probably introduce V3 soon.

Balance-@reddit (OP)

Where did this top-scoring model on [MMLU-Pro](https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro) come from, who makes it and why haven't I heard of it?

rorowhat@reddit

Have you tried it? Curious to know if anyone has experience with it.

ambient_temp_xeno@reddit

They seem to be a relatively small British company. This guy might be their secret sauce https://www.researchgate.net/scientific-contributions/Simon-M-Stringer-2163805127

mjolk@reddit

Nice find! Where did you find the company/staff profile?

ambient_temp_xeno@reddit

https://find-and-update.company-information.service.gov.uk/company/12211733

enigma707@reddit

It seems like the brains of the operation just recently resigned from that company.

ambient_temp_xeno@reddit

From 2016. https://preview.redd.it/ggbbcf0jt0md1.png?width=885&format=png&auto=webp&s=b25c93c7e3817320302b1d68f7a5c19f104f026e [https://nautil.us/westworld-is-strikingly-real-ai-could-be-conscious-and-unpredictable-236291/](https://nautil.us/westworld-is-strikingly-real-ai-could-be-conscious-and-unpredictable-236291/)

Additional_Test_758@reddit

I was wondering this, too. Not only is it top, it's also not a self report.