ICONN 1 is now out!
Posted by Enderchef@reddit | LocalLLaMA | View on Reddit | 160 comments
Hello to r/LocalLLaMA,
Today is a huge day for us, and we're thrilled to finally share something we've poured an incredible amount of time and resources into: ICONN-1. This isn't another fine-tune; we built this model from the ground up, a project that involved a significant investment of $50,000 to train from scratch.
Our goal was ambitious: to create the most advanced and human-like open-source AI model under 100 billion parameters. And we believe ICONN-1 delivers.
What makes ICONN-1 special?
- Mixture-of-Experts (MoE) Architecture: Built on a custom Mixtral framework, ICONN-1 uses dynamic routing through specialized expert pathways. This means incredible computational efficiency alongside enhanced performance (see the toy routing sketch after this list).
- 88 Billion Parameters, 22 Billion Active: We've managed to achieve highly nuanced and contextually accurate responses while maintaining scalability benefits through sparse activation.
- Designed for Human-like Interaction: ICONN-1 (this specific version) is optimized for natural, emotionally resonant, and genuinely conversational interactions. We've even benchmarked it against human responses on 500 questions for emotion and common sense, and it consistently shows the highest human-thinking benchmark scores.
- Specialized Variant for Reasoning: We're also releasing ICONN-e1 in beta, a variant specifically fine-tuned for advanced reasoning, critical analysis, and complex problem-solving. This dual release represents a significant leap forward in versatile AI systems.
- Open-Source Commitment: Our dedication to openness and accessibility is at the core of ICONN-1. We believe this is how we push AI forward together.
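For readers new to MoE, here is a toy sketch of the top-2 routing described above (illustrative only, not ICONN's actual code):
```
import numpy as np

# Toy top-2 MoE routing: with 8 experts and 2 active per token, only a
# fraction of the expert weights runs for any given token -- which is how
# a model can have 88B total but only ~22B active parameters.
rng = np.random.default_rng(0)
n_experts, d = 8, 16
router = rng.normal(size=(d, n_experts))            # router projection
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_layer(x):
    logits = x @ router                              # score every expert
    top2 = np.argsort(logits)[-2:]                   # keep the two best
    w = np.exp(logits[top2])
    w /= w.sum()                                     # softmax over the top 2
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top2))

print(moe_layer(rng.normal(size=d)).shape)           # (16,)
```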
Ready to try it out?
Go to ICONNAI/ICONN-1 · Hugging Face
Don't forget to LIKE our model; it helps get us up to Trending!
Apprehensive_Page_87@reddit
how uncensored is it?
Enderchef@reddit (OP)
Our model is not censored (that we know of); let me know if you find otherwise. It is nothing like DeepSeek with its censorship, and we have made sure to keep it uncensored unless something is dangerous or harmful.
Apprehensive_Page_87@reddit
True censorship is talk regarding sex, dating, race, religion. A model is censored if you can make jokes about Jesus but not Muhammad, jokes about white people but not black people, or if the model leans left or right instead of center. In this regard DeepSeek is actually not that bad, but most datasets are. It would be especially interesting if you are looking for natural conversations, because then it can be used to convince people about X or Y.
butthole_nipple@reddit
Isn't the point of uncensored that you don't get to decide what's dangerous and harmful? For example, the CCP considering discussion of Taiwan dangerous and harmful.
Enderchef@reddit (OP)
We did not censor things the way DeepSeek and other Chinese companies have; "dangerous and harmful" means things like dangerous acts. No censorship unless it is TRULY unsafe.
ninjasaid13@reddit
well I guess that explains it 🙄
adi1709@reddit
*That we know of?
You own the data, so you know it's not censored? Unless there were biases introduced in the data collection process.
Enderchef@reddit (OP)
Yes.
NormalFormal69420@reddit
But can sex the bot?
Suspicious_Demand_26@reddit
😂
Zestyclose_Yak_3174@reddit
You got any more research, benchmarks, comparisons? If what you claim holds true this is a very exciting development, albeit presented a bit underwhelmingly.
poli-cya@reddit
I agree the presentation could use some spit-polish, but I checked the accounts in this thread and I'm not seeing a bunch of new accounts. Am I misunderstanding what you're saying?
Zestyclose_Yak_3174@reddit
I checked the GitHub and Hugging Face channels and looked at account activity and age, then searched the same account names and found some on other online platforms. Most are at most five or six months old. Doesn't mean much, but it seems to me that they came out of the blue with this model.
poli-cya@reddit
Oh, you mean their accounts for hosting the stuff. I thought you were saying they were astroturfing to drive engagement.
I'm fine with them making new accounts when they began working on this or accounts specifically for this endeavor but I'd have a huge problem with fake discussion/bot accounts driving engagement.
Zestyclose_Yak_3174@reddit
Like I already thought: another ego lying and deceiving. Repos gone, accounts gone, and the weights don't work without errors. AI-slop texts, and no "we", just one guy. Pretty sure it's another faker like we've seen before.
Enderchef@reddit (OP)
Sorry, we just started publicizing. We have not posted much yet.
Zestyclose_Yak_3174@reddit
That's totally fine. Good things sell themselves.
RandumbRedditor1000@reddit
Judging how "humanlike" a model is would be a very tough thing to do, since it's entirely subjective.
Since it's open source, just download it.
It's free, after all.
Zestyclose_Yak_3174@reddit
I did. It felt off to me. Not working as expected, hence my previous statement
ansmo@reddit
Well, looking at this entire thread and the HF page, it seems like bought upvotes and astroturfing. OP keeps posting:
OFFICIAL MESSAGE
I sincerely apologize for the inconvenience. ICONN 1 is not functional right now. We predict it will be operational again in about 2 weeks to a month. I understand how frustrating this is (especially to us), and I want to let you all know that we are prioritizing the launch of ICONN Lite, which we aim to have ready in 1 to 2 weeks. Thank you for your patience and understanding during this time. I will provide another update on ICONN Lite in the coming weeks.
But this post still has almost 300 upvotes. This thing is confusing at best, but looks more like someone told Claude to run a social media experiment. I hope it's real. I hope it's legit. It certainly doesn't look or feel that way.
a_beautiful_rhind@reddit
Well.. it's a model. There are weights. Either it sucks or it's good.
Zestyclose_Yak_3174@reddit
Hint: sucks
Gridhub@reddit
why did this get deleted? or is my reddit bugging?
DeProgrammer99@reddit
To give some examples for the other commenters... I ran four emotion-oriented prompts (which I asked Claude and ChatGPT to generate) through Q4_K_M.
All of them got stuck in a loop and had some weird token errors. I was wondering if that's llama-server's fault, because this is the first time I tried using the -np parameter, but I reran the first prompt and discovered it was doing the same thing as the others, just outputting a non-printing character. I did use the template llama.cpp has hard-coded, though, so maybe I should run it again with --jinja.
https://pastebin.com/VuFrQkbn
https://pastebin.com/XqqJYbVi
https://pastebin.com/mj2ya7px
https://pastebin.com/dyc4MYc3
Command line (I used -ot to fit exactly as much as I could on my 16 GB GPU):
Temperature 0, so the exact responses should be reproducible by others.
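As a purely hypothetical illustration of that kind of launch (the filename and flag values are guesses, not the poster's actual command):
```
import subprocess

# Launch llama-server with MoE expert tensors kept on CPU (-ot) so the
# rest fits in ~16 GB of VRAM; --jinja uses the model's own chat template
# instead of llama.cpp's hard-coded one.
subprocess.run([
    "llama-server",
    "-m", "ICONN-1.Q4_K_M.gguf",             # hypothetical filename
    "-ngl", "99",                             # all layers to GPU...
    "-ot", r"blk\..*\.ffn_.*_exps\.=CPU",     # ...except expert FFN tensors
    "--temp", "0",
    "--jinja",
])
```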
a_beautiful_rhind@reddit
Use text completion and mixtral preset. That's what it looks to be made of.
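A minimal sketch of that with llama-cpp-python; the GGUF filename is hypothetical and the Mixtral [INST] template is an assumption based on the suggestion above:
```
from llama_cpp import Llama

# Plain text completion with a Mixtral-style instruct template, bypassing
# whatever chat template is baked into the GGUF.
llm = Llama(model_path="ICONN-1.Q4_K_M.gguf", n_ctx=4096)  # hypothetical file
prompt = "<s>[INST] How would you comfort a friend who just lost their job? [/INST]"
out = llm(prompt, max_tokens=256, temperature=0.0)
print(out["choices"][0]["text"])
```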
Enderchef@reddit (OP)
OFFICIAL MESSAGE
I sincerely apologize for the inconvenience. ICONN 1 is not functional right now. We predict it will be operational again in about 2 weeks to a month. I understand how frustrating this is (especially to us), and I want to let you all know that we are prioritizing the launch of ICONN Lite, which we aim to have ready in 1 to 2 weeks. Thank you for your patience and understanding during this time. I will provide another update on ICONN Lite in the coming weeks.
Due_Price_8624@reddit
Zestyclose_Yak_3174@reddit
This only proves that you don't understand how LLMs work. These types of evaluations have been disproven countless times. I am also curious about the real-world performance of this model, but this is not it.
a_beautiful_rhind@reddit
Can't bypass the realities of tokenization unless you specifically train on the question. Then someone just asks it to count the s's in Mississippi.
colin_colout@reddit
Ask it to add a dash between letters to force one token per letter. I'm 99% sure this is in frontier model training at this point.
a_beautiful_rhind@reddit
haha.. yea, good idea. Probably the way to train a reasoning model to do it from examples. b-r-e-a-k i-t u-p
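A tiny illustration of the trick (the prompt wording is made up):
```
# Tokenizers lump letters together, so letter-counting fails at the token
# level; inserting dashes forces roughly one character per token.
word = "Mississippi"
spelled = "-".join(word)                      # "M-i-s-s-i-s-s-i-p-p-i"
prompt = f"How many times does 's' appear in {spelled}?"
print(prompt)
print(word.count("s"))                        # ground truth: 4
```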
Due_Price_8624@reddit
Just try out the demo and you'll see how it performs; either there's an issue with the demo or with the model, but it definitely doesn't perform well.
Enderchef@reddit (OP)
That's ICONN Lite. You aren't chatting to our ICONN 1 model, you are chatting to a model we just started producing at 7B parameters.
vibjelo@reddit
If you're really committed to open source, openness and accessibility, are you also considering open sourcing the training code and what datasets you've used for the final weights you ended up with? I don't see any references to those anywhere.
silenceimpaired@reddit
If you’re pro open source, are you also considering donating to them?
Enderchef@reddit (OP)
We don't need anything. The open part of open source is being free and for everyone. I don't get some people's negative responses, though. Negative feedback is fine, but at least it's open source; ICONN could have been a closed-source LLM, and people aren't grateful that it isn't.
silenceimpaired@reddit
You may want to use the term “open weights” as some in this subreddit take open source to mean you give them everything but the hardware to reproduce what you’ve done.
Enderchef@reddit (OP)
Yeah. Got me there when 20 people wanted the datasets and code and stuff.
vibjelo@reddit
No, I have no idea who they are or what they're working for, or almost anything else. All I know is what I read on this reddit post + the HuggingFace page, so probably I wouldn't.
I am donating to others in FOSS though, some can be seen from my GitHub profile I think: https://github.com/victorb
Regardless, I don't think people donating within FOSS are the only ones who can be considered "pro open source"; you can contribute in many ways.
silenceimpaired@reddit
I’m just teasing you because your original comment sounded quite entitled to me.
In my opinion, the AI scene has too many who put too much distinction between “open source” and “open weights”.
I don’t think it’s a black and white distinction, but a grade of grays of “open”.
a_beautiful_rhind@reddit
How does it work with existing inference software? Per the config.json, it will only use 2 experts per token, like other doubled Mixtrals.
I remember the bagels and all of that stuff from a couple years ago and they worked similar.
Entubulated@reddit
FrankenMoE models like Bagel tend to lack properly trained routing layers, and their pasted-together expert sets were generally given little if any retraining after being sewn together. So, going with the entirely reasonable assumption that ICONN-1 was trained properly from the start, it should do much better than the Bagel series.
a_beautiful_rhind@reddit
While true, the config says to only use 2 experts, so that is what exllama or llama.cpp will do. It can be overridden, but I don't see any info on that.
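If the config really does follow Mixtral's schema, inspecting and overriding it would look roughly like this (untested sketch; the repo name comes from this thread and may not resolve):
```
from transformers import AutoConfig

# Inspect the routing settings baked into the released config.
cfg = AutoConfig.from_pretrained("ICONNAI/ICONN-1")
print(cfg.num_local_experts, cfg.num_experts_per_tok)  # expected: 8 and 2

# Override before loading; whether more experts helps without retrained
# routing layers is exactly the open question here.
cfg.num_experts_per_tok = 4
# model = AutoModelForCausalLM.from_pretrained("ICONNAI/ICONN-1", config=cfg)
```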
Entubulated@reddit
And it would likely require at least retraining the routing layers to show real improvement with a greater experts-used count. /shrug
Will be poking at it soon; ICONN-1 is downloaded and converting. I started the process before seeing mradermacher had quants posted already.
Enderchef@reddit (OP)
OFFICIAL MESSAGE
I sincerely apologize for the inconvenience. ICONN 1 is not functional right now. We predict it will be operational again in about 2 weeks to a month. I understand how frustrating this is (especially to us), and I want to let you all know that we are prioritizing the launch of ICONN Lite, which we aim to have ready in 1 to 2 weeks. Thank you for your patience and understanding during this time. I will provide another update on ICONN Lite in the coming weeks.
Entubulated@reddit
I don't use vLLM, so I can't comment on the issues others were having, but at first glance it's working for me under llama.cpp. First output is coherent, though I'm not sold on some of its reasoning in discussing a comment thread from The Register about AI cloud costs. Can post settings and outputs if you're interested.
a_beautiful_rhind@reddit
Worst case scenario you get a blast from the past. Haven't fired up any of those models in a while.
mentallyburnt@reddit
It seems to be a basic clown-car MoE using mergekit?
In the model.safetensors.index.json:
```
{"metadata": {"mergekit_version": "0.0.6"}}
```
So either you fine-tuned the models in post after merging [I've attempted this a long time ago; it's not really effective and there is a massive loss in training],
or you fine-tuned three models (or four? you say four models and reference the base model twice) and then created a clown-car MoE and trained the gates on a positive/negative phrase or keyword list for the trained "experts".
If either of these was done, this is not an original MoE or even a real MoE. At most this looks like 4 fine-tuned Mistral models in an "MoE" trench coat.
I do have a problem with the "ICONN Emotional Core": it's too vague and feels more like a trained classifier model that then directs the model to adjust its tone. Not something new.
Also, them trying to change all references from the Mistral arch to an ICONN arch in their original upload and then changing them back rubs me the wrong way, as the license (which was an ICONN license) now needs to reference Mistral's license, not Apache (depending on the models used).
I could be wrong though, please correct me if I am, but this seems like an existing project wrapped up and made glittery with sensational words so it looks like something new.
Ok-Nature-4502@reddit
Going through the commits, I found this graph, which was removed from the README. I have no idea what to make of it, but it appears to be some sort of benchmark.
https://i.postimg.cc/tgYmDzSZ/Untitled-1.png
mentallyburnt@reddit
HA! OK, good find.
Yeah, unless they drop something substantial to prove anything (like a research paper explaining how a $50k model beats SOTA models that literally cost MILLIONS or BILLIONS, built by researchers given unlimited money),
I'm pretty sure this is just a clout chase. Reflection 70B vibes.
Sudden-Variation-660@reddit
Spot on, people in this sub will hype up anything.
No_Afternoon_4260@reddit
The license, boys! Gives me a 404.
Enderchef@reddit (OP)
?
No_Afternoon_4260@reddit
When you click on the licence at the top of the model card it says ICONN, and when you click it you get a 404, which means there is no license in the files.
needthosepylons@reddit
I think you're also working on a "mini" version, right? The mini GGUF model card is created, but without the actual GGUF. I suppose it will follow soon-ish?
As a 3060 12gb peasant, I'll gladly give it a try!
Congrats, anyway.
Enderchef@reddit (OP)
Yes, it's coming soon.
RandumbRedditor1000@reddit
I can't wait, maybe i will finally have a friend
traficoymusica@reddit
I hope to hear about the lite version soon. Ty!
needthosepylons@reddit
Very nice!
fdg_avid@reddit
What are your training datasets? What was your training methodology? Did you use pretrained models? Did you merge models? If so, what was your merging methodology? Why have you published these ICONN models under 2 different Hugging Face accounts?
Enderchef@reddit (OP)
Check this blog for some of the training datasets: ICONN 1 Training Data. It was trained from scratch (read the model card), and it is published under 2 Hugging Face accounts because the Enderchef/ one was our beta and the ICONNAI/ one is our enterprise full release.
fdg_avid@reddit
The model card is very light on details. How many tokens for pretraining? How about mid-training? Any RL in post-training? What was the GPU setup for pretraining?
Enderchef@reddit (OP)
9x B100s for training. I will provide the rest later; ICONN 1 is new and we haven't finished writing up the details yet.
fdg_avid@reddit
9x B100s for this architecture would take at least 1 month to train on 1T tokens. To pretrain on fineweb would take you over 1 year.
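Rough math behind that estimate, with every number an assumption: training compute is about 6 x active parameters x tokens, so:
```
# Back-of-envelope training-time estimate (all numbers are rough assumptions).
active_params = 22e9      # 22B active parameters per token (MoE)
tokens = 1e12             # 1T training tokens
flops_needed = 6 * active_params * tokens      # ~1.3e23 FLOPs

gpus = 9
per_gpu_flops = 1.8e15    # ~1.8 PFLOPS dense BF16 per Blackwell-class GPU (rough)
mfu = 0.4                 # optimistic utilization
per_second = gpus * per_gpu_flops * mfu

print(f"{flops_needed / per_second / 86400:.0f} days")  # ~236 days
```
So "at least 1 month" is a generous floor.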
Enderchef@reddit (OP)
Not all of FineWeb! Read it again: Creative Commons snippets of FineWeb.
fdg_avid@reddit
Then that’s not pretraining because the Creative Commons subset would be tiny (200B tokens would be a generous maximum estimate). Did you initialize the weights randomly, or use pretrained weights?
Enderchef@reddit (OP)
That is not the only dataset!
fdg_avid@reddit
But the other datasets you list are not pretraining datasets.
vibjelo@reddit
The page says:
Hey, that's me! I'm contacting you now :) Instead of writing "...and many more!" please just share straight up all the sources publicly, maybe even all on that same page, especially if you want to make clear "Our dedication to openness" isn't just something you say.
After that I'd feel like the only missing piece is the training code, so someone could in theory replicate the results, even if not bit identical.
ROOFisonFIRE_usa@reddit
Would appreciate full detail on the datasets used and the training method. This is something sorely missing in the community, especially surrounding MoEs.
-
Looks interesting, will see if I can give it a go when I have a chance.
smflx@reddit
Congratulations, and thanks for opening it up.
Could you share details about data collection and training cost (GPU count and time)? $50,000 seems very small for the model size. Very interested to hear about the build details.
HelpfulHand3@reddit
Any thoughts on testing it with EQ-Bench? It's open source, so I believe you can test it yourself.
Enderchef@reddit (OP)
Once ICONN is back up.
Inevitable-Start-653@reddit
Okay WOW yes I'm interested. I'm downloading both models rn, and can run them locally. I'm very interested to see how many AI-isms I detect. I feel like many of the new "SOTA" models have a blandness to them.
Enderchef@reddit (OP)
OFFICIAL MESSAGE
I sincerely apologize for the inconvenience. ICONN 1 is not functional right now. We predict it will be operational again in about 2 weeks to a month. I understand how frustrating this is (especially to us), and I want to let you all know that we are prioritizing the launch of ICONN Lite, which we aim to have ready in 1 to 2 weeks. Thank you for your patience and understanding during this time. I will provide another update on ICONN Lite in the coming weeks.
Enderchef@reddit (OP)
Don't worry, ICONN Lite is coming soon: 1 to 2 weeks. ICONN 1 is bugged right now, so if you run it, it might give garbled results. I'm working on it.
Classic_Pair2011@reddit
Please try to get on openrouter if possible
_sqrkl@reddit
Hello!
I tried running this on vllm, but I'm getting garbled output:
Any suggestions?
Enderchef@reddit (OP)
OFFICIAL MESSAGE
I sincerely apologize for the inconvenience. ICONN 1 is not functional right now. We predict it will be operational again in about 2 weeks to a month. I understand how frustrating this is (especially to us), and I want to let you all know that we are prioritizing the launch of ICONN Lite, which we aim to have ready in 1 to 2 weeks. Thank you for your patience and understanding during this time. I will provide another update on ICONN Lite in the coming weeks.
_sqrkl@reddit
Ok, tried running in vllm, and also with transformers using the code provided in the model card. I'll preface these results by saying that new models often have issues to work out.
vllm sample outputs:
(note: the prompt was in English, so it responding in German is unexpected)
transformers sample output:
(it only returns 1 paragraph)
rookan@reddit
Even Q4_K_M has a size of 50GB. How am I supposed to run it locally?
poli-cya@reddit
What a strange question, like it's incumbent on them to tailor their model to your specific situation? Ungrateful and just weird.
And before you immediately assume poor performance, it's MoE so it should run relatively fine even with much of it in RAM or SSD. I have 16GB VRAM, 32GB of usable RAM much of the time and run MoEs larger than this with good enough performance to make them useful.
rookan@reddit
This is the LocalLLaMA subreddit, not a server-with-100GB-VRAM subreddit. Most people have 24GB at max.
colin_colout@reddit
I can run this quantized for possibly less than you spent on just your GPU.
I run my models on a $400 (on sale) 8845hs minipc's iGPU with CPU or even ssd offload for bigger models.
I spent a whopping $200 on 96gb 5600mhz dual channel RAM when I realized I can run larger MoEs at usable speeds.
I run 70gb+ models just fine, especially MoEs with small experts like Qwen3-30b (20-40tk/s depending on how full my context is or the quant I'm using). Heck, I can even run 150gb+ MoE like quantized Qwen3-235b and Maverick from ssd at a few tokens per second.
Get yourself a decent ryzen miniPC when it's on sale and try for yourself.
Otherwise learn to get the most out of your existing hardware with minimal upgrades... This whole sub is here to help if you decide to get serious about self hosting models.
mrtime777@reddit
I have 512GB RAM and 64GB VRAM at home and can run this model locally. So this subreddit is a fine place for this model: if it can be downloaded and run on anything, even at 0.1 t/s, it is local.
NormalFormal69420@reddit
A lot of users in here have 100 GB VRAM, a lot of users post pictures of their 3090 rigs.
Judtoff@reddit
nah, there are lots of us with 3x P40s or 3090s.
Enderchef@reddit (OP)
It's a MoE, so it should run if you have the RAM and any GPU (unless it's an old, bad one). If that doesn't work, we are producing a Lite model with 7B parameters, and others with 14B and 32B.
poli-cya@reddit
I don't agree that models that need 100GB VRAM don't belong here, but even then, if you could read, you would see that you can absolutely run MoEs of this size at usable speeds on 24GB or even less.
Do you post this on all the Qwen 235B, Scout, Maverick, DeepSeek, etc. posts?
skatardude10@reddit
How? Offload tensors (override tensors to CPU), specifically the MoE expert tensors, probably; you should be able to load all layers into 24GB of VRAM with tensor overrides, the way people have been running Qwen 235B on modest hardware like 12/16GB cards at decent speeds.
Environmental-Metal9@reddit
More ram and cpu offload?
Enderchef@reddit (OP)
Everyone is posting a lot here. I'm still answering questions, but if you could be polite, that would be great; negative feedback is fine, but keep it polite. If you like my model on Hugging Face, it would be a great help. Thank you for your feedback!
jacek2023@reddit
In my opinion, this all looks suspicious, but I try to keep an open mind so I'll see what comes of it
mk321@reddit
It will be significant. I can't wait!
New_Zucchini_3843@reddit
What languages other than English are available?🤔
Ex:French,German,Spanish,Korean,Japanese,Chinese....
Enderchef@reddit (OP)
Quite a few. We don't have an exact count, but I'd say French, Spanish, and more; I don't know them all.
mk321@reddit
Any chance of Polish? Did you use Polish training data (models like PLLuM, Bielik)?
New_Zucchini_3843@reddit
Okay, I will test it in my native language with high hopes.😊
vibjelo@reddit
Same question in a different way: Which languages have you tested and confirmed to be working?
adi1709@reddit
I don't like it when people say "most advanced human-like".
Which part of this is validated, exactly? I don't see any numbers.
I don't intend to belittle your work in any way; great job!
ElectronSpiderwort@reddit
Well, you know how humans are finicky, unreliable, and just plain bad a lot of the time?
OriginalTechnical531@reddit
LocalLLaMA is so desperate for good local models that it ignores all the warning signs and proceeds with unwarranted optimism...
fizzy1242@reddit
i think all big models are censored to a degree, but nothing a good system prompt can't handle
Pentium95@reddit
This might be really promising for RP too! How "censored" is it? How much effective context size has been tested? Is this considered an "instruct" model, good at following the prompt?
i can't wait to see proper benchmarks, like longbench or eq-bench!
Enderchef@reddit (OP)
Our model is not censored (that we know of); let me know if you find otherwise. It is nothing like DeepSeek with its censorship, and we have made sure to keep it uncensored unless something is dangerous or harmful.
Our model is considered an "instruct" model, and it is great at following a prompt.
ICONN 1's context size is 32,768 tokens. For ICONN 2, when it comes out, we hope to have a version that takes a 1M-token context in exchange for a larger parameter count, and we are working on a new architecture that supports infinite context via our own method, called RREEL.
some_user_2021@reddit
"uncensored unless it's dangerous or harmful"? So it won't be able to tell me how to build a nuclear facility? 😞
Enderchef@reddit (OP)
😆
Pentium95@reddit
32k is fair; 1M with SOTA attention? Really promising!!
Thanks a lot for the good news!
Sadly, I only have 48GB VRAM and I'm not sure I can run this properly. I think I need to wait for 2.5 BPW EXL3 quants; for now I'm going to try Berto's iMatrix GGUF quants as soon as I can. I hope I can handle IQ3_M.
ArsNeph@reddit
Sounds unique, you've got my attention. If the data is human-like, then it might be quite good for creative writing use cases. But if you're going to claim the most advanced, where are the benchmarks? You should compare it to models like Llama 4 Scout, Nemotron 49B, and Qwen 3 32B. You don't have to compare STEM, since that's not your focus, but you should definitely compare stuff like MMLU and SimpleQA.
Also, I see that there are already some quants up, but is this architecture properly supported by llama.cpp?
Enderchef@reddit (OP)
Yes, it is! mradermacher has done static and imatrix quants.
C1oover@reddit
Anything to back up the claim that it’s the most humanlike? Not to sound too skeptical but would be interesting.
Judtoff@reddit
By any chance would there be a recommended quant for 3x RTX 3090 (i.e. 72GB VRAM)? Especially if we wanted to take advantage of the full 32,768 context?
fizzy1242@reddit
I'd assume it's supported
ArsNeph@reddit
I would assume that too, but it's the word "custom" that worries me. Even models that claim to have day 1 support for llama.cpp from the official team, like Gemma 3, tend to have inference bugs, so I've learned it's better to assume all models are unsupported until llama.cpp contributors like Unsloth say otherwise
Enderchef@reddit (OP)
Don't worry! We've run our model on llama.cpp, and mradermacher has done static and imatrix quants.
jacek2023@reddit
https://huggingface.co/mradermacher/ICONN-1-i1-GGUF
https://huggingface.co/mradermacher/ICONN-e1-i1-GGUF
https://huggingface.co/mradermacher/ICONN-1-GGUF
https://huggingface.co/mradermacher/ICONN-e1-GGUF
atape_1@reddit
Well that was incredibly quick.
Enderchef@reddit (OP)
They started it a day before I announced the model. I made the model and requested quants before announcing.
Entubulated@reddit
This is the way :-) Also, thanks for actually posting inference settings in the model card, something others don't always do.
DepthHour1669@reddit
I hope this thing uses MLA to reduce context VRAM, because nobody has a spare B100 lying around.
a_beautiful_rhind@reddit
EXL2 it and quantize the context. Everyone complained about the original Command R and it just used normal memory.
Enderchef@reddit (OP)
Sorry, but this model is 92B parameters. You can chat with it on the Hugging Face space, and we are releasing a Lite version soon with 7B parameters; it's currently in testing for strange errors. If you want to chat with it anyway, you can react to our provider support request with an emoji at huggingface/InferenceSupport · ICONNAI/ICONN-1.
DepthHour1669@reddit
… this post says 88b. You should fix your post.
Enderchef@reddit (OP)
Sorry, I meant this model is 82B parameters.
needCUDA@reddit
Ollama link?
Enderchef@reddit (OP)
We don't have it there, but Ollama can run the mradermacher GGUFs.
Jexiel54@reddit
Put it on replicate
Enderchef@reddit (OP)
Vote for inference support and comment "replicate".
Substantial_Gate_161@reddit
Does it run on vllm?
Hurricane31337@reddit
Really nice! Does it support tool calling, too?
Enderchef@reddit (OP)
It should; we have not tested it yet.
vibjelo@reddit
I think in general when people ask "does it support tool calling?" people are asking if it was trained with tool calling in mind, otherwise it's a lot worse at it. If you didn't test for tool calling, one could probably assume you didn't train it with tool calling in mind either?
Enderchef@reddit (OP)
It was not trained with tool calling in mind, but if you modify the system prompt it can handle it.
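A minimal sketch of that prompt-based approach (the schema, names, and mocked reply are assumptions, not a tested ICONN recipe):
```
import json

# Describe a tool protocol in the system prompt, then parse the reply.
system = (
    "You can call tools. To call one, answer with ONLY a JSON object like "
    '{"tool": "get_weather", "arguments": {"city": "..."}} and nothing else.'
)
user = "What's the weather in Oslo?"

reply = '{"tool": "get_weather", "arguments": {"city": "Oslo"}}'  # mocked model output

call = json.loads(reply)
if call.get("tool") == "get_weather":
    print("would call get_weather with", call["arguments"])
```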
medialoungeguy@reddit
!remindme
RemindMeBot@reddit
Defaulted to one day.
I will be messaging you on 2025-06-20 22:54:21 UTC to remind you of this link
CLICK THIS LINK to send a PM to also be reminded and to reduce spam.
(Parent commenter can delete this message to hide from others.)
You_Wen_AzzHu@reddit
I have a few issues with the GGUF 4-bit:
constant repeating and extremely slow speed (compared to Qwen3 235B 4-bit), with and without expert offload.
Signal_Specific_3186@reddit
I know this is LocalLLaMA but is there anywhere online we can demo it?
Enderchef@reddit (OP)
Sorry, not yet; we want to, though. If you could like ICONNAI/ICONN-1 and react with an emoji to vote for provider support, we could get there fast!
MediocreBye@reddit
How do you expect something like the DGX Spark to handle this?
Enderchef@reddit (OP)
I'm not sure. We've only run it locally on transformers/torch, and loaded it as GGUF in llama.cpp.
FriskyFennecFox@reddit
Is the base version available?
Enderchef@reddit (OP)
Our base model IS the instruct model. We didn't want to spend over $50,000, so we made the instruct model the base model. Don't worry, performance isn't affected.
HelpfulHand3@reddit
I'm surprised your demo system prompt has it acting more as an assistant than a conversational partner. Do you see this being used for companion AI such as character chat or the backend LLM of a voice interface (like Kyutai's Unmute), or more of a general assistant with a high EQ?
Environmental-Metal9@reddit
Thanks for the link to the demo! OP, the iframe CSS breaks mobile scrolling on iOS, just FYI. It makes the advanced params get cut off at the bottom, and you can't scroll because the dropdown is inside the iframe with no scroll. If scroll bars are so undesirable, could you consider an alternative solution, like not iframing the space?
Enderchef@reddit (OP)
It depends on the use case. The reason the demo system prompt has it act like an assistant is that when we took a poll, about 96% of people said they would accept the model if they saw it more as an assistant in the preview. It is super flexible, and you can easily do things with it.
Leflakk@reddit
Love to see new models like this; we never support this kind of work enough.
Enderchef@reddit (OP)
Thank you! If you could like the model, we want to get onto the Trending page so that we can reach more people and get our 7B lite model going!
JMowery@reddit
Very interesting!
But you kinda lost me when you show a bar chart on your HF page without any axes. What is that about? It almost looks like scammer-level deception.
Please remove the bar chart entirely, or actually create a bar chart that makes statistical and logical sense.
Enderchef@reddit (OP)
Sorry about that! I fixed it.
JMowery@reddit
Appreciate it!
jacek2023@reddit
previous discussion: https://www.reddit.com/r/LocalLLaMA/comments/1lfd7e2/has_anyone_tried_the_new_iconn1_an_apache/
SlavaSobov@reddit
Really cool! Congratulations!! 💕
More open source models are always appreciated.
You_Wen_AzzHu@reddit
Who the fuck is down voting everyone? Are you jealous?
You_Wen_AzzHu@reddit
Please push for a llama.cpp PR.
Enderchef@reddit (OP)
Could you elaborate?
appakaradi@reddit
Congratulations. +1 for open source.
Eager to see the benchmarks.
fizzy1242@reddit
interesting, downloading right now