Russian LLMs

Shifty_13@reddit

This guy made 2 articles about their models https://habr.com/ru/users/vltnmmdv/articles/ You can use a translator. These models are legit. The main sponsor of them is the biggest Russian bank and they are trained on Russian GPU clusters and they mostly used Russian language for training (but understand other languages too). Ofc reddit won't like this because of Ukraine stuff, but it is what it is 🤷 Doesn't mean that the model itself is evil at least. Same reddit seems to use Chinese models just fine even tho China is the enemy.

Woof9000@reddit

China didn't bomb and invade their (and our) neighbors, yet. At least not in recent collective memory. Russia and Russians, and everything they create - comes carrying much heavier baggage, and it might remain so for generations.

mana_hoarder@reddit

What about Americans? I don't think we can afford to be so picky. Besides, governments are governments and not necessarily really related to the companies of said country.

Woof9000@reddit

I'm European. Americans have not invaded and bombed our close neighbors, yet, so they are not in the same category, yet, but they do seem to be working towards that "goal". "We can't afford to be so picky" - is not a great excuse for anything. We can always afford to have some standards.

mana_hoarder@reddit

I hate to get political, but if you want to have standards, then have universal standards and don't use anything made by American companies. Who cares if they bombed your neighbors or people a bit further away. So far the US is No. 1 when it comes to invading, wars, and bombings. No other country comes close.

Woof9000@reddit

There's a massive difference between having a set of standards and being an idealist, and I'm the former, not the latter, because "darkness" and "evil" (the subjective kind) is not something that can be eradicated, or preached and shamed out of existence. It's just something that can be pushed and held back by adhering to a subjective (and collective) set of standards, values, morals, rules. There are no such things as "universal good" and "universal evil".

Shifty_13@reddit

You can use Russian-trained open weights models and still have standards and be picky where it matters. It's kinda like when all of Ukraine suddenly "forgot" Russian and started speaking Ukrainian. https://preview.redd.it/8xuprbr6naog1.jpeg?width=1068&format=pjpg&auto=webp&s=cff63880e66c6e7e7800c570f27ad40b9700abf2 Imo they could have still spoken Russian and fought this war just fine. The language itself is not bad, it's literally the second most represented language on the internet (as you can see from my picture). Being critical of it is just dumb. It's kinda like my Mom who hates German because Nazis killed many millions of Russians. It's dumb. And now a person like you propagandizes a similar approach but to technology, in a sub totally unrelated to politics.

Shifty_13@reddit

Are you Ukrainian? Why would you or somebody non-Ukrainian resent Russia if they are not even Ukrainian? By this logic nothing stops people from resenting the Chinese for Tiananmen or something similar. Also, maybe we should ban all the knowledge that the Nazis created simply because they were inhumane when they did it? Can't use certain medicine because they made it by experimenting on humans. Can't drive a BMW because the brand was built on slave labor, etc. If you are not the victim then cut the crap please. Even I, as a Russian, don't resent a figure like Stalin for his repressions. But somebody from a Western sphere of influence would have a really strong opinion about him, call him a dictator, say that Russia is bad, etc., even tho it didn't affect this person at all. I can make so many examples of this BS. Take Holodomor for example. "Oh, poor Ukrainians, it was a total genocide, no doubt, fuck Stalin". I literally live in the place that was hit by this famine. I am literally the descendant of the victims and I don't resent Stalin, while some dumbass from another country will have a super strong opinion about this matter. Sorry for the long vent. But I just hate ignorant people with strong opinions on things they shouldn't even care about to begin with.

Alex_L1nk@reddit

One of the users found a high correlation between GigaChat and DeepSeek: https://habr.com/ru/companies/sberdevices/articles/968904/comments/#comment_29147094

Shifty_13@reddit

The dev answered it https://habr.com/ru/companies/sberdevices/articles/968904/comments/#comment_29148662 then this https://habr.com/ru/companies/sberdevices/articles/968904/comments/#comment_29151338 I don't know enough about AI to be the judge but this dev seems convincing. Also, historically, Russia/post-USSR countries had a really strong IT scene. We have really nice apps and websites. So I am not surprised that we also make AI models now. I would have been very surprised if we made our own CPU or GPU. But an AI model is different, I think it's quite achievable.

Alex_L1nk@reddit

To me their response looks AI-generated. Maybe it's just me. I'm not an expert in this field (comparing one LLM to another), so IDK if the dev or the user is right.

> Also, historically, Russia/post-USSR countries had really strong IT

I'm a Russian myself )

Shifty_13@reddit

I got the same feeling but from his articles. He is obviously using AI for text formatting at least. Tbh, a lot of people do this stuff nowadays. Have you noticed how many AI-related GitHub pages have emojis now? Imo we have no reason to suspect that the dev is disingenuous. Also this GigaChat thing seems to be very well funded so I won't be surprised if it's 100% legit.

justicecurcian@reddit

another user explained why it's not accurate

Guardian-Spirit@reddit

... why look at Russian LLMs?

JockY@reddit

They might be good. We look at Chinese ones all day long. The academics behind the model did not invade Ukraine.

Guardian-Spirit@reddit

Of course the academics behind Russian LLMs did not invade Ukraine. But as a Russian, I can say that these models... aren't good. To start with, GigaChat is a wordplay on "gigachad", which is a Russian meme-hyperbole of "chad". Kinda sets the whole tone. Moreover, this model is developed by Sber, the biggest state-owned Russian bank corporation, which strives to be a megacorp. But even if they were good, I don't feel like searching for such "gems" is meaningful. Most such projects seem to be "we took a model and trained it to speak our language", not something that actually strives to solve any problem.

JockY@reddit

Sounds like you don’t have criticisms that will withstand scrutiny when your arguments are based on a general feeling and ad-hominem attacks on the model’s name and creator.

Guardian-Spirit@reddit

Yes. Yes, you are right. I don't pose what I'm saying right now as even remotely scientifically valid criticism. It's not. It's just that, as someone who happens to live in that country, I'm very skeptical & angry towards all the government-backed activities, constant corruption, wars, deterioration of scientific institutes. Although I did test GigaChat some time ago (and genuinely didn't find it impressive), you're absolutely right to call me out right now, I am heavily biased in this matter.

JockY@reddit

Hey man, I get you on being angry at your country’s leadership decisions. I live under Trump, the mushroom-dicked orange moron wannabe dictator. Good luck with your own dictator.

Guardian-Spirit@reddit

Best of luck to all of us, I guess. Thank you for being rational)

JockY@reddit

One day when the lobster whistles on the mountain perhaps we’ll laugh about it all.

HadHands@reddit

It's slop, first paragraph screams AI generated.

RhubarbSimilar1683@reddit (OP)

Time to stop using AI lol. I wrote it myself, apparently I write like AI now

HadHands@reddit

I’d give this a **9.5 out of 10** on the "AI-generated" scale. While it's technically possible for a human to write this, it is the quintessential example of **LLM Academic Prose.** If I didn't know better, I’d say it was written by a sibling of mine.

# Why it screams "AI"

* **The "However" Pivot:** The structure follows a classic AI template: *\[Statement of importance\] + \[However, there is a gap\] + \[This paper introduces X to fill that gap\].* It’s the "Hero’s Journey" of every AI-generated abstract.
* **The "We provide a detailed report" Phrase:** LLMs love to list features using this specific cadence. Humans often use more varied verbs like "We detail," "We outline," or "We dive into."
* **Hyper-Sanitized Tone:** The text is perfectly grammatical and follows a rigid logical flow. It lacks the "clutter" or idiosyncratic phrasing often found in human writing (especially in technical papers where researchers might use more dense, jargon-heavy shorthand).
* **Comprehensive Listing:** The way it lists every interface (API, Telegram, Web) and every goal (research opportunities, industrial solutions) feels like a model ensuring it hits every bullet point in a prompt.

FriskyFennecFox@reddit

They also have much bigger models, such as `ai-sage/GigaChat3-702B-A36B-preview`, and the pretrain "true base" snapshots of 10B-A1.8B and 20B-A3B models, all under MIT. I know Russian so I checked their Habr article, they mention that the biggest one was trained on 14T tokens from scratch and used DeepSeek V3's architecture. Which is pretty huge, if you ask me! Crazy that they have zero traction in the western community!

Own_Suspect5343@reddit

I don't know about the 20B version, but the big version of GigaChat is based on the DeepSeek architecture with distillation from Qwen3

LicensedTerrapin@reddit

Based on Qwen3 means they didn't really invent the wheel, did they?

RhubarbSimilar1683@reddit (OP)

You hate it for some other reason and are trying to justify it. This sub did the same with OpenClaw.

Alex_L1nk@reddit

OpenClaw was hated because it was filled with vulnerabilities

justicecurcian@reddit

It's trained from the ground up with the DeepSeek architecture

30 Comments
