TheaterFire

Cohere releases Aya Expanse multilingual AI model family

Posted by umarmnaq@reddit | LocalLLaMA | 40 comments


40 Comments

jd_3d@reddit

Looks really strong based on the win rates. Too bad it's licensed for non-commercial use only.

nodating@reddit

Why does anyone care about that? There is no way to prove the origin of an LLM's outputs. Exactly zero ways to do that. As far as I'm concerned, they only put that clause in so the tech is "protected" in case some psycho goes and does something highly illegal with it.

Koksny@reddit

> There is no way to prove the origin of LLM's outputs.

Until you get sued into discovery.

Serveurperso@reddit

LMAO. Wrong, that's pure Luc Julia.

LoafyLemon@reddit

What you're describing is currently impossible. Fingerprinting techniques are being developed, but no model uses them yet. Besides, this whole fingerprinting idea is dumb and won't hold up, because it only works if every model uses it, and we know that won't happen: people will keep using other models that break this metric simply by existing outside of it.

Koksny@reddit

We were able to fingerprint models even a year or two ago, just by asking the model for a random number in the range 0 to n. It was accurate enough to pinpoint the particular release version. Remember, fingerprinting isn't about telling model output apart from human-written text; it's about recognizing the model given a large enough sample of its output. And it's far from impossible, since I think everyone here has experienced model slop, and that slop is often unique to a particular model or fine-tune.
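
The random-number test described above can be sketched as a distribution comparison. This is a minimal illustration, assuming you already have many sampled answers from each known model: build a histogram of each model's answers and attribute an unknown sample to the model with the closest histogram, here measured by total variation distance (one reasonable choice of distance, not necessarily what anyone in the thread used). The model names and sample data are made up; real fingerprinting would need many more samples and prompts.

```python
from collections import Counter


def histogram(samples, n):
    """Normalized frequency of each answer 0..n; out-of-range answers are dropped."""
    counts = Counter(s for s in samples if 0 <= s <= n)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no in-range answers to build a histogram from")
    return [counts[i] / total for i in range(n + 1)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions over the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def identify(unknown_samples, reference_samples, n):
    """Attribute a sample to the reference model with the closest answer distribution."""
    target = histogram(unknown_samples, n)
    scores = {
        name: total_variation(target, histogram(samples, n))
        for name, samples in reference_samples.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical answers to "Give me a random number from 0 to 9", 100 samples each
references = {
    "model-a": [7] * 60 + [3] * 25 + [5] * 15,
    "model-b": [4] * 50 + [7] * 30 + [2] * 20,
}
unknown = [7] * 55 + [3] * 30 + [5] * 15

best, scores = identify(unknown, references, n=9)
print(best, {name: round(d, 2) for name, d in scores.items()})
```

The same comparison works for any prompt with a small answer space; the weakness raised in the replies (models that were never sampled as a reference) shows up here as the unknown sample simply being matched to the least-bad known model.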

LoafyLemon@reddit

Fingerprinting is about discerning between models in the wild, not in a closed-off setting where you directly ask the model to output a number. Text models are trained on natural language, so this idea is bound to fail simply because people (that's us, yay!) write in the same style as large language models, since the models are trained on our creativity. The 'GPT slop', as you call it, can be reduced in a model to the point where your fingerprinting test would fail. Mind you, not all models are public, and the mere existence of models outside the spectrum would make the idea fail outright, because, as I said before, they exist outside the metric.

silenceimpaired@reddit

Not true. They just released a paper on digital watermarks... and people also have consciences :)

silenceimpaired@reddit

I always downvote posts about this company for that reason.

appakaradi@reddit

That sucks.

MasterThread@reddit

It's the best model I've found for interactive storytelling, but it only supports 8k tokens of context. Any alternatives?

Quiet_Joker@reddit

I don't get why people are hating on this model; it's literally meant to translate from one language to another, not designed with tasks like coding in mind. I compared it with the previous version of the model, which was just named Aya, and this one is significantly better when it comes to translation.

first2wood@reddit

Hey, have you done any tests, or are you already using it as a translator? How does the quality compare with Google Translate and the GPT-4 series?

Nekotekina@reddit

Not OP, but I just tested it and it seems better than Gemma-2 27B at translation. I'll test more; I'm intrigued so far.

first2wood@reddit

Yes, I tested too. It's quite decent as a translator!

Nekotekina@reddit

Gemma remains a tough opponent, though. Its output looks more stable and solid, but at the same time Aya seems more vivid and witty about nuances. Qwen2.5 (I tested 32B) seems the worst: very unstable and quirky, though its writing may look interesting. If I cherry-pick recognition of a "word stutter" ("s-stutter"):

* Gemma2 may completely ignore the stutter in translation (seems okay).
* Aya Expanse handles it, but not reliably.
* Qwen2 produces unnecessary transliteration of language fragments (very bad, in my opinion).

I wish I could combine the strong sides of them all...

MRGRD56@reddit

The 32B model is also good for writing, at least in English; it seems coherent and creative enough. It wasn't that good at reasoning, though: I gave it a riddle/problem I made up myself that Qwen 2.5 32B solved with correct reasoning (even Qwen 2.5 14B can sometimes solve it), but Aya's reasoning was wrong. So it can't be a perfect ChatGPT alternative ("helpful assistant"), but it can be really good at writing, storytelling, roleplaying and, of course, working with different languages.

The 8B one was okay, but it didn't surprise me much. I've tested it a bit, and I think it should be alright for translation tasks.

Dyonizius@reddit

It also uses GQA (grouped-query attention), though 8k context for a translation model leaves a bit to be desired.
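
Why GQA matters for context length: with grouped-query attention, several query heads share one key/value head, so the KV cache grows with the number of KV heads rather than the number of query heads. A minimal sketch of the standard KV-cache size formula follows; the layer and head counts are hypothetical values chosen for illustration, not Aya Expanse's published architecture, and the cache is assumed to be stored in fp16.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Keys + values (factor 2) for every layer, KV head and token position."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical config: 40 layers, 64 query heads of dimension 128, 8k context
layers, q_heads, head_dim, ctx = 40, 64, 128, 8192

# Multi-head attention: every query head keeps its own K/V
mha = kv_cache_bytes(layers, q_heads, head_dim, ctx)
# GQA: 8 K/V heads, each shared by a group of 8 query heads
gqa = kv_cache_bytes(layers, 8, head_dim, ctx)

print(f"MHA: {mha / 2**30:.2f} GiB, GQA: {gqa / 2**30:.2f} GiB, ratio {mha // gqa}x")
```

Under these assumptions the cache shrinks by the group size (8x), which is what makes longer contexts affordable on a single GPU; it does not by itself raise the model's trained context limit.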

dubesor86@reddit

I ran the models through my [personal benchmark](https://dubesor.de/benchtable): very weak for their size compared to the competition, and not worth the storage space, imo.

* Aya Expanse 8B (f16): failed pretty much everything and was around Llama 3.2 3B capability.
* Aya Expanse 32B (Q4_K_M): weaker results than even Gemma 2 9B and Nemo 12B in my testing. It would be OK as a 12B model, since it's fairly uncensored. Gets absolutely stomped by Qwen2.5.

As always, YMMV! But I'm deleting the models again.

MaycombBlume@reddit

The emphasis with this model is on its multilingual capabilities. Are your tests relevant to that domain?

dubesor86@reddit

While I do have some multilingual tasks, they're not the focus by any means. The marketing, however, claims an above-50% win rate against models with no emphasis on multilingual capabilities, including in the English-specific win rates (link and charts are in the OP). Either way, to me it's not really relevant, as I test each model's overall skill set regardless of its intended use (coders, tiny lightweight models, etc.).

walrusrage1@reddit

Thoughts on Nemo in general? I see it also ranks higher than GPT-3.5 on your evaluation table.

dubesor86@reddit

For 12B it's very good. I was surprised that it managed to do so well in my coding segment, and it's obviously far less censored than Gemma 2.

appakaradi@reddit

Why is no one comparing against Qwen 2.5?

Xhehab_@reddit

https://twitter.com/johnamqdang/status/1849883876245516594

dubesor86@reddit

Those would be insane win rates; my own testing of these models puts Aya dead last, but oh well. The low number of ties is also quite surprising.

https://preview.redd.it/jndwbizpbbxd1.png?width=661&format=png&auto=webp&s=d48e2220b8c1c47290e9c04fb1f7d9b60888df44

Healthy-Nebula-3603@reddit

That model is DESIGNED for TRANSLATION only. The win rates refer to translation.

dubesor86@reddit

Where do you get the info that it's designed for 'TRANSLATION only'? There are certainly no such claims in their announcements. Either way, that's not relevant for my testing, which covers all aspects of a model regardless of what it's 'DESIGNED for'.

Healthy-Nebula-3603@reddit

Did you even read the README on their Hugging Face page?

BlueSwordM@reddit

Oh, very nice. I'm honestly surprised Qwen 2.5 is "better" than Gemma2 in this regard, since Gemma2 has been top tier in terms of language performance for me.

fungnoth@reddit

I'm also surprised. Maybe it's because the comparisons are mostly short conversations. In my experience using the Q4 quant, longer contexts make it randomly output Chinese, as if trying to translate the previous sentences.

appakaradi@reddit

Qwen is a tough opponent

AdSuperb3336@reddit

8k context length

isr_431@reddit

Hasn't this been out for a week already? I've been using it in Ollama since then. Gemma 2 9B is better at translation but supports fewer languages. Qwen 2.5 is still the best for most Asian languages.

nodating@reddit

Yeah, but to be fair, it kinda went under the radar. That's understandable, as its main strength is multilingual use, and we all know most Americans can barely handle English (no offense). There are also just better models out there already if all you care about is English use cases.

staladine@reddit

Open source, or another non-commercial release?

jd_3d@reddit

Non-commercial.

ortegaalfredo@reddit

Since it surpasses Qwen-32B, it's likely the best model for anyone with a single 24GB card.
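
Whether a 32B model fits on a 24GB card mostly comes down to weight size at a given quantization. A rough sketch, assuming a nominal 32B parameters and treating bits per weight as an approximate input (real GGUF quants such as Q4_K_M mix block formats, so their exact average differs). It counts the weights only; the KV cache, activations and runtime overhead all eat into whatever headroom remains.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal 32B parameters; 4.5-5 bpw as a rough stand-in for 4-bit quants
N_PARAMS = 32e9
VRAM_GB = 24

for bpw in (4.5, 5.0, 8.0, 16.0):
    gb = weight_memory_gb(N_PARAMS, bpw)
    print(f"{bpw:>4} bpw: {gb:5.1f} GB of weights, fits in {VRAM_GB} GB: {gb < VRAM_GB}")
```

Under these assumptions only the roughly 4-bit variants leave room on a 24GB card, which is why "single 24GB card" and "32B at Q4" tend to go together in these discussions.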

Easy_Try_1138@reddit

It's really good at Arabic, and good at translating text, which is so important; few models can do that.

sunshinecheung@reddit

It's a pity there's no 14B.