Why does anyone care about that? There is no way to prove the origin of an LLM's output. Exactly zero ways to do that.
As far as I am concerned, they only use that phrase so this tech is "protected" if some psycho goes and does something highly illegal.
What you're describing is currently impossible. Fingerprinting techniques are being developed, but no model uses them yet.
Besides, this whole fingerprinting idea is dumb and won't hold up, because it only works if all models use it, and we know that won't happen. People will keep using other models that break this metric simply by existing outside of it.
We were able to fingerprint models even a year or two ago, just by asking the model to give a random number in the range 0 to n. It was accurate enough to pinpoint the particular release version.
Remember, fingerprinting isn't about being able to tell a model from non-generative text. It's about being able to recognize the model given a large enough sample of output. And it's far from impossible, since I think everyone here has experienced model slop - slop that is often unique to a particular model or fine-tune.
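For anyone curious how the random-number trick could be scored, here's a minimal sketch, assuming you have already collected each model's answers as integers. The function names, the sample data, and the choice of Jensen-Shannon divergence are all my own assumptions for illustration, not what anyone actually ran.

```python
from collections import Counter
import math


def distribution(samples, n, alpha=1.0):
    """Laplace-smoothed probability of each integer in 0..n."""
    counts = Counter(samples)
    total = len(samples) + alpha * (n + 1)
    return [(counts.get(k, 0) + alpha) / total for k in range(n + 1)]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def closest_model(unknown, references, n):
    """Return the reference whose number distribution is nearest to `unknown`."""
    u = distribution(unknown, n)
    scores = {
        name: js_divergence(u, distribution(samples, n))
        for name, samples in references.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical data: answers to "pick a random number from 0 to 10".
references = {
    "model_a": [7] * 60 + [3] * 25 + [5] * 15,
    "model_b": [4] * 50 + [7] * 30 + [9] * 20,
}
unknown = [7] * 55 + [3] * 30 + [5] * 10 + [1] * 5

best, scores = closest_model(unknown, references, n=10)
print(best, scores)
```

The idea is just that each model has its own skewed "favorite numbers", so with enough samples the histogram of an unknown source can be matched to the nearest known one. It only works against references you have already collected.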
Fingerprinting is about discerning between models in the wild, not in a closed-off setting where you directly ask the model to output a number.
Text models are trained on natural language, so this idea is bound to fail simply because people (that's us, yay!) write in the same style as large language models, which are trained on our creativity.
The 'GPT slop', as you call it, can be reduced in a model to the point where your fingerprinting test would fail. Mind you, not all models are public, and the mere existence of models outside the spectrum would make the idea fail outright, because, as I said before, they exist outside the metric.
I don't get why people are hating on this model; it's literally a model meant to translate from one language to another, not a model designed with tasks like coding and other stuff in mind. I compared this with the previous version of the model, which was just named Aya, and this is a significantly better version than the previous one when it comes to translating stuff.
Gemma remains a tough opponent though. Its output looks more stable and solid, but at the same time, Aya seems more vivid and witty about nuances. Qwen2.5 (32B tested) seems the worst: very unstable and quirky, though its writing may look interesting. If I cherry-pick some "word stutter" recognition ("s-stutter"):
* Gemma2 may completely ignore the stutter in translation (seems okay).
* Aya Expanse works but not reliably.
* Qwen2 produces unnecessary transliteration of language fragments (very bad in my opinion).
I wish I could combine strong sides from them all...
The 32B model is also good for writing, at least in English; it seems coherent and creative enough. It wasn't that good at reasoning, though: I gave it a riddle/problem (I made it up myself) that Qwen 2.5 32B was able to solve with correct reasoning, and that even Qwen 2.5 14B can sometimes solve, but Aya's reasoning was wrong. So it can't be a perfect ChatGPT alternative ("helpful assistant"), but it can be really good at writing, storytelling, roleplaying, and of course working with different languages.
The 8B one was okay, but it didn't surprise me much. I've tested it a bit, and I think it should be alright for translation tasks.
I ran the models through my [personal benchmark](https://dubesor.de/benchtable); very weak for their size compared to the competition, not worth the storage space imo.
Aya Expanse 8B (f16) - failed pretty much everything and was at around Llama 3.2 3B capability.
Aya Expanse 32B (Q4_K_M) - weaker results than even Gemma 2 9B & Nemo 12B in my testing. It would be OK as, like, a 12B model, since it's fairly uncensored. Gets absolutely stomped by Qwen2.5.
As always, YMMV! - but I'm deleting the models again.
While I do have some multilingual tasks, they're not the focus by any means. The marketing, however, claims an above-50% win rate against models with no emphasis on multilingual capabilities, and also in the English-specific win rates. (Link and charts are in the OP.)
However, to me it's not really relevant, as I test each model's overall skillset regardless of its intended use (coders, tiny lightweight models, etc.).
Those would be insane win rates; my own testing of these models puts Aya dead last, but oh well. The low number of ties is also quite surprising.
https://preview.redd.it/jndwbizpbbxd1.png?width=661&format=png&auto=webp&s=d48e2220b8c1c47290e9c04fb1f7d9b60888df44
Where do you get the info that it's designed for 'TRANSLATION only'? There are certainly no such claims in their announcements. Either way, that's not relevant for my testing, which tests all aspects of a model regardless of what it's 'DESIGNED for'.
Oh, very nice.
I'm honestly surprised Qwen 2.5 is "better" than Gemma2 in this regard since Gemma2 has been top tier in terms of language performance for me.
I'm also surprised. Maybe it's because the comparisons are mostly short conversations. In my experience using the q4, longer contexts make it randomly output Chinese, trying to translate the previous sentences.
Has this not been out for a week already? I've been using it in Ollama since then. Gemma 2 9b is better at translation, but supports fewer languages. Qwen 2.5 is still the best for most Asian languages.
Yeah, but to be fair it kinda went under the radar. That's understandable, as its main strength is multilingual use, and we all know that most Americans can barely handle English (no offense). There are just better models out there already if all you care about is English use-cases.
40 Comments
jd_3d@reddit
nodating@reddit
Koksny@reddit
Serveurperso@reddit
LoafyLemon@reddit
Koksny@reddit
LoafyLemon@reddit
silenceimpaired@reddit
silenceimpaired@reddit
appakaradi@reddit
MasterThread@reddit
Quiet_Joker@reddit
first2wood@reddit
Nekotekina@reddit
first2wood@reddit
Nekotekina@reddit
MRGRD56@reddit
Dyonizius@reddit
dubesor86@reddit
MaycombBlume@reddit
dubesor86@reddit
walrusrage1@reddit
dubesor86@reddit
appakaradi@reddit
Xhehab_@reddit
dubesor86@reddit
Healthy-Nebula-3603@reddit
dubesor86@reddit
Healthy-Nebula-3603@reddit
BlueSwordM@reddit
fungnoth@reddit
appakaradi@reddit
AdSuperb3336@reddit
isr_431@reddit
nodating@reddit
staladine@reddit
jd_3d@reddit
ortegaalfredo@reddit
Easy_Try_1138@reddit
sunshinecheung@reddit