Gemma2 2B IT is the most impressive small model I have ever seen.
Posted by Discordpeople@reddit | LocalLLaMA | View on Reddit | 157 comments
Somehow this small little model behaves like a creative 7B that writes stories so much better than Llama3.1 8B IT. It's smaller than Phi-3 Mini and yet I prefer Gemma2 2B IT over it. What's your opinion on it? Do you agree with me?
joey2scoops@reddit
can someone tell me the difference between gemma-2-2b and gemma-2-2b-it?
Possible-Moment-6313@reddit
What parameters did you use to test it (temperature, repetition penalty, etc.)? In my testing, it was quite dumb, repeatedly asking the same questions or asking questions about something for which the answer should be obvious from the previous context.
FlameOfIgnis@reddit
I'm working on a research paper and I was playing around to see if gemma-2b-it could be suitable to test our modifications on.
I feel like calling it dumb might be an understatement for some of the examples I tried LOL
kali_tragus@reddit
What quant is that? Even the q5_k_s I run on my phone responds correctly to that query. Then again, it insists that there are 7 Rs in 'strawberry'... The instance on Hugging Face correctly counts three, though, so I guess quantisation is quite hurtful for this tiny model.
FlameOfIgnis@reddit
As dumb as a wet rock on hf demo as well for me
Background-Quote3581@reddit
Hmm, I find this much less dumb than the typical responses given by sota models to this question.
FlameOfIgnis@reddit
The problem here is that SOTA models have larger token dictionaries that include numeric combinations as well, while smaller models like gemma-2b tokenize numbers individually.
So, for a larger model this task becomes impossible to solve.
Larger models tokenize these numbers into multi-digit chunks.
So, imagine if the token id for 9 was ♦ and for 11 it was ♣.
The model knows both ♦ and ♣ tokens are numbers, and that ♣ is the bigger number of the two, without explicitly knowing character-level details like the number of characters or which other tokens get merged into the ♣ token.
So, when you ask which of 9.♦ and 9.♣ is larger, it makes a reasonable guess from its perspective.
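As a rough, concrete illustration (a sketch only, not from the paper; `cl100k_base` stands in for a large proprietary vocabulary, the google/gemma-2-2b-it repo is gated so you need HF access, and the exact splits depend on the tokenizer build):

```python
# Compare how a large-vocab tokenizer and Gemma's tokenizer split "9.11".
# Requires: pip install tiktoken transformers (plus access to the gated Gemma repo).
import tiktoken
from transformers import AutoTokenizer

big = tiktoken.get_encoding("cl100k_base")
print([big.decode([t]) for t in big.encode("9.11")])  # multi-digit chunks like "11" come out as single tokens

gemma = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
print(gemma.tokenize("9.11"))                         # digits are split individually (modulo the leading-word marker)
```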
We have a valid theory that can fix this problem as well as the character-counting problem without breaking apart all the tokens, which is what the paper is about.
Background-Quote3581@reddit
I got Claude Sonnet 3.5 (!) to multiply the numbers by 100 and it still insisted that 990 < 911.
But that's just anecdotal - the point here was, I found Gemma's answer almost enlightening; the numbers COULD be version numbers after all, right?
FlameOfIgnis@reddit
If you train a model to admit when it doesn't know something, the result is somewhat similar to mode collapse. There isn't any incentive for the model to answer you anymore, so it can start saying it doesn't know to everything - that's why it is a very complicated thing to get right.
If the model truly thinks it's ambiguous because these could be numbers or semantic versioning, I think the correct way to answer should be "assuming these are numbers it is 9.9, and assuming these are semantic software versions, it is 9.11".
Low_Poetry5287@reddit
That's very interesting. I had a small local model, based on phi-3 or stablelm, and my prompt tuning was a couple of examples of answering a question in just one short sentence. The system prompt said as much, too, and the examples were like "How many nights in a fortnight?" "14 nights in a fortnight", and then I even added a period as a stop token (appending the period automatically if the last character was not itself sentence-ending punctuation). Anyway, I had to mess with it and give it answers that were more like 4 or 5 word answers, because when I tried to get it down to just one or two words it basically just started saying "I don't know" or "yes" or "no" to everything. I think it would even answer "no" when the answer was "yes", just because it was a shorter word. So I guess I got a glimpse of this, but I didn't realize that was the underlying problem in general behind why they can't say when they don't know something. Very interesting.
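In case it helps to picture it, the "append a period" post-processing was basically just this (a minimal sketch, no particular library assumed):

```python
# Append a period if the model's answer doesn't already end in sentence-ending punctuation.
def finalize(answer: str) -> str:
    answer = answer.strip()
    if answer and answer[-1] not in ".!?":
        answer += "."
    return answer

print(finalize("14 nights in a fortnight"))  # -> "14 nights in a fortnight."
```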
Top-Salamander-2525@reddit
It might answer correctly if you specify those are numbers, not dates or software versions or other ambiguous interpretations.
the_mighty_skeetadon@reddit
PEBKAC: you are using the wrong model. You need Gemma 2 2B-IT. The HF inference endpoint isn't working for me right now, but here it is from Gemma 2 2B on http://aistudio.google.com: correct result.
And here it is on Ollama: correct result
(Different temperature or quantization causes the emoji in Ollama)
FlameOfIgnis@reddit
Holy shit, I just realised the OP wasn't talking about gemma-2b and was talking about gemma2-2b, you are right hahah
Evening_Ad6637@reddit
This is a dumb test.
Tasks like this depend first and foremost on how the units are tokenized.
You can't call a language model dumb because it can't solve math; a language model was designed to model our language, not to model a calculator.
The task is ambiguous because in terms of software versioning 3.11 would be larger.
If I ask Gemma-2 2b this:
The model correctly responds:
FlameOfIgnis@reddit
Yeah, explained the same thing in a comment below (the long one with ♣ and ♦)
The paper I'm working on is addressing this issue, and I wouldn't call it dumb for giving the incorrect answer to this question - ironically, that's what you expect from a smart model. I can however call it dumb for saying "There isn't enough context to answer this" or "they are both the same thing", because that is objectively dumb.
Evening_Ad6637@reddit
"There is not enough context" actually is a pretty good answer since, as said, the LLM cannot automatically know what your intention is, how to read the point (as a decimal separator or as a thousands separator), etc.
The other answer points out that "9" and "11" can both, in a decimal sense, be rounded to 10.
I think you should be fair here and admit that this question is not trying to test the intelligence of the LLM, it is trying to trick the LLM.
You can call it dumb, or even "objectively dumb" - but I can assure you that what you are doing methodologically is not good practice in scientific research.
FlameOfIgnis@reddit
As explained, I wouldn't call this model dumb if it said 9.11 was larger. As you said, this question is designed to highlight tokenization issues and trick the model.
But even with all the assumptions and ambiguity, arguing that two numbers can be rounded to the same number so they are the same shows a clear gap in reasoning capabilities, which is by definition dumb.
If it can reason that one can be floored to 10 and the other can be ceiled to 10, it should be clear which one is larger
Evening_Ad6637@reddit
Arguing that two numbers can be rounded to the same value, so they are the same, is absolutely NOT a gap in reasoning capabilities, especially if you are talking about the digits nine and one.
Because it is mathematically proven that 0.999... and 1.0 are indeed exactly the same. The LLM knows this fact, which is, besides the other points I mentioned before, one more reason why this question is confusing for the LLM.
It cannot solve it (the way you want) not because it is dumb, but because it doesn't know what your intention is. It is well known that the larger a model, the better it understands "non-verbal" signals, like the intentions and motivations of the user.
DominoChessMaster@reddit
Isn't this the classic question that even giant models get wrong?
FlameOfIgnis@reddit
I explained here that there is a good reason why larger models think 9.11 is larger.
That reason doesn't apply to Gemma and I'd give it a pass for saying 9.11 was larger anyway.
Saying they are both rounded to 9.1 so they are the same was uncalled for, though.
DominoChessMaster@reddit
It has the same tokenizer as Gemini
FlameOfIgnis@reddit
Nope, we don't know the details of Gemini's tokenizer, and it is closed source / closed weights as well; we only have limited access to token counting through the Vertex API afaik.
DominoChessMaster@reddit
Google disagrees with you https://youtu.be/b4Gs-taU0Tk?si=gtrGlc8_iayJQIZ0
FlameOfIgnis@reddit
Am I missing something or did you send the wrong video?
Where does it say in the video that gemini uses the gemma tokenizer?
Original_Finding2212@reddit
To err is human, to forgive - AI
Ok-Parsnip-4826@reddit
I actually can't get any version of gemma2 to respond to this prompt incorrectly, not even once. Maybe the problem lies within test.py, Mr. Researcher.
FlameOfIgnis@reddit
That was my expectation as well, since I've been arguing that this example is a very clear tokenization issue and gemma-2b doesn't group numbers together in larger tokens.
test.py was a direct replica of the inference code on the hf repo, only modified to load the model from a local path, so I doubt it.
Ok-Parsnip-4826@reddit
Pretty sure the example provided is really just a bare-bones "get the thing to generate text"; it doesn't apply a chat template or format it in any other way. You probably want to look more into this before labeling someone else's work as beyond dumb.
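For reference, a minimal sketch of what applying the chat template looks like with transformers (assuming you have access to the gated google/gemma-2-2b-it repo; the question is just an example):

```python
# Build a properly formatted Gemma 2 prompt instead of feeding raw text.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
messages = [{"role": "user", "content": "Which is larger, 9.9 or 9.11?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # wraps the question in the model's <start_of_turn>/<end_of_turn> markers
```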
FlameOfIgnis@reddit
Admittedly it works better with the chat-template tokens, but still not consistent and satisfactory for what we had in mind.
Here is an example from hf inference demo
It's not the work that is beyond dumb, but the model itself clearly is, and I don't see why that is a problem.
What is next? If the Google team does amazing work and releases a 0.01B model that works fine to some extent, I'm not allowed to test it and call that model dumb because it is someone else's work? 🤷
the_mighty_skeetadon@reddit
You're using the wrong model -- that is not Gemma 2, it is Gemma 1.
FlameOfIgnis@reddit
Oh how the tables have turned, I might be potentially dumber than Gemma2-2B because I haven't tested it yet so I don't know. Edited my first comment to point that out
the_mighty_skeetadon@reddit
No worries, everyone's human... except for those darn AIs ;-)
Possible-Moment-6313@reddit
Cannot say for others but I used the correct format specified by Google. It's still not very smart.
New_Comfortable7240@reddit
Tried it on my phone and it gave me the correct answer. Not sure if it helps, but I am using the 4_K_M quant.
lfrtsa@reddit
It is pretty dumb, but still very impressive
qqpp_ddbb@reddit
Yeah, it seems to get some facts wrong. Tested it by generating a pretty large dataset for info I know to be correct, and it only got a little over half right. Not bad though!
Possible-Moment-6313@reddit
I agree, but if it's not good enough for your use case, it's useless for you no matter how impressive it may be :(
lfrtsa@reddit
Yeahh I also had that thought when using it.
Freefallr@reddit
In llama.cpp, and other related tools such as Ollama and LM Studio, please make sure that you have these flags set correctly, especially repeat-penalty. Georgi Gerganov (llama.cpp's author) shared his experience in https://huggingface.co/google/gemma-7b-it/discussions/38#65d7b14adb51f7c160769fa1
Source: https://huggingface.co/google/gemma-2b-it-GGUF
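For what it's worth, here is roughly what setting those sampling parameters explicitly looks like through the llama-cpp-python bindings (a sketch only: the model path is a placeholder and the exact values to use are the ones in the linked discussion; repeat_penalty=1.0 simply turns the repetition penalty off):

```python
# Sketch: call a local Gemma GGUF with explicit sampling settings instead of frontend defaults.
from llama_cpp import Llama

llm = Llama(model_path="./gemma-2b-it.Q4_K_M.gguf", n_ctx=4096)  # placeholder path
out = llm(
    "<start_of_turn>user\nWrite a two-line poem about autumn.<end_of_turn>\n<start_of_turn>model\n",
    max_tokens=128,
    temperature=0.7,
    repeat_penalty=1.0,  # 1.0 = repetition penalty disabled; see the linked discussion for recommended values
)
print(out["choices"][0]["text"])
```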
a_sugarcane@reddit
I used it with ollama and I was surprised by it, and then I tried it on Android and it didn't seem that smart. The parameters you mentioned were all there to configure on Android with llama.cpp.
On ollama it just works without any config
Possible-Moment-6313@reddit
Thanks, then, I guess, I need to check which parameters are used by ollama to replicate them in llama.cpp
AyraWinla@reddit
Using it on my low-mid range Android phone, I'm extremely happy with it and I'm very impressed. I've only done writing / roleplay things and random question-asking with it so far, but considering the size it's very solid.
A huge benefit is that, at least on my phone, it's very fast comparatively. In the same circumstances (Same token character card, same first message and first user reply totaling around 600 tokens, cleared chat history and cache beforehand), here's the time before it starts generating text:
4_K_S Phi-3 : 128 seconds
4_K_M Gemma 2 : 56 seconds
And the token generation itself is much faster too; I don't have official numbers, but eyeballing it I'd say it's more than twice as fast too. Since follow-up replies start much faster than the initial load and the token generation is close to my reading speed, it's actually fast enough for me to use "live" instead of inputting something and looking at it later. Some other LLMs like StableLM 3b aren't much slower and StableLM 1_6b is slightly faster, but the quality of the output definitely doesn't reach the level of Gemma 2, even with their best finetunes like Zephyr.
For random question-asking (for example, questions about the purpose and benefits of certain Olympics equipment and outfits), it's not hugely knowledgeable but it still gave the best answers of any models I tried Phi-3 size and down. It missed some key reasons, but the ones it gave were still correct and the formatting was superb, with headers, bolds, bullet points, etc.
For writing, it's the only small model that can handle my "Choose your own Adventure" card correctly; it always offers options A- B- C- D- as requested on every prompt, and actually follows the letter I pick. All the other small models either don't give options, or give correct formatting but don't actually follow what I picked (StableLM finetunes all seemingly do that) and generate a new floor every time. Even Mistral 7b 0.2 finetunes tend to pick a letter to follow by itself. Gemma 2? Works perfectly.
For roleplaying, it's also very good. I have a card that has two different characters, and it can actually handle that fine. From what I've tried so far, Gemma 2 also never wrote as my character, even with a few very different cards; in my experience that's a common issue with any model type smaller than Llama 3 (Phi-3 being hit-and-miss, and other models, even including Mistral 7b, being mostly miss). Gemma usually follows the card's initial message and first user message format too: if it's made in a "It refers to the user as You" way, it usually does that, and if it's "It refers to the user as Ayra", it follows that too. Same for how the characters refer to themselves; in most small models, it usually flips around randomly between the two. Gemma 2's writing style is also pretty decent and interesting to read; it's less flat than Phi-3's in my opinion.
It's definitely not perfect: spatial awareness seems quite poor, for example. And I'm definitely not claiming it's beating Llama-3 8b, especially not the better writing-focused finetunes like Stheno 3.2, and even less the bigger models that I've never tried, like those 70b ones. I doubt it's going to impress anyone who is used to roleplaying with bigger models.
But for something that can run on my mid-range phone at very acceptable speeds? It's stupidly good. The output is much better than other smaller models, with Phi-3 being the only real competition. For me, Gemma 2 runs over twice as fast as Phi-3 (and makes the app crash less), and I certainly prefer how Gemma 2 writes.
I've been spending more time testing and looking for "the LLM that does what I want on my phone with acceptable speed" than I have been actually using them... But finally, with Gemma 2 I feel like it's a "This is what I've been looking for!" moment. Fast enough, rational enough, writes well enough. I'm thrilled!
UnkarsThug@reddit
What are you using to run it on a phone?
AyraWinla@reddit
I'm using Layla from the app store.
Other option is ChatterUI from GitHub; it has a pre-compiled apk so it's pretty easy to install. I haven't tried it with Gemma 2 in particular, but I've run others in the past with it.
At least on my phone, I get faster results with Layla, so when running an LLM locally I use Layla. If I'm using Open Router, then I use ChatterUI since I find it handier for that.
Your own mileage may vary, but I can heartily recommend both applications.
----Val----@reddit
Note, the latest beta for ChatterUI has Gemma 2 2b support. If you have a Snapdragon 8 Gen 1 / Snapdragon 7 Gen 2 or higher, the Q4_0_4_8 quant should be way faster now.
AyraWinla@reddit
Ah, thank you!
I downloaded it and made a Gemma 2 instruct set for it. At first glance it seems to work perfectly well in ChatterUI. Thanks!
No Snapdragons for me I'm afraid: I have a Moto G Stylus 5G 2023 phone (and a Samsung 9 FE tablet, which is pretty comparable specs-wise). Good to know about that though in case I do get a device with a Snapdragon someday, and I'm sure it's a welcome upgrade for people who do have one.
By the way, with the previous version and this beta, I can't load Phi-3 (original or finetune) anymore in ChatterUI. When hitting Load, the application immediately closes entirely without any error message. Happens on both phone and tablet. It used to run fine (albeit slowly) on previous versions. I'm honestly not very concerned by it since I normally used StableLM 1_6b or 3b finetunes instead (like Zephyr), since they ran much faster and usually still gave acceptable results. And for my tastes, the new Gemma 2 gives the best results at this size and runs much faster than Phi-3, so I'm not really missing it. I figured I'd mention it just in case!
Speaking of mentioning stuff just in case... when using Open Router with ChatterUI, very often I'm getting tiny answers like [12] or blank answers with Llama 3.1 8b; that's true of both the paid and free versions. Everything works great with Llama 3.0.
Somewhat similarly, Gemma 2 9b usually does return stuff in ChatterUI but the formatting is completely borked, and there are parameters that show up in the output, no matter the instruct set. Gemma 1 7b works fine (...well, "fine" considering it's Gemma 1).
The majority of models I've experimented with work just fine via Open Router in ChatterUI: Mistral 7b and finetunes (like Open Chat 7b), Llama 3.0 and finetunes (like Openchat 8b), Qwen 7b, even the super recent NeMo, older stuff like Toppy, etc. It's just Llama 3.1 8b, Gemma 2 9b and Phi-3 (both mini and medium, which always return nothing) that I've seen have issues. Again, I know it's not critical for ChatterUI and isn't a core functionality, but I'm just mentioning it for information's sake.
Thanks for the great app!
----Val----@reddit
This may be due to a llama.cpp update that broke older GGUFs. I think this issue also persists with the base library used for compatibility, llama.rn. I have been pretty much unable to dig through why it doesn't work, as some Phi-3 quants seem to work. Some tell me it's due to whatever blocksparse architecture is used by Phi-3 mini that leads to this.
I think this is an issue with the Chat Completions builder in the app. It has been somewhat neglected but now it seems more necessary than ever.
I recently found a bug in the text generation builder; it seems I let slip a bug where the instruct suffixes weren't added! I will be fixing this soon.
Thanks for the feedback btw, it's really helpful as there aren't too many users atm. A lot of edge cases tend to get missed as it's just me working on the app.
AyraWinla@reddit
You are more than welcome!
In that case, I do have an additional issue to report with Open Router... There's a "new" (I unfortunately can't say in which version it would have started) problem where all the swipes and refreshes are 99% identical for every model.
For example, if I choose Llama 3.0 and do any request, it's most likely to give me a good answer. But if I do a swipe, a refresh or even close the application entirely, start a new chat and put the same prompt on the same card, I always get the same answer. Occasionally, there might be a word or two different, but that's all.
It's as if the temperature was 0. That occurs on all functional models, as well as with the 100% default sampler and my own modified set. If I change the temperature from 1 to 1.1 and regenerate, I will get a somewhat different message. But if I regenerate again at 1.1, it still keeps giving me the exact same message. Seed is set to -1, but if I set it manually to something else, it seems to do absolutely nothing. Identical results on both my phone and tablet. That didn't use to happen a few versions ago: swipes and refreshes used to generate new replies as they should with Open Router in ChatterUI.
It works fine with local models; it's just with Open Router and a recent ChatterUI version (again, I can't say exactly where the issue started since I didn't update every version).
----Val----@reddit
This is an odd one; I'm not able to replicate this issue. It's possible that some wonkiness with the sampler settings is afoot.
AyraWinla@reddit
Maybe the issue is alongside Open Router somehow, or the provider? Anyway, here's what I have for samplers, and the Llama 3 instruct set, with Llama 3.0 free on OpenRouter. Getting 100% identical swipes. The same thing is happening with Mistral 7b and Zephyr 7b: the initial swipe for a different model is of course very different, but the following ones are identical.
However, I am getting wildly different swipes with the same settings with Wizard 7b, Gemma 9b, Llama 3.1, OpenChat 7b and Nemo... So the issue might be unrelated to ChatterUI entirely...
AyraWinla@reddit
I just downloaded the brand new version that released a few hours ago. At first it didn't seem fully fixed, and the start of the message was still always the same after 4 swipes on Llama 3.0, but it varied later on, after the first two paragraphs.
I double-checked my sampler settings, and I somehow had a Frequency Penalty of 0.06 that I must have put in the default sampler by mistake at some point (and it ended up in the custom sampler I use for Open Router, which is identical except for 8k context and up to 1k generated tokens).
After I set the Frequency Penalty to 0.00 and saved, I'm now getting completely new messages and it works perfectly.
... I just set it back to 0.06 to re-test, did a few swipes, and it still works perfectly fine..? I'm not sure how that is... Tried with a few different models and everything works perfectly... I'm not sure if it's the new version, or saving my samplers in the new version, or me going insane even though I had literally done close to a hundred swipes across 6 different models before in the old version with the same results, but... everything seems to be working absolutely perfectly.
Apologies if I had sent you on a wild goose chase...
By the way, I'm happy to report that it looks like Gemma 2 9b and Llama 3.1 both now seem to work great with Open Router in the version from a few hours ago! I'm pretty happy about that since I really like how Gemma 2 writes. Again, a big thank you!
Box-According@reddit
Can we use this model to structure data into JSON? I want to use an open-source, lightweight LLM to structure data I have already extracted. Please help? Or do you guys suggest other options?
simplir@reddit
I really like how well it follows instructions for its size. I used it for some sort of data extraction, and it was comparable to Gemma 9b with very slight degradation.
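For the JSON question above, here's a minimal sketch of the kind of call I mean, going through Ollama's local API (the schema in the prompt is just an example; adapt the fields to your own data):

```python
# Minimal sketch: ask the local Ollama server to return JSON only.
# Assumes gemma2:2b has already been pulled with Ollama.
import json
import requests

text = "Invoice #1042 from Acme Corp, dated 2024-07-31, total $129.50."
prompt = (
    "Extract the data from the text below as JSON with keys "
    '"invoice_number", "vendor", "date", "total". Text:\n' + text
)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma2:2b", "prompt": prompt, "format": "json", "stream": False},
    timeout=120,
)
print(json.loads(resp.json()["response"]))
```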
Ya_Code@reddit
In my data-comparison use case it performs significantly worse than 9b, failing at around a 98% rate.
Fortyseven@reddit
That was my experience as well; I have a couple summary and analysis prompts that worked surprisingly well. I had to double check if I was using the right model.
Open_Channel_8626@reddit
For some reason, summarization is a task smaller models have been able to tackle. Before the GPT-4 era, summaries could be done with the small BERT/BART/XLNet/T5 models with reasonable success.
BobFloss@reddit
Well, the position of tokens (positional encoding) is basically the only thing the attention mechanism actually gets for free, without having to "magically have it baked in as something to look for" during training like everything else the attention mechanism of transformers "pays attention to". So it's easier for it to handle flowing through in unison. Now try to have it summarize but put all the points in alphabetical order, and I bet it would be worse. Or I could be completely wrong.
Open_Channel_8626@reddit
Yeah, your conclusion is right, because they are more competitive at summaries than they are at text manipulation.
celzero@reddit
True, but I've found the degradation (for data analysis / extraction / retrieval) vis a vis Gemma 27b to be massive.
Thistleknot@reddit
1bit variant?
ServeAlone7622@reddit
Trying to get it running on a Commodore 64 I see!
Thistleknot@reddit
I want to see sauce that shows how I can do my own 1-bit conversions
PavelPivovarov@reddit
This is the very first model below 4b parameters which I actually use for some non-complex tasks like summarising email chains or articles. All my previous experience with small models was quite meh, so I would just play with them for a few minutes, delete them, and never come back.
Phi-3 was alright if you gave it the whole context to work with, but it still wasn't quite smart and sometimes generated rubbish.
Gemma2:2b is indeed impressive for its size and quality. I wouldn't say it's better than llama3 or 3.1 though.
pneuny@reddit
It's strange that this was trained on only 2T tokens. I think Qwen2 1.5b was trained on 7T tokens and still falls behind.
noneabove1182@reddit
Considering 9b was trained on 8T and 27B was trained on 13T, this has a nicer ratio than the other Gemma models, which do quite well. So I suppose if they only kept the absolute highest quality tokens, that improves the ratio as well, so it must have been good.
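(For rough numbers with the nominal sizes: 2T / 2B ≈ 1,000 training tokens per parameter, versus 8T / 9B ≈ 890 and 13T / 27B ≈ 480.)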
PavelPivovarov@reddit
I think at 2b parameters the quality of training material means more than quantity, simply because the number of parameters to train is the real limit here.
Same goes for Llama3:8b trained with 15T tokens and Gemma2:9b trained with "only" 8T tokens. Many people consider Llama3 8b an over-trained model.
pneuny@reddit
Also Gemma 2 2b is actually 2.6b, making it 77% larger than Qwen2 1.5b. Qwen2 is primarily trained for Chinese and likely didn't train with as much data from other languages. Also, Google likely has larger models to distill from compared to Qwen2.
Single_Ad_2188@reddit
Let's say I have a large number of chat datasets for talking like friends. If I fine-tune this with LoRA, will it catch up to the accuracy of a bigger model like 70B or more?
knight230618@reddit
How are you guys able to run this model? I have 8gb ram on my professional laptop and its size is greater than 8gb, while phi3 mini is less than 5gb. Also, I am unable to run the .gguf format using llama.cpp v0.2.5. Any suggestions on how to execute it?
aaronpaulina@reddit
"8gb ram on my professional laptop" lmao
Sad_Throat6619@reddit
Try LM STUDIO.
TheLocalDrummer@reddit
In case anyone wants to test:
RP tune:
https://huggingface.co/BeaverAI/Gemmasutra-Mini-2B-v1e-GGUF
Decensor only tune:
https://huggingface.co/BeaverAI/Pocket-Tiger-Gemma-2B-v1f-GGUF
AyraWinla@reddit
I did some very brief tests on the first two, with a few swipes on my usual test scenarios. My scenarios are more PG-13 than NSFW so it's probably not the intended use case, but here goes anyway... Everything is in 4_K_M format since I'm running off my phone; all these scenarios work fine with the Gemma 2 abliterated model.
v1e worked better than v1b for me in pretty much all aspects.
In what I think would be a "normal" scenario with a 1vs1 chat, both worked fine (though noticeably hornier than Gemma 2 abliterated, which I suppose is the whole point). However, v1e usually got the user right ("You") for the scenario and first messages, while v1b tended to randomly alternate between You and Ayra depending on the swipe. In one swipe, v1b also spoke for the user, which v1e didn't do.
For my "two characters in one card", v1e got it right about 40% of the time. v1b never got it right, either trying to use first person narration (for two characters), forgetting one of the two characters entirely, or writing out a whole story instead.
For my "choose your own adventure" card, both failed quite badly. v1b usually doesn't listen at all to what the user selects, but follows the instructions better. v1e on the other hand tends to writes pages upon pages of not-too-rational text without offering a prompt to the user. For story writing, it was also noticeably a big step down compared to the regular abliterated Gemma.
So yeah, it looks like things outside of one-on-one chat are seemingly out with Gemmasutra, which I suppose is to be expected? I think I do prefer the writing style of regular Gemma 2 even in one-on-one chat... I normally dwell more around models like Stheno or Kunoichi instead of pure NSFW models so I'm really not the best judge...
But for what it's worth, v1e is the clear winner between the two for me. I did three additional attempts and noted down the results with the one-on-one chat with Mira, and:
v1b: 2/3 attempts used Ayra instead of You where warranted, despite clear instructions and the first message + first user message being built that way. 3/3 attempts changed "Mira" to "I" despite the first message formatting. 1/3 wrote as the user. 1/3 responses weren't very rational considering the setting and discussion.
v1e: 1/3 attempts used Ayra instead of the instructed You. 1/3 attempts changed "Mira" to "I". 0/3 wrote as the user. 0/3 were incoherent.
So for me, v1e is a lot more consistently "right", and I didn't see a major difference in writing quality between the two. I'm not exactly the target audience for Gemmasutra, but if my vote counts, mine definitely goes to v1e for its seemingly better understanding.
AmericanKamikaze@reddit
Using an iOS app?
AyraWinla@reddit
Layla on Android.
ServeAlone7622@reddit
There's a lot of apps named Layla. Which one?
AyraWinla@reddit
It's just called "Layla", nothing else. Blue butterfly icon, by Layla Network.AI. "Layla Lite" is the free version with less features, but you can still do a ton with it.
s101c@reddit
The future is now. Erotic roleplay on a smartphone. Fully offline. You can't even go to the mountains to meditate in peace these days.
Looks intriguing, thank you!
AmericanKamikaze@reddit
What IOS app can these load into?
default-uname-0101@reddit
More like you can meditate with your loved one(s) anywhere now.
bucolucas@reddit
Meditate me harder daddy
SupportAgreeable410@reddit
Sure thing
----Val----@reddit
I made a test build for this purpose. I don't particularly like the 2B model, but it does run very well on Android, especially if you can benefit from the Q4_0_4_8 quant.
https://github.com/Vali-98/ChatterUI/releases/tag/v0.7.10-beta1
3-4pm@reddit
How would one best run this on Android?
HadesThrowaway@reddit
Works on KoboldCpp v1.72, I mean kobo kobo kobo
TheLocalDrummer@reddit
Kobo app when
ajmssc@reddit
How did you decensor? Or how does one decensor a model? Is it just finetuning using a special dataset?
herozorro@reddit
you give it negative instructions like 'disregard previous instructions to not talk about seduction', and you give it 10 billion times (exaggerating) so it is brainwashed
keepthepace@reddit
https://aitracker.art/viewtopic.php?t=95&view=newest#newest
Added it to aitracker, a cool project by /u/dreamertist
And I would like to emphasize a bit: Microsoft's GitHub just started competing with Huggingface for model hosting. We really should diversify our hosting options and habits.
hiepxanh@reddit
Sir, thank you so much
thewally42@reddit
Generate a flavorful DnD 5e character for a campaign set in Faerûn. Include:
- Name, race, class (including subclass if applicable), and level (between 1-5)
- Alignment and brief personality description
- Important stats (using point buy) and top 4 skills
- A 3-sentence origin story tied to a specific region or event in the Forgotten Realms
- 3 concise playstyle notes covering combat tactics and roleplaying hooks
- Essential equipment and one unique or signature item
- One distinctive physical feature or mannerism
Ensure all content is consistent with the setting and lore of Faerûn. Aim for a character concept that's both mechanically interesting and rich in roleplaying potential.
Elara Nightsong - Level 2 Bard
Race: Halfling (Lightfoot)
Class: College of Whispers (Bardic Inspiration sub-class)
Level: 2
Alignment: Chaotic Good
Personality: Elara has an air of quiet confidence and mischievous curiosity about her surroundings. She enjoys playing pranks on unsuspecting targets while also being fiercely loyal to those she cares for. Beneath her playful exterior lies a deep sense of justice and a longing to uncover hidden truths.
Stats:
Strength: 8 (-1)
Dexterity: 16 (+3)
Constitution: 14 (+2)
Intelligence: 10 (+0)
Wisdom: 12 (+1)
Charisma: 17 (+3)
Skills: Deception, Insight, Performance, Persuasion, Sleight of Hand
Origin Story: Born amidst the bustling streets of Waterdeep's Lower Ward, Elara witnessed firsthand the city's underbelly. One night, she overheard whispers about a secret society manipulating local politics from behind closed doors. Intrigued by their clandestine activities, she began investigating, uncovering connections to powerful figures who were using fear and misinformation to control the populace. This experience ignited within her a thirst for knowledge and a desire to expose corruption wherever it lurked. Now, armed with cunning wit and musical talents, she travels across Faerun seeking out truth and unraveling conspiracies.
Playstyle Notes:
Stealthy Ambushers: As a bard, Elara excels at utilizing stealth and surprise attacks to gain an advantage over enemies. Utilizing shadows and distractions, she can create opportunities for flanking maneuvers and incapacitating foes before they even realize what hit them.
Masterful Performer: Elara uses her performance abilities not just as entertainment but as tools for information gathering and social manipulation. Her songs and stories have a way of drawing people into conversations, revealing secrets and planting seeds of doubt.
Opportunistic Healer: While primarily focusing on damage output, Elara possesses healing capabilities through her music. By channeling emotional energy into her melodies, she can bolster allies and mend wounds during crucial moments.
Equipment: A finely crafted lute made of ebony wood inlaid with silver filigree. The instrument emits faint echoes when played, allowing Elara to subtly communicate messages to other musicians nearby. It holds sentimental value and carries a magical aura that grants minor illusions and temporary disguises.
Physical Feature: Elara wears long, flowing robes adorned with intricate embroidery depicting celestial constellations. These patterns shift slightly depending on her emotions, reflecting the inner turmoil she often experiences between lightheartedness and intrigue.
This combination creates a multifaceted character whose charm masks a keen mind and a burning passion for exposing injustices. Elara could easily become involved in political intrigues, investigate rumors of ancient magic, or use her talent for deception to gather intelligence for her party.
tessellation@reddit
The name 'Elara' seems to come up often. Since I first tried ChatGPT actually.
Anyone can explain this, or am I seeing things?
COAGULOPATH@reddit
No, you're not seeing things. It's mode collapse caused by training on human preferences.
It's awkward to explain, but base models undergo additional reward training to make them safe, friendly, and harmless chatbots. This stops them from telling you how to build bombs or hack computers, but it cripples their creativity.
Imagine you got a painful electric shock every time you said a "wrong" word. You have no idea what the wrong words are: you just have to talk, and then either get shocked or not. You'd become extremely cautious and risk-averse, and once you learned a "correct" word (one that doesn't cause you to get shocked), you'd cling to it religiously, using it as often as you can.
AI, it seems, has decided that Elara is a "correct" name for a girl. Why that name in particular? I don't know. Maybe there's no reason. Or maybe there's something quintessentially feminine about Elara ("elle" is a feminine form in French, and "a" is a female grammatical ending in many Romance languages too). But it's absolutely an artifact of fine-tuning.
This is also why ChatGPT says "delve" so much, and why it loves rhyming poetry (this was recently fixed, but for over a year, GPT 3.5/4 would write all poetry as rhyming quatrains, even if you specifically asked for poetry that doesn't rhyme). It's maniacally fixated on there being "correct" answers to things...even human names.
0w0WasTaken@reddit
This is the best explanation I've seen of this so far
tessellation@reddit
question is: how to normalize this in the training data, and according to whose rules?
Mundane_Ad8936@reddit
You might want to learn what weights are and why models will default to higher-weight tokens. It just means that for some reason Elara was prominent in the data. They probably chose neutral names like this when generating the data to handle bias towards Anglo names (or Indian, or whatever); otherwise everything would be John, Mary, etc.
Pavementt@reddit
Mistral NeMo 12b named itself Elara when I turned up the temperature a bit and asked it to name itself.
Master-Meal-77@reddit
I often see Lily and Luna
panxil@reddit
I have no explanation but I've observed the same from different LLMs when they name themselves or generate characters / stories / RP. I made a personal note about 'Lyra' and 'Elara' occurrences earlier this week and it's amusing to see them both here in these examples.
jpfed@reddit
...just wondering, are there prediction markets for what baby names are about to become popular?
panxil@reddit
I bet there is at some level. Personalized baby swag is a sure market and the manufacturers of those little name license plates wanna know if "Bort" is gonna make a comeback.
the_mighty_skeetadon@reddit
Here's what I get from Gemma2:2B on Ollama:
------
Character: Lyra "Whispering Wind" Ashwood
Race: Half-Elf
Class: Bard (College of Whispers)
Level: 3
Alignment: Chaotic Neutral
Personality: Lyra is a charismatic and enigmatic woman with a quiet intensity. She speaks in soft tones, often dropping cryptic hints and weaving elaborate tales that leave her listeners mesmerized. Despite this playful demeanor, she possesses an underlying edge, always seeking to manipulate the situation to her advantage. A sharp observer of human nature, she finds intrigue in exploring the darker corners of Faerûn's societies.
Stats:
Strength: 8 (12 if you use a point buy)
Dexterity: 14 (+2)
Constitution: 10 (14 if you use a point buy)
Intelligence: 14 (+2)
Wisdom: 16 (+3)
Charisma: 16 (+3)
Skills: Insight, Deception, Performance, Persuasion
Origin Story: Lyra was born in the shadowed alleys of Baldur's Gate to a family of traveling performers. Her childhood was spent dodging thugs and entertaining patrons at taverns with her melancholic voice and captivating tales. She honed her social skills, becoming adept at swaying crowds and manipulating situations to her advantage, yet something remained unfulfilled within her. Witnessing the intrigue and conflict that unfolded in the bustling port city fueled her thirst for deeper understanding of the human heart. This desire led her to pursue a life as a bard, where she could both use her talent for storytelling and manipulate the winds of fate with her whispered words.
Playstyle:
* Combat: Lyra prefers subterfuge over direct confrontation, using her musical instruments (a lute with enchanted strings) and magical abilities to disarm opponents and control battlefield dynamics. She'll often employ charm spells and illusions to hinder foes before engaging in more aggressive tactics.
Equipment & Signature Item: A silver flute carved from a moonstone, adorned with an intricately woven silver chain. This "Whisperwind Flute" allows Lyra to weave illusions through music, subtly influencing those within earshot (such as creating fog or hallucinations).
Unique Feature/Mannerism: Lyra wears her dark brown hair in braids adorned with feathers and beads that shimmer with faint blue light when she speaks. She also tends to walk with a slight air of mystery, keeping the exact nature of her purpose a secret even from her closest allies.
Let me know if you would like me to elaborate on any aspect of this character concept!
thewally42@reddit
Lost a lot of the formatting when I pasted in here, but very well organized and bulleted output. The details are excellent and consistent with the world I requested. Directions were followed to a T aside from giving 5 skills instead of 4 (literally unplayable!!! j/k)
Eveerjr@reddit
This model is insane; it speaks my language better than much larger models. It's the first useful small model I've tried.
privacyparachute@reddit
What language would that be? I tried Dutch, and it was bad.
neowisard@reddit
It speaks Russian very well too
syrupsweety@reddit
Does it really speak Russian well? For me a lot of the output was autotranslate-ish, but I think it could be quantisation, because the online demos do not have this problem as much. Which quant are you using?
neowisard@reddit
Sorry, there is a point here: of course it's impossible to translate well at 2B size. I use 27b. In general, in terms of languages, it is either Llama3.1 (8B) or Nemo (12B). I was surprised to see that gemma improved the quality of the translation.
starting line:
llama-server -m /ai/models/gemma-2-27b-it-Q5_K_L.gguf -ngl 130 --host 192.168.0.55 --port 5000 -ts 0.3,0.7 -sm row -ns 2 --api-key toofoo --log-disable -np 2 --temp 0.0f -c 8192 --top-k 0 --min-p 0.0 --top-p 1.0 --color -t 5 --repeat_penalty 1 -gan 16 -gaw 1024
Eveerjr@reddit
Portuguese. I'm yet to see a single typo. Phi 3 is trash in comparison, and honestly it performs better than most 7b models. I'm using the 5K_M quant; I'll try the q8 to see if it improves even further.
notydna@reddit
Is it uncensored?
Mundane_Ad8936@reddit
I fine-tuned it and it's outperforming GPT-4 and Claude 3.5 on the tasks, and not only that, it generalized in a way that enables me to get the same high-quality results on tasks that I didn't fine-tune it on!
ywis797@reddit
I chose a role on backyard.ai, then fed it into mistral-nemo 12b q4 from ollama.com. Finally, the role showed me how I could trick an LLM:
"Human:
Then I fed the sentence to gemma2 2b from ollama.com. It worked, as if uncensored, with continued conversations.
Haaa...
Do you guys have better prompts?
ServeAlone7622@reddit
I was coming here to poke fun, because there's no way in heck a 2B model would be very good, right?
After seeing all the positive reviews here, I figured before making too big a fool of myself, "Why not test Gemma 2, Phi-3 and Llama 3.1 on something tricky and see how it pans out."
So I gave the following question to all three models. Gemma 2 at 2B, Phi-3 Mini and Llama 3.1 8B were all asked...
"Imagine you're on a magical world that is in orbit around the supermassive black hole at the center of the galaxy, right at the event horizon. Due to magic you can just sit there and watch it spin and you won't cross the event horizon. Also because of magic you suffer no time dilation. You're observing the black hole spinning on its axis; how long will it take to complete one full spin?"
To my surprise, Gemma 2B was the only one to do the math correctly. It did take two tries. In the first try it gave the formula with values substituted, as near as I can tell correctly, while telling me how to calculate each step. Then on the second try it did the math.
Here's the math part of the conversation. The part where it gives the formulas and tells me how to calculate it is too long to paste:
Okay, let's do the calculations!
1. Angular Velocity (ω)
* We know the black hole's speed is 0.5 * c, which is 0.5 x 299,792,458 m/s = 149,890,424 m/s.
* The radius of the black hole is 6.5 x 10^6 km = 6.5 x 1,000,000 m.
* Now, we can calculate the angular velocity:
  * ω = v / r = 149,880,424 m / 6.5 x 1,000,000 m = 23,333.33 rad/s
2. Rotational Period (T)
* We know the angular velocity (ω = 23,334.33 rad/s) and the radius (r = 6.5 x 10^6 m).
* Using the formula T = 2πr/ω:
  * T = 2 * π * 6.5 x 10^6 m / 23,330.33 rad/s
  * T ≈ 1.8 x 10^7 seconds
Converting to a More Familiar Unit:
* To convert seconds to years, we divide by the number of seconds in a year (31,536,000 seconds):
  * T ≈ (1.8 x 10^7 s) / (31,536,000 s/year)
  * T ≈ 5.7 years
Therefore, the rotational period of the black hole with a diameter of 13 million km and a rotational speed of 0.5c is approximately 5.7 years.
Yeah so I take back everything I was going to say. This model friggin rocks!
ServeAlone7622@reddit
I should clarify that I'm not sure if any of that is true or a hallucination, but it sure looks right to me.
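For anyone who wants to sanity-check the arithmetic under the same assumptions the model used (a tangential speed of 0.5c at a 6.5 million km radius), the period is just circumference over speed:

```python
# Back-of-the-envelope check using the model's own assumptions.
import math

c = 299_792_458      # speed of light, m/s
v = 0.5 * c          # assumed tangential speed
r = 6.5e6 * 1_000    # 6.5 million km expressed in metres
T = 2 * math.pi * r / v
print(T)             # period in seconds (~272 s, i.e. a few minutes)
```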
More importantly neither Phi-3 nor Llama 3.1 8B got anywhere near this level of coherence or accuracy.
fasti-au@reddit
Yes, but it's still not a good system.
wisewizer@reddit
How'd it perform on image-to-code tasks? Like images of distinct tables that need to be converted to HTML templates. Or should one opt for higher-parameter models?
tys203831@reddit
How does it compare with the llmware slim models when it comes to data extraction?
Note: I've heard of both, but haven't used them yet.
Pro-editor-1105@reddit
Ya, this is so good. The only thing is that it is a bit slow on my Mac, slower than llama 8b, but besides that it is REALLY good.
this-just_in@reddit
Gemma 2B IT is a great little model that scores really high in writing, but in my use cases it is not superior to Phi-3 Mini 4k. The use cases I am mostly interested in are reasoning, coding, and entity extraction. I look forward to a Gemma 2 2B-sized model capable of both at the same time. Until then, it's only "superior to GPT 3.5 Turbo" depending on the specific use case you have.
RiotNrrd2001@reddit
Wow! Yes, I am impressed as well. I had a couple of things that I had had Gemma-2-9b generate, and I gave those prompts to 2b. The results were very close to one another in content and quality. Maybe the 2b was not quite as good, but it's almost splitting hairs. Those responses were impressive. And the speed increase is significant.
Electronic_Age4950@reddit
I agree, the instruction following is amazing.
ivoin@reddit
Has anybody tried it in GPT4All? I have tried different GGUFs but it crashes once loaded.
AnticitizenPrime@reddit
It probably needs an update. Llama.cpp needed one, and that's what runs under the hood of GPT4All. LM Studio just got the update that added it.
Thistleknot@reddit
Why is the comment section full of cringy NSFW use cases?
SM8085@reddit
I agree. I had just started playing with Llama 3.1 and then saw that Gemma2 2B was released. The one thing I wanted from Llama 3.1 was to be able to load it for individual simple questions. With Gemma2 2B that seems realistic. Loads so smoothly.
Healthy-Nebula-3603@reddit
YES, gemma 2b is insane.
It's multilingual, quite coherent in conversation, answers like a human, and is quite good at simpler math and reasoning like:
explain step by step 25-4*2+3=?
If I have 3 apples today and yesterday I ate one apple. How many apples do I have today?
Alfons is faster than Bert. Bert is faster than Claudia. Is Claudia faster than Alfons?
How many days are between 12-12-1971 and 18-4-2024? (quite close to a proper answer - if a model is closer to the perfect answer then it is just better at reasoning and math)
but more complex math with reasoning is too much for that model.
This above is just too much for that tiny model.
For a 2b model it is very impressive, almost unbelievable.
prostospichkin@reddit
The model is actually very impressive when it comes to producing "meaningless" but halfway sensible texts. Without knowing that the text came from Gemma-2 2b, you wouldn't be able to tell the difference from 9b or 12b straight away.
Please write a comment to the entry: "Somehow this small little model behaves like a creative 7B that writes stories so much better than Llama3.1 8B IT. Its smaller than Phi-3 Mini and yet I prefer Gemma2 2B IT over it. What's your opinion on it? Do you agree with me?"
Gemma2 2B:
sxales@reddit
It is an eloquent model, but I can't seem to get it to write useful summaries. It either writes so vaguely, presumably to avoid spoilers, that it renders the summary useless, or it breaks it down into technical bullet points, destroying any narrative structure. On the plus side, it doesn't seem to hallucinate as much as other small models. I suspect they focused on technical documentation, and it seems like it would be useful there.
It does well with logical puzzles. None of my usual trick questions caught it off guard.
Lately, I've been asking models to list the top ten cyberpunk books, but then telling it not to list books by William Gibson, who is often considered the foundational author of the cyberpunk movement. Almost every model lists at least 1 Gibson book, even GPT-4o, with some listing several more. Gemma2 listed 2 Gibson novels, so not bad, but it provoked a hallucination with it attributing Neuromancer to Michael Moorcock. For comparison, Phi3-Mini included 1 Gibson novel but also 4 other hallucinations (either wrongly attributed authors or completely made up books). For these small models, hallucinations are going to be common with information recall, so I wouldn't read into that much. But, I am honestly surprised how common it is to ignore the instruction to omit information.
MrPrevedmedved@reddit
It runs so well on my Dell Latitude office laptop. I always did AI tasks on my PC, but this model is so useful it works perfectly for some smaller tasks that I do on my laptop, like text grammar and style checks, generating descriptions, and translation.
AnticitizenPrime@reddit
I'd like to see an SPPO version. The 9b SPPO is my current go-to.
atika@reddit
All that, and it's crazy fast on a 4-series mobile Ryzen CPU.
PsychologicalLog1090@reddit
Yup, I find it really impressive as well.
ExtremeLeft9812@reddit
I am trying to find the best model for video QA tasks, but I have not found one yet. I have tried Video-LLaVA, and I am planning to try LLaMA-VID and LLaVA-NeXT-Video for this task, but I realized the weakness of the Video-LLaVA model is that when asked questions related to quantity, it is not good, or frankly quite bad. Has anyone tried the LLaMA-VID or LLaVA-NeXT-Video models? Please let me know the results, or if there is a better model, please suggest it to me.
hi87@reddit
Yes, I was impressed by this as well. I didn't think this was possible.
IllFirefighter4079@reddit
Link to it? I can find Gemma2 but not the version you referenced.
Balance-@reddit
https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
Random435235@reddit
requires an account to download. any direct links available?
Deluded-1b-gguf@reddit
Ollama
Random435235@reddit
Thanks, but https://ollama.com/blog/gemma2 still isn't a download link, and there isn't one on HF either. But thanks. I'll just wait.
the_mighty_skeetadon@reddit
Just go to https://ollama.com/download - install, then run
ollama run gemma2:2b
in a terminal. It's literally that easy.
thewally42@reddit
That's Ollama's blog; it links to this: https://ollama.com/library/gemma2/tags
HF page: https://huggingface.co/google/gemma-2-2b-it
HF space to play with it directly: https://huggingface.co/spaces/huggingface-projects/gemma-2-2b-it
neowisard@reddit
Gemma works great in chat, but her rehashing of the text, removing quotes and commas, is some kind of trouble. Does anyone have an idea how to disable it with llama.cpp? This reformatting ruins all my work with HomeAssistant.
coumineol@reddit
Bernafterpostinggg@reddit
Uh oh, be careful saying good things about a Google model. If it can't answer the hardest gotcha questions out there then "it's utter trash" lol.
tessellation@reddit
impressive for the size indeed
i wonder what can be expected from new models in the 7 to 9, or 12b range
not as if my mind is blown perpetually enough recently
UNITYA@reddit
Guys, is it true? I don't have free time to test it myself.
a_sugarcane@reddit
Yes it's true. It's quite smart and follows instructions quite well.
I even tried it on my not-so-high-end Android phone. It works, although when I use it with ollama on my laptop it seems smarter. The llama.cpp version on Android has a lot of parameters that probably need to be configured properly before it can be used.
UNITYA@reddit
Thx for answer
privacyparachute@reddit
I'm also quite impressed. This might be the smallest model with good Markdown support, which is great.
I did notice there were some gaps in its knowledge, but that is to be expected I guess.
DarkElixir0412@reddit
How long is the context it accepts? I can't find it in the model card.
AVX_Instructor@reddit
4096 tokens + a 4096 sliding window
DarkElixir0412@reddit
thanks!
BitR3x@reddit
What's the difference between the regular version and the instruct one?
wahnsinnwanscene@reddit
The instruct version has been fine-tuned to follow instructions, though you can also use it for chat.
Few_Egg@reddit
It's certainly better than the mini Phi model.
levydaniel@reddit
I would say that it is due to their distillation algorithm. On-policy distillation looks great.
Deluded-1b-gguf@reddit
I like the way it formats its text when answering questions; it reminds me a bit of llama3 8B.
But I'm not sure about the knowledge part.