local-gemma: Gemma 2 optimized for your local machine
Posted by hackerllama@reddit | LocalLLaMA | 37 comments
Hey all! This is Omar, Chief Llama Officer at Hugging Face, ready to talk about our latest project, `local-gemma` (https://github.com/huggingface/local-gemma).
A common piece of feedback we get about transformers is that picking the right parameters and settings for your use case is not obvious. So we're releasing a first version of the `local-gemma` repo, which hopefully helps patch this up!
* CLI and Python usage
* Automatic preset based on your hardware, trading off between speed, memory, and accuracy
  * **Exact**: maximizes accuracy. 18.3GB for 9B, 68.2GB for 27B.
  * **Memory**: uses 4-bit quantization. 7.3GB for 9B, 17GB for 27B.
  * **Memory Extreme**: uses CPU offloading. 3.7GB for 9B, 4.7GB for 27B.
* Easy to install with pip and pipx
* Works with CUDA, MPS, and CPU
* This uses logit soft-capping, which means you won't get the weird results some folks are getting with the 27B
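To make the preset trade-off and the soft-capping point a bit more concrete, here is a rough, self-contained sketch. It is not the logic `local-gemma` actually runs: `pick_preset`, `soft_cap`, and the preset keys are hypothetical names made up for illustration, the memory figures are simply the ones quoted in the list above, and the cap value of 30.0 in the usage lines is an assumed example value.

```python
import math

# Memory footprints in GB as quoted in the post, keyed by model size and preset.
# The key names are illustrative, not necessarily the library's identifiers.
FOOTPRINT_GB = {
    "9b": {"exact": 18.3, "memory": 7.3, "memory_extreme": 3.7},
    "27b": {"exact": 68.2, "memory": 17.0, "memory_extreme": 4.7},
}

# Presets ordered from most accurate to most memory-frugal.
PRESET_ORDER = ["exact", "memory", "memory_extreme"]


def pick_preset(model_size, available_gb):
    """Return the most accurate preset whose quoted footprint fits, else None."""
    for preset in PRESET_ORDER:
        if FOOTPRINT_GB[model_size][preset] <= available_gb:
            return preset
    return None


def soft_cap(logit, cap):
    """Soft-cap a logit: squash it smoothly into the open interval (-cap, cap)."""
    return cap * math.tanh(logit / cap)


# Hypothetical usage: a 24GB device running each model size.
print(pick_preset("9b", 24.0))
print(pick_preset("27b", 24.0))

# Small logits pass through almost unchanged; huge ones saturate near the cap.
print(soft_cap(1.0, 30.0), soft_cap(1000.0, 30.0))
```

The picker just walks the presets from most to least accurate and stops at the first one that fits. The soft-cap uses `tanh`, so values well below the cap are barely touched while outliers get pulled back toward the cap. Skipping that step is a plausible way to end up with the odd 27B outputs mentioned above.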
This is a first experiment to make it easier for folks to run models locally with transformers and get good generation results. Feel free to leave feedback as issues in the repo. Enjoy!