OLMoE - a fully open source sparse MoE with only 1 billion active parameters
Posted by Aaaaaaaaaeeeee@reddit | LocalLLaMA | 36 comments
>We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.
- models: https://huggingface.co/collections/allenai/olmoe-66cf678c047657a30c8cd3da
- paper: https://arxiv.org/html/2409.02060v1
- data: https://hf.co/datasets/allenai/OLMoE-mix-0924
- code: https://github.com/allenai/OLMoE
- logs: https://wandb.ai/ai2-llm/olmoe/reports/
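The "7B parameters but only 1B per input token" claim in the abstract comes from sparse routing: a router scores every expert for each token, and only the top few experts actually run. A minimal sketch of that idea in plain Python follows. The sizes (`DIM`, `NUM_EXPERTS`, `TOP_K`), the helper names, and the single-matrix "experts" are all hypothetical toy choices for illustration, not OLMoE's actual configuration or code; see the linked repository for the real implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i]) for i in chosen]


def moe_layer(token, experts, router, k):
    """Apply only the k routed experts to one token vector and mix their outputs."""
    # Router: one logit per expert (dot product of a router row with the token).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    out = [0.0] * len(token)
    for idx, weight in top_k_route(logits, k):
        # Each toy expert is a single DIM x DIM matrix (assumption for brevity).
        expert_out = [sum(w * x for w, x in zip(row, token)) for row in experts[idx]]
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Hypothetical toy sizes, NOT OLMoE's real configuration.
random.seed(0)
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
experts = [
    [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
    for _ in range(NUM_EXPERTS)
]
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [random.gauss(0, 1) for _ in range(DIM)]

output = moe_layer(token, experts, router, TOP_K)
total_params = NUM_EXPERTS * DIM * DIM
active_params = TOP_K * DIM * DIM
print(f"expert parameters: {total_params} total, {active_params} used for this token")
```

In this toy setup every expert's weights must be held in memory, but each token only pays the compute cost of `TOP_K` of them, which is the same total-versus-active distinction the post title draws.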