new MoE from ai2, EMO

Posted by ghostderp@reddit | LocalLLaMA | 19 comments


new MoE release from ai2 - EMO, 1B active / 14B total params, trained on 1T tokens

the interesting thing is the document-level routing: experts cluster around domains like health, news, etc., instead of surface-level token patterns (rough sketch of the idea below)
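to make the contrast concrete, here's a minimal toy sketch of token-level vs. document-level top-k routing. all names, shapes, and the pooling choice are illustrative assumptions, not EMO's actual code:

```python
# Toy sketch contrasting token-level vs. document-level MoE routing.
# Everything here is illustrative, not taken from the EMO release.
import torch
import torch.nn.functional as F

d_model, n_experts, top_k = 64, 8, 2
gate = torch.nn.Linear(d_model, n_experts, bias=False)
experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
)

def route(x, logits):
    # x: (n_tokens, d_model); logits: (n_tokens, n_experts)
    weights, idx = torch.topk(F.softmax(logits, dim=-1), top_k, dim=-1)
    out = torch.zeros_like(x)
    for k in range(top_k):
        for e in range(n_experts):
            mask = idx[:, k] == e  # tokens whose k-th choice is expert e
            if mask.any():
                out[mask] += weights[mask, k, None] * experts[e](x[mask])
    return out

tokens = torch.randn(128, d_model)  # hidden states for one document

# token-level routing: every token gets its own gating decision
token_out = route(tokens, gate(tokens))

# document-level routing: gate once on a pooled document representation,
# then apply that single decision to every token, so whole documents
# (and hence domains like health or news) land on the same experts
doc_logits = gate(tokens.mean(dim=0, keepdim=True)).expand(len(tokens), -1)
doc_out = route(tokens, doc_logits)
```

with per-token gating the expert assignment can flip on surface patterns mid-document; pooling the gate input over the document is one simple way to get the domain-clustered expert behavior the release describes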

models: https://huggingface.co/collections/allenai/emo