
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Posted by umarmnaq@reddit | LocalLLaMA | 1 comment


1 Comment

iLaurens@reddit

It's an interesting concept that increases the expressiveness of multimodal models. But aren't separate transformer models per modality, connected with cross-attention, even more flexible and just as easy to implement?
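To make the comparison concrete, here is a minimal pure-Python sketch of the two designs for a single attention layer. It assumes the MoT-style layer picks Q/K/V projection weights per token by modality and then runs one global self-attention over the whole mixed sequence, and it models the alternative as a text tower cross-attending to an image tower. All names (`mot_layer`, `cross_attention_layer`, the `"text"`/`"image"` keys) are hypothetical, and multi-head splitting, feed-forward blocks and layer norms are left out.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query mixes all values by its softmax weights."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))])
    return out


def mot_layer(tokens: List[Tuple[str, Vector]],
              weights: Dict[str, Dict[str, Matrix]]) -> List[Vector]:
    """Hypothetical MoT-style layer: projections are chosen per token by its modality,
    but every token attends to every other token in one shared sequence."""
    qs = [matvec(weights[mod]["q"], x) for mod, x in tokens]
    ks = [matvec(weights[mod]["k"], x) for mod, x in tokens]
    vs = [matvec(weights[mod]["v"], x) for mod, x in tokens]
    return attend(qs, ks, vs)


def cross_attention_layer(text: List[Vector], image: List[Vector],
                          w_text: Dict[str, Matrix],
                          w_image: Dict[str, Matrix]) -> List[Vector]:
    """Hypothetical two-tower alternative: queries come from the text tower,
    keys and values only from the image tower."""
    qs = [matvec(w_text["q"], x) for x in text]
    ks = [matvec(w_image["k"], x) for x in image]
    vs = [matvec(w_image["v"], x) for x in image]
    return attend(qs, ks, vs)
```

In this sketch the MoT-style layer couples modalities inside every self-attention step, while the cross-attention variant only lets text read from image tokens at the layers where such a block is inserted; which one is "more flexible" depends on which coupling pattern and parameter budget you want.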