Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Posted by umarmnaq@reddit | LocalLLaMA | 1 comment
iLaurens@reddit: It's an interesting concept that increases the expressiveness of multimodal models. But isn't using a separate transformer model per modality and connecting them with cross-attention even more flexible and just as easy to implement?