ByteDance-Seed/Cola-DLM · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 9 comments
Cola DLM (Continuous Latent Diffusion Language Model) is a hierarchical continuous latent-space diffusion language model. It combines a Text VAE with a block-causal Diffusion Transformer (DiT) prior: the VAE maps text into continuous latent sequences and decodes latents back to tokens, while the DiT performs latent prior transport through Flow Matching.
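Below is a minimal sketch (not the official implementation) of what the Flow Matching objective over VAE latents could look like; the `vae.encode` and `dit` callables, their shapes, and the linear (rectified-flow-style) interpolation path are assumptions for illustration only.

```python
import torch

def flow_matching_loss(vae, dit, token_ids):
    # Encode tokens into continuous latents z1 (the data endpoint of the flow).
    z1 = vae.encode(token_ids)                            # [batch, seq, latent_dim] (assumed shape)
    z0 = torch.randn_like(z1)                             # noise endpoint
    t = torch.rand(z1.size(0), 1, 1, device=z1.device)    # per-sample time in [0, 1]

    # Linear interpolation path between noise and data.
    zt = (1 - t) * z0 + t * z1
    target_velocity = z1 - z0                              # time derivative of the linear path

    # The block-causal DiT prior predicts the velocity field at (zt, t).
    pred_velocity = dit(zt, t)
    return torch.mean((pred_velocity - target_velocity) ** 2)
```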
This model repository contains the HuggingFace-format checkpoint for the paper Continuous Latent Diffusion Language Model.
Links
- Model repository: https://huggingface.co/ByteDance-Seed/Cola-DLM
- GitHub repository: https://github.com/ByteDance-Seed/Cola-DLM
- Paper: https://arxiv.org/abs/2605.06548
- HuggingFace Daily Paper: https://huggingface.co/papers/2605.06548
- Project page: https://hongcanguo.github.io/Cola-DLM/
- Blog post: https://hongcanguo.github.io/posts/2026-cola-dlm.html
- Zhihu article: https://zhuanlan.zhihu.com/p/2038324180920313704
Model Details
- Architecture: Text VAE + block-causal DiT latent prior.
- Training objective: two-stage training with Text VAE pretraining followed by joint Text VAE + DiT training using Flow Matching.
- Training-compute checkpoint: the released weights correspond to the 2000 EFLOPs checkpoint reported in the paper's RQ4 scaling curve.
- Tokenizer: OLMo 2 tokenizer with a 100,278-entry vocabulary.
- Special token ids: pad_token_id=100277, eos_token_id=100257, im_end_token_id=100265.
- Framework: PyTorch 2.1+ and HuggingFace Transformers 4.40+ (see the hedged loading sketch after this list).
- License: Apache License 2.0.
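A minimal loading sketch, assuming the HuggingFace-format checkpoint ships custom modeling code (hence `trust_remote_code=True`); the actual model class and generation API may differ, so treat this as a starting point rather than the documented usage.

```python
from transformers import AutoTokenizer, AutoModel

repo = "ByteDance-Seed/Cola-DLM"

# Tokenizer is stated to be the OLMo 2 tokenizer with a 100,278-entry vocabulary.
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo, trust_remote_code=True, torch_dtype="auto")

inputs = tokenizer("Continuous latent diffusion", return_tensors="pt")
print(inputs.input_ids.shape)
print(tokenizer.pad_token_id)  # expected to be 100277 per the model card above
```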
Dolsis@reddit
Yet another CUDA-or-CPU-only model.
Still waiting for diffusion models I can run with Vulkan.
Maybe I missed something, but I don't think I can use my (AMD) 7900 RT GPU to run it (ROCm support with this card is meh on Fedora; maybe I should use Ubuntu only for these use cases?).
I have the same disappointment with the qwen-image model.
This_Maintenance_834@reddit
CUDA is the de facto first-class citizen. If playing with the newest models becomes important, I guess people just have to sell the AMD card and buy an NVIDIA one. I know that costs significantly more money.
a_slay_nub@reddit
MMLU of 19? I thought random guessing was 25?
xeeff@reddit
Pulled from HF.
Silver-Champion-4846@reddit
How many params exactly?
pmttyji@reddit (OP)
From Blog post:
13. Scaling Experiments
At ~2B parameters, ~2000 EFLOPs, and under strictly matched comparisons, Cola DLM's hierarchical continuous latent prior modeling demonstrates a meaningful scaling trend.
Silver-Champion-4846@reddit
Hmmm, 2B... GGUF probably needs llama.cpp support for this arch first.
kevinlch@reddit
If I understood correctly, the model is now aimed at delivering "meaning" instead of the "most probable next word". So the model has to resolve the overall meaning first and then populate the token sequence later, meaning the user can pick any length or level of detail for the output, and hallucination will be very low. Correct?
j_osb@reddit
Wow, this is very exciting. Hope to see some more support for this.