DFloat11: Lossless LLM Compression for Efficient GPU Inference

Posted by ninjasaid13@reddit | LocalLLaMA | 6 comments

6 Comments

Legitimate-Week3916@reddit

Where is the hack?

BlueSwordM@reddit

You lose some performance because of the additional entropy coding.

Remote_Cap_@reddit

Slow for single-batch inference.

nihnuhname@reddit

I wonder if it is possible to compress bf8 to some variant of DFloat?

Remote_Cap_@reddit

Yes, although gains are smaller. u/danielhanchen thought the same thing! https://www.reddit.com/r/LocalLLaMA/comments/1k7o89n/comment/mp1zczv/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button

Remote_Cap_@reddit

One of the authors wrote an excellent post about it himself here: https://www.reddit.com/r/LocalLLaMA/comments/1k7o89n/we_compress_any_bf16_model_to_70_size_during/
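
For anyone wondering where the roughly 70% figure in that post title could come from, here is a minimal sketch. It is not the authors' code. It assumes the savings come from entropy coding only the 8-bit exponent of each BF16 value while the sign and 7 mantissa bits stay raw, and it uses synthetic Gaussian weights (standard deviation 0.02, an arbitrary choice) in place of a real model. The function names are made up for illustration. It reports the Shannon entropy bound, so a real coder with its tables and padding would do slightly worse.

```python
import math
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x (shared by float32 and truncated BF16)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def exponent_entropy_bits(weights) -> float:
    """Shannon entropy, in bits, of the exponent field across all weights."""
    counts = Counter(bf16_exponent(w) for w in weights)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def estimated_size_ratio(weights) -> float:
    """Estimated compressed size relative to 16-bit BF16 storage."""
    # Assumption: 1 sign bit + 7 mantissa bits are stored as-is,
    # and only the exponent is replaced by an ideal entropy code.
    return (1 + 7 + exponent_entropy_bits(weights)) / 16


random.seed(0)
# Synthetic stand-in for model weights; real checkpoints will differ.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

print(f"exponent entropy: {exponent_entropy_bits(weights):.2f} bits out of 8")
print(f"estimated size:   {estimated_size_ratio(weights):.0%} of BF16")
```

Under these assumptions the exponent carries far fewer than 8 bits of information, which is the slack a lossless coder can remove. The decoding step that undoes the entropy code at inference time is the extra work mentioned in the comments above.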