DFloat11: Lossless LLM Compression for Efficient GPU Inference

Posted by ninjasaid13@reddit | LocalLLaMA | 6 comments

6 Comments

[-]

BlueSwordM@reddit

You lose some inference speed because of the additional entropy decoding step.

[-]

nihnuhname@reddit

I wonder if it is possible to compress bf8 to some variant of DFloat?

[-]

Yes, although the gains are smaller. u/danielhanchen thought the same thing! https://www.reddit.com/r/LocalLLaMA/comments/1k7o89n/comment/mp1zczv/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button

[-]

Remote_Cap_@reddit

One of the authors made an amazing post about it himself here: https://www.reddit.com/r/LocalLLaMA/comments/1k7o89n/we_compress_any_bf16_model_to_70_size_during/
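
For anyone wondering where the roughly 70% figure in that post's title can come from, here is a rough sketch. It assumes the scheme keeps the sign bit and the 7 mantissa bits as they are and entropy-codes only the 8-bit exponent, and it uses Gaussian random numbers as a stand-in for real model weights. The function names and the 0.02 standard deviation are illustrative choices, not anything taken from the DFloat11 code.

```python
import math
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x.

    BF16 is the top 16 bits of an IEEE float32, so (ignoring rounding)
    its exponent field is the same as the float32 exponent field.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def entropy_bits(symbols) -> float:
    """Shannon entropy of a symbol stream, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


random.seed(0)
# Assumption: zero-mean Gaussian values as a stand-in for real weights.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

# Lower bound on the average exponent code length for an ideal entropy coder.
exp_entropy = entropy_bits(bf16_exponent(w) for w in weights)

# 1 sign bit + 7 mantissa bits stored raw, exponent entropy-coded.
bits_per_weight = 1 + 7 + exp_entropy
ratio = bits_per_weight / 16

print(f"exponent entropy: {exp_entropy:.2f} bits (stored as 8)")
print(f"estimated size:   {bits_per_weight:.2f} bits per weight, {ratio:.0%} of BF16")
```

The exponent carries far less than 8 bits of information under this assumption, which is the redundancy an entropy coder removes. An 8-bit format has fewer exponent bits to start with, so there is less to remove, which fits the "gains are smaller" reply above. Undoing the coding at inference time is the extra step behind the speed cost mentioned in the first comment.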