Compression-aware prompting for quantized models

Funny, this paper was accepted at ICML 2024, but what exactly is the novelty here? Is it just prompt-tuning a compressed model?