LiquidGunay

Best way to add "Memory" to LLMs?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 62 comments
Chatkit-js with LangGraph Agents?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 6 comments
Fastest way to serve llama 3 8b

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 33 comments
Is the TPU really an ASIC?

Posted by LiquidGunay@reddit | hardware | View on Reddit | 56 comments
No AWQ for Gemma 3?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 26 comments
GRPO for VLMs?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 0 comments
The real use case for DIGITS is SLM training

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 16 comments
How to make Coding LMs more creative?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 8 comments
Is Mamba inference faster than Transformers? (in practice)

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 8 comments
What happened to the Nvidia VLM?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 6 comments
Is serving a quantized model faster?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 5 comments
Hard RAG benchmarks?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 5 comments
Task specific fine-tuning using distillation?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 2 comments
What architectural changes would be required to make an omni model?

Posted by LiquidGunay@reddit | LocalLLaMA | View on Reddit | 11 comments
If companies were waifus

Posted by LiquidGunay@reddit | Jokes | View on Reddit | 2 comments