YouTube · 2026-04-10
"This sort of optimization at the KV cache level is not something new. Every company that serves LLMs definitely uses some sort of quantization there. Nothing crazy revolutionary about AI is being discovered here. Everyone has already been maxing out compression efficiency in their own way."
bycloud
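
The KV-cache quantization the quote alludes to can be sketched in a few lines. Below is a minimal per-token symmetric int8 scheme in NumPy; the function names and the specific scaling choice are illustrative assumptions, not taken from any particular serving stack, which typically use more elaborate variants (per-channel scales, 4-bit formats, etc.):

```python
import numpy as np

def quantize_kv(x: np.ndarray):
    """Per-token symmetric int8 quantization of a cached K or V tensor."""
    # One scale per cached token (row), sized so the row's max maps to 127.
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 tensor for use in attention."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
k = rng.standard_normal((4, 64)).astype(np.float32)  # 4 cached tokens, head dim 64
q, s = quantize_kv(k)
k_hat = dequantize_kv(q, s)
err = np.abs(k - k_hat).max()  # worst-case round-off, bounded by scale / 2 per row
```

Storing int8 values plus one float32 scale per token cuts cache memory roughly 4x versus float32, at the cost of a small, bounded reconstruction error, which is the mundane, widely deployed kind of optimization the quote is describing.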