bycloud on AI benchmarks
2 quotes from AI researchers about benchmarks, models, and evaluation
"The eight-times speedup claim compares 4-bit against a 32-bit unquantized baseline. In practice, modern LM inference does not usually use 32 bits. The real question should be how much better TurboQuant is than the baselines people already use, which they did not answer."
bycloud @bycloud · 2026-04-10
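The arithmetic behind the quote can be made explicit with a short sketch. This is illustrative only (the ratios are back-of-the-envelope, not figures from TurboQuant's paper): a memory/bandwidth reduction quoted against an fp32 baseline looks twice as large as the same quantization measured against the fp16/bf16 baselines most serving stacks actually deploy.

```python
# Illustrative arithmetic: how the choice of baseline inflates a
# quantization speedup claim. Memory-bound decoding speedup roughly
# tracks the reduction in bits moved per weight.
def memory_ratio(baseline_bits: int, quant_bits: int) -> float:
    """Reduction factor in bytes moved when quantizing from
    baseline_bits down to quant_bits per value."""
    return baseline_bits / quant_bits

print(memory_ratio(32, 4))  # 8.0 -- the headline "8x", vs fp32
print(memory_ratio(16, 4))  # 4.0 -- vs the fp16/bf16 people already use
```

The same 4-bit model yields "8x" or "4x" depending purely on which baseline the authors picked.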
"This sort of optimization at the KV-cache level is not new. Every company that serves LLMs already uses some form of quantization there. Nothing crazy revolutionary about AI has been discovered here; everyone has already been maxing out compression efficiency in their own ways."
bycloud @bycloud · 2026-04-10
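To make concrete what "some sort of quantization" of the KV cache means, here is a minimal sketch of one common pattern: per-token absmax int8 quantization with a stored scale, dequantized on read. This is an assumed, generic illustration, not any vendor's or TurboQuant's actual implementation.

```python
import numpy as np

def quantize_kv(x: np.ndarray):
    """Per-row absmax int8 quantization; x has shape (tokens, head_dim).
    Returns the int8 codes and a per-token float scale."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float values for attention computation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
k = rng.standard_normal((4, 64)).astype(np.float32)  # toy key cache
q, s = quantize_kv(k)
err = np.abs(dequantize_kv(q, s) - k).max()
# 4x smaller cache (ignoring the small per-token scales), modest error
print(k.nbytes / q.nbytes, err)
```

The 4x cache shrink directly cuts the memory traffic that dominates long-context decoding, which is why, as the quote notes, every serious serving stack already does something like this.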