YouTube · 2026-04-10
"The eight times speed up claim is comparing 4-bit against 32-bit unquantized baseline. In practice modern LM inference does not usually use 32 bits. The real question should be how much better is TurboQuant than the baselines people already use, which they did not answer."
bycloud