- This project implements the Google Research paper TurboQuant (arXiv:2504.19874) in PyTorch.
- [Source repository](https://github.com/javafa/turboquant)

Comparison against the baseline (FP16):

| Configuration | VRAM | KV Cache | Speed (short) | Speed (long) |
|---|---|---|---|---|
| BnB 4-bit | -53.5% | Same | -28.6% | -37.3% |
| TurboQuant 3-bit | Same | -39.9% | +23.1% | +5.3% |
| BnB 4-bit + TurboQuant | -53.5% | -39.9% | -29.4% | -38.5% |
| Unsloth 4-bit | -42% | Same | -41.1% | -17.5% |
| Unsloth 4-bit + TurboQuant | -42% | -39.9% | -5.5% | -12.7% |
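
The table above reports each value as a percentage relative to the FP16 baseline. A minimal sketch of how such figures are commonly derived is shown here, assuming the usual percent-change formula `(value - baseline) / baseline * 100`. The helper name `relative_change` and the numbers are hypothetical placeholders, not taken from this project or its measurements.

```python
def relative_change(value: float, baseline: float) -> float:
    """Return the percent change of `value` relative to `baseline`.

    Assumption: a negative result means the value is smaller than the baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Placeholder numbers for illustration only, NOT measurements from this project.
baseline_vram_gb = 16.0
quantized_vram_gb = 8.0
print(f"{relative_change(quantized_vram_gb, baseline_vram_gb):+.1f}%")  # prints -50.0%
```

Under this convention, a negative VRAM or KV cache figure is a saving. How to read the sign of the speed columns depends on whether speed was measured as throughput or as latency.
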
- This is a merged model, so you have to uncheck the "[X] Contains merge/moerge" checkbox above the list.

| Metric | Value |
|---|---|
| Avg. | 81.28 |


