: A feature to handle memory spikes during training by offloading to CPU RAM. 🔬 Key Technical Details
: RNF4 mediates the degradation of the PML-RARα fusion protein. NF4.rar
: A process that quantizes the quantization constants themselves to save additional memory. : A feature to handle memory spikes during
: An information-theoretically optimal data type for normally distributed weights. It uses 16 quantization levels based on the quantiles of a standard normal distribution. 💡 : If you are looking for the
: Compresses 16-bit weights to 4 bits, effectively reducing VRAM usage by ~75% (e.g., a 65B parameter model can be loaded with ~35GB instead of ~130GB).
💡 : If you are looking for the software/machine learning paper, search for "QLoRA" or "4-bit NormalFloat" on arXiv .
: To reduce the memory footprint of LLMs (like Llama) enough to fit on a single GPU (e.g., a 24GB RTX 3090) while maintaining full 16-bit performance.