Quantization is a machine learning technique used to speed up inference and reduce the storage requirements of neural networks. It works by reducing the number of bits used to represent the model's weights.
For example, instead of representing the weights of a neural network as 32-bit floating point numbers, quantization can reduce them to 8-bit integers. This cuts memory usage by roughly a factor of four and can substantially reduce inference time, typically at only a small cost in accuracy.
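The sketch below illustrates the idea with a simple post-training affine quantization scheme in NumPy: the float range of the weights is mapped onto the integer range [0, 255] using a scale and a zero point. The function names `quantize` and `dequantize` are illustrative, not from any particular library, and real frameworks use more sophisticated calibration.

```python
# A minimal sketch of affine (asymmetric) 8-bit quantization,
# assuming weights that span a non-degenerate float range.
import numpy as np

def quantize(weights: np.ndarray, num_bits: int = 8):
    """Map float32 weights onto the integer range [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2**num_bits - 1
    w_min, w_max = float(weights.min()), float(weights.max())
    # scale maps the float range onto the integer range; zero_point is
    # the integer that represents the float value 0.0.
    scale = (w_max - w_min) / (qmax - qmin)
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 weights from the quantized values."""
    return (scale * (q.astype(np.float32) - zero_point)).astype(np.float32)

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
print("max quantization error:", np.abs(weights - restored).max())
```

Running this shows that the reconstructed weights differ from the originals by at most half a quantization step (about `scale / 2`), which is why accuracy usually degrades only slightly.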