Post-Training Quantization of a Language Model using Distiller
A detailed, Jupyter Notebook-based tutorial on this topic is located at `<distiller_repo_root>/examples/word_language_model/quantize_lstm.ipynb`.
You can also view a "read-only" version of it in the Distiller GitHub repository.
The tutorial covers the following (illustrative code sketches for these steps appear after the list):
- Converting the model to use Distiller's modular LSTM implementation, which allows flexible quantization of internal LSTM operations
- Collecting activation statistics prior to quantization
- Creating a `PostTrainLinearQuantizer` and preparing the model for quantization
- The "net-aware quantization" capability of `PostTrainLinearQuantizer`
- Progressively tweaking the quantization settings in order to improve accuracy
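
As a taste of the first step, here is a minimal sketch of swapping a standard `nn.LSTM` for Distiller's modular implementation via `DistillerLSTM.from_pytorch_impl`. The `WordLM` class is a hypothetical stand-in for the tutorial's model, not code from the notebook:

```python
import torch.nn as nn
from distiller.modules import DistillerLSTM

class WordLM(nn.Module):
    """Toy word-language model, standing in for the notebook's model."""
    def __init__(self, vocab=1000, emb=64, hid=64, layers=2):
        super().__init__()
        self.encoder = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hid, layers)
        self.decoder = nn.Linear(hid, vocab)

    def forward(self, tokens, hidden=None):
        out, hidden = self.rnn(self.encoder(tokens), hidden)
        return self.decoder(out), hidden

model = WordLM()

# Replace the monolithic nn.LSTM with an equivalent DistillerLSTM, copying the
# (trained) weights. Every internal operation (gates, element-wise adds and
# multiplies, activations) becomes a distinct sub-module, so each one can be
# quantized and configured independently.
model.rnn = DistillerLSTM.from_pytorch_impl(model.rnn)
```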
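The statistics-collection step can be sketched as follows; `evaluate` and `val_loader` are hypothetical placeholders for the notebook's evaluation loop and calibration data:

```python
from distiller.data_loggers import collect_quant_stats

def calibration_fn(model):
    # Hypothetical calibration pass: in the notebook this is the regular
    # evaluation loop over a validation set.
    evaluate(model, val_loader)

# Runs the model under a calibration-stats collector, recording min/max/avg
# statistics for every activation tensor, and saves them to a YAML file in
# save_dir for the quantizer to consume later.
collect_quant_stats(model, calibration_fn, save_dir='.')
```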
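Creating the quantizer and preparing the model might then look like the following. The bit-widths and quantization mode shown are common defaults rather than the notebook's final settings, and the dummy-input shape is an assumption. In recent Distiller versions, `prepare_model` traces the model graph using the dummy input; that traced graph is what the "net-aware quantization" capability builds on (for example, propagating settings across operations that must share quantization parameters):

```python
import torch
from distiller.quantization import PostTrainLinearQuantizer, LinearQuantMode

quantizer = PostTrainLinearQuantizer(
    model,                                    # model from the first sketch
    mode=LinearQuantMode.ASYMMETRIC_UNSIGNED,
    bits_activations=8,
    bits_parameters=8,                        # weight bit-width
    bits_accum=32,                            # accumulator bit-width
    model_activation_stats='./acts_quantization_stats.yaml')  # path assumed from
                                                              # the sketch above

# Replaces supported modules with quantized wrappers in-place. The dummy input
# (shape assumed here: seq_len x batch of token ids) is used to trace the graph.
dummy_input = torch.randint(0, 1000, (35, 16))
quantizer.prepare_model(dummy_input)
```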
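Finally, progressive tweaking mostly amounts to re-creating the quantizer with adjusted settings, re-evaluating (perplexity, for a word language model), and keeping the changes that help. One mechanism for per-layer adjustments is the `overrides` argument, which maps layer-name regex patterns to per-layer settings; the pattern and bit-width below are purely illustrative assumptions, not the notebook's actual configuration:

```python
from collections import OrderedDict
from distiller.quantization import PostTrainLinearQuantizer, LinearQuantMode

# Illustrative only: give the LSTM's element-wise multiply operations a wider
# activation bit-width than the rest of the model. Both the regex and the
# bit-width are assumptions for the sake of the example.
overrides = OrderedDict([
    ('rnn.cells.*eltwisemult.*', {'bits_activations': 16}),
])

quantizer = PostTrainLinearQuantizer(
    model,
    mode=LinearQuantMode.ASYMMETRIC_UNSIGNED,
    bits_activations=8,
    bits_parameters=8,
    bits_accum=32,
    model_activation_stats='./acts_quantization_stats.yaml',
    overrides=overrides)
```

Each such experiment is then compared against the FP32 baseline to decide whether the tweak actually improved accuracy.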