Q8BERT, a Quantized 8bit Version of BERT-Base – Intel AI

This work presents a method for quantizing BERT-base to 8 bits that achieves a best-in-class compression-accuracy ratio. We have open-sourced the quantization method and the code for reproducing the 8-bit quantized models; both are available in NLP Architect release 0.5.
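
For readers unfamiliar with 8-bit quantization, the following is a minimal sketch of generic symmetric linear quantization in plain Python. It is an illustration only, not the NLP Architect implementation: the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and the scheme shown (scaling by the largest absolute value so it maps to 127) is assumed here as one common choice.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric linear quantization of floats to signed 8-bit integers.

    Hypothetical helper for illustration; not part of NLP Architect.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude onto 127; use 1.0 when the input is all zeros.
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range [-127, 127].
    quantized = [max(-127, min(127, round(v * scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map 8-bit integers back to approximate float values."""
    return [q / scale for q in quantized]


# Illustrative weights (made up for this example).
weights = [0.4, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Largest difference between an original weight and its restored value.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored in one byte instead of four, which is where the roughly 4x size reduction of 8-bit models over 32-bit floating point comes from; the rounding error per value is bounded by half a quantization step, `0.5 / scale`.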