
I am trying to convert microsoft/Multilingual-MiniLM-L12-H384 to int8 with a post-training quantization approach. Can someone point me to references?

I have referred to the TensorFlow blogs and converter APIs and did convert to int8, a16i8 (16-bit activations, 8-bit weights), and default quantization variants. However, model accuracy is comparable to the float model only with default quantization.
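For comparison, this is the minimal full-int8 post-training conversion I am using, following the TFLite converter docs. The saved-model path, the sequence length of 128, and the random-token-id calibration data are placeholders (assumptions), not the real pipeline; real tokenized text should be fed instead:

```python
import numpy as np

SEQ_LEN = 128  # assumed sequence length


def representative_dataset():
    # Placeholder calibration data: random token ids plus an all-ones mask.
    # Replace with a few hundred real tokenized sentences; random ids can
    # produce poor calibration ranges.
    for _ in range(200):
        input_ids = np.random.randint(0, 30000, size=(1, SEQ_LEN), dtype=np.int32)
        attention_mask = np.ones((1, SEQ_LEN), dtype=np.int32)
        yield [input_ids, attention_mask]


def convert_full_int8(saved_model_dir):
    # TensorFlow is imported lazily so the dataset helper above can be
    # inspected or tested without a TF installation.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force int8 kernels with no float fallback, so unsupported ops fail loudly
    # instead of silently staying in float.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    return converter.convert()
```

With `converter.optimizations = [tf.lite.Optimize.DEFAULT]` and no representative dataset, the same code produces the default (dynamic-range) variant that keeps accuracy for me.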

The full int8 and a16i8 variants result in almost zero accuracy.

For the representative dataset, I have tried anywhere from a couple of hundred to a few thousand samples.

I have also followed the quantization-aware training example, and the accuracy results are still similar to post-training quantization.

I have also tried the converter's quantization debugger and checked the error in each layer. Only when I convert with a deny list of the layers whose rmse/scale is above 0.3 or below 0.2 does the converted model meet the desired accuracy, but in that process almost none of the layers end up quantized.
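To automate that deny-list step I follow the quantization debugger guide roughly as below. The 0.3 threshold is the one from my experiments, and `select_denylist` is just my helper that reimplements the rmse/scale filter over the debugger's per-layer statistics:

```python
def select_denylist(layer_stats, threshold=0.3):
    """Pick tensor names whose rmse/scale exceeds the threshold.

    layer_stats: iterable of dicts with 'tensor_name',
    'mean_squared_error', and 'scale' keys, i.e. the rows written by
    QuantizationDebugger.layer_statistics_dump().
    """
    denylist = []
    for row in layer_stats:
        rmse_over_scale = row["mean_squared_error"] ** 0.5 / row["scale"]
        if rmse_over_scale > threshold:
            denylist.append(row["tensor_name"])
    return denylist


def selectively_quantize(converter, representative_dataset, denylisted_nodes):
    # Lazy import so the pure helper above stays usable without TensorFlow.
    import tensorflow as tf

    debug_options = tf.lite.experimental.QuantizationDebugOptions(
        denylisted_nodes=denylisted_nodes)
    debugger = tf.lite.experimental.QuantizationDebugger(
        converter=converter,
        debug_dataset=representative_dataset,
        debug_options=debug_options)
    # Produces a model in which only the denylisted tensors stay in float.
    return debugger.get_nondebug_quantized_model()
```

In my runs the deny list returned this way ends up covering almost every layer, which is exactly the problem described above.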

Any reference scripts for converting the above model (or a similar architecture) to int8, or any pointers, would help me.

Appreciate any support.

Tags: tensorflow (Stack Overflow: "Reference code to convert microsoft/Multilingual-MiniLM-L12-H384 to int8 quantization")