⚠ We use dtype FP16 because FP32 is much slower due to the hardware limit, TFLOPS = 32 (INT8) / 16 (FP16) / 2 (FP32), and INT8 did not work properly in either of our two attempts :( ...
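
To put the quoted limits in perspective, here is a minimal sketch of the arithmetic. It uses only the three TFLOPS figures from the note; the names `PEAK_TFLOPS` and `relative_speed` are hypothetical, and the ratios are ceilings implied by those figures, not measured speedups.

```python
# Hardware throughput limits quoted in the note above, in TFLOPS.
PEAK_TFLOPS = {"INT8": 32, "FP16": 16, "FP32": 2}


def relative_speed(dtype: str, baseline: str = "FP32") -> float:
    """Ratio of the quoted limit for `dtype` to that of `baseline`.

    This is an upper bound implied by the quoted figures, not a benchmark.
    """
    return PEAK_TFLOPS[dtype] / PEAK_TFLOPS[baseline]


if __name__ == "__main__":
    for name in PEAK_TFLOPS:
        print(f"{name}: {relative_speed(name):.0f}x the FP32 limit")
```

By these figures FP16 has 8 times the FP32 ceiling, and INT8 would only double that again if it could be made to work.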