Zhangzhe's Blog

The projection of my life.

Effective Quantization Methods for Recurrent Neural Networks

URL

https://arxiv.org/pdf/1611.10176

TL;DR

  • Proposes an RNN quantization method based on DoReFa-Net.

Algorithm

Dropout

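  • An RNN is essentially composed of several FC layers, so Dropout is needed to randomly set some elements to 0. In a quantized network, however, 0 is not one of the values that $Q_k(X + 0.5) - 0.5$ can represent, so the quantization range of the original DoReFa-Net has to be shifted: $[-0.5, 0.5] \to [0, 1]$.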
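To see why 0 falls off the grid, here is a minimal sketch that enumerates the representable levels for both ranges. It assumes the usual DoReFa-Net uniform quantizer $Q_k(x) = \mathrm{round}((2^k - 1)x) / (2^k - 1)$ on $[0, 1]$; the helper names `quantize_k` and `representable_levels` are made up for this illustration.

```python
def quantize_k(x, k):
    """Uniform k-bit quantizer for x in [0, 1] (assumed DoReFa-Net form)."""
    n = 2 ** k - 1
    return round(x * n) / n

def representable_levels(k, offset):
    """All values of Q_k(.) - offset, i.e. the k-bit grid shifted down by `offset`."""
    n = 2 ** k - 1
    return [i / n - offset for i in range(n + 1)]

k = 2
symmetric = representable_levels(k, 0.5)  # original range [-0.5, 0.5]
unit = representable_levels(k, 0.0)       # shifted range [0, 1]

# 2^k - 1 is odd, so no level i / (2^k - 1) equals 0.5 and the symmetric grid skips 0.
print(0.0 in symmetric, 0.0 in unit)  # False True

# An element zeroed by Dropout survives quantization on the [0, 1] grid.
print(quantize_k(0.0, k))  # 0.0
```

Since $2^k - 1$ is odd for every $k \ge 1$, the same argument holds for any bit width, not only for $k = 2$.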

quantization of word embedding weights

  • Initialization
    $W \sim \mathrm{Uniform}(0, 1)$
  • Quantization
    $W = \mathrm{Clip}(W, 0, 1)$
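The two steps above could be sketched as follows. The note only gives the initialization and the clip; applying $Q_k$ after the clip is an assumption made here, and `init_embedding`, `quantize_embedding`, and `quantize_k` are hypothetical helper names.

```python
import random

def quantize_k(x, k):
    """Uniform k-bit quantizer for x in [0, 1] (assumed DoReFa-Net form)."""
    n = 2 ** k - 1
    return round(x * n) / n

def init_embedding(vocab_size, dim, seed=0):
    """Draw every embedding weight from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def quantize_embedding(W, k):
    """Clip each weight to [0, 1] as in the note, then (assumption) snap it to the k-bit grid."""
    return [[quantize_k(min(max(w, 0.0), 1.0), k) for w in row] for row in W]

W = init_embedding(vocab_size=5, dim=4)
W[0][0] = 1.3  # pretend a training update pushed one weight outside [0, 1]
W_q = quantize_embedding(W, k=2)
print(W_q[0])
```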

quantization of GRU

  • standard GRU
    $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

    $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$

    $\tilde{h}_t = \tanh(W \cdot [r_t \times h_{t-1}, x_t])$

    $h_t = (1 - z_t) \times h_{t-1} + z_t \times \tilde{h}_t$

    where $\cdot$ denotes matmul, $\times$ denotes the Hadamard product, and $\sigma$ denotes the sigmoid function.

  • quantization of GRU
    $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

    $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$

    $\tilde{h}_t = \sigma(W \cdot [Q_k(r_t \times h_{t-1}), x_t])$

    $h_t = Q_k((1 - z_t) \times h_{t-1} + z_t \times \tilde{h}_t)$

    • Changes:
      • tanh is replaced by sigmoid
      • $W, W_z, W_r$ are quantized to $[-1, 1]$
      • $x_t$ is quantized to $[0, 1]$
      • the quantization function $Q_k()$ is applied
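A minimal pure-Python sketch of one quantized GRU step, following the four equations above. The note does not say how the weights are mapped to $[-1, 1]$, so `quantize_weight` (clip, then rescale through $Q_k$) is an assumption, as are all helper names.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def quantize_k(x, k):
    """Uniform k-bit quantizer for x in [0, 1] (assumed DoReFa-Net form)."""
    n = 2 ** k - 1
    return round(x * n) / n

def quantize_weight(w, k):
    """Assumed weight quantizer: clip to [-1, 1], then apply Q_k on the rescaled value."""
    w = min(max(w, -1.0), 1.0)
    return 2.0 * quantize_k((w + 1.0) / 2.0, k) - 1.0

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def quantized_gru_step(h_prev, x, Wz, Wr, W, k):
    """One step of the quantized GRU; h_prev is expected to lie in [0, 1]."""
    x_q = [quantize_k(min(max(v, 0.0), 1.0), k) for v in x]  # x_t on the [0, 1] grid
    Wz_q, Wr_q, W_q = ([[quantize_weight(w, k) for w in row] for row in M]
                       for M in (Wz, Wr, W))
    hx = h_prev + x_q  # list concatenation plays the role of [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(Wz_q, hx)]
    r = [sigmoid(a) for a in matvec(Wr_q, hx)]
    rh_q = [quantize_k(ri * hi, k) for ri, hi in zip(r, h_prev)]
    h_tilde = [sigmoid(a) for a in matvec(W_q, rh_q + x_q)]  # sigmoid instead of tanh
    return [quantize_k((1.0 - zi) * hi + zi * hti, k)
            for zi, hi, hti in zip(z, h_prev, h_tilde)]

rng = random.Random(0)
hidden, inputs, k = 2, 3, 2
Wz, Wr, W = ([[rng.uniform(-1, 1) for _ in range(hidden + inputs)]
              for _ in range(hidden)] for _ in range(3))
h = [0.0] * hidden
for x in ([0.2, 0.9, 0.5], [1.0, 0.0, 0.3]):
    h = quantized_gru_step(h, x, Wz, Wr, W, k)
print(h)
```

Because $\tilde{h}_t$ comes out of a sigmoid, the new state is a convex combination of values in $[0, 1]$, so $Q_k$ can be applied to $h_t$ without an extra clip.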

quantization of LSTM

  • standard LSTM
    $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

    $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

    $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

    $C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$

    $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

    $h_t = o_t \times \tanh(C_t)$

  • quantization of LSTM
    $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

    $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

    $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

    $C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$

    $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

    $h_t = Q_k(o_t \times \sigma(C_t))$

    • Changes:
      • tanh is replaced by sigmoid in the output $h_t$
      • $W_f, W_i, W_C, W_o$ are quantized to $[-1, 1]$
      • $x_t$ is quantized to $[0, 1]$
      • the quantization function $Q_k()$ is applied
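The quantized LSTM step can be sketched the same way. As in the equations above, only $h_t$ passes through $Q_k$ and the cell state $C_t$ stays in full precision. Keeping the biases unquantized and the clip-then-rescale weight quantizer are assumptions of this sketch, and the helper names are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def quantize_k(x, k):
    """Uniform k-bit quantizer for x in [0, 1] (assumed DoReFa-Net form)."""
    n = 2 ** k - 1
    return round(x * n) / n

def quantize_weight(w, k):
    """Assumed weight quantizer: clip to [-1, 1], then apply Q_k on the rescaled value."""
    w = min(max(w, -1.0), 1.0)
    return 2.0 * quantize_k((w + 1.0) / 2.0, k) - 1.0

def affine(M, v, b):
    return [sum(m * x for m, x in zip(row, v)) + bi for row, bi in zip(M, b)]

def quantized_lstm_step(h_prev, c_prev, x, params, k):
    """params maps a gate name ('f', 'i', 'C', 'o') to (weight matrix, bias vector)."""
    x_q = [quantize_k(min(max(v, 0.0), 1.0), k) for v in x]  # x_t on the [0, 1] grid
    hx = h_prev + x_q  # list concatenation plays the role of [h_{t-1}, x_t]
    pre = {}
    for name, (M, b) in params.items():
        M_q = [[quantize_weight(w, k) for w in row] for row in M]
        pre[name] = affine(M_q, hx, b)  # biases kept in full precision (assumption)
    f = [sigmoid(a) for a in pre["f"]]
    i = [sigmoid(a) for a in pre["i"]]
    o = [sigmoid(a) for a in pre["o"]]
    c_tilde = [math.tanh(a) for a in pre["C"]]
    c = [fj * cj + ij * ctj for fj, cj, ij, ctj in zip(f, c_prev, i, c_tilde)]
    h = [quantize_k(oj * sigmoid(cj), k) for oj, cj in zip(o, c)]  # sigmoid instead of tanh
    return h, c

rng = random.Random(0)
hidden, inputs, k = 2, 2, 2
params = {
    name: ([[rng.uniform(-1, 1) for _ in range(hidden + inputs)] for _ in range(hidden)],
           [0.0] * hidden)
    for name in ("f", "i", "C", "o")
}
h, c = [0.0] * hidden, [0.0] * hidden
for x in ([0.3, 0.8], [1.0, 0.1], [0.0, 0.6]):
    h, c = quantized_lstm_step(h, c, x, params, k)
print(h, c)
```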

Thoughts

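  • On current chips, all linear operations are absorbed into the convolution's affine_k and affine_b, so tanh and sigmoid still have to be handled separately. The simplest brute-force approach is to clip them to [-1, 1] and [0, 1], respectively.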
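One literal reading of that brute-force idea is to replace each nonlinearity with a clip of its input to the function's output range. The sketch below follows that reading only; `clipped_tanh` and `clipped_sigmoid` are hypothetical names, and no claim is made about how any particular chip implements this.

```python
import math

def clipped_tanh(x):
    """Stand-in for tanh: pass x through, saturating at -1 and 1."""
    return min(max(x, -1.0), 1.0)

def clipped_sigmoid(x):
    """Stand-in for sigmoid: pass x through, saturating at 0 and 1."""
    return min(max(x, 0.0), 1.0)

# Compare the stand-ins with the exact functions at a few sample points.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    exact_sigmoid = 1.0 / (1.0 + math.exp(-x))
    print(x, clipped_tanh(x), math.tanh(x), clipped_sigmoid(x), exact_sigmoid)
```

Note that under this reading the clipped sigmoid returns 0 at $x = 0$ where the exact sigmoid returns 0.5, so a network would have to be trained with the clipped version rather than swapped in after training.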