
Effective Quantization Methods for Recurrent Neural Networks

URL

https://arxiv.org/pdf/1611.10176

TL;DR

  • Proposes a quantization method for RNNs based on DoReFa-Net

Algorithm

Dropout

  • Since an RNN is essentially a stack of FC layers, Dropout is used to randomly set part of the elements to 0. In a quantized network, however, 0 is not among the values produced by $Q_k(X + 0.5) - 0.5$, so the original DoReFa-Net quantization range has to be changed: $[-0.5, 0.5] \to [0, 1]$.
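To make the range issue concrete, here is a minimal NumPy sketch (the helper name `quantize_k` is made up) of the DoReFa-Net uniform quantizer $Q_k(x) = \mathrm{round}\big((2^k-1)x\big)/(2^k-1)$, showing that the shifted grid of $Q_k(X+0.5)-0.5$ does not contain 0, while the $[0, 1]$ grid does:

```python
import numpy as np

def quantize_k(x, k):
    """DoReFa-Net style uniform k-bit quantizer: maps x in [0, 1] onto the
    grid {0, 1/(2^k - 1), ..., 1}."""
    levels = 2 ** k - 1
    return np.round(x * levels) / levels

k = 2
grid = np.arange(2 ** k) / (2 ** k - 1)   # all representable values in [0, 1]

# Shifted variant Q_k(x + 0.5) - 0.5 lives on a grid inside [-0.5, 0.5];
# for k = 2 that grid is {-0.5, -1/6, 1/6, 0.5} -- exact 0 is missing,
# so an element zeroed by Dropout cannot be represented.
print(quantize_k(grid, k) - 0.5)          # [-0.5, -0.1667, 0.1667, 0.5]

# Quantizing directly on [0, 1] keeps 0 on the grid, so Dropout zeros survive.
print(quantize_k(grid, k))                # [0.0, 0.3333, 0.6667, 1.0]
```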

quantization of word embedding weights

  • Initialization

    $W \in \mathrm{Uniform}(0, 1)$

  • quantization

    $W = \mathrm{Clip}(W, 0, 1)$
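A small sketch of this embedding-weight recipe (shapes and the final $Q_k$ step are my assumptions, the latter inferred from the section title; `quantize_k` is the helper defined earlier):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, k = 10000, 256, 2                  # illustrative sizes / bit width

W = rng.uniform(0.0, 1.0, size=(vocab_size, embed_dim))  # W in Uniform(0, 1)
W = np.clip(W, 0.0, 1.0)                                  # W = Clip(W, 0, 1)
W_q = np.round(W * (2 ** k - 1)) / (2 ** k - 1)           # Q_k(W), assumed
```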

quantization of GRU

  • standard GRU

    $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

    $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$

    $\tilde{h}_t = \tanh(W \cdot [r_t \times h_{t-1}, x_t])$

    $h_t = (1 - z_t) \times h_{t-1} + z_t \times \tilde{h}_t$

    where "$\cdot$" denotes matmul, "$\times$" denotes the Hadamard product, and $\sigma$ denotes the sigmoid function.

  • quantization of GRU

    $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

    $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$

    $\tilde{h}_t = \sigma(W \cdot [Q_k(r_t \times h_{t-1}), x_t])$

    $h_t = Q_k((1 - z_t) \times h_{t-1} + z_t \times \tilde{h}_t)$

    • Changes:
      • $\tanh \to \mathrm{sigmoid}$
      • $W, W_z, W_r$ are quantized to [-1, 1]
      • $x_t$ is quantized to [0, 1]
      • the quantization function $Q_k(\cdot)$ is inserted (on $r_t \times h_{t-1}$ and on $h_t$), as sketched below
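A minimal NumPy sketch of one step of this quantized GRU (function and argument names are mine; the bit width `k` is illustrative, and the weights / $x_t$ are assumed to be already quantized to $[-1, 1]$ / $[0, 1]$):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quantize_k(x, k):
    """Uniform k-bit quantizer on [0, 1] (DoReFa-Net style)."""
    levels = 2 ** k - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def quantized_gru_step(x_t, h_prev, W_z, W_r, W, k=2):
    """One quantized GRU step: tanh is replaced by sigmoid, and Q_k is
    applied to r_t * h_{t-1} and to the new hidden state h_t."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)
    r_t = sigmoid(W_r @ concat)
    gated = quantize_k(r_t * h_prev, k)                        # Q_k(r_t x h_{t-1})
    h_tilde = sigmoid(W @ np.concatenate([gated, x_t]))        # tanh -> sigmoid
    h_t = quantize_k((1.0 - z_t) * h_prev + z_t * h_tilde, k)  # Q_k(...)
    return h_t
```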

quantization of LSTM

  • standard LSTM

    $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

    $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

    $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

    $C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$

    $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

    $h_t = o_t \times \tanh(C_t)$

  • quantization of LSTM

    $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

    $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

    $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

    $C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$

    $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

    $h_t = Q_k(o_t \times \sigma(C_t))$

    • Changes:
      • $\tanh \to \mathrm{sigmoid}$ (for the output $\sigma(C_t)$)
      • the weight matrices $W_f, W_i, W_C, W_o$ are quantized to [-1, 1]
      • $x_t$ is quantized to [0, 1]
      • the quantization function $Q_k(\cdot)$ is applied to $h_t$, as sketched below
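The same kind of sketch for the quantized LSTM step (again, names and `k` are illustrative; `sigmoid` and `quantize_k` are the helpers from the GRU sketch, and weights / $x_t$ are assumed pre-quantized):

```python
def quantized_lstm_step(x_t, h_prev, C_prev,
                        W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o, k=2):
    """One quantized LSTM step: gates and the candidate cell state are
    unchanged; the output tanh(C_t) becomes sigmoid(C_t), and Q_k is
    applied to the new hidden state h_t."""
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ concat + b_f)
    i_t = sigmoid(W_i @ concat + b_i)
    C_tilde = np.tanh(W_C @ concat + b_C)
    C_t = f_t * C_prev + i_t * C_tilde
    o_t = sigmoid(W_o @ concat + b_o)
    h_t = quantize_k(o_t * sigmoid(C_t), k)   # h_t = Q_k(o_t x sigma(C_t))
    return h_t, C_t
```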

Thoughts

  • On current chips, all linear operations are folded into the convolution's affine_k and affine_b, so tanh and sigmoid still have to be dealt with; the simplest brute-force option is to clip to [-1, 1] and [0, 1], respectively (a sketch follows).
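A sketch of that brute-force substitution (plain clipping; whether this is accurate enough is model-dependent):

```python
import numpy as np

def hard_tanh(x):
    """Stand-in for tanh: just clip to [-1, 1], which hardware can fuse
    into the affine_k / affine_b of the preceding linear op."""
    return np.clip(x, -1.0, 1.0)

def hard_sigmoid(x):
    """Stand-in for sigmoid: just clip to [0, 1]."""
    return np.clip(x, 0.0, 1.0)
```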