Recurrent Neural Networks (RNN)

1. Definition

A recurrent neural network (RNN) is a class of recursive neural networks that takes sequence data as input, performs recursion along the direction in which the sequence evolves, and has all of its nodes (recurrent units) connected in a chain.

Equivalently: a Recurrent Neural Network (RNN) is a type of neural network where the output from the previous step is fed as input to the current step (that is why it is called "recurrent"). The main and most important feature of an RNN is its hidden state, which remembers information about the sequence[1].

Recurrent neural networks (RNNs) are state-of-the-art algorithms for sequential data and are used by Apple's Siri and Google's voice search. Like many other deep learning algorithms, RNNs are relatively old: they were initially created in the 1980s, but only in recent years have we seen their true potential. An increase in computational power, the massive amounts of data we now have to work with, and the invention of long short-term memory (LSTM) in the 1990s have really brought RNNs to the foreground[1].

1.1. Artificial Neural Networks (ANN)

An artificial neural network (ANN), or neural network (NN) for short, is a mathematical or computational model that mimics the structure and function of biological neural networks. A neural network computes through a large number of interconnected artificial neurons. In most cases an artificial neural network can change its internal structure based on external information, which makes it an adaptive system. Modern neural networks are nonlinear statistical data-modeling tools, commonly used to model complex relationships between inputs and outputs or to discover patterns in data.

A neural network is a computational model composed of a large number of nodes (or "neurons") and the connections between them. Each node represents a particular output function, called the activation function. Each connection between two nodes carries a weight applied to the signal passing through it; these weights serve as the memory of the artificial neural network. The network's output depends on its connection pattern, its weight values, and its activation functions. The network itself is usually an approximation of some algorithm or function found in nature, or an expression of a logical strategy. The simplest unit computes:

$$\text{output} = f(x) = \begin{cases} 1, & \sum_i w_i x_i + b \ge 0 \\ 0, & \sum_i w_i x_i + b < 0 \end{cases}$$

This thresholded weighted sum defines a single neuron, the basic building block; an artificial neural network is built by connecting many such neurons, as in the sketch below.
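To make the definition concrete, here is a minimal Python sketch of the single neuron above; the weights, bias, and input are illustrative values, not from the original text.

```python
import numpy as np

# A single neuron: weighted sum of inputs plus bias, passed through a
# step activation (1 if the sum is non-negative, else 0).
def neuron(x, w, b):
    return 1 if np.dot(w, x) + b >= 0 else 0

# Illustrative input, weights, and bias (assumed values).
print(neuron(x=np.array([1.0, 0.5]), w=np.array([0.6, -0.4]), b=-0.1))  # -> 1
```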

The activation function f(x) can take one of the following common forms.

Types of Activation Functions (GeeksforGeeks, 5 Deep Learning and Neural Network Activation Functions to Know); a minimal Python sketch of each follows the list:

  • Step Function: $H(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$
  • Sigmoid Function: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
  • Tanh Function: $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
  • ReLU: $H(x) = \max(0, x)$, note: ReLU = Rectified Linear Unit
  • Leaky ReLU: $H(x) = \begin{cases} ax, & x < 0 \\ x, & x \ge 0 \end{cases}$ (with a small slope $a$)
  • SoftMax Function: $\mathrm{softmax}(x_i) = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$
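As referenced above, here is a minimal NumPy sketch of these activation functions; the leaky-ReLU slope a = 0.01 is a common default, assumed here for illustration.

```python
import numpy as np

# Minimal sketches of the activation functions listed above.
def step(x):
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):  # a is an assumed common default
    return np.where(x < 0, a * x, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])).sum())  # ~1.0
```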

2. Training RNN

The formula for calculating the current state:

$$h_t = f(h_{t-1}, x_t)$$

where:

$h_t$ -> current state
$h_{t-1}$ -> previous state
$x_t$ -> input state

Applying the activation function (here tanh), the current state becomes:

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$$

where:

$W_{hh}$ -> weight at recurrent neuron
$W_{xh}$ -> weight at input neuron

The formula for calculating the output:

$$y_t = W_{hy} h_t$$

where:

$y_t$ -> output
$W_{hy}$ -> weight at output layer
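The recurrence above can be written out directly. Below is a minimal NumPy sketch of one forward pass; all dimensions and the random initialization are illustrative assumptions, not from the source.

```python
import numpy as np

# Illustrative dimensions (assumptions).
input_size, hidden_size, output_size = 4, 8, 3

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # weight at input neuron
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # weight at recurrent neuron
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # weight at output layer

def rnn_step(x_t, h_prev):
    """One recurrence: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

# Unroll over a toy sequence of length 5.
h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, y = rnn_step(x, h)
print(y.shape)  # (3,)
```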

2.1. Training through RNN

  1. A single time step of the input is provided to the network.
  2. The network then calculates its current state from the current input and the previous state.
  3. The current state $h_t$ becomes $h_{t-1}$ for the next time step.
  4. One can go through as many time steps as the problem requires and join the information from all the previous states.
  5. Once all the time steps are completed, the final current state is used to calculate the output.
  6. The output is then compared to the actual output, i.e., the target output, and the error is generated.
  7. The error is then backpropagated through the network to update the weights, and thus the network (RNN) is trained (see the training-loop sketch after this list).
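The steps above map directly onto a standard training loop. Below is a minimal PyTorch sketch on a toy many-to-one task; the model sizes, random data, and learning rate are illustrative assumptions, not from the source.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size, num_classes = 5, 16, 4, 8, 3

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, num_classes)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(batch, seq_len, input_size)           # toy input sequences
target = torch.randint(0, num_classes, (batch,))      # toy target labels

for epoch in range(3):
    optimizer.zero_grad()
    _, h_n = rnn(x)                # steps 1-5: unroll over all time steps; final state
    logits = head(h_n.squeeze(0))  # step 5: output from the final current state
    loss = loss_fn(logits, target) # step 6: compare to the target, generate the error
    loss.backward()                # step 7: backpropagate through time (BPTT)
    optimizer.step()               # step 7: update the weights
    print(epoch, loss.item())
```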

2.2. Advantages of Recurrent Neural Networks

  1. An RNN remembers information through time via its hidden state, which is what makes it useful for time-series prediction; the gated variant designed specifically to remember over long spans is the Long Short-Term Memory (LSTM) network.
  2. Recurrent neural networks can even be used with convolutional layers to extend the effective pixel neighborhood.

2.3. Disadvantages of Recurrent Neural Networks

  1. Gradient vanishing and exploding problems.
  2. Training an RNN is a very difficult task.
  3. It cannot process very long sequences when tanh or ReLU is used as the activation function.

2.4. Applications of Recurrent Neural Networks

  1. Language Modelling and Generating Text
  2. Speech Recognition
  3. Machine Translation
  4. Image Recognition and Face Detection
  5. Time Series Forecasting

3. LSTM

LSTM was proposed by Hochreiter and Schmidhuber in the technical report FKI-207-95 [5] (followed by their well-known 1997 paper) and was later refined and popularized by Alex Graves. LSTMs have been tremendously successful on many problems and are widely used. LSTM is deliberately designed to avoid the long-term dependency problem: remembering information over long periods is effectively its default behavior, not an ability it must struggle to acquire.
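As a minimal sketch of that point (assuming PyTorch; the sizes are illustrative): an LSTM carries a separate cell state c_t alongside the hidden state h_t, and this gated cell state is the deliberately designed pathway that preserves information over long spans.

```python
import torch
import torch.nn as nn

# Besides the hidden state h_t, the LSTM keeps a cell state c_t that flows
# through the sequence with only gated, element-wise modifications.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 100, 4)                 # batch of 2 sequences, 100 time steps
output, (h_n, c_n) = lstm(x)
print(output.shape, h_n.shape, c_n.shape)  # (2, 100, 8) (1, 2, 8) (1, 2, 8)
```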

4. References

[1] Introduction to Recurrent Neural Network

[2] A Guide to Recurrent Neural Networks: Understanding RNN and LSTM Networks

[5] S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Technical Report FKI-207-95 (1995), https://people.idsia.ch/~juergen//FKI-207-95ocr.pdf

[6] ReLU (Rectified Linear Unit) Activation Function, https://builtin.com/machine-learning/relu-activation-function, https://iq.opengenus.org/relu-activation/


5. Others

```mermaid
flowchart LR
    X((X)) -->|W| Y((Y))
```

```mermaid
flowchart LR
    X((h0)) --> Y[h1]
```
