WEBVTT
Kind: captions
Language: en

00:00:13.980 --> 00:00:17.180
There are three major types of neural networks.

00:00:17.200 --> 00:00:19.840
The first is Convolutional Neural Networks

00:00:19.960 --> 00:00:22.920
The second is Recurrent Neural Networks

00:00:23.320 --> 00:00:28.000
And the third is Generative Adversarial Networks

00:00:28.560 --> 00:00:31.700
Let's take a look at Convolutional Neural Networks first.

00:00:32.920 --> 00:00:42.120
The convolutional neural network was introduced by Yann LeCun in the 1990s for recognizing handwritten digits.

00:00:42.480 --> 00:00:48.000
Convolution is a mathematical operation that has been widely used in many fields,

00:00:48.300 --> 00:00:51.020
such as digital signal processing,

00:00:51.520 --> 00:00:53.900
electrical engineering, and physics.

00:00:54.520 --> 00:00:56.120
In image processing,

00:00:56.540 --> 00:00:59.220
convolution operations are 2D filters,

00:00:59.660 --> 00:01:03.080
which can be applied to extract different image features.

00:01:03.760 --> 00:01:10.480
The training process is to adjust the parameters of filters to minimize the errors between predictions and labels.
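As a rough illustration of this idea, here is a 2D convolution sketched in plain NumPy; the Sobel-style edge filter and toy image are illustrative assumptions, not from the video, and a trained CNN would learn such filter values rather than fix them by hand:

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" 2D convolution (technically cross-correlation, as in most
    # deep learning libraries): slide the kernel over the image and
    # take a weighted sum at each position.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter: responds strongly where intensity changes left to right.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Toy image with a sharp vertical edge: left columns dark, right columns bright.
image = np.zeros((5, 5))
image[:, 3:] = 1.0

features = conv2d(image, sobel_x)
print(features.shape)  # (3, 3)
```

The filter outputs zero over flat regions and a large response at the edge, which is exactly the "feature extraction" mentioned above; training would adjust the nine kernel values by gradient descent instead.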

00:01:10.860 --> 00:01:11.960
Historically,

00:01:12.340 --> 00:01:16.020
CNNs were used mainly for images.

00:01:16.340 --> 00:01:23.820
But recently scientists have found that 1D convolutional networks with attention mechanisms

00:01:24.380 --> 00:01:27.500
can achieve top performance in other research fields

00:01:27.760 --> 00:01:31.920
such as natural language processing or speech recognition.

00:01:32.440 --> 00:01:36.940
So, Convolutional Neural Networks have become more and more important.

00:01:37.860 --> 00:01:38.920
For more details,

00:01:38.920 --> 00:01:43.900
you can refer to Stanford’s famous free course, CS231n.

00:01:45.560 --> 00:01:49.480
One drawback of CNN and other feedforward networks is that

00:01:49.740 --> 00:01:53.640
they don’t consider the interdependency of sequential data.

00:01:54.320 --> 00:01:58.200
The temporal relations are important for natural language

00:01:58.680 --> 00:02:00.700
understanding or speech recognition.

00:02:01.240 --> 00:02:02.380
For example,

00:02:03.240 --> 00:02:05.080
“Mary had a little lamb”,

00:02:05.580 --> 00:02:06.880
the owner of the lamb,

00:02:06.880 --> 00:02:10.240
Mary, is mentioned at the beginning of the sentence.

00:02:10.700 --> 00:02:16.220
The Recurrent Neural Network (RNN) solves this problem by adding a “loop” in the hidden layers.

00:02:17.140 --> 00:02:20.860
The loop can keep the previous states of sequential data,

00:02:21.320 --> 00:02:23.840
which lets the network “remember” temporal information.

00:02:24.560 --> 00:02:29.600
It may not be obvious how loops in hidden layers can be used to remember information.

00:02:30.640 --> 00:02:31.400
Actually,

00:02:31.640 --> 00:02:36.200
we can unroll the RNN loop to better understand this mechanism.

00:02:36.760 --> 00:02:39.960
Here is a good figure from Colah’s blog.

00:02:40.260 --> 00:02:43.800
The input data X0, X1, X2

00:02:44.100 --> 00:02:45.640
arrive sequentially.

00:02:46.060 --> 00:02:50.640
The recurrent layer generates outputs h0, h1, h2,

00:02:51.520 --> 00:02:53.360
depending on the input X,

00:02:54.060 --> 00:02:57.700
and also sends the current output to the next state.

00:02:58.580 --> 00:02:59.640
From this figure,

00:02:59.640 --> 00:03:05.780
we can see that the “loop back” mechanism is equivalent to keeping information for the next state.
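The unrolled loop can be sketched in a few lines of NumPy; the sizes, random weights, and tanh nonlinearity here are illustrative assumptions, but the key point matches the figure: the same weights are reused at every step, and the state h is passed forward in time:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4

# Shared weights, reused at every time step -- this is the "loop".
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1

def rnn_step(x_t, h_prev):
    # One unrolled step: the new state mixes the current input
    # with the previous state, so past information is carried forward.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

xs = [rng.normal(size=input_dim) for _ in range(3)]  # X0, X1, X2
h = np.zeros(hidden_dim)                             # initial state
hs = []
for x_t in xs:             # unrolling the loop over time
    h = rnn_step(x_t, h)   # h "remembers" earlier inputs
    hs.append(h)

print(len(hs), hs[0].shape)  # 3 (4,)
```

Each output h0, h1, h2 depends on all inputs seen so far, which is exactly what the "loop back" arrow in the figure expresses.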

00:03:06.440 --> 00:03:09.080
There are many variants of RNN,

00:03:09.240 --> 00:03:13.620
and the most important one is Long Short-Term Memory (LSTM).

00:03:14.200 --> 00:03:18.220
The main contribution of LSTM is to add the forget gate,

00:03:18.700 --> 00:03:22.100
which enables a neuron cell to reset its own state

00:03:22.480 --> 00:03:25.600
and “forget” out-of-date information.

00:03:26.140 --> 00:03:29.000
Modern RNNs are largely based on LSTM.
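A minimal sketch of the forget gate alone (not a full LSTM cell, and with hand-picked toy weights rather than learned ones) shows how a cell can reset part of its own state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forget(c_prev, x_t, h_prev, W_f, b_f):
    # Forget gate: f is between 0 and 1 per state entry.
    # Entries where f ~ 0 are erased ("forgotten");
    # entries where f ~ 1 are kept.
    f = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
    return f * c_prev

# Toy sizes: 2-dim cell state, 2-dim input. The bias is chosen so the
# gate strongly forgets the first state entry and keeps the second.
c_prev = np.array([1.0, 1.0])
h_prev = np.zeros(2)
x_t = np.zeros(2)
W_f = np.zeros((2, 4))
b_f = np.array([-10.0, 10.0])

c_new = lstm_forget(c_prev, x_t, h_prev, W_f, b_f)
print(np.round(c_new, 3))  # [0. 1.]
```

In a trained LSTM, W_f and b_f are learned, so the network itself decides which out-of-date information to drop at each step.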

00:03:30.240 --> 00:03:35.080
Recently, a new architecture called Generative Adversarial Networks (GAN)

00:03:35.880 --> 00:03:38.080
was proposed by Ian Goodfellow.

00:03:38.960 --> 00:03:41.920
GAN consists of two sub-networks:

00:03:42.300 --> 00:03:44.700
a generator and a discriminator.

00:03:45.360 --> 00:03:49.660
The generator tries to generate fake images based on random inputs,

00:03:50.120 --> 00:03:53.860
while the discriminator tries to classify whether the

00:03:54.440 --> 00:03:57.560
generated images are fake or real.

00:03:58.260 --> 00:04:02.420
The trick is to let those two networks compete with each other.

00:04:02.860 --> 00:04:06.160
The generator learns to generate more realistic images,

00:04:06.620 --> 00:04:11.400
while the discriminator learns to identify more challenging fake images.

00:04:12.480 --> 00:04:16.340
GAN achieves the equilibrium state

00:04:16.900 --> 00:04:22.200
when the discriminator can no longer distinguish between the real images and fake images,

00:04:22.560 --> 00:04:25.340
and we will have a strong generator of realistic fakes.
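The alternating competition described above can be sketched on a toy 1D problem; the one-parameter generator, logistic-regression discriminator, and learning rate are all illustrative assumptions, but the two alternating gradient steps mirror the real GAN training loop:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1D "images": real data clusters around 3.0.
def real_batch(n):
    return rng.normal(loc=3.0, scale=0.5, size=n)

theta = 0.0       # generator parameter: G(z) = z + theta
w, b = 1.0, 0.0   # discriminator: D(x) = sigmoid(w * x + b)
lr = 0.05

for step in range(500):
    z = rng.normal(size=32)
    fake = z + theta
    real = real_batch(32)

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0
    # (gradient of the binary cross-entropy loss).
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    grad_w = np.mean((d_real - 1) * real) + np.mean(d_fake * fake)
    grad_b = np.mean(d_real - 1) + np.mean(d_fake)
    w -= lr * grad_w
    b -= lr * grad_b

    # Generator step: push D(fake) -> 1, i.e. fool the discriminator.
    d_fake = sigmoid(w * (z + theta) + b)
    theta -= lr * np.mean((d_fake - 1) * w)

print(round(theta, 2))  # generator shift learned by adversarial training
```

As the generator's output distribution approaches the real one, the discriminator's advantage shrinks, which is the equilibrium described above.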

00:04:25.860 --> 00:04:29.180
Since its invention in 2014,

00:04:30.380 --> 00:04:33.460
GAN has been widely adopted to generate articles,

00:04:33.460 --> 00:04:36.060
music, images, or videos.

00:04:36.700 --> 00:04:43.120
It creates many new applications but also raises new problems, such as fake porn videos.

