WEBVTT
Kind: captions
Language: en

00:00:14.620 --> 00:00:18.440
Researchers found that the deeper architecture

00:00:18.900 --> 00:00:21.220
usually achieves better performance,

00:00:21.640 --> 00:00:25.260
but if the architecture becomes too deep,

00:00:25.660 --> 00:00:27.260
the error rate increases.

00:00:28.340 --> 00:00:31.600
This is mostly due to the “vanishing gradient problem”,

00:00:32.420 --> 00:00:38.760
which means the gradients become too small to update the weights of the deep layers.

00:00:39.320 --> 00:00:40.600
In 2015,

00:00:40.940 --> 00:00:44.340
Microsoft researchers proposed the Residual Network,

00:00:45.280 --> 00:00:47.980
which is known as ResNet,

00:00:48.280 --> 00:00:49.540
to solve this problem.

00:00:50.400 --> 00:00:55.760
ResNet introduced skip connections to avoid the vanishing gradient problem,

00:00:56.180 --> 00:01:01.480
and successfully won the ImageNet Challenge with a 152-layer architecture.
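NOTE A minimal sketch of the skip connection idea mentioned above, using NumPy; the shapes, weights, and function names here are illustrative assumptions, not the actual ResNet implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    # Plain two-layer transform F(x) (hypothetical weights w1, w2)
    out = relu(x @ w1)
    out = out @ w2
    # Skip connection: add the input back, so gradients can flow
    # through the identity path even when F's gradients are tiny
    return relu(out + x)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)  # (1, 8)
```

The key design choice is the `out + x` addition: the block learns a residual F(x) on top of the identity, rather than a full mapping.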

00:01:02.180 --> 00:01:02.860
However,

00:01:02.980 --> 00:01:08.420
the error rate of the 34-layer ResNet is close to that of the 152-layer version,

00:01:09.040 --> 00:01:13.140
and researchers have pushed the depth limit to 1,000 layers,

00:01:13.960 --> 00:01:16.480
but did not see significant improvement.

00:01:17.580 --> 00:01:21.400
Here is a side-by-side comparison of major CNN architectures,

00:01:22.480 --> 00:01:26.800
from the 7-layer LeNet to the 152-layer ResNet.

00:01:27.160 --> 00:01:31.540
Note that the 152-layer version is too long to be shown here,

00:01:31.800 --> 00:01:37.100
so the author selected the 34-layer version instead.

00:01:37.840 --> 00:01:41.480
Here is the summary table of the major CNN architectures,

00:01:42.200 --> 00:01:45.980
which was made by Vivienne Sze at MIT.

00:01:47.000 --> 00:01:55.360
We can see that the total number of parameters increased significantly from 60k to around 138M,

00:01:56.220 --> 00:01:57.740
then started to reduce.

00:01:57.920 --> 00:02:03.000
Recently, some compact models like MobileNet and SqueezeNet

00:02:03.460 --> 00:02:04.420
were proposed,

00:02:05.160 --> 00:02:09.140
which have far fewer parameters but maintain high accuracy.

