WEBVTT
Kind: captions
Language: en

00:00:14.280 --> 00:00:17.740
In addition to the three basic types of machine learning,

00:00:17.740 --> 00:00:19.300
there are many other types,

00:00:19.300 --> 00:00:26.460
including semi-supervised learning, self-supervised learning, one-shot learning, and zero-shot learning.

00:00:26.460 --> 00:00:29.500
Due to time constraints, we won’t go into the details here.

00:00:29.720 --> 00:00:33.260
The type of data is important when selecting algorithms.

00:00:33.260 --> 00:00:35.980
Data can be classified into four types:

00:00:36.120 --> 00:00:39.000
nominal, ordinal, interval and ratio.

00:00:39.020 --> 00:00:42.680
Nominal and ordinal data are discrete data,

00:00:42.680 --> 00:00:46.460
while interval and ratio data are continuous data.

00:00:46.460 --> 00:00:50.700
Discrete data can be counted but not measured.

00:00:50.780 --> 00:00:53.460
Continuous data can be measured but not counted.

00:00:53.460 --> 00:00:58.500
Nominal data are labeling variables without any quantitative value,

00:00:58.680 --> 00:01:01.460
which can simply be called labels.

00:01:01.680 --> 00:01:04.700
Nominal data are commonly encoded using one-hot encoding.

00:01:05.010 --> 00:01:10.609
For example, your gender or the language you speak can be represented by nominal data.

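NOTE
A minimal sketch of one-hot encoding in plain Python, assuming the category list is simply the sorted set of observed labels; the helper one_hot and the sample languages are hypothetical (libraries such as scikit-learn offer OneHotEncoder for real projects). Each label becomes a vector with a single 1, so no artificial order is implied between categories.
```python
# Hypothetical helper: one-hot encode a list of nominal labels.
def one_hot(values):
    categories = sorted(set(values))  # assumption: categories = sorted observed labels
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)  # one column per category
        row[index[v]] = 1  # mark the single matching category
        rows.append(row)
    return categories, rows
categories, encoded = one_hot(["English", "French", "English", "Chinese"])
print(categories)  # ['Chinese', 'English', 'French']
print(encoded)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```
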
00:01:11.100 --> 00:01:14.680
Ordinal data are discrete like nominal data,

00:01:14.680 --> 00:01:16.680
but their order matters.

00:01:16.700 --> 00:01:20.580
One example is your education level, such as high school, bachelor’s, or master’s.

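NOTE
A minimal sketch of ordinal encoding, assuming a hypothetical ordering of education levels: each level is mapped to its rank, which keeps the order but says nothing about how far apart two levels are. The names EDUCATION_ORDER and ordinal_encode are illustrative only.
```python
# Hypothetical education levels, listed from lowest to highest.
EDUCATION_ORDER = ["high school", "bachelor", "master", "phd"]
def ordinal_encode(values, order=EDUCATION_ORDER):
    rank = {level: i for i, level in enumerate(order)}  # position in the list encodes the order
    return [rank[v] for v in values]
codes = ordinal_encode(["master", "high school", "phd"])
print(codes)  # [2, 0, 3]
# Comparisons respect the order (master 2 > high school 0),
# but the gaps between ranks are not meaningful quantities.
```
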
00:01:20.960 --> 00:01:27.360
Interval data are continuous, and we know both the order and the exact differences between values.

00:01:27.620 --> 00:01:31.940
The classic example of an interval scale is Celsius temperature,

00:01:31.940 --> 00:01:34.960
because the difference between consecutive degrees is always the same.

00:01:35.120 --> 00:01:38.960
However, interval data have no true zero: 0 °C does not mean “no temperature,” so ratios are not meaningful.

00:01:39.620 --> 00:01:43.560
Ratio data are interval data with a true (absolute) zero.

00:01:43.740 --> 00:01:49.200
Good examples of ratio variables include height, weight, and duration.

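NOTE
A small arithmetic sketch, assuming the standard conversion K = °C + 273.15: ratios of Celsius values are not meaningful because 0 °C is not an absolute zero, whereas Kelvin, which starts at absolute zero, is a ratio scale. The helper name is illustrative.
```python
def celsius_to_kelvin(c):
    return c + 273.15  # Kelvin has an absolute zero, so it is a ratio scale
t1, t2 = 10.0, 20.0  # temperatures in degrees Celsius (interval scale)
print(t2 / t1)  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
print(celsius_to_kelvin(t2) / celsius_to_kelvin(t1))  # about 1.035, the physically meaningful ratio
print(t2 - t1)  # 10.0, differences are meaningful on both scales
```
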
00:01:49.200 --> 00:01:53.280
Now we know the difference between the four types of data.

00:01:53.280 --> 00:01:55.560
Before introducing the algorithms,

00:01:55.560 --> 00:01:58.920
I want to talk about metrics first.

00:01:58.920 --> 00:02:02.160
Metrics are used to evaluate the performance of our models.

00:02:02.400 --> 00:02:07.560
You can only really understand your model if you know what your metrics mean.

00:02:07.560 --> 00:02:10.820
Here is a confusion matrix from Wikipedia.

00:02:10.820 --> 00:02:13.600
The confusion matrix, also known as an error matrix,

00:02:14.280 --> 00:02:18.919
is used to visualize whether the model is confusing two classes:

00:02:19.620 --> 00:02:22.660
the positive class and the negative class.

00:02:22.660 --> 00:02:27.440
The rows represent the predicted class, while the columns represent the labels,

00:02:27.820 --> 00:02:30.580
which are called the “true condition” here.

00:02:31.300 --> 00:02:34.200
The most important point about this matrix

00:02:34.200 --> 00:02:36.740
is that there are two types of errors.

00:02:36.880 --> 00:02:39.460
The first is the false positive,

00:02:40.310 --> 00:02:42.310
also called a type I error,

00:02:42.740 --> 00:02:47.380
which means the model misclassifies a negative instance as positive.

00:02:47.780 --> 00:02:49.860
The second is the false negative,

00:02:49.860 --> 00:02:52.220
or a type II error,

00:02:52.880 --> 00:02:57.580
which means the model misclassifies a positive instance as negative.

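NOTE
A minimal sketch that counts the four cells of a binary confusion matrix in the layout described here (rows are predicted classes, columns are true classes); the function name and sample labels are hypothetical. Note that scikit-learn’s sklearn.metrics.confusion_matrix uses the transposed layout, with true labels as rows.
```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return [[TP, FP], [FN, TN]]: rows = predicted class, columns = true class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly predicted positive
        elif p == positive:
            fp += 1  # negative predicted as positive: type I error
        elif t == positive:
            fn += 1  # positive predicted as negative: type II error
        else:
            tn += 1  # correctly predicted negative
    return [[tp, fp], [fn, tn]]
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(confusion_matrix(y_true, y_pred))  # [[2, 1], [1, 2]]
```
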
00:02:57.580 --> 00:02:59.800
Keep those two types of errors in mind.

00:03:00.340 --> 00:03:03.260
Based on these two types of errors,

00:03:03.260 --> 00:03:05.880
many different metrics have been defined.

00:03:05.880 --> 00:03:09.060
Some metrics, such as precision, focus on minimizing type I errors,

00:03:09.320 --> 00:03:12.460
some, such as recall, focus on minimizing type II errors,

00:03:12.840 --> 00:03:16.660
and some, such as the F1 score, balance both.

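NOTE
A minimal sketch of three common metrics computed from confusion-matrix counts, assuming the standard definitions; the counts are hypothetical and the helpers assume nonzero denominators. Precision falls as type I errors grow, recall falls as type II errors grow, and the F1 score, their harmonic mean, accounts for both.
```python
def precision(tp, fp):
    return tp / (tp + fp)  # drops as false positives (type I errors) grow
def recall(tp, fn):
    return tp / (tp + fn)  # drops as false negatives (type II errors) grow
def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)  # harmonic mean, penalizes both error types
tp, fp, fn = 8, 2, 4  # hypothetical counts from a confusion matrix
print(precision(tp, fp))  # 0.8
print(recall(tp, fn))  # about 0.667
print(f1_score(tp, fp, fn))  # about 0.727
```
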
00:03:18.260 --> 00:03:20.600
Depending on your application,

00:03:20.760 --> 00:03:26.220
you will favor different metrics. For more details, please refer to Wikipedia.

