WEBVTT
Kind: captions
Language: en
00:00:06.589 --> 00:00:10.450
Now we will consider the case where a camera
is looking at a bunch of points and these
00:00:10.450 --> 00:00:13.020
points all lie on a plane.
00:00:13.020 --> 00:00:17.600
The plane has got its own coordinate system
which we denote by coordinate frame zero.
00:00:17.600 --> 00:00:22.180
Clearly every point that lies on this plane
has got a Z coordinate of zero,
00:00:22.180 --> 00:00:23.820
which is shown down here.
00:00:23.820 --> 00:00:29.240
The coordinate capital Z multiplies all of
the elements in the third column of our camera matrix,
00:00:29.240 --> 00:00:34.060
but because it's zero we can effectively
remove that column from the matrix
00:00:34.060 --> 00:00:37.960
and we can remove that row from
the world coordinate vector.
00:00:37.969 --> 00:00:42.940
What we're left with now is a three by three
matrix and we'll refer to this three by three
00:00:42.940 --> 00:00:46.149
matrix as a "planar homography".
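The column removal described above can be written out as a short derivation. The element names c_ij for the camera matrix and the tilde notation for homogeneous image coordinates are assumed here for illustration; they are not taken from the lecture slides.

```latex
\begin{pmatrix} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{pmatrix}
=
\begin{pmatrix}
c_{11} & c_{12} & c_{13} & c_{14} \\
c_{21} & c_{22} & c_{23} & c_{24} \\
c_{31} & c_{32} & c_{33} & c_{34}
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ 0 \\ 1 \end{pmatrix}
=
\underbrace{
\begin{pmatrix}
c_{11} & c_{12} & c_{14} \\
c_{21} & c_{22} & c_{24} \\
c_{31} & c_{32} & c_{34}
\end{pmatrix}
}_{H}
\begin{pmatrix} X \\ Y \\ 1 \end{pmatrix},
\qquad
u = \frac{\tilde{u}}{\tilde{w}}, \quad v = \frac{\tilde{v}}{\tilde{w}}
```

Because Z is zero, the third column never contributes to the product, so dropping it together with the Z entry leaves the result unchanged.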
00:00:46.149 --> 00:00:51.079
Just as for the camera matrix, there is an
arbitrary scale factor and once again we can
00:00:51.079 --> 00:00:55.430
normalise the homography matrix by choosing
one particular element that we're going to
00:00:55.430 --> 00:00:57.870
set to the value of one.
00:00:57.870 --> 00:01:02.690
So in this three by three matrix, one
element has been set to one, so there are eight
00:01:02.690 --> 00:01:06.150
unique numbers remaining in the homography
matrix.
00:01:06.150 --> 00:01:12.100
And we can estimate the homography matrix
if we have four world points and the corresponding
00:01:12.100 --> 00:01:16.350
positions of those points on the image plane
of our camera.
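As a rough sketch of why four correspondences are enough: each pair of points gives two linear equations in the eight unknown elements, so four pairs give an 8 by 8 linear system. The function names `solve_linear` and `estimate_homography` are made up for this illustration and are not the toolbox functions used in the lecture. The sketch assumes the bottom-right element can be normalised to one and that no three of the points are collinear.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [[float(a) for a in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (for example three collinear)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def estimate_homography(P, Q):
    """Return a 3x3 H (with H[2][2] == 1) mapping each (x, y) in P
    to the corresponding (u, v) in Q. P and Q must be in the same order."""
    if len(P) != 4 or len(Q) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    A, b = [], []
    for (x, y), (u, v) in zip(P, Q):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1),
        # and the matching expression for v, multiplied out.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]
```

Calling `estimate_homography(P, Q)` with two lists of four (x, y) pairs returns the matrix as a list of three rows. With more than four points, or with noisy points, a least-squares fit would normally be used instead of this exact solve.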
00:01:16.350 --> 00:01:21.499
To explain the concept of corresponding points, imagine
that I've got two planes, one is perhaps the
00:01:21.500 --> 00:01:25.020
image plane of the camera; the other might
be a physical plane in the world
00:01:25.020 --> 00:01:27.140
that the camera is looking at.
00:01:27.140 --> 00:01:32.399
Alternatively, the first could be a view of
a plane in the world and the second image
00:01:32.400 --> 00:01:36.320
could be another view of the same
plane in the world,
00:01:36.320 --> 00:01:38.740
where we've moved the camera
between the two views.
00:01:38.740 --> 00:01:44.359
Now we've got four points in each of these
planes, which I'm going to denote by the subscripts
00:01:44.360 --> 00:01:48.400
one through four and I'm going to arrange
the coordinates of those points
00:01:48.400 --> 00:01:51.240
into the columns of a matrix.
00:01:51.250 --> 00:01:54.889
But what's really important here is the ordering
of these columns.
00:01:54.889 --> 00:01:57.649
We have to ensure what's called correspondence.
00:01:57.649 --> 00:02:02.400
P1 and Q1 must correspond to the
same point in the world
00:02:02.400 --> 00:02:06.100
and so it goes for P2, P3 and P4.
00:02:06.100 --> 00:02:12.080
Each point P and the corresponding point Q
must refer to the same point in the world.
00:02:12.080 --> 00:02:15.620
Let's look at a practical example of how we
can use this technique to perform something
00:02:15.629 --> 00:02:18.120
called "perspective rectification".
00:02:18.120 --> 00:02:21.760
Now this is a picture that I took of the Notre
Dame Cathedral in Paris.
00:02:22.540 --> 00:02:27.030
It's a very tall cathedral, so I'm on the
ground in front, looking up and taking a picture.
00:02:27.030 --> 00:02:31.700
And clearly because my camera is tilted
upwards I've got a very distorted view of
00:02:31.700 --> 00:02:33.520
the front of the cathedral.
00:02:33.520 --> 00:02:36.760
But I know some things about cathedrals and
particularly I know that the front of the
00:02:36.760 --> 00:02:39.769
cathedral is most likely to be a plane.
00:02:39.769 --> 00:02:44.680
So I pick four points on the front of the
cathedral that I believe all lie in a single
00:02:44.680 --> 00:02:48.260
plane and I label them P1 through to P4.
00:02:49.380 --> 00:02:54.740
But I know that those points in a non-distorted
image will form a rectangle in the image plane,
00:02:54.750 --> 00:02:56.510
not a trapezoid.
00:02:56.510 --> 00:03:01.080
I can compute the image plane coordinates
Q1, Q2, Q3 and Q4
00:03:01.080 --> 00:03:03.920
in order to have a rectangle in the image.
00:03:04.540 --> 00:03:10.240
So if I have now two sets of corresponding
points; I have the points P1 through P4 and
00:03:10.250 --> 00:03:15.790
I have the points Q1 through Q4, then I can
compute an homography.
00:03:15.790 --> 00:03:21.850
So if I build up a matrix P that contains
as columns the points P1 through P4 and the
00:03:21.850 --> 00:03:29.120
matrix Q, whose columns are the points Q1
through Q4, then I can compute an homography.
00:03:29.120 --> 00:03:32.639
And it's shown here and very simple to do
in MATLAB.
00:03:32.639 --> 00:03:37.431
Now that I have this homography matrix H,
I can use it to transform any point, P, in
00:03:37.440 --> 00:03:41.700
my original image, to the corresponding point, Q, in a second
image.
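Transforming a point with H amounts to a matrix multiplication in homogeneous coordinates followed by a division by the third element. A minimal sketch, with the hypothetical function name `apply_homography` and H stored as a list of three rows:

```python
def apply_homography(H, point):
    """Map a point (x, y) through the 3x3 homography H and return (u, v)."""
    x, y = point
    # Multiply H by the homogeneous vector (x, y, 1).
    u_h = H[0][0] * x + H[0][1] * y + H[0][2]
    v_h = H[1][0] * x + H[1][1] * y + H[1][2]
    w_h = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w_h) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Dividing by the third element converts back to ordinary coordinates.
    return (u_h / w_h, v_h / w_h)
```

Because of the final division, multiplying every element of H by the same non-zero number gives exactly the same mapping, which is the arbitrary scale factor mentioned earlier.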
00:03:42.380 --> 00:03:45.020
And this is what the second image looks like.
00:03:45.030 --> 00:03:47.329
We see that the cathedral has been straightened up.
00:03:47.329 --> 00:03:52.600
We can see that the vertical edges of the
cathedral are in fact vertical lines in the image.
00:03:53.180 --> 00:03:56.760
It's important to remember that there's a
very strong assumption made in this process
00:03:56.760 --> 00:04:00.840
and that is that all of the points in the
image lie on a plane.
00:04:00.840 --> 00:04:05.879
Certainly many of the points in this image
lie on the frontal plane of the cathedral,
00:04:05.879 --> 00:04:07.239
but not all do.
00:04:07.239 --> 00:04:12.370
If we look at points around here, which are
on the edges of the bell towers, then they
00:04:12.370 --> 00:04:17.250
do not lie on the frontal plane and the transformation
won't be correct for them.
00:04:17.250 --> 00:04:20.230
It will introduce a distortion in that part
of the image.
00:04:20.230 --> 00:04:24.580
You can't get anything for free, but we've certainly
improved the geometric correctness
00:04:24.580 --> 00:04:26.560
of the bulk of the cathedral.
00:04:26.960 --> 00:04:31.220
Given that I've computed the matrix H using
MATLAB, then it's a very simple matter to
00:04:31.220 --> 00:04:35.970
apply the homography to every single point
in the image.
00:04:35.970 --> 00:04:39.560
And we perform that by a process known as
"image warping".
00:04:41.360 --> 00:04:46.400
To do image warping, we consider every
single pixel in the output image, and the output
00:04:46.410 --> 00:04:52.460
image in this case is the geometrically correct,
rectified image of the cathedral.
00:04:52.460 --> 00:04:57.949
To illustrate this I’m going to choose just
one particular point in the output image and
00:04:57.949 --> 00:05:01.889
it’s the pixel at coordinate (600, 100).
00:05:01.889 --> 00:05:06.070
Now if I know that pixel coordinate, I want
to try and work out what's the corresponding
00:05:06.070 --> 00:05:09.330
pixel coordinate in the input image.
00:05:09.330 --> 00:05:14.310
The homography is a mapping from the original
image to the new image, so in order to map
00:05:14.310 --> 00:05:19.319
this coordinate I need to use the inverse
of the homography and that gives me the coordinate
00:05:19.319 --> 00:05:26.840
of the corresponding point in the input image
and it's got a coordinate of (757, 51).
00:05:26.840 --> 00:05:34.540
The way image warping works then is we go
and find the pixel at coordinate (757, 51)
00:05:34.540 --> 00:05:39.400
and we take that pixel value and we insert
it into the new image at coordinate
00:05:39.400 --> 00:05:41.180
(600, 100).
00:05:41.240 --> 00:05:48.980
So for every single pixel in the output image, we
work out where it comes from in the input image.
00:05:49.040 --> 00:05:54.080
You can see here that the coordinates in the
input image are fractional and that requires
00:05:54.080 --> 00:05:59.060
a technique called “image interpolation”
to find what is the actual pixel value
00:05:59.060 --> 00:06:01.900
at this particular fractional coordinate.
00:06:01.909 --> 00:06:05.880
In a nutshell, that's the process of image
warping.
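The warping procedure described above can be sketched as follows. This is an illustration only, not the toolbox implementation: it assumes a greyscale image stored as a list of rows, uses bilinear interpolation as one possible interpolation method, and the names `invert_3x3`, `bilinear` and `warp` are invented for the example.

```python
import math


def invert_3x3(H):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is not invertible")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def bilinear(image, x, y):
    """Sample the image at a fractional (x, y); return None if outside."""
    height, width = len(image), len(image[0])
    if not (0 <= x <= width - 1 and 0 <= y <= height - 1):
        return None
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend the four surrounding pixels according to the fractional parts.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def warp(image, H, out_width, out_height, fill=0.0):
    """Build the output image by looking up, for every output pixel,
    where it comes from in the input image (H maps input to output)."""
    H_inv = invert_3x3(H)
    out = [[fill] * out_width for _ in range(out_height)]
    for v in range(out_height):
        for u in range(out_width):
            w = H_inv[2][0] * u + H_inv[2][1] * v + H_inv[2][2]
            if abs(w) < 1e-12:
                continue  # this output pixel has no finite source point
            x = (H_inv[0][0] * u + H_inv[0][1] * v + H_inv[0][2]) / w
            y = (H_inv[1][0] * u + H_inv[1][1] * v + H_inv[1][2]) / w
            value = bilinear(image, x, y)
            if value is not None:
                out[v][u] = value
    return out
```

Looping over the output pixels rather than the input pixels means every output pixel is assigned exactly once, so the result has no holes. Output pixels whose source falls outside the input image keep the `fill` value.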
00:06:05.880 --> 00:06:11.169
Another application of image warping is this
often-used effect now in swimming telecasts,
00:06:11.169 --> 00:06:17.520
where we take the flag and the name of the
competitors and we overlay them on the lanes
00:06:17.520 --> 00:06:19.350
of the swimming pool.
00:06:19.350 --> 00:06:23.300
It's actually quite an easy trick to do and
it involves these homographies.
00:06:23.300 --> 00:06:27.500
Now imagine that I could swim well, well enough
to get into a swimming tournament,
00:06:27.500 --> 00:06:29.580
so there's my flag and there's my name.
00:06:29.580 --> 00:06:33.300
Now I've got this image that I created, just
using ordinary computer graphics,
00:06:33.300 --> 00:06:34.820
that's the easy bit.
00:06:34.830 --> 00:06:39.150
Now I want to lay that image into my lane
in the swimming pool.
00:06:39.150 --> 00:06:42.900
All I need to do that, is to find the four
corresponding points,
00:06:42.900 --> 00:06:47.540
so the four corners of this rectangle that holds the image that I want to overlay
00:06:47.540 --> 00:06:51.840
and the four points in the swimming pool
where I'd like it to be laid.
00:06:51.840 --> 00:06:58.380
Once I have that information I can warp that
original image into this very distorted image,
00:06:58.380 --> 00:07:04.360
which I could then insert into or overlay
onto the original image of the swimming pool.
00:07:05.300 --> 00:07:09.120
For those of you who are doing the project
associated with this course,
00:07:09.120 --> 00:07:12.180
the homography is going to be very, very useful.
00:07:12.180 --> 00:07:17.039
You've probably already built a two-dimensional
robot, that sits on a worksheet and can move
00:07:17.040 --> 00:07:22.620
its end effector to any particular XY coordinate
on the robot worksheet.
00:07:23.280 --> 00:07:26.660
Now imagine that we take a picture of that robot
worksheet.
00:07:26.669 --> 00:07:29.250
I have an image of the robot worksheet.
00:07:29.250 --> 00:07:34.270
The homography lets me create a mapping between
a coordinate in the image of the worksheet,
00:07:34.270 --> 00:07:40.300
which has got a coordinate of (U, V) in the
image plane and I can map that to a physical
00:07:40.300 --> 00:07:43.710
coordinate, (X, Y) on the robot's worksheet.
00:07:43.710 --> 00:07:49.099
I can map from an image plane coordinate to
a robot worksheet coordinate, or I can map
00:07:49.100 --> 00:07:54.080
from a robot worksheet coordinate back to
a camera image coordinate.
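The two directions of this mapping can be sketched as below. The matrix `H_IMAGE_TO_SHEET` holds made-up values chosen only for illustration; in practice it would be estimated from four known marks on the worksheet and their pixel coordinates. The function names are also hypothetical.

```python
def invert_3x3(H):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is not invertible")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def map_point(H, point):
    """Map (x, y) through H, dividing by the third homogeneous element."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical homography from image pixels (u, v) to worksheet units (X, Y).
H_IMAGE_TO_SHEET = [[0.5, 0.0, -10.0],
                    [0.0, 0.5, -20.0],
                    [0.0, 0.0, 1.0]]


def image_to_sheet(uv):
    """Image plane coordinate to robot worksheet coordinate."""
    return map_point(H_IMAGE_TO_SHEET, uv)


def sheet_to_image(xy):
    """Robot worksheet coordinate back to image plane coordinate."""
    return map_point(invert_3x3(H_IMAGE_TO_SHEET), xy)
```

With these example values, `image_to_sheet((100, 200))` gives a worksheet coordinate, and passing that result to `sheet_to_image` returns the original pixel coordinate, since the second mapping uses the inverse of the same matrix.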
00:07:54.080 --> 00:07:59.160
Now homographies are going to be very, very
helpful for you in completing the project.
00:07:59.169 --> 00:08:04.970
Let's summarise the capability in the toolbox
for computing and using homographies.
00:08:04.970 --> 00:08:12.040
Given two sets of corresponding points P and Q,
we can compute a three by three homography matrix.
00:08:12.200 --> 00:08:15.610
The columns of P and Q represent points.
00:08:15.610 --> 00:08:21.819
Now P might be image coordinates of known
points in an image, Q might be coordinates
00:08:21.819 --> 00:08:25.000
of points on the robot's physical worksheet.
00:08:25.000 --> 00:08:31.259
Alternatively, P could be a set of image coordinates
in one image and Q could be a set of image
00:08:31.259 --> 00:08:34.090
coordinates in another image.
00:08:34.090 --> 00:08:39.910
Given that I have the three by three homography
matrix H, I can then map a set of points P,
00:08:39.910 --> 00:08:44.169
in the first plane, to a set of points Q,
in the second plane.