Epipolar geometry

Epipolar geometry is the geometry of stereo vision. When two cameras view a 3D scene from two distinct positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images that lead to constraints between the image points. These relations are derived based on the assumption that the cameras can be approximated by the pinhole camera model.

Definitions

The figure below depicts two pinhole cameras looking at point X. In real cameras, the image plane is actually behind the focal center and produces an image that is symmetric about the focal center of the lens. Here, however, the problem is simplified by placing a virtual image plane in front of the focal center (i.e., the optical center) of each camera lens, which produces an image not transformed by this symmetry. O_L and O_R represent the centers of symmetry of the two cameras' lenses. X represents the point of interest seen by both cameras. Points x_L and x_R are the projections of point X onto the image planes.

Each camera captures a 2D image of the 3D world. This conversion from 3D to 2D is referred to as a perspective projection and is described by the pinhole camera model. It is common to model this projection operation by rays that emanate from the camera, passing through its focal center. Each emanating ray corresponds to a single point in the image.
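The perspective projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the intrinsic matrix K, the pose (R, t), and the sample point are hypothetical values chosen for the example, and the convention X_cam = R X + t is an assumption.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D world point X to pixel coordinates with a pinhole camera.

    Assumed convention: camera coordinates are X_cam = R @ X + t.
    """
    X_cam = R @ X + t          # world -> camera coordinates
    x_h = K @ X_cam            # homogeneous image coordinates
    return x_h[:2] / x_h[2]    # perspective division

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # camera axes aligned with the world axes
t = np.zeros(3)      # optical center at the world origin

X = np.array([0.5, -0.25, 4.0])
x = project(K, R, t, X)

# Every point on the ray through the optical center and X maps to the same pixel.
x_far = project(K, R, t, 3.0 * X)
```

Because all points on one ray share a pixel, a single image cannot recover depth, which is why a second view is needed.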

Epipole or epipolar point

Since the optical centers of the cameras' lenses are distinct, each center projects onto a distinct point in the other camera's image plane. These two image points, denoted by e_L and e_R, are called epipoles or epipolar points. Both epipoles e_L and e_R in their respective image planes and both optical centers O_L and O_R lie on a single 3D line.^[1]
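As a rough numerical check of this statement, each epipole can be computed by projecting the other camera's optical center, then back-projected to confirm that it lies on the line O_L–O_R. The helper names `rot_y` and `projection_matrix`, the intrinsics K, and the camera poses below are assumptions made for this sketch.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def projection_matrix(K, R, O):
    """P = K [R | -R O] for world-to-camera rotation R and optical center O."""
    return K @ np.hstack([R, (-R @ O).reshape(3, 1)])

# Hypothetical shared intrinsics and two distinct camera poses.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
O_L, R_L = np.zeros(3), np.eye(3)
O_R, R_R = np.array([1.0, 0.0, 0.2]), rot_y(np.deg2rad(15.0))

P_L = projection_matrix(K, R_L, O_L)
P_R = projection_matrix(K, R_R, O_R)

# Each epipole is the image of the other camera's optical center
# (kept in homogeneous coordinates to avoid dividing by a small value).
e_L = P_L @ np.append(O_R, 1.0)
e_R = P_R @ np.append(O_L, 1.0)

# Back-project each epipole to a ray direction in world coordinates.
dir_L = R_L.T @ np.linalg.solve(K, e_L)
dir_R = R_R.T @ np.linalg.solve(K, e_R)
baseline = O_R - O_L

# Both rays are parallel to the baseline, so the cross products vanish.
cross_L = np.cross(dir_L, baseline)
cross_R = np.cross(dir_R, baseline)
```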

Epipolar line

The line O_L–X is seen by the left camera as a point because it is directly in line with that camera's lens optical center. However, the right camera sees this line as a line in its image plane. That line (e_R–x_R) in the right camera is called an epipolar line. Symmetrically, the line O_R–X is seen by the right camera as a point and is seen as the epipolar line e_L–x_L by the left camera.

An epipolar line is a function of the position of point X in the 3D space, i.e. as X varies, a set of epipolar lines is generated in both images. Since the 3D line O_L–X passes through the optical center of the lens O_L, the corresponding epipolar line in the right image must pass through the epipole e_R (and correspondingly for epipolar lines in the left image). All epipolar lines in one image contain the epipolar point of that image.^[1] In fact, any line which contains the epipolar point is an epipolar line since it can be derived from some 3D point X.
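The following sketch illustrates this under assumed camera parameters: the epipolar line in the right image is formed as the join of e_R and x_R (a cross product in homogeneous coordinates), and several other points on the ray O_L–X are shown to project onto the same line. The helper names, the intrinsics, and the poses are hypothetical.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def projection_matrix(K, R, O):
    """P = K [R | -R O] for world-to-camera rotation R and optical center O."""
    return K @ np.hstack([R, (-R @ O).reshape(3, 1)])

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
O_L = np.zeros(3)
O_R = np.array([1.0, 0.0, 0.2])
P_R = projection_matrix(K, rot_y(np.deg2rad(15.0)), O_R)

X = np.array([0.3, -0.2, 5.0])
e_R = P_R @ np.append(O_L, 1.0)   # epipole in the right image (homogeneous)
x_R = P_R @ np.append(X, 1.0)     # projection of X in the right image

# Homogeneous line through two homogeneous points: their cross product.
line_R = np.cross(e_R, x_R)

def distance_to_line(line, x_h):
    """Pixel distance from homogeneous point x_h to homogeneous line."""
    return abs(line @ x_h) / (np.linalg.norm(line[:2]) * abs(x_h[2]))

# Other 3D points on the ray O_L-X project onto the same epipolar line.
distances = []
for depth in (0.5, 2.0, 10.0):
    X_i = O_L + depth * (X - O_L)
    x_i = P_R @ np.append(X_i, 1.0)
    distances.append(distance_to_line(line_R, x_i))
```

Choosing a point X off this ray generally gives a different line through the same epipole e_R.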

Epipolar plane

As an alternative visualization, consider the points X, O_L and O_R, which form a plane called the epipolar plane. The epipolar plane intersects each camera's image plane in a line: the epipolar line. The epipolar plane and all epipolar lines pass through the epipoles regardless of where X is located.

Epipolar constraint and triangulation

If the relative position of the two cameras is known, this leads to two important observations:

Assume the projection point x_L is known. The epipolar line e_R–x_R is then also known, and the point X projects into the right image onto a point x_R that must lie on this particular epipolar line. This means that for each point observed in one image, the same point must be observed in the other image on a known epipolar line. This provides an epipolar constraint: the projection x_R of X onto the right image plane must be contained in the e_R–x_R epipolar line. All points X (e.g. X₁, X₂, X₃) on the O_L–x_L line satisfy that constraint. It is therefore possible to test whether two image points could correspond to the same 3D point. Epipolar constraints can also be described by the fundamental matrix,^[1] or in the case of normalized image coordinates, the essential matrix^[2] between the two cameras.
If the points x_L and x_R are known, their projection lines are also known. If the two image points correspond to the same 3D point X, the projection lines must intersect precisely at X. This means that X can be calculated from the coordinates of the two image points, a process called triangulation.^[3]
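Both observations can be illustrated with a small sketch in normalized image coordinates. It assumes the left camera is [I | 0] and the right camera is [R | t], with the commonly used convention E = [t]× R for the essential matrix, so that x_R^T E x_L = 0 for corresponding points. The triangulation shown is a simple linear (DLT-style) method, one of several possible approaches; the function names and numeric values are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ a == np.cross(v, a)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def triangulate(P_L, P_R, x_L, x_R):
    """Linear triangulation: solve A X = 0 for the homogeneous 3D point X."""
    A = np.array([
        x_L[0] * P_L[2] - P_L[0],
        x_L[1] * P_L[2] - P_L[1],
        x_R[0] * P_R[2] - P_R[0],
        x_R[1] * P_R[2] - P_R[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]               # right singular vector of the smallest singular value
    return X_h[:3] / X_h[3]

# Assumed relative pose of the right camera: X_cam_R = R @ X + t.
c, s = np.cos(np.deg2rad(10.0)), np.sin(np.deg2rad(10.0))
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([-1.0, 0.0, 0.1])

P_L = np.hstack([np.eye(3), np.zeros((3, 1))])
P_R = np.hstack([R, t.reshape(3, 1)])
E = skew(t) @ R                # essential matrix under the stated convention

X_true = np.array([0.4, -0.3, 6.0])
x_L = P_L @ np.append(X_true, 1.0)
x_L = x_L / x_L[2]
x_R = P_R @ np.append(X_true, 1.0)
x_R = x_R / x_R[2]

# Epipolar constraint: close to zero for a true correspondence.
residual = x_R @ E @ x_L

# A point moved off the epipolar line violates the constraint.
x_wrong = x_R + np.array([0.0, 0.05, 0.0])
residual_wrong = x_wrong @ E @ x_L

# Triangulation recovers X from the two corresponding image points.
X_est = triangulate(P_L, P_R, x_L, x_R)
```

With noisy measurements the two projection lines generally do not intersect exactly, so practical methods minimize an error measure instead of solving for an exact intersection.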

Simplified cases

The epipolar geometry is simplified if the two camera image planes coincide. In this case, the epipolar lines also coincide (e_L–x_L = e_R–x_R). Furthermore, the epipolar lines are parallel to the line O_L–O_R between the centers of projection, and can in practice be aligned with the horizontal axes of the two images. This means that for each point in one image, its corresponding point in the other image can be found by looking only along a horizontal line. If the cameras cannot be positioned in this way, the image coordinates from the cameras may be transformed to emulate having a common image plane. This process is called image rectification.
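The simplified case can be checked numerically with a small sketch. It assumes two identical cameras with the same orientation, separated only along the horizontal image axis; the intrinsics, the baseline length, and the sample points are hypothetical.

```python
import numpy as np

# Hypothetical shared intrinsics.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
O_L = np.zeros(3)
O_R = np.array([0.12, 0.0, 0.0])   # right camera shifted along the x axis only

def project_h(K, O, X):
    """Homogeneous projection for a camera at O with identity rotation."""
    return K @ (X - O)

points = np.array([[0.2, 0.1, 2.0],
                   [-0.5, 0.4, 3.5],
                   [1.0, -0.3, 8.0]])

rows_L, rows_R = [], []
for X in points:
    x_L = project_h(K, O_L, X)
    x_R = project_h(K, O_R, X)
    rows_L.append(x_L[1] / x_L[2])   # vertical pixel coordinate, left image
    rows_R.append(x_R[1] / x_R[2])   # vertical pixel coordinate, right image

# The epipole has a zero third homogeneous coordinate: it lies at infinity
# in the horizontal direction, so all epipolar lines are parallel and horizontal.
e_R = project_h(K, O_R, O_L)
```

In this configuration corresponding points share the same image row, so a correspondence search only needs to scan along that row.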