Summary of lidar scan matching algorithms

1 Introduction
2 Point to point registration
3 Higher level features
4 Other constraints
5 Ideas for improvement
6 References

1 Introduction

These days I’m working on lidar odometry and mapping. That is, for a mobile robot equipped with a lidar sensor, we wish to find out what the robot’s pose is, and generate a map of the world. This is known as simultaneous localization and mapping (SLAM).

Algorithms that do this can be largely divided into broad categories:

Odometry algorithms run in an online setting when new data is constantly being gathered, and updates the robot’s pose estimate as soon as possible. This then allows the point set data to be corrected of motion error (de-warping).
Mapping, or scan matching algorithms combine two or more point clouds into a single consistent one. This is more time consuming and is not run as often, or may be omitted altogether in favour of loop closure algorithms.
Loop closure corrects for drift by performing a global optimization over all past data, and is especially useful if a robot returns to a previously-visited area. This is very computationally expensive, and gets much worse as more data is collected.

Just odometry by itself is prone to drifting away from the true trajectory over time, so combining the de-warped point clouds in the mapping part can mitigate this. Even then there may be some drift, but some such open-loop algorithms (e.g. 4 ) are quite successful.

Odometry vs mapping — FIGURE 1 Picture showing point sets obtained from odometry being registered to a bigger point set during the mapping step. Image credit 4.

This blog post will focus on the scan matching process, since I was given some de-warped data that need to be aligned with minimal drift.

2 Point to point registration

A lidar sensor measures distance to the nearest obstacle in many directions. So its output is a point cloud with one point for each of those measurements.

Point set registration is the problem of aligning two point sets, often denoted M and S (for map and scene). Here we will talk about registering S to M but in many papers they talk about registering M (model) to S instead.

Point set registration algorithms often use the following iterative algorithm:

Estimate correspondences between points.
Solve for the optimal transform assuming those correspondences.

Since step 2 might change the correspondences, these two steps are repeated until convergence. There are multiple ways to estimate correspondences.

In iterative closest point (ICP) 0, every point in S is assumed to correspond to the closest point in M.
In robust point matching 1, every point in S is assumed to have some correspondence value between 0 and 1 to every point in M. As the algorithm progresses, it applies deterministic annealing to converge to a one-to-one correspondence between points in M and points in S.
In coherent point drift 2, every point in S is assumed to have some correspondence value to every point in M given by a Bayesian approach assuming each point has a Gaussian centered at it and then computing the posterior probability of finding the new Gaussian given the old distribution. The sigma, same for all points, decreases as the algorithm approaches convergence.

ICP and its variants are so ubiquitous that almost all point set registration and scan matching programs use it. However, it sometimes get stuck in local minima when the initial positions are not well-aligned. By relaxing the correspondence assumptions, the latter two algorithms are more robust against poor initial alignment, noisy outliers and removed points. In all cases, the cost function is some (possibly weighted) sum of distances of each point in S to its corresponding point in M.

There are various ways to estimate the transformation that minimizes the cost. It depends on how you parametrize the transformation (e.g. rigid transformation, vs non-rigid using thin plate splines, etc). Then, you can use least squares, the Levenberg-Marquadt algorithm, or something else.

There are also other point set registration algorithms that don’t adhere to the idea of iteratively estimating correspondence and then solving for a transformation. For example, the kernel correlation method 3 assigns a kernel (e.g. a Gaussian) centered at each of the points and then uses gradient descent to maximize the correlation between those kernels.

Note that in addition to spatial distance, point set registration algorithms can also make use of normals information and colour information where available.

3 Higher level features

In laser scan matching, especially using data produced by lidar sensors that have a small number of lasers, data sets are often sparse and may appear to have spurious patterns (i.e. “rows” of points along each scan line) caused by the way the scanner collects data. Thus, assuming that each point in S corresponds to points in M may not always work. One way to get around this is to use a model specifically tailored to the pattern of points produced by a lidar, e.g. polar matching 7. This does not work in the general case, for example when points have been de-warped using an external source. A higher-level representation of the point cloud can be used for robust scan matching.

Segmentation methods separate points into local structures, most commonly planes and edges. The cost function is the distance of plane-like points in S to planar patches in M; and of edge-like points in S to edge lines M.
- LOAM method: The LOAM algorithm 4 takes advantage of the assumption that points are produced as a laser beam is rotated at a constant speed. This yields the simple heuristic for finding curvature: a point is likely to be in a plane if it is close to the mean position of its neighbours obtained before and after it, and likely part of an edge otherwise.
Grid or voxel methods divide the space into a regular grid, and obtains a representation of points within each grid cell. A regular grid naturally removes any nonuniformity in the manner in which points are collected.
- The normal distribution transform 5 uses a Gaussian to model the distribution of points within each grid cell. This is used in 8 for a mining robot and compared favourably against ICP. The cost function is the negative of the goodness function obtained by evaluating the NDT of M at the positions of each point in S. Since Gaussians are differentiable, the authors of this method proposed optimizing the cost function using Newton’s method.
- Surface elements or surfels are local surface patch estimates by fitting a plane or ellipsoid to points within each grid cell. These are used in 6 for a mine exploration robot equipped with a lidar. The cost function is a weighted Euclidean sum of distance between the centers of surfels, and the difference in orientations thereof.
Invariant feature methods 910 use geometric descriptors to associate each point with a scalar value. A point is a feature if it has a rare descriptor value. Using a histogram of descriptor values to identify features in less populated bins, the algorithm extracts pairs of features in S and M with known correspondences. The scans can be matched by minimizing the cost that is the distance between corresponding pairs. If needed, this can subsequently be refined with ICP.
- Curvature can be computed using moments and spherical harmonics from surrounding points, but is sensitive to noise and sampling rate. 10
- The integral volume descriptor 9 is the fraction of the volume in a ball centered at each point that is considered to be inside the point cloud. The “insideness” is determined using methods such as ray shooting, and the volumes are calculated using a voxel occupancy grid.

some NDTs — FIGURE 3 Some two-dimensional NDTs

In most cases, the cost function may be minimized using any nonlinear regression method such as the Levenberg-Marquadt. As with point set registration, for additional robustness against outliers, we can pass the cost through a robust function such as the Cauchy distribution — this is known as an M-estimator.

4 Other constraints

In the specific context of a mobile robot, there are other constraints useful for laser scan matching, which arise from physical assumptions about the robot. These introduce additional terms in the cost function to penalize unfeasible solutions caused by numerical stability issues due to noise. The following constraints have been used in 6 for a mine exploration robot.

The smoothness constraints take into consideration a model of the physical limitations on the robot’s linear and rotational accelerations and penalize large transformations in a short time.
The gravity constraint assumes that the top of the robot is mostly pointing straight upwards or else it would have flipped over. This is especially useful when combined with an IMU to determine the up direction (after filtering out accelerations caused by motion).

5 Ideas for improvement

Poisson disk sampling instead of regular grid
Fit multiple planes using RANSAC to each cell instead of one surfel
Fit multiple Gaussians to cell using EM instead of one

1 Introduction

2 Point to point registration

3 Higher level features

4 Other constraints

5 Ideas for improvement

6 References