Solved 16-833: Robot Localization and Mapping Homework 4 Dense SLAM with Point-based Fusion

TO FIND THE NEXT OR DIFFERENT PROJECT CLICK ON THE SEARCH BUTTON ON THE TOP RIGHT MENU AND SEARCH USING COURSE CODE OR PROJECT TITLE.

Starting from:

~~$30~~

$15

Overview In this homework you will be implementing a 3D dense SLAM system using pointbased fusion, which can be regarded as a simplified version of [1]. There are mainly three steps to setup the system: Localization: projective ICP, a common implementation of visual odometry that is similar to [2]. Mapping: point-based fusion, a simplified map fusion algorithm. SLAM: glue the components and build the entire system. Generally SLAM is a tightly coupled system and hard to debug. We simplify the problem by giving you the ground truth poses for testing. In the ICP step, you will be working on two RGB-D frames (without a map). You may verify your implementation with the loss and the visualization. In the map-based fusion step, you will be working on a sequence of RGB-D frames with ground truth, and generate a fused map. Finally, you may test your algorithm by putting everything together. You will be testing the system on the synthethic ICL-NUIM [3] dataset. Unlike real-world data, you don’t have to care about holes and noise, which can be tedious to deal with. Please download the dataset from this link. Prior to running the system, use python preprocess.py /path/to/dataset to generate normal maps. 2 Iterative Closest Point (ICP) (50 points) Please read Section 5 in [1] before working on the following problems. Note, many equations in [1] are using unconventional parameterizations, and the notations are not easy to follow. We provide our derivation below as a reference. In general, as its name indicates, ICP runs iteratively. Here is the general flow: 1 1. Setup data correspondences between source and target point clouds given the current pose [Rt | t t ] (usually initialized with identity). 2. Linearize the loss function at [Rt | t t ] and solve for incremental update [δR | δt]. 3. Update by left multiplying the incremental transformation [Rt+1 | t t+1] = [δR | δt][Rt | t t ]. 4. Go back to step 1 until convergence. You will be working on icp.py in this section. 2.1 Projective data association Data association is critical for point cloud registration, since the overall spirit is to push together correspondent points. In the conventional point cloud ICP, this is achieved by nearest neighbor search, usually depending on a KDTree. Although there are various fast implementations, building and searching in a KDTree can be relatively slow. In KinectFusion, a more efficient variation, projective data association is used. Suppose the target point cloud is in the form of a vertex map (where each pixel correpsonds to a point in 3D), then naturally there is an indexable geometry layout. Now given a point p, instead of searching for q in the 3D space, after initial transformation, we can project p to the 2D image and obtain the pixel coordinate (u, v), and directly read the corresponding point q from the vertex map. In this case, p and q are not strictly nearest neighbors in 3D, but projective nearest neighbors. In practice, this is very efficient and works pretty well. In case you are unfamiliar with projective pinhole camera model, we have provided you with the project function. Question (5 points): Suppose you have projected a point p to a vertex map and obtained the u, v coordinate with a depth d. The vertex map’s height and width are H and W. Write down the conditions u, v, d must satisfy to setup a valid correspondence. Also implement the TODO : first filter section in function find projective correspondence. After obtaining the correspondences q from the vertex map and the corresponding normal nq in the normal map, you will need to additionally filter them by distance thresholds and angle thresholds, so that |p − q| < dthr. Question (5 points): Why is this step necessary? Implement this filter in TODO : second filter section in function find projective correspondence. 2.2 Linearization KinectFusion seeks to minimize the point-to-plane error between associated points (p, q) where p is from the source and q is from the target. The error function can be written by: X i∈Ω r 2 i (R, t) = n > qi (Rpi + t − qi) 2 , (1) where (pi , qi) is the i-th associated point pair in the assocation set Ω = {(pn, qn)}, and nqi is the estimated normal at qi (we have covered the data association step, i.e., how to associate pi and qi). 2 It is hard to directly solve R ∈ SO(3). So we rely on the small-angle assumption by parameterizing δR with δR =   1 −γ β γ 1 −α −β α 1   . (2) and also introduce δt = [tx, ty, tz]. Note this parameterization is unlike [2] and is closer to the conventional Lie Algebra’s formulation [4]. With δR and δt, we can write down the incremental update version X i∈Ω r 2 i (δR, δt) = n > qi (δR)p 0 i + δt − qi 2 , (3) where p 0 i = R0pi + t 0 aiming at solving α, β, γ, tx, ty, tz with the given initial R0 , t0 . Question (15 points): Now reorganize the parameters and rewrite ri(δR, δt) in the form of ri(α, β, γ, tx, ty, tz) = Ai         α β γ tx ty tz         + bi , where Ai is a 1 × 6 matrix and bi is a scalar. Note: You don’t have to explicitly write down all the entries, and feel free to introduce intermediate variables. A cross-product operator may be useful here: [w]× =   0 −w2 w1 w2 0 −w0 −w1 w0 0   , w = [w0, w1, w2] > ∈ R 3 . 2.3 Optimization The aforementioned step in fact does the linearization. Now suppose we have collected n associated pairs, we move on to optimize X i=1n r 2 i (α, β, γ, tx, ty, tz) = Xn i=1 Ai         α β γ tx ty tz         + bi 2 . (4) Question (15 points): Write down the linear system that provides a closed form solution of α, β, γ, tx, ty, tz in terms of Ai and bi . You may either choose a QR formulation by expanding a matrix and filling in rows (resulting in a n × 6 linear system), or a LU formulation by summing up n matrices (resulting in a 6 × 6 system). Implement build linear system and the corresponding solve with your derivation. By solving the linear system you derived above, you can obtain the incremental transformation update in the tangent space. This 6D vector can be converted to a 4 × 4 matrix by pose2transformation provided by us. 3 Figure 1: Point clouds before and after registration. Now, run python icp.py /path/to/dataset. If everything works fine, you will find ICP’s loss converges in less than 10 iterations and the inlier count increases correspondingly. The two point clouds fit better, resulting in an interleaved pattern shown in Fig. 1. Question (10 points): Report your visualization before and after ICP with the default source and target (frame 10 and 50). Then, choose another more challenging source and target by yourself (e.g., frame 10 and 100) and report the visualization. Analyize the reason for its failure or success. Note: Press P to take a screenshot. 3 Point-based Fusion (40 points) Please read Section 4 in [1] before working on the following problems. KinectFusion uses volumetric TSDF as the map representation, which is nontrivial to implement. Point-based fusion, on the otherhand, maintains a weighted point cloud and actively merges incoming points, hence is easy to implement. The essential operations of point-based fusion are very similar to projective data association. Given an estimated pose, we first project the point p in the map to the image and obtain its corresponding (u, v) coordinates. Then, after proper filtering, we compute a weighted average of the properties (positions, normals, colors). You will be working on fusion.py in this section. 3.1 Filter Similar to projective data association, we need a transformation. In a typical SLAM system, this is obtained from accumulative ICP. In this section, however, we assume ground truth transformations are known. The transformation T w c = [Rw c | t w c ] is from the input camera frame’s coordinate system to the worlds’s coordinate system. With the available transformation, you need to perform filtering similar to projective data association. The only difference is that you will need to add a normal angle constraint to be more strict with filtering. Question (5 points): Implement filter pass1, filter pass2 to obtain mask arrays before merging and adding input points. 3.2 Merge The merge operation updates existing points in the map by calculating a weight average on the desired properties. 4 Figure 2: Fusion with ground truth poses with a normal map. Question (15 points): Given p ∈ R 3 in the map coordinate system with a weight w and its corresponding point q ∈ R 3 (read from the vertex map) in the frame’s coordinate system with a weight 1, write down the weight average of the positions in terms of p, q, Rw c , tw c , w. Similarly, write down the weight average of normals in terms of np, nq, Rw c , w. Implement the corresponding part in merge. Note a normalization of normals is required after weight average. 3.3 Addition The projective data association may cover a big portion of the input vertex map, but still there are uncovered regions. These points have to be added to the map. Question (10 points): Implement the corresponding part in add. You will need to select the unassociated points and concatenate the properties to the existing map. 3.4 Results Now run python fusion.py /path/to/dataset. The system will take the ground truth poses and produce the resulting image after fusing 200 frames, see Fig. 2. Question (10 points): Report your visualization with a normal map, and the final number of points in the map. Estimate the compression ratio by comparing the number of points you obtain and the number of points if you naively concatenate all the input. Note to speed up we use a downsample factor 2 for all the input. Note: Press N to visualize the normal map. 4 The dense SLAM system (20 points + 10 points) Now we put together both ICP and fusion code in main.py. We have called the relevant functions implemented by you for your convenience. Question (5 points): Which is the source and which is the target, for ICP between the map and the input RGBD-frame? Can we swap their roles, why or why not? Run python main.py /path/to/dataset. By default, 200 frames will be tracked and mapped. Question (15 points): Report your visualization. Also, record the estimated trajectory and compare against the ground truth. Drift is acceptable in the case, to the extent shown in Fig. 3. Bonus (10 points): Can you try to reduce the drift? There could be several methods, from improving the RGBD odometry to improving the fusion by rejecting 5 Figure 3: Fusion with poses estimated from frame-to-model ICP. outliers and eliminating dormant points (i.e., not associated to any points for a certain period). Report your updated visualization and trajectories. 5 Code Submission Rules Upload all of your python code in a folder. Include all your visualizations in your PDF file. Do not upload the dataset when you submit. References [1] Keller, Maik, et al. “Real-time 3D reconstruction in dynamic scenes using point-based fusion.” International Conference on 3D vision (3DV), 2013. Link: http://ieeexplore.ieee.org/document/6599048/. [2] Newcombe, Richard A., et al. “KinectFusion: Real-time dense surface mapping and tracking.” 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2011. Link: http://ieeexplore.ieee.org/document/6162880/. [3] Handa, Ankur, et al. “A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM.” IEEE International Conference on Robotics and Automation (ICRA), 2014. Website: https://www.doc.ic.ac.uk/˜ahanda/VaFRIC/iclnuim.html. [4] Sola, Joan and Deray, Jeremie and Atchuthan, Dinesh. “A micro Lie theory for state estimation in robotics,” arXiv 2018.