Jun Dai, Minghao Yang and Naohiko Hanajima
As one of the key technologies enabling robots to perceive real-world environments, the visual Simultaneous Localization And Mapping (SLAM) system relies only on geometric spatial features during mapping, so it cannot construct static dense maps in complex dynamic environments and has difficulty eliminating large quantities of drifting point clouds. To address this problem, this paper proposes a map construction method based on a visual SLAM system and a convolutional neural network. First, dynamic areas are comprehensively determined by an object detection network together with geometric constraints, and are then eliminated to remove their negative impact on pose estimation. Second, dense map building is added to the back-end of our system, which combines the dynamic-area information to reject drifting point clouds caused by motion; the camera poses and environment images are then used to generate the dense map by stitching the point clouds together. Experimental results on public datasets show that our algorithm constructs accurate dense static maps in dynamic environments, while effectively improving the SLAM system's localization accuracy and maintaining its real-time performance.
Visual SLAM; Pose estimation; Feature matching; Object detection; Three-dimensional point cloud map