Dynamic Simultaneous Localization and Mapping (SLAM) in visual scenes is currently a major research area in fields such as robot navigation and autonomous driving. However, in the face of complex real-world environments, current dynamic SLAM systems struggle to achieve precise localization and map construction. With the advancement of deep learning, there has been increasing interest in deep learning-based dynamic SLAM visual odometry in recent years, and more researchers are turning to deep learning techniques to address the challenges of dynamic SLAM. Compared to dynamic SLAM systems based on deep learning methods such as object detection and semantic segmentation, dynamic SLAM systems based on instance segmentation can not only detect dynamic objects in the scene but also distinguish different instances of the same type of object, thereby reducing the impact of dynamic objects on the SLAM system's positioning. This article not only introduces traditional dynamic SLAM systems based on mathematical models but also provides a comprehensive analysis of existing instance segmentation algorithms and dynamic SLAM systems based on instance segmentation, comparing and summarizing their advantages and disadvantages. Comparisons on datasets show that instance segmentation-based methods have significant advantages in accuracy and robustness in dynamic environments. However, the limited real-time performance of instance segmentation algorithms hinders the widespread application of such dynamic SLAM systems. In recent years, the rapid development of single-stage instance segmentation methods has brought hope for the widespread application of dynamic SLAM systems based on instance segmentation. Finally, possible future research directions and improvement measures are discussed for reference by relevant professionals.
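To make the instance-level filtering concrete, the sketch below removes feature points that fall inside the mask of any dynamic instance before pose estimation. It is a minimal NumPy illustration; the function name, arguments, and mask layout are assumptions, not the implementation of any surveyed system.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, instance_masks, mask_classes, dynamic_class_ids):
    """Drop keypoints covered by the mask of any dynamic instance.

    keypoints:         (N, 2) array of (x, y) pixel coordinates, assumed in-bounds
    instance_masks:    (M, H, W) boolean array, one mask per detected instance
    mask_classes:      length-M array with the class id of each instance
    dynamic_class_ids: set of class ids treated as dynamic (e.g. person, car)
    """
    keep = np.ones(len(keypoints), dtype=bool)
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    for mask, cls in zip(instance_masks, mask_classes):
        if cls not in dynamic_class_ids:
            continue  # static instances do not affect pose estimation
        keep &= ~mask[ys, xs]  # reject points covered by a dynamic mask
    return keypoints[keep]
```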
Error or drift is frequently produced in pose estimation by geometric "feature detection and tracking" monocular visual odometry (VO) when the speed of camera movement exceeds 1.5 m/s. Moreover, in most deep learning-based VO methods, the weight factors are fixed values, which easily leads to overfitting. A new measurement system for monocular visual odometry, named Deep Learning Visual Odometry (DLVO), is proposed based on neural networks. In this system, a Convolutional Neural Network (CNN) is used to extract features and perform feature matching, and a Recurrent Neural Network (RNN) is used for sequence modeling to estimate the camera's 6-DoF poses. Instead of fixed CNN weight values, Bayesian distributions over the weight factors are introduced to effectively mitigate network overfitting. The 18,726 frames in the KITTI dataset are used to train the network. This design increases the generalization ability of the network model in the prediction process. Compared with the original Recurrent Convolutional Neural Network (RCNN), the method reduces the test loss by 5.33%, and it improves the robustness of translation and rotation estimation compared with traditional VO methods.
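As a rough illustration of the CNN-plus-RNN architecture described above, the following PyTorch sketch regresses 6-DoF relative poses from stacked image pairs. Layer sizes are invented for brevity, and the Bayesian treatment of the weights (e.g., via variational layers) is omitted; this is not the DLVO implementation.

```python
import torch
import torch.nn as nn

class CnnRnnVO(nn.Module):
    """Minimal CNN+RNN pose regressor in the spirit of DLVO (hypothetical layer sizes)."""
    def __init__(self, hidden=256):
        super().__init__()
        # CNN encoder: stacked image pairs (6 channels) -> feature vector per frame pair
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # RNN models temporal dependencies across the sequence of frame pairs
        self.rnn = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 6)  # 6-DoF: translation + rotation

    def forward(self, pairs):             # pairs: (B, T, 6, H, W)
        b, t = pairs.shape[:2]
        feats = self.encoder(pairs.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)             # (B, T, 6) relative poses

# usage: poses = CnnRnnVO()(torch.randn(2, 5, 6, 64, 64))
```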
Estimating the global position of a road vehicle without using GPS is a challenge that many scientists look forward to solving in the near future. Normally, inertial and odometry sensors are used to complement GPS measurements in an attempt to maintain vehicle odometry during GPS outages. Nonetheless, recent experiments have demonstrated that computer vision can also serve as a valuable source of what can be denoted visual odometry. For this purpose, vehicle motion can be estimated using a non-linear, photogrammetric approach based on RAndom SAmple Consensus (RANSAC). The results prove that the detection and selection of relevant feature points is a crucial factor in the overall performance of the visual odometry algorithm. The key issues for further improvement are discussed in this letter.
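A minimal OpenCV sketch of RANSAC-based two-view motion estimation follows; it uses ORB features rather than the letter's photogrammetric formulation, so treat it as an illustration of the RANSAC step only.

```python
import cv2
import numpy as np

def estimate_motion(prev_img, cur_img, K):
    """Relative camera motion between two frames via RANSAC on the essential matrix."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_img, None)
    kp2, des2 = orb.detectAndCompute(cur_img, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # RANSAC rejects mismatched feature points while fitting the essential matrix
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation and unit-scale translation
```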
Visual odometry is critical in visual simultaneous localization and mapping for robot navigation. However, the pose estimation performance of most current visual odometry algorithms degrades in scenes with unevenly distributed features, because dense features occupy excessive weight. Herein, a new human visual attention mechanism for point-and-line stereo visual odometry, called point-line-weight-mechanism visual odometry (PLWM-VO), is proposed to describe scene features in a global and balanced manner. A weight-adaptive model based on region partition and region growth is generated for the human visual attention mechanism, where sufficient attention is assigned to position-distinctive objects (sparse features in the environment). Furthermore, the sum of absolute differences algorithm is used to improve the accuracy of initialization for line features. Compared with the state-of-the-art method (ORB-VO), PLWM-VO shows a 36.79% reduction in absolute trajectory error on the KITTI and EuRoC datasets. Although the time consumption of PLWM-VO is higher than that of ORB-VO, online test results indicate that PLWM-VO satisfies the real-time demand. The proposed algorithm not only significantly improves the environmental adaptability of visual odometry, but also quantitatively demonstrates the superiority of the human visual attention mechanism.
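The sum of absolute differences (SAD) used for line feature initialization reduces to a simple patch cost. The sketch below shows the metric for single-pixel stereo matching; the patch size and disparity range are illustrative defaults, not PLWM-VO's settings.

```python
import numpy as np

def sad_match(left, right, y, x, patch=5, max_disp=64):
    """Find the disparity of pixel (y, x) in the left image by minimizing the
    sum of absolute differences (SAD) over candidate patches in the right image."""
    h = patch // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float32)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp):
        if x - d - h < 0:
            break  # candidate patch would leave the image
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.float32)
        cost = np.abs(ref - cand).sum()   # SAD cost for this disparity
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```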
Robust and efficient vision systems are essential to support different kinds of autonomous robotic behaviors linked to the capability to interact with the surrounding environment, without relying on any a priori knowledge. Within space missions, above all those involving rovers that have to explore planetary surfaces, vision can play a key role in improving autonomous navigation functionalities: besides obstacle avoidance and hazard detection while traveling, vision can provide accurate motion estimation in order to constantly monitor all paths executed by the rover. The present work regards the development of an effective visual odometry system, focusing as much as possible on issues such as continuous operating mode, system speed, and reliability.
This paper aims at a semi-dense visual odometry system that is accurate, robust, and able to run in real time on mobile devices such as smartphones, AR glasses, and small drones. The key contributions of our system include: 1) a modified pyramidal Lucas-Kanade algorithm which incorporates spatial and depth constraints for fast and accurate camera pose estimation; 2) adaptive image resizing based on inertial sensors, which greatly accelerates tracking speed with little accuracy degradation; and 3) an ultrafast binary feature description based directly on the intensities of a resized and smoothed image patch around each pixel, which is sufficiently effective for relocalization. A quantitative evaluation on public datasets demonstrates that our system achieves better tracking accuracy and up to about 2X faster tracking speed compared to the state-of-the-art monocular SLAM system LSD-SLAM. For the relocalization task, our system is 2.0X~4.6X faster than DBoW2 and achieves similar accuracy.
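For reference, the baseline pyramidal Lucas-Kanade tracker that the first contribution extends can be run in a few OpenCV calls; the spatial and depth constraints of the paper's modified version are not shown.

```python
import cv2
import numpy as np

def track_pyramidal_lk(prev_gray, cur_gray, max_corners=500):
    """Track sparse features with the standard pyramidal Lucas-Kanade tracker."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, pts, None,
        winSize=(21, 21), maxLevel=3,  # 3 pyramid levels handle larger motion
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    ok = status.ravel() == 1
    return pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)  # matched point pairs
```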
Efficient and precise localization is a prerequisite for the intelligent navigation of mobile robots. Traditional visual localization systems, such as visual odometry (VO) and simultaneous localization and mapping (SLAM), suffer from two shortcomings: a drift problem caused by accumulated localization error, and erroneous motion estimation due to illumination variation and moving objects. In this paper, we propose an enhanced VO by introducing a panoramic camera into the traditional stereo-only VO system. Benefiting from the 360° field of view, the panoramic camera is responsible for three tasks: (1) detecting road junctions and building a landmark library online; (2) correcting the robot's position when the landmarks are revisited from any orientation; (3) working as a panoramic compass when the stereo VO cannot provide reliable positioning results. To use the large-sized panoramic images efficiently, the concept of compressed sensing is introduced into the solution and an adaptive compressive feature is presented. Combined with our previous two-stage local binocular bundle adjustment (TLBBA) stereo VO, the new system can obtain reliable positioning results in quasi-real time. Experimental results of challenging long-range tests show that our enhanced VO is much more accurate and robust than the traditional VO, thanks to the compressive panoramic landmarks built online.
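The compressed-sensing idea behind the adaptive compressive feature can be sketched as a random projection of a vectorized patch; the paper's adaptive measurement matrix is more elaborate, so the following is only a conceptual NumPy illustration.

```python
import numpy as np

def compressive_feature(image_patch, out_dim=128, seed=0):
    """Compress a large image patch into a short feature vector with a random
    projection y = Phi * x, the core idea of compressed-sensing style features."""
    x = image_patch.astype(np.float32).ravel()
    rng = np.random.default_rng(seed)            # fixed seed -> same measurement matrix
    phi = rng.standard_normal((out_dim, x.size)) / np.sqrt(out_dim)
    y = phi @ x                                   # out_dim << len(x)
    return y / (np.linalg.norm(y) + 1e-8)         # normalize for matching
```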
Visual odometry, which aims to estimate relative camera motion between sequential video frames, has been widely used in the fields of augmented reality, virtual reality, and autonomous driving. However, it is still quite challenging for state-of-the-art approaches to handle low-texture scenes. In this paper, we propose a robust and efficient visual odometry algorithm that directly utilizes edge pixels to track the camera pose. In contrast to direct methods, we choose the reprojection error to construct the optimization energy, which can effectively cope with illumination changes. A distance transform map built upon edge detection for each frame is used to improve tracking efficiency. A novel weighted edge alignment method together with sliding window optimization is proposed to further improve the accuracy. Experiments on public datasets show that the method is comparable to state-of-the-art methods in terms of tracking accuracy, while being faster and more robust.
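The distance transform trick is straightforward to reproduce: once edges are detected, each pixel stores its distance to the nearest edge, so aligning reprojected edge points becomes a map lookup. A hedged OpenCV sketch, with illustrative Canny thresholds:

```python
import cv2

def edge_distance_map(gray):
    """Build a per-pixel map of the distance to the nearest edge pixel.
    Aligning reprojected edges then reduces to reading this map instead of
    searching for correspondences explicitly."""
    edges = cv2.Canny(gray, 50, 150)          # binary edge image (255 on edges)
    # distanceTransform measures distance to the nearest zero pixel,
    # so edges must be zero and everything else non-zero
    inverted = cv2.bitwise_not(edges)
    dist = cv2.distanceTransform(inverted, cv2.DIST_L2, 5)
    return dist                               # cost of a reprojected point = dist[y, x]
```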
In this paper, we present a novel algorithm for odometry estimation based on ceiling vision. The main contribution of this algorithm is the introduction of principal direction detection, which can greatly reduce the error accumulation problem seen in most visual odometry estimation approaches. The principal direction is defined based on the fact that our ceiling is filled with artificial vertical and horizontal lines, which can be used as references for the robot's current heading direction. The proposed approach can be operated in real time and performs well even under camera disturbance. A moving low-cost RGB-D camera (Kinect), mounted on a robot, is used to continuously acquire point clouds. Iterative closest point (ICP) is the common way to estimate the current camera position by registering the currently captured point cloud to the previous one. However, its performance suffers from the data association problem, or it requires pre-alignment information. The performance of the proposed principal direction detection approach does not rely on data association knowledge. Using this method, two point clouds are properly pre-aligned; hence, we can use ICP to fine-tune the transformation parameters and minimize registration error. Experimental results demonstrate the performance and stability of the proposed system under disturbance in real time. Several indoor tests are carried out to show that the proposed visual odometry estimation method can help to significantly improve the accuracy of simultaneous localization and mapping (SLAM).
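A simplified stand-in for principal direction detection is to detect ceiling lines and take the dominant orientation modulo 90°, since the grid's vertical and horizontal lines coincide under that folding. The OpenCV sketch below is an assumption-laden illustration, not the paper's detector:

```python
import cv2
import numpy as np

def principal_direction(ceiling_gray):
    """Estimate the dominant heading from the ceiling's line grid: detect lines,
    fold their angles into [0, 90) degrees, and take the histogram peak."""
    edges = cv2.Canny(ceiling_gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 100)
    if lines is None:
        return None                               # no lines found on this frame
    thetas = np.degrees(lines[:, 0, 1]) % 90.0    # vertical/horizontal lines coincide mod 90
    hist, bin_edges = np.histogram(thetas, bins=90, range=(0, 90))
    return bin_edges[np.argmax(hist)]             # dominant grid orientation in degrees
```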
Simultaneous localisation and mapping (SLAM) is the basis for many robotic applications. As the front end of SLAM, visual odometry is mainly used to estimate camera pose. In dynamic scenes, classical methods are deteriorated by dynamic objects and cannot achieve satisfactory results. To improve the robustness of visual odometry in dynamic scenes, this paper proposes a dynamic region detection method based on RGB-D images. First, all feature points in the RGB image are classified as dynamic or static using a triangle constraint and the epipolar geometric constraint successively. Meanwhile, the depth image is clustered using the K-means method. The classified feature points are mapped to the clustered depth image, and a dynamic or static label is assigned to each cluster according to the number of dynamic feature points it contains. Subsequently, a dynamic region mask for the RGB image is generated based on the dynamic clusters in the depth image, and the feature points covered by the mask are removed. The remaining static feature points are used to estimate the camera pose. Finally, experimental results are provided to demonstrate the feasibility and performance of the method.
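The epipolar classification step can be illustrated as follows: fit a fundamental matrix to the matches, then flag points whose distance to their epipolar line exceeds a threshold as dynamic. The threshold and interface are hypothetical:

```python
import cv2
import numpy as np

def classify_by_epipolar(pts_prev, pts_cur, thresh=1.0):
    """Label matched points (two (N, 2) float arrays) as dynamic when they deviate
    from the epipolar geometry implied by the (mostly static) scene."""
    F, _ = cv2.findFundamentalMat(pts_prev, pts_cur, cv2.FM_RANSAC, 1.0, 0.99)
    ones = np.ones((len(pts_prev), 1), dtype=np.float64)
    p1 = np.hstack([pts_prev, ones])              # homogeneous coordinates
    p2 = np.hstack([pts_cur, ones])
    lines = (F @ p1.T).T                          # epipolar line of each point in frame 2
    num = np.abs(np.sum(lines * p2, axis=1))
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    dist = num / (den + 1e-12)                    # point-to-epipolar-line distance
    return dist > thresh                          # True = likely dynamic
```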
Visual simultaneous localization and mapping (SLAM) is crucial in robotics and autonomous driving. However, traditional visual SLAM faces challenges in dynamic environments. To address this issue, researchers have proposed semantic SLAM, which combines object detection, semantic segmentation, instance segmentation, and visual SLAM. Despite the growing body of literature on semantic SLAM, there is currently a lack of comprehensive research on the integration of object detection and visual SLAM. Therefore, this study gathers information from multiple databases and reviews relevant literature using specific keywords. It focuses on visual SLAM based on object detection, covering several aspects. First, it discusses the current research status and challenges in this field, highlighting methods for incorporating semantic information from object detection networks into odometry, loop closure detection, and map construction. It also compares the characteristics and performance of various object detection algorithms used in visual SLAM. Lastly, it provides an outlook on future research directions and emerging trends in visual SLAM. Research has shown that visual SLAM based on object detection offers significant improvements over traditional SLAM in dynamic point removal, data association, point cloud segmentation, and other techniques. It can improve the robustness and accuracy of the entire SLAM system and can run in real time. With the continuous optimization of algorithms and the improvement of hardware, object-aware visual SLAM has great potential for development.
Simultaneous localization and mapping (SLAM) has attracted considerable research interest from the robotics and computer-vision communities for more than 30 years. With steady and progressive efforts being made, modern SLAM systems allow robust and online applications in real-world scenes. We examined the evolution of this powerful perception tool in detail and noticed that the insights concerning incremental computation and temporal guidance are persistently retained. Herein, we denote this temporal continuity as a flow basis and present, for the first time, a survey that specifically focuses on this flow-based nature, ranging from geometric computation to emerging learning techniques. We start by reviewing the two essential stages of geometric computation, presenting the de facto standard pipeline and problem formulation, along with the utilization of temporal cues. The recently emerging techniques are then summarized, covering a wide range of areas, such as learning techniques, sensor fusion, and continuous-time trajectory modeling. This survey aims at arousing public attention on how robust SLAM systems benefit from a continuously observing nature, as well as the topics worthy of further investigation for better utilizing temporal cues.
Monocular visual odometry (VO) is the process of determining a user's trajectory through a series of consecutive images taken by a single camera. A major problem that affects the accuracy of monocular visual odometry, however, is scale ambiguity. This research proposes an innovative augmentation technique which resolves the scale ambiguity problem of monocular visual odometry. The proposed technique augments the camera images with range measurements taken by an ultra-low-cost laser device known as the Spike. The Spike laser rangefinder is small and can be mounted on a smartphone. Two datasets were collected along precisely surveyed tracks, both outdoor and indoor, to assess the effectiveness of the proposed technique. The coordinates of both tracks were determined using a total station to serve as ground truth. To calibrate the smartphone's camera, seven images of a checkerboard were taken from different positions and angles and then processed using a MATLAB-based camera calibration toolbox. Subsequently, the speeded-up robust features (SURF) method was used for image feature detection and matching. The random sample consensus (RANSAC) algorithm was then used to remove outliers in the matched points between sequential images. The relative orientation and translation between frames were computed and then scaled using the Spike measurements to obtain the scaled trajectory. Subsequently, the scaled trajectory was used to reconstruct the surrounding scene using the structure from motion (SfM) technique. Finally, both the computed camera trajectory and the reconstructed scene were compared with the ground truth. It is shown that the proposed technique achieves centimeter-level accuracy in monocular VO scale recovery, which in turn leads to enhanced mapping accuracy.
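The core of the scale recovery is a single ratio: the metric laser range divided by the up-to-scale triangulated depth of the ranged point. A minimal sketch with a hypothetical interface (the Spike-based pipeline in the paper is more involved):

```python
import numpy as np

def recover_scale(t_unit, triangulated_depths, laser_range, target_index):
    """Resolve monocular scale ambiguity with a single laser range measurement.
    The laser returns the metric distance to one scene point; the ratio to that
    point's up-to-scale triangulated depth rescales the whole trajectory step."""
    scale = laser_range / triangulated_depths[target_index]
    return scale * t_unit   # metric translation between the two frames

# e.g. a unit-norm translation from recoverPose, a 3.42 m laser reading to point 17:
# t_metric = recover_scale(t_unit, depths, 3.42, 17)
```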
Background Based on the seminal work proposed by Zhou et al., much of the recent progress in learning monocular visual odometry, i.e., depth and camera motion from monocular videos, can be attributed to tricks in the training procedure, such as data augmentation and learning objectives. Methods Herein, we categorize a collection of such tricks through theoretical examination and empirical evaluation of their effects on the final accuracy of the visual odometry. Results/Conclusions By combining these tricks, we were able to significantly improve a baseline model adapted from SfMLearner without additional inference costs. Furthermore, we analyzed the principles of these tricks and the reasons for their success. Practical guidelines for future research are also presented.
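One of the central learning objectives in this line of work is the photometric reconstruction loss inherited from SfMLearner. The sketch below shows a common L1-plus-SSIM form with typical default coefficients; the exact objectives and weights evaluated in the paper may differ, and the depth/pose warping itself is omitted.

```python
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified SSIM over 3x3 windows, as commonly used in self-supervised depth/VO."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return num / den

def photometric_loss(target, source_warped, alpha=0.85):
    """Compare the target frame with a source frame warped by predicted depth and pose."""
    l1 = (target - source_warped).abs().mean(1, keepdim=True)
    dssim = ((1 - ssim(target, source_warped)) / 2).mean(1, keepdim=True)
    return (alpha * dssim + (1 - alpha) * l1).mean()
```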
To address the problem that traditional keypoint detection methods are susceptible to complex backgrounds and local similarity in images, resulting in inaccurate descriptor matching and bias in visual localization, keypoints and descriptors based on cross-modality fusion are proposed and applied to the study of camera motion estimation. A convolutional neural network is used to detect the positions of keypoints and generate the corresponding descriptors, and pyramid convolution is used to extract multi-scale features in the network. The problem of local similarity in images is solved by capturing local and global feature information and fusing the geometric position information of keypoints to generate descriptors. According to our experiments, the repeatability of our method is improved by 3.7%, and homography estimation is improved by 1.6%. To demonstrate the practicability of the method, the visual odometry part of a simultaneous localization and mapping system is constructed, and our method achieves 35% higher positioning accuracy than the traditional method.
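Pyramid convolution can be realized as parallel branches with increasing kernel sizes whose outputs are concatenated and fused; the channel split in this PyTorch sketch is a guess, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PyramidConv(nn.Module):
    """Parallel branches with growing kernel sizes capture multi-scale context."""
    def __init__(self, in_ch, out_ch, kernels=(3, 5, 7)):
        super().__init__()
        branch_ch = out_ch // len(kernels)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in kernels
        ])
        self.fuse = nn.Conv2d(branch_ch * len(kernels), out_ch, 1)  # 1x1 fusion

    def forward(self, x):
        feats = [b(x) for b in self.branches]  # same spatial size, different receptive fields
        return self.fuse(torch.cat(feats, dim=1))
```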
The two topics of this article seem to have absolutely nothing to do with each other and, as can be expected in a contribution in honor and memory of Prof. Fritz Ackermann, they are linked in his person. Vision-based Navigation was the focus of the doctoral thesis written by the author, the 29th and last PhD thesis supervised by Prof. Ackermann. The International Master's Program Photogrammetry and Geoinformatics, which the author established with colleagues at Stuttgart University of Applied Sciences (HfT Stuttgart) in 1999, was a consequence of Prof. Ackermann's benevolent promotion of international knowledge transfer in teaching. Both topics are reflected in this article; they provide further splashes of color in Prof. Ackermann's oeuvre.
In recent years, there has been a lot of interest in incorporating semantics into simultaneous localization and mapping (SLAM) systems. This paper presents an approach to generate an outdoor large-scale 3D dense semantic map based on binocular stereo vision. The inputs to the system are stereo color images from a moving vehicle. First, the dense 3D space around the vehicle is constructed, and the motion of the camera is estimated by visual odometry. Meanwhile, semantic segmentation is performed online through deep learning, and the semantic labels are also used to verify the feature matching in visual odometry. These three processes calculate the motion, depth, and semantic label of every pixel in the input views. Then, a voxel conditional random field (CRF) inference is introduced to fuse semantic labels into voxels. After that, we present a method to remove moving objects by incorporating the semantic labels, which improves the motion segmentation accuracy. Finally, a dense 3D semantic map of an urban environment is generated from an arbitrarily long image sequence. We evaluate our approach on the KITTI vision benchmark, and the results show that the proposed method is effective.
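The voxel label fusion step can be approximated by majority voting of per-pixel labels inside each voxel; the full CRF additionally enforces smoothness between neighboring voxels. A minimal, assumption-labeled sketch:

```python
import numpy as np
from collections import defaultdict

def fuse_labels_into_voxels(points, labels, voxel_size=0.2):
    """Accumulate per-pixel semantic labels into voxels by majority vote, a simple
    stand-in for voxel CRF inference (voxel size is an illustrative default).

    points: (N, 3) array of 3D positions; labels: length-N class ids."""
    votes = defaultdict(lambda: defaultdict(int))
    keys = np.floor(points / voxel_size).astype(np.int64)   # voxel index per 3D point
    for key, label in zip(map(tuple, keys), labels):
        votes[key][label] += 1
    # winning label per voxel; a CRF would refine these with pairwise terms
    return {k: max(v, key=v.get) for k, v in votes.items()}
```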