Funding: Supported by the Australian Indian Strategic Research Fund (Project AISRF53820).
Abstract

Background: The lack of depth perception in medical imaging systems is one of the long-standing technological limitations of minimally invasive surgery. The ability to visualize anatomical structures in 3D can improve conventional arthroscopic surgery, as a full 3D semantic representation of the surgical site can directly augment surgeons' abilities. It also opens the possibility of intraoperative image registration with preoperative clinical records, a prerequisite for semi-autonomous and fully autonomous platforms. This study aimed to present a novel monocular depth prediction model that infers depth maps from a single color arthroscopic video frame.

Methods: We applied a novel technique that combines supervised and self-supervised loss terms, thereby eliminating the drawbacks of each technique used alone. It enabled the estimation of edge-preserving depth maps from a single untextured arthroscopic frame. The proposed image acquisition technique projected artificial textures onto the tissue surface to improve the quality of the disparity maps obtained from stereo images. Moreover, by integrating attention-aware multi-scale feature extraction with global scene contextual constraints and multi-scale depth fusion, the model could predict reliable and accurate tissue depth for the surgical site that complies with the scene geometry.

Results: A total of 4,128 stereo frames from a knee phantom were used to train the network; during this pre-training stage, the network learned disparity maps from the stereo images. The fine-tuning phase used 12,695 knee arthroscopic stereo frames from cadaver experiments, along with their corresponding coarse disparity maps obtained from a stereo-matching technique. In a supervised fashion, the network learns the transformation from the left image to the disparity map, whereas the self-supervised loss terms refine the coarse depth map by minimizing reprojection, gradient, and structural dissimilarity losses. Together, our method produces high-quality 3D maps with minimal losses: 0.0004132 (structural dissimilarity), 0.00036120156 (L1 error distance), and 6.591908 × 10^(−5) (L1 gradient error distance).

Conclusion: Machine learning techniques for monocular depth prediction were studied to infer accurate depth maps from a single color arthroscopic video frame. Moreover, the study integrates a segmentation model, so 3D segmented maps are inferred, providing extended perception and tissue awareness.
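The fine-tuning data described above pairs each left frame with a coarse disparity map obtained from a stereo-matching technique, but the abstract does not name the matcher. Below is a minimal sketch assuming OpenCV's semi-global block matching, a common choice; the algorithm and all parameter values are illustrative assumptions, not the authors' confirmed pipeline.

```python
# Sketch of the coarse-disparity step (assumed: OpenCV SGBM).
import cv2
import numpy as np

def coarse_disparity(left_bgr, right_bgr, num_disp=64, block=5):
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disp,   # must be divisible by 16
        blockSize=block,
        P1=8 * block * block,      # smoothness penalties (typical settings)
        P2=32 * block * block,
    )
    # SGBM returns fixed-point disparities scaled by 16.
    disp = matcher.compute(left, right).astype(np.float32) / 16.0
    disp[disp < 0] = 0.0           # zero out invalid matches
    return disp
```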
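The supervised term (distance to the coarse disparity) and the self-supervised terms (reprojection, gradient, and structural dissimilarity losses) named above could be combined as in the following minimal PyTorch sketch. The loss weights, the 3x3 average-pooled SSIM window, and all helper names are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch of a combined supervised + self-supervised depth loss (PyTorch).
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Local-window SSIM via average pooling, as is common in
    # self-supervised depth estimation.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def gradient_l1(a, b):
    # L1 distance between horizontal/vertical gradients; this is the
    # term that keeps the predicted depth maps edge-preserving.
    dx = (a[..., :, 1:] - a[..., :, :-1]) - (b[..., :, 1:] - b[..., :, :-1])
    dy = (a[..., 1:, :] - a[..., :-1, :]) - (b[..., 1:, :] - b[..., :-1, :])
    return dx.abs().mean() + dy.abs().mean()

def combined_loss(pred_disp, coarse_disp, left_img, left_reproj,
                  w_sup=1.0, w_photo=1.0, w_ssim=0.85, w_grad=0.5):
    # left_reproj: right frame warped into the left view using pred_disp.
    sup = F.l1_loss(pred_disp, coarse_disp)      # supervised term
    photo = F.l1_loss(left_reproj, left_img)     # reprojection term
    dssim = ((1 - ssim(left_reproj, left_img)) / 2).mean()  # structural dissimilarity
    grad = gradient_l1(pred_disp, coarse_disp)   # gradient term
    return w_sup * sup + w_photo * photo + w_ssim * dssim + w_grad * grad
```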
Funding: This work was supported by the Australian Indian Strategic Research Fund (AISRF, Project No. AISF53820).
Abstract

Background: Knee arthroscopy is one of the most complex minimally invasive surgeries, routinely performed to treat a range of ailments and injuries of the knee joint. Its complex ergonomic design imposes visualization and navigation constraints, leading to unintended tissue damage and a steep learning curve before surgeons gain proficiency. The lack of robust visual texture and landmark frame features further limits the success of image-guided approaches to knee arthroscopy. Feature- and texture-less tissue structures of the knee anatomy, lighting conditions, noise, blur, debris, the lack of accurate ground-truth labels, tissue degeneration, and injury make semantic segmentation an extremely challenging task. To address this complex research problem, this study reports the utility of reconstructed surface reflectance as a viable source of information that can be used with cutting-edge deep learning techniques to achieve highly accurate scene segmentation.

Methods: We proposed an intraoperative, two-tier deep learning method that makes full use of the tissue reflectance information present within an RGB frame to segment texture-less knee arthroscopy video frames into multiple tissue types. This study included several cadaver knee experiments at the Medical and Engineering Research Facility, located within the Prince Charles Hospital campus, Brisbane, Queensland. Data were collected from a total of five cadaver knees, three from male donors and one from a female donor. The age range of the donors was 56–93 years. Aging-related tissue degeneration and some anterior cruciate ligament injuries were observed in most cadaver knees. An arthroscopic image dataset was created and subsequently labeled by clinical experts. This study also included validation of a prototype stereo arthroscope, alongside a conventional arthroscope, to attain a larger field of view and stereo vision. We reconstructed surface reflectance from camera responses that exhibited distinct spatial features at wavelengths ranging from 380 to 730 nm in the RGB spectrum. To segment texture-less tissue types, these data were used within a two-stage deep learning model.
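The reflectance-reconstruction step above does not specify an estimator. A common approach is a linear least-squares mapping from camera responses to band-sampled reflectance, fitted on pairs of known spectra and their responses; the sketch below illustrates that assumption, and all names and shapes are hypothetical, not the authors' confirmed method.

```python
# Sketch of linear reflectance estimation from RGB responses (assumed method).
import numpy as np

def fit_linear_estimator(train_rgb, train_spectra):
    # train_rgb: (N, 3) linearized camera responses; train_spectra: (N, B)
    # known reflectances sampled over B wavelength bands (e.g. 380-730 nm).
    # Solve for W such that spectra ≈ rgb @ W in the least-squares sense.
    W, *_ = np.linalg.lstsq(train_rgb, train_spectra, rcond=None)
    return W  # shape (3, B)

def reconstruct_reflectance(frame_rgb, W):
    # frame_rgb: (H, W, 3) float image; returns (H, W, B) per-pixel
    # reflectance estimates that would feed the segmentation network.
    h, w, _ = frame_rgb.shape
    return (frame_rgb.reshape(-1, 3) @ W).reshape(h, w, -1)
```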
Results: The accuracy of the network was measured using the Dice coefficient score. The average segmentation accuracy was 0.6625 for the anterior cruciate ligament (ACL), 0.84 for bone, and 0.565 for the meniscus. For this analysis, we excluded frames of extremely poor quality; a frame is considered extremely poor quality when more than 50% of any tissue structure is over- or underexposed due to nonuniform light exposure. Additionally, when only high-quality frames were considered during the training and validation stages, the average bone segmentation accuracy improved to 0.92 and the average ACL segmentation accuracy reached 0.73. These two tissue types, the femur bone and the ACL, are highly important for tissue tracking in arthroscopy. By comparison, previous work based on RGB data achieved much lower average accuracies for the femur, tibia, ACL, and meniscus: 0.78, 0.50, 0.41, and 0.43 using U-Net, and 0.79, 0.50, 0.51, and 0.48 using U-Net++. From this analysis, it is clear that our multispectral method outperforms the previously proposed methods and delivers a much better solution for automatic arthroscopic scene segmentation.

Conclusion: The method is based on a deep learning model and requires reconstructed surface reflectance. It can provide tissue awareness intraoperatively, with high potential to improve surgical precision. It could be applied to other minimally invasive surgeries as an online segmentation tool for training, aiding, and guiding surgeons, as well as for image-guided surgery.
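For reference, the Dice coefficient score used in Results is the standard overlap measure between predicted and ground-truth masks. A minimal NumPy sketch follows; the integer class-ID encoding of the label maps is an illustrative assumption.

```python
# Sketch of the per-class Dice coefficient over segmentation masks.
import numpy as np

def dice(pred_mask, gt_mask, class_id, eps=1e-7):
    # Binarize both label maps for the tissue class of interest.
    p = pred_mask == class_id
    g = gt_mask == class_id
    inter = np.logical_and(p, g).sum()
    return (2.0 * inter + eps) / (p.sum() + g.sum() + eps)

# Example: the reported per-tissue score is the mean over evaluation frames,
# e.g. for the ACL class (hypothetical names):
# acl_scores = [dice(pred, gt, class_id=ACL_ID) for pred, gt in eval_frames]
# mean_acl = float(np.mean(acl_scores))
```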