Funding: Woosong University Academic Research 2024.
Abstract: This study investigates the application of Learnable Memory Vision Transformers (LMViT) to detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as with other transformer-based models, including Token-to-Token ViT, ViT without memory, and Parallel ViT. Using a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token-to-Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). LMViT also exhibited superior training and validation performance, attaining a validation accuracy of 98.2%, compared with 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token-to-Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT's ability to capture long-range dependencies in images, an area where CNNs struggle because of their reliance on local receptive fields and hierarchical feature extraction. The additional transformer-based models also show an improved ability to capture complex features compared with CNNs, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study not only demonstrates LMViT's potential for real-world defect detection but also underscores the promise of other transformer-based architectures such as Token-to-Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are key. Future research may focus on improving LMViT's computational efficiency for deployment in real-time quality control systems.
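The abstract does not describe how LMViT's memory works internally. As a minimal sketch, assuming the general learnable-memory idea in which extra trainable memory vectors are appended to the keys and values of a ViT attention layer while only the original patch tokens produce outputs, one attention step can be written in plain Python. The single head, identity query/key/value projections, and the names `attention_with_memory` and `softmax` are simplifying assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_with_memory(tokens: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Single-head attention where input tokens also attend to memory tokens.

    Simplifications (assumptions, not LMViT's published design): identity
    query/key/value projections and one head. Queries come only from the
    input tokens; keys and values come from tokens + memory, so the output
    has one vector per input token and no outputs are produced for memory.
    """
    context = tokens + memory
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Each output is a convex combination of the context value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return outputs

# Two toy patch embeddings and one memory vector; in a trained model the
# memory vectors would be learned parameters updated by backpropagation.
patches = [[1.0, 0.0], [0.0, 1.0]]
memory = [[2.0, 2.0]]
print(attention_with_memory(patches, []))      # plain self-attention
print(attention_with_memory(patches, memory))  # memory shifts every output
```

Keeping the memory out of the query set means the sequence length, and therefore the shape seen by later layers, is unchanged, which is what lets memory be added to an existing ViT without altering its downstream structure.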
Abstract: Detecting pavement cracks is critical for road safety and infrastructure management. Traditional methods, which rely on manual inspection and basic image processing, are time-consuming and error-prone. Recent deep learning (DL) methods automate crack detection, but many still struggle with variable crack patterns and environmental conditions. This study addresses these limitations by introducing the Masker Transformer, a novel hybrid deep learning model that integrates the precise localization capabilities of the Mask Region-based Convolutional Neural Network (Mask R-CNN) with the global contextual awareness of the Vision Transformer (ViT). The research leverages the strengths of both architectures to enhance segmentation accuracy and adaptability across different pavement conditions. We evaluated the Masker Transformer against other state-of-the-art models, namely U-Net, Transformer U-Net (TransUNet), U-Net Transformer (UNETr), Swin U-Net Transformer (Swin-UNETr), You Only Look Once version 8 (YOLOv8), and Mask R-CNN, on two benchmark datasets: Crack500 and DeepCrack. The Masker Transformer significantly outperforms the existing models, achieving the highest Dice Similarity Coefficient (DSC), precision, recall, and F1-score on both datasets. Specifically, the model attained a DSC of 80.04% on Crack500 and 91.37% on DeepCrack, demonstrating superior segmentation accuracy and reliability. The high precision and recall further support its effectiveness in real-world applications, suggesting that the Masker Transformer can serve as a robust tool for automated pavement crack detection and could potentially replace more traditional methods.
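The four segmentation scores used to compare the Masker Transformer with its baselines can all be computed from a predicted and a ground-truth crack mask. A minimal sketch, assuming flattened 0/1 masks scored per image; the abstract does not say how scores were aggregated over Crack500 or DeepCrack, and `segmentation_scores` is a hypothetical helper.

```python
from typing import Dict, Sequence

def segmentation_scores(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Pixel-level scores for binary crack masks (1 = crack, 0 = background).

    Empty denominators return 0.0; this is a convention and benchmarks differ.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # crack predicted as crack
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # background predicted as crack
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # crack that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # DSC = 2|A ∩ B| / (|A| + |B|), with |A| = tp + fp and |B| = tp + fn.
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"dice": dice, "precision": precision, "recall": recall, "f1": f1}

# Toy 6-pixel example: 2 true positives, 1 false positive, 1 false negative.
print(segmentation_scores([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

For binary masks the pixel-level Dice coefficient and F1-score are algebraically identical, both equal to 2TP / (2TP + FP + FN); reported DSC and F1 values can still differ when they are averaged over images or computed at a different level (for example, per crack instance).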
Funding: The National Natural Science Foundation of China (Grant No. 62101138), the Shandong Natural Science Foundation (Grant No. ZR2021QD148), the Guangdong Natural Science Foundation (Grant No. 2022A1515012573), and the Guangzhou Basic and Applied Basic Research Project (Grant No. 202102020701) provided funds for publishing this paper.
Abstract: As positioning sensors, edge computing power, and communication technologies continue to develop, a moving agent can now sense its surroundings and communicate with other agents. By receiving spatial information from both its environment and other agents, an agent can use various methods and sensor types to localize itself. Owing to its high flexibility and robustness, collaborative positioning has become widely used in both military and civilian applications. This paper introduces the fundamental concepts and applications of collaborative positioning and reviews recent progress in the field based on cameras, LiDAR (Light Detection and Ranging), wireless sensors, and their integration. The paper compares current methods by sensor type, summarizes their main paradigms, and analyzes their evaluation experiments. Finally, it discusses the main challenges and open issues that require further research.
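The idea of an agent localizing itself from spatial information shared by other agents can be illustrated with the simplest wireless-sensor case: range measurements to neighbouring agents whose positions are known. A minimal sketch, assuming noise-free 2-D ranges and linearized least-squares trilateration; this textbook technique is chosen for illustration and is not a method taken from the review, and `trilaterate` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def trilaterate(anchors: List[Point], ranges: List[float]) -> Point:
    """Estimate a 2-D position from ranges to neighbours with known positions."""
    if len(anchors) < 3 or len(anchors) != len(ranges):
        raise ValueError("need >= 3 anchors and exactly one range per anchor")
    x0, y0 = anchors[0]
    r0 = ranges[0]
    # Accumulate the 2x2 normal equations (A^T A) p = A^T b.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        # Subtracting anchor 0's circle equation from anchor i's removes the
        # quadratic terms and leaves one linear equation in (x, y).
        ax, ay = 2 * (xi - x0), 2 * (yi - y0)
        rhs = r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2
        a11 += ax * ax
        a12 += ax * ay
        a22 += ay * ay
        b1 += ax * rhs
        b2 += ay * rhs
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not identifiable")
    # Solve the 2x2 system with Cramer's rule.
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y

# Three neighbouring agents at known positions measure their distance to us.
neighbours = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
measured = [math.dist(n, true_position) for n in neighbours]
print(trilaterate(neighbours, measured))
```

With noisy ranges the same function returns the least-squares solution of the linearized system; practical systems usually refine such an estimate, for example with nonlinear least squares or filtering, and may fuse it with camera or LiDAR observations.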
Abstract: With the rapid development of drones and autonomous vehicles, miniaturized and lightweight vision sensors that can track targets are of great interest. Limited by their flat structure, conventional image sensors require a large number of lenses to achieve the corresponding functions, which increases the overall volume and weight of the system.
Funding: Supported by the Innovation and Entrepreneurship Project for College Students of the First Affiliated Hospital of Guangxi Medical University in 2022 and the Development and Application of Appropriate Medical and Health Technologies in Guangxi (No. S2021093).
Abstract: AIM: To investigate the frequency and associated factors of accommodative and non-strabismic binocular vision dysfunction among medical university students. METHODS: A total of 158 student volunteers underwent routine vision examinations at the optometry clinic of Guangxi Medical University. Their data were used to identify the different types of accommodative and non-strabismic binocular vision dysfunction and to determine their frequency. Correlation analysis and logistic regression were used to examine the factors associated with these abnormalities. RESULTS: Overall, 36.71% of the subjects had accommodative or non-strabismic binocular vision problems, with 8.86% attributed to accommodative dysfunction and 27.85% to binocular abnormalities. Convergence insufficiency (CI) was the most common abnormality, accounting for 13.29%. Subjects with these abnormalities experienced higher levels of eyestrain (χ²=69.518, P<0.001). Linear correlations were observed between the difference in binocular spherical equivalent (SE) and both the index of horizontal esotropia at distance (r=0.231, P=0.004) and the asthenopia survey scale (ASS) score (r=0.346, P<0.001). Furthermore, the right eye's SE was inversely correlated with the convergence of positive and negative fusion images at near (r=-0.321, P<0.001), the convergence of negative fusion images at near (r=-0.294, P<0.001), vergence facility (VF; r=-0.234, P=0.003), and the set of negative fusion images at distance (r=-0.237, P=0.003). Logistic regression analysis indicated that gender, age, and the difference in right and binocular SE did not influence the occurrence of these abnormalities. CONCLUSION: Binocular vision abnormalities are more prevalent than accommodative dysfunction, with CI being the most frequent type. Greater binocular refractive disparity is associated with more severe eyestrain symptoms.
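The prevalence figures can be checked against the sample of 158 students. A minimal sketch: the whole-number counts below are back-calculated from the reported percentages and are not stated in the abstract, and `implied_count` is a hypothetical helper. The check also shows that 8.86% + 27.85% = 36.71%, so the accommodative and binocular categories appear to be reported as non-overlapping, with CI presumably counted within the binocular group.

```python
N = 158  # participants reported in the abstract

reported_percent = {
    "any accommodative or binocular dysfunction": 36.71,
    "accommodative dysfunction": 8.86,
    "binocular abnormality": 27.85,
    "convergence insufficiency (CI)": 13.29,
}

def implied_count(percent: float, n: int = N) -> int:
    """Back-calculate the whole-number count that a percentage of n implies."""
    return round(percent * n / 100)

implied = {name: implied_count(p) for name, p in reported_percent.items()}

for name, count in implied.items():
    # Re-deriving the percentage from the count must reproduce the reported value.
    assert round(100 * count / N, 2) == reported_percent[name], name
    print(f"{name}: {count}/{N}")
```

Because one student is about 0.63% of 158, each reported percentage maps to exactly one count: 58, 14, 44, and 21 students respectively, and 14 + 44 = 58.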
Funding: Supported by the National Natural Science Foundation of China, No. 32200545; the GDPH Supporting Fund for Talent Program, No. KJ012020633 and No. KJ012019530; and the Science and Technology Research Project of Guangdong Provincial Hospital of Chinese Medicine, No. YN2022GK04.
Abstract: BACKGROUND: The importance of age in the development of ocular conditions has been reported by numerous studies. Diabetes may have different associations with different stages of ocular conditions, and the duration of diabetes may affect the development of diabetic eye disease. While there is a dose-response relationship between the age at diagnosis of diabetes and the risk of cardiovascular disease and mortality, whether the age at diagnosis of diabetes is associated with incident ocular conditions remains to be explored. It is also unclear which types of diabetes are more predictive of ocular conditions. AIM: To examine associations between the age at diabetes diagnosis and the incidence of cataract, glaucoma, age-related macular degeneration (AMD), and visual acuity. METHODS: Our analysis used data from the UK Biobank. The cohort included 8709 diabetic participants and 17418 controls for the ocular condition analysis, and 6689 diabetic participants and 13378 controls for the vision analysis. Ocular diseases were identified using inpatient records until January 2021. Visual acuity was assessed using a chart. RESULTS: During a median follow-up of 11.0 years, 3874, 665, and 616 new cases of cataract, glaucoma, and AMD, respectively, were identified. The association between diabetes and incident ocular conditions was stronger when diabetes was diagnosed at a younger age. Individuals with type 2 diabetes (T2D) diagnosed at <45 years [HR (95% CI): 2.71 (1.49-4.93)], 45-49 years [2.57 (1.17-5.65)], 50-54 years [1.85 (1.13-3.04)], or 55-59 years of age [1.53 (1.00-2.34)] had a higher risk of AMD, independent of glycated haemoglobin. T2D diagnosed at <45 years [HR (95% CI): 2.18 (1.71-2.79)], 45-49 years [1.54 (1.19-2.01)], 50-54 years [1.60 (1.31-1.96)], or 55-59 years of age [1.21 (1.02-1.43)] was associated with an increased risk of cataract. Only T2D diagnosed at <45 years of age was associated with an increased risk of glaucoma [HR (95% CI): 1.76 (1.00-3.12)]. The HRs (95% CIs) for AMD, cataract, and glaucoma associated with type 1 diabetes (T1D) were 4.12 (1.99-8.53), 2.95 (2.17-4.02), and 2.40 (1.09-5.31), respectively. In multivariable-adjusted analysis, individuals with T2D diagnosed at <45 years of age [β (95% CI): 0.025 (0.009, 0.040)] had a larger increase in LogMAR. The β (95% CI) for LogMAR associated with T1D was 0.044 (0.014, 0.073). CONCLUSION: A younger age at diabetes diagnosis is associated with a larger relative risk of incident ocular diseases and greater vision loss.
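Two quantities in these results benefit from a worked interpretation: the LogMAR β coefficients and the hazard-ratio confidence intervals. A minimal sketch, assuming a standard logMAR chart with five letters per line (0.1 logMAR per line, so 0.02 per letter); the abstract only says "a chart", so the letter conversion is illustrative. The helper names are hypothetical.

```python
import math

def logmar_from_decimal(decimal_acuity: float) -> float:
    """LogMAR = log10(minimum angle of resolution) = -log10(decimal acuity)."""
    return -math.log10(decimal_acuity)

def letters_equivalent(delta_logmar: float, logmar_per_letter: float = 0.02) -> float:
    """Express a logMAR change in letters, assuming 5 letters per 0.1-logMAR line."""
    return delta_logmar / logmar_per_letter

def ci_excludes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """A ratio estimate is conventionally called significant at the 5% level
    when its 95% CI excludes the null value of 1."""
    return lower > null_value or upper < null_value

# Decimal acuity 0.5 (Snellen 20/40) corresponds to roughly 0.30 logMAR.
print(round(logmar_from_decimal(0.5), 2))

# Reported betas, converted to approximate letters lost on such a chart.
for label, beta in [("T2D diagnosed <45 y", 0.025), ("T1D", 0.044)]:
    print(label, round(letters_equivalent(beta), 2), "letters")

# Reported HR confidence intervals.
for label, (lo, hi) in {"Glaucoma, T2D <45 y": (1.00, 3.12),
                        "Cataract, T2D 55-59 y": (1.02, 1.43)}.items():
    print(label, "CI excludes 1:", ci_excludes_null(lo, hi))
```

On this assumed chart the reported β values correspond to about 1.25 and 2.2 letters. At the reported two-decimal precision, a lower CI bound of 1.00 (as for glaucoma in T2D diagnosed before 45, and for AMD in the oldest reported T2D group) sits on the null value, so those associations are best read as borderline.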
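Abstract: In current research on remote sensing crop classification, deep learning models sample spectral-temporal and spatial features insufficiently, so crop extraction still suffers from blurred boundaries, omissions, and misclassifications. To address these problems, this study proposes a deep learning model named Vision Transformer-long short-term memory (ViTL). The ViTL model integrates three key modules: dual-branch Vision Transformer feature extraction, spatiotemporal feature fusion, and long short-term memory (LSTM) temporal classification. The dual-branch Vision Transformer feature extraction module captures spatiotemporal feature correlations in the imagery, with one branch extracting spatial classification features and the other extracting temporal change features; the spatiotemporal feature fusion module cross-fuses the multi-temporal feature information; and the LSTM temporal classification module captures dependencies across the time series and outputs the classification. Applying remote sensing theory and methods based on multi-temporal satellite imagery, crop information was extracted for Nehe City, Qiqihar, Heilongjiang Province. The results show that the ViTL model performs well, achieving an overall accuracy (OA), mean intersection over union (MIoU), and F1 score of 0.8676, 0.6987, and 0.8175, respectively. Compared with other widely used deep learning methods, including a three-dimensional convolutional neural network (3-D CNN), a two-dimensional convolutional neural network (2-D CNN), and LSTM, the ViTL model improved the F1 score by 9% to 12%, demonstrating clear superiority. The ViTL model overcomes the problem of insufficient temporal and spatial feature sampling in crop classification from multi-temporal remote sensing imagery and offers a new approach to accurate and efficient crop classification.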
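The three scores reported for ViTL (OA, MIoU, and F1) can all be derived from a per-pixel confusion matrix over crop classes. A minimal sketch, assuming MIoU and F1 are unweighted (macro) averages over classes; the abstract does not state the averaging scheme, `crop_map_scores` is a hypothetical helper, and the example matrix is invented for illustration rather than taken from the Nehe data.

```python
from typing import Dict, List

def crop_map_scores(cm: List[List[int]]) -> Dict[str, float]:
    """cm[i][j] = number of pixels whose true class is i and predicted class is j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    ious, f1s = [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    return {"OA": correct / total, "MIoU": sum(ious) / k, "F1": sum(f1s) / k}

# Invented 3-class example (rows = true class, columns = predicted class).
example = [[50, 5, 0],
           [10, 30, 5],
           [0, 5, 45]]
print(crop_map_scores(example))
```

Because each class's IoU equals F1 / (2 - F1), IoU never exceeds F1, which is consistent with the reported MIoU (0.6987) sitting below the reported F1 score (0.8175).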
Funding: This work received financial support from the Semiconductor Initiative at King Abdullah University of Science and Technology and was supported by King Abdullah University of Science and Technology (KAUST) Research Funding (KRF) under Award No. ORA-2022-5314.
Abstract: The emergence of the Internet of Things is anticipated to create a vast market for smart edge devices, opening numerous opportunities across many domains, including personalized healthcare and advanced robotics. Leveraging 3D integration, edge devices can achieve unprecedented miniaturization while simultaneously boosting processing power and minimizing energy consumption. Here, we demonstrate a back-end-of-line-compatible optoelectronic synapse with a transfer learning method for healthcare applications, including electroencephalogram (EEG)-based seizure prediction, electromyography (EMG)-based gesture recognition, and electrocardiogram (ECG)-based arrhythmia detection. In experiments on three biomedical datasets, we observe classification accuracy improvements over the pretrained model of 2.93% on EEG, 4.90% on ECG, and 7.92% on EMG. The optical programming property of the device enables an ultralow-energy (2.8 × 10⁻¹³ J) fine-tuning process and offers solutions to patient-specific issues in edge computing scenarios. Moreover, the device exhibits strong light sensitivity that enables a range of light-triggered synaptic functions, making it promising for neuromorphic vision applications. To demonstrate the benefits of these synaptic properties, a 5×5 optoelectronic synapse array is developed that effectively simulates human visual perception and memory functions. The proposed flexible optoelectronic synapse holds great potential for advancing neuromorphic physiological signal processing and artificial visual systems in wearable applications.