Funding: supported by the Gansu Natural Science Foundation Programme (No. 24JRRA231), the National Natural Science Foundation of China (No. 62061023), and Gansu Provincial Education, Science and Technology Innovation and Industry (No. 2021CYZC-04).
Abstract: Medical image fusion technology is crucial for improving the detection accuracy and treatment efficiency of diseases, but existing fusion methods suffer from blurred texture details, low contrast, and an inability to fully extract fused image information. To address these issues, a multimodal medical image fusion method based on mask optimization and a parallel attention mechanism is proposed. First, the entire image is converted into a binary mask, and a contour feature map is constructed to maximize the contour feature information of the image, together with a triple-path network for extracting and optimizing texture detail features. Second, a contrast enhancement module and a detail preservation module are proposed to enhance the overall brightness and texture details of the image. Next, a parallel attention mechanism is constructed from channel features and spatial feature changes to fuse the images and enhance the salient information of the fused images. Finally, a decoupling network composed of residual networks is set up to optimize the information between the fused image and the source image, reducing information loss in the fused image. Compared with nine high-level methods proposed in recent years, the seven objective evaluation indicators of the proposed method improve by 6%–31%, indicating that the method obtains fusion results with clearer texture details, higher contrast, and smaller pixel differences between the fused image and the source image. It is superior to the other comparison algorithms in both subjective and objective indicators.
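The abstract does not say how the binary mask or the contour feature map is computed. As a rough illustration of the first step only, the sketch below thresholds a grayscale image into a binary mask and marks as contour every mask pixel that touches the background or the image border. The fixed threshold, the 4-neighbour rule, and the function names `binary_mask` and `contour_map` are assumptions made for this example, not the authors' method.

```python
def binary_mask(image, threshold):
    """Return 1 where a pixel is brighter than `threshold`, else 0.

    `image` is a list of rows of grayscale values. A fixed global
    threshold is an assumption; the paper does not state its rule.
    """
    return [[1 if px > threshold else 0 for px in row] for row in image]


def contour_map(mask):
    """Mark mask pixels that have a background or out-of-image 4-neighbour."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    contour = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0:
                continue  # background pixels are never contour pixels
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or mask[ny][nx] == 0:
                    contour[y][x] = 1
                    break
    return contour


if __name__ == "__main__":
    # A 5x5 toy image with a bright 3x3 block in the centre.
    img = [[200 if 1 <= y <= 3 and 1 <= x <= 3 else 10 for x in range(5)]
           for y in range(5)]
    for row in contour_map(binary_mask(img, 128)):
        print(row)
```

On the toy image the contour is the ring of the bright block, with its single interior pixel left unmarked.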
Funding: supported by the National Natural Science Foundation of China (Nos. 61902158, 61673108), the Science and Technology Program of Nantong (JC2018129, MS12018082), and the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions (PPZY2015B135).
Abstract: Speech intelligibility enhancement in noisy environments remains one of the major challenges for the hearing impaired in everyday life. Recently, machine-learning-based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues in these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on the features. Multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature to enhance speech intelligibility for the hearing impaired. The new feature is constructed by combining four cepstra at different time–frequency (T–F) resolutions in order to capture both local and contextual information. MRPNCC vectors and binary masking labels, calculated from signals passed through a gammatone filterbank, are used to train a support vector machine (SVM) classifier, which aims to identify the binary masking values of the T–F units in the enhancement stage. The enhanced speech is synthesized from the estimated masking values and the Wiener-filtered T–F units. Objective experimental results demonstrate that the proposed feature is superior to the other features compared in terms of HIT-FA, STOI, HASPI and PESQ, and that the proposed algorithm not only improves speech intelligibility but also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for the hearing impaired.
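To make the binary masking labels and the HIT-FA score concrete, here is a minimal sketch assuming the usual definitions: a T–F unit is labelled 1 when its local SNR exceeds a local criterion, and HIT-FA is the hit rate on speech-dominant units minus the false-alarm rate on noise-dominant units. The 0 dB default criterion, the per-unit energy inputs, and the function names are assumptions for illustration; the abstract does not give the paper's exact settings.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each T-F unit 1 if its local SNR in dB exceeds `lc_db`, else 0.

    Inputs are 2-D lists (channels x frames) of non-negative energies.
    The 0 dB local criterion is an assumed default.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0:
                row.append(0)  # no speech energy: noise-dominant
            elif n <= 0:
                row.append(1)  # speech with no noise: speech-dominant
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def hit_minus_fa(ideal, estimated):
    """Return HIT - FA for an estimated mask against the ideal mask."""
    hits = misses = false_alarms = rejections = 0
    for i_row, e_row in zip(ideal, estimated):
        for i, e in zip(i_row, e_row):
            if i == 1 and e == 1:
                hits += 1
            elif i == 1:
                misses += 1
            elif e == 1:
                false_alarms += 1
            else:
                rejections += 1
    hit_rate = hits / (hits + misses) if hits + misses else 0.0
    fa_rate = (false_alarms / (false_alarms + rejections)
               if false_alarms + rejections else 0.0)
    return hit_rate - fa_rate


if __name__ == "__main__":
    ibm = ideal_binary_mask([[4.0, 1.0], [0.5, 9.0]],
                            [[1.0, 2.0], [1.0, 1.0]])
    print(ibm, hit_minus_fa(ibm, [[1, 1], [0, 1]]))
```

In the toy case the classifier keeps both speech-dominant units but also passes one of the two noise-dominant units, so HIT is 1.0, FA is 0.5, and HIT-FA is 0.5.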
Funding: supported by the Natural Science Foundation of Shandong Province in China (Grant No. ZR2020MF076), the Focus on Research and Development Plan in Shandong Province (Grant No. 2019GNC106115), the National Natural Science Foundation of China (Grant No. 62072289), the Shandong Province Higher Educational Science and Technology Program (Grant No. J18KA308), and the Taishan Scholar Program of Shandong Province of China.
Abstract: In complex orchard environments, efficient and accurate detection of the target fruit is the basic requirement for orchard yield measurement and automatic harvesting. The target fruit is sometimes hard to distinguish from the background because of their similar color, and detection is further complicated by the ambient light and the camera angle at which the photos were taken. These problems make it hard to detect green fruits in orchard environments. In this study, a two-stage dense to detection framework (D2D) is proposed to detect green fruits in orchard environments. The proposed model performs multi-scale feature extraction of the target fruit with a MobileNetV2+FPN (feature pyramid network) structure and generates region proposals for the target fruit with a Region Proposal Network (RPN) structure. In the regression branch, the offset of each local feature is calculated, and the positive and negative samples of the region proposals are predicted by binary mask prediction to reduce the interference of the background with the prediction box. In the classification branch, features are extracted from each sub-region of the region proposal, and features carrying distinguishing information are obtained through adaptive weighted pooling to achieve accurate classification. The proposed model adopts an anchor-free frame design, which improves generalization ability, makes the model more robust, and reduces storage requirements. Experimental results on persimmon and green apple datasets show that the new model has the best detection performance, which can provide a theoretical reference for detecting other green objects.
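The abstract names adaptive weighted pooling in the classification branch but does not define it. One common way to realize the idea is to turn a score for each sub-region into softmax weights and take the weighted sum of the sub-region features, so that informative sub-regions contribute more than background ones. The sketch below does exactly that. In a trained detector the scores would come from a learned layer; here they are passed in by hand, and the function name `adaptive_weighted_pool` is hypothetical.

```python
import math


def adaptive_weighted_pool(features, scores):
    """Pool sub-region features with softmax weights from per-region scores.

    `features` is a list of equal-length feature vectors, one per
    sub-region; `scores` holds one scalar per sub-region (assumed to be
    produced by a learned layer in a real model).
    Returns (pooled_vector, weights).
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one score per sub-region feature")
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    print(adaptive_weighted_pool(feats, [0.0, 0.0]))  # equal scores
    print(adaptive_weighted_pool(feats, [0.0, 5.0]))  # second region favoured
```

With equal scores the result reduces to average pooling; as one score grows, the pooled vector approaches that sub-region's feature.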