Medical visual question answering (Medical VQA) supports clinical diagnosis and decision-making by answering natural-language questions about medical images. However, existing methods fall short in multi-step reasoning, fine-grained understanding, and interpretability. This paper proposes a model that uses a sub-question generation mechanism to decompose complex medical queries into simpler questions, and then reasons step by step with multimodal alignment and dynamic knowledge injection modules. The model focuses on the key regions of a medical image and dynamically integrates query-relevant semantics, improving the accuracy and reliability of answer generation. Experiments on the SLAKE and VQA-MED datasets show that the proposed method outperforms existing methods in answer accuracy, reasoning ability, and interpretability. The work offers an efficient approach to multimodal information integration and complex reasoning in Medical VQA and suggests new directions for clinical diagnosis and intelligent medical research.
Objective: To propose MFFQG, a question generation model for traditional Chinese medicine (TCM) based on multi-feature fusion, addressing two weaknesses of existing automatic generation techniques in specialized domains: missing domain keyword information and non-standard phrasing of the generated questions. Methods: RoBERTa vectors and Wubi vectors capture the semantic and glyph features of the input sequence. These are fused with syntactic information and with the primary and secondary TCM keyword information constructed for this work, and the resulting multi-feature vectors are fed into the UniLM generation model to generate TCM questions automatically. Results: With these features fused, MFFQG reaches 64.93%, 34.57%, and 63.05% on ROUGE-1, ROUGE-2, and ROUGE-L, respectively. Limitations: The data come mainly from the TCM domain, so performance in other domains remains to be verified. Conclusion: Compared with the baseline models, MFFQG significantly improves the quality of generated TCM questions.
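For readers unfamiliar with the ROUGE metrics reported above, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L can be computed for one generated question against one reference. It is not the authors' evaluation code. The abstract does not say whether the reported scores are recall or F-measure, nor how the text was tokenized, so the choice of F1 here and the function names (`rouge_n`, `rouge_l`, `lcs_length`) are assumptions made for illustration. Inputs are token lists; for Chinese text, a list of characters is one common choice.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N as an F1 score over clipped n-gram overlap (assumed variant)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as an F1 score over the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))
```

Calling `rouge_n(list(generated), list(reference), 1)` on two strings scores them at the character level. Published results are usually averaged over a whole test set and may use a different tokenizer or the recall variant, so values from this sketch are not directly comparable with the figures in the abstract.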