Text information is principally dependent on the natural languages.Therefore,improving security and reliability of text information exchanged via internet network has become the most difficult challenge that researche...Text information is principally dependent on the natural languages.Therefore,improving security and reliability of text information exchanged via internet network has become the most difficult challenge that researchers encounter.Content authentication and tampering detection of digital contents have become a major concern in the area of communication and information exchange via the Internet.In this paper,an intelligent text Zero-Watermarking approach SETZWMWMM(Smart English Text Zero-Watermarking Approach Based on Mid-Level Order and Word Mechanism of Markov Model)has been proposed for the content authentication and tampering detection of English text contents.The SETZWMWMM approach embeds and detects the watermark logically without altering the original English text document.Based on Hidden Markov Model(HMM),Third level order of word mechanism is used to analyze the interrelationship between contexts of given English texts.The extracted features are used as a watermark information and integrated with digital zero-watermarking techniques.To detect eventual tampering,SETZWMWMM has been implemented and validated with attacked English text.Experiments were performed on four datasets of varying lengths under multiple random locations of insertion,reorder and deletion attacks.The experimental results show that our method is more sensitive and efficient for all kinds of tampering attacks with high level accuracy of tampering detection than compared methods.展开更多
Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive te...Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive text data.Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future(Thirunavukarasu et al.,2023).This article aims to provide an in-depth analysis of LLMs’current and potential impact on clinical practices.Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education(Hirosawa et al.,2023;Koga et al.,2023).展开更多
The text watermarking is a feasible method to protect the copyright from being copied and tampered. In this paper, a text zero-watermarking algorithm is proposed based on the connection between the Chinese characters ...The text watermarking is a feasible method to protect the copyright from being copied and tampered. In this paper, a text zero-watermarking algorithm is proposed based on the connection between the Chinese characters and the Chinese phonetic alphabets. According to the predefined interval threshold, the proposed algorithm extracts the characteristics of the text content by valuing on the basis of the custom of Chinese phonetic alphabets. After being chaotic transformed, the algorithm combines the text characteristics with the embedded watermarking information in the Chinese text. The experimental results show that the watermarking's capability of preventing tampering is up to 0.1%, which demonstrates the strong robustness and resistance to aggressive behavior of the algorithm.展开更多
传统编目分类和规则匹配方法存在工作效能低、过度依赖专家知识、缺乏对古籍文本自身语义的深层次挖掘、编目主题边界模糊、较难实现对古籍文本领域主题的精准推荐等问题。为此,本文结合古籍语料特征探究如何实现精准推荐符合研究者需...传统编目分类和规则匹配方法存在工作效能低、过度依赖专家知识、缺乏对古籍文本自身语义的深层次挖掘、编目主题边界模糊、较难实现对古籍文本领域主题的精准推荐等问题。为此,本文结合古籍语料特征探究如何实现精准推荐符合研究者需求的文本主题内容的方法,以推动数字人文研究的进一步发展。首先,选取本课题组前期标注的古籍语料数据进行主题类别标注和视图分类;其次,构建融合BERT(bidirectional encoder representation from transformers)预训练模型、改进卷积神经网络、循环神经网络和多头注意力机制的语义挖掘模型;最后,融入“主体-关系-客体”多视图的语义增强模型,构建DJ-TextRCNN(DianJi-recurrent convolutional neural networks for text classification)模型实现对典籍文本更细粒度、更深层次、更多维度的语义挖掘。研究结果发现,DJ-TextRCNN模型在不同视图下的古籍主题推荐任务的准确率均为最优。在“主体-关系-客体”视图下,精确率达到88.54%,初步实现了对古籍文本的精准主题推荐,对中华文化深层次、细粒度的语义挖掘具有一定的指导意义。展开更多
以编目分类和规则匹配为主的古籍文本主题分类方法存在工作效能低、专家知识依赖性强、分类依据单一化、古籍文本主题自动分类难等问题。对此,本文结合古籍文本内容和文字特征,尝试从古籍内容分类得到符合研究者需求的主题,推动数字人...以编目分类和规则匹配为主的古籍文本主题分类方法存在工作效能低、专家知识依赖性强、分类依据单一化、古籍文本主题自动分类难等问题。对此,本文结合古籍文本内容和文字特征,尝试从古籍内容分类得到符合研究者需求的主题,推动数字人文研究范式的转型。首先,参照东汉古籍《说文解字》对文字的分析方式,以前期标注的古籍语料数据集为基础,构建全新的“字音(说)-原文(文)-结构(解)-字形(字)”四维特征数据集。其次,设计四维特征向量提取模型(speaking,word,pattern,and font to vector,SWPF2vec),并结合预训练模型实现对古籍文本细粒度的特征表示。再其次,构建融合卷积神经网络、循环神经网络和多头注意力机制的古籍文本主题分类模型(dianji-recurrent convolutional neural networks for text classification,DJ-TextRCNN)。最后,融入四维语义特征,实现对古籍文本多维度、深层次、细粒度的语义挖掘。在古籍文本主题分类任务上,DJ-TextRCNN模型在不同维度特征下的主题分类准确率均为最优,在“说文解字”四维特征下达到76.23%的准确率,初步实现了对古籍文本的精准主题分类。展开更多
Purpose:A text generation based multidisciplinary problem identification method is proposed,which does not rely on a large amount of data annotation.Design/methodology/approach:The proposed method first identifies the...Purpose:A text generation based multidisciplinary problem identification method is proposed,which does not rely on a large amount of data annotation.Design/methodology/approach:The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique;second,it generates abstractive titles for each paper based on abstract and research objective types using a generative pre-trained language model;third,it extracts problem phrases from generated titles according to regular expression rules;fourth,it creates problem relation networks and identifies the same problems by exploiting a weighted community detection algorithm;finally,it identifies multidisciplinary problems based on the disciplinary labels of papers.Findings:Experiments in the“Carbon Peaking and Carbon Neutrality”field show that the proposed method can effectively identify multidisciplinary research problems.The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field.Research limitations:It is necessary to use the proposed method in other multidisciplinary fields to validate its effectiveness.Practical implications:Multidisciplinary problem identification helps to gather multidisciplinary forces to solve complex real-world problems for the governments,fund valuable multidisciplinary problems for research management authorities,and borrow ideas from other disciplines for researchers.Originality/value:This approach proposes a novel multidisciplinary problem identification method based on text generation,which identifies multidisciplinary problems based on generative abstractive titles of papers without data annotation required by standard sequence labeling techniques.展开更多
In the intricate network environment,the secure transmission of medical images faces challenges such as information leakage and malicious tampering,significantly impacting the accuracy of disease diagnoses by medical ...In the intricate network environment,the secure transmission of medical images faces challenges such as information leakage and malicious tampering,significantly impacting the accuracy of disease diagnoses by medical professionals.To address this problem,the authors propose a robust feature watermarking algorithm for encrypted medical images based on multi-stage discrete wavelet transform(DWT),Daisy descriptor,and discrete cosine transform(DCT).The algorithm initially encrypts the original medical image through DWT-DCT and Logistic mapping.Subsequently,a 3-stage DWT transformation is applied to the encrypted medical image,with the centre point of the LL3 sub-band within its low-frequency component serving as the sampling point.The Daisy descriptor matrix for this point is then computed.Finally,a DCT transformation is performed on the Daisy descriptor matrix,and the low-frequency portion is processed using the perceptual hashing algorithm to generate a 32-bit binary feature vector for the medical image.This scheme utilises cryptographic knowledge and zero-watermarking technique to embed watermarks without modifying medical images and can extract the watermark from test images without the original image,which meets the basic re-quirements of medical image watermarking.The embedding and extraction of water-marks are accomplished in a mere 0.160 and 0.411s,respectively,with minimal computational overhead.Simulation results demonstrate the robustness of the algorithm against both conventional attacks and geometric attacks,with a notable performance in resisting rotation attacks.展开更多
Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted vid...Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted videos can assist drivers in making decisions.However,Car-mounted video text images pose challenges such as complex backgrounds,small fonts,and the need for real-time detection.We proposed a robust Car-mounted Video Text Detector(CVTD).It is a lightweight text detection model based on ResNet18 for feature extraction,capable of detecting text in arbitrary shapes.Our model efficiently extracted global text positions through the Coordinate Attention Threshold Activation(CATA)and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules(FPEFM),strengthening feature representation,and integrating text local features and global position information,reinforcing the representation capability of the CVTD model.The enhanced feature maps,when acted upon by Text Activation Maps(TAM),effectively distinguished text foreground from non-text regions.Additionally,we collected and annotated a dataset containing 2200 images of Car-mounted Video Text(CVT)under various road conditions for training and evaluating our model’s performance.We further tested our model on four other challenging public natural scene text detection benchmark datasets,demonstrating its strong generalization ability and real-time detection speed.This model holds potential for practical applications in real-world scenarios.展开更多
基金The author extends his appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number(R.G.P.2/55/40/2019),Received by Fahd N.Al-Wesabi.www.kku.edu.sa。
文摘Text information is principally dependent on the natural languages.Therefore,improving security and reliability of text information exchanged via internet network has become the most difficult challenge that researchers encounter.Content authentication and tampering detection of digital contents have become a major concern in the area of communication and information exchange via the Internet.In this paper,an intelligent text Zero-Watermarking approach SETZWMWMM(Smart English Text Zero-Watermarking Approach Based on Mid-Level Order and Word Mechanism of Markov Model)has been proposed for the content authentication and tampering detection of English text contents.The SETZWMWMM approach embeds and detects the watermark logically without altering the original English text document.Based on Hidden Markov Model(HMM),Third level order of word mechanism is used to analyze the interrelationship between contexts of given English texts.The extracted features are used as a watermark information and integrated with digital zero-watermarking techniques.To detect eventual tampering,SETZWMWMM has been implemented and validated with attacked English text.Experiments were performed on four datasets of varying lengths under multiple random locations of insertion,reorder and deletion attacks.The experimental results show that our method is more sensitive and efficient for all kinds of tampering attacks with high level accuracy of tampering detection than compared methods.
文摘Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive text data.Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future(Thirunavukarasu et al.,2023).This article aims to provide an in-depth analysis of LLMs’current and potential impact on clinical practices.Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education(Hirosawa et al.,2023;Koga et al.,2023).
基金Supported by the National Natural Science Foundation of China(91112003)Youth Foundation(31541311307)
文摘The text watermarking is a feasible method to protect the copyright from being copied and tampered. In this paper, a text zero-watermarking algorithm is proposed based on the connection between the Chinese characters and the Chinese phonetic alphabets. According to the predefined interval threshold, the proposed algorithm extracts the characteristics of the text content by valuing on the basis of the custom of Chinese phonetic alphabets. After being chaotic transformed, the algorithm combines the text characteristics with the embedded watermarking information in the Chinese text. The experimental results show that the watermarking's capability of preventing tampering is up to 0.1%, which demonstrates the strong robustness and resistance to aggressive behavior of the algorithm.
文摘传统编目分类和规则匹配方法存在工作效能低、过度依赖专家知识、缺乏对古籍文本自身语义的深层次挖掘、编目主题边界模糊、较难实现对古籍文本领域主题的精准推荐等问题。为此,本文结合古籍语料特征探究如何实现精准推荐符合研究者需求的文本主题内容的方法,以推动数字人文研究的进一步发展。首先,选取本课题组前期标注的古籍语料数据进行主题类别标注和视图分类;其次,构建融合BERT(bidirectional encoder representation from transformers)预训练模型、改进卷积神经网络、循环神经网络和多头注意力机制的语义挖掘模型;最后,融入“主体-关系-客体”多视图的语义增强模型,构建DJ-TextRCNN(DianJi-recurrent convolutional neural networks for text classification)模型实现对典籍文本更细粒度、更深层次、更多维度的语义挖掘。研究结果发现,DJ-TextRCNN模型在不同视图下的古籍主题推荐任务的准确率均为最优。在“主体-关系-客体”视图下,精确率达到88.54%,初步实现了对古籍文本的精准主题推荐,对中华文化深层次、细粒度的语义挖掘具有一定的指导意义。
文摘以编目分类和规则匹配为主的古籍文本主题分类方法存在工作效能低、专家知识依赖性强、分类依据单一化、古籍文本主题自动分类难等问题。对此,本文结合古籍文本内容和文字特征,尝试从古籍内容分类得到符合研究者需求的主题,推动数字人文研究范式的转型。首先,参照东汉古籍《说文解字》对文字的分析方式,以前期标注的古籍语料数据集为基础,构建全新的“字音(说)-原文(文)-结构(解)-字形(字)”四维特征数据集。其次,设计四维特征向量提取模型(speaking,word,pattern,and font to vector,SWPF2vec),并结合预训练模型实现对古籍文本细粒度的特征表示。再其次,构建融合卷积神经网络、循环神经网络和多头注意力机制的古籍文本主题分类模型(dianji-recurrent convolutional neural networks for text classification,DJ-TextRCNN)。最后,融入四维语义特征,实现对古籍文本多维度、深层次、细粒度的语义挖掘。在古籍文本主题分类任务上,DJ-TextRCNN模型在不同维度特征下的主题分类准确率均为最优,在“说文解字”四维特征下达到76.23%的准确率,初步实现了对古籍文本的精准主题分类。
基金supported by the General Projects of ISTIC Innovation Foundation“Problem innovation solution mining based on text generation model”(MS2024-03).
文摘Purpose:A text generation based multidisciplinary problem identification method is proposed,which does not rely on a large amount of data annotation.Design/methodology/approach:The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique;second,it generates abstractive titles for each paper based on abstract and research objective types using a generative pre-trained language model;third,it extracts problem phrases from generated titles according to regular expression rules;fourth,it creates problem relation networks and identifies the same problems by exploiting a weighted community detection algorithm;finally,it identifies multidisciplinary problems based on the disciplinary labels of papers.Findings:Experiments in the“Carbon Peaking and Carbon Neutrality”field show that the proposed method can effectively identify multidisciplinary research problems.The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field.Research limitations:It is necessary to use the proposed method in other multidisciplinary fields to validate its effectiveness.Practical implications:Multidisciplinary problem identification helps to gather multidisciplinary forces to solve complex real-world problems for the governments,fund valuable multidisciplinary problems for research management authorities,and borrow ideas from other disciplines for researchers.Originality/value:This approach proposes a novel multidisciplinary problem identification method based on text generation,which identifies multidisciplinary problems based on generative abstractive titles of papers without data annotation required by standard sequence labeling techniques.
基金National Natural Science Foundation of China,Grant/Award Numbers:62063004,62350410483Key Research and Development Project of Hainan Province,Grant/Award Number:ZDYF2021SHFZ093Zhejiang Provincial Postdoctoral Science Foundation,Grant/Award Number:ZJ2021028。
文摘In the intricate network environment,the secure transmission of medical images faces challenges such as information leakage and malicious tampering,significantly impacting the accuracy of disease diagnoses by medical professionals.To address this problem,the authors propose a robust feature watermarking algorithm for encrypted medical images based on multi-stage discrete wavelet transform(DWT),Daisy descriptor,and discrete cosine transform(DCT).The algorithm initially encrypts the original medical image through DWT-DCT and Logistic mapping.Subsequently,a 3-stage DWT transformation is applied to the encrypted medical image,with the centre point of the LL3 sub-band within its low-frequency component serving as the sampling point.The Daisy descriptor matrix for this point is then computed.Finally,a DCT transformation is performed on the Daisy descriptor matrix,and the low-frequency portion is processed using the perceptual hashing algorithm to generate a 32-bit binary feature vector for the medical image.This scheme utilises cryptographic knowledge and zero-watermarking technique to embed watermarks without modifying medical images and can extract the watermark from test images without the original image,which meets the basic re-quirements of medical image watermarking.The embedding and extraction of water-marks are accomplished in a mere 0.160 and 0.411s,respectively,with minimal computational overhead.Simulation results demonstrate the robustness of the algorithm against both conventional attacks and geometric attacks,with a notable performance in resisting rotation attacks.
基金This work is supported in part by the National Natural Science Foundation of China(Grant Number 61971078)which provided domain expertise and computational power that greatly assisted the activity+1 种基金This work was financially supported by Chongqing Municipal Education Commission Grants forMajor Science and Technology Project(KJZD-M202301901)the Science and Technology Research Project of Jiangxi Department of Education(GJJ2201049).
文摘Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted videos can assist drivers in making decisions.However,Car-mounted video text images pose challenges such as complex backgrounds,small fonts,and the need for real-time detection.We proposed a robust Car-mounted Video Text Detector(CVTD).It is a lightweight text detection model based on ResNet18 for feature extraction,capable of detecting text in arbitrary shapes.Our model efficiently extracted global text positions through the Coordinate Attention Threshold Activation(CATA)and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules(FPEFM),strengthening feature representation,and integrating text local features and global position information,reinforcing the representation capability of the CVTD model.The enhanced feature maps,when acted upon by Text Activation Maps(TAM),effectively distinguished text foreground from non-text regions.Additionally,we collected and annotated a dataset containing 2200 images of Car-mounted Video Text(CVT)under various road conditions for training and evaluating our model’s performance.We further tested our model on four other challenging public natural scene text detection benchmark datasets,demonstrating its strong generalization ability and real-time detection speed.This model holds potential for practical applications in real-world scenarios.