Finding out out-of-vocabulary words is an urgent and difficult task in Chinese words segmentation. To avoid the defect causing by offline training in the traditional method, the paper proposes an improved prediction b...Finding out out-of-vocabulary words is an urgent and difficult task in Chinese words segmentation. To avoid the defect causing by offline training in the traditional method, the paper proposes an improved prediction by partical match (PPM) segmenting algorithm for Chinese words based on extracting local context information, which adds the context information of the testing text into the local PPM statistical model so as to guide the detection of new words. The algorithm focuses on the process of online segmentatien and new word detection which achieves a good effect in the close or opening test, and outperforms some well-known Chinese segmentation system to a certain extent.展开更多
To identify Song Ci style automatically,we put forward a novel stylistic text categorization approach based on words and their semantic in this paper. And a modified special word segmentation method,a new semantic rel...To identify Song Ci style automatically,we put forward a novel stylistic text categorization approach based on words and their semantic in this paper. And a modified special word segmentation method,a new semantic relativity computing method based on HowNet along with the corresponding word sense disambiguation method are proposed to extract words and semantic features from Song Ci. Experiments are carried out and the results show that these methods are effective.展开更多
Word sense disambiguation(WSD),identifying the specific sense of the target word given its context,is a fundamental task in natural language processing.Recently,researchers have shown promising results using long shor...Word sense disambiguation(WSD),identifying the specific sense of the target word given its context,is a fundamental task in natural language processing.Recently,researchers have shown promising results using long short term memory(LSTM),which is able to better capture sequential and syntactic features of text.However,this method neglects the dependencies among instances,such as their context semantic similarities.To solve this problem,we proposed a novel WSD model by introducing a cache-like memory module to capture the semantic dependencies among instances for WSD.Extensive evaluations on standard datasets demonstrate the superiority of the proposed model over various baselines.展开更多
基金National Natural Science Foundation of China ( No.60903129)National High Technology Research and Development Program of China (No.2006AA010107, No.2006AA010108)Foundation of Fujian Province of China (No.2008F3105)
文摘Finding out out-of-vocabulary words is an urgent and difficult task in Chinese words segmentation. To avoid the defect causing by offline training in the traditional method, the paper proposes an improved prediction by partical match (PPM) segmenting algorithm for Chinese words based on extracting local context information, which adds the context information of the testing text into the local PPM statistical model so as to guide the detection of new words. The algorithm focuses on the process of online segmentatien and new word detection which achieves a good effect in the close or opening test, and outperforms some well-known Chinese segmentation system to a certain extent.
基金National Natural Science Foundation of China ( No.60903129)National High Technology Research and Development Programs of China ( No.2006AA010107, No.2006AA010108)Foundations of Fujian Province of China ( No.2008F3105, No.2009J05156)
文摘To identify Song Ci style automatically,we put forward a novel stylistic text categorization approach based on words and their semantic in this paper. And a modified special word segmentation method,a new semantic relativity computing method based on HowNet along with the corresponding word sense disambiguation method are proposed to extract words and semantic features from Song Ci. Experiments are carried out and the results show that these methods are effective.
文摘Word sense disambiguation(WSD),identifying the specific sense of the target word given its context,is a fundamental task in natural language processing.Recently,researchers have shown promising results using long short term memory(LSTM),which is able to better capture sequential and syntactic features of text.However,this method neglects the dependencies among instances,such as their context semantic similarities.To solve this problem,we proposed a novel WSD model by introducing a cache-like memory module to capture the semantic dependencies among instances for WSD.Extensive evaluations on standard datasets demonstrate the superiority of the proposed model over various baselines.