
Time and Space Efficiencies Analysis of Full-Text Index Techniques (citations: 17)
Abstract: As one of the most effective ways to improve the time and space efficiency of full-text retrieval, full-text index techniques have been studied extensively and in depth in recent years. According to how they are implemented, they can be classified into three categories: index techniques, hybrid techniques that combine indexing with compression, and self-index techniques. From this classification perspective, this paper reviews representative methods for improving the time and space efficiency of full-text indexing: inverted files, signature files, suffix trees and suffix arrays, compression techniques built on these three kinds of indices, self-indices based on inverted files, and self-indices based on suffix arrays. For each technique it introduces the basic principles, the open problems, and recent progress, then analyzes and compares their time and space performance in detail, identifying the environments each technique suits along with its strengths and weaknesses. Finally, it summarizes the characteristics of these techniques, points out the remaining problems, and outlines future research directions.
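To make two of the index families named in the abstract concrete, the following is a minimal Python sketch, not the method surveyed or proposed in the paper. It shows (a) a word-level inverted file whose posting lists are stored as d-gaps in variable-byte code, one common way of combining an index with compression, and (b) a character-level suffix array answering substring queries by binary search. All function names are hypothetical and chosen for illustration. The suffix-array construction is deliberately naive (it sorts suffix slices) and is meant for clarity only; signature files, suffix trees, and self-indices are not shown.

```python
def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, high bit set on each number's last byte."""
    out = bytearray()
    for n in numbers:
        groups = []
        while True:
            groups.append(n & 0x7F)  # collect low 7 bits first
            n >>= 7
            if n == 0:
                break
        groups.reverse()             # emit most-significant group first
        groups[-1] |= 0x80           # flag the terminating byte
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        if b & 0x80:                 # last byte of the current number
            numbers.append((n << 7) | (b & 0x7F))
            n = 0
        else:
            n = (n << 7) | b
    return numbers


def build_inverted_file(docs):
    """Hypothetical helper: map each term to its d-gap, vbyte-compressed posting list."""
    postings = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):  # each doc listed once per term
            postings.setdefault(term, []).append(doc_id)
    index = {}
    for term, ids in postings.items():          # ids are already ascending
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[term] = vbyte_encode(gaps)
    return index


def lookup(index, term):
    """Decompress a posting list back into absolute document ids."""
    ids, total = [], 0
    for gap in vbyte_decode(index.get(term.lower(), b"")):
        total += gap
        ids.append(total)
    return ids


def and_query(index, *terms):
    """Conjunctive (AND) query: intersect the posting lists of all terms."""
    result = None
    for t in terms:
        ids = set(lookup(index, t))
        result = ids if result is None else result & ids
    return sorted(result or [])


def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Two binary searches over the suffix array, comparing only len(pattern) chars."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                   # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                   # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])      # text positions where pattern occurs


if __name__ == "__main__":
    docs = ["suffix arrays support substring search",
            "inverted files support word search",
            "signature files trade accuracy for space"]
    index = build_inverted_file(docs)
    print(and_query(index, "support", "search"))  # [0, 1]
    sa = build_suffix_array("banana")
    print(sa)                                     # [5, 3, 1, 0, 4, 2]
    print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

The sketch reflects the trade-off the abstract describes: the inverted file only answers whole-word queries but compresses well because d-gaps are small integers, while the suffix array answers arbitrary substring queries in O(m log n) character comparisons at the cost of storing one integer per text position.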
Source: Journal of Software (《软件学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2009, No. 7, pp. 1768-1784 (17 pages).
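Funding: National Natural Science Foundation of China (Nos. 60573095, 90718027); National High-Tech Research and Development Program of China (863 Program) (No. 2006AA12Z210); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (No. 20050486024).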
Keywords: inverted file; signature file; suffix tree; suffix array; self-index; compression; time and space efficiency