期刊文献+

基于模拟后缀数组索引结构的实现

A Compressed Format Index Based on Suffix Arrays and Its Implement
原文传递
导出
摘要 实现了一种基于模拟后缀数组的索引的结构,并在实现索引功能的同时对索引结构进行有效压缩。首先,对传统的哈夫曼编码压缩小波树时出现的空白编码进行了处理,应用正则哈夫曼编码有效的去掉了空白编码;其次,通过相关函数操作在已压缩的小波树上模拟实现了后缀数组功能。理论分析和实验结果表明,这种结构具有很小的空间占用,并不影响索引结构的运行效率。 In this paper, we use the function rank and the function select in wavelet tree to implement the faction of the suffix arrays .We also introduce the Canonical Huffman code to encode the Burrows- Wheeler transform (BWT) of a text T. First of all, we use the canonical Huffman code to encode wavelet tree in order to reduce the space of the wavelet tree with Huffman code; we also implement some functions of suffix arrays. Based on this data structure, we implement the suffix automaton in a space economical way.
出处 《情报科学》 CSSCI 北大核心 2009年第12期1834-1836,1862,共4页 Information Science
基金 吉林省教育厅科技规划项目(2007248 2008257)
关键词 全文索引 后缀数组 BW变换 哈夫曼编码 full text index suffix arrays BWT transform huffman code.
  • 相关文献

参考文献11

  • 1U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches [J]. SIAM Journal on Computing, 1993, (22):935-948.
  • 2Paolo Ferragina , Giovanni Manzini, Veli Makinen, Conzalo Navarro. An Alphabet-Friendly FM-Index[C]. SPIRE,2004: 150-160.
  • 3刘学文,陶晓鹏,于玉,胡运发.一种全新的全文索引模型——后继数组模型[J].软件学报,2002,13(1):150-158. 被引量:11
  • 4申展,江宝林,张谧,唐磊,胡运发.互关联后继树模型及其实现[J].计算机应用与软件,2005,22(3):7-9. 被引量:10
  • 5Chen M S, Park J S, Yu P S. Efficient Data Mining for Path Travsersal Patems[J]. IEEE Trans. Knowledge Data Engineer, 1998,10 (2) : 209-211.
  • 6Pei J, Han J, Mortazavi B, et al. Mining Access Patterns Efficiently from Web Logs[C]. In: Proceedings 2000 Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan(PAKDD00), 2000:4.
  • 7R. Grossi and J. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching [C]. In Proceedings of the 32nd ACM Symposium on Theory of Computing, 2000.
  • 8喻钧,王长元,Sven Schuierer,喻萌.基于后缀树思想构造Web生物数据搜索的数据模型[J].西安工程科技学院学报,2006,20(2):206-209. 被引量:1
  • 9G.Gonnet, R. Baeza-Yates, T. Snider, New indices for text: PAT trees and PAT arrays [C]. in: W. Frakes, R.A. Baeza- Yates (Eds.),Information Retrieval: Algorithms and Data Structures,Prentice-Hall, Englewood Cliffs, NJ, 1992:66- 82.
  • 10G. Jacobson. Succinct static data structures [T]. Technical Report CMU-CS-89-112, Dept. of Computer Science, Carnegie-Mellon University, Jan. 1989.

二级参考文献11

  • 1U.Manber and E.Myers.Suffix arrays:A new method for on-line string searches.Proc.of the FISTREE Ann.ACM-SIAM Symp.on Discrete Algorithms,1990:319~327.
  • 2S.Muthukrishnan.Efficient Algorithms for Document Retrieval Problems.In Proc.ACM-SIAM SODA,657~666,2002.
  • 3J.Zobel,A.Moffat,K.Ramamohanarao,Inverted files versus signature files for text indexing,Transactions on Database Systems 23(4):453~490,1998.
  • 4Tao Xiaopeng,Hu Yunfa,Zhou Shuigeng.Subsequent Array:A New Full Text Index,Proceeding World Multiconference on Systemics,Cybernetics and Informatics,Florida,USA,2001:551~556.
  • 5R.Baeza-Yates and B.Ribeiro-Neto,Modern Information Retrieval ,Addison-Wesley,1999.
  • 6UDI Manber,GENE Myers,SUFFIX Arrays.A new method for on-line string searches[J].SIAM Journal on Computing,1993,22(5):935-948.
  • 7DAN Gusfield.Algorithms on Strings,Trees and Sequences:Computer Science and Computational Biology[M].Cambridge:Cambridge University Press,1998.
  • 8CYNTBIA Gibas,PER Jambeck.Developing Bioinformatics Computer Skills[M].USA:O'Reilly Media Inc,2002.
  • 9SUNG Wing-kin.Searching biological database[EB/OL].(2005-08)[2005-12-20].http://www.comp.nus.edu.sg/~ksung/cs5238/note/Lect3-database_2005.pdf.
  • 10Gaston, Gonnet, Ricardo, Baeza-Yates, Snider, T. New indices for text: Pat trees and Pat arrays. In: Frakes, W.B., Ricardo Baeza-Yates, eds. Information Retrieval Data Structure and Algorithms. Englewood Cliffs, NJ: Prentice Hall, 1992. 66~81.

共引文献19

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部