Implementation of an Index Structure Based on Simulated Suffix Arrays

A Compressed Format Index Based on Suffix Arrays and Its Implementation

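Abstract: This paper implements an index structure based on a simulated suffix array that provides index functionality while compressing the index effectively. First, the blank codes that arise when a wavelet tree is compressed with conventional Huffman coding are handled: canonical Huffman coding is applied to eliminate them. Second, suffix array functionality is simulated on the compressed wavelet tree through the relevant function operations. Theoretical analysis and experimental results show that the structure occupies very little space without affecting the running efficiency of the index.

In this paper, we use the rank and select functions of the wavelet tree to implement the functionality of suffix arrays. We also introduce the canonical Huffman code to encode the Burrows-Wheeler transform (BWT) of a text T. First, we use the canonical Huffman code to encode the wavelet tree in order to reduce the space occupied by the Huffman-coded wavelet tree; we also implement some functions of suffix arrays. Based on this data structure, we implement the suffix automaton in a space-economical way.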
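The abstract's central idea, answering suffix array queries using only rank operations over the BWT, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`bwt_and_sa`, `build_c`, `rank`, `backward_search`) are hypothetical, the suffix array is built naively, and `rank` here is a plain linear scan standing in for the rank query that a canonical-Huffman-shaped wavelet tree would answer without scanning.

```python
def bwt_and_sa(text, sentinel="\0"):
    """Naively build the suffix array and BWT of text + sentinel.

    Assumption: the sentinel does not occur in text and sorts below
    every other character.
    """
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character is the one preceding each sorted suffix;
    # s[-1] (the sentinel) is used when the suffix starts at 0.
    bwt = "".join(s[i - 1] for i in sa)
    return sa, bwt


def build_c(bwt):
    """C[c] = number of characters in the text strictly smaller than c."""
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    return c_table


def rank(bwt, ch, i):
    """Occurrences of ch in bwt[:i].

    Stand-in only: a wavelet tree would answer this query without
    scanning the string.
    """
    return bwt.count(ch, 0, i)


def backward_search(bwt, c_table, pattern):
    """Return the half-open suffix array range [lo, hi) of the suffixes
    that start with pattern, or (0, 0) if the pattern does not occur."""
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + rank(bwt, ch, lo)
        hi = c_table[ch] + rank(bwt, ch, hi)
        if lo >= hi:
            return 0, 0
    return lo, hi


# Usage: locate "abra" in "abracadabra".
sa, bwt = bwt_and_sa("abracadabra")
c_table = build_c(bwt)
lo, hi = backward_search(bwt, c_table, "abra")
print(sorted(sa[lo:hi]))  # prints [0, 7]
```

The search itself touches only the BWT and the `C` table; the explicit suffix array is consulted here just to report positions, which is the part a compressed index would replace with sampling or a similar technique.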

Authors: Yang Weihong (杨炜鸿), Zhang Yi (张毅), Yu Hongmei (于洪梅)

Affiliations: School of Information Engineering, Jilin Business and Technology College; Network Center, Jilin University

Source: Information Science (《情报科学》), CSSCI, PKU Core Journal, 2009, No. 12, pp. 1834-1836, 1862 (4 pages)

Funding: Science and Technology Planning Project of the Jilin Provincial Department of Education (2007248, 2008257)

Keywords: full-text index; suffix array; BWT (Burrows-Wheeler transform); Huffman code

Classification: G354 [Culture and Science: Information Science]; TP311 [Automation and Computer Technology: Computer Software and Theory]
