Research Progress on Optimization Techniques of Tiered Storage Based on Deduplication


Abstract: With the explosive growth of global data volume and the increasing diversity of data, storage systems with a single media layer can no longer meet the diverse application demands of users. Tiered storage classifies data by characteristics such as importance, access frequency, and security requirements, and places it in storage layers that differ in access latency, storage capacity, and fault tolerance; it has been widely applied in many fields. Deduplication is a data reduction technique for big data that efficiently removes duplicate data from storage systems and maximizes storage space utilization. Unlike in single-layer scenarios, applying deduplication to tiered storage not only reduces cross-layer data redundancy, further saving storage space and lowering storage costs, but also improves data I/O performance and the durability of storage devices. After briefly analyzing the principle, process, and classification of deduplication-based tiered storage, this paper summarizes the research progress of optimization methods across three key steps: storage location selection, duplicate content identification, and data migration operation. It then explores the potential technical challenges of deduplication-based tiered storage. Finally, it discusses the future development trends of deduplication-based tiered storage.

Authors: YAO Zilu; FU Yinjin; XIAO Nong (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; National Supercomputer Center in Guangzhou, Sun Yat-Sen University, Guangzhou 510006, China)

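Affiliations: College of Computer Science and Technology, National University of Defense Technology; National Supercomputer Center in Guangzhou, Sun Yat-Sen University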

Source: Computer Science (《计算机科学》), Peking University Core Journal, 2025, No. 1, pp. 120-130 (11 pages)

Funding: National Key Research and Development Program of China (2022YFB4500304); National Natural Science Foundation of China (62332021, 61832020).

Keywords: Deduplication; Tiered storage; Storage location selection; Duplicate content identification; Data migration

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
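
The three key steps named in the abstract (storage location selection, duplicate content identification, and data migration) can be illustrated with a toy two-layer store. This is only a minimal sketch under stated assumptions, not a method from the paper: the class `TieredDedupStore`, the layer names `fast` and `slow`, fixed-size 4 KiB chunking, SHA-256 fingerprints, and the access-count threshold are all hypothetical choices made for illustration.

```python
import hashlib
from collections import Counter

CHUNK_SIZE = 4096  # assumption: fixed-size chunking, chosen for simplicity


class TieredDedupStore:
    """Toy two-layer store sharing one fingerprint index (illustrative only)."""

    def __init__(self, hot_threshold=3):
        self.tiers = {"fast": {}, "slow": {}}  # layer -> {fingerprint: chunk}
        self.location = {}                     # fingerprint -> layer name
        self.hits = Counter()                  # fingerprint -> read count
        self.hot_threshold = hot_threshold     # hypothetical promotion rule

    def write(self, data):
        """Store data and return its recipe (ordered list of fingerprints)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            # Duplicate content identification: one index covers both layers,
            # so a chunk already present in either layer is not stored again.
            if fp not in self.location:
                # Storage location selection: new chunks start in the slow layer.
                self.tiers["slow"][fp] = chunk
                self.location[fp] = "slow"
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        """Rebuild data from a recipe, counting accesses per chunk."""
        out = bytearray()
        for fp in recipe:
            self.hits[fp] += 1
            out += self.tiers[self.location[fp]][fp]
        return bytes(out)

    def migrate(self):
        """Data migration: promote frequently read chunks to the fast layer."""
        moved = 0
        for fp, tier in list(self.location.items()):
            if tier == "slow" and self.hits[fp] >= self.hot_threshold:
                self.tiers["fast"][fp] = self.tiers["slow"].pop(fp)
                self.location[fp] = "fast"
                moved += 1
        return moved
```

Because the fingerprint index is shared by both layers, a chunk is stored once no matter which layer holds it, which is the cross-layer redundancy reduction the abstract refers to. A real system would also need reference counting, demotion, and persistence, all of which are omitted here.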


