Abstract
With the explosive growth of global data volume and the increasing diversity of data, storage systems built on a single media layer can no longer meet users' varied application demands. Tiered storage classifies data by characteristics such as importance, access frequency, and security requirements, and places it in storage layers that differ in access latency, storage capacity, and fault tolerance. It has been widely applied in many fields. Deduplication is a data reduction technique for big data that efficiently removes duplicate data from storage systems and maximizes storage space utilization. Unlike in single-layer scenarios, applying deduplication to tiered storage not only reduces cross-layer data redundancy, which further saves storage space and lowers storage cost, but also improves data I/O performance and storage device durability. After a brief analysis of the principle, workflow, and classification of deduplication-based tiered storage, this paper summarizes in depth the research progress of optimization methods for three key steps: storage location selection, duplicate content identification, and data migration. It then discusses the potential technical challenges facing deduplication-based tiered storage. Finally, it outlines future development trends of deduplication-based tiered storage.
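The three key steps named in the abstract can be illustrated with a small sketch. This is not a method from the paper: the class `TieredDedupStore`, the two tier names, the fixed-size chunking, the SHA-256 fingerprints, and the access-count threshold `hot_threshold` are all assumptions chosen only to make the idea concrete.

```python
import hashlib


class TieredDedupStore:
    """Toy two-tier store with one shared fingerprint index (illustrative only)."""

    def __init__(self, hot_threshold=2):
        # tier name -> {fingerprint: chunk bytes}; tier names are hypothetical
        self.tiers = {"fast": {}, "slow": {}}
        self.location = {}  # fingerprint -> tier currently holding the chunk
        self.hits = {}      # fingerprint -> number of reads so far
        self.hot_threshold = hot_threshold

    def write(self, data, chunk_size=4):
        """Split data into fixed-size chunks and store each unique chunk once."""
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            # Duplicate content identification: compare fingerprints, not bytes.
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.location:
                # Storage location selection: new chunks start in the slow tier.
                self.tiers["slow"][fp] = chunk
                self.location[fp] = "slow"
                self.hits[fp] = 0
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        """Rebuild data from its recipe and promote chunks that become hot."""
        out = b""
        for fp in recipe:
            self.hits[fp] += 1
            out += self.tiers[self.location[fp]][fp]
            if self.hits[fp] >= self.hot_threshold and self.location[fp] == "slow":
                self._migrate(fp, "fast")
        return out

    def _migrate(self, fp, target):
        # Data migration: move the single stored copy, so no cross-tier duplicate.
        source = self.location[fp]
        self.tiers[target][fp] = self.tiers[source].pop(fp)
        self.location[fp] = target

    def stored_bytes(self):
        return sum(len(c) for tier in self.tiers.values() for c in tier.values())


store = TieredDedupStore()
recipe = store.write(b"AAAABBBBAAAA")  # three chunks, two of them identical
restored = store.read(recipe)
```

Because the index is shared across tiers, a chunk lives in exactly one tier at a time, which is one simple way to avoid the cross-layer redundancy the abstract mentions. Real systems typically use content-defined chunking and richer placement policies than this access counter.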
Authors
姚子路
付印金
肖侬
YAO Zilu; FU Yinjin; XIAO Nong (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; National Supercomputer Center in Guangzhou, Sun Yat-Sen University, Guangzhou 510006, China)
Source
《计算机科学》
Peking University Core Journal (北大核心)
2025, Issue 1, pp. 120-130 (11 pages)
Computer Science
Funding
National Key Research and Development Program of China (2022YFB4500304)
National Natural Science Foundation of China (62332021, 61832020)
Keywords
Deduplication
Tiered storage
Storage location selection
Duplicate content identification
Data migration