Abstract
Large-scale distributed data storage is a key supporting technology for the era of cloud computing and big data. In a distributed storage system, how to place data replicas is a fundamental problem. However, existing practical algorithms either ignore application-specific access patterns and thus sacrifice efficiency, or are tailored to a single application and thus lack generality. By establishing a unified model for describing replica placement strategies and extracting the critical access characteristic parameters of applications, the output and input of an automatic strategy-generation algorithm are defined. Machine learning is then used to obtain the general relationship between the access characteristic parameters and the parameters of the optimal replica placement strategy, which forms the core algorithm of the automatic generation mechanism. Compared with existing solutions, the automatic generation framework is expected to use storage resources and energy more efficiently, improving data access performance while reducing storage device and energy costs. It also reduces the manual effort involved in designing replica placement strategies.
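The input/output framing in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the feature names (`read_ratio`, `skew`), the strategy parameters (`replica_count`, `spread_across_racks`), the training pairs, and the use of a 1-nearest-neighbour lookup as a stand-in for whatever learning method the authors actually use.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessFeatures:
    """Hypothetical access characteristic parameters (the input)."""
    read_ratio: float  # fraction of operations that are reads, 0..1
    skew: float        # 0 = uniform access, 1 = concentrated on few hot objects


@dataclass(frozen=True)
class PlacementStrategy:
    """Hypothetical replica placement strategy parameters (the output)."""
    replica_count: int
    spread_across_racks: bool


# Made-up (features, best-known strategy) pairs; purely illustrative data.
TRAINING = [
    (AccessFeatures(0.95, 0.9), PlacementStrategy(5, True)),
    (AccessFeatures(0.90, 0.2), PlacementStrategy(3, True)),
    (AccessFeatures(0.30, 0.1), PlacementStrategy(2, False)),
]


def distance(a: AccessFeatures, b: AccessFeatures) -> float:
    """Euclidean distance between two feature vectors."""
    return math.hypot(a.read_ratio - b.read_ratio, a.skew - b.skew)


def generate_strategy(features: AccessFeatures,
                      training=TRAINING) -> PlacementStrategy:
    """Return the strategy of the closest training example (1-NN)."""
    _, strategy = min(training, key=lambda pair: distance(features, pair[0]))
    return strategy


if __name__ == "__main__":
    workload = AccessFeatures(read_ratio=0.92, skew=0.85)
    print(generate_strategy(workload))
```

A nearest-neighbour lookup is used here only because it makes the mapping from access features to strategy parameters easy to see; any regression or classification model with the same input and output types could take its place.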
Source
Network Security Technology & Application (《网络安全技术与应用》)
2014, Issue 7, pp. 22-23 (2 pages)
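Funding
Specialized Research Fund for the Doctoral Program of Higher Education (20120201110013)
Natural Science Basic Research Plan of Shaanxi Province (2012JQ8030)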
Keywords
Distributed Storage
Load Balancing
Resource Allocation
Green Computing
Replica Placement Strategy