摘要
随着数据采集和处理技术的进步,人们对数据的不确定性的认识也逐步深入.在诸如经济、军事、物流、金融、电信等领域的具体应用中,数据的不确定性普遍存在.不确定性数据的表现形式多种多样,它们可以以关系型数据、半结构化数据、流数据或移动对象数据等形式出现.目前,根据应用特点与数据形式差异,研究者已经提出了多种针对不确定数据的数据模型.这些不确定性数据模型的核心思想都源自于可能世界模型.可能世界模型从一个或多个不确定的数据源演化出诸多确定的数据库实例,称为可能世界实例,而且所有实例的概率之和等于1.尽管可以首先分别为各个实例计算查询结果,然后合并中间结果以生成最终查询结果,但由于可能世界实例的数量远大于不确定性数据库的规模,这种方法并不可行.因此,必须运用排序、剪枝等启发式技术设计新型算法,以提高效率.文中介绍了不确定性数据管理技术的概念、特点与挑战,综述了数据模型、数据预处理与集成、存储与索引、查询处理等方面的工作.
The importance of the data uncertainty was studied deeply with the rapid development in data gathering and processing in various fields, inclusive of economy, military, logistic, finance and telecommunication, etc. Uncertain data has many different styles, such as relational data, semistructured data, streaming data, and moving objects. According to scenarios and data characteristics, tens of data models have been developed, stemming from the core possible world model that contains a huge number of the possible world instances with the sum of probabilities equal to 1. However, the number of the possible world instances is far greater than the volume of the uncertain database, making it infeasible to combine medial results generated from all of possible world instances for the final query results. Thus, some heuristic techniques, such as ordering, pruning, must be used to reduce the computation cost for the high efficiency. This paper introduces the concepts, characteristics and challenges in uncertain data management, proposes the advance of the research on uncertain data management, including data model, preprocessing, in- tegrating, storage, indexing, and query processing.
出处
《计算机学报》
EI
CSCD
北大核心
2009年第1期1-16,共16页
Chinese Journal of Computers
基金
国家自然科学基金(60803020)
上海市重点学科建设项目(B412)资助