Abstract
Data consistency is an important aspect of data quality management. To improve the consistency of graph data, many data dependency theories from relational databases have been introduced into graph databases, including graph functional dependencies and graph association rules. Graph repairing rules are a recently proposed class of data dependency rules for graph data with strong repairing capability, but no effective mining algorithm for them exists yet. To generate graph repairing rules automatically and improve the reliability of graph data repair, a method called GenGRR is proposed that transforms graph constant conditional functional dependencies into graph repairing rules. Subgraphs isomorphic to the graph pattern are matched in the graph and mapped into a two-dimensional node-attribute table; error patterns are then extracted from the corresponding attribute domains of the table to transform each graph constant conditional functional dependency into a graph attribute value repair rule. Deleting from the graph pattern the node that corresponds to the RHS of the constant conditional functional dependency, together with its incident edges, yields a graph attribute supplement rule. The generated graph repairing rules are screened and verified for consistency based on the maximum common isomorphic subgraph. Experiments on multiple real datasets show that the graph repairing rules generated by this transformation repair graph data more effectively than applying the graph constant conditional functional dependencies directly.
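The transformation into an attribute value repair rule can be illustrated with a minimal Python sketch. This is not the authors' GenGRR implementation: the toy graph, the `located_in` pattern, the rule layout (`evidence`, `error_patterns`, `fix`) and all function names are assumptions made for illustration. The brute-force matcher only checks that the pattern's labels and edges are present, whereas a real system would use a proper subgraph isomorphism algorithm.

```python
from itertools import permutations

# Toy property graph (hypothetical data): node id -> (label, attributes).
nodes = {
    1: ("city", {"name": "Nanjing"}),
    2: ("country", {"name": "China"}),
    3: ("city", {"name": "Nanjing"}),
    4: ("country", {"name": "Chnia"}),  # misspelled value to be repaired
    5: ("city", {"name": "Paris"}),
    6: ("country", {"name": "France"}),
}
edges = {(1, 2, "located_in"), (3, 4, "located_in"), (5, 6, "located_in")}

# Graph pattern Q[x, y]: a city x with a located_in edge to a country y.
pattern_nodes = {"x": "city", "y": "country"}
pattern_edges = [("x", "y", "located_in")]


def match_pattern():
    """Brute force: return every injective mapping variable -> node id."""
    variables = list(pattern_nodes)
    found = []
    for combo in permutations(nodes, len(variables)):
        m = dict(zip(variables, combo))
        labels_ok = all(nodes[m[v]][0] == pattern_nodes[v] for v in variables)
        edges_ok = all((m[s], m[t], lbl) in edges for s, t, lbl in pattern_edges)
        if labels_ok and edges_ok:
            found.append(m)
    return found


def to_table(matches):
    """Map each match to one row keyed by (pattern variable, attribute)."""
    return [
        {(v, a): val for v, n in m.items() for a, val in nodes[n][1].items()}
        for m in matches
    ]


def derive_value_repair_rule(table, lhs, rhs):
    """Turn a constant CFD (lhs -> rhs) into a value repair rule.

    Error patterns are the RHS values seen in rows that satisfy the LHS
    constant but disagree with the RHS constant.
    """
    (lhs_key, lhs_const), (rhs_key, rhs_const) = lhs, rhs
    wrong = {
        row[rhs_key]
        for row in table
        if row.get(lhs_key) == lhs_const
        and rhs_key in row
        and row[rhs_key] != rhs_const
    }
    return {"evidence": lhs, "error_patterns": wrong, "fix": rhs}


def apply_rule(matches, rule):
    """Overwrite the RHS attribute only where a known error pattern occurs."""
    (lv, la), lc = rule["evidence"]
    (rv, ra), rc = rule["fix"]
    repaired = 0
    for m in matches:
        x_attrs, y_attrs = nodes[m[lv]][1], nodes[m[rv]][1]
        if x_attrs.get(la) == lc and y_attrs.get(ra) in rule["error_patterns"]:
            y_attrs[ra] = rc
            repaired += 1
    return repaired


matches = match_pattern()
table = to_table(matches)
# Constant CFD on Q[x, y]: x.name = "Nanjing" -> y.name = "China".
rule = derive_value_repair_rule(
    table, (("x", "name"), "Nanjing"), (("y", "name"), "China")
)
repaired = apply_rule(matches, rule)
print(rule, repaired)
```

On this toy data the only extracted error pattern is the misspelling "Chnia". The rule repairs just that one node and leaves the Paris/France match untouched, which reflects the idea of changing a value only when it matches a recorded error pattern.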
Authors
李杰
曹建军
王保卫
庄园
LI Jie; CAO Jian-jun; WANG Bao-wei; ZHUANG Yuan (School of Computer & Software, Nanjing University of Information Science and Technology, Nanjing 210044, China; The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China; Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China)
Source
《计算机技术与发展》
2024, No. 4, pp. 7-15 (9 pages)
Computer Technology and Development
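Funding
National Natural Science Foundation of China (61972207)
China Postdoctoral Science Foundation Special Funded Project (2015M582832)
National Science and Technology Major Project (2015ZX01040201-003)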
Keywords
data consistency
data quality
graph functional dependency
graph repairing rule
subgraph isomorphism
maximum common isomorphic subgraph