Abstract
Knowledge graph technology continues to develop, and knowledge management driven by knowledge graphs is now applied in many domains. The efficiency of distributed SPARQL (SPARQL Protocol and RDF Query Language) queries over knowledge graphs is therefore particularly important. First, existing Spark-based and main-memory (RAM)-based distributed Resource Description Framework (RDF) systems were surveyed in detail. Second, eight representative systems were selected from these and their query performance was evaluated. The evaluation compared Spark-based and RAM-based systems across different query types, query diameters and datasets. Third, the experimental results were analyzed comprehensively to assess the query performance of both classes of systems. Finally, existing systems have problems in distributed SPARQL querying, such as poor query scalability, high join complexity and long query compilation time. Based on these problems, future research directions for distributed SPARQL query optimization oriented to vertical application domains were outlined.
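The evaluation groups queries by diameter. The Python sketch below computes the query diameter of a basic graph pattern. It assumes the common graph-theoretic definition: the longest shortest path, in edges, between any two nodes of the query graph, with edge direction ignored. For tree-shaped queries such as stars and chains, this equals the length of the longest path. The triple patterns and `ex:` predicates are hypothetical, and the code is an illustration only. It is not the tooling used in the paper.

```python
from collections import deque


def query_graph(triple_patterns):
    """Build an undirected adjacency map: nodes are subjects/objects,
    edges are triple patterns (edge direction is ignored)."""
    adj = {}
    for s, _p, o in triple_patterns:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def eccentricity(adj, start):
    """Largest BFS distance (in edges) from start to any reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def query_diameter(triple_patterns):
    """Maximum eccentricity over all nodes (assumes a connected query graph)."""
    adj = query_graph(triple_patterns)
    return max(eccentricity(adj, n) for n in adj)


# Hypothetical star-shaped pattern: every triple shares the subject ?x.
star = [
    ("?x", "ex:name", "?n"),
    ("?x", "ex:age", "?a"),
    ("?x", "ex:email", "?e"),
]

# Hypothetical linear (chain) pattern: each object is the next subject.
linear = [
    ("?a", "ex:knows", "?b"),
    ("?b", "ex:knows", "?c"),
    ("?c", "ex:knows", "?d"),
]

print(query_diameter(star))    # 2
print(query_diameter(linear))  # 3
```

A star query has a small diameter regardless of how many triple patterns it contains. A chain's diameter grows with every pattern. Larger diameters generally mean longer sequences of joins, which is one reason diameter is a useful axis for comparing distributed engines.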
Authors
FENG Jun (冯钧)
WANG Bingfa (王秉发)
LU Jiamin (陆佳民)
College of Computer and Information, Hohai University, Nanjing, Jiangsu 211100, China
Source
Journal of Computer Applications (《计算机应用》), 2022, No. 2, pp. 440-448 (9 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
National Key Research and Development Program of China (2018YFC0407901)
Keywords
distributed Resource Description Framework (RDF)
main memory (Random Access Memory, RAM)
Spark
distributed SPARQL (SPARQL Protocol and RDF Query Language) query
selectivity
query efficiency
query accuracy