
Representation Learning of Large-Scale Complex Information Network: Concepts, Methods and Challenges (大规模复杂信息网络表示学习:概念、方法与挑战)

Cited by: 44
Abstract: With the arrival of the big data era, research on complex information networks faces three fundamental challenges: the dynamicity, the large scale, and the high dimensionality of networks. Traditionally, the characteristics of complex information networks are represented in discrete forms such as adjacency matrices, in- and out-degrees, and centrality. In today's large-scale dynamic information networks, such representations suffer in both computational efficiency and accuracy. Meanwhile, with the advance of machine learning algorithms, representation learning for complex information networks has received increasing attention. Similar to the learning of word vectors in natural language processing, representation learning for large-scale networks aims to map the structural characteristics of each vertex to a low-dimensional, continuous, real-valued vector while preserving the structural relationships among vertices as far as possible, so that the learned features can be applied effectively in network applications such as link prediction, vertex classification, personalized recommendation, and large-scale community discovery. More precisely, the advantages of representation learning in complex information networks are three-fold. First, it alleviates the effect of data sparsity in networks. Second, heterogeneous information of different types is integrated into the same vector space, so that certain specific problems can be solved more easily. Third, semantic relatedness operations can be implemented efficiently, which significantly improves the efficiency of similar-vertex matching in large-scale, and especially very-large-scale, networks. To this end, the paper summarizes recent methods and the state of research on representation learning of complex information networks, proposes a taxonomy covering both the classic and the state-of-the-art methods, and offers the authors' own views. We first provide a historical overview of representation learning in graphs, followed by an elaboration of the related concepts and theoretical foundations of large-scale complex information networks and network representation learning, and then a comprehensive analysis and comparison of learning models, grouped by model type. We consider classic models, for example the spectral method and optimization-based methods, and models for large-scale networks, including models based on high-order relationships, semi-supervised models, and scalable models. Both of these categories focus on the structure of the network. In contrast, content-based models learn from the content in networks, e.g., the content associated with each vertex. Models that combine these two aspects, such as matrix factorization based models and probabilistic graphical models, are also considered. Finally, we discuss models for heterogeneous networks. We compare the methods in detail from several perspectives and derive some conclusions. In addition, the paper summarizes the experimental data sets, evaluation metrics, and application scenarios used by different network representation learning methods. We also discuss the open problems and future research directions in representation learning of large-scale complex information networks. In brief, most existing works focus on either the structure or the content of the network when learning vertex representations. However, only representations that fuse both structural and content features can better reflect the true characteristics of a network and make the learned representations more meaningful and valuable. Room for future improvement typically lies in the fault tolerance of network feature extraction, the adaptivity to dynamic networks, the combination of heterogeneous information in networks, the universality of representations, distributed network representation, and feature learning for specific sub-graph structures.
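The core idea in the abstract, mapping each vertex to a low-dimensional real-valued vector that preserves structural relationships, can be illustrated on a toy graph. The sketch below is not taken from the paper and is not an implementation of any specific surveyed model. It assumes the unnormalized-Laplacian form of the spectral method (one of the classic families the abstract names), and the function name `spectral_embedding`, the `dim` parameter, and the six-vertex graph are all illustrative choices.

```python
import numpy as np

def spectral_embedding(adjacency: np.ndarray, dim: int) -> np.ndarray:
    """Embed each vertex of an undirected, connected graph into `dim` coordinates.

    Assumption: unnormalized Laplacian L = D - A; other spectral variants exist.
    """
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    # eigh returns eigenvalues in ascending order; eigenvectors are the columns.
    _, eigenvectors = np.linalg.eigh(laplacian)
    # Skip the first eigenvector (constant, eigenvalue 0) and keep the next `dim`.
    return eigenvectors[:, 1:dim + 1]

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for u, v in edges:
    adjacency[u, v] = adjacency[v, u] = 1.0

# One row per vertex, one column per embedding dimension.
embedding = spectral_embedding(adjacency, dim=2)
print(embedding.shape)  # (6, 2)
```

In this toy graph the first embedding coordinate takes one sign on vertices 0 to 2 and the opposite sign on vertices 3 to 5, so the two densely connected groups end up apart in the vector space. Which group is positive is arbitrary, since eigenvectors are only defined up to sign.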
Authors: QI Jin-Shan (齐金山); LIANG Xun (梁循); LI Zhi-Yu (李志宇); CHEN Yan-Fang (陈燕方); XU Yuan (许媛)
Affiliations: School of Information, Renmin University of China, Beijing 100872; School of Computer Science & Technology, Huaiyin Normal University, Huai’an, Jiangsu 223300; School of Information Resource Management, Renmin University of China, Beijing 100872
Source: Chinese Journal of Computers (《计算机学报》), 2018, No. 10, pp. 2394-2420 (27 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (71271211, 71531012).
Keywords: large-scale complex information network; network features; vertex embedding; network representation learning; deep learning; feature learning