Abstract
To address the problem that multi-source, heterogeneous, widely distributed submitted information is subject to many differentiated application requirements and that usable information cannot be distinguished, a big data classification method for non-relational distributed submitted information under differentiated demand was studied. Firstly, the usability, openness and extensibility of the non-relational distributed submitted-information database were analyzed and, in line with the basic requirements on field types, the unstructured database TRIP (text retrieval information processing) was used to store the non-relational distributed submitted information. Then, the hashing process within the Hamming hash family was analyzed. Under the constraint of a linear-level requirement, multiple attractors were used to optimize the cellular automata, and a genetic algorithm was used to improve the optimal parameters of the multiple attractor cellular automata classifier, thereby improving the big data classification method. Experimental results show that the method can effectively identify and classify structured and unstructured data in non-relational distributed submitted information, with high classification accuracy.
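The abstract does not spell out the hashing step, so the following is only a minimal sketch of one well-known member of the Hamming hash family: bit sampling, where a hash function keeps a fixed random subset of bit positions. It assumes that records have already been encoded as fixed-length bit vectors. The function names (`make_bit_sampling_hash`, `hamming`, `bucketize`) and the parameters `n_bits`, `k` and `seed` are hypothetical and are not taken from the paper.

```python
import random


def make_bit_sampling_hash(n_bits, k, seed=0):
    """Build one bit-sampling hash: keep k randomly chosen bit positions.

    Illustrative assumption: vectors that are close in Hamming distance
    are likely to agree on the sampled positions and share a bucket.
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(n_bits), k))

    def h(bits):
        # The hash value is the tuple of bits at the sampled positions.
        return tuple(bits[i] for i in positions)

    return h, positions


def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def bucketize(records, h):
    """Group record names by hash value (records: name -> bit vector)."""
    buckets = {}
    for name, bits in records.items():
        buckets.setdefault(h(bits), []).append(name)
    return buckets
```

Under these assumptions, two vectors that differ in `d` of `n_bits` positions collide on a single sampled bit with probability `1 - d / n_bits`, which is why nearby records tend to land in the same bucket. How the paper combines this step with the cellular automata classifier is not stated in the abstract and is not modeled here.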
Authors
韩璐
陈威宇
张斐
何建锋
苏怀振
HAN Lu; CHEN Weiyu; ZHANG Fei; HE Jianfeng; SU Huaizhen (State Grid Gansu Electric Power Company, Lanzhou 730030, China; State Grid Lanzhou Siji Feitian Cloud Date Science Technology Co., Ltd., Lanzhou 730020, China; State Grid Gansu Electric Power Company Dingxi Power Supply Company, Dingxi 743000, China)
Source
Telecommunications Science (《电信科学》)
2023, No. 6, pp. 114-121 (8 pages)
Keywords
differentiated demand
non-relational
distributed
submitted information
big data classification
cellular automata