Purpose: Topic identification and evolution analysis based on LDA models has limitations such as the difficulty of choosing the number of topics and the strong subjectivity of time window partitioning. This study proposes optimizations that address these limitations, with the aim of advancing topic identification and evolution analysis methods. Method: Topic vectors are computed by combining the TF-IDF algorithm with Word2Vec word embeddings, which reduces the influence of common words during topic generation and gives the topic vectors a semantic representation. For topic evolution, time windows are divided according to changes in the semantic distance between topics, and the evolution of topic intensity and topic content in the target domain is tracked. An empirical study is then carried out on the research literature on open source software. Result: The proposed method effectively identifies the research topics and hot topics of the field, tracks how topics evolve over time, and presents the results visually. Conclusion: Research on open source software has six key topics, of which “open source governance” and “market competition” are the hot topics. In terms of topic content, the research is shifting from the autonomous motivation of individuals who participate spontaneously toward participation at the organizational level by enterprises and governments.
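The method described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the topic vector is taken to be a TF-IDF-weighted average of word vectors, the semantic distance is taken to be cosine distance, and a new time window is started whenever the distance between consecutive years exceeds a hypothetical `threshold`. The toy 2-D embeddings and TF-IDF weights stand in for a trained Word2Vec model and a real corpus.

```python
import numpy as np


def topic_vector(top_words, tfidf, embeddings):
    """TF-IDF-weighted mean of the word vectors of a topic's top words (assumed form)."""
    words = [w for w in top_words if w in embeddings and tfidf.get(w, 0.0) > 0]
    if not words:
        raise ValueError("no usable words for this topic")
    weights = np.array([tfidf[w] for w in words], dtype=float)
    vectors = np.stack([embeddings[w] for w in words])
    return weights @ vectors / weights.sum()


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def window_boundaries(yearly_vectors, threshold):
    """Years at which a new window starts: the topic vector moved more than
    `threshold` (a hypothetical cut-off) away from the previous year's vector."""
    years = sorted(yearly_vectors)
    return [
        curr
        for prev, curr in zip(years, years[1:])
        if cosine_distance(yearly_vectors[prev], yearly_vectors[curr]) > threshold
    ]


# Toy stand-ins for Word2Vec vectors and TF-IDF weights (illustrative values only).
embeddings = {
    "volunteer": np.array([1.0, 0.0]),
    "motivation": np.array([0.9, 0.1]),
    "firm": np.array([0.0, 1.0]),
    "government": np.array([0.1, 0.9]),
}
tfidf = {"volunteer": 0.5, "motivation": 0.5, "firm": 0.5, "government": 0.5}

# One topic's top words per year; the content changes in 2003.
yearly = {
    2001: topic_vector(["volunteer", "motivation"], tfidf, embeddings),
    2002: topic_vector(["volunteer", "motivation"], tfidf, embeddings),
    2003: topic_vector(["firm", "government"], tfidf, embeddings),
}
boundaries = window_boundaries(yearly, threshold=0.5)
print(boundaries)
```

In this toy data the topic's vocabulary is unchanged between 2001 and 2002 and changes completely in 2003, so the only boundary falls at 2003. In practice the threshold would need to be tuned to the corpus.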
This study uses text mining to investigate consumers’ needs and preferences. Online reviews of clothing products on Tmall were collected, preprocessed, and analyzed with a BERT-LDA model. The analysis finds that consumers show varying levels of attention and varying rates of positive sentiment across shopping experience, clothing characteristics, and clothing quality. The results indicate that new ways of experiencing products, such as virtual try-on, will profoundly affect consumers’ purchasing decisions. Consumers are paying more attention to the sustainability of clothing and tend to choose garments that are practical, easy to recycle, and wearable in multiple ways. Based on these results, the paper offers reference and guidance for marketing in the apparel e-commerce industry.