Abstract
Artificial intelligence requires large-scale, diverse, and high-quality data to train machine learning models, yet collecting such real-world data can be costly and may threaten personal privacy, give rise to bias or discrimination, and infringe copyright. In practice, synthetic data has attracted wide attention as an alternative and is increasingly used to train machine learning models. From the perspective of data jurisprudence, and drawing on research in data science and computer science, this paper explores a governance framework for synthetic data in AI training. First, at the normative level, it analyzes the logical premise behind the attention paid to synthetic data in AI training: the "small privacy" protection pursued by personal information protection law is clearly incompatible with the "big data" demands of AI training, which makes the development of training data challenging, while existing legal and technical solutions both suffer from weak governance effectiveness. On this basis, it discusses the application scenarios and risk types of synthetic data in AI training. Finally, guided by "Law 3.0 theory" and "data governance theory", it proposes building a coherent legal governance framework for synthetic data in AI training along three lines: formulating processing rules for synthetic data, strengthening process governance of synthetic data, and developing assessment tools for synthetic data.
Author
ZHANG Tao (张涛)
Ministry of Education Laboratory of Philosophy and Social Sciences-Data Law Laboratory of China University of Political Science and Law, Beijing 100088, China; Institute for Data Law, China University of Political Science and Law, Beijing 100088, China; Institute of Digital Society Governance, China University of Political Science and Law, Beijing 100088, China
Source
Computer Science (《计算机科学》)
PKU Core Journal (北大核心)
2025, No. 2, pp. 20-32 (13 pages)