Abstract
To address the low efficiency of weight-gradient computation in neural-network training accelerators, a floating-point operation optimization architecture for a high-performance convolutional neural network (CNN) training processor is designed. Building on the basic principles of CNN training architectures, training architectures using 32-bit, 24-bit, 16-bit, and mixed-precision floating-point formats are proposed in order to identify the floating-point format best suited to low-power, small-footprint edge devices. The accelerator engine is verified on a field-programmable gate array (FPGA) for both inference and training on the MNIST handwritten-digit dataset; a hybrid convolution scheme combining a 24-bit custom floating-point format with the 16-bit brain floating-point format achieves an accuracy above 93%. The optimized mixed-precision accelerator is implemented in a TSMC 55 nm process, with a training energy cost of 8.51 μJ per image.
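The 16-bit brain floating-point format mentioned in the abstract (bfloat16) keeps the sign bit and the full 8-bit exponent of IEEE-754 float32 and truncates the mantissa to 7 bits, so a float32 value can be reduced to bfloat16 by simply dropping its low 16 bits. The sketch below illustrates this standard truncation; it is a general illustration of the format, not code from the paper's accelerator:

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to bfloat16: keep the sign bit,
    the full 8-bit exponent, and the top 7 mantissa bits."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16  # drop the low 16 mantissa bits

def bfloat16_bits_to_float32(bits16: int) -> float:
    """Expand bfloat16 bits back to float32 by zero-padding the mantissa."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

# Powers of two survive round-tripping exactly; other values lose at
# most one part in 2^7 of relative precision to mantissa truncation.
y = bfloat16_bits_to_float32(float32_to_bfloat16_bits(1.2345678))
```

Because bfloat16 retains the float32 exponent range, weights and gradients rarely overflow or underflow during training, which is why mixed-precision schemes like the paper's can pair it with wider formats (here a 24-bit custom float) only where extra mantissa precision is needed.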
Authors
ZHANG Libo, LI Changwei, QI Wei, WANG Gang, QI Lufeng
(China Green Development Investment Group Co., Ltd., Beijing 100010, China; Shandong Luruan Digital Technology Co., Ltd., Jinan 250001, China)
Source
Computer Measurement & Control (《计算机测量与控制》), 2023, No. 6, pp. 176-182 (7 pages)
Funding
Science and technology project of China Green Development Investment Group Co., Ltd. (CGDG529000220008).
Keywords
convolutional neural network
floating-point operation
accelerator
weight gradient
processor