Abstract
This paper addresses the deployment of convolutional neural networks (CNNs) on a field-programmable gate array (FPGA) and proposes a scheme for parallel hardware acceleration of CNNs. Based on an analysis of the structural characteristics of CNNs, data storage, reading, and movement are organized in a pipelined fashion, and the convolution units within each layer are unrolled to accelerate the multiply-accumulate operations. The FPGA's inherently parallel structure, combined with pipelined processing, substantially improves computational efficiency. In object classification on the CIFAR-10 dataset, with the clock running at 800 MHz and with no loss of accuracy, the design achieves a speedup of roughly 4x over a mid-range Intel processor. Loop unrolling for parallel processing, together with multi-stage pipelining, thus accelerates CNN forward propagation and meets the needs of practical engineering tasks.
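The unrolling of the convolution unit described in the abstract can be illustrated with a small software model. This is a minimal sketch, not the authors' hardware design: the 3x3 kernel size, stride 1, absence of padding, and the function names `conv2d_naive` and `conv2d_unrolled_3x3` are all assumptions made for illustration. It shows that once the kernel loops are unrolled, the nine products per output pixel are independent of each other, which is what would let an FPGA compute them in parallel multipliers and combine them in an adder tree.

```python
def conv2d_naive(image, kernel):
    """Reference sliding-window convolution (CNN-style cross-correlation).

    No padding, stride 1. The two inner loops perform one multiply-add
    per iteration, i.e. strictly sequential accumulation.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


def conv2d_unrolled_3x3(image, kernel):
    """Same computation with the 3x3 kernel loops fully unrolled.

    Assumption: a 3x3 kernel (hypothetical; the abstract does not state
    the kernel size). The nine products do not depend on one another,
    so hardware could evaluate them at the same time.
    """
    (k00, k01, k02), (k10, k11, k12), (k20, k21, k22) = kernel
    out_h = len(image) - 2
    out_w = len(image[0]) - 2
    out = []
    for r in range(out_h):
        row0, row1, row2 = image[r], image[r + 1], image[r + 2]
        out_row = []
        for c in range(out_w):
            # Nine independent products (one multiplier each in hardware).
            p0, p1, p2 = row0[c] * k00, row0[c + 1] * k01, row0[c + 2] * k02
            p3, p4, p5 = row1[c] * k10, row1[c + 1] * k11, row1[c + 2] * k12
            p6, p7, p8 = row2[c] * k20, row2[c + 1] * k21, row2[c + 2] * k22
            # Pairwise sums model the levels of an adder tree.
            s0, s1, s2, s3 = p0 + p1, p2 + p3, p4 + p5, p6 + p7
            out_row.append((s0 + s1) + (s2 + s3) + p8)
        out.append(out_row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
result = conv2d_unrolled_3x3(image, kernel)
print(result)  # [[-6, -6], [-6, -6]]
```

In Python the unrolled version is still executed sequentially; the point of the sketch is only the data dependency structure, which is what determines how much of the work can be mapped to parallel hardware.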
Authors
李小燕
张欣
闫小兵
任德亮
李彦青
傅长娟
LI Xiaoyan; ZHANG Xin; YAN Xiaobing; REN Deliang; LI Yanqing; FU Changjuan (College of Telecommunications and Information Engineering, Hebei University, Baoding 071002, China; Baoding Yonghong Foundry Machinery Factory, Baoding 072150, China)
Source
《河北大学学报(自然科学版)》
CAS
PKU Core Journal (北大核心)
2019, No. 1, pp. 99-105 (7 pages)
Journal of Hebei University(Natural Science Edition)
Funding
National Natural Science Foundation of China (Grant No. 61674050)
Keywords
field-programmable gate array (FPGA)
convolutional neural network
parallelization
pipelining
classification
acceleration