Analysis Of Neural Functional Unit

It has been seen that algorithms can improve memory bandwidth usage. Here, an accelerator can be used in the hardware design to improve memory efficiency. In recent years, hardware architectures for deep learning first employed GPUs to increase speed, then moved to FPGAs, and most recently to ASICs with additional dedicated units. An application-specific integrated circuit (ASIC) is an integrated circuit developed to solve a specific problem in hardware, by building gates that emulate the required logic. The purpose of such chips is to provide maximum performance within a given power and cost budget. An ASIC can provide a kind of software engine for running deep learning algorithms, optimizing performance, power and memory. The components of the hardware accelerator are …

Stage n of the NFU pipeline is denoted NFU-n.
NFU-3 implements the activation function using an approach previously proposed in the literature [19,20]: the sigmoid of NFU-3, used for the classifier and convolutional layers, is computed by piecewise linear interpolation, $f(x) = a_i x + b_i$ for $x \in [x_i, x_{i+1}]$, with negligible loss of accuracy. The operators required are two 16x1 16-bit multiplexers, one 16-bit multiplier and one 16-bit adder to perform the interpolation. The coefficients $(a_i, b_i)$ of the 16 segments are stored in a small RAM. This allows the same unit to implement functions other than the sigmoid, such as the hyperbolic tangent or linear functions, simply by changing the coefficients $(a_i, b_i)$ in the RAM, while the segment boundaries $(x_i, x_{i+1})$ are hardwired.
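The interpolation scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NFU-3 hardware: the boundary values (16 equal segments over [-8, 8]), the way the coefficients are fitted (each line passes through the function at both segment ends), the saturation outside the covered range, and the names `BOUNDARIES`, `fit_coefficients` and `interpolate` are all hypothetical, and floating point stands in for the 16-bit fixed-point multiplier and adder. What it does show is the key property from the text: the boundaries stay fixed, and switching from sigmoid to tanh only means loading different $(a_i, b_i)$ pairs into the coefficient table.

```python
import bisect
import math

# Hardwired segment boundaries x_0 < x_1 < ... < x_16. Hypothetical choice:
# 16 equal-width segments over [-8, 8]; the text does not give the real values.
X_MIN, X_MAX, N_SEG = -8.0, 8.0, 16
BOUNDARIES = tuple(X_MIN + i * (X_MAX - X_MIN) / N_SEG for i in range(N_SEG + 1))


def fit_coefficients(f):
    """Return one (a_i, b_i) pair per segment, i.e. the contents of the
    coefficient RAM. Assumption: each line passes through f at both ends."""
    ram = []
    for i in range(N_SEG):
        x0, x1 = BOUNDARIES[i], BOUNDARIES[i + 1]
        a = (f(x1) - f(x0)) / (x1 - x0)
        ram.append((a, f(x0) - a * x0))
    return ram


def interpolate(x, coeff_ram):
    """Evaluate f(x) ~= a_i * x + b_i for the segment that contains x."""
    x = min(max(x, X_MIN), X_MAX)  # assumption: saturate inputs outside [x_0, x_16]
    # Segment selection (done by the multiplexers in hardware): x_i <= x < x_{i+1}.
    i = min(bisect.bisect_right(BOUNDARIES, x) - 1, N_SEG - 1)
    a, b = coeff_ram[i]
    return a * x + b  # one multiplication and one addition


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


sigmoid_ram = fit_coefficients(sigmoid)
tanh_ram = fit_coefficients(math.tanh)  # same boundaries, new RAM contents only

samples = [k / 100 for k in range(-800, 801)]
max_err = max(abs(interpolate(x, sigmoid_ram) - sigmoid(x)) for x in samples)
print(f"max |error| of the 16-segment sigmoid on [-8, 8]: {max_err:.4f}")
```

Keeping the boundaries as a fixed tuple and the coefficients as a separate list mirrors the hardware split described in the text: only the list would need rewriting to reprogram the unit.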
Storage: The accelerator uses four storage structures: NBin, NBout, SB and NFU-2. Modified scratchpad buffers are used to implement this storage, because a scratchpad has advantages over a conventional cache: a cache organization incurs overheads such as tag checks, associativity, line size and speculative reads, as well as cache conflicts. The scratchpad, a more efficient alternative, is used in VLIW processors, although it is known to be difficult to compile for. In a dedicated accelerator, however, a scratchpad offers the best of both worlds: efficient storage, and locality that can be exploited both efficiently and easily, because only a few algorithms have to be manually adapted. The storage is split into three buffer structures: …
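To make the cache-overhead argument concrete, here is a toy Python model; it is an illustration, not taken from the source. The cache geometry, the address layout and the names `DirectMappedCache`, `Scratchpad`, `cache_traffic` and `dot_with_split_buffers` are hypothetical, and the assumption that NBin holds input neurons and SB holds synaptic weights is inferred from the buffer names. The cache pays a tag check on every access and, with an unlucky layout, inputs and weights evict each other; the split scratchpads are filled explicitly by software and then read by plain indexing, so the two streams cannot conflict.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags (no data) to count overheads."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.tag_checks = 0
        self.misses = 0

    def access(self, addr):
        line_addr = addr // self.line_size
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        self.tag_checks += 1          # paid on every access, hit or miss
        if self.tags[index] != tag:   # cold or conflict miss
            self.misses += 1
            self.tags[index] = tag


class Scratchpad:
    """Toy scratchpad: software copies a tile in explicitly, then indexes it directly."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.words_loaded = 0

    def load(self, src, offset, length):
        self.data[:length] = src[offset:offset + length]
        self.words_loaded += length

    def read(self, i):
        return self.data[i]           # plain indexed read: no tag check, no associativity


def cache_traffic(n, cache, input_base, weight_base):
    """Interleave reads of input i and weight i, as a dot product would."""
    for i in range(n):
        cache.access(input_base + i)
        cache.access(weight_base + i)


def dot_with_split_buffers(inputs, weights):
    n = len(inputs)
    # Separate buffers (hypothetically NBin and SB): the streams cannot evict each other.
    nbin, sb = Scratchpad(n), Scratchpad(n)
    nbin.load(inputs, 0, n)
    sb.load(weights, 0, n)
    return sum(nbin.read(i) * sb.read(i) for i in range(n)), nbin, sb


cache = DirectMappedCache(num_lines=4, line_size=4)
# Hypothetical layout: weights start 64 words after inputs, so input i and
# weight i map to the same cache line and keep evicting each other.
cache_traffic(16, cache, input_base=0, weight_base=64)
print(cache.tag_checks, cache.misses)  # 32 32

total, nbin, sb = dot_with_split_buffers([1.0] * 16, [0.5] * 16)
print(total, nbin.words_loaded + sb.words_loaded)  # 8.0 32
```

The model deliberately ignores line fills: on real hardware each of those 32 misses would also fetch a whole cache line, whereas the scratchpads move exactly the 32 words the computation needs.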