An enhanced NoC-based embedded heterogeneous manycore platform
Zheng, Xuan (2018-09-17)
This article/publication has not been deposited in UTUPub. The publication record may, however, contain a link to a copy of the article/publication deposited elsewhere.
Turun yliopisto
Abstract
Deep learning plays an increasingly significant role in pattern recognition and verification, and its core structure is the convolutional neural network (CNN). A well-trained CNN has been shown to achieve a promising accuracy on facial verification, for example. Since CNNs have such remarkable pattern-learning capability, there is a strong desire to broaden their applicability. The most obvious obstacle is time inefficiency: a CNN is a huge network with numerous hidden layers, so even a single forward pass through it is time-consuming. Worse, training is an iterative feedback process in which data flow through the network repeatedly until the pre-defined target is met, which makes the computation prohibitively slow. That is where a hardware accelerator makes sense.
A common compromise, testing online while training offline, has been widely adopted by researchers in recent years. A number of hardware accelerator designs for online testing have emerged, since testing is relatively simple and less sensitive to limited precision than training. However, with the development of parallel computing, GPU programs may gradually become capable of handling online testing with satisfying performance. If so, accelerating the training process in hardware, mainly the back propagation (BP) stage, would become the focus.
LeNet-5, a typical CNN, is used in the experiments of this thesis. The development environment is Matlab. The software is divided into two parts, training and testing. Training, the first part, comprises feed forward, back propagation, and weight updating. The public MNIST handwriting database is used to train and test the model. Simulation shows that the error rate declines as the number of iterations grows; most of the time it decreases slowly with slight oscillations, while the significant improvements are concentrated in a few epochs. With a batch size of 10 over the 60,000-sample training set, the error rate reaches 4.86%.
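To make the training flow concrete, the Matlab-style sketch below outlines the batch loop implied by the description above. The helper functions (lenet5_init, load_mnist, cnn_ff, cnn_bp, cnn_applygrads, evaluate_error) and the learning rate and epoch count are illustrative assumptions, not the actual thesis code.

% Sketch of the batch training loop: feed forward, back propagation, weight update.
% All helper functions below are assumed placeholders for illustration only.
net = lenet5_init();                          % hypothetical LeNet-5 constructor
[train_x, train_y] = load_mnist('train');     % 60,000 MNIST images and labels
batch_size  = 10;                             % batch size used in the experiments
num_batches = size(train_x, 3) / batch_size;
alpha       = 0.1;                            % learning rate (assumed value)
num_epochs  = 10;                             % passes over the data (assumed value)
err = zeros(1, num_epochs);

for epoch = 1:num_epochs
    for b = 1:num_batches
        idx = (b - 1) * batch_size + (1:batch_size);
        x = train_x(:, :, idx);               % one mini-batch of images
        y = train_y(:, idx);                  % corresponding labels

        net = cnn_ff(net, x);                 % feed forward through LeNet-5
        net = cnn_bp(net, y);                 % back propagation of the output error
        net = cnn_applygrads(net, alpha);     % weight updating
    end
    err(epoch) = evaluate_error(net);         % error rate recorded once per epoch
end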
A hardware implementation always works with finite precision. The key to a functional training process is to ensure that the weight derivatives stay within the pre-defined precision; otherwise they are truncated to 0. From the mathematical theory of BP, the weight updates of the last hidden layer are likely to be the smallest valid data. After numerous tests, a precision of 17 fractional bits is chosen, sacrificing only 1% of the correct rate.
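The effect of this finite precision can be illustrated with a small, self-contained Matlab snippet: any weight derivative smaller in magnitude than 2^-17 truncates to zero and leaves its weight unchanged. The numeric values are made up for illustration only.

% Truncating weight derivatives to 17 fractional bits (illustrative values).
frac_bits = 17;
step = 2^(-frac_bits);                 % smallest representable magnitude, about 7.6e-6

dw = [3.2e-4, 9.1e-6, 2.5e-6];         % example weight derivatives
dw_fixed = step * fix(dw / step);      % truncate toward zero to 17 fractional bits

disp(dw_fixed)                         % the last derivative underflows to 0,
                                       % so its weight is never updated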