Foundation Prize – Deep Learning

For his scientific work on the subject of “Deep Learning in Image Recognition: Controlling Autonomous Mobile Robots with the Help of Neural Networks”, an employee of Organon Informationssysteme GmbH received the Friedrich Dessauer Foundation Award on February 2, 2018.

The extraordinary capabilities and achievements of the human brain are an enormous challenge for artificial intelligence research, as well as a constant incentive to understand how the brain works and to simulate it analogously on fast computer systems using artificial neural networks. Since the first mathematical model of the neuron in 1943, mathematical and programming concepts for artificial neural networks have been steadily developed. New artificial intelligence techniques such as machine learning (self-adaptive algorithms) and deep learning (a class of optimization methods for deep artificial neural networks), coupled with tremendous improvements in computer hardware, today enable successes and product innovations previously considered utopian or even impossible. One example, among others, is the tournament victory of the AlphaGo program against the world's best human Go players.

The development of mobile autonomous systems that navigate on the basis of image data has long been a major challenge because of the limited capabilities and flexibility of existing systems. These challenges were addressed through the development of new artificial intelligence techniques and the use of sophisticated and costly hardware resources, most of which were available only to a small circle of specialists with sufficient financial means.

Based on a practical application with a self-developed robotic vehicle and using standard hardware components, consisting of a consumer desktop with a graphics card and a Raspberry Pi, the scientific work shows how computers can autonomously learn to orient themselves and navigate in their environment through visual perception.

The construction and training of deep neural networks is very resource-intensive because of the large number of calculations involved, so correspondingly high-performance hardware is required. These calculations, which take place in the forward and backward passes for each node in the graph of a neural network, are highly parallelizable, which makes a powerful GPU (Graphics Processing Unit) particularly suitable. With its several hundred cores, a GPU has a distinct advantage over a current CPU with its 4 to 8 cores when processing parallel arithmetic tasks.
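The following minimal sketch illustrates this kind of device placement: it checks whether a GPU is visible and runs a large matrix multiplication on it. It assumes a current TensorFlow installation and its present-day API names, not the exact calls used in the 2018 work.

```python
import tensorflow as tf

# Minimal sketch: check whether a GPU is visible and run a large matrix
# multiplication on it. The many independent multiply-accumulate operations
# in such a product are exactly what a GPU parallelizes well.
gpus = tf.config.list_physical_devices('GPU')
device = '/GPU:0' if gpus else '/CPU:0'

with tf.device(device):
    a = tf.random.normal((1024, 1024))
    b = tf.random.normal((1024, 1024))
    c = tf.matmul(a, b)   # dense computation of the kind found in forward passes

print(f"ran on {device}, result shape: {c.shape}")
```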

Neural Learning Frameworks – Deep Learning Approach

There are already a number of frameworks for neural learning on graphics cards. For the practical implementation of the deep learning concepts introduced in this work, TensorFlow and the programming language Python were chosen because of their good documentation, the available source code, and the range of published models they cover.

For neural network learning on the available hardware, the chosen approach was to test already published neural network models with good recognition capabilities for their suitability for image recognition in dynamic systems and, after selection, to implement a solution for controlling the robotic vehicle developed in this work. To this end, both structurally unmodified models with their intended data sets and adapted models with own training data were tested for operability and performance on standard hardware. Finally, the theoretical concepts and models of deep learning in artificial neural networks were put into practice with the robotic vehicle.
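As a hedged illustration of this approach, a published model can be loaded with pretrained weights and its classification head replaced for a small custom data set. The Keras applications API, the two-class head, and the frozen feature extractor shown here are illustrative assumptions, not the model-zoo code of the original work.

```python
import tensorflow as tf

# Hedged sketch: reuse a published Inception model with pretrained weights
# and replace its classification head for a small custom data set
# (e.g. "ball" vs. "no ball"). The class count and the frozen feature
# extractor are illustrative assumptions.
base = tf.keras.applications.InceptionV3(weights='imagenet',
                                         include_top=False, pooling='avg')
base.trainable = False   # keep the published feature extractor unchanged

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation='softmax'),  # own training classes
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(...) would then be called with the own training data.
```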

AlexNet and Inception

With AlexNet and Inception, two of the best-performing architectures of recent years were selected and evaluated for the underlying work. To determine the position of the object in the image, in this case a soccer ball, and use it to control the robot's navigation, the image is subdivided into individual segments, each of which is then evaluated separately. The segment with the highest detection probability is assumed to contain the ball. If this segment, or a cluster of segments with high detection probability, lies in the right or left area of the image, the robotic vehicle steers in the corresponding direction; if the segments with relatively high detection probability cluster in the middle, no change of direction is initiated and the vehicle continues straight ahead. A good recognition rate was already achieved in tests with a segmentation into 12 image parts; at the robot's camera resolution of 640×480 pixels, each of these parts is 160×160 pixels in size.
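A minimal sketch of this segment-based steering logic could look as follows; the `ball_probability` callable stands in for the trained network, and the exact decision rule for the left, middle, and right areas is a hypothetical reading of the description above.

```python
import numpy as np

def steer_from_segments(frame, ball_probability):
    """Hedged sketch of the segment-based steering idea.

    frame: 480x640 RGB camera image as a NumPy array.
    ball_probability: hypothetical callable returning the network's
        detection probability for the ball in one 160x160 segment.
    Returns 'left', 'right' or 'straight'.
    """
    tile = 160
    probs = np.zeros((3, 4))              # 3 rows x 4 columns = 12 segments
    for row in range(3):
        for col in range(4):
            seg = frame[row * tile:(row + 1) * tile,
                        col * tile:(col + 1) * tile]
            probs[row, col] = ball_probability(seg)

    _, best_col = np.unravel_index(np.argmax(probs), probs.shape)
    if best_col == 0:                      # ball in the left image area
        return 'left'
    if best_col == 3:                      # ball in the right image area
        return 'right'
    return 'straight'                      # ball roughly centred: keep heading
```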

Object recognition and object localization

Object recognition and localization with an artificial neural network requires system and computing power that is challenging for mobile platforms available on the market. Since a TensorFlow port exists only for the Raspberry Pi 3 Model B, one of the most powerful single-board computers (SBCs), it is used in the project carried out here. The Raspberry Pi is the central unit of the hardware built for this work, and all of its components are controlled by Python programs.
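As a purely illustrative sketch of such a Python control program, the motor driver of the vehicle could be switched via the Raspberry Pi's GPIO pins; the pin numbers and the one-pin-per-side wiring assumed here are not taken from the original vehicle.

```python
import RPi.GPIO as GPIO

# Purely illustrative sketch: switch the motor driver of the vehicle from a
# steering command. The BCM pin numbers and the one-pin-per-side wiring are
# assumptions, not the original circuit.
LEFT_PIN, RIGHT_PIN = 17, 27

GPIO.setmode(GPIO.BCM)
GPIO.setup([LEFT_PIN, RIGHT_PIN], GPIO.OUT)

def drive(direction):
    # Turning is approximated by driving only the motor on one side.
    GPIO.output(LEFT_PIN, direction in ('straight', 'right'))
    GPIO.output(RIGHT_PIN, direction in ('straight', 'left'))
```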

Raspberry Pi as the hardware basis for the robot vehicle

Since the robotic vehicle is an in-house development, a detailed plan defining all components, their basic suitability, and their interaction for the intended mobile use with artificial neural networks had to be drawn up before the actual hardware was built. Processing a single image on the Raspberry Pi takes about 3 seconds, owing to its limited resources and in particular the inability to use its GPU for this task; since several image segments usually have to be evaluated per decision, this would make navigation far too slow. A client/server solution was therefore implemented in which the images are transferred to a powerful computer (a minimal sketch of this idea follows below). There, the 12 segments of an image are evaluated in less than 70 ms per segment, so that an image can be evaluated fast enough for smooth navigation.

The work was able to show that the deep learning methods used in image recognition with deep convolutional neural networks can give a robotic vehicle the ability to visually perceive objects in its environment in a way similar to humans, to orient itself in space on the basis of independently learned object patterns, and to navigate autonomously. The intended goal was achieved despite comparatively limited hardware resources. The simple practical application cannot hide the unimagined possibilities that self-learning methods in artificial neural networks open up for new applications.
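Returning to the client/server solution mentioned above, a minimal sketch of its client side on the Raspberry Pi could look like this; the server address and the length-prefixed message framing are illustrative assumptions, not the original protocol.

```python
import socket
import struct

# Hedged sketch of the client side on the Raspberry Pi: send a JPEG-encoded
# camera frame to the evaluation server and receive the steering command.
# The server address and the length-prefixed framing are illustrative
# assumptions, not the original protocol.
SERVER = ('192.168.0.10', 5000)   # hypothetical address of the GPU desktop

def request_direction(jpeg_bytes):
    with socket.create_connection(SERVER) as sock:
        sock.sendall(struct.pack('>I', len(jpeg_bytes)))  # 4-byte length prefix
        sock.sendall(jpeg_bytes)                          # then the image itself
        reply = sock.recv(16)                             # e.g. b'left' or b'straight'
    return reply.decode()
```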

Outlook for future developments

Examining the underlying concepts and models of deep learning in image recognition, whose success relies heavily on complex mathematical concepts from linear algebra and statistics, was a challenge and, thanks to its convincing ideas, deeply impressive. The application created with the robotic vehicle can certainly be expanded further for future uses, which remains of great interest.