AWS DeepRacer: An Educational Autonomous Racing Platform
AWS DeepRacer is a program, built around reinforcement learning, for developing autonomous driving applications. It helps users understand and implement reinforcement learning and other advanced machine learning techniques, and it makes the learning process more engaging.
DeepRacer is a small-scale racing-car development project introduced by AWS (Amazon Web Services) as an accessible way to study reinforcement learning. It is a 1:18-scale race car built around an Intel Atom processor and a camera, first introduced in September 2018 for users who want to learn and apply reinforcement learning in a hands-on way. Users are introduced to reinforcement learning through the DeepRacer console, where they can configure a model to their liking: they can supply custom reward-function code to achieve more accurate results, then train and evaluate the model in a virtual environment. The trained model can be uploaded to the physical car, letting the user observe its capabilities in the real world and refine it further.
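The custom code mentioned above takes the form of a Python reward function that the console calls on every step. A minimal sketch follows; the `params` keys used here (`track_width`, `distance_from_center`) are standard DeepRacer input parameters, and the band widths and reward values are illustrative choices, not prescribed ones.

```python
def reward_function(params):
    """Reward staying near the track's center line, in bands."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three widening bands around the center line, rewarded progressively less.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # far from center, likely off track

    return float(reward)
```

The function is evaluated against each camera frame's state during training, so its shape directly determines what driving behavior the model learns to prefer.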
AWS has also introduced the DeepRacer League, in which participants with intermediate to advanced development skills race their cars against the clock, striving for the shortest lap time. The league launched in 2018 alongside DeepRacer in order to test and compare the results of the training each user provides. Participants train their models, submit their best times to AWS, and the top entrants are invited to a final competition held at a separate event.
Unlike machine learning paradigms that learn a desired outcome from a preset dataset, reinforcement learning is based on data the agent gathers first hand by performing actions. Each action taken during training is acknowledged with a reward or penalty based on how well it served the goal, and the agent's objective is to maximize cumulative reward while incurring as few penalties as possible. This setting is formally represented as a Markov Decision Process (MDP): under action a, the probability of transitioning from state s to s′ is
P_a(s, s′) = Pr(s_{t+1} = s′ | s_t = s, a_t = a)
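The transition probability P_a(s, s′) can be made concrete with a toy example. The states, actions, probabilities, and rewards below are invented for illustration; the sketch only shows how sampling s′ from P_a(s, ·) and collecting a reward or penalty works.

```python
import random

# Toy MDP: P[(state, action)] maps each next state s' to P_a(s, s').
P = {
    ('on_track', 'steer_straight'):  {'on_track': 0.9, 'off_track': 0.1},
    ('on_track', 'steer_hard'):      {'on_track': 0.6, 'off_track': 0.4},
    ('off_track', 'steer_straight'): {'off_track': 1.0},
}

# Illustrative reward/penalty attached to the resulting state.
REWARD = {'on_track': 1.0, 'off_track': -1.0}

def step(state, action, rng=random.random):
    """Sample s' from P_a(s, .) and return (s', reward)."""
    r = rng()
    cumulative = 0.0
    for next_state, prob in P[(state, action)].items():
        cumulative += prob
        if r <= cumulative:
            return next_state, REWARD[next_state]
    # Fallback in case of floating-point round-off.
    return next_state, REWARD[next_state]
```

Repeatedly calling `step` and summing the rewards is exactly the experience-gathering loop that reinforcement learning optimizes over.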
DeepRacer follows the same approach: it trains itself to gain experience and progressively improve its results. Training uses an algorithm called Proximal Policy Optimization (PPO), which relies on two neural networks:
The critic network estimates the expected reward value obtained from taking the camera image as input and processing it.
The actor network, on the other hand, determines which action to take in a given situation in order to obtain the highest possible reward.
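The key idea that gives PPO its name is that actor updates are kept "proximal" to the old policy by clipping. A minimal sketch of the clipped surrogate objective for a single sample follows; the variable names and the default clip range of 0.2 are common conventions, not values taken from the source.

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """PPO's clipped surrogate objective for one (state, action) sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage -- how much better the action was than the critic's estimate
    epsilon   -- clip range; 0.2 is a commonly used default
    """
    # Clamp the ratio into [1 - epsilon, 1 + epsilon] ...
    clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    # ... and take the pessimistic (lower) of the two candidate objectives.
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum means the policy gains nothing from moving the action probability far beyond the clip range, which keeps each update step small and training stable.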
PPO also involves hyperparameters: variables used to tune the controls of the learning process, including how much experience is gathered and used in each learning step.
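For reference, the DeepRacer console exposes a set of PPO hyperparameters roughly like the following. The values shown are the console's typical defaults at the time of writing; they may change, so treat them as an assumption to verify rather than a specification.

```python
# Sketch of the PPO hyperparameters exposed in the DeepRacer console,
# with commonly cited default values (assumed, not guaranteed current).
hyperparameters = {
    "batch_size": 64,             # experiences used per gradient update
    "num_epochs": 10,             # passes over each batch of experience
    "learning_rate": 0.0003,      # step size of each gradient update
    "entropy": 0.01,              # exploration bonus weight
    "discount_factor": 0.999,     # how strongly future rewards count
    "loss_type": "huber",         # alternative: "mean squared error"
    "num_episodes_between_training": 20,  # experience gathered per update
}
```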
The first stage of DeepRacer training is called exploration, during which the car takes some random actions and observes the resulting reward values to build a better understanding of the environment. During this stage the car drives around and captures pictures from the attached camera at 10 to 15 fps (frames per second). The pictures are downscaled to 160x120 pixels and converted to grayscale before being fed to the neural networks, which are initialized in Amazon SageMaker; the network outputs control the car's steering and throttle. The OpenAI Gym interface connects the model car to the simulator, which displays the image as well. Simultaneously, the simulator updates the car's position and returns an updated image along with its corresponding reward. This experience data is stored in Redis and used to train the neural networks. AWS RoboMaker clones the models saved in Amazon S3, generating further experience data, and updated models are produced from the combined collection. After this training, the exploitation stage comes into the picture, in which the model makes use of everything it has learned and returns the best possible outcome.
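The image preprocessing described above (downscale to 160x120, convert to grayscale) can be sketched as follows. This is an assumed reconstruction using NumPy with a naive block-average resample and standard BT.601 luminance weights; the exact resampling used inside the real pipeline may differ.

```python
import numpy as np

def preprocess(frame):
    """Reduce an RGB camera frame to a 160x120 grayscale image.

    `frame` is an (H, W, 3) uint8 array whose dimensions are assumed to
    divide evenly into 120x160 (e.g. a 480x640 capture).
    """
    h, w, _ = frame.shape
    fh, fw = h // 120, w // 160

    # Grayscale via ITU-R BT.601 luminance weights.
    gray = frame @ np.array([0.299, 0.587, 0.114])

    # Naive block-average downsample to 120 rows x 160 columns.
    small = gray[:120 * fh, :160 * fw].reshape(120, fh, 160, fw).mean(axis=(1, 3))
    return np.rint(small).astype(np.uint8)
```

Shrinking and desaturating the frames like this cuts the network's input size dramatically, which keeps each training step cheap enough to run at the camera's frame rate.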