Training a Smart Cab

This project trains a smart cab agent through reinforcement learning. Specifically, Q-learning is applied to find an optimal action-selection policy in a simulated driving environment. The simulated environment places the cab on a map, and the cab must drive to a destination. Smart cab performance is evaluated on two metrics: Safety and Reliability. The goal is to train the smart cab so that it achieves an A grade in both metrics. This project was completed as part of Udacity's Machine Learning Nanodegree. The smart cab agent, the simulated environment, and the results were all created in Python.

Safety and Reliability are measured using a letter-grade system as follows:

Grade | Safety | Reliability
------|--------|------------
A+ | Agent commits no traffic violations, and always chooses the correct action. | Agent reaches the destination in time for 100% of trips.
A | Agent commits few minor traffic violations, such as failing to move on a green light. | Agent reaches the destination on time for at least 90% of trips.
B | Agent commits frequent minor traffic violations, such as failing to move on a green light. | Agent reaches the destination on time for at least 80% of trips.
C | Agent commits at least one major traffic violation, such as driving through a red light. | Agent reaches the destination on time for at least 70% of trips.
D | Agent causes at least one minor accident, such as turning left on green with oncoming traffic. | Agent reaches the destination on time for at least 60% of trips.
F | Agent causes at least one major accident, such as driving through a red light with cross-traffic. | Agent fails to reach the destination on time for at least 60% of trips.
In [2]:
# Import the visualization code
import visuals as vs

# Pretty display for notebooks
%matplotlib inline

Implement a Basic Driving Agent

Before implementing Q-Learning, I first created a smart cab agent that takes purely random actions. The agent can choose among four actions: none, left, right, or forward. The graphs below show the results of the smart cab taking random actions in the environment.
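As a rough illustration, such a random policy can be sketched in a few lines of Python; the action list and the function name here are illustrative placeholders, not the project's actual API.

import random

# The four actions available to the agent (None means stay idle).
# The list and the function name are illustrative placeholders.
VALID_ACTIONS = [None, 'forward', 'left', 'right']

def choose_random_action():
    """Pick an action uniformly at random, ignoring the current state."""
    return random.choice(VALID_ACTIONS)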

In [5]:
# Load the 'sim_no-learning' log file from the initial simulation results
vs.plot_trials('sim_no-learning.csv')

The top-left graph shows the relative frequency of bad actions and violations as a function of the number of trials. On average, the smart cab agent is making bad decisions around 45% of the time, with a minor and major accident frequency of 6%.

The bottom-left graph shows the rate of reliability over the number of trials. The reliability rate stays relatively flat at around 20%, which is expected because the smart cab agent is choosing actions at random.

The top-right graph shows the average reward the agent is receiving. The average reward is roughly negative 5, which makes sense given that the smart cab agent is performing bad actions about 45% of the time. It also shows that the agent is penalized more heavily for wrong actions than it is rewarded for correct ones.

Overall, the results do not change as the number of trials increases because the smart cab agent is not learning; it is only taking random actions.


Define the Environment States

To make the smart cab agent start learning, I first defined, in my Python code, the states the smart cab can encounter. In each state, the smart cab considers several features relevant to improving safety and reliability. For safety, the most relevant features are the traffic light, the oncoming traffic direction, and the left traffic direction. For reliability, the relevant feature is the next waypoint direction, which tells the smart cab which way it should go next to reach the destination. These features produce roughly 100 distinct states.
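A minimal sketch of how such a state could be assembled, assuming the planner supplies the next waypoint and the sensors return a dict with 'light', 'oncoming', and 'left' entries; the function and argument names are illustrative, not the project's actual API.

def build_state(waypoint, inputs):
    """Combine the next waypoint with the sensed intersection into a state tuple.

    waypoint is 'forward', 'left', or 'right'; inputs holds the traffic light
    color and the headings of oncoming and left traffic (illustrative names).
    """
    return (waypoint, inputs['light'], inputs['oncoming'], inputs['left'])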


Implement Q-Learning

After defining the states, I implemented Q-Learning. Q-Learning involves creating a Q-table with an entry for each possible environment state. Each time the smart cab agent takes an action, the corresponding Q-table entry is updated with the positive or negative reward received. By exploring its environment, the smart cab agent gradually fills in the Q-table. Once the smart cab is finished exploring, it acts on what it has learned, choosing the action with the highest Q-value for its current state. For the first implementation, I used a basic linear decay of the exploration factor epsilon to control how long the agent explores (a code sketch follows the equation below):

$$ \epsilon_{t+1} = \epsilon_{t} - 0.05, \hspace{10px}\textrm{for trial number } t$$
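Below is a minimal sketch of the Q-table update and the linear decay, assuming a dict-of-dicts Q-table and omitting any discounting of future rewards; the function names and the 0.5 learning rate are illustrative, not the project's actual API.

def update_q(Q, state, action, reward, alpha=0.5):
    """Blend the observed reward into the Q-value for the (state, action) pair just taken."""
    old = Q.setdefault(state, {}).setdefault(action, 0.0)
    Q[state][action] = (1 - alpha) * old + alpha * reward

def decay_epsilon_linear(epsilon):
    """Linear decay: subtract 0.05 after each trial, never dropping below zero."""
    return max(0.0, epsilon - 0.05)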

Below are the visual results after implementing the first iteration of Q-Learning using a basic linear decay factor.

In [6]:
# Load the 'sim_default-learning' file from the default Q-Learning simulation
vs.plot_trials('sim_default-learning.csv')

Implementing basic Q-Learning did not improve the safety and reliability ratings; however, the smart cab agent's decisions did improve. In the top-left graph, the agent performs better as the trial number increases: the frequencies of bad actions and accidents decrease. This shows that the agent is learning from its initial mistakes rather than repeating them.

In the bottom-left graph, the rate of reliability rises sharply starting around trial 14, reaching 60%. Twenty trials were not enough to push reliability higher, but this is a clear improvement over the previous simulation's 20% reliability.

In the top-right graph, the rewards also improved as the smart cab began receiving more positive rewards. By the end of the 20 trials, the average reward was around negative one point instead of negative four.

Overall, the safety and reliability ratings still received an F in this case, but the graphs do show an improvement in the driving agent.

Improving the Simulation Results

To improve my simulation results, I changed the Q-Learning parameters. The changes I made were trying different decay functions, adjusting the learning rate, and adjusting the exploration factor. The decay functions I experimented with were the following:

$$ \epsilon = a^t, \textrm{for } 0 < a < 1 \hspace{50px}\epsilon = \frac{1}{t^2}\hspace{50px}\epsilon = e^{-at}, \textrm{for } 0 < a < 1 \hspace{50px} \epsilon = \cos(at), \textrm{for } 0 < a < 1$$

These updates allowed the smart cab agent to explore more of the simulation environment and fill in its Q-table more thoroughly, so it could make better decisions.
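As an illustration, these decay schedules could be written as simple functions of the trial number t; the constant a, its default values, and the function names below are placeholders rather than the parameters I actually used.

import math

def epsilon_power(t, a=0.95):
    """epsilon = a^t, for 0 < a < 1."""
    return a ** t

def epsilon_inverse_square(t):
    """epsilon = 1 / t^2, for trial number t >= 1."""
    return 1.0 / (t * t)

def epsilon_exponential(t, a=0.01):
    """epsilon = e^(-a * t), for 0 < a < 1."""
    return math.exp(-a * t)

def epsilon_cosine(t, a=0.01):
    """epsilon = cos(a * t), for 0 < a < 1, floored at zero once the curve goes negative."""
    return max(0.0, math.cos(a * t))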

In [17]:
# Load the 'sim_improved-learning' file from the improved Q-Learning simulation
vs.plot_trials('sim_improved-learning.csv')

By updating my Q-Learning algorithm for the smart cab agent, I improved the simulation results. I found that the cosine epsilon decay function produced the largest initial improvement. I also adjusted the learning rate (alpha) and the exploration factor. These changes let the smart cab agent explore the different states over many more trials and converge on the best Q-values for each state; the agent performed approximately 1,500 trials to learn its environment.

Overall, these changes earned my smart cab an A+ safety rating and an A reliability rating.