RL vs Optimal Control: LQR for Trajectory Tracking (With Python Code)

Kowshik chilamkurthy
6 min readNov 4, 2021

--

Left: source, Right: Author Generated

Introduction

In this blog series, we will learn about classical methods in optimal control which in someway laid a solid foundation for more familiar topics like reinforcement learning. There is an inevitable common boundary between these two fields and this series is aimed to propose these formal methods in optimal control as an effective alternative to the Reinforcement Learning methods. Optimal control or RL aims to find the best way to act/control in sequential manner under some constraints and under constantly changing dynamics of the system.

Optimal Control

Fig 1: Waypoints to follow, Author Generated

Before we jump into the topic, let’s take a simple motivation example which helps in formalising different terms in optimal control methods.
Trajectory Optimisation: Your car is given a task to control itself in such a way that it follows a defined waypoints or trajectory.

If 𝓾ᵢ is the control of car at time step i

𝓾ᵢ= { throttle, Steering }

So the car needs to apply series of control steps U: 𝓾₀,𝓾₁, 𝓾₂ …𝓾n and reach all the waypoints as shown in the Fig 1. This sequential nature of the problem is what makes this problem very different from the rest and these problems like these can be easily solved using Optimal Control

We need to more formal definition to give structure to this method and also to our intuition.

Formalism

The task is finding an admissible control trajectory U generating the corresponding state trajectory X using the dynamic systems f such that the cost functional J(U) is minimised.

Don’t be intimidated by the sudden influx of jargon, they are very intuitive if break the definition and understand each term using our motivation example.
1. Control Trajectory (U): Control 𝓾ᵢ is the input at time step i and the series are these control steps(Throttle, Steering) will help the car to reach its goal of passing through all waypoints.
2. State Trajectory (X): State 𝒙ᵢ is the description of present scene. For example: “State” of our car comprises current position (e.g. X, Y coordinate) and orientation (yaw γ) of the robot in the world. This state is obviously impacted by changes to the control inputs.

3. Dynamic System (f): This is just a mathematical model which tells how the world works. In Simple words, this math model f predicts the next state 𝒙ᵢ+₁ when taken an action 𝓾ᵢ from a state 𝒙ᵢ.

In the motivation car example, we can use simple kinematic equations to mathematically pin down the system dynamics f

4. cost functional (J): Optimal control is all about the cost function. It helps us mathematically describe the goal/task of the sequential problem statement. Optimal control methods calculate an optimal trajectory U by optimising a cost function J(U).

In the trajectory problem defined in the beginning, the goal is to reach the defined waypoints. We can define the cost function as the the cosine distance between the car position and the next waypoint, thus as we minimise the cosine distance cost we get closer to the next waypoint.

Linear Quadratic Regulator (LQR)

A special case of optimal control problem where the the dynamic equation f is linear and the objective function is a quadratic function of x and u.
This subproblem is very fundamental amongst optimal control because there exists an analytical solution to the LQR problem using the differential Ricatti equation.

Let’s quickly define system dynamics function (f) , Cost function J(U) for LQR and we will solve the car trajectory optimisation problem by coding LQR solution in python.

So the idea is simple, the dynamics equation is linear and we use Matrices A and B to describe linearity. The cost function J(U) is less obvious: it is a quadratic function of x and u,

Q(t) is called the state cost matrix. Q helps us weigh the relative importance of each state in the state vector. Q penalises bad performance. Q enables us to target states where we want low error by making the corresponding value of Q large. R(t) is the input cost matrix. This matrix penalises control.

Solution: Algebraic Riccati Equation

We will briefly discuss the analytical solution to the LQR problem. It is one simplification which lead to the effective solution: control law can be expressed as a linear state feedback

R is input cost matrix and B is input matrix. Now S(t) is the solution to the differential Ricatti equation:

The solution to this can be found using backward induction and we will directly jump to the final solution:

Trajectory Tracking Problem: LQR

Now we will implement the car trajectory tracking problem using LQR and python. To define a problem in optimal control paradigm, we need to define state, control, system dynamics and cost function.

We defined state vector while introducing the problem itself.

Next we need to define the system dynamics equation which is linear.

The control 𝓾ᵢ comprised velocity and steering input. Now we can use a simple case of frictionless kinematic equations to approximate car movement in the world.

Next is to use matrix manipulations to find A and B matrices

Q penalises large differences between where you want the car to be vs where it is right now. Q enables us to target states where we want low error by making the corresponding value of Q large. Q has positive values along the diagonal and zeros elsewhere.

The code is available in this repo: open ipython to give different waypoints and test the analytical solution yourselves.

Results

Author Generated

This is simple problem, where the time steps are kept large that means that agent can cover larger distance is single time steps. The waypoints are kept perpendicular to each other so the car takes semi circular trajectory from one waypoint to other.

Author Generated

We gave more smoother trajectory where the waypoints are kept very close. The LQR worked really effectively.

Let’s give more complex trajectory and observe different control parameters

Author Generated

We can observe here that the trajectory is matching very closely.

Author Generated

Thanks for your time !

--

--

Kowshik chilamkurthy
Kowshik chilamkurthy

Written by Kowshik chilamkurthy

RL | ML | ALGO TRADING | TRANSPORTATION | GAME THEORY

Responses (1)