This article is an intermediate-level tutorial on using the GNU Linear Programming Kit (GLPK) to solve a real-world scheduling problem. I wrote it, because I found only few good resources online that show specific solution strategies. This article wants to demystify linear programming and help you to start from a working example.

Let’s first visit the problem. Scheduling outings for a rowing club can be tedious – and sometimes quite hard. Given a list of squad members and their availabilities, we want to create outings with exactly 1 coach, 1 cox, and 4 people for each side of the boat (bow/stroke). Rowing is sometimes duped the “ultimate team sport”, because even one missing person means that the entire crew cannot go out. When faced with this challenge during term break where many people are only available sporadically, I gave up after trying to do so manually and resorted to automating it. In turn, this allowed me to do more “bespoke” scheduling where I could offer everyone to provide fine-grained availability and preferences.

How can GLPK help us here? In linear programming we describe problems using a mathematical model. These models have parameters (data we provide), variables (solutions we search), constraints, and one objective. All these can be provided either through an API or, as we do here, by writing them in the GNU MathProg modelling language. Afterwards we can simply call the solver which returns an optimal solution – or an error if our constraints do not allow for one. Hence, our complex real-world problem is reduced to formalising it in a few equations.

We divide our MathProg program into three parts: parameter and variable declarations; constraints and objective; and our data.

Parameters and Variables

Let’s start by declaring which information we provide about the squad. For this we first define r as the number of people which form the set R and are simply numbered from 1 to r. While it is also possible to use descriptive atoms in the set, I found it easier to do the mapping between numbers and names in my code that calls into GLPK.

param r, integer, > 0;
set R := 1..r;

For each member of the squad we indicate which roles they can fulfil. Also, everyone can provide a lower and upper bound for the number of outings they prefer.

/* roles a squad member can fulfil */
param coach{i in R}, binary;
param cox{i in R}, binary; 
param stroke{i in R}, binary;
param bow{i in R}, binary;

/* boundaries for number of outings */
param lower{i in R}, integer;
param upper{i in R}, integer, >= 0;

Similar to the squad we define a set O for all possible outing dates and times. The matrix a contains a 1 where a squad member is available for an outing.

param o, integer, > 0;
set O := 1..o;

/* availability (1 iff member `i` can row in outing `j`, else 0) */
param a{i in R, j in O}, binary;

These are all parameters – aka the data that we will provide as input. Now we define the variables that the solver shall figure out. Matrix x describes the assignments from crew members to outings. In addition we define z as a helper variable for all outings/columns that are happening. We’ll see soon why this is handy.

/* Actual assignments: x[i,j] = 1 means rower i is assigned to outing j */
var x{i in R, j in O}, binary;

/* Outings that are active */
var z{j in O}, binary;

Constraints and Objective

With all the setup done, we can now link parameters and variables using constraints. Otherwise, any output would be a valid solution. As the name Linear Programming suggests, these constraints must be linear (in-)equalities. However, one can model surprisingly much with these. We start by making sure that the solver looks for a solution such that (s.t.) squad members are assigned only when they are available and s.t. they do not fewer or more outings than they prefer to.

s.t. ONLY_WHEN_AVAILABLE{i in R, j in O}: x[i, j], <= a[i, j];

s.t. OUTINGS_LOWER{i in R}: sum{j in O} x[i, j], >= lower[i];
s.t. OUTINGS_UPPER{i in R}: sum{j in O} x[i, j], <= upper[i];

Then we look at the individual outings (the columns of our matrix). We use the following to ensure that squad members with the right roles are selected for the outings. Here our helper variable z comes in handy. Note that we use inequalities for the stroke/bow-side assignments. This is because the squad members, who can do both sides, are marked as 1 for both and we would otherwise count them twice.

s.t. CREW_HAS_TEN_IN_TOTAL{j in O}: sum{i in R} x[i,j] , = 10*z[j];
s.t. CREW_HAS_ONE_COACH{j in O}: sum{i in R} x[i,j] * coach[i], = 1*z[j];
s.t. CREW_HAS_ONE_COX{j in O}: sum{i in R} x[i,j] * cox[i], = 1*z[j];
s.t. CREW_HAS_FOUR_STROKE{j in O}: sum{i in R} x[i,j] * stroke[i], >= 4*z[j];
s.t. CREw_HAS_FOUR_BOW{j in O}: sum{i in R} x[i,j] * bow[i], >= 4*z[j];

Finally, we specify our objective to maximise the total number of outings. This is followed by the execution of the solver and writing our solution x into a .csv file:

maximize many_outings: sum {j in O} z[j];
solve;
table tbl{(j, i) in {O, R}: x[i,j] = 1} OUT "CSV" "sol_x.csv": j, i;

This is really all there is to writing a model. The only thing missing is the data.

Data

In my project I use a Python script generate data.mod file from a Google spreadsheet. The following is a small made-up example:

data;
param r := 11;
param o := 3;
param coach :=  1 1, 2 0, 3 0, 4 0, 5 0, 6 0, 7 0, 8 0, 9 0, 10 0 11 0;
param cox :=    1 0, 2 1, 3 0, 4 0, 5 0, 6 0, 7 0, 8 0, 9 0, 10 0 11 0;
param stroke := 1 0, 2 0, 3 1, 4 1, 5 1, 6 1, 7 1, 8 0, 9 0, 10 0 11 1;
param bow :=    1 0, 2 0, 3 0, 4 0, 5 0, 6 0, 7 1, 8 1, 9 1, 10 1 11 1;

param lower := 1 0, 2 0, 3 0, 4 1, 5 0, 6 0, 7 0, 8 0, 9 0, 10 0, 11 1;
param upper := 1 4, 2 5, 3 3, 4 4, 5 4, 6 2, 7 1, 8 2, 9 3, 10 4, 11 4;

param a : 1  2  3 :=
      1   0  1  1 
      2   1  1  1 
      3   1  1  1 
      4   1  1  1 
      5   1  1  1 
      6   1  1  1 
      7   1  1  1 
      8   1  1  1 
      9   1  1  1 
      10  1  1  1 
      11  1  1  1;
end;

Now, we combine our model and data, and run them through the solver:

$ cat header.mod data.mod > input.mod && glpsol --model input.mod
GLPSOL: GLPK LP/MIP Solver, v4.65
[...]
Model has been successfully generated
GLPK Integer Optimizer, v4.65
71 rows, 36 columns, 189 non-zeros
36 integer variables, all of which are binary
[...]
Solving LP relaxation...
GLPK Simplex Optimizer, v4.65
13 rows, 24 columns, 64 non-zeros
      0: obj =  -0.000000000e+00 inf =   2.000e+00 (2)
      2: obj =   2.500000000e-01 inf =   0.000e+00 (0)
*    22: obj =   2.000000000e+00 inf =   0.000e+00 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
Long-step dual simplex will be used
+    22: mip =     not found yet <=              +inf        (1; 0)
+    22: >>>>>   2.000000000e+00 <=   2.000000000e+00   0.0% (1; 0)
+    22: mip =   2.000000000e+00 <=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used:   0.0 secs
Memory used: 0.2 Mb (230005 bytes)

Discussion

We arrive at one optimal solution very quickly. However, I found that with the simple constraints we used, there is still fine-tuning necessary. For instance, many people would be assigned multiple days in a row followed by multiple days of pausing. Similarly, the number of outings of each person was often exactly their lower or upper bound. Nevertheless, fixing these manually by swapping people around is much easier when starting from a valid solution.

For a more sophisticated automated solution, this linear solver can be used to create an initial population of valid solutions. Then we can feed these into an Evolutionary Algorithm (or similar) where we further optimise using a non-linear fitness function. This is especially handy, as Evolutionary Algorithms can struggle to find any valid solutions to start with when facing many strict constraints.

If you are interested in more background and examples, IBM had a great tutorial series on GLPK. While it seems to have gone lost when they updated their website, there are copies available on the Wayback Machine:

Credits: (C) cover photo by me.