How can I optimize this integer programming constraint problem without running out of memory?

2020-06-28

An answer to this question on the Operations Research Stack Exchange.

Question

I am trying to run this constraint problem but the memory runs out, $$S_{i}$$ are 1975 Students that need to be assigned to one of 188 teacher assistants classes, each teacher assistant has to chose a time slot $$TA_{j}$$ from out of 8. Each Teacher assistant and Student have their expressed time slots in dfTA and dfS data frames.

The idea is to assign a teacher assistant to each student and a time slot to each teacher assistant. Of course all students in a class must be able to take that class as well as the teacher assistant to give it.

import constraint
problem = constraint.Problem()
for i in range(0,1974):
    problem.addVariable(f'S_{i}', range(0,187))
for i in range(0,187):
    problem.addVariable(f'TA_{i}', range(0,8))
for S in range(0,1974):
    for TA in range(0,187):
        exec(f"""def timezone{S}_{TA}(s,t):
                if s!=TA:
                    return True
                if s==TA and (dfS.iloc[S,1+t]>0)*(dfTA.iloc[TA,t]>0):
                    return True
                else:
                    return False
problem.addConstraint(timezone{S}_{TA}, ['S_{S}','TA_{TA}'])""")
problem.getSolutions()

If anyone knows how to solve this or optimize this it would be of great use.

Link to a Colab notebook: https://colab.research.google.com/drive/1pb9qM13S2GmpjHAWIAEUCRwYwvyF68IT?usp=sharing

And data: https://drive.google.com/drive/folders/1J6yAfXIKcn0NZrT6xtluxt71nyhuR0ak?usp=sharing

Answer

You could try using Z3, like so:

#!/usr/bin/env python3
import itertools
import pandas as pd
import z3
#Read data
df   = pd.read_csv('constraints.csv')
dfTA = pd.read_csv('ta_timezone.csv')
dfS  = pd.read_csv('student_timezone.csv')
problem = z3.Solver()
number_of_students  = 1974
number_of_tas       = 187
number_of_timeslots = 9
student_to_ta = []
for i in range(number_of_students):
  temp = z3.Int(f"S_{i}")
  student_to_ta.append(temp)
  problem.add(0<=temp)
  problem.add(temp<number_of_tas)
#Assign TAs to timeslots
ta_to_timeslot = []
for i in range(number_of_tas):
  temp = z3.Int(f"TA_{i}")
  ta_to_timeslot.append(temp)
  problem.add(0<=temp)
  problem.add(temp<number_of_timeslots)
  for t in range(number_of_timeslots): #TA must like this time slot
    problem.add(z3.Implies(temp==t, bool(dfTA.iloc[i,t]>0)))
#Assign students to TAs
for s, ta, t in itertools.product(range(0,number_of_students), range(0,number_of_tas), range(number_of_timeslots)):
  #If (student is assigned to TA and TA is assigned to this timeslot) then (student must like this time slot)
  problem.add(z3.Implies(z3.And(student_to_ta[s]==ta, ta_to_timeslot[ta]==t), bool(dfS.iloc[s,1+t]>0)))
if problem.check()!=z3.sat:
  print("Problem could not be solved!")
else:
  solution = problem.model()
  print("Students to TAs: ", [solution[x] for x in student_to_ta])
  print("TAs to Timeslots: ", [solution[x] for x in ta_to_timeslot])

However, the problem is big and Python is slow, so this formulation may also be problematic. Moving to C++ might be a more robust option for a problem of this size. I'd suggest Julia, but its Z3 package is 5 years old and doesn't work any more.