How can I optimize this integer programming constraint problem without running out of memory?
An answer to this question on the Operations Research Stack Exchange.
Question
I am trying to run this constraint problem but the memory runs out, $$S_{i}$$ are 1975 Students that need to be assigned to one of 188 teacher assistants classes, each teacher assistant has to chose a time slot $$TA_{j}$$ from out of 8. Each Teacher assistant and Student have their expressed time slots in dfTA and dfS data frames.
The idea is to assign a teacher assistant to each student and a time slot to each teacher assistant. Of course all students in a class must be able to take that class as well as the teacher assistant to give it.
import constraint
problem = constraint.Problem()
for i in range(0,1974):
problem.addVariable(f'S_{i}', range(0,187))
for i in range(0,187):
problem.addVariable(f'TA_{i}', range(0,8))
for S in range(0,1974):
for TA in range(0,187):
exec(f"""def timezone{S}_{TA}(s,t):
if s!=TA:
return True
if s==TA and (dfS.iloc[S,1+t]>0)*(dfTA.iloc[TA,t]>0):
return True
else:
return False
problem.addConstraint(timezone{S}_{TA}, ['S_{S}','TA_{TA}'])""")
problem.getSolutions()
If anyone knows how to solve this or optimize this it would be of great use.
Link to a Colab notebook: https://colab.research.google.com/drive/1pb9qM13S2GmpjHAWIAEUCRwYwvyF68IT?usp=sharing
And data: https://drive.google.com/drive/folders/1J6yAfXIKcn0NZrT6xtluxt71nyhuR0ak?usp=sharing
Answer
You could try using Z3, like so:
#!/usr/bin/env python3
import itertools
import pandas as pd
import z3
#Read data
df = pd.read_csv('constraints.csv')
dfTA = pd.read_csv('ta_timezone.csv')
dfS = pd.read_csv('student_timezone.csv')
problem = z3.Solver()
number_of_students = 1974
number_of_tas = 187
number_of_timeslots = 9
student_to_ta = []
for i in range(number_of_students):
temp = z3.Int(f"S_{i}")
student_to_ta.append(temp)
problem.add(0<=temp)
problem.add(temp<number_of_tas)
#Assign TAs to timeslots
ta_to_timeslot = []
for i in range(number_of_tas):
temp = z3.Int(f"TA_{i}")
ta_to_timeslot.append(temp)
problem.add(0<=temp)
problem.add(temp<number_of_timeslots)
for t in range(number_of_timeslots): #TA must like this time slot
problem.add(z3.Implies(temp==t, bool(dfTA.iloc[i,t]>0)))
#Assign students to TAs
for s, ta, t in itertools.product(range(0,number_of_students), range(0,number_of_tas), range(number_of_timeslots)):
#If (student is assigned to TA and TA is assigned to this timeslot) then (student must like this time slot)
problem.add(z3.Implies(z3.And(student_to_ta[s]==ta, ta_to_timeslot[ta]==t), bool(dfS.iloc[s,1+t]>0)))
if problem.check()!=z3.sat:
print("Problem could not be solved!")
else:
solution = problem.model()
print("Students to TAs: ", [solution[x] for x in student_to_ta])
print("TAs to Timeslots: ", [solution[x] for x in ta_to_timeslot])
However, the problem is big and Python is slow, so this formulation may also be problematic. Moving to C++ might be a more robust option for a problem of this size. I'd suggest Julia, but its Z3 package is 5 years old and doesn't work any more.