How to stack N boxes of varying heights into M stacks, most evenly
An answer to this question on the Scientific Computing Stack Exchange.
Question
The "standard" box-stacking algorithm(s) AFAIK assume a single stack and try to put the "largest" boxes on the bottom. The case I want to solve is simply to distribute the N boxes over (all) M stacks, such that the stacks are all approximately the same height (to the extent possible). The specific order of the boxes is not relevant. The boxes vary in height over a range {0.25, 1.00}. At first, I thought some "binomial" approach might be useful, but can't seem to get my head around it. Appreciate any ideas!
EDIT: @PC1 - Among multiple solutions, "a" would be "better" than "b" if the average difference in stack heights was smaller.
@nicoguaro- Minimizing std dev sounds promising! I'll test that!
Answer
Overview
An optimization problem like this has almost certainly been studied before, so one of the first lines of attack is to think of all the different ways your problem could be expressed and use those to search for prior research on the subject.
Your problem is equivalent to multiway number partitioning, which asks how a multiset of numbers $S$ can be partitioned into $k$ sets such that the sums of the sets are as close to each other as possible. Your problem is also related to a number of machine scheduling problems.
As you and the commenters have discussed, there are a few ways to define "as close as possible". For instance, you could:
- Minimize the difference between the largest and smallest sum
- Minimize the largest sum
- Maximize the smallest sum
These three objective functions give different values for $k\ge 3$, but each is a reasonable formalization of what you're looking for.
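The three objectives can be compared by brute force on a toy instance (the five heights and $k=3$ below are my own example, not the question's data): enumerate every assignment of numbers to sets and record the best achievable value of each objective.

```python
from itertools import product

# Toy instance: five heights to split into k = 3 stacks
numbers = [5, 5, 4, 3, 3]
k = 3

best = {"diff": None, "max": None, "min": None}
for assignment in product(range(k), repeat=len(numbers)):
    sums = [0.0] * k
    for n, b in zip(numbers, assignment):
        sums[b] += n
    # Track the best value of each objective over all partitions
    if best["diff"] is None or max(sums) - min(sums) < best["diff"]:
        best["diff"] = max(sums) - min(sums)
    if best["max"] is None or max(sums) < best["max"]:
        best["max"] = max(sums)
    if best["min"] is None or min(sums) > best["min"]:
        best["min"] = min(sums)

print(best)  # {'diff': 3.0, 'max': 8.0, 'min': 5.0}
```

The partitions achieving each optimum need not coincide: $\{5,3\},\{5,3\},\{4\}$ minimizes the largest sum (8) but has smallest sum 4, while $\{5\},\{5\},\{4,3,3\}$ maximizes the smallest sum (5) at the cost of a largest sum of 10.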
For all three objective functions the problem is NP-hard. Loosely speaking, this means no known algorithm can guarantee an optimal solution without, in the worst case, doing an amount of work that grows exponentially with the size of the input.
NP-hard problems are often approached by looking for approximate answers with provable quality guarantees, because searching the entire solution space for the true best solution would take far too long.
Several algorithms give quick, approximate answers to your problem.
Approximate Solutions
List Scheduling
List scheduling, an example of greedy number partitioning, iterates over the numbers in $S$ and assigns each one to whichever set currently has the smallest sum. An implementation might look like this:
#!/usr/bin/env python3
from typing import List

import numpy as np

np.random.seed(123456)

num_sets = 5
numbers = np.random.random(size=100)

sets: List[List[float]] = [[] for _ in range(num_sets)]
set_sums = np.array([0.0 for _ in range(num_sets)])
for n in numbers:
    # Add each number to whichever set currently has the smallest sum
    smallest_set = np.argmin(set_sums)
    sets[smallest_set].append(n)
    set_sums[smallest_set] += n

print(set_sums)
Note that this solution is "online" - you can run it continuously without knowing what the next number in $S$ will be.
This algorithm produces a solution whose largest sum is at most $2-\frac{1}{k}$ times the optimal largest sum.
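This $2-\frac{1}{k}$ bound is tight. A classic adversarial instance (constructed here for $k=3$; it is not from the question's data) is $k(k-1)$ unit-size boxes followed by a single box of size $k$: list scheduling spreads the units evenly and then stacks the big box on top of one of them, giving a largest sum of $2k-1$, while the optimum places the big box alone for a largest sum of $k$.

```python
import numpy as np

k = 3
# k*(k-1) unit jobs, then one job of size k, in this adversarial order
numbers = [1.0] * (k * (k - 1)) + [float(k)]

set_sums = np.zeros(k)
for n in numbers:
    # Greedy list scheduling: smallest stack gets the next box
    set_sums[np.argmin(set_sums)] += n

print(set_sums.max())  # 2k - 1 = 5, versus the optimal largest sum of k = 3
```

The ratio $(2k-1)/k = 2-\frac{1}{k}$ matches the worst-case guarantee exactly.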
Longest Processing Time
Longest-processing-time-first (LPT) is similar to the above, except the numbers are sorted in descending order before being added to the sets. Because sorting is required, the algorithm is no longer "online". An implementation might look like this:
#!/usr/bin/env python3
from typing import List

import numpy as np

np.random.seed(123456)

num_sets = 5
numbers = np.random.random(size=100)
numbers = -np.sort(-numbers)  # Reverse-sort the array

sets: List[List[float]] = [[] for _ in range(num_sets)]
set_sums = np.array([0.0 for _ in range(num_sets)])
for n in numbers:
    # Add each number, largest first, to the set with the smallest sum
    smallest_set = np.argmin(set_sums)
    sets[smallest_set].append(n)
    set_sums[smallest_set] += n

print(set_sums)
This algorithm produces a solution whose largest sum is at most $\frac{4}{3}-\frac{1}{3k}$ times the optimal largest sum, and whose smallest sum is at least $\frac{3k-1}{4k-2}$ times the optimal smallest sum. For i.i.d. values the bound may be better, $1+O(\log \log n / n)$, but you should verify this against Frenk and Rinnooy Kan (1986, link).
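The $\frac{4}{3}$-type bound is tight as well. On a standard worst-case instance (again my own toy data, not the question's): with $k=2$ and job sizes $3,3,2,2,2$, LPT reaches a largest sum of $7$ while the optimum $\{3,3\},\{2,2,2\}$ reaches $6$, a ratio of $7/6 = \frac{4}{3}-\frac{1}{3\cdot 2}$.

```python
import numpy as np

numbers = np.array([3.0, 3.0, 2.0, 2.0, 2.0])  # already in descending order
k = 2

set_sums = np.zeros(k)
for n in numbers:
    # LPT: largest remaining job goes on the currently smallest stack
    set_sums[np.argmin(set_sums)] += n

print(set_sums.max())  # LPT gives 7; the optimum {3,3},{2,2,2} gives 6
```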
SLACK
Della Croce and Scatamacchia (2018) introduce a heuristic they call SLACK for solving the problem. Though I haven't found a proven bound for it, they claim it outperforms LPT on test instances. The idea is to sort the numbers in decreasing order, split them into $k$ tuples, order those tuples by the difference between the maximum and minimum values of their entries, concatenate the tuples, and then use list scheduling to get the final solution. An implementation might look like this:
#!/usr/bin/env python3
from typing import List

import numpy as np

np.random.seed(123456)

num_sets = 5
numbers = np.random.random(size=100)
numbers = -np.sort(-numbers)  # Reverse-sort array

# Split the sorted numbers into tuples and order the tuples by the spread
# between their largest and smallest entries (undersized tuples sort by
# their maximum)
expected_size = int(np.ceil(len(numbers) / num_sets))
numbers_split = np.array_split(numbers, num_sets)
numbers_split.sort(
    key=lambda x: np.max(x) - np.min(x) if len(x) == expected_size else np.max(x),
    reverse=True,
)
numbers_comb = np.hstack(numbers_split)

# List scheduling on the reordered numbers
sets: List[List[float]] = [[] for _ in range(num_sets)]
set_sums = np.array([0.0 for _ in range(num_sets)])
for n in numbers_comb:
    smallest_set = np.argmin(set_sums)
    sets[smallest_set].append(n)
    set_sums[smallest_set] += n

print(set_sums)
Karmarkar-Karp
The Karmarkar-Karp algorithm is known to perform no worse, and often better, than LPT in simulations where numbers are drawn uniformly from the range $[0,1]$. If $k+2\le n \le 2k$ then the largest sum it produces is at most $\frac{4}{3}-\frac{1}{3(n-k-1)}$ times the optimal largest sum. In all cases, the largest sum is at most $\frac{4}{3}-\frac{1}{3k}$ times the optimal largest sum.
The algorithm proceeds in several steps:
1. For each number $i$ in $S$, construct a $k$-tuple of subsets, in which one subset is $\{i\}$ and the other $k-1$ subsets are empty.
2. At each iteration, select the two $k$-tuples $A$ and $B$ in which the difference between the maximum and minimum subset sums is largest, and combine them in reverse order of sizes, i.e.: smallest subset in $A$ with largest subset in $B$, second-smallest in $A$ with second-largest in $B$, and so on.
3. Repeat Step 2 until a single $k$-tuple remains.
An implementation might look like the following:
#!/usr/bin/env python3
import heapq
from itertools import count
from typing import List, Tuple

import numpy as np

Partition = List[List[float]]

def KarmarkarKarp(
    numbers: List[float], num_sets: int
) -> Tuple[Partition, List[float]]:
    def argsort(seq: List[float]) -> List[int]:
        return sorted(range(len(seq)), key=seq.__getitem__)

    # Max-heap of k-tuples keyed on the (negated) difference between their
    # largest and smallest subset sums; heapq is a min-heap, hence the sign
    partitions: List[Tuple[float, int, Partition, List[float]]] = []
    heap_count = count()  # Tie-breaker to avoid comparing partitions

    # Step 1: one k-tuple per number, with one singleton subset
    for i in range(len(numbers)):
        this_partition: Partition = [[] for _ in range(num_sets - 1)]
        this_partition.append([numbers[i]])
        this_sizes: List[float] = [0.0] * (num_sets - 1) + [numbers[i]]
        heapq.heappush(
            partitions, (-numbers[i], next(heap_count), this_partition, this_sizes)
        )

    # Step 2: repeatedly merge the two k-tuples with the largest differences,
    # pairing the smallest subset of one with the largest subset of the other
    for _ in range(len(numbers) - 1):
        _, _, p1, p1_sum = heapq.heappop(partitions)
        _, _, p2, p2_sum = heapq.heappop(partitions)
        new_sizes: List[float] = [
            p1_sum[j] + p2_sum[num_sets - j - 1] for j in range(num_sets)
        ]
        new_partition: Partition = [
            p1[j] + p2[num_sets - j - 1] for j in range(num_sets)
        ]
        # Keep subsets ordered by their sums
        indices = argsort(new_sizes)
        new_sizes = [new_sizes[i] for i in indices]
        new_partition = [new_partition[i] for i in indices]
        diff = new_sizes[-1] - new_sizes[0]
        heapq.heappush(partitions, (-diff, next(heap_count), new_partition, new_sizes))

    _, _, final_partition, final_sums = partitions[0]
    return final_partition, final_sums

np.random.seed(123456)
num_sets = 5
numbers = np.random.random(size=100)
sets, set_sums = KarmarkarKarp(numbers, num_sets)
print(set_sums)
Comparison
The above algorithms give these results as the subset sums:
List scheduling
[9.78607846, 9.89566122, 9.62750904, 9.62253621, 9.59228928]
max = 9.89566122
LPT
[9.69177628, 9.69818904, 9.70391739, 9.70841387, 9.72177763]
max = 9.72177763
SLACK
[9.69539566 9.69812035 9.71967084 9.70318243 9.70770493]
max = 9.719670841141273
Karmarkar-Karp
[9.704530541764623, 9.704676654571145, 9.704761244535716, 9.704787373425246, 9.705318390488912]
max = 9.705318390488912
List scheduling is the only online algorithm and, with O(N) time, the fastest, but it also gives the worst result (for this input). LPT is better than List Scheduling and SLACK is better than LPT. Karmarkar-Karp gives the best result, but has the trickiest implementation.
Exact Solutions
We can solve the problem exactly using algorithms such as the Complete Karmarkar-Karp algorithm. These turn out to be difficult to implement and, perhaps because of this, don't have many implementations available online. A good alternative is to use mixed-integer programming.
Basic formulation. To do so, let the binary variable $x_{o,b}$ indicate whether object $o$ is assigned to bin $b$, and let $c_b$ be the capacity of bin $b$ (in case we want that). The problem is to minimize $z$ subject to \begin{align} \sum_b x_{o,b} &= 1 &&\text{for all $o$} \tag{1}\\ \sum_o w_o x_{o,b} &\le c_b &&\text{for all $b$} \tag{2}\\ \sum_o w_o x_{o,b} &\le z &&\text{for all $b$} \tag{3} \end{align} Constraint $(1)$ assigns each object to exactly one bin. Constraint $(2)$ enforces the bin capacity. Constraint $(3)$ enforces the minimax objective.
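As a sanity check on a toy instance, this basic formulation can be sketched with SciPy's `milp` (the six item weights and the choice of SciPy's HiGHS backend are my own; the full implementations later in this answer use cvxpy instead, and the capacity constraint $(2)$ is omitted here):

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

w = np.array([9.0, 8.0, 7.0, 3.0, 2.0, 1.0])  # toy item weights
n, k = len(w), 2
nv = n * k + 1  # variables: x[o,b] flattened row-major, plus z at the end

c = np.zeros(nv)
c[-1] = 1.0  # objective: minimize z

cons = []
for o in range(n):
    # Constraint (1): each object goes in exactly one bin
    row = np.zeros(nv)
    row[o * k:(o + 1) * k] = 1.0
    cons.append(LinearConstraint(row, 1.0, 1.0))
for b in range(k):
    # Constraint (3): each bin's total weight is at most z
    row = np.zeros(nv)
    row[b:n * k:k] = w
    row[-1] = -1.0
    cons.append(LinearConstraint(row, -np.inf, 0.0))

integrality = np.ones(nv)
integrality[-1] = 0  # z is continuous; the x's are integers, binary via bounds
bounds = Bounds(0.0, np.append(np.ones(n * k), np.inf))

res = milp(c, constraints=cons, integrality=integrality, bounds=bounds)
print(res.fun)  # optimal largest sum; total weight 30 splits evenly as 15
```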
Sparse formulation. For performance reasons, it's convenient to make the problem sparser by introducing a variable $y_b$ to represent the common LHS of $(2)$ and $(3)$, and then minimizing $z$ subject to \begin{align} \sum_b x_{o,b} &= 1 &&\text{for all $o$} \\ \sum_o w_o x_{o,b} &= y_b &&\text{for all $b$} \\ y_b &\le c_b &&\text{for all $b$} \\ y_b &\le z &&\text{for all $b$} \end{align}
Symmetry breaking. Note that the problem, as stated, has a symmetry: a solution that assigns a set $A$ of items to Bin 1 and a set $B$ of items to Bin 2 is just as good as a solution that assigns $A$ to 2 and $B$ to 1. We would like to avoid having our integer programming solver examine both solutions, since they are really the same, so we constrain the bins such that \begin{align} y_b &\le y_{b+1} &&\text{for all $b$} \end{align}
Heuristic upper bound. The Karmarkar-Karp algorithm above is not guaranteed to give an optimal answer, but its largest sum is guaranteed to be within a factor $\frac{4}{3}-\frac{1}{3k}$ of the optimal one. Thus, we can use the Karmarkar-Karp algorithm to get an upper bound on the actual solution and use it to constrain the integer program as follows: \begin{align} y_b &\le KK_\textrm{sol} &&\text{for all $b$} \end{align}
Few Distinct Items. If we know that many items have the same weight (not the case for the question you've posed, but relevant to others), we can exploit this symmetry to reduce the problem size by letting $n_t$ be the number of objects of type $t$ (with weight $w_t$) and treating $x_{t,b}$ as an integer variable that represents the number of objects of type $t$ that are assigned to bin $b$. The problem then is to minimize $z$ subject to \begin{align} \sum_b x_{t,b} &= n_t &&\text{for all $t$} \\ \sum_t w_t x_{t,b} &= y_b &&\text{for all $b$} \\ y_b &\le c_b &&\text{for all $b$} \\ y_b &\le z &&\text{for all $b$} \end{align}
The above leads to two implementations. Both use the sparse formulation with symmetry breaking and an optional heuristic upper bound. We call the one optimized for many distinct items DistinctItemsILP and the one optimized for few distinct items DuplicateItemsILP.
Unfortunately, MIPs can take a while to run, so we'll have to add a timeout value.
We can implement all that as follows:
#!/usr/bin/env python3
import heapq
from collections import Counter
from itertools import count
from typing import List, Optional, Tuple

import cvxpy as cp
import numpy as np

ResultType = Tuple[List[List[float]], List[float]]
Partition = List[List[float]]

def KarmarkarKarp(
    numbers: List[float], num_sets: int
) -> Tuple[Partition, List[float]]:
    def argsort(seq: List[float]) -> List[int]:
        return sorted(range(len(seq)), key=seq.__getitem__)

    partitions: List[Tuple[float, int, Partition, List[float]]] = []
    heap_count = count()  # To avoid ambiguity in heap
    for i in range(len(numbers)):
        this_partition: Partition = [[] for _ in range(num_sets - 1)]
        this_partition.append([numbers[i]])
        this_sizes: List[float] = [0.0] * (num_sets - 1) + [numbers[i]]
        heapq.heappush(
            partitions, (-numbers[i], next(heap_count), this_partition, this_sizes)
        )
    for _ in range(len(numbers) - 1):
        _, _, p1, p1_sum = heapq.heappop(partitions)
        _, _, p2, p2_sum = heapq.heappop(partitions)
        new_sizes: List[float] = [
            p1_sum[j] + p2_sum[num_sets - j - 1] for j in range(num_sets)
        ]
        new_partition: Partition = [
            p1[j] + p2[num_sets - j - 1] for j in range(num_sets)
        ]
        indices = argsort(new_sizes)
        new_sizes = [new_sizes[i] for i in indices]
        new_partition = [new_partition[i] for i in indices]
        diff = new_sizes[-1] - new_sizes[0]
        heapq.heappush(partitions, (-diff, next(heap_count), new_partition, new_sizes))
    _, _, final_partition, final_sums = partitions[0]
    return final_partition, final_sums

def DistinctItemsILP(
    items: List[float],
    bins: int,
    bin_cap: Optional[List[float]] = None,
    timeout: int = 30,
    use_heuristic: bool = True,
) -> ResultType:
    if bin_cap and len(bin_cap) != bins:
        raise Exception("Number of bin capacities must equal number of bins!")

    # Use Karmarkar-Karp to get an upper bound on the optimal largest sum
    heuristic_bound = None
    if use_heuristic:
        heuristic_bound = np.max(KarmarkarKarp(items, bins)[1])

    if bin_cap:
        bin_cap.sort()

    # Binary matrix indicating whether an item is in a bin
    x = {}
    for item_i in range(len(items)):
        for bin_i in range(bins):
            x[item_i, bin_i] = cp.Variable(boolean=True)

    # Each item is assigned to exactly 1 bin
    constraints = []
    for item_i in range(len(items)):
        constraints.append(cp.sum([x[item_i, bin_i] for bin_i in range(bins)]) == 1)

    # Get bin weights
    bin_weights = [cp.Variable(pos=True) for _ in range(bins)]
    for bin_i in range(bins):
        constraints.append(
            cp.sum([x[item_i, bin_i] * items[item_i] for item_i in range(len(items))])
            == bin_weights[bin_i]
        )
        # Each bin cannot exceed its capacity
        if bin_cap:
            constraints.append(bin_weights[bin_i] <= bin_cap[bin_i])
        if heuristic_bound:
            constraints.append(bin_weights[bin_i] <= heuristic_bound)

    # Use this to break symmetries
    if bin_cap:
        for b in range(bins - 1):
            if bin_cap[b] == bin_cap[b + 1]:
                constraints.append(bin_weights[b] <= bin_weights[b + 1])
    else:
        for b in range(bins - 1):
            constraints.append(bin_weights[b] <= bin_weights[b + 1])

    # The weight of each bin is limited by max_bin_size, which we try to minimize
    max_bin_size = cp.Variable(pos=True)
    constraints += [bin_weights[bin_i] <= max_bin_size for bin_i in range(bins)]

    # Construct and solve the problem
    problem = cp.Problem(cp.Minimize(max_bin_size), constraints)
    problem.solve(solver='CBC', maximumSeconds=timeout, verbose=True)

    # Extract the solution
    sets: List[List[float]] = []
    set_sums: List[float] = []
    for bin_i in range(bins):
        set_sums.append(
            sum(items[item_i] * x[item_i, bin_i].value for item_i in range(len(items)))
        )
        sets.append(
            [items[item_i] for item_i in range(len(items)) if x[item_i, bin_i].value]
        )
    return sets, set_sums

def DuplicateItemsILP(
    items: List[float],
    bins: int,
    bin_cap: Optional[List[float]] = None,
    timeout: int = 30,
    use_heuristic: bool = True,
) -> ResultType:
    if bin_cap and len(bin_cap) != bins:
        raise Exception("Number of bin capacities must equal number of bins!")

    heuristic_bound = None
    if use_heuristic:
        heuristic_bound = np.max(KarmarkarKarp(items, bins)[1])

    if bin_cap:
        bin_cap.sort()

    icounts = Counter(items)  # Get a dictionary of <item, count> pairs
    constraints = []

    # Matrix of distinct items and how many times each appears in each bin
    x = {}
    for item in icounts:
        for b in range(bins):
            x[item, b] = cp.Variable(integer=True)
            constraints.append(x[item, b] >= 0)

    # There are only so many of each item
    for item, cnt in icounts.items():
        constraints.append(cp.sum([x[item, b] for b in range(bins)]) == cnt)

    # Get bin weights
    bin_weights = [cp.Variable(pos=True) for _ in range(bins)]
    for bin_i in range(bins):
        constraints.append(
            cp.sum([item * x[item, bin_i] for item in icounts]) == bin_weights[bin_i]
        )
        # Each bin cannot exceed its capacity
        if bin_cap:
            constraints.append(bin_weights[bin_i] <= bin_cap[bin_i])
        if heuristic_bound:
            constraints.append(bin_weights[bin_i] <= heuristic_bound)

    # Use this to break symmetries
    if bin_cap:
        for b in range(bins - 1):
            if bin_cap[b] == bin_cap[b + 1]:
                constraints.append(bin_weights[b] <= bin_weights[b + 1])
    else:
        for b in range(bins - 1):
            constraints.append(bin_weights[b] <= bin_weights[b + 1])

    # The weight of each bin is limited by max_bin_size, which we try to minimize
    max_bin_size = cp.Variable(pos=True)
    constraints += [bin_weights[bin_i] <= max_bin_size for bin_i in range(bins)]

    # Construct and solve the problem
    problem = cp.Problem(cp.Minimize(max_bin_size), constraints)
    problem.solve(solver='CBC', maximumSeconds=timeout, verbose=True)

    # Extract the solution
    sets: List[List[float]] = []
    set_sums: List[float] = []
    for bin_i in range(bins):
        set_sums.append(sum(item * x[item, bin_i].value for item in icounts))
        sets.append([])
        for item in icounts:
            sets[-1].extend([item] * int(x[item, bin_i].value))
    return sets, set_sums

np.random.seed(123456)
num_sets = 5
numbers = np.random.random(size=100)

sets, set_sums = DistinctItemsILP(numbers, num_sets, timeout=480)
print(f"DistinctItemsILP: {set_sums}, max={max(set_sums)}")

sets, set_sums = DuplicateItemsILP(numbers, num_sets, timeout=480)
print(f"DuplicateItemsILP: {set_sums}, max={max(set_sums)}")

sets, set_sums = DistinctItemsILP(numbers, num_sets, timeout=480, use_heuristic=False)
print(f"DistinctItemsILP-no-heuristic: {set_sums}, max={max(set_sums)}")

sets, set_sums = DuplicateItemsILP(numbers, num_sets, timeout=480, use_heuristic=False)
print(f"DuplicateItemsILP-no-heuristic: {set_sums}, max={max(set_sums)}")
After running the above we find (duplicating the approximate solutions for comparison)
List scheduling
[9.78607846, 9.89566122, 9.62750904, 9.62253621, 9.59228928]
max = 9.89566122
LPT
[9.69177628, 9.69818904, 9.70391739, 9.70841387, 9.72177763]
max = 9.72177763
SLACK
[9.69539566 9.69812035 9.71967084 9.70318243 9.70770493]
max = 9.719670841141273
Karmarkar-Karp
[9.704530541764623, 9.704676654571145, 9.704761244535716, 9.704787373425246, 9.705318390488912]
max = 9.705318390488912
DistinctItemsILP
**No feasible solution found within 480s**
DuplicateItemsILP (timed out)
[9.70454479311396, 9.704684091094213, 9.704939500837135, 9.704939781454529, 9.704966038285805]
max = 9.704966038285805
DistinctItemsILP-no-heuristic (timed out)
[9.698474076805965, 9.705322596572808, 9.706392810380152, 9.706451385041921, 9.707433335984797]
max = 9.707433335984797
DuplicateItemsILP-no-heuristic (timed out)
[9.704106603441915, 9.704882096788454, 9.704910115144814, 9.705052586081315, 9.705122803329143]
max = 9.705122803329143
Note that using the heuristic meant that DistinctItemsILP was unable to find any feasible solution within the allotted time, while without the heuristic it found a solution, but one worse than that provided by the Karmarkar-Karp algorithm. The best way to deal with this is to use the heuristic and then, if the MIP fails to find a solution or finds a worse one, return the heuristic's solution.
DuplicateItemsILP performed better. With the heuristic it finds the best solution of any of the algorithms we've looked at. Running it without the heuristic gives a slight degradation in the optimal value found, but it still provides a better solution than Karmarkar-Karp. Again, running with the heuristic and returning the best value between the heuristic and the MIP is a good way to handle this.
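That "return whichever is better" fallback can be a small generic helper (a sketch; the name `best_solution` is mine) that takes any collection of `(sets, set_sums)` candidates, such as the Karmarkar-Karp result alongside a MIP result, and keeps the one with the smallest largest sum:

```python
from typing import List, Tuple

ResultType = Tuple[List[List[float]], List[float]]

def best_solution(candidates: List[ResultType]) -> ResultType:
    """Return the (sets, set_sums) candidate whose largest sum is smallest,
    skipping candidates where the solver returned no sums at all."""
    feasible = [c for c in candidates if c[1]]
    return min(feasible, key=lambda c: max(c[1]))
```

Calling it as, e.g., `best_solution([KarmarkarKarp(numbers, num_sets), DistinctItemsILP(numbers, num_sets)])` then guarantees the result is never worse than the heuristic alone.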
Local Search
We used Karmarkar-Karp as a heuristic to get an upper bound for the MIP solvers. The MIP solvers are exact, but can take a long time to run. An alternative method for improving the results of Karmarkar-Karp, or any other solver that doesn't return an optimal solution, is to randomly swap numbers between sets, accepting only those swaps that improve the solution. We can implement this for Karmarkar-Karp like so:
import random  # In addition to the imports in the script above

def SwapSearchFromExisting(sets: List[List[float]]) -> ResultType:
    def good_swap(
        set_ai: int, set_bi: int, aval: Optional[float], bval: Optional[float]
    ) -> bool:
        # Tentatively move `aval` from set A to set B and `bval` the other
        # way; accept the change only if it reduces the largest sum
        nonlocal set_sums, max_sum
        test_sums = set_sums.copy()
        test_sums[set_ai] = test_sums[set_ai] - (aval if aval else 0) + (bval if bval else 0)
        test_sums[set_bi] = test_sums[set_bi] + (aval if aval else 0) - (bval if bval else 0)
        if max(test_sums) < max_sum:
            set_sums = test_sums
            max_sum = max(test_sums)
            return True
        return False

    set_sums: List[float] = [sum(x) for x in sets]
    max_sum: float = max(set_sums)
    for _ in range(100000):
        # Choose two sets and one random item from each (if available), then
        # try a swap and, failing that, a one-way move
        set_ai = random.randint(0, len(sets) - 1)
        set_bi = random.randint(0, len(sets) - 1)
        set_a = sets[set_ai]
        set_b = sets[set_bi]
        item_a = random.randint(0, len(set_a) - 1) if len(set_a) > 0 else None
        item_b = random.randint(0, len(set_b) - 1) if len(set_b) > 0 else None
        if item_a is not None and item_b is not None and good_swap(set_ai, set_bi, set_a[item_a], set_b[item_b]):
            set_a[item_a], set_b[item_b] = set_b[item_b], set_a[item_a]
        elif item_a is not None and good_swap(set_ai, set_bi, set_a[item_a], None):
            set_b.append(set_a[item_a])
            set_a.pop(item_a)
        elif item_b is not None and good_swap(set_ai, set_bi, None, set_b[item_b]):
            set_a.append(set_b[item_b])
            set_b.pop(item_b)
    return sets, set_sums

sets, set_sums = SwapSearchFromExisting(KarmarkarKarp(numbers, num_sets)[0])
print(f"SwapSearchFromExisting: {set_sums}, max={max(set_sums)}")
This gives us
SwapSearchFromExisting
[9.704991039048654, 9.704676654571143, 9.704761244535712, 9.704787373425246, 9.704857893204878]
max = 9.704991039048654
This is slightly worse than the MIP solver but took a fraction of the time.