Skip to content

Max Distance between 2 points in a data set and identifying the points

An answer to this question on Stack Overflow.

Question

I have a matrix consisting of x,y,z coordinates of several points. I would like to find the extremum points i.e. the two points that are farthest apart.

I could figure out a way in matlab, but i need it in Python

Here is the code in matlab

A = randint(500,3,[-5 5]);
D=pdist(A);
D=squareform(D);
[N,I]=max(D(:));
[I_row, I_col] = ind2sub(size(D),I);

pdist gives the distance between pairs of points(i,j). squareform gives the matrix output In last two steps I attempt to find the indices of the matrix I_row, I_col..

Points I_row and I_col have the max distance..

Could anybody suggest me an efficient way in python as all my other codes are in Python.

Answer

All the other answers here take O(N^2) time and space. That's terrible.

Instead, recognize that the two farthest points in a dataset lie on the set's convex hull. Since hulls can be computed in O(N log N) time in 3D this forms an efficient prefilter. In my testing on a dataset with 16,000,000 points the convex hull contained only 420 points.

The farthest points on the hull can be found in O(H log H) time, though this is harder to implement in Python. So we can instead at that point fall back to the O(N^2) cdist solutions.

import numpy as np
from scipy.spatial import ConvexHull
from scipy.spatial.distance import cdist
N = 16000000
# Find a convex hull in O(N log N)
points = np.random.rand(N, 3)   # N random points in 3-D
# Returned 420 points in testing
hull = ConvexHull(points)
# Extract the points forming the hull
hullpoints = points[hull.vertices,:]
# Naive way of finding the best pair in O(H^2) time if H is number of points on
# hull
hdist = cdist(hullpoints, hullpoints, metric='euclidean')
# Get the farthest apart points
bestpair = np.unravel_index(hdist.argmax(), hdist.shape)
#Print them
print([hullpoints[bestpair[0]],hullpoints[bestpair[1]]])