Estimating an area of an image generated by a set of points (Alpha shapes??)
An answer to this question on Stack Overflow.
Question
I have a set of points in an example ASCII file showing a 2D image.
I would like to estimate the total area that these points are filling. There are some places inside this plane that are not filled by any point because these regions have been masked out. What I guess might be practical for estimating the area would be applying a concave hull or alpha shapes.
I tried this approach to find an appropriate alpha value, and consequently estimate the area.
from shapely.ops import cascaded_union, polygonize
import shapely.geometry as geometry
from scipy.spatial import Delaunay
import numpy as np
import pylab as pl
from descartes import PolygonPatch
from matplotlib.collections import LineCollection
def plot_polygon(polygon):
fig = pl.figure(figsize=(10,10))
ax = fig.add_subplot(111)
margin = .3
x_min, y_min, x_max, y_max = polygon.bounds
ax.set_xlim([x_min-margin, x_max+margin])
ax.set_ylim([y_min-margin, y_max+margin])
patch = PolygonPatch(polygon, fc='#999999',
ec='#000000', fill=True,
zorder=-1)
ax.add_patch(patch)
return fig
def alpha_shape(points, alpha):
if len(points) < 4:
# When you have a triangle, there is no sense
# in computing an alpha shape.
return geometry.MultiPoint(list(points)).convex_hull
def add_edge(edges, edge_points, coords, i, j):
"""
Add a line between the i-th and j-th points,
if not in the list already
"""
if (i, j) in edges or (j, i) in edges:
# already added
return
edges.add( (i, j) )
edge_points.append(coords[ [i, j] ])
coords = np.array([point.coords[0]
for point in points])
tri = Delaunay(coords)
edges = set()
edge_points = []
# loop over triangles:
# ia, ib, ic = indices of corner points of the
# triangle
for ia, ib, ic in tri.vertices:
pa = coords[ia]
pb = coords[ib]
pc = coords[ic]
# Lengths of sides of triangle
a = np.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2)
b = np.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2)
c = np.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2)
# Semiperimeter of triangle
s = (a + b + c)/2.0
# Area of triangle by Heron's formula
area = np.sqrt(s*(s-a)*(s-b)*(s-c))
circum_r = a*b*c/(4.0*area)
# Here's the radius filter.
#print circum_r
if circum_r < 1.0/alpha:
add_edge(edges, edge_points, coords, ia, ib)
add_edge(edges, edge_points, coords, ib, ic)
add_edge(edges, edge_points, coords, ic, ia)
m = geometry.MultiLineString(edge_points)
triangles = list(polygonize(m))
return cascaded_union(triangles), edge_points
points=[]
with open("test.asc") as f:
for line in f:
coords=map(float,line.split(" "))
points.append(geometry.shape(geometry.Point(coords[0],coords[1])))
print geometry.Point(coords[0],coords[1])
x = [p.x for p in points]
y = [p.y for p in points]
pl.figure(figsize=(10,10))
point_collection = geometry.MultiPoint(list(points))
point_collection.envelope
convex_hull_polygon = point_collection.convex_hull
_ = plot_polygon(convex_hull_polygon)
_ = pl.plot(x,y,'o', color='#f16824')
concave_hull, edge_points = alpha_shape(points, alpha=0.001)
lines = LineCollection(edge_points)
_ = plot_polygon(concave_hull)
_ = pl.plot(x,y,'o', color='#f16824')
I get this result but I would like that this method could detect the hole in the middle.

Update
This is how my real data looks like:

My question is what is the best way to estimate an area of the aforementioned shape? I can not figure out what has gone wrong that this code doesn't work properly?!! Any help will be appreciated.
Answer
Here's a thought: use k-means clustering.
You can accomplish this in Python as follows:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
dat = np.loadtxt('test.asc')
xycoors = dat[:,0:2]
fit = KMeans(n_clusters=2).fit(xycoors)
plt.scatter(dat[:,0],dat[:,1], c=fit.labels_)
plt.axes().set_aspect('equal', 'datalim')
plt.gray()
plt.show()
Using your data, this gives the following result:
Now, you can take the convex hull of the top cluster and the bottom cluster and calculate the areas of each separately. Adding the areas then becomes an estimator of the area of their union, but, cunningly, avoids the hole in the middle.
To fine-tune your results, you can play with the number of clusters and the number of different starts to the algorithm (the algorithm is randomized and is typically run more than once).
You asked, for instance, if two clusters will always leave the hole in the middle. I've used the following code to experiment with that. I generate a uniform distribution of points and then chop out a randomly sized and orientated ellipse to simulate a hole.
#!/usr/bin/env python3
import sklearn
import sklearn.cluster
import numpy as np
import matplotlib.pyplot as plt
PWIDTH = 6
PHEIGHT = 6
def GetPoints(num):
return np.random.rand(num,2)*300-150 #Centered about zero
def MakeHole(pts): #Chop out a randomly orientated and sized ellipse
a = np.random.uniform(10,150) #Semi-major axis
b = np.random.uniform(10,150) #Semi-minor axis
h = np.random.uniform(-150,150) #X-center
k = np.random.uniform(-150,150) #Y-center
A = np.random.uniform(0,2*np.pi) #Angle of rotation
surviving_points = []
for pt in range(pts.shape[0]):
x = pts[pt,0]
y = pts[pt,1]
if ((x-h)*np.cos(A)+(y-k)*np.sin(A))**2/a/a+((x-h)*np.sin(A)-(y-k)*np.cos(A))**2/b/b>1:
surviving_points.append(pt)
return pts[surviving_points,:]
def ShowManyClusters(pts,fitter,clusters,title):
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
fig,axs = plt.subplots(PWIDTH,PHEIGHT)
axs = axs.ravel()
for i in range(PWIDTH*PHEIGHT):
lbls = fitter(pts[i],clusters)
axs[i].scatter(pts[i][:,0],pts[i][:,1], c=colors[lbls])
axs[i].get_xaxis().set_ticks([])
axs[i].get_yaxis().set_ticks([])
plt.suptitle(title)
#plt.show()
plt.savefig('/z/'+title+'.png')
fitters = {
'SpectralClustering': lambda x,clusters: sklearn.cluster.SpectralClustering(n_clusters=clusters,affinity='nearest_neighbors').fit(x).labels_,
'KMeans': lambda x,clusters: sklearn.cluster.KMeans(n_clusters=clusters).fit(x).labels_,
'AffinityPropagation': lambda x,clusters: sklearn.cluster.AffinityPropagation().fit(x).labels_,
}
np.random.seed(1)
pts = []
for i in range(PWIDTH*PHEIGHT):
temp = GetPoints(300)
temp = MakeHole(temp)
pts.append(temp)
for name,fitter in fitters.items():
for clusters in [2,3]:
np.random.seed(1)
ShowManyClusters(pts,fitter,clusters,"{0}: {1} clusters".format(name,clusters))
Consider the results for K-Means:
At least to my eye, it seems as though using two clusters performs worst when the "hole" separates the data into two separate blobs. (In this case that occurs when the ellipse is orientated such that it overlaps two edges of the rectangular region containing the sample points.) Using three clusters resolves most of these difficulties.
You'll also notice that K-means produces some counter-intuitive results on the 1st Column, 3rd Row as well as on the 3rd Column, 4th Row. Reviewing sklearn's menagerie of clustering methods here shows the following comparison image:
From this, image it seems as though SpectralClustering produces results that align with what we want. Trying this on the same data above fixes the problems mentioned (see 1st Column, 3rd Row and 3rd Column, 4th Row).
The foregoing suggests that Spectral clustering with three clusters should be adequate for most situations of this sort.
