RTree: Count points in the neighbourhoods within each point of another set of points
An answer to this question on Stack Overflow.
Question
Why is this not returning a count of number of points in each neighbourhoods (bounding box)?
import geopandas as gpd
def radius(points_neighbour, points_center, new_field_name, r):
"""
:param points_neighbour:
:param points_center:
:param new_field_name: new field_name attached to points_center
:param r: radius around points_center
:return:
"""
sindex = points_neighbour.sindex
pts_in_neighbour = []
for i, pt_center in points_center.iterrows():
nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r)))
pts_in_this_neighbour = points_neighbour[nearest_index]
pts_in_neighbour.append(len(pts_in_this_neighbour))
points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)
Every loop gives the same result.
Second question, how can I find k-th nearest neighbour?
More information about the problem itself:
We are doing it at a very small scale e.g. Washington State, US or British Columbia, Canada
We hope to utilize geopandas as much as possible since it's similar to pandas and supports spatial indexing: RTree
For example, sindex here has method nearest, intersection, etc.
Please comment if you need more information. This is the code in class GeoPandasBase
@property
def sindex(self):
if not self._sindex_generated:
self._generate_sindex()
return self._sindex
I tried Richard's example but it didn't work
def radius(points_neighbour, points_center, new_field_name, r):
"""
:param points_neighbour:
:param points_center:
:param new_field_name: new field_name attached to points_center
:param r: radius around points_center
:return:
"""
sindex = points_neighbour.sindex
pts_in_neighbour = []
for i, pt_center in points_center.iterrows():
pts_in_this_neighbour = 0
for n in sindex.intersection(((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r))):
dist = pt_center.distance(points_neighbour['geometry'][n])
if dist < radius:
pts_in_this_neighbour = pts_in_this_neighbour + 1
pts_in_neighbour.append(pts_in_this_neighbour)
points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)
To download the shape file, goto https://catalogue.data.gov.bc.ca/dataset/hellobc-activities-and-attractions-listing and choose ArcView to download
Answer
Rather than answer your question directly, I'd argue that you're doing this wrong. After arguing this, I'll give a better answer.
Why you're doing it wrong
An r-tree is great for bounding-box queries in two or three Euclidean dimensions.
You are looking up longitude-latitude points on a two-dimensional surface curved in a three-dimensional space. The upshot is that your coordinate system will yield singularities and discontinuities: 180°W is the same as 180°E, 2°E by 90°N is close to 2°W by 90°N. The r-tree does not capture these sorts of things!
But, even if they were a good solution, your idea to take lat±r and lon±r yields a square region; rather, you probably want a circular region around your point.
How to do it right
Rather than keeping the points in lon-lat format, convert them to xyz format using a spherical coordinate conversion. Now they are in a 3D Euclidean space and there are no singularities or discontinuities.
Place the points in a three-dimensional kd-tree. This allows you to quickly, in O(log n) time, ask questions like "What are the k-nearest neighbours to this point?" and "What are all the points within a radius r of this points?" SciPy comes with an implementation.
For your radius search, convert from a Great Circle radius to a chord: this makes the search in 3-space equivalent to a radius search on a circle wrapped to the surface of a sphere (in this case, the Earth).
Code for doing it right
I've implemented the foregoing in Python as a demonstration. Note that all spherical points are stored in (longitude,latitude)/(x-y) format using a lon=[-180,180], lat=[-90,90] scheme. All 3D points are stored in (x,y,z) format.
#/usr/bin/env python3
import numpy as np
import scipy as sp
import scipy.spatial
Rearth = 6371
#Generate uniformly-distributed lon-lat points on a sphere
#See: http://mathworld.wolfram.com/SpherePointPicking.html
def GenerateUniformSpherical(num):
#Generate random variates
pts = np.random.uniform(low=0, high=1, size=(num,2))
#Convert to sphere space
pts[:,0] = 2*np.pi*pts[:,0] #0-360 degrees
pts[:,1] = np.arccos(2*pts[:,1]-1) #0-180 degrees
#Convert to degrees
pts = np.degrees(pts)
#Shift ranges to lon-lat
pts[:,0] -= 180
pts[:,1] -= 90
return pts
def ConvertToXYZ(lonlat):
theta = np.radians(lonlat[:,0])+np.pi
phi = np.radians(lonlat[:,1])+np.pi/2
x = Rearth*np.cos(theta)*np.sin(phi)
y = Rearth*np.sin(theta)*np.sin(phi)
z = Rearth*np.cos(phi)
return np.transpose(np.vstack((x,y,z)))
#Get all points which lie with `r_km` Great Circle kilometres of the query
#points `qpts`.
def GetNeighboursWithinR(qpts,kdtree,r_km):
#We need to convert Great Circle kilometres into chord length kilometres in
#order to use the kd-tree
#See: http://mathworld.wolfram.com/CircularSegment.html
angle = r_km/Rearth
chord_length = 2*Rearth*np.sin(angle/2)
pts3d = ConvertToXYZ(qpts)
#See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.query_ball_point.html#scipy.spatial.KDTree.query_ball_point
#p=2 implies Euclidean distance, eps=0 implies no approximation (slower)
return kdtree.query_ball_point(pts3d,chord_length,p=2,eps=0)
##############################################################################
#WARNING! Do NOT alter pts3d or kdtree will malfunction and need to be rebuilt
##############################################################################
##############################
#Correctness tests on the North, South, East, and West poles, along with Kolkata
ptsll = np.array([[0,90],[0,-90],[0,0],[-180,0],[88.3639,22.5726]])
pts3d = ConvertToXYZ(ptsll)
kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up
qptsll = np.array([[-3,88],[5,-85],[10,10],[-178,3],[175,4]])
GetNeighboursWithinR(qptsll, kdtree, 2000)
##############################
#Stress tests
ptsll = GenerateUniformSpherical(100000) #Generate uniformly-distributed lon-lat points on a sphere
pts3d = ConvertToXYZ(ptsll) #Convert points to 3d
#See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html
kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up
qptsll = GenerateUniformSpherical(100) #We'll find neighbours near these points
GetNeighboursWithinR(qptsll, kdtree, 500)