Skip to content

RTree: Count points in the neighbourhoods within each point of another set of points

An answer to this question on Stack Overflow.

Question

Why is this not returning a count of number of points in each neighbourhoods (bounding box)?

import geopandas as gpd
def radius(points_neighbour, points_center, new_field_name, r):
    """
    :param points_neighbour:
    :param points_center:
    :param new_field_name: new field_name attached to points_center
    :param r: radius around points_center
    :return:
    """
    sindex = points_neighbour.sindex
    pts_in_neighbour = []
    for i, pt_center in points_center.iterrows():
        nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r)))
        pts_in_this_neighbour = points_neighbour[nearest_index]
        pts_in_neighbour.append(len(pts_in_this_neighbour))
    points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)

Every loop gives the same result.

Second question, how can I find k-th nearest neighbour?

More information about the problem itself:

  • We are doing it at a very small scale e.g. Washington State, US or British Columbia, Canada

  • We hope to utilize geopandas as much as possible since it's similar to pandas and supports spatial indexing: RTree

  • For example, sindex here has method nearest, intersection, etc.

Please comment if you need more information. This is the code in class GeoPandasBase

@property
def sindex(self):
    if not self._sindex_generated:
        self._generate_sindex()
    return self._sindex

I tried Richard's example but it didn't work

def radius(points_neighbour, points_center, new_field_name, r):
    """
    :param points_neighbour:
    :param points_center:
    :param new_field_name: new field_name attached to points_center
    :param r: radius around points_center
    :return:
    """
    sindex = points_neighbour.sindex
    pts_in_neighbour = []
    for i, pt_center in points_center.iterrows():
        pts_in_this_neighbour = 0
        for n in sindex.intersection(((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r))):
            dist = pt_center.distance(points_neighbour['geometry'][n])
            if dist < radius:
                pts_in_this_neighbour = pts_in_this_neighbour + 1
        pts_in_neighbour.append(pts_in_this_neighbour)
    points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)

To download the shape file, goto https://catalogue.data.gov.bc.ca/dataset/hellobc-activities-and-attractions-listing and choose ArcView to download

Answer

Rather than answer your question directly, I'd argue that you're doing this wrong. After arguing this, I'll give a better answer.

Why you're doing it wrong

An r-tree is great for bounding-box queries in two or three Euclidean dimensions.

You are looking up longitude-latitude points on a two-dimensional surface curved in a three-dimensional space. The upshot is that your coordinate system will yield singularities and discontinuities: 180°W is the same as 180°E, 2°E by 90°N is close to 2°W by 90°N. The r-tree does not capture these sorts of things!

But, even if they were a good solution, your idea to take lat±r and lon±r yields a square region; rather, you probably want a circular region around your point.

How to do it right

  1. Rather than keeping the points in lon-lat format, convert them to xyz format using a spherical coordinate conversion. Now they are in a 3D Euclidean space and there are no singularities or discontinuities.

  2. Place the points in a three-dimensional kd-tree. This allows you to quickly, in O(log n) time, ask questions like "What are the k-nearest neighbours to this point?" and "What are all the points within a radius r of this points?" SciPy comes with an implementation.

  3. For your radius search, convert from a Great Circle radius to a chord: this makes the search in 3-space equivalent to a radius search on a circle wrapped to the surface of a sphere (in this case, the Earth).

Code for doing it right

I've implemented the foregoing in Python as a demonstration. Note that all spherical points are stored in (longitude,latitude)/(x-y) format using a lon=[-180,180], lat=[-90,90] scheme. All 3D points are stored in (x,y,z) format.

#/usr/bin/env python3
import numpy as np
import scipy as sp
import scipy.spatial
Rearth = 6371
#Generate uniformly-distributed lon-lat points on a sphere
#See: http://mathworld.wolfram.com/SpherePointPicking.html
def GenerateUniformSpherical(num):
  #Generate random variates
  pts      = np.random.uniform(low=0, high=1, size=(num,2))
  #Convert to sphere space
  pts[:,0] = 2*np.pi*pts[:,0]          #0-360 degrees
  pts[:,1] = np.arccos(2*pts[:,1]-1)   #0-180 degrees
  #Convert to degrees
  pts = np.degrees(pts)
  #Shift ranges to lon-lat
  pts[:,0] -= 180
  pts[:,1] -= 90
  return pts
def ConvertToXYZ(lonlat):
  theta  = np.radians(lonlat[:,0])+np.pi
  phi    = np.radians(lonlat[:,1])+np.pi/2
  x      = Rearth*np.cos(theta)*np.sin(phi)
  y      = Rearth*np.sin(theta)*np.sin(phi)
  z      = Rearth*np.cos(phi)
  return np.transpose(np.vstack((x,y,z)))
#Get all points which lie with `r_km` Great Circle kilometres of the query
#points `qpts`.
def GetNeighboursWithinR(qpts,kdtree,r_km):
  #We need to convert Great Circle kilometres into chord length kilometres in
  #order to use the kd-tree
  #See: http://mathworld.wolfram.com/CircularSegment.html
  angle        = r_km/Rearth
  chord_length = 2*Rearth*np.sin(angle/2)
  pts3d        = ConvertToXYZ(qpts)
  #See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.query_ball_point.html#scipy.spatial.KDTree.query_ball_point
  #p=2 implies Euclidean distance, eps=0 implies no approximation (slower)
  return kdtree.query_ball_point(pts3d,chord_length,p=2,eps=0) 
##############################################################################
#WARNING! Do NOT alter pts3d or kdtree will malfunction and need to be rebuilt
##############################################################################
##############################
#Correctness tests on the North, South, East, and West poles, along with Kolkata
ptsll = np.array([[0,90],[0,-90],[0,0],[-180,0],[88.3639,22.5726]])
pts3d = ConvertToXYZ(ptsll)
kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up
qptsll = np.array([[-3,88],[5,-85],[10,10],[-178,3],[175,4]])
GetNeighboursWithinR(qptsll, kdtree, 2000)
##############################
#Stress tests
ptsll = GenerateUniformSpherical(100000)    #Generate uniformly-distributed lon-lat points on a sphere
pts3d = ConvertToXYZ(ptsll)                 #Convert points to 3d
#See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html
kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up
qptsll = GenerateUniformSpherical(100)      #We'll find neighbours near these points
GetNeighboursWithinR(qptsll, kdtree, 500)