Using Neo4j Spatial and Mapbox to search for businesses by location
11 May 2015
A common use case for database queries is searching for things that are close to other things or within some specified geospatial boundary. Geospatial indexes and queries are offered by NoSQL databases such as MongoDB and by relational databases such as PostgreSQL. But what about graph databases? In this article, I show how to build a web application, powered by the Neo4j graph database, that searches within a user-defined boundary.
Searching for businesses within a user defined boundary
Our requirements for this web app are:
- allow the user to draw a polygon on a map
- allow the user to select the category of business they would like to search for
- show the user businesses that match the selected category and fall within the bounds they have defined on the map
You can check it out here, but this is what it looks like:
In the rest of this post we’ll look at how to build this project, starting with its components.
The data for this project comes from the Yelp Academic Dataset. At the time that I built this project I was a graduate student at the University of Montana so I’m pretty sure I was operating within the confines of Yelp’s license for the use of this data. Needless to say, I can’t redistribute the data so if you’d like to follow along you’ll need to download it directly from Yelp.
The dataset contains information about over 100,000 businesses in the Phoenix, AZ area in the U.S. This includes Yelp user reviews and some anonymized information about the users. We will use the latitude / longitude data to map businesses, as well as the business category for filtering.
Neo4j is a graph database that allows for modeling, storing and querying data as a graph. If you haven’t been exposed to graph databases yet, they are worth checking out, as many use cases are naturally modeled as a graph. Neo4j Spatial is a plugin for Neo4j that enables spatial operations: spatial indexing and spatial querying (such as finding things within x distance of y, or finding things within a certain geometry).
Here we will use Neo4j Spatial’s SearchIntersect query to perform an efficient spatial query to find businesses within a user-defined polygon.
MapBox / Leaflet.js
Now that we’ve seen what tools we’ll be working with, the first step is to load our data into Neo4j.
Loading the data
The data consists of an array of JSON objects describing each business. Something like this:
Lots of interesting data available here but we are only interested in
There are several ways to import data into Neo4j. The Neo4j Spatial plugin even provides functionality for importing from shapefiles and from OSM data. I hadn’t originally planned to use the business category information, so I ended up with a messy two-step data import process:
- Create a Business node for each business in the dataset and add it to the spatial index
- Create relationships between this Business node and the appropriate Category nodes
Including the in-graph spatial index, the data model looks like this:
Create Business nodes and add to spatial index
I chose to create an unmanaged server extension for interacting with Spatial using the Java API (more on this in the next section), adding an endpoint for Business node creation. This endpoint takes a JSON object that represents the business to be created and added to the spatial index:
I wrote a simple Python script to iterate through all business objects in the dataset and make a POST request to our new endpoint. The script is available here. We should really be posting an array of business objects here, but we’ll save that for v2 ;-) More in the next section about how this POST request is actually handled.
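As a rough sketch of what such an import loop can look like (this is not the original script), the snippet below builds one POST request per business using only the Python standard library. The field names (`business_id`, `name`, `latitude`, `longitude`), the payload keys and the endpoint URL in the usage note are assumptions made for illustration.

```python
import json
import urllib.request


def business_payload(record):
    """Keep only the fields the endpoint needs (field and key names are assumed)."""
    return {
        "business_id": record["business_id"],
        "name": record["name"],
        "lat": record["latitude"],
        "lon": record["longitude"],
    }


def build_request(endpoint, record):
    """Build (but do not send) a JSON POST request for a single business."""
    body = json.dumps(business_payload(record)).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def import_businesses(records, endpoint, send=urllib.request.urlopen):
    """POST each business in turn and return how many requests were sent."""
    sent = 0
    for record in records:
        # urlopen returns a context manager; closing it releases the connection.
        with send(build_request(endpoint, record)):
            sent += 1
    return sent
```

Calling `import_businesses(json.load(open(path)), "http://localhost:7474/example/node")` would run the import if the file really is a single JSON array and the extension is mounted at that (hypothetical) path. If the file holds one JSON object per line instead, parse it line by line with `json.loads`.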
Add category relationships to Business nodes
I quickly realized that having business category information would make this demo a bit more interesting. Each business has an array of categories to which it belongs. So we now iterate through the dataset a second time, executing a Cypher query that adds a relationship from each Business node to the appropriate Category nodes, creating our Category nodes along the way using a Cypher
Neo4j Spatial queries / server extension
As I mentioned, we use a Neo4j server extension to add two endpoints to the Neo4j REST API. The first endpoint handles a POST request with the data for a business, creates a corresponding Business node in the database and adds this node to our spatial index, allowing for later spatial query operations. The second endpoint handles a GET request with a polygon WKT string and a business category as parameters, and returns an array of all businesses found within the polygon that match the given category.
The purpose of this endpoint is to add a given Business into the Spatial layer. Unmanaged extensions allow us to define arbitrary JAX-RS handlers to be executed when a request is sent to our new endpoints. We saw above that we made a series of POST requests, sending a JSON body with details for each business. Here we will look at the code for the handler for that endpoint.
Handler to create Business node and add to the Spatial index:
We first declare that this method will handle POST requests to .../node and that we expect a JSON body string. Next, we get a handle on the GraphDatabaseService and instantiate a new SpatialDatabaseService. Really this should only be done once and then cached for later calls, but I’m just trying to show each handler as a self-contained method. Next, we deserialize the JSON body to an instance of BusinessNode using the google-gson Java library. BusinessNode is a POJO with instance variables for the properties of our business and the appropriate getters / setters to facilitate serialization with GSON. We then create a new Neo4j node called businessNode and set the appropriate properties on this node before committing the transaction. Finally, we get a handle on the Neo4j Spatial point layer and add this newly created businessNode to the Spatial layer. Spatial will take care of the appropriate in-graph R-Tree indexing and we can now make spatial query operations on the
The other endpoint we add in our server extension executes the spatial query. We send it a business category and a polygon and in return we get an array of all businesses within that polygon that match the business category.
Handler to query within polygon:
Here we define the handler of a GET request and specify two string query parameters: the polygon, as a WKT string, and the business category. We instantiate a new SpatialDatabaseService and get a handle on our Spatial Layer businessLayer. We then use Spatial’s SearchIntersect query to perform a spatial query within our businessLayer to find any Nodes intersecting the polygon. We then iterate through the results and build up an array of HashMap objects with the business name, latitude and longitude to return as JSON.
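To make the contract of this endpoint concrete, here is a minimal Python sketch of the same filter: parse a WKT polygon, keep the businesses that fall inside it and match the category, and return name / latitude / longitude records. It is not how Neo4j Spatial works internally. It swaps the R-Tree-backed SearchIntersect for a brute-force ray-casting test over every business, and the record keys (`name`, `lat`, `lon`, `categories`) are assumptions.

```python
import re


def parse_wkt_polygon(wkt):
    """Parse a single-ring 'POLYGON((x y, x y, ...))' into a list of (x, y) floats."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("expected a single-ring WKT POLYGON")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring


def point_in_polygon(x, y, ring):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def search_within(businesses, wkt, category):
    """Return name / lat / lon for businesses in the polygon that match the category."""
    ring = parse_wkt_polygon(wkt)
    return [
        {"name": b["name"], "lat": b["lat"], "lon": b["lon"]}
        for b in businesses
        # WKT coordinates are x y, which means longitude first, then latitude.
        if category in b["categories"] and point_in_polygon(b["lon"], b["lat"], ring)
    ]
```

The brute-force scan touches every business on every query, which is exactly the cost a spatial index avoids: the index narrows the candidates before any exact geometry test runs.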
Mapbox / Leaflet frontend
Now that we’ve imported the data into Neo4j, set up our spatial index, and have an endpoint that performs the spatial query we need, it’s time to build a frontend for this application. We will create a web app that renders a map and allows the user to draw an arbitrary polygon on the map and select a business category. Our web app will then send a request to our Neo4j server extension endpoint to perform the spatial query and render the results as pin annotations on the map.
Load the map
First we’ll need to display the map, enable polygon drawing and define which functions should be called once the polygon is drawn or an existing polygon edited:
Handlers for polygon drawing / editing
Here we build a WKT Polygon string from the user defined polygon and make an AJAX request to our Neo4j server extension to perform the spatial query to find all businesses within our polygon / category:
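The frontend itself runs in the browser, but the string-building step is easy to get wrong, so here is a small language-neutral sketch of it in Python. Two details matter: WKT expects x y order (longitude, then latitude), while map libraries typically hand back latitude / longitude pairs, and the ring has to be closed by repeating the first vertex. The query parameter names `polygon` and `category` and the base URL in the usage note are assumptions.

```python
from urllib.parse import urlencode


def to_wkt_polygon(latlngs):
    """Turn [(lat, lng), ...] in draw order into a closed WKT POLYGON string."""
    if len(latlngs) < 3:
        raise ValueError("a polygon needs at least three vertices")
    ring = list(latlngs)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings must end where they start
    # Swap to 'lng lat' because WKT coordinates are written as 'x y'.
    coords = ", ".join(f"{lng} {lat}" for lat, lng in ring)
    return f"POLYGON(({coords}))"


def query_url(base_url, latlngs, category):
    """Build the GET URL, URL-encoding both the WKT string and the category."""
    params = urlencode({"polygon": to_wkt_polygon(latlngs), "category": category})
    return f"{base_url}?{params}"
```

For example, `to_wkt_polygon([(33, -113), (33, -111), (34, -111)])` gives `POLYGON((-113 33, -111 33, -111 34, -113 33))`. Encoding the parameters matters because WKT contains spaces, commas and parentheses, and category names can contain characters such as `&`.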
Render the markers on Ajax success
Now we just need to create markers for each business in the JSON array that comes back from our server extension and add that layer to the map:
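One way to think about this step is as a conversion from the endpoint’s JSON array into GeoJSON point features, a format that Leaflet and Mapbox layers can consume. The sketch below shows that conversion in Python. It assumes each result has `name`, `lat` and `lon` keys, and it assumes a `title` property is what the map layer uses for the marker label.

```python
def to_feature_collection(businesses):
    """Convert [{'name', 'lat', 'lon'}, ...] into a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON positions are [longitude, latitude], in that order.
                "geometry": {"type": "Point", "coordinates": [b["lon"], b["lat"]]},
                "properties": {"title": b["name"]},
            }
            for b in businesses
        ],
    }
```

As with WKT, the coordinate order is longitude first. Swapping the two is a common reason for markers showing up in the wrong place.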
Well that’s about it. You can see the demo live here. The full code is available on my GitHub page. Code for the Neo4j server extension is available here and code for data import and the web app is available in this repository.
If you’d like to be notified when I publish more posts like this, you can follow me on Twitter.
As an aside, I was fortunate to receive a Google Summer of Code grant last summer to work with the folks at OSGeo and Neo Technology to build a prototype allowing for interacting with Neo4j Spatial directly from the Cypher query language. You can read more about that here.