Building a text adjacency graph from product reviews with the Best Buy API
07 Jan 2016
This week I’m preparing a presentation for DataDay Texas about Natural Language Processing with graph databases and Neo4j. While doing some research, I came across a great quote from Matt Biddulph:
“Nearly all text processing starts by transforming text into vectors” - Matt Biddulph
He is referring, of course, to building a vector space model from a text corpus as the first step of text processing. In the context of NLP with graph databases, the analogous first step is building a graph model from the text corpus. In many cases this is a simple word adjacency graph, where each word is a node and a relationship connects each word to the word that follows it.
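To make the word adjacency idea concrete before any database is involved, here is a minimal sketch in plain Python. It assumes simple lowercase tokenization with a regular expression, and `adjacency_graph` is a hypothetical helper name. Each key of the returned `Counter` is a directed edge `(word, next_word)` and each value is how often that pair occurs, which is roughly the information a weighted adjacency graph stores.

```python
import re
from collections import Counter


def adjacency_graph(texts):
    """Count directed word -> next-word edges across a list of documents."""
    edges = Counter()
    for text in texts:
        # Assumed tokenization: lowercase, keep runs of letters and apostrophes
        words = re.findall(r"[a-z']+", text.lower())
        # zip pairs each word with the word that follows it in the same document
        edges.update(zip(words, words[1:]))
    return edges


reviews = ["The picture is great", "The sound is great too"]
graph = adjacency_graph(reviews)
print(graph.most_common(3))
```

Because pairs are built per document, no edge links the last word of one review to the first word of the next; the shared phrase "is great" simply gets a weight of 2.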
Another resource I stumbled across this week is the Best Buy API, which is great for playing around with text processing. In particular, its Reviews endpoint provides access to a rich source of user-generated text. I’m especially interested in text summarization and opinion mining, so this is a perfect data set.
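As a rough illustration of pulling text from the Reviews endpoint, the sketch below pages through reviews for one product using only the standard library. The URL shape `https://api.bestbuy.com/v1/reviews(sku=...)`, the `apiKey`, `format`, `pageSize` and `page` query parameters, a page size of 100, and the `reviews`/`comment` keys in the JSON response are all assumptions to check against the official API documentation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed URL shape for the Best Buy Reviews endpoint; verify against the API docs.
BASE_URL = "https://api.bestbuy.com/v1/reviews(sku={sku})"
PAGE_SIZE = 100  # assumed maximum page size per request


def review_page_url(sku, api_key, page, page_size=PAGE_SIZE):
    """Build the request URL for one page of reviews for a product SKU."""
    query = urllib.parse.urlencode(
        {"apiKey": api_key, "format": "json", "pageSize": page_size, "page": page}
    )
    return BASE_URL.format(sku=sku) + "?" + query


def extract_comments(payload):
    """Return the non-empty review texts from one decoded JSON page.

    Assumes the response holds a "reviews" list whose items carry a "comment" field.
    """
    return [r["comment"] for r in payload.get("reviews", []) if r.get("comment")]


def fetch_reviews(sku, api_key, limit=500):
    """Page through the endpoint until `limit` comments are collected or pages run out."""
    comments = []
    page = 1
    while len(comments) < limit:
        with urllib.request.urlopen(review_page_url(sku, api_key, page)) as resp:
            payload = json.load(resp)
        if not payload.get("reviews"):
            break  # past the last page of reviews
        comments.extend(extract_comments(payload))
        page += 1
    return comments[:limit]
```

Under these assumptions, collecting 500 reviews at 100 per page takes five requests; splitting URL building and parsing out of `fetch_reviews` keeps those parts testable without network access.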
After registering with the Best Buy API and retrieving an API token, this simple Python script can be used to fetch the first 500 product reviews for any Best Buy product and insert them into Neo4j following the word adjacency graph model. Just replace the SKU variables to retrieve reviews for any product.
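The loading step can be sketched as follows, assuming the official `neo4j` Python driver, a local database at `bolt://localhost:7687` with placeholder credentials, and a hypothetical graph model of `(:Word {text})-[:NEXT {count}]->(:Word)`. The labels, property names, and function names here are illustrative choices, not necessarily the ones the original script uses. Tokenization happens in Python, and one Cypher statement per review turns the word list into a chain of `NEXT` relationships, incrementing a count whenever an existing pair is seen again.

```python
import re

# Hypothetical model: (:Word {text})-[:NEXT {count}]->(:Word)
INSERT_REVIEW = """
WITH $words AS words
UNWIND range(0, size(words) - 2) AS i
MERGE (w1:Word {text: words[i]})
MERGE (w2:Word {text: words[i + 1]})
MERGE (w1)-[r:NEXT]->(w2)
  ON CREATE SET r.count = 1
  ON MATCH SET r.count = r.count + 1
"""


def tokenize(text):
    """Lowercase the review and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def load_reviews(comments, uri="bolt://localhost:7687", auth=("neo4j", "password")):
    """Write each review into Neo4j as a chain of NEXT relationships."""
    # Imported here so tokenize() can be used without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session:
            for comment in comments:
                session.run(INSERT_REVIEW, words=tokenize(comment))
    finally:
        driver.close()
```

Reviews with fewer than two words produce an empty `range`, so the statement is a no-op for them. On a larger corpus, an index or uniqueness constraint on `:Word(text)` should make the repeated `MERGE` lookups considerably faster.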
Once the script runs, a snippet of the graph looks like this: