This article is a brief introduction to Neo4j, one of the most popular graph databases, and its integration with Python.
Graph databases are a family of NoSQL databases, based on the concept of modelling your data as a graph, i.e. a collection of nodes (representing entities) and edges (representing relationships).
The motivation behind the use of a graph database is the need to model small records which are deeply interconnected, forming a complex web that is difficult to represent in a relational fashion. Graph databases are particularly good at supporting queries that actually make use of such connections, i.e. by traversing the graph. Examples of suitable applications include social networks, recommendation engines (e.g. “show me movies that my best friends like”) and many other cases of link-rich domains.
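To make the idea of traversal concrete, here is a minimal pure-Python sketch of the "movies that my friends like" query over a toy in-memory graph. It does not use Neo4j at all; the user names, movie titles, relationship names and helper functions are all made up for illustration.

```python
# Toy in-memory graph: each edge is (source, relationship, target).
# All names below are made up for illustration.
edges = [
    ("alice", "friend_of", "bob"),
    ("alice", "friend_of", "carol"),
    ("bob", "likes", "Alien"),
    ("carol", "likes", "Alien"),
    ("carol", "likes", "Amelie"),
    ("dave", "likes", "Titanic"),
]


def neighbours(node, relationship):
    """Return the targets reachable from `node` via `relationship`."""
    return [dst for src, rel, dst in edges if src == node and rel == relationship]


def movies_friends_like(user):
    """Traverse user -friend_of-> friend -likes-> movie."""
    movies = set()
    for friend in neighbours(user, "friend_of"):
        movies.update(neighbours(friend, "likes"))
    return sorted(movies)


print(movies_friends_like("alice"))  # ['Alien', 'Amelie']
```

A graph database stores and indexes these connections natively, so a two-hop traversal like this does not require the joins a relational schema would need.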
From the Neo4j website, we can download the community edition of Neo4j. At the time of writing, the latest version is 2.2.0, which provides improved performance and a redesigned UI. To install the software, simply unpack the archive:
tar zxf neo4j-community-2.2.0-unix.tar.gz
ln -s neo4j-community-2.2.0 neo4j
We can immediately run the server:
cd neo4j
./bin/neo4j start
and now we can point the browser to http://localhost:7474 for a nice web GUI. The first time you open the interface, you’ll be asked to set a password for the user “neo4j”.
If you want to stop the server, you can type:
./bin/neo4j stop
Interfacing with Python
There is no shortage of Neo4j clients available for several programming languages, including Python. An interesting project, which makes use of the Neo4j REST interface, is Neo4jRestClient. Quick installation:
sudo pip install neo4jrestclient
All the features of this client are listed in the docs.
Creating a sample graph
Let’s start with a simple social-network-like application, where users know each other and like different “things”. In this example, users and things will be nodes in our database. Each node can be associated with labels, used to describe the type of node. The following code will create two nodes labelled as User and two nodes labelled as Beer:
from neo4jrestclient.client import GraphDatabase

db = GraphDatabase("http://localhost:7474", username="neo4j", password="mypassword")

# Create some nodes with labels
user = db.labels.create("User")
u1 = db.nodes.create(name="Marco")
user.add(u1)
u2 = db.nodes.create(name="Daniela")
user.add(u2)

beer = db.labels.create("Beer")
b1 = db.nodes.create(name="Punk IPA")
b2 = db.nodes.create(name="Hoegaarden Rosee")
# You can associate a label with many nodes in one go
beer.add(b1, b2)
The second step is all about connecting the dots, which in graph DB terminology means creating the relationships.
# User-likes->Beer relationships
u1.relationships.create("likes", b1)
u1.relationships.create("likes", b2)
u2.relationships.create("likes", b1)
# Bi-directional relationship?
u1.relationships.create("friends", u2)
Note that relationships have a direction, so we can easily model subject-predicate-object relationships. If we need to model a bi-directional relationship, such as a friend-of link in a social network, there are essentially two options:
- Add two edges per relationship, one for each direction
- Add one edge per relationship, with an arbitrary direction, and then ignore the direction in the query
In this example, we’re following the second option.
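As a rough illustration of the second option, the sketch below stores a single directed edge per friendship and ignores the direction at lookup time. The `friendships` list and the `friends_of` helper are hypothetical stand-ins written in plain Python, not part of neo4jrestclient. The Cypher string at the end shows the corresponding pattern: leaving out the arrow head makes the match direction-agnostic.

```python
# One stored edge per friendship, with an arbitrary direction (option two).
# This list is a plain-Python stand-in for the relationships in the database.
friendships = [("Marco", "Daniela")]


def friends_of(name):
    """Return the friends of `name`, ignoring the stored edge direction."""
    found = set()
    for src, dst in friendships:
        if src == name:
            found.add(dst)
        elif dst == name:
            found.add(src)
    return sorted(found)


# The equivalent Cypher pattern omits the arrow head, so it matches
# the "friends" relationship in either direction.
UNDIRECTED_FRIENDS_QUERY = (
    'MATCH (a:User)-[:friends]-(b:User) WHERE a.name="Daniela" RETURN b'
)

print(friends_of("Marco"))    # ['Daniela']
print(friends_of("Daniela"))  # ['Marco']
```

Storing one edge halves the number of relationships to maintain, at the cost of having to remember to drop the direction in every query that touches them.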
Querying the graph
The Neo4j Browser available at http://localhost:7474/ provides a nice way to query the DB and visualise the results, both as a list of records and in a visual form.
The query language for Neo4j is called Cypher. It allows you to describe patterns in graphs in a declarative fashion: just like in SQL, you describe what you want, rather than how to retrieve it. Cypher uses a sort of ASCII art to describe nodes, relationships and their direction.
For example, we can retrieve our whole graph using the following Cypher query:
MATCH (n)-[r]->(m) RETURN n, r, m;
And the outcome in the browser:
In plain English, what the query is trying to match is “any node n, linked to a node m via a relationship r”. Suggestion: with a huge graph, use a LIMIT clause.
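The LIMIT suggestion could be applied with a small helper like the one below. This is only a sketch: `with_limit` is a hypothetical name, not part of neo4jrestclient, and it assumes the query passed in does not already end with a LIMIT clause.

```python
def with_limit(query, max_rows):
    """Append a LIMIT clause to a Cypher query.

    Assumes the query has no LIMIT clause yet; a trailing semicolon
    is dropped before the clause is appended.
    """
    if not isinstance(max_rows, int) or max_rows < 1:
        raise ValueError("max_rows must be a positive integer")
    return "%s LIMIT %d" % (query.rstrip().rstrip(";"), max_rows)


q = with_limit("MATCH (n)-[r]->(m) RETURN n, r, m;", 25)
print(q)  # MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 25
```

The resulting string can be pasted into the Neo4j Browser or passed to the Python client in the same way as any other query.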
Of course we can also embed Cypher in our Python app, for example:
from neo4jrestclient import client

q = 'MATCH (u:User)-[r:likes]->(m:Beer) WHERE u.name="Marco" RETURN u, type(r), m'
# "db" as defined above
results = db.query(q, returns=(client.Node, str, client.Node))
for r in results:
    # Each row holds one element per item in the RETURN clause
    print("(%s)-[%s]->(%s)" % (r[0]["name"], r[1], r[2]["name"]))
# The output:
# (Marco)-[likes]->(Punk IPA)
# (Marco)-[likes]->(Hoegaarden Rosee)
The above query will retrieve all the triplets User-likes-Beer for the user Marco. The results variable will be a list of tuples, matching the format that we gave in Cypher with the RETURN keyword.
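To show how such rows can be consumed, here is a small sketch that groups beer names by user. It runs without a server: the `rows` list is a hand-written stand-in for the query results, with plain dicts playing the role of the node objects (which, like dicts, support `node["name"]` lookups). The `beers_by_user` helper is a hypothetical name.

```python
# Stand-in for the rows returned by the query: one element per RETURN item,
# in the order (user node, relationship type, beer node).
rows = [
    ({"name": "Marco"}, "likes", {"name": "Punk IPA"}),
    ({"name": "Marco"}, "likes", {"name": "Hoegaarden Rosee"}),
]


def beers_by_user(rows):
    """Group beer names by user name from (user, rel_type, beer) rows."""
    grouped = {}
    for user, _rel_type, beer in rows:
        grouped.setdefault(user["name"], []).append(beer["name"])
    return grouped


print(beers_by_user(rows))
# {'Marco': ['Punk IPA', 'Hoegaarden Rosee']}
```

Unpacking each row into named variables keeps the code readable when the RETURN clause grows beyond two or three items.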
Graph databases, one of the NoSQL flavours, provide an interesting way to model data with rich interconnections. Examples of applications that are particularly suitable for graph databases are social networks and recommendation systems. This article has introduced Neo4j, one of the main examples of Graph DB, and its use with Python using the Neo4j REST client. We have seen how to create nodes and relationships, and how to query the graph using Cypher, the Neo4j query language.