Elasticsearch · Python · Search

How to Query Elasticsearch with Python

Elasticsearch is an open-source distributed search server built on top of Apache Lucene. It’s a great tool that allows to quickly build applications with full-text search capabilities. The core implementation is in Java, but it provides a nice REST interface which allows to interact with Elasticsearch from any programming language.

This article provides an overview on how to query Elasticsearch from Python. There are two main options:

  • Implement the REST-API calls to Elasticsearch
  • Use one of the Python libraries that does the above for you

Quick Intro on Elasticsearch

Elasticsearch is developed in Java on top of Lucene, but the format for configuring the index and querying the server is JSON. Once the server is running, by default it’s accessible at localhost:9200 and we can start sending our commands via e.g. curl:

curl -XPOST http://localhost:9200/test/articles/1 -d '{
    "content": "The quick brown fox"
}'

This commands creates a new document, and since the index didn’t exist, it also creates the index. Specifically, the format for the URL is:

http://hostname:port/index_name/doc_type/doc_id

so we have just created an index “test” which contains documents of type “articles”. The document has only one field, “content”. Since we didn’t specify, the content is indexed using the default Lucene analyzer (which is usually a good choice for standard English). The document id is optional and if we don’t explicitly give one, the server will create a random hash-like one.

We can insert a few more documents, see for example the file create_index.sh from the code snippets on github.

Once the documents are indexed, we can perform a simple search, e.g.:

curl -XPOST http://localhost:9200/test/articles/_search?pretty=true -d '{
    "query": {
        "match": {
            "content": "dog"
        }
    }
}'

Using the sample documents above, this query should return only one document. Performing the same query over the term “fox” rather than “dog” should give instead four documents, ranked according to their relevance.

How the Elasticsearch/Lucene ranking function works, and all the countless configuration options for Elasticsearch, are not the focus of this article, so bear with me if we’re not digging into the details. For the moment, we’ll just focus on how to integrate/query Elasticsearch from our Python application.

Querying Elasticsearch via REST in Python

One of the option for querying Elasticsearch from Python is to create the REST calls for the search API and process the results afterwards. The requests library is particularly easy to use for this purpose. We can install it with:

sudo pip install requests

The sample query used in the previous section can be easily embedded in a function:

def search(uri, term):
    """Simple Elasticsearch Query"""
    query = json.dumps({
        "query": {
            "match": {
                "content": term
            }
        }
    })
    response = requests.get(uri, data=query)
    results = json.loads(response.text)
    return results

The “results” variable will be a dictionary loaded from the JSON response. We can pretty-print the JSON, to observe the full output and understand all the information it provides, but again this is beyond the scope of this post. So we can simply print the results nicely, one document per line, as follows:

def format_results(results):
    """Print results nicely:
    doc_id) content
    """
    data = [doc for doc in results['hits']['hits']]
    for doc in data:
        print("%s) %s" % (doc['_id'], doc['_source']['content']))

Similarly, we can create new documents:

def create_doc(uri, doc_data={}):
    """Create new document."""
    query = json.dumps(doc_data)
    response = requests.post(uri, data=query)
    print(response)

with the doc_data variable being a (Python) dictionary which resembles the structure of the document we’re creating.

You can see a full working toy example in the rest.py file in the Gist on github.

Querying Elasticsearch Using elasticsearch-py

The requests library is fairly easy to use, but there are several options in terms of libraries that abstract away the concepts related to the REST API and focus on Elasticsearch concepts. In particular, the official Python extension for Elasticsearch, called elasticsearch-py, can be installed with:

sudo pip install elasticsearch

It’s fairly low-level compared to other client libraries with similar capabilities, but it provides a consistent and easy to extend API.

We can replicate the search used with the requests library, as well as the result print-out, just using a few lines of Python:

from elasticsearch import Elasticsearch

es = Elasticsearch()
res = es.search(index="test", doc_type="articles", body={"query": {"match": {"content": "fox"}}})
print("%d documents found" % res['hits']['total'])
for doc in res['hits']['hits']:
    print("%s) %s" % (doc['_id'], doc['_source']['content']))

In a similar fashion, we can re-create the functionality of adding an extra document:

es.create(index="test", doc_type="articles", body={"content": "One more fox"})

The full functionality of this client library are well described in the documentation.

Summary

This article has briefly discussed a couple of options to integrate Elasticsearch into a Python application. The key points of the discussion are:

  • We can interact with Elasticsearch using the REST API
  • The requests library is particularly useful for this purpose, and probably much cleaner and easier to use than the urllib module (part of the standard library)
  • Many other Python libraries implement an Elasticsearch client, abstracting away the concept related to the REST API and focusing on Elasticsearch concepts
  • We have seen simple examples with elasticsearch-py

The full code for the examples is available as usual in a Gist:
https://gist.github.com/bonzanini/fe2ff32116f16e3009be

2 thoughts on “How to Query Elasticsearch with Python

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s