Introduction to Elasticsearch Aggregations

by Anurag Srivastava, Aug 14, 2018, 4:47:56 PM | 4 minutes

Aggregations let us group our data and extract statistics from it. They give us insight into the data and apply to a wide range of problems. For example, we can use Elasticsearch aggregations to build a recommendation engine for a website.

Now let us jump into Elasticsearch aggregations and learn how to apply them to our data. There are four main types of aggregations in Elasticsearch:

Metric: These aggregations compute metrics over a set of documents. For example, on a numeric field we can get the average, max, min, etc.
Matrix: These aggregations work on multiple fields of the documents. They extract the values from those fields and build a matrix that provides insight into those fields.
Bucketing: Bucketing aggregations are like GROUP BY in an RDBMS. Each bucket has a criterion, and every document that matches the criterion falls into that bucket, so we can group the data into different buckets.
Pipeline: These aggregations work on the output of other aggregations instead of on the documents themselves.
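To make the difference between a metric and a bucket aggregation concrete, here is a small plain Python sketch of what the two ideas compute. It only illustrates the concept and is not how Elasticsearch implements aggregations. The three documents are made up for this example and keep only the fields we need.

```python
from collections import Counter
from statistics import mean

# Hypothetical documents; only the fields used below are included.
docs = [
    {"category_name": "Cars", "views": 148},
    {"category_name": "Cars", "views": 52},
    {"category_name": "News", "views": 100},
]

# Metric aggregation: reduce a numeric field to single statistics.
views = [doc["views"] for doc in docs]
metrics = {"avg": mean(views), "min": min(views), "max": max(views)}

# Bucket aggregation: group documents by a criterion and count each group,
# much like GROUP BY category_name in SQL.
buckets = Counter(doc["category_name"] for doc in docs)

print(metrics)
print(dict(buckets))
```

A metric aggregation returns numbers for the whole set of documents, while a bucket aggregation returns one entry per group.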

We will look at these aggregation types in detail. Let us start by understanding the syntax of an aggregation:

"aggregations|aggs" : {
    "<name of aggregation>" : {
        "<type of aggregation>" : {
            <body of aggregation>
        }
    }
}

This is the simplest representation of an Elasticsearch aggregation. Now let us see what each line of the example means.

- The first line is the aggregation keyword, which can be either "aggregations" or "aggs".
- In the second line, we specify a name for the aggregation.
- In the third line, we specify the type of aggregation, such as terms.
- Then we specify the actual aggregation body.
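The nesting described above can be sketched as a small Python helper that wraps an aggregation body in the keyword, name and type levels. The function name build_aggregation is my own choice for this illustration and is not part of any Elasticsearch client.

```python
import json


def build_aggregation(name, agg_type, body):
    """Wrap an aggregation body in the "aggs" -> name -> type nesting."""
    return {"aggs": {name: {agg_type: body}}}


# A terms aggregation named "blog_categories" on the category_name field.
request_body = build_aggregation(
    "blog_categories", "terms", {"field": "category_name"}
)
print(json.dumps(request_body, indent=2))
```

The printed JSON has the same four levels as the syntax: the keyword, the name, the type and the body.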

Now let us look at the format of the data I am going to use for the aggregation:

{
        "_index": "bqstack",
        "_type": "blogs",
        "_id": "EwJnGWQBnhG38eKPq5Bo",
        "_score": 1,
        "_source": {
          "category_name": "Cars",
          "name": "Rocky Paul",
          "edit_approved": false,
          "email": "rocky.paul.9867@xyz.com",
          "edited_blog_content": null,
          "category_id": 35,
          "author_id": 75,
          "create_date": "2018-05-09T13:28:20.917Z",
          "preview_image": "blog_57.png",
          "approved": false,
          "views": 148,
          "@version": "1",
          "blog_content": """
<p><span class="storyText"><p class="MsoNormal"><span lang="EN-GB">The central government approved green licence plates for electric vehicles </span> 
""",
          "tags": "",
          "id": 57,
          "blog_title": "Centre approves green licence plates for electric cars",
          "update_date": "2018-05-16T18:30:22.669Z",
          "category_image": "cars.jpg",
          "@timestamp": "2018-06-19T18:56:20.427Z"
        }
      }

The above document is taken from the index bqstack and will be used to demonstrate Elasticsearch aggregations. Since this is an introductory blog on aggregations, I will explain only the simplest form of an Elasticsearch aggregation. See the example below:

GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories" : {
      "terms" : {
        "field" : "category_name",
        "size" : 5
      }
    }
  }
}

In the above example we are doing the following:
- We pass size=0 to the _search API so that the matching documents are not listed in the response.
- The keyword "aggs" tells Elasticsearch that we are going to apply an aggregation. We can use "aggregations" instead of "aggs".
- I have named the aggregation "blog_categories" so that the name is meaningful, because we are going to bucket on category names.
- After the aggregation name, we specify the aggregation type "terms" and, inside it, the field to bucket on.
- I have also added "size": 5 because there are many categories and I am interested in the top 5 only.
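If you want to run the same request outside the Kibana console, the sketch below builds it with the Python standard library. It rests on a few assumptions of mine: a cluster reachable at http://localhost:9200 without authentication, and the helper names build_search_request and run_search, which are made up for this example. It uses POST because the _search API accepts POST as well as GET.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local cluster without authentication


def build_search_request(index, body, base_url=ES_URL):
    """Build (but do not send) a _search request that returns only aggregations."""
    url = f"{base_url}/{index}/_search?size=0"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts POST as well as GET
    )


def run_search(request):
    """Send the request and decode the JSON response (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = {"aggs": {"blog_categories": {"terms": {"field": "category_name", "size": 5}}}}
request = build_search_request("bqstack", body)
print(request.get_method(), request.full_url)
```

Calling run_search(request) would send the request. It is left out of the sketch so that the code runs without a cluster.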

After executing the above request, we get the following response:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 54,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 19,
      "buckets": [
        {
          "key": "programming",
          "doc_count": 11
        },
        {
          "key": "devops",
          "doc_count": 9
        },
        {
          "key": "news",
          "doc_count": 8
        },
        {
          "key": "poetry",
          "doc_count": 5
        },
        {
          "key": "informational",
          "doc_count": 4
        }
      ]
    }
  }
}
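To use this response in code, we usually only need the buckets list. The Python sketch below works on a trimmed copy of the response above and reads the buckets from under the "aggregations" key. The helper name bucket_counts is my own. Note that "sum_other_doc_count" is the document count of the buckets that were not returned because of "size": 5.

```python
# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "blog_categories": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 19,
            "buckets": [
                {"key": "programming", "doc_count": 11},
                {"key": "devops", "doc_count": 9},
                {"key": "news", "doc_count": 8},
                {"key": "poetry", "doc_count": 5},
                {"key": "informational", "doc_count": 4},
            ],
        }
    }
}


def bucket_counts(response, agg_name):
    """Return (key, doc_count) pairs for a terms aggregation, in response order."""
    agg = response["aggregations"][agg_name]
    return [(bucket["key"], bucket["doc_count"]) for bucket in agg["buckets"]]


counts = bucket_counts(response, "blog_categories")
for key, count in counts:
    print(f"{key}: {count}")
```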

In this way, we can create buckets for any field of the document. This was a basic blog on aggregations. In my next blog on aggregations, I will explain more complex examples that give better insight into our data.

Other Blogs on Elastic Stack:
Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

If you found this article interesting, you can explore the books “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” to learn more about the Elastic Stack, how to perform data analysis, and how to create dashboards for key performance indicators using Kibana.

You can also follow me on:

- LinkedIn: https://www.linkedin.com/in/anubioinfo/

- Twitter: https://twitter.com/anu4udilse

- Medium: https://anubioinfo.medium.com
