Bucket Aggregation in Elasticsearch


by Anurag Srivastava, Aug 29, 2018, 7:15:06 PM | 4 minutes

Bucket aggregation is similar to GROUP BY in an RDBMS query, where we group the result set by a certain field. In Elasticsearch, we bucket documents on the basis of certain criteria. Metrics aggregations calculate a metric over a field, whereas bucket aggregations perform no calculation themselves: they only create buckets of documents that match a given criterion and report how many documents fall into each one. Bucket aggregations can also contain sub-aggregations, which run on the documents of each bucket.
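
To make the point about sub-aggregations concrete, here is a minimal Python sketch of a request body that nests a metric aggregation inside a bucket aggregation. The field names category_name and views are borrowed from the examples later in this post. The aggregation names by_bucket and avg_metric and the helper build_terms_with_avg are hypothetical, and the body is only printed, not sent to a cluster.

```python
import json


def build_terms_with_avg(bucket_field, metric_field, size=5):
    """Build a search body: one bucket per unique value, with an average per bucket."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "by_bucket": {  # hypothetical aggregation name
                "terms": {"field": bucket_field, "size": size},
                # Sub-aggregation: computed separately inside every bucket.
                "aggs": {
                    "avg_metric": {"avg": {"field": metric_field}}
                },
            }
        },
    }


body = build_terms_with_avg("category_name", "views")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a GET bqstack/_search line in the same console used for the other requests in this post.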

There are many types of bucket aggregations, such as the terms, range, filter, and filters aggregations. In this post I will focus on three common ones: terms aggregation, range aggregation, and filter aggregation. So let's start.

Terms Aggregation:
The terms aggregation buckets the data by the unique values of a field. For example:

GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories" : {
      "terms" : {
        "field" : "category_name",
        "size" : 5
      }
    }
  }
}

In the above request, we create buckets on the blog categories using the terms aggregation. The size parameter limits the number of buckets to 5. The request returns the following result:

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 19,
      "buckets": [
        {
          "key": "programming",
          "doc_count": 12
        },
        {
          "key": "devops",
          "doc_count": 9
        },
        {
          "key": "news",
          "doc_count": 8
        },
        {
          "key": "poetry",
          "doc_count": 5
        },
        {
          "key": "informational",
          "doc_count": 4
        }
      ]
    }
  }
}

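In the result, sum_other_doc_count is the number of documents that belong to buckets left out by the size limit. In the same way, we can use the terms aggregation on other fields to create one bucket per unique value of that field.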
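
The grouping done by the terms aggregation can be sketched in plain Python. This is only an illustration of the idea, not how Elasticsearch implements it: the real aggregation runs per shard and its counts can be approximate, which this single-process sketch ignores. The function terms_buckets and the sample documents are hypothetical.

```python
from collections import Counter


def terms_buckets(docs, field, size):
    """Group documents by the value of `field` and keep the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Largest buckets first; ties are broken by key so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": key, "doc_count": count} for key, count in ordered[:size]]
    # Documents that fell into buckets beyond the first `size`.
    other = sum(count for _, count in ordered[size:])
    return {"sum_other_doc_count": other, "buckets": buckets}


# Hypothetical sample data, not the author's index.
docs = (
    [{"category_name": "programming"}] * 3
    + [{"category_name": "devops"}] * 2
    + [{"category_name": "news"}]
)
result = terms_buckets(docs, "category_name", size=2)
print(result)
```

With size set to 2, only the two largest buckets are returned and the single news document is reported through sum_other_doc_count.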

Range Aggregation:
The range aggregation buckets the data into ranges of a numeric field. For example, blogs have different numbers of views, so we can use the views field to bucket blogs by how many views they have. See the example below:

GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories" : {
      "range" : {
        "field" : "views",
        "ranges": [
          { "key":"Less popular", "to": 50 },
          { "key":"popular","from": 50, "to": 100 },
          { "key":"Most popular","from": 100, "to": 200}
        ]
      }
    }
  }
}

In the above request, we create buckets with the range aggregation on the views field and supply the ranges we want: fewer than 50 views, 50 to 100 views, and 100 to 200 views. Each range includes its from value and excludes its to value, so a blog with exactly 50 views lands in the popular bucket. The key field provides a custom label for each range. Now let's see the result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "buckets": [
        {
          "key": "Less popular",
          "to": 50,
          "doc_count": 25
        },
        {
          "key": "popular",
          "from": 50,
          "to": 100,
          "doc_count": 12
        },
        {
          "key": "Most popular",
          "from": 100,
          "to": 200,
          "doc_count": 6
        }
      ]
    }
  }
}

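In the above result, we have three buckets: Less popular, popular, and Most popular blogs. The counts add up to 43 of the 55 documents because a document that matches none of the ranges (here, one with 200 or more views, or with no views value) is not counted in any bucket.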
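
The boundary behaviour of the ranges is easy to get wrong, so here is a small Python sketch of the same bucketing rule, where from is inclusive and to is exclusive. The function range_buckets and the view counts are hypothetical and chosen to sit on the boundaries. The ranges are the ones from the request above.

```python
def range_buckets(docs, field, ranges):
    """Count documents per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for r in ranges:
        low = r.get("from", float("-inf"))  # open-ended when `from` is missing
        high = r.get("to", float("inf"))    # open-ended when `to` is missing
        count = sum(1 for doc in docs if field in doc and low <= doc[field] < high)
        buckets.append({"key": r["key"], "doc_count": count})
    return buckets


ranges = [
    {"key": "Less popular", "to": 50},
    {"key": "popular", "from": 50, "to": 100},
    {"key": "Most popular", "from": 100, "to": 200},
]
# Hypothetical view counts, not the author's data.
docs = [{"views": v} for v in (10, 49, 50, 99, 100, 200, 950)]
buckets = range_buckets(docs, "views", ranges)
print(buckets)
```

Here the documents with 200 and 950 views match no range and are therefore missing from every bucket. Leaving out the to value of the last range would catch them.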

Filter Aggregation:
The filter aggregation narrows down the set of documents used for aggregation. It defines a single bucket of all documents that match a given filter, and the sub-aggregations then run on that bucket only. See the example below:

GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories": {
      "filter": {
        "term": {
          "category_name.keyword": "DevOps"
        }
      },
      "aggs": {
        "avg_views": {
          "avg": {
            "field": "views"
          }
        }
      }
    }
  }
}

In the above request, I first filter the data to the DevOps category and then apply an avg aggregation to get the average of blog views. Executing the request returns the following result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count": 9,
      "avg_views": {
        "value": 916.55
      }
    }
  }
}

In the above result, doc_count is the number of documents that matched the filter and avg_views is the average number of views for the DevOps category. In this way, we can apply the filter aggregation.
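
The two steps of the filter aggregation, filter first and then compute the metric on what is left, can be sketched in plain Python. The function filter_avg and the sample documents are hypothetical. The equality check stands in for the exact, case-sensitive match that a term filter performs on a keyword field.

```python
def filter_avg(docs, filter_field, filter_value, metric_field):
    """Keep documents where `filter_field` equals `filter_value`, then average `metric_field`."""
    matched = [doc for doc in docs if doc.get(filter_field) == filter_value]
    values = [doc[metric_field] for doc in matched if metric_field in doc]
    # No values to average gives None, mirroring a null average.
    avg = sum(values) / len(values) if values else None
    return {"doc_count": len(matched), "avg_views": {"value": avg}}


# Hypothetical documents, not the author's index.
docs = [
    {"category_name": "DevOps", "views": 100},
    {"category_name": "DevOps", "views": 300},
    {"category_name": "News", "views": 50},
]
result = filter_avg(docs, "category_name", "DevOps", "views")
print(result)
```

Only the two DevOps documents contribute to the average. The News document is excluded before the metric is computed.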


Other Blogs on Elastic Stack:
Introduction to Elasticsearch

Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack 
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch 
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

If you found this article interesting, you can explore the books “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” to get more insight into the Elastic Stack, how to perform data analysis, and how to create dashboards for key performance indicators using Kibana.


You can also follow me on:

- LinkedIn: https://www.linkedin.com/in/anubioinfo/
- Twitter: https://twitter.com/anu4udilse
- Medium: https://anubioinfo.medium.com


