Bucket Aggregation in Elasticsearch
A bucket aggregation is similar to a GROUP BY clause in an RDBMS query, where we group the result by a certain field. In Elasticsearch, we use bucket aggregations to group documents into buckets on the basis of certain criteria. In metrics aggregations, we calculate metrics on a field. In bucket aggregations, we don't perform calculations; we just create buckets of the documents that match certain criteria. Bucket aggregations can also contain sub-aggregations, which are computed separately for the documents in each bucket.
There are different types of bucket aggregations, such as the terms, range, filters and filter aggregations. In this blog, I will focus on three of the common ones: the terms, range and filter aggregations. So let's start.
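To make the GROUP BY analogy concrete, here is a minimal in-memory Python sketch. It is not Elasticsearch code: the sample documents and the helper names bucket_by and avg_per_bucket are hypothetical, and only the field names category_name and views come from the examples in this blog. It groups documents into buckets by a field and then computes a metric for each bucket, which is roughly what a bucket aggregation with a sub-aggregation does.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample documents (not real data from the bqstack index).
blogs = [
    {"title": "Intro to Docker", "category_name": "devops", "views": 120},
    {"title": "Python tips", "category_name": "programming", "views": 80},
    {"title": "CI pipelines", "category_name": "devops", "views": 40},
    {"title": "Release notes", "category_name": "news", "views": 10},
]

def bucket_by(docs, field):
    """Group documents into buckets keyed by the value of `field`
    (roughly what GROUP BY does in SQL)."""
    buckets = defaultdict(list)
    for doc in docs:
        if field in doc:  # documents without the field land in no bucket
            buckets[doc[field]].append(doc)
    return dict(buckets)

def avg_per_bucket(buckets, field):
    """A 'sub-aggregation': compute a metric over the documents of each bucket."""
    return {key: mean(doc[field] for doc in docs) for key, docs in buckets.items()}

buckets = bucket_by(blogs, "category_name")
print({key: len(docs) for key, docs in buckets.items()})
print(avg_per_bucket(buckets, "views"))
```

The bucketing step alone corresponds to a bucket aggregation; adding the per-bucket average corresponds to nesting a metrics aggregation inside it.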
Terms Aggregation:
The terms aggregation creates one bucket for each unique value of a field. For example:
GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories": {
      "terms": {
        "field": "category_name",
        "size": 5
      }
    }
  }
}
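In the above request, we create buckets on blog categories using the terms aggregation. The size=0 URL parameter tells Elasticsearch to return only the aggregation results and no matching documents, while the size parameter inside the terms aggregation limits the number of buckets to 5. By default, the buckets are sorted by doc_count in descending order. The request gives the following result: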
{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 19,
      "buckets": [
        {
          "key": "programming",
          "doc_count": 12
        },
        {
          "key": "devops",
          "doc_count": 9
        },
        {
          "key": "news",
          "doc_count": 8
        },
        {
          "key": "poetry",
          "doc_count": 5
        },
        {
          "key": "informational",
          "doc_count": 4
        }
      ]
    }
  }
}
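In the result, each bucket's key is a category name and doc_count is the number of blogs in that category. sum_other_doc_count (19 here) is the total doc count of the categories that didn't make it into the top 5 buckets, and doc_count_error_upper_bound (0 here) is an upper bound on the error in the bucket counts, which can arise because each shard returns only its own top terms. In the same way, we can use the terms aggregation on other fields to create buckets for their unique values.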
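The following in-memory Python sketch shows what the terms aggregation computes on a single list of documents: one bucket per unique value, ordered by doc_count, cut down to the top size buckets, with the remainder reported as sum_other_doc_count. The function name terms_agg and the sample data are hypothetical, and breaking ties by key is an assumption of this sketch.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """In-memory sketch of terms-aggregation semantics (not Elasticsearch code)."""
    # Count documents per unique value; documents without the field are skipped.
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Order by doc_count descending; breaking ties by key is an assumption here.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ordered[:size], ordered[size:]
    return {
        # Total doc count of the values that did not make the top `size` buckets.
        "sum_other_doc_count": sum(count for _, count in rest),
        "buckets": [{"key": key, "doc_count": count} for key, count in top],
    }

# Hypothetical sample data.
docs = [{"category_name": c} for c in
        ["devops", "programming", "programming", "news",
         "devops", "programming", "poetry"]]
print(terms_agg(docs, "category_name", size=2))
```

Because this sketch sees all documents at once, its counts are exact. In Elasticsearch, each shard computes its own top terms before the results are merged, which is why the response also carries doc_count_error_upper_bound.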
Range Aggregation:
Using the range aggregation, we can bucket the data into ranges of values. For example, each blog has a number of views, so we can use the views field to bucket blogs by how many views they have. See the example below:
GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories": {
      "range": {
        "field": "views",
        "ranges": [
          { "key": "Less popular", "to": 50 },
          { "key": "popular", "from": 50, "to": 100 },
          { "key": "Most popular", "from": 100, "to": 200 }
        ]
      }
    }
  }
}
In the above request, we create buckets with the range aggregation on the views field and provide the ranges we want: fewer than 50 views, 50 to 100 views, and 100 to 200 views. For each range, from is inclusive and to is exclusive, so a blog with exactly 50 views goes into the popular bucket. The optional key field lets us give each range a custom label. Now let's see the result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "buckets": [
        {
          "key": "Less popular",
          "to": 50,
          "doc_count": 25
        },
        {
          "key": "popular",
          "from": 50,
          "to": 100,
          "doc_count": 12
        },
        {
          "key": "Most popular",
          "from": 100,
          "to": 200,
          "doc_count": 6
        }
      ]
    }
  }
}
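In the above result, we have three buckets: Less popular, popular and Most popular blogs. Together they hold 43 of the 55 blogs; the remaining 12 don't fall into any of the defined ranges (for example, blogs with 200 or more views), so they are not counted in any bucket.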
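The range rules can be checked with a small in-memory Python sketch. The function name range_agg and the view counts are hypothetical; the ranges are the ones from the request above. It treats from as inclusive, to as exclusive, and a missing bound as open-ended, and it shows that a value outside every range is simply not counted.

```python
def range_agg(docs, field, ranges):
    """In-memory sketch of range-aggregation semantics (not Elasticsearch code):
    `from` is inclusive, `to` is exclusive, a missing bound is open-ended."""
    buckets = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        count = sum(
            1 for doc in docs
            if field in doc
            and (low is None or doc[field] >= low)
            and (high is None or doc[field] < high)
        )
        bucket = dict(r)  # keep key/from/to as given, like the response does
        bucket["doc_count"] = count
        buckets.append(bucket)
    return buckets

ranges = [
    {"key": "Less popular", "to": 50},
    {"key": "popular", "from": 50, "to": 100},
    {"key": "Most popular", "from": 100, "to": 200},
]
# Hypothetical view counts; 200 and 950 fall outside every range.
docs = [{"views": v} for v in [10, 49, 50, 99, 100, 150, 200, 950]]
for bucket in range_agg(docs, "views", ranges):
    print(bucket["key"], bucket["doc_count"])
```

Here 50 lands in popular rather than Less popular, and the two documents with 200 or more views appear in no bucket, which matches why the three buckets in the response add up to fewer than the total hits.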
Filter Aggregation:
We use the filter aggregation to narrow down the set of documents used for an aggregation. The filter selects the documents that match certain criteria, and the sub-aggregations are then applied only to those documents. See the example below:
GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories": {
      "filter": {
        "term": {
          "category_name.keyword": "DevOps"
        }
      },
      "aggs": {
        "avg_views": {
          "avg": {
            "field": "views"
          }
        }
      }
    }
  }
}
In the above request, I first filter the data to the DevOps category and then apply an avg sub-aggregation to get the average number of blog views. Executing the request gives the following result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count": 9,
      "avg_views": {
        "value": 916.55
      }
    }
  }
}
In the above result, doc_count shows that 9 blogs are in the DevOps category, and avg_views gives their average number of views (916.55). In this way, we can apply the filter aggregation.
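As a rough in-memory model of the request above, the Python sketch below keeps only the documents whose category exactly equals "DevOps" (a term query on a keyword field matches the exact, case-sensitive value) and then averages views over them. The function name filter_avg and the sample documents are hypothetical.

```python
def filter_avg(docs, field, value, metric_field):
    """Sketch of a filter aggregation with an avg sub-aggregation (not
    Elasticsearch code): keep documents whose `field` exactly equals `value`,
    like a term query on a keyword field, then average `metric_field`."""
    matching = [doc for doc in docs if doc.get(field) == value]
    values = [doc[metric_field] for doc in matching if metric_field in doc]
    # Elasticsearch reports null for an avg over no values; None mirrors that.
    avg = sum(values) / len(values) if values else None
    return {"doc_count": len(matching), "avg_views": {"value": avg}}

# Hypothetical sample documents.
docs = [
    {"category_name": "DevOps", "views": 100},
    {"category_name": "DevOps", "views": 300},
    {"category_name": "Programming", "views": 50},
    {"category_name": "devops", "views": 999},  # different case: no exact match
]
print(filter_avg(docs, "category_name", "DevOps", "views"))
```

The lowercase "devops" document is left out, which is why the request filters on category_name.keyword with the exact value "DevOps".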
Other Blogs on Elastic Stack:
Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster
If you found this article interesting, then you can explore “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” books to get more insight about Elastic Stack, how to perform data analysis, and how you can create dashboards for key performance indicators using Kibana.
You can also follow me on:
- LinkedIn: https://www.linkedin.com/in/anubioinfo/
- Twitter: https://twitter.com/anu4udilse
- Medium: https://anubioinfo.medium.com
Bucket Aggregation in Elasticsearch
Aug 29, 2018, 7:15:06 PM | Anurag Srivastava