Introduction to Elasticsearch Aggregations
Aggregations let us group our data and extract statistics from it. They give us insight into our data and can be applied to a wide range of problems; for example, we can use Elasticsearch aggregations to build a recommendation engine for a website.
Now, let us jump into Elasticsearch aggregations and learn how to apply them. There are mainly four types of aggregations in Elasticsearch:
- Metric: Here we extract metrics from a set of documents; for example, on a numeric field we can get the average, max, min, etc.
- Matrix: This type of aggregation works on multiple fields of the documents; after extracting the values from those fields, it creates a matrix that provides insight into those fields.
- Bucketing: A bucketing aggregation is like GROUP BY in an RDBMS: it groups the documents into buckets, and each bucket holds the documents that match its criterion.
- Pipeline: This type of aggregation works on the output of other aggregations rather than on the documents themselves.
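To make the difference between a metric and a bucketing aggregation concrete, here is a minimal plain-Python sketch of what each one computes. This is not Elasticsearch code; the three sample documents and their values are made up for illustration, and only the field names (category_name, views) are borrowed from the blog data.

```python
from collections import Counter
from statistics import mean

# Hypothetical documents, invented for this illustration.
docs = [
    {"category_name": "Cars", "views": 148},
    {"category_name": "Cars", "views": 52},
    {"category_name": "News", "views": 100},
]

# Metric aggregation: reduce one numeric field to single values.
views = [d["views"] for d in docs]
metrics = {"avg": mean(views), "min": min(views), "max": max(views)}

# Bucketing aggregation: group documents by a criterion and count each group.
buckets = Counter(d["category_name"] for d in docs)

print(metrics)
print(dict(buckets))
```

A metric aggregation collapses the whole document set into numbers, while a bucketing aggregation keeps one entry per group.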
We will see these aggregation types in detail now. So let us start by understanding the syntax of an aggregation:
"aggregations|aggs" : {
  "<name of aggregation>" : {
    "<type of aggregation>" : {
      <body of aggregation>
    }
  }
}
This is the simplest representation of an Elasticsearch aggregation. Now let us see what each line of the example means.
- The first line denotes the aggregation keyword, where we can use "aggregations" or "aggs".
- In the second line, we need to specify a name for the aggregation.
- In the third line, we need to specify the type of aggregation, such as terms.
- Then we need to specify the actual aggregation body.
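The four parts described above can be sketched as a small Python helper that assembles the request body. The helper name build_aggregation and the aggregation name avg_views are hypothetical, chosen only for this example; the "avg" type with a "field" body is a standard metric aggregation, used here on the views field of the blog data.

```python
import json


def build_aggregation(name, agg_type, body):
    """Return a request body holding one named aggregation (hypothetical helper)."""
    # keyword -> aggregation name -> aggregation type -> aggregation body
    return {"aggs": {name: {agg_type: body}}}


request_body = build_aggregation("avg_views", "avg", {"field": "views"})
print(json.dumps(request_body, indent=2))
```

The printed JSON has the same nesting as the syntax template: the keyword, then the name, then the type, then the body.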
Now let us look at the format of the data that I am going to use for the aggregation:
{
  "_index": "bqstack",
  "_type": "blogs",
  "_id": "EwJnGWQBnhG38eKPq5Bo",
  "_score": 1,
  "_source": {
    "category_name": "Cars",
    "name": "Rocky Paul",
    "edit_approved": false,
    "email": "rocky.paul.9867@xyz.com",
    "edited_blog_content": null,
    "category_id": 35,
    "author_id": 75,
    "create_date": "2018-05-09T13:28:20.917Z",
    "preview_image": "blog_57.png",
    "approved": false,
    "views": 148,
    "@version": "1",
    "blog_content": """
<p><span class="storyText"><p class="MsoNormal"><span lang="EN-GB">The central government approved green licence plates for electric vehicles </span>
""",
    "tags": "",
    "id": 57,
    "blog_title": "Centre approves green licence plates for electric cars",
    "update_date": "2018-05-16T18:30:22.669Z",
    "category_image": "cars.jpg",
    "@timestamp": "2018-06-19T18:56:20.427Z"
  }
}
The above document is taken from the bqstack index. Let us now bucket the documents of this index by category name:
GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories": {
      "terms": {
        "field": "category_name",
        "size": 5
      }
    }
  }
}
In the above example, we are doing the following:
- Added size=0 to the _search request so that no documents are listed in the response.
- Used the keyword "aggs" to start the aggregation.
- Named the aggregation "blog_categories" so that the name is meaningful, because we are going to bucket on category names.
- After the aggregation name, specified the "terms" aggregation type along with the field to bucket on.
- Added "size": 5 because there are multiple categories and I am interested in the top 5 only.
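As a rough mental model of what this query asks for, the following plain-Python sketch counts the values of one field and keeps only the top entries. It is a simplification that runs on an in-memory list, not what Elasticsearch does internally; the function name terms_aggregation and the sample documents are hypothetical. The output keys mirror the buckets and sum_other_doc_count fields of a terms aggregation response.

```python
from collections import Counter


def terms_aggregation(docs, field, size):
    """Count the values of `field` and keep the `size` most frequent ones."""
    counts = Counter(d[field] for d in docs if field in d)
    top = counts.most_common(size)
    # Documents whose value did not make it into the top `size` buckets.
    other = sum(counts.values()) - sum(count for _, count in top)
    return {
        "buckets": [{"key": key, "doc_count": count} for key, count in top],
        "sum_other_doc_count": other,
    }


# Hypothetical documents, invented for this illustration.
names = ["cars", "cars", "cars", "news", "news", "poetry"]
docs = [{"category_name": name} for name in names]

result = terms_aggregation(docs, "category_name", size=2)
print(result)
```

In a real cluster the counts are collected on each shard and then merged, so they can be approximate; this single-list sketch ignores that.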
After executing the above query, we get the following response:
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 54,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 19,
      "buckets": [
        {
          "key": "programming",
          "doc_count": 11
        },
        {
          "key": "devops",
          "doc_count": 9
        },
        {
          "key": "news",
          "doc_count": 8
        },
        {
          "key": "poetry",
          "doc_count": 5
        },
        {
          "key": "informational",
          "doc_count": 4
        }
      ]
    }
  }
}
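Once the response is available as a parsed JSON object, the buckets can be read like any nested dictionary. The sketch below hard-codes a trimmed copy of the response shown above instead of calling Elasticsearch, and it uses "aggregations" as the top-level key of the aggregation results; the variable names are only illustrative.

```python
# Trimmed copy of the response above; only the aggregation part is kept.
response = {
    "aggregations": {
        "blog_categories": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 19,
            "buckets": [
                {"key": "programming", "doc_count": 11},
                {"key": "devops", "doc_count": 9},
                {"key": "news", "doc_count": 8},
                {"key": "poetry", "doc_count": 5},
                {"key": "informational", "doc_count": 4},
            ],
        }
    }
}

# Look the aggregation up by the name given in the request.
agg = response["aggregations"]["blog_categories"]

# Map each bucket key to its document count.
top_counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
in_top_buckets = sum(top_counts.values())

print(top_counts)
print(in_top_buckets, agg["sum_other_doc_count"])
```

The aggregation is found under the same name that was used in the request, which is why a meaningful name such as "blog_categories" helps.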
In this way, we can create buckets on any field of the document. This was a basic introduction to aggregations; in my next blog on aggregations, I will explain more complex examples that give better insight into our data.
Other Blogs on Elastic Stack:
Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster
If you found this article interesting, then you can explore “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” books to get more insight about Elastic Stack, how to perform data analysis, and how you can create dashboards for key performance indicators using Kibana.
You can also follow me on:
- LinkedIn: https://www.linkedin.com/in/anubioinfo/
- Twitter: https://twitter.com/anu4udilse
- Medium: https://anubioinfo.medium.com