Introduction to Elasticsearch

by Anurag Srivastava, Apr 14, 2018, 1:18:05 PM | 6 minutes

Next blogs in this Elasticsearch series:

Elasticsearch Installation and Configuration on Ubuntu 14.04
Elasticsearch REST APIs
Basics of Data Search in Elasticsearch
Log analysis with Elastic stack

Elasticsearch is a full-text search engine that can also be used as a NoSQL database and as an analytics engine. It is easy to scale, schema-less, and near real-time, and it provides a RESTful interface for its operations. It uses an inverted index for data storage. Elasticsearch is written in Java and built on top of Lucene. We can explain Elasticsearch using the following terms:

Full-text Search Engine
NoSQL Database
Analytics Engine
Easy to Scale
RESTful interface
Schema-less
Inverted Index
Near Real-Time
Elastic Stack
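
Of these terms, the inverted index is probably the least self-explanatory, so here is a minimal sketch of the idea in Python. It is a toy model only, not how Lucene actually stores its index: the function names and sample documents are made up for illustration, and real analysis (stemming, stop words, scoring) is left out.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the sorted IDs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start with the documents for the first term, then intersect the rest.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)

# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
    3: "Kibana visualizes data",
}
index = build_inverted_index(docs)
print(search(index, "search engine"))
print(search(index, "search"))
```

Because the lookup goes from term to documents rather than scanning every document, a query only touches the entries for the terms it contains.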

Given these characteristics, we can use Elasticsearch in the following ways:

As the primary backend for your website.
As an addition to an existing system that already runs on its own data source.
For monitoring and analysis of an existing application without affecting the behavior of that application.

Elasticsearch can be used in many kinds of applications because it has clients for different languages, through which we can integrate it into an application. Some of the clients are as follows:

Java
PHP
Perl
Python
.NET
Ruby
JavaScript
Groovy
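
Since Elasticsearch exposes a RESTful interface, we can also talk to it with a plain HTTP library when no dedicated client is at hand. The sketch below only builds the request and does not send it. It assumes a node listening on localhost at the default HTTP port 9200; the index name `customer`, the document, and the helper name are hypothetical, and the `_doc` path segment may differ on older versions that use custom type names.

```python
import json
import urllib.request

# Assumption: a node is reachable on localhost at the default HTTP port 9200.
BASE_URL = "http://localhost:9200"

def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that would index one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical index name and document, for illustration only.
request = build_index_request("customer", 1, {"name": "John Doe"})
print(request.get_method(), request.full_url)

# To actually send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

In a real application, the official client for your language is usually the more convenient choice, as it handles connections and serialization for you.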

Elasticsearch has many use cases, such as:

Online Web Store
Price Alerting Platform
Analytics / Business-intelligence
Central Log Management
Fraud Management
System Monitoring
E-commerce Search Solutions
Visualizing Data

Elasticsearch has the following components:

Cluster:

A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch". A cluster can have a single node or multiple nodes, and we call it a single-node or multi-node cluster accordingly. The distributed nature of Elasticsearch allows a cluster to scale horizontally: we increase the node count by adding more machines.

Node:

A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name, which by default is a random Universally Unique IDentifier (UUID) that is assigned to the node at startup. In a single cluster, you can have as many nodes as you want. Nodes can be of different types, such as master node, data node, ingest node, and machine learning node.

The master node supervises the cluster: it tracks which nodes are part of the cluster and decides which shards to allocate to which nodes. It is important for maintaining a healthy Elasticsearch cluster. We can make a node eligible to act as master by setting the 'node.master' option to true in the Elasticsearch configuration file.

Data nodes are responsible for storing data and performing CRUD operations on it; they also perform data search and aggregation. We can create a data node by setting the 'node.data' option to true in the Elasticsearch configuration file.

Ingest nodes are used to enrich and transform data before it is actually indexed. They provide an ingest pipeline through which we can transform data as required. We can create an ingest node by setting the 'node.ingest' option to true in the Elasticsearch configuration file.

Machine learning nodes run dedicated machine learning jobs. They are needed whenever we want to run machine learning jobs using the Elastic Stack. We can create a machine learning node by setting the 'node.ml' option to true in the Elasticsearch configuration file.
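
To see how these four options combine, here is a small Python sketch that prints the configuration lines for a node dedicated to a single role. It is only an illustration: the helper name is hypothetical, it uses nothing beyond the four option names mentioned above, and newer Elasticsearch releases configure roles differently, so please check the documentation for your version. The 'node.ml' option is also only recognized when the machine learning feature is installed.

```python
# Role options named in the text; a dedicated node enables exactly one of them.
ROLE_SETTINGS = ("node.master", "node.data", "node.ingest", "node.ml")

def dedicated_node_config(role):
    """Return configuration-file lines for a node dedicated to one role."""
    setting = f"node.{role}"
    if setting not in ROLE_SETTINGS:
        raise ValueError(f"unknown role: {role}")
    # Enable the requested role and explicitly disable all the others.
    return [
        f"{name}: {'true' if name == setting else 'false'}"
        for name in ROLE_SETTINGS
    ]

for line in dedicated_node_config("data"):
    print(line)
```

Setting the other options to false explicitly matters because a node that is not told otherwise may take on several roles at once.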

Index:

An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. It is a logical namespace to store similar types of documents.

Document:

A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. An Elasticsearch document is a single record stored as JSON key-value pairs, where the key is the name of a field and the value is the value of that field. Elasticsearch documents are flexible, and we can store documents with different sets of fields in a single index.
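
As a quick illustration, the Python sketch below serializes two customer documents to JSON. The field names and values are invented for this example; the point is only that the two documents do not share the same set of fields.

```python
import json

# Hypothetical documents; the field names are invented for illustration.
customer = {"name": "John Doe", "email": "john@example.com", "age": 30}
customer_with_other_fields = {"name": "Jane Roe", "city": "Delhi"}

for doc in (customer, customer_with_other_fields):
    # Each document is serialized to a JSON object of field-name/value pairs.
    print(json.dumps(doc, sort_keys=True))

# The two documents share only some fields, which a schema-less index tolerates.
shared_fields = sorted(set(customer) & set(customer_with_other_fields))
print(shared_fields)
```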

Shard:

An index can potentially store a large amount of data that can exceed the hardware limits of a single node. To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster. Shards are of two types: primary and replica. Primary shards hold the original data, while replica shards hold copies of the primary shards.
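
The shard counts are passed as index settings when the index is created. Below is a minimal Python sketch that builds such a request body using the `number_of_shards` and `number_of_replicas` settings; the helper names are hypothetical, and the body would be sent with the index creation request rather than used on its own.

```python
import json

def index_settings(primary_shards, replicas_per_primary):
    """Build the request body for creating an index with explicit shard counts."""
    if primary_shards < 1 or replicas_per_primary < 0:
        raise ValueError("need at least one primary shard and a non-negative replica count")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }

def total_shard_copies(primary_shards, replicas_per_primary):
    """Count all shard copies: every primary plus each of its replicas."""
    return primary_shards * (1 + replicas_per_primary)

body = index_settings(primary_shards=2, replicas_per_primary=1)
print(json.dumps(body, indent=2))
print(total_shard_copies(2, 1))
```

Note that the replica count is per primary shard, so two primaries with one replica each give four shard copies in total.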

Let's take an example: say we have an Elasticsearch cluster with two nodes, and we want to index a data set with two primary shards and one replica of each. The shards are distributed across the two nodes in such a way that we do not lose any data if one machine fails. Please refer to the diagram below:

[Diagram: shards P1, P2, R1, and R2 distributed across the two nodes]

In the above diagram, P1 and P2 are primary shards while R1 and R2 are replica shards. Each node holds a complete copy of the data, so even if one machine goes down, we can still fetch the complete data set.
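
The same reasoning can be checked with a few lines of Python. This is a toy placement rule written for this example, not Elasticsearch's actual allocation algorithm, and the node names are hypothetical. It shows one possible layout in which a replica never sits on the same node as its primary, and then verifies that losing either node still leaves a copy of every shard.

```python
def allocate(primaries, nodes):
    """Toy placement: primary i goes on one node, its replica on the next node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate a replica from its primary")
    layout = {node: [] for node in nodes}
    for i in range(primaries):
        primary_node = nodes[i % len(nodes)]
        replica_node = nodes[(i + 1) % len(nodes)]
        layout[primary_node].append(f"P{i + 1}")
        layout[replica_node].append(f"R{i + 1}")
    return layout

def data_available(layout, failed_node, primaries):
    """Check that every shard number still has a copy on a surviving node."""
    surviving = {
        int(shard[1:])
        for node, shards in layout.items()
        if node != failed_node
        for shard in shards
    }
    return surviving == set(range(1, primaries + 1))

# Hypothetical node names for a two-node cluster.
layout = allocate(primaries=2, nodes=["node-1", "node-2"])
print(layout)
print(all(data_available(layout, node, 2) for node in layout))
```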

In this blog, I have given just an introduction to Elasticsearch. In the next blog, I will cover details such as how to index and search documents in Elasticsearch.

Other Blogs on Elastic Stack:
Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

If you found this article interesting, you can explore the books “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” to get more insight into the Elastic Stack, how to perform data analysis, and how to create dashboards for key performance indicators using Kibana.

You can also follow me on:

- LinkedIn: https://www.linkedin.com/in/anubioinfo/

- Twitter: https://twitter.com/anu4udilse

- Medium: https://anubioinfo.medium.com


Comments (2)

jitender yadav
Apr 15, 2018, 11:26:01 AM
Sir, can you please elaborate all terms like cluster, node, index, type, document, shard in different blogs ..
Anurag Srivastava
Apr 15, 2018, 3:15:30 PM
Sure, I will do that. Please wait a couple of days, as I am a little busy with a presentation.
