Introduction to Elasticsearch


by Anurag Srivastava, Apr 14, 2018, 1:18:05 PM | 6 minutes


Next blogs in this Elasticsearch series:

Elasticsearch Installation and Configuration on Ubuntu 14.04
Elasticsearch REST APIs
Basics of Data Search in Elasticsearch
Log analysis with Elastic stack


Elasticsearch is a full-text search engine that can also be used as a NoSQL database and as an analytics engine. It is easy to scale, schema-less, and near real-time, and it provides a RESTful interface for its operations. It stores data in an inverted index, a structure that maps each term to the documents containing it. Elasticsearch is written in Java and built on top of Lucene. We can describe Elasticsearch using the following terms:

  • Full-text Search Engine
  • NoSQL Database
  • Analytics Engine
  • Easy to Scale
  • RESTful interface
  • Schema-less
  • Inverted Index
  • Near Real-Time
  • Elastic Stack
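
To make the "Inverted Index" term more concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not how Lucene implements it: the whitespace tokenizer, the function names, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
    3: "Kibana visualizes data",
}
inverted = build_inverted_index(docs)
print(inverted["search"])                 # ids of documents containing "search"
print(search(inverted, "search engine"))  # ids containing both terms
```

Because the lookup starts from the term rather than from the document, a search does not have to scan every document, which is the main reason a full-text engine uses this structure.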

These are the characteristics of Elasticsearch. We can use it in the following ways:

  • As the primary backend for your website.
  • As an addition to an existing system that keeps running on its existing data source.
  • For monitoring and analysis of an existing application, without affecting the behavior of that application.

Elasticsearch can be integrated into many kinds of applications because it has clients for different languages. Some of the clients are as follows:

  • Java
  • PHP
  • Perl
  • Python
  • .NET
  • Ruby
  • JavaScript
  • Groovy
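
Underneath, these clients talk to the RESTful interface, which accepts JSON over HTTP. The sketch below uses only the Python standard library to build (but not send) two requests, so it runs without a cluster. The base URL assumes a local node on the default port 9200, and the "customer" index and its fields are hypothetical.

```python
import json
import urllib.request

# Assumed base URL: a local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_id, document):
    """Build (but do not send) a request that stores one JSON document."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build (but do not send) a full-text match query against one field."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = index_request("customer", 1, {"name": "Jane Doe"})
print(req.get_method(), req.full_url)

# To send the request to a running node, something like this would be needed:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.read().decode("utf-8"))
```

In a real application one of the language clients listed above is usually the more convenient choice; this sketch only shows what travels over the wire.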

Elasticsearch has many use cases, such as:

  • Online Web Store
  • Price Alerting Platform
  • Analytics / Business-intelligence
  • Central Log Management
  • Fraud Management
  • System Monitoring
  • E-commerce Search Solutions
  • Visualizing Data

Elasticsearch has the following components:

Cluster:
A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch". A cluster can contain one node or many, and we call it a single-node cluster or a multi-node cluster accordingly. The distributed design of Elasticsearch lets a cluster scale horizontally: we increase the node count by adding more machines.

Node:
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name, which by default is a random Universally Unique IDentifier (UUID) that is assigned to the node at startup. In a single cluster, you can have as many nodes as you want. Nodes can be of different types, such as master node, data node, ingest node, and machine learning node.

The master node supervises the cluster: it tracks which nodes are part of the cluster and decides which shards to allocate to which nodes. A master node is important for keeping an Elasticsearch cluster healthy. We can make a node eligible to act as master by setting the 'node.master' option to true in the Elasticsearch configuration file.

Data nodes are responsible for storing data and performing CRUD operations. They also perform data search and aggregation. We can create a data node by setting the 'node.data' option to true in the Elasticsearch configuration file.

Ingest nodes are used to enrich and transform data before it is indexed. They provide a data ingest pipeline with which we can transform data as required. We can create an ingest node by setting the 'node.ingest' option to true in the Elasticsearch configuration file.
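
The idea of an ingest pipeline, a chain of steps applied to each document before it is indexed, can be sketched in plain Python. This is a toy model and not the Elasticsearch ingest API: the step functions, field names, and sample document are all hypothetical.

```python
def lowercase_field(field):
    """Return a step that lowercases one string field if it is present."""
    def step(doc):
        if field in doc:
            doc[field] = doc[field].lower()
        return doc
    return step


def set_field(field, value):
    """Return a step that adds or overwrites a field with a fixed value."""
    def step(doc):
        doc[field] = value
        return doc
    return step


def run_pipeline(steps, doc):
    """Apply each step in order to a copy of the document."""
    result = dict(doc)
    for step in steps:
        result = step(result)
    return result


# A hypothetical pipeline: normalize the email, then tag where the data came from.
pipeline = [lowercase_field("email"), set_field("source", "signup-form")]
raw = {"name": "Jane Doe", "email": "Jane.Doe@Example.COM"}
print(run_pipeline(pipeline, raw))
```

The document that reaches the index is the transformed one; the raw input is left untouched because the pipeline works on a copy.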

Machine learning nodes run dedicated machine learning jobs. They are needed whenever we want to run machine learning jobs using the Elastic Stack. We can create a machine learning node by setting the 'node.ml' option to true in the Elasticsearch configuration file.
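
The four options above can be combined to give a node a single dedicated role. The following sketch generates the corresponding configuration lines; the helper names are made up for this example, and turning the other roles off is an assumption about what a "dedicated" node should look like. The configuration file is typically elasticsearch.yml. Note that these boolean options match the Elasticsearch versions this article describes; newer releases configure roles through a 'node.roles' setting instead, so check the documentation for your version.

```python
# The four boolean options named in the text.
ROLE_OPTIONS = ("node.master", "node.data", "node.ingest", "node.ml")


def dedicated_node_settings(role):
    """Return settings that enable exactly one role and disable the others."""
    option = f"node.{role}"
    if option not in ROLE_OPTIONS:
        raise ValueError(f"unknown role: {role}")
    return {name: name == option for name in ROLE_OPTIONS}


def render(settings):
    """Render the settings as lines suitable for the configuration file."""
    return "\n".join(
        f"{name}: {str(value).lower()}" for name, value in settings.items()
    )


# Print the lines for a dedicated data node.
print(render(dedicated_node_settings("data")))
```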

Index:
An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. It is a logical namespace to store similar types of documents. 

Document:
A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. An Elasticsearch document is a single record stored as JSON key-value pairs: the key is the name of a field and the value is that field's value. Elasticsearch documents are flexible, and we can store documents with different sets of fields in a single index.
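
As an illustration, here are two hypothetical customer documents with different sets of fields that could live in the same index. The index is modeled as a plain Python list, and all field names and values are invented for this example.

```python
import json

# A hypothetical "customer" index modeled as a plain list of documents.
customer_index = [
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "John Smith", "city": "Pune", "orders": 3},
]

# Each document serializes to a JSON object of key-value pairs.
for doc in customer_index:
    print(json.dumps(doc))


def field_names(index):
    """Collect every field name that appears in any document of the index."""
    names = set()
    for doc in index:
        names.update(doc)
    return sorted(names)


# The documents share "name" but otherwise carry different fields.
print(field_names(customer_index))
```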

Shard:
An index can potentially store a large amount of data that can exceed the hardware limits of a single node. To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster. Shards are of two types: primary and replica. Primary shards hold the original data, while replica shards hold copies of the primary shards.
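
How a document ends up in one particular shard can be pictured as hashing a routing key (for example the document id) and taking the remainder by the number of primary shards. The sketch below is a simplification under stated assumptions: zlib.crc32 is only a stand-in for the hash function Elasticsearch actually uses, and the document ids are hypothetical.

```python
import zlib


def pick_shard(routing_key, number_of_primary_shards):
    """Map a routing key (for example a document id) to a primary shard number."""
    # zlib.crc32 is a stand-in: it is stable across runs, unlike Python's
    # built-in hash() for strings. Elasticsearch uses its own hash function.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % number_of_primary_shards


# Route a few hypothetical document ids across two primary shards.
for doc_id in ["customer-1", "customer-2", "customer-3", "customer-4"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 2))
```

The same key always maps to the same shard, which is what lets a later lookup by id go straight to the right place. It also suggests why the number of primary shards is chosen when the index is created: changing the divisor would change where existing documents are expected to be.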

Let's take an example. Say we have an Elasticsearch cluster with two nodes, and we want to index a data set with two primary shards and one replica of each primary shard. The data will be distributed across the two nodes in such a way that we do not lose any data if one machine fails. Please refer to the below diagram:

[Diagram: a two-node cluster holding primary shards P1 and P2 and replica shards R1 and R2]

In the above diagram, P1 and P2 are primary shards while R1 and R2 are replica shards. Each node now holds a complete copy of the data, so even if one machine goes down, we can still fetch the complete data set.
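
The two-node example can be checked with a small simulation. This is only a sketch of the idea, not the allocation algorithm Elasticsearch uses: the round-robin placement, the node names, and the reading of R1 as the copy of P1 (and R2 as the copy of P2) are assumptions made for illustration.

```python
def allocate(primaries, nodes):
    """Place each primary on one node and its replica on the next node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate a replica from its primary")
    layout = {node: [] for node in nodes}
    for i in range(primaries):
        primary_node = nodes[i % len(nodes)]
        replica_node = nodes[(i + 1) % len(nodes)]
        layout[primary_node].append(f"P{i + 1}")
        layout[replica_node].append(f"R{i + 1}")
    return layout


def shards_available(layout, failed_node):
    """Return the shard numbers still served after one node fails."""
    survivors = set()
    for node, copies in layout.items():
        if node != failed_node:
            # "P1" and "R1" both carry the data of shard number "1".
            survivors.update(copy[1:] for copy in copies)
    return sorted(survivors)


layout = allocate(primaries=2, nodes=["node-1", "node-2"])
print(layout)
print(shards_available(layout, failed_node="node-1"))
```

Whichever of the two nodes fails, the surviving node still holds a copy of shard 1 and shard 2, which is the property the example describes.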


In this blog, I have given only an introduction to Elasticsearch. In the next blog, I will cover details such as how to index and search documents in Elasticsearch.

Other Blogs on Elastic Stack:
Introduction to Elasticsearch

Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack 
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch 
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

If you found this article interesting, you can explore the books “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” to get more insight into the Elastic Stack, how to perform data analysis, and how to create dashboards for key performance indicators using Kibana.


You can also follow me on:

- LinkedIn: https://www.linkedin.com/in/anubioinfo/

- Twitter: https://twitter.com/anu4udilse

- Medium: https://anubioinfo.medium.com



