Importing MySQL data into Elasticsearch


DevOps
by Anurag Srivastava | Feb 9, 2019, 12:06:18 PM | 5 minutes

In this blog, I am going to explain how you can push MySQL data into Elasticsearch using Logstash. As an example, I am taking the BQStack website, which is built on a MySQL database. What I am going to do is configure Logstash with the JDBC input plugin to connect to the MySQL database. After connecting, I will execute the Logstash configuration to fetch records from the database and push them into an Elasticsearch index.

If you want to know the basics of Logstash, please refer to the "Introduction to Logstash" blog, where I have explained them in detail.

Once we push the data from the MySQL database into Elasticsearch, we can create dashboards in Kibana as per our requirements. I am using Ubuntu for this configuration demonstration. Now let us start the process of pushing MySQL data into Elasticsearch. First, we need to create a Logstash configuration file inside the /etc/logstash/conf.d/ directory. So let us create a file named blog.conf and write the following code:

# file: blog.conf
input {
  jdbc {
    # path to the MySQL JDBC driver jar (must be downloaded separately)
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-5.1.23-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # MySQL JDBC connection string to our database
    jdbc_connection_string => "jdbc:mysql://url-of-db:3306/db_name?zeroDateTimeBehavior=convertToNull"
    # the user we wish to execute our statement as
    jdbc_user => "username"
    jdbc_password => "password"
    # run the query every minute
    schedule => "* * * * *"
    # our query to fetch blog details; rows are ordered by the tracking
    # column (id) so that sql_last_value advances correctly
    statement => "SELECT blg.*, concat(au.first_name, ' ', au.last_name) as name, au.email as email, cc.category_name, cc.category_image FROM `blog_blogs` as blg left join auth_user as au on au.id = blg.author_id left join category_category as cc on cc.id = blg.category_id where blg.id > :sql_last_value order by blg.id"
    # track the last fetched id so only new rows are picked up on each run
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
  }
}
output {
  elasticsearch {
    hosts => "http://127.0.0.1:9200"
    index => "bqstack"
    document_type => "blogs"
  }
}


In the above Logstash configuration file, we have input and output sections. Under the input section, we connect to the MySQL database to fetch the data; under the output section, we send that data to the Elasticsearch cluster. In the input section, we use the JDBC plugin. The first parameter is jdbc_driver_library, which gives the path to the JDBC driver library; the JDBC input plugin does not ship with the JDBC driver, so we need to download it ourselves and provide its path here. Next is jdbc_driver_class, where we provide the driver class, and then the JDBC connection string. The connection string has a fixed syntax in which we provide the database type, the URL of the database server, the port, and the database name. Finally, we set the username and password parameters for the database.
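
Because the driver jar is not bundled with the plugin, it has to be fetched separately and placed at the path referenced in the configuration. A minimal sketch of doing that on Ubuntu, assuming the Connector/J 5.1.23 build used in the configuration is still available from Maven Central (the URL and version are assumptions; adjust them for your setup):

wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.23/mysql-connector-java-5.1.23.jar
sudo mv mysql-connector-java-5.1.23.jar /usr/share/logstash/mysql-connector-java-5.1.23-bin.jar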

Once these database-connection parameters are set, we can configure the scheduler by setting the schedule parameter; after that, we write the actual query that will be executed on the MySQL database after a successful connection. There is a specific syntax for setting the schedule frequency, and the JDBC input plugin's syntax is quite similar to that of a cron job. For example:

"* * * * * " => runs every second
"30 2 * * *" => runs as 2:30AM
"10 22 * * *" => runs at 10:10PM


The statement parameter is where we write the query. In the above query, I compare the blog id with :sql_last_value, a dynamic value that holds the last seen value of the tracking column (here, the id column of the table). After each query execution, the value of sql_last_value is updated and written to a file; the next time the query runs, the value is read back from that file. We also provide the data type of the tracking column along with the tracking column name; since id is a numeric value, I have set tracking_column_type to "numeric".
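
By default, the JDBC input plugin keeps this value in a hidden file named .logstash_jdbc_last_run in the home directory of the user running Logstash; the location can be changed with the plugin's last_run_metadata_path parameter. Assuming the default location, you can inspect the stored value like this (the value shown is just an illustrative example):

cat ~/.logstash_jdbc_last_run
# --- 42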

After making all these configuration changes, we need to execute the Logstash configuration using the following command on Ubuntu:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/blog.conf --path.data=/tmp/bq
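
Once Logstash has run the query at least once, we can confirm that the records have reached Elasticsearch by checking the document count of the index (this assumes Elasticsearch is running locally, as set in the output section above):

curl -XGET "http://127.0.0.1:9200/bqstack/_count?pretty"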


This is a one-time command; after running it, the scheduler starts working as per our schedule entry, so the query will be executed every minute. We can add a cron entry to automatically start the Logstash configuration whenever the system restarts. Run the following command to open the crontab in Linux:

crontab -e

The above command opens the crontab file, where we can add the following entry to ensure that after a machine restart the Logstash configuration is executed and the JDBC input plugin scheduler starts running the queries to fetch data from the database.

@reboot /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/blog.conf --path.data=/tmp/bq

The above crontab entry starts with @reboot, which makes the entry run after each machine restart; after it, we give the command that executes the Logstash configuration.
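
Since this Logstash process runs unattended, it can help to capture its output in a log file. A variant of the same entry, assuming a log path of /var/log/logstash-blog.log that is writable by the crontab user (the redirection is my own addition, not part of the original setup):

@reboot /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/blog.conf --path.data=/tmp/bq >> /var/log/logstash-blog.log 2>&1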

In this way, we can configure Logstash with the JDBC input plugin to read RDBMS data and push it into Elasticsearch. Once the data is in Elasticsearch, we can create visualizations and dashboards using Kibana.

Other Blogs on Elastic Stack:

Configure Logstash to push MongoDB data into Elasticsearch
Load CSV Data into Elasticsearch
Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

If you found this article interesting, then you can explore “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” books to get more insight about Elastic Stack, how to perform data analysis, and how you can create dashboards for key performance indicators using Kibana.

You can also follow me on:

- LinkedIn: https://www.linkedin.com/in/anubioinfo/
- Twitter: https://twitter.com/anu4udilse
- Medium: https://anubioinfo.medium.com



Comments (12)

  • Amitav Swain
    May 31, 2019, 4:54:54 AM

    Can I pass more than one column into tracking_column? I want to track value changes in two columns.

  • jitender yadav
    Jun 5, 2019, 11:41:42 AM

    Same question; can you please help us?

  • Jitu Singh
    Jun 5, 2019, 12:08:20 PM

    Hi Anurag

  • Anurag Srivastava
    Jun 5, 2019, 12:11:33 PM

    @Amitav, @Jitender: No, you cannot use more than one column here. The reason is that we want to track changes in the table, which can easily be done using an auto-increment field or a timestamp field.

  • sampath kumar
    Dec 2, 2019, 11:34:47 AM

    How to mask sensitive data in Logstash? I want to pass the DB username and password as parameters (instead of hardcoding them). How to pass an encrypted username and password in the Logstash config file:

    input {
      jdbc {
        jdbc_driver_library => "C:\Program Files\DbVisualizer\jdbc\oracle\ojdbc6.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:@TEST:1244:TESTDB"
        jdbc_user => "user_id"
        #jdbc_password => ""
        jdbc_validate_connection => true
        jdbc_fetch_size => 1000
        schedule => "* * * * *"
        statement => "Select EMP_ID from Employee_data"
      }
    }
    output {
      elasticsearch {
        protocol => http
        index => "testdb"
        document_type => "TESTDB"
        host => "127.0.0.1:9200"
        bind_host => "127.0.0.1"
      }
    }

  • Anurag Srivastava
    Dec 2, 2019, 12:11:50 PM

    Hi Sampath, you can use the keystore: create a key and store the value there, then reference the keystore key in the Logstash configuration file.

    set +o history
    export LOGSTASH_KEYSTORE_PASS=mypassword
    set -o history
    bin/logstash-keystore create

    Please refer to the link below: https://www.elastic.co/guide/en/logstash/current/keystore.html

  • sampath kumar
    Dec 2, 2019, 12:31:41 PM

    I could not understand that concept, and I am not able to find the path either: "The keystore must be located in Logstash's path.settings directory."

  • Anurag Srivastava
    Dec 2, 2019, 12:36:14 PM

    The keystore must be located in Logstash's path.settings directory. This is the same directory that contains the logstash.yml file. You will see a warning if path.settings does not point to the same directory as logstash.yml. In your command, you can pass the path as well:

    sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash create

  • sampath kumar
    Dec 2, 2019, 12:54:38 PM

    I am getting an error starting logstash-keystore: "--path.setting is not recognized as an internal or external command."

  • Anurag Srivastava
    Dec 2, 2019, 12:59:09 PM

    As per your screenshot, there is a typo; also, please keep the space before the key.

  • Anurag Srivastava
    Dec 2, 2019, 1:05:21 PM

    Please run the following commands and let me know:

    bin/logstash-keystore create
    bin/logstash-keystore add JDBC_PWD

  • ganesh khe
    Jan 19, 2021, 8:23:38 AM

    Please have a look at my question; it is related to Logstash with a one-to-many relationship. The idea is to load the data into Elasticsearch, but the Logstash results are sporadic: https://stackoverflow.com/questions/65737638/log-stash-not-loading-exact-number-of-records-in-elasticsearch-and-on-every-hit
