Importing CSV Data into Elasticsearch
In this blog, I am going to explain how you can import publicly available CSV data into Elasticsearch. The Elastic Stack makes it easy to analyze any data and to create dashboards with key performance indicators. CSV data for different domains such as healthcare, crime, and agriculture is available on various government sites and can easily be downloaded. After downloading a CSV file, we can push it into Elasticsearch to perform search and analysis on top of that data. I have often seen that people do not know how to import such CSV data into Elasticsearch, which is why in this blog I explain the process step by step.
After the data import, you can use this data for analysis or for creating different dashboards. Here I am taking the example of the 'Crimes - 2001 to present' dataset from the data.gov website (https://catalog.data.gov/dataset?res_format=CSV), where you can download many types of data in CSV format. The size of this CSV file is approximately 1.6 GB.
Now, let us start the process to import this data into Elasticsearch. You need to do the following:
- Download the CSV file (crimes_2001.csv) from the "https://catalog.data.gov/dataset?res_format=CSV" website. This file has the following fields:
ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location
- Create a Logstash configuration file for reading the CSV data and writing it to Elasticsearch. You need to put the following configuration in the Logstash configuration file (crimes.conf):
input {
  file {
    # Path to the downloaded CSV file
    path => "/home/user/Downloads/crimes_2001.csv"
    # Read the file from the beginning instead of tailing it
    start_position => "beginning"
  }
}
filter {
  csv {
    # Map each CSV column to a named field
    columns => [
      "ID",
      "Case Number",
      "Date",
      "Block",
      "IUCR",
      "Primary Type",
      "Description",
      "Location Description",
      "Arrest",
      "Domestic",
      "Beat",
      "District",
      "Ward",
      "Community Area",
      "FBI Code",
      "X Coordinate",
      "Y Coordinate",
      "Year",
      "Updated On",
      "Latitude",
      "Longitude",
      "Location"
    ]
    separator => ","
  }
}
output {
  # Print each parsed event to the console for debugging
  stdout {
    codec => rubydebug
  }
  # Index each event into the "crimes" index in Elasticsearch
  elasticsearch {
    action => "index"
    hosts => ["127.0.0.1:9200"]
    index => "crimes"
  }
}
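Note that the csv filter indexes every value as a string, which is why numbers and dates appear as quoted strings in the sample document later in this blog. If you also want proper numeric and date fields, you can optionally add something like the following inside the filter block. This is only a minimal sketch and is not required for the import; the field names come from the columns above, and the date pattern assumes the 'MM/dd/yyyy hh:mm:ss a' format used in this dataset:
mutate {
  # Convert selected string fields to numeric types for range queries and aggregations
  convert => {
    "Latitude"  => "float"
    "Longitude" => "float"
    "Year"      => "integer"
  }
}
date {
  # Parse the crime date (e.g. "03/18/2015 11:00:00 PM") into @timestamp
  match  => ["Date", "MM/dd/yyyy hh:mm:ss a"]
  target => "@timestamp"
}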
- After creating the Logstash configuration file, execute the configuration with the following command:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/crimes.conf
This command creates the pipeline that reads the CSV data and writes it into Elasticsearch.
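Before starting the pipeline, you can optionally check the configuration file for syntax errors with Logstash's --config.test_and_exit flag:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/crimes.conf --config.test_and_exit
Logstash will validate the configuration and exit instead of starting the pipeline.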
If you want to know the basics of Logstash, please refer to the "Introduction to Logstash" blog, where I have explained them.
- You can verify the creation of the "crimes" index in Elasticsearch by listing the indices in the browser:
http://localhost:9200/_cat/indices?v
- If your index "crimes" is listed, you can see the data in Elasticsearch:
http://localhost:9200/crimes/_search?pretty
- After opening the above URL, you can see the data in the following format:
{
  "_index" : "crimes",
  "_type" : "_doc",
  "_id" : "BTTXFWoB75utKkMR2zRC",
  "_score" : 1.0,
  "_source" : {
    "Case Number" : "HY190059",
    "Block" : "066XX S MARSHFIELD AVE",
    "FBI Code" : "26",
    "IUCR" : "4625",
    "X Coordinate" : "1166468",
    "Ward" : "15",
    "Y Coordinate" : "1860715",
    "Beat" : "0725",
    "Location Description" : "STREET",
    "Domestic" : "false",
    "Community Area" : "67",
    "Updated On" : "02/10/2018 03:50:01 PM",
    "Primary Type" : "OTHER OFFENSE",
    "Year" : "2015",
    "host" : "KELLGGNLPTP0305",
    "Date" : "03/18/2015 11:00:00 PM",
    "path" : "/home/user/Downloads/crimes_2001.csv",
    "Arrest" : "true",
    "Longitude" : "-87.665319468",
    "ID" : "10000094",
    "Description" : "PAROLE VIOLATION",
    "@timestamp" : "2019-04-13T08:37:05.351Z",
    "District" : "007",
    "Latitude" : "41.773371528",
    "@version" : "1"
  }
}
The above Elasticsearch document from the 'crimes' index represents a single record from the CSV file.
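Once the data is indexed, you can also query it directly through the Elasticsearch search API. As a minimal sketch (the field and value below are taken from the sample document above), the following curl command returns crimes whose Primary Type matches "OTHER OFFENSE":
curl -X GET "http://localhost:9200/crimes/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "Primary Type": "OTHER OFFENSE"
    }
  }
}'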
In this way, you can push any CSV data into Elasticsearch and then perform search and analytics or create dashboards using that data. If you have any queries, please comment.
Other Blogs on Elastic Stack:
Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch
Configure Logstash to push MongoDB data into Elasticsearch
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster
In case of any doubt, please leave your comments. You can also follow me on Twitter: https://twitter.com/anubioinfo
If you found this article interesting, then you can explore “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” books to get more insight into the Elastic Stack, how to perform data analysis, and how you can create dashboards for key performance indicators using Kibana.
Comments (5)
Ram P
Apr 12, 2019, 8:38:45 PM
@anurag You may want to remove the column names from the quotes as logstash doesn't know how to filter your column names
Anurag Srivastava
Apr 13, 2019, 8:30:12 AM
@Ram: this is required to map the field names as if we remove the column from filter it will generate following source in Elasticsearch: "_source" : { "column14" : "25", "column21" : "-87.744384567", "@version" : "1", "column2" : "HY189866", "column12" : "011", "column8" : "STREET", "column20" : "41.891398861", "host" : "KELLGGNLPTP0305", "column15" : "04B", "column1" : "10000092", "column13" : "28", "column10" : "false", "column7" : "AGGRAVATED: HANDGUN", "column5" : "041A", "column3" : "03/18/2015 07:44:00 PM", "path" : "/home/user/Downloads/crimes.csv", "column17" : "1903566", "column9" : "false", "column19" : "02/10/2018 03:50:01 PM", "column6" : "BATTERY", "column16" : "1144606", "column22" : "(41.891398861, -87.744384567)", "@timestamp" : "2019-04-13T08:26:32.732Z", "column4" : "047XX W OHIO ST", "column11" : "1111", "column18" : "2015", "message" : """10000092,HY189866,03/18/2015 07:44:00 PM,047XX W OHIO ST,041A,BATTERY,AGGRAVATED: HANDGUN,STREET,false,false,1111,011,28,25,04B,1144606,1903566,2015,02/10/2018 03:50:01 PM,41.891398861,-87.744384567,"(41.891398861, -87.744384567)"""" }
Ram P
Apr 13, 2019, 2:25:01 PM
@anurag I meant to say is to remove the quotes. You would need the column names for identification. Also, did you try auto_detect_column_names => true? By defining this, ELK would take the first row as column names when the input is CSV
Anurag Srivastava
Apr 13, 2019, 3:27:09 PM
@Ram: Thanks for the comment, I have just checked the auto_detect_column_names =>true and it is working.
Abhishek Kulkarni
Feb 24, 2022, 11:03:53 AM
How to load multiple CSV files to Logstash