Importing CSV Data into Elasticsearch


DevOps
by Anurag Srivastava | Feb 9, 2019 | 4 minutes

In this blog, I am going to explain how you can import publicly available CSV data into Elasticsearch. The Elastic Stack makes it easy to analyze any data and helps us create dashboards with key performance indicators. CSV data for different domains like healthcare, crime, and agriculture is available on various government sites, and we can easily download it. After downloading a CSV file, we can push it into Elasticsearch to perform search and analysis on top of that data. I have often seen that people don't know how to import this CSV data into Elasticsearch, which is why I have explained the process step by step in this blog.

After the data import, you can use this data for analysis or for creating different dashboards. Here I am taking the example of the 'Crimes - 2001 to present' dataset from the data.gov website (https://catalog.data.gov/dataset?res_format=CSV). From this website, you can download different types of data in CSV format. The size of this CSV file is approximately 1.6 GB.

Now, let us start the process to import this data into Elasticsearch. You need to do the following:

  • Download the CSV file (crimes_2001.csv) from the "https://catalog.data.gov/dataset?res_format=CSV" website. This file has the following fields:
ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location
  • Create a Logstash configuration file for reading the CSV data and writing it to Elasticsearch. You need to write the following configuration in the Logstash configuration file (crimes.conf):
input {
    file {
        path => "/home/user/Downloads/crimes_2001.csv"
        start_position => "beginning"
    }
}
filter {
    csv {
        columns => [
                "ID",
                "Case Number",
                "Date",
                "Block",
                "IUCR",
                "Primary Type",
                "Description",
                "Location Description",
                "Arrest",
                "Domestic",
                "Beat",
                "District",
                "Ward",
                "Community Area",
                "FBI Code",
                "X Coordinate",
                "Y Coordinate",
                "Year",
                "Updated On",
                "Latitude",
                "Longitude",
                "Location"
        ]
        separator => ","
    }
}
output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        action => "index"
        hosts => ["127.0.0.1:9200"]
        index => "crimes"
    }
}
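The csv filter keeps every column as a string by default. If you also want numeric fields and the crime date to be indexed with proper types, you can optionally append an extra filter block such as the one below. This is a minimal sketch and not part of the original configuration: the field names come from the columns listed above, and the date pattern assumes the 'MM/dd/yyyy hh:mm:ss AM/PM' format visible in the sample record.

filter {
    # convert selected numeric columns from strings to numbers
    mutate {
        convert => {
            "Latitude"  => "float"
            "Longitude" => "float"
            "Year"      => "integer"
        }
    }
    # parse the "Date" column and use it as the event @timestamp
    date {
        match  => ["Date", "MM/dd/yyyy hh:mm:ss a"]
        target => "@timestamp"
    }
}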


  • After creating the Logstash configuration file, execute it with the following command:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/crimes.conf 

This command will create the pipeline to read the CSV data and to write it into Elasticsearch.
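Before starting the pipeline, you can optionally ask Logstash to validate the configuration file and exit, which helps catch syntax errors early:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/crimes.conf --config.test_and_exit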

If you want to know the basics of Logstash, please refer to the "Introduction to Logstash" blog, where I have explained them in detail.

  • You can verify that the "crimes" index has been created in Elasticsearch by listing the indices in the browser:
http://localhost:9200/_cat/indices?v
  • If your "crimes" index is listed, you can see the data in Elasticsearch:
http://localhost:9200/crimes/_search?pretty
  • After opening the above URL, you can see the data in the following format:

{
        "_index" : "crimes",
        "_type" : "_doc",
        "_id" : "BTTXFWoB75utKkMR2zRC",
        "_score" : 1.0,
        "_source" : {
          "Case Number" : "HY190059",
          "Block" : "066XX S MARSHFIELD AVE",
          "FBI Code" : "26",
          "IUCR" : "4625",
          "X Coordinate" : "1166468",
          "Ward" : "15",
          "Y Coordinate" : "1860715",
          "Beat" : "0725",
          "Location Description" : "STREET",
          "Domestic" : "false",
          "Community Area" : "67",
          "Updated On" : "02/10/2018 03:50:01 PM",
          "Primary Type" : "OTHER OFFENSE",
         "Year of Crime" : "2015",
          "host" : "KELLGGNLPTP0305",
          "Date" : "03/18/2015 11:00:00 PM",
          "path" : "/home/user/Downloads/crimes.csv",
          "Arrest" : "true",
          "Longitude" : "-87.665319468",
          "id" : "10000094",
          "Description" : "PAROLE VIOLATION",
          "@timestamp" : "2019-04-13T08:37:05.351Z",
          "District" : "007",
          "Latitude" : "41.773371528",
          "@version" : "1"
        }
}

The above Elasticsearch document from the 'crimes' index represents a single record from the CSV file.
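Once the data is indexed, you can also query it directly. The example below is a minimal sketch that counts crimes by primary type using a terms aggregation; it assumes the default dynamic mapping, which creates a 'Primary Type.keyword' sub-field for the text column:

curl -X GET "http://localhost:9200/crimes/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "crimes_by_type": {
      "terms": { "field": "Primary Type.keyword" }
    }
  }
}'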

In this way, you can push any CSV data into Elasticsearch and then perform search and analytics or create dashboards using that data. If you have any query, please comment.

Other Blogs on Elastic Stack:
Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack 
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch
Configure Logstash to push MongoDB data into Elasticsearch
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

In case of any doubt, please leave your comments. You can also follow me on Twitter: https://twitter.com/anubioinfo

If you found this article interesting, then you can explore “Mastering Kibana 6.0”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, and “Elasticsearch 7 Quick Start Guide” books to get more insight about Elastic Stack, how to perform data analysis, and how you can create dashboards for key performance indicators using Kibana.


Comments (5)

  • Ram P
    Apr 12, 2019, 8:38:45 PM

    @anurag You may want to remove the column names from the quotes as logstash doesn't know how to filter your column names

  • Anurag Srivastava
    Apr 13, 2019, 8:30:12 AM

    @Ram: this is required to map the field names as if we remove the column from filter it will generate following source in Elasticsearch: "_source" : { "column14" : "25", "column21" : "-87.744384567", "@version" : "1", "column2" : "HY189866", "column12" : "011", "column8" : "STREET", "column20" : "41.891398861", "host" : "KELLGGNLPTP0305", "column15" : "04B", "column1" : "10000092", "column13" : "28", "column10" : "false", "column7" : "AGGRAVATED: HANDGUN", "column5" : "041A", "column3" : "03/18/2015 07:44:00 PM", "path" : "/home/user/Downloads/crimes.csv", "column17" : "1903566", "column9" : "false", "column19" : "02/10/2018 03:50:01 PM", "column6" : "BATTERY", "column16" : "1144606", "column22" : "(41.891398861, -87.744384567)", "@timestamp" : "2019-04-13T08:26:32.732Z", "column4" : "047XX W OHIO ST", "column11" : "1111", "column18" : "2015", "message" : """10000092,HY189866,03/18/2015 07:44:00 PM,047XX W OHIO ST,041A,BATTERY,AGGRAVATED: HANDGUN,STREET,false,false,1111,011,28,25,04B,1144606,1903566,2015,02/10/2018 03:50:01 PM,41.891398861,-87.744384567,"(41.891398861, -87.744384567)"""" }

  • Ram P
    Apr 13, 2019, 2:25:01 PM

    @anurag I meant to say to remove the quotes. You would need the column names for identification. Also, did you try auto_detect_column_names => true? By defining this, ELK would take the first row as column names when the input is CSV.

  • Anurag Srivastava
    Apr 13, 2019, 3:27:09 PM

    @Ram: Thanks for the comment. I have just checked auto_detect_column_names => true and it is working.

  • Abhishek Kulkarni
    Feb 24, 2022, 11:03:53 AM

    How to load multiple CSV files to Logstash

