Data Analytics with Elasticsearch, Logstash and Kibana
Last Updated on Jul 18, 2024
The ELK stack is a combination of three open source projects that scale well and work together seamlessly:
- Elasticsearch: first released in 2010 (the company behind it was founded in 2012), a commercially supported open-source search engine built on top of Lucene that uses JSON and has a rich API
- Logstash: around since 2009, a tool for collecting and stashing logs
- Kibana: around since 2011, a tool for visualizing event data
ELK is mostly used for log analysis and end-to-end Big Data analytics. This mini tutorial covers setting up the ELK stack so that you can build your solution on top of it.
ELK Stack Installation Steps
- Go to the official Elastic website and download Elasticsearch, Logstash and Kibana into a separate directory.
- Extract all three downloads. This tutorial uses Windows 10 as the host OS.
- To start Elasticsearch
  - Go to <<Elasticsearch>>/bin and run elasticsearch.bat as an administrator.
  - After the Elasticsearch server starts, open http://localhost:9200 in a browser to confirm the startup.
- To start Kibana
  - Go to <<Kibana>>/bin and run kibana.bat as an administrator.
  - After the Kibana server starts, open http://localhost:5601 in a web browser.
- To start Logstash
  - Go to the bin directory of Logstash and open a command prompt as an administrator.
  - Run the following command (double quotes are used because the Windows command prompt does not treat single quotes as quoting characters):
    logstash -e "input { stdin { } } output { stdout {} }"
  - When the main pipeline starts (“Pipeline main started”), type any message in the command prompt.
  - If everything is working, Logstash echoes your message back with a timestamp and host information added.
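The two browser checks in the steps above can also be scripted. The following is a minimal sketch, assuming the default local addresses used in this tutorial; the helper names `is_up` and `check_services` are hypothetical and not part of any Elastic tooling.

```python
import urllib.error
import urllib.request

# Default local endpoints named in the installation steps (assumed unchanged).
SERVICES = {
    "Elasticsearch": "http://localhost:9200",
    "Kibana": "http://localhost:5601",
}


def is_up(url, timeout=3.0):
    """Return True if an HTTP request to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status
        # (for example 401 when security is enabled).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or similar: treat as not running.
        return False


def check_services(services=SERVICES, probe=is_up):
    """Map each service name to True/False using the given probe function."""
    return {name: probe(url) for name, url in services.items()}
```

Calling `check_services()` with both servers running should map each name to `True`; a `False` entry suggests that server has not finished starting or is listening on a different port.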
Architectural Description of ELK Stack
As the architecture above shows, Logstash collects raw data from various sources such as HDFS, logs (system logs, HTTP logs, proxy logs, etc.), Twitter streams and MySQL, and sends it on for further processing. Let’s look at each component of the ELK stack in turn.
1. Elasticsearch
Elasticsearch is a highly scalable, distributed, near real-time search engine that is mostly used for indexing and analysing data.
- It uses the Lucene engine for fast searching and indexing.
- It supports full-text search.
- Elasticsearch is a document-oriented store: data is kept as JSON documents rather than as rows in tables.
- Elasticsearch runs in cluster mode, and data is distributed across the nodes of the cluster.
Comparison between a relational database and Elasticsearch:
Elasticsearch | RDBMS
Index | Database
Shard | Shard
Mapping | Table
Field | Field
JSON Object | Tuple
- An “index” in Elasticsearch is a collection of documents of different types together with their properties. When data is pushed to Elasticsearch, it is stored in Lucene indexes, and Elasticsearch uses those Lucene indexes for its read and write operations.
- To create an index, send a PUT request to http://localhost:9200/index_name
You can search your data with http://localhost:9200/index_name/_search, as shown in the screenshot below.
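The PUT and search requests above can be built from a script as well. This is a minimal sketch using only the Python standard library, assuming the default local address. The function names are hypothetical, and the `_doc` endpoint and the `q` query parameter come from the Elasticsearch REST API rather than from this tutorial. The builder functions only construct requests; `send` is the only one that contacts the server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # default local address (assumption)


def create_index_request(index_name, base_url=BASE_URL):
    """Build (but do not send) the PUT request that creates an index."""
    url = f"{base_url}/{urllib.parse.quote(index_name)}"
    return urllib.request.Request(url, method="PUT")


def index_document_request(index_name, doc_id, document, base_url=BASE_URL):
    """Build a PUT request that stores one JSON document under the given id."""
    url = f"{base_url}/{urllib.parse.quote(index_name)}/_doc/{urllib.parse.quote(str(doc_id))}"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index_name, query_string=None, base_url=BASE_URL):
    """Build a GET request for the _search endpoint, optionally with ?q=..."""
    url = f"{base_url}/{urllib.parse.quote(index_name)}/_search"
    if query_string:
        url += "?" + urllib.parse.urlencode({"q": query_string})
    return urllib.request.Request(url, method="GET")


def send(request):
    """Send a request to a running server and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Elasticsearch running locally, `send(create_index_request("index_name"))` followed by `send(search_request("index_name"))` would mirror the two URLs given above.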
2. Logstash
As shown in the architectural diagram above:
- Logstash collects logs and events from various sources such as HDFS, MySQL, logs (system logs, application logs, network logs) and Twitter.
- It transforms the data and sends it to Elasticsearch.
- Logstash works through input, filter and output plugins, and transforms the raw data according to the filters specified in its configuration file.
- The configuration file specifies the input location, the output location and the filter to apply to the data being processed.
To start Logstash with a configuration file, pass the path of the file with the -f option.
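To illustrate the input, filter and output structure described above, here is a minimal sketch that writes a small pipeline definition to disk and assembles the matching start command. The plugin choices (stdin, grok with the COMBINEDAPACHELOG pattern, elasticsearch, stdout), the index name and the helper names are assumptions made for illustration; this is not the configuration file used in this tutorial.

```python
from pathlib import Path

# Hypothetical pipeline: read lines from stdin, parse them as Apache access
# logs, then send the events to a local Elasticsearch index and to the console.
SAMPLE_CONFIG = """\
input {
  stdin { }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "index_name"
  }
  stdout { codec => rubydebug }
}
"""


def write_config(path, text=SAMPLE_CONFIG):
    """Write the pipeline definition to `path` and return it as a Path."""
    path = Path(path)
    path.write_text(text, encoding="utf-8")
    return path


def logstash_command(config_path, executable="logstash"):
    """Return the argument list that starts Logstash with a config file (-f)."""
    return [executable, "-f", str(config_path)]
```

Saved as, say, sample.conf (a hypothetical file name), the pipeline would be started from the Logstash bin directory with `logstash -f sample.conf`.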
3. Kibana
Kibana is an open-source visualization tool that provides a web interface for exploring and visualizing Elasticsearch data.
- Kibana lets you create real-time dashboards in a browser-based interface.
- Kibana offers different visualization types such as bar charts, graphs, pie charts, maps and tables.
- It lets you save, edit, delete and share dashboards.
- After running kibana.bat, open http://localhost:5601 in a browser and go to the Management view, as in the screenshot below.
- In the view shown above, select your “Index_name” and proceed to work on that index.
- The Discover option lets you browse the data, as shown in the screenshot below.
- The Dashboard option lets you build your own dashboard with multiple visualizations, as in the screenshot below.
Kibana’s “Dev Tools” option lets you interact with Elasticsearch data directly. For example, you can use it to search the records of your index.
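A search typed into the Dev Tools console is an HTTP method and path followed by a JSON body. The sketch below builds such a body with the `match` query from the Elasticsearch Query DSL and renders it both as a console entry and as an equivalent HTTP request. The field name `message` and the helper names are hypothetical, and nothing is sent to a server.

```python
import json
import urllib.request


def match_query(field, text, size=10):
    """Build a Query DSL body that full-text matches `text` in `field`."""
    return {"query": {"match": {field: text}}, "size": size}


def console_form(index_name, body):
    """Render the search as 'METHOD path' plus JSON, as typed in the console."""
    return f"GET {index_name}/_search\n{json.dumps(body, indent=2)}"


def search_with_body(index_name, body, base_url="http://localhost:9200"):
    """Build (but do not send) the equivalent _search request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Print the console entry for a search on a hypothetical "message" field.
    print(console_form("index_name", match_query("message", "error")))
```

The printed text can be pasted into the console as is; the same body travels as the payload of the request returned by `search_with_body`.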
4. Elasticsearch-Hadoop
Use Cases or Examples of ELK Implementations
- Dell: Powering the search to put the customer first
- Facebook: Delivering a better help experience for over a billion users
- Microsoft: Providing search on Azure and powering Social Dynamics
- IBM: Providing the operational log analysis engine for Bluemix Apps
- Salesforce: Empowering businesses with log analysis for usage trends
- Accenture: Powering the search for the best client service
- Sprint: Analyzing 200 dashboards to search for better retail operations insight
- Symantec: Successfully switched from Solr to Elasticsearch with Elastic Support
- SunHotels: Scaling anomaly detection across 1000+ bookings a day with Elastic machine learning
- BBC: Unlocking yesterday’s content for the future of media search
TatvaSoft, a software development company that has worked on a variety of projects over time, provides Big Data analytics services and consultancy to clients from various industries. We have also delivered a project for the Media & Entertainment industry that used Elasticsearch functionality to strengthen its processes.