Sunday, April 18, 2021

System design: Elasticsearch Tutorial for Beginners | Learn the Elastic Stack Architecture | Frank Kane


Here is the link. 

In this tutorial, Elasticsearch Tutorial for Beginners, Udemy instructor Frank Kane covers Elasticsearch, the Elastic Stack, Kibana, Beats, and Logstash in depth. This free online tutorial has been updated for Elasticsearch 6. Elasticsearch is an important tool in your big data and data processing arsenal: it can often return results in milliseconds where Apache Spark or Hadoop would take hours. Elasticsearch is not just for search. It is a full-featured data analytics and visualization ecosystem that aggregates and analyzes massive data sets very quickly. To learn more, explore the full course on Udemy. Get a discount by using the following link: https://bit.ly/2s6ahiK

We will start with a high-level overview of the Elastic Stack ecosystem: how its components (Elasticsearch, Beats, Logstash, and Kibana) fit together, and how they are used. Next we will cover how Elasticsearch organizes data, using documents, types, and indices. Also covered:
  • Inverted indexes and the fundamentals of search engines
  • TF/IDF (Term Frequency / Inverse Document Frequency)
  • Elasticsearch APIs, including REST, client APIs, and web-based UIs such as Kibana
  • Sharding and how indices are hashed into shards
  • Replication across primary and replica shards

You will learn what the Elastic Stack is all about, and how it achieves high scalability and resilience to failure at very low latencies. Understanding Elasticsearch architecture is the first step toward becoming a developer or administrator of an Elasticsearch cluster. You may find that an Elasticsearch cluster is a great complement to your Spark or Hadoop clusters, and it is especially well suited for collecting and analyzing web log data.

#Udemy #ITeachOnUdemy #Elasticsearch

15:41

Elasticsearch architecture

An index is split into shards. 

Documents are hashed to a particular shard. Each shard may be on a different node in a cluster. Every shard is a self-contained Lucene index of its own. 
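Routing a document to a shard can be sketched as "hash the ID, take it modulo the number of primary shards". The Elasticsearch reference describes this as essentially `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The real hash is Murmur3. The sketch below swaps in MD5 purely as a stand-in, and `route_to_shard` is a hypothetical helper, not an Elasticsearch API.

```python
import hashlib


def route_to_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a document from a hash of its ID.

    MD5 is only a stand-in for Elasticsearch's real routing hash (Murmur3);
    the point is the "hash modulo primary shard count" idea.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % number_of_primary_shards


# The same ID always lands on the same shard, so lookups by ID need no search.
for doc_id in ["doc-1", "doc-2", "doc-3", "doc-4"]:
    print(doc_id, "-> shard", route_to_shard(doc_id, 2))
```

Because the result depends only on the ID and the shard count, any node can work out where a document lives without asking the others.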

This example index has two primary shards, each with two replicas. Your application should round-robin requests among the nodes.

Shard layout on three nodes (node 1, node 2, node 3):

  1. Node 1: primary 1, replica 0
  2. Node 2: replica 0, replica 1
  3. Node 3: primary 0, replica 1
Fault tolerance: every shard has a copy on each of the three nodes, so a full copy of the data survives the loss of any two nodes.
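The fault-tolerance claim can be checked mechanically against the node list above. This is a minimal sketch: `LAYOUT`, `surviving_shards`, and `survives_failures` are hypothetical names. It only asks whether at least one copy of each shard remains after some nodes fail. It does not model replica promotion, cluster coordination, or anything else a real cluster does.

```python
from itertools import combinations

# The three-node layout from the notes: node -> set of (shard, role) copies.
LAYOUT = {
    "node1": {(1, "primary"), (0, "replica")},
    "node2": {(0, "replica"), (1, "replica")},
    "node3": {(0, "primary"), (1, "replica")},
}
SHARDS = {0, 1}


def surviving_shards(layout, failed_nodes):
    """Return the shard numbers that still have at least one copy."""
    return {
        shard
        for node, copies in layout.items()
        if node not in failed_nodes
        for shard, _role in copies
    }


def survives_failures(layout, shards, k):
    """True if every shard keeps a copy no matter which k nodes fail."""
    return all(
        surviving_shards(layout, set(failed)) == shards
        for failed in combinations(layout, k)
    )


print(survives_failures(LAYOUT, SHARDS, 2))  # True
print(survives_failures(LAYOUT, SHARDS, 3))  # False
```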
Write requests are routed to the primary shard, then replicated
Read requests are routed to the primary or any replica
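The two routing rules above can be modelled as a toy router: writes go to the primary and are then copied to the replicas, and reads rotate over every copy of the shard. `ShardRouter` and `SHARD_COPIES` are hypothetical. Real Elasticsearch does this inside the cluster and picks read copies more cleverly than strict round-robin (for example, adaptive replica selection in newer versions).

```python
from itertools import cycle

# Copies of each shard from the layout above: shard -> (primary node, replica nodes).
SHARD_COPIES = {
    0: ("node3", ["node1", "node2"]),
    1: ("node1", ["node2", "node3"]),
}


class ShardRouter:
    """Toy model: writes go to the primary, reads rotate over all copies."""

    def __init__(self, shard_copies):
        self.shard_copies = shard_copies
        # One round-robin iterator per shard over its primary and replicas.
        self._read_cycles = {
            shard: cycle([primary, *replicas])
            for shard, (primary, replicas) in shard_copies.items()
        }

    def write(self, shard):
        """Return the node that takes the write and the nodes it is replicated to."""
        primary, replicas = self.shard_copies[shard]
        return primary, list(replicas)

    def read(self, shard):
        """Return the next node (primary or replica) to serve a read."""
        return next(self._read_cycles[shard])


router = ShardRouter(SHARD_COPIES)
print(router.write(0))                      # ('node3', ['node1', 'node2'])
print([router.read(0) for _ in range(4)])  # ['node3', 'node1', 'node2', 'node3']
```

Spreading reads over replicas is why adding replicas raises read throughput, while every write still has to pass through the primary.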

----------------------------
The number of primary shards cannot be changed later
Not as bad as it sounds - you can add more replica shards for more read throughput

In the worst case, you can re-index your data.
The number of shards is set up front with a PUT request to the REST/HTTP API.
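The reason the primary shard count is fixed follows from hash-modulo routing: change the divisor and most documents would belong on a different shard. The sketch below reuses a hypothetical `route_to_shard` (MD5 as a stand-in for Elasticsearch's real hash). It measures how many documents would move if an index went from 3 to 4 primaries. Replicas are just copies, so changing their number moves nothing.

```python
import hashlib


def route_to_shard(doc_id: str, number_of_primary_shards: int) -> int:
    # MD5 is only a stand-in for Elasticsearch's real routing hash.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % number_of_primary_shards


def fraction_moved(doc_ids, old_shards, new_shards):
    """Fraction of documents whose shard changes when the primary count changes."""
    moved = sum(
        route_to_shard(d, old_shards) != route_to_shard(d, new_shards)
        for d in doc_ids
    )
    return moved / len(doc_ids)


doc_ids = [f"doc-{i}" for i in range(10_000)]
print(f"{fraction_moved(doc_ids, 3, 4):.0%} of documents would change shards")
```

With a well-mixed hash, `h % 3 == h % 4` only when `h % 12` is 0, 1, or 2, so roughly a quarter of documents stay put and about three quarters move. That mass relocation is effectively a re-index.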

resilience - primary 
What purpose do inverted indices serve? 

They quickly map search terms to documents
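A tiny inverted index shows what "map search terms to documents" means: each term points at the documents containing it, along with its frequency there. The scoring function is plain textbook TF-IDF (tf × log(N / df)), matching the TF/IDF topic from the course outline. Elasticsearch itself scores with BM25 by default since version 5. The tokenizer, function names, and sample documents are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric runs (a very simplified analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def tf_idf_search(index, num_docs, query):
    """Score documents by summing tf * idf over the query terms."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return scores.most_common()


docs = {
    1: "Space: the final frontier.",
    2: "Space cowboys and laser guns",
    3: "The final countdown",
}
index = build_inverted_index(docs)
print(index["final"])  # {1: 1, 3: 1}
print(tf_idf_search(index, len(docs), "final countdown"))
```

Document 3 ranks first for "final countdown" because "countdown" is rare (high IDF), while "final" appears in two of the three documents.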
An index configured for 5 primary shards and 3 replicas would have how many shards in total?
  1. 8
  2. 15
  3. 20
- 20 (5 primary shards + 5 × 3 = 15 replica shards)
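The quiz arithmetic is one line: every primary shard has `number_of_replicas` copies, so the total is primaries × (1 + replicas). `total_shards` is a hypothetical helper; the parameter names mirror the index settings used in these notes.

```python
def total_shards(number_of_shards: int, number_of_replicas: int) -> int:
    """Total shard copies: every primary plus number_of_replicas copies of it."""
    return number_of_shards * (1 + number_of_replicas)


print(total_shards(5, 3))  # 20, the quiz question
print(total_shards(2, 2))  # 6, the three-node example
print(total_shards(3, 1))  # 6, the /testindex settings in these notes
```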
Elasticsearch is built only for full-text search of documents. 
- false

Competes with Google Analytics.

PUT /testindex
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
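The same PUT can be issued from code. This sketch only builds the requests with Python's standard `urllib.request` and never sends them. It assumes a local, unsecured cluster at `http://localhost:9200`, which is a hypothetical address, and the helper names are hypothetical too. It also builds the follow-up `PUT /<index>/_settings` call that raises the replica count, the one shard setting you can change after creation. Elasticsearch 6 and later expect the `Content-Type: application/json` header on JSON bodies.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster on the default HTTP port.
ES_URL = "http://localhost:9200"


def _json_put(url, body):
    """Build (but do not send) a PUT request with a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index_request(index_name, number_of_shards, number_of_replicas):
    """PUT /<index> with the shard settings fixed at creation time."""
    body = {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }
    return _json_put(f"{ES_URL}/{index_name}", body)


def update_replicas_request(index_name, number_of_replicas):
    """PUT /<index>/_settings: replicas can change later, primaries cannot."""
    body = {"index": {"number_of_replicas": number_of_replicas}}
    return _json_put(f"{ES_URL}/{index_name}/_settings", body)


req = create_index_request("testindex", 3, 1)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/testindex
# Against a running cluster you would send it with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```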

Elasticsearch
  • Started off as scalable Lucene
  • Horizontally scalable search engine
  • Each "shard" is an inverted index of documents
  • But not just for full text search!
  • Can handle structured data, and can aggregate data quickly
  • Often a faster solution than Hadoop/Spark/Flink/etc.
JSON requests: queries and documents are sent to Elasticsearch as JSON over HTTP.
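A search request body mixes full-text search with the structured aggregation mentioned in the bullets above. The sketch builds one Query DSL body: a `match` query plus a `terms` aggregation. The field names `message` and `status` are hypothetical. A `terms` aggregation normally needs a keyword-type field such as the assumed `status`. The body would be sent to a `_search` endpoint, for example `GET /<index>/_search`.

```python
import json


def search_body(field, text, agg_field, size=10):
    """Build a Query DSL body: full-text match plus a terms aggregation."""
    return {
        "size": size,
        "query": {"match": {field: text}},
        "aggs": {
            # Bucket matching documents by the values of agg_field.
            f"by_{agg_field}": {"terms": {"field": agg_field}}
        },
    }


body = search_body("message", "error timeout", "status")
print(json.dumps(body, indent=2))
```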

Kibana

  • Web UI for searching and visualizing
  • Complex aggregations, graphs, charts
  • Often used for log analysis
Kibana - similar to Google Analytics

Logstash / Beats
  • Ways to feed data into Elasticsearch
  • Filebeat can monitor log files, parse them, and import them into Elasticsearch in near-real-time
  • Logstash also pushes data into Elasticsearch
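Shippers like Filebeat and Logstash typically index events in batches through Elasticsearch's `_bulk` API. That API takes newline-delimited JSON (NDJSON): an action line followed by the document source, repeated, and the body must end with a newline. The sketch below builds such a body for some hypothetical web-log events. It uses the typeless action format of Elasticsearch 7 and later, whereas Elasticsearch 6 also expected a `_type`. The index name `weblogs` and `bulk_body` are hypothetical.

```python
import json


def bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk API: an action line plus a source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


log_lines = [
    {"host": "web-1", "status": 200, "path": "/"},
    {"host": "web-2", "status": 404, "path": "/missing"},
]
print(bulk_body("weblogs", log_lines), end="")
```

The resulting string would be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header.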
X-Pack
  • Security
  • Alerting 
  • Monitoring 
  • Reporting 
  • Machine learning
  • Graph Exploration
