Sunday, April 18, 2021

System design: ElasticSearch - similar to database? | My first hour reading

 


Here is the link. I am not sure about the quality of the article. I just quickly googled and chose one to read.

  1. Elasticsearch is a highly scalable open-source full-text search and analytics engine.
  2. It is a distributed system built on top of Lucene. It uses the Lucene Standard Analyzer for indexing, guesses field types automatically, and exposes Lucene features through a JSON-based REST API.
    1. Lucene Standard Analyzer
    2. Indexing and automatic type guessing
    3. A JSON-based REST API to refer to Lucene features
  3. Elasticsearch is positioned as a NoSQL database
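
To make the idea of full-text indexing more concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration, not Lucene's actual implementation: `analyze` is a simplified stand-in for an analyzer (it only lowercases and splits on non-alphanumeric characters), and the names `analyze`, `build_index`, and `search` are made up for this example.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric tokens.

    Assumption: this only approximates what a real analyzer does.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene powers full-text search",
    3: "Kibana draws charts",
}
index = build_index(docs)
print(sorted(search(index, "search")))            # [1, 2]
print(sorted(search(index, "Full-Text SEARCH")))  # [2]
```

The lookup never scans the documents themselves; it only intersects the id sets stored per token, which is the basic reason an inverted index makes text search fast.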

An Overview on Elasticsearch and its usage

A brief introduction

It is easy to set up out of the box since it ships with sensible defaults and hides complexity from beginners. The learning curve for the basics is short, so anyone with a bit of effort can become productive very quickly. It is schema-less, using some defaults to index the data.
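
As a rough illustration of what "schema-less" with automatic type guessing means, the sketch below infers a field type from each JSON value in a document. The rules are loosely modeled on Elasticsearch's dynamic mapping but heavily simplified (for example, date detection here only tries ISO dates), and `guess_field_type` and `guess_mapping` are hypothetical helper names, not part of any Elasticsearch client.

```python
from datetime import date


def guess_field_type(value):
    """Guess a field type from a JSON value (simplified, illustrative rules)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)  # accepts strings such as "2021-04-18"
            return "date"
        except ValueError:
            return "text"
    return "unknown"


def guess_mapping(document):
    """Build a field -> guessed type mapping for one document."""
    return {field: guess_field_type(value) for field, value in document.items()}


# Hypothetical product document, used only for illustration.
product = {
    "name": "Laptop",
    "price": 999.99,
    "stock": 12,
    "in_stock": True,
    "added": "2021-04-18",
}
print(guess_mapping(product))
```

The convenience has a cost: a guessed type may not be the one you want, which is why explicit mappings are usually defined once a prototype turns into a real index.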

Consumers searching e-commerce catalogs for product information often face long retrieval times. This leads to a poor user experience and, in turn, to lost potential customers. Businesses today are looking for ways to store large amounts of data so that retrieval is quick.
This can be achieved by adopting NoSQL rather than an RDBMS (Relational Database Management System) for storing data.

Elasticsearch stands out as a NoSQL database because:

  • It is easy to use
  • It has a great community
  • It is compatible with JSON
  • It covers broad use cases

Backend components

Node

Cluster

Index

Document

Shard and Replicas

The Elastic stack

Kibana

Kibana console

Logstash

Logstash node stats

Elasticsearch use cases

  • Main data store: create a searchable catalog, document store, or logging system.
  • Complementary technology: add visualization capabilities to SQL or MongoDB, bring indexing and search to Hadoop, or add processing and storage to Kafka.
  • Additive technology: if you already have logs in Elasticsearch, you may want to add metrics, monitoring, and analytics capabilities.

Netflix

  • The message you receive when you join the service.
  • Once people have joined, they receive messages about content they might enjoy or new features on the service.
  • Once Netflix knows more about you through machine learning algorithms, it sends more engaging and personalized messages about what you might like to watch.
  • If you decide to leave the service, they tell you how to come back.

This is all done through emails, app push notifications, and text messages. To do that efficiently, they need to know almost instantly about possible issues in message delivery. For this reason Elasticsearch was introduced to track the message life cycle (previously they were using distributed grep).
In a nutshell, each status message is recorded in Elasticsearch, and the relevant team can filter each category by writing a query in Kibana.
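
A filter like this can be written in Elasticsearch's JSON Query DSL. The sketch below builds such a request body in Python. The article does not describe Netflix's actual schema, so the field names `category`, `status`, and `@timestamp` and the helper `build_status_filter` are assumptions made for illustration.

```python
import json


def build_status_filter(category, status, since="now-15m"):
    """Build a search body that keeps only recent messages of one category and status.

    Assumption: `category` and `status` are keyword fields and `@timestamp`
    is a date field. These names are hypothetical.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"category": category}},
                    {"term": {"status": status}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        }
    }


body = build_status_filter("new_title", "failed")
print(json.dumps(body, indent=2))
```

All three clauses sit in the `filter` part of a `bool` query because they are yes/no conditions; no relevance scoring is needed to decide whether a status record matches.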

Let’s say a new movie has been introduced. In this case the “new title” message must be delivered to all customers.
Using Kibana they can see in real time how many people were notified with the new message and the message delivery success rate. They can also check why some messages failed. This made it possible to investigate and tackle issues much faster, such as the high rate of message failures in Brazil in 2012.

By using a pie chart in Kibana they were able to spot almost instantly a huge number of invalid-membership failures. Following up with the national provider, they discovered that on July 29 the digit 9 was added to the left of all existing mobile numbers in many Brazilian regions, regardless of their former initial digits. This change was meant to increase the numbering capacity in metropolitan areas like São Paulo, thereby eliminating the perennial shortage of available numbers in that area.


Thanks to Elasticsearch they were able to discover all these failures in near real time and promptly follow up with the provider.
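
A pie chart of failures by reason, like the one described above, corresponds to a terms aggregation: Elasticsearch groups matching documents by the value of a field and returns a count per group. Here is a sketch of such a request body. The field names `category`, `status`, and `failure_reason` and the helper `build_failure_breakdown` are hypothetical, since the article does not show the real schema.

```python
import json


def build_failure_breakdown(category, max_reasons=10):
    """Build a search body that counts failed messages per failure reason.

    Assumption: `category`, `status`, and `failure_reason` are keyword fields
    with these (hypothetical) names.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"category": category}},
                    {"term": {"status": "failed"}},
                ]
            }
        },
        "aggs": {
            "failures_by_reason": {
                "terms": {"field": "failure_reason", "size": max_reasons}
            }
        },
    }


body = build_failure_breakdown("new_title")
print(json.dumps(body, indent=2))
```

Each bucket in the response would become one slice of the pie, so a sudden spike in a single reason is visible at a glance.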

Tinder

  • Personalized: Machine Learning algorithms are also utilized in this context.
  • Location based: to find a match based on where you are at a certain point in time.
  • Bidirectional: to know which users will swipe right on each other, which basically means a match.
  • Real time: the entire interaction has to happen within milliseconds, for a massive number of users and with many variables associated with each of them.

Considering all these functionalities, the backend is very complex, ranging from data science and machine learning to bidirectional ranking and geolocation. Elasticsearch is the cornerstone that makes those components work together efficiently.
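
The location-based part can be illustrated with Elasticsearch's `geo_distance` query, which keeps only documents within a given radius of a point. This is a sketch of the general technique, not Tinder's actual implementation, which the article does not describe. It assumes a hypothetical field `location` mapped as `geo_point`, and `build_nearby_query` is a made-up helper name.

```python
import json


def build_nearby_query(lat, lon, distance_km):
    """Build a search body for documents near a point, closest first.

    Assumption: each document has a `location` field mapped as geo_point.
    """
    point = {"lat": lat, "lon": lon}
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "geo_distance": {
                            "distance": f"{distance_km}km",
                            "location": point,
                        }
                    }
                ]
            }
        },
        # Sort the remaining documents by their distance from the same point.
        "sort": [
            {"_geo_distance": {"location": point, "order": "asc", "unit": "km"}}
        ],
    }


body = build_nearby_query(40.7128, -74.0060, 10)
print(json.dumps(body, indent=2))
```

In a real matching system this filter would be only one clause among many, combined with the personalization and ranking signals mentioned above.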


In this situation performance is a hurdle. For this reason they have been cooperating with the Elasticsearch team to fine-tune many parameters and fix bugs. In this way they have supported the Elasticsearch community and helped improve the overall Elastic Stack while improving the user experience of Tinder itself.

Cisco Commerce Delivery Platform

  • Add fault tolerance by working in active/active mode. A traditional single-server RDBMS is not distributed and not fault tolerant on its own.
  • Rank-based and type-ahead search over data sourced from multiple databases, on 30 to 40 attributes, with sub-second responses.
  • Global search: if no specific objects are specified in your search, the search engine returns results across multiple objects.
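
The type-ahead and global search points above can be sketched with a `multi_match` query of type `phrase_prefix`, which matches the text typed so far as a prefix across several fields, combined with a search path that names several indices at once. The field names, boosts, and index names below are invented for illustration, and the helpers `build_typeahead_query` and `search_path` are hypothetical; the article does not show Cisco's actual setup.

```python
import json


def build_typeahead_query(prefix, fields, size=5):
    """Build a search body that matches `prefix` as a phrase prefix on several fields.

    A field may carry a boost such as "name^3" so matches on it rank higher.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": prefix,
                "type": "phrase_prefix",
                "fields": fields,
            }
        },
    }


def search_path(indices):
    """Build the search path for one or more indices, e.g. /a,b/_search."""
    return "/" + ",".join(indices) + "/_search"


# Hypothetical fields and index names.
body = build_typeahead_query("rout", ["name^3", "description"])
path = search_path(["orders", "quotes"])
print(path)  # /orders,quotes/_search
print(json.dumps(body, indent=2))
```

Boosting a field is one simple way to get rank-based results: a prefix match on the name outranks the same match buried in a description.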

Cisco Threat intelligence department

Since 2017 they have used Logstash and Kibana to detect and analyze possible global-scale threats.

Conclusions

  • It lets you zoom out on your data using aggregations and make sense of billions of log lines.
  • It combines different types of search: structured, unstructured, geo, application search, security analytics, metrics, and logging.
  • It is really fast, and it runs the same way on your laptop with a single node or on a cluster with hundreds of servers, which makes prototyping very easy.
  • It uses standard RESTful APIs and JSON. The community has also built and maintains clients in many languages such as Java, Python, .NET, SQL, Perl, and PHP.
  • It is possible to put the real-time search and analytics features of Elasticsearch to work on your big data by using the Elasticsearch-Hadoop (ES-Hadoop) connector.
  • Tools like Kibana and Logstash let you make sense of your data in simple and immediate ways by using charts and performing granular searches.
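
The point about standard RESTful APIs and JSON means that indexing a document is just an HTTP request with a JSON body. The sketch below builds such a request with Python's standard library and deliberately does not send it, so it runs without a cluster. The base URL `http://localhost:9200`, the index name `products`, and the helper `build_index_request` are assumptions for illustration.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) a PUT request that would index one JSON document."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local node address and hypothetical index and document.
req = build_index_request(
    "http://localhost:9200", "products", 1, {"name": "Laptop", "price": 999.99}
)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/products/_doc/1
```

Passing `req` to `urllib.request.urlopen` would send it to a running node. The official language clients wrap this kind of HTTP call behind a friendlier interface.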

In this article we have only scratched the surface of Elasticsearch's power and use cases, and of the variety of business challenges it is able to solve. If you are interested in knowing more or in testing it, have a look at their product page and their tutorials for a quick start. If you are curious about how to create a basic search-only app using Django and Elasticsearch, I encourage you to check out my previous articles.
