Sunday, April 25, 2021

Udemy tech: How to Choose the Right Database? - MongoDB, Cassandra, MySQL, HBase - Frank Kane


Here is the link. 

Explore the full course on Udemy (special discount included in the link): https://www.udemy.com/the-ultimate-ha...

Choosing the right database for your application is no easy task. You have a wide variety of options: relational databases such as MySQL, or distributed NoSQL solutions such as MongoDB, Cassandra, and HBase. NoSQL has come to mean "not only SQL", as many distributed database systems do in fact support SQL-style queries as long as you are not doing complex join operations, and this further blurs the lines between these systems. We will talk about how to analyze the requirements of your system in terms of consistency, availability, and partition-tolerance, and how to apply the CAP theorem to guide your choice after showing you where different database technologies fall on the sides of the CAP triangle. We will also talk about more practical considerations, such as your budget, need for professional support, and the ease of integration into the other systems already in place in your organization. Maybe you don't even need a distributed storage solution at all! Choosing the right technology for your data storage will save you a lot of pain as your application grows and evolves, and making the wrong choice can lead to all sorts of maintenance problems and wasted work. Your instructor is Frank Kane of Sundog Education, bringing nine years of experience as a senior engineer and senior manager at Amazon.com and IMDb.com, where his job involved extracting meaning from their massive data sets and processing that data in a highly distributed manner.


5:08 CAP consideration
7:55 Keep it simple
9:05 example - simple phone directory app
10:57 another example
13:07 example for Cassandra
14:54 build a massive stock trading system
15:02 care about consistency more than anything
15:14 deal with big data
16:01 discussion about choices
17:46
  1. Support consideration
  2. Budget considerations? Probably not
  3. CAP considerations
  4. Keep it simple - as minimal and simple as possible; easy to deploy and maintain

MongoDB - paid professional support is available
Cassandra - open source, Linux; all are free.
Cost - Amazon Web Services (AWS) servers, all paid on demand
Budget may not be a big concern

MySQL
Cassandra
Apache HBase, MongoDB

Availability - how much downtime is acceptable: a few seconds, a few minutes?
Consistency - real transactions, stock trades, financial analysis - prefer consistency

Read the triangle - CAP considerations
Tradeoff - Cassandra - consistency is configurable
Any of the technologies
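
The "configurable consistency" tradeoff can be made concrete with a little arithmetic. Below is a minimal sketch, assuming the usual overlap rule for replicated stores: a read is guaranteed to see the latest acknowledged write when the replicas read (R) plus the replicas written (W) exceed the replication factor (N). The function names are illustrative only and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set overlaps every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


n = 3  # replication factor, assumed here for illustration
q = quorum(n)
print(reads_see_latest_write(n, r=q, w=q))  # quorum reads and quorum writes
print(reads_see_latest_write(n, r=1, w=1))  # fastest setting, only eventually consistent
```

Lower R and W values favor availability and latency; higher values favor consistency, which is the tradeoff the notes refer to.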

MySQL - set up sharding 
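
As a rough illustration of what sharding means in practice, here is a minimal sketch of hash-based shard routing. The shard names are hypothetical placeholders, and this is only one possible scheme, not necessarily the approach the course describes.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to MySQL hosts.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key, so the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The application would send every query for this key to the chosen shard.
print(shard_for("alice"))
```

One known drawback of plain modulo hashing is that changing the number of shards remaps most keys; consistent hashing is a common alternative when shards are added often.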

Something like Google Analytics - inside my own website - how to approach this problem?
Analytics - Hadoop - thousands of times a second
Just import log data into HDFS - no external database
Spark - machine learning 
Hadoop -> HDFS -> Tableau -> No large audience
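
To show the kind of aggregation such a log-analytics job would run, here is a minimal sketch in plain Python that counts hits per page. The sample lines and the Common Log Format-like layout are assumptions made for illustration; on a real cluster this logic would run as a Spark or MapReduce job over files in HDFS.

```python
from collections import Counter

# Made-up sample lines in a Common Log Format-like layout (an assumption).
SAMPLE_LOG = [
    '10.0.0.1 - - [25/Apr/2021:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [25/Apr/2021:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 256',
    '10.0.0.1 - - [25/Apr/2021:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512',
    'malformed line',
]


def page_of(line: str):
    """Return the requested path, or None if the line does not parse."""
    parts = line.split('"')
    if len(parts) < 3:
        return None
    request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
    return request[1] if len(request) >= 2 else None


def hits_per_page(lines) -> Counter:
    """Count requests per path, skipping lines that do not parse."""
    return Counter(p for p in map(page_of, lines) if p is not None)


print(hits_per_page(SAMPLE_LOG).most_common())
```

Because the audience for such a report is small, the output can be handed straight to a reporting tool without an external database in between, as the notes suggest.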

Google Analytics - millions of people access it at the same time

Movie recommendation - 
a big Spark job that produces movie recommendations for end users nightly
Downtime is not tolerated
Must be fast
Eventual consistency OK - it's just ...
Cassandra -

You are building a massive stock trading system 
Consistency is more important than anything 
"Big data" is present
It's really, really important - so having access to professional support might be a good idea. And you have enough budget to pay for it. 

CAP theorem, requirements, simplicity - what might be a good architecture for a trading system?

Hadoop cluster - stock trading system - analytics 
Front end - database - consistency - does not fit on one server, transactions, security, outside ...

Triangle 
Consistency is more important than availability - care about 

MongoDB - has a company behind it - support; figure out what support is out there, track records, find the right partners

Which edge of the triangle are you focusing on ...?
Cassandra - specify consistency - make it work, professional company to make it work
MySQL - partition tolerance, stock ticker symbol - guide you towards ...
HBase, MongoDB - security concern - simplicity
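
The "which edge of the triangle" question can be sketched as a simple lookup. The placement below follows the commonly cited CAP triangle (MySQL on consistency plus availability, Cassandra on availability plus partition tolerance, HBase and MongoDB on consistency plus partition tolerance); that placement is an assumption here and is not spelled out in these notes. The `candidates` helper is hypothetical.

```python
# Commonly cited CAP placement; an assumption, not stated in the notes above.
CAP_SIDES = {
    "MySQL": {"consistency", "availability"},
    "Cassandra": {"availability", "partition-tolerance"},
    "HBase": {"consistency", "partition-tolerance"},
    "MongoDB": {"consistency", "partition-tolerance"},
}

KNOWN = {"consistency", "availability", "partition-tolerance"}


def candidates(required):
    """Return the databases whose CAP side covers every required property."""
    required = set(required)
    unknown = required - KNOWN
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return sorted(name for name, side in CAP_SIDES.items() if required <= side)


# Stock trading system: consistency first, and the data is too big for one server.
print(candidates({"consistency", "partition-tolerance"}))
```

This is only a first filter. As the notes point out, Cassandra's consistency can be tuned and MySQL can be sharded, so support, security, and simplicity still decide the final choice.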

New process - HDFS





