Julia's coding blog - Practice makes perfect

From January 2015, she started to practice leetcode questions; she trains herself to stay focus, develops "muscle" memory when she practices those questions one by one. 2015年初, Julia开始参与做Leetcode, 开通自己第一个博客. 刷Leet code的题目, 她看了很多的代码, 每个人那学一点, 也开通Github, 发表自己的代码, 尝试写自己的一些体会. She learns from her favorite sports – tennis, 10,000 serves practice builds up good memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.

Thursday, March 24, 2022

MapReduce: Fundamentals of MapReduce with MapReduce Example

March 24, 2022

Here is the example.

So, just like in the traditional way, I will split the data into smaller parts or blocks and store them in different machines. Then, I will find the highest temperature in each part stored in the corresponding machine. At last, I will combine the results received from each of the machines to have the final output. Let us look at the challenges associated with this traditional approach:

Critical path problem: It is the amount of time taken to finish the job without delaying the next milestone or actual completion date. So, if, any of the machines delay the job, the whole work gets delayed.
Reliability problem: What if, any of the machines which are working with a part of data fails? The management of this failover becomes a challenge.
Equal split issue: How will I divide the data into smaller chunks so that each machine gets even part of data to work with. In other words, how to equally divide the data such that no individual machine is overloaded or underutilized.
Single split may fail: If any of the machines fail to provide the output, I will not be able to calculate the result. So, there should be a mechanism to ensure this fault tolerance capability of the system.
Aggregation of the result: There should be a mechanism to aggregate the result generated by each of the machines to produce the final output.

These are the issues which I will have to take care individually while performing parallel processing of huge datasets when using traditional approaches.

To overcome these issues, we have the MapReduce framework which allows us to perform such parallel computations without bothering about the issues like reliability, fault tolerance etc. Therefore, MapReduce gives you the flexibility to write code logic without caring about the design issues of the system.

Julia's coding blog - Practice makes perfect

Thursday, March 24, 2022

MapReduce: Fundamentals of MapReduce with MapReduce Example

No comments:

Post a Comment