Sunday, February 27, 2022

System design: Design twitter | #codeKarle | CodeKarle.com

Feb. 27, 2022

Here is the link. 

I like to copy the content below so that I can easily review it, add some highlights, and track my learning on this topic.

Remember that guy who, purely by chance, liveblogged the Osama raid in 2011? Be it political campaigns, social movements, natural disasters, anything! Anything major happens and it is first reported on Twitter.

Which makes this a typical interview question for those in IT - “How would you design a system like Twitter”. Well, let’s design Twitter then!

Functional Requirements
  • Tweet - should allow you to post text, images, videos, links, etc.
  • Re-tweet - should allow you to share someone’s tweets
  • Follow - this will be a directed relationship. If we follow Barack Obama, he doesn’t have to follow us back
  • Search
Non Functional Requirements
  • Read heavy - The read-to-write ratio for Twitter is very high, so our system should be able to support that kind of pattern
  • Fast rendering
  • Fast tweet posting
  • Lag is acceptable - From the previous two NFRs, we can understand that the system should be highly available and have very low latency. So when we say lag is ok, we mean it is ok to get a notification about someone else’s tweet a few seconds later, but the rendering of the content should be almost instantaneous.
  • Scalable - 5k+ tweets come in every second on Twitter on an average day. At peak times this can easily double. These are just tweets, and as we have already discussed, the read-to-write ratio of Twitter is very high, i.e. there will be an even higher number of reads happening against these tweets. That is a huge number of requests per second.
Julia's thoughts:

Ask questions to determine the scale of Twitter: roughly 5,000 tweets/second, which works out to 300,000 tweets/minute, 18,000,000 tweets/hour, and 24 * 18,000,000 = 432,000,000 tweets/day. Storage requirement: petabyte scale once media and replication are included - NoSQL, Bigtable for example.
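To make these back-of-envelope numbers concrete, here is a small C# sketch; the 5,000 tweets/second rate comes from the article, while the ~1 KB average tweet size is my own illustrative assumption.

using System;

class ScaleEstimate
{
    static void Main()
    {
        const long tweetsPerSecond = 5_000;     // average ingest rate from the article
        const long avgTweetSizeBytes = 1_000;   // ~1 KB per tweet with metadata (illustrative assumption)

        long perMinute = tweetsPerSecond * 60;  // 300,000
        long perHour = perMinute * 60;          // 18,000,000
        long perDay = perHour * 24;             // 432,000,000

        double gbPerDay = perDay * (double)avgTweetSizeBytes / 1e9;
        Console.WriteLine($"tweets/minute: {perMinute:N0}");
        Console.WriteLine($"tweets/hour:   {perHour:N0}");
        Console.WriteLine($"tweets/day:    {perDay:N0}");
        // Raw text alone is ~432 GB/day; media, indexes, and replication push total storage much higher.
        Console.WriteLine($"raw tweet data/day: ~{gbPerDay:N0} GB");
    }
}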

So how do we design a system that delivers all our functional requirements without compromising the performance? Before we discuss the overall architecture, let’s split our users into different categories. Each of these categories will be handled in a slightly different manner.

  1. Famous Users: Famous users are usually celebrities, sportspeople, politicians, or business leaders who have a lot of followers
  2. Active Users: These are the users who have accessed the system in the last couple of hours or days. For our discussion, we will consider people who have accessed Twitter in the last three days as active users.
  3. Live Users: These are a subset of active users who are using the system right now, similar to being online on Facebook or WhatsApp.
  4. Passive Users: These are the users who have active accounts but haven’t accessed the system in the last three days.
  5. Inactive Users: These are the “deleted” accounts so to speak. We don’t really delete any accounts, it is more of a soft delete, but as far as the users are concerned the account doesn’t exist anymore.
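As a rough illustration of how a service might bucket users at request time, here is a hedged C# sketch. The three-day window comes from the definitions above; the follower cutoff for "famous" and all names here are my own placeholders, not anything from the article.

using System;

enum UserCategory { Famous, Live, Active, Passive, Inactive }

static class UserClassifier
{
    const int ActiveWindowDays = 3;                // from the article's definition of "active"
    const long FamousFollowerCutoff = 1_000_000;   // arbitrary illustrative threshold

    public static UserCategory Classify(bool isSoftDeleted, bool hasOpenWebsocket,
                                        long followerCount, DateTime lastSeenUtc)
    {
        if (isSoftDeleted) return UserCategory.Inactive;               // "deleted" accounts
        if (followerCount >= FamousFollowerCutoff) return UserCategory.Famous;
        if (hasOpenWebsocket) return UserCategory.Live;                // using the system right now
        bool recentlySeen = (DateTime.UtcNow - lastSeenUtc).TotalDays <= ActiveWindowDays;
        return recentlySeen ? UserCategory.Active : UserCategory.Passive;
    }
}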

Now, for simplicity, let us divide the overall architecture into three flows. We will separately look at the onboarding flow, tweet flow, and search and analytics side of the system.

Note: Remember how twitter is a very read-heavy system? Well, while designing a read-heavy system, we need to make sure that we are precomputing and caching as much as we can to keep the latency as low as possible.


Julia's note: Client side:

  1. User onboarding/ Login flow 
  2. User following flow
  3. Adding post UI
  4. User self profile screen
  5. Home screen
  6. Users App/ Browsers
  7. Search screen
Service design:  (Blue color)
  1. User service -> tweet processor 
  2. Graph service -> tweet processor 
  3. Tweet injection service <-> Short URL service
  4. Short URL
  5. Asset service
  6. Timeline service <-> Tweet service 
  7. Live users websockets (Notification)
  8. Search service
  9. Search Kafka consumer
  10. Notification service 
  11. Trend
The relationships are complicated, and I tried to sort them out. It takes time to figure out the details. Be patient.

Black color 
  1. Load balancer


Data model

  1. User DB MySQL cluster - master, slave 1, slave 2
  2. Redis cluster
  3. User Graph DB MySQL cluster
  4. Cassandra cluster
  5. CDN
  6. Kafka
  7. Apache Spark streaming cluster
  8. Hadoop cluster
Color coding

Green color - UI - App or website
Red color - database, data store, third parties
Blue color - Rest service / Spark jobs / Kafka consumers 
Black color - Load balancer, authorization, authentication layer


Onboarding flow

We have a User Service that stores all the user-related information in our system and provides endpoints for login and registration. It also serves any other internal services that need user-related information, with GET APIs to fetch users by id or email, POST APIs to add or edit user information, and bulk GET APIs to fetch information about multiple users. This user service sits on top of a User DB, which is a MySQL database. We use MySQL here as we have a finite number of users and the data is very relational. Also, the User DB will mostly power the writes; the reads related to user details will be powered by a Redis cache, which is an image of the User DB. When the user service receives a GET request with a user id, it will first look up the user in the Redis cache. If the user is present in Redis, it will return the response. Otherwise, it will fetch the information from the User DB, store it in Redis, and then respond to the client.
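This read path is the classic cache-aside pattern. Below is a minimal C# sketch of it; the interfaces stand in for the Redis client and the MySQL-backed User DB, and all the names are hypothetical rather than a real client API.

using System.Threading.Tasks;

public record User(long Id, string Email, string Name);

public interface IUserCache { Task<User?> GetAsync(long id); Task SetAsync(User user); }  // Redis stand-in
public interface IUserDb { Task<User?> FindByIdAsync(long id); }                          // MySQL stand-in

public class UserService
{
    private readonly IUserCache _cache;
    private readonly IUserDb _db;

    public UserService(IUserCache cache, IUserDb db) { _cache = cache; _db = db; }

    // Cache-aside read: try Redis first, fall back to the User DB, then warm the cache.
    public async Task<User?> GetUserAsync(long id)
    {
        var cached = await _cache.GetAsync(id);
        if (cached is not null) return cached;

        var user = await _db.FindByIdAsync(id);
        if (user is not null) await _cache.SetAsync(user);  // so the next read is a cache hit
        return user;
    }
}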

User-follow flow

The “follow” related requests will be serviced by a Graph Service, which creates a network of how the users are connected within the system. The graph service will expose APIs to add follow links, and to get the users followed by a certain user id or the users following a certain user id. This graph service sits on top of a User Graph DB, which is again a MySQL DB. The follow links won’t change too frequently, so it makes sense to cache this information in Redis. In the follow flow we can cache two pieces of information: who the followers of a particular user are, and who the user is following. Similar to the user service, when the graph service receives a GET request it will first look up the answer in Redis. If Redis has the information, it responds to the user. Otherwise, it fetches the information from the graph DB, stores it in Redis, and responds to the user.
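Since a follow is a directed edge, the graph service effectively maintains two views per user: followers and following. Here is a minimal in-memory C# sketch of that idea; the dictionaries stand in for the Graph DB plus its Redis cache, and the concurrency handling is deliberately simplified.

using System.Collections.Concurrent;
using System.Collections.Generic;

public class GraphService
{
    // followee -> set of followers, and follower -> set of followees.
    private readonly ConcurrentDictionary<long, HashSet<long>> _followers = new();
    private readonly ConcurrentDictionary<long, HashSet<long>> _following = new();

    // Directed edge: if A follows B, B does not have to follow A back.
    public void AddFollow(long followerId, long followeeId)
    {
        _following.GetOrAdd(followerId, _ => new HashSet<long>()).Add(followeeId);
        _followers.GetOrAdd(followeeId, _ => new HashSet<long>()).Add(followerId);
    }

    public IReadOnlyCollection<long> GetFollowers(long userId) =>
        _followers.TryGetValue(userId, out var s) ? s : new HashSet<long>();

    public IReadOnlyCollection<long> GetFollowing(long userId) =>
        _following.TryGetValue(userId, out var s) ? s : new HashSet<long>();
}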

Now, there are some things we can conclude based on a user’s interaction with Twitter, like their interests, etc. So when such events occur, an Analytics Service puts these events in a Kafka.


Now, remember our Live users? Say U1 is a live user following U2 and U2 tweets something. Since U1 is live, it makes sense that U1 gets notified immediately. This happens through the User Live Websocket service. This service keeps an open connection with all the live users, and whenever an event occurs that a live user needs to be notified of, it happens through this service. Now based on the user’s interaction with this service we can also track for how long the users are online, and when the interaction stops we can conclude that the user is not live anymore. When the user goes offline, through the websocket service an event will be fired to Kafka which will further interact with user service and save the last active time of the user in Redis, and other systems can use this information to accordingly modify their behavior.

Let me spend 20 to 30 minutes learning more about the asset service - uploading and displaying all the multimedia content in a tweet.

Tweet flow

Now a tweet could contain text, images, videos, or links. We have something called an Asset service which takes care of uploading and displaying all the multimedia content in a Tweet. We have discussed the nitty-gritty of asset service in the Netflix design article, so check that out if you are interested.

Now, we know that tweets have a constraint of 140 characters, which includes text and links. Thanks to this limit, we cannot post huge URLs in our tweets. This is where a URL shortener service comes in. We are not going into the details of how this service works, but we have discussed it in our Tiny URL article, so make sure to check it out. Now that we have handled the links as well, all that is left is to store the text of a tweet and fetch it when required. This is where the tweet ingestion service comes in. When a user tries to post a tweet and hits the submit button, it calls the tweet ingestion service, which stores the tweet in a permanent data store. We use Cassandra here because we will have a huge number of tweets coming in every day, and the query pattern we require here is what Cassandra is best at. To know more about why we choose the database solutions we do, check out our article about choosing the best storage solutions.
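Here is a hedged C# sketch of this write path, with Cassandra and Kafka hidden behind stand-in interfaces; the interface names and the topic name are my own placeholders.

using System;
using System.Threading.Tasks;

public record Tweet(string TweetId, long UserId, string Text, DateTime CreatedUtc);

public interface ITweetStore { Task SaveAsync(Tweet tweet); }                       // Cassandra stand-in
public interface IEventBus { Task PublishAsync(string topic, string message); }     // Kafka stand-in

public class TweetIngestionService
{
    private readonly ITweetStore _store;
    private readonly IEventBus _bus;

    public TweetIngestionService(ITweetStore store, IEventBus bus) { _store = store; _bus = bus; }

    // Write-only path: persist the tweet, then emit an event for the
    // tweet processor, search consumer, and analytics to pick up.
    public async Task PostTweetAsync(long userId, string text)
    {
        var tweet = new Tweet(Guid.NewGuid().ToString("N"), userId, text, DateTime.UtcNow);
        await _store.SaveAsync(tweet);
        await _bus.PublishAsync("tweet-posted", $"{tweet.TweetId}:{userId}");
    }
}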

Add one more drawing from the article. 



Now, the tweet ingestion service, as the name suggests, is only responsible for posting the tweets and doesn’t expose any GET APIs to fetch tweets. As soon as a tweet is posted, the tweet ingestion service will fire an event to Kafka saying a tweet id was posted by so and so user id. Now on top of our Cassandra, sits a Tweet service that will expose APIs to get tweets by tweet id or user id.

Now, let’s have a quick look at the users’ side of things. On the read-flow, a user can have a user timeline i.e. the tweets from that user or a home timeline i.e. tweets from the people a user is following. Now a user could have a huge list of users they are following, and if we make all the queries at runtime before displaying the timeline it will slow down the rendering. So we cache the user’s timeline instead. We will precalculate the timeline of active users and cache it in a Redis, so an active user can instantaneously see their timeline. This can be achieved with something called a Tweet processor.

As mentioned before, when a tweet is posted, an event is fired to Kafka. Kafka communicates the same to the tweet processor, which creates the timeline for all the users that need to be notified of this recent tweet and caches it. To find out which followers need to be notified of this change, the tweet processor interacts with the graph service. Suppose user U1, followed by users U2, U3, and U4, posts a tweet T1; then the tweet processor will update the timelines for U2, U3, and U4 with tweet T1 and update the cache.
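In other words, this is fan-out-on-write. A hedged C# sketch of the tweet processor's consumer logic follows, with an in-memory dictionary standing in for the Redis timeline cache and a lookup interface standing in for the graph service; the timeline cap is my own assumption.

using System.Collections.Concurrent;
using System.Collections.Generic;

public interface IFollowerLookup { IEnumerable<long> GetFollowers(long userId); }  // graph service stand-in

public class TweetProcessor
{
    private readonly IFollowerLookup _graph;
    private readonly ConcurrentDictionary<long, LinkedList<string>> _timelines = new();  // Redis stand-in
    private const int MaxTimelineLength = 800;   // cap per-user cached entries (assumption)

    public TweetProcessor(IFollowerLookup graph) { _graph = graph; }

    // Fan-out-on-write: prepend the new tweet id to every follower's cached timeline.
    public void OnTweetPosted(long authorId, string tweetId)
    {
        foreach (var followerId in _graph.GetFollowers(authorId))
        {
            var timeline = _timelines.GetOrAdd(followerId, _ => new LinkedList<string>());
            lock (timeline)
            {
                timeline.AddFirst(tweetId);            // newest first
                if (timeline.Count > MaxTimelineLength)
                    timeline.RemoveLast();             // evict old entries to bound cache memory
            }
        }
    }
}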

Now, we have only cached the timelines of active users. What happens when a passive user, say P1, logs in to the system? This is where the Timeline Service comes in. The request reaches the timeline service, which interacts with the user service to identify whether P1 is an active or a passive user. Since P1 is a passive user, its timeline is not cached in Redis. The timeline service will talk to the graph service to find the list of users that P1 follows, then query the tweet service to fetch tweets of all those users, cache them in Redis, and respond back to the client.
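That flow is fan-out-on-read. A minimal C# sketch in the same stand-in style as above; the interfaces and names are hypothetical.

using System;
using System.Collections.Generic;
using System.Linq;

public record TimelineEntry(string TweetId, DateTime CreatedUtc);

public interface IFollowingLookup { IEnumerable<long> GetFollowing(long userId); }              // graph service
public interface ITweetReader { IEnumerable<TimelineEntry> RecentTweets(long userId, int n); }  // tweet service

public class TimelineService
{
    private readonly IFollowingLookup _graph;
    private readonly ITweetReader _tweets;
    private readonly Dictionary<long, List<TimelineEntry>> _cache = new();   // Redis stand-in

    public TimelineService(IFollowingLookup graph, ITweetReader tweets)
    {
        _graph = graph;
        _tweets = tweets;
    }

    public IReadOnlyList<TimelineEntry> GetHomeTimeline(long userId, int limit = 50)
    {
        if (_cache.TryGetValue(userId, out var cached))
            return cached;                                   // active user: precomputed timeline

        // Passive user: rebuild at read time by merging tweets of everyone they follow.
        var rebuilt = _graph.GetFollowing(userId)
                            .SelectMany(f => _tweets.RecentTweets(f, limit))
                            .OrderByDescending(t => t.CreatedUtc)
                            .Take(limit)
                            .ToList();
        _cache[userId] = rebuilt;                            // cache so the next request is fast
        return rebuilt;
    }
}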

Active user vs live user -> a live user will get a notification; the tweet processor creates timelines for active users and saves them in Redis, while a live user will be notified through the websocket service.

Now we have seen the behavior for active and passive users. How will we optimize the flow for our live users? As we have previously discussed, when a tweet is successfully posted an event will be sent to Kafka. Kafka will then talk to the tweet processor which creates timelines for active users and saves them in Redis. But here if the tweet processor identifies that one of the users that need to be updated is a live user, then it will fire an event to Kafka which will now interact with the live websocket service we briefly discussed before. This websocket service will now send a notification to the app and update the timeline.

So now our system can successfully post tweets with different types of content and has some optimization built in to handle active, passive, and live users in a somewhat different manner. But it is still a fairly inefficient system. Why? Because we completely forgot about our famous users! If Donald Trump has 75 million followers, then every time Trump tweets about something, our system needs to make 75 million updates. And this is just one tweet from one user. So this flow will not work for our famous users.

The Redis cache will only hold tweets from non-famous users in the precalculated timelines, and the timeline service knows this. It interacts with the graph service to get the list of famous users followed by our current user, say U1, and fetches their tweets from the tweet service. It then updates these tweets in Redis and adds a timestamp indicating when the timeline was last updated. When the next request comes from U1, it checks whether the timestamp in Redis against U1 is more than a few minutes old. If so, it queries the tweet service again. But if the timestamp is fairly recent, Redis responds directly to the app.
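So famous users' tweets are merged in lazily, guarded by a freshness timestamp. A small C# sketch of just that staleness check; the two-minute cutoff is my own placeholder, and the dictionary stands in for the per-user timestamp in Redis.

using System;
using System.Collections.Generic;

public class FamousTweetFreshness
{
    private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(2);   // illustrative cutoff
    private readonly Dictionary<long, DateTime> _lastRefreshUtc = new(); // Redis stand-in

    // True if we must re-query the tweet service for famous users' tweets.
    public bool IsStale(long userId) =>
        !_lastRefreshUtc.TryGetValue(userId, out var last) ||
        DateTime.UtcNow - last > MaxAge;

    // Call after merging fresh famous tweets into the cached timeline.
    public void MarkRefreshed(long userId) => _lastRefreshUtc[userId] = DateTime.UtcNow;
}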

Now we have handled active, passive, live, and famous users. As for the inactive users, they are already deactivated accounts so we need not worry about them.

Now what happens when a famous user follows another famous user, let’s say Donald Trump and Elon Musk? If Donald Trump tweets, Elon Musk should be notified immediately even if the other non-famous users are not. This is handled by the tweet processor again. When the tweet processor receives an event from Kafka about a new tweet from a famous user, let’s say Donald Trump, it updates the cache of the famous users that follow Trump.

Now, this looks like a fairly efficient system, but there are some bottlenecks. Like Cassandra - which will be under a huge load, Redis - which needs to scale efficiently as it is stored completely in RAM, and Kafka - which again will receive crazy amounts of events. So we need to make sure that these components are horizontally scalable and in case of Redis, don’t store old data that just unnecessarily uses up memory.

Now coming to search and analytics!

Remember the tweet ingestion service we discussed in the previous section? When a tweet is added to the system, it fires an event to Kafka. A search consumer listening to Kafka stores all these incoming tweets in an Elasticsearch database. Now when the user searches for a string, the Search UI talks to a Search Service. The search service will talk to Elasticsearch, fetch the results, and respond back to the user.

Now assuming an event occurred and people are tweeting or searching about it on Twitter, it is safe to assume that more people will search for it, and we shouldn’t have to run the same query again and again on Elasticsearch. Once the search service gets some results from Elasticsearch, it will save them in Redis with a time-to-live of 2-3 minutes. Now when the user searches something, the search service will first look in Redis. If the data is found in Redis, it is returned to the user; otherwise, the search service will query Elasticsearch, get the data, store it in Redis, and respond back to the user. This considerably reduces the load on Elasticsearch.
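A hedged C# sketch of that short-TTL result cache; an in-memory dictionary stands in for Redis, and the caller queries Elasticsearch on a miss and then calls Put.

using System;
using System.Collections.Concurrent;

public class SearchResultCache
{
    private static readonly TimeSpan Ttl = TimeSpan.FromMinutes(3);   // 2-3 minutes, per the article
    private readonly ConcurrentDictionary<string, (DateTime StoredUtc, string ResultsJson)> _cache = new();

    public bool TryGet(string query, out string resultsJson)
    {
        if (_cache.TryGetValue(query, out var entry) &&
            DateTime.UtcNow - entry.StoredUtc < Ttl)
        {
            resultsJson = entry.ResultsJson;   // fresh hit: Elasticsearch is never touched
            return true;
        }
        resultsJson = string.Empty;            // miss or expired: query Elasticsearch, then Put()
        return false;
    }

    public void Put(string query, string resultsJson) =>
        _cache[query] = (DateTime.UtcNow, resultsJson);
}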

Let’s go back to our Kafka again. There will be a spark streaming consumer connected to Kafka which will keep track of trending keywords and communicate them to the Trends service. This could further be connected to a Trend UI to visualize this data. We don’t need a permanent data store for this information as trends will be temporary but we could use a Redis as a cache for short-term storage.

Now you must have noticed we have used Redis very heavily in our design. Now even though Redis is an in-memory solution, there is still an option to save data to disk. So in case of an outage, if some of the machines go down, you still have the data persisted on the disk to make it a little more fault-tolerant.

Now, other than trends, there are still other analytics that can be performed, like finding out what people in India are talking about. For this, we will dump all the incoming tweets into a Hadoop cluster, which can power queries like the most retweeted posts, etc. We could also have a weekly cron job running on the Hadoop cluster that pulls in information about our passive users and sends out a weekly newsletter to them with some of the most recent tweets they might be interested in. This could be achieved by running some simple ML algorithms that determine the relevance of tweets based on the user's previous searches and reads. The newsletters can be sent via a notification service that talks to the user service to fetch the users' email ids.

Follow up 
Feb. 28, 2022
After I carefully reviewed the design blog, I decided to review the book Microservices Patterns by Chris Richardson.


High Availability Architecture PART 1 - new AWS regions!

Feb. 27, 2022

Here is the link. 

What is High Availability? How do you build highly available web applications and services? Three nines, four nines? New AWS regions in Switzerland, Spain, Jakarta, Hyderabad and Melbourne! AWS Zurich Region: https://aws.amazon.com/de/local/switz...

What are the scaling issues to keep in mind while developing a social network feed?

Feb. 27, 2022

Here is the link. 

I worked on Facebook's News Feed.

Updated Mar 27, 2010

You want to minimize the number of disk seeks that need to happen when loading your home page. The number of seeks could be 0 or 1 but definitely not O(num friends). You also can't store all the data on one machine if you're concerned about scaling, so you've got a couple of options...

If you're willing to tolerate one disk seek, or if your graph has low fan-out (small number of people following any given person), you can de-normalize the data such that the metadata about every piece of activity is propagated to each of the followers of that activity at the time the action occurs. You might think of this as a "push" model. You'd still probably only store one copy of the actual activity data, but you'd push pointers to it (along with whatever other metadata is needed if you're supporting any ranking/filtering) to all the subscribers at the time it is created. Generally the first thing to break in this model will be the process of propagating the activity to all the subscribers, particularly if you have users that have large numbers of followers (celebrities). When this fails, the feed will start to get backed up. This can also be complicated in that you may need to write code that properly updates all of the subscribers whenever the important metadata about the content is updated, and you may want to also add code to update things when someone changes their list of subscriptions.

The alternative is to keep all the recent activity data in memory and not propagate the updates to the subscribers at write time, instead fanning out at the time of loading the home page. This way you avoid all disk seeks. It's also nice in that your fan out size is limited to the number of people a user follows rather than the number of people who follow a user (most people don't have enough time on their hands to follow millions of people, so you don't have the inverse of the celebrity problem). It's also easier to keep things up-to-date, since you don't have to worry about propagating updates to all of the subscribers. The downside of this approach is that the failure scenario is more catastrophic - instead of just delaying updates, you may potentially fail to generate a user's feed. Having some kind of fallback mechanism that approximates the feed (eg by querying only a subset of your friends) is handy to avoid having to show an error page.

Probably the theoretically best approach would be a hybrid of the above two options, but either of these options can be made to work reasonably well even at very large scales.
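One common way to realize that hybrid is to decide per author at write time. A tiny C# sketch of the decision, where the follower-count cutoff is purely illustrative:

public static class FanoutStrategy
{
    private const long CelebrityCutoff = 100_000;   // illustrative threshold, tune empirically

    // Push (fan-out-on-write) for small audiences; defer celebrities
    // to pull (fan-out-on-load) so one post never triggers millions of writes.
    public static bool ShouldPushOnWrite(long followerCount) =>
        followerCount < CelebrityCutoff;
}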


Leetcode discuss: 98. Validate Binary Search Tree

 Feb. 27, 2022

Here is the link. 


C# | Mock interview practice | Interviewer with MSFT, Amazon, Oracle, Meta | 2022 Feb. 23

Feb. 24, 2022
Introduction
I booked a few mock interviews after a two-year break, starting from Dec. 2020. I had a chance to work on this algorithm in the mock interview; I wrote the code and tested all the cases. The interviewer gave me a 10-minute review, and I learned a few things. I had practiced this algorithm nearly 10 times over the last few years, but this time I had a mock interview and then got an invitation to connect after the performance. It is worth the time to share the experience.

The definition of binary search tree | Interviewer shared
A valid BST is defined as follows:

The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
Both the left and right subtrees must also be binary search trees.

The advice from interviewer
Great job! You solved the problem correctly and answered the additional question.
Some suggestions:

  • Try to have a plan (intro, idea explanation, pseudocode solution (if you choose to do it), solution, test cases, edge-case discussion, complexity)
  • Time yourself for each part of the plan; try to finish within 20 mins (especially for Meta and Google)
  • In many cases you first write code with issues and then correct yourself; try to be right on the first attempt. It will save you time and also will not let an impatient interviewer log a negative signal

My notes after mock interview
The interviewer asked me to write a function to check if a binary tree is a binary search tree. I wrote the code, and then he helped me test it, making sure that all test cases were covered. The interviewer also gave me a 15-minute talk and demonstration to help me improve my performance and presentation and avoid giving false negative signals.

  1. Initial implementation was wrong and I had to correct myself; cons: the interviewer may interrupt you and lose patience with you as an interviewee;
  2. Pay attention to detail
  3. Interviewer may intervene -> negative signal; think about how to avoid mistakes in the first draft
  4. Timing: 20 minutes per algorithm; Meta - two algorithms in 45 minutes
  5. Drive the interview, saying "let me do the coding"; be proactive, cautious of time
  6. Presentation, illustration: draw a tree, then use the cursor to help; go over a few test cases, 3 (root), 1 (left), 4 (right), it is a BST, explain why; do a few test cases; this makes a good presentation
    Time, presentation: in general the interviewee should drive the interview, more driving. Mindful practice on leetcode. A lot of good advice; thank you for being my interviewer.

Test cases
I wrote C# code to run the test cases.

static void Main(string[] args)
{
    /*
    // Test case 1: a valid BST with three nodes.
    var root2 = new TreeNode(2);
    root2.left = new TreeNode(1);
    root2.right = new TreeNode(3);
    var test = new Solution();
    */

    // Test case 2: not a BST - node 4 sits in the left subtree but exceeds its upper bound.
    var root3 = new TreeNode(3);
    root3.left = new TreeNode(1);
    root3.left.right = new TreeNode(2);
    root3.left.right.left = new TreeNode(4);

    //root3.right = new TreeNode(int.MaxValue);
    var test = new Solution();

    var result = test.IsValidBST(root3);
    var output = result ? "true" : "false";

    Console.WriteLine("result is " + output);
}

Test cases:

  1. BST, three nodes: 2 (root), 1 (left), 3 (right)
  2. Not a BST, failed test case: take the BST from item 1 and add a node in the left subtree with a value bigger than the root value
  3. Try int.MaxValue as a tree node value and make sure it still works: BST
  4. The interviewer suggested I draw a tree by typing characters, then change a value and build a good presentation around discussing BST test cases.

Annotations (shared later) | quick sharing
I'd like to quickly share the annotations I received after I wrote the code, starting from minute 21 of the mock interview.

  1. communication (Green) 21:02 clarification
  2. communication (Red) 21:30 illustrate your speech
  3. problem solving (Green) 22:17 good iteration on problem
  4. problem solving (Green) 22:32 long/int edge case
  5. technical skills (red) 28:01 C#: ref instead of return, default params, order of operations, etc.
  6. problem solving (Green) 33:47 [hint]captured equal values
  7. other (green) 36:42 time complexity
  8. other (red) 37:43 Space complexity O(h)

The interviewer did not choose a hard-level algorithm; instead he chose a medium-level one, and he likes to see production-ready code.

The following C# code passes online judge.

        // Time complexity: O(N), where N is the total number of nodes in the binary tree
        // Space complexity: recursion stack space - O(lg N) on average, O(N) worst case
        // | Should be O(h), where h is the height of the tree, not log N (from mock interviewer)
        public bool IsValidBST(TreeNode root)
        {
            if (root == null)
            {
                return true;
            }

            // 
            var found = false;
            runPreorder(root, long.MaxValue, long.MinValue, ref found);

            return !found;
        }
        
        private void runPreorder(TreeNode root, long upperBound, long lowerBound, ref bool foundException)
        {
            if (root == null)
                return;

            if (root.val >= upperBound || root.val <= lowerBound)
            {
                foundException = true;
                return;
            }

            runPreorder(root.left, root.val, lowerBound, ref foundException);

            runPreorder(root.right, upperBound, root.val, ref foundException);
        }

Seekingalpha.com compare: PPL, LNT, PCG, EIX, NEE | Utility stocks

Feb. 27, 2022

Here is the comparison generated by SeekingAlpha.com. 

Utility stocks: Debt-to-EBITDA | NEE 4.78 | LNT 3.92 | excessively leveraged PCG 8.95 | EIX 13.33



I will continue to use the seekingalpha.com compare feature, and then I will think about how to invest in PPL stock.

PPL | My position: 500 shares | Daily up and down will be around $500






 One of the things that NextEra Energy is most proud of is they are producing above average growth, while still maintaining one of the strongest balance sheets and credit positions in the utility industry. All ratings agencies rate NextEra bonds as investment grade. S&P and Fitch rate NextEra debt at upper medium grade, while Moody's rates NextEra debt at lower medium grade. Corporate Credit ratings are explained here.

NextEra Energy's annualized Debt-to-EBITDA for Q4 2021 was 4.78. This is considered high but not excessively high for a utility. Debt/EBITDA ratio can be used to compare the liquidity position of one company to the liquidity position of another company within the same industry. An example of a utility that is considered to have good leverage is Alliant Energy (NYSE: LNT) with a Debt/EBITDA ratio of 3.92. Examples of excessively leveraged utilities are PG&E (NYSE: PCG) and Edison International (NYSE: EIX), which have a Debt-to-EBITDA level of 8.95 and 13.33, respectively.

NextEra Energy's Q4 2021 Debt to Equity was 1.47. Generally, debt to equity levels between 1.0 and 1.5 are considered good levels for a utility. NextEra Energy's interest coverage for Q4 2021 was 2.33. Generally, an interest coverage ratio of at least two is considered the minimum acceptable amount for a company. All of these numbers indicate that NextEra carries a high debt level, however, it is not unusual for utilities to be heavily leveraged as infrastructure requirements can make large, periodic capital expenditures necessary.


Quora.com: What are the best practices for building something like a news feed?

Feb. 27, 2022

Here is the link.

Background

Users in most social networking sites are describable in terms of a social graph. The relationships between users are represented by adjacency lists. If Jack and Jill are friends, they are said to be adjacent. This is known as an "edge" in the graph.

Determining Importance

You'll likely want to rank edges by importance rather than simply the most recent updates, meaning that you need to calculate some sort of score. Facebook's EdgeRank was described by the formula ∑e = u_e · w_e · d_e, wherein ∑e is the sum of the edges' ranks, u_e is the affinity score with the user who created the edge, w_e is the weight for the content type, and d_e is a time-decay factor.

Calculating a friend's affinity score can be done something like this: ∑i = l_i · n_i · w_i, wherein ∑i is the sum of the interactions with that friend, l_i is the time since your last interaction (this would need to be weighted so that 1 day > 30 days), n_i is the number of interactions, and w_i is the weight of those interactions. This method allows you to rank friends in a separate database and then perhaps only show ten updates from the ten closest friends, which isn't a bad idea considering few of us are likely to have more close friends than this.
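To make the two formulas concrete, here is a small worked example in C#; all the weights and the decay function are invented for illustration, not Facebook's real constants.

using System;

class EdgeRankExample
{
    static void Main()
    {
        // Affinity with the friend: interactions weighted by recency and type.
        double recencyWeight = 0.8;   // l_i: recent interactions count more (1 day > 30 days)
        int interactions = 12;        // n_i: number of interactions
        double typeWeight = 0.5;      // w_i: e.g., comments worth more than likes
        double affinity = recencyWeight * interactions * typeWeight;   // u_e = 4.8

        // One edge: that friend posted a photo 2 hours ago.
        double contentWeight = 1.0;            // w_e: photos weighted highest here
        double decay = 1.0 / (1.0 + 2.0);      // d_e: simple 1/(1+hours) time decay
        double edgeRank = affinity * contentWeight * decay;

        Console.WriteLine($"affinity u_e = {affinity:F2}, edge rank = {edgeRank:F2}");  // 4.80 and 1.60
    }
}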

What to Store

Determining what data to store depends on your front-end (including what activities your users participate in) and your back-end. I'll describe some general information you can store. Italics are special, optional information you might want or need depending on your schema.

Activity(id, user_id, source_id, activity_type, edge_rank, parent_id, parent_type, data, time)

  • user_id - user who generated activity
  • source_id - record activity is related to
  • activity_type - type of activity (photo album, comment, etc.)
  • edge_rank - the rank for this particular activity
  • parent_type - the parent activity type (particular interest, group, etc.)
  • parent_id - primary key id for parent type
  • data - serialized object with meta-data


Assuming you're using MySQL as your database store, you can index on (user_id, time) and then perform your basic queries. An example feed row for a photo would be:

(id: 1, user_id: 1, source_id: some_source, activity_type:PHOTO, data: (photo_id: 1, photo_name: Getting married)).


In MySQL, your tables would be heavily denormalized since performing joins will hurt performance.

Potential Problems

  • Visibility - must show interesting activities
  • Performance - sorting time must be minimized
  • Publishing - multiple points of failure depending on your publish method


Publishing Methods

"Push" Model, or Fan-out-on-write

This method involves denormalizing the user's activity data and pushing the metadata to all the user's friends at the time it occurs. You store only one copy of the data as in the schema above, then push pointers to friends with the metadata. The problem with this method is that if you have a large fan-out (a large number of followers), you run the risk of this breaking while your feed accumulates a backlog. If you go with this strategy, you also risk a large number of disk seeks and random writes. You'll want some sort of write-optimized data store such as Cassandra, HBase, or BigTable.

"Pull" Model, or Fan-out-on-load

This method involves keeping all recent activity data in memory and pulling in (or fanning out) that data at the time a user loads their home page. Data doesn't need to be pushed out to all subscribers as soon as it happens, so no back-log and no disk seeks. The problem with this method is that you may fail to generate a user's news feed altogether. To mitigate this risk, you should have a fallback mechanism in place that approximates the user's feed or serves as a good alternative.

Some Suggestions

  • If you're using MySQL, you'll want to be sure that your activities table is as compact as possible, your keys are small, and that it's indexed appropriately.
  • You may want to use Redis for fast access to fresh activity stream data. Redis is read-optimized and stores all data in memory. This is a good approach for the "Push" model described above.


Conclusions
While this is by no means an exhaustive answer, I'm trying to summarize as much information as I can. My sources for this answer are collected in the links below, so any information in this answer sadly goes without direct attribution. Special thanks, however, goes to Ari Steinberg for his very detailed answer to What are the scaling issues to keep in mind while developing a social network feed?

As I said at the beginning, I would love to get comments, feedback, or alternative approaches on this answer.

Firestone: As investors, you need to be calm about when you step back into the market

Feb. 27, 2022

Here is the link.

Aureus Asset Management CEO Karen Firestone gives some advice to newer investors who haven't seen a major market correction or lived through an inflationary environment. For access to live and exclusive video from CNBC subscribe to CNBC PRO: https://cnb.cx/2NGeIvi