Skip to content

Latest commit

 

History

History
265 lines (188 loc) · 12.8 KB

Twitter.md

File metadata and controls

265 lines (188 loc) · 12.8 KB

System Design: Twitter Application

Problem Statement

Design a scalable social media platform similar to Twitter, where users can post short messages (tweets), follow other users, like/retweet posts, and view a timeline. The system must handle millions of users and billions of tweets with low latency.

Requirements

Functional Requirements:

  1. User Registration and Authentication: Users should be able to sign up and log in.
  2. Post Tweets: Users can post short messages (tweets) with optional media (images, videos, etc.).
  3. Follow/Unfollow Users: Users can follow or unfollow other users to build their feed.
  4. Timeline Feed: Users should see a timeline of tweets from users they follow, ordered by relevance or time.
  5. Like and Retweet: Users can like or retweet tweets from other users.
  6. Hashtags: Tweets can include hashtags for topic-based grouping.
  7. Notifications: Notify users when they are followed, their tweet is liked, or retweeted.
  8. Search: Users can search for tweets, users, or hashtags.
  9. Trends: Display trending topics based on hashtags and user activity.
  10. Replies and Conversations: Users can reply to tweets and view threads of conversations.

Non-Functional Requirements:

  1. Scalability: The system should handle millions of users and billions of tweets.
  2. Low Latency: Timeline feed updates and tweet posting should have minimal delays.
  3. High Availability: The platform should remain operational 24/7.
  4. Consistency: The system should ensure eventual consistency in displaying tweets and followers.
  5. Data Durability: Ensure that all data (tweets, likes, retweets) are stored and can be recovered.

Components of the System

The Twitter-like system can be broken down into the following key components:

  1. User Service: Manages user profiles, authentication, following/followers.
  2. Tweet Service: Handles the creation and retrieval of tweets, as well as replies, retweets, and likes.
  3. Timeline Service: Generates a personalized feed for users based on the people they follow.
  4. Search and Hashtags: Manages searching for tweets, users, and trending topics.
  5. Notification Service: Notifies users of interactions, like new followers, likes, and retweets.
  6. Media Service: Handles the upload and retrieval of media files (images, videos) associated with tweets.
  7. Analytics Service: Tracks trending hashtags, most-liked tweets, user activity, and generates reports.

High-Level Design

  1. User Interface (UI): Users interact with the system by posting tweets, following users, liking/retweeting tweets, and viewing their timeline.
  2. API Layer: Provides APIs for user registration (POST /register), posting tweets (POST /tweet), retrieving the timeline (GET /timeline), and following users (POST /follow).
  3. Service Layer:
    • User Service: Handles user-related operations (registration, login, following/unfollowing users).
    • Tweet Service: Handles creating tweets, liking, retweeting, replying, and fetching tweet details.
    • Timeline Service: Generates and manages user timelines, fetching relevant tweets from followed users.
    • Search Service: Allows users to search for tweets, users, and hashtags.
    • Notification Service: Manages notifications for user activities like follows, likes, and retweets.
    • Media Service: Handles media uploads (images, videos) and retrievals.
  4. Storage Layer:
    • Database: Stores user profiles, tweets, follower/following data, retweets, and likes.
    • Cache (Optional): Stores frequently accessed data, like timelines, to reduce database load.
    • Blob Storage: Stores media files (e.g., images, videos) for tweets.

Detailed Design

1. User Registration and Authentication

Users can sign up using email, phone number, or social media OAuth (e.g., Google, Facebook). After registration, they can log in and update their profile.

Database Schema for Users:

Field Type Description
user_id String (PK) Unique identifier for the user.
username String Unique username chosen by the user.
email String User’s email address.
password_hash String Hashed password for secure login.
bio String User's bio or description.
followers Integer Number of followers the user has.
following Integer Number of users the user is following.

Key Operations:

  • Registration/Login: Users can register and log in via email/phone.
  • Follow/Unfollow: Users can follow or unfollow other users.

2. Posting Tweets

A user can post short text messages (up to 280 characters) and optionally attach media (images, videos).

Tweet Posting Flow:

  1. The user creates a tweet.
  2. If media is attached, it is uploaded to blob storage, and a URL is returned.
  3. The tweet is stored in the database, along with any media URLs.
  4. Followers of the user will see the tweet in their timeline.

Database Schema for Tweets:

Field Type Description
tweet_id String (PK) Unique identifier for the tweet.
user_id String (FK) ID of the user who posted the tweet.
content Text Text content of the tweet.
media_url String URL to any media attached to the tweet.
timestamp Timestamp Time the tweet was posted.
likes Integer Number of likes on the tweet.
retweets Integer Number of retweets.

3. Timeline Generation

The timeline is a personalized feed of tweets from users that the current user follows. It should be updated in real-time as new tweets are posted.

Timeline Generation Process:

  1. When a user posts a tweet, the system identifies the followers of that user.
  2. The tweet is inserted into the timelines of all followers.
  3. When a user views their timeline, the system retrieves tweets from the users they follow, ordered by time or relevance.

Timeline Table (Cache):

Field Type Description
user_id String (PK) ID of the user viewing the timeline.
timeline Array[String] List of tweet IDs for the user’s timeline.
last_updated Timestamp Time when the timeline was last updated.

4. Likes and Retweets

Users can like or retweet other users’ tweets, which will reflect on their timeline and be visible to their followers.

Likes Flow:

  1. When a user likes a tweet, the likes count for the tweet is updated.
  2. The user is notified that their tweet was liked.

Retweets Flow:

  1. When a user retweets a tweet, a copy of the tweet is added to their timeline.
  2. The retweeted tweet is also shared with their followers, and the original tweet’s retweet count is updated.

5. Hashtags and Trends

Users can add hashtags to their tweets to categorize them by topics. The system tracks popular hashtags to display trending topics.

Hashtag Usage:

  1. When a tweet with a hashtag is posted, the hashtag is indexed in the search service.
  2. The system tracks hashtag usage to identify trending topics based on the number of tweets with that hashtag.

Database Schema for Hashtags:

Field Type Description
hashtag String (PK) The hashtag (e.g., #sports).
tweet_count Integer Number of tweets using the hashtag.
last_used Timestamp Last time the hashtag was used.

6. Search Service

The search service allows users to search for tweets, users, and hashtags. The system should return relevant results based on popularity, recency, or user preferences.

Search Flow:

  1. The user searches for a term (e.g., #music or JohnDoe).
  2. The system searches for relevant tweets, users, or hashtags.
  3. Results are returned, ordered by relevance or recency.

Search Index:

  • Inverted Index: Create an inverted index to map keywords to tweet IDs, usernames, and hashtags.
  • Trending Topics: Periodically compute trending hashtags based on activity.

7. Notifications

Users should be notified when someone follows them, likes, or retweets their tweet. Notifications can be push notifications on mobile devices or in-app alerts.

Notification Flow:

  1. When User A likes a tweet from User B, a notification is generated.
  2. User B is notified of the like or retweet in real-time.

Key Challenges

1. Scalability

  • The system must handle millions of users and billions of tweets.
    • Solution: Use sharding to split user data and tweet storage across multiple databases, partition by user ID or tweet ID.

2. Timeline Generation

  • Generating timelines for millions of users with real-time updates is challenging.
    • Solution: Use fan-out on write or fan-out on read strategies, depending on the load. Implement timeline caching for frequently accessed timelines.

3. Search and Trending Hashtags

  • Efficiently searching tweets and handling trending topics.
    • Solution: Use full-text search engines like Elasticsearch to index tweets, users, and hashtags for efficient search.

4. Data Consistency

  • Ensuring that followers see real-time updates from the users they follow while ensuring eventual consistency.
    • Solution: Use eventual consistency models with background jobs for consistency across distributed data.

5. Media Handling

  • Efficiently storing and serving large media files (images, videos) attached to tweets.
    • Solution: Use Blob Storage like AWS S3 to store media and serve via a CDN for fast retrieval.

Advanced Features (Optional)

1. Direct Messaging

  • Allow users to send private messages to each other, including media sharing.

2. Content Moderation

  • Implement AI-based content moderation to detect abusive language, spam, or inappropriate content in tweets.

3. Ad Platform

  • Introduce an advertising system where users or businesses can promote their tweets to reach a wider audience.

4. Multimedia Support

  • Allow users to live-stream or post longer videos, similar to video-based social media platforms.

Data Flow

1. Posting a Tweet

  • The user posts a tweet with text and/or media.
  • The tweet is saved in the tweet service and indexed for search.
  • Followers' timelines are updated with the new tweet.

2. Timeline Retrieval

  • The user opens the app, and the system fetches their timeline.
  • The timeline is either generated on-demand or fetched from a pre-generated timeline cache.

3. Search for Hashtags

  • The user searches for a hashtag.
  • The search service queries the inverted index for tweets with the specified hashtag.

Scaling the System

1. Sharding

  • Shard user data and tweet storage across multiple databases to handle millions of users and billions of tweets. Use user ID or tweet ID for partitioning.

2. Caching

  • Use caching for frequently accessed timelines, popular tweets, and user profiles. Redis or Memcached can be used for this purpose.

3. Load Balancing

  • Use load balancers to distribute incoming traffic across multiple application servers, ensuring that the system handles a large volume of requests efficiently.

4. Message Queues

  • Use message queues (e.g., Kafka) to handle high-throughput events like tweet posting, fan-out to followers' timelines, and notification events.

Conclusion

Designing a Twitter-like application requires addressing scalability, real-time data consistency, and user interaction. The system must be able to efficiently manage timelines, handle large amounts of data, and provide a responsive user experience. By leveraging caching, sharding, and load balancing techniques, the system can handle millions of active users and billions of tweets while providing low-latency responses.


Further Reading