Design Amazon!
In this article I present a data-driven software architecture design for an online shopping application like Amazon.
REQUIREMENTS
FUNCTIONAL REQUIREMENTS (Must Have)
- Search products based on free text.
- Display Product description, price, availability, reviews, similar products.
- Add/Remove/Modify products in Cart.
- Registered customers can check out, make a payment and place an order.
- Orders can be tracked, cancelled or returned.
- Registered users can see product recommendations.
NON-FUNCTIONAL REQUIREMENTS (To Drive Engineering Decisions)
- Favor Consistency over Availability for orders, payments and cart modifications.
- Favor Availability over Consistency for recommendations, notifications and search.
- Low Latency for product search, order tracking and notifications.
OUT OF SCOPE
- Sellers can add/remove/modify products in catalog.
- System notifies customers about order updates, promotions.
ESTIMATIONS (One Year Assumptions)
USER, ORDER and PRODUCT ESTIMATION (M=Million, B=Billion)
- Total Users (Sellers + Customers): 100M
- Total Listed Products: 200M
- Daily Active Users: 30% of Total = 30M
- Product Searches/User/Year = 300
- Orders/User/Year = 20
- Total Orders/Year = 100M*20 = 2B
STORAGE ESTIMATION (See Data Model Below)
- User Storage: 100M * 400Bytes = 40GB
- Product Storage: 200M * 1.5KB = 300GB
- Order Storage: 2B * 300Bytes = 600GB
- Media Storage (Images/Videos): 2MB(Average Media Size) * 1(Average Media/Product) * 200M(Total Products) = 400TB
- Total Storage: ~401TB (dominated by media)
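The storage arithmetic can be re-checked with a short script. This is only a sketch: it uses the record sizes and counts listed above, assumes decimal units (1 GB = 10^9 bytes), and the variable names are mine. It makes it easy to see that media dominates the total.

```python
# Back-of-the-envelope storage estimate, decimal units (1 GB = 10**9 bytes).
MILLION = 10**6
GB = 10**9
TB = 10**12

users = 100 * MILLION
products = 200 * MILLION
orders_per_year = users * 20               # 20 orders per user per year

user_bytes = users * 400                   # ~400 bytes per user record
product_bytes = products * 1_500           # ~1.5 KB per product record
order_bytes = orders_per_year * 300        # ~300 bytes per order record
media_bytes = products * 1 * 2 * MILLION   # 1 media item of ~2 MB per product

total_bytes = user_bytes + product_bytes + order_bytes + media_bytes

print(f"users:    {user_bytes / GB:.0f} GB")
print(f"products: {product_bytes / GB:.0f} GB")
print(f"orders:   {order_bytes / GB:.0f} GB")
print(f"media:    {media_bytes / TB:.0f} TB")
print(f"total:    {total_bytes / TB:.2f} TB")
```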
READS/WRITES ESTIMATION
Reads: Search, Cart Details, Order Details, Reviews, Product Details.
Writes: Modify Cart, Checkout, Modify Order, Modify Catalog.
- Reads per search: 1(search)+1(product) = 2
- Reads per cart details: 1
- Reads per order details: 1(details)+1(tracking) = 2
- Reads per product details: 1(product)+1(sellers)+1(availability)+1(offers)+1(reviews) = 5
- Writes per cart modification: 1(cart)+1(catalog) = 2
- Writes per checkout: 1(cart)+1(orders)+1(payments)+1(catalog) = 4
- Total Reads/Second: 30M(Daily Users) * (10 Searches*2 + 1 Cart*1 + 1 Order*2 = 23 Reads) = 690M Reads/Day; 690M/86400 ~ 8000 Reads/Second
- Total Writes/Second: 30M(Daily Users) * 1 Checkout * 4 Writes = 120M Writes/Day; 120M/86400 ~ 1400 Writes/Second
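A throughput estimate like this is easier to audit as a small calculation. The sketch below assumes each daily active user performs 10 searches, 1 cart view, 1 order view and 1 checkout per day, and costs each action with the per-action read and write counts listed above. The daily action mix and all names are my assumptions; the result changes directly with them.

```python
SECONDS_PER_DAY = 86_400
DAILY_ACTIVE_USERS = 30_000_000

# Storage operations issued by one user action (from the per-action counts).
READS_PER_ACTION = {"search": 2, "cart_details": 1, "order_details": 2}
WRITES_PER_ACTION = {"checkout": 4}

# Assumed number of actions per daily active user per day (hypothetical mix).
DAILY_ACTIONS = {"search": 10, "cart_details": 1, "order_details": 1, "checkout": 1}


def per_second(cost_per_action: dict) -> float:
    """Average operations per second across all daily active users."""
    per_user_per_day = sum(
        DAILY_ACTIONS[action] * cost for action, cost in cost_per_action.items()
    )
    return DAILY_ACTIVE_USERS * per_user_per_day / SECONDS_PER_DAY


reads_per_second = per_second(READS_PER_ACTION)
writes_per_second = per_second(WRITES_PER_ACTION)

print(f"reads/second:  {reads_per_second:,.0f}")
print(f"writes/second: {writes_per_second:,.0f}")
```

These are daily averages; peak traffic would be higher, so capacity planning would need a peak factor on top.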
SYSTEM API
- register(username, password, email)
- login(username, password) : json-web-token
- logout()
- search(search_string) : list_of_products
- product_details(productId) : product_details
- show_cart() : cart_state
- modify_cart(modified_cart_state)
- checkout()
- place_order()
- show_orders(userId) : list_of_orders
- track_order(orderId)
- cancel_or_return_order(orderId)
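To make the API list more concrete, here is a minimal sketch of a few of these calls as typed Python signatures, with a toy in-memory implementation. The shape of the cart state (productId mapped to quantity), the return types and the class names are assumptions for illustration, not part of the design above.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class CartState:
    # productId -> quantity; this shape is an assumption.
    items: dict[int, int] = field(default_factory=dict)


class ShopApi(Protocol):
    """Subset of the API list, written as typed signatures."""

    def search(self, search_string: str) -> list[int]: ...
    def show_cart(self) -> CartState: ...
    def modify_cart(self, modified_cart_state: CartState) -> None: ...


class InMemoryShop:
    """Toy single-user implementation, for illustration only."""

    def __init__(self, catalog: dict[int, str]) -> None:
        self._catalog = catalog          # productId -> product name
        self._cart = CartState()

    def search(self, search_string: str) -> list[int]:
        # Naive case-insensitive substring match standing in for a search index.
        needle = search_string.lower()
        return [pid for pid, name in self._catalog.items() if needle in name.lower()]

    def show_cart(self) -> CartState:
        return self._cart

    def modify_cart(self, modified_cart_state: CartState) -> None:
        # The client sends the whole new cart state; zero-quantity lines are dropped.
        self._cart = CartState(
            {pid: qty for pid, qty in modified_cart_state.items.items() if qty > 0}
        )


shop: ShopApi = InMemoryShop({1: "USB-C cable", 2: "USB hub", 3: "Desk lamp"})
hits = shop.search("usb")
shop.modify_cart(CartState({hits[0]: 2, 3: 0}))
print(hits, shop.show_cart().items)
```

Sending the full cart state in modify_cart keeps the call idempotent: repeating the same request leaves the cart unchanged.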
DATA MODEL
USER(~400Bytes)
- userId (autogenerated) (8 Bytes)
- userName (50 Bytes)
- email (50 Bytes)
- contactDetails (100 Bytes)
- status (20 Bytes)
- registeredOn (20 Bytes)
- metadata (100 Bytes)
ORDER (~300 Bytes)
- orderId (autogenerated) (8 Bytes)
- price (5 Bytes)
- status (20 Bytes)
- userId (8 Bytes)
- createdOn (20 Bytes)
- List<productId> (50 Bytes)
- metadata (100 Bytes)
- type (20 Bytes)
PRODUCT (~1.5KB)
- productId (autogenerated) (8 Bytes)
- name (100 Bytes)
- description (500 Bytes)
- paymentId (8 Bytes)
- List<mediaId> (200 Bytes)
- metadata (200 Bytes)
- List<sellerId> (200 Bytes)
REVIEW (~400 Bytes)
- userId (8 Bytes)
- productId (8 Bytes)
- createdOn (20 Bytes)
- List<mediaId> (100 Bytes)
- description (200 Bytes)
- status (20 Bytes)
- likes (10 Bytes)
CART
- cartId (autogenerated)
- userId
- status
- List<productId>
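As a rough illustration of how these entities relate, three of them can be sketched as Python dataclasses. Field names follow the model above; the Python types, the default values and the status strings are assumptions of mine.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class User:
    userId: int
    userName: str
    email: str
    contactDetails: str = ""
    status: str = "ACTIVE"               # status values are assumptions
    registeredOn: datetime = field(default_factory=_now)
    metadata: dict = field(default_factory=dict)


@dataclass
class Cart:
    cartId: int
    userId: int                          # references User.userId
    status: str = "OPEN"
    productIds: list[int] = field(default_factory=list)


@dataclass
class Order:
    orderId: int
    userId: int                          # references User.userId
    price: Decimal
    productIds: list[int]
    status: str = "PLACED"
    type: str = "STANDARD"
    createdOn: datetime = field(default_factory=_now)
    metadata: dict = field(default_factory=dict)


# Usage: an order is created from a snapshot of the user's cart.
user = User(userId=1, userName="alice", email="alice@example.com")
cart = Cart(cartId=10, userId=user.userId, productIds=[101, 102])
order = Order(
    orderId=1000,
    userId=cart.userId,
    price=Decimal("59.98"),
    productIds=list(cart.productIds),    # copy, so later cart edits do not alter the order
)
print(order)
```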
DATABASE DECISIONS (SQL vs NoSQL)
SQL (MySQL, Oracle, MariaDB)
- User (Due to fixed structure of data)
- Payment (Due to strong consistency requirements)
- Cart (Due to structured data and strong consistency requirements)
NoSQL (Cassandra, MongoDB, Couchbase, Redis)
- Product (Due to the loosely structured nature of the data)
- Order (Due to ever-increasing data volume)
- Search: Elasticsearch
Object Storage (Amazon S3, Azure Blob Storage)
- Images/Videos
HIGH LEVEL ARCHITECTURE
Microservices: Assume multiple active instances of each service.
- Web-Servers: Application servers that contain no business logic but act as the controller for the system, handling service discovery, request validation, gatekeeping, etc.
- Search Service: Handles search; consumes the product, availability and recommendation services.
- User Service: Handles user data and access management.
- Product Service: Handles the catalog.
- Order Service: Creates, cancels and tracks orders.
- Cart Service: Handles cart modifications.
- Media Service: Handles media upload, download, compression, etc.
- Recommendation Service: Creates personal recommendation profiles based on user interactions (cart, order, search, wish list).
- Analytics Service: Hadoop-based analytics service.
Kafka-based Asynchronous Communication:
- Search, order and cart services publish events that the recommendation, analytics, notification and product services consume.
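The event fan-out described above can be sketched without a broker. The snippet below is an in-process, synchronous stand-in for Kafka, not a Kafka client: the EventBus class, the topic name "order-events" and the event fields are all hypothetical. It only illustrates that one published event reaches every subscribed service independently.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """In-process stand-in for a topic with several independent consumers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Each subscriber gets its own copy of the event, loosely mirroring
        # separate consumer groups reading the same topic.
        for handler in self._subscribers[topic]:
            handler(dict(event))


bus = EventBus()
recommendation_inbox: list[Event] = []
analytics_inbox: list[Event] = []
stock: dict[int, int] = {101: 5}        # productId -> units available


def reserve_stock(event: Event) -> None:
    # Product-side consumer: decrement availability for each ordered product.
    for product_id in event["productIds"]:
        stock[product_id] -= 1


bus.subscribe("order-events", recommendation_inbox.append)
bus.subscribe("order-events", analytics_inbox.append)
bus.subscribe("order-events", reserve_stock)

# The order service publishes once; it does not know who consumes the event.
bus.publish(
    "order-events",
    {"type": "ORDER_PLACED", "orderId": 1000, "userId": 1, "productIds": [101]},
)
print(len(recommendation_inbox), len(analytics_inbox), stock)
```

With a real broker the consumers would run asynchronously and at their own pace, which is what lets recommendation and analytics lag behind without slowing down checkout.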
SCALABILITY
DATABASE SHARDING (Details in Part-2)
- Order Database: Range-based sharding on orderId.
- Product Database: Hash-based sharding on product names.
- User Database: Range-based sharding on usernames.
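The two sharding schemes can be sketched as small routing functions. This is a minimal illustration under my own assumptions: the boundary values, the shard counts and the function names are hypothetical, and SHA-256 is just one convenient stable hash.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based sharding: stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key, upper_bounds: list) -> int:
    """Range-based sharding.

    upper_bounds is a sorted list of exclusive upper bounds. Shard i holds
    keys with upper_bounds[i-1] <= key < upper_bounds[i]; the last shard
    holds everything at or above the final bound.
    """
    return bisect.bisect_right(upper_bounds, key)


ORDER_BOUNDS = [1_000_000_000, 2_000_000_000]   # 3 shards by orderId
USERNAME_BOUNDS = ["h", "q"]                    # 3 shards: a-g, h-p, q-z

print(range_shard(5, ORDER_BOUNDS))             # low orderId, first shard
print(range_shard(2_500_000_000, ORDER_BOUNDS)) # high orderId, last shard
print(range_shard("maria", USERNAME_BOUNDS))    # middle username range
print(hash_shard("USB-C cable", 8))             # product name to one of 8 shards
```

One trade-off worth keeping in mind: with range sharding on an autogenerated, increasing orderId, new orders all land on the last shard, whereas hashing spreads writes evenly but makes range scans span every shard.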
CACHING
- Hot products can be cached in an in-memory NoSQL data store like Redis.
- A CDN can be configured in front of Object Storage to deliver media.
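Caching hot products is commonly done with a cache-aside read path, sketched below. A plain dictionary with expiry times stands in for Redis, and the TTLCache class, the 60-second TTL and the sample product are assumptions for illustration only.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry, standing in for Redis."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict = {}         # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired: evict and report a miss
            del self._entries[key]
            return None
        return value

    def set(self, key, value) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


PRODUCT_DB = {101: {"name": "USB-C cable", "price": "9.99"}}
stats = {"db_reads": 0}


def load_product_from_db(product_id: int) -> dict:
    stats["db_reads"] += 1               # count trips to the backing store
    return PRODUCT_DB[product_id]


cache = TTLCache(ttl_seconds=60)


def get_product(product_id: int) -> dict:
    """Cache-aside read: try the cache, fall back to the database, then fill."""
    product = cache.get(product_id)
    if product is None:
        product = load_product_from_db(product_id)
        cache.set(product_id, product)
    return product


first = get_product(101)                 # miss: reads the database
second = get_product(101)                # hit: served from the cache
print(first == second, stats["db_reads"])
```

The TTL bounds how stale a cached product can be, which fits the requirement above of favoring availability over consistency for reads such as search and product browsing.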
MONITORING AND ALERTING (OUT OF SCOPE)
BACKUP AND DISASTER RECOVERY (OUT OF SCOPE)
Out-of-scope topics will be covered in Part 2, along with the low-level design.