Design Amazon!

Rajat Goyal
Jan 28, 2021

In this article I present a data-driven software architecture design for an online shopping application like Amazon.

REQUIREMENTS

FUNCTIONAL REQUIREMENTS (Must Have)

  • Search products based on free text.
  • Display Product description, price, availability, reviews, similar products.
  • Add/Remove/Modify products in Cart.
  • Registered customers can checkout, make a payment and place an order.
  • Orders can be tracked, cancelled, or returned.
  • Registered users can see product recommendations.

NON-FUNCTIONAL REQUIREMENTS (To Drive Engineering Decisions)

  • Favor Consistency over Availability in order, payment, cart modifications.
  • Favor Availability over Consistency in recommendation, notification, search.
  • Low-Latency for product search, tracking, notifications.

OUT OF SCOPE

  • Sellers can add/remove/modify products in catalog.
  • System notifies customers about order updates, promotions.

ESTIMATIONS (One Year Assumptions)

USER, ORDER and PRODUCT ESTIMATION (M=Million, B=Billion)

  • Total Users (Sellers + Customers): 100M
  • Total Listed Products: 200M
  • Daily Active Users: 30% of Total = 30M
  • Product Searches/User/Year = 300
  • Order/User/Year = 20
  • Total Orders/Year = 100M*20 = 2B

STORAGE ESTIMATION (See Data Model Below)

  • User Storage: 100M * 400Bytes = 40GB
  • Product Storage: 200M * 1.5KB = 300GB
  • Order Storage: 2B * 300Bytes = 600GB
  • Media Storage (Images/Videos): 2MB(Average Media Size) * 1(Average Media/Product) * 200M(Total Products) = 400TB
  • Total Storage: ~401TB (dominated by media)
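The storage arithmetic above can be re-derived with a few lines of Python. This is only a sketch: it assumes decimal units (1KB = 1000 bytes), and the helper names `storage_estimate` and `human` are made up for illustration.

```python
# Back-of-the-envelope storage estimate from the per-record sizes listed above.
# Assumption: decimal units (1 KB = 1000 bytes) to keep the numbers round.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12
MILLION, BILLION = 10**6, 10**9


def storage_estimate() -> dict[str, int]:
    """Return the estimated number of bytes per category for one year."""
    return {
        "user": 100 * MILLION * 400,           # 100M users * 400 bytes
        "product": 200 * MILLION * 1500,       # 200M products * 1.5 KB
        "order": 2 * BILLION * 300,            # 2B orders * 300 bytes
        "media": 2 * MB * 1 * 200 * MILLION,   # size * media/product * products
    }


def human(n_bytes: float) -> str:
    """Format a byte count using the largest unit that fits."""
    for unit, size in (("TB", TB), ("GB", GB), ("MB", MB), ("KB", KB)):
        if n_bytes >= size:
            return f"{n_bytes / size:.2f} {unit}"
    return f"{n_bytes} B"


if __name__ == "__main__":
    estimate = storage_estimate()
    for name, size in estimate.items():
        print(f"{name:8s} {human(size)}")
    print(f"{'total':8s} {human(sum(estimate.values()))}")
```

Media dominates the total by several orders of magnitude, which is why it is kept in object storage rather than in a database.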

READS/WRITES ESTIMATION

Reads: Search, Cart Details, Order Details, Reviews, Product Details.

Writes: Modify Cart, Checkout, Modify Order, Modify Catalog.

  • Reads per search: 1(search)+1(product)=2
  • Reads per cart details: 1
  • Reads per order details: 1(details)+1(tracking)=2
  • Reads per product details: 1(product)+1(sellers)+1(availability)+1(offers)+1(reviews) =5
  • Writes per cart modification: 1(cart)+1(catalog)
  • Writes per checkout: 1(cart)+1(orders)+1(payments)+1(catalog)
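  • Total Reads/Second: 600M reads/day (Daily Users * (10 Searches + 1 Cart + 1 Order), with several reads per operation) / 86400 ~ 7000 Reads/Second
  • Total Writes/Second: 300M writes/day (Daily Users * 1 Order, with several writes per operation) / 86400 ~ 3500 Writes/Second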
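To make the conversion from daily volume to per-second load explicit, here is a small sketch. The function names are hypothetical, and the per-operation weights in `daily_reads` (2 reads per search, 1 per cart view, 2 per order view) are my reading of the list above rather than a precise traffic model.

```python
SECONDS_PER_DAY = 86_400


def per_second(daily_operations: int) -> float:
    """Average operations per second, assuming load is spread evenly over the day."""
    return daily_operations / SECONDS_PER_DAY


def daily_reads(daily_users: int, searches: int, cart_views: int, order_views: int) -> int:
    """Daily reads, weighting each user action by its reads per operation."""
    reads_per_user = searches * 2 + cart_views * 1 + order_views * 2
    return daily_users * reads_per_user


if __name__ == "__main__":
    print(f"600M reads/day  -> {per_second(600_000_000):.0f} reads/second")
    print(f"300M writes/day -> {per_second(300_000_000):.0f} writes/second")
    weighted = daily_reads(30_000_000, searches=10, cart_views=1, order_views=1)
    print(f"weighted model  -> {weighted:,} reads/day")
```

These are averages; real traffic peaks well above the mean, so capacity should be planned with headroom.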

SYSTEM API

  • register(username, password, email)
  • login(username, password) : json-web-token
  • logout()
  • search(search_string) : list_of_products
  • product_details(productId) : product_details
  • show_cart() : cart_state
  • modify_cart(modified_cart_state)
  • checkout()
  • place_order()
  • show_orders(userId) : list_of_orders
  • track_order(orderId)
  • cancel_or_return_order(orderId)
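One way to pin this API down is as a typed interface. The sketch below uses `typing.Protocol`. The parameter and return types are assumptions, since the list above only names return values for some calls, and the class name `ShoppingApi` is hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ShoppingApi(Protocol):
    """Typed sketch of the system API; all types here are illustrative assumptions."""

    def register(self, username: str, password: str, email: str) -> None: ...
    def login(self, username: str, password: str) -> str: ...  # returns a JSON Web Token
    def logout(self) -> None: ...
    def search(self, search_string: str) -> list[dict]: ...     # list of products
    def product_details(self, product_id: int) -> dict: ...
    def show_cart(self) -> dict: ...                            # current cart state
    def modify_cart(self, modified_cart_state: dict) -> None: ...
    def checkout(self) -> None: ...
    def place_order(self) -> None: ...
    def show_orders(self, user_id: int) -> list[dict]: ...
    def track_order(self, order_id: int) -> None: ...
    def cancel_or_return_order(self, order_id: int) -> None: ...
```

Calls that take no user identifier, such as `show_cart()` and `checkout()`, presumably resolve the user from the token issued by `login`.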

DATA MODEL

USER(~400Bytes)

  • userId (autogenerated) (8 Bytes)
  • userName (50 Bytes)
  • email (50 Bytes)
  • contactDetails (100 Bytes)
  • status (20 Bytes)
  • registeredOn (20 Bytes)
  • metadata (100 Bytes)

ORDER (~300 Bytes)

  • orderId (autogenerated) (8 Bytes)
  • price (5 Bytes)
  • status (20 Bytes)
  • userId (8 Bytes)
  • createdOn (20 Bytes)
  • List<productId> (50 Bytes)
  • metadata (100 Bytes)
  • type (20 Bytes)

PRODUCT (~1.5KB)

  • productId (autogenerated) (8 Bytes)
  • name (100 Bytes)
  • description (500 Bytes)
  • paymentId (8 Bytes)
  • List<mediaId> (200 Bytes)
  • metadata (200 Bytes)
  • List<sellerId> (200 Bytes)

REVIEW (~400 Bytes)

  • userId (8 Bytes)
  • productId (8 Bytes)
  • createdOn (20 Bytes)
  • List<mediaId> (100 Bytes)
  • description (200 Bytes)
  • status (20 Bytes)
  • likes (10 Bytes)

CART

  • cartId (autogenerated)
  • userId
  • status
  • List<productId>
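The entities above map naturally onto plain record types. The following is a minimal sketch using Python dataclasses. The field types (for example `Decimal` for price) are assumptions, names are converted to snake_case, and the sample status strings are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal


@dataclass
class User:
    user_id: int
    user_name: str
    email: str
    contact_details: str
    status: str
    registered_on: datetime
    metadata: dict = field(default_factory=dict)


@dataclass
class Product:
    product_id: int
    name: str
    description: str
    payment_id: int
    media_ids: list[int] = field(default_factory=list)
    seller_ids: list[int] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class Order:
    order_id: int
    user_id: int
    price: Decimal          # Decimal avoids floating-point rounding on money
    status: str
    order_type: str         # "type" in the model above
    created_on: datetime
    product_ids: list[int] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class Review:
    user_id: int
    product_id: int
    created_on: datetime
    description: str
    status: str
    likes: int = 0
    media_ids: list[int] = field(default_factory=list)


@dataclass
class Cart:
    cart_id: int
    user_id: int
    status: str
    product_ids: list[int] = field(default_factory=list)


if __name__ == "__main__":
    cart = Cart(cart_id=1, user_id=7, status="ACTIVE")  # hypothetical status value
    cart.product_ids.append(42)
    print(cart)
```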

DATABASE DECISIONS (SQL vs NoSQL)

SQL (MySQL, Oracle, MariaDB)

  • User (Due to fixed structure of data)
  • Payment (Due to strong consistency requirements)
  • Cart (Due to structured data and strong consistency requirements)

NoSQL (Cassandra, MongoDB, Couchbase, Redis)

  • Product (Due to the loosely structured nature of the data)
  • Order (Due to ever-increasing data volume)
  • Search: Elasticsearch

Object Storage (Amazon S3, Azure Blob Storage)

  • Images/Videos

HIGH LEVEL ARCHITECTURE

Microservices: Assume multiple active instances of each service.

  • Web Servers: Application servers that contain no business logic but act as the controller for the system, handling service discovery, request validation, gatekeeping, etc.
  • Search Service: Handles Search, consumes product, availability and recommendation service.
  • User Service: Handles User Data, access management.
  • Product Service: Handles Catalog.
  • Order Service: Create, Cancel, Track Order.
  • Cart Service: Cart Modifications.
  • Media Service: Handles media upload, download, compression etc.
  • Recommendation Service: Creates personal recommendation profiles based on user interactions (cart, order, search, wish list).
  • Analytics Service: Hadoop-based analytics service.

Kafka Based Asynchronous Communication:

  • The search, order, and cart services publish events that the recommendation, analytics, notification, and product services consume.
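The publish/consume decoupling described above can be illustrated without a running Kafka cluster. The sketch below is an in-memory stand-in, not Kafka client code: `InMemoryBus`, `Event`, and the topic name `order.created` are all hypothetical. It only shows that producers enqueue events and return immediately, while consumers process them later.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    topic: str      # e.g. "order.created" (hypothetical topic name)
    payload: dict


class InMemoryBus:
    """Toy stand-in for a message broker: publish enqueues, drain delivers later."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self._pending: deque[Event] = deque()

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, event: Event) -> None:
        # The producer does not wait for any consumer to finish.
        self._pending.append(event)

    def drain(self) -> int:
        """Deliver all pending events; return the number of handler invocations."""
        delivered = 0
        while self._pending:
            event = self._pending.popleft()
            for handler in self._handlers[event.topic]:
                handler(event)
                delivered += 1
        return delivered


if __name__ == "__main__":
    bus = InMemoryBus()
    bus.subscribe("order.created", lambda e: print("recommendation saw", e.payload))
    bus.subscribe("order.created", lambda e: print("analytics saw", e.payload))
    bus.publish(Event("order.created", {"orderId": 1, "userId": 7}))
    bus.drain()
```

Because the order service never calls its consumers directly, a slow or unavailable recommendation service does not block checkout, which matches the choice to favor availability for recommendations.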

SCALABILITY

DATABASE SHARDING (Details in Part-2)

  • Order Database: Range-based sharding on orderId.
  • Product Database: Hash-based sharding on product name.
  • User Database: Range-based sharding on username.
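The two sharding strategies above can be sketched as routing functions. This is an illustration under assumptions: the shard boundaries, the shard count, and the use of MD5 as a stable hash are my choices, and `hash_shard` and `range_shard` are hypothetical names.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based routing: a stable digest of the key, modulo the shard count.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and so cannot be used for persistent routing.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key, upper_bounds: list) -> int:
    """Range-based routing: upper_bounds are sorted, exclusive upper limits.

    Keys below upper_bounds[0] go to shard 0; keys at or above the last bound
    go to the final shard, so there are len(upper_bounds) + 1 shards in total.
    """
    return bisect.bisect_right(upper_bounds, key)


if __name__ == "__main__":
    # Product database: hash on product name (4 shards is an arbitrary choice).
    print(hash_shard("wireless mouse", 4))

    # Order database: range on orderId with illustrative boundaries.
    order_bounds = [1_000_000, 2_000_000]
    print(range_shard(1_500_000, order_bounds))

    # User database: range on username with illustrative boundaries.
    user_bounds = ["h", "q"]
    print(range_shard("mallory", user_bounds))
```

Range sharding keeps neighboring keys together, which suits range scans but can create hot shards, for example when new orders always land on the highest orderId range. Hash sharding spreads load evenly at the cost of efficient range queries.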

CACHING

  • Hot products can be cached in an in-memory NoSQL data store like Redis.
  • CDN can be configured for delivering media through Object Storage.
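Caching hot products usually follows the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below uses an in-process dictionary with a time-to-live as a stand-in for Redis. `TtlCache`, `get_product`, and the 60-second TTL are assumptions for illustration.

```python
import time
from typing import Any, Callable, Optional


class TtlCache:
    """In-process stand-in for a Redis-style cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[Any, tuple[float, Any]] = {}

    def get(self, key: Any) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: Any, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def get_product(product_id: int, cache: TtlCache, load_from_db: Callable[[int], dict]) -> dict:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    cached = cache.get(product_id)
    if cached is not None:
        return cached
    product = load_from_db(product_id)
    cache.set(product_id, product)
    return product


if __name__ == "__main__":
    cache = TtlCache(ttl_seconds=60)

    def fake_db(product_id: int) -> dict:
        print("database hit for", product_id)
        return {"productId": product_id, "name": "example"}

    get_product(1, cache, fake_db)  # miss: loads from the fake database
    get_product(1, cache, fake_db)  # hit: served from the cache
```

A short TTL bounds how stale a cached product can be, which is acceptable for catalog details but not for the cart or payment paths, where consistency is favored.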

MONITORING AND ALERTING (OUT OF SCOPE)

BACKUP AND DISASTER RECOVERY (OUT OF SCOPE)

Out-of-scope topics will be covered in Part 2, along with the low-level design.
