I’ve learned a lot from building distributed systems at large scale. Plenty of articles online cover each of the topics below in depth; this is just a summary.

As a system scales, it is not just the code that evolves; the architecture and design of the system evolve too.

  • multiple instances behind a load balancer to handle scale
  • circuit breakers
  • master/slave replication or a cluster for the db
  • archiving data
  • cache (performance, proper caching, invalidation)
  • read-only APIs served from the slave db
  • event-driven architecture (Kafka, events)
  • sharding
  • monitoring, alerts
    • performance (percentiles) for requests, db, cache
  • load testing
  • proper indexes, tweaking VM resources
    • ulimit
    • Postgres config for performance tuning
    • Redis maxmemory
  • workers, async processing
  • failure isolation
  • CI/CD
  • TDD
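
Of the items above, the circuit breaker is probably the easiest to show in a few lines. Here is a minimal sketch in Python, not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `reset_timeout` are made up for illustration, and it assumes a single-threaded caller. After a number of consecutive failures the breaker "opens" and rejects calls immediately; once the timeout passes it lets one trial call through and closes again if that call succeeds.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical helper for illustration; not thread-safe.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast instead of hitting the broken dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it if the trial call failed.
                self.opened_at = self.clock()
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

You would wrap each call to a flaky dependency, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is your own function, and catch `CircuitOpenError` to serve a fallback. This ties in with failure isolation: a slow or dead downstream service stops eating your threads and connections.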