Knowledge Byte: Designing the Cloud to Expect Failure
Designing software to expect failure takes extra effort, but it isn’t especially hard, and it pays off.
Largely, it boils down to making sure that operations do not leave the system in an unstable state if they are aborted partway through. This is mainly a challenge for the frameworks and infrastructure on which applications are built. Once the infrastructure for retrying failed operations is built into the system, application developers only need to worry about the areas where the system can’t automate recovery from failure (such as operations that trigger real-world actions).
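To make the idea concrete, here is a minimal Python sketch of an operation that can be aborted partway through without corrupting state, wrapped in a retry helper. It is an illustration under stated assumptions, not a production pattern: `TransientError`, `apply_atomically`, and `with_retries` are hypothetical names invented for this example, the "system" is a plain in-memory dict, and the commit step is only safe in a single-threaded program.

```python
import copy
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def apply_atomically(state, steps):
    """Run every step against a copy; publish the result only if all succeed."""
    draft = copy.deepcopy(state)
    for step in steps:
        step(draft)          # may raise partway through; `state` is untouched
    state.clear()
    state.update(draft)      # commit point, reached only on full success


def with_retries(operation, attempts=3, base_delay=0.01):
    """Retry an operation on TransientError, with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise        # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** attempt)


account = {"balance": 100, "orders": []}
calls = {"count": 0}


def debit(draft):
    draft["balance"] -= 30


def record_order(draft):
    # Simulate a crash on the first attempt, after the debit step has run.
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientError("simulated crash after the debit step")
    draft["orders"].append("order-1")


with_retries(lambda: apply_atomically(account, [debit, record_order]))
print(account)  # {'balance': 70, 'orders': ['order-1']}
```

Because the first attempt fails before the commit point, the debit is not applied twice on retry. Steps that trigger real-world actions (sending an email, charging a card) cannot be rolled back this way, which is why they remain the application developer's concern.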
For cloud applications, designing for failure comes down to the following principles:
● Each application component must be deployed across redundant cloud components, ideally with minimal or no common points of failure. The best practice is deployment into multiple availability zones.
● Each application component must make no assumptions about the underlying infrastructure. It must be able to adapt to changes in the infrastructure without downtime.
● Each application component should be partition tolerant: it should be able to survive network latency (or loss of communication) among the nodes that support that component.
● Automation tools must be in place to orchestrate application responses to failures or other changes in the infrastructure.
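As a rough illustration of redundancy across availability zones and tolerance of lost communication, the sketch below tries replicas in several zones and returns the first successful response. Everything here is an assumption made for the example: `ZoneUnavailable`, `call_with_failover`, and the zone names are hypothetical, and the "replicas" are ordinary Python functions standing in for network calls.

```python
class ZoneUnavailable(Exception):
    """Hypothetical error raised when a replica in a zone cannot be reached."""


def call_with_failover(replicas, request):
    """Try each (zone, handler) pair in order and return the first success."""
    failures = {}
    for zone, handler in replicas:
        try:
            return zone, handler(request)
        except ZoneUnavailable as exc:
            failures[zone] = str(exc)   # remember why this zone was skipped
    raise RuntimeError(f"all zones failed: {failures}")


def zone_a(request):
    # Simulate a zone that is cut off from the caller.
    raise ZoneUnavailable("network partition")


def zone_b(request):
    return f"ok:{request}"


served_by, response = call_with_failover(
    [("zone-a", zone_a), ("zone-b", zone_b)], "ping"
)
print(served_by, response)  # zone-b ok:ping
```

The caller makes no assumption about which zone is healthy. A real deployment would typically add timeouts and health checks, and would leave the choice of replica to a load balancer or orchestration tool rather than to application code.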
Chaos Monkey: the best way to avoid failure is to fail constantly. From an early stage, Netflix has used Chaos Monkey, a piece of software that randomly kills off services and features in Netflix’s systems in order to assess how well recovery works. Initially it was used only in testing, but it is now run randomly in production. Quoting from Netflix’s blog: “If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most—in the event of an unexpected outage.”
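The idea can be tried on a small scale. The following toy sketch is not Netflix's Chaos Monkey and does not use its API. It is a hypothetical in-memory simulation in which `Instance`, `Pool`, and `unleash_monkey` are names invented for the example. It repeatedly kills a random instance, checks that requests are still served, and then lets a simple healing step restore capacity.

```python
import random


class Instance:
    def __init__(self, name):
        self.name = name
        self.alive = True


class Pool:
    """A toy service pool that replaces dead instances when healed."""

    def __init__(self, names, min_healthy):
        self.instances = [Instance(n) for n in names]
        self.min_healthy = min_healthy
        self._spawned = 0

    def healthy(self):
        return [i for i in self.instances if i.alive]

    def heal(self):
        # Drop dead instances and start replacements up to the minimum.
        self.instances = self.healthy()
        while len(self.instances) < self.min_healthy:
            self._spawned += 1
            self.instances.append(Instance(f"replacement-{self._spawned}"))

    def handle(self, request):
        live = self.healthy()
        if not live:
            raise RuntimeError("no healthy instances")
        return f"{live[0].name} handled {request}"


def unleash_monkey(pool, rng):
    """Kill one randomly chosen healthy instance and return its name."""
    victim = rng.choice(pool.healthy())
    victim.alive = False
    return victim.name


rng = random.Random(42)          # seeded so runs are repeatable
pool = Pool(["a", "b", "c"], min_healthy=3)
served = 0
for round_number in range(10):
    unleash_monkey(pool, rng)
    pool.handle(f"request-{round_number}")   # must succeed despite the kill
    served += 1
    pool.heal()

print(served, len(pool.healthy()))  # 10 3
```

The useful property is that the failure is injected on every round rather than waited for, so a regression in the recovery path shows up immediately instead of during a real outage.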