High Availability

In OpenStack, the infrastructure is integral to providing services and should always be available, especially when operating under SLAs. Network availability is achieved by designing the network architecture so that no single point of failure exists. Factor the number of switches, the routes between them, and power redundancy into the core infrastructure, and bond network connections so that each host has diverse routes to your highly available switch infrastructure.
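One way to check a design for single points of failure is to model switches, hosts and cables as an undirected graph and test whether removing any one node or link disconnects it. The sketch below is a brute-force check under that assumption. The helper names (`is_connected`, `single_points_of_failure`) and the topology (`core1`, `tor1`, `compute1`, `controller1`) are hypothetical illustrations, not part of OpenStack, and the model ignores power feeds and link capacity.

```python
from collections import deque


def is_connected(nodes, links, removed_node=None, removed_link=None):
    """Breadth-first search: can every surviving node still reach every other one?"""
    alive = [n for n in nodes if n != removed_node]
    if len(alive) <= 1:
        return True
    adjacency = {n: set() for n in alive}
    for link in links:
        a, b = link
        # Skip the link that is simulated as failed, and any link touching the failed node.
        if link == removed_link or removed_node in (a, b):
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(alive)


def single_points_of_failure(nodes, links):
    """Return the nodes and links whose individual failure partitions the network."""
    bad_nodes = [n for n in nodes if not is_connected(nodes, links, removed_node=n)]
    bad_links = [l for l in links if not is_connected(nodes, links, removed_link=l)]
    return bad_nodes, bad_links


# Hypothetical topology: two core switches, two top-of-rack (ToR) switches,
# a compute host bonded to both ToRs, and a controller cabled to only one ToR.
NODES = ["core1", "core2", "tor1", "tor2", "compute1", "controller1"]
LINKS = [
    ("core1", "core2"), ("core1", "tor1"), ("core1", "tor2"),
    ("core2", "tor1"), ("core2", "tor2"),
    ("compute1", "tor1"), ("compute1", "tor2"),
    ("controller1", "tor1"),
]

if __name__ == "__main__":
    print(single_points_of_failure(NODES, LINKS))
```

In this example the bonded `compute1` survives the loss of either ToR switch, while `controller1` depends on `tor1` and on its single cable, so both are reported. Adding a second link from `controller1` to `tor2` removes them from the result.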

High availability systems seek to minimize the following issues:

  1. System downtime: Occurs when a user-facing service is unavailable beyond a specified maximum amount of time.

  2. Data loss: Accidental deletion or destruction of data.

Most high availability systems guarantee protection against system downtime and data loss only in the event of a single failure. However, they are also expected to protect against cascading failures, where a single failure deteriorates into a series of consequential failures. Many service providers guarantee a Service Level Agreement (SLA) that includes an uptime percentage for the computing service, calculated from the available time and the system downtime, excluding planned outage time.
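The uptime calculation can be sketched as follows, assuming planned outages are subtracted from the period before the percentage is taken, so that only unplanned downtime counts against the SLA. Real SLAs define their measurement windows and exclusions differently, so treat the formula, the 30-day billing period, and the helper names (`uptime_percentage`, `downtime_budget`) as illustrative assumptions rather than any provider's actual terms.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43200; an assumed billing period


def uptime_percentage(period_minutes, unplanned_downtime_minutes, planned_outage_minutes=0.0):
    """Uptime as a percentage of the time the service was expected to be up."""
    # Planned outages are excluded from the time the service must be available.
    available = period_minutes - planned_outage_minutes
    if available <= 0:
        raise ValueError("planned outages cover the whole period")
    return 100.0 * (available - unplanned_downtime_minutes) / available


def downtime_budget(period_minutes, target_percent, planned_outage_minutes=0.0):
    """Maximum unplanned downtime, in minutes, that still meets the target uptime."""
    available = period_minutes - planned_outage_minutes
    return available * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    # 30 minutes of unplanned downtime plus a 120-minute planned maintenance window.
    print(f"{uptime_percentage(MINUTES_PER_30_DAY_MONTH, 30, 120):.3f}")  # 99.930
    # A 99.9% target allows roughly 43 minutes of unplanned downtime per 30 days.
    print(f"{downtime_budget(MINUTES_PER_30_DAY_MONTH, 99.9):.1f}")  # 43.2
```

Working backwards from a target with `downtime_budget` is often more useful than computing uptime after the fact, because it tells you how much time failover and recovery are allowed to take.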

Partially extracted from

OpenStack Arch-design

OpenStack HA Guide