Ensuring High Availability and Resiliency in OpenStack for NFV

By Ali Kafel May 05, 2015

This post gives a brief overview of how to transparently add High Availability and Resiliency to any application running in a KVM/OpenStack environment, without the extra work and complexity of changing the application itself. I will also list the benefits of this approach versus building fault tolerance into the application or using a hardware-based approach. Future posts will go into more detail on each of these benefits. Let’s get started.

For telcos, High Availability and Network Resiliency are non-negotiable. The reputation of a telco rests on its ability to reliably connect calls and transfer data. If the network goes down, even just for a few seconds, millions of people can be affected. System failure not only results in loss of revenue for the operator, it can seriously damage its reputation and, in some cases, lives could be at stake – if a call to the emergency services doesn’t go through, the caller may not have a second chance.

For operators, the question is not whether network components will fail (they will), it’s what happens in the event of a failure. Can the network tolerate a fault and continue to deliver service as if nothing happened? Or will a network fault result in loss of service for a few seconds (or longer) while the failed service is restarted?

Although the terms are sometimes used interchangeably, there is a difference between networks that are Highly Available and those that are Resilient. A Highly Available (HA) network is one that ensures that the network and its services are accessible almost always. Five nines (99.999%) availability is the minimum benchmark, meaning that the network is down for no more than roughly five minutes in total over a one-year period. Resiliency is the ability to recover quickly from a fault or failure and return to the original form and state, as it was just before the failure. A traditional HA system may restore service without being able to return to that original state; in that case it is stateless.
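To make the five-nines figure concrete, here is a minimal sketch of the arithmetic. The function name and the choice of a 365.25-day average year are my own assumptions for illustration; with them, 99.999% availability works out to about 5.26 minutes of downtime per year.

```python
# Assumption: an "average" year of 365.25 days, to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare three, four, and five nines side by side.
    for level in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(level)
        print(f"{level}% availability -> {minutes:.2f} minutes of downtime per year")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why the step from four nines to five is so demanding.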

If a stateless HA system fails, the application has to restart the sequence of interactions with the user. That is not acceptable for mission-critical telecom systems, such as a Home Subscriber Server (HSS) application, which keeps track of all mobile users’ locations, or a firewall/DDoS mitigation solution that protects a network.

With a stateful Fault Tolerant system, both HA and Resiliency are provided. This means that when a stateful application fails, it transparently continues to run on the secondary server with the same state as on the primary, i.e., state is not lost. Therefore, stateful Fault Tolerance = HA + Resiliency. See Figure 1.
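The difference between stateless HA and stateful fault tolerance can be illustrated with a toy model. This is only a sketch: the class names are hypothetical, the "state" is a simple per-user step counter, and copying state after every request is my simplification. It is not how any particular product replicates state.

```python
import copy


class Session:
    """Toy application state: how many dialogue steps each user has completed."""

    def __init__(self):
        self.steps_completed = {}

    def advance(self, user):
        self.steps_completed[user] = self.steps_completed.get(user, 0) + 1
        return self.steps_completed[user]


class StatelessPair:
    """HA only: service resumes after a failure, but from a blank state."""

    def __init__(self):
        self.active = Session()

    def handle(self, user):
        return self.active.advance(user)

    def fail_over(self):
        # The replacement instance knows nothing about earlier interactions.
        self.active = Session()


class StatefulPair:
    """HA plus resiliency: the standby always mirrors the active instance."""

    def __init__(self):
        self.active = Session()
        self.standby = Session()

    def handle(self, user):
        result = self.active.advance(user)
        # Simplification: checkpoint the full state after every request.
        self.standby = copy.deepcopy(self.active)
        return result

    def fail_over(self):
        # The standby takes over with the state it already holds.
        self.active, self.standby = self.standby, Session()


if __name__ == "__main__":
    for pair in (StatelessPair(), StatefulPair()):
        pair.handle("alice")
        pair.handle("alice")
        pair.fail_over()
        print(type(pair).__name__, "next step for alice:", pair.handle("alice"))
```

In the stateless pair the user is sent back to step 1 after the failure, while the stateful pair carries on at step 3, which is the behaviour a mission-critical telecom application needs.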

Today, Telcos are looking to use OpenStack for NFV. While OpenStack supports basic levels of availability, it does not support service continuity and stateful fault tolerance. This is a major concern for the Telcos. By inserting a Virtualized Cloud Resilience Layer that transparently brings Telco reliability to any cloud application (Figure 2), Telcos can use OpenStack with the confidence that any application can be deployed with selectable levels of availability, including stateful fault tolerance with geo-redundancy, without requiring code changes.

One of the main benefits of this approach is that it enables Telcos to take any application, such as a traditional enterprise-grade firewall that may not be redundant, and deploy it on this software-based fault tolerant NFV Infrastructure (NFVI) with immediate and simplified Fault Tolerance, without the need for any code development. This gives them tremendous agility in their partner/vendor ecosystem because it immediately opens up VNFs that they otherwise would not have considered!

There are plenty of other benefits to this Software Infrastructure approach compared with using a Hardware Fault Tolerant system or putting code in each application to make it fault tolerant. Figure 3 summarizes the pros and cons of each approach.

In subsequent posts, I will go into further detail on the other benefits of the Software Infrastructure approach. Stay tuned!

About the Author: Ali Kafel is the Senior Director of NFV Business Development for Stratus Technologies and a contributor to NFVZone. Follow him on Twitter @akafel for more thoughts on this and similar topics.

Edited by Dominick Sorrentino

Senior Director and Head of Telecom Business, Stratus Technologies
