Like most people, I was horrified by the Texas power grid’s recent failure and the devastating impact on many citizens’ lives. I think many people wondered how something like this could happen. As someone in the telecommunications industry, it occurred to me that the power and telecommunications sectors face similar planning issues.
On the surface, power grids and telecommunications networks might not seem to have much in common, but the parallels run deep. Both are fundamentally infrastructures that we as a society increasingly rely upon. Both experience fairly predictable demand behaviors, whether it’s everyone turning on their air conditioning during a sweltering summer day or everyone calling home on Mother’s Day. These events are predictable because history tells us that past usage trends are reasonable predictors of future consumption.
Both types of infrastructure are also subject to anomalous outages from technological or natural forces. We know these atypical events will eventually happen because probability tells us they are likely over a long enough horizon; still, their exact size, scope, and scale are difficult to predict with precision. Infrastructure planners for both services spend a great deal of time worrying about these kinds of reliability, redundancy, and overall capacity issues.
Another area of similarity is the cost of redundancy. Spare capacity cannot be instantaneously created out of thin air, and this extra capacity incurs a cost in any infrastructure. Unfortunately, there is a tendency to question that expense on “sunny days,” when the spare capacity sits idle. It is not until we encounter the “rainy day” scenario that we appreciate the rationale behind it.
Planning for Failure
Returning to the realm of telecommunications and network planning, let’s look at how an operator typically plans for their future capacity needs. Most operators rely on premises-based servers (both virtualized and non-virtualized) to handle their infrastructure workloads. Let’s look at an elementary network capacity planning model.
As a rule of thumb, a single server should never run at higher than 80 percent of its full capacity. To understand why, think about your desktop or laptop: as your computer’s memory and processing utilization approach 100 percent, performance degrades sharply.
So, you might think that an operator only needs to purchase an extra 20 percent of capacity beyond what they need. Not so fast. In a typical active/standby failover, the redundancy model requires that, for every primary server, there is some reserve capacity that can take on the workload in the event of failure. In the illustration below, I show the example of a single pair of redundant servers, Server A and Server B. Each server runs at no more than 40 percent during regular operation, because if Server A should fail, Server B must be able to absorb Server A’s workload and still stay under the 80 percent threshold.
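The headroom arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming the 1+1 active/standby pair and the 80 percent ceiling from the text; the function name and the generalization to larger pools are my own.

```python
# Steady-state utilization target for an N-server redundant pool,
# using the 80 percent per-server ceiling described in the text.
MAX_UTILIZATION = 0.80

def steady_state_target(pool_size: int = 2) -> float:
    """Utilization each server can run at in normal operation so that,
    if one server fails, the survivors absorb its load and still stay
    under MAX_UTILIZATION."""
    survivors = pool_size - 1
    # The survivors can carry at most MAX_UTILIZATION each; spread that
    # total capacity back across all pool_size servers pre-failure.
    return MAX_UTILIZATION * survivors / pool_size

print(steady_state_target(2))  # 0.4 -> the 40 percent figure above
```

Note that larger pools waste less headroom: a 1+1 pair idles half its ceiling, while a four-server pool could run each server at 60 percent under the same rule.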
The need for redundancy complicates network capacity planning. For example, if an operator plans for ten percent month-over-month growth, they need to more than triple their capacity year-over-year, since the growth compounds. Unfortunately, most operators can’t merely add extra capacity a month in advance because of the time required to purchase and install new hardware. There are multiple steps involved—budget approval, vendor quoting, supply chain processes, shipping times, installation and cabling, etc.—that can take anywhere from six to twelve months to complete. In other words, operators realistically need to begin this process as much as a year in advance.
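A quick sanity check on how monthly growth compounds over a planning year. The ten percent month-over-month figure is from the text; the rest is illustrative arithmetic.

```python
# Compounded capacity growth over a 12-month planning horizon.
monthly_growth = 0.10  # 10% month-over-month, per the text

yearly_factor = (1 + monthly_growth) ** 12
print(f"capacity needed after 12 months: {yearly_factor:.2f}x today's")
```

Ten percent a month works out to roughly a 3.1× capacity requirement after a year, which is why a six-to-twelve-month hardware lead time forces planners to order so far ahead.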
Generally speaking, most planners project their capacity from the end of the current fiscal year to the end of the next fiscal year and order the necessary amount of capacity at the beginning of the fiscal year. Pre-planning in this way avoids being caught “short-served” at the end of the year, but at a cost: for most of the year, operators end up sitting on idle capacity, particularly in the first six to nine months of the year.
Advantages of Capacity Planning and the Cloud
But what if operators could expand their network capacity using the cloud? Spinning up workloads in the cloud takes a fraction of the time, meaning that telcos could add capacity as they needed it, not a year before they needed it. This approach to network capacity management saves the operator from paying for unused server capacity, along with the associated power, real estate, and remaining operational costs required to maintain those unused servers.
There are several other advantages to moving operator infrastructure into the cloud. For example, consider that operator networks face not only seasonal spikes in usage but also daily spikes. The chart below illustrates a typical day in the life of a network. Notice there can be a significant difference in the resource requirements throughout the day.
Only Pay for Resources You Use
Even with virtualized servers, operators still need to plan for enough physical infrastructure capacity to cover the peak busy hour. In a genuinely cloud-native software architecture, resources can be dynamically allocated during the busiest periods of the day and released when they are no longer needed. With cloud-based infrastructure, operators pay only for the resources they actually use, never for idle capacity. And because many workloads from many customers are statistically multiplexed on the cloud, operators can dynamically spin up additional resources as required.
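The difference between provisioning for the busy hour and paying per hour of actual use can be made concrete with a toy daily load curve. The curve shape and the unit cost below are made-up illustrative numbers, not operator data.

```python
# Hourly demand in "server units" over a typical day: quiet overnight,
# rising through the morning, peaking in the evening busy hour.
# These values are assumptions for illustration only.
hourly_demand = [2, 2, 2, 2, 3, 4, 6, 8, 9, 9, 8, 8,
                 8, 8, 9, 10, 12, 14, 15, 14, 10, 6, 4, 3]

COST_PER_UNIT_HOUR = 1.0  # arbitrary unit cost

# On-prem style: provision for the daily peak and pay for it all 24 hours.
peak_cost = max(hourly_demand) * len(hourly_demand) * COST_PER_UNIT_HOUR

# Pay-per-use style: pay only for what each hour actually consumes.
usage_cost = sum(hourly_demand) * COST_PER_UNIT_HOUR

print(f"peak-provisioned: {peak_cost:.0f}  pay-per-use: {usage_cost:.0f}")
print(f"savings from paying per use: {1 - usage_cost / peak_cost:.0%}")
```

With this particular (hypothetical) load curve, paying per use costs roughly half of what round-the-clock peak provisioning does; the flatter an operator’s real load curve, the smaller that gap becomes.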
Visibility of Network Cost
Cloud-based infrastructure also provides better visibility into actual network costs. Operator costs resemble an iceberg: often only the physical costs (i.e., the hardware) are visible above the waterline, while the real costs of running that infrastructure—operations, power, real estate, etc.—remain partially obscured. In the cloud, by contrast, the combined CapEx and OpEx costs are visible as a single total monthly cost.
New Opportunities for Service Expansion
Finally, the cloud opens up new opportunities for service expansion by eliminating most of the network’s upfront costs. Today, new service rollouts require much planning and a compelling business case because of the substantial sunk costs involved. By using cloud infrastructure, operators can dramatically reduce those costs and “dip their toes” into new services and new markets without having to make large investments while retaining the ability to scale up network capacity quickly if those services take off.
Cloud, and Especially Cloud-Native, is the Future
What kind of cost savings can be achieved? Early models show that operators can save 30 percent or more by moving from an on-prem infrastructure model to the cloud. And the total cost of ownership (TCO) gets even more attractive when using cloud-native rather than traditional virtualized cloud infrastructure: our research shows that a cloud-native infrastructure can reduce TCO by another 25 percent.
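Note that the two savings figures compound rather than add. A quick check, using the 30 percent and 25 percent numbers from the text against a normalized baseline:

```python
# Compounding the two TCO reductions cited in the text.
onprem_tco = 100.0                          # normalized on-prem baseline
cloud_tco = onprem_tco * (1 - 0.30)         # 30% saved moving to cloud
cloud_native_tco = cloud_tco * (1 - 0.25)   # a further 25% off that

print(cloud_native_tco)  # 52.5 -> about 47.5% total savings vs. on-prem
```

In other words, the combined effect is roughly a 47.5 percent reduction versus on-prem, not the 55 percent you would get by simply adding the two percentages.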
As attractive as the cloud is, most operators aren’t ready yet to move their entire network infrastructure into the cloud, which is understandable. A hybrid model that mixes on-prem infrastructure with cloud-based infrastructure allows operators to expand network capacity in the cloud on-demand, a technique known as cloudbursting. By doing this, operators can make sure they always have enough capacity to handle whatever nature or the future throws at them.