How to Build a Highly Available WAN
Eight best practices to achieve 99.99% network uptime
When it comes to network uptime, the devil is in the details. We’re talking about each of these eight best practices potentially preventing one or two outages, thereby delivering thousands or even millions of dollars in savings to your business.
To give you a better idea of the savings at stake with a goal of 99.99% uptime, here’s a more granular look at how each uptime level translates into downtime per month:
- 99.9% uptime translates to 43 minutes of downtime per month
- 99.99% uptime translates to 4 minutes of downtime per month
- 99.999% uptime translates to 26 seconds of downtime per month
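The list above follows directly from the uptime percentage. Here is a minimal sketch in Python, assuming an average month of one twelfth of a 365.25-day year; the function name is ours, for illustration only:

```python
# Assumption: an average year of 365.25 days, split into 12 equal months.
HOURS_PER_YEAR = 365.25 * 24
MINUTES_PER_MONTH = HOURS_PER_YEAR * 60 / 12


def downtime_minutes_per_month(uptime_percent: float) -> float:
    """Return the minutes of downtime per average month at a given uptime."""
    return MINUTES_PER_MONTH * (1 - uptime_percent / 100)


for uptime in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_month(uptime)
    print(f"{uptime}% uptime: {minutes:.2f} minutes ({minutes * 60:.0f} seconds) per month")
```

A different month length (30 days, for example) shifts the results slightly, which is why published downtime tables often differ by a minute or a few seconds.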
Most companies operate at 99.9% or below. But even a small amount of network downtime can greatly dampen employee productivity. And since productivity is directly tied to your company’s revenue stream, shrinking downtime as much as possible should be a priority.
Bottom line: When network downtime occurs, employees aren’t as productive and the company loses revenue. That loss is what we are trying to remove from the equation.
For the sake of quantifying the cost of network downtime, let’s pretend your company is operating at 99.9% uptime and has 500 employees. That’s pretty good, right?
Well, consider that this uptime rate means each employee is losing about 43 minutes of productivity every month, which adds up to 8 hours and 46 minutes annually. Multiplied across all 500 employees, that’s 4,383 employee hours lost annually due to outages.
If your company earns $100 million annually, that means each employee, on average, contributes $105 per hour to the total. We’ll assume there are eight working hours in a day, so network downtime only impacts employee productivity about 1/3 of the time. (This is assuming that the network issue is fixed if it occurs in the middle of the night, which is unusual.) So, for the sake of argument, we’ll be conservative and divide by 3.
Using these numbers, network downtime is estimated to cost your company $153,405 per year in lost employee productivity, even though you’re already operating at 99.9% uptime.
If you only operate at 99% uptime, your annual downtime would add up to 3 days, 15 hours and 36 minutes. The revenue lost from non-productive employees would add up to $1,533,000. The lower your uptime percentage is, the more compelling these numbers get.
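To run the same estimate with your own numbers, here is a rough sketch of the arithmetic above. The function and parameter names are ours, and the defaults simply mirror the example (500 employees, $105 per employee hour, a 1/3 impact factor). Note that the 99.9% figure corresponds to an 8,766-hour year and the 99% figure to an 8,760-hour year, so the year length is left as a parameter:

```python
def annual_downtime_cost(
    uptime_percent: float,
    employees: int = 500,
    revenue_per_employee_hour: float = 105.0,
    impact_fraction: float = 1 / 3,  # share of downtime that falls in working hours
    hours_per_year: float = 8766.0,  # assumption: 365.25-day year
) -> float:
    """Estimate yearly revenue lost to network downtime."""
    downtime_hours = hours_per_year * (1 - uptime_percent / 100)
    lost_employee_hours = downtime_hours * employees
    return lost_employee_hours * revenue_per_employee_hour * impact_fraction


print(f"99.9% uptime: ${annual_downtime_cost(99.9):,.0f}")
print(f"99% uptime:   ${annual_downtime_cost(99.0, hours_per_year=8760):,.0f}")
```

This is only as good as its inputs; the impact factor in particular is a judgment call, and your own revenue per employee hour may be very different.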
As this example shows, network downtime can cost companies millions of dollars. If we can erase a few hundred hours of employee productivity downtime, we’ve just generated more revenue for your company.
Following the best practices outlined in this whitepaper will help minimize network downtime and contribute to your company’s revenue; there’s no question about it. The fundamentals below, provided by Wired Networks’ networking experts, can help increase your network uptime to at least the 99.99% level.
1. Design the network to be highly available (99.99% uptime)
This may seem obvious, but it’s not so easy to do. If you’re still reading at this point, we’ll assume you understand the importance of your network. It’s the foundation you’re going to build upon. Your overall business revenue depends on your network being available day and night, and that’s only becoming more important as your company (and network) grows.
What “high availability” really means is eradicating as many single points of failure as possible, in both equipment and circuits. Two circuits in each location on diverse last mile routes are a must. Protecting against the loss of last mile service is the single most important thing a company can do to overcome the majority of network issues.
Building failover and failback between the primary and backup service will eliminate 80% of your network issues. This is essential. Rather than looking at failover solutions as an extra expense, consider them a return on your most important investment: your employees. Your employees are more productive when the network is up and operational, which means the company is generating more revenue.
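To see why a second circuit matters so much, consider the standard availability formula for redundant paths: the site is only down when every circuit is down at once. The sketch below assumes the circuits fail independently (which is exactly what diverse last mile routes aim for) and that failover is instantaneous. The function name and the circuit availabilities are hypothetical, not measured values:

```python
def combined_availability(*availabilities: float) -> float:
    """Availability of redundant circuits, assuming independent failures."""
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1 - availability  # probability that this circuit is also down
    return 1 - all_down


primary = 0.999  # hypothetical primary MPLS circuit (99.9%)
backup = 0.99    # hypothetical secondary VPN circuit (99%)
print(f"Combined availability: {combined_availability(primary, backup):.5%}")
```

In practice, shared conduits, shared power, and failover delays all reduce the real-world figure, so treat the result as an upper bound.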
Below is an illustration of a very typical design for high availability. We have coined this solution “Cloud Assurance.”
The key takeaway is that several “boxes” need to be checked in order to achieve high availability. In this case, we’ve categorized these needs as follows:
- Point-to-point connection between data center and corporate
- Redundant hardware at remote and corporate
- Primary MPLS
- Secondary VPN
- Integration with corporate network
- Smart UPS for remote power cycling of network equipment
- NOC monitoring of network and hardware 24/7/365
2. Inventory Management
Nothing left behind
This step isn’t fun, but every last detail of your network must be documented in order to have a highly available WAN. All equipment, circuits, phone numbers, and applications that will run over your network need to be inventoried, along with any known anomalies or current issues.
3. Circuit Type
Not all circuits are created equal
For understandable reasons, a great deal of emphasis tends to be placed on a circuit’s bandwidth. But when you’re trying to achieve a high level of network availability, there are other important things to consider. Cable, DSL, shared/dedicated Ethernet, WiFi, dedicated fiber, wavelengths: each of these circuit types has a considerably different Mean Time to Repair (MTTR) under its service level agreement (SLA).
MTTR for cable and DSL is best effort in most cases. Business-class cable can come with an MTTR commitment, but it’s generally the same business day. DSL or cable service is probably sufficient for a small office/home office (SOHO), but a twenty-five employee office will need a stronger SLA and a shorter MTTR.
The bottom line is to align your primary and backup service with the solution that is most cost effective and has the highest probability of supporting the business. These are the details that must be explored and discussed. In many cases, the provider can and will negotiate them.
For example, if you’re ordering dedicated fiber and the SLA needs to be added or changed, expect to negotiate: the carrier typically won’t offer those terms up front. In general, dedicated fiber offers guaranteed low latency, low jitter, and high throughput, but occasionally an issue will arise. When it does, and an outage occurs, the last thing you want to hear your carrier say is, “We know you’re down, but we’re already meeting our SLA of X number of hours per month and we don’t guarantee throughput, so we can’t help you.”
We’ve seen this kind of scenario catch more than one IT department totally off guard. We recommend protecting yourself by gathering as much detail as possible about the circuit and what you’re getting, so you can make sure it aligns with your needs. Even if the carrier representative says you’re covered by the SLAs, you should still take a second look.
It never hurts to ask for help
We suggest you seek the counsel of an attorney who can help negotiate the telecom carrier agreement and lend your company credibility. We find that contracts are almost always easier to negotiate to the business’s benefit when the business works through an attorney. When you’re negotiating an agreement to provide for high availability, an experienced attorney will undoubtedly save you time and money in the long run.
However, it’s worth noting that some attorneys fail to understand that SLAs and termination liability should be considered when building a network for high availability (as mentioned above). You’ll want to be sure to cover yourself and your business by not only making sure the service works as advertised, but also that you have a means to fix the service if it isn’t working as advertised. Include a clause that allows you to walk away from the contract without penalties if a problem doesn’t get fixed in a set amount of time. This agreement should be very cut and dried.
In the case of (x), the cure period is (y), which is a negotiable point, and the consequence is (z), another negotiable point.
Examples of x, y and z are as follows:
- (x) is when the service isn’t meeting negotiated SLAs
- (y) is a period of 10 days
- (z) is that the customer is able to cancel the agreement
These types of contract clauses are far more aggressive than what the provider will offer, and they work to your benefit. And should you ever need to exercise this clause, you’ll be glad you added it.
The next two recommendations are not exclusively related to network uptime, but they’re certainly pertinent during all contract negotiations, especially if you have network or support issues of any kind.
Firstly, over the years we’ve found value in identifying all of the areas where you’re legally committing services to the carrier. We have seen up to four different areas where a customer is contractually committed to keeping their network with the carrier. It’s not just about the overall contract commitment anymore. Specifically, make sure your minimum commitment is no longer than a 1-year term for both ports and loops. Also, make sure that the addition of a new service doesn’t cause the committed term to increase.
Secondly, contract term auto-renewals are evil. Avoid them at all costs and don’t let the carrier commit you to anything more than the initial term. You can accomplish this by simply asking for a month-to-month renewal term; in some cases, you’ll need to understand and agree that the rates may change. Most of the time, though, a carrier won’t increase your rates because they want to retain your business. Start the dialogue about a month-to-month term at any time, regardless of your intention to renew.
5. Insist your technology service provider agrees to provide and follow a project plan
This tip is an absolute must to follow. Make sure all parties are on the same sheet of music when it comes to how the project will flow. The project plan identifies responsibilities and allows all parties to be held accountable for their contribution to the project. Keeping your provider accountable will undoubtedly lead to less confusion, which should mitigate or remove some downtime issues from the equation.
6. Testing, Testing, and More Testing
Just keep testing
Prior to cutting a network over, especially in a converged voice and data environment, it’s imperative that you test configurations before the turn-up date so as to avoid unwanted surprises. There are many different human touch points involved in the provisioning of a network circuit, and we humans are most certainly fallible. Hardware, software, and application configurations need to be tested and emulated so there are no surprises when making a cut.
Even after two or more rounds of testing equipment configurations and applications, we suggest hiring a tier 3 or tier 4 engineer for the first three location turn-ups. Unexpected problems can and will occur because all that time in the lab still doesn’t compensate for the real world.
7. Outsource NOC to monitor equipment and circuits after-hours
Network outages don’t take weekends or vacations
We know you and your IT department are very intelligent and can troubleshoot/repair almost any circumstance. We also know that your IT group is convinced no one can do a better job. And we completely agree!
The trouble is that IT personnel have a life outside of work, but troublesome repair issues can arise at any time of the day, night, or during holidays and vacations. When this occurs, telecom carriers typically require the customer to report the issue by making a phone call and identifying the problem. The business must also verify that power and equipment are available.
All of this must be done before a carrier will even address a circuit outage.
Unfortunately, if an outage occurs while your IT person is at lunch or out of the office, that could instantly increase your network downtime by another 15-30 minutes. If your IT person is on vacation, maybe a couple of hours. Or if your IT person had a big night out on their birthday and an outage occurs, perhaps several hours.
Our point is that placing the burden on IT to open tickets with the underlying provider creates undue stress as well as additional lost productivity and revenue. This potential issue can be circumvented by outsourcing to a service provider that watches your network 24/7/365 and automatically opens tickets on your behalf within five minutes of an outage. This can help take your network up to 99.99% uptime.
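To make the idea concrete, here is a minimal sketch of the decision a monitoring service might make before opening a ticket. It is not a description of any particular NOC’s tooling: the names, the one-minute probe interval, and the two-minute confirmation window are all assumptions, chosen so that a ticket is opened well inside the five-minute target while ignoring one-off blips:

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: confirm an outage after two minutes of failed probes, which
# leaves headroom inside the five-minute ticket target described above.
CONFIRM_SECONDS = 120


@dataclass
class Probe:
    timestamp: float  # seconds since monitoring started
    reachable: bool   # True if the site answered this probe


def ticket_open_time(probes: Iterable[Probe],
                     confirm_seconds: float = CONFIRM_SECONDS) -> Optional[float]:
    """Return the timestamp at which a ticket should be opened, or None.

    A ticket is opened once the site has been unreachable continuously for
    confirm_seconds; a single successful probe resets the clock.
    """
    outage_start: Optional[float] = None
    for probe in probes:
        if probe.reachable:
            outage_start = None  # site recovered, so forget the outage
            continue
        if outage_start is None:
            outage_start = probe.timestamp  # first failed probe of this outage
        if probe.timestamp - outage_start >= confirm_seconds:
            return probe.timestamp
    return None


# Hypothetical probe history: one probe per minute, outage begins at t=60.
history = [Probe(0, True), Probe(60, False), Probe(120, False), Probe(180, False)]
print(ticket_open_time(history))
```

A real service would also verify power and equipment status before contacting the carrier, since carriers typically ask for that information first.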
8. Create a business continuity plan for phones and applications in the event of a core meltdown
Sure, we’d like to tell you that you’ll never have a problem. But even 99.99% uptime allows for roughly 53 minutes of downtime annually. There’s no guarantee that Mother Nature or some other unforeseen event won’t intervene and throw all of the above completely out the window. In those scenarios, a well-established plan is an absolute necessity and can mean the difference between losing a small amount of business versus taking a catastrophic revenue hit.
Develop a comprehensive disaster recovery (DR) and business continuity plan just in case you lose your corporate location. Where will your team work? How will the displaced team access your applications? How will they take phone calls and communicate with the rest of the company? These and other important questions should be answered as a precaution that you’ll hopefully never need.
We hope you learned something from these eight best practices. You don’t have to implement all of these steps immediately. But if you want to ensure maximum network uptime, then there’s no question that this list is a great start to achieving high availability, no matter what size company you’re operating.
How do we know these steps work? Because this exact list has provided our clients with 99.99% uptime time and time again. If you have any questions or would like more information about our SD-WAN solutions, feel free to contact us.
At Wired Networks, we build highly available WANs so our customers can fully leverage their cloud applications. Let us do what we do best so that you can do what you do best!
Talk to an IT expert today. Contact Wired Networks to see how we can help.
About Wired Networks
Enable IP is a technology solutions provider founded by Wired Networks founder Jeremy Kerth and Wired Networks’ head engineer Steve Roos, after they realized there was a deep market need for helping mid-size businesses establish better uptime rates for their Wide Area Networks. Armed with the best-in-class carriers and partners, Jeremy and Steve set out with a bold plan: Guarantee better uptime rates than the industry standard of only 99.5%.
Their bold plan became a reality: Enable IP’s solutions guarantee clients 99.99%, even 99.999%, network uptime. But they don’t stop there. Many telecom providers promise high availability network solutions but fail to deliver, because they are in the business of providing services, not solutions.
That’s the Enable IP difference: We deliver highly available networks by providing a complete system, called Cloud Assurance, that ensures our 99.99% or above uptime guarantee.
We deliver this bold promise by:
- Owning the entire customer experience. From pricing, contracting, ordering and provisioning to installing, servicing and billing, we do it all! This means no stressful negotiations, confusing setups, or finger pointing if something goes wrong. We actually deliver on our promise.
- Managing the entire system. We monitor and manage issues as they occur so you can focus on your business, not your network.