Architecture Case Study: Complex Networking with Hybrid Multi-Cloud

Today, we are going to address a hospital's networking and availability problems. It is renowned as one of the best hospitals in the world. The hospital is based in Massachusetts, serves a regional audience, and has about 5,000 employees. Of those employees, about 2,000 are remote doctors who access the hospital's data centers via a Virtual Private Network (VPN).

The hospital is connected to a data center in Massachusetts, which is privately connected to 250 offices within 100 miles; a data center in California, which is privately connected to 200 offices within 100 miles; and a data center in New York, which is privately connected to 100 offices within 250 miles. These data centers must be highly available at all times because patients come in with life-threatening injuries. The availability requirement for this hospital is 99.9999%. If doctors cannot access their patients' data in their data centers, patients can die, which is why meeting the availability requirement is so important. Doctors may also need to access patient data from the other data centers, and they often complain of latency issues.
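
To make the 99.9999% figure concrete, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name and the non-leap-year assumption are mine, not part of the hospital's requirements.

```python
# Convert an availability target into the downtime it allows per year.
# Assumption: a 365-day (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(availability_percent: float) -> float:
    """Return the maximum downtime per year, in seconds, for a given target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999, 99.9999):
        print(f"{target}% -> {downtime_budget_seconds(target):,.1f} seconds/year")
```

At 99.9999%, the budget works out to roughly 31.5 seconds of downtime per year, which is why every layer of the design below has to be redundant.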

This organization runs its supply chain software in New York on 50 AMD Apex servers (128 cores and 4 TB of RAM each, with storage in RAID 0) at 70% capacity, and it does not want to refactor the software. It also runs its website, application, and database servers on 1,200 AMD Apex servers (128 cores and 4 TB of RAM each), running 24 hours a day at 80% capacity and split between its two data centers in Massachusetts and California. The organization has a proprietary application that requires sensitive load sharing based on unique URL paths. The database is hosted on a single database server that stores patient data in a structured format.

The company cannot tolerate a breach of its system. This organization is a $60 billion business with 12% yearly growth, which can rise to 22% with an optimized supply chain and improved data center performance. The company wants an architecture that will improve its business performance and agility. They don't want to depend on a single cloud provider in case of sudden price changes or major cloud outages.

Company Present Architecture

Each data center is connected to the other data centers via a 10 gigabit private link. The routing protocol used between these data centers is Open Shortest Path First (OSPF) in Area 0 (the backbone area). OSPF is an interior gateway protocol, so it operates within a single autonomous system (AS).
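
OSPF picks routes by running a shortest-path (Dijkstra) calculation over link costs. The sketch below illustrates that idea on the three data centers. It assumes a full mesh of 10 gigabit links and a 100 Gbps reference bandwidth for the cost formula; both are my assumptions for illustration, not details taken from the hospital's configuration.

```python
import heapq

# Assumption: reference bandwidth used to derive OSPF-style link costs.
REFERENCE_BW_GBPS = 100


def ospf_cost(link_gbps: int) -> int:
    """Cost = reference bandwidth / link bandwidth, with a floor of 1."""
    return max(1, REFERENCE_BW_GBPS // link_gbps)


def shortest_path(links, source, target):
    """Dijkstra over an undirected graph; returns (total_cost, path) or None."""
    graph = {}
    for (a, b), gbps in links.items():
        cost = ospf_cost(gbps)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Assumed topology: every data center linked to the other two at 10 Gbps.
links = {("MA", "CA"): 10, ("MA", "NY"): 10, ("CA", "NY"): 10}

if __name__ == "__main__":
    print(shortest_path(links, "MA", "NY"))  # (10, ['MA', 'NY'])
    # If the direct MA-NY link fails, traffic reroutes through California.
    degraded = {k: v for k, v in links.items() if k != ("MA", "NY")}
    print(shortest_path(degraded, "MA", "NY"))  # (20, ['MA', 'CA', 'NY'])
```

The second call shows why a single link per pair of sites hurts latency: one failure doubles the path cost between Massachusetts and New York.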

All of the data centers currently use the top four internet service providers on their internet-facing routers to improve the performance of the supply chain software and the website and to lower latency. These routers run the Border Gateway Protocol (BGP), an exterior gateway protocol that scales routing dynamically by learning new routes. Internal BGP (iBGP) runs between the routers inside the organization's own autonomous system so that routes learned from the internet service providers are shared among its internal routers. External BGP (eBGP) handles the connections to external entities such as the internet service providers.
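
With several providers advertising the same destination, BGP has to choose one route. The sketch below is a deliberately simplified version of that decision: prefer the highest local preference, then the shortest AS path. The real BGP decision process has more tie-breakers, and the provider names, local preference values, and AS numbers (taken from the range reserved for documentation) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    isp: str          # hypothetical provider label
    local_pref: int   # higher is preferred
    as_path: tuple    # sequence of AS numbers to the destination


def best_route(routes):
    """Simplified selection: highest local_pref, then shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# Hypothetical routes to the same prefix, learned from three providers.
routes = [
    Route("ISP-A", 100, (64496, 64500, 64510)),
    Route("ISP-B", 100, (64497, 64510)),
    Route("ISP-C", 200, (64498, 64501, 64502, 64510)),
]

if __name__ == "__main__":
    print(best_route(routes).isp)      # ISP-C
    print(best_route(routes[:2]).isp)  # ISP-B
```

ISP-C wins despite the longest AS path because local preference is compared first; once it is withdrawn, the shorter path through ISP-B is selected.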

Behind the routers, firewalls protect the company network. The VPN concentrator sits in a demilitarized zone (DMZ), where it terminates all IPsec connections from remote employees and places them behind the firewalls so they can access the company's internal systems.

In terms of security, the company uses a Cisco next-generation firewall as its first layer of defense against external threats. Behind it, there are access control lists on the routers, and Microsoft Active Directory stores information about users on the network. Lastly, there are host-based firewalls on the servers, and data is encrypted with AES-256.

Regarding the 3-tier application architecture, the company uses network load balancers to distribute traffic to its web servers. This gives the web servers high availability, since the load balancers run health checks and send traffic only to servers that pass them. A group of application load balancers distributes traffic to the app servers. Finally, the hospital uses a PostgreSQL database and wants to be able to scale it. All of the servers store their data on RAID 5 arrays, which provide fast reads and parity data.
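
The link between health checks and availability can be sketched as a round-robin balancer that skips servers marked unhealthy. This is an illustration of the idea only; the class, method names, and server names are hypothetical and do not reflect any particular load balancer product.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Round-robin over a fixed server list, skipping unhealthy servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark(self, server, healthy):
        """Record the result of a health check for one server."""
        if healthy:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def next_server(self):
        """Return the next healthy server, or raise if none are left."""
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    lb.mark("web-2", healthy=False)  # simulated failed health check
    print([lb.next_server() for _ in range(4)])
    # ['web-1', 'web-3', 'web-1', 'web-3']
```

Traffic keeps flowing while web-2 is out, and the server rejoins the rotation as soon as a later health check marks it healthy again.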

This hospital has critical supply chain software that runs in the New York data center and is fronted by application load balancers to route traffic intelligently. The organization has been growing quickly, and they mentioned that their supply chain software has been experiencing outages because of too many illegitimate requests.
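
Routing on URL paths, which the proprietary application also depends on, can be sketched as a longest-prefix match from path to server pool. The paths and pool names below are hypothetical, and longest-prefix matching is my choice for the illustration; many load balancers instead evaluate rules in a configured priority order.

```python
def route(path, rules, default_pool):
    """Pick the pool whose path prefix is the longest match for `path`."""
    matches = [
        prefix
        for prefix in rules
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return default_pool
    return rules[max(matches, key=len)]


# Hypothetical routing table for the supply chain application.
RULES = {
    "/orders": "orders-pool",
    "/orders/returns": "returns-pool",
    "/inventory": "inventory-pool",
}

if __name__ == "__main__":
    print(route("/orders/returns/42", RULES, "default-pool"))  # returns-pool
    print(route("/orders/17", RULES, "default-pool"))          # orders-pool
    print(route("/ordersX", RULES, "default-pool"))            # default-pool
```

Matching on whole path segments keeps "/ordersX" from being sent to the orders pool by accident.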

Now, let's implement the new architecture to improve the organization's network!

Company New Architecture

After evaluating the organization's system, I've found that the best way to solve their network problems is to implement a hybrid multi-cloud solution. This means using a private cloud for computing, storage, and other cloud services while also using public cloud providers. I will use the Amazon Web Services (AWS) cloud as the primary public cloud to leverage the quality of its infrastructure, Microsoft Azure as the secondary cloud service provider, and the Google Cloud Platform as a third cloud service provider for any machine learning, artificial intelligence, and data science. For this architecture, I also found it very useful to leverage the Oracle Cloud as the fourth cloud service provider because Oracle works closely with hospital environments. Finally, I will incorporate the Nutanix cloud as the fifth provider and the private cloud solution to leverage its simplicity, agility, and scalability.

The internet connectivity in the present architecture needs to be extended with more internet service providers. I will provision at least four more internet connections, each with a different internet service provider, to increase resiliency and keep latency low.

I understand the organization's concern about depending on a single cloud provider. To address it, I will add a Nutanix cloud to all of the data centers and connect those data centers to three other cloud providers. I will connect them to Azure and create multiple backup copies of their critical application to mitigate that risk. If anything goes wrong with AWS, Azure will immediately become the central cloud. If both Azure and AWS are down, the Oracle cloud will still be available as the third cloud provider. I will leverage Oracle because of its compatibility with hospital environments; technology that makes the staff's lives easier is what increases productivity. If Oracle, Azure, and AWS are all down, the Nutanix cloud is the last one standing. These precautions are necessary when designing for an environment where downtime can cost not just revenue but lives as well. The hospital's main data center in Cambridge, MA will serve as the template for the other data centers in NY and CA.
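
The failover order described here (AWS, then Azure, then Oracle, then Nutanix) can be sketched as a simple priority list. How each cloud's health is actually determined is outside the scope of this article, so the `is_up` dictionary is a hypothetical stand-in for real health monitoring.

```python
# Failover priority as described in the text: AWS first, Nutanix last.
FAILOVER_ORDER = ["AWS", "Azure", "Oracle", "Nutanix"]


def active_cloud(is_up):
    """Return the highest-priority cloud that is up, or None if all are down.

    `is_up` maps a cloud name to True/False; a missing entry counts as down.
    """
    for cloud in FAILOVER_ORDER:
        if is_up.get(cloud, False):
            return cloud
    return None


if __name__ == "__main__":
    status = {"AWS": True, "Azure": True, "Oracle": True, "Nutanix": True}
    print(active_cloud(status))  # AWS

    status["AWS"] = False
    print(active_cloud(status))  # Azure

    status["Azure"] = False
    status["Oracle"] = False
    print(active_cloud(status))  # Nutanix
```

In practice the hard part is not the ordering but keeping data and application copies in sync across providers so that the next cloud in line can take over without data loss.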

I understand how urgent and serious it is that the doctors are not getting access to their patient data fast enough. To address this, I would like to provision four additional 10 gigabit private links for each of the three data centers. I will put those links, together with the existing one, in a link aggregation group, which should bring the aggregate bandwidth to about 50 gigabits per second per data center. I would also like to provision a backup VPN solution so that the doctors always have a way to access their data.
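
The 50 gigabit figure follows from five 10 gigabit member links (the existing one plus four new ones). The sketch below shows that arithmetic and also how a link aggregation group typically places each flow on one member link by hashing its addresses and ports. The CRC32 hash and the function names are assumptions for illustration; real switches use vendor-specific hash algorithms.

```python
import zlib

LINK_GBPS = 10
MEMBER_LINKS = 5  # one existing link plus four additional links


def aggregate_gbps():
    """Total bandwidth of the link aggregation group."""
    return LINK_GBPS * MEMBER_LINKS


def member_link(src_ip, dst_ip, src_port, dst_port):
    """Map a flow to one member link (0-4) with a deterministic hash.

    Assumption: CRC32 over the flow tuple stands in for the switch's hash.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % MEMBER_LINKS


if __name__ == "__main__":
    print(aggregate_gbps())  # 50
    # The same flow always lands on the same member link.
    print(member_link("10.1.0.5", "10.2.0.9", 50000, 5432))
    print(member_link("10.1.0.5", "10.2.0.9", 50000, 5432))
```

Because a flow stays on one member link, a single transfer is still capped at 10 gigabits per second; the 50 gigabit aggregate benefits the data center when many flows run in parallel.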

I will move the supply chain application to the cloud because it is the most elegant and straightforward solution. By doing this, I can repurpose the 50 large servers to run latency-sensitive proprietary applications in the Massachusetts and California data centers. I will also front the supply chain application with a content delivery network to serve web content to customers and to provide baseline DDoS protection. On top of that, I will add cloud-provider-level DDoS protection to further secure the availability of the supply chain application. That will buy the organization about 12% growth for at least three years, and they will no longer need to worry about their website or supply chain being unavailable. Migrating the supply chain to the cloud will allow it to scale via auto-scaling and eliminate the server capacity issues.
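
To show how auto-scaling removes the fixed capacity ceiling, here is a minimal sketch of a proportional scaling rule: size the fleet so that average utilization returns to a target. The rule is a simplification in the spirit of target-tracking policies, and the 50% target is a hypothetical value, not a recommendation from this architecture.

```python
import math


def desired_capacity(current_servers, current_utilization, target_utilization):
    """Servers needed to bring average utilization back to the target.

    Utilization values are percentages. The result is rounded up so the
    fleet never ends up above the target utilization.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    return math.ceil(current_servers * current_utilization / target_utilization)


if __name__ == "__main__":
    # Today: 50 servers at 70% capacity. With a hypothetical 50% target:
    print(desired_capacity(50, 70, 50))  # 70
    # During a quiet period at 20% utilization, the fleet can shrink:
    print(desired_capacity(50, 20, 50))  # 20
```

On premises, the 50 servers are a hard limit; in the cloud, the same rule can add or remove instances as demand changes.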

The security architecture in the cloud will be layered as follows. The first layer is Cloudflare cloud-level DDoS protection. Next, I will go to the Cisco marketplace to get next-generation, industry-grade firewalls and IDS/IPS systems. I will use network access control lists as a layer of security for my subnets. Then I'll enable cloud-level security groups, followed by host-based firewalls, anti-virus/anti-malware software, and disabling any unnecessary services to protect my virtual machines (EC2 instances on AWS). I will use each cloud provider's equivalent of the AWS Key Management Service (KMS) to manage data encryption, and IAM to handle authentication and authorization and to keep track of users' activities. Finally, I will use AWS Managed Microsoft Active Directory to manage users both in the cloud and on-premises and give them an extra layer of protection through Multi-Factor Authentication (MFA).
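
As an illustration of the subnet layer, the sketch below evaluates network access control list rules the way AWS network ACLs do: in ascending rule-number order, with the first match deciding and an implicit deny when nothing matches. The rule set, addresses (from documentation ranges), and function name are hypothetical, and only inbound traffic on a single port per rule is modeled.

```python
import ipaddress

# Hypothetical inbound rules: (rule number, source CIDR, port, action).
RULES = [
    (200, "0.0.0.0/0", 443, "allow"),
    (100, "203.0.113.0/24", 443, "deny"),  # block a known-bad range first
]


def evaluate(src_ip, port, rules=RULES):
    """Return 'allow' or 'deny' for an inbound packet.

    Rules are checked from the lowest rule number upward; the first rule
    that matches wins. If no rule matches, the packet is denied.
    """
    ip = ipaddress.ip_address(src_ip)
    for _number, cidr, rule_port, action in sorted(rules):
        if ip in ipaddress.ip_network(cidr) and port == rule_port:
            return action
    return "deny"


if __name__ == "__main__":
    print(evaluate("203.0.113.9", 443))   # deny  (rule 100 matches first)
    print(evaluate("198.51.100.7", 443))  # allow (rule 200)
    print(evaluate("198.51.100.7", 22))   # deny  (no rule matches)
```

Rule order matters: if the deny rule were numbered above 200, the broad allow rule would match first and the blocked range would get through.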

For the cloud 3-tier application architecture, I will use each cloud provider's equivalent of elastic load balancers, with network load balancers handling millions of requests. I will provision block storage volumes in RAID 1+0 for the application servers and database servers, and I will require that this data be accessed only via a private link or a VPN. For the web servers, I will provision block storage volumes in RAID 0, which gives them the most capacity and speed, at the cost of redundancy. I will use Amazon Aurora as the database, mainly because of its fast failover, massive read scalability, and auto-scaling features.
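
The trade-off between the RAID levels mentioned in this article can be sketched with a small capacity calculation. The disk counts and sizes are hypothetical; the formulas are the standard ones for RAID 0, RAID 5, and RAID 1+0 with equal-sized disks.

```python
def usable_tb(level, disks, disk_tb):
    """Usable capacity in TB for an array of equal-sized disks.

    level: "0" (striping), "5" (striping with parity), "10" (RAID 1+0).
    """
    if level == "0":
        return disks * disk_tb            # no redundancy at all
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb      # one disk's worth of parity
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return disks * disk_tb / 2        # every disk is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # Hypothetical array: four 2 TB volumes.
    for level in ("0", "5", "10"):
        print(f"RAID {level}: {usable_tb(level, 4, 2)} TB usable")
    # RAID 0: 8 TB usable
    # RAID 5: 6 TB usable
    # RAID 10: 4.0 TB usable
```

RAID 0 gives the most space and speed but loses everything if one volume fails, which is acceptable for stateless web servers that can be rebuilt. RAID 1+0 gives up half the capacity so the application and database tiers can survive a volume failure.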

This new architecture will allow the organization to sustain its 12% growth and reach its forecast 22% year-over-year growth.

NB: This architecture is a very high-level representation; the complete one is much more detailed and much more complex. The intended audience is the general public.

Thank you for your time, and I hope you enjoyed this architecture case study!

Dan, the Architect.

  • May the cloud be with you