These are exciting times to be in the network game, especially when it comes to the datacenter. In this environment, service providers around the world have seen tremendous growth in traffic and in their customers’ security requirements, both of which can stress the capacity and capabilities of their current networks. No longer is it a matter of tailoring a datacenter for just web hosting; service providers need to make sure their networks are flexible enough to accommodate the many types of customer usage (Web, VoIP, MMORPG, Video, Mobile Apps, Storage, etc.) AND deliver carrier services such as MPLS/VPLS, IDS/IPS, IPv6, and dynamic routing protocol signaling (e.g. BGP) to customers. To top it all off, service providers need to accommodate the new cloud strategies, as customers scramble to find server virtualization products that meet their needs for reduced costs and horsepower on demand. With the wide (and growing) variety of cloud offerings available, the network operator needs an infrastructure flexible enough to accommodate the complex requirements of each type of cloud product.
So what’s a network engineer to do in the face of the ever-growing challenges that customer requirements pose? Keep adding EtherChannel links within an oversubscribed network access layer, adjust complex spanning tree algorithms, and hope to hell that nobody plugs a switch into the wrong port and tears down the whole datacenter? Even with the emergence and availability of 10 Gb ports in today’s DC networks, maintaining a loop-free topology becomes increasingly difficult, and inefficient, as network access layers grow in size. Locking down the network so that only a select few operators can work on it is not an option either: try explaining to the COO that you are slowing down customer provisioning and keeping operational departments on standby in the interest of maintaining network stability. Safe to say you won’t win that argument ;->
Fortunately, the network hardware vendors have recognized these business challenges in the datacenter, and have come up with new strategies and designs that call for the minimization or complete eradication of complex spanning tree protocols inside the DC, while at the same time providing a highly available, highly flexible, easy-to-scale and, of course, high-performing network. PEER 1 Hosting had the chance to review and assess DC strategies from some of these vendors over the last 24 months as we prepared for the new network deployment in our flagship datacenter in Toronto.
Cisco Virtual Switch System (VSS) & Nexus Platform
The Cisco VSS is a network system virtualization technology that pools multiple Cisco Catalyst 6500 Series switches into one virtual switch. Access layer switches connect to this pool of 6500’s and run Multichassis EtherChannel (MEC) across multiple links, and all links are live (multipath). There is no port blocking or spanning tree involved on these links, which gives high throughput. Some highlights of the VSS design are ease of management (at the routing level anyway), no need for spanning tree at least at one level, no need for HSRP/VRRP/GLBP, and flexible deployment options, as switches can be positioned in various locations throughout the DC, connecting back to the 6500’s by either Gigabit or 10 Gigabit interfaces.
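To make the "all links are live" idea concrete, here is a minimal Python sketch of how a bundle can spread traffic across every member link while keeping each flow on a single link. It is only an illustration: the hash (CRC32 over the 5-tuple), the function name `pick_member_link`, and the link names are my own assumptions, not Cisco's actual load-balancing algorithm.

```python
import zlib


def pick_member_link(flow, links):
    """Pick one active member link for a flow by hashing its 5-tuple.

    Hypothetical helper: real switches use their own hardware hash.
    """
    if not links:
        raise ValueError("bundle has no active links")
    # Build a stable byte string from the flow fields, then hash it.
    key = "|".join(str(field) for field in flow).encode()
    return links[zlib.crc32(key) % len(links)]


# Hypothetical bundle: two uplinks to each physical chassis of the pair.
links = ["chassis1-te1/1", "chassis1-te1/2", "chassis2-te1/1", "chassis2-te1/2"]

# (source IP, destination IP, IP protocol, source port, destination port)
flow = ("10.0.0.5", "10.0.1.9", 6, 49152, 443)

print(pick_member_link(flow, links))
```

Because the choice depends only on the flow's own fields, packets of one flow stay in order on one link, while different flows land on different links and no link sits idle in a blocked state. If a link fails, passing the shorter list of surviving links re-spreads the flows across them.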
On the heels of the VSS strategy, Cisco also came out with another virtualization design that starts at the access layer instead of the distribution layer. The Nexus 5000/2000 Platform virtualizes the access layer by using a pair of 1U 5000 model switches that act as ‘parents’ to a maximum of 12 Nexus 2000 fabric extenders. All devices act as a single logical unit, flattening the access layer, simplifying operations and removing the need for spanning tree. The Nexus seems to be the dominant datacenter strategy for Cisco, as it addresses a lot of the ‘moving parts’ and requirements at the access layer instead of just at the distribution layer like the VSS does; operationally speaking, the access layer has been the most volatile layer to deal with inside any large enterprise datacenter.
Brocade TRILL
While not a Brocade-specific protocol, TRILL (Transparent Interconnection of Lots of Links) has been embraced by the company for its datacenter network initiatives. Akin to VSS, TRILL introduces multipath into Layer 2 networks while minimizing the need for spanning tree. In a flattened Layer 2 network, TRILL requires the devices to run a link-state protocol amongst themselves (IS-IS or, in Brocade’s case, FSPF) to identify optimal paths through the various links between them. In the Brocade implementation, switches participating in the TRILL protocol auto-discover and auto-configure each other, and appear to the rest of the datacenter network as one large logical switch. The very recent launch of the Brocade VDX Datacenter Ethernet Fabric is being touted as a significant development in their TRILL strategy, and initial indications are impressive.
This tactic, coupled with their Multi-Chassis Trunking solution at the distribution layer (very similar to the Cisco VSS), brings a lot of simplicity and scalability to the datacenter network. And because TRILL is an IETF protocol, it should allow for interoperability with other hardware vendors, such as Cisco and HP, although I wouldn’t hold my breath for this to be realized any time soon.
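The link-state path selection that TRILL depends on can be sketched in a few lines of Python. This is a generic shortest-path computation that keeps every equal-cost predecessor, which is what makes multipath possible; it is not the IS-IS or Brocade implementation, and the switch names, link costs and function names below are assumptions for illustration only.

```python
import heapq
from collections import defaultdict


def equal_cost_paths(graph, source):
    """Dijkstra that records every equal-cost predecessor of each node."""
    dist = {source: 0}
    preds = defaultdict(set)
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neigh, cost in graph[node].items():
            nd = d + cost
            best = dist.get(neigh, float("inf"))
            if nd < best:
                # Strictly better path: replace the predecessor set.
                dist[neigh] = nd
                preds[neigh] = {node}
                heapq.heappush(heap, (nd, neigh))
            elif nd == best:
                # Equally good path: keep it as an extra live path.
                preds[neigh].add(node)
    return dist, preds


def count_paths(preds, source, target):
    """Count the distinct shortest paths from source to target."""
    if target == source:
        return 1
    return sum(count_paths(preds, source, p) for p in preds[target])


# Hypothetical fabric: two edge switches, each linked to two core switches.
graph = {
    "edge1": {"core1": 1, "core2": 1},
    "edge2": {"core1": 1, "core2": 1},
    "core1": {"edge1": 1, "edge2": 1},
    "core2": {"edge1": 1, "edge2": 1},
}

dist, preds = equal_cost_paths(graph, "edge1")
print(dist["edge2"], count_paths(preds, "edge1", "edge2"))  # prints: 2 2
```

Spanning tree would block one of the two core paths to break the loop; a link-state view of the topology lets both be used at once.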
Juniper Virtual Chassis & Stratus
Taking a decidedly different path, Juniper has taken a proprietary approach to the datacenter fabric, which is a bold step for a company traditionally viewed as being in the carrier space only. Starting with their virtual chassis (VC) EX4200 switch fabric, one of the first to market in this space (Brocade has an FCX switch fabric here as well), Juniper was able to provide the first 10-member VC switch fabric with a 128 Gb/s backplane. This virtual chassis, not to be confused with stackable switches, has an ‘any to any’ port mapping within it, meaning that full Ethernet packet processing is executed only once, when the packet enters the VC, and the packet is then transported to the exit port without repeating that processing; a radical departure from what happens in the typical tree architectures seen in most datacenter network implementations. Even in a TRILL implementation, each network element has to do its own packet processing as a packet passes within a TRILL boundary. To deliver the any-to-any port mapping, the EX4200 uses a shortest-path, cost-aware and multicast-aware protocol, ensuring optimal use of the VC backplane resources and allowing multipath and extended-reach topologies; a huge advantage when it comes to customizing a network deployment to the unique needs of each datacenter.
Aside from the flattening and simplification of the data plane, the EX4200 proprietary approach calls for a single control plane (unlike TRILL), which allows for a master/backup routing engine architecture within a VC and gives all members a consistent view of the forwarding database. This provides a substantial improvement in management and availability, inherently lower latencies and excellent cross-sectional bandwidth. And, of course, it removes the dreaded spanning tree protocol from your operation.
Everything I’ve discussed so far has pertained to the access layer, but Juniper’s vision, known as the Stratus Project, is to extend the any-to-any, single-control-plane design to encompass the distribution and edge routers. Their ultimate goal is to have the whole datacenter network administered as one big logical switch, allowing for simplicity, huge scale, and great flexibility.
So, Who Did PEER 1 Hosting Choose?
After weighing all the vendor proposals against what we wanted to achieve at our new flagship datacenter in Toronto, we went with the Juniper solution. Our DC design called for top-of-rack switches for individual rows, where each row needed a minimum of 10 Gb/s trunk capability and the ability to LAG up in 10 Gb increments for massive capacity. We also needed network profiles that let customers move easily anywhere throughout the DC, for many reasons including vMotion and supporting non-linear customer growth. In addition to meeting our requirements for removing spanning tree and providing huge flexibility and simplicity, the Juniper platform has the added benefit of onboard automation in its operating system. As most network engineers are quasi developers and scripters at heart, we were able to start developing operational, event-driven, and commit scripts on the network devices themselves to help manage the network infrastructure. We are also exploring the use of this onboard automation to enhance the current automated provisioning process for our hosting products.
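As a back-of-the-envelope illustration of the "LAG up in 10 Gb increments" requirement, the short Python sketch below works out how many 10 Gb member links a row trunk would need for a given demand. The function name and the demand figures are hypothetical and are not numbers from our design.

```python
import math


def lag_members_needed(demand_gbps, link_gbps=10):
    """Return how many member links a LAG needs to carry the demand.

    Hypothetical helper: rounds up to whole links, with a minimum of one.
    """
    if demand_gbps < 0 or link_gbps <= 0:
        raise ValueError("demand must be >= 0 and link speed must be > 0")
    return max(1, math.ceil(demand_gbps / link_gbps))


# Made-up row demands in Gb/s, each mapped to the 10 Gb links required.
for demand in (8, 10, 25, 40):
    members = lag_members_needed(demand)
    print(f"{demand} Gb/s -> {members} x 10 Gb ({members * 10} Gb/s trunk)")
```

A row that starts at the 10 Gb/s minimum can therefore grow by adding one member link at a time, without redesigning the trunk.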
At the time of our decision, using Juniper was seen as a bold move, as it was not regarded as a traditional datacenter network vendor. However, we’ve been duly impressed by the rollout, and it would seem that the industry recognizes this as well: Juniper’s Enterprise product line recently appeared in the Gartner Magic Quadrant for Enterprise LAN as a challenger to the traditional incumbents, Cisco and HP (http://www.gartner.com/technology/media-products/reprints/juniper/vol6/article4/article4.html), and the latest market share report from Dell’Oro Group shows Juniper advancing to the #3 spot in the Ethernet switch market. It’s been enough of a success in our new Toronto DC that we chose to use the same strategy in upgrading our Serverbeach networks. I’ll describe that in a later post.