It’s 9:38 and you have barely read a third of the 56 new e-mails. That’s not counting ticket updates that need attention or all of the group e-mails – counting those brings the total to over 200 unread messages in your inbox. Good thing many of those are group e-mails, although everyone in the group has been sitting with you in this meeting for the better part of the last hour.
Before heading to your meeting, you let the NOC (Network Operations Center) know to call the conference room instead of the regular group extension for the next hour. Any time you’re away from your desk for an extended period of time, it’s good practice to ensure the NOC knows how to reach you. Of course, lunches are ok since the group staggers them to ensure someone is always in the Engineering department during business hours. Well, usually that means that when you see someone eating at their desk you answer the calls for a while.
“So, does that work for you?” The question was directed at you. You stumble for a second to remember what the initial question was. That’s right, the NOC discovered a link running with errors on it. No customer impact as of yet, but you need to act quickly on those before they crop up into dozens of clients calling in at once. Most likely it will amount to some prep work, a network optic swap, some testing, then packing up. Worst case it’s not the optic and the troubleshooting and testing change from 30 minutes to 2 hours. Pretty standard stuff.

Ben solving more problems while he's taking a break
“Yup. Tomorrow night at eleven o’clock PM Pacific time works for me. I’ll take care of the emergency change control when I get back to my desk.” You are glad it is tomorrow night. Tonight you have a hockey game and you’ve already missed a few this season while out of town for work. That’s the same reason you’re glad it’s not the night after next. You don’t want to miss another Thursday night date night with your significant other. It’s tough enough making the date nights work when you’re on call. Having to rush home to allow ample prep time for a late night maintenance just isn’t going to cut it. Yes, Wednesday night would do just fine.
“You’re sure you don’t want to pass this one off? This morning was a long one for you.” Your manager always offers to take these emergency changes, even if he’s already got a maintenance that same night. It’s good to work with someone who has your back.
“Nah. You guys all have a maintenance this week. And you have two.” You gesture with a nod of your head right back to your manager to acknowledge that you’re well aware of what his work load is like.
“Good. Because we wouldn’t have helped you anyways,” one of the other Engineers says with a smirk. You grunt to acknowledge the light-hearted humour. Sometimes it’s tough love on this team, but not without a healthy level of respect among the group. Being a small team that juggles large scale projects, tight deadlines, and daily firefighting, respect is of the utmost importance to ensure all resources are working perfectly in unison.
“Anyways,” your manager interjects. “Let me know if you change your mind.” Just another reminder that that door is always open.
By the end of the meeting you have a note to create an emergency change control to correct some link errors as well as another late night maintenance that you have to do next week to increase some link capacity in one of the other datacenters. Again, technically someone else could take care of this one, like the on call engineer, but he’ll be busy enough with the day to day work of being on call. But the main reason is that each of the Network Engineers on the team has their own projects and specialties. Each knows the ins and outs of certain datacenters more than any of the other members of the team. This work needs to be done on your turf, so you’re the one that takes care of it.
One thing people don’t realize is what each datacenter network means to you. Each one is a unique thing of beauty that you have either built or inherited. Even the inherited networks are eventually looked upon as your adopted creations. Over the years you have shaped, sculpted, and moulded these networks with your hands and knowledge, and each reflects the intense pride you take in your work. You won’t let these pieces of art fall to ruin – not because you are paid to take care of them, but because you could never mentally, emotionally, or physically bring yourself to neglect that which you have created.
So yes, you’ll do the maintenance next week.
It’s 5:30pm. You begin to pack up and make a mental list of what you’ve accomplished during the day. Doing this helps you make your to-do list for the following day. You:
1) Took some traffic off a transit link that was running above your commit rate. You did a quick calculation and if that traffic had run for more than 36 hours of the month then the 95th percentile bill would have kicked up about $5,000. Not a nice number when technically that traffic had already been running somewhere else, so we’d already committed to pay for it on that link. Essentially, the company would have been billed twice for the same traffic. Good thing the NOC had caught it and called down to alert you.
2) The usual calls from the NOC to help out with client requests. You cleared the ARP for a colocation client who had swapped out a switch and didn’t realize that they needed to let us know about it. Another colocation client needed a static route changed at a specific time during the day.
3) You had a quick call with a client and their client relations manager to give them some configuration tips for their network gear. The client was very nice and appreciative, a good combination.
4) You wrote and replied to countless e-mails. Some were about new network gear that is being tested and you needed clarification on some specifications from the Vendor’s rep. You can’t make any assumptions on gear you haven’t used before. If something goes wrong with that gear in production, guess who’s on the hook? You. Another e-mail was to answer a question from one of the members of the sales department. A customer wanted to try something a little funky and wasn’t sure if Peer1 was the right fit. So some questions needed to be answered. Another e-mail was an update to a large e-mail trail updating the group about your progress on a release for a new product. Without the network piece in place there is no product, and the release misses its deadline. So you need to make sure you get everything in place for that. Having network hold up the release of a product does not look good.
5) You updated a few tickets to let the NOC and the rest of the department know what the progress is. The few tickets that are on your plate are lower priority ones, so you can take a little while to get all the pieces in place. Which means it’s ok if you work on them a bit here and a bit there.
6) You created a couple of change controls for a couple of important late night maintenances that needed to be done. During the change control creation you double then triple checked the logic behind the procedures you outlined so that you can be confident in what you predict the impact will be. That’s a big part of working on service-impacting maintenances. If you tell the client base that all they will see is momentary slowness, then you’d better be sure that that’s the worst that they see. Anything more than that is very bad.
7) You had a talk with the team about some upcoming upgrades. A discussion needed to be had on what different scenarios we could come up with to complete the upgrades. Future talks will be needed to work out what each scenario will cost, and which is the best idea, taking costs and benefits into account.
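The 36-hour figure in item 1 falls out of how 95th percentile billing works: the provider samples the link all month, throws away the top 5% of samples, and bills on the highest one left. In a 30-day month that top 5% is 36 hours. Here is a rough sketch of that arithmetic. The 5-minute sampling interval, the 1000 Mbps commit, the traffic levels, and the $10 per Mbps overage rate are all made-up assumptions, picked only so the result lands near the number above; they are not real contract terms.

```python
SAMPLE_MINUTES = 5  # assumption: one traffic sample every 5 minutes


def free_burst_hours(days=30, percentile=95):
    """Hours per month that the percentile calculation discards."""
    return days * 24 * (100 - percentile) / 100


def billable_mbps(samples, percentile=95):
    """Sort the samples, drop the top 5%, return the highest one left."""
    ordered = sorted(samples)
    # Integer ceiling of len * percentile / 100, converted to a 0-based index.
    idx = (len(ordered) * percentile + 99) // 100 - 1
    return ordered[idx]


def overage_cost(billable, commit_mbps, dollars_per_mbps):
    """Extra charge for the part of the billable rate above the commit."""
    return max(0, billable - commit_mbps) * dollars_per_mbps


def month_of_samples(burst_hours, base_mbps, burst_mbps, days=30):
    """Hypothetical month: steady base traffic plus one burst."""
    per_hour = 60 // SAMPLE_MINUTES
    total = days * 24 * per_hour
    burst = burst_hours * per_hour
    return [base_mbps] * (total - burst) + [burst_mbps] * burst


if __name__ == "__main__":
    print("free burst hours:", free_burst_hours())
    for hours in (36, 37):
        rate = billable_mbps(month_of_samples(hours, 900, 1500))
        cost = overage_cost(rate, commit_mbps=1000, dollars_per_mbps=10)
        print(hours, "hours of burst ->", rate, "Mbps billable, overage $", cost)
```

With these invented numbers, a burst of exactly 36 hours is discarded entirely, while one more hour pushes the billable rate up to the burst level. That is why catching the traffic early mattered.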
It’s 5:45pm. You get a call from the NOC – they have a misguided client requesting things that the Internet isn’t capable of. It’s not a big deal though. The Internet is a complicated beast, and not one that is easily tamed. You put on your teaching cap and do your best to break things down into bite-size pieces that are easy for the client to digest. In the end the initial request was actually spawned from something else entirely. Once you dissect and explain all the options you drop off and let the NOC handle the rest.
They’ll go on to explain common tools to use and the best way to approach troubleshooting a network issue, which, they remind the client, can always be brought to the attention of the NOC. You know if they encounter any issues, or even if they just have a question, they’ll call you up later. That’s what you’re there for – to ensure the NOC has access to as much knowledge as possible. This sometimes means you get a call while you’re out for dinner to explain the nuances between iBGP and eBGP. But you know knowledge is power. The more you pass on to your front line soldiers, the more time you can spend strategizing your army’s next move as opposed to fighting small skirmishes.