SLAs - Who do you call for help

atorven Member Posts: 319
When you're supposed to be the SME?

This is mainly for the guys in the big shops (service providers, etc.): what do you do if you can't figure something out (the buck stops with you)? Go to the vendor? Isn't that very costly?

Also, can someone please give me some details on SLAs in such high-pressure environments? Let's say you made a huge change and at first glance everything looks like it's working fine. Later on you receive a high-priority call that is related to this change. How do you tackle this? Let's say that you're troubleshooting and you realize that you're going to be outside of the SLA. What do you do? Do you carry on troubleshooting and hope for the best before time runs out, or do you give yourself enough time to revert your changes before you reach your SLA?

Comments

  • networker050184 Mod Posts: 11,962 Mod
    If you are the last line of defense and can't figure it out, then you need to call the vendor. Hopefully (for your sake) it isn't an easy fix. Most of the time, when it comes to calling the vendor, it's a software bug. If you are the SME, you should be able to find a misconfiguration.

    As far as the SLA and changes go, you should have a back-out plan. If things go south, then roll back and live to fight another day. Get back in the lab and test with the feedback you have on the issues. You don't want your customers to be your test bed. If it doesn't work, put back what does. Obviously there are some cases where backing out isn't feasible, but most of the time it should be.
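The "leave enough time to revert" idea from the original question comes down to simple clock arithmetic. Here is a minimal sketch, assuming you know roughly how long the rollback takes. The function names, the 10-minute safety margin, and the example times are all hypothetical and not from anyone's actual process.

```python
from datetime import datetime, timedelta

# Hypothetical default: extra buffer on top of the rollback itself.
DEFAULT_MARGIN = timedelta(minutes=10)


def latest_rollback_start(sla_deadline, rollback_duration, safety_margin=DEFAULT_MARGIN):
    """Last moment a rollback can begin and still finish before the SLA deadline."""
    return sla_deadline - rollback_duration - safety_margin


def should_roll_back_now(now, sla_deadline, rollback_duration, safety_margin=DEFAULT_MARGIN):
    """True once continuing to troubleshoot would put the rollback past the deadline."""
    return now >= latest_rollback_start(sla_deadline, rollback_duration, safety_margin)


if __name__ == "__main__":
    deadline = datetime(2024, 1, 1, 4, 0)   # hypothetical SLA deadline
    rollback = timedelta(minutes=30)        # hypothetical rollback time
    print(latest_rollback_start(deadline, rollback))
```

Working out that cutoff before the change starts means the decision to back out is already made when the pressure is on.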
    An expert is a man who has made all the mistakes which can be made.
  • vinbuck Member Posts: 785 ■■■■□□□□□□
    Like networker said, typically you can involve the vendor if it's one of the big boys. Often, though, you end up having to figure it out through a combination of diligence and systematic troubleshooting. The smaller vendors are usually not as familiar with the complexities of Service Provider architecture, and it's up to you to glean the useful pieces of information they have to offer and put the puzzle together.

    My favorite was a vendor who about dropped the phone when we asked him about deployment models for his product using several hundred or even several thousand VLANs. Poor guy :)

    Work your way through the OSI layers and go piece by piece. Prove everything, assume nothing (even if you think it hasn't changed in 10 years). As far as SLAs go, they may include a maintenance window, or at least the details needed to schedule one. If not, you do your absolute best to make the changes transparent to the user. This is where tools like GNS3 or test labs can really save your butt. Not everything can be tested in a sim, but if you use a lab-type environment to test your changes as much as you can, then you save your "credit" for when you really need it.
    Cisco was my first networking love, but my "other" router is a Mikrotik...
  • atorven Member Posts: 319
    So, always have a plan B! Thanks for the insight, guys.
  • Forsaken_GA Member Posts: 4,024
    We pay very large amounts of money for the service contracts with our vendors, and we have enough clout that, if it's customer-affecting, we can usually get 4-hour service out of Cisco even if our service contract is next business day.

    So yeah, if I can't figure it out, and I've got no one else to escalate it to, the vendor gets an open case and a phone call. And they're usually quite happy to assist.
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    I used to work for a big hosting company / ISP. At that scale, you have contracts with vendors in place.

    VMware - you are likely to be a VSPP partner, which gives you full licenses paid monthly, with support attached
    Microsoft - you are likely an SPLA partner; again, licenses are paid monthly and it also gives you support to some degree
    Linux - you likely end up as a Red Hat partner, giving you enough licenses and support

    So for issues with any of the above, you likely get in contact with the vendor (if all internal resources fail).

    There are also options where you go through a supplier, where the supplier provides the support and escalates if necessary. I've seen this especially with network gear, where companies buy refurbished products with a warranty attached.

    As for SLAs - you have to have your own SLAs with your customers. But you will also have SLAs with your vendors. So you might be able to push some costs to your vendor - it's all in the small print though.

    As for reimbursing customers - to be honest, they are unlikely to get rich from an outage. Most SLAs are written so tightly that the customer only gets back a small percentage of the monthly fee.

    Here is an example of an SLA, which you can find on my last employer's webpage:

    Service Level Agreement (SLA)
    My own knowledge base made public: http://open902.com :p
  • Forsaken_GA Member Posts: 4,024
    atorven wrote: »
    Also, can someone please give me some details on SLAs in such high-pressure environments? Let's say you made a huge change and at first glance everything looks like it's working fine. Later on you receive a high-priority call that is related to this change. How do you tackle this? Let's say that you're troubleshooting and you realize that you're going to be outside of the SLA. What do you do? Do you carry on troubleshooting and hope for the best before time runs out, or do you give yourself enough time to revert your changes before you reach your SLA?

    Anything within our impact window (2am to 5am eastern) is fair game. We don't get dinged for it. Anything beyond that, we'd better be damn close to fixing it to justify the out of window maintenance. If we can, then it's usually no big deal. If not, then we back it out. If we can't back it out, we press on, and then figure out wtf happened at the post mortem.
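The decision flow described above can be sketched roughly as follows. This is only an illustration: the function names and return strings are made up, `now` is assumed to already be in Eastern local time, and "close to fixing it" is a judgment call that no flag really captures.

```python
from datetime import time

# Impact window from the post: 2am to 5am Eastern.
WINDOW_START = time(2, 0)
WINDOW_END = time(5, 0)


def in_impact_window(now):
    """True if the local (Eastern) time falls inside the maintenance window."""
    return WINDOW_START <= now < WINDOW_END


def next_action(now, close_to_fix, can_back_out):
    """Pick the next step for a change that has gone wrong."""
    if in_impact_window(now):
        return "keep working"   # inside the window is fair game
    if close_to_fix:
        return "keep working"   # out of window, but the overrun is justifiable
    if can_back_out:
        return "back out"       # not close to a fix: restore what worked
    return "press on"           # no way back: fix forward, post mortem later


if __name__ == "__main__":
    print(next_action(time(5, 30), close_to_fix=False, can_back_out=True))
```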

    Our goal is to contain the impact to customers to no more than 30 minutes, and we're usually pretty good at it. Most of the maints I do are around 10 to 15 minutes of impact, and those are usually only because we're doing some rewiring in the headend and that's how long it takes to move the wires and wait for the modems to range back in. It takes hours of preparation to make that happen though. For anything non-wiring-related that may be impactful, we like to take the circuits or links that are going to be an issue out of the forwarding path ahead of time and run on the backup links so that the customers see no impact.

    Of course, every once in a while something messed up happens, like the entire DHCP database for a headend getting corrupted while you're doing a migration, and you have to boot everyone that connects to that headend off and let them re-IP. But those nights are rare.
  • DevilWAH Member Posts: 2,997 ■■■■■■■■□□
    If you are going to fail an SLA, then talk to people, tell people!!! If you show that you are working to fix an issue and that no silly mistakes have been made, most clients will work with you to help figure it out. The worst thing to do is keep quiet and struggle on.

    Keeping people updated is the key, along with showing you are working to hit your SLAs.
    • If you can't explain it simply, you don't understand it well enough. Albert Einstein
    • An arrow can only be shot by pulling it backward. So when life is dragging you back with difficulties. It means that its going to launch you into something great. So just focus and keep aiming.