Takedown, Shakedown, Downtime
rickshangle.com and a few other web sites I run out of one of my web services providers were down for the past few hours, but appear to be back up now… for now. I hope you survived your loss of knee-slappery / frippery / frippertronics with minimal long-term psychic damage… snort.
Given what I do for a living (protect data, manage storage, make systems highly available; also: babble, stink up the joint), it would be the apex of hypocrisy if I went off on a rant re: this unplanned downtime, given the extremely modest sum I pay annually for these services versus the relatively broad menu of easy-to-use options my provider offers. I don’t pay for five 9’s, so I don’t expect it. Honestly, I don’t pay for three 9’s, either… do I even pay for two? Is there any explicit service level agreement / contract in place between my provider and me? The fact that I don’t know offhand is as good as “no”, in terms of my expectations.
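For anyone who doesn't count 9's for a living, here is a rough sketch of what those numbers mean as a yearly downtime budget. It assumes a 365-day year and reads "N nines" as an availability of 1 - 10^-N (so two nines is 99%, three is 99.9%, five is 99.999%). The function name `allowed_downtime_minutes` is mine, purely for illustration, and has nothing to do with anything my provider publishes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a plain 365-day year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001 (99.9% available)
    return MINUTES_PER_YEAR * unavailability


# Print the budget for the three service levels mentioned above.
for n in (2, 3, 5):
    minutes = allowed_downtime_minutes(n)
    print(f"{n} nines: {minutes:.2f} minutes/year ({minutes / 60:.2f} hours)")
```

Under those assumptions, two 9's allows roughly 5,256 minutes (about 3.65 days) of downtime a year, three 9's allows about 525.6 minutes, and five 9's allows a little over 5 minutes. A single two-hour outage blows a five 9's budget many times over, but on its own still fits inside three.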
What was my point… oh yeah, found it: I feel like I pay a fair price for what I get (a user-friendly service with a lot of technical features that is up most of the time and keeps me from having to be a round-the-clock system administrator).
However, what has changed with my provider recently is their level of response to inquiries re: outages et al. For the three outages I experienced prior to today, the chronology generally ran like this:
1. +0 minutes: I notice something’s wrong
2. +5 minutes: I determine it’s very likely not my application (e.g. WordPress) taking a giant dump (and it has on occasion)
3. +6 minutes: I contact the help desk with the problem, asking for status / assistance
4. +15 minutes: I’ve received a response with either said status, or acknowledgment that my inquiry was received, and a number of follow-on questions to help troubleshoot the issue (what I expect in terms of standard content for a help desk dialogue, in other words)
5. +30-45 minutes: I’m generally back online, and satisfied, with the issue resolved. Granted, most issues fitting under said hypothetical timeline are generally minor to resolve (e.g. a database got moved)
Contrast that with today’s experience, which is the first outage I’ve had since my provider began touting their shift to “24×7” service a few months back:
1. +0 minutes: I notice something’s wrong
2. +5 minutes: I determine it’s very likely not my applications, since it’s impacting a number of apps in different domains (different apps, not all WordPress, for example, so the odds that a common exploit is being attacked are lower than they could be)
3. +6 minutes: I contact support describing the problem
4. +120 minutes (now): my sites are back up, and I haven’t heard from the help desk.
Someone is clearly off making a sammich.
Everyone has bad days. As someone who worked help desk support for about four years, I know this. We’ll keep an eye on it. And at the end of the day… for $120 a year or whatever… how much can I complain?[1]
rds
[1] Terror rising at how much those sound like famous last words.
