Another post from guest blogger Mike Bean. A little Latin, a little logic, and the lowly service console.
Hello virtual world!
At the risk of singing an old refrain, I was unavoidably detained from my self-imposed one-blog-a-week schedule. Due to circumstances beyond anyone’s control, my caseload has increased by an order of magnitude. Between you and me, I think my manager’s just trying to make sure I don’t have time to write any more blogs! Jokes aside, that’s really just the nature of the business we’re in: tech support is the information technology version of firefighting. One day you’re playing cards in the station, and the next thing you know you’re hip deep in multiple five-alarms all over the county.
Can we speak candidly? Make no mistake about it, tech support kinda sucks. It’s high stress, it has a high turnover rate, and it’s generally fairly unappreciated. My intent here is not to solicit sympathy, but rather to paint an accurate picture. So why does anyone take the job?
Personally, I tend to think it does have some advantages. It keeps the mind nimble. I can say with complete sincerity that VERY RARELY do I leave the apartment with any real expectation of what the day has in store for me. More than once a customer has asked me, “So, have you ever seen anything like this?” Usually this is right about the time I’m frowning at my monitor with cartoon question marks spawning from my head. You really never know how the day will end. True story: one minute, I was fairly proud of myself. I’d managed to make a fairly tricky diagnosis concerning a half dozen or so VMs that had shut down the night before. The next minute, I was getting REALLY BASIC elementary-level instruction in Unix/Linux compression from our escalation engineers! (Ah, humility, my old friend, it’s like you never left me!)
Despite all this seemingly random chaos, every now and then patterns tend to emerge. This week’s blog is, in fact, brought to you by last week’s pattern, which, as it turns out, was (drum roll)
BROKEN SERVICE CONSOLES
Ladies and gentlemen, I’m going to ask as nicely as I know how: PLEASE! Tread lightly on your service consoles! Obviously, I can’t know what ESX’s developers had in mind. Regardless, I’ll gladly make the argument that the service console is NOT INTENDED for day-to-day use. As I’ve alluded to in a previous column, it is a maintenance hatch. It’s unrealistic to expect our customers not to use it, but please have the clear understanding that there is no nested maintenance hatch within the maintenance hatch! I met one gentleman this week whose security department had pushed out untested authentication files to his ESX service consoles, effectively breaking the root user! (I’m still a little foggy on how we fixed that one, but I think it involved several sacred rites and sacrifices to the Aztec sun gods!)
The fact is, every day at VMware, people care A LOT about stability, and they work hard to ensure it, BUT our QA department can’t test every possible package combination for compatibility and long-term stability. It’s not realistically possible. So please consider a possible rule of thumb. If you’re thinking about adding packages to your service console that weren’t there natively, please automatically assume that you’re adding a package VMware has NOT TESTED. There may be cases where that’s not true, but I think you’ll find it’s far more accurate as a rule than as an exception.
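If you want to put that rule of thumb into practice, here is a minimal sketch (mine, not an official VMware tool) of one way to spot what has been added since installation. It assumes you saved a package listing right after a clean install, for example the output of `rpm -qa`, and that you can capture a second listing later. The function name `added_packages` and the sample package names are purely hypothetical, and the comparison is plain text matching, nothing cleverer.

```python
def added_packages(baseline_text, current_text):
    """Return packages in the current listing that are absent from the baseline.

    Both arguments are plain text with one package name per line,
    such as a saved copy of `rpm -qa` output (an assumption of this sketch).
    """
    # Ignore blank lines and surrounding whitespace in both listings.
    baseline = {line.strip() for line in baseline_text.splitlines() if line.strip()}
    current = {line.strip() for line in current_text.splitlines() if line.strip()}
    # Anything present now but not at install time should be treated as untested.
    return sorted(current - baseline)


if __name__ == "__main__":
    # Hypothetical listings, for illustration only.
    baseline = "bash-3.2-24\nopenssh-4.3p2-29\n"
    current = "bash-3.2-24\nopenssh-4.3p2-29\nthirdparty-agent-1.0-1\n"
    for name in added_packages(baseline, current):
        print("not in baseline:", name)
```

Anything the comparison turns up is a good first suspect when a host starts misbehaving.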
I can recall a marathon session with a customer whose hosts weren’t making it more than 3 or 4 hours without crashing. Through old-fashioned, traditional logic and process of elimination, we narrowed it down to a third-party package on his consoles. When I followed up on the case several days later, he explained that the software had been pushing out some sort of faulty Active Directory policy, which was finally isolated as the source of the crashes!
entia non sunt multiplicanda praeter necessitatem
In case my readers don’t speak Latin, I won’t make you guess what that means. If you recognize it, more power to you! It’s commonly known as Occam’s razor: “Entities must not be multiplied beyond necessity”, or, in the more common interpretation, “the simplest explanation is usually the correct one.” Occam has all sorts of applications. It’s a 14th-century logic principle that’s considered a scientific guideline by some, a mathematics principle by others, and a philosophy rule by crazy tech support engineers (like myself). My co-workers can testify to this: I actually wrote it on the whiteboard in my cubicle as a reminder to myself.
By and large, we’ll leave philosophy to the philosophers, but I would suggest to you now that Occam is an EXCELLENT principle to follow when it comes to modifications of the service console. If, for whatever reason, the service console becomes completely unrecoverable, there is NO option but reinstallation. I ALWAYS encourage caution. The less you modify your console the better, and if you MUST modify it, FOLLOW Occam and do so only with the greatest of care. The simpler you keep your configuration, the more reliable the host will be, and the better the chances it’ll be there when you need it!