Here’s the story in a nutshell: he’s built a serious VSAN cluster that uses Diablo’s Memory Channel Storage™ technology for flash storage, which means he’s well ahead of the pack. No, it’s not officially supported yet. Regardless, very impressive stuff — and a great example of hyperconverged architectures to come.
Gabriel (or Gabe as he prefers) was kind enough to speak to me about what he’s doing: the motivations, the thinking and the experience that resulted. It was an amazing story.
I hope you’ll agree as well …
Gabe, can you tell us a bit about yourself and what you do?
I grew up in the Houston area, and was recruited by EDS right out of college. EDS had an amazing training program, which gave me a strong foundation in IT fundamentals. Today, I’m an infrastructure adviser for a large IT services provider specializing in healthcare.
So how did you become interested in these technologies?
My company has a key client that runs a very big Cerner environment that’s central to their operation. Originally, we had implemented it as you’d expect: big UNIX boxes, lots of HBAs, big storage arrays and SAN fabric. And lots of people, of course.
Over time, though, the environment started to spin out of control.
When we had a problem, we had to troubleshoot across multiple technology towers, and that took time.
When the environment needed to get bigger or faster, the required acquisitions were expensive, they took months to acquire and stand up, and sometimes didn’t solve the problem.
And — over time — it wasn’t getting better, it was getting worse.
It got to the point where our client read us the riot act: shape up or else. We realized that our current architecture couldn’t keep up with what our client deserved. We needed to think differently about the problem — and that’s when we started researching alternative approaches.
What caught your interest?
At the time, I was hearing about two new technologies I thought could make a difference. The first was Diablo’s Memory Channel Storage™ technology: basically, flash storage on the motherboard, and a lot of it. MCS allows direct access to that flash, but using a storage model vs. a memory model. It was faster, denser and more integrated than anything else out there.
This technology is the basis of the Lenovo eXFlash™ DIMM.
The second was VMware’s Virtual SAN (VSAN), a hyperconverged software storage product that could easily use this newer technology and also deliver a single-vendor software solution.
What were you hoping to get?
Everyone thinks it was all about performance, but that wasn’t really at the top of my list.
I wanted operational speed, I wanted agility, I wanted flexibility — I wanted “nimble-ness” from our next environment. Things needed to be able to happen much, much faster and more predictably.
I needed to be able to add capacity or performance as simply as I do when ordering server parts. If there was a problem, I wanted to support it inside a single technology tower vs. across several. I wanted to know that I could throw any workload at my environment, and not have to worry about performance.
And I wanted something that was darn simple to build, operate and troubleshoot.
Tell me about your current VSAN environment.
We’re using ten Lenovo x280 blade servers, each with 12TB of eXFlash DIMMs. We use 40Gb networks in each rack, and 10Gb across the racks. Each server has 256GB of memory, as I’m not really worried if I start swapping to flash; worse things could happen. And I’m using VSAN 6.0.
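(Editor's aside: for a sense of scale, the figures Gabe quotes can be totaled with a quick back-of-the-envelope sketch. The node count, flash per node and memory per node come from his answer. The `copies` parameter is purely an assumption for illustration, since the interview doesn't say which VSAN storage policy is in use or how the flash is split between cache and capacity.)

```python
# Back-of-the-envelope totals for the cluster described above.
# Inputs taken from the interview: 10 blades, 12 TB flash each, 256 GB memory each.
NODES = 10
FLASH_TB_PER_NODE = 12
MEMORY_GB_PER_NODE = 256


def cluster_totals(nodes, flash_tb_per_node, memory_gb_per_node, copies=2):
    """Return raw totals plus a rough mirrored-capacity estimate.

    `copies` is a hypothetical number of data replicas (2 would correspond
    to simple mirroring). It is an assumption, not a detail from the
    interview, and it ignores any cache/capacity tier split.
    """
    raw_flash_tb = nodes * flash_tb_per_node
    return {
        "raw_flash_tb": raw_flash_tb,
        "memory_gb": nodes * memory_gb_per_node,
        "flash_tb_after_mirroring": raw_flash_tb / copies,
    }


totals = cluster_totals(NODES, FLASH_TB_PER_NODE, MEMORY_GB_PER_NODE)
print(totals)
```

Under those assumptions, the raw numbers are simply the per-blade figures multiplied by ten; the mirrored estimate is only as good as the guessed replica count.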
You use blades, but most people use racks for hyperconverged environments like VSAN.
Blades offer me a lot of advantages: pre-planned power and cooling, for example. They’re incredibly easy to pop in and out without a lot of thought or planning.
I suppose if I was using magnetic disks or traditional flash disks, that might be an issue, but all of my capacity and cache is right on the motherboard, where it belongs. It makes blades a heck of a lot more attractive.
And VSAN didn’t force me into a form factor.
I suppose performance is pretty good as well?
Yes, as you’d expect. We run our own application testing workload. The best we’d ever see with external arrays was a 6-9 msec application response time for end users, and that’s after a lot of optimization work. We’re now getting 2-3 msec, and that’s without any real tuning or optimization.
That’s three times faster — so the client notices it as well.
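(Editor's aside: the "three times" figure follows directly from the response-time ranges Gabe quotes. A minimal sketch of the arithmetic, assuming the low and high ends of each range are compared like for like; the function name is just for illustration.)

```python
# Response-time ranges quoted in the interview, in milliseconds.
OLD_RANGE_MS = (6, 9)   # external arrays, after optimization work
NEW_RANGE_MS = (2, 3)   # VSAN with MCS flash, no real tuning


def speedup(old_ms, new_ms):
    """How many times faster the new response time is than the old one."""
    return old_ms / new_ms


# Compare best case with best case, and worst case with worst case.
best_case = speedup(OLD_RANGE_MS[0], NEW_RANGE_MS[0])
worst_case = speedup(OLD_RANGE_MS[1], NEW_RANGE_MS[1])
print(f"best case: {best_case:.1f}x, worst case: {worst_case:.1f}x")
```

Comparing the ends of the ranges the other way around (old best against new worst, or old worst against new best) would give a spread of roughly 2x to 4.5x, so "three times" is a fair summary rather than an exact measurement.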
What about cost?
People point to the cost of flash, but I point to the cost of not having flash.
Our job is to deliver a predictably great IT service to our clients 24×7, and that’s what we get paid to do. Every time we had an outage or a performance problem or some other glitch, that was costing us and our clients big money and big aggravation.
With this new design, it’s an entirely different world.
So what’s been the experience with VSAN and MCS technology so far?
We’ve had it up and running for well over a year, and it’s been absolutely amazing. It’s really impressive. If you think about it, it’s a great design — everything densely packaged in a blade: compute, memory, storage and connectivity. One management screen. And all it has to do is run vSphere with VSAN enabled. Dead simple.
If something breaks, VSAN detects and recovers so we don’t notice. All we have to do is go in and swap broken parts, and that’s about it. As I said, performance is predictable and stellar. My admins tell me it’s an incredibly simple environment to operate as well.
I should share that VMware hasn’t officially qualified the products based on Diablo MCS yet, but it’s in the process. In the meantime, I’ve gotten great support from all my vendors: VMware, Lenovo and OnX. Everyone wants to see this happen.
You seem to be pretty happy with how simple and predictable your environment is now.
I am. It took a bit longer than I would have liked, but I can see the impact in my own world — everything is easier and simpler. If I need more performance or capacity, I just buy another blade and slide it in.
Our vSphere admins basically are the “one throat to choke” for all infrastructure: compute, network and storage. They have all the tools they need — in one place — to do everything they need.
I’m happy, my co-workers are happy, my boss is happy. Perhaps most importantly, our client is very happy as well.
I bet some of the other infrastructure teams have taken notice of what you’ve done?
Yes — everyone is interested. We’re already planning to roll this design out to more of the workloads my team is responsible for — it’s that compelling.
It’s the new thing everyone wants now.
My co-workers also do a lot of support for big data, analytics, decision support and visualization. The idea of using this design for those workloads is very, very attractive — especially if we can do graphics rendering in the same node as compute and storage.
You mentioned that you had tried out Nutanix gear previously — can you compare from your perspective?
Well, at the time, VSAN was just coming to market, so we tried out some of their gear for a few workloads. It was simpler than what we were doing at the time, so that was a win.
However, if you ask my admins which environment they think is simpler to manage, they’d probably all say “VSAN” — one tool, one environment, one dashboard, etc.
And, of course, we couldn’t do this cool MCS stuff using their gear, so there’s that.
Any final thoughts?
I think the support we’ve gotten from all the vendors — Lenovo, VMware and OnX — has been great, so thanks. My management team and my co-workers have also been great at supporting us as we put some of these newer technologies to work, so I’d like to thank them as well.
And, of course, our client for giving us the opportunity to put something innovative and transformational in front of them to consider.
You should be proud of what you’ve been able to deliver.
I am.
—————-
I think Gabe’s story is great.
He saw a challenge, did some research and pushed the technology envelope without getting burned. He delivered a solution that made a direct impact on his business and on his client’s. It just goes to show the power of hyperconverged software, unconstrained by hardware.
If you’ve got a great VSAN story, let me know. I’m sure there are more of you out there.