This post was co-authored by Gina Rosenthal and Don Sullivan
Can virtualized machine learning environments help your organization find answers even faster? Organizations that are eager for meaningful analytics and machine learning results must invest in specialized human capital and in a technology infrastructure that empowers those people to do their best work. Is it possible for your data center experts or current IT teams to embrace new tools and opportunities while also reducing the technology cost and complexity involved in machine learning?
Task and tool needs are only getting more complex
It takes a team to build virtualized machine learning environments:
- Data engineers are responsible for the foundational work of cleaning and preparing data before it’s handed off to the data scientists. They need capacity for I/O-intensive, large-scale batch data processing.
- Database specialists are tasked with building and managing business intelligence queries and architecture across diverse and emerging tool sets. They require resources that quickly scale and easily extend their capabilities.
- Data scientists turn transformed data into actionable answers through advanced analytics and machine learning. They need enterprise-grade storage and compute, redundant data sets for testing and modeling, and high-end GPU support.
- Data center experts or your current IT team must architect, build, manage, secure, and maintain the infrastructure on which all this work happens.
Each player on your team adds pressure and complexity to the hard work of delivering a flexible, high-performance infrastructure that can keep pace with today’s known needs as well as future unknowns. Traditional architecture strategies may be acceptable in the short term, but will they be enough over time?
The pipelines that move data from a sensor or user to final analysis need both reach and capacity, empowering data scientists while extending the impact of their work. In the past, high-end workloads have stressed the limits of traditional hypervisor computing. This left two paths for delivering services: dedicated hardware or moving workloads to the cloud. Our work at VMware offers a third way, delivering the flexibility of virtual machines with performance that approaches bare metal.
The promise of virtualized machine learning for data scientists
From the start, VMware has worked to transform infrastructure from a constraint into a force multiplier; this is the core promise of virtualization. That promise simplifies the hard work of building the virtualized machine learning environments that data science teams need for gathering data, exploring it through queries, and building and testing models. vSphere enables these machines (along with the workloads they run) to be built and managed across private and public clouds, and everywhere in between.
This means expanding the reach of virtual machines so that they deliver more than traditional I/O, memory, and compute resources. vSphere can also virtualize GPU-accelerated computing, as well as clustered software for data processing (Spark) and machine learning (TensorFlow).
This enables data experts to use virtual machines to work with the latest applications and services. Data engineers get tools for large-scale transformation, and data scientists build on that work with applications focused on extracting deep insight. Virtualization means that these environments are portable, easy to manage, and secure.
The ease of building and distributing resources across multiple sandboxes gives everyone access to the environments they need, so they can work across multiple application versions and instances simultaneously. This flexibility is especially important because the demands placed on development, staging, test, and production environments vary widely across the project lifecycle.
For IT, vSphere offers a unified view across applications and environments. This is especially important as analytics and machine learning programs are built to read and write, learn and train across private and public cloud and storage services and applications. Consolidating administration and security reduces the time IT spends on management while also improving readiness.
Smarter data science economics with virtualized machine learning
This isn’t just an infrastructure promise; it’s also a fundamental part of making analytics and machine learning operational. Investing in shared resources allows organizations to deliver resources on demand, without the time and money lost to purpose-built, dedicated hardware.
Data engineers, scientists, and specialists come at a cost. They should be focused on unlocking meaning, not tuning machinery. The same is true when delivering analytics as a service directly to business decision-makers: data teams’ high-value time should be spent building, training, and testing models with robust data sets, not managing the technical configuration of their data tooling.
- They need granular control, deep customization, and simpler interoperability across all resources and applications.
- They need to quickly build, test, iterate, break, and rebuild solutions as dictated by their search for answers, not by system administrators. Resources need to be easy to assemble and consume without the expense of constant IT engineering.
- They need performance standards they can build on. This means complete solution security and the ability to quickly restart and recover when something goes wrong.
- They need suitable application performance and machine learning model training performance.
Finally, agility is always critical: yesterday’s choices can’t constrain what needs to happen today, especially when integrating new services and storage. Good ideas can be scaled up and out and run accordingly, while granular flexibility remains available everywhere. Most importantly, workloads can be managed strategically without worry.
Rethinking your data science infrastructure
VMware is committed to helping technologists and their data team customers reduce the cost and complexity of delivering quality experiences at scale. For advanced workloads in analytics and machine learning, this means near-bare-metal speed with the ease of working in the cloud, delivered with on-premises compliance and performance.
Ever since our early hypervisor technology laid the first foundations for an increasingly software-defined world, we’ve worked to extend that core promise of virtualization (maximum flexibility across finite resources) to today’s workloads, evolving to keep pace with the accelerating demands of new applications and ways of working.
The promise of virtualization for analytics and machine learning is big. Giving data engineers and scientists the ability to explore, experiment, iterate, and adapt is essential. But being able to power those environments (including workload-specific services) with shared resources and consolidated management tools is game-changing.
Virtualized machine learning: where to start
Data science teams are building and using virtual machines to elevate their capabilities, finding ways to do more with less. It starts with asking the right questions about your existing analytics and machine learning environments.
- Look at your data pipeline. Are you dedicating enough resources to the foundational work done by data engineers and their teams?
- Revisit your “can’t virtualize” list. Are there advanced analytics and machine learning environments that could benefit from open-ended flexibility and the confidence of on-demand performance?
Do you have questions about how vSphere can accelerate your analytics and machine learning? Let us know in the comments.