During his 15+ year career, Mark Chmarny has worked across various industries. Previously, as a Cloud Architect at EMC, he developed numerous Cloud Computing solutions for both Service Provider and Enterprise customers. Now, as a Data Solution Evangelist in VMware's Cloud Application Platform group, he is actively engaged in defining new approaches to distributed data management for Cloud-scale applications. Mark holds a Mechanical Engineering degree from the Technical University in Vienna, Austria, and a BA in Communication Arts from Multnomah University in Portland, OR.
Virtualization continues to be one of the top priorities for CIOs. As the share of virtualized workloads approaches 60%, enterprises are looking at database and big data workloads as the next target, aiming to realize the benefits of virtualization across the plethora of relational databases sprawling in their data centers. With the increasing popularity of analytic workloads on Hadoop, virtualization presents a fast and efficient way to get started on existing infrastructure and scale the data dynamically as needed.
VMware’s vFabric Data Director 2.5 now extends the benefits of virtualization both to traditional relational databases like Oracle, SQL Server and Postgres, and to Big Data, multi-node data solutions like Hadoop. SQL Server and Oracle represent the majority of databases in the enterprise, and Hadoop is one of the fastest growing data technologies in the enterprise.
vFabric Data Director enables the most common databases found in the enterprise to be delivered as a service with the agility of public cloud and enterprise-grade security and control.
The key new features in vFabric Data Director 2.5 are:
Support for SQL Server – Currently supported versions of SQL Server are 2008 R2 and 2012.
Support for Apache Hadoop 1.0-based distributions: Apache Hadoop 1.0, Cloudera CDH3, Greenplum HD 1.1, 1.2 and Hortonworks HDP-1. Data Director leverages VMware’s open source Project Serengeti to deliver this capability.
Streamlined Data Director Setup – Complete setup in less than an hour
One-click template creation for Oracle and SQL Server through ISO-based database and OS installation
Oracle database ingestion enhancements – Now includes Point In Time Refresh (PITR)
Data Director’s self-provisioning enables a whole new level of operational efficiencies that greatly accelerates application development. With this new release, Data Director now delivers these efficiencies in a heterogeneous database environment.
Memory is faster than disk, as anyone supporting high-performance on-line applications knows. Recently, many traditional database vendors have latched onto this and started “washing” their offerings with in-memory variations. At the same time, new companies are jumping into the In-Memory Data Grid (IMDG) space with unproven offerings. Enterprise data, however, is not something many are willing to experiment with.
VMware pioneered the IMDG space before it was even a category. Its GemFire team has been at this for a while, with a proven, production-grade offering in vFabric GemFire. The latest release, vFabric GemFire 7.0, brings two key enhancements for developers and IT pros alike:
Improving developer productivity
Increasing operational efficiencies
These improvements come in addition to the proven data consistency and reliability that many have come to expect from vFabric GemFire in their scale-out data architectures. Once more, VMware has shown both the technical know-how and the experience needed to support enterprise-grade in-memory data at Cloud scale.
Not long ago I covered the topic of Big Data adoption in the enterprise. In it, I described how Serengeti enables enterprises to respond to common Hadoop implementation challenges resulting from the lack of usable enterprise-grade tools and the shortage of infrastructure deployment skills.
With the latest release of the open source Project Serengeti, VMware continues its mission to deliver the easiest and most reliable virtualized Big Data platform. One of the most distinctive attributes of a Serengeti Hadoop deployment is that it can easily coexist with other workloads on existing infrastructure.
Serengeti-deployed Hadoop clusters can also be configured with either local or shared, scale-out data storage architectures. This storage layer can even be shared across multiple HDFS-based analytical workloads and, in the future, could potentially be extended to other, non-HDFS-based data engines.
The elasticity of the underlying vSphere virtualization platform helps Serengeti achieve new levels of efficiency. This architecture enables organizations to share their existing infrastructure with Big Data analytical workloads while delivering optimal storage capacity and performance.
Register for Session TEX2183 – Highly Available, Elastic and Multi-Tenant Hadoop on vSphere: Click Here
Follow all vFabric updates at VMworld on Twitter: Click Here
Enterprise IT is under immense pressure to deliver a Big Data analytics platform. The majority of this demand is currently for pilot Hadoop implementations of fewer than 20 nodes, intended to prove Hadoop's value in delivering new business insight. Gartner predicts that this demand will increase by 800 percent over the next five years.
The explosive growth of these requests in mid-to-large size companies leaves IT departments unable to meet the demand. Furthermore, Hadoop and its ecosystem tools are often too complex for many of these organizations to deploy and manage.
As a result, enterprise users, frustrated by these delays, often opt to circumvent IT and go directly to on-line analytic service providers. While satisfied by the immediacy of access, they often compromise corporate data policies, inefficiently proliferate data, and accrue large costs due to unpredictable pricing models.
Our “Uber” Data booth at VMworld this year will demonstrate how VMware continues to address enterprise data management challenges related to scalability, data proliferation, traditional database performance bottlenecks, analytics, and the ever changing data usage patterns of today’s on-line applications.
Despite what people tell you, managing on-line applications at cloud scale is hard. One of the main challenges is that as an application grows more and more popular, the underlying database often becomes the bottleneck.
When demand spikes, organizations are comfortable scaling their Web and App Server layers. However, as they increase the number of application instances to accommodate the growing demand, their data layer is unable to keep up.
We all know that a solution’s overall performance is only as good as its weakest link. Increasingly, the weakest link in today’s on-line applications is the database.
A Customer Example
Recently, a large retail customer spoke to us about their experiences in dealing with demand spikes during holidays. Their virtualized infrastructure was more than capable of scaling horizontally to address the growing demand. However, their underlying, traditional database could not handle the large load increases. The database started to experience deadlocks, connection timeouts, and various other problems.
One of the most common misconceptions about the level of high availability provided by SQLFire concerns client configuration. Often, when people see a client connection string defined with a single IP, they assume the client will only ever communicate with that one SQLFire cluster host, and deem it a single point of failure in the SQLFire data grid.
In reality, however, SQLFire is based on a multi-faceted, shared-nothing architecture, and one of its tenets is transparent failover at the protocol level.
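A minimal Java sketch of what this means for a client, under assumed host names and ports (the "secondary-locators" connection property and the thin-client URL format are assumptions to verify against the SQLFire documentation): the single address in the URL is only a discovery entry point, not the sole host the client will ever talk to.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class SqlFireFailoverSketch {

    // Build a thin-client JDBC URL. The single host named here is just the
    // discovery entry point (typically a locator), not the only data host.
    static String buildUrl(String host, int port) {
        return "jdbc:sqlfire://" + host + ":" + port + "/";
    }

    // Illustrative only: once connected, the driver learns the full cluster
    // membership and transparently re-routes statements if the original host
    // fails. The assumed "secondary-locators" property guards against the
    // discovery host itself being down at connect time.
    static Connection connect() throws SQLException {
        Properties props = new Properties();
        props.setProperty("secondary-locators", "locator2[1527]");
        return DriverManager.getConnection(buildUrl("locator1", 1527), props);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("locator1", 1527));
    }
}
```

In other words, the "single IP" in the connection string is a bootstrapping detail; the grid itself has no single point of failure on the client path.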
For those following the vFabric Blog, one of the most intriguing new products (from both a technology and a financial perspective) is vFabric SQLFire – an in-memory, NewSQL database. There is an upcoming webcast on Wednesday, June 20, 2012, 9:00 AM PDT, titled VMware vFabric SQLFire – Fast Data that Spans the Globe.
Traditional databases should not be used for things they were never designed to do, like supporting thousands of concurrent users.
The main challenge of managing Web applications at Cloud scale is performance. Disk-based database architectures are fine with a small number of users, but they lack facilities for horizontal scaling and are unable to handle variable access patterns.
In contrast, SQLFire, an in-memory database from VMware, was designed specifically for these kinds of challenges. With its speed and low latency, SQLFire delivers dynamic scalability and high performance for modern, data-intensive applications, all through a familiar SQL interface.
In this post, I will demonstrate one of the ways SQLFire can increase throughput and decrease latency of your current Web applications.
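As a flavor of what that looks like, here is a sketch (table and column names are illustrative, not from the post) of the kind of DDL that lets SQLFire spread hot, read-heavy data across cluster members while keeping a redundant copy of each partition:

```java
public class PartitionedTableSketch {
    // PARTITION BY COLUMN and REDUNDANCY are SQLFire DDL extensions: the
    // table's rows are hashed across cluster members by session_id, and one
    // redundant copy of each partition is kept on another member, so both
    // throughput and availability scale with the number of hosts.
    static final String DDL =
        "CREATE TABLE web_sessions ("
        + " session_id VARCHAR(64) PRIMARY KEY,"
        + " payload VARCHAR(4096))"
        + " PARTITION BY COLUMN (session_id)"
        + " REDUNDANCY 1";

    public static void main(String[] args) {
        // In a real application this DDL would be executed once over a JDBC
        // Statement against the SQLFire cluster; here we just print it.
        System.out.println(DDL);
    }
}
```

Because the table lives in memory across the grid, reads served this way avoid the disk round-trip entirely, which is where the latency win comes from.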
Just a quick post to address a potential issue with integrating the latest SQLFire release (1.0.2) into your Maven project. You may not know this, but the VMware GemStone team maintains its own Maven repository.
For the latest versions of the SQLFire and GemFire dependencies, simply add the GemStone repository to your pom.xml:
<repository>
  <id>gemstone</id>
  <name>Release bundles for SQLFire and GemFire</name>
  <url>http://dist.gemstone.com.s3.amazonaws.com/maven/release</url>
</repository>
and define the necessary dependency, in this case SQLFire:
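A sketch of what that dependency entry might look like (the groupId and artifactId below are assumptions; check the GemStone repository index for the published coordinates):

```xml
<!-- groupId/artifactId are assumed; confirm against the repository listing.
     The version matches the 1.0.2 release mentioned above. -->
<dependency>
  <groupId>com.vmware.sqlfire</groupId>
  <artifactId>sqlfireclient</artifactId>
  <version>1.0.2</version>
</dependency>
```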