Day 2 of the O’Reilly Strata Conference is starting here in Santa Clara, California, and the focus is very much on data. In 2005, Tim O’Reilly predicted: “Data is the Next Intel Inside.” At VMware, big, fast data has never been so critical for our customers, and innovations are transforming the cloud applications landscape at an unprecedented rate. This conference comes at the perfect time to reset what everyone knows about big, fast data.
The conference kicked off yesterday with several brief 20-minute keynotes, all succinct and to the point. Greenplum’s Scott Yara reflected on how the big data market has grown tremendously over the past few years and mentioned several key data science practitioners. Scott also mentioned the increased investment in open source Hadoop. Of course, Strata comes on the heels of Monday’s Greenplum Pivotal HD announcement, which launched their distribution of Hadoop. That distribution can improve performance 50X to 500X compared to existing SQL-like services on top of Hadoop.
Another great keynote presentation was from Yael Garten, a Senior Data Scientist from LinkedIn. Yael leads the mobile data analytics team. She began by polling the audience and noting that many in the audience had already been on 3 different devices that morning and it wasn’t even 9:30 am yet. She noted we’re constantly connected, and we need to use data to personalize the experience for users no matter what device we’re on. She had an interesting graph highlighting device use and laptop use during our morning time of “coffee to couch”. And those uses are different in the US compared to places like India. Continue reading →
After PostgreSQL 9.2 was released, users who relied on PostgreSQL for scale may have noticed a performance hit. In fact, the PostgreSQL community, alongside the VMware vFabric Postgres team, was able to show that the new version took a 10% performance hit compared to version 9.1. As part of the VMware Postgres team, we wanted to fix this problem for our own distribution, but as mentioned in previous posts, we also wanted to contribute our fixes back to the common core. This post provides additional detail on how this problem was identified and how we worked with the open source PostgreSQL community to restore performance.
Background on the Performance Issue in PostgreSQL 9.2
Last year, during routine regression testing of vFabric Postgres, we found that PostgreSQL 9.2, the latest major release of PostgreSQL, showed a significant performance regression from version 9.1. Using DBT-2, an open-source and fair-use implementation of the TPC-C benchmark, we noticed a 10% performance degradation, which we then reported to the community.
To troubleshoot the problem, we used git bisect to find the commit that caused the performance problem and cross-examined the statistical profiles using oprofile. As it turns out, the regression was caused by a commit that changed the way memory was allocated when SPI queries were executed. The commit was intended to reduce the number of allocations for queries using a cached plan at the cost of more logistics work. However, the DBT-2 test showed that this tradeoff was unfavorable for dynamic queries. So to fix it, we would need to reintroduce the original tradeoff only for its intended queries, using conditions.
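The search that git bisect performs can be sketched as a binary search over an ordered commit history. The snippet below is a minimal illustration only: the commit names, throughput numbers, and the 5% threshold are hypothetical and are not taken from the actual DBT-2 runs, and the real tool works on a commit graph rather than a flat list.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the culprit is also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is known good, commits[hi] is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Hypothetical benchmark throughput per commit (made-up numbers).
throughput = {"c1": 1000, "c2": 1005, "c3": 998, "c4": 900, "c5": 902, "c6": 899}
history = list(throughput)  # oldest to newest
baseline = throughput[history[0]]


def regressed(commit: str) -> bool:
    # Treat a commit as bad when throughput falls more than 5% below the baseline.
    return throughput[commit] < 0.95 * baseline


culprit = first_bad_commit(history, regressed)
print(culprit)
```

Each step halves the range of suspect commits, so even a long history needs only a handful of benchmark runs.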
We proposed the fix to the wider PostgreSQL community and the ensuing discussion led to a refined resolution which was implemented in a patch. This patch has been back-ported to the latest PostgreSQL 9.2.3 release and is included in the latest vFabric Postgres release. Continue reading →
The application server has been the centerpiece of modern architectures for web-based applications for over a decade. However, there are trends in technology that make us rethink how we use application servers and how we can get the most value out of them.
Over the years enterprises have built up considerable technical debt. This debt is made up of outdated processes, legacy applications, and stale technologies. We are all familiar with the types of headaches caused by older apps:
Development is slow.
Costs continue to rise, not fall.
Business needs are increasing in speed and complexity.
The good news is that there are solutions today that solve all of these challenges. This post and the accompanying video are aimed at helping you understand how to evolve your applications to a modern approach that will benefit your company and your customers alike. Using VMware and open source technologies such as Spring, Apache Tomcat, vSphere, Spring Insight and Hyperic, we will explain how these tools and methodologies come together with tc Server to evolve your development organization and applications and tap into the full potential of lean development and cloud computing.
Universally, applications are faster, deal with large data sets, and provide more compelling user experiences than ever before.
Competition is steep.
As a result, competitive organizations demand that IT leaders speed the rate of new application innovation and development. IT must rise to the challenge or face competitive threats, missed business opportunities, and lost momentum within their user base. In short, IT leaders and providers that do not accelerate will face a backlash from executives.
In order to meet these challenges, IT is renovating application architectures to thrive in the cloud. This is an organization-wide change involving people redirection, process redesign, and technology exploitation. For many, there is a steep learning curve. Continue reading →
If you aren’t familiar with Strata, it is a great conference for those building apps in the cloud. It focuses on the future of big data and how to use big data successfully. Speakers include representatives from Google, VMware, Amazon, Microsoft, and many other software companies focused on the big data space. Topics include: Continue reading →
In a nutshell, dynamic memory management in vFabric Postgres is conceptually like Elastic Memory for Java (EM4J), but for a virtualized, enterprise-class, open source database instead of an application server.
Compared to a normal PostgreSQL server, vFabric Postgres brings two additions necessary for flexible virtualization of the database server. These two features can help companies realize the benefits of virtualizing the database and the associated cost savings from running an open source database on an extremely cost-effective infrastructure.
Elastic shared memory management
Automatic memory configuration
Elastic Shared Memory Management
Embedded directly in the PostgreSQL core, elastic shared memory management is a new feature of vFabric Postgres. This capability allows memory to be released or obtained according to the needs of the other virtual machines on the same server. Continue reading →
One of the most common questions I’m asked to cover when I discuss software architecture topics is the difference between the various application messaging protocols that exist today—issues like how and why the protocols came about, and which one should be used in a particular application.
Their question is valid.
Today, application architects need to use a messaging broker to speed up and scale their applications, particularly in the cloud. Even after the messaging middleware is selected, application developers still need to select the protocol. Understanding the subtle differences between the protocols can be difficult.
Today, we will consider three of the most common and popular TCP/IP-based messaging protocols, and provide a quick summary on the advantages of each: AMQP, MQTT and STOMP. Before we go on, I should also point out that all three of these protocols are supported in RabbitMQ version 3.0—something we will use as an example and come back to later.
Yet, the pace of information technology often forces IT executives to do that.
In today’s world, mainframe-to-cloud decisions need solid thinking or we risk a technology tornado. This article outlines some key lessons learned at the front-line of IT decision-making.
As previously discussed, it’s possible to “modernize” mainframe legacy applications to the cloud. You can get there with little to no modification by using a “lift-and-shift” strategy. Several of my clients have taken this approach to quickly satisfy a “cloud mandate”. The results have been less than desirable:
Without the use of pooled resources, the applications do not scale well.
Timely user provisioning and access from any device is still a challenge because the apps do not provide on-demand, ubiquitous access.
In addition, utility-based pricing/costing is performed manually, with little accuracy to the realities of actual usage.
Most importantly, the applications continue to have monolithic, stove-piped architectures, which are difficult and expensive to maintain and enhance.
These “cloud” applications are more like funnel cloud apps or tornado apps, waiting to cause IT organizations extreme havoc. Assuming you want to avoid funnel clouds and IT tornadoes, consider applying the following five application architecture and design principles indicative of a true cloud application: Continue reading →
Recently, vFabric Postgres 9.2 launched with additional cloud computing capabilities like elastic memory management. Some of the most compelling new features are performance-related and take linear scaling to new levels.
This article will cover 3 key improvements as listed below:
4x Improvement with vertical linear scaling for reads
2x Improved write efficiency for write ahead logs
Index Only Scans and More
4x Improvement with vertical linear scaling for reads
Modern websites are almost all database driven. When consumers browse online retailer catalogs, 99% of the load is reads and 1% is updates to the data in the tables. Even on heavily updated websites, the vast majority of the load comes from reads. In these read-heavy scenarios, the database needs to handle a much higher read load on certain tables than on the other tables in the database. We’ve seen this behavior drive enhancements within databases. For example, many application designs started putting a caching mechanism in their application to limit the database hits. Continue reading →
“We have so many databases—the cost is so significant and hard to track. If we could just measure and charge groups based on usage, we could better manage this and cut down on license costs.”
Many companies are realizing that they must run all services like a utility. There are “table stakes” for playing in the model. It means IT must manage all data-related services with metering and chargebacks. You just can’t be a true software service if you don’t know your real costs. Not to mention, other executives look at you with big eyes when you say, “I am not sure how much all our databases cost us.”
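To make the idea of metering and chargeback concrete, here is a minimal sketch that splits a shared database bill across groups in proportion to their metered usage. The group names, the monthly cost, the usage unit (VM hours), and the `allocate_costs` helper are all hypothetical; this illustrates the arithmetic only and does not describe how any particular chargeback product prices services.

```python
from typing import Dict


def allocate_costs(total_cost: float, usage_by_group: Dict[str, float]) -> Dict[str, float]:
    """Split total_cost across groups in proportion to their metered usage."""
    total_usage = sum(usage_by_group.values())
    if total_usage <= 0:
        raise ValueError("no usage recorded, nothing to allocate")
    return {
        group: round(total_cost * usage / total_usage, 2)
        for group, usage in usage_by_group.items()
    }


# Hypothetical monthly figures, for illustration only.
monthly_database_cost = 12000.0
vm_hours = {"marketing": 100, "finance": 300, "engineering": 600}

bills = allocate_costs(monthly_database_cost, vm_hours)
print(bills)
```

With usage measured per group, the answer to “how much do all our databases cost us?” becomes a report rather than a guess.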
However, some suspect there is one important, business-centric cloud component missing in vFDD—the service metering capability, a.k.a. chargeback. Many have wondered if vCenter Chargeback Manager (vCBM) can be utilized together with vFDD to fill the gap. Can it? In short, the answer is yes. Continue reading →