
How fast is a Rabbit? Basic RabbitMQ Performance Benchmarks

One of the greatest things about RabbitMQ is the community that surrounds it. With open source at its roots, people come together to share their code, their knowledge, and their stories of how they’ve deployed it in their projects. At a recent meetup near Nice, France, database engineer Adina Mihailescu gave a presentation on choosing a messaging system. Drawing on Muriel Salvan’s benchmark comparing ActiveMQ, RabbitMQ, HornetQ, Apollo, QPID, and ZeroMQ, the talk offered some interesting performance comparisons that we’d like to share with you.

In a single-laptop benchmark, Salvan ran four different scenarios to get some insight into the performance of each solution’s default setup. Each test had one process dedicated to enqueuing and another dedicated to dequeuing. Message volumes of 200, 20,000, and 200,000 messages were tested with payload sizes of 32, 1,024, and 32,768 bytes, using both persistent and transient queues and messages. Continue reading
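To make that setup concrete, here is a minimal sketch of the enqueue side of such a test, written with the Python pika client purely for illustration (Salvan’s actual harness is not reproduced here):

import time
import pika

MESSAGES = 20000
BODY = b"x" * 1024  # 1 KB payload, one of the sizes tested

# Connect to a local broker and declare a transient (non-durable) queue.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="bench", durable=False)

start = time.time()
for _ in range(MESSAGES):
    channel.basic_publish(exchange="", routing_key="bench", body=BODY)
elapsed = time.time() - start
print("enqueued %d msgs in %.2fs (%.0f msg/s)" % (MESSAGES, elapsed, MESSAGES / elapsed))
connection.close()

A second process draining the queue with basic_consume covers the dequeuing side, and the persistent scenario would declare the queue with durable=True and publish with delivery_mode=2.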

10 Ways to Make Hadoop Green in the CFO’s Eyes

Hadoop is used by some pretty amazing companies to make use of big, fast data—particularly unstructured data. Huge brands on the web like AOL, eBay, Facebook, Google, Last.fm, LinkedIn, MercadoLibre, Ning, Quantcast, Spotify, StumbleUpon, and Twitter use Hadoop, as do brick-and-mortar giants like GE, Walmart, Morgan Stanley, Sears, and Ford.

Why? In a nutshell, firms like McKinsey believe the use of big data and technologies like Hadoop will allow companies to compete and grow more effectively in the future.

Hadoop is used to support a variety of valuable business capabilities—analysis, search, machine learning, data aggregation, content generation, reporting, integration, and more. All types of industries use Hadoop—media and advertising, A/V processing, credit and fraud, security, geographic exploration, online travel, financial analysis, mobile phones, sensor networks, e-commerce, retail, energy discovery, video games, social media, and more. Continue reading

Upcoming Webinar: Paul Maritz on Pivotal and The New Platform for the New Era

The cloud, mobile applications and big, fast data are fundamentally changing how applications are built and modernized today. To speed this transformation at the enterprise level, Pivotal, the new venture by VMware and EMC, will host a live streaming event on April 24th at 10:00 am Pacific/1:00 pm Eastern with a special announcement and an unveiling of its plans to build “A New Platform for a New Era”.

The Pivotal platform will unite data, application, and cloud fabrics, helping enterprises to develop faster, understand more, and succeed at an even greater scale. It is a platform that makes the consumer grade enterprise a reality.

Pivotal brings together a prodigious set of technologies and talent from a number of EMC and VMware entities, including Greenplum, Cloud Foundry, Spring, GemFire and other products from the VMware vFabric Suite, Cetas, and Pivotal Labs.

>> Register for webinar here!

Paul Maritz, the Pivotal Leadership Team, and special guests will unveil the platform and make a special announcement during the live streaming event.

Sign up for the event at gopivotal.com and follow @gopivotal on Twitter for updates.

Banks Are Breaking Away From Mainframes to Big, Fast Data Grids

The world’s largest banks have historically relied on mainframes to manage all their transactions and the related cash and profit. In mainframe terms, hundreds of thousands of MIPS are consumed keeping these transactions running, and the cost per MIPS can make mainframes extremely expensive to operate. For example, Sears was seeing an overall cost of $3,000 to $7,000 per MIPS per year, which at hundreds of thousands of MIPS adds up to hundreds of millions of dollars annually, and didn’t see that as a cost-effective way to compete with Amazon. While the price of MIPS has continued to fall, mainframes can also face pure capacity issues.

In today’s world of financial regulations, risk, and compliance, the entire history of all transactions must be captured, stored, and available to report on or search, both immediately and over time. This way, banks can meet audit requirements and support scenarios like a customer service call where an agent searches the transaction history leading up to a customer’s current account balance. The volume of information created across checking, savings, credit card, insurance, and other financial products is tremendous—it’s large enough to bring a mainframe to its knees. Continue reading

How Instagram Feeds Work: Celery and RabbitMQ

Instagram is one of the poster children for social media success. Founded in 2010, the photo-sharing site now supports upwards of 90 million active users. As with every social media site, part of the fun is that photos and comments appear instantly, so your friends can engage while the moment is hot. At PyCon 2013 last month, Instagram engineer Rick Branson shared how Instagram had to transform the way photos and comments show up in feeds as it scaled from a few thousand tasks a day to hundreds of millions.

Rick started off his talk by demonstrating how traditional database approaches break, calling this the “naïve approach”. Here, to display a user’s feed, the application would directly fetch all the photos posted by the people the user follows from a single, monolithic data store, sort them by creation time, and then display only the latest 10:

SELECT * FROM photos
WHERE author_id IN
    (SELECT target_id FROM following
     WHERE source_id = %(user_id)d)
ORDER BY creation_time DESC
LIMIT 10;

Instead, Instagram chose to follow a modern distributed data strategy that allows them to scale nearly linearly. Continue reading
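The core of that strategy is fan-out-on-write: when a photo is posted, a background task pushes its id onto a precomputed feed for each follower, so reading a feed becomes a single cheap lookup. Below is a hedged sketch of the idea using Celery over RabbitMQ with a Redis-backed feed store; the task, key names, and feed store are illustrative assumptions, not Instagram’s actual code:

import redis
from celery import Celery

app = Celery("feeds", broker="amqp://guest@localhost//")
r = redis.StrictRedis()

FEED_LENGTH = 10  # keep only the newest items per follower

@app.task(ignore_result=True)
def fanout_photo(photo_id, author_id):
    # Push the new photo onto every follower's precomputed feed.
    for follower_id in r.smembers("followers:%d" % author_id):
        key = "feed:%s" % follower_id.decode()
        r.lpush(key, photo_id)
        r.ltrim(key, 0, FEED_LENGTH - 1)  # bound the list

Posting a photo then just enqueues fanout_photo.delay(photo_id, author_id) on RabbitMQ, and rendering a feed becomes a single LRANGE on the follower’s feed key instead of the join-and-sort above.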

Understanding Speed and Scale Strategies for Big Data Grids and In-Memory Colocation

The new generation of databases is opening up significant career opportunities for data modelers, admins, architects, and data scientists. In parallel, it’s transforming how businesses use data. It’s also making the traditional RDBMS look like a T-Rex.

Our web-centric world of social media and the internet of things is driving a sea change that is breaking traditional data design and management approaches. Data is coming in at ever-increasing speeds, and 80% of it cannot be easily organized into the neat little rows and columns associated with the traditional RDBMS.

Additionally, executives are realizing the power of bigger and faster data, such as responding to customer demands in real time. They want analysis, insights, and business answers in real time. They want the analysis to be done on data that is integrated across systems. And they don’t want to wait a day to load it into a data warehouse or data mart. As a result, developers are changing how they build applications. They are using different tools, different design patterns, and even different forms of SQL to parse data. Continue reading

How-To: Build a Geographic Database with PostGIS and vPostgres

Mobile location-based services are on the rise. After several false starts back in the mid-2000s, every mobile user now depends on their phone to tell them where they are and where their friends are, and to engage with social media like Facebook and Foursquare. A report by Juniper Research suggests this market is expected to exceed $12 billion next year, where it hardly existed at all a few years ago.

This is in part because mobile apps have become ubiquitous. To stay relevant, businesses need to interact socially and offer a web store that keeps them accessible to their wandering customers.

Building a geographically aware application from scratch sounds daunting, with a lot of initial data setup. It doesn’t have to be. Products like vFabric Postgres (vPostgres) can be used along with the PostGIS extension to perform geographic queries. Then, public data and an open source visualizer can be used to transform the query results into something meaningful for your application or end user.

Continue reading
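As a taste of what such a query looks like, here is a minimal sketch in Python with psycopg2, assuming a hypothetical places table with a geometry column geom in SRID 4326 (WGS 84):

import psycopg2

conn = psycopg2.connect("dbname=geodb")
cur = conn.cursor()

# Find places within 5 km of a point; the geography casts make
# ST_DWithin interpret the distance in meters.
cur.execute("""
    SELECT name
    FROM places
    WHERE ST_DWithin(
        geom::geography,
        ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
        %s)
""", (7.2620, 43.7102, 5000))  # lon/lat near Nice, 5 km radius

for (name,) in cur.fetchall():
    print(name)
conn.close()

The PostGIS extension itself is enabled once per database (CREATE EXTENSION postgis; on PostgreSQL 9.1 and later) before queries like this will run.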

Disaster Recovery Jackpot: Active/Active WAN-based Replication in GemFire vs Oracle and MySQL

Ensuring your systems run smoothly even when your data center has a hiccup or a real disaster strikes is critical for many companies to survive when hardships befall them. As we enter the age of the zettabyte, seamless disaster recovery has become even more critical and more difficult: there is more data than we have ever handled before, and the datasets involved are very, very big.

Most disaster recovery (DR) sites are in standby mode—assets sitting idle, waiting for their turn. The sites are either holding data copied through a storage area network (SAN) or using other data replication mechanisms to propagate information from a live site to a standby site.  When disaster strikes, clients are redirected to the standby site where they’re greeted with a polite “please wait” while the site spins up.

At best, the DR site is a hot standby that is ready to go on short notice.  DNS redirects clients to the DR site and they’re good to go.

What about all the machines at the DR site? With active/passive replication you can probably run queries on the slave site, but what if you want to make full use of all of that expensive gear and go active/active? The challenge is in the data replication technology. Most current data replication architectures are one-way, and those that aren’t often come with restrictions—for example, you need to avoid opening files with exclusive access. Continue reading
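To see why two-way replication is harder than it sounds, consider the toy sketch below: once both sites accept writes, every update needs enough metadata (here a timestamp and a site id) to resolve the case where the same key changes in both places. This is an illustrative last-writer-wins model only, not GemFire’s actual WAN gateway implementation:

import time

class Site:
    # A toy active/active replica: applies local writes and remote
    # updates, resolving conflicts by last-writer-wins.
    def __init__(self, site_id):
        self.site_id = site_id
        self.store = {}  # key -> (value, timestamp, origin_site)

    def put(self, key, value):
        update = (value, time.time(), self.site_id)
        self.store[key] = update
        return (key,) + update  # the event shipped over the WAN

    def apply_remote(self, key, value, ts, origin):
        current = self.store.get(key)
        # Accept only if newer; break timestamp ties by site id so
        # both sites converge on the same winner.
        if current is None or (ts, origin) > (current[1], current[2]):
            self.store[key] = (value, ts, origin)

ny, london = Site(1), Site(2)
london.apply_remote(*ny.put("acct:42", 100))  # NY write replicates to London
ny.apply_remote(*london.put("acct:42", 250))  # later London write wins on both sites

One-way (active/passive) replication sidesteps all of this because only one site ever originates changes; GemFire’s WAN gateways take a similar event-tagging approach, using distributed system ids and letting you plug in your own conflict resolver.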

PostgreSQL Security Release Now Available

IMPORTANT SECURITY UPDATE NOW AVAILABLE

>> Download the VMware vFabric Postgres security update here!

The PostgreSQL Project has released an important security update for all supported branches, delivered as versions 9.2.4, 9.1.9, 9.0.13, and 8.4.17. Likewise, VMware has released updated versions of the vFabric Postgres distribution.

This release includes a fix for a high-exposure security vulnerability (CVE-2013-1899) that is considered so serious the PostgreSQL project pre-announced the release last week and temporarily closed public access to its development repository, to keep the vulnerability from being exploited before an update was readily available to protect users.

All users are strongly urged to apply the update as soon as possible. Continue reading

How to Perform Security Updates on vFabric Postgres

The PostgreSQL community announced last week that an important security update would be released on April 4, 2013. The release includes a fix for a high-exposure security vulnerability, and all users are strongly urged to apply the update as soon as it is available. Knowing how disruptive urgent security updates can be to IT and developers, the PostgreSQL community issued advance warning in the hope that it would ease the impact on day-to-day operations while helping as many companies as possible adopt the update quickly.

As such, we would like to take the time to remind everyone how important these security updates are to your business, and how to apply them most efficiently for vFabric Postgres.

The Cost of Missing Security Updates

Maintenance and security updates are essential for extending the longevity of an application, as well as for keeping the confidence of the customers who use services based on it.

When big data disasters hit, the impacts quickly move beyond the financial to affect reputation and trust. Databases are a particular area of concern. A recent article titled “Making Database Security Your No. 1 2013 Resolution” cited a Verizon study showing that only 10 percent of total security spend goes into database protection, while 92 percent of stolen data comes out of databases.

According to the seventh annual U.S. Cost of a Data Breach report from the Ponemon Institute, the average data breach cost $5.5M in 2011, or $194 per record. While $5.5M may not sound like a lot to some companies, losing one million records at $194 per record works out to $194 million. Continue reading