Instagram is one of the poster children for social media success. Founded in 2010, the photo-sharing site now supports upwards of 90 million active users. As with every social media site, part of the fun is that photos and comments appear instantly so your friends can engage while the moment is hot. At PyCon 2013 last month, Instagram engineer Rick Branson shared how Instagram had to transform the way these photos and comments show up in feeds as the site scaled from a few thousand tasks a day to hundreds of millions.
Rick started his talk by demonstrating how traditional database approaches break down, calling this the “naïve approach”. In this approach, to display a user’s feed, the application directly fetches all the photos posted by the accounts the user follows from a single, monolithic data store, sorts them by creation time, and displays only the latest 10:
SELECT * FROM photos WHERE author_id IN (SELECT target_id FROM following WHERE source_id = %(user_id)d) ORDER BY creation_time DESC LIMIT 10;
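To make the naïve approach concrete, here is a minimal, self-contained sketch using Python's built-in sqlite3 module. The table layouts, the sample rows, and the build_demo_db and naive_feed helpers are assumptions made for illustration. The talk only showed the query itself, so this is not Instagram's actual schema or database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE following (source_id INTEGER, target_id INTEGER);
        CREATE TABLE photos (
            id INTEGER PRIMARY KEY,
            author_id INTEGER,
            creation_time INTEGER
        );
        """
    )
    # User 1 follows users 2 and 3, but not user 4.
    conn.executemany("INSERT INTO following VALUES (?, ?)", [(1, 2), (1, 3)])
    # Rows are (id, author_id, creation_time); creation_time is a simple counter.
    conn.executemany(
        "INSERT INTO photos VALUES (?, ?, ?)",
        [(1, 2, 100), (2, 3, 200), (3, 4, 300), (4, 2, 400)],
    )
    return conn


def naive_feed(conn, user_id, limit=10):
    """Fetch the newest photos from everyone user_id follows, in one query."""
    return conn.execute(
        """
        SELECT id, author_id, creation_time
        FROM photos
        WHERE author_id IN (
            SELECT target_id FROM following WHERE source_id = ?
        )
        ORDER BY creation_time DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()


conn = build_demo_db()
print(naive_feed(conn, user_id=1))
# prints [(4, 2, 400), (2, 3, 200), (1, 2, 100)]
```

In this sketch, every feed view re-runs the subquery and the sort against the single data store, so the work per request grows with the number of followed accounts and photos.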
Instead, Instagram chose to follow a modern distributed data strategy that will allow them to scale nearly linearly.
For those not familiar with how large Indeed is, the job search company Indeed.com runs one of the largest websites in the world. According to Alexa.com, Indeed is currently the 224th biggest website in the world, and in cities like Atlanta and Chicago, it’s the 55th most popular website overall. According to research by SilkRoad, 2 out of every 5 hires came via Indeed (based on data from 150,000 hires).
As expected, the engineering team behind this large-scale application needs to support some very large numbers. In a recent post on their company blog, the Indeed team shared just how big those numbers are:
More than 100 million monthly unique visitors
More than 3 billion searches per month
More than 1000 searches per second
50 country-specific sites in 26 languages
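As a rough sanity check on these figures, the monthly search volume can be converted to an average per-second rate. The sketch below assumes a 30-day month, and the helper name is hypothetical. It is illustrative arithmetic only, not anything taken from Indeed's own tooling.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(monthly_total, days_in_month=30):
    """Convert a monthly total into an average per-second rate.

    days_in_month=30 is an assumption; real months run from 28 to 31 days.
    """
    return monthly_total / (days_in_month * SECONDS_PER_DAY)


rate = average_per_second(3_000_000_000)
print(f"{rate:.0f} searches per second on average")
# prints "1157 searches per second on average"
```

Under that assumption, 3 billion searches a month averages a little over 1,100 per second, which is consistent with the "more than 1000 searches per second" figure. Peak traffic is presumably higher than the average.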
The scale of their application, both in terms of processing throughput and geographic diversity, means that the team relies on a messaging layer powered by RabbitMQ.
Apache Derby is used for its RDBMS components, JDBC driver, query engine, and network server.
The partitioning technology of GemFire is used to implement horizontal partitioning features of vFabric SQLFire.
vFabric SQLFire enhances Apache Derby components such as the query engine, the SQL interface, data persistence, and data eviction, and adds components like SQL commands, stored procedures, system tables, functions, persistence disk stores, listeners, and locators, to operate a highly distributed and fault-tolerant data management cluster.
Today, we are pleased to have a guest blogger from a VMware customer share with us their story of how RabbitMQ transformed their business by “solving some really interesting problems”. The following is sent courtesy of Pablo Molnar of MercadoLibre:
If you haven’t heard of MercadoLibre (NASDAQ: MELI), we are the largest e-commerce ecosystem in Latin America. Our website offers a wide range of services to sellers and buyers throughout the region, including marketplace, payments, advertising, and e-building solutions. Our products are present in over 14 countries, and the company is ranked as the 8th largest online retailer in the world. We were also on Fortune’s list of the fastest growing companies in 2012, and we use RabbitMQ to solve some interesting problems.
About Our Technology Stack and How RabbitMQ Helps
In terms of technology infrastructure, MercadoLibre is fully committed to the open source development model. Most of our apps are written in Grails, Groovy, and NodeJS, but we don’t stick to any one language or framework. We entrust tool selection to the Software Engineers on each team. Almost all applications are hosted by our in-house cloud computing provisioning system, which is implemented via OpenStack and currently runs more than 7,000 virtual instances. We have also successfully launched applications using emerging storage solutions like Redis and MongoDB. With an average of 20 million requests per minute and 4 GB of bandwidth per second, our traffic management layer is crucial, and most of the routing work is done by Nginx proxy servers. Our labs department includes a huge Apache Hadoop cluster to perform complex analytical queries, and we are experimenting with real-time data processing using Apache Kafka and Storm.
IT organizations are facing significant challenges maintaining legacy mainframe applications: challenges ranging from the high cost of proprietary hardware and software, to the attrition of people with qualified mainframe skills and experience, and the inability to support modern computing demands of mobile and big fast data.
Cloud computing offers an opportunity to rationalize and modernize application portfolios, which can include migrating legacy mainframe apps to the cloud. Unfortunately, many IT organizations see the prospect of modernizing mainframe apps as a “mission impossible”: the path forward is too cloudy, and the costs and risks are too great.
As a result, many resign themselves to living with the burdens of a legacy mainframe environment. And while maintaining the status quo may appear to be the best option, over time it only intensifies the challenges associated with maintaining mainframe apps. Eventually the business loses confidence in IT’s ability to deliver, and costs continue to rise without corresponding value.
Next year is going to be even bigger with the Pivotal Initiative, as several of the products covered on this blog will be moving to the new venture. This is still in the planning stages, so we expect to share the plans for our products alongside the formal communications from each of the companies involved. (Sorry, no extra information is available right now.)
Those three words often mean a lot of things – a lot of work, a lot of change, a lot of cost savings, a lot of leadership, and a lot of coordination. Of course, the payoff of doing it right can also be outstanding.
We had the opportunity to gather personal, anonymous observations from a senior technical architect at a European consulting firm who knows firsthand that data center consolidation can create value, citing “moving thirteen datacenters run by thirteen teams to six data centers run by one team is the catalyst for huge improvements in many areas.” Our architect’s company provides recommendations, architecture, installation, customized solutions, and operations services for IT. In conversation with VMware, the architect explained that deployment automation is a critical requirement in many of their clients’ consolidation plans, and pointed out how vFabric Application Director is fundamental to the approach.
The next release of Hyperic is coming up soon and the biggest change is to the backend. In the next release, we will only support one database, namely PostgreSQL. Those of you who have been with Hyperic as long as I have may be surprised considering our history with PostgreSQL, but as you read through this blog, it will start to make sense.
History of PostgreSQL and Hyperic
For the last few years Hyperic has supported only two databases for production use at scale: Oracle and MySQL. This in itself was a big change, since at one point PostgreSQL was our bread and butter. Hyperic was originally designed on PostgreSQL 7.x. As an open source project, PostgreSQL has a very easy license for distribution. As a startup company we had to get our product out into the marketplace quickly and affordably, so PostgreSQL made sense.
Register for Session OPS-CIM2646 – Cloud Application Platform Automation on vSphere Infrastructure Leveraging Application Director: Real-World Example of Running a 4 Billion-Dollar Business (VMware IT)
Register for Session APP-CAP2757 – Accelerate Adoption by Leveraging IaaS for a Complete Deployment and Monitoring Lifecycle
Register for Session OPS-CIM2852 – Automated Provisioning for Business Critical Applications (Microsoft/Java) in Private or Public Cloud
Follow all VMware AppMgmt updates at VMworld on Twitter
Now that number is up to 90%. Here’s an overview of what the business workload lifecycle management implementation looks like under the hood.
As shared in the earlier post, our goal was to automate end-to-end application lifecycle management in a private cloud and eventually across clouds. Automation by definition speeds things up and makes them less error prone, but in this case, it also meant that VMware’s IT organization could decouple itself from the everyday operations of the app and product teams it serviced. This split between IT and DevOps is a goal for many organizations today that are looking to be more agile, save money, and maintain strong IT governance.
To achieve it, VMware IT automated several key processes across organizations including:
Recently, VMware worked with the Ocean Observatories Initiative to discuss an interesting case study that affects us all. The U.S. has built an ocean of big data on the ocean itself. Currently, we are collecting about 8 terabytes a day, or roughly 3 petabytes a year, of data about the ocean in order to more efficiently and safely study the body of water that covers over 70% of the Earth.
The Ocean Observatories Initiative (OOI) is a 25-year program responsible for managing a networked set of hundreds of sensor instruments that sit in the ocean, take measurements, send data back to a massive data infrastructure, and make data sets and reports available to oceanographers, scientists, educators, and the public on a very broad scale. This system, quite literally, is a Hubble Telescope for observing the ocean. While this mega-system has an amazing history and tons of interesting capabilities, we think it’s pretty cool that VMware vSphere and vFabric RabbitMQ play key roles.