
22 Billion Served: Julien Genestoux of Superfeedr

Superfeedr had delivered a total of 22,408,346,028 entries at the time of this article.

In the next hour, they will publish almost one million more entries.

If you haven’t heard of Superfeedr, the service helps publishers and subscribers manage their feeds. Publishers get a simple way to implement the PubSubHubbub protocol, and they can also distribute and analyze content on their customized hub. Subscribers can speed up data collection, track keywords, normalize content, and more.

They are also fans and users of both Redis and RabbitMQ.


We recently had the good fortune of having an interview with Julien Genestoux, founder of Superfeedr. Here is what we learned.

Q1 | In your own words, what is your job?

A1 | We are building a real-time infrastructure for feeds. We work with web services that use RSS or Atom feeds and help them get the data faster, in a normalized way. We also work with publishers to help them push their feeds to anyone interested in their content, like search engines, API partners, or federated web services.

Q2 | How is your company using RabbitMQ and Redis?

A2 | Obviously, our whole company is “job” oriented. We organize the fetching and parsing of feeds based on incoming requests (pings) as well as internal scheduling. This is how we use Rabbit and Redis respectively.

To explain our incoming requests in more detail, we get pings from the publishers we work with as well as from partners. Our front-end webservers distribute pings to our back-end servers through a couple of RabbitMQ queues. As of today, we get between 2,000 and 5,000 pings per second, and RabbitMQ does a very good job of distributing the data between our frontend (about 40 processes) and our backend (about 200 processes).
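A minimal sketch of this fan-out, assuming Node.js with the amqplib client (the queue name, connection URL, and handler are illustrative, not Superfeedr's actual setup): front-end processes publish each ping onto a shared work queue, and back-end workers consume from it, letting RabbitMQ balance the load across them.

```typescript
// Sketch of a ping work queue with amqplib (illustrative names only).
import amqp from "amqplib";

const QUEUE = "pings"; // hypothetical queue name

// Front-end side: push an incoming ping onto the durable queue.
async function publishPing(feedUrl: string): Promise<void> {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });
  ch.sendToQueue(QUEUE, Buffer.from(feedUrl), { persistent: true });
  await ch.close();
  await conn.close();
}

// Back-end side: each worker takes one ping at a time and acks it
// after processing, so RabbitMQ spreads pings across all workers.
async function consumePings(): Promise<void> {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });
  await ch.prefetch(1);
  await ch.consume(QUEUE, async (msg) => {
    if (!msg) return;
    const feedUrl = msg.content.toString();
    // ...schedule a fetch/parse for feedUrl here...
    ch.ack(msg);
  });
}
```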

We also have a strong scheduling requirement because we schedule feeds to be fetched. In our architecture, we designed something we call “rings” with Redis. Our rings are basically lists of keys that we rotate (pop in front, push in the back) at various speeds. The feeds that need to be fetched the most often are put on the smaller rings, while the ones that do not need to be fetched as often are put on bigger rings. We currently have 6 rings, and these hold between 10k objects and 4M objects. It takes between 10 seconds and 1 hour to rotate them.
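A ring like this can be modeled as a plain Redis list that a worker rotates continuously. The sketch below assumes the ioredis client; the key name, interval, and fetch function are hypothetical stand-ins, not Superfeedr's code.

```typescript
// Sketch of a "ring" scheduler as a rotating Redis list (ioredis).
import Redis from "ioredis";

const redis = new Redis(); // defaults to localhost:6379

const RING_KEY = "ring:fast";  // hypothetical name for a small, fast ring
const POP_INTERVAL_MS = 1;     // tune so size * interval ≈ rotation period

async function rotateOnce(): Promise<void> {
  // Pop the next feed ID from the front of the ring.
  const feedId = await redis.lpop(RING_KEY);
  if (!feedId) return; // ring is empty

  try {
    await fetchAndParseFeed(feedId); // application-specific work
  } finally {
    // Push the ID back onto the tail so it comes around again.
    await redis.rpush(RING_KEY, feedId);
  }
}

async function fetchAndParseFeed(feedId: string): Promise<void> {
  // Placeholder for the real fetch/parse pipeline.
  console.log(`fetching feed ${feedId}`);
}

// Drive the ring at a fixed cadence; larger rings use a slower cadence.
setInterval(() => {
  rotateOnce().catch((err) => console.error(err));
}, POP_INTERVAL_MS);
```

With this layout, a ring of 10k keys rotated every millisecond comes back around in roughly 10 seconds, while a 4M-key ring at a slower cadence takes on the order of an hour, matching the rotation times described above.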

We also use Redis as a cache for several types of data—one example is for unique identifiers so we can quickly identify new entries from the old entries.
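One way to implement such a duplicate check, again assuming ioredis and a hypothetical key layout, is a Redis set per feed: SADD returns 1 only when the member was not already present, so a single round trip tells you whether an entry ID is new.

```typescript
// Sketch of a new-vs-old entry check using a Redis set (ioredis).
import Redis from "ioredis";

const redis = new Redis();

async function isNewEntry(feedId: string, entryId: string): Promise<boolean> {
  // One set of seen entry IDs per feed (illustrative key layout).
  const added = await redis.sadd(`seen:${feedId}`, entryId);
  return added === 1; // 1 = first time we see this ID, 0 = already cached
}
```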

Q3 | Why did you select RabbitMQ and Redis?

A3 | Both of these tools were designed with speed in mind—an obvious requirement for us. We also loved how Redis could play with different data structures—these structures often represent our business logic better than “rows” in a table.

Q4 | What was your background with these products before deploying them at Superfeedr?

A4 | We started from scratch with both products and learned how to use them better as we scaled up.

Q5 | What was your learning curve like?

A5 | RabbitMQ’s learning curve was pretty simple thanks to the large number of compatible libraries. We even started with a STOMP broker that was not designed to work with RabbitMQ. As we read more about RabbitMQ, we were able to fine-tune our settings quite quickly. It took us a couple of months to reach a scale where the “default” settings and setup were no longer enough.

The learning curve for Redis was a little more complex, as it’s probably a much more innovative product. Getting to know all the data structures, their pros and cons, and exactly how to represent the data on our end to improve read/write speeds was very important. Redis also grew a lot, introducing and removing features that we tested to see if they’d fit our needs.

Q6 | What did you like best about using each?

A6 | RabbitMQ: the fact that it’s easy to forget about it 🙂

Redis: the speed.

Q7 | Are you looking at any additional projects for Rabbit or Redis?

A7 | Yes, we’ve been waiting for Redis Cluster since it was announced by Salvatore (is it 3 years ago already?). This may be a game changer in the NoSQL space and may mean we’ll finally be able to use Redis as our only data store. (We currently use Riak and MySQL as well.)

We also use both on several side projects, mostly Redis because it’s so easy to set up and get started.

Q8 | Anything else you might want to mention to the community?

A8 | We’re always looking for new talent to work on side projects. Please get in touch if you love Redis and RabbitMQ as well as Node.js, which is our current language/framework of choice 🙂. We’re particularly excited about our Google Reader API replacement, and we’d love your feedback on it. Read more about it on our blog.

Q9 | Can you give us the traditional bio or “marketing blurb” about yourself?

A9 | Sure. Julien Genestoux is the founder of Superfeedr. Superfeedr fetches and parses RSS or Atom feeds on behalf of its users and then pushes the new entries to subscribing applications. It is now the leading real-time feed provider on the web and hosts the vast majority of PubSubHubbub hubs. Late in 2009, the company received funding from Betaworks and Mark Cuban. A year later, the company broke even.

Julien is a strong open web advocate and will push (pun intended) anyone to use standard protocols rather than custom-made APIs. For example, Superfeedr recently introduced SubToMe, which aims to be a smart and universal “follow button” for the open web. It works with most news readers out there and can be very easily put on any website. Try it out, and let us know what you think!