Recently, VMware worked with the Ocean Observatories Initiative on an interesting case study that affects us all. The U.S. has built an ocean of big data on the ocean itself. Currently, we are collecting about 8 terabytes a day, or roughly 3 petabytes a year, of data about the ocean in order to more efficiently and safely study the body of water that covers over 70% of the Earth.
The Ocean Observatories Initiative (OOI) is a 25-year program responsible for managing a networked set of hundreds of sensor instruments that sit in the ocean, take measurements, send data back to a massive data infrastructure, and make data-sets and reports available to oceanographers, scientists, educators, and the public on a very broad scale. This system, quite literally, is a Hubble Telescope for observing the ocean. While this mega-system has an amazing history and tons of interesting capabilities, we think it’s pretty cool that VMware vSphere and vFabric RabbitMQ play key roles.
Monitoring the Oceans
As information technologists, we’ve all dealt with monitoring systems in some way, shape, or form, but imagine if the ocean were your data center.
Instead of monitoring disk, CPU, or memory usage, the OOI captures data from global, regional, and coastal sensors and pulls the information into a common information management system called the Common Operating Infrastructure (COI). The sensors include telemetering buoys, electro-optical seafloor cables, underwater gliders, AUVs, profilers, moorings, fixed instrument chains, seafloor equipment, and sub-seafloor installations to monitor information – a total of 49 classes and over 700 instruments deployed off of 6 coastlines. Above, we have a diagram showing examples.
These sensors help scientists to:
- Provide continuous, real-time information about climate, circulation, ecosystem dynamics, air-sea exchange, seafloor processes, and plate-scale dynamics.
- Track dozens of measurements like humidity, water velocity, salinity, pressure, chlorophyll, and nitrates across various physical, chemical, geological, and biological variables on a coastal, regional, and global scale.
- Measure at microscopic and global levels.
- Capture a massive amount of time and space-based information (think geo/map data combined with calendar/time data on steroids).
Processing and Using the Ocean Observatories Data
Most importantly, all of this data is made available via “virtual observatories” to scientists. In the diagram below, the data moves from sensors on the left and through physical interfaces as part of Marine Management and Operations. The data is processed in various ways to correlate information, calibrate instruments, add quality assurance, provide management capabilities (e.g. security policies, metadata cataloging, notifications), and more. This is shown in the diagram above.
As the data is made available to end users, the data is distributed through GUIs and downloaded as data-sets via various formats like CSV and potentially Keyhole Markup Language or MATLAB binary files. The data supports a variety of consumption use cases that ultimately support scientific analysis and presentation. See an example of the end-user GUI below.
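To give a feel for what consuming a downloaded CSV data-set might look like, here is a minimal Python sketch. The column names (`time`, `depth_m`, `salinity_psu`) and the sample values are made up for illustration; the actual OOI export layout may well differ.

```python
import csv
import io
from statistics import mean

# Hypothetical sample rows; real OOI column names, units, and values may differ.
SAMPLE_CSV = """time,depth_m,salinity_psu
2013-01-01T00:00:00Z,10,33.1
2013-01-01T00:10:00Z,10,33.3
2013-01-01T00:20:00Z,10,33.2
"""


def mean_salinity(csv_text: str) -> float:
    """Average the (assumed) salinity_psu column of a CSV data-set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return mean(float(row["salinity_psu"]) for row in reader)


print(round(mean_salinity(SAMPLE_CSV), 2))
```

In practice the text would come from a downloaded file rather than an inline string, but the parsing step would look much the same.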
Under the Hood – A System Architecture Overview
The entire system is referred to as the Integrated Observatory Network (ION), and ION provides a unifying information conduit with additional capabilities like identity management, governance, state management, resource management, a service framework, and a presentation framework. The subsystems and COI capabilities are held together by a messaging service, based on RabbitMQ, that is applied in a cross-cutting way to the interaction of all elements across all subsystems.
OOI calls this messaging system the "Exchange". The diagram shows the Exchange components. It uses pubsub messaging and queues as the central paradigm to wire and integrate all information between all applications. The subsystems (i.e. applications) include:
- The Data Management subsystem – manages the dynamic data distribution network of data products and metadata based on the OOI-CI common data model.
- The Sensing and Acquisition (SA) subsystem – provides the life cycle and operational management of sensor network environments as well as observing activities (i.e., scheduling, collecting, processing, calibration) associated with sensor data acquisition.
- The Analysis and Synthesis (AS) subsystem – provides capabilities to support advanced data analysis and output synthesis applications. This includes the visualization of science data products, the execution of user-provided real-time and interactive analysis workflows, and the operational management of numerical models.
- The Planning and Prosecution (PP) subsystem – provides services together with the standard models for the management of stateful and taskable resources. It provides controller processes with the semantics to monitor and control the operating state of an active resource as well as to initiate, monitor and amend tasks being carried out by a taskable resource.
- The Common Execution Infrastructure (CEI) – provides an infrastructure for the virtualization of computing across the OOI, including taskable resource provisioning, remote operational management, and process execution.
External integrations are also considered a key type of system:
- The user experience and application interfaces via HTML (web user interfaces), direct data access, and through exchange APIs (if authorized).
- Marine integration provides facilities for agents to manage physical instruments through a common interface to provide status, represent capabilities, perform data acquisition, take commands, and more.
- External observatory integration allows for adapters, scripts, and integration tools for external data input and output.
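The marine integration item above describes agents that expose physical instruments through one common interface. As a rough sketch only, that idea could be modeled as an abstract base class. The class and method names below (`InstrumentAgent`, `status`, `capabilities`, `acquire`, `execute`) are invented for illustration and are not taken from the OOI code base.

```python
from abc import ABC, abstractmethod


class InstrumentAgent(ABC):
    """Hypothetical common interface; names are illustrative, not OOI's API."""

    @abstractmethod
    def status(self) -> str:
        """Report the instrument's current state."""

    @abstractmethod
    def capabilities(self) -> list:
        """List the commands this instrument accepts."""

    @abstractmethod
    def acquire(self) -> dict:
        """Take one measurement and return it."""

    @abstractmethod
    def execute(self, command: str) -> None:
        """Carry out one of the supported commands."""


class SimulatedSalinitySensor(InstrumentAgent):
    """A stand-in instrument that returns a fixed, made-up reading."""

    def __init__(self):
        self._running = False

    def status(self):
        return "running" if self._running else "idle"

    def capabilities(self):
        return ["start", "stop"]

    def acquire(self):
        if not self._running:
            raise RuntimeError("sensor is idle")
        return {"salinity_psu": 33.2}

    def execute(self, command):
        if command not in self.capabilities():
            raise ValueError(f"unsupported command: {command}")
        self._running = command == "start"


sensor = SimulatedSalinitySensor()
sensor.execute("start")
print(sensor.status(), sensor.acquire())
```

Because callers only depend on the abstract interface, a glider, a buoy, or a seafloor instrument could each sit behind its own agent without the rest of the system changing.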
How the Exchange Works with RabbitMQ
- Message Clients are the interfaces to application logic.
- Exchange Points are the 'postboxes': where messages can be sent to and received from.
- Exchange Spaces group points and permitted users; in other words, they are like 'postal services'.
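The three concepts above can be pictured with a small in-memory Python model. This is only a sketch of the postbox analogy: it does not talk to RabbitMQ, and the class and method names are assumptions made for illustration rather than ION's actual API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExchangePoint:
    """A 'postbox': messages are sent to it and received from it."""
    name: str
    _messages: deque = field(default_factory=deque)

    def send(self, message: dict) -> None:
        self._messages.append(message)

    def receive(self):
        # Return the oldest message, or None when the postbox is empty.
        return self._messages.popleft() if self._messages else None


@dataclass
class ExchangeSpace:
    """A 'postal service': groups points and the users permitted to use them."""
    name: str
    permitted_users: set = field(default_factory=set)
    points: dict = field(default_factory=dict)

    def point(self, user: str, point_name: str) -> ExchangePoint:
        if user not in self.permitted_users:
            raise PermissionError(f"{user} may not use space {self.name}")
        # Create the point the first time it is asked for.
        return self.points.setdefault(point_name, ExchangePoint(point_name))


class MessageClient:
    """The interface that application logic uses to reach the Exchange."""

    def __init__(self, user: str, space: ExchangeSpace):
        self.user = user
        self.space = space

    def send(self, point_name: str, message: dict) -> None:
        self.space.point(self.user, point_name).send(message)

    def receive(self, point_name: str):
        return self.space.point(self.user, point_name).receive()


space = ExchangeSpace("ooi-demo", permitted_users={"instrument", "processor"})
MessageClient("instrument", space).send("raw_data", {"salinity": 33.2})
print(MessageClient("processor", space).receive("raw_data"))
```

A client that is not in the space's permitted users gets a `PermissionError`, which mirrors the idea that a space bundles both the postboxes and who may use them.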
For example, the diagram depicts raw data and metadata coming in from an instrument in Portland and being put on the Exchange for consumption by any interested party. In this case, the raw stream is noted in the repository and picked up for data processing in an Amazon cloud. The raw data is also turned into processed data and put back on the Exchange, where the repository is updated again along with a research team’s event detection.
This is an example of how message-oriented systems enable "loosely coupled" integration, because the message senders are not directly coupled to message receivers. Instead, everyone is connected to the Exchange. Loose coupling is an important architectural property that has beneficial influences on maintainability, extensibility, robustness, scalability and other quality properties of the system and its individual software components.
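The flow just described can be imitated with a tiny in-memory publish/subscribe sketch in Python. It is a stand-in for the pattern, not for RabbitMQ or ION: the topic names (`raw`, `processed`), the calibration offset, and the detection threshold are all invented for the example.

```python
from collections import defaultdict


class Exchange:
    """In-memory stand-in for a pub/sub broker (not RabbitMQ itself)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher never names its receivers; the Exchange fans out.
        for handler in list(self._subscribers[topic]):
            handler(message)


def run_demo():
    exchange = Exchange()
    repository = []  # record of everything noted from the Exchange
    events = []      # detections made by a research team's subscriber

    def record(kind):
        return lambda msg: repository.append((kind, msg["id"]))

    def process(msg):
        # Hypothetical processing step: apply a made-up calibration offset,
        # then put the result back on the Exchange.
        exchange.publish("processed", {"id": msg["id"], "value": msg["value"] + 0.5})

    def detect(msg):
        # Hypothetical event rule: flag processed values above a threshold.
        if msg["value"] > 30.0:
            events.append(msg["id"])

    exchange.subscribe("raw", record("raw"))
    exchange.subscribe("raw", process)
    exchange.subscribe("processed", record("processed"))
    exchange.subscribe("processed", detect)

    # The instrument only knows the topic, not who is listening.
    exchange.publish("raw", {"id": 1, "value": 29.8})
    return repository, events


print(run_demo())
```

Note that the instrument's `publish` call would stay exactly the same if another consumer subscribed to `raw` later on, which is the loose coupling the paragraph above refers to.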
- According to OOI’s documentation on release 1, ION uses AMQP 0.9.1 and RabbitMQ-Server v. 2.3.1 on CentOS 5.5.
- To learn more about RabbitMQ visit the website, podcast, product page, or download a trial.
- See the full case study, "With VMware RabbitMQ, OOI Gives Ocean Scientists Vast New Infrastructure."