Expanso is a 2024 Nominee for the TECHCOnnect Innovation Showcase.
In today's rapidly digitizing landscape, data generation at the edge, far from centralized data centers, is growing at an unprecedented rate. This data spans a wide range, from application logs for troubleshooting to extensive CCTV footage for security. Machine Learning (ML) has the potential to unlock this data's insights. Yet there's a challenge: data is typically generated at the edge, while ML training and inference are centralized, so the data must transit the network before it can be used.

Bacalhau changes this dynamic. It lets you keep using familiar binaries and architecture while bringing ML inference to the data's location. This decentralization drastically cuts costs, improves efficiency, and provides real-time insights, enhancing system reliability and response times. Running ML at the edge also boosts data security, simplifies system design, and eases management.
A Scenario
Consider a scenario with 100 cameras recording in 4K resolution at 24 FPS, generating 1 TB of data hourly and totaling about 9.6 petabytes annually. For the cost analysis, we use 50 e2-standard-8 virtual machines on GCP, each comparable to a solid Intel NUC and each processing 50 frames per second with the YOLOv5 model. With a cost of $28.90 per hour per instance, our approach using Bacalhau offers over 95% savings compared to centralized processing on AWS, GCP, or Azure. Previously, this massive data trove was underutilized because insights were slow to generate. With Bacalhau distributing ML inference models onto local hardware, you not only gain valuable insights and enhanced security but also save significantly on central compute costs.
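The sizing above can be sanity-checked with a few lines of arithmetic: 100 cameras at 24 FPS produce 2,400 frames per second, while 50 nodes at 50 FPS each provide 2,500 frames per second of YOLOv5 inference capacity. A minimal sketch, using only the figures from the scenario:

```python
# Sanity-check the cluster sizing from the scenario above.
cameras = 100          # 4K cameras
fps_per_camera = 24    # frames per second per camera
nodes = 50             # e2-standard-8 VMs (or comparable Intel NUCs)
fps_per_node = 50      # YOLOv5 frames per second each node can process

required_fps = cameras * fps_per_camera   # 2,400 frames/s to analyze
capacity_fps = nodes * fps_per_node       # 2,500 frames/s available

print(f"required: {required_fps} fps, capacity: {capacity_fps} fps")
assert capacity_fps >= required_fps  # the fleet keeps up in real time
```

With roughly 4% headroom, the 50-node fleet can analyze every frame as it is produced, which is what makes the real-time insights described above possible.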
The Benefits of this Setup
- Decentralization of Machine Learning (ML) Inference: Bacalhau shifts the paradigm by bringing ML inference directly to the data’s location, significantly reducing the need for data transit to centralized locations for processing. This approach facilitates real-time insights, enhances system reliability, and improves response times.
- Cost Reduction and Efficiency: By distributing ML inference models on local hardware, Expanso demonstrates over 95% savings on compute costs compared to traditional centralized processing methods used by AWS, Google Cloud, and Azure.
- Enhanced Data Security and Simplified System Design: Running ML at the edge improves data security and simplifies the overall system design and management, making it easier for organizations to adopt and maintain.
| Provider | Storage | Access | AI (run and train) | Total | Savings |
| --- | --- | --- | --- | --- | --- |
| AWS | $202,604 | $378,432 | $15,140,530 | $15,721,566 | 19.35% |
| Google Cloud | $192,480 | $378,432 | $18,922,450 | $19,493,362 | 0.00% |
| Azure | $154,484 | $151,373 | $15,214,580 | $15,520,437 | 20.38% |
| Bacalhau | $192,480 | $0 | $738,037.40 | $930,517.90 | 95.23% |
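The Savings column appears to be computed relative to the most expensive option, Google Cloud (its row shows 0.00%). Assuming that baseline, the percentages can be reproduced from the Total column:

```python
# Reproduce the Savings column, assuming Google Cloud (the most
# expensive option) is the 0% baseline.
totals = {
    "AWS": 15_721_566,
    "Google Cloud": 19_493_362,
    "Azure": 15_520_437,
    "Bacalhau": 930_517.90,
}

baseline = max(totals.values())  # Google Cloud's annual total
for provider, total in totals.items():
    savings = (1 - total / baseline) * 100
    print(f"{provider}: {savings:.2f}% savings")
```

Running this yields 19.35% for AWS, 20.38% for Azure, and 95.23% for Bacalhau, matching the table.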
How Bacalhau Works
Bacalhau is open-source software built to orchestrate jobs wherever they are deployed, even over unreliable networks. It operates as a self-organizing network that manages the distribution and execution of jobs. A unique aspect of Bacalhau is its use of a bidding system for job distribution: compute nodes, which are aware of their own hardware capabilities, assess whether they can handle a job before bidding on it. This approach optimizes the distribution and execution of tasks, ensuring efficient use of resources.
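The bidding idea can be illustrated with a toy sketch. This is a conceptual illustration only, not Bacalhau's actual API or implementation: each node compares a job's resource requirements against its own capabilities and bids only if it can run the job.

```python
from dataclasses import dataclass

# Conceptual sketch of bid-based job matching: each compute node
# knows its own hardware and bids only on jobs it can actually run.
# Names and fields here are hypothetical, not Bacalhau's real API.

@dataclass
class Job:
    cpu_cores: int
    memory_gb: int

@dataclass
class Node:
    name: str
    cpu_cores: int
    memory_gb: int

    def bid(self, job: Job) -> bool:
        # Bid only when local capacity covers the job's requirements.
        return self.cpu_cores >= job.cpu_cores and self.memory_gb >= job.memory_gb

def schedule(job: Job, fleet: list[Node]) -> list[str]:
    # Collect the names of the nodes that placed a bid for this job.
    return [n.name for n in fleet if n.bid(job)]

yolo_job = Job(cpu_cores=8, memory_gb=32)  # e.g., a YOLOv5 inference task
fleet = [
    Node("nuc-1", cpu_cores=8, memory_gb=64),
    Node("nuc-2", cpu_cores=4, memory_gb=16),  # too small; will not bid
]
print(schedule(yolo_job, fleet))
```

Because each node evaluates its own capacity locally, no central scheduler needs a global, up-to-date view of every machine, which is what makes the scheme robust over unreliable networks.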
Requirements
- Edge Hardware: The first requirement is edge hardware capable of executing and retraining medium-sized inference models, like YOLOv5. Compact yet powerful machines like Intel NUCs or hardware accelerators such as the Intel Neural Compute Stick are recommended.
- Hypervisor Layer: The second component is a hypervisor layer for abstracting and setting up the local hardware. We recommend using VMware vSphere or VMware ESXi for this purpose.
- Bacalhau Installation: The final requirement is installing Bacalhau. Bacalhau’s main advantage is that it allows you to keep using the same binaries and architecture you’re accustomed to while bringing your AI applications to the edge. It seamlessly orchestrates across different formats like Docker, WASM, Python, or other binaries, providing versatility and ease of use.
“We’re thrilled to work with Intel and VMware, pushing the boundaries of edge computing. Intel’s NUCs bring the muscle we need for heavy-duty machine learning tasks, while VMware’s virtualization tech gives us the flexibility to scale and adapt. Together, we’re not just crunching numbers faster; we’re unlocking real-time insights from distributed workloads including high-res video data, making smarter decisions easier and more cost-effective. This is a big step forward in our journey to transform how businesses process and leverage their data.” (David Aronchick, CEO of Expanso)
About the company
Expanso, leveraging its open-source software Bacalhau, redefines the computing landscape. Its distributed compute platform empowers customers to process data exactly where it’s generated, be it on-premise or across various clouds, zones, or regions, enabling a truly global reach. This approach results in operations that are not only faster and more cost-effective but also inherently more secure. Expanso transforms how data is processed, making it more efficient and safer for anyone working with data.