
New Architectures for Apache Spark™ and Big Data

Key Trends in Big Data Infrastructure:

Some of the key trends in big data infrastructure over the past couple of years are:

• Decoupling of Compute and Storage Clusters
• Separate compute virtual machines from storage VMs
• Storage is scaled independently of compute
• Dynamic Scaling of compute nodes used for analysis from dozens to hundreds
• Spark and other newer Big Data platforms can work with regular filesystems
• Newer platforms store and process data in memory
• New platforms can leverage Distributed Filesystems that can use local or shared storage
• Need for High Availability & Fault Tolerance for master components
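
As a rough illustration of two of the trends above (compute that scales from dozens to hundreds of nodes, and Spark reading from a regular filesystem), here is a minimal sketch that builds a spark-submit command line. It assumes Spark's dynamic allocation is the scaling mechanism, which this post does not state. The application name job.py, the path /mnt/shared/data, and the executor bounds are hypothetical placeholders.

```python
def spark_submit_args(app, data_uri, min_executors=12, max_executors=200):
    """Build a spark-submit command that lets the executor count float.

    Assumption: dynamic allocation is used for compute scaling. The
    executor bounds are illustrative, not recommendations.
    """
    conf = {
        # Let Spark add and remove executors as the workload changes.
        "spark.dynamicAllocation.enabled": "true",
        # Dynamic allocation needs shuffle tracking (Spark 3.0+) or an
        # external shuffle service so idle executors can be released.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    # A file:// URI points Spark at a regular (local or shared) filesystem
    # path instead of HDFS; the path must be visible on every node.
    return args + [app, data_uri]


if __name__ == "__main__":
    # Hypothetical application and data location.
    print(" ".join(spark_submit_args("job.py", "file:///mnt/shared/data")))
```

Because the data location is passed as a plain URI, the same job could point at local, shared, or distributed storage without code changes, which is the practical effect of keeping storage separate from compute.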
