With over 93 million unique browsers per day, and one-third of Japan’s population actively using at least one of its services, Yahoo Japan understands scale. However, with technology and user demands constantly changing, the company’s more than 3,000 software engineers need the right tools to keep delivering value to customers quickly. Pivotal Platform plays a big part in helping them accomplish that goal.
In their talk at SpringOne Platform 2019, Cloud Platform Manager Yusuke Kondo and Software Engineer Akinori Nitta described some of the distinct challenges Yahoo Japan faces in operating at this scale, as well as some of the solutions it has devised. Among the primary challenges are:
- Role management across multiple clusters.
- Routine configuration for multi-foundation environments.
- Developer implementation errors.
- Increasing log traffic.
- An “explosion” of pipelines to maintain.
- 24/7 multi-cluster monitoring.
You can watch the video for specifics on how Yahoo Japan resolved each of these challenges, both technologically and procedurally.
If you’re wondering what Yahoo Japan’s scale translates to in terms of footprint, Kondo summed it up: “We now have more than 10,000 apps in production using more than 40,000 [application instances], supporting more than 180,000 requests per second.”
From an infrastructure perspective, that translates to 16 (soon to be 24) PaaS clusters, managed by 25 people spread across an SRE team and a CRE team. Notably, while Yahoo Japan’s application and application-instance counts grew by about 10x from October 2018 to October 2019, it added only four SREs during that period.
In addition to some of the technical and procedural improvements Kondo and Nitta explain in detail during the session, Yahoo Japan has also benefited from embracing Buildpacks and turning to Java and Node.js as its preferred programming languages. In 2016, most development was done in PHP, but today approximately 75% of its Buildpacks support Java or Node.js, compared with only 6% for PHP.
For a little extra flavor, here’s an excerpt from the session, in which Kondo explains the unique responsibilities of Yahoo Japan’s SRE and CRE teams:
"The PaaS team is composed of two teams: CRE and SRE. CRE has a mission to provide value to developers, while SRE has a mission to improve overall system reliability. Each team works together to maximize the productivity and value of the Pivotal platform.
"The CRE team's mission is to focus on the platform users [and] the engineer's productivity improvement. There are three main responsibilities here. Firstly, to make the platform easy to use and provide developers with more productivity by developing service broker APIs or some useful tools, libraries, and CLI plugins that help engineers to ship their apps more frequently. Secondly, to be the contact point of developers to provide feedback and raise issues. The team is also responsible for proactively detecting problems with apps before they become problems, and reaching out to developers to improve them. Lastly, the CRE team is responsible for education, providing workshops and documenting best practices, as well as providing architectural guidance.
"The mission of the SRE team is to maximize system reliability. There are three big responsibilities. Firstly, defining SLOs and setting up a monitoring scheme to achieve those targets. . . . Secondly, supporting requests from the CRE team. The CRE team gathers feedback from engineers and works closely with the SRE team to deliver improvement. Lastly, the SRE team is also responsible for platform updates and enabling new platform features, and also logging metrics for developers. They always try to automate their operations as much as possible to eliminate toil."
Learn more about Yahoo Japan, SRE, and scaling PAS
Automation and Culture Changes for 40M Subscriber Platform Operation
SRE and the value of treating operations as a software problem