
May 20, 2023

Testing at scale @ Spotify

Our team discusses testing techniques from Swiftable BA.


Fewer, Smarter and Faster

Testing at scale is a critical aspect of software development, particularly as applications become more complex and integrate with ever more systems. In December 2022, Vivian Santos and Sami Bouchebaba, two leading experts in the field, spoke at Swiftable BA on the topic of testing at scale. Their talk provided insights and best practices for effectively testing software at a large scale. In this article, we explore some of the key takeaways from their presentation and offer practical tips for developers tackling this challenge.

Vivian and Sami began their talk by highlighting the unique set of challenges Spotify, the popular music streaming service, faces when testing its applications. With hundreds of iOS developers working on five different apps and generating hundreds of pull requests (PRs) a day, testing at scale is crucial for ensuring the quality of the apps and enabling developers to ship improvements quickly and confidently. The team responsible for this is "Client Platform," and specifically a group within it called "Mic Check," which focuses on improving the testing experience.

The testing process at Spotify is divided into three test suites, with the most important checks running before any new code is introduced into the codebase (pre-merge tests). If any of these tests fail, the developer is blocked from merging their change. Once the code is merged into master, a slightly more extensive suite of post-merge tests starts, running in parallel across eight worker threads on the CI machines. The current pre-merge test suite consists of 48,000 tests and takes approximately 30 minutes to complete.

The goal of Mic Check is to cut the pre-merge suite down from all 48,000 tests to only the tests relevant to each change, and to reduce the pre-merge test time from 30+ minutes to less than 10 minutes. To achieve this, the team has focused on writing fewer, smarter, and faster tests.

Fewer tests

The speakers highlighted that the issue with the previous testing process was that it tested the entire system even if only one or two components had changed. The solution was to target only the tests that need to run based on the changes made. To accomplish this, the team uses Bazel, a fast build system developed by Google. Bazel's dependency analysis, via the "bazel-diff" tool, makes it possible to compare two Git revisions and output the targets affected by a PR. However, due to the interconnectivity of the components, running bazel-diff on the app would still result in running all the tests.

To overcome this, they separated each component's API and implementation into two isolated targets, meaning that changes to the implementation only trigger the implementation target's tests. They call this "test isolation."

They suggested looking into a blog post on the topic for details on how to do it.
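To make the idea concrete, here is a minimal Swift sketch of the API/implementation split, with hypothetical module and type names of our own; the actual target layout at Spotify lives in their Bazel configuration, which the talk did not show.

```swift
// Target 1: "PlayerAPI" contains only the public interface. Feature teams
// (and their tests) depend on this target, which rarely changes.
public protocol AudioPlayer {
    func play(trackID: String)
    func pause()
}

// Target 2: "PlayerImpl" depends on PlayerAPI and holds the concrete type.
// With the split, an edit here shows up in bazel-diff as affecting only
// PlayerImpl and its tests, not every consumer of the AudioPlayer API.
public final class DefaultAudioPlayer: AudioPlayer {
    private var isPlaying = false

    public init() {}

    public func play(trackID: String) {
        // Start playback for the given track (stubbed for this sketch).
        isPlaying = true
    }

    public func pause() {
        isPlaying = false
    }
}
```

Consumers that depend only on the API target are insulated from implementation churn, which is what keeps the affected-targets set small.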

Impact:

  • 66% reduction in CI startup time
  • Build time reduced from 90+ minutes to 10-15 minutes
  • Test time reduced from 30+ minutes to 1-15 minutes

Smarter tests

The speakers then went on to discuss another issue with the testing process: flaky tests, which are tests that pass or fail non-deterministically. These tests reduce both the speed and the quality of the test suite, leading to frustration for developers. To tackle this, the team implemented a system called Master Guardian, which follows three steps (a sketch of the skipping step appears after the list):

  1. Identifying flaky tests: if a test fails, it is retriggered. If it then passes, it is considered flaky and a ticket is opened.
  2. Informing the test owners: the owning team investigates and stabilizes the test; if it failed for another reason (e.g. a network issue), it is removed from the flaky-test cache.
  3. Skipping flaky tests pre-merge: while the ticket is open, Master Guardian skips the flaky test pre-merge.
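As an illustration of the skipping step, here is a minimal Swift sketch using XCTest's built-in XCTSkipIf. The flaky-test cache and the PRE_MERGE environment variable are assumptions of ours; the talk did not show Master Guardian's actual implementation.

```swift
import XCTest

// Hypothetical stand-in for Master Guardian's flaky-test cache. In a real
// pipeline this list would be driven by the open tickets from steps 1 and 2.
enum FlakyTestCache {
    static let entries: Set<String> = ["PlaybackTests/testShuffleQueue"]

    static func contains(_ testName: String) -> Bool {
        entries.contains(testName)
    }
}

final class PlaybackTests: XCTestCase {
    func testShuffleQueue() throws {
        // Step 3: while the ticket is open, skip the known-flaky test
        // pre-merge instead of letting it block the PR.
        let isPreMerge = ProcessInfo.processInfo.environment["PRE_MERGE"] == "1"
        try XCTSkipIf(isPreMerge && FlakyTestCache.contains("PlaybackTests/testShuffleQueue"),
                      "Known flaky; skipped pre-merge while the ticket is open")

        // The real assertions still run post-merge.
        XCTAssertTrue(true) // placeholder for the test's actual assertions
    }
}
```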

Faster tests

Another problem they faced was that some tests in the suite took too long to run, with a long test defined as one lasting 8 seconds or more. In total, 23 tests fell into this category, and together they added 3 minutes to the PR-to-Green time. To address this, they extended the same system with a three-step solution.

The first step is to identify which tests are slow; the second is to inform the owners of those tests; and the third is to skip slow tests, like flaky ones, during pre-merge until they are fixed.

In practice, when a test is detected as slow, a ticket is created and assigned to the owner for investigation. The owner updates the slow-test cache, and all slow and flaky tests are skipped during pre-merge. If a skipped test fails post-merge, the owner is notified; even so, the approach shortens the PR-to-Green time and leaves developers less frustrated, with more time for other tasks. A sketch of the detection step follows.
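As an illustration of how detection might work, the sketch below times every test with XCTest's observation API and flags those at or over the 8-second threshold from the talk. The observer class and the reporting side (tickets, the slow-test cache) are our own assumptions, not Spotify's tooling.

```swift
import Foundation
import XCTest

// A minimal slow-test detector built on XCTest's observation API.
// The 8-second threshold comes from the talk; the reporting is assumed.
final class SlowTestObserver: NSObject, XCTestObservation {
    private let slowThreshold: TimeInterval = 8.0

    func testCaseDidFinish(_ testCase: XCTestCase) {
        guard let duration = testCase.testRun?.totalDuration else { return }
        if duration >= slowThreshold {
            // In a real pipeline this is where a ticket would be filed and
            // the slow-test cache updated for the pre-merge skip list.
            print("Slow test: \(testCase.name) took \(String(format: "%.1f", duration))s")
        }
    }
}
```

The observer would be registered once per test bundle, for example by calling XCTestObservationCenter.shared.addTestObserver(SlowTestObserver()) from the bundle's principal class.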

Key takeaways

We heard how, at Spotify, testing at scale is a critical component of the development process. With hundreds of iOS developers and over 1,000 PRs a week, they need their tests to be reliable, fast, and efficient. To achieve this, Spotify has taken a number of steps to improve its test suites, including writing fewer, smarter, and faster tests.

One approach to writing fewer tests is using Bazel to identify and run only the tests necessary for a given changeset. This has resulted in a 66% reduction in CI startup time, build times of 10-15 minutes (down from 90+), and test times of 1-15 minutes (down from 30+).

To write smarter tests, they identify and skip flaky tests. Their Master Guardian system detects flaky tests, informs their owners, and skips them pre-merge, which reduces the PR-to-Green time and minimizes frustration for developers.

In conclusion, Spotify has been remarkably successful in improving its testing methodology, which has helped it maintain app quality and ship with high confidence. Here at Qubika, we find this approach very promising. As we continue to grow, we are keen to investigate its practicality and the benefits it could bring to our own projects.

Learn more about the work of our QA Studio.


By Rodrigo Camargo

Senior iOS Developer

Rodrigo Camargo is a senior iOS Developer at Qubika, bringing over 3 years of experience developing software. He is an integral part of Qubika's Mobile Studio, where he focuses on developing iOS mobile applications.
