A great advantage of software systems is that they are extremely adaptable. There comes a point in the evolution of complex software systems, however, where that malleability hinders growth instead of facilitating it. At that point, the software approaches a stage where it no longer serves its purpose: helping people.
This was the case for the Twitter AdServer in early 2019. After 10 years of iterative development, the system was too inefficient to further evolve with the organization. When it started, we were an extremely small team of engineers, serving a single ad format (Promoted Tweets) and generating around $28M of revenue. Today, Twitter’s Revenue organization has 10X more engineers and generates ~$3B of revenue, supporting multiple ad formats: Brand, Video, and Cards.
Low velocity in launching new products and tightly coupled cross-team dependencies with high overhead costs added to the growing complexity of the organization. Scaling any further demanded a fundamental change.
The AdServer funnel consisted primarily of AdMixer and AdShard. AdMixer served as the interface between the upstream clients and the AdServer pipeline. When AdMixer received an ad request, it hydrated the request with additional user information before fanning it out to the AdShards. AdShard operated under a sharded architecture, with each AdShard responsible for a subset of the ads. Each AdShard ran the request through the 3 main stages of the AdServer funnel: candidate selection, candidate ranking, and creative hydration.
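As a rough illustration of this flow, here is a minimal sketch in Scala (hypothetical types and names, not the actual implementation):

```scala
// Minimal sketch of the legacy funnel (hypothetical types and names).
case class AdRequest(userId: Long, metadata: Map[String, String] = Map.empty)
case class AdCandidate(adId: Long, productType: String, score: Double = 0.0)

// Each AdShard owns a subset of the ad corpus and runs the full funnel on it.
class AdShard(ads: Seq[AdCandidate]) {
  def handle(request: AdRequest): Seq[AdCandidate] = {
    val selected = select(request)         // stage 1: candidate selection
    val ranked   = rank(request, selected) // stage 2: candidate ranking
    hydrateCreatives(ranked)               // stage 3: creative hydration
  }

  private def select(request: AdRequest): Seq[AdCandidate] = ads.take(100)
  private def rank(request: AdRequest, cs: Seq[AdCandidate]): Seq[AdCandidate] =
    cs.sortBy(-_.score)
  private def hydrateCreatives(cs: Seq[AdCandidate]): Seq[AdCandidate] = cs
}

// AdMixer hydrates the request with user data, fans it out to every shard,
// and merges the per-shard results into the final response.
class AdMixer(shards: Seq[AdShard]) {
  def serve(request: AdRequest): Seq[AdCandidate] = {
    val hydrated = request.copy(metadata = request.metadata + ("userSignals" -> "..."))
    shards.flatMap(_.handle(hydrated)).sortBy(-_.score).take(10)
  }
}
```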
Throughout this funnel, different components also attach metadata associated with an ad request and ad candidates, which is then written to our underlying key-value stores in AdMixer. This data is later consumed in our feedback loop by the analytics pipeline for billing, fraud detection and other analysis.
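A compact sketch of that metadata flow might look like the following (again with hypothetical names and schemas; the actual stores and fields are not described here):

```scala
// Hypothetical sketch of the serving metadata attached along the funnel and
// persisted in AdMixer; the analytics pipeline later joins on requestId for
// billing, fraud detection, and other analysis.
case class ServingMetadata(
    requestId: String,
    requestFeatures: Map[String, String],
    candidateFeatures: Map[Long, Map[String, Double]])

trait KeyValueStore {
  def put(key: String, value: ServingMetadata): Unit
}

class InMemoryStore extends KeyValueStore {
  private val data = scala.collection.mutable.Map.empty[String, ServingMetadata]
  def put(key: String, value: ServingMetadata): Unit = data(key) = value
}

object MetadataWriter {
  // Once the response is finalized, AdMixer writes the assembled metadata.
  def persist(store: KeyValueStore, meta: ServingMetadata): Unit =
    store.put(meta.requestId, meta)
}
```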
Historically, we optimized the system for minimal latency and operational overhead by reducing network hops wherever possible. This led to a single service, AdShard, doing most of the heavy lifting, resulting in a monolithic model.
The monolithic platform worked well when Twitter had only two ad products - Promoted Tweets and Promoted Accounts. However, as we scaled our business the monolith model posed more challenges than solutions.
In the old AdServer, due to the challenges and complexity of the legacy code, reusing existing patterns and practices became the norm. The figure above shows an example of adding a new ad product, e.g. Promoted Trend, to the old AdServer. Promoted Trend ads have the following characteristics:
1. Should always be selected by Geo == Country.
2. Should not require an auction and can therefore skip the ranking stage.
Adding a new ad product often involved patchy work. Due to the nature of the existing framework and other legacy constraints, skipping the ranking stage was not a viable option, which led to a hacky workaround: adding product-based conditional logic such as "if (product_type == 'PROMOTED_TREND') {...} else {...}" to the code in the ranking pipeline. Similar product-based logic also existed in the selection pipeline, making these stages tightly coupled and adding to the complexity of the growing spaghetti code.
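In simplified form, the workaround looked something like this (hypothetical names; the real ranking pipeline is far larger):

```scala
// Simplified sketch of product-specific branching leaking into the ranking
// stage (hypothetical names).
case class AdCandidate(adId: Long, productType: String, score: Double)

object RankingPipeline {
  def rank(candidates: Seq[AdCandidate]): Seq[AdCandidate] =
    candidates.map { candidate =>
      if (candidate.productType == "PROMOTED_TREND") {
        // Promoted Trend should not go through an auction at all, but the
        // framework forces every product through ranking, so the workaround
        // pins it to the top instead of skipping the stage.
        candidate.copy(score = Double.MaxValue)
      } else {
        candidate.copy(score = auctionScore(candidate))
      }
    }.sortBy(-_.score)

  private def auctionScore(candidate: AdCandidate): Double =
    candidate.score // placeholder for bid * predicted engagement, etc.
}
```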
Challenges like these were shared across all the teams who worked on the monstrous legacy code.
These chronic engineering problems and the loss of developer productivity created a need for a change of paradigm in the design of our systems. We were lacking clear separation of concerns in our architecture, and had high coupling between different product areas.
These problems are fairly common in the software industry, and breaking up monoliths into microservices is a popular approach to solve them. However, it comes with inherent trade-offs, and if designed hastily, can instead lead to reduced productivity. As an example, let’s conduct a thought exercise about one possible approach we could have taken with the decomposition of our service.
Since each product team was bottlenecked on a monolithic AdServer, and different products can require varying architectural requirements, we could choose to break up the single AdServer into N different AdServers, one for each product or group of similar products.
In the architecture above, we have three different AdServers: one for Video Ad Products, one for Takeover Ad Products, and one for Performance Ad Products. These are owned by the respective product engineering teams, and each can have its own codebase and deployment pipeline. This seems to provide autonomy and to help with both separation of concerns and decoupling between different product areas. In reality, however, such a split would likely make things worse.
Each product engineering team now has to staff up to maintain an entire AdServer. Each team has to maintain and run its own candidate generation and candidate ranking pipelines, even though they rarely modify them (these are often modified by Machine Learning domain experts). For those domain areas, things become even worse: shipping a new feature to be used in Ad Prediction would now require us to modify the code in three different services instead of one! Finally, it is hard to ensure that analytics data and logs from all AdServers can be joined with each other so that downstream systems continue to work (analytics is a cross-cutting concern across products).
We learned that decomposition alone is not enough. The per-product AdServer architecture we built above lacks both cohesion (each AdServer still does too many things) and reusability (ad candidate ranking, for example, runs in all three services). This leads to an epiphany: while we need to provide autonomy to product engineering teams, we must support them with horizontal platform components that are reusable across products. Having plug-and-play services for cross-cutting concerns can create a multiplier effect for engineering teams.
As a result, we identified “general purpose ad tech functions” that could be used by most ad products directly, such as ad candidate selection, ad candidate ranking, and creative hydration.
We built services around each of these functions and reorganized ourselves into platform teams, each owning one of them. The product AdServers from the previous architecture have now become much leaner components that rely on the horizontal platform services, with product-specific logic built on top of them.
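As a hedged sketch of what this composition looks like (hypothetical interfaces, not Twitter's actual APIs):

```scala
// Hypothetical interfaces for the horizontal platform services; each one is
// owned, deployed, and evolved by a platform team behind a well-defined API.
case class AdRequest(userId: Long)
case class AdCandidate(adId: Long, productType: String, score: Double = 0.0)

trait CandidateSelectionService {
  def select(request: AdRequest): Seq[AdCandidate]
}

trait CandidateRankingService {
  def rank(request: AdRequest, candidates: Seq[AdCandidate]): Seq[AdCandidate]
}

trait CreativeService {
  def hydrate(candidates: Seq[AdCandidate]): Seq[AdCandidate]
}

// A lean product AdServer composes only the platform services it needs and
// keeps product-specific logic in its own codebase.
class PerformanceAdServer(
    selection: CandidateSelectionService,
    ranking: CandidateRankingService,
    creative: CreativeService) {

  def serve(request: AdRequest): Seq[AdCandidate] =
    creative.hydrate(ranking.rank(request, selection.select(request)))
}
```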
Let's re-examine the problem presented above with Promoted Trend (Spotlight) ads and how the new architecture tackles it. By building separate services for ad candidate selection and ad candidate ranking, we get better separation of concerns. It breaks the pattern of forcing every ad product through the 3-stage paradigm of the AdServer pipeline. Promoted Trend ads now have the flexibility to integrate with only the selection service, allowing them to skip the ranking stage. This lets us get rid of hacky ways of bypassing ranking for Promoted Trend ads and achieve a cleaner, more robust codebase.
As we continue to grow our ad business, adding a new product becomes as simple as plugging in these horizontal platform services as needed.
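Building on the hypothetical interfaces sketched above, a Promoted Trend AdServer simply never wires in the ranking service, so no conditional bypass is needed:

```scala
// Reusing the hypothetical interfaces from the sketch above: a Promoted Trend
// AdServer never depends on the ranking service at all.
case class AdRequest(userId: Long)
case class AdCandidate(adId: Long, productType: String, score: Double = 0.0)

trait CandidateSelectionService { def select(request: AdRequest): Seq[AdCandidate] }
trait CreativeService { def hydrate(candidates: Seq[AdCandidate]): Seq[AdCandidate] }

class PromotedTrendAdServer(
    selection: CandidateSelectionService,
    creative: CreativeService) {
  // Selection by Geo happens inside the selection service; creatives are
  // hydrated directly on the selected candidates, with no auction in between.
  def serve(request: AdRequest): Seq[AdCandidate] =
    creative.hydrate(selection.select(request))
}
```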
With well-defined APIs, we could establish a clear separation of responsibilities across teams. Modifying the candidate ranking pipeline does not require understanding the selection or creative stages. This is a win-win situation, where each team only has to understand and maintain its own code, allowing it to move faster. This also makes troubleshooting errors easier, because we can isolate issues within a service and test services independently.
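For instance, a narrow interface makes it easy to exercise one piece in isolation with a stub. Here is a minimal, dependency-free sketch (hypothetical names, not Twitter's actual test setup):

```scala
// Hypothetical sketch: exercising one service in isolation with a stub,
// without standing up the rest of the funnel.
case class AdRequest(userId: Long)
case class AdCandidate(adId: Long, productType: String, score: Double = 0.0)

trait CandidateSelectionService { def select(request: AdRequest): Seq[AdCandidate] }

object StubSelection extends CandidateSelectionService {
  def select(request: AdRequest): Seq[AdCandidate] =
    Seq(AdCandidate(adId = 1L, productType = "PROMOTED_TREND"))
}

object IsolationTest {
  // A plain assertion keeps the sketch dependency-free; in practice this
  // would live in the owning team's own test suite.
  def run(): Unit = {
    val results = StubSelection.select(AdRequest(userId = 42L))
    assert(results.nonEmpty && results.forall(_.productType == "PROMOTED_TREND"))
  }
}
```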
Such a paradigm shift in the way we serve ads at Twitter is bound to come with inherent risks and trade-offs. We list some of them here as a word of caution for readers, to emphasize that such trade-offs must be identified and acknowledged before deciding on a large refactor of existing systems.
We evaluated these risks and decided that the benefits of the new architecture outweighed their impact. An overall increase in development velocity and more predictable delivery of feature improvements are vital for the ambitious business goals we have set for ourselves. The new architecture delivers that: a modular system that allows faster experimentation and reduces coupling.
We have already started seeing the benefits of this decision.
A daily deploy cadence, with multiple teams pushing new code every day, combined with large scale on the order of hundreds of thousands of QPS, made decomposition of the AdServer challenging.
To start the migration, we adopted an in-memory, API-first approach that allowed a logical separation of the code. It also enabled us to run initial system performance analysis to ensure the CPU and memory footprint delta between the old and new systems stayed within acceptable bounds. This laid the foundation for the horizontal platform services, which were essentially created by refactoring code and rearranging the packaging structure of the in-memory version.
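A minimal sketch of the idea (Scala, hypothetical names and signatures; the post does not describe the actual interfaces): the API is defined first as a plain interface with an in-memory implementation running inside the existing process, and a service-backed implementation slots in behind the same interface later without touching callers.

```scala
import scala.concurrent.{ExecutionContext, Future}

// The API is defined first as a plain interface (hypothetical signature).
case class AdRequest(userId: Long)
case class AdCandidate(adId: Long, productType: String, score: Double = 0.0)

trait CandidateRankingApi {
  def rank(request: AdRequest, candidates: Seq[AdCandidate]): Future[Seq[AdCandidate]]
}

// Step 1: an in-memory implementation that calls the refactored code inside
// the existing AdServer process, so the CPU/memory delta can be measured
// before any network hop is introduced.
class InMemoryRanking(implicit ec: ExecutionContext) extends CandidateRankingApi {
  def rank(request: AdRequest, candidates: Seq[AdCandidate]): Future[Seq[AdCandidate]] =
    Future(candidates.sortBy(-_.score))
}

// Step 2 (later): the same interface backed by a remote ranking service; the
// transport is elided here, since callers only ever see CandidateRankingApi.
class RemoteRanking(call: (AdRequest, Seq[AdCandidate]) => Future[Seq[AdCandidate]])
    extends CandidateRankingApi {
  def rank(request: AdRequest, candidates: Seq[AdCandidate]): Future[Seq[AdCandidate]] =
    call(request, candidates)
}
```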
To ensure feature parity between the old and new services, we developed a custom correctness evaluation framework. It replayed requests to both the legacy and new AdServer and compared metrics between the two systems against an acceptable threshold. This was used in offline testing and gave us visibility into the performance of the new system, helping us catch issues early and preventing bugs from reaching production.
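The core of such a framework can be sketched as a replay-and-compare loop (hypothetical names and threshold; the real framework compared many more metrics):

```scala
// Replay captured requests against both systems and flag responses whose ad
// sets differ or whose summary metric drifts beyond a threshold.
case class AdRequest(userId: Long)
case class AdResponse(adIds: Set[Long], totalScore: Double)

object CorrectnessEval {
  def diff(
      requests: Seq[AdRequest],
      legacy: AdRequest => AdResponse,
      candidate: AdRequest => AdResponse,
      tolerance: Double = 0.01): Seq[(AdRequest, AdResponse, AdResponse)] =
    requests.flatMap { req =>
      val (a, b) = (legacy(req), candidate(req))
      val drift =
        if (a.totalScore == 0.0) math.abs(b.totalScore)
        else math.abs(a.totalScore - b.totalScore) / math.abs(a.totalScore)
      // Mismatched ad sets or excessive metric drift are surfaced for triage.
      if (a.adIds != b.adIds || drift > tolerance) Some((req, a, b)) else None
    }
}
```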
After the code was shipped to production, we used an experiment framework that provides insights into overall revenue metrics in production. Many prediction- and auction-related metrics require a longer feedback loop to remove noise and assess the true impact of a change. Thus, for a true end-to-end verification of the migration, we relied on this framework to guarantee on-par revenue metrics.
The Ad Server decomposition improved the state of our systems and reinforced the foundation of Twitter's ad business. It lets us focus our time and resources on solving real engineering problems instead of battling the woes of legacy infrastructure. With the evolution of the ad business and technology, more challenges are to come but we are positive and excited to build solutions that make our systems efficient.
If you're interested in solving such challenges, consider joining the flock.
This project took commitment, cross-functional alignment and work from many teams. We would like to thank project leads who contributed to this blog: Girish Viswanathan, James Gao, Kavita Kanetkar, Rachna Chettri, Siddharth Rao, Tanooj Parekh and Yudian Zheng.
And others who worked on this project: Andrew Taeoalii, Andrew Wilcox, Annie Lin, Alicia Vartanian, Arun Viswanathan, Babatunde Fashola, Bhavdeep Sethi, Brian Kahrs, Corbin Betheldo, Catia Goncalves, Daniel Erenrich, Daniel Kang, Eddie Xie, Eitan Adler, Eric Chen, Hani Dawoud, Heather Chen, Ilho Ye, Irina Sch, Jean-Pascal Billaud, Julio Ng, Jiyuan Qian, Joe Xie, Juan Serrano, Kai Chen, Kai Zhu, Kevin Xing, Keji Ren, Mohammad Saiyad, Paul Burstein, Prabhanjan Belagodu, Praneeth Yenugutala, Ranjan Banerjee, Rembert Browne, Radhika Kasthuri, Ratheesh Vijayan, Rui Zhang, Sandy Strong, Siva Gurusamy, Smita Wadhwa, Siyao Zhu, Su Chang, Scott Chen, Srinivas Vadrevu, Tianshu Cheng, Troy Howard, Tushar Singh, Udit Chitalia, Vishal Ganwani, Vipul Marlecha, Wei Su, Wolf Arnold, Yanning Chen, Yong Wang, Yuemin Li, and Yuanjun Yang.
We would also like to thank the Revenue SRE team, Revenue Product and Engineering leadership team, Revenue Strategy and Operations team, and Revenue QE team for their constant support.