Bolstering our infrastructure

Wednesday, 7 November 2012

Last night, the world tuned in to Twitter to share the election results as U.S. voters chose a president and settled many other campaigns. Throughout the day, people sent more than 31 million election-related Tweets (those containing certain key terms and relevant hashtags). And as results rolled in, the surge in election-related Tweets peaked at 327,452 Tweets per minute (TPM). These numbers reflect the largest election-related Twitter conversation in our six years of existence, though they don’t capture the total volume of all Tweets yesterday.

As an engineering team, we keep an eye on all of the activity across the platform, particularly the number of Tweets per second (TPS). Last night, Twitter averaged about 9,965 TPS from 8:11pm to 9:11pm PT, with a one-second peak of 15,107 TPS at 8:20pm PT and a one-minute peak of 874,560 TPM. Seeing a peak sustained over the course of an entire event is a change from the way people have turned to Twitter during live events in the past.
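
To make the two units easier to compare (TPM is simply TPS aggregated over sixty seconds), here is a quick arithmetic sketch using the figures quoted above; the variable names are ours, chosen for illustration:

    // Relates the one-minute and one-second peaks quoted in this post.
    public class TweetRates {
        public static void main(String[] args) {
            int peakTpm = 874_560;  // one-minute peak (Tweets per minute)
            int peakTps = 15_107;   // one-second peak (Tweets per second)

            // Average rate over the peak minute, expressed in TPS.
            double avgTpsInPeakMinute = peakTpm / 60.0;  // ~14,576 TPS

            System.out.printf("Avg TPS during peak minute: %,.0f%n", avgTpsInPeakMinute);

            // The single best second (15,107 TPS) is only a few percent above
            // the peak minute's average: traffic held near its maximum for
            // the whole minute instead of spiking briefly.
            System.out.printf("Peak second / peak-minute avg: %.2f%n",
                    peakTps / avgTpsInPeakMinute);
        }
    }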

In the past, we’ve generally experienced short-lived roars related to the clock striking midnight on New Year’s Eve (6,939 TPS), the end of a soccer game (7,196 TPS), or Beyoncé’s pregnancy announcement (8,868 TPS). Those spikes tended to last seconds, maybe minutes at most. Now, rather than brief spikes, we are seeing sustained peaks lasting hours. Last night is just another example of the traffic pattern we’ve experienced this year: we also saw it during the NBA Finals, the Olympics Closing Ceremonies, the VMAs, and the Hip-Hop Awards.

Last night’s numbers demonstrate that as Twitter usage patterns change, the service can remain resilient. Over time, we have been working to build an infrastructure that can withstand an ever-increasing load. For example, we’ve been steadily optimizing the Ruby runtime. And, as part of our ongoing migration away from Ruby, we’ve reconfigured the service so that traffic from our mobile clients hits the Java Virtual Machine (JVM) stack, avoiding the Ruby stack altogether.
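
To illustrate the idea (this is not Twitter’s actual code; the class names and the User-Agent heuristic are assumptions made for the sketch), a front-end dispatcher along these lines could send mobile-client requests to a JVM-backed service while the rest of the traffic continues to flow to the Ruby stack:

    import java.util.Map;

    // Hypothetical sketch of client-based request routing: requests from
    // mobile clients are dispatched to a JVM-backed service, everything
    // else to the legacy Ruby stack. All names here are illustrative.
    public class TrafficRouter {

        interface Backend { String handle(Map<String, String> headers); }

        private final Backend jvmStack;   // JVM service for mobile traffic
        private final Backend rubyStack;  // legacy Ruby stack for the rest

        public TrafficRouter(Backend jvmStack, Backend rubyStack) {
            this.jvmStack = jvmStack;
            this.rubyStack = rubyStack;
        }

        public String route(Map<String, String> headers) {
            // Assumed heuristic: official mobile clients identify
            // themselves via the User-Agent header.
            String ua = headers.getOrDefault("User-Agent", "");
            boolean isMobileClient =
                    ua.contains("TwitterAndroid") || ua.contains("Twitter-iPhone");
            return (isMobileClient ? jvmStack : rubyStack).handle(headers);
        }

        public static void main(String[] args) {
            TrafficRouter router = new TrafficRouter(
                    h -> "served by JVM stack",
                    h -> "served by Ruby stack");
            System.out.println(router.route(Map.of("User-Agent", "TwitterAndroid/3.4")));
            System.out.println(router.route(Map.of("User-Agent", "Mozilla/5.0")));
        }
    }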

Of course, we still have plenty more to do. We’ll continue to measure and evaluate event-based traffic spikes, including their size and duration, and to study the best ways to accommodate expected and unexpected traffic surges and high-volume conversation, during planned real-time events such as elections and championship games as well as unplanned events such as natural disasters.

The bottom line: No matter when, where or how people use Twitter, we need to remain accessible 24/7, around the world. We’re hard at work delivering on that vision.

- Mazen Rawashdeh, VP of Infrastructure Operations Engineering (@mazenra)