Performance upgrade - how we did it

Monday, 10 December 2012

A couple of weeks ago, we wrote about our work to improve the performance of TweetDeck when you’re working with high velocity columns like #election2012 or #sandy.

Here, we explain some of the changes that the TweetDeck engineering team made to improve performance in the application. 

Measuring performance

The first step was to find some way of measuring our performance. We chose Tweets Per Minute (TPM) as our primary metric for total throughput, but first we wanted to make sure the app didn’t lock up. To test when this was happening, we used timeouts. By measuring the delay between the expected start of a timeout and the actual start time, it is possible to determine how backed up the processor is. The more overloaded the processor is, the longer it takes to actually fire the timeout.

Eg:

https://gist.github.com/4251073

To start with, we decided that when the timeout delay hit 2 seconds or more, we would consider the app locked up. Before we started our improvements, the app would lock up (as defined by our criteria) almost immediately on adding a fast-moving column such as ‘twitter’ or ‘the’.
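
In outline, the lock-up check looks something like this (an illustrative sketch only; the gist above has the real code, and the names and structure here are ours):

// Illustrative sketch: detect lock-up by measuring how late a timeout fires.
var LOCKUP_THRESHOLD = 2000; // treat a delay of 2s or more as locked up
var CHECK_INTERVAL = 100;    // how often we expect the check to run (ms)

function monitorEventLoop() {
  var expected = Date.now() + CHECK_INTERVAL;
  setTimeout(function () {
    var delay = Date.now() - expected; // how late the timeout actually fired
    if (delay >= LOCKUP_THRESHOLD) {
      console.warn('Locked up: timeout fired ' + delay + 'ms late');
    }
    monitorEventLoop(); // schedule the next check
  }, CHECK_INTERVAL);
}

monitorEventLoop();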

We then wrote a system to keep track of the number of tweets we were trying to process per minute – a TPM counter.

The number of tweets was recorded over a short period and extrapolated to give us an estimated TPM. We, rather arbitrarily, chose a 100ms period. Unfortunately, the timeout delay described above is also a problem when trying to record an accurate TPM count: our 100ms interval turned out not to be 100ms at all. However, the solution is much the same. By keeping track of when the interval actually started and ended, we were able to calculate an adjusted TPM for the period.

These TPM estimates are then collected and averaged. A good sample of TPM estimates is required to produce an accurate figure, so the shorter the sampling period, the more quickly we can gather enough estimates to produce reliable figures.
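
Roughly, the adjusted counter works like this (again, an illustrative sketch rather than our exact code):

// Tweets are counted over a nominal 100ms window, but because timeouts fire
// late under load we scale by the window's actual duration, then average.
var SAMPLE_PERIOD = 100; // ms, chosen rather arbitrarily
var samples = [];
var tweetCount = 0;

function countTweet() { tweetCount += 1; } // called once per tweet processed

function sampleTpm() {
  var start = Date.now();
  setTimeout(function () {
    var actualPeriod = Date.now() - start;             // often longer than 100ms
    samples.push((tweetCount / actualPeriod) * 60000); // extrapolate to a minute
    tweetCount = 0;
    sampleTpm();
  }, SAMPLE_PERIOD);
}

function averageTpm() {
  if (samples.length === 0) { return 0; }
  var total = samples.reduce(function (sum, tpm) { return sum + tpm; }, 0);
  return total / samples.length;
}

sampleTpm();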

The results of this were disheartening. TweetDeck was reaching overload at around 1,200 TPM with a single column. There was even slowdown on relatively slow-moving columns for search terms like ‘London’ at around 500 TPM. Searching for ‘Twitter’ might hit 5,000 TPM, while something like ‘the’ or ‘a’ could easily be 20,000 TPM or more. We had a lot of work to do.

The interesting part was the disparity between the number of tweets coming in from the stream and the number we were actually processing. As the app locked up, the disparity grew, so there were now two obvious bottlenecks. Helpfully, this neatly split our work into pre-processing (receiving data from the stream), processing and post-processing (display).

I’m going to concentrate on the front-end side. At the top of our list was animation. We used to use two jQuery animations when inserting tweets into a column - a fade and an expand. There were two main issues with this approach. Firstly, you cannot sensibly animate the insertion of several hundred tweets per second; the animation becomes jerky and reduces the usability of the app. Secondly, the fade-in combined with the speed of the column meant that tweets were off the bottom of the screen by the time they were visible.

So, for columns with greater than 200 TPM, we just turned animation off. Simple. The result was immediately noticeable as our TPM went up from 1,200 to around 1,500.

We still wanted to improve performance for columns inserting fewer than 200 TPM, so we switched from jQuery animations to CSS transitions. This allowed the browser to do the hard work rather than JavaScript, and gave us another small boost.
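
In simplified form, the insertion path looks something like this (the 200 TPM threshold is the one above; the class names and stylesheet rules are invented for illustration):

// No animation above the threshold; a CSS transition below it.
// Assumed stylesheet rules:
//   .tweet          { opacity: 1; -webkit-transition: opacity 0.3s; }
//   .tweet-entering { opacity: 0; }
var ANIMATION_TPM_THRESHOLD = 200;

function insertTweet(column, tweetEl) {
  if (column.tpm > ANIMATION_TPM_THRESHOLD) {
    column.el.insertBefore(tweetEl, column.el.firstChild); // no animation at all
    return;
  }
  tweetEl.className += ' tweet-entering';                  // start invisible
  column.el.insertBefore(tweetEl, column.el.firstChild);
  tweetEl.offsetHeight;                                    // force a reflow so the start state applies
  tweetEl.className = tweetEl.className.replace(' tweet-entering', ''); // let CSS animate to the end state
}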

Last on the list was a more aggressive DOM cleanup. We were trimming the tweets in a column every couple of minutes, but if 5,000 tweets arrived in this time, there was going to be an awful lot of cruft off the bottom of the screen. We started trimming tweets every time we added new ones to the top, keeping the total number of tweets in the column at no more than 100. This did not result in a notable increase in TPM but did reduce how often the browser locked up while running at high velocity.
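
A sketch of the trimming step (the 100-tweet cap is from above; the rest is illustrative):

// Trim on every insert rather than on a timer, keeping the column at no
// more than 100 tweets. Tweets are newest-first, so the cruft is at the bottom.
var MAX_TWEETS_PER_COLUMN = 100;

function trimColumn(columnEl) {
  while (columnEl.children.length > MAX_TWEETS_PER_COLUMN) {
    columnEl.removeChild(columnEl.lastElementChild);
  }
}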

Stream batching

For fast-moving searches, it is easily possible to hit peaks of over 500 stream events per second. We also found that accessing an xhr’s responseText in Chrome is slow (there is presumably a certain amount of overhead in copying the data from the native networking buffer to the JS string we are trying to access).

By reacting to the xhr onreadystatechange callback, we were receiving a callback for each “line” of data (i.e. once per tweet). Each tweet would then be forced synchronously through the system. However, when the stream speeds up, by the time one Tweet has made its way to the DOM, several more are waiting to go.

The solution was to gather tweets over a period of time and fire them through the system in batches. We did this by using a timeout (or its equivalent in the native apps) to poll the connection for new data several times per second. This allows us both to process data in batches and to minimise accesses to responseText. For each batch, a processing job is then scheduled on the JS event queue to allow the system to process the tweets in its own time.
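
Roughly, the polling loop looks like this (a sketch that assumes a newline-delimited stream; the interval, names and parsing details are illustrative):

function pollStream(xhr, onBatch) {
  var processedLength = 0; // how much of responseText we've already consumed
  var POLL_INTERVAL = 250; // ms; a few polls per second

  (function poll() {
    setTimeout(function () {
      var buffer = xhr.responseText;                  // one (slow) access per poll
      var newData = buffer.substring(processedLength);
      var lastNewline = newData.lastIndexOf('\n');
      if (lastNewline !== -1) {
        var lines = newData.substring(0, lastNewline).split('\n');
        processedLength += lastNewline + 1;
        var tweets = lines.filter(Boolean).map(function (line) {
          return JSON.parse(line);
        });
        // Schedule the batch on the event queue so the poller isn't blocked
        // while the tweets are processed and rendered.
        setTimeout(function () { onBatch(tweets); }, 0);
      }
      poll();
    }, POLL_INTERVAL);
  }());
}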

Caching selectors

TweetDeck streaming columns are very useful and allow you to keep track of things that matter to you the most in real time. From an engineering point of view, this is challenging because it is not easy to predict the volume of tweets that an event is going to produce.

That is precisely why writing scalable code is so important. A few lines of code that seem harmless in a column streaming only a few tweets every minute can become a massive bottleneck when that becomes tens of thousands.

One technique for writing scalable code is to cache values that will be reused but are static and don’t need to be recalculated. When a column is streaming and you scroll down to focus on a tweet you are interested in, new tweets are still added to the top, but the column maintains your place so that what you are reading doesn’t move. In order to achieve this, we previously called jQuery’s scrollTop() method to determine the scroll position of the column every time a new tweet came in. As mentioned before, this wouldn’t be an issue with slow-moving columns; however, TweetDeck would spend up to 12% of its CPU time recalculating scrollTop in a fast-moving column. Since we manually handle mouseWheel and scroll events to modify the column’s scrollTop, we were able to call jQuery’s scrollTop method just once and then read and update the cached value from then on. That gave the CPU a well-deserved rest while also simplifying the code.
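
In simplified form (names are ours and the real column code is more involved; assumes jQuery is loaded):

function Column(el) {
  var self = this;
  this.el = el;
  this.cachedScrollTop = $(el).scrollTop(); // read from the DOM once

  // We handle scrolling ourselves, so the cache is updated at the point
  // where the scroll position actually changes.
  $(el).on('scroll mousewheel', function () {
    self.cachedScrollTop = el.scrollTop;
  });
}

// Keep what the user is reading in place when new tweets are added above.
Column.prototype.maintainScrollPosition = function (insertedHeight) {
  if (this.cachedScrollTop > 0) {
    this.cachedScrollTop += insertedHeight; // no scrollTop() call, no layout
    this.el.scrollTop = this.cachedScrollTop;
  }
};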

Smaller quick wins

Throughout the week, we made heavy use of Chrome’s developer tools to highlight inefficiencies and other problems with our code. Once we had fixed some of the bigger issues above, we were able to dive in and make small tweaks that made the application a lot more efficient when it was dealing with high velocity streams.

One of these was the way we store the objects that represent Tweets in a column. We keep an array with the newest items in the column at the beginning, through to the oldest Tweets, which are about to drop off the column, at the end.

Profiling the JavaScript showed that the way we were adding items to this array accounted for approximately 10% of the CPU usage of the app. Previously, we were appending new items to the end of the array, then sorting it using a function which would move newer items towards the front. That effectively meant that every time a new update came in, we would shuffle it through hundreds of positions in the array, even though in nearly every case it would end up at the beginning of the array rather than somewhere in the middle.

We added a very quick optimisation to check whether the incoming Tweets were all newer than the items in the existing array. If they were, we simply added them to the beginning of the array. This reduced the CPU usage of this method to an insignificant level.
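
The check is roughly this (a sketch; the time field is an invented stand-in for however the Tweet objects are ordered):

// Both arrays are kept newest-first.
function addTweets(columnTweets, incoming) {
  var newestExisting = columnTweets[0];
  var oldestIncoming = incoming[incoming.length - 1];

  if (!newestExisting || oldestIncoming.time > newestExisting.time) {
    // Fast path: everything incoming is newer, so just prepend the batch.
    columnTweets.unshift.apply(columnTweets, incoming);
  } else {
    // Slow path: fall back to the old append-and-sort behaviour.
    columnTweets.push.apply(columnTweets, incoming);
    columnTweets.sort(function (a, b) { return b.time - a.time; });
  }
  return columnTweets;
}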

In addition to JavaScript profiling, we use Chrome’s Heap Snapshots to investigate memory usage in TweetDeck. We run this regularly to spot any memory leaks that may have been introduced, but previously we had trouble spotting leaks that would only affect users who have TweetDeck open for many days at a time, because the only way to reproduce them was to test over many days.

Having the ability to process so many more updates allowed us to spot a couple of memory leaks, which were very easy to fix and which would definitely have affected users who left TweetDeck open for a long time.

One of these fixes was around the way we store a cache of users; the other, smaller one was to do with jQuery. We use jQuery in a couple of places to parse HTML into a DOM structure. jQuery has some clever code which, in certain circumstances, caches the document fragments it produces, so that it can speed up generating them next time if we pass in the same HTML string (see this great post from John Resig). Unfortunately, we were meeting the conditions for jQuery to cache these fragments, and that cache is never cleared, so it would grow over time. We now clear the cache periodically, so we don’t have to worry about it growing without bound.
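
For jQuery versions of this era, clearing that cache periodically can look roughly like this (it relies on an internal, undocumented property, so treat the property name and the interval as assumptions rather than a supported API):

// Rough sketch: jQuery.fragments is the internal fragment cache in
// jQuery versions of this era, not a public API.
setInterval(function () {
  if (jQuery.fragments) {
    jQuery.fragments = {}; // drop the cached document fragments
  }
}, 5 * 60 * 1000); // e.g. every five minutes; the interval is illustrative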

We strive to make TweetDeck a more efficient and high-performance tool for our users, and will continue to improve the product as we move forward. If working on this kind of performance investigation is your thing, then we’d like to hear from you: we’re hiring.

 

Posted by Tom Woolway (@tomwoolway), Sol Plant (@lostplan), Tom Hamshere (@tbrd), Ramón Arguello (@monchote)
Front-End Engineers, TweetDeck