Mobile app development: Catching crashers

Thursday, 30 May 2013

Before Twitter for iOS code reaches production, we run it through static analysis, automated testing, code review, manual verification, and employee dogfooding. In this last step, we distribute beta builds to our employees to collect real-world feedback on products and monitor code stability through crash reports.

Detailed crash data has had a huge impact on our development process, and has significantly improved mobile app performance and quality. This post describes two types of bugs, and how we used detailed data from Crashlytics to diagnose and fix them.

The most common crashers are not elaborate:

  • You forgot to nil out delegate properties when the delegate is deallocated
  • You didn’t check whether a block was nil before calling it
  • You released an object in the wrong order

These mistakes are solved with rote patterns. Sometimes you haven’t learned a pattern. Sometimes you know the pattern, but lack rigor in applying it. We experienced the latter with category prefixing.

Objective-C categories allow you to attach new methods to classes you don’t own. For instance, we could attach an isTweetable method to NSString that returns YES if the string is 140 characters or fewer.

It’s part of Cocoa best practices to prefix category method names with something unique. Instead of isTweetable, call your method tw_isTweetable. Then your app is safe if iOS adds a method with the same name to that class in the future. We usually did this, but missed a few categories.
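The collision that prefixing guards against can be sketched outside Objective-C. The following is only a Python analogy, not Objective-C code: Text stands in for a framework class such as NSString, attach is a hypothetical helper that plays the role of a category, and their_is_tweetable is an invented third-party method. One difference worth noting: in Python the last attachment always wins, whereas with clashing Objective-C category methods it is undefined which implementation is used at runtime.

```python
class Text:
    """Stand-in for a framework class we do not own (NSString in the post)."""

    def __init__(self, value):
        self.value = value


def attach(cls, name, func):
    """Attach func to cls as a method; return True if the name was already taken."""
    collided = hasattr(cls, name)
    setattr(cls, name, func)
    return collided


def our_is_tweetable(self):
    # The app's rule from the post: 140 characters or fewer.
    return len(self.value) <= 140


def their_is_tweetable(self):
    # Hypothetical method from another party: same name, different behaviour.
    return False


# Unprefixed: the second attachment silently replaces the first.
attach(Text, "is_tweetable", our_is_tweetable)
unprefixed_collided = attach(Text, "is_tweetable", their_is_tweetable)

# Prefixed: our method has its own name and is not replaced.
prefixed_collided = attach(Text, "tw_is_tweetable", our_is_tweetable)

short = Text("hello")
print(unprefixed_collided, short.is_tweetable())
print(prefixed_collided, short.tw_is_tweetable())
```

In this sketch the unprefixed call ends up running the other party's code without any warning, which is the same class of surprise the crash reports described.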

Late last year, we noticed several crashes consistently plaguing a small number of users. They were related to innocuous categories, but iOS documentation didn’t point to any name collisions.

If we can’t reproduce the crash, we try to isolate the problem with crash environment data. Does it affect users of an older version of iOS? Certain hardware? Does it happen in low-memory situations?

Crashlytics revealed most of these crashes were on jailbroken devices. It turned out the jailbreak environment added its own unprefixed categories to core classes, and they shared the same names as our own categories.

We discovered another set of crashes related to categories, on non-jailbroken devices. Older categories were inconsistently prefixed and collided with private categories used by the system frameworks. Sometimes you do the right thing — and just have bad luck.

You should start with the simplest solution that works, but one day you will outgrow that solution. With a more complex architecture come more complex edge cases and their bugs.

For example, Twitter for iPhone originally used NSKeyedArchiver for storage. To get more granular control over what we loaded from disk at launch, we moved to SQLite. Benchmarking on older hardware revealed that if we wanted to keep the main thread responsive, we had to enter the thorny world of SQLite multithreading.

In our first implementation, we used a background queue to write incoming Tweets to a staging table. Then we bounced back to the main thread to replace the main table with the staging table.

This looked fine during testing, but crash reports revealed database lock errors. SQLite’s write locks are not table level, but global. We had to serialize all write operations, so we rewrote the framework to use one GCD queue for reads and one for writes.
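The database-wide write lock is easy to reproduce in isolation. The sketch below uses Python's standard sqlite3 module as a stand-in for the SQLite C API the app would have called. The table names and the temporary file are illustrative assumptions, and the database is left in SQLite's default rollback-journal mode.

```python
import os
import sqlite3
import tempfile

# A throwaway database file with two unrelated tables (names are illustrative).
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

# isolation_level=None puts each connection in autocommit mode, so the
# BEGIN/COMMIT statements below are issued exactly as written.
# timeout=0 makes a blocked write fail immediately instead of waiting.
first = sqlite3.connect(path, isolation_level=None, timeout=0)
second = sqlite3.connect(path, isolation_level=None, timeout=0)

first.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
first.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, text TEXT)")

# The first connection starts a write transaction that only touches `staging`.
first.execute("BEGIN IMMEDIATE")
first.execute("INSERT INTO staging (text) VALUES ('incoming')")

# The second connection writes to a different table, yet it is still refused.
try:
    second.execute("INSERT INTO tweets (text) VALUES ('hello')")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

first.execute("COMMIT")

# Once the first writer has committed, the same statement succeeds.
second.execute("INSERT INTO tweets (text) VALUES ('hello')")

print(error)
first.close()
second.close()
```

The second insert fails even though the two connections never touch the same table, which is why funnelling every write through a single serialized path removes this class of error.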

Fixing that crash cleared the way for the next one: you should not share the same database connection between threads. You should open the connection on the thread where you’re going to use it. However, GCD makes no promises that items in one dispatch queue run on the same thread. We rewrote the framework to use native threads instead of GCD, and watched the graph of crashes drop dramatically.
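One way to picture the thread-owned connection is a dedicated writer thread that opens its own connection and receives work through a queue. This is a minimal Python sketch of that idea, not the framework described in this post: WriterThread, its write and stop methods, and the tweets table are hypothetical names, and Python's sqlite3 and threading modules stand in for the SQLite C API and native threads.

```python
import os
import queue
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")


class WriterThread(threading.Thread):
    """Hypothetical helper: one thread that owns the only write connection."""

    def __init__(self, db_path):
        super().__init__(daemon=True)
        self.db_path = db_path
        self.jobs = queue.Queue()

    def run(self):
        # The connection is opened on the thread that uses it and never leaves it.
        conn = sqlite3.connect(self.db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
        )
        while True:
            job = self.jobs.get()
            if job is None:  # sentinel: stop the thread
                break
            sql, params = job
            with conn:  # commit on success, roll back on error
                conn.execute(sql, params)
        conn.close()

    def write(self, sql, params=()):
        # Safe to call from any thread; the statement runs later on the writer thread.
        self.jobs.put((sql, params))

    def stop(self):
        self.jobs.put(None)
        self.join()


writer = WriterThread(path)
writer.start()


def produce(n):
    # Producers only enqueue work; none of them touches SQLite directly.
    writer.write("INSERT INTO tweets (text) VALUES (?)", (f"tweet {n}",))


producers = [threading.Thread(target=produce, args=(n,)) for n in range(5)]
for t in producers:
    t.start()
for t in producers:
    t.join()
writer.stop()

# Read back with a connection opened on the thread that uses it (the main thread).
reader = sqlite3.connect(path)
count = reader.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
reader.close()
print(count)
```

Because the queue is drained by a single thread, writes are serialized and the connection never crosses a thread boundary. Python's sqlite3 module enforces a similar rule by default: using a connection from a thread other than the one that created it raises ProgrammingError.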

What lingered were database schema lock errors. We traced them back to the SQLite Shared Cache. Disabling it eliminated the remaining crashes.

While iTunes Connect collects crash reports from production builds, Crashlytics lets us collect crashes from employee dogfood builds, which dramatically shortens the feedback cycle. Instead of iterating on several public releases, we quickly address the crashes internally, and ship a better version to users.

On Crashlytics
Last year, our Twitter for iOS team (Satoshi Nakagawa, Bob Cottrell, Zhen Ma and I) started using Crashlytics as an internal crash reporting framework. It was clear their analysis was the best available, immediately catching crashes other frameworks missed. Their web front-end was far more mature than what we were building internally. We liked them so much, we welcomed them to the flock.

Today, the Crashlytics team released the Android SDK, which our Android engineers have been beta testing. We’d all recommend you give it a try.

Posted by Ben Sandofsky (@sandofsky)
Tech Lead, Twitter for Mac (previously, Twitter for iOS)