Guest blogpost: Inferring Jakarta Commuting Statistics from Twitter Data

Friday, 22 September 2017


Every day, people around the world post hundreds of millions of Tweets in dozens of languages. These Tweets can provide real-time information on many issues, including the cost of food, availability of jobs, access to health care, quality of education, and reports of natural disasters. Few places on earth are as prolific in their Twitter usage as the Indonesian capital, Jakarta.

As the United Nations, its member states and its peoples pursue the Sustainable Development Goals (SDGs), we need to make much better use of the treasure troves of insights captured in big data sources to measure progress against these global goals. Drawing on the agreement signed in 2016 between United Nations Global Pulse (@UNGlobalPulse) and Twitter to support efforts to achieve the SDGs, Pulse Lab Jakarta (@PulseLabJakarta), a data innovation lab of the United Nations and the Government of Indonesia and part of the UN Global Pulse network, used Twitter’s data stream to test a new method that has the potential to turn social conversations on Twitter into actionable information for city administrations.

@PulseLabJakarta partnered with the team from Jakarta Smart City (@JSCLounge) and the Indonesian Institute of Statistics to examine whether aggregated social media data could reveal how commuters move within the Greater Jakarta area, and so help public officials understand daily commuting patterns and optimize public transport.

Transport policy in a megacity

Jabodetabek, as locals fondly refer to the Greater Jakarta area, includes the settlements of Jakarta, Bogor, Depok, Tangerang and Bekasi. Jakarta itself is split into five administrative cities: North, South, East, West and Central. Population estimates range from 10 million for Jakarta to over 30 million for the broader metropolitan area.

The scale of the population and the state of the transport infrastructure make the daily commute a common complaint among residents. The city administration is working to improve the commuting experience with significant investments in transport infrastructure, such as the new Jakarta MRT.

To inform these infrastructure investments and to understand the rhythm of the city, the Indonesian Bureau of Statistics conducted the first Jakarta commuting survey in 2014. The survey filled an initial data gap, but from design to delivery, it was one year before the results were available.

Filling the data gaps

The challenge of data relevance in urban planning is not unique to Jakarta, and has spurred many attempts to use other types of data to produce similar statistics. The most promising projects use geolocated information from sources such as GPS devices, sensors, social media and mobile phones.

In Indonesia, social media is recognised as a promising data source to understand macro patterns of human behaviour while protecting individual privacy. This is especially true of Jakarta, once dubbed the “Twitter Capital of the World” due to the millions of Tweets generated in the city every day. @PulseLabJakarta used this opportunity to test whether the locational information from social media can reveal commuting patterns in the Greater Jakarta area.

  • First, we produced Origin-Destination statistics for the ten cities in Greater Jakarta from the accounts of users who voluntarily included geolocation information in their Tweets, and then analyzed the nature of the commutes between these geographic areas.
  • Second, we calibrated the initial result based on the population distribution and Twitter usage distribution.
  • Finally, we verified the result with the official commuting statistics produced by the Indonesian Bureau of Statistics.

Origin-destination analysis

We collected all location-stamped Tweets posted in Greater Jakarta from Twitter data sets and kept those posted between 1st January 2014 and 30th May 2014, because the official commuting survey was conducted during the first quarter of 2014.

Analyzing de-personalized usage patterns, we inferred two locations for each user: origin and destination, both at sub-district level. The origin was inferred as the sub-district from which the user Tweeted most often between 9:00 pm and 7:00 am. The destination was determined as the sub-district from which the user Tweeted most often on weekdays, excluding the origin.
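
The inference rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names, the input shape (a list of timestamp and sub-district pairs for one user) and the sub-district labels are all assumptions, and ties are broken by whichever sub-district appears first in the data.

```python
from collections import Counter
from datetime import datetime


def is_night(ts):
    # 9:00 pm to 7:00 am, wrapping past midnight
    return ts.hour >= 21 or ts.hour < 7


def infer_origin_destination(tweets):
    """Infer (origin, destination) for one user.

    tweets: iterable of (timestamp, sub_district) pairs (hypothetical format).
    Returns None when either location cannot be inferred.
    """
    tweets = list(tweets)

    # Origin: the sub-district Tweeted from most often at night
    night = Counter(sd for ts, sd in tweets if is_night(ts))
    if not night:
        return None
    origin = night.most_common(1)[0][0]

    # Destination: the sub-district Tweeted from most often on weekdays
    # (Monday to Friday), excluding the origin
    weekday = Counter(
        sd for ts, sd in tweets if ts.weekday() < 5 and sd != origin
    )
    if not weekday:
        return None
    return origin, weekday.most_common(1)[0][0]


# Illustrative data only; the labels are placeholders, not real sub-districts
sample = [
    (datetime(2014, 1, 6, 22, 0), "sub_A"),   # Monday night
    (datetime(2014, 1, 7, 6, 30), "sub_A"),   # Tuesday early morning
    (datetime(2014, 1, 7, 23, 0), "sub_B"),   # Tuesday night
    (datetime(2014, 1, 7, 10, 0), "sub_C"),   # Tuesday daytime
    (datetime(2014, 1, 8, 14, 0), "sub_C"),   # Wednesday daytime
    (datetime(2014, 1, 4, 13, 0), "sub_B"),   # Saturday, ignored for destination
]
print(infer_origin_destination(sample))
```

Users for whom no night-time Tweets or no weekday Tweets outside the origin exist return None, which is one plausible reason why only a subset of users ends up with an origin-destination pair.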

Using this approach, among the 1,456,927 unique users who voluntarily posted location-stamped Tweets in Greater Jakarta during the five months from January 2014, we found the origin and destination information for 305,761 users at the sub-district level (i.e. we were not zooming in any further). This represents about 2.8 per cent of the whole population, and 14 per cent of the commuting population in Greater Jakarta.

Because Twitter penetration rates are unequal across areas, we mapped the origin-destination information from sub-district level to city level, and calibrated it using population data for the ten cities. After calibration, the cross correlation score between the official statistics and the statistics from our approach improved from 0.92 to 0.97.
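
To make the calibration idea concrete, here is a rough Python sketch. The post does not spell out the exact weighting, so two things here are assumptions: each origin city's flows are scaled by that city's population per observed Twitter user, and the agreement with official figures is measured with a plain Pearson correlation. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def calibrate(od_counts, population, twitter_users):
    """Scale Twitter-derived flows by population per observed user.

    od_counts: {(origin_city, destination_city): user_count}
    population, twitter_users: {city: count}
    The weighting scheme is an assumption, not the study's documented method.
    """
    calibrated = {}
    for (origin, dest), count in od_counts.items():
        weight = population[origin] / twitter_users[origin]
        calibrated[(origin, dest)] = count * weight
    return calibrated


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy figures for two placeholder cities, for illustration only
od_counts = {("X", "Y"): 10, ("Y", "X"): 30}
population = {"X": 1000, "Y": 500}
twitter_users = {"X": 100, "Y": 250}
calibrated = calibrate(od_counts, population, twitter_users)
print(calibrated)
```

Comparing `pearson` on the raw and on the calibrated flows against the official figures, flow by flow, is one way to check whether calibration moves the estimates closer to the survey.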

The chord diagram shows that Twitter is a promising source of data for inferring commuting statistics in Greater Jakarta.





Results in full

Table C shows the rank difference between the official statistics from the commuting survey (Table A) and the information inferred from Twitter (Table B). The table suggests that the approach produces broadly reliable predictions. For instance, the value for SJ ⇒ CJ is ‘0’ because that flow holds the same rank in both sets of statistics.
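
A rank-difference table of this kind can be computed as in the sketch below. It assumes that flows are ranked from largest (rank 1) to smallest within each source and that the difference is taken as official rank minus inferred rank; the post does not state the sign convention, and the city labels and volumes are placeholders.

```python
def rank_flows(flows):
    """Rank origin-destination pairs by volume, largest flow first."""
    ordered = sorted(flows, key=flows.get, reverse=True)
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}


def rank_difference(official, inferred):
    """Official rank minus inferred rank for each pair (assumed convention).

    Both inputs must cover the same origin-destination pairs.
    """
    official_rank = rank_flows(official)
    inferred_rank = rank_flows(inferred)
    return {pair: official_rank[pair] - inferred_rank[pair] for pair in official}


# Placeholder volumes for three flows between hypothetical cities P, Q and R
official = {("P", "Q"): 500, ("Q", "R"): 300, ("R", "P"): 100}
inferred = {("P", "Q"): 50, ("Q", "R"): 10, ("R", "P"): 20}
print(rank_difference(official, inferred))
```

A value of 0 means both sources place the flow at the same position, even when the absolute volumes differ.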



Pulse Lab Jakarta is working to improve the method with better calibration using other demographic variables, as well as expanding research by analysing commuting data from Transjakarta, which is the Bus Rapid Transit system in Jakarta.

This research was originally presented at NetMob 2017, a conference on the scientific analysis of mobile phone datasets, and more recently at the Asia-Pacific Economic Statistics Week. Our partner in this research, the Institute of Statistics, which is part of Statistics Indonesia, is planning to use this method to enhance commuting statistics produced by the Government henceforth.

We are pleased to showcase how our partnership with Twitter using #DataforGood is making a positive contribution to the Sustainable Development Goals, and we are delighted to work with the Government of Indonesia and partners from the UN to harness new digital data sources for better decision making.

* Pulse Lab Jakarta is a data innovation lab that works to leverage new digital data sources for development and humanitarian action. The Lab is a joint innovation initiative of the United Nations and the Government of Indonesia and is part of the UN Global Pulse network. PLJ works on experimenting with new methodologies and approaches to inform and support achievement of the Sustainable Development Goals. PLJ can be contacted by email: [email protected] or through our website:

