Insights

Manhattan software deployments: how we deploy Twitter’s large scale

Monday, 9 May 2016

Twitter’s Manhattan distributed database is one of the primary data stores at Twitter, serving Tweets, Direct Messages, and advertisements, among other use cases. We want to share the challenges of handling Manhattan software deployments and our approach to solving them.

The Manhattan service runs on clusters of thousands of physical hosts in multiple data centers. Each instance of the Manhattan service can be viewed as two parts consisting of a stateless coordinator process (to handle the routing of the requests) and the stateful backend layer (which stores the actual data). For reliability guarantees, the same data is replicated to a set of instances — typically three — called a mirror-set. The focus on this post will be on the core Manhattan service described above. We won’t cover supporting services — such as the topology manager, configuration web service, and metadata service — because those services have far more relaxed deployment constraints compared to the core service.

Challenges

At its heart, deployment can be simply viewed as distributing the software package and restarting the service. But the constraints and challenges involved in safely deploying a distributed database complicates this process.

While Manhattan is a multi-tenant datastore, we also run independent Manhattan clusters supporting specialized use cases such as read-only, in-memory, and strongly-consistent. Some Manhattan customers also have unique hardware requirements and are hosted on separate clusters. So in a typical deployment cycle, we canary test and deploy to thousands of physical hosts across numerous clusters in multiple data centers.
The scale at which Manhattan operates directly impacts the speed at which we can deploy. We want to make deployments faster to be able to do frequent deploys with small sets of changes.
The size of the Manhattan package is close to 300MB. We need to ensure that the package is rapidly deployed to all the hosts.
In order to provide consistency guarantees, we can only restart a small subset of nodes in a mirror-set at once. This means the restart logic must be aware and up to date with the topology of the cluster throughout the deployment.
To ensure safety, the restart logic should also take into account broken mirror-sets, planned maintenance, and unplanned outages during the deployment.
We always have to guarantee that only a percentage of hosts in the cluster can be restarted at the same time without affecting the availability of the cluster.
Manhattan deploy should be transparent to our customers. This means our deployment systems should ensure that Manhattan adheres to the same performance, reliability, and availability guarantees during deployments as normal operations. In addition, the deployment service should be cognizant of current outages affecting other services in the company and be able to pause or rollback the deploys to avoid aggravating the situation.

In summary, the challenges of Manhattan deployment can be categorized as the problems of scale and the requirement for speed and safety.

Goals

We have built a deployment service for Manhattan with the following characteristics and features:

Fully automated and easy to use
Ability to honor the constraints required for safe deployments
Facility for suspending and rolling back a deployment
Capability to schedule deploys (e.g., business-hours-only deploys, emergency deploys, hot-fix deploys) and specify dependency, order of clusters, and number of clusters in parallel for each deploy
Support for an audit trail and a deploy history

Approach

The Manhattan package is composed of two sub packages:

Manhattan coordinator and backend binary
Configuration files

Different clusters typically use the same coordinator and backend packages but have specialized configurations. Both packages are versioned separately and can be deployed independently.

In a nutshell, a deployment cycle goes through the following steps:

Creation of a test build from the latest committed code
Deployment of canary to canary Manhattan hosts
If the canary looks good, creation of a production build
Full stable production deployment

This post is unavailable

This post is unavailable.

Components

The operator submits the deployment plan through a command line tool or a web interface.

Scheduler

The Scheduler service receives a plan from web interface or command line and coordinates with different cluster managers to safely execute that plan.

Sample plans:

Canary plan: canary build X on clusters A, B, and C on 10 hosts each and on clusters D, E on 5% of hosts.
Rollout plan: rollout build X on all Manhattan clusters, start with cluster A then do B and so on. Do only one data center at a time. Do not initiate any roll after 4pm and avoid any known incidents.

Cluster manager

This component continuously monitors deployment status, on a per cluster basis, and updates the deploy status.

It creates a deployment configuration based on the deployment plan submitted by the scheduler. This deployment configuration includes things like the package version number. The cluster manager then set this information in ZooKeeper.
At this point, the new deployment state is logged and stored to provide an audit trail. This audit trail is very useful when you need to rollback to a well known state.
It does a pre-deployment validation and continuously checks the health of the cluster via a monitoring service. It also looks out for company-wide outages and deploy moratoriums. Based on these signals it makes decision to suspend or continue the deploy.
It also checks if the deploy is complete and looks out for failure states.
Finally, it is the job of the cluster manager to track and report progress of rolling restarts via email and IM notifications.

Deploy agent

This is the part of the deployment system responsible for rapidly distributing the Manhattan binaries to all the hosts in the cluster. It consists of two parts:

A daemon called deploy-agent runs on each Manhattan host. The deploy-agent watches a ZooKeeper node for deployment updates issued when a new deployment plan has been scheduled.
The deploy-agent launches handler scripts locally based on the configuration found in ZooKeeper. They do the task of downloading the packages from Packer.

When a new deployment plan is executed, batches of hosts download the new packages in parallel. Typically, all hosts in an about twenty five hundred hosts cluster receive the package within five to seven minutes. The deploy-agent also caches a set of previously downloaded packages on disk to save on time required for rollbacks.

OpsGenie

OpsGenie is the topology management service for Manhattan and manages various aspect of Manhattan related to data placement throughout the cluster. One of its components is responsible for driving a safe cluster wide rolling restart. It enforces hard restrictions to support consistency guarantees (i.e., no more than a small subset of hosts in a mirror-set should be down at any given time) while also allowing for controllable concurrency of total number of hosts that can be restarted in the cluster.

Conclusion

During the early days of Manhattan development, the packages were built on some developer’s laptop and manually copied to the production hosts which was both cumbersome and error prone. We have come a long way since then and as we continue to grow in scale, the deployment service saves countless engineering hours and ensures that deployments are a safe, easy, efficient, and joyful exercise.

If you think this is interesting and would like to work on similar problems, the SRE team at Twitter could use your help - join the flock!

Acknowledgements

Special thanks to Istvan Marko for his contribution to this project and blogpost.

We also want to acknowledge contributions by Alex Yarmula, Boaz Avital, Chris Hawkins, David Helder, Devin Kowatch, Juan Serrano, Kevin Crane, Pascal Borghino, Peter Beaman, Peter Schuller, Ravi Sharma, Ronnie Sahlberg, Sumeet Lahorani, Vladimir Vassiliouk, and others from the core-storage team at Twitter.

This post is unavailable

This post is unavailable.