> Run Deploy Agents on each host through your provisioning process
If that's the requirement, a serious question might be: why not just deploy code through your provisioning process?
My personal impression is that the key issue with most systems in this area is that they whitewash, ignore, or blur the reality that network topology, security, service interdependencies, data security, build and deployment pipelines, and so forth need to be managed at some level, may not be uniform across all infrastructure, and frequently involve nontrivial, competing requirements. As a result, individual solutions typically only work for a subset of cases while ignoring the fact that the others exist.
A clear description of the limitations of the tool's approach would be useful.
Corosync/Pacemaker is one mature solution in this area, with a focus on HA and the capacity to navigate logical topology changes (due to faults, errors, etc.) in real time, resolving service interdependencies towards a defined 'goal state'. It is extremely powerful, with a huge library of 'OCF resource agents' (service definitions) already written.[0] However, it also uses the 'agent on each machine' architecture. Where a small number of services are deployed at large scale in a parallel configuration, I believe an agentless methodology with PXE boot and IPMI power control is a more efficient node-management path, though it obviously comes at the cost of HA.
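For a sense of what those resource definitions look like in practice, here is a minimal sketch (wrapped in a little Python for readability; the resource names, IP address and netmask are made up, and it assumes the `pcs` CLI from the Pacemaker/Corosync stack is installed) declaring a virtual IP and an Apache instance as the goal state, with constraints so Pacemaker keeps them colocated and starts them in order:

    import subprocess

    def pcs(*args: str) -> None:
        # Thin wrapper: run a single `pcs` command and fail loudly on error.
        subprocess.run(["pcs", *args], check=True)

    # Virtual IP managed by the ocf:heartbeat:IPaddr2 resource agent
    pcs("resource", "create", "vip", "ocf:heartbeat:IPaddr2",
        "ip=10.0.0.50", "cidr_netmask=24", "op", "monitor", "interval=20s")

    # Web server managed by the ocf:heartbeat:apache resource agent
    pcs("resource", "create", "web", "ocf:heartbeat:apache",
        "configfile=/etc/httpd/conf/httpd.conf", "op", "monitor", "interval=30s")

    # Interdependencies: keep both on the same node, start the VIP first
    pcs("constraint", "colocation", "add", "web", "with", "vip", "INFINITY")
    pcs("constraint", "order", "vip", "then", "web")

Once that state is declared, the cluster converges towards it on its own, including after node failures.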
PS. 'Features' like rollback and 'hotfix' are arguably redundant if you have a reasonable service build process incorporating versioning.
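To make that PS concrete, a toy sketch (the paths, version numbers and 'myapp' service name are hypothetical) of a versioned-release layout in which 'rollback' is just deploying an older version:

    # If every build is an immutable, versioned artifact on disk, rollback is
    # the same operation as deploy: repoint a symlink and restart the service.
    import os
    import subprocess

    RELEASES = "/srv/myapp/releases"   # e.g. /srv/myapp/releases/1.4.2, 1.4.3, ...
    CURRENT = "/srv/myapp/current"     # symlink the service actually runs from

    def deploy(version: str) -> None:
        target = os.path.join(RELEASES, version)
        tmp = CURRENT + ".tmp"
        os.symlink(target, tmp)        # build the new link off to the side
        os.replace(tmp, CURRENT)       # atomically swap it into place
        subprocess.run(["systemctl", "restart", "myapp"], check=True)

    deploy("1.4.3")   # normal deploy
    deploy("1.4.2")   # "rollback" is the same call with an older version

A 'hotfix' then becomes nothing more than another version built and shipped through the same path.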
> why not just deploy code through your provisioning process?
Because provisioning operates on a much longer timescale, with a fundamentally different model of manipulating remote state. In a word, it's too slow.
In fact, it seems to me that this tool is itself still too slow: a half-step between Chef/Puppet/Ansible (system provisioning) and Kubernetes (application scaling). The future is clearly in the commoditization of hardware and network, i.e. immutable infrastructure and cattle-not-pets.
We use Teletraan heavily. Currently we have about 500 deploys per day, including the auto deploys.
Slightly off topic: I'm a big fan of continuous deployment, but 500 deployments a day seems excessive. Assuming an 8-hour work day, that's more than one deploy every minute. It's great that this system works that seamlessly, but are there really benefits to deploying that frequently?
What does a "host" mean in the context of deployment with this tool? Does it mean that you spin up instances and then deploy over and over on the same set?
A provider like AWS has a minimum billing period for an instance, usually an hour. If Pinterest is deploying 500 times a day, the cost of spinning up all-new instances every single time would be pretty excessive. A lot of companies have a fixed set of app servers that stay up most of the time, plus another set of dynamic ones that spin up and down with traffic.
So yes, it probably does mean that they are deploying more than once to the same set of servers.
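Back-of-the-envelope, with made-up numbers for fleet size and instance price (neither is stated in the thread):

    # Rough sketch of why fresh instances per deploy get expensive under
    # hourly-minimum billing. Fleet size and rate are hypothetical figures,
    # not Pinterest's or AWS's actual numbers.
    DEPLOYS_PER_DAY = 500
    FLEET_SIZE = 100        # hypothetical: instances touched per deploy
    HOURLY_RATE = 0.10      # hypothetical: $/instance-hour, billed per full hour

    # Fresh instances every deploy: each one is billed for at least one hour.
    fresh_cost = DEPLOYS_PER_DAY * FLEET_SIZE * HOURLY_RATE

    # Long-lived fleet redeployed in place: you pay for 24h regardless of deploys.
    fixed_cost = FLEET_SIZE * 24 * HOURLY_RATE

    print(f"fresh instances per deploy: ${fresh_cost:,.0f}/day")  # $5,000/day
    print(f"fixed fleet, redeploy in place: ${fixed_cost:,.0f}/day")  # $240/day

As soon as the deploy count exceeds the hours in a day, the per-deploy hourly minimum dominates, which is why reusing a standing fleet is the usual answer.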
[0] https://github.com/ClusterLabs/resource-agents