Chain of Trust

Overview

Taskcluster is versatile and self-serve, and enables developers to make automation changes without being blocked on other teams. In the case of developer testing and debugging, this is very powerful and enabling. In the case of release automation, the ability to schedule arbitrary tasks with arbitrary configs can present a security concern.

The chain of trust is a second factor that isn’t automatically compromised if scopes are compromised. This chain allows us to trace a task’s request back to the tree.

High level view

Scopes are how Taskcluster controls access to certain features. These are granted to roles, which are granted to users or LDAP groups.

Scopes and their associated Taskcluster credentials are not leak-proof. Also, for any security-sensitive scope, more people will inevitably hold it than you would like. Without the chain of trust, someone with release-signing scopes could, for example, schedule an arbitrary task to sign an arbitrary binary with the release keys.

The chain of trust is a second factor. The embedded ed25519 keys on the workers are either the something you have or the something you are, depending on how you view the taskcluster workers.

Each chain-of-trust-enabled taskcluster worker generates and signs chain of trust artifacts, which can be used to verify each task and its artifacts, and trace a given request back to the tree.

The scriptworker nodes are the verification points. Scriptworkers run the release-sensitive tasks, like signing and publishing releases. They verify their own task definitions, as well as all upstream tasks that generate inputs for their task. Any broken link in the chain results in a task exception.

In conjunction with other best practices, like separation of roles, we can reduce attack vectors and make penetration attempts more visible, with task exceptions on release branches.

Chain of Trust Versions

Initial Chain of Trust implementation with GPG signatures. 1.0.0b1 on 2016-11-14
CoT v2: rebuild task definitions via json-e. 7.0.0 on 2018-01-18
Generic action hook support. 12.0.0 on 2018-05-29
Release promotion action hook support. 17.1.0 on 2018-12-28
ed25519 support; deprecate GPG support. 22.0.0 on 2019-03-07
Drop support for GPG. 23.0.0 on 2019-03-27
Drop support for non-hook actions. 41.0.0 on 2021-09-02

Chain of Trust Key Management

Ed25519 key management is a critical part of the chain of trust. There are valid ed25519 keys per worker implementation (docker-worker, generic-worker, and scriptworker).

Base64-encoded seeds that can be converted to valid level 3 ed25519 pubkeys are recorded in scriptworker.constants, in DEFAULT_CONFIG['ed25519_public_keys']. These are tuples to allow for key rotation.

At some point we may add per-cot-project sets of pubkeys. We may also move the source of truth of these pubkeys to a separate location, to enable cot signature verification elsewhere, outside of scriptworker.

Verifying new ed25519 keys

The verify_cot commandline tool supports a --verify-sigs option. This will turn on signature verification, and will break if the cot artifacts are not signed by valid level 3 ed25519 keys.

There is also a verify_ed25519_signature commandline tool. This takes a file path and a signature path, and verifies if the file was validly signed by a known valid level 3 key. It also takes an optional --pubkey PUBKEY argument, which allows you to verify if the file was signed by that pubkey.

Rotating the FirefoxCI CoT keys

See this mana page.

Chain of Trust Artifact Generation

Each chain-of-trust-enabled taskcluster worker generates and uploads a chain of trust artifact after each task. This artifact contains details about the task, worker, and artifacts, and is signed by the embedded ed25519 key.

Embedded ed25519 keys

Each supported taskcluster workerType has an embedded ed25519 keypair. These are the second factor.

docker-worker has the ed25519 privkey embedded in the AMI, inaccessible to tasks run inside the docker container.

generic-worker can embed the ed25519 privkey into the AMI for EC2 instances, or into the system directories for hardware. These are permissioned so the task user doesn’t have access to the key.

Chain-of-Trust-enabled scriptworker workers have a valid ed25519 keypair.

The pubkeys for trusted workerTypes are recorded in scriptworker.constants.ed25519_public_keys.

Chain of Trust artifacts

After the task finishes, the worker creates a chain of trust json blob, signs it with its ed25519 key, then uploads it as public/chain-of-trust.json, along with its detached signature, public/chain-of-trust.json.sig. The blob looks like this:

{
  "artifacts": {
    "path/to/artifact": {
      "sha256": "abcd1234"
    },
    ...
  },
  "chainOfTrustVersion": 1,
  "environment": {
    # worker-impl specific stuff, like ec2 instance id, ip
  },
  "runId": 0,
  "task": {
    # task defn
  },
  "taskId": "...",
  "workerGroup": "...",
  "workerId": "..."
}

The v1 chain-of-trust json artifact schema is viewable here.
This is a real example artifact.
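
As a rough illustration of the structure above, the following sketch assembles the unsigned body of a chain of trust artifact from a directory of files. The helper names (sha256_of, build_cot_body) and their parameters are hypothetical and are not scriptworker's API. Signing is left out, since it needs an ed25519 library, and the environment section is left empty because its contents depend on the worker implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_cot_body(artifact_dir, task_id, run_id, task_defn, worker_group, worker_id):
    """Assemble the unsigned chain of trust dict (hypothetical helper)."""
    artifact_dir = Path(artifact_dir)
    # Key each artifact by its path relative to the artifact directory.
    artifacts = {
        path.relative_to(artifact_dir).as_posix(): {"sha256": sha256_of(path)}
        for path in sorted(artifact_dir.rglob("*"))
        if path.is_file()
    }
    return {
        "artifacts": artifacts,
        "chainOfTrustVersion": 1,
        "environment": {},  # worker-impl specific details would go here
        "runId": run_id,
        "task": task_defn,
        "taskId": task_id,
        "workerGroup": worker_group,
        "workerId": worker_id,
    }
```

Serializing the returned dict with json.dumps would give the bytes that the worker then signs and uploads.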

Chain of Trust Verification

Currently, only chain-of-trust-enabled scriptworker instances verify the chain of trust. These instances run tasks like signing, publishing, and submitting updates to the update server. If the chain of trust is not valid, scriptworker kills the task before it performs any further actions.

The following sections describe how this happens.

Decision Task

The decision task is a special task that generates a taskgraph, then submits it to the Taskcluster queue. This graph contains task definitions and dependencies. The decision task uploads its generated graph json as an artifact, which can be inspected during chain of trust verification.

We rebuild the decision task’s task definition via json-e, and verify that it matches the runtime task definition.

Ed25519 key management

The chain of trust artifacts are signed. We need to keep track of the ed25519 public keys to verify them.

We keep the level 3 gecko pubkeys in scriptworker.constants.ed25519_public_keys, as base64-encoded ascii strings. Once decoded, these are the seeds for the ed25519 public keys. These are tuples of valid keys, to allow for key rotation.
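
To make the tuple layout concrete, here is a minimal sketch of decoding such a config entry. EXAMPLE_PUBLIC_KEYS and decode_seeds are hypothetical names, and the key values are made up for illustration; they are not real FirefoxCI keys, and this is not scriptworker's own decoding code.

```python
import base64

# Hypothetical stand-in for the ed25519_public_keys config entry: each worker
# implementation maps to a tuple of base64 strings, so that an old and a new
# key can both be valid during a rotation. These values are made up.
EXAMPLE_PUBLIC_KEYS = {
    "docker-worker": (base64.b64encode(bytes(range(32))).decode("ascii"),),
    "generic-worker": (
        base64.b64encode(bytes(range(32, 64))).decode("ascii"),
        base64.b64encode(bytes(range(64, 96))).decode("ascii"),
    ),
}


def decode_seeds(public_keys, worker_impl):
    """Decode every base64 seed for one worker implementation to raw bytes."""
    seeds = []
    for encoded in public_keys.get(worker_impl, ()):
        raw = base64.b64decode(encoded, validate=True)
        if len(raw) != 32:  # ed25519 public keys are 32 bytes long
            raise ValueError(f"unexpected key length {len(raw)} for {worker_impl}")
        seeds.append(raw)
    return tuple(seeds)
```

With a layout like this, a verifier can try each decoded key for the worker implementation in turn, which is what lets a rotation overlap the old and new keys.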

Building the chain

First, scriptworker inspects the [signing/balrog/pushapk/beetmover/etc] task that it claimed from the Taskcluster queue. It adds itself and its Decision Task to the chain.

Any task that generates artifacts for the scriptworker then needs to be inspected. For scriptworker tasks, we have task.payload.upstreamArtifacts, which looks like

[{
  "taskId": "upstream-task-id",
  "taskType": "build",  # for cot verification purposes
  "paths": ["path/to/artifact1", "path/to/artifact2"],
  "formats": ["gpg", "jar"]  # This is signing-specific for now; we could make formats optional, or use it for other task-specific info
}, {
  ...
}]

We add each upstream taskId to the chain, with corresponding taskType (we use this to know how to verify the task).
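
A minimal sketch of that step, assuming the task definition is already loaded as a dict. The function name links_from_upstream_artifacts is hypothetical, and the consistency check on repeated taskIds is an added assumption rather than documented scriptworker behavior.

```python
def links_from_upstream_artifacts(task_defn):
    """Map each upstream taskId to its taskType (hypothetical helper)."""
    links = {}
    for entry in task_defn.get("payload", {}).get("upstreamArtifacts", []):
        task_id = entry["taskId"]
        task_type = entry["taskType"]
        # The same upstream task may be listed more than once, for example
        # with different formats; its taskType has to stay consistent.
        if links.setdefault(task_id, task_type) != task_type:
            raise ValueError(f"conflicting taskType for {task_id}")
    return links
```

For the example payload above, this would return a dict like {"upstream-task-id": "build"}.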

For each task added to the chain, we inspect the task definition, and add other upstream tasks:

if the task’s decision task doesn’t match one already in the chain, add it to the chain.
docker-worker tasks have task.extra.chainOfTrust.inputs, which is a dictionary like {"docker-image": "docker-image-taskid"}. Add the docker image taskId to the chain (this will likely have a different decision taskId, so add that to the chain).
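
The two rules above can be sketched as follows. The function name and parameters are hypothetical. The sketch takes the task's decision taskId as an argument, because how that ID is derived from the task is not covered here, and it treats every value in task.extra.chainOfTrust.inputs as an upstream taskId, which is an assumption based on the docker-image example.

```python
def extra_upstream_task_ids(task_defn, decision_task_id, chain_task_ids):
    """Return taskIds this task pulls into the chain that are not in it yet.

    decision_task_id is supplied by the caller; chain_task_ids is the set of
    taskIds already in the chain.
    """
    found = []
    if decision_task_id not in chain_task_ids:
        found.append(decision_task_id)
    inputs = task_defn.get("extra", {}).get("chainOfTrust", {}).get("inputs", {})
    for task_id in inputs.values():  # e.g. {"docker-image": "docker-image-taskid"}
        if task_id not in chain_task_ids and task_id not in found:
            found.append(task_id)
    return found
```

Each returned taskId would then be inspected the same way, so the walk continues until no new tasks turn up.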

Verifying the chain

Scriptworker:

downloads the chain of trust artifacts for each upstream task in the chain, and verifies their signatures. This requires detecting which worker implementation each task is run on, to know which ed25519 public key to use. At some point in the future, we may switch to an OpenSSL CA.
downloads each of the upstreamArtifacts and verifies their shas against the artifact shas in the corresponding task’s chain of trust artifact. The downloaded files live in cot/TASKID/PATH, so the task script doesn’t have to re-download and re-verify them.
downloads each decision task’s task-graph.json. For every other task in the chain, we make sure that their task definition matches a task in their decision task’s task graph.
rebuilds decision and action task definitions using json-e, and verifies the rebuilt task definition matches the runtime definition.
verifies each docker-worker task is either part of the prebuild_docker_image_task_types, or that it downloads its image from a previous docker-image task.
verifies each docker-worker task’s docker image sha.
makes sure the interactive flag isn’t on any docker-worker task.
determines which repo we’re building off of.
matches its task’s scopes against the tree; restricted scopes require specific branches.

Once all verification passes, it launches the task script. If chain of trust verification fails, it exits before launching the task script.
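
As an illustration of the artifact sha check in the list above, the sketch below compares already-downloaded files under cot/TASKID/PATH with the shas recorded in that task's chain of trust artifact. The function name and parameters are hypothetical, and downloading and signature verification are assumed to have happened beforehand.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_artifact_shas(cot_dir, task_id, cot_body, paths):
    """Check files under cot_dir/task_id against the shas in cot_body.

    Raises ValueError on the first missing entry or mismatch, mirroring the
    idea that any broken link stops verification.
    """
    for rel_path in paths:
        entry = cot_body["artifacts"].get(rel_path)
        if entry is None:
            raise ValueError(f"{rel_path} is not listed in the artifact for {task_id}")
        local_file = Path(cot_dir) / task_id / rel_path
        actual = hashlib.sha256(local_file.read_bytes()).hexdigest()
        if actual != entry["sha256"]:
            raise ValueError(f"sha256 mismatch for {task_id} {rel_path}")
```

Raising on the first failure keeps the sketch simple; a real verifier might prefer to collect every mismatch before reporting.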

Chain of Trust Testing / debugging

The verify_cot entry point allows you to test chain of trust verification without running a scriptworker instance locally.

Create the virtualenv

Install git, python>=3.6, and python3 virtualenv.
Clone scriptworker and create virtualenv:

git clone https://github.com/mozilla-releng/scriptworker
cd scriptworker
virtualenv3 venv
. venv/bin/activate
python setup.py develop

Set up the test env

Create a ~/.scriptworker or ./secrets.json with test client creds.
Create the client at the client manager. Mine has the assume:project:taskcluster:worker-test-scopes scope, but I don’t think that’s required.
The ~/.scriptworker or ./secrets.json file will look like this (fill in your clientId and accessToken):

{
  "credentials": {
    "clientId": "mozilla-ldap/asasaki@mozilla.com/signing-test",
    "accessToken": "********"
  }
}
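
Before running the test, it can help to sanity-check the credentials file. The following sketch is not how scriptworker itself reads the file; load_test_credentials is a hypothetical helper, and the order in which the candidate paths are tried is an assumption left to the caller.

```python
import json
from pathlib import Path


def load_test_credentials(candidates):
    """Return the credentials dict from the first candidate file that exists."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if not path.is_file():
            continue
        creds = json.loads(path.read_text()).get("credentials", {})
        # Both fields shown in the example file above must be filled in.
        missing = [key for key in ("clientId", "accessToken") if not creds.get(key)]
        if missing:
            raise ValueError(f"{path} is missing {', '.join(missing)}")
        return creds
    raise FileNotFoundError("no credentials file found")
```

For example, load_test_credentials(["./secrets.json", "~/.scriptworker"]) returns the parsed credentials or raises if neither file is usable.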

Find a task to test

Find a cot-enabled task on treeherder to test.
Click it, then click ‘inspect task’ in the lower left corner.
The taskId will be in a field near the top of the page.

Run the test

Now you should be able to test chain of trust verification!

verify_cot --task-type TASKTYPE TASKID  # e.g., verify_cot --task-type signing cbYd3U6dRRCKPUbKsEj1Iw

To test with signature verification, use the --verify-sigs option. This only works for level 3 trusted workers, since we don’t keep track of the other pubkeys.