Chain of Trust
Overview
Taskcluster is versatile and self-serve, and enables developers to make automation changes without being blocked on other teams. In the case of developer testing and debugging, this is very powerful and enabling. In the case of release automation, the ability to schedule arbitrary tasks with arbitrary configs can present a security concern.
The chain of trust is a second factor that isn’t automatically compromised if scopes are compromised. This chain allows us to trace a task’s request back to the tree.
High level view
Scopes are how Taskcluster controls access to certain features. These are granted to roles, which are granted to users or LDAP groups.
Scopes and their associated Taskcluster credentials are not leak-proof. Also, by their nature, more people will have restricted scopes than you want, given any security-sensitive scope. Without the chain of trust, someone with release-signing scopes would be able to schedule any arbitrary task to sign any arbitrary binary with the release keys, for example.
The chain of trust is a second factor. The embedded ed25519 keys on the workers are either the something you have or the something you are, depending on how you view the taskcluster workers.
Each chain-of-trust-enabled taskcluster worker generates and signs chain of trust artifacts, which can be used to verify each task and its artifacts, and trace a given request back to the tree.
The scriptworker nodes are the verification points. Scriptworkers run the release sensitive tasks, like signing and publishing releases. They verify their task definitions, as well as all upstream tasks that generate inputs into their task. Any broken link in the chain results in a task exception.
In conjunction with other best practices, like separation of roles, we can reduce attack vectors and make penetration attempts more visible, with task exceptions on release branches.
Chain of Trust Versions
Initial Chain of Trust implementation with GPG signatures: Initial 1.0.0b1 on 2016-11-14
CoT v2: rebuild task definitions via json-e. 7.0.0 on 2018-01-18
Generic action hook support. 12.0.0 on 2018-05-29
Release promotion action hook support. 17.1.0 on 2018-12-28
ed25519 support; deprecate GPG support. 22.0.0 on 2019-03-07
Drop support for GPG. 23.0.0 on 2019-03-27
Drop support for non-hook actions. 41.0.0 on 2021-09-02
Chain of Trust Key Management
Ed25519 key management is a critical part of the chain of trust. There are valid ed25519 keys per worker implementation (docker-worker, generic-worker, and scriptworker).
Base64-encoded seeds that can be converted to valid level 3 ed25519 pubkeys are recorded in scriptworker.constants, in DEFAULT_CONFIG['ed25519_public_keys']. These are tuples to allow for key rotation.
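As a rough illustration of the storage format described above, the sketch below decodes a tuple of base64-encoded seeds per worker implementation and checks that each one is 32 bytes long, which is the size of an ed25519 public key. The dictionary name EXAMPLE_PUBLIC_KEYS, the helper decode_seeds, and the key values are all hypothetical placeholders, not the contents of scriptworker.constants.

```python
import base64

# Hypothetical stand-in for the ed25519_public_keys config entry: each worker
# implementation maps to a tuple of base64-encoded seeds. These values are
# placeholders generated for the example, not real keys.
EXAMPLE_PUBLIC_KEYS = {
    "docker-worker": (
        base64.b64encode(bytes(range(32))).decode("ascii"),
        # A second entry can coexist with the first while keys are rotated.
        base64.b64encode(bytes(range(32, 64))).decode("ascii"),
    ),
    "generic-worker": (base64.b64encode(bytes(32)).decode("ascii"),),
    "scriptworker": (base64.b64encode(b"\x01" * 32).decode("ascii"),),
}


def decode_seeds(public_keys, worker_impl):
    """Return the raw 32-byte seeds configured for one worker implementation."""
    seeds = []
    for encoded in public_keys[worker_impl]:
        raw = base64.b64decode(encoded, validate=True)
        if len(raw) != 32:
            raise ValueError(f"expected a 32-byte seed, got {len(raw)} bytes")
        seeds.append(raw)
    return tuple(seeds)
```

Because each entry is a tuple, a verifier can try every seed for a worker implementation in turn, so an old and a new key can both be accepted during a rotation window.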
At some point we may add per-cot-project sets of pubkeys. We may also move the source of truth of these pubkeys to a separate location, to enable cot signature verification elsewhere, outside of scriptworker.
Verifying new ed25519 keys
The verify_cot commandline tool supports a --verify-sigs option. This will turn on signature verification, and will break if the cot artifacts are not signed by valid level 3 ed25519 keys.
There is also a verify_ed25519_signature commandline tool. This takes a file path and a signature path, and verifies if the file was validly signed by a known valid level 3 key. It also takes an optional --pubkey PUBKEY argument, which allows you to verify if the file was signed by that pubkey.
Rotating the FirefoxCI CoT keys
See this mana page.
Chain of Trust Artifact Generation
Each chain-of-trust-enabled taskcluster worker generates and uploads a chain of trust artifact after each task. This artifact contains details about the task, worker, and artifacts, and is signed by the embedded ed25519 key.
Embedded ed25519 keys
Each supported taskcluster workerType has an embedded ed25519 keypair. These are the second factor.
docker-worker has the ed25519 privkey embedded in the AMI, inaccessible to tasks run inside the docker container.
generic-worker can embed the ed25519 privkey into the AMI for EC2 instances, or into the system directories for hardware. This is permissioned so the task user doesn’t have access to it.
Chain-of-Trust-enabled scriptworker workers have a valid ed25519 keypair.
The pubkeys for trusted workerTypes are recorded in scriptworker.constants.ed25519_public_keys.
Chain of Trust artifacts
After the task finishes, the worker creates a chain of trust json blob, ed25519 signs it, then uploads it as public/chain-of-trust.json and its detached signature, public/chain-of-trust.json.sig. It looks like:
{
  "artifacts": {
    "path/to/artifact": {
      "sha256": "abcd1234"
    },
    ...
  },
  "chainOfTrustVersion": 1,
  "environment": {
    # worker-impl specific stuff, like ec2 instance id, ip
  },
  "runId": 0,
  "task": {
    # task defn
  },
  "taskId": "...",
  "workerGroup": "...",
  "workerId": "..."
}
The v1 chain-of-trust json artifact schema is viewable here.
This is a real example artifact.
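To make the artifact layout above more concrete, here is a minimal sketch of how a worker could assemble such a blob from in-memory artifact contents. The function names build_cot_body and serialize_cot are hypothetical, and the ed25519 signing step is deliberately left out; this is not the workers' actual implementation.

```python
import hashlib
import json


def build_cot_body(task_id, run_id, task_defn, worker_group, worker_id,
                   artifacts, environment=None):
    """Assemble a chain-of-trust style dict.

    `artifacts` maps an artifact path to its content as bytes. Only the
    sha256 of each artifact is recorded, never the content itself.
    """
    return {
        "artifacts": {
            path: {"sha256": hashlib.sha256(content).hexdigest()}
            for path, content in sorted(artifacts.items())
        },
        "chainOfTrustVersion": 1,
        "environment": environment or {},
        "runId": run_id,
        "task": task_defn,
        "taskId": task_id,
        "workerGroup": worker_group,
        "workerId": worker_id,
    }


def serialize_cot(body):
    """Serialize with sorted keys so the bytes to be signed are stable."""
    return json.dumps(body, indent=2, sort_keys=True).encode("utf-8")
```

The serialized bytes are what a detached signature would cover, which is why the sketch keeps serialization deterministic.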
Chain of Trust Verification
Currently, only chain-of-trust-enabled scriptworker instances verify the chain of trust. These are tasks like signing, publishing, and submitting updates to the update server. If the chain of trust is not valid, scriptworker kills the task before it performs any further actions.
The sections below describe how this happens.
Decision Task
The decision task is a special task that generates a taskgraph, then submits it to the Taskcluster queue. This graph contains task definitions and dependencies. The decision task uploads its generated graph json as an artifact, which can be inspected during chain of trust verification.
We rebuild the decision task’s task definition via json-e, and verify that it matches the runtime task definition.
Ed25519 key management
The chain of trust artifacts are signed. We need to keep track of the ed25519 public keys to verify them.
We keep the level 3 gecko pubkeys in scriptworker.constants.ed25519_public_keys, as base64-encoded ascii strings. Once decoded, these are the seeds for the ed25519 public keys. These are tuples of valid keys, to allow for key rotation.
Building the chain
First, scriptworker inspects the [signing/balrog/pushapk/beetmover/etc] task that it claimed from the Taskcluster queue. It adds itself and its Decision Task to the chain.
Any task that generates artifacts for the scriptworker then needs to be inspected. For scriptworker tasks, we have task.payload.upstreamArtifacts, which looks like:
[{
  "taskId": "upstream-task-id",
  "taskType": "build",  # for cot verification purposes
  # paths can be specific artifacts, or globbed patterns
  "paths": ["path/to/artifact1", "path/to/artifact2", "path/to/globbed/artifacts/*", "path/to/partially/globbed/artifacts/*.zip"],
  "formats": ["gpg", "jar"]  # This is signing-specific for now; we could make formats optional, or use it for other task-specific info
}, {
  ...
}]
We add each upstream taskId to the chain, with corresponding taskType (we use this to know how to verify the task).
For each task added to the chain, we inspect the task definition, and add other upstream tasks:
if the task’s decision task doesn’t match one already in the chain, add it to the chain.
docker-worker tasks have task.extra.chainOfTrust.inputs, which is a dictionary like {"docker-image": "docker-image-taskid"}. Add the docker image taskId to the chain (this will likely have a different decision taskId, so add that to the chain).
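The traversal described above can be sketched as a small worklist loop. This is an illustration under assumptions, not scriptworker's code: find_upstream_links and build_chain are hypothetical names, and the two lookups (get_task_defn, get_decision_task_id) are supplied by the caller because how a task definition or its decision task is fetched is outside the scope of the sketch.

```python
def find_upstream_links(task_defn):
    """Return (task_id, task_type) pairs that a task definition points at."""
    links = []
    # Scriptworker tasks list their inputs in payload.upstreamArtifacts.
    for entry in task_defn.get("payload", {}).get("upstreamArtifacts", []):
        links.append((entry["taskId"], entry["taskType"]))
    # docker-worker tasks list inputs such as {"docker-image": "<taskId>"}.
    inputs = task_defn.get("extra", {}).get("chainOfTrust", {}).get("inputs", {})
    for name, task_id in inputs.items():
        links.append((task_id, name))
    return links


def build_chain(start_task_id, start_task_type, get_task_defn, get_decision_task_id):
    """Collect every task reachable from the starting task, keyed by taskId.

    get_task_defn(task_id) and get_decision_task_id(task_defn) are
    caller-supplied, hypothetical lookups.
    """
    chain = {}
    pending = [(start_task_id, start_task_type)]
    while pending:
        task_id, task_type = pending.pop()
        if task_id in chain:
            continue  # already inspected
        defn = get_task_defn(task_id)
        chain[task_id] = task_type
        decision_id = get_decision_task_id(defn)
        if decision_id not in chain:
            pending.append((decision_id, "decision"))
        pending.extend(find_upstream_links(defn))
    return chain
```

Keying the chain by taskId means a decision task shared by several tasks is inspected only once, while a docker-image task with a different decision task pulls that second decision task in as well.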
Verifying the chain
Scriptworker:
downloads the chain of trust artifacts for each upstream task in the chain, and verifies their signatures. This requires detecting which worker implementation each task is run on, to know which ed25519 public key to use. At some point in the future, we may switch to an OpenSSL CA.
downloads each of the upstreamArtifacts and verifies their shas against the corresponding task’s chain of trust’s artifact shas. The downloaded files live in cot/TASKID/PATH, so the script doesn’t have to re-download and re-verify.
downloads each decision task’s task-graph.json. For every other task in the chain, we make sure that their task definition matches a task in their decision task’s task graph.
rebuilds decision and action task definitions using json-e, and verifies the rebuilt task definition matches the runtime definition.
verifies each docker-worker task is either part of the prebuild_docker_image_task_types, or that it downloads its image from a previous docker-image task.
verifies each docker-worker task’s docker image sha.
makes sure the interactive flag isn’t on any docker-worker task.
determines which repo we’re building off of.
matches its task’s scopes against the tree; restricted scopes require specific branches.
Once all verification passes, it launches the task script. If chain of trust verification fails, it exits before launching the task script.
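One of the steps above, checking downloaded upstream artifacts against the shas recorded in the chain of trust artifact, can be sketched as follows. The names CoTVerificationError, sha256_of_file, and verify_upstream_artifacts are hypothetical; the sketch assumes the files have already been downloaded under a cot/TASKID/PATH style layout and that the chain of trust artifact has already been parsed and had its signature checked.

```python
import hashlib
from pathlib import Path


class CoTVerificationError(Exception):
    """Raised when a downloaded artifact does not match the chain of trust."""


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upstream_artifacts(cot_dir, task_id, cot_body, paths):
    """Compare each downloaded path with the sha256 recorded in cot_body."""
    for rel_path in paths:
        recorded = cot_body["artifacts"].get(rel_path)
        if recorded is None:
            raise CoTVerificationError(
                f"{task_id}: {rel_path} is not listed in the chain of trust artifact"
            )
        actual = sha256_of_file(Path(cot_dir) / task_id / rel_path)
        if actual != recorded["sha256"]:
            raise CoTVerificationError(
                f"{task_id}: sha256 mismatch for {rel_path}: "
                f"expected {recorded['sha256']}, got {actual}"
            )
```

Raising on the first mismatch mirrors the behaviour described above, where any broken link stops the task before the task script is launched.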
Chain of Trust Testing / debugging
The verify_cot entry point allows you to test chain of trust verification without running a scriptworker instance locally.
Create the virtualenv
Install git, python>=3.6, and python3 virtualenv.
Clone scriptworker and create virtualenv:
git clone https://github.com/mozilla-releng/scriptworker
cd scriptworker
virtualenv3 venv
. venv/bin/activate
python setup.py develop
Set up the test env
Create a ~/.scriptworker or ./secrets.json with test client creds.
Create the client at the client manager. Mine has the assume:project:taskcluster:worker-test-scopes scope, but I don’t think that’s required.
The ~/.scriptworker or ./secrets.json file will look like this (fill in your clientId and accessToken):
{
  "credentials": {
    "clientId": "mozilla-ldap/asasaki@mozilla.com/signing-test",
    "accessToken": "********"
  }
}
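Before running the test, it can be useful to confirm that the credentials file parses and has both fields. The helper below is a hypothetical sanity check, not part of scriptworker; the order in which the two candidate paths are tried is an assumption made for the sketch.

```python
import json
from pathlib import Path


def load_test_credentials(candidates=None):
    """Return the "credentials" dict from the first candidate file that exists.

    Returns None when no candidate file is present.
    """
    if candidates is None:
        # Assumed search order; adjust to taste.
        candidates = [Path.home() / ".scriptworker", Path("secrets.json")]
    for candidate in candidates:
        path = Path(candidate)
        if not path.is_file():
            continue
        creds = json.loads(path.read_text())["credentials"]
        missing = {"clientId", "accessToken"} - set(creds)
        if missing:
            raise ValueError(f"{path} is missing: {', '.join(sorted(missing))}")
        return creds
    return None
```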
Find a task to test
Find a cot-enabled task on treeherder to test.
Click it, click ‘inspect task’ in the lower left corner.
The taskId will be in a field near the top of the page.
Run the test
Now you should be able to test chain of trust verification!
verify_cot --task-type TASKTYPE TASKID # e.g., verify_cot --task-type signing cbYd3U6dRRCKPUbKsEj1Iw
To test with signature verification, use the --verify-sigs option. This only works for level 3 trusted workers, since we don’t keep track of the other pubkeys.
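When checking several tasks in a row, the command line shown above can be driven from a short script. The wrapper below is a hypothetical convenience that uses only the options mentioned in this section (--task-type and --verify-sigs); it assumes verify_cot is on the PATH of the activated virtualenv.

```python
import subprocess


def build_verify_cot_argv(task_type, task_id, verify_sigs=False):
    """Build the argument list for one verify_cot invocation."""
    argv = ["verify_cot", "--task-type", task_type]
    if verify_sigs:
        argv.append("--verify-sigs")
    argv.append(task_id)
    return argv


def run_verify_cot(task_type, task_id, verify_sigs=False):
    """Run verify_cot and return its exit code without raising on failure."""
    argv = build_verify_cot_argv(task_type, task_id, verify_sigs=verify_sigs)
    return subprocess.run(argv, check=False).returncode
```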