71 changes: 71 additions & 0 deletions rfcs/0000-cighash-all/0000-cighash-all.md
@@ -0,0 +1,71 @@
---
Number: "0000"
Category: Standards Track
Status: Proposal
Author: Xuejie Xiao <[email protected]>
Created: 2025-02-05
---

# CIGHASH_ALL

This document defines a new message calculation scheme used by CKB lock scripts to guard against malleability attacks.

## Rationale

Unlike most blockchains, CKB does not formally define a signature verification flow for its transactions. Instead, a CKB transaction is considered valid when all lock scripts in its input cells, as well as all type scripts in its input and output cells, succeed in [execution](https://github.com/nervosnetwork/rfcs/blob/master/rfcs/0003-ckb-vm/0003-ckb-vm.md).

Nonetheless, a transaction must not be malleable, meaning that it must not be possible to tamper with a transaction once someone has created it. By convention, the lock scripts in CKB guard against malleability attacks: a typical lock script, running for a series of input cells that form a particular script group, calculates a `message` by accessing the transaction it runs upon, then fetches a signature from one of the designated witness fields. It then runs a signature verification process to validate the signature against the `message`, and only succeeds when the signature passes the verification. With this mechanism, any tampering with the transaction results in a different `message`, which makes the verification process fail, which in turn makes the execution of the lock script fail.
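
The flow described above can be sketched in a few lines of Python. This is a toy model only: the names `toy_message`, `toy_sign` and `toy_lock_script` are hypothetical, the transaction is an arbitrary byte string, and HMAC stands in for a real signature scheme (an actual lock script would verify a public-key signature such as secp256k1).

```python
import hashlib
import hmac


def toy_message(tx_bytes: bytes) -> bytes:
    # Stand-in for the message calculation: hash everything the lock wants to protect.
    return hashlib.blake2b(tx_bytes, digest_size=32).digest()


def toy_sign(key: bytes, message: bytes) -> bytes:
    # Assumption: HMAC is only a placeholder for a real signature algorithm.
    return hmac.new(key, message, hashlib.sha256).digest()


def toy_lock_script(key: bytes, tx_bytes: bytes, witness_signature: bytes) -> bool:
    # Recompute the message from the transaction, then check the signature from the witness.
    expected = toy_sign(key, toy_message(tx_bytes))
    return hmac.compare_digest(expected, witness_signature)


key = b"owner-secret"
tx = b"inputs:cell#1|outputs:bob=100,alice=400"
signature = toy_sign(key, toy_message(tx))

ok_original = toy_lock_script(key, tx, signature)

# Any change to the signed bytes yields a different message, so the old signature fails.
tampered = tx.replace(b"bob=100", b"bob=400")
ok_tampered = toy_lock_script(key, tampered, signature)
```

The point of the sketch is the dependency chain: the signature is bound to the `message`, and the `message` is bound to whatever transaction bytes the lock chooses to cover.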

Calculating such a `message` poses a real challenge: the `message` must capture enough information to keep the transaction safe from any tampering, yet it must not cover so much data that it hinders interoperability.

Historically, a particular `message` calculation algorithm was [introduced](https://github.com/nervosnetwork/ckb-system-scripts/blob/934166406fafb33e299f5688a904cadb99b7d518/c/secp256k1_blake160_sighash_all.c#L149-L219) by the lock scripts included in CKB's genesis block, and has been used ever since. Many other locks from the community have adopted a similar workflow. However, this workflow has only ever existed as part of a script's implementation and has never been properly documented. In addition, certain pitfalls of this workflow have surfaced as we learn more about coding for CKB's particular environment:

* While the current workflow assumes that the first witness of the currently executing input cells' [script group](https://github.com/nervosnetwork/rfcs/blob/master/rfcs/0022-transaction-structure/0022-transaction-structure.md) is a [WitnessArgs](https://github.com/nervosnetwork/ckb/blob/a6733e6af5bb0da7e34fb99ddf98b03054fa9d4a/util/types/schemas/blockchain.mol#L104-L108) structure serialized in the [molecule](https://github.com/nervosnetwork/rfcs/blob/master/rfcs/0008-serialization/0008-serialization.md) serialization format, this assumption is not enforced, and there is code that [exploits](https://github.com/cryptape/quantum-resistant-lock-script/blob/22de5369b60b1e59bb698927c143d9efbe8527a9/c/ckb-sphincsplus-lock.c#L67-L80) this oversight for certain gains. We believe this can become a problem as future standards arise.
* The current workflow covers the whole [Transaction](https://github.com/nervosnetwork/rfcs/blob/master/rfcs/0022-transaction-structure/0022-transaction-structure.md) structure, as well as all witnesses from the current script group. However, the `Transaction` structure only contains pointers to the consumed input cells; it does not cover any of their contents, e.g., the CKBytes stored in each input cell, or any input cell's data. This makes it harder to design a proper offline signing protocol. Looking at the literature, the Bitcoin community made the same choice early on, but later [came up](https://en.bitcoin.it/wiki/BIP_0143) with an updated design that also signs the actual contents of each input UTXO. We believe a message that covers the contents of all input cells can bring real benefits to future CKB wallets & applications.
Contributor

typo appliations

However, the Transaction structure only contains pointer to all the consumed input cells, it does not cover any contents of the input cells, e.g., the CKBytes store in each input cell, or any input cell's data

This part explains what the current transaction hash captures, am i correct?

If yes, with a replace by fee feature enabled, and since transaction hash doesn't capture the CKBytes, can the following happen?

Alice sends Bob 100 ckb using her 500ckb cell, get 400 in change.
Nothing stops Bob to take that same signature and add to a new transaction that says:
Alice sends Bob 400ckb using her 500ckb cell, get 100 in change.

Bob broadcast + prioritize it and get 400, instead of 100?

Contributor Author

I think you slightly misunderstand the following lines:

the Transaction structure only contains pointer to all the consumed input cells

Yes, a transaction hash does not cover the actual contents of input cells (e.g., the CKBytes of each input cell). However, the transaction hash does cover the OutPoint in each CellInput structure, which can be viewed as a pointer to an input cell. This pointer already helps guard against manipulating input cells in a signed CKB transaction.

Your described attacks won't happen due to several reasons:

  • First of all, "Alice sends Bob 100 ckb" or "Alice sends Bob 400 ckb" is represented as an output cell to Bob. This means the new transaction has a changed output cell, so the transaction hash naturally changes, and the old signature won't work.
  • Suppose we keep the output cells identical. Say the original transaction is "Alice sends Bob 400 ckb using her 500 ckb cell, gets 99 in change, using 1 ckb as fees", and someone (most likely a miner) changes it to "Alice sends Bob 400 ckb using her 1000 ckb cell, gets 99 in change, using 501 ckb as fees". Now all output cells stay the same, so the problem described in the first bullet point won't happen. However, the attack still requires swapping one input cell for a different one, and 2 options exist:
      • We could simply change the OutPoint in one of the CellInput structures of the transaction to point to a new cell which holds 1000 ckb from Alice. However, this means one CellInput structure differs from the old transaction, so the transaction hash of the new transaction changes as well, and the old signature no longer validates.
      • If we cannot change anything in the signed transaction, the question shifts to: can we increase the CKBytes stored in a cell on-chain, while keeping the OutPoint used to reference this on-chain cell the same? To me this is an even harder task. I don't have a way to make it happen now; let me know if you believe such attacks exist.

To summarize, I don't personally believe the current way of signing transactions in CKB has any weak points that let one manipulate a transaction after it is signed. However, being secure on chain is sometimes not enough: CIGHASH_ALL enhances the workflow by also signing the contents of input cells, simply to make offline signing easier.

thank you for clarifying. You're right i misunderstood the line

will be represented as an output cell to Bob. This means the new transaction has an output cell changed, the transaction hash naturally changed

This should clear things up.


As a result, this document aims to propose `CIGHASH_ALL`, a properly defined message calculation scheme used by CKB lock scripts to ensure transactions are not malleable.

The name `CIGHASH_ALL` comes from `CKB's Signature Hash All`. In many places, including a [filename](https://github.com/nervosnetwork/ckb-system-scripts/blob/master/c/secp256k1_blake160_sighash_all.c) in CKB's system script code, `SIGHASH_ALL` has been used to denote the old workflow for calculating a signing message. Here we explicitly pick a different name to distinguish between the two.
Member

@janx janx Feb 11, 2025

I suggest choosing a better name. CIGHASH_ALL can easily cause confusion, as it sounds the same as SIGHASH_ALL and a single char typo could turn it into SIGHASH_ALL.

Contributor Author

@xxuejie xxuejie Feb 11, 2025

Some other names I can think of are CKB_HASH_ALL, CKB_SIGHASH_ALL, and CKB_MSGHASH_ALL; any other suggestions are welcome.

Contributor

Nervos's sighash_all uses an algorithm that aligns with Bitcoin's pre-SegWit SIGHASH, so inputs metadata is not included in the message to sign.

Bitcoin tackled this in BIP143: https://github.com/bitcoin/bips/blob/master/bip-0143.mediawiki

Maybe we can directly or indirectly reference BIP143/SegWit in the name 🤔

Contributor Author

To be honest, I personally don't care about what the name is, I just care about the fact that a commonly-agreed name is decided here. So I will leave this comment as it is, and will always be happy to modify the name to whatever is chosen.

Contributor Author

@xxuejie xxuejie Feb 14, 2025

After some thoughts, I suggest the name CKB_SIGHASH_ALL. I will leave this name here for a few days, if no further questions arise, I will make the actual changes here.

EDIT: personally, I feel it necessary to acknowledge BIP143 in the RFC spec definition, but I think it is not appropriate to include BIP143 in the name of the new spec.

Contributor Author

@xxuejie xxuejie Feb 18, 2025

To combine the above comments, I would suggest a different name here: CKB_TX_MESSAGE_ALL, it consists of 3 parts:

  • The CKB_TX_ prefix works as a namespace denoting that we are building a specification for CKB transactions. Let's be nice to the whole blockchain community, to avoid a name that is too general
  • MESSAGE has 2 advantages: first, it helps avoid confusion with SIGHASH; second, if you think about the specification defined above and the reference implementation, we are not really designing a hash here. We are designing a spec for a series of concatenated bytes: you can keep the bytes as they are, or you can feed them into a hash. So what we have here really is a message, not necessarily a hash.
  • The final suffix defines the range of the transaction to include in the message. Personally, I think all and full are both fine, but semantically speaking I feel all is slightly better when I checked the dictionary. But I'm not a native English speaker, @Matt-RUN-CKB @jordanmack care to weigh in here?

So my current suggestion would be CKB_TX_MESSAGE_ALL. Like the previous one, I'm gonna keep it here for a few days, and will actually make the change later if no further comments are received.

Contributor

Even CKB_MESSAGE_FULL sounds good 👍 (I'm not a native English speaker tho)

About that TX, what's the reason for its inclusion? Can you foresee a possible future where messages are created from non-TX entities?

Contributor Author

About that TX, what's the reason for its inclusion? Can you foresee a possible future where messages are created from non-TX entities?

CKB is more than just the transaction and scripting parts; there might also be messages in other parts, such as the p2p layer. Adding TX as part of the name makes it more precise and future-proof.

Contributor Author

As we haven't heard more on the suggestions, I've renamed the specification from CIGHASH_ALL to CKB_TX_MESSAGE_ALL

Contributor

Makes sense, for sure it's an improvement over CIGHASH_ALL 👍


## Specification

For a CKB transaction, `CIGHASH_ALL` uses the following workflow:

* The first witness field of the currently running script group must be a valid `WitnessArgs` structure serialized in the molecule serialization format, with compatible mode turned off. The message calculation workflow fails if molecule validation fails.
* The byte concatenation of all the following fields is then calculated, in exactly the order defined here:
    + The 32-byte transaction hash returned by the [Load Transaction Hash](https://github.com/nervosnetwork/rfcs/blob/bd5d3ff73969bdd2571f804260a538781b45e996/rfcs/0009-vm-syscalls/0009-vm-syscalls.md#load-transaction-hash) syscall.
    + For each input cell of the current transaction in sequential order:
        * The full [CellOutput](https://github.com/nervosnetwork/ckb/blob/a6733e6af5bb0da7e34fb99ddf98b03054fa9d4a/util/types/schemas/blockchain.mol#L44-L48) structure of the current input cell serialized in the molecule serialization format, which is also the full content returned by the [Load Cell](https://github.com/nervosnetwork/rfcs/blob/bd5d3ff73969bdd2571f804260a538781b45e996/rfcs/0009-vm-syscalls/0009-vm-syscalls.md#load-cell) syscall, given the correct `index` and `source`.
        * The length of the current input cell's data, packed as a little-endian unsigned 32-bit integer.
        * The full cell data of the current input cell, or the full content returned by the [Load Cell Data](https://github.com/nervosnetwork/rfcs/blob/bd5d3ff73969bdd2571f804260a538781b45e996/rfcs/0009-vm-syscalls/0009-vm-syscalls.md#load-cell-data) syscall, given the correct `index` and `source`.
    + The first 16 bytes of data from the first witness field in the current script group.
    + The whole `input_type` field (a `BytesOpt` structure) from the first witness field in the current script group.
    + The whole `output_type` field (a `BytesOpt` structure) from the first witness field in the current script group.
    + Starting from the second witness field in the current script group, for each witness in sequential order:
        * The length of the witness field, packed as a little-endian unsigned 32-bit integer.
        * The full witness field, or the full content returned by the [Load Witness](https://github.com/nervosnetwork/rfcs/blob/bd5d3ff73969bdd2571f804260a538781b45e996/rfcs/0009-vm-syscalls/0009-vm-syscalls.md#load-witness) syscall, given the correct `index` and `source`.
    + Starting from the first witness that does not have an input cell of the same index (e.g., assuming a transaction has 5 input cells in total, the counting here starts from index 5 of witnesses), for each witness in sequential order:
        * The length of the witness field, packed as a little-endian unsigned 32-bit integer.
        * The full witness field, or the full content returned by the [Load Witness](https://github.com/nervosnetwork/rfcs/blob/bd5d3ff73969bdd2571f804260a538781b45e996/rfcs/0009-vm-syscalls/0009-vm-syscalls.md#load-witness) syscall, given the correct `index` and `source`.
* As an optional step, a cryptographic hashing algorithm can be used to convert the above concatenated bytes into a hash of 32 bytes or more.
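
As a rough illustration of the workflow above, the following Python sketch assembles the concatenated bytes outside of CKB-VM. The function names (`split_witness_args`, `cighash_all_message`), the argument layout and the sample data are all hypothetical, and the `WitnessArgs` check is a simplified stand-in for full molecule validation; the reference implementations linked in the Examples section remain the authoritative versions.

```python
import hashlib
import struct


def u32le(n: int) -> bytes:
    # Lengths are packed as little-endian unsigned 32-bit integers.
    return struct.pack("<I", n)


def split_witness_args(witness: bytes):
    """Strictly check a WitnessArgs table; return (header, input_type, output_type) as raw bytes."""
    if len(witness) < 16:
        raise ValueError("witness too short for a 3-field molecule table")
    total, o_lock, o_input, o_output = struct.unpack_from("<4I", witness)
    # Compatible mode off: exactly 3 fields, so the first offset is 4 + 3 * 4 = 16.
    if total != len(witness) or o_lock != 16:
        raise ValueError("not a strict WitnessArgs table")
    if not (o_lock <= o_input <= o_output <= total):
        raise ValueError("field offsets out of order")
    fields = (witness[o_lock:o_input], witness[o_input:o_output], witness[o_output:total])
    for field in fields:
        # BytesOpt: empty for None, otherwise Bytes = u32 item count + that many bytes.
        if field and (len(field) < 4 or struct.unpack_from("<I", field)[0] != len(field) - 4):
            raise ValueError("invalid BytesOpt field")
    return witness[:16], fields[1], fields[2]


def cighash_all_message(tx_hash, input_cells, group_witnesses, extra_witnesses) -> bytes:
    """input_cells: (serialized CellOutput, cell data) for every input cell, in order.
    group_witnesses: witnesses of the current script group; the first is a WitnessArgs.
    extra_witnesses: witnesses whose index is >= the number of input cells."""
    if len(tx_hash) != 32:
        raise ValueError("transaction hash must be 32 bytes")
    header, input_type, output_type = split_witness_args(group_witnesses[0])
    parts = [tx_hash]
    for cell_output, cell_data in input_cells:
        parts += [cell_output, u32le(len(cell_data)), cell_data]
    parts += [header, input_type, output_type]  # the lock field itself is not included
    for witness in list(group_witnesses[1:]) + list(extra_witnesses):
        parts += [u32le(len(witness)), witness]
    return b"".join(parts)


# Sample data (placeholders, not a real transaction): a WitnessArgs with a 65-byte lock only.
lock_field = u32le(65) + b"\xaa" * 65
end = 16 + len(lock_field)
first_witness = struct.pack("<4I", end, 16, end, end) + lock_field

message = cighash_all_message(
    tx_hash=bytes(32),
    input_cells=[(b"<CellOutput bytes>", b"cell data")],
    group_witnesses=[first_witness],
    extra_witnesses=[b"extra"],
)

# Optional final step. The hasher is left open by the spec; blake2b with CKB's usual
# personalization is just one possible choice.
digest = hashlib.blake2b(message, digest_size=32, person=b"ckb-default-hash").digest()
```

Note that the sketch never copies or zero-fills the first witness: it only slices the 16-byte table header and the two trailing fields out of it.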

### Notable Points

There are several notable points worth mentioning regarding the above specification:

* The first witness of the currently running script group must be a valid [WitnessArgs](https://github.com/nervosnetwork/ckb/blob/81a1b9a1491edca0bc42c12d8bf0f715a055a93f/util/gen-types/schemas/blockchain.mol#L114-L118) structure serialized in the molecule serialization format. This is now an enforced rule, not an assumption that can be exploited or ignored.
* The contents of all input cells are covered by the message calculation workflow, making it much easier to design an offline signing scheme.
* Witness length is packed as a 32-bit unsigned integer, whereas the older workflow used 64-bit unsigned integers. All CKB data structures, including `CellOutput`, cell data, witnesses, etc., are first serialized in the molecule serialization format, and molecule uses a 32-bit integer to denote the length of a structure. We will therefore never have a `CellOutput` / cell data / witness structure bigger than 4GB, so there is no point in representing the length with 64-bit integers.
* A different concatenation/hashing design is introduced for the first witness of the current script group, discarding the original zero-filled design. We believe this new solution can contribute to a more optimized implementation, in terms of both runtime cycles and binary size.
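
To make the last point more concrete, the sketch below serializes a `WitnessArgs` table by hand and extracts the bytes that the specification covers for the first witness. The helpers `bytes_opt`, `witness_args` and `covered_bytes` are hypothetical names written for this illustration, and the serializer is a minimal reading of the molecule table layout rather than a replacement for a molecule library.

```python
import struct


def bytes_opt(data):
    # BytesOpt: nothing for None, else a molecule Bytes (u32 LE item count + raw bytes).
    return b"" if data is None else struct.pack("<I", len(data)) + data


def witness_args(lock=None, input_type=None, output_type=None) -> bytes:
    fields = [bytes_opt(lock), bytes_opt(input_type), bytes_opt(output_type)]
    header_size = 4 + 4 * len(fields)  # total size + one offset per field = 16 bytes
    offsets, cursor = [], header_size
    for field in fields:
        offsets.append(cursor)
        cursor += len(field)
    return struct.pack("<4I", cursor, *offsets) + b"".join(fields)


def covered_bytes(witness: bytes) -> bytes:
    # The 16-byte header, then everything from the input_type offset to the end
    # (input_type and output_type are the last two fields of the table).
    input_type_offset = struct.unpack_from("<I", witness, 8)[0]
    return witness[:16] + witness[input_type_offset:]


# A signer can compute the message with a placeholder lock, then fill in the signature.
unsigned = witness_args(lock=bytes(65), input_type=b"\x01\x02")
signed = witness_args(lock=b"\x5a" * 65, input_type=b"\x01\x02")
same_coverage = covered_bytes(unsigned) == covered_bytes(signed)

# Changing the lock's length shifts the offsets in the header, which is covered.
resized = witness_args(lock=bytes(64), input_type=b"\x01\x02")
```

Under this reading, the contents of the `lock` field never enter the message, so a verifier does not need to zero-fill a copy of the witness, while the header still commits to the length of the `lock` field.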

## Examples

Following the defined spec above, a [series of libraries, CKB scripts and utilities](https://github.com/xxuejie/cighash-all-test-vector-utils) have been developed as a demonstration and inspiration. For example:

* A [Rust module](https://github.com/xxuejie/cighash-all-test-vector-utils/blob/c500a3dd8dd2b8e245527133709bc48d6e67d694/crates/cighash-all-utils/src/cighash_all_in_ckb_vm.rs) calculates `CIGHASH_ALL` message with the help of [ckb-std](https://docs.rs/ckb-std/latest/ckb_std/) to provide CKB-related APIs in CKB-VM environment. It is also designed in a generic way, which makes it compatible with different kinds of hashers;
* Another [Rust module](https://github.com/xxuejie/cighash-all-test-vector-utils/blob/c500a3dd8dd2b8e245527133709bc48d6e67d694/crates/cighash-all-utils/src/cighash_all_from_mock_tx.rs) also calculates the `CIGHASH_ALL` message in a generic way, but is designed to take the whole CKB [Transaction](https://docs.rs/ckb-gen-types/0.119.0/ckb_gen_types/packed/struct.Transaction.html) as input. Since the CKB Transaction structure lacks the actual contents of the input cells, a user can either provide a [MockTransaction](https://docs.rs/ckb-mock-tx-types/latest/ckb_mock_tx_types/struct.MockTransaction.html) instead, or simply provide the contents of the input cells;
* A [C header-only implementation](https://github.com/xxuejie/cighash-all-test-vector-utils/blob/c500a3dd8dd2b8e245527133709bc48d6e67d694/contracts/c-assert-cighash/cighash_all.h) is also provided to calculate `CIGHASH_ALL` message, also in a generic way to support different kinds of hashers, in CKB-VM compatible environments.

All of the above Rust & C implementations have been carefully written, well optimized, and extensively tested. They are considered to be usable in production environments.

A [utility](https://github.com/xxuejie/cighash-all-test-vector-utils/tree/main/crates/native-test-vector-generator) is also provided so one can generate as many test vectors as one wishes. Each test vector includes a tx file that can be accepted and executed in [ckb-debugger](https://github.com/nervosnetwork/ckb-standalone-debugger), as well as the generated `CIGHASH_ALL` message, together with enough information to regenerate that message.