Skip to content

Commit

Permalink
ULM integration documentation.
Browse files Browse the repository at this point in the history
  • Loading branch information
mariaKt committed Nov 25, 2024
1 parent 30f4fc6 commit b9644b0
Showing 1 changed file with 268 additions and 0 deletions.
268 changes: 268 additions & 0 deletions ULM-INTEGRATION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,268 @@
## Configuration of ULM-Version of Solidity Lite Semantics

We first explain the main cells of the configuration of the solidity semantics.
```
configuration
```
The contents of the k cell are explained in more detail later, in the section
about handling creation and call.
```
// The program and the transaction to be executed.
<k> decodeProgram($PGM:Bytes) ~> execute($CREATE:Bool, #if $CREATE:Bool #then $PGM:Bytes #else CallData() #fi) </k>
// The compile cell contains information about the solidity program, e.g.
// its contracts and interfaces. It is populated as rules match and
// execute decodeProgram($PGM:Bytes)
<compile>
// Contains the type of the contract whose body we are currently in.
<current-body> Id </current-body>
// Cell containing a subcell per interface in the program.
<ifaces>
// Maintain information related to an interface
<iface multiplicity="*" type="Map">
<iface-id> Id </iface-id> // Interface type
<iface-fns> // Interface functions
// A cell per interface function
<iface-fn multiplicity="*" type="Map">
<iface-fn-id> Id </iface-fn-id> // Interface function name
// Interface function argument types
<iface-fn-arg-types> .List </iface-fn-arg-types>
// Interface function return types
<iface-fn-return-types> .List </iface-fn-return-types>
</iface-fn>
</iface-fns>
</iface>
</ifaces>
// Cell containing a subcell per contract in the program.
<contracts>
// Maintain information related to a contract
<contract multiplicity="*" type="Map">
// Contract type
<contract-id> Id </contract-id>
// Map from contract state variable names to their types
<contract-state> .Map </contract-state>
```
The usage of the cells `<contract-current-sv-address>` and `<contract-statevar-addresses>`
is explained in more detail below, in the section about changing the storage according
to the ULM interface.
```
// Contains the next available Int that will correspond to a state variable
<contract-current-sv-address> 0:Int </contract-current-sv-address>
// Map from contract state variables to their corresponding unique Int.
<contract-statevar-addresses> .Map </contract-statevar-addresses>
// Code initializing the contract state variables.
<contract-init> .List </contract-init>
// Map from computed function selectors to their corresponding contract functions.
<function-selector> .Map </function-selector>
<contract-fns> // Contract functions
// A cell per contract function
<contract-fn multiplicity="*" type="Map">
// Contract function name
<contract-fn-id> Id </contract-fn-id>
<contract-fn-visibility> public </contract-fn-visibility>
<contract-fn-arg-types> .List </contract-fn-arg-types>
<contract-fn-param-names> .List </contract-fn-param-names>
<contract-fn-return-types> .List </contract-fn-return-types>
<contract-fn-return-names> .List </contract-fn-return-names>
<contract-fn-payable> false </contract-fn-payable>
// Statements corresponding to the function's body.
<contract-fn-body> .Statements </contract-fn-body>
</contract-fn>
</contract-fns>
<contract-events> // Events defined within the contract
// A cell per event
<contract-event multiplicity="*" type="Map">
// Event name
<contract-event-id> Id </contract-event-id>
<contract-event-arg-types> .List </contract-event-arg-types>
<contract-event-param-names> .List </contract-event-param-names>
// A set containing Ints corresponding to the positions, in the
// argument list of the event, of indexed parameters.
<contract-event-indexed-pos> .Set </contract-event-indexed-pos>
</contract-event>
</contract-events>
</contract>
</contracts>
</compile>
// The exec cell maintains information about the transaction being executed.
<exec>
// Sender, value, origin, timestamp. Inititialized using the ULM interface.
<msg-sender> Int2MInt(Caller())::MInt{160} </msg-sender>
<msg-value> Int2MInt(CallValue())::MInt{256} </msg-value>
<tx-origin> Int2MInt(Origin())::MInt{160} </tx-origin>
<block-timestamp> Int2MInt(BlockTimestamp())::MInt{256} </block-timestamp>
// Address of this (active contract)
<this> Int2MInt($ACCTCODE:Int)::MInt{160} </this>
// Type of this (active contract)
<this-type> Id </this-type>
// Enviroment: a map from Identifiers for variables to Ints, representing locations in the store
<env> .Map </env>
// Store: a list, that in each position holds an Int representing a Value
<store> .List </store>
// Name of function that is currently being executed
<current-function> Id </current-function>
<call-stack> .List </call-stack> // Stack of call frames
<status> EVMC_SUCCESS </status> // Status code to be returned
<gas> $GAS:Int </gas> // Available gass
</exec>
```

## ULM Integration for Solidity Lite Semantics

The next sections describe the main steps taken to integrate the ULM
functionality with the solidity semantics.

### Removing all summarization
This is specific to the solidity semantics alone. Before starting the ULM
integration we had applied manual CSE for the contract UniswapV2SwapRenamed.sol,
a process that resulted in additional rules (in src/uniswap-summaries.md). As
part of the ULM integration, we removed all changes related to summarization,
which included:
- importing the module that included the summarized rules, `imports SOLIDITY-UNISWAP-SUMMARIES`,
from the module SOLIDITY in src/solidity.md.
- removing the `requires "uniswap-summaries.md"` from src/solidity.md.
- removed the cell `<summarize>`, and all its uses.

### Change contract storage to correspond to the ULM interface. Implement assignment and access of state variables accordingly.
The ULM interfaces to access and edit a contract's storage are
```
syntax Int ::= GetAccountStorage(key: Int)
syntax K ::= SetAccountStorage(key: Int, value: Int)
```
. They provide an Int as a key to the storage, as per the EVM memory model, and
store an Int value. In contrast, the Solidity semantics were implemented
- with Identifiers as keys to each contract's storage.
- Sort Value as values. Aggregate types, e.g. arrays and mappings, were stored
as K Lists and K Maps respectively, still with a single key.
Therefore, in order to be able to make use of the ULM interface, we needed to
reimplement the layout of mappings (arrays are TODO, as they were not needed for
the demo) as well as access and assignement to state variables in a way that
stored values are of sort Int and each stored value would correspond to a unique
integer address. To that end, we implemented a hash-based schema, where we
assign a unique integer `I` to each state variable `V`. If `V` is scalar, then
`I` is its address. If `V` is a mapping, then `V[i]` is accessed by appending
`i` to `I` and obtaining the keccack hash. For a multidimensional mapping, the
process of appending the next index and hashing is repeated.
The cell `<contract-current-sv-address>` is used to track which unique number
should be assigned to a state variable. The cell `contract-statevar-addresses`
maps the variable Identifiers to their assigned addresses.

### Removing all the maintained/emulated "blockchain" and use ULM interface instead
The solidity lite semantics were initially designed to "emulate", or locally
maintain, the blockchain state needed to execute the requested transactions. The
live contracts were maintained in a cell (`<live-contracts>`) that, for each
contract contained a cell `<live-contract>` with its address (`<contract-address>`),
its type (`<contract-type>`), and its storage (`<contract-storage>`). These
cells were used by the rules for assignment and lookup, in order to find the
appropriate contract's storage, and either update it or return a value.
As part of the ULM integration, we removed this locally maintained state from
the configuration, because we would now be accessing it instead through the ULM
interface.
We edited the rule for assignment to state variables to no longer match the
removed cells, using the `SetAccountStorage(key: Int, value: Int)` instead. The
key to the store DB is the unique address for each state variable/mapping
element, the computation of which is described in the previous section. The
value is the value to be assigned (which is the same as in the assignment rule
prior to the ULM integration modifications) converted to Int. There are
different rules to match and convert to Int from differend types. Currrently, we
handle assignment to state variables that are of type uint8, unit32, uint112,
uint256/uint, address (uint160), bool, contract, and mapping with uint256 or
address value types.
We similarly edit the rule for state variable lookup to also not match the
removed cells and instead use `GetAccountStorage(key: Int)`with the same key.
The Int obtained by `GetAccountStorage` is converted to a Value of the
appropriate type. Currently, we support lookup of state variables of type uint8,
unit32, uint112, uint256/uint, address (uint160), bool, contract, and mapping
with uint256 or address value types.

NOTE: There are additional rules in the semantics that need to be updated by
removing the usage of the `<live-contracts>` cell and instead use ULM interfaces
(e.g., rules for new, external call). They have been commented out and are
TODO, since they were not needed for the demo.

### Compute and store function selectors for public/external functions.
In order to be able to call a function `F` with function selector `S`, provided
as part of the calldata, the solidity semantics needs to precompute the function
selectors for all public/external functions of the contracts in `$PGM`. This
includes the contract functions with public/external visibility, and the
automatically generated getter functions for public state variables. The
function selectors are stored per contract in the subcell `<function-selector>`,
which contains a Map with function selectors as keys and Identifiers
(corresponding to function names) as values.
For a function `F`, we construct its function selector according to the solidity
ABI, as follows:
- We first create the function signature `Sig`, by combining the function name
`F`, `"("`, the canonical name of the formal parameter types followed by `","`
unless it is the last, and `")"`. We properly handle geneating the correct
canonical type name for uint256/uint, uint112, uint32, uint8, address and
bool, needed for the demo.
- We compute the keccak hash of `Sig`.
- We get the first four bytes.


### Implement getGasLeft
We added a cell `<gas>`, and initialize it with the gas sent with the
transaction. We then implement the function `getGasLeft` of ULM, making it match
and return the contents of the `<gas>` cell.
Note that the Solidity semantics does not do gas computation. Therefore, at the
moment the returned gas is always equal to the gas that was sent.

### Implement getStatus
We added a cell `<status>`, and initialize it with the status code for
successfull execution, `EVMC_SUCCESS`. We also edit the rules for a failed
`require` and `assert` to include rewriting the contents of the `<status>` cell
to `EVMC_REVERT` and `EVMC_FAILURE` respectively. Finally, we implement the
function `getStatus` of ULM, making it match and return the contents of the
`<status>` cell.

### Implement getOutput
We implement the function `getOutput` of ULM, making it match and return the top
of the `<k>` cell, encoded as required by the Solidity ABI. We handle return
types of Bytes, uint8, uint256, bool, and string (limited to returning a single
string of length < 32).

### Handle creation ($CREATE=true) and call ($SREATE=false)

The k cell comprises (1) the contract(s) and (2) the requested transaction.
1. The contracts are contained in the conriguration variable `$PGM` as Bytes.
`$PGM` is decoded using `decodeProgram`, which simply hooks into
`ULM.decode` and returns Sort Program. The rewrite rules executing the
provided Program result in updating the contents of the `<compile>` cell as
needed to represent the provided contract(s). We require that, while `$PGM`
may contain one or more contracts, the active contract must be the last on
the list. This ensures that the cell `<current-body>` ends up containing the
type of the active contract.
2. The requested transaction is either a contract creation or a function call,
as dictated by the value of the cponfiguration variable `$CREATE`. If
`$CREATE==true`, this is a contract creation and the contract bytes is
provided to the `execute` function. If `$CREATE==false`, this is a call and
the calldata is provided to `execute` instead.

#### execute function
- case `true`
This is a contract creation, thus `execute(true, B)` must be rewritten to
initializing the contract and return the resulting Bytes. The contracts for
the demo do not have constructors, therefore we did not need to add a call to
the constructor. We only need to match and execute `INIT`, which is the code
initializing the contract's state variables, and return the contract bytes.
- case `false`
This is a call to one of the functions of the active contract.
`execute(false, B)` identifies the called function `F` of the active contract
using the function selector (first four bytes of `B`). It decodes the provided
arguments (remaining `B`) using the types of the parameters of `F`. We
currently support decoding for the types uint256/uint, uint112, uint32, uint8
and bool.

### Event logging

We handle event encoding (as needed for the demo), restricting ourselves to the
following cases:
- no unnamed events. In this case, topic 0 is the keccak hash of the event
signature, computed the same way as the function signature above.
- event argument types of only uint256 and uint160 (address).

We use the cell `<contract-event-indexed-pos>` that contains the indexed
parameters. If a parameter is indexed, we create an Int for its value and it
will become a topic. Otherwise we append its value to the end of an initially
empty byte string with all other non indexed parameters. Depending on the number
of topics, we call the appropriate ULM interface `Log1`, `Log2`, `Log3`, or
`Log4`.

0 comments on commit b9644b0

Please sign in to comment.