zk_evm
A collection of libraries to prove Ethereum blocks with Polygon Zero Type 1 zkEVM, powered by starky and plonky2 proving systems.
Directory structure
This repository contains the following Rust crates:
- `mpt_trie`: A collection of types and functions to work with Ethereum Merkle Patricia Tries.
- `smt_trie`: A collection of types and functions to work with Polygon Hermez Sparse Merkle Trees (SMT).
- `trace_decoder`: A flexible protocol that processes Ethereum clients' trace payloads into an IR format that can be understood by the zkEVM prover.
- `evm_arithmetization`: Defines all the STARK constraints and recursive circuits to generate succinct proofs of EVM execution. It uses [starky and plonky2](https://github.com/0xPolygonZero/plonky2) as proving backends.
- `zero_bin`: A composition of `paladin` and `evm_arithmetization` to generate EVM block proofs.
Dependency graph
Below is a simplified view of the dependency graph, including the proving system backends and the application layer defined within zero-bin.
```mermaid
%%{init: {'theme':'dark'}}%%
flowchart LR
    subgraph ps [proving systems]
        A1{{plonky2}}
        A2{{starky}}
    end
    ps --> zk_evm
    subgraph zk_evm [zk_evm]
        B[mpt_trie]
        C[evm_arithmetization]
        D[trace_decoder]
        B --> C
        B ---> D
        C ---> D
        F{zero-bin}
        C --> F
        D --> F
    end
```
Documentation
Documentation is still incomplete and will be improved over time. You can look at the sequence diagrams for the proof generation flow, or go through the zkEVM book for explanations on the zkEVM design and architecture.
Branches
The default branch for the repo is the `develop` branch, which is not stable and is under active development. Most PRs should target `develop`. If you need a stable branch, then a tagged version of `main` is what you're after. It should be assumed that `develop` will break and should only be used for development.
Building
The zkEVM stack currently requires the `nightly` toolchain, although we may transition to `stable` in the future.
Note that the prover uses the Jemalloc memory allocator due to its superior performance.
License
Licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
STARK framework
Field
Our zkEVM is designed to have its execution traces encoded in a particular prime field $\mathbb{F}_p$, with $p = 2^{64} - 2^{32} + 1$. A nice property of this field is that it can represent the results of many common `u32` operations. For example, (widening) `u32` multiplication has a maximum value of $(2^{32} - 1)^2$, which is less than $p$. In fact a `u32` multiply-add has a maximum value of $p - 1$, so the result can be represented with a single field element, although if we were to add a carry-in bit, this no longer holds.
This field also enables a very efficient reduction method. Observe that

$$2^{64} \equiv 2^{32} - 1 \pmod p$$

and consequently

$$2^{96} \equiv 2^{32} (2^{32} - 1) \equiv -1 \pmod p.$$

To reduce a 128-bit number $n$, we first rewrite $n$ as $n = n_0 + 2^{64} n_1 + 2^{96} n_2$, where $n_0$ is 64 bits and $n_1, n_2$ are 32 bits each. Then

$$n \equiv n_0 + (2^{32} - 1) n_1 - n_2 \pmod p.$$

After computing $(2^{32} - 1) n_1$, which can be done with a shift and subtraction, we add the first two terms, subtracting $p$ if overflow occurs. We then subtract $n_2$, adding $p$ if underflow occurs.

At this point we have reduced $n$ to a `u64`. This partial reduction is adequate for most purposes, but if we needed the result in canonical form, we would perform a final conditional subtraction.
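To make the reduction concrete, here is a minimal Rust sketch of this partial reduction (illustrative only, not the plonky2 implementation):

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Partially reduces a 128-bit value to a u64 congruent to it mod p,
/// using n = n0 + 2^64*n1 + 2^96*n2 and 2^64 ≡ 2^32 - 1, 2^96 ≡ -1 (mod p).
fn reduce128(n: u128) -> u64 {
    let n0 = n as u64; // low 64 bits
    let n1 = (n >> 64) as u64 & 0xFFFF_FFFF; // bits 64..96
    let n2 = (n >> 96) as u64; // bits 96..128

    // (2^32 - 1) * n1, computed with a shift and a subtraction.
    let t = (n1 << 32) - n1;

    // Add the first two terms, subtracting p if overflow occurs.
    let (res, over) = n0.overflowing_add(t);
    let res = if over { res.wrapping_sub(P) } else { res };

    // Subtract n2, adding p if underflow occurs.
    let (res, under) = res.overflowing_sub(n2);
    if under { res.wrapping_add(P) } else { res }
}

fn main() {
    // Spot-check against a naive u128 remainder.
    let n: u128 = 0xDEAD_BEEF_u128 * 0x1234_5678_9ABC_DEF0_u128;
    assert_eq!(reduce128(n) as u128 % P as u128, n % P as u128);
}
```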
Cost model
Our zkEVM is designed for efficient verification by STARKs [stark], particularly by an AIR with degree 3 constraints. In this model, the prover bottleneck is typically constructing Merkle trees, particularly constructing the tree containing low-degree extensions of witness polynomials.
Cross-Table Lookups
The various STARK tables carry out independent operations, but on shared values. We need to check that the shared values are identical in all the STARKs that require them. This is where cross-table lookups (CTLs) come in handy.
Suppose STARK $S_1$ requires an operation -- say $Op$ -- that is carried out by another STARK $S_2$. Then $S_1$ writes the input and output of $Op$ in its own table, and provides the inputs to $S_2$. $S_2$ also writes the inputs and outputs in its rows, and the table's constraints check that $Op$ is carried out correctly. We then need to ensure that the inputs and outputs are the same in $S_1$ and $S_2$.

In other words, we need to ensure that the rows -- reduced to the input and output columns -- of $S_1$ calling $Op$ are permutations of the rows of $S_2$ that carry out $Op$. Our CTL protocol is based on logUp and is similar to our range-checks.

To prove this, the first step is to only select the rows of interest in $S_1$ and $S_2$, and filter out the rest. Let $f^1$ be the filter for $S_1$ and $f^2$ the filter for $S_2$. $f^1$ and $f^2$ are constrained to be in $\{0, 1\}$. $f^1 = 1$ (resp. $f^2 = 1$) whenever the row at hand carries out $Op$ in $S_1$ (resp. in $S_2$), and 0 otherwise. Let also $\alpha, \beta$ be two random challenges.

The idea is to create subtables $S_1'$ and $S_2'$ of $S_1$ and $S_2$ respectively, such that $f^1 = 1$ and $f^2 = 1$ for all their rows. The columns in the subtables are limited to the ones whose values must be identical (the inputs and outputs of $Op$ in our example).

Note that for design and constraint reasons, filters are limited to (at most) degree 2 combinations of columns.

Let $\{c^{1,j}\}_{j=0}^{m-1}$ be the columns in $S_1'$ and $\{c^{2,j}\}_{j=0}^{m-1}$ be the columns in $S_2'$.

The prover defines a "running sum" $Z^{S_1'}$ for $S_1'$ such that:

$$Z^{S_1'}_{n-1} = \frac{f^1_{n-1}}{\sum_{j=0}^{m-1} \alpha^j \cdot c^{1,j}_{n-1} + \beta},$$

$$Z^{S_1'}_{i} = Z^{S_1'}_{i+1} + f^1_i \cdot \frac{1}{\sum_{j=0}^{m-1} \alpha^j \cdot c^{1,j}_{i} + \beta}.$$

The second equation "selects" the terms of interest thanks to $f^1$ and filters out the rest.

Similarly, the prover constructs a running sum $Z^{S_2'}$ for $S_2'$. Note that $Z$ is computed "upside down": we start with $Z_{n-1}$ and the final sum is in $Z_0$.

On top of the constraints to check that the running sums were correctly constructed, the verifier checks that $Z^{S_1'}_0 = Z^{S_2'}_0$. This ensures that the columns in $S_1'$ and the columns in $S_2'$ are permutations of each other.

In other words, the CTL argument is a logUp lookup argument where $S_1'$ is the looking table, $S_2'$ is the looked table, and all the multiplicities are 1. For more details about logUp, see the next section.
To sum up, for each STARK $S$, the prover:

- constructs a running sum $Z^l_i$ for each table looking into $S$ (called looking sums here),
- constructs a running sum $Z^S$ for $S$ (called looked sum here),
- sends the final value for each running sum $Z^l_{i,0}$ and $Z^S_0$ to the verifier,
- sends a commitment to $Z^l_i$ and $Z^S$ to the verifier.

Then, for each STARK $S$, the verifier:

- computes the sum $Z = \sum_i Z^l_{i,0}$,
- checks that $Z = Z^S_0$,
- checks that each $Z^l_i$ and $Z^S$ was correctly constructed.
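As a toy illustration of the mechanism (hypothetical helper names, tiny hard-coded tables, and field inversion via Fermat's little theorem -- not the crate's API), the following sketch checks that two filtered two-column tables yield the same running sum:

```rust
/// Goldilocks prime, used here as the ambient field.
const P: u128 = 0xFFFF_FFFF_0000_0001;

fn pow_mod(mut b: u128, mut e: u128) -> u128 {
    let mut acc = 1u128;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

/// Modular inverse via Fermat's little theorem (x must be non-zero mod P).
fn inv(x: u128) -> u128 { pow_mod(x, P - 2) }

/// Rows are (filter, input, output); only rows with filter 1 contribute,
/// each adding 1 / (input + alpha * output + beta) to the sum.
fn running_sum(rows: &[(u128, u128, u128)], alpha: u128, beta: u128) -> u128 {
    rows.iter().fold(0, |z, &(f, c0, c1)| {
        (z + f * inv((c0 + alpha * c1 % P + beta) % P)) % P
    })
}

fn main() {
    let (alpha, beta) = (11, 7); // stand-ins for random verifier challenges
    // S1 calls Op(3) = 9 in its first row; S2 carries it out in its last row.
    let s1 = [(1, 3, 9), (0, 42, 0)];
    let s2 = [(0, 5, 25), (1, 3, 9)];
    assert_eq!(running_sum(&s1, alpha, beta), running_sum(&s2, alpha, beta));
}
```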
Range-checks
In most cases, tables deal with U256 words, split into 32-bit limbs (to avoid overflowing the field). To prevent a malicious prover from cheating, it is crucial to range-check those limbs.
What to range-check?
One can note that every element that ever appears on the stack has been pushed. Therefore, enforcing a range-check on pushed elements is enough to range-check all elements on the stack. Similarly, all elements in memory must have been written prior, and therefore it is enough to range-check memory writes. However, range-checking the PUSH and MSTORE opcodes alone is not sufficient, since values can also reach the stack and memory through other operations:
- Pushes and memory writes for `MSTORE_32BYTES` are range-checked in `BytePackingStark`, except PUSH operations happening in privileged mode. See `push_general_view`.
- Syscalls, exceptions and prover inputs are range-checked in `ArithmeticStark`.
- The inputs and outputs of binary and ternary arithmetic operations are range-checked in `ArithmeticStark`.
- The inputs' bits of logic operations are checked to be either 1 or 0 in `LogicStark`. Since `LogicStark` only deals with bitwise operations, this is enough to have range-checked outputs as well.
- The inputs of Keccak operations are range-checked in `KeccakStark`. The output digest is written as bytes in `KeccakStark`. Those bytes are used to reconstruct the associated 32-bit limbs checked against the limbs in `CpuStark`. This implicitly ensures that the output is range-checked.
Note that some operations do not require a range-check:
- `MSTORE_GENERAL` reads the value to write from the stack. Thus, the written value was already range-checked by a previous push.
- `EQ` reads two -- already range-checked -- elements on the stack, and checks they are equal. The output is either 0 or 1, and does therefore not need to be checked.
- `NOT` reads one -- already range-checked -- element. The result is constrained to be equal to `0xFFFFFFFF - limb` for each 32-bit limb, which implicitly enforces the range check.
- `PC`: the program counter cannot be greater than $2^{32}$ in user mode. Indeed, the user code cannot be longer than $2^{32}$, and jumps are constrained to be JUMPDESTs. Moreover, in kernel mode, every jump is towards a location within the kernel, and the kernel code is smaller than $2^{32}$. These two points implicitly enforce the range check on `PC`.
- `GET_CONTEXT`, `DUP` and `SWAP` all read and push values that were already written in memory. The pushed values were therefore already range-checked.

Range-checks are performed on the range $[0, 2^{16} - 1]$, to limit the trace length.
Lookup Argument
To enforce the range-checks, we leverage logUp, a lookup argument by Ulrich Häbock. Given a looking table $s = (s_1, \dots, s_n)$ and a looked table $t = (t_1, \dots, t_m)$, the goal is to prove that

$$\forall\, 1 \leq i \leq n,\ \exists\, 1 \leq j \leq m \text{ such that } s_i = t_j.$$

In our case, $t = (0, \dots, 2^{16} - 1)$ and $s$ is composed of all the columns in each STARK that must be range-checked.

The logUp paper explains that proving the previous assertion is actually equivalent to proving that there exists a sequence $l$ such that:

$$\sum_{i=1}^n \frac{1}{X - s_i} = \sum_{j=1}^m \frac{l_j}{X - t_j}.$$

The values of $s$ can be stored in $c$ different columns of length $n$ each. In that case, the equality becomes:

$$\sum_{k=1}^c \sum_{i=1}^n \frac{1}{X - s_i^k} = \sum_{j=1}^m \frac{l_j}{X - t_j}.$$

The multiplicity $m_i$ of value $t_i$ is defined as the number of times $t_i$ appears in $s$. In other words:

$$m_i = \left| \{ s_j^k \mid s_j^k = t_i \} \right|.$$

Multiplicities provide a valid sequence of values in the previously stated equation. Thus, if we store the multiplicities, and are provided with a challenge $\alpha$, we can prove the lookup argument by ensuring:

$$\sum_{k=1}^c \sum_{i=1}^n \frac{1}{\alpha - s_i^k} = \sum_{j=1}^m \frac{m_j}{\alpha - t_j}.$$

However, the equation is too high degree. To circumvent this issue, Häbock suggests providing helper columns $h^k$ and $d$ such that at a given row $i$:

$$h_i^k = \frac{1}{\alpha - s_i^k} \ \forall k \in \{1, \dots, c\}, \qquad d_i = \frac{1}{\alpha - t_i}.$$

The helper columns can be batched together to save columns. We can batch at most $\texttt{constraint\_degree} - 1$ helper functions together. In our case, we batch them 2 by 2. At row $i$, we now have:

$$h_i^k = \frac{1}{\alpha - s_i^{2k-1}} + \frac{1}{\alpha - s_i^{2k}} \ \forall k \in \{1, \dots, c/2\}.$$

If $c$ is odd, then we have one extra helper column:

$$h_i^{\lceil c/2 \rceil} = \frac{1}{\alpha - s_i^{c}}.$$

For clarity, we will assume that $c$ is even in what follows.
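As a toy numeric instance of the multiplicity identity above (values chosen purely for illustration), take $c = 1$, $s = (1, 2, 1)$ and $t = (0, 1, 2)$. The value 1 appears twice in $s$ and the value 2 once, so $m = (0, 2, 1)$ and indeed

$$\frac{1}{\alpha - 1} + \frac{1}{\alpha - 2} + \frac{1}{\alpha - 1} \;=\; \frac{0}{\alpha - 0} + \frac{2}{\alpha - 1} + \frac{1}{\alpha - 2}.$$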
Let $g$ be a generator of a subgroup of order $n$. We extrapolate the columns $s^k$, $t$, $m$, $d$ and $h^k$ to get polynomials such that, for any such column $f$: $f(g^i) = f_i$.

We can define the following polynomial, a running sum over the rows:

$$Z(g^j) = \sum_{i=0}^{j-1} \left( \sum_{k=1}^{c/2} h_i^k - m_i \cdot d_i \right).$$
Constraints
With these definitions and a challenge $\alpha$, we can finally check that the assertion holds with the following constraints:

$$Z(1) = 0, \qquad Z(g \cdot x) = Z(x) + \sum_{k=1}^{c/2} h^k(x) - m(x) \cdot d(x).$$

These ensure that, once we have gone over all the rows, $\sum_{i,k} h_i^k = \sum_i m_i d_i$, which is the desired equality evaluated at $\alpha$. We also need to ensure that $h^k$ is well constructed for all $k \in \{1, \dots, c/2\}$:

$$h^k(x) \cdot \big(\alpha - s^{2k-1}(x)\big) \cdot \big(\alpha - s^{2k}(x)\big) = \big(\alpha - s^{2k-1}(x)\big) + \big(\alpha - s^{2k}(x)\big).$$

Note: if $c$ is odd, we have one unbatched helper column $h^{\lceil c/2 \rceil}$ for which we need a last constraint:

$$h^{\lceil c/2 \rceil}(x) \cdot \big(\alpha - s^{c}(x)\big) = 1.$$

Finally, the verifier needs to ensure that the table $t$ was also correctly computed. In each STARK, $t$ is computed starting from 0 and adding at most 1 at each row. This construction is constrained as follows:

$$t(1) = 0, \qquad \big(t(g \cdot x) - t(x)\big) \cdot \big(t(g \cdot x) - t(x) - 1\big) = 0, \qquad t(g^{n-1}) = 2^{16} - 1.$$
Tables
Our EVM statements are decomposed into several STARK tables, each corresponding to some coprocessor capable of handling specific operations, orchestrated by a Central Processing Unit.
Each coprocessor execution can be proven independently and concurrently, and a global check (via our cross-table lookups) enforces consistency of values being shared across multiple coprocessors.
Below is a depiction of the current decomposition of EVM statements execution:
```mermaid
%%{init: {'theme':'dark'}}%%
flowchart TB
    A[Arithmetic]
    BP[BytePacking]
    C[CPU]
    subgraph KK [ Keccak Hash ]
        direction LR
        K[Keccak]
        KS[KeccakSponge]
        K --- KS
    end
    L[Logic]
    M[Memory]
    subgraph MM [ zk-continuations ]
        MB[MemBefore]
        MA[MemAfter]
        MB --- MA
    end
    C --- A
    C --- BP
    C --- KK
    C --- L
    C --- M
    M --- MM
```
This diagram is simplified, and does not represent all interactions between co-processors.
CPU
The CPU is the central component of the zkEVM. Like any CPU, it reads instructions, executes them and modifies the state (registers and the memory) accordingly. The constraining of some complex instructions (e.g. Keccak hashing) is delegated to other tables. This section will only briefly present the CPU and its columns. Details about the CPU logic will be provided later.
CPU flow
An execution run can be decomposed into two distinct parts:
-
CPU cycles: The bulk of the execution. In each row, the CPU reads the current code at the program counter (PC) address, and executes it. The current code can be the kernel code, or whichever code is being executed in the current context (transaction code or contract code). Executing an instruction consists in modifying the registers, possibly performing some memory operations, and updating the PC.
-
Padding: At the end of the execution, we need to pad the length of the CPU trace to the next power of two. When the program counter reaches the special halting label in the kernel, execution halts. Constraints ensure that every subsequent row is a padding row and that execution cannot resume.
In the CPU cycles phase, the CPU can switch between different contexts, which correspond to the different environments of the possible calls. Context 0 is the kernel itself, which handles initialization (input processing, transaction parsing, transaction trie updating...) and termination (receipt creation, final trie checks...) before and after executing the transaction. Subsequent contexts are created when executing user code (transaction or contract code). In a non-zero user context, syscalls may be executed, which are specific instructions written in the kernel. They don't change the context but change the code context, which is where the instructions are read from.
Continuations
A full run of the zkEVM consists in initializing the zkEVM with the input state, executing a certain number of transactions, and then validating the output state. However, for performance reasons, a run is split into multiple segments of at most `MAX_CPU_CYCLES` cycles, which can be proven individually. Continuations ensure that the segments are part of the same run and guarantee that the state at the end of a segment is equal to the state at the beginning of the next.

The state to propagate from one segment to another contains some of the zkEVM registers plus the current memory. These registers are stored in memory as dedicated global metadata, and the memory to propagate is stored in two STARK tables: `MemBefore` and `MemAfter`. To check the consistency of the memory, the Merkle cap of the previous `MemAfter` is compared to the Merkle cap of the next `MemBefore`.
CPU columns
Registers
- `context`: Indicates which context we are in. 0 for the kernel, and a positive integer for every user context. Incremented by 1 at every call.
- `code_context`: Indicates in which context the code to execute resides. It's equal to `context` in user mode, but is always 0 in kernel mode.
- `program_counter`: The address of the instruction to be read and executed.
- `stack_len`: The current length of the stack.
- `is_kernel_mode`: Boolean indicating whether we are in kernel (i.e. privileged) mode. This means we are executing kernel code, and we have access to privileged instructions.
- `gas`: The current amount of gas used in the current context. It is eventually checked to be below the current gas limit. Must fit in 32 bits.
- `clock`: Monotonic counter which starts at 0 and is incremented by 1 at each row. Used to enforce correct ordering of memory accesses.
- `opcode_bits`: 8 boolean columns, which are the bit decomposition of the opcode being read at the current PC.
Operation flags
Boolean flags. During the CPU cycles phase, each row executes a single instruction, which sets one and only one operation flag. No flag is set during padding. The decoding constraints ensure that the flag set corresponds to the opcode being read. There isn't a 1-to-1 correspondence between instructions and flags. For efficiency, the same flag can be set by different, unrelated instructions (e.g. `eq_iszero`, which represents the `EQ` and the `ISZERO` instructions). When there is a need to differentiate them in constraints, we filter them with their respective opcode: since the first bit of `EQ`'s opcode (resp. `ISZERO`'s opcode) is 0 (resp. 1), we can filter a constraint for an `EQ` instruction with `eq_iszero * (1 - opcode_bits[0])` (resp. `eq_iszero * opcode_bits[0]`).
Memory columns
The CPU interacts with the EVM memory via its memory channels. At each row, a memory channel can execute a write, a read, or be disabled. A full memory channel is composed of:
- `used`: Boolean flag. If it's set to 1, a memory operation is executed in this channel at this row. If it's set to 0, no operation is done but its columns might be reused for other purposes.
- `is_read`: Boolean flag indicating if a memory operation is a read or a write.
- 3 `address` columns. A memory address is made of three parts: `context`, `segment` and `virtual`.
- 8 `value` columns. EVM words are 256 bits long, and they are broken down in 8 32-bit limbs.

The last memory channel is a partial channel: it doesn't have its own `value` columns and shares them with the first full memory channel. This allows us to save eight columns.
General columns
There are 8 shared general columns. Depending on the instruction, they are used differently:
- `Exceptions`: When raising an exception, the first three general columns are the bit decomposition of the exception code. They are used to jump to the correct exception handler.
- `Logic`: For EQ and ISZERO operations, it's easy to check that the result is 1 if `input0` and `input1` are equal. It's more difficult to prove that, if the result is 0, the inputs are actually unequal. To prove it, each general column contains the modular inverse of the difference `input0 - input1` for each limb (or 0 if the limbs are equal). Then the product of each limb difference with its claimed inverse will be 1 if and only if the latter is indeed the modular inverse, which is only possible if the difference is non-zero.
- `Jumps`: For jumps, we use the first two columns: `should_jump` and `cond_sum_pinv`. `should_jump` conditions whether the EVM should jump: it's 1 for a JUMP, and 1 whenever `condition` is non-zero for a JUMPI. To check if the condition is actually non-zero for a JUMPI, `cond_sum_pinv` stores the modular inverse of `condition` (or 0 if it's zero).
- `Shift`: For shifts, the logic differs depending on whether the displacement is lower than $2^{32}$, i.e. if it fits in a single value limb. To check if this is not the case, we must check that at least one of the seven high limbs is not zero. The general column `high_limb_sum_inv` holds the modular inverse of the sum of the seven high limbs, and is used to check it's non-zero like the previous cases. Contrary to the logic operations, we do not need to check limbs individually: each limb has been range-checked to 32 bits, meaning that it's not possible for the sum to overflow and be zero if some of the limbs are non-zero.
- `Stack`: `stack_inv`, `stack_inv_aux` and `stack_inv_aux_2` are used by popping-only (resp. pushing-only) instructions to check if the stack is empty after (resp. was empty before) the instruction. `stack_len_bounds_aux` is used to check that the stack doesn't overflow in user mode. We use the last four columns to prevent conflicts with the other general columns. See stack handling for more details.
- `Push`: `is_not_kernel` is used to skip range-checking the output of a PUSH operation when we are in privileged mode, as the kernel code is known and trusted.
- `Context pruning`: When `SET_CONTEXT` is called to return to a parent context, this makes the current context stale. The kernel indicates it by setting one general column to 1. For more details about context pruning, see context-pruning.
Arithmetic
Each row of the arithmetic table corresponds to a binary or ternary arithmetic operation. Each of these operations has an associated flag in the table, set to 1 whenever the row carries out that operation and 0 otherwise. The full list of operations carried out by the table is as follows:
Binary operations
- basic operations: `add`, `mul`, `sub` and `div`,
- comparisons: `lt` and `gt`,
- shifts: `shr` and `shl`,
- `byte`: given $(i, x)$, returns the $i$-th byte in $x$,
- modular operations: `mod`, `AddFp254`, `MulFp254` and `SubFp254`,
- range-check: no operation is performed, as this is only used to range-check the input and output limbs in the range $[0, 2^{16} - 1]$.
For `mod`, the second input is the modulus. `AddFp254`, `MulFp254` and `SubFp254` are modular operations modulo `Fp254`, the prime of the BN254 curve's base field.
Ternary operations
There are three ternary operations: modular addition "AddMod", modular multiplication "MulMod" and modular subtraction "SubMod".
Besides the flags, the arithmetic table needs to store the inputs, output and some auxiliary values necessary to constraints. The input and output values are range-checked to ensure their canonical representation. Inputs are 256-bit words. To avoid having too large a range-check, inputs are therefore split into sixteen 16-bit limbs, and range-checked in the range $[0, 2^{16} - 1]$.
Overall, the table comprises the following columns:
- 17 columns for the operation flags,
- 1 column containing the opcode,
- 16 columns for the 16-bit limbs of the first input,
- 16 columns for the 16-bit limbs of the second input,
- 16 columns for the 16-bit limbs of the third input,
- 16 columns for the 16-bit limbs of the output,
- 32 columns for auxiliary values,
- 1 column containing values in the range $[0, 2^{16} - 1]$, for the range-check,
- 1 column storing the frequency of appearance of each value in the range-check column.
Note on the opcode column
The opcode column is only used for range-checks. For optimization purposes, we check all arithmetic operations against the cpu table together. To ensure correctness, we also check that the operation's opcode corresponds to its behavior. But a range-check is not associated with a unique operation: any operation in the cpu table might require its values to be checked. Thus, the arithmetic table cannot know its opcode in advance: it needs to store the value provided by the cpu table.
Auxiliary columns
The way auxiliary values are leveraged to efficiently check correctness is not trivial, but it is explained in detail in each dedicated file. Overall, five files explain the implementations of the various checks. Refer to:

- `mul.rs` for details on multiplications.
- `addcy.rs` for details on addition, subtraction, `lt` and `gt`.
- `modular.rs` for details on how modular operations are checked. Note that even though `div` and `mod` are generated and checked in a separate file, they leverage the logic for modular operations described in `modular.rs`.
- `byte.rs` for details on how `byte` is checked.
- `shift.rs` for details on how shifts are checked.
Note on "lt" and "gt"
For "lt" and "gt", auxiliary columns hold the difference between the two inputs . We can then treat them similarly to subtractions by ensuring that for "lt" and for "gt". An auxiliary column is used for the carry in additions and subtractions. In the comparisons case, it holds the overflow flag. Contrary to subtractions, the output of "lt" and "gt" operations is not but .
Note on "div"
It might be unclear why "div" and "mod" are dealt with in the same file.
Given numerator $n$ and denominator $d$, we compute, like for other modular operations, the quotient $q$ and remainder $r$ such that $n = q \cdot d + r$ with $r < d$. We then set the associated auxiliary columns to $r$ and the output to $q$.

This is why `div` is essentially a modulo operation, and can be addressed in almost the same way as `mod`. The only difference is that in the `mod` case, the output is $r$ and the auxiliary value is $q$.
Note on shifts
"shr" and "shl" are internally constrained as "div" and "mul" respectively with shifted operands. Indeed, given inputs , the output should be for "shr" (resp. for "shl"). Since shifts are binary operations, we can use the third input columns to store . Then, we can use the "div" logic (resp. "mul" logic) to ensure that the output is (resp. ).
Byte Packing
The BytePacking STARK module is used for reading and writing non-empty byte sequences of length at most 32 to memory. The "packing" term highlights that reading a sequence in memory will pack the bytes into an EVM word (i.e. U256), while the "unpacking" operation consists in breaking down an EVM word into its byte sequence and writing it to memory.
This allows faster memory copies between two memory locations, as well as faster memory reset (see memcpy.asm and memset.asm modules).
The 'BytePackingStark' table has one row per packing/unpacking operation.
Each row contains the following columns:
- 5 columns containing information on the initial memory address from which the sequence starts (namely a flag differentiating read and write operations, address context, segment and offset values, as well as a timestamp),
- 32 columns indicating the length of the byte sequence (the $i$-th column is set to 1 if the sequence has length $i$, and 0 otherwise),
- 32 columns indicating the values of the bytes that have been read or written during a sequence,
- 2 columns needed for range-checking the byte values.
Notes on columns generation
Whenever a byte unpacking operation is called, the value is read from the stack, but because the EVM and the STARKs use different endianness, we need to convert the value to a little-endian byte sequence. Only then do we resize it to the appropriate length, pruning extra zeros and higher bytes in the process. Finally, we reverse the byte order and write this new sequence into the columns of the table.

Whenever the operation is a byte packing, the bytes are read one by one from memory and stored in the columns of the BytePackingStark table. Note that because of the different endianness on the memory and EVM sides, we write bytes starting with the last one.

The length columns hold boolean values: the $i$-th column is set to 1 whenever we are currently reading or writing the $i$-th element in the byte sequence, and 0 otherwise.
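For intuition, here is a small Rust sketch of the packing direction (a hypothetical helper, with four 64-bit limbs standing in for the tables' 32-bit limbs):

```rust
/// Packs a big-endian byte sequence (as laid out on the EVM side) into the
/// little-endian limbs of a 256-bit word.
fn pack_bytes_be(bytes: &[u8]) -> [u64; 4] {
    assert!(!bytes.is_empty() && bytes.len() <= 32);
    let mut limbs = [0u64; 4];
    // The last byte of the sequence is the least significant one.
    for (i, &b) in bytes.iter().rev().enumerate() {
        limbs[i / 8] |= (b as u64) << (8 * (i % 8));
    }
    limbs
}

fn main() {
    // The two-byte sequence [0x01, 0x02] packs to the word 0x0102.
    assert_eq!(pack_bytes_be(&[0x01, 0x02]), [0x0102, 0, 0, 0]);
}
```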
Cross-table lookups
The read or written bytes need to be checked against both the cpu and the memory tables. Whenever we call `MSTORE_32BYTES` or `MLOAD_32BYTES` on the cpu side, we make use of BytePackingStark to make sure we are carrying out the correct operation on the correct values. For this, we check that the following values correspond:
- the address (comprising the context, the segment, and the virtual address),
- the length of the byte sequence,
- the timestamp,
- the value (either written to or read from the stack).

The address here corresponds to the address of the first byte.
On the other hand, we need to make sure that the read and write operations correspond to the values read or stored on the memory side. We therefore need a CTL for each byte, checking that the following values are identical in 'MemoryStark' and 'BytePackingStark':
- a flag indicating whether the operation is a read or a write,
- the address (context, segment and virtual address),
- the byte (followed by 0s to make sure the memory address contains a byte and not a U256 word),
- the timestamp.

Note that the virtual address has to be recomputed based on the length of the sequence of bytes: the virtual address for the $i$-th byte is $\texttt{virt} + \texttt{len} - 1 - i$, where $\texttt{len}$ is the length of the byte sequence.
Note on range-check
Range-checking is necessary whenever we do a memory unpacking operation that will write values to memory. These values are constrained by the range-check to be 8-bit values, i.e. fitting between 0 and 255 included. While range-checking values read from memory is not necessary, because we use the same columns for both read and write operations, this extra condition is enforced throughout the whole trace regardless of the operation type.
Keccak-f
This table computes the Keccak-f[1600] permutation.
Keccak-f Permutation
To explain how this table is structured, we first need to detail how the permutation is computed. This page gives a pseudo-code for the permutation. Our implementation differs slightly -- but remains equivalent -- for optimization and constraint degree reasons.
Let:

- $W$ be the sponge width ($W = 1600$ in our case),
- $R$ be the number of Keccak rounds ($R = 24$),
- $RC$ a vector of round constants of size $R$,
- $I$ be the input of the permutation, comprised of 25 64-bit elements.

The first step is to reshape $I$ into a $5 \times 5$ matrix. We initialize the state $A$ of the sponge with $I$:

$$A[x][y] = I[x][y] \quad \forall x, y \in \{0, \dots, 4\}.$$
We store $A$ in the table, and subdivide each 64-bit element into two 32-bit limbs. Then, for each round $i$, we proceed as follows:

- First, we define $C[x] = A[x][0] \oplus A[x][1] \oplus A[x][2] \oplus A[x][3] \oplus A[x][4]$. We store $C$ as bits in the table. This is because we need to apply a rotation on its elements' bits and carry out `xor` operations in the next step.
- Then, we store a second vector $C'$ in bits, such that: $C'[x][z] = C[x][z] \oplus C[x-1][z] \oplus C[x+1][z-1]$.
- We then need to store the updated value of $A$: $A'[x][y] = A[x][y] \oplus C[x] \oplus C'[x]$. Note that this is equivalent to the equation in the official Keccak-f description: $A'[x][y] = A[x][y] \oplus C[x-1] \oplus \text{rot}(C[x+1], 1)$.
- The previous three points correspond to the $\theta$ step in Keccak-f. We can now move on to the $\rho$ and $\pi$ steps. These steps are written as: $B[y][2x + 3y] = \text{rot}(A'[x][y], r[x][y])$, where $\text{rot}$ is the bitwise cyclic shift operation, and $r$ is the matrix of rotation offsets. We do not need to store $B$: $B$'s bits are only a permutation of $A'$'s bits.
- The $\chi$ step updates the state once again, and we store the new values: $A''[x][y] = B[x][y] \oplus (\lnot B[x+1][y] \wedge B[x+2][y])$. Because of the way we carry out constraints (as explained below), we do not need to store the individual bits for $A''$: we only need the 32-bit limbs.
- The final step, $\iota$, consists in updating the first element of the state as follows: $A'''[0][0] = A''[0][0] \oplus RC[i]$, where $RC[i]$ is the $i$-th round constant. Since only the first element is updated, we only need to store $A'''[0][0]$ of this updated state. The remaining elements are fetched from $A''$. However, because of the bitwise `xor` operation, we do need columns for the bits of $A''[0][0]$.

Note that all permutation elements are 64-bit long, but they are stored as 32-bit limbs so that we do not overflow the field.

It is also important to note that all bitwise logic operations here ($\oplus$, $\lnot$ and $\wedge$) are checked in this table. This is why we need to store the bits of most elements. The logic table can only carry out eight 32-bit logic operations per row. Thus, leveraging it here would drastically increase the number of logic rows, and incur too much overhead in proving time.
Columns
Using the notations from the previous section, we can now list the columns in the table:
- $R = 24$ columns to determine which round is currently being computed: the $i$-th column is set to 1 when we are in the $i$-th round, and 0 otherwise. These columns' purpose is to ensure that the correct round constants are used at each round.
- 1 column which stores the timestamp at which the Keccak operation was called in the cpu. This column enables us to ensure that inputs and outputs are consistent between the cpu, keccak-sponge and keccak-f tables.
- $25 \times 2 = 50$ columns to store the elements of $A$. As a reminder, each 64-bit element is divided into two 32-bit limbs, and $A$ comprises 25 elements.
- $5 \times 64 = 320$ columns to store the bits of the vector $C$.
- $5 \times 64 = 320$ columns to store the bits of the vector $C'$.
- $25 \times 64 = 1600$ columns to store the bits of $A'$.
- $25 \times 2 = 50$ columns to store the 32-bit limbs of $A''$.
- $64$ columns to store the bits of $A''[0][0]$.
- $2$ columns to store the two limbs of $A'''[0][0]$.

In total, this table comprises 2,431 columns.
Constraints
Some constraints checking that the elements are computed correctly are not straightforward. Let us detail them here.
First, it is important to highlight the fact that a `xor` between two elements is of degree 2. Indeed, for bits $x, y$, $x \oplus y = x + y - 2xy$, which is of degree 2. This implies that a `xor` between 3 elements is of degree 3, which is the maximal constraint degree for our STARKs.

We can check that $C'[x][z] = C[x][z] \oplus C[x-1][z] \oplus C[x+1][z-1]$, since all the bits involved are stored. However, we cannot directly check that $C[x] = A[x][0] \oplus A[x][1] \oplus A[x][2] \oplus A[x][3] \oplus A[x][4]$, as it would be a degree 5 constraint. Instead, we use $C'$ for this constraint. We see that:

$$C'[x][z] = A'[x][0][z] \oplus A'[x][1][z] \oplus A'[x][2][z] \oplus A'[x][3][z] \oplus A'[x][4][z].$$

This implies that the difference $d = \sum_{y=0}^{4} A'[x][y][z] - C'[x][z]$ is either 0, 2 or 4. We can therefore enforce the following degree 3 constraint instead:

$$d \cdot (d - 2) \cdot (d - 4) = 0.$$

Additionally, we have to check that $A'$ is well constructed. We know that $A'$ should be such that $A'[x][y] = A[x][y] \oplus C[x] \oplus C'[x]$. Since we do not have the bits of $A$'s elements but the bits of $A'$'s elements, we check the equivalent degree 3 constraint:

$$A[x][y] = A'[x][y] \oplus C[x] \oplus C'[x],$$

where the right-hand side is computed from stored bits and recomposed into the 32-bit limbs of $A$.

Finally, the constraints for the remaining elements, $A''$ and $A'''[0][0]$, are straightforward: $A''$ is obtained via a bitwise operation involving three elements whose bits are all already stored, and $A'''[0][0]$ is the output of a simple bitwise `xor` with a round constant.
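A quick sanity check of these arithmetic encodings (plain Rust over bit values):

```rust
// On bits, x ^ y equals x + y - 2xy (degree 2); composing it once more
// yields the degree-3 encoding of a three-way XOR used above.
fn xor2(x: i64, y: i64) -> i64 { x + y - 2 * x * y }
fn xor3(x: i64, y: i64, z: i64) -> i64 { xor2(xor2(x, y), z) }

fn main() {
    for x in 0..2i64 {
        for y in 0..2i64 {
            assert_eq!(xor2(x, y), x ^ y);
            for z in 0..2i64 {
                assert_eq!(xor3(x, y, z), x ^ y ^ z);
            }
        }
    }
}
```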
KeccakSponge
This table computes the Keccak256 hash, a sponge-based hash built on top of the Keccak-f[1600] permutation. An instance of KeccakSponge takes as input a Memory address `addr` and a length `len`, and computes the Keccak256 digest of the memory segment starting at `addr` and of size `len`. An instance can span many rows, each individual row being a single call to the Keccak table. Note that all the read elements must be bytes; the proof will be unverifiable if this is not the case. Following the Keccak specifications, the input string is padded to the next multiple of 136 bytes. Each row contains the following columns:
- Read bytes:
  - 3 address columns: `context`, `segment` and the offset `virt` of `addr`.
  - `timestamp`: the timestamp which will be used for all memory reads of this instance.
  - `already_absorbed_bytes`: keeps track of how many bytes have been hashed in the current instance. At the end of an instance, we should have absorbed `len` bytes in total.
  - `KECCAK_RATE_BYTES` `block_bytes` columns: the bytes being absorbed at this row. They are read from memory and will be XORed to the rate part of the current state.
- Input columns:
  - `KECCAK_RATE_U32S` `original_rate_u32s` columns: hold the rate part of the state before XORing it with `block_bytes`. At the beginning of an instance, they are initialized with 0.
  - `KECCAK_RATE_U32S` `xored_rate_u32s` columns: hold the original rate XORed with `block_bytes`.
  - `KECCAK_CAPACITY_U32S` `original_capacity_u32s` columns: hold the capacity part of the state before applying the Keccak permutation.
- Output columns:
  - `KECCAK_DIGEST_BYTES` `updated_digest_state_bytes` columns: the beginning of the output state after applying the Keccak permutation. At the last row of an instance, they hold the computed hash. They are decomposed in bytes for endianness reasons.
  - `KECCAK_WIDTH_MINUS_DIGEST_U32S` `partial_updated_state_u32s` columns: the rest of the output state. They are discarded for the final digest, but are used between instance rows.
- Helper columns:
  - `is_full_input_block`: indicates if the current row has a full input block, i.e. `block_bytes` contains only bytes read from memory and no padding bytes.
  - `KECCAK_RATE_BYTES` `is_final_input_len` columns: in the final row of an instance, they indicate where the final read byte is. If the $i$-th column is set to 1, it means that all bytes after the $i$-th are padding bytes. In a full input block, all columns are set to 0.
For each instance, constraints ensure that:

- at each row:
  - `is_full_input_block` and `is_final_input_len` columns are all binary.
  - Only one column in `is_full_input_block` and `is_final_input_len` is set to 1.
  - `xored_rate_u32s` is `original_rate_u32s` XOR `block_bytes`.
  - The CTL with Keccak ensures that (`updated_digest_state_bytes`, `partial_updated_state_u32s`) is the Keccak permutation output of (`xored_rate_u32s`, `original_capacity_u32s`).
- at the first row:
  - `original_rate_u32s` is all 0.
  - `already_absorbed_bytes` is 0.
- at each full input row (i.e. `is_full_input_block` is 1, all `is_final_input_len` columns are 0):
  - `context`, `segment`, `virt` and `timestamp` are unchanged in the next row.
  - Next `already_absorbed_bytes` is current `already_absorbed_bytes` + `KECCAK_RATE_BYTES`.
  - Next (`original_rate_u32s`, `original_capacity_u32s`) is current (`updated_digest_state_bytes`, `partial_updated_state_u32s`).
  - The CTL with Memory ensures that `block_bytes` is filled with the contiguous memory elements [`addr` + `already_absorbed_bytes`, `addr` + `already_absorbed_bytes` + `KECCAK_RATE_BYTES` - 1].
- at the final row (i.e. `is_full_input_block` is 0, and `is_final_input_len`'s $i$-th column is 1 for a certain $i$, the rest being 0):
  - The CTL with Memory ensures that `block_bytes` is filled with the contiguous memory elements [`addr` + `already_absorbed_bytes`, `addr` + `already_absorbed_bytes` + $i$ - 1]. The rest are padding bytes.
  - The CTL with CPU ensures that `context`, `segment`, `virt` and `timestamp` match the `KECCAK_GENERAL` call.
  - The CTL with CPU ensures that `len` = `already_absorbed_bytes` + $i$.
  - The CTL with CPU ensures that `updated_digest_state_bytes` is the output of the `KECCAK_GENERAL` call.

The trace is padded to the next power of two with dummy rows, whose `is_full_input_block` and `is_final_input_len` columns are all 0.
Logic
Each row of the logic table corresponds to one bitwise logic operation: either AND, OR or XOR. Each input for these operations is represented as 256 bits, while the output is stored as eight 32-bit limbs.
Each row therefore contains the following columns:
- an "is and" flag, which should be 1 for an AND operation and 0 otherwise,
- an "is or" flag, which should be 1 for an OR operation and 0 otherwise,
- an "is xor" flag, which should be 1 for a XOR operation and 0 otherwise,
- 256 columns for the bits of the first input,
- 256 columns for the bits of the second input,
- 8 columns for the 32-bit limbs of the output.
Note that we need all three flags because we need to be able to distinguish between an operation row and a padding row -- where all flags are set to 0.
The subdivision into bits is required for the two inputs as the table carries out bitwise operations. The result, on the other hand, is represented in 32-bit limbs since we do not need individual bits and can therefore save the remaining 248 columns. Moreover, the output is checked against the cpu, which stores values in the same way.
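For reference, on single bits the three operations admit standard low-degree arithmetic encodings (the exact constraint layout is the table's own; these identities are stated here for convenience), with each 32-bit output limb recomposed from the corresponding 32 result bits:

$$\texttt{and}(x, y) = x \cdot y, \qquad \texttt{or}(x, y) = x + y - x \cdot y, \qquad \texttt{xor}(x, y) = x + y - 2\,x \cdot y.$$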
Memory
For simplicity, let's treat addresses and values as individual field elements. The generalization to multi-element addresses and values is straightforward.
Each row of the memory table corresponds to a single memory operation (a read or a write), and contains the following columns:
- $a$, the target address,
- $r$, an "is read" flag, which should be 1 for a read or 0 for a write,
- $v$, the value being read or written,
- $\tau$, the timestamp of the operation.
The memory table should be ordered by $(a, \tau)$. Note that the correctness of the memory could be checked as follows:

- Verify the ordering by checking that $(a_i, \tau_i) \leq (a_{i+1}, \tau_{i+1})$ for each consecutive pair.
- Enumerate the purportedly-ordered log while tracking the "current" value of $v$.
  - Upon observing an address which doesn't match that of the previous row, if the address is zero-initialized and if the operation is a read, check that $v = 0$.
  - Upon observing a write, don't constrain $v$.
  - Upon observing a read at a timestamp which isn't the first operation at this address, check that $v$ matches the previous value.
The ordering check is slightly involved since we are comparing multiple columns. To facilitate this, we add an additional column , where the prover can indicate whether two consecutive addresses changed. An honest prover will set We also introduce a range-check column , which should hold: The extra ensures that the address actually changed if . We then impose the following transition constraints:
-
,
-
,
-
.
The third constraint emulates a comparison between two addresses or timestamps by bounding their difference; this assumes that all addresses and timestamps fit in 32 bits and that the field is larger than that.
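A sketch of how an honest prover would fill these two helper columns (hypothetical types, single-element addresses):

```rust
struct MemOp { a: u64, tau: u64 }

/// Builds the e (address-changed) and c (range-checked difference) columns
/// for a memory log already sorted by (a, tau).
fn ordering_columns(ops: &[MemOp]) -> (Vec<u64>, Vec<u64>) {
    let mut e = Vec::new();
    let mut c = Vec::new();
    for w in ops.windows(2) {
        let (cur, next) = (&w[0], &w[1]);
        if next.a != cur.a {
            e.push(1);
            c.push(next.a - cur.a - 1); // extra -1 forces a strict increase
        } else {
            e.push(0);
            c.push(next.tau - cur.tau);
        }
    }
    (e, c)
}

fn main() {
    let ops = [MemOp { a: 1, tau: 0 }, MemOp { a: 1, tau: 5 }, MemOp { a: 4, tau: 2 }];
    let (e, c) = ordering_columns(&ops);
    assert_eq!(e, vec![0, 1]);
    assert_eq!(c, vec![5, 2]); // tau difference, then address difference minus one
}
```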
Virtual memory
In the EVM, each contract call has its own address space. Within that address space, there are separate segments for code, main memory, stack memory, calldata, and returndata. Thus each address actually has three components:
- an execution context, representing a contract call,
- a segment ID, used to separate code, main memory, stack memory, calldata, returndata and so forth,
- a virtual address.
The comparisons now involve several columns, which requires some minor adaptations to the technique described above; we will leave these as an exercise to the reader.
Note that an additional constraint check is required: whenever we change the context or the segment, the virtual address must be range-checked to be below $2^{32}$. Without this check, addresses could start at $-1$ (i.e. $p - 1$, the last field element) and then increase properly.
Timestamps
Memory operations are sorted by address $a$ and timestamp $\tau$. For a memory operation in the CPU, we have:

$$\tau = \texttt{NUM\_CHANNELS} \times \texttt{cycle} + \texttt{channel}.$$

Since a memory channel can only hold at most one memory operation, every CPU memory operation's timestamp is unique.
Note that it doesn't mean that all memory operations have unique timestamps. There are two exceptions:
- Before the CPU cycles, we preinitialize the memory with the flashed state stored in the MemBefore table and we write some global metadata. These operations are done at timestamp $\tau = 0$.
- Some tables other than CPU can generate memory operations, like KeccakSponge. When this happens, these operations all have the timestamp of the CPU row of the instruction which invoked the table (for KeccakSponge, KECCAK_GENERAL).
Memory initialization
By default, all memory is zero-initialized. However, to save numerous writes, we allow some specific segments to be initialized with arbitrary values.
- The code segment (segment 0) is either part of the initial memory for the kernel (context 0), or is initialized with externally-provided account code, then checked against the account code hash. In non-zero contexts, if the code is meant to be executed, there is a soundness concern: if the code is malformed and ends with an incomplete PUSH, then the missing bytes must be 0 according to the Ethereum specs. To prevent the issue, we manually write 33 zeros (at most 32 bytes for the PUSH argument, and an extra one for the post-PUSH PC value).
- The "TrieData" segment is initialized with the input tries. The stored tries are hashed and checked against the provided initial hash. Note that the length of the segment and the pointers -- within the "TrieData" segment -- for the three tries are provided as prover inputs. The length is then checked against a value computed when hashing the tries.
Final memory
The final value of each cell of the memory must be propagated to the MemAfter table. Since memory operations are ordered by address and by timestamps, this is easy to do: the last value of an address is the value of the last row touching this address. In other words, we propagate values of rows before the address changes.
Context pruning
We can observe that whenever we return from a context (e.g. with a RETURN opcode, from an exception...), we will never access it again and all its memory is now stale. We make use of this fact to prune stale contexts and exclude them from MemAfter.
MemBefore & MemAfter
The MemBefore (resp. MemAfter) table holds the content of the memory before (resp. after) the execution of the current segment. For consistency, the MemAfter trace of a segment must be identical to the MemAfter trace of the next segment. Each row of these tables contains:
- $a$, the memory cell address,
- $v$, the initial value of the cell.
The tables should be ordered by $a$. Since they only hold values, there are no constraints between the rows.

A CTL copies all of the MemBefore values in the memory trace as reads, at timestamp $\tau = 0$. Another CTL copies the final values from memory to MemAfter. For more details on which values are propagated, consult the Final Memory section.
Merkle Patricia Tries
The EVM World state is a representation of the different accounts at a particular time, as well as the last processed transactions together with their receipts. The world state is represented using Merkle Patricia Tries (MPTs, see Yellowpaper App. D), and there are three different tries: the state trie, the transaction trie and the receipt trie.
For each transaction we need to show that the prover knows preimages of
the hashed initial and final EVM states. When the kernel starts
execution, it stores these three tries within the Segment::TrieData
segment. The prover loads the initial tries from the inputs into memory.
Subsequently, the tries are modified during transaction execution,
inserting new nodes or deleting existing nodes.
An MPT is composed of five different node types: branch, extension, leaf, empty and digest nodes. Branch and leaf nodes might contain a payload whose format depends on the particular trie. The nodes are encoded, primarily using RLP encoding and Hex-prefix encoding (see Yellowpaper App. B and C, respectively). The resulting encoding is then hashed, following a strategy similar to that of normal Merkle trees, to generate the trie hashes.

Insertion and deletion are performed in the same way as in other MPT implementations. The only difference is when inserting extension nodes: we create a new node with the new data, instead of modifying the existing one. In the rest of this section we describe how the MPTs are represented in memory, how they are given as input, and how MPTs are hashed.
Internal memory format
The tries are stored in kernel memory, specifically in the `Segment::TrieData` segment. Each node type is stored as follows:

- An empty node is encoded as $(\texttt{MPT\_NODE\_EMPTY})$.
- A branch node is encoded as $(\texttt{MPT\_NODE\_BRANCH}, c_1, \dots, c_{16}, v)$, where each $c_i$ is a pointer to a child node, and $v$ is a pointer to a value. If a branch node has no associated value, then $v = 0$, i.e. the null pointer.
- An extension node is encoded as $(\texttt{MPT\_NODE\_EXTENSION}, k, c)$, where $k$ represents the part of the key associated with this extension and is encoded as a 2-tuple $(\texttt{packed\_nibbles}, \texttt{num\_nibbles})$, and $c$ is a pointer to a child node.
- A leaf node is encoded as $(\texttt{MPT\_NODE\_LEAF}, k, v)$, where $k$ is a 2-tuple as above, and $v$ is a pointer to a value.
- A digest node is encoded as $(\texttt{MPT\_NODE\_HASH}, d)$, where $d$ is a Keccak256 digest.
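As a mental model only (the kernel stores these as flat words in `Segment::TrieData`, not as Rust values), the layout corresponds to:

```rust
type Ptr = usize; // offset into the TrieData segment

/// Hypothetical Rust rendering of the kernel's in-memory MPT node layout.
enum MptNode {
    Empty,
    Branch { children: [Ptr; 16], value: Ptr }, // value == 0 means "no value"
    Extension { packed_nibbles: u128, num_nibbles: usize, child: Ptr },
    Leaf { packed_nibbles: u128, num_nibbles: usize, value: Ptr },
    Hash([u8; 32]), // Keccak256 digest node
}

fn main() {
    // The kernel would lay this out as (MPT_NODE_LEAF, 0xabc, 3, 17).
    let leaf = MptNode::Leaf { packed_nibbles: 0xabc, num_nibbles: 3, value: 17 };
    if let MptNode::Leaf { num_nibbles, .. } = leaf {
        assert_eq!(num_nibbles, 3);
    }
}
```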
On the other hand the values or payloads are represented differently depending on the particular trie.
State trie
The state trie payload contains the account data. Each account is stored in 4 contiguous memory addresses containing
- the nonce,
- the balance,
- a pointer to the account's storage trie,
- a hash of the account's code.
The storage trie payload in turn is a single word.
Transaction Trie
The transaction trie nodes contain the length of the RLP encoded transaction, followed by the bytes of the RLP encoding of the transaction.
Receipt Trie
The payload of the receipts trie is a receipt. Each receipt is stored as:

- the length in words of the payload,
- the status,
- the cumulative gas used,
- the bloom filter, stored as 256 words,
- the number of topics,
- the topics,
- the data length,
- the data.
Prover input format
The initial state of each trie is given by the prover as a nondeterministic input tape. This tape has a slightly different format:
- An empty node is encoded as $(\texttt{MPT\_NODE\_EMPTY})$.
- A branch node is encoded as $(\texttt{MPT\_NODE\_BRANCH}, v?, c_1, \dots, c_{16})$. Here $v?$ consists of a flag indicating whether a value is present, followed by the actual value payload if one is present. Each $c_i$ is the encoding of a child node.
- An extension node is encoded as $(\texttt{MPT\_NODE\_EXTENSION}, k, c)$, where $k$ represents the part of the key associated with this extension, and is encoded as a 2-tuple $(\texttt{packed\_nibbles}, \texttt{num\_nibbles})$. $c$ is the encoding of a child node.
- A leaf node is encoded as $(\texttt{MPT\_NODE\_LEAF}, k, v)$, where $k$ is a 2-tuple as above, and $v$ is a value payload.
- A digest node is encoded as $(\texttt{MPT\_NODE\_HASH}, d)$, where $d$ is a Keccak256 digest.

Nodes are thus given in depth-first order, enabling natural recursive methods for encoding and decoding this format. The payload of state and receipt tries is given in the natural sequential way. The transaction and receipt payloads contain variable-size data, so their input is slightly different: the prover input for the transactions is the transaction RLP encoding preceded by its length, while the receipts are given in the natural sequential way, except that topics and data are preceded by their respective lengths.
Encoding and Hashing
Encoding is done recursively starting from the trie root. Leaf, branch and extension nodes are encoded as the RLP encoding of a list containing the hex-prefix encoding of the node key as well as:

- Leaf node: the encoding of the payload,
- Branch node: the hash or encoding of the 16 children and the encoding of the payload,
- Extension node: the hash or encoding of the child.

For the rest of the nodes we have:

- Empty node: the encoding of an empty node is `0x80`,
- Digest node: the encoding of a digest node stored as $(\texttt{MPT\_NODE\_HASH}, d)$ is $d$.
The payloads in turn are RLP encoded as follows
- State trie: encoded as a list containing nonce, balance, storage trie hash and code hash.
- Storage trie: the RLP encoding of the value (thus the double RLP encoding).
- Transaction trie: the RLP encoded transaction.
- Receipt trie: depending on the transaction type, it is encoded as $\texttt{RLP}(\texttt{receipt})$ for Legacy transactions or $\texttt{txn\_type} \| \texttt{RLP}(\texttt{receipt})$ for transactions of type 1 or 2. Each receipt is encoded as a list containing:
  - the status,
  - the cumulative gas used,
  - the bloom filter, stored as a list of length 256,
  - the list of topics,
  - the data string.
-
Once a node is encoded, it is written to the `Segment::RlpRaw` segment as a sequence of bytes. Then the RLP encoded data is hashed if the length of the data is more than 32 bytes. Otherwise we return the encoding. Further details can be found in the `mpt_hash` module.
Linked lists
Individual account information is contained in the state and the storage MPTs. However, accessing and modifying MPT data requires heavy trie traversal, insertion and deletion functions. To alleviate these costs, during an execution run, we store all account information in linked list structures and only modify the state trie at the end of the run.
Our linked list construction guarantees these properties:
- A linked list is cyclic: the last element's successor is the first element.
- A linked list is always sorted by a certain index, which can be one or more fields of an element.
- The last element of a linked list is a special MAX node, whose index is always higher than any possible index value.
- An index cannot appear twice in the linked list.

These properties allow us to efficiently modify the list.
Search
To search for a node given its index, we provide via `PROVER_INPUT` a pointer to its predecessor $p$. We first check that $p$'s index is strictly lower than the node index; if not, the provided pointer is invalid. Then, we check $s$, $p$'s successor. If $s$'s index is equal to the node index, we found the node. If $s$'s index is lower than the node index, then the provided $p$ was invalid. If $s$'s index is greater than the node index, then the node doesn't exist.
Insertion
To insert a node given its index, we provide via `PROVER_INPUT` a pointer to its predecessor $p$. We first check that $p$'s index is strictly lower than the node index; if not, the provided pointer is invalid. Then, we check $s$, $p$'s successor, and make sure that $s$'s index is strictly greater than the node index. We create a new node, and make it $p$'s successor; then we make $s$ the new node's successor.
Deletion
To delete a node given its index, we provide via `PROVER_INPUT` a pointer to its predecessor $p$. We check that $p$'s successor's index is equal to the node index; if not, either $p$ is invalid or the node doesn't exist. Then we set $p$'s successor to the node's successor. To indicate that the node is now deleted and to make sure that it's never accessed again, we set its next pointer to MAX.
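The search flow can be summarized with a small array-backed sketch (illustrative names; the kernel implements this in asm over memory cells):

```rust
const MAX: u64 = u64::MAX; // sentinel index, higher than any real index

struct Node { index: u64, next: usize }

/// Searches for `target`, given a prover-supplied predecessor pointer `p`.
/// Returns Some(position) if found, None if the node doesn't exist, and
/// panics if the provided pointer is invalid.
fn search(list: &[Node], p: usize, target: u64) -> Option<usize> {
    assert!(list[p].index < target, "invalid pointer: predecessor too high");
    let s = list[p].next; // predecessor's successor
    if list[s].index == target {
        Some(s)
    } else if list[s].index < target {
        panic!("invalid pointer: a closer predecessor exists")
    } else {
        None // successor overshot the target: the node doesn't exist
    }
}

fn main() {
    // Cyclic list 10 -> 20 -> MAX -> (back to 10); search 20 via predecessor 0.
    let list = vec![
        Node { index: 10, next: 1 },
        Node { index: 20, next: 2 },
        Node { index: MAX, next: 0 },
    ];
    assert_eq!(search(&list, 0, 20), Some(1));
    assert_eq!(search(&list, 1, 25), None); // 25 absent: successor is MAX
}
```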
We maintain two linked lists: one for the state accounts and one for the storage slots.
Account linked list
An account node is made of four memory cells:
- The account key (the hash of the account address). This is the index of the node.
- A pointer to the account payload, in segment `@TrieData`.
- A pointer to the initial account payload, in segment `@TrieData`. This is the value of the account at the beginning of the execution, before processing any transaction. This payload never changes.
- A pointer to the next node (which points to the next node's account key).
Storage linked list
A storage node is made of five memory cells:
- The account key (the hash of the account address).
- The slot key (the hash of the slot). Nodes are indexed by `(account_key, slot_key)`.
- The slot value.
- The initial slot value. This is the value of the slot at the beginning of the execution, before processing any transaction. It never changes.
- A pointer to the next node (which points to the next node's account key).
CPU Execution
The CPU is in charge of coordinating the different STARKs, proving the correct execution of the instructions it reads and guaranteeing that the final state of the EVM corresponds to the starting state after executing the input transactions. All design choices were made to make sure these properties can be adequately translated into constraints of degree at most 3 while minimizing the size of the different table traces (number of columns and number of rows).
In this section, we will detail some of these choices.
Kernel
The kernel is in charge of the proving logic. This section aims at providing a high level overview of this logic. For details about any specific part of the logic, one can consult the various "asm" files in the "kernel" module.
We prove a batch of transactions, split into segments. These proofs can later be aggregated recursively to prove a block. Proof aggregation is however not in the scope of this section. Here, we assume that we have an initial state of the EVM, and we wish to prove that a batch of contiguous transactions was correctly executed, leading to a correct update of the state.
Since we process transactions and not entire blocks, a few intermediary values need to be provided by the prover. Indeed, to prove that the registers in the EVM state are correctly updated, we need to have access to their initial values. When aggregating proofs, we can also constrain those values to match from one batch to the next. Let us consider the example of the transaction number. Let $n$ be the number of transactions executed so far in the current block. If the current proof is not a dummy one (we are indeed executing a batch of transactions), then the transaction number should be updated: $n \rightarrow n + k$, with $k$ the number of transactions in the batch. Otherwise, the number remains unchanged. We can easily constrain this update. When aggregating the previous transaction batch proof ($\pi_1$) with the current one ($\pi_2$), we also need to check that the output transaction number of $\pi_1$ is the same as the input transaction number of $\pi_2$.
Those prover provided values are stored in memory prior to entering the kernel, and are used in the kernel to assert correct updates. The list of prover provided values necessary to the kernel is the following:
- the number of the last transaction executed: $t_n$,
- the gas used before executing the current transactions: $g\_u_0$,
- the gas used after executing the current transactions: $g\_u_1$,
- the state, transaction and receipts MPTs before executing the current transactions: $\texttt{tries}_0$,
- the hash of all MPTs before executing the current transactions: $\texttt{digests}_0$,
- the hash of all MPTs after executing the current transactions: $\texttt{digests}_1$,
- the RLP encoding of the transactions.
Memory addresses
Kernel operations deal with memory addresses as single U256 elements. However, when processing the operations to generate the proof witness, the CPU will decompose these into three components:
- The context of the memory address. The Kernel context is special, and has value 0.
- The segment of the memory address, corresponding to a specific section given a context (e.g. MPT data, global metadata, etc.).
- The offset of the memory address, within a segment given a context.

To easily retrieve these components, we scale them so that a memory address is represented as:

$$\texttt{addr} = 2^{64} \cdot \texttt{context} + 2^{32} \cdot \texttt{segment} + \texttt{offset}.$$

This allows to easily retrieve each component individually once a memory address has been decomposed into 32-bit limbs.
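A small sketch of the packing scheme (with `u128` in place of U256, which suffices since each component fits in 32 bits):

```rust
/// Packs the three address components into a single scaled word.
fn pack_addr(context: u128, segment: u128, offset: u128) -> u128 {
    (context << 64) + (segment << 32) + offset
}

/// Recovers each component from its 32-bit slice of the packed address.
fn unpack_addr(addr: u128) -> (u128, u128, u128) {
    (addr >> 64, (addr >> 32) & 0xFFFF_FFFF, addr & 0xFFFF_FFFF)
}

fn main() {
    let addr = pack_addr(3, 2, 100);
    assert_eq!(unpack_addr(addr), (3, 2, 100));
}
```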
Segment handling
An execution run is split into one or more segments. To ensure continuity, the first cycles of a segment are used to "load" segment data from the previous segment, and the last cycles to "save" segment data for the next segment. The number of CPU cycles of a segment is bounded by `MAX_CPU_CYCLES`, which can be tweaked for best performance.
The segment data values are:
- the stack length,
- the stack top,
- the context,
- the `is_kernel` flag,
- the gas used,
- the program counter.
These values are stored as global metadata, and are loaded from (resp. written to) memory at the beginning (resp. at the end) of a segment. They are propagated between proofs as public values.
The initial memory of the first segment is fixed and contains:
- the kernel code,
- the shift table.
Initialization
The first step of a run consists in initializing:

- The MPTs: the initial transaction and receipt tries are loaded from memory. The transaction and the receipt tries are hashed and the hashes are then compared to $\texttt{digests}_0$. For efficiency, the initial state trie will be hashed for verification at the end of the run.
- The transaction number and the current gas used, which we load from memory.
We start processing the transactions (if any) sequentially, provided in RLP encoded format.
The processing of the transaction returns a boolean "success" that indicates whether the transaction was executed successfully, along with the leftover gas.
The following step is then to update the receipts MPT. Here, we update the transaction's bloom filter. We store "success", the leftover gas, the transaction bloom filter and the logs in memory. We also store some additional information that facilitates the RLP encoding of the receipts later.
If there are any withdrawals, they are performed at this stage.
Finally, once the three MPTs have been updated, we need to carry out final checks:
-
the gas used after the execution is equal to ,
-
the new transaction number is with the number of processed transactions,
-
the initial state MPT is hashed and checked against .
-
the initial state MPT is updated to reflect the processed transactions, then the three final MPTs are hashed and checked against .
Once those final checks are performed, the program halts.
Simple opcodes & Syscalls
For simplicity and efficiency, EVM opcodes are categorized into two groups: "simple opcodes" and "syscalls". Simple opcodes are generated directly in Rust, in operation.rs. Every call to a simple opcode adds exactly one row to the cpu table. Syscalls are more complex structures written with simple opcodes, in the kernel.
Whenever we encounter a syscall, we switch to kernel mode and execute its associated code. At the end of each syscall, we run EXIT_KERNEL, which resets the kernel mode to its state right before the syscall. It also sets the PC to point to the opcode right after the syscall.
Exceptions are handled differently for simple opcodes and syscalls. When necessary, simple opcodes throw an exception (see exceptions). This activates the "exception flag" in the CPU and runs the exception operations. On the other hand, syscalls handle exceptions in the kernel directly.
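To illustrate the control flow, here is a toy model of the dispatch in Rust. The opcode choice, handler address and `kexit_info` layout are illustrative assumptions; the real logic lives in the CPU constraints, `operation.rs` and the kernel:

```rust
// Toy model only: simple opcodes execute natively in one CPU row, while a
// syscall saves its exit information, enters kernel mode and jumps to a
// kernel handler, to be undone later by EXIT_KERNEL.
#[derive(Clone, Copy)]
enum Op {
    Add,       // simple opcode: executed natively, exactly one CPU row
    Keccak256, // syscall: runs kernel code built out of simple opcodes
}

struct Cpu {
    pc: usize,
    is_kernel: bool,
    gas_used: u64,
}

impl Cpu {
    fn step(&mut self, op: Op) {
        match op {
            Op::Add => {
                // Simple opcode: gas is charged by a constraint.
                self.gas_used += 3;
                self.pc += 1;
            }
            Op::Keccak256 => {
                // Syscall: record the state EXIT_KERNEL must restore
                // (pc after the syscall, privilege bit, gas used), then
                // jump to the handler in privileged mode.
                let _kexit_info = (self.pc + 1, self.is_kernel, self.gas_used);
                self.is_kernel = true;
                self.pc = 0x1234; // hypothetical handler offset
            }
        }
    }
}

fn main() {
    let mut cpu = Cpu { pc: 0, is_kernel: false, gas_used: 0 };
    cpu.step(Op::Add);
    cpu.step(Op::Keccak256);
    assert!(cpu.is_kernel);
}
```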
Privileged instructions
To ease and speed up proving, the zkEVM supports custom, privileged instructions that can only be executed by the kernel. Any appearance of these privileged instructions in a contract's bytecode, for instance, would result in an unprovable state.

In what follows, we denote by $p$ the characteristic of the BN254 curve base field, the curve for which Ethereum supports the ecAdd, ecMul and ecPairing precompiles.
- `ADDFP254`. Pops 2 elements from the stack, interpreted as BN254 base field elements, and pushes their sum modulo $p$ onto the stack.
- `MULFP254`. Pops 2 elements from the stack, interpreted as BN254 base field elements, and pushes their product modulo $p$ onto the stack.
- `SUBFP254`. Pops 2 elements from the stack, interpreted as BN254 base field elements, and pushes their difference modulo $p$ onto the stack. This instruction behaves similarly to the SUB (0x03) opcode, in that we subtract the second element of the stack from the initial (top) one.
- `SUBMOD`. Pops 3 elements from the stack, and pushes the modular difference of the first two elements of the stack by the third one. It is similar to the SUB instruction, with an extra pop for the custom modulus.
- `KECCAK_GENERAL`. Pops 2 elements (a memory address, followed by a length $\ell$) and pushes the hash of the memory portion starting at the constructed address and of length $\ell$. It is similar to the KECCAK256 (0x20) instruction, but can be applied to any memory section (i.e. even privileged ones).
- `MSTORE_32BYTES`. Pops 2 elements from the stack (a memory address, and then a value), and pushes a new address onto the stack. The value is decomposed into bytes and written to memory, starting from the fetched address. The new address being pushed is computed as the initial address plus the length of the byte sequence being written to memory. Note that, similarly to the PUSH (0x60-0x7F) instructions, there are 32 MSTORE_32BYTES instructions, each corresponding to a target byte length (length 0 is ignored, for the same reasons as for MLOAD_32BYTES, see below). Writing an integer fitting in $m$ bytes with a target length $\ell < m$ will result in the integer being truncated; conversely, specifying a length greater than the byte size of the value being written will result in padding with zeroes. This process is heavily used to reset memory sections (by calling MSTORE_32BYTES_32 with the value 0); a toy model of these semantics is sketched after this list.
- `PROVER_INPUT`. Pushes a single prover input onto the stack.
- `GET_CONTEXT`. Pushes the current context onto the stack. The kernel always has context 0.
- `SET_CONTEXT`. Pops the top element of the stack and updates the current context to this value. It is usually used when calling another contract or precompile, to distinguish the caller from the callee.
- `MLOAD_32BYTES`. Pops 2 elements from the stack (a memory address, and then a length $\ell$), and pushes a value onto the stack. The pushed value corresponds to the U256 integer read from the big-endian byte sequence of length $\ell$ starting at the fetched memory address. Note that an empty length is not valid, nor is a length greater than 32 (as a U256 consists of at most 32 bytes). Violating these conditions will result in an unverifiable proof.
- `EXIT_KERNEL`. Pops 1 element from the stack. This instruction is used at the end of a syscall, before proceeding to the rest of the execution logic. The popped element, `kexit_info`, contains several pieces of information, such as the current program counter, the current amount of gas used, and whether we are in kernel (i.e. privileged) mode or not.
- `MLOAD_GENERAL`. Pops 1 element (a memory address), and pushes the value stored at this memory address onto the stack. It can read any memory location, general (similarly to the MLOAD (0x51) instruction) or privileged.
- `MSTORE_GENERAL`. Pops 2 elements (a value and a memory address), and writes the popped value at the fetched address. It can write to any memory location, general (similarly to the MSTORE (0x52) / MSTORE8 (0x53) instructions) or privileged.
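As promised above, here is a toy byte-level model of the MSTORE_32BYTES truncation and padding semantics. The `u128` value type is a simplification of the VM's U256, and the function name is illustrative:

```rust
/// Toy model of MSTORE_32BYTES_n: write `value` as an `n`-byte big-endian
/// sequence starting at `offset`, and return the new address pushed back
/// onto the stack (initial address + n).
fn mstore_32bytes(memory: &mut [u8], offset: usize, value: u128, n: usize) -> usize {
    assert!((1..=32).contains(&n), "length 0 is invalid; at most 32 bytes");
    for i in 0..n {
        let shift = 8 * (n - 1 - i);
        // Values wider than n bytes are truncated (high bytes dropped);
        // narrower values are left-padded with zeroes.
        memory[offset + i] = if shift < 128 { (value >> shift) as u8 } else { 0 };
    }
    offset + n
}

fn main() {
    let mut mem = vec![0u8; 8];
    // Writing 0xAABBCC with target length 2 truncates it to 0xBBCC.
    let new_addr = mstore_32bytes(&mut mem, 0, 0x00AA_BBCC, 2);
    assert_eq!(mem[..2], [0xBBu8, 0xCC]);
    assert_eq!(new_addr, 2);
}
```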
Stack handling
Top of the stack
The majority of memory operations involve the stack. The stack is a segment in memory, and stack operations (popping or pushing) use the memory channels. Every CPU instruction performs between 0 and 3 pops, and may push at most once. However, for efficiency purposes, we hold the top of the stack in the first memory channel `current_row.mem_channels[0]`, only writing it to memory when necessary.
Top reading and writing
When a CPU instruction modifies the stack, it must update the top of the stack accordingly. There are three cases.
- The instruction pops and pushes: The new top of the stack is stored in `next_row.mem_channels[0]`; it may be computed by the instruction, or it could be read from memory. In either case, the instruction is responsible for setting `next_row.mem_channels[0]`'s flags and address columns correctly. After use, the previous top of the stack is discarded and doesn't need to be written to memory.
- The instruction pushes, but doesn't pop: The new top of the stack is stored in `next_row.mem_channels[0]`; it may be computed by the instruction, or it could be read from memory. In either case, the instruction is responsible for setting `next_row.mem_channels[0]`'s flags and address columns correctly. If the stack wasn't empty (`current_row.stack_len > 0`), the instruction performs a memory write in `current_row.partial_channel`. `current_row.partial_channel` shares its values with `current_row.mem_channels[0]` (which holds the current top of the stack). If the stack was empty, `current_row.partial_channel` is disabled.
- The instruction pops, but doesn't push: After use, the current top of the stack is discarded and doesn't need to be written to memory. If the stack isn't empty now (`current_row.stack_len > num_pops`), the new top of the stack is set in `next_row.mem_channels[0]` with a memory read from the stack segment. If the stack is now empty, `next_row.mem_channels[0]` is disabled.
In the last two cases, there is an edge case if `current_row.stack_len` is equal to a `special_len`. For a strictly pushing instruction, this happens if the stack is empty, and `special_len = 0`. For a strictly popping instruction, this happens if the next stack is empty, i.e. if all remaining elements are popped, and `special_len = num_pops`. Note that we do not need to check for values below `num_pops`, since this would be a stack underflow exception, which is handled separately. The edge case is detected with the compound flag

$$1 - \texttt{not\_special\_len} \cdot \texttt{stack\_inv\_aux},$$

where $\texttt{not\_special\_len} = \texttt{stack\_len} - \texttt{special\_len}$, and `stack_inv_aux` is constrained to be the modular inverse of `not_special_len` if it's non-zero, or 0 otherwise. The flag is 1 if `stack_len` is equal to `special_len`, and 0 otherwise.
This logic can be found in the `eval_packed_one` function of `stack.rs`. The function multiplies all of the stack constraints with the degree 1 filter associated with the current instruction.
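As a concrete illustration of this inverse trick, here is a hedged sketch of the corresponding witness generation, using plonky2's `Field` API. The function is illustrative, not the actual zk_evm witness code:

```rust
use plonky2::field::goldilocks_field::GoldilocksField as F;
use plonky2::field::types::Field;

/// Illustrative witness generation for the edge-case flag described above:
/// returns (stack_inv_aux, flag), with flag = 1 - not_special_len * stack_inv_aux.
fn edge_case_flag(stack_len: F, special_len: F) -> (F, F) {
    let not_special_len = stack_len - special_len;
    // stack_inv_aux is the modular inverse of not_special_len when non-zero, else 0.
    let stack_inv_aux = not_special_len.try_inverse().unwrap_or(F::ZERO);
    // The flag is 1 exactly when stack_len == special_len, and 0 otherwise.
    let flag = F::ONE - not_special_len * stack_inv_aux;
    (stack_inv_aux, flag)
}

fn main() {
    let f = F::from_canonical_u64;
    assert_eq!(edge_case_flag(f(5), f(5)).1, F::ONE);  // edge case hit
    assert_eq!(edge_case_flag(f(7), f(5)).1, F::ZERO); // generic case
}
```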
Operation flag merging
To reduce the total number of columns, many operation flags are merged together (e.g. `DUP` and `SWAP`) and are distinguished with the binary decomposition of their opcodes. The filter for a merged operation is then of degree 2: for example, `is_swap = dup_swap * opcode_bits[4]`, since the 4th bit is set to 1 for a `SWAP` and 0 for a `DUP`. If the two instructions have different stack behaviors, this can be a problem: `eval_packed_one`'s constraints are already of degree 3 and can't support degree 2 filters.
When this happens, stack constraints are defined manually in the operation's dedicated file (e.g. `dup_swap.rs`). Implementation details vary case-by-case and can be found in the files.
Stack length checking
The CPU must make sure that the stack length never goes below zero and, in user mode, never grows beyond the maximum stack size. When this happens, an honest prover should trigger the corresponding exception. If a malicious prover doesn't trigger the exception, constraints must fail the proof.
Stack underflow
There is no explicit constraint checking for stack underflow. An underflow happens when the CPU tries to pop the empty stack, which would perform a memory read at virtual address $-1$. Such a read cannot succeed: in Memory, the range-check argument requires the gap between two consecutive addresses to be lower than the length of the Memory trace. Since the prime of the Plonky2 field is 64 bits long, this would require a Memory trace longer than $2^{32}$.
Stack overflow
An instruction can push at most once, meaning that an overflow occurs whenever the stack length is exactly one more than the maximum stack size ($1024$) in user mode. To constrain this, the column `stack_len_bounds_aux` contains:

- the modular inverse of `stack_len - 1025` if we're in user mode and `stack_len ≠ 1025`,
- 0 if `stack_len = 1025` or if we're in kernel mode.
Then overflow can be checked with the flag

$$(1 - \texttt{is\_kernel\_mode}) \cdot \big(1 - (\texttt{stack\_len} - 1025) \cdot \texttt{stack\_len\_bounds\_aux}\big).$$

The flag is 1 if `stack_len = 1025` and we're in user mode, and 0 otherwise.
Because `stack_len_bounds_aux` is a shared general column, we only check this constraint after an instruction that can actually trigger an overflow, i.e. a pushing, non-popping instruction.
Gas handling
Out of gas errors
The CPU table has a "gas" register that keeps track of the gas used by the transaction so far.
The crucial invariant of our out-of-gas checking method is that, at any point in the program's execution, we have not used more gas than we have available; that is, "gas" is at most the gas allocation for the transaction (which is stored separately by the kernel). We assume that the gas allocation will never be $2^{32}$ or more, so if "gas" does not fit in one 32-bit limb, then we've run out of gas.
When a native instruction (one that is not a syscall) is executed, a constraint ensures that the "gas" register is increased by the correct amount. This is not automatic for syscalls; the syscall handler itself must calculate and charge the appropriate amount.
If everything goes smoothly and we have not run out of gas, "gas" should be no more than the gas allowance at the point that we STOP, REVERT, stack overflow, or whatever. Indeed, because we assume that the gas overflow handler is invoked as soon as we've run out of gas, all these termination methods verify that $\texttt{gas} \leq \texttt{allowance}$, and jump to `exc_out_of_gas` if this is not the case. This is also true for the out-of-gas handler, which checks that:

- we have not yet run out of gas,
- we are about to run out of gas,

and "PANIC"s if either of those statements does not hold; a sketch of these two checks follows.
When we do run out of gas, however, this event must be handled. Syscalls are responsible for checking that their execution would not cause the transaction to run out of gas. If the syscall detects that it would need to charge more gas than available, it aborts the transaction (or the current code) by jumping to `fault_exception`. In fact, `fault_exception` is in charge of handling all exceptional halts in the kernel.
Native instructions do this differently. If the prover notices that execution of the instruction would cause an out-of-gas error, it must jump to the appropriate handler instead of executing the instruction. (The handler contains special code that PANICs if the prover invoked it incorrectly.)
Overflow
We must be careful to ensure that "gas" does not overflow to prevent denial of service attacks.
Note that a syscall cannot be the instruction that causes an overflow. This is because every syscall is required to verify that its execution does not cause us to exceed the gas limit. Upon entry into a syscall, a constraint verifies that $\texttt{gas} < 2^{32}$. Some syscalls may have to be careful to ensure that the gas check is performed correctly (for example, that overflow modulo $P$ does not occur). So we can assume that upon entry into and exit out of a syscall, $\texttt{gas} < 2^{32}$.
Similarly, native instructions alone cannot cause wraparound. The most expensive instruction, JUMPI, costs 10 gas. Even if we were to execute $2^{32}$ consecutive JUMPI instructions, the maximum length of a trace, we would be nowhere close to consuming $2^{64} - 2^{32} + 1$ (= Goldilocks prime) gas.
The final scenario we must tackle is an expensive syscall followed by many expensive native instructions. Upon exit from a syscall, $\texttt{gas} < 2^{32}$. Again, even if that syscall is followed by $2^{32}$ native instructions of cost 10, we do not see wraparound modulo Goldilocks.
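The bound is easy to sanity-check numerically; the constants below come straight from the argument above:

```rust
// Numeric sanity check of the no-wraparound argument (illustrative only).
fn main() {
    const GOLDILOCKS_P: u128 = (1 << 64) - (1 << 32) + 1;
    let gas_on_syscall_exit: u128 = 1 << 32; // gas < 2^32 after any syscall
    let max_trace_len: u128 = 1 << 32;       // maximum trace length
    let jumpi_cost: u128 = 10;               // most expensive native instruction
    let worst_case = gas_on_syscall_exit + max_trace_len * jumpi_cost;
    // 11 * 2^32 ≈ 4.7e10, far below the ~1.8e19 Goldilocks prime.
    assert!(worst_case < GOLDILOCKS_P);
}
```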
Exceptions
Sometimes, when executing user code (i.e. contract or transaction code), the EVM halts exceptionally (i.e. outside of a STOP, a RETURN or a REVERT). When this happens, the CPU table invokes a special instruction with a dedicated operation flag `exception`. Exceptions can only happen in user mode; triggering an exception in kernel mode would make the proof unverifiable. No matter the exception, the handling is the same:

- The opcode which would trigger the exception is not executed. The operation flag set is `exception` instead of the opcode's flag.
- We push a value to the stack which contains: the current program counter (to retrieve the faulty opcode), and the current value of `gas_used`. The program counter is then set to the corresponding exception handler in the kernel (e.g. `exc_out_of_gas`).
- The exception handler verifies that the given exception would indeed be triggered by the faulty opcode. If this is not the case (if the exception has already happened or if it doesn't happen after executing the faulty opcode), then the kernel panics: there was an issue during witness generation.
- The kernel consumes the remaining gas and returns from the current context with `success` set to 0 to indicate an execution failure.
Here is the list of the possible exceptions:
- Out of gas: raised when a native instruction (i.e. not a syscall) in user mode pushes the amount of gas used over the current gas limit. When this happens, the EVM jumps to `exc_out_of_gas`. The kernel then checks that the consumed gas is currently below the gas limit, and that adding the gas cost of the faulty instruction pushes it over it. If the exception is not raised, the prover will panic when returning from the execution: the remaining gas is checked to be positive after STOP, RETURN or REVERT.
- Invalid opcode: raised when the read opcode is invalid. It means either that it doesn't exist, or that it's a privileged instruction and thus not available in user mode. When this happens, the EVM jumps to `exc_invalid_opcode`. The kernel then checks that the given opcode is indeed invalid. If the exception is not raised, decoding constraints ensure no operation flag is set to 1, which would make it a padding row. Halting constraints would then make the proof unverifiable.
- Stack underflow: raised when an instruction which pops from the stack is called when the stack doesn't have enough elements. When this happens, the EVM jumps to `exc_stack_underflow`. The kernel then checks that the current stack length is smaller than the minimum stack length required by the faulty opcode. If the exception is not raised, the popping memory operation's address offset would underflow, and the Memory range check would require the Memory trace to be too large ($> 2^{32}$).
- Invalid JUMP destination: raised when the program counter jumps to an invalid location (i.e. not a JUMPDEST). When this happens, the EVM jumps to `exc_invalid_jump_destination`. The kernel then checks that the opcode is a JUMP, and that the destination is not a JUMPDEST, by checking the JUMPDEST segment. If the exception is not raised, jumping constraints will fail the proof.
- Invalid JUMPI destination: same as the above, for JUMPI.
- Stack overflow: raised when a pushing instruction in user mode pushes the stack over 1024. When this happens, the EVM jumps to `exc_stack_overflow`. The kernel then checks that the current stack length is exactly equal to 1024 (since an instruction can push at most once), and that the faulty instruction is pushing. If the exception is not raised, stack constraints ensure that a stack length of 1025 in user mode will fail the proof.