--- layout: cover marp: true --- # Sway Encoding --- - How it works for contracts? - Caller POV - Contract being called POV - **Interlude**: Why even encode the data? - Encoding for - scripts - predicates - receipts (logs) - configurables --- # How it works? (caller POV) At some point, someones calls a contract: ```rust let contract = abi(TestContract, CONTRACT_ID); contract.some_method(0); ``` Compiler will desugar this into ```rust std::codec::contract_call( CONTRACT_ID, "some_method", (0,), ) ``` --- which will be expanded into: ```rust let first_parameter = encode("some_method"); let second_parameter = encode((0,)); let params = encode(( CONTRACT_ID, first_parameter.ptr(), second_parameter.ptr(), )); let (ptr, len) = __contract_call(params.ptr(), coins, asset_id, gas); let mut buffer = BufferReader::from_parts(ptr, len); T::abi_decode(buffer) ``` --- and, in the end, will be compiled into a **fuelVM** `call` instruction. ``` $hp │ │ ┌────────────────────────────┐ ▼ │ │ ┌──────────────┬──────┴──────┬──────────────┬──────▼────────┬───────────────┐ │ │ │ │ │ │ HEAP │ CONTRACT_ID │ method name │ method args │ encoded bytes │ encoded bytes │ │ 32 bytes │ param1 │ param2 │ │ │ └──────▲───────┴─────────────┴──────┬───────┴───────────────┴───────▲───────┘ │ │ │ │ └───────────────────────────────┘ │ ... │ call $ra:ptr $rb:u64 $rc:ptr $rd:u64 ... coins │ gas │ ┌▼─────────────┐ │ │ STACK .................... │ ASSET_ID │ │ 32 bytes │ └──────────────┘ ▲ ▲ │ │ │ │ $ssp $sp ``` --- # How it works? (contract being called POV) The example above was calling a contract implemented like: ```rust impl TestContract for Contract { fn some_method(qty: u64) { ... } } ``` --- Compiler will desugar this into: ```rust pub fn __entry() { let method_name = std::codec::decode_first_param::(); if method_name == "some_method" { let mut buffer = std::codec::BufferReader::from_second_parameter(); let args: (u64,) = buffer.decode::<(u64,)>(); let result: () = __contract_entry_some_method(args.0); let result: raw_slice = encode::<()>(result); __contract_ret(result.ptr(), result.len::()); } __revert(123); } ``` `__contract_entry_some_method` is the original `some_method` as is. `__contract_ret` is a special function that immediately returns the current context, that is why there is no `return` in the generated code. --- This is the memory layout before the contract method is actually called ``` $hp | | +----------------------------+ v | | +--------------+------+------+--------------+------v--------+---------------+ | | | | | | HEAP | CONTRACT_ID | method name | method args | encoded bytes | encoded bytes | | 32 bytes | param1 | param2 | | | +--------------+-------------+------+-------+------^--------+-------^--^----+ | | | | +--------------+----------------+ | | | +-------------------------+ | | | | +-------------------------------------+ | | +--------------+--------+-------+--------+ | | param1 param2 | STACK | ASSET_ID | @ 73 words @ 74 words | | 32 bytes | offset offset | +--------------+-------------------------+ ^ call frame metadata ^ | | | | $fp $ssp/$sp ``` --- # **Interlude**: Why even encode the data? - All this begs the question... why? - Why all the extra cost of encoding and decoding? - why we don´t just pass all parameters directly? - Answer: - ABI instability - Return Value Demotion - Aliasing --- # Why? Avoiding `ABI instability` APIs define the types that are exchanged: ```rust abi MyContract { fn some_method(arg: Vec); } ``` but they do not define how these types are passed around. In the example above, the only safe way to call a contract, would be to use the **exactly same version** of the `stdlib` the contract implementation is using. Why? Any internal change to `Vec` could make calling a contract impossible. --- Example: Suppose that caller and callee were compiled using `std-lib v1`. ```rust struct Vec { pointer: raw_ptr, cap: u64, len: u64, } ``` But the contract implementation decide to update to `std-lib v2`, that for some reason was changed to: ```rust struct Vec { len: u64, pointer: raw_ptr, cap: u64, } ``` --- Now caller will never be able to call the contract. Welcome to **ABI Hell**. ``` | BEFORE | AFTER | +-----> { 0x...., 16, 4 } <----+ | +-----> { 0x...., 16, 4 } <---+ | | | | | | | | | | { pointer, capacity, len } { pointer, capacity, len } | { pointer, capacity, len } { len, pointer, capacity } | What What | What What CALLER CALLEE | CALLER CALLEE sends sees | sends sees Vec @ std-lib v1 Vec @ std-lib v1 | Vec @ std-lib v1 Vec @ std-lib v2 ``` Now imagine everything that can affect byte layouts: lib versions, compiler versions, compiler flags, compiler optimizations etc.... All this can break contract calls without anyone noticing. All this, actually, also breaks whoever is consuming these bytes: indexers, SDKs, receipts parsers etc... --- So our encoding scheme avoid this instability being an issue by forcing types to specify how they are encoded, decoded: ```rust impl AbiEncode for Vec where T: AbiEncode { fn abi_encode(self, buffer: Buffer) -> Buffer { ... } } impl AbiDecode for Vec where T: AbiDecode { fn abi_decode(ref mut buffer: BufferReader) -> Vec { ... } } ``` And to improve dev experience, we automatically implement `AbiEncode`/`AbiDecode` for all types that do not contain pointers. --- # Why? Issues with `Return Value Demotion` Consider a simple contract method as: ```rust impl TestContract for Contract { fn some_method() -> Vec { Vec::new() } } ``` In reality this will be compiled into something like the code below. This is called "Return Value Demotion". ```rust fn __contract_entry_some_method(return_value: &mut Vec) { *return_value = Vec::new(); } ``` --- The problems for contracts is that `return_value` would point to memory region of the callee. And contracts cannot write into it. That would demand `return_value` pointing to somewhere allocated by the contract, and being copied over by the callee. --- # Why? `Aliasing` If we bypass encoding, and allow caller to pass pointers to contracts, that immediately creates a security issues, because malicious callers could craft aliased data structures and trick contracts. For example: ```rust impl TestContract for Contract { fn some_method(v: Vec) { let some_value1 = do_something(&v); // do some_thing that allocate memory let some_value2 = do_something(&v); } } ``` --- A malicious caller can craft a `Vec` which its pointer points to the memory the contract uses. In the case above, would be possible to have `some_value1` be different from `some_value2`, although this 100% not intuitive. Allowing a caller to attack the contract somehow. To avoid this, our encoding scheme, do not allow pointers, references etc... to be passed. You must always pass only the data. --- # What we have left - Encoding for - scripts - predicates - receipts (logs) - configurables --- # Scripts and Predicates `scripts` and `predicates` can have arguments on their `main` function. ```rust fn main(v: u64) -> bool { ... } ``` In both cases, the compiler will desugar into something like: ```rust pub fn __entry() -> raw_slice { let args: (u64,) = decode_script_data::<(u64,)>(); // or decode_predicate_data let result: u64 = main(args.0); encode::(result) } ``` --- # Log When the `std::log` is called we also encode its argument. So ```rust log(1); ``` will be desugared into ```rust __log(encode(1)); ``` --- # Configurables Configurables are a little bit more complex. What happens with configurables is that their initialization is evaluated at compile time. ```rust configurable { SOMETHING: u64 = 1, } ``` In the example above, it will evaluate to `1`. The compiler will then call `encode(1)` and append the result to the end of the binary. --- And to allow SDKs to change this value just before deployment, the compiler generates an entry in the "ABI json", with the buffer offset in the binary. ```json { "name": "SOMETHING", "configurableType": { ... }, "offset": 7104 }, ``` The last piece for configurables is how are they "decoded"? This is done automatically by the compiler. --- For example ```rust configurable { SOMETHING: u64 = 1 } fn main() -> u64 { SOMETHING } ``` will be desugared into something like ```rust const SOMETHING: u64; fn __entry() -> raw_slice { std::codec::abi_decode_in_place(&mut SOMETHING, 7104, 8); encode(main()) } fn main() -> u64 { SOMETHING } ``` This is not valid `sway`, but gives the idea of what is happening. --- end