sway/docs/slides/encoding.md

layout: cover
marp: true

Sway Encoding


  • How does it work for contracts?
    • Caller POV
    • Contract being called POV
  • Interlude: Why even encode the data?
  • Encoding for
    • scripts
    • predicates
    • receipts (logs)
    • configurables

How does it work? (caller POV)

At some point, someone calls a contract:

let contract = abi(TestContract, CONTRACT_ID);
contract.some_method(0);

The compiler desugars this into:

std::codec::contract_call(
    CONTRACT_ID,
    "some_method",
    (0,),
)

which will be expanded into:

let first_parameter = encode("some_method");
let second_parameter = encode((0,));
let params = encode((
    CONTRACT_ID,
    first_parameter.ptr(),
    second_parameter.ptr(),
));

let (ptr, len) = __contract_call(params.ptr(), coins, asset_id, gas);

let mut buffer = BufferReader::from_parts(ptr, len);
T::abi_decode(buffer)

and, in the end, is compiled into a FuelVM call instruction.

      $hp
       │
       │                     ┌────────────────────────────┐
       ▼                     │                            │
       ┌──────────────┬──────┴──────┬──────────────┬──────▼────────┬───────────────┐
       │              │             │              │               │               │
HEAP   │  CONTRACT_ID │ method name │ method args  │ encoded bytes │ encoded bytes │
       │    32 bytes  │   param1    │   param2     │               │               │
       └──────▲───────┴─────────────┴──────┬───────┴───────────────┴───────▲───────┘
              │                            │                               │
              │                            └───────────────────────────────┘
              │
        ...   │
        call $ra:ptr $rb:u64 $rc:ptr $rd:u64
        ...           coins   │        gas
                              │
                             ┌▼─────────────┐
                             │              │
STACK  ....................  │   ASSET_ID   │
                             │    32 bytes  │
                             └──────────────┘
       ▲                                    ▲
       │                                    │
       │                                    │
     $ssp                                  $sp
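
To make the expansion above more concrete, here is a small Rust model of what the two encoded parameters might look like as bytes. It is a sketch under stated assumptions, not the std::codec implementation: it assumes a u64 is written as 8 big-endian bytes and a string slice as a u64 length followed by its UTF-8 bytes. The helpers encode_u64, encode_str, encode_method_name and encode_args are hypothetical names used only for illustration.

```rust
// Illustrative model only; all helper names are hypothetical, and the byte
// layout (big-endian u64, length-prefixed str) is an assumption.

/// Append a u64 as 8 big-endian bytes.
fn encode_u64(value: u64, buffer: &mut Vec<u8>) {
    buffer.extend_from_slice(&value.to_be_bytes());
}

/// Append a string as a u64 length followed by its UTF-8 bytes.
fn encode_str(value: &str, buffer: &mut Vec<u8>) {
    encode_u64(value.len() as u64, buffer);
    buffer.extend_from_slice(value.as_bytes());
}

/// Models `encode("some_method")`, the first call parameter.
fn encode_method_name(name: &str) -> Vec<u8> {
    let mut buffer = Vec::new();
    encode_str(name, &mut buffer);
    buffer
}

/// Models `encode((0,))`: a one-element tuple is just its element's bytes.
fn encode_args(qty: u64) -> Vec<u8> {
    let mut buffer = Vec::new();
    encode_u64(qty, &mut buffer);
    buffer
}

fn main() {
    let first_parameter = encode_method_name("some_method");
    let second_parameter = encode_args(0);
    println!("method name: {} bytes", first_parameter.len());
    println!("method args: {} bytes", second_parameter.len());
}
```

The point of the model is that both parameters are plain byte buffers containing only data; the call itself only needs to carry pointers to them.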

How does it work? (contract being called POV)

The example above calls a contract implemented like this:

impl TestContract for Contract {
    fn some_method(qty: u64) {
        ...
    }
}

The compiler desugars this into:

pub fn __entry() {
    let method_name = std::codec::decode_first_param::<str>();
    if method_name == "some_method" {
        let mut buffer = std::codec::BufferReader::from_second_parameter();
        let args: (u64,) = buffer.decode::<(u64,)>();
        let result: () = __contract_entry_some_method(args.0);
        let result: raw_slice = encode::<()>(result);
        __contract_ret(result.ptr(), result.len::<u8>());
    }
    __revert(123);
}

__contract_entry_some_method is the original some_method, unchanged. __contract_ret is a special function that immediately returns from the current context, which is why there is no return in the generated code.
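
The dispatch logic in the generated entry point can be modeled in Rust as follows. This is a sketch, not the compiler's output: Reader is a hypothetical stand-in for BufferReader, the byte layout (big-endian u64, length-prefixed strings) is an assumption, and decode failures are folded into the same revert code purely for brevity.

```rust
// Illustrative model only: Reader, entry and some_method are hypothetical
// stand-ins; big-endian u64 and length-prefixed strings are assumptions.

/// Minimal cursor over encoded bytes, standing in for BufferReader.
struct Reader<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Reader { bytes, pos: 0 }
    }

    /// Read 8 big-endian bytes as a u64, or None if the input is too short.
    fn read_u64(&mut self) -> Option<u64> {
        let end = self.pos.checked_add(8)?;
        let mut chunk = [0u8; 8];
        chunk.copy_from_slice(self.bytes.get(self.pos..end)?);
        self.pos = end;
        Some(u64::from_be_bytes(chunk))
    }

    /// Read a u64 length followed by that many UTF-8 bytes.
    fn read_str(&mut self) -> Option<&'a str> {
        let len = self.read_u64()? as usize;
        let end = self.pos.checked_add(len)?;
        let text = std::str::from_utf8(self.bytes.get(self.pos..end)?).ok()?;
        self.pos = end;
        Some(text)
    }
}

/// Stands in for the user's original method.
fn some_method(_qty: u64) {}

/// Models __entry: Ok(bytes) is the encoded return value, Err(code) a revert.
fn entry(first_param: &[u8], second_param: &[u8]) -> Result<Vec<u8>, u64> {
    let method_name = Reader::new(first_param).read_str().ok_or(123u64)?;
    if method_name == "some_method" {
        let qty = Reader::new(second_param).read_u64().ok_or(123u64)?;
        some_method(qty);
        // The unit type carries no data, so it is modeled as zero bytes.
        return Ok(Vec::new());
    }
    Err(123)
}

fn main() {
    let mut name = 11u64.to_be_bytes().to_vec();
    name.extend_from_slice(b"some_method");
    let args = 0u64.to_be_bytes();
    println!("{:?}", entry(&name, &args));
}
```

An unknown method name falls through every branch and ends in the revert, mirroring the trailing __revert(123) in the desugared code.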


This is the memory layout before the contract method is actually called:

      $hp                                                                           
       |                                                                            
       |                     +----------------------------+                         
       v                     |                            |                         
       +--------------+------+------+--------------+------v--------+---------------+
       |              |             |              |               |               |
HEAP   |  CONTRACT_ID | method name | method args  | encoded bytes | encoded bytes |
       |    32 bytes  |   param1    |   param2     |               |               |
       +--------------+-------------+------+-------+------^--------+-------^--^----+
                                           |              |                |  |     
                                           +--------------+----------------+  |     
                                                          |                   |     
                                +-------------------------+                   |     
                                |                                             |     
                                |       +-------------------------------------+     
                                |       |                                           
        +--------------+--------+-------+--------+                                  
        |              | param1      param2      |                                  
STACK   |  ASSET_ID    |  @ 73 words  @ 74 words |                                  
        |    32 bytes  |  offset      offset     |                                  
        +--------------+-------------------------+                                  
                       ^   call frame metadata   ^                                  
                       |                         |                                  
                       |                         |                                  
                      $fp                    $ssp/$sp                               

Interlude: Why even encode the data?

  • All this begs the question... why?
    • Why all the extra cost of encoding and decoding?
    • Why don't we just pass all parameters directly?
  • Answer:
    • ABI instability
    • Return Value Demotion
    • Aliasing

Why? Avoiding ABI instability

APIs define the types that are exchanged:

abi MyContract {
    fn some_method(arg: Vec<u64>);
}

but they do not define how these types are passed around. In the example above, the only safe way to call the contract would be to use exactly the same version of the stdlib that the contract implementation uses.

Why?

Any internal change to Vec could make calling a contract impossible.


Example: Suppose that caller and callee were compiled using std-lib v1.

struct Vec<T> {
    pointer: raw_ptr,
    cap: u64,
    len: u64,
}

But then the contract implementation updates to std-lib v2, in which Vec was, for some reason, changed to:

struct Vec<T> {
    len: u64,
    pointer: raw_ptr,
    cap: u64,
}

Now the caller will never be able to call the contract. Welcome to ABI Hell.

                                                         |                                                         
                        BEFORE                           |                          AFTER                          
                                                         |                                                         
           +-----> { 0x...., 16, 4 } <----+              |            +----->  { 0x...., 16, 4 } <---+             
           |                              |              |            |                              |             
           |                              |              |            |                              |             
{ pointer, capacity, len }    { pointer, capacity, len } | { pointer, capacity, len }    { len, pointer, capacity }
                                                         |                                                         
        What                           What              |         What                             What           
       CALLER                         CALLEE             |        CALLER                           CALLEE          
        sends                          sees              |         sends                            sees           
    Vec @ std-lib v1               Vec @ std-lib v1      |     Vec @ std-lib v1                  Vec @ std-lib v2  

Now imagine everything that can affect byte layouts: lib versions, compiler versions, compiler flags, compiler optimizations, etc.

All this can break contract calls without anyone noticing. It also breaks anyone consuming these bytes: indexers, SDKs, receipt parsers, etc.


So our encoding scheme avoids this instability by forcing types to specify how they are encoded and decoded:


impl<T> AbiEncode for Vec<T> where T: AbiEncode
{
    fn abi_encode(self, buffer: Buffer) -> Buffer {
        ...
    }
}

impl<T> AbiDecode for Vec<T> where T: AbiDecode
{
    fn abi_decode(ref mut buffer: BufferReader) -> Vec<T> {
        ...
    }
}

And to improve dev experience, we automatically implement AbiEncode/AbiDecode for all types that do not contain pointers.
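
The benefit of an explicit encoding can be shown with a small Rust model. It is a sketch under assumptions: VecV1 and VecV2 mirror the two field orders from the earlier slides, pointers are modeled as borrowed slices, and the wire format (a u64 element count followed by big-endian elements) is assumed for illustration. The names VecV1, VecV2 and encode_elements are hypothetical.

```rust
// Illustrative model only: VecV1, VecV2 and encode_elements are hypothetical.
// Pointers are modeled as borrowed slices; the wire format (u64 length, then
// big-endian elements) is an assumption.

/// Field order as in the "std-lib v1" slide.
#[allow(dead_code)]
struct VecV1<'a> {
    pointer: &'a [u64],
    cap: u64,
    len: u64,
}

/// Field order as in the "std-lib v2" slide.
#[allow(dead_code)]
struct VecV2<'a> {
    len: u64,
    pointer: &'a [u64],
    cap: u64,
}

/// Shared wire format: element count, then each element.
fn encode_elements(elements: &[u64]) -> Vec<u8> {
    let mut buffer = (elements.len() as u64).to_be_bytes().to_vec();
    for element in elements {
        buffer.extend_from_slice(&element.to_be_bytes());
    }
    buffer
}

impl<'a> VecV1<'a> {
    fn abi_encode(&self) -> Vec<u8> {
        encode_elements(&self.pointer[..self.len as usize])
    }
}

impl<'a> VecV2<'a> {
    fn abi_encode(&self) -> Vec<u8> {
        encode_elements(&self.pointer[..self.len as usize])
    }
}

fn main() {
    let data = [1u64, 2, 3, 4];
    let v1 = VecV1 { pointer: &data, cap: 16, len: 4 };
    let v2 = VecV2 { len: 4, pointer: &data, cap: 16 };
    // Both in-memory layouts produce identical bytes on the wire.
    assert_eq!(v1.abi_encode(), v2.abi_encode());
    println!("{} bytes", v1.abi_encode().len());
}
```

Because only the elements and their count cross the boundary, reordering the struct's fields between library versions no longer changes what the other side receives.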


Why? Issues with Return Value Demotion

Consider a simple contract method such as:

impl TestContract for Contract {
    fn some_method() -> Vec<u64> {
        Vec::new()
    }
}

In reality this will be compiled into something like the code below. This is called "Return Value Demotion".

fn __contract_entry_some_method(return_value: &mut Vec<u64>) {
    *return_value = Vec::new();
}

The problem for contracts is that return_value would point to a memory region owned by the caller, and the called contract cannot write into it.

This would require return_value to point to memory allocated by the contract, with the result then copied over by the caller.


Why? Aliasing

If we bypass encoding and allow callers to pass pointers to contracts, that immediately creates a security issue, because malicious callers could craft aliased data structures and trick contracts.

For example:

impl TestContract for Contract {
    fn some_method(v: Vec<u64>) {
        let some_value1 = do_something(&v);
        // do something that allocates memory
        let some_value2 = do_something(&v);
    }
}

A malicious caller can craft a Vec whose pointer points into memory that the contract itself uses.

In the case above, some_value1 could end up different from some_value2, which is not at all intuitive, and this gives a caller a way to attack the contract.

To avoid this, our encoding scheme does not allow pointers, references, etc. to be passed. You must always pass only the data.
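
The aliasing problem can be reproduced in a small Rust model. This is a sketch with made-up details: memory is a flat array of words, FakeVec is a hypothetical (offset, len) pair standing in for a raw pointer and a length, and the contract's scratch space at words 4 and 5 is an arbitrary choice for illustration.

```rust
// Illustrative model only: memory is a flat array of words and FakeVec is a
// hypothetical (offset, len) pair standing in for a raw pointer and a length.

struct FakeVec {
    offset: usize,
    len: usize,
}

/// Stands in for do_something(&v): sum the words the "pointer" refers to.
fn do_something(memory: &[u64], v: &FakeVec) -> u64 {
    memory[v.offset..v.offset + v.len].iter().sum()
}

/// The contract trusts the caller-supplied pointer and reads through it twice.
fn some_method(memory: &mut [u64], v: &FakeVec) -> (u64, u64) {
    let some_value1 = do_something(memory, v);
    // The contract writes to its own scratch space at words 4 and 5.
    memory[4] = 100;
    memory[5] = 200;
    let some_value2 = do_something(memory, v);
    (some_value1, some_value2)
}

fn main() {
    // An honest argument lives in words 0..2, away from the scratch space.
    let honest = FakeVec { offset: 0, len: 2 };
    let mut memory = [1u64, 2, 0, 0, 0, 0, 0, 0];
    println!("{:?}", some_method(&mut memory, &honest));

    // A crafted argument points straight at the contract's scratch space.
    let malicious = FakeVec { offset: 4, len: 2 };
    let mut memory = [1u64, 2, 0, 0, 0, 0, 0, 0];
    println!("{:?}", some_method(&mut memory, &malicious));
}
```

With the honest argument both reads agree; with the crafted one the second read observes the contract's own writes. Passing only encoded data removes this possibility, because the contract decodes into memory it allocated itself.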


What we have left

  • Encoding for
    • scripts
    • predicates
    • receipts (logs)
    • configurables

Scripts and Predicates

Scripts and predicates can take arguments in their main function.

fn main(v: u64) -> bool {
    ...
}

In both cases, the compiler will desugar into something like:

pub fn __entry() -> raw_slice {
    let args: (u64,) = decode_script_data::<(u64,)>(); // or decode_predicate_data
    let result: bool = main(args.0);
    encode::<bool>(result)
}

Log

When std::log is called, we also encode its argument. So

    log(1);

will be desugared into

    __log(encode(1));

Configurables

Configurables are a little more complex: their initializers are evaluated at compile time.

configurable {
    SOMETHING: u64 = 1,
}

In the example above, it will evaluate to 1. The compiler will then call encode(1) and append the result to the end of the binary.


To allow SDKs to change this value just before deployment, the compiler generates an entry in the "ABI JSON" with the offset of the encoded buffer in the binary.

{
    "name": "SOMETHING",
    "configurableType": { ... },
    "offset": 7104
},
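
What an SDK does with that offset can be modeled in Rust as a byte patch. This is a sketch, not any SDK's actual API: patch_configurable is a hypothetical helper, the stand-in binary and its offset of 16 are invented for the example, and the 8-byte big-endian encoding of u64 is an assumption.

```rust
// Illustrative model only: patch_configurable is a hypothetical helper, and
// the 8-byte big-endian encoding of u64 is an assumption.

/// Overwrite the encoded u64 stored at `offset`, as an SDK might do before
/// deployment. Returns false if the slot would fall outside the binary.
fn patch_configurable(binary: &mut [u8], offset: usize, value: u64) -> bool {
    let end = match offset.checked_add(8) {
        Some(end) if end <= binary.len() => end,
        _ => return false,
    };
    binary[offset..end].copy_from_slice(&value.to_be_bytes());
    true
}

fn main() {
    // Stand-in binary: 16 bytes of "code" followed by the encoded value 1.
    let mut binary = vec![0xAAu8; 16];
    binary.extend_from_slice(&1u64.to_be_bytes());
    let offset = 16; // would come from the "offset" field in the ABI JSON

    assert!(patch_configurable(&mut binary, offset, 42));
    println!("{:?}", &binary[offset..]);
}
```

Only the encoded slot changes; the rest of the binary is left untouched, which is why the offset has to be published alongside the type information.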

The last piece for configurables is how they are "decoded".

This is done automatically by the compiler.


For example

configurable { SOMETHING: u64 = 1 }

fn main() -> u64 { SOMETHING }

will be desugared into something like

const SOMETHING: u64;

fn __entry() -> raw_slice {
    std::codec::abi_decode_in_place(&mut SOMETHING, 7104, 8);
    encode(main())
}

fn main() -> u64 {
    SOMETHING
}

This is not valid Sway, but it gives an idea of what is happening.


end