
Sway Encoding
- How it works for contracts?
  - Caller POV
  - Contract being called POV
- Interlude: Why even encode the data?
- Encoding for
  - scripts
  - predicates
  - receipts (logs)
  - configurables
How it works? (caller POV)
At some point, someone calls a contract:
let contract = abi(TestContract, CONTRACT_ID);
contract.some_method(0);
The compiler will desugar this into:
std::codec::contract_call(
    CONTRACT_ID,
    "some_method",
    (0,),
)
which will be expanded into:
let first_parameter = encode("some_method");
let second_parameter = encode((0,));
let params = encode((
    CONTRACT_ID,
    first_parameter.ptr(),
    second_parameter.ptr(),
));
let (ptr, len) = __contract_call(params.ptr(), coins, asset_id, gas);
let mut buffer = BufferReader::from_parts(ptr, len);
T::abi_decode(buffer)
and, in the end, will be compiled into a FuelVM `call` instruction.
        $hp
        │
        │                     ┌────────────────────────────┐
        ▼                     │                            │
        ┌──────────────┬──────┴──────┬──────────────┬──────▼────────┬───────────────┐
        │              │             │              │               │               │
HEAP    │ CONTRACT_ID  │ method name │ method args  │ encoded bytes │ encoded bytes │
        │   32 bytes   │   param1    │    param2    │               │               │
        └──────▲───────┴─────────────┴──────┬───────┴───────────────┴───────▲───────┘
               │                            │                               │
               │                            └───────────────────────────────┘
               │
        ...    │
        call $ra:ptr $rb:u64 $rc:ptr $rd:u64
        ...          coins     │      gas
                               │
                              ┌▼─────────────┐
                              │              │
STACK ....................    │   ASSET_ID   │
                              │   32 bytes   │
                              └──────────────┘
                              ▲              ▲
                              │              │
                              │              │
                             $ssp           $sp
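As a rough model of the caller-side steps above, the Rust sketch below encodes the method name and the argument tuple into separate buffers and then builds the parameter block that points at them. The byte layout (big-endian `u64`, length-prefixed strings) and the helper names `encode_u64`, `encode_str` and `build_call_params` are assumptions made for illustration only; the real work is done by `std::codec` and the compiler.

```rust
/// Encodes a u64 as 8 big-endian bytes (assumed layout, for illustration).
fn encode_u64(value: u64) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Encodes a string as an 8-byte length followed by its UTF-8 bytes
/// (assumed layout, for illustration).
fn encode_str(s: &str) -> Vec<u8> {
    let mut out = encode_u64(s.len() as u64);
    out.extend_from_slice(s.as_bytes());
    out
}

/// Models the parameter block from the diagram: the 32-byte contract id,
/// followed by the addresses of the two encoded buffers.
fn build_call_params(contract_id: [u8; 32], name_ptr: u64, args_ptr: u64) -> Vec<u8> {
    let mut params = contract_id.to_vec();
    params.extend(encode_u64(name_ptr));
    params.extend(encode_u64(args_ptr));
    params
}
```

Under these assumptions, `build_call_params` always yields 48 bytes (32 + 8 + 8), which matches the three cells at the start of the heap region in the diagram.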
How it works? (contract being called POV)
The example above was calling a contract implemented like:
impl TestContract for Contract {
    fn some_method(qty: u64) {
        ...
    }
}
The compiler will desugar this into:
pub fn __entry() {
    let method_name = std::codec::decode_first_param::<str>();
    if method_name == "some_method" {
        let mut buffer = std::codec::BufferReader::from_second_parameter();
        let args: (u64,) = buffer.decode::<(u64,)>();
        let result: () = __contract_entry_some_method(args.0);
        let result: raw_slice = encode::<()>(result);
        __contract_ret(result.ptr(), result.len::<u8>());
    }
    __revert(123);
}
`__contract_entry_some_method` is the original `some_method` as is.
`__contract_ret` is a special function that immediately returns from the current context, which is why there is no `return` in the generated code.
This is the memory layout before the contract method is actually called:
        $hp
        |
        |                     +----------------------------+
        v                     |                            |
        +--------------+------+------+--------------+------v--------+---------------+
        |              |             |              |               |               |
HEAP    | CONTRACT_ID  | method name | method args  | encoded bytes | encoded bytes |
        |   32 bytes   |   param1    |    param2    |               |               |
        +--------------+-------------+------+-------+------^--------+-------^--^----+
                                            |              |                |  |
                                            +--------------+----------------+  |
                                                           |                   |
                                 +-------------------------+                   |
                                 |                                             |
                                 |       +-------------------------------------+
                                 |       |
         +--------------+--------+-------+--------+
         |              |   param1      param2    |
STACK    |   ASSET_ID   | @ 73 words  @ 74 words  |
         |   32 bytes   |   offset      offset    |
         +--------------+-------------------------+
         ^          call frame metadata           ^
         |                                        |
         |                                        |
        $fp                                   $ssp/$sp
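The dispatch performed by the generated `__entry` can be modelled in Rust as follows. This is only a sketch: `Reader` is a hypothetical stand-in for `BufferReader`, the method body is invented, arguments are assumed to be big-endian `u64` values, and returning `Err` stands in for the final `__revert`.

```rust
/// Minimal reader over an encoded byte buffer (hypothetical stand-in for `BufferReader`).
struct Reader<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Reader { bytes, pos: 0 }
    }

    /// Reads 8 big-endian bytes as a u64, or None if the buffer is too short.
    fn read_u64(&mut self) -> Option<u64> {
        let end = self.pos.checked_add(8)?;
        let chunk = self.bytes.get(self.pos..end)?;
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        self.pos = end;
        Some(u64::from_be_bytes(word))
    }
}

/// Invented method body, used only to have something to call.
fn some_method(qty: u64) -> u64 {
    qty.wrapping_add(1)
}

/// Models `__entry`: select by method name, decode the arguments,
/// call the method, and encode the result.
fn entry(method_name: &str, encoded_args: &[u8]) -> Result<Vec<u8>, &'static str> {
    if method_name == "some_method" {
        let mut reader = Reader::new(encoded_args);
        let qty = reader.read_u64().ok_or("malformed arguments")?;
        let result = some_method(qty);
        return Ok(result.to_be_bytes().to_vec());
    }
    // No method matched: the generated code would revert here.
    Err("unknown method")
}
```

The early `return` inside the `if` plays the role of `__contract_ret`: once a method has been matched and executed, control never reaches the fallback at the end.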
Interlude: Why even encode the data?
- All this begs the question... why?
  - Why all the extra cost of encoding and decoding?
  - Why don't we just pass all parameters directly?
- Answer:
  - ABI instability
  - Return Value Demotion
  - Aliasing
Why? Avoiding ABI instability
APIs define the types that are exchanged:
abi MyContract {
    fn some_method(arg: Vec<u64>);
}
but they do not define how these types are passed around. In the example above, the only safe way to call the contract would be to use exactly the same version of the `stdlib` that the contract implementation is using.
Why?
Any internal change to `Vec` could make calling the contract impossible.
Example: suppose that the caller and the callee were both compiled using `std-lib v1`, in which `Vec` is laid out as:
struct Vec<T> {
    pointer: raw_ptr,
    cap: u64,
    len: u64,
}
But then the contract implementation decides to update to `std-lib v2`, in which `Vec` was, for some reason, changed to:
struct Vec<T> {
    len: u64,
    pointer: raw_ptr,
    cap: u64,
}
Now the caller will never be able to call the contract. Welcome to ABI Hell.
                                                           |
                         BEFORE                            |                          AFTER
                                                           |
            +-----> { 0x...., 16, 4 } <----+               |             +-----> { 0x...., 16, 4 } <---+
            |                              |               |             |                             |
            |                              |               |             |                             |
{ pointer, capacity, len }     { pointer, capacity, len }  | { pointer, capacity, len }    { len, pointer, capacity }
                                                           |
           What                           What             |            What                          What
          CALLER                         CALLEE            |           CALLER                        CALLEE
           sends                          sees             |            sends                         sees
     Vec @ std-lib v1               Vec @ std-lib v1       |     Vec @ std-lib v1              Vec @ std-lib v2
Now imagine everything that can affect byte layouts: library versions, compiler versions, compiler flags, compiler optimizations, etc.
All of this can break contract calls without anyone noticing. It also breaks whoever consumes these bytes: indexers, SDKs, receipt parsers, etc.
Our encoding scheme keeps this instability from becoming an issue by forcing types to specify how they are encoded and decoded:
impl<T> AbiEncode for Vec<T> where T: AbiEncode
{
    fn abi_encode(self, buffer: Buffer) -> Buffer {
        ...
    }
}
impl<T> AbiDecode for Vec<T> where T: AbiDecode
{
    fn abi_decode(ref mut buffer: BufferReader) -> Vec<T> {
        ...
    }
}
And to improve the developer experience, we automatically implement `AbiEncode`/`AbiDecode` for all types that do not contain pointers.
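The difference between passing raw layouts and passing explicitly encoded data can be sketched in Rust. Everything here is illustrative: `VecV1` and `VecV2` mirror the two hypothetical `std-lib` layouts from the example, and the wire format used by `encode_elements` (element count followed by big-endian elements) is an assumption, not a statement about what `std::codec` emits.

```rust
/// `Vec` layout in the hypothetical std-lib v1: { pointer, cap, len }.
#[derive(Debug, PartialEq)]
struct VecV1 { pointer: u64, cap: u64, len: u64 }

/// `Vec` layout in the hypothetical std-lib v2: { len, pointer, cap }.
#[derive(Debug, PartialEq)]
struct VecV2 { len: u64, pointer: u64, cap: u64 }

/// The caller passes the three raw words exactly as v1 lays them out.
fn send_raw(v: &VecV1) -> [u64; 3] {
    [v.pointer, v.cap, v.len]
}

/// A callee built against v2 reads the same words in its own field order.
fn receive_raw_v2(words: [u64; 3]) -> VecV2 {
    VecV2 { len: words[0], pointer: words[1], cap: words[2] }
}

/// Explicit encoding: element count, then each element, independent of field order.
fn encode_elements(elements: &[u64]) -> Vec<u8> {
    let mut out = (elements.len() as u64).to_be_bytes().to_vec();
    for e in elements {
        out.extend_from_slice(&e.to_be_bytes());
    }
    out
}

/// Decodes what `encode_elements` produced; returns None on malformed input.
fn decode_elements(bytes: &[u8]) -> Option<Vec<u64>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    let mut words = bytes.chunks_exact(8).map(|chunk| {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        u64::from_be_bytes(word)
    });
    let len = words.next()? as usize;
    let elements: Vec<u64> = words.collect();
    if elements.len() == len { Some(elements) } else { None }
}
```

With the raw approach, a v2 callee reads the v1 pointer as its length. With the explicit approach, both sides agree on the bytes regardless of how each version arranges the struct fields internally.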
Why? Issues with Return Value Demotion
Consider a simple contract method such as:
impl TestContract for Contract {
    fn some_method() -> Vec<u64> {
        Vec::new()
    }
}
In reality this will be compiled into something like the code below. This is called "Return Value Demotion".
fn __contract_entry_some_method(return_value: &mut Vec<u64>) {
    *return_value = Vec::new();
}
The problem for contracts is that `return_value` would point to a memory region owned by the caller, and the called contract cannot write into it.
`return_value` would instead have to point to memory allocated by the contract, and the caller would then have to copy the result over.
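The ownership problem can be illustrated with a toy memory model in Rust. The rule used here (one boundary address; the callee may only write at or above it) is a deliberate simplification assumed for this sketch, and `Memory`, `callee_write` and `caller_copy` are hypothetical names.

```rust
/// Toy memory split into a caller-owned part and a callee-owned part.
struct Memory {
    bytes: Vec<u8>,
    /// Addresses below this belong to the caller; the rest belong to the callee.
    callee_start: usize,
}

impl Memory {
    /// A write performed by the callee; it may only touch callee-owned memory.
    fn callee_write(&mut self, addr: usize, data: &[u8]) -> Result<(), &'static str> {
        let end = addr.checked_add(data.len()).ok_or("address overflow")?;
        if addr < self.callee_start {
            return Err("callee cannot write into caller-owned memory");
        }
        if end > self.bytes.len() {
            return Err("out of bounds");
        }
        self.bytes[addr..end].copy_from_slice(data);
        Ok(())
    }

    /// The caller copies the result out of callee-owned memory into its own slot.
    /// Panics if either range is out of bounds.
    fn caller_copy(&mut self, from: usize, to: usize, len: usize) {
        self.bytes.copy_within(from..from + len, to);
    }
}
```

In this model, a demoted return slot at a caller-owned address makes the callee's write fail. The callee has to write into its own region, and the caller then copies the bytes across.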
Why? Aliasing
If we bypassed encoding and allowed callers to pass pointers to contracts, that would immediately create a security issue, because malicious callers could craft aliased data structures and trick contracts.
For example:
impl TestContract for Contract {
    fn some_method(v: Vec<u64>) {
        let some_value1 = do_something(&v);
        // do something that allocates memory
        let some_value2 = do_something(&v);
    }
}
A malicious caller can craft a `Vec` whose pointer points into memory that the contract itself uses.
In the case above, it would then be possible for `some_value1` to differ from `some_value2`, although this is 100% not intuitive, which gives the caller a way to attack the contract.
To avoid this, our encoding scheme does not allow pointers, references, etc. to be passed. You must always pass only the data.
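A small Rust model makes the aliasing attack concrete. It assumes a flat byte array as shared memory and a caller-supplied `RawVec` (a hypothetical address and length pair); `do_something` is an invented helper that just sums the bytes it is pointed at.

```rust
/// A caller-supplied "vector": just an address and a length into shared memory.
struct RawVec {
    ptr: usize,
    len: usize,
}

/// Invented contract helper: sums the bytes the vector points at.
/// Panics if the range is out of bounds.
fn do_something(memory: &[u8], v: &RawVec) -> u64 {
    memory[v.ptr..v.ptr + v.len].iter().map(|b| *b as u64).sum()
}

/// Models the contract method: read, write to its own scratch area, read again.
fn some_method(memory: &mut [u8], scratch: usize, v: &RawVec) -> (u64, u64) {
    let some_value1 = do_something(&*memory, v);
    // The contract writes to memory it considers private.
    memory[scratch] = 0xFF;
    let some_value2 = do_something(&*memory, v);
    (some_value1, some_value2)
}
```

With an honest `RawVec` that points away from the scratch area, both reads agree. If the caller sets `ptr` to the scratch address, the second read observes the contract's own write and the two values differ, even though `v` was never modified by the contract.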
What we have left
- Encoding for
  - scripts
  - predicates
  - receipts (logs)
  - configurables
Scripts and Predicates
`scripts` and `predicates` can have arguments on their `main` function.
fn main(v: u64) -> bool {
    ...
}
In both cases, the compiler will desugar into something like:
pub fn __entry() -> raw_slice {
    let args: (u64,) = decode_script_data::<(u64,)>(); // or decode_predicate_data
    let result: bool = main(args.0);
    encode::<bool>(result)
}
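The same decode, run, encode shape can be sketched in Rust. The body of `main_fn` is invented, and both the 8-byte big-endian argument and the one-byte `bool` result are assumed encodings used only for this illustration.

```rust
/// Invented body for the script or predicate `main`.
fn main_fn(v: u64) -> bool {
    v > 10
}

/// Models `__entry`: decode the single u64 argument from the script or
/// predicate data, run `main`, and encode the result.
fn script_entry(data: &[u8]) -> Option<Vec<u8>> {
    if data.len() != 8 {
        return None;
    }
    let mut word = [0u8; 8];
    word.copy_from_slice(data);
    let v = u64::from_be_bytes(word);
    let result = main_fn(v);
    // Assumption: a bool is encoded as a single byte, 0 or 1.
    Some(vec![result as u8])
}
```

Calling `script_entry(&42u64.to_be_bytes())` yields a single byte holding `1`, because the invented `main_fn` returns `true` for values above 10.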
Log
When `std::log` is called, its argument is also encoded. So
log(1);
will be desugared into
__log(encode(1));
Configurables
Configurables are a little bit more complex. What happens with configurables is that their initialization is evaluated at compile time.
configurable {
    SOMETHING: u64 = 1,
}
In the example above, it will evaluate to `1`. The compiler will then call `encode(1)` and append the result to the end of the binary.
And to allow SDKs to change this value just before deployment, the compiler generates an entry in the "ABI json", with the buffer offset in the binary.
{
    "name": "SOMETHING",
    "configurableType": { ... },
    "offset": 7104
},
The last piece for configurables is how they are "decoded".
This is done automatically by the compiler.
For example:
configurable { SOMETHING: u64 = 1 }
fn main() -> u64 { SOMETHING }
will be desugared into something like
const SOMETHING: u64;
fn __entry() -> raw_slice {
    std::codec::abi_decode_in_place(&mut SOMETHING, 7104, 8);
    encode(main())
}
fn main() -> u64 {
    SOMETHING
}
This is not valid Sway, but it gives an idea of what is happening.
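Both halves of the configurable mechanism, an SDK patching the binary at the recorded offset and the entry point reading the value back, can be modelled in Rust. The function names are hypothetical, and the value is assumed to be a `u64` stored as 8 big-endian bytes.

```rust
/// Overwrites the encoded configurable stored at `offset` inside the compiled binary.
/// Assumes a u64 configurable encoded as 8 big-endian bytes.
fn patch_u64_configurable(
    binary: &mut [u8],
    offset: usize,
    value: u64,
) -> Result<(), &'static str> {
    let end = offset.checked_add(8).ok_or("offset overflow")?;
    let slot = binary.get_mut(offset..end).ok_or("offset outside the binary")?;
    slot.copy_from_slice(&value.to_be_bytes());
    Ok(())
}

/// Reads the configurable back, as the generated entry point would
/// before running `main`.
fn read_u64_configurable(binary: &[u8], offset: usize) -> Option<u64> {
    let end = offset.checked_add(8)?;
    let mut word = [0u8; 8];
    word.copy_from_slice(binary.get(offset..end)?);
    Some(u64::from_be_bytes(word))
}
```

An SDK following this model would look up `"offset": 7104` in the ABI JSON, call something like `patch_u64_configurable(&mut binary, 7104, 42)`, and deploy the patched bytes. The contract code itself is untouched; only the encoded value at the end of the binary changes.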
end