FuelLabs
sway

Performance History

Latest Results

Merge branch 'master' into IGI-111/dont-duplicate-source
IGI-111/dont-duplicate-source
16 hours ago
Optimize initialization of array repeat when the initial value is zero (#7299)

## Description

Continuation of https://github.com/FuelLabs/sway/issues/6860.

This PR improves the performance of arrays initialized with zero, like `[0u16; 5]`. Before, the compiler initialized the array with `SW` or with a loop, depending on the array size. Now the array is initialized with a single `MCLI`.

Example:

```sway
#[inline(never)]
fn array_repeat_zero_big_u16() -> [u16; 25] {
    [0u16; 25]
}
```

The generated IR uses the local initializer, because the compiler has recognized that the array is initialized with all zeros.

```
fn array_repeat_zero_small_u16_2(__ret_value: __ptr [u64; 5]) -> __ptr [u64; 5], !21 {
    local [u64; 5] __anon_0 = const [u64; 5] [u64 0, u64 0, u64 0, u64 0, u64 0]

    entry(__ret_value: __ptr [u64; 5]):
    v0 = get_local __ptr [u64; 5], __anon_0, !22
    mem_copy_val __ret_value, v0
    ret __ptr [u64; 5] __ret_value
}
```

And this is the generated ASM.

```
pshl i3                   ; save registers 16..40
pshh i524288              ; save registers 40..64
move $$locbase $sp        ; save locals base register for function array_repeat_zero_small_u16_2
cfei i40                  ; allocate 40 bytes for locals and 0 slots for call arguments
move $r0 $$arg0           ; save argument 0 (__ret_value)
move $r1 $$reta           ; save return address
addi $$tmp $$locbase i0   ; array initialization - array ptr <----- look here
mcli $$tmp i40            ; array initialization - clear
mcpi $r0 $$locbase i40    ; copy memory
move $$retv $r0           ; set return value
cfsi i40                  ; free 40 bytes for locals and 0 slots for extra call arguments
move $$reta $r1           ; restore return address
poph i524288              ; restore registers 40..64
popl i3                   ; restore registers 16..40
jal $zero $$reta i0       ; return from call
```

## Future Optimizations

We can simplify the ASM generated for functions a lot. Using the function above as an example:

```
pshl i3
pshh i524288
move $$locbase $sp
cfei i40
move $r0 $$arg0
move $r1 $$reta
addi $$tmp $$locbase i0
mcli $$tmp i40
mcpi $r0 $$locbase i40
move $$retv $r0
cfsi i40
move $$reta $r1
poph i524288
popl i3
jal $zero $$reta i0
```

- We can merge `ADDI` and `MCLI` into one instruction in `AbstractInstructionSet::constant_propagate`. The optimizer should know that `$$tmp` and `$$locbase` have the same value, and that the `addi` here is useless. Another option is to let `constant_propagate` replace `$$tmp` with `$$locbase` in the `mcli` and let DCE remove the `addi`.

```
addi $$tmp $$locbase i0
mcli $$tmp i40
```

- After this, the compiler will see one `mcli` followed by one `mcpi`. Both use the same pointer and the same length. The optimizer should be able to merge them into just one `mcli`.

```
mcli $$locbase i40
mcpi $r0 $$locbase i40
```

- We don't need to save `$$reta` to `$r1`, because this function does not call any other function. For the same reason, we don't need to move `$r1` back to `$$reta`. That would eliminate two instructions. We also don't need to move `$sp` to `$$locbase`, and we don't need to move `$$arg0` to `$r0`.
- We also don't need to push and pop all registers. This function is very small and probably uses very few registers. We could eliminate `pshl i3` and `popl i3` if we only use "high registers" for small functions.

After all these optimizations, the ASM would be:

```
pshh i524288
cfei i40
mcli $sp i40
mcpi $$arg0 $sp i40
move $$retv $$arg0
cfsi i40
poph i524288
jal $zero $$reta i0
```

We can go even further if we realize that `mem[$sp, i40]` is written only once, never read, and then copied to `mem[$$arg0, i40]`. This means we could change the `mcli` to clear `mem[$$arg0, i40]` directly. That would flag the "local" `mem[$sp, i40]` as never used, and we could remove the local entirely.

```
pshh i524288
mcli $$arg0 i40
move $$retv $$arg0
poph i524288
jal $zero $$reta i0
```

After this, we could check whether "lowered functions" really need to have a return value. If not, we could remove the `move $$retv $$arg0` and end up with:

```
pshh i524288
mcli $$arg0 i40
poph i524288
jal $zero $$reta i0
```

An extra optimization would be to realize that this function does not write to any registers, so it does not need to save and restore them.

```
mcli $$arg0 i40
jal $zero $$reta i0
```

Of course, a trivial function like this should be inlined, but I forced it not to be inlined so we can exercise some of these optimizations.

## Checklist

- [x] I have linked to any relevant issues.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have [requested support from the DevRel team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [x] I have added tests that prove my fix is effective or that my feature works.
- [ ] I have added (or requested a maintainer to add) the necessary `Breaking*` or `New Feature` labels where relevant.
- [ ] I have done my best to ensure that my PR adheres to [the Fuel Labs Code Review Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [ ] I have requested a review from the relevant team or maintainers.
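
As a rough illustration of the first item under Future Optimizations, the sketch below folds an `addi` with a zero immediate into the `mcli` that immediately follows it. The `Op` enum and `fold_addi_zero_into_mcli` are hypothetical names invented for this example; they are not the Sway compiler's actual types, and this is not the real `AbstractInstructionSet::constant_propagate`. The sketch also assumes that the temporary register is not read again after the `mcli`, which a real pass would have to check.

```rust
// Toy instruction model (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Addi { dst: String, src: String, imm: u64 },
    Mcli { ptr: String, len: u64 },
    Mcpi { dst: String, src: String, len: u64 },
}

// Peephole: `addi t, base, 0` followed by `mcli t, n` becomes `mcli base, n`.
// Assumption: `t` is dead after the `mcli`, so dropping the `addi` is safe.
fn fold_addi_zero_into_mcli(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        if let (Some(Op::Addi { dst, src, imm: 0 }), Some(Op::Mcli { ptr, len })) =
            (ops.get(i), ops.get(i + 1))
        {
            if dst == ptr {
                // Clear through the original base register and skip both instructions.
                out.push(Op::Mcli { ptr: src.clone(), len: *len });
                i += 2;
                continue;
            }
        }
        // No match at this position: keep the instruction unchanged.
        out.push(ops[i].clone());
        i += 1;
    }
    out
}
```

Applied to the `addi $$tmp $$locbase i0` / `mcli $$tmp i40` pair from the listing above, this would leave `mcli $$locbase i40` followed by the unchanged `mcpi`. An `addi` with a non-zero immediate is left alone, since the two registers then hold different addresses.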
master
2 days ago
Merge branch 'master' into xunilrj/optimize-zero-array-repeat
xunilrj/optimize-zero-array-repeat
2 days ago

Active Branches

Don't duplicate source pointers in spans
last run
16 hours ago
#7339
0%
#7322
0%
#7261
0%
© 2025 CodSpeed Technology