Optimize initialization of array repeat when the initial value is zero (#7299)
## Description
Continuation of https://github.com/FuelLabs/sway/issues/6860.
This PR improves the performance of arrays initialized with zero, such as
`[0u16; 5]`. Previously, the compiler initialized the array with `SW`
instructions or with a loop, depending on the array size. Now the array is
initialized with a single `MCLI`.
Example:
```sway
#[inline(never)]
fn array_repeat_zero_big_u16() -> [u16; 25] {
[0u16; 25]
}
```
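The generated IR uses the local's initializer because the compiler has
recognized that the array is initialized with all zeros. (The IR and ASM
shown here come from the 5-element variant, `array_repeat_zero_small_u16_2`.)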
```
fn array_repeat_zero_small_u16_2(__ret_value: __ptr [u64; 5]) -> __ptr [u64; 5], !21 {
local [u64; 5] __anon_0 = const [u64; 5] [u64 0, u64 0, u64 0, u64 0, u64 0]
entry(__ret_value: __ptr [u64; 5]):
v0 = get_local __ptr [u64; 5], __anon_0, !22
mem_copy_val __ret_value, v0
ret __ptr [u64; 5] __ret_value
}
```
And this is the generated ASM.
```
pshl i3 ; save registers 16..40
pshh i524288 ; save registers 40..64
move $$locbase $sp ; save locals base register for function array_repeat_zero_small_u16_2
cfei i40 ; allocate 40 bytes for locals and 0 slots for call arguments
move $r0 $$arg0 ; save argument 0 (__ret_value)
move $r1 $$reta ; save return address
addi $$tmp $$locbase i0 ; array initialization - array ptr <----- look here
mcli $$tmp i40 ; array initialization - clear
mcpi $r0 $$locbase i40 ; copy memory
move $$retv $r0 ; set return value
cfsi i40 ; free 40 bytes for locals and 0 slots for extra call arguments
move $$reta $r1 ; restore return address
poph i524288 ; restore registers 40..64
popl i3 ; restore registers 16..40
jal $zero $$reta i0 ; return from call
```
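As a rough illustration of the decision described above, here is a toy model in Rust. It is not the compiler's code: `InitStrategy`, `choose_init`, and the `SMALL_ARRAY_LIMIT` cut-off are hypothetical names and values. Only the 8-byte element width comes from the IR above, where the elements are lowered to `u64` and five of them occupy the 40 bytes cleared by `mcli $$tmp i40`.

```rust
/// Hypothetical model of how an array-repeat expression `[value; len]` could be lowered.
#[derive(Debug, PartialEq)]
enum InitStrategy {
    /// A single `mcli` clearing `bytes` bytes.
    Clear { bytes: u64 },
    /// One store per element (small arrays).
    UnrolledStores { count: u64 },
    /// A loop that stores the value `count` times.
    Loop { count: u64 },
}

/// Element width in bytes; the IR above lowers the elements to `u64`.
const ELEM_BYTES: u64 = 8;
/// Assumed cut-off between unrolled stores and a loop; not taken from the compiler.
const SMALL_ARRAY_LIMIT: u64 = 5;

/// Picks a lowering strategy; zero-valued arrays always take the `mcli` path.
fn choose_init(value: u64, len: u64) -> InitStrategy {
    if value == 0 {
        InitStrategy::Clear { bytes: len * ELEM_BYTES }
    } else if len <= SMALL_ARRAY_LIMIT {
        InitStrategy::UnrolledStores { count: len }
    } else {
        InitStrategy::Loop { count: len }
    }
}
```

With these assumptions, `choose_init(0, 5)` yields a 40-byte clear, which matches the `i40` immediate in the ASM above.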
## Future Optimizations
Using the function above as an example, we can simplify the generated ASM
for functions a lot:
````
pshl i3
pshh i524288
move $$locbase $sp
cfei i40
move $r0 $$arg0
move $r1 $$reta
addi $$tmp $$locbase i0
mcli $$tmp i40
mcpi $r0 $$locbase i40
move $$retv $r0
cfsi i40
move $$reta $r1
poph i524288
popl i3
jal $zero $$reta i0
````
- We can merge `ADDI` and `MCLI` into one in
`AbstractInstructionSet::constant_propagate`. The optimizer should know
that `$$tmp` and `$$locbase` hold the same value, so the `addi` here
is useless.
Another option is to let `constant_propagate` replace `$$tmp`
with `$$locbase` in the `mcli` and let DCE remove the `addi`.
```
addi $$tmp $$locbase i0
mcli $$tmp i40
```
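A minimal sketch of this peephole, written against a toy instruction type rather than the real `AbstractInstructionSet`. The `Op` enum and `fold_addi_zero` are hypothetical, and the sketch assumes the `addi` destination is a scratch register that nothing reads after the `mcli`.

```rust
/// Toy instruction model; registers are plain strings for readability.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// `addi dst src imm`
    Addi(String, String, u64),
    /// `mcli ptr len`
    Mcli(String, u64),
}

/// Replaces `addi dst src i0; mcli dst len` with `mcli src len`.
/// Assumes `dst` is a scratch register (like `$$tmp`) with no later readers.
fn fold_addi_zero(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        match (&ops[i], ops.get(i + 1)) {
            // An add of zero is a plain copy, so the `mcli` can use `src` directly.
            (Op::Addi(dst, src, 0), Some(Op::Mcli(ptr, len))) if dst == ptr => {
                out.push(Op::Mcli(src.clone(), *len));
                i += 2;
            }
            (op, _) => {
                out.push(op.clone());
                i += 1;
            }
        }
    }
    out
}
```

Doing the rewrite in one step, as here, corresponds to the first option. The second option would split it into a register substitution followed by a separate DCE pass.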
- After this, the compiler will see one `mcli` followed by one `mcpi`.
The `mcpi` copies from the pointer the `mcli` just cleared, with the same
length. The optimizer should be able to merge them into just one `mcli`.
```
mcli $$locbase i40
mcpi $r0 $$locbase i40
```
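One way this could look, again on a hypothetical toy `Op` type: a copy out of memory that was just cleared is itself a clear of the destination. The sketch keeps the first `mcli`, since dropping it is only valid once the source memory is known to be dead.

```rust
/// Toy instruction model; registers are plain strings for readability.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// `mcli ptr len`: clear `len` bytes at `ptr`.
    Mcli(String, u64),
    /// `mcpi dst src len`: copy `len` bytes from `src` to `dst`.
    Mcpi(String, String, u64),
}

/// Rewrites `mcli p n; mcpi d p n` so that the copy becomes `mcli d n`.
/// Only fires when the pointer and the length match exactly.
fn forward_clear(ops: &[Op]) -> Vec<Op> {
    let mut out: Vec<Op> = Vec::with_capacity(ops.len());
    for op in ops {
        let rewritten = match (out.last(), op) {
            (Some(Op::Mcli(cleared, n)), Op::Mcpi(dst, src, m)) if cleared == src && n == m => {
                // The source is all zeros, so clear the destination instead of copying.
                Op::Mcli(dst.clone(), *m)
            }
            _ => op.clone(),
        };
        out.push(rewritten);
    }
    out
}
```

After this rewrite nothing reads the cleared local any more, which is what lets a later pass remove it.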
- We don't need to save `$$reta` to `$r1`, because this function does not
call any other function. We also don't need to move `$r1` back to `$$reta`.
That would eliminate two instructions.
We also don't need to move `$sp` to `$$locbase` for the same reason.
And we also don't need to move `$$arg0` to `$r0`.
- We also don't need to push and pop all registers. This function is
very small and probably uses very few registers. We could eliminate
`pshl i3` and `popl i3` if we only used "high registers" for small
functions.
After all these optimizations the ASM would be:
```
pshh i524288
cfei i40
mcli $sp i40
mcpi $$arg0 $sp i40
move $$retv $$arg0
cfsi i40
poph i524288
jal $zero $$reta i0
```
We can go even further if we realize that `mem[$sp, i40]` is written
only once and is read only by the copy to `mem[$$arg0, i40]`. This means
we could have the `mcli` clear `mem[$$arg0, i40]` directly.
That would flag the "local" `mem[$sp, i40]` as never used, and we could
remove the local entirely.
```
pshh i524288
mcli $$arg0 i40
move $$retv $$arg0
poph i524288
jal $zero $$reta i0
```
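A sketch of the dead-local step, under loud assumptions: the `Op` enum and `drop_dead_local` are hypothetical, the function has a single local, and `cfei`/`cfsi` reserve space for nothing else (no call-argument slots). Any use of the local's base register as a source is conservatively treated as a read.

```rust
/// Toy instruction model; registers are plain strings for readability.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// `cfei n`: allocate `n` bytes for locals.
    Cfei(u64),
    /// `cfsi n`: free `n` bytes for locals.
    Cfsi(u64),
    /// `mcli ptr len`
    Mcli(String, u64),
    /// `mcpi dst src len`
    Mcpi(String, String, u64),
    /// `move dst src`
    Move(String, String),
}

/// Drops clears of a never-read local and then the frame that held it.
/// Assumes the function has exactly one local and no call-argument slots.
fn drop_dead_local(ops: &[Op], local_base: &str) -> Vec<Op> {
    // The local is live if anything uses its base register as a source.
    let local_is_read = ops.iter().any(|op| match op {
        Op::Mcpi(_, src, _) | Op::Move(_, src) => src == local_base,
        _ => false,
    });
    if local_is_read {
        return ops.to_vec();
    }
    ops.iter()
        .filter(|op| match op {
            Op::Mcli(ptr, _) => ptr != local_base,
            Op::Cfei(_) | Op::Cfsi(_) => false,
            _ => true,
        })
        .cloned()
        .collect()
}
```

Applied to the body of the previous listing once the copy has been turned into `mcli $$arg0 i40`, this leaves only the `mcli` on `$$arg0` and the `move` to `$$retv`.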
After this we could check whether "lowered functions" really need to have
a return value. If not, we could remove the `move $$retv $$arg0`, leaving
the function as:
```
pshh i524288
mcli $$arg0 i40
poph i524288
jal $zero $$reta i0
```
An extra optimization would be to realize that this function does not write
to any registers, so it does not need to save and restore them.
```
mcli $$arg0 i40
jal $zero $$reta i0
```
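A sketch of that last check on a hypothetical toy `Op` type. In this model `move` is the only instruction that writes a register, and `Jal` stands in for the return jump. A real pass would have to inspect the destination operand of every opcode.

```rust
/// Toy instruction model; masks and registers are simplified.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Pshl(u32),
    Pshh(u32),
    Popl(u32),
    Poph(u32),
    /// `mcli ptr len`: writes memory only, no register.
    Mcli(String, u64),
    /// `move dst src`: the only register write in this toy model.
    Move(String, String),
    /// Return jump, operands omitted.
    Jal,
}

/// Removes register save/restore instructions when the body writes no register.
fn strip_register_saves(ops: &[Op]) -> Vec<Op> {
    let writes_register = ops.iter().any(|op| matches!(op, Op::Move(..)));
    if writes_register {
        return ops.to_vec();
    }
    ops.iter()
        .filter(|op| !matches!(op, Op::Pshl(_) | Op::Pshh(_) | Op::Popl(_) | Op::Poph(_)))
        .cloned()
        .collect()
}
```

On the four-instruction version above, this leaves just the `mcli` and the return jump.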
Of course, a trivial function like this should be inlined, but I
forced it not to be inlined so we can exercise some of these optimizations.
## Checklist
- [x] I have linked to any relevant issues.
- [x] I have commented my code, particularly in hard-to-understand
areas.
- [ ] I have updated the documentation where relevant (API docs, the
reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have
[requested support from the DevRel
team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [x] I have added tests that prove my fix is effective or that my
feature works.
- [ ] I have added (or requested a maintainer to add) the necessary
`Breaking*` or `New Feature` labels where relevant.
- [ ] I have done my best to ensure that my PR adheres to [the Fuel Labs
Code Review
Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [ ] I have requested a review from the relevant team or maintainers.