During the last Advent of Code, while starting with Rust, I discovered that the Rust standard library was missing the usual `divmod` (a.k.a. `div_rem`) function that simultaneously returns the quotient and the remainder. I was confused, since I had already used the equivalent in C++ with `std::div` and in Python with `divmod`. It seemed like a basic way of improving performance, since both values can come out of a single assembly instruction, nicely avoiding extra CPU cycles.
Looking it up, I found an existing RFC created in 2017, but it hasn't received much attention since.
So, do we need a `div_rem` implementation in the standard library? I decided to investigate.
Let's naively implement the function:
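A minimal sketch of such a naive implementation (the name `divmod` and the `i64` signature are assumptions for illustration):

```rust
// Naive divmod: compute quotient and remainder with the two operators.
fn divmod(a: i64, b: i64) -> (i64, i64) {
    (a / b, a % b)
}

fn main() {
    // Rust division truncates toward zero, like C.
    println!("{:?}", divmod(7, 3)); // prints "(2, 1)"
    println!("{:?}", divmod(-7, 3)); // prints "(-2, -1)"
}
```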
To dissect what is actually executed, let's analyze the compilation steps.
First, let's have a look at the LLVM Intermediate Representation (IR). If you're not familiar with it, LLVM IR is a low-level language close to assembly, but without being tied to specific hardware. It's extremely useful for compiler writers, since it's a common language that can target multiple architectures. It's also a great tool to understand what the compiler is doing and how it "interprets" our code.
For the previous code, here is the annotated LLVM IR we can obtain:
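Roughly, it looks like the following (a hand-simplified, hand-annotated sketch rather than verbatim rustc output; exact symbol names differ):

```llvm
define { i64, i64 } @divmod(i64 %a, i64 %b) {
start:
  ; check for division by zero (would panic in Rust)
  %is_zero = icmp eq i64 %b, 0
  br i1 %is_zero, label %panic, label %not_zero

not_zero:
  ; check for overflow: i64::MIN / -1 does not fit in an i64
  %a_min = icmp eq i64 %a, -9223372036854775808
  %b_neg1 = icmp eq i64 %b, -1
  %overflow = and i1 %a_min, %b_neg1
  br i1 %overflow, label %panic, label %ok

ok:
  ; two separate instructions: LLVM IR has no combined div/rem
  %quot = sdiv i64 %a, %b
  %rem = srem i64 %a, %b
  ; ... build and return the (quot, rem) pair ...

panic:
  ; ... call into core::panicking ...
}
```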
We can see that this code contains redundant checks for the division and modulo by 0. Also, we can note two separate operations to get our quotient and remainder: `sdiv` and `srem`. This is expected at this stage, since LLVM IR doesn't have an instruction combining them, but hopefully the compiler will be able to optimize this while generating the assembly.
Let's have a look at the next compilation step and dive into the assembly code generated from the IR (with optimizations enabled):
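On x86-64, the arithmetic itself boils down to something like this (a hand-annotated sketch; register allocation will vary):

```asm
; rax = dividend, rcx = divisor (registers assumed for illustration)
cqo               ; sign-extend rax into rdx:rax
idiv rcx          ; one instruction: quotient -> rax, remainder -> rdx
```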
And voilà! We can see that the compiler was able to optimize our code and fuse the two operations into a single instruction. So a dedicated `div_rem` implementation wouldn't be that helpful for performance over plain `/` and `%`, since this optimization should almost always be done by the IR compiler.
You might have noticed quite a few things in the assembly code; let's have a look at them by creating our own assembly version.
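A sketch of such a hand-written version, using Rust's `asm!` macro (x86-64 only, and deliberately skipping the zero/overflow checks, hence `unsafe`):

```rust
use std::arch::asm;

// Hand-written x86-64 divmod: a single `idiv` yields both results.
// Safety: the caller must guarantee b != 0 and (a, b) != (i64::MIN, -1),
// otherwise `idiv` faults at the CPU level.
unsafe fn divmod_asm(a: i64, b: i64) -> (i64, i64) {
    let quot: i64;
    let rem: i64;
    asm!(
        "cqo",        // sign-extend rax into rdx:rax
        "idiv {d}",   // quotient -> rax, remainder -> rdx
        d = in(reg) b,
        inout("rax") a => quot,
        out("rdx") rem,
        options(nostack),
    );
    (quot, rem)
}

fn main() {
    println!("{:?}", unsafe { divmod_asm(7, 3) }); // prints "(2, 1)"
}
```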
A few things to note in this implementation:
Compiling this, we can see that it ends up a bit more concise than the assembly the compiler generated previously:
It should be faster, but we're cheating a bit on Rust here, since we're not taking into account the panic checks that were done in the previous version.
It's faster! Still, the overhead of those panic checks is not that dramatic in the end. Branch prediction probably contributes a lot to this. They were elided from the first IR snippet, but in the full output, the Rust compiler hints to the rest of the pipeline that those checks are likely to succeed. This ability to mark the checks as more likely to pass than fail can lead to surprisingly efficient code, even when it appears to be doing "much more" than the assembly version.
Now that we've built our unchecked version in assembly, we can get quite close to it by using the unchecked div and rem intrinsics from `std::intrinsics`:
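Since `std::intrinsics` requires a nightly compiler, here is a close stable-Rust equivalent of the same idea: promise the optimizer that the panicking cases cannot happen, so the checks are compiled away (a sketch, not the article's exact code):

```rust
use std::hint::unreachable_unchecked;

// Safety: the caller must guarantee b != 0 and (a, b) != (i64::MIN, -1).
unsafe fn divmod_unchecked(a: i64, b: i64) -> (i64, i64) {
    if b == 0 || (a == i64::MIN && b == -1) {
        // Tell the optimizer these branches are impossible,
        // so the zero/overflow checks can be removed entirely.
        unreachable_unchecked();
    }
    (a / b, a % b)
}

fn main() {
    println!("{:?}", unsafe { divmod_unchecked(7, 3) }); // prints "(2, 1)"
}
```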
And its intermediate representation:
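Simplified, it reduces to just the two arithmetic instructions (again a hand-written sketch, not verbatim compiler output):

```llvm
define { i64, i64 } @divmod_unchecked(i64 %a, i64 %b) {
start:
  ; no branches: the zero and overflow checks are gone
  %quot = sdiv i64 %a, %b
  %rem = srem i64 %a, %b
  ; ... build and return the (quot, rem) pair ...
}
```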
As you can see, this IR doesn't contain any overflow checks, and the performance should be close to, or even better than, our assembly version.
That's exactly the same time our assembly version takes. So let's have a look at the assembly:
And it's the same as our assembly version! So writing our own assembly version wasn't really worth the bother in the end.
To conclude, there is no performance drawback to not having a single `div_rem` function in the standard library, since compiler optimizations will take care of this burden for us. It might seem odd to write the two operations separately, but barring very specific optimization-killer cases, they will be fused into a single assembly instruction.
If you enjoyed this article, you can follow us on Twitter to get notified when we publish new articles.