Ygor Serpa
2 min read · Feb 2, 2022


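A bit of added info. Modern processors can work on multiple instructions at the same time, including starting on future instructions before earlier ones have finished. This is called out-of-order execution, and when the processor runs ahead past a branch it has not resolved yet, speculative execution.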

The more "coherent" the code is, the better. In other words, hard-to-predict if-statements slow down the processor, as it can only guess which route it will take (the if-route or the else-route), and a wrong guess means throwing away the work it did ahead of time.
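To make the if-route / else-route point concrete, here is a minimal sketch in C of the same loop written with and without a data-dependent branch. The function names and the threshold idea are made up for illustration. Whether the second version is actually faster depends on the compiler, the processor and how predictable the data is, so treat it as something to measure, not a rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum every value that is >= threshold.
   The if-statement depends on the data, so on random input the
   processor has to guess the route on every iteration. */
int64_t sum_at_least_branchy(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold)
            sum += v[i];
    }
    return sum;
}

/* Same result, but every iteration does exactly the same work.
   The comparison yields 0 or 1, which is used as a multiplier
   instead of choosing a route. */
int64_t sum_at_least_branchless(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int64_t)v[i] * (v[i] >= threshold);
    }
    return sum;
}
```

An optimizing compiler may already turn the first version into something like the second, which is one more reason to benchmark before rewriting anything.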

In the memcpy case, after doing all the 8-byte copies, it can be harmful to try a 4-byte copy followed by a 2-byte copy and a final 1-byte copy. For each of these copies, you have to check whether there is enough data left to copy, and each check is another branch to guess. A solution that is often faster is to go straight from 8-byte copies to 1-byte copies.
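The 8-then-1 idea could look roughly like the sketch below. `copy_8_then_1` is a hypothetical helper written for this comment, not how any real libc implements memcpy. The fixed-size `memcpy` calls inside the loop are only there to move 8 bytes without running into alignment problems; compilers typically turn them into a single load and store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst (non-overlapping)
   using 8-byte chunks, then finish the tail one byte at a time. */
void copy_8_then_1(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Main loop: one very predictable check per 8 bytes copied. */
    while (n >= 8) {
        uint64_t chunk;
        memcpy(&chunk, s, 8);  /* fixed size, safe for unaligned pointers */
        memcpy(d, &chunk, 8);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: at most 7 one-byte copies, with no separate 4-byte and
       2-byte cases to check for. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

Calling `copy_8_then_1(dst, src, 29)` would do three 8-byte copies followed by five 1-byte copies.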

Is this always the case? No. If you are dealing with a very limited or slow processor, the extra savings from doing 8-, 4-, 2- and 1-byte copies might still be worthwhile, but usually not on modern computer processors.

Moreover, most x86 processors released since 2011 can handle 32-byte data types (AVX instructions), in which case the 32-byte copies would be followed by up to 31 one-byte copies.

In general, this is a trend in computing. As processors get faster, some "dumber" approaches just become faster than the "smart" approach because they are more predictable and repetitive.

A nice example is the sort function. Most people would run quicksort until the data is fully sorted. In practice, a dumb insertion sort is much faster than quicksort for small inputs (fewer than about 32 entries; the exact cutoff varies by implementation). Thus, a two-stage approach is used: you start with quicksort until the partitions are down to about 32 entries, and you finish it all off with multiple insertion sorts.
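A minimal sketch of that two-stage sort in C, assuming plain `int` arrays, could look like this. `hybrid_sort`, its helpers and the `SMALL_CUTOFF` value of 32 are illustrative choices taken from the numbers above, not the implementation of any particular standard library, and real libraries tune the cutoff and add more safeguards.

```c
#include <stddef.h>

#define SMALL_CUTOFF 32  /* assumed threshold; real libraries tune this */

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Plain insertion sort on the inclusive range a[lo..hi]. */
static void insertion_sort(int *a, ptrdiff_t lo, ptrdiff_t hi)
{
    for (ptrdiff_t i = lo + 1; i <= hi; i++) {
        int key = a[i];
        ptrdiff_t j = i - 1;
        while (j >= lo && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

/* Quicksort that only partitions ranges larger than SMALL_CUTOFF;
   whatever is left over is handed to insertion sort. */
static void quick_then_insertion(int *a, ptrdiff_t lo, ptrdiff_t hi)
{
    while (hi - lo + 1 > SMALL_CUTOFF) {
        int pivot = a[lo + (hi - lo) / 2];
        ptrdiff_t i = lo, j = hi;
        while (i <= j) {                 /* Hoare-style partition */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                swap_int(&a[i], &a[j]);
                i++;
                j--;
            }
        }
        /* Recurse into the smaller side, keep looping on the larger one. */
        if (j - lo < hi - i) {
            quick_then_insertion(a, lo, j);
            lo = i;
        } else {
            quick_then_insertion(a, i, hi);
            hi = j;
        }
    }
    insertion_sort(a, lo, hi);           /* small partition: the "dumb" sort */
}

/* Hypothetical entry point: sorts n ints in ascending order. */
void hybrid_sort(int *a, size_t n)
{
    if (n > 1)
        quick_then_insertion(a, 0, (ptrdiff_t)n - 1);
}
```

With an array of 200 values, `hybrid_sort(values, 200)` partitions with quicksort until each piece has 32 entries or fewer and then runs insertion sort on each piece.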


Written by Ygor Serpa

Former game developer turned data scientist after falling in love with AI and all its branches.
