Fast 1/X division (reciprocal)
An answer to this question on Stack Overflow.
Question
Is there some way to speed up computing a reciprocal (1 divided by X) if precision is not crucial?
I need to calculate 1/X. Is there a workaround that gives up some precision in exchange for speed?
Answer
The x86 SSE instruction rcpss computes an approximate single-precision reciprocal with |relative error| ≤ 1.5 × 2^−12 (roughly 3.7 × 10^−4).
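To see that error bound in practice, here is a small sketch (x86/x86-64 only; SSE is always available on x86-64) that calls rcpss through its intrinsic and measures the worst relative error over evenly spaced inputs. The helper names rcp_approx and max_rel_error are hypothetical, chosen for this illustration. It only samples one binade, [1, 2), so it is a sanity check rather than a proof of the bound.

```c
#include <xmmintrin.h> /* SSE intrinsics; always available on x86-64 */

/* One RCPSS instruction, no refinement: approximate 1/x. */
static float rcp_approx(float x)
{
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
}

/* Worst relative error of rcp_approx over n evenly spaced samples in
 * [lo, hi). Assumes 0 < lo < hi so that no sample is zero. */
static double max_rel_error(float lo, float hi, int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        float x = lo + (hi - lo) * ((float)i / (float)n);
        double exact = 1.0 / (double)x;           /* reference value */
        double err = ((double)rcp_approx(x) - exact) / exact;
        if (err < 0.0)
            err = -err;                           /* |err| without <math.h> */
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Comparing max_rel_error(1.0f, 2.0f, 1000000) against 1.5 / 4096.0 is a quick check; the exact figure varies by CPU, since Intel and AMD implement the approximation differently.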
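In GCC you can enable it with the -mrecip flag, which lets the compiler replace single-precision division with rcpss (or rcpps) followed by one Newton-Raphson refinement step. GCC only emits this sequence when -funsafe-math-optimizations, -ffinite-math-only and -fno-trapping-math are in effect, so you will likely also need -ffast-math, which turns all three on.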
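The flag acts on ordinary division, so no source changes are needed. A minimal sketch of the kind of loop it applies to is below; the function name reciprocal_array is hypothetical. Whether a particular GCC version and target actually emits rcpss/rcpps here should be confirmed by inspecting the generated assembly (for example with -S) rather than assumed.

```c
#include <stddef.h> /* size_t */

/* out[i] = 1/in[i] using plain division.
 * Built normally this uses DIVSS/DIVPS. Built with something like
 *   gcc -O2 -ffast-math -mrecip
 * GCC is permitted to substitute RCPSS/RCPPS plus a Newton-Raphson
 * step; check the assembly to see what was actually generated. */
void reciprocal_array(float *out, const float *in, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = 1.0f / in[i];
}
```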
The corresponding intrinsic is _mm_rcp_ss (and _mm_rcp_ps for four packed floats).
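A raw _mm_rcp_ss result is only good to about 12 bits. A common refinement, and the same idea GCC applies with -mrecip, is one Newton-Raphson step r1 = r0 * (2 − x * r0): if r0 = (1 − e)/x, then r1 = (1 − e²)/x, so the relative error is roughly squared. The sketch below, with hypothetical names fast_recip and fast_recip4, applies that step to the scalar and packed intrinsics. It is an illustration under these assumptions, not a drop-in replacement for division.

```c
#include <xmmintrin.h> /* SSE: _mm_rcp_ss, _mm_rcp_ps and arithmetic */

/* Approximate 1/x: RCPSS estimate refined by one Newton-Raphson step,
 * r1 = r0 * (2 - x * r0), which roughly doubles the correct bits.
 * Not exact. For x == 0 the estimate is inf, the step computes 0 * inf,
 * and the result is NaN rather than infinity. */
static inline float fast_recip(float x)
{
    __m128 vx  = _mm_set_ss(x);
    __m128 r0  = _mm_rcp_ss(vx);                  /* ~12-bit estimate */
    __m128 two = _mm_set_ss(2.0f);
    __m128 r1  = _mm_mul_ss(r0, _mm_sub_ss(two, _mm_mul_ss(vx, r0)));
    return _mm_cvtss_f32(r1);
}

/* Packed variant: four refined reciprocals at once via RCPPS. */
static inline __m128 fast_recip4(__m128 x)
{
    __m128 r0 = _mm_rcp_ps(x);
    return _mm_mul_ps(r0, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(x, r0)));
}
```

Whether this beats a plain 1.0f / x depends on the CPU and on whether latency or throughput matters, so it is worth benchmarking on the target machine.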