Optimizing n-body simulation
An answer to this question on Stack Overflow.
Question
I'm trying to optimize the n-body algorithm, I have seen that the most expensive function is this:
real3 bodyBodyInteraction(real iPosx, real iPosy, real iPosz,
real jPosx, real jPosy, real jPosz, real jMass)
{
real rx, ry, rz;
rx = jPosx - iPosx;
ry = jPosy - iPosy;
rz = jPosz - iPosz;
real distSqr = rx*rx+ry*ry+rz*rz;
distSqr += SOFTENING_SQUARED;
real s = jMass / POW(distSqr,3.0/2.0); //very expensive
real3 f;
f.x = rx * s;
f.y = ry * s;
f.z = rz * s;
return f;
}
Using perf record I can see the division is the most expensive instruction and this one have a O(n^2) complexity, but I don't really know how to optimize it.
Answer
Convert
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
into
for(int i=0; i<N;i++)
for(int j=i+1;j<N;j++)
Restructure to take advantage of SIMD operators, this can quadruple your throughput.
Use OpenMP to parallelize the loops either across your CPU or by offloading to your GPU (OpenMP 4.5+).
Learn about the Barnes-Hut algorithm, which groups particles to achieve O(N log N) complexity (down from your O(N^2)).