SOA vs AOS (for the speed freaks) (specially for GoToLoop)

Especially for GoToLoop, because he seems to like things that increase speed...

OK, SOA vs AOS is Structure of Arrays vs Array of Structures. Long story short, most machines like SOA a lot more, but it's a tiny bit less nice to write. Here is a great video about it:

(If you like that video then also check his Processor Pipeline video).

average normal: 41ms
average soa: 19ms

So that's more than a 50% cut in run time (41 ms down to 19 ms), or roughly twice as fast!!

public void setup() {
  
  float foo = 0;
  
  int N = 16000000;
  
  Vec[] vecs = new Vec[N];
 
  VecSOA vecSOA = new VecSOA();
  vecSOA.x = new float[N];
  vecSOA.y = new float[N];
  vecSOA.z = new float[N];
  
  
  // first create
  for (int i = 0; i < N; i++) {
    vecs[i] = new Vec();
    vecs[i].x = random(1);
    vecs[i].y = random(1);
    vecs[i].z = random(1); 
    
    vecSOA.x[i] = random(1);
    vecSOA.y[i] = random(1);
    vecSOA.z[i] = random(1);   
  }
   
  // test
  
  int tests = 20;
  
  int sumTime = 0;
  
  for (int i = 0; i < tests; i++) {
    int start = millis();
    float sumX, sumY, sumZ;
    sumX = sumY = sumZ = 0;

    // AOS: each element is a separate Vec object reached through a reference
    for (int j = 0; j < N; j++) {
      sumX += vecs[j].x;
      sumY += vecs[j].y;
      sumZ += vecs[j].z;
    }

    // keep a running total of the results
    foo += sumX + sumY + sumZ;

    int elapsed = millis() - start;
    sumTime += elapsed;
    println(elapsed);
  }
  
  int average = sumTime / tests;
  println("average normal: "+average);
  
  
  println();
  
  // reset
  sumTime = 0;
  
  for (int i = 0; i < tests; i++) {
    int start = millis();
    float sumX, sumY, sumZ;
    sumX = sumY = sumZ = 0;

    // SOA: each field lives in its own float array
    for (int j = 0; j < N; j++) {
      sumX += vecSOA.x[j];
      sumY += vecSOA.y[j];
      sumZ += vecSOA.z[j];
    }

    // keep a running total of the results
    foo += sumX + sumY + sumZ;

    int elapsed = millis() - start;
    sumTime += elapsed;
    println(elapsed);
  }
 
  println();
  
  average = sumTime / tests;
  println("average soa: "+average);
  
  
  
}

class Vec {
  float x;
  float y;
  float z;   
}

class VecSOA {
  
  float[] x;
  float[] y;
  float[] z;
  
}
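
One caveat about the timing above: millis() only has millisecond resolution, and the first passes of a loop may run before the JIT has compiled it. Below is a minimal sketch of a finer-grained timer in plain Java (not a Processing sketch), assuming a few untimed warm-up passes are enough. The class name SoaTiming, the helper names and the warm-up/run counts are made up for illustration and are not part of the original test.

```java
import java.util.Random;

public class SoaTiming {
    // Written to after every pass so the summed result stays in use.
    static volatile float sink;

    // Sum three parallel arrays (the SoA layout from the sketch above).
    public static float sumSoA(float[] x, float[] y, float[] z) {
        float sx = 0, sy = 0, sz = 0;
        for (int i = 0; i < x.length; i++) {
            sx += x[i];
            sy += y[i];
            sz += z[i];
        }
        return sx + sy + sz;
    }

    // Average time per pass in milliseconds over `runs` timed passes,
    // preceded by `warmup` untimed passes (hypothetical parameters).
    public static double averageMillis(float[] x, float[] y, float[] z,
                                       int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            sink = sumSoA(x, y, z);
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = sumSoA(x, y, z);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / runs;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // smaller than the 16000000 used above
        Random rnd = new Random(42);
        float[] x = new float[n], y = new float[n], z = new float[n];
        for (int i = 0; i < n; i++) {
            x[i] = rnd.nextFloat();
            y[i] = rnd.nextFloat();
            z[i] = rnd.nextFloat();
        }
        System.out.printf("average soa: %.3f ms%n", averageMillis(x, y, z, 5, 20));
    }
}
```

The same harness could time the AOS loop by swapping in a method that walks the Vec[] array. The numbers will differ per machine, so treat them as relative, not absolute.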

Comments

  • This guy has a number of videos under the banner "Math for Game Developers"; you can see them here.

  • edited July 2015

    I loved it! Thx for the video @clankill3r! ^:)^

    I was pretty much already aware of pipelines & cache.
    That's one reason an i7 CPU can be faster than an i5 w/ the same clock: bigger cache.
    But I didn't know that wrapping up arrays would have such a big speed impact! @-)

    Of course Java doesn't feature structs as C, C++, D, etc. do. Only classes are offered. :(
    A Vec[] only holds references to separately allocated objects.
    Therefore we probably won't get a contiguous memory layout as perfect as in those system languages!

    Nevertheless, here's a tweaked performance test version w/ an extra AoA (Array of Arrays)! ;-)

    // forum.Processing.org/two/discussion/11885/
    // soa-vs-aos-for-the-speed-freaks-specially-for-gotoloop
    
    // 2015-Jul-29
    
    static final int QTY = 5000000, TESTS = 50;
    static final float RND = 1e-3, DIV = 1e1;
    
    class Vec {
      float x = random(RND), y = random(RND), z = random(RND);
    }
    
    class VecSoA {
      final float[] x = new float[QTY], y = new float[QTY], z = new float[QTY];
    
      VecSoA() {
        for (int i = 0; i != QTY; ++i) {
          x[i] = random(RND);
          y[i] = random(RND);
          z[i] = random(RND);
        }
      }
    }
    
    void setup() {
      final Vec[] vecAoS = new Vec[QTY];
      for (int i = 0; i != QTY; vecAoS[i++] = new Vec());
    
      final VecSoA vecSoA = new VecSoA();
    
      final float[][] vecAoA = new float[3][QTY];
      final float[] vx = vecAoA[0], vy = vecAoA[1], vz = vecAoA[2];
      for (float[] f : vecAoA)
        for (int i = 0; i != QTY; f[i++] = random(RND));
    
      int sumTime = 0;
      float sumVal = 0;
    
      // AoS:
      for (int i = 0; i != TESTS; i++) {
        int start = millis();
        float sumX = 0, sumY = 0, sumZ = 0;
    
        //for (Vec v : vecAoS)  sumVal += (v.x + v.y + v.z) / DIV;
        for (Vec v : vecAoS) {
          sumX += v.x;
          sumY += v.y;
          sumZ += v.z;
        }
    
        sumVal  += (sumX + sumY + sumZ) / DIV;
        sumTime += millis() - start;
      }
    
      int avg = sumTime / TESTS;
      println("avg. AoS:", avg, "\tsum:", sumVal);
    
      sumVal = sumTime = 0;
    
      // SoA:
      for (int i = 0; i != TESTS; i++) {
        int start = millis();
        float sumX = 0, sumY = 0, sumZ = 0;
    
        for (int j = 0; j != QTY; ++j) {
          //sumVal += (vecSoA.x[j] + vecSoA.y[j] + vecSoA.z[j]) / DIV;
          sumX += vecSoA.x[j];
          sumY += vecSoA.y[j];
          sumZ += vecSoA.z[j];
        }
    
        sumVal  += (sumX + sumY + sumZ) / DIV;
        sumTime += millis() - start;
      }
    
      avg = sumTime / TESTS;
      println("avg. SoA:", avg, "\tsum:", sumVal);
    
      sumVal = sumTime = 0;
    
      // AoA:
      for (int i = 0; i != TESTS; i++) {
        int start = millis();
        float sumX = 0, sumY = 0, sumZ = 0;
    
        //for (int j = 0; j != QTY; ++j)  sumVal += (vx[j] + vy[j] + vz[j]) / DIV;
        for (int j = 0; j != QTY; ++j) {
          sumX += vx[j];
          sumY += vy[j];
          sumZ += vz[j];
        }
    
        sumVal  += (sumX + sumY + sumZ) / DIV;
        sumTime += millis() - start;
      }
    
      avg = sumTime / TESTS;
      println("avg. AoA:", avg, "\tsum:", sumVal);
    
      exit();
    }
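
On the point above that Java has no structs: one way to get a struct-like contiguous layout anyway is to interleave the fields in a single flat float[]. This is only a sketch under that assumption, in plain Java. The class FlatVecs, its accessors and the STRIDE constant are hypothetical names, not part of the tests above, and whether it beats SoA or AoA would have to be measured.

```java
public class FlatVecs {
    // Three floats per element, stored as x0, y0, z0, x1, y1, z1, ...
    static final int STRIDE = 3;

    final float[] data;

    public FlatVecs(int count) {
        data = new float[count * STRIDE];
    }

    public void set(int i, float x, float y, float z) {
        int base = i * STRIDE;
        data[base] = x;
        data[base + 1] = y;
        data[base + 2] = z;
    }

    public float x(int i) { return data[i * STRIDE]; }
    public float y(int i) { return data[i * STRIDE + 1]; }
    public float z(int i) { return data[i * STRIDE + 2]; }

    // Same summing loop as in the tests, walking the flat array once.
    public float sumAll() {
        float sumX = 0, sumY = 0, sumZ = 0;
        for (int base = 0; base < data.length; base += STRIDE) {
            sumX += data[base];
            sumY += data[base + 1];
            sumZ += data[base + 2];
        }
        return sumX + sumY + sumZ;
    }

    public static void main(String[] args) {
        FlatVecs vecs = new FlatVecs(2);
        vecs.set(0, 1f, 3f, 5f);
        vecs.set(1, 2f, 4f, 6f);
        System.out.println("sum: " + vecs.sumAll());
    }
}
```

The trade-off is the same as with SoA: the layout is friendlier to the machine, but you lose the convenience of passing a single Vec object around.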
    
  • Nice loop :)

    for (int i = 0; i != QTY; vecAoS[i++] = new Vec());

    And for me AoA was 1ms slower :)

  • edited July 2015

    Oh yeah, if you have time you should watch this (this one is about SOA):

    He's working on a language named JAI and I can't wait for it! It will be fast, just like C for example, but it's so much better in way too many aspects to mention.

    This is the intro on why he started this language:
