How to improve a multi-threaded particle system

edited October 2016 in Questions about Code

I just wanted to wrap my head around the concept of parallel processing (using the CPU) in Java/Processing and coded a little particle system to test things out.

With 1,000,000 particles, it currently runs at 38-42 FPS on my AMD hexacore, which is decent I guess, but there is certainly room for improvement.

Any advice on how to improve this sketch's performance or parallel processing in general?

/////////////////////////////////////////////////////////////
// Imports //////////////////////////////////////////////////
/////////////////////////////////////////////////////////////
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;


/////////////////////////////////////////////////////////////
// Variable definitions /////////////////////////////////////
/////////////////////////////////////////////////////////////
ExecutorService executorService;
ArrayList<Task> tasks;
Particle[] particles;


/////////////////////////////////////////////////////////////
// Initiate /////////////////////////////////////////////////
/////////////////////////////////////////////////////////////
public void setup() {

    // Set window size and renderer
    size(800, 800);

    // Pass applet reference
    Task.parent = Particle.parent = this;

    // Set particle color
    int particleColor = #100302;
    Particle.red   = particleColor & 0x00ff0000;
    Particle.green = particleColor & 0x0000ff00;
    Particle.blue  = particleColor & 0x000000ff;

    // Create particles
    particles = new Particle[1000000];
    for(int n = 0; n < particles.length; n++) {
        particles[n] = new Particle(random(TWO_PI), 100 + random(width / 2 - 100));
    }

    // Get available processor cores
    int availableProcessors = Runtime.getRuntime().availableProcessors();

    // Create thread executor service
    executorService = Executors.newFixedThreadPool(availableProcessors);

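    // Create tasks; the last task also takes the remainder left over when
    // the particle count is not divisible by the number of processors
    tasks = new ArrayList<Task>();
    int particlesPerTask = particles.length / availableProcessors;
    for(int n = 0, index = 0; n < availableProcessors; n++, index += particlesPerTask) {
        int end = (n == availableProcessors - 1) ? particles.length : index + particlesPerTask;
        tasks.add(new Task(particles, index, end));
    }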

}


/////////////////////////////////////////////////////////////
// Render ///////////////////////////////////////////////////
/////////////////////////////////////////////////////////////
public void draw() {

    // Fill background
    background(#000000);

    // Update & render particles
    loadPixels();
    try {
        executorService.invokeAll(tasks);
    } catch(Exception e) {
        println(e);
    }
    updatePixels();

    // Print info
    text("FPS " + round(frameRate), 10, 10 + textAscent());
    text("Particles " + particles.length, 10, 30 + textAscent());
    text("Tasks " + tasks.size(), 10, 50 + textAscent());

}


/////////////////////////////////////////////////////////////
// Task class ///////////////////////////////////////////////
/////////////////////////////////////////////////////////////
public static class Task implements Callable<Integer> {


    /////////////////////////////////////////////////////////////
    // Variable definitions /////////////////////////////////////
    /////////////////////////////////////////////////////////////
    public static PApplet parent;
    public Particle[] particles;
    public int start;
    public int end;


    /////////////////////////////////////////////////////////////
    // Constructor //////////////////////////////////////////////
    /////////////////////////////////////////////////////////////
    public Task(Particle[] particles, int start, int end) {
        this.particles = particles;
        this.start = start;
        this.end = end;
    }


    /////////////////////////////////////////////////////////////
    // Run //////////////////////////////////////////////////////
    /////////////////////////////////////////////////////////////
    public Integer call() {
        if(parent.mousePressed)
            for(int n = start; n < end; n++)
                particles[n].updatePressed();
        else
            for(int n = start; n < end; n++)
                particles[n].updateReleased();
        for(int n = start; n < end; n++)
            particles[n].render();
        return 1;
    }


}


/////////////////////////////////////////////////////////////
// Particle class ///////////////////////////////////////////
/////////////////////////////////////////////////////////////
public static class Particle {


    /////////////////////////////////////////////////////////////
    // Variable definitions /////////////////////////////////////
    /////////////////////////////////////////////////////////////
    public static PApplet parent;
    public static int red;
    public static int green;
    public static int blue;
    public float positionX;
    public float positionY;
    public float velocityX;
    public float velocityY;


    /////////////////////////////////////////////////////////////
    // Constructor //////////////////////////////////////////////
    /////////////////////////////////////////////////////////////
    public Particle(float angle, float distance) {
        positionX = parent.width / 2.0F + parent.sin(angle) * distance;
        positionY = parent.height / 2.0F + parent.cos(angle) * distance;
    }


    /////////////////////////////////////////////////////////////
    // Update ///////////////////////////////////////////////////
    /////////////////////////////////////////////////////////////
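    public void updatePressed() {
        float distance = parent.dist(positionX, positionY, parent.mouseX, parent.mouseY);
        // Skip the pull when the particle sits exactly on the cursor; dividing
        // by zero would turn the velocity into NaN permanently
        if(distance > 0) {
            float invDistance = 1.2F / distance;
            velocityX += (parent.mouseX - positionX) * invDistance;
            velocityY += (parent.mouseY - positionY) * invDistance;
        }
        positionX += velocityX;
        positionY += velocityY;
    }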
    public void updateReleased() {
        velocityX *= 0.9F;
        velocityY *= 0.9F;
        positionX += velocityX;
        positionY += velocityY;
    }


    /////////////////////////////////////////////////////////////
    // Render ///////////////////////////////////////////////////
    /////////////////////////////////////////////////////////////
    public void render() {
        if(positionX < 0 || positionX >= parent.width || positionY < 0 || positionY >= parent.height)
            return;
        int index = (int)positionX + (int)positionY * parent.width;
        int pixelRGB = parent.pixels[index];
        int r = (pixelRGB & 0x00ff0000) + red;
        int g = (pixelRGB & 0x0000ff00) + green;
        int b = (pixelRGB & 0x000000ff) + blue;
        parent.pixels[index] = 0xff000000 | (r > 0x00ff0000 ? 0x00ff0000 : r) | (g > 0x0000ff00 ? 0x0000ff00 : g) | (b > 0x000000ff ? 0x000000ff : b);
    }


}
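
On the work split in setup(): integer division of the particle count by the core count only covers every particle when the count divides evenly (1000000 / 6 leaves 4 over). Below is a minimal plain-Java sketch of one way to spread such a remainder across the chunks. `RangeSplitter` and `split` are hypothetical names for illustration, not part of the sketch above or of Processing.

```java
import java.util.ArrayList;
import java.util.List;

class RangeSplitter {

    // Returns {start, end} pairs (end exclusive) that together cover
    // [0, total), with chunk sizes differing by at most one element.
    static List<int[]> split(int total, int workers) {
        List<int[]> ranges = new ArrayList<>();
        int base = total / workers;       // minimum chunk size
        int remainder = total % workers;  // this many chunks get one extra element
        int start = 0;
        for (int n = 0; n < workers; n++) {
            int end = start + base + (n < remainder ? 1 : 0);
            ranges.add(new int[] { start, end });
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Print the ranges for the particle count and core count from the question
        for (int[] range : split(1000000, 6)) {
            System.out.println(range[0] + " .. " + range[1]);
        }
    }
}
```

In setup(), each {start, end} pair could then feed one `new Task(particles, start, end)`.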

Answers

  • edited January 2014

    Processing has only 1 displayable canvas! And it seems like there are 6 Task instances there,
    all accessing and even modifying that very same canvas? :-&

    It could be even worse if the "Animation Thread" were also rendering at the same time.
    That'd be 7 threads total on your machine! o=>

    Perhaps each Task should have its own transparent PImage or PGraphics,
    leaving it to draw() to image() each one at the end of the cycle? :-?

    Ah! Avoid the dist() calculation, since it relies on sqrt() internally! >-)
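
The per-task canvas idea from this answer can be sketched in plain Java with one int[] buffer per task instead of a PImage or PGraphics. That substitution is an assumption made to keep the example self-contained, and `PerTaskBuffers`, `render` and `saturatingAdd` are hypothetical names. Each task writes only to its own buffer, and the calling thread merges the buffers with the same clamped additive blend the question uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerTaskBuffers {

    // Adds two packed RGB values channel by channel, clamping each channel at 255.
    static int saturatingAdd(int a, int b) {
        int red   = Math.min((a & 0xff0000) + (b & 0xff0000), 0xff0000);
        int green = Math.min((a & 0x00ff00) + (b & 0x00ff00), 0x00ff00);
        int blue  = Math.min((a & 0x0000ff) + (b & 0x0000ff), 0x0000ff);
        return red | green | blue;
    }

    // hits holds one pixel index per particle. Each task plots its share of
    // hits into a private buffer; the calling thread then merges the buffers
    // into one opaque ARGB frame.
    static int[] render(int[] hits, int pixelCount, int color, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(taskCount);
        try {
            List<Callable<int[]>> tasks = new ArrayList<>();
            int chunk = (hits.length + taskCount - 1) / taskCount; // ceiling division
            for (int t = 0; t < taskCount; t++) {
                int start = Math.min(t * chunk, hits.length);
                int end = Math.min(start + chunk, hits.length);
                tasks.add(() -> {
                    int[] buffer = new int[pixelCount]; // private to this task
                    for (int n = start; n < end; n++) {
                        buffer[hits[n]] = saturatingAdd(buffer[hits[n]], color);
                    }
                    return buffer;
                });
            }
            int[] frame = new int[pixelCount];
            // invokeAll() returns only after every task has completed
            for (Future<int[]> result : pool.invokeAll(tasks)) {
                int[] buffer = result.get();
                for (int i = 0; i < pixelCount; i++) {
                    frame[i] = 0xff000000 | saturatingAdd(frame[i], buffer[i]);
                }
            }
            return frame;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] frame = render(new int[] { 0, 0, 1, 3 }, 4, 0x100302, 2);
        for (int pixel : frame) {
            System.out.println(Integer.toHexString(pixel));
        }
    }
}
```

Because the clamped add gives the same result in any grouping, the merged frame matches a single-threaded render. The price is an extra pass over every buffer per frame, so whether this beats the shared array would have to be measured. A real sketch would also reuse the pool and the buffers across frames rather than allocating them on each call.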

  • Hey Poersch,

    Nice use of the Callable class!

    The only real improvement or change I could see offhand is to use PVector rather than separate floats in your Particle class. They provide some nice methods to add and multiply vectors that may be faster than doing it manually. But I think behind the scenes they do exactly what you are doing already.

    Maybe, since your particles aren't interacting with each other, you could call render() at the end of updatePressed() and updateReleased(). That would avoid the second loop in your call() method.

    In your render() you read parent.width and parent.height each time. I'd save those to constants so you don't need to look them up each time.

    I haven't worked with Callable before, but I know that with Runnable it is best to avoid mixing threads with drawing. Or is that what the executorService is doing? That's new to me too.

    Lastly, once you're out of testing, removing the text traces onscreen should give you a small boost in performance too.

    Hope this helps! ak
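
Two of the suggestions in this answer (rendering inside the update pass, and reading width and height once) can be sketched together in plain Java. This is an illustration under assumptions, not the question's code: `FusedLoop` and `updateAndRender` are hypothetical names, particle state is kept in flat float arrays instead of Particle objects for brevity, and only the mouse-released update is shown.

```java
class FusedLoop {

    static final int COLOR = 0x100302; // same additive color as the question's sketch

    // One pass over particles [start, end): damp, move, then plot.
    // width and height arrive as plain parameters, so the loop never
    // reads them through another object.
    static void updateAndRender(float[] posX, float[] posY, float[] velX, float[] velY,
                                int[] pixels, int width, int height, int start, int end) {
        for (int n = start; n < end; n++) {
            // Update (mouse released): damp the velocity, then move
            velX[n] *= 0.9F;
            velY[n] *= 0.9F;
            posX[n] += velX[n];
            posY[n] += velY[n];

            // Render: skip particles outside the canvas
            float x = posX[n];
            float y = posY[n];
            if (x < 0 || x >= width || y < 0 || y >= height) {
                continue;
            }
            int index = (int) x + (int) y * width;
            int rgb = pixels[index];
            int r = Math.min((rgb & 0xff0000) + (COLOR & 0xff0000), 0xff0000);
            int g = Math.min((rgb & 0x00ff00) + (COLOR & 0x00ff00), 0x00ff00);
            int b = Math.min((rgb & 0x0000ff) + (COLOR & 0x0000ff), 0x0000ff);
            pixels[index] = 0xff000000 | r | g | b;
        }
    }

    public static void main(String[] args) {
        float[] posX = { 1.5F }, posY = { 2.5F }, velX = { 1.0F }, velY = { 0.0F };
        int[] pixels = new int[16]; // 4 x 4 canvas
        updateAndRender(posX, posY, velX, velY, pixels, 4, 4, 0, 1);
        System.out.println("x = " + posX[0] + ", pixel 10 = " + Integer.toHexString(pixels[10]));
    }
}
```

Whether fusing the loops is faster depends on the JIT and on cache behavior, so it is worth timing both variants rather than assuming a gain.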

  • edited March 2014

    Thanks for your answers! :)

    The particle system spawns as many threads as your CPU has cores to distribute the workload evenly. Try different task counts; 6 gives me the best results on my hexacore CPU.

    The tasks just access the same int array (pixels[]) which is pretty thread safe (in case of additive blending) and super fast since nothing needs to be locked.

    Already tried...

    ... which actually decreased the FPS! I sometimes hate Java runtime optimizations...

    Removing the text doesn't affect the FPS at all (maybe by a few nanoseconds).

    The animation thread sleeps (no busy wait) until all tasks have finished their work (thanks to executorService.invokeAll(tasks)).
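
On the shared pixels[] array: each write is a plain read-modify-write, so when two threads hit the same pixel at the same moment one increment can be lost. For a visual effect that is usually invisible. If exact accumulation ever mattered, one option (an assumption on my part, not something proposed in this thread) is a compare-and-set loop on a `java.util.concurrent.atomic.AtomicIntegerArray`. `AtomicBlend`, `add` and `hammer` are hypothetical names, and the result would still have to be copied into Processing's pixels[] each frame.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class AtomicBlend {

    // Adds color to pixels[index] channel by channel, clamped at 255,
    // retrying whenever another thread changed the pixel in between.
    static void add(AtomicIntegerArray pixels, int index, int color) {
        while (true) {
            int old = pixels.get(index);
            int red   = Math.min((old & 0xff0000) + (color & 0xff0000), 0xff0000);
            int green = Math.min((old & 0x00ff00) + (color & 0x00ff00), 0x00ff00);
            int blue  = Math.min((old & 0x0000ff) + (color & 0x0000ff), 0x0000ff);
            int next = 0xff000000 | red | green | blue;
            if (pixels.compareAndSet(index, old, next)) {
                return;
            }
        }
    }

    // Demo helper: several threads add to pixel 0 concurrently;
    // returns the final value of that pixel.
    static int hammer(int threadCount, int addsPerThread, int color) {
        AtomicIntegerArray pixels = new AtomicIntegerArray(1);
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    add(pixels, 0, color);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return pixels.get(0);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(hammer(4, 50, 0x000001)));
    }
}
```

The retry loop costs more than a plain array write, so for a sketch like this one the unsynchronized version may well remain the better trade-off.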

  • Poersch, have you done any more work with this project? If anybody else has done anything with this, I would appreciate seeing your work for other ideas and enhancements.
