I just ran a little test and this
void setup() {
}
void draw() {
for (int i=0; i<1000000000; i++) {
float var = i;
}
}
runs about 3 times faster than this
float var;
void setup() {
}
void draw() {
for (int i=0; i<1000000000; i++) {
var = i;
}
}
which is (at least to an ignorant like me) counter-intuitive, because in the first example, variable var is cast a billion times, whereas in the second one, it's merely changed in value. How come?
Answers
In both cases, an int value is auto-upcast to float type before each assignment to variable var. However, in the 2nd example, var is an instance field of class PApplet, stored in memory's heap region. And the other is a local variable scoped to a for ( ; ; ) loop's curly block, stored in memory's stack region. The stack region has faster access than the heap's.
And all local variables (including parameters) are faster than object fields. \m/
Unless a field represents a compile-time known constant value.
In this particular case, it is as fast as a literal!!! $-)
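For instance, a quick sketch of the 3 cases (identifiers here are made up for illustration):

static final float GRAVITY = 9.8f; // compile-time constant: the compiler inlines
                                   // its value, so reading it is as fast as a literal
float member = 9.8f;               // regular field: fetched from the object on the heap

void draw() {
  float local = GRAVITY + member;  // local variable: lives on draw()'s stack frame
  println(local);
}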
Thanks. Stack and heap are new lingo to me. So, if my googling serves me right, I understand that Processing decides for me which region to use for which variable in my program. I get it that in C, for instance, you have to specify the region to use. Is that right?
Not Processing but Java. And even in C, the rule is very similar: local variables go on the stack and dynamic allocations go on the heap.
https://en.wikipedia.org/wiki/Stack-based_memory_allocation
https://en.wikipedia.org/wiki/Memory_management
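In Java/Processing terms, a minimal sketch of that rule (the JIT may still optimize things further, but this is the usual picture):

float x = 1.0f;           // instance field: lives inside the PApplet object on the heap

void draw() {
  int n = 42;             // local primitive: lives on draw()'s stack frame
  int[] a = new int[10];  // the reference 'a' is on the stack, but the array
  a[0] = n;               // object it points to is allocated on the heap
  println(x + a[0]);
}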
Thanks. Does the position of a variable down the stack influence how fast it can be accessed?
I believe that as long as it is in the same stack (and the stack isn't too large), the position of the variable in the stack shouldn't influence the access time; it may, but only a bit, and then also depending on the stack size. But I may be wrong.
I also believe that stacks tend to fit entirely inside the L1 cache (or at least the L2 cache) and are hence faster than the heap, which most of the time fits only in the main RAM, or at best the L2/L3 cache. Am I right?
The other answers are missing the fact that the Java compiler and the Java VM JIT compiler can do dead code elimination. Because your first example assigns a local variable that is never used, the VM may remove that code. In fact, as the for loop then doesn't do anything, it may remove that too!
http://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html
@Lord_of_the_Galaxy, I just know the general idea.
But I guess it should be something like you described. ;))
Just wanna let you know that Java, since version 6, got something called escape analysis:
https://en.wikipedia.org/wiki/Escape_analysis
It allows Java's JVM to decide whether an object can be created on the stack rather than in its regular place on the heap!
However, I dunno whether a stack object is faster than a heap one.
What I know for sure is that once the stack is destroyed, it doesn't involve the Garbage Collector at all.
So at least, it spares the GC from the hard work to deal w/ a stack-allocated object. \m/
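For instance (a sketch; whether the JVM actually applies it here depends on the JIT):

// This PVector never escapes the function below, so escape analysis
// may let the JVM skip the heap allocation (and thus the GC) entirely:
float distFromOrigin(float x, float y) {
  PVector v = new PVector(x, y);  // candidate for stack allocation / scalarization
  return v.mag();
}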
@GoToLoop AFAIK a stack object should be faster than a heap one, because heap ones will cause additional delays during creation also (stack allocation is easier, I guess).
@neilcsmith_net Are you sure that it is happening in this case? Because if dead code elimination was occurring, then both versions should run at the same speed (good dead code elimination would remove the variable and the for loop, as the only place the for loop is used is to set the value of var, which itself is dead). Maybe it is partially disabled in whatever compiler the Processing IDE uses?
@neilcsmith_net is right for the 1st example. The variable var never escapes the for ( ; ; ) {} loop block! An actual valid test would be something like this:
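(The original snippet isn't shown here; one plausible version, where the result is actually consumed so the loop can't be eliminated, could be:)

float var;

void setup() {
  int t = millis();
  for (int i = 0; i < 1000000000; i++)  var = i;
  // Reading & printing the field afterwards keeps the loop
  // from being discarded as dead code:
  println(var + " in " + (millis() - t) + " ms");
  exit();
}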
@Lord_of_the_Galaxy, in the second case the field could potentially be accessed from outside, even by reflection, so it's harder or impossible to prove it's really dead code. The Java compiler definitely couldn't prove it; the VM JIT compiler might be able to (though that would probably be against the spec).
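For instance, nothing stops code from peeking at the field reflectively, even if no line refers to it by name (a hypothetical sketch):

import java.lang.reflect.Field;

float var;  // never read by name anywhere in the sketch...

void setup() {
  try {
    // ...yet reflection can still observe it, which is why writes
    // to a field can't safely be proven dead at compile time:
    Field f = getClass().getDeclaredField("var");
    println(f.getFloat(this));
  }
  catch (ReflectiveOperationException e) {
    println(e);
  }
}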
And if the Java compiler proves it, then that (dead code elimination) is the reason for faster execution in first example
That doesn't disprove that local variables and parameters are faster than object field members! [-X
If there is dead code, it can only be the local variable declaration, not the loop, because execution still takes about 500 ms.
I didn't say it disproves that, the statement was conditional - if complete dead code elimination is happening then that becomes the primary reason for the reduction in speed and other reasons are secondary.
But since the OP says it still takes a fair bit of time, I'd say that no dead code elimination is occurring here.
I've learned totally new things in the thread. Thanks all.
Just did a "Benchmark for Member Vs. Local Vs. Parameter Variables". :bz
Run it and pick your own conclusions: O:-)
P.S.: New version 1.1, which now includes an extra test for static member variable. :D
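(The actual sketch isn't reproduced here; a minimal sketch of the same idea, with a made-up loop count, could look like this:)

float member;               // instance field (heap)
static float statMember;    // static member field

void setup() {
  final int N = 100000000;  // made-up loop count
  int t;

  t = millis();
  for (int i = 0; i < N; i++)  member = i;
  println("member: " + (millis() - t) + " ms");

  t = millis();
  for (int i = 0; i < N; i++)  statMember = i;
  println("static: " + (millis() - t) + " ms");

  t = millis();
  float local = 0;          // local variable (stack)
  for (int i = 0; i < N; i++)  local = i;
  println("local: " + (millis() - t) + " ms");

  t = millis();
  float p = assignParam(0, N);
  println("param: " + (millis() - t) + " ms");

  // Print the values so the JIT can't discard the loops as dead code:
  println(member + " " + statMember + " " + local + " " + p);
  exit();
}

float assignParam(float param, int n) {
  for (int i = 0; i < n; i++)  param = i;
  return param;
}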
Wow cool! Will test this tonight. Thanks!
How would an open poll with result times vs system specs be relevant? :)
You can assign a new value to a variable inside a for ( ; ; ) statement? Mind = blown. :-O
Console output (retabbed):
The third part of the for statement (for (..; ..; this part)) can be any single statement. :)
Sorry for the digression... but if i++ is equivalent to i = i+1, and var1 += var2 is equivalent to var1 = var1 + var2, then I'm surprised to see that something like var1 = var1 + (i = i+1) can evaluate without error (two equal signs here). What am I not getting right?
Exactly, it does evaluate without an error. Try it.
(Try y = y + (x = x + 1); )
Hmm. So you can change the value of a variable WHILE it's being used in an expression. I mean, not only does your expression use variable x plus 1, it also changes the stored value of variable x. That's new too. ;;)
EDIT: understanding this might look trivial to many, but believe me, I really had to wrap my head around it for a fair 5 minutes before it sunk in...
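A tiny sketch of that, with the values printed out:

int x = 0, y = 0;
y = y + (x = x + 1);  // (x = x + 1) is an expression whose value is the value assigned to x
println("x = " + x + ", y = " + y);  // prints: x = 1, y = 1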
The result of i = i+1 is i, and this value is passed through, so it can be used by adding it to var1. In your particular expression, (i = i+1) is equivalent to the pre-increment operator ++i. You might try var1 = var1 + ++i; it should compile, but I can't try it myself at the moment. Also var1 = var1 + (i++); should produce the same result.
Yes, both compile error-free. My single-equal-sign-per-evaluation world is crumbling.
@quark Are you trying to say that var1 = var1 + ++i; and var1 = var1 + (i++); produce the same result? If I remember right, ++i evaluates to (i + 1) and i++ always evaluates to i. Or am I wrong?
Okay well, I hadn't bothered to check values, I just saw it ran. But you have a point: if var1 and i are equal to 0, var1 = var1 + (i++) evaluates to 0, whereas var1 = var1 + ++i evaluates to 1!
Aha! I was right then. I guess he meant that both should compile, not that both should evaluate to the same value?
My AMD-CPU laptop here is slow. This is my output for version 1.1.4: 3:-O
I don't get why @Moxl's Local was the slowest, while his Global was the fastest? 8-}
Mine's just the opposite. 8-X
Might be my uninformed method for testing... don't have my sketch here but will post when I get back to my "coding computer".
EDIT: oh you meant the results of your test. Yeah. Weird. Who am I to tell, though...
You ain't seen nothing yet! We can use commas (,) in order to add more than 1 expression there! =P~
Of course, those expressions can simply be function calls too! :)>-
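For example, a quick sketch:

// More than 1 expression in the update part, separated by a comma:
for (int i = 0, j = 10; i < j; i++, j--)  println(i + " " + j);

// And the expressions can be function calls too:
for (int i = 0; i < 3; println("i was " + i), i++) {
}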
@Lord_of_the_Galaxy Yes, i++ evaluates to i, but (i++) evaluates to i+1, because the parentheses have higher precedence than ++, so it forces the increment before i is used, effectively making it the same as ++i.
In their internal implementation, both the pre and post increment/decrement unary operators modify their variable immediately.
The difference is that the post version returns the original value instead. :-B
@quark I tested too, what I said is right as far as Java is concerned. And @GoToLoop explains it more correctly now.
@GoToLoop Cool! I didn't know that. Though I would still prefer adding curly brackets to make it more readable, it will help me make code much shorter (in length, not talking of compiled code) if and when required.
Try this code yourself-
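(The original snippet isn't shown; one that demonstrates the difference:)

int a = 0, b = 0;
int pre  = ++a;  // a is incremented first, then read:  pre  == 1, a == 1
int post = b++;  // b is read first, then incremented:  post == 0, b == 1
println("pre  = " + pre  + ", a = " + a);
println("post = " + post + ", b = " + b);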
I wasn't able to test my code when I posted earlier, and it seems I was incorrect regarding (a++) being equivalent to ++a.