There is a mess in my head and I happen to like it
април 14, 2014
dart and performance (a test journy in game land)
I have been playing with Dart and StageXL for 10 days now and I feel like there are thoughts to be shared. Part of this post is also an update to a thread in the StageXL group
The game is really simple clone of the flappy bird. The main idea is that the gameplay should be easy to implement and understandable in order to allow me to concentrate on the internals of the game and the rendering engine instead of toying too much with the game itself.
At one point I have noticed that the way the original game detects the collision of the bird on the trees is a naive one and often gives false negatives. I went on reading about what is possible and present currently in Dart and StageXL in this regard. Turns out only very basic approach is available internally as StageXL is geared as animation library and not as a game development framework. Never the less this allowed me to play a bit more with the language itself.
For collision detection I have utilized the following approach:
Using the DisplayObject.hitTestObject find element(s) that are potentially colliding (in the test game I developed it could be only one out of ~ 10).
Determine the rectangle where the transformed protagonist will fit.
Using detached canvas element, clear it and draw the transformed image of the protagonist.
Using detached canvas element, clear it and draw the potentially colliding element.
The 2 canvases used are just big enough to contain the protagonist image.
Compare the non-alpha pixels of the protagonist canvas to the colliding element canvas and there is a pixel where alpha is greater than 0 we want to return true (i.e. there is a collision).
Here are the results of this little experiment.
On PC and DartVM this runs so smooth, one would think why people are not using it all the time. The average time to run the code per frame with complete pixel accurate collision is ~0.35ms (with time limit of ~16ms this is pretty good I think).
Compiled to JS it runs at about twice as slow, the frame taking about 0.78ms still pretty good. This is kind of strange since the code is mainly touching webGL, canvas and arithmetics. Canvas and webGL are DOM interfaces and thus there should be no difference, so the only possible place of optimization / speed is the arithmetics (at least this is what I initially thought). I am not an expert in VMs, but having a VM that can run at least twice as fast the calculations is awesome!
However the problems with the 'compile to js' approach start to show as early as here: taking a look at the memory debug panel one could notice a very different pattern in JS world. The game has two different modes: off-game mode where a very simple animation is performed and on-game mode where in addition to the very simple sprite animation of the 'protagonist' obstacles are animated as well as collision detection is performed at each frame.
Here are the images.
V8 - Chrome desktop
Notice a difference? Again - the code is only using a single canvas element with 3d context (webGL) to draw the animation on it and internally StageXL is using detached canvases and lots of calculations and canvas transformations. First of all - before the snapshot is taken the memory is force cleaned. Then for the test I let the off-game animation run for a while and then I play a little bit and then I leave it alone again.
On top of using around 4 times more memory (JS VM versus DartVM), JS also shows significant difference in memory allocation patterns when more code paths are executed per frame compared to DartVM. For example in the first image it is impossible to distinguish when I am playing the game and when not, while in the second image taken from JS land it is pretty much very clear - the climbs are much more steeper when the more complex code path is executed. Notice the change in steepness around second 33 in the second picture - this is where I killed my protagonist and the complex animation path is discarded. Notice also the enormous difference in memory allocation - going from 7.2MB up to 10MB in 40 seconds, while in DartVM we went from 1.8MB to 2.9MB in 5 minutes. Actually I had to wait for the garbage collection event to happen in DartVM just to see if it will ever happen, this is why the screenshot of the DartVM is encompassing such a long time period. For a moment there I thought it was broken - how come no GC event....
One negative thing about Dartium - when in debug mode (console open and recording) the performance on the screen was worse and noticeably janky. Turn off dev console and you get perfectly smooth animations.
The above summarized data about the performance and memory footprint were all taken on a core i3 @ 3.3GHz with 4GB onboard memory and a dedicated video card Nvidia. This is all well and nice, but the target platform is actually phones and tablets. It was time to test this on those devices.
I started with nexus 7 as easier to approach (no need to find a Mac computer just to debug some web stuff ha!). The results are pretty sad.
The memory consumption is pretty much identical to what we get in the desktop chrome (not surprising).
V8 - Chrome mobile
You can again easily see when the game is on and when not. Also the memory utilized is in the same range.
A very significant difference is however noticed in the frames timing. While on i3 CPU it takes under a millisecond to complete the code paths in the RAF, here it takes about 5ms on average, peaking at ~9ms!! This is so much more than what we had on the PC. Also the composite time is very different (although the canvas size is kept the same for measurement purposes - 480px x 640px) - ~2.5ms versus 0.5ms on PC. Strangely the time reported for compositing in Dartium is reported to be 0.2ms... (doesn't really make sense here...).
Even with those measurements the game should have been pretty good on the Nexus 7 device. But it is not. Well, remember the memory allocation patterns? This is the worse enemy of any game - having too much garbage. Guess the time for garbage collection events on Nexus 7.... ~80ms on GC cycle!!!! This is killing my game (and possibly will kill yours)!
Before following the logical steps to attempt to alter my code to lower the memory allocations and recollection I wanted to know which are the places where most of the CPU cycles are going. From the above tests I already knew that compositing the scene on Nexus 7 was taking ~3ms, which leaves me with ~13ms time to handle all the game logic. I have also tested with LG phone that claims to have the same hardware specs as the Nexus 7 device. Turns out the JS time is almost the same, but compositing time is 4.2ms. The screen is smaller and I expected the compositing time to be shorter, but instead it is longer, leaving the game in a pretty bad state: that is even in the time where no GC events occur the frame rate is below 60.
Back on my mission to understand performance implications of dart2js code I used nexus 7 to measure where is the CPU time spent.
Surprise - surprise! drawImage was taking 50% of the CPU. This is strange, I would have expected this to be a fast operation, considering the fact that the canvas is detached from the document. Anyways, what I was more surprised about was that 14.30% was spent in the comparison of pixel data of my two hidden canvases, but the actual comparison was only 7%, the other 7.3 was spent in 'convertnativeToDart_ImageData' - guessing from the name it converts the native ImageData object to Dart list of integers. Transforming the canvas is also 7.14% of the total time plus the clearing of the canvas. As a whole the slowest is the drawing of the image data. I believe I have put together the best practices (draw on a full pixel value, do not attach the canvas to the DOM), but still the combination of those actions take ~10ms, with a compositing taking ~3 the game is on the verge of not being playable.
Clearly using a different approach for detecting collision will reduce the amount of time (since we can avoid the most expensive operation - i.e. clearing the canvas and drawing on it).
More interestingly - using profiling from Dartium was not possible. The window.console.profile() call did nothing there. I am not sure if this is a bug or simply it does not work this way with DartVM, but it would have been interesting to see.
Going back to actual search for solution I decided to rework the code and instead of Dart idiomatic code to write what is known to work best in JS land. For more details on how one is supposed to write code in Dart please refer to this article.
But it comes at a price. The dart2js compiler aims to produce code that is as close to the original dart code and its idioms, so basically if you use closures and forEach etc they end up in your code. Of course tree shaking is performed, but I do not see a lot of code rewrite being done, even less code optimization. It is a grey area (means it is not clear who should be optimizing the code in this case, the VM or the compiler. We know that in Google both are employed in different projects (for example GWT produces code that is highly optimised per browser, while Closure compiler produces code that is optimized for size and can potentially lead to more ineffective code when it is executed in V8 (actually there are several bugs submitted about this - function in-lining leading to calls that are de-optimized or cannot be optimized at all)).
So what I did to mitigate things: first - remove all closures (so no forEach, no map etc). Second - get rid of all in-method created object instances (mainly Matrix and Rectangle instances for calculations). Instead create 'cache' instances and tie them to the main object (the one where the methods are executed). So now instead of creating several matrices only one instance is used and is mutated several time and reused. Same goes for the rectangles. Third - get rid of local variables, instead use a List instance as cache and put every number needed in there.
Well the answer is (sadly) - YES.
My assumption is that this would not matter in more static web apps or apps using other means to animate, but for games, while providing great library and excellent tooling as a whole Dart hides too many underwater rocks. You have to already be a JS ninja to write Dart code that will perform best when compiled to JS, which basically nulls out the benefits of Dart IMO. Dart promised to get us rid from JS. And yet we are going back there the moment we need a bit more performance.
I should finish with a positive note: all this will be gone once DartVM is available as a built-in Chrome option in stable. At the penetration rate of the new stable version to the users if would be a week after that release and everyone will have it. It is another question how will we deploy pure dart code to the users (my project for example uses ~500 dart files, imagine downloading those to the client...). But it could happen and then one will be truly liberated from the JS. But what about other browsers? Mozilla's ex CTO/CEO is directly opposing it, while Microsoft and especially Apple have financial interest of hindering the adoption of Dart. So it is not purely technical problems after all. Anyway, with Dart support in chrome one would have a large user base (Android + ChromeOS devices + all chrome installs) and in the beginning I think it is enough to incline the developers to explore it more as a platform and less as a compilation mid-stage for JS. Just imagine what you could write with those extra CPU cycles...