Performance

Greetings!

This is one of a set of blog posts this week covering the tasks we accomplished during an “open lab” week to finish off the game and test it for both playability and performance.

Pre-Tweaks Performance

Throughout development of the game, I have kept performance in mind when writing code or designing a system, and have used common tricks and practices to do so. Reducing the number of new objects created in each update call saves memory as well as the CPU time spent on garbage collection. Garbage collection is not a visible problem in our game, but it can cause major performance issues if not managed properly. For example, when an enemy gets hit, its colour is set to a Color object that has already been created and stored, and is then reset to another default Color after a few milliseconds, meaning only 2 Color objects are ever used. This comes at the cost of slightly increased memory usage, but it prevents multiple new objects being created per second and reduces the amount of time the garbage collector needs to run.
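The hit-flash idea can be sketched as follows. This is a language-agnostic illustration in Python, not the game's Unity code: the Color and Enemy classes, the constant names and the 0.1 second flash length are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    r: float
    g: float
    b: float
    a: float = 1.0


# Created once and shared by every enemy (hypothetical names and values).
HIT_COLOR = Color(1.0, 0.0, 0.0)
DEFAULT_COLOR = Color(1.0, 1.0, 1.0)
FLASH_SECONDS = 0.1  # assumed flash length


class Enemy:
    def __init__(self):
        self.color = DEFAULT_COLOR
        self._flash_left = 0.0

    def on_hit(self):
        # Point at the shared object instead of allocating a new Color.
        self.color = HIT_COLOR
        self._flash_left = FLASH_SECONDS

    def update(self, dt):
        # Count the flash down and swap back to the shared default colour.
        if self._flash_left > 0.0:
            self._flash_left -= dt
            if self._flash_left <= 0.0:
                self.color = DEFAULT_COLOR
```

However many enemies are hit, no new Color is allocated after start-up; each enemy only changes which of the two shared objects it refers to.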

Despite these pre-emptive measures, there are still ways that performance can be improved in our game. As such, I collected performance metrics from the game running on a Nexus 9 Android tablet (our minimum target hardware) to see where our weakest area for performance is. The profiler data below is from halfway through the final wave (Formation India) before the boss, as this is where the most enemies are active at one time, so the game is under its greatest load.

Level 1

  • Texture Memory: 23.5 MB
  • Audio Memory: 56.9 MB
  • Batches: 26
  • Triangles: 1.2 k
  • Vertices: 1.5 k
  • Physics: 5.4 ms
  • Rendering: 8.1 ms
  • APK Size (ETC): 36,172 KB

Pre-optimisation

Before any major tweaks, our game already runs at an average of 60 FPS. Any tweaks we make therefore won’t show increased performance for the player, but they would let us support a range of older devices. They would also allow us to expand the game further without worrying as much about the performance impact.

Physics takes up a large share of the processing time, averaging about 5 ms to run all physics-related tasks (second only to rendering in the figures above). The largest cause of this is collider transforms, which occur whenever an enemy plane, player plane or bullet moves in the game world. As I know which type of collider each plane or projectile uses, we can see how efficient each collider type is:

  • Polygon Collider (Enemy Plane)
    • Count: 5
    • Time: 2.30 ms
    • Avg processing time per collider: 0.46 ms
  • Box Collider (Bullet)
    • Count: 52
    • Time: 0.66 ms
    • Avg processing time per collider: 0.0127 ms
  • Edge Collider (Enemy Bomber)
    • Count: 12
    • Time: 0.47 ms
    • Avg processing time per collider: 0.0392 ms

Based upon this, I can see that the use of polygon colliders is the biggest cause of high physics processing time. This is most likely because a polygon collider is much more complex than a box collider. A polygon collider has multiple edges and points that need to be moved every single update, whereas a box collider can be defined by just a width and height instead of individual point positions. Edge colliders are more efficient than polygon colliders but less so than box colliders, mostly for the same reason: they have fewer points and edges to move than a polygon collider, but form more complex shapes than a box collider.
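The per-collider averages are simply total time divided by count. A short Python sketch using only the profiler figures listed above makes the gap explicit (the variable names are mine):

```python
# (count, total time in ms) taken from the profiler figures above.
colliders = {
    "polygon": (5, 2.30),
    "box": (52, 0.66),
    "edge": (12, 0.47),
}


def average_ms(count, total_ms):
    """Average processing time per collider in milliseconds."""
    return total_ms / count


averages = {name: average_ms(c, t) for name, (c, t) in colliders.items()}

# Cost of each collider type relative to a single box collider.
box_avg = averages["box"]
relative_cost = {name: avg / box_avg for name, avg in averages.items()}

for name in colliders:
    print(f"{name}: {averages[name]:.4f} ms, {relative_cost[name]:.1f}x box")
```

On these numbers a single polygon collider costs roughly 36 times as much as a box collider, and an edge collider roughly 3 times as much.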

Tweaks

I implemented numerous tweaks to improve performance, whether in-game performance or APK file size.

One of the first changes I made was to give all textures dimensions that are powers of two. This allows Unity to compress the textures, reducing their file size without a noticeable decrease in quality. There are also certain optimisations graphics hardware can use to operate faster on power-of-two sizes. This is less of a factor in modern computing; however, as we are targeting Android devices, which have less processing power than a desktop computer, more optimisations need to be considered.
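Checking whether a dimension is a power of two, and finding the size to pad it up to, is a small bit-twiddling exercise. The helpers below are a hypothetical Python sketch, not part of Unity or our project:

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; a power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        return 1
    return 1 << (n - 1).bit_length()


# Example: a 300 x 128 sprite would need padding to 512 x 128.
width, height = 300, 128
padded = (next_power_of_two(width), next_power_of_two(height))
```

Padding up rather than scaling keeps the artwork untouched, at the cost of some empty space in the texture.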

This, however, was mostly offset by the use of Unity’s built-in sprite packing feature. This feature stitches multiple images together to form a larger image, then stores the coordinates and size of each original texture so it can be “fetched” from the larger image. This is more commonly known as a texture atlas, and it is very commonly used for 2D games and UI textures, as you can fit multiple images onto one sheet, whereas 3D games generally have larger textures per object. Using a texture atlas can reduce texture bind calls and therefore draw calls: instead of having to bind a new texture for each different object, the renderer can batch together all objects that share the same texture atlas and render them at once.
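A rough Python sketch of both halves of this idea: looking a sprite up inside an atlas, and why sharing one texture cuts batches. The atlas layout, sprite names and file names are made up for illustration, and the batch count is a deliberate simplification, since a real renderer such as Unity's considers more than just the texture when batching.

```python
from itertools import groupby

# Hypothetical atlas layout: name -> (x, y, width, height) in pixels.
ATLAS_SIZE = (256, 256)
ATLAS_RECTS = {
    "player": (0, 0, 64, 64),
    "fighter": (64, 0, 64, 64),
    "bullet": (128, 0, 16, 16),
}


def uv_rect(name):
    """Normalised (u0, v0, u1, v1) coordinates of a sprite in the atlas."""
    x, y, w, h = ATLAS_RECTS[name]
    atlas_w, atlas_h = ATLAS_SIZE
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)


def count_batches(textures_in_draw_order):
    """Simplified model: consecutive draws sharing a texture form one batch."""
    return sum(1 for _ in groupby(textures_in_draw_order))


separate = ["player.png", "fighter.png", "bullet.png", "fighter.png", "bullet.png"]
atlased = ["atlas.png"] * len(separate)
```

In this toy model the five draws need five batches with separate textures (every draw changes texture) but only one when they all come from the same atlas.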

Future Tweaks

There are more tweaks I would like to make to the game in the future to further improve performance without sacrificing gameplay. Some of these tweaks are not currently needed, but they help future-proof our game for when we expand upon it, as well as increase the range of devices we could possibly support.

One tweak is to implement object pooling, especially for bullets. With object pooling, instead of discarding an object when done with it and letting the garbage collector remove it, the object is returned to a pool and its component values are reset to a default state. When an object is required in the future, instead of creating one, one is taken from the pool ready to be used. Pooling can greatly reduce the CPU time spent creating new objects and collecting garbage, at the cost of more constant memory usage. This is often preferred, as CPU time is more valuable for performing calculations (such as physics or running scripts), whereas memory is often underutilised. In our game it would suit bullets, as many instances of them are created over a gameplay session and left for the garbage collector to remove. Although the impact of the GC is not currently noticeable, pooling could prove useful in the future and would allow us to support devices with a slower CPU.
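A minimal sketch of such a pool, written in Python for illustration rather than as the Unity implementation. The Bullet and BulletPool classes, their fields and the prewarm parameter are all assumptions for the example:

```python
class Bullet:
    def __init__(self):
        self.reset()

    def reset(self):
        # Restore the default state before the bullet goes back in the pool.
        self.x = 0.0
        self.y = 0.0
        self.active = False


class BulletPool:
    def __init__(self, prewarm=0):
        # Optionally create some bullets up front, e.g. while loading.
        self._free = [Bullet() for _ in range(prewarm)]
        self.created = prewarm

    def acquire(self, x, y):
        # Reuse a pooled bullet if one is free; only allocate as a fallback.
        if self._free:
            bullet = self._free.pop()
        else:
            bullet = Bullet()
            self.created += 1
        bullet.x, bullet.y, bullet.active = x, y, True
        return bullet

    def release(self, bullet):
        # Called instead of destroying the bullet when it leaves play.
        bullet.reset()
        self._free.append(bullet)
```

Firing would call acquire and a bullet leaving the screen or hitting something would call release, so after warm-up the number of Bullet objects stays constant however many shots are fired.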

Further tweaks can also be made to the sprites by reducing the size of a number of them. Currently some textures are twice as large as they are displayed in game, so reducing their dimensions would cause a very minor quality loss, offset by a massive reduction in texture size. This would decrease texture memory usage, allowing us to use more textures if desired, and help ensure the user can keep their other applications open in the background without the Android OS closing them to free up memory.
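To see why the saving is so large, consider a rough worked example. It assumes an uncompressed texture at 4 bytes per pixel and takes "twice as large" to mean twice the width and twice the height; the 512 x 512 size is hypothetical, and compressed formats would change the absolute numbers but not the ratio.

```python
def texture_bytes(width, height, bytes_per_pixel=4):
    """Approximate memory for an uncompressed texture (assumed RGBA, 32-bit)."""
    return width * height * bytes_per_pixel


original = texture_bytes(512, 512)   # hypothetical oversized sprite sheet
reduced = texture_bytes(256, 256)    # same sheet at its displayed size

saving_ratio = reduced / original    # fraction of the memory still needed
```

Halving both dimensions leaves a quarter of the pixels, so the reduced texture needs only a quarter of the memory.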

Some of the audio files used can also be reduced in size with minimal impact on audio quality. This may not affect audio memory usage much, however, as it appears that the audio manager itself uses a large amount of memory regardless of file size. It is something that can be investigated, though, and tweaks can be attempted with minimal time investment and the possibility of freeing up more memory.

As discussed under Pre-Tweaks Performance, a noticeable amount of CPU time was spent on collider translation. This could be improved by switching to box colliders and edge colliders on the purple bomber and player plane, which would greatly reduce the time spent translating them without reducing the area in which they can be hit.
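One way to pick a replacement box that never shrinks the hit area is to take the bounding box of the existing polygon's points. The Python sketch below illustrates this under my own assumptions: the plane outline is invented, and a real swap would be done in the Unity editor rather than in code like this.

```python
def bounding_box(points):
    """Axis-aligned box (center, size) that encloses every polygon point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    center = ((min_x + max_x) / 2, (min_y + max_y) / 2)
    size = (max_x - min_x, max_y - min_y)
    return center, size


# Hypothetical 8-point plane outline (nose, wing tips, tail).
plane_outline = [
    (0.0, 1.0), (0.5, 0.2), (1.5, 0.0), (0.5, -0.2),
    (0.0, -1.0), (-0.5, -0.2), (-1.5, 0.0), (-0.5, 0.2),
]

center, size = bounding_box(plane_outline)
```

The trade-off is that the box also covers the empty corners around the plane, so hits become slightly more generous; whether that is acceptable is a gameplay decision.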

Post-Tweaks Performance

Level 1

  • Texture Memory: 26.7 MB
  • Batches: 14
  • Triangles: 661
  • Vertices: 917
  • Physics: 5 ms
  • Rendering: 7.5 ms
  • APK Size (ETC): 34,881 KB
  • APK Size (DXT): 34,769 KB

Post-optimisation

To ensure a fair comparison, this profile snapshot was also taken halfway through the final wave (Formation India) before the boss. As all of the tweaks implemented were focused on improving rendering and graphics, the biggest changes are in texture memory and the number of batches.
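For a quick side-by-side, the percentage change of each metric can be worked out from the two snapshots. The Python sketch below uses only the figures listed in this post; note that the pre-tweak triangle and vertex counts were reported rounded (1.2 k and 1.5 k), so those two results are approximate.

```python
before = {
    "Texture Memory (MB)": 23.5,
    "Batches": 26,
    "Triangles": 1200,      # reported as 1.2 k
    "Vertices": 1500,       # reported as 1.5 k
    "Physics (ms)": 5.4,
    "Rendering (ms)": 8.1,
    "APK Size ETC (KB)": 36172,
}
after = {
    "Texture Memory (MB)": 26.7,
    "Batches": 14,
    "Triangles": 661,
    "Vertices": 917,
    "Physics (ms)": 5.0,
    "Rendering (ms)": 7.5,
    "APK Size ETC (KB)": 34881,
}


def percent_change(old, new):
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


changes = {name: percent_change(before[name], after[name]) for name in before}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

Batches fall by about 46% and texture memory rises by about 14%, while physics and rendering times each move by well under a millisecond.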

Interestingly, texture memory increased, the most likely cause being that most textures are now stitched together. Textures that weren’t loaded before may now be part of a texture atlas which is currently loaded, increasing texture memory usage. All UI elements are also stitched together into one atlas, including elements that aren’t shown until the end screen, whereas before the texture for the end screen was separate and only needed to be loaded at the end. I don’t count this as a major negative, as overall memory usage is still very low, but it will be kept in mind for future work. One solution would be to break the end screen UI assets out into their own texture atlas to prevent them from staying loaded throughout the entire game.

The number of batches/draw calls has decreased dramatically, however, almost halving from the original metrics. This is due to sprite batching, which means Unity can draw the sprites on screen more efficiently without needing to bind a new texture before each draw. As the UI was already all on one texture before, I can conclude this is due to packing the player, fighter, enemy and bullet sprites into one texture atlas, meaning they can all be drawn in fewer batches. This is reflected in the rendering time; however, as this number fluctuates a lot, it is not possible to see exactly how much the time to render has been reduced by.

APK file size was also reduced, partially thanks to compression now being possible, but also due to textures being compacted into texture atlases, removing a lot of unused image space and thus data. Out of interest, I built using Unity’s default compression (ETC, which does not have built-in alpha support) as well as DXT (used on Tegra devices such as the Nexus 9) and saw that the DXT APK was marginally smaller. Although DXT supports alpha natively, the tiny reduction in APK file size does not make it worthwhile to build a separate APK for each release. It may provide faster startup times, however, as DXT textures could load faster on Tegra devices than the default ETC, and it is something to investigate in the future.

Overview

Overall, time spent rendering was improved at the cost of slightly increased memory usage. There are still many areas that can be improved in terms of performance, most notably with regards to physics and collider transformations. This can be improved by switching from polygon colliders to box and edge colliders instead, improving performance without sacrificing gameplay.

I’m pretty happy with the improvements I made, although I believe I should have focused on improving the physics of the game rather than the rendering. The physics is currently the more urgent concern and there are simple tweaks that can be made to improve that area. Overall, I don’t believe I used my time efficiently, and in the future I need to decide up front what would be a better focus for my time.
