User interfaces have special requirements. They must respond with lightning speed and often render at 60 frames per second, sometimes with millions of pixels, to convey a high-quality impression. This puts embedded systems to the test. This article is based on the optimization of a 2D application under Linux that used the SDL graphics library. However, the concepts are also transferable to other systems and graphics libraries.
Where is the problem? – Profiling tools
To avoid wasting effort in the wrong places, you should first find out what the CPU spends most of its time on. This is usually where you can save the most. Profiling tools are very useful for finding these problem areas.
Callgrind is one of them and is part of the powerful analysis tool Valgrind. Unfortunately, Callgrind takes a long time to collect data and can slow down applications that are already too slow to the point where they are no longer usable or testable.
The gprof tool is significantly faster, but it also has its drawbacks. The profiler doesn't take into account the time spent on syscalls (e.g., sleep, read, write, blocking waits for messages), and these operations can take a relatively long time.
The output of both Callgrind and gprof can be processed with gprof2dot, which generates clear graphical representations of call dependencies and computation times. This tool is highly recommended for a quick overview.
Syscalls can be a huge waste of time. strace is a great tool for finding them, especially with the options -r and -T, and with --trace=read,write to filter by syscall type.
Programs like htop can help to at least roughly identify which thread might be causing the problem. Calls like sleep, of course, don't cause much load, but they do cause significant time loss. Alternatively, you can manually measure the time of specific function calls and print it, or you can comment out suspicious function calls. This isn't as elegant, but it also achieves the goal quite well.
What usually fits is not always good – drawing algorithms
Graphics libraries should be usable for a wide variety of applications and are therefore difficult to optimize for specific use cases. Depending on what you want to draw, however, you can also use significantly more optimized drawing algorithms. Here, a balance must be struck between performance, effort, and code comprehensibility.
For example, lines between two arbitrary points are usually drawn using the Bresenham algorithm. However, if the two points share an x or y coordinate (i.e., the line is parallel to an axis), it is obvious which pixels need to be colored even without a complex algorithm, so it can be replaced by a simple loop over the pixels. The SDL graphics library already provides functions for this, as long as the lines are only one pixel wide. For wider lines, however, the Murphy algorithm (a modification of the Bresenham algorithm for thick lines) is used regardless of the line's orientation. A lot of computing time can be saved here by replacing thick horizontal and vertical lines with simple loops over the pixels, or with rectangles. In many applications, the majority of lines are parallel to the axes, which makes this optimization worthwhile.
Time can also be saved in the edge list algorithm, which SDL and other libraries use to draw filled polygons, and which is also known as the scan line algorithm. For a polygon of pixel height h with n corners and edges, h*n potential intersection points have to be calculated, which can quickly become computationally intensive. A horizontal line is then drawn between each pair of intersection points (for which the Bresenham algorithm should definitely not be used).
In SDL, shapes such as rounded buttons are also drawn with the edge list algorithm, but over a large vertical range of such a button the calculated intersection points always lie on the same left and right edges. This can be exploited: once intersection points have been found on two parallel edges, for many polygons you can draw not just the line between the intersection points, but an entire rectangle extending to the end of one of the edges. For this to work, any horizontal line must intersect the polygon in at most two points. This applies, among other things, to all convex polygons.

Putting the picture together
If you use multiple drawing layers, you have to correctly combine them for each image before updating the displayed image. It is useful to remember which areas have changed since the last image, if the graphics library you are using does not already do this. Especially with user interfaces where not too much changes, you can save a lot of processing power by copying together only the changed areas of the layers and then only updating the area of the displayed image. A simple starting point is to define a rectangle that is always expanded after each drawing operation so that it encloses the area of all previous operations since the last image. Advanced methods can be found under the keyword "dirty rectangles".
Under certain circumstances, you can also reduce the number of layers; blitting (copying the image data, "block image transfer") is relatively expensive if it is not hardware-accelerated.
Often, however, multiple layers are unavoidable. This raises the question of whether you should draw on a lower layer at all if its content is currently hidden by a higher layer anyway. You could instead draw only once the lower layer becomes visible again, saving the drawing time until then. Whether this is worthwhile depends on what is being drawn: it becomes problematic if a lot then has to be redrawn within a single frame. Instead of optimizing the average time per frame, it is usually more sensible to optimize the maximum time per frame. In that case, you should continue to draw on hidden layers.
Check settings
Hardware acceleration can save the CPU a lot of work. It's worth taking a look at the available hardware operations, but it's also possible that crucial functionality isn't hardware-accelerated.
The pixel formats of different layers can vary. This means that the pixel data must first be converted to the correct format when composing the entire image, which takes additional time. Therefore, it usually makes sense to keep the formats consistent.
The choice of graphics library also makes a huge difference. If performance issues are foreseeable, you should at least try out a few libraries before proceeding.
Do you have any other profiling or optimization tips? Let us know in the comments!
