We recently developed a firmware extension for a customer that allows the application to store data in the microcontroller's flash memory and read it back. The firmware runs on a PIC32MZ, which is equipped with a data cache and an instruction cache. Anyone who has ever tried to parallelize software for a PC with a multi-core CPU knows how important the cache is. If all cores keep accessing the same cached data at the same time, for example in a loop, the code can become slower rather than faster. Paradoxically, the more cores you use, the slower the code may run. However, the PIC32MZ has only a single-core CPU, so you would expect such problems not to arise here.
In fact, a cache can significantly speed up execution on a microcontroller, since access times to RAM, and especially to flash, are much longer than access times to the cache. The PIC32MZ's instruction cache is 16 KB in size. Theoretically, a small program could be held in the cache in its entirety. Since the CPU then fetches instructions almost exclusively from the cache, the program runs faster. Executing code from RAM to gain speed is therefore rarely necessary.
Despite all the advantages, there are a few things to consider when using a cache. This is especially true when data is read or written past the cache. This always happens when the operation is performed not by the CPU but by another hardware unit, for example the DMA controller or the flash controller. Let's imagine the following scenario: An application running on a PIC32MZ needs to regularly save data to the flash. To extend the lifespan of the flash, old data is not erased with every write operation. Instead, new data is written to a free memory area until the entire memory is full. Therefore, before new data can be written, the application must first read the memory to find free space. The data contains a CRC (cyclic redundancy check) that was calculated over the entire data set (with the exception of the CRC itself). If the application now calculates the CRC over the memory area just written and compares it with the CRC written to the flash, it will find that the CRCs are not identical.
The CRC mismatch in this example arises as follows: When searching for a free memory location, the application first read the microcontroller's flash. In the process, the data that was read was loaded into the cache. Since the addresses at which the new data was stored are now at least partially present in the cache, the CPU will read as much data as possible from the cache when calculating the CRC. However, the data in the cache is no longer current at this point, as the flash controller has already written to the corresponding addresses. For this reason, the application cannot calculate the correct CRC.
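The effect can be reproduced on a PC with a small toy model. The sketch below is not PIC32MZ code: the structure `toy_memory_t` and all function names are made up for illustration, the whole region is treated as a single cache entry, and a simple CRC-8 stands in for whatever CRC the real application uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REGION_SIZE 16u

/* Toy model (not PIC32MZ code): one flash region and the CPU's cached copy. */
typedef struct {
    uint8_t flash[REGION_SIZE];  /* what is physically stored in flash */
    uint8_t cached[REGION_SIZE]; /* what the CPU fetched the last time */
    bool    cache_valid;         /* true once the region has been cached */
} toy_memory_t;

static void toy_memory_init(toy_memory_t *m)
{
    memset(m->flash, 0xFF, REGION_SIZE); /* erased flash reads as 0xFF */
    m->cache_valid = false;
}

/* CPU read: fills the cache on a miss, afterwards served from the cache. */
static uint8_t cpu_read(toy_memory_t *m, size_t i)
{
    assert(i < REGION_SIZE);
    if (!m->cache_valid) {
        memcpy(m->cached, m->flash, REGION_SIZE);
        m->cache_valid = true;
    }
    return m->cached[i];
}

/* Flash controller write: changes the flash but does not touch the cache. */
static void flash_controller_write(toy_memory_t *m, const uint8_t *src, size_t n)
{
    assert(n <= REGION_SIZE);
    memcpy(m->flash, src, n);
}

static void cache_invalidate(toy_memory_t *m)
{
    m->cache_valid = false; /* the next CPU read must fetch from flash again */
}

/* Bitwise CRC-8 (polynomial 0x07, initial value 0), chosen only for brevity. */
static uint8_t crc8_update(uint8_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u) : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Runs the scenario; returns true if the CRC read back by the CPU matches. */
static bool write_and_verify(toy_memory_t *m, const uint8_t *data, size_t n,
                             bool invalidate)
{
    uint8_t expected = 0, actual = 0;

    for (size_t i = 0; i < n; i++) {
        (void)cpu_read(m, i);           /* search for free space: caches the region */
        expected = crc8_update(expected, data[i]);
    }
    flash_controller_write(m, data, n); /* bypasses the cache */
    if (invalidate) {
        cache_invalidate(m);
    }
    for (size_t i = 0; i < n; i++) {
        actual = crc8_update(actual, cpu_read(m, i));
    }
    return actual == expected;
}
```

Calling `write_and_verify` with `invalidate` set to `false` makes the CRC check fail, because the CPU still sees the erased flash content from its cache. With `invalidate` set to `true`, the same check passes.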
Author: M.Sc. Björn Schmitz, Software Developer (e-mail: schmitz@medtech-ingenieur.de, phone: +49 9131 691 240)
The figure below shows a simplified view of the PIC32MZ's cache implementation. The figure was taken from Microchip application note AN1600; the DMA and flash controllers were subsequently highlighted in color. As can be seen in the figure, these controllers access RAM and flash completely independently of the cache. Therefore, it can happen that data in the RAM or flash is more recent than the data in the cache. Furthermore, data written to the RAM via DMA could in principle be overwritten by a subsequent cache flush.

To prevent this, the programmer has several options. One option would be to perform a software reset after each write operation. After a reboot, the cache is empty and therefore needs to be reloaded. This option is, of course, generally impractical.
It would be more sensible to perform a cache invalidation for each write operation to the flash, i.e., to declare the written addresses as "invalid." This means that the corresponding addresses must be re-cached during the next read operation. Before each DMA transfer, a cache flush should be performed. This writes the currently cached data back to RAM and clears the cache, which ensures that only current data is transferred via DMA. Most problems can be solved with cache flush and cache invalidation. However, it is important that both happen before the write operation. If the application were to wait for the end of a flash access and only then perform the cache invalidation, outdated data could be read from the cache during a context switch. Such a context switch could be triggered by an interrupt, for example.
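One practical detail when flushing or invalidating by address: cache maintenance works on whole cache lines, so the address range has to be widened to line boundaries. The following sketch shows only that range calculation. The line size of 16 bytes is an assumption that should be checked against the datasheet, and `for_each_cache_line` as well as the callback type `cache_line_op_t` are hypothetical names; the callback is where a real per-line invalidate or write-back operation would be called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16-byte cache lines. Verify this for the device in use. */
#define CACHE_LINE_SIZE 16u

/* Hypothetical hook for the real per-line operation (invalidate or write-back). */
typedef void (*cache_line_op_t)(uintptr_t line_addr);

/*
 * Calls op once for every cache line touched by [addr, addr + len) and
 * returns the number of lines. op may be NULL to only count the lines.
 */
static size_t for_each_cache_line(uintptr_t addr, size_t len, cache_line_op_t op)
{
    const uintptr_t line_mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
    size_t count = 0;

    if (len == 0) {
        return 0;
    }
    assert(addr <= UINTPTR_MAX - (len - 1u)); /* range must not wrap around */

    const uintptr_t first = addr & line_mask;              /* round start down */
    const uintptr_t last  = (addr + len - 1u) & line_mask; /* line of last byte */

    for (uintptr_t line = first; ; line += CACHE_LINE_SIZE) {
        if (op != NULL) {
            op(line);
        }
        count++;
        if (line == last) {
            break;
        }
    }
    return count;
}
```

An 8-byte record that starts 12 bytes into a line touches two lines, for example. This also means that unrelated variables sharing a line with a buffer are affected by the operation, which is one reason to align such buffers to the line size.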
One way to avoid these problems, and at the same time make cache invalidation and cache flush superfluous, is to store data directly in a non-cached area. The PIC32MZ offers two separate virtual address ranges for this purpose, one of which is cached and one of which is not. This is illustrated in the figure below, which was taken from the PIC32MZ datasheet. As can be seen, the address ranges KSEG0 and KSEG1 map to the same physical address space, and each covers the entire flash and the entire RAM. However, while data in KSEG0 is cached, data in KSEG1 is not. It is therefore possible to prevent data from being cached through appropriate addressing. To keep the code readable, variables can be placed directly in the KSEG1 area (by default, data in RAM is placed in KSEG0). If you write:
uint32_t __attribute__((coherent)) dataBuffer[100];
The "dataBuffer" array is created in KSEG1. DMA transfers to or from the array can therefore be performed safely. When data is read from the flash, its physical address can simply be converted to a KSEG1 address to prevent the data from being read from the cache.
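The address conversion mentioned above is plain bit arithmetic. The following sketch shows it with self-written helper functions whose names are made up for this example. It relies on the usual MIPS32 layout, in which KSEG0 starts at 0x80000000, KSEG1 starts at 0xA0000000, and the physical address is contained in the lower 29 bits. To my knowledge, the XC32 toolchain ships comparable macros in `<sys/kmem.h>`, which are preferable on a real target.

```c
#include <assert.h>
#include <stdint.h>

#define KSEG0_BASE   0x80000000u /* cached view of physical memory */
#define KSEG1_BASE   0xA0000000u /* uncached view of the same memory */
#define SEGMENT_MASK 0xE0000000u /* upper three bits select the segment */
#define PHYS_MASK    0x1FFFFFFFu /* lower 29 bits are the physical address */

/* Strip the segment bits from a KSEG0 or KSEG1 address. */
static inline uint32_t virt_to_phys(uint32_t va)
{
    return va & PHYS_MASK;
}

/* Cached virtual address for a physical address. */
static inline uint32_t phys_to_kseg0(uint32_t pa)
{
    assert((pa & SEGMENT_MASK) == 0u);
    return pa | KSEG0_BASE;
}

/* Uncached virtual address for a physical address. */
static inline uint32_t phys_to_kseg1(uint32_t pa)
{
    assert((pa & SEGMENT_MASK) == 0u);
    return pa | KSEG1_BASE;
}

/* Convert a cached KSEG0 address to its uncached KSEG1 alias. */
static inline uint32_t kseg0_to_kseg1(uint32_t va)
{
    assert((va & SEGMENT_MASK) == KSEG0_BASE);
    return phys_to_kseg1(virt_to_phys(va));
}
```

A record at the cached flash address 0x9D001234, for instance, is then read through 0xBD001234. On the target, the result would be cast to a pointer such as `const volatile uint8_t *` before the data is read for the CRC calculation.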

Conclusion
Caches on microcontrollers are no longer uncommon these days. In addition to the PIC32MZ series, microcontrollers based on the ARM Cortex-M7 also use them. For this reason, embedded developers in particular should familiarize themselves with how caches work. Despite the potential sources of error, caches on a microcontroller offer many advantages, as they can significantly reduce access times to data and program memory.
While the methods explained here all refer to the PIC32MZ, most of them are also applicable to other microcontrollers with a cache. Microcontroller manufacturers typically provide information on what to consider when using the cache. ST, for example, offers a document comparable to the Microchip application note mentioned above (AN4839, Level 1 cache on STM32F7 Series).
Of course, there's much more theory surrounding caching than could fit into this blog post. For this reason, some points are presented in a very simplified manner. In addition to comments, I also welcome additions to my text.
