Adaptive Cache

The adaptive cache takes two memory budgets, one for RAM and one for VRAM, plus a minimum and maximum slot exponent; a slot of exponent n is 2^n bytes. Given those, the cache generates one line per slot size between min and max (both included), then builds as many slots for each line as the budget permits. The current implementation splits the memory budget equally between the lines.
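
In case it helps, here is a minimal sketch of that construction in C++; the names (CacheLine, Slot, buildLines) and the exact layout are mine, not the engine's:

```cpp
// Minimal sketch of line/slot construction; names are illustrative,
// not the engine's actual types. A slot of exponent n is 2^n bytes.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Slot {
    std::vector<std::uint8_t> storage; // backing memory for one slot
};

struct CacheLine {
    std::size_t slotSize = 0;  // 2^n bytes
    std::size_t hits = 0;      // per-line usage statistics
    std::size_t misses = 0;
    std::vector<Slot> slots;   // a linked list in the real cache
};

// One line per exponent in [minExp, maxExp], the budget split equally
// across lines as in the current implementation.
std::vector<CacheLine> buildLines(std::size_t budgetBytes,
                                  unsigned minExp, unsigned maxExp) {
    const std::size_t perLine = budgetBytes / (maxExp - minExp + 1);
    std::vector<CacheLine> lines;
    for (unsigned n = minExp; n <= maxExp; ++n) {
        CacheLine line;
        line.slotSize = std::size_t{1} << n;
        line.slots.resize(perLine / line.slotSize); // as many as fit
        for (Slot& s : line.slots)
            s.storage.resize(line.slotSize);
        lines.push_back(std::move(line));
    }
    return lines;
}
```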

Each line keeps statistics about its usage pattern (hits and misses) and stores its slots in a linked list, with a pointer to the first, Least Recently Used (LRU) slot, and to the last, Most Recently Used (MRU) slot. Every time a slot is used for rendering, it moves to the end of the linked list, becoming the MRU slot.
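
A minimal sketch of that per-line list, assuming an intrusive doubly linked list (the field and type names are illustrative):

```cpp
// Per-line LRU list: head is the LRU slot, tail is the MRU slot.
struct SlotNode {
    SlotNode* prev = nullptr;
    SlotNode* next = nullptr;
    // ... slot payload ...
};

struct LruList {
    SlotNode* lru = nullptr; // first slot: least recently used
    SlotNode* mru = nullptr; // last slot: most recently used

    // Called whenever a slot is used for rendering: unlink it and
    // append it at the tail so it becomes the MRU slot.
    void touch(SlotNode* s) {
        if (s == mru) return;                        // already MRU
        if (s->prev) s->prev->next = s->next;        // unlink
        else         lru = s->next;
        if (s->next) s->next->prev = s->prev;
        s->prev = mru;                               // append at tail
        s->next = nullptr;
        if (mru) mru->next = s;
        mru = s;
    }
};
```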

From time to time (at a user-defined pace) the adaptive cache collects statistics from all the lines and checks whether it needs to adapt. When it finds that it must, it takes the line with the most free memory and removes slots from it to grow the line with the greatest memory requirements. Conceptually the cache splits and merges slots as required, but the current implementation simply deletes the slots and creates new ones. The upcoming one should move some data around to free a contiguous area of a chunk for the new slot.
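
Roughly, and purely as an assumption about how that pass could look (the statistics a line exposes and the rebalancing heuristic are my guesses, not the engine's):

```cpp
// Assumed shape of the adaptation pass: the line with the most free
// memory donates to the line with the biggest unmet demand.
#include <algorithm>
#include <cstddef>
#include <vector>

struct LineStats {
    std::size_t slotSize;  // 2^n bytes for this line
    std::size_t freeBytes; // memory held by slots that went unused
    std::size_t missBytes; // memory the line lacked (misses * slotSize)
};

void adapt(std::vector<LineStats>& lines) {
    auto byFree = [](const LineStats& a, const LineStats& b) {
        return a.freeBytes < b.freeBytes;
    };
    auto byMiss = [](const LineStats& a, const LineStats& b) {
        return a.missBytes < b.missBytes;
    };
    auto donor = std::max_element(lines.begin(), lines.end(), byFree);
    auto needy = std::max_element(lines.begin(), lines.end(), byMiss);
    if (donor == needy || donor->freeBytes < needy->slotSize)
        return; // nothing useful to move

    // Conceptually a split/merge of slots; the current implementation
    // just deletes slots in the donor line and creates fresh ones in
    // the needy line, sized to that line's slot size.
    std::size_t moved =
        donor->freeBytes - donor->freeBytes % needy->slotSize;
    donor->freeBytes -= moved;
    needy->missBytes -= std::min(needy->missBytes, moved);
}
```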

I didn't think it was worth implementing a real memory manager at the time; however, with the ability of both target APIs to discard a subset of a buffer, the new implementation should come rather close to one.

For performance reasons, but somewhat at the expense of safety, the RAM slots are filled directly by the IOModule of the FlExtEngine. Note that a RAM slot may be filled with compressed data and decompressed on the fly, making the cache even more effective since it can hold more data.
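
As an illustration only (fillSlot is a hypothetical stand-in, not the FlExtEngine IOModule API), the direct-fill path could look like this:

```cpp
// The IO layer writes straight into the slot's memory, skipping an
// intermediate staging buffer (faster, but less safe).
#include <cstddef>
#include <cstdint>
#include <vector>

struct RamSlot {
    std::vector<std::uint8_t> bytes;
    bool compressed = false; // compressed slots pack more data
};

void fillSlot(RamSlot& slot, const std::uint8_t* src, std::size_t n,
              bool srcIsCompressed) {
    slot.bytes.assign(src, src + n);   // direct fill, no staging copy
    slot.compressed = srcIsCompressed; // decompressed on the fly, on use
}
```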

If you think the system sounds good: as-is, it isn't. There is a situation in which the cache breaks down and drops the engine to, most likely, a single-digit frame rate, something nobody wants. When a frame needs more slots than are available, the cache starts thrashing: all the slots get used, then the LRU ones are filled again, and in the next frame the first slots needed are missing (they were overwritten by the last items rendered in the previous frame), so the cache has to fill them again, which in turn overwrites the next set of slots. It's easy to see that the whole cache will have to be refilled every frame for a while.

The solution to that problem is simple: each slot stores a date (a frame ID) that is updated every time the slot is used. When the cache is asked for a slot and fetches the LRU one, it checks whether that slot's frame ID differs from the current one; if it doesn't, the slot has already been used this frame, and the cache switches from LRU to MRU mode. Because a Vertex Buffer might be queued for rendering by the driver, its content will likely be copied to a temporary location, in which case the memory budget is exceeded for a short time and performance might degrade, though nowhere near as badly as with a thrashing cache.
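
A sketch of that guard, assuming each slot stores the frame ID of its last use (names are illustrative):

```cpp
// Handing out an LRU slot that was already touched this frame means
// evicting it would thrash, so the line flips into MRU mode.
#include <cstdint>

struct Slot {
    std::uint64_t lastUsedFrame = 0; // updated on every use
};

enum class EvictMode { LRU, MRU };

EvictMode pickEvictMode(const Slot& lruSlot, std::uint64_t currentFrame) {
    // Same frame ID: the "least recently used" slot was in fact used
    // for the frame being built.
    return (lruSlot.lastUsedFrame == currentFrame) ? EvictMode::MRU
                                                   : EvictMode::LRU;
}
```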

As I said earlier, the cache manages both RAM and VRAM: the RAM side is the scene cache, while the VRAM side is the frame cache. When a Renderable is to be drawn, the engine binds the required buffers and, for each of them, performs the following operations (see the sketch after the list):

  • Check whether the VRAM slot is ready for use
  • If it is, bind it
  • Otherwise, check whether the RAM slot is ready for use
    • If it is, refill the VRAM slot with the RAM slot content
    • Otherwise, request the IOModule to load the required data into a RAM slot, then fill the VRAM slot with the RAM slot content

Using that scheme, only the missing data is loaded and cached into RAM, ready for use in the next frame.
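
Putting the list above into code, here is a hedged sketch of the per-buffer path; the Slot and IOModule interfaces are hypothetical stand-ins for the engine's own types:

```cpp
// Per-buffer bind path: prefer the VRAM slot, fall back to the RAM
// slot, and only as a last resort hit the IO module.
struct VramSlot { bool ready = false; /* plus a GPU buffer handle */ };
struct RamSlot  { bool ready = false; /* plus CPU-side bytes      */ };

struct IOModule {
    void load(RamSlot& dst) { dst.ready = true; } // placeholder
};

void bindBuffer(VramSlot& vram, RamSlot& ram, IOModule& io) {
    if (vram.ready) {
        // 1. The VRAM slot is ready: just bind it.
        return;
    }
    if (!ram.ready) {
        // 3. Neither slot is ready: the IOModule loads the required
        //    data into the RAM slot first.
        io.load(ram);
    }
    // 2./3. Refill the VRAM slot with the RAM slot content, then bind.
    vram.ready = true;
}
```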