Caching Strategies Overview

Caching is a technique of storing data so that it can be retrieved faster than from its primary storage, such as a database. To achieve this, we usually cache frequently requested or computed data. Now, let’s take a closer look at the different caching strategies to consider. Keep in mind that every application’s needs are different, and you should pick a caching strategy accordingly.

Cache-Aside Strategy

The cache-aside caching strategy is one of the most widely used. The main idea behind this strategy is that an object is stored in the cache only when an application requests it.

The basic flow of cache-aside caching goes like this:

  1. The application receives a request to fetch some data.
  2. The application checks whether the data exists in the cache:
    • If yes (also known as a cache hit), it takes the data from the cache.
    • If not (a cache miss), it calls the data storage (e.g. a database) to retrieve the data and stores it in the cache.
  3. The application returns the requested data.
[Figure: Cache-aside caching strategy]
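As a rough illustration, here is a minimal cache-aside sketch in Python. The `cache` dict and `db_get_user` function are hypothetical stand-ins for a real cache (e.g. Redis) and a real database query:

```python
cache = {}  # hypothetical stand-in for a real cache such as Redis

def db_get_user(user_id):
    # hypothetical stand-in for a database query
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    user = cache.get(user_id)
    if user is not None:          # cache hit: serve straight from the cache
        return user
    user = db_get_user(user_id)   # cache miss: fall back to the data storage
    cache[user_id] = user         # store it in the cache for next time
    return user
```

Note that the application code itself decides when to read from the cache and when to go to the database; that orchestration is the defining trait of cache-aside.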

The cache-aside strategy fits well for data that is read often but written or updated rarely. User profile info or news stories are two examples of this sort of data. In a typical case, once the data is stored, it is mostly displayed somewhere (e.g. on a webpage) and rarely gets updated.

One downside of this strategy is possible data inconsistency between the cache and the data store. It is therefore important to come up with an efficient eviction policy (deciding when to get rid of “old” data) that fits your application’s data access patterns.
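One common eviction policy is a time-to-live (TTL), where each cached entry expires after a fixed period. A minimal sketch, assuming a five-minute TTL and the same hypothetical `db_get_user` as above:

```python
import time

TTL_SECONDS = 300  # assumed five-minute freshness window
cache = {}         # key -> (value, stored_at)

def db_get_user(user_id):
    # hypothetical stand-in for a database query
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    entry = cache.get(user_id)
    if entry is not None:
        value, stored_at = entry
        if time.time() - stored_at < TTL_SECONDS:
            return value                  # still fresh: cache hit
        del cache[user_id]                # expired: evict the "old" data
    value = db_get_user(user_id)          # miss or expired: go to the data store
    cache[user_id] = (value, time.time())
    return value
```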

Read-Through Caching Strategy

The read-through strategy is a caching approach similar to cache-aside. The difference is that the application doesn’t orchestrate where to read the data from (the cache or the data storage). Instead, it always reads data through the cache, and the cache decides where the data is fetched from. This is a big advantage over the cache-aside strategy, as it leaves the application’s code much cleaner.

[Figure: Read-through caching strategy]
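A minimal read-through sketch: the application only calls the cache, and the cache itself knows how to load missing entries. The `loader` callable here is a hypothetical stand-in for a database query:

```python
class ReadThroughCache:
    def __init__(self, loader):
        self._store = {}
        self._loader = loader  # the cache knows how to fetch missing data

    def get(self, key):
        if key not in self._store:                # cache miss
            self._store[key] = self._loader(key)  # the cache, not the app, hits the data store
        return self._store[key]

def load_user(user_id):
    # hypothetical stand-in for a database query
    return {"id": user_id, "name": "example"}

users = ReadThroughCache(load_user)
user = users.get(42)  # the application never talks to the database directly
```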

Write-Through Caching Strategy

In this strategy, all write operations go through the cache. Every time the application writes, the cache also stores the data in the underlying data storage. Both operations happen in a single transaction, so the write succeeds only if both parts succeed. This adds some latency on writes, but it greatly improves the data inconsistency problem, as the data in the cache and the data storage is always the same. Be careful, though, about what data you write through the cache, so you don’t end up with loads of data living in the cache that is rarely or never read. That leads to unnecessary memory usage; even worse, useful data could get evicted from the cache by not-so-useful data.

On the positive side, because the application only talks to the cache, its code is much cleaner and simpler. This becomes especially obvious if you would otherwise need to duplicate the data-access logic in multiple places in the code.

[Figure: Write-through caching strategy]
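A minimal write-through sketch. `FakeBackend` is a hypothetical stand-in for the data storage; a real implementation would also need transactional coordination so that the cache update is rolled back if the data-store write fails:

```python
class FakeBackend:
    """Hypothetical stand-in for the underlying data storage."""
    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value  # a real backend write could raise on failure

class WriteThroughCache:
    def __init__(self, backend):
        self._store = {}
        self._backend = backend

    def put(self, key, value):
        self._backend.write(key, value)  # write to the data storage first;
        self._store[key] = value         # only cache the value if that succeeded

    def get(self, key):
        return self._store.get(key)

accounts = WriteThroughCache(FakeBackend())
accounts.put("account:1", {"balance": 100})  # cache and backend now agree
```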

When using the write-through caching strategy, it makes sense to also read through the cache, since reads from the cache are fast.

Write-through caching strategy fits best for an application that:

  • Needs to read the same data frequently.
  • Cannot tolerate data loss or inconsistency between the cache and the data storage (“old data”).

One potential example of a system that could use the write-through caching strategy is a banking system.

Write-Behind Caching Strategy

The write-behind caching strategy is similar to the write-through cache in that the application only communicates with the cache and has a single facade to write data through. The difference from the write-through pattern is that the data gets written to the cache first, and only some time later (or on some other trigger) does the data also get written to the underlying data source. That is the key part of this strategy: those operations happen asynchronously.

Writes to the data source can be done in various ways. One option is to “collect” all the writes and then, at some point in time (e.g. when the database load is low), do one batch write to the data source. Another approach is to consolidate the writes into smaller batches: the cache collects, for example, five write operations and then does a batch write to the data source.
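A minimal write-behind sketch using the five-write batching just described. For simplicity the flush is triggered synchronously by the fifth write; a real implementation would flush asynchronously (e.g. from a background thread or timer). `FakeBackend` is again a hypothetical stand-in for the data source:

```python
class FakeBackend:
    """Hypothetical stand-in for the underlying data source."""
    def __init__(self):
        self.rows = {}

    def write_batch(self, items):
        self.rows.update(items)  # one batched write instead of many small ones

class WriteBehindCache:
    BATCH_SIZE = 5  # consolidate writes into batches of five

    def __init__(self, backend):
        self._store = {}
        self._backend = backend
        self._pending = []

    def put(self, key, value):
        self._store[key] = value            # the cache is updated immediately
        self._pending.append((key, value))  # the data-source write is deferred
        if len(self._pending) >= self.BATCH_SIZE:
            self.flush()

    def flush(self):
        self._backend.write_batch(dict(self._pending))
        self._pending = []
```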

Having asynchronous writes to the cache and the data source helps to greatly decrease write latency. In addition, it helps to offload the data source. On the not-so-positive side, it widens the data inconsistency window between the cache and the data source, which leads to one extra concern: if someone fetches the data directly from the data source before it has been written there, they can get out-of-date data.

To tackle the data inconsistency problem, a system can combine the write-behind strategy with the read-through strategy. That way, up-to-date data is always read from the cache.

Compared to the write-through strategy, write-behind fits better for systems with a large write volume that can tolerate some data inconsistency.

Conclusion

In this blog post we have covered four different caching strategies: cache-aside, read-through, write-through, and write-behind. To choose one, you need to consider the characteristics of your system. Also, in a real-world system you are most likely to have a combination of these strategies working hand in hand.
