Of the impact of caching on your design (III)

This is the last of a three-part series, which started here. The first two parts described two common and critical issues that cache users face when designing an application. This article focuses on a possible solution. I have never actually tested this solution, although I believe it is feasible and useful. It can be described in three major steps.

First, for each type to cache, create an immutable implementation of it. The design of the immutable structure is beyond the scope of this article, but it needs to solve a particular problem of its own: serialization efficiency. In the proposed model, the serialized form of the object can be stored in the object itself when it is created. Furthermore, attributes could be extracted from that serialized version as needed, although this last point may not be useful in some cases.
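
As a rough illustration of this first step, here is a minimal sketch of an immutable type that builds its serialized form once, at construction. The `ImmutableQuote` type, its fields and the `symbol|price` wire format are assumptions made up for the example (the format also assumes the symbol never contains a `|`); a real application would pick its own encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical cached type: immutable, and carrying its own serialized form.
final class ImmutableQuote {
    private final String symbol;
    private final long priceInCents;
    private final byte[] serialized; // computed once, when the object is created

    ImmutableQuote(String symbol, long priceInCents) {
        this.symbol = symbol;
        this.priceInCents = priceInCents;
        // Assumed toy format "symbol|price"; the symbol must not contain '|'.
        this.serialized = (symbol + "|" + priceInCents).getBytes(StandardCharsets.UTF_8);
    }

    // Rebuilds an instance from the serialized form produced by toBytes().
    static ImmutableQuote fromBytes(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\\|");
        return new ImmutableQuote(parts[0], Long.parseLong(parts[1]));
    }

    String symbol() { return symbol; }

    long priceInCents() { return priceInCents; }

    // Returns a copy so that callers cannot alter the stored form.
    byte[] toBytes() { return serialized.clone(); }

    // "Modification" yields a new instance; the original never changes.
    ImmutableQuote withPrice(long newPriceInCents) {
        return new ImmutableQuote(symbol, newPriceInCents);
    }
}
```

Because the bytes are built in the constructor, handing the object to a cache that stores serialized values costs one array copy rather than a full serialization pass each time.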

Second, whenever an object is read from a cache, it is not the immutable instance that is returned but a dynamic proxy to it. This has a number of advantages. It solves the concurrency issue described in part II. It makes the application resistant to evictions without any structural changes. It makes it possible to propagate changes to cached objects even after they have been taken from the cache. It enables predictive or lazy loading of objects into the cache depending on the need. More importantly, such a solution lets developers completely ignore whether an object is cached, as long as they are working outside the cache or the proxy itself. This in turn makes it easy to change a type from not cached to cached.
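
To make the second step more concrete, here is a minimal sketch based on `java.lang.reflect.Proxy`. The `Account` interface, the `ImmutableAccount` class and the `ProxyingCache` with its loader function are hypothetical names invented for the example. The handler looks the entry up again on every call, which is one simple way (not the only one) to get eviction resistance and change propagation; it also assumes the loader never returns null.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-only view of a cached type.
interface Account {
    String owner();
    long balance();
}

// Hypothetical immutable implementation stored in the cache.
final class ImmutableAccount implements Account {
    private final String owner;
    private final long balance;

    ImmutableAccount(String owner, long balance) {
        this.owner = owner;
        this.balance = balance;
    }

    public String owner() { return owner; }
    public long balance() { return balance; }
}

// Toy cache that never hands out the immutable instance itself.
final class ProxyingCache {
    private final Map<String, Account> entries = new ConcurrentHashMap<>();
    private final Function<String, Account> loader; // assumed to never return null

    ProxyingCache(Function<String, Account> loader) { this.loader = loader; }

    void put(String key, Account value) { entries.put(key, value); }

    void evict(String key) { entries.remove(key); }

    // Returns a dynamic proxy that resolves the current entry on each call.
    Account get(String key) {
        return (Account) Proxy.newProxyInstance(
            Account.class.getClassLoader(),
            new Class<?>[] { Account.class },
            (proxy, method, args) -> {
                // Reload lazily if the entry was evicted since the last call.
                Account current = entries.computeIfAbsent(key, loader);
                try {
                    return method.invoke(current, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the target's own exception
                }
            });
    }
}
```

With this sketch, a caller that keeps the result of `cache.get("alice")` sees the new balance after a later `put`, and keeps working after an `evict` because the loader is called again. Each individual call reads one immutable snapshot, but two successive calls may see different versions.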

Third, implement a transaction mechanism in the proxy or as a second type of proxy. The point is to let changes made through a proxy that implements the mutable interface of a type stay in the proxy. This proxy can be used to detect changes and to control the creation and/or modification of the transaction context accordingly. That transaction context in turn offers control over the scope of the transaction (when the cache returns the modified proxy representation rather than a ‘clean’ proxy). This solves the issue highlighted in part I.
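
The third step could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `MutableAccount` interface, the `TxCache` and its nested `Transaction` context, and the choice to keep pending changes in a plain map until `commit()`. It handles a single attribute and ignores `Object` methods such as `toString`, which a real handler would have to cover.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical mutable interface exposed to application code.
interface MutableAccount {
    long getBalance();
    void setBalance(long balance);
}

// Hypothetical immutable value actually held by the cache.
final class AccountSnapshot {
    final long balance;
    AccountSnapshot(long balance) { this.balance = balance; }
}

final class TxCache {
    private final Map<String, AccountSnapshot> committed = new ConcurrentHashMap<>();

    void put(String key, long balance) { committed.put(key, new AccountSnapshot(balance)); }

    long committedBalance(String key) { return committed.get(key).balance; }

    Transaction begin() { return new Transaction(); }

    // Transaction context: changes stay here until commit() or rollback().
    final class Transaction {
        private final Map<String, Long> pending = new HashMap<>();

        MutableAccount get(String key) {
            return (MutableAccount) Proxy.newProxyInstance(
                MutableAccount.class.getClassLoader(),
                new Class<?>[] { MutableAccount.class },
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "setBalance":
                            // The change is detected and kept in the context.
                            pending.put(key, (Long) args[0]);
                            return null;
                        case "getBalance":
                            // Inside the transaction, pending changes win.
                            Long changed = pending.get(key);
                            return changed != null ? changed : committed.get(key).balance;
                        default:
                            throw new UnsupportedOperationException(method.getName());
                    }
                });
        }

        // Publishes the pending changes as new immutable snapshots.
        void commit() {
            pending.forEach((k, v) -> committed.put(k, new AccountSnapshot(v)));
            pending.clear();
        }

        void rollback() { pending.clear(); }
    }
}
```

A proxy obtained from one `Transaction` sees its own uncommitted change, while a proxy obtained from another context keeps seeing the ‘clean’ committed value until `commit()` is called.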

In conclusion, the proposed solution is applicable to a wide range of languages and situations, although it was strongly influenced by my experience building Java applications. It requires a significant development effort to put in place the proxy handlers and the transaction scheme around the cache. However, this is a one-time effort that solves problems you will need to solve one way or another anyway. The approach, along with a few design guidelines for immutable objects, applies to all cached objects, making it very expensive for a small application and very cheap for a large one. It also has the benefit of removing the distraction of caching issues from developers. Finally, it is probably possible to make it a library, although accommodating some of the flexibility a real-life application needs may result in yet another XML hell.

4 Responses to “Of the impact of caching on your design (III)”

  1. Hering Cheng Says:

    In my previous job at the Pacific Exchange (now part of NYSE), we adopted an approach similar to the one proposed here, where application code interacts with a “proxy” to the actual cached object. We did not use Java dynamic proxy; instead, we used home-grown code generation technology where all of our domain objects are defined in XML files and a generator writes the Java classes. Our proxy also serves the double duty of providing the mutable interface; it merges the job of the proxy mentioned in Part II with that in Part III. We did not use Java dynamic proxy because it appeared to be more difficult to debug. At the company where Denis and I work we use Java dynamic proxy, and my experience is that it is difficult to track down where the actual logic of an accessor method resides.

    One issue that I’d like to see some comments on is how to deal with repeatable reads within a transaction. This is an issue for which we did not implement a generic solution at Pacific Exchange. Basically, we look up a piece of reference data (by primary key, say) at point X in a transaction and we read the same data again (using the same PK, say) at point Y within the same transaction. How do we make sure that we see exactly the same data at both points if that piece of reference data was modified and committed between times X and Y by another transaction? A naive solution is to pessimistically “clone” the data even for read-only purposes, at the cost of performance. Another is to make the transaction (or the “dependent” proxy) “remember” the version (of the reference data, or “independent”, proxy) that was retrieved at point X and look up the same version at point Y.

  2. Denis Says:

    Hering, I agree with you that dynamic proxies are more difficult to debug. This could be addressed in a number of ways, but it is also the price of dynamism.

    As for the second part of your comment, can you spell out your use case? It seems somewhat at odds with my premise that you don’t want to hold on to stale data when using a cache.

  3. Hering Cheng Says:

    In many cases, the design outlined in the 3-part series is adequate where one always reads the “latest” data from cache. In other situations we /do/ want to hold on to “stale” data when reading the same data repeatedly from the cache. For example, when processing a base (or “prototype” or “template”) quote from a market maker in an exchange, one of the initial validations (let’s call it “A”) may be to verify if the market maker is still “in the market” (e.g., if the market maker’s computer has connectivity to the exchange). A later step of the processing (let’s call it “B”) of the quote (within the same transaction) may again attempt to look up that same market maker to retrieve the parameters of automatic quote generation at different price levels based on the quote template. In this case, we would want to use the same data at point “B” that was used at point “A”.

    Now, for each example I can give, there will always be workarounds. For the specific example above, there are at least two workarounds:

    (1) All static/reference data can be collected (and selectively cloned) at the beginning of a transaction and passed on to the later parts of the transaction as part of a “context”. This is the approach I adopted at the exchange, mainly because it allows one to clearly delineate between computation and I/O, a requirement when processing hundreds of thousands of transactions a second as many derivatives exchanges do. However, this is difficult to accomplish and requires careful structuring of code.

    (2) The data looked up at point “A” and point “B” can be separated into different caches and updated independently.

    Regardless of the workarounds, ideally a cache framework would give its users choices on isolation levels, similar to what contemporary RDBMSs do.

  4. Denis Says:

    I am wary of the complexity that isolation levels would introduce. As you well know, I am absolutely against confusing cache and database.

    In most cases I believe using a context gives better structure, although that context can be managed implicitly, in a manner similar to how we manage transactions.

    This kind of freeze mechanism is very appealing to me. It avoids cloning. It is similar to isolation levels but it seems simpler to me (I may be wrong).
