<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Of the impact of caching on your design (III)</title>
	<atom:link href="http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/feed/" rel="self" type="application/rss+xml" />
	<link>http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/</link>
	<description>Notes on software development</description>
	<pubDate>Tue, 06 Jan 2009 00:30:36 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: Denis</title>
		<link>http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/#comment-212</link>
		<dc:creator>Denis</dc:creator>
		<pubDate>Thu, 10 Jan 2008 21:02:48 +0000</pubDate>
		<guid isPermaLink="false">http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/#comment-212</guid>
		<description>I am wary of the complexity that isolation levels would introduce. As you well know, I am absolutely against confusing cache and database.

In most cases I believe a context gives a better structure, although that context can be managed implicitly, in a manner similar to how we manage transactions.

This kind of freeze mechanism is very appealing to me. It avoids cloning. It is similar to isolation levels but it seems simpler to me (I may be wrong).</description>
		<content:encoded><![CDATA[<p>I am wary of the complexity that isolation levels would introduce. As you well know, I am absolutely against confusing cache and database.</p>
<p>In most cases I believe a context gives a better structure, although that context can be managed implicitly, in a manner similar to how we manage transactions.</p>
<p>This kind of freeze mechanism is very appealing to me. It avoids cloning. It is similar to isolation levels but it seems simpler to me (I may be wrong).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hering Cheng</title>
		<link>http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/#comment-207</link>
		<dc:creator>Hering Cheng</dc:creator>
		<pubDate>Wed, 09 Jan 2008 19:38:21 +0000</pubDate>
		<guid isPermaLink="false">http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/#comment-207</guid>
		<description>In many cases, the design outlined in the 3-part series is adequate where one always reads the "latest" data from cache.  In other situations we /do/ want to hold on to "stale" data when reading the same data repeatedly from the cache.  For example, when processing a base (or "prototype" or "template") quote from a market maker in an exchange, one of the initial validations (let's call it "A") may be to verify if the market maker is still "in the market" (e.g., if the market maker's computer has connectivity to the exchange).  A later step of the processing (let's call it "B") of the quote (within the same transaction) may again attempt to look up that same market maker to retrieve the parameters of automatic quote generation at different price levels based on the quote template.  In this case, we would want to use the same data at point "B" that was used at point "A".

Now, for each example I can give, there will always be workarounds.  For the specific example above, there are at least two workarounds: 

(1) All static/reference data can be collected -- and selectively cloned -- at the beginning of a transaction and passed on to the later part of the transaction as part of a "context".  This is the approach I adopted at the exchange, mainly because it allows one to clearly delineate between computation and I/O -- a requirement when processing hundreds of thousands of transactions a second, as many derivatives exchanges do.  However, this is difficult to accomplish and requires careful structuring of code.

(2) The data looked up at point "A" and point "B" can be separated into different caches and updated independently.

Regardless of the workarounds, ideally a cache framework would give its users choices on isolation levels, similar to what contemporary RDBMSs do.</description>
		<content:encoded><![CDATA[<p>In many cases, the design outlined in the 3-part series is adequate where one always reads the &#8220;latest&#8221; data from cache.  In other situations we /do/ want to hold on to &#8220;stale&#8221; data when reading the same data repeatedly from the cache.  For example, when processing a base (or &#8220;prototype&#8221; or &#8220;template&#8221;) quote from a market maker in an exchange, one of the initial validations (let&#8217;s call it &#8220;A&#8221;) may be to verify if the market maker is still &#8220;in the market&#8221; (e.g., if the market maker&#8217;s computer has connectivity to the exchange).  A later step of the processing (let&#8217;s call it &#8220;B&#8221;) of the quote (within the same transaction) may again attempt to look up that same market maker to retrieve the parameters of automatic quote generation at different price levels based on the quote template.  In this case, we would want to use the same data at point &#8220;B&#8221; that was used at point &#8220;A&#8221;.</p>
<p>Now, for each example I can give, there will always be workarounds.  For the specific example above, there are at least two workarounds: </p>
<p>(1) All static/reference data can be collected &#8212; and selectively cloned &#8212; at the beginning of a transaction and passed on to the later part of the transaction as part of a &#8220;context&#8221;.  This is the approach I adopted at the exchange, mainly because it allows one to clearly delineate between computation and I/O &#8212; a requirement when processing hundreds of thousands of transactions a second, as many derivatives exchanges do.  However, this is difficult to accomplish and requires careful structuring of code.</p>
<p>(2) The data looked up at point &#8220;A&#8221; and point &#8220;B&#8221; can be separated into different caches and updated independently.</p>
<p>Regardless of the workarounds, ideally a cache framework would give its users choices on isolation levels, similar to what contemporary RDBMSs do.</p>
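<p>A minimal Java sketch of workaround (1), under stated assumptions: the names MarketMaker, TransactionContext and capture are hypothetical and not taken from the exchange code, and the cached records are assumed to be immutable, so copying references at the start of the transaction is enough (mutable records would need a real clone at that point).</p>

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical immutable reference-data record held in the cache. */
final class MarketMaker {
    final String id;
    final boolean inMarket;
    final int quoteLevels;

    MarketMaker(String id, boolean inMarket, int quoteLevels) {
        this.id = id;
        this.inMarket = inMarket;
        this.quoteLevels = quoteLevels;
    }
}

/** Hypothetical per-transaction context: reference data captured once, up front. */
final class TransactionContext {
    private final Map<String, MarketMaker> marketMakers;

    private TransactionContext(Map<String, MarketMaker> marketMakers) {
        this.marketMakers = marketMakers;
    }

    /** Copies the requested entries out of the cache at the start of the transaction. */
    static TransactionContext capture(Map<String, MarketMaker> cache, Set<String> ids) {
        Map<String, MarketMaker> snapshot = new HashMap<>();
        for (String id : ids) {
            MarketMaker marketMaker = cache.get(id);
            if (marketMaker != null) {
                snapshot.put(id, marketMaker);
            }
        }
        return new TransactionContext(Collections.unmodifiableMap(snapshot));
    }

    /** Later steps read from the context only, never from the cache. */
    MarketMaker marketMaker(String id) {
        return marketMakers.get(id);
    }
}
```

<p>In this sketch, validation "A" and step "B" would both call marketMaker(id) on the same context, so a cache update committed in between is not visible to either of them.</p>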
]]></content:encoded>
	</item>
	<item>
		<title>By: Denis</title>
		<link>http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/#comment-202</link>
		<dc:creator>Denis</dc:creator>
		<pubDate>Tue, 08 Jan 2008 04:44:39 +0000</pubDate>
		<guid isPermaLink="false">http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/#comment-202</guid>
		<description>Hering, I agree with you that dynamic proxies are more difficult to debug. This could be addressed in a number of ways, but it is also the price of dynamism.

As for the second part of your comment, can you spell out your use case? It seems to contradict my premise that you don't want to hold onto stale data when using a cache.</description>
		<content:encoded><![CDATA[<p>Hering, I agree with you that dynamic proxies are more difficult to debug. This could be addressed in a number of ways, but it is also the price of dynamism.</p>
<p>As for the second part of your comment, can you spell out your use case? It seems to contradict my premise that you don&#8217;t want to hold onto stale data when using a cache.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hering Cheng</title>
		<link>http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/#comment-188</link>
		<dc:creator>Hering Cheng</dc:creator>
		<pubDate>Wed, 02 Jan 2008 19:18:53 +0000</pubDate>
		<guid isPermaLink="false">http://digitalbrikes.com/onebrikeatatime/2007/12/26/of-the-impact-of-caching-on-your-design-iii/#comment-188</guid>
		<description>In my previous job at the Pacific Exchange (now part of NYSE), we adopted an approach similar to the one proposed here, where application code interacts with a "proxy" to the actual cached object.  We did not use Java dynamic proxy; instead, we used home-grown code generation technology where all of our domain objects are defined in XML files and a generator writes the Java classes.  Our proxy also serves the double duty of providing the mutable interface; it merges the job of the proxy mentioned in Part II with that in Part III.  We did not use Java dynamic proxy because it appeared to be more difficult to debug.  At the company where Denis and I work we use Java dynamic proxy, and my experience is that it is difficult to track down where the actual logic of an accessor method resides.

One issue that I'd like to see some comments on is how to deal with repeatable reads within a transaction.  This is an issue for which we did not implement a generic solution at Pacific Exchange.  Basically, we look up a piece of reference data (by primary key, say) at point X in a transaction and we read the same data again (using the same PK, say) at point Y within the same transaction.  How do we make sure that we see exactly the same data at both points if that piece of reference data was modified and committed between times X and Y by another transaction?  A naive solution is to pessimistically "clone" the data even for read-only purposes, at the cost of performance.  Another is to make the transaction (or the "dependent" proxy) "remember" the version (of the reference data, or "independent", proxy) that was retrieved at point X and look up the same version at point Y.</description>
		<content:encoded><![CDATA[<p>In my previous job at the Pacific Exchange (now part of NYSE), we adopted an approach similar to the one proposed here, where application code interacts with a &#8220;proxy&#8221; to the actual cached object.  We did not use Java dynamic proxy; instead, we used home-grown code generation technology where all of our domain objects are defined in XML files and a generator writes the Java classes.  Our proxy also serves the double duty of providing the mutable interface; it merges the job of the proxy mentioned in Part II with that in Part III.  We did not use Java dynamic proxy because it appeared to be more difficult to debug.  At the company where Denis and I work we use Java dynamic proxy, and my experience is that it is difficult to track down where the actual logic of an accessor method resides.</p>
<p>One issue that I&#8217;d like to see some comments on is how to deal with repeatable reads within a transaction.  This is an issue for which we did not implement a generic solution at Pacific Exchange.  Basically, we look up a piece of reference data (by primary key, say) at point X in a transaction and we read the same data again (using the same PK, say) at point Y within the same transaction.  How do we make sure that we see exactly the same data at both points if that piece of reference data was modified and committed between times X and Y by another transaction?  A naive solution is to pessimistically &#8220;clone&#8221; the data even for read-only purposes, at the cost of performance.  Another is to make the transaction (or the &#8220;dependent&#8221; proxy) &#8220;remember&#8221; the version (of the reference data, or &#8220;independent&#8221;, proxy) that was retrieved at point X and look up the same version at point Y.</p>
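<p>A rough Java sketch of the second idea, where the transaction remembers the version it saw at point X and asks for that same version at point Y. VersionedCache and RepeatableReadView are hypothetical names used only for illustration, and the sketch assumes the cache retains old versions; it never prunes them, which a real cache would have to do.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical cache that keeps every committed version of each entry. */
final class VersionedCache<K, V> {
    private final Map<K, TreeMap<Long, V>> versionsByKey = new HashMap<>();
    private long nextVersion = 1;

    /** Stores a new version of the value and returns its version number. */
    synchronized long commit(K key, V value) {
        long version = nextVersion++;
        versionsByKey.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
        return version;
    }

    /** Returns the newest committed version number for the key. */
    synchronized long latestVersion(K key) {
        TreeMap<Long, V> versions = versionsByKey.get(key);
        if (versions == null) {
            throw new NoSuchElementException("no entry for key " + key);
        }
        return versions.lastKey();
    }

    /** Returns the value stored under exactly this version, or null. */
    synchronized V get(K key, long version) {
        TreeMap<Long, V> versions = versionsByKey.get(key);
        return versions == null ? null : versions.get(version);
    }
}

/** Per-transaction view: the first read of a key pins its version. */
final class RepeatableReadView<K, V> {
    private final VersionedCache<K, V> cache;
    private final Map<K, Long> pinnedVersions = new HashMap<>();

    RepeatableReadView(VersionedCache<K, V> cache) {
        this.cache = cache;
    }

    /** Reads the key at the version pinned by this transaction's first read. */
    V read(K key) {
        Long version = pinnedVersions.get(key);
        if (version == null) {
            version = cache.latestVersion(key);
            pinnedVersions.put(key, version);
        }
        return cache.get(key, version);
    }
}
```

<p>Each transaction would create its own RepeatableReadView; a commit made by another transaction between X and Y adds a newer version but leaves the pinned one readable, so no clone is made for read-only access.</p>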
]]></content:encoded>
	</item>
</channel>
</rss>
