Archive for the ‘Design’ Category

Test Driven Development and Design by Contract

Saturday, August 16th, 2008

I have to admit that my first idea for this article was a rant about how TDD goes well beyond what design by contract gives, as I felt Bertrand Meyer missed that point in Computer magazine dated August 2008. Then I thought better of it, checked for resources through Google and discovered that I am late to the party, with at least three very interesting articles on this very subject: DbC and Testing, more on TDD vs Design by Contract and Test Driven Development and design by contract, friend or foe. On the other hand I did not find a satisfying tool to link TDD and DbC (don’t you love acronyms ?). So here is my attempt at a story describing what I would like.

Let’s imagine the following scenario. I need to write the function f of a service S.

That function will take an id and a number as input, find an object responding to interface O from a cache responding to interface C.

In a TDD manner I start by writing the test. In the process I discover that S needs to know about a C, so I stub a C for the purpose of returning an O to my service. There I specify the contract of the find message of C: the passed id is not null, and it returns a non-null O or throws a NotFound exception.

Now I would like the calls to the C stub to verify that contract, to ensure I am testing in accordance with the contract. This way if the contract changes in a way that is incompatible with my assumptions, my test will break and signal the problem.

I would further like the contracts to be verified automatically against the instances of implementations of C throughout my development cycle. And I would like to be able to selectively keep the contract enforcement in my deployed product.
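To make the wish a little more concrete, here is a minimal sketch in Java of one way the contract on C could be captured once and applied to both a stub and a real implementation. Everything in it is an assumption made for illustration: the shapes of O, C and NotFound, the Contracts and ContractViolation names, and my reading that any exception other than NotFound breaks the contract. It relies on the standard java.lang.reflect.Proxy mechanism and is not an existing tool.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical shapes for the O and C interfaces of the story.
interface O { String id(); }

interface C { O find(String id); }

class NotFound extends RuntimeException {
    NotFound(String id) { super("no object for id " + id); }
}

// Raised when a caller or an implementation breaks the contract (hypothetical name).
class ContractViolation extends RuntimeException {
    ContractViolation(String message) { super(message); }
}

final class Contracts {
    // Wraps any C, stub or real implementation, so that each call to find
    // is checked against the contract described in the text.
    static C checked(C delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean isFind = method.getName().equals("find");
            if (isFind && args[0] == null) {
                throw new ContractViolation("precondition: id must not be null");
            }
            Object result;
            try {
                result = method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                Throwable cause = e.getCause();
                if (isFind && !(cause instanceof NotFound)) {
                    throw new ContractViolation("find may only throw NotFound, got " + cause);
                }
                throw cause;
            }
            if (isFind && result == null) {
                throw new ContractViolation("postcondition: find must not return null");
            }
            return result;
        };
        return (C) Proxy.newProxyInstance(
                C.class.getClassLoader(), new Class<?>[] {C.class}, handler);
    }
}

final class ContractDemo {
    public static void main(String[] args) {
        // The stub used in the test of S, wrapped so the test itself honours the contract.
        C stub = Contracts.checked(id -> {
            if (id.equals("42")) {
                return () -> id;
            }
            throw new NotFound(id);
        });
        System.out.println(stub.find("42").id());
        try {
            stub.find(null);
        } catch (ContractViolation e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same Contracts.checked call could wrap a production implementation of C, which is where selective enforcement in the deployed product would be decided.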

The net result is not to get rid of the defensive programming but to move it into a well defined area as a specific concern that crosscuts my software. I would even have the choice to test whether the contracts actually specify what I intended. It would also remove the necessity to test the defensive programming code in the same artefact as the functional code.

Why all that ? Because TDD, beyond creating regression test artefacts, is also a way to specify the component being created. It would therefore be nice to be able to capture, in separate executable artefacts, the implied specifications made on the related components.

Does anyone know of such a tool ? Does anyone want to write it ? Am I crazy ?

Zen and the art of software development

Wednesday, April 16th, 2008

One striking thing in the Zen philosophy is the reconciliation of form and substance. In western cultures form is often disregarded in favour of substance. It is actually considered a distraction, a ‘nice to have’. In Zen, form and substance cannot be separated. Form must be the reflection of substance so that the essence of a thing can be grasped. Well, that is my (shallow) understanding of it.

How does that apply to software development ? In the same way it applies to all design work. When we develop software we should work hard and refine our work until its form, its structure, matches its purpose.

Another aspect of Zen is the necessity of constant practice and refinement to achieve mastery in one’s art (who said Kaizen ?).

This translates into a fluid attention to detail. In other words, details are taken care of, not by a concentrated effort but by the natural flow of thoughts that creates the work of art that your software should be. Attention to detail is probably not a good term for that as it expresses exactly the opposite of what needs to be achieved. In effect, when we master our art, our mind should be as the water that smoothes the pebble by constantly flowing around it.

The meaning of this is that those who work with an exclusive attention to the final product are bound to deliver a rough product. And those who focus their minds on successive details are bound to deliver a hacked product. All things considered, it is not that far from the Greek approach to art.

My readings on Zen: The Unfettered Mind and The Book of Five Rings

Your company’s app

Friday, March 14th, 2008

Hilarious.

Your company’s app:

(Via Monoscope.)

Jay Fields on Testing

Tuesday, February 5th, 2008

Three good articles recently posted on Jay Fields’ blog. I may have been guilty of some of the described behaviours… mea maxima culpa !

Behavior Based Testing

State Based Testing

Testing: High Implementation Specification Smell

Especially check out Martin Fowler’s Mocks Aren’t Stubs.

MVC’s three classes myth

Tuesday, February 5th, 2008

Model View Controller is a graphical interface design pattern that aims at ensuring separation between the presented thing and its presentation by loosely coupling them with a controller. The primary benefit of this is the ability to modify one side with minimal impact on the other (the controller providing the flexibility required). It also enables a single thing to support multiple representations of itself. Finally it enables composition of smaller elements designed in this way into larger interfaces. This last part is mostly theoretical in many cases.

I was trained using the PAC model, which is close enough to be deemed a variety of MVC. What I was taught is that although there are three conceptual parts in a GUI, this does not imply any particular implementation, let alone an object oriented one. So why do we feel compelled to have at least three classes named (more or less) Model, View and Controller ?

I believe there are at least two reasons for this.

The first one is that many among us are not that confident in GUI development. We tend either to overlook it as not important or to find it too cumbersome and boring to pay much attention to it. So instead of thinking carefully about what we are doing, we try to apply a well known recipe, I mean pattern, blindly. This attitude leads in turn to more complex code, less refactoring and testing than in other places, and less design effort devoted to it. The final result is that GUI code is difficult to develop, hard to maintain and generally painful to produce. This, in turn, entirely justifies the original behaviour.

The second reason is that it is easier to explain to people (developers, management…) who are not familiar with GUI development. It is much simpler to say that you follow an MVC pattern, here is the Model, here is the View and here is the Controller, than to try to explain that Model and View are fairly easy to identify but that the controller is actually composed of a myriad of listeners, callbacks, small coordination classes, glue code and such. It is even harder to explain that part of the Controller is implemented in a class called Model and another part exists in a class called View.

So basically, everyone is happy and that is why we should continue to do it (which I have to admit is not the conclusion I had in mind when I started this article). However, we need to remember this is largely a myth (or a tale), that the reality is different and that we need to plan, design and act based on reality.

As always, comments are very welcome.

A cache should be read through

Saturday, January 12th, 2008

Although the subject of this article also has an impact on software design, it is a bit more subtle (and simple) than what was presented in the previous three part series.

Read through means that the cache reads from the data source when it does not already hold a requested element. This is very important because it implies that the cache is NOT the data source. All too frequently I find there is a desire to replace the data source with the cache. This is usually the result of past misuses of databases and results in misuse of caches.

To clarify my point of view: as its name indicates, a cache contains a sub part of a larger data set. If it achieves its goal, it contains the most useful subset of data (according to some utility function, which is often "most often used"). A consequence of this is that it does not contain the whole data set, and the subset it contains is susceptible to change over time, evicting elements and adding new ones.

Now each element of the data set is useful or it should be removed. It therefore needs to be accessible to the application. If the cache is read through, it is able to query the data source and provide the application with any element from the data set, and also to decide which elements should become part of the cache and which should go. If the cache is not read through, the same effect must be achieved in another way: for example, the application could query the data source when an element is not found in the cache, then offer it to the cache for inclusion and finally provide it to the piece of code that actually needs it. This is achievable (we did precisely that for various reasons in my current project) but it is very cumbersome and imposes very painful constraints on user interactions.
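As an illustration of the read through behaviour, here is a minimal Java sketch. The ReadThroughCache name, the Function standing in for the data source and the choice of a least recently used eviction policy are all assumptions made for the example, not a description of any particular caching product.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a small LRU cache that reads through to its data source on a miss.
final class ReadThroughCache<K, V> {
    private final Function<K, V> dataSource; // stands in for the real data source
    private final Map<K, V> entries;

    ReadThroughCache(int capacity, Function<K, V> dataSource) {
        this.dataSource = dataSource;
        // Access-ordered map: the least recently used entry is evicted beyond capacity.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // The application only ever asks the cache; a miss is resolved by the cache itself.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = dataSource.apply(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    // Exposed only to observe evictions in the example.
    synchronized boolean holds(K key) {
        return entries.containsKey(key);
    }
}

final class ReadThroughDemo {
    public static void main(String[] args) {
        ReadThroughCache<Integer, String> cache =
                new ReadThroughCache<>(2, key -> "element " + key);
        cache.get(1);
        cache.get(2);
        cache.get(3); // the cache holds two entries, so element 1 is evicted
        System.out.println(cache.holds(1));
        System.out.println(cache.get(1)); // still reachable: the cache reads it again
    }
}
```

The point of the sketch is that an evicted element stays reachable through the same get call, so the calling code never needs to know what the cache currently holds.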

Another good reason to make the cache read through is to fight the assumption that the cache constantly holds the whole data set. If this is a requirement of the application then it is not a cache it should use. If it is not a requirement of the application, it is a very dangerous assumption to make, as it will result in very difficult fixes when the data set reaches a size that makes it impossible to keep it entirely in cache. This temptation is all the stronger because caches are often used to cache the object-relational mapping. But it is an assumption that costs a lot of time and money when it becomes false (again this is rooted in my personal experience at Calypso Technology). This kind of assumption also makes data set maintenance (new elements or updates from external sources) more complex.

A corollary of having a read through cache is that it should be able to handle querying in some way, regardless of whether the data is in cache or not and provide a complete result set.

Also read through caching does NOT imply write through. If the cache is read through, it is much simpler to manage writes outside of the cache.

I realize that it could be argued that the caching mechanisms themselves can be isolated from the read/write/query management and still be presented behind a unified facade to application developers. I would contend that conceptually, from the application developer’s point of view the cache is the facade.

Of the impact of caching on your design (III)

Wednesday, December 26th, 2007

This is the last of a three part series which started here. The first two parts aimed at describing two common and critical issues that cache users face when designing an application. This article will focus on a possible solution. I never actually tested this solution although I believe it is feasible and useful. It can be described in three major steps.

First, for each type to cache, create an immutable implementation of it. The immutable structure is beyond the scope of this article, but it needs to solve a particular problem of its own, which is serialization efficiency. In the proposed model the serialized form of the object can be stored in the object itself when it is created. Furthermore, attributes could be extracted from that serialized version as needed. This last point may not be useful in some cases.

Second, whenever an object is read from a cache, it is not the immutable instance that is returned but a dynamic proxy to it. There are a number of advantages to this. One is that it solves the concurrency issue described in part II. It also makes the application resistant to evictions without any structural changes. It makes it possible to propagate changes to cached objects even after they have been taken from the cache. It enables predictive or lazy loading of objects in the cache depending on the need. More importantly, such a solution enables developers to completely ignore the fact that an object may or may not be cached when they are working outside the cache or the proxy itself. This in turn makes it easy to change a type from not cached to cached.
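The second step can be sketched in Java with the standard java.lang.reflect.Proxy class. Account, ImmutableAccount and AccountCache are hypothetical names invented for the example, and the sketch only covers the propagation of updates to references that were handed out earlier. Reading through on eviction and lazy loading are left out.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical read-only interface of a cached type.
interface Account {
    String id();
    long balance();
}

// Immutable implementation: this is what the cache actually stores.
final class ImmutableAccount implements Account {
    private final String id;
    private final long balance;

    ImmutableAccount(String id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    public String id() { return id; }
    public long balance() { return balance; }
}

final class AccountCache {
    private final Map<String, Account> entries = new ConcurrentHashMap<>();

    void put(Account account) {
        entries.put(account.id(), account);
    }

    // Returns a dynamic proxy bound to the id, never the cached instance itself.
    Account get(String id) {
        return (Account) Proxy.newProxyInstance(
                Account.class.getClassLoader(),
                new Class<?>[] {Account.class},
                (proxy, method, args) -> {
                    // The current instance is resolved on every call, so a holder
                    // of the proxy always sees the latest version in the cache.
                    Account current = entries.get(id);
                    if (current == null) {
                        // A fuller version would read through to the data source here.
                        throw new IllegalStateException("not in cache: " + id);
                    }
                    try {
                        return method.invoke(current, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause();
                    }
                });
    }
}

final class ProxyDemo {
    public static void main(String[] args) {
        AccountCache cache = new AccountCache();
        cache.put(new ImmutableAccount("a1", 100));
        Account held = cache.get("a1");             // taken from the cache once
        cache.put(new ImmutableAccount("a1", 250)); // an update arrives later
        System.out.println(held.balance());         // the held reference is not stale
    }
}
```

Code that receives an Account cannot tell whether it holds a proxy or a plain instance, which is what lets a type move from not cached to cached without touching its users.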

Third, implement a transaction mechanism in the proxy or as a second type of proxy. This is essentially to enable changes made through a proxy that responds to the mutable interface of a type to stay in the proxy. This proxy can be used to detect changes and control the creation and/or modification of the transaction context accordingly. That transaction context in turn will offer control over the scope of the transaction (when will the cache return the modified proxy representation rather than a ‘clean’ proxy). This is to solve the issue highlighted in part I.

In conclusion, the proposed solution is applicable to a wide range of languages and situations although it was strongly influenced by my experience building Java applications. It requires a significant development effort to put in place the proxy handlers and the transaction scheme around the cache. However this is a one time effort that will solve problems you will need to solve one way or another anyway. The approach, along with a few design guidelines for immutable objects, applies to all cached objects, making it very expensive for a small application and very cheap for a large one. It also has the benefit of removing the distraction of caching issues from developers. Finally it is probably possible to make it a library, although accommodating some of the flexibility a real life application needs may result in yet another XML hell.

Of the impact of caching on your design (II)

Friday, December 21st, 2007

This is the second part of a three part series.

The first part was about object references. There is a second pitfall of caching I discovered while working at Calypso Technology and experienced again in my current project. And it seems it is even more overlooked than the direct reference of cached objects. It is the mutation of cached objects.

It is often overlooked because it is very subtle to detect and track down. Obviously it only causes problems in multi threaded applications when two threads concurrently use (not necessarily modify) an object. This is independent of the actual transactional properties of the cache.

The obvious solution is to make a copy of the object to modify before starting to modify it. Another, safer, solution is to put only immutable objects in the cache. This effectively enforces the check-out of the object before it can be modified. It also fits easily in any transaction scheme, as the check-in essentially consists of creating a new immutable object and putting it in the cache.

In my current project we came up with a somewhat elegant solution that uses a dynamic proxy to provide the mutable side of the objects. This makes the implementation cost lighter and provides additional functionality such as showing the exact changes or simulating changes (and even persisting them) without actually applying them. The most significant advantage is the ease of use of that model: it requires an explicit check-out, but the modified object is freely usable inside the transaction without having to know whether it is being modified or not. All this significantly reduces programmatic errors. The drawback is that it requires a significant implementation effort.

Another side of this problem is the requirement to keep track of the version of an object to enable optimistic locking rather than the more expensive pessimistic (preemptive) one.
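Here is a minimal Java sketch of immutable cached objects combined with a version number used for optimistic locking. Quote and VersionedCache are hypothetical names, and the rule that a check-in succeeds only when the cached version still equals the version that was checked out is my assumption about how such a scheme would typically work.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical immutable, versioned value: a change always produces a new instance.
final class Quote {
    final String id;
    final long version;
    final double price;

    Quote(String id, long version, double price) {
        this.id = id;
        this.version = version;
        this.price = price;
    }

    Quote withPrice(double newPrice) {
        return new Quote(id, version + 1, newPrice);
    }
}

final class VersionedCache {
    private final ConcurrentMap<String, Quote> entries = new ConcurrentHashMap<>();

    void load(Quote quote) {
        entries.put(quote.id, quote);
    }

    // Check-out needs no lock: the instance is immutable and safe to share across threads.
    Quote checkOut(String id) {
        return entries.get(id);
    }

    // Optimistic check-in: accepted only if the cached version is still the one
    // the caller started from, otherwise the caller has to check out again and retry.
    boolean checkIn(Quote base, Quote modified) {
        AtomicBoolean accepted = new AtomicBoolean(false);
        entries.computeIfPresent(base.id, (key, current) -> {
            if (current.version == base.version) {
                accepted.set(true);
                return modified;
            }
            return current;
        });
        return accepted.get();
    }
}

final class OptimisticDemo {
    public static void main(String[] args) {
        VersionedCache cache = new VersionedCache();
        cache.load(new Quote("q1", 1, 10.0));
        Quote first = cache.checkOut("q1");
        Quote second = cache.checkOut("q1");
        System.out.println(cache.checkIn(first, first.withPrice(11.0)));   // first writer wins
        System.out.println(cache.checkIn(second, second.withPrice(12.0))); // stale base is rejected
    }
}
```

No thread ever observes a half modified Quote, and the version comparison replaces the lock a pessimistic scheme would hold for the whole duration of the modification.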

The third and last part of this series (Of the impact of caching on your design (III)) will expand on some of these ideas to try to provide a comprehensive design pattern around caches.

Of the impact of caching on your design (I)

Thursday, December 20th, 2007

This is the first of a three part series.

I first stumbled upon widespread caching of objects in 2001 when I joined Calypso. At the time the idea was mostly to save time on the relational to object mapping. I did not realize at the time all the consequences of that on the design of the software. Since then, companies have flourished selling distributed caching and I discovered that I was not the only one who lacked this knowledge. So I will try to summarize my accumulated experience on this subject in the first two parts and I will try to finish with a more speculative solution (a solution that I have not had the occasion to test personally).

The most important thing to remember when using caches is that objects that are cached should not have a direct reference to other cached objects. I first understood that in 2002. The reason is that if you hold a reference to an instance in cache, then when that instance is replaced by an updated version of the object the referencer will be holding a stale version. This obviously quickly becomes a design headache. There are two basic solutions to this (I reserve the third one for my speculative part).

The first one (I tried it in 2002-2003 and it works fine for a small project with few cached objects) is to actually reference the instance and never discard it. Instead, when an update comes in, the existing (cached) instance is fully updated. This is a bit cumbersome to implement. In Java it has the advantage of making use of garbage collection properties, with very long lived cache objects and very short lived update objects.

The second one is to modify your object model to use only the identifier of the cached objects. This in turn implies providing access to the caches from almost everywhere in the code (whereas in the first solution few parts of the code, apart from the update mechanism, need to know about the cache). I am sure that dozens of solutions can be found to handle this. Ideally, the solution does not require changing the interfaces (accessors), and most of the code, aside from direct referencers, can ignore the mechanism. Also note that collections can become a bit trickier to handle. This is the solution the project I currently work on uses. It requires a fair amount of code to have a well polished solution.
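A minimal Java sketch of this second solution follows. Customer, Order and CustomerCache are hypothetical names, and passing the cache to the constructor is just one of the many possible ways to give the referencer access to it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cached type.
final class Customer {
    final String id;
    final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class CustomerCache {
    private final Map<String, Customer> entries = new ConcurrentHashMap<>();

    // An update replaces the cached instance with a new one.
    void put(Customer customer) {
        entries.put(customer.id, customer);
    }

    Customer get(String id) {
        return entries.get(id);
    }
}

// The referencer keeps only the identifier of the cached object.
final class Order {
    private final String customerId;
    private final CustomerCache customers;

    Order(String customerId, CustomerCache customers) {
        this.customerId = customerId;
        this.customers = customers;
    }

    // The accessor keeps its usual shape, so callers can ignore the
    // mechanism: the instance is looked up in the cache on each call.
    Customer customer() {
        return customers.get(customerId);
    }
}

final class ReferenceByIdDemo {
    public static void main(String[] args) {
        CustomerCache cache = new CustomerCache();
        cache.put(new Customer("c1", "Ann"));
        Order order = new Order("c1", cache);
        cache.put(new Customer("c1", "Anne")); // the cached instance is replaced
        System.out.println(order.customer().name);
    }
}
```

The cost is visible even in this tiny example: Order now needs a way to reach the cache, which is the "access from almost everywhere" mentioned above.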

Now there is at least a third solution that I can think of. It was rejected in my current project because the team did not have a high level of comfort with it. That solution is therefore entirely speculative on my part. I will present it in part III of this series.

Next: Of the impact of caching on your design (II)

Polish on my software

Tuesday, December 11th, 2007

This article is long overdue but, trying to be agile, I had to adapt to events and threads that happened since my previous entry in this series.

One of the important points in the graphic design book I read is that graphical design is iterative. Each iteration on design helps refine it by eliminating noise and keeping only the relevant information.

As Agile proponents know very well, software development and software design are iterative processes. However, iteration is often seen from the perspective of adaptation to change and short time to market. It is undeniable that this is true. But it is also true that underlying all the software development processes I know is the iterative refinement of design. In waterfall, even if we skip over the original misunderstanding that made it an explicitly iterative process, each part of the process is iterative (from requirements to specifications to development). Formal methods use an iterative process to refine and sharpen the specification until it is translatable into code. Refactoring is a common name for refining code design.

The result of these refinement cycles is to make the software more elegant and easier to navigate, and to enhance its ability to accommodate user requirements.

Which leads me back to design in general as Richard Gabriel was presenting it at QCon: human design is canalized. We use patterns to accelerate our design and we refine our original design until it is aesthetically satisfactory. If you look around all our activities are driven in this way. Even scientific theories work in this way. And it takes specially inspired people to break existing patterns and find new ones, new ways of doing things. We call them geniuses.

Back to the point, the only way to achieve good software is to enable and support successive polishing of the initial design (formal methods use demonstrations for that, non formal methods use refactoring). More importantly, as software lives in constrained and changing environments, it is constantly bent and scratched and customized. This in turn results in software decay that will eventually make it useless, unless the proper polish is constantly (and lovingly) applied.