Tuesday, June 6, 2006

Of Factories and Databases

While working on MachBlog we debated how best to handle the multiple database conundrum. When creating an application such as this that needs to support potentially numerous databases there is a fork in the road. Either a single single set of database access code is created that has conditionals throughout to handle database-specific functionality where needed, or database access code unique to each database platform can be created, but this runs the risk of having redundant code where the more mundane SQL is identical across database platforms.

In the end we decided that for the sake of maximum flexibility and future extensibility we would have a set of data objects (DAOs and gateways) for each database platform. Out of the box MachBlog will have two complete sets of data objects, one for Microsoft SQL Server, and another for MySQL. Make that three, actually, since for the sake of object typing we have a set of base classes that are extended by the database-specific objects. The base classes are never instantiated directly. They exist only to provide a common object type that the rest of the application will use, and they also provide a bit of generic documentation for future database implementations.

Either way you go there's some potential maintenance headaches involved. In our case since we have separate objects for each database platform we do have a fair amount of SQL code that is identical between MSSQL and MySQL. The tradeoff here is maximum flexibility. For example, right now views and stored procedures aren't used in the objects for either platform, but let's say I get the crazy notion I want to implement stored procs for SQL Server. MySQL 5 has stored procs but MySQL 4.x doesn't, so to make the MySQL objects compatible with both versions I decide not to implement stored procs for MySQL. No problem, I can change my SQL Server objects all I want and the MySQL objects won't know or care, and I don't have to create potentially messy conditional logic throughout my database objects to do so. Maximum isolation, maximum flexibility, with the downside of potentially having to make the "generic" changes that aren't specific to either platform in multiple places.

The specifics of how this is achieved is through the factory pattern. It's a simple enough concept, but extremely powerful in practice. In short, rather than instantiating specific objects for MSSQL or MySQL directly when the application starts up, the application simply asks the DAO and Gateway factories to return the appropriate objects for the database type that is indicated in the application's configuration file. Because the data objects for both databases extend the same base class, the rest of the application can expect to receive an object of the "generic" type even though the specific instance is suited to a particular database platform.

This also makes it trivial to add support for additional database platforms in the future. In fact a new datbaase platform could be added simply by creating data objects for the platform in question and that's it, other than literally a one-line edit to the application's XML configuration file. The rest of the application's code is happily unaware of this entire process.

There are of course pros and cons to any architectural decision, and this is no exception. As my music theory teacher always used to say, "You pay your money and go either way." Personally I think the independent data objects that are platform specific give maximum flexibility while making the code far easier to deal with and maintain, not to mention extend through the addition of new database platforms in the future. Each database can take advantage of its individual and potentially unique strengths and any weaknesses of particular platforms can be avoided. Meanwhile the rest of the application isn't concerned with what database is being used so long as the data it asks for is returned.


Hello Matt,

I'm probably missing the subtleties, but I'm doing something similar. The only difference is that any methods that are either identical between children (in your case MySQL vs. MSSQL, in my simpler case it is between different entities - UserDAO, PageDAO, ProductDAO, etc.) or have only simply parameterizable differences, I put those methods in the base class with the option to overload in the children if necessary. So if MySQl.Delete() and MSSQL.Delete() are identical (or the only differences can be parameteized in the extending childrens variable scope), you simply have a Base.Delete(), but then when you want a more sophisticated MSSQL.Delete(), you can then add it to that one child class.

Not sure if this matches the details of the problem you were solving . . .

Best Wishes,

Thanks Peter--we actually do have things like EntryDAO, EntryGateway, etc., with a base object, MSSQL, and MySQL version for each. They're separated into different directories but the file names are the same between them. I've done things in the past as you describe with the common SQL in the base object but decided to take a bit different approach here and so far I like it. At this point anyway I prefer having fully implemented objects for each type and not really relying on anything from the base object in terms of specific implementation. I'll see if this opinion holds up as we add a few more databases to the mix.