Wednesday, January 26, 2005

DAOs and Composition

In the spirit of what Joe Rinehart and a few others have been doing so far in this "year of OO for CFers," I thought I'd share what I've been working through this week. Specifically, I've been grappling with Data Access Objects (DAOs) and how best to use them when the objects involve composition. I've done this several times before, but I'm using a rebuild of our CFUG site to dig deep and try to figure out the best way to handle it (I'll stop short of saying "the right way to handle it"). For example, say I have a Person bean and an Address bean, and the Person bean contains an Address bean. What's the best way to handle this situation in the DAOs?


 


Let's focus on the create method specifically, since it's the messiest. With a Person and an Address, assume we have a form on which a person enters basic personal information (name, email, etc.) along with address information. When the form is submitted, the backend needs to create a Person object and an Address object, populate them with the form data, and pass them to a DAO to run the create method. I've mulled over a few different scenarios and settled on a preferred approach, but I'd be curious to get your feedback.


Scenario One: Handle Person and Address Separately


Because these are separate objects and each has its own DAO, one option is to handle the Person and Address objects separately. In other words, after the form submission, whatever component processes the form populates a Person, populates an Address, and then calls personDAO.create(person) and addressDAO.create(address) in sequence.


The Issues


This may seem like the simplest approach, but a few issues arise. First, there's a chicken-and-egg problem in how Person and Address relate on the RDBMS side. Since the person and address tables are separate and related through a key (address_id in the person table), the Person really needs an Address id before personDAO.create() is called. So we could reverse the order of the calls above: call addressDAO.create() first, grab the new address id from the result of that call, put it in the Person object, and then call personDAO.create().
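A minimal sketch of Scenario One with the reversed call order, assuming Python's built-in sqlite3 module as a stand-in for CF queries and plain classes in place of CFCs (the table columns, `process_form`, and the DAO method shapes are all hypothetical). The thing to notice is that the form-processing code, not the DAOs, has to know that the address must be created first and its key copied into the person.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: person.address_id points at address.id.
SCHEMA = """
CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT, city TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                     address_id INTEGER REFERENCES address(id));
"""


@dataclass
class Address:
    street: str
    city: str
    id: Optional[int] = None


@dataclass
class Person:
    name: str
    email: str
    address_id: Optional[int] = None
    id: Optional[int] = None


class AddressDAO:
    def __init__(self, conn):
        self.conn = conn

    def create(self, address):
        cur = self.conn.execute(
            "INSERT INTO address (street, city) VALUES (?, ?)",
            (address.street, address.city))
        address.id = cur.lastrowid  # new primary key generated by the DB
        return address.id


class PersonDAO:
    def __init__(self, conn):
        self.conn = conn

    def create(self, person):
        cur = self.conn.execute(
            "INSERT INTO person (name, email, address_id) VALUES (?, ?, ?)",
            (person.name, person.email, person.address_id))
        person.id = cur.lastrowid
        return person.id


def process_form(conn, form):
    """Listener-style code: it must know the correct order of DAO calls."""
    address = Address(form["street"], form["city"])
    person = Person(form["name"], form["email"])
    person.address_id = AddressDAO(conn).create(address)  # 1. address first
    PersonDAO(conn).create(person)                        # 2. then the person
    return person
```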


This isn't necessarily a bad way to do things, and it would certainly work here, but what happens in more complex scenarios with multiple instances of composition (which is the case in my application)? In some cases the relationships and chicken-and-egg issues get even more complex, so you end up making multiple calls to your DAOs (create first to get the ids you need, then update later to put those ids in the right places). Also, in my opinion this really muddies up the component that handles the logic of the form submission (which in Mach-II means the listeners). So I scratched this option off the list.


Scenario Two: Put Address in Person and Have Separate Queries in the PersonDAO


It takes only a few lines of code in this scenario to see the flaw. It may seem like a decent idea at first, and you can even wrap the person and address queries in a nice tidy transaction on the database side, but you end up completely duplicating the code that's already in the Address DAO, which is a big, big, big no-no. I must admit in some cases for deletion I have done this, but delete queries are typically extremely simple and likely wouldn't change over time. Something like a create might change in the future (if you add a field, for example), so you'd end up maintaining the create logic in two places. Clearly not the right solution.


Scenario Three: Leverage the Address DAO Within the Person DAO


We're using composition in our beans, so why not use composition of sorts in our DAOs as well (this isn't strictly composition, but bear with me ...)? When we instantiate the Person DAO, why not instantiate an Address DAO inside it so we can call things that way? Then after the form submission we instantiate the Person bean and the Address bean, put the Address bean in the Person bean, and just call personDAO.create(person).
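Here is a rough sketch of Scenario Three under the same assumptions (Python and sqlite3 standing in for CFCs and cfquery; all names are hypothetical). The Person DAO holds an Address DAO, so the Address INSERT lives in exactly one place and the listener makes a single call.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

SCHEMA = """
CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT, city TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                     address_id INTEGER REFERENCES address(id));
"""


@dataclass
class Address:
    street: str
    city: str
    id: Optional[int] = None


@dataclass
class Person:
    name: str
    email: str
    address: Optional[Address] = None  # Person HAS-A Address
    id: Optional[int] = None


class AddressDAO:
    def __init__(self, conn):
        self.conn = conn

    def create(self, address):
        cur = self.conn.execute(
            "INSERT INTO address (street, city) VALUES (?, ?)",
            (address.street, address.city))
        address.id = cur.lastrowid
        return address.id


class PersonDAO:
    def __init__(self, conn, address_dao=None):
        self.conn = conn
        # Reuse the Address DAO instead of duplicating its SQL (Scenario Two).
        self.address_dao = address_dao or AddressDAO(conn)

    def create(self, person):
        address_id = None
        if person.address is not None:
            # The ordering problem is handled here, hidden from the listener.
            address_id = self.address_dao.create(person.address)
        cur = self.conn.execute(
            "INSERT INTO person (name, email, address_id) VALUES (?, ?, ?)",
            (person.name, person.email, address_id))
        person.id = cur.lastrowid
        return person.id
```

The listener then shrinks to building the beans and calling `PersonDAO(conn).create(person)` once.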


At this point, after much pacing and pondering, this is what makes the most sense to me. That isn't to say there are no downsides here, which is why I'm posting my thoughts to get feedback from others. I've seen plenty of Java code that does this, so I'm assuming it's not flat-out "wrong." It seems to work well, and I really like that it keeps things self-contained by requiring only one call to the Person DAO (even if behind the scenes the Person DAO is also calling the create method of the Address DAO).


There are still some chicken-and-egg situations when you do things this way; those are fairly unavoidable in any of these scenarios. But this approach gains you A) complete reuse of the DAO code (no duplicated code as in Scenario Two above), and B) much cleaner logic in the form-processing component, because you're just instantiating some objects and passing a single object to its DAO for the create action.


So am I on the right track? Is there a fourth option I haven't considered? Should I find a new career? Let me know your thoughts.


Comments


I'm glad you posted this. I have the same internal debates and haven't been able to fully commit to any one solution as the "best." Maybe you could have a UserAddressDAO object that you call to create, and it instantiates the UserDAO and the AddressDAO? That way the chicken-and-egg logic lives in one place, where the id relationship is handled at the data-abstraction layer (as opposed to in your business object). (I guess this is kind of a facade.) I'm anxious to hear what others think. On the cfc-dev mailing list I'm currently debating whether to include some deletion code in a DAO, the situation you glossed over with "I must admit in some cases for deletion I have done this, but delete queries are typically extremely simple and likely wouldn't change over time," and I can't come to a "feels right" answer on that either. I currently do just what you mention; I'm just not sure it's the best way.


I think it makes sense... I do this type of thing all the time. If person HAS-A address, then it makes sense that personDAO has an addressDAO. In fact, I even have DAOs that contain gateways for another type. Just the other day I had a "student" object with a property like "previous schools attended" (an array). So my studentDAO received an instance of a schoolGateway via dependency injection (rather than creating it on its own), and during its read() operation it placed the result of schoolGateway.getPreviouslyAttendedByStudentID(id) into the student instance. All the merits you mentioned are valid... in fact, I use this same instance of the schoolGateway elsewhere. I know I'm starting to sound like a broken record, but you should take a look at what I've done with the Spring-to-CFC port. It makes managing these situations a hell of a lot easier.
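A small sketch of the injection idea described above, again with Python and sqlite3 as assumptions. `SchoolGateway`, its query, and the `student_school` link table are hypothetical; the method name is a snake_case rendering of the `getPreviouslyAttendedByStudentID` mentioned in the comment. Because the gateway is passed in rather than created inside the DAO, the same instance can be shared elsewhere or replaced with a stub.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional

SCHEMA = """
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE school (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student_school (student_id INTEGER, school_id INTEGER);
"""


@dataclass
class Student:
    id: int
    name: str
    previous_schools: List[str] = field(default_factory=list)


class SchoolGateway:
    """Hypothetical gateway: returns lists of rows rather than single beans."""

    def __init__(self, conn):
        self.conn = conn

    def get_previously_attended_by_student_id(self, student_id):
        rows = self.conn.execute(
            "SELECT s.name FROM school s "
            "JOIN student_school ss ON ss.school_id = s.id "
            "WHERE ss.student_id = ? ORDER BY s.name", (student_id,)).fetchall()
        return [name for (name,) in rows]


class StudentDAO:
    def __init__(self, conn, school_gateway):
        self.conn = conn
        self.school_gateway = school_gateway  # injected, not constructed here

    def read(self, student_id) -> Optional[Student]:
        row = self.conn.execute(
            "SELECT id, name FROM student WHERE id = ?", (student_id,)).fetchone()
        if row is None:
            return None
        student = Student(row[0], row[1])
        student.previous_schools = (
            self.school_gateway.get_previously_attended_by_student_id(student_id))
        return student
```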


Matt, this is a GREAT post. I think you're right on track, and you did a great job explaining why Scenario Three is probably the rig -- erm, a Very Good way to handle this problem. Patrick


Thanks for the thoughts Bill--the issue I see with the UserAddressDAO situation is that you wind up with a class explosion of sorts. If we throw a Company object into the mix, do you then build a UserAddressCompanyDAO? That could get ugly real quick. ;-) When I'm thinking through this stuff I try to talk through all the scenarios, no matter how unlikely they seem at first, just to make sure I understand *why* something isn't a good solution. Then I can feel like I've covered all my bases. There's someone who posts on one of the CF lists with the tag line, "I haven't failed. I've found 10,000 ways that don't work" (or something along those lines), which I think is attributed to Edison. That's definitely how I feel doing all this OO stuff. I often spend more time thinking through all the options than actually implementing the one that makes the most sense, but this isn't necessarily a bad thing! Better to spend a lot of time thinking through everything than building a bunch of stuff you have to throw out when you realize you've coded yourself into a corner.


Thanks Dave--I'm really interested in checking out the Spring port you've done. I've read quite a bit about Spring on the Java side of the world and it's pretty impressive, particularly in contrast to something like Struts. Can you post a link here or email it to me? I definitely want to check that out.


Thanks Patrick--given the brief feedback so far at least I know I'm not way off the mark. That will allow me to sleep better at night. ;-)


Though I find myself using the third scenario most often, here's a fourth scenario to throw out there for discussion:

Scenario Four: Drop the AddressDAO

In this scenario, all the DB functions for managing Addresses would be handled through the PersonDAO. I'm sure there are alarms going off in your head: "What?! I'll have to rewrite the code to manage Addresses for any other business object that has an Address!" That's more than likely correct... the Person-Address example isn't really a good one for this scenario. Instead, consider this example of composition (as opposed to aggregation or association): a Worm is composed of Segments. Since Segments cannot exist without a Worm (and don't make sense outside the context of a Worm), Worm manages the lifecycle of its Segments. Similarly, we can have WormDAO manage all the DB functions for Segments, since no other DAO will ever have to work with Segments (no danger of duplicate code). One big plus of this approach is that you can take advantage of the DB's joining capability and grab the Worm and all its Segments in one optimized query. Of course, if WormDAO starts getting bloated, you can factor the Segment functionality out into a SegmentDAO.
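One way the WormDAO idea might look, sketched in Python with sqlite3 (the table layout and method names are assumptions, not from the comment). There is no SegmentDAO: WormDAO writes the segment rows itself and reads the worm plus all of its segments with a single LEFT JOIN.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional

SCHEMA = """
CREATE TABLE worm (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE segment (
    worm_id  INTEGER NOT NULL REFERENCES worm(id),
    position INTEGER NOT NULL,
    color    TEXT NOT NULL,
    PRIMARY KEY (worm_id, position)
);
"""


@dataclass
class Segment:
    position: int
    color: str


@dataclass
class Worm:
    name: str
    segments: List[Segment] = field(default_factory=list)
    id: Optional[int] = None


class WormDAO:
    """Owns persistence for Worm *and* Segment; Segments never exist alone."""

    def __init__(self, conn):
        self.conn = conn

    def create(self, worm):
        with self.conn:  # commit the worm and its segments together
            worm.id = self.conn.execute(
                "INSERT INTO worm (name) VALUES (?)", (worm.name,)).lastrowid
            self.conn.executemany(
                "INSERT INTO segment (worm_id, position, color) VALUES (?, ?, ?)",
                [(worm.id, s.position, s.color) for s in worm.segments])
        return worm.id

    def read(self, worm_id) -> Optional[Worm]:
        # One query returns the worm row joined to every one of its segments.
        rows = self.conn.execute(
            "SELECT w.id, w.name, s.position, s.color FROM worm w "
            "LEFT JOIN segment s ON s.worm_id = w.id "
            "WHERE w.id = ? ORDER BY s.position", (worm_id,)).fetchall()
        if not rows:
            return None
        worm = Worm(rows[0][1], id=rows[0][0])
        for _, _, position, color in rows:
            if position is not None:  # LEFT JOIN gives NULLs when there are no segments
                worm.segments.append(Segment(position, color))
        return worm
```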


Nice post. How would you handle multiple addresses per person with your Scenario Three? I suppose person.address would be an array of Address objects, eh? What I'm really stuck on is how to wrap these creates in a DB transaction. What happens if person.create succeeds but address.create does not? Doug


Definitely true, Doug--if there is no need for a separate Address object (or Segment object, in your example), then you wouldn't need, and really shouldn't have, a separate DAO unless there's some other driving need. I actually considered not giving Address its own DAO, but in the end I decided to keep the two separate so I could expand in the future as needed. Great point, though: don't create objects just for kicks; make sure you need them and that it makes sense for them to be separate entities.


Douglas, I'd likely handle multiple addresses as you outline, namely with an array of objects. As for transactions, I'm still working that out myself, so I'd be curious to hear what other folks have done. What I usually do (at this point, anyway) is have the create() methods return some indication of whether they succeeded, so I can proceed or not based on the success of each step. That doesn't solve the rollback issue, however: if the first step fails I'm OK, but if the last step fails I'm in a bit of trouble. I haven't thought of a better way to handle that than tracking whether the last step fails and, if it does, going back and running deletes as needed for the previous steps. Not particularly pretty, but it does the trick, although then you have to ask what happens if the deletes fail ...
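The manual undo described above could be sketched like this. To keep it short and runnable, the DAOs here are hypothetical in-memory stand-ins (dicts instead of tables) written in Python, and the failure is simulated with a flag. The comment in the `except` branch marks the weakness the post mentions: if the compensating delete fails too, an orphaned address is left behind.

```python
class InMemoryAddressDAO:
    """Hypothetical Address DAO that stores rows in a dict."""

    def __init__(self):
        self.rows = {}
        self.next_id = 1

    def create(self, address):
        new_id = self.next_id
        self.next_id += 1
        self.rows[new_id] = dict(address)
        return new_id

    def delete(self, address_id):
        del self.rows[address_id]


class InMemoryPersonDAO:
    """Hypothetical Person DAO; fail=True simulates a failing INSERT."""

    def __init__(self, fail=False):
        self.rows = {}
        self.fail = fail

    def create(self, person, address_id):
        if self.fail:
            raise RuntimeError("person insert failed")
        new_id = len(self.rows) + 1
        self.rows[new_id] = dict(person, address_id=address_id)
        return new_id


def create_person(address_dao, person_dao, person, address):
    address_id = address_dao.create(address)          # step 1
    try:
        return person_dao.create(person, address_id)  # step 2
    except Exception:
        # Compensate by undoing step 1, then re-raise. If this delete also
        # fails, the address row is orphaned: there is no real rollback here.
        address_dao.delete(address_id)
        raise
```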


Good post, Doug. I think what you say about the Segment/Worm makes a lot of sense and will probably apply to the problem I'm having. I hadn't thought of having one "big" DAO that takes care of a parent element and its subordinate objects (those that can exist only if the parent exists), and then pulling out the subordinate objects that end up needing their own DAO. I guess I was under the (false?) impression that a DAO should deal with only one object's interface to the datasource.


I think this was my impression when I first started learning all this as well, Bill: that a DAO would only deal with one object. The more code samples I looked at (largely in Java, which is one reason I'm making this big push for CF-specific OO materials this year), the more I saw that this wasn't necessarily the case, and it started to make more sense and to simplify the code overall, which is always a good goal. The CFUG site I'm working on is turning into a bit of a beast, but once it's done I think it should be a decent example of a lot of these concepts as well as a good semi-large Mach-II sample application. I've gotten a lot out of studying Phil Cruz's mach-ii.info site as well as several of the examples that Joe Rinehart, Scott Barnes, and others have put on their blogs.


In regard to the rollback, I seem to remember reading that a transaction now works across components in CF 6.1. So it seems to me you could have a "transaction" object that calls your various methods for you. I guess it's a facade, since it simplifies the interface while still giving you full access to the power behind it: the facade would call the various methods for you, and inside the facade you would wrap those method calls in a cftransaction. Now, something I have never tried is wrapping a cftransaction around these generic factory/execution combinations, so that the cftransaction is in place when my datasource is a database that supports transactions, but the code won't blow up if my datasource is something else (with no queries, for instance). I'll have to confirm whether that's a valid scenario: cftransaction wrapped around code that contains no queries. It seems like it would be OK, with a slight performance hit. Hrm. Of course, you could have a facade factory that returns the correct facade based on your datastore: the XML-based facade wouldn't have the cftransactions in it, but the SQL-based one would. I guess the merged DAO idea that Doug mentioned would work here too; in the same way, a merged "facade" could handle an object and all of its subordinate objects (those that wouldn't exist without it). So you could have FacadeFactory -> Facade_xml || Facade_odbc, where Facade_odbc is responsible for instantiating whatever objects it needs and then wrapping those objects' method calls in a cftransaction. This way you get the rollback, the separation of methods into their reusable components (DAOs and so forth), and an easy-to-use interface for your application. What do you think? Does this seem like a huge stretch?
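A rough Python analogue of the facade idea, with sqlite3 as an assumption: the connection's context manager (`with conn:`) commits if the block completes and rolls back if it raises, which is roughly the role `<cftransaction>` would play in CFML. The DAOs never commit on their own; `PersonFacade` and the schema are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                     address_id INTEGER REFERENCES address(id));
"""


class AddressDAO:
    def __init__(self, conn):
        self.conn = conn

    def create(self, street):
        return self.conn.execute(
            "INSERT INTO address (street) VALUES (?)", (street,)).lastrowid


class PersonDAO:
    def __init__(self, conn):
        self.conn = conn

    def create(self, name, address_id):
        return self.conn.execute(
            "INSERT INTO person (name, address_id) VALUES (?, ?)",
            (name, address_id)).lastrowid


class PersonFacade:
    """One entry point that runs several DAO calls as a single unit of work."""

    def __init__(self, conn):
        self.conn = conn
        self.address_dao = AddressDAO(conn)
        self.person_dao = PersonDAO(conn)

    def create_person(self, name, street):
        with self.conn:  # commit on success, roll back if anything raises
            address_id = self.address_dao.create(street)
            return self.person_dao.create(name, address_id)
```

If the person INSERT fails, the address inserted a moment earlier is rolled back with it, so no compensating delete is needed.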


That gives me a lot of very cool ideas to try out, Bill--thanks! As with most of this OO stuff, there are a million different ways to go, so I'll have to experiment, particularly with the transaction stuff. If there's a way to get that working without necessarily involving database queries, that would be very cool; I've just never tried it. It's one of those things I assumed wouldn't work, so I never messed with it.


I always use a service object that calls my DAOs internally. This helps with the situation you describe, because I would have a personService with getPerson() and savePerson() methods. savePerson() knows to call save on the personDAO and then to call save on the addressDAO for each address inside my person. This service layer also works great when Flash Remoting and/or web services are involved. I'd be happy to do a blog post on my method to get more feedback if people are interested.
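A minimal sketch of the service layer described above, assuming Python with sqlite3 and an address table that carries a person_id (so a person can have several addresses, as discussed earlier). `PersonService.save_person` mirrors the described savePerson(): save the person through its DAO, then save each address through the Address DAO. Names and schema are hypothetical, and a getPerson() counterpart is left out for brevity.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional

SCHEMA = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address (id INTEGER PRIMARY KEY,
                      person_id INTEGER NOT NULL REFERENCES person(id),
                      street TEXT NOT NULL);
"""


@dataclass
class Address:
    street: str
    id: Optional[int] = None


@dataclass
class Person:
    name: str
    addresses: List[Address] = field(default_factory=list)
    id: Optional[int] = None


class PersonDAO:
    def __init__(self, conn):
        self.conn = conn

    def save(self, person):
        if person.id is None:  # new person: INSERT
            person.id = self.conn.execute(
                "INSERT INTO person (name) VALUES (?)", (person.name,)).lastrowid
        else:                  # existing person: UPDATE
            self.conn.execute("UPDATE person SET name = ? WHERE id = ?",
                              (person.name, person.id))


class AddressDAO:
    def __init__(self, conn):
        self.conn = conn

    def save(self, address, person_id):
        if address.id is None:
            address.id = self.conn.execute(
                "INSERT INTO address (person_id, street) VALUES (?, ?)",
                (person_id, address.street)).lastrowid
        else:
            self.conn.execute("UPDATE address SET street = ? WHERE id = ?",
                              (address.street, address.id))


class PersonService:
    """Coordinates the DAOs so the DAOs themselves stay single-purpose."""

    def __init__(self, person_dao, address_dao):
        self.person_dao = person_dao
        self.address_dao = address_dao

    def save_person(self, person):
        self.person_dao.save(person)  # the person's id now exists...
        for address in person.addresses:
            self.address_dao.save(address, person.id)  # ...for each address row
        return person.id
```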


Thanks Kurt, that would be cool. It sounds like that's similar to what I'm doing (as far as calling the person DAO method and then that in turn calls the address DAO method), only with the addition of the service layer in the picture. I'm always happy to see details on what other folks are doing with all of this.


Best stuff I read today. Thanks for the inputs. DK


This is great, Matt! This post (and all the comments) answers a problem I posted on CFCDev about an Article object with multiple properties in it. I didn't know it, but what I meant was _multiple composition_; it's hard when you don't know the terminology for what you need. Now I can have my _ContentObjectClass_ DAO manage all the _ContentObjectClassProperty_ DB queries, like Doug Keen's Segment and Worm example. Or maybe I can simply have the _ContentObjectClass_ DAO call the _ContentObjectClassProperty_ DAO. Either way, things are getting better all the time; thank you, guys.


I think I saw the discussion on CFCDev, Marcantonio, which is what got my wheels turning on all of this. I'm getting a lot of great ideas from the discussions too.


Excellent discussion! I ran into the same issue, but with a header/detail object scenario. I used the same idea as Kurt and created a service. That lets the service contain the logic that knows how the DAO objects interact, so I can keep the DAOs as "pure" as possible. For the batch header and detail row issue, I used a UUID to solve the chicken-and-egg problem. It costs 32 more bytes per row to store a char(36) vs. an integer, but our main tables will only have up to 1,000,000 rows. We archive to a data warehouse, and those tables use identity columns for smaller storage. I have built and torn apart my objects and services three times in the past two weeks, but they keep getting better. And don't get caught up in the *best* process; it will never be good enough. I have been at this for more than 16 years, and *satisficing* is the idea best practiced. No matter how good you build it this time, you will rebuild it in the future to take advantage of new technologies and methods. That's the price of progress. And you can't let the new folks take advantage of new ideas and technologies while you stay tied to your old *best* methods. Things move too fast for that to be successful. Excellent conversation!
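A small sketch of the UUID approach, assuming Python's standard `uuid` module as a stand-in for CF's CreateUUID() and sqlite3 for the database (the batch_header/batch_detail tables are hypothetical). Because the application generates the keys, the header id is known before any INSERT runs, so nothing has to be read back from the database.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE batch_header (id CHAR(36) PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE batch_detail (
    id        CHAR(36) PRIMARY KEY,
    header_id CHAR(36) NOT NULL REFERENCES batch_header(id),
    amount    INTEGER NOT NULL
);
"""


def new_id():
    # 36-character string, generated in the app with no database round trip.
    return str(uuid.uuid4())


def create_batch(conn, label, amounts):
    header_id = new_id()  # known before anything is written
    details = [(new_id(), header_id, amount) for amount in amounts]
    with conn:  # header and details commit together
        conn.execute("INSERT INTO batch_header (id, label) VALUES (?, ?)",
                     (header_id, label))
        conn.executemany(
            "INSERT INTO batch_detail (id, header_id, amount) VALUES (?, ?, ?)",
            details)
    return header_id
```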


Thanks for the thoughts Paul--you know, I thought about using UUIDs and then "just decided not to" (no rhyme or reason to it), but now that I've thought through this stuff 100 times I think I may rework my database and all my CFCs to use UUIDs instead. I'll think through all my scenarios to make sure it'll get me what I'm after, but you're right, that would definitely avoid the chicken-and-egg situation with the IDs. Thanks for bringing up something that makes me want to do yet more rework! ;-) Excellent thoughts on refactoring as well. It's nice to hear from someone with a lot more experience than myself so I know I'm not just bad at this stuff! I'm a big enough dork that I'm having a great time thinking about all these OO design issues, reading all the books I can get my hands on, experimenting, reworking, etc. Sometimes I feel like I'm spending tons of time thinking and not a lot of time doing, but as I think I said before, better to do that and make well-informed decisions than just slap something together and hope it's maintainable. Thanks again--now off to see about using UUIDs ...


You needn't use UUIDs to beat chicken-and-egg; you could use a sequence. In a current project I have a MS SQL Server table named 'Sequence' with columns 'SequenceName' and 'SequenceId'. My DAOs all extend a GenericDAO.cfc, which has a getNewId() method that calls a stored procedure to get the next integer id in a specified (named) sequence. In this app the table has 3 rows, with 'SequenceName' = 'UserGroup', 'NodeItem' and 'ValueObject', and the SequenceId for each row is the next integer id that will be returned. My UserDAO and GroupDAO both use the 'UserGroup' sequence, so each sets variables.sequenceName = "UserGroup". getNewId() calls the stored procedure like this:

    EXEC nextVal @sequenceName=<cfqueryparam cfsqltype="CF_SQL_VARCHAR" maxlength="50" value="#variables.sequenceName#">

We adapted the stored procedure from one by James Thornton ( http://jamesthornton.com/software/coldfusion/nextval.html ); ours looks like this:

    CREATE PROCEDURE nextVal
        @sequenceName nvarchar(50),
        @sequenceId int = NULL OUTPUT
    AS
        -- return an error if sequence does not exist
        -- so we will know if someone truncates the table
        set @sequenceId = -1

        UPDATE Sequence
        SET @sequenceId = SequenceId = SequenceId + 1
        WHERE SequenceName = @sequenceName

        Select @sequenceId as sequenceId
    GO

So if you want, each of your DAOs can have its own sequence of integer ids, or all of them can share one. My DAOs use getNewId() within their create() methods to grab an id just before inserting the new record, but you could just as well call getNewId() as a public method up front, set the id in your User bean, and do the same for your Address bean. Now each can know the other's id before you persist them. Hmm, I guess for flexibility the create() method should check for a non-zero id in the incoming object and use it if present; if the id is zero (as set by the constructor), it should call getNewId() for the id to use in the INSERT query.

There's your 32 bytes per record back, at the expense of calling the stored proc once per id (instead of using CreateUUID() in CF). Also, integer ids are nice if you have to move things between (say) Oracle and SQL Server, heaven forbid...
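For comparison, here is the sequence-table idea translated into Python and sqlite3 (an assumption; SQLite has no stored procedures, so `get_new_id` runs the UPDATE and SELECT itself inside one transaction). `GenericDAO`, `UserDAO`, `GroupDAO`, and the snake_case table and column names are hypothetical renderings of the names in the comment. `UserDAO.create` follows the suggested refinement: use a pre-assigned non-zero id if present, otherwise fetch a new one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sequence (sequence_name TEXT PRIMARY KEY,
                       sequence_id INTEGER NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO sequence VALUES ('UserGroup', 0), ('NodeItem', 0), ('ValueObject', 0);
"""


class GenericDAO:
    sequence_name = None  # subclasses set this, like variables.sequenceName

    def __init__(self, conn):
        self.conn = conn

    def get_new_id(self):
        with self.conn:  # increment and read back in one transaction
            cur = self.conn.execute(
                "UPDATE sequence SET sequence_id = sequence_id + 1 "
                "WHERE sequence_name = ?", (self.sequence_name,))
            if cur.rowcount == 0:
                return -1  # like the stored proc: -1 means "no such sequence"
            return self.conn.execute(
                "SELECT sequence_id FROM sequence WHERE sequence_name = ?",
                (self.sequence_name,)).fetchone()[0]


class UserDAO(GenericDAO):
    sequence_name = "UserGroup"

    def create(self, user):
        # Use a pre-assigned (non-zero) id if the caller set one.
        if not user.get("id"):
            user["id"] = self.get_new_id()
        with self.conn:
            self.conn.execute("INSERT INTO users (id, name) VALUES (?, ?)",
                              (user["id"], user["name"]))
        return user["id"]


class GroupDAO(GenericDAO):
    sequence_name = "UserGroup"  # shares the sequence with UserDAO
```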


Lars: I understand your approach, but there are 3 issues for us:

1. We would have to build a uniqueness check into the sproc. We are going to key off that new ID, and the Sequence table row for a given SequenceName can be reset without any db constraint validating it. This only matters if you care about uniqueness, which we do in our case.
2. This solution will have a hard time scaling, since every item in the application will now depend on this one DAO and sproc to handle all new rows. That could cause a very large logjam.
3. UUIDs offer one more benefit: they do not have to be created on your system. For instance, in our solution our own application calls the Service and DAO objects, but there is also a web service that uses the same objects. Other systems can pass in their chunk of information with their own UUID, and we can use it too, since it will be unique not only in our system but in the world. A UUID pretty much guarantees uniqueness across systems. In this way, all of the disparate systems have their own unique key (the UUID), yet we can link them together without having to map our internal row ID to the UUID in order to talk with other systems about this chunk of data.

The CreateUUID() CF function also saves us a trip to the db server. Every little bit counts (pun intended) in large-volume transaction systems. For us, storage is cheap and network bandwidth is the bigger concern. But that may differ in other scenarios, where your solution handles the tasks with ease.


Matt, I think your chicken-and-egg issue is due to the DB design. Instead of putting address_id in the person table, you should have a separate table to join the two:

    person_addresses { person_id, address_id }

That would more closely model the separation of objects, if that is your goal. It also allows a person to have more than one address (a very likely scenario). With this approach you no longer need to worry about which gets created first: you simply create each one, presumably retrieving the new ids from each operation, and then insert those new ids into the person_addresses table.
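A minimal sketch of the join-table design, assuming Python with sqlite3 (`create_person_with_addresses` and `addresses_for` are hypothetical helpers). Neither the person nor the address table references the other, so they can be inserted in any order; the link rows are written last, once both ids are known.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE person_addresses (
    person_id  INTEGER NOT NULL REFERENCES person(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    PRIMARY KEY (person_id, address_id)
);
"""


def create_person_with_addresses(conn, name, streets):
    with conn:
        # Order no longer matters: neither row needs the other's id.
        person_id = conn.execute(
            "INSERT INTO person (name) VALUES (?)", (name,)).lastrowid
        address_ids = [
            conn.execute("INSERT INTO address (street) VALUES (?)", (s,)).lastrowid
            for s in streets]
        # Link them last, once both sets of ids are known.
        conn.executemany(
            "INSERT INTO person_addresses (person_id, address_id) VALUES (?, ?)",
            [(person_id, a) for a in address_ids])
    return person_id, address_ids


def addresses_for(conn, person_id):
    rows = conn.execute(
        "SELECT a.street FROM address a "
        "JOIN person_addresses pa ON pa.address_id = a.id "
        "WHERE pa.person_id = ? ORDER BY a.id", (person_id,)).fetchall()
    return [street for (street,) in rows]
```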


Paul: yes, UUID vs. Sequence is not a one-size-fits-all discussion! Your point #1 (Sequence ids must be unique) suggests table-level security on the Sequence table. Your point #2: you could have many DAOs, but yes, they'd all call the same stored procedure. Throw enough traffic at it and I guess anything bottlenecks. There must be stopgap solutions (e.g. 2 independent Sequences and sprocs, one producing odd numbers and the other even ones... or the same trick with N greater than 2). Come to that, the Sequence could live on a different box than the main db. Not arguing with you, just "thinking aloud." Your point #3: yes, I _love_ UUIDs for being GLOBALLY unique, and they can come from a whole other process. I hate that not all platforms I use support UUID/GUID as a native type; I'm sure casting them to 36-character strings and indexing them as text primary keys in, say, Oracle incurs a wee performance hit that might add up at very high volume. Thanks for the reply.


Roland--thanks for the thoughts. I'm not using a join table for addresses because, in this app anyway, people will only ever have one address. I also have companies that have addresses, however (again, only one each), which is why I separated the addresses out. Elsewhere in this app I do have linking tables (person_skill, for example), but not for the addresses. If there's no real need for a join table (in this case there might be, but in other cases not), I'm not sure it's worth having the extra table, but it's worth considering, I suppose. Good point about avoiding some of these problems with it, though. In the interest of incorporating this and allowing for multiple addresses for people and companies (total overkill for this app, but what the heck!) I'll probably rework this as well. Thanks!


Lars--thanks for the info on sequences. Yet another option to consider! And thanks to everyone for all the discussion on this point; I think it's really helping a lot of people. There are a lot of ins and outs to some of this stuff, so bouncing these ideas around is really great.


I agree, Matt. Some great ideas from everyone. It's nice to see that people can share thoughts in a way that we all learn how other people approach problems. Lars's comments got me thinking down another road...but I will test some things out and we will have another discussion on another day! Good Luck!

David Ostrander said...

Personally, I'd leave my person and address DAOs alone and create a new object that extends person and implements address. Keeping things loosely coupled that way gives you more flexibility for tasks like creating persons that don't have address information (a blog post responder, for example).