Tuesday, November 4, 2008

Avoid Refactoring Yourself Into a Corner

We have developer meetings twice a week at work, and we use these as a forum for people to show things they've been working on, ask for advice on how to approach solving a particular problem, share cool tips and tricks they learned, etc. This has been working really well and fosters what I think is the most important aspect of any development team: learning from each other. Regardless of the various skillsets and experience levels on a team, people have different areas of expertise and I learn something new or start thinking about something differently at every one of these meetings.

Today's meeting was particularly interesting and since it's not an uncommon situation, I thought I'd share a bit about it. A co-worker has been working on an application for the past few months and this application has a very hard deadline. Without going into all the specifics, based on the client's own description of how things should work, this is a person-centric application, meaning everything in the application revolves around data associated with an individual person. Each person has basic demographic data but also a minute-by-minute timeline associated with them, and this timeline consists of events that may either be unique to the individual person or may be inherited from a parent group that is used to organize individual people.

We've discussed this application quite a bit at our developer meetings lately because it's been an interesting exercise in developing with Mach-II, but more importantly it's been a great case study for effective domain modeling in an OO application. It's simple enough that it's not overwhelming from a learning standpoint, but it's also complex enough and has some interesting wrinkles that take it well beyond the level of the basic OO applications that we've likely all studied or seen in conference presentations a thousand times. It's been great because even the people not involved with the application are learning a lot through the discussions at the developer meetings.

The basic overview of the domain model is that there is a Person object, which is a typical bean, and one of the attributes of a Person is their timeline, which for ease of use and efficiency is a query object. When a Person is instantiated they get their timeline, which is created from events explicitly associated with that individual person and potentially events they may inherit from their group timeline. This works very well in dealing with individual people and their individual timelines.
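To make that a bit more concrete, here is a minimal sketch of the idea in Python rather than CFML. Everything in it is an assumption for illustration: the `Person` fields, the `build_timeline` helper, the sample events, and especially the rule that a person's own event overrides a group event at the same minute (the real application's conflict rules aren't described here). A plain list of tuples stands in for the query object.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """A simple bean: demographic data plus a merged timeline."""
    person_id: int
    name: str
    # Stand-in for the query object: a list of (minute, label) tuples.
    timeline: list = field(default_factory=list)


def build_timeline(own_events, group_events):
    """Merge a person's own events with events inherited from their group.

    Assumption: if both define an event at the same minute, the
    person's own event wins.
    """
    merged = dict(group_events)  # start from the inherited events
    merged.update(own_events)    # personal events override inherited ones
    return sorted(merged.items())


# Hypothetical sample data, keyed by minute of the day.
group_events = {540: "group briefing", 600: "group break"}
own_events = {600: "dentist appointment", 615: "return to desk"}

person = Person(1, "Pat", build_timeline(own_events, group_events))
```

The point is only that each Person carries a fully merged timeline, which is convenient for one person at a time and expensive if you need it for everyone at once.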

Just recently a new person on the client side was added to the mix. (If you're saying "uh oh" at this point, you know where this is going.) This person looked at the application and from her perspective, everything was backwards. Because of her concerns with the data being managed by the application, it shouldn't be person-centric, but rather should be timeline-centric since her role in things is to be focused exclusively on the high-level aggregate timelines as opposed to the timeline of any individual person.

Naturally this was a bit of a "D'OH!" moment. While it would have been nice to have this perspective from the outset and potentially would have changed the approach taken in building the application, the reality of the situation is that we have to get the application done very soon and we have to satisfy the new requirement in order for the application to be useful. Such is life as a developer.

Since the application has been to this point entirely person-centric, my co-worker was going down the path of thinking of aggregate timelines as a grouping of all the individual timelines, which on the face of it might not seem like a bad approach. The issue is that there are a minimum of 1500 individual timelines with a potentially large number of events per timeline, and since they may change extremely frequently, caching isn't a realistic option. Obviously instantiating 1500 objects just to get at query data would be a nice way to bring the server to its knees rather quickly.

Another great aspect of developer meetings is the chance to get a completely fresh perspective. It's easy when you're neck deep in an application to have your thinking get a bit rigid and not be able to see the numerous other ways in which the problem you're facing might be solved. So you start going down the straight-line path you see in front of you and figuring out ways to tweak what's already there to handle a new requirement that may well be the antithesis of what your domain model was originally designed to address.

My co-worker described this situation as "refactoring himself into a corner" and I think that's a really apt description of what tends to happen in these situations. Particularly given the recent discussions about all things OO in the CFML world, this seemed rather timely because my co-worker was thinking all about the objects and the domain model and not about the data. I know, you can probably quote me from any number of my conference presentations saying that's the way you should think about things, and I stand by that ... for the most part.

In this situation, however, what we're really after has nothing to do with the domain model (well, little to do with the domain model at any rate ... more on that in a moment) and everything to do with reporting on aggregate data. These are two completely different ways of looking at things, and going down the object path when what we really want to do is produce a report starts to feel wrong rather quickly. If something feels wrong don't ignore that feeling, because it probably feels wrong for a reason.

Luckily, since my co-worker did focus on the domain model so thoroughly in the initial design of the application, the data itself was all nicely organized in the database. This really is a side effect of the domain model design as opposed to a data-centric approach to building the application. (Not enough time to go into detail on that point so I hope it makes sense at a high level.)

With the data sitting in the database, why worry about the Person side of things and all those objects? My suggestion was to forget about all that and focus on the end goal of what we need for the timeline/reporting side of the application. Clearly for Person and individual timeline management what has already been developed is important and will continue to be used as it always has been, but timeline reporting is a different requirement and needs a different approach.

Once you throw all the person-centric bits out the window it becomes a lot more clear. The data's in the database, so rather than thinking of an aggregate/master timeline as being made up of individual people, why not just leverage the database to do the reporting? Maybe this calls for a view to be created, or maybe it's a rather simple query/stored procedure, but since what we want is a big list of data, creating that from a huge number of individual objects just doesn't make sense.
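As a rough illustration of letting the database do the reporting, here is a small Python sketch using an in-memory SQLite database. The schema is entirely made up (the `person` and `event` tables, their columns, and the `master_timeline` view are assumptions, not the real application's design), but it shows the shape of the idea: one view expands group events to their members, combines them with personal events, and aggregates, with no Person objects involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema, for illustration only.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, group_id INTEGER);
    CREATE TABLE event (
        event_id  INTEGER PRIMARY KEY,
        person_id INTEGER,   -- NULL for group-level events
        group_id  INTEGER,   -- NULL for person-level events
        minute    INTEGER,
        label     TEXT
    );

    INSERT INTO person VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO event (person_id, group_id, minute, label) VALUES
        (1,    NULL, 540, 'check-in'),
        (2,    NULL, 540, 'check-in'),
        (NULL, 10,   600, 'briefing'),
        (3,    NULL, 600, 'check-in');

    -- Personal events plus group events expanded to each group member,
    -- then counted per minute and label.
    CREATE VIEW master_timeline AS
        SELECT minute, label, COUNT(*) AS people
        FROM (
            SELECT e.minute, e.label, e.person_id
            FROM event e
            WHERE e.person_id IS NOT NULL
            UNION ALL
            SELECT e.minute, e.label, p.person_id
            FROM event e
            JOIN person p ON p.group_id = e.group_id
            WHERE e.person_id IS NULL
        )
        GROUP BY minute, label;
""")

rows = conn.execute(
    "SELECT minute, label, people FROM master_timeline ORDER BY minute, label"
).fetchall()
```

With 1500 people instead of three, this is still a single query against the view. Whether a view, a plain query, or a stored procedure fits best depends on the database in use.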

Now that doesn't mean that OO goes out the window completely. Since this is a Mach-II application, chances are this will mean a Listener, a Service, and a Gateway in front of the raw SQL that does the reporting, which means the reporting functionality will integrate nicely right alongside the person-centric part of the application that's already done and working well. More importantly, it gets down to using the right tool for the job. Databases are (shocker) great at managing data, so putting that work in the database will not only make things a lot more efficient than shoe-horning all the individual beans into a construct where they don't belong, it just makes more sense overall. In short, think in objects unless you shouldn't. ;-)
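For anyone who hasn't worked with this kind of layering, here is a minimal sketch of it in Python. To be clear, this is not Mach-II's API; the class names, method names, and the one-table schema are all hypothetical, and the "listener" is just a plain class standing in for whatever the framework would call. It only shows how the raw reporting SQL can sit behind the same kind of objects the rest of the application uses.

```python
import sqlite3


class TimelineGateway:
    """Owns the raw SQL; returns plain rows, never Person objects."""

    def __init__(self, conn):
        self.conn = conn

    def events_per_minute(self):
        return self.conn.execute(
            "SELECT minute, COUNT(*) FROM event "
            "GROUP BY minute ORDER BY minute"
        ).fetchall()


class TimelineService:
    """Application-facing API; shapes gateway rows for callers."""

    def __init__(self, gateway):
        self.gateway = gateway

    def master_timeline(self):
        return [
            {"minute": minute, "events": count}
            for minute, count in self.gateway.events_per_minute()
        ]


class TimelineListener:
    """Stand-in for a framework listener: maps a request to a service call."""

    def __init__(self, service):
        self.service = service

    def show_master_timeline(self):
        return self.service.master_timeline()


# Hypothetical data to exercise the three layers end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (minute INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?)",
    [(540, "check-in"), (540, "check-in"), (600, "briefing")],
)

listener = TimelineListener(TimelineService(TimelineGateway(conn)))
report = listener.show_master_timeline()
```

The objects here carry no per-person state at all. They exist to give the report a home in the application's structure while the database does the actual work.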

I'm not sure there's a real moral to this story other than to point out that when you feel like you're refactoring yourself into a corner, stop, take a deep breath, and force yourself to think about things differently. Throw out all your preconceived notions and if that's difficult, grab a team member or three and talk through the problem with them. It's almost better if they don't know much about your application because they won't have the mental baggage that you do about things, and won't have any concept of how much rework is involved to implement their approach. And once you have them sucked in you can always get them to help you build the missing pieces!


Nice! :) That notion of having "baggage" that keeps you from seeing the simple solution applies to much of life. I don't have any URLs handy right now, but I know that there's been at least one study that shows that the more experience a person has in a given field, the less able they become to learn new or different approaches because there's so much reinforcement behind the techniques they've already learned.

And this case also dovetails rather nicely into a comment Adam Haskell made a while back about one of the nice things about Fusebox being that you don't have to write objects for things like a report page where there's not much behavior going on.

Very good read. I think the comment near the end sums up my feelings:

"More importantly, it gets down to using the right tool for the job."

If refactoring is the tool that is called for, so be it. Just know that it isn't the tool *in every case*.

Great piece, and definitely not a rare situation. Think it bears pointing out that just because the solution isn't going to be use of the existing objects in the existing domain model, that doesn't mean the solution isn't going to be OO. As you noted, the solution to the aggregate reporting is still going to be handled in MachII. The issue really is that the domain model is being extended, right? Because it's a new requirement, it can be handled by a new domain model, a set of objects that just happen to consume the same data as the original app is already capturing.

I think that can be such a key as well: simply understanding when a change really does impact an existing requirement and when it doesn't. Often the ability to extend your model to add a B option to your existing A option is entirely reasonable, and that ability to extend without destroying is one of the biggest benefits to OO modeling.

As I am trying to dive deeper and deeper into OO, I am definitely seeing that there is a serious difference between what one might consider as being part of the domain model and what one might consider to be reporting.

Not too unlike your scenario, I was going through a thought experiment in which I imagined a Library object. And I asked myself: should this object have a GetBooks() method that returns a collection of all books in the library (potentially millions of objects)?

While months ago I might have said YES, I understand that this is a bad idea. Yes, this would probably crash the server, but that is not what actually swayed me. What made me realize the absurdity of such a method was the question:

Would this collection of data be more useful as objects rather than as a report?

When I was challenged to come up with a feasible use case in which I would need to affect millions of objects, I realized that this is data that should be report oriented.

Then, I also came to the realization that there is a difference between domain-based actions and "bulk" actions. If I wanted to increase the price of all products in an ecommerce store by 10%, I certainly would not instantiate each product, update the price, and then save it. I think what would make more sense is to have some object (maybe a gateway or a service) have methods that allow for bulk actions.
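A bulk action like that might look something like the following Python sketch. The `ProductGateway` class, the `product` table, and the choice to store prices as integer cents are all assumptions made for the example; the only point is that one UPDATE statement replaces a load-modify-save loop over every product.

```python
import sqlite3


class ProductGateway:
    """Hypothetical gateway exposing a bulk operation alongside per-row ones."""

    def __init__(self, conn):
        self.conn = conn

    def increase_all_prices(self, percent):
        """Raise every price in a single statement; returns rows changed.

        Prices are integer cents, and SQLite integer division truncates,
        so fractional cents are dropped rather than rounded.
        """
        with self.conn:  # commit on success, roll back on error
            cursor = self.conn.execute(
                "UPDATE product SET price_cents = price_cents * (100 + ?) / 100",
                (percent,),
            )
        return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, price_cents INTEGER)"
)
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, 1000), (2, 2550)])

gateway = ProductGateway(conn)
changed = gateway.increase_all_prices(10)
prices = [
    row[0]
    for row in conn.execute(
        "SELECT price_cents FROM product ORDER BY product_id"
    )
]
```

No product object is ever instantiated, and the work scales with the database rather than with the application server's memory.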
