Saturday, August 27, 2011

Retrieving Documents Between Two Dates From CouchDB

I'm working on converting yet another application from SQL Server to CouchDB, and this morning I'm working with some announcement documents that are displayed based on their start and end dates. There are numerous ways to approach this problem, but I thought I'd share what I came up with in case it helps others, and also to see if there's another approach I didn't consider.


First, since there is no date datatype in JSON, we've standardized (for better or worse) on storing dates as strings in the format "YYYY/MM/DD HH:MM:SS", e.g. "2011/08/27 09:22:36": date and time separated by a space, always with leading zeros for single digits, and always using a 24-hour clock. This lets date/time strings sort properly when they're used as keys, and it's easy to split the string on the space if you need just the date or just the time. Since this application is for my day job, the time will always be US Eastern time, so we decided not to care about the timezone offset.
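To make the format concrete, here is a minimal sketch of producing these strings and checking that they sort the way the paragraph above describes. The names pad and formatCouchDate are made up for illustration; they are not part of CouchDB or of the application.

```javascript
// Hypothetical helper: left-pad a number to two digits, e.g. 7 -> "07".
function pad(n) {
  return (n < 10 ? '0' : '') + n;
}

// Hypothetical helper: format a Date as "YYYY/MM/DD HH:MM:SS" in local time.
function formatCouchDate(d) {
  return d.getFullYear() + '/' + pad(d.getMonth() + 1) + '/' + pad(d.getDate()) +
    ' ' + pad(d.getHours()) + ':' + pad(d.getMinutes()) + ':' + pad(d.getSeconds());
}

// Date constructor arguments: year, month (0-based), day, hours, minutes, seconds.
var morning = formatCouchDate(new Date(2011, 7, 27, 9, 22, 36));
var evening = formatCouchDate(new Date(2011, 7, 27, 21, 5, 0));
var nextMonth = formatCouchDate(new Date(2011, 8, 3, 8, 0, 0));

// A plain string sort puts the values in chronological order.
var sorted = [nextMonth, evening, morning].sort();

// Splitting on the space gives the date part and the time part.
var parts = morning.split(' ');
```

The leading zeros are what make the string sort work: without them, "2011/8/27" would sort after "2011/10/01".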


In the data I imported from SQL Server there is a dtStart and a dtEnd field so I just converted the SQL Server dates to our preferred CouchDB date format as I imported the data into CouchDB. So far so good.


The next step was to pull these documents from CouchDB based on their dtStart and dtEnd fields, and this is probably obvious but just so it's clear, I need to pull all documents of this type where dtStart <= now, and dtEnd >= now.


As I started creating my view in CouchDB for this, my first thought was to pull all the documents using an array including dtStart and dtEnd as the key. That way when I call the view I could, in theory, use a start and end key to get me the documents in the range of dates that I want.


That approach seems reasonable at first, but when you start trying to put it into practice things get weird rather quickly. This is because what you wind up needing is documents in which the first element of the key array is less than or equal to the current date, while the second element of the key array is greater than or equal to the current date. Maybe this is just "Saturday morning brain" on my part, but I didn't see a way to include both the start and end date in the key and get where I needed to go.
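Here's a rough way to see the problem outside of CouchDB. The compareKeys function below is a simplified stand-in for CouchDB's key collation that only handles arrays of strings, and queryRange is a made-up helper that imitates a startkey/endkey query; the sample rows are invented.

```javascript
// Simplified stand-in for CouchDB key collation: compares arrays of strings
// element by element. Real CouchDB collation handles many more types.
function compareKeys(a, b) {
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

var now = '2011/08/27 09:22:36';

// Keys as the view would emit them: [dtStart, dtEnd].
var rows = [
  { id: 'expired', key: ['2011/08/01 00:00:00', '2011/08/15 00:00:00'] },
  { id: 'active',  key: ['2011/08/20 00:00:00', '2011/09/01 00:00:00'] },
  { id: 'future',  key: ['2011/09/05 00:00:00', '2011/09/30 00:00:00'] }
];

// A key range selects one contiguous slice of the sorted index.
function queryRange(rows, startkey, endkey) {
  return rows
    .slice()
    .sort(function (a, b) { return compareKeys(a.key, b.key); })
    .filter(function (r) {
      return compareKeys(r.key, startkey) >= 0 && compareKeys(r.key, endkey) <= 0;
    })
    .map(function (r) { return r.id; });
}

// "Everything that started on or before now": the end date is never tested,
// because the first array element already decides the comparison.
var matched = queryRange(rows, [''], [now, '9999/12/31 23:59:59']);
```

The expired announcement comes back along with the active one, since a range over the array key can only constrain the first element in any useful way.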


My next thought was to use only the end date as the key. This gets me a bit closer to what I need since I can at least use a start key to only get documents with an end date >= now, but I'm still faced with having to check the start date at the application level to see if the document is supposed to be displayed.
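For reference, a minimal sketch of what that end-date approach could look like. The names mapByEndDate and filterStarted are made up, the sample rows are invented, and emitting dtStart in the value is my assumption about how you'd carry the start date along.

```javascript
// Map function sketch: key on dtEnd, carry dtStart in the value.
// emit is supplied by CouchDB when this runs inside a view.
function mapByEndDate(doc) {
  if (doc.type == 'announcement') {
    emit(doc.dtEnd, { dtStart: doc.dtStart });
  }
}

// Application side: the rows come from a query with startkey = now, so
// dtEnd >= now already holds and only dtStart still needs checking.
function filterStarted(rows, now) {
  return rows.filter(function (row) {
    return row.value.dtStart <= now;
  });
}

// Invented stand-in for the "rows" array of a view response.
var now = '2011/08/27 09:22:36';
var rows = [
  { id: 'active', key: '2011/09/01 00:00:00', value: { dtStart: '2011/08/20 00:00:00' } },
  { id: 'future', key: '2011/09/30 00:00:00', value: { dtStart: '2011/09/05 00:00:00' } }
];
var visible = filterStarted(rows, now);
```

The view would be queried with the current date/time string, JSON-encoded, as the startkey parameter. Nothing in the view itself depends on the current time, so the check against "now" happens entirely at query time.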


I'm sure there's some clever way to handle this situation with keys, and part of my reason for posting this is to see how others would approach this, but I messed around with keys for a while and didn't seem to be getting anywhere so I decided to take a different approach.


One of the great things about CouchDB is the fact that you have the full power of JavaScript available in your views. Although JSON doesn't know what a date is, JavaScript certainly does, so I decided that since I needed to pull things based on a specific date range across two fields in my documents the best place to handle that was in the view code itself.


Here's what I came up with for my map function:



function (doc) {
  // Left-pad a number to two digits, e.g. 7 -> "07".
  function pad(n) {
    return (n < 10 ? '0' : '') + n;
  }

  // Caution: CouchDB runs map functions when a document is indexed, not on
  // every query, so "now" here is the time the document was indexed.
  var d = new Date();

  // Build "YYYY/MM/DD HH:MM:SS" in the server's local time.
  var dateString = d.getFullYear() + '/' + pad(d.getMonth() + 1) + '/' +
      pad(d.getDate()) + ' ' + pad(d.getHours()) + ':' +
      pad(d.getMinutes()) + ':' + pad(d.getSeconds());

  // Emit only announcements whose date range includes dateString.
  if (doc.type == 'announcement' &&
      doc.dtStart <= dateString &&
      doc.dtEnd >= dateString) {
    emit(doc.dtEnd, doc);
  }
}


Now of course you could argue this would all be simpler if I stored the dtStart and dtEnd fields in my documents as milliseconds, because then I could just get the millisecond value of the current date and do a quick numeric comparison instead of all the string formatting and concatenation, and from that perspective you'd be absolutely right. One of the many things I love about CouchDB, however, is the ability to jump into Futon and more directly and easily interact with my data, so keeping the dates human readable is kind of nice. Now I could store both a string and the millisecond value I suppose, but since this did the trick I decided to leave well enough alone.
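Just to show what the millisecond version would look like, here's a quick sketch. The dtStartMs and dtEndMs field names and the isActive function are hypothetical; my documents don't actually have them.

```javascript
// Hypothetical document storing both the readable strings and the
// millisecond values (dtStartMs and dtEndMs are made-up field names).
var doc = {
  type: 'announcement',
  dtStart: '2011/08/20 00:00:00',
  dtEnd: '2011/09/01 00:00:00',
  dtStartMs: new Date(2011, 7, 20, 0, 0, 0).getTime(),
  dtEndMs: new Date(2011, 8, 1, 0, 0, 0).getTime()
};

// Plain numeric comparison, no string formatting needed.
function isActive(doc, nowMs) {
  return doc.type == 'announcement' &&
    doc.dtStartMs <= nowMs &&
    doc.dtEndMs >= nowMs;
}

var during = new Date(2011, 7, 27, 9, 22, 36).getTime();
var after = new Date(2011, 8, 2, 0, 0, 0).getTime();

var activeDuring = isActive(doc, during);
var activeAfter = isActive(doc, after);
```

The comparison is shorter, but a value like 1314451356000 in Futon doesn't tell you much at a glance, which is the tradeoff I was getting at.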


I'm very curious to hear how others might solve this problem. "You're doing it wrong" information would be quite welcome. ;-)