Skip to main content

Retrieving Documents Between Two Dates From CouchDB

I'm working on converting yet another application from using SQL Server to using CouchDB, and this morning I'm working with some announcement documents that are displayed based on their start and end date. There are numerous ways to approach this problem but I thought I'd share what I came up with in case this solution helps others, and also to see if there's maybe another approach I didn't consider.


First, since there is no date datatype in JSON, we've standardized (for better or worse) on storing dates as a string with the format "YYYY/MM/DD HH:MM:SS", e.g. "2011/08/27 09:22:36", so date and time separated by a space, always with leading zeros for single digits, and always using a 24-hour clock. This allows date/time strings to sort properly when they're used as keys, it's easy to split the string using the space if you need either just the date or just the time, and since this application is for my day job the time will always be in Eastern US time so we decided not to care about the timezone offset.


In the data I imported from SQL Server there is a dtStart and a dtEnd field so I just converted the SQL Server dates to our preferred CouchDB date format as I imported the data into CouchDB. So far so good.


The next step was to pull these documents from CouchDB based on their dtStart and dtEnd fields, and this is probably obvious but just so it's clear, I need to pull all documents of this type where dtStart <= now, and dtEnd >= now.


As I started creating my view in CouchDB for this, my first thought was to pull all the documents using an array including dtStart and dtEnd as the key. That way when I call the view I could, in theory, use a start and end key to get me the documents in the range of dates that I want.


That approach seems reasonable at first, but when you start trying to put it into practice things get weird rather quickly. This is because what you wind up needing is documents in which the first element of the key array is less than the current date, while the second element of the key array is greater than the current date. Maybe this is just "Saturday morning brain" on my part, but I didn't see a way to include both the start and end date in the key and get where I needed to go.


My next thought was to use only the end date as the key. This gets me a bit closer to what I need since I can at least use a start key to only get documents with an end date >= now, but I'm still faced with having to check the start date at the application level to see if the document is supposed to be displayed.


I'm sure there's some clever way to handle this situation with keys, and part of my reason for posting this is to see how others would approach this, but I messed around with keys for a while and didn't seem to be getting anywhere so I decided to take a different approach.


One of the great things about CouchDB is the fact that you have the full power of JavaScript available in your views. Although JSON doesn't know what a date is, JavaScript certainly does, so I decided that since I needed to pull things based on a specific date range across two fields in my documents the best place to handle that was in the view code itself.


Here's what I came up with for my map function:



var d = new Date();
var curYear = d.getFullYear();
var curMonth = (d.getMonth() + 1).toString();
var curDate = d.getDate().toString();
var curHours = d.getHours().toString();
var curMinutes = d.getMinutes().toString();
var curSeconds = d.getSeconds().toString();

if (curMonth.length == 1) {
  curMonth = '0' + curMonth;
}

if (curDate.length == 1) {
  curDate = '0' + curDate;
}

if (curHours.length == 1) {
  curHours = '0' + curHours;
}

if (curMinutes.length == 1) {
  curMinutes = '0' + curMinutes;
}

if (curSeconds.length == 1) {
  curSeconds = '0' + curSeconds;
}

var dateString = curYear + '/' + curMonth + '/' + curDate + ' ' +
    curHours + ':' + curMinutes + ':' + curSeconds;

if (doc.type == 'announcement' &&
    doc.dtStart <= dateString &&
    doc.dtEnd >= dateString) {
      emit(doc.dtEnd, doc);
}


Now of course you could argue this would all be simpler if I stored the dtStart and dtEnd fields in my documents as milliseconds, because then I could just get the millisecond value of the current date and do a quick numeric comparison instead of all the string formatting and concatenation, and from that perspective you'd be absolutely right. One of the many things I love about CouchDB, however, is the ability to jump into Futon and more directly and easily interact with my data, so keeping the dates human readable is kind of nice. Now I could store both a string and the millisecond value I suppose, but since this did the trick I decided to leave well enough alone.


I'm very curious to hear how others might solve this problem. "You're doing it wrong" information would be quite welcome. ;-)

Comments

Nuri Cevik said…
Is it possible to keep dates as dateValue while carriying over from MSSQL to CouchDB and retrieve it with these values so you can compare them? If you need to display them you can convert them to date and time.
radiospiel said…
Wouldn't this make the view non-sideeffect-free? It seems to me you are including a document in the view if its start time is in the past and its end time is in the future *from the moment the view was built*? Wouldn't that make the view contain different entries depending on when it was built?
radiospiel said…
If you want to have a view to find out which documents are valid at a certain point in time, you could probably emit(hourly_timestamp, doc) for each hour between dtStart and dtEnd for the document, and, if needed, let your app remove the first and the last entry, because they might then be false positives.
samdengler said…
@radiospiel is right. Each time that the view is requested, it only executes against the document delta from the last time the view was executed. If the view changes, the map functions are executed against all documents, but that's not the case here.Suppose that the database contains 100 documents, and the view is requested at Wed Sep 14 10:00:00 EDT 2011. The view will contain a subset of the 100 documents in which "Wed Sep 14 10:00:00 EDT 2011" falls within the start and end dates.Now suppose that 10 documents are added in the next 24 hours, and the view is requested tomorrow at Wed Sep 15 10:00:00 EDT 2011. The view will still contain the original subset of 100 documents plus an additional subset of the 10 documents in which "Wed Sep 15 10:00:00 EDT 2011" falls within the start and end dates.The easiest way to prove this is to create a blank database and a view with the following as the map function:function(doc) { emit(doc._id, new Date());}Now create a document and run the view. Create another document and run the view. The view should contain two documents with different timestamps, because the second time that the view was executed it only ran against the delta (the second document in this case).
Matthew Woodward said…
Thanks--this is great feedback! I indeed did not consider the fact that the view won't get updated as I thought it would when the documents change, so I'll have to rethink this. Really appreciate the info as I continue to get deeper into CouchDB.
Matthew Woodward said…
Interesting idea @radiospiel -- I wonder if it'll be simpler to emit the start or end date only that's within a range and then pare it down more in the application. This isn't a massive amount of data so that'd probably work in this case, though of course if it were a ton of data I'd have to approach it differently.Definitely an interesting problem to ponder (for me at least!).
jchris said…
@radiospiel is correct. To do date range queries you need to pick a granularity for queries (day? hour? 15 mins?) and then for long running events emit once per granular period.The alternative is complex query logic and potentially unbounded query time if you had, say, a year long event, in the same database as millions of shorter events.
Matthew Woodward said…
Thanks @jchris -- really appreciate the additional info! My Couch Fu will only get stronger by sharing mistakes such as this one. ;-)
samdengler said…
Another option is to handle complex queries with lucene, which can be configured to listen to couch for changes. There are options as to how you index full documents, selected fields, etc: https://github.com/rnewson/couchdb-lucene
Matthew Woodward said…
Yep, looking into Lucene with Couch on another project so may wind up using that on this project as well.
Matt Woodward said…
Realized I never answered Nuri's original question--yes, it's entirely possible to store the dates however you want. That doesn't address my particular issue though (which I reworked this week so I'll have a follow-up blog post about that), but you can store dates in whatever format you want.They're still strings, of course, and another thing to keep in mind is that while JSON doesn't have a date datatype, there are ISO standards around date formatting that might be wise to follow. We started there and settled on the format we're using because we will never, ever be using anything except US Eastern time and don't care about the timezone offset difference between daylight savings and not, so the slightly simpler format we're using vs. the ISO format works well for us.Big thing is is you want to use the date field as a key in Couch, meaning be able to sort documents based on date, you need a format that will sort correctly. Even something like not using leading zeros will throw your sorting off.Also, date strings are comparable as is too, meaning if you use a format like "YYYY/MM/DD HH:MM:SS" you can do things like if (date1 < date2) and get correct results based on string comparisons which is nice.

Popular posts from this blog

Installing and Configuring NextPVR as a Replacement for Windows Media Center

If you follow me on Google+ you'll know I had a recent rant about Windows Media Center, which after running fine for about a year suddenly decided as of January 29 it was done downloading the program guide and by extension was therefore done recording any TV shows.

I'll spare you more ranting and simply say that none of the suggestions I got (which I appreciate!) worked, and rather than spending more time figuring out why, I decided to try something different.

NextPVR is an awesome free (as in beer, not as in freedom unfortunately ...) PVR application for Windows that with a little bit of tweaking handily replaced Windows Media Center. It can even download guide data, which is apparently something WMC no longer feels like doing.

Background I wound up going down this road in a rather circuitous way. My initial goal for the weekend project was to get Raspbmc running on one of my Raspberry Pis. The latest version of XBMC has PVR functionality so I was anxious to try that out as a …

Running a Django Application on Windows Server 2012 with IIS

This is a first for me since under normal circumstances we run all our Django applications on Linux with Nginx, but we're in the process of developing an application for another department and due to the requirements around this project, we'll be handing the code off to them to deploy. They don't have any experience with Linux or web servers other than IIS, so I recently took up the challenge of figuring out how to run Django applications on Windows Server 2012 with IIS.

Based on the dated or complete lack of information around this I'm assuming it's not something that's very common in the wild, so I thought I'd share what I came up with in case others need to do this.


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Assumptions and CaveatsThe operating system is Windows Server 2012 R2, 64-bit. If another variant of the operating system is being used, these instructions may not work properly.All of the soft…

Setting Up Django On a Raspberry Pi

This past weekend I finally got a chance to set up one of my two Raspberry Pis to use as a Django server so I thought I'd share the steps I went through both to save someone else attempting to do this some time as well as get any feedback in case there are different/better ways to do any of this.

I'm running this from my house (URL forthcoming once I get the real Django app finalized and put on the Raspberry Pi) using dyndns.org. I don't cover that aspect of things in this post but I'm happy to write that up as well if people are interested.

General Comments and Assumptions

Using latest Raspbian “wheezy” distro as of 1/19/2013 (http://www.raspberrypi.org/downloads)We’lll be using Nginx (http://nginx.org) as the web server/proxy and Gunicorn (http://gunicorn.org) as the WSGI serverI used http://www.apreche.net/complete-single-server-django-stack-tutorial/ heavily as I was creating this, so many thanks to the author of that tutorial. If you’re looking for more details on …