
Using Python to Compare Document IDs in Two CouchDB Databases

I'm doing a bit of research into what may or may not be an issue with a specific database in our BigCouch cluster, but regardless of how that turns out, I thought I'd share how I used Python and couchdb-python to dig into the problem.

In our six-server BigCouch cluster, we noticed that on the database for one of our most heavily trafficked applications, the document counts displayed in Futon for each of the cluster members don't match. As I said above, this may or may not be a problem (I'm waiting on further information on that particular point), but I was curious which documents were missing from the cluster member with the lowest document count. (Interestingly, the "missing" documents aren't actually inaccessible from the server with the lower document count, but we'll get to that in a moment.)

BigCouch is based on Apache CouchDB but adds true clustering along with some other very cool features. For those of you not familiar with CouchDB, you communicate with it through a RESTful HTTP interface, and all the data coming and going is JSON. The point here is that it's very simple to interact with CouchDB using any tool that talks HTTP.

Dealing with raw HTTP and JSON may not be difficult, but it isn't terribly Pythonic either, which is where couchdb-python comes in. couchdb-python lets you interact with CouchDB via simple Python objects and handles marshaling data between JSON and native Python datatypes for you. It's very slick, very fast, and makes using CouchDB from Python a joy.

In order to get to the bottom of my problem, I wanted to connect to two different BigCouch cluster members, get a list of all the document IDs in a specific database on each server, and then generate a list of the document IDs that don't exist on the server with the lower total document count.

Here's what I came up with:

>>> import couchdb
>>> couch1 = couchdb.Server('http://couch1:5984/')  # first cluster member
>>> couch2 = couchdb.Server('http://couch2:5984/')  # second cluster member
>>> db1 = couch1['dbname']
>>> db2 = couch2['dbname']
>>> ids1 = list(db1)  # iterating over a Database yields its document IDs
>>> ids2 = list(db2)
>>> missing_ids = list(set(ids1) - set(ids2))  # on couch1 but not on couch2

Thanks to the awesomeness of Python and its ability to subtract one set from another, that gives me a list of the document IDs that are in the first list but not in the second. (You can also use the difference() method on a set to achieve the same result.)
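Here's a small self-contained illustration of the two equivalent spellings, using made-up IDs in place of real ones from the cluster. One practical difference: the - operator needs sets on both sides, while difference() accepts any iterable, so the second list doesn't have to be converted first. Checking the reverse direction as well is a cheap way to see whether the gap runs only one way.

```python
# Hypothetical document IDs standing in for the two cluster members.
ids1 = ["doc-a", "doc-b", "doc-c", "doc-d"]
ids2 = ["doc-a", "doc-c"]

# Set subtraction: both operands must be sets.
missing_via_operator = set(ids1) - set(ids2)

# difference() accepts any iterable, so ids2 can stay a list.
missing_via_method = set(ids1).difference(ids2)

assert missing_via_operator == missing_via_method

# Sort for stable, readable output (sets are unordered).
missing_ids = sorted(missing_via_operator)
print(missing_ids)  # ['doc-b', 'doc-d']

# The reverse direction: IDs on the second member but not the first.
extra_ids = sorted(set(ids2) - set(ids1))
print(extra_ids)  # []
```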

The interesting part came when I took one of the supposedly missing IDs and tried to pull up that document from the database in which it supposedly doesn't exist:

>>> doc = db2['supposedly_missing_id_here']

I was surprised to see that it returned the document just fine, which suggests the node is fetching it from another member of the cluster, but I'm still digging into what the expected behavior is for all of this. (It's entirely possible I'm obsessing over consistent document counts when I don't need to be.)
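To repeat that spot check for every ID in missing_ids rather than one at a time, a minimal sketch could look like this. It assumes couchdb-python's Database supports the in operator for checking whether a document with a given ID exists; the helper name check_retrievable is hypothetical, and because it only needs in, a plain dict stands in for the database so the example runs without a cluster.

```python
def check_retrievable(db, doc_ids):
    """Split doc_ids into those the database reports as present and absent.

    db is anything supporting `doc_id in db`, such as a couchdb-python
    Database (assumed) or, for testing, a plain dict keyed by document ID.
    """
    found, not_found = [], []
    for doc_id in doc_ids:
        if doc_id in db:
            found.append(doc_id)
        else:
            not_found.append(doc_id)
    return found, not_found


# A dict stands in for db2 (hypothetical data).
fake_db2 = {"doc-b": {"_id": "doc-b"}, "doc-c": {"_id": "doc-c"}}
found, not_found = check_retrievable(fake_db2, ["doc-b", "doc-d"])
print(found)      # ['doc-b']
print(not_found)  # ['doc-d']
```

Against the real cluster the call would be check_retrievable(db2, missing_ids).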

So what did I learn through all of this?

  • The more I use Python the more I love it. Between little tasks like this and the fantastic experience I'm having working on our first full-blown Django project, I'm in geek heaven.
  • couchdb-python is awesome, and I'm looking forward to using it on a real project soon.
  • Even though we've been using CouchDB and BigCouch with great success for a couple of years now, I'm still learning what's going on under the hood, which for me is a big part of the fun.
