
Detecting Duplicate XML Data in SQL Server

I've been working quite a bit with XML in SQL Server lately (I'll try to do a post on some XQuery stuff at some point), and I needed to check XML data that I'm pulling off disk against a table in SQL Server to see whether it duplicates data already in the database.


The problem I ran into is that SQL Server "collapses" empty XML nodes when you insert data as XML (e.g. <myXmlNode></myXmlNode> is turned into <myXmlNode/> by SQL Server), so if the XML you're checking against hasn't gone through this collapsing process, you won't find duplicates accurately.
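To see the collapsing for yourself, here is a minimal sketch that round-trips a string through the xml type and back. The variable names are just for illustration, and separate DECLARE and SET statements are used so it should also run on older SQL Server versions:

```sql
-- Illustrative variable names; not part of the original query.
DECLARE @raw nvarchar(max)
DECLARE @asXml xml

SET @raw = N'<myXmlNode></myXmlNode>'

-- Converting to xml is where the empty node gets collapsed.
SET @asXml = CONVERT(xml, @raw)

SELECT @raw AS rawText,                                -- <myXmlNode></myXmlNode>
       CONVERT(nvarchar(max), @asXml) AS roundTripped  -- <myXmlNode/>
```

The two columns differ as strings even though they describe the same XML, which is why a plain string comparison against the raw file contents misses duplicates.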


The solution turned out to be pretty simple and was suggested to me by a co-worker. First, you can't compare XML to XML directly in a query because the xml data type in SQL Server doesn't support the = operator. Given the issue outlined above, you also can't just convert the XML in the database and the raw XML from disk into nvarchar(max) and compare the strings, because the database copy has had its empty nodes collapsed and the disk copy hasn't.


The trick is to use SQL Server's CONVERT() function to convert the XML from disk to SQL Server XML within the query, so it goes through the same collapsing, and only then convert both it and the data already in the database to nvarchar(max) for the comparison:



DECLARE @xmlToCheck xml
SELECT @xmlToCheck = CONVERT(xml, '#theXmlFromDisk#')

SELECT COUNT(id) AS dupeCount
FROM xmlTable
WHERE CONVERT(nvarchar(max), xmlColumn) = CONVERT(nvarchar(max), @xmlToCheck)



If dupeCount comes back greater than 0, then you have a dupe on your hands. Hope that helps others since I spent more time than I had hoped wrangling with this issue.
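If all you need is a yes/no answer, a variation on the same idea is to normalize the incoming XML once up front and use EXISTS instead of COUNT. This is only a sketch: @xmlFromDisk is a hypothetical variable standing in for however your application passes the file contents, and isDupe is a made-up column name. xmlTable and xmlColumn are the same as in the query above.

```sql
-- @xmlFromDisk is a hypothetical stand-in for the XML read from disk.
DECLARE @xmlFromDisk nvarchar(max)
DECLARE @normalized nvarchar(max)

SET @xmlFromDisk = N'<myXmlNode></myXmlNode>'

-- Round-trip through the xml type so the text gets the same
-- collapsing as the data already stored in xmlColumn.
SET @normalized = CONVERT(nvarchar(max), CONVERT(xml, @xmlFromDisk))

-- 1 if at least one stored row matches the normalized text, otherwise 0.
SELECT CASE WHEN EXISTS (
           SELECT 1
           FROM xmlTable
           WHERE CONVERT(nvarchar(max), xmlColumn) = @normalized
       ) THEN 1 ELSE 0 END AS isDupe
```

Either way the comparison still converts xmlColumn for every row it checks, so on a big table it's worth testing how this performs before relying on it.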
