Wednesday, February 27, 2008

Detecting Duplicate XML Data in SQL Server

I've been working quite a bit with XML in SQL Server lately (I'll try to do a post on some xquery stuff at some point), and I had a need to check XML data that I'm pulling off disk against a table in SQL Server to see if the data I pulled off disk is a duplicate with data already in the database.

The problem I ran into is that SQL Server "collapses" empty XML nodes when you insert data as XML (e.g. <myXmlNode><myXmlNode/> is turned into <myXmlNode/>, by SQL Server) so if the XML you're checking against hasn't gone through this collapsing process, you won't find duplicates accurately.

The solution turned out to be pretty simple and was suggested to me by a co-worker. First, you can't compare XML to XML directly in a query because, like any binary datatype in SQL Server, the = operator can't be used. Given the issue outlined above, you also can't just convert the XML in the database and the XML from disk into nvarchar(max) because of the collapsed node issue.

The trick is to use SQL Server's CONVERT() function and convert the XML from disk to SQL Server XML within a query, and then compare the result of that with the data already in the database:

DECLARE @xmlToCheck xmlSELECT @xmlToCheck = CONVERT(xml, '#theXmlFromDisk#')SELECT COUNT(id) AS dupeCount FROM xmlTable WHERE CONVERT(nvarchar(max), xmlColumn) = CONVERT(nvarchar(max), @xmlToCheck)

If dupeCount comes back greater than 0, then you have a dupe on your hands. Hope that helps others since I spent more time than I had hoped wrangling with this issue.

Thursday, February 7, 2008

Dealing With Null Values in ColdFusion

The lack of nulls is a well-known ... shall we say "quirk" of ColdFusion.  In most cases it's a minor annoyance, but it's one of those things we just have to deal with. Recently I've been working on a rather large project for which the data integrity is absolutely critical (and I mean critical), and I'm starting to be increasingly annoyed by CF's lack of nulls, not only because of the extra work involved, but because in some cases it leads to unresolvable situations in which the data will not accurately reflect its true state.

First, a little background for those of you who may not be familiar with what I'm talking about. In many languages, such as Java (I'll use that as a point of reference since that's the language I have the most experience with other than ColdFusion), there is the concept of null, which in its simplest terms means "nothing." For example, if variable x is an integer, it has two possible states in strongly typed languages. It can either be an integer value such as 5, or it can be null, which doesn't mean that it's set to 0, it means that it's set to nothing. The notion of this being true in a strongly typed language is semi-important for the later points I make, so keep that in mind.

In ColdFusion, which isn't strongly typed, variables can be more or less whatever they want, but there is no concept of null. I can set variable x to the integer 5, I can set it to the string "this is no longer a number", or I can set it to an instance of a CFC, and CF doesn't care. What I can't do, however, is say that variable x is set to nothing. The closest corollary we have in ColdFusion is setting something to a zero-length string (""), but that is not the same as null.

So am I just nitpicking ColdFusion again? In this case I don't think so. I personally think ColdFusion 9 needs to add support for nulls. It will save a lot of extra work and help avoid situations like the one I'll outline in a bit where there is no good solution to accurately represent the state of the data.

First let's look at the extra work involved. Maybe it's my Java background that makes me this way, but I'm extremely anal about data types even when I'm working in ColdFusion. Perhaps this is something I just need to get over, but bear with me for the sake of this discussion.

In my CFCs, if an attribute is a date, I use a date data type; if it's a number, I use numeric; if it's a boolean, that's what I use, and so on. Again probably coming from my Java background, I'm a rather firm believer in the power of data typing. If I'm expecting a number and that isn't what I get, I want my application to throw an error because that means something's wrong that needs to be investigated. I understand the arguments on the other side of this issue, I just personally don't subscribe to them.

Where this becomes an issue is when null data is retrieved from a database. A null value in the database comes back to CF as a zero-length string (""), so you can probably already see the issue. If I have a CFC with an attribute that's numeric, and I get "" back from the database, CF will throw an error if I try to set "" to my numeric attribute. Same for any data type other than strings, but even then it's problematic when you're putting data back into the database, but I'll get to that in a moment.

Taking the numeric data type example, let's assume the value is null in the database, and let's assume I have a default value of 0 in my CFC for that attribute. What this necessitates on the CF side is this:

<cfif myQuery.my_integer_field is not "">

    <cfset myCFC.setMyIntegerField(myQuery.my_integer_field) />


In other words, I can't just cram all the data from my database query directly into my CFC, because I have to worry about all nullable fields in the database that aren't strings. This may not seem like a big deal on a small scale, but on large applications where the data integrity is vital, this is a lot of extra work, not just considering the physical typing of code, but from the standpoint of having to carefully scrutinize each and every field in each and every table and write little chunks of code like this for every nullable field that could cause problems.

Where this gets into a real issue, however, is with things like numeric fields and booleans. Let's look at the numeric example first. Assume the numeric field is null in the database, which results in "" coming back to ColdFusion. Since I care about datatypes in my CFC, I have to ignore that "" and just use my default value of 0 in the CFC. Let's then assume that some process in the application sets the value of that numeric field in the CFC to 0, and then that data needs to be saved back to the database.

How do I know if 0 is the actual value for that field? How do I know I'm not supposed to set that value to null when I do the database update? The answer is that I don't, not unless I do some sort of null tracking programmatically. By that I mean I could start doing things like setting a "wasNull" flag for each problematic attribute in my CFC, but again, that's a lot of extra work that would be unnecessary if we just had nulls in CF.

Bit fields in the database are equally problematic. Now personally I'm against having nullable bit fields in the database because to me, if it's a yes/no situation, it should be one or the other. There are cases, however, in which something like "yes, no, or no response" is valid, and while you could argue that maybe there should be an indication of the "no response" situation other than null, using null isn't a bad way to handle this situation. So in this case, if I use a boolean data type on the CF side to represent this bit field, that boolean needs to be either true or false. If I get null back from the database, I again have to make an assumption, so I set no to false. Already this isn't a true representation of the data, because I'm assigning a value of "false" to something that should be "nothing." It's even more bad news when I go to put that data back in the database, because at that point the data isn't just a misrepresentation, it's flat out incorrect because the value isn't "no," it's "nothing."

There are certainly ways to manage these situations, but none of them (at least that I've come up with) are pretty, and in all cases they involve a lot of extra work that wouldn't be necessary if we just had nulls in CF. Granted I'm not a language engineer so I have no idea what would be involved with adding nulls to CF, but in my opinion it's absolutely necessary.

While looking into this issue I came across a post on Ben Nadel's blog that shows how to use Java to check for null values, and while that seems to do the trick in one sense, you'd still have to have null flags associated with all your attributes to know how to handle each attribute when you're putting data back into the database. Plus it's still a bit of a hack (albeit a nice one!) when what we really need is null as a first class citizen in CF.

Here's hoping nulls make their way into CF 9!


Completly agree with you Matt. Few weeks ago I've published this: <a href="</p>

It looks like you share the same thoughts.

Also one comment about your integers samp,e from the beggining of yuor entry. In Java you have simple type int which can't be null. Null may be set only to Object. So you can set private Integer i = null; but you can't private int i = null.

Constructing Integer is simple new Integer( int ) but int can't be null :)

Radek--thanks for clarifying that point.

What do you think the result would be if the type-checking in cfargument could accept an empty string for any data type? (ditto for returntype)

I ask because, while I would like NULLs as well, I can see how it would be problematic to add that to the language.

@Steve: Don't think it will be problematic because of one reason - CF doesn't use simple Java types. All numbers are java.lang.Double, ohh... sorry - juest tested:

gives java.lang.String :).

But anyway - all cf types extends java.lang.Object at some point so there is no technical difficulty to let them be null.

ColdFusion checks data types at runtime so this is another reason why it shouldn't be problematic to implement nulls. And there is one mistake in your way of thinking Steve - passing null isn't passing "". null is null - even if argument is required and null is passed that shouldn't failed. It should be developer task to check if the value is null or new cfargument attribute should be presented allowNull="true|false" with true by default.

It is strange CF behavior changing null to "" simply because null shouldn't event be equal to null. I assume in programming languages null==null just to simplify but in T-SQL NULL does not equal to other NULL.

DECLARE @a varchar(10)

DECLARE @b varchar(10)



IF ( @a = @b )


PRINT 'the same'




PRINT 'not the same'


Doh! Again it removed my CF code - here is the code which should be in my previous comment:

<cfset x = true />

<cfdump var="#x.getClass().getName()#">

@Steve--I guess I wants null to be null. Anytime you get into saying things like "well if it's a number and you get a zero-length string, then assume it's null" that's problematic to me because while it may indicate it's null, it might also indicate that there's a problem with your code somewhere that's returning a zero-length string instead of a number.

It's entirely possible I'm being way too anal about this but it's REALLY getting on my nerves lately that there is no null.

I certainly don't to enter a discussion about the feasibility of adding NULL to ColdFusion as that wasn't my point (merely relaying what I have heard).

Similarly, I agree with the desire for having NULLs in the language.

I agree that treating an empty string as a NULL is somewhat problematic. Still, it could be better (in the case of cfargument and returntype) than the current condition. It would at least be consistent with the current practice of ColdFusion to turn a NULL from the database into an empty string.

I program in VB6 and CF in equal amounts. VB has its problems, but they do have a way of assigning null values to a string:


It shouldn't be that hard to implement in CF.

If you call some non-VB API or component, you test the calls with vbNullString. The function you're calling might not check for a NULL string, in which case it might crash. Non-VB functions should check for NULL before processing a string parameter. With bad luck--like in a SQL Server query--the particular function you're calling does not do that. In this case, use "". Usually APIs do support vbNullString and they can even perform better with it!

@Mark: in other words programmers should know what they are doing. They should know when they may expect null value or if they aren't sure if null may be passed they should check for it just in case.

Here's another point I've not seen discussed in this thread.

If you are using a Java component in your CF code and a constructor or method takes an argument where 'null' is meaningful, you have no way to directly invoke it that way from CF.

There are cases where the author of the component does not provide a suitable alternate for the operation via overloading or other technique. So, you end up having to write a worthless wrapper class which you can then call from CF to init or mutate an object in the desired way.

What a pain.

Oh - you have.

createObject("java","package.Class").init( javaCast("null","") )


Yeah, I forgot about that. I guess I was still thinking about before CFMX 7 when there was no JavaCast() function.

There was javaCast in CFMX7 and it was even supporting nulls. JavaCast was in CFMX from the beggining of Java CF implementation AFAIR.

I agree that this is a problem that the adobe team needs to take a look at. I have not run into any issues with this being a problem yet but I can see where it can become an issue with some more advanced applications.

If you didn't know, BlueDragon does support nulls the way you are fact, I found that sometimes I had to slightly tweak my code in areas for BD because it was a little stricter in places about null values that CF just basically ignored. So there definitely is no reason ColdFusion shouldn't have it as well, hopefully they'll get it into CF9 if enough people ask!

I asked for nulls in 7 so 8 has it! I have seen it used in code as JavaCast("null", 0) or JavaCast("null", "")

JavaCast to a null is not the same thing as CFML having null, which it still doesn't in Adobe CF. As Mary Jo pointed out in an earlier comment, BlueDragon and Open BlueDragon does have null as a true CFML language construct.

Absolutely agree. Am a Java and Actionscript programmer learning CF 8 to get up to speed with the rest of my team (who are old school cfm's and custom tags). I am going the CFC way (cause its OOP or death) and the methods are being a real pain. I am stuck between method parameter type errors or database fk violation constraints when i use a value like -1 to represent a null :( Where have all the good nulls gone??!!

Getting XML Text From XML Nodes

I've been working with XML very heavily the past couple of weeks, and for whatever reason this is just one of those things I haven't had to do a whole lot of until now. Don't get me wrong, XML comes up pretty regularly for most people, but I haven't had to live and breathe it this much before.

As with most things in ColdFusion it's incredibly easy to jump in and get going, and for what I'm doing anyway, XmlSearch makes it absolutely trivial to tear through the XML and get the data I need.

I did run into one annoyance and thought I'd see if anyone had a better solution. Again, since I seem to have dodged working with XML this heavily until now I could just be missing something. Specifically I'm referring to the seeming necessity of two lines to get at most things when you use XmlSearch. If you've worked with XML much in CF you probably already know what I mean. Let's assume there's a single node (meaning really a single element array with an XML node in it) returned by this:

<cfset myXmlNode = XmlSearch(myDoc, "/*/myNode") />

To get at the XmlText, I then have to do this:

<cfset myXmlText = myXmlNode[1].XmlText />

What I'd like to be able to do is this:

<cfset myXmlText = XmlSearch(myDoc, "/*/myNode")[1].XmlText />

But unfortunately that syntax isn't valid. To solve the problem I wrote a little UDF called getXmlTextFromXmlSearch (couldn't think of a longer name ;-) that takes in the array and the element number from which you want the XmlText. It works, but I'm a tad irked that this seems to be necessary. What CF could use IMO is an ArrayGetAt *function* so you could wrap arrays in the function instead of having to jump through these other hoops:

<cfset myXmlText = ArrayGetAt(XmlSearch(myDoc, "/*/myNode"), 1).XmlText />

That's essentially what my UDF does but again, seems a bit klunky. Maybe I'm just nitpicking because ColdFusion is so great and I want it to be perfect. ;-)


I'm in the same boat - I barely use XML but here's what I came up with for an XPath expression to get the xml text:


I'm not sure if it's the perfect solution but I do know XPath is pretty powerful stuff (and I've yet to find a *really* good resource for it).

crud...looks like you stripped my code...let me try this:

<cfxml variable="test">





<cfset text = xmlSearch(test, "string(//nodes/node[text()])") />

<cfdump var="#text#">

Thanks Todd--I'm a bit of an XPath moron so I'll need to look into that more.

You could also do this:

<cfset myXmlText = XmlSearch(myDoc, "/*/myNode").get(0).XmlText />

Ah--didn't think of tapping into the Java that way. That's very slick! Thanks radek!

This will work as well

I wasn't sure how to put code in a comment


#HTMLEditFormat( '')#

Sorry Qasim--I need to tidy up the code handling in the comments!

i don't think it;s the best solution

not sure how i happened upon this, as i'm in virtually the same boat this morning, and your post is already indexed in google - well done!

anyhow, radek is right, arrayGetAt IS array.get(index), and fwiw, arrayFind would be array.indexOf(string)

@todd - I think that the xPath functions such (as string()) are only available in CF8, but I have no way to test on 7 atm.

this xmlSearch will expose whether a function is avail in CF xPath: xmlSearch(search, "function-available('testthisfunction')")

My situation was a *little* different, I wanted to get an array of just the text values in a particular node.


xmlSearch(search, '//nodes/node/text()') returns an array that looks like this:


and i coulnd't for the life of me figure out how to get the xpath search down to just xmlValue...becasue it appears that CF is building that DOM node structure back in to the xmlSearch response. so, xPath finds the text, gives it back to CF, and CF goes 'well, hell, this looks like xml, so let's send it back as native CF XML DOM node structure. as it turns out, in CF8, at least, if i output arr[1], i get a string, and not the XML object shown in the dump of the array.