Sunday, March 6, 2011

Using the Google App Engine Datastore with CFML

In my previous post I gave an intro on setting up the Google App Engine (GAE) plugin for Eclipse as well as installing Open BlueDragon for GAE. In this post I'm going to go over some of my sample application from OpenCF Summit 2011, specifically how CFML applications running on GAE interact with the Google Datastore.


I was initially going to cover the entirety of my sample application in one post but I think it's going to be better to split this into smaller, more digestible chunks. I'll start with the Google Datastore, in the next post I'll address file uploads and image manipulation, and in yet another post I'll cover the ability for your application to respond to inbound email as well as receive and send XMPP messages.


Let me preface all of this by stating unequivocally that I am not an expert in how the Google Datastore works, and that I'm going to focus on using it from OpenBD for GAE. What I hope this post will do is not only educate people who are just learning OpenBD for GAE, but also prompt discussions about features and functionality that doesn't yet exist in OpenBD for GAE as it relates to the Datastore, or perhaps there is functionality that is present in OpenBD for GAE that people would like to see work differently.


The Sample Application


If you want to follow along and run things locally you'll want to download the sample application Eclipse project (.tar.gz file, 15.3 MB). You should be able to unzip the file and then import it into Eclipse but please comment if you have any problems with the application files.


The application itself is pretty simple. It's a quick session attendee application, and also has photo upload functionality but we'll cover that in the next post. For now we're going to focus on how to read and write data from the Google Datastore using the GAE-specific functionality that's baked into OpenBD for GAE.


Once you import the project into Eclipse, right-click and choose Run As -> Web Application. After a few seconds you'll see "The server is running" in the Eclipse console, at which point you can go to http://localhost:8888 in your browser and see this:



Ocfs_sample_app_home

Let's add an attendee so you can get a feel for the very simple functionality of the application. Enter a first and last name (I'll use my own name) and click the "Add Attendee" button. You'll then see a bullet list at the top of the screen containing the names of attendees you add. You can clean everything out by clicking the "Delete All Attendees" link, and you can also search for attendees by first name, last name, or both.


As I mentioned above, we'll cover the photo functionality in a future post.


Google Datastore Overview


As you add, delete, and search for attendees, as you might imagine this data is getting stored somewhere. We didn't create a database or configure any datasources, however, so where is the data going?


The answer is the Google Datastore, which is the default data storage functionality for GAE. When you run a GAE project locally in Eclipse you get more or less the entire GAE environment, including a local version of the Google Datastore. As we'll see in a moment, some GAE-specific functionality in OpenBD for GAE allows you to easily interact with the Google Datastore.


The Google Datastore is a schemaless, key/value datastore built on Google Bigtable. At a high level the Datastore can be lumped into the poorly named "NoSQL" database category. What this means is that as opposed to the relational databases and SQL queries that you may be used to, you'll be storing what are called entities in the Datastore, which are objects organized by type or "kind," and entities contain name/value pairs of data called properties. Properties in entities can be of various data types, including integer, float, string, date, binary data, and others. Entities also have an ID or key that uniquely identifies the entity in the Datastore.


Because the Datastore is schemaless, entities of the same kind do not necessarily have to contain all of the same properties. Using contact data as an example, you might store two entities that are both of kind "contact." One contact might have a fax number where another might not. When storing these entities in the Datastore both entities could be defined with a kind (type) of "contact," but the contact with the fax number would have a fax property, whereas the contact without a fax number wouldn't have a fax property at all. In other words, the contact without a fax number would not have a fax property that was empty; rather, the fax property doesn't necessarily exist in the contact without a fax number even though both entities are of kind "contact."


Entities can be retrieved from the Datastore by key, by property values, or by querying the Datastore using GQL. GQL is a query language with some similarities to SQL, but given the nature of how the Datastore stores data there are of course important differences. We'll be looking at some basic ways of interacting with the Datastore in your CFML applications in the next section of this tutorial.


For additional information about the Google Datastore, refer to:



Writing to the Google Datastore


Saving New Entities


In the same application we'll be writing ColdFusion Components (CFCs) to the Datastore. OpenBD for GAE can save both CFCs and structs to the Datastore, and the simple datatypes contained in the value of struct keys, or in the variables scope of CFCs, are mapped to entity properties in the Datastore. Specifically, numbers, booleans, strings, dates, or arrays of these datatypes are what is currently able to be persisted to the Datastore by OpenBD for GAE.


Let's take a look at how the simple attendee form data is stored to the Datastore. The form on the index page of the application is a standard HTML form with form fields firstName and lastName. The form posts to _addAttendee.cfm. On _addAttendee.cfm an instance of an Attendee CFC is created (you can see the code for the Attendee CFC in the sample application's model directory), the form data is set to the CFC's variables scope via the CFC's init() method, and the CFC is then saved to the Datastore.



<!--- create attendee CFC and save it to google datastore --->
<cfset attendee = CreateObject("component",
            "model.Attendee").init(form.firstName, form.lastName) />
<cfset GoogleWrite(attendee) />

<!--- could also have done:
<cfset attendee.GoogleWrite() />
--->

<cflocation url="index.cfm" addtoken="false" />


What's noteworthy in the code above is the use of the GoogleWrite() method, which is baked into OpenBD for GAE. This can be used as a top-level function to which a CFC or struct can be passed, or the base CFC in OpenBD for GAE includes a GoogleWrite() method so GoogleWrite() may be called on the CFC instance itself (as shown in the commented-out chunk of code above).


What happens behind the scenes when GoogleWrite() is called is the CFC becomes an entity in the Datastore, the data in the CFC's variables scope--in this case firstName and lastName strings--become the entity's properties, the entity is assigned a "kind" of attendee based on the CFC type (N.B. this is the CFC type, not the name of the variable being stored), and the entity is assigned a key as well as an ID/Name by the Datastore when it is saved.


You can view data stored in the Datastore using the GAE dashboard. The dashboard is available both locally as well as on the production GAE platform, but the live (non-local) version has a lot more options and information than the local version.


If you haven't already done so, or if you deleted all your data, save a new attendee to the Datastore by filling out the form on the index page of the application and click the "Add Attendee" button.


Next, browse to the GAE dashboard at http://localhost:8888/_ah/admin :



Gae_local_dashboard_home

Click on the Datastore Viewer link on the left, then in the Entity Kind drop-down choose "attendee" and click List Entries:



Gae_datastore_entity_list

As mentioned above, the CFC type of attendee has become an entity kind in the Datastore, so any attendee CFCs you save will be put in this same bucket. A key and ID/Name has been assigned to the object, the firstName and lastName properties have been stored, and an additional field of org.openbluedragon.gae.keyname.cfc has been added, which has a value of model.Attendee. We'll see this in action when we pull data from the Datastore in a moment, but this value is stored so the CFC can be reconstituted in your CFML code when it's pulled from the Datastore. In other words when you pull this record back into CFML, you'll get an instance of the Attendee CFC, and if you pull back multiple attendees, you'll get an array of CFCs as opposed to a query object.


You can also save entities to the Datastore by explicitly specifying a kind and a key name. If you specify the key name when saving you will be able to pull the entity back using a combination of kind and key name as we'll see when we get into retrieving data from the Datastore. If you don't specify a key name when you save an entity to the Datastore, it's auto-assigned an integer value but note that this auto-assigned value cannot be used to pull back the entity. Unless you specify a key name (which shows up in the ID/Name field in the Datastore viewer) you'll have to use the unique key or query the Datastore to retrieve the entity.


Here's a quick example of saving an Attendee CFC to the Datastore and explicitly specifying the kind and key name:



<cfset attendee = CreateObject("component", "model.Attendee").init("Homer", "Simpson") />
<cfset GoogleWrite(attendee, "attendee", "MyKeyName") />


This will save the attendee CFC as an entity of kind attendee with a key of "MyKeyName" in the Datastore. Note that if you do another GoogleWrite() using the same kind and key name, this will update the record with the key name specified. (More on updating entities in a moment.)


Now let's write up a test page that saves a struct to the Datastore. This code is not included in the sample application download so you'll be creating this file from scratch.


Right-click on the war directory in the project in Eclipse and choose New -> File. Call the file structtest.cfm (or whatever you want to call it), and enter the following code:



<cfscript>
foo = StructNew();
foo.foo = "bar";
foo.bar = "baz";

googleKey = GoogleWrite(foo, "MyNewKind");
</cfscript>

<cfoutput>googleKey = #googleKey#</cfoutput>


Since we're not saving a CFC in this case, we have to provide a kind under which the Datastore will store this data. This is the second argument in the GoogleWrite() method, this first of which is obviously the data you wish to save. GoogleWrite() returns the key of the entity that was just saved and this is output in the final line.


With this struct data saved, go back to the dashboard, click on Datastore Viewer, and look at the Entity Kind drop-down. You'll see your new "MyNewKind" available as an entity kind:



Gae_datastore_new_entity_kind

Choose MyNewKind in the entity kind drop-down and click "List Entities," and you'll see the struct data has been saved.



Gae_new_entity_kind_display

Because this is a struct and not a CFC, you'll notice that there is no org.openbluedragon.gae.keyname.cfc property in this entity, since that isn't needed to create a CFC instance when the data is returned.


Updating Existing Entities


Updating existing entities in the Datastore is done in one of two ways:



  1. Read the existing entity, update the properties that need to be updated, and write the modified entity back to the Datastore

  2. Write an entity directly to the Datastore (without reading first) using the entity kind and ID/name property


The way updates are handled in the Datastore is a bit different than the typical SQL update statements since you're saving the entire entity again as opposed to updating discrete properties on the entity directly in the Datastore. When updating an entity in the Datastore, you'll typically first want to read the entity from the Datastore, make the necessary updates to the entity's properties, and then write the entity back to the Datastore.


For example, if I need to update the first name on an attendee entity with a key of ahBvcGVuY2ZzdW1taXQyMDExcg4LEghhdHRlbmRlZRgNDA I run the following code:



<cfscript>
    attendee = GoogleRead("ahBvcGVuY2ZzdW1taXQyMDExcg4LEghhdHRlbmRlZRgNDA");

    attendee.setFirstName("NewFirstName");

    GoogleWrite(attendee);
</cfscript>


The read/update/write steps may seem redundant but this does ensure that you're updating the most recent version of the record, at least as of when you did the GoogleRead(). Many other "NoSQL" databases such as CouchDB take this same approach. CouchDB is also a schemaless, document-based datastore and it requires that you pass a version number of the document you are updating to ensure that you're updating the latest revision of the document.


If you have written an entity and set an explicity ID/Name property you can update the entity by doing a GoogleWrite() and using this same ID/Name value:



<cfscript>
    attendee = CreateObject("component", "model.Attendee").init("NewFirstName", "NewLastName");

    GoogleWrite(attendee, "attendee", "MyExistingKeyName");
</cfscript>


Before moving on to retrieving data from the Datastore, a quick reminder about the schemaless nature of the Datastore. We could have specified a kind of "attendee" when saving the struct above, and it would have saved the struct as the same kind of entity as our attendee CFC, even though the data is not of the same type on the CFML side, and the two entities would not have any properties in common. Here again if you're familiar with CouchDB, another schemaless, document-based datastore, this will sound familiar.


Reading From the Google Datastore


Retrieving Entities by Key


Now that we have some data in the Datastore, let's look at the various ways in which the data can be retrieved, some of which were already introduced above.


The simplest data retrieval method is obviously to use a specific key to pull back a specific record. Since there isn't an example of pulling back a specific record by key in the sample application, we'll create another quick test file.


Right-click on the project's war directory in Eclipse and choose New -> File. Call the file retrievebykey.cfm (or whatever you want). Next, open your GAE dashboard, go to the Datatstore Viewer, choose an Entity Kind of attendee, and click on List Entries. Copy the Key property from one of the entities and paste that value into the code below. My key is ahBvcGVuY2ZzdW1taXQyMDExcg4LEghhdHRlbmRlZRgNDA in the example.



<cfset attendee = GoogleRead('ahBvcGVuY2ZzdW1taXQyMDExcg4LEghhdHRlbmRlZRgNDA') />

<cfoutput>#attendee.getFirstName()# #attendee.getLastName()#</cfoutput>


The GoogleRead() function of OpenBD for GAE is used to retrieve entities by key. GoogleRead() takes a single key, an array of keys, or a kind and ID/name as arguments. Note that if kind and ID/name are used this is instead of using the key since the key is unique across entity kinds, whereas the ID/name value (an integer) is not.


Let's play with this a bit more. If you only have one attendee in your Datastore add another, and view the attendee entities in your GAE dashboard again. Copy two keys and paste into this code (you can create a new file or use the same new file you created earlier):



<cfscript>
    keys = ArrayNew(1);
    keys[1] = 'ahBvcGVuY2ZzdW1taXQyMDExcg4LEghhdHRlbmRlZRgNDA';
    keys[2] = 'ahBvcGVuY2ZzdW1taXQyMDExcg4LEghhdHRlbmRlZRgQDA';
   
    attendees = GoogleRead(keys);
</cfscript>

<cfdump var="#attendees#" />


This will return an array of Attendee CFCs that match the keys in the array.


Retrieving Entities by Kind and Key Name (ID/Name)


Before moving on to querying the Datastore, let's look at an example of pulling an entity back by kind and ID/name. When you look at the Datastore Viewer in the GAE dashboard, you'll notice that all your entities are given an integer ID/name by default, which we discussed above. We also discussed the fact that you can explicitly assign an ID/Name value when you write an entity to the Datastore using GoogleWrite().


When retrieving entities by kind and ID/name, if you have explicitly assigned a value for the ID/name property you can retrieve by this value.



<cfset attendee = GoogleRead("attendee", "MyExistingKeyName") />
<cfdump var="#attendee#" />


The ID/name property is case insensitive, so MyExistingKeyName and myexistingkeyname (and all variants thereof) are the same as far as the Datastore is concerned.


Querying the Google Datastore


In the overview section above I mentioned that the Google Datastore uses a SQL-like language called GQL for Datastore queries. This is at the most raw level, however. When you're using the Datastore through OpenBD for GAE, you'll be querying the Datastore using a slightly different syntax than the raw GQL syntax, and it's one based on JDOQL (Java Data Objects Query Language), which is not unlike other ORM-style languages such as Hibernate Query Language (HQL).


As explained on the Google Datastore wiki page on the OpenBD wiki, the syntax supported is actually a subset of JDOQL as follows:



SELECT FROM <kind>
[WHERE <filter>]
[ORDER BY <sort clause>]
[RANGE <from index>, <to index>]


All queries against the Datastore must start with SELECT FROM <kind> but all other parameters of the query are optional.


As an example, let's first retrieve all entities of kind attendee from the Datastore. This example in a slightly different guise is contained in the sample application's services.AttendeeService CFC, but here's the very simple code:



<cfset allAttendees = GoogleQuery("SELECT FROM attendee") />


Note that the kind value (attendee in this case) is case-insensitive. This will return an array of attendee CFCs, not a query object. You may also use the CFQUERY tag and specify DBTYPE="GOOGLE" to query the Datastore:



<cfquery name="allAttendees" dbtype="google">
  SELECT FROM attendee
</cfquery>


Here too the variable allAttendees will be an array of attendee CFCs, not a query object, even though the CFQUERY tag is being used.


Next let's take a look at how to retrieve all attendees with the last name Smith.



<cfset theSmiths = GoogleQuery("SELECT FROM attendee WHERE lastName == 'Smith'") />


This should look rather familiar to those of us accustomed to SQL, but note the double equal sign (==) instead of the single equal sign you may be used to.


Multiple where clauses can be strung together using boolean operators. In this case we'll get back any attendees with a firstName of John and a lastName of Smith:



<cfset allJohnSmiths = GoogleQuery("SELECT FROM attendee WHERE firstName == 'John' && lastName == 'Smith'") />


Note the && operator for "and," and also be aware that the values of firstName and lastName must match exactly and are case sensitive.


Additional operators are available on numeric datatypes. Let's assume our attendee entities included the attendees age, and we wanted to get an array of all attendees who are 40 or older:



<cfset oldsters = GoogleQuery("SELECT FROM attendee WHERE age >= 40") />


Finally, we can use the RANGE clause to limit the number of results we get back. Let's say we only want to get back the first five John Smiths in the Datastore:



<cfset topFiveJohnSmiths = GoogleQuery("SELECT FROM attendee WHERE firstName == 'John' && lastName == 'Smith' RANGE 1, 6") />


The RANGE is one-based and the TO value in the range is not inclusive, hence the use of 1, 6 above to get the first 5 John Smiths in the Datastore.


Limitations


Limitations to querying the Datastore include:



  • Operators in the WHERE clause are limited to ==, >, >=, <, <=

  • Operators for or (||) and not equal to (!=) are not supported.

  • "Like" queries are not supported, meaning for example you cannot run a query that will return all attendees with a last name starting with S (in SQL terms, WHERE lastName LIKE 'S%'). There are some efforts for creating full-text searching against the Datastore but none of these seem particularly mature at this point.


Deleting Entities From the Google Datastore


Deleting from the Datastore is simple. You use the GoogleDelete() function and pass in one of the following:



  • The key of the entity you want to delete

  • An array of keys of entities you want to delete

  • The object (CFC or struct) you want to delete

  • An array of objects (CFCs or structs) you want to delete


The base CFC in OpenBD for GAE also includes a GoogleDelete() function, so you may call GoogleDelete() directly on an instance of a CFC that has been persisted in the Datastore (e.g. myAttendee.GoogleDelete()).


Let's look at a couple of examples.



<cfscript>
  // delete by key
  GoogleDelete("");

  // delete multiple records by key
  keys = ArrayNew(1);
  keys[1] = "";
  keys[2] = "";
  GoogleDelete(keys);

  // delete an attendee CFC; this assumes you have
  // done a read so the google key is in the CFC
  GoogleDelete(attendee);

  // delete multiple CFCs; assume attendees is an array
  // of attendee CFCs that have been read so the key is
  // included in each
  GoogleDelete(attendees);

  // call GoogleDelete() directly on an attendee CFC
  attendee.GoogleDelete();
</cfscript>


Examples From the Sample Application


I wound up spending a lot more time discussing the Google Datastore in general and its basic use from OpenBD for GAE than I originally intended, but I think going through things blow-by-blow is pretty important for both learning and reference purposes.


At this point the examples in the sample application are probably self-explanatory, so I'll leave studying those and tracing the actions in the application and the interaction with the Google Datastore as an exercise for the reader.


As I said above what I'm really interested in doing with this post in addition to educating people is hearing from people who are using OpenBD for GAE if there is new functionality you'd like to see implemented, or existing functionality you'd like to see work differently.


In my next post, I'll cover doing file uploads and using the Google App Engine image manipulation functionality, which will be a much shorter post than this one. :-)


 

3 comments:

Adam Knott said...

I really like using the GAE Datastore. It makes storing and retrieving objects a breeze.The one thing that I think is missing in the OpenBD implementation is the inability to store entities with a parent-child relationship (unless I am missing something).The GAE Datastore supports parent child relationships within the same kind. It would be great to be able to do:googleWrite(kind,keyname,parentKey)If we had parent child relationships it would open the door to other functionality like full text search. Other implementations of full text search use parent child relationships like this:The parent entity contains the data for your object. The child entity contains the index for the values in your parent object. The index in the child object is basically one array (google calls a list). This list could be generated by a call to Lucene.To search you would query the child entities and have the datastore return only the key and not the entire entity. The key of the child entity is comprised of the parent entity key + the child entity key. You then parse the key and read the parent entity from the datastore.Google supports returning only the key in a query like this:select __key__ from kind where ...The reason for only returning the key is that the datastore would not have to serialize a potentially large array that you don't really need anyways. Google says that serializing these arrays is a very costly operation and can be avoided in this case.If the parent - child functionality existed, you would be able to add a cfproperty tag to a component with an attribute of searchable="true". Then in the base component.cfc, functionality could be added to googleWrite() to create child entities for properties that were marked "searchable". Then, of course, you would also need a googleSearch() function.Sorry if I got too far off track with the full text search stuff, but I wanted to make a case for parent child - relationships.Thanks for the great posts,Adam

penny506 said...

It should also be noted: ":Only one inequality filter per query is supported. Encountered both startDate and endDate"select from event where eventid =='9A77B5AD-155E-4513-86BADE82EAE558D0' && startDate >= '05/22/2011' && endDate <= '05/22/2011' This will not work

penny506 said...

It should also be noted that when doing date searches, there is a format preference as well: http://code.google.com/appengine/docs/python/datastore/gqlreference.htmlDATE('YYYY-MM-DD')among others.