Matt Woodward’s posterous

Matt Woodward’s posterous

Matthew Woodward  //  * CFML, Grails, and Java Developer
* Principal IT Specialist, US Senate
* Open BlueDragon Steering Committee Member
* All-Around Geek

Jun 3 / 3:15pm

Open Source Bridge - Relational vs. Non-Relational

Josh Berkus, PostgreSQL Experts Inc.
  • overview focused on choosing what type of database you need vs. investigation of any specific database
  • up until a few years ago there were only a handful of options for open source databases
    • most were sql/relational
    • a few written in java
    • only really exciting thing going on in the relational world is postgres vs mysql
  • today there are many more open source databases
    • as many as 5 dozen now?
  • databases for lots of different purposes, but lots of people want to lump a lot of the new ones under the "nosql movement" label
    • not so fond of this term
    • has implication that every database that doesn't have a sql interface is more or less identical
    • all non-relational databases aren't the same
    • have graph, document, key-value, distributed, hierarchical ... quite different from one another
    • some of the non-relational databases have sql interfaces
  • all relational databases aren't the same either
    • embedded, oltp, mpp, streaming, c-store ...
  • mythbusting
    • "revolutionary" is bandied about a lot but database technology goes back a long way
    • not really any new database designs in terms of fundamental architecture
    • are new implementations and combinations of design
    • last "new" thing was map-reduce in 2002
    • even couchdb is largely similar to Pick, which was created in 1965
    • when looking at new databases, don't look for revolutionary concepts, look for good implementations
    • what's going on right now is actually a renaissance of non-relational databases
  • myth: "non-relational databases are toys"
    • google - bigtable
    • amazon - dynamo
    • facebook - memcached
    • us veterans administration - pick, cache
  • myth: "relational databases will become obsolete"
    • xml databases were supposed to replace rdbms ca. 2001 -- didn't happen
    • rdbmses evolved to include xml functionality
    • one of the things we'll see out of the current non-relational innovation is that some of the implementations will hybridize with one another
  • myth: "relational databases are for when you need ACID transactions"
    • transactions != relational -- orthagonal features
    • robust transactions without relationality: berkelydb, amazon dynamo
    • sql without transactions: mysql isam, ms access
  • myth: "users are adopting nosql for web-scale performance"
    • sometimes it is, sometimes it isn't
    • performance test done by myyearbook.com
      • benchmark of key/value storage and retrieval
      • only real difference in performance is between databases that guarantee durability and those that don't
    • horizontal scalability
      • some non-relational databases are built for horizontal scalability and some aren't
      • complexity of implementation rises with the ability to scale out to a massive number of nodes
  • myth: "one ring theory of database selection"
    • "what's the best database to use?" - wrong question
    • don't need to use only one database
    • choose the db that meets your applicaton's goals, or use more than one together
    • use a hybrid
      • mysql ndb
      • postgresql hstore
      • hadoopdb
  • but what about choosing between relational and non-relational?
  • relational oltp databases
    • transactions: more mature support
    • constraints: enforce data rules absolutely
    • consistency: enforce structure 100%
    • complex reporting: keep management happy!
    • vertical scaling (but not horizontal)
  • sql vs. no sql--sql promotes ...
    • portability
    • managed changes over time (ddl)
    • multi-application access
    • many mature tools
    • but, sql is a full programming language and you have to learn how to use it
  • no sql promotes ...
    • programmers as dbas
    • no impedance mismatch
    • fast interfaces
    • fast development and deployment
    • but, man involve learning complex proprietary APIs
      • in some cases not easier than sql
  • main reason to use sql-relational databases
    • "immortal data"
    • your data has a life independent of this specific application implementation
    • important to be able to access data accurately and consistently forever
  • how DO i choose?
    • define the problem you're trying to solve
      • what is it that my application wants to do with this data and how does it want to do it?
      • i need a database for my blog, i need to add thousands of objects per second on a low-end device, etc.
    • from the definition you create you can define a database shopping list
      • define the features you ACTUALLY need
  • fit the database to the task
  • "I need a database for my blog"
    • use anything!
    • no open source databases that wouldn't support someone's individual blog, flat files would even work
  • "I need my database to unify several applications and keep them consistent"
    • high-end sql-relational database best choice for this
    • "PostgreSQL: It's not a database, it's a development platform"
  • "I need my application to be location-aware" -- geo applications
    • PostGIS - geographic relational database
    • queries across "contains" "near" "closest"
    • complex geometric map objects
    • couchdb spatial and spatialite are now available as well
  • "I need to store 1000s of event objects per second on a piece of embedded hardware"
    • db4object -- embedded key-value store
    • others: berkeleydb, redis, tokyocabinet, mongodb
    • db4object
      • project was german train system -- records data every few milliseconds
      • low-end embedded console computer
      • simple access in native programming language (java, .net)
  • "I need to access 100K objects per second over thousands of connections from the web"
    • memcached - distributed in-memory key-value non-persistent database
    • use: public website
    • typically supplements another database
    • alternatives: redis, kyototyrant, etc.
  • "i need to produce complex summary reports over 2tb of data"
    • luciddb - relational column-store database
    • for reporting and analysts
    • large quantities of data
    • complex olap and analytics
    • used along-side oltp running production apps
  • "I have 100s of govt documents I need to serve on the web and need to mine the data as cheaply as possible"
    • CouchDB
    • storing lots and lots of government documents that didn't have a consistent format (don't know content or structure)
    • used in combination with postgres to keep structured metadata
    • couchdb is also great for mobile apps
  • "I have a social application and I need to know who-knows-who-knows-who-knows-who"
    • surprisingly hard question to answer with a normal db
    • use a graph database -- neo4j is most popular open source one
    • social network website
    • 6 degrees of separation
    • "you may also like"
    • type and degrees of relationship
  • "I get 1000s of 30K bug reports per minute and I need to mine them for trends"
    • used on mozilla firefox crash reports
    • hadoop -- massively parallel datamine
    • hadoop + hbase
      • reports are then put into postgres for viewing
  • conclusion
    • database systems do better at different tasks
      • every database feature is a tradeoff
      • no database can do all things well
      • need to make tradeoff decisions when picking databases
    • relational vs non-relational doesn't matter
      • pick the database(s) for the project or task
Questions
  • how difficult to migrate from something like couchdb to postgres?
    • depends on how much data and in what form
    • since couchdb works with json that's pretty easy
    • if you want to take the document structure out of something like couchdb and put it into a relational model, the decomposition process will be complicated

Comments (0)

Nov 6 / 8:04am

The "NoSQL" Discussion has Nothing to Do With SQL | Communications of the ACM

Recently, there has been a lot of buzz about “No SQL” databases. In fact there are at least two conferences on the topic in 2009, one on each coast. Seemingly this buzz comes from people who are proponents of:

• document-style stores in which a database record consists of a collection of (key, value) pairs plus a payload. Examples of this class of system include CouchDB and MongoDB, and we call such systems document stores for simplicity

• key-value stores whose records consist of (key, payload) pairs. Usually, these are implemented by distributed hash tables (DHTs), and we call these key-value stores for simplicity. Examples include Memcachedb and Dynamo.

In either case, one usually gets a low-level record-at-a-time DBMS interface, instead of SQL. Hence, this group identifies itself as advocating “No SQL.”

Great first part of a two-part series about data storage and how "NoSQL" doesn't at all get at what things like CouchDB, MongoDB, etc. are all about.

Filed under // CouchDB Databases

Comments (0)

Oct 26 / 4:41pm

Massive CouchDB Brain Dump

The following is a semi-unorganized brain dump of everything of interest I've come across while learning the incredibly cool CouchDB document-oriented database system. In this brain dump I pull things from many different resources including my own head, so there may be literal quotes from some of these resources without inline attributions. For that I apologize, but rest assured I'm not trying to plaigarize anyone or take credit where it's not due; I was just merely taking notes as I perused a lot of different resources and organized them in a way that made sense to me. I do have a complete list of all of the resources I used at the end to let you explore on your own. Again, my apologies to the creators of the resources from which I pull for not attributing inline.

I'll be presenting CouchDB to the ColdFusion Meetup on December 17 (Charlie did a great job of booking a full schedule through the end of the year) so don't miss it!

CouchDB: General Concepts

  • document-oriented
  • schemaless
  • NOT RELATIONAL
  • JSON-based
  • REST-based
  • MapReduce
  • basically throw out everything you know about databases and you'll pick this up a lot faster
  • Calls are made to the database via HTTP
    • Yes, that means via a browser, curl, cfhttp ... anything that talks HTTP
  • Responses come back as JSON
  • Lock-free design--reads don't have to wait for writes or other reads
  • Why are these good things?
    • More like how data works in the real world
      • e.g. business cards--if one has a fax and another doesn't, in an RDBMS you have to have a fax field that's going to be null for anyone who doesn't have a fax
      • with CouchDB, you can have one record with a fax field, and another with no fax field, but they're both considered business cards since every document in CouchDB is 100% self-contained
    • Simple
      • some of this stuff is so simple you'll be amazed there isn't more to it
    • Fast
      • Push/Pull of JSON data over HTTP
      • No messy, time-consuming joins between tables--a document contains all its data
    • Scalable
    • Takes the object-relational mismatch out of the picture to a certain extent
    • It works like the web does
  • History of CouchDB
    • development started in 2004
    • originally written in C++, now in Erlang
    • Damien Katz quit his job and self-funded development full-time for 2 years
      • was formerly with IBM working on Lotus Notes, also a brief stint at MySQL
    • Damien Katz is now a full-time employee at IBM and gets paid to work on CouchDB full time
    • CouchDB is a top-level Apache project and is released under the Apache 2.0 license
  • CouchDB's motto: RELAX.
    • we shouldn't have to worry about data so much


Why Use CouchDB?

  • throws out the relational model and looks at what matters with data in the majority of applications
  • vastly simplifies data modeling and interaction with data
  • extremely flexible since there are no preset schemas
  • in relational databases, data does get to a point where it's unwieldy and slow to access
  • relational model is hard to scale, and doesn't do so very naturally or quickly
  • CouchDB offers ...
    • robust, dead simple replication to any number of servers
    • bi-directional conflict detection and resolution
    • fantastic performance on huge databases
      • number of records in a database has very little impact on performance
    • fantastic scalability
      • Erlang was designed for real-time telcom apps in the 1980s, so it's ideal for high scalability and highly concurrent apps like database servers
      • early testing with CouchDB shows it can handle 20,000 concurrent connections with no problems, and they haven't even done and performance profiling yet
        • lead developer said in an interview that using conventional threading in C++ you'd be lucky to handle 500 concurrent connections
      • Erlang also can help with multi-machine scalability, failover, etc. but CouchDB is not taking advantage of any of this yet
  • CouchDB speaks the language of the web
    • REST, HTTP, and JSON are how CouchDB works natively
  • already gaining a huge amount of traction and becoming very popular


Is the Relational Model Dead?

  • there is an increasing indication that the relational model will begin being seen as a solution, not the solution
  • Map/Reduce is simply a better model for dealing with large datasets and taking advantage of parallel processing
  • "Map/Reduce will kill every traditional data warehousing vendor in the market. Those who adapt to it as a design/deployment pattern will survive, the rest won't."
    • Might think this came from a non-relational database vendor, but it's actually from Brian Aker, one of the original authors of MySQL and currently working on the Drizzle (http://drizzle.org/wiki/Main_Page) fork of MySQL
  • document-based databases like CouchDB scale far better and easier than relational databases do
    • both Amazon and Google came up with their own database solutions for their cloud computing platforms as opposed to using a traditional RDBMS--this should tell you something
  • Better and more natural fit for applications


More on "Better Fit for Applications"

  • self-contained documents
    • no more taking a real world construct and deconstructing it into a relatonal model
  • flexible schemas
    • two documents can be of the same type and not contain the same fields--don't have to have a bunch of nulls involved, worry about foreign keys, etc. etc. since every document is self-contained
    • if you need a change in your schema, it's dead simple to do--just start using the new schema
      • if you don't care that the old documents don't have the new field, you don't have to worry about them
  • speaks our language as application developers
    • REST and JSON--doesn't get much simpler than that
  • since it's all web based you take advantage of the following at the database level
    • can handle more traffic since connections aren't left open
    • clustering, proxying, caching, security, etc. behaves just as it would with an HTML document
  • Creator of CouchDB said one of the goals was to have users feel like "you could touch your data ... like it was right there in your hands"
    • eliminating all the layers between your application and your data


Relational Model vs. Document-Based aka "Key/Value Store" Databases

  • relational diagram
  • key/value diagram
  • CouchDB pros
    • ideally suited for cloud computing
    • more natural fit with the code we write -- no ORM mismatch nonsense to worry about
  • CouchDB cons
    • relational databases enforce integrity at the database level
    • schemaless nature of CouchDB means your data integrity is that the APPLICATION level
      • bugs in application code using RDBMS don't lead to data integrity issues
      • bugs in application code using CouchDB CAN lead to data integrity issues
      • really this just puts this concern in a different place, but it's something to be aware of
    • no shared standards between key/value database vendors
      • much easier to move from SQL Server to MySQL than it would be to move from CouchDB to Amazon or Google
  • other concerns with cloud databases ...
    • limtations on analytics -- e.g. Amazon queries cannot take longer than 5 seconds to run
    • limitations on data returned -- e.g. Google queries cannot return more than 1000 rows
  • from an application development standpoint, in my experience thus far this does bring your data repository a bit more into the realm of your application
    • again, this isn't SQL, so your code isn't running queries and dealing with query objects; instead it's making HTTP calls and dealing with JSON
    • less friction between your app and your data, but be aware that it's a bit of a whole new world when you're working with CouchDB


Should You Consider CouchDB?

  • yes if you ...
    • have tables with lots of columns of which you typically only use/display a few
    • have lots of joins in your queries
    • are serializing JSON or XML data into single columns in your relational database
    • have data that is more heirarchical or flat than it is relational
    • have systems that require frequent schema changes
    • are reaching the performance capacity of a single database server and need to scale out
    • have an amount of data that is difficult for a single server to hold
    • have background processes running on your database that impact performance of the database as a whole
  • the nice thing about CouchDB is that it's highly and easily scalable by its very nature
    • but if you don't need scalability now, you don't have to worry about it; you just get it when you need it practically for free
  • Why not just dump JSON data into a relational database?
    • because RDBMSes don't know anything about JSON, so you don't get any of the huge efficiency and functionality advantages you get with CouchDB


Other Document-Based Databases


Building/Installing CouchDB

  • basic requirements: Erlang, SpiderMonkey (JavaScript engine), other miscellany
  • on Linux you'll need to install some prerequisites/dependencies if you don't have them; here's the list for Ubuntu ...
    • sudo apt-get install subversion
    • sudo apt-get install libtool
    • sudo apt-get install automake
    • sudo apt-get install libmozjs-dev
    • sudo apt-get install libicu-dev
    • sudo apt-get install curl
    • sudo apt-get install libcurl4-gnutls-dev
    • sudo apt-get install erlang-dev
    • sudo apt-get install erlang-nox
    • sudo apt-get install openssl
    • sudo apt-get install libssl-dev
      • double-check you have openssl and libssl installed; otherwise you may not get an error until you first try to run CouchDB
  • alternatively you can try ...
  • then grab the code and build it (do this from your home directory or wherever you like)
    • svn co http://svn.apache.org/repos/asf/couchdb/trunk couchdb
    • cd couchdb
    • sudo ./bootstrap
      • You should see "You have bootstrapped Apache CouchDB, time to relax." If not, fix any dependency issues it lists (the error messages are very explicit).
    • sudo ./configure
      • You should see "You have configured Apache CouchDB, time to relax."
    • sudo make
    • sudo make install
      • if you don't get any errors with make or make install you should be able to launch CouchDB!
  • Check http://wiki.apache.org/couchdb/Installing_on_Windows for information about installing on Windows. Haven't tried this myself, likely won't, so best of luck.


Running CouchDB

  • on Linux the install process puts the couchdb script in your path, so you can open a terminal and type sudo couchdb
    • you should see:
      Apache CouchDB has started. Time to relax.
      [info] [<version_number>] Apache CouchDB has started on http://127.0.0.1:5984/
    • If you see an error along the lines of {"init terminating in do_boot",{undef,[{crypto,start,[]} ... that means you don't have erlang-nox and/or libssl-dev installed, so you'll have to go back through the steps above once you have those dependencies resolved.
    • If you have other errors when trying to start CouchDB check http://wiki.apache.org/couchdb/Error_messages


Interacting with CouchDB with CURL

  • some basic examples
    • curl -X GET http://localhost:5984/
      • returns basic server info
    • curl -X GET http://localhost:5984/_all_dbs
      • returns list of all databases on the server
    • curl -X PUT http://localhost:5984/contacts
      • creates a new database called contacts
    • curl -X PUT http://localhost:5984/contacts/6e1295ed6c29495e54cc05947f18c8af -d '{"firstName":"Matt", "lastName":"Woodward", "email":"matt@mattwoodward.com"}'
      • creates a new document in the contacts database; the string after the database name is a UUID
    • curl -vX PUT http://localhost:5984/contacts/6e1295ed6c29495e54cc05947f18c8af/headshot.jpg?rev=2-2739352689 -d@headshot.jpg -H "Content-Type: image/jpg"
      • attaches a headshot jpeg to the document with the ID provided
    • curl -X GET http://localhost:5984/_uuids
      • CouchDB returns a new UUID; can add ?count=N to get back N UUIDs if you need more than one
    • curl -X GET http://localhost:5984/contacts/6e1295ed6c29495e54cc05947f18c8af
      • returns the document with the UUID provided
    • curl -X DELETE http://localhost:5984/contacts/6e1295ed6c29495e54cc05947f18c8af?rev=2-212344
      • deletes the document with the ID provided; note that you must provide the latest revision number for the document in order for the delete to succeed
    • curl -X DELETE http://localhost:5984/contacts
      • deletes the contacts database
    • curl -X POST http://localhost:5984/_replicate -d '{"source":"contacts","target":"contacts-replica"}'
      • replicates the contacts database to the contacts-replica database
  • of course since this is just HTTP, you can use CURL's -v flag to get a verbose listing of everything CouchDB is doing on each request
  • performing updates is a bit different
    • if you do a PUT of a document with the same ID but don't include a revision number, the update will fail
    • you have to include the latest revision number in CouchDB in updates for them to work
    • what this means in practice is that you'll pull the document you want to update back, update the JSON (or update the data in your application code), and then do a put of the updated document with the new data since this will contain the most recent revision in CouchDB
    • the updated document gets a new revision number, and the original document is retained in CouchDB as a previous revision


Versioning of Documents

  • CouchDB uses a multi-version concurrency control (MVCC) system
  • each document in CouchDB gets a revision number
  • previous versions of documents are saved in CouchDB
    • BUT ... unlike a version control system, there is no guarantee how long the previous versions will be retained
    • you can tell CouchDB you want to retain the previous versions of a document if you need to
  • remember that all communication with CouchDB is done over HTTP
    • HTTP is stateless--you open a connection to CouchDB, make a request, then the connection is terminated
      • this is good because it means CouchDB can handle a lot of traffic since connections are short-lived
  • If you're familiar with Etags in the HTTP world, CouchDB uses its revision numbers as Etags in HTTP responses
    • Etags are very useful for caching
    • since all documents in CouchDB are really just resources in the HTTP/REST sense, your data behaves like any other HTML resource


Futon: CouchDB's Web-Based Interface

  • browse to http://localhost:5984/_utils for the web-based interface to CouchDB
  • handy way of perusing your documents, managing datbases, etc.
  • definitely handy for creating design documents and views
    • mini editor for creating temporary and permanent views--can execute temporary views from within the editor
  • can kick off replication and a ton of other stuff from Futon


Creating a Document

  • new documents have _id and _rev fields added automatically
  • documents are versioned much like code is in SVN, so every version of every document in the db is stored
  • click "add field" in Futon to add a new field to a document
  • double-click value (default is null) to edit
  • values must be JSON valid data
    • strings have to have quotes around them, e.g. "hello" not just hello
    • valid datatypes are string, number, boolean, list, and key/value dictionaries
  • you can do a "view source" on a document from Futon to see the JSON version of the document
  • as you update a document, the version number will change with each revision
    • if another process changes the document before you save your changes, a conflict will arise
  • CouchDB has no concept of "types"
    • e.g. in a blog application we would think of "posts" and "comments" as types
    • remember that CouchDB is schemaless, so there is no inherent structure to documents contained within the database itself
    • common to use a type field on a document containing a string that defines the type
      • makes it easy to write a view that pulls back specific document types
      • CouchDB does NOT CARE what field you use to define type--you can call this anything because again, CouchDB has no concept of document types
    • remember also that even if you define a document as a type, it does NOT have to literally match the structure of other documents with that same type
      • e.g. music library--could define "album" as a type, and if one album has a year field and another doesn't, they're both still "album" types since we defined the type explicitly
      • BUT, if you do want to require a specific structure for a document type, you can do that with validation functions and, e.g., reject an addition or update of an album to a music library if it didn't contain a year field
    • also handy to infer type based on fields for more flexibility
      • e.g. in a blog app we could use if (doc.title && doc.body) and assume if those fields are present that this is a blog post as opposed to a comment


How Documents Are Not Like Database Records

  • self-contained--no joins across table to put together a single record
    • documents in CouchDB map directly to an object instance in your application
  • typically documents will automatically have authors and publish dates associated with them
    • very easy to publish documents of any type in the future
    • if you create user accounts for CouchDB it automatically keeps track of who created and modified records
  • don't break documents into smaller units than you need to!
    • a blog post will have an author--don't have the author be a separate document
  • JSON document format
    • CouchDB documents all have _id and _rev fields
      • _id can be anything, so long as its unique--UUID, plain old string, whatever
      • _rev is the revision number--this changes with each update to a document
        • to update a document, you have to provide the most recent value of _rev so CouchDB knows you're working with the latest revision
    • if users have been configured, documents will have an author field
    • Do I really look like a guy with a plan? You know what I am? I’m a dog chasing cars. I wouldn’t know what to do with one if I caught it. You know, I just… do things. The mob has plans, the cops have plans, Gordon’s got plans. You know, they’re schemers. Schemers trying to control their little worlds. I’m not a schemer. I try to show the schemers how pathetic their attempts to control things really are.

      The Joker, The Dark Knight (this quote is used in the forthcoming O'Reilly CouchDB book)
    • NEED INFO ABOUT THINGS LIKE ARRAYS, ETC. HERE


Running Queries

  • again, forget everything you know
  • there is no SQL here
  • instead of running queries in the traditional sense, data is filtered using map and reduce functions, which are written in javascript
    • DEFINE MAPREDUCE HERE
  • the map and reduce functions combined create a CouchBD view
  • views are stored as rows sorted by key
    • extremely efficient even for millions of records
  • can create temporary views for testing, but these are rather inefficient, so views that are going to be used regularly are stored in the database as documents
    • once a view is stored in the database as a document, CouchDB indexes behind the scenes for efficiency
  • main points:
    • map functions allow you to sort your data using any key you choose
    • CouchDB is designed to provide extremely fast access to data by key and key range
    • you don't really run queries against CouchDB, you query a view
    • when you query a view, CouchDB runs the map function against every document in the database in which the map function is defined
  • map functions have a single "doc" parameter which is each individual document in your database
  • the emit() function is used to spit out matching documents, and you can specify the fields you want to output
  • if you're querying every document in the database every time, isn't that inefficient?
    • you'd think so, but no
    • CouchDB only runs through all the documents the first time the view is queried
    • as documents change, CouchDB only has to update what's changed
    • everything is stored in a B-Tree, which is very efficient
  • creating multiple views specific to how you want to access the data helps with efficiency
  • to execute a view, you just--surprise--hit a URL over HTTP and get JSON back
    • e.g. http://localhost:5984/database/_design/designdocname/_view/viewname
    • to add a key to this, it's just an argument in the query string of the url, e.g.
      • http://localhost:5984/database/_design/designdocname/_view/viewname?key=value
    • can also retrieve documents by key range
      • http://localhost:5984/database/_design/designdocname/_view/viewname?startkey=startvalue&endkey=endvalue
  • default query engine or "view server" in CouchDB is JavaScript, but you can write your own in any language
    • Remember it's all just HTTP and JSON!
  • MAP FUNCTIONS take a document as an argument and emit key/value pairs
    • Btrees are very efficient--even with lots of documents the tree is "shallow" and it's pre-indexed so searches are very fast
  • REDUCE FUNCTIONS operate on the rows returned by map functions and act as filters on the documents


Replication

  • can replicate local -> local, local -> remote, or remote -> remote from Futon
  • as with everything in CouchDB, this is all HTTP/REST based
    • initial replication may be time consuming
    • subsequent replications are diffs only
    • if you trigger replication from Futon, you have to leave the browser window open!
    • but since all this is HTTP based, easy to set up cron jobs that use curl to do replication
  • a POST to CouchDB containing the source and target of replication is all that's needed to kick off replication
    • CouchDB maintains a session history of replication sessions, again in JSON
  • can replicate among local databases or between databases not on the same physical box
  • CouchDB has automatic conflict detection and resolution
    • remember that documents in the database are versioned, so conflicts are handled quite gracefully
    • documents that are in conflict when a replication occurs get a new _conflict:true attribute added to them
      • one of the two competing documents is given the latest revision number, the other is given a previous revision number
      • these conflicts are also replicated, so all databases will have the same information
  • CouchDB takes the approach of "eventual consistency"
    • traditional RDBMS systems enforce consistency--put consistency above all in replication situations
    • “Each node in a system should be able to make decisions purely based on local state. If you need to do something under high load with failures occurring and you need to reach agreement, you’re lost… If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given.”

      Werner Vogels, Amazon CTO and Vice President
    • consistency between nodes is not guaranteed on writes, but the nodes will eventually be consistent on reads


Design Documents

  • documents that contain application code
  • IDs must start with _design as the ID, e.g. "_design/myapp"
  • possible to write entire apps in HTML/JavaScript, store this code as a design document in CouchDB, and run the entire app from the CouchDB database
    • dynamic code (views and validation) written as JSON and stored as a document in CouchDB
      • MapReduce queries stored in the views field
      • data output CAN be things other than JSON using the show field, e.g. CouchDB can output RSS without any middleware
    • static HTML pages stored as attachments to the design document


Validation

  • validation functions are used to do things like prevent users who aren't logged into an app from performing document updates
  • validation functions are stored in design documents under the validate_doc_update field
    • can only have one validation function per design document
    • but remember you can have multiple design documents per database
  • documents must pass all the validation rules on all design documents in the database in order to be saved
    • the order in which validation functions are executed is arbitrary
  • most common example, since CouchDB is schemaless, is to require that certain fields be included if a document is declared to be of a particular type
    • e.g. require "title" and "body" for a document type of post
      function(newDoc, oldDoc, userCtx) {
         function require(field, message) {
           message = message || "Document must have a " + field;
           if (!newDoc[field]) throw({forbidden : message});
         };

         if (newDoc.type == "post") {
           require("title");
           require("body");
        }
      }


Show Functions

  • since everything in CouchDB is JSON, HTTP, and JavaScript, it works well in any programming environment
  • CouchDB doesn't, however, address things like outputting HTML
  • easy enough to have CFML call CouchDB over HTTP and output the results in HTML
  • CouchDB can, however, generate HTML natively using show functions
  • basic show function
    function(doc, req) {
       return '<h1>' + doc.title + '</h1>';
    }
    • the "return" bit here is sent back to the browser as a HTTP response
  • you can even write full HTML templates that embed CouchDB-specific scripting so you don't have to embed HTML in javascript functions


Attachments

  • Documents in CouchDB, which are JSON, can have file attachments
  • doing an HTTP PUT with a -d@filename.ext flag tells CouchDB to attach the file to the document ID provided in the PUT request
    • as with other updates to documents, you DO need to provide the current revision number of the document to attach a file to the document
    • unlike other updates you do NOT need to provide the data for the document itself in order to add an attachment to it
  • documents may have mutliple attachments
  • attachments are made available as, surprise, HTTP resources
    • http://localhost:5984/contacts/6e1295ed6c29495e54cc05947f18c8af/headshot.jpg would display the headshot.jpg file attached to the document with the ID provided
  • if you pull a document back from CouchDB that has an attachment, the attachment file name and meta information such as type, size, etc. are contained in the JSON with a key of "_attachments"
    • adding ?attachments=true returns the attachments in base64 format as part of the JSON


Applications

  • you can build applications entirely in CouchDB
  • if you replicate your databases to another server, you replicate your app as well
    • means if you replicate to a local instance of CouchDB, you get offline data mode "for free"
  • applications are stored in CouchDB as design documents
  • can use couchapp for developing native CouchDB apps


Security

  • can lock down databases by editing a simple config file
    • by default it's in /usr/local/etc/couchdb/local.ini
    • there's also a default.ini file, but any changes made to default.ini are overwritten when CouchDB is upgraded
  • e.g. adding admin accounts to CouchDB
    • uncomment [admins] section and add authorized users as user = pass for each line
    • when CouchDB is restarted the passwords are hashed so they aren't stored in the config file in plain text
  • remember--this is all just HTTP so you can apply the same HTTP-based security, proxies, reverse proxies, etc. as you would to any web resource
    • e.g. putting a web server in front of CouchDB and using HTTP authentication would be trivial


General Tips/Tricks

  • In your applications, you'll want to create your own UUIDs for document IDs instead of letting CouchDB auto-create them
    • WHY? stated this in book but didn't elaborate
  • Since replication and resynching is so dead simple, easy to replicate to a local DB for offline use, then resynch when back online
  • For bulk conversions of existing databases to CouchDB, couple of performance tips
    • http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance
    • use the bulk document API instead of looping and doing individual document additions
      http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
    • don't use CouchDB's auto-assigned IDs--increases db size and has a big performance hit during conversion


There Are Some Cons ...

  • it's new--they call it the bleeding edge for a reason
    • stuff WILL change between versions that will break your apps!
  • it's a completely new skillset--there is no sql here
  • views take a long time to build the first time they're saved, but after that they're incredibly fast regardless of the number of documents involved
  • large databases can take up a lot of disk space
    • raw data is one consideration, but the views often take up much more disk space than the data itself
    • this is trading disk space for performance, which is a good tradeoff, so you just need to plan for disk capacity
  • CouchDB does not deal well with relational data
    • that being said, we all likely spend a lot of time dealing with the shortcomings of relational data, specifically how horrendously bad the relational model is at dealing with heirarchical data, so I'm not sure this is a straight con as compare with the relational model
    • recurring theme in the CouchDB literature is DON'T GO AGAINST THE GRAIN! Don't try to force CouchDB to behave like an RDBMS, because it's not.
  • CouchDB does not support transactions well
    • e.g. check to see if a user name is unique, then assign it--no way to isolate this from another simultaneous request in the same way you can with relational database transactions
  • Reads on single documents, and writes in general, are slower than you might be used to with an RDBMS
    • but, the CouchDB model scales much better
  • Have to write all of your "queries" (views) in advance--no on the fly SQL allowed
  • map-reduce not as flexible as sql--sometimes you'll have to return more data than you want or need and process on the application side


Common Use Case That Is a Bit Odd in CouchDB: Unique Constraints

  • e.g. guarantee a unique user name or email address
  • thread here http://markmail.org/thread/qwgqql74b2gg5hc5
  • other than the _id field, CouchDB has no way of guaranteeing uniqueness in any other field
  • you CAN do a check to see if the record already exists, but remember ...
    • This is HTTP. There are no transactions.
    • There is no locking in CouchDB
  • On the other hand, guaranteed uniqueness doesn't scale
  • Some solutions
    • you can use anything as your _id, so use one piece of data that has to be unique AS your _id
    • use a relational table in an RDBMS to store any unique values and put a unique constraint on that field in the RDBMS
      • you could still have problems, but this would reduce the likelihood from slim (without doing this) to extremely slim


Resources

Filed under // CouchDB Databases

Comments (6)

Oct 7 / 7:06am

CouchDB emerging as a top choice for offline Web apps | InfoWorld

We're not trying to build the Ferrari of databases; we're trying to build the Honda Accord of databases and that's a little different sweet spot," Anderson said.

Really nice article about CouchDB that focuses on CouchDB as an application development platform, which makes running applications offline dead simple.

I'm not sure I'll ever get into building entire apps in CouchDB but I absolutely love it as a data storage engine.

Filed under // CouchDB Databases

Comments (0)

Sep 30 / 6:59am

Code Monkeyism: The dark side of NoSQL

The three problems no-one talks about – almost noone, I had a good talk with the Infinispan lead [1] – are:

  • ad hoc data fixing – either no query language available or no skills
  • ad hoc reporting – either no query language available or no in-house skills
  • data export – sometimes no API way to access all data

Much as I'm loving CouchDB these days, these are valid concerns if you're considering moving to a "no SQL" type database (and there are a few other concerns as well).

As always it comes down to using the right tool for the job. This is pointed out at the end of this article and in the comments, but depending on your needs you may wind up with a combination of a No SQL database for what they're great at, and a traditional RDBMS for the problems that No SQL databases don't solve.

Filed under // Databases

Comments (0)

Sep 25 / 5:02pm

MongoDB: A Light in the Darkness! (Key Value Stores Part 5) | Engine Yard Blog

MongoDB can be thought of as the goodness that erupts when a traditional key-value store collides with a relational database management system, mixing their essences into something that’s not quite either, but rather something novel and fascinating.

I've been playing around with CouchDB quite a lot lately, and MongoDB seems quite similar. After writing a couple of small apps with CouchDB I'm finding it pretty darn painful to go back to RDBMS when I have to.

Filed under // Databases

Comments (0)

Sep 1 / 9:30am

Caveats of Evaluating Databases (Jan Lehnardt)

This is part two in a small series about measuring software performance. There’s a lot of common sense covered, but I feel it necessary to shed some light.

If you haven’t, check out part one.

Say you want to find out what’s behind the buzz of all these new #nosql databases. There’s a large number to choose from today. All options come in varying degrees of maturity and characteristics so it’d be nice to know what solves your problem best. A non-exhaustive list of these databases or storage systems include Memcache[DB], Tokyo Cabinet / Tyrant, Project Voldemort, Scalaris, Dynamite, Redis, Persevere, MongoDB, Solr or my favourite CouchDB. And these are just some of the open source ones.

This article is not a comprehensive comparison of any of the mentioned systems. Instead it tries to give you an idea about what to look for when evaluating a storage system or how to take into perspective evaluations and benchmarks others have done.

We’ll look at some of the technical aspects of data storage systems: Applying common sense when reading benchmarks; b-trees and hashing; speed vs. concurrency; networked systems and their problems; low level data storage (disks’n stuff); and data reliability on single-nodes and multi-node systems.

There are a lot of other reasons to decide for or against a project based on a lot of non-technical criteria, but things like commercial support or a healthy open source community are not part of this article.

Astounding Numbers

From time to time you see some crazy numbers posted to the reddits of the internets that claim fantastic performance.

The (imaginary) SuperfastDB can store 450,000 items per second!.

Wow.

No word on where the items are stored (in memory? on a harddrive? Spindles? Solid State?), what an item is exactly and how big it is, the rest of the hardware this was run on and how to reproduce it.

But boy, 450,000 a second!

My shoes can do 650,000 a second, but you’ve got to figure out what.

Context is as important as reproducibility. The last article here established that finding out that my system and your system come up with different numbers is not much of a help. Any sort of serious test must come with a set of scripts or programs and comprehensive instructions on how the tests were run.

Everything “cool” in computer science has been around for 25+ years. Actual innovation is rare. Advancements in hardware and new combinations of existing solutions make for new stuff coming out each day (that’s a good thing), but the fundamental rules are the same for all. We’re all running von Neumann machines, quicksort is still pretty quick and hashes and b-trees rule the storage world.

Let’s recap.

Hashes & Trees

Hashing revolves around the idea of O(1) lookups. Allocate a number of buckets, create a function that gives you a number of a bucket for any data item you might want to store, make sure no two data items hit the same bucket (or work around that). Runtime characteristics include that you only need to ask your function where to look for or store your data and the allocation of your set of buckets: If you need to store more items than you have buckets, some more work is required which gives you O(N) operations that you can’t ignore in practice.

D5563B63-7B48-4280-A31F-EDB37DB78416.jpg

The other elephant in the room is b-trees. The fundamental idea here is to get to your data in a minimal number of steps traversing a tree because making a step is expensive, but reading your data is very fast comparatively. Steps are expensive because they translate to a head seek (that is the time your spinning hard drive needs to position the reading arm to find the spot to read your data from), but reading from a harddrive once the reading head is in place is fast.

6720EE64-4DFC-4298-B3BA-0145746C6523.jpg

There are a bunch of more interesting lookup structure like R-Trees for spatial queries, but they are mostly used for secondary indexes on top a regular data set that lives in a hash or b-tree.

Concurrency vs. Speed

Concurrency is hard. The devil lies in the details and when briefly looking at things, the details are often overlooked. Suits the devil.

Creating storage systems that assume only one access occurs at a time is relatively easy. If resources are shared concurrently, things become tricky. The two larger schools of thought (and practice) are locking and no-locking (heh).

Locking means that the database has to maintain information for everybody who wants to write to a part of the database, and what part it is.

No locking, or optimistic locking or MVCC moves that burden to the person who is trying to write to the database. She must prove that she won’t be overwriting any existing data.

The trade-offs here are a leaner request handing on the server that works well with remote & concurrent clients at the expense of more complexity on the client (the person who wants to store something in our database).

Hybrid approaches are possible too: While MVCC is used internally, the database’s clients can rely on database-side locking (e.g. PostgreSQL or InnoDB).

Networks

Just a quick note: We already talk about client and server here. There is a strong case for embedded databases like SQLite that don’t expose a concurrent user model to the outside. The program that needs an embedded database just includes it.

Another approach to using databases is having a dedicated computer running a database system and sharing it over the network with any number of clients using this database server. They can often be “a bunch of servers” or a cluster. More on that later.

A separate database server (networked or not) will need to spend some time to deal with connections, network failures, unspecified client behaviour and so on. The upside is a piece of infrastructure that can be maintained separately. An embedded database will thus be faster but probably won’t solve all of your problems and it will always be tied to your application.

fsync(): Reliability vs. Speed

When people tell me “SuperfastDB does 450,000 a second!” I ask “How many fsync()s is that?”. Let me explain:

A database system uses operating system services to use any hardware. The operating systems exposes a harddrive through a filesystem. The database systems talks to the filesystem and asks it to store or retrieve data in its behalf. The filesystem then goes ahead and tries to satisfy the database’s requests.

(I’ll not talk about databases that can use raw block devices to store data. They exist but they are not as common as those who use the filsystem.)

The filesystem also tries to be clever – for good reasons. When the database requests a piece of data, the filesystem will not only find that piece and return it, it will also store it in a cache to avoid having to actually talk to the harddrive the next time this piece of data gets requested. When the data changes, the filesystem either removes it from the cache or updates it with the harddrive. It might even go further and only store the new data that comes in with a write request into the cache and rely on a periodic task to write all of the cache back to the drive. Writing a bunch of of pieces at once is more efficient than storing each one on its own.

More efficient equals to faster and faster is good, right? Well, it depends: If all goes well, this approach is a nice one. But you know computers, things will not go well 100% of the time. The failure scenarios are endless, but they boil down to the question: “What happens when your machine dies and you have data that has only been written to memory?” — The answer isn’t too hard: That data is lost. If there is a delay between a write request finishing and data being written (or “flushed”) to disk any data that has been “written” during the delay period is subject to loss.

There are cases where this is not a problem; in other cases it is. A developer should have the chance to decide. (Note that even your hardware could be lying to you about having stored data, but I’ll punt on this one, get proper hardware).

So, flushing to disk needs to happen before you can rest assured your data has been stored. Your operating system has an API call that forces the filesystem to write its cache to disk. It is called fsync() (on UNIX systems) and it is an expensive operation. You can only do so many fsync()s in a second and it is not a great many.

The 450,000 items were most likely just written to memory and not to disk.

Space & In-Place

When writing files to disk (at the end of the day, your data ends up in one file or another on the filesystem) that represents what lives in a database, there are multiple options to handle updates.

An update is a change to your data item, for example, a new phone number. The intuitive way to handle this is to go and find the old phone number in the file, and overwrite it with the new number. Easy.

There are several problems with this approach: What to do if the new phone number is longer than the old one (say you added an international calling prefix)? The new number needs to be written to a different place and the change in location must be recorded. Not too big of an issue.

Back to failure scenarios: Again, the reasons can be manifold, but what happens when we’ve (over-)written the first 4 digits of the old with the new number and then the server dies, power goes away or the database server crashes? The next time you want to read the phone number you get a mix of the old and the new one (if you are lucky) and you don’t exactly know that this is the case and which parts are missing. Your database file is inconsistent and you need to run an integrity check to find missing bits and correct half-written bytes. In the worst case that means scanning your entire database file a few times before you resolved all inconstancies. If you have a lot of data, that can take days.

To solve this, you always write the new phone number to a new place in the database file and only when it has been fsync()ed to disk, you update the location of the phone number (and then flush that update to disk as well). You will never end up in a scenario where your database file can end up an inconsistent state and after a crash you are back online without an integrity check.

The trade-off for consistency is write-speed (remember fsync()s are expensive) for consistency-check-speed after a failure.

A nice bonus is that if the “new place in the database” is the end of the file, you keep your disk-drive head busy with writing data to disk instead of seeking all over the place (remember: seeks are expensive).

Distribution, Sharding & Resharding

So far, we’ve been looking at scenarios that involve a single database. We learned a great deal (I hope), but in reality we often deal with more than one database. The simplest reason to have two databases is for redundancy. Failures can bring down your database temporarily or even permanently. If it is a temporary issue, waiting a bit (or a bit longer) to get up and running again might be an option, but often, an application or service should be available at all times. A fatal failure where a database server is lost beyond repair, your data is gone if you haven’t stored it in a second place.

“I’ll just make two copies, easy!”. Yup easy, until you look at the details (that damn devil again!).

It’s all about failures again. Consider a single read request. A client connects to a server and asks for a data item. The server looks it up and returns the data to the client. All is well. At any point things can go wrong. The network connection can drop (or slow down so much that client or server assume it dropped), the client can disappear (because of a network failure or crash) as can the server. Clients, servers and the protocols they speak need to be built around the assumption that any of these things (and many more) can go wrong. If any parts is not designed to handle error cases, your system will do funny things, but it won’t reliably store and manage your data.

Add complexity: With each write target (store in two places) the possibility of error and the need for proper error handling grows quadratically. When evaluating a distributed storage system, looking at how errors are handled is vital.

Another reason to distribute data among multiple servers is capacity. The three metrics of interest here are read requests, write requests and data. If you have more requests or data than a single machine can handle, you need to move to multiple machines. Each metric calls for different strategies, but they often go along with each other. The need for fault tolerance that I discussed above needs to be considered alongside.

Growing read capacity is relatively easy once you covered the base case where the source for reading data might not be the same as the the target for writing data and that there can be a mismatch (cf. eventual consistency).

Distributing writes and data works by designating two machines with 50% of the operations. A clever intermediate, a proxy server for example, decides which request goes where and all is well, we can store twice as much and we can store at twice the speed. When we need to grow bigger yet, we add another server and tell the proxy server to distribute the load equally among them. Adding a proxy for distribution introduces a single point of failure and you don’t want these; there’s added complexity with this approach.

resharding.png

The diagram shows that there is another step needed that wasn’t included in the above description. The new “node” needs to have a copy of all data items that are assigned to him and are currently living on the two existing nodes. The process of moving data items to new nodes is called resharding and needs to happen every time a new node is added.

Resharding can be an expensive operation if you have a lot of data. Techniques like consistent hashing help with minimising the amount of items that need to move. If you are looking at a sharding database, you want to understand how the sharding is performed and if you like the trade-offs.

CAP Theorem

The CAP Theorem states that out of consistency, availability and partition tolerance, a system can choose to support two at any given moment, but never three.

cap.png

Consistency guarantees that all clients that talk to cluster of nodes will always get to read the same data. Write operations are atomic on all nodes.

Availability guarantees that in any (reasonable) failure scenario, clients are still able to access their data.

Partition tolerance guarantees that when nodes in the cluster lose their network connection and two or more completely separated sub-clusters emerge, the system will still be able to store and retrieve data.

Please Talk! (To Developers)

If you are aiming for a comparative benchmark of two or more systems, you should run your procedure by they authors. I found developers are happy to help out with benchmarks by clearing up misconceptions or sharing tricks to speed things up (which you can choose to ignore, if you are looking for out-of-the box comparison, but this is rarely useful).

Filed under // CouchDB Databases

Comments (0)

Sep 1 / 9:27am

Benchmarks: You are Doing it Wrong (Jan Lehnardt)

This is part one in a small series about measuring software performance. There’s a lot of common sense covered, but I feel it is necessary to shed some light.

Coffee

Pete needs coffee and his coffee maker broke down. Pete’s browsing through Craigslist. He’s looking for a coffee maker and he’s fine with a used one if he can get it from nearby. While results may vary when Pete’s got his coffee, his brain processes what he sees on a web page in between 200 and 500 milliseconds. Of course this depends on the complexity of the page and outside distractions[citation needed].

Computers are very limited in what they can calculate but they are incredibly fast and reliable. Human brains are a lot more sophisticated, but not as fast on raw computations. To render the Craigslist homepage takes about 150ms right now (I’m in Berlin) when I ask curl and it takes Safari around 1.4 seconds (1400ms) to display the page.

This in part demonstrates the measuring dilemma. Pete never sees the 150ms response for http://craigslist.org/. He only sees that it takes a bit before his browsers finishes loading. We’ll get back to that later.

The point here is, even if all parts of the system would result in a sub-200ms response time, Pete (and everybody else) would not notice. Pages would change “instantly” as far as he (and everybody else) is concerned. While the fallacies of distributed computing (read: The Internet) will probably never get us there, at some point it does not make any more sense to speed things up because no one will notice.

Moving Parts

Lets take a look what a typical web app looks like. This is not exactly how Craigslist works (because I don’t know how Craigslist works), but it is a close enough approximation to illustrate problems with benchmarking.

You have web server, some middleware, a database. A user request comes in, the web server takes care of the networking and parses the HTTP request. The request gets handed to the middleware layer which figures out what to run; then runs whatever is needed to serve the request. The middleware might talk to your database and other external resources like files or remote web services. The requests bounces back to the web server which sends out any resulting HTML. The HTML includes references to other resources living on your web server, like CSS-, JS- or image files and the process starts anew for every resource. A little different each time, but in general, all requests are similar. And along the way there are caches to store intermediate results to avoid expensive recomputation.

That’s a lot of moving parts. Getting a top-to-bottom profile of all components to figure out where bottlenecks lie is pretty complex (but nice to have). I start making up numbers now, the absolute values are not important, only numbers relative to each other. Say a request takes 1.5 seconds (1500ms) to be fully rendered in a browser.

In a simple case like Craigslist there is the initial HTML, a CSS file, a JS file and the favicon. Except for the HTML, these are all static resources and involve reading some data from a disk (or from memory) and serve it to the browser who then renders it. The most notable things to do for performance are keeping data small (gzip compression, high jpg compression) and avoiding requests all together (HTTP level caching in the browser). Making the web server any faster doesn’t buy us much (yeah, hand wavey, but I don’t want to focus on static resources here. Pete wants his coffee. Let’s say all static resources take 500ms to serve & render.

(Read all about improving client experience with proper use of HTTP from Steve Sounders. The YSlow tool is indispensable for tuning a web site.)

That leaves us with 1000ms for the initial HTML. We’ll chop off 200ms for network latency [cf. Network Fallacies]. Let’s pretend HTTP parsing, middleware routing & execution and database access share equally the rest of the time, 200ms each.

If you now set out to improve one part of the big puzzle that is your web app and gain 10ms in the database access time, this is probably time not well spent (unless you have the numbers to prove it).

Variables

We established that there are a lot of moving parts. Each part has a variable performance characteristic, based on load, disk I/O, state of various caches (down to CPU L2 caches) and different OS scheduler behaviour based on any input variable. It is nearly impossible to know every interfering factor, so any numbers you ever come up with should be read with a grain of salt. In addition, when my system reports a number of 1000ms and yours reports 1200ms the only thing we can derive from that is our systems are different and we knew that before.

To combat variables, usually profiles are run multiple times (and a lot of times!) to have statistics tell you the margin of error you’re getting. Profiles should also run a long time with the same amounts of data that you will see in production. If you run a quick profile for a few seconds or minutes, you will hit empty caches and get skewed numbers. If your data does not have the same properties as the data you have in your production environment, you’ll get skewed results.

Story time: Chris tried to find out how many documents of a certain size he could write into CouchDB. CouchDB has a feature that generates a UUID for every new document you store. The UUID variant it is using uses a full 128 bits of randomness. The documents are then stored in a b+-tree. Turns out that for a b+-tree, truly random keys for any kind of access are the worst possible case to handle. Chris then switched to pre-genereated sequential ids for his test and got a 10x improvement. Now he’s testing the best case for CouchDB which coincides with the application’s data, but your application might have a different key distribution only resulting in a 2x or 5x improvement or none at all.

In a different case, the amount of data stored and retrieved could easily fit in memory and Linux’ filesystem cache was smart enough to turn all disk access to memory access which is naturally faster. But it doesn’t help if you production setup has more data that fits in memory.

Take home point: Profiling data matters.

The second part of this little series will look at pitfalls when profiling storage systems.

Trade Offs

Tool X might give you 5ms response times and this is an order of magnitude faster than anything else on the market. Programming is all about trade-offs and everybody is bound by the same laws.

On the outside it might appear that everybody who is not using Tool X is a moron. But speed & latency are only part of the picture. We already established that going from 5ms to 50ms might not even be noticeable by anyone using your product. The expense for speed can be multiple things:

  • Memory; instead of doing computations over and over, Tool X might have a cute caching layer that saves recomputation by storing results in memory. If you are CPU bound, that might be good, if you are memory bound it might not. A trade off.

  • Concurrency; the clever data structures in Tool X are extremely fast when only one request at a time is processed, and because it is so fast most of the time, it appears as if it would process multiple request in parallel. Eventually though, a high number of concurrent requests fill up the request queue and response time suffers. — A variation on this is that Tool X might work exceptionally well on a single CPU or core, but not on many, leaving your beefy servers idling.

  • Reliability; making sure data is actually stored is an expensive operation. Making sure a data store is in a consistent state and not corrupted is another. There are two trade offs here: Buffers that store data in memory before committing it to disk to ensure a higher data throughput. In case of a power loss or crash (hard- or software), the data is gone. This may or may not be acceptable for your application. The other is a consistency check that is required to run after a failure. If you have a lot of data, this can take days. If you can afford to be offline, that’s okay, but maybe you can’t afford it.

Make sure to understand what requirements you have and pick the tool that complies instead of taking the one that has the prettiest numbers. Who’s the moron when your web application is offline for a fix up for a day and your customers impatiently wait to get their job done; or worse, you lose their data.

But…My Boss Wants Numbers!

Yeah, you want to know which one of these databases, caches, programming language, language constructs or tools are faster, harder, stronger. Numbers are cool and you can draw pretty graphs that management types can compare and make decisions from.

First thing a good exec knows is that she’s operating on insufficient data (aside, everybody does all the time, but sometimes it is just not apparent to you) and diagrams drawn from numbers are a very distilled view of reality. And graphs from numbers that are effectively made up by bad profiling are not much more than a fairy tale.

If you are going to produce numbers, make sure you understand how much is and isn’t covered by your results. Before passing them on, make sure the receiving person knows as much.

A Call to Arms

I’m in the market for databases and key-value stores. Every solution has a sweet spot in terms of data, hardware, setup and operation and there are enough permutations that you can pick the one that is closest to your problem. But how to find out? Ideally, you download & install all possible candidates, create a profiling test suite with proper testing data, make extensive tests and compare the results. This can easily take weeks and you might not have that much time.

I would like to ask developers [*] of storage systems to compile a set of profiling suites that simulate different usage patterns of their system (read-heavy & write-heavy loads, fault tolerance, distributed operation and a lot more). A fault tolerance suite should include steps necessary to get data live again, like any rebuild or checkup time. I would like users of these systems to help their developers to find out how to reliably measure different scenarios.

* I’m working on CouchDB and I’d like to have such a suite very much!

Even better, developers could agree (hehe) on a set of benchmarks that objectively measure performance for easy comparison. I know this is a lot of work and the results can still be questionable (you read the above part, did you?), but it’ll help our users a great when figuring out what to use.

Stay tuned for the next part in this series about things you can do wrong when testing databases & k-v stores.

Filed under // CouchDB Databases

Comments (0)