Skip to main content

Open Source Bridge - Relational vs. Non-Relational

Josh Berkus, PostgreSQL Experts Inc.
  • overview focused on choosing what type of database you need vs. investigation of any specific database
  • up until a few years ago there were only a handful of options for open source databases
    • most were sql/relational
    • a few written in java
    • only really exciting thing going on in the relational world is postgres vs mysql
  • today there are many more open source databases
    • as many as 5 dozen now?
  • databases for lots of different purposes, but lots of people want to lump a lot of the new ones under the "nosql movement" label
    • not so fond of this term
    • has implication that every database that doesn't have a sql interface is more or less identical
    • all non-relational databases aren't the same
    • have graph, document, key-value, distributed, hierarchical ... quite different from one another
    • some of the non-relational databases have sql interfaces
  • all relational databases aren't the same either
    • embedded, oltp, mpp, streaming, c-store ...
  • mythbusting
    • "revolutionary" is bandied about a lot but database technology goes back a long way
    • not really any new database designs in terms of fundamental architecture
    • are new implementations and combinations of design
    • last "new" thing was map-reduce in 2002
    • even couchdb is largely similar to Pick, which was created in 1965
    • when looking at new databases, don't look for revolutionary concepts, look for good implementations
    • what's going on right now is actually a renaissance of non-relational databases
  • myth: "non-relational databases are toys"
    • google - bigtable
    • amazon - dynamo
    • facebook - memcached
    • us veterans administration - pick, cache
  • myth: "relational databases will become obsolete"
    • xml databases were supposed to replace rdbms ca. 2001 -- didn't happen
    • rdbmses evolved to include xml functionality
    • one of the things we'll see out of the current non-relational innovation is that some of the implementations will hybridize with one another
  • myth: "relational databases are for when you need ACID transactions"
    • transactions != relational -- orthagonal features
    • robust transactions without relationality: berkelydb, amazon dynamo
    • sql without transactions: mysql isam, ms access
  • myth: "users are adopting nosql for web-scale performance"
    • sometimes it is, sometimes it isn't
    • performance test done by myyearbook.com
      • benchmark of key/value storage and retrieval
      • only real difference in performance is between databases that guarantee durability and those that don't
    • horizontal scalability
      • some non-relational databases are built for horizontal scalability and some aren't
      • complexity of implementation rises with the ability to scale out to a massive number of nodes
  • myth: "one ring theory of database selection"
    • "what's the best database to use?" - wrong question
    • don't need to use only one database
    • choose the db that meets your applicaton's goals, or use more than one together
    • use a hybrid
      • mysql ndb
      • postgresql hstore
      • hadoopdb
  • but what about choosing between relational and non-relational?
  • relational oltp databases
    • transactions: more mature support
    • constraints: enforce data rules absolutely
    • consistency: enforce structure 100%
    • complex reporting: keep management happy!
    • vertical scaling (but not horizontal)
  • sql vs. no sql--sql promotes ...
    • portability
    • managed changes over time (ddl)
    • multi-application access
    • many mature tools
    • but, sql is a full programming language and you have to learn how to use it
  • no sql promotes ...
    • programmers as dbas
    • no impedance mismatch
    • fast interfaces
    • fast development and deployment
    • but, man involve learning complex proprietary APIs
      • in some cases not easier than sql
  • main reason to use sql-relational databases
    • "immortal data"
    • your data has a life independent of this specific application implementation
    • important to be able to access data accurately and consistently forever
  • how DO i choose?
    • define the problem you're trying to solve
      • what is it that my application wants to do with this data and how does it want to do it?
      • i need a database for my blog, i need to add thousands of objects per second on a low-end device, etc.
    • from the definition you create you can define a database shopping list
      • define the features you ACTUALLY need
  • fit the database to the task
  • "I need a database for my blog"
    • use anything!
    • no open source databases that wouldn't support someone's individual blog, flat files would even work
  • "I need my database to unify several applications and keep them consistent"
    • high-end sql-relational database best choice for this
    • "PostgreSQL: It's not a database, it's a development platform"
  • "I need my application to be location-aware" -- geo applications
    • PostGIS - geographic relational database
    • queries across "contains" "near" "closest"
    • complex geometric map objects
    • couchdb spatial and spatialite are now available as well
  • "I need to store 1000s of event objects per second on a piece of embedded hardware"
    • db4object -- embedded key-value store
    • others: berkeleydb, redis, tokyocabinet, mongodb
    • db4object
      • project was german train system -- records data every few milliseconds
      • low-end embedded console computer
      • simple access in native programming language (java, .net)
  • "I need to access 100K objects per second over thousands of connections from the web"
    • memcached - distributed in-memory key-value non-persistent database
    • use: public website
    • typically supplements another database
    • alternatives: redis, kyototyrant, etc.
  • "i need to produce complex summary reports over 2tb of data"
    • luciddb - relational column-store database
    • for reporting and analysts
    • large quantities of data
    • complex olap and analytics
    • used along-side oltp running production apps
  • "I have 100s of govt documents I need to serve on the web and need to mine the data as cheaply as possible"
    • CouchDB
    • storing lots and lots of government documents that didn't have a consistent format (don't know content or structure)
    • used in combination with postgres to keep structured metadata
    • couchdb is also great for mobile apps
  • "I have a social application and I need to know who-knows-who-knows-who-knows-who"
    • surprisingly hard question to answer with a normal db
    • use a graph database -- neo4j is most popular open source one
    • social network website
    • 6 degrees of separation
    • "you may also like"
    • type and degrees of relationship
  • "I get 1000s of 30K bug reports per minute and I need to mine them for trends"
    • used on mozilla firefox crash reports
    • hadoop -- massively parallel datamine
    • hadoop + hbase
      • reports are then put into postgres for viewing
  • conclusion
    • database systems do better at different tasks
      • every database feature is a tradeoff
      • no database can do all things well
      • need to make tradeoff decisions when picking databases
    • relational vs non-relational doesn't matter
      • pick the database(s) for the project or task
Questions
  • how difficult to migrate from something like couchdb to postgres?
    • depends on how much data and in what form
    • since couchdb works with json that's pretty easy
    • if you want to take the document structure out of something like couchdb and put it into a relational model, the decomposition process will be complicated

Comments

Popular posts from this blog

Running a Django Application on Windows Server 2012 with IIS

This is a first for me since under normal circumstances we run all our Django applications on Linux with Nginx, but we're in the process of developing an application for another department and due to the requirements around this project, we'll be handing the code off to them to deploy. They don't have any experience with Linux or web servers other than IIS, so I recently took up the challenge of figuring out how to run Django applications on Windows Server 2012 with IIS.

Based on the dated or complete lack of information around this I'm assuming it's not something that's very common in the wild, so I thought I'd share what I came up with in case others need to do this.


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Assumptions and CaveatsThe operating system is Windows Server 2012 R2, 64-bit. If another variant of the operating system is being used, these instructions may not work properly.All of the soft…

Installing and Configuring NextPVR as a Replacement for Windows Media Center

If you follow me on Google+ you'll know I had a recent rant about Windows Media Center, which after running fine for about a year suddenly decided as of January 29 it was done downloading the program guide and by extension was therefore done recording any TV shows.

I'll spare you more ranting and simply say that none of the suggestions I got (which I appreciate!) worked, and rather than spending more time figuring out why, I decided to try something different.

NextPVR is an awesome free (as in beer, not as in freedom unfortunately ...) PVR application for Windows that with a little bit of tweaking handily replaced Windows Media Center. It can even download guide data, which is apparently something WMC no longer feels like doing.

Background I wound up going down this road in a rather circuitous way. My initial goal for the weekend project was to get Raspbmc running on one of my Raspberry Pis. The latest version of XBMC has PVR functionality so I was anxious to try that out as a …

Fixing DPI Scaling Issues in Skype for Business on Windows 10

My setup for my day job these days is a Surface Pro 4 and either an LG 34UC87M-B or a Dell P2715Q monitor, depending on where I'm working. This is a fantastic setup, but some applications have trouble dealing with the high pixel density and don't scale appropriately.
One case in point is Skype for Business. For some reason it scales correctly as I move between the Surface screen and the external monitor when I use the Dell, but on the LG monitor Skype is either massive on the external monitor, or tiny on the Surface screen.
After a big of digging around I came across a solution that worked for me, which is to change a setting in Skype's manifest file (who knew there was one?). On my machine the file is here: C:\Program Files\Microsoft Office\Office16\LYNC.EXE.MANIFEST
And the setting in question is this:
<dpiAware>True/PM</dpiAware>
Which I changed to this: <dpiAware>False/PM</dpiAware>
Note that you'll probably have to edit the file as administr…