Wednesday, October 27, 2010

Apache Error "Document Root Doesn't Exist" on Red Hat Enterprise Linux When the Document Root DOES Exist

I upgraded one of our Red Hat Enterprise Linux VMs to Tomcat 7.0.4 tonight, did a bit of Apache reconfiguration, and when I restarted Apache I got a "document root doesn't exist" error even though in fact the document root does exist. (Trust me Apache, it's there.)

I double-checked the owners and permissions of all the directories in question, and everything was identical to the settings on another RHEL VM in this cluster on which I haven't yet upgraded Tomcat or done the reconfiguration. I was at a bit of a loss, so I googled around a bit and the prevailing sentiment seemed to be that this was related to either A) a config file copied from a Windows box, with line breaks that were throwing things off (not applicable in my case), or B) the fact that SELinux was enabled.
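
For cause A, a quick way to rule out Windows line endings is to look for carriage returns in the raw config file. Here is a minimal Python sketch of that check; the sample bytes are made up, and in practice you would read your own Apache config file in binary mode.

```python
def has_windows_line_endings(data: bytes) -> bool:
    """Return True if the raw file contents contain CRLF line breaks."""
    return b"\r\n" in data

# Made-up config snippets; in practice use open(path, "rb").read()
unix_conf = b'DocumentRoot "/var/www/html"\n'
windows_conf = b'DocumentRoot "/var/www/html"\r\n'

print(has_windows_line_endings(unix_conf))
print(has_windows_line_endings(windows_conf))
```

If the check reports CRLF line breaks, converting the file to Unix line endings would be the first thing to try before looking at SELinux.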

If you've been around the Red Hat flavors of Linux long enough you'll remember that SELinux used to be absolutely horrible. For years the very first thing you had to do on Red Hat and Fedora to get anything working at all was turn SELinux off, and for a long time I believe it was even Red Hat's recommendation under their breath to just shut it off. It's gotten better over the years, and honestly stays out of the way to the point where I'd almost forgotten about it.

Since I didn't have anything else to try, however, I went into /etc/sysconfig/selinux and changed SELINUX=enforcing to SELINUX=disabled, restarted the server, and voila the complaining from Apache went away.

What I still don't get is A) why this isn't occurring on my other RHEL box with the same setup, and B) why it just started happening now. The only potential weirdness here is that my document root is a symlink, but again, it's been that way since I set up these boxes originally and it hasn't been an issue.

So if you run into this same problem the fix (at least until I have more information about why it's happening) is to disable SELinux, but if anyone has more ideas about why this might be happening I'd love to hear them.

CFHTTP "Could Not Determine MIME Type of File" Error with Facebook Graph API

I'm finishing up the Facebook and Twitter integration for the OpenCF Summit application we'll be launching soon, and I was running into an error when calling the Facebook Graph API via CFHTTP.

If you've worked with Facebook you're probably aware that once a user logs into Facebook via your application, a cookie gets set that contains an access token you can use to grab additional details about the Facebook user via the Graph API. You make a simple HTTP call to http://graph.facebook.com/me?access_token and you get JSON back containing the details about the user that they have authorized you to retrieve.

The first issue I ran into is if the access token contains a pipe you may get a "Failed to set URL:Invalid query" error (and thanks to Andy Wu for pointing out to me that was the issue!). Simple enough to fix by putting the access token into a CFHTTPPARAM tag:

<cfhttp url="https://graph.facebook.com/me">
    <cfhttpparam type="url" name="access_token" value="cookie_value_here" />
</cfhttp>
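
As a side note on why the pipe matters: a raw `|` is not a safe character in a URL query string, so it has to be percent-encoded as `%7C`, which is presumably what passing the value through CFHTTPPARAM takes care of. The Python sketch below shows the same idea outside of CFML. Python is used purely for illustration, and the token value is made up.

```python
from urllib.parse import urlencode

def graph_me_url(access_token):
    """Build the Graph API URL with the token percent-encoded."""
    # urlencode escapes characters such as "|" that are unsafe in a query string
    query = urlencode({"access_token": access_token})
    return "https://graph.facebook.com/me?" + query

# Made-up token containing pipes
url = graph_me_url("12345|abcdef|xyz")
print(url)
```

The printed URL carries `%7C` wherever the token had a pipe, so the query string stays valid.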



That fixed the invalid query error, but then the response I was getting back had a filecontent of "Connection failed." It worked fine in my browser so I wasn't sure what was going on, but the mimetype of the response was "Unable to determine MIME type of file" so that gave me a bit of a clue.

CFHTTPPARAM to the rescue again. Setting a MIMETYPE header with a value of text/javascript did the trick:

<cfhttp url="https://graph.facebook.com/me">
    <cfhttpparam type="url" name="access_token" value="cookie_value_here" />
    <cfhttpparam type="header" name="mimetype" value="text/javascript" />
</cfhttp>


Hope that saves someone else a bit of time if they run into a similar issue.

Tuesday, October 26, 2010

Open BlueDragon Admin Console Version 1.4 Updates

It's that time again! As many of you know Open BlueDragon is on a regular six-month release cycle, so late October means it's time to certify those great nightly builds as gold, stamp a final version number on it, and immediately get started on the next version.

Along with each new release of OpenBD come updates to the admin console, which is an open source project of its own, written entirely in CFML. I point that out not only to let you know that if there are features you'd like to see in the admin console I'd love to hear about them, but also because, since it's written in CFML, you're more than welcome to get involved with the project!

Version 1.4 of the admin console doesn't have major changes (we have more major revisions planned for the 1.5 release next April), but I managed to squeeze some nice new features in here. See the attached screenshots to get a visual on these.

Application and Session Status Info

The admin console now displays the number of running applications and sessions along with a list of the names of the running applications. In addition, leveraging the new ApplicationRemove() function you can now unload applications from your OpenBD instance with a simple click of a button in the admin console. If you have an OnApplicationEnd() method defined in your Application.cfc this will get called as the application is unloaded.


JVM Memory Information

This was a simple little addition that uses the new SystemMemory() function, allowing you to see JVM memory stats for used memory, free memory, total memory, and max memory. Nothing fancy yet but it at least gives you an idea of what's going on with your memory.


File Cache Statistics

If you've been using OpenBD for a while you're probably familiar with the fact that the file cache hit and miss stats were not shown in the admin console. This is no longer the case. You will now see file cache statistics from the admin console, and can even drill into details about the specific files in the cache. This leverages the new SystemFileCacheInfo() and SystemFileCacheList() functions.


Slow Query Log

This feature was added to OpenBD a while back, but until now you had to modify your bluedragon.xml file to enable it. There is now a checkbox in the admin console that lets you enable slow query logging and specify the number of seconds past which a query will be considered slow and be logged. You can also view the slow query log file in the admin console's log file viewer. The slow query log shows you the time when the query was run, the total execution time in milliseconds, the tag (CFQUERY or CFSTOREDPROC) that executed the query, the file and line number of the query, and the SQL code of the query. Great info to have when you're debugging or troubleshooting.

That and a few bug fixes are what made it into the 1.4 release, and as I said we have plans for a bigger overhaul for 1.5.

Enjoy!





Monday, October 25, 2010

"Shiny app syndrome" and Gov 2.0 - O'Reilly Radar




This was sent to me by a coworker--the entire article is really great but the person interviewed in this video makes some excellent points about the dangers of requiring specific devices to access services. This is bad from the standpoint of freedom and technology in general, vendor lock-in, etc., but is absolutely horrible when it comes to government services.


Unless they're developed by a third party completely independent of any particular government agency, citizens fund the development of the applications that make the promise of letting them interact more directly and more effectively with their government. By limiting access to a specific device, it's like simultaneously spitting in the face of the citizens that fund the development and handing Apple a check.


With the decreasing cost and increasing availability of technology the digital divide was supposed to get smaller, not bigger, but by requiring citizens to buy one of the most expensive phones on the market and sign up for an expensive data plan through one specific wireless carrier, we're making it far, far worse and the conspiracy theorist in me has to wonder if something nefarious is going on behind the scenes.


Thankfully there's a simple solution to this problem. First, follow the "just give us the data" mantra of Gov2.0 advocates, and second, build apps with standards that don't lock people into any one device. There is absolutely no reason any publicly developed application should only be available on one particular device, and if there aren't any rules in government that mandate cross-device compatibility as a requirement, there should be.


Friday, October 22, 2010

Grails + CouchDB #s2gx

Scott Davis - thirstyhead.com

NoSQL Databases in General

  • given the number of big companies using them, clearly they're ready to use today
  • time to re-examine our unnatural obsession for relational databases
  • rdbms has been around for 50 years now--well understood, great tooling, lots of information
  • rdbmses are silos
    • still good at what they do, but aren't necessarily well-suited to all data
  • as developers we're being forced to use sql to express something that's crucial to the success of your application
    • not our native language, kind of foreign when it comes down to it
  • we use orm to insulate ourselves from sql
    • express yourself in the native language of your choice instead of in sql
Is ORM State of the Art?
  • really just a bridge
  • why aren't there pure java or groovy datastores?
  • persistence is pretty uninteresting to developers
  • orm is a reasonable bridge, but a rather leaky abstraction as well
  • ted neward: orm is the vietnam of computer science
    • "[ORM] represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy."
What Drew Me to CouchDB
  • what if i didn't have to bridge technologies anymore?
  • what if i could save my objects in their native format?
    • couchdb is actually a json datastore, but grails makes it trivial to transfer pogo <-> json
  • just need a thin translation layer
NoSQL Solutions
  • Google BigTable
  • mongoDB
  • CouchDB
  • Cassandra
    • "this is the future, but no one believes us"
  • each of these is a bit different and has its own strengths and weaknesses
  • NoSQL = "not only SQL"
  • don't think of nosql solutions as just another database; truly different way to think about persistence
  • if you think of it as just another database, it'll be the worst database you've ever used
  • need to get out of the mindset of "spreadsheet" type format for data
  • start thinking more about the right tool for the job
CouchDB History
  • starting point was Lotus Notes
    • largely ahead of its time
    • document database
    • not brand-new stuff--ideas and foundation have been around for a very long time
  • Apache project
RDBMS vs. CouchDB
  • rdbms
    • row/column oriented
    • language: sql
    • insert, select, update, delete
  • CouchDB
    • if your data has a more vertical orientation as opposed to horizontal, starts to look more like attachments
    • email is a good example: to, from, body, attachment
    • language: javascript (map/reduce functions)
    • put, get, post, delete (REST)
    • "Django may be built for the Web, but CouchDB is built of the Web." -- Jacob Kaplan-Moss, Django Developer
    • can build entire apps in CouchDB
  • Couch = acronym for "cluster of unreliable commodity hardware"
  • clustering is much more difficult to do with an rdbms--couch was built from the ground up to be massively distributed, clusters out of the box
  • O'Reilly book available -- free online
Using CouchDB With Grails
  • grails has native json support out of the box
import grails.converters.*

class AlbumController {
    def scaffold = true
    def listAsJson = { render Album.list() as JSON }
    def listAsXml = { render Album.list() as XML }
}

CouchDB 101
  • json up and down
  • restful interface
  • no drivers since it's just http
  • written in erlang
    • incredibly fast
    • designed for scalability and parallel processing
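
To make the "no drivers since it's just http" point concrete, here is a small Python sketch that builds (but does not send) plain HTTP requests for a few basic CouchDB operations using only the standard library. The `albums` database name is hypothetical, and the port is CouchDB's default; treat this as an illustration rather than a client library.

```python
from urllib.request import Request

COUCH = "http://localhost:5984"  # default CouchDB address

def couch_request(method, path=""):
    """Build (but do not send) an HTTP request for a CouchDB endpoint."""
    return Request(COUCH + "/" + path, method=method)

# "albums" is a hypothetical database name used only for this example
examples = {
    "ping": couch_request("GET"),
    "list databases": couch_request("GET", "_all_dbs"),
    "create database": couch_request("PUT", "albums"),
    "delete database": couch_request("DELETE", "albums"),
    "get uuids": couch_request("GET", "_uuids?count=3"),
}

for name, req in examples.items():
    print(f"{name}: {req.get_method()} {req.full_url}")
```

Against a running CouchDB instance, each of these could be sent with `urllib.request.urlopen(req)`, and the response body would be JSON.
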
Installing CouchDB
  • sudo apt-get install couchdb
  • windows installer available
Kicking the Tires
  • ping
    • curl http://localhost:5984
      {"couchdb":"Welcome","version":"1.0.1"}
    • can also hit this in a browser, but of course can't do a POST from a URL in a browser
  • get databases
  • create a database
  • delete a database
  • uses standard HTTP response codes, e.g. a 201 response code for a database create
  • web UI available - "Futon"
  • create a document
  • create a document from a file
  • URIs for documents are essentially your primary key--unique way of representing the document
  • don't have to create schemas -- just start throwing documents at the database
  • documents get etags so they're very cache friendly
  • documents also get revisions--keeps track of multiple versions of the document
    • have to provide version number when updating
    • versioning numbers are revision number (integer), then -, then md5 hash of the document itself
    • can explicitly compact the database to get rid of old versions to reduce size of database
  • couch prefers uuids for the ids, but you can use anything you want
  • get UUID(s) from couch
  • to update a document, you'll get the latest version of the document, then do the update, then pass your changes back to couchdb which includes the revision number
  • one of the major things couchdb gives you since it's document based is that the data is accurate at that point in time
    • if the data changes in the future, in an rdbms the old document would get the new data
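
The revision rule described above (fetch the latest version, change it, send it back with its revision) can be sketched with a toy in-memory store. This is only a model of the idea, not CouchDB's code: the class name, the document contents, and the way the hash is computed are all assumptions made for illustration.

```python
import hashlib
import json

class TinyDocStore:
    """Toy in-memory model of revision-checked updates (not CouchDB's real code)."""

    def __init__(self):
        self.docs = {}

    def _next_rev(self, doc, old_rev):
        # Revision looks like "<integer>-<md5 hex>", mirroring the format in the notes
        number = int(old_rev.split("-")[0]) + 1 if old_rev else 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return f"{number}-{digest}"

    def put(self, doc_id, doc):
        current = self.docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        # An update must name the revision it was based on, otherwise it is rejected
        if doc.get("_rev") != current_rev:
            return 409  # conflict
        stored = dict(doc)
        stored["_rev"] = self._next_rev(doc, current_rev)
        self.docs[doc_id] = stored
        return 201  # created

    def get(self, doc_id):
        return dict(self.docs[doc_id])

store = TinyDocStore()
first = store.put("album-1", {"title": "Kind of Blue"})
doc = store.get("album-1")
doc["year"] = 1959
second = store.put("album-1", doc)                       # carries the latest _rev
stale = store.put("album-1", {"title": "no rev given"})  # missing _rev
print(first, second, stale)
```

The third write is rejected because it does not carry the current revision, which is the behavior that keeps two writers from silently overwriting each other.
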
CouchDB With Grails
  • domain class--id and _rev as properties
  • can add couchdb stuff to Config.groovy to do stuff like create-drop for couchdb databases
  • add stuff to BootStrap.groovy
  • showing CouchDBService that has convenience methods around a lot of the URL calls to couch
Map/Reduce
  • in sql you say select firstname, lastname from foo (this is map) where state = 'NE' (this is reduce)
  • map and reduce are stored in 2 separate javascript functions
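
Here is a rough sketch of the two-function shape, written in Python rather than JavaScript just to keep it runnable on its own. The sample documents and function names are made up. One hedged aside on the SQL analogy: in CouchDB views the map function usually does both the field selection and the filtering (it only emits for documents it cares about), while the reduce function aggregates whatever was emitted, which is how this sketch is arranged.

```python
from collections import defaultdict

docs = [
    {"firstname": "Ann", "lastname": "Lee", "state": "NE"},
    {"firstname": "Bob", "lastname": "Ray", "state": "IA"},
    {"firstname": "Cy", "lastname": "Fox", "state": "NE"},
]

def map_doc(doc):
    """Emit (key, value) pairs for one document, like a map function would."""
    if doc.get("state") == "NE":
        yield doc["state"], {"firstname": doc["firstname"], "lastname": doc["lastname"]}

def reduce_count(values):
    """Collapse all values emitted for one key into a single number."""
    return len(values)

def run_view(documents):
    # Group emitted values by key, then reduce each group
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_count(values) for key, values in grouped.items()}

print(run_view(docs))
```
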

Polyglot Web Development With the Grails Framework #s2gx

Jeff Brown - SpringSource

Polyglot?

  • "many languages"
  • writing software in multiple languages
  • some people would say if you do any web development, you're doing polyglot
    • javascript, css, html, java, etc.
  • in the context of this talk, we'll be talking about implementing the actual business logic with multiple languages
Languages on the JVM
  • 200+ languages available on the JVM
  • many of these aren't exactly practical, but many are
  • at least 10-12 programming languages available on the JVM that could be used for serious development
  • big players are java, groovy, clojure, scala, jruby, jython
  • which of these is the best? no answer of course
    • personal preference, best tool for the job, etc.
  • many of these languages solve specific problems really well
  • all these languages are turing complete, so anything you can do in one you can do in another
  • but depending on the problem you're trying to solve, you may find one language or another is ideally suited to the task at hand
  • reached a point with CPU development where the speed of light is a factor in terms of increasing the speed
    • can't really make processors any faster with the current method of developing processors
    • instead of making faster processors, we're using more processors and multiple cores
  • concurrency is becoming more and more important
  • OO languages don't lend themselves to managing concurrency very well
    • allocate objects on the heap, objects in a shared mutable state
    • best we can do in OO languages is use locks so multiple threads can't access things at the same time
    • problem with locking is it's opt-in
  • functional languages make the concurrency problem almost disappear
    • no such thing as destructive assignment in a pure functional language
    • clojure and scala *do* allow destructive assignment
    • but in clojure, for example, you have to do this in a transaction
      • get a snapshot of the heap, and nothing can change on the heap while you're making changes
  • because of the advantages in terms of concurrency, it will be more common to use polyglot programming moving forward
    • e.g. write much of the application in groovy, but build parts of the application using a functional language
  • ultimately all this will run on the jvm, but we can take advantage of the best things each language has to offer
  • pretty different from the past--would have been rather unusual to write a C++ program that had other languages mixed in
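
The "locking is opt-in" point can be sketched in a few lines. Python is used here only as a convenient illustration language, not because it was part of the talk, and the class and function names are made up. One worker takes the lock before touching shared state and the other simply doesn't, and nothing in the language stops the second one.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

def increment_with_lock(counter, times):
    for _ in range(times):
        # Cooperating code takes the lock before touching shared state
        with counter.lock:
            counter.value += 1

def increment_without_lock(counter, times):
    for _ in range(times):
        # Nothing forces this code to take the lock: locking is opt-in
        counter.value += 1

def run(worker, threads=4, times=10_000):
    counter = Counter()
    pool = [threading.Thread(target=worker, args=(counter, times)) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value

print(run(increment_with_lock))     # always threads * times
print(run(increment_without_lock))  # may be lower if updates interleave
```

The locked version is only safe as long as every piece of code that touches the counter remembers to use the lock, which is the weakness the talk was pointing at.
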
Grails?
  • full stack MVC platform for the JVM
    • build system down to ORM, etc.
  • leverages proven staples
    • spring, hibernate, groovy, quartz, java, sitemesh
  • extensible plugin system
    • e.g. can pull out hibernate and use a different persistence mechanism, or can write your own plugins
  • since grails is built on the jvm you can take advantage of any language that will run on the jvm
Demo
  • showing how to write a grails app that uses a groovy controller, but a "math helper" class can be written in groovy, or java, or clojure
  • as long as there's a bean in the spring context, regardless of language, it can be injected into grails controllers
  • the grails controller doesn't care what language the classes it uses are written in
Clojure
  • core grails doesn't have clojure support, but there's a clojure plugin
  • plugin creates a src/clj directory for clojure source
  • code in the grails controller doesn't have to change to take advantage of the math helper written in the different languages

(ns grails)
(defn addNumbers [x y]
  (+ x y))

Using the Clojure Plugin

  • call via clj.mathHelper.addNumbers(x, y)
  • clj is the same as calling getClj()
  • the clojure plugin adds the getClj method to all of the grails classes
    • classes*.metaClass*."getClj" = { return proxy } -- this is done in the doWithDynamicMethods closure in the plugin descriptor
    • the proxy is an instance of the clojure proxy class (grails.clojure.ClojureProxy)
  • no addNumbers method in the proxy class -- uses methodMissing
  • plugin looks for clojure methods in the grails namespace by default
    • if you need another namespace, clj['mynamespace'].methodName() will handle this
  • clojure plugin declares that it wants to watch all the files in src/clj
    • gets notified when files change and compiles the files if they change
  • if your plugin is adding things to the metaclasses on application startup, then you need to make sure your plugin also modifies controllers, services, etc. as they're changed while the application is running
  • can also observe only specific plugins for changes, e.g. only notify me when something involved with hibernate changes
  • can swap out other view technologies in grails
  • for taglibs, they have to be written in groovy, but inside the taglib you could be making calls to code in other languages
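
The proxy-plus-methodMissing trick described above has a rough analogue in Python's `__getattr__`, which is also called only when normal lookup fails. The sketch below is not the plugin's code; the class name, the registry dictionary, and the `double` function are stand-ins invented for illustration.

```python
class NamespaceProxy:
    """Forwards unknown method calls to functions registered under one namespace."""

    def __init__(self, registry, namespace):
        self._registry = registry
        self._namespace = namespace

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, much like methodMissing
        try:
            return self._registry[self._namespace][name]
        except KeyError:
            raise AttributeError(f"no function {name!r} in namespace {self._namespace!r}")

    def __getitem__(self, namespace):
        # Mirrors clj['mynamespace'].methodName() from the notes
        return NamespaceProxy(self._registry, namespace)

# Stand-in for compiled Clojure functions; "double" is a hypothetical example
registry = {
    "grails": {"addNumbers": lambda x, y: x + y},
    "mynamespace": {"double": lambda x: x * 2},
}

clj = NamespaceProxy(registry, "grails")
print(clj.addNumbers(2, 3))
print(clj["mynamespace"].double(21))
```

The proxy itself never defines `addNumbers`, which matches the note that the real proxy class has no such method either.
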
Who gets the credit?
  • grails? groovy? clojure? java? the jvm?
  • really it's the combination of all of them
  • don't have to walk away from grails to take advantage of any of the languages that run on the jvm

Thursday, October 21, 2010

How to Analyze Your Data and Take Advantage of Machine Learning in Your Application #s2gx

Christian Schalk - Google
Google's New Cloud Technologies
  • google storage for developers 
    • api compatible with amazon s3
  • prediction api (machine learning)
  • bigquery
Google Storage
  • store your data in google's cloud 
    • any format, any amount, any time
  • you control access to your data 
    • private, shared, public
  • access via google apis or third party tools/libraries
  • sample use cases 
    • static content hosting, e.g. static html, images, music, video
    • backup and recovery
    • sharing
    • data storage for applications 
      • e.g. used as storage backend for android, appengine, cloud based apps
    • storage for computation 
      • bigquery, prediction api
Google Storage Benefits
  • high performance and scalability 
    • backed by google infrastructure
  • strong security and privacy 
    • control access to your data
  • easy to use 
    • get started fast with google and third party tools
Google Storage Technical Details
  • restful api 
    • get, put, post, head, delete
    • resources identified by uri
    • compatible with s3
  • buckets -- flat containers
  • objects 
    • any type
    • size: 100 gb / object
  • access control for google accounts 
    • for individuals and groups
  • two ways to authenticate requests 
    • sign request using access keys
    • ???
Performance and Scalability
  • objects of any type and 100GB/object
  • unlimited numbers of objects, 1000s of buckets
  • all data replicated to multiple US data centers
  • leveraging google's worldwide network for data delivery
  • only you can use bucket names with your domain names
  • read-your-writes data consistency
  • range get
Security and Privacy Features
  • key-based authentication
  • authenticated downloads from a browser

Getting Started with Google Storage
  • go to http://code.google.com for basic info
  • http://code.google.com/apis/storage (currently in preview mode) 
    • getting started guide, docs, etc.
    • can sign up for an account
  • command line tool available -- gsutil -- low-level access from the command line, scripting
  • google storage manager -- web-based tool for managing google storage

Google Storage Usage Within Google & Early Adopters
  • google bigquery
  • google prediction api
  • google.org -- imagery
  • google patents
  • panoramio
  • picnik
  • vmware
  • US Navy
  • theguardian
  • socialwok
  • xylabs
  • etc.
Pricing
  • storage: $0.17/gb/month
  • also costs for up/downloads
  • similar pricing to amazon s3
  • preview in US 
  • non-US preview available on case-by-case basis

Google Prediction API
  • google's sophisticated machine learning technology
  • available as an on-demand restful http web service
  • provide a bit of text and "train" the algorithm in the service to predict outcomes based on patterns 
  • simple example: language detection 
    • provide series of examples of english, spanish, french, etc. and train the prediction api to recognize the language
  • endless number of applications 
    • customer sentiment
    • transaction risk
    • etc
Prediction API Examples
  • predict and respond to emails in an automated way
Using the Prediction API
  • three step process 
    • upload training data to google storage
    • build a model from your data
    • make new predictions
Training
  • POST prediction/v1.1/training?data=mybucket...
  • can respond when the prediction engine is ready and gives an estimate of accuracy

Predict
  • apply the trained model to make predictions on new data
  • returns json data
  • includes scores indicating confidence of prediction

Prediction API Capabilities
  • data 
    • input features: numeric or unstructured text
    • output: up to hundreds of discrete categories
  • Training 
    • many machine learning techniques
Prediction Demo
  • cuisine predictor
  • spreadsheet of type of food (e.g. mexican, italian, french) and food description as training data
  • upload spreadsheet to google data storage
  • kick off training process, then can check to see if it's done
  • pretty accurate predictions even on a limited training dataset
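
To give a feel for the train-then-predict workflow, here is a deliberately tiny word-overlap classifier in Python. It is in no way Google's algorithm or API; the function names and training rows are made up in the spirit of the cuisine demo, and the "confidence" is just normalized word counts.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count how often each word appears under each label."""
    model = defaultdict(Counter)
    for label, text in examples:
        model[label].update(text.lower().split())
    return model

def predict(model, text):
    """Score each label by word overlap and normalize scores to sum to 1."""
    words = text.lower().split()
    raw = {label: sum(counts[w] for w in words) for label, counts in model.items()}
    total = sum(raw.values())
    if total == 0:
        return {label: 0.0 for label in raw}
    return {label: score / total for label, score in raw.items()}

# Made-up training rows in the spirit of the cuisine demo
training_data = [
    ("mexican", "tacos with salsa and black beans"),
    ("italian", "pasta with tomato sauce and basil"),
    ("french", "croissant with butter and brie"),
]

model = train(training_data)
scores = predict(model, "salsa beans burrito")
best = max(scores, key=scores.get)
print(best, scores)
```

The real service hides the model-building step behind an HTTP call, but the shape is the same: labeled examples go in, and new text comes back with a label and a score per category.
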
Google BigQuery
  • also resides on top of google storage
  • can have large amounts of data that you can quickly analyze using sql-like language
  • fast, simple to use
Use Cases
  • interactive tools
  • spam
  • trends detection
  • web dashboards
  • network optimization
Key Capabilities
  • scalable to billions of rows
  • fast--response in seconds
  • simple--queries in sql
  • webservice based--rest, json
Using BigQuery
  • upload to google storage
  • call bigquery service to import raw data into bigquery table
  • perform sql queries on table
Security and Privacy
  • google accounts
  • oauth
  • https

Tools
  • bigquery shell utility available -- just type sql commands and get responses back
  • can tie in a google spreadsheet and point it to a bigquery table

Google App Engine for Business 101 #s2gx

How to Build, Manage & Run Your Business Applications on Google's Infrastructure
Christian Schalk - Developer Advocate, Google
  • not really an advocacy position
  • still in engineering, but work a lot more with users directly
  • go out to companies to help them be successful

What is cloud computing?
  • lots of different definitions
  • pyramid of (bottom up): 
    • infrastructure as a service 
      • joyent, rackspace, vmware, amazon web services
      • provides cooling, power, networking
    • application platform as a service 
      • GAE falls in this category
      • tools to build apps
    • software as a service 
      • google docs, etc.
GAE
  • easy to build
  • easy to maintain
  • easy to scale 
    • appengine resides in google's overall infrastructure so will scale up as needed
  • started with only python
  • with java support, opened the doors for java enterprise developers
By the Numbers
  • launched in 2008
  • 250,000 developers
  • 100,000+ apps
  • 500M+ daily pageviews 
    • 19,000 queries per second -- has almost doubled since January
Some Partners
  • best buy
  • socialwok
  • xylabs
  • ebay
  • android developer challenge
  • forbes
  • buddypoke 
    • 62 million users
  • gigya 
    • do social integration for large media events (movie launches, sports events) -- huge spikes in traffic so GAE just handles it
  • ubisoft
  • google lab
  • ilike
  • walk score
  • gigapan
  • others
  • point here is it's very easy to drop specific apps on GAE without running literally everything on GAE
  • very popular among social networking apps because of easy scalability
Why App Engine?
  • managing everything is hard
  • diy hosting means hidden costs 
    • idle capacity
    • software patches & upgrades
    • license fees
  • "cloud development in a box"
App Engine Details
  • collection of services 
    • memcache, datastore, url fetch, mail, xmpp, task queue, images, blobstore, user service
  • ensuring portability -- follows java standards 
    • servlets -> webapp container
    • jdo/jpa -> datastore api
    • java.net.URL -> URL fetch
    • javax.mail -> Mail API
    • javax.cache -> memcache
  • extended language support through jvm 
    • java, scala, jruby, groovy, quercus (php), javascript (rhino)
  • always free to get started
  • liberal quotas for free applications 
    • 5M pageviews/month
    • 6.5 CPU hours/day
Application Platform Management
  • download and install SDK 
    • Eclipse plugin also available
  • build app and then deploy to the public GAE servers
  • app engine dashboard
  • app engine health history 
    • shows status of each service individually across GAE as a whole
Tools
  • google app engine launcher for python
  • sdk console 
    • local version of the app engine dashboard
  • google plugin for eclipse 
    • wizard for building new app engine apps
    • can run the entire gae environment locally within eclipse
    • easy deployment to app engine servers
    • in process of building a new version of this with more features
Continuously Evolving
  • aggressive schedule for providing new features
  • may 2010 -- app engine for business announced
What's New?
  • multi-tenant apps with namespace API
  • high performance image serving
  • openid/oauth integration
  • custom error pages
  • increased quotas
  • app.yaml now usable in java apps
  • can pause task queues
  • dashboard graphs now show 30 days
  • more -- see http://googleappengine.blogspot.com
Getting Started
Creating and Deploying an App
  • demoing eclipse plugin
  • can create a new Google Web Application, optionally with GWT
  • projects follow the typical java webapp structure
  • before deployment, can test/debug locally just like any Java project in eclipse
  • even the datastore is available locally for development/testing
  • new features tend to be introduced in python first, then java gets them later
  • to deploy, right click the project, choose "google," then deploy 
    • this brings up a window where you put in your application ID and version, then uploads to the GAE servers
  • can log into GAE dashboard and configure billing with maximum charges if your app will exceed the free quotas
  • can use your own custom domains, this ties into google apps
  • can assign additional developers to GAE applications by email address
  • can deploy new versions of applications and keep the old ones as well, can toggle between versions and choose one as default
What about business applications?
  • GAE for Business
  • same scalable cloud hosting platform, but designed for the enterprise
  • not production quite yet
  • enterprise application management 
    • centralized domain console (preview available today)
  • enterprise reliability and support 
    • 99.9% SLA
    • direct support 
      • tickets tracked, phone support, etc.
  • hosted SQL (preview available today) 
    • managed relational sql database in the cloud
    • doesn't replace the datastore--available in addition to the datastore
  • ssl on your domain 
    • current core product doesn't offer this
  • secure by default 
    • integrated single signon
  • pricing that makes sense 
    • apps cost $8/user, up to a max of $1000 per month
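
The pricing rule works out to a simple capped multiplication. A quick Python sketch, assuming the $8 is per user per month for a single app and the $1000 cap is per app per month (the notes don't spell that out, and the function name is made up):

```python
def monthly_app_cost(users, per_user=8, cap=1000):
    """Monthly cost in dollars: per-user price, capped at a maximum."""
    return min(users * per_user, cap)

for users in (10, 125, 500):
    print(users, monthly_app_cost(users))
```

Under those assumptions the cap kicks in at 125 users, so an app with 500 users would cost the same as one with 125.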

Enterprise App Development With Google
  • GAE for Business
  • Google Apps for Business
  • Google Apps Marketplace
  • Firewall tunneling technology available (Secure Data Connector)
App Engine for Business Roadmap
  • enterprise admin console (preview)
  • direct support (preview)
  • hosted sql (limited release q4 2010)
  • sla (q4 2010)
  • enterprise billing (q4 2010)
  • custom domain ssl (2010 - 2011)
SQL Support
  • can run this all locally in eclipse
  • demo of spring mvc travel app running on GAE with the SQL database 
    • have to explicitly enable sessions
    • had to disable flow-managed persistence
Become an App Engine for Business Trusted Tester!

Developing Social-Ready Web Applications #s2gx

Craig Walls - SpringSource
  • working on Spring Social, which is the brains behind Greenhouse (web/mobile conference app for SpringOne)
Socializing Your Applications
  • why would you want to do this?
  • this is where your customers are--lots of people spend a LOT of time on Facebook
    • if they're there, you want to be there with them
  • Facebook--over 500 million active users
    • third largest country in the world
    • 50% log on to Facebook on any given day
    • there's even a movie about it--that says something
  • Twitter -- over 100 million users
    • more than 190 million unique visitors monthly
    • more than 65 million tweets per day
  • Others: LinkedIn (80 million members), TripIt (230,000 trips planned per month)
  • More: FourSquare, YouTube (2 billion videos viewed per day), MySpace, Gowalla, Google, Flickr
  • how do you use this to better your application?
    • really depends on the customers and applications
    • don't want to make people come to you, better to interact with people where they already are
    • you can have your customers tell you things about themselves and this data would be hard to get otherwise
Types of Social Integration
  • widgets
    • facebook xfbml/js; the "like" button
      • xfbml -- tag library that's interpreted on the client by javascript
    • twitter @anywhere
    • linkedin widgets / linkedin jsapi
      • jsapi resembles xfbml
  • embedded
    • facebook applications
    • igoogle gadgets
    • myspace applications
  • rest api
    • provided by virtually all social networks
    • consumed by external and embedded applications
Widgets
  • facebook connect
    • xfbml tag on page adds the login button to any page (<fb:login-button ...>Connect to Facebook</fb:login-button>)
    • demoing "find my facebook friends" functionality (<fb:multi-friend-selector ...> -- fbml tags that run on the server)
  • twitter @anywhere offers some javascript-based widgets, e.g. follow, connect with twitter
    • can also linkify and hovercard text--does this with a class to add the links and javascript handles adding links (hovercard is the thing that shows the little twitter profile boxes for users)
    • twitter anywhere has great examples in their documentation
Facebook Embedded Applications
  • hosted on your own servers, but look seamless when you're on facebook (look like they're part of facebook)
  • can leverage widgets, REST APIs, javascript apis, etc.
  • most often used for games, quizzes, surveys, etc.
Accessing Social Data with REST Social APIs
  • common operations
    • get user profile
    • get/update status
    • get list of friends
  • specialized operations
    • facebook: create photo album, create a note, etc.
    • twitter: create/follow a list, view trends
    • tripit: retrieve upcoming trips, view friends nearby
  • all done with restful apis
    • most support both json and xml representations
Searching Twitter
  • RestTemplate rest = new RestTemplate();
  • String query = "#s2gx";
  • String results = rest.getForObject("http://search.twitter.com/search.json?q={query}", String.class, query);
  • if you want to get friends on twitter, you get the user IDs back, so you have to make another call back to get info about the user based on the user id
Facebook Graph API
  • interesting form of REST API
  • two basic url patterns
  • if you don't have an authorization key you only get very basic info back (name, gender, country)
Securing Social Data: OAuth is the key to social data
  • most social data is secured behind oauth
  • authentication takes place on social provider
  • consumer application given an access token to access user's profile
    • this gets around having to give another application your login credentials
    • also lets you revoke access for specific applications
  • consumer never knows the user's social network credentials
  • demo of trying to post a tweet without being authorized--throws a 401 error
  • when you sign in via oauth you're signing into the originating application (e.g. facebook) and then facebook tells the application "yes, they provided the correct authentication and have given you permission to do what you told them you were going to do"
    • click "connect with facebook" button from an application
    • box pops up from facebook where the user logs in and grants permissions
    • facebook then makes the connection and gives the application an access key
Comparing OAuth and OpenID
  • openid
    • primary concern is single sign-on
    • shared credentials for multiple sites
    • authentication takes place on your chosen openid server
  • oauth
    • concern is shared data
    • sign into the host application
    • host application then gives some other application access
  • if you sign on via oauth the underlying mechanism could be openid
Versions of OAuth in Play
  • OAuth 1.0: tripit
  • OAuth 1.0a: twitter, linkedin, foursquare, most others
  • OAuth 2: still in draft; early adoption by facebook (not quite full oauth 2), salesforce, gowalla, github, 37signals
    • on target to go final by the end of the year
Signing a request: OAuth 1.0a
  • construct a base string that includes ...
    • the http method
    • the request url
    • any parameters (including post/put body parameters if the content type is "application/x-www-form-urlencoded")
  • hash the base string with a secret key to create the signature
    • commonly hmac-sha1, signed with api secret
    • could be plaintext or rsa-sha1 (if supported)
  • add authorization header to request
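The signing steps above can be sketched in plain Java. This is a minimal illustration, not Spring Social's implementation: the class and method names are hypothetical, it assumes HMAC-SHA1, and it takes parameters as a map, so repeated parameter names are not handled.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class OAuth1Signer {

    // Percent-encode per RFC 3986. URLEncoder does form encoding, so patch the differences.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    // Base string = HTTP method & encoded request URL & encoded, sorted parameter string.
    // The URL is expected without its query string; query parameters go in params.
    static String baseString(String method, String url, Map<String, String> params) {
        Map<String, String> sorted = new TreeMap<>();
        params.forEach((k, v) -> sorted.put(encode(k), encode(v)));
        String paramString = sorted.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return method.toUpperCase() + "&" + encode(url) + "&" + encode(paramString);
    }

    // HMAC-SHA1 over the base string, keyed with "apiSecret&tokenSecret", Base64-encoded.
    static String sign(String baseString, String apiSecret, String tokenSecret) {
        try {
            String key = encode(apiSecret) + "&" + encode(tokenSecret);
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(baseString.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable", e);
        }
    }
}
```

The resulting value would go into the oauth_signature field of the Authorization header. The tedium of getting the encoding and ordering exactly right is the reason a template that does this for you is attractive.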
The OAuth 2 Dance -- much simpler than oauth 1
  • request authorization from user
  • return to consumer with the authorization code in the request
  • exchange auth code and client secret for access token
  • return access token to consumer for use in REST API calls
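The dance above can be sketched as three small string-building steps. This is an illustration under assumptions: the endpoint URLs are hypothetical, the class is not part of any library, and the parameter names follow the OAuth 2.0 spec as later finalized (RFC 6749), so providers implementing an earlier draft may differ.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class OAuth2Dance {

    // Form-encode a parameter map, preserving insertion order.
    static String form(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Step 1: URL the user is sent to in order to grant access.
    static String authorizationUrl(String authorizeEndpoint, String clientId,
                                   String redirectUri, String scope) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("response_type", "code");
        p.put("client_id", clientId);
        p.put("redirect_uri", redirectUri);
        p.put("scope", scope);
        return authorizeEndpoint + "?" + form(p);
    }

    // Step 2: the provider redirects back to the consumer with ?code=... in the request.
    static String codeFromCallback(URI callback) {
        String query = callback.getRawQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals("code")) {
                return URLDecoder.decode(kv[1], StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    // Step 3: body of the POST that exchanges the code and client secret for an access token.
    static String tokenRequestBody(String code, String clientId,
                                   String clientSecret, String redirectUri) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("grant_type", "authorization_code");
        p.put("code", code);
        p.put("redirect_uri", redirectUri);
        p.put("client_id", clientId);
        p.put("client_secret", clientSecret);
        return form(p);
    }
}
```

There is no signature step anywhere, which is most of why this is simpler than OAuth 1: the exchange relies on HTTPS rather than on signing each request.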
Easy Facebook OAuth
  • <fb:login-button perms="email,publish_stream,offline_access">Connect to Facebook</fb:login-button>
  • offline access = the application can access your facebook account at any time
  • oauth 2 gives you the option to create an access token that will expire after a period of time
  • oauth 2 also has a renewal token so you can renew expired tokens, but facebook doesn't support renewal tokens yet
  • if you give the application the "give this app access at any time" it's really just a way to not have the access token expire
    • currently access tokens expire after about an hour
  • once you authorize with FB, you get a cookie back called fbs_appKey (where appKey is your application's key)
    • cookie also includes the access token and user id
  • if you store access tokens in your application's local database, you should store them encrypted
  • once you have the access token, you make the same call to facebook but pass the access token, and then you get a lot more of the profile info from facebook
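Pulling the access token and user id out of the fbs_appKey cookie might look roughly like this. The notes only say the cookie includes both values; the layout assumed here (a quoted, URL-encoded key=value string joined with "&") and the field names access_token and uid are assumptions, and the class is hypothetical. No signature verification is attempted.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FacebookCookie {

    // Parse a cookie value such as "access_token=...&uid=..." into a map.
    static Map<String, String> parse(String rawValue) {
        String v = rawValue.trim();
        // Strip surrounding double quotes if present (assumed layout).
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            v = v.substring(1, v.length() - 1);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : v.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed or empty segments
            }
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            fields.put(key, value);
        }
        return fields;
    }
}
```

With the map in hand, fields.get("access_token") would be the value to pass on the follow-up profile call.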
Social REST API Challenges
  • signing a request for oauth 1.0(a) is difficult when using Spring's RestTemplate
  • each social provider's api varies wildly
  • getting a facebook access token requires parsing the cookie string
  • how should various http response codes be handled?
Spring Social
  • supports social integration in Spring
  • born out of Greenhouse development
TwitterTemplate
  • simplifies signing of OAuth 1 requests through RestTemplate
  • offers a consistent template-based API across social providers
  • extends spring MVC to offer Facebook access token and user ID as controller parameters
  • maps social responses to a hierarchy of social exceptions
  • Spring Social can get at the actual response to a 4XX error code which you can't get if you're using RestTemplate directly
  • similar to using JdbcTemplate which gives you more detail than the raw sql exceptions
  • Spring Social includes TwitterTemplate to make interacting with twitter much easier
FacebookTemplate
  • a bit simpler since all that's needed is the access token
  • FacebookTemplate facebook = new FacebookTemplate(ACCESS_TOKEN);
  • String profileId = facebook.getProfileId();
  • also linkedin template and tripittemplate
Spring Social Next Steps
  • expanding available operations in social templates
  • more social templates for other providers

Introduction to Tomcat 7 #s2gx

Mark Thomas, SpringSource
  • Tomcat 7 Supports ...
    • Servlet 3.0
    • JSP 2.2
    • EL 2.2
    • Java 1.6
  • New major release of Tomcat every time the spec has a major change
  • Servlet 3.0
    • asynchronous processing
    • pluggability
    • annotations
    • session management
    • miscellaneous
  • Asynchronous processing
    • request processing is synchronous, but the response processing can now be asynchronous
    • outline
      • start asynch processing
      • request/response passed to a new thread
      • container thread returns to the pool
      • new thread does its work
    • allows container threads to be used more efficiently
      • when waiting for external resources
      • when rationing to a resource
      • or any other time when the container thread would be blocking
    • allows separation of request and response
      • chat applications
      • stock tickers
    • all filters, servlets, and valves in the processing chain must support asynchronous processing
    • not as asynchronous as COMET
  • pluggability
    • purpose was to improve developer productivity--worry less about application configuration
    • annotations
    • web fragments
    • static resources in JARs
    • programmatic configuration options
    • pros
      • development can be faster
      • apps can be more modular
    • cons
      • fault diagnostics are significantly hampered
      • might end up enabling things you don't want or need
    • overall, I don't recommend using it for production
    • instead:
      • get tomcat to generate the equivalent web.xml
      • use the equivalent web.xml instead
    • can be frustrating to figure out what's going on when the application is doing things that aren't in web.xml
    • JARs can contain their own web.xml
    • allows JARs to be self-contained
    • JARs can also contain static resources
      • always used, cannot be excluded by fragment ordering
      • non-deterministic if there are duplicate resources in multiple JARs
  • annotations
    • servlets, filters, listeners
      • can be placed on any class
      • tomcat has to scan every class on application start
    • JARs scanned if included in fragment ordering
      • can exclude JARs from the scanning process; controlled in catalina.properties
    • security, file upload
      • placed on servlets
      • processed when class is loaded
    • file upload has almost--but not quite--the same API as Commons File Upload
      • don't have to ship commons file upload with your apps anymore
    • with annotations the configuration can become a lot more opaque
    • can turn all of this off in your main web.xml--set metadata-complete="true"
      • this is all or nothing--can't pick and choose what bits you want on or off
  • programmatic configuration
    • allows a subset of things you can do in web.xml
      • add servlets, filters, and listeners
      • change session tracking
      • configure session cookies
      • configure security
      • set initialization parameters
    • allows greater control / optional configuration
    • some environment-specific settings
    • can make troubleshooting difficult--no xml to refer to in order to see what's going on
    • main advantage is doing things like if/thens in your configuration which you can't do in web.xml
  • servlet 3.0 - session tracking
    • adds tracking via ssl session id
      • must be used on its own
    • allows selecting of supported tracking methods
      • url, cookie, ssl
    • url based tracking is viewed as a security risk
      • can't turn this off in servlet 2.2, but can turn it off in servlet 3.0
      • another release of tomcat 6 will likely allow this to be turned off
    • session id is cryptographically secure -- can't be spoofed
  • servlet 3.0 - session cookies
    • can control default parameters for session cookies
      • name - may be overridden by tomcat
      • domain - may be overridden by tomcat
      • path - may be overridden by tomcat
      • maxage
      • comment
      • secure - may be overridden by tomcat
      • httponly - may be overridden by tomcat
  • servlet 3.0 - misc
    • httpOnly
      • not in any of the specs
      • however, widely supported
      • prevents scripts accessing the cookie content
      • provide a degree of xss protection
    • programmatic Login
      • useful when creating a new user account
      • can log the user in without redirecting them to the login page
      • allows the application to trigger a login
  • jsp 2.2
    • property group changes
    • can specify default content type in jsp-config
    • can specify the buffer size for a page
    • new feature - error-on-undeclared-namespace
      • e.g. if you have a typo when using a tag library it fails silently
      • with error-on-undeclared-namespace turned on, error is thrown at compile time
    • jsp:attribute adds support for the omit attribute
  • EL 2.2
    • now possible to invoke methods on a bean
    • correctly identifying the intended method is tricky
    • likely to be some differences between containers--spec is unclear on behavior
    • tomcat tries to do what the java compiler does
  • other tomcat 7 changes: management
    • add the ability to fix the remote jmx ports
      • previously jmx picked a port at random
    • single line log formatter
    • manager app can distinguish between primary, backup, and proxy sessions (for clusters)
    • aligned mbeans with reality (GSoC 2010)
    • general improvements to JMX support
      • can now have a server.xml with just a <Server .../> element and create a fully working Tomcat instance (Hosts, Contexts, etc. all via JMX)
        • can't save this config out but that's being worked on
  • performance
    • unlikely to see a big change
    • can limit the number of JSPs loaded at any one time
      • useful for development
    • not many areas where tomcat needs a big performance boost
  • security
    • generic CSRF protection
      • if you go to a site with malicious code, might trigger your browser to make a call to the tomcat manager to deploy an app that gives access to your machine
      • now the manager looks for a token that was passed from the previous response to the manager app and if the token doesn't exist, the request will fail
    • separate roles for manager and host manager apps
    • session fixation protection
      • changes session ID on authentication
    • enable the LockOutRealm by default (e.g. lock out user for 10 minutes after 5 failed login attempts)
    • enable an access log by default
    • added ability to disable exec command for SSI
  • code cleanup
    • use of generics throughout
    • removed deprecated and unused code
    • reduced duplication, particularly in the connectors
    • better definition of the lifecycle interface
    • added checkstyle to the build process
    • if you've written your own custom tomcat components, you might need to change them for tomcat 7
  • extensibility
    • added hooks for rfc66 - used by virgo
    • refactored to simplify geronimo integration
    • significantly simpler embedding
  • stability
    • builds on tomcat 6
    • tomcat 6 is already very stable
    • significant reductions in the open bug count
      • 6 open bugs without patches when i wrote this slide
      • for tomcat 5.5.x, 6.0.x, and 7.0.x combined
    • added unit tests
      • CI using BIO, NIO, and APR/native on every commit
    • memory leak detection and prevention
      • back-ported to tomcat 6
  • flexibility
    • copying of /META-INF/context.xml is now configurable -- can control whether or not the expansion/copying of this file happens
    • alias support for contexts
      • map external content into a web application
      • keeps tomcat from deleting things in a symlink when the app is undeployed
    • shutdown address is now configurable
      • deliberately limited to localhost by default
    • tomcat equivalent of some httpd modules
      • mod_expires
      • mod_remoteIP
  • tomcat 7 status
    • passes servlet 3.0 TCK with every combination of connectors
    • passes jsp 2.2 TCK
    • passes EL 2.2 TCK
    • all with the security manager enabled
    • note that just because it passes the TCK doesn't necessarily mean it's fully compliant
    • 7.0.4 just released today
  • when will tomcat 7 be stable?
    • when three +1 votes come from committers
    • in practice the committers each have their own criteria
    • i'm looking for 2-3 releases with ...
      • no major code changes that might cause regressions
      • tcks all pass (already have this)
      • no major bugs reported
      • good levels of adoption (already have this)
  • tomcat 7 plans
    • one release every month
      • bug 49884 put a spanner in the works
    • stable by the end of the year?
    • keep on top of the open bugs
    • work on bringing the open enhancement requests down
    • if all goes well, 7.0.6 will be the stable release
    • jsr 196 implementation?
      • authentication SPI for containers
      • geronimo has most (all?) of this already
    • windows authentication
      • looking unlikely -- too much baggage
        • needs some native libraries for it to work well
      • waffle project already does this
    • simpler jndi configuration for shared resources
      • no more <ResourceLink ... />
    • more jmx improvements
    • further improvements to memory leak protection
    • continue migration from valves to filters
    • java ee 6 web profile
      • no interest so far from user community
      • had more questions from journalists than users
      • no plans at present
      • adds a lot of baggage that isn't that useful
      • if you want a web profile implementation, there's geronimo
  • useful resources
  • new feature -- rolling update/side-by-side deployment
    • can deploy a new version while the app is running and when a user's session expires, they hit the new version of the app
    • came out of a tc server requirement but made more sense to implement it in Tomcat
    • springsource providing patch to ASF and will be part of a future tomcat release
    • deploy a new WAR with the same name as an existing app, but add ##N at the end of the war file name where N is the version (e.g. myapp##1.war will be a new version of myapp.war)
      • context path is retained, meaning context path is the same for both versions of the app
    • feature that will be added is when no more sessions are active on the old version it will be automatically undeployed
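The ##N naming convention for side-by-side deployment can be illustrated with a small parser. This is a hypothetical helper, not Tomcat's code, and it ignores special cases such as ROOT.war; it only shows how two WAR file names resolve to the same context path with different versions.

```java
class WarName {
    final String contextPath;
    final String version;

    WarName(String contextPath, String version) {
        this.contextPath = contextPath;
        this.version = version;
    }

    // "myapp##1.war" -> context path "/myapp", version "1"
    // "myapp.war"    -> context path "/myapp", version "" (unversioned)
    static WarName parse(String fileName) {
        String base = fileName.endsWith(".war")
                ? fileName.substring(0, fileName.length() - ".war".length())
                : fileName;
        int marker = base.indexOf("##");
        String name = marker >= 0 ? base.substring(0, marker) : base;
        String version = marker >= 0 ? base.substring(marker + 2) : "";
        return new WarName("/" + name, version);
    }
}
```

Because both files map to one context path, existing sessions can stay on the old version while new sessions land on the new one.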

Wednesday, October 20, 2010

Advanced GORM: Performance, Customization, and Monitoring #s2gx

Speaker: Burt Beckwith, SpringSource

Overview

  • demo of potential performance issues with mapped collections in GORM
  • using the hibernate 2nd-level cache
  • monitoring and managing 2nd-level caches
  • app info plugin
Standard Grails One-to-Many
  • library has many visits
  • visit class has person name and date, with backreference to library
What's the problem?
  • hasMany = [visits:Visit] creates a set
  • sets guarantee uniqueness
  • adding to the set required loading all instances from the database to guarantee uniqueness, even if you know the item is unique
  • likewise for a mapped list--lists don't guarantee uniqueness, but they do guarantee order, so they still have to pull all records from the db to get the order right
  • you get a false sense of security since it's lazy-loaded; only partially helpful
  • works fine in development when you only have a few visits, but imagine when you deploy to production and you have 1,000,000 visits and want to add one more
  • risk of artificial optimistic locking exceptions; altering a mapped collection bumps the version, so simultaneous visit creations can break but shouldn't
What's the Solution?
  • don't use collections
  • instead of visit belonging to a library, visit HAS a library
  • different syntax for persisting a visit
  • no cascading; to delete a library you need to delete its visits first in a transactional service method
  • you also lose your collection so you can't do library.visits.size(), etc. but you can still use dynamic finders, which is better anyway since you're only pulling what you need
Standard Grails Many-to-Many
  • user has many roles, role has many users
  • problem is that if all new users are granted a particular role, you get into scaling issues quickly
  • with many to many you have an intermediate table with pointers to both tables and can map the join table
  • the belongsTo in a many to many can go in either class since it's bidirectional
    • but this is the problem since the same amount of data will get loaded either way
  • more efficient to treat kind of like one-to-many and create the user, then grant the role
    • this way you're adding/deleting single records in a single table due to existence of a domain class describing the relationship
  • important to have well-defined equals() and hashCode() methods in your domain classes, as well as implement serializable so you can use second level caching
  • wind up with user.addToRole() or role.addToUsers()
  • no cascading like before--have to manage this yourself
So never use mapped collections?
  • no, you need to examine each case
  • standard approach is fine if the collections are reasonably small--for both sides in the case of many to many
  • the collections will contain proxies, so they're smaller than real instances until initialized, but still a memory concern
  • great example of something that's convenient and easy out of the box, but when it becomes a problem, you just do it a different way
Using Hibernate 2nd-level Caching
  • great, but have to be careful because it can burn you in the same way that a query cache in a database can bite you
  • great candidates for caching--anything that's read only and doesn't change often
  • can overuse cache to the point where you're spending more cycles flushing and aren't saving yourself any db traffic--can actually make things worse than just hitting the db
Caching Usage Notes
  • 1st level cache is the hibernate session itself
  • get is always cached
  • can significantly reduce db load by keeping instances in memory
  • can be distributed between multiple servers to let one instance load from the db and share updated instances, avoiding extra db trips
  • "cache true" creates a read/write cache, best for read-mostly objects since frequently updated objects will result in excessive cache invalidation (and network traffic when distributed)
  • "cache usage: 'read-only'" creates a read-only cache, best for lookup data (countries, zip codes, etc.) that never changes
Query Cache
  • can set cacheable true on all the query options; this caches the query and you can grab class instances from this
Hibernate query cache considered harmful?
  • most queries are not good candidates for caching; must be same query with same parameters
  • updates to domain classes will pessimistically flush all potentially affected cache results
  • DomainClass.list() is a decent candidate if there aren't any (or many) updates and the total number isn't huge
  • great blog post by alex miller at http://tech.puredanger.com/2009/07/10/hibernate-query-cache
2nd Level Cache API
  • evict one instance: sessionFactory.evict(DomainClass, id)
  • can get stats (hits/misses, etc.)
  • can look at the stats and get a good sense of whether or not you're caching effectively
  • e.g. if miss count is high then your cache strategy isn't effective
  • appinfo plugin gives you tons of information about what's going on in your app, what's going on in hibernate, etc.
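The "is the cache effective" check from the hit and miss counts is simple arithmetic. A minimal sketch, assuming the two counts have already been read from the cache statistics; the class name and the idea of a fixed threshold are assumptions for illustration only.

```java
class CacheStats {

    // Fraction of lookups served from the cache; 0.0 when there have been no lookups.
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    // True when the hit ratio meets a threshold you pick for your application.
    static boolean looksEffective(long hits, long misses, double minRatio) {
        return hitRatio(hits, misses) >= minRatio;
    }
}
```

For example, 90 hits and 10 misses is a ratio of 0.9, while 10 hits and 90 misses (0.1) suggests the region is costing more than it saves.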

Groovy Web Services, Part I: REST #s2gx

Speaker: Ken Kousen - Kousen IT, Inc.
  • currently working on book for Manning: "Java and Groovy: The Sweet Spots"
  • "Java is really good for tools and infrastructure. Groovy is good for pretty much everything else."
Two Flavors of Web Services
  • SOAP based
    • SOAP wrapper for payload
    • much like an XML API on a system
    • makes header elements available
    • lots of automatic code generation--tools are very mature
      • stubs, skeletons, proxies, etc. are all written for you
      • sprinkle annotations into your codebase to get a web service out of it
  • REST based
    • everything is an addressable resource (uri)
    • leverage http transport
    • uniform interface (get, post, put, delete)
      • a lot of rest web services available today aren't truly restful--e.g. amazon web services where you have to specify the operation name
      • just invoking methods over the web using URLs, but the request type doesn't matter
  • REST -> Representational State Transfer
    • term coined by Roy Fielding in PhD thesis (2000), "Architectural Styles and the Design of Network-based Software Architectures" (available in HTML online; very readable and thought-provoking)
    • one of the founders of the Apache Software Foundation
    • on the team that came up with the spec for HTTP
  • Idempotent
    • repeated requests give the same result
    • get, put, and delete requests are meant to be idempotent
    • POST is not meant to be idempotent
    • e.g. on put--if you're supposed to get the same result every time, the assumption is that you'd know the uri you're applying this to every time
      • the ID will be in the request somehow
      • assumption here is that you know what the ID is going to be before you do the insert
      • having the ID ahead of time probably isn't going to happen in the real world
    • most people wind up doing POST for inserts and PUT for updates
    • "safe" is another term that comes into play--requests do not change the state of the server
      • GET requests should never change the state of the server
  • Content negotiation
    • each resource can have multiple representations
      • e.g. xml or json?
    • request will state which representation it wants
  • RESTful client
    1. assemble url with query string
    2. select the http request verb
    3. select the content type
    4. transmit request
    5. process results
  • Steps for rest above are more cumbersome than SOAP since SOAP does a lot of this for you. With REST you have to deal with all of this yourself.
    • where groovy comes into play, is that it's augmented a lot of these classes to make this easier
    • groovy also shines when you get the results back
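For reference, the five RESTful client steps listed above look roughly like this in plain Java, using the java.net.http client (Java 11 and later, so newer than anything shown in the session). The class and method names are hypothetical; content negotiation is done here with an Accept header, which is one of several ways a service might support it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

class RestClientSketch {

    // Step 1: assemble the URL with a query string (pass an ordered map for stable output).
    static URI buildUri(String base, Map<String, String> query) {
        String qs = query.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(qs.isEmpty() ? base : base + "?" + qs);
    }

    // Steps 2 and 3: select the verb (GET here) and the representation wanted back.
    static HttpRequest buildGet(URI uri, String accept) {
        return HttpRequest.newBuilder(uri)
                .header("Accept", accept)
                .GET()
                .build();
    }

    // Steps 4 and 5: transmit the request and hand back the body for processing.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Parsing the returned XML or JSON is still left to the caller, which is the part where the notes say Groovy shines.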
EXAMPLE: Google Geocoder
  • The Google Geocoding API -- converts address info to latitude, longitude
  • Base URL is http://maps.googleapis.com/maps/api/geocode/output?parameters
    • output is XML or JSON
    • parameters include
      • url encoded physical address
      • sensor = true | false (indicates whether or not the request is coming from a GPS-enabled device)
  • Groovy makes it easy to assemble a query string
    • map of parameters, then collect and join, then deal with the xml

def url = base + [sensor:false, address:[loc.street, loc.city, loc.state].collect { v -> URLEncoder.encode(v, 'UTF-8') }.join(',+')].collect {k,v -> "$k=$v"}.join('&')

def response = new XmlSlurper().parse(url)

  • simple read-only web services are a natural fit for REST, even if a lot of other web services in your organization are SOAP
  • get, post, put, and delete also map very naturally to sql verbs, so simple data management apps are natural for rest
  • content negotiation based on URL, e.g. http://.../xml for xml, http://.../json for json
Twitter API
  • closer to true REST
  • in the middle of changing their API again
  • have been supporting basic authentication with base64 encoding for years, which isn't encrypted
  • twitter API is http://dev.twitter.com/doc
  • twitter is oauth only now
HTTP Clients
  • simple http client in groovy -- java.net.URL class
  • get is easy, just access the url
  • def xml = new URL(base).text -- returns the text of the entire page
  • post, put, and delete in groovy are similar to java
Building a RESTful Service
  • design your URL strategy
    • what are URLs going to mean, what are http verbs going to mean
  • sample url strategy
    • http://.../songs
      • get returns all songs
      • post adds a new song
      • put, delete not used on this url
    • http://.../songs/id
      • get -- return a specific song
      • post -- not used
      • put -- update song with given id
      • delete -- remove song with given id
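The sample URL strategy above amounts to a small dispatch table keyed on path and verb. A minimal sketch in plain Java, with a hypothetical class name and string results standing in for real handlers; answering the "not used" combinations with 405 is an assumption, not something stated in the session.

```java
class SongRoutes {

    // Map an HTTP method and path to the action the URL strategy assigns to it.
    static String route(String method, String path) {
        String verb = method.toUpperCase();
        if (path.equals("/songs")) {
            switch (verb) {
                case "GET":
                    return "return all songs";
                case "POST":
                    return "add a new song";
                default:
                    return "405 Method Not Allowed"; // put, delete not used on this url
            }
        }
        if (path.matches("/songs/[^/]+")) {
            String id = path.substring("/songs/".length());
            switch (verb) {
                case "GET":
                    return "return song " + id;
                case "PUT":
                    return "update song " + id;
                case "DELETE":
                    return "remove song " + id;
                default:
                    return "405 Method Not Allowed"; // post not used on this url
            }
        }
        return "404 Not Found";
    }
}
```

Writing the table down first makes it obvious which verb and URL combinations are deliberately unsupported.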
Groovlet
  • groovy servlet -- groovy class that responds to an http request
  • easy to configure
  • provides access to the http methods
  • markup builder called html
  • set up the groovy servlet in web.xml
  • showing code for the various operations in the groovlet
  • main tools we have in groovy for restful web services are the xml slurper and markup builder--makes all this very simple
What does Java bring to the table?
  • one of groovy's principles is to not reinvent stuff that's available in java
  • inevitable that there's a performance penalty for using groovy
  • nice thing is you can mix and match groovy with java and use what's appropriate for individual portions of the application
  • JSR311 - JAX-RS
  • currently JAX-WS is part of Java EE 5 but also part of Java SE 1.6
  • specification for JAX-RS -- part of Java EE 6
Libraries for HTTP
  • apache http client
  • groovy http builder
  • rest client available in the groovy http builder
  • typical for groovy: take a java library and make it easier/build on it
Grails
  • grails apps make REST easy
  • can set up mappings to map urls to controllers/methods -- this is built in
  • xml marshalling is dead simple in grails -- just use "render foo as XML"
    • works great for xml that doesn't go too deep
    • can always use the markup builder
Downside to REST
  • No WSDL, therefore no proxy generation tools
    • all hand-written code for the plumbing
  • WADL: Web Application Description Language
    • attempt to create WSDL for REST
    • slowly gaining acceptance but not common yet
JAX-RS
  • attempt to do for REST what JAX-WS does for SOAP
  • Jersey -- reference implementation for JAX-RS
  • annotation based
Conclusions
  • groovy makes it very easy to build URLs with query strings, invoke urls and get responses, and parse xml response
  • groovy works with JAX-RS reference implementation

Clustering and Load Balancing With tc Server and ERS httpd #s2gx

Mark Thomas - SpringSource
  • Tomcat committer
  • tc Server developer
  • responsible for keeping tc Server and Tomcat in sync
    • memory leak detection in tomcat manager app
    • recent logging improvements
    • simplifying jmx access
    • all of the above started in tc Server but have been contributed back and implemented in tomcat
    • don't want to get into having a significant fork of tomcat
Typical Architectures
  • load balancer (round robin) -> httpd (sticky sessions) -> tc Server (clustered)
    • don't go anywhere near tc Server clustering unless you absolutely have to--adds complexity and overhead
    • only thing tc Server clustering gives you is the ability for users not to lose sessions if an instance of tomcat goes down
    • ask yourself how big of a deal it is if your users lose their sessions when an outage occurs--if it's a big deal then you may need clustering
Starting Point
  • ubuntu 8.04.4 64-bit VM
  • vmware tools installed
  • 64-bit sun jdk 1.6.0_21
  • will be installing tc Server, Hyperic, etc. on this clean image
tc Server Installation
  • don't run tc Server as root
  • create a tcserver user
    • owns the tc Server files
    • runs the tc Server processes
  • install to /usr/local/tcserver
Instance Naming and Port Numbering
  • think about this in advance--may wind up with 100s of instances
  • tc01, tc02, etc. as the instance name, then follow this for ports
  • example scheme for ports
    • 1NN80 - http
    • 1NN43 - https
    • 1NN09 - ajp
    • 1NN05 - shutdown (if used)
    • 1NN69 - jmx
    • 1NN20 - cluster communication
  • server and jvmRoute naming--consider linking server name to IP address, e.g. srvXXX-tcYY where XXX is the end of the IP address, YY is the tomcat instance number
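The 1NNxx scheme is easy to turn into a helper that derives every port and name from the instance number. A minimal sketch: the class and method names are hypothetical, and the instance number is assumed to be between 1 and 99 so it fits the two NN digits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InstancePorts {

    // 1NNxx: a leading 1, the two-digit instance number, then a two-digit service suffix.
    static int port(int instance, int suffix) {
        if (instance < 1 || instance > 99 || suffix < 0 || suffix > 99) {
            throw new IllegalArgumentException("instance must be 1-99 and suffix 0-99");
        }
        return 10000 + instance * 100 + suffix;
    }

    // All ports for one instance, using the suffixes from the example scheme.
    static Map<String, Integer> portsFor(int instance) {
        Map<String, Integer> ports = new LinkedHashMap<>();
        ports.put("http", port(instance, 80));
        ports.put("https", port(instance, 43));
        ports.put("ajp", port(instance, 9));
        ports.put("shutdown", port(instance, 5));
        ports.put("jmx", port(instance, 69));
        ports.put("cluster", port(instance, 20));
        return ports;
    }

    // tc01, tc02, ...
    static String instanceName(int instance) {
        return String.format("tc%02d", instance);
    }

    // srvXXX-tcYY, where XXX is the end of the server's IP address.
    static String jvmRoute(String ipSuffix, int instance) {
        return "srv" + ipSuffix + "-" + instanceName(instance);
    }
}
```

Instance tc01 gets http on 10180 and ajp on 10109, so anyone looking at a port number can tell which instance and service it belongs to.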
DEMO: Installing tc Server
  • tc Server version names are e.g. apache-tomcat-6.0.29.A.RELEASE where the first part is the version of Tomcat, the "A" means it's the first release of tc Server based on that tomcat release
  • if shutdown port is disabled, doing a kill -15 does a graceful shutdown. kill -9 works too and tomcat won't care, though your application might, so only do -9 if you have to
  • created two instances of tc Server using the tc Server create instance script
  • tc Server comes with templates for startup scripts--copy these over to /etc/init.d and edit as needed
  • parameterize cluster addresses and ports in a catalina properties file
  • can use ${...} notation in server.xml to hit the properties in catalina.properties
Creating a Cluster
  • switching to static node membership
    • cumbersome for large clusters
    • remove the <Membership .../> element
    • need to add a bunch of config stuff after the <Interceptor .../> elements
  • easier to use dynamic node discovery
  • backup strategies -- tomcat gives you DeltaManager and BackupManager
    • delta manager is simplest--replicates every session to every node in the cluster
    • if your sessions use a lot of memory, delta manager doesn't give you much scalability
    • if your limitation is CPU, delta manager gives you some scalability
    • amount of network traffic on delta manager increases with the square of the number of nodes--not terribly scalable
  • backup manager
    • replicates session data to one other node in the cluster
    • send options: synchronous vs. asynchronous
      • in synchronous, writes session changes to other nodes, waits for acknowledgement, and then sends response to the user. can mean a lag for the user.
      • asynchronous -- changes to sessions are put on a queue and the user gets the response immediately. means there's a chance that the cluster will be in an inconsistent state. use of sticky sessions means the consistency of the cluster doesn't really matter.
      • because java thread running isn't deterministic, in asynchronous mode the session updates may not be processed in the same order in which they were placed on the queue, so if your application depends on these being processed in the same order this is a risk
    • no need for the WAR farm deployer -- hyperic does this better
      • WAR farm deployer has been removed from tc Server
    • backup manager DOES know where the primary and backup nodes ARE for every session
      • i.e. it doesn't actually store all the sessions from all nodes, but it knows where to get the session it lost
    • backup manager scales much better than delta manager in both memory and network traffic
      • network traffic scales linearly with number of nodes
  • for availability on a small cluster, use the delta manager
  • if you're worried about scalability, go with the backup manager
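To make the scaling difference concrete, here is a back-of-the-envelope sketch. It assumes a deliberately simplified model in which every node generates the same amount of session data: with the delta manager each node sends to every other node, and with the backup manager each node sends to a single backup. The function name replication_links is made up for illustration.

```shell
#!/usr/bin/env bash
# Simplified model, not a measurement: count node-to-node replication
# paths for a cluster of a given size.
# Usage: replication_links delta|backup NODES
replication_links() {
  local mode=$1 nodes=$2
  case $mode in
    delta)  echo $(( nodes * (nodes - 1) )) ;;  # every node to every other node
    backup) echo "$nodes" ;;                     # every node to one backup node
    *)      echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

for n in 2 4 8 16; do
  echo "$n nodes: delta=$(replication_links delta "$n") backup=$(replication_links backup "$n")"
done
```

Under this model the two managers are close on a small cluster and diverge quickly as nodes are added, which matches the rule of thumb above.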
Hyperic HQ Installation
  • create an hqs user
  • hqs user owns the hyperic hq agent files
  • the agent itself runs as the tcserver user
  • os security considerations
    • agent doesn't need root privileges to access OS mechanics, start/stop processes, etc.
    • tc Server needs to be able to read WAR files uploaded via the agent
    • don't want tc Server runtime running as root
  • hyperic security considerations
    • don't want agent connecting as hqadmin super user
    • create a dedicated agent user
    • requires create, modify, and delete privileges for platform and platform services only
ERS httpd
  • ERS = Enterprise Ready Server
  • SpringSource's distribution of Apache httpd
  • install ERS as root
    • httpd processes run as nobody:nobody so this is fine
  • remove the test instance
  • create a new instance
  • module configuration
    • enable mod_proxy_balancer
    • enable mod_proxy_ajp
    • mod_proxy_ajp isn't quite as stable vs. mod_jk and mod_proxy_http
    • mentioned something about mod_http now having remote IP addresses available--need to ask about this
  • configure balancer in ers

<Proxy balancer://tc>
  BalancerMember http://ip.address.here:port route=tc01-uniqueID
  BalancerMember http://ip.address.here:port route=tc02-uniqueID
</Proxy>

ProxyPass /cluster-test balancer://tc/cluster-test stickysession=JSESSIONID|jsessionid
ProxyPassReverse /cluster-test balancer://tc/cluster-test

Debugging Clusters

  • need something in your apps that tells you which cluster node you're on
  • also need something to spit out the session ID so you can test that the sticky sessions are working
  • if your context path differs from your host name in tc Server, this may cause your cookies not to work since the hosts are different
    • can use the ProxyPassReverseCookiePath directive to rewrite the cookie path
    • easier: just have your context path match your host name
  • anything you want replicated in sessions has to be serializable
    • if your application can't support having everything in the session be serializable, terracotta will support non-serializable data in session replication
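One way to check sticky sessions from the command line is sketched below. It relies on Tomcat appending the jvmRoute to the session ID after a dot; everything else here (the function names, the URL, the request count) is a placeholder for illustration.

```shell
#!/usr/bin/env bash
# Print the jvmRoute part of a session ID of the form "<id>.<route>".
# If the ID contains no dot, the whole ID is printed unchanged.
route_of() {
  printf '%s\n' "${1##*.}"
}

# Hypothetical check: request the same URL several times with one cookie
# jar and print the route seen after each request. With working sticky
# sessions the route should not change between requests.
# Usage: check_sticky URL [COUNT]
check_sticky() {
  local url=$1 count=${2:-5} jar i sid
  jar=$(mktemp)
  for i in $(seq 1 "$count"); do
    curl -s -o /dev/null -b "$jar" -c "$jar" "$url"
    # In curl's cookie jar format, field 6 is the name and field 7 the value.
    sid=$(awk '$6 == "JSESSIONID" { print $7 }' "$jar")
    echo "request $i -> $(route_of "$sid")"
  done
  rm -f "$jar"
}

# Example (placeholder host):
# check_sticky http://your.proxy.host/cluster-test/ 5
```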

Gaining Visibility Into Enterprise Spring Applications with tc Server Spring Edition #s2gx

Steve Mayzak - SpringSource
  • tc Server -- enterprise version of Tomcat developed by SpringSource
  • Built on Tomcat
  • Survey in 2008: 68% of companies surveyed using Tomcat; most popular lightweight container
tc Server editions
  • developer edition -- can get it when you download STS
  • standard edition -- application provisioning, server administration, advanced diagnostics
  • spring edition -- spring + tomcat stack, spring application visibility, spring performance management
tc Server: Key Highlights
  • Developer efficiency
  • operational control
  • deployment flexibility
Spring Insight
  • Spring Insight -- knows about Spring and Grails applications, so can provide specific information about your apps as they're running
  • When you deploy a WAR to tc Server, Spring Insight gets involved and can get specific information from your apps
  • Spring Insight is currently intended for development use only
  • DEMO: huge amount of details come out of Spring Insight, from the HTTP request details down to the JDBC statements that were run and timings on the various actions during the request
  • can use Insight with other frameworks, etc. as well by adding annotations to your code
Operational Control
  • performance & sla management of spring apps
  • application provisioning and server administration
  • rich alert definition, workflows, and control actions
  • group availability & event dashboards
  • secure unidirectional agent communications
  • tc Server is a combination of Hyperic and Tomcat
  • monitoring is done via valves in Tomcat--isn't a fork or modified version of Tomcat
  • Hyperic monitors web servers, app servers, databases, caching, messaging, directories, virtualization, etc.
  • Hyperic is also a management tool--admin, provisioning, groups, metrics, alerts, events, access control, upgrades, etc.
    • hyperic is jmx based, runs as an agent on each server
Enterprise Capabilities in tc Server
  • Run multiple instances per install -- creates tc Server install updates
    • tc Server instances can point to a central set of binaries so upgrades are simpler
  • advanced scalability options
    • non-blocking (NIO) connectors
    • high-concurrency connection pool
  • Can create tc Server templates and create multiple instances from a base template
  • Advanced diagnostics--detects deadlocks and slow-running requests
Hyperic
  • monitoring and deployment
  • can see what apps are running on tc Server, number of sessions on the apps, up/down times, etc.
  • can deploy war files and change context path as you deploy
  • can schedule deployments and have the server auto-restart
  • can deploy from a remote machine (e.g. build server)
  • can remotely stop/start/restart instances
  • can set up scheduled restarts--e.g. make configuration changes during the day, have server auto-restart after hours
  • can schedule recurring restarts (e.g. restart daily at 1 am)
  • hyperic consumes jmx metrics and can show them in context, meaning it shows application metrics in the context of the overall server health (cpu, memory, etc.)
  • with the metrics coming in you can look at the specifics of your application and tune according to what your specific application is doing
    • has everything from cpu, memory, disk I/O metrics, to request metrics, down to spring-specific metrics
  • can add JMX instrumentation to your own classes without having to write your own MBeans--just annotate with @ManagedResource, @ManagedMetric, @ManagedOperation, @ManagedAttribute
    • this lets you get down to questions like "how many bookings per second can I handle in my travel booking application?" which lets you plan infrastructure and scalability in a very granular way
  • hyperic does metric baselining for your specific app so you'll get alerts based on the baseline metrics for your application
  • can enable/disable metrics and alerts across multiple servers from a single interface
  • can see metrics globally (across a cluster) or individually on each server
Deployment Flexibility
  • Lean server (10MB) for virtual environments
  • template-driven server instance creation
  • integrated experience with vmware environments
  • open, secure API for all operations
  • server-specific settings like ports, etc. have been taken out and put into a catalina properties file, so server.xml can be applicable to all servers
  • streamlines process of spinning up new server instances
  • shared binaries for upgrades
  • multiple server versions can be installed per machine
  • complete flexibility for various "sizes" of VMs

Monday, October 18, 2010

Fix for Eclipse Menu Issues in Ubuntu 10.10 Netbook Edition

This is but one of the many, many reasons I love free software.

In my previous post about Ubuntu 10.10 Netbook Edition I pointed out there are some issues with the menus in some applications, which in my case I ran into with Eclipse (actually SpringSource Tool Suite). I posted to the Ubuntu Forums and in no time a kind user pointed me to this bug report (which I somehow missed in my searches before posting to the forums), and also provided the fix for the issue which is simple enough. Basically you just have to tell Ubuntu not to use its own menus when you start the application, so from a terminal you do this to start the app:

UBUNTU_MENUPROXY= /home/mwoodward/sts/STS



Note that there is a space after the equals sign, which sets the variable to an empty value for just that command. Of course if you're having this issue with another application you'd just substitute the path to that application where I have /home/mwoodward/sts/STS. And you can obviously throw this into a launcher script so you don't have to remember to type this every time.
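A launcher along those lines might look like the sketch below. The function name without_menuproxy is made up, and the STS path is simply the one from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical launcher helper: run any command with UBUNTU_MENUPROXY set
# to an empty value, so the application draws its own menus.
without_menuproxy() {
  UBUNTU_MENUPROXY= "$@"
}

# Example usage (path taken from the command above):
# without_menuproxy /home/mwoodward/sts/STS
```

Because the application path is passed as an argument, the same helper works for any other program that has the menu problem.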
I was also having problems with the menus on UltraEdit so I was happy to see this fix resolved those issues as well. With a workaround for this issue I now give the Unity interface a grade of a nice solid B as opposed to the C I gave it in my previous post. There are still some quirks here and there but at least now I can use STS without any issue.

I'm always amazed, yet never surprised, at the excellent support available from the community of free software projects. With commercial software you pay for support, and in my experience it isn't very good as a rule, but with free software you have legions of users at the ready to help other users, and you can also pay for commercial support if you so need or desire.

What's not to love?

Sunday, October 17, 2010

Thoughts on Ubuntu 10.10 Netbook Edition

This isn't intended to be a full-blown review since there are plenty of those out there, but while I was installing Ubuntu 10.10 Netbook Edition on my Asus 1000HE this weekend, I thought I'd jot down my basic thoughts.

Installation

A+++. Absolutely amazing. 25 years later Windows still doesn't even come close to having such a fantastic installation process. Fast, clean, and flawless. Couldn't ask for anything better.


Boot Time
Notably faster than 10.04 on my netbook. My main laptop has a solid state drive and already boots up in about 7 seconds with 10.04, so I can't wait to see if 10.10 makes a difference on that machine.


Ubuntu Font
Ubuntu ships with a new default font called (not surprisingly) Ubuntu. It took some getting used to at first, but I like it! Very readable and easy on the eyes, not to mention Ubuntu-chic styling.


Unity Interface
I have to give Unity about a C for the time being. The idea of it is awesome, but there are a lot of idiosyncrasies and display issues.

For example, Eclipse flat-out doesn't work because the entire top menu bar in Eclipse doesn't appear. There are also many applications (UltraEdit being one example) on which there are too many menu items across the top for the Unity interface to handle correctly, so they spill over into the notification icons on the right-hand side of the screen. (See screenshots for some examples.)

It's not so bad that I'm going to uninstall 10.10, but I really hope they address the issues soon. If I had to use Eclipse on a regular basis on my netbook I'd simply have to move over to Crunchbang or Easy Peasy, or back to Ubuntu 10.04 which ran Eclipse just fine.

Unity also seems just a bit sluggish on my 1000HE. Not to the point where it's irritating to use, and it's a VAST improvement over some of the release candidates. I was using RC1 a few weeks ago and the entire machine was horrendously slow, so if you tried an RC and were turned off by the performance, rest assured they fixed that issue for the most part. It seems just a bit slower than 10.04 when doing certain things, but overall the performance is acceptable.

Software Center

Huge success here. Software Center got a major upgrade both visually and in terms of functionality. I still do most of my installs in a terminal, but Software Center is a real treat to use. Search and categorization are better, and when installing a .deb there's a very nice progress bar and clear notification of when the install is complete. It's a lot easier to use and clearer, particularly for less technical users.

Other Random Thoughts

  • 10.10 is a bit more locked down than 10.04 was in terms of customization. If you like doing a lot of customization to your desktop, menu items, etc. this probably isn't the distro for you. Adding a launcher for programs you install yourself, for example, simply isn't possible from what I've seen because you can't customize the launcher directly, and not all programs support the "Keep in launcher" option when you right-click in the launcher after starting the program from a terminal. This isn't really a criticism per se since if all you want to do is surf the web and read your email 10.10 is fantastic for that, but if you're more of a hacker with your machines, look elsewhere.
  • Ubuntu One got some really nice new features that I haven't had time to dig into just yet. Definitely notice fewer random "your login failed" type issues so they've clearly been focusing a lot of attention here.
  • The Ubuntu One music store is really awesome. I still have to jump over to Amazon.com's MP3 store for some things that Ubuntu One doesn't have, but overall it's really nice and incredibly usable.
  • The social features in 10.10 seem about the same to me as on 10.04, with maybe just a bit more polish. Note that if you rely on using Gwibber for interfacing with Facebook (which I don't), there is a bug that is preventing a lot of people (myself included) from being able to successfully add their Facebook accounts to Gwibber. Facebook *chat* works fine in Empathy, but Gwibber has issues.
Overall this is another great release from Ubuntu. If any of the annoyances I'm outlining here are dealbreakers, just stick with what you have for now. The new features are nice, and I like upgrading every six months to have the latest and greatest, but I'm not sure this is a "must have" upgrade. To be fair the .10 releases aren't really supposed to be "must have" since they don't have long term support (LTS) like the .04 releases, but there's enough here to warrant an upgrade if you're not put off by a couple of glitches here and there.

Will I upgrade my main laptop with Ubuntu 10.10 desktop? Still debating on that one. Short answer is "probably" since I doubt I'll run into the display issues with Eclipse, and my main laptop is in need of a scrubbing anyway. And if I can brag to all my friends that my laptop boots up in 5 seconds instead of 7, all the better.


Installing OpenConferenceWare on Ubuntu

I've been working on a soon-to-be-released app called "OpenCFSummitWare" (a.k.a. "Engage" but that name was taken on Google Code) for a while now, and it's the application we'll be using to manage proposals, scheduling, and attendee information for OpenCF Summit.

The inspiration for the application is the excellent OpenConferenceWare that was created for the Open Source Bridge conference. Obviously we want to run a CFML conference on a CFML app, but it saved me countless hours by having an extremely strong model on which to base our new application.

OpenConferenceWare is written in Ruby, and since I have zero experience with Ruby (at this point anyway) it took a bit of work to install the app and get it up and running. With a bit of help from the OpenConferenceWare Google Group and some tenacity I got it running, so I thought I'd share the step-by-step process here.

Note that this assumes you're starting with a clean system or at least one that hasn't ever had Ruby (and some of the other tools outlined below) installed on it. You can do this all in one shot, and probably in a much more logical order, so what you see here is the step-by-step I went through as I ran into missing items while trying to install OpenConferenceWare. (Note that I am *not* doing the optional MySQL database steps, so I think it uses SQLite by default.)
  1. Install git:
    sudo apt-get install git
  2. Clone the OpenConferenceWare git repo:
    git clone git://github.com/igal/openconferenceware.git
  3. Install Ruby:
    sudo apt-get install ruby
  4. Install Ruby Gems:
    sudo apt-get install rubygems1.8
  5. Install additional tools necessary for building and compiling:
    sudo apt-get install ruby1.8-dev build-essential gcc autoconf libtool
  6. Install the MySQL development libraries:
    sudo apt-get install libmysqlclient-dev
  7. Install XML/XSLT libraries:
    sudo apt-get install libxml2-dev libxslt1-dev
  8. Install SQLite3 libraries:
    sudo apt-get install libsqlite3-dev
  9. Install rake:
    sudo apt-get install rake
  10. Install the bundler gem:
    sudo gem install bundler
  11. Install the MySQL gem:
    sudo gem install mysql
  12. Create a symlink to the bundle command:
    sudo ln -s /var/lib/gems/1.8/bin/bundle /usr/local/bin/bundle
  13. Go into the openconferenceware project directory you cloned in step 2 above:
    cd openconferenceware
  14. Install the necessary libraries for the application:
    bundle install
  15. Update the styles:
    rake bridgepdx:styles
  16. Set bridgepdx to be the default theme.
    1. Navigate to openconferenceware/config
    2. Create a new file called theme.txt
    3. Add the following line to theme.txt:
      bridgepdx
    4. Save the file
  17. In the openconferenceware directory, create the databases:
    rake db:create:all
  18. Finish the database creation, populate database with sample data, and set the admin password:
    rake setup:sample
    (If you don't want any sample data, just do rake setup instead of rake setup:sample)
  19. Start up the app:
    ruby script/server
  20. Navigate to http://localhost:3000/admin and log in!
Pay special attention to step 16 if you're getting an error along the lines of "bridgepdx theme broken"--that simply means you haven't set a default theme in openconferenceware/config/theme.txt
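For anyone who would rather do the package installs in one shot, the apt-get steps (1 and 3 through 9) and the theme file from step 16 could be scripted roughly as follows. This is a sketch only: the function names and the DRY_RUN switch are made up here, and the package names are exactly the ones listed in the steps above, so they may differ on other Ubuntu releases.

```shell
#!/usr/bin/env bash
# Packages from steps 1 and 3-9 above, in the same order.
PACKAGES="git ruby rubygems1.8 ruby1.8-dev build-essential gcc autoconf libtool libmysqlclient-dev libxml2-dev libxslt1-dev libsqlite3-dev rake"

# Hypothetical wrapper: with DRY_RUN=1 (the default) the command is only
# printed; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_prereqs() {
  # Word splitting of $PACKAGES is intentional: one argument per package.
  run sudo apt-get install -y $PACKAGES
}

# Step 16: write the default theme name into config/theme.txt.
# $1 is the path to the openconferenceware checkout.
set_default_theme() {
  echo bridgepdx > "$1/config/theme.txt"
}

# Example usage:
# DRY_RUN=0 install_prereqs
# set_default_theme "$HOME/openconferenceware"
```

The gem installs, the bundle symlink, and the rake steps still need to be run in the order listed above.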

Final note: this setup is intended for running on a local development box, not for production! If you're interested in additional performance and security settings for production, make sure to check out the installation instructions on github.

Thanks to Igal and the entire team for such a great open source conference management app on which I could model the app we'll be using for OpenCF Summit!