
How to Analyze Your Data and Take Advantage of Machine Learning in Your Application #s2gx

Christian Schalk - Google
Google's New Cloud Technologies
  • google storage for developers 
    • api compatible with amazon s3
  • prediction api (machine learning)
  • bigquery
Google Storage
  • store your data in google's cloud 
    • any format, any amount, any time
  • you control access to your data 
    • private, shared, public
  • access via google apis or third party tools/libraries
  • sample use cases 
    • static content hosting, e.g. static html, images, music, video
    • backup and recovery
    • sharing
    • data storage for applications 
      • e.g. used as storage backend for android, appengine, cloud based apps
    • storage for computation 
      • bigquery, prediction api
Google Storage Benefits
  • high performance and scalability 
    • backed by google infrastructure
  • strong security and privacy 
    • control access to your data
  • easy to use 
    • get started fast with google and third party tools
Google Storage Technical Details
  • restful api 
    • get, put, post, head, delete
    • resources identified by uri
    • compatible with s3
  • buckets -- flat containers
  • objects 
    • any type
    • size: 100 gb / object
  • access control for google accounts 
    • for individuals and groups
  • two ways to authenticate requests 
    • sign request using access keys
    • ???
Performance and Scalability
  • objects of any type and 100GB/object
  • unlimited numbers of objects, 1000s of buckets
  • all data replicated to multiple US data centers
  • leveraging google's worldwide network for data delivery
  • only you can use bucket names with your domain names
  • read-your-writes data consistency
  • range get
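
As a rough illustration of "resources identified by uri" and "range get": a range GET is an ordinary HTTP GET with a standard Range header, so only part of an object is returned. This is a minimal sketch, not the official client. The host name, bucket, and object name are assumptions (the talk did not name an endpoint), and the request is only prepared, never sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: path-style addressing on this host; the talk names no endpoint.
STORAGE_HOST = "https://storage.googleapis.com"


def object_uri(bucket: str, name: str) -> str:
    """Build the URI that identifies one object inside a flat bucket."""
    return f"{STORAGE_HOST}/{quote(bucket)}/{quote(name)}"


def range_get(bucket: str, name: str, first: int, last: int) -> Request:
    """Prepare (but do not send) a GET for bytes first..last, inclusive."""
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    req = Request(object_uri(bucket, name), method="GET")
    req.add_header("Range", f"bytes={first}-{last}")
    return req


# Hypothetical bucket and object: ask for the first kilobyte only.
req = range_get("mybucket", "videos/talk.mp4", 0, 1023)
print(req.get_method(), req.full_url)
print(req.get_header("Range"))
```

A real request would also need one of the authentication methods listed under the technical details, which this sketch leaves out.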
Security and Privacy Features
  • key-based authentication
  • authenticated downloads from a browser

Getting Started with Google Storage
  • go to http://code.google.com for basic info
  • http://code.google.com/apis/storage (currently in preview mode) 
    • getting started guide, docs, etc.
    • can sign up for an account
  • command line tool available -- gsutil -- low-level access from the command line, scripting
  • google storage manager -- web-based tool for managing google storage

Google Storage Usage Within Google & Early Adopters
  • google bigquery
  • google prediction api
  • google.org -- imagery
  • google patents
  • panoramio
  • picnik
  • vmware
  • US Navy
  • theguardian
  • socialwok
  • xylabs
  • etc.
Pricing
  • storage: $0.17/GB/month
  • also costs for up/downloads
  • similar pricing to amazon s3
  • preview in US 
  • non-US preview available on case-by-case basis

Google Prediction API
  • google's sophisticated machine learning technology
  • available as an on-demand restful http web service
  • provide a bit of text and "train" the algorithm in the service to predict outcomes based on patterns 
  • simple example: language detection 
    • provide series of examples of english, spanish, french, etc. and train the prediction api to recognize the language
  • endless number of applications 
    • customer sentiment
    • transaction risk
    • etc
Prediction API Examples
  • predict and respond to emails in an automated way
Using the Prediction API
  • three step process 
    • upload training data to google storage
    • build a model from your data
    • make new predictions
Training
  • POST prediction/v1.1/training?data=mybucket...
  • can check when training is finished; the service also gives an estimate of accuracy

Predict
  • apply the trained model to make predictions on new data
  • returns json data
  • includes scores indicating confidence of prediction
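
To make the train-then-predict idea concrete, here is a toy language detector in the spirit of the "simple example" above. It is a local stand-in and does not call the Prediction API: the class name, the character-bigram naive Bayes technique, and the sample sentences are all my own assumptions, chosen only to show labeled text going in and a label plus confidence-like scores coming out.

```python
import math
from collections import Counter, defaultdict


def ngrams(text: str, n: int = 2):
    """Split text into overlapping character n-grams, padded with spaces."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class TinyTextClassifier:
    """Toy naive Bayes over character bigrams (hypothetical stand-in)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (label, text) pairs."""
        for label, text in examples:
            grams = ngrams(text)
            self.counts[label].update(grams)
            self.vocab.update(grams)

    def predict(self, text):
        """Return (best_label, scores) where scores sum to 1."""
        log_scores = {}
        for label, counts in self.counts.items():
            # Add-one smoothing so unseen bigrams do not zero out a label.
            total = sum(counts.values()) + len(self.vocab)
            log_scores[label] = sum(
                math.log((counts[g] + 1) / total) for g in ngrams(text)
            )
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp.values())
        scores = {k: v / norm for k, v in exp.items()}
        return max(scores, key=scores.get), scores


examples = [
    ("english", "the quick brown fox jumps over the lazy dog"),
    ("english", "where is the nearest train station"),
    ("spanish", "el rápido zorro marrón salta sobre el perro perezoso"),
    ("spanish", "dónde está la estación de tren más cercana"),
    ("french", "le rapide renard brun saute par-dessus le chien paresseux"),
    ("french", "où se trouve la gare la plus proche"),
]

clf = TinyTextClassifier()
clf.train(examples)
label, scores = clf.predict("the dog is at the station")
print(label, scores)
```

The hosted service picks its own learning technique and is trained on far more data, so treat this only as a picture of the inputs and outputs.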

Prediction API Capabilities
  • data 
    • input features: numeric or unstructured text
    • output: up to hundreds of discrete categories
  • Training 
    • many machine learning techniques
Prediction Demo
  • cuisine predictor
  • spreadsheet of type of food (e.g. mexican, italian, french) and food description as training data
  • upload spreadsheet to google storage
  • kick off training process, then can check to see if it's done
  • pretty accurate predictions even on a limited training dataset
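
The demo's training spreadsheet can be pictured as a two-column CSV. This sketch assumes one example per row with the category in the first column and the description in the second; that layout and the sample rows are assumptions, not the file used in the talk.

```python
import csv
import io

# Hypothetical rows standing in for the demo spreadsheet.
rows = [
    ("mexican", "corn tortillas filled with grilled pork, salsa and lime"),
    ("italian", "fresh pasta with tomato, basil and parmesan"),
    ("french", "slow-braised chicken in red wine with mushrooms"),
]


def to_training_csv(examples) -> str:
    """Serialize (label, description) pairs, quoting every field."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, description in examples:
        writer.writerow([label, description])
    return buf.getvalue()


print(to_training_csv(rows))
```

Quoting every field keeps the commas inside the descriptions from being read as column breaks.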
Google BigQuery
  • also resides on top of google storage
  • can have large amounts of data that you can quickly analyze using sql-like language
  • fast, simple to use
Use Cases
  • interactive tools
  • spam
  • trends detection
  • web dashboards
  • network optimization
Key Capabilities
  • scalable to billions of rows
  • fast--response in seconds
  • simple--queries in sql
  • webservice based--rest, json
Using BigQuery
  • upload to google storage
  • call bigquery service to import raw data into bigquery table
  • perform sql queries on table
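
The import-then-query flow can be tried locally to get a feel for the kind of aggregate queries involved. This sketch uses Python's built-in sqlite3 as a stand-in, not BigQuery itself; the table name, columns, and rows are made up, and BigQuery's SQL dialect may differ in details.

```python
import sqlite3

# Stand-in only: sqlite3 runs in local memory and is not BigQuery.
conn = sqlite3.connect(":memory:")

# "Import raw data into a table" (hypothetical schema and rows).
conn.execute("CREATE TABLE requests (country TEXT, latency_ms INTEGER)")
raw_rows = [("US", 120), ("US", 80), ("DE", 200), ("DE", 100), ("JP", 90)]
conn.executemany("INSERT INTO requests VALUES (?, ?)", raw_rows)

# "Perform SQL queries on the table": a dashboard-style aggregate.
query = """
    SELECT country, COUNT(*) AS hits, AVG(latency_ms) AS avg_latency
    FROM requests
    GROUP BY country
    ORDER BY hits DESC, country
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```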
Security and Privacy
  • google accounts
  • oauth
  • https

Tools
  • bigquery shell utility available -- just type sql commands and get responses back
  • can tie in a google spreadsheet and point it to a bigquery table
