What's the New York Times Doing with Hadoop?

Interesting yet very brief interview on what the New York Times is doing with Hadoop. It's always fascinating to me to read about the tools and approaches people use at a scale most of us never have to worry about. Also interesting to me is the MapReduce functionality in Hadoop, since it's the same idea used by CouchDB views, and I'm absolutely loving the bit of work I've been doing with CouchDB.
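
For anyone who hasn't run into MapReduce before, here's a minimal single-process sketch of the idea in Python, using word counting as a stand-in problem. The function names (map_words, reduce_counts, map_reduce) are made up for illustration and are not Hadoop's or CouchDB's actual API; a real cluster would spread the map and reduce steps across many machines.

```python
from collections import defaultdict

def map_words(doc):
    """Map step: emit a (key, value) pair for every word in one document."""
    for word in doc.lower().split():
        yield word, 1

def reduce_counts(key, values):
    """Reduce step: collapse all values emitted for one key into a total."""
    return sum(values)

def map_reduce(docs, mapper, reducer):
    """Run mapper over every doc, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical sample input, purely for illustration.
docs = ["hadoop at the times", "couchdb views and hadoop"]
counts = map_reduce(docs, map_words, reduce_counts)
```

The grouping in the middle is the part a framework handles for you; you only supply the map and reduce functions, which is roughly what writing a CouchDB view feels like.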

Comments

cfwhisperer said…
Matt, this is all very fascinating to me. I got interested at the O'Reilly Velocity Conference in San Jose. I have to discipline myself and dedicate time to completing the build-out of our LA lab, but I am intrigued to at least get rolling with CouchDB. We are also looking closely at aiCache, which is a web acceleration product. Thanks for the pointer.
Matthew Woodward said…
Thanks DrQz--great info.
DrQz said…
Welcome. BTW, the functional part is not all that difficult either. Just think of EVERYTHING (including a '+' operator) as a function or procedure with args as inputs and returns as outputs. The main difference from procedural languages, like C or Java, is that the output of one function can be the input to another function ... a LISP-ism.

Here's an example in Mathematica:

In[1]:= Times[Plus[a, b], c]

produces:

Out[1]= (a + b) c

which is what a mathemagician would've written in the first place (the multiply being implicit in math). If 'a' and 'b' are given numerical values, you'd get a single number as the output. MMA can do either numbers or symbols.

From a programming standpoint, you see that 'a' and 'b' are args into the function Plus, and its output (together with the input 'c') is an input into the function Times.

This example is pedestrian, but it generalizes into some very cool and powerful constructs that can be written with relatively little code. For example:

1. A Quine (code that reproduces itself):

Print[#1, FromCharacterCode[91], #1, FromCharacterCode[93]]&[Print[#1, FromCharacterCode[91], #1, FromCharacterCode[93]]&]

2. Sequence generator using recursion:

Nest[Join[#, ReplacePart[#, Length[#] -> Last[#] + 1]] &, {0, 1}, 5]

See http://www.research.att.com/~njas/sequences/A007814 (bottom of page).
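
For readers without Mathematica handy, here is a rough Python analogue of the two ideas above: feeding one function's output into another, and the Nest-based sequence generator. The helper names (nest, step, plus, times) and the sample values 2, 3, 4 are assumptions for illustration only; Python handles numbers here, not symbols, so it only covers the numeric case.

```python
from functools import reduce

def plus(x, y):
    return x + y

def times(x, y):
    return x * y

def nest(f, x, n):
    """Apply f to x repeatedly, n times (a stand-in for Mathematica's Nest)."""
    return reduce(lambda acc, _: f(acc), range(n), x)

def step(seq):
    """Join seq with a copy of itself whose last element is incremented by 1."""
    return seq + seq[:-1] + [seq[-1] + 1]

# Times[Plus[a, b], c] with a=2, b=3, c=4: plus's output is times's input.
composed = times(plus(2, 3), 4)

# Same shape as Nest[..., {0, 1}, 5]: start from [0, 1], apply step 5 times.
ruler = nest(step, [0, 1], 5)
```

Each pass doubles the list, so five passes over a two-element start give 64 terms, beginning 0, 1, 0, 2, 0, 1, 0, 3.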
DrQz said…
With respect to how Hadoop/MR might be getting applied at NYT, here's a video of an ACM talk (http://www.sfbayacm.org/?p=88) that I attended recently, about how Google.com is actually using MR in their AdSense group (i.e., where the action is): http://fora.tv/2009/08/12/Josh_Herbach_PLANET_MapReduce_and_Tree_Learning

The MR slide appears @ 00:37:37 approx. This presentation also provides a quick overview of the whole schmeer related to Biz Analytics and DM. Hard to find, otherwise.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.