R is an open source implementation of S, a statistical programming language, with a massive FOSS community supporting it. I'm writing this post sitting in my first R programming class of the semester, listening to William Cleveland extol the virtues of the language and its history, explaining how it's open source and community-supported and how nobody quite knows how many users there are because it's open... I smile a little.

I'm learning R for a few reasons:

I need to take a statistics class, and this seemed vastly more palatable than some theoretical formulas-on-the-blackboard thing or anything that'd make me use SAS, which has the Interface from Hell (according to my classmates who have been forced to use it). I'm hoping this will be the graduate-school version of ThinkStats (which Sebastian got to take - slight jealousy on my part here, but that's what I get for going to Olin earlier).

R is one of those strange beasts: an open source project that's gotten some damn good traction inside academia. In fact, I'd say R lives in large part (although definitely not entirely) within the academic world. Part of that is because it's just a great tool for doing things that academics need to do -- but I am curious to look inside the R community and see how it works and who it's made of, because perhaps there are some subtle adaptations that come with a large portion of your contributors also being based at colleges and universities. For instance, R's 2012 GSoC application says the project does "...not use IRC. It does not fit with the general culture of statisticians and related scientists. We tend to work on packages in small groups of 1-3 people and use other communications channels, i.e. email, telephone." And its 2011 GSoC report is published in a peer-reviewed journal. I'm blinking.

I like programming. I've been missing it for far too long. R is a programming language (and an interpreted one, too -- my impatient self's favorite kind). I like getting down and dirty and manipulating my data and being able to pop the hood as much (or as little) as I need to. For instance, I can write extensions in C if I need to. And I am used to being able to navigate an open source community, although this one (no IRC channel?) might be an interesting challenge. So I've subscribed to the R-help and R-devel mailing lists to listen to the traffic that goes by and get a feel for things.

Interesting things I'm learning about R now by poking around the intarwebz while the historical lecture continues and other students install R on their platforms:

  • R is slow for large datasets, partly because its design goes back to S, which was developed in the 70s and 80s (R itself appeared in the early 90s), when datasets were much smaller. R users get around this with things like the D&R (divide and recombine) strategy, and CS researchers are trying to speed up R, but not being optimized for giant things is part of the nature of the way the beast was built.
  • More on D&R: you can either split up the data across multiple processors (and then stitch the results back together), or parallelize the method by making something like a computational assembly line.
  • Even though R's 2012 GSoC application says the project does not use IRC, the Finnish user group uses #r-project on IRCnet, but it's "mainly targeted at Finnish community."
  • Three hours from me, there's a Chicago RUG (R Users Group), but it's got no upcoming meetings. I've joined it anyway; they seem fairly healthy at first glance and their last few meetups have been on interesting topics.
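
To make the first flavor of D&R (split the data, work on the pieces, stitch the results back together) a bit more concrete, here's a minimal sketch. It's in Python rather than R purely for illustration, the function names (divide, summarize, recombine, dr_mean) are made up for this example, and it is not how any actual R D&R tooling works. It assumes a non-empty list of numbers and uses the mean as the statistic, since a mean can be rebuilt exactly from per-chunk sums and counts.

```python
def divide(data, n_subsets):
    """Split data into n_subsets contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_subsets)
    chunks, start = [], 0
    for i in range(n_subsets):
        # The first `extra` chunks each take one leftover element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def summarize(chunk):
    """Per-chunk work: reduce a chunk to a small (sum, count) summary."""
    return (sum(chunk), len(chunk))


def recombine(summaries):
    """Stitch the per-chunk summaries back into one overall mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count  # assumes at least one data point overall


def dr_mean(data, n_subsets=4, map_fn=map):
    """Divide, apply summarize to each chunk, then recombine.

    map_fn is a hypothetical hook: the built-in map runs the chunks one
    after another, but any function with the same calling convention
    could be passed in to spread the chunks across workers.
    """
    summaries = list(map_fn(summarize, divide(data, n_subsets)))
    return recombine(summaries)


if __name__ == "__main__":
    print(dr_mean(list(range(1, 101))))
```

The chunks never need to see each other, which is the whole point: passing something like the map method of a concurrent.futures.ProcessPoolExecutor as map_fn would hand each chunk to a separate process, and only the tiny (sum, count) pairs have to travel back.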

Hah! Dr. Cleveland emailed us some questions about our experience with R, along with a couple of diagnostic questions to check what we have and haven't used. I'm going to reply saying "as of the end of class today, I hadn't used R at all, but after <length-of-time>, the answers to your questions are..."

Oh, this is fun.