There are plenty of open communities that are interested in education research. For instance, we've tried to track the effects of POSSE and Teaching Open Source for years, and I remain tremendously unsatisfied with what we've got. Sarah Allen from Railsbridge wants to see what effect their workshops have on participants. We think HFOSS participation makes students more employable... we think. Now, scientific research does not prove something to be true - if you don't understand that, you don't understand science - but it can give us more certainty, more insight... it is worth doing well.

It's also difficult to do rigorous research. And it's interesting to try doing rigorous research in an open community, because the typical assumption of researchers is that morally, ethically, you are closed by default -- we must give people privacy, we must hide and encrypt and obscure and strip their data of identifiers; their words and actions must never see the light of day. This is important; terrible things were done to people (especially in medical research) before these standards were set down, and every university that does research has an IRB (Institutional Review Board) to make sure that experiments are treating their subjects right.

Now, privacy is important in open communities too; we have closed-door conversations, we confide things to people we trust and ask them not to spread them. Identity is as well; people use pseudonyms sometimes, people prefer to be anonymous, to not identify their employer, to not give their location. This must be respected. But open communities believe that morally, ethically, you are open by default -- perhaps we have private people, but we have public data. Get it out there! Preserve as much identifying information as you can! Public domain and open license it! Go!

I can't speak for other members of open source communities, but it's always felt weird to me to participate in research on open communities where my data becomes part of a closed and private set, accessible only to researchers, then published and accessible only behind a proprietary paywall. I know these come from rules designed (ostensibly, at least in part) to protect me, but... I do not feel like that is "treating me [as a subject] right," because it runs counter to the culture I believe in. I want to be able to take my data - my own interview transcript, my words - and post it openly (possibly with some edits of my choosing) under my name. I've asked, and each time I've been told "gee, I wish we could do that, but... it might look like coercion, I'm not sure how my IRB would feel about that, I don't know..."

I can sympathize with that fear. As a first-year grad student, sorting out what is and isn't allowable in the blurry boundaries of research and open communities is something I'm still working through. It's not something that has an answer; these are all shifting, fuzzy shades of grey that vary with time and from person to person - and the thing is, you can be shut down. The academic world can be blocked to you with a single, irreversible 'no.'  It can be devastating. Graduation, tenure, employment... poof. Of course you wouldn't want to risk that.

But if you're not tied to the academic world? If you would like your ideas to be published in scholarly venues and will give them first right of refusal, but are willing to take the risk they won't be, and the work of finding other places to publish if the "normal" places won't take your things under your conditions? Hm. As my buddy and fellow first-year grad student Seb said shortly before our first semesters started: "Worst case is that we 'drop out' back into industry." And you know what? We're pretty happy there too.

So, with that in mind, some thoughts on trying to clarify this weird grey haze - I do not know if these are true or not, and would welcome both agreement and dissent, but they are what I believe to be true right now. Also, I am not a lawyer. I'm a first-year grad student who has yet to be a first author, a PI, or go through the IRB process without significant hand-holding. There is a good chance that I may be wrong. Do not take these statements as fact.

Now then.

  1. You only need to do IRB if you want to publish in a scholarly setting. If I were just interviewing someone for fun, say on my blog or something, and they gave their recorded consent to having it released online under an open license, etc., I wouldn't need to get that cleared. And I could publish that wherever I wanted personally - online, in a book, etc. - according to the license of the content.
  2. You can use open-licensed and public domain datasets in scholarly publications. This includes prior data you have personally collected.
  3. IRB folks generally don't like it if it seems like you are trying to "get around" the process. So if you're, say, interviewing people and releasing transcripts openly, then going "oh hey there just happens to be public data on this, let me analyze and publish," eyebrows may be raised. It's hard to tell how a board will see things - they're people, after all, and will have varying opinions. But by and large, just as with any other human being, don't try to weasel things through. Be up-front and honest about what you're trying to do.
  4. The worst thing that can happen is that you'd have to re-collect the data. Possibly with a different interviewee set (which would be annoying, and likely time-consuming and expensive). But on the up side, you can think of your initial data collection as validation of the instrument, or preliminary results, or just plain practice - research skills, like any other sort of skill, improve dramatically with use. It's really, really difficult to be banned from everything forever, unless you... I dunno, deliberately murder people in the course of your experiment or something.
  5. Reusing stuff is ok if you cite it. For instance, I've been reading Creativity by Csikszentmihalyi, and the book's first appendix is the interview protocol used. It's short, nicely done, and I would love to reuse it. Again, I'm confident I can go off and interview friends this way (informally, just to get some practice in) and write stuff up however they give permission for, and will of course cite Csikszentmihalyi as a source if I do that -- but if this crosses the boundary into "hm, this might be publishable in a scholarly realm," my impression is that it's good to reuse protocols with citations so long as you're not just replicating the experiment (for instance, you're transferring it to a different discipline, population, etc.), because that's validating the instrument and not reinventing the wheel.

That's what I've got so far. Let's see where operating on these assumptions takes me - I want to treat my research subjects right, both by their standards and by the "conventional" standards of research ethics (read: IRB). I believe that these two things can usually be made compatible. (This is assuming that my subjects are well-informed and educated people of the age of consent and not in extenuating circumstances where I might hold power over them... research on students, especially my students, will be harder -- research on minors, research on folks in tenuous financial situations... that gets interesting, but I'll cross that bridge when I get to it.) But when there is a conflict, I want to defer to the wishes of my subjects, because in the end, they are the ones I want to be accountable to.