Saturday, April 18, 2015

Research Minutiae

I have a confession: while I enjoy learning other people's ideas about major issues in psychology, I am also interested in the mundane details of psychological research. You know, the minutiae that most people overlook or take for granted when they do research. In this way, I am sort of like a less cool, less funny Jerry Seinfeld (if he happened to be into research). The nice part is that when I learn a useful tip or trick, I can often incorporate it into my work immediately. The research process is slow, so we rarely get an immediate payoff.

I believe that every researcher collects little tips and tricks along the way, but these rarely become the focus of conversation. Still, I think these small details are important, helpful, and interesting. So I put together a list of five mundane details that I care about; I call them my research minutiae. They come from my personal experience, so take them with a grain of salt (and a cup of coffee; research should always involve coffee). But perhaps some of them will be helpful to you.

1) I cannot tell you how often I review an article in which the authors report an interaction using the letter 'x' instead of the multiplication symbol '×'. Please keep in mind that x ≠ ×.
Incorrect: We observed a Condition x Gender interaction.
Correct: We observed a Condition × Gender interaction.
I don't think this is being picky. First, one is correct and one is incorrect. Second, keep in mind that our audience is other humans whom we must convince that our research is meaningful. These small details signal to the reader that you (the author) attend to details and that you took your time preparing your manuscript. Conversely, if your results are filled with errors, the reader may wonder how careful you were with other aspects of the research design or statistical analyses.
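If you draft manuscripts as plain text, a small script can catch this before a reviewer does. The Python sketch below is only an illustration: it assumes interactions are written as capitalized factor names joined by a stand-alone lowercase x (as in the incorrect example above), so it will miss other spellings and could misfire on an unusual phrase. The helper names `fix_interaction_symbol` and `find_letter_x_interactions` are hypothetical.

```python
import re

# The multiplication sign is U+00D7; the letter x is U+0078. They only look alike.
TIMES = "\u00d7"

# Assumption: an interaction is written as capitalized factor names joined by
# a stand-alone lowercase " x ", e.g. "Condition x Gender x Age".
# The lookahead leaves the next factor unconsumed so chained terms all match.
_LETTER_X_INTERACTION = re.compile(r"\b([A-Z][A-Za-z]*) x (?=[A-Z])")


def find_letter_x_interactions(text: str) -> list[str]:
    """List the spots that would be changed, so you can check them by eye."""
    return [m.group(0).rstrip() for m in _LETTER_X_INTERACTION.finditer(text)]


def fix_interaction_symbol(text: str) -> str:
    """Replace the letter x with the multiplication sign in interaction terms."""
    return _LETTER_X_INTERACTION.sub(lambda m: f"{m.group(1)} {TIMES} ", text)
```

Running `fix_interaction_symbol` on the incorrect sentence above yields the correct one. Reviewing the output of `find_letter_x_interactions` first is safer than applying the fix blindly.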

2) When you develop a line of research, you often end up measuring the same things in different studies, so label your variables consistently. This lets you more easily re-use syntax/script you have already written, and it makes it easy to find common variables across your datasets. It also has the added benefit of nudging you to think of your research less as a collection of individual studies and more as a body of research. Beyond using the same variable name whenever you refer to the same thing, it helps to settle on one naming style. Here are some common ways to label variables:

Labels with spaces (example: feedback condition): This may work fine for labeling columns in Excel, but most stats programs won't be able to read variable names with spaces. Don't do it.
ant.format (example: feedback.condition): This is recommended by Google's R Style Guide. It is pretty readable and, from what I can tell, fairly common.
gooseneck_format (example: feedback_condition): Also very common. I don't find it visually appealing; mean_of_trait_aggression, for example, looks ugly to me. But that is just my opinion.
camelBack (example: feedbackCondition): This is what I tend to use. It saves a keystroke and a character at every word break, which is nice for writing concise script. However, if your variable names are long, this format can be tough to read. Also, some people capitalize the first letter of the first word and some don't. This matters in case-sensitive programs like R, where feedbackCondition and FeedbackCondition are two different variables.

Whichever style you choose, first be consistent across your own datasets. Second, if you run a lab or have close collaborators, encourage everybody to use the same variable names and naming style. It is a small thing that helps with communication.
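Consistency is easier when renaming is mechanical. Here is a minimal Python sketch that converts a variable name into each of the styles above, assuming names only use spaces, dots, underscores, or camelBack capitalization to separate words. The function names are hypothetical, and acronyms such as "IQScore" are not split correctly.

```python
import re


def split_words(name: str) -> list[str]:
    """Break a variable name into lowercase words.

    Treats spaces, dots, and underscores as separators, and also splits at a
    lowercase-or-digit to uppercase transition (camelBack / CamelCase).
    Acronyms such as "IQScore" are not split correctly.
    """
    # Insert a space before an uppercase letter that follows a lowercase
    # letter or digit, so "feedbackCondition" becomes "feedback Condition".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [word.lower() for word in re.split(r"[\s._]+", spaced) if word]


def to_dot_separated(name: str) -> str:
    """ant.format style: feedback.condition"""
    return ".".join(split_words(name))


def to_underscore_separated(name: str) -> str:
    """gooseneck_format style: feedback_condition"""
    return "_".join(split_words(name))


def to_camel_back(name: str, upper_first: bool = False) -> str:
    """camelBack style: feedbackCondition (FeedbackCondition if upper_first)."""
    words = split_words(name)
    if not words:
        return ""
    first = words[0].capitalize() if upper_first else words[0]
    return first + "".join(word.capitalize() for word in words[1:])
```

For example, `{old: to_camel_back(old) for old in column_names}` (where `column_names` is whatever list of columns your dataset has) builds a renaming map you could apply to every dataset in a line of research.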

3) Come up with a good set of demographic questions and try to use it in all of your research. I always collect demographic information, but I used to come up with a new set of questions for each study. Recently I had reason to examine data from across my different studies (I was comparing the demographics of my in-person participants and my online participants), and my inconsistencies quickly became frustrating. For example, I sometimes gave participants the option to self-identify as Asian-American or as of Middle Eastern descent, and other times there was no option to self-identify as Middle Eastern. Sometimes I forced people to identify as male or female; other times I gave an NA/other option. I also used different response formats for education level, relationship status, and employment status. This made my cross-study comparisons less than optimal, and they took much longer than they needed to. To be clear, I am talking not only about collecting the same demographic information but also about using the exact same wording and the exact same response formats. This makes apples-to-apples comparisons across all of your studies a cinch.

If you run a lab, take the time to come up with a good set of demographic items (this is a good task for an undergraduate who wants to learn about question wording and response formats). At this stage, err on the side of being inclusive; answering these questions adds only seconds to a participant's time, so get the information while you can. Then try to use those questions in all of your studies. Not only will this allow you to re-use script/syntax (see the previous point), but it will also make your cross-study comparisons easier. Also keep in mind that you are not beholden to these questions. If some of them turn out to be problematic, or you need to add or drop questions for a specific study, do what you need to do.

There are many reasons why the entire field will never adopt a common protocol for collecting demographic information, but it seems feasible for a small group of researchers in a specific area to do so. I am not aware of that happening, but I imagine it would be awesome. Future meta-analysts would thank you.
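One way to make a shared set of demographic items concrete is to define it once in code and reuse that definition in every study's survey-building or data-cleaning script. The Python sketch below assumes that kind of setup; the item wording and response options are hypothetical placeholders, not recommended questions, and `Item`, `DEMOGRAPHICS`, and `validate` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One demographic question with fixed wording and response options."""
    key: str                  # variable name used in every dataset
    wording: str              # exact question text shown to participants
    options: tuple[str, ...]  # exact response options, in display order


# Hypothetical placeholder items; a lab would write and pilot its own.
DEMOGRAPHICS: tuple[Item, ...] = (
    Item("gender", "What is your gender?",
         ("Male", "Female", "Other / prefer to self-describe", "Prefer not to say")),
    Item("education", "What is the highest level of education you have completed?",
         ("Less than high school", "High school", "Some college",
          "Bachelor's degree", "Graduate degree")),
)


def validate(responses: dict[str, str],
             items: tuple[Item, ...] = DEMOGRAPHICS) -> list[str]:
    """Return the keys whose response is missing or not one of the fixed options."""
    problems = []
    for item in items:
        # A missing key gives None, which is never a valid option.
        if responses.get(item.key) not in item.options:
            problems.append(item.key)
    return problems
```

A study that needs an extra question can extend the shared set, e.g. `STUDY_ITEMS = DEMOGRAPHICS + (extra_item,)` where `extra_item` is a study-specific `Item`, without touching the common items, so their wording and response formats stay identical across studies.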

4) Save your raw data in a separate file and try never to open it again. In fact, if you ever looked at the folders where I keep my files, you would find a datafile labeled "RAW" alongside the datafile that I actually use for analyses. For example, when survey data are entered into a .csv file, I save that file as Study 1 RAW.csv. Then I save the file again as Study 1.csv. I run my analyses only on Study 1.csv and NEVER on Study 1 RAW.csv. When computing variables, reverse coding items, and so on, it is too easy to make mistakes, so it is always nice to be able to go back to the RAW datafile and start from scratch if you need to.
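If you script your data handling, the RAW/working-copy habit can be built in. This Python sketch copies `Study 1 RAW.csv` to `Study 1.csv` (the file names from the example above) and then, as an extra safeguard that goes beyond the workflow described here, removes write permission from the RAW file so an analysis script cannot accidentally overwrite it. `make_working_copy` is a hypothetical helper name.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(raw: Path, working: Path, overwrite: bool = False) -> Path:
    """Copy the RAW datafile to a separate working file, then lock the RAW file.

    Recoding and computed variables belong in `working`; `raw` is only read,
    so you can always rebuild `working` from it and start from scratch.
    """
    raw, working = Path(raw), Path(working)
    if working.resolve() == raw.resolve():
        raise ValueError("the working file must differ from the RAW file")
    if working.exists() and not overwrite:
        raise FileExistsError(f"{working} exists; pass overwrite=True to rebuild it")
    # copyfile copies contents only (no permission bits), so a newly created
    # working file is writable even though the RAW file is read-only.
    shutil.copyfile(raw, working)
    # Clear the write bits (user, group, others) on the RAW file.
    mode = raw.stat().st_mode
    raw.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return working
```

Usage might look like `make_working_copy(Path("Study 1 RAW.csv"), Path("Study 1.csv"))`. Calling it again with `overwrite=True` rebuilds the working file from the untouched RAW data, which is the "start from scratch" escape hatch in script form.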

5) Annotate your script/syntax. People will tell you this is important because, in the age of open science, other people need to be able to open your files and understand what you did. Another researcher should be able to reproduce your analyses exactly just by looking at your files.

That is all well and good. I love open science, and I think those are valid points. But I will make a more pragmatic argument: annotate your script/syntax so that you will know what you did 3 months or 3 years from now. It may seem like a waste of time. After all, while you are doing your analyses, you know what your variable names mean and why you are running each analysis. It is hard to imagine not knowing what your variables mean or why you did a specific analysis. But trust me, your memory for these things decays quickly. When I look back at my old files, my memory for what I was thinking is surprisingly bad. I am thankful that I am a good note taker. While you are doing your analyses, make detailed notes about what you are doing and why. Your future self will thank you.
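As an illustration of this kind of annotation, here is a short, heavily commented Python sketch of one analysis step: reverse coding items before computing a scale mean. The scale, its 1 to 7 range, and the item names `agg1` through `agg5` are hypothetical; the point is that the comments record why each step happens, not just what it does.

```python
# Hypothetical study: a 5-item trait aggression scale, every item scored 1-7.
# WHY: agg3 and agg5 are worded in the opposite direction (agreeing means LESS
# aggression), so they must be reverse-coded before averaging with the rest.

SCALE_MIN, SCALE_MAX = 1, 7        # response range shared by every item
REVERSED_ITEMS = ("agg3", "agg5")  # hypothetical names of the items to flip


def reverse_code(score: int, lo: int = SCALE_MIN, hi: int = SCALE_MAX) -> int:
    """Flip a score on a lo..hi scale: 1 -> 7, 2 -> 6, ..., 7 -> 1."""
    if not lo <= score <= hi:
        # Out-of-range values usually mean a data-entry error; stop rather
        # than silently produce a nonsense reversed score.
        raise ValueError(f"score {score} outside {lo}-{hi}")
    # lo + hi - score maps the scale onto itself in reverse order.
    return lo + hi - score


def scale_mean(row: dict[str, int], items: tuple[str, ...]) -> float:
    """Mean of the listed items, reverse-coding those in REVERSED_ITEMS first."""
    scores = [reverse_code(row[i]) if i in REVERSED_ITEMS else row[i] for i in items]
    return sum(scores) / len(scores)
```

Three years later, the comment explaining *why* agg3 and agg5 are flipped is the part you will be most grateful for; the arithmetic itself is easy to reconstruct.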

So, these are a few of the research minutiae I have picked up over the few years I have been doing research. My hope is that at least one of them helps at least one other person. Most of these are things I now do without thinking about them, so if I notice myself doing other things, I may add to this list in future posts. I also enjoy hearing about other people's research minutiae. If you have any you would like to share, please leave a comment or contact me.