How many wasps are on this nest? What are their unique identifying marks? How many eggs, larvae, and pupae are in the nest? How many times does a given wasp dominate another? These are the questions that generated the numbers for my earliest graduate-school research projects. I worried about every single one of those numbers. Is there a wasp hiding on the back of the nest? Have I recognized the painted marks correctly? Did I miss a young larva, calling it an egg (actually unlikely, because eggs are pearly white and young larvae, though tiny, turn pinkish)? Was that really a domination, or just a wasp climbing on another as she flew off the nest? I worried about my data and did my best.
We learn what blind means in science. If there is an experiment, we make sure that when we score the resulting behavior we do not know which treatment a case received. There are lots of ways to do this, but the point is that we do not fully trust even ourselves to be unbiased, because we could inadvertently favor a hypothesis. How to avoid bias is worth spending time on.
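One common way to blind scoring is to relabel cases with opaque codes and keep the key away from the scorer. Here is a minimal sketch; the nest IDs and treatment names are hypothetical, not from any real study.

```python
import random

# Hypothetical nest IDs and their treatments (assumed names for illustration).
treatments = {"nest01": "heated", "nest02": "control",
              "nest03": "heated", "nest04": "control"}

# Shuffle before assigning codes so the code order carries no
# information about which treatment a nest received.
nests = list(treatments)
random.shuffle(nests)
key = {nest: f"X{i:03d}" for i, nest in enumerate(nests)}

# The scorer receives only the coded labels; the key dictionary stays
# with someone who is not doing the scoring, to be matched up afterward.
coded_labels = sorted(key.values())
```

The scorer works from `coded_labels` alone, so whatever they record cannot be tilted, even unconsciously, toward the hypothesis.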
We keep careful data notebooks. In my lab these are still mostly on paper, with the pages numbered and dated. But increasingly, data are collected directly onto loggers of various sorts, and there too we preserve the details of data collection.
Once we have collected our data, we examine them for obvious errors. We graph them and look hard at the outliers: are they real, or the result of a data-entry or other mistake? Often we enter the data twice and compare the two versions as a check on that step. Of course, if the outliers are real, we keep them.
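Both checks are simple to automate. This sketch compares a hypothetical dataset entered twice and flags counts far from the mean for a second look; the nest names, counts, and the 1.5-standard-deviation threshold are all illustrative assumptions, not a prescription.

```python
# Two independent entries of the same (made-up) larva counts.
entry1 = [("nest01", 12), ("nest02", 9), ("nest03", 31), ("nest04", 11)]
entry2 = [("nest01", 12), ("nest02", 6), ("nest03", 31), ("nest04", 11)]

# Double-entry check: any row that differs between the two entries is a
# transcription error to resolve against the original data sheet.
mismatches = [(a, b) for a, b in zip(entry1, entry2) if a != b]

# Outlier check: flag counts far from the mean (a crude threshold,
# chosen for this tiny example) for a second look. Real outliers are
# kept; entry errors are fixed.
counts = [c for _, c in entry1]
mean = sum(counts) / len(counts)
sd = (sum((c - mean) ** 2 for c in counts) / len(counts)) ** 0.5
flagged = [(nest, c) for nest, c in entry1 if abs(c - mean) > 1.5 * sd]
```

Here the double-entry check catches the nest02 discrepancy (9 versus 6), and the outlier check flags nest03 for re-examination, whether or not it turns out to be real.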
But what about our collaborators? What if they have not been as careful as we have? How can we tell that we have a sloppy or fraudulent collaborator? What checks should we do? These questions are timely because of the Jonathan Pruitt case, in which collaborators who trusted his data, and trusted him, are now retracting papers. I will not summarize the case here, but here are links to what Kate Laskowski, Dan Bolnick, and Science have said. Links to the Dynamic Ecology blog and my own previous post are perhaps also warranted.
I know you want an answer. Perhaps a great R package to run your collaborators through. Or a tutorial from Elizabeth Bik on how to recognize fraud in images of biological samples or gels. Maybe you want a personality test, or to learn of the traits common to those who cheat.
I have to disappoint you. I have racked my brain for what we might do, but everything I thought of ran into two stumbling blocks. One was wondering what else the collaborators in the Pruitt case could have done. The other was thinking of my own collaborators and whether I would behave any differently in the future.
My conclusion was that, no, we cannot constantly check our collaborators' data. No, we cannot ask them to show us their raw data. No, we cannot identify a flawed personality type that cheats. We are stuck. These techniques might work occasionally, but mostly they will not. They did not break open the Pruitt case; that, apparently, took an internal whistleblower with inside knowledge of the problem (see previous links).
Or are we stuck when it comes to our collaborators? I think there are only two possible solutions. The first is to stop collaborating: collect all your data yourself, and you will be sure of their accuracy. But what would that do to science? How greatly it would slow progress!
What is the other solution? It is easy, but flawed. But it is the best option, by far. It will not avoid any of the pain the Pruitt collaborators are currently suffering. But it is best for science. It is to simply trust your collaborators.
Odds are they are trustworthy. In most cases they are either replicating something you are doing, but somewhere else, as in the big ecology experiments where plant communities are studied in similar ways all over the globe, or providing expertise in something you are not skilled in and have no hope of ever learning on top of everything else you do. In neither case can you check their data in any meaningful way.
And no, trustworthiness does not increase with the number of beers you have shared with a collaborator. Cooperation, new ideas, and scientific fun may increase, but not necessarily good data.
For some people, some collaborators will provide flawed data. Should we limit the potential impact of this possibility by not collaborating too much with any one person? Here again, I would say no. I have spent much of my career collaborating mostly with one person, along with many others in addition, and I know of other very productive, long-term, completely trustworthy collaborations.
So I suggest a simple personal solution. Trust your collaborators. But what does that mean for science when someone turns out to be fraudulent? Where is the protection there? Here again I have an answer. It is that no important idea should be validated by work from just one lab or one set of collaborators.
We should think hard about this and keep track of which truths have come from just one group. If they are close to your field, spend some time redoing those experiments, or doing similar ones, so that science progresses on a firm foundation. And of course, remember to be absolutely trustworthy yourself. I hope someone will let us know what to believe now about spider social personalities.