Science is founded on a lovely relationship between theory and data. Theory predicts patterns. Data tell us which theories work. Together they make sense of our world. How do we teach students to handle data properly? Six general areas come to mind.
The first and most obvious essential is not to cheat in any way. If the pH meter reads 7.10, that is what you write down. It would be wrong to write down even 7.14. If the mouse moves right, you cannot say it moved left. If there is a band on the gel at a certain position, record it. And on and on. Report what you measure. This is the easiest essential to follow, and the worst to break. Do not agonize too much about teaching this one, even considering recent events, because it is the easiest. Would it be too much to say that those who break this one are different from you and me?
The second essential is similar. Do not let others convince you to change your data. If someone else, even in your own lab, suggests that you did not see what you saw, or measure what you measured, do not change it. The ethical treatment of collaborators is a huge topic of its own, to be treated in a separate entry, but no one should ever try to convince you to change your data. This does not mean you never redo measurements. If, for example, the pH meter was off last week, redo them. But don’t redo them because the result didn’t support your pet theory.
The third essential is to understand your own biases and how they impact data collection. Even if you are trying your best, you might inadvertently bias your data in the direction you think they should go. We are all vulnerable to this. This is not fraud like point one. So, whenever possible, conduct your study blind. This does not mean you shut your eyes. It means the person scoring the data does not know which treatment each subject received, and so cannot know which outcome would favor the hypothesis. If some male birds were injected with extra testosterone and then observed to see if it made them more aggressive, the person observing the behavior should not know which birds were which, for example.
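One way to set up blind scoring is to have someone who will not score the behavior assign treatments and label each bird with an opaque code. The sketch below is only an illustration under stated assumptions: the function name `make_blind_key`, the code format, and the treatment labels are hypothetical and do not come from any particular lab protocol.

```python
import random

def make_blind_key(subject_ids, treatments, seed=None):
    """Assign treatments at random and give each subject an opaque code.

    Returns (key, scoring_sheet). The key maps code -> (subject, treatment)
    and stays with someone who does not score. The scoring sheet lists
    codes only, so the observer cannot tell which bird got which treatment.
    """
    rng = random.Random(seed)
    n = len(subject_ids)
    # Balanced assignment: cycle through the treatments, then shuffle.
    assigned = [treatments[i % len(treatments)] for i in range(n)]
    rng.shuffle(assigned)
    # Shuffle the codes too, so a code's order reveals nothing.
    codes = [f"B{i:03d}" for i in range(1, n + 1)]
    rng.shuffle(codes)
    key = {c: (s, t) for c, s, t in zip(codes, subject_ids, assigned)}
    scoring_sheet = sorted(key)  # codes only, no treatment information
    return key, scoring_sheet

# Hypothetical example: six birds, two treatments.
birds = ["bird1", "bird2", "bird3", "bird4", "bird5", "bird6"]
key, sheet = make_blind_key(birds, ["testosterone", "control"], seed=1)
```

The observer receives only `sheet`. The `key` is opened after all behavior has been scored.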
The fourth essential is to analyze your data properly. Use appropriate statistics. Understand random and fixed effects. Use parametric statistics only when the assumptions are met. But this is only the beginning. In our genomic analyses there are all kinds of complexities to worry about. The patterns you report are trustworthy only if the statistics behind them are correct.
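When the assumptions of a parametric test are in doubt, one option among many is a permutation test. It makes no assumption of normality, but it does assume that observations are exchangeable between groups under the null hypothesis. The sketch below is a minimal illustration, not a recommendation for any particular data set. The function name `permutation_test` and its parameters are hypothetical.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Repeatedly shuffles the pooled values, splits them into groups of the
    original sizes, and counts how often the shuffled difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never 0.
    return (count + 1) / (n_perm + 1)

# Hypothetical scores for two groups.
p_value = permutation_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105],
                           n_perm=2000)
```

Fixing the seed makes the result repeatable, which also helps when others try to replicate the analysis.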
The fifth essential is that your data and their analysis should actually show what you say they show. It is surprising how often people get this one wrong. Consider what your data show and do not discuss things they did not show. You might have wanted to study the other topic, but you did not.
The last point is to make your data and your analyses public. This should be entirely possible for data and is increasingly possible for analyses. Someone else should truly be able to replicate what you did with your data and come to the same result.
I’m sure there are lots of other important cautions on data. But these six categories seem to me to cover the most crucial areas. Don’t cheat. Don’t listen to others who want you to cheat. Be aware of inadvertent bias. Use the right statistics. Don’t overextend your results. Make your data and your analyses public. And of course, have fun, for you will be on the path of discovering new truths!