 The last post talked about making sure undergrads get the big picture of their questions. This is essential, but it is not the end. All too often analysis is left for the end and there is no exploring. Ideally, students learn their analytical techniques right from the start. One of my grad students, Tyler Larsen, is doing a great job of this.

Here is something he sent Cara Jefferson who is working on a project that is an extension of his. The details will be different for everyone, so this might not mean a lot to others, but there are things to get out of it.

First, Tyler encourages graphic exploration of the data.

Second, he does not give everything in detail to Cara but instead gives his scripts and encourages her to think about how her data differ from his and modify the analyses. It is too easy to turn off your brain if you are given everything. But he makes it clear he is available for help.

Third, Tyler suggests ways to explore the data, things she might look for. Cara is fairly new on the project so this is a really good idea. But he doesn’t tell her exactly how to do it.

Fourth, Tyler’s questions lead to the important questions of the research project.

Fifth, Tyler makes it clear that science has variation and we have to worry when it is caused by things like date of experiment and helps her see how to look for that.

Sixth, Tyler gives Cara the R code necessary to start the work. The sooner students get comfortable with R, the better, and getting code for one’s project is a good way to start. He gave her the R code in an R format, but I appended it to this post.

What you see below is all the stuff Tyler thought hard about and gave Cara. It is an wonderful example of a careful grad student mentor working with an excellent undergrad.

Cara,

It is valuable when you’ve collected a bunch of data to take the time to play with it and look for patterns.  Obviously we will ultimately do statistics to test any hypotheses we ultimately want to test, but in the meantime it is worthwhile to graph things every which way to Sunday and see what pops out.

1. In Excel, take a look at the growth curves themselves.  Plot at least a few of them as scatter or line graphs.  Do they take the shape we talked about, with a flat beginning, an exponential rise, a plateau, and then a slow decline?  Take note of any you see that don’t look like this and try to think about why.
2. Look at the final OD values reached by different strains.  This is the yield.  It isn’t as informative as the growth rate (it’s very sensitive to the specific growth conditions), but it may differ between strains.
3. Plot the data from the negative control wells.  The media only wells should not grow at all (ie just a flat line at around 0).  If they do the media was contaminated and that plate is likely not usable. Ideally the Dicty negative controls (the wells with Dicty but not Burk) should not grow either but past experience tells me this is wishful thinking and many often do show some growth, presumably from KP that survived the antibiotic treatment.  Take note of how long it took these wells to grow – if it’s later than the Burk strains (as it often is), it probably isn’t a concern.  If it’s earlier it is something we need to look more carefully at, lest we accidentally measure the wrong strain’s growth rate.
4. Run the first part of the script to calculate growth rates.  Note that growth curves with aberrant shapes are hard to calculate growth rate from.  The script will notice most of these and refuse to calculate growth rates for them, but it does miss some of them.  If you get a number that seems really weird, go back and plot that well’s growth curve.  Chances are it’ll be screwy and you’ll know it’s not to be trusted.
5. In Excel or R, convert all of the growth rates into doubling times.  Doubling time = ln(2)/µ, and is usually expressed in growth per hour.  Remember our time points are 15 minutes apart.
6. Now compare growth rates or doubling times between strains.  Do this first just by eye on the spreadsheet, and then try to produce a graph to answer these questions (one graph may be able to address more than one question at a time).
1. How do the ancestor strains’ growth rates compare?  (No Dicty present)
2. How do evolved strains compare to ancestor strains?  (No Dicty present)  Are they faster or slower?  Does it vary by strain?  Why?
3. How does adding Dicty change growth rates of ancestor strains?  Is it faster or slower?  Does it vary by strain?  Why?
4. How does adding Dicty change growth rates of evolved strains?  Are evolved Burk strains more or less sensitive to the effects of adding Dicty?  Does it vary by strain?  Why?
1. How consistent are your results between days?  Do you get the same answers?  If not, why not?

If you get stuck on any of the initial part (setting up data, calculating growth rates, etc), come find me and I’ll help you.  For the interpretation part I encourage you to take a sincere whack at it yourself before you seek help.  We’ll go over it once you’ve had the chance to give it some thought.

And he gave her the R code to use:

##Here is a script for analyzing growth rates from the Tecan

##This version of the script was modified on 03/21/18 improve usability and incorporate some statistics

##############################################################################################################

#Install packages – this should only be necessary once per computer.

#install.packages(“ggplot2”)

install.packages(“reshape2”)

#install.packages(“dplyr”)

#install.packages(“devtools”)

#library(devtools)

#install_github(“dcangst/fitr”)

rm(list=ls()) # clear memory and start fresh

library(ggplot2) #For plotting

library(reshape2)  #For converting data between wide and long forms

library(dplyr)  #For grouping and manipulating data

library(fitr)  #For the actual growth rate analysis

###############################################################################################################

#Set some strings for file naming

todaysdate<-“042018”

filename<-“combineddata”

##Import data.  Data should have columns Date, Well, Bstrain, Bline, Dstrain, Dline, Dtreatment, and Notes (case sensitive)

backup<-simpledata #make a backup

#convert the data to long form.  The id argument defines which columns are categories identifying the

#data points.  Everything else is the data.  Variable.name and value.name arguments name the two new columns in the resultant data

meltdata<-melt(simpledata, id=c(“Date”,”Well”,”Bstrain”,”Bline”,”Dstrain”,”Dline”,”Dtreatment”,”Notes”),variable.name=”Time”,value.name=”OD600″)

meltdata\$Time<-as.numeric(meltdata\$Time) #Convert time from a factor into a number

meltdata\$OD600<-as.numeric(meltdata\$OD600) #Convert OD600 from a character into a number

meltdata\$Bstrain[which(meltdata\$Bstrain==”None”)]<-NA

#Save data

write.csv(meltdata, paste(mainpath,filename,”_”,todaysdate,”_”,”melted.csv”,sep=””))

#Read in the cleaned up data

mydata <- meltdata

mydata<-mydata[which(mydata\$Bstrain != “NA”),]  #Remove blanks

mydata\$ID<-paste(mydata\$Date,mydata\$Well,mydata\$Bstrain,mydata\$Bline,mydata\$Dstrain,mydata\$Dline,mydata\$Dtreatment, sep=”_”) #add ID column that is a string of all of the columns(except Notes).  This is intended to give each line a unique name.

#Read the help for the fitr script

#?d_gcfit

#Run the script, finding mu max values for each, and saving everything into an object called ‘rates’

rates <- d_gcfit(data=mydata,    #The growth curve data in long form

w_size=6,       #How many data points the script should use for each line fitting

od_name=”OD600″,#The column name containing the absorbance data (y axis)

time_name=”Time”,#The column name containing the time data (x axis)

trafo=”log”,

logBase=2,

RsqCutoff=0.95, #This is the cutoff of how accurate the regression line must be without being rejected

growthCheck=”none”,

parallel=FALSE,

progress=”text”,

.inform=TRUE)

#rates #view the rates object

fitsforexport<-rates\$bestfits #save the line fits (but not the rest of the script’s outputs) into a subobject called fitsforexport

#Copy all of the identifying columns and merge them into the fits

mydata2<-mydata[which(mydata\$Time == 1),]

mergedata<-merge(mydata2,fitsforexport,by.x=c(“ID”),by.y=c(“ID”))

write.csv(mergedata, paste(mainpath,filename,”_”,todaysdate,”_”,”fit.csv”,sep=””))

#Remove unnecessary columns

mergedata<-mergedata[ , !(names(mergedata) %in% drops)]

#Remove lines that the script could not find clean fits for (ie rsq>.95) or lines in which there was no growth

mergedata<-mergedata[which(mergedata\$mumax != “NA”),]

#Add a column, Bevolved, which specifies if a Burk strain is evolved or not

mergedata\$Bevolved<-FALSE

mergedata\$Bevolved[which(mergedata\$Bline %in% c(“E1″,”E2″,”E3”))]<-TRUE

#Add a column, Devolved, which specifies if a Dicty strain is evolved or not

mergedata\$Devolved<-NA

mergedata\$Devolved[which(mergedata\$Dline %in% c(“E1″,”E2″,”E3″))]<-TRUE

mergedata\$Devolved[which(mergedata\$Dline ==”Anc”)]<-FALSE

#add a new column, Matched, which indicates whether a B and D strain belong to one another

mergedata\$Matched<-FALSE

mergedata\$Matched[which(mergedata\$Bstrain==”Bf” & mergedata\$Dstrain==”QS6″)]<-TRUE

mergedata\$Matched[which(mergedata\$Bstrain==”BD66″ & mergedata\$Dstrain==”QS9″)]<-TRUE

mergedata\$Matched[which(mergedata\$Bstrain==”soil99″ & mergedata\$Dstrain==”QS18″)]<-TRUE

mergedata\$Matched[which(mergedata\$Bstrain==”bQS70″ & mergedata\$Dstrain==”QS70″)]<-TRUE

mergedata\$Matched[which(mergedata\$Bstrain==”bQS159″ & mergedata\$Dstrain==”QS159″)]<-TRUE

mergedata\$Matched[which(mergedata\$Bstrain==”bQS161″ & mergedata\$Dstrain==”QS161″)]<-TRUE

mergedata\$Matched[which(mergedata\$Bstrain==”bQS11″ & mergedata\$Dstrain==”QS11″)]<-TRUE

mergedata\$Matched[which(mergedata\$Bstrain==”bQS21″ & mergedata\$Dstrain==”QS21″)]<-TRUE

mergedata\$Matched[which(mergedata\$Bstrain==”bQS69″ & mergedata\$Dstrain==”QS69″)]<-TRUE

mergedata\$Matched[is.na(mergedata\$Dstrain)]<-NA

#add a new column, Dictystatus, which indicates whether a Dicty strain is evolved or ancestral (ie collapses E1, E2, and E3)

mergedata\$Dictystatus<-“None”

mergedata\$Dictystatus[which(mergedata\$Devolved ==FALSE)]<-“Anc”

mergedata\$Dictystatus[which(mergedata\$Devolved ==TRUE)]<-“Evolved”

#Save final dataset

write.csv(mergedata, paste(mainpath,filename,”_”,todaysdate,”_”,”annotated.csv”,sep=””))

finaldata<-mergedata

###############################################################################################################

####Plotting for runs including Dicty##########################################################################

###############################################################################################################

matcheddata<-finaldata[which(finaldata\$Matched %in% c(TRUE, NA)),]

matcheddata<-matcheddata[which(matcheddata\$Bstrain!=”KP”),]

#matcheddata<-matcheddata[which(matcheddata\$Date != “102217”),] #remove the 102217 run, which was HK and probably shouldn’t have gone in there in the first place

matcheddata<-matcheddata[which(!(matcheddata\$Date %in% c(“102217″,”30818″,”22818″,”30918”))),]

#Set the order of the points correctly

matcheddata\$Bstrain<-factor(matcheddata\$Bstrain, levels=c(“Bf”,”BD66″,”soil99″,”bQS70″,”bQS159″,”bQS161″,”bQS11″,”bQS21″,”bQS69″))

matcheddata\$Dline[is.na(matcheddata\$Dline)]<-“None”

matcheddata\$Dline<-factor(matcheddata\$Dline, levels=c(“None”,”Anc”,”E1″,”E2″,”E3″))

matcheddata\$Dictystatus<-factor(matcheddata\$Dictystatus, levels=c(“None”,”Anc”,”Evolved”))

matcheddata\$Doublingtime<-(0.69314718056/matcheddata\$mumax)/4

#Boxplot for growth rate

plot1<-ggplot(data=matcheddata, aes(x=Dline, y=mumax, fill=Dictystatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=”both”) +

theme(legend.position=’none’) +

labs(y=”Specific growth rate (mu)”,x=””) +

ggtitle(“Growth rate by line”)

plot1

#Boxplot for growth rate (collapsed by strain)

plot2<-ggplot(data=matcheddata, aes(x=Dictystatus, y=mumax, fill=Dictystatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=”both”) +

theme(legend.position=’none’) +

labs(y=”Specific growth rate (mu)”,x=””) +

ggtitle(“Growth rate by strain”)

plot2

#Boxplot for growth rate (collapsed by clade)

plot3<-ggplot(data=matcheddata, aes(x=Dictystatus, y=mumax, fill=Dictystatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

theme(legend.position=’none’) +

labs(y=”Specific growth rate (mu)”,x=””) +

plot3

##################

#Do some math to change it into fold changes

#Calculate as fold change

grouped1<-group_by(matcheddata, Dstrain, Dline, Bstrain)

stats1<-summarise(grouped1, N=length(mumax),Average=mean(mumax),StDev=sd(mumax))

stats1\$ID<-paste(stats1\$Bstrain,stats1\$Dstrain,stats1\$Dline, sep=”_”)

matcheddata\$ID<-paste(matcheddata\$Bstrain,”NA”,”None”, sep=”_”)

mergedata<-merge(matcheddata,stats1[,c(5,7)],by.x=c(“ID”),by.y=c(“ID”))

mergedata\$Foldchange<-mergedata\$mumax/mergedata\$Average

#Boxplot for growth rate (scaled)

plot4<-ggplot(data=mergedata, aes(x=Dline, y=Foldchange, fill=Dictystatus)) +

geom_boxplot() +

geom_hline(yintercept=1) +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=”both”) +

theme(legend.position=’none’) +

labs(y=”Fold change of specific growth rate (mu)”, x=””) +

ggtitle(“Growth rate (scaled) by line”)

plot4

#Boxplot for growth rate (scaled) (collapsed by strain)

plot5<-ggplot(data=mergedata, aes(x=Dictystatus, y=Foldchange, fill=Dictystatus)) +

geom_boxplot() +

geom_hline(yintercept=1) +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=”both”) +

theme(legend.position=’none’) +

labs(y=”Fold change of specific growth rate (mu)”, x=””) +

ggtitle(“Growth rate (scaled) by strain”)

plot5

#Boxplot for growth rate (scaled) (collapsed by clade)

plot6<-ggplot(data=mergedata, aes(x=Dictystatus, y=Foldchange, fill=Dictystatus)) +

geom_boxplot() +

geom_hline(yintercept=1) +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

theme(legend.position=’none’) +

labs(y=”Fold change of specific growth rate (mu)”, x=””) +

plot6

#############

############################################################################

#Doin some stats

########

#STATS STUFF:

#Use non parametric test to see if Anc differ from evolved

wilcox.test(mergedata\$Foldchange[which(mergedata\$Dictystatus==”Anc”)], mergedata\$Foldchange[which(mergedata\$Dictystatus==”Evolved”)])

#They do.

summary(teststats)

#Status (whether D is evolved or not), clade, and status*clade are all quite significant

#Try again with fold change

summary(teststats)

#Status (whether D is evolved or not) and status*clade are quite significant

#################################

#Boxplot for doubling time

plot4<-ggplot(data=matcheddata, aes(x=Dline, y=Doublingtime, fill=Dictystatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=’both’) +

ylim(0,15) +

theme(legend.position=’none’) +

labs(y=”Doubling time (hours)”) +

ggtitle(“Doubling time by line”)

plot4

#Boxplot for doubling time (collapsed by strain)

plot5<-ggplot(data=matcheddata, aes(x=Dictystatus, y=Doublingtime, fill=Dictystatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=’both’) +

ylim(0,15) +

theme(legend.position=’none’) +

labs(y=”Doubling time (hours)”) +

ggtitle(“Doubling time by strain”)

plot5

#Boxplot for doubling time (collapsed by clade)

plot6<-ggplot(data=matcheddata, aes(x=Dictystatus, y=Doublingtime, fill=Dictystatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#999999″,”#F8666D”,”#00BFC4″)) +

geom_point() +

ylim(0,15) +

theme(legend.position=’none’) +

labs(y=”Doubling time (hours)”) +

plot6

###############################################################################################################

####Plotting for runs without Dicty##########################################################################

###############################################################################################################

#Remove lines with Dicty present

matcheddata2<-finaldata[which(is.na(finaldata\$Dstrain)),]

matcheddata2<-matcheddata2[which(matcheddata2\$Date %in% c(“30918″,”31618″,”30818”)),]

#Set the order of the points correctly

matcheddata2\$Bstrain<-factor(matcheddata2\$Bstrain, levels=c(“KP”,”Bf”,”BD66″,”soil99″,”bQS70″,”bQS159″,”bQS161″,”bQS11″,”bQS21″,”bQS69″))

matcheddata2\$Bline[is.na(matcheddata2\$Bline)]<-“Anc”

matcheddata2\$Doublingtime<-(0.69314718056/matcheddata2\$mumax)/4

matcheddata2\$Bstatus<-“None”

matcheddata2\$Bstatus[which(matcheddata2\$Bevolved ==FALSE)]<-“Anc”

matcheddata2\$Bstatus[which(matcheddata2\$Bevolved ==TRUE)]<-“Evolved”

#Boxplot for growth rate

plot7<-ggplot(data=matcheddata2, aes(x=Bline, y=mumax, fill=Bline)) +

geom_boxplot() +

scale_fill_manual(values=c(“#F8666D”,”#00BFC4″,”#00BFC4″,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=’both’) +

theme(legend.position=’none’) +

labs(y=”Specific growth rate (mu)”, x=””) +

ggtitle(“Growth rate (Dictyostelium absent) by line”)

plot7

#Boxplot for growth rate (collapsed by strain)

plot8<-ggplot(data=matcheddata2, aes(x=Bstatus, y=mumax, fill=Bstatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#F8666D”,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=’both’) +

theme(legend.position=’none’) +

labs(y=”Specific growth rate (mu)”, x=””) +

ggtitle(“Growth rate (Dictyostelium absent) by strain”)

plot8

#Boxplot for growth rate (collapsed by clade)

plot9<-ggplot(data=matcheddata2, aes(x=Bstatus, y=mumax, fill=Bstatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#F8666D”,”#00BFC4″)) +

geom_point() +

theme(legend.position=’none’) +

labs(y=”Specific growth rate (mu)”, x=””) +

ggtitle(“Growth rate (Dictyostelium absent) by clade”)

plot9

#Boxplot for doubling time

plot10<-ggplot(data=matcheddata2, aes(x=Bline, y=Doublingtime, fill=Bline)) +

geom_boxplot() +

scale_fill_manual(values=c(“#F8666D”,”#00BFC4″,”#00BFC4″,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=’both’) +

theme(legend.position=’none’) +

labs(y=”Doubling time (hours)”, x=””) +

ggtitle(“Doubling time (Dictyostelium absent) by line”)

plot10

#Boxplot for doubling time (collapsed by strain)

plot11<-ggplot(data=matcheddata2, aes(x=Bstatus, y=Doublingtime, fill=Bstatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#F8666D”,”#00BFC4″)) +

geom_point() +

facet_grid(.~Bstrain, switch=’both’) +

theme(legend.position=’none’) +

labs(y=”Doubling time (hours)”, x=””) +

ggtitle(“Doubling time (Dictyostelium absent) by strain”)

plot11

#Boxplot for doubling time (collapsed by clade)

plot12<-ggplot(data=matcheddata2, aes(x=Bstatus, y=Doublingtime, fill=Bstatus)) +

geom_boxplot() +

scale_fill_manual(values=c(“#F8666D”,”#00BFC4″)) +

geom_point() +

theme(legend.position=’none’) +

labs(y=”Doubling time (hours)”, x=””) +

ggtitle(“Doubling time (Dictyostelium absent) by clade”)

plot12 