Women and Wikipedia

Scientists considering a hike, not their Wikipedia pages.

Perhaps you have seen that Donna Strickland, Nobel laureate in physics, did not have a Wikipedia page until just now. She was deemed by the moderator not to be worthy back in March 2018. See the discussion on the Wikipedia talk page, or in any number of publications. It has been fixed, but does a woman have to get a Nobel prize before the overly strict moderators value her?

I wrote about this before, here. In that post you can see that a scientist named Debra Brock was denied a Wikipedia page though all kinds of athletes were approved, even when the comparison is limited to those with the last name Brock.

Other female scientists whose Wikipedia pages I have been involved in creating have either been taken down or challenged. What do people mean by notability? I think being a professor or active scientist should be enough. Shouldn’t we be more concerned about accuracy and completeness? Is having a Wikipedia page at all such a big honor?

So I had to look at my own page. It is almost nothing. There was more once but someone stripped it all away. Since neither you nor anyone close to you can work on your own page, it is easier to lose information than to gain it.

I know there are people who make it a project to write Wikipedia pages for women. I did at one time, but got discouraged that so many were taken down. I did not want to subject my students to that level of discouragement. Higher-ups at Wikipedia said they were powerless to fix this.

Maybe now, given the extreme case of a woman being told she gets a Nobel prize but not a Wikipedia page, the trolls that take women down on Wikipedia will hesitate. I am not optimistic, though.

Posted in Awards and prizes, Gender bias, Wikipedia

Wissenschaftskolleg: It’s not just time to write, it is connections with fabulous novelists, thoughtful former politicians, historians, and scientists

The welcome with Hassan Salem, Peter Hammerstein, David Queller, Janis Antonovics, and Mandy Gibson

Ever since I got to the Wissenschaftskolleg in Berlin, I have been trying to understand  what I can offer it and what it can offer me. This is the script: I come here for 10 months, take no more than a combined one month off during this period, eat 3 lunches, one dinner, and one brunch with my fellow Wiko members, go to at least one talk a week, and for that and whatever project suits my fancy I am paid, given a lovely apartment and all the library and computer help I could dream of. Oh, and also 3 weeks of intensive German and continuing classes in the language should I want them. Did I say Berlin? Berlin! A city to love!

Why does Berlin do this? You can see here who we are this year. What will we offer to Berlin? Or what will we not offer in terms of fellowship, scholarship, academic advances, and personal freedom for those of us that come from places not so free. We are given some guidance from our leader, Rector Barbara Stollberg-Rilinger, and Thorsten Wilhelmy, here.  Actually, there is a large staff and they all help.

Barbara told us that although we all wrote detailed proposals to come here, we do not have to do that work. We are free to explore. There will be no evaluation, no pressure, other than what you put on yourself. “We offer you time,” she said. It is a respite with an inspiring intellectual environment, perhaps broader than we have at home.

Barbara went on to tell us what she wanted to hear in our Tuesday talks and this has had me really thinking. Since we are so diverse, a research PowerPoint will not work. Instead, she wants to know: 1) what counts as a problem in our discipline; 2) what counts as a sound argument in our discipline; and 3) how we know when we are right. I have been thinking about these challenges for the last month. They actually caused me to write my first piece here on something entirely new for me (creativity).

Berlin!

Thorsten gave me other things to think about. He focused on the Wiko paradoxes in a way that brought us all back to earth. He started with the paradise paradox and reminded us that while Wiko is amazing, it is not a paradise. After all, we are not teaching students, so if everything were like Wiko, we would die out in a generation. If I leave here feeling that I loved it, but love home more, that will not be a failure. Another of his paradoxes was the Humboldt paradox, by which he meant that solitude has its negative side, so Wiko welcomes partners, spouses, and children. And it encourages us to talk to others. I won’t tell you all the paradoxes, but one I have noticed having a large effect is that we are all playing an away game. None of the fellows is permanent (well, actually there are some permanent fellows, those lucky souls), so we are making friends faster than we might otherwise. The last paradox I will reveal is the productivity paradox. We are not required to be productive. Is this the way to get the most counter-intuitive imagination to blossom?

Our home in Villa Walther

Why Berlin cares is something I keep coming back to. Well, we had the Empfang, a huge welcome with more than 300 people, including the mayor of Berlin. There were people from all sorts of places, including several academics I already knew. This place has put Berlin on the academic map, perhaps more than any other. What a wonderful thing that Berlin lionizes independent thinking in the way another city might celebrate its sports team.

Now I am here for another 9 months. I hope to write a book that will help people help our troubled natural world. I hope to become more creative. I hope to take away things that transform my existence in my home institution. I hope to come to terms with the city my father had to flee in 1937. Maybe we can all take a bit of the Wiko mentality away with us and around the globe.

 

 

Posted in Managing an academic career, New ideas, Sabbatical, Social interactions

How to read a scientific paper

Undergraduates Rintsen, Rory, Clarissa, and Cara are learning to read papers

Do you remember when you read your first scientific paper? For me it was hard. Some parts I did not understand. Other parts were interesting. The structure seemed odd, with a narrative that did not flow. I read it from beginning to end a couple of times. I felt like the places I did not understand were my fault, not the writer’s. I then went to a class discussion of the paper and was amazed at all the flaws others could point out in the paper, flaws I did not see at all until they were pointed out. How could I become the person that could find the flaws and also see the strengths?

Now I know how to read a scientific paper and will share some tips that should help everyone. One thing about these tips is that they can also be very useful to keep in mind when you are writing a paper. A scientific paper need not be about science. It is simply a paper that backs up its claims with evidence, in the form of citations, either as inline references to other articles or as footnotes. My perspective, however, comes from my own field, biology.

The first thing to decide is why you are reading the paper. Did its title catch your eye? Are you working on something similar? Did you get it to review for a journal? Or are you reading it because it was assigned for a class? Why you are reading the paper will determine how you read the paper. In fact, you won’t read most papers. You will scan them for what you are looking for and then move on. It might be in the discussion where you find other references. It might simply be the data in the figures. You might be looking for a method to try. Don’t feel you have to read every paper through. I often don’t.

For most papers you come across, you will read only the title. This is true if you look at the bibliography of a paper, or get a table of contents emailed to you, or have set up a Google Scholar alert on a certain topic. I see hundreds of papers a week in this way and only read past the title for a handful of them. So a strong title should tell what a paper is about.

The second thing you will read is the abstract. You will read hundreds more abstracts than full papers. The abstract should tell the whole story, not just what the paper is about. I have written about how to write a perfect abstract here. Such an abstract should identify the topic, state what has gone before, add what will be new here, give the results, then indicate how the field has changed because of the study. Sound impossible? Even complex work can do this in a few sentences. In fact, the paper I’m reading right now does it marvelously, here.

Where you go next might vary according to your specific interests, but I tend to look at the figures next. The figures and their captions alone should tell the story. All important findings should be in the figures. Look at the figures carefully to see what the research has discovered. Which figures are key and which are secondary?

OK, let’s say this is a paper you really want to understand and you are actually going to read the whole thing. There are lots of ways of doing this. I tend to read title, abstract, figures, results, methods. Then I make up my own mind what the paper is about, and what it actually shows. Then I read the introduction and discussion. These should tell how the results are put in context. I like to read them second because authors all too often draw conclusions from their results that are grander than the results actually warrant. If I read their own framing first, I might be taken in.

But if it is not a familiar field, I might actually read the paper in the order it is presented because the results won’t make much sense to me without the framing. Then I am less able to judge whether the work delivers on its promises. Assessing this is an important part of critical reading and is always something to address in a discussion of a paper.

What if you are totally new to reading papers and the paper you are supposed to be critical about seems totally fine and you can’t imagine what you might say? What to do? I have two tips. One is to find another paper close in subject to the first one and compare them. This will usually give you something interesting to say. You can find that other paper with a Google Scholar search of the topic, limited to the most recent couple of years. Or it might be one actually cited in your paper. This can help a lot in a discussion.

The other thing to do if you can’t think of what to say, or if you really want to understand the paper, is to follow the sample sizes and degrees of freedom of the experiments. Degrees of freedom should count only observations that are independent and free to vary. All too often the observations are not independent, which means the statistics are incorrect. Tracing through these numbers can help you understand the exact experiments or observations of the paper and will often give you something to say.
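To make the degrees-of-freedom point concrete, here is a minimal R sketch. Everything in it is invented for illustration (two treatments, three plates each, four wells per plate, and the simulated growth values), and it is only one simple way to look at the problem. It contrasts a test that treats every well as independent with one that first averages the wells on each plate.

```r
# Hypothetical design: 2 treatments x 3 plates per treatment x 4 wells per plate.
set.seed(1)
wells <- expand.grid(well = 1:4, plate = 1:3,
                     treatment = c("control", "treated"))
wells$plate_id <- paste(wells$treatment, wells$plate, sep = "_")

# Wells on the same plate share a plate effect, so they are not independent.
plate_effect <- rnorm(6, sd = 0.5)
names(plate_effect) <- unique(wells$plate_id)
wells$growth <- 1 + plate_effect[wells$plate_id] + rnorm(nrow(wells), sd = 0.1)

# Treating every well as an independent observation inflates the degrees of freedom.
naive <- t.test(growth ~ treatment, data = wells, var.equal = TRUE)

# Averaging wells within each plate leaves one value per independent unit.
plate_means <- aggregate(growth ~ treatment + plate_id, data = wells, FUN = mean)
honest <- t.test(growth ~ treatment, data = plate_means, var.equal = TRUE)

df_naive <- unname(naive$parameter)    # 24 wells - 2 groups = 22
df_honest <- unname(honest$parameter)  # 6 plates - 2 groups = 4
```

If a paper with a design like this reports 22 degrees of freedom rather than 4, the wells were probably treated as independent replicates, and that is worth raising in a discussion.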

Each paper you read is part of the great web of knowledge. The better it tells you where it fits in that network, the easier it will be to read. The more you know of that corner of the web, the more easily you will understand and critique the paper. Just remember to read critically, not to assume the study does what the authors claim it does, and to try hard to see what is new. Have fun!

Posted in Presentations and seminars, Publishing your work, Reading critically, Scholarship, Undergraduates, Writing

What does a professor or a postdoc do at an advanced study institute?

We are about to start 10 months at the renowned Wissenschaftskolleg zu Berlin, a place where academics go to concentrate on their research and to find inspiration across the academy. You might think you have to be advanced yourself to land a spot here, but that is not the case. There are postdocs here too, like Hassan Salem, normally of Emory University in Nicole Gerardo’s lab.

In our first 10 days here we have almost entirely concentrated on improving our German and on getting settled. We have met sociologists from France and Michigan, artists from Chicago, novelists from Kenya, historians from Turkey and Boston, and just now a few biologists we already knew.  What is the point of traveling around the world to do things we could do at home?

Potsdam, tearing down DDR remnants, on a German class field trip.

Well, the truth is we couldn’t really do them at home. We could not concentrate so fully on ideas. We could not learn from others so different from ourselves. We could not skip all the things at home that seemed so important and do something different. This is what our new leader here, the first woman at the helm, Barbara Stollberg-Rilinger, said to the Frankfurter Allgemeine Zeitung on Wednesday 29 August 2018.

Here we get space in an institute that complements our home universities. I might learn to write better from novelists, or historians. I might learn things I could not imagine from my colleagues of the year, here. How many things like this are there? I don’t know, but this one might be unique for its mix across the disciplines.

Thus far I have been humbled by all the scholarship that surrounds me and by the challenges of German, such as prepositions that sometimes take the dative and sometimes the accusative. But this struggle should be positive, not daunting. By the end of the year my German will be better and the better parts of two books will be written, I hope.

The things we learn first are about buying tickets for public transportation, how to use our cell phones, and the complexities of German recycling. I have learned the etiquette of walking a dog, unfortunately on a leash unlike many German dogs. You don’t pet other people’s dogs, and it is best if all ignore each other, to the disappointment of our eager Zeus.

German class coffee break, Wiko

I asked the wise Jennifer Fewell how to make the most out of the year. She recommended that I forget about all my earlier projects and find people very different from me to learn from. I think I’ll take the second half of that advice. She recommended bikes, but we have a pup who walks and doesn’t ride. So remember to tailor any advice to yourself and your circumstances.

No words for this horror.

I walked today to Gleis 17, the track from which many Jews were transported to their deaths in my father’s lifetime. For me, being in Berlin will be a time of remembering, of witnessing, of learning. I hope it will be a time of growth and intellectual discovery.

How does one learn about such things? From this blog, from talking to people, and from being unafraid to apply! I’ll be posting more on the life in an Advanced Study Institute.

Posted in Managing an academic career, Sabbatical

How do you get an academic job in biology?

You have published your research, figured out how to apply for grants, identified some absorbing big ideas to spend a few years or a lifetime on, but now you want that coveted academic job to put this all together. It will entail research, teaching, becoming part of a community of scholars, teachers, and researchers. How do you do it? I’ve written a bit on this before, so this post largely compiles those earlier ones. But first, just apply broadly. Don’t overthink each application, just get them out there. Have a system and just do it as you see the job ads. Have your referees ready. We don’t mind sending the letter to lots of places.

Make your application stand out, as described here. This is important, because hiring fairly is hard, described here. And, no, I don’t think a rubric would help, here.  Here are some reasons we will hire you, here.

Just send your job application to the official address, not to anyone else generally, see here.  Think about your cover letter and keep it pithy and to the point, see here. Do not name possible collaborators here. This could work against you, see here.

If your publications include many multiple-author papers, on lots of which you are a middle author, please describe your contributions, perhaps with a sentence below the reference, see here.

Here is one on how I read your file on first pass, here.

Once you get a job interview, there are some things you should not do, covered here. And here is some advice for the chalk talk. After all, chalk talks are very challenging, see here.

If you get a phone interview, do this.

Do not worry too much about the order in which you are interviewed and never agree to an interview on a very short time line unless you are ready, read more here.

The process of deciding whom to hire is complex. It covers area, collegiality, and how far we agree on the candidates. Here are some guidelines to the process at our institution, with details from a search from a few years ago. We won’t necessarily agree on the top candidates for these reasons. Ranking into categories, not one by one, is better, but unfortunately it is not often done; more information here. Ultimately we hire someone, like this.

Remember, there are questions you cannot be asked legally. This covers them. If you are asked them, you can demur, perhaps turn it into a joke, change the subject, or answer. Keep track of illegal things you are asked, but I’m not sure it will help.

Have fun, don’t stress, keep doing research and mentoring and remember why you are in this in the first place! And lots of others have written on these topics. Go find them too!


Fred Inglis (back second from left) and Longfei Shu (back right end) got faculty positions this year!

Posted in Interviewing, Jobs

Get your undergrads thinking about analysis from the start

The last post talked about making sure undergrads get the big picture of their questions. This is essential, but it is not the end. All too often analysis is left for the end and there is no exploring. Ideally, students learn their analytical techniques right from the start. One of my grad students, Tyler Larsen, is doing a great job of this.

Here is something he sent Cara Jefferson who is working on a project that is an extension of his. The details will be different for everyone, so this might not mean a lot to others, but there are things to get out of it.

First, Tyler encourages graphic exploration of the data.

Second, he does not give everything in detail to Cara but instead gives his scripts and encourages her to think about how her data differ from his and modify the analyses. It is too easy to turn off your brain if you are given everything. But he makes it clear he is available for help.

Third, Tyler suggests ways to explore the data, things she might look for. Cara is fairly new on the project so this is a really good idea. But he doesn’t tell her exactly how to do it.

Fourth, Tyler’s questions lead to the important questions of the research project.

Fifth, Tyler makes it clear that science has variation and that we have to worry when it is caused by things like the date of the experiment, and he helps her see how to look for that.

Sixth, Tyler gives Cara the R code necessary to start the work. The sooner students get comfortable with R, the better, and getting code for one’s project is a good way to start. He gave her the R code in an R format, but I appended it to this post.

What you see below is all the stuff Tyler thought hard about and gave Cara. It is a wonderful example of a careful grad student mentor working with an excellent undergrad.

Cara,

It is valuable when you’ve collected a bunch of data to take the time to play with it and look for patterns.  Obviously we will ultimately do statistics to test any hypotheses we want to test, but in the meantime it is worthwhile to graph things every which way to Sunday and see what pops out.

Here is a non-exhaustive list of questions to answer about your data.  Some of these may be easy to tackle in Excel on the spreadsheet itself, while some others will require some R.  Consult the R script (attached) for guidance.  The first part of it restructures the data, calculates growth rates, and further annotates it with a few columns that will be useful for pulling out specific subsets to graph.  If you’ve set up your data correctly you should be able to run it as is (though you’ll have to specify the ‘mainpath’, ‘todaysdate’, and ‘filename’ variables).  If you have any trouble running the first part please let me know and I’ll help you sort it out.  The second part makes graphs of various sorts and is more fluid.  You may find the code as written helpful but it was written for my experiments, not yours, and so some simple modifications will be necessary.  See if you can figure them out.  Think carefully about what specific question you are hoping a graph will answer.  It can be helpful to draw on paper what the final graph would look like if your hypotheses were correct, just to work out what should be on the x axis, y axis, etc.

  1. In Excel, take a look at the growth curves themselves.  Plot at least a few of them as scatter or line graphs.  Do they take the shape we talked about, with a flat beginning, an exponential rise, a plateau, and then a slow decline?  Take note of any you see that don’t look like this and try to think about why.
  2. Look at the final OD values reached by different strains.  This is the yield.  It isn’t as informative as the growth rate (it’s very sensitive to the specific growth conditions), but it may differ between strains.
  3. Plot the data from the negative control wells.  The media only wells should not grow at all (ie just a flat line at around 0).  If they do the media was contaminated and that plate is likely not usable. Ideally the Dicty negative controls (the wells with Dicty but not Burk) should not grow either but past experience tells me this is wishful thinking and many often do show some growth, presumably from KP that survived the antibiotic treatment.  Take note of how long it took these wells to grow – if it’s later than the Burk strains (as it often is), it probably isn’t a concern.  If it’s earlier it is something we need to look more carefully at, lest we accidentally measure the wrong strain’s growth rate.
  4. Run the first part of the script to calculate growth rates.  Note that growth curves with aberrant shapes are hard to calculate growth rate from.  The script will notice most of these and refuse to calculate growth rates for them, but it does miss some of them.  If you get a number that seems really weird, go back and plot that well’s growth curve.  Chances are it’ll be screwy and you’ll know it’s not to be trusted.
  5. In Excel or R, convert all of the growth rates into doubling times.  Doubling time = ln(2)/µ, and is usually expressed in hours.  Remember our time points are 15 minutes apart, so a doubling time measured in time points must be divided by 4 to get hours.
  6. Now compare growth rates or doubling times between strains.  Do this first just by eye on the spreadsheet, and then try to produce a graph to answer these questions (one graph may be able to address more than one question at a time).
    1. How do the ancestor strains’ growth rates compare?  (No Dicty present)
    2. How do evolved strains compare to ancestor strains?  (No Dicty present)  Are they faster or slower?  Does it vary by strain?  Why?
    3. How does adding Dicty change growth rates of ancestor strains?  Is it faster or slower?  Does it vary by strain?  Why?
    4. How does adding Dicty change growth rates of evolved strains?  Are evolved Burk strains more or less sensitive to the effects of adding Dicty?  Does it vary by strain?  Why?
  7. How consistent are your results between days?  Do you get the same answers?  If not, why not?

If you get stuck on any of the initial part (setting up data, calculating growth rates, etc), come find me and I’ll help you.  For the interpretation part I encourage you to take a sincere whack at it yourself before you seek help.  We’ll go over it once you’ve had the chance to give it some thought.
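Step 5 of Tyler’s list can be sketched in a few lines of R. This is only an illustration, not part of his script: the function name `doubling_time_hours` and the example growth rates are invented here, and it assumes µ is a natural-log growth rate per 15-minute time point.

```r
# Convert a specific growth rate (per 15-minute time point) into a doubling time in hours.
# Assumption: mu is a natural-log growth rate per time point; the example values are invented.
minutes_per_timepoint <- 15
timepoints_per_hour <- 60 / minutes_per_timepoint   # 4 time points in an hour

doubling_time_hours <- function(mu) {
  # log(2) / mu is the doubling time in time points; dividing by 4 converts it to hours
  (log(2) / mu) / timepoints_per_hour
}

mu_example <- c(slow = 0.05, fast = 0.20)
doubling_time_hours(mu_example)
```

With these made-up rates the slower strain doubles roughly every 3.5 hours and the faster one in a bit under an hour. If the growth rates were fitted on a different log base or time unit, the conversion would need to change accordingly.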

And he gave her the R code to use:

##Here is a script for analyzing growth rates from the Tecan
##Useful link:  https://www.r-bloggers.com/analyzing-microbial-growth-with-r/
##This version of the script was modified on 03/21/18 to improve usability and incorporate some statistics

##############################################################################################################

#Install packages - this should only be necessary once per computer.
#Uncomment the lines below the first time you run the script.
#install.packages("ggplot2")
#install.packages("reshape2")
#install.packages("dplyr")
#install.packages("readr")
#install.packages("devtools")
#library(devtools)
#install_github("dcangst/fitr")

rm(list=ls()) # clear memory and start fresh

#Load packages into memory (will need to be downloaded and installed first)
library(ggplot2) #For plotting
library(reshape2)  #For converting data between wide and long forms
library(dplyr)  #For grouping and manipulating data
library(readr)  #For importing CSV files
library(fitr)  #For the actual growth rate analysis

 

###############################################################################################################

 

#Set some strings for file naming
mainpath<-"C:/Users/Tyler/Google Drive/School/QSLab/ExperimentalEvolution/Platereaderdata/"
todaysdate<-"042018"
filename<-"combineddata"

##Import data.  Data should have columns Date, Well, Bstrain, Bline, Dstrain, Dline, Dtreatment, and Notes (case sensitive)
#The input file name is built from the variables above (here: combineddata_042018.csv)
simpledata <- read_csv(paste(mainpath,filename,"_",todaysdate,".csv",sep=""))
backup<-simpledata #make a backup

#convert the data to long form.  The id argument defines which columns are categories identifying the
#data points.  Everything else is the data.  Variable.name and value.name arguments name the two new columns in the resultant data
meltdata<-melt(simpledata, id=c("Date","Well","Bstrain","Bline","Dstrain","Dline","Dtreatment","Notes"),variable.name="Time",value.name="OD600")
meltdata$Time<-as.numeric(meltdata$Time) #Convert time from a factor into a number (the index of the time point)
meltdata$OD600<-as.numeric(meltdata$OD600) #Convert OD600 from a character into a number
meltdata$Bstrain[which(meltdata$Bstrain=="None")]<-NA

#Save data
write.csv(meltdata, paste(mainpath,filename,"_",todaysdate,"_","melted.csv",sep=""))

#Read in the cleaned up data
mydata <- meltdata
mydata<-mydata[!is.na(mydata$Bstrain),]  #Remove blanks

#add ID column that is a string of all of the columns (except Notes).  This is intended to give each line a unique name.
mydata$ID<-paste(mydata$Date,mydata$Well,mydata$Bstrain,mydata$Bline,mydata$Dstrain,mydata$Dline,mydata$Dtreatment, sep="_")

 

#Read the help for the fitr script
#?d_gcfit

#Run the script, finding mu max values for each, and saving everything into an object called 'rates'
rates <- d_gcfit(data=mydata,    #The growth curve data in long form
                 w_size=6,       #How many data points the script should use for each line fitting
                 od_name="OD600",#The column name containing the absorbance data (y axis)
                 time_name="Time",#The column name containing the time data (x axis)
                 trafo="log",
                 logBase=2,
                 RsqCutoff=0.95, #This is the cutoff of how accurate the regression line must be without being rejected
                 growthCheck="none",
                 parallel=FALSE,
                 progress="text",
                 .inform=TRUE)

#rates #view the rates object
fitsforexport<-rates$bestfits #save the line fits (but not the rest of the script's outputs) into a subobject called fitsforexport

 

#Copy all of the identifying columns and merge them into the fits
mydata2<-mydata[which(mydata$Time == 1),]
mergedata<-merge(mydata2,fitsforexport,by.x=c("ID"),by.y=c("ID"))
write.csv(mergedata, paste(mainpath,filename,"_",todaysdate,"_","fit.csv",sep=""))

#Remove unnecessary columns
drops<-c("X1","Time","OD600","trafo","logBase","growth","comment","numP","nTime","minT","maxT","intercept","adj.r.sq","dt","maxOD","minOD")
mergedata<-mergedata[ , !(names(mergedata) %in% drops)]

#Remove lines that the script could not find clean fits for (ie rsq<.95) or lines in which there was no growth
mergedata<-mergedata[!is.na(mergedata$mumax),]

#Add a column, Bclade, which specifies which clade/category each Burk strain belongs to (will be useful for stats comparing between clades)
mergedata$Bclade<-mergedata$Bstrain
mergedata$Bclade[which(mergedata$Bclade =="KP")]<-NA
mergedata$Bclade[which(mergedata$Bclade %in% c("Bf","BD66","soil99"))]<-"nonsymbiont"
mergedata$Bclade[which(mergedata$Bclade %in% c("bQS70","bQS159","bQS161"))]<-"B1"
mergedata$Bclade[which(mergedata$Bclade %in% c("bQS11","bQS21","bQS69"))]<-"B2"

#Add a column, Bevolved, which specifies if a Burk strain is evolved or not
mergedata$Bevolved<-FALSE
mergedata$Bevolved[which(mergedata$Bline %in% c("E1","E2","E3"))]<-TRUE

#Add a column, Devolved, which specifies if a Dicty strain is evolved or not
mergedata$Devolved<-NA
mergedata$Devolved[which(mergedata$Dline %in% c("E1","E2","E3"))]<-TRUE
mergedata$Devolved[which(mergedata$Dline =="Anc")]<-FALSE

#add a new column, Matched, which indicates whether a B and D strain belong to one another
mergedata$Matched<-FALSE
mergedata$Matched[which(mergedata$Bstrain=="Bf" & mergedata$Dstrain=="QS6")]<-TRUE
mergedata$Matched[which(mergedata$Bstrain=="BD66" & mergedata$Dstrain=="QS9")]<-TRUE
mergedata$Matched[which(mergedata$Bstrain=="soil99" & mergedata$Dstrain=="QS18")]<-TRUE
mergedata$Matched[which(mergedata$Bstrain=="bQS70" & mergedata$Dstrain=="QS70")]<-TRUE
mergedata$Matched[which(mergedata$Bstrain=="bQS159" & mergedata$Dstrain=="QS159")]<-TRUE
mergedata$Matched[which(mergedata$Bstrain=="bQS161" & mergedata$Dstrain=="QS161")]<-TRUE
mergedata$Matched[which(mergedata$Bstrain=="bQS11" & mergedata$Dstrain=="QS11")]<-TRUE
mergedata$Matched[which(mergedata$Bstrain=="bQS21" & mergedata$Dstrain=="QS21")]<-TRUE
mergedata$Matched[which(mergedata$Bstrain=="bQS69" & mergedata$Dstrain=="QS69")]<-TRUE
mergedata$Matched[is.na(mergedata$Dstrain)]<-NA

#add a new column, Dictystatus, which indicates whether a Dicty strain is evolved or ancestral (ie collapses E1, E2, and E3)
mergedata$Dictystatus<-"None"
mergedata$Dictystatus[which(mergedata$Devolved ==FALSE)]<-"Anc"
mergedata$Dictystatus[which(mergedata$Devolved ==TRUE)]<-"Evolved"

#Save final dataset
write.csv(mergedata, paste(mainpath,filename,"_",todaysdate,"_","annotated.csv",sep=""))
finaldata<-mergedata

#finaldata <- read_csv(paste(mainpath,filename,"_",todaysdate,"_","annotated.csv",sep=""))

 

 

 

 

 

###############################################################################################################

####Plotting for runs including Dicty##########################################################################

###############################################################################################################

 

matcheddata<-finaldata[which(finaldata$Matched %in% c(TRUE, NA)),]
matcheddata<-matcheddata[which(matcheddata$Bstrain!="KP"),]
#matcheddata<-matcheddata[which(matcheddata$Date != "102217"),] #remove the 102217 run, which was HK and probably shouldn't have gone in there in the first place
matcheddata<-matcheddata[which(!(matcheddata$Date %in% c("102217","30818","22818","30918"))),]

#Set the order of the points correctly
matcheddata$Bstrain<-factor(matcheddata$Bstrain, levels=c("Bf","BD66","soil99","bQS70","bQS159","bQS161","bQS11","bQS21","bQS69"))
matcheddata$Dline[is.na(matcheddata$Dline)]<-"None"
matcheddata$Dline<-factor(matcheddata$Dline, levels=c("None","Anc","E1","E2","E3"))
matcheddata$Dictystatus<-factor(matcheddata$Dictystatus, levels=c("None","Anc","Evolved"))
matcheddata$Bclade<-factor(matcheddata$Bclade, levels=c("nonsymbiont","B1","B2"))

#Doubling time in hours: ln(2)/mumax is in 15-minute time points, so divide by 4
matcheddata$Doublingtime<-(log(2)/matcheddata$mumax)/4

 

#Boxplot for growth rate
plot1<-ggplot(data=matcheddata, aes(x=Dline, y=mumax, fill=Dictystatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch="both") +
  theme(legend.position='none') +
  labs(y="Specific growth rate (mu)",x="") +
  ggtitle("Growth rate by line")
plot1

#Boxplot for growth rate (collapsed by strain)
plot2<-ggplot(data=matcheddata, aes(x=Dictystatus, y=mumax, fill=Dictystatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch="both") +
  theme(legend.position='none') +
  labs(y="Specific growth rate (mu)",x="") +
  ggtitle("Growth rate by strain")
plot2

#Boxplot for growth rate (collapsed by clade)
plot3<-ggplot(data=matcheddata, aes(x=Dictystatus, y=mumax, fill=Dictystatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bclade, switch='both') +
  theme(legend.position='none') +
  labs(y="Specific growth rate (mu)",x="") +
  ggtitle("Growth rate by clade")
plot3

 

 

##################

#Do some math to change it into fold changes

 

#Calculate as fold change

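grouped1<-group_by(matcheddata, Dstrain, Dline, Bstrain)
stats1<-summarise(grouped1, N=length(mumax),Average=mean(mumax),StDev=sd(mumax))
stats1$ID<-paste(stats1$Bstrain,stats1$Dstrain,stats1$Dline, sep="_")

#Give every row the ID of its bacterial strain's no-Dicty control, so the merge attaches the control mean to each row
matcheddata$ID<-paste(matcheddata$Bstrain,"NA","None", sep="_")
mergedata<-merge(matcheddata,stats1[,c("Average","ID")],by.x=c("ID"),by.y=c("ID"))

#Fold change relative to the no-Dicty control mean for the same bacterial strain
mergedata$Foldchange<-mergedata$mumax/mergedata$Average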

 

#Boxplot for growth rate (scaled)

plot4<-ggplot(data=mergedata, aes(x=Dline, y=Foldchange, fill=Dictystatus)) +
  geom_boxplot() +
  geom_hline(yintercept=1) +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch="both") +
  theme(legend.position='none') +
  labs(y="Fold change of specific growth rate (mu)", x="") +
  ggtitle("Growth rate (scaled) by line")

plot4

#Boxplot for growth rate (scaled) (collapsed by strain)

plot5<-ggplot(data=mergedata, aes(x=Dictystatus, y=Foldchange, fill=Dictystatus)) +
  geom_boxplot() +
  geom_hline(yintercept=1) +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch="both") +
  theme(legend.position='none') +
  labs(y="Fold change of specific growth rate (mu)", x="") +
  ggtitle("Growth rate (scaled) by strain")

plot5

#Boxplot for growth rate (scaled) (collapsed by clade)

plot6<-ggplot(data=mergedata, aes(x=Dictystatus, y=Foldchange, fill=Dictystatus)) +
  geom_boxplot() +
  geom_hline(yintercept=1) +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bclade, switch="both") +
  theme(legend.position='none') +
  labs(y="Fold change of specific growth rate (mu)", x="") +
  ggtitle("Growth rate (scaled) by clade")

plot6

#############

############################################################################

#Doin some stats

########

#STATS STUFF:

 

 

#Use non parametric test to see if Anc differ from evolved

wilcox.test(mergedata$Foldchange[which(mergedata$Dictystatus=="Anc")], mergedata$Foldchange[which(mergedata$Dictystatus=="Evolved")])

#They do.

 

 

 

teststats<-aov(mumax~Dictystatus*Bclade,data=matcheddata)

summary(teststats)

#Status (whether D is evolved or not), clade, and status*clade are all quite significant

 

#Try again with fold change

teststats<-aov(Foldchange~Dictystatus*Bclade,data=mergedata)

summary(teststats)

#Status (whether D is evolved or not) and status*clade are quite significant

#################################

 

#Boxplot for doubling time

plot4<-ggplot(data=matcheddata, aes(x=Dline, y=Doublingtime, fill=Dictystatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch='both') +
  ylim(0,15) +
  theme(legend.position='none') +
  labs(y="Doubling time (hours)") +
  ggtitle("Doubling time by line")

plot4

#Boxplot for doubling time (collapsed by strain)

plot5<-ggplot(data=matcheddata, aes(x=Dictystatus, y=Doublingtime, fill=Dictystatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch='both') +
  ylim(0,15) +
  theme(legend.position='none') +
  labs(y="Doubling time (hours)") +
  ggtitle("Doubling time by strain")

plot5

#Boxplot for doubling time (collapsed by clade)

plot6<-ggplot(data=matcheddata, aes(x=Dictystatus, y=Doublingtime, fill=Dictystatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#999999","#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bclade, switch='both') +
  ylim(0,15) +
  theme(legend.position='none') +
  labs(y="Doubling time (hours)") +
  ggtitle("Doubling time by clade")

plot6

 

 

 

###############################################################################################################

####Plotting for runs without Dicty##########################################################################

###############################################################################################################

 

#Remove lines with Dicty present

matcheddata2<-finaldata[which(is.na(finaldata$Dstrain)),]

matcheddata2<-matcheddata2[which(matcheddata2$Date %in% c("30918","31618","30818")),]

#Set the order of the points correctly
matcheddata2$Bstrain<-factor(matcheddata2$Bstrain, levels=c("KP","Bf","BD66","soil99","bQS70","bQS159","bQS161","bQS11","bQS21","bQS69"))
matcheddata2$Bline[is.na(matcheddata2$Bline)]<-"Anc"
matcheddata2$Bclade[is.na(matcheddata2$Bclade)]<-"KP"
matcheddata2$Bclade<-factor(matcheddata2$Bclade, levels=c("KP","nonsymbiont","B1","B2"))
matcheddata2$Doublingtime<-(0.69314718056/matcheddata2$mumax)/4
matcheddata2$Bstatus<-"None"
matcheddata2$Bstatus[which(matcheddata2$Bevolved ==FALSE)]<-"Anc"
matcheddata2$Bstatus[which(matcheddata2$Bevolved ==TRUE)]<-"Evolved"

 

#Boxplot for growth rate

plot7<-ggplot(data=matcheddata2, aes(x=Bline, y=mumax, fill=Bline)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#F8666D","#00BFC4","#00BFC4","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch='both') +
  theme(legend.position='none') +
  labs(y="Specific growth rate (mu)", x="") +
  ggtitle("Growth rate (Dictyostelium absent) by line")

plot7

#Boxplot for growth rate (collapsed by strain)

plot8<-ggplot(data=matcheddata2, aes(x=Bstatus, y=mumax, fill=Bstatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch='both') +
  theme(legend.position='none') +
  labs(y="Specific growth rate (mu)", x="") +
  ggtitle("Growth rate (Dictyostelium absent) by strain")

plot8

#Boxplot for growth rate (collapsed by clade)

plot9<-ggplot(data=matcheddata2, aes(x=Bstatus, y=mumax, fill=Bstatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bclade, switch='both') +
  theme(legend.position='none') +
  labs(y="Specific growth rate (mu)", x="") +
  ggtitle("Growth rate (Dictyostelium absent) by clade")

plot9

#Boxplot for doubling time

plot10<-ggplot(data=matcheddata2, aes(x=Bline, y=Doublingtime, fill=Bline)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#F8666D","#00BFC4","#00BFC4","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch='both') +
  theme(legend.position='none') +
  labs(y="Doubling time (hours)", x="") +
  ggtitle("Doubling time (Dictyostelium absent) by line")

plot10

#Boxplot for doubling time (collapsed by strain)

plot11<-ggplot(data=matcheddata2, aes(x=Bstatus, y=Doublingtime, fill=Bstatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bstrain, switch='both') +
  theme(legend.position='none') +
  labs(y="Doubling time (hours)", x="") +
  ggtitle("Doubling time (Dictyostelium absent) by strain")

plot11

#Boxplot for doubling time (collapsed by clade)

plot12<-ggplot(data=matcheddata2, aes(x=Bstatus, y=Doublingtime, fill=Bstatus)) +
  geom_boxplot() +
  scale_fill_manual(values=c("#F8666D","#00BFC4")) +
  geom_point() +
  facet_grid(.~Bclade, switch='both') +
  theme(legend.position='none') +
  labs(y="Doubling time (hours)", x="") +
  ggtitle("Doubling time (Dictyostelium absent) by clade")

plot12
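
A note on the doubling-time lines in the script: they compute ln(2)/mumax and then divide by 4. Here is a minimal sketch of that arithmetic with made-up growth rates. It assumes mumax is measured per 15-minute read interval, which is what dividing by 4 suggests but the script itself does not say. The name `reads_per_hour` and the example values are hypothetical.

```r
# Hypothetical example growth rates; not taken from the real dataset.
mumax <- c(0.05, 0.10, 0.20)   # assumed units: per 15-minute read interval

# Doubling time in read intervals: ln(2) / mu
doubling_intervals <- log(2) / mumax

# Convert to hours, assuming four reads per hour (the "/4" in the script)
reads_per_hour <- 4
doubling_hours <- doubling_intervals / reads_per_hour

# Show the three quantities side by side
data.frame(mumax, doubling_intervals, doubling_hours)
```

If your reads were taken at a different interval, only `reads_per_hour` would need to change.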

 

 

 


Do your undergrads actually understand their summer research project?

Cara Jefferson, undergrad extraordinaire

All over the country, undergraduates are embarking on research projects. They are banding birds, squeezing ticks for parasites, culturing bacteria, seining streams, cutting open mice, and many other things. If you ask them what they are doing, they will be able to tell you. They can probably go over their methods in some detail. They will have learned techniques, how to measure properly, perhaps to use a fancy microscope, or how to untangle a bird from a net.

But do they understand why they are doing this particular project? Could they explain to their congressperson or their friend what questions they will answer and why these questions are important? If they were given a list of five projects, could they pick out the one that is best for asking and answering a big question? The answer is all too often no. But why?

In a way it is simple. The things that matter on a day-to-day basis are what they know. They may not have been there when the project was devised. They may have had it explained at the beginning and not again. So it is up to their mentors, both the PI of the lab and their bench or field mentors, to explain the project, to provide readings on it, and then to listen to the students tell it back to them so it is clear they understand. This should be done orally and in writing. It should be reinforced frequently.

This came up in a different way this week with one of our undergraduates. She is very ambitious and eager to do the best possible work this summer. Her project is going very well so far. She is very organized and has figured out exactly how much time her project will take and would like another one to fill her time. So what did we do?

Instead of taking her word for it and moving on to discussing different projects, we had her meet with us and explain the existing project in detail. This gave her another chance to show us she understood it. It gave us a chance to remember in detail exactly what it was, since it is a project that spins off of a graduate student project. The graduate student was, of course, also present. With that refreshed understanding, we were then able to guide her not to a side project, but to ways to augment the existing project. Sometimes this involves additional replicates. Sometimes it involves growing things on their own and just looking at them. Sometimes, for our work, which is on a population of evolved bacteria, it involves plating them out clonally and looking for morphological variation in the population. These additional parts of the existing project are the best approach in this case. This will give our marvelous student a better understanding of the project, a way of discovering new angles herself, and a way to fill her research time.

What is the main message? It is that students should be given lots of opportunities to explain the point of their main projects. They should grow with the project, adding dimensions as they find time. It isn’t that a second project is never all right, but in a short summer, doing the best on the main project, from the daily work to writing and analysis, will likely take all the time. Even beginning with a dummy dataset for practicing analysis can be a good idea.
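
For a sense of what a practice dataset might look like, here is a minimal sketch in R. Everything in it is invented for illustration: the column names simply echo the ones in our growth-rate script, and the means, standard deviations, and sample sizes are made up rather than drawn from any real experiment.

```r
set.seed(1)  # make the made-up numbers reproducible

# Hypothetical design: two treatments, 10 replicate wells each.
dummy <- data.frame(
  Dictystatus = rep(c("Anc", "Evolved"), each = 10),
  mumax = c(rnorm(10, mean = 0.10, sd = 0.01),
            rnorm(10, mean = 0.12, sd = 0.01))
)

# Group means, to check the data look the way we intended
aggregate(mumax ~ Dictystatus, data = dummy, FUN = mean)

# A non-parametric comparison of the two groups
result <- wilcox.test(mumax ~ Dictystatus, data = dummy)
result$p.value
```

Because the student chose the difference between groups herself, she can check whether the analysis recovers it before the real data arrive.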

Pay attention to your undergrads and be sure they get the point of the research project, not just once. Have them remind you and themselves daily what it is all about and how it can be enhanced. After all, pipetting can get old if you don’t remember the point of it all.

 
