library("tidyr")
library("ggpubr")
library("stringr")
library("plyr")
library("dplyr")
Bioinformatics Class Project Day 3
1 Introduction
Last week, we set out to test the hypothesis that transposable elements (TEs) are contributing to the evolution of worker-queen differentiation in eusocial shrimp. We began with raw RNAseq data from three workers and three queens. I assembled a transcriptome from the RNAseq data and ran RepeatMasker on the transcriptome assembly. We analyzed the raw RNAseq data with Galaxy to find the differentially expressed genes. Finally, we loaded and explored two data sets in R:
df - List of transcripts that are SHARED across all individuals, with results related to differential expression between queens and workers (mainly log2FoldChange and padj).
df_transcript_te - List of transposable elements (TEs) and their locations in the transcripts that have TEs.
Our task today is to combine these data sets into a master data set and to decide what analyses can test whether the data support the hypothesis.
2 Load libraries
3 Make a master data set
How do we combine df and df_transcript_te? Examine both data sets to see if there is a common column.
Note that df has only transcripts that are mapped across all individuals, while df_transcript_te has only transcripts that have TEs. Neither data set has ALL transcripts in the assembled transcriptome, but we do not need that information.
Write some code to combine df and df_transcript_te and save the result as df_tranTE. Look at the different ways to combine data frames using dplyr and decide which one should be used.
# your code
# This should return TRUE if you did this correctly
# nrow(df_tranTE) == nrow(df)
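To see why the row-count check above is informative, here is a minimal sketch on toy data, not the class data sets. The tables toy_df and toy_te and the key column transcript_id are hypothetical stand-ins; use whatever common column you actually find in df and df_transcript_te.

```r
library(dplyr)

# Toy stand-ins for the real tables; `transcript_id` is a hypothetical key column.
toy_df <- data.frame(
  transcript_id  = c("t1", "t2", "t3"),
  log2FoldChange = c(1.2, -0.4, 0.1),
  padj           = c(0.01, 0.60, 0.90)
)
toy_te <- data.frame(
  transcript_id = c("t2", "t4"),
  te_class      = c("LTR", "DNA")
)

# Count the rows each kind of join returns.
n_inner <- nrow(inner_join(toy_df, toy_te, by = "transcript_id"))  # transcripts found in both tables
n_left  <- nrow(left_join(toy_df, toy_te, by = "transcript_id"))   # every row of toy_df
n_full  <- nrow(full_join(toy_df, toy_te, by = "transcript_id"))   # transcripts found in either table

c(inner = n_inner, left = n_left, full = n_full)
```

In this toy case each transcript appears at most once in toy_te. If the second table had several rows for the same transcript, a join would repeat the matching rows of the first table, so the row count alone would no longer identify the join type.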
4 Discussion
If the hypothesis is true, what pattern would you expect to see in df_tranTE? These are your predictions/analyses.
What are the predictor and response in each analysis?
What statistical tests should be used for each analysis?
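As one hypothetical illustration of how the predictor and response guide the choice of test (not necessarily one of the predictions for this project): if both the predictor and the response are yes/no categories, a contingency-table test such as Fisher's exact test is a common option. The column names has_TE and is_DE below are assumptions made up for this sketch, and the data are invented.

```r
# Toy data only; `has_TE` and `is_DE` are hypothetical column names.
toy <- data.frame(
  has_TE = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  is_DE  = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Cross-tabulate the categorical predictor against the categorical response.
tab <- table(has_TE = toy$has_TE, is_DE = toy$is_DE)

# Fisher's exact test for association in a 2x2 table (base R, stats package).
res <- fisher.test(tab)
res$p.value
```

A continuous response (for example a fold change) or a continuous predictor would call for a different kind of test, so decide on the variable types first.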
5 Predictions
Below, we will run through a few example analyses using all TEs.
You do not need to report Prediction 1 in your project report. Instead, I’ll provide a sample report based on Prediction 1.
For Prediction 2 and Prediction 3, you will need to
(1) report the main analysis here (with all TEs) and
(2) modify the code to run it for your assigned TE class and report it in the project report.
Name      TE_class
Akhila    DNA
Diandra   DNA
Mohab     LTR
Rebeca    LTR
Sanjna    LINE
Steven    LINE
Tyla      SINE
Solomon   SINE
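Restricting an analysis to one TE class can be sketched with dplyr::filter on toy data. The table toy_tranTE and the column name TE_class are assumptions for illustration; check the actual column names in your df_tranTE.

```r
library(dplyr)

# Toy stand-in for the combined table; `TE_class` is a hypothetical column name.
toy_tranTE <- data.frame(
  transcript_id = c("t1", "t2", "t3", "t4"),
  TE_class      = c("DNA", "LTR", "LINE", "LTR")
)

my_class <- "LTR"  # replace with your assigned class

# Keep only the rows whose TE class matches the assigned one.
toy_subset <- filter(toy_tranTE, TE_class == my_class)
nrow(toy_subset)
```

Note that filter() drops rows where the condition is NA, so transcripts with no TE annotation would be removed by this step. Whether those transcripts should stay in as a comparison group depends on the analysis you are running.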