Wrapup of GO-MWU
In my last post, I ran GO-MWU on two analyses: Elevated Day 2 vs. Ambient Day 0+2 (individual libraries only), and Ambient Day 0+2+17 + Elevated Day 0 + Lowered Day 0 vs. Elevated Day 2 (individual and pooled libraries).
Since then, I’ve been working through a bunch of other analyses and running them through my pipeline. Reminder: the pipeline is kallisto matrix -> DESeq2 -> GO-MWU (with formatting steps in between). I’ve run the following pairwise comparisons all the way through the pipeline:
- Ambient Day 0 vs. Ambient Day 2 (individual libraries only)
- Ambient Day 0 vs. Ambient Day 17 (individual libraries only)
- Ambient Day 2 vs. Ambient Day 17 (individual libraries only)
- Ambient Day 2 vs. Elevated Day 2 (individual libraries only)
- Elevated Day 0 vs. Elevated Day 2 (individual libraries only)
Here’s a link to all visualizations of significant GO categories, and here’s a link to the output from GO-MWU.
Findings from GO-MWU analyses:
- Tons of significant GO categories for Ambient Day 0 vs. Ambient Day 17 and for Ambient Day 2 vs. Ambient Day 17!
- Not many significant GO categories when comparing different crabs in different temperature groups (ex: Day 2 ambient vs. Day 2 elevated). However, when we track the same crabs over time (ex: Day 0 elevated vs. Day 2 elevated), we see quite a few significant GO categories. Reminder: Day 0 elevated samples were taken while all crabs were still held at ambient water temperature.
- To me, the most promising areas for further analysis are Ambient Day 0 vs. Ambient Day 17, Ambient Day 2 vs. Ambient Day 17, and Elevated Day 0 vs. Elevated Day 2.
What direction to go next
At this point, the project is splitting along two general lines - DEG analysis and correlation analysis. I’ll break down the plans for each of those.
DEG analysis:
The goal here is to investigate the effect of temperature on Hematodinium and C. bairdi gene expression. Lists of DEGs will be produced for two particularly relevant comparisons: Ambient Day 2 vs. Elevated Day 2 (individual libraries only) and Elevated Day 0 vs. Elevated Day 2 (individual libraries only). Both contrast samples taken at ambient temperature with samples taken at elevated temperature.
We will take these DEGs and BLAST them against an NCBI database of all Alveolata sequences. Alveolata is the superphylum containing Hematodinium; we’re using the broader group because there are relatively few Hematodinium sequences available. From there, we can specifically investigate differentially-expressed Hematodinium genes on an individual level.
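As a rough illustration of the step after the BLAST run, here is a minimal Python sketch that keeps the single best hit per DEG. It assumes the results were written in BLAST's standard 12-column tabular format (-outfmt 6); the post doesn't say which output format will be used. The function name, the e-value cutoff of 1e-5, and all sequence IDs in the toy data are made up for illustration and aren't project settings or results.

```python
import csv
import io

# Column order of BLAST's default tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(blast_tsv, max_evalue=1e-5):
    """Return {query_id: row} with the highest-bitscore hit per query.

    blast_tsv is any open text handle of tab-separated BLAST hits.
    max_evalue is an assumed cutoff; hits with a larger e-value are skipped.
    """
    best = {}
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Toy hits with invented IDs (not real results).
toy_hits = "\n".join([
    "deg_001\talv_seq_A\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t180",
    "deg_001\talv_seq_B\t70.0\t150\t45\t2\t10\t160\t20\t170\t1e-20\t95.5",
    "deg_002\talv_seq_C\t60.0\t80\t32\t1\t1\t80\t1\t80\t2e-3\t40.1",
    "deg_003\talv_seq_D\t92.5\t300\t22\t0\t1\t300\t1\t300\t1e-100\t410",
])

hits = best_hits(io.StringIO(toy_hits))
for query, row in sorted(hits.items()):
    print(query, row["sseqid"], row["evalue"], row["bitscore"])
```

In the toy data, deg_002 drops out because its only hit is weaker than the cutoff, and deg_001 keeps the higher-scoring of its two hits. Picking by bitscore rather than e-value is a design choice; either would work for a first pass.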
Correlation analysis:
Essentially, this tries to determine the roles of both temperature and time. It involves taking a table of all genes and their transcripts per million (TPM) from the kallisto output files, and examining how expression correlates with those two factors.
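To make the idea of a TPM table concrete, here is a small Python sketch that reads the tpm column from kallisto abundance.tsv files and computes, for each gene, a plain Pearson correlation between TPM and sampling day. This is not ASCA or WGCNA, just the simplest possible correlation, and with only three time points it is purely illustrative. The gene names, TPM values, and helper names are invented; only the abundance.tsv column names and the sampling days (0, 2, 17) come from kallisto and the experiment.

```python
import csv
import io
import math


def read_tpm(abundance_tsv):
    """Read a kallisto abundance.tsv handle into {target_id: tpm}."""
    reader = csv.DictReader(abundance_tsv, delimiter="\t")
    return {row["target_id"]: float(row["tpm"]) for row in reader}


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def toy_abundance(tpms):
    """Build a fake abundance.tsv in memory; only the tpm column is meaningful."""
    lines = ["target_id\tlength\teff_length\test_counts\ttpm"]
    for gene, tpm in tpms.items():
        lines.append(f"{gene}\t1000\t800\t0\t{tpm}")
    return io.StringIO("\n".join(lines))


# One invented library per sampling day.
samples = [
    (0, toy_abundance({"gene_a": 1.0, "gene_b": 50.0, "gene_c": 5.0})),
    (2, toy_abundance({"gene_a": 3.0, "gene_b": 46.0, "gene_c": 5.0})),
    (17, toy_abundance({"gene_a": 18.0, "gene_b": 16.0, "gene_c": 5.0})),
]

days = [day for day, _ in samples]
tables = [read_tpm(handle) for _, handle in samples]

# Correlate each gene's TPM with sampling day across libraries.
correlations = {
    gene: pearson(days, [table[gene] for table in tables])
    for gene in tables[0]
}
for gene, r in correlations.items():
    print(gene, r)
```

In the toy data, gene_a rises linearly with day, gene_b falls linearly, and gene_c is flat, so its correlation is undefined (NaN). A real analysis would also need to handle temperature as a second factor, which is exactly where the two options below come in.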
I’m still in the really early stages of figuring this stuff out, but based on the lab meeting today, there are two possibilities for a correlation analysis.
Option 1: ANOVA-simultaneous component analysis (ASCA). Shelly’s done quite a bit of work on this, including a paper that examines the effect of both temperature and time, which seems extremely relevant to what I’m doing.
Option 2: Weighted correlation network analysis (WGCNA). Yaamini’s the expert on this one - here’s a script she wrote.
Both seem like good options, but I’m not working on this immediately. My plan is to try out the WGCNA tutorial Yaamini linked in her script, do some reading, and see which one I prefer and which fits the project goals better.