Creating .fasta file of differentially-expressed transcripts

We previously created tables of differentially-expressed transcripts for all pairwise comparisons. We then decided to focus on two particularly relevant comparisons: Ambient Day 2 vs. Elevated Day 2 (individual libraries only) and Elevated Day 0 vs. Elevated Day 2 (individual libraries only). Both contrast samples taken at ambient temperatures with samples taken at elevated temperatures.

I then used these tables to create a .fasta file of all differentially-expressed transcripts for each comparison, using this script. Next, I used blastn to compare these .fasta files to a database containing all Alveolata nucleotide sequences. The output is here for Ambient Day 2 vs. Elevated Day 2 and here for Elevated Day 0 vs. Elevated Day 2. The database of all Alveolata nucleotide sequences was obtained from the NCBI Taxonomy Browser at
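For readers without access to the linked script, here is a minimal sketch of the table-to-.fasta step. It is not the script I actually used. It assumes the table is tab-delimited with a header row and transcript IDs in the first column, and that each ID matches the first word of a .fasta header; the function names and the example IDs are made up for illustration.

```python
import csv


def read_de_ids(table_handle, id_column=0, delimiter="\t", has_header=True):
    """Collect transcript IDs from one column of a differential-expression table.

    Assumption: tab-delimited, header row present, IDs in the first column.
    """
    reader = csv.reader(table_handle, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row
    return {row[id_column] for row in reader if row}


def subset_fasta(fasta_handle, wanted_ids, out_handle):
    """Copy only the records whose ID (first word after '>') is in wanted_ids.

    Returns the number of records written.
    """
    written = 0
    keep = False
    for line in fasta_handle:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted_ids
            if keep:
                written += 1
        if keep:
            # Writes the header line and every sequence line that follows it
            out_handle.write(line)
    return written
```

In practice this would be called once per comparison, with the table, the transcriptome .fasta, and the output file each opened via `open(...)`.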

I initially ran the search with megablast (the default task for blastn) before switching to the more thorough blastn task.
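The difference between the two runs comes down to the `-task` option of the BLAST+ `blastn` program. The sketch below builds the command line for either task. The file names, the database name, and the e-value cutoff are placeholders rather than the values I used, and running it requires BLAST+ on the PATH plus a database already built with `makeblastdb`.

```python
import subprocess


def blastn_command(query, db, out, task="blastn", evalue=1e-5, threads=1):
    """Build an argument list for the BLAST+ blastn program.

    task="megablast" is the program's default and is tuned for highly
    similar sequences; task="blastn" is more sensitive but slower.
    The evalue default here is a placeholder, not a recommended cutoff.
    """
    return [
        "blastn",
        "-task", task,
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output with the 12 default columns
        "-num_threads", str(threads),
        "-out", out,
    ]


def run_blastn(**kwargs):
    """Run blastn and raise an error if it exits with a non-zero status."""
    subprocess.run(blastn_command(**kwargs), check=True)
```

For example, `run_blastn(query="de_transcripts.fasta", db="alveolata_nt", out="hits.tab")` would run the more thorough search, and adding `task="megablast"` would reproduce the default behaviour (all three names are hypothetical).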

At this stage, I have a few next steps:

  • Decide what our e-value (expect value) threshold should be to definitively say that a sequence is Alveolata
  • Repeat this process, but with a .fasta file containing all Arthropoda sequences. Comparing the total number of matched sequences from both blastn runs should give us a good idea of how well this process distinguishes taxa.
  • Use BLAST with the taxonomy filter as another method of determining taxa
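The first two steps could be sketched roughly as follows: filter each BLAST result by an e-value cutoff, then count how many query transcripts matched only Alveolata, only Arthropoda, or both. This assumes both searches were written in tabular format (`-outfmt 6`) with the default 12 columns, where column 1 is the query ID and column 11 is the e-value. The cutoff is a parameter because we have not chosen one yet, and the function names are my own.

```python
def queries_with_hits(blast_lines, max_evalue):
    """Return the query IDs with at least one hit at or below max_evalue.

    Expects BLAST tabular output (-outfmt 6, default columns):
    column 1 = query ID, column 11 = e-value.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def compare_taxa(alveolata_lines, arthropoda_lines, max_evalue):
    """Count queries matching only one database, or both, at a given cutoff."""
    alv = queries_with_hits(alveolata_lines, max_evalue)
    arth = queries_with_hits(arthropoda_lines, max_evalue)
    return {
        "alveolata_only": len(alv - arth),
        "arthropoda_only": len(arth - alv),
        "both": len(alv & arth),
    }
```

Re-running `compare_taxa` over a range of cutoffs is one way to see how sensitive the counts are to the threshold. A large "both" count would suggest that a match alone is not enough to assign a taxon.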

I also uncovered a potentially huge shortcut! Sam previously separated cbai_transcriptome_v2.0 (which is the one I’m currently using, and is unfiltered by taxa) into Alveolata and non-Alveolata. Link here. I can do the following:

  • Download the cbai_transcriptome_v2.1 and hemat_transcriptome_v2.1 .fasta files and see if they sum up to cbai_transcriptome_v2.0
  • If yes, merge 2.1 with 2.0 and annotate non-Alveolata (and potentially Alveolata?) by joining
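A rough sketch of that check, assuming transcript IDs were left unchanged when the 2.1 files were split out of 2.0 (I have not confirmed this). It compares sets of IDs instead of sequences, and it labels each 2.0 transcript by the 2.1 file it appears in. The function names and labels are hypothetical.

```python
def fasta_ids(handle):
    """Return the set of record IDs (first word after '>') in a .fasta file."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                ids.add(header[0])
    return ids


def check_partition(all_ids, cbai_ids, hemat_ids):
    """Report every way the two 2.1 ID sets fail to sum up to the 2.0 ID set."""
    combined = cbai_ids | hemat_ids
    return {
        "overlap": cbai_ids & hemat_ids,           # IDs present in both 2.1 files
        "missing_from_split": all_ids - combined,  # 2.0 IDs in neither 2.1 file
        "not_in_original": combined - all_ids,     # 2.1 IDs absent from 2.0
    }


def annotate(all_ids, cbai_ids, hemat_ids):
    """Label each 2.0 transcript by the 2.1 file that contains it."""
    labels = {}
    for tid in all_ids:
        in_cbai, in_hemat = tid in cbai_ids, tid in hemat_ids
        if in_cbai and in_hemat:
            labels[tid] = "both"
        elif in_cbai:
            labels[tid] = "cbai_v2.1"
        elif in_hemat:
            labels[tid] = "hemat_v2.1"
        else:
            labels[tid] = "unassigned"
    return labels
```

If all three sets returned by `check_partition` are empty, the two 2.1 files sum up to 2.0 exactly, and the labels from `annotate` can be joined onto any table keyed by transcript ID.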

Essentially, this should produce an annotated version of transcriptome 2.0, which will be super useful in future analyses!