# 目录

1.Module 1 - Introduction to RNA sequencing

  1. Installation
  2. Reference Genomes
  3. Annotations
  4. Indexing
  5. RNA-seq Data
  6. Pre-Alignment QC

2.Module 2 - RNA-seq Alignment and Visualization

  1. Adapter Trim
  2. Alignment
  3. IGV
  4. Alignment Visualization
  5. Alignment QC

3.Module 3 - Expression and Differential Expression

  1. Expression
  2. Differential Expression
  3. DE Visualization
  4. Kallisto for Reference-Free Abundance Estimation

4.Module 4 - Isoform Discovery and Alternative Expression

  1. Reference Guided Transcript Assembly
  2. de novo Transcript Assembly
  3. Transcript Assembly Merge
  4. Differential Splicing
  5. Splicing Visualization

5.Module 5 - De novo transcript reconstruction

  1. De novo RNA-Seq Assembly and Analysis Using Trinity

6.Module 6 - Functional Annotation of Transcripts

  1. Functional Annotation of Assembled Transcripts Using Trinotate

# 2.1 Adapter Trim (可选步骤)

使用 Flexbar 从读取的 FASTQ 文件中修剪 reads。这个步骤的输出将为每个数据集裁剪 FASTQ 文件。

参考 Flexbar 帮助文档获得更详细的解释:

  • https://github.com/seqan/flexbar
  • https://github.com/seqan/flexbar/wiki

Flexbar 基本用法:

flexbar -r reads [-t target] [-b barcodes] [-a adapters] [options]

额外选项如下:

  • '--adapter-min-overlap 7' requires a minimum of 7 bases to match the adapter
  • '--adapter-trim-end RIGHT' uses a trimming strategy to remove the adapter from the 3 prime or RIGHT end of the read
  • '--max-uncalled 300' allows as many as 300 uncalled or N bases (MiSeq read lengths can be 300bp)
  • '--min-read-length' the minimum read length allowed after trimming is 25bp.
  • '--threads 8' use 8 threads
  • '--zip-output GZ' the input FASTQ files are gzipped so we will output gzipped FASTQ to save space
  • '--adapters' define the path to the adapter FASTA file to trim
  • '--reads' define the path to the read 1 FASTQ file of reads
  • '--reads2' define the path to the read 2 FASTQ file of reads
  • '--target' a base path for the output files. The value will _1.fastq.gz and _2.fastq.gz for read 1 and read 2 respectively
  • '--pre-trim-left' trim a fixed number of bases at left read end. For example, to trim 5 bases at the left side of reads: --pre-trim-left 5
  • '--pre-trim-right' trim a fixed number of bases at right read end. For example, to trim 5 bases at the right side of reads: --pre-trim-right 5
  • '--pre-trim-phred' trim based on phred quality value to deal with higher error rates towards the end of reads. For example, to trim the 3 prime end until quality offset value 30 or higher is reached, specify: --pre-trim-phred 30

# Flexbar trim

首先,为输出设置一些目录

mkdir trim

下载必要的 Illumina 接头序列文件。

wget http://genomedata.org/rnaseq-tutorial/illumina_multiplex.fa

使用 flexbar 删除 illumina 接头序列 (如果有的话),并修剪每个读取的前 13 个碱基。

../flexbar-3.4.0-linux/flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --pre-trim-left 13 --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22.read1.fastq.gz --reads2 UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22.read2.fastq.gz --target trim/UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22

可选练习:比较裁剪前后 FastQC 文件的质控报告。所有 fastqc 报告都可以在命令行上生成。

fastqc *.fastq.gz

# 练习 5

作业:使用上面的方法,修剪你在之前的实践练习中下载的正常样本和肿瘤样本 reads 文件。注意:尝试去掉上面使用的硬左修剪选项 (”--pre-trim-left”)。一旦你削减了读取,使用 FastQC 工具比较修剪前和修剪后的 FastQ 文件。

mkdir trimmed
wget http://genomedata.org/rnaseq-tutorial/illumina_multiplex.fa
flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_normal_rep1_r1.fastq.gz --reads2 hcc1395_normal_rep1_r2.fastq.gz --target trimmed/hcc1395_normal_rep1
flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_normal_rep2_r1.fastq.gz --reads2 hcc1395_normal_rep2_r2.fastq.gz --target trimmed/hcc1395_normal_rep2
flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_normal_rep3_r1.fastq.gz --reads2 hcc1395_normal_rep3_r2.fastq.gz --target trimmed/hcc1395_normal_rep3
flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_tumor_rep1_r1.fastq.gz --reads2 hcc1395_tumor_rep1_r2.fastq.gz --target trimmed/hcc1395_tumor_rep1
flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_tumor_rep2_r1.fastq.gz --reads2 hcc1395_tumor_rep2_r2.fastq.gz --target trimmed/hcc1395_tumor_rep2
flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_tumor_rep3_r1.fastq.gz --reads2 hcc1395_tumor_rep3_r2.fastq.gz --target trimmed/hcc1395_tumor_rep3

修剪后,hcc1395 正常样本 1 号重复,reads1 的读长范围是多少?25-151

FastQC 报告中哪些部分最适合观察修剪的效果?'Basic Statistics', 'Sequence Length Distribution' 以及 'Adapter Content'

在 “Per base sequence content section” 部分,你看到了什么模式?什么可以解释这种模式呢?

前 9 个碱基位置显示出一个尖状的模式,表明每个碱基在我们的读取 / 片段的开头有偏倚的表示。一种可能的解释是,cDNA 合成的随机六聚体引物在文库准备过程中以非随机的方式产生。因此碎片的生成 (以及最终的 reads) 在开始时有一个非随机模式。

更新于 阅读次数

请我喝[茶]~( ̄▽ ̄)~*

amane 微信支付

微信支付

amane 支付宝

支付宝