RNAseq Tutorial (2.1) - Bioinformatics Study | Zhou Xiaozhao

# Contents

1. Module 1 - Introduction to RNA sequencing

2. Module 2 - RNA-seq Alignment and Visualization

3. Module 3 - Expression and Differential Expression

4. Module 4 - Isoform Discovery and Alternative Expression

5. Module 5 - De novo transcript reconstruction

De novo RNA-Seq Assembly and Analysis Using Trinity

6. Module 6 - Functional Annotation of Transcripts

Functional Annotation of Assembled Transcripts Using Trinotate

# 2.1 Adapter Trim (optional step)

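Use Flexbar to trim adapter sequences from the reads in the FASTQ files. The output of this step is a set of trimmed FASTQ files for each dataset.

See the Flexbar documentation for a more detailed explanation: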

https://github.com/seqan/flexbar
https://github.com/seqan/flexbar/wiki

Flexbar basic usage:

flexbar -r reads [-t target] [-b barcodes] [-a adapters] [options]

The extra options used here are:

'--adapter-min-overlap 7' requires a minimum of 7 bases to match the adapter
'--adapter-trim-end RIGHT' uses a trimming strategy to remove the adapter from the 3 prime or RIGHT end of the read
'--max-uncalled 300' allows as many as 300 uncalled or N bases (MiSeq read lengths can be 300bp)
'--min-read-length 25' the minimum read length allowed after trimming is 25bp.
'--threads 8' use 8 threads
'--zip-output GZ' the input FASTQ files are gzipped so we will output gzipped FASTQ to save space
'--adapters' define the path to the adapter FASTA file to trim
'--reads' define the path to the read 1 FASTQ file of reads
'--reads2' define the path to the read 2 FASTQ file of reads
'--target' a base path for the output files. The value will be appended with _1.fastq.gz and _2.fastq.gz for read 1 and read 2 respectively
'--pre-trim-left' trim a fixed number of bases at left read end. For example, to trim 5 bases at the left side of reads: --pre-trim-left 5
'--pre-trim-right' trim a fixed number of bases at right read end. For example, to trim 5 bases at the right side of reads: --pre-trim-right 5
'--pre-trim-phred' trim based on phred quality value to deal with higher error rates towards the end of reads. For example, to trim the 3 prime end until quality offset value 30 or higher is reached, specify: --pre-trim-phred 30
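To make the effect of a hard left trim and a minimum-length filter concrete, here is a minimal sketch in plain awk. It is an illustration only and not how Flexbar is implemented; the function name `hard_left_trim` and the toy read are hypothetical. It removes a fixed number of bases (and the matching quality characters) from the 5' end of each record, then drops records that end up shorter than the minimum length.

```shell
# Illustration only: mimic '--pre-trim-left N' followed by '--min-read-length M'.
# This is NOT Flexbar's implementation; it only shows the effect on 4-line FASTQ records.
hard_left_trim() {
    # $1 = bases to remove from the left end, $2 = minimum length to keep
    # Reads uncompressed FASTQ from stdin, writes FASTQ to stdout.
    awk -v n="$1" -v min="$2" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = substr($0, n + 1) }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            qual = substr($0, n + 1)
            # Keep the record only if enough bases remain after trimming
            if (length(seq) >= min) print header "\n" seq "\n" plus "\n" qual
        }'
}

# Hypothetical toy record: 10 bases, trim 3 from the left, keep reads of at least 5 bases
printf '@read1\nACGTACGTAC\n+\nIIIIIIIIII\n' | hard_left_trim 3 5
```

With these toy values the 10-base read is written out with 7 bases, while a 6-base read would be discarded because only 3 bases would remain.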

# Flexbar trim

First, set up a directory for the output:

mkdir trim

Download the required Illumina adapter sequence file.

wget http://genomedata.org/rnaseq-tutorial/illumina_multiplex.fa

Use flexbar to remove Illumina adapter sequences (if any) and trim the first 13 bases of each read.

../flexbar-3.4.0-linux/flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --pre-trim-left 13 --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22.read1.fastq.gz --reads2 UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22.read2.fastq.gz --target trim/UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22
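A quick sanity check after a run like this is to compare read counts before and after trimming. The sketch below assumes the outputs follow the `--target` naming described above (`_1.fastq.gz` and `_2.fastq.gz`); the helper name `count_reads` and the toy file are hypothetical and exist only so the block runs on its own.

```shell
# Count reads in a gzipped FASTQ file (4 lines per record).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Self-contained demo on a toy file with two records (hypothetical data)
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' | gzip -c > "$tmpdir/toy_1.fastq.gz"
count_reads "$tmpdir/toy_1.fastq.gz"
rm -rf "$tmpdir"

# On the real data the calls would look like this (not run here):
# count_reads UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22.read1.fastq.gz
# count_reads trim/UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22_1.fastq.gz
```

In a paired-end run the two trimmed files would normally be expected to report the same count, and that count should not exceed the count of the untrimmed input.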

Optional exercise: compare the FastQC reports for the FASTQ files before and after trimming. All FastQC reports can be generated on the command line.

fastqc *.fastq.gz

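# Exercise 5

Assignment: Using the approach above, trim the read files for the normal and tumor samples that you downloaded in the previous practical exercise. Note: try dropping the hard left trim option used above ('--pre-trim-left'). Once you have trimmed the reads, use FastQC to compare the FASTQ files before and after trimming.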

	mkdir trimmed
	wget http://genomedata.org/rnaseq-tutorial/illumina_multiplex.fa
	flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_normal_rep1_r1.fastq.gz --reads2 hcc1395_normal_rep1_r2.fastq.gz --target trimmed/hcc1395_normal_rep1
	flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_normal_rep2_r1.fastq.gz --reads2 hcc1395_normal_rep2_r2.fastq.gz --target trimmed/hcc1395_normal_rep2
	flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_normal_rep3_r1.fastq.gz --reads2 hcc1395_normal_rep3_r2.fastq.gz --target trimmed/hcc1395_normal_rep3

	flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_tumor_rep1_r1.fastq.gz --reads2 hcc1395_tumor_rep1_r2.fastq.gz --target trimmed/hcc1395_tumor_rep1
	flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_tumor_rep2_r1.fastq.gz --reads2 hcc1395_tumor_rep2_r2.fastq.gz --target trimmed/hcc1395_tumor_rep2
	flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT --adapters illumina_multiplex.fa --max-uncalled 300 --min-read-length 25 --threads 8 --zip-output GZ --reads hcc1395_tumor_rep3_r1.fastq.gz --reads2 hcc1395_tumor_rep3_r2.fastq.gz --target trimmed/hcc1395_tumor_rep3
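The six commands above differ only in the sample name, so they can also be generated with a loop. The following is a minimal sketch; the function name `print_trim_commands` is hypothetical, and it only prints the commands (a dry run) so they can be inspected first.

```shell
# Print one Flexbar command per sample instead of writing six near-identical lines.
# Dry run only: nothing is executed, the commands are just written to stdout.
print_trim_commands() {
    adapters="illumina_multiplex.fa"
    outdir="trimmed"
    for condition in normal tumor; do
        for rep in 1 2 3; do
            sample="hcc1395_${condition}_rep${rep}"
            echo "flexbar --adapter-min-overlap 7 --adapter-trim-end RIGHT" \
                "--adapters ${adapters} --max-uncalled 300 --min-read-length 25" \
                "--threads 8 --zip-output GZ" \
                "--reads ${sample}_r1.fastq.gz --reads2 ${sample}_r2.fastq.gz" \
                "--target ${outdir}/${sample}"
        done
    done
}

print_trim_commands
```

Piping the output to `sh` (for example `print_trim_commands | sh`) would run the commands, assuming `flexbar` is on the `PATH` and the `trimmed` directory already exists.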

After trimming, what is the range of read lengths for hcc1395 normal replicate 1, read 1? 25-151
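The read length range can also be checked directly on the command line, without opening the FastQC report. This is a minimal sketch; the function name `read_length_range` and the toy file are hypothetical, and the commented call assumes the `--target` naming used in the commands above.

```shell
# Report the shortest and longest read in a gzipped FASTQ file as "min-max".
read_length_range() {
    gzip -dc "$1" | awk '
        NR % 4 == 2 {
            n = length($0)
            if (min == "" || n < min) min = n
            if (n > max) max = n
        }
        END { print min "-" max }'
}

# Self-contained demo on a toy file (hypothetical data)
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' | gzip -c > "$tmpdir/toy.fastq.gz"
read_length_range "$tmpdir/toy.fastq.gz"
rm -rf "$tmpdir"

# On the trimmed data the call would look like this (not run here):
# read_length_range trimmed/hcc1395_normal_rep1_1.fastq.gz
```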

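Which sections of the FastQC report are most informative for observing the effect of trimming? 'Basic Statistics', 'Sequence Length Distribution', and 'Adapter Content'

In the 'Per base sequence content' section, what pattern do you see? What could explain this pattern?

The first 9 base positions show a spiky pattern, which means the four bases are not evenly represented at the start of our reads/fragments. One possible explanation is that the random hexamer priming used for cDNA synthesis during library preparation is not truly random. As a result, the fragments generated (and ultimately the reads) carry a non-random pattern at their start.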
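The bias at the start of the reads can be inspected outside FastQC by tabulating base counts per position. The sketch below is a rough stand-in for the 'Per base sequence content' plot, not a reimplementation of FastQC; the function name `base_composition` and the toy file are hypothetical.

```shell
# Tabulate A/C/G/T counts for the first N positions of every read in a gzipped FASTQ file.
base_composition() {
    # $1 = gzipped FASTQ file, $2 = number of leading positions to report
    gzip -dc "$1" | awk -v n="$2" '
        NR % 4 == 2 {
            for (i = 1; i <= n && i <= length($0); i++) count[i, substr($0, i, 1)]++
        }
        END {
            print "pos\tA\tC\tG\tT"
            for (i = 1; i <= n; i++)
                printf "%d\t%d\t%d\t%d\t%d\n", i, count[i, "A"], count[i, "C"], count[i, "G"], count[i, "T"]
        }'
}

# Self-contained demo on a toy file (hypothetical data)
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nAAGT\n+\nIIII\n' | gzip -c > "$tmpdir/toy.fastq.gz"
base_composition "$tmpdir/toy.fastq.gz" 4
rm -rf "$tmpdir"

# On the trimmed data, the first 9 positions could be checked like this (not run here):
# base_composition trimmed/hcc1395_normal_rep1_1.fastq.gz 9
```

Strongly uneven counts across A, C, G, and T in the first few rows, followed by more even counts later, would match the spiky pattern described above.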
