Primary analysis includes all the steps required to "call" (identify) each base. Besides identifying the bases, the sequencing machine also assigns a quality score to each of them.

The outcome is stored as a FASTQ file (see image), which contains the sequence identifiers, the sequences of assigned nucleotides (A, G, T or C), also called "reads", and the associated Phred quality scores. When the character "N" appears in place of a nucleotide, the machine could not determine the base at that position. The Phred quality score expresses the probability of an incorrect base call. In a FASTQ file, each Phred quality score is stored as an ASCII character (a letter, a digit or a symbol) whose ASCII value indicates the accuracy of the base call.
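As a minimal sketch of how such a record might be read, the Python snippet below splits one four-line FASTQ record into its parts and converts each quality character to a Phred score. It assumes the Phred+33 encoding (score = ASCII value minus 33), which is common but not universal; older files may use an offset of 64. The record itself and the names `parse_fastq_record` and `PHRED_OFFSET` are made up for illustration.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoding; some older files use 64


def parse_fastq_record(lines):
    """Split one four-line FASTQ record into identifier, read and Phred scores."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a valid FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("read and quality string differ in length")
    # Each quality character encodes one score: ASCII value minus the offset.
    scores = [ord(char) - PHRED_OFFSET for char in quality]
    return header[1:], sequence, scores


# Made-up record for illustration only; "N" marks an undetermined base.
record = [
    "@example_read_1",
    "GATTNCA",
    "+",
    "IIII!5+",
]
identifier, read, phred_scores = parse_fastq_record(record)
print(identifier, read, phred_scores)
```

In this toy record the undetermined base "N" carries the lowest possible quality character ("!"), which decodes to a score of 0 under the assumed offset.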

The primary analysis is typically performed automatically by the sequencing machine after each run.

If you want to sequence several samples together in one run (for example from different patients or different experiments), you can assign a specific tag to each of them. The tag (also known as the barcode) is a short DNA sequence that is added to the adapter to differentiate the reads from each sample. This tag will also be sequenced, and by identifying the specific tag sequence for each sample, you will be able to separate the reads of the samples from each other. This is called multiplexing, and it has the added advantage of lowering the sequencing cost per sample and allowing more samples to be processed in a single run.
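The sorting of reads by tag can be sketched in a few lines of Python. This is a simplified illustration, not how any particular instrument does it: it assumes the tag is the first four bases of each read and must match exactly, whereas on many platforms the tag is read separately and mismatches are tolerated. The tags, sample names and the function `demultiplex` are hypothetical.

```python
from collections import defaultdict

# Hypothetical tags; real barcodes come from the library preparation kit.
SAMPLE_TAGS = {"ACGT": "patient_A", "TGCA": "patient_B"}
TAG_LENGTH = 4  # assumption: all tags have the same length


def demultiplex(reads):
    """Group reads by sample, assuming the tag is the first TAG_LENGTH bases."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LENGTH], read[TAG_LENGTH:]
        # Reads whose tag matches no known sample are kept aside.
        sample = SAMPLE_TAGS.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


reads = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTTTTGGG", "NNNNAAAAAA"]
groups = demultiplex(reads)
for sample, inserts in groups.items():
    print(sample, inserts)
```

Reads with an unreadable or unknown tag end up in an "undetermined" group rather than being assigned to a sample by guesswork.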

Example of FASTQ file results.

At the top of the results, the Illumina sequence identifier is shown. Next, the read, or sequence of nucleotides, is shown. The Phred quality scores are shown underneath the read. The Illumina sequence identifier consists of numbers and letters representing the instrument name, flow cell number, tile, and coordinates within the tile. In this representation, the Phred quality score is a number from 10 to 50, with 10 indicating 90% base call accuracy and 50 indicating 99.999% base call accuracy.
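The accuracy figures quoted above follow from the standard Phred definition, in which a score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below applies that formula; the function name `base_call_accuracy` is chosen here for illustration.

```python
def base_call_accuracy(phred_score):
    """Return the probability that a base call is correct for a Phred score Q."""
    error_probability = 10 ** (-phred_score / 10)
    return 1 - error_probability


# Each step of 10 in the score reduces the error probability tenfold.
for q in (10, 20, 30, 40, 50):
    print(f"Q{q}: accuracy {base_call_accuracy(q):.5f}")
```

A score of 10 therefore means one expected error in 10 base calls, and a score of 50 means one in 100,000.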

The next step in the NGS data analysis is called the secondary analysis.
