Sequencing key words

Next-Generation Sequencing (NGS) and High-Throughput Sequencing (HTS) are catch-all terms for modern sequencing technologies. Because they sequence DNA (or RNA) much faster than earlier technologies, NGS methods have become central to biological research, giving quick access to biological sequences, gene expression levels and so forth. However, handling the huge amount of data generated by these techniques is one of the most important challenges in bioinformatics: how do we store, organize, share and process these data?

To start, it is important to know how the sequencing is done in order to understand how you can process your data and define the steps of your pipeline.

Some terms that are important to know:

  • Flow cell: This is the support on which the sequencing takes place. A flow cell can be composed of several lanes (8, for instance).

    An Illumina flow cell (8 lanes). (source: illumina.com)
  • Lane: A lane is a component of the flow cell. A sample of DNA (or RNA) is loaded into a lane in order to be sequenced. Depending on the sequencer you choose, a lane will give you a certain number of sequences (called « reads »). The Illumina sequencers comparison will give you an overview.
  • Multiplexing: As sequencing is usually billed per lane, multiplexing allows you to sequence several samples in the same lane. Each sample is labeled with a barcode, which allows « demultiplexing » at the end of the process in order to get the sequences of each sample separately.
  • Barcode: A barcode is a very short sequence of nucleotides added at the beginning of every sequence from a sample when that sample is multiplexed with others.
  • Primers: Primers are specific DNA sequences that are used as « anchors » to sequence the DNA fragment of interest.
  • Amplicon: This term describes the region of interest amplified by the selected primers.
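
To make the relation between primers and amplicon concrete, here is a minimal sketch in Python. It assumes exact primer matches on a single strand; the template sequence, the primer sequences and the `find_amplicon` helper are made up for illustration and do not come from a real protocol.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_amplicon(template, forward_primer, reverse_primer):
    """Return the region delimited by the two primers, or None if not found.

    The reverse primer binds the opposite strand, so on the template strand
    we look for its reverse complement.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template and primers, chosen only to illustrate the idea.
template = "TTTTACGTACGGGCCCAAATTGCAGCCCC"
forward_primer = "ACGTAC"
reverse_primer = "CTGCAA"

amplicon = find_amplicon(template, forward_primer, reverse_primer)
print(amplicon)
```

In this sketch the amplicon includes both primer sites; whether they are trimmed afterwards depends on the analysis.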

To illustrate multiplexing:

Three multiplexed samples, each one identified with a specific barcode.

The barcodes (red, green and blue) are used to label each sample. At the end of the process, the sample each read came from is easily identified from its barcode.
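
The demultiplexing step can be sketched in a few lines of Python. This is a minimal sketch, assuming every read starts with an exact, fixed-length barcode; the barcode sequences, the sample names and the `demultiplex` function are hypothetical, and real demultiplexing tools typically also tolerate sequencing errors in the barcode.

```python
from collections import defaultdict

# Hypothetical barcodes for three samples (illustrative values only).
BARCODES = {
    "ACGT": "sample_red",
    "TGCA": "sample_green",
    "GATC": "sample_blue",
}


def demultiplex(reads, barcodes, barcode_length=4):
    """Group reads by the barcode at their start and trim the barcode off.

    Reads whose prefix matches no known barcode go to "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        prefix = read[:barcode_length]
        insert = read[barcode_length:]
        sample = barcodes.get(prefix, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Made-up reads coming out of one multiplexed run.
reads = [
    "ACGTTTGACCA",
    "TGCAGGATTAC",
    "ACGTCCATGGA",
    "GATCAAACCCG",
    "NNNNACGTACG",
]

result = demultiplex(reads, BARCODES)
for sample, inserts in sorted(result.items()):
    print(sample, inserts)
```

Keeping unmatched reads in a separate "undetermined" group, instead of discarding them, makes it easy to see how many reads could not be assigned to a sample.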