Sentieon | Application Tutorial: Recommendations for Read Groups


Introduction

This document describes the recommended usage of read group (RG) fields, including the RGID, when using Sentieon® Genomics software, with the aim of minimizing potential issues. It will help you determine best practices for setting the different fields of the RG tags in the BAM files you use.

Detailed description of RG fields and their usage

Detailed description of RG fields

The SAM format specification (http://samtools.github.io/hts-specs/SAMv1.pdf) defines a read group as an identifier that groups reads together. The read group fields in a BAM file can contain the following tags:

ID: Identifier. Unique identifier for the read group. You need to ensure that the RGID is unique within the BAM file and also unique across multiple BAM files used in the same command pipeline. This field is required.

CN: Center name. Name of the sequencing center that produced the reads. This tag is typically not used.

DS: Description. Free-form description of the read group. This tag is typically not used.

DT: Date. Date the run was produced, following the ISO8601 date or date/time format. This tag is typically not used.

FO: Flow order. Array of nucleotides corresponding to the order of nucleotides used for each flow of each read. This tag is typically not used.

KS: Key sequence. Array of nucleotide bases corresponding to the key sequence of each read. This tag is typically not used.

LB: Library. Library used for sequencing the reads.

PG: Programs. Programs used to process the read group. Typically, relevant information is included in the PG field of the BAM file rather than set individually for each read group.

PI: Predicted median insert size. This tag is typically not used.

PL: Platform. Technology used for sequencing the reads. This tag is required if you plan to run BQSR as it is used to determine the correct error model to apply.

PM: Platform model. Free-form text providing more details about the platform/technology used. This tag is typically not used.

PU: Platform unit. Unique identifier used by the sequencing instrument that performed the sequencing. This tag is recommended if you intend to run BQSR, as BQSR models together all reads that share the same PU; if PU is missing, BQSR instead models together reads that share the same RGID.

SM: Sample name. Name of the sample to which the reads belong. This field is required.
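
As an illustration of the tag descriptions above, the following minimal Python sketch parses a single tab-separated @RG header line into a dictionary and reports which of the tags this document marks as required are missing. The function names and the example header values are hypothetical and are not part of any Sentieon® tool.

```python
def parse_rg_line(line: str) -> dict[str, str]:
    """Parse one tab-separated @RG header line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    tags = {}
    for field in fields[1:]:
        # Split on the first colon only, so values may contain colons.
        tag, _, value = field.partition(":")
        tags[tag] = value
    return tags


def missing_tags(tags: dict[str, str], run_bqsr: bool = True) -> list[str]:
    """Return the tags described above as required that are absent."""
    required = ["ID", "SM"]
    if run_bqsr:
        required.append("PL")  # PL is needed only when BQSR will be run
    return [t for t in required if t not in tags]


# Hypothetical header line with no PL tag.
rg = parse_rg_line("@RG\tID:s1.FC1.1.ACGT\tSM:s1\tLB:s1.lib1\tPU:FC1.1")
print(missing_tags(rg))
```

For the example header, the only missing tag is PL; with `run_bqsr=False` nothing is reported as missing.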

RG field tags and Sentieon®

The following are general principles for using RG field tags in Sentieon® tools:

  • When using multiple input BAM files, the ID tag must be unique across all of them; two different input BAM files cannot have RGs with the same ID.
  • Tools use the SM tag to identify reads belonging to the same sample and process them accordingly.
  • Deduplication uses the LB tag to determine groups that may contain duplicates; duplicate reads should belong to the same library.
  • The BQSR model requires the PL tag to determine which error model to apply. If there is no PL tag, BQSR will not be performed.
  • If a PU tag is present, BQSR modeling will be based on read groups identified by the PU tag; if no PU tag is present, BQSR modeling will be based on read groups identified by the ID tag.
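
Two of the principles above can be sketched in Python: detecting RG IDs shared between input files, and choosing the BQSR grouping key (PU when present, otherwise ID). This is only an illustration of the stated rules, not Sentieon® code; the function names, file names, and read group values are hypothetical, and the read groups are assumed to have already been read from the BAM headers into dictionaries.

```python
from collections import defaultdict


def find_shared_ids(rg_by_file: dict[str, list[dict[str, str]]]) -> dict[str, list[str]]:
    """Map each RG ID that appears in more than one file to those files."""
    files_by_id = defaultdict(set)
    for path, read_groups in rg_by_file.items():
        for rg in read_groups:
            files_by_id[rg["ID"]].add(path)
    return {
        rg_id: sorted(paths)
        for rg_id, paths in files_by_id.items()
        if len(paths) > 1
    }


def bqsr_group_key(rg: dict[str, str]) -> str:
    """Grouping key as described above: PU when present, otherwise ID."""
    return rg.get("PU", rg["ID"])


# Hypothetical inputs: two files that reuse the same RG ID.
inputs = {
    "a.bam": [{"ID": "rg1", "SM": "s1", "PU": "FC1.1"}],
    "b.bam": [{"ID": "rg1", "SM": "s1"}],
}
print(find_shared_ids(inputs))
print([bqsr_group_key(rg) for rgs in inputs.values() for rg in rgs])
```

Any ID reported by `find_shared_ids` would need to be renamed in one of the files before the files are used together.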

Filling in RG field tags

Sentieon® recommends the following conventions for RG field tags:

ID: sample_name.flowcell.lane.barcode

SM: sample_name

PL: technology platform, e.g., ILLUMINA

PU: flowcell.lane

LB: sample_name.library_prep

These recommendations ensure that:

  • Read group IDs will be unique across multiple BAM files, even for the same sample sequenced in different lanes or using different libraries.
  • BQSR will build its recalibration from the actual unique sequencing units, which remains possible when multiple samples are sequenced on the same sequencing unit.
  • Tumor and normal sample names will be unique in somatic variant detection.
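
The naming conventions above can be applied mechanically. The following Python sketch assembles a tab-separated @RG header line from the sample, flowcell, lane, barcode, and library preparation names. The function name and the example values are hypothetical; only the tag layout comes from the recommendations above.

```python
def build_read_group(sample, flowcell, lane, barcode, library_prep,
                     platform="ILLUMINA"):
    """Build an @RG header line following the conventions listed above."""
    tags = {
        "ID": f"{sample}.{flowcell}.{lane}.{barcode}",
        "SM": sample,
        "PL": platform,
        "PU": f"{flowcell}.{lane}",
        "LB": f"{sample}.{library_prep}",
    }
    # Tags are joined with real tab characters, as in a SAM header.
    return "\t".join(["@RG"] + [f"{tag}:{value}" for tag, value in tags.items()])


# Hypothetical sample sequenced on lane 3 of flowcell FC1.
line = build_read_group("sampleA", "FC1", 3, "ACGTAC", "lib1")
print(line)
```

Because the ID combines sample, flowcell, lane, and barcode, two lanes of the same sample receive different IDs while sharing the same SM and LB values.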

 
