pwg-docs

Documentation for BIC Pipelines

View the Project on GitHub soccin/pwg-docs

Variant Pipeline Results Description (2016-12-30)

Version 3.1 (2016-12-30)

Directory structure

The main results directory will be of the form:

.../<LAB>/<INVESTIGATOR>/<PROJNUM>/<REVISION>/

LAB, INVESTIGATOR and the mskcc ids for the lab head and investigator. PROJNUM is the iGO project number and REVISION is the version of this result. All previous runs of the pipeline will be stored and any re-run will go into a new REVISION folder.

Under this main folder you will see:

alignments
metrics
post
variants

which are the various output directories of the WES pipeline. The contents are as follows

post folder

This folder contains the primary MAF file. It will be called:

post/Proj_<NUM>___SOMATIC.vep.filtered.facets.V3.maf

It has been post process and annotated with the VEP annotator. Additionally gene level information from the FACETS output has been joined in, including: local purity, ploidy, cancer cell fraction, and confidence intervals.

Additionally, the file is now marked with a number of filtering flags to help mark a number of potential artifacts that lead to false positives.

The filters currently being used are:

N.B., the events have only been flagged not removed. But it is strongly suggested that some of all of these be redacted before moving forward with any further downstream analysis.

The filtering code is online at:

https://github.com/mskcc/wes-filters

there you can find more documentation and the source code.

And there is also


which is the _filled_ outmaf which contains DP and AD information for every sample for each event called in the SOMATIC MAF.


The code for post-processing is available here:

```bash
https://github.com/soccin/Variant-PostProcess/tree/feature/Version3

on on LUNA at:

/home/socci/Code/Pipelines/CBE/Variant/PostProcessV3

I will write up more the postprocessing steps are time goes on.

facets folder

This folder has the output from the FACETS algorithm. At the top level there is a SEG file which is an IGV formatted copy number segmenation file of all samples in the hisensitivity settings. Then for every sample (tumor/normal) pair there is a separate folder for the facets output for each pair.

For each sample FACETS is run twice once in a low sensitivity seeting (Cval==300) to get an estimate for the diploid level and sample level purity. Then the algorithm is rerun in a high sensitivity pass (Cval==100) to get more fine grain segmenation.

For each pass the following files are output

Plots of the BiSegmention and the Cancer Cell Fractions

Facets segmention which has for each segment: coordinates, metrics, cell fraction (cf), total and lessor copy number (tcn, lcn) for both estimators, cncf and em.

Run metrics/settings and global (sample level) Purity, Ploidy, dipLogR

IGV segmentation file for this sample

FACETS input file; SNP counts for both tumor and normal.