snp_parser - SNPs analysis
The workflow starts with a number of alignments passed to the SNP calling software, which produces one VCF file per alignment/sample. SNPDat uses these VCF files, along with a GTF file and the reference genome, to add synonymous/non-synonymous information to the SNPs in the VCF files.
All VCF files are merged into a single VCF that includes all the SNPs called across all samples. This merged VCF is passed, along with the results from SNPDat and the GFF file, to snp_parser.py, which integrates information from all data sources and outputs files in a format that can later be used by the rest of the pipeline.
The GFF file passed to the parser must have per sample coverage information.
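As a rough illustration of what "per sample coverage information" could look like, the sketch below reads the attributes column of one GFF line and collects the values stored under keys that end with a coverage suffix. The attribute names (`sample1_cov`, `sample2_cov`), the `_cov` suffix and the `sample_coverage` helper are assumptions made for this example, not names taken from the script.

```python
def sample_coverage(gff_line, samples, cov_suff="_cov"):
    """Return {sample: coverage} read from the attributes of one GFF line.

    Assumes attributes are key=value pairs separated by ';' and that the
    coverage keys are named <sample><cov_suff> (a hypothetical layout).
    """
    # The attributes are the ninth tab-separated column of a GFF line
    attributes = gff_line.rstrip("\n").split("\t")[8]
    values = {}
    for pair in attributes.split(";"):
        if "=" not in pair:
            continue
        key, value = pair.strip().split("=", 1)
        values[key] = value.strip('"')
    # A missing key raises KeyError: every sample needs a coverage value
    return {sample: int(values[sample + cov_suff]) for sample in samples}


line = (
    "contig1\tprediction\tCDS\t1\t300\t.\t+\t0\t"
    "gene_id=g1;sample1_cov=12;sample2_cov=7"
)
print(sample_coverage(line, ["sample1", "sample2"]))
```

A check like this can be run on a GFF file before calling the parser, so that a sample without coverage values is caught early.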
Note: This step is done separately because it is time consuming and doing so can help to parallelise later steps.
This script parses the results of SNP analysis from any SNP calling tool and integrates them into a format that can later be used by other scripts in the pipeline.
It integrates coverage, the expected number of syn/nonsyn changes and taxonomy from a GFF file with SNP data from a VCF file.
The script accepts gzipped VCF files.
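One common way to accept both plain and gzipped VCF files is sketched below. It is not the script's own code: `open_vcf` and `iter_records` are hypothetical helpers, and the sketch detects compression from the gzip magic number rather than from the file extension.

```python
import gzip


def open_vcf(path):
    """Open a VCF file for reading as text, whether it is gzipped or not."""
    # Check the two-byte gzip magic number instead of trusting the extension
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"
    if is_gzipped:
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_records(path):
    """Yield the tab-separated fields of each non-header VCF line."""
    with open_vcf(path) as handle:
        for line in handle:
            # Lines starting with '#' are meta-information or the column header
            if line.startswith("#"):
                continue
            yield line.rstrip("\n").split("\t")
```

With this approach the calling code does not need to know whether the merged VCF was compressed.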
Note: The GATK pipeline was tested, but it is also possible to use samtools and bcftools.
Changed in version 0.2.1: added -s option for VCF files generated using bcftools
Changed in version 0.1.16: reworked internals and removed SNPDat, syn/nonsyn evaluation is internal
Changed in version 0.1.13: reworked the internals and the classes used, including options -m and -s
SNPs analysis, requires a vcf file and SNPDat results
usage: snp_parser [-h] [-o OUTPUT_FILE] [-q MIN_QUAL] [-f MIN_FREQ] [-r MIN_READS] -g GFF_FILE -p VCF_FILE -a REFERENCE -m SAMPLES_ID [-c COV_SUFF] [-s] [-v | --quiet] [--cite] [--manual] [--version]
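The usage line can be turned into a call from Python, for example when the parser is launched from a larger workflow script. The sketch below only assembles the argument list from the flags shown above. The file names and the `build_command` helper are hypothetical, and passing `-m` once per sample id is an assumption, since the usage line shows a single SAMPLES_ID value.

```python
import shlex


def build_command(gff, vcf, reference, samples, output=None, bcftools=False):
    """Assemble an snp_parser argument list from the flags in the usage line."""
    command = ["snp_parser", "-g", gff, "-p", vcf, "-a", reference]
    for sample in samples:
        # Assumption: -m is given once per sample id
        command += ["-m", sample]
    if output is not None:
        command += ["-o", output]
    if bcftools:
        # -s marks a VCF file produced with bcftools call
        command.append("-s")
    return command


cmd = build_command(
    "assembly.gff",
    "merged.vcf.gz",
    "assembly.fa",
    ["sample1", "sample2"],
    output="snp_data.out",
    bcftools=True,
)
# Print the command as it would be typed in a shell
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the script is installed and the input files exist.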
|-q||Minimum SNP quality (Phred score)|
|-f||Minimum allele frequency|
|-r||Minimum number of reads to accept the SNP|
|-g, --gff-file||GFF file with annotations|
|-p, --vcf-file||Merged VCF file|
|-a||Fasta file with the GFF Reference|
|-m||the ids of the samples used in the analysis|
|-c||Per sample coverage suffix in the GFF|
|-s||bcftools call was used to produce the VCF file|
|-v||more verbose - includes debug messages|
|--quiet||less verbose - only error and critical messages|
|--cite||Show citation for the framework|
|--manual||Show the script manual|
|--version||show program’s version number and exit|
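The three thresholds (minimum quality, minimum allele frequency and minimum number of reads) could be combined as in the sketch below. This is an illustration of the idea, not the script's implementation: the `Call` class, the `passes_filters` function and the default values are hypothetical, and the sketch assumes that the read threshold applies to the total depth at the position.

```python
from dataclasses import dataclass


@dataclass
class Call:
    qual: float     # Phred-scaled quality of the call
    alt_reads: int  # reads supporting the alternate allele
    depth: int      # total reads covering the position


def passes_filters(call, min_qual=30.0, min_freq=0.01, min_reads=4):
    """Return True if the call clears all three thresholds.

    The default values are placeholders chosen for the example.
    """
    if call.qual < min_qual:
        return False
    # Assumption: the read threshold is applied to the total depth
    if call.depth < min_reads:
        return False
    # Allele frequency is the fraction of reads supporting the alternate allele
    freq = call.alt_reads / call.depth if call.depth else 0.0
    return freq >= min_freq


print(passes_filters(Call(qual=50, alt_reads=5, depth=20)))
print(passes_filters(Call(qual=10, alt_reads=5, depth=20)))
```

The first call clears all three thresholds, while the second is rejected because its quality is below the minimum.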