sampling-utils - Resampling Utilities
Overview
New in version 0.3.1.
Resampling Utilities
sample command
This command samples sequences from a Fasta or FastQ file. Each sequence is picked with a user-defined probability (-r parameter, 0.001, i.e. 1 in 1000, by default), up to a maximum number of sequences (-x parameter, 100,000 by default). By default 1 sample is extracted, but as many as desired can be taken by using the -n parameter.
The input sequence file can be passed either on the standard input or as the last parameter on the command line. By default a Fasta file is expected, unless the -q parameter is passed.
The -p parameter specifies the prefix used for the output file names, and the output files can be gzipped using the -z parameter.
Options
Sampling utilities
usage: sampling-utils [-h] [-v | --quiet] [--cite] [--manual] [--version]
{sample} ...
Named Arguments
-v, --verbose | more verbose - includes debug messages Default: 20 |
--quiet | less verbose - only error and critical messages |
--cite | Show citation for the framework |
--manual | Show the script manual |
--version | show program’s version number and exit |
Sub-commands:
sample
Sample a Fasta/FastQ
sampling-utils sample [-h] [-p PREFIX] [-n NUMBER] [-r PROB] [-x MAX_SEQ] [-q]
[-z] [-v | --quiet] [--cite] [--manual] [--version]
[input_file]
Positional Arguments
input_file | Input Fasta file (FastQ if -q is passed), defaults to stdin Default: - |
Named Arguments
-p, --prefix | Prefix for the file name(s) in output Default: “sample” |
-n, --number | Number of samples to take Default: 1 |
-r, --prob | Probability of picking a sequence Default: 0.001 |
-x, --max-seq | Maximum number of sequences Default: 100000 |
-q, --fastq | The input file is a fastq file Default: False |
-z, --gzip | gzip output files Default: False |
-v, --verbose | more verbose - includes debug messages Default: 20 |
--quiet | less verbose - only error and critical messages |
--cite | Show citation for the framework |
--manual | Show the script manual |
--version | show program’s version number and exit |
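When the script is driven from Python, the options listed above can be assembled into an argument list. The following is a small sketch: `build_sample_command` is a hypothetical helper, not part of the package, and it uses only the short options documented for the sample sub-command. It assumes `sampling-utils` is installed and available on the PATH when the command is eventually run.

```python
from typing import List, Optional


def build_sample_command(
    input_file: Optional[str] = None,
    prefix: str = "sample",
    number: int = 1,
    prob: float = 0.001,
    max_seq: int = 100000,
    fastq: bool = False,
    gzip: bool = False,
) -> List[str]:
    """Build the argument list for `sampling-utils sample`.

    Hypothetical helper; the defaults mirror the documented ones.
    """
    cmd = [
        "sampling-utils", "sample",
        "-p", prefix,
        "-n", str(number),
        "-r", str(prob),
        "-x", str(max_seq),
    ]
    if fastq:
        cmd.append("-q")  # the input is a FastQ file
    if gzip:
        cmd.append("-z")  # gzip the output files
    if input_file is not None:
        # When omitted, the script reads from the standard input
        cmd.append(input_file)
    return cmd


# Usage: 3 gzipped samples from a FastQ file, each sequence picked with p=0.01
cmd = build_sample_command(
    "reads.fq", prefix="run1", number=3, prob=0.01, fastq=True, gzip=True
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with file names.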