snsxt package
Subpackages
- snsxt.config package
- snsxt.report package
- snsxt.sns_classes package
- snsxt.sns_tasks package
- snsxt.util package
  - Submodules
  - snsxt.util.classes module
  - snsxt.util.find module
  - snsxt.util.git module
  - snsxt.util.log module
  - snsxt.util.mutt module
  - snsxt.util.qsub module
  - snsxt.util.sh module
  - snsxt.util.template module
  - snsxt.util.test module
  - snsxt.util.test_find module
  - snsxt.util.test_qsub module
  - snsxt.util.test_tools module
  - snsxt.util.tools module
  - Module contents
Submodules
snsxt.cleanup module
Functions for cleaning up after an analysis is finished
snsxt.cleanup.analysis_complete(analysis)
Actions to take after an analysis is done
Parameters: analysis (SnsWESAnalysisOutput) – object representing the output of an sns wes analysis pipeline, on which to run downstream analysis tasks
snsxt.cleanup.save_configs(analysis_dir)
Saves the global configs object to a YAML file in the analysis dir
Parameters: analysis_dir (str) – path to a directory to hold the analysis output
Notes
Some config items are added or modified during program run time, so the final configs may not exactly match the starting configs set in external config YAML files.
snsxt.job_management module
Functions for custom management of compute cluster qsub jobs
snsxt.job_management.background_jobs = []
If an analysis task generated qsub jobs, but did not wait for them to finish, they will be captured in this list and will be monitored to completion when run_tasks finishes running all tasks. This way, the program will not exit until all jobs created have finished.
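The monitoring behavior described above can be sketched roughly as follows. This is a minimal illustration, not the snsxt implementation: the `FakeJob` class, its `is_running()` method, and the `monitor_background_jobs()` helper are all hypothetical names introduced here, standing in for whatever qsub job objects the program actually stores in the list.

```python
import time

# Hypothetical stand-in for a qsub job object; the real snsxt job class is not shown here.
class FakeJob:
    def __init__(self, job_id, polls_until_done):
        self.job_id = job_id
        self._polls_left = polls_until_done

    def is_running(self):
        # Pretend the job finishes after a fixed number of status checks.
        if self._polls_left > 0:
            self._polls_left -= 1
            return True
        return False

# Module-level list, mirroring the attribute described above.
background_jobs = []

def monitor_background_jobs(jobs, poll_interval=0.01):
    """Block until no job in `jobs` reports itself as running; return finished job IDs."""
    finished = []
    pending = list(jobs)
    while pending:
        still_running = []
        for job in pending:
            if job.is_running():
                still_running.append(job)
            else:
                finished.append(job.job_id)
        pending = still_running
        if pending:
            time.sleep(poll_interval)
    return finished

# A task submits jobs without waiting and records them for later monitoring.
background_jobs.append(FakeJob("101", polls_until_done=2))
background_jobs.append(FakeJob("102", polls_until_done=0))

# After all tasks have run, wait for everything that is still outstanding.
finished_ids = monitor_background_jobs(background_jobs)
```

Keeping the list at module level means any task can append to it without needing a reference to the task runner, at the cost of shared mutable state.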
snsxt.mail module
Sends email output of the pipeline results
snsxt.mail.check_default_address(address, server, default_key='__self__')
Checks if the provided address matches the default_key, and if so, returns a default email address made from the username of the user running the program + the server.
Parameters:
- address (str) – email address(es) in the format 'email1@server.com,email2@server.com'
- server (str) – email server to use for a default email address
- default_key (str) – value to use for recognizing when a default address should be returned
Returns: either the original address string, or an email address composed of the user's system name + server
Return type: str
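A minimal sketch of the logic described for check_default_address, assuming the username is read with the standard library's `getpass.getuser()` and joined to the server with `@`. The function name `check_default_address_sketch` is hypothetical and the real implementation may differ.

```python
import getpass

def check_default_address_sketch(address, server, default_key='__self__'):
    """Return `address` unchanged unless it equals `default_key`."""
    if address == default_key:
        # Assumption: the default address is '<system username>@<server>'.
        return '{0}@{1}'.format(getpass.getuser(), server)
    return address

# An explicit address list passes through untouched.
unchanged = check_default_address_sketch('user1@server.com,user2@server.com', 'nyumc.org')
```

Calling the function with `'__self__'` as the address would instead return the current user's name followed by `@nyumc.org`.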
snsxt.mail.email_error_output(message_file, *args, **kwargs)
Sends an email in the event that errors occurred during the analysis.
Parameters: message_file (str) – path to a file to use as the body of the email, typically the program's log file
Keyword Arguments:
- subject_line (str) – the subject line that should be used for the email
- recipient_list (str) – the recipients for the email, in the format recipient_list = "user1@server.com,user2@server.com"
snsxt.mail.email_files = []
This list should contain file paths output by analysis tasks for inclusion as email attachments at the end of a successful analysis pipeline. It should be accessed by other parts of the program external to this module.
Examples
Example usage:
    task_output_file = 'foo.txt'
    mail.email_files.append(task_output_file)
snsxt.mail.email_output(message_file, *args, **kwargs)
Sends an email upon the successful completion of the analysis pipeline. If any email_files were set by the program while running, they will be validated and included as email attachments.
Parameters:
- message_file (str) – path to a file to use as the body of the email, typically the program's log file
- args (list) – a list containing extra args to pass to email_output()
- kwargs (dict) – a dictionary containing extra args to pass to email_output()
Keyword Arguments:
- recipient_list (str) – the recipients for the email, in the format recipient_list = "user1@server.com,user2@server.com"
- reply_to (str) – email address to use in the 'Reply To' field of the email
- subject_line (str) – the subject line that should be used for the email
snsxt.mail.sns_start_email(analysis_dir, **kwargs)
Emails the user when the sns pipeline starts
Parameters:
- analysis_dir (str) – path to a directory to hold the analysis output
- kwargs (dict) – dictionary containing extra args to pass to run_tasks
snsxt.mail.validate_email_files()
Makes sure all the items in the email_files list exist and are considered valid for inclusion in email output
Notes
Since the email output is sent by an external program such as mutt, it is important that file attachments be valid before attempting to include them, because an invalid attachment makes it more difficult to ensure that the email is sent successfully.
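One way such a check could look is sketched below (Python 3). It assumes that "valid" means an existing, non-empty regular file; the source does not spell out the actual criteria, and `validate_email_files_sketch` is a hypothetical name.

```python
import os
import tempfile

def validate_email_files_sketch(paths):
    """Keep only paths that point to existing, non-empty regular files."""
    valid = []
    for path in paths:
        # Assumption: an attachment is usable if it is a regular file with content.
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            valid.append(path)
    return valid

with tempfile.TemporaryDirectory() as tmp_dir:
    good_file = os.path.join(tmp_dir, 'report.txt')
    empty_file = os.path.join(tmp_dir, 'empty.txt')
    missing_file = os.path.join(tmp_dir, 'missing.txt')
    with open(good_file, 'w') as handle:
        handle.write('results\n')
    open(empty_file, 'w').close()

    checked = validate_email_files_sketch([good_file, empty_file, missing_file])
    valid_names = [os.path.basename(path) for path in checked]
```

Filtering before the mail command is built keeps a single bad path from affecting the whole message.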
snsxt.run module
Runs a series of analysis tasks
Originally designed as an extension to the sns pipeline output, with the flexibility to add ad hoc extra analysis tasks for downstream processing
snsxt.run.configs
= {'analysis_id_file': 'analysis_id.txt', 'tasks_config_dir': 'config', 'report_compile_script': '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest/snsxt/compile_RMD_report.R', 'GATK_summary_file': 'VCF-GATK-HC-annot.all.txt', 'LoFreq_summary_file': 'VCF-LoFreq-annot.all.txt', 'MuTect2_annot_file': 'VCF-MuTect2-annot.all.txt', 'results_id_file': 'results_id.txt', 'sns_repo_dir': '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest/snsxt/sns', 'MuTect2_summary_file': 'summary.VCF-MuTect2-annot.csv', 'tasks_files_dir': 'files', 'notification_recipients': '__self__', 'sns_route': 'wes', 'main_report': '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest/snsxt/report/analysis_report.Rmd', 'snsxt_parent_dir': '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest', 'tasks_reports_dir': 'reports', 'success_recipients': '__self__', 'snsxt_dir': '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest/snsxt', 'report_dir': '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest/snsxt/report', 'samples_fastq_raw_file': 'samples.fastq-raw.csv', 'samples_pairs_file': 'samples.pairs.csv', 'reply_to_server': 'nyumc.org', 'extra_handlers': [<logging.FileHandler object>, <logging.FileHandler object>], 'success_subject_line_base': '[NGS580] [Success]', 'error_recipients': '__self__', 'mail_files': ['RunParameters.xml', 'RunParameters.txt', 'summary-combined.wes.csv'], 'GATK_HC_annot_file': 'summary.VCF-GATK-HC-annot.csv', 'notification_subject_line_base': '[NGS580] [Update]', 'tasks_scripts_dir': 'scripts', 'Strelka_annot_file': 'VCF-Strelka-annot.all.txt', 'email_recipients': 'kellys04@nyumc.org', 'summary_combined_file': 'summary-combined.wes.csv', 'Strelka_summary_file': 'summary.VCF-Strelka-annot.csv', 'sns_pairs_route': 'wes-pairs-snv', 'error_subject_line_base': '[NGS580] [Error]', 'tasks_sns_repo_dir': 'sns', 'LoFreq_annot_file': 'summary.VCF-LoFreq-annot.csv', 
'report_files': ['report_tools.R', 'report_config.yml', 'report_styles.css', 'summary_report.Rmd', 'variant_report.Rmd', 'paired_variant_report.Rmd']}
The main configurations dictionary to use for settings throughout the program. The sns_repo_dir value is modified at program run time by prepending the snsxt_dir path (the path to this script's directory). Other dict keys are set at program run time as well, including snsxt_parent_dir, snsxt_dir, and extra_handlers.
snsxt.run.default_probes = '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest/probes.bed'
A .bed formatted file to use by default for CNV analysis. Must have only 3 tab-delimited columns.
snsxt.run.default_targets = '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest/targets.bed'
A .bed formatted file to use by default as the target regions for variant calling
snsxt.run.default_task_list = '/home/docs/checkouts/readthedocs.org/user_builds/snsxt/checkouts/latest/task_lists/default.yml'
The YAML formatted task list containing analysis tasks to be run by default
snsxt.run.email_logpath()
Returns the path to the email log file; needed by the logging.yml config file
This generates dynamic output log file paths & names
Returns: a Python logging FileHandler object configured with a log file path set dynamically at program run time
Return type: logging.FileHandler
snsxt.run.extra_handlers = [<logging.FileHandler object>, <logging.FileHandler object>]
Python logging FileHandlers to be passed throughout the program, in order to keep all submodules logging to the same file(s) set by logpath() and email_logpath()
snsxt.run.get_task_list(task_list_file)
Reads the task_list from a YAML formatted file
Parameters: task_list_file (str) – the path to a YAML formatted file from which to read analysis tasks
Returns: a dictionary containing the contents of the YAML task_list_file
Return type: dict
snsxt.run.logpath()
Returns the path to the main log file; needed by the logging.yml config file
This generates dynamic output log file paths & names
Returns: a Python logging FileHandler object configured with a log file path set dynamically at program run time
Return type: logging.FileHandler
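As a rough sketch of a handler factory of this kind, the function below builds a `logging.FileHandler` whose file name includes a timestamp taken at call time. The name `timestamped_logpath`, the file name pattern, and the use of the system temp directory are assumptions made for illustration; they are not taken from the snsxt source.

```python
import logging
import os
import tempfile
from datetime import datetime

def timestamped_logpath(log_dir, prefix='run'):
    """Return a FileHandler whose log file name is built at call time."""
    timestamp = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
    log_file = os.path.join(log_dir, '{0}.{1}.log'.format(prefix, timestamp))
    # delay=True postpones opening the file until the first record is emitted.
    return logging.FileHandler(log_file, delay=True)

handler = timestamped_logpath(tempfile.gettempdir())
log_file_path = handler.baseFilename
```

A logging configuration file can reference a callable like this as a handler factory, which is how a path that is only known at run time can still be used from a static config.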
snsxt.run.main(**kwargs)
Main control function for the program
Parameters: kwargs (dict) – dictionary containing args to run the program, expected to be passed from parse() and passed on to run_sns_tasks()
Keyword Arguments:
- analysis_id (str) – an identifier for the analysis (e.g. the NextSeq run ID)
- results_id (str) – a sub-identifier for the analysis (e.g. a timestamp)
- task_list_file (str) – the path to a YAML formatted file containing analysis tasks to be run
- debug_mode (bool) – prevents the program from halting if errors are found in qsub log output files; defaults to False. True = do not stop for qsub log errors, False = stop if errors are found
- fastq_dirs (list) – a list of paths to directories to use as input data locations for a new sns analysis. These directories should contain .fastq.gz files within two levels from the top level of the dir (e.g. at most 2 subdirs deep). The .fastq.gz files contained in these directories should keep the exact filenames output by the NextSeq; sample parsing will take place automatically.
- targets_bed (str) – path to a .bed formatted file to use as the target regions for variant calling
- probes_bed (str) – path to a .bed formatted file to use as the probes for CNV analysis
- pairs_sheet (str) – path to a .csv samplesheet to use for matching tumor and normal samples in the paired variant calling analysis steps. See GitHub for example.
snsxt.run.parse()
Runs the program based on CLI arguments. Argument parsing happens here if the program was run as a script.
Returns: a dictionary of keyword arguments to pass to main()
Return type: dict
Examples
Example script usage:
    snsxt$ snsxt/run.py -d mini_analysis-controls/ -f mini_analysis-controls/fastq/ -a mini_analysis -r results1 -t task_lists/dev.yml --pairs_sheet mini_analysis-controls/samples.pairs.csv_usethis
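The pattern of parsing CLI flags into a keyword dictionary for main() might look something like the sketch below. The short flags are copied from the example command, but their mapping to the keyword names documented for main() is an assumption, and the `-d` flag is left out because its meaning is not documented here. `parse_sketch` is a hypothetical name.

```python
import argparse

def parse_sketch(argv=None):
    """Parse CLI arguments and return them as a dict of keyword arguments."""
    parser = argparse.ArgumentParser(description='Sketch of a CLI front end')
    # Assumed flag-to-keyword mapping, inferred from the example command only.
    parser.add_argument('-a', dest='analysis_id')
    parser.add_argument('-r', dest='results_id')
    parser.add_argument('-t', dest='task_list_file')
    parser.add_argument('-f', dest='fastq_dirs', nargs='+')
    parser.add_argument('--pairs_sheet', dest='pairs_sheet')
    # vars() turns the argparse Namespace into a plain dict.
    return vars(parser.parse_args(argv))

kwargs = parse_sketch([
    '-a', 'mini_analysis',
    '-r', 'results1',
    '-t', 'task_lists/dev.yml',
    '-f', 'mini_analysis-controls/fastq/',
])
```

The resulting dictionary could then be unpacked into a call such as `main(**kwargs)`.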
snsxt.setup_report module
Sets up and compiles the parent analysis report for the pipeline output
snsxt.setup_report.compile_RMD_report(input_file)
Compiles a .Rmd format report using the R script set in the configs.
Returns: the tools.SubprocessCmd object for the shell command that was run to execute the report compilation script
Return type: SubprocessCmd
snsxt.setup_report.get_main_report_file()
Gets the path to the main parent report .Rmd file which should be used to compile the analysis report.
Returns: the path to the parent .Rmd file to use in compiling the report
Return type: str
snsxt.test module
Runs all the unit tests found throughout the program
snsxt.validation module
Functions for validating aspects of the pipeline
snsxt.validation.background_output_files = []
By default, a task will validate its expected output files upon task completion. However, tasks that submit qsub jobs and do not wait for them to complete will not be able to validate their expected output files. Instead, the paths to those expected files will be collected in this list, and they will be evaluated once all qsub jobs have been monitored to completion and validated.
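The deferred check described above can be illustrated with a short sketch (Python 3). It assumes validation simply means checking that each expected path exists; the actual criteria used by snsxt are not stated here, and `find_missing_outputs` is a hypothetical helper.

```python
import os
import tempfile

# Module-level list, mirroring the attribute described above.
background_output_files = []

def find_missing_outputs(expected_paths):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_paths if not os.path.exists(path)]

with tempfile.TemporaryDirectory() as tmp_dir:
    done_file = os.path.join(tmp_dir, 'sample1.vcf')
    absent_file = os.path.join(tmp_dir, 'sample2.vcf')

    # A task that does not wait for its qsub jobs registers its expected outputs.
    background_output_files.extend([done_file, absent_file])

    # In this demo only one of the jobs actually writes its output file.
    with open(done_file, 'w') as handle:
        handle.write('placeholder\n')

    # Once all background jobs have finished, evaluate the collected paths.
    missing = [os.path.basename(path)
               for path in find_missing_outputs(background_output_files)]
```

Here `missing` ends up holding the one file name whose job never produced output, which is the case a pipeline would want to report as an error.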