🧪

Community Workflows: nf-core

Overview

nf-core provides a collection of open-source pipelines and workflows. On Form Bio, nf-core workflows can be launched via API/SDK or via Web Walkthrough.

image

nf-core Launch via API/SDK

You can launch nf-core workflows programmatically using the Form Bio CLI/SDK tool to call the Form Bio API.

  1. Upload any relevant input data/files
# Upload files to Form Bio project
$ formbio storage cp -r ./local-files/sequences formbio://${org}/${project}
  2. Create input parameters for a given nf-core workflow as a JSON params list
    1. You can use nf-core Launch pipeline to create the JSON parameters.
    2. Note: any input files should use the URI scheme formbio://${org}/${project}/${filepath}
  3. You can then launch a workflow via the API using the Form Bio CLI/SDK:
    • See docs on how to use the Form Bio CLI/SDK to run workflows via the API
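
The URI convention from the steps above can be sketched as a small shell helper. `formbio_uri` is a hypothetical function, not part of the Form Bio CLI, and it assumes the org-then-project ordering used by the upload command and by the example runs on this page.

```shell
# Hypothetical helper, not part of the Form Bio CLI: build a formbio:// URI
# from an org, a project, and a file path inside the project.
formbio_uri() {
    org=$1
    project=$2
    filepath=${3#/}   # drop one leading slash so the path has no empty segment
    printf 'formbio://%s/%s/%s\n' "$org" "$project" "$filepath"
}

# Prints formbio://formbio/formbio-workflows/nf-core-data/taxprofiler/samplesheet.csv
formbio_uri formbio formbio-workflows nf-core-data/taxprofiler/samplesheet.csv
```

The resulting string is what goes into the JSON params for any file-valued parameter.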

For example, to launch nf-core/bamtofastq workflow:

# Run nf-core/bamtofastq workflow
# (--params carries the nf-core JSON input params)
$ formbio workflow run \
--run-name 'nf-core_bamtofastq_re-run_1' \
--org formbio \
--project formbio-workflows \
--repo nf-core \
--workflow formbio/formbio-workflows/nf-core \
--version main \
--execution-engine nextflow \
-- \
--outdir='{{formbio.params.output}}' \
--params='{
    "input": "https://raw.githubusercontent.com/nf-core/test-datasets/bamtofastq/samplesheet/test_bam_samplesheet.csv"
}' \
--workflow='nf-core/bamtofastq' \
--workflowVersion='2.1.0'
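
One way to make the command easier to edit is to hold the JSON in a shell variable and pass it as `--params="$params"`. The sketch below reuses the flags and values from the example above. `launch_bamtofastq` is a hypothetical wrapper name, and nothing is launched until the function is called.

```shell
# Keep the nf-core JSON params in a variable so the command itself stays
# one unbroken chain of line continuations.
params=$(cat <<'EOF'
{
    "input": "https://raw.githubusercontent.com/nf-core/test-datasets/bamtofastq/samplesheet/test_bam_samplesheet.csv"
}
EOF
)

# Hypothetical wrapper: same flags and values as the example above.
# Defining the function does not run anything; call it to launch.
launch_bamtofastq() {
    formbio workflow run \
        --run-name 'nf-core_bamtofastq_re-run_1' \
        --org formbio \
        --project formbio-workflows \
        --repo nf-core \
        --workflow formbio/formbio-workflows/nf-core \
        --version main \
        --execution-engine nextflow \
        -- \
        --outdir='{{formbio.params.output}}' \
        --params="$params" \
        --workflow='nf-core/bamtofastq' \
        --workflowVersion='2.1.0'
}
```

A comment line placed between two backslash-continued lines ends the command early, so any notes about the JSON are safer kept above the command or next to the variable.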

Another example launches nf-core/taxprofiler using input files stored on Form Bio:

# Run nf-core/taxprofiler workflow
# (--params carries the nf-core JSON input params)
$ formbio workflow run \
--run-name 'nf-core-tax-profiler' \
--org formbio \
--project formbio-workflows \
--repo nf-core \
--workflow formbio/formbio-workflows/nf-core \
--version main \
--execution-engine nextflow \
-- \
--outdir='{{formbio.params.output}}' \
--params='{
  "input": "formbio://formbio/formbio-workflows/nf-core-data/taxprofiler/samplesheet.csv",
  "databases": "formbio://formbio/formbio-workflows/nf-core-data/taxprofiler/database_v1.1.csv",
  "perform_shortread_qc": true,
  "perform_longread_qc": true,
  "shortread_qc_mergepairs": true,
  "perform_shortread_complexityfilter": true,
  "perform_shortread_hostremoval": true,
  "perform_longread_hostremoval": true,
  "perform_runmerging": true,
  "hostremoval_reference": "https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.fasta",
  "run_kaiju": true,
  "run_kraken2": true,
  "run_bracken": true,
  "run_malt": false,
  "run_metaphlan": true,
  "run_centrifuge": true,
  "run_diamond": true,
  "run_krakenuniq": true,
  "run_motus": false,
  "run_ganon": true,
  "run_krona": true,
  "run_kmcp": true,
  "kmcp_mode": 0,
  "krona_taxonomy_directory": "https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/metagenome/krona_taxonomy.tab",
  "malt_save_reads": true,
  "kraken2_save_reads": true,
  "centrifuge_save_reads": true,
  "run_profile_standardisation": true
}' \
--workflow='nf-core/taxprofiler' \
--workflowVersion='1.1.2'

🔔 WATCH OUT for special characters!

Certain characters in parameter values can cause errors. For example, in the nf-core ampliseq workflow, an "=" symbol within the value of the dada_ref_taxonomy parameter (e.g., coidb=221216) could crash the workflow.

  • 👍 Replace the = character with its Unicode escape, \u003D
  • 🙀 Put the commands below in a shell script and run it, passing the parameter value as the first argument
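#!/usr/bin/env bash
# Pass the taxonomy value as the first argument, e.g. coidb=221216.
# The ${var//pattern/replacement} expansion used below requires bash.
json_param="$1"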

formbio workflow run \
--run-name 'ampliseq_nf_core_re-run_10' \
--org form-bio-customer-support \
--project onboarding-project \
--repo nf-core \
--workflow formbio/formbio-workflows/nf-core \
--version main \
--execution-engine nextflow \
-- \
--outdir='{{formbio.params.output}}' \
--params='{ 
    "input":"formbio://form-bio-customer-support/onboarding-project/ampliseq_nf_core/samplesheet_ampliseq.csv", 
    "max_cpus": 2, 
    "max_memory":"6.GB", 
    "dada_ref_taxonomy": "'"${json_param//=/\\u003D}"'", 
    "skip_cutadapt": true 
}' \
--workflow='nf-core/ampliseq' \
--workflowVersion='2.8.0'
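
The substitution can be checked on its own before launching a run. The sketch below is an illustration only: `escape_equals` is a hypothetical helper, and it uses sed in place of the bash-only `${var//=/...}` expansion from the script above so that it also runs under plain sh.

```shell
# Hypothetical helper: replace every "=" with the JSON Unicode escape \u003D.
escape_equals() {
    # In the sed replacement, "\\" produces one literal backslash.
    printf '%s\n' "$1" | sed 's/=/\\u003D/g'
}

escape_equals 'coidb=221216'   # prints coidb\u003D221216
```

A JSON parser decodes \u003D back to "=", so the workflow still receives the original value.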

nf-core Launch via Web Walkthrough

1. Navigate to the nf-core launcher card.

  • Find the nf-core workflow via either:
    • Inside Workflows, scroll down the categories and click nf-core to display it
    • The search bar in the top right corner, available once you click Launch
image

2. Launcher Tabs

Three things to set before running an nf-core workflow:

a. Workflow name

  • Search for the desired nf-core workflow. The nf-core workflow repository supports the variety of workflows available on nf-co.re.
image

b. Workflow version

  • Search for the version of the selected workflow
image

c. Parameters JSON

  • Step 1: To determine the required parameters, open the documentation for the selected workflow and version using the link shown below.
image
  • Step 2: Check desired/required Parameters

Navigate to the Parameters tab to see the list of parameters available for the JSON input.

  • Parameters marked "required" on the right side are compulsory. The workflow cannot run without them.
  • All other parameters are optional and depend on your specific needs.
  • outdir does not need to be defined, since the Form Bio platform handles it automatically.
image
  • Step 3: Assign a value to each desired/required parameter and put the parameters in JSON format
{
  "input": "formbio://form-bio-customer-support/onboarding-project/rnaseq_nfcore/rnaseq_nf_core/samplesheet_test.csv", 
  "fasta": "formbio://form-bio-customer-support/onboarding-project/rnaseq_nfcore/rnaseq_nf_core/genome.fasta", 
  "gtf": "formbio://form-bio-customer-support/onboarding-project/rnaseq_nfcore/rnaseq_nf_core/genes_with_empty_tid.gtf.gz"
}

  • Add the created JSON to Parameters JSON on the platform
image

🚧 Note:

  • Any file or directory parameter should point to its formbio:// path
  • Paths written inside a file (for example, a samplesheet) should be formatted as described below:
    • The platform can handle different types of data paths, including those from public datasets (S3 buckets) and web addresses (HTTP).
    • If your data is already uploaded onto the platform, paths must be in Google Cloud Storage format.
    • # Example of correctly formatted paths in a samplesheet.csv
      patient,sample,lane,fastq_1,fastq_2
      ID1,S1,L002,gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_1.fastq.gz,gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_2.fastq.gz
      ID2,S2,L002,gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663419_N1_1.fastq.gz,gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663419_N1_2.fastq.gz

      📖 To generate the GCS format path

      🤏 Go to Data → Find desired input files → Click three-dots → Select Copy Download URL

image

👍 This generates a download URL:

https://storage.googleapis.com/formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_1.fastq.gz?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=go-api%40tundra-prd.iam.gserviceaccount.com%2F20240220%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240220T065959Z&X-Goog-Expires=604799&X-Goog-Signature=8b47d09eeda7d5c3d50ca44e93390f76acf5e3e576617a622686060a4b66c52d45fb293534ddeae725593281133542f1f048deda3c8fd328b84471c103a170d89e8f684b4f6154578d48fd39481d88bc1e04ffb604f87d6a8eed7c56171912c8ea21fcae409409397d0382589695a1965e4b838ba692068d90323bfa4edbfb0b26dae5ac1513b7ae628293368c13083859f375c9c917b4040e9fe38d711923153805045093cf4f2149ea7fcb61b13f65f9f5e6ab165be1e5d2718fd335f982feecb6b307b5bb588b56424b3b59b943df5ef91ea86532814f27cf28ec45e6fc45313fc9d2440d6d5725b6d88d35784ad91dc1e71169c9a60135adafcb49c4676c&X-Goog-SignedHeaders=host&response-content-disposition=attachment
  • 👉 Replace https://storage.googleapis.com/ with gs://
  • 👉 Remove the first ? and everything after it
  • 🙀 The final result should be:

gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_1.fastq.gz
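
The two replacement steps above can be sketched as a small shell function. `to_gs_uri` is a hypothetical helper name, and the sample URL below is the one from this page with its query string shortened for readability.

```shell
# Hypothetical helper: turn a "Copy Download URL" link into a gs:// path.
to_gs_uri() {
    url=${1%%\?*}                                 # drop the first "?" and everything after it
    rest=${url#https://storage.googleapis.com/}   # drop the HTTPS host prefix
    printf 'gs://%s\n' "$rest"
}

# Prints gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_1.fastq.gz
to_gs_uri 'https://storage.googleapis.com/formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_1.fastq.gz?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-SignedHeaders=host'
```

Quote the URL in single quotes when calling the function, since the unquoted & characters in the query string would otherwise be interpreted by the shell.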

💯 Parameters JSON can be created using nf-core website

  • You can use nf-co.re Launch to see which fields to populate in the JSON input field for this workflow.
    • For rnaseq, go to https://nf-co.re/rnaseq/3.14.0 → click Launch version <latest> (here, 3.14.0) to see which fields to include in the JSON
    • Click Launch version 3.14.0
    • Fill in the parameter values. A red asterisk marks required parameters, which must be filled in. Note: outdir must be omitted from the final result.
    • Finally, remove outdir and add the generated JSON to Parameters JSON
# Final JSON format
{
    "input": "formbio:\/\/form-bio-customer-support\/onboarding-project\/rnaseq_nfcore\/rnaseq_nf_core\/samplesheet_test.csv",
    "fasta": "formbio:\/\/form-bio-customer-support\/onboarding-project\/rnaseq_nfcore\/rnaseq_nf_core\/genome.fasta",
    "gtf": "formbio:\/\/form-bio-customer-support\/onboarding-project\/rnaseq_nfcore\/rnaseq_nf_core\/genes_with_empty_tid.gtf.gz"
}

3. Go to Review & Submit

  • ✍️ Add a run name
  • ✅ Check all inputs again
  • 😃 Click Run Workflow to execute
image