🧪

Community Workflows: nf-core

Overview

nf-core provides a collection of open-source pipelines and workflows. These workflows can be launched via the API/SDK or via the Web Walkthrough.


nf-core Launch via API/SDK

You can launch nf-core workflows programmatically using the Form Bio CLI/SDK tool to call the Form Bio API.

  1. Upload any relevant input data/files
# Upload files to Form Bio project
$ formbio storage cp -r ./local-files/sequences formbio://${org}/${project}
  2. Create the input parameters for the chosen nf-core workflow as a JSON params object
    1. You can use the nf-core Launch pipeline tool to create the JSON parameters.
    2. Note: any input files should use the URI scheme formbio://${org}/${project}/${filepath}
  3. Launch the workflow via the API using the Form Bio CLI/SDK:
    • See docs on how to use the Form Bio CLI/SDK to run workflows via the API
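
The input-file URIs described above can be assembled with a small helper. This is only a sketch: formbio_uri is a hypothetical shell function, not part of the Form Bio CLI, and it assumes the organization comes before the project, as in the upload command and the examples on this page.

```shell
# Hypothetical helper (not part of the Form Bio CLI): build a formbio:// URI.
# Assumed order: organization, then project, then the file path.
formbio_uri() {
    org="$1"
    project="$2"
    path="${3#/}"   # drop one leading slash so the URI has no "//" in the middle
    printf 'formbio://%s/%s/%s\n' "$org" "$project" "$path"
}

# Example: the samplesheet URI used in the taxprofiler example on this page
formbio_uri formbio formbio-workflows nf-core-data/taxprofiler/samplesheet.csv
```

The printed value can be pasted into the JSON params wherever a workflow expects an input file.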

For example, to launch the nf-core/bamtofastq workflow:

# Run nf-core/bamtofastq workflow
# (--params holds the nf-core JSON input params)
$ formbio workflow run \
--run-name 'nf-core_bamtofastq_re-run_1' \
--org formbio \
--project formbio-workflows \
--repo nf-core \
--workflow formbio/formbio-workflows/nf-core \
--version main \
--execution-engine nextflow \
-- \
--outdir='{{formbio.params.output}}' \
--params='{
    "input": "https://raw.githubusercontent.com/nf-core/test-datasets/bamtofastq/samplesheet/test_bam_samplesheet.csv"
}' \
--workflow='nf-core/bamtofastq' \
--workflowVersion='2.1.0'

Another example, launching nf-core/taxprofiler with Form Bio input files:

# Run nf-core/taxprofiler workflow
# (--params holds the nf-core JSON input params)
$ formbio workflow run \
--run-name 'nf-core-tax-profiler' \
--org formbio \
--project formbio-workflows \
--repo nf-core \
--workflow formbio/formbio-workflows/nf-core \
--version main \
--execution-engine nextflow \
-- \
--outdir='{{formbio.params.output}}' \
--params='{
  "input": "formbio://formbio/formbio-workflows/nf-core-data/taxprofiler/samplesheet.csv",
  "databases": "formbio://formbio/formbio-workflows/nf-core-data/taxprofiler/database_v1.1.csv",
  "perform_shortread_qc": true,
  "perform_longread_qc": true,
  "shortread_qc_mergepairs": true,
  "perform_shortread_complexityfilter": true,
  "perform_shortread_hostremoval": true,
  "perform_longread_hostremoval": true,
  "perform_runmerging": true,
  "hostremoval_reference": "https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.fasta",
  "run_kaiju": true,
  "run_kraken2": true,
  "run_bracken": true,
  "run_malt": false,
  "run_metaphlan": true,
  "run_centrifuge": true,
  "run_diamond": true,
  "run_krakenuniq": true,
  "run_motus": false,
  "run_ganon": true,
  "run_krona": true,
  "run_kmcp": true,
  "kmcp_mode": 0,
  "krona_taxonomy_directory": "https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/metagenome/krona_taxonomy.tab",
  "malt_save_reads": true,
  "kraken2_save_reads": true,
  "centrifuge_save_reads": true,
  "run_profile_standardisation": true
}' \
--workflow='nf-core/taxprofiler' \
--workflowVersion='1.1.2'

🔔 WATCH OUT for special characters!

Specific characters in parameter values can cause errors. For example, in the nf-core/ampliseq workflow, an "=" symbol within the value of the dada_ref_taxonomy parameter (e.g., coidb=221216) can crash the workflow.

  • 👍 Replace the = character with its Unicode escape sequence, \u003D
  • 🙀 Put the commands below in a shell script and run it, passing the parameter value as the first argument
json_param="$1"

formbio workflow run \
--run-name 'ampliseq_nf_core_re-run_10' \
--org form-bio-customer-support \
--project onboarding-project \
--repo nf-core \
--workflow formbio/formbio-workflows/nf-core \
--version main \
--execution-engine nextflow \
-- \
--outdir='{{formbio.params.output}}' \
--params='{
    "input": "formbio://form-bio-customer-support/onboarding-project/ampliseq_nf_core/samplesheet_ampliseq.csv",
    "max_cpus": 2,
    "max_memory": "6.GB",
    "dada_ref_taxonomy": "'"${json_param//=/\\u003D}"'",
    "skip_cutadapt": true
}' \
--workflow='nf-core/ampliseq' \
--workflowVersion='2.8.0'
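
The script above relies on the ${var//pattern/replacement} expansion, which is a bash feature. As a rough sketch, the same substitution can be written as a standalone function with sed, which makes it easy to inspect the escaped value before launching. The name escape_equals is hypothetical and not part of any Form Bio tooling.

```shell
# Hypothetical helper: replace every "=" in a value with the JSON escape \u003D.
escape_equals() {
    printf '%s' "$1" | sed 's/=/\\u003D/g'
}

# Inspect the escaped value before putting it into the params JSON
printf '%s\n' "$(escape_equals 'coidb=221216')"   # prints coidb\u003D221216
```

A JSON parser decodes \u003D back to "=", so the workflow should still receive the original value.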

nf-core Launch via Web Walkthrough

1. Navigate to the nf-core launcher card.

  • Find the nf-core workflow via either:
    • Workflows: scroll down through the categories and click nf-core to display it
    • The search bar in the top right corner, available once you click Launch

2. Launcher Tabs

Three things to set before running an nf-core workflow:

a. Workflow name

  • Search for the desired nf-core workflow. The nf-core workflow repository supports a variety of workflows available on nf-co.re.

b. Workflow version

  • Search for the version of the selected workflow

c. Parameters JSON

  • Step 1: To determine the required parameters, follow the link to the documentation of the selected workflow and version.
  • Step 2: Check the desired/required parameters

Navigate to the Parameters tab to see the list of parameters required for the JSON input.

  • Compulsory parameters are marked "required" on the right side. These are needed for the workflow to run.
  • Others are optional, depending on your specific needs.
  • outdir does not need to be defined, since the Form Bio platform handles it automatically.
  • Step 3: Assign a value to each desired/required parameter and put the parameters in JSON format
{
  "input": "formbio://form-bio-customer-support/onboarding-project/rnaseq_nfcore/rnaseq_nf_core/samplesheet_test.csv", 
  "fasta": "formbio://form-bio-customer-support/onboarding-project/rnaseq_nfcore/rnaseq_nf_core/genome.fasta", 
  "gtf": "formbio://form-bio-customer-support/onboarding-project/rnaseq_nfcore/rnaseq_nf_core/genes_with_empty_tid.gtf.gz"
}

  • Add the created JSON to Parameters JSON on the platform

🚧 Note:

  • Any file or directory path should point to its formbio:// path
  • Within a file (for example, a samplesheet), any path should be formatted as described below:
    • The platform can handle different types of data paths, including those from public datasets (S3 buckets) and web addresses (HTTP).
    • If your data is already uploaded to the platform, paths must be in Google Cloud Storage (gs://) format.
    • # Example of correctly formatted paths in a samplesheet.csv
      patient,sample,lane,fastq_1,fastq_2
      ID1,S1,L002,gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_1.fastq.gz,gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_2.fastq.gz
      ID2,S2,L002,gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663419_N1_1.fastq.gz,gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663419_N1_2.fastq.gz

      📖 To generate the GCS format path

      🤏 Go to Data → Find desired input files → Click three-dots → Select Copy Download URL


👍 This generates a download URL:

https://storage.googleapis.com/formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_1.fastq.gz?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=go-api%40tundra-prd.iam.gserviceaccount.com%2F20240220%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240220T065959Z&X-Goog-Expires=604799&X-Goog-Signature=8b47d09eeda7d5c3d50ca44e93390f76acf5e3e576617a622686060a4b66c52d45fb293534ddeae725593281133542f1f048deda3c8fd328b84471c103a170d89e8f684b4f6154578d48fd39481d88bc1e04ffb604f87d6a8eed7c56171912c8ea21fcae409409397d0382589695a1965e4b838ba692068d90323bfa4edbfb0b26dae5ac1513b7ae628293368c13083859f375c9c917b4040e9fe38d711923153805045093cf4f2149ea7fcb61b13f65f9f5e6ab165be1e5d2718fd335f982feecb6b307b5bb588b56424b3b59b943df5ef91ea86532814f27cf28ec45e6fc45313fc9d2440d6d5725b6d88d35784ad91dc1e71169c9a60135adafcb49c4676c&X-Goog-SignedHeaders=host&response-content-disposition=attachment
  • 👉 Replace https://storage.googleapis.com/ with gs://
  • 👉 Remove the first ? and everything after it
  • 🙀 The final result should be:

gs://formbio-production-26a68935-d9a6-4b3e-9b89-43dacfbf5e32/DNAseq/cancergenomics/SRR15663418_T1_1.fastq.gz
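
The two replacement steps above can be sketched as a small shell function. to_gs_path is a hypothetical name, and the function only assumes that the copied download URL starts with https://storage.googleapis.com/, as in the example shown.

```shell
# Hypothetical helper: turn a copied download URL into a gs:// path.
to_gs_path() {
    url=${1%%\?*}   # remove the first "?" and everything after it
    case "$url" in
        https://storage.googleapis.com/*)
            # swap the https prefix for gs://
            printf 'gs://%s\n' "${url#https://storage.googleapis.com/}"
            ;;
        *)
            echo "not a storage.googleapis.com URL: $1" >&2
            return 1
            ;;
    esac
}
```

Calling to_gs_path with the copied URL in single quotes (so the shell does not interpret the & characters) prints the path to paste into the samplesheet.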

💯 Parameters JSON can be created using the nf-core website

  • You can use nf-co.re Launch to find out which fields to populate in the Parameters JSON for this workflow.
    • For rnaseq, go to https://nf-co.re/rnaseq/3.14.0 → click Launch version <latest> (here it's 3.14.0) to see which fields to populate in the JSON
Click Launch version 3.14.0
Add value of parameter, red asterisk marks those required which must be filled. Note: outdir must be omitted in the final result.
Finally, you can remove outdir and add the generated JSON to Parameters JSON
# Final JSON format
{
    "input": "formbio:\/\/form-bio-customer-support\/onboarding-project\/rnaseq_nfcore\/rnaseq_nf_core\/samplesheet_test.csv",
    "fasta": "formbio:\/\/form-bio-customer-support\/onboarding-project\/rnaseq_nfcore\/rnaseq_nf_core\/genome.fasta",
    "gtf": "formbio:\/\/form-bio-customer-support\/onboarding-project\/rnaseq_nfcore\/rnaseq_nf_core\/genes_with_empty_tid.gtf.gz"
}
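
Because outdir must not appear in the final JSON, a quick pre-flight check can catch a leftover key before the JSON is pasted into the platform. The sketch below is a plain text search with grep, not a JSON parser, and check_no_outdir is a hypothetical name; it assumes the params JSON has been saved to a local file.

```shell
# Hypothetical pre-flight check: fail if a params JSON file still defines "outdir".
# This is a simple text match on the key, not a full JSON validation.
check_no_outdir() {
    if grep -q '"outdir"[[:space:]]*:' "$1"; then
        echo "remove \"outdir\" from $1 before submitting" >&2
        return 1
    fi
}
```

For example, check_no_outdir params.json returns a non-zero status and prints a reminder when the key is still present.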

3. Go to Review & Submit

  • ✍️ Add a run name
  • ✅ Check all inputs again
  • 😃 Click Run Workflow to execute