Form Bio Workflow Development Guide

  • Form Bio Reserved Nextflow Configuration values
  • Form Bio Reserved Nextflow Param values
  • Source code repository requirements
  • Create correct structure
  • Convert json schema
  • Update nextflow.config
  • Import Workflow
  • Importing a new or existing workflow
  • Monitor
  • Workflow JSON schema
  • Inputs
  • Field Inputs
  • UI

Form Bio Reserved Nextflow Configuration values

The following configuration is provided by the Form Bio platform at workflow runtime, and will be ignored if set by the workflow via nextflow.config or in main.nf.

🧑‍💻
Engineering Note: The Form Bio reserved configuration is provided via two different Nextflow configurations:

  1. formbio.config, generated at runtime by the Form Bio API, which provides runtime configuration specific to a Form Bio project (e.g. BYOBID context).
  2. $HOME/nextflow.config, provided as default configuration in the Form Bio Nextflow Docker container base image, gls-nextflow.

The base Nextflow head node Docker container is used to launch all workflows.

See the Nextflow Configuration documentation (Nextflow 23.10.0).

| Nextflow Configuration | Value | Description |
| --- | --- | --- |
| process.executor | "google-lifesciences" OR "google-batch" | The Nextflow executor used to run the workflow. This depends on whether the workflow is running via Google Lifesciences or Google Batch. |
| process.time | 30d | Sets a new default timeout for a process (7 days is the default). |
| executor.queueSize | 50 (default), OR a per-project override provided by the LaunchDarkly flag enable-larger-queue-size: https://app.launchdarkly.com/default/production/features/enable-larger-queue-size/targeting | The number of tasks (VMs) the executor will handle in parallel (the default for lifesciences is 1000). |
| google.storage.maxTransferAttempts | 10 | Increases retries for intermittent GCS API errors such as 503 Service Unavailable. Default is 0. https://www.nextflow.io/docs/latest/google.html?highlight=maxtransferattempts |
| google.project | tundra-prd (default) OR the BYO BID GCP project ID | The GCP project in which workflow tasks execute: the central production project tundra-prd, or, for BYO BID projects, that GCP project. |
| google.region | Google Batch Executor: us-central1 (or the BYO BID project's region). Batch only supports a single region to execute tasks in, but can use multiple zones: https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#LocationPolicy.FIELDS.allowed_locations Google Lifesciences Executor (load balanced across US regions): ["us-central1", "us-west2", "us-west4", "us-east1", "us-west1"] | GCP region(s) that the workflow tasks will execute in. |
| -work-dir | ./work | The local working directory for Nextflow tasks/steps, for processes running via executor=local Docker. |
| -bucket-dir | formbio://${org}/${project}/pipeline-outputs/${workflow-run-folder} | The working GCS directory for Nextflow tasks/steps, also used for intermediate staging of channel files and as the cache for resuming failed workflows. |
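
Because the reserved settings listed above are ignored when a workflow sets them, it can help to spot them before importing. The following is a minimal sketch, not a Form Bio tool: the function name is hypothetical, and it only detects the dotted assignment form (for example `process.executor = ...`), not settings nested inside `process { ... }` blocks.

```python
import re

# Reserved configuration keys taken from the list above.
RESERVED_CONFIG_KEYS = (
    "process.executor",
    "process.time",
    "executor.queueSize",
    "google.storage.maxTransferAttempts",
    "google.project",
    "google.region",
)


def find_reserved_assignments(config_text: str) -> list:
    """Return reserved keys that are assigned in dotted form in config_text.

    Assumption: only lines that start with "<key> =" are detected; block
    syntax and commented-out lines are not matched.
    """
    found = []
    for key in RESERVED_CONFIG_KEYS:
        pattern = rf"^\s*{re.escape(key)}\s*="
        if re.search(pattern, config_text, flags=re.MULTILINE):
            found.append(key)
    return found
```

Running this over the text of a repo's nextflow.config gives a quick list of settings that the platform would override at runtime.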

Form Bio Reserved Nextflow Param values

The following params are reserved and provided by the Form Bio platform and will be overwritten if provided by the workflow or as user-provided input params.

| Workflow Param | Value | Description |
| --- | --- | --- |
| --output (params.output) | formbio://${org}/${project}/pipeline-outputs/output/ | Form Bio (GCS) path for workflow output results. |
| --region (params.region) | Google Lifesciences Executor: us-central1,us-west2,us-west4,us-east1,us-west1. Google Batch Executor: us-central1 | GCP region to execute the workflow in, based on the BYOBID project. (This may not be used, as we're setting the config regions too.) |
| --bqLabels (params.bqLabels) | Example: "formbio-org":"sbx-uat","formbio-project-id":"40c9baa0-9156-4e5e-a784-f3ffffd022ef","formbio-user-id":"d4312620-981b-4276-b619-567bb9518e07","formbio-operation":"workflow_run" | Form Bio-provided GCP resource labels used to track and categorize cloud usage costs by Org/Project/Workflow/User. |
| --registry (params.registry) | gcr.io/bioinfo-devel | Docker container registry used by Form Bio managed workflows. (This is likely not relevant to BYO WF user-defined workflows.) |
| --cloudprj (params.cloudprj) | tundra-prd OR the BYOBID customer GCP project ID | Deprecated. Not sure if this is still being used, as we're overriding --registry, which by default in Form Bio managed workflows is params.registry = "gcr.io/${params.cloudprj}". |
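
Since reserved params are overwritten by the platform, a workflow input that reuses one of these names will not behave as a user might expect. The sketch below flags such collisions. It assumes that the keys of the "inputs" object in workflow.json correspond to Nextflow param names; the function name is hypothetical.

```python
# Reserved param names taken from the list above.
RESERVED_PARAMS = {"output", "region", "bqLabels", "registry", "cloudprj"}


def reserved_param_collisions(inputs: dict) -> list:
    """Return the input names that collide with reserved params, sorted.

    Assumption: each key of `inputs` is the name of a Nextflow param.
    """
    return sorted(name for name in inputs if name in RESERVED_PARAMS)
```

For example, passing the "inputs" object of a schema that defines both "output" and "fastq" fields would report only "output".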

Source code repository requirements

Create correct structure

  1. Ensure main.nf and nextflow.config live in the root of the repo.
  2. Create a “workflows” directory inside the repo, if one does not already exist.
  3. Inside that directory, create another directory with the same name as the ID of the workflow (if one does not already exist).
  4. Inside that directory, create a workflow.json file (the GUI input specification) and optional documentation files: overview.md, citations.md, inputs.md, and outputs.md.
    1. You can symlink any documentation that already exists in the repo here under the above names, or create these docs from scratch.
    2. The workflow.json schema is covered in detail below.

Convert json schema

Convert the repo’s existing json schema to a Form Bio workflow schema in workflow.json, using our Workflow JSON Schema Guide (below) to help you.

There is a script inside our workflow-schema repo that will help with this process if you have node installed on your machine. To use it:

  1. Clone the repo to your local machine
  2. cd into the repo: workflow-schema/
  3. Run sh scripts/json-schemaToV3.sh <path to json-schema file>
  4. This will output a workflow.json in our format inside the json-schema file’s parent directory
    1. It is important to then go through the output, fill in missing fields such as id, fix any field types that the conversion script may have missed, and verify that everything looks okay
    2. It can then be moved to the correct workflows/[id] directory

Some important notes:

  1. Ensure the value you give the id property matches the parent directory’s name
  2. If the schema includes a field for specifying an output directory, make sure to hide it
    1. Keep the id of the output param in the back of your mind for future steps
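
The notes above can be checked mechanically. The following is a minimal sketch with a hypothetical function name; the name of the output param is passed in by the caller because it differs per workflow.

```python
import json
from pathlib import Path
from typing import Optional


def check_schema_notes(schema_path: Path, output_param: Optional[str] = None) -> list:
    """Return a list of human-readable problems found in a workflow.json file."""
    schema = json.loads(schema_path.read_text())
    problems = []

    # Note 1: the id must match the name of the parent directory.
    if schema.get("id") != schema_path.parent.name:
        problems.append(
            f"id {schema.get('id')!r} does not match directory {schema_path.parent.name!r}"
        )

    # Note 2: an output-directory field, if present, should be hidden.
    if output_param is not None:
        field = schema.get("inputs", {}).get(output_param)
        if field is not None and not field.get("hidden", False):
            problems.append(f"output field {output_param!r} should have hidden set to true")
    return problems
```

An empty list means both notes are satisfied for that file.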

Update nextflow.config

Beneath the section where default params are defined, include a statement that remaps the Form Bio output param to the workflow's own output param:

params.out_dir = "${params.output}"

where out_dir is the workflow's param name.

Import Workflow

Our GitHub app allows for fast, observable automation driven by pushes to workflow repositories on GitHub. Once a workflow is imported, any push to that repo will trigger an upload of the workflow, versioned by the branch that was pushed to.

Importing a new or existing workflow

  1. Inside the Form Bio web app, navigate to the Manage section under Workflows.
  2. Select Create New.
  3. If you have not yet linked your GitHub account to the Form Bio GitHub App, click Sign in with GitHub and Authorize access.
  4. Once authorized, you will be redirected to the Web App and see an Import Workflow from GitHub screen.
  5. Select formbio under Select Account.
  6. Select the repository containing the workflow you would like to import under Select Repository
    1. If you do not see the repository you are looking for, you will need to install the App on it.
    2. To do this, open the Select Account dropdown again and select Add or Modify Installations.
    3. From there, select the formbio organization and the repositories you would like to install the app onto.
    4. Click Update Access.
    5. You will be redirected back to the Web App, where you can now select that repository.
  7. The Select Workflow list should automatically detect the workflows located within the repo’s workflows/ directory that contain a workflow.json for you.
  8. Click Configure on the workflow you’d like to import.
  9. The configure form should be pre-populated with the default values and paths. Changing any of these is strongly discouraged at this time.
  10. Click Import Workflow and you’ll be redirected to the main branch’s build in progress (see Monitor, step 6, below).

Monitor

  1. Inside the Form Bio web app, navigate to the Manage section under Workflows (See first step in Import).
  2. You’ll see a list of cards representing all workflows that have been uploaded to the current project (whether via CLI or GitHub App).
  3. Workflows imported via GitHub App will be indicated by a GH logo and a View Deployments button.
  4. Click View Deployments to see a record of all uploads since being imported. To see logs for a particular upload, click View Logs.
  5. You’ll be taken to a logs view, displaying data about the workflow, version, and build status.
  6. If an upload was successful, you’ll see a Go to Launch button that will take you to the docs page for that version, where you can Launch the workflow.

Workflow JSON schema

Form Bio schemas currently support the configuration of workflows as JSON. They define metadata for the workflow, the necessary and optional parameters needed for a successful run, how those parameters should be presented in forms in the web application, as well as conditionals that allow the schema to dynamically respond to certain input values being provided by users.

Here is a simple example schema:

{
	"id": "myWorkflowSchema",
	"schema": "v3",
	"displayName": "My Workflow Schema",
	"description": "Here is my workflow schema",
	"title": "formbio/nf-myWorkflow",
	"workflowVersion": "v1.0.0",
	"categories": ["My Workflow Category"],
	"inputs": {
		"myFirstTextInput": {
			"title": "My First Text Input",
			"type": "string",
			"hidden": false,
			"required": true,
			"description": "Here is a description",
			"help_text": "Here are more details....",
			"default": "my default value",
			"pattern": "someRegexPatternItMustMatch"
		},
		"mySecondTextInput": {
			"title": "My Second Text Input",
			"type": "string",
			"hidden": false,
			"required": true,
			"description": "Here is a description",
			"help_text": "Here are more details....",
			"default": "my default value",
			"pattern": "someRegexPatternItMustMatch"
		}
	},
	"ui": {
		"inputs": [
			{
				"id": "myGroup",
				"title": "My Group",
				"fields": ["myFirstTextInput", "mySecondTextInput"],
				"help_text": "More information on this grouping...",
				"description": "Here is a description",
				"hidden": false
			}
		]
	}
}

id

  • A unique id for identifying this workflow.

schema

  • The schema version. Should be v3.

displayName

  • A user-friendly name displayed in the UI for this workflow.

title

  • A pointer to the repo name this workflow lives inside.

workflowVersion

  • The version of this schema to be uploaded and selected in the UI. Should be incremented on any changes.

categories

  • An array of categories this workflow belongs to. These are shown in the GUI workflow “Launch” page.
  • Categories given here do not have to match an existing category.

description

  • A description of this workflow, shown to users in the UI.

inputs

  • See “Inputs” (below)

ui

  • See “UI” (below)
  • Note that all workflow field inputs must be nested inside a Group.
  • Note that workflows do not have outputs so only inputs is supported inside ui.
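
The rule that every field input must be nested inside a group can be checked with a few lines of code. This is a minimal sketch with a hypothetical function name; it assumes the schema has already been loaded into a dict (for example with `json.load`) and that groups list their members under "fields", as in the example schema.

```python
def ungrouped_inputs(schema: dict) -> list:
    """Return the names of inputs that no ui group lists in its "fields"."""
    grouped = set()
    for group in schema.get("ui", {}).get("inputs", []):
        grouped.update(group.get("fields", []))
    return sorted(name for name in schema.get("inputs", {}) if name not in grouped)
```

An empty result means every input appears in at least one group.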

Inputs

Field Inputs

Field inputs represent a run parameter whose value can be provided by a user. The Field Input configuration defines the type of input to display in the form in the web application, as well as labels, help texts, default values, and various forms of validation, depending on the field.

Every Field Input contains the following properties: title, hidden, required, description, and help_text.

  • description should be a concise description of the parameter, as it will be displayed immediately below the input in the form. Descriptions can be dynamically updated using conditionals.
  • help_text can be a more verbose explanation of the parameter, as it will be displayed in a collapsible help drawer to the right of the form. Help text cannot be dynamically updated using conditionals, so it may be helpful to explain here why an input might change.
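
A quick way to confirm that each field input carries the common properties is sketched below. The function name is hypothetical, and the check only tests for the presence of each property, not its type or content.

```python
# Properties that every Field Input is expected to contain.
COMMON_PROPERTIES = ("title", "hidden", "required", "description", "help_text")


def missing_common_properties(inputs: dict) -> dict:
    """Map each input name to the common properties it lacks (if any)."""
    report = {}
    for name, field in inputs.items():
        missing = [prop for prop in COMMON_PROPERTIES if prop not in field]
        if missing:
            report[name] = missing
    return report
```

Passing the "inputs" object of a loaded workflow.json returns an empty dict when every field is complete.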

Here is a list of supported Field Inputs and their run parameters:

  • Text Input
  • Integer Input
  • Text Area Input
  • Dropdown Input
  • Multi-select Input
  • Radio Input
  • Checkbox Input
  • Range Input
  • File Input
  • Directory Input
  • Spreadsheet Input

UI