Interactive jobs#

Interactive jobs enable you to interpret the results of automated analysis workflows and deliver actionable insights to your customers using data exploration and visualization tools. With interactive jobs, you can:

  • Query and visualize multi-omics big data using SQL

  • Better determine the biological classification and clinical relevance of observed variants

  • Draw conclusions about particular features, expressions, or mutations

  • Recommend downstream actions


Create an interactive job using the console#

Prerequisites#

  • User role with permission to create and manage jobs (Global Administrator, Job Administrator, or Job Executor)

  • Dataset imported into the SeqsLab Data Hub

  • Container image built from one of the base images that Atgenomix specifically created for SeqsLab
    These base images are publicly accessible in the Atgenomix GitHub repository.

Steps#

  1. Go to Jobs.
    The Jobs screen appears.

  2. Click + New job and then select Create an interactive job.
    The Job Settings screen appears.

  3. Configure the following required settings:

  • Name: Specify a unique, descriptive name.

  • Run Datasets: Select the dataset that you want to analyze.

  • Workspace: Select the workspace in which you want to run the job and access the output.

  • Container image name: Select the name of the image that reflects the runtime environment you want to use.

  • Container image tag: Select the tag of the image version that you want to use.

  • Managed cluster: Select a cluster with the computing resources required to run the job.

    SeqsLab displays a summary of the selected cluster’s computing resources and automatically fills in the Advanced settings field. You can modify the default Spark cluster properties based on your needs (see the sketch after these steps).

    The cluster’s configuration influences the number of Atgenomix Compute Units (ACUs) assigned to your job. The total ACU count is determined by the computing resources consumed and the amount of data processed.

  4. Click Create.
    SeqsLab starts provisioning the required computing resources. The job appears in the list on the Jobs screen.
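
SeqsLab derives the Advanced settings defaults from the selected cluster, so the generated values vary by cluster. As a purely illustrative sketch (these are standard Spark property names, but the values are placeholders, not SeqsLab defaults), commonly tuned properties look like the following; adjust them to the VM size and node count of your cluster:

```properties
# Illustrative Spark properties only; the values below are placeholders,
# not the defaults SeqsLab generates for your cluster.
spark.driver.memory            8g
spark.executor.memory          16g
spark.executor.cores           4
spark.sql.shuffle.partitions   200
```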

Next steps#

  1. Wait for three to five minutes and then refresh the screen to see the updated job state. Proceed to the next step only when the job state is Running.

    | Operation | Job State |
    | --- | --- |
    | A run_id and start_time are assigned to the job. | Unknown |
    | An Azure Batch node is launched. | Queued |
    | A shell script for provisioning the cluster is run in the node. | Initializing |
    | The output (lab_endpoint and sql_endpoint) is generated. | Running |

  2. Obtain the output by navigating to the job details panel.

  3. Investigate the output using data exploration and visualization tools.

Note

Interactive jobs remain in the Running state until stopped. You can stop a job manually or schedule it to stop at a later time.

Output retrieval#

  1. Locate the job in the Jobs list.

  2. Verify that the job state is Running or Complete.

  3. Click the job name.
    The job details panel opens.

  4. Click Activity.
    The panel displays the URLs for SQL and JupyterLab.


Create an interactive job using the WES API#

Use the Add a run API to create an interactive job.

  • Operation: POST

  • Path: /wes/v1/runs/

Prerequisites#

  • User role with permission to create and manage jobs (Global Administrator, Job Administrator, or Job Executor)

  • API access token

  • Dataset imported into the SeqsLab Data Hub

  • Container image built from one of the base images that Atgenomix specifically created for SeqsLab
    These base images are publicly accessible in the Atgenomix GitHub repository.

Steps#

  1. Configure the required parameters in the request body (see the example request after these steps).

  • name: Specify a unique, descriptive name.

  • workflow_type: Specify SQL.

  • workflow_params: Specify the input variables, the dataset that you want to analyze, and the workflow tasks to be executed.

    • dataset: Specify the default Hive database that you want to use when running data exploration and visualization tools such as Superset through the SeqsLab Spark cluster.

    • tasks: Specify the name of the container image that reflects the runtime environment you want to use.

  • workflow_backend_params: Specify the workspace and cluster in which you want to run the job.

    Important

    In the cluster settings parameter, ensure that you specify the correct values for the following:

    • virtual machine size

    • spot and dedicated nodes

  2. Send the request.
    SeqsLab starts provisioning the resources required to generate the endpoints for data exploration and visualization.

    | Operation | Job State |
    | --- | --- |
    | A run_id and start_time are assigned to the job. | Unknown |
    | An Azure Batch node is launched. | Queued |
    | A shell script for provisioning the cluster is run in the node. | Initializing |
    | The output (lab_endpoint and sql_endpoint) is generated. | Running |

  3. Locate the job run_id in the response payload.
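
The following sketch sends the request with Python’s requests library. The host, token, and every field value are placeholders, and the exact schema of workflow_params and workflow_backend_params is an assumption based on the parameter descriptions above; consult the SeqsLab API reference for the authoritative schema.

```python
import requests

# Placeholders: substitute your SeqsLab API host and access token
# (bearer-token authorization is assumed here).
API_HOST = "https://api.seqslab.example.com"
TOKEN = "YOUR_ACCESS_TOKEN"

# Illustrative request body; the field values (and parts of the nested
# structure) are assumptions based on the parameters described above.
payload = {
    "name": "interactive-variant-review",
    "workflow_type": "SQL",
    "workflow_params": {
        "dataset": "my_hive_database",       # default Hive database
        "tasks": {"image": "example/seqslab-runtime:latest"},  # hypothetical image name
    },
    "workflow_backend_params": {
        "workspace": "my-workspace",
        "cluster": {
            "vm_size": "Standard_D16s_v3",   # virtual machine size
            "spot_nodes": 0,                 # spot vs. dedicated nodes
            "dedicated_nodes": 4,
        },
    },
}

resp = requests.post(
    f"{API_HOST}/wes/v1/runs/",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json()["run_id"])  # run_id is returned in the response payload
```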

Note

Interactive jobs remain in the Running state until stopped. You can stop a job manually or schedule it to stop at a later time.

Output retrieval#

Use the Get a specific run API to retrieve the output of an interactive job.

  • Operation: GET

  • Path: /wes/v1/runs/{run_id}

  1. Verify that the role assigned to your user account has permission to view job details (Global Administrator, Job Administrator, or Job Viewer).

  2. Specify the job run_id in the request path.

  3. In the response payload, under outputs, locate the values for lab_endpoint and sql_endpoint.
    The fields display values only if the job state is Running.
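
A minimal retrieval sketch using Python’s requests library follows; the host and token are placeholders, and any response structure beyond the documented outputs, lab_endpoint, and sql_endpoint fields is an assumption.

```python
import requests

API_HOST = "https://api.seqslab.example.com"   # placeholder host
TOKEN = "YOUR_ACCESS_TOKEN"                    # placeholder token
run_id = "YOUR_RUN_ID"                         # from the Add a run response

resp = requests.get(
    f"{API_HOST}/wes/v1/runs/{run_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
run = resp.json()

# lab_endpoint and sql_endpoint are populated only while the job is Running.
outputs = run.get("outputs", {})
print(outputs.get("lab_endpoint"))
print(outputs.get("sql_endpoint"))
```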


Using the output of interactive jobs#

You can interpret the results of automated jobs and other secondary analysis methods using the following tools:

  • JupyterLab

    JupyterLab is a web-based, interactive development environment that enables you to explore and visualize data within computational notebooks.

    To access interactive job output, obtain the link for JupyterLab (on the console) or lab_endpoint (in the API response), and then open the URL in your browser.

  • Apache Superset + SeqsLab Connector for Python

    Apache Superset is a business intelligence web application that enables you to explore and visualize data with numerous visualization options and interactive dashboards.

    SeqsLab Connector for Python is a client for the Thrift Hive Server that enables you to create Python Database API connections to SeqsLab interactive jobs. The connector does not have any dependencies on the Microsoft Open Database Connectivity (ODBC) interface or the Java Database Connectivity (JDBC) API.

    To access interactive job output, install Apache Superset and the SeqsLab Connector for Python in the same environment. Next, obtain the link for SQL (on the console) or sql_endpoint (in the API response), and then open the URL in your browser.
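
Because the connector implements the Python Database API (PEP 249), querying an interactive job from Python follows the usual connect/cursor pattern. The import path, connection URL format, and table name below are assumptions for illustration only; refer to the SeqsLab Connector for Python documentation for the actual interface.

```python
# Hypothetical import path; the actual module name may differ.
from seqslab import connector

# sql_endpoint comes from the job output (the SQL link on the console or
# sql_endpoint in the API response). The URL format here is a placeholder.
conn = connector.connect("hive://<sql_endpoint>/default")
cursor = conn.cursor()

# Example query against the default Hive database configured for the job;
# the table and column names are illustrative.
cursor.execute("SELECT chrom, pos, ref, alt FROM variants LIMIT 10")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```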