Cybersecurity assurance#

Cybersecurity, also known as information technology security, is the practice of protecting networked systems, critical functions, and sensitive information from digital attacks, which may originate from either inside or outside an organization. Atgenomix follows the most relevant security frameworks and regulations in the healthcare industry. Furthermore, the SeqsLab platform uses a layered approach to cloud-native security, which further enhances the existing defense-in-depth security layers in place.

Cybersecurity considerations have been brought into each design and development stage of the SeqsLab platform in order to deliver trustworthy systems that are reasonably secure from cyberattacks and misuse while performing their intended functions. This section outlines the specific design features and/or rationales and cybersecurity design controls that have been implemented on the SeqsLab platform.

Identify and protect system assets and functionality#

Atgenomix SeqsLab helps organizations manage the information security of their assets and functionality to protect against the loss of authenticity, availability, integrity, and confidentiality. The protection mechanisms prevent all unauthorized uses of the systems to ensure code, data, and execution integrity.

Prevent unauthorized use#

All access to the data and functionality of SeqsLab requires layered authorization with role-based access control that enforces the system privileges associated with an organization’s IAM (Identity and Access Management) authentication credentials and rejects all disallowed behavior. As a defense measure, an adversary using a credential with lower role permissions would be unable to access system resources or functionality that require higher privileges (e.g., a viewer role cannot access administrative functions).

Important

Since SeqsLab seamlessly incorporates cloud IAM (e.g., Microsoft Azure Active Directory) for credential authentication, Atgenomix strongly recommends enabling multi-factor authentication and using strong passwords to harden privileged system access.

SeqsLab has the appropriate protections in place that prevent system resources for data, tools, execution, and transmission interfaces from being used by unauthorized parties.

  • A user must have a valid account credential in an organization’s identity and access management system in order to sign in to the platform.

  • A user must be registered in the provisioned instance of the SeqsLab platform and activated by the organization’s User Administrator.

  • A user must have the appropriate role permissions to access the corresponding resources or functionality (by default, all privileges are denied).

  • When accessing the systems, a user must present a valid time-limited access token. When the token expires, a user is required to refresh the token or to sign back in.

  • The system verifies the ownership and the required role permissions before processing access requests (create, retrieve, update, and delete).
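The checks listed above can be sketched as a single deny-by-default decision function. This is a minimal illustration, not the SeqsLab implementation: the role names, the `ROLE_PERMISSIONS` map, the data classes, and the rule that an `admin` role may act on resources it does not own are all hypothetical assumptions made for the example.

```python
import time
from dataclasses import dataclass

# Hypothetical role-to-permission map. Real role names and permissions are
# defined by the SeqsLab deployment, not by this sketch.
ROLE_PERMISSIONS = {
    "viewer": {"retrieve"},
    "editor": {"create", "retrieve", "update"},
    "admin": {"create", "retrieve", "update", "delete"},
}


@dataclass
class User:
    user_id: str
    roles: set
    registered: bool = False  # registered in the provisioned instance
    activated: bool = False   # activated by the User Administrator


@dataclass
class AccessToken:
    user_id: str
    expires_at: float  # Unix timestamp; tokens are time-limited


@dataclass
class Resource:
    resource_id: str
    owner_id: str


def is_allowed(user, token, resource, action, now=None):
    """Return True only if every check passes; deny by default."""
    now = time.time() if now is None else now

    # The token must belong to this user and must not have expired.
    if token.user_id != user.user_id or now >= token.expires_at:
        return False

    # The user must be registered and activated on the platform.
    if not (user.registered and user.activated):
        return False

    # Collect the permissions granted by the user's roles. Unknown roles
    # grant nothing, so all privileges are denied by default.
    granted = set()
    for role in user.roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    if action not in granted:
        return False

    # Ownership check. Assumption for this sketch: only the owner or an
    # admin may act on a resource.
    if resource.owner_id != user.user_id and "admin" not in user.roles:
        return False

    return True
```

In this sketch a viewer who owns a dataset can retrieve it but cannot delete it, and any request made with an expired token is rejected regardless of role.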

Ensure trusted content by maintaining code, data, and execution integrity#

Biomedical data analysis and genome sequencing bioinformatics are lengthy and complicated processes that involve large volumes of data, pipelines of algorithmic software tools, and heterogeneous computing environments. Any glitch or unintended change, originating from either inside or outside an organization, can adversely affect the analysis results, which could lead to misinterpretations. SeqsLab provides fully managed integrity controls on users’ workflow code, data, and execution throughout the entire process in order to ensure run-to-run content consistency.

Code integrity#

SeqsLab integrates with containerized bioinformatics tools (and the workflows that string them together) stored in a Docker container registry (e.g., Azure Container Registry). Containerization allows organizations to package software tools and their dependencies as a whole so that they can be easily distributed and used in a variety of computing environments. Furthermore, SeqsLab implements the GA4GH Tool Registry Service V2 standard, which provides content management for tools and workflows through semantic version control, cryptographic verification (FIPS PUB 180-4 Secure Hash Standard), and metadata-based tool descriptors (authorship, verifier, production use, tagged container images). These are configured consistently and automatically time-tracked for each iteration of updates (meta version).

Data integrity#

Sensitive information that is transmitted on public networks, such as users’ identities, access tokens, and human genomic data, is protected and transferred using Transport Layer Security (TLS) version 1.2 or later. All datasets registered to the SeqsLab Data Hub conform to the GA4GH Data Repository Service V1 specification, which provides a standard interface for data consumers, including workflow systems, to access data objects in a single, secure, and managed way. The SeqsLab workflow execution service verifies the integrity of all incoming datasets using a cryptographic hash algorithm (e.g., SHA-256) and ensures they have not been modified in transit or at rest. This data integrity protection is necessary to ensure the safety and qualitative performance of the biomedical data analysis.
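As an illustration of hash-based integrity verification, the following sketch recomputes a file's SHA-256 digest and compares it with the checksum recorded for the data object. It assumes the object metadata follows the DRS `checksums` layout (a list of entries with `type` and `checksum` fields); the function names are hypothetical, and the actual verification logic inside SeqsLab is not described in this document.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_drs_object(path, drs_object):
    """Return True if the file matches a SHA-256 checksum in the metadata.

    `drs_object` is assumed to be a dict shaped like a DRS object, e.g.
    {"checksums": [{"type": "sha-256", "checksum": "<hex digest>"}]}.
    A missing SHA-256 entry is treated as a failed check.
    """
    expected = [
        entry["checksum"].lower()
        for entry in drs_object.get("checksums", [])
        if entry.get("type", "").lower() in ("sha-256", "sha256")
    ]
    if not expected:
        return False
    actual = sha256_of_file(path)
    # compare_digest avoids leaking information through comparison timing.
    return any(hmac.compare_digest(actual, value) for value in expected)
```

Reading the file in chunks keeps memory use constant, which matters for the large genomic datasets this kind of check is typically applied to.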

Execution integrity#

Containerized bioinformatics workflows are a modern and practical approach to running data analysis in a cloud-native infrastructure, where content trust is a central concern. For this reason, in addition to managing code integrity and data integrity on the SeqsLab platform, the SeqsLab workflow execution service employs the Docker Content Trust (DCT) mechanism. This allows SeqsLab to verify both the integrity and the publisher of all Docker container images that the service pulls and operates on, regardless of whether they come from a public or private container registry.

DCT gives the SeqsLab workflow execution service the ability to use digital signatures for workflow images received from remote Docker registries. The signatures allow the SeqsLab workflow runtime to verify the integrity and publisher of specific image tags. With DCT, SeqsLab requires Tool Developers to sign their images as part of their release process, and Tool Users can be sure that the images they pull are signed when running analysis workflows on the SeqsLab platform.
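For readers who want to try DCT outside the platform, the Docker CLI enforces content trust when the `DOCKER_CONTENT_TRUST` environment variable is set to `1`, in which case pulling an unsigned tag fails. The sketch below wraps that behavior in Python; the helper names are hypothetical, and this is not how the SeqsLab workflow execution service is necessarily implemented.

```python
import os
import subprocess


def trusted_env(base_env=None):
    """Return a copy of the environment with Docker Content Trust enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_CONTENT_TRUST"] = "1"
    return env


def pull_signed_image(image_ref):
    """Pull an image with DCT enforced.

    Raises subprocess.CalledProcessError if the pull fails, which includes
    the case where the requested tag has no valid signature.
    Requires the Docker CLI to be installed and on PATH.
    """
    subprocess.run(
        ["docker", "pull", image_ref],
        env=trusted_env(),
        check=True,
    )
```

Building the environment in a separate function leaves the calling process's own environment untouched, so trust enforcement applies only to the pull being made.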

Detect, respond, recover#

Audit logging#

The SeqsLab platform creates and stores time-stamped event logs of user activity and security events in consideration of the FDA 21 CFR Part 11 audit trail requirements. The logging mechanism allows an organization to detect security compromises and enables forensic evidence capture.

The user activity logging includes, but is not limited to, the following events:

  • Complete cycle of resource requests and responses in all available SeqsLab API apps.

  • Creation, modification, and deletion of data repository, tool registry, workflow jobs, user accounts, etc.

  • Complete process of workflow execution service including datasets accessed and created, task commands executed, cloud compute resources allocated, etc.

Furthermore, security logging includes, but is not limited to, the following events:

  • User account sign-in, sign-out, and failed sign-in attempts.

  • Rejected attempts to access system resources, such as role permission denials.

  • Failed integrity checks when executing workflows.

All logging records are stored and archived in chronological order in the user-designated cloud storage (e.g., Azure Data Lake Storage), with a default lifetime of 270 days. SeqsLab stores security logs in Common Event Format (CEF), which can be consumed by automated analysis software (e.g., an Intrusion Detection System or IDS).
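To show what a CEF record looks like, the sketch below formats a security event as a single CEF line: a pipe-delimited header (`CEF:Version|Device Vendor|Device Product|Device Version|Device Event Class ID|Name|Severity`) followed by `key=value` extensions. The vendor, product, and version defaults and the event class ID in the usage note are illustrative assumptions; the exact fields SeqsLab emits are not specified in this document.

```python
def _escape_header(value):
    """Escape backslashes and pipes, as CEF requires in header fields."""
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    """Escape backslashes, equals signs, and newlines in extension values."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace("=", "\\=")
        .replace("\n", "\\n")
    )


def format_cef(event_class_id, name, severity, extensions,
               vendor="Atgenomix", product="SeqsLab", version="1.0"):
    """Build one CEF line. The default header values are illustrative only."""
    header = "|".join(
        _escape_header(part)
        for part in ("CEF:0", vendor, product, version,
                     event_class_id, name, severity)
    )
    ext = " ".join(
        f"{key}={_escape_extension(value)}"
        for key, value in extensions.items()
    )
    return f"{header}|{ext}"
```

For example, `format_cef("auth-failure", "Sign-in failed", 5, {"suser": "alice", "outcome": "failure"})` yields `CEF:0|Atgenomix|SeqsLab|1.0|auth-failure|Sign-in failed|5|suser=alice outcome=failure`, where `suser` and `outcome` are standard CEF extension keys.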

Contain and recover the impact and impairment of a cybersecurity incident#

The SeqsLab platform provides an on-demand, automated, and accelerated bioinformatics framework that is designed and built to contain the impact of a potential cybersecurity incident and to give organizations the resilience to recover from the impairment such an incident causes.

On-demand cluster computing#

The SeqsLab workflow execution service facilitates granular cluster computing configuration associated with each workflow, sub-workflow, and even individual workflow task. Each computing cluster runtime is launched and loaded dynamically only when needed and is shut down automatically when the workflow or task is completed. This automated functionality reduces the risk of cybersecurity attacks and gives users the ability to contain the impact of a potential cybersecurity incident at a granular level. Since the computing runtimes are loaded on demand, the SeqsLab platform further facilitates the rapid deployment, verification, and validation of workflow tool container patches and updates.
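The launch-on-demand, shut-down-on-completion lifecycle can be pictured as a scoped resource. The sketch below uses a Python context manager to guarantee that a cluster is torn down when its task finishes, even if the task fails. `InMemoryClusterBackend` is a hypothetical stand-in for a cloud provisioning API and does not reflect the actual SeqsLab interfaces.

```python
from contextlib import contextmanager


class InMemoryClusterBackend:
    """Hypothetical stand-in for a cloud provisioning API."""

    def __init__(self):
        self.running = set()
        self._next_id = 0

    def launch(self, node_count):
        # A real backend would allocate node_count compute nodes here.
        self._next_id += 1
        cluster_id = f"cluster-{self._next_id}"
        self.running.add(cluster_id)
        return cluster_id

    def shutdown(self, cluster_id):
        self.running.discard(cluster_id)


@contextmanager
def on_demand_cluster(backend, node_count):
    """Launch a cluster for one task and always shut it down afterwards."""
    cluster_id = backend.launch(node_count)
    try:
        yield cluster_id
    finally:
        # Runs on success and on failure, so no cluster outlives its task.
        backend.shutdown(cluster_id)
```

A task would run inside `with on_demand_cluster(backend, node_count=4) as cluster_id:`, so the compute runtime exists only for the duration of the task and no idle cluster is left exposed afterwards.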

Managed platform as a service#

The fully managed SeqsLab Platform as a Service (PaaS) solution gives organizations the ability to build, run, and manage their own biomedical informatics workflows without having to develop and support the Bio-IT infrastructure required for developing and launching an analysis application. The solution architecture allows organizations to quickly deploy the required Bio-IT infrastructure in any available data center of a supported cloud provider. This decentralized infrastructure guarantees the desired level of resilience against possible cybersecurity incident scenarios such as network outages, denial-of-service (DoS) attacks, and disruptions that affect the quality of service for a significant period.