library("ofhelper")
# required if not in $PATH
my_dx_binary <- "/PATH/TO/dx"
# required for initialization. Can be changed afterwards
my_dx_project <- "project-12345"
# Required if not logged in already (outside of R)
my_dx_access_token <- "..."
# Optional, will use `/` otherwise
my_dx_project_dir <- "/some/dnanexus/directory/"
# Some Jupyter sessions do not provide internet access,
# so you can disable the connectivity check here
do_connectivity_check = TRUE
# set up the package using this function
dx_init(
dx_binary = my_dx_binary,
dx_project = my_dx_project,
dx_token = my_dx_access_token,
dx_path = my_dx_project_dir,
check_connectivity = do_connectivity_check
)Getting Started
Initialization
After installing the package and its dependencies, you can parse your existing dx session, but it is important to first initialize with the dx_init() function.
Afterwards, you can check the status of the package by running get_dx_cache(). This informs you about the status available to your R session and this package’s functions (i.e. the package’s cache of dx’s status). To directly check the status of dx, run dx_get_env(), which simply parses the output of dx env.
get_dx_cache()
#> [1] "/PATH/TO/dx"
#>
#> $dx_project_id
#> [1] "project-12345"
#>
#> $dx_project_name
#> [1] "YOUR_PROJECT_NAME"
#>
#> $dx_path
#> [1] "/some/dnanexus/directory/"
#>
#> $dx_user
#> [1] "your_user_name"
#>
#> $dx_server_host
#> [1] "api.dnanexus.com"
#>
#> $dx_initialized
#> [1] TRUELaunching Workstations
To launch a workstation to work in interactively, use dx_launch_workstation(). You can also get the worker job URL using get_workstation_worker_url().
my_instance <- find_tre_instance_type(
required_n_cpus = 8,
required_gb_ram = 64,
required_gb_disk_storage = 256
)
my_worker_id <- dx_launch_workstation(
session_name = "my_workstation",
instance_type = my_instance
)
# When the workstation is booted up, this will return the URL to use for interactive work
get_workstation_worker_url(my_worker_id)
#> https://...Submitting R Scripts
To submit an R script from your local environment, you can use the dx_submit_r_job() function. This simplifies to required workflow by not having to paste code to an interactive session the code changes slightly.
# Create script you'd want to execute on DNAnexus
script_file <- tempfile(pattern = "r_job", fileext = ".R")
writeLines("sessionInfo()", script_file)
# Submit the script - see the function documentation for the additional parameters
dx_submit_r_job(
script_path = script_file,
include_ofhelper = F,
tag = "test"
)Exporting Raw Data Entities
To work with the study’s data, it needs to be extracted from the SPARK database and exported as tabular data. See here for a more detailed description of the process.
# The function requires the record ID of your project as input
export_entities(
dataset_or_cohort_or_dashboard = "record-1234ABCD5678",
entities = "participant_nhs_linked", work_dir = "test"
)Decoding Raw Data
Before analysing from the SPARK database, data needs to be prepared for analysis. These steps include:
- Export (raw) data from the database
- Decode data (replace coded values with meaning) where applicable
- Clean and explore coded data
- Transform data with multiple items per participant and variable
- Deriving new variables for analysis & mapping terms to existing ontologies
For exporting raw data and decoding it, the functons export_entities() and decode_raw_ofh_file() are provided.
# TODO
# ...