TCIApathfinder and downstream analysis

Pamela Russell

2017-09-11

TCIApathfinder wraps the REST API of The Cancer Imaging Archive (TCIA). See the TCIApathfinder vignettes for an introduction to package usage. This vignette shows how images downloaded with TCIApathfinder can be processed and analyzed with other R packages.

Use TCIApathfinder to download and extract an image series

library(TCIApathfinder)

# Pick a patient of interest
patient <- "TCGA-AR-A1AQ"

# Get information on all image series for this patient
series <- get_series_info(patient_id = patient)

# Pick an image series to download
series_instance_uid <- as.character(series$series[1, "series_instance_uid"])

# Download and unzip the image series
ser <- save_image_series(series_instance_uid = series_instance_uid, out_dir = "~/Desktop", out_file_name = "series1.zip")
dicom_dir <- "~/Desktop/series1/"
unzip("~/Desktop/series1.zip", exdir = dicom_dir)

Use the “oro.dicom” package to load the image series

The oro.dicom package provides functions to process image files in DICOM format, the format used by TCIA. See the oro.dicom package documentation for further details.

suppressPackageStartupMessages(library(oro.dicom))

# Read in the DICOM images and create a 3D array of intensities
dicom_list <- readDICOM(dicom_dir)
img_array_3d <- create3D(dicom_list)

# Check the dimensions of the 3D array
dim(img_array_3d)
## [1] 256 256  36

Note that this series consists of 36 DICOM images (the third dimension of the array), each 256x256 pixels.
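Each slice of the 3D array can be extracted with [, , k], and per-slice summaries can be computed with base R's apply() over the third dimension. The sketch below uses a small synthetic array, demo_3d, as a hypothetical stand-in for img_array_3d so that it runs without downloading anything; with the real series you would substitute img_array_3d.

```r
# Hypothetical stand-in for img_array_3d: 4 x 4 pixels, 3 slices
demo_3d <- array(1:48, dim = c(4, 4, 3))

# The third dimension indexes the slices
n_slices <- dim(demo_3d)[3]

# Mean intensity of every slice (MARGIN = 3 applies the function per slice)
slice_means <- apply(demo_3d, 3, mean)
slice_means
## [1]  8.5 24.5 40.5

# Extract the slice with the highest mean intensity as a plain matrix
brightest <- which.max(slice_means)
brightest_slice <- demo_3d[, , brightest]
```

The same pattern (apply() with MARGIN = 3) works for any per-slice summary, for example max or sd.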

Use the “radiomics” package to extract features from images

The radiomics package provides functions to calculate first- and second-order statistics from grayscale images. See the radiomics package documentation for further information.
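To make the first-order feature names concrete, here is a minimal sketch that computes several of them directly in base R. The helper first_order_stats() and the toy image demo_img are hypothetical, and the definitions are generic textbook ones that are not guaranteed to match radiomics exactly (for example, in how grey levels are binned for entropy and uniformity).

```r
# Hypothetical helper: textbook first-order statistics of a grayscale matrix.
# Not the radiomics implementation; binning details may differ.
first_order_stats <- function(img) {
  x <- as.vector(img)
  # Relative frequency of each distinct grey level (one bin per level)
  p <- as.vector(table(x)) / length(x)
  c(
    mean       = mean(x),
    variance   = var(x),             # sample variance (n - 1 denominator)
    sd         = sd(x),
    RMS        = sqrt(mean(x^2)),    # root mean square intensity
    energy     = sum(x^2),           # sum of squared intensities
    entropy    = -sum(p * log2(p)),  # Shannon entropy in bits
    uniformity = sum(p^2)            # sum of squared grey-level frequencies
  )
}

# Toy 4 x 4 image with grey levels 0 to 3
demo_img <- matrix(c(0, 0, 1, 1,
                     0, 0, 1, 1,
                     0, 2, 2, 2,
                     2, 2, 3, 3), nrow = 4, byrow = TRUE)
first_order_stats(demo_img)
```

These features are not independent: RMS squared equals the squared mean plus the population variance. In the calc_features() output below, 35.05847² + 688.6938 ≈ 43.79246², up to the small difference between sample and population variance.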

suppressPackageStartupMessages(library(radiomics))

# Pick one of the image slices and store it as a numeric matrix
img_array <- img_array_3d[, , 1]
img_matrix <- matrix(img_array, nrow = nrow(img_array), ncol = ncol(img_array))

# Calculate basic image features
calc_features(img_matrix)
##   calc_energy calc_entropy calc_kurtosis calc_meanDeviation calc_skewness
## 1   125683610     4.620808      185.7081           14.26662      9.266714
##   calc_uniformity calc_mean calc_median calc_max calc_min calc_variance
## 1      0.05375624  35.05847          31      853        0      688.6938
##   calc_RMS  calc_sd
## 1 43.79246 26.24298
# Analyze the grey level co-occurrence matrix (GLCM)
glcm_mat <- glcm(img_matrix)
calc_features(glcm_mat)
##   glcm_mean glcm_variance glcm_autoCorrelation glcm_cProminence
## 1  1.834536      2.043861             4.002711         1880.031
##   glcm_cShade glcm_cTendency glcm_contrast glcm_correlation
## 1    30.00833       7.362101     0.8133425        0.8010279
##   glcm_differenceEntropy glcm_dissimilarity glcm_energy glcm_entropy
## 1               1.247342          0.5283241   0.1929883     2.002612
##   glcm_homogeneity1 glcm_homogeneity2 glcm_IDMN  glcm_IDN
## 1         0.7604096         0.7551098  0.999229 0.9842117
##   glcm_inverseVariance glcm_maxProb glcm_sumAverage glcm_sumEntropy
## 1            0.4318814    0.3217218        5.669072        2.246996
##   glcm_sumVariance
## 1         20.26713
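The second-order features above are computed from the grey level co-occurrence matrix (GLCM), whose entry (i, j) is the relative frequency with which grey level i occurs next to grey level j at a given offset. The following is a minimal sketch of that idea in base R, not the radiomics implementation: glcm_horizontal() and glcm_summary() are hypothetical helpers, the offset is fixed to one pixel to the right, each pair is counted in both directions so the matrix is symmetric, and the input is assumed to already hold integer grey levels 0 to n_grey - 1 (a full pipeline normally discretizes raw intensities first, so values will differ from glcm()).

```r
# Hypothetical helper (not the radiomics implementation): symmetric,
# normalised GLCM for horizontal neighbours (distance 1, angle 0).
# Assumes integer grey levels 0..(n_grey - 1).
glcm_horizontal <- function(img, n_grey) {
  counts <- matrix(0, n_grey, n_grey)
  for (i in seq_len(nrow(img))) {
    for (j in seq_len(ncol(img) - 1)) {
      a <- img[i, j] + 1       # +1 because R indices start at 1
      b <- img[i, j + 1] + 1
      counts[a, b] <- counts[a, b] + 1
      counts[b, a] <- counts[b, a] + 1  # count the pair in both directions
    }
  }
  counts / sum(counts)         # normalise to joint probabilities
}

# Three common Haralick-style features of a normalised GLCM P
glcm_summary <- function(P) {
  idx <- row(P) - col(P)       # grey-level difference i - j for every cell
  c(contrast    = sum(P * idx^2),
    energy      = sum(P^2),
    homogeneity = sum(P / (1 + abs(idx))))
}

demo_img <- matrix(c(0, 0, 1, 1,
                     0, 0, 1, 1,
                     0, 2, 2, 2,
                     2, 2, 3, 3), nrow = 4, byrow = TRUE)
P <- glcm_horizontal(demo_img, n_grey = 4)
glcm_summary(P)
```

The toy image has 12 horizontal neighbour pairs, so P is built from 24 symmetric counts. Contrast (here 7/12) grows with the grey-level difference between neighbours, while energy and homogeneity grow when neighbouring pixels share similar values.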

Download genomic data for this patient from The Cancer Genome Atlas

This patient is included in The Cancer Genome Atlas (TCGA). A variety of germline and somatic genomic data can be downloaded with the Bioconductor package TCGAbiolinks. See the TCGAbiolinks package vignettes for further details. A sample workflow for analyzing TCGA data is provided in the article "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages".