Built 2022-01-05 using NMdata 0.0.10.
Please make sure to use the latest version, available here.
This cheat sheet is intended to provide an overview and serve as a reminder of function names. Please refer to the other vignettes for more details on specific topics, and to the individual manual pages for details on the functions.
install.packages("NMdata")
library(NMdata)
In building the data set, key steps are stacking data sets (such as doses, samples, and simulation records) and adding additional information such as covariates. We often use rbind and merge or join operations for these steps. NMdata helps carry out these steps and ensure the results are as expected.
compareCols
- Compare presence and classes of columns across data sets before merging or stacking.
compareCols(covs,covs2)
#> Dimensions:
#> data nrows ncols
#> 1: covs 150 2
#> 2: covs2 150 2
#>
#> Columns that differ:
#> column covs covs2
#> 1: WEIGHTB numeric <NA>
#> 2: cov2 <NA> character
Use the cols.wanted argument to focus the overview on the columns you need in your final data set.
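For example (a sketch; the cols.wanted values are just columns seen in the output above):
compareCols(covs,covs2,cols.wanted=c("ID","WEIGHTB","cov2"))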
renameByContents
- Keep track of which columns are compatible with NONMEM by renaming those that aren't. Rename all columns that NONMEM cannot interpret as numeric to lowercase (see NMisNumeric in the Programming section):
## Append an "N" to columns that NONMEM can read (as numeric)
pk <- renameByContents(data=pk,
fun.test = NMisNumeric,
fun.rename = function(x)paste0(x,"N"))
## lowercase names of columns that NONMEM cannot read as numeric
pk <- renameByContents(data=pk,
fun.test = NMisNumeric,
fun.rename = tolower,
invert.test = TRUE)
mergeCheck(x1,x2,...)
- Merges data and only accepts the result if all that happened is that columns from x2 were added to x1. The row order of x1 is retained. Arguments are passed to data.table, which does the actual merge. This automates the checks we need to do after, say, merging covariates onto data.
pk2 <- mergeCheck(pk,covs2,by="ID")
#> The following columns were added: cov2
We did not get an error from mergeCheck, so we know that the rows in pk2 are exactly identical to those in pk, except for the addition of a column called cov2.
flagsAssign
- Assign exclusion flags to a data set based on a specified table.
flagsCount
- Create an overview of the number of retained and discarded data points.
This is a simple example where we use only two exclusion flags. If time is negative, we assign exclusion flag FLAG=100. If (time is non-negative and) BLQ==1, we assign FLAG=10. If none of these conditions are met, FLAG=0, and the row will be included in the analysis. fread is just a way to write the table row-wise for readability.
dt.flags <- fread(text="FLAG,flag,condition
10,Below LLOQ,BLQ==1
100,Negative time,TIME<0")
pk <- flagsAssign(pk,tab.flags=dt.flags,subset.data="EVID==0")
#> Coding FLAG = 100, flag = Negative time
#> Coding FLAG = 10, flag = Below LLOQ
pk <- flagsAssign(pk,subset.data="EVID==1",flagc.0="Dosing")
flagsCount(pk[EVID==0],tab.flags=dt.flags)[,.( flag, N.left, Nobs.left, N.discard, Nobs.discard)]
#> flag N.left Nobs.left N.discard Nobs.discard
#> 1: All available data 150 1352 NA NA
#> 2: Negative time 150 1350 0 2
#> 3: Below LLOQ 131 755 19 595
#> 4: Analysis set 131 755 NA NA
NMorderColumns
- Standardize column order. Columns that can be read by NONMEM are prioritized towards the left.
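A minimal sketch (pk as prepared above):
pk <- NMorderColumns(pk)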
NMcheckData
- Extensive data checks for NONMEM compatibility and common issues.
NMwriteData
- Write data ensuring compatibility with NONMEM. By default, saves both a csv (for NONMEM) and an rds (for R, retaining factor levels etc.). Text for optional use in the $INPUT and $DATA NONMEM sections is returned. script and args.stamp are optional arguments; see the "Traceability" section for their purpose.
text.nm <- NMwriteData(pk,file="derived/pkdata.csv",
                       script="NMdata-cheat.Rmd",
                       args.stamp=list(Description="PK data for the NMdata Cheatsheet"))
#> Data written to file(s):
#> derived/pkdata.csv
#> derived/pkdata.rds
#> For NONMEM:
#> $INPUT ROW ID NOMTIME TIME EVID CMT AMT DV FLAG STUDY BLQ CYCLE DOSE
#> PART PROFDAY PROFTIME eff0
#> $DATA derived/pkdata.csv
#> IGN=@
#> IGNORE=(FLAG.NE.0)
NMwriteSection
- Replace sections of a NONMEM control stream. NMwriteSection can use the text generated by NMwriteData to update NONMEM runs to match the newly generated input data. Update the INPUT section (and not DATA) for all control streams in the directory "nonmem" whose file names start with "run1" and end in ".mod" (say, "run101.mod" to "run199.mod"):
NMwriteSection(dir="nonmem",
file.pattern="run1.*\\.mod",
list.sections=text.nm["INPUT"])
NMwriteSection has the argument data.file to further limit the scope of files to update, based on what data file the control streams use. It only makes sense to use the auto-generated text for control streams that use this data set.
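A sketch of restricting by data file (the data.file value must match what the control streams actually read):
NMwriteSection(dir="nonmem",
               file.pattern="run1.*\\.mod",
               data.file="derived/pkdata.csv",
               list.sections=text.nm["INPUT"])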
The text for NONMEM can be generated without saving data using NMgenText. You can tailor the generation of the text to copy (DV=CONC), drop (COL=DROP), rename (DV instead of CONC), and more.
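A minimal sketch showing only the drop feature (PROFTIME is one of the columns in the pk data written above):
text.nm2 <- NMgenText(pk,file="derived/pkdata.csv",drop="PROFTIME")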
NMcheckData was mentioned under "Data preparation" because it can check a data set before it is written to file. However, it can also be run on a path to a control stream, in which case it checks the column names in the INPUT section against the data and then runs a full check of the data set as read by NONMEM (according to the column names in $INPUT and ACCEPT/IGNORE statements in $DATA). We suppress the default print to the terminal (quiet=TRUE) and provide selected parts of the results here.
res.debug <- NMcheckData(file="nonmem/run201.mod",quiet=TRUE)
## we will only show some of what is available here
names(res.debug)
#> [1] "datafile" "tables" "dataCreate" "input.filters"
#> [5] "input.colnames" "NMcheckData"
## Meta data on input data file:
res.debug$tables
#> source name nrow ncol nid filetype file.mtime
#> 1: input pkdata.csv 1502 23 150 text 2022-01-05 23:11:17
#> file has.col.row has.col.id
#> 1: nonmem/../derived/pkdata.csv TRUE TRUE
In this model, we forgot to update the control stream INPUT section after adding a column to the data ("off" means that the INPUT text can be reorganized to better match the data file):
## Comparison of variable naming:
res.debug$input.colnames[c(1:2)]
#> datafile INPUT nonmem result compare
#> 1: ROW ROW ROW ROW OK
#> 2: ID ID ID ID OK
res.debug$input.colnames[c(9:12)]
#> datafile INPUT nonmem result compare
#> 1: FLAG FLAG FLAG FLAG OK
#> 2: STUDY BLQ BLQ BLQ off
#> 3: BLQ CYCLE CYCLE CYCLE off
#> 4: CYCLE DOSE DOSE DOSE off
We have some findings on the data set too. But since res.debug$input.colnames tells us we are reading the data incorrectly, we have to address that before interpreting findings on the data.
res.debug$NMcheckData$summary
#> column check N Nid
#> 1: EVID Subject has no obs 19 19
#> 2: MDV Column not found 1 0
If you are preparing a data set, run NMcheckData directly on the data (using the data argument) instead of on a control stream.
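A minimal sketch of checking the data set directly before writing it:
res.check <- NMcheckData(data=pk)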
NMscanData
- Automatically find NONMEM input and output tables and organize the data. By default, available column names are taken from the NONMEM control stream. Additional column names (columns not read by NONMEM) are taken from the input data file.
res1 <- NMscanData("nonmem/run101.lst")
#> Model: run101
#> Input and output data merged by: ROW
#>
#> Used tables, contents shown as used/total:
#> file rows columns IDs
#> run101_res.txt 905/905 7/7 150/150
#> run101_res_vols.txt 905/905 3/7 150/150
#> run101_res_fo.txt 150/150 1/2 150/150
#> pkdata.rds (input) 905/1502 20/23 150/150
#> (result) 905 31+2 150
#>
#> Distribution of rows on event types in returned data:
#> EVID Output
#> 0 755
#> 1 150
class(res1)
#> [1] "NMdata" "data.table" "data.frame"
The following plot serves to illustrate that the obtained data set combines output tables (PRED is from a $TABLE statement) with input data (exclusion flags are represented as character variables). Moreover, the "below LLOQ" samples are included in the result even though they were not in the analysis (excluded using IGNORE in the control stream, recovered by NMscanData using recover.rows=TRUE).
library(ggplot2)
## tell NMdata functions to return data.tables
NMdataConf(as.fun="data.table")
res1.dt <- NMscanData("nonmem/run101.lst",recover.rows=TRUE)
#> Model: run101
#> Input and output data merged by: ROW
#>
#> Used tables, contents shown as used/total:
#> file rows columns IDs
#> run101_res.txt 905/905 7/7 150/150
#> run101_res_vols.txt 905/905 3/7 150/150
#> run101_res_fo.txt 150/150 1/2 150/150
#> pkdata.rds (input) 1502/1502 20/23 150/150
#> (result) 1502 31+2 150
#>
#> Distribution of rows on event types in returned data:
#> EVID Input only Output
#> 0 597 755
#> 1 0 150
ggplot(res1.dt[ID==135&EVID==0],aes(TIME))+
geom_point(aes(y=DV,colour=flag))+
geom_line(aes(y=PRED))+
labs(y="Concentration (unit)",subtitle=unique(res1.dt$model))
#> Warning: Removed 2 row(s) containing missing values (geom_path).
Read the messages from NMwriteData and NMscanData carefully and notice that an rds file was written and read. This bypasses the loss of information caused by writing and reading csv, so we have kept the factor levels from the input data we generated:
levels(res1.dt$trtact)
#> [1] "Placebo" "3 mg" "10 mg" "30 mg" "100 mg" "300 mg"
NMscanTables
- Find and read all output data tables based on a NONMEM control stream file. A list of tables is returned.
NMreadTab
- Read an output table file from NONMEM based on the path to the output data file.
NMscanInput
- Read input data based on a NONMEM control stream and optionally translate column names according to the $INPUT NONMEM section.
NMreadCsv
- Read input data formatted for NONMEM.
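Minimal sketches of each, reusing the paths from the examples above:
tabs <- NMscanTables("nonmem/run101.lst") ## list of output tables
tab1 <- NMreadTab("nonmem/run101_res.txt") ## one output table file
inp <- NMscanInput("nonmem/run101.lst") ## input data as read by NONMEM
raw <- NMreadCsv("derived/pkdata.csv") ## csv file formatted for NONMEM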
Use the many options in NMdataConf to tailor NMdata behaviour to your setup and preferences. Make NMdata functions return data.tables or tibbles:
NMdataConf(as.fun=tibble::as_tibble)
NMdataConf(as.fun="data.table")
By default, NMdata functions will look for a unique row identifier in a column called ROW. If you call this column REC, do
NMdataConf(col.row="REC")
By default, NMdata is configured to read files from PSN in which case the input control stream is needed to find the input data. Do this if you don’t use PSN:
NMdataConf(file.mod=identity)
Loosely speaking, NMdataConf changes the default values of NMdata function arguments. Many options can be configured this way so you don't have to remember to type in those arguments every time you call an NMdata function.
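For example, to revert all options to the package defaults (assuming the reset argument of NMdataConf in your version):
NMdataConf(reset=TRUE)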
NMinfo
- Get metadata from an NMdata object. This will show where and when the input data was created, when the model was run, results of consistency checks, what tables were read, how they were combined, and a complete list of data columns and their origin.
A list of the available elements:
names(NMinfo(res1.dt))
#> [1] "details" "datafile" "dataCreate" "input.colnames"
#> [5] "tables" "columns"
The information recorded during saving of the input data:
NMinfo(res1.dt,"dataCreate")
#> $DataCreateScript
#> [1] "NMdata-cheat.Rmd"
#>
#> $CreationTime
#> [1] "2022-01-05 23:11:17 EST"
#>
#> $writtenTo
#> [1] "derived/pkdata.rds"
#>
#> $Description
#> [1] "PK data for the NMdata Cheatsheet"
A full list of all columns in output and input data is included. The source data file and the column number in the result (COLNUM) are listed.
NMinfo(res1.dt,"columns")[1:8]
#> variable file source level COLNUM
#> 1: ROW run101_res.txt output row 1
#> 2: ID run101_res_vols.txt output row 2
#> 3: NOMTIME pkdata.rds input row 3
#> 4: TIME pkdata.rds input row 4
#> 5: EVID pkdata.rds input row 5
#> 6: CMT pkdata.rds input row 6
#> 7: AMT pkdata.rds input row 7
#> 8: DV run101_res.txt output row 8
We saw earlier that we got "31+2" columns back. We see that the additional two were added by NMscanData (source). DV was already included from another table, so the redundant DV column is omitted.
NMinfo(res1.dt,"columns")[30:33]
#> variable file source level COLNUM
#> 1: flag pkdata.rds input row 30
#> 2: trtact pkdata.rds input row 31
#> 3: model <NA> NMscanData model 32
#> 4: nmout <NA> NMscanData row 33