ontologyIndex has a function to create an ontology_index object from a given .obo file. The package comes with three such objects: hpo, mpo and go, encapsulating the Human Phenotype Ontology (HPO), Mammalian Phenotype Ontology (MPO) and Gene Ontology (GO) respectively. Here we’ll demonstrate the package using the HPO.
library(ontologyIndex)
data(hpo)
To use an up-to-date version, download the relevant .obo file and read it into R using the function get_ontology, passing it the file name.
# file: path to a downloaded .obo file
ontology <- get_ontology(file)
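get_ontology does the .obo parsing for you. To give a rough idea of how an .obo file maps onto the properties used below, here is a minimal base-R sketch that reads only the id, name and is_a tags of [Term] stanzas. The function toy_parse_obo and the .obo fragment are hypothetical; the real parser handles far more (obsolete terms, alt_id, other relationship types), so use get_ontology in practice.

```r
# Hypothetical .obo fragment for illustration only (real files have a header,
# [Typedef] stanzas and many more tags such as alt_id and is_obsolete).
obo_lines <- c(
  "[Term]",
  "id: HP:0000001",
  "name: All",
  "",
  "[Term]",
  "id: HP:0000118",
  "name: Phenotypic abnormality",
  "is_a: HP:0000001 ! All"
)

# Illustrative parser, not ontologyIndex's implementation.
toy_parse_obo <- function(lines) {
  # A stanza runs from its "[...]" header to the line before the next header.
  headers <- which(startsWith(lines, "["))
  starts <- which(lines == "[Term]")
  ends <- vapply(starts, function(s) {
    nxt <- headers[headers > s]
    if (length(nxt) > 0) nxt[1] - 1L else length(lines)
  }, integer(1))
  stanzas <- Map(function(s, e) lines[s:e], starts, ends)

  # Values of every "tag: value" line with the given tag in one stanza.
  tag_values <- function(stanza, tag) {
    prefix <- paste0(tag, ": ")
    hits <- stanza[startsWith(stanza, prefix)]
    substring(hits, nchar(prefix) + 1)
  }

  ids <- vapply(stanzas, function(s) tag_values(s, "id")[1], character(1))
  list(
    id = setNames(ids, ids),
    name = setNames(vapply(stanzas, function(s) tag_values(s, "name")[1],
                           character(1)), ids),
    # is_a values carry a trailing "! name" comment, which is stripped here.
    parents = setNames(lapply(stanzas, function(s)
      sub(" !.*$", "", tag_values(s, "is_a"))), ids)
  )
}

idx <- toy_parse_obo(obo_lines)
idx$parents[["HP:0000118"]]
```

The result has the same shape as an ontology_index (properties indexed by term ID), which is what makes the lookups shown later possible.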
You can use the functions get_term_property, get_term_names, get_term_children, get_term_ancestors and get_term_parents to query the ontology_index object. For instance:
get_term_ancestors(ontology=hpo, term="HP:0001873", as_names=FALSE)
## [1] "HP:0000001" "HP:0000118" "HP:0001871" "HP:0001872" "HP:0011873"
## [6] "HP:0001873"
get_term_ancestors(ontology=hpo, term="HP:0001873", as_names=TRUE)
## HP:0000001
## "All"
## HP:0000118
## "Phenotypic abnormality"
## HP:0001871
## "Abnormality of blood and blood-forming tissues"
## HP:0001872
## "Abnormality of thrombocytes"
## HP:0011873
## "Abnormal platelet count"
## HP:0001873
## "Thrombocytopenia"
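Conceptually, a term’s ancestors are every term reachable by repeatedly following parent links, plus the term itself (as in the output above). Below is a minimal base-R sketch of that traversal on a hypothetical toy ontology whose IDs are borrowed from the example above but whose parent links are simplified; toy_ancestors is an illustrative helper, not part of ontologyIndex.

```r
# Hypothetical toy ontology, NOT the real HPO: each term ID maps to the IDs of
# its parents. IDs are borrowed from the example above; links are simplified.
parents <- list(
  "HP:0000001" = character(0),
  "HP:0000118" = "HP:0000001",
  "HP:0001871" = "HP:0000118",
  "HP:0001872" = "HP:0001871",
  "HP:0011873" = "HP:0001872",
  "HP:0001873" = "HP:0011873"
)

# Illustrative helper: breadth-first walk up the parent links, collecting the
# starting term(s) and every term reached. 'seen' avoids revisiting terms that
# are reachable along more than one path in the DAG.
toy_ancestors <- function(parents, term) {
  seen <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% seen) next
    seen <- c(seen, current)
    queue <- c(queue, parents[[current]])  # NULL for unknown IDs, so ignored
  }
  sort(seen)
}

toy_ancestors(parents, "HP:0001873")
```

Passing several IDs as term returns the union of their ancestors. The result is sorted here, so its order may differ from the package’s output.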
However, the object itself is just a list of term properties, each stored as a vector or a list indexed by term ID:
## property class
## 1 id character
## 2 obsolete logical
## 3 name character
## 4 parents list
## 5 alt_id character
## 6 children list
## 7 ancestors list
## 8 version character
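To make that layout concrete, here is a hypothetical three-term index built by hand in the same shape: a list whose elements are vectors or lists named by term ID. The toy object and its simplified parent links are assumptions for illustration, and the children property is derived here by inverting the parent links, which may not be how the package computes it.

```r
# Hypothetical hand-built index with the same layout as an ontology_index:
# each property is a vector or list named by term ID (links simplified).
ids <- c("HP:0001871", "HP:0001872", "HP:0001873")
toy <- list(
  id = setNames(ids, ids),
  name = setNames(c("Abnormality of blood and blood-forming tissues",
                    "Abnormality of thrombocytes",
                    "Thrombocytopenia"), ids),
  obsolete = setNames(c(FALSE, FALSE, FALSE), ids),
  parents = setNames(list(character(0), "HP:0001871", "HP:0001872"), ids)
)

# Children of a term: every term that lists it among its parents.
toy$children <- lapply(setNames(ids, ids), function(term)
  ids[vapply(toy$parents, function(p) term %in% p, logical(1))])

# Summarise each property and its class, in the style of the table above.
data.frame(property = names(toy),
           class = vapply(toy, class, character(1)),
           row.names = NULL)
```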
Thus you can also look up properties for a given term directly using [ or [[, as appropriate. This is the best way to use the ontology_index object if you are operating on multiple terms, as it’s faster.
hpo$name["HP:0001873"]
## HP:0001873
## "Thrombocytopenia"
hpo$id[grep(x=hpo$name, pattern="Thrombocytopenia")]
## HP:0001873
## "HP:0001873"
hpo$ancestors[["HP:0001873"]]
## [1] "HP:0000001" "HP:0000118" "HP:0001871" "HP:0001872" "HP:0011873"
## [6] "HP:0001873"
hpo$name[hpo$ancestors[["HP:0001873"]]]
## HP:0000001
## "All"
## HP:0000118
## "Phenotypic abnormality"
## HP:0001871
## "Abnormality of blood and blood-forming tissues"
## HP:0001872
## "Abnormality of thrombocytes"
## HP:0011873
## "Abnormal platelet count"
## HP:0001873
## "Thrombocytopenia"
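The same indexing extends to a whole set of terms. As a sketch, assuming a hypothetical, simplified stand-in for hpo$ancestors in which each vector includes the term itself, a single [ call extracts the ancestor vectors for every term and unique(unlist(...)) gives their union.

```r
# Hypothetical stand-in for hpo$ancestors (simplified, NOT the real HPO);
# each vector includes the term itself.
ancestors <- list(
  "HP:0001873" = c("HP:0000001", "HP:0000118", "HP:0001871", "HP:0001872",
                   "HP:0011873", "HP:0001873"),
  "HP:0000006" = c("HP:0000001", "HP:0000005", "HP:0000006")
)
terms <- c("HP:0001873", "HP:0000006")

# One '[' call pulls out the ancestor vectors for every term at once;
# unlist() flattens them and unique() removes shared ancestors.
all_ancestors <- unique(unlist(ancestors[terms], use.names = FALSE))
all_ancestors
```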
A set of terms, i.e. a character vector of term IDs, may contain ancestor-descendant pairs. The function minimal_set removes every term that is an ancestor of another term in the set, leaving a minimal set of terms in the sense of the ontology’s directed acyclic graph.
terms <- c("HP:0001871", "HP:0001873", "HP:0011877")
hpo$name[terms]
## HP:0001871
## "Abnormality of blood and blood-forming tissues"
## HP:0001873
## "Thrombocytopenia"
## HP:0011877
## "Increased mean platelet volume"
minimal_set(hpo, terms)
## [1] "HP:0001873" "HP:0011877"
# union of the ancestors of the given terms (including the terms themselves)
get_ancestors(hpo, c("HP:0001873", "HP:0011877"))
## [1] "HP:0000001" "HP:0000118" "HP:0001871" "HP:0001872" "HP:0011873"
## [6] "HP:0001873" "HP:0011876" "HP:0011877"
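The idea behind minimal_set can be sketched in base R: a term is dropped if it appears among the ancestors of another term in the set. toy_minimal_set and the ancestor lists below are hypothetical and simplified (not the real HPO), and this is not necessarily how the package implements it.

```r
# Hypothetical, simplified ancestor lists (each including the term itself).
ancestors <- list(
  "HP:0001871" = c("HP:0000001", "HP:0000118", "HP:0001871"),
  "HP:0001873" = c("HP:0000001", "HP:0000118", "HP:0001871", "HP:0001872",
                   "HP:0011873", "HP:0001873"),
  "HP:0011877" = c("HP:0000001", "HP:0000118", "HP:0001871", "HP:0001872",
                   "HP:0011876", "HP:0011877")
)

# Illustrative helper: keep a term only if it is not an ancestor of any
# other term in the set.
toy_minimal_set <- function(ancestors, terms) {
  is_redundant <- vapply(terms, function(t) {
    others <- setdiff(terms, t)
    any(vapply(others, function(o) t %in% ancestors[[o]], logical(1)))
  }, logical(1))
  terms[!is_redundant]
}

toy_minimal_set(ancestors, c("HP:0001871", "HP:0001873", "HP:0011877"))
```

As with minimal_set above, HP:0001871 is removed because it is an ancestor of HP:0001873.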
This example shows how, starting with the terms "Thrombocytopenia" and "Autosomal dominant inheritance", we can remove, prune or intersect with the "Mode of inheritance" branch of the ontology.
terms <- c("HP:0001873","HP:0000006")
hpo$name[terms]
## HP:0001873 HP:0000006
## "Thrombocytopenia" "Autosomal dominant inheritance"
mode_of_inheritance <- hpo$id[grep(x=hpo$name, pattern="Mode of inheritance")]
hpo$name[mode_of_inheritance]
## HP:0000005
## "Mode of inheritance"
# remove terms in the mode of inheritance branch
exclude_branches(ontology=hpo, branch_roots=mode_of_inheritance, terms=terms)
## [1] "HP:0001873"
# replace terms in the mode of inheritance branch with the branch root
prune_branches(ontology=hpo, branch_roots=mode_of_inheritance, terms=terms)
## [1] "HP:0001873" "HP:0000005"
# keep only terms in the mode of inheritance branch
intersection_with_branches(ontology=hpo, branch_roots=mode_of_inheritance, terms=terms)
## [1] "HP:0000006"
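All three branch operations rest on one test: does the branch root appear among a term’s ancestors? Here is a hypothetical base-R sketch with simplified ancestor lists and a single branch root (the package functions take branch_roots, which may hold several). It is illustrative only, not the package’s code, and reproduces the three results shown above.

```r
# Hypothetical, simplified ancestor lists (each including the term itself).
ancestors <- list(
  "HP:0001873" = c("HP:0000001", "HP:0000118", "HP:0001871", "HP:0001872",
                   "HP:0011873", "HP:0001873"),
  "HP:0000006" = c("HP:0000001", "HP:0000005", "HP:0000006")
)
terms <- c("HP:0001873", "HP:0000006")
root <- "HP:0000005"  # single branch root assumed for this sketch

# TRUE for each term lying in the branch below 'root' (root included).
in_branch <- vapply(terms, function(t) root %in% ancestors[[t]], logical(1))

excluded <- terms[!in_branch]                       # drop the branch
pruned <- unique(c(terms[!in_branch],
                   if (any(in_branch)) root))       # collapse branch to root
intersected <- terms[in_branch]                     # keep only the branch
```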
To force terms to be compatible with a particular ontology, use the force_compatibility function. This drops terms that are not in the ontology and attempts to map obsolete terms to non-obsolete terms if they have an alternative ID, i.e. an ‘alt_id’ in the .obo file.
force_compatibility(hpo, c("HP:0001873","nonsense term"))
## [1] "HP:0001873"
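A rough sketch of the behaviour described above, assuming a hypothetical vector of current IDs and a hypothetical map from alternative IDs to current IDs (HP:9999999 is a made-up alt_id): alternative IDs are mapped to their current term, and anything still not in the ontology is dropped. toy_force_compatibility is illustrative only, not the package’s code.

```r
# Hypothetical data: current term IDs, and a map from alternative IDs
# (the .obo 'alt_id' tag) to the current ID. "HP:9999999" is made up.
current_ids <- c("HP:0001873", "HP:0000006")
alt_to_current <- c("HP:9999999" = "HP:0001873")

# Illustrative helper: map alt IDs to current IDs, then drop unknown terms.
toy_force_compatibility <- function(terms, current_ids, alt_to_current) {
  mapped <- terms
  is_alt <- terms %in% names(alt_to_current)
  mapped[is_alt] <- unname(alt_to_current[terms[is_alt]])
  unique(mapped[mapped %in% current_ids])
}

toy_force_compatibility(c("HP:0001873", "HP:9999999", "nonsense term"),
                        current_ids, alt_to_current)
```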