ontologyIndex guide

Daniel Greene

2016-08-03

ontologyIndex provides a function for creating an ontology_index object from a given .obo file. The package comes with three such objects: hpo, mpo and go, encapsulating the Human Phenotype Ontology (HPO), Mammalian Phenotype Ontology (MPO) and Gene Ontology (GO) respectively. Here we’ll demonstrate the package using the HPO.

library(ontologyIndex)
data(hpo)

To use an up-to-date version, download the relevant .obo file and read it into R using the function get_ontology, passing it the file name.

ontology <- get_ontology(file)

You can use the functions get_term_property, get_term_names, get_term_children, get_term_ancestors and get_term_parents to query the ontology_index object. For instance:

get_term_ancestors(ontology=hpo, term="HP:0001873", as_names=FALSE)
## [1] "HP:0000001" "HP:0000118" "HP:0001871" "HP:0001872" "HP:0011873"
## [6] "HP:0001873"
get_term_ancestors(ontology=hpo, term="HP:0001873", as_names=TRUE)
##                                       HP:0000001 
##                                            "All" 
##                                       HP:0000118 
##                         "Phenotypic abnormality" 
##                                       HP:0001871 
## "Abnormality of blood and blood-forming tissues" 
##                                       HP:0001872 
##                    "Abnormality of thrombocytes" 
##                                       HP:0011873 
##                        "Abnormal platelet count" 
##                                       HP:0001873 
##                               "Thrombocytopenia"

However, the object is simply a list of vectors and lists of term properties, each indexed by the IDs of the terms. The properties stored in hpo and their classes are:

##    property     class
## 1        id character
## 2  obsolete   logical
## 3      name character
## 4   parents      list
## 5    alt_id character
## 6  children      list
## 7 ancestors      list
## 8   version character
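To make this layout concrete, here is a minimal sketch in base R that builds a hypothetical three-term ontology with the same shape and tabulates the class of each property. The IDs X:1 to X:3, the names and the object toy are invented for illustration, and only a few of the properties are included. It does not use the package and is not how hpo itself is constructed.

```r
# Hypothetical three-term ontology; the IDs and names are invented.
ids <- c("X:1", "X:2", "X:3")
toy <- list(
  id        = setNames(ids, ids),
  name      = setNames(c("Root", "Middle", "Leaf"), ids),
  parents   = setNames(list(character(0), "X:1", "X:2"), ids),
  ancestors = setNames(list("X:1", c("X:1", "X:2"), c("X:1", "X:2", "X:3")), ids)
)

# One row per property, with the class of the object that stores it.
property_classes <- data.frame(
  property = names(toy),
  class = unname(vapply(toy, class, character(1))),
  stringsAsFactors = FALSE
)
print(property_classes)

# Every property carries the term IDs as its names.
all_indexed_by_id <- all(vapply(toy, function(p) identical(names(p), ids), logical(1)))
print(all_indexed_by_id)
```

Single-valued properties such as name fit in a character vector, whereas properties that hold several values per term, such as parents and ancestors, need a list.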

You can therefore also look up properties directly: use [ to select one or more terms from a property, and [[ to extract a single element of a list-valued property such as ancestors. Direct indexing is the fastest way to use the ontology_index when operating on many terms.

hpo$name["HP:0001873"]
##         HP:0001873 
## "Thrombocytopenia"
hpo$id[grep(x=hpo$name, pattern="Thrombocytopenia")]
##   HP:0001873 
## "HP:0001873"
hpo$ancestors[["HP:0001873"]]
## [1] "HP:0000001" "HP:0000118" "HP:0001871" "HP:0001872" "HP:0011873"
## [6] "HP:0001873"
hpo$name[hpo$ancestors[["HP:0001873"]]]
##                                       HP:0000001 
##                                            "All" 
##                                       HP:0000118 
##                         "Phenotypic abnormality" 
##                                       HP:0001871 
## "Abnormality of blood and blood-forming tissues" 
##                                       HP:0001872 
##                    "Abnormality of thrombocytes" 
##                                       HP:0011873 
##                        "Abnormal platelet count" 
##                                       HP:0001873 
##                               "Thrombocytopenia"
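The same indexing extends to several terms at once. The sketch below uses a hypothetical toy object (invented IDs and names, base R only) to show that [ with a vector of IDs returns all matching entries in one step, and that the result for a list-valued property is itself a list that can be processed with lapply.

```r
# Hypothetical toy properties, named by term ID as in an ontology_index.
ids <- c("X:1", "X:2", "X:3")
toy <- list(
  name      = setNames(c("Root", "Middle", "Leaf"), ids),
  ancestors = setNames(list("X:1", c("X:1", "X:2"), c("X:1", "X:2", "X:3")), ids)
)

terms <- c("X:2", "X:3")

# `[` on a character property: a named character vector, one entry per term.
term_names <- toy$name[terms]
print(term_names)

# `[` on a list property: a list with one ancestor vector per term.
term_ancestors <- toy$ancestors[terms]

# Translate each ancestor vector to names with a single lookup per term.
ancestor_names <- lapply(term_ancestors, function(a) unname(toy$name[a]))
print(ancestor_names)
```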

Removing redundant terms

A set of terms, i.e. a character vector of term IDs, may contain ancestor-descendant pairs. The function minimal_set removes every term that is an ancestor of another term in the set, leaving a minimal set of terms in the sense of the ontology’s directed acyclic graph.

terms <- c("HP:0001871", "HP:0001873", "HP:0011877")
hpo$name[terms]
##                                       HP:0001871 
## "Abnormality of blood and blood-forming tissues" 
##                                       HP:0001873 
##                               "Thrombocytopenia" 
##                                       HP:0011877 
##                 "Increased mean platelet volume"
minimal_set(hpo, terms)
## [1] "HP:0001873" "HP:0011877"
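The idea behind a minimal set can be sketched in a few lines of base R. The function toy_minimal_set and the small ontology below are hypothetical and are not the package's implementation. The sketch assumes, as in hpo$ancestors, that each term's ancestor vector includes the term itself.

```r
# Hypothetical ontology: R is the root, A and B are its children,
# A1 and A2 are children of A, and B1 is a child of B.
ancestors <- list(
  R  = "R",
  A  = c("R", "A"),
  A1 = c("R", "A", "A1"),
  A2 = c("R", "A", "A2"),
  B  = c("R", "B"),
  B1 = c("R", "B", "B1")
)

toy_minimal_set <- function(ancestors, terms) {
  terms <- unique(terms)
  # Strict ancestors (excluding the term itself) of every term in the set.
  strict <- unique(unlist(lapply(terms, function(term) {
    setdiff(ancestors[[term]], term)
  })))
  # Keep only the terms that are not a strict ancestor of another term.
  setdiff(terms, strict)
}

minimal <- toy_minimal_set(ancestors, c("A", "A1", "B1"))
print(minimal)  # "A1" "B1": A is dropped because it is an ancestor of A1
```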

Finding all ancestors of a set of terms

get_ancestors(hpo, c("HP:0001873", "HP:0011877"))
## [1] "HP:0000001" "HP:0000118" "HP:0001871" "HP:0001872" "HP:0011873"
## [6] "HP:0001873" "HP:0011876" "HP:0011877"
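The output above is consistent with taking the union of the ancestor vectors of the given terms. A minimal base R sketch of that idea follows. The function toy_get_ancestors and the toy ontology are hypothetical stand-ins, not the package function or its data.

```r
# Hypothetical ontology; each ancestor vector includes the term itself.
ancestors <- list(
  R  = "R",
  A  = c("R", "A"),
  A1 = c("R", "A", "A1"),
  B  = c("R", "B"),
  B1 = c("R", "B", "B1")
)

toy_get_ancestors <- function(ancestors, terms) {
  # Concatenate the per-term ancestor vectors and drop duplicates,
  # keeping the order of first appearance.
  unique(unlist(ancestors[terms], use.names = FALSE))
}

all_ancestors <- toy_get_ancestors(ancestors, c("A1", "B1"))
print(all_ancestors)  # "R" "A" "A1" "B" "B1"
```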

Operating on branches

This example shows how, starting with the terms "Thrombocytopenia" and "Autosomal dominant inheritance", to exclude, prune to the root of, and intersect with the "Mode of inheritance" branch of the ontology.

terms <- c("HP:0001873","HP:0000006")
hpo$name[terms]
##                       HP:0001873                       HP:0000006 
##               "Thrombocytopenia" "Autosomal dominant inheritance"
mode_of_inheritance <- hpo$id[grep(x=hpo$name, pattern="Mode of inheritance")]
hpo$name[mode_of_inheritance]
##            HP:0000005 
## "Mode of inheritance"
#remove mode of inheritance branch
exclude_branches(ontology=hpo, branch_roots=mode_of_inheritance, terms=terms)
## [1] "HP:0001873"
#prune down to mode of inheritance root
prune_branches(ontology=hpo, branch_roots=mode_of_inheritance, terms=terms)
## [1] "HP:0001873" "HP:0000005"
#only mode of inheritance branch
intersection_with_branches(ontology=hpo, branch_roots=mode_of_inheritance, terms=terms)
## [1] "HP:0000006"
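The three branch operations can be mimicked on a toy ontology to show how they differ. The sketch below is base R only. The functions toy_exclude, toy_prune and toy_intersection and the helper in_branch are hypothetical names, and the behaviour is inferred from the output above rather than taken from the package source.

```r
# Hypothetical ontology; each ancestor vector includes the term itself.
ancestors <- list(
  R  = "R",
  A  = c("R", "A"),
  A1 = c("R", "A", "A1"),
  B  = c("R", "B"),
  B1 = c("R", "B", "B1")
)

# TRUE for each term that lies in a branch rooted at one of `roots`.
in_branch <- function(ancestors, roots, terms) {
  vapply(terms, function(term) any(roots %in% ancestors[[term]]),
         logical(1), USE.NAMES = FALSE)
}

# Drop the terms that fall inside the branches.
toy_exclude <- function(ancestors, roots, terms) {
  terms[!in_branch(ancestors, roots, terms)]
}

# Keep only the terms that fall inside the branches.
toy_intersection <- function(ancestors, roots, terms) {
  terms[in_branch(ancestors, roots, terms)]
}

# Replace the terms inside a branch by the root of that branch.
toy_prune <- function(ancestors, roots, terms) {
  hit_roots <- intersect(roots, unlist(ancestors[terms], use.names = FALSE))
  c(toy_exclude(ancestors, roots, terms), hit_roots)
}

terms <- c("A1", "B1")
excluded <- toy_exclude(ancestors, "B", terms)       # "A1"
pruned   <- toy_prune(ancestors, "B", terms)         # "A1" "B"
kept     <- toy_intersection(ancestors, "B", terms)  # "B1"
```

Here the branch rooted at B plays the role of "Mode of inheritance": B1 is removed by exclusion, collapsed to B by pruning, and is the only term kept by intersection.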

Forcing compatibility

To force terms to be compatible with a particular ontology, use the force_compatibility function. This includes attempting to map obsolete terms to non-obsolete terms if they have an alternative ID, i.e. an ‘alt_id’ in the .obo file. Terms that cannot be matched, such as "nonsense term" in the example, are dropped.

force_compatibility(hpo, c("HP:0001873","nonsense term"))
## [1] "HP:0001873"