Dealing with country names

Identifying country names

The function is_country() checks whether a string is a country name. The argument fuzzy_match increases the tolerance of the check so that names with small typos are still recognised.

is_country(c("United States","Unated States","dot","DNK",123), fuzzy_match = FALSE) # FALSE is the default and will run faster
#> [1]  TRUE FALSE FALSE  TRUE FALSE
is_country(c("United States","Unated States","dot","DNK",123), fuzzy_match = TRUE)
#> [1]  TRUE  TRUE FALSE  TRUE FALSE
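
To give an intuition for what tolerance to small typos means, the sketch below uses edit distance from base R (utils::adist). This is only an illustration: is_close_match() and its max_dist argument are hypothetical, and the matching algorithm actually used by is_country() may differ.

```r
# Hypothetical helper, NOT part of the countries package.
# Returns TRUE when a string is within `max_dist` single-character edits
# (insertion, deletion or substitution) of any reference name.
is_close_match <- function(x, reference, max_dist = 1) {
  # adist() returns a matrix: one row per element of x, one column per reference name
  d <- utils::adist(tolower(x), tolower(reference))
  apply(d, 1, min) <= max_dist
}

reference <- c("United States", "Denmark", "India")
result <- is_close_match(c("United States", "Unated States", "dot"), reference)
print(result)
```

With max_dist = 0 the helper accepts only exact matches (ignoring case), which mirrors the trade-off between strictness and tolerance described above.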

is_country() can also check for a specific subset of countries. In the following example, it tests whether each string refers to India or Sri Lanka, while allowing for different naming conventions and languages.

is_country(x=c("Ceylon","LKA","Indonesia","Inde"), check_for=c("India","Sri Lanka"))
#> [1]  TRUE  TRUE FALSE  TRUE
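
Because is_country() returns a logical vector, its result can be used directly for subsetting. A minimal sketch, assuming the countries package is installed and reusing the call shown above:

```r
library(countries)  # assumes the package is installed

x <- c("Ceylon", "LKA", "Indonesia", "Inde")

# Keep only the entries that refer to India or Sri Lanka
selected <- x[is_country(x = x, check_for = c("India", "Sri Lanka"))]
print(selected)
```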

Finally, the package also provides the function find_countrycol(), which can be used to find which columns in a data frame contain country names.
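
The idea behind detecting country columns can be sketched with is_country() alone. This is an illustration under stated assumptions, not a description of how find_countrycol() works internally: the data frame df and the rule "every entry must be recognised" are made up for the example, and find_countrycol() may use different logic and thresholds.

```r
library(countries)  # assumes the package is installed

# Made-up data: one column of country names, one numeric column
df <- data.frame(partner = c("United States", "DNK"),
                 value   = c(10, 20))

# Share of entries in each column that is_country() recognises
share <- sapply(df, function(col) mean(is_country(as.character(col))))

# Columns in which every entry is recognised as a country name
country_cols <- names(share)[share == 1]
print(country_cols)
```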

Getting a list of country names

The functions list_countries() and random_countries() return lists of country names. The former returns all countries, while the latter returns n randomly picked countries.

random_countries(5)
#> [1] "Senegal"               "Seychelles"            "Solomon Islands"      
#> [4] "Rwanda"                "Sao Tome and Principe"
list_countries()
#>   [1] "Afghanistan"                                         
#>   [2] "Åland Islands"                                       
#>   [3] "Albania"                                             
#>   [4] "Algeria"                                             
#>   [5] "American Samoa"                                      
#>   [6] "Andorra"                                             
#>   [7] "Angola"                                              
#>   [8] "Anguilla"                                            
#>   [9] "Antarctica"                                          
#>  [10] "Antigua and Barbuda"                                 
#>  [11] "Argentina"                                           
#>  [12] "Armenia"                                             
#>  [13] "Aruba"                                               
#>  [14] "Australia"                                           
#>  [15] "Austria"                                             
#>  [16] "Azerbaijan"                                          
#>  [17] "Bahamas"                                             
#>  [18] "Bahrain"                                             
#>  [19] "Bangladesh"                                          
#>  [20] "Barbados"                                            
#>  [21] "Belarus"                                             
#>  [22] "Belgium"                                             
#>  [23] "Belize"                                              
#>  [24] "Benin"                                               
#>  [25] "Bermuda"                                             
#>  [26] "Bhutan"                                              
#>  [27] "Bolivia (Plurinational State of)"                    
#>  [28] "Bonaire, Sint Eustatius and Saba"                    
#>  [29] "Bosnia and Herzegovina"                              
#>  [30] "Botswana"                                            
#>  [31] "Bouvet Island"                                       
#>  [32] "Brazil"                                              
#>  [33] "British Indian Ocean Territory"                      
#>  [34] "Virgin Islands (British)"                            
#>  [35] "Brunei Darussalam"                                   
#>  [36] "Bulgaria"                                            
#>  [37] "Burkina Faso"                                        
#>  [38] "Burundi"                                             
#>  [39] "Cabo Verde"                                          
#>  [40] "Cambodia"                                            
#>  [41] "Cameroon"                                            
#>  [42] "Canada"                                              
#>  [43] "Cayman Islands"                                      
#>  [44] "Central African Republic"                            
#>  [45] "Chad"                                                
#>  [46] "Chile"                                               
#>  [47] "China"                                               
#>  [48] "Hong Kong"                                           
#>  [49] "Macao"                                               
#>  [50] "Christmas Island"                                    
#>  [51] "Cocos (Keeling) Islands"                             
#>  [52] "Colombia"                                            
#>  [53] "Comoros"                                             
#>  [54] "Congo"                                               
#>  [55] "Cook Islands"                                        
#>  [56] "Costa Rica"                                          
#>  [57] "Côte d'Ivoire"                                       
#>  [58] "Croatia"                                             
#>  [59] "Cuba"                                                
#>  [60] "Curaçao"                                             
#>  [61] "Cyprus"                                              
#>  [62] "Czechia"                                             
#>  [63] "Korea (Democratic People's Republic of)"             
#>  [64] "Congo, Democratic Republic of the"                   
#>  [65] "Denmark"                                             
#>  [66] "Djibouti"                                            
#>  [67] "Dominica"                                            
#>  [68] "Dominican Republic"                                  
#>  [69] "Ecuador"                                             
#>  [70] "Egypt"                                               
#>  [71] "El Salvador"                                         
#>  [72] "Equatorial Guinea"                                   
#>  [73] "Eritrea"                                             
#>  [74] "Estonia"                                             
#>  [75] "Eswatini"                                            
#>  [76] "Ethiopia"                                            
#>  [77] "Falkland Islands (Malvinas)"                         
#>  [78] "Faroe Islands"                                       
#>  [79] "Fiji"                                                
#>  [80] "Finland"                                             
#>  [81] "France"                                              
#>  [82] "French Guiana"                                       
#>  [83] "French Polynesia"                                    
#>  [84] "French Southern Territories"                         
#>  [85] "Gabon"                                               
#>  [86] "Gambia"                                              
#>  [87] "Georgia"                                             
#>  [88] "Germany"                                             
#>  [89] "Ghana"                                               
#>  [90] "Gibraltar"                                           
#>  [91] "Greece"                                              
#>  [92] "Greenland"                                           
#>  [93] "Grenada"                                             
#>  [94] "Guadeloupe"                                          
#>  [95] "Guam"                                                
#>  [96] "Guatemala"                                           
#>  [97] "Guernsey"                                            
#>  [98] "Guinea"                                              
#>  [99] "Guinea-Bissau"                                       
#> [100] "Guyana"                                              
#> [101] "Haiti"                                               
#> [102] "Heard Island and McDonald Islands"                   
#> [103] "Holy See"                                            
#> [104] "Honduras"                                            
#> [105] "Hungary"                                             
#> [106] "Iceland"                                             
#> [107] "India"                                               
#> [108] "Indonesia"                                           
#> [109] "Iran (Islamic Republic of)"                          
#> [110] "Iraq"                                                
#> [111] "Ireland"                                             
#> [112] "Isle of Man"                                         
#> [113] "Israel"                                              
#> [114] "Italy"                                               
#> [115] "Jamaica"                                             
#> [116] "Japan"                                               
#> [117] "Jersey"                                              
#> [118] "Jordan"                                              
#> [119] "Kazakhstan"                                          
#> [120] "Kenya"                                               
#> [121] "Kiribati"                                            
#> [122] "Kuwait"                                              
#> [123] "Kyrgyzstan"                                          
#> [124] "Lao People's Democratic Republic"                    
#> [125] "Latvia"                                              
#> [126] "Lebanon"                                             
#> [127] "Lesotho"                                             
#> [128] "Liberia"                                             
#> [129] "Libya"                                               
#> [130] "Liechtenstein"                                       
#> [131] "Lithuania"                                           
#> [132] "Luxembourg"                                          
#> [133] "Madagascar"                                          
#> [134] "Malawi"                                              
#> [135] "Malaysia"                                            
#> [136] "Maldives"                                            
#> [137] "Mali"                                                
#> [138] "Malta"                                               
#> [139] "Marshall Islands"                                    
#> [140] "Martinique"                                          
#> [141] "Mauritania"                                          
#> [142] "Mauritius"                                           
#> [143] "Mayotte"                                             
#> [144] "Mexico"                                              
#> [145] "Micronesia (Federated States of)"                    
#> [146] "Monaco"                                              
#> [147] "Mongolia"                                            
#> [148] "Montenegro"                                          
#> [149] "Montserrat"                                          
#> [150] "Morocco"                                             
#> [151] "Mozambique"                                          
#> [152] "Myanmar"                                             
#> [153] "Namibia"                                             
#> [154] "Nauru"                                               
#> [155] "Nepal"                                               
#> [156] "Netherlands"                                         
#> [157] "New Caledonia"                                       
#> [158] "New Zealand"                                         
#> [159] "Nicaragua"                                           
#> [160] "Niger"                                               
#> [161] "Nigeria"                                             
#> [162] "Niue"                                                
#> [163] "Norfolk Island"                                      
#> [164] "North Macedonia"                                     
#> [165] "Northern Mariana Islands"                            
#> [166] "Norway"                                              
#> [167] "Oman"                                                
#> [168] "Pakistan"                                            
#> [169] "Palau"                                               
#> [170] "Panama"                                              
#> [171] "Papua New Guinea"                                    
#> [172] "Paraguay"                                            
#> [173] "Peru"                                                
#> [174] "Philippines"                                         
#> [175] "Pitcairn"                                            
#> [176] "Poland"                                              
#> [177] "Portugal"                                            
#> [178] "Puerto Rico"                                         
#> [179] "Qatar"                                               
#> [180] "Korea, Republic of"                                  
#> [181] "Moldova, Republic of"                                
#> [182] "Réunion"                                             
#> [183] "Romania"                                             
#> [184] "Russian Federation"                                  
#> [185] "Rwanda"                                              
#> [186] "Saint Barthélemy"                                    
#> [187] "Saint Helena, Ascension and Tristan da Cunha"        
#> [188] "Saint Kitts and Nevis"                               
#> [189] "Saint Lucia"                                         
#> [190] "Saint Martin (French part)"                          
#> [191] "Saint Pierre and Miquelon"                           
#> [192] "Saint Vincent and the Grenadines"                    
#> [193] "Samoa"                                               
#> [194] "San Marino"                                          
#> [195] "Sao Tome and Principe"                               
#> [196] "Saudi Arabia"                                        
#> [197] "Senegal"                                             
#> [198] "Serbia"                                              
#> [199] "Seychelles"                                          
#> [200] "Sierra Leone"                                        
#> [201] "Singapore"                                           
#> [202] "Sint Maarten (Dutch part)"                           
#> [203] "Slovakia"                                            
#> [204] "Slovenia"                                            
#> [205] "Solomon Islands"                                     
#> [206] "Somalia"                                             
#> [207] "South Africa"                                        
#> [208] "South Georgia and the South Sandwich Islands"        
#> [209] "South Sudan"                                         
#> [210] "Spain"                                               
#> [211] "Sri Lanka"                                           
#> [212] "Palestine, State of"                                 
#> [213] "Sudan"                                               
#> [214] "Suriname"                                            
#> [215] "Svalbard and Jan Mayen"                              
#> [216] "Sweden"                                              
#> [217] "Switzerland"                                         
#> [218] "Syrian Arab Republic"                                
#> [219] "Tajikistan"                                          
#> [220] "Thailand"                                            
#> [221] "Timor-Leste"                                         
#> [222] "Togo"                                                
#> [223] "Tokelau"                                             
#> [224] "Tonga"                                               
#> [225] "Trinidad and Tobago"                                 
#> [226] "Tunisia"                                             
#> [227] "Turkey"                                              
#> [228] "Turkmenistan"                                        
#> [229] "Turks and Caicos Islands"                            
#> [230] "Tuvalu"                                              
#> [231] "Uganda"                                              
#> [232] "Ukraine"                                             
#> [233] "United Arab Emirates"                                
#> [234] "United Kingdom of Great Britain and Northern Ireland"
#> [235] "Tanzania, United Republic of"                        
#> [236] "United States Minor Outlying Islands"                
#> [237] "United States of America"                            
#> [238] "Virgin Islands (U.S.)"                               
#> [239] "Uruguay"                                             
#> [240] "Uzbekistan"                                          
#> [241] "Vanuatu"                                             
#> [242] "Venezuela (Bolivarian Republic of)"                  
#> [243] "Viet Nam"                                            
#> [244] "Wallis and Futuna"                                   
#> [245] "Western Sahara"                                      
#> [246] "Yemen"                                               
#> [247] "Zambia"                                              
#> [248] "Zimbabwe"                                            
#> [249] "Taiwan, Province of China"

Country names can be requested in different languages and nomenclatures. The full list of supported languages and nomenclatures is given in the next section.

random_countries(5, nomenclature = "ISO3")
#> [1] "GBR" "GUF" "DOM" "SPM" "ESP"
random_countries(5, nomenclature = "name_ar")
#> [1] "سان بيير وميكلون" "ساحل العاج"       "جزر أولاند"       "المغرب"          
#> [5] "ألمانيا"

Converting and translating country names

The function country_name() can be used to convert country names to different naming conventions or to translate them to different languages.

example <- c("United States","DR Congo", "Morocco")

# Getting 3-letter ISO codes
country_name(x= example, to="ISO3")
#> [1] "USA" "COD" "MAR"

# Translating to Spanish
country_name(x= example, to="name_es")
#> [1] "Estados Unidos"                  "República Democrática del Congo"
#> [3] "Marruecos"

If multiple values are passed to the to argument, the function returns a data.frame object with one column for each naming convention.

# Requesting 2-letter ISO codes and translation to Spanish and French
country_name(x= example, to=c("ISO2","name_es","name_fr"))
#>   ISO2                         name_es                          name_fr
#> 1   US                  Estados Unidos                       États-Unis
#> 2   CD República Democrática del Congo République démocratique du Congo
#> 3   MA                       Marruecos                            Maroc
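
Since the result is a data.frame with one row per input, it can be bound to the original data. A short sketch, assuming the countries package is installed; the trade data frame and its values are invented for illustration:

```r
library(countries)  # assumes the package is installed

# Invented example data
trade <- data.frame(partner = c("United States", "DR Congo", "Morocco"),
                    value   = c(120, 45, 80))

# Request two naming conventions at once, then attach them as new columns
codes <- country_name(x = trade$partner, to = c("ISO3", "name_fr"))
trade <- cbind(trade, codes)
print(trade)
```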

The to argument supports all the following naming conventions:

| CODE | DESCRIPTION |
|------|-------------|
| simple | A simple English version of the name containing only ASCII characters. This nomenclature is available for all countries. |
| ISO3 | 3-letter country codes as defined in ISO standard 3166-1 alpha-3. This nomenclature is available only for the territories in the standard (currently 249 territories). |
| ISO2 | 2-letter country codes as defined in ISO standard 3166-1 alpha-2. This nomenclature is available only for the territories in the standard (currently 249 territories). |
| ISO_code | Numeric country codes as defined in ISO standard 3166-1 numeric. This country code is the same as the UN's country number (M49 standard). This nomenclature is available only for the territories in the ISO standard (currently 249 territories). |
| UN_xx | Official UN name in the 6 official UN languages: Arabic (UN_ar), Chinese (UN_zh), English (UN_en), French (UN_fr), Spanish (UN_es), Russian (UN_ru). This nomenclature is available only for countries in the M49 standard (currently 249 territories). |
| WTO_xx | Official WTO name in the 3 official WTO languages: English (WTO_en), French (WTO_fr), Spanish (WTO_es). This nomenclature is available only for WTO members and observers (currently 189 entities). |
| name_xx | Translation of ISO country names in 28 different languages: Arabic (name_ar), Bulgarian (name_bg), Czech (name_cs), Danish (name_da), German (name_de), Greek (name_el), English (name_en), Spanish (name_es), Estonian (name_et), Basque (name_eu), Finnish (name_fi), French (name_fr), Hungarian (name_hu), Italian (name_it), Japanese (name_ja), Korean (name_ko), Lithuanian (name_lt), Dutch (name_nl), Norwegian (name_no), Polish (name_po), Portuguese (name_pt), Romanian (name_ro), Russian (name_ru), Slovak (name_sk), Swedish (name_sv), Thai (name_th), Ukrainian (name_uk), Chinese simplified (name_zh), Chinese traditional (name_zh-tw). |
| GTAP | GTAP country and region codes. |
| all | Converts to all the nomenclatures and languages in this table. |

Further options and warning messages

country_name() can identify countries even when they are provided in mixed formats or in different languages. It is robust to small misspellings and recognises many alternative country names and old nomenclatures.

fuzzy_example <- c("US","C@ète d^Ivoire","Zaire","FYROM","Estados Unidos","ITA")

country_name(x= fuzzy_example, to=c("UN_en"))
#> Multiple country IDs have been matched to the same country name
#> 
#> Set - verbose - to TRUE for more details
#> [1] "United States of America"         "Côte d’Ivoire"                   
#> [3] "Democratic Republic of the Congo" "North Macedonia"                 
#> [5] "United States of America"         "Italy"

More information on the country matching process can be obtained by setting verbose=TRUE. The function then prints a summary of how the names were matched, as in the following example:

country_name(x= fuzzy_example, to=c("UN_en"), verbose=TRUE)
#> 
#> In total 6 unique country names were provided
#> 5/6 have been matched with EXACT matching
#> 1/6 have been matched with FUZZY matching
#> 
#> 
#> Multiple arguments have been matched to the same country name:
#>   - Estados Unidos : United States of America 
#>   - US : United States of America
#> [1] "United States of America"         "Côte d’Ivoire"                   
#> [3] "Democratic Republic of the Congo" "North Macedonia"                 
#> [5] "United States of America"         "Italy"

Setting verbose=TRUE also prints additional information on the specific warnings that the function normally gives:

country_name(x= c("Taiwan","lsajdèd"), to=c("UN_en"), verbose=FALSE)
#> Some country IDs have no match in one or more country naming conventions
#> There is low confidence on the matching of some country names
#> 
#> Set - verbose - to TRUE for more details
#> [1] NA NA
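
Unmatched inputs come back as NA, so they can be located with is.na() and reviewed by hand. A minimal sketch, assuming the countries package is installed and reusing the call above:

```r
library(countries)  # assumes the package is installed

x <- c("Taiwan", "lsajdèd")
res <- country_name(x = x, to = "UN_en")

# Inputs for which no name was returned in the requested nomenclature
unmatched <- x[is.na(res)]
print(unmatched)
```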

All the information from verbose mode can be accessed by setting simplify=FALSE. This will return a list object containing that information.
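
A short sketch of inspecting that list, assuming the countries package is installed. The names of the list elements are not documented in this section, so the example only prints the structure with str() rather than accessing specific elements:

```r
library(countries)  # assumes the package is installed

fuzzy_example <- c("US", "C@ète d^Ivoire", "Zaire", "FYROM", "Estados Unidos", "ITA")

# With simplify = FALSE the function returns a list instead of a plain vector
res_full <- country_name(x = fuzzy_example, to = "UN_en", simplify = FALSE)

# Show only the top level of the list to see which elements are available
str(res_full, max.level = 1)
```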

Using custom conversion tables

In some cases, the user might be unhappy with the naming conversion or no valid conversion might exist for the provided territory. In these cases, it might be useful to tweak the conversion table. The package contains a utility function called match_table(), which can be used to generate conversion tables for small adjustments.

example_custom <- c("Siam","Burma","H#@°)Koe2")

# Suppose we are unhappy with how "H#@°)Koe2" is interpreted by the function
country_name(x = example_custom, to = "name_en")
#> There is low confidence on the matching of some country names
#> 
#> Set - verbose - to TRUE for more details
#> [1] "Thailand" "Myanmar"  NA

# match_table() can be used to generate a table for small adjustments
tab <- match_table(x = example_custom, to = "name_en")
#> There is low confidence on the matching of some country names
tab$name_en[2] <- "Hong Kong"

# The adjusted table can then be used for conversion
country_name(x = example_custom, to = "name_en", custom_table = tab)
#> [1] "Thailand"  "Myanmar"   "Hong Kong"