Introduction to the R package covid19br

Introduction

This vignette shows how to use the R package covid19br to download and explore data on the COVID-19 pandemic in Brazil and around the world. The package downloads datasets from two repositories: the official Brazilian repository (https://covid.saude.gov.br) and a repository with global data.

The global repository provides daily counts of confirmed cases, deaths, and recovered patients by country and territory, and has been widely used as a reliable source of data on the COVID-19 pandemic. The Brazilian repository provides data for Brazil at the city, state, region, and national levels.

We hope that this package helps other researchers and scientists understand and fight this terrible pandemic that has been plaguing the world.

Getting started with R package covid19br

We start by showing how to load COVID-19 data into R by downloading the national-level data set from the official Brazilian repository (https://covid.saude.gov.br):

library(covid19br)
library(tidyverse)

# downloading the data (at national level):
brazil <- downloadCovid19("brazil")

# looking at the downloaded data:
glimpse(brazil)
#> Rows: 1,157
#> Columns: 9
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases   <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 4, 6, 7, 6, 1, 6, 16, 23, 24, …
#> $ newFollowup  <int> 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 7, 12, 19, 24, 28, 36, 54, …
#> $ pop          <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…

# plotting the cumulative number of deaths:
ggplot(brazil, aes(x = date, y = accumDeaths)) +
  geom_point() +
  geom_path()

Next, we will show how to plot the daily count of new deaths along with its 7-day moving average. Here, we use the function pracma::movavg() to compute the moving average.

library(pracma)

# computing the moving average:
brazil <- brazil %>%
  mutate(
    ma_newDeaths = movavg(newDeaths, n = 7, type = "s")
  )

# looking at the transformed data:
glimpse(brazil)
#> Rows: 1,157
#> Columns: 10
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases   <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 4, 6, 7, 6, 1, 6, 16, 23, 24, …
#> $ newFollowup  <int> 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 7, 12, 19, 24, 28, 36, 54, …
#> $ pop          <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…
#> $ ma_newDeaths <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.…
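
To make the smoothing step above more concrete, here is a minimal base R sketch of a trailing simple moving average. It is not the pracma implementation: the helper trailing_mean() and the toy series are hypothetical, and the sketch assumes that the first n - 1 values are averaged over the shorter window available, which is consistent with ma_newDeaths starting at 0 rather than NA in the output above.

```r
# Trailing simple moving average in base R (illustrative sketch only).
# Assumption: for the first n - 1 positions the mean is taken over the
# values seen so far, so no NA is produced at the start of the series.
trailing_mean <- function(x, n) {
  stopifnot(is.numeric(x), length(n) == 1, n >= 1)
  vapply(seq_along(x), function(k) {
    mean(x[max(1, k - n + 1):k])
  }, numeric(1))
}

# toy series of daily counts (made-up numbers, not real data)
toy_counts <- c(1, 2, 3, 4, 5, 6, 7, 8)
toy_ma <- trailing_mean(toy_counts, n = 7)
print(toy_ma)
#> [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 5.0
```

The last value averages days 2 to 8 only, which is what a 7-day window implies once enough observations are available.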

After computing the desired moving average, it is convenient to reorganize the data into the so-called tidy (long) format, with one row per date and series. This can be done with the function tidyr::pivot_longer():

deaths <- brazil %>%
  select(date, newDeaths, ma_newDeaths) %>%
  pivot_longer(
    cols = c("newDeaths", "ma_newDeaths"),
    values_to = "deaths", names_to = "type"
  ) %>%
  mutate(
    type = recode(
      type,
      ma_newDeaths = "moving average",
      newDeaths = "count"
    )
  )

# looking at the (tidy) data:
glimpse(deaths)
#> Rows: 2,314
#> Columns: 3
#> $ date   <date> 2020-02-25, 2020-02-25, 2020-02-26, 2020-02-26, 2020-02-27, 20…
#> $ type   <chr> "count", "moving average", "count", "moving average", "count", …
#> $ deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

# drawing the desired plot:
ggplot(deaths, aes(x = date, y = deaths, color = type)) +
  geom_point() +
  geom_path() +
  theme(legend.position = "bottom")
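
The reshaping step used above may be easier to follow on a small example. The sketch below applies tidyr::pivot_longer() to a hypothetical two-row data frame (the values are made up) and shows that each date ends up with one row per series, which is why the tidy data set has twice as many rows as the original one.

```r
library(tidyr)

# hypothetical wide data: one row per date, one column per series
wide <- data.frame(
  date = as.Date(c("2020-03-01", "2020-03-02")),
  newDeaths = c(3, 5),
  ma_newDeaths = c(3, 4)
)

# long (tidy) data: one row per date and series
long <- pivot_longer(
  wide,
  cols = c("newDeaths", "ma_newDeaths"),
  names_to = "type",
  values_to = "deaths"
)

# 2 dates x 2 series = 4 rows, with columns date, type and deaths
print(long)
```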

When dealing with epidemiological data we are often interested in computing quantities such as incidence, mortality and lethality rates. The function covid19br::add_epi_rates() can be used to add those rates to the downloaded data, as shown below:


# downloading the data (region level):
regions <- downloadCovid19("regions") 

# adding the rates to the downloaded data:
regions <- regions %>%
  add_epi_rates()

# looking at the data:
glimpse(regions)
#> Rows: 5,785
#> Columns: 13
#> $ region       <chr> "Midwest", "Midwest", "Midwest", "Midwest", "Midwest", "M…
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 3, 4, …
#> $ accumCases   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 5, 9, …
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ newFollowup  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pop          <dbl> 16297074, 16297074, 16297074, 16297074, 16297074, 1629707…
#> $ incidence    <dbl> 0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.000…
#> $ lethality    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ mortality    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
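
As a rough guide to what the three added columns mean, the sketch below computes them by hand on a hypothetical data frame. The definitions are assumptions inferred from the values in this vignette rather than taken from the package source: incidence and mortality are treated as cumulative cases and deaths per 100,000 inhabitants, and lethality as the percentage of confirmed cases that resulted in death. For instance, the lethality reported for São Paulo in the table at the end of this vignette (3.84) agrees with 100 * 44799 / 1167278 rounded to two decimals.

```r
# Toy cumulative counts for two hypothetical locations (made-up numbers).
toy <- data.frame(
  place       = c("A", "B"),
  accumCases  = c(100, 250),
  accumDeaths = c(2, 10),
  pop         = c(1e5, 5e5)
)

# Assumed definitions (inferred from the vignette output, not from the
# package source): rates per k = 100,000 inhabitants, lethality in percent.
k <- 1e5
toy$incidence <- k * toy$accumCases / toy$pop
toy$mortality <- k * toy$accumDeaths / toy$pop
toy$lethality <- 100 * toy$accumDeaths / toy$accumCases

print(toy)

# consistency check against the São Paulo row of the final table
sp_lethality <- round(100 * 44799 / 1167278, 2)
```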

The function plotly::ggplotly() can be used to draw an interactive plot as follows:

library(plotly)

p <- ggplot(regions, aes(x = date, y = mortality, color = region)) +
  geom_point() +
  geom_path()

ggplotly(p)

In our last example, we will build a table summarizing the COVID-19 pandemic in the 27 Brazilian capitals as of 2023-04-26, the most recent date in the downloaded data.

library(kableExtra)

cities <- downloadCovid19("cities")

capitals <- cities %>%
  filter(capital == TRUE, date == max(date)) %>%
  add_epi_rates() %>%
  select(region, state, city, newCases, newDeaths, accumCases, accumDeaths, incidence, mortality, lethality) %>%
  arrange(desc(lethality), desc(mortality), desc(incidence))

# printing the table:
capitals %>%
  kable(
    caption = "Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states."
  ) %>%
  # full_width is an argument of kableExtra::kable_styling(), not of kable()
  kable_styling(full_width = FALSE)
Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states.
region     state  city            newCases  newDeaths  accumCases  accumDeaths  incidence  mortality  lethality
Southeast  SP     São Paulo              0          0     1167278        44799   9527.227   365.6457       3.84
Northeast  MA     São Luís               0          0       77038         2758   6991.480   250.2986       3.58
North      PA     Belém                  0          0      158424         5448  10612.931   364.9652       3.44
North      AM     Manaus                 0          0      318040         9940  14570.524   455.3861       3.13
Southeast  RJ     Rio de Janeiro         0          0     1330988        38212  19809.603   568.7238       2.87
South      PR     Curitiba               0          0      305113         8765  15783.571   453.4156       2.87
Northeast  CE     Fortaleza              0          0      410573        11795  15381.056   441.8692       2.87
Northeast  BA     Salvador               0          0      338082         9113  11770.235   317.2667       2.70
Midwest    MT     Cuiabá                 0          0      153954         3744  25133.418   611.2184       2.43
Northeast  AL     Maceió                 0          0      132452         3219  12998.897   315.9141       2.43
Northeast  PE     Recife                 0          0      302939         6678  18407.610   405.7781       2.20
Midwest    MS     Campo Grande           0          0      215395         4685  24040.103   522.8900       2.18
Northeast  PI     Teresina               0          0      141098         3015  16314.831   348.6174       2.14
North      RO     Porto Velho            0          0      129376         2748  24431.586   518.9370       2.12
South      RS     Porto Alegre           0          0      333768         6648  22494.576   448.0476       1.99
Northeast  RN     Natal                  0          0      156185         3077  17665.548   348.0289       1.97
Northeast  PB     João Pessoa            0          0      178559         3294  22071.161   407.1618       1.84
Southeast  MG     Belo Horizonte         0          0      472634         8435  18814.523   335.7789       1.78
Midwest    GO     Goiânia                0          0      468354         8049  30891.761   530.8971       1.72
North      AP     Macapá                 0          0       99612         1616  19790.713   321.0636       1.62
Northeast  SE     Aracaju                0          0      169874         2617  25855.501   398.3178       1.54
North      AC     Rio Branco             0          0       86972         1211  21352.306   297.3100       1.39
Midwest    DF     Brasília               0          0      903944        11854  29978.894   393.1326       1.31
North      RR     Boa Vista              0          0      139775         1650  35012.637   413.3132       1.18
Southeast  ES     Vitória                0          0      150218         1456  41485.569   402.1022       0.97
North      TO     Palmas                 0          0       89942          734  30068.165   245.3807       0.82
South      SC     Florianópolis          0          0      171830         1350  34299.254   269.4756       0.79