Bar Charts

David Gerbing

library("lessR")
## 
## lessR 3.9.3     feedback: gerbing@pdx.edu     web: lessRstats.com/new
## ---------------------------------------------------------------------
## 1. d <- Read("")           Read text, Excel, SPSS, SAS or R data file
##                            d: default data frame, no need for data=
## 2. l <- Read("", var_labels=TRUE)   Read variable labels into l,
##                            required name for data frame of labels
## 3. Help()                  Get help, and, e.g., Help(Read)
## 4. hs(), bc(), or ca()     All histograms, all bar charts, or both
## 5. Plot(X) or Plot(X,Y)    For continuous and categorical variables
## 6. by1= , by2=             Trellis graphics, a plot for each by1, by2
## 7. reg(Y ~ X, Rmd="eg")    Regression with full interpretative output
## 8. style("gray")           Grayscale theme, + many others available
##    style(show=TRUE)        all color/style options and current values
## 9. getColors()             create many styles of color palettes
## 
## lessR parameter names now use _'s. Names with a period are deprecated.
## Ex:  bin_width  instead of  bin.width

Bar Charts

One of the most frequently encountered visualizations is the bar chart.

Bar chart: Plots a number associated with each category of a categorical variable as the height of the corresponding bar.

A function call that creates a bar chart must name the variable that contains the categories to plot. With the BarChart() function, that variable name is the first argument and often, as in the Employee example below, the only argument. In that case, the numerical value associated with each bar is the count of occurrences of the corresponding category.
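The tabulation behind such a chart can be sketched in base R, without lessR. The vector `dept` below is a small hypothetical example, not the Employee data. The base R function `table()` counts the occurrences of each category, and those counts are what a bar chart of a single categorical variable displays as bar heights.

```r
# Hypothetical category values for six employees (not the Employee data)
dept <- c("SALE", "ADMN", "SALE", "MKTG", "SALE", "ADMN")

# table() counts how often each category occurs, ordered by category name
counts <- table(dept)
print(counts)
```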

First read the Employee data included as part of lessR.

d <- Read("Employee")
## 
## >>> Suggestions
## Details about your data, Enter:  details()  for d, or  details(name)
## 
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
## 
##     Variable                  Missing  Unique 
##         Name     Type  Values  Values  Values   First and last values
## ------------------------------------------------------------------------------------------
##  1     Years   integer     36       1      16   7  NA  15 ... 1  2  10
##  2    Gender character     37       0       2   M  M  M ... F  F  M
##  3      Dept character     36       1       5   ADMN  SALE  SALE ... MKTG  SALE  FINC
##  4    Salary    double     37       0      37   53788.26  94494.58 ... 56508.32  57562.36
##  5    JobSat character     35       2       3   med  low  low ... high  low  high
##  6      Plan   integer     37       0       3   1  1  3 ... 2  2  1
##  7       Pre   integer     37       0      27   82  62  96 ... 83  59  80
##  8      Post   integer     37       0      22   92  74  97 ... 90  71  87
## ------------------------------------------------------------------------------------------

To illustrate, consider the categorical variable Dept in the Employee data table. Use BarChart() to tabulate and display the number of employees in each department, here relying upon the default data frame (table) named d.

BarChart(Dept)
Bar chart of tabulated counts of employees in each department.

## >>> Suggestions
## BarChart(Dept, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="greens")  # sequential green bars
## PieChart(Dept)  # doughnut (ring) chart
## Plot(Dept)  # bubble plot
## Plot(Dept, stat="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
## Missing Values of Dept: 1 
## 
## 
##                 ACCT   ADMN   FINC   MKTG   SALE    Total 
## Frequencies:       5      6      4      6     15       36 
## Proportions:   0.139  0.167  0.111  0.167  0.417    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027

The BarChart() function provides a default color theme, and labels each bar with the associated numerical value. The function also provides the corresponding frequency distribution, the table that lists the count of each category, from which the bar chart is constructed.
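As a cross-check, the proportions and the chi-squared test reported above can be reproduced in base R from the frequencies alone. This is a minimal sketch that re-enters the counts by hand. It is not a description of how BarChart() computes them internally, and the names `counts`, `props`, and `test` are illustrative.

```r
# Department frequencies copied from the BarChart(Dept) output
counts <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)

# Proportion of the 36 non-missing employees in each department
props <- round(counts / sum(counts), 3)
print(props)

# Goodness-of-fit test of equal probabilities across the five departments
test <- chisq.test(counts)
print(test)
```

The test statistic, degrees of freedom, and p-value should agree with the values in the lessR output.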

Pie Charts

An alternative to the bar chart for a single categorical variable is the pie chart.

Pie chart: Relates each level of a categorical variable to the area of a slice of a circle (pie), with each slice scaled according to the value of an associated numerical variable.
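To make the scaling concrete, the following base R sketch converts the Dept frequencies reported earlier into the angle that each slice occupies. For a fixed radius, the area of a slice is proportional to its angle. The variable names are illustrative, and the calculation is independent of how PieChart() draws the chart.

```r
# Department frequencies from the Employee data, as reported by BarChart(Dept)
counts <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)

# Each slice receives a share of the full 360 degrees equal to its proportion
angles <- counts / sum(counts) * 360
print(angles)
```

SALE, with 15 of the 36 employees, takes up 150 of the 360 degrees, more than twice the slice of any other department.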

The version of the pie chart presented here is the doughnut, or ring, chart.

PieChart(Dept)

## >>> Suggestions
## PieChart(Dept, hole=0)  # traditional pie chart
## PieChart(Dept, values="%")  # display %'s on the chart
## BarChart(Dept)  # bar chart
## Plot(Dept)  # bubble plot
## Plot(Dept, values="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
##                 ACCT   ADMN   FINC   MKTG   SALE    Total 
## Frequencies:       5      6      4      6     15       36 
## Proportions:   0.139  0.167  0.111  0.167  0.417    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027

The doughnut or ring chart appears easier to read than a standard pie chart, but the lessR function PieChart() can also create the “old-fashioned” pie chart. The summary statistics have already appeared several times, so turn off the output to the R console here with the quiet parameter.

PieChart(Dept, hole=0, quiet=TRUE)
Standard pie chart of variable Dept in the d data frame.

Set the size of the hole in the doughnut or ring chart with the parameter hole, which specifies the proportion of the pie occupied by the hole. The default hole size is 0.65. Set that value to 0 to close the hole.