##
## lessR 3.9.4 feedback: gerbing@pdx.edu web: lessRstats.com/new
## -------------------------------------------------------------------
## > d <- Read("") Read text, Excel, SPSS, SAS or R data file
## d: default data frame, no need for data=
## > l <- Read("", var_labels=TRUE) Read variable labels into l,
## required name for data frame of labels
## > Help() Get help, and, e.g., Help(Read)
## > hs(), bc(), or ca() All histograms, all bar charts, or both
## > Plot(X) or Plot(X,Y) For continuous and categorical variables
## > by1= , by2= Trellis graphics, a plot for each by1, by2
## > reg(Y ~ X, Rmd="eg") Regression with full interpretative output
## > style("gray") Grayscale theme, + many others available
## > style(show=TRUE) all color/style options and current values
## > getColors() create many styles of color palettes
## > d[.(rows), .(cols)] subset with . more flexible than base R
One of the most frequently encountered visualizations is the bar chart.
Bar chart: Plots a number associated with each category of a categorical variable as the height of the corresponding bars.
A call to a function that creates a bar chart must contain the name of the variable that holds the categories to plot. With the BarChart() function, that variable name is the first argument passed to the function and often, as in this example, the only argument. In that situation, the numerical value associated with each bar is the count of occurrences of the corresponding category.
First read the Employee data included as part of lessR.
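A minimal sketch of that step, assuming lessR is installed and that the built-in data set is requested by the name "Employee"; the original call is not preserved in this text.

```r
# Load lessR, which prints the startup banner
library(lessR)

# Read the Employee data that ships with lessR into the default data frame d
d <- Read("Employee")
```

Naming the data frame d lets later calls such as BarChart(Dept) omit the data= argument.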
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Years integer 36 1 16 7 NA 15 ... 1 2 10
## 2 Gender character 37 0 2 M M M ... F F M
## 3 Dept character 36 1 5 ADMN SALE SALE ... MKTG SALE FINC
## 4 Salary double 37 0 37 53788.26 94494.58 ... 56508.32 57562.36
## 5 JobSat character 35 2 3 med low low ... high low high
## 6 Plan integer 37 0 3 1 1 3 ... 2 2 1
## 7 Pre integer 37 0 27 82 62 96 ... 83 59 80
## 8 Post integer 37 0 22 92 74 97 ... 90 71 87
## ------------------------------------------------------------------------------------------
To illustrate, consider the categorical variable Dept in the Employee data table. Use BarChart() to tabulate and display the number of employees in each department, here relying upon the default data frame (table) named d.
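The call itself is short. This sketch assumes the Employee data is read into d as described earlier.

```r
library(lessR)
d <- Read("Employee")

# Tabulate Dept and plot one bar per department; d is found by default
BarChart(Dept)
```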
Bar chart of tabulated counts of employees in each department.
## >>> Suggestions
## BarChart(Dept, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="greens") # sequential green bars
## PieChart(Dept) # doughnut (ring) chart
## Plot(Dept) # bubble plot
## Plot(Dept, stat="count") # lollipop plot
##
##
## --- Dept ---
##
##
## Missing Values of Dept: 1
##
##
## ACCT ADMN FINC MKTG SALE Total
## Frequencies: 5 6 4 6 15 36
## Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
##
##
## Chi-squared test of null hypothesis of equal probabilities
## Chisq = 10.944, df = 4, p-value = 0.027
The BarChart() function provides a default color theme and labels each bar with the associated numerical value. The function also provides the corresponding frequency distribution, the table that lists the count of each category, from which the bar chart is constructed.
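The frequency distribution and the chi-squared test in the console output can be checked with base R alone. This sketch copies the department counts from the output above instead of reading the data, so it illustrates the arithmetic, not how lessR computes it internally.

```r
# Department counts copied from the frequency table in the console output
freq <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)

# Proportions of the 36 non-missing values
prop <- round(freq / sum(freq), 3)
print(prop)

# Goodness-of-fit test; chisq.test() assumes equal probabilities by default
test <- chisq.test(freq)
print(test)
```

Under the equal-probability null each department has an expected count of 36/5 = 7.2, and the large SALE count accounts for most of the statistic.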
Specify a single fill color with the fill parameter, and a horizontal bar chart with the base R parameter horiz. Turn off console output with the parameter quiet. Turn off the displayed value on each bar with the parameter values.
Use the theme parameter to change the entire color theme: “colors”, “lightbronze”, “dodgerblue”, “darkred”, “gray”, “gold”, “darkgreen”, “blue”, “red”, “rose”, “green”, “purple”, “sienna”, “brown”, “orange”, “white”, and “light”.
Alternatively, use style() to change the theme for subsequent visualizations as well.
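A sketch of these options. The color and theme values are illustrative choices, not necessarily the ones behind the output shown here. The values parameter is left out because its accepted settings are not listed in this text.

```r
library(lessR)
d <- Read("Employee")

# One fill color, horizontal bars, and no console output
BarChart(Dept, fill="darkred", horiz=TRUE, quiet=TRUE)

# Change the color theme for this chart only
BarChart(Dept, theme="gray", quiet=TRUE)

# Change the theme for all subsequent charts, then switch back
style("gray")
style("colors")
```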
## >>> Suggestions
## BarChart(Dept, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="greens") # sequential green bars
## PieChart(Dept) # doughnut (ring) chart
## Plot(Dept) # bubble plot
## Plot(Dept, stat="count") # lollipop plot
##
##
## --- Dept ---
##
##
## Missing Values of Dept: 1
##
##
## ACCT ADMN FINC MKTG SALE Total
## Frequencies: 5 6 4 6 15 36
## Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
##
##
## Chi-squared test of null hypothesis of equal probabilities
## Chisq = 10.944, df = 4, p-value = 0.027
Dept is not an ordinal variable but, to illustrate, choose from the many sequential palettes available from getColors(): “reds”, “rusts”, “browns”, “olives”, “greens”, “emeralds”, “turquoises”, “aquas”, “blues”, “purples”, “violets”, “magentas”, and “grays”.
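For example, pass a palette name directly to fill. The choice of "reds" here is arbitrary; the Suggestions output uses the same form with "greens".

```r
library(lessR)
d <- Read("Employee")

# Fill the bars with a sequential palette, lightest to darkest
BarChart(Dept, fill="reds")
```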
## >>> Suggestions
## BarChart(Dept, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="greens") # sequential green bars
## PieChart(Dept) # doughnut (ring) chart
## Plot(Dept) # bubble plot
## Plot(Dept, stat="count") # lollipop plot
##
##
## --- Dept ---
##
##
## Missing Values of Dept: 1
##
##
## ACCT ADMN FINC MKTG SALE Total
## Frequencies: 5 6 4 6 15 36
## Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
##
##
## Chi-squared test of null hypothesis of equal probabilities
## Chisq = 10.944, df = 4, p-value = 0.027
Rotate and offset the axis labels with the rotate_x and offset parameters. Do a descending sort of the categories by frequency with the sort parameter.
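A sketch under stated assumptions: the rotation of 45 degrees and the offset of 1 are illustrative values, and "-" is assumed to be the sort setting that requests descending order.

```r
library(lessR)
d <- Read("Employee")

# Rotate the category labels, move them away from the axis,
# and order the bars from most to least frequent
BarChart(Dept, rotate_x=45, offset=1, sort="-")
```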
## >>> Suggestions
## BarChart(Dept, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="greens") # sequential green bars
## PieChart(Dept) # doughnut (ring) chart
## Plot(Dept) # bubble plot
## Plot(Dept, stat="count") # lollipop plot
##
##
## --- Dept ---
##
##
## Missing Values of Dept: 1
##
##
## SALE ADMN MKTG ACCT FINC Total
## Frequencies: 15 6 6 5 4 36
## Proportions: 0.417 0.167 0.167 0.139 0.111 1.000
##
##
## Chi-squared test of null hypothesis of equal probabilities
## Chisq = 10.944, df = 4, p-value = 0.027
Map the value of tabulated count to bar fill. This way, the color of the bars reflects the bar height.
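One way to request this mapping, assuming the lessR convention of passing the keyword count in parentheses to fill; treat the exact syntax as an assumption to check against the BarChart() help page.

```r
library(lessR)
d <- Read("Employee")

# Shade each bar according to its tabulated count
BarChart(Dept, fill=(count))
```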
## >>> Suggestions
## BarChart(Dept, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="greens") # sequential green bars
## PieChart(Dept) # doughnut (ring) chart
## Plot(Dept) # bubble plot
## Plot(Dept, stat="count") # lollipop plot
##
##
## --- Dept ---
##
##
## Missing Values of Dept: 1
##
##
## ACCT ADMN FINC MKTG SALE Total
## Frequencies: 5 6 4 6 15 36
## Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
##
##
## Chi-squared test of null hypothesis of equal probabilities
## Chisq = 10.944, df = 4, p-value = 0.027
Specify both the categorical variable, \(x\), and the numerical variable that specifies the height of the bars, \(y\). A statistical transformation of \(y\) can then be applied with the stat parameter. Here the heights of the bars are set to the corresponding mean deviations of \(y\). Possible values of stat: “sum”, “mean”, “sd”, “dev”, “min”, “median”, and “max”. The “dev” value displays the mean deviations to further facilitate a comparison among levels.
Here the \(x\) is Dept and \(y\) is Salary.
Display bars for values of dev <= 0 in a different color from the bars for values above 0 with the fill_split parameter. Do an ascending sort with the sort parameter.
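A sketch of such a call, assuming "+" is the sort setting for ascending order and that fill_split takes the numeric threshold at which the fill color changes.

```r
library(lessR)
d <- Read("Employee")

# Bars show how far each department's mean Salary deviates,
# sorted ascending, with a different color for bars at or below 0
BarChart(Dept, Salary, stat="dev", sort="+", fill_split=0)
```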
## Salary
## - by levels of -
## Dept
##
## n miss mean sd min mdn max
## ACCT 5 0 61792.776 12774.606 46124.970 69547.600 72502.500
## ADMN 6 0 81277.117 27585.151 53788.260 71058.595 122563.380
## FINC 4 0 69010.675 17852.498 57139.900 61937.625 95027.550
## MKTG 6 0 70257.128 19869.812 51036.850 61658.990 99062.660
## SALE 15 0 78830.065 23476.839 49188.960 77714.850 134419.230
## >>> Suggestions
## Plot(Salary, Dept) # lollipop plot
##
##
## Data for: Salary
## -----------------
## ACCT FINC MKTG SALE ADMN
## -10440.776 -3222.877 -1976.424 6596.513 9043.565
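The printed deviations can be reproduced in base R from the department means listed above. The arithmetic suggests that each deviation is measured from the unweighted average of the five department means and not from the overall mean salary. That reading is inferred from the numbers shown here, not taken from the lessR documentation.

```r
# Department mean salaries copied from the summary table above
grp_mean <- c(ACCT = 61792.776, ADMN = 81277.117, FINC = 69010.675,
              MKTG = 70257.128, SALE = 78830.065)

# Deviation of each department mean from the average of the five means
dev <- grp_mean - mean(grp_mean)

# Ascending order, as in the sorted output
print(sort(round(dev, 3)))
```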
Annotate a plot with the add parameter. Here, add a rectangle around a message centered at (3, 10). First lighten the fill color of the annotation with the add_fill parameter of the style() function.
# Lighten the fill color used for annotations
style(add_fill="aliceblue")
# Draw a rectangle from (1.75, 11) to (4.25, 9) with the text centered at (3, 10)
BarChart(Dept, add=c("rect", "Employees by\nDepartment"),
         x1=c(1.75,3), y1=c(11, 10), x2=4.25, y2=9)
## >>> Suggestions
## BarChart(Dept, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="greens") # sequential green bars
## PieChart(Dept) # doughnut (ring) chart
## Plot(Dept) # bubble plot
## Plot(Dept, stat="count") # lollipop plot
##
##
## --- Dept ---
##
##
## Missing Values of Dept: 1
##
##
## ACCT ADMN FINC MKTG SALE Total
## Frequencies: 5 6 4 6 15 36
## Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
##
##
## Chi-squared test of null hypothesis of equal probabilities
## Chisq = 10.944, df = 4, p-value = 0.027
An alternative to the bar chart for a single categorical variable is the pie chart.
Pie Chart: Relates each level of a categorical variable to the area of a slice of a circle (pie), scaled according to the value of an associated numerical variable.
Here the presented version of a pie chart is the doughnut or ring chart.
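A minimal sketch of the call, assuming the Employee data is in d; with no further arguments PieChart() draws the ring version.

```r
library(lessR)
d <- Read("Employee")

# Doughnut (ring) chart of the department counts
PieChart(Dept)
```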
## >>> Suggestions
## PieChart(Dept, hole=0) # traditional pie chart
## PieChart(Dept, values="%") # display %'s on the chart
## BarChart(Dept) # bar chart
## Plot(Dept) # bubble plot
## Plot(Dept, values="count") # lollipop plot
##
##
## --- Dept ---
##
##
## ACCT ADMN FINC MKTG SALE Total
## Frequencies: 5 6 4 6 15 36
## Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
##
##
## Chi-squared test of null hypothesis of equal probabilities
## Chisq = 10.944, df = 4, p-value = 0.027
The doughnut or ring chart appears easier to read than a standard pie chart. The lessR function PieChart() can also create the “old-fashioned” pie chart. We have seen the summary statistics several times now, so turn off the output to the R console here with the quiet parameter.
Standard pie chart of variable Dept in the d data frame.
Set the size of the hole in the doughnut or ring chart with the parameter hole, which specifies the proportion of the pie occupied by the hole. The default hole size is 0.65. Set that value to 0 to close the hole.
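The two settings combine in a single call. This is a sketch assuming the Employee data is in d.

```r
library(lessR)
d <- Read("Employee")

# Close the hole to get a traditional pie chart and skip the console output
PieChart(Dept, hole=0, quiet=TRUE)
```

Intermediate values of hole, such as 0.3, give a narrower ring opening than the default.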
To plot two categorical variables in one bar chart, specify the second categorical variable with the by parameter.
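A sketch with Gender as the second variable, matching the joint frequency table in the output that follows.

```r
library(lessR)
d <- Read("Employee")

# One bar per department, each stacked by Gender
BarChart(Dept, by=Gender)
```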
## >>> Suggestions
## Plot(Dept, Gender) # bubble plot
## BarChart(Dept, by=Gender, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="steelblue") # steelblue bars
##
##
## Joint and Marginal Frequencies
## ------------------------------
##
## Dept
## Gender ACCT ADMN FINC MKTG SALE Sum
## F 3 4 1 5 5 18
## M 2 2 3 1 10 18
## Sum 5 6 4 6 15 36
##
##
## Cramer's V: 0.415
##
## Chi-square Test: Chisq = 6.200, df = 4, p-value = 0.185
## >>> Low cell expected frequencies, chi-squared approximation may not be accurate
The stacked version is the default, but the values of the second categorical variable can also be represented with side-by-side bars, which makes it easier to compare the values with each other.
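The text does not name the parameter for this layout. The sketch below assumes that BarChart() accepts beside, as base R barplot() does, so confirm the name against the BarChart() help page.

```r
library(lessR)
d <- Read("Employee")

# Draw the Gender bars next to each other within each department
# (beside is an assumed parameter name, borrowed from base R barplot)
BarChart(Dept, by=Gender, beside=TRUE)
```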
## >>> Suggestions
## Plot(Dept, Gender) # bubble plot
## BarChart(Dept, by=Gender, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="steelblue") # steelblue bars
##
##
## Joint and Marginal Frequencies
## ------------------------------
##
## Dept
## Gender ACCT ADMN FINC MKTG SALE Sum
## F 3 4 1 5 5 18
## M 2 2 3 1 10 18
## Sum 5 6 4 6 15 36
##
##
## Cramer's V: 0.415
##
## Chi-square Test: Chisq = 6.200, df = 4, p-value = 0.185
## >>> Low cell expected frequencies, chi-squared approximation may not be accurate
Obtain the 100% stacked version with the stack100 parameter.
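A sketch assuming stack100 is a logical switch:

```r
library(lessR)
d <- Read("Employee")

# Scale every department's bar to 100% so the Gender shares are comparable
BarChart(Dept, by=Gender, stack100=TRUE)
```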
## >>> Suggestions
## Plot(Dept, Gender) # bubble plot
## BarChart(Dept, by=Gender, horiz=TRUE) # horizontal bar chart
## BarChart(Dept, fill="steelblue") # steelblue bars
##
##
## Joint and Marginal Frequencies
## ------------------------------
##
## Dept
## Gender ACCT ADMN FINC MKTG SALE Sum
## F 3 4 1 5 5 18
## M 2 2 3 1 10 18
## Sum 5 6 4 6 15 36
##
##
## Cramer's V: 0.415
##
## Chi-square Test: Chisq = 6.200, df = 4, p-value = 0.185
## >>> Low cell expected frequencies, chi-squared approximation may not be accurate
##
##
## Cell Proportions within Each Column
## -----------------------------------
##
## Dept
## Gender ACCT ADMN FINC MKTG SALE
## F 0.600 0.667 0.250 0.833 0.333
## M 0.400 0.333 0.750 0.167 0.667
## Sum 1.000 1.000 1.000 1.000 1.000
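The column proportions, the chi-square statistic, and Cramer's V in this output can be reproduced in base R from the joint frequencies. The counts below are copied from the table above, and the Cramer's V line uses the standard textbook formula, which may or may not match the lessR implementation line for line.

```r
# Joint frequencies of Gender by Dept, copied from the output above
counts <- matrix(c(3, 4, 1, 5, 5,
                   2, 2, 3, 1, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Gender = c("F", "M"),
                                 Dept = c("ACCT", "ADMN", "FINC", "MKTG", "SALE")))

# Proportion of each gender within each department (columns sum to 1)
col_prop <- round(prop.table(counts, margin = 2), 3)
print(col_prop)

# Test of independence; the low expected counts trigger a warning, silenced here
test <- suppressWarnings(chisq.test(counts))

# Cramer's V = sqrt(chisq / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(unname(test$statistic) / (sum(counts) * (min(dim(counts)) - 1)))
print(round(cramers_v, 3))
```

These column proportions are the bar segments in the 100% stacked chart.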