Functions to interpret a SOP expression

Description

These functions interpret an expression written in SOP (sum of products) form, for both crisp and multivalue QCA. The function translate() translates the expression into a standard (canonical) SOP form using a matrix of implicants, while compute() uses translate() to compute membership scores for the cases in a given dataset.

For crisp set notation, upper case letters denote the presence of a causal condition, and lower case letters denote its absence. The tilde is recognized as a negation, even in combination with upper or lower case letters.
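In terms of membership scores, both notations of absence correspond to the complement 1 - x, so a tilde applied to a lower case letter cancels out. The following base R sketch illustrates this arithmetic on a made-up vector A; it is an illustration only, not code from the package:

```r
# made-up membership scores in condition A
A <- c(0.2, 0.6, 1.0, 0.0)

# the lower case "a" and the tilde "~A" both denote the absence of A
a <- 1 - A
not_A <- 1 - A

# "~a" negates the absence, which gives back the presence of A
not_a <- 1 - a

print(a)                    # 0.8 0.4 0.0 1.0
print(all.equal(not_a, A))  # TRUE
```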

Functions similar to translate() and compute() were initially written by Lewandowski (2015), but the code in these functions has been completely rewritten to integrate it with the package QCA, and expanded with more extensive functionality (see details and examples below).

The function sop() transforms any expression (most notably a POS, product of sums) into a sum of products, minimizing it to the simplest equivalent logical expression. It provides a software implementation of the intersection examples presented by Ragin (1987: 144-147), extended to multivalue sets.
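The intersection logic can be sketched in base R by distributing every term of one expression over every term of the other, dropping contradictory products (X*~X) and applying absorption (X + X*Y = X). This is a simplified illustration under stated assumptions, not the algorithm used by sop(): the helpers is_contradiction(), absorb(), intersect_sop() and to_string() are hypothetical, only bivalent conditions negated with a tilde are supported, and no further minimization (such as merging A*B + A*~B into A) is attempted.

```r
# A product term is a character vector of literals, such as c("A", "~B"),
# and a sum of products is a list of such terms (hypothetical encoding).

# TRUE when a term contains both X and ~X (an empty intersection)
is_contradiction <- function(term) {
  negated <- startsWith(term, "~")
  length(intersect(term[!negated], sub("^~", "", term[negated]))) > 0
}

# absorption: drop every term that contains another term (X + X*Y = X)
absorb <- function(terms) {
  keep <- vapply(seq_along(terms), function(i) {
    !any(vapply(seq_along(terms)[-i], function(j) {
      all(terms[[j]] %in% terms[[i]])
    }, logical(1)))
  }, logical(1))
  terms[keep]
}

# intersection of two sums of products: distribute every pair of terms
intersect_sop <- function(x, y) {
  terms <- list()
  for (tx in x) {
    for (ty in y) {
      # radix sort gives a locale-independent order, so equal terms match
      term <- sort(unique(c(tx, ty)), method = "radix")
      if (!is_contradiction(term)) terms[[length(terms) + 1]] <- term
    }
  }
  absorb(unique(terms))
}

# print a list of terms in the usual A*B + C notation
to_string <- function(terms) {
  paste(vapply(terms, paste, character(1), collapse = "*"), collapse = " + ")
}

# (A + B)(A + ~B), the first sop() example
to_string(intersect_sop(list("A", "B"), list("A", "~B")))
# "A"

# Ragin (1987: 145): SW(SG + LW)
to_string(intersect_sop(list(c("S", "W")), list(c("S", "G"), c("L", "W"))))
# "G*S*W + L*S*W"
```

Both calls produce the same terms as the corresponding sop() examples, with the literals in a different order.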

Usage

translate(expression = "", snames = "", noflevels, data)
compute(expression = "", data, separate = FALSE)
sop(expression = "", snames = "", use.tilde = FALSE, noflevels)

Arguments

expression String: a QCA expression written in sum of products form.
snames A string containing the sets' names, separated by commas.
use.tilde Logical, use tilde to negate bivalent conditions.
noflevels Numerical vector containing the number of levels for each set.
data A dataset with binary crisp (cs), multivalue (mv) or fuzzy (fs) set data.
separate Logical, perform computations on individual, separate paths.

Details

A SOP (sum of products) expression is also known as a DNF (disjunctive normal form), or in other words a "union of intersections", for example A*D + B*c.
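On fuzzy (or crisp) data, such an expression is usually evaluated with the standard fuzzy set operations: a product term (intersection) takes the case-wise minimum, the sum (union) takes the case-wise maximum over the terms, and negation is 1 - x. The compute() example on the LF data shows the same relation, since each final score equals the larger of the two separate path scores. A base R sketch for A*D + B*c, assuming a small made-up data frame dat; it does not call compute():

```r
# made-up fuzzy membership scores for four cases
dat <- data.frame(
  A = c(0.9, 0.2, 0.6, 0.1),
  B = c(0.3, 0.8, 0.7, 0.4),
  C = c(0.1, 0.3, 0.9, 0.6),
  D = c(0.7, 0.4, 0.2, 0.9)
)

# intersection (product term): case-wise minimum
path1 <- with(dat, pmin(A, D))       # A*D
path2 <- with(dat, pmin(B, 1 - C))   # B*c, the absence of C being 1 - C

# union (sum of the terms): case-wise maximum over the paths
score <- pmax(path1, path2)

# one column per path, similar in spirit to separate = TRUE
paths <- data.frame(`A*D` = path1, `B*c` = path2, check.names = FALSE)
print(paths)
print(score)   # 0.7 0.7 0.2 0.4
```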

The same expression can be written in multivalue notation: A{1}*D{1} + B{1}*C{0}. Both types of expressions are valid, and yield the same result on the same dataset.

For multivalue notation, causal conditions are expected as upper case letters, and they will be converted to upper case by default. Expressions can contain multiple values to translate, separated by a comma. If B were a multivalue causal condition, an expression could be: A{1} + B{1,2}*C{0}.

In this example, all values in B equal to either 1 or 2 will be converted to 1, and the rest of the (multi)values will be converted to 0.
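This recoding amounts to a set membership test on the values. A base R sketch on made-up values, where B is a multivalue condition and C a binary one (both vectors are hypothetical):

```r
# made-up values of a multivalue condition B with levels 0, 1 and 2
B <- c(0, 1, 2, 2, 0, 1)

# B{1,2}: values 1 or 2 become 1, all other values become 0
B_12 <- as.integer(B %in% c(1, 2))

# for a binary condition C, C{0} is the same as the lower case c
C <- c(1, 0, 1, 0, 0, 1)
C_0 <- as.integer(C %in% 0)

# the product B{1,2}*C{0} is the case-wise minimum of the recoded sets
B_12_C_0 <- pmin(B_12, C_0)

print(B_12)      # 0 1 1 1 0 1
print(C_0)       # 0 1 0 1 1 0
print(B_12_C_0)  # 0 1 0 1 0 0
```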

These functions automatically detect the use of the tilde "~" as a negation of a particular causal condition. ~A does two things: it identifies the presence of causal condition A (because it was specified in upper case), and it recognizes that the condition must be negated, because of the tilde. This works even in combination with lower case names: ~a is interpreted as A.

To negate a multivalue condition using a tilde, the number of levels should be supplied (see examples below). Improvements in version 2.5 allow for intersections between multiple levels of the same condition. For a causal condition with 3 levels (0, 1 and 2), the expression ~A{0,2}*A{1,2} is equivalent to A{1}, while A{0}*A{1} results in the empty set.
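Since the levels of a multivalue condition form a finite set, a negation is the complement within all levels, and an intersection of levels of the same condition keeps only the common ones. A base R sketch of these two rules for the 3-level example; negate() is a hypothetical helper, not a package function:

```r
# the three levels of a hypothetical multivalue condition A
levels_A <- 0:2

# negation: all levels of the condition except the given ones,
# which is why the number of levels must be known
negate <- function(values, levels) setdiff(levels, values)

# ~A{0,2} is A{1}
not_A02 <- negate(c(0, 2), levels_A)

# intersecting levels of the same condition keeps the common levels
A1 <- intersect(not_A02, c(1, 2))   # ~A{0,2}*A{1,2} is A{1}
empty <- intersect(0, 1)            # A{0}*A{1} is the empty set

print(A1)             # 1
print(length(empty))  # 0
```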

The number of levels, as well as the set names, can be detected automatically from a dataset via the argument data. Arguments snames and noflevels take precedence over data when specified.

The product operator * is redundant when the set names are single letters (for example AD + Bc), and it is also redundant for multivalue data, where product terms can be separated using the curly bracket notation.

When conditions are binary and their names have multiple letters (for example AA + CC*bb), the product operator * is preferable, but the function can translate an expression even without it (AA + CCbb) by searching deep in the space of the conditions' names, at the cost of slowing down for a high number of causal conditions. For this reason, an arbitrary limit of 7 causal conditions (snames) is imposed for writing such an expression.
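The search for set names in an expression without the product operator can be illustrated with a brute-force recursive splitter that lists every way a product term can be written as a concatenation of the given names. This is a hypothetical helper, split_term(), written for illustration and not the package's parser; it matches names regardless of case, since the case only encodes presence or absence of a binary condition. Exactly one decomposition means the term can be translated, more than one means the names clash, and the number of branches explored grows quickly with the number of names:

```r
# Hypothetical helper: all ways to write `term` as a concatenation of
# set names from `snames` (case-insensitive matching).
split_term <- function(term, snames) {
  # an empty remainder has exactly one (empty) decomposition
  if (nchar(term) == 0) return(list(character(0)))
  result <- list()
  for (name in snames) {
    n <- nchar(name)
    if (n <= nchar(term) && toupper(substr(term, 1, n)) == toupper(name)) {
      # keep the matched piece as written, then split the remainder
      rest <- split_term(substr(term, n + 1, nchar(term)), snames)
      for (r in rest) {
        result[[length(result) + 1]] <- c(substr(term, 1, n), r)
      }
    }
  }
  result
}

# one decomposition: SUBER = SU*BER
split_term("SUBER", c("SU", "BER", "SUB", "SET"))
# one decomposition: subset = sub*set
split_term("subset", c("SU", "BER", "SUB", "SET"))
# two decompositions (SUP*ER and SU*PER): the names clash
length(split_term("SUPER", c("SUP", "ER", "SU", "PER", "SUB", "SET")))
```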

For the function sop(), if a tilde is present in the expression, the argument use.tilde is automatically activated.

Value

For the function translate(), a matrix containing the implicants on the rows and the set names on the columns, with the following codes:
 0 absence of a causal condition
 1 presence of a causal condition
-1 causal condition was eliminated

The matrix is also assigned the class "translate", to avoid printing the -1 codes that signal a minimized condition. The mode of this matrix is character, to allow printing multiple levels in the same cell, such as "1,2".
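The coding scheme can be mimicked with a plain character matrix. The following base R sketch builds the codes for "A + b*C" by hand from a hypothetical list of product terms; it does not call translate() and does not assign the "translate" class, it only reproduces the two views of the matrix shown in the Examples:

```r
# hypothetical codes for the product terms of "A + b*C"
# (1 = presence, 0 = absence; conditions not mentioned are eliminated)
terms <- list(
  "A"   = c(A = "1"),
  "b*C" = c(B = "0", C = "1")
)
snames <- c("A", "B", "C")

# start from "-1" everywhere, then fill in the codes of each term;
# a character matrix could also hold several levels, such as "1,2"
mat <- matrix("-1", nrow = length(terms), ncol = length(snames),
              dimnames = list(names(terms), snames))
for (i in seq_along(terms)) {
  mat[i, names(terms[[i]])] <- terms[[i]]
}

# all codes, as with print(obj, original = TRUE)
print(mat, quote = FALSE)

# -1 codes hidden, as in the default printing
print(ifelse(mat == "-1", "", mat), quote = FALSE)
```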

References

Ragin, C.C. (1987) The Comparative Method: Moving beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press.

Lewandowski, J. (2015) QCAtools: Helper functions for QCA in R. R package version 0.1

Examples

translate("A + B*C")
    A B C
A   1
B*C   1 1

# same thing in multivalue notation
translate("A{1} + B{1}*C{1}")
          A B C
A{1}      1
B{1}*C{1}   1 1

# using upper/lower letters
translate("A + b*C")
    A B C
A   1
b*C   0 1

# the negation with tilde is recognised
translate("~A + b*C")
    A B C
~A  0
b*C   0 1

# even in combination of upper/lower letters
translate("~A + ~b*C")
     A B C
~A   0
~b*C   1 1

# and even for multivalue variables
# in multivalue notation, the product sign * is redundant
translate("C{1} + T{2} + T{1}V{0} + C{0}")
         C T V
C{1}     1
T{2}       1
T{1}V{0}   1 0
C{0}     0

# negation of multivalue sets requires the number of levels
translate("~A{1} + ~B{0}*C{1}", noflevels = c(2, 2, 2))
            A B C
~A{1}       0
~B{0}*C{1}    1 1

# multiple values can be specified
translate("C{1} + T{1,2} + T{1}V{0} + C{0}")
         C T   V
C{1}     1
T{1,2}     1,2
T{1}V{0}   1   0
C{0}     0

# or even negated
translate("C{1} + ~T{1,2} + T{1}V{0} + C{0}", snames = "C, T, V", noflevels = c(2, 3, 2))
         C T V
C{1}     1
~T{1,2}    0
T{1}V{0}   1 0
C{0}     0

# if the expression does not contain the product sign *
# snames are required to complete the translation
translate("AB + cD", snames = "A, B, C, D")
   A B C D
AB 1 1
cD     0 1

# otherwise snames are not required
translate("PER*FECT + str*ing")
         FECT ING PER STR
PER*FECT 1        1
str*ing       0       0

# snames are required
translate("PERFECT + string", snames = "PER, FECT, STR, ING")
        PER FECT STR ING
PERFECT 1   1
string           0   0

# it works even with overlapping columns
# SU overlaps with SUB in SUBER, but the result is still correct
translate("SUBER + subset", "SU, BER, SUB, SET")
       SU BER SUB SET
SUBER  1  1
subset        0   0

# error because combinations of condition names clash (not run)
# translate("SUPER + subset", "SUP, ER, SU, PER, SUB, SET")

# to print _all_ codes from the standard output matrix
(obj <- translate("A + b*C"))
    A B C
A   1
b*C   0 1

print(obj, original = TRUE) # also prints the -1 code
    A  B  C
A   1  -1 -1
b*C -1 0  1

# for compute()
data(LF)
compute("DEV*ind + URB*STB", data = LF)
 [1] 0.27 0.89 0.91 0.16 0.58 0.19 0.31 0.09 0.13 0.72 0.34 0.99 0.02 0.01 0.03
[16] 0.20 0.33 0.98

compute("DEV*ind + URB*STB", data = LF, separate = TRUE)
   DEV*ind URB*STB
1     0.27    0.12
2     0.00    0.89
3     0.10    0.91
4     0.16    0.07
5     0.58    0.03
6     0.19    0.03
7     0.04    0.31
8     0.04    0.09
9     0.07    0.13
10    0.72    0.05
11    0.34    0.10
12    0.06    0.99
13    0.02    0.00
14    0.01    0.01
15    0.01    0.03
16    0.03    0.20
17    0.33    0.13
18    0.00    0.98

# for sop()
sop("(A + B)(A + ~B)")
[1] "A"

# to force a certain order of the set names
sop("(URB + LIT*~DEV)(~LIT + ~DEV)", snames = "DEV, URB, LIT")
[1] "URB*~LIT + ~DEV*URB + ~DEV*LIT"

# multilevel conditions can also be specified (and negated)
sop("(A{1} + ~B{0})(B{1} + C{0})", snames = "A, B, C", noflevels = c(2, 3, 2))
[1] "A{1}*C{0} + B{1} + B{1,2}*C{0}"

# in Ragin's (1987) book, the equation E = SG + LW is the result
# of the Boolean minimization for the ethnic political mobilization.

# intersecting the reactive ethnicity perspective (R = lw)
# with the equation E (page 144)
sop("lw(SG + LW)", snames = "S, L, W, G")
[1] "SlwG"

# resources for size and wealth (C = SW) with E (page 145)
sop("SW(SG + LW)", snames = "S, L, W, G")
[1] "SWG + SLW"

# and factorized
factorize(sop("SW(SG + LW)", snames = "S, L, W, G"))
F1: SW(G + L)

# developmental perspective (D = Lg) and E (page 146)
sop("Lg(SG + LW)", snames = "S, L, W, G", use.tilde = TRUE)
[1] "LW~G"

# subnations that exhibit ethnic political mobilization (E) but were
# not hypothesized by any of the three theories (page 147)
# ~H = ~(lw + SW + Lg) = GLs + GLw + GsW + lsW
sop("(GLs + GLw + GsW + lsW)(SG + LW)", snames = "S, L, W, G")
[1] "sLWG + SLwG"

Author

Adrian Dusa