Understanding interactions between dummy variables

So let’s say we have these burning questions:

1. Does breastfeeding influence lung function later in life?

2. And, second, is this influence different in children of asthmatic mothers when compared with the influence in children of non-asthmatic mothers?

Let’s first put some symbols out, to simplify the problem:

Explanatory (independent) variables:
A = maternal asthma (0 and 1, “no” and “yes”)
B = breastfeeding (0 and 1, “no” and “yes”)

Outcome (dependent) variable:
Y = some lung function measurement, say Forced Vital Capacity (FVC), measured in liters.

c = regression coefficient

For the first question we have this simple equation, which is a linear regression because the outcome (dependent) variable, Y, is measured on a continuous scale (liters):

Y=c0 + c1*A + c2*B

(plus some error e, which we can ignore for now).

What do these coefficients mean?
First of all, we have coefficients attached to an explanatory variable (c1, c2, ...) and a coefficient that’s not attached (c0). c0 is the mean value of FVC when ALL explanatory variables in the equation equal 0 (in our case A=0 and B=0, children of non-asthmatic mothers who were not breastfed). This is the reference category; c0 is also called “the intercept”. The other coefficients (c1 and up) are not mean values anymore, but the difference between a particular group and the reference group.

The typical interpretation is that each coefficient represents the number of units the outcome variable will change (liters of FVC) if the particular explanatory variable the coefficient is attached to changes by one unit and everything else is held constant (this is important). Since our explanatory variables are dichotomous (having values 0 and 1), the coefficients represent the difference in FVC between A=0 and A=1 (for c1) and between B=0 and B=1 (for c2). In other words, c1 is the difference in liters between the (mean) FVC of children of asthmatic mothers and children of non-asthmatic mothers, or the effect of maternal asthma on FVC. This difference is the same in breastfed and not-breastfed children. Conversely, c2 represents the difference in liters between the (mean) FVC of breastfed children and not-breastfed children, or the effect of breastfeeding on FVC. Again, this difference is the same in children of asthmatic and children of non-asthmatic mothers.

You can also play with the equation, entering values of 0 and 1 for “A” and “B”, in all possible combinations; you’ll obtain the following (mean) FVC values in each group:

                              | B = 0 (not breastfed) | B = 1 (breastfed)
A = 0 (non-asthmatic mothers) | c0                    | c0 + c2
A = 1 (asthmatic mothers)     | c0 + c1               | c0 + c1 + c2

The difference between B=0 and B=1 in the non-asthmatic group (A=0) is c2 = [(c0 + c2) – c0]; the difference between B=0 and B=1 in the asthmatic group (A=1) is also c2 = [(c0 + c1 + c2) – (c0 + c1)]. The same arithmetic applies to the difference between A=0 and A=1, which is c1 in both breastfeeding groups.
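If you would rather let a computer do the plugging in, here is a minimal Python sketch of the same arithmetic. The coefficient values are made up purely for illustration (they do not come from any study), and the names mean_fvc and effect_b are just labels chosen for this example.

```python
# Hypothetical coefficients in liters of FVC (illustrative values only).
c0, c1, c2 = 3.0, -0.2, 0.1


def mean_fvc(a, b):
    """Mean FVC predicted by the additive model Y = c0 + c1*A + c2*B."""
    return c0 + c1 * a + c2 * b


# Effect of breastfeeding (B=1 minus B=0) within each maternal-asthma group.
effect_b = {a: mean_fvc(a, 1) - mean_fvc(a, 0) for a in (0, 1)}

for a in (0, 1):
    print(
        f"A={a}: B=0 -> {mean_fvc(a, 0):.2f}, "
        f"B=1 -> {mean_fvc(a, 1):.2f}, "
        f"difference = {effect_b[a]:.2f}"
    )
```

Whatever values you pick for c0, c1 and c2, the breastfeeding difference comes out as c2 in both rows, because this model has no term that lets it vary with A.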

Now, let’s complicate things a bit (and answer question 2), and ask whether the effect of breastfeeding on FVC is the same in children of asthmatic mothers and non-asthmatic mothers. To check that, the wise people tell us to introduce a third term in the equation: the product of the two variables of interest. It is called an “interaction term” (because the two variables are considered to “interact” with each other). The equation thus becomes:

Y=c0 + c1*A + c2*B + c3*(A*B)

We’ll start by playing with the equation and calculating the mean FVC values in all four possible groups:

                              | B = 0 (not breastfed) | B = 1 (breastfed)
A = 0 (non-asthmatic mothers) | c0                    | c0 + c2
A = 1 (asthmatic mothers)     | c0 + c1               | c0 + c1 + c2 + c3

Interpreting the coefficients:

Now the coefficients have a different interpretation, which is not always obvious.

c0 has the same interpretation: the mean FVC in not-breastfed children of non-asthmatic mothers.

c1 is now the difference in FVC between children of asthmatic mothers and children of non-asthmatic mothers in the group of not-breastfed children only (the column B=0).

c2 is now the difference in FVC between breastfed children and not-breastfed children in the group of children of non-asthmatic mothers only (the row A=0).

c3... what is c3? This is where most people get confused. Let’s look at children of asthmatic and children of non-asthmatic mothers and see how breastfeeding influences FVC in each of those two groups (you can also do it the other way around and look at how maternal asthma influences FVC in each breastfeeding group).

Non-asthmatic mothers (A=0): the difference between breastfed and not-breastfed children is c2: (c0 + c2) – c0

Asthmatic mothers (A=1): the difference between breastfed and not-breastfed children is c2 + c3: (c0 + c1 + c2 + c3) – (c0 + c1)

It follows that c3 is the additional effect of breastfeeding on FVC in the group of children of asthmatic mothers, on top of the effect in the group of children of non-asthmatic mothers. The effect of breastfeeding is no longer the same in the two groups (asthmatic/non-asthmatic mothers), as it was in the previous example, but differs by c3.
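To see this with numbers, here is a small Python sketch that starts from four group means and works back to the coefficients. The FVC values are invented for illustration only (they are not real measurements). The sketch relies on one fact: with two dummy variables and their product, the model has one coefficient per group, so a least-squares fit reproduces the four group means exactly and the coefficients can be read off from those means.

```python
from statistics import mean

# Made-up FVC values in liters, keyed by (A, B); purely illustrative.
fvc = {
    (0, 0): [3.0, 3.2, 3.1],  # non-asthmatic mother, not breastfed
    (0, 1): [3.3, 3.2, 3.4],  # non-asthmatic mother, breastfed
    (1, 0): [2.8, 2.9, 3.0],  # asthmatic mother, not breastfed
    (1, 1): [3.3, 3.4, 3.5],  # asthmatic mother, breastfed
}

# Mean FVC in each of the four groups.
m = {group: mean(values) for group, values in fvc.items()}

# Coefficients of Y = c0 + c1*A + c2*B + c3*(A*B), read off the group means.
c0 = m[(0, 0)]
c1 = m[(1, 0)] - m[(0, 0)]
c2 = m[(0, 1)] - m[(0, 0)]
c3 = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

# Effect of breastfeeding within each maternal-asthma group.
effect_b_nonasthmatic = m[(0, 1)] - m[(0, 0)]  # equals c2
effect_b_asthmatic = m[(1, 1)] - m[(1, 0)]     # equals c2 + c3

print(f"c0={c0:.2f}, c1={c1:.2f}, c2={c2:.2f}, c3={c3:.2f}")
print(f"breastfeeding effect, A=0: {effect_b_nonasthmatic:.2f}")
print(f"breastfeeding effect, A=1: {effect_b_asthmatic:.2f}")
```

Written this way, c3 is a difference of differences: the breastfeeding effect in the A=1 row minus the breastfeeding effect in the A=0 row.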