understanding interactions between dummy variables

So let’s say we have these burning questions:

1. Does breastfeeding influence lung function later in life?

2. And, second, is this influence different in children of asthmatic mothers when compared with the influence in children of non-asthmatic mothers?

Let’s first put some symbols out, to simplify the problem:

Explanatory (independent) variables:
A = maternal asthma (0 and 1, “no” and “yes”)
B = breastfeeding (0 and 1, “no” and “yes”)

Outcome (dependent) variable:
Y = some lung function measurement, say Forced Vital Capacity (FVC), measured in liters.

c = regression coefficient

For the first question we have this simple equation, which is a linear regression because the outcome (dependent) variable, Y, is measured on a continuous scale (liters):

Y=c0 + c1*A + c2*B

(plus some error e which we can ignore for now);

What do these coefficients mean?
First of all, we have coefficients attached to an explanatory variable (c1, c2, …) and a coefficient that’s not attached to anything (c0). c0 is the mean value of FVC when ALL explanatory variables in the equation equal 0 (in our case A=0 and B=0, children of non-asthmatic mothers who were not breastfed). This is the reference category; c0 is also called “the intercept”. The other coefficients (c1 and up) are not mean values anymore, but differences between a particular group and the reference group.

The typical interpretation is that each coefficient represents the number of units the outcome variable will change (liters of FVC) if the particular explanatory variable the coefficient is attached to changes by one unit and everything else stays constant (this is important). Since our explanatory variables are dichotomous (having values 0 and 1), the coefficients represent the difference in FVC between A=0 and A=1 (for c1) and between B=0 and B=1 (for c2).

In other words, c1 is the difference in liters between the (mean) FVC of children of asthmatic mothers and children of non-asthmatic mothers, or the effect of maternal asthma on FVC. This difference is the same in breastfed and not-breastfed children. Conversely, c2 represents the difference in liters between the (mean) FVC of breastfed children and not-breastfed children, or the effect of breastfeeding on FVC. Again, this difference is the same in children of asthmatic and children of non-asthmatic mothers.

You can also play with the equation, entering values of 0 and 1 for “A” and “B” in all possible combinations; you’ll obtain the following (mean) FVC values in each group:

|  | B = 0 (not breastfed) | B = 1 (breastfed) |
|---|---|---|
| A = 0 (non-asthmatic mothers) | c0 | c0 + c2 |
| A = 1 (asthmatic mothers) | c0 + c1 | c0 + c1 + c2 |

The difference between B=0 and B=1 in the non-asthmatic group (A=0) is c2 = [(c0 + c2) - c0]; the difference between B=0 and B=1 in the asthmatic group (A=1) is also c2 = [(c0 + c1 + c2) - (c0 + c1)]. The same arithmetic gives c1 for the difference between A=0 and A=1, in both breastfeeding groups.
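The arithmetic above can be checked numerically. Below is a minimal Python sketch; the coefficient values are made up purely for illustration (they are not estimates from any data), and the function and variable names are hypothetical.

```python
# Hypothetical coefficients in liters, chosen only for illustration.
C0 = 3.0     # mean FVC: not breastfed, non-asthmatic mother (reference group)
C1 = -0.25   # difference for maternal asthma (A=1 vs A=0)
C2 = 0.125   # difference for breastfeeding (B=1 vs B=0)


def mean_fvc_additive(a, b, c0=C0, c1=C1, c2=C2):
    """Mean FVC given by Y = c0 + c1*A + c2*B for dummy values a and b."""
    return c0 + c1 * a + c2 * b


# Fill the 2x2 table; keys are (A, B).
additive_table = {(a, b): mean_fvc_additive(a, b) for a in (0, 1) for b in (0, 1)}

# Breastfeeding difference (B=1 minus B=0) within each maternal-asthma group.
bf_effect_additive = {
    a: additive_table[(a, 1)] - additive_table[(a, 0)] for a in (0, 1)
}

for a in (0, 1):
    print(
        f"A={a}: B=0 -> {additive_table[(a, 0)]}, "
        f"B=1 -> {additive_table[(a, 1)]}, "
        f"difference = {bf_effect_additive[a]}"
    )
```

With no interaction term, the breastfeeding difference comes out as c2 in both rows, whatever value c1 takes.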

Now, let’s complicate things a bit (and answer question 2), and ask whether the effect of breastfeeding on FVC is the same in children of asthmatic mothers and children of non-asthmatic mothers. To check that, the wise people tell us to introduce a third term in the equation, the product of the two variables of interest; it is called an “interaction term” (because the two variables are considered to “interact” with each other). The equation thus becomes:

Y=c0 + c1*A + c2*B + c3*(A*B)

We’ll start by playing with the equation and calculating the mean FVC values in all four possible groups:

|  | B = 0 (not breastfed) | B = 1 (breastfed) |
|---|---|---|
| A = 0 (non-asthmatic mothers) | c0 | c0 + c2 |
| A = 1 (asthmatic mothers) | c0 + c1 | c0 + c1 + c2 + c3 |
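The four group means from the equation with the interaction term can be filled in the same way. This is a small Python sketch with made-up coefficient values (assumptions for illustration only, not estimates); the names are hypothetical.

```python
# Hypothetical coefficients in liters, for illustration only.
K0, K1, K2, K3 = 3.0, -0.25, 0.125, 0.25


def mean_fvc_interaction(a, b, c0=K0, c1=K1, c2=K2, c3=K3):
    """Mean FVC given by Y = c0 + c1*A + c2*B + c3*(A*B)."""
    return c0 + c1 * a + c2 * b + c3 * (a * b)


# Keys are (A, B). The product A*B equals 1 only when A=1 and B=1,
# so c3 enters only the breastfed / asthmatic-mother cell.
interaction_table = {
    (a, b): mean_fvc_interaction(a, b) for a in (0, 1) for b in (0, 1)
}

for (a, b), value in sorted(interaction_table.items()):
    print(f"A={a}, B={b}: mean FVC = {value}")
```

Three of the four cells are unchanged from the equation without the interaction term; only the A=1, B=1 cell picks up the extra c3.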

Interpreting the coefficients:

Now the coefficients have a different interpretation, which is not always obvious.

c0 has the same interpretation: the mean FVC in not-breastfed children of non-asthmatic mothers.

c1 is now the difference in FVC between children of asthmatic mothers and children of non-asthmatic mothers in the group of not-breastfed children only (the column B=0).

c2 is now the difference in FVC between breastfed children and not-breastfed children in the group of children of non-asthmatic mothers only (the row A=0).

c3… what is c3? This is where most people get confused. Let’s look at children of asthmatic and children of non-asthmatic mothers and see how breastfeeding influences FVC in each of those two groups (you can also do it the other way around: how maternal asthma influences FVC in each breastfeeding group).

Non-asthmatic mothers (A=0): the difference between breastfed and not-breastfed children is c2: (c0 + c2) – c0

Asthmatic mothers (A=1): the difference between breastfed and not-breastfed children is c2 + c3: (c0 + c1 + c2 + c3) – (c0 + c1)

It follows that c3 is the additional effect of breastfeeding on FVC in the group of children of asthmatic mothers, on top of the effect in the group of children of non-asthmatic mothers. The effect of breastfeeding is no longer the same in the two groups (asthmatic/non-asthmatic mothers), as it was in the previous example, but differs by c3.
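Read in the other direction, the same reasoning recovers the coefficients from the four group means: c3 is a difference of differences. The Python sketch below assumes a model with only A, B and A*B (no other covariates), in which case the four coefficients reproduce the four group means exactly. The group means used are invented for illustration and the function name is hypothetical.

```python
def coefficients_from_group_means(means):
    """Return (c0, c1, c2, c3) from mean FVC values keyed by (A, B)."""
    c0 = means[(0, 0)]                    # reference group mean
    c1 = means[(1, 0)] - means[(0, 0)]    # asthma difference among not breastfed
    c2 = means[(0, 1)] - means[(0, 0)]    # breastfeeding difference, A=0
    bf_diff_asthma = means[(1, 1)] - means[(1, 0)]  # breastfeeding difference, A=1
    c3 = bf_diff_asthma - c2              # difference of the two differences
    return c0, c1, c2, c3


# Invented group means in liters, keyed by (A, B); not real data.
group_means = {(0, 0): 3.0, (0, 1): 3.125, (1, 0): 2.75, (1, 1): 3.125}

c0, c1, c2, c3 = coefficients_from_group_means(group_means)
print(f"c0={c0}, c1={c1}, c2={c2}, c3={c3}")
print(f"breastfeeding difference, A=0: {c2}")
print(f"breastfeeding difference, A=1: {c2 + c3}")
```

If c3 came out as 0, the two breastfeeding differences would coincide and the equation would collapse back to the one without the interaction term.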

12 thoughts on “understanding interactions between dummy variables”

  1. uh-huh.
    I went in to edit the post because I had written “taht” instead of “that” somewhere higher up in the text, and I probably jumped the cursor to the end without noticing and typed “that” there… and, seeing that “taht” hadn’t changed, I went in again..

    (and here I was, seeing two comments already, wondering “is this good news? bad news?” and getting ready for a fight 🙂 )

  2. So when are you posting the table with the values for c0…c3? Or is that what you’re spending your time on now? I was hoping to see a sample, some statistical analysis, a mean deviation or something…

    Besides, you skipped two categories: a child of an asthmatic mother but with a non-asthmatic wet nurse, and vice versa 😉

  3. another teaser:

    *4 CATEGORIES
    eststo fef_fa_yi_4_al: xi3, prefix(fe50): regress fef50_ps i.bf4*i.modifier $interaction $anthropometric

    *R square and N
    replace mean=e(r2_a) in 44
    replace mean=e(N) in 58

    *not breastfed no asthma
    lincom _cons

    replace mean_na=r(estimate) in 15
    replace lb_na=r(estimate)-1.96*r(se) in 15
    replace ub_na=r(estimate)+1.96*r(se) in 15

    *not breastfed asthma
    lincom _cons+$xi3_modifier

    replace mean_a=r(estimate) in 15
    replace lb_a=r(estimate)-1.96*r(se) in 15
    replace ub_a=r(estimate)+1.96*r(se) in 15

    * breastfed no asthma
    local j=15
    foreach var in $xi3_bf4{
    lincom _cons+`var'
    local j=1+`j'
    replace mean_na=r(estimate) in `j'
    replace lb_na=r(estimate)-1.96*r(se) in `j'
    replace ub_na=r(estimate)+1.96*r(se) in `j'
    }

    local j=15
    foreach var in $xi3_bf4{
    local j=1+`j'
    replace coef_na=_b[`var'] in `j'
    replace c_lb_na=_b[`var']-1.96*_se[`var'] in `j'
    replace c_ub_na=_b[`var']+1.96*_se[`var'] in `j'
    replace p_na=2*ttail(e(df_r),abs( _b[`var']/_se[`var'])) in `j'
    }

    *breastfed asthma
    *<3mo
    lincom _cons+$xi3_modifier+fe50bf4_1+fe50bf1Xmo1
    replace mean_a=r(estimate) in 16
    replace lb_a=r(estimate)-1.96*r(se) in 16
    replace ub_a=r(estimate)+1.96*r(se) in 16

    lincom fe50bf4_1+fe50bf1Xmo1
    replace coef_a=r(estimate) in 16
    replace c_lb_a=r(estimate)-1.96*r(se) in 16
    replace c_ub_a=r(estimate)+1.96*r(se) in 16

    test fe50bf4_1 fe50bf1Xmo1
    replace p_a=r(p) in 16

    *4-6mo
    lincom _cons+$xi3_modifier+fe50bf4_2+fe50bf2Xmo1
    replace mean_a=r(estimate) in 17
    replace lb_a=r(estimate)-1.96*r(se) in 17
    replace ub_a=r(estimate)+1.96*r(se) in 17

    lincom fe50bf4_2+fe50bf2Xmo1
    replace coef_a=r(estimate) in 17
    replace c_lb_a=r(estimate)-1.96*r(se) in 17
    replace c_ub_a=r(estimate)+1.96*r(se) in 17

    test fe50bf4_2 fe50bf2Xmo1
    replace p_a=r(p) in 17

    *>6mo
    lincom _cons+$xi3_modifier+fe50bf4_3+fe50bf3Xmo1
    replace mean_a=r(estimate) in 18
    replace lb_a=r(estimate)-1.96*r(se) in 18
    replace ub_a=r(estimate)+1.96*r(se) in 18

    lincom fe50bf4_3+fe50bf3Xmo1
    replace coef_a=r(estimate) in 18
    replace c_lb_a=r(estimate)-1.96*r(se) in 18
    replace c_ub_a=r(estimate)+1.96*r(se) in 18

    test fe50bf4_3 fe50bf3Xmo1
    replace p_a=r(p) in 18

    local j=15
    foreach var in $xi3_bf4Xmodifier{
    local j=1+`j'
    replace p_i=2*ttail(e(df_r),abs( _b[`var']/_se[`var'])) in `j'
    }

    *p-trend
    xi3, prefix(fe50): regress fef50_ps bf4*i.modifier $interaction $anthropometric
    replace p_tr=2*ttail(e(df_r),abs(_b[bf4]/_se[bf4])) in 15

    outsheet model-p_tr using "$breasttable/BF-LF_FEF50_asthma_mo (1yrs).txt", replace
    estout fef*2* using "$breasttable/FEF2 coef asthma_mo (1yrs).txt", cells("b(star fmt(3) label(coef.)) se(fmt(3) label(SE)) ci(fmt(3) label (CI)) p(fmt(3))") stats(N r2_a p, labels("Sample size" "Adj. R-square" "Overall p-value")) replace
    estout fef*4* using "$breasttable/FEF4 coef asthma_mo (1yrs).txt", cells("b(star fmt(3) label(coef.)) se(fmt(3) label(SE)) ci(fmt(3) label (CI)) p(fmt(3))") stats(N r2_a p, labels("Sample size" "Adj. R-square" "Overall p-value")) replace

    capture log close

  4. So, yeah, this has left my head completely spinning. But what’s the harm if the kid drinks a bottle of milk instead of the breast? Besides, if mommy has asthma, is an alcoholic, is addicted, has, damn it, sores on her DNA, the little one had nine months to pick up all of it. I recommend euthanasia.

  5. or condoms

    there’s also the idea that whatever the kid picked up in nine months, there’s still a chance of getting rid of it if you breastfeed, and breast milk doesn’t compare with any other milk anyway. I may legitimately turn up my nose at politicians, but not at nature; it has had time to select its stuff better than Nestle and other money-makers.

  6. Look, I stumbled on something else about bottle feeding. I went back to combing through my list of studies for that systematic review I’m working on, and searched online for the original study by Ranjit Chandra. And look what I found:
    http://en.wikipedia.org/wiki/Ranjit_Chandra
    http://www.infactcanada.ca/Chandra_Jan30_2006.htm
    http://www.breastfeedingsymbol.org/2007/10/02/the-story-of-ranjit-chandra/

    PS the best part is the university’s position: it says it cannot respond to the accusation that Chandra fabricated the data (that is, that the data don’t exist) on the grounds that… the data cannot be found and so cannot be verified.

  7. I can’t believe you’re serious. So Kafka is small fry by comparison…
    And anyway, there’s the DNA, which comes from the mother and the father, with all their DNA inheritance from so many other ancestors, with genes that get activated or not. It’s nonsense. What milk? What rubbish? That’s for the banal, common antibodies, like colds and swollen tonsils.
    Anyway, still humanely put to death. I don’t understand why that is applied only to animals.

  8. well, I am serious, but I don’t really get the bit about the DNA and that Kafka.
    out of common sense I’d trust the mother’s milk sooner than Nestle’s.
    same as that idiocy about the sun causing cancer. get this: suddenly, around the fifties, the sun decided to cause skin cancer; until then, for some two million years, it didn’t.
