Never miss confounding on research

DAGs: a visual tool that helps you avoid mistakes and saves you time

Case-study — Coffee causes lung cancer (does it?)

Photo by Chevanon Photography from Pexels: https://www.pexels.com/photo/close-up-of-coffee-cup-324028/

Papers from the last century suggested that drinking coffee was associated with a higher likelihood of developing lung cancer. It was a shock for everyone.

However, when the data were reassessed, it was found that smokers tend to drink more coffee and, as is known today, smoking causes lung cancer. Once smoking was taken into account in the analysis, drinking coffee had no effect on lung cancer: smoking was a confounder (Galarraga and Boffetta 2016).

So, what is a confounder?

A confounder is a common cause of the exposure and the outcome. In this case, those who smoked (confounder) tended to drink more coffee (exposure) and also had more lung cancer (outcome). However, there was no direct association between the exposure “drinking coffee” and the outcome “lung cancer”.

Essentially, a confounder is something that is associated with the exposure and is also associated with the outcome.

However, a confounder cannot be a consequence of the exposure (in this example, we assume that drinking coffee does not make people start smoking).

Every time we want to assess the association between two variables (more on regression analysis in this article), it is important to recognize all possible confounders and account for them (for example, by running a regression analysis with the exposure, the outcome, and all possible confounders in the same model).
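To make the idea of "accounting for" a confounder concrete, here is a minimal sketch on synthetic data. It uses stratification (comparing coffee drinkers and non-drinkers separately within smokers and within non-smokers) instead of a regression model, only to stay within the Python standard library. All probabilities are made-up illustration values, not estimates from any study, and the function names are hypothetical.

```python
import random

def simulate(n=100_000, seed=0):
    """Synthetic cohort in which smoking raises both coffee drinking and
    cancer risk, while coffee has NO effect on cancer by construction.
    All probabilities are assumed values chosen for illustration only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.30
        coffee = rng.random() < (0.80 if smoker else 0.40)   # depends on smoking
        cancer = rng.random() < (0.15 if smoker else 0.01)   # depends on smoking only
        rows.append((smoker, coffee, cancer))
    return rows

def risk(rows, coffee, smoker=None):
    """Proportion with cancer among people with the given coffee status,
    optionally restricted to one smoking stratum."""
    group = [cancer for s, c, cancer in rows
             if c == coffee and (smoker is None or s == smoker)]
    return sum(group) / len(group)

rows = simulate()

# Crude comparison: ignores smoking, so the confounder distorts the result.
crude_rd = risk(rows, True) - risk(rows, False)

# Stratified comparison: same contrast within each level of the confounder.
rd_smokers = risk(rows, True, smoker=True) - risk(rows, False, smoker=True)
rd_nonsmokers = risk(rows, True, smoker=False) - risk(rows, False, smoker=False)

print(f"crude risk difference:          {crude_rd:.3f}")
print(f"risk difference in smokers:     {rd_smokers:.3f}")
print(f"risk difference in non-smokers: {rd_nonsmokers:.3f}")
```

In this simulated cohort the crude risk difference is clearly positive, while the differences within each smoking stratum are close to zero, which is the pattern described for coffee and lung cancer.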

The issue here is that confounders are sometimes hard to recognize. Luckily, there is a visual tool that can help the researcher recognize what is and what is not a confounder: directed acyclic graphs (DAGs).

Enter the DAGs

Directed acyclic graphs (DAGs) are a graphical method for recognizing confounders, which helps minimize bias in the design and analysis of epidemiological studies.

These are the three main characteristics of every DAG.

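  • They are DIRECTED: every edge linking two nodes (variables) is an arrow that points from a cause to its effect.
  • They are ACYCLIC: following the arrows from ancestors (causes) to descendants (effects) never leads back to the starting node, so no variable can cause itself.
  • And they are GRAPHS: a set of nodes connected by edges.

In this DAG C = confounder; E = exposure; D = outcome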

Which are the basic pieces of DAGs?

  • Nodes, or variables: represented by a letter
  • Arrows, or directed edges: they connect nodes (letters)
  • Expert subject matter knowledge about the clinical problem: this determines how nodes and arrows are put together to “tell a story”.

Going back to Coffee and Lung Cancer

Let’s return to our previous example.

DAG where C = cigarette smoking; E = drinking coffee; D = lung cancer

It is known that smoking (confounder or C) causes lung cancer (outcome or D). There is also a known association between drinking coffee (exposure or E) and smoking (confounder or C). The reason is hard to explain, but think of workers who take more breaks to go outside and drink a cup of coffee while smoking a cigarette. As we see here, there is no effect of coffee drinking (exposure or E) on incident lung cancer (outcome or D).

We can tell there is a direct effect of C on D; in other words, there is a direct effect of smoking on lung cancer.

Direct Effect: a causal effect not mediated through other variables

When to draw the arrows?

There is an arrow from C to D because we know, or at least suspect, that C affects D. Likewise, there is an arrow from C to E because we know, or at least suspect, that C affects E.

However, the only reason there is no arrow between E and D is that we are sure E has no direct effect on D. In other words, we are encoding our prior subject matter knowledge into the graph.

This table summarizes the meaning of a present or an absent arrow.

Some more relevant terms: paths, backdoor paths, and colliders

Paths

Every time we want to go from point A to point B (let’s say from home to work) we go through a series of streets, buildings, and other landmarks. This trip from A to B is a path: it passes along many streets (arrows) and by many landmarks (nodes).

A path is any arrow-based route between two variables in the graph.

Here we have the path D ← C → E, which is made up of 3 nodes and 2 arrows. For the purpose of identifying a path, the direction of the arrowheads does not matter: any sequence of connected arrows counts as a path.

Backdoor paths and confounding

Using some DAG-related terminology, we say there is confounding when there is a backdoor path open (present).

There is an entire article regarding the interpretation of open and closed paths, and their association with clinical paradoxes.

To keep this article straightforward: when a backdoor path is open, there is confounding.

The common cause C opens a backdoor path, connecting D to E.

E ← C → D is a backdoor path. So, C is a confounder of the possible effect of E on D.
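A backdoor path can also be found mechanically: it is a path from the exposure to the outcome whose first arrow points into the exposure. The sketch below assumes an edge-list encoding and two hypothetical helpers, `backdoor_paths` and `is_open`. The second one applies a simplified rule that holds only when nothing has been conditioned on: a path is open unless it contains a collider (a node with two arrowheads pointing into it).

```python
def backdoor_paths(edges, exposure, outcome):
    """Paths from exposure to outcome whose first arrow points INTO the exposure."""
    edge_set = set(edges)
    nodes = {node for edge in edges for node in edge}
    found = []

    def walk(path):
        node = path[-1]
        if node == outcome:
            found.append(path)
            return
        for nxt in sorted(nodes):
            if nxt in path:
                continue
            # After the first step, arrows may be followed in either direction.
            if (node, nxt) in edge_set or (nxt, node) in edge_set:
                walk(path + [nxt])

    # Only start along arrows that enter the exposure (parent -> exposure).
    for parent in sorted(nodes):
        if (parent, exposure) in edge_set:
            walk([exposure, parent])
    return found

def is_open(path, edges):
    """Simplified rule with no conditioning: open unless the path has a collider."""
    edge_set = set(edges)
    for left, mid, right in zip(path, path[1:], path[2:]):
        if (left, mid) in edge_set and (right, mid) in edge_set:
            return False   # both arrows point into `mid`, which blocks the path
    return True

edges = [("C", "E"), ("C", "D")]        # C -> E and C -> D
paths = backdoor_paths(edges, "E", "D")
print(paths)                            # [['E', 'C', 'D']]
print(is_open(paths[0], edges))         # True
```

For the coffee example this returns the single backdoor path through C and reports it as open, which is the graphical signature of confounding.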

Colliders (or common effects)

You may never have heard of colliders before (the next article covers them with many practical examples). Even though they are often perceived as less relevant than confounders, it is important to be aware of them.

To keep things simple: if we condition on a collider (for example, by mislabeling a collider as a confounder and putting it into a regression model), we may end up creating a new source of bias that was not present in the first place.

In this case, S is called a “collider” on the path. As you can see, two arrowheads (one from C and one from D) collide at S.

S is said to be a common effect of C and D.
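A small simulation can show why conditioning on a collider is risky. In the sketch below, C and D are generated independently, and S is a common effect of both. The probabilities are arbitrary illustration values, and "conditioning" is approximated by restricting the analysis to rows where S is present.

```python
import random

rng = random.Random(1)
rows = []
for _ in range(100_000):
    c = rng.random() < 0.5
    d = rng.random() < 0.5                            # generated independently of c
    s = rng.random() < (0.9 if (c or d) else 0.1)     # S is caused by both C and D
    rows.append((c, d, s))

def risk_difference(rows):
    """P(D | C present) minus P(D | C absent) in the given rows."""
    with_c = [d for c, d, _ in rows if c]
    without_c = [d for c, d, _ in rows if not c]
    return sum(with_c) / len(with_c) - sum(without_c) / len(without_c)

overall = risk_difference(rows)                           # everyone
among_s = risk_difference([r for r in rows if r[2]])      # only rows with S present

print(f"risk difference, all rows:       {overall:.3f}")
print(f"risk difference, S present only: {among_s:.3f}")
```

In the full data the risk difference is close to zero, as expected for two independent variables. Among rows with S present, a strong negative association between C and D appears, even though neither one affects the other.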

Confounder vs Collider

There is an important difference here. A collider is NOT a confounder because:

  • A confounder is a common cause of the exposure and the outcome (C confounds E and D). A confounder is NOT an effect of the outcome or the exposure.
  • A collider is a common effect of two other variables (C and D cause S). A collider is NOT a common cause.

Even so, conditioning our analysis on a collider may create bias, but that is a different topic.

Why DAGs are useful

  • Inform model building
  • Easy way to distinguish between confounding and selection bias
  • Avoid inappropriate adjustment (avoid adjusting on colliders)
  • Motivate the use of alternative analytic methods

What DAGs can encode

  • Selection bias
  • Recall bias
  • Detection bias
  • Confounding
  • Loss to follow up

(more on these individual topics in future articles)

What DAGs cannot tell

  • DAGs may have limited use for encoding effect modification
  • DAGs cannot tell whether or not there is an association. They are not a “statistical test”. DAGs are only a graphical tool.

In this article we covered:

1- What confounders are

2- How DAGs can help us recognize confounders

3- What the basic pieces of a DAG are

4- The potential and limitations of DAGs

5- Some basic DAG-related concepts such as nodes, arrows, paths, backdoor paths, and colliders