Getting proportions (%) in R shouldn’t cause you a headache

Solution: the CrossTable() function from the gmodels package in RStudio (Posit)

Photo by Ron Lach from Pexels

Are you tired of fighting R to get relative frequencies (percentages)? The “gmodels” package has a function that can make your life easier: CrossTable().

This tutorial uses a dataset from clinical research (the Framingham Heart Study) as a worked example.

First steps

First, go to https://www.r-project.org/ to download R, and then to https://rstudio.com/ to download RStudio, an integrated development environment (IDE) for R. Both are free to download.

The Framingham Heart Study, started in 1948, is one of the longest-running clinical cohorts. It focuses on cardiovascular disease and its risk factors in the general population.

After completing the installation, open RStudio. Then go to the console and try the following steps.

Open and attach the dataset

We are going to use a publicly available version of the Framingham Heart Study dataset from:

https://www.kaggle.com/amanajmera1/framingham-heart-study-dataset/version/1

On this website, you can also look up the coding of the different variables. For example, gender is stored in the variable male, where male=1 means male and male=0 means female.

First, download the file to your computer.

Then go to RStudio and open the file via the following menu options (this is the easiest way):

File -> Import Dataset -> From Text (base) (this is a .csv file)

Then, you should see a window with a data frame called framingham with 4240 observations (rows) and 16 variables (columns) (as shown in Environment).
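If you prefer typing to clicking, the same import can be sketched with base R's read.csv(). The file name below is an assumption (a hypothetical path); point it to wherever you saved the download.

```r
# Hypothetical path: adjust to wherever you saved the Kaggle download.
csv_path <- "framingham.csv"

if (file.exists(csv_path)) {
  # Read the .csv file into a data frame called framingham.
  framingham <- read.csv(csv_path)
  print(dim(framingham))    # number of rows and columns
  print(names(framingham))  # variable names
} else {
  message("File not found: ", csv_path)
}
```

The dim() output should match the number of observations and variables shown in the Environment pane.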

Let’s “attach” the dataset, so that its columns (such as male and TenYearCHD) can be referred to by name without the framingham$ prefix.

attach(framingham)
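If you would rather not attach the dataset, the same columns can be reached with $ or with(). Here is a minimal sketch on a tiny made-up data frame (the name toy and its values are placeholders, not Framingham data):

```r
# Toy data frame standing in for framingham (values are made up).
toy <- data.frame(
  male       = c(1, 1, 0, 0, 1, 0),
  TenYearCHD = c(1, 0, 0, 1, 0, 0)
)

# Option 1: refer to the columns explicitly with $.
tab_dollar <- table(toy$male, toy$TenYearCHD)

# Option 2: evaluate the expression inside the data frame with with().
tab_with <- with(toy, table(male, TenYearCHD))

print(tab_with)
```

Both options build the same table; with() keeps the variable names as table labels.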

Let’s begin our CrossTable tutorial by answering a clinical question

Let’s try to answer the question “How is coronary heart disease (CHD) distributed by gender?” using both the classical method and the CrossTable() function.

  • male is the variable for gender;
  • TenYearCHD is the variable recording coronary heart disease after 10 years.

Let’s begin by getting proportions (%) the classical way…

table(male,TenYearCHD)

We see that 343 men and 301 women had CHD. But we do not yet have the proportions (%) for each group.

The trick is to create an intermediate variable “z” (you can name it whatever you like) that holds the two-by-two table of our two discrete variables.

Afterward, we divide the absolute count of each cell by the sum of all cases in the same row. This gives a proportion (between 0 and 1).

z <- table(male,TenYearCHD)
z/rowSums(z)

For housekeeping, we can remove the intermediate variable “z” once we are done:

z <- table(male,TenYearCHD)
z/rowSums(z)
rm(z)

After running everything…

We see 12% of females and 19% of males had CHD after 10 years.

Ans: 301 (12%) female and 343 (19%) male patients had CHD.
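As a side note, base R also has prop.table(), which does the row-wise division in one call. The sketch below uses made-up counts (not the Framingham results) to show that it matches the manual z/rowSums(z) approach:

```r
# Made-up counts for illustration only, not the Framingham results.
z <- as.table(matrix(
  c(90, 80, 10, 20), nrow = 2,
  dimnames = list(male = c("0", "1"), TenYearCHD = c("0", "1"))
))

# Manual approach: divide each cell by its row total.
manual <- z / rowSums(z)

# Built-in approach: margin = 1 means "proportions within each row".
builtin <- prop.table(z, margin = 1)

print(builtin)
```

Using margin = 2 instead would give proportions within each column.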

Enter the easier way: the CrossTable() function

We need to have the “gmodels” package installed.

install.packages("gmodels")
library("gmodels")

Let’s try the CrossTable function.

CrossTable(male,TenYearCHD)

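The “Cell Contents” legend printed at the top of the output explains the five values shown in each cell: the count (N), the chi-square contribution, and the row, column, and table proportions.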

However, if we only need the row-wise proportion we can type instead…

CrossTable(male,TenYearCHD, prop.r=TRUE,prop.c=FALSE,prop.t=FALSE,prop.chisq=FALSE)

The only proportion we want is the one within each row (prop.r), so the other options are switched off.

Ans: 12% of female and 19% of male patients had CHD after 10 years.
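CrossTable() also returns its results invisibly as a list, so the row proportions can be stored and reused instead of read off the printed output. A minimal sketch, assuming gmodels is installed and using made-up vectors (sex and chd are hypothetical names, not columns of the dataset):

```r
# Made-up data: 100 women (10 with CHD) and 100 men (20 with CHD).
sex <- factor(rep(c("female", "male"), each = 100))
chd <- c(rep(c(0, 1), times = c(90, 10)),
         rep(c(0, 1), times = c(80, 20)))

# Only run if the gmodels package is available.
if (requireNamespace("gmodels", quietly = TRUE)) {
  res <- gmodels::CrossTable(sex, chd,
                             prop.r = TRUE, prop.c = FALSE,
                             prop.t = FALSE, prop.chisq = FALSE)
  # prop.row holds the row-wise proportions as a table.
  print(round(res$prop.row, 2))
}
```

The returned list also carries the raw counts (t) and the column and table proportions (prop.col, prop.tbl).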

Testing the statistical significance

We can go even further and test whether this difference is statistically significant using the χ² test (chi-squared).

CrossTable(male,TenYearCHD, prop.r=TRUE,prop.c=FALSE,prop.t=FALSE,prop.chisq=FALSE,chisq=TRUE)

In fact p<0.05 (p=8.69e-09), meaning that there is a statistically significant difference in CHD between men and women.

Since p is below the significance level α = 0.05 (the accepted type I error rate), we can reject the null hypothesis.

Ans: There is a statistically significant difference in CHD between genders: 12% of female and 19% of male patients had CHD after 10 years (p<0.05).
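To cross-check the test outside gmodels, base R's chisq.test() can be run on the same kind of table. The sketch below uses made-up counts, not the Framingham results. Setting correct = FALSE asks for the plain Pearson test; with the default (correct = TRUE), R applies Yates' continuity correction to 2 x 2 tables, so the two p-values can differ slightly.

```r
# Made-up 2 x 2 table for illustration only.
z <- as.table(matrix(
  c(90, 80, 10, 20), nrow = 2,
  dimnames = list(sex = c("female", "male"), chd = c("0", "1"))
))

# Pearson chi-squared test without continuity correction.
test <- chisq.test(z, correct = FALSE)
print(test)

# TRUE if the difference is significant at alpha = 0.05.
significant <- test$p.value < 0.05
print(significant)
```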

What if I need percentages from 0 to 100%…

If you are looking for percentages (0–100%) instead of 0–1, there are many ways of achieving this.

The simplest method I have found is to set format="SPSS" (the default is format="SAS"):

CrossTable(male,TenYearCHD, prop.r=TRUE,prop.c=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")

We now see, again, that 19% (343) of male and 12% (301) of female participants had CHD after 10 years.
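Another of the many ways, without gmodels, is to multiply the row proportions by 100 and round them. A small sketch on made-up counts (not the Framingham results):

```r
# Made-up counts for illustration only.
z <- as.table(matrix(
  c(90, 80, 10, 20), nrow = 2,
  dimnames = list(male = c("0", "1"), TenYearCHD = c("0", "1"))
))

# Row-wise proportions, scaled to 0-100 and rounded to one decimal.
pct <- round(100 * prop.table(z, margin = 1), 1)
print(pct)
```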

BONUS — Understanding all possible arguments for the CrossTable function…

  • digits — Number of digits after the decimal point for cell proportions
  • max.width — In the case of a 1 x n table, the default will be to print the output horizontally. If the number of columns exceeds max.width, the table will be wrapped for each successive increment of max.width columns. If you want a single-column vertical table, set max.width to 1
  • expected — If TRUE, chisq will be set to TRUE and expected cell counts from the χ2 will be included
  • prop.r — If TRUE, row proportions will be included
  • prop.c — If TRUE, column proportions will be included
  • prop.t — If TRUE, table proportions will be included
  • prop.chisq – If TRUE, chi-square contribution of each cell will be included
  • chisq — If TRUE, the results of a chi-square test will be included
  • fisher — If TRUE, the results of a Fisher Exact test will be included
  • mcnemar — If TRUE, the results of a McNemar test will be included
  • resid — If TRUE, residual (Pearson) will be included
  • sresid — If TRUE, standardized residual will be included
  • asresid — If TRUE, adjusted standardized residual will be included
  • missing.include — If TRUE, then remove any unused factor levels
  • format — Either SAS (default) or SPSS, depending on the type of output desired.
  • dnn — the names to be given to the dimensions in the result (the dimnames names).
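To see a few of these arguments working together, here is a short sketch that sets digits, fisher and dnn in a single call. It assumes gmodels is installed; the vectors and the dimension labels are made up for illustration.

```r
# Made-up data: 100 women (10 with CHD) and 100 men (20 with CHD).
sex <- factor(rep(c("female", "male"), each = 100))
chd <- c(rep(c(0, 1), times = c(90, 10)),
         rep(c(0, 1), times = c(80, 20)))

# Only run if the gmodels package is available.
if (requireNamespace("gmodels", quietly = TRUE)) {
  gmodels::CrossTable(sex, chd,
                      digits = 2,                         # two decimals for proportions
                      prop.c = FALSE, prop.t = FALSE,     # keep only row proportions
                      prop.chisq = FALSE,
                      fisher = TRUE,                      # add Fisher's exact test
                      dnn = c("Sex", "CHD at 10 years"))  # labels for rows and columns
}
```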

Disclosure

This article does NOT give any kind of individual medical recommendation. If you are seeking medical advice please visit a licensed physician in your country.

References

https://www.rdocumentation.org/packages/gmodels/versions/2.18.1.1/topics/CrossTable