GigaSheet for beginners

The no-code alternative for spreadsheets and Excel

Get to know one of the easiests softwares to handle spreadsheets without needing coiding skills. Only drag and drop and drop-down menus. — Photo by Ivan Samkov from Pexels

Check this article on Medium here: https://medium.com/evidentebm/gigasheet-for-beginners-98d54c44849c

GigaSheet is a company founded in August 2020 that claims to help anyone who can use a spreadsheet quickly analyze massive, disparate data sets without code.

In this tutorial will follow these steps in order to test this platform:

Uploading your file;
Dimension (rows and columns)
Visualizing data
Transforming data (Data cleanup and Data management)
Performing descriptive statistics
Drawing your graphs

Getting to know GigaSheet

We know software like Microsoft Excel and GoogleSheets which allow you to do some work with spreadsheets using only a handful of formulas. Also, some data analysis software (Stata and SPSS) allows you to use drop-down menus to perform complex data management and inferential analysis.

However, GigaSheet takes the no-code to a new level of simplicity. As simple as beginning your analysis by dragging and dropping your file and then using user-friendly menus to perform the different tasks.

GigaSheets allows beginners to have almost no entry barriers and allows all users to minimize the time wasted typing lines of code.

Running data step-by-step

We will go back to a dataset you have been using a lot in Epidence tutorials. The open-source Framingham Heart Cohort.

1. Uploading your file

After signing up you will have this window show. Here you can import your file. You can either:

Upload other file formats such as CSV, JSON, XLSX, TSV, and GZIP;
Upload your file directly from Google Drive, Dropbox, OneDrive, etc.

2. Dimension (rows and columns)

The number of rows and columns is automatically displayed once you upload the file.

Here we have around 4200 rows (observations) and 17 columns (variables)

In this case, 4.2k is rounded. To have the exact number of rows and columns click over the 3 dots and choose “File properties”

Here we see there are exactly 4240 rows (observations) and 17 columns (variables)

3. Visualizing your data

Once you open your spreadsheet for the first time GigaSheet will show a short tutorial with some hints regarding data management and data visualization.

Data visualization

You can visualize 100 rows at a time. And you can use the bottom left arrows to change the “page” showing different 100 observations.

Columns mode vs Rows mode

On the right bar, you have the option to choose between columns and rows.

LEFT: Columns mode — shows all columns available | RIGHT: Rows mode — shows all values for each row

Managing columns

In the uttermost right column, you will see a list with all variables (columns) available.

You can select and unselect in order to filter which variables show, and in which order (by dragging the variable up or down).

If for some reason what you need is to DELETE a column click on the 3 bars beside the column name and click “Delete”.

4. Transforming data (Data cleanup and Data management)

If we take a look at our Framingham dataset, we see, using rows visualization, that the variable “GLUCOSE” is in the wrong format. It should be numerical (calculator symbol) but it is coded as text (T symbol).

We need to transform this variable to the right format.

Data cleanup -> Change Data type

Then you will have the option to choose between integer and decimal. We will go for decimal here… and then Apply and Proceed.

Now, it’s done. Our variable is changed to the desired format.

Some advanced formatting techniques…

For more advanced issues, you can use the VLOOKUP function to merge different sheets.

More about it on their official YouTube channel

NOTE: Their function option has somewhat changed. Now it looks something like; Insert -> Cross File VLookup

5. Performing descriptive statistics

You can view the above information using various aggregations available at the bottom of the sheet.

Here instead of using “tab” or “describe” functions, we will have to rely on “Data aggregation”.

In order to answer to the clinical question; How many patients had an ischemic heart event (variable TenYearCHD)? We will have to ask GigaSheet;

How many participants have a unique observation for the variable TenYearCHD?

In other words, we are asking DataSheets, how many participants are “=1” (had an ischemic event) and how many are “=0”, did not have an ischemic event.

We go to Group and then select which variable we would like to group.

We see how 2420 rows “collapsed” into only 2 rows. We get here the answer to the previous question

How many participants have a unique observation for the variable TenYearCHD?

There are 343 with an ischemic event (TenYearCHD =1).

But what about cross-tabulation?

Let’s try our first cross-tabulation. In other words, let’s cross the observations of the different discrete variables.

Here we want to cross-tabulate ischemic events (TenYearCHD) with presence of hypertension (prevalentHyp). In order to see…

How many patients with ischemic events had hypertension? And how many patients without an ischemic event had hypertension?

From the previous selection (where we grouped observations according to the TenYearCHD variable), we will click above one of the cells below the column “PREVALENTHYP” and choose the option “row count”.

Here we see:

992 patients of those 3596 without any ischemic event had hypertension (prevalenthyp=1 + TenYearCHD=0);
325 patients of those 644 with a previous ischemic event had hypertension (prevalenthyp=1 + TenYearCHD=1).

However, we cannot test here whether this difference has any statistical meaning.

It is not perfect for descriptive statistics. We get it…

It is not as practical as having an output table with all parameters
To get more than one parameter (for example; count, range, and proportion) you would have to repeat this manually while typing the output in another window (not really feasible);
It does not allow for the calculation of statistical significance

But still, it is a nice way to visually explore data.

6. Drawing your graphs

Let’s have a bar chart with each bar showing the number of patients with hypertension in both groups (with and without an ischemic heart event).

You have to click and drag (in Windows hold shift while clicking all the desired cells). Then right-click Chart Range -> Bar -> Grouped

Hoover over the bar to see the exact count

Here you have a bar chart. There are many other options such as pie charts and scatter plots (this last is very useful for two continuous variables).

Key Takeaways

GigaSheets is a spreadsheet software. Do not expect it to perform inferential analysis or machine learning. Think about it more as an “online Excel” (with a lot of potentials);
Being a cloud service (like Kagle) there is always the question of safety and trust. It is not like an offline Jupyter Notebook or R Markdown;
Gigasheet offers a great opportunity for the user to focus more on the analysis itself rather than wasting time learning code. A very user-friendly platform.
Contrary to R or Python Gigasheet is not freeware. It is rather Freemium. They charge between 79 and 95$ per month for their “Premium” version, which allows exporting more than 100 rows or using more than 10GB of data.

My personal opinion

Gigasheet is a must-try platform. Their no-code interface is the future of data analysis! 🚀

However, this falls very short of my needs. Namely, it does not allow me to easily perform*:

Data management (creating new variables using formulas),
Testing statistical differences (“p-values”),
Performing inferential statistics (running models to test associations).

Maybe it may fill your needs. So give it a try.

Disclosure

None is this content was sponsored. I am neither a shareholder nor an employee of Gigasheet. There is no link between Epidence and Gigasheet.

This article does NOT give any kind of individual medical recommendation. If you are seeking medical advice please visit a licensed physician in your country.

*These features may or may not become available in the future. This information is true at the time of writing.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.