The no-code alternative for spreadsheets and Excel
Check this article on Medium here: https://medium.com/evidentebm/gigasheet-for-beginners-98d54c44849c
GigaSheet is a company founded in August 2020 that claims to help anyone who can use a spreadsheet quickly analyze massive, disparate data sets without code.
In this tutorial will follow these steps in order to test this platform:
- Uploading your file;
- Dimension (rows and columns)
- Visualizing data
- Transforming data (Data cleanup and Data management)
- Performing descriptive statistics
- Drawing your graphs
Getting to know GigaSheet
We know software like Microsoft Excel and GoogleSheets which allow you to do some work with spreadsheets using only a handful of formulas. Also, some data analysis software (Stata and SPSS) allows you to use drop-down menus to perform complex data management and inferential analysis.
However, GigaSheet takes the no-code to a new level of simplicity. As simple as beginning your analysis by dragging and dropping your file and then using user-friendly menus to perform the different tasks.
GigaSheets allows beginners to have almost no entry barriers and allows all users to minimize the time wasted typing lines of code.
Running data step-by-step
We will go back to a dataset you have been using a lot in Epidence tutorials. The open-source Framingham Heart Cohort.
1. Uploading your file
After signing up you will have this window show. Here you can import your file. You can either:
- Upload other file formats such as CSV, JSON, XLSX, TSV, and GZIP;
- Upload your file directly from Google Drive, Dropbox, OneDrive, etc.
2. Dimension (rows and columns)
The number of rows and columns is automatically displayed once you upload the file.
In this case, 4.2k is rounded. To have the exact number of rows and columns click over the 3 dots and choose “File properties”
3. Visualizing your data
Once you open your spreadsheet for the first time GigaSheet will show a short tutorial with some hints regarding data management and data visualization.
Data visualization
You can visualize 100 rows at a time. And you can use the bottom left arrows to change the “page” showing different 100 observations.
Columns mode vs Rows mode
On the right bar, you have the option to choose between columns and rows.
Managing columns
In the uttermost right column, you will see a list with all variables (columns) available.
You can select and unselect in order to filter which variables show, and in which order (by dragging the variable up or down).
If for some reason what you need is to DELETE a column click on the 3 bars beside the column name and click “Delete”.
4. Transforming data (Data cleanup and Data management)
If we take a look at our Framingham dataset, we see, using rows visualization, that the variable “GLUCOSE” is in the wrong format. It should be numerical (calculator symbol) but it is coded as text (T symbol).
We need to transform this variable to the right format.
Data cleanup -> Change Data type
Then you will have the option to choose between integer and decimal. We will go for decimal here… and then Apply and Proceed.
Now, it’s done. Our variable is changed to the desired format.
Some advanced formatting techniques…
For more advanced issues, you can use the VLOOKUP function to merge different sheets.
More about it on their official YouTube channel
NOTE: Their function option has somewhat changed. Now it looks something like; Insert -> Cross File VLookup
5. Performing descriptive statistics
You can view the above information using various aggregations available at the bottom of the sheet.
Here instead of using “tab” or “describe” functions, we will have to rely on “Data aggregation”.
In order to answer to the clinical question; How many patients had an ischemic heart event (variable TenYearCHD)? We will have to ask GigaSheet;
How many participants have a unique observation for the variable TenYearCHD?
In other words, we are asking DataSheets, how many participants are “=1” (had an ischemic event) and how many are “=0”, did not have an ischemic event.
We go to Group and then select which variable we would like to group.
We see how 2420 rows “collapsed” into only 2 rows. We get here the answer to the previous question
How many participants have a unique observation for the variable TenYearCHD?
There are 343 with an ischemic event (TenYearCHD =1).
But what about cross-tabulation?
Let’s try our first cross-tabulation. In other words, let’s cross the observations of the different discrete variables.
Here we want to cross-tabulate ischemic events (TenYearCHD) with presence of hypertension (prevalentHyp). In order to see…
How many patients with ischemic events had hypertension? And how many patients without an ischemic event had hypertension?
From the previous selection (where we grouped observations according to the TenYearCHD variable), we will click above one of the cells below the column “PREVALENTHYP” and choose the option “row count”.
Here we see:
- 992 patients of those 3596 without any ischemic event had hypertension (prevalenthyp=1 + TenYearCHD=0);
- 325 patients of those 644 with a previous ischemic event had hypertension (prevalenthyp=1 + TenYearCHD=1).
However, we cannot test here whether this difference has any statistical meaning.
It is not perfect for descriptive statistics. We get it…
- It is not as practical as having an output table with all parameters
- To get more than one parameter (for example; count, range, and proportion) you would have to repeat this manually while typing the output in another window (not really feasible);
- It does not allow for the calculation of statistical significance
But still, it is a nice way to visually explore data.
6. Drawing your graphs
Let’s have a bar chart with each bar showing the number of patients with hypertension in both groups (with and without an ischemic heart event).
You have to click and drag (in Windows hold shift while clicking all the desired cells). Then right-click Chart Range -> Bar -> Grouped
Here you have a bar chart. There are many other options such as pie charts and scatter plots (this last is very useful for two continuous variables).
More tutorials
We do not need to go too deep into tutorials here, since the Gigasheet team has already built nice tutorials (both in text and video) with almost any possible issue you may face while using this Software.
https://www.gigasheet.com/how-to
Key Takeaways
- GigaSheets is a spreadsheet software. Do not expect it to perform inferential analysis or machine learning. Think about it more as an “online Excel” (with a lot of potentials);
- Being a cloud service (like Kagle) there is always the question of safety and trust. It is not like an offline Jupyter Notebook or R Markdown;
- Gigasheet offers a great opportunity for the user to focus more on the analysis itself rather than wasting time learning code. A very user-friendly platform.
- Contrary to R or Python Gigasheet is not freeware. It is rather Freemium. They charge between 79 and 95$ per month for their “Premium” version, which allows exporting more than 100 rows or using more than 10GB of data.
My personal opinion
Gigasheet is a must-try platform. Their no-code interface is the future of data analysis! 🚀
However, this falls very short of my needs. Namely, it does not allow me to easily perform*:
- Data management (creating new variables using formulas),
- Testing statistical differences (“p-values”),
- Performing inferential statistics (running models to test associations).
Maybe it may fill your needs. So give it a try.
Disclosure
None is this content was sponsored. I am neither a shareholder nor an employee of Gigasheet. There is no link between Epidence and Gigasheet.
This article does NOT give any kind of individual medical recommendation. If you are seeking medical advice please visit a licensed physician in your country.
*These features may or may not become available in the future. This information is true at the time of writing.