Adding Google Sheets to BigQuery with Google Apps Script and Running SQL Queries on Sheet Data

This tutorial and accompanying video is going to show how to programmatically use Google Apps Scripts to add Google Sheets as tables in BigQuery. It will also demonstrate and include some SQL queries run against Google Sheets data.

If you have any questions, ask in the comments or use this contact form.

Also, I have a Google Sheet you can copy that contains the data and the code. You can get that here.

About The Data

I took this data from a sample Airtable database in their Universe examples. https://www.airtable.com/universe/expusA0mTzB4qLDcR

For the sake of simplicity, I only pulled two of the Airtable tables into Google Sheets but this is representative of how we provide more robust reporting for clients that use Airtable.

The tables in question are:

  • Completed Research
  • Participants

There is a one to many relationship from Completed Research to Participants… meaning, any given research project can have one or more participants.

The “Participants” field in Completed Research includes one or more comma separated Participant ID’s from the Participant table. In the SQL queries I will be converting that to an array and then unnesting it in order to join the tables.

When we bring this type of data into BigQuery native tables, they are defined as repeated fields. That discussion falls outside of this tutorial. I hope the SQL queries I provide below and the video will help make the use of the BigQuery UNNEST function clear.

About Performance

When building robust reporting systems, we take a final step of adding that sheet data into BigQuery as native BigQuery tables. The performance increase is significant; far better than data in Airtable or data in Google Sheets. However, if you have reasonably small datasets and you have users who update your Google Sheets, you may find this to be a useful tool.

About Permissions/Security

I’m running the App Script below under the same account that administers and owns the BigQuery data. If your situation is different, the account running this code needs the authority to add BigQuery tables and data.

Adding Google Sheets as Tables in BigQuery

You can add Google Sheets as BigQuery tables manually or via code. I’m focused on the automated method but I provide a simple outline of the manual process as well.

Adding a Google Sheet to BigQuery Manually

From the BigQuery console, do the following:

  1. Select your dataset
  2. Select “Create Table”
  3. Create table from: Drive
  4. Enter the URL/URL of your sheet
  5. File format: Google Sheet
  6. Enter a table name
  7. Schema: Auto detect
  8. Advanced options: Header rows to skip, typically 1
  9. Press the Create Table button

Adding a Google Sheet to BigQuery using Google Apps Script

Setup

Add the BigQuery Service

You need to add the BigQuery service in your Apps Script project

  1. Click the plus symbol next to Services.
  2. Look for the BigQuery V2 service and add it.

This will let you instantiate the BigQuery methods using the name, BigQuery, in the same way you do with SpreadsheetApp, DocumentApp, DriveApp, or any other available services.

Note: You can edit the service name to something like, BQApp if you wish to initiate it that way.

Add a call to DriveApp

Somewhere in your code you need to add a call to the DriveApp service.

Add something like:

let rootFolder = DriveApp.getRootFolder();
let newFolder = rootFolder.createFolder(”test folder”);

It can live in its own function and never has to be called. But you need the OAuth scope: https://www.googleapis.com/auth/drive

It is used by the BigQuery service to connect to the Google Sheet’s URI. If you do not add it, you will receive a permissions error during the table insert process.

Now for the Code

In the code, I build a basic table definition object for BigQuery. This definition can be found here:

https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#Table

You do not need all the parameters.

It is part of the BigQuery API documentation, found here.

https://cloud.google.com/bigquery/docs/reference/rest

function bqInsertTable(sourceURI, sheetName, tblName,strDescription) {
  
  let skipRow = 1;
  let gSheetOptions = {
    skipLeadingRows: skipRow,
    range: sheetName
  }
  let extDataConfig = {
    sourceUris: [
      sourceURI
    ],
    sourceFormat: "GOOGLE_SHEETS",
    autodetect: true,
    googleSheetsOptions: gSheetOptions,

}

let tableDef = {
  tableReference: {
    projectId: bqProjectId,
    datasetId: bqDatasetId,
    tableId: tblName
  },
  friendlyName: tblName,
  description: strDescription,
  externalDataConfiguration: extDataConfig,

}

let insertTableResponse = BigQuery.Tables.insert(tableDef,bqProjectId,bqDatasetId);
return insertTableResponse;

}

function dropTables() {
  bqDeleteTable("zShtCompletedResearch");
  bqDeleteTable("zShtParticipants");
}


function bqDeleteTable(tblName) {
  const delResponse = BigQuery.Tables.remove(bqProjectId,bqDatasetId,tblName);
  Logger.log(delResponse);
}

Calling the above function

In the code below, you put your project ID, dataset ID, and Google Sheet URL in the constants at the top of the code. Then run, connectSheetsAsTables. That will create two BigQuery external data tables.

const bqProjectId = 'YOUR BIGQUERY PROJECT ID';
const bqDatasetId = 'YOUR BIGQUERY DATASET ID';
const sourceURI = 'YOUR GOOGLE SHEET URL'


function connectSheetsAsTables() {
  let sheetName = "CompletedResearch";
  let tblName = "zShtCompletedResearch";
  let strDescription = "Completed research from Airtable";
  let response = bqInsertTable(sourceURI,sheetName,tblName,strDescription);
  Logger.log(response.creationTime);

  sheetName = "Participants";
  tblName = "zShtParticipants";
  strDescription = "Participants from Airtable";
  response = bqInsertTable(sourceURI,sheetName,tblName,strDescription);
  Logger.log(response.creationTime);

}

Running SQL Commands Against Google Sheets Data

In the code above, the two tables in BigQuery will be called:

  • zShtCompletedResearch
  • zShtParticipants

Why the zSht prefix? I want these tables near the bottom of my dataset list. Usually, immediately following this step, I run a make table query where I fix field names and create native BigTable queries. I may cover that in my next tutorial.

For now, let’s look at some basic SQL statements and what is required to join the tables. You'll need to replace YOURDATASETID with... your dataset id in BigQuery.

#SOME SQL CODE

# A Basic select statement for each table:
Select * from `YOURDATASETID.zShtCompletedResearch`;
Select * from `YOURDATASETID.zShtParticipants`;

# Selected fields from the Completed Research table
#  I give the table an alias of CR.
#  Note the Participants field displays the comma separated
#  list of participant record numbers. We cannot do a join on this field directly.
SELECT CR.id as researchID, CR.Test_Name, CR.Test_Type, CR.Testing_Date, CR.Participants FROM `YOURDATASETID.zShtCompletedResearch` CR
WHERE CR.id = 'recvwsT8FIgMhE6fd';

# This query SPLITS the comma separated participants, creating an array
#   which is immediately unnested into unique rows. I give the Unnested
#   field a name of participantId.
SELECT CR.id as researchID, CR.Test_Name, CR.Test_Type, CR.Testing_Date, participantId FROM `YOURDATASETID.zShtCompletedResearch` CR,UNNEST(SPLIT(CR.Participants,",")) as participantId
WHERE CR.id = 'recvwsT8FIgMhE6fd';

# Now I use the unnested participantID to join to the participants table on the Id field
#   If you change any of the returned data in the Google Sheet, you will see it immediately
#   reflected the next time you run the query.
SELECT CR.id as researchID, CR.Test_Name, CR.Test_Type, CR.Testing_Date, participantId, Participant_Name, P.Role FROM `YOURDATASETID.zShtCompletedResearch` CR,
UNNEST(SPLIT(CR.Participants,",")) as participantId
LEFT JOIN `YOURDATASETID.zShtParticipants` P ON (participantId = P.id)
WHERE CR.id = 'recvwsT8FIgMhE6fd'

# Show a count of tests taken by participant roles
SELECT P.Role, CR.Test_Type, count(CR.Test_Type) as countTestType FROM `YOURDATASETID.zShtCompletedResearch` CR, 
UNNEST(SPLIT(CR.Participants,",")) as participantId
LEFT JOIN `YOURDATASETID.zShtParticipants` P ON (participantId = P.id)
GROUP BY P.Role, CR.Test_Type
ORDER BY count(CR.Test_Type) DESC;

# Show a count of participants by test type
SELECT CR.Test_Type, count(participantId) as countParticipants FROM `YOURDATASETID.zShtCompletedResearch` CR, 
UNNEST(SPLIT(CR.Participants,",")) as participantId
LEFT JOIN `YOURDATASETID.zShtParticipants` P ON (participantId = P.id)
GROUP BY CR.Test_Type
ORDER BY count(participantId) DESC;

# Show a count of participants by test type - more accurate using distinct
#   in the count function
SELECT CR.Test_Type, count(distinct participantId) as uniqueParticipants FROM `YOURDATASETID.zShtCompletedResearch` CR, 
UNNEST(SPLIT(CR.Participants,",")) as participantId
LEFT JOIN `YOURDATASETID.zShtParticipants` P ON (participantId = P.id)
GROUP BY CR.Test_Type
ORDER BY count(distinct participantId) DESC;

Conclusion

I hope this sheds some light on what is capable using BigQuery with Google Sheets data. The ability to create external data tables can be expanded to include views (pre-created queries) for more complex reporting.

If you connect BigQuery to Looker Studio, Tableau, or Power BI, and your Google Sheets are constantly updated, you can get real-time updates for your dashboards. Plus, BigQuery SQL allows you to create far more complex queries than is possible using just Google Sheets data.

Then, when it makes sense, you can automate the creation of BigQuery native tables for larger datasets and MUCH GREATER performance!

Video Demo/Example

Posted in Automation, Google Apps Script and tagged , .

Leave a Reply

Your email address will not be published. Required fields are marked *