Linus Larsson

Import Google Analytics data to BigQuery without 360

Wouldn't it be great if you could store your Google Analytics data in BigQuery without having to pay for a Google Analytics 360 license? With this R script you can do just that. You will not, however, get all of the detailed data that the 360 connection provides. If you try to import very detailed data, this approach might not work, since the number of rows could exceed the limit of the export function in googleAnalyticsR. That will happen if you have a lot of traffic and decide to import something like a session ID. But if you are fine with tables of aggregated data, this will make your day a whole lot better!

At the bottom of the post you will see the entire script, but let's dive into what it does before you copy and paste it. It's important to understand how it works and what its limits are.

First off, you import the relevant libraries and set the Google authentication scopes to cover both BigQuery and Google Analytics. It is very important to put the BigQuery scope first, otherwise you might get the same error that I received: Google can't authenticate the use of BigQuery since it's not part of the scope. This happens because a function in the BigQuery library for R only checks the first scope in the vector (this might have been fixed by now). By putting BigQuery first you will only receive a warning instead of an error, and you can ignore that warning, since it only says that bq_check_auth() will check the first item in scopes.
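Concretely, the scope setup looks like the snippet below. This is a trimmed-down version of the line in the full script (which lists a few more Analytics scopes); the important part is that the BigQuery scope comes first:

options(googleAuthR.scopes.selected = c("https://www.googleapis.com/auth/bigquery",
                                        "https://www.googleapis.com/auth/cloud-platform",
                                        "https://www.googleapis.com/auth/analytics",
                                        "https://www.googleapis.com/auth/analytics.readonly"))
gar_auth()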

Then we create a function that we will later call to extract data from Google Analytics. It's a pretty simple function that extracts the chosen metrics and dimensions between two given dates. You can also pass a specific segment or filter; if you don't, the function defaults to the filter "All pages" and the segment "All users".
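Once the function is defined, calling it looks like this. The view ID, dates, dimensions and metrics below are just made-up example values:

ga <- gaGetData(id = "XXXXXXXXXX",
                start = "2019-01-01",
                end = "2019-01-31",
                dimensions = c("date", "source", "medium"),
                metrics = c("sessions", "pageviews"))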

In the next part we set the project ID, dataset ID and table ID (name) for where we want to send the data in BigQuery. We also set the Google Analytics view ID. Then we set the dimensions and metrics that we want to use in our dataset. You have to include date for this to work! The reason is that we upload the data to BigQuery as a partitioned table, and the date is therefore necessary.
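This is the relevant part of the full script; replace the placeholder IDs with your own values:

bq_global_project("PROJECT_ID")
bq_global_dataset("DATASET_ID")
ga_id <- "XXXXXXXXXX"
tableName <- "ga_import"
dimensions <- c("date", "dimension1", "source", "medium", "deviceCategory")
metrics <- c("sessions", "transactions", "pageviews", "uniquePageviews", "bounces", "itemQuantity")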

Now it's time to actually perform the magic! We start by checking whether the table already exists in BigQuery. If it does, we extract the last date in the table to see when the latest update was made, and check whether that date is before yesterday. If it is, we run the entire script; if it isn't, the table is already up to date and we do nothing more than print that message to the console. If the table doesn't exist, the start date to import from is set to 30 days ago. You can change this to your liking. The end date, however, should always be yesterday (or maybe even the day before that), since today's data in Google Analytics is not yet complete.
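In condensed form, that check from the full script looks like this; the last imported date is fetched with a small query via bqr_query(), and the new start date is the day after it:

if(tableName %in% bqr_list_tables()$tableId){
  # Start from the day after the last imported date
  start <- as.Date(bqr_query(query = paste0("SELECT max(date) as date FROM ", tableName))[1,1]) + 1
  runScript <- start <= Sys.Date()-1
} else {
  # No table yet: import the last 30 days
  start <- Sys.Date()-30
  runScript <- TRUE
}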

If the script should run, we extract the chosen data from Google Analytics into a data frame. Then we create a table with a schema based on the columns of that data frame. If the table already exists, this line does nothing, so there's no harm in running it even when you know the table exists. The next line of code uploads the data to the chosen table and appends the rows without overwriting anything.
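Those two lines from the script are the following; timePartitioning makes the new table date partitioned, and overwrite = FALSE makes the upload append to it instead of replacing it:

bqr_create_table(tableId = tableName, template_data = ga, timePartitioning = TRUE)
bqr_upload_data(upload_data = ga, tableId = tableName, overwrite = FALSE)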

So why is this better than just using Google Analytics as a connector? Well, first of all, it's easier to import data from BigQuery into your own data warehouse. Second, you can now combine the data with other data in BigQuery to create more valuable datasets. For example, you could create a new view in BigQuery and connect to that view instead of building the relationships in your BI visualization tool.
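As a rough sketch of that idea, the query below joins the imported table with another table in the same dataset. The crm_orders table and its columns are made up for illustration, so adjust the SQL to whatever data you actually have; the result could also be saved as a view in BigQuery and used directly in your BI tool:

combined <- bqr_query(query = "SELECT ga.date, ga.source, ga.medium, ga.sessions, crm.order_value
                               FROM DATASET_ID.ga_import AS ga
                               LEFT JOIN DATASET_ID.crm_orders AS crm
                               ON ga.date = crm.order_date",
                      useLegacySql = FALSE)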

Check out the script on GitHub instead.

# Created by Linus Larsson
# 2019-01-11
# https://lynuhs.com/
#install.packages("googleAnalyticsR")
#install.packages("googleAuthR")
if(exists("hasRun") == FALSE){
  library(bigQueryR)
  library(googleAnalyticsR)
  library(googleAuthR)
  
  
  # You will have to include all scopes below and make sure to put BigQuery first.
  # There's a problem in a function for BigQuery that will only check the first element in scopes to verify credentials.
  # You can ignore that warning when running the script.
  
  options(googleAuthR.scopes.selected = c("https://www.googleapis.com/auth/bigquery",
                                          "https://www.googleapis.com/auth/devstorage.full_control",
                                          "https://www.googleapis.com/auth/cloud-platform",
                                          "https://www.googleapis.com/auth/analytics",
                                          "https://www.googleapis.com/auth/analytics.readonly",
                                          "https://www.googleapis.com/auth/analytics.manage.users.readonly",
                                          "https://www.googleapis.com/auth/analytics.edit",
                                          "https://www.googleapis.com/auth/analytics.manage.users",
                                          "https://www.googleapis.com/auth/analytics.provision"))
  
  gar_auth()
  hasRun <- TRUE
}
# Extract data from Google Analytics for the given view, date range, dimensions and metrics.
# Defaults to the filter "All pages" and the segment "All users" if none are supplied.
gaGetData <- function(id, start, end, dimensions, metrics, filter="ga:pagePath=~.*", segment = "gaid::-1"){
  df <- google_analytics(id,
                         start = start,
                         end = end,
                         metrics = metrics,
                         dimensions = dimensions,
                         filters = filter,
                         segment = segment,
                         samplingLevel = "WALK", # walk through the data day by day to avoid sampling
                         max_results = 999999999999)
  
  return(df)
}
###################################################################################
# Set global variables
###################################################################################
# Put your BigQuery project ID here
bq_global_project("PROJECT_ID") 
# Put your BigQuery dataset ID here 
bq_global_dataset("DATASET_ID") 
# Put your Google Analytics View ID here
ga_id <- "XXXXXXXXXX" 
# Name the Table you want to create and send data to in BigQuery.
tableName <- "ga_import"
###################################################################################
# Enter the dimensions and metrics you want to use in the data upload. YOU HAVE TO USE DATE!
dimensions <- c("date", "dimension1","source", "medium", "deviceCategory")
metrics <- c("sessions","transactions","pageviews","uniquePageviews","bounces","itemQuantity")
# Set the date to import from by checking the last date in the table if it exists
if(tableName %in% bqr_list_tables()$tableId){
  start <- as.Date(bqr_query(query =  paste0("SELECT max(date) as date FROM ",tableName))[1,1])+1
  # Only run if at least one full day (up to yesterday) is missing from the table
  if(start <= Sys.Date()-1){
    runScript <- TRUE
  } else {
    runScript <- FALSE
  }
} else {
  start <- Sys.Date()-30
  runScript <- TRUE
}
if(runScript){
  ga <- gaGetData(id = ga_id,
                  start = start,
                  end = Sys.Date()-1,
                  dimensions = dimensions,
                  metrics = metrics)
  
  
  
  # If the table has already been created this function will not try to create another one. 
  bqr_create_table(tableId = tableName, template_data = ga, timePartitioning = TRUE)
  
  # Run this to append your data frame to the table. 
  bqr_upload_data(upload_data = ga, tableId = tableName, overwrite = FALSE)
} else {
  cat("\014")
  print("Table is already up to date!")
}
