Linus Larsson

Remove possible PII query data in Google Analytics

Have you noticed that Google Analytics page view data sometimes includes query strings? This has a lot of upsides, e.g. automatic AdWords tracking and support for UTM parameters. But in some cases it can be a real headache. What if your website puts sensitive data in the query string? One example of this is login information. Sometimes the back-end logic of your website needs the information as a query, and the result can look something like this:

http://domain.com/?user=[email protected]&userID=132412

In this case you will send personally identifiable information to Google Analytics, which is very bad from two perspectives:

  1. You violate the policy from Google!
  2. You will store PII without consent (unless you collect the consent on the website beforehand), which means you have to handle it according to the EU's new General Data Protection Regulation (GDPR).

So should we all panic? NO, there's an easy solution that can be applied via Google Tag Manager. If you already know which query parameters you need to remove, you can always exclude them directly in Google Analytics. But what if you don't know all the parameters that can include possible PII, and what if your IT department decides to add another one without your knowledge? Wouldn't it be better to have a solution that is somewhat more bulletproof than listing parameters in Google Analytics? I'm definitely not saying that this will work ALL THE TIME, but it's better than writing down only the queries you know about.

Step 1

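First off, you need to create a new variable in Google Tag Manager. Choose URL as the variable type and Query as the component type. Leave the query key field blank, name the variable Query - All (the name the code in step 2 refers to), and save it. This variable will now contain the entire query string of the URL.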
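
As a rough illustration of what this variable holds, the snippet below pulls the query string out of a URL with the standard URL API. It runs outside Tag Manager, the function name getQueryString is hypothetical, and it assumes the variable returns the query string without the leading question mark, which is what the code in step 2 relies on.

```javascript
// Hypothetical helper: approximates the value of a URL variable with the
// Query component, i.e. everything between the '?' and any '#fragment'.
function getQueryString(url) {
    var search = new URL(url).search; // e.g. "?a=1&b=2", or "" when there is no query
    return search.charAt(0) === '?' ? search.substring(1) : search;
}

console.log(getQueryString('http://domain.com/?campaign=spring&userID=132412'));
// campaign=spring&userID=132412
```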

Step 2

Create a new custom JavaScript variable and paste the following code:

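function(){
    var q = {{Query - All}}; // entire query string, without the leading '?'
    var forbiddenKeywords = ["email","user","password","login","order","transaction","telephone","cellphone","card","registration"];
    var queries = q.split('&'); // one "key=value" entry per parameter
    // Declare the loop counters with var so they don't leak into the global scope.
    for (var n = 0; n < queries.length; n++){
        for (var i = 0; i < forbiddenKeywords.length; i++){
            // Case-sensitive substring match against the whole "key=value" entry.
            if (queries[n].indexOf(forbiddenKeywords[i]) > -1){
                var pos = queries[n].indexOf('=');
                // Keep the key and the '=' and mask only the value.
                queries[n] = queries[n].substring(0, pos + 1) + "[removed]";
                break;
            }
        }
    }
    // Put the leading '?' back, unless there was no query string at all.
    if (q != ''){
        q = '?' + queries.join('&');
    }
    return q;
}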

The script takes the entire query string and splits it into individual parameters. The forbiddenKeywords array on row 3 holds the list of keywords. We then loop through all parameters from the URL and check whether any of the forbidden keywords is part of them. If a parameter contains one of the forbidden keywords, its value is replaced by the string "[removed]".
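
To try the masking logic outside Tag Manager, it can be wrapped in an ordinary function. This is only a sketch: maskQuery and its two parameters are hypothetical names, and the query string is passed in as an argument instead of coming from the Query - All variable.

```javascript
// Standalone sketch of the masking logic, not tied to GTM.
// q is a query string without the leading '?', e.g. "a=1&b=2".
function maskQuery(q, forbiddenKeywords) {
    if (q === '') {
        return q; // no query string, nothing to mask
    }
    var queries = q.split('&');
    for (var n = 0; n < queries.length; n++) {
        for (var i = 0; i < forbiddenKeywords.length; i++) {
            if (queries[n].indexOf(forbiddenKeywords[i]) > -1) {
                var pos = queries[n].indexOf('=');
                // Keep "key=" and replace only the value.
                queries[n] = queries[n].substring(0, pos + 1) + '[removed]';
                break;
            }
        }
    }
    return '?' + queries.join('&');
}

var keywords = ['email', 'user', 'password'];
console.log(maskQuery('user=jane&userID=132412&page=2', keywords));
// ?user=[removed]&userID=[removed]&page=2
```

Note that the check is a case-sensitive substring test on the whole key=value entry. With the keyword order, a parameter such as border=1 is masked too, while Email=... is not, so it can be worth adding the spelling variants your site actually uses.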

Step 3

Make sure you update all your Analytics tags (or your Analytics settings variable) so that the page field is set to the page path variable followed by the query variable you just created.

Now the page with the possible PII will look like this instead:

http://domain.com/?user=[removed]&userID=[removed]
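
The value that ends up in the page field is simply the path followed by the masked query. The sketch below shows that concatenation; buildPageField is a hypothetical name, and it assumes the page path variable returns only the path (such as /) and that the query variable returns either an empty string or a string starting with a question mark.

```javascript
// Hypothetical helper: joins the page path with an already masked query string,
// mirroring a page field made of the page path variable plus the query variable.
function buildPageField(pagePath, securedQuery) {
    return pagePath + securedQuery;
}

console.log(buildPageField('/', '?user=[removed]&userID=[removed]'));
// /?user=[removed]&userID=[removed]
console.log(buildPageField('/checkout', ''));
// /checkout
```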

Comments

  1. Anna

    2019-03-21 16:53

    Hello Linus! Thanks for this very insightful article. Would the GA tag avoid sending PII data to GA when you modify the tag with your method? As far as I understand, your method differs in this sense from the method of removing the parameters in GA. Best and thanks, Anna

    • Linus

      2019-03-21 17:34

      Hi Anna, I’m not exactly sure what you mean. If you mean what the difference is from removing the query parameters in GA, my method will keep the queries and only remove the values of the specific types of queries you want to avoid. Maybe you don’t know in advance what queries your developers will add to the site? Then you can’t exclude them in GA until it’s too late and the information has already been loaded into GA. With this method you will automatically mask everything in parameters that contain the chosen keywords.

      Another difference is that with this method your queries will still appear in the path in GA. Maybe you would like to still be able to see whether the query was present in the path, rather than letting GA simply remove it from all reports. ALTHOUGH in my own GTM I have modified the settings to send the secured query as a custom dimension and to send the path without any queries instead.


© Copyright - Lynuhs.com - 2018-2024