This guide explains what the cleanUrlTracker plugin is and how to integrate it into your analytics.js tracking implementation.
When viewing your most visited pages in Google Analytics, it's not uncommon to see multiple different URL paths that reference the same page on your site. The following report table is a good example of this and the frustrating situation many users find themselves in today:
Page | Pageviews |
---|---|
/contact | 967 |
/contact/ | 431 |
/contact?hl=en | 67 |
/contact/index.html | 32 |
To prevent this problem, it's best to settle on a single, canonical URL path for each page you want to track, and only ever send the canonical version to Google Analytics.
The cleanUrlTracker plugin helps you do this. It lets you specify a preference for whether or not to include extraneous parts of the URL path, and updates all URLs accordingly.
The cleanUrlTracker plugin works by intercepting each hit as it's being sent and modifying the page field based on the rules specified by the configuration options. The plugin also intercepts calls to tracker.get() that reference the page field, so other plugins that use page data get the cleaned versions instead of the original versions.
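As a rough illustration of the interception described above, the sketch below wraps the get() method of a stand-in tracker object so that reads of the page field return a cleaned value. It is not the plugin's actual source: createMockTracker, cleanPage, and interceptPageReads are hypothetical helpers, and the cleaning rule (drop the query string and a trailing slash) is just one possible configuration.

```javascript
// Hypothetical stand-in for an analytics.js tracker; only get() is modeled.
function createMockTracker(fields) {
  return {
    get(name) {
      return fields[name];
    },
  };
}

// Hypothetical cleaning rule: drop the query string and any trailing slash.
function cleanPage(page) {
  const path = page.split('?')[0];
  return path.length > 1 ? path.replace(/\/$/, '') : path;
}

// Wrap tracker.get() so that reads of `page` return the cleaned value,
// while every other field (including `location`) passes through untouched.
function interceptPageReads(tracker) {
  const originalGet = tracker.get.bind(tracker);
  tracker.get = (name) => {
    const value = originalGet(name);
    return name === 'page' ? cleanPage(value) : value;
  };
  // Return a function that restores the original method.
  return () => {
    tracker.get = originalGet;
  };
}

const tracker = createMockTracker({
  location: 'https://example.com/contact/?hl=en',
  page: '/contact/?hl=en',
});
const restore = interceptPageReads(tracker);

console.log(tracker.get('page'));     // cleaned value
console.log(tracker.get('location')); // unchanged value
```

Returning a restore function mirrors the idea that the interception can be undone later, leaving the tracker in its original state.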
Note: while the cleanUrlTracker plugin does modify the page field value for each hit, it never modifies the location field. This allows campaign (e.g. utm params) and AdWords (e.g. gclid) data encoded in the full URL to be preserved.
To enable the cleanUrlTracker plugin, run the require command, specify the plugin name 'cleanUrlTracker', and pass in the configuration options you want to set:
ga('require', 'cleanUrlTracker', options);
The following table outlines all possible configuration options for the cleanUrlTracker plugin. If an option has a default value, the default is explicitly stated:
Name | Type | Description |
---|---|---|
stripQuery | boolean | When true, the query string portion of the URL will be removed. Default: false |
queryParamsWhitelist | Array | An array of query params not to strip. This is most commonly used in conjunction with site search, as shown in the queryParamsWhitelist example below. |
queryDimensionIndex | number | There are cases where you want to strip the query string from the URL, but you still want to record what query string was originally there, so you can report on those values separately. You can do this by creating a new custom dimension in Google Analytics. Set the dimension's scope to "hit", and then set the index of the newly created dimension as the queryDimensionIndex option. Once set, the stripped query string will be set on the custom dimension at the specified index. |
indexFilename | string | When set, the indexFilename value will be stripped from the end of a URL. If your server supports automatically serving index files, you should set this to whatever value your server uses (usually 'index.html'). |
trailingSlash | string | When set to 'add', a trailing slash is appended to the end of all URLs (if not already present). When set to 'remove', a trailing slash is removed from the end of all URLs. No action is taken if any other value is used. Note: when using the indexFilename option, index filenames are stripped prior to the trailing slash being added or removed. |
urlFieldsFilter | Function | A function that is passed a fieldsObj as its first argument and a parseUrl utility function as its second argument, and that returns the fieldsObj to send (see the urlFieldsFilter example below). Warning: be careful when modifying the location field, since Google Analytics reads campaign and AdWords data from it. |
The following table lists all methods for the cleanUrlTracker plugin:
Name | Description |
---|---|
remove | Removes the cleanUrlTracker plugin from the specified tracker and restores all modified tasks to their original state prior to the plugin being required. |
For details on how analytics.js plugin methods work and how to invoke them, see calling plugin methods in the analytics.js documentation.
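For instance, assuming the plugin was required on the default tracker, the remove method can be invoked with the pluginName:methodName command form that analytics.js uses for plugin methods. The ga function below is only a stand-in so the snippet runs outside a browser; on a real page, ga is defined by the analytics.js tracking snippet. 'UA-XXXXX-Y' is a placeholder property ID.

```javascript
// Stand-in for the analytics.js command queue (an assumption for this
// sketch); it just records each command so the calls can be inspected.
const queuedCommands = [];
const ga = (...args) => queuedCommands.push(args);

ga('create', 'UA-XXXXX-Y', 'auto');
ga('require', 'cleanUrlTracker', {trailingSlash: 'remove'});

// Later, when URL cleaning is no longer wanted on this tracker:
ga('cleanUrlTracker:remove');
```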
Given the four URL paths shown in the table at the beginning of this guide, the following cleanUrlTracker configuration would ensure that only the URL path /contact ever appears in your reports (this assumes you've created a custom dimension for the query at index 1):
ga('require', 'cleanUrlTracker', {
  stripQuery: true,
  queryDimensionIndex: 1,
  indexFilename: 'index.html',
  trailingSlash: 'remove'
});
And given those four URLs, the following fields would be sent to Google Analytics for each respective hit:
[1] {
  "location": "/contact",
  "page": "/contact"
}
[2] {
  "location": "/contact/",
  "page": "/contact"
}
[3] {
  "location": "/contact?hl=en",
  "page": "/contact",
  "dimension1": "hl=en"
}
[4] {
  "location": "/contact/index.html",
  "page": "/contact"
}
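The mapping above can be reproduced with a small standalone function. The following is a minimal sketch of the cleaning rules as this guide describes them, not the plugin's actual source: cleanUrl is a hypothetical helper, 'https://example.com' is a placeholder origin used only to parse relative paths, keeping '/' for the site root is an assumption, and queryParamsWhitelist and urlFieldsFilter are left out for brevity.

```javascript
// Illustrative re-implementation of the documented rules (hypothetical).
function cleanUrl(url, opts) {
  // Resolve relative paths against a placeholder origin so URL can parse them.
  const parsed = new URL(url, 'https://example.com');
  const fields = {location: url};
  let path = parsed.pathname;

  // 1. Strip the index filename from the end of the path, if configured.
  if (opts.indexFilename) {
    const parts = path.split('/');
    if (parts[parts.length - 1] === opts.indexFilename) {
      parts[parts.length - 1] = '';
      path = parts.join('/');
    }
  }

  // 2. Add or remove the trailing slash, after index filenames are stripped.
  if (opts.trailingSlash === 'remove') {
    path = path.replace(/\/+$/, '') || '/'; // assumption: keep '/' for the root
  } else if (opts.trailingSlash === 'add' && !path.endsWith('/')) {
    path += '/';
  }

  // 3. Strip the query string, recording it on the custom dimension if asked.
  let query = parsed.search.slice(1);
  if (opts.stripQuery) {
    if (opts.queryDimensionIndex && query) {
      fields['dimension' + opts.queryDimensionIndex] = query;
    }
    query = '';
  }

  fields.page = path + (query ? '?' + query : '');
  return fields;
}

const options = {
  stripQuery: true,
  queryDimensionIndex: 1,
  indexFilename: 'index.html',
  trailingSlash: 'remove',
};
const hits = ['/contact', '/contact/', '/contact?hl=en', '/contact/index.html']
    .map((url) => cleanUrl(url, options));

console.log(hits.map((hit) => hit.page)); // every entry is '/contact'
```

Stripping the index filename before handling the trailing slash matters: /contact/index.html first becomes /contact/, and only then is the slash removed.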
Unlike campaign (e.g. utm params) and AdWords (e.g. gclid) data, Site Search data is not inferred by Google Analytics from the location field when the page field is present, so any site search query params must not be stripped from the page field.
You can preserve individual query params via the queryParamsWhitelist option:
ga('require', 'cleanUrlTracker', {
  stripQuery: true,
  queryParamsWhitelist: ['q'],
});
Note that not stripping site search params from your URLs means those params will still show up in your page reports. If you don't want this to happen you can update your view's Site Search setup as follows:
- Specify the same parameter(s) you set in the queryParamsWhitelist option.
- Check the "Strip query parameters out of URL" box.
These options combined will allow you to keep all unwanted query params out of your page reports and still use site search.
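To make the effect of a whitelist concrete, here is a minimal sketch of keeping only whitelisted params while dropping the rest. It uses the standard URL and URLSearchParams APIs and is not the plugin's actual source: filterQuery is a hypothetical helper, and the URL and origin are made-up examples.

```javascript
// Keep only the query params named in `whitelist` (hypothetical helper).
function filterQuery(search, whitelist) {
  const kept = new URLSearchParams();
  for (const [key, value] of new URLSearchParams(search)) {
    if (whitelist.includes(key)) kept.append(key, value);
  }
  return kept.toString();
}

// Made-up site search URL with one search param and two unrelated params.
const url = new URL('/search?q=running+shoes&hl=en&page=2', 'https://example.com');
const query = filterQuery(url.search, ['q']);
const page = url.pathname + (query ? '?' + query : '');

console.log(page); // '/search?q=running+shoes'
```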
If the available configuration options are not sufficient for your needs, you can use the urlFieldsFilter option to arbitrarily modify the URL fields sent to Google Analytics.
The following example passes the same options as the basic example above, but in addition it removes user-specific IDs from the page path, e.g. /users/18542823 becomes /users/<user-id>:
ga('require', 'cleanUrlTracker', {
  stripQuery: true,
  queryDimensionIndex: 1,
  indexFilename: 'index.html',
  trailingSlash: 'remove',
  urlFieldsFilter: function(fieldsObj, parseUrl) {
    // Replace the numeric user ID at the start of the path with a placeholder.
    fieldsObj.page = parseUrl(fieldsObj.page).pathname
        .replace(/^\/users\/(\d+)/, '/users/<user-id>');
    return fieldsObj;
  },
});
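Because the filter is an ordinary function, it can be checked in isolation before it is handed to the plugin. The sketch below does this with a stand-in for parseUrl built on the standard URL API. That stand-in and the 'https://example.com' origin are assumptions for illustration, and the real utility passed by the plugin may differ in detail.

```javascript
// Hypothetical stand-in for the parseUrl utility the plugin passes in.
const parseUrl = (url) => new URL(url, 'https://example.com');

// Same filter logic as in the configuration above.
function urlFieldsFilter(fieldsObj, parseUrl) {
  fieldsObj.page = parseUrl(fieldsObj.page).pathname
      .replace(/^\/users\/(\d+)/, '/users/<user-id>');
  return fieldsObj;
}

const userHit = urlFieldsFilter({page: '/users/18542823/settings'}, parseUrl);
const otherHit = urlFieldsFilter({page: '/about'}, parseUrl);

console.log(userHit.page);  // '/users/<user-id>/settings'
console.log(otherHit.page); // '/about'
```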