Making Sense of Data Sampling in Google Analytics 4

March 21, 2024

5 Minute Read

If you've had the opportunity to explore Google Analytics 4 (GA4), you may have noticed that while the platform provides powerful analytics capabilities and insights, it also has a few quirks and constraints that can keep you from fully harnessing your data’s potential.

 

You might observe certain gaps in your reports or encounter discrepancies in data accuracy. In some instances, values may take the form of "not set," making the extraction of meaningful insights challenging, though not insurmountable. So long as your tracking has been implemented correctly (Ask us how we can help you set up your tracking!), then you can assume these variances are coming from GA4.

 

In the majority of cases where you see a “not set” value in your data, your best bet is that it has to do with one of the following issues: 

  1. Data Sampling
  2. Data Thresholding
  3. Data Cardinality Limits

These three obstacles prevent you from having full visibility into, and access to, your data. Fortunately, there’s a relatively simple fix for this: cue BigQuery to the rescue!

 

What Is BigQuery?

BigQuery is a fully managed, serverless data warehouse and analytics platform provided by Google Cloud. It allows users to run SQL queries against large datasets in near real time, making it suitable for analyzing and processing massive amounts of data. Because you query the raw event data that GA4 exports, it also sidesteps the sampling, thresholding, and cardinality issues we would otherwise run into in the GA4 interface.
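To give a feel for what querying the export looks like, here is a minimal sketch in Python that builds a SQL query counting events and users per event name. It assumes the standard GA4 export layout (daily `events_YYYYMMDD` tables with `event_name` and `user_pseudo_id` columns). The project and dataset names are made up for illustration, so swap in your own.

```python
def build_event_count_query(project, dataset, start_date, end_date):
    """Build a SQL string that counts events and users per event name.

    Dates are 'YYYYMMDD' strings matching the daily export table suffixes.
    Inputs are interpolated directly, so only pass trusted values.
    """
    return f"""
SELECT
  event_name,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `{project}.{dataset}.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{start_date}' AND '{end_date}'
GROUP BY event_name
ORDER BY event_count DESC
""".strip()


# Hypothetical project and dataset names, for illustration only
query = build_event_count_query(
    "my-project", "analytics_123456789", "20240301", "20240307"
)
print(query)

# To actually run it (assumes the google-cloud-bigquery package is installed
# and credentials are configured):
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(query).result()
```

Because the query runs over every exported event in the date range, the counts it returns are computed from the full data set rather than a subset.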

 

Even better, Google provides free, direct data exports to BigQuery from GA4, and BigQuery's free tier includes 10 GB of data storage and 1 terabyte (1 TB) of query processing per month. BigQuery does charge for storage and querying beyond those limits, but because the application is serverless and dynamically scalable, these costs are relatively low. Many users get by with only paying a few bucks a month!

 

To better understand the value of integrating BigQuery into your analytics arsenal, let’s take a deeper look at the three potential sources of issues mentioned above.

 

Data Sampling

Data sampling occurs when only a subset of your data is analyzed for patterns and trends, and the results are then extrapolated to the entire data set from which you gather your insights. In other words, data sampling provides an estimated snapshot of what’s going on rather than an exact count. Google implemented this in Google Analytics to reduce query load times and keep costs down by limiting server usage.
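Here's a rough sketch of the idea in Python: take a random 10% of sessions, count the purchases in that subset, and scale the result back up. The session data, the 3% purchase rate, and the 10% sample rate are all invented for illustration, and this is not how GA4 itself chooses its samples.

```python
import random


def estimate_total_from_sample(values, sample_rate, seed=0):
    """Estimate sum(values) from a random subset, scaled up to the full size."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(values) * sample_rate))
    sample = rng.sample(values, sample_size)
    scale = len(values) / sample_size  # how many sessions each sampled one stands for
    return sum(sample) * scale


# Hypothetical data: 100,000 sessions, each flagged 1 if it included a purchase
data_rng = random.Random(42)
sessions = [1 if data_rng.random() < 0.03 else 0 for _ in range(100_000)]

exact = sum(sessions)
estimated = estimate_total_from_sample(sessions, sample_rate=0.10)
print(f"exact purchases: {exact}, sampled estimate: {estimated:.0f}")
```

The estimate usually lands close to the exact figure, but it is still an estimate, and the gap tends to grow as the thing you are measuring gets rarer.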

 

GA4 opens up your business to deeper and more detailed data collection with the addition of custom parameters instead of the Event Category, Action, and Label model of Universal Analytics (UA). Google relies on data sampling less when you’re paying for GA360, but because there’s no cap on how much data you can collect, a large enough data set will still result in sampling at some point.

 

For a little more on data sampling, here's a video clip from our Director of Analytics, Jon Phillips:

 

 

 

Data Thresholding

On the other hand, if your site doesn’t see as much traffic, or you’re just not collecting much data for a certain dimension, you may notice data thresholding in your reports. This occurs when you’re attempting to view a small amount of data and your report fails to populate. Google uses this to protect users’ identities, but it limits visibility of your data.

 

There are varying minimum thresholds for different types of data, and you can try to boost traffic to collect more data, but this will always be an issue in GA4 to some degree.
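Conceptually, thresholding works like a filter on small rows. The sketch below withholds any row whose user count falls under a minimum. The cutoff of 50 and the sample report are made-up values for illustration, not GA4's actual thresholds.

```python
def apply_threshold(rows, min_users):
    """Return only the rows with at least `min_users` users; the rest are withheld."""
    return [row for row in rows if row["users"] >= min_users]


# Hypothetical report rows
report = [
    {"city": "Chicago", "users": 1200},
    {"city": "Toledo", "users": 35},
    {"city": "Bowling Green", "users": 4},
]

# 50 is an assumed cutoff, chosen only to make the example concrete
visible = apply_threshold(report, min_users=50)
hidden_count = len(report) - len(visible)
print(visible, hidden_count)
```

Notice that the smaller rows don't show up as zeros. They simply go missing, which is why thresholded reports can look like they have gaps.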

 

Data Cardinality

Cardinality refers to the number of unique values assigned to a dimension. Some dimensions have a fixed number of unique values. High-cardinality dimensions are dimensions with more than 500 unique values in one day. High-cardinality dimensions increase the number of rows in a report, making it more likely that a report hits its row limit, causing any data past the limit to be condensed into the (other) row.


In GA4, only use high-cardinality dimensions when the information collected is necessary for the business, as they can more quickly cause reports to reach the row limit.
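To make the (other) row concrete, here is a small Python sketch that flags a dimension as high-cardinality using the 500-unique-values definition above, and rolls everything past a row limit into a single (other) row. The row limit of 2 and the page paths are invented to keep the example tiny, and GA4's real limits are much larger.

```python
from collections import Counter

HIGH_CARDINALITY_LIMIT = 500  # more than 500 unique values in one day


def is_high_cardinality(values, limit=HIGH_CARDINALITY_LIMIT):
    """True when a day's values for a dimension exceed the unique-value limit."""
    return len(set(values)) > limit


def report_with_row_limit(values, row_limit):
    """Keep the most frequent `row_limit` values and fold the rest into '(other)'."""
    ranked = Counter(values).most_common()
    rows = dict(ranked[:row_limit])
    overflow = sum(count for _, count in ranked[row_limit:])
    if overflow:
        rows["(other)"] = overflow
    return rows


# Hypothetical page-path values collected in one day
page_paths = ["/home"] * 5 + ["/pricing"] * 3 + ["/blog/a", "/blog/b", "/blog/c"]

report = report_with_row_limit(page_paths, row_limit=2)
print(report)
print(is_high_cardinality(page_paths))
```

The total is preserved, but the detail for the three blog pages is gone, which is exactly what makes the (other) row frustrating when you need page-level answers.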

 

How Do I Get Started With BigQuery? (And How Can BFO Help?)

Luckily, there are many resources out there on the wild, wild web to get you started on setting up BigQuery, if you’re in the DIY spirit. That said, we know how quickly these types of setups can get complicated and confusing. When you’re working with data that important, it’s critical to your business that you know these implementations are in good, experienced hands.

 

If you’re looking for a trusted, fully Google-certified agency to get you into the world of better data, you can always reach out to us and we’re happy to have a conversation. From billing setup to querying and managing your data, we can walk you through it!

 

Interested in staying up-to-date with all these digital marketing changes? Sign up for our monthly newsletter!

Lauren San Gregory - SEO/Analytics Account Manager

Lauren San Gregory

Lauren graduated from Bowling Green State University with a Bachelor’s in Marketing and minors in Advertising and Entrepreneurship. She is currently in school for her MBA in Data Analytics. Her favorite platforms to advertise with are Google, Wayfair and Amazon. Her favorite thing about working with clients is learning about new industries and products. For example, after working with a sink company, she’s now well versed in sinks and faucets (which has turned out to be surprisingly useful day-to-day). Outside of work, Lauren loves to bake (she makes a mean cream-cheese-filled vanilla bean scone). She loves thrifting, hiking, going to concerts and taking her cat to the beach (as Curtiss likes to point out any chance he can get). An interesting fact about Lauren is that she is going to Italy next year to claim her citizenship (her dad’s side of the family is Italian).