Users have become demanding when it comes to privacy, and regulations by public authorities worldwide have been tightening. We believe it’s for the better, and we stand behind respecting user privacy as the foundation of any personalized marketing campaign. Yet if you handle your user data like it’s 2023, you can still run powerful marketing personalization. To show you how, this article explains our method for supercharging your customer data platform (CDP) with Reverse ETL.
Key Takeaways
- Consumer privacy laws and technology, such as ad blockers, are growing in adoption, scope, and maturity. Marketing tracking continues to get harder.
- Customer Data Platforms (CDPs) are the key to combining all of your customer data while respecting privacy. They also make managing data and compliance easier.
- Reverse ETL is the secret sauce that gives your CDP superpowers: it’s the tool that helps you anonymize personal data.
- Using GA4 as your only analytics tool is asking for trouble.
- Building your own data models keeps data within your own systems. This allows you to have control and stay compliant.
The Seismic Shifts in Data Privacy Regulations
Sweeping regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) require businesses to provide stringent safeguards for personal data. Ad blockers are proliferating, and tech giants like Apple are introducing Intelligent Tracking Prevention (ITP) features to more and more devices.
Compliance with privacy laws is not to be taken lightly.
Additionally, privacy and compliance are becoming the default instead of something you only have to do under specific conditions. When you look at the California Consumer Privacy Act (CCPA), or most privacy regulations in other US states and other countries, you’ll also find that they’re becoming increasingly similar. This is a sign of maturing regulation that is here to stay and likely to grow.
The most common tool, GA4, doesn’t fare well in this landscape. In March and April of 2023, the Norwegian data protection authority concluded that Google Analytics is not in line with GDPR.
While these advancements bolster privacy, they also create challenges for us marketers, such as:
- Attribution of ads and other channels becomes increasingly difficult
- Marketing becomes hindered by compliance rules
The fines and loss of reputation that result from breaches can be severe. Examples include the high-profile privacy cases suffered by the downstream vendor Customer.io, or even sending PII to 3rd parties like Meta without customer consent.
In the case of sending data to 3rd parties, the fine was not issued for a data breach. It was for what’s called an operational leak: a system set up to send data to 3rd parties without user consent. As a legal analogy, think of it as negligence that causes harm.
Why the Old Stack Model Has Gotten Old
In a traditional stack model, data was often moved around using tools like Google Tag Manager (GTM), which loaded the tags of the tools you were using, from email marketing (e.g., HubSpot) to various 3rd party ad providers (e.g., Facebook or Google Ads).
While we still think Google Tag Manager is a great tool for deploying tags, even if you’re using a CDP, it still creates challenges for code governance, not only from a privacy and compliance perspective but also from a website performance perspective.
Moreover, customer data is spread all over the place, which makes things like GDPR deletion requests a massive chore. Effectively, when a user exercises their right to have their data deleted, you will need to go to each of the tools that you send PII to and manually delete the user from the system. This is especially hard since the average company uses at least 12 different MarTech tools.
Further, ad blockers are now too common for this to work, and ITP hinders cookies from staying on the browser. And even if you still managed to get your tracking to stick, you’d have to send a pixel for every ad platform, which is error-prone and developer-heavy.
Customer Data Infrastructure as a Better Way, but Still with Privacy Issues
The rise of Customer Data Infrastructure tools, like Segment Connections, has improved the situation. They offer a more centralized way of managing data and control over what data goes where.
Segment is super helpful here because:
- It collects data from multiple platforms and sources. It then sends it to your analytics, such as Amplitude, a data warehouse, such as Snowflake, and into an event stream used for creating user profiles for communication tools such as Braze.
- It ingests data that was tracked server-side, which is one of the things you want to do to work around ad blockers.
- It helps you combine GA4 with Amplitude (or another analytics tool similar to Amplitude). Only using GA4 for your analytics would straight up mean you’re doing it wrong. The combined solution allows you to perform actions such as audience syncing more safely.
Sending PII to third parties is an issue in itself.
While this is a great list and a major improvement, you’d still ultimately be sending a lot of data to third parties to make those tools work. For example, you will still need to send SHA-256 hashed PII to Facebook to get a high match rate. HIPAA and other regulations don’t allow you to send hashed PII, either.
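To see why hashing alone does not anonymize anything, here is a minimal Python sketch of hashing an email address before it is shared. The normalization steps (trim and lowercase) and the sample address are assumptions for illustration; each ad platform documents its own matching requirements.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase, so the same address always hashes the same."""
    return email.strip().lower()


def sha256_hex(value: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical address, for illustration only.
hash_a = sha256_hex(normalize_email("  Jane.Doe@Example.com "))
hash_b = sha256_hex(normalize_email("jane.doe@example.com"))

# The digest is deterministic: anyone who already holds the same email can
# compute the same value, which is exactly how a platform matches it to a profile.
print(hash_a == hash_b)  # True
print(len(hash_a))       # 64
```

Because the output is stable for a given input, the hash works as a persistent identifier for the person, which is why it is still treated as personal data.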
Customer Data Platform as Your Data Police Officer
A Customer Data Platform (CDP), such as Segment Engage, serves as a repository for all your customer data, securely storing it in one place and preventing its distribution across multiple systems. It ties users together across all their identifiers, making data management more efficient.
While Segment is our go-to CDP, alternatives include Amplitude, RudderStack, and Tealium.
The new stack model with a CDP at its core is a substantial improvement. Data is sent to the CDP and can remain in the CDP exclusively or be sent selectively to downstream tools. Only the data strictly required by each tool is sent, keeping sensitive Personally Identifiable Information (PII) and Protected Health Information (PHI) outside of these 3rd party systems. Segmentation and audience creation can occur within the CDP, ensuring more efficient and secure data handling.
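The idea of sending each tool only what it strictly needs can be sketched as a per-destination allowlist. This is a simplified illustration, not how any particular CDP implements its filters; the destination names, field names, and sample record are all hypothetical.

```python
from typing import Any

# Hypothetical allowlists: which fields each downstream tool may receive.
DESTINATION_ALLOWLIST: dict[str, set[str]] = {
    "email_tool": {"composite_id", "audience"},
    "analytics": {"composite_id", "event", "plan"},
}


def filter_for_destination(record: dict[str, Any], destination: str) -> dict[str, Any]:
    """Keep only allowlisted fields; unknown destinations receive nothing."""
    allowed = DESTINATION_ALLOWLIST.get(destination, set())
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "composite_id": "u_8f3a",
    "email": "jane.doe@example.com",  # PII, stays in the CDP
    "event": "Plan Upgraded",
    "plan": "pro",
    "audience": "Audience 123",
}

print(filter_for_destination(record, "analytics"))
# {'composite_id': 'u_8f3a', 'event': 'Plan Upgraded', 'plan': 'pro'}
```

Defaulting to an empty allowlist means a newly added tool receives no data until someone decides what it should see.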
A CDP also supports more sophisticated user identity models for creating user profiles and assists in building a fully anonymous stack. Private information is kept secure, and obscured identifiers remain within the CDP. Composite user IDs can be sent to “low trust” tools like Braze or analytics tools, ensuring the right balance between data utility and privacy.
Braze and Amplitude are great tools that we trust in our work at McGaw. But for this scenario, think of “high trust” as only those tools that you control fully: your CDP and your data warehouse. Note also that 3rd parties mainly send data to the CDP instead of receiving it.
The high trust zone can have a limited number of users, so a data breach is much less likely to cause a crisis. No PII, such as emails, phone numbers, or names, lives outside of the high trust zone. You’re also not enabling Facebook or Google to profile your customers and help the competition target them.
To further help you with attribution, your data is now all in one place, unsampled, and you can run sophisticated user identity models early in the funnel. Identity graphs used in this context are one of the secret sauces of CDPs.
Every CDP needs an owner
Assign a team member as the owner of your CDP. Learn how with our Segment Owner article.
Reverse ETL as the Key Ingredient of Anonymized Data
With Reverse ETL (Extract, Transform, Load), data can be pulled from your CDP as needed. This technique offers several advantages. First, data doesn’t leave your “walled garden,” giving you better control. Second, it keeps platforms like Facebook or Google Ads from receiving more data than necessary. This approach respects user privacy.
What happens here is that Reverse ETL lets you manipulate data into a different format and swap out anything that’s personally identifiable. To make this step even more powerful, combine Reverse ETL with CDP filters.
You anonymize data with Reverse ETL first, and then only send that to low trust or 3rd party tools.
This is the differentiator. None of the downstream tools will have a full profile of any of your customers anymore. They’ll only see anonymous IDs or user IDs. And you’ll send that transformed data only to the tools you want, giving each of them exactly the minimal data it needs to work. PII won’t leave your walled garden anymore.
Because this is your own system, you create and hold the documentation you need to prove you’re doing due diligence and following best practices.
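The anonymization step described in this section can be sketched as a tiny extract, transform, load pipeline. This is a toy illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, the table and column names are hypothetical, and the load step only returns the payloads instead of calling a real destination API.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (assumption).
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    """
    CREATE TABLE users (composite_id TEXT, email TEXT, name TEXT, audience TEXT);
    INSERT INTO users VALUES
        ('u_001', 'jane.doe@example.com', 'Jane Doe', 'Audience 123'),
        ('u_002', 'sam.lee@example.com', 'Sam Lee', 'Audience 456');
    """
)


def extract(conn: sqlite3.Connection) -> list[tuple]:
    """Extract: read full user rows, PII included, inside the walled garden."""
    return conn.execute(
        "SELECT composite_id, email, name, audience FROM users ORDER BY composite_id"
    ).fetchall()


def transform(rows: list[tuple]) -> list[dict]:
    """Transform: drop email and name, keep only the ID and audience label."""
    return [
        {"composite_id": composite_id, "audience": audience}
        for composite_id, _email, _name, audience in rows
    ]


def load(payloads: list[dict]) -> list[dict]:
    """Load: a real sync would call the destination's API; here we just return."""
    return payloads


sent = load(transform(extract(warehouse)))
# Each payload now holds only composite_id and audience, with no email or name.
print(sent)
```

The point of the design is that the transform runs before anything crosses the boundary, so the downstream tool never has a chance to store the original fields.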
Privacy Use Case #1: Healthcare Communications and HIPAA
If you’re in healthcare and you’re not using a CDP, it’ll be really, really hard for you to be HIPAA compliant. A common use case is using a CDP to create audiences of users and sync them with your email marketing tool, such as Braze. Braze then doesn’t see that a user is suffering from high blood pressure. All it sees is that a user belongs to “Audience 123”. The personalized interactions that Braze creates are then highly relevant even when PII is not exposed.
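As a rough sketch of the audience idea, the mapping from a sensitive attribute to an opaque label can live entirely inside the CDP, with only the label and an ID being synced. The rule table, profile fields, and IDs below are hypothetical, and this shows the data shape only, not a compliance guarantee.

```python
# Hypothetical rule table kept inside the CDP: the sensitive condition maps to
# an opaque audience label, and only the label is ever synced.
AUDIENCE_BY_CONDITION = {
    "high_blood_pressure": "Audience 123",
    "diabetes": "Audience 456",
}

# Hypothetical profiles as they exist inside the high trust zone.
profiles = [
    {"composite_id": "u_001", "condition": "high_blood_pressure"},
    {"composite_id": "u_002", "condition": "diabetes"},
    {"composite_id": "u_003", "condition": "high_blood_pressure"},
]


def build_audience_sync(profiles: list[dict]) -> dict[str, list[str]]:
    """Group composite IDs under opaque audience labels, dropping the condition."""
    sync: dict[str, list[str]] = {}
    for profile in profiles:
        label = AUDIENCE_BY_CONDITION.get(profile["condition"])
        if label is not None:
            sync.setdefault(label, []).append(profile["composite_id"])
    return sync


print(build_audience_sync(profiles))
# {'Audience 123': ['u_001', 'u_003'], 'Audience 456': ['u_002']}
```

The email tool receives the right-hand side only; the table that explains what “Audience 123” means never leaves the CDP.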
Privacy Use Case #2: Marketing Attribution
How do you do attribution to 3rd party ads if you’re not able to send any data to the 3rd party ad platforms? In the old model, you’d have Google Ads on one side and the analytics tool on the other. In the new model, you connect a GCLID to the entire user journey from pre-click to conversion. You can then connect the dots, attach the exact ad placement to the user, and track them to revenue.
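Connecting a GCLID to the user journey can be sketched in two steps: capture the click ID from the landing URL, then join it to conversions on your own user ID. The URL, IDs, and revenue figures below are made up for illustration, and the dictionary stands in for whatever first-party store you use.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_gclid(landing_url: str) -> Optional[str]:
    """Return the gclid query parameter from a landing-page URL, if present."""
    values = parse_qs(urlparse(landing_url).query).get("gclid")
    return values[0] if values else None


# First-party store: composite ID -> click ID captured at landing (hypothetical).
click_by_user: dict[str, Optional[str]] = {}
landing = "https://example.com/pricing?gclid=TeSt-123&utm_source=google"
click_by_user["u_001"] = extract_gclid(landing)

conversions = [
    {"composite_id": "u_001", "revenue": 99.0},
    {"composite_id": "u_002", "revenue": 49.0},
]


def attribute(conversions: list[dict], clicks: dict[str, Optional[str]]) -> list[dict]:
    """Attach the stored click ID to each conversion; None means no ad click seen."""
    return [{**c, "gclid": clicks.get(c["composite_id"])} for c in conversions]


attributed = attribute(conversions, click_by_user)
print(attributed[0]["gclid"])  # TeSt-123
print(attributed[1]["gclid"])  # None
```

The join happens in your own systems, so revenue can be tied back to an ad click without handing the ad platform any user profile.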
UTM tagging is critical to making this work. Tools like UTM.io will enforce UTM best practices for you semi-automatically.
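If you want to see what consistent UTM tagging looks like in code, here is a small sketch using Python’s standard library. The lowercase and hyphen conventions are assumptions we picked for the example, not a description of how UTM.io works.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append normalized UTM parameters while keeping existing query parameters."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update(
        {
            # Assumed convention: lowercase values, hyphens instead of spaces.
            "utm_source": source.strip().lower(),
            "utm_medium": medium.strip().lower(),
            "utm_campaign": campaign.strip().lower().replace(" ", "-"),
        }
    )
    return urlunparse(parts._replace(query=urlencode(query)))


tagged = add_utm("https://example.com/pricing?ref=nav", "Google", "CPC", "Spring Sale")
print(tagged)
# https://example.com/pricing?ref=nav&utm_source=google&utm_medium=cpc&utm_campaign=spring-sale
```

Normalizing values in one place keeps “Google”, “google”, and “GOOGLE” from showing up as three different sources in your reports.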
Privacy Use Case #3: User Asks You to Remove Their Data
You can transform a user’s PII into a composite user ID, and that’s going to be all the other tools get. So you only send the user’s PII to the CDP and maybe the data warehouse, and the composite ID that’s passed on to the other tools doesn’t expose the user in any way. All the tools’ event streams are associated with the composite ID.
If a user exercises their right under GDPR or CCPA and asks you to remove their personal data from your systems, you go ahead and delete the PII. You will no longer hold anything personally identifiable about the user, but you’ll have the history of interactions under the composite ID, and you can keep personalizing some of your interactions.
This falls under consent management, a set of practices we elaborated on in the article on consent management with Segment and GTM.
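The deletion flow can be sketched with a toy identity store. This sketch assumes the composite ID is a random UUID that is not derived from the PII, which is our assumption for the example; your CDP may generate composite IDs differently. The class and field names are hypothetical.

```python
import uuid
from typing import Optional


class IdentityVault:
    """Toy stand-in for the high trust zone: the only place PII is stored."""

    def __init__(self) -> None:
        self._pii_by_id: dict[str, dict[str, str]] = {}

    def register(self, email: str, name: str) -> str:
        """Store PII and return a random composite ID that reveals nothing."""
        composite_id = f"u_{uuid.uuid4().hex}"
        self._pii_by_id[composite_id] = {"email": email, "name": name}
        return composite_id

    def lookup(self, composite_id: str) -> Optional[dict[str, str]]:
        """Return the stored PII, or None if there is none."""
        return self._pii_by_id.get(composite_id)

    def delete_pii(self, composite_id: str) -> bool:
        """Handle a deletion request: remove PII, report whether any existed."""
        return self._pii_by_id.pop(composite_id, None) is not None


vault = IdentityVault()
user_id = vault.register("jane.doe@example.com", "Jane Doe")

# Downstream event streams only ever carry the composite ID.
events = [{"composite_id": user_id, "event": "Email Opened"}]

vault.delete_pii(user_id)
print(vault.lookup(user_id))  # None
print(len(events))            # 1
```

After the deletion, the event history is still there under the composite ID, but nothing in the vault links that ID back to a person.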
Start Building the Stack that Respects Privacy and Powers Personalization at the Same Time
CDPs have emerged as powerful tools in the current privacy landscape. They provide businesses with an improved framework for managing customer data securely and efficiently. As we grapple with privacy concerns and regulatory demands, CDPs will play an increasingly critical role in our digital future.
So go ahead and make moves. Establish a methodical taxonomy, and use our stack builder to unblock your creativity. It will even help you plan out the cost, as budgets can often be easily decreased if you leave out all the data you wouldn’t be acting on anyway.