Ensure Data Purity with Whitelisted Domains

At Datazoom, our mission is to help you create a real-time, standardized data layer built on trusted, clean data. A critical component of this is ensuring that the data you collect comes only from sources you own and trust.

In the wild, collector endpoints can receive traffic from scrapers, bots, and misconfigured development environments, all of which pollute your dataset with irrelevant or malicious information. This noise makes analysis difficult and can compromise your data integrity.

That's why we’re excited to launch Whitelisted Domains, a powerful new feature for our Collectors that puts you in complete control of your data's origin.

What is Domain Whitelisting?

Think of it as a bouncer for your data collector. This new feature, found in your collector's configuration page, allows administrators to specify an "allow list" of domain suffixes that are permitted to send data.

If a data-collection request comes from a domain on your list, it’s welcomed in. If it comes from an unknown domain, it’s politely turned away.

How It Works

The system is simple, smart, and secure. When a request is made to your collector's endpoint, the system checks the request's Origin or Referer header.

  • Request from a Whitelisted Domain: If the header value matches a domain on your list, the system returns the full, correct collector configuration with a 200 OK status. Collection proceeds as normal.

  • Request from a Non-Whitelisted Domain: If the header is present but does not match a domain on your list, the system returns an empty (but valid) JSON configuration with a 200 OK status. This effectively disables the collector on the unauthorized domain, and any subsequent attempts to POST data from that origin will be rejected with a 4xx error.

  • No Whitelist or Missing Headers: If you haven't configured any domains in your whitelist, or if a request arrives with neither an Origin nor a Referer header, validation is skipped and the full configuration is returned. This ensures that server-to-server integrations and other valid use cases that don't send these headers are not interrupted.
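
The three cases above can be sketched in a few lines of Python. This is only an illustration of the decision flow, not Datazoom's actual implementation: the names resolve_config, matches_suffix, FULL_CONFIG, and EMPTY_CONFIG are hypothetical, and the sketch assumes that a suffix matches only on a label boundary (so "example.com" covers "player.example.com" but not "evil-example.com") and that Origin is checked before Referer.

```python
from typing import Mapping, Sequence
from urllib.parse import urlsplit

# Hypothetical stand-ins for a collector's real configuration payloads.
FULL_CONFIG = {"collector_id": "example", "events": ["play", "pause"]}
EMPTY_CONFIG: dict = {}


def matches_suffix(host: str, suffix: str) -> bool:
    """Assumed rule: exact match, or match on a dot-separated label boundary."""
    host = host.lower().rstrip(".")
    suffix = suffix.lower().strip(".")
    return bool(suffix) and (host == suffix or host.endswith("." + suffix))


def resolve_config(headers: Mapping[str, str], allowed: Sequence[str]) -> dict:
    """Pick the configuration to return; both outcomes are sent with 200 OK."""
    # Header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    source = lowered.get("origin") or lowered.get("referer")

    # No whitelist configured, or neither header present: skip validation.
    if not allowed or not source:
        return FULL_CONFIG

    # A header that is present but has no parsable host counts as a mismatch.
    host = urlsplit(source).hostname or ""
    if any(matches_suffix(host, suffix) for suffix in allowed):
        return FULL_CONFIG
    return EMPTY_CONFIG
```

With a whitelist of ["example.com"], a request carrying Origin: https://player.example.com gets the full configuration, one carrying Origin: https://evil-example.com gets the empty one, and a request with neither header gets the full configuration because validation is skipped.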

Why It Matters for Your Data Strategy

This feature is a powerful addition to Datazoom's data governance and hygiene capabilities.

  1. Protect Data Integrity: This is the most significant benefit. By ensuring only your true first-party data from your designated web properties is collected, you eliminate data pollution. This means your analytics, from user-journey tracking to conversion funnels, are built on a foundation of pure, reliable data.

  2. Enhance Security: You prevent unauthorized third-party sites or bad actors from sending junk data to your collector endpoints, which is a key part of securing your data pipeline.

  3. Optimize Performance and Cost: By rejecting unwanted traffic at the source, you reduce the load on your entire data pipeline. This helps save money on egress and processing by ensuring you only store and analyze data that matters.

Simple Control for Admins, Clear View for Users

We’ve integrated this feature directly into the collector configuration details page with clear user roles:

  • Admins can easily specify, edit, add, and remove domains from the list. The interface shows "save" and "cancel" buttons as soon as a change is made, making management simple.

  • Standard Users can view the list of whitelisted domains for full transparency but cannot modify it, ensuring your governance policies are maintained.

This new feature is another step forward in making the Datazoom platform the most robust and trusted solution for collecting, standardizing, and activating your data.

To learn more, read the Whitelisted Domains article on our help site.
