What is a Data Collection Platform?
A Data Collection Platform is an essential tool for any business that interacts with its customers through an application and has multiple data needs to meet.
For users, every digital interaction—whether streaming a movie, placing a bet, or adding an item to a shopping cart—should be a seamless experience. Behind the scenes, multiple teams and technologies work to cultivate this experience. From product to marketing to engineering, these teams strive to maintain and enhance the Quality of Experience (QoE) to keep users engaged and drive business goals.
How do they do it? By analyzing relevant data to gain insights and basing decisions on those findings. However, this process is often fragmented. Data is collected by numerous specialized tools, creating data silos, SDK bloat, and an incomplete view of the user journey. That’s why businesses are turning to unified data collection platforms. But is one the right choice for you and your business?
A Data Collection Platform acts as a universal data layer for your entire digital portfolio. It’s designed to capture event data from any application—web, mobile, CTV, or gaming consoles—and provide a standardized, real-time stream of information about what's happening from the "first mile" of user interaction to the "last mile" of conversion or engagement.
These platforms are dedicated to collecting, standardizing, and routing data related to every step of the user experience. This data provides vital, real-time insights into the performance of your application and the behavior of your users. These insights guide product, technology, operations, and marketing decisions, allowing teams to make manual or automated changes to improve the end-user experience and achieve business outcomes.
For example, if your e-commerce app sees a high rate of cart abandonment at the payment step, your teams need to know why—fast. A Data Collection Platform gathers the necessary data for a root cause analysis, reducing the time spent figuring out what to fix. This same data can be streamed to your marketing platform to trigger a real-time retargeting campaign, or to your support tools to enable proactive customer outreach, minimizing revenue loss and improving customer satisfaction.
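To make this concrete, here is a minimal sketch of what a standardized abandonment event and its fan-out to multiple destinations might look like. The field names, event types, and destinations are hypothetical illustrations, not Datazoom's actual schema.

```python
# A minimal sketch of a standardized checkout-abandonment event.
# All field names and destinations here are hypothetical.
import json
import time

event = {
    "event_type": "checkout_abandoned",
    "session_id": "a1b2c3d4",          # ties this event to the full user session
    "step": "payment",                  # where in the funnel the user dropped off
    "error_code": "gateway_timeout",    # context for root cause analysis
    "timestamp": int(time.time() * 1000),
}

# The same standardized event can be fanned out to multiple tools in real time.
destinations = ["analytics_warehouse", "marketing_platform", "support_tool"]
for dest in destinations:
    print(f"routing to {dest}: {json.dumps(event)}")
```

Because every destination receives the same clean event, the analytics, marketing, and support teams are all reacting to one shared fact rather than three conflicting ones.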
Why Your Business Needs a Data Collection Platform
If you are committed to providing a high-quality user experience and maximizing the value of your digital assets, you need a comprehensive data solution. A Data Collection Platform is the most efficient way to gain access to clean, correlated, and actionable data in real-time. Here’s a breakdown of the biggest advantages these platforms offer:
Create a Single Source of Truth
Having to switch between dozens of dashboards and reports to identify patterns is inefficient and leads to conflicting information. A streamlined approach begins with consolidating all the data you need into a single, standardized format on one platform. This breaks down data silos and creates one comprehensive solution that serves every team’s needs.
Enable a Shared Understanding
It's critical that everyone in the organization speaks the same “data language.” This starts with the underlying data used for calculations and metrics. Data Collection Platforms provide a clean, standardized, and trusted data set for the entire organization. Even though your teams might use different tools for analysis or activation, everyone can leverage the same core data to drive critical decisions. This drastically increases efficiency, especially for data scientists who often spend most of their time cleaning and preparing data rather than analyzing it.
Unlock Deep, Custom Analysis
Let’s say one of your business goals is to increase user retention. Many might look at a simple metric like Daily Active Users. But a better metric might be Engaged Users, which you define as users who log in at least 3x per week and interact with a key feature. Why? Because you’ve examined your churn rates and see that your most valuable customers meet those minimum qualifications. A Data Collection Platform allows you to easily collect the specific data needed to measure these custom KPIs and join it with attribution data from marketing campaigns to make smarter investment decisions.
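As an illustration, here is a minimal sketch of how such a custom KPI could be computed from raw event data. The event shape, type names, and thresholds are hypothetical, chosen to match the definition above.

```python
# A minimal sketch of computing a custom "Engaged Users" KPI from one week
# of raw events. Event fields and thresholds are hypothetical illustrations.
from collections import defaultdict

events = [
    {"user": "u1", "type": "login"},
    {"user": "u1", "type": "login"},
    {"user": "u1", "type": "login"},
    {"user": "u1", "type": "key_feature_used"},
    {"user": "u2", "type": "login"},
]

def engaged_users(week_of_events, min_logins=3):
    """Users with >= min_logins logins AND at least one key-feature interaction."""
    logins = defaultdict(int)
    used_feature = set()
    for e in week_of_events:
        if e["type"] == "login":
            logins[e["user"]] += 1
        elif e["type"] == "key_feature_used":
            used_feature.add(e["user"])
    return {u for u, n in logins.items() if n >= min_logins and u in used_feature}

print(engaged_users(events))  # {'u1'}: u2 logged in once and never used the feature
```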
Deliver Optimized, Real-Time Experiences
Ensuring a seamless user experience is a constant challenge. What happens when a user's connectivity slows? How do you react when they abandon a sign-up flow? Are you able to personalize their experience based on their real-time behavior?
Having access to real-time data enables a future where automated, personalized actions can be deployed instantly. You can trigger a push notification after a user watches a certain type of content, enable a dynamic paywall when they reach a usage limit, or create a support ticket when they encounter friction in a checkout flow. A clean, standardized, real-time data set is the foundation for these powerful, in-the-moment activations.
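A minimal sketch of what such rule-based activations might look like over a standardized event stream follows; the event types and actions are hypothetical stand-ins for real marketing, paywall, and support integrations.

```python
# A minimal sketch of rule-based, real-time activations on a standardized
# event stream. Event types and actions are hypothetical.
def handle(event):
    if event["type"] == "content_watched" and event.get("genre") == "documentary":
        print(f"push notification -> {event['user']}: recommend more documentaries")
    elif event["type"] == "usage_limit_reached":
        print(f"show dynamic paywall -> {event['user']}")
    elif event["type"] == "checkout_error":
        print(f"open support ticket for {event['user']}")

# Each incoming event is matched against the rules the moment it arrives.
for e in [
    {"type": "content_watched", "user": "u1", "genre": "documentary"},
    {"type": "usage_limit_reached", "user": "u2"},
    {"type": "checkout_error", "user": "u3"},
]:
    handle(e)
```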
Leverage Your Data Like Never Before: Datazoom
If you operate a digital business, you stand to benefit from a Data Collection Platform. The sooner you make the investment, the sooner you can unlock the unique benefits and possibilities your data can provide. To maximize the value of your investment, you need a solution that fits and adapts to your business—and that’s why you need Datazoom.
Datazoom is a Data-as-a-Service Platform designed to help you unify and activate your data. Our ecosystem of data collection, standardization, and routing solutions is built to support any digital strategy. By implementing a single, lightweight Datazoom SDK, you create a real-time, standardized data layer from which every business unit can draw the same trusted, clean data to drive results. Want to learn more?
Start your 5GB free trial today!
Have questions? Get in touch, and we’ll get you the information you need — it’s what we do.
Freeing the Video Industry’s Data from Its Black Box
What it means to unbox the black box, release video data from its silos, and improve the online video experience.
The impact of these unforeseen times has dramatically sharpened the lens on video analytics. Video streaming has not only become the main way people consume entertainment content; it is also rapidly starting to play a major role in other industries.
During this pandemic, we’re seeing video streaming adopted for work conferences, telehealth appointments, education settings, and more. Video in general has had a huge boost, not just because of our yearning for the connectivity that keeps us sane and entertained, but because organizations are realizing they can still survive and thrive by taking their business execution to the screen. With a somewhat “no hands on deck” mentality, organizations are using virtual technologies, including video streaming, to replace in-person touchpoints and conduct business as usual.
Where the Industry’s at Now
Even before the recent months, the industry has seen big changes as a few major conglomerates shuffled platforms. Disney swallowed up ESPN and the majority of Hulu, while Viacom scooped up PlutoTV and merged with CBS. Comcast acquired NBC and Sky, and AT&T followed suit with DirecTV, Turner, and Otter Media. All the big players are scrambling to stake their claims in the new streaming world order.
With video now making up 80% of internet traffic, companies can’t dispute that their consumers prefer to consume content via video. With that in mind, those who value the end-user experience are constantly looking for answers, and specifically for more data, to help them better understand and control the streaming video experience.
Today’s Black Box
A common pain point we hear is that analysts, marketers, and other decision-makers are frustrated by the walled gardens of information they’re forced to operate in. Their patchwork of single-platform tools scatters data across the video stack, failing to generate the insights that drive actions or business outcomes. The result is a black box of information without context or visibility.
For the product, operations, marketing, advertising, and business teams at a content publisher, insights come from several systems and technologies that perform analysis without anyone fully understanding their inner workings. These black boxes lack transparency because they are composed of very complex systems, contrasting inputs, and complicated algorithms.
When it comes to video streaming, improving the experience of consuming video content requires real-time data. For this to happen, the data must be unified in real time to power observability, adaptability, and optimization. For the video stack to truly become actionable, you must maintain a constant pulse on the health of the data in your system. Identifying and evaluating data quality and discoverability issues leads to healthier pipelines, more productive teams, and happier customers.
Why It’s Taken So Long
Streaming is complicated. It’s unique in that it requires an uninterrupted experience for its entire duration. Customers expect their viewing experience to be seamless, without endless spinning hourglasses. The internet is more forgiving in its other functions: it can deliver a file here and a text there, and users’ expectations for file, text, and photo sharing are barely affected by microsecond changes across the end-to-end environment. Video is not that forgiving. It is highly susceptible, down to the very millisecond, to impacts across the delivery chain.
The unique challenge with video is it’s not just a system under a single entity’s control, it’s a system that has control spread throughout many entities. From the content owner to the vendors who support those owners, to the internet providers who balance traffic and connectivity, it’s a distributed system that needs to come together and work synergistically for the final outcome to meet expectations. It requires having the ability to observe, trace between, and influence the control over the interaction between multiple back-end systems.
The industry hasn’t been able to fully apply consistent measurement across the end-to-end process, which prevents businesses from toggling the variables to change the output. Many systems (Encoders, Origin, CDN, Transit, ISPs) are used to prepare, deliver, and play content, but all are monitored independently. Moreover, the data and metric outputs from those systems are inconsistent and unstandardized, preventing true apples-to-apples analysis. And without a common understanding of end-to-end performance, we can’t pinpoint operational breakages or areas to improve. This leaves us with the black box: as an industry, we never applied precise, consistent measurement, never standardized our metrics, and never pulled together a framework the way we have for user experience.
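To illustrate the standardization problem, here is a minimal sketch, assuming two hypothetical vendors that report the same time-to-first-byte concept under different names and units. Mapping both onto one schema is what makes apples-to-apples analysis possible.

```python
# A minimal sketch of metric standardization across vendors.
# All field names here are hypothetical, not real vendor log schemas.

def normalize_cdn_a(record):
    # Vendor A reports time-to-first-byte in milliseconds as "ttfb_ms".
    return {"session_id": record["sid"], "ttfb_ms": record["ttfb_ms"]}

def normalize_cdn_b(record):
    # Vendor B reports the same concept in seconds as "first_byte_secs".
    return {"session_id": record["session"], "ttfb_ms": record["first_byte_secs"] * 1000}

unified = [
    normalize_cdn_a({"sid": "s1", "ttfb_ms": 120}),
    normalize_cdn_b({"session": "s2", "first_byte_secs": 0.095}),
]
print(unified)  # both records are now comparable on the same metric and unit
```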
Generally speaking, when it comes to data sharing, how nice would it be to share non-private telemetry data about how certain services are performing? Sharing this technical feedback with outside vendors would in turn help them work cohesively and serve you better. Are there ways to collect the right data about system performance and its effects on video quality, and to share it today?
How To Unbox the Black Box
Breaking the seal around what’s happening for the end-user is the first order of business. However, the tools available today for monitoring the end-user experience produce results with great variability in measurement, which makes the current state hard to interpret. Even when these insights are combined with those from other back-end platforms, the inconsistency of the results makes it difficult to align all stakeholders efficiently to take action. Today, manual re-interpretation of metrics is often required, which prevents any scalable, automated, or real-time improvements from being deployed.
So how do we get our data and metrics to be reliable and insightful for all? Investments need to be made to ensure consistent data collection and measurement at every stage. Establishing a single methodology for what is monitored and how it is measured will create a common understanding of system performance. Then, when we tie insights together, we can deduce which variables impact our end-to-end workflows, and thus actually control the outcome.
Agreeing not only to share insights but also to use shared data collection and measurement methodologies will allow all stakeholders, including external vendors, to align and take action to best support the end-user experience.
A quality video data platform will help all stakeholders involved in the end-to-end video pipeline do their jobs more efficiently, and thus help level the playing field for customers of any size to take advantage of the internet to deliver content. You don’t have to be Comcast or Disney to create a great experience for your users if you can efficiently and effectively align all parties involved to make it happen. You can start by creating data pipes to customize which datasets are shared internally, and which can be provided as feedback for vendors.
Essentially, optimizing an end-to-end workflow requires that everyone be able to optimize their own system, and thus do their part. If we do that, we can raise the bar for all video delivery and deliver flawless video experiences.
A Glimpse into the Future
A business operates best when everyone’s on the same page. Your video systems should be run the same way. If you can tap into the power of raw data to align your technologies around a single source of truth, you’ll create an ecosystem in which every team and technology pulls in the same direction.
The future surely holds more creation and adoption of data standards, and more technical data sharing between entities at different stages of the end-to-end workflow. Together, we can eliminate the black box. If all parties can be more transparent, practices can be improved and opportunity costs reduced. The more data is shared in a controlled, standardized manner, the more likely end-users are to get a premium experience.
Joining CDN Logs and Playback Data with the Datazoom Session_ID
When a playback error occurs, a Law & Order-esque drama unfolds for video product managers seeking to understand the root cause of the issue. First, they review the analytics that indicated the error, often metrics like high buffer ratios, user drop-offs, and Exits Before Video Start (EBVS). But when it comes time to dig down through the delivery chain to identify the failing links, siloed data fails them. Then the real mystery begins: what caused the problem?
Today, we have no shortage of alerts, indicators, metrics, and reports which define playback errors. However, aside from institutional knowledge (really a glorified ‘best guess’), there are few resources available to identify the culprit, or culprits, causing the problem. The resulting confusion affects user QoE and ultimately, revenues.
Fortunately, there’s a way to avoid these mysteries and perform efficient, effective root-cause analysis. The methodology centers on an identifier that travels through the delivery chain: the Datazoom Session_ID.
What Is the Datazoom Session_ID?
The Datazoom Session_ID is like an anchor: a unique 1-to-1 identifier that allows you to correlate events generated during playback with other events generated “upstream.” These could include ISP drop-offs, CDN abnormalities, a problem with the encoder, et cetera.
As a common variable spanning the entire delivery chain from CDN to endpoint, the Session_ID is a key nexus with which logs and events from each link can be correlated. This means information like CDN logs can be queried and correlated with client-side player events in an analytics system. Today, we’ll focus on this CDN use case and provide a starting point for testing it.
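For a concrete picture, here is a minimal sketch of that correlation: client-side player events joined to CDN log lines on the shared session identifier. The field names and values are hypothetical, not actual Datazoom or CDN log schemas.

```python
# A minimal sketch of joining client-side player events with CDN log lines
# on a shared session identifier. All fields and values are hypothetical.
player_events = [
    {"session_id": "s1", "event": "buffer_start", "ts": 1001},
    {"session_id": "s2", "event": "playback_start", "ts": 1002},
]
cdn_logs = [
    {"session_id": "s1", "cache_status": "MISS", "ttfb_ms": 850},
    {"session_id": "s2", "cache_status": "HIT", "ttfb_ms": 40},
]

# Index CDN log lines by session, then enrich each player event with them.
cdn_by_session = {row["session_id"]: row for row in cdn_logs}
for ev in player_events:
    joined = {**ev, **cdn_by_session.get(ev["session_id"], {})}
    print(joined)
# The s1 buffering event lines up with a cache MISS and a slow first byte,
# pointing root-cause analysis at the CDN tier rather than the player.
```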
Implementation of the Datazoom Session_ID is possible for both Self-Service and Enterprise customers of Datazoom. For step-by-step guides, click the links below:
1. Setting Up Custom Header Requests: This article lays out the steps necessary for configuring the Datazoom Session_ID on a webpage hosting a supported Datazoom Collector.
2. Configuring CDN Logs to Accept the Datazoom Session_ID: This article lays out the steps for configuring a CDN to accept the Datazoom Session_ID, facilitating the joining of client-side player events with CDN logs. Fastly enables customers to set this up themselves, while other CDNs like Akamai, Edgecast, CloudFront, and Limelight can support this functionality via a request made to your account representative. A conceptual sketch of the mechanism follows this list.
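The sketch below illustrates the idea behind these two steps: the player attaches the session identifier to each media request as a custom header, which the CDN can then echo into its access logs. The header name and URL are hypothetical; the linked guides cover the exact configuration.

```python
# A minimal sketch of attaching a session ID header to media requests so the
# CDN can log it. Header name and URL are hypothetical placeholders.
import uuid
import requests  # third-party: pip install requests

session_id = str(uuid.uuid4())  # one ID for the entire playback session

resp = requests.get(
    "https://cdn.example.com/video/segment_001.ts",
    headers={"X-Session-ID": session_id},  # hypothetical header name
)
# Once the CDN logs this header, every log line for the session carries the
# same ID that the player's client-side events report, enabling the join.
```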
Visualizing CDN Data with Playback Data
Once the Datazoom Session_ID is implemented across players and the CDNs, you can begin constructing metrics and visualizations for this data. Our team has prepared a sample dashboard (as an XML file) for Splunk users which can be easily imported into their account.
Alongside conventional QoE metrics built using Datazoom’s Data Dictionary (KPIs like Minutes Viewed, Requests, Starts, Average Time to First Frame, Exits Before Video Start, Average Bitrate, and Buffer Ratio), this dashboard includes CDN-focused metrics for Cache Status, Fastly State (in this example), and Edge vs. Shield, as well as Cache and Cluster Hit Ratios. It’s a great starting point for conducting root cause analysis and understanding how different links in the video delivery chain affect the performance of your service.
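For intuition, here is a minimal sketch of how two of those metrics, Cache Hit Ratio and Buffer Ratio, could be computed from joined session records like those above. The field names are hypothetical.

```python
# A minimal sketch of two dashboard metrics computed from joined session
# records. Field names are hypothetical, not the actual dashboard schema.
sessions = [
    {"cache_status": "HIT", "buffer_ms": 0, "watch_ms": 60000},
    {"cache_status": "MISS", "buffer_ms": 3000, "watch_ms": 60000},
    {"cache_status": "HIT", "buffer_ms": 500, "watch_ms": 120000},
]

# Cache Hit Ratio: fraction of sessions served from cache.
hits = sum(1 for s in sessions if s["cache_status"] == "HIT")
cache_hit_ratio = hits / len(sessions)

# Buffer Ratio: time spent buffering relative to time spent watching.
buffer_ratio = sum(s["buffer_ms"] for s in sessions) / sum(s["watch_ms"] for s in sessions)

print(f"cache hit ratio: {cache_hit_ratio:.0%}")  # 67%
print(f"buffer ratio:    {buffer_ratio:.2%}")     # 1.46%
```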
Getting Started
Interested in implementing the Datazoom Session_ID across your video delivery stack? Click here to sign up for your 15-day, 5GB free trial of Datazoom. Reach out to us if you want more information on how to get started.