Over the last several posts, we’ve talked about different areas of the Datatecture and why they are important. But putting together a Datatecture for your own streaming video stack is more than just selecting technologies. The companies and technologies listed in the many Datatecture categories represent the lifeblood of a streaming service: data. From ingest to playback and encoding to security, every selected component is a means to derive greater insight into what’s happening within the workflow.
Having that insight can mean the difference between growth and churn. But even as you may want to collect and use as much data as possible, there are significant challenges in doing so.
The Challenges of Collecting All That Data
One of those challenges is data fragmentation. Each component within the Datatecture is an island unto itself. There is no impetus for an encoding vendor to share data with a CDN and yet, as the Datatecture illustrates, all of that data is connected. Deriving insights from data within the streaming video technology stack requires linking those datasets, which means supporting and ensuring compliance with common data elements such as a shared session or user ID. Getting every vendor to accommodate an individual customer’s requirements can be like herding cats. And even when it is possible, it requires a tremendous amount of data post-processing: valuable time that could instead be spent optimizing the workflow to improve viewer QoE.
Another challenge is a lack of interoperability. Vendors within the stack have no reason to work with one another, which creates a rather thorny problem: variable naming. With no standardization of names, vendors can end up collecting the same data under different nomenclature (or worse, different data under the same nomenclature), making it very hard to normalize and reconcile during post-processing. Ferreting out the relationships between data sources can become a jigsaw puzzle.
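To make the naming problem concrete, here is a minimal sketch of per-vendor field normalization. The vendor names, payloads, and field mappings are hypothetical, invented purely for illustration; a real pipeline would also need to reconcile units and types per field.

```python
# Hypothetical payloads: two vendors report the same fact (rebuffering)
# under different names and units.
CDN_EVENT = {"session": "abc123", "stall_ms": 1200}
PLAYER_EVENT = {"sessionId": "abc123", "rebuffer_time": 1.2}  # seconds

# One mapping per vendor onto a single canonical schema.
FIELD_MAPS = {
    "cdn": {"session": "session_id", "stall_ms": "rebuffer_ms"},
    "player": {"sessionId": "session_id", "rebuffer_time": "rebuffer_s"},
}

def normalize(vendor: str, event: dict) -> dict:
    """Rename vendor-specific fields to canonical names and unify units."""
    mapping = FIELD_MAPS[vendor]
    out = {mapping.get(key, key): value for key, value in event.items()}
    # Unit reconciliation: express all rebuffer durations in milliseconds.
    if "rebuffer_s" in out:
        out["rebuffer_ms"] = int(out.pop("rebuffer_s") * 1000)
    return out
```

After normalization, both vendors' events describe the same session in the same vocabulary, so they can be joined without guesswork downstream.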
A final challenge is data accessibility. Even as many of the technologies within the stack have become virtualized and available as cloud-based services or software, there is still no standard way to access their data. Some vendors provide proprietary interfaces, others provide ways to export data, and still others provide APIs for programmatic access. And even when the majority offer APIs through which to gather the data, there is no guarantee those APIs are consistent: some may be REST, some SOAP, and some based on OpenAPI. The lack of a standard way to access data again puts tremendous strain on the streaming operator’s engineering resources, as developers must be tasked with building and maintaining connectivity to each data source even while contending with normalization.
From Metrics to Raw Data to Observability
There are many issues within the Datatecture that can create challenges to deriving insight, but ultimately it comes down to post-processing, which requires serious data engineering effort. The more challenges there are, the more time must be spent dealing with the data after it has been collected. And the more time spent on connectivity issues, normalization, and fragmentation, the longer business decisions take to make, such as which part of the workflow to optimize or where the root cause of an issue resides.
Many technology vendors provide a visualization tool for looking at the data. Unfortunately, these tools often surface a metric rather than expose raw data. A metric is a calculation carried out against one or more data points using an equation or algorithm the vendor has created but not shared. Although metrics can be useful, they are only useful within the context of that tool, which doesn’t help in identifying patterns and relationships across data sources or insights seen in other tools. When looking to derive real insight across the streaming workflow, the viewer experience, and viewer behavior, metrics alone are insufficient.
The natural evolution, then, has been to offer direct access to the data itself. In most cases this is done programmatically (although, as mentioned before, not necessarily in a standardized way). But pulling the data from dozens of technologies within the streaming video stack still runs up against the challenge of normalization. Yes, you can create a giant datalake with all of the data from all of those sources, and yet post-processing can still be a nightmare.
What streaming operators want, and what many industries have already begun to build toward, is observability: the use of filtering or normalizing middleware to handle data relationships before the data hits the lake. That means when an operations engineer pulls up the primary visualization tool, like a custom Looker or Datadog dashboard, they don’t have to figure out how the data is related. They can see bigger patterns across many of the data sources within the Datatecture and even derive remarkable, business-impacting insights, such as the effectiveness of certain ad types on certain devices within specific content. Combined with rich metadata about ads, viewers, and content, this can be truly insightful, ultimately optimizing ad targeting and, with it, CPM.
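The middleware idea above can be sketched in a few lines. This is a toy illustration, not any particular product's design: the source names, event fields, and in-memory "lake" are all hypothetical stand-ins for real streaming infrastructure.

```python
from collections import defaultdict

LAKE = []                      # stand-in for the datalake a dashboard reads from
sessions = defaultdict(dict)   # per-session state held by the middleware

def middleware(source: str, event: dict) -> None:
    """Relate events from different sources by session ID *before* they
    land in the lake, so the dashboard never has to join them itself."""
    sid = event["session_id"]
    # Prefix fields with their source and merge into one session record.
    sessions[sid].update(
        {f"{source}.{key}": value for key, value in event.items() if key != "session_id"}
    )
    LAKE.append({"session_id": sid, **sessions[sid]})

# Two sources report on the same session; the lake receives pre-related records.
middleware("player", {"session_id": "abc123", "rebuffer_ms": 1200})
middleware("cdn", {"session_id": "abc123", "cache_status": "MISS"})
```

The point of the sketch is the ordering: the relationship between sources is resolved on the way into the lake, so every record the visualization tool reads already carries the joined context.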
How Data Enrichment Can Improve Observability
With all of your data gathered from throughout your Datatecture, you can begin your journey toward enabling observability within your business. One way to do that more efficiently is through data enrichment. Remember that you’ll need some middleware or other process between gathering and storing the data to normalize and relate data sources. Instead of computing those relationships after the fact, you can simply add data into the stream from other sources. For example, when ad events are captured from the player, you could enrich that stream with data from your ad manager, such as Google Ad Manager. Because the enrichment is keyed on an ad ID already shared between the ad server (for delivery), the player (for display), and the ad campaign (within GAM), no additional post-processing is required: the stream of data is immediately useful when it hits the datalake.
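The enrichment step described above amounts to a lookup on the shared ad ID as each event flows through. The sketch below assumes a hypothetical local cache of campaign metadata; the IDs, field names, and values are invented for illustration and are not taken from any real ad manager API.

```python
# Hypothetical campaign metadata, keyed by the ad ID that the ad server,
# the player, and the ad campaign already share.
AD_CAMPAIGNS = {
    "ad-789": {"campaign": "spring_sale", "advertiser": "Acme", "cpm": 12.5},
}

def enrich_ad_event(player_event: dict) -> dict:
    """Attach campaign metadata to a player ad event via the shared ad ID,
    so no join is needed once the record lands in the datalake."""
    metadata = AD_CAMPAIGNS.get(player_event["ad_id"], {})
    return {**player_event, **metadata}

event = {"session_id": "abc123", "ad_id": "ad-789", "event": "ad_impression"}
enriched = enrich_ad_event(event)
```

Because the key already exists on both sides, the enrichment is a simple merge rather than a reconciliation problem, which is exactly why a shared identifier matters so much.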
A Video Data Platform Can Help You Actualize Your Datatecture
The Datatecture is a fundamental layer within the streaming video technology stack. In fact, it sits right above your infrastructure: a river of data flowing between components and technologies that can provide valuable insight to affect both the business and the streaming service. But gaining the observability you need can require a lot of upfront development, which raises the question: is building observability a differentiator for your business? The answer is probably no. The success of a streaming service isn’t gauged by how well it gains insight from its Datatecture. Rather, it is gained through unique content, reliability, availability, and the user experience.
Datazoom is a powerful Video Data Platform that enables you to connect all the components within your Datatecture into a single stream of data, enrich it with third-party sources, and deliver it anywhere it needs to be, such as a datalake or existing visualization tools.