There is no doubt that data is the lifeblood of streaming video. Although having the right content is critical to attracting and keeping subscribers, data provides the insight into what that content library should be. In the streaming video workflow, dozens of components throw off data that provides insight into everything from viewer behavior to revenue opportunities to performance. Imagine the components of the streaming workflow as islands connected by bridges (APIs) suspended over a roaring river of data. And it’s important to remember that the data relevant to streaming video isn’t just from within the workflow. That river is fed by countless tributaries, such as Google Ad Manager and content delivery networks. The fact that this is a river of data, and not just a trickle, underscores how challenging it is for streaming operators to make sense of the information within it, to take action on what might amount to billions of data points. It’s like trying to identify two fish from the same clutch just by looking at the water. And yet that’s exactly what many streaming operators try to do in real time: make sense of all the connections within a massive river of data.
The Impact Of Issues With Data
Of course, handling a large volume of data is only one challenge. There are others (detailed below), but ultimately, all of these challenges result in one thing: slowing down the ability to take action. They are blockers to using the data in real time to make critical business decisions. If too much data arrives at once (rather than a sample of it, for example), it can take too long to process and display in a visualization tool. Even a visualization tool connected to unlimited computational resources still takes time to process the data. And since unlimited resources are rarely available, processing massive amounts of data can add significant time. This, and other delays in handling data from the streaming workflow, keeps operators from putting that data to use, which undermines the value of the data in the first place. Consider this example: learning where an outage happened five minutes after it occurred doesn’t mitigate customer discontent. But the outage didn’t just suddenly happen. There was probably data that hinted at the impending problem, yet it was lost in the river. Only when the outage happened or was noticed (minutes after processing completed), and the aftermath was evident (such as a sudden spike in customer emails), did it become impossible to miss.
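The "data that hinted at the impending problem" is often a metric drifting outside its recent baseline before anyone notices. As a minimal sketch (not any particular monitoring product), the following flags a sample that deviates sharply from a rolling baseline; the metric name, window size, and threshold are all illustrative assumptions:

```python
from collections import deque

def make_spike_detector(window=30, threshold=3.0):
    """Flag a metric sample that deviates sharply from its recent baseline.

    `window` and `threshold` are illustrative tuning knobs, not values
    drawn from any real streaming platform.
    """
    history = deque(maxlen=window)

    def check(value):
        if len(history) >= window:
            mean = sum(history) / len(history)
            variance = sum((v - mean) ** 2 for v in history) / len(history)
            std = variance ** 0.5
            # Alert when the new sample sits far outside the recent baseline.
            alert = std > 0 and abs(value - mean) > threshold * std
        else:
            alert = False  # not enough history yet to judge
        history.append(value)
        return alert

    return check

# A hypothetical rebuffering ratio holds steady near 1%, then jumps
# ahead of an outage -- the kind of early signal easily lost in the river.
detect = make_spike_detector(window=30, threshold=3.0)
samples = [1.0 + 0.01 * (i % 3) for i in range(30)] + [1.01, 5.0]
alerts = [detect(s) for s in samples]
```

Only the final jump trips the alert; the point is that a signal like this is cheap to compute if the data arrives in time, and worthless if it arrives after the support emails do.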
Understanding the Challenges of Streaming Video Data
As was already pointed out, the volume of data is only one of the challenges streaming operators face in putting data to use. Several others can have just as much of a negative impact as having too much data:
- Delivery time. How fast does the data need to get where it’s going? Many streaming operators embed software in their player to capture information about the viewer experience. But what if that data arrives two, three, or even ten minutes after an issue is detected? Of course, the issue is probably not in the player itself; it’s most likely upstream. But visibility into the viewer experience is an indicator of other problems in the workflow, so the data needs to be delivered as quickly as it’s needed. Delivery timing isn’t one-size-fits-all: different data needs to be delivered at different speeds.
- Post-processing. Countless hours are spent processing data once it has been received. That post-processing may be automated, such as through programming attached to a data lake or a visualization dashboard, or it may be manual. However it’s carried out, it takes time. But it must happen to turn the data into usable information. For example, it doesn’t help the ad team to hand them raw numbers on time spent watching a particular ad. What helps is telling them whether a particular ad, across all views, has hit a certain threshold of viewing percentage (which is probably a contractual number). In other words, post-processing makes data usable. But when it takes too much time, the value of the data diminishes.
- Standardization. Streaming video can be a unique monster when it comes to data sets. Many providers collect similar (if not identical) data but represent it differently. When this happens, the data must be sanitized and scrubbed (post-processed) to ensure it can be compared with similar values from other providers and used in larger roll-ups, such as KPIs. Content delivery network logs are a great example. Without a standardized approach to variable representation, streaming operators are forced to come up with their own lingua franca, which then has to be maintained and enforced with each new provider.
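The ad-threshold example above can be sketched in a few lines. The event shape, ad IDs, and the 70% threshold below are all hypothetical stand-ins for whatever the contract actually specifies:

```python
from collections import defaultdict

# Hypothetical raw events: (ad_id, seconds_watched, ad_length_seconds).
raw_events = [
    ("ad-42", 21.0, 30.0),
    ("ad-42", 30.0, 30.0),
    ("ad-42", 9.0, 30.0),
    ("ad-77", 15.0, 15.0),
    ("ad-77", 14.0, 15.0),
]

def completion_report(events, threshold=0.70):
    """Roll raw watch-time up into an average completion rate per ad and
    flag whether it clears the (illustrative) contractual threshold."""
    rates_by_ad = defaultdict(list)
    for ad_id, watched, length in events:
        rates_by_ad[ad_id].append(min(watched / length, 1.0))
    return {
        ad_id: {
            "avg_completion": sum(rates) / len(rates),
            "meets_threshold": sum(rates) / len(rates) >= threshold,
        }
        for ad_id, rates in rates_by_ad.items()
    }

report = completion_report(raw_events)
```

The raw numbers are useless to the ad team; the `meets_threshold` flag is the usable information, and producing it is exactly the post-processing step that takes time.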
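In practice, that lingua franca is usually a field-mapping table that translates each provider's log schema into one canonical schema. A minimal sketch, with entirely made-up provider names and field names (real CDN log schemas vary):

```python
# Hypothetical field mappings for two CDN providers onto one canonical schema.
FIELD_MAPS = {
    "cdn_a": {"cs_bytes": "bytes_sent", "sc_status": "status_code", "cs_uri": "path"},
    "cdn_b": {"bytes": "bytes_sent", "status": "status_code", "url": "path"},
}

def normalize(provider, record):
    """Translate a provider-specific log record into the operator's
    canonical ("lingua franca") schema, dropping unmapped fields."""
    mapping = FIELD_MAPS[provider]
    return {canonical: record[raw] for raw, canonical in mapping.items() if raw in record}

a = normalize("cdn_a", {"cs_bytes": 1024, "sc_status": 200, "cs_uri": "/v/1.ts"})
b = normalize("cdn_b", {"bytes": 2048, "status": 404, "url": "/v/2.ts"})
# Both records now share one schema and can be compared or rolled into KPIs.
```

The mapping table is the part that "has to be maintained and enforced": onboarding a new provider means adding one more entry rather than rewriting every downstream report.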
Yes, data is critical to the success of streaming platforms. But actually using that data in a meaningful way is fraught with challenges: volume, delivery, processing, and standardization. So just as important as identifying and gathering the right sources of data is having a strategy to deal with these challenges. With the right data and the right strategy, streaming operators can ensure that viewers always have the best experience, because the operator has access to the right amount of data, optimized, transformed, and delivered right where, and when, it’s needed.
In the next blog post of this series, we’ll take a look at data volume in more detail and the ways it might be mitigated. Getting the right data to the right people is critical for streaming platform success. But that’s sometimes easier said than done.