What is True Video Comprehension (TVC)?

Defining True Video Comprehension (TVC): a groundbreaking new concept within Computer Vision (CV) that applies to the field of content comprehension.

(The first of a series called the Netra Knowledge Series.)

This is the first piece in a series we are calling the Netra Knowledge Series. In this stream, we seek to define and engage the most ground-breaking, emerging concepts within content comprehension technologies that apply to the field of Computer Vision (CV). We aim to highlight the opportunities and extraordinary impacts that trail-blazing CV is inspiring within businesses and across applications, verticals, and industries. Today we focus on Marketing, Media and Entertainment…

The volume of video content continues to explode, yet the tools marketers need to find safe, correctly targeted, high-quality content for the right audiences have remained lackluster until recently. Why is it so challenging for marketers to identify and accurately target high-quality video content?

Three challenges are still pervasive for marketers:

  1. Limited comprehension of video assets.
     • Today’s widely accepted solutions only scrape (literally) the surface in terms of analysis and understanding of video assets, leading to unavoidable concerns over brand safety, proper monetization, and effective targeting.
  2. Relying on the context of the video environment.
     • Currently available solutions only address the text-based or audience-based metadata that surrounds a video asset, without understanding the particulars of the video asset itself. They completely miss the breadth of insights (emotion, scene detection, content…) within the video asset.
     • Further, the text that “surrounds” a video asset, and the information it provides, is what is now known as “video contextual.” But scraping the information surrounding a video asset is a low-fidelity approach: it is unreliable and is not guaranteed to be connected to, or closely related to, the video asset itself.
  3. Trusting that metadata is reliable.
     • Alternative solutions focus on targeting based on the metadata assigned to video assets. Metadata, however, is notoriously messy and is typically assigned by a human. Unless a solution scans or extracts data from the actual video, it is a poor substitute for true video comprehension.

To solve the growing challenges that video content creates, Netra is pioneering True Video Comprehension (TVC). Our TVC solution is a breakthrough in its ability to truly comprehend high volumes of video assets, and at 1/10th the cost of other solutions.

What is True Video Comprehension? TVC is the act of decoding, interpreting, and classifying a rapid sequence of images into meaningful constructs. TVC uses AI to scan images at the pixel level and, through scene-by-scene analysis, extract insights on the subject, activity, object, narrative, place, context, and emotion embedded within a video. TVC analysis is critical because it derives insights directly from the video asset, as opposed to the situational text or metadata that may surround it.
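To make "scene-by-scene analysis" concrete, here is a minimal sketch of the first step: splitting a sequence of frames into scenes. It assumes a simple frame-differencing heuristic (start a new scene when consecutive frames differ sharply), which is a common textbook technique and not a description of Netra's own method. The names `split_into_scenes` and `mean_abs_diff`, the threshold value, and the toy frames are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a flat sequence of grayscale intensities in [0, 1].
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_into_scenes(frames: List[Frame], threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index ranges, end exclusive, one per scene.

    A new scene starts wherever two consecutive frames differ by more
    than `threshold` (an illustrative value, not a tuned one).
    """
    if not frames:
        return []
    scenes = []
    start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            scenes.append((start, i))
            start = i
    scenes.append((start, len(frames)))
    return scenes


# Toy video: three dark frames followed by two bright frames (2x2 pixels each).
dark = [0.1, 0.1, 0.1, 0.1]
bright = [0.9, 0.9, 0.9, 0.9]
toy_video = [dark, dark, dark, bright, bright]

scenes = split_into_scenes(toy_video)
print(scenes)  # [(0, 3), (3, 5)]
```

Once scenes are separated like this, each one can be classified on its own, which is what lets insights come from the imagery rather than from surrounding text.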

We enable our partners to tap into our TVC solution through our API, which uses Computer Vision and AI for scene-by-scene information extraction and returns a deep array of classification attributes, including GARM brand safety, IAB context, affinity, emotions, places, activities, and object detection. Further, our patented technology uses AI to selectively scan a video’s high-priority scenes to enable scalable and efficient processing, leading to the highest-quality video analysis at 1/10th the cost of current industry solutions. Through TVC, the imagery within a video is the primary source of the data, and insights are extracted from the video asset itself. Such analysis provides the true meaning of a video asset and allows media owners and their vendor partners alike to classify or ‘tag’ it and maximize the value of all content.
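As a rough illustration of how scene-level attributes could be rolled up into video-level tags, the sketch below averages each label's confidence across scenes, weighted by scene duration. The data shape and field names (`start_s`, `end_s`, `labels`), the function `video_level_tags`, and the cutoff value are hypothetical and chosen only for this example; they do not reflect the actual response schema of Netra's API.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical per-scene results. Field names are illustrative only and
# are NOT the real response schema of any particular API.
scene_results = [
    {"start_s": 0.0, "end_s": 10.0, "labels": {"cooking": 0.9, "kitchen": 0.8}},
    {"start_s": 10.0, "end_s": 15.0, "labels": {"cooking": 0.6, "joy": 0.7}},
    {"start_s": 15.0, "end_s": 20.0, "labels": {"kitchen": 0.4}},
]


def video_level_tags(scenes: List[dict], min_score: float = 0.4) -> Dict[str, float]:
    """Duration-weighted average confidence per label.

    A label missing from a scene counts as confidence 0 for that scene.
    Only labels whose weighted average reaches `min_score` are kept.
    """
    total = sum(s["end_s"] - s["start_s"] for s in scenes)
    if total <= 0:
        return {}
    weighted = defaultdict(float)
    for s in scenes:
        duration = s["end_s"] - s["start_s"]
        for label, conf in s["labels"].items():
            weighted[label] += conf * duration
    return {
        label: round(w / total, 3)
        for label, w in weighted.items()
        if w / total >= min_score
    }


tags = video_level_tags(scene_results)
print(tags)  # {'cooking': 0.6, 'kitchen': 0.5}
```

Weighting by duration keeps a label that appears briefly in a single scene ("joy" here) from tagging the whole video, while the per-scene results remain available for finer-grained targeting.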

With our technology, we’re revolutionizing the analysis and comprehension of video, as well as text and static images. Through Netra, media owners, marketers, and their partners can capture the critical elements of a video to achieve unparalleled contextual positioning. What makes Netra’s technology unique is our ability to analyze at the scene level, with subject, object, and narrative data relayed through our API.

What are the benefits of adopting TVC across the ecosystem?

While each stakeholder has unique needs and use cases, we have generalized the benefits for each below.


  • Achieve appropriate monetization of undervalued video assets through enhanced contextual signals
  • Differentiate with enhanced brand safety guarantees through scene-by-scene, AI-enabled detection, even when metadata is not present (think TikTok!)
  • Provide and leverage a consistent content classification taxonomy across all channels (video, image and text) for monetization, analytics, and editorial use cases


  • Optimize ad placement within the correct context with the highest-potential creatives
  • Eliminate headaches around brand safety with AI-enabled x-rays of streams of video content
  • Develop an advanced, consistent strategy for video content that aligns with the brand’s desired internal taxonomy targets


  • Optimize campaign ROAS and offer much richer classifications against video content, achieving higher benchmarks for monetization, delivery, and demand
  • Deliver search that lets buyers identify the content they want to target, by query and by similarity, across the entirety of the publisher portfolio

If you are interested in learning more about how content comprehension can empower your business, reach out here.
