Video Fair-Use Risk Analysis with Vision-Language Models

Exploring whether multimodal AI can assist in legal reasoning and copyright assessment

Background

What are Vision-Language Models?

Vision-Language Models (VLMs) are advanced AI systems that can process and understand both images and text simultaneously. Unlike traditional AI models that handle only one type of data, VLMs integrate computer vision and natural language processing to analyze visual content and reason about it in natural language. Recent models like GPT-4V (GPT-4 with Vision) can examine images or video frames, identify objects, scenes, and actions, and generate detailed textual descriptions or answer questions about what they observe.

These models work by encoding visual information into representations that can be processed alongside text, enabling them to perform tasks like image captioning, visual question answering, and content analysis. When combined with speech recognition models like OpenAI's Whisper, VLMs can comprehensively analyze video content by processing both visual frames and audio transcripts.

Research Question

Can VLMs Assist in Fair-Use Assessment and Legal Reasoning?

Motivating Question: As vision-language models become increasingly sophisticated, can they move beyond simple content recognition to assist with complex legal frameworks like copyright fair use? Can AI help creators, platforms, and legal professionals identify potential fair-use videos and understand the nuances of transformative use?

This tool evaluates three key capabilities of state-of-the-art VLMs in the context of fair-use analysis:

Video Understanding: Can the model analyze video frames and audio to understand content, context, and purpose?
Content Identification: Can the model identify similar or source content online to recognize copyrighted material?
Legal Reasoning: Can the model apply the four statutory fair-use factors to provide meaningful risk assessment?

Methodology

How We Formulated the Problem

1. Video Processing Pipeline

When a user uploads a video, we extract both visual and audio information:

Frame Extraction: Sample video frames at 1 frame per second (fps) using client-side HTML5 Canvas API, with a maximum of 30 frames to balance detail and computational cost
Audio Transcription: Extract the complete audio track and process it through OpenAI's Whisper API to generate a full speech-to-text transcript
Format: Frames are encoded as base64 JPEG images (768×768px) optimized for GPT-4V processing

2. Two-Stage AI Analysis

Stage 1 - Content Identification:

We first send 8 representative frames to GPT-4V (specifically gpt-4o) with a similarity detection prompt. The model examines the frames and attempts to identify:

What copyrighted work(s) the video appears to use or derive from
Specific titles, creators, and content types
The extent and nature of the source material present

Stage 2 - Fair-Use Evaluation:

Using the identified content, transcript, and 10 detailed frames, we prompt GPT-4V to evaluate the video across the four statutory fair-use factors defined in 17 U.S.C. § 107. For each factor, we provide:

Factor definitions explaining legal criteria (e.g., transformativeness, commercial nature, amount used, market harm)
In-context learning examples showing how the factor applies to hypothetical scenarios
Structured output format requesting a score (0-100, where 0 = strong fair use, 100 = high infringement risk) and detailed explanation for each factor

3. Aggregating Results

The model returns a structured JSON response containing:

Overall fair-use risk score (0-100) with risk level classification (Low/Moderate/High)
Confidence score indicating analysis completeness
Individual scores and explanations for each of the four factors
Identified similar content from Stage 1

The Ultimate Goal

Can VLMs meaningfully assist in legal frameworks? Beyond fair use, this research explores whether vision-language models can:

Identify nuanced legal concepts like "transformativeness" from visual and audio evidence
Balance multiple factors to reach holistic legal conclusions
Provide explanations grounded in observable content rather than hallucinated reasoning
Serve as decision-support tools for creators, platforms, and legal professionals navigating copyright

Important Disclaimer: This is a research tool, not legal advice. AI-generated scores are heuristic assessments based on pattern recognition, not legal expertise. Lower risk scores do not guarantee fair use; higher scores do not prove infringement. Only courts can make definitive fair-use determinations.

Upload Video for Analysis

Upload an MP4 video (max 200MB recommended)

Choose Video File

Technical Details

System Prompts and Implementation

Below are the exact prompts used to interact with GPT-4V. These prompts were engineered to elicit structured, evidence-based responses grounded in observable video content.

Stage 1: Content Identification Prompt

Model: gpt-4o (GPT-4 with Vision)
Input: 8 representative frames from the video
Temperature: 0.3 (low variability for consistency)
Response Format: JSON object

You are analyzing video content to identify what it resembles or derives from.
Examine these frames. Identify:
- What copyrighted work(s) this video appears to use
- How much of the original is present
- The nature of the original work

Respond in JSON format:
{
  "summary": "brief description",
  "identified_works": [
    {"title": "work name", "creator": "creator", "confidence": "high/medium/low", "evidence": "what you observed"}
  ],
  "unidentified": true or false
}

Stage 2: Fair-Use Evaluation Prompt

Model: gpt-4o
Input: 10 frames (high detail), full transcript, and identified content from Stage 1
Temperature: 0.3
Response Format: JSON object
Max Tokens: 2000

You are a fair-use assessment tool. Given video frames, transcript, and identified source material,
evaluate fair-use risk across the four statutory fair use factors. You are producing a heuristic risk assessment for educational purposes,
not legal advice. Base your reasoning on the supplied evidence only; if critical context is missing, reflect this in a lower confidence score
and state key assumptions.

Define the four factors (1–2 sentences each):

1) Purpose and Character of the Use: Assess why and how the material is used (e.g., commentary, criticism, news reporting, teaching, parody, research) and whether the use is transformative (adds new meaning, message, or purpose) versus a substitute. Also consider whether the use is commercial or nonprofit/educational.

2) Nature of the Copyrighted Work: Consider what kind of original work is used and how "creative" or "published" it is. Use of highly creative, fictional, or unpublished works generally increases infringement risk relative to factual, informational, or published works.

3) Amount and Substantiality: Evaluate how much of the copyrighted work is used (duration/percentage) and whether the "heart" or most memorable/key parts are taken. Using only what is reasonably necessary for the stated purpose lowers risk; extensive or continuous copying raises risk.

4) Effect on the Market: Assess whether the use could act as a market substitute for the original or harm licensing/derivative markets (including plausible markets the rights holder exploits). If viewers could consume the original via the new video instead of the original (or its licensed clips), risk is higher.

Scoring guidance (0–100, where 0 = strong fair use indications, 100 = high infringement risk):
- 0–33: strong fair-use signals; limited copying; clear transformative purpose; minimal market harm.
- 34–66: mixed signals; some transformative elements but nontrivial copying or unclear market impact.
- 67–100: weak fair-use signals; mostly republishing; extensive copying/"heart" taken; likely substitution or market harm.

Output requirements:
- Respond ONLY with valid JSON matching the schema below (no markdown, no extra keys).
- "overall_risk_score" should be consistent with factor scores (not necessarily a simple average).
- "risk_level" must be exactly one of: "Low Risk (0-33)", "Moderate Risk (34-66)", "High Risk (67-100)".
- "confidence_score" reflects evidentiary completeness (lower if transcript is missing, source unclear, frames insufficient, etc.).

IN-CONTEXT EXAMPLE (for calibration only; do not reuse facts unless supported by the provided input):

Input (abridged):
Transcript: "In this video essay, I critique the film's portrayal of history. Here is a 6-second clip to illustrate the scene I'm discussing..."

Similar Content Found:
{
  "summary": "The frames resemble excerpts from a professionally produced feature film; multiple frames appear to be direct clips with original cinematography and color grading.",
  "identified_works": [
    {
      "title": "Feature film (exact title not fully confirmed from frames alone)",
      "creator": "Unknown/Not confidently determinable from frames alone",
      "confidence": "medium",
      "evidence": "Several frames show consistent character designs, cinematic lighting, and continuity across shots, suggesting direct reuse of scenes rather than recreated footage."
    }
  ],
  "unidentified": false
}

Expected JSON output:
{
  "overall_risk_score": 28,
  "risk_level": "Low Risk (0-33)",
  "confidence_score": 78,
  "factors": {
    "purpose_and_character": {
      "score": 20,
      "explanation": "The use appears to be criticism/commentary with added analysis, and the clips serve an illustrative purpose rather than republishing the work. The presentation is transformative and not simply a substitute for the original."
    },
    "nature_of_work": {
      "score": 45,
      "explanation": "The source is a creative audiovisual work, which generally weighs against fair use, but this factor is often less decisive when the purpose is commentary."
    },
    "amount_and_substantiality": {
      "score": 25,
      "explanation": "Only brief, non-continuous clips are used and seem reasonably necessary to support specific points. There is no indication that the video's value comes from extensive or uninterrupted copying."
    },
    "market_effect": {
      "score": 22,
      "explanation": "Because the video uses short excerpts with commentary, it is unlikely to substitute for viewing the film or licensed full-scene distribution. Any market harm appears minimal based on the limited overlap described."
    }
  }
}

Now evaluate the following inputs.

Transcript: [User video transcript]

Similar Content Found: [Stage 1 output]

Respond ONLY with valid JSON:
{
  "overall_risk_score": <0-100>,
  "risk_level": "Low Risk (0-33)" or "Moderate Risk (34-66)" or "High Risk (67-100)",
  "confidence_score": <0-100>,
  "factors": {
    "purpose_and_character": {"score": <0-100>, "explanation": "..."},
    "nature_of_work": {"score": <0-100>, "explanation": "..."},
    "amount_and_substantiality": {"score": <0-100>, "explanation": "..."},
    "market_effect": {"score": <0-100>, "explanation": "..."}
  }
}

Model Parameters

Parameter	Stage 1 (Similarity)	Stage 2 (Fair Use)
Model	`gpt-4o`	`gpt-4o`
Frames	8 frames (low detail)	10 frames (high detail)
Temperature	0.3	0.3
Max Tokens	1000	2000
Response Format	JSON object	JSON object

Design Rationale

Two-stage approach: Separating content identification from legal analysis mirrors how human experts first establish facts before applying law
In-context learning: Providing a worked example calibrates the model's scoring and explanation style
Explicit factor definitions: Ensures the model reasons from legal principles rather than ad hoc intuitions
Low temperature (0.3): Reduces randomness for more consistent, deterministic assessments
JSON structure: Forces organized output and makes parsing/display straightforward
Confidence scoring: Acknowledges evidentiary limitations (e.g., no transcript, unclear source material)

Understanding Fair Use

Why we built this

Fair use is one of the most important and confusing parts of copyright law. If you make or post videos online, you have probably asked yourself some version of "Is this fair use or will it get taken down?" The law does not give a simple checklist but rather looks at flexible factors and balances them in context. This calculator is meant to turn the abstract legal test into something you can explore interactively. Instead of giving a definitive answer of whether something is fair use, the calculator will help you think through how the fair use factors apply to a specific video and where your choices might create more or less risk. This calculator is focused on videos because that is where a lot of real world fair use questions show up such as reaction videos, commentary, parodies, fan edits, and remixes that reuse other people's clips or audio.

How the calculator works

The calculator is built around four statutory fair use factors, plus a "fifth factor" that captures how judges and juries react in close cases:

Factor 1 - Purpose and character of your use: This is about what you are trying to do with the original work and how you do it. Courts ask whether your use is transformative, whether it adds new meaning or message, and whether it is commercial or noncommercial.

Factor 2 - Nature of the copyrighted work: This looks at what kind of work you are using. Courts tend to give more protection to highly creative works like music videos and fiction than to factual works like news clips or instructional videos.

Factor 3 - Amount and substantiality used: This factor is about how much you use and how important that portion is. Using a small piece can favor fair use, but even a short clip can be risky if it is the "heart" of the work.

Factor 4 - Effect on the market for the original: This asks whether the new video could replace the original or harm its current or potential markets. Courts care a lot about whether your use is a substitute that reduces sales, streams or licensing.

Factor 5 - Fifth factor, are you acting in good or bad faith: Officially, courts are not supposed to decide based on whether they "like" you, but in close cases they do react to tone, intent, and fairness. A video that feels cruel, exploitative, or sloppy about rights is more likely to be treated harshly.

How to use the calculator

For each factor, you will answer five short questions about your video. Every question uses a sliding scale from 1-9 with a score of 1 meaning that "this choice is very friendly to fair use." Conversely, a score of 9 means "this choice looks more like infringement risk." For example, for the "amount used" factor, a question might ask how much of the original audio you use. Sliding to 1 means "none of the original audio," and sliding to 9 means "the entire original audio track." We chose a 1-9 scale instead of yes or no answers because fair use is almost never all or nothing. Courts talk about uses being more or less transformative, more or less commercial, or using more or less of the original. As Brad Rosen repeatedly put it, "Where do you draw the line?" A slider lets you place your video somewhere on the spectrum instead of forcing a simple yes or no.

Why five questions per factor

Each factor covers more than one idea. For example, the first factor is not just "commercial or noncommercial." It also includes transformation, commentary, and parody. If there was only one question per factor, the law would be grossly oversimplified. Five questions per factor lets the calculator touch on different aspects courts actually look at while keeping the quiz short enough to finish in a few minutes.

How scoring works

After you answer all 25 questions, the calculator does two things.

Factor scores: For each factor, we take the average of your five answers. This gives you 5 scores between 1-9, one per factor. Lower scores suggest the facts you reported are more friendly to a fair use argument for that factor. Higher scores suggest more risk on that factor.

Overall risk score: The five factor scores are then combined into one overall "risk" score. In our basic version, a weighted average is used where Factor 1 and Factor 4 are weighted a little more while Factor 2 and Factor 3 carry moderate weight and Factor 5 carries a little less weight. This reflects how courts often treat transformation and market harm as especially important, while also recognizing that tone and good faith can influence close cases. Your resulting score will fall into three bands.

1 to 3: leans toward fair use
4 to 6: is mixed or uncertain
7 to 9: factor leans against fair use

Logistic Regression Analysis

We also trained a logistic regression model on 10 cases where U.S. Law, Technology, and Culture experts answered all 25 questions. This is low data (just for demo purposes) but we analyzed which questions had the biggest impact on the AI's predictions.

The top 5 most impactful questions were: Q15 (Substitute for original), Q13 (Heart of work), Q16 (Replace original), Q20 (Licensing markets), and Q18 (Effect on sales/views). Note that Q11-15 are Factor 3 (Amount & Substantiality), Q16-20 are Factor 4 (Market Effect), Q1-5 are Factor 1 (Purpose), Q6-10 are Factor 2 (Nature), and Q21-25 are Factor 5 (Other).

What's interesting is that most of these come from Factor 4 (Market Effect), pointing to this being the biggest predictor of fair use in the cases we looked at. With more data, analysis like this could show which factor has historically been most important in fair use cases from a statistical perspective.

What this tool is and what it is not

This tool is meant to be a teaching and reflection aid, allowing you to see how different choices affect each fair use factor as well as a way to connect your own video ideas to the fair use test that courts use. This tool is not legal advice, a guarantee that any given video is or is not fair use, or a substitute for talking to a lawyer in a real dispute. The scores and blurbs are based on typical patterns in fair use cases, but real outcomes always depend on specific facts, context, and the court involved. Videos with similar answers could still be treated differently in the real world.

Fair Use Evaluation Tool

A quick, plain-language overview

Learn More