The Build Notes+ Complete Template Library for AI Ad Creative (HUGE value)
Build Notes Plus members get the complete template library below: production-ready JSON templates for Midjourney V7 (static ads), Kling AI (product demo video), Google VEO 3.1 (spokesperson video with native audio), and Minimax Hailuo (camera-controlled product reveals).
But you could also just use Manus or MindStudio. Both have access to all of these models and handle both video and static images.
Plus the XML campaign brief structure for organizing multi-platform creative, the variant set architecture for systematic A/B testing, and a model-by-model parameter reference so you know exactly what each tool can control.
How These Templates Work
Each template below is a production-ready JSON structure that controls every visual parameter of your ad creative. The ${variable} placeholders are where your specific product data goes. The placeholders ending in _variant (for example, ${lighting_variant}) are where your A/B testing variables live.
You can use these templates in three ways:
- Manual generation. Copy the template, fill in your product details, and use the output as a prompt for the relevant tool. This is the fastest way to start.
- LLM-assisted generation. Feed the template to Claude or GPT along with your product brief, and ask it to populate the template and convert it to a natural language prompt optimized for the specific model. This handles the translation layer for tools that don’t accept JSON natively.
- Agent-automated generation. Next week in Issue #005, I’ll show you how to build a MindStudio agent that handles the first two approaches automatically, like your own little agentic ad factory. But you need the templates first.
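If you'd rather fill in the placeholders with a script than by hand, here is a minimal Python sketch. It assumes the template is valid JSON and leans on the fact that Python's `string.Template` happens to use the same `${name}` syntax. The product values and the `fill_template` helper are made up for illustration.

```python
import json
from string import Template


def fill_template(node, values):
    """Recursively replace ${name} placeholders in every string of a parsed JSON template."""
    if isinstance(node, str):
        # safe_substitute leaves unknown placeholders untouched instead of raising
        return Template(node).safe_substitute(values)
    if isinstance(node, list):
        return [fill_template(item, values) for item in node]
    if isinstance(node, dict):
        return {key: fill_template(item, values) for key, item in node.items()}
    return node  # numbers, booleans and null pass through unchanged


# A trimmed-down slice of a template, just enough to show the idea
template = json.loads("""
{
  "description": "${product_name} hero shot for ${platform}",
  "environment": {"location": "${surface_variant}"},
  "constraints": {"no_text_in_image": true}
}
""")

# Hypothetical product data, for illustration only
values = {
    "product_name": "GreenStart supplement bottle",
    "platform": "Instagram feed",
    "surface_variant": "white marble countertop",
}

filled = fill_template(template, values)
print(json.dumps(filled, indent=2))
```

Any placeholder you haven't supplied a value for stays in the output as `${name}`, so it's easy to spot what's still missing before you paste the result into a tool.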
Template A: Product Hero Shot (Midjourney V7 / Flux / Nano Banana Pro)
This is your workhorse template for static ad imagery. Product shots for Meta feeds, Instagram, Google Display, and anywhere you need a clean product image with space for text overlay.
⚠️ You can just paste JSON directly into Gemini, ChatGPT, or Manus and it'll know what to do with it.
```json
{
  "template_id": "product_hero_v3",
  "tool": "midjourney_v7",
  "description": "${product_name} hero shot for ${platform}",
  "environment": {
    "location": "${surface_variant}",
    "surfaces": ["${surface_material}"],
    "props": ["${prop_1}", "${prop_2}", "${prop_3}"],
    "spatial_arrangement": "product centered, props arranged naturally"
  },
  "visual_style": {
    "aesthetic": "${brand_tone} product photography",
    "mood": "${mood_variant}",
    "lighting": {
      "type": "${lighting_variant}",
      "direction": "soft side lighting from left",
      "intensity": "bright and airy",
      "color_temperature": "neutral to slightly warm"
    },
    "camera": {
      "angle": "slightly above eye level, 25 degrees",
      "distance": "medium shot",
      "lens": "85mm",
      "aperture": "f/2.8",
      "depth_of_field": "shallow, product sharp, background softly blurred"
    },
    "color_palette": ["${brand_color_1}", "${brand_color_2}", "${brand_color_3}"]
  },
  "composition": {
    "product_placement": "center-left third",
    "product_scale": "40% of frame",
    "negative_space": "upper right for text overlay",
    "balance": "asymmetrical"
  },
  "constraints": {
    "preserve_product_accuracy": true,
    "humans_in_image": false,
    "no_text_in_image": true
  },
  "output": {
    "aspect_ratio": "${ratio_variant}",
    "platform": "${platform_variant}"
  }
}
```
Example Output:
I asked Claude to customize this template to my Build Notes product and then Claude gave me the JSON to paste into Gemini.
Here is the result...
Why each parameter matters:
The camera block is where most people lose quality. “85mm” and “f/2.8” aren’t just photography jargon. 85mm compresses the background in a flattering way that makes products look premium. f/2.8 creates that shallow depth of field where the product is razor sharp and everything behind it falls into a creamy blur. This is what separates an amateur product photo from one that looks like it belongs in a magazine.
The composition block solves the “where do I put my headline?” problem. By specifying “negative_space: upper right for text overlay” and “product_placement: center-left third,” every generated image leaves room for your ad copy in exactly the same spot. Your designer (or your Canva template) never has to fight the image for space.
The constraints block prevents the AI from doing things you don’t want. “no_text_in_image: true” is critical. AI-generated text in images still looks terrible in 2026. Always add text in post-production. “humans_in_image: false” prevents the six-fingered hand problem entirely for product shots.
Converting to a Midjourney prompt:
Feed this JSON to Claude or GPT with this instruction:
```
Convert the following JSON creative brief into a Midjourney V7 prompt. Follow Midjourney’s prompt formula: [Photography type], [product description], [surface/setting], [lighting], [color grading], [composition], [mood] --ar [ratio] --s [stylize] --v 7 --no [negatives]
Keep stylize between 150-200 for commercial photography. Include --no text, watermark, people
```
The output will be something like:
Commercial photography, premium supplement bottle with forest-green label, clean marble countertop with scattered fresh mint leaves and a glass of green smoothie, soft natural window light from left creating gentle shadows, warm neutral color grade, center-left composition with generous negative space on right for headline, fresh aspirational wellness mood --ar 4:5 --s 180 --v 7 --no text, watermark, people
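If you want a deterministic fallback instead of asking an LLM, the same formula can be assembled with a few lines of Python. This is only a sketch under my own assumptions: the `midjourney_prompt` helper is hypothetical, and it simply joins the descriptive parts with commas, then appends the flags in the order the formula lists them. It won't rewrite your wording the way Claude or GPT would.

```python
def midjourney_prompt(parts, ratio, stylize=180, negatives=("text", "watermark", "people")):
    """Join descriptive parts and append the parameter flags from the prompt formula."""
    if not 150 <= stylize <= 200:
        # The guidance above: keep stylize in the 150-200 band for commercial photography
        raise ValueError("stylize should stay between 150 and 200 for commercial photography")
    description = ", ".join(part.strip() for part in parts if part.strip())
    flags = f"--ar {ratio} --s {stylize} --v 7 --no {', '.join(negatives)}"
    return f"{description} {flags}"


prompt = midjourney_prompt(
    [
        "Commercial photography",
        "premium supplement bottle with forest-green label",
        "clean marble countertop",
        "soft natural window light from left",
        "warm neutral color grade",
        "center-left composition with negative space on right",
        "fresh aspirational wellness mood",
    ],
    ratio="4:5",
)
print(prompt)
```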
Midjourney V7 Parameters You Should Know:
| Parameter | Syntax | What It Does for Ads |
|-----------|--------|---------------------|
| Aspect Ratio | `--ar 4:5` | Match platform (4:5 feed, 9:16 story, 16:9 banner) |
| Stylize | `--s 100-250` | Lower = literal accuracy; higher = editorial feel |
| Chaos | `--c 0-50` | 0 for finals; 30+ for concept exploration |
| Quality | `--q 2` | Maximum detail for production assets |
| Seed | `--seed 12345` | Lock composition for controlled A/B tests |
| Negative | `--no text, watermark` | Suppress unwanted elements |
| Draft | `--draft` | 10x faster at half cost for rapid iteration |
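| Style Ref | `--sref [image URL]` | Match the look of a reference image for a consistent brand style |
| Omni Ref | `--oref [image URL]` | Carry the subject of a reference image (such as your product) into new scenes |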
| Repeat | `--r 10` | Generate 10 variations in one command |
The batch testing shortcut: Midjourney’s permutation syntax lets you test multiple variables in one command. Wrap variables in curly braces:
```
Commercial photography, supplement bottle, {marble countertop, wooden table, white studio surface}, {soft window light, studio three-point lighting, golden hour warmth} --ar {4:5, 1:1, 9:16} --s {100, 200} --v 7 --no text
```
That single input expands to 3 x 3 x 3 x 2 = 54 unique prompts. For true single-variable A/B testing, vary one group at a time and add --seed to lock the base composition.
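To sanity-check the count before you spend credits, you can mimic the expansion locally. The sketch below is my own approximation of how curly-brace groups multiply out, not Midjourney's actual parser. It doesn't handle nested groups or escaped commas, and `expand_permutations` is a made-up helper name.

```python
import itertools
import re

# Matches one {option, option, ...} group with no nesting
GROUP = re.compile(r"\{([^{}]*)\}")


def expand_permutations(prompt):
    """Expand every {a, b, c} group into the full list of concrete prompts."""
    groups = [
        [option.strip() for option in match.group(1).split(",")]
        for match in GROUP.finditer(prompt)
    ]
    skeleton = GROUP.sub("{}", prompt)  # each group becomes a format slot
    return [skeleton.format(*combo) for combo in itertools.product(*groups)]


prompt = (
    "Commercial photography, supplement bottle, "
    "{marble countertop, wooden table, white studio surface}, "
    "{soft window light, studio three-point lighting, golden hour warmth} "
    "--ar {4:5, 1:1, 9:16} --s {100, 200} --v 7 --no text"
)

prompts = expand_permutations(prompt)
print(len(prompts))  # 54
print(prompts[0])
```

Reducing the input to a single group shows exactly what one clean test round would generate.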
Template B: Product Demo Video (Kling AI)
Kling AI is the strongest option for product demonstration videos. Multi-shot sequences up to 15 seconds, 4K HDR, and native audio generation. Its image-to-video mode is especially powerful: start with your actual product photo and animate it.
Text-to-Video Template:
```json
{
  "template_id": "product_demo_video_v2",
  "tool": "kling_ai",
  "api_payload": {
    "model": "kling-v2.6-pro",
    "prompt": "${camera_movement} ${product_action} on ${surface_variant}. ${lighting_description}, shallow depth of field, ${color_grade}. ${style_modifier}, commercial product photography.",
    "duration": 5,
    "aspect_ratio": "${ratio_variant}",
    "mode": "professional",
    "negative_prompt": "text, watermark, blurry, distorted hands"
  },
  "camera_variants": [
    "Close-up macro shot, slow push in.",
    "Medium shot, slow orbit around product.",
    "Low angle hero shot, slight upward tilt."
  ],
  "action_variants": [
    "A hand reaches in and picks up the ${product_name}, tilting it toward camera",
    "The ${product_name} rotates slowly, light reflections glide across surface",
    "${product_name} sits center frame, steam or particles drift past"
  ],
  "style_variants": [
    "Cinematic, cool tones",
    "Warm and inviting, golden hour feel",
    "Clean and clinical, bright whites"
  ]
}
```
Image-to-Video Template (the money shot):
This is where Kling really shines for advertisers. Take your best static product image (from Template A or from a real photoshoot) and animate it.
```json
{
  "template_id": "product_i2v_v1",
  "tool": "kling_ai",
  "api_payload": {
    "type": "pro-image-to-video",
    "prompt": "${motion_description}",
    "image": "${product_hero_image_url}",
    "duration": 5,
    "aspect_ratio": "${ratio_variant}"
  },
  "motion_variants": [
    "The bottle rotates slowly on the surface, light reflections glide across the glass, slight camera push-in",
    "A hand reaches into frame and picks up the bottle, examining the label, shallow depth of field",
    "Camera slowly orbits the product, background shifts from shadow to light, revealing the label"
  ]
}
```
Critical I2V (Image2Video) rule: Do NOT redescribe the image. The image IS the scene. Your prompt should ONLY describe what moves and what the camera does. Redescribing the image causes the AI to reinterpret it and you lose product accuracy.
Kling’s Prompt Formula: Subject + Action + Context + Style. Keep prompts under 50 words for best results.
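A quick way to keep yourself honest on the word limit is to build the prompt from the four formula slots and count before you submit. This is a minimal sketch. The `kling_prompt` helper and its punctuation choices are my own assumptions, not anything Kling requires.

```python
def kling_prompt(subject, action, context, style, max_words=50):
    """Assemble Subject + Action + Context + Style and enforce the word limit."""
    prompt = f"{subject} {action}, {context}. {style}."
    word_count = len(prompt.split())
    if word_count >= max_words:
        raise ValueError(f"prompt has {word_count} words; keep it under {max_words}")
    return prompt


prompt = kling_prompt(
    subject="The bottle",
    action="rotates slowly on the surface",
    context="light reflections glide across the glass",
    style="Slight camera push-in, shallow depth of field",
)
print(prompt)
```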
Camera Movements Available: Horizontal pan, vertical movement, zoom, tilt, roll, and tracking. For 360-degree product rotation, type “360-degree rotation” directly in the prompt. Always specify why the camera moves (“camera zooms in on the label” beats “camera zoom in”).
Pricing: API runs $0.07-$0.14 per second. A 5-second Professional-mode clip costs roughly $1.75. Pro subscription at $26-37/month gives you about 85 five-second clips.
⚠️ Manus has Kling and Nano Banana built in, meaning you could use Manus to build all your static image ads and video ads. Also, you could build these templates into an agent with MindStudio.
Template C: UGC Spokesperson Video (Google VEO 3.1)
VEO is the only tool that generates synchronized dialogue, sound effects, and ambient audio alongside the video. For spokesperson-style ads, this means a complete ad from one prompt. No voiceover session. No audio editing. No lip-sync nightmares.
```json
{
  "template_id": "ugc_spokesperson_v1",
  "tool": "google_veo",
  "api_payload": {
    "instances": [{
      "prompt": "${shot_type} on a ${lens}. A ${spokesperson_description}, speaking directly to camera with natural enthusiasm. ${spokesperson_action}. ${environment}. Style: authentic UGC feel, ${color_grade}, fine skin detail. Audio: ${audio_environment}.",
      "referenceImages": [{
        "image": {
          "gcsUri": "${product_image_gcs_url}",
          "mimeType": "image/png"
        },
        "referenceType": "asset"
      }]
    }],
    "parameters": {
      "aspectRatio": "${ratio_variant}",
      "durationSeconds": 8,
      "generateAudio": true,
      "negativePrompt": "text overlay, watermark, blurry",
      "resolution": "1080p",
      "sampleCount": 2,
      "seed": "${seed_for_consistency}"
    }
  },
  "shot_variants": [
    "Medium close-up",
    "Over-the-shoulder with product visible",
    "Wide shot showing full environment"
  ],
  "script_variants": [
    "She holds up the product and says, 'This changed my morning routine completely.'",
    "He sets the product on the counter and says, 'I was skeptical until I tried it for a week.'",
    "She gestures toward the product and says, 'My clients keep asking me what I'm using.'"
  ]
}
```
VEO’s “Ingredients to Video” feature accepts up to 3 reference images of your product, maintaining packaging accuracy, brand colors, and logo placement throughout the video. This solves the problem of AI tools inventing product details. Use the referenceImages array to pass your actual product photos.
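If you're assembling the payload in code, a small helper can build the referenceImages array and stop you from passing more than three images. This sketch only mirrors the field names used in the template above. The bucket paths are hypothetical, and `build_reference_images` is an illustrative helper, not part of any Google SDK.

```python
def build_reference_images(gcs_uris, mime_type="image/png", max_images=3):
    """Build the referenceImages array in the shape used by the template above."""
    if not gcs_uris:
        raise ValueError("provide at least one product image")
    if len(gcs_uris) > max_images:
        raise ValueError(f"at most {max_images} reference images are accepted")
    return [
        {
            "image": {"gcsUri": uri, "mimeType": mime_type},
            "referenceType": "asset",
        }
        for uri in gcs_uris
    ]


# Hypothetical bucket paths, for illustration only
reference_images = build_reference_images([
    "gs://example-bucket/product_front.png",
    "gs://example-bucket/product_label.png",
])
print(reference_images)
```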
Scene extension chains clips beyond the 8-second limit, enabling 60+ second sequences. Use seed for reproducibility across chained clips.
Pricing: $0.75/second with audio via Vertex AI (an 8-second clip is about $6). VEO 3.1 Fast drops to roughly $0.15/second ($1.20 per clip), which is viable for volume production.
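To see how those per-second rates add up across a batch, here is a tiny cost sketch. The rates are the ones quoted above. The 20-clip batch size is just an example I picked.

```python
RATE_STANDARD = 0.75  # dollars per second with audio, as quoted above
RATE_FAST = 0.15      # dollars per second for the Fast tier, as quoted above


def clip_cost(duration_seconds, rate_per_second):
    """Cost of one clip in dollars, rounded to cents."""
    return round(duration_seconds * rate_per_second, 2)


def batch_cost(clips, duration_seconds, rate_per_second):
    """Cost of a batch of equal-length clips in dollars, rounded to cents."""
    return round(clips * duration_seconds * rate_per_second, 2)


print(clip_cost(8, RATE_STANDARD))   # 6.0
print(clip_cost(8, RATE_FAST))       # 1.2
print(batch_cost(20, 8, RATE_FAST))  # 24.0
```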
Template D: Camera-Controlled Product Reveal (Minimax Hailuo 2.3)
Hailuo gives you the most explicit camera control of any video model. Its bracket-based camera tokens let you specify exactly how the camera moves, frame by frame.
```json
{
  "template_id": "product_reveal_hailuo_v1",
  "tool": "minimax_hailuo",
  "api_payload": {
    "model": "MiniMax-Hailuo-2.3",
    "prompt": "[${camera_token_1}] ${product_reveal_action} on ${surface_variant}. ${lighting_description}. [${camera_token_2}] ${detail_action}. ${style_modifier}, commercial style, shallow depth of field.",
    "duration": 6,
    "resolution": "1080P"
  },
  "camera_token_sets": [
    ["Push in", "Zoom in"],
    ["Truck left", "Pan right"],
    ["Pull out", "Static"],
    ["Tracking shot", "Zoom in"]
  ]
}
```
All Available Camera Tokens:
[Pan left], [Pan right], [Tilt up], [Tilt down], [Truck left], [Truck right], [Push in], [Pull out], [Pedestal up], [Pedestal down], [Zoom in], [Zoom out], [Shake] (handheld feel), [Tracking shot], [Static]
You can combine up to three: [Truck left, Pan right, Zoom in]
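Typos in camera tokens are easy to miss, so it can help to validate them before they go into a prompt. The sketch below checks against the token list above and enforces the three-token limit. The `camera_directive` helper is hypothetical, and I'm assuming the tokens are case-sensitive exactly as written.

```python
CAMERA_TOKENS = {
    "Pan left", "Pan right", "Tilt up", "Tilt down",
    "Truck left", "Truck right", "Push in", "Pull out",
    "Pedestal up", "Pedestal down", "Zoom in", "Zoom out",
    "Shake", "Tracking shot", "Static",
}


def camera_directive(*moves):
    """Return a bracketed camera directive such as [Truck left, Pan right, Zoom in]."""
    if not 1 <= len(moves) <= 3:
        raise ValueError("combine between one and three camera tokens")
    unknown = [move for move in moves if move not in CAMERA_TOKENS]
    if unknown:
        raise ValueError(f"unknown camera tokens: {unknown}")
    return "[" + ", ".join(moves) + "]"


directive = camera_directive("Truck left", "Pan right", "Zoom in")
print(directive)  # [Truck left, Pan right, Zoom in]
```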
Why Hailuo for high-volume production: At roughly $0.27 per 6-second clip at 768p, Hailuo is the cheapest model for social video. Generate 100 clips for under $30. The tradeoff: no native audio. All Hailuo videos are silent. You add voiceover, music, and sound effects in post. For social ads where you’re adding a music track and text overlay anyway, this isn’t a limitation.
Subject Reference (S2V-01 model) maintains face/character consistency across multiple clips, essential for spokesperson campaigns. First-and-last-frame conditioning lets you specify exact start and end states for precise product reveal timing.
The XML Campaign Brief: Organizing Multi-Platform Creative
JSON handles individual assets. XML handles campaign-level organization. When you need coordinated creative across platforms with consistent messaging and systematic testing, XML gives you the hierarchy.
```xml
<option id="A">natural window light from left</option>
<option id="B">warm golden hour studio</option>
<option id="C">dramatic side light with deep shadows</option>
<option id="A">pain point lead: frustration with current solution</option>
<option id="B">aspiration lead: vision of desired outcome</option>
<option id="C">proof lead: specific result or statistic</option>
<option id="A">close-up macro, slow push in</option>
<option id="B">medium shot, slow orbit</option>
<option id="C">low angle hero, slight upward tilt</option>
```
This XML brief is your campaign blueprint. It defines what gets tested, where it runs, and which templates produce each asset. One brief. Eleven coordinated, on-brand, systematically varied assets across four platforms.
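If you want to generate a brief like this from code, Python's standard library can write the hierarchy for you. Treat this as a sketch only: the `campaign` and `variant_set` element names, the group names, and the campaign name are my own assumptions for illustration. Only the option ids and option text come from the brief above. `ET.indent` needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET

# Group names are assumed labels; option text is taken from the brief above
VARIANT_SETS = {
    "lighting": {
        "A": "natural window light from left",
        "B": "warm golden hour studio",
        "C": "dramatic side light with deep shadows",
    },
    "copy_angle": {
        "A": "pain point lead: frustration with current solution",
        "B": "aspiration lead: vision of desired outcome",
        "C": "proof lead: specific result or statistic",
    },
    "camera": {
        "A": "close-up macro, slow push in",
        "B": "medium shot, slow orbit",
        "C": "low angle hero, slight upward tilt",
    },
}


def build_brief(name, variant_sets):
    """Build a campaign element holding one variant_set per testable variable."""
    root = ET.Element("campaign", {"name": name})
    for set_name, options in variant_sets.items():
        variant_set = ET.SubElement(root, "variant_set", {"name": set_name})
        for option_id, text in options.items():
            option = ET.SubElement(variant_set, "option", {"id": option_id})
            option.text = text
    return root


brief = build_brief("example_launch", VARIANT_SETS)
ET.indent(brief)  # pretty-print in place (Python 3.9+)
print(ET.tostring(brief, encoding="unicode"))
```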
In next week's Issue #005, I'll show you how to build the agent that reads this brief and executes the entire thing automatically.
But it can be as simple as giving these templates to Claude or the enemy (ChatGPT) and they'll know what to do with them.
The Variant Architecture: How to Run Real A/B Tests with AI Creative
The variant sets are where the testing discipline lives.
Here’s how to think about them.
Start with 3 variables, 3 options each. That gives you 27 possible combinations. You don’t run all 27. You pick the highest-impact subset based on your campaign objective.
For conversion campaigns: Test copy angle first (pain vs. aspiration vs. proof), lighting second (it affects emotional response more than most people realize), CTA treatment third.
For awareness campaigns: Test visual style first, then mood, then composition.
For retargeting: Test urgency framing first, then social proof elements, then offer presentation.
The seed trick for true A/B testing: In Midjourney, use --seed to lock the base composition. Change only one variable. The output will differ only in the variable you changed. This is real experimental design, not “let’s generate a bunch of stuff and see what looks good.”
Example variant set for a supplement brand:
```json
{
  "variant_matrix": {
    "lighting": {
      "A": "soft natural window light from left",
      "B": "warm golden hour studio lighting",
      "C": "cool clinical overhead, bright and clean"
    },
    "surface": {
      "A": "white marble countertop with grey veining",
      "B": "natural oak wood table",
      "C": "clean white studio surface, no texture"
    },
    "mood": {
      "A": "fresh, clean, aspirational wellness",
      "B": "premium, sophisticated, luxury",
      "C": "warm, approachable, everyday health"
    }
  },
  "test_plan": {
    "round_1": {
      "test": "lighting",
      "hold_constant": {"surface": "A", "mood": "A"},
      "generate": ["lighting_A", "lighting_B", "lighting_C"],
      "seed": 88421
    },
    "round_2": {
      "test": "surface",
      "hold_constant": {"lighting": "winner_from_round_1", "mood": "A"},
      "generate": ["surface_A", "surface_B", "surface_C"],
      "seed": 88421
    },
    "round_3": {
      "test": "mood",
      "hold_constant": {"lighting": "winner_from_round_1", "surface": "winner_from_round_2"},
      "generate": ["mood_A", "mood_B", "mood_C"],
      "seed": 88421
    }
  }
}
```
Three rounds. Nine total generations. You’ve isolated the winning lighting, the winning surface, and the winning mood. Your final creative combines all three winners. That’s a data-driven creative decision made for under $5 in generation costs.
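Here is a rough sketch of how that test plan turns into concrete generation jobs. The winners in rounds 2 and 3 are placeholders I made up (in practice they come from your ad performance data). `round_jobs` is a hypothetical helper, and the $0.15 per image is the approximate figure from the reference table in this issue.

```python
VARIANTS = {
    "lighting": {
        "A": "soft natural window light from left",
        "B": "warm golden hour studio lighting",
        "C": "cool clinical overhead, bright and clean",
    },
    "surface": {
        "A": "white marble countertop with grey veining",
        "B": "natural oak wood table",
        "C": "clean white studio surface, no texture",
    },
    "mood": {
        "A": "fresh, clean, aspirational wellness",
        "B": "premium, sophisticated, luxury",
        "C": "warm, approachable, everyday health",
    },
}
SEED = 88421
COST_PER_IMAGE = 0.15  # approximate, per the reference table


def round_jobs(test_variable, held_constant, seed=SEED):
    """One job per option of test_variable; every other variable stays fixed."""
    jobs = []
    for option_id, option_text in VARIANTS[test_variable].items():
        params = {name: VARIANTS[name][choice] for name, choice in held_constant.items()}
        params[test_variable] = option_text
        jobs.append({"label": f"{test_variable}_{option_id}", "seed": seed, "params": params})
    return jobs


round_1 = round_jobs("lighting", {"surface": "A", "mood": "A"})
# Placeholder winners: pretend lighting B won round 1 and surface C won round 2
round_2 = round_jobs("surface", {"lighting": "B", "mood": "A"})
round_3 = round_jobs("mood", {"lighting": "B", "surface": "C"})

all_jobs = round_1 + round_2 + round_3
print(len(all_jobs))                             # 9
print(round(len(all_jobs) * COST_PER_IMAGE, 2))  # 1.35
```

Within each round, every job shares the same seed and the same held-constant values, so the only thing that changes is the variable under test.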
Quick Reference: Which Model for Which Ad Type
| Ad Type | Best Model | Why | Cost per Asset |
|---------|-----------|-----|----------------|
| Product hero shot | Midjourney V7 | Highest image quality, best text-free rendering | ~$0.15 |
| Lifestyle product scene | Midjourney V7 or Flux | Consistent environment generation | ~$0.15-0.25 |
| Product demo video (5s) | Kling AI V3 Pro | Multi-shot, native audio, strong I2V | ~$1.75 |
| UGC spokesperson video | Google VEO 3.1 | Only model with native dialogue + audio | ~$1.20-6.00 |
| Camera-controlled reveal | Minimax Hailuo 2.3 | 15 camera tokens, cheapest per clip | ~$0.27 |
| High-volume social clips | Minimax Hailuo 2.3 | Best cost efficiency at scale | ~$0.27 |
| Product rotation / 360 | Kling AI V3 Pro | Best motion quality for rotating objects | ~$1.75 |
Your Action List for This Week
Here’s what to do before next week’s agent build drops:
[ ] Pick one template above that matches your most common ad format
[ ] Fill in the ${variables} with your actual product data
[ ] Feed the completed JSON to Claude and ask it to convert it into a prompt for your target model
[ ] Generate 3 variants changing only one variable (use the variant sets above)
[ ] Compare the output quality against your last manually-prompted generation
[ ] Build variant sets for the 3 variables that matter most in your campaigns
[ ] Create an XML campaign brief for your next product launch using the structure above
[ ] Start a template library folder (you’ll need this for the agent build next week)
[ ] Bookmark the model reference table so you know which tool to reach for
[ ] Set up a free MindStudio workspace (you’ll need it for Issue #005)
That’s the full template library.
Next week in Build Notes #005: “The Ad Agency Agent:”
I’ll show you how MindStudio makes these templates work at scale, why it beats n8n, Make, and Zapier for AI creative workflows, and you’ll get the complete step-by-step agent build.
If you haven't upgraded to Build Notes+ yet, make sure you do so you don't miss part 2 of this series.
If you want the pre-built MindStudio agent ready to deploy without doing any of this yourself, that’s what Build Club is for. Every template and agent featured in Build Notes is available in the Build Club library.