How to Caption Your Content | Conversent Help Center

How to caption your content

Conversent makes it easy to add accurate captions to your podcasts, videos, lectures, and multimedia content. Whether you're looking to improve accessibility, boost SEO, or engage a broader audience, our caption tools work seamlessly with your workflow.

1. Upload your content
Getting Started
  1. Log in to your Conversent dashboard and navigate to your program.
  2. Click Upload Content in the Caption Library section.
  3. Select your media files. You can upload video files (.mp4, .mov), audio files (.mp3, .wav, .aac), or any media format with embedded audio.
  4. Choose whether to enable Auto-Captioning to automatically generate captions using speech recognition.
  5. Click Upload and your files will begin processing in the background.
Supported Formats & Requirements
Video formats
.mp4, .mov
Standard video files with embedded audio tracks.
Audio formats
.mp3, .wav, .aac
Any audio file with clear spoken content.
Auto-Captioning processes files in the background and typically completes within 30 minutes to 2 hours depending on file length and complexity.
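Before uploading a large batch, it can help to confirm locally that each file uses one of the extensions listed above. The following is a minimal Python sketch, not a Conversent tool: the names `is_supported` and `split_by_support` are hypothetical, and the extension sets are simply the formats named in this article, so the list Conversent actually accepts may be broader.

```python
from pathlib import Path

# Extensions named in this article (assumption: the real accepted list may be broader).
VIDEO_EXTENSIONS = {".mp4", ".mov"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".aac"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one listed in this article."""
    suffix = Path(filename).suffix.lower()
    return suffix in VIDEO_EXTENSIONS | AUDIO_EXTENSIONS


def split_by_support(filenames):
    """Split filenames into (supported, unsupported) lists, preserving order."""
    supported = [name for name in filenames if is_supported(name)]
    unsupported = [name for name in filenames if not is_supported(name)]
    return supported, unsupported
```

The check looks only at the file name, so it cannot tell whether a file actually contains a usable audio track.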
2. Review & edit captions
Using the Caption Editor
  1. Once processing completes, click Open Caption Editor to review your captions.
  2. View captions in the timeline-based interface with audio playback. Each caption shows the speaker name, text, and timestamps.
  3. Fix text errors by double-clicking any caption line and typing your correction.
  4. Adjust timing by clicking and dragging caption boundaries in the timeline. Shift the start or end time as needed.
  5. Assign speaker names and manage speaker identification. Tag voices so Conversent can recognize them in future episodes.
Quality Review Checklist
  • All dialogue is captured accurately
  • Speaker identification is correct
  • Timing aligns with actual speech
  • Proper names and technical terms are spelled correctly
Your changes are automatically saved as you edit. You can pause and resume editing at any time — your progress is preserved.
3. Download & publish
Supported Caption Formats
VTT (Web Video Text Tracks)
Best for YouTube, web players, and HTML5 video. The most compatible format for modern platforms.
SRT (SubRip)
Best for video editing. Widely compatible with Premiere Pro, DaVinci Resolve, and desktop players.
ASS (Advanced SubStation Alpha)
Best for custom styling. Supports speaker colors, fonts, and positioning for specialized formatting.
YTT (YouTube Timed Text)
Best for YouTube-native captions. Direct upload to YouTube without conversion.
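SRT and VTT are closely related plain-text formats. A VTT file starts with a `WEBVTT` header line and writes timestamps with a period before the milliseconds (`00:00:01.000`), whereas SRT uses a comma (`00:00:01,000`). The Python sketch below illustrates that difference by converting simple SRT text to VTT. It is an illustration only, not how Conversent generates its downloads. The function name `srt_to_vtt` is hypothetical, and the code assumes plain cues without styling tags.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
_SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})(.*)$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT caption text to WebVTT text (hypothetical helper)."""
    out_lines = ["WEBVTT", ""]  # VTT requires this header, then a blank line
    # Drop a leading byte-order mark, if present, and surrounding whitespace.
    for line in srt_text.lstrip("\ufeff").strip().splitlines():
        match = _SRT_TIMING.match(line.strip())
        if match:
            start, start_ms, end, end_ms, rest = match.groups()
            # Only timing lines change: the comma becomes a period.
            out_lines.append(f"{start}.{start_ms} --> {end}.{end_ms}{rest}")
        else:
            # Cue numbers, caption text, and blank separator lines pass through.
            out_lines.append(line.rstrip())
    return "\n".join(out_lines) + "\n"
```

The SRT cue numbers are kept as they are, since VTT allows an identifier line before each timing line.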
Publishing Guide
  1. Select your caption format based on where you'll publish (YouTube, web, editing software, etc.).
  2. Click Download to generate your caption file instantly.
  3. Upload the caption file to your platform's caption settings. Most platforms have an "Upload captions" or "Add subtitles" option.
  4. Publish and your captions will be live for viewers.
Pro tip: Download captions in multiple formats to use across different platforms simultaneously.
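Before uploading a downloaded file to a platform, a quick local check can catch timing problems, such as a cue that ends before it starts or two cues that overlap. The Python sketch below works on SRT or VTT text. `find_timing_issues` is a hypothetical helper, not part of Conversent, and it assumes timestamps written as `HH:MM:SS,mmm` or `HH:MM:SS.mmm`.

```python
import re

# Timing line in SRT form ("," before milliseconds) or VTT form (".").
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[.,](\d{3}) --> (\d{2}):(\d{2}):(\d{2})[.,](\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert timestamp parts (given as strings) to total milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def find_timing_issues(caption_text: str) -> list:
    """Return human-readable descriptions of suspicious cue timings."""
    issues = []
    previous_end = None
    cue_number = 0  # counts timing lines in file order, starting at 1
    for line in caption_text.splitlines():
        match = _TIMING.search(line)
        if not match:
            continue  # skip headers, cue numbers, and caption text
        cue_number += 1
        parts = match.groups()
        start, end = _to_ms(*parts[:4]), _to_ms(*parts[4:])
        if end <= start:
            issues.append(f"cue {cue_number}: end time is not after start time")
        if previous_end is not None and start < previous_end:
            issues.append(f"cue {cue_number}: starts before the previous cue ends")
        previous_end = end
    return issues
```

Overlapping cues are sometimes intentional, for example when two people speak at once, so treat the output as a list of items to review rather than definite errors.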
Advanced Features
A. Building your voice profile
  1. As you caption more content, Conversent learns your unique speaking patterns and voice characteristics.
  2. Speaker recognition improves: Conversent gets better at identifying recurring speakers in future episodes.
  3. Caption accuracy increases: Repeated words and phrases are recognized with higher confidence over time.
  4. Processing speeds up: Personalization reduces processing time for subsequent videos as Conversent becomes more familiar with your content.
Learn more about how voice learning works and the benefits of building your caption library in our detailed How Conversent Learns Your Voice guide.
B. Managing speakers
  1. Adding speakers: In the Caption Editor, identify a speaker by name. Conversent will assign all their lines to their profile.
  2. Merging speakers: If the same speaker is identified multiple times with different names, select them and click "Merge" to consolidate.
  3. Searching speakers: Use the filter to find all episodes featuring a specific speaker across your library.
  4. Privacy note: Only assign names to people who have given consent to have their voice modeled and analyzed.
Frequently Asked Questions
How long does captioning take?
Auto-captioning typically completes in 30 minutes to 2 hours depending on file length and complexity. Processing happens in the background while you work.
What if my captions aren't accurate?
Use the Caption Editor to correct any errors; your edits are saved automatically. Conversent learns from your corrections, so recurring errors should become less frequent over time.
Can I upload captions I already have?
Yes. You can import existing SRT, VTT, or ASS files instead of using Auto-Captioning. Conversent will sync them with your media file.
Can I caption content in other languages?
Yes. English is the default language. To caption content in another language, use the "Language Override" setting when uploading.
Is there a limit to how many captions I can create?
There's no limit to the number of episodes or captions you can process with Conversent.
What happens to my caption data?
Your captions are securely stored in your Conversent account. You own all caption files and can download them anytime.

Start captioning today