Conversent help center
How to caption your content
Conversent makes it easy to add accurate captions to your podcasts, videos, lectures, and multimedia content. Whether you're looking to improve accessibility, boost SEO, or engage a broader audience, our caption tools work seamlessly with your workflow.
1. Upload your content
Getting Started
1. Log in to your Conversent dashboard and navigate to your program.
2. Click Upload Content in the Caption Library section.
3. Select your media files. You can upload video files (.mp4, .mov), audio files (.mp3, .wav, .aac), or any media format with embedded audio.
4. Choose whether to enable Auto-Captioning to automatically generate captions using speech recognition.
5. Click Upload, and your files will begin processing in the background.
Supported Formats & Requirements
Video formats
.mp4, .mov
Standard video files with embedded audio tracks.
Audio formats
.mp3, .wav
Any audio file with clear spoken content.
Auto-Captioning processes files in the background and typically completes within 30 minutes to 2 hours depending on file length and complexity.
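If you prepare many files at once, it can help to check their extensions locally before uploading. The following is a minimal sketch, not part of Conversent itself: the function name `classify_media` is hypothetical, and the extension sets simply mirror the formats named in this article (including .aac from the upload steps).

```python
from pathlib import Path

# Extensions copied from this article; this is an illustrative list,
# not an authoritative statement of what the service accepts.
VIDEO_EXTENSIONS = {".mp4", ".mov"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".aac"}


def classify_media(filename: str) -> str:
    """Return 'video', 'audio', or 'unknown' based on the file extension."""
    ext = Path(filename).suffix.lower()  # case-insensitive comparison
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    return "unknown"


if __name__ == "__main__":
    for name in ["episode-12.MP4", "interview.wav", "notes.docx"]:
        print(name, "->", classify_media(name))
```

A file reported as "unknown" may still upload if it contains embedded audio, so treat the result as a quick pre-check rather than a guarantee.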
2. Review & edit captions
Using the Caption Editor
1. Once processing completes, click Open Caption Editor to review your captions.
2. View captions in the timeline-based interface with audio playback. Each caption shows the speaker name, text, and timestamps.
3. Fix text errors by double-clicking any caption line and typing your correction.
4. Adjust timing by clicking and dragging caption boundaries in the timeline. Shift the start or end time as needed.
5. Assign speaker names and manage speaker identification. Tag voices so Conversent can recognize them in future episodes.
Quality Review Checklist
✓ All dialogue is captured accurately
✓ Speaker identification is correct
✓ Timing aligns with actual speech
✓ Proper names and technical terms are spelled correctly
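Part of the timing check can be automated on a downloaded caption file. The sketch below is an illustration under assumptions, not a Conversent feature: `parse_time` and `find_timing_problems` are hypothetical helpers that take cue start and end timestamps in the `HH:MM:SS,mmm` (SRT) or `HH:MM:SS.mmm` (VTT) form and flag cues that have no duration or that start before the previous cue ends. It cannot tell whether captions match the audio, so it supplements listening through the episode rather than replacing it.

```python
import re

# Matches HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT).
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


def parse_time(stamp: str) -> float:
    """Convert a caption timestamp to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def find_timing_problems(cues):
    """Return human-readable warnings for a list of (start, end) pairs.

    Cues are expected in playback order.
    """
    problems = []
    previous_end = 0.0
    for index, (start, end) in enumerate(cues, start=1):
        s, e = parse_time(start), parse_time(end)
        if e <= s:
            problems.append(f"cue {index}: ends at or before it starts")
        if s < previous_end:
            problems.append(f"cue {index}: overlaps the previous cue")
        previous_end = max(previous_end, e)
    return problems


if __name__ == "__main__":
    sample = [
        ("00:00:00,000", "00:00:02,000"),
        ("00:00:01,500", "00:00:03,000"),  # starts before cue 1 ends
        ("00:00:04,000", "00:00:04,000"),  # zero duration
    ]
    for warning in find_timing_problems(sample):
        print(warning)
```

Overlaps are not always errors (two people may speak at once), so each warning is a prompt to look at that cue in the Caption Editor.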
Your changes are automatically saved as you edit. You can pause and resume editing at any time — your progress is preserved.
3. Download & publish
Supported Caption Formats
VTT (Web Video Text Tracks)
Best for YouTube, web players, and HTML5 video. The most compatible format for modern platforms.
SRT (SubRip)
Best for video editing. Widely compatible with Premiere Pro, DaVinci Resolve, and desktop players.
ASS (Advanced SubStation Alpha)
Best for custom styling. Supports speaker colors, fonts, and positioning for specialized formatting.
YTT (YouTube Timed Text)
Best for YouTube-native captions. Direct upload to YouTube without conversion.
Publishing Guide
1. Select your caption format based on where you'll publish (YouTube, web, editing software, etc.).
2. Click Download to generate your caption file instantly.
3. Upload the caption file to your platform's caption settings. Most platforms have an "Upload captions" or "Add subtitles" option.
4. Publish, and your captions will be live for viewers.
Pro tip: Download captions in multiple formats to use across different platforms simultaneously.
Advanced Features
A. Building your voice profile
1. As you caption more content, Conversent learns your unique speaking patterns and voice characteristics.
2. Speaker recognition improves: Conversent gets better at identifying recurring speakers in future episodes.
3. Caption accuracy increases: Repeated words and phrases are recognized with higher confidence over time.
4. Processing speeds up: Personalization reduces processing time for subsequent videos as Conversent becomes more familiar with your content.
Learn more about how voice learning works and the benefits of building your caption library in our detailed How Conversent Learns Your Voice guide.
B. Managing speakers
1. Adding speakers: In the Caption Editor, identify a speaker by name. Conversent will assign all their lines to their profile.
2. Merging speakers: If the same speaker is identified multiple times with different names, select them and click "Merge" to consolidate.
3. Searching speakers: Use the filter to find all episodes featuring a specific speaker across your library.
4. Privacy note: Only assign names to people who have given consent to have their voice modeled and analyzed.
Frequently Asked Questions
How long does captioning take?
Auto-captioning typically completes in 30 minutes to 2 hours depending on file length and complexity. Processing happens in the background while you work.
What if my captions aren't accurate?
Use the Caption Editor to correct any errors. Your edits are saved automatically, and Conversent learns from your corrections, so recurring issues should become less frequent over time.
Can I upload captions I already have?
Yes. You can import existing SRT, VTT, or ASS files instead of using Auto-Captioning. Conversent will sync them with your media file.
Can I caption content in other languages?
English is supported by default. Use the "Language Override" setting when uploading to specify a different language.
Is there a limit to how many captions I can create?
There's no limit to the number of episodes or captions you can process with Conversent.
What happens to my caption data?
Your captions are securely stored in your Conversent account. You own all caption files and can download them anytime.