Working with Transcripts

pycamtasia can parse word-level transcripts from two sources: TechSmith Audiate and WhisperX. Both yield a Transcript object with the same API, built around a list of Word objects that carry start and end timestamps.

From Audiate

Audiate is TechSmith’s companion app for recording and transcribing voiceovers. The .audiate file uses the same JSON schema as Camtasia .tscproj files, with transcription data stored as keyframes.

from camtasia.audiate import AudiateProject

project = AudiateProject('path/to/file.audiate')
for word in project.transcript.words:
    print(f"{word.start:.2f}s: {word.text}")

AudiateProject also exposes:

- `project.language`: language code (e.g. `'en'`)
- `project.audio_duration`: total audio length in seconds
- `project.source_audio_path`: resolved path to the source audio file
- `project.session_id`: UUID for linking back to a Camtasia session

From WhisperX

WhisperX is a free, open-source speech recognition pipeline built on Whisper. It runs locally and produces word-level timestamps. Install it separately (pip install whisperx).

import whisperx
from camtasia.audiate import Transcript

# Load the Whisper model on the CPU with int8 quantization
model = whisperx.load_model('large-v3', 'cpu', compute_type='int8')
audio = whisperx.load_audio('voiceover.wav')

# First pass: transcription with segment-level timing only
result = model.transcribe(audio, batch_size=4, language='en')

# Second pass: forced alignment adds word-level timestamps
model_a, metadata = whisperx.load_align_model(language_code='en', device='cpu')
result = whisperx.align(result['segments'], model_a, metadata, audio, 'cpu')

# Convert the aligned result into a pycamtasia Transcript
transcript = Transcript.from_whisperx_result(result)
print(f"{len(transcript.words)} words, duration: {transcript.duration:.1f}s")

The alignment step (whisperx.align) is essential: it produces the word-level timestamps that pycamtasia needs. Without it, the result contains only segment-level timing.
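
To make the difference concrete, the sketch below walks a result dict and collects its per-word entries. It assumes the shape that WhisperX's aligned output typically has: a `segments` list whose items carry a `words` list with `word`, `start`, and `end` keys. The helper name `collect_words` and the sample data are hypothetical, and this is not how pycamtasia itself parses the result.

```python
from typing import Any


def collect_words(result: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Flatten a result dict into (text, start, end) tuples.

    Hypothetical helper for illustration only. Entries that carry no
    timestamps are skipped.
    """
    words = []
    for segment in result.get("segments", []):
        # Before alignment a segment typically has no "words" key,
        # only its own start/end/text.
        for entry in segment.get("words", []):
            if "start" in entry and "end" in entry:
                words.append((entry["word"], entry["start"], entry["end"]))
    return words


# Made-up sample data shaped like an aligned result.
aligned = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": "click submit",
            "words": [
                {"word": "click", "start": 0.1, "end": 0.4},
                {"word": "submit", "start": 0.5, "end": 1.1},
            ],
        }
    ]
}

# The same segment as it might look before the alignment step.
unaligned = {"segments": [{"start": 0.0, "end": 1.2, "text": "click submit"}]}

print(collect_words(aligned))
print(collect_words(unaligned))
```

The unaligned input yields no words at all, which is why skipping the alignment step leaves nothing for a word-level transcript to be built from.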

Transcript API

Once you have a Transcript, the API is the same regardless of source:

Properties

- `transcript.words`: list of Word objects
- `transcript.full_text`: all words joined by spaces
- `transcript.duration`: end time of the last word, in seconds
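
As a rough mental model, the two derived properties can be computed from the word list alone. The sketch below uses a stand-in `Word` dataclass (not pycamtasia's class) and hypothetical helpers `full_text` and `duration`. Falling back to `start` when the last word has no `end` is an assumption of this sketch, not documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """Stand-in for illustration; mirrors a subset of the documented fields."""
    text: str
    start: float
    end: float | None = None


def full_text(words: list[Word]) -> str:
    """Join all word texts with single spaces."""
    return " ".join(w.text for w in words)


def duration(words: list[Word]) -> float:
    """End time of the last word, or 0.0 for an empty list."""
    if not words:
        return 0.0
    last = words[-1]
    # Assumption: use the start time if the end time is unavailable.
    return last.end if last.end is not None else last.start


words = [
    Word("click", 0.10, 0.40),
    Word("the", 0.45, 0.55),
    Word("button", 0.60, 1.05),
]
print(full_text(words))
print(f"{duration(words):.2f}s")
```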

find_phrase

Find the first word where a phrase begins:

word = transcript.find_phrase("click the submit button")
if word:
    print(f"Found at {word.start:.2f}s")

Matching is case-insensitive and checks consecutive words.
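
One way to picture case-insensitive matching over consecutive words is a sliding window across the word list. The sketch below is an illustration under that assumption, with a stand-in `Word` dataclass and a free function named `find_phrase` for readability; it is not pycamtasia's implementation, and it does not strip punctuation, which the real method may handle differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """Stand-in for illustration; not pycamtasia's Word class."""
    text: str
    start: float


def find_phrase(words: list[Word], phrase: str) -> Word | None:
    """Return the first word of the first match, or None if there is none."""
    target = phrase.lower().split()
    if not target:
        return None
    lowered = [w.text.lower() for w in words]
    # Slide a window of len(target) words across the transcript.
    for i in range(len(lowered) - len(target) + 1):
        if lowered[i:i + len(target)] == target:
            return words[i]
    return None


words = [
    Word("Now", 0.0),
    Word("click", 0.4),
    Word("the", 0.7),
    Word("Submit", 0.9),
    Word("button", 1.3),
]
match = find_phrase(words, "click the submit")
print(match.start if match else "not found")
```

Returning the first word of the match, rather than an index, keeps the caller's code the same as in the example above: the start time is read directly from the returned object.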

words_in_range

Get all words within a time window:

segment = transcript.words_in_range(10.0, 20.0)
for w in segment:
    print(f"  {w.start:.2f}s: {w.text}")
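
The window filter can be sketched as a simple comparison on each word's start time. The boundary rule used here (a word is included when its start falls within the closed interval from `start` to `end`) is an assumption of this sketch; the library's exact handling of words that straddle a boundary is not specified above. The `Word` dataclass is a stand-in.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Stand-in for illustration; not pycamtasia's Word class."""
    text: str
    start: float


def words_in_range(words: list[Word], start: float, end: float) -> list[Word]:
    """Return the words whose start time lies in [start, end]."""
    # Assumption: membership is decided by the word's start time only.
    return [w for w in words if start <= w.start <= end]


words = [
    Word("open", 9.8),
    Word("the", 10.0),
    Word("menu", 10.4),
    Word("then", 20.5),
]
selected = words_in_range(words, 10.0, 20.0)
print([w.text for w in selected])
```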

Word fields

Each Word has:

| Field | Type | Description |
| --- | --- | --- |
| `text` | `str` | The word text |
| `start` | `float` | Start time in seconds |
| `end` | `float \| None` | End time in seconds (None if unavailable) |
| `word_id` | `str` | Unique identifier |
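
Because `end` can be None, code that needs a span for every word has to choose a fallback. The sketch below shows one possible choice: borrow the next word's start time. The `Word` dataclass is a stand-in that mirrors the fields in the table, and `effective_end` is a hypothetical helper, not part of pycamtasia.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """Stand-in mirroring the documented fields; not pycamtasia's class."""
    text: str
    start: float
    end: float | None
    word_id: str


def effective_end(words: list[Word], index: int) -> float:
    """Best-effort end time for words[index].

    Assumption for this sketch: when end is None, use the next word's
    start, or the word's own start if it is the last word.
    """
    word = words[index]
    if word.end is not None:
        return word.end
    if index + 1 < len(words):
        return words[index + 1].start
    return word.start


words = [
    Word("press", 0.0, 0.3, "w1"),
    Word("enter", 0.35, None, "w2"),
    Word("now", 0.8, None, "w3"),
]
ends = [effective_end(words, i) for i in range(len(words))]
print(ends)
```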

Audiate vs WhisperX

Both produce transcripts of similar quality. The main differences:

| | Audiate | WhisperX |
| --- | --- | --- |
| Cost | Paid (TechSmith subscription) | Free / open source |
| Runs | Cloud | Locally (CPU or GPU) |
| Integration | Native `.audiate` file | Requires alignment step |
| Languages | Multiple | Multiple (large-v3) |

With the large-v3 model, WhisperX's word-level accuracy is comparable to Audiate's. If you're already using Audiate for recording, use its transcript directly. Otherwise, WhisperX is a solid free alternative.