# Working with Transcripts pycamtasia can parse word-level transcripts from two sources: TechSmith Audiate and WhisperX. Both produce a `Transcript` object with the same API — a list of `Word` objects with start/end timestamps. ## From Audiate Audiate is TechSmith's companion app for recording and transcribing voiceovers. The `.audiate` file uses the same JSON schema as Camtasia `.tscproj` files, with transcription data stored as keyframes. ```python from camtasia.audiate import AudiateProject project = AudiateProject('path/to/file.audiate') for word in project.transcript.words: print(f"{word.start:.2f}s: {word.text}") ``` `AudiateProject` also exposes: - `project.language` — language code (e.g. `'en'`) - `project.audio_duration` — total audio length in seconds - `project.source_audio_path` — resolved path to the source audio file - `project.session_id` — UUID for linking back to a Camtasia session ## From WhisperX WhisperX is a free, locally-run speech recognition model that produces word-level timestamps. Install it separately (`pip install whisperx`). ```python import whisperx from camtasia.audiate import Transcript model = whisperx.load_model('large-v3', 'cpu', compute_type='int8') audio = whisperx.load_audio('voiceover.wav') result = model.transcribe(audio, batch_size=4, language='en') model_a, metadata = whisperx.load_align_model(language_code='en', device='cpu') result = whisperx.align(result['segments'], model_a, metadata, audio, 'cpu') transcript = Transcript.from_whisperx_result(result) print(f"{len(transcript.words)} words, duration: {transcript.duration:.1f}s") ``` The alignment step (`whisperx.align`) is critical — it produces the word-level timestamps that pycamtasia needs. Without it, you only get segment-level timing. ## Transcript API Once you have a `Transcript`, the API is the same regardless of source: ### Properties - `transcript.words` — list of `Word` objects - `transcript.full_text` — all words joined by spaces - `transcript.duration` — time of the last word's end (seconds) ### find_phrase Find the first word where a phrase begins: ```python word = transcript.find_phrase("click the submit button") if word: print(f"Found at {word.start:.2f}s") ``` Matching is case-insensitive and checks consecutive words. ### words_in_range Get all words within a time window: ```python segment = transcript.words_in_range(10.0, 20.0) for w in segment: print(f" {w.start:.2f}s: {w.text}") ``` ### Word fields Each `Word` has: | Field | Type | Description | |-------|------|-------------| | `text` | `str` | The word text | | `start` | `float` | Start time in seconds | | `end` | `float \| None` | End time in seconds (None if unavailable) | | `word_id` | `str` | Unique identifier | ## Audiate vs WhisperX Both produce similar quality transcripts. The main differences: | | Audiate | WhisperX | |---|---------|----------| | Cost | Paid (TechSmith subscription) | Free / open source | | Runs | Cloud | Locally (CPU or GPU) | | Integration | Native `.audiate` file | Requires alignment step | | Languages | Multiple | Multiple (large-v3) | WhisperX with the `large-v3` model produces comparable word-level accuracy to Audiate. If you're already using Audiate for recording, use its transcript directly. Otherwise, WhisperX is a solid free alternative.