Format Specification
SpeechDown provides a clear set of conventions for structuring your audio notes.
1. File Naming Convention
- Date-based naming: Files are named using the `YYYY-MM-DD.md` format for easy chronological sorting.
- Optional suffix: When there is more than one note file for the same day, add a descriptive suffix, such as `YYYY-MM-DD-afternoon.md`.
- Directory Structure: Files should be stored in a consistent directory, such as `notes/`.
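The naming convention can be captured in two small helpers, sketched below in Python. The names `note_filename` and `parse_note_filename` are hypothetical and are not part of the SpeechDown CLI. The set of characters allowed in a suffix (letters, digits, `_`, `-`) is also an assumption.

```python
from __future__ import annotations

import re
from datetime import date

# Matches "YYYY-MM-DD.md" or "YYYY-MM-DD-<suffix>.md".
# Assumption: a suffix uses only letters, digits, "_" and "-".
NAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})(?:-([A-Za-z0-9_-]+))?\.md$")


def note_filename(day: date, suffix: str | None = None) -> str:
    """Build a date-based file name (hypothetical helper)."""
    stem = day.isoformat()  # date.isoformat() yields YYYY-MM-DD
    return f"{stem}-{suffix}.md" if suffix else f"{stem}.md"


def parse_note_filename(name: str) -> tuple[date, str | None]:
    """Split a file name back into its date and optional suffix."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a date-based note name: {name!r}")
    return date.fromisoformat(match.group(1)), match.group(2)


if __name__ == "__main__":
    names = [
        note_filename(date(2025, 6, 18), "afternoon"),
        note_filename(date(2025, 6, 17)),
        note_filename(date(2025, 6, 18)),
    ]
    # Plain string sorting groups the files by day, oldest first.
    print(sorted(names))
    # → ['2025-06-17.md', '2025-06-18-afternoon.md', '2025-06-18.md']
```

Because the date comes first and is written in ISO order, ordinary string sorting is chronological by day. Within a single day, however, the order follows the suffix text rather than the time of recording.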
2. Content Structure
- YAML Frontmatter: Each file begins with YAML frontmatter containing essential metadata.
- Timestamped Sections: Audio segments are organized under H2 headings with `## HH:MM:SS` timestamps.
- Transcribed Content: The transcribed text for each segment follows its timestamped heading.
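To show how the timestamped sections can be read back, the following sketch splits a note body on its `## HH:MM:SS` headings. It assumes the frontmatter has already been removed and that each heading line contains only the timestamp. `split_sections` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# A segment heading is an H2 line holding only an HH:MM:SS timestamp.
HEADING_RE = re.compile(r"^## (\d{2}:\d{2}:\d{2})[ \t]*$", re.MULTILINE)


def split_sections(body: str) -> list[tuple[str, str]]:
    """Return (timestamp, text) pairs, one per timestamped section.

    `body` is the note content after the frontmatter. In this sketch,
    any text before the first heading is ignored.
    """
    matches = list(HEADING_RE.finditer(body))
    sections = []
    for i, match in enumerate(matches):
        start = match.end()
        # A section runs until the next heading or the end of the body.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        sections.append((match.group(1), body[start:end].strip()))
    return sections


if __name__ == "__main__":
    body = (
        "## 09:15:23\n"
        "Quick thoughts about the project direction...\n"
        "## 09:18:45\n"
        "Switching to French: Quelques idées sur...\n"
    )
    for timestamp, text in split_sections(body):
        print(timestamp, "->", text)
```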
3. Metadata Standards
The frontmatter includes key details for context and automation:
- `date`: The creation date of the note.
- `languages`: A list of languages detected in the audio (e.g., `[en, fr]`).
- `source`: The name of the original audio file.
- `processed_at`: The timestamp when the file was processed.
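For the simple shapes used in SpeechDown frontmatter, the metadata can be read without a YAML library. The hypothetical `parse_frontmatter` below handles only flat `key: value` lines and one-line `[a, b]` lists, and keeps every scalar value as a string. For general YAML, a real parser such as PyYAML's `yaml.safe_load` is the safer choice.

```python
from __future__ import annotations


def parse_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split a note into (metadata, body). Hypothetical, minimal parser.

    Supports only `key: value` lines and one-line `[a, b]` flow lists;
    scalar values are returned as strings, not parsed as dates.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    meta: dict[str, object] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "".join(lines[i + 1:])
        # partition() splits on the FIRST colon, so values such as
        # 2025-06-18T09:30:00Z keep their own colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that are not key: value
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            meta[key.strip()] = [v.strip() for v in items if v.strip()]
        else:
            meta[key.strip()] = value
    raise ValueError("frontmatter is not closed with '---'")


if __name__ == "__main__":
    sample = "---\ndate: 2025-06-18\nlanguages: [en, fr]\n---\n## 09:15:23\n...\n"
    meta, body = parse_frontmatter(sample)
    print(meta["languages"])  # → ['en', 'fr']
```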
Example Format:
```markdown
---
date: 2025-06-18
languages: [en, fr]
source: morning-notes.m4a
processed_at: 2025-06-18T09:30:00Z
---
## 09:15:23
Quick thoughts about the project direction...
## 09:18:45
Switching to French: Quelques idées sur...
```
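Going the other way, a note in this layout can be generated from a metadata mapping and a list of segments. The hypothetical `render_note` below reproduces the example exactly: a one-line `languages` list and no blank lines between sections. Whether the reference CLI adds blank lines or extra fields is not stated here, so treat this layout as an assumption.

```python
from __future__ import annotations


def render_note(meta: dict[str, object], segments: list[tuple[str, str]]) -> str:
    """Render frontmatter plus timestamped sections (hypothetical helper).

    Assumption: layout mirrors the example, with list values written as
    one-line YAML flow lists and no blank lines between sections.
    """
    out = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            value = "[" + ", ".join(value) + "]"  # e.g. [en, fr]
        out.append(f"{key}: {value}")
    out.append("---")
    for timestamp, text in segments:
        out.append(f"## {timestamp}")
        out.append(text)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    meta = {
        "date": "2025-06-18",
        "languages": ["en", "fr"],
        "source": "morning-notes.m4a",
        "processed_at": "2025-06-18T09:30:00Z",
    }
    segments = [
        ("09:15:23", "Quick thoughts about the project direction..."),
        ("09:18:45", "Switching to French: Quelques idées sur..."),
    ]
    print(render_note(meta, segments), end="")
```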
Why Standardize Your Voice Notes?
Tool Independence
Your notes are plain-text Markdown files, so they outlive any single application. You can migrate between tools without losing your data.
Consistent Organization
A predictable, date-based file structure makes your notes easy to browse, search, and integrate with other systems.
Searchability & Accessibility
Timestamped sections let you refer precisely to individual audio segments. Markdown is widely supported, and plain-text files work well with Git.
Multilingual Support
The standardized `languages` field records which languages each note contains, making it easy to manage notes in several languages within one consistent format.
Implementations
The SpeechDown format is implementation-agnostic, but a reference CLI tool is available to get you started.
- Reference Implementation: The SpeechDown CLI is the primary tool demonstrating the format's conventions in practice.
- Community Implementations: Developers are encouraged to build their own tools, scripts, and integrations that follow the SpeechDown format.
- Integration Possibilities: The standardized format is ideal for creating custom scripts, automation workflows, and plugins for note-taking apps.
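As an example of such a custom script, the sketch below searches every note in a directory for a term and reports the file name and section timestamp of each matching line. It assumes all notes sit directly in one directory (such as `notes/`) and that the frontmatter comes before the first timestamped heading. `search_notes` is a hypothetical helper, not a SpeechDown CLI command.

```python
from __future__ import annotations

import re
from pathlib import Path

# A segment heading is an H2 line holding only an HH:MM:SS timestamp.
HEADING_RE = re.compile(r"^## (\d{2}:\d{2}:\d{2})[ \t]*$")


def search_notes(notes_dir: str | Path, term: str) -> list[tuple[str, str, str]]:
    """Return (file name, section timestamp, matching line) for each hit.

    Hypothetical helper: case-insensitive substring match. Files are
    visited in name order, which is chronological for date-based names.
    """
    hits = []
    needle = term.lower()
    for path in sorted(Path(notes_dir).glob("*.md")):
        timestamp = None  # no section yet, e.g. still in the frontmatter
        for line in path.read_text(encoding="utf-8").splitlines():
            heading = HEADING_RE.match(line)
            if heading:
                timestamp = heading.group(1)
            elif timestamp and needle in line.lower():
                hits.append((path.name, timestamp, line.strip()))
    return hits


if __name__ == "__main__":
    for name, timestamp, line in search_notes("notes", "project"):
        print(f"{name} @ {timestamp}: {line}")
```

Skipping lines before the first heading keeps frontmatter values such as `source` out of the results.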
Getting Started
You can adopt the SpeechDown format manually or use the reference CLI for an automated workflow.
1. Adopt the Format Manually
Start by creating `.md` files following the naming and content structure defined in the Format Specification. This is a great way to integrate SpeechDown with your existing note-taking system.
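For a manual workflow, a small helper can append a new timestamped section to the current day's file. In this sketch, `append_segment` is hypothetical, and writing only a `date` field into the frontmatter of a new file is an assumption: a note typed by hand has no audio `source` or `processed_at` value.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def append_segment(notes_dir: str | Path, when: datetime, text: str) -> Path:
    """Append a `## HH:MM:SS` section to the note for `when`'s date.

    Hypothetical helper. Creates the directory, and a file with minimal
    `date`-only frontmatter (an assumption), if they do not exist yet.
    """
    directory = Path(notes_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{when:%Y-%m-%d}.md"  # date-based file name
    if not path.exists():
        path.write_text(f"---\ndate: {when:%Y-%m-%d}\n---\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {when:%H:%M:%S}\n{text.strip()}\n")
    return path
```

Calling `append_segment("notes", datetime.now(), "Quick thoughts...")` would add a section stamped with the current local time to today's file.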
2. Quick Implementation with the CLI
For a fully automated experience, use the SpeechDown CLI. Install it with pipx:
```shell
pipx install git+https://github.com/dudarev/speechdown
```
Then, initialize your notes directory and start transcribing:
```shell
sd init
sd transcribe path/to/your/audio.m4a
```
Use Cases
Personal Voice Journaling
Maintain a consistent, timestamped daily journal of your thoughts and reflections, accessible on any device.
Meeting & Interview Notes
Create standardized, searchable transcripts of meetings and interviews for easy sharing and collaboration.
Multilingual Content Creation
Manage voice notes in multiple languages with consistent metadata, perfect for international teams.
Research & Study Notes
Compile and organize voice memos for research projects, with a timestamped system for precise referencing.