SpeechDown: A Format for Voice Notes

SpeechDown is a standardized format for organizing spoken audio notes into timestamped, structured Markdown files. It defines conventions for file organization, content structure, and metadata so that your voice notes stay accessible and searchable in any tool or workflow that can read plain text.

Format Specification

SpeechDown provides a clear set of conventions for structuring your audio notes.

1. File Naming Convention

  • Date-based naming: Files are named using the `YYYY-MM-DD.md` format for easy chronological sorting.
  • Optional suffix: For multiple note files on the same day, a descriptive suffix can be added, like `YYYY-MM-DD-afternoon.md`.
  • Directory Structure: Files should be stored in a consistent directory, such as `notes/`.
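
The naming rules above can be sketched in a few lines of Python. This is an illustration only, not part of the SpeechDown tooling: the helper names `note_filename` and `parse_note_filename` are hypothetical, and the allowed characters for the suffix (lowercase letters, digits, hyphens) are an assumption, since the format only asks for a descriptive suffix.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed pattern: YYYY-MM-DD.md with an optional hyphenated suffix.
# The suffix character set is an assumption, not something the format defines.
NOTE_NAME = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})(?:-([a-z0-9]+(?:-[a-z0-9]+)*))?\.md$"
)


def note_filename(day: date, suffix: Optional[str] = None) -> str:
    """Build a date-based file name, optionally with a descriptive suffix."""
    name = day.isoformat()
    if suffix:
        name += f"-{suffix}"
    return name + ".md"


def parse_note_filename(name: str) -> Optional[Tuple[date, Optional[str]]]:
    """Return (date, suffix) if the name follows the convention, else None."""
    match = NOTE_NAME.match(name)
    if match is None:
        return None
    year, month, day, suffix = match.groups()
    try:
        return date(int(year), int(month), int(day)), suffix
    except ValueError:
        # The name has the right shape but is not a real calendar date.
        return None
```

Because the date comes first and is zero-padded, sorting the file names as plain strings also sorts them chronologically.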

2. Content Structure

  • YAML Frontmatter: Each file begins with YAML frontmatter containing essential metadata.
  • Timestamped Sections: Audio segments are organized under H2 headings with `## HH:MM:SS` timestamps.
  • Transcribed Content: The transcribed text for each segment follows its timestamped heading.
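
To show how this structure can be consumed, here is a minimal Python sketch that splits a note body into timestamped segments. It assumes the frontmatter has already been removed and that each heading is exactly `## HH:MM:SS`. The function name `split_sections` is hypothetical, and ignoring any text before the first heading is a choice made for this sketch rather than a rule of the format.

```python
import re
from typing import List, Tuple

# An H2 heading that consists only of an HH:MM:SS timestamp.
HEADING = re.compile(r"^## (\d{2}:\d{2}:\d{2})\s*$")


def split_sections(body: str) -> List[Tuple[str, str]]:
    """Split a note body into (timestamp, transcribed text) pairs."""
    sections: List[Tuple[str, List[str]]] = []
    for line in body.splitlines():
        match = HEADING.match(line)
        if match:
            # A new timestamped heading starts a new segment.
            sections.append((match.group(1), []))
        elif sections:
            # Text belongs to the most recent heading; lines before the
            # first heading are ignored in this sketch.
            sections[-1][1].append(line)
    return [(stamp, "\n".join(lines).strip()) for stamp, lines in sections]
```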

3. Metadata Standards

The frontmatter includes key details for context and automation:

  • `date`: The creation date of the note, written as `YYYY-MM-DD`.
  • `languages`: A list of languages detected in the audio (e.g., `[en, fr]`).
  • `source`: The name of the original audio file (e.g., `morning-notes.m4a`).
  • `processed_at`: The timestamp when the file was processed (e.g., `2025-06-18T09:30:00Z`).

Example Format:

---
date: 2025-06-18
languages: [en, fr]
source: morning-notes.m4a
processed_at: 2025-06-18T09:30:00Z
---

## 09:15:23
Quick thoughts about the project direction...

## 09:18:45
Switching to French: Quelques idées sur...
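
A file like the example above can be read with a few lines of standard-library Python. The sketch below is deliberately limited: it only understands flat `key: value` lines and inline lists such as `[en, fr]`, and it keeps every value as a string. A real tool would more likely hand the frontmatter to a full YAML parser. The function name `read_frontmatter` is hypothetical.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, List[str]]


def read_frontmatter(text: str) -> Tuple[Dict[str, Value], str]:
    """Return (metadata, body) for a note that starts with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # unterminated frontmatter: treat everything as body

    meta: Dict[str, Value] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, raw = line.partition(":")
        if not sep:
            continue  # not a key: value line
        raw = raw.strip()
        if raw.startswith("[") and raw.endswith("]"):
            # Inline list such as [en, fr]
            meta[key.strip()] = [
                item.strip() for item in raw[1:-1].split(",") if item.strip()
            ]
        else:
            meta[key.strip()] = raw

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```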

Why Standardize Your Voice Notes?

Tool Independence

Your notes are in plain text Markdown, ensuring they outlive any single application. Migrate between tools without losing your data.

Consistent Organization

A predictable, date-based file structure makes your notes easy to browse, search, and integrate with other systems.

Searchability & Accessibility

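Timestamped headings let you reference a specific audio segment precisely. Because the notes are plain-text Markdown, they open in any text editor, can be searched with standard tools, and work well under version control such as Git.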
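
As an illustration of timestamp-level referencing, the following Python sketch searches a directory of notes and reports each hit together with the file and the timestamped heading it falls under. It assumes one directory of `.md` files with `## HH:MM:SS` headings; the function name `search_notes` is hypothetical and not part of any SpeechDown tool.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

HEADING = re.compile(r"^## (\d{2}:\d{2}:\d{2})\s*$")


def search_notes(notes_dir: Path, term: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (file name, timestamp, line) for each line containing the term."""
    needle = term.lower()
    # Date-based names sort chronologically, so sorted() gives oldest first.
    for path in sorted(notes_dir.glob("*.md")):
        stamp = None
        for line in path.read_text(encoding="utf-8").splitlines():
            match = HEADING.match(line)
            if match:
                stamp = match.group(1)
            elif stamp is not None and needle in line.lower():
                # Frontmatter sits before the first heading, so it is skipped.
                yield path.name, stamp, line.strip()
```

Each result can then be cited as a file name plus a timestamp, for example `2025-06-18.md` at `09:15:23`.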

Multilingual Support

Standardized metadata for language handling makes it easy to manage notes in multiple languages within a consistent format.

Implementations

The SpeechDown format is implementation-agnostic, but a reference CLI tool is available to get you started.

  • Reference Implementation: The SpeechDown CLI is the primary tool demonstrating the format's conventions in practice.
  • Community Implementations: Developers are encouraged to build their own tools, scripts, and integrations that follow the SpeechDown format.
  • Integration Possibilities: The standardized format is ideal for creating custom scripts, automation workflows, and plugins for note-taking apps.

Getting Started

You can adopt the SpeechDown format manually or use the reference CLI for an automated workflow.

1. Adopt the Format Manually

Start by creating `.md` files following the naming and content structure defined in the Format Specification. This is a great way to integrate SpeechDown with your existing note-taking system.
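
If you keep notes by hand, a small script can take care of the file name and heading for you. The Python sketch below appends a timestamped section to the day's file and creates the file with minimal frontmatter if it does not exist yet. The function `append_entry` is hypothetical, and leaving out `source` and `processed_at` for hand-written notes is an assumption of this sketch, not something the format prescribes.

```python
from datetime import datetime
from pathlib import Path
from typing import Sequence


def append_entry(
    notes_dir: Path,
    text: str,
    now: datetime,
    languages: Sequence[str] = ("en",),
) -> Path:
    """Append a timestamped section to the day's note, creating it if needed."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / f"{now:%Y-%m-%d}.md"
    if not path.exists():
        # Minimal frontmatter for a hand-written note (assumption: `source`
        # and `processed_at` are omitted because no audio was processed).
        path.write_text(
            f"---\ndate: {now:%Y-%m-%d}\nlanguages: [{', '.join(languages)}]\n---\n",
            encoding="utf-8",
        )
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"\n## {now:%H:%M:%S}\n{text.strip()}\n")
    return path
```

Calling `append_entry(Path("notes"), "Quick thoughts...", datetime.now())` adds a new section to today's file under `notes/`.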

2. Quick Implementation with the CLI

For a fully automated experience, use the SpeechDown CLI. Install it with pipx:

pipx install git+https://github.com/dudarev/speechdown

Then, initialize your notes directory and start transcribing:

sd init
sd transcribe path/to/your/audio.m4a

Use Cases

Personal Voice Journaling

Maintain a consistent, timestamped daily journal of your thoughts and reflections, readable on any device that can open a text file.

Meeting & Interview Notes

Create standardized, searchable transcripts of meetings and interviews for easy sharing and collaboration.

Multilingual Content Creation

Manage voice notes in multiple languages with consistent metadata, perfect for international teams.

Research & Study Notes

Compile and organize voice memos for research projects, with a timestamped system for precise referencing.

Newsletter

Follow the evolution of the SpeechDown open standard. Our newsletter delivers format updates, adopter showcases, and a curated look at the best voice-first tools and workflows for making your spoken ideas organized and actionable.