Audio recordings

Paper transcribes audio files with Whisper and treats the transcript as a regular source: it can be summarised, used to make flashcards, or chatted with.

Upload audio

Open a page.
In the Sources panel, click + Add source → Upload a file.
Pick the file. Common audio formats work - .mp3, .m4a, .wav, .ogg, .flac.
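
If you are preparing a batch of files, a quick local check of the extension can save a failed upload. This is a minimal sketch, not Paper's own validation: the function name is hypothetical, and the set contains only the extensions listed above (Paper may accept others).

```python
from pathlib import Path

# Extensions named in the step above. Assumption: this list is not exhaustive.
SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file name ends in one of the listed audio extensions."""
    # suffix is the final dot-extension; lower() makes the check case-insensitive.
    return Path(filename).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS
```

For example, `is_supported_audio("Lecture 01.MP3")` is true and `is_supported_audio("notes.pdf")` is false. The check looks only at the name, not at the file's contents.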

There are two paths depending on size:

Short recordings (up to 25 MB - roughly an hour of mono MP3) are sent straight to Whisper.
Longer recordings (over 25 MB, up to 500 MB) are uploaded to storage first; the audio is then extracted and transcribed on the server. This takes a few minutes.
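
To see which path a recording will take before you upload it, the two limits above can be sketched as a small helper. The function names and the "direct"/"server" labels are hypothetical, and the sketch assumes decimal megabytes (1 MB = 1,000,000 bytes); Paper's actual cut-off may be computed differently.

```python
MB = 1_000_000  # assumption: decimal megabytes

DIRECT_LIMIT_BYTES = 25 * MB   # short recordings go straight to Whisper
SERVER_LIMIT_BYTES = 500 * MB  # longer recordings are processed on the server


def choose_upload_path(size_bytes: int) -> str:
    """Return "direct" or "server" for a file size, or raise if it cannot be uploaded."""
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes <= DIRECT_LIMIT_BYTES:
        return "direct"
    if size_bytes <= SERVER_LIMIT_BYTES:
        return "server"
    raise ValueError("file exceeds the 500 MB limit")


def estimated_size_bytes(duration_seconds: float, bitrate_kbps: float) -> int:
    """Estimate the size of a constant-bitrate recording (kbps = 1000 bits per second)."""
    return round(duration_seconds * bitrate_kbps * 1000 / 8)
```

Under these assumptions, an hour of audio at 48 kbps comes to 21.6 MB and stays on the direct path, while the same hour at 64 kbps comes to 28.8 MB and takes the server path. That is why the one-hour figure is only approximate: it depends on the bitrate of the recording.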

After transcription

Read the transcript alongside an audio player. Click any line to jump to that point.
Chapters are auto-generated - Paper groups the transcript into 5–15 topic sections with timestamps and a short summary of each.
Summarise to get the key points without listening end-to-end.
Quote in chat - citations include timestamps so you can check each quote against the recording.

Tips

Noisy audio lowers transcription accuracy. Fix obvious errors in the transcript before generating flashcards or summaries - both are built from the corrected text.
For your own voice memos, record close to the mic and avoid background noise; both make a big difference.
For a series of recordings (multi-part interviews, weekly lectures), add each as its own source on the same page so chat can reason across them all.
