How to Get Live Captions for Discord Voice Chat in 2026
Discord has text-to-speech — type /tts and the robot reads your message aloud. But what about the other direction? If your squad is shouting callouts in a Valorant match and you can't hear them, Discord offers you... nothing.
There's no built-in way to see what people are saying in a voice channel as text on your screen. The feature request has been sitting on Discord's feedback forum for years. If you're deaf, hard of hearing, a non-native English speaker, or just someone gaming at 2am who can't turn on speakers — you're left out of voice chat entirely.
The good news: several tools now solve this problem. Some are Discord bots that join your voice channel and transcribe. Some are desktop apps that caption any audio on your computer. Each approach has real tradeoffs in accuracy, latency, privacy, and setup complexity.
This guide covers every option available in 2026, with honest pros and cons, so you can pick what actually works for your situation.
The Quick Comparison
Before diving into details, here's how every approach stacks up:
| Tool | How it works | Speaker labels | Gaming overlay | Latency | Languages | Price |
|---|---|---|---|---|---|---|
| CaptionsRush (bot mode) | Personal Discord bot you create | Yes, per user | Yes | <50 ms | 125+ | Free beta |
| CaptionsRush (system audio) | Captures all desktop audio | No | Yes | <50 ms | 125+ | Free beta |
| Scripty | Server-wide Discord bot | No | No | Fast | 55 | Free |
| Scriptly | Server-wide Discord bot | Yes (premium) | No | Near real-time | Multiple | Free / $4.99-49.99/mo |
| SeaVoice | Server-wide Discord bot | Yes | No | Near real-time | 12 | Free |
| Windows Live Captions | Built into Windows 11 | No | No | ~200 ms | 40+ (translation on Copilot+ PCs) | Free |
| Mac Live Captions | Built into macOS (Apple Silicon) | No | No | ~200 ms | Limited | Free |
Now let's look at each one in detail.
Option 1: CaptionsRush
Best for: Gamers who want captions without leaving their game, deaf/HoH players who need low latency and speaker identification.
CaptionsRush is a desktop app built specifically for gaming voice chat. It offers two modes, depending on how much setup you want to do.
Bot Mode (Recommended for Gaming)
You create a personal Discord bot through Discord's Developer Portal and connect it to CaptionsRush. This takes about 5 minutes the first time, and then the bot stays in your servers permanently. When you want captions, you type /join in any text channel and the bot enters your voice channel.
Why bother with the bot setup? Because the bot receives each person's audio as a separate stream directly from Discord. That means CaptionsRush can label who's speaking ("Alex: push B site now") and the audio is clean — no game explosions, no NPC dialogue, no background music mixed in. Just voices.
For competitive gaming, this matters. When your Valorant teammate says "flank left," you need to read that now, not 300 milliseconds from now after the tool has tried to separate their voice from gunfire audio. The bot mode sidesteps that problem entirely.
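To make the per-speaker idea concrete, here is a minimal Python sketch of how separate audio streams turn into labeled captions. It is not CaptionsRush's actual code: the `AudioChunk` type, its fields, and the `transcribe` callback are hypothetical names chosen for illustration, and a stand-in recognizer replaces a real speech model so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AudioChunk:
    """One slice of audio from a single speaker (hypothetical type)."""
    user_id: int   # ID of the user this audio came from
    start_ms: int  # when the chunk started, relative to session start
    pcm: bytes     # audio for this one speaker only, no game sound mixed in


def label_captions(
    chunks: List[AudioChunk],
    display_names: Dict[int, str],
    transcribe: Callable[[bytes], str],
) -> List[str]:
    """Turn per-speaker chunks into 'Name: text' lines, oldest first."""
    lines = []
    for chunk in sorted(chunks, key=lambda c: c.start_ms):
        text = transcribe(chunk.pcm).strip()
        if not text:
            continue  # nothing recognizable in this chunk
        # Fall back to a generic label when the display name is unknown.
        name = display_names.get(chunk.user_id, f"User {chunk.user_id}")
        lines.append(f"{name}: {text}")
    return lines


def fake_transcribe(pcm: bytes) -> str:
    """Stand-in recognizer: treats the bytes as already-transcribed text."""
    return pcm.decode("utf-8")


chunks = [
    AudioChunk(user_id=2, start_ms=900, pcm=b"rotating"),
    AudioChunk(user_id=1, start_ms=100, pcm=b"push B site now"),
]
captions = label_captions(chunks, {1: "Alex"}, fake_transcribe)
for line in captions:
    print(line)
```

Because each chunk already carries the ID of the user it came from, labeling is a dictionary lookup. With mixed system audio there is no such ID to look up, which is why that mode cannot show who is speaking.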
Setup: Follow the Discord bot setup guide →
System Audio Mode
If you don't want to set up a bot, CaptionsRush can also just caption whatever audio is playing on your computer — Discord included. You launch it, it listens, captions appear in the overlay. No bot, no server permissions, no configuration.
The tradeoff: since it's hearing the same mixed audio your speakers play, it can't tell who's speaking, and game sounds can sometimes interfere with accuracy. For casual Discord calls or when you're watching Twitch streams, this works great. For competitive ranked matches where you need per-speaker callouts, the bot mode is worth the setup.
Setup: System audio setup guide →
What makes it different
The in-game overlay is the main thing. Captions appear on top of your game — you customize where they go, how big they are, what color and opacity. You don't need to alt-tab to a text channel to read what someone said. For gamers, that's the whole point: staying in the game while reading your team's comms.
Platforms: Windows (full gaming overlay + all audio), Mac (all audio, gaming overlay coming soon)
Price: Free during beta
Option 2: Scripty
Best for: Privacy-focused users who want a free, open-source, no-data-collection option.
Scripty is a Discord bot that runs its speech recognition locally rather than through a cloud speech service. Your voice data never leaves the server it's processed on: nothing goes to Google, Amazon, or any third-party cloud. For a free tool, that's a genuinely strong privacy stance.
A server admin invites Scripty to the server, runs a setup command, and asks it to join a voice channel. Transcriptions appear in a designated text channel. It supports 55 languages and processes quickly.
The tradeoff is accuracy. Scripty uses an open-source speech model trained on public voice datasets. For clear, well-enunciated speech with a good mic, it works quite well — users on Top.gg regularly praise its accuracy after recent updates. But in a chaotic gaming voice channel with overlapping speakers, background noise, and people shouting abbreviated callouts, it struggles more than cloud-based alternatives.
The other limitation: transcriptions go to a text channel, not an overlay. During a game, you'd need to alt-tab or have a second monitor to read them.
Setup: Invite from scripty.org, run the setup command, done.
Platforms: Any (it's a Discord bot, runs server-side)
Price: Free forever, with optional premium tiers for supporters
Option 3: Scriptly
Best for: Communities that want a polished, server-wide transcription and TTS solution with premium accuracy options.
Scriptly is another Discord bot focused on accessibility. Its free tier provides standard transcription and text-to-speech. What sets Scriptly apart is its premium "Ultra-Real-Time" transcription mode, which delivers word-by-word captioning as people speak — faster and more accurate than the standard tier.
It also supports transcribing Discord voice messages (not just live voice channels), which is a nice touch for servers where people send voice notes.
The pricing is tiered: $4.99/month for personal TTS premium, $7.99/month for server TTS premium, and $12.99 to $49.99/month for tiers that include Ultra-Real-Time transcription, with 10 to 100 hours of premium transcription time per month depending on the tier. When you exceed your hours, you fall back to standard (free) transcription.
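The fallback rule is simple arithmetic, sketched below in Python. The function name and parameters are illustrative, not part of Scriptly. The quota is passed in as a parameter because this guide only gives the range (10 to 100 hours), not which tier gets which allowance.

```python
from typing import Tuple


def split_transcription_hours(hours_used: float, premium_quota: float) -> Tuple[float, float]:
    """Split a month's usage into (premium hours, standard hours).

    Usage up to the quota counts as premium transcription; anything
    beyond it falls back to the standard (free) tier.
    """
    premium = min(hours_used, premium_quota)
    standard = max(hours_used - premium_quota, 0.0)
    return premium, standard


# Example: 14 hours of voice chat on a tier with a 10-hour quota.
premium_hours, standard_hours = split_transcription_hours(14.0, 10.0)
print(premium_hours, standard_hours)
```

A server that stays under its quota never reaches the standard tier, so the useful number when picking a tier is how many hours of voice chat you expect to transcribe in a month.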
Like Scripty, transcriptions go to a text channel — no gaming overlay. And because it's a server-level bot, a server admin needs to add it. You can't just use it in any server you happen to join.
Setup: Invite from scriptly.xyz, use /transcribe to start.
Platforms: Any (Discord bot)
Price: Free tier + premium from $4.99/month
Option 4: SeaVoice
Best for: Users who want accurate transcription with session recordings and transcript downloads.
SeaVoice is built by Seasalt.ai, a Seattle startup specializing in speech technology. Their English and Taiwanese Mandarin models are trained in-house and users report strong accuracy. They support 12 languages total, with other languages running on a tuned open-source model.
A unique feature: when a transcription session ends, SeaVoice DMs you a full transcript file, an SRT subtitle file, and a download link for the complete audio recording. If you're transcribing D&D sessions, team meetings, or anything you want to reference later, that's genuinely useful.
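If you haven't worked with SRT before, it is a plain-text subtitle format: a running number, a start and end timestamp, and the caption text, with a blank line between entries. The Python sketch below shows how timed caption segments map onto that layout. It illustrates the format only, not SeaVoice's exporter; the segment tuples and sample lines are made up for the example.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start in ms, end in ms, caption text)


def format_timestamp(ms: int) -> str:
    """Render milliseconds as the HH:MM:SS,mmm form used in SRT files."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Build SRT text: numbered entries separated by a blank line."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


session = [
    (0, 1500, "Roll for initiative."),
    (2000, 4250, "I cast fireball."),
]
print(to_srt(session))
```

Because the file is plain text with timestamps, you can open it in any text editor to search a session, or load it into a video player alongside the audio recording.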
The honesty from the SeaVoice team is also refreshing — they openly acknowledge that their non-English models can sometimes "hallucinate" (insert words that weren't said or transcribe in the wrong language). That kind of transparency is worth noting.
Like the other bots, transcriptions go to a text channel. No overlay. And it requires server-level permissions to add.
One concern: a recent review on Top.gg questioned whether the project is still actively maintained. Worth checking their Discord server for current status before relying on it.
Setup: Invite from Discord App Directory or Top.gg.
Platforms: Any (Discord bot)
Price: Free
Option 5: Windows Live Captions
Best for: Windows 11 users who want zero-setup captions for any audio, and don't need a gaming overlay.
If you're on Windows 11, you already have a surprisingly good captioning tool built into your operating system. Press Win + Ctrl + L and live captions appear for any audio playing on your PC — Discord included.
Windows Live Captions processes everything on-device (nothing goes to the cloud), it's fast, and for clear audio, it's quite accurate. On Copilot+ PCs, it can even translate from 40+ languages into English in real time.
The limitations for gamers are significant, though. There's no in-game overlay: captions appear in a separate caption window positioned at the top or bottom of your screen. There's no speaker identification, so you can't tell who said what. It's not optimized for voice chat audio where multiple people talk over each other. And it's Windows 11 only; it isn't available on Windows 10.
For casual use — watching a YouTube video, joining a quick Discord call, listening to a podcast — Windows Live Captions is excellent and free. For competitive gaming where you need to instantly read who said "enemy flanking right" while keeping your eyes on the game, it falls short.
Setup: Press Win + Ctrl + L. That's it.
Platforms: Windows 11 only
Price: Free
Option 6: Mac Live Captions
Best for: Mac users with Apple Silicon who need basic captioning for calls and media.
Apple's Live Captions improved significantly with macOS Tahoe. It now handles clean, noise-free audio well and processes everything on-device. Turn it on in System Settings → Accessibility → Live Captions.
The limitations: Apple Silicon Macs only (no Intel Macs), language support is still limited compared to Windows, and it's weaker at picking speech out of noisy backgrounds. There's no overlay mode for gaming, and no speaker identification.
For Mac users on Discord calls, Zoom meetings, or watching uncaptioned videos, it's a solid free option. For gaming specifically, Mac gaming is limited to begin with, and Live Captions doesn't offer the overlay or latency characteristics gamers need.
Setup: System Settings → Accessibility → Live Captions → On.
Platforms: macOS (Apple Silicon only)
Price: Free
Which Should You Choose?
The right tool depends on your situation:
You're a competitive gamer who needs to read callouts in real-time without leaving your game. → CaptionsRush (bot mode). The per-speaker labels and in-game overlay are built for this exact scenario. The one-time bot setup takes 5 minutes and then it's permanent.
You game casually and want something simple with no setup. → Windows Live Captions (if you're on Windows 11) or CaptionsRush in system audio mode. Both work right away with no configuration.
You run a Discord server and want transcription available for everyone. → Scripty (if privacy matters most) or Scriptly (if accuracy matters most). These are server-level bots that work for all members.
You want transcription recordings and downloadable transcripts. → SeaVoice. The post-session transcript files and audio downloads are unique.
You use Discord on Mac for calls and want basic captioning. → Mac Live Captions or CaptionsRush in system audio mode.
You're on a budget and don't want to pay anything, ever. → Scripty (free forever, with optional supporter tiers), Windows/Mac Live Captions (built-in), or CaptionsRush (free during beta).
Why Discord Hasn't Built This Yet
It's worth asking: why doesn't Discord just add voice-to-text? They already have text-to-speech. They have voice activity detection. They clearly have the audio infrastructure.
The feature request for live captioning has been one of the most upvoted accessibility requests on Discord's feedback platform for years. Discord has made improvements in other accessibility areas, but native speech-to-text for voice channels remains missing.
Until that changes, third-party tools are the only option. The good news is that several of them — including the ones covered in this guide — work well enough today that you don't have to wait for Discord to act.
Getting Started
If you're ready to try captions in Discord, here's the fastest path depending on what you want:
Fastest setup (30 seconds): Turn on Windows Live Captions with Win + Ctrl + L, or launch CaptionsRush in system audio mode.
Best gaming experience (5 minutes): Set up CaptionsRush with the Discord bot for per-speaker labels and an in-game overlay.
Server-wide solution (2 minutes): Invite Scripty or Scriptly to your server.
Whichever you choose, being able to read what your teammates are saying changes the game — literally. No more asking "what did you say?" No more missing the callout that cost the round. Just captions, running at the speed of the conversation.
CaptionsRush is built by a hard-of-hearing gamer who got tired of missing callouts. It's free during beta — try it here.