Whisper Speech Recognition MCP Server

Author: @BigUncle

Overview

Fast-Whisper-MCP-Server is a high-performance speech recognition server built on Faster Whisper, delivering efficient audio transcription.

To use it, clone its repository, install the dependencies, and start the server using the provided scripts. It can then be configured for use with compatible applications such as Claude Desktop.

Its key features include integration with Faster Whisper for efficient recognition, batch processing acceleration, and automatic CUDA acceleration when a compatible GPU is available. It supports multiple model sizes from tiny to large-v3 and output formats including VTT, SRT, and JSON. To reduce overhead, it caches model instances rather than reloading them per request, and it adjusts the batch size dynamically based on available GPU memory.
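The model caching and dynamic batch sizing described above could look roughly like the sketch below. The helper names and the memory-per-item heuristic are assumptions for illustration, not the project's actual code; in the real server the cached object would be a Faster Whisper model instance.

```python
from functools import lru_cache


# Hypothetical cache: reuse one model instance per (size, device) pair
# instead of reloading weights on every request.
@lru_cache(maxsize=4)
def load_model(model_size: str, device: str):
    # The real server would construct a faster-whisper WhisperModel here;
    # a placeholder dict keeps this sketch self-contained and runnable.
    return {"size": model_size, "device": device}


def pick_batch_size(free_gpu_mem_mb: int, mem_per_item_mb: int = 512,
                    max_batch: int = 16) -> int:
    """Illustrative heuristic (assumed, not the server's exact rule):
    scale the batch size to free GPU memory, clamped to [1, max_batch]."""
    return max(1, min(max_batch, free_gpu_mem_mb // mem_per_item_mb))


# Repeated requests for the same model hit the cache.
assert load_model("large-v3", "cuda") is load_model("large-v3", "cuda")

print(pick_batch_size(8192))  # plenty of memory: full batch of 16
print(pick_batch_size(1024))  # tight memory: batch of 2
```

The cache key here is the (model size, device) pair, so switching models or devices loads a fresh instance while repeat requests stay cheap.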

Use cases include transcribing audio for content creation, enabling real-time speech recognition in applications, and batch processing multiple audio files for analysis.

It requires Python 3.10+, Faster Whisper, and PyTorch; CUDA support is needed for optimal performance. It can transcribe multiple audio files at once via batch transcription over a folder. Currently it is command-line based, but it can be integrated with GUI applications such as Claude Desktop.
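Folder-level batch transcription might be driven by a loop like the following sketch. The extension list and the `transcribe_file` stub are assumptions standing in for the server's real transcription call, which is not shown in this overview.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".flac", ".m4a"}  # assumed supported formats


def transcribe_file(path: Path) -> str:
    # Stub standing in for the actual Faster Whisper transcription call.
    return f"transcript of {path.name}"


def batch_transcribe(folder: str, out_format: str = "srt") -> dict[str, str]:
    """Transcribe every audio file in a folder, returning a mapping from
    output filename (with the chosen format's extension) to transcript."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in AUDIO_EXTS:
            out_name = path.with_suffix(f".{out_format}").name
            results[out_name] = transcribe_file(path)
    return results
```

For example, `batch_transcribe("recordings", out_format="vtt")` would skip non-audio files and produce one `.vtt` entry per audio file in the folder.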