Stable Audio Open Released by Stability AI as an Open-Source Text-to-Audio Generator

TL;DR


1. Stability AI, the company behind the popular Stable Diffusion text-to-image model, has released Stable Audio Open, an open-source text-to-audio generation model. It generates audio from text prompts, much as Stable Diffusion generates images from them. The model is openly available, so users can experiment with its capabilities.

2. Stable Audio Open is a latent diffusion model. It combines an autoencoder that compresses audio waveforms, a T5-based text encoder for prompt conditioning, and a transformer-based diffusion model that operates in the autoencoder's latent space. It was trained on a large dataset of text-audio pairs so that its output matches the provided text prompt. Stability AI positions the model for short samples, sound effects, and production elements such as drum beats, instrument riffs, and ambient sounds, rather than for speech or full songs.

3. The release is part of Stability AI's broader effort to expand beyond text-to-image generation. The company sees a wide range of applications for text-to-audio generation, from content creation and audio production to language learning and accessibility tools. Stable Audio is currently available as a web-based demo, and the company plans to release an API so that developers can integrate the model into their own projects.
