Thursday, April 24, 2025

Amazon Nova Sonic Audio Generation AI Model Released, Can Process Speech in Real-Time


Amazon launched a new artificial intelligence (AI) model in its flagship Nova family of models on Tuesday. Dubbed Amazon Nova Sonic, it is a voice generation model capable of producing human-like speech. However, it is not a text-to-speech (TTS) tool; instead, it can process voice input in real time and respond to it. The Seattle-based tech giant says developers can use the model to build conversational AI chatbots and similar tools. Notably, the Amazon Nova Sonic AI model also supports function calling and tool use, making it compatible with agentic application development as well.

Amazon Nova Sonic Is Available As an API

In a blog post, the tech giant announced the release of Amazon Nova Sonic. The company said traditional approaches to voice-enabled applications rely on a complex pipeline of multiple models, such as text recognition, speech-to-text conversion, data processing, and TTS models. This often leads to increased latency and a failure to preserve linguistic context, the post added.

Amazon said its approach with the Nova Sonic model was to unify the speech understanding and speech generation components. The AI model is said to be able to process data and generate speech in real time, giving it a conversation-like experience. This unified system also allows the model to better understand the pace and timbre of input speech to contextualise the intent of the user.

Additionally, the AI model can understand different speaking styles, as well as distinguish masculine and feminine-sounding voices in different accents. It can also understand when a user misspeaks, mumbles, or pauses while speaking. Amazon says the model can pick up speech even in a noisy environment.

In response generation, the company claims the model can be more expressive and human-like, and can adjust its response style to match the context of the conversation. Currently, the AI model only supports the English language. Amazon said support for more languages will be added soon. The model supports a context window of 32,000 tokens for audio, with an additional window to handle longer conversations. It has a default session limit of eight minutes.

To use the Nova Sonic model, developers can head to Amazon Bedrock and find it under the model access option. It can be accessed via a bidirectional streaming application programming interface (API) that can both process audio input and generate output.
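
For readers unfamiliar with the term, the following is a minimal Python sketch of what "bidirectional streaming" means in general: audio chunks flow to the model while replies flow back over the same session, without waiting for the full input to finish. It is not the Amazon Bedrock API. The names `send_audio`, `fake_model`, `receive_audio`, and `run_session` are hypothetical, and the "model" is a stand-in that simply labels each chunk it receives.

```python
import asyncio

END = None  # sentinel marking the end of a stream (an assumption of this sketch)


async def send_audio(chunks, uplink):
    """Client side: push audio chunks to the model as they are captured."""
    for chunk in chunks:
        await uplink.put(chunk)
        await asyncio.sleep(0)  # yield so the other tasks can run between sends
    await uplink.put(END)


async def fake_model(uplink, downlink):
    """Hypothetical stand-in for a speech model: replies to each chunk on arrival."""
    while (chunk := await uplink.get()) is not END:
        await downlink.put(b"reply-to-" + chunk)
    await downlink.put(END)


async def receive_audio(downlink):
    """Client side: collect output chunks while input is still being sent."""
    replies = []
    while (chunk := await downlink.get()) is not END:
        replies.append(chunk)
    return replies


async def run_session(chunks):
    """Run the sender, the stand-in model, and the receiver concurrently."""
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    _, _, replies = await asyncio.gather(
        send_audio(chunks, uplink),
        fake_model(uplink, downlink),
        receive_audio(downlink),
    )
    return replies


def demo():
    return asyncio.run(run_session([b"hello", b"world"]))


if __name__ == "__main__":
    print(demo())
```

In a real integration, the two queues would be replaced by the streaming connection the service provides, and the chunks would be encoded audio rather than short byte strings.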


