Microsoft Introduces 3 Foundational AI Models To Take on OpenAI, Anthropic

Images generated by MAI-Image-1.

Credit: Microsoft

On Thursday, Microsoft introduced three new foundational AI models (MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2) focused on speech transcription, voice generation, and image generation, respectively. The tech giant positions them as in-house systems that give it more control over cost, performance, and integration across its software and cloud services.

MAI-Transcribe-1 offers speech-to-text transcription in 25 languages, which could be used to create instant transcripts of Teams meetings or customer-facing phone calls. Microsoft describes MAI-Transcribe-1 as “lightning fast,” suggesting it can produce captions or transcripts with very low latency. The company also reports that its model achieves a lower word error rate than GPT-Transcribe, Gemini 3.1 Flash, and other transcription-focused AI models.
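Word error rate (WER) is the standard yardstick behind comparisons like this one: the number of word substitutions, deletions, and insertions needed to turn a model's transcript into a reference transcript, divided by the number of words in the reference. Microsoft has not detailed its evaluation setup here, so the snippet below is only a minimal sketch of the textbook definition. The function name `word_error_rate` is hypothetical, and real benchmarks usually normalize casing and punctuation first, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = minimum word edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )

    return dp[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("the meeting starts at noon", "the meeting start at noon"))
```

Here one substituted word out of five reference words gives a WER of 0.2; a lower score means a more accurate transcript.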

MAI-Voice-1 is a voice-generation model aimed at providing “voice experiences and voice agents” with nuance and emotional expression. It can reportedly produce 60 seconds of audio in just one second.

Finally, MAI-Image-2 targets marketing, design, and other professionals who may want to generate visuals through Copilot experiences and Azure APIs. Microsoft has also begun phased rollouts of MAI-Image-2 in Bing and PowerPoint.

All three models are available through Microsoft’s Azure AI platform and MAI Playground, where businesses can test, customize, and deploy them.
