Microsoft Introduces 3 Foundational AI Models To Take on OpenAI, Anthropic


Images generated by MAI-Image-1.

Credit: Microsoft

On Thursday, Microsoft introduced three new foundational AI models—MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2—focused on transcription, audio, and image generation, respectively. The tech giant positions them as in-house systems that will provide it with better control over cost, performance, and integration across its software and cloud services.

MAI-Transcribe-1 offers speech-to-text transcription in 25 languages. It could be used to create instant transcripts of Teams meetings or customer-facing phone calls. Microsoft describes MAI-Transcribe-1 as “lightning fast,” meaning it should produce captions or transcripts with very low latency. The company also reports that the model has a lower word error rate than GPT-Transcribe, Gemini 3.1 Flash, and other transcription-focused AI models.

MAI-Voice-1 is a voice-generation model aimed at providing “voice experiences and voice agents” with nuance and emotional expression. Microsoft says it can produce 60 seconds of audio in one second.

Finally, MAI-Image-2 targets marketing, design, and other professionals who may want to generate visuals through Copilot experiences and Azure APIs. Microsoft has also begun phased rollouts of MAI-Image-2 in Bing and PowerPoint.

All three models are available through Microsoft’s Azure AI platform and MAI Playground, where businesses can test, customize, and deploy them.


