OpenAI is launching a new flagship generative AI model called GPT-4o, which will be introduced “iteratively” in the company's developer and consumer products over the coming weeks. There had been speculation that a search engine would be deployed, but CEO Sam Altman denied the rumors.
OpenAI CTO Mira Murati said GPT-4o delivers “GPT-4 level” intelligence while enhancing GPT-4’s capabilities across text, vision, and now audio.
Murati highlighted the increasing complexity of these models and the goal of making interactions more natural and effortless, saying: “We want the interaction experience to become more natural and easier, so that you don't concentrate on the UI at all, but simply focus on the collaboration with [GPT].”
Say hello to GPT-4o, our new flagship model that can reason in real-time about audio, vision and text: https://t.co/MYHZB79UqN
Text and image input will roll out to the API today and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
— OpenAI (@OpenAI) May 13, 2024
What are the features of GPT-4o?
During a speech at the OpenAI offices, Murati explained: “GPT-4o reasons through voice, text and vision. This is extremely important as we envision the future of interaction between us and machines.”
@openai GPT-4o reasons through text vision and speech.
From today everyone can use
-GPT and ChatGPT-4o
-vision
-memory
-browse (search your discussions)
-quality and speed in 50 different languages
free. Paid users will have 5x more capacity
ChatGPT-4o is:
2x faster… pic.twitter.com/7E5UQuV0dB — Erik Machorse (@erikmachorse) May 13, 2024
The predecessor, GPT-4, was capable of processing both images and text, performing tasks such as extracting text from images or describing their content. GPT-4o extends these capabilities to include speech.
Significantly changing the ChatGPT experience, GPT-4o enables more interactive, assistant-like interactions. Previously, ChatGPT included a voice mode that converted text to speech. Now, GPT-4o improves this functionality, allowing users to interrupt ChatGPT during replies, with the model providing “real-time” responsiveness. It can also detect emotional signals in the user's voice and respond in different emotional tones.
GPT-4o also improves the visual capabilities of ChatGPT. Whether analyzing a photograph or a computer screen, ChatGPT can now quickly answer queries ranging from analyzing software code to identifying clothing brands. The company is also releasing a desktop version of ChatGPT and introducing a revamped user interface.
Starting today, the new model is accessible in the free tier of ChatGPT and is also available to OpenAI's ChatGPT Plus subscribers with “5x higher” message limits. OpenAI plans to introduce the new GPT-4o-powered voice feature to Plus users in alpha over the next month.
🚨 BREAKING: OpenAI's new voice assistant acts as a translator. Impressive range of emotions and fluidity throughout. pic.twitter.com/JPNJjLAGhn
– Zain Kahn (@heykahn) May 13, 2024
The model also has enhanced multilingual capabilities, with improved performance in 50 different languages, according to OpenAI. In OpenAI's API, GPT-4o runs at twice the speed of its most recent predecessor, GPT-4 Turbo, costs half as much, and offers higher rate limits.
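For developers, switching to GPT-4o in the API amounts to a model-name change. As a minimal sketch using only Python's standard library, the snippet below builds a Chat Completions request payload; the endpoint and field names follow OpenAI's public API, the `gpt-4o` model name is from the announcement, and the prompt is an arbitrary example. The request is constructed here but deliberately not sent (sending it would require a real API key).

```python
import json

# Endpoint per OpenAI's public Chat Completions API.
API_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Build a Chat Completions payload, selecting the model by name."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }

payload = build_chat_request("Summarize the GPT-4o announcement in one sentence.")
print(json.dumps(payload, indent=2))  # what would be POSTed to API_URL
```

In practice the payload would be sent as a JSON body with an `Authorization: Bearer <key>` header; upgrading existing GPT-4 Turbo code is a matter of changing the `model` string.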
What new features are available to free ChatGPT users?
With the deployment of GPT-4o, free ChatGPT users will experience a suite of new features, including GPT-4 level intelligence. Users will be able to receive responses directly from the model, as well as access information pulled from the web.
GPT-4o will also be able to perform data analysis and visualizations such as charting. People will also be able to use the chat feature to talk about their photos, allowing users to participate in discussions or search for information about the images they upload. The model also supports users with more complex tasks such as uploading files to help them summarize documents, write content, or perform detailed analyses.
Finally, there is now a Memory feature, designed to create a more useful experience, remembering previous interactions and context to provide a more consistent and personalized user journey.
Featured image: Canva