Why GPT-4o Is the Next Big Thing in AI
OpenAI showed off its new GPT-4o model in a live stream on May 13, 2024, and I believe we are now one step closer to a future where we can talk to AI assistants the same way we talk to humans.
The features I saw reminded me of ‘Her’ with Joaquin Phoenix and Scarlett Johansson, where a writer falls in love with an AI chatbot after spending some time with it.
Although chatbots aren’t a new thing, OpenAI has added a myriad of features to ChatGPT to make it more expressive. Not just that, the new model is much more responsive, and it’s going to be free for all users, with paid users getting up to five times higher usage limits.
So, let’s talk about what’s new with the newest model of ChatGPT.
GPT-4o
GPT-4o is a new iteration of the GPT-4 model that powers ChatGPT. The latest model is a lot faster and adds to the capabilities of GPT in text, audio, and vision.
The best part? It’s going to be free for all users.
Sam Altman, OpenAI’s CEO, said that GPT-4o is natively multimodal, meaning that it can generate content and understand commands in text, voice, and images. The model is also now half the price and more than twice as fast compared to GPT-4 Turbo.
The Voice mode is also getting new features, mainly the ability to respond in real time and react to the world around you. Let’s talk about some of the model’s capabilities that were shown in the livestream.
Real-time Conversational Speech
What’s fascinating about GPT-4o is that it can now understand context and comment on sounds, rather than working from transcribed text alone.
So, for example, in a live demo, one of the presenters said that he was nervous about presenting to an audience. GPT suggested taking a deep breath. He then demonstrated GPT’s listening abilities by breathing very fast and loudly, to which GPT responded, “Whoa, slow down a bit there, Mark. You’re not a vacuum cleaner.”
Once he started breathing normally, GPT told him to breathe slowly and steadily, and asked how he felt once he was done.
If you’re someone who already uses the Voice mode in ChatGPT, you’re going to notice a few key changes. First, you can now interrupt it, whereas before you had to wait for GPT to stop speaking or cancel the output manually.
Second, the model now works in real time, meaning that it won’t have the annoying 2-3 second delay before it starts talking. And third, my personal favorite, GPT can now pick up on your emotions when you talk, and it responds accordingly.
A New User Interface
OpenAI is going to redesign the interface offered by the ChatGPT app on mobile as well as release a new app for Windows and macOS users.
The refreshed UI of the app is going to make it easier for you to interact with GPT-4o on a conversational level without needing to touch the screen too much. The interface will now also allow you to share videos and pictures.
Like other voice assistants, such as Siri and Alexa, you can also activate GPT with a familiar phrase like “Hey, ChatGPT.”
The model has not only become a lot better but also much cheaper and faster, which is what I expect to see from future updates as well.
In the perfect future, I want to see ChatGPT working natively on the device and only connecting to the internet to grab necessary information. But I expect that future is still a little way off.
Translation in Real Time
The thing that fascinated me the most was its ability to translate between two languages in real time with seemingly no lag.
In the live demo, the two presenters demonstrated the model’s translation abilities by making it act as a translator between English and Spanish.
They started by giving it a prompt to listen to two people talking, one speaking Spanish and the other English, and told GPT-4o to translate between the two languages in real time.
After that, the English speaker spoke, and within a second GPT-4o translated it into Spanish. The Spanish speaker then spoke, and GPT translated it to English just as quickly.
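For developers, a similar (text-only) translator can be set up through OpenAI’s Chat Completions API. The sketch below is an assumption on my part, not the prompt used in the demo: it just builds the request payload for a hypothetical English–Spanish interpreter, without actually sending it, so it runs with no API key.

```python
def build_translation_request(utterance: str) -> dict:
    """Build a Chat Completions payload asking GPT-4o to translate
    between English and Spanish, whichever direction is needed.
    The system prompt wording here is illustrative only."""
    system_prompt = (
        "You are a real-time interpreter between an English speaker "
        "and a Spanish speaker. When you hear English, repeat it in "
        "Spanish; when you hear Spanish, repeat it in English. "
        "Translate faithfully and add nothing else."
    )
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": utterance},
        ],
    }

# The resulting dict is what you would pass to the API client,
# e.g. client.chat.completions.create(**request).
request = build_translation_request("Hey, how has your week been going?")
print(request["model"])  # gpt-4o
```

In the live demo this all happened over voice, which GPT-4o handles natively; the text payload above is just the simplest way to see the same prompt structure.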
Although the model seemingly works in real time, it needs a fast and stable internet connection to provide quick responses, especially when it has to access the internet for information. GPT-4’s paid subscription also lets you use Microsoft’s Browse with Bing feature.
To make sure the experience is as seamless as possible, we recommend using an internet provider like Spectrum, which isn’t just a reliable and fast internet option but is also widely available throughout the country. Spanish speakers can head over to Spectrum en español to explore more.
If you’re going to use the app on a PC, I recommend using a wired connection to ensure there’s no lag.
While it hasn’t even been two years since the release of ChatGPT, we are seeing remarkable new capabilities being added to the model. In the future, I hope to see a model that can run natively on our devices, without needing to connect to the internet for everything.
The future of AI has only just started, and it’s going to be an amazing journey ahead. With the next iterations of ChatGPT, we expect to see the models become smarter and faster, as well as run locally with the help of the neural engines in modern PCs and smartphones.

Jim’s passion for Apple products ignited in 2007 when Steve Jobs introduced the first iPhone. This was a canon event in his life. Noticing a lack of iPad-focused content that is easy to understand even for “tech-noobs”, he decided to create Tabletmonkeys in 2011.
Jim continues to share his expertise and passion for tablets, helping his audience as much as he can with his motto “One Swipe at a Time!”