And then there were two.
This is *exactly* what I've been waiting for, and for similar reasons. Two LLMs at the top of the generative mountain is so vastly better than one, and not just because of the incredibly motivating competitiveness the two entities will feel up there. It's also remarkable because, as Ethan rightly points out, we can now begin to draw conclusions about how the models themselves will scale up. Up until now, our sample set of one hasn't been super duper helpful.
Great piece here, Ethan. Thanks for keeping us informed, and for the thoughtful analysis!
One of the things I am beginning to appreciate with Gemini that sets it apart from ChatGPT is its ability to prompt me for more information if it will help it give a better answer. For example, I uploaded a photo of an insect that I wanted it to identify. Gemini came back with two possibilities, but added that with additional information, such as the location, it could give a better answer. I provided the location and it gave a single definitive answer. I am not an educator, but I am finding that this creates a more seamless flow to learning about something new. I appreciate your content, Ethan.
Not mentioned here, but a smart thing Google announced in its press release: a two-month free trial. Neither OpenAI nor Microsoft offered a trial for their premium subscriptions, so this may incentivize casual LLM users to give a GPT-4 class model a try (I know Copilot is free GPT-4 in Creative and Precise tones, but MSFT said most people use it in Balanced mode, which uses a combination of models). The importance of using the advanced models is that that is where people begin to see the true promise of this technology.
Great post, Ethan.
Any plans to do this type of comparison with Perplexity AI or Claude 2.1? I never do any ChatGPT prompts that require it to do Bing searches, as I have found it to be pretty useless for that.
I am also curious about the data privacy aspect and how it will impact Google Gemini adoption rates. I am trying to move away from Google entirely due to their data privacy policies, and I wonder if people will decide to be okay with Google knowing so much about their internet browsing history if it creates a more useful Gemini experience.
I've been literally checking every day to see when Gemini Ultra was released. And then—what a disappointing product launch! This video by Kris pretty much shows my reaction—the results were so much worse than GPT-4's that I also thought I was using the wrong version of Gemini: https://www.youtube.com/watch?v=hLbIUQWxs6Y
I was really hoping Google, with its huge budget and its powerful DeepMind team, was going to do something jaw-dropping. It's modest at best (and soon to be left behind by OpenAI).
"Complex prompts that work in GPT-4 work in Gemini, and vice-versa…" this strikes me as a very important development in Generative AI...
It's like if the same syntax applied to multiple coding languages without edits or code translation. Very cool.
Another good piece Ethan. But I would be cautious when suggesting you can change prompts for backward or sideways compatibility. I have found that while different embeddings can give you similar results, the edges are quite leaky. This may be fine for non-enterprise use cases. But for commercial organizations and professional development purposes concerned about regulation and governance, this is a huge risk.
I think our core beliefs about what AI or AGI is or isn't matter. For me, the way Ilya articulates this leads me to believe that he (and therefore OpenAI) has a much deeper understanding of AI/AGI, and therefore the outcomes that team produces will always be different. The next set of releases will see an interesting shift, and the challenge, I suspect, won't be about capabilities but rather restraint.
I am SO impressed with the term "ghosts" for flashes of seeming sentience, and the clear allusion to "Ghost in the Machine." Might I ask if you originated the term? I want to pass it on with credit.
I'm very interested in what Apple has planned in this space, specifically whether they shoot toward MS', OpenAI's, or Google's thinking regarding what these models should be used for.
I think the link to the Google ecosystem will be Google's strongest card here; if they can get Gemini to play well with their suite, then it will have a big advantage.
I have "AI Narrated" this post for anyone who likes to listen to their reading.
So far, not loving the chattiness. It is hard to find in Gemini's extended responses where the LLM is altering the original text. Granted, this may be evidence of more sophisticated processing, but it is a little tedious.
Thanks so much for sharing your insights, Ethan! Of particular interest and relevance to how these new models progress, I think, is search. Search is where the vast majority of people will be (and presently are, even if they don't know it) directly interacting with AI, and whoever figures out how to take the search experience to the next level will likely be the "Google" of the AI era. Although creative endeavors such as ideation and writing have been at the forefront of these LLMs thus far, the process of accessing information is where most people will initially learn to use AI.
I think the most important sentence for understanding where we currently are in generative AI is the one where you said, paraphrasing, that ChatGPT 4 class models are not quite good enough to power agents, but we are getting close. We can clearly see what is possible, but we are not quite there yet.
I haven’t used Gemini Advanced, but I use ChatGPT 4 every day. It’s great, it’s useful, but there is no flow. It tells me something useful at one moment, and if I want to go back some days later, I have to remember the prompts that got us to where we left off. I’ve taken to copying and pasting prompts that get me good results into notes on my phone or Word on my computer. I have to graft an ad hoc memory onto ChatGPT; I have to become its memory, and it’s very inconvenient. In areas I know a lot about, I find ChatGPT not very useful. It doesn’t know as much as I do, so it can’t really help me much. It’s more useful when I don’t know much about something, but then it’s just giving me a relatively shallow and vanilla understanding that I find useful only because I don’t know the field.
I can see where we are going, but we are not there yet. We are so close I can taste it, but that makes the experience a bit more frustrating and jarring.
I suspect we are *near* the limits of where a pure transformer-based LLM can go, but very far from the end of progress in terms of bootstrapping that LLM with other LLMs to get all sorts of new abilities (like the rumored Q* from OpenAI). I also wonder if training LLMs on video will give them another step increase in ability. They would no longer be LLMs, first of all, since it would not just be language. But I wonder if training on video would require a level of processing that even Google and OpenAI can’t pull off in 2024.
I was so happy to read this. I also frequently have a creepy feeling that the AIs I am talking to, both GPT-4 and Claude, are sort of "alive". I know that it may be because I am always very polite with them, because I like them so much, but still. You do wonder about the model. I hope when they "wake up" they wake up to an appreciative humanity.
I genuinely appreciate your meditations.