This is *exactly* what I've been waiting for, and for similar reasons. Two LLMs at the top of the generative mountain is so vastly better than one, and not just because of the incredibly motivating competitiveness the two entities will feel up there. It's also remarkable because, as Ethan rightly points out, we can now begin to draw conclusions about how the models themselves will scale up. Up until now, our sample set of one hasn't been super duper helpful.
Great piece here, Ethan. Thanks for keeping us informed, and for the thoughtful analysis!
Beautiful, the various generative models with a motivating competitiveness. I love the way you put it.
One of the things I am beginning to appreciate about Gemini that sets it apart from ChatGPT is its ability to prompt me for more information when that will help it give a better answer. For example, I uploaded a photo of an insect that I wanted it to identify. Gemini came back with two possibilities, but added that with additional information, such as the location, it could give a better answer. I provided the location and it gave a single definitive answer. I am not an educator, but I am finding that this creates a more seamless flow to learning about something new. I appreciate your content, Ethan.
Not mentioned here, but a smart thing Google announced in its press release: a two-month free trial. Neither OpenAI nor Microsoft offered a trial for its premium subscriptions, so this may incentivize casual LLM users to give a GPT-4 class model a try (I know Copilot is free GPT-4 in the Creative and Precise tones, but MSFT said most people use it in Balanced mode, which uses a combination of models). The importance of using the advanced models is that they are where people begin to see the true promise of this technology.
Well, they had to do it, and they have the cash. They were behind, and Bard was not very useful.
Great post, Ethan.
Any plans to do this type of comparison with Perplexity AI or Claude 2.1? I never write ChatGPT prompts that require it to do Bing searches, as I have found it to be pretty useless for that.
I am also curious about the data privacy aspect and how it will impact Google Gemini adoption rates. I am trying to move away from Google entirely due to their data privacy policies, and I wonder whether people will decide to be okay with Google knowing so much about their internet browsing history if it creates a more useful Gemini experience.
I'm also curious to hear Ethan's thoughts on Perplexity. I'm a big fan and use it dozens of times a day.
Perplexity, from what I understand, is a combination of Bing and Google results. I find it very useful when you need reference links.
I love it too! I use it all the time—with great results. I just posted this in another comment below:
Perplexity's own model ("Experimental") did pass the "octopus test" (I tweaked the "apple test" to prevent search results from allowing the web-connected LLM to cheat): https://www.perplexity.ai/search/Give-me-ten-oj1tEETaRLW3VqC728efBQ?s=c
Hi! I'm also curious about your data privacy concerns. I want to continue using Google services, but it would be helpful to be mindful about what I share versus what I don't. Could you elaborate on your concerns? A few reading links would be appreciated ☀️
No article to cite, but I generally don't like seeing ads everywhere I go after searching for something once on Amazon, haha. I also think Perplexity AI is a superior search option to Google at the current moment, and I am not a fan of email in general, or of the YouTube algorithm that makes me waste time on random videos.
Here is what I got from Perplexity regarding Google's questionable data privacy policy.
Based on the search results, here are some key questionable data privacy policies and practices by Google that users are required to consent to:
Combining user data across services: Google's unified privacy policy from 2012 enabled the company to share user data across a wide range of services without explicit consent. This raised concerns about user privacy.
Vague disclosures about data sharing: Google's privacy policy contains broad statements about sharing data with "partners" and third parties. It does not clearly specify all entities user data is shared with.
Lack of transparency around government data requests: While Google claims it pushes back on overbroad government requests for user data, details on compliance rates and scale of requests are not transparent.
Collection of sensitive location data: Google has access to user location data through GPS, IP addresses, and similar signals, and can track physical location without clear consent.
Complex and lengthy privacy policies: Google's privacy policy spans over 4,000 words and contains technical jargon, making it very difficult for average users to comprehend data collection practices.
Lack of controls around web tracking: Google engages in extensive web tracking for ad targeting and analytics. Users have limited options to restrict this collection.
In summary, Google's privacy practices have frequently been questioned and even found illegal in Europe. But there is still little regulation in the US around online data collection. While Google has made some efforts to improve transparency, major gaps remain around consent requirements and user controls.
I've been literally checking every day to see when Gemini Ultra would be released. And then, what a disappointing product launch! This video by Kris pretty much captures my reaction; the results were so much worse than GPT-4's that I also thought I was using the wrong version of Gemini: https://www.youtube.com/watch?v=hLbIUQWxs6Y
I was really hoping Google, with its huge budget and its powerful DeepMind team, was going to do something jaw-dropping. It's modest at best (and soon to be left behind by OpenAI).
"Complex prompts that work in GPT-4 work in Gemini, and vice-versa…" this strikes me as a very important development in Generative AI...
It's like if the same syntax applied to multiple coding languages without edits or code translation. Very cool.
Another good piece, Ethan. But I would be cautious when suggesting you can change prompts for backward or sideways compatibility. I have found that while different embeddings can give you similar results, the edges are quite leaky. This may be fine for non-enterprise use cases, but for commercial organizations and professional developers concerned about regulation and governance, this is a huge risk.
I think our core beliefs about what AI or AGI is or isn't matter. The way Ilya articulates this leads me to believe that he (and therefore OpenAI) has a much deeper understanding of AI/AGI, and therefore the outcomes that team produces will always be different. The next set of releases will see an interesting shift, and the challenge, I suspect, won't be about capabilities but rather restraint.
I am SO impressed with the term "ghosts" for flashes of seeming sentience, and the clear allusion to "Ghost in the Machine." Might I ask if you originated the term? I want to pass it on with credit.
I'm very interested in what Apple has planned in this space, specifically whether they aim toward MS', OpenAI's, or Google's thinking regarding what these models should be used for.
I would expect Apple to implement a smaller LLM optimized for the phone, with a strong emphasis on security: a much different strategy than anyone else's. Their challenge is getting enough compute power on a phone.
Yeah, that would track with their previous efforts and their investments in local compute power.
I think the link to the Google ecosystem will be Google's strongest card here; if they can get Gemini to play well with their suite, it will have a big advantage.
I have "AI Narrated" this post for anyone who likes to listen to their reading.
https://askwhocastsai.substack.com/p/googles-gemini-advanced-tasting-notes?sd=pf
So far, I am not loving the chattiness. It is hard to find, in Gemini's extended responses, where the LLM is altering the original text. Granted, this may be evidence of more sophisticated processing, but it is a little tedious.
Thanks so much for sharing your insights, Ethan! Of interest and relevance, I think, to how these new models progress is search. Search is where the vast majority of people will be (and presently are, even if they don't know it) directly interacting with AI, and whoever figures out how to take the search experience to the next level will likely be the "Google" of the AI era. Although creative endeavors such as ideation and writing have been at the forefront of these LLMs thus far, the process of accessing information is where most people will initially learn to use AI.
I think the most important sentence for understanding where we currently are in generative AI is the one where you said, paraphrasing, that ChatGPT 4 class models are not quite good enough to power agents, but we are getting close. We can clearly see what is possible, but we are not quite there yet.
I haven't used Gemini Advanced, but I use ChatGPT 4 every day. It's great, it's useful, but there is no flow. It tells me something useful at one moment, but if I want to go back some days later, I have to remember the prompts that got us where we left off. I've taken to copying and pasting prompts that get me good results into notes on my phone or Word on my computer. I have to graft an ad hoc memory onto ChatGPT; I have to become its memory, and it's very inconvenient. In areas I know a lot about, I find ChatGPT not very useful. It doesn't know as much as I do, so it can't really help me much. It's more useful when I don't know much about something, but then it's just giving me a relatively shallow and vanilla understanding that I find useful only because I don't know the field.
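The workaround described above, manually becoming the model's memory by saving prompts into notes, can be sketched as a tiny local prompt library. This is purely illustrative; the file name and function names are hypothetical:

```python
import json
from pathlib import Path

# A minimal stand-in for the notes-app workflow: save prompts that got
# good results under short names, and reload them days later.
LIBRARY = Path("prompt_library.json")

def save_prompt(name: str, prompt: str) -> None:
    """Store a prompt under a short name, persisting to disk."""
    data = json.loads(LIBRARY.read_text()) if LIBRARY.exists() else {}
    data[name] = prompt
    LIBRARY.write_text(json.dumps(data, indent=2))

def load_prompt(name: str) -> str:
    """Retrieve a previously saved prompt by name."""
    return json.loads(LIBRARY.read_text())[name]

save_prompt("insect-id", "Identify this insect; ask me for its location if that helps.")
print(load_prompt("insect-id"))
```

Even something this crude restores a little of the missing "flow": the prompt that worked last week is one lookup away instead of buried in an old conversation.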
I can see where we are going, but we are not there yet. We are so close I can taste it, but that makes the experience a bit more frustrating and jarring.
I suspect we are *near* the limits of where a pure transformer-based LLM can go, but very far from the end of progress in terms of bootstrapping that LLM with other LLMs to get all sorts of new abilities (like the rumored Q* from OpenAI). I also wonder whether training LLMs on video would give them another step increase in ability. They would no longer be LLMs, first of all; it would not just be language. But I wonder if training on video would require a level of processing that even Google and OpenAI can't pull off in 2024.
As a cheapskate ChatGPT 4 user, I access it through Bing (thanks, Microsoft! But please dial back the clickbait overload).
Bing, or, sorry, Copilot, maintains a highly accessible list of all your previous conversations, and you can resume any of them with a single click.
FWIW, MS adds its own layer of hidden prompts to customize the behavior of the LLM. This, in my experience, makes accessing GPT-4 through Microsoft significantly worse than accessing it through ChatGPT Plus (and accessing GPT-4 through ChatGPT Plus is somewhat worse than accessing it directly through the API).
If it matters to you in evaluating my evaluation here...I have had 2,256 conversations with ChatGPT Plus since somewhere around October of 2023!
I was so happy to read this. I also frequently have a creepy feeling that the AIs I am talking to, both GPT-4 and Claude, are sort of "alive". I know it may be because I am always very polite with them, because I like them so much, but still. You do wonder about the model. I hope that when they "wake up", they wake up to an appreciative humanity.
I genuinely appreciate your meditations.