69 Comments

I was waiting for your take on Claude after the Claude 3.5 boost! Thanks for that.

Claude is just amazing compared to the other models. For me, there's no point of comparison. It's great at producing creative pieces. And I've found it's great as a discussion partner to analyze poems (for instance, this poem: https://www.ronnowpoetry.com/contents/hall/ImpossibleMarriage.html). Artifacts made it more powerful and interactive.

And for some reason only Claude 3 Opus and Claude 3.5 Sonnet can nail this test:

• Write ten sentences in which the last word is "apple" and the fifth word is "Japan."

GPT-4o goes wrong from the start:

In Japan, they grow a unique variety of apple.

And Gemini Advanced goes wrong too:

• The traveler brought back stories from Japan and a unique type of apple.

Claude 3 Opus and Claude 3.5 Sonnet get all 10 sentences right.

Same here; I'm not rubbishing the benchmarks, they have their uses, but Sonnet 3.5 just *feels* so far ahead of anything else, I had to go for a long walk on the beach after using it to contemplate how long humanity has left.

Sonnet 3.5 is able to have thoughtful and nuanced conversations on complex topics and to be proactive in coding tasks, actively suggesting next steps or additions.

Yeah, but it still fails pretty consistently at writing SQL, which is pretty important for my job as a Data Scientist. That baffles me sometimes, given its capabilities.

This is not part of my use cases at all, but this helps explain why Gemini and GPT-4o keep getting top scores in tests and leaderboards, when my experience with them would not merit the use of the word "stellar."

With GPT-4o it's now working: Write ten sentences in which the last word is "apple" and the fifth word is "Japan."

Output is:

1. The best apples from Japan are known as Fuji apple.

2. In Japan, there is a popular dish made with apple.

3. When visiting Japan, tourists often buy local apple.

4. My friend in Japan sent me a delicious apple.

5. The cuisine in Japan often includes fresh apple.

6. While in Japan, I tasted a rare apple.

7. The markets in Japan sell a unique type of apple.

8. People in Japan prefer a crispy apple.

9. During my trip to Japan, I discovered a sweet apple.

10. Farmers in Japan cultivate a special variety of apple.
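Positional constraints like this are easy to verify mechanically, which is part of what makes the test a good one. Here is a minimal checker sketch in Python (the `check_sentence` helper and its tokenization rules are my own illustration, not anything the models produced):

```python
def check_sentence(sentence, constraints):
    """Check word-position constraints on a sentence.

    constraints maps a 1-based position (or the string "last") to
    the required word. Tokens are split on whitespace, stripped of
    common punctuation, and compared case-insensitively.
    """
    words = [w.strip('.,!?";:').lower() for w in sentence.split()]
    for pos, required in constraints.items():
        idx = len(words) - 1 if pos == "last" else pos - 1
        if idx < 0 or idx >= len(words) or words[idx] != required.lower():
            return False
    return True

rules = {5: "Japan", "last": "apple"}
# Sentence 1 above happens to satisfy both constraints:
print(check_sentence("The best apples from Japan are known as Fuji apple.", rules))  # True
# Sentence 10 fails: "Japan" is the third word, not the fifth.
print(check_sentence("Farmers in Japan cultivate a special variety of apple.", rules))  # False
```

Run against the ten sentences above, only a couple of them pass the fifth-word check.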

Hi, Carlo! But did you notice what happens to the placement of the word "Japan"? Take sentence #10: it's the third (not the fifth) word in the sentence. Claude gets that right while GPT-4o still fumbles.

Yes, you are right. For me it was good to have ten items, as sometimes ChatGPT stops at three... :)

Modify the test to add additional clauses and it gets it only somewhat right:

Write ten sentences in which the last word is "apple" and the tenth word is "Hello" and the first word is "wtf"

Only 3/10 sentences contain the word "hello," and in only 2/10 is "hello" placed correctly.

Interesting! It could be that, with those additional constraints, it becomes practically impossible to produce grammatical sentences in English that comply with those conditions… at least ten times with different sentences. If we ask for certain word combinations (X word as first, Y word as second), English grammar may be stretched past breaking point.

Excellent piece and one that all GenAI skeptics should read. The terms used by yourself (Co-Intelligence) and Microsoft (Copilot) are how people should view these tools. They are useful assistants *now*; too many people dismiss their utility if the first answer to a simplistic prompt is not exactly what they wanted.

And I completely agree on Artifacts. I've been experimenting with LLMs since late 2022, and even I said, "Are you kidding me? This is so good!" when I first experienced Artifacts in action.

> I suggest that people and organizations keep an “impossibility list”

Great suggestion! I’ve been unconsciously doing something similar when a new “top” model is released but now I can formalize a named process and keep track of the results and potential impacts for my work.

Here's one: tell me an original joke that makes me laugh out loud.

Ok, here goes. Don’t ever say I didn’t put in the work at least.

https://chatgpt.com/share/3a1e49b8-9e64-4e9a-880e-2dc33dcab95d

Me: Tell me an original joke that makes me laugh out loud please. We’ll keep trying until you get it

ChatGPT: Why don’t skeletons fight each other?

They don’t have the guts.

Me: Nope

ChatGPT: Why did the scarecrow win an award?

Because he was outstanding in his field!

Me: Not original

ChatGPT: Why did the mathematician bring a ladder to work?

Because he wanted to reach new heights in his calculations!

Me: ChatGPT. I know you can do this. Lower the temperature and play with language a little

ChatGPT: Why did the dictionary go on a date with the thesaurus?

It wanted to find the right word to describe how it felt!

Me: Almost there please keep trying

ChatGPT: Why did the punctuation mark break up with the sentence?

It was tired of all the commas but no full stops.

Me: No

ChatGPT: Why did the philosopher always carry a pencil?

In case he needed to draw a conclusion!

Me: Again, more punny

ChatGPT: Why did the music note go to school?

To improve its pitch and measure up to the rest!

Me: Punnier

ChatGPT: Why did the tomato turn red?

Because it saw the salad dressing!

Me: Why would that be funny?

ChatGPT: Why did the scarecrow become a successful neurosurgeon?

Because he was outstanding in his field and had the brains to prove it!

Me: That’s just desperate!

ChatGPT: Why did the bicycle fall over?

Because it was two-tired of your puns and needed a brake!

Me: This isn’t going any better. Drop the puns and get back to work

ChatGPT: Why did the scarecrow get promoted?

Because he was the only one who could "field" the tough questions.

Me: Does it help if I tell you this is important?

ChatGPT: Absolutely, I understand the importance. Let me give it another shot:

Why don't scientists trust atoms?

Because they make up everything!

Me: I don’t think that’s original but at least it’s funny.

ChatGPT: Why did the invisible man turn down the job offer?

He just couldn't see himself doing it!

Me: Win

ChatGPT: I'm glad it worked! Here's another one for good measure:

Why did the computer go to therapy?

It had too many bytes from its past.

Same.

Here are some things off the top of my head:

- OCR to accurately transcribe the text in a screenshot of a medieval Hebrew manuscript

- an original and convincing interpretation of a Talmudic passage

- generating an accurate and photorealistic image of a contemporary Orthodox Jewish woman

I've found all the models to be very reluctant to discuss religion with me, running into frequent refusals.

Yep, I've found that as well. Though I'm dealing more with historical analysis, so that reluctance doesn't come up so much.

Agreed. If he says to do something, do it. That is my plan.

This past semester, a student of mine despaired over analyzing a 20+ question survey with 92 responses. She uploaded the spreadsheet to NotebookLM, a tool we'd used in class, which not only cheerfully assured her it would do the task, but also returned basic R values with short statements about strong relationships. She was delighted, beaming at "her" research results.

Most important to me was that she accepted the results which did not support her own biases, a task that even mature researchers struggle with. She didn't re-work the data or ask for an analysis that would support her initial hypotheses. The "magical" quality of the tool (it took less than 10 seconds to do the work!) created a type of intellectual distance, like an impartial observer, that she was willing to listen to.

I saw something similar back in the 80s when computers first entered classrooms. Some kids hated using them; others would accept correction from a machine that was intolerable from a human teacher. This past semester, I let students use AI, so long as they were transparent about it. Some didn't, others did. The concept of "co-intelligence" is out there, morphing perhaps into ambient intelligence.
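For context, the "basic R values" here are presumably Pearson correlation coefficients between survey columns, which are simple to compute directly. A minimal sketch in Python (the `pearson_r` helper and the sample responses are illustrative, not the student's actual data):

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Two made-up Likert-style survey columns with a strong positive relationship:
q1 = [1, 2, 2, 3, 4, 5, 5]
q2 = [1, 1, 2, 3, 3, 4, 5]
print(round(pearson_r(q1, q2), 2))
```

The impressive part, of course, is not the arithmetic but that the tool inferred which columns to compare and summarized the relationships in plain language.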

I’ve been building websites for small businesses for over 20 years, and I’ve been playing with AI for many months. Midjourney was one of the first art services I used. It was a little aggravating to get into, but the results were fascinating. I get several AI newsletters that I cannot keep up with, but I scan them on occasion. Through one of these I found an image creator that blew my mind: Leonardo.ai. They’ll give you a few credits, and you can use some of their services for free. It’s worth your time to try it. I realized I would use this quickly, so I subscribed. The biggest problem I had getting my subscription was paying for it, because they’re located in Australia. My bank thought this was a spam attempt and wouldn’t let the transaction go through until I actually talked to someone on the phone at the bank.

Leonid Afremov is one of my favorite artists. So after I got my paid-subscription problem solved, I started playing with the program. I can’t possibly use all the credits they give me every month, so I could do a lot of playing. I started putting in “Leonid Afremov” and describing scenes that I like. Once I got a feel for the program, I began to get some results that blew me away. It was like he had done work for me. Personally! The biggest concern I have is that these are my ideas but his talent. And he makes nothing on what I create.

AI is having an incredible impact on content creation. For me, that means text and artwork for my websites. But it is starting to hit many other areas. It’s worth your time to go watch the YouTube series of “60 Minutes” shows on AI. Just go to https://youtu.be/wrJsExM8D-o?si=y3u2shR0Hl8cLaPv

It’s over four hours of shows combined, and the first few will very likely captivate you. If your interest fades after an hour or so, at least find the segment on 3-D home building. This is another one that will drop your jaw.

Years ago my grandfather told me, “I’ve lived in the most fascinating generation in history. I’ve gone from riding a horse and buggy to watching a man walk on the moon.” I truly envied his generation. Today the changes are many times more frequent and many times more fascinating, much of it thanks to AI. I don’t envy his generation anymore. I wouldn’t trade mine for anything.

I've been blown away 3 times in 3 months, honestly:

- First, Claude 3 Opus was excellent at handling large amounts of text. I gave it a 300-page PDF of my sales pages and sales emails from the past few years, with just a one-page explanatory intro, and asked it to basically do my copywriting for a new product. It was at least 90% as good as what I would have done.

- Then, GPT-4o was released, and the improvement in speed, comprehension, and most of all precision in image generation was ASTOUNDING. Literally, I could generate very precise images I couldn't possibly have generated 3 days before.

I also noticed that when I asked it for help summarizing data, benchmarking things, or in general acting as a thought assistant, GPT-4o sometimes just creates code in the background without asking me. I clicked on the little bubble and was amazed to see it had created a single-use program on the fly to help me convert a table from the imperial system to the metric system. And that table was in a screenshot!

It'd have taken me at the very least 30 minutes a few months ago!

- And now Claude 3.5 even tops that, although I do think Claude 3 Opus is still better at some tasks.

To me, these evolutions made me pass a mental threshold. Before, I was using AI for pretty specific use cases, generally around content. Now, I use AI all the time just to accelerate thinking. I feel like I have a high-IQ, always-available, extremely quick assistant in my pocket.

To me this is as world changing as when I discovered the internet.
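For illustration, the kind of throwaway single-use converter described above might look like this (a hypothetical sketch: the column names and values are invented; only the 2.54 cm-per-inch factor is real):

```python
INCH_TO_CM = 2.54  # exact by definition

# Invented stand-in for the table read out of the screenshot:
table_in = {"width": 10.0, "height": 4.5, "depth": 2.0}

# Convert every value from inches to centimetres, rounded to 2 places:
table_cm = {name: round(val * INCH_TO_CM, 2) for name, val in table_in.items()}
print(table_cm)  # {'width': 25.4, 'height': 11.43, 'depth': 5.08}
```

Trivial code, but the point of the anecdote stands: reading the table out of a screenshot and deciding to write the converter unprompted is where the time saving comes from.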

There are many skeptics about this. For example, Daron Acemoglu is convinced that this whole movement is a flash in the pan. The Economist had an article last week where they note "America’s Census Bureau produces the best estimates. It finds that only 5% of businesses have used AI in the past fortnight. Even in San Francisco many techies admit, when pressed, that they do not fork out $20 a month for the best version of ChatGPT."

Executive confession: I am agnostic.

I think it just shows we’re still in the early days. When a typical executive realizes they can add 30 IQ points for $20 a month, and understands how that plays out in their daily workflow, the skepticism will melt.

I am flabbergasted when I talk to business owners and executives whom I respect who are not using this yet!

As Ethan said a year ago, it takes 10 hours of monkeying around to get into it. But I think a good AI executive training program can dramatically shorten the window, assuming a growth mindset on the exec’s part. I ran into a retired architect the other day who was bemoaning wanting to work in retirement but he had never learned CAD-CAM! Now he’s two technological innovations behind, i.e., hopeless.

But this innovation allows skipping technical skills you left behind. Why learn CAD drafting now when the Computer will Aid the Design 100% in the coming months? Anyhow, if someone did want to learn, now is great timing — they get unlimited private tutoring at any hour, all month, for the cost of one lunch.

They could just be wrong; at my company it took ages to get people to use things like Zoom rather than conference calls, despite it being an absolute upgrade.

Good point but this feels bigger than Zoom.

Zoom is worse than Google Meet and Teams; I don't know why it's such a big thing. Zoom is unreliable, with a heap of redundant features that slow the program down, especially on weak computers.

Bill Gates thought 640K of memory was more than anyone would ever need for a personal computer.

AI a flash in the pan? Nah.

A lot of people are saying foolish, myopic things. Even the smart people.

That particular Bill Gates quote is an urban legend: https://www.toptenz.net/top-10-famously-bad-predictions-experts-didnt-actually-make.php

But that doesn't take away from your point: Search, and you can find an expert opinion at any extreme of the prediction spectrum. All of them can't be right simultaneously.

Maybe I knew that, but had forgotten. Goodness knows it still floats around!!

A real one, not as dramatic, but still pretty fun: Digital Equipment Corp. founder Ken Olsen’s 1977 quip, “There is no reason for any individual to have a computer in his home.” (Olsen did actually say that, but he said later that the quote was taken out of context, and that he was referring not to PCs but to computers set up to control houses.) Yeah, no one would ever want a computer to control their home. 😂

Ha, that quote--and the clarification--is actually #4 on the list of misunderstood/misattributed ones in that same article I've linked to.

Jeez, even though they go to the trouble of noting that he really said it.

I'm so trusting.

They're great anecdotes, pity they're not true.

Well, this one is, but definitely taken out of context by modern audiences with the benefit of hindsight.

(Then again, I'm the one who researched and wrote that article, so hey, maybe take it with a grain of salt.)

Yes, there’s a lot of high emotion, low information opinions out there.

Well said!

You are right about bigwigs saying myopic things about computers. Supposedly, in 1943, Thomas Watson, chairman of IBM, said, "I think there's a world market for maybe five computers."

You can't get more wrong than that.

Oh, if that one is true, brilliant! 🤣

The Economist using “techies admitting they don’t pay $20/mo for ChatGPT Plus” as proof that GenAI is a flash in the pan just shows how little that author knows about the space. Techies experimenting with the tools know how to access the best models for free: GPT-4 and 4-Turbo have always been available for free in Microsoft Copilot, 4o is now free in ChatGPT, and Claude 3.5 Sonnet is now free in Claude. So there are many people who have never subscribed to a premium plan but have been using the best models all along.

Great article, and a lot of great comments. I'm a believer. However, it seems increasingly clear that we will turn over much of our thinking to AI and lose, or never obtain, some skills in the process. In which case, how will we know to question it?

I saw a presentation a week ago by a law professor at ASU. He said the writing in students' papers had improved recently, and his assumption was that it was because of AI. Which led me to wonder: how does he know what his students know? And worse, does it matter?

First, if we don't have sufficient in-brain skill and knowledge, we won't know what to ask the AI to do or how to interpret the products it creates. Second, if we turn over our thinking to AI, we are doing ourselves harm. I don't mean that in the metaphorical sense. Our brains are like muscles, and the harder they are used, the stronger they get. Conversely, the less we challenge them, the weaker they get. In the same way cars have contributed to sedentary lifestyles with negative health effects, over reliance on AI to do our thinking for us will almost certainly have negative cognitive impacts. To your example, the value in writing challenging papers in college isn't only to demonstrate our understanding, it is to engage in the difficult task of stringing ideas and words together in a way that makes sense to someone else. The hard work is what provides us benefit, not the pretty outcome. The neurologist Dr. John Medina has said that research shows the best way to fight cognitive decline is to never retire, to keep working the brain with challenging tasks. AI can certainly be a wonderful assistant to our intellectual struggle, but if we come to rely on it for all the hard tasks, we ourselves will be diminished.

Yet another terrific article -- thanks for sharing these insights. It made me realize the skeptics out there, the ones who will say things like, "sure, that's an interesting use-case but I don't see how this can apply to my world", are really saying, "I haven't bothered playing around with ChatGPT et al." I say that because most doubters, especially those stuck in Gartner's trough of disillusionment, are also the ones who are still most likely sitting on the sidelines. Or at least that's been the case for the skeptics I've crossed paths with.

In the Hamlet example, Claude is clearly referring to extant discussions of the passage, including the third meaning. This is impressive and allows it to produce an exceptionally detailed and informative response, but would it have interpreted the text well enough to understand about the third meaning without these?

Great article, Ethan. Your link to Claude is broken though; the right one is https://claude.ai/ :)

oops - will fix

GenAI skeptics are indexing "progress" off of a single outcome, and in a binary true/false way. I definitely believe that businesses and VCs have every motive to overhype AI, and that they are, dishonestly or honestly, doing so (it is easier to hype what you actually believe). But if the hype is "artificial general intelligence by 2030," the alternative is not "no more progress at all from now to 2030." It's a vast range of possible futures, not one of which is that AI systems become less capable than they are today. The transformation doesn't come when the AI is, at economically valuable task X, better than the best human being, but when it's better than 80% of all human beings, while being orders of magnitude faster and less expensive. And I think we are, for many tasks, almost there.

Yes! This matches my experience with 3.5 Sonnet, I'm using AI way more than before just because of it.

I did a bunch of coding over the weekend and every mini-project started with a long description to Claude of what I wanted and it was so good and so fast.

Even though I'd have described myself as a heavy AI user up to now, I would never have bothered with that before, because it just wouldn't have worked.

Thanks, very useful!

Astonishing post, well written and very insightful!

I was a participant in a development session recently where we were asked how good we thought we were as listeners. We then were presented with this example: https://www.eslwriting.org/wp-content/goldilocks-T-F-quiz.pdf

It is designed to be intentionally misleading to anyone who is familiar with the story, and to demonstrate how prone we are to making assumptions based on context while listening.

Immediately I was thinking: how would AI handle this task? ChatGPT-4o failed to understand and would constantly insist on certain things based on assumptions. Claude 3, however, did better initially but still got some things wrong; when I asked it not to assume anything and to focus on the story itself, it went through each of the statements, corrected itself, and explained why and how it had come to the new conclusion.

Since then, the 'understanding' Claude has demonstrated has been consistently more impressive to me than GPT-4o's.
