58 Comments
author

I realized revealing the answer now will ruin the survey, so I'll add it here in a few hours.

author

No answer ever crossed 50%, and the people who guessed correctly did so because they suspected I was a bad photographer and that too many of these pictures seemed well lit and posed, not because any of them looked obviously AI. Ha!

Before I give the answer, at the time I revealed it, the poll stood at:

1st image (the one on the left) - 12%

2nd image - 45%

3rd image - 33%

4th image (the one on the right) - 9%

The correct answer was the 2nd image, which, while it got the plurality, was not the runaway winner. And these were my own amateur efforts, which took just a few minutes. I rest my case, I think.

It's not that you're a bad photographer, it's that some of the other images looked so good in terms of lighting and whatnot that they were unlikely to be real. That's my main heuristic these days, when something looks "too good." I wavered on the 3rd one since the two women looked so similar, and I thought that seemed real (e.g., sisters) rather than unreal. It's interesting how all of these AI images are training us to come up with priors for what a real photo should look like.

I'm not sure how fast AI-generated images will get fully realistic, since that last 1-2% probably isn't worth going for, but the distinction is going to get harder to make as our actual photos get more and more post-processed by Apple and Google.

I chose #2 because the scene looked slightly less arranged than the others.

I am now astounded by the woman's tiny hand in the second picture... I based my guess entirely on hands and went with #3!

Yeah more than half of voters guessed wrong!

There is no single answer with as much as 50% of the votes; I think this proves your point already.

The two girls in a class. I didn't zoom in, but the context (background, colors, etc.) made my decision.

I went for my immediate impression without looking too carefully, but one of those images immediately stood out to me as authentic in that it looked less "professional" than the others. Two others seemed very AI to me, in that they had some of the characteristics of professional photography. A third sat halfway between the "authentic" one and the two "professional" ones.

Of course, just like your examples of redrafting the WW I text, it would be perfectly possible to prompt for more "average snapshot taken with a cheap phone camera" to simulate exactly this difference.

When you give the answer, please record what the survey results were at the time that you posted it!

AI seems to make people awfully good-looking. Even if I prompt it to make the humans in the picture look less like models and more like regular humans, it makes them look strangely "perfect." It also seems to "prefer" displaying younger people. Bias?

The photo you took is the second one from the left. It's the one that isn't professionally lit in a way that a photographer would arrange. Plus, one of the students is wearing what looks like a Penn sweatshirt. If I'm wrong, then my mind will be blown!

A decade post MGMT 801 and you're still teasing me Ethan. Great blog BTW. Incredibly useful.

I chose #2 because I've messed with enough AI image generators to have a sense of what those images look like, and #2 looks the most 'human' - but yes, you're right, mostly I just guessed based on that experience.

Oct 12, 2023·edited Oct 12, 2023

I love that closing sentence: "The only thing I know for sure is that the AI you are using today is the worst AI you are ever going to use."

Also, I wanted to address an earlier point about the internet being a finite source of content for AI training, and using AI-generated content to bypass that. There's a potential phenomenon called model collapse that may occur if LLM output becomes the dominant source of information that subsequent generations are trained on. Paper here: https://arxiv.org/abs/2305.17493

But the TL;DR version: the probable gets overrepresented, and the improbable (but real) slowly gets erased. Given the probabilistic way these large models work, this makes a lot of sense - but a probable reality and an actual reality are two very different things.

LLMs and LMMs (large multimodal models) are likely to improve for quite a while yet, but it's quite possible that the trajectory will not be linear or even exponential. There will probably be some hidden valleys of performance loss that we might not notice until we solve them with novel architectures (if we ever notice them at all!)
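
The sampling effect the paper describes can be illustrated with a toy simulation: repeatedly re-estimate a token distribution from its own samples, and rare tokens vanish and never come back. This is only a sketch of the mechanism, not the paper's actual experimental setup; the vocabulary size, sample count, and generation count below are arbitrary choices:

```python
import random

def resample_generation(dist, n_samples, rng):
    """Draw samples from dist (token -> probability) and re-estimate
    the distribution empirically from those samples alone."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    sample = rng.choices(tokens, weights=weights, k=n_samples)
    counts = {}
    for t in sample:
        counts[t] = counts.get(t, 0) + 1
    return {t: c / n_samples for t, c in counts.items()}

def simulate_collapse(vocab_size=100, n_samples=200, generations=20, seed=0):
    """Train each 'generation' only on the previous generation's output
    and track how many tokens survive."""
    rng = random.Random(seed)
    dist = {t: 1 / vocab_size for t in range(vocab_size)}
    support = [len(dist)]
    for _ in range(generations):
        dist = resample_generation(dist, n_samples, rng)
        support.append(len(dist))
    return support
```

Because each generation can only emit tokens it has actually seen, the support size never grows: once an improbable token is missed in one sample, it is gone for good.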

So I'll close with a sentiment that echoes yours: "The only thing I know for sure is that the AI you are using today is the worst AI you are ever going to use - but the same thing might not be true in the future."

People always overestimate their own perceptive abilities. I remember when preparing for Kabul, we had to look at still shots of video footage just before a suicide attack so that we could see that no matter how perceptive you think you are, you are not going to spot the suicide bomber except through sheer dumb pin-the-tail-on-the-donkey luck.

Hey Teachers! Want a foolproof way to detect AI cheating students? Sit them down with a pencil and paper, write an open question on the chalkboard, and have them compose an answer. WATCH ‘EM SQUIRM.

The 1st and 4th were the easiest to eliminate, because they look like classic genAI images. I chose the 2nd because one of the students is sitting on her coat - genAI is unlikely to add such a detail :D But without the reference of the other pictures, I could not have detected that the 3rd is generated.

Regarding LLMs, I notice the one I reach for most has been Llama 2 70b, which is available free at https://labs.perplexity.ai/. A quick search shows it is comparable to GPT-4 in many respects. I find it responds very quickly. Bard is slow to the point that it breaks my flow. Using Bing has a bit more friction and is slightly slow. I have just signed up for Claude, so it will be interesting to see how it compares. Thanks.

I've been meaning to post a 'thank you' here since I started reading your comments back in the spring. Since then, I read your posts as soon as they hit my inbox. Along with your articles, they have been a major influence on how I redesigned my teaching for this academic year (I am an academic in a British 'Russell Group' university, where I teach mathematics and quantum mechanics to undergraduates in chemistry and other natural sciences). I'm finding AI excellent for these subjects. Without expertly-prepared prompts, current AIs are poor at doing the work I ask the students to do. They are excellent, however, at tutoring students on how to approach the material, how to engage with it, and how to assess their own work.

Declaration (as opposed to 'disclaimer'): every word on this post is 100% genuinely mine. AIs did not contribute to it in any way! ;D

I recently asked ChatGPT 4 to help me think of a question a human has never asked. It could not do it... it just couldn't... just a variation on the same grand themes of our short time...

Well, I don't know what the question is, but the answer is 42

Not much to do with the post, but I just wanted to share about a strange bug I have found on ChatGPT. Whenever I type "Frank Van Dyke," it gives me the error message, "This prompt may violate our content policy." It was just meant to be a fictional name that I made up for my teaching material, but ChatGPT stubbornly refuses to accept any prompts with it. I searched the internet but did not find any controversial figures associated with the name. It's been a mystery to me.

I have a colleague with a similar last name and their name often gets flagged. I assume it is referencing the homophobic term (at least in the UK) for lesbian without sufficient context. Made all the harder because like "queer" it is also a term that has been reclaimed and used positively, which can be a challenge for language models to detect.

So the woke filters aren't even competent.

This is not a new issue.

https://en.wikipedia.org/wiki/Scunthorpe_problem
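
For anyone curious how this class of bug happens: a naive substring blocklist flags innocent words, while word-boundary matching avoids most false positives (at the cost of missing deliberate obfuscation). A minimal sketch, using a made-up blocklist term rather than anything from a real moderation system:

```python
import re

BLOCKLIST = ["ass"]  # stand-in term for illustration, not a real moderation list

def naive_filter(text: str) -> bool:
    """Substring matching: flags innocent words like 'Cassandra' or 'class'."""
    t = text.lower()
    return any(term in t for term in BLOCKLIST)

def word_boundary_filter(text: str) -> bool:
    """Match whole words only, so embedded substrings pass through."""
    return any(bool(re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE))
               for term in BLOCKLIST)
```

Real moderation filters are far more sophisticated than either of these, but the substring version shows why "Scunthorpe"-style names keep getting caught.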

And wasn't it solved decades ago?

Obviously not, since there are examples on the page from less than decades ago.

Just because a problem is solved, doesn't mean everybody follows best practices.

I didn't know "dyke" had such a meaning. Thanks for the information! I will try to come up with other Dutch-sounding names in the future :D

Are you using 3.5? I just asked ChatGPT 4 and it doesn't seem to have an issue with it, at least for me.

Really? I use ChatGPT 4, too. It might have something to do with my settings, then.

Oct 12, 2023·edited Oct 12, 2023

GPT-4 did not get worse, it is just different, so you might need to change the prompt. In my case it actually gets better: more or less the same prompt seems to return better results over time. How should you manage this as a coder? In traditional development, (more or less) once it is tested and it works, it works. What is the right testing approach when you use an LLM API?

Thanks!

LLMs are really good at evaluating their own outputs.

From my experience few-shot GPT4 is just fantastic.

You can construct QA banks and get the LLM to verify that answers haven't significantly drifted in tone or content.

Also, a bit more domain dependent, but LLMs can reason really well by writing and then evaluating code. Their ability to accurately assess the validity of code interpreter sessions is very high.
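
A QA-bank drift check like this can be sketched in a few lines. Here `generate` and `judge` are hypothetical stand-ins for your LLM call and an LLM-as-judge call, injected as plain callables rather than tied to any specific API:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class QAItem:
    question: str
    reference_answer: str

def check_drift(
    bank: List[QAItem],
    generate: Callable[[str], str],          # stand-in for your LLM call
    judge: Callable[[str, str, str], bool],  # stand-in for an LLM-as-judge call
) -> List[QAItem]:
    """Regenerate an answer for each QA item and ask the judge whether it
    still matches the reference; return the items flagged as drifted."""
    drifted = []
    for item in bank:
        candidate = generate(item.question)
        if not judge(item.question, item.reference_answer, candidate):
            drifted.append(item)
    return drifted
```

In practice the judge would prompt a model to compare the candidate answer with the reference for tone and content; run the bank on every model or prompt update and investigate whatever comes back flagged.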

Oct 12, 2023·edited Oct 12, 2023

Yes, I need to implement this! In my case the "question" is a flat dataset and the "answer" is a structured JSON file, so building and checking a QA bank should indeed be possible. Thanks for the suggestion!

Great insights … humility is one of your absolute best virtues …in helping us travelers experience the AI journey 👏🏿👏🏿👏🏿

"The AI you are using today is the worst AI you are ever going to use" Love it.

Without zooming and searching for details, I moved relatively quickly to #2 (from left). It would look pretty much like the photo I would take myself (with a relatively cheap, 1 year old phone).

I immediately rejected #1 and #4: they are perfectly lit, and the blurred background looks more like a studio shot than a classroom snapshot. And as for #3 (the two girls), I quickly realised it was AI-generated: the window behind the girls (I think it should be a window) made me think it couldn't be a real photo.

BTW: Because of my relatively bad English, I had "DeepL Write" look over my comment - except this very last sentence (did you notice?)

Anonymous · Oct 28, 2023

"training AIs on data that the AI makes up." If AI makes up the data, what good is it?

Open to collaborate Ethan? -Quentin
