Discussion about this post

Green-2.99

It seems like there is a distinction missing here regarding what replication means. Running another statistical analysis on existing datasets is not the same thing as replication, which requires running experiments again. So in psych for example, it's not just analyzing the existing reported data from a questionnaire filled out by participants: rather, it's obtaining a new batch of questionnaire results and seeing whether those results accord with prior sets. Am I missing something?

Ezra Brand

>"It all checked out. I tried this on several other papers with similarly good results, though some were inaccessible due to file size limitations or issues with the replication data provided. Doing this manually would have taken many hours."

But it didn't actually do anything helpful. It just said "it all checks out". Why aren't AI agents going through tens of thousands of papers and unearthing the ones with problems?

In practice, even cutting-edge AI tools still need a tremendous amount of guidance to do anything helpful, even on narrow tasks like coding. And they go off the rails in a significant percentage of cases.

I say this as a big user of AI for the now-standard use cases of coding, editing, and search. But regarding the bigger tasks described here, I'm quite skeptical. Like many others, I've become skeptical of toy studies and benchmarks. I'm constantly testing the tools, and they simply don't currently work well for larger tasks.
