Clinical trial data ultimately drives medical care.
Defective data can result from incompetence or malicious intent, both of which peer review should catch. Can it? And with what consequences?
Nature Reports (published in Nature This Week) carried an article by Richard Van Noorden under the headline
Medicine is plagued by untrustworthy clinical trials. How many studies are faked or flawed?
The subhead says
Investigations suggest that, in some fields, at least one-quarter of clinical trials might be problematic or even entirely made up, warn some researchers. They urge stronger scrutiny.
Are you shocked? (Read Van Noorden’s article.)
I accept the author’s estimates at face value, recognizing that the real number may be higher or lower.
The author’s estimates were based on good-sized samples, but the field in which he assessed the levels of bad data (unforced, unintended errors and outright fakery), anesthesiology, was quite narrow.
Further, his methodology may not be sufficiently repeatable to produce independently verifiable results.
So the author’s analysis of science research has apparent weaknesses in generalizability and replicability.
So what?
He deserves commendation for flagging the issue, getting it more widely discussed and, one would hope, prompting better approaches that sharpen the estimates and support remediation.
Which brings me to an announcement from 9 July this year of ChatGPT Code Interpreter. This is a version of ChatGPT (running GPT-4) extended with many new abilities based, at least in part, on converting the user’s request into code (for example, Python), which it then runs to generate an answer.
There are many uses for the new capabilities, but the one that stands out to me, initially, is its ability to assist with (and partially take responsibility for) data-analyst activities. This is not a fixed set of Python programs that the ChatGPT creators developed and bundled; the model writes the code on the fly.
Here’s what its developers said on GitHub about the new value of the Code Interpreter for data analysts. It handles:
Open-ended analysis:
"Here is some data on superhero powers, look through it and tell me what you find"
"I am interested in doing some predictive modelling, where we can predict what powers a hero might have based on other factors."
Joins: "Could you first combine the two datasets"
Integrity checks: "... and confirm that they combined properly"
Data Cleaning: "does the data need cleaning in any way?"
Data Preprocessing: "Great! Cluster analysis can help us group similar superheroes together based on their attributes. Before we start, we need to preprocess our data. This will involve the following steps:" https://chat.openai.com/share/770ab170-8fed-402f-850f-e6d7273e77cd
Summary of analysis: "what is your summary of all this analysis, in bullet points?"
Create Interactive Dashboard https://emollick.github.io/Superhero/
And a lot more! Go here to get more detail
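To make the join and integrity-check items above concrete, here is the kind of code such a request might translate into. This is my own sketch, not what the model actually emits: the datasets and column names are hypothetical, invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two superhero datasets mentioned above.
powers = pd.DataFrame({
    "hero": ["A", "B", "C"],
    "strength": [9.0, 7.0, None],
})
info = pd.DataFrame({
    "hero": ["A", "B", "D"],
    "publisher": ["X", "Y", "X"],
})

# Join: "Could you first combine the two datasets" -- merge on the shared key,
# keeping unmatched rows from both sides so nothing silently disappears.
combined = powers.merge(info, on="hero", how="outer", indicator=True)

# Integrity check: "... and confirm that they combined properly" -- count how
# many rows matched in both tables versus only one side.
match_counts = combined["_merge"].value_counts()

# Data cleaning: drop rows with a missing value in a column the analysis needs.
cleaned = combined.dropna(subset=["strength"])
```

The `indicator=True` flag adds a `_merge` column marking each row as `both`, `left_only`, or `right_only`, which is one straightforward way to verify that a join behaved as expected before analyzing the combined data.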
With some extra work (!!!), this technology might be fed research papers submitted for peer review to identify problems in:
The data sets themselves
How they were analyzed
How they were presented
Evidence of data dredging, p-hacking and other statistical abuses.
Inferences the researchers drew
This might give editors automated tools to deal with elaborate content that doesn’t follow fixed templates or formulas.
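As one concrete illustration of such an automated screen (my own sketch, not an existing Code Interpreter feature): a caliper-style comparison that flags a manuscript whose reported p-values bunch up just below the 0.05 threshold, one commonly cited signature of p-hacking. The p-values below are made up.

```python
# Hypothetical p-values extracted from a submitted manuscript.
reported_p = [0.049, 0.048, 0.047, 0.21, 0.044, 0.03, 0.046, 0.62]

# Count values in narrow windows just below and just above 0.05.
just_below = sum(1 for p in reported_p if 0.04 <= p < 0.05)
just_above = sum(1 for p in reported_p if 0.05 <= p < 0.06)

# Caliper-style comparison: many more values just below the threshold
# than just above it is one possible red flag worth a human look.
# The factor of 3 here is an arbitrary illustrative cutoff.
suspicious = just_below > 3 * (just_above + 1)
```

A real screen would need careful calibration and would only flag papers for human review; clustering near 0.05 is a prompt for scrutiny, not proof of misconduct.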
Of course, ChatGPT and other LLMs have problems with consistency between answers to the same question (an issue that OpenAI says has been reduced with the latest release; reduced, not eliminated).
My mind boggles, too, when I think about researchers running their drafts through this class of tooling beforehand so they can tweak the numbers to get a passing grade.
Where in the real world do you think the new data analyst capabilities in the GPT-4 Code Interpreter should be tested? Do you know anyone doing it yet? How does it apply to you and your organization?
Note: This post is my own opinion © 2023. I have no conflicts of interest.
On LLMs and code generation - I struggle to reconcile the enthusiasm of the many with the skepticism of a few certified experts... https://shorturl.at/jrzDG