Despite fears surrounding the supposedly widespread use of LLMs in generating academic writing, it remains to be seen what the quality of such texts actually is. There are, of course, conversations about the ethics of using such algorithms in the classroom (Bender et al. 2021; Yan et al. 2024), and certainly LLMs don't produce text the way people do, given how they are "trained"/manufactured (Bender & Koller 2020; Bender et al. 2021). In fact, the consensus among people who work in computational linguistics is that while LLMs can produce coherent text, they do so by exploiting statistical regularities in aggregate human texts rather than "understanding" what the patterns in those texts mean (Titus 2024). Simply put: LLMs don't write like people because they don't treat language the way people do, even if what they produce is grammatically correct.
In a conference paper titled "LLMs don't do things with words" (Rosen & Dale 2024), the authors provided evidence that text generated by ChatGPT contained less new information than human-generated text. One of the main takeaways from that paper is that when LLMs produce a text, they don't advance the conversation -- if you never contribute novel information, the conversation is ultimately one-sided. And people don't talk with each other in a one-sided way: we're constantly bringing our own thoughts, criticisms, and experiences to the conversations we're a part of.
Academic writing is itself a kind of conversation. The objective of writing papers, at both the undergraduate and researcher levels, is to provide novel insights into questions -- or at least a window into the mind of the person writing that particular text. If LLMs like ChatGPT are bad at incorporating novel information into a discourse, then the text they produce should bring less new information to bear than our classmates' responses do. In other words, having read one LLM-produced text, you should be able to predict more about the ideas in another LLM-produced text than you could predict about the ideas a classmate wrote about after reading one of their classmates' texts. We wanted to test this.
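To make the predictability idea concrete, here is a minimal illustrative sketch (not the method we actually used -- see the methods tab for that). It assumes a crude proxy: if two texts cover the same ideas, the words of one should let you "predict" the words of the other, which we can approximate with cosine similarity between bag-of-words vectors. The example sentences below are invented for illustration, not data from the study.

```python
from collections import Counter
import math

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors of two texts.

    Higher similarity here stands in for "more predictable from one another."
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical examples: two LLM-style paraphrases vs. an idiosyncratic human response.
llm_1 = "language models generate fluent coherent text from statistical patterns"
llm_2 = "language models produce coherent fluent text using statistical patterns"
human = "my cousin once argued that essays should open with a personal story"

# The hypothesis predicts LLM texts resemble each other more than they resemble
# a human text bringing in its own experiences:
print(cosine_similarity(llm_1, llm_2) > cosine_similarity(llm_1, human))  # → True
```

This toy measure only counts shared surface vocabulary; it is just a way of picturing what "one text lets you predict another" means before reading the full methods.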
If you are interested in our findings, feel free to skip ahead to the results tab. To get a better picture of the methods we used to test the hypothesis above, see the methods tab. And take a moment to stop by the contributors & references tab to thank the people who ran this mini citizen-science study and to see our citations.