Leiter Reports: A Philosophy Blog

News and views about philosophy, the academic profession, academic freedom, intellectual culture, and other topics. The world’s most popular philosophy blog, since 2003.

Does AI degrade human comprehension and reasoning?

Law professors at the University of Minnesota investigated, and came up with a somewhat more optimistic answer than a lot of the research, although careful structuring of how and when AI is used is probably needed to avoid negative effects.

Comments from readers who actually read the paper are welcome.

6 responses to “Does AI degrade human comprehension and reasoning?”

  1. These types of experiments should, as much as possible, emulate real-world conditions. The initial assignment in this study was to draft a legal memo for a partner in 75 minutes. It is unsurprising that the non-AI group was unable to fully absorb the assigned materials under such constraints. Does anyone dispute that AI-assisted review is superior to cursory human review?

    1. Thanks for your comments. A few quick responses:

      (1) The source packet was 12 pages long and less than 5,000 words. We specifically set time limits to ensure that participants who were reasonably diligent would have plenty of time to read the materials and craft a brief memo synthesizing the rule. We selected 75 minutes after pre-testing, where we found this was an adequate amount of time. And, in fact, the average amount of time that participants in the control group spent on the synthesis task was 69.46 minutes (out of 75), which is inconsistent with the idea that we gave participants an insufficient amount of time. In this regard, two additional points are worth highlighting. First, participants had meaningful financial incentives to perform well, meaning that they generally had good reason to work diligently and efficiently within the experimental structure. Second, our initial hypothesis, consistent with literature from outside the legal domain, was that participants in the AI group would perform less well than those in the control group once they no longer had access to AI; we thus were motivated to ensure that the time we gave participants to read and digest the relevant legal source materials in stage 1 was sufficient.

      (2) We certainly agree that experiments should emulate real world conditions to the greatest extent possible. At the same time, crafting the experiment to test our hypothesis meant imposing some artificial constraints on the participants. In our case, the biggest such constraint was not time (which, again, we pre-tested to ensure reasonability), but the highly structured breakdown of tasks and subtasks for participants. This is indeed quite artificial; a partner tells an associate to write a memo on topic X, and does not usually provide source materials, instruct them to first research the law, tell them how to use AI, etc. We acknowledge this artificiality in the paper, and note that it may well be one of the reasons that AI improved participants’ capacity to engage in legal reasoning even after AI was no longer available. For that reason, we highlight the practical takeaway that lawyers should confine AI use to narrow, well-defined components of a project.

      (3) We did not test whether “AI-assisted review is superior to cursory human review.” We tested whether human legal analysis that was based on AI-assisted synthesis of legal source materials was superior to human legal analysis based on ordinary human reading of legal source materials. We certainly did not think the answer was “yes” going into the experiment (as indicated by our pre-registered hypothesis), consistent with the broader literature we discuss in the paper. In retrospect, the data do seem clear that the mechanism at play is that AI allowed participants to better understand the legal source materials, which then had downstream positive effects on their capacity to engage in legal reasoning based on those materials. But that result was not intuitive to us, at least.

      1. Thank you, Dan. Your “synthesis task” required students to both read the packet and write a legal memo on a novel topic in 75 minutes. Do you know how much time participants spent reading versus writing? The mere fact that the control group completed the assignment in 70 minutes on average versus 61 for the AI-assisted group does not mean that time was not a factor or that they read as they normally would.

        1. *Yes, you are correct that the synthesis task required participants both to read the source material and draft a memo. The assignment was to draft a “short (around 750 words)” memo “summarizing the current law” in the source packet. We don’t have any way of knowing how much time participants spent reading vs drafting.

          *To be clear, we DID intend for time to be a factor, precisely because this reflects the reality of legal practice. But we did our best to structure the assignment so that there was a reasonable amount of time to complete the assignment. I do think the fact that the average amount of time spent was 70/75 minutes for the control group when they had financial incentives to perform well is strongly suggestive that we hit our target (a target we set after pre-testing and that we had incentives to hit in order to test our original hypothesis). But I’m sure that there were indeed some participants who would have performed better with more time; this is true for pretty much any timed assessment, and I don’t think that this alters our analysis/conclusions given the reality that lawyers often must work under tight time constraints.

  2. Thanks for a very interesting paper as we try to understand this new technology and how to work with it. The results of this study are important and counter-intuitive (I would have agreed with the authors’ hypotheses on the effects of removing AI assistance!). And I wonder how much is tied to the specific tasks performed in the study (a limitation the authors acknowledge). I think there’s a useful distinction between legal reasoning in which the relevant analytical framework is specified in advance (what we might call “closure-oriented”), such as synthesizing legal sources or applying a test to novel facts, and legal reasoning directed toward constructing a framework that responds to the needs of the situation (what we might call “generative”, if we could free the word from its association with “generative AI”), such as counseling in an emotionally and strategically complex legal matter.
    The study’s findings are suggestive: that AI improves the quality of synthesis and downstream application suggests to me that it is a potentially powerful tool for those closure-oriented tasks. That AI levels the quality of revision by strengthening weaker memos and degrading the strongest ones–potentially by degrading the latter memos’ precision, granularity, and creativity–suggests that it optimizes for competency while blunting the idiosyncrasies that make for excellent (and “generative”) work. Distinguishing between forms of legal reasoning may be helpful, recognizing that legal practice encompasses many ways of thinking through legal problems. Again, a fascinating and provocative read!

  3. Very interesting interview with Professor Schwarcz on the latest episode of Ipse Dixit podcast:

    https://shows.acast.com/ipse-dixit/episodes/daniel-schwarcz-on-ai-and-human-legal-reasoning
