Leiter Reports: A Philosophy Blog

News and views about philosophy, the academic profession, academic freedom, intellectual culture, and other topics. The world’s most popular philosophy blog, since 2003.


A framework for preserving authorship and trust in the AI era

Philosopher Eli Alshanetsky has been writing about “how we might verify human authorship and accountability in AI-mediated work without shifting the burden onto already overstretched faculty.” A short version of his ideas is available at The Conversation. A longer version is here.

Professor Alshanetsky welcomes comments, and intends to reply here. Be sure to at least read the shorter Conversation piece before commenting.



16 responses to “A framework for preserving authorship and trust in the AI era”

  1. “Yet those mostly reward speed under pressure, not reflection.”

    This is a criticism of blue books that I often hear. But I don’t think it lands, especially if the assignment was designed with this problem in mind.

    Here is what I have done. I give students a full class meeting to write first drafts in blue books. I collect the drafts and have them redistributed in discussion section for peer review. After peer review, my TAs collect the drafts and redistribute them in lecture for a “final” draft session. (I say “final” because I allow students to do further rewrites if they please; to do so, they have to book an appointment where they will be given the previous draft of their paper and a fresh blue book.)

    It’s worth adding that the results have been very promising. I have completely gotten out of the funk of mistrust that set in when GPT was released. Further, engagement is up: Students are talking to me, my TAs, and each other noticeably more. Even better, they’re enjoying it. So far, I have only gotten rave comments from my students, who have been telling me that they sincerely enjoy being given the opportunity to think for themselves and like the break from staring at screens.

    1. That makes sense. My line about blue books was mostly aimed at one-shot, timed exams. What you’re describing sounds more like a writing workshop, which I can see working well.

      Part of my worry, though, is that teachers shouldn’t have to give up assignment formats that were there for a reason just because tech has changed. Your system takes coordination and classroom time. I imagine students probably end up writing shorter essays, and it comes at the cost of discussion or other kinds of engagement you could do in a “flipped classroom” model.

      The “away from screens” benefit is nice, but part of the challenge is that the blue-book format no longer fits how many of us actually write or think. My own process involves lots of cutting, pasting, and rearranging large sections of text, and my handwriting is terrible. Reading student handwriting, or forcing everyone to write linearly without revision tools, adds a different kind of overhead. It can also complicate accessibility, since many students now have approved accommodations for extended time or digital tools, which I imagine this format doesn’t easily support.

      The protocol I’m developing keeps the benefits you describe, but doesn’t require teachers to redesign their assignments or students to change the way they actually write. It also creates a space where teachers can hold themselves to the same standards (e.g. returning digital feedback under the same transparency conditions, which keeps the process collaborative rather than adversarial). It also opens new options: you can still have “AI-free” in-class writing, but you can also allow limited AI use under teacher-defined rules (in class or out), or unrestricted AI use paired with an authorship check at submission, which tests whether students have genuinely assimilated what they wrote.

      (Here’s the longer article for anyone interested in the broader framework:
      https://link.springer.com/article/10.1007/s12115-025-01149-x)

      1. One follow-up thought: if students use AI for almost all their writing outside of class, they’ll gradually lose the ability to write on their own, and we’ll end up lowering standards to meet them where they are, just as we did when smartphones and social media started breaking up how people read. Blue-books might slow that trend, but they don’t really solve it. What we need to teach are metacognitive skills that help students notice when they’re actually driving the process and when they’ve started to hand it over to AI. That’s largely what the protocol is meant to do.

      2. Thanks for the response. A few thoughts…

        First, I have experienced in-class, pen and paper writing sessions as an extremely *good* use of time. Writing days are very much a flipped classroom experience. Students work on their papers and ask me and TAs questions that are coming up as they work, including little ones like trying to find synonyms for words they don’t want to repeat. It’s a very nice experience, and I think the educational return on doing in-class, pen and paper writing sessions is higher than that of the lectures they are displacing.

        As for the comment about accommodations, this has not been an issue. All of my students are STEM students, and their other instructors frequently give assessments that are not open-internet; I’m holding a similar line. This is not to say that we do not accommodate students. The point is that there are many ways to do so, because it is not uncommon for instructors to require in-class, closed-internet assessments. Our campus, for example, has a testing center with computers that do not have internet access. One does not have to use these services; it’s just an example. What we have done is book a room in the same building for the same time slot as the writing sessions, which allows students to sit with each other and a TA while they write and to ask questions as they work through their ideas. In short (and again), accommodations have not been an issue: students who have them know how to navigate them, and the university has a variety of options for assistance.

        Finally, access to cut and paste, etc., has not been an issue. I have literally gotten zero complaints of this sort. And, to be clear, I have been open with my students that I’m doing this for the first time and want feedback (and I have given them avenues for submitting anonymous feedback, all of which, it turns out, has been positive). Further, we have, as a class, discussed strategies for using blue books in creative ways to support different ways of writing, including writing sections, paragraphs, etc. out of order and reordering them later (which can be done with literal cutting and pasting!). More generally, pen and paper is a very flexible format that can be adapted to many ways of thinking. My students do not seem to be bothered (or hindered—to your other point, their essays have been very good, and I have been holding them to *higher*, not lower, standards) by not having the option to cut/paste, etc. digitally.

        In short, I appreciate the concerns. I had them myself. Having now run the experiment, I can say that my use of blue books has been a major improvement and that the actual trade-offs have been minimal to nonexistent. I will stick with blue books, and my only regret is that I wasn’t doing this before GPT forced my hand.

  2. This is an interesting proposal, but do I understand correctly that the verification method only works if the students compose their work in a lab where people can monitor them? Otherwise, they could open another browser (perhaps on another machine) and feed the work into an LLM and get real-time fabricated results to plug into the lab that way, correct? I would be reluctant to require that essay-writing take place in a proctored environment like that, because I’m not sure it’s the best way to stimulate good essay writing, but perhaps that’s the only way to continue to assign out-of-class essays.

      That’s a great question, and I completely understand why it might sound that way — but no, the idea isn’t to have students write in a monitored lab or proctored environment. In fact, the whole point is to move away from surveillance: spying on students’ private writing habits, or anything in that vicinity.

      In the Authorship Check mode, students write wherever they normally would. Before submitting, they go through a short, low-pressure interaction with an AI assistant that asks targeted questions about specific parts of their own draft; for example, suggesting they clarify a thesis claim, vary an example, or choose between alternative formulations. The student edits or responds in real time for about 5–10 minutes. That process shows whether they can actually work with and extend their own reasoning.

      To “game” it, a student would have to feed their essay and the adaptive prompts into another AI in real time (while having the live conversation with ours), figure out which parts were being probed, and make consistent edits as the essay changes. That’s a level of effort and sophistication that’s much harder than just doing the work. Someone will try, but that’s fine. Teachers who want to run the check in class to close that loophole could do it, though I doubt it’s really necessary.

      In the AI-Free mode, a student could still “game” the system by taking screenshots and feeding them to an AI on another device, but they’d have to retype everything manually, since copy-paste and AI calls are disabled. That’s easier to get around than the Authorship Check mode, but it’s still a step up from just trusting students to follow a “no-AI” policy on the syllabus. The goal isn’t to eliminate cheating altogether; it’s to solve the AI problem, which is to get us back, or at least closer, to the baseline of trust we had before.

      1. How do you then evaluate the students editing/responses that they produce in the pre-submission interaction?

        Is that more grading work for the instructor, or is that assessed in an automated way by another LLM? (that seems fine, if you have some way to be confident that the LLM is doing a good job).

        Or does no one see that interaction, and the point is just that the students went through it? I guess if the essay is improved, that will show up in the final submission. But it seems like it could be significant extra work to compare the pre- and post-edit versions to see what effect the interaction had. (That is, it seems really worthwhile to do as part of this pilot program, just to see what kinds of effects you get, but doing it as part of every assignment does seem to add to the burden for overstretched instructors.)

      2. Replying to Anonymous below: “How do you then evaluate the students editing/responses that they produce in the pre-submission interaction? Is that more grading work for the instructor?”

        No, it doesn’t add grading work — that’s the whole point. The system automatically checks (using rules we set) whether students can work within their own essay. If they can, it passes through; if not, they try again. The comparison between versions is simple and tightly constrained, the kind of coarse evaluation we’re confident an AI can handle reliably (in fact, there are several such checks during the short 5–10 minute process). Instructors don’t review that interaction in any form. (And you’re exactly right about tracking effects: we’re studying this now as part of the initial pilot before a broader rollout.)
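To make the shape of this concrete, here is a minimal sketch of the flow described in this thread: the two submission modes, the short pre-submission interaction, and the coarse automated pass/fail checks with retries. Every name and rule below (`Mode`, `passes_rules`, the single-retry limit) is invented for illustration; this is not the actual protocol, whose rules would be defined by instructors and the system designers.

```python
from enum import Enum
from typing import Callable, List, Optional

class Mode(Enum):
    AI_FREE = "ai_free"          # locked in-app editor: copy-paste and AI calls disabled
    AUTHORSHIP_CHECK = "check"   # write anywhere; short authorship check before submission

def passes_rules(before: str, after: str) -> bool:
    """Coarse, tightly constrained comparison between draft versions.
    Placeholder heuristic: the student made a real, non-empty edit.
    The actual rules would be set by the instructor/system."""
    return after.strip() != "" and after != before

def authorship_check(draft: str, probes: List[str],
                     respond: Callable[[str, str], str]) -> bool:
    """Run several automated checks in one short (~5-10 minute) session.
    `respond(current_draft, probe)` stands in for the student's live edit
    in answer to a targeted question about their own draft. A failed
    check simply triggers another try (one retry in this sketch); no
    instructor ever reviews the interaction."""
    current = draft
    for probe in probes:
        for _attempt in range(2):
            revised = respond(current, probe)
            if passes_rules(current, revised):
                current = revised   # edit accepted; move to the next probe
                break
        else:
            return False            # could not work within their own essay
    return True                     # passes through; ready to submit

def submit(mode: Mode, draft: str, probes: List[str],
           respond: Optional[Callable[[str, str], str]] = None) -> bool:
    """Accept a submission under the teacher-chosen mode."""
    if mode is Mode.AI_FREE:
        return True                 # integrity enforced by the locked editor
    assert respond is not None, "Authorship Check mode needs live responses"
    return authorship_check(draft, probes, respond)
```

For instance, a student who genuinely revises each probed passage passes, while one who returns the draft unchanged fails and is simply asked to try again; the comparison is automated, so no grading work is added for the instructor.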

  3. Michael Magoulias

    I think the author is mistaken in stating that the “medieval” methods of the invigilated essay exam “mostly reward speed under pressure, not reflection.” That is a perfect description of how multiple-choice exams work, but it bears no resemblance to the blue-book exam model.

    The latter demonstrates both what one knows and how one thinks, since the student has to sustain an argument and support it with evidence. The possibility of cheating is simply non-existent in this scenario. The model of a three-hour essay exam in which three questions are answered has worked extremely well for assessing candidates seeking to enter Oxford and Cambridge, as well as for assessing their performance at the end of their undergraduate careers.

    Perhaps Professor Alshanetsky is using the word “reflection” in a different sense from “a demonstration of how one thinks,” but having had to do these kinds of exams myself in the sweet days before AI had any reality outside of the Terminator franchise, I can attest that this authentic representation of a student’s thought process was the whole point of the exercise.

    Substitution of this tried and tested (and extremely cost-effective) system with a kind of electronic lie-detector program seems like a more time-consuming way of ending up in the very place that many are trying to avoid: the off-loading onto machines of a simulacrum of what should be genuine, embodied thinking and assessment conducted between human beings. Perhaps most worrying is the fact that there is simply no certainty that important aspects of the work were not done by AI in the case of exams that are not invigilated. So why opt for an uncertain result when a certain one can be easily obtained?

    1. I get the appeal of that model — the invigilated essay exam did work well within a very specific system: the Oxford–Cambridge tutorial model. Students had weekly one-on-one meetings, constant feedback, and years of training in the art of argument under pressure. In that context, the three-hour essay could demonstrate real command and improvisational intelligence. But for most students today, even in top research universities, that environment doesn’t exist.

      I wasn’t educated at Oxford or Cambridge. I studied philosophy at Berkeley, where I discovered that the act of writing itself — struggling to articulate an inchoate thought until it became clear — could be philosophy. If philosophy had been tested through timed essays, I would never have found it.

      Heinrich von Kleist’s short essay, “On the Gradual Construction of Thoughts During Speech,” captures this: thought unfolds through articulation. Reflection isn’t the display of what we already know under pressure, but the process of discovering what we mean as we try to say it. The kind of reflection my protocol aims to protect is precisely this.

      Describing what I’m proposing as an “electronic lie detector” misses the point entirely. The system doesn’t monitor, analyze keystrokes, or attempt to infer deception. It’s the opposite — an alternative to surveillance-based “integrity” systems that treat students like suspects. The older invigilated essay was valuable because it made reasoning visible. My argument is simply that we need new structures to preserve that same visibility in a digital, AI-mediated world.

      1. Requiring students to pass an authorship check is still a little bit like treating them as suspects, and I expect students will still see it that way. Maybe it’s time to just admit that we have to monitor students to determine whether they’re actually doing the work themselves.

      2. The goal is to make AI use visible on both sides, so that teachers, students, and institutions can define the terms of its integration. The authorship check isn’t “monitoring” for anything. It doesn’t collect data to catch misconduct; it simply defines what it means to complete the task. Students can choose how to work: they can write in AI-Free mode, use AI freely with an authorship check, or choose another mode. The whole system is reciprocal: the same framework can apply to teachers, who can make their own use of AI transparent when returning feedback. Right now, a student can swear they didn’t use AI (or used it “just a little”) but have no way to show it beyond a social promise. The system gives them that ability.

  4. “In my lab at Temple University, we’re piloting this approach by using the authorship protocol I’ve developed. In the main authorship check mode, an AI assistant poses brief, conversational questions that draw students back into their thinking: ‘Could you restate your main point more clearly?’ or ‘Is there a better example that shows the same idea?’ Their short, in-the-moment responses and edits allow the system to measure how well their reasoning and final draft align.”

    In other words, comment on students’ draft papers, then make them submit a final revised paper. This is what I do for all my students’ papers. Why would students want to be asked questions by a bot that can’t understand the texts they are writing on, instead of by a professional philosopher who does? The author’s proposal may be compared to designing an AI therapist who ‘listens’ to a patient and is programmed at intervals to ask, ‘And how did that make you feel?’

    If you really want to help your students think and do philosophy, create prompts that defeat AI: require multiple sources, responses to multiple highly specific questions, and citations of the specific page numbers of those sources each time students quote or otherwise rely on them. If you don’t want to grade papers, don’t teach philosophy.

    1. The system isn’t a grader or a substitute for teaching. Only the instructor has the moral and intellectual authority to evaluate the quality of philosophical work. It’s not there to comment on ideas, but to check that there’s a genuine cognitive link between process and product. A student could pass the authorship check with flying colors and still turn in a weak paper.

      You’re right that as philosophers, we give feedback on drafts, and that’s irreplaceable. But when much of what we now grade includes AI-generated text students don’t own, the feedback loop collapses. We can’t teach thinking if the work we’re engaging with doesn’t come from the student’s thinking.

      Re “just write AI-proof prompts”: I’ve spent two years experimenting with exactly that — multiple sources, layered questions, personalized prompts, process reflections. AI can handle all of it. You can’t expect students to “beat” AI. The point is to help them recognize when they’re thinking and when they’re offloading.

      If grading means spending hours commenting on work a student didn’t actually produce, then no — I don’t want to grade that. But I do want to restore a situation where my feedback connects to real student reasoning. And it’s not just for students: I designed it so that instructors and writers (myself included) can check that they’re still guiding their own thought when they’re taking AI assistance.

      1. “But when much of what we now grade includes AI-generated text students don’t own, the feedback loop collapses.” This year I would say 98% of my students have produced their own work. Those using AI were immediately caught. You say you have “spent two years experimenting” with AI-proof prompts. Perhaps that is the problem: a grad student could practice writing prompts for two years in grad school and still fall short, since it takes years of experience. Perhaps you could share some examples of prompts you use in your courses that students so easily “beat”?

    2. You say that “those using AI were immediately caught.” What’s your basis for this claim? The evidence I have from talking to students strongly suggests that most of them regularly use AI to generate text for writing assignments. (On this point see also the Higher Education Policy Institute’s “Student Generative AI Survey 2025” (https://www.hepi.ac.uk/wp-content/uploads/2025/02/HEPI-Kortext-Student-Generative-AI-Survey-2025.pdf).) And the evidence we have about the capacities of current systems suggests that we can’t reliably tell when they’re being used. (See, e.g., “A real-world test of artificial intelligence infiltration of a university examinations system: A ‘Turing Test’ case study” (Scarfe et al., 2024).)
