Leiter Reports: A Philosophy Blog

News and views about philosophy, the academic profession, academic freedom, intellectual culture, and other topics. The world’s most popular philosophy blog, since 2003.

Should book authors consent to having generative AI train on their work?

Philosopher Elliott Sober posed a version of this question to me via email, and thought it would make a useful topic for discussion. I agree. What do readers think? What are the pros and cons? What decisions have you as an author made when approached by publishers about this? Please note your academic discipline if it is not philosophy.

13 responses to “Should book authors consent to having generative AI train on their work?”

  1. T&F decided for all its authors (me, via Routledge) without their consent, and without even informing them of what it had done.

    For my part, if I had been asked, I might well have consented in exchange for payment (provided it was at least in the three figures, rather than just one or two, and despite my general hostility to all of this LLM crap). But I wasn't even given the option of being mercenary about my work.

  2. I write academic philosophy, but I also represent my wife's intellectual property (she is a successful novelist). It is probably worth emphasizing that, in purely practical terms, your consent is unnecessary. If your work exists in a machine-readable format, it can and probably will be used to train generative AI models at some point, whether "legitimately" or by people in places with looser IP laws. If consenting to this process gets you some benefit you might otherwise lose, then it seems practical to consent. If you have a moral objection to the development or use of generative AI, or if you just don't like the idea of people doing things with your work without your permission, you can express that in part by withholding consent. But this will be primarily a psychological victory.

    On my view, the somewhat open legal question that remains to be answered (perhaps by the litigation between the NYT and OpenAI) is whether "training" a model is "Fair Use" of a work. The argument from the AI developers is that an AI "reading" your book is not so different from a human reading it. The argument from rights holders like the NYT is that this allows AI developers to free ride on the labor of others.

    I have argued elsewhere that this leaves authors in a weak position, because no single author is very valuable to AI development; LLMs need large libraries to draw from. Companies like the NYT, academic publishers, and traditional publishing houses–who already have large databases–may benefit from a NYT victory, and of course tech companies would benefit from an OpenAI victory. But individual authors, at best, may see some de minimis royalty payments (if they consent) or nothing at all (if they don't). Either way, the resulting AI is probably not going to be aligned to your interests.

    One way to maybe respond to this would be to look to existing models of both open software and open scholarship. If universities were to develop open, public LLMs and other AI that is not anchored to corporate interests, authors might be put in a position to "elect" which kinds of LLM their work can permissibly be used to train. It may be worth noting that this was roughly the original idea behind the non-profit version of OpenAI (hence its name), though that model has of course since been abandoned.

  3. No, of course we shouldn't consent, and "Well, they're going to steal your stuff anyway" is one of the worst rationalizations for doing so I could imagine.

    One must simply refuse such things. That others are going to behave in a criminal or unethical manner is their issue, not yours. And if they go too far with the wrong people or institutions, it may wind up costing them. In the meantime, one does what one can in one's own small capacity: i.e., say "No."

  4. Dr. Kaufman, on the assumption that your response was a reaction to my commentary: I agree that "they're going to steal your stuff anyway" is not a reason in itself to consent to the use of one's work in training LLMs. Dr. Leiter's question asked after "pros and cons," however, and the practical impact of consent (or its lack) seems relevant to whether and how one should approach negotiations in the near term.

    You have suggested that nonconsensual use of an author's work may be "criminal or unethical." But this remains at least partly to be seen. If the U.S. government legislatively or judicially determines that LLM training is "Fair Use," this would moot the question of author consent as a matter of law. In the same way that authors cannot now prevent their critics from quoting their work, authors would lack any legal right to prevent the inclusion of their work in LLM training data. I do not personally think this is the most likely outcome, but for now it is one possible outcome, particularly given the political influence of Silicon Valley. For comparison we might look to the Supreme Court's commentary in U.S. v. Causby, which arguably rewrote the common law of property for the sole benefit of the fledgling airline industry:

    "It is ancient doctrine that at common law ownership of the land extended to the periphery of the universe — cujus est solum ejus est usque and coelum. But that doctrine has no place in the modern world. The air is a public highway, as Congress has declared. Were that not true, every transcontinental flight would subject the operator to countless trespass suits. Common sense revolts at the idea."

    It may be that withholding consent now is a good way to influence public discourse on the matter. If one wishes to prevent LLM training from becoming "Fair Use," refusing consent today might contribute to public perception against LLMs. However, it is not clear to me that this effect is assured; the opposite could also be true, as a large number of authors refusing consent could galvanize lobbying efforts toward codifying LLM training as Fair Use. Given these prevailing uncertainties, I do not regard consent or non-consent to these requests as an easy or obvious question at all. My suspicion, though it is only a suspicion, is that the best most authors will be able to accomplish when presented with such questions is either to extract some money or other benefit in consideration of consent, or to publicly register their position on a contested policy argument by withholding consent.

  5. (My original disciplines were mostly philosophy and logic; I now work in cyber security and from time to time attempt tiny contributions to my original fields. I regard my views as disciplinarily neutral, though more severe for fields where precise diagramming or symbolism is important.)
    I would point out that these systems basically destroy data integrity – that's what the so-called hallucinations are about. So one side effect of consent might be that your work is not only used, but misrepresented, mashed up with that of others, confused with work of similar character or similar name to yours, etc. For example (note: this one is now fixed, likely because someone at OpenAI reviews these requests from time to time), it was easy to provoke ChatGPT into conflating David Hilbert the philosopher of perception with David Hilbert the mathematician. Current models are especially bad at handling image segmentation, which might also play into decisions by some. For example, Bing Copilot a week or so ago could not draw a label-and-line diagram of a very simple case from organic chemistry. I have not tried much symbolism in logic recently and should get back to that (there was a lot of discussion about image segmentation and object permanence – for video – that I was investigating). For myself, then, these reasons lead to "likely not" in many cases.

  6. Suppose I was to write a book containing only the proprietary code for a given LLM. (We shall suppose the code was provided to me by an unnamed whistle-blower.) Suppose further that the general public does not know which LLM I describe in my book, but they know that I will be describing one of the leading LLMs which was developed by a for-profit tech company. If you are a lawyer working for one of these tech companies, and you hear that my book will be published in the near future, do you think that I ought to give my consent to allow not only your company’s LLM, but also your competitor’s LLMs to train on my book? If not, why not?

  7. I should add first that I'm a translator, though I have zero training or experience in copyright law (my focus is ancient manuscripts). The question that springs to mind for me, based on what was helpfully stated previously—namely that LLM companies claim that what their models do is legally not sufficiently different from a human agent's reading of a book and (it is implicit) that they themselves do not publicly copy books wholesale, but rather rephrase and paraphrase (and do not even excerpt verbatim)—is this: what is the legal permissibility/advisability of an LLM company stating that its actions are creative collage and little different from an artist's or a translator's? For example, if I were a translator working for the UN or EU, what better way to execute a translation request than by looking at the source text, scouring the corpus of recent UN documents available online in the target language for an officially published (and therefore sanctioned) text that already contains a nearly perfect translation of the first phrase in the source text I'm working on, and then continuing with the same method for each phrase until "my" translation is complete? The resulting translation is phrase for phrase identical to previously published material, but at the same time in no way identical per se to any other single document in existence (I would have made sure of that). Isn't this, grosso modo, a significant facet of what these LLMs are doing at macroscale, with an added veil of rewording at the end? If I'm correct in my intuition, how is this any different from human creativity à la collage, except for the obvious scale, volume, and speed? (Not to mention that it seems to signal the Borgesian prescience of just such an invention.)

  8. I think this all somewhat misses the point of what I was getting at, but never mind. You certainly are right that we find ourselves entering a world that I barely recognize and very much dislike. What you say here and in your earlier remarks strikes me as a significant disincentive to make one's writing available to the public, which always has been most of the point of doing it. It also largely undermines what seems to me the spirit — if not the obviously very flexible letter — of what copyright laws are supposed to be about. But as we've seen in a much more frightening context, our law and the institutions that are involved with it seem intrinsically vulnerable to this sort of manipulation and corruption (in the descriptive sense of the term). If a convicted felon and a seditionist can be a Chief Executive and Head of State, why shouldn't a company be able to suck up everyone's stuff like a vacuum for whatever use that company likes and regardless of whether the people who made it consent?

    You want to use my work to train your horrible machines? I say "no." You're going to twist every law and regulation to make it so that you can, regardless of what I consent to? Maybe then I'll just stop making it and you'll have to do it with someone else's stuff. You say it doesn't matter, because there always will be enough of everyone else's stuff? At that point, I will congratulate you. You've won! Trophies all around! And I'll say to you what I say to the Trumpers: I hope you enjoy living in the world you've made. I'm just glad I spent the majority of my life in a very different one.

  9. Should we allow the blood-sucking publishers, who reap all the economic value of our work for free and gate the information we produce behind paywalls, to also sell this work to AI companies for the generous compensation of $0.00? Obviously not, independently of all considerations about AI.

  10. Consent? No consent required by my lights. This is basic fair use. LLMs are utter transformations of content, and rightly so. (See Perfect 10.) You don't get to write a book about BLAH that I don't get to read, ingest, process, and spin a riff off of. There is a moment at which it is no longer yours. I am a fan of this circumstance. I am not even sorry for your loss, poor author.

    I note that none of the complaints here point out the fact (so I assume?) that LLMs are hideous consumers, wasters, of energy and water.

  11. William F. Buckley, Jr. once said that "A conservative is someone who stands athwart history, yelling Stop, at a time when no one is inclined to do so, or to have much patience with those who so urge it." I am sympathetic to your conservatism! Generative AI already poses some genuinely troubling possibilities for the near future, even assuming we never get the "apocalyptic" advances certain commentators have warned against.

    But this is exactly why I do not think it is enough to merely be "for" or "against" consenting to the use of our work in LLM training. The djinn is free from the bottle, the cows are outside the barn, Elvis has left the building. Unless we intend to throw a Butlerian jihad, generative AI is out there, and growing. If we can find a way to contribute to the ethical development of humane models, aligned with our individual human interests rather than the interests of large publishers, Silicon Valley corporations, or oppressive governments, I think that would be better than the apparent alternatives. Open software and open academic publishing projects exist. Rather than saying only "yes" or "no" (or nothing at all), I would like for us to be able to say "here, not there," "this, not that."

    I admit, I do not have high hopes for this to occur! Your pessimism seems to me quite warranted. But I think there are ways forward, underappreciated and overlooked though they often be.

  12. Kenneth, this is all perfectly fair. Our respective reactions are likely as much a matter of temperament as of anything objective.

    I am the sort who, when told "the cows are outside the barn" and other such things, becomes less compliant rather than more. I don't accept "faits accomplis" dictated by others, especially by those with no legitimate authority. If they want to create a horrible world and try to force everyone to live in it, I will not cooperate. And the fact that these things are presented as faits accomplis gives me every reason to think that they will *not* permit anyone to "contribute to the ethical development of humane models…..etc." People who are open to that sort of development don't present others with faits accomplis. And endeavoring to make such shit-piles into slightly nicer shit-piles just makes one complicit.

    But again, your remarks are fair enough. It's just not who I am.

  13. It is striking that leading language models, despite their ability to parse and generate nuanced linguistic content, fail to produce accurate depictions of coats of arms. This is perplexing given that heraldic language is one of the most methodical and precise descriptive systems ever developed, operating within the comparatively rudimentary parameters of its function. Rooted in the need for clear and unambiguous communication—especially in contexts of war and identity—heraldry relies on a symbolic vocabulary with unimpeachable clarity and standardized, non-alphabetical grammar.

    The paradox lies in the mismatch between these models’ linguistic strengths and their inability to translate heraldry’s minimalist precision into coherent visuals. If heraldic descriptions are so explicit, why does this clarity fail to manifest in generated images? Are such limitations merely due to insufficient exposure to heraldic content during training, or do they reveal something fundamental about the challenge of converting structured symbolic systems into visual form?

    J.R.R. Tolkien—the epitome of traditionalist devotion to the beauty and purity of medieval handcrafted preindustrial art—would doubtlessly have recoiled at the very notion of large language models, embodying as they do, in almost caricatural form, the famously ultramundane monopoly of the One Ring: tools of unfathomable power inseparable in their capacity for assimilation and simulation from hegemonic realpolitik directives of full-spectrum dominance. Grim irony then, that the niche art of heraldry has so far proven among the most resistant to the 'dark' magic of LLMs.
