Leiter Reports: A Philosophy Blog

News and views about philosophy, the academic profession, academic freedom, intellectual culture, and other topics. The world’s most popular philosophy blog, since 2003.

HTML or PDFs online?

MOVING TO FRONT FROM MARCH 19–SEE REPLY FROM OUP PHILOSOPHY EDITOR HENRY CLARKE IN THE COMMENTS

Philosopher and logician Volker Halbach writes:

Oxford University Press recently told me that “it appears that we no longer make individual pdfs of our works available for sale, as digital rights management is extremely challenging for that format.” Generally, the only version of a monograph that is accessible online will be an html file. This is very worrying because the html version of every monograph that I have looked at differs significantly from the hard copy and pdf. Of course, line and page breaks of the paper version are lost in the transition to html. In the html, italics can appear as roman, sans serif letters grow serifs, and the size of letters and symbols can shrink or grow. In my field, logic, these differences can be significant, especially if the changes are not uniform. Generally, htmls are also much harder to read with their bad spacing and other limitations to typesetting. 

However, there are more serious concerns: Often parts of the text are added, replaced, or deleted. Here are some highlights from two monographs: On p.132 of Jared Warren’s Shadows of Syntax, a line with two inference rules is just replaced with the word “math”. The reader of the html is left wondering what the rules might be. It becomes more confusing when symbols are added. One of the most dangerous symbols in this respect is negation: On page 261 there is an additional negation symbol making the claim there false in the html. On p.272 the reader of the html will be left with the impression that Warren does not know the main difference between Zermelo and Zermelo-Fraenkel set theory, because the crucial axiom of replacement is absent from the html. Instead the power set axiom is repeated. I also looked at Alex Paseau and Owen Griffiths’ One True Logic. On page 137, eight lines (in different places) are missing. On p.141 the authors announce a formula “in all its unabbreviated glory”. In the html exactly this formula is truncated and the second half replaced with a single hyphen. There are many further deviations between the two versions that not only impede readability, but constitute substantial differences in content.

None of the authors I have talked to have approved the html version. Already now there are books for which the pdfs of the hard copies are not available online. One can still click the pdf button, but this generates only a pdf from the html with all the mistakes. In such cases there is no legal way to access a digital version that has been authorized by the author(s). 

I became aware of the issue when I recently tried to persuade OUP to publish a digital version of my book on logical consequence that had appeared as a hard copy last September. They asked me to make some minor changes to the pdf on which the hard copy version is based. Thus there will also be two hard copy versions; and even readers who get hold of the paper version cannot be sure that they have the definitive version.

These sound like some rather serious problems that affect the integrity of published work. What do other philosophers or academics think? Press editors are also welcome to weigh in to explain this change. It might be useful to know whether other presses, besides OUP, are doing this. (Submit your comment only once; it may take a while to appear.)

12 responses to “HTML or PDFs online?”

  1. Disability is probably the hidden variable here. The new disability rules that take effect next month require tagging and reformatting of PDFs; I’ve found that even top publishers have neglected to create accessible PDFs. And so perhaps the most future-proof approach is to avoid PDF. At Berkeley, we are getting commands to de-publish old courses and materials lest there be PDFs and other content not in compliance with WCAG 2.1, Level AA. Here’s a useful checklist of the new requirements: https://www.csulb.edu/sites/default/files/2026/documents/WCAG_PDF_Checklist_2026_TAG.pdf

  2. Margo Schlanger

    I don’t have any insight as to common practice — but I can say that HTML is much easier for screen readers to handle, which means that it’s more accessible for people who use those readers (people who are blind or low vision, have various kinds of visual processing disorders, and the like).

  3. I do not think this is a new problem with OUP. In fact, for many of their older titles, the online version is HTML only. I have been teaching parts of van Fraassen’s Scientific Image and Laws and Symmetry. The OUP (p. 134) equivalent of CUP’s Cambridge Core, the online platform they sell to university libraries, only has HTML versions. I have not noticed changes in content, but I do hate how they insert the book page numbers into the text, as I did above, after “The OUP”. Somehow CUP and Routledge have been able to figure something out such that you can access PDFs that are identical to the printed version. So the problem can be solved.

  4. I have had similar problems with my Philosophy of Computer Science book (Wiley-Blackwell). The differences only became apparent when someone from China was trying to translate it based on the ePub version and found some obvious errors. I alerted my editor, but haven’t followed up other than to make the “corrections” in my online errata page.

  5. Jason Leddington

    Despite the inconvenience, this makes a lot of sense to me. Thousands of recently published philosophy books can be found online as PDFs (or in easily convertible ebook formats) on well-known pirate sites. Controlling this is nigh impossible if the books are published in easily and readily readable formats. Perhaps protecting the integrity of the publishing process means that we’ll have to start buying hard copies again and using digital versions as backups and research tools (they’re obviously easier to search) rather than full substitutes for physical texts. (Of course, someone can still scan a physical copy and upload it, but that’s a lot of work.)

  6. I’ve nothing to add except to reaffirm that Volker is right. It’s a mess, and likely to get messier. What would be useful to know is whether whatever led Oxford to its pdf/html policy is shared/embraced by, e.g., Cambridge or MIT or other logic-friendly philosophy venues.

  7. (Not an academic, but I read a lot of PDFs of current philosophy publications). Besides the big-picture concerns (like undermining the very practice of citation), this seems like OUP just can’t be bothered to do the hard parts of its job, and their clients are letting them get away with it. At the very least, I would expect OUP to respect a moral obligation to assure the fidelity of all official versions of their published works. That they would be content to leave such a void (the need for reliable, stable digital versions of their published works) unfilled, when they are the only entity legally authorized to fill it, signals a level of comfort that should be untenable in a competitive market.

    If this practice has spread to other academic publishers, academics and their host institutions may need to collectively demand industry standards for the fidelity and accessibility of digital versions, which are already the de facto definitive version for many professional audiences.

  8. I’ve had to deal with a few of these HTML e-books from OUP.

    Aside from the usual annoyances, I have two main concerns. First, anything in Greek becomes a mess. I think the diacritics are right, but random words are italicised, put into different fonts, and sometimes moved by themselves to different lines. It does not inspire much confidence that the text has been faithfully represented. Second, in some HTML versions, footnotes are changed to endnotes. To be fair, other publishers prefer endnotes over footnotes. Still, footnotes are frequently used for the original language passages. It is helpful to have them on the same page for reference, as in the typeset print and PDF versions of the books.

    I am fortunate because my library subscribes to a number of databases (including OUP Scholarship Online), which ensures fast and legal access to recent scholarship. However, in my country, it is very difficult for libraries or individuals to get hard copies of foreign books. In many cases, there is not a single library in my country that has a print version of the book I need. Therefore, I have to rely on the materials available through the online databases: I cannot go down the street and check out the hard copy if something looks wrong.

    Presumably other researchers in smaller or ‘developing’ countries are in a similar situation. It would be helpful to us if publishers can ensure that the resources we can legally access, in whatever digital format, are held to the same high standards as their print materials.

    1. My general rule is that any book involving extensive mathematical or logical notation should be read in hard copy. Digital versions are frequently unreadable and the slightest infelicity of reproduction can make the text incomprehensible, especially if, like me, one is often at the very extremity of one’s understanding anyway! (A professional might notice a typo in an equation; a novice is probably sunk.)

  9. Why not publish open access? Are university presses such an important tool to generate money?

  10. Not selling PDF versions of books, or shifting to HTML-only versions for web access, will do absolutely nothing to prevent the proliferation of books available on pirate sites, so long as publishers continue to sell paper copies. And I cannot see them banning paper copies of books anytime soon.

    The reason is the well-known “analogue hole” problem for DRM: regardless of how heavily encrypted digital data is, it must be converted to an analogue form (light and/or sound) to be perceived by a human. That analogue form is then subject to capture by encoding technology which can generate a DRM-free version for transmission. So anyone who wanted to pirate the book could get a paper copy, scan it, and upload it.

    Scanning a book is actually not a lot of work. It takes about 40 minutes to scan a book (two pages at a time, A3 paper) and use the right tools to crop it and manipulate it (all automatically) so that you get a pretty good OCR PDF version. Online piracy is generally done by people who have a strong ideological belief in making information freely available, and for each book it only takes one such person willing to invest that amount of time to generate a copy which will propagate all over the internet.

    So the upshot of this policy, I predict, will just make online versions of formal work less usable to readers while doing nothing to prevent piracy.

  11. Henry Clarke (acquisition editor, OUP)

    Professor Halbach raises three concerns. One is about the availability of PDFs of OUP monographs. Another is about version discrepancies across formats. And there is a concern about errors in HTML versions. I welcome the opportunity to clarify on these points.

    On the availability of PDFs: Professor Halbach reports that ‘Generally, the only version of a monograph that is accessible online will be an html file.’ This is not wholly accurate. OUP makes ebooks available for purchase for the majority of monographs. EPUB is our preferred format for ebooks; as commenters on this post have highlighted, reflowable formats like EPUB and HTML have a number of advantages for both readers and authors. OUP does produce PDF versions (as displayed on Google Books, for example), and where an EPUB ebook is not feasible, a PDF version can be made available for purchase as an alternative. For example, the PDF ebook of *One True Logic* is available for purchase currently (e.g., here: https://www.vitalsource.com/en-uk/products/one-true-logic-owen-griffiths-a-c-paseau-v9780192565242). I have written to Professor Halbach to clarify that the PDF ebook of his monograph will be available for purchase when it is ready. (The file is currently being finalised following corrections being made to some errata in the text.) I regret that this wasn’t communicated with sufficient clarity or adequate speed to Professor Halbach to pre-empt his concern.

    As well as ebooks, OUP publishes monographs on the Oxford Academic platform. This platform presents text in HTML for display in web browsers, and also provides PDFs for download, as single chapters. PDFs available for download currently come in two different types: there are ‘print-replica’ chapter PDFs, which do reproduce the layout and typesetting of the print book, and HTML-generated chapter PDFs, which do not. For the majority of our newly published titles, print-replica PDFs are available; as an example, the chapter PDFs of *Shadows of Syntax* are print-replica chapter PDFs (https://academic.oup.com/book/33536). However, for legacy technology reasons, there is a proportion of monographs available on Oxford Academic for which only HTML-generated PDFs are available.

    About version discrepancies: correcting errors undiscovered prior to publication does raise the possibility of different versions in circulation, which is why we have introduced a robust post-publication corrections process to eliminate differences between formats. However, there will inevitably be differences between already produced print copies and digital formats when a book is made available in print and then undergoes subsequent corrections. To minimise the potential for version discrepancies between print and digital on publication, we have introduced a production workflow whereby the print, ebook, and online versions are generated from a single XML source, and this is now used for much of our publishing. In the case of Professor Halbach’s book, this workflow was not applicable.

    Finally, on concerns with HTML versions of OUP monographs: some text elements (e.g., displayed logical formulae) are challenging to display online and require special treatment which may be liable to error. Despite this, the improved access for readers that the online format provides is in the interests of both authors and the scholarly community. While we take errors very seriously, we have to accept that eliminating these entirely is not realistic, especially in cases where there is a separate process of composition for the HTML version based on the print files. Where we are made aware of errors, we will correct them, and we are continually looking at ways to ensure uniform high quality across our publishing formats. I have initiated a corrections process for both books mentioned by Professor Halbach.
