For better or for worse, PDF is the standard for exchanging scholarly articles. As the “lowest common denominator”, it accurately preserves the article’s content and presentation, regardless of whether a Word-, LaTeX- or InDesign-based workflow was used.
Because of their print-based legacy, PDF documents are rather inflexible. It is difficult to extract data from a PDF, or to make edits or annotations during the review process. For a number of years, there has been a push towards better structured workflows built on markup formats such as JATS (XML) and RASH (HTML). These formats don’t have the same limitations, and they rely on web technologies to render the document to the screen or printer.
However, such initiatives have had little effect so far, as publishers, libraries and search engines are all set up to handle PDF files. Furthermore, a move away from PDF would mean forgoing the two main advantages that made it so suitable for publishing in the first place:
- High typographic quality, and the ability to maintain the visual presentation accurately across different devices
- Self-containment: a PDF file includes all images, fonts, etc. Unlike on the Web, if you’ve got the PDF, you’ve got it all.
These features are notably missing from the technologies earmarked to succeed PDF, which is why publishers are understandably apprehensive about moving to these formats.
So why not use PDF as a starting point for better structured documents? Hybrid PDF offers the ideal opportunity to embed data in other formats, such as JATS-XML, into the PDF file, giving users the best of both worlds. And, as it’s still PDF, it is supported by the existing infrastructure.
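To make the idea of embedding concrete, here is a minimal sketch, using only the Python standard library, of how an XML file could travel inside a PDF as an embedded file attachment. The function name `build_hybrid_pdf`, the file name `article.xml` and the sample XML are illustrative assumptions, not the API of any particular hybrid PDF tool. A real workflow would rely on a PDF library and would add the visible article pages and metadata.

```python
def build_hybrid_pdf(xml_bytes: bytes, name: bytes = b"article.xml") -> bytes:
    """Return a one-page (blank) PDF that carries xml_bytes as an attachment.

    Hypothetical helper for illustration only. The attachment name is
    assumed to be plain ASCII without parentheses or backslashes, so it
    can be written as an unescaped PDF literal string.
    """
    assert not any(c in name for c in b"()\\"), "name would need escaping"

    # Object 5: the embedded file stream holding the raw XML bytes.
    # "/application#2Fxml" is the MIME type written as a PDF name.
    embedded = (
        b"<< /Type /EmbeddedFile /Subtype /application#2Fxml /Length %d >>\n"
        b"stream\n" % len(xml_bytes)
    ) + xml_bytes + b"\nendstream"

    objects = [
        # 1: catalog, with a name tree that lists the embedded file
        b"<< /Type /Catalog /Pages 2 0 R "
        b"/Names << /EmbeddedFiles << /Names [(%s) 4 0 R] >> >> >>" % name,
        # 2: page tree with a single page
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 3: an empty US Letter page (a real article would have content)
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << >> >>",
        # 4: file specification pointing at the embedded stream
        b"<< /Type /Filespec /F (%s) /UF (%s) /EF << /F 5 0 R >> >>"
        % (name, name),
        embedded,
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus entry 0.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_pos,
    )
    return bytes(out)


if __name__ == "__main__":
    sample = b"<article><front><title>Demo</title></front></article>"
    with open("hybrid-demo.pdf", "wb") as fh:
        fh.write(build_hybrid_pdf(sample))
```

The result is still an ordinary PDF as far as existing infrastructure is concerned, while a tool that knows to look for the attachment can read the structured data directly instead of scraping it from the page.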