But isn’t the whole point of PDF that you can’t edit it?
No. The fact that standard PDFs are difficult to edit is more of an accident than a feature, as PDF’s roots are in printing, where only final-form documents needed to be transmitted. Many people believe PDF to be “impossible to edit,” but beware: minor edits in PDFs, such as swapping figures on an invoice, are trivial. You therefore need other technologies, such as digital signatures, to verify that your PDFs have not been tampered with. More extensive edits, however, are more difficult, as they require the document’s logical structure to be automatically detected, which is an error-prone task.
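As a rough illustration of the tamper check mentioned above, the following Python sketch computes a keyed digest (HMAC-SHA256) over a document's bytes and verifies it later. This is not a PDF digital signature: real PDF signatures use public-key cryptography and are stored inside the file itself. The function names, the key, and the sample bytes are assumptions made purely for illustration.

```python
import hashlib
import hmac


def make_tag(document: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the raw document bytes."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def is_untampered(document: bytes, key: bytes, tag: str) -> bool:
    """Check the document against a previously issued tag (constant-time compare)."""
    return hmac.compare_digest(make_tag(document, key), tag)


# Hypothetical stand-ins for an invoice PDF's bytes and a shared secret.
key = b"demo-key-not-for-production"
original = b"%PDF-1.7 ... Total due: 100.00 EUR ..."
tag = make_tag(original, key)

# Swapping a single figure changes the tag, so the edit is detected.
edited = original.replace(b"100.00", b"900.00")
print(is_untampered(original, key, tag))  # True
print(is_untampered(edited, key, tag))    # False
```

A shared-key tag only helps parties who hold the key; a true digital signature lets anyone verify the document with the signer's public key.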
Why not use web standards, such as HTML/CSS?
We do use the relevant parts of HTML and CSS where appropriate. However, web standards do not provide a robust way to specify a document’s layout so that it is guaranteed not to reflow when opened on other systems. Furthermore, browser technologies are a moving target, with implementations changing very rapidly. Therefore, they do not provide a suitable basis for archival documents.
Having said that, the objective of our work is to bring web and PDF technologies together. However, we have taken the decision to start with the robust PDF format and make it more flexible; an alternative approach would have been to start with a web document and make it more robust.
Why use PDF as a starting point, then, and not the Web? Everyone has a web browser.
The installed base of PDF viewing software is just as large, and PDF offers the following advantages:
- First, from a user experience point of view, everyone already knows what a PDF is and how to interact with it, and editable PDF is backwardly compatible; users will still be able to open editable PDFs for viewing in the “legacy” PDF viewing app that they are accustomed to; only the editability is new.
- Secondly, from a technical point of view, a PDF will display the same way regardless of which viewer is used; in contrast, browsers have numerous incompatibilities and rendering differences.
- Finally, PDF has proven (typo)graphical fidelity; by basing Editable PDF on PDF, any PDF can be made editable without altering its visual presentation or sacrificing its fidelity in any way.
How does this project differ from the editing functionality in programs such as Acrobat Professional and Foxit PhantomPDF?
The editing functionality of such programs is essentially based on intelligent guesswork. As PDFs are missing the (complete) structure and layout description in machine-readable form, this information needs to be rediscovered using AI techniques in order to reflow the text, which is why errors can occur. Such functions can be a life-saver for making single edits to a document in an emergency but, due to generation loss, are unsuitable for making repeated extensive edits to a document.
How does this work differ from simply embedding the document’s source in the PDF file (Hybrid PDF or PDF/A-3)?
Some applications (e.g. OpenOffice, LibreOffice and Adobe Illustrator) support embedding the document’s source in the PDFs they generate. When such a “hybrid” PDF file is opened for editing, the robustness guaranteed by PDF is lost, particularly if the file is opened on a different system, as minor technical differences can cause text to reflow. Furthermore, the document can only be edited in the one application that created it; it is not possible to use several applications to work on a single document.
Are you sure we need such an editable PDF format? I believe one of the most important benefits of PDF is its concrete, solid state.
The idea of Editable PDF stems from a real-world need to improve the efficiency of the way we work with documents. Today, the only editable file formats are those native to the applications that generated the documents, and none of these formats guarantees that the layout will be preserved in the same way as PDF does. Furthermore, despite improvements in compatibility, using a native file format still often requires the recipient to be using the same application (and often the same version), which may not be available.
PDF’s largest asset, its rock-solid visual presentation, will remain, and editable PDFs will be backwardly compatible with the current installed base of PDF viewers such as Adobe Reader and Preview.
What if text is curved/outlined?
The objective of the Editable PDF Initiative is to define a format: a set of standards for how PDFs should be generated and which metadata must be included in order for the file to be editable. Text as curves or bitmaps would obviously not conform to this standard. Ideally, conformant tools would be used right from the beginning, and such issues would not occur.
The conversion of “legacy” PDF documents to editable or structured form is a related topic; however, it is outside the scope of this project. There is in fact a whole field of research, known as document analysis and recognition, which deals with problems such as converting curves back to text. Given the complexity of this problem, such conversions are rarely 100% accurate.
Where do you plan to save fonts and how to manage licenses and copyright? How do you plan to edit a PDF if the fonts have been subsetted?
This issue is discussed in detail in the upcoming publication at DocEng 2018. Thanks to initiatives such as Google Fonts, there is now a much larger variety of high-quality fonts that can be freely embedded in editable documents. However, there is still a need to be able to embed commercially licensed fonts in PDF documents, particularly if they form part of a corporate identity. Most font vendors have up to now been happy to permit subsetting of their fonts for this purpose, as it would be difficult to extract a usable font from this limited information.
If fonts are subsetted, it is therefore important that their complete metric information is encoded, so that an alternative font can be synthesized to fit exactly the space of the original. This enables such documents to be edited by collaborators outside of the organization using this substitute font, whilst ensuring that the content will fit correctly when the correct font is reapplied.
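To see why complete metrics matter, the minimal sketch below computes the width of a run of text from per-glyph advance widths, using the PDF convention (for most font types) that glyph widths are expressed in thousandths of a unit of text space. The width tables, the function name, and the sample values are hypothetical, and kerning and extra character or word spacing are ignored. A substitute font that reuses the original's advance widths occupies the same horizontal space, even though its glyph outlines differ.

```python
from typing import Dict

# Hypothetical advance widths (1/1000 of a text space unit), as they might be
# recorded in the metric information of the subsetted original font.
ORIGINAL_WIDTHS: Dict[str, int] = {"T": 611, "o": 500, "t": 278, "a": 444, "l": 278}

# A synthesized substitute reuses the same advance widths; only the
# glyph outlines (not modelled here) would differ.
SUBSTITUTE_WIDTHS: Dict[str, int] = dict(ORIGINAL_WIDTHS)


def text_width(text: str, widths: Dict[str, int], font_size: float) -> float:
    """Width of `text` at the given font size, ignoring kerning and spacing."""
    return sum(widths[ch] for ch in text) * font_size / 1000.0


size = 12.0
w_original = text_width("Total", ORIGINAL_WIDTHS, size)
w_substitute = text_width("Total", SUBSTITUTE_WIDTHS, size)

# Equal widths mean text set in the substitute font will still fit
# when the original font is reapplied.
print(w_original == w_substitute)  # True
```

Note that the lookup fails for any character missing from the table, which mirrors the underlying problem: a subset without full metrics cannot tell an editor how much space newly typed characters would take.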