Schedule - Editable PDF

The following submissions have been accepted for presentation at the workshop. The programme is as follows:

2:00 – 2:30pm: Aaron MacSween, Caleb James DeLisle, Yann Flory, all from XWiki SAS, France:
Confidential authoring and publishing workflows with CryptPad
2:30 – 3:00pm: Patrick Golden, The University of North Carolina at Chapel Hill:
Florilegia: Organizing scholarly annotations in PDFs
3:00 – 3:30pm: Tamir Hassan, Round-Trip PDF Solutions, Vienna:
Beyond the PDF: A new publication workflow based on … PDF 😉
3:30 – 4:00pm: Coffee break
4:00 – 4:30pm: Maria Da Graça Pimentel, Kamila Rodrigues, Isabela Zaine, Bruna Cunha, Leonardo Fernandes Zimmermann, Larissa Cardoso, Caio Cesar, all from the University of São Paulo:
Collecting and visualizing multimedia data toward fostering data sharing and reproducible procedures
4:30 – 5:00pm: Gioele Barabucci, Cologne Center for eHumanities, University of Cologne:
The human cost of making a better publication
5:00 – 5:10pm: Closing notes

As each presentation has been allocated a 30-minute slot, presenters are asked to limit their talks to around 15 minutes, in order to leave ample time for discussion.

MacSween et al.: Confidential authoring and publishing workflows with CryptPad

Scholarly publishing often follows an iterative model, with versions of documents passed between authors and editors.
Such a workflow can make collaboration difficult, as an author may have to hold off on further changes to the work in question until suggestions have been made, so as to avoid conflicts.
Near-real-time collaboration has achieved widespread adoption as a solution to the difficulties introduced by high-latency communication.

Collaboration systems which satisfy the demand for low-latency iteration have historically been based primarily upon centralized algorithms which require access to the content of the document being edited. For projects with sensitive content matter, the level of privacy provided by such an architecture may be insufficient.

We will present CryptPad, an open-source suite of collaborative utilities providing low-latency multi-user editing, file-sharing, and instant messaging, and a familiar drive interface for organizing content. As suggested by its name, CryptPad’s design has a foundation in cryptography, such that the service provider is not able to observe the contents of users’ collaborative documents, messages, or uploads.
Our presentation will focus on three topics: current functionality which is especially suited toward scholarly works, the scope of metadata which is accessible to an observer, and a forecast of planned future developments.

Golden: Florilegia: Organizing scholarly annotations in PDFs

The life of a scholarly publication only begins as it is released in a final form to a library or database. From that moment on, it is made available to researchers as a document to be read, understood, and connected to other documents. Researchers will interact with documents through the process of annotation: making visible marks in order to express importance, emphasis, commentary, agreement or disagreement, and so on. Annotation techniques that developed with printed books have persisted with digital publishing. These include highlighting, underlining, marking with symbols, adding notes in margins, and bookmarking pages, all of which are part of the PDF standard.

Yet while practices for publishing and reading PDF documents have advanced and proliferated, the way that researchers are able to interact with and organize annotations has not generally improved. PDF readers are expected to provide well-developed interfaces to create graphical representations of annotations analogous to those marked on paper, but those annotations themselves are rarely presented in aggregate as anything more than a simple chronological list of marks made on a page. Given that annotations are such a crucial part of the scholarly research process, more systems should be available that treat annotations themselves as documents worthy of being described, recalled, and connected in their own right.

Over the past two years, I have started to put together an application, tentatively called Florilegia, which is meant to capture and organize annotations in PDF documents. I will present the work that I have already done to develop a library to convert annotations into RDF via the W3C Web Annotation standards, as well as an early application using that library for end users.

Hassan: Beyond the PDF: A new publication workflow based on … PDF 😉

The familiar PDF-based publishing workflow still prevails in many fields, including DocEng. In the past few years, several alternatives have been proposed, based on light-weight markup languages, which better meet our expectations of flexibility and structure.

However, we need to be careful what we wish for. PDF has two main benefits, which are often taken for granted: a robust, authoritative visual presentation and a self-contained file, which is easily archived and can be opened on any PDF viewing application. Moving towards markup languages usually means forgoing these benefits.

But it doesn’t have to be this way. By embedding structured content in a PDF file, so-called “hybrid” PDF gives us the best of both worlds, combining visual fidelity and robustness with the structure that we have now come to expect from a modern document exchange format. Such files remain backward compatible with regular PDF viewers. We will present a novel workflow in which papers are authored in the standardized JATS-XML format and later typeset and converted to hybrid JATS/PDF files, demonstrating the Texture GUI and Pint typesetter in detail.

Pimentel et al.: Collecting and visualizing multimedia data toward fostering data sharing and reproducible procedures

Research in several areas makes use of data captured by various instruments such as surveys and interviews. Social scientists, for example, apply questionnaires to groups of individuals to study relationships among individuals within a society. Health researchers conduct surveys to identify, for example, health trends within a population. Computer scientists perform experiments that demand, among other things, analyzing logs resulting from user-application interaction. Moreover, using audio or video to capture interviews is paramount to research in many areas. We demonstrate two tools designed to support researchers collecting and visualizing multimedia data toward fostering data sharing and reproducible procedures. The first tool allows researchers to collect data using graph-based event-aware multimedia programmed interventions to asynchronously interact with participants in their natural environments. The second tool allows researchers to annotate interviews during and after the session. In both cases, metadata is automatically added to the collected data to support analyzing and exchanging the corresponding data. Using results from the use of the tools to support the research process in distinct scenarios, we discuss the opportunity to publish the corresponding data to foster peer review, data sharing and reproducible science.

Barabucci: The human cost of making a better publication

When a research field is young, the simple act of sending a half-baked idea to the editor of a journal may be enough to grant its publication.

In more established fields, a publication about an idea has to come with justifications, clear explanations of the methods, theoretical and practical experiments and a discussion of the impact of this idea on the world. In addition to all this, there is also a push towards asking authors to be layout experts and create beautifully crafted and typographically awesome camera-ready papers. On top of all this, recent publishing venues are starting to also ask for animated interactive graphs.

The publications produced in this new way are undoubtedly a joy to read. At the same time, however, they cost the authors a huge amount of time. Spending a lot of time on creating such perfect publications means that the researchers have less time to think about new ideas and write about them. In turn, this means that the public will see fewer ideas, but better explained.

Is the increasing human cost of publishing something to worry about? Or should we be happy that there is a contrasting force to the deluge of publications caused by the publish-or-perish model?