Wednesday, January 11, 2023

Better article PDF

When you publish an article, you want it to be discoverable by other researchers. This requires that it be indexed. Indexing systems need metadata, which they usually extract from the PDF files in digital libraries or other document repositories.

Ideally, a manuscript submission system collects the necessary metadata from the corresponding author when the final manuscript is submitted. Since author names are not unique, a good system will require each author to log into the system using their ORCID credentials, which also ensures that every coauthor knows they are listed as an author.

Some metadata is not known to the submitter, for example the DOI, volume and issue numbers, page numbers, and publication date. The managing editor adds this metadata. In some journals, the managing editor creates all the metadata by hand, but it is safer when the publication system generates the metadata algorithmically from the information provided by the corresponding author.

Some publishers do not collect verified ORCID iDs from the authors. In that case, you should put each author's identifier in the author declaration. If you write your manuscript in LaTeX, you can use the orcidlink package. In the preamble, add

\usepackage{orcidlink}

and in the author declaration add your ORCID iD in place of the n's

\author{John Doe\,\orcidlink{nnnn-nnnn-nnnn-nnnn}}

Some publishers simply publish the final PDF they receive in their digital libraries. If you submit the file your typesetter generates by default, it will not carry any bibliographic metadata and your article will be hard to discover. It is safer to submit the article in the PDF/A archival format, with your metadata embedded as XMP so that the web crawlers of the indexing organizations can extract it.

You could accomplish this using the full version of Acrobat, but as I wrote above, manual operations are not recommended, and you should let the LaTeX typesetter do it. Fortunately, River Valley Technologies has contributed a package called pdfx that automates this step. The package is part of the standard LaTeX distributions, so you most likely already have it.

Load this package first and hyperref after it; the import order is important. Then write an xmpdata file declaring the metadata. You can find all the details in the package's exhaustive documentation. For example, in the preamble you could declare

\usepackage[a-1b]{pdfx}

\usepackage{hyperref}
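
As a minimal sketch, the xmpdata file could look like the following. pdfx looks for a file with the same base name as your main file and the extension .xmpdata, so for article.tex it would be article.xmpdata. All titles, names, and keywords below are placeholders I made up for illustration; the package documentation lists further fields, including those an editor would normally fill in.

```latex
% article.xmpdata (hypothetical file name; it must match the main .tex file)
% All values below are placeholders, not real metadata.
% Multiple authors or keywords are separated with \sep.
\Title{Title of the Article}
\Author{John Doe\sep Jane Roe}
\Keywords{first keyword\sep second keyword}
\Subject{A one-sentence summary of the article}
\Language{en-US}
\Publisher{Name of the Publisher}
```

Any field you do not need can simply be omitted.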

If you are an editor and maintain a template for your journal, you can also embed the xmpdata file at the top of the preamble. The help file explains how to do that.
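
A template along these lines might work, assuming a LaTeX release from 2019 or later, in which the filecontents* environment accepts the overwrite option. The environment writes the xmpdata file before pdfx is loaded, so the metadata travels with the source file. The title, author, and keywords are placeholders.

```latex
% Hypothetical journal template with embedded metadata.
% The block below is written to \jobname.xmpdata on every run;
% the overwrite option (assumed available, LaTeX 2019 or later)
% keeps the file in sync when the metadata is edited.
\begin{filecontents*}[overwrite]{\jobname.xmpdata}
\Title{Title of the Article}
\Author{John Doe}
\Keywords{first keyword\sep second keyword}
\end{filecontents*}

\documentclass{article}

% pdfx must be loaded before hyperref
\usepackage[a-1b]{pdfx}
\usepackage{hyperref}

\title{Title of the Article}
\author{John Doe}

\begin{document}
\maketitle
The body of the article goes here.
\end{document}
```

Without the overwrite option, an existing xmpdata file is left untouched, so later changes to the metadata in the template would silently be ignored.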

You can check the PDF format and the metadata with the free Acrobat Reader. I have generated the following two documents using the method described above:

article

slides