A newspaper moves into the digital long-term memory

By Christian Block, Lex Kleren

Open the lid, put the newspaper in, press a button, done? The digital archiving of newspapers is more complex than that. In spring, the National Library began preparations to include the Lëtzebuerger Journal, published between 1948 and 2020, in the eluxemburgensia platform.

To describe it as a Herculean task is probably no exaggeration. More than 20 years ago, the National Library (BnL) set itself the ambitious goal of digitising the country's printed cultural heritage and making it accessible via the eluxemburgensia portal. One has to bear this in mind to understand the scope of the goal set in 2002. Novels, non-fiction books, daily and weekly newspapers, monthlies, all publications of public interest by municipalities, associations, institutions, postcards as well as posters, i.e. all documents that fall under the legally stipulated compulsory deposit (see info box), are to be preserved for posterity.

As luck would have it, the digitisation of the Journal coincides with the 75th anniversary year of the daily newspaper, which was published between 1948 and 2020. The entire process, from analysing the newspaper to scanning and preparing the metadata to publishing it online, will take a good two years. Users can already browse the predecessor newspapers from which the Lëtzebuerger Journal emerged in 1948, namely the Obermosel-Zeitung (published from 1881 to 1941 and from 1945 to 1948) and D'Unio'n (1944 to 1948), on the online platform eluxemburgensia.

The planning work for the digitisation of the Journal began about nine months ago. "The preparatory phase is very important. It decides the rest of the project," says Ralph Marschall. He is project manager, coordinator of the tender (more on that later) and developer of the eluxemburgensia platform.

"The preparation phase is very important. It decides the rest of the project."

Ralph Marschall, Project Manager and Coordinator of the Call for Proposals

"The Lëtzebuerger Journal comprises a total of 73 volumes. This is roughly equivalent to 20,000 issues and 363,000 pages, which are now being digitised," Martine Mathay explains. She coordinates the planning phase as well as the preparation of the documents for digitisation. Analysing means keeping meticulous records of the number of pages, supplements, errata, the condition of each issue and the names of the editors. In short: a detailed inventory of the title covering completeness and state of preservation as well as its editorial and publishing history, Mathay adds. Staff and students do this painstaking work. Going through the approximately 300,000 Journal pages took one person more than half a year.

Various criteria (see info box) influence the selection and prioritisation within the framework of the digitisation project. Especially in the case of historical newspapers that were published over a long period of time, as was the case with the Journal's predecessor newspapers, the BnL tends to take a chronological approach, "that is, to work our way from the oldest and sometimes most fragile documents into the 20th century," Mathay explains. In this way, the National Library pursues the goal of preserving the publications that are most vulnerable due to their previous storage conditions, age and/or paper quality. After all, sooner or later all newspapers "deteriorate into dust".

The chronological approach also has the advantage that the copyright situation is simpler. Image and author rights expire 70 years after the death of their authors. The National Library assumes that works published before 1881 will be in the public domain from 2023. "For historical newspapers, where the rights of the authors and producers have expired, the rights situation is relatively simple and the contents are freely accessible. For all other titles, comprehensive conventions with publishers and rights clearance with authors, journalists, photographers, illustrators, cartoonists or their beneficiaries are needed." To this end, the BnL, in cooperation with publishers and Luxorr, conducts extensive research to identify and contact rights holders and obtain their consent. The experience of systematic rights clearance for Hémecht, the journal for Luxembourg history (published under this name since 1964), and for Lëtzebuerger Land (since 1954) has shown that objections are the absolute exception.
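The 70-year rule mentioned above can be illustrated with a small calculation. The following is only a sketch, not the BnL's actual procedure: it assumes that the term is counted in full calendar years after the year of death, and the function names are made up for illustration.

```python
from typing import Optional

PROTECTION_YEARS = 70  # term stated in the article


def first_public_domain_year(death_year: int) -> int:
    """First calendar year in which a work is free to use.

    Assumption: protection runs for 70 full calendar years
    following the year of the author's death.
    """
    return death_year + PROTECTION_YEARS + 1


def rights_expired(death_year: Optional[int], current_year: int) -> Optional[bool]:
    """Return True/False if the author's year of death is known.

    Returns None when the author cannot be identified (for example an
    unsigned article), meaning the rights status remains unclear.
    """
    if death_year is None:
        return None
    return current_year >= first_public_domain_year(death_year)


if __name__ == "__main__":
    print(first_public_domain_year(1952))
    print(rights_expired(1953, 2023))
    print(rights_expired(None, 2023))
```

The `None` case reflects the situation with unsigned contributions, where no date can be calculated at all.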

"Since newspapers often publish many unsigned articles, photographs and illustrations, identifying the authors often proves difficult, if not impossible, so that a residual risk remains, " Mathay explains. However, if a complaint is subsequently made, the relevant texts can be blacked out in the online version and made visible again 70 years after the author's death.

Why age does not necessarily allow conclusions about the state of preservation

Martine Mathay picks up a cardboard box and lifts it onto the table. "This is a typical example of how we receive donations." Stacked in the cardboard box are folded copies of the Luxemburger Zeitung (1868 to 1941). Just as someone must have put them in decades ago, to stow them away later in an attic or cellar. The rival journal to the Luxemburger Wort is part of the current digitisation campaign and is eagerly awaited by researchers from various fields. At first glance, you can see that the 1916 edition has split at the folds. "Newsprint is made of cellulose, often based on wood or waste paper. Acids that get into the paper during production break down the cellulose, which is responsible for the mechanical strength of the paper. As a result, the paper becomes brittle and cracked. Tears and breaks in the paper can be partially restored, even if text passages have been lost at the bending points."

About eluxemburgensia

  • More than 500 books, biographies, non-fiction works and historical monographs, 113,000 magazines, 700 posters and 17,000 postcards, which together amount to more than one million pages, can already be found in the digital collection. According to Mathay, the process of digitising books is "still in its early stages". "For books, we now invite tenders for one to two million pages a year and next year we will get projects back to the tune of one million pages. That's more or less the rhythm we have to maintain in the future," explains Ralph Marschall. The archiving work is made more difficult by out-of-print books or incomplete series. Every now and then, publications long forgotten by the general public suddenly reappear.

  • All publications issued in Luxembourg concerning political, economic, social, cultural, scientific, religious or tourist life must be deposited in the National Library in the form of obligatory copies. The legal deposit obligation covers books, brochures, newspapers, trade journals, posters, calendars, scores or plays, regardless of whether they are printed or digital publications. In annually compiled national bibliographies, which cannot currently be viewed online, the BnL keeps a record, so to speak, of the country's journalistic life.

  • In selecting publications for digitisation, the BnL follows a variety of criteria ranging from general condition, rarity value and completeness to historical and scholarly interest and public or research demand.

    The preparation phase, which runs parallel to the rights clearance of the individual titles, can be divided into four work steps:

    • The selection and prioritisation of titles according to the aforementioned selection criteria
    • The analysis of the titles
    • The completion of the missing editions through loans or donations from cultural institutions, publishers and private individuals.
    • The restoration and packaging in archive boxes (transport and storage).

Ralph Marschall, Martine Mathay

Storage conditions aside, the age of a newspaper does not necessarily allow conclusions about its state of preservation. "We have newspapers from the 19th century that still look very good today, simply because they were a completely different type of paper." Until the beginning of the 19th century, paper fibre was obtained from linen (flax) by using textiles or rags. In contrast to industrially produced paper based on wood, this historical paper is more durable, she says. This is an advantage for its conservation.

The first Journal issues don't necessarily look very fresh either. A large stain adorns the front page of a copy of the first edition, perhaps due to a cup of coffee or because the paper got wet, and brittle dog-ears mark the margins. "In the post-war period, paper was expensive and the quality of the paper used by the Journal in the early years was not the best."

The retained volumes have to be prepared and restored for the scanning process. For this purpose, the pages are detached from the book cover, dog-ears and wrinkles removed and then smoothed in batches overnight in presses. Later, the loose pages are placed in archive boxes that prevent the paper from being further damaged by environmental influences. "We try to keep the newspapers in this format on paper for as long as possible."

Before that, it is important to ensure that the printed product is complete and, ideally, to replace missing or damaged issues with better-preserved copies. This is not always an easy task, because from a purely material perspective, the lifespan did not really matter when newspapers were produced. It was a commodity whose contemporary historical value was only recognised much later. Since newsprint is a fragile medium, this phase could be very lengthy, depending on the title, Mathay reports.

"We have newspapers from the 19th century that still look very good today, simply because it was a very different type of paper."

Martine Mathay, Coordinator of the Planning Phase and Preparations of Documents

In August, the preparatory work for the digitisation of the Lëtzebuerger Journal had not yet been completed; it should be finished "by the end of October at the latest". "We want to put out the call for tenders before the end of the year." The BnL is leaving both the restoration and the actual digitisation work to specialised external companies. The specifications comprise two small folders in printed form. Everything is regulated down to the smallest detail. "It is mainly companies from Europe as well as India that are taking part. But the newspapers stay in Europe and are scanned here," Marschall explains. In this way, the BnL wants to limit both the risks and the impact of transport. "This is important to us because we invest a lot in the restoration and archiving of the documents and this will later be our archive copy," adds Mathay. So far, she says, there have been no problems with this, even though in one case, due to the Ukraine war, the transport company had to take a major diversion. Like all other documents, the archive copy will later be preserved at 18 degrees Celsius and 50 per cent humidity, the optimal conditions for preserving paper, in the warehouse of the BnL's new building, which was inaugurated in 2019. Clearly, access to the originals will be restricted once the newspapers are available online.

It is expected that the archive boxes of the four titles (the tender includes the Journal, the Tageblatt from 1951 to 2014, the Cahiers Luxembourgeois and the Annalen des Acker- und Gartenbau-Vereins) will start their journey at the beginning of next year. Approximately eight months and much more of the lives of journalists, photographers and many other participants are packed into one box. "That's how we get to almost a million pages," says Mathay. The entire online collection of eluxemburgensia will thus double to more than two million pages in the near future.

Digitally capturing a print product is again a "huge task" in itself, says Marschall. Page by page, all documents are scanned. Then the digitised pages go through a semi-automated process that recognises text and the various layout elements (titles, articles, illustrations, advertisements and much more). Nevertheless, all pages are examined again individually to ensure that the various elements are structured correctly. When the project is completed (probably in spring 2025), the articles from the Journal's recent past will also be revived; they had to give way to the new web presence of the digital magazine in the course of its reorientation, much to the chagrin of some contributors and observers.

New AI tools at the start

By 2030, the National Library hopes to have at least the majority of Luxembourg's daily and weekly newspapers digitised. But even then, much remains to be done. "Luxembourg is in the unique situation of being a country that has published an extremely large amount. In the 19th and 20th centuries, a new publication appeared almost every month. Some were flashes in the pan that disappeared into oblivion after a few issues or after a few years," says Mathay. During the on-site visit, she also reveals that in the coming years the newspapers of National Socialism and the resistance movements from the time of the Second World War will be processed, as will the remaining Luxembourg daily and weekly newspapers. The digitisation of the Zeitung vum Lëtzebuerger Vollek, the woxx and its predecessor, the GréngeSpoun, is also being planned.

"By 2030, we are trying to digitise the majority of Luxembourg's daily and weekly newspapers."

Martine Mathay

Sooner or later, the digitisation of historical documents will catch up with the present. However, current articles in newspapers published by Luxembourg publishing houses will not become publicly accessible overnight. Conventions with the National Library provide for a "moving wall", i.e. a time gap between the publication date of an issue and the date on which it is made accessible.
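How such a moving wall works can be sketched in a few lines. The article does not say how long the gap is, so the `wall_years` parameter and the function name below are purely hypothetical; the example only shows the principle that an issue becomes accessible once a fixed period has passed since publication.

```python
from datetime import date


def is_publicly_accessible(published: date, today: date, wall_years: int) -> bool:
    """Return True once an issue has aged past the moving wall.

    wall_years is a hypothetical gap length; the real value is set
    in the conventions between publishers and the library.
    """
    try:
        release = published.replace(year=published.year + wall_years)
    except ValueError:
        # Issue dated 29 February and the release year is not a leap year.
        release = published.replace(year=published.year + wall_years, day=28)
    return today >= release


if __name__ == "__main__":
    issue_date = date(2020, 1, 15)
    print(is_publicly_accessible(issue_date, date(2023, 1, 14), wall_years=3))
    print(is_publicly_accessible(issue_date, date(2023, 1, 15), wall_years=3))
```

Because the check is made against the current date, the wall "moves": each day, another day's worth of issues crosses it without anyone having to release them by hand.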

Meanwhile, the eluxemburgensia platform is being further developed. After all, the platform is not meant to be simply a virtual exhibition space, but to make it possible to work with the electronic versions. According to Ralph Marschall, the BnL is involved in several AI projects. For example, the OCR technology, which reads the text from the scanned images, now delivers much better results for texts set in Gothic type or for multilingual articles than was the case a decade ago. Also in the project phase are automated object recognition in images and automated grouping of thematically related articles (topic modelling), both intended to further improve the user experience.

Despite a clear trend towards digitisation, print is far from being obsolete. "Every month, there is at least one new title in the National Library's mandatory collection," Mathay reports. So, one way or another, the BnL will not run out of work that quickly.