Amber or Archive? What to do before digitizing historic books

If you work in a large enough library, you’ll know that digitizing historic books is rarely just a capture job. By the time a project reaches imaging, many of the most important decisions should already have been made.

For museums, libraries, and other custodians, the challenge is not how to create digital copies; that part is straightforward these days. It is how to do so in a way that protects fragile originals, produces usable digital assets, and supports access over the long term.

This is especially important with bound volumes, rare books, and archival material. Unlike modern paper records, these collections carry value in their physical form as well as their content. Bindings, foldouts, annotations, inserts, and evidence of wear can all matter to researchers (even historic graffiti has value). According to the British Library’s digitization guidance, preparation and method are critical, particularly when dealing with rare or delicate items. The right capture setup, handling process, and conservation input can make the difference between a successful project and unnecessary risk.

How to get started digitizing precious archives

The first step is deciding exactly what the project is for. That sounds obvious, but you may already be familiar with how easily the objectives of these projects become blurred. Some institutions want to widen access. Others want to reduce handling in reading rooms. Some are preparing material for publication, digital exhibitions, or internal preservation goals. Each of those aims leads to slightly different technical and operational decisions.

That is reflected in the broader guidance too. In the U.S., the Northeast Document Conservation Center notes that digitization can support preservation by reducing physical use of fragile originals, but it is not the same thing as preservation in itself. A digital copy creates a new preservation obligation, with its own storage, management, and authenticity requirements. In other words, digitization should be planned as part of a larger stewardship strategy, not treated as a one-off conversion exercise.

Selection is part of that. Not everything should be digitized in the same way, or at the same time. The International Federation of Library Associations and Institutions has long stressed the need to balance research value, condition, demand, rights, and institutional priorities when planning rare book and manuscript digitization. For experienced collection managers, that usually means asking harder questions up front. Which items are most at risk from handling? Which are most valuable for access? Which need specialist support before they can even be opened safely? And which require image quality or metadata depth that goes beyond a standard workflow?

Technical standards matter for similar reasons. If digital files may later support research, publication, or machine reading, image capture needs to be consistent and defensible. The Federal Agencies Digital Guidelines Initiative remains one of the clearest benchmarks for this kind of work because it ties image quality to measurable standards rather than assumptions. That can be especially important for institutions that want to avoid having to re-digitize material later because the original files were good enough for access, but not good enough for preservation or reuse.

The often-overlooked importance of metadata

Metadata deserves just as much attention. A strong set of images is only part of the outcome. Files need to remain findable, understandable, and connected to the original collection record. The Digital Preservation Coalition makes the point well: digital materials need to remain accessible and trustworthy over time, which depends as much on structure and management as on the images themselves. In practice, that means thinking early about identifiers, pagination, structural metadata, naming conventions, and how digital outputs will connect back to catalogs, finding aids, or collection systems.
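As a minimal sketch of what thinking early about identifiers and naming conventions can look like, the Python below builds predictable file names from a catalog identifier and a capture sequence, and keeps the printed page label alongside them. The identifier format, the four-digit padding, the TIFF extension, and the names PageImage and structural_map are all assumptions made for illustration, not a published standard or any institution's actual convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageImage:
    item_id: str     # hypothetical catalog identifier for the volume
    sequence: int    # capture order within the volume, starting at 1
    page_label: str  # label as it appears on the page ("iv", "12"), or "" if unnumbered

    def filename(self, extension: str = "tif") -> str:
        # Zero-padding the sequence keeps files in capture order when sorted by name.
        return f"{self.item_id}_{self.sequence:04d}.{extension}"


def structural_map(pages):
    """Return one row per image, linking the file to its catalog record and page label."""
    ordered = sorted(pages, key=lambda page: page.sequence)
    return [
        {
            "item_id": page.item_id,
            "file": page.filename(),
            "sequence": page.sequence,
            "page_label": page.page_label,
        }
        for page in ordered
    ]


if __name__ == "__main__":
    pages = [
        PageImage("ms0042", 2, "ii"),
        PageImage("ms0042", 1, "i"),
        PageImage("ms0042", 3, ""),  # unnumbered plate
    ]
    for row in structural_map(pages):
        print(row)
```

Keeping the capture sequence separate from the page label is a deliberate choice here: historic books often have unnumbered plates, repeated numbers, or roman-numeral front matter, so the printed label alone cannot be relied on to order the files.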

It also means being realistic about OCR and text extraction. These tools can be extremely useful, but historic books often present obstacles that are not obvious at the planning stage. Older typefaces, marginalia, bleed-through, damaged pages, tight bindings, and unusual layouts can all reduce accuracy. Both the British Library and the Library of Congress have highlighted how even well-established OCR workflows can struggle with historical material when the original imaging or page conditions are challenging. For institutions digitizing books for searchability as well as image access, this is worth testing early rather than assuming it will fall into place later.
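One way to test OCR early is to transcribe a handful of sample pages by hand and compare them with the OCR output. The sketch below computes a character error rate from the edit distance between the two texts. It is an illustrative approach only: the function names are hypothetical, the sample strings are invented, and any acceptable error threshold would be a decision for the project itself.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the hand-checked reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    reference = "The history of the parish"   # hand-checked transcription (invented sample)
    ocr_output = "Tbe hiftory of the parifh"  # typical long-s and letterform confusions
    print(f"CER: {character_error_rate(reference, ocr_output):.2%}")
```

Running a check like this on a small, representative sample (tight bindings, bleed-through, older typefaces) gives an early indication of whether searchable text is realistic for a given collection.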

Another point that is often underestimated is the relationship between digitization and physical storage. Creating digital access does not remove the need to care for the originals. In many cases, it increases the value of doing so properly. Once a collection has been digitized, there is often a stronger case for stabilizing, organizing, and storing the physical items in a controlled environment so that the originals remain protected and available when their material qualities matter.

Our work

That balance between access and stewardship is something we have supported in practice. Our work with Senate House Library in London involved the management of a major academic collection environment that included around two million books, 1,800 archived collections, and 50 special collections. In another project focused on historic document digitization, we worked with materials more than a century old, where careful handling and image quality were central to the outcome. Because we work across both secure, safe, sustainable storage and advanced scanning and digitization technology, we are in a unique position to understand both the physical and the non-physical needs of a collection. We have also worked with institutions including the British Library, reflecting the need for partners who understand both the digital process and the physical realities of heritage collections.

That combined view matters. Historic books do not sit neatly in one category. They are preservation assets, research assets, and often institutional assets too. A digitization project has to respect all three.

What to keep in mind before you get started

In our work with museum managers and library leaders, including those at Senate House Library, the strongest projects tend to begin with a small number of clear questions:

What is the access goal? What level of capture is appropriate? What handling risks exist? What metadata will make the output genuinely useful? Where will master files live? What will happen to the originals once the work is complete?

When those decisions are made early, digitization becomes more than a technical task. It becomes part of a broader collections strategy, one that improves access without losing sight of preservation, context, and long-term value.

Manage a collection you’d like to store or digitize? Get in touch with one of our experts today and we’ll work to understand your needs from the first minute.
