DataElix : Projects

It is common knowledge that we are in the midst of a paradigm shift to digital. We see it daily when we write emails, prepare briefings and author virtually all of our business information. We produce all of this work in digital form, and have been doing so at a breathtaking rate for decades.

What is not obvious is that Information Management (IM) practices have not made the same shift. In consequential ways, we continue to manage information as if we were still operating in the old paper technology paradigm. Metadata management, recordkeeping and archives, among others, remain firmly rooted in practices born in the paper age. As a result, so are many of today’s IM problems, including the digital backlog and limited findability.

Digital paradigm shift

Saying that we are in the midst of a digital paradigm shift may seem like old news. We have been talking about paradigm shifts, the replacement of one established system by a new one, for decades now. Yet Thomas Kuhn’s original analysis still yields powerful insight. For Kuhn, a paradigm shift is not simply the overnight displacement of an established system by a new one. It is a long-term process, slowed by resistance from the mindset and well-established practices of the old paradigm. In this light, the shift from a paper to a digital paradigm is as much about the persistence of the old mindset and practices as it is about the innovation that catalyses it. This also explains why the digital paradigm is still shifting 30 years in.

Managing digital records as if they were paper

Disposition is a central, if not defining, element of recordkeeping. It is the process that determines which documents to keep and which to delete. When records reach the end of their business lifecycle, they are either kept as records of long-term value or disposed of as having no future value at all.

In the paper world, disposition is essential to make room for new documents in filing cabinets, on storage shelves and in record centres. In other words, it rests on the imperative to manage a physical medium in a physical space.

Digital is different. It is not paper, and it takes up no physical space. This alone should mean that any rationale for disposition that rests on the paper paradigm evaporates, provided we don’t print the digital documents.

Even though digital information does not take up physical space, it is argued that it still takes up digital space. We are told that if we don’t dispose of our digital backlog, we will face the substantial and unnecessary cost of storing, backing up and protecting useless and noisy information.

While it is true that digital storage can cost tens of thousands of dollars per terabyte in total cost of ownership (TCO), these costs are far exceeded by the overall cost of the disposition process. The ongoing, often repeated efforts to define policy frameworks, to develop consensus-based retention schedules and to reduce the immense digital backlog cost orders of magnitude more than storage does. One can even argue that the cost of a single enterprise-wide “clean up” day exceeds the cost of the incremental storage.

If disposition based on the imperative to make room for new information or to control the spiraling costs of physical or digital storage makes little sense, keeping everything then becomes a real option and one with important consequences.

First, the cost of eliminating the digital backlog is substantially reduced. The more significant consequence, however, is a change in mindset: from viewing the digital backlog as a risk to be managed or a burden to be borne, to treating and valuing records and information as knowledge assets to be exploited.

The geography of knowledge: classifying information as if it were located in space

A second way in which information practices are still stubbornly paper-based is manifested in metadata management. Current practice is anchored in classification methods developed by librarians over the centuries.

The sole purpose of the library system was, and is, to locate paper documents amongst thousands of others in the physical space of the stacks. Under this system, documents are catalogued according to a subject-based taxonomy that moves from general to specific. For instance, all agricultural knowledge is grouped in one section of the stacks; more specific material on dairy farming sits on shelves within that section, and so on. The subject, sub-topic and sub-sub-topic are translated into an index number that refers to one unique location in the library.

This “geography of knowledge” (Weinberger) works well in the tightly managed environment of the library or a paper records office where information growth is controlled, where the staff have the time and expertise to manage the rigorous process, and where things are still stored on shelves and in boxes.
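The subject-to-location mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxonomy, its codes and the `locate` function are hypothetical and are not taken from Dewey, the Library of Congress scheme or any real file plan. The point it shows is structural: walking from general to specific subjects always ends at exactly one index number.

```python
# Hypothetical subject taxonomy for illustration only; the subjects and codes
# are invented and do not come from any real classification scheme.
TAXONOMY = {
    "Agriculture": {
        "code": "AG",
        "children": {
            "Dairy farming": {
                "code": "AG.DA",
                "children": {
                    "Cheese making": {"code": "AG.DA.01", "children": {}},
                },
            },
        },
    },
}


def locate(path: list[str]) -> str:
    """Walk the taxonomy from general to specific and return the single
    index number (shelf location) of the most specific subject in the path.

    Raises KeyError if a subject in the path is not in the scheme.
    """
    if not path:
        raise ValueError("a document must be assigned at least one subject")
    level = TAXONOMY
    code = ""
    for subject in path:
        node = level[subject]      # descend one level: subject, sub-topic, ...
        code = node["code"]        # the deepest code reached so far
        level = node["children"]
    return code


# A document gets exactly one path, and therefore exactly one location.
print(locate(["Agriculture", "Dairy farming", "Cheese making"]))  # AG.DA.01
```

In this sketch, a report on the economics of dairy exports must be filed under a single path; nothing in the structure lets it sit under agriculture and economics at once. That is a reasonable constraint on a shelf, and a much heavier one when the same scheme is reused as a digital file plan.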

The problem arises when file plans and taxonomies are carried over into the digital workplace, where information is growing at exponential rates and where users are responsible for classifying their own documents but have neither the knowledge nor the personal inclination to do it. As a result, file plans are, at best, continually being rethought and, at worst, fragmented and unused.

But the biggest problem with this paper-based mindset is that it closes off examination of the rich metadata structures inherent in unstructured digital information.

The structure of unstructured information

One of the deeper assumptions of paper-based IM practice is that subject-based metadata or file plans are necessary because access to the document’s content stops at the book cover. Metadata is believed to provide the only context or wrapper for the information locked inside. Without this wrapper there is simply no other way to navigate the geography of knowledge.

Even if we could go behind the book cover, the full text of the document is so noisy and lacking in structure that it would be of little use. This is, of course, why we call this kind of information unstructured.

But digital information is different. The barrier between the metadata wrapper and the content of the document does not exist. And the assumption that unstructured information is, for lack of a better word, unstructured, is false. Documents have rich and granular structure.

At the most fundamental level, all written information in a document is language, and written language has both semantic and syntactic structure that enables readers to find meaning in it and to classify it.

There are also titles and subtitles, document headers, pages and paragraphs. All of these elements remain unexplored by metadata practitioners, who continue to focus on the wrapper.
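As a rough illustration that “unstructured” text still carries structure, the Python sketch below recovers a title, headings and paragraphs from a plain-text document using nothing but blank lines and the shape of each line. The function name and the heading heuristic are hypothetical simplifications, not a description of how any particular search engine or AI system works; real tools draw on far richer signals, including the semantic and syntactic structure of the language itself.

```python
import re


def extract_structure(text: str) -> list[tuple[str, str]]:
    """Label the blank-line-separated blocks of a plain-text document as
    'title', 'heading' or 'paragraph'.

    The heuristic is a deliberately simple assumption: a heading is a single
    short line (under 80 characters) that does not end in sentence punctuation.
    If the first block of the document looks like a heading, it is the title.
    """
    # Split on one or more blank lines and drop empty blocks.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    structure = []
    for i, block in enumerate(blocks):
        looks_like_heading = (
            "\n" not in block
            and len(block) < 80
            and not block.endswith((".", "!", "?"))
        )
        if looks_like_heading and i == 0:
            kind = "title"
        elif looks_like_heading:
            kind = "heading"
        else:
            kind = "paragraph"
        structure.append((kind, block))
    return structure


sample = (
    "Digital knowledge\n\n"
    "Documents have rich and granular structure.\n\n"
    "Digital transformation\n\n"
    "This is a lost opportunity."
)

for kind, block in extract_structure(sample):
    print(f"{kind:9} | {block}")
```

Even this crude pass turns a flat string into labelled, addressable parts without any metadata wrapper being added by hand.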

While IM and IT professionals haven’t taken advantage of the breadth and granularity of this inherent metadata, search engine and AI developers have fully recognized the value of these structures. Google uses internal semantic and syntactic metadata, along with traditional classification elements, to determine relevance. It has been highly effective in doing so, even though billions of documents, classified under almost as many different schemas, are indexed daily.

Digital knowledge

The real promise of these inherent metadata structures is that the digital backlog, opened to search engines and AI, can now be seen not as a records management burden but as a source of digital knowledge.

Search engines, cognitive processors and AI have been designed to access and to mine these internal structures. Each and every document can be combined or synthesized with countless other documents to make new connections, and to reveal new insights.

Imagine if we kept everything ever written by an accomplished policy analyst over the span of her 30-year career: all of the briefing notes and white papers, the working-level email exchanges and the draft policy documents. What best practices, what contextual knowledge, what insight could be gleaned from this?

Now imagine if we could combine that analyst’s body of work with that of every other policy analyst, good or bad. What patterns would be revealed? What implicit knowledge would be made explicit across thousands, even millions, of documents? What big data insights could be distilled from this wealth of knowledge?

This discussion of the value of information and knowledge is perhaps one of the most important and strategic that information professionals can have. Regardless of what we think of traditional practices, the emergence of these born-digital tools is fundamentally altering the landscape and opening up the potential for new IM roles in curating and managing organizational information AND knowledge.

Digital transformation

Of all the things that should be digitally transformed, managing our information seems to be one of the last holdouts. Even though the majority of government information (up to 80 percent, according to IBM) is “unstructured”, we have continued to apply untransformed practices born in the paper paradigm.

This is a lost opportunity. Rather than mining and reusing the accumulated wisdom of government employees, we have been asked to delete it. An updated and properly founded digital information and knowledge practice would no longer be limited by the paper mindset. It would not be bound by lifecycle management or deep taxonomy. Rather, it would focus on mastering insight engines and AI to curate an organization’s digital knowledge assets.

Articles

The Digital Paradigm Shift

Resetting information and knowledge practices in the age of digital transformation