Monday, September 25, 2006

Converging classification schemes of documents and records

Matt McAlister blogged about "Challenging why (and how) people tag things" (brought to my attention by Colin on the Bankwatch blog). Based on my current exploration of exciting Records Management topics, I want to compare the formal classification schemes offered by traditional Document and Records Management with the more freeform user tagging technique used by a range of Internet apps such as Technorati, Flickr and others. I'm also keen to work out whether tagging can meet the requirements of corporate Records Managers while being more acceptable to users than complex and clumsy formal classification schemes.

Over the next few posts I'd like to approach this comparison by first laying out how I see the current state of Document and Records Management classification, before diving into the new world of tagging.

Structured classification of records - Indexing / Titling

Traditional Electronic Document and Records Management Systems (EDRMS) use a range of well-structured and generally enforceable mechanisms for classifying the unstructured data in documents so that they can later be rapidly searched and identified. Document 'titles' or 'indexes' are the metadata attached to the content of a record: the identifying data that describes succinctly and accurately what the record holds. The metadata may also contain information that goes beyond a direct description of the content, like the author, date of storage, the record's owner, and the policies around how to retain it. Document metadata is typically held in relational database tables, with each field representing a metadata attribute.

Records are typically classified as being a specific type, which usually describes the business use or organizational filing requirements for the document. It is this classification that sets the exact set of metadata attributes that must be filled in for the document. The attributes required depend on how the record is expected to be searched for and identified in the future. Using an insurance claims business unit as an example, the classification requirements for a claim form may be:

  • Claim number
  • Submission date
  • Claim type
  • Policy number
  • Claim value
  • Claimant name
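As a rough illustration of how a record type can dictate the attributes that must be filled in, here is a minimal Python sketch. The `REQUIRED_ATTRIBUTES` mapping, the `claim_form` type name, the attribute names and the `missing_attributes` helper are all hypothetical names chosen for this example; they mirror the list above and are not taken from any particular EDRMS product.

```python
# Hypothetical mapping from a record type to the metadata attributes
# that must be filled in before a document of that type can be stored.
REQUIRED_ATTRIBUTES = {
    "claim_form": [
        "claim_number",
        "submission_date",
        "claim_type",
        "policy_number",
        "claim_value",
        "claimant_name",
    ],
}


def missing_attributes(record_type, metadata):
    """Return the required attributes that are absent or empty."""
    required = REQUIRED_ATTRIBUTES[record_type]
    return [name for name in required if metadata.get(name) in (None, "")]


# A user has only partly indexed this (made-up) claim form.
incomplete = {
    "claim_number": "C-1001",
    "policy_number": "P-77",
    "claimant_name": "A. Example",
}

print(missing_attributes("claim_form", incomplete))
# → ['submission_date', 'claim_type', 'claim_value']
```

A system enforcing the classification would presumably refuse to store the document until this list is empty, which is exactly the kind of enforcement that users tend to find burdensome.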

With these attributes in place, a quick query of the document database is all it takes to find the documents relating to a specific claim. The same query lets an application automatically present a claim file when a user views associated data in a claims business system or, even better, in a claims process implemented in BPM.

With a well-defined set of attributes entered for each document, traditional relational database queries can be used to search for documents without looking at their content. This SQL-based search was essential in document management systems that were effectively providing an index of paper documents. Although the paper was typically scanned to produce electronic images of the content, the meaningful text content of the images was not extracted for search.
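To make the idea of attribute-based search concrete, here is a small sketch using Python's built-in `sqlite3` module. The `claim_documents` table, its column names, the `find_claim_documents` helper and the sample rows are assumptions made up for this example; a real EDRMS would have its own schema. Note that the query touches only the metadata, never the scanned image itself.

```python
import sqlite3

# In-memory database standing in for the document index (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE claim_documents (
        document_id     INTEGER PRIMARY KEY,
        claim_number    TEXT NOT NULL,
        submission_date TEXT NOT NULL,
        claim_type      TEXT NOT NULL,
        policy_number   TEXT NOT NULL,
        claim_value     REAL NOT NULL,
        claimant_name   TEXT NOT NULL,
        image_path      TEXT NOT NULL  -- location of the scanned image
    )
    """
)

# Made-up sample rows: two documents for one claim, one for another.
rows = [
    ("C-1001", "2006-09-01", "motor", "P-77", 1250.0, "A. Example", "scans/0001.tif"),
    ("C-1001", "2006-09-03", "motor", "P-77", 1250.0, "A. Example", "scans/0002.tif"),
    ("C-2002", "2006-09-05", "home", "P-12", 800.0, "B. Sample", "scans/0003.tif"),
]
conn.executemany(
    "INSERT INTO claim_documents (claim_number, submission_date, claim_type,"
    " policy_number, claim_value, claimant_name, image_path)"
    " VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)


def find_claim_documents(connection, claim_number):
    """Look up documents by claim number using the index metadata only."""
    cursor = connection.execute(
        "SELECT document_id, image_path FROM claim_documents"
        " WHERE claim_number = ? ORDER BY document_id",
        (claim_number,),
    )
    return cursor.fetchall()


print(find_claim_documents(conn, "C-1001"))
# → [(1, 'scans/0001.tif'), (2, 'scans/0002.tif')]
```

The lookup works only because every document was indexed with a claim number when it was stored; a document filed without one would simply never be found this way.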


Structured indexing of documents is fundamental to the way document management systems organize and search documents. It follows the original intention: to formalize the storage and use of documents, making this unstructured data efficient to use within business processes. Records Management has different requirements around the long-term retention, discovery and destruction of documents, and these need different structured classification methods.

Any form of structured classification can seem clumsy and overburdening to a business user who otherwise has a 'real' job to do. The next posts will highlight the features of Records classification and start to introduce how document and records metadata converge, while still requiring something more to be usable in many scenarios.

