Improving It: September 2006

Friday, September 29, 2006

Converging classification schemes of documents and records - Part 5 - Tagging and pulling it all together

All this week I have been exploring some of the approaches that enable the classification of documents and records in Electronic Document and Records Management Systems (EDRMS), with the aim of being able to see how the convergence of all of these schemes may lead to a more usable classification mechanism, as we have seen emerge with tagging on the Web. I have covered:

Structured indexing / titling of documents
Full text indexing
Taxonomy and fileplan
Keywords and thesaurus
This post: Tagging and combined classification schemes

Combined classification schemes

During the discussions, I have continuously reiterated that each of the classification schemes described is not complete on its own. Structured indexes are great for business process driven document management, like my insurance claims example; full text indexing is great for unstructured, work in progress documents and those that are fully described by their content; fileplans, the staple of traditional records management, are great for organizing documents into big buckets representing the business activities around them; thesaurus enforced keywords provide taxonomic classification for items related to a specific domain.

All four approaches are in fact used together in EDRMSs, to provide a fairly complete classification of the documents for recordkeeping purposes. The structured index metadata captures a lot of information about the description, authorship, ownership and status of a record. A component of this, a single item of metadata, may capture multiple keywords driven from a controlled thesaurus enabling consistency in domain classification of the records. In traditional recordkeeping environments, the fileplan will provide a further big-bucket classification and may drive high-level security and retention. The fileplan may also drive the records management for physical documents and assets within the same environment. By far, most of the information is added by records managers at the point of declaring documents as official records (the definition of what makes a document a record was in the first post of this series).

This is a lot of metadata and classification that is captured, based on the original business use, future retrieval requirements and the content of the document. If documents have been electronic throughout their life, or at least prior to being records, often there will be collaborative tools and document management systems that have also captured their own metadata along the way. Much of this remains valuable to the business users to retrieve documents to do their jobs, even after documents have been passed for records management. Using this business information without having to duplicate it or the documents is a challenge. Vignette has some great approaches to this business document/metadata and record/metadata integration problem. They go beyond the standard mechanism of a separate records management system referencing or copying documents in document repositories or filesystems, which risks duplication, damaged data and broken custody issues.

Business users can't be trusted to classify records

Part of the reason for this week-long discussion of records management has been to get to the point of understanding why business users can't be trusted to classify records. Partly, this is because many business users can't even be trusted to store documents in a secure place without some carrot and stick persuasion. The feedback from users has been that even if they store documents, classifying them is a time consuming process that seems complex - being faced with a long form of data to fill in feels like completing your tax return.

Even if business users were inclined to store records, their business is not to understand the details of records management to ensure effective, consistent classification. So is there a halfway house?

Tagging for user friendly classification

Many Web 2.0 sites use tagging to help random Internet users find documents, blog posts, URL bookmarks, photos, videos, music and whatever else may be put out there by contributors. Contributors have a vested interest in tagging their content effectively, since decent tagging is likely to lead to more viewers.

Although the information a contributor attaches may not meet the Dublin Core standard element set, it is likely to provide a fast and concise classification of the content, enabling it to be found by users when a full text Google search may not help.

Tags are just keywords that describe the content, but they typically don't come from a thesaurus, instead being added by the contributor to meet his current requirements. Taking a look at my Technorati tag cloud, you can see that there is a mass of tags that may have only been used once, and may never be used again. From a classification point of view within the context of this site, these one-off tags probably reduce the value of the cloud for users trying to find relevant posts. From an external viewpoint, an anonymous user searching Technorati for interesting blog posts may find the pointedness of this classification valuable. If the user searches explicitly for tags like taxonomy+indexing, he or she gets back 2 blog posts that are identified as being relevant to those tags. Searching blog content with taxonomy+indexing leads to 452 posts, many of which may just mention each word once in the course of the text.
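The difference between a tag search and a content search can be illustrated with a few lines of Python. This is only a rough sketch: the posts, tags and function names are invented for illustration, and it says nothing about how Technorati actually implements its search.

```python
import re

# Invented sample posts; each has freeform tags and some body text.
posts = [
    {"title": "Fileplans explained",
     "tags": {"taxonomy", "fileplan"},
     "body": "A fileplan is a taxonomy used for filing; indexing is mentioned only in passing."},
    {"title": "Indexing with a taxonomy",
     "tags": {"taxonomy", "indexing"},
     "body": "How a taxonomy guides the indexing of records."},
    {"title": "Holiday photos",
     "tags": {"photos"},
     "body": "No records management here."},
]

def search_by_tags(posts, wanted):
    """Return titles of posts that carry every one of the wanted tags."""
    wanted = set(wanted)
    return [p["title"] for p in posts if wanted <= p["tags"]]

def search_by_text(posts, words):
    """Return titles of posts whose body mentions every word at least once."""
    hits = []
    for p in posts:
        body_words = set(re.findall(r"[a-z0-9]+", p["body"].lower()))
        if all(w.lower() in body_words for w in words):
            hits.append(p["title"])
    return hits

tag_hits = search_by_tags(posts, ["taxonomy", "indexing"])
text_hits = search_by_text(posts, ["taxonomy", "indexing"])
```

Here the tag search returns only the post its author deliberately labelled with both tags, while the content search also returns the post that merely mentions both words.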

The Wordpress blogging tool goes a little further, providing predefined categories for tagging posts. This enables some consistency to the blog categorization, useful for more focused blogs, like my (personal) pet project, Bruncher. Here, restaurants and diners that serve brunch are categorized with a predefined set of tags based on their location (e.g. MA, Boston), style (bar, diner), and most importantly how good a Bloody Mary they serve (0 - No Bloody Mary in sight, up to 4 - The best!). Wordpress allows custom tags to be applied to any post, but within this fixed domain of interest the fixed categories seem to work well and further tags may not provide any value.

Tagging for business users

In the business world, users are unlikely to be working on documents that require user friendly tagging that also has a very narrow domain or fixed set of tags, otherwise this whole discussion would be a non-issue; classification would be fast and easy. Therefore it seems that the freeform tagging technique may be a good start for providing some contextual information about the document they have produced. The accuracy of tagging is not enforceable, since it is completely freeform, but maybe it provides enough information to enhance document search and retrieval beyond pure full text indexing prior to a document becoming a record.

How can this be used to enhance record classification? A records manager could choose to use some of these freeform tags as keywords that they further refine at the point of filing. Alternatively, the records managers could collaborate with business users to provide a very limited set of fixed keywords for typical documents that translate directly to the records management environment, supplemented by freeform tagging to actually represent the context of the document.
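One way to picture the first option is a small lookup that turns a business user's freeform tags into suggested controlled keywords, leaving anything unrecognized for the records manager to review. This is a hypothetical sketch: the mapping table, keyword names and function name are all assumptions made up for illustration, not a feature of any particular EDRMS.

```python
# Hypothetical mapping from freeform tags (lower case) to controlled keywords.
CONTROLLED_KEYWORDS = {
    "claim": "Claims",
    "claims": "Claims",
    "car": "Motor insurance",
    "auto": "Motor insurance",
    "motor": "Motor insurance",
}

def suggest_keywords(tags):
    """Split freeform tags into suggested controlled keywords and leftovers.

    Returns (suggested, unmatched): suggested keywords are de-duplicated and
    kept in first-seen order; unmatched tags are returned as entered so a
    records manager can decide what to do with them at the point of filing.
    """
    suggested, unmatched = [], []
    for tag in tags:
        key = tag.strip().lower()
        if key in CONTROLLED_KEYWORDS:
            keyword = CONTROLLED_KEYWORDS[key]
            if keyword not in suggested:
                suggested.append(keyword)
        else:
            unmatched.append(tag)
    return suggested, unmatched

suggested, unmatched = suggest_keywords(["Car", "claim", "urgent", "auto"])
```

The freeform tags stay with the document either way; the lookup only proposes a starting point for the formal classification.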

Summary

Records management is a highly involved discipline, requiring specialist records managers to ensure the consistency of record classification, so that records are retrievable and are retained according to corporate policies and legal mandates. At the same time, there is a huge amount of electronic information being produced by the business that should be kept as records. Reusing as much of the business metadata as possible is essential to ensure efficient and scalable use of records management resources.

Tagging is a form of classification that seems to be acceptable enough to Web users that it could be applicable to non-threatening corporate document classification by business users as well. A combination of predefined categorization and freeform tagging may not only help users searching for documents find them prior to record declaration, but also assist records managers (and maybe automated systems) in the formal classification.

This is the end of this series of records management classification posts. I hope that some of the reference information will be useful to people over time and maybe some of the newer ideas will trigger some discussion. I have many thoughts for corporate document tagging, none of which, as far as I know, has been proven in practice in a corporate setting. I'd love to hear of any examples.

Thursday, September 28, 2006

Converging classification schemes of documents and records - Part 4 - Keywords, thesaurus and tagging

For the last few days I have been posting about some of the approaches that enable the classification of documents and records in Electronic Document and Records Management Systems (EDRMS), with the aim of being able to see how the convergence of all of these schemes may lead to a more usable classification mechanism, as we have seen emerge with tagging on the Web. So far I have covered:

Structured indexing / titling of documents
Full text indexing
Taxonomy and fileplan

In this post I want to look at the thesaurus as an approach to help users in accurately applying metadata, since I believe that this is getting very close to the Web tagging paradigm.

Thesaurus / Controlled Vocabulary

In records management a thesaurus or controlled vocabulary is used to assist records managers to apply metadata to records that is more consistent and falls within a recognized taxonomy.

In yesterday's post I talked about taxonomies as being a way of classifying documents according to a predefined scheme. This has the advantage of guiding users to pick from available and recognized items when identifying their documents. A fileplan is a specialized form of taxonomy that provides a representation of the business or filing structure that documents relate to, and also provides some extra notation to assist in efficient filing and retrieval (the filecode).

A thesaurus is a specialized way of representing a taxonomy. It is used to add identification metadata to documents along a specific classification dimension - it is limited to a specific domain or topic, and is not intended to fully define the record.

A language thesaurus organizes the English language vocabulary and defines relationships between 'literary' words within it. A records management thesaurus focuses on a specific domain or type of activity (rather than the whole language), laying out a set of acceptable words or keywords that make up the vocabulary, and defining the relationships between them. A typical way of doing this is to provide a tree of words, starting with the most general or broader terms within the topic and working towards the most tightly defined or narrower terms. The aim of the thesaurus is to ensure consistency of use of the keywords, so additional descriptions and scope notes are provided to help elaborate and reduce the chance of different people interpreting words differently. Within this hierarchy, there can also be relationships that cut across branches to show related terms.
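To make the structure concrete, here is a minimal sketch of a thesaurus as a tree of broader and narrower terms, with scope notes and cross-branch related terms. The class, its method names and the sample terms are all invented for illustration; they are not taken from any real thesaurus or product.

```python
class Thesaurus:
    """A tiny controlled vocabulary: each term has at most one broader term."""

    def __init__(self):
        self.broader = {}      # term -> its broader (parent) term, or None
        self.scope_notes = {}  # term -> note explaining intended use
        self.related = {}      # term -> set of related terms in other branches

    def add_term(self, term, broader=None, scope_note=""):
        if broader is not None and broader not in self.broader:
            raise KeyError(f"Unknown broader term: {broader!r}")
        self.broader[term] = broader
        self.scope_notes[term] = scope_note
        self.related.setdefault(term, set())

    def relate(self, a, b):
        """Record a 'related term' link that cuts across the hierarchy."""
        self.related[a].add(b)
        self.related[b].add(a)

    def narrower(self, term):
        """Terms defined directly beneath the given term."""
        return sorted(t for t, b in self.broader.items() if b == term)

    def path(self, term):
        """Chain of terms from the most general down to the given term."""
        chain = []
        while term is not None:
            chain.append(term)
            term = self.broader[term]
        return list(reversed(chain))

    def validate(self, keywords):
        """Return the keywords that are NOT part of the vocabulary."""
        return [k for k in keywords if k not in self.broader]

# Invented sample vocabulary.
t = Thesaurus()
t.add_term("Personnel", scope_note="Managing employees of the organization.")
t.add_term("Recruitment", broader="Personnel")
t.add_term("Interviews", broader="Recruitment")
t.add_term("Finance")
t.add_term("Payroll", broader="Finance")
t.relate("Payroll", "Personnel")
```

With this in place, `t.path("Interviews")` walks from the broadest term down to the narrowest, and `t.validate(...)` is the kind of check that keeps keywords consistent.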

The Keywords AAA thesaurus is a well recognized example from Australia, used in New South Wales government record keeping. NAICS is a scheme that defines standard industries for commercial or employment classification, which many people will be familiar with when classifying themselves. Both of these schemes provide identifications for things within their specific domain and therefore do not usually fully describe the thing they are attached to.

In an EDRMS a thesaurus is a tool to help users pick the correct keywords to apply to an item of metadata for a record. Dependent on the definition of the metadata attribute, one or multiple keywords may be selected, to fully identify the meaning. Thesaurus keywords may be used alongside any other metadata to classify records, so metadata from multiple thesauri may represent the classification of a single record in the multiple domains of its use. Alternatively, a thesaurus can be used alongside more straightforward index metadata. The thesaurus really just provides a tool to help guide records managers to provide the most consistent and exact information to classify a record for a single item of metadata.

Summary

When used in records classification, a thesaurus is a tool that enables a specific item of metadata to be set with the most tightly defined term or set of terms available within the vocabulary. It can provide a tight definition of the records within the constraints of the recognized terms, a specific domain's taxonomy.

Typically a thesaurus is not used alone and much like a fileplan is just another way of more accurately identifying records for storage and effective retrieval. It is a tool to help users pick the correct keywords to apply to an item of metadata for a record. This enables the document to be 'tagged' with keywords from a well defined vocabulary. This is similar to the category tags used in Wordpress and other blogs, so I feel I'm getting close to my original aim. In the next post I will round up all of the classification schemes I have described, and try and relate them to the use of tagging on the Web.

Tags: thesaurus metadata EDRMS tag

Wednesday, September 27, 2006

Converging classification schemes of documents and records - Part 3

Over the last couple of days I have started my comparison of document classification schemes, with the eventual aim of understanding how the freeform user tagging offered by Internet apps, like del.icio.us, Technorati, Flickr and others can be applied to corporate documents and records management.

So far, I've addressed structured metadata indexing / titling and at the opposite end of the spectrum full-text indexing with zero user classification requirements.

Metadata indexing typically requires manual interaction to relate the content of the document with metadata attributes. It provides the most accurate and repeatable approach to identifying documents for a very limited scope of use, typically a business unit and maybe even a specialized business process.

Full-text indexing on the other hand requires no effort to identify the documents, allowing users to search on any words that may appear inside the content of the document. Since this can target a large range of use of the documents, full-text indexing may not provide the accuracy required in repeatable business processes (the example I've been using is insurance claims), but can find information that may have been difficult to find through formal classification. Even for Knowledge Management (KM) use cases, full-text search alone may be too broad to separate documents based on the required context. In this case, search needs to be limited to a broad bucket of documents that are known to be the required context.

To satisfy the 'big bucket' requirement there is a high level classification scheme that is recognizable to all of us - the hierarchical folder structure or directory tree, and its formal equivalent, the file plan. This provides at least some structure to otherwise unclassified documents, and when supplemented with full-text search can provide contextual classification that is granular enough to provide meaningful search results.

Tags: full text indexing titling

User defined filing hierarchies

Filing hierarchies even in their simple PC folder structure form are powerful ways of keeping large amounts of data and documents in context with little other classification apart from a filename. Most novice computer users learn rapidly that the Windows desktop alone is no better for storing documents than a pile of paper in a desk drawer, and will start to experiment with different subfolders that may split their documents up.

Collaborative tools that focus on the management and sharing of documents often take this format as well, in the form of workspaces for teams, projects, events, etc. By segregating documents into appropriate workspaces some context has been applied that allows a user searching for a document to know what it is related to. Further sub-foldering offers additional context to documents allowing browse or search.

This approach has become so pervasive to users that organizations buying a document management system (DMS) for collaboration will often insist on the familiar Windows Explorer lookalike UI for navigation of the DMS. WebDAV is a technology to enable this, and MS Windows has finally started to support it effectively, allowing Windows and Mac users to work in their most familiar 'explorer-type' navigation environment while looking at the same DMS, browsing its hierarchy of workspaces and folders.

Tags: WebDAV Collaboration

Taxonomy, Fileplan

A Fileplan is really a form of Taxonomy - a mechanism for classifying things. Typically the classification scheme is hierarchical, although it doesn't have to be. Certainly within an Electronic Document and Records Management System (EDRMS) a hierarchical fileplan is almost always the case.

So what makes a taxonomy for EDRMS different from a folder structure on a network filesystem, accessed through Windows Explorer? Really it is that a taxonomy is a pre-defined, formal hierarchy, rather than a user defined hierarchy. A taxonomy has been predefined ahead of time as a rigid classification structure, providing a well understood hierarchy for managing documents and records. A familiar type of taxonomy is the Web directory managed by the DMOZ Open Directory Project, and used by Yahoo, Google and others.

The traditional fileplan used by records management systems is really a taxonomy for records in an organization, typically representing the structure of the business or activities that creates the records. It is designed to ensure:

Consistent filing
Efficient retrieval
Effective management of retention
Defined approach to documenting the mechanism

The fileplan may be navigated through its named folders / containers or using a filecode, effectively a shorthand code for addressing each folder and representing its position in the hierarchy. As demonstrated by this EPA example, this is the stuff of traditional filing structures dealing with massive archives of paper and other items.
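As a rough illustration of how a filecode addresses a folder by its position in the hierarchy, the sketch below resolves a dotted code to a folder path. The fileplan contents, the dotted notation and the function name are assumptions invented for this example; real fileplans, including the EPA one, use their own coding conventions.

```python
# Hypothetical fileplan: each level maps a code segment to
# (folder name, child folders).
FILEPLAN = {
    "1": ("Administration", {
        "1": ("Facilities", {}),
        "2": ("Travel", {}),
    }),
    "2": ("Claims", {
        "1": ("Motor", {
            "1": ("Open claims", {}),
            "2": ("Closed claims", {}),
        }),
        "2": ("Property", {}),
    }),
}

def resolve_filecode(filecode, plan=FILEPLAN):
    """Translate a dotted filecode such as '2.1.2' into its folder path."""
    names = []
    level = plan
    for part in filecode.split("."):
        if part not in level:
            raise KeyError(f"No folder for filecode {filecode!r}")
        name, level = level[part]
        names.append(name)
    return " / ".join(names)

path = resolve_filecode("2.1.2")
```

Each segment of the code selects one folder at one level, so the code is both a shorthand address and a statement of where the folder sits in the plan.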

In an EDRMS the fileplan may represent folders for holding documents, and as such becomes a high level point for defining access and usage permissions, business specific metadata and records retention policies. This is another reason why a fileplan is a formally defined structure, since the enforceable retention and eventual disposition of vast numbers of records is at stake.

Tags: fileplan taxonomy

Using a fileplan

Fileplans provide big buckets for classifying documents, in a well understood, consistent manner. This makes them ideal for records management, since they define a structure for documents to be filed so that they can be found again long after the original creator or owner has left the building. In an EDRMS, coupling a fileplan with full text indexing provides a powerful mechanism for identifying and finding documents in context.

Fileplans though are typically just a component of a broad set of record metadata used to describe each document. In an EDRMS the fileplan container for a record is typically just another metadata attribute in a record index, that highly structured set of metadata describing documents, which users actively avoid using. And due to the consistency and accuracy required by records managers in the filing of documents for long term retention, regular business users are not often allowed the ability to directly classify documents.

Much of the metadata that is found in records management systems comes from government requirements that have built up formal records management systems over many years (long before EDRMS). This has resulted in standards like Dublin Core, and is familiar in the US as components of DoD 5015.2.

Summary

A fileplan is different from regular storage folder hierarchies because it is a formally defined structure. It still provides buckets for throwing in large numbers of documents, so when used in conjunction with full-text indexing the context and consistency it offers can be extremely powerful. Despite this, traditional EDRMSs require significant amounts of metadata to be applied to records in addition to the fileplan, to ensure their complete identification. This metadata is something that users do not often complete effectively, since it is seen as being cumbersome and inflexible.

The pre-defined nature of a fileplan or other taxonomy can be useful to users that must file documents, since it takes much of the guesswork out of classification. Inexperienced users do not need to generate their own filing systems through trial and error. Based on this experience, other taxonomic approaches are used in records management, such as the controlled vocabulary or thesaurus. These approaches, in combination with Dublin Core type metadata are starting to converge on the freeform user tagging and blog categories we see on the Internet. The next post will start to address this convergence.

Tuesday, September 26, 2006

Converging classification schemes of documents and records - Part 2

In my post yesterday I introduced my aim to compare the formal classification schemes offered by traditional Document and Records Management with the more freeform user tagging technique used by a range of Internet apps, like del.icio.us, Technorati, Flickr and others. I'm doing this because I'm keen to work out if tagging can meet the requirements of corporate Records Managers, while being more acceptable to users than complex and clumsy formal classification schemes.

Yesterday I introduced the structured classification approach of indexing / titling documents, laying out why this still has its place in business process oriented document management, despite being perceived as clumsy. In this post I'm going to talk about Full Text Indexing, and its use in corporate records management as the 'anti-title', requiring no effort from the user.

Full Text Indexing - "anti-titling"

As businesses have shifted to more electronic content, the ability for search mechanisms to read the content of a greater number of documents and index them for full-text search has increased. Many people would look at this and suggest that the old structured indexing of documents that was necessary when using document imaging (scanning) is outdated and unnecessary. After all, a user could now enter a search term that would return all documents that have that text in their content, much as they are familiar doing with Google. Why duplicate what is accessible in the content of a document with attributes in a database?
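For readers who have not looked inside a search engine, the core of full text indexing is an inverted index: a map from each word to the documents that contain it. The following is a deliberately simplified sketch with invented documents and function names; real engines add ranking, stemming and much more.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map every word to the set of document ids whose content contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented sample documents; note that no user supplied any metadata.
documents = {
    "memo-1": "Claim received for storm damage to roof",
    "email-7": "Lunch on Friday?",
    "report-3": "Quarterly storm claim statistics",
}
index = build_index(documents)
hits = search(index, "storm claim")
```

The index is built entirely from content, which is exactly why it needs no effort from the user and why it cannot tell a claim document from a report that merely mentions claims.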

Full text indexing of electronic documents (and scanned and ICR documents for that matter) works well where there is a mass of documents that:

Do not fall into a formal business indexing structure
Have content that is self-describing
Are held in context through links they have to one another.

The first two of these are the justification for email archiving, to keep the mass of emails that goes through employees' inboxes searchable. The emails are all about their content, so full text search is a reasonable approach to finding them. I use Google Desktop all the time for my own emails.

The final type is the current state of corporate websites and intranets, where pages link to one another through hyperlinks, enabling a user to understand the context of a page from where it links to or is linked from. This isn't the stuff around the Steve Gillmor argument that hyperlinks are dead. This is more about the way that an organization structures the relationships between documents (as pages) that inherently provide some classification above and beyond the pure content.

Tags: email archive Google hyperlinks full text indexing

Full Text Indexing and repeatable business processes

Using full-text indexing alone does not work particularly well for structured business files like the insurance claim described in the last post. In the example, all of the documents in the claim must be kept together and have some basic title information, to ensure a full contextual view of the transaction taking place, making a claims manager's work efficient and accurate.

The vague algorithms that control full text search may not allow control over the presentation and completeness of the claim file to the extent that is possible with structured indexing. This leads to the type of filing that equates to taking every document and stuffing it straight into a big box - usually the claims manager will find what she is looking for after rifling through the paperwork, and sometimes a document creeps in that was in fact not related to the box of paper.

Where full text indexing works well is for unstructured business documents, where being roughly classified into the big boxes is not really a disadvantage. In my experience, these documents are not part of a repeatable or structured business process, instead being 'work in progress' documents authored across the organization. Collaborative tools enable users to file documents with minimal identifying attributes in workspaces that may have little or no structure. The tools enable documents to be found through a combination of full text search and the user knowing the context of the collaboration workspace they are looking for. More rapid access to documents is enabled by passing URLs in emails or IMs and creating browser bookmarks.

Tags: collaboration

Federated and enterprise search

Most documents in corporations can be found through full text indexing / search, although typically within confined applications. Full federated or enterprise search provides mechanisms for enabling a user to search across all of the organization's content stores from a single point. Google's search appliance is one approach, Autonomy provides enterprise search with a range of connectors to third party repositories, and Vignette enhances this with further enterprise search capabilities within the context of the corporate intranet portal.

Full enterprise search has not been implemented in many organizations, and its effectiveness has yet to be proven. It is certainly valuable to support legal discovery requirements, where a broad set of documents across the enterprise have to be found quickly, independent of the applications they were created in or the repositories where they are sitting. My feeling on enterprise search is that it may lead to the lowest common denominator of search capabilities being presented to the user (pure full text search), since attributes within specific systems may not be easily extracted and used. But given the range of disparate data that must be searched that may be the only approach.

Tags: enterprise search

Summary

Full text indexing / search is essential to organizations requiring certain types of documents to be made searchable. Documents that are not part of repeatable, structured business processes, especially those in email and collaboration tools are ideal candidates for full text indexing. This is because users typically want to spend little time thinking how to identify their documents going forward.

As organizations attempt to apply collaborative tools to Knowledge Management (KM) they may find that completely freeform storage and search is not enough to enable them to find their documents within the mass of information that is floating about. Much like Records Management, valuable documents require a little more classification to ensure that they can be found.

In the following post I'm going to look at a high level classification scheme that is recognizable to all of us - the hierarchical folder structure or directory tree, and its formal equivalent, the file plan. This provides at least some structure to otherwise unclassified documents.

Monday, September 25, 2006

Converging classification schemes of documents and records

Matt McAlister blogged about Challenging why (and how) people tag things (brought to my attention by Colin on the Bankwatch blog). Based on my current exploration of exciting Records Management topics, I want to compare the formal classification schemes offered by traditional Document and Records Management with the more freeform user tagging technique used by a range of Internet apps, like del.icio.us, Technorati, Flickr and others. I'm also keen to work out if tagging can meet the requirements of corporate Records Managers, while being more acceptable to users than complex and clumsy formal classification schemes.

Over the next few posts I'd like to approach this comparison by first laying out how I see the current state of Document and Records Management classification, before diving into the new world of tagging.

Structured classification of records - Indexing / Titling

Traditional Electronic Document and Records Management Systems (EDRMS) use a range of well structured and generally enforceable mechanisms for classifying the unstructured data in documents so that they can later be rapidly searched and identified. Document 'titles' or 'indexes' are the metadata attached to the content of a record, the identifying data that describes succinctly and accurately what the content of the record holds. In addition, the metadata may contain other information about the record beyond being a direct description of its content, like the author, date of storage, the record's owner, and the policies around how to retain it. Document metadata is typically held in relational database tables with each field representing a metadata attribute.

Tags: indexing titling

Records are typically classified as being a specific type, which usually describes the business use or organizational filing requirements for the document. It is this classification that sets the exact set of metadata attributes that must be filled in for the document. Dependent on how it is envisaged that the record will need to be searched and identified in the future, a range of attributes will be required. Using an insurance claims business unit as an example, the classification requirements for a claim form may be:

Claim number
Submission date
Claim type
Policy number
Claim value
Claimant name

Finding documents relating to a specific claim becomes simple, based on a quick query of the document database. So does having an application automatically present a claim file when viewing associated data in a claims business system or, even better, a BPM implemented claims process.

Tags: BPM database

By having a well defined set of attributes entered for the document, traditional relational database queries can be utilized to search for documents, without resorting to looking at their content. This SQL-based search was essential in document management systems that were effectively providing an index of paper documents. Despite the fact that the paper was typically scanned to produce electronic images of the content, the meaningful text content of the images was not extracted for search.
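As a small sketch of what such a query might look like, the example below stores the claim form attributes listed earlier in a relational table and retrieves a claim file by claim number. The table name, column names, file paths and sample rows are all invented for illustration; a real document management system would have its own schema.

```python
import sqlite3

# Hypothetical schema: one row of index metadata per scanned document.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE claim_documents (
        doc_id          INTEGER PRIMARY KEY,
        claim_number    TEXT NOT NULL,
        submission_date TEXT NOT NULL,   -- ISO 8601 date
        claim_type      TEXT NOT NULL,
        policy_number   TEXT NOT NULL,
        claim_value     REAL,
        claimant_name   TEXT NOT NULL,
        image_path      TEXT NOT NULL    -- location of the scanned image
    )
    """
)

# Invented sample data.
rows = [
    ("CL-1001", "2006-09-01", "Motor", "PN-778", 1200.0, "A. Example", "/scans/0001.tif"),
    ("CL-1001", "2006-09-04", "Motor", "PN-778", 1200.0, "A. Example", "/scans/0002.tif"),
    ("CL-2002", "2006-09-02", "Property", "PN-912", 560.0, "B. Sample", "/scans/0003.tif"),
]
conn.executemany(
    "INSERT INTO claim_documents "
    "(claim_number, submission_date, claim_type, policy_number, "
    "claim_value, claimant_name, image_path) VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)

def find_claim_file(conn, claim_number):
    """Return the image paths for one claim, oldest document first."""
    cursor = conn.execute(
        "SELECT image_path FROM claim_documents "
        "WHERE claim_number = ? ORDER BY submission_date",
        (claim_number,),
    )
    return [row[0] for row in cursor]

claim_file = find_claim_file(conn, "CL-1001")
```

Nothing in the query touches the content of the images; the whole claim file is assembled from the index attributes alone.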

Tags: SQL imaging

Structured indexing of documents is fundamental to the way document management systems organize and search documents. It follows the original intention, to formalize the storage and use of documents, making this unstructured data efficient within business processes. Records Management has different requirements around the long term retention, discovery and destruction of documents that need different structured classification methods.

Any form of structured classification can seem clumsy and overburdening to a business user that otherwise has a 'real' job to do. The next posts will highlight the features of Records classification and start to introduce how documents and records metadata converge, while still requiring something more to be usable in many scenarios.

Tags: SQL Records Management

Friday, September 22, 2006

Are blogs and wikis records?

In my post a couple of days back I suggested a list of records management topics I was thinking of covering in an exciting ;) records management presentation. Standard disclaimer: there is no original thinking in here, just a regurgitation of public ideas, none of which should be considered a substitute for good legal counsel.

The topic from the list I have selected today is: Are blogs and wikis records?

To start with I need to define what a 'record' is. There are a range of definitions out there, so I'm going with one from a pamphlet on records, by ARMA, typically a good authority on Records Management:

A record is recorded information that supports the activity of the business or organization that created it. It can take the form of
paper documents such as a hand-written memo or a hardcopy
report
electronic records such as databases or e-mail
graphic images such as drawings or maps; these may be in
photographic, electronic, or hard-copy formats

The ARMA pamphlet has an interesting take on records that I haven't often seen elsewhere; the importance of the document preparation and its content. Many sources that focus on records are more interested in their retention and eventual destruction, which is also important but does not actually ensure that you are retaining information that is actually 'record-worthy'.

The ARMA pamphlet has an interesting section, "How do I write for the record?", which has some standard rules for writing business documents:

The creation of or writing for the record begins the life cycle for recorded information. The purpose for writing is to:
communicate information for use immediately
transfer or convey information for reference
document or record an event, decision, or process
When writing, ask yourself these questions:
Am I the right person to author this?
Would I cringe if my mother read it?
Would I be embarrassed if it were published in a newspaper or put on a bulletin board?
Would I be comfortable if senior management read it?
Do I have any hesitation signing my name to it?

I think many people would agree that these rules should also be applied to professional blogs and wikis. Maybe the author will apply them with some latitude, since he or she wants people to read something interesting and not just another dull business document.

Since a professional blog hosted within a corporate website will feel the need to follow these writing rules, this implies that the content being published is important enough to represent the views of the company and something that customers might act on (disclaimers accepted). This in itself puts the blog posts in the realm of business records, that should be appropriately written, retained and destroyed according to the records policies of the company.

Wikis need to be treated more stringently than blogs. Since the point of a wiki is typically to convey information in a form that appears reasonably authoritative, the apparent value of the information presented is further inflated. The fact that anyone (within the bounds of your organization) can potentially edit the information again makes the first half of the lifecycle uncontrolled and therefore uncomfortable for companies.

Companies are still adapting to how they deal with the seemingly uncontrolled authorship of content, the first stage in the records lifecycle. Standard records management policies can still be applied to published information while they are deciding, handling the second half of the records lifecycle, and providing an essential historical record of information, enabling the company to react and respond to questions down the line.

Control

While it sounds like I'm proposing professional blogs and wikis should be highly controlled, I don't believe that their underlying value as collaborative and distributed authoring and publishing tools should be diluted. I'm not proposing formalized review and approval processes. It is important though, for the protection of everybody, that companies publish concise policies on the use and content of these collaborative tools. The more restrictive the policies, the less value the tools will likely have, since they will be relegated to being a new view onto an outdated information publishing policy.

From a control perspective I am proposing that corporately hosted blogs and wikis should be treated as formal business records at the point of publishing. Every post or wiki entry should be captured and stored on publishing or subsequent editing, complete with details of the user performing the actions. This gives companies the audit history needed to respond to questions about information that was made publicly available on their site.
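
To make the capture idea concrete, here is a minimal sketch in Python of an append-only revision log. The class and field names are hypothetical, the log lives in memory purely for illustration, and a real deployment would hand each captured revision to the records management system rather than keep it in a list:

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One captured version of a post or wiki entry (hypothetical structure)."""
    page_id: str
    revision: int
    user: str
    action: str            # e.g. "publish" or "edit"
    captured_at: datetime
    content: str
    sha256: str            # fingerprint of the content as captured


class RevisionLog:
    """Append-only log: revisions are added, never changed or removed."""

    def __init__(self):
        self._revisions = []

    def capture(self, page_id, user, action, content):
        # Number revisions per page, starting at 1.
        number = 1 + sum(1 for r in self._revisions if r.page_id == page_id)
        entry = Revision(
            page_id=page_id,
            revision=number,
            user=user,
            action=action,
            captured_at=datetime.now(timezone.utc),
            content=content,
            sha256=hashlib.sha256(content.encode("utf-8")).hexdigest(),
        )
        self._revisions.append(entry)
        return entry

    def history(self, page_id):
        """Return every captured revision of a page, in capture order."""
        return [r for r in self._revisions if r.page_id == page_id]


log = RevisionLog()
log.capture("post-1", "alice", "publish", "Hello")
log.capture("post-1", "bob", "edit", "Hello world")
for r in log.history("post-1"):
    print(r.revision, r.user, r.action, r.sha256[:12])
```

Storing the full content plus the acting user for every publish and edit is what provides the audit history; the hash is only a convenience for checking later that a stored revision has not been altered.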

Summary

Corporate blogs and wikis are tools that are valuable for communicating with a company's customers and prospects. Given that value, companies must treat these tools as another source of records. Although the authorship policies around the first half of the record lifecycle are being developed, the traditional second half of the lifecycle including retention and disposition should still be applied. This may require organizations to look at how they can integrate their current tools into a records management system, or select from the more corporately focused tools that are coming on to the market.

Technorati tags: Collaboration blogs wikis Records Management

Thursday, September 21, 2006

Persuading users to store records - carrot and stick approaches

In my post yesterday I suggested a list of records management topics I could cover in a presentation, to try and spice things up a little. As ever, there is no original thinking in here, just a regurgitation of commonly held ideas.

The topic I have selected to think about first is: Persuading users to store records - carrot and stick approaches.

One of the issues that organizations face with introducing records management is that business users are not typically good about consistently capturing document records received or produced in their day to day activities. Especially since the advent of email, users typically are not good at spending the time to identify and file important documents in a records repository. For organizations looking to invest in records management this is a key challenge that could affect the success or failure of their project. It may lead others to avoid RM altogether, instead plumping for the brute force but highly ineffective approach of email archiving.

In my view there are two approaches to enabling and persuading users to save records. On the side of persuasion (the 'stick'):

Encourage use of IM & phone for non-record communication (reducing cluttered email inboxes and the need to identify records within the mess)
Reduce size of email inbox and prevent local archiving (ensuring that email inboxes are kept small and manageable)
Enforce a store-it-or-lose-it policy (periodically clear out emails and files over a certain age)
Limit local storage (helping ensure that important documents make it to the repository)

Some of these approaches could be considered inflexible or even draconian, and could well backfire leading to nothing being stored. So as a balance, the way to encourage users to store records (the 'carrot'):

Educate users in the advantages of records management to them personally (automatic backup, enhanced document management capabilities, reduced effort over time)
Educate users in the advantages of records management to the business (reduced risk, less litigation, more money for bonuses)
Integrate business and office applications, making their use of records easier or even invisible to them
Use collaborative tools that enable more effective team working, which also integrate seamlessly with the records management system (users want to work collaboratively, and seamless integration hides the effort of records management)

Both approaches could help users actually handle records effectively. In reality, without some significant work users are unlikely to change their habits, so providing invisible and seamless means for capturing and classifying records within appealing applications they want to use has got to be the way to go.

Technorati tags: EDRMS Email Records Management Collaboration

Wednesday, September 20, 2006

Records Management - making it interesting

Records Management is always an enthralling topic (you can't hear the sarcasm in my voice). And I have the pleasure of presenting two sessions on the subject at Vignette's annual user event - Vignette Village.

Beyond the usual spiel of "it's really important to keep your company's documents secure" and "not destroying documents when you should could lead to discovery of smoking-guns", I need to present some fresh ideas (or at least recycle the same ones differently) to help make the 45 minutes go a little faster for the audience.

Here are some of the major topics I'm thinking of covering. If anyone has any more exciting topics I could cover, or any thoughts on any of these, feel free to let me know.

What is a record anyway?
Retaining records - damned if you do, damned if you don't.
Why not just store everything and destroy it all after 7 years?
Persuading users to store records - carrot and stick approaches
How the Sarbanes-Oxley Act made the photocopier salesman happy and the Records Manager a nervous wreck
Migrating from FileNet before IBM gets to your CIO.
So records managers are like glorified librarians, right?
Classifying, indexing or titling documents, what's in a name?
Why broker/dealers don't need records management for NASD compliance.
Just retaining records isn't good enough.
Why BPM/workflow is essential to records management.
Migrating from Hummingbird before Open Text gets to your CIO.
Are blogs and wikis records?
Intelligent documents provide process control outside the firewall, but tie you to a single vendor.
BPM processes are records too.
Electronic Document and Records Management (EDRMS) is not the whole story.
Andersen/Enron didn't get us records management laws - don't let vendors tell you corporate compliance is about RM.
Keeping records of a corporate web-site. Tough, but possible.
Digital Rights Management on records - are the risks worth the rewards?
Sharepoint provides records management for virtually free. So why are you listening to me?

And so I could go on. I'm going to pick up some of these topics to present, and as I'm building out my thoughts I'm going to blog about some as well. So be warned!

Technorati tags: Records Management Vignette EDRMS

Tuesday, September 19, 2006

Please excuse our appearance - mashups gone wrong!

It seems that I prematurely blamed Blogger for display problems - a holdover of being a software consultant maybe! It seems that my recent addition of a Technorati tag cloud consumed all of my hosting bandwidth in the background, preventing access to stylesheets, images and all of the other fancy stuff that makes my blog so slow to load! The tag cloud has been disabled from updating automatically for now, while I try and work out how to get some more bandwidth from my hosting account and fix an underlying problem with the code.

This has been a demonstration of abuse of the Web 2.0 approach - despite the simplicity of being able to add new features and components to a web app, it is essential that the underlying components are well tested, trusted and generally likely to play well with others. I made the inexcusable mistake of taking a piece of code that appeared to work and trusting that it would be fine in my environment. Web 2.0 still depends on a strong software discipline for components, even if they can be mashed-up into web pages with relative simplicity and little effort.

Hopefully this blog will return to regular service soon. Blogger is down for scheduled maintenance at 4pm this afternoon (presumably PST, but they don't make that very clear), presumably to fix some problems they had a couple of days ago with their templates. So expect that there will be some problems that I can actually blame them for later today, so they can hold my premature doubting of their quality of service in credit.

The presentation components of Blogger (blogblog.com) are down - again. If you are coming in directly to the website rather than a feed reader, I apologize if the default style is hard to read. Hopefully Blogger will quickly sort out its recent extremely poor reliability so that everything can return to its normal state of unreliability.

Monday, September 18, 2006

Building the Organization of the future

James Taylor has been looking at Using decisioning to build the bank of the future. In it he picks up on several key themes that I agree are important. I would summarize them as:

Customer identification

Always identify the customer
Make the interaction experience more personal and appropriate
Do not ask the customer to re-identify themselves for every activity requested

Cross-channel consistency

Ensure that a customer's interactions follow the same preferences and approach across all the contact points they have with the bank
Enable services and transactions selected in one channel to be used across others

Personalization

Ability for a customer to select preferences for the way interactions work
Provide themes that drive how the activities are presented and run (e.g. fast, step-by-step, most used options, etc) - personalization isn't just about choosing the color of your website
Ensure preferences are remembered and used across all channels

Individualization

Offer the customer options in the way they interact based on observed behavior

Cross/up-sell of services

Predictive selection of offers that are most appropriate to the customer and not likely to be an annoyance
Target offers to the channel - a customer making a fast cash withdrawal from an ATM probably does not want to be hassled with other offers

In his post, James lists the key channels as being:

ATM
Call center/Interactive Voice Response (IVR)
Website
Branch
Monthly statement

I hadn't thought about the last one (as I only get my statement online), but it is an important point of customer interaction. In addition, I would like to extend the channels to include 'mass marketing' - the postal and phone based marketing that is pushed to customers and can be an annoyance alongside an otherwise acceptable service. Especially the use of postal mass-mailings when a customer has selected to have all other statements and communications delivered online (personal rant complete!).

The rules that James has highlighted are not really bank specific, and could be easily applied to the "Wireless or Mobile Phone Service of the Future", the "Government of the Future", and the "Utility Company of the Future". Achieving the goal of "Organization of the Future" requires a combination of technologies, processes, identity management, integration and analytics, all of which exist today. What is needed are some smart people to push the business drivers, building teams to perform the process design, the technology implementations and the systems integrations. The Organization of the Future is a possibility, it just needs the Organization to realize the value.

Technorati tags: Financial Services Technology EDM banking user experience

Friday, September 15, 2006

DRM hacked again - is it useless for protecting corporate documents?

The speed with which hackers broke the Digital Rights Management (DRM) from Microsoft and Apple last week, was commented on by David Berlind on ZDNet - 24 hours: The time it takes to crack the newest DRM from Microsoft or Apple.

This article was timely for me, as I was reading it just after coming off a conference call with Brad Beutlich, Director of Business Development for Safenet, a provider of encryption applications and devices for software and media. His comment on the announcement was something along these lines:

Perfect DRM protection is like a waterproof watch - it doesn't exist

He wasn't surprised that either Microsoft or Apple had their protection systems hacked, citing a host of technical reasons why this is possible. A significant one is that the OS and the hardware it runs on must both provide facilities to secure the encryption keys used to unlock protected media - standard OS and hardware make it too easy for a determined hacker to read the required information and incorporate it into a separate unlock utility. It seems unlikely that in the near future the PC hardware will provide this capability as standard, so DRM protection is likely to continue to be a focus of hackers' attention.

Now I'm not so worried about the DRM schemes that protect Windows media and iTunes (at least not when wearing my business hat). My previous posts, DRM gets lardy and Web technology for electronic records worry about the use of DRM for protection of vital business documents, both inside and outside of the corporate firewall. Both Stellent and EMC claim that they have coupled their recently acquired DRM technology with Records Management:

EMC recently announced that it was pairing its Records Manager with DRM technology acquired from Authentica, enabling records managers to enforce their policies for all records, independent of custody. In principle this seems like a great pairing, but there are some issues to be addressed[...]

It is in the corporate world that DRM could offer significant value, though there are issues related to use and lock-in with proprietary DRM solutions as I mention in my previous posts. Documents that need encryption are likely to be most valuable to a malicious hacker with the aim of damaging a corporate reputation. Given the recent news that DRM can be hacked easily, it seems it can only really protect against misuse of documents by casual users. A determined hacker that had acquired corporate documents through fraudulent means could probably hack a utility to unencrypt them (or find one on the Internet) in a short amount of time. The DRM protection of the documents would be worthless for the most valuable use case.

Safenet claims a strong security background, going far beyond that of consumer targeted DRM. Amongst other things, they can provide hardware devices and tokens that hold encryption keys, preventing a major hacking loophole in consumer systems. Although this may limit flexibility of document usage, targeting the deployment of the technology to appropriate users in an organization may be considered worthwhile in highly sensitive environments.

When identifying a need for DRM for corporate assets, ensure that a vendor that provides it is not just repackaging a consumer solution - Windows Media Player and iTunes are not the gold-standard! Stronger hardware-reliant technologies such as Safenet come at a price - both flexibility and dollars. The question is whether IT managers and legal counsel can balance the risks of document misuse against the issues and costs of corporate DRM.

Technorati tags: DRM Records Management Microsoft Apple

Thursday, September 14, 2006

Offering third-party content & services may save traditional businesses

Recently I have been immersed in the world of telco, media and entertainment; all related industries for communicating different stuff. At the same time I've been digging more deeply into the telecommunication companies: wireless/mobile, wireline/landline, and VoIP. It is prime-time for new technology driving new consumer capabilities, and new market requirements driving new technology. Despite this, telcos struggle to keep up with the new possibilities and opportunities that technology offers, partly due to their traditional philosophies and business models. They, like other traditional service providers, risk losing long-term customers to new innovative companies if they don't adapt.

Telecommunication carriers have traditionally been highly conservative in the way they develop and adopt new technology, driven by an underlying philosophy requiring measurable and controllable Grade of Service (GOS) and Quality of Service (QoS). Grade of Service roughly equates to how likely you will be unable to place a call because of network congestion. Quality of Service is more about the actual quality and reliability of individual calls. As the technology for the telephony system was growing, significant engineering rigor was applied to ensure every component of the system operated effectively enough to maintain service levels. To protect those service levels, stringent control is exercised over what is plugged in to the system, and all that can pass across the wire is voice, fax and dial-up modems.
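
To put a number on Grade of Service, one classic model is the Erlang B formula, which estimates the probability that a call is blocked given the offered traffic and the number of circuits available. That choice of model is my assumption for illustration here, not something any particular carrier is confirmed to use, and the function name and parameters below are made up for this sketch:

```python
def erlang_b(offered_load_erlangs: float, circuits: int) -> float:
    """Blocking probability under the Erlang B model.

    offered_load_erlangs: average offered traffic, in erlangs
    circuits: number of circuits (lines) available to carry calls
    """
    if offered_load_erlangs < 0 or circuits < 0:
        raise ValueError("load and circuits must be non-negative")

    # With zero circuits every call attempt is blocked.
    blocking = 1.0
    # Standard recurrence: B(n) = E * B(n-1) / (n + E * B(n-1)).
    for n in range(1, circuits + 1):
        blocking = (offered_load_erlangs * blocking) / (
            n + offered_load_erlangs * blocking
        )
    return blocking


# Two erlangs of traffic offered to two circuits.
print(erlang_b(2.0, 2))
```

The recurrence avoids the large factorials in the textbook form of the formula. Adding circuits for the same offered load drives the blocking probability down, which is the engineering trade-off behind a target Grade of Service.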

Mobile/wireless carriers adopted the trusted approaches of their landline parents, to build telco networks on top of far less controllable infrastructure, while still offering a level of service generally acceptable to their customers. The network and all that plugs into it is still stringently controlled. Despite rapid increases in capabilities of mobile devices, the ability of mobile carriers to adapt and provide new services and content has lagged due to stringent requirements for quality and control over what is plugged into the network.

Based on initial network philosophy and subsequent rapid growth, the Internet is built on the concept of being an interconnected system of unreliable and largely uncontrolled networks and components. After adding the World Wide Web on top of this network infrastructure, the result is a highly organic, often disorganized mass of media, information, services and devices. All of this leads to a level of service that is largely unknown. It is only the efforts of the best datacenters and network infrastructure that enable any control of the level of service experienced by users. But it is the ability for services of low quality and reliability to be accessible to users at all that makes the Internet a breeding ground for rapid advances and disruptive technology.

Internet users love the speed of change they experience. Telcos, both wireless and wireline, are slow to respond due to their traditional business models and perceived need to control everything that is attached and everything that is put on the wire.

As telcos start to offer themselves up as an entry to a range of mobile media, services and entertainment, tech-savvy users will demand far more rapid development of new offerings. The networks have the opportunity to capitalize, by meeting this demand and being the simplest identifiable means for customers to get at new stuff. The high walls around everything that is offered by today's mobile networks will start to erode if they don't manage to keep up with the expectations of their customers, but they can't do it alone.

Ringtones, wallpapers and basic games offered by some of the networks have satisfied basic customers for a while. But fast 3G networks and off-deck (third-party) websites like Jamster make access to new content, services and applications far easier. If the networks don't keep up by making their own portals better while opening them up to more third-party services, they will start to lose an important revenue stream as customers start to identify more closely with third-party mobile content providers.

Customers start to switch their allegiances fast when they see the opportunities offered by a renegade third party service provider, threatening traditional business models and revenue streams. Wireline telcos are seeing the threat from VoIP. Mobile telcos are at the point of losing lucrative content services to easier to use third-party portals. Maybe other traditional services will start to experience that soon. The question is, how much do banks have to feel the pressure of services like PayPal and Zopa before they start producing innovative new offerings for customers?

Technorati tags: Innovation Telco VoIP Wireless

Wednesday, September 13, 2006

ATMs for buying services on the move

My trip back to the UK was a useful reminder of the different services offered by banks to their customers, compared to the US. Some banks seem to be playing with their vast network of ATMs as a channel for selling services alongside simple cash withdrawals. Given that many UK banks do not charge for withdrawals from ATMs within their broader partner networks, this may be considered an important way of improving the returns from cash machines.

The most common service I noticed was the ability to buy 'mobile phone top-ups' (Natwest provides a decent flash demo of how it works). Given the penetration of mobile phones in the UK (greater than one subscription per capita), offering services to cell phone users may make sense. Prepaid phones make up a significant percentage of the total, and buying top-up credit is something that users do regularly, so making top-ups easily available and fast to buy makes sense to the mobile networks.

From a bank point of view, ATMs do not need to be physically modified to dispense credits - all that is required is a back-end integration to talk to the common mobile networks. Compare this with Bank of America, which insists on offering me the useful, but barely utilized, service where you can buy stamps. This type of service carries a cost in keeping the 'stamp dispenser' filled and presumably needed the ATM to be modified to operate this.

ATMs are suited to providing services to customers who are 'out and about', rather than performing transactions that they could perform in the comfort of their own home in front of the PC. What other services do (or could) banks provide through their pervasive ATM channel? The University of Pennsylvania has some thoughts that don't seem to extend much beyond the obvious mini-statement and check cashing. I expect that we will continue to see big differences between the offerings in the US and Europe, where electronic / online payments, direct debits and prepaid mobiles are common and paper checks/cheques are less so.

As payment mechanisms converge across online, mobile phone, debit cards and contactless payments (see Bankwatch for many examples), the ATM and mobile phone are likely to become a central point of contact for buying services on the move. As I focus more on mobile technologies, expect some new blogs on this soon.

Technorati tags: Financial Services Technology ATM payments

Tuesday, September 12, 2006

DIY rather than IBM

Andy Mulholland blogged on CTO Blog Unavailable Soon or no two are ever the same! about the evolution of differentiating products. It made me think about examples in the enterprise software market.

In Andy's mind, the standard tool for differentiation as we head to a level of blandness in all of our products (IT, financial, consumer, etc) ends up being price. He believes that customization is a factor that can more effectively provide differentiation.

[...]In amongst the differentiation possibilities are some factors that were quoted in the first age of the Internet, the one that really didn’t work when some basic business rules were ignored. The one that really interests me is ‘customisation’, the idea that by a fully connected and interoperable business world it would be possible to deliver exactly what was wanted by connecting the consumer directly through the manufacturing or services ecosystem.

In the current age of the Internet; Web Services, SOA and Web 2.0, [...]

I agree with Andy that Web 2.0, with its focus on simplicity in aggregating web services into a single application has enabled application designers to differentiate. My blog is for example different from the CTO Blog not just because of its focus or content, but because of the components we have alongside the basic text, differentiating our approach and the way that readers interact with us. Our product is different, not just because Andy writes better stuff than me.

Looking at this from a slightly different angle, some enterprise software companies have been successful at differentiation before, by being the focus of a strong ecosystem of partners that customize their products. FileNet with its community of Value Added Resellers (VARs) had a powerful base product (that was otherwise just another bland repository/workflow offering) that was tailored by the VARs according to the needs of their market segments to provide vertically focused offerings. These were further customized by the VARs to provide the eventual 'bespoke' solution for each customer. FileNet was highly differentiated by the fact that its VARs knew their own market segment better than almost anyone out there, making the end product different than that offered by IBM for example.

To my mind, this mirrors Toyota's approach with Scion, as described by Andy. The problem now is that FileNet has been acquired by IBM. And Big Blue aren't renowned for wanting to hand over potential services engagements to other companies, despite the 'toolkit' nature of some of their enterprise software that would suit this model.

I haven't seen how FileNet's VAR network is holding up, but I would be interested to hear of any feedback from anyone 'in the know'. VAR customization was an essential part of FileNet. Can IBM stomach that reality? If not, maybe in the world of enterprise software acquisitions Web 2.0 is the only sustainable customization model - it offers DIY rather than IBM.

Technorati tags: FileNet Web2.0 customization

Travel, processes and back to blogging

After a nice little trip to London to visit friends and family I have made it back to Boston. I avoided reading too much work-related email while I was away, so now I'm playing that big catch-up game. Feeling tied to blogging during this time (both mine and my regular reads) was also a new experience, so I'm trying to skim some of the more interesting blogs I missed while I was away.

As ever, airport security, immigration and customs formalities were interesting to watch and judge from a 'process' standpoint. Heathrow always strikes me as a reflection of London as a city - the airport manages to cram a large number of impatient people into a small space and half a dozen chain pubs, while formalities try to be effective and efficient but somehow just fail to get it completely right. Logan airport similarly models Boston, with constant construction work, ineffective directions, and security and customs formalities that seem to require an ever increasing number of people and paperwork, but at least seem to work OK.

I would also love to meet the bloke that originally designed the baggage claim conveyor. Another idea in industrial efficiency, badly executed in most airports. I don't claim any abilities in handling big, physical processes like baggage handling, but I do wonder if anyone with a process or manufacturing background has really looked at our largest airports to see how this most visible process could be improved so that people will actually check in their luggage.

Similarly, I'm sure that many business process experts could look at the paper handling around customs and immigration and really improve the processes. To my untrained eye there appear to be many things that could be done, including reducing paperwork, improving forms, and making effective use of available technology, all of which could make the traveler's experience (and waiting times) far better.

As ever, travel is a great experience, with even the most familiar places and processes becoming open to new interpretation when you've been away from them for a while!

Technorati tags: BPM Security travel

Thursday, September 07, 2006

DRM gets lardy

Vacation / holiday is great! The weather is great, and the travel through Heathrow is always a nightmare, but it's still great to be 'home' for a bit. I could say the guilt of not blogging hit me. It's more that I'm staying with my folks, so escaping to the PC for an hour is a good thing now and again, so here is an item I felt I should comment on.

I almost missed a nice post at the end of last week by James Governor on MonkChips, titled: digital lard for the enterprise: DRM meets document formats. I certainly couldn't have named it better myself, as it probes some of the fatty issues around Digital Rights Management clogging up document repositories. Stellent was named as one offender with their most recent acquisition of SealedMedia. As I talked about in a little detail last week:

EMC recently announced that it was pairing its Records Manager with DRM technology acquired from Authentica, enabling records managers to enforce their policies for all records, independent of custody. In principle this seems like a great pairing, but there are some issues to be addressed[...]

Then I go on to talk about some of these issues, finishing with the most important: "Proprietary encryption and DRM typically ties an organization to that vendor for life". James puts it a little more bluntly (and effectively):

You see, if the vendor in question decides you have broken the terms of the software license, they could, in theory simply turn your documents off. The vendor, not the customer, holds DRM keys.
Of course you could argue the great majority of current documents usually require Microsoft anyway, to read them. But those documents are readable. DRM on the other hand adds a whole new level of difficulty.

And so it seems that we agree that DRM has its issues. It also has its place, so to reinforce the message James finished up:

But not all documents need that protection.

Now I have to go back to wrestling with why it is that I am looking at DRM at all, and why companies believe they need it. I'm going to be doing a little work with wireless / mobile telcos and their sale and distribution of content. It is a space well removed from the records repositories I am most familiar with, and closer to the 'end of the (consumer digital) world is nigh'.

Technorati tags: DRM Records Management

Friday, September 01, 2006

Blog hiatus and mashup

Those of you who click through from your feedreaders into these blog pages may notice that there have been a few changes.

First, I have added Trackback using Haloscan - you should be able to ping a post's trackback URL found by clicking the link (a popup) at the end of the post. The standard Blogspot links remain as well for the Blogger users.

Second, I hacked a script by Matt on the Random Thoughts blog to retrieve this blog's Technorati tag cloud (see right menu). Sort of in lieu of categories really. Since I don't have control of Blogger, I'm running the PHP for this on a hosted server, hidden away somewhere, and plugging it into Blogger as a chunk of javascript. Let me know if it's useful, or if it just fails to work - I'll save the bandwidth! Matt, nice work on the script!

Finally, you'll notice a little break in my blogging - I'm heading back to the UK for a vacation and a friend's wedding. Should be back full force in just over a week (assuming my precarious visa situation lets me back into the US). So don't run away. And if you want to make sure you catch up with my writing as soon as I return, subscribe now by clicking this icon and selecting your preferred feed reader:

Technorati tags: blog technorati mashup

Who takes responsibility for security?

Neil Maechiter put out an interesting post about biometrics and the use of multi-factor authentication. He references a post by Jerry Fishenden, Microsoft's National Technology Officer for the UK, describing how extremely secure systems should use three-factor authentication:

something you know (such as a PIN), something you have (such as a smart card) and something you are (which is, of course, where biometrics typically come in).

A couple of weeks ago I talked about how US and UK banks are approaching multi-factor authentication for access to online services and secure banking sites. In the US the FFIEC has mandated two-factor authentication, although the way that some banks are approaching this seems to require only software tricks to support two types of 'something you know'. In the UK, card readers and key fob tokens are being rolled out now to supplement the first factor of username/password, following hot on the heels of mainland European banks.
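To make the distinction concrete, here is a minimal sketch of why two kinds of 'something you know' still add up to only one factor. The credential names and the mapping below are my own illustrative assumptions, not anything taken from the FFIEC guidance or from a real bank's system; the point is simply that factors are counted by category, not by the number of challenges.

```python
from enum import Enum


class Factor(Enum):
    KNOWLEDGE = "something you know"
    POSSESSION = "something you have"
    INHERENCE = "something you are"


# Hypothetical mapping of credential types to factor categories.
# These names are illustrative assumptions, not a real banking API.
CREDENTIAL_FACTORS = {
    "password": Factor.KNOWLEDGE,
    "pin": Factor.KNOWLEDGE,
    "security_question": Factor.KNOWLEDGE,
    "smart_card": Factor.POSSESSION,
    "key_fob_token": Factor.POSSESSION,
    "fingerprint": Factor.INHERENCE,
}


def distinct_factors(credentials):
    """Return the set of factor categories covered by the credentials.

    Raises KeyError for a credential type missing from the mapping.
    """
    return {CREDENTIAL_FACTORS[c] for c in credentials}


def is_multi_factor(credentials, required=2):
    """True if the credentials span at least `required` distinct categories."""
    return len(distinct_factors(credentials)) >= required


# Two challenges, but both are 'something you know': still one factor.
print(is_multi_factor(["password", "security_question"]))

# Password plus key fob token: two genuinely different factors.
print(is_multi_factor(["password", "key_fob_token"]))
```

Under these assumptions, the password plus security question combination counts as a single factor, while the password plus key fob approach counts as two.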

The third factor, "something you are", has not been applied by banks (to my knowledge), but with the accelerating pace of biometric passports and identity cards this could soon be an option. The question is, does your online banking really need a third factor of identity to be secure? The third factor (biometrics) purely adds extra protection to ensure it really is 'me' at the computer keyboard - not someone who found my stolen key fob and an attached Post-It with my username and password written on it.

It seems to me that the bigger risk to my accounts is personal information being compromised internally, by company and agency IT staff or by insecure systems. Maybe with two-factor security providing authentication protection, the banks will start to take some responsibility for their part in the security of our accounts and information. Deflecting the blame to hackers and customers outside the organization will no longer provide enough aircover.

Technorati tags: Financial Services Technology authentication biometrics security