Thursday, August 31, 2006

Microsoft XPS - is this just a 'not invented here' development?

I have to admit that I know little about Microsoft's XML Paper Specification (XPS). Based on my research for yesterday's post about MS Office document intelligence I see that XPS is intended to be able to provide Microsoft products with device independent, repeatable rendition for viewing or printing, much like PDF/A. This means you get to see how a document would look printed, every time you view it, something a little alien to MS Office to date. Darren Strange added a quick comment to my post asking what I think of XPS, and as the response grew I realized that I should make a post out of it to really demonstrate my thoughts (based on ignorance) in this area - so Darren, this is for you! (thanks for the trigger to write something).

So far I have found it difficult to find a succinct description of what XPS is, or why it is needed. Wikipedia describes XPS like this:
The XML Paper Specification (XPS) is Microsoft's initiative to integrate document creation and viewing into its Windows operating system. ... Most notably, XPS uses the Windows Presentation Foundation, so that the methods used for rendering for display in Windows are the same as those used for rendering for print devices.

XPS is viewed as a potential competitor to Adobe's portable document format (PDF). XPS, however, is a static document format that does not include dynamic capabilities similar to those of PDF.

Maybe someone (Darren?) could comment on how accurate this description is?

If XPS really is just a static document format, I think I would struggle with the concept, though maybe there is value in something completely static for archiving. It seems that Office 2007 still has the issue that there is no way to produce document files that are viewable/printable identically in a device independent manner, so as an additional document format XPS maybe fills the repeatable rendition gap that Office has always suffered from.

As I understand it, the XPS print rendition is not a standard part of an Office 2007 file package, making Office files poor for archiving in their original form, since they can't be guaranteed to be viewed as originally intended. At the same time, XPS lacks the dynamic capabilities of PDF (presumably dynamic attributes) to be fully standalone for review and approval leading to eventual archiving.

Does Office 2007 actually insert XPS into the Office Open XML package when you 'publish' the document? That would fill the gap, since by placing a print rendition inside the Office document, an XPS enabled viewer (or even the Word editor) could show a true 'print view'. This would just need mechanisms to ensure that the versions of the Office and XPS documents were aligned, so that you know that the dynamic Office document has not been changed since the last XPS was generated.

Taking the discussion back a little, maybe I'm missing the point, but why not just adopt the ISO standard PDF/A specification (based on PDF 1.4, which is not itself XML but does carry XML-based XMP metadata), rather than creating yet another format to be supported by vendors? XPS needs more bloated print drivers and new viewers to be useful. What are these going to add to users' and developers' toolkits that helps the desktop environment move forward in leaps and bounds? Right now it just looks like Microsoft is playing the 'not invented here' card with print/display rendition documents.

I can feel the appeal of a pure XML based format for printable documents (maybe just because it's XML), but why not go for Office Open XML content combined with XSL-FO, the standard XML formatting specification that allows output to a range of current print formats (including PostScript and PDF)? Maybe it is the standalone XML-ness of XPS that appeals.

There must be a need here that the Microsoft XPS introduction page does not explain, since it gets bogged down early in technical capabilities and requirements. I'd love to know what that need is!

David Perry on the Freeform Comment blog has also picked up on the XPS/PDF thinking, relating it to Adobe's financial situation. He also refers back to the issues that Adobe has with Microsoft producing PDFs in Office 2007. Maybe one to watch again!

Technorati tags:

Wednesday, August 30, 2006

Microsoft Open XML truly enables document intelligence

A little while back I discussed embedded attributes in documents that provide intelligence to support features like semantic search, security, rights and even management of workflow outside of the firewall. Aside from the gamesmanship of Microsoft pushing its new Office Open XML format onto the standards train to compete with the ODF format of OpenOffice, this new Microsoft format is playing catch-up in its support for XML. The three key dimensions enabling document intelligence are covered: open access to content; XML document metadata storage; dynamic attribute storage for forms and process state.

The 15-year-old PDF format has had the ability to embed attributes in files since the format was first created, partially based as I understand it on Adobe's work on PostScript and some over the shoulder looks at the Aldus Tagged Image File Format (TIFF) - Aldus eventually 'merged' with Adobe in 1994. TIFF originally contained metatags to enable device independent storage and display of scanned documents, and any raster images. Initially simple tags described physical attributes of the image like width, height, resolution, color depth, compression and so on.

The TIFF specification was designed to be extensible, enabling new tags to be added to meet the requirements of emerging capabilities or individual vendors. This is exactly what intelligent documents need - a mechanism for recording any type of attributes inside the file. With PDF 1.4, Adobe recognized the need for XML support along multiple dimensions, not just to describe the document's 'physical' properties, but also to enable the document to be more easily edited outside of its primary Acrobat authoring application and to enable storage of dynamic attributes from forms. According to Classic Planet PDF:
XML support. Under pressure to extend PDF to include XML markup, Adobe has made changes in three areas. First, Acrobat forms can now be set up to capture data as tagged XML, as well as HTML and Adobe's FDF format. Second, Adobe is introducing a new metadata architecture to PDF, one based on an RDF-compliant DTD. Metadata can be attached both at the document and object level, and the DTD can be extended, opening up interesting possibilities for defining and embedding metadata other than the basic set supported in Acrobat 5. Third, Adobe has defined a way to embed structure into PDF. Called "Tagged PDF," it is a set of conventions for marking structural elements within the file.
This multi-dimensional support for XML inside the document is part of modern PDF's power and ability to support a range of security, rights and workflow capabilities offline. Microsoft on the other hand has not got to this level yet. Of course, the MS Office products have supported metadata attributes to describe the document for several versions (title, author, etc), extending this to enable custom attributes to be added more recently. The problem has been that non-Microsoft applications, including document repositories, have struggled to be able to reuse this information without accessing an application like MS Word through its OLE interface. In a Java world this requires some reading of the proprietary Office format, something error-prone and generally poorly supported. As for XML representation of the displayable content, this was still a sideline capability in the last version of Office.

In Office 2007 Microsoft will provide a common XML-based format that will be used by default for all of the Office products, Open XML. According to Microsoft:
By default, documents created in the next release of Microsoft Office products will be based on new, XML-based file formats. Distinct from the binary-based file format that has been a mainstay of past Microsoft Office releases, the new Office XML Formats are compact, robust file formats that enable better data integration between documents and back-end systems. An open, royalty-free file format specification maximizes interoperability in a heterogeneous environment, and enables any technology provider to integrate Microsoft Office documents into their solutions.
This will have the following advantages:
  • Compact file format. Documents are automatically compressed—up to 75 percent smaller in some cases.
  • Improved damaged file recovery. Modular data storage enables files to be opened even if a component within the file is damaged—a chart or table, for example.
  • Safer documents. Embedded code—for example, OLE objects or Microsoft Visual Basic for Applications code—is stored in a separate section within the file, so it can be easily identified for special processing. IT Administrators can block the documents that contain unwanted macros or controls, making documents safer for users when they are opened.
  • Easier integration. Developers have direct access to specific contents within the file, like charts, comments, and document metadata.
  • Transparency and improved information security. Documents can be shared confidentially because personally identifiable information and business sensitive information—user names, comments, tracked changes, file paths—can be easily identified and removed.
  • Compatibility. By installing a simple update, users of Microsoft Office 2000, Microsoft Office XP, and Office 2003 Editions can open, edit, and save documents in one of the Office XML Formats.

In reality, the Open XML format is a bunch of XML files that represent each component of the Office document, alongside binary objects like images, all zipped into a single package. This has obvious advantages: developers can easily get at the appropriate information for reading and editing outside of the core applications, and zip compression keeps the files small. In addition, there are none of the hassles associated with embedding binary objects in XML - images and multimedia objects can remain in their standard formats, just packaged within the zip.
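
To make the package idea concrete, here is a minimal Python sketch that opens an Office file as an ordinary zip archive and separates the XML parts from everything else. The function name summarize_package is my own invention for illustration, and treating anything ending in .xml or .rels as an XML part is a simplifying assumption rather than something taken from the specification.

```python
import zipfile


def summarize_package(package):
    """Split the part names of a zip-based package into XML parts and other parts.

    `package` may be a file path or a file-like object. The suffix test is a
    simplifying assumption made for this sketch, not a rule from the format.
    """
    with zipfile.ZipFile(package) as zf:
        # Skip directory entries, which some zip tools add explicitly.
        names = [name for name in zf.namelist() if not name.endswith("/")]
    xml_parts = sorted(n for n in names if n.endswith((".xml", ".rels")))
    other_parts = sorted(n for n in names if n not in xml_parts)
    return xml_parts, other_parts
```

Pointing this at a .docx file should show the document XML listed alongside any images in their native formats, though the exact part names depend on the document.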

This covers two of the main facets required for intelligence: open, editable content and accessible document metadata. Now Office documents can be edited and assembled outside of the MS Word, Excel, PowerPoint, etc applications, enabling automation around the format, for example for document generation from backend business data. Also, repositories can read the document metadata by just examining an XML file.
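
As a rough sketch of what 'just examining an XML file' could look like for a repository, the Python below pulls a few Dublin Core fields out of a package. It assumes the core properties part sits at docProps/core.xml, which is where Word conventionally writes it (a robust reader would locate the part through the package relationships instead), and the helper name read_core_properties is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for fields such as title and creator.
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_core_properties(package, part_name="docProps/core.xml"):
    """Return a dict of selected Dublin Core fields from an Open XML package.

    `part_name` is an assumed conventional location, not a guaranteed one.
    Fields that are absent or empty are simply left out of the result.
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read(part_name))
    props = {}
    for field in ("title", "creator", "subject"):
        element = root.find(f"{{{DC_NS}}}{field}")
        if element is not None and element.text:
            props[field] = element.text
    return props
```

No Office application or OLE interface is involved here: the standard zip and XML libraries are enough, which is the point of the open format.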

The storage of document metadata in XML within the Open XML package will initially be leveraged when storing documents to SharePoint. Users will be presented with a document information panel in Word, to enable easy capture of identifying or semantic information for the document. On storing the document to SharePoint, these attributes entered into the form in Word are read by SharePoint to update its structured 'index' metadata used for classifying and searching documents. This is a round-trip process - on removing a document from SharePoint, the document attributes stored in Open XML will be set based on the current SharePoint index metadata, ensuring that the document is up to date and can be identified offline. Due to the open format, other vendors' repositories can (and should) do the same.

Transfer of document to repository metadata is simple, powerful integration that could be useful to many organizations when tracking documents. At the same time, it presents issues when dealing with published, finalized documents - do you or don't you update the metadata in a document that should otherwise be locked down, when retrieving it from a records repository?

The final dimension of document intelligence, capture and storage of dynamic metadata, as Adobe uses in PDF for forms and workflow state, could be considered to be covered by extending this approach, coupled with Microsoft's InfoPath 2007:
Microsoft Office InfoPath 2007 is a Windows-based application for creating rich, dynamic forms that teams and organizations can use to gather, share, and reuse information—leading to improved collaboration and decision-making throughout your organization.

InfoPath is used to design the document information panel that will be presented to users in Word 2007, enabling complex forms to be used to capture document metadata. If the document is central to a workflow process, the process related attributes could also be captured here. These are transferred back to SharePoint when submitted back to the system, enabling the Microsoft workflow to deliver the document to the next user appropriate to the information entered. This is much the same as what Adobe does with LiveCycle. To complete the picture, pure InfoPath forms can be delivered by workflow, email, web or SharePoint Portal, enabling information collection outside of a document centric world. In Microsoft's world, forms and documents are separate packages that can be used independently or combined as required.

There has been much discussion over the relationship between Microsoft Open XML and ODF, and their interchange. I believe that although Microsoft sees the threat from OpenOffice, as much emphasis is being placed on getting Office to a state where it can compete for higher end users and their preference for Adobe (the threat is evident to Adobe it seems). Microsoft is finally catching up with Adobe in its ability to provide document intelligence within its Office documents, enabling offline classification, document workflow management and forms data capture. Coupled with Microsoft's ECM strategy with SharePoint and workflow, Office 2007 will be a good stepping stone to effective document and business process management in every organization over the next few years. All vendors in the space need to adapt fast to stay ahead of the game, while leveraging the power that both Microsoft and Adobe offer with their products.

Monday, August 28, 2006

Outside the firewall processes with intelligent documents

In my post last week I discussed embedded attributes in documents and how there is more and more intelligence being pushed into what were traditionally very static document files. Adobe is probably the greatest proponent of this trend, with almost every facet of their proprietary PDF format being leveraged to embed more features. Adobe LiveCycle is possibly the best example of this, offering document centric BPM centered around the Adobe PDF format.

Much of Adobe's appeal is the pervasiveness of the Adobe Reader. The bloat that some users experience with the reader is what actually enables interaction with intelligent LiveCycle forms and PDF documents. By enabling form entry, signatures, digital rights and other features dynamically in the reader, Adobe has ensured that almost every PC user has access to LiveCycle workflow without large software installations. Since workflow client installations are considered a huge burden for the IT team due to the number of users typically involved, the pre-installed application is appealing. With forms being edited directly in the reader, Adobe has provided a rich user interface that is hard to beat with competitors' browser driven offerings.

In LiveCycle, a forms designer is used to create forms (rendered as PDF or HTML), which are the primary way of representing and interacting with tasks. The forms are documents that can contain data representing the task status and audit history, and can be signed to confirm actions performed within the workflow. A form therefore is a completely self contained task that may be as simple or complex as required.

Form intelligence is central to the capabilities of LiveCycle. It enforces required fields, provides autocalculation of attributes and enables complex validation of entries. Since all of the state information about an activity can be stored within the attributes of the document, flexible delivery to users can be performed not just by the LiveCycle process engine, but email, collaborative systems and shared filesystems. Since everything is contained within a document file, the integrity of the underlying business process is ensured even as the document passes outside of the firewall and the confines of the process engine. The document security and signatures that are contained within the document and managed by the Adobe viewer enforce access control and recording of actions, even when a form is completed offline.

Since tasks may be delivered to end users by email or on the web as opposed to solely through the LiveCycle forms manager, the Adobe approach may be ideal for requests where customers are typically unknown at the outset and the availability of specific software or even reliability of connectivity to the Internet may not be assured. In the worst case, Adobe forms can generate 2D barcodes that represent all of the entered information on the form, allowing the completed form to be printed and snail-mailed. The US immigration service (INS as it used to be) used this approach for visa applications since it allowed the capture of a wet signature on the printed form.

Bruce Silver provides research papers on a range of BPMS offerings in conjunction with the BPMInstitute. His research on Adobe provides a detailed description of their LiveCycle BPMS offering (going into the full breadth of the BPMS offering), with an interesting use case early on that describes a financial services firm that uses the software for new account opening involving signatures, backend system integration and distribution outside of the firewall.

It is by embedding workflow information, security and signature intelligence into documents that Adobe can offer powerful online and offline document processes. The rich forms with the appearance of paper can be less threatening to some users and delivery through the pervasive Adobe reader enables this. Embedded intelligence in documents provides powerful outside-firewall capability for virtually any PC user, which can be a great enabler for some business processes, especially when interacting with previously unknown members of the public.

Since Adobe does not offer a content repository, organizations must rely on their current (or a new) content management system to enable management of documents through their lifecycle. The security of a document / records repository is essential to ensure that documents can be found to prove transactions at a later date, since search tools do not cater for structured search of embedded metadata at this stage and a filesystem may not be a trusted document store. With this approach it is the integration of metadata in the content management system with the embedded document attributes that can cause complexity and inaccuracy. This could be a downfall if not handled by integrations with common content management systems out of the box.

As Adobe and soon Microsoft Open XML (in Office 2007) push more intelligence into documents, the restriction of keeping workflow inside the firewall may start to go away. The complexity of 'work-in-progress' document repositories may also start to shrink. Managing records centrally throughout their lifecycle will remain though, to ensure that these intelligent documents can still be found when needed - finding themselves is one thing that standalone documents cannot do.

WCM - The web content, process, integration, services, management system

Any large company that bases its business on the web, represents its products or makes recommendations to its readers is likely to have a need to produce and publish a huge amount of content. This information needs to be changed regularly, while being approved before use. Especially for financial institutions, it is essential that the information that is presented on the website is up to date, accurate and auditable.

Core to the Vignette ECM suite is the V7 web content management product. As I'm finally in a position to actually spend some time with the technology side of the product I'm starting to understand the breadth of its capabilities, for web content management (WCM) and beyond. This blog tries to be vendor agnostic (you'll see as many links to Open Text as you will Vignette), so without naming names I'd like to start discussing why enterprise-grade web content management is important to large organizations.

Enterprise WCM systems enable organizations to manage the creation to publishing lifecycle of digital assets, including some or all of the following:

  • recognize new content and manage it in place
  • capture new content into the repository
  • integrate content from multiple systems
  • manage content business processes with full workflow
  • render content to standard styles for display
  • management of any type of digital asset separately from presentation
  • delivery to the Web, mobile device, print and more

As a bit of a dummy in the WCM space I was initially tempted to try and understand the concepts through the available and accessible open source products like Joomla. I had played with this software a little in the past, making use of its reasonably usable console to manage a website, author some content articles, template layouts and add in some modules like RSS syndication. It is Joomla's range of add-on modules, the relative ease with which you can deploy a decent looking web site and publish content without coding that seems to really appeal. I'm purely a dabbler in this, but I know that the technology is powerful enough to produce some impressive web sites, which guys like Chris (an ex-product manager at Vignette and someone I really respect) and Brett (a good friend) base their technology businesses on.

For large enterprises the challenges are different from just delivering an attractive and functional website. Huge amounts of content are produced and published by users across the organization, and this needs to be delivered to the web efficiently, without requiring a bunch of HTML coders to translate everything that is originally authored by a business user or other content contributor. User friendly tools are required to enable users to not only author new articles but also reuse content that may routinely come from 3rd-parties, business systems and other repositories. Not needing to rework content enables it to be more rapidly published, while enabling it to be reused across the range of web properties a large organization may have. Enterprise WCM should provide integration alongside user friendly authoring tools to enable this.

Most organizations want to get items out to their website fast, but they need to be able to enforce review and approval of that content. Enterprise WCM provides highly capable workflow to tie the integrated content sources to authoring, approval and publishing processes. This provides enforceability, something that the bloggers of the world dislike. Like it or loathe it, corporations with content that represents their business, products, policies or anything else that readers may act on, feel that they need to demonstrate controlled publishing to mitigate the risk of litigation. Getting things right and maintaining content quality is not likely to go away for most companies, reinforcing the need for strong content management processes.

Enterprise web content management pulls together a range of technology to solve the key business problems of rapidly, controllably and scalably publishing huge amounts of information to corporate websites. WCM is fundamental to a full ECM vision since the website is the obvious public face of the company, while extranet sites are the point of contact with your business partners and customers. Challenges lie in how to effectively incorporate this super-sized WCM application into the broader ECM model, ensuring consistent management of content for the web, extranet, intranet, business units, finance group, auditors and more.

Friday, August 25, 2006

Personal information - don't just protect it in the database

Kim Cameron's Identity Blog highlighted the case of more than 100 Australian government employees being forced out of a single agency for snooping on client information. According to the Sydney Morning Herald article, hundreds more were demoted or faced salary deductions as punishment.

Interestingly I have a little insight into some of the Centrelink agency's online applications. Despite this, the rest of the Centrelink specifics in this post are wild speculation, so take them with a pinch of salt.

The agency provides a range of online services to Australians, especially around benefits and financial support, and enables users to perform many interactions and transactions with the agency online. This leads to approximately 80 million online transactions per week. As I understand it, before going online the agency had struggled with how to counter individual users claiming that information they had (or had not) provided online was incorrectly recorded, leading to incorrect payment of benefits and other issues. This would mean that cases that led to litigation would be hard to defend. The requirement for non-repudiation rested with the agency and this proved difficult for them to address.

Here is where the wild speculation starts. Centrelink is considered a gold-standard in the Australian government for an online service that is secure and trusted. It employs a website monitoring application called WebCapture that, for online transactions, records the information presented to a user (the forms they see and the documents requested) alongside any information that users enter into forms, the options they select, links they follow and buttons they click. This information is recorded on the web-server, stored to a repository and may be played back by authorized users as a virtual video recording of the entire transaction. As I understand it, the captured, replayable transaction has been tested in court as having appropriate legal weight to provide non-repudiation: the logged in user did perform the transaction, and this is exactly the information they were presented with and how they responded.

I am wondering whether some of the employees in question used this monitoring capability to snoop on customer information that they couldn't access in other systems. WebCapture information is held in an extremely secure repository, with metadata passed to a standard database. The question is whether the agency effectively designed and enforced their security policies with respect to accessing this data. A system's security is only as strong as the security policies you define for it. In this case, it may be that the WebCapture repository or associated database was the subject of poor IT security policy enforcement or poor governance around the maintenance of those policies or the users that could access it.

If this scenario is actually true, it highlights an issue that should be obvious, but may have been missed in this case. As we add additional layers of software into our infrastructure, if they are not subject to good IT governance and management processes they may be fraudulently used to access personal data and transactions, or lead to other security issues. Every new layer of infrastructure needs to be managed - personal data does not just reside in the database anymore.

With good governance and management of the systems and security policies using best practices like ITIL, a system like WebCapture can provide undeniable proof of transactions performed by clients, protecting the organization from false claims and litigation. This is a huge benefit to an organization like Centrelink. There is no substitute for good management of data in all IT systems, not just the database.

Thursday, August 24, 2006

Embedded metadata in documents

As a prelude to a post I want to write soon (when new home life becomes more settled), here is a link to an interesting piece I saw today: Embedded Document MetaData on the Formtek blog. It is a great extension to my post yesterday about Web technology for electronic records, describing in more detail the Adobe approach to embedding metadata into documents.

The value to Adobe of embedding tags is more than making documents standalone for records management purposes: it actually enables Adobe's document centric workflow / BPM. Bruce Silver covers Adobe's BPM capabilities in great detail in his 2006 Report Series. I need to re-read this myself!

Other areas of interest around these types of intelligent documents include:

  • Audit history embedded in document
  • Managing workflow outside of the enterprise
  • Semantic web technology
  • Enhanced searchability
  • Document authoring
  • Security around embedded metadata
  • Digital Rights Management

Intelligent documents are presenting new opportunities and issues to organizations. I would like to understand and write about this in more detail, so if you have any thoughts on important areas to focus on, let me know.

Wednesday, August 23, 2006

Web technology for electronic records

Electronic records are hugely valuable evidence of business operations, which must be managed rigorously through their lifecycle to eventual destruction. They may be documents generated by human-driven authoring tools or other (typically human readable) information created through automated processes. Regardless, they are still just electronic files.

Organizations pay a lot of attention to records management (RM) to ensure that their records are well organized, accurate and retained securely through their lifecycle to eventual destruction. The importance of strong electronic records management systems and practices cannot be overstated. What happens though when documents are removed from the system, to take to court, or just to pass to a business partner? How do you continue to demonstrate good management of those records, their classification, authenticity and integrity? Traditional records management has always struggled with the problem of records leaving the custody of the records archive. Perhaps records management can learn a little from the web world.

When you decide to download a file you find on a website you are interacting with a vulnerable system, one that is open to malicious attack, putting the authenticity of files at risk. As the consumer of this downloaded information you need a way to be sure that the file you downloaded really was the file that you were intended to see. The vulnerability of the web world has warranted the creation of strong mechanisms to address this issue.

Open source software projects typically provide a checksum like an MD5 hash of a file which can be used to ensure that none of the information in the file has been changed since it was published. Digital signatures on Adobe PDF and Microsoft Office documents demonstrate the authenticity and author information of the files. They rely on certificates and Public Key Infrastructure to enable the publisher to demonstrate the authenticity and integrity of the file without needing any contact with the consumer.
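
Here is a minimal Python sketch of the checksum check, assuming the publisher lists a hex digest next to the download. The function names are illustrative only. Note that a bare checksum demonstrates integrity, not who published the file, and MD5 in particular is no longer considered collision resistant, so a stronger algorithm such as SHA-256 is the safer choice where the publisher offers one.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_checksum(path, published_hex, algorithm="md5"):
    """Compare a file's digest with the value the publisher listed.

    Surrounding whitespace and letter case in the published value are ignored.
    """
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

A records system could run the same comparison when a document comes back from outside the repository, provided it kept the digest from the time the document left.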

The approach of embedding more and more metadata into documents is being perpetuated by Adobe and Microsoft on the desktop, not just for DRM but also for general document classification and audit information. In general, documents and all of their associated information are becoming more self-identifiable, through semantic web technologies like RDF. This can only help records managers locate and manage electronic documents that have left the central repository, much like RFID can help them with physical assets.

Digital or Information Rights Management (DRM) takes embedded information, digital signatures and encryption a step further. With DRM a document publisher can also enforce the lock down of the document by embedding unalterable policy information into it. The policy information ensures that the document can only be read by the person it was assigned to as well as ensuring its effective destruction when it expires, even when it is outside of the repository. Think of this like the tape recorded message in Mission: Impossible that would self-destruct a few seconds after being played, or the ability of iTunes to prevent you sharing downloaded tracks.

EMC recently announced that it was pairing its Records Manager with DRM technology acquired from Authentica, enabling records managers to enforce their policies for all records, independent of custody. In principle this seems like a great pairing, but there are some issues to be addressed:

  • Seamless integration of the separate technologies is required to make DRM manageable from standard records policies without additional complexity
  • Handling legal holds to prevent the destruction of documents is virtually impossible
  • Automated destruction of records outside the repository may adversely affect business partners' ability to retain records and may represent unexpected legal or compliance issues
  • Producing protected versions of every document retrieved by a user is processor intensive
  • Proprietary encryption and DRM typically ties an organization to that vendor for life

This final point is essential to bear in mind. Without the DRM infrastructure available to enable enforcement of the rights and policies an organization and its partners are left with a worthless set of files that can not be read, much like the proprietary storage systems of the 70's and 80's that required massively expensive migrations as systems reached the end of life.

The effective management of electronic records requires organizations to rethink the completely locked down records archive that was possible with physical assets. Web technologies offer many alternatives, and CIOs under the guidance of general counsel should ensure that they embrace opportunities based on standards that will truly benefit the organization long term.

Tuesday, August 22, 2006

More on BPMSs for STP

In my post yesterday, Top-down BPMSs suitable for Straight-Through Processing? I posed a question that had been troubling me: should top-down business process targeted BPMSs be used for STP, and if so what makes one better than another?

Thinking again about the question, it seemed a bit 'dumb'. Surely STP is about pushing a transaction from one end of a process to another without human interaction and purely through automation, right? A workflow like this should be performed by an Enterprise Service Bus (ESB) or something similar that can choreograph/orchestrate web service requests. Well maybe or maybe not, since yesterday I boldly stated:
My belief up 'til [now] has been that the top-down BPM types of tools are good for 'STP' where there are breakouts of the process requiring human exception processing. I think that they may also be sensibly applied to business processes that can not be implemented as pure STP as a big-bang reengineering, instead having STP as an end goal that is approached through refinement and iteration of a live business process over time.
What I am implying here is that STP of a current business process requires huge updates to the way the process and infrastructure works. Using a BPMS that can orchestrate many of the tasks that have to be performed in the meantime (both human and automated) makes a lot of sense. Building on this foundation provides a manageable approach to incorporating all of the integration requirements and human elements of the process over time.

The original question was triggered by Bruce Silver who has added a commentary on BEA's take on BPM-SOA. Although very much focused on BEA, reasonably so as they have been talking about this problem directly, Bruce points out that:
In AquaLogic BPM 5.7, coming up fairly soon, there will be a bit tighter linkage between BPM and the AquaLogic service registry, but a lot of this you can already do today. And probably you can do it with almost any BPMS that can consume web services with just about any SOA registry and ESB. When you can and when you can’t, what makes BPM, ESBs, and registries either hard or easy to glue together — those are stories that neither BPM nor SOA vendors are yet telling.

This could be the answer I am looking for. Maybe any BPMS that supports web service consumption can play well with SOA. And therefore, perhaps straight-through processing is something that a good BPMS can perform. It could be that some vendors have an advantage through providing an Enterprise Service Bus (ESB), but it could be something else implicit in their BPMS that makes a difference. Maybe the BPMS can not only call out to the ESB for certain tasks, but also be used as a service by an ESB orchestrated process to manage (sub-)business processes. Lots of 'ifs, buts and maybes', so if someone knows the answer, please put me out of my misery. Alternatively, if the question is just dumb, feel free to let me know.

I am starting to believe that STP is an end goal that still requires exception processing through human interaction - there are few business processes that can be handled with ZERO human interaction or intelligence. And this fits a top-down BPMS. Perhaps producing an STP process built on this toolset through iterative improvement over time, as I suggest above, will not be the absolutely most (hardware) efficient approach, compared to say building the whole thing from scratch in an ESB. But it may be the most practical in reality.

Changing address is as simple as shopping online

Things change. Where I choose to live is one of those things that is undergoing a gradual transition. My 'identities' are changing with it.

The view from my window in Boston, which is the banner picture on this blog, is associated with the glow of the floodlights from Fenway Park and the bustle of post-game Red Sox fans and Berklee music school kids.

This view is gradually being replaced through the physical manhandling of boxes of stuff to a new home as I move in with 'the missus' a.k.a 'my girlfriend'. The new view is more relaxing, though the transition comes with a little stress, beyond the pure manual labor.

It's been a few years since I made a move. Fortunately things have got easier as more and more organizations embrace online services. Within about an hour I had made a change to most of the major mailing addresses I have on file with organizations scattered around the US.

For many organizations this type of address change should be relatively simple, since I already have an online account with them, which I use to identify myself. My residential address is just an attribute of my profile, rather than an attribute used to identify me. Financial institutions especially, but almost any organization, just need to email me a confirmation message about the address update to ensure that fraudulent changes to my information can not occur without my knowledge. Hopefully this will be more successful than the experience of Jim Davies from Gartner.

The problems come once I start trying to change my address with organizations that identify me offline or tie services to my home. Offline identity seems to regularly have residential address as an essential component of the identity. Utilities like electricity identify me this way since they only supply their services to that address. I need to recite my address to change my address - it seems like there is too much dependency on this one item of information!

Worse still is the US Postal Service (USPS) - they have little option but to identify me through my mailing address. Cleverly, they enabled me to submit a change of address online even though I have never had contact with them before.

The USPS approach is to validate that a customer lives at the address they are attempting to change, in a way comparable to PayPal setting up a new customer account. As I say, when I hit their website USPS has no idea who on earth I am. So they choose to validate my address through a trusted third party - my credit card company.

When I request an address change, USPS asks me to enter the details of a credit card with a billing address that matches either my old or new address. They make a set charge of $1 to the card, enabling them to pass my name and the address they want to compare to the credit card company for standard payment authorization. If the payment is successful USPS accepts the address change, since the credit card company has confirmed that my card, name and address match. For me, the customer, the $1 payment is well worth the time saved going to the post office to register my change of address. For USPS, they get instant validation that this otherwise unknown customer is authorized to make the address change.
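To make the flow concrete, here is a small Python sketch of the idea as I understand it. Everything in it is hypothetical: the record layout, the function names and the in-memory "issuer" are stand-ins, since I have no knowledge of how USPS or the card networks actually implement the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardRecord:
    """What the (hypothetical) card issuer holds on file for one card."""
    name: str
    billing_address: str


def normalize(text):
    """Compare names and addresses ignoring case and extra whitespace."""
    return " ".join(text.upper().split())


def authorize_charge(issuer_records, card_number, name, address, amount_cents):
    """Stand-in for the issuer: approve only if name and address match the card."""
    record = issuer_records.get(card_number)
    return (
        record is not None
        and amount_cents > 0
        and normalize(record.name) == normalize(name)
        and normalize(record.billing_address) == normalize(address)
    )


def accept_address_change(issuer_records, card_number, name, old_address, new_address):
    """Accept the change if a $1 charge authorizes against the old or new address."""
    return any(
        authorize_charge(issuer_records, card_number, name, address, 100)
        for address in (old_address, new_address)
    )
```

The point of the sketch is that the requesting organization never needs its own record of the customer: a successful authorization against either address is the proof of identity.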

Financial institutions, due to fraud and anti-money laundering customer identification programs, have some of the best mechanisms for ensuring the most up-to-date and accurate records of their customers' profiles and personal information. It is natural that secondary organizations, both financial and otherwise, would choose to accept customers' relationships with their primary financial institutions as proof of identity.

Online payments have produced a novel way for identity validation to be performed, at little cost to the secondary organization or customer. And even more importantly, the mechanism uses standard payment methods to complete, not requiring additional integration effort or agreements with the third parties to run the transactions. Maybe unwittingly, credit card companies and banks have made themselves a primary validator of my identity, through both their own needs to ensure they know me and the mandates of the regulators such as FDIC. The major cost of initially identifying me is borne by the bank and credit card company to enable them to maintain a relationship with me. Maybe each of these small transactions made by secondary organizations wishing to identify me helps to offset this cost over time.

Monday, August 21, 2006

Top-down BPMSs suitable for Straight-Through Processing?

It seems that Bruce Silver and Jesper Joergensen have been having an interesting discussion around the marriage of BPM and SOA. Normally I would avoid any post on this subject, since I can look at the patterns of letters on the page through squinted eyes and know that it's going to get too deep, too quick. These posts though have led me to a question that keeps coming back to me: should top-down business process targeted BPMSs be used for Straight-Through Processing (STP), and if so what makes one better than another?

In his post, Bruce specifically talks at a level that really makes sense, rather than the usual "we need BPEL/BPMN/WS/SOAP/I'm smarter than you by being able to use technical acronyms in seemingly meaningful ways that are completely unrelated to the real issues". He hits all of the points that I have been seeing over and over, but in a way that does not open the door to deep semantic discussion, which is fine by me. I have summarized Bruce's comparisons heavily:
  • BPM is top-down. SOA is bottom-up.
  • BPM is business-driven. SOA is IT-driven.
  • In BPM, success is measured by business metrics and KPIs at the end-to-end process level. In SOA, success is measured by architecture, logical consistency, ease of integration.
  • BPM is project-oriented. SOA is enterprise infrastructure-oriented.
  • In BPM, what is reused is the process model, i.e. the “abstract” design of a process fragment. In SOA, what is reused is the service implementation.
  • In BPM, a business process is inherently hierarchical, composed of nested and chained orchestrations. In SOA, services are inherently independent.
  • In BPM suites based on service orchestration, process activities are bound to service endpoints. In SOA (supposedly), orchestrated services are supposed to be abstract, with connection and mapping to endpoints mediated by an ESB.

These points make sense since they describe the classes of software-based solutions that are BPM and SOA, not individual vendors' take on it. Given these definitions there seem to be different classes of BPMS, some highly human process biased, others more SOA biased. In my head I have not really been able to work out what it is that distinguishes one from another since all of the BPMSs I have had contact with have been human interaction based with some plug and play integration capabilities. Is it just that the more 'toolkit' type BPMSs have an SOA bias since business users can't really use them?

My belief up 'til now has been that the top-down BPM types of tools are good for 'STP' where there are breakouts of the process requiring human exception processing. I think that they may also be sensibly applied to business processes that can not be implemented as pure STP as a big-bang reengineering, instead having STP as an end goal that is approached through refinement and iteration of a live business process over time.

In Part 2 of his series, Bruce says he will talk about the way current “SOA-based” BPMSs work. I'm hoping that this addresses my question: should top-down business process targeted BPMSs be used for STP, and if so what makes one better than another? If not, I'm going to have to accept that there is an obvious answer to my question that I'm just missing!

Friday, August 18, 2006

P2P financial services has a 3D Long Tail

In my post Financial products have a Long Tail I argued a little that the Long Tail model could be applied to financial services and right now is especially evident in the online brokerage accounts and tools available from the likes of E*Trade and Fidelity. I chose to focus on these types of accounts over retail banking with only a little thought or explanation. As I suggested in the post, banking is most useful to me by picking a pervasive bank with ATMs everywhere; Bank of America fits that bill. As a consumer I'm not going to head much further down from the fat-arse end of the tail to the smaller banks and credit unions because there is no real advantage to me exploring the niche options. I am not a niche customer of the banking world, so I shouldn't have used myself as the reference.

James Gardner responded in a comment that directed me to one of his posts about how the Long Tail could be applied to banking. He had identified Zopa (in the UK) and Prosper (in the US) as Long Tail contenders:

What are Prosper and Zopa doing? Democratising the production of loans: now anyone can be a bank! Selling the niche: chances are the loan terms that best suit you are going to be available where there are thousands of consumer created loan products. Zero cost distribution: the loan is matched and fulfilled automatically.

Zopa and Prosper are long tail business.

That is the key: "now anyone can be a bank", the same way as with eBay anyone can be an online retailer. That is the thinking I should have done around this!

I would also like to extend James' thought. In most of banking and credit there is a Long Tail of the population that is unbanked; for some reason they can not gain access to financial services, or at least not more than the check cashing branch on street corners.

The unbanked population does not fit the peak end of the power curve that represents customers that are likely to be appropriately profitable and of acceptable risk to many banks. James Taylor has written much about how Enterprise Decision Management can actually ensure this, enabling banks to effectively manage their customer profile during account opening. For the Wal-Mart type banks, this makes absolute sense; customer profile matches the inventory of products the bank chooses to offer. Even in an online world the banks' operations are not flexible enough to offer a greater array of products (beyond an Internet-only account maybe), so they target the mass of standard customers.

The new online financial services, Zopa and Prosper, offer the ability for everyone to be a bank, offering an enormous set of credit products for a credit customer to pick from. In doing so they appeal to the different niches of customers with different needs, allowing them to gain access to credit that would otherwise be impossible.

There is another product in this new financial world, which makes this a peer-to-peer (p2p) form of trade. Looking at it from the other way round, each of these niche credit customers is like an inventory item or niche investment product for the investors. Now the guys with the cash have a Long Tail of investment products with varying degrees of risk, which they can select off the virtual shelf of Zopa/Prosper.

I don't believe that eBay customers can be viewed as a Long Tail of 'products' in the same way. For the new financial services, the two credit product / investment product views exist because both parties are benefiting financially from the transaction. This new wave of financial services presents a kind of 3D Long Tail of potential credit / investment product matches for both parties to benefit from. This is a powerful offering that extends beyond the 'community/social' banking brush it could be tarred with, which (if I wanted to really push my limits of credibility) could lead to a whole new form of Long Tail trade.

Wednesday, August 16, 2006

Would BI Vendors really buy into ECM?

This was such an unlikely sounding item I had to post about it. A recent post by Alan Pelz-Sharpe mentions an IT Week piece in which he is named alongside Mike Davis from Ovum.

In there Mike talks about the possibility of BI vendors like Cognos and Business Objects continuing the trend of acquiring ECM vendors. This has all been triggered by the IBM announcement around FileNet, and is probably no surprise as analysts try and out-bid each other trying to guess who will be next. Alan, like me, seems to have a different opinion about the match of BI with ECM for this to happen, and Alan's is well worth reading, based on historical context of where this has caused trouble in the past.

As I mention in a comment on Alan's post, the acquisition of an ECM vendor by a BI vendor...
Seems like an odd choice to me. I'm not sure I really understand the use cases for structured BI tools in unstructured content. And selecting acquiring companies purely based on size, rather than fit, is something that I hope we don't see as a trend.

Admittedly, real-time analytics like Aungate provide an interesting take on BI for unstructured information. BI alongside content management seems better when looking at structured metadata, especially when linked to BPM data. But that works nicely as a partnership if the analytics capabilities exceed what is available in the content management suite.

I strongly advocate getting information out of unstructured form if it needs strong BI. Strong analytics need suggests that the data should be freed from the spreadsheet constraints, while providing more control and repeatability around the data. Maybe I'm behind the times! :(

In many systems and business processes, unstructured data is often a byproduct of other poorly operating upstream processes, or the (shrinking) requirement to communicate with customers on paper. The creation of Excel spreadsheets is often the sign of immature processes in my view, although it is one of the few scenarios I could consider useful for structured BI against documents.

If Stellent, Open Text, or Vignette (please no! you won't take me alive), are swallowed by Cognos or Business Objects, I would be surprised. If they have some killer use case I haven't yet imagined, then cool, but I think it's going to take some selling!

Tuesday, August 15, 2006

On being acquired by a bigger ECM vendor

The IBM acquisition of FileNet made me start thinking about how companies manage acquisitions and subsequent integration very differently.

I used to work for Tower Technology, an imaging, workflow and records management vendor that was acquired by Vignette (my current employer) back in 2004. Reminiscing, I went to the Way Back Machine to dig out the old, lurid colored website. Interesting headlines on the homepage about recent customer wins (back then) in the US, Europe and Australia made me think about how Tower used to consider FileNet their number one competitor. I'm not sure, maybe purely because of size, that FileNet considered Tower a great threat, but probably acknowledged the Australian company's existence.

The first positive thing Vignette did was to change the garish colors of the Tower website, replacing it with a more acceptable Vignette look. Unlike EMC with Documentum, Vignette did not retain the Tower Tech name, and within days mostly replaced the messaging to fit more of a Content Management theme, which the mother company was most comfortable with. The first home page after the acquisition was substantially different in messaging, and I remember the shock of many salesmen to find out that they no longer knew the story behind what they were selling. They were also worried, because a significant portion of leads came from Google, and it looked like Tower web properties were going away.

Much of the core talent behind Tower Technology, its Integrated Document Management (IDM) product and Enterprise Records, Document and Case Management product (then Seraph, now Records and Documents) fortunately survived. Many have done wonders to rebuild the messaging, to integrate with the Vignette organization and technology, and to take forward a stronger ECM story.

Sometimes I have to ask myself, wouldn't it have all been easier if the Vignette marketing group of the day had not messed with the brand and the messaging, allowing itself to learn over time how to really take these products to market under the Vignette banner? EMC seemed to get it right with Documentum. Will IBM get it right with FileNet?

Just to round up the thoughts out there, Sandy Kemsley did a nice round up of the comments around the Big Blue acquisition of FileNet: Comments on the IBM-FileNet acquisition - Column 2 - ebizQ

Not all were in agreement that FileNet's BPM was the focus of IBM, although I must say that the content-based BPM offered by FileNet certainly is appealing to customers wanting to capitalize on their investment in content management rapidly. Hopefully the dust will settle and we will see this as more than a land grab and door opener for the IBM sales-teams, and some real use made of the technology and most importantly the FileNet expertise.

Monday, August 14, 2006

Financial products have a Long Tail

I finally succumbed to the hype and an extended period of time waiting at Boston Logan airport. The opportunity to buy a real paper book presented itself in the form of Chris Anderson's The Long Tail. Sat in prime position on a Borders' display, it must be deserving of every penny of the $17.50 per year that few square inches of space cost the retailer in rent and overheads (my estimate from Anderson's figures!).

I was flying to Austin, Texas via Atlanta, to take a short notice trip to Vignette's HQ. By the way, one day very soon the company will complete the migration of its corporate www site to its V7 flagship product and I might actually be proud to link to it. In the meantime it's just a courtesy. This introduction into my travel schedule is also an implicit excuse for a lack of posts over the next few days.

On the flights, fired by a lack of carry on luggage (I don't need the extra hassle of buying toothpaste and other necessary items every time I travel, so I'm happy not to have to fight the masses for another piece of prime real-estate, an overhead bin), I ploughed through the first 178 pages of the book. Which only leaves me about 50 for the return trip - best do some real work I suppose!

In any case, the book made me start thinking (and I apologize if this is addressed in the last 50 pages) how the Long Tail might be applied to financial products. I'm not thinking bank accounts, since I believe they probably need a bank to be relatively pervasive to work effectively right now, as I don't want ATM charges every time I need cash. More effective would be items like loans, insurance, mutual funds and annuities.

Financial product Long Tail

The Long Tail probably can apply quite effectively, assuming I limit the scope a little. For any financial product where I invest my own money, I would suggest that the Long Tail applies to the multitude of products from recognized and reputable institutions that fall outside of the Top 500 'Hit' products. I don't want to include marginal or unheard of institutions in the mix since the risk of me losing my shirt is far higher than buying an unwanted birthday present from someone unknown on eBay. Although it has to be said that Zopa has a good model for investing in loans that people seem to trust, despite its relatively unknown status.

There are financial institutions that have built their brand on the ability to sell a vast array of securities, independent of popularity or ranking. Go to Fidelity, E*Trade, or another online brokerage and they will offer stock in many companies that could be considered to be in the mid-portion of the Long Tail, all as part of their standard low-cost service. And they provide many of the filtering and advice tools that Anderson suggests are necessary to help customers when facing a bewildering array of options.

This meets two of the requirements for the Long Tail according to Anderson (page 57):

  • Democratize distribution - e.g. Fidelity is just an aggregator of stock for sale
  • Connect supply and demand - e.g. E*Trade provides tools to enable customers to select stock based on many sources of information

The final force for the LT is 'democratize production'. Although I suppose anyone could run a public company, SEC regulations and Sarbanes-Oxley (SOX) seem to be making that harder and more expensive than ever. This could be considered a way to ensure that production of public company stock is never really democratized, all in the best interests of the public investor (!).

The discontinuous Long Tail

Sliding down the slope to the more distant end of the LT appendage will place you into the 'penny' stocks. These don't meet the rules or the volume of trading that would have them listed on the Nasdaq, therefore making it more difficult to find out information about them, or even their current value. Even so, E*Trade for example will allow you to buy them online, although the restrictions really separate these stocks from those higher up the ranking. This is not the seamless LT that we see with iTunes, where rank does not affect the ease with which I can buy a track. With penny stocks the LT is a little discontinuous, where we can no longer apply the common structure and rules of NASD to the items we want to buy. It's a bit like iTunes trying to sell vinyl albums when you get to an imaginary point in their database where the item you want exists but has not yet been digitized.

Complex products have more to gain

With complex products like annuities, where the number of combinations of securities components coupled with insurance components is enormous, the rules and potential benefits of LT could really kick in. Advanced consumers could benefit greatly from matching the endless array of products to exactly their requirements, if the information and access to products through a single online access-point was available. In this mode though there are many other issues that do not face Amazon and Google, like licensing of advisors and agents, assessing suitability for a product and so on.

The Long Tail could be enabled by the work that NAVA is doing to prepare the annuity industry for online account opening and management. Fidelity and E*Trade could for example start assembling and selling a far greater range of annuities, at a far lower cost of production. This matches the requirement for a LT to reduce the cost of production, with the brokerages' standards for aggregating distribution and providing filtering to match products with people.


Financial products have probably followed a partial Long Tail model for a while, especially around the online stock brokerages we are familiar with. The costs of production may have risen, but to the customer the distribution costs and filtering / information tools have provided a far enhanced and more varied environment to trade in.

Other more complex products like annuities could greatly benefit from the Long Tail dynamics. The manufacturers of these items require a significant push, both in terms of standards, but maybe also something (or someone) else to put them into a state where they can benefit from this new model of selling 'less of more'.

[UPDATE: Apparently I messed up a couple of the links. All fixed now. Sorry!]

Friday, August 11, 2006

Citibank Hardware Tokens Defeated - but don't blame the tokens

A post today at Bankwatch » Citibank Hardware Tokens Defeated: The Beginning of the End pointed to an AllPayNews article about the weakness of physical tokens (like the RSA SecurID). The article talks about how Citibank's online banking security was defeated by a fairly simple phishing method, despite the use of physical tokens.

As it turns out, the article was a push by a security token vendor, PhishCops. In any case, it did point out an apparent weakness of SecurID type security tokens when implemented without other anti-phishing measures.

According to the AllPayNews article, the scam worked like this:
In a textbook example of a "man-in-the-middle" attack, Citibank business customers were lured to dozens of counterfeit websites located in Russia where they were prompted to supply their token-generated passwords and other credentials. The counterfeit websites then swiftly sent the solicited credentials to the genuine Citibank website where they were used to access the accounts.
For background on tokens, see my recent posts around electronic signatures and the use of physical tokens to strengthen the standard username/password pair for user authentication at online banking and other financial sites.

How did this work?

A valid Citibank online service would prompt the user for their standard login details, an account ID and password. Knowing which account the customer is attempting to access, the service now requests a one-time number to be entered from the user's token. SecurID and similar tokens rely on a user specific number being generated that remains valid for 30-60 seconds after being displayed on the device's LCD screen. The user enters this number into the online prompt, confirming that they are in possession of the token. The two-factor approach, an item that is memorized (password) and an item that is in-hand (token) ensures the authenticity of the customer.

The problem is that the scam used an advanced phishing approach. Not only did the scam direct customers to a website that presumably appeared exactly like a Citibank website, prompting them for all of their credentials, it immediately used these to access the real Citibank online service with the provided information. Assuming that the scam site completed this within the lifetime of the one-time password (30-60 seconds), it would be successfully authenticated, enabling the scammers to perform fraudulent transactions on the customer's account.
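To show why the relay works, here is a minimal Python sketch. It does not use SecurID's real algorithm, which is proprietary; as a stand-in it uses the openly specified time-based one-time password scheme (RFC 6238, HMAC-SHA1), and the function names, the shared secret and the 60 second step are assumptions for illustration only.

```python
import hashlib
import hmac
import struct


def one_time_code(secret: bytes, now: float, step: int = 60, digits: int = 6) -> str:
    """Code shown by the token for the time window that contains `now`."""
    counter = int(now // step)  # every moment in one window gives the same counter
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation as described in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def bank_accepts(secret, password, given_password, given_code, now, step=60, digits=6):
    """The real site checks both factors, but cannot see who is typing them."""
    expected_code = one_time_code(secret, now, step, digits)
    return (
        hmac.compare_digest(given_password.encode(), password.encode())
        and hmac.compare_digest(given_code.encode(), expected_code.encode())
    )


def relay_succeeds(secret, password, typed_at, relayed_at, step=60):
    """Customer types into the fake site at `typed_at`; it replays at `relayed_at`."""
    stolen_code = one_time_code(secret, typed_at, step)
    return bank_accepts(secret, password, password, stolen_code, relayed_at, step)
```

If relayed_at falls in the same time window as typed_at, the bank sees a perfectly valid password and code, which is exactly the situation described above. Once the window has rolled over, the stolen code is no longer the one the bank expects.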

Tokens don't work?

Tokens with a limited lifetime passcode are still very valuable devices to ensure authentication of users. Even if both password and one-time passcode are stolen, the fraudulent user has to exercise them in a very short amount of time. Certainly this is a strong barrier preventing many keylogging and phishing attempts, but as is demonstrated here, not all.

The problem is that tokens don't close the loop. Although the banking service can confirm that the user credentials entered truly are the two-factor identification for the customer, the customer still has no way of being sure that the site they are entering this information into is real.

Need to prevent users entering credentials into fake sites

It is essential that online services provide a mechanism for very obviously confirming to users that they are using the real site. Educating users to study the URL is really insufficient. Anti-phishing toolbars are also an option, but since many users who travel may use PCs that are not their own for access to online services, this is also impractical to depend on. A good option is the SiteKey as used by Bank of America.

SiteKey is a way of demonstrating to a customer that they are looking at a valid BoA site. When attempting to log in, the only information requested on the front screen is basic ID: the account number and the state you live in. This is submitted to the secure site, which retrieves a picture and a phrase. These two items were selected by the user when the online account was set up, and are known only to them. Users learn that they should only enter their password if they see this personal SiteKey. The SiteKey page shows how this works visually.
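The two-step flow described above can be sketched roughly as follows. This is not Bank of America's implementation; the in-memory store, the account number, the picture and phrase, and the function names are all hypothetical, and the sketch assumes a fake site has no way of obtaining the customer's picture and phrase.

```python
# Hypothetical in-memory store; a real bank would use a secured database.
SITE_KEYS = {
    ("12345678", "CA"): {"image": "red-bicycle.jpg", "phrase": "my summer ride"},
}


def lookup_site_key(account_id, state):
    """Step 1 (site side): return the customer's chosen picture and phrase."""
    return SITE_KEYS.get((account_id, state))


def customer_should_enter_password(shown, expected):
    """Step 2 (customer side): compare what the site shows with what was chosen."""
    return shown is not None and shown == expected


# What the customer remembers choosing when the account was set up.
expected = {"image": "red-bicycle.jpg", "phrase": "my summer ride"}

real_site_shows = lookup_site_key("12345678", "CA")
fake_site_shows = {"image": "generic-logo.jpg", "phrase": "welcome"}

safe_on_real = customer_should_enter_password(real_site_shows, expected)
safe_on_fake = customer_should_enter_password(fake_site_shows, expected)
```

The password is only requested after the customer has had the chance to make this comparison, which is the part that a simple look-alike page cannot reproduce.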

PhishCops goes a step further

The PhishCops 'virtual' token claims to be able to counter phishing sites, since even if the user's credentials are handed over, fraudulent use of them is impossible without the PhishCops token on the user's PC. PhishCops is not a physical device, and claims to be purely browser based.

The 'how' is not entirely clear to me, but it seems to be done by handling two-way authentication and authorization of specific user PCs (I wasn't going to open the MS PowerPoint presentation to find out). It seems that the secure site presents a one-time key that the user enters into PhishCops, then the virtual token returns a one-time passcode to be entered into the website. This seems like a great idea, since it validates to the user that the website is real before they ever enter their credentials, and validates to the website that the token and user credentials are real.
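Since the actual mechanism is unclear, the following is only a generic challenge-response sketch of the exchange as I understand it, not PhishCops' real algorithm. It assumes a secret shared between the site and a pre-authorized PC, and uses a standard HMAC to turn the site's one-time key into a passcode; the names and the 8-character passcode length are invented for illustration.

```python
import hashlib
import hmac
import secrets


def issue_challenge() -> str:
    """Site side: a fresh random one-time key for this login attempt."""
    return secrets.token_hex(8)


def token_response(device_secret: bytes, challenge: str) -> str:
    """Token side: derive a short passcode from the one-time key."""
    digest = hmac.new(device_secret, challenge.encode(), hashlib.sha256).hexdigest()
    return digest[:8]


def site_verifies(device_secret: bytes, challenge: str, response: str) -> bool:
    """Site side: recompute the expected passcode and compare."""
    return hmac.compare_digest(token_response(device_secret, challenge), response)


registered_secret = b"hypothetical-secret-for-this-pc"  # assumed to be set up in advance
challenge = issue_challenge()

# A pre-authorized PC produces a passcode the site accepts.
good = site_verifies(registered_secret, challenge,
                     token_response(registered_secret, challenge))
# A PC without the registered secret does not.
bad = site_verifies(registered_secret, challenge,
                    token_response(b"unregistered-pc", challenge))
```

Because the passcode depends on a secret held only by the authorized PC, stolen login credentials alone would not be enough to answer a new one-time key.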

The downside that I see is the need to pre-authenticate each computer that you want to use with the system. The process for performing this, and its complexity, are not clear. I also wonder whether keylogging software and scripting hacks could break the system. A physical token is powerful since it is completely separated from a PC.


Physical tokens are not dead. But as the Citibank example has shown, without providing additional layers of protection to users to help them avoid phishing, a well-crafted, real-time scam can defeat even this two-factor authentication.

If Citibank had used a site authentication approach like SiteKey to prevent phishing sites from easily convincing users to give up their personal credentials, it is likely that there would have been no question about the security of their physical tokens. SiteKey is not perfect, but it is probably reasonably effective.

PhishCops may provide an approach, though I am wary of software-based tokens being thoroughly secure. Unfortunately, a hardware equivalent may be more bulky and expensive to produce and distribute. Banks will need to balance the risks and provide as much information as they can to users to protect them online.

Thursday, August 10, 2006

IBM platform for massive systems re-engineering?

This morning's announcement by IBM that it intends to acquire FileNet was a little surprising. Many observers had been suggesting that FileNet would drop soon, but I don't believe anyone guessed it would be to IBM. My outsider view was that there was far too much overlap in their offerings for this to make sense from a technology viewpoint.

Thinking again, there seems to be some sense to the acquisition, beyond absorbing a competitor. As Sandy Kemsley suggests in her Column 2 blog, the IBM SOA/integration capabilities paired with FileNet BPM could be a powerful combination. According to Sandy:
This is an area where FileNet provides a quite different and possibly complementary product to IBM, so I think that FileNet's BPM product could actually survive, get properly integrated with the IBM integration substructure, and become the product that it should have been years ago.

Much of the IBM data integration capability comes from their acquisition of Ascential, around March 2005. Combining the capabilities of WebSphere Information Integrator to access and manage a range of datasources with Ascential's abilities to transform and integrate the data into something meaningful, for migration or pure business intelligence, is quite powerful. One use case quoted is:
For example, a company trying to consolidate data from multiple ERP systems into a single system could leverage WebSphere Information Integrator to access various mainframe or distributed sources for profiling and assessment, and then use Ascential Software's data migration and transformation capabilities to integrate the data.

This gave IBM a sound basis for integration, migration and business intelligence at a technical level. Now add to that IBM's desire to strengthen its vertical solution plays. Last week's acquisition of Webify, to provide industry-specific integration accelerators, makes a lot of sense. If it works well, this gives IBM the ability to get at even more data from common vertical industry systems, especially across Insurance and Healthcare, with ready-built adaptors and tools. Far more rapid deployment of systems becomes possible with this technology, as does the opportunity to dislodge the niche integration vendors in certain industry segments, enabling IBM to own far more of an organization's overall architecture.

In combination, the integration pieces also enable IBM to handle one of the trickiest parts of systems combining a range of business applications and business process management. With some smart thinking they should be able to work out how to integrate the data to more easily provide a common structured datastore, enabling simplified synchronization of all of the data from all of the system components including BPM.

Organizations will benefit from this at several levels. At the technical level they will benefit if IBM can make this synchronization of shared data simpler and more accurate. At the business level they will benefit from enhanced data accuracy and consistency, driving better customer service and a reduced rate of operational errors.

I have no idea if IBM has a formal vision for how all of these pieces fit together, beyond a spending spree. From a generic systems standpoint, I would see a common architecture a little like this:

This architecture gives IBM the ability to:

  • Orchestrate human and systems processes (BPM)
  • Manage a central integration infrastructure (WebSphere Information Integrator)
  • Migrate and integrate data to a central datastore (Ascential / DB2)
  • Access a range of industry specific datasources (Webify)
  • Store unstructured data related to business processes (Content Manager / FileNet)
  • Manage the information lifecycle of all unstructured (documents) AND structured (systems data) information through records retention policies (Records Manager)
  • Present user and analytics applications through a common portal interface
This is a powerful architectural mix that very few (if any) vendors can provide without having to partner with others.

With the FileNet acquisition, IBM gains a foot in the door to many corporations that have problems that they may have been attempting to solve with SOA and BPM in a piecemeal fashion. With the other recent acquisitions IBM could claim the ability to tackle these problems with a broader, all-encompassing technical approach.

If IBM really pushes this hard we could be about to experience the next generation of organizations bravely attempting massive systems re-engineering exercises, this time with a suite of products from a single vendor. Big Blue will again be the central figure of IT in many organizations.


Wednesday, August 09, 2006

Long term document format portability is not important

A post by David Perry on the Freeform Comment blog, ODF Debate: A real world view, caught my eye. It discusses the competition between the Open Document Format and Microsoft's new Open XML format for Office 2007.

An interesting point is raised:
We must also remember that Microsoft has serious plans to build a developer community around Office 2007 so, just as with .Net, and Visual Basic, we can anticipate a growing level of support for Open XML from ISVs that is likely outstrip ODF, at least in the short to medium term. If you have an application or service that you think should be integrated with or accessible through an “office like” application, or has the ability to manipulate an office style document, should you build around Open XML and reach 90% plus of the market, or ODF and reach a minority - no-brainer really. Perhaps it ain’t fair, possible it ain’t right, but that’s the real world.

This makes sense for document editing applications. As David also says, setting yourself up for document compatibility problems is unthinkable in a business sense, when you suddenly can't read mission critical documents 25 years after their initial creation. Being able to view documents long-term is essential, and ODF v. Open XML presents a challenge to that.

I look at the problem of document standards over the lifecycle of the document:

The lifecycle works such that the draft, review or 'work in progress' timeline is typically relatively short compared to the timeline after the point of publishing where, in a well-controlled organization, the document is made an official record.

ODF and Open XML apply to the document in its 'work in progress' state. PDF/A should be the published format that provides a perfectly repeatable rendition of the document on every view, but does not require further editing.

To my mind the most important task for the work in progress formats (ODF and Open XML) is enabling editing in whichever application the user chooses. That said, early in the document lifecycle, which is fairly short, file format portability matters only within the limited set of application versions available at that time. In an ideal world Open Office should not have to provide support for an MSFT Office format version that is not current. Vice versa, MSFT Office should not have to provide support for an ODF format that is not current. By current I mean with a significant number of users authoring documents. In both cases I am just worried about the editing of work in progress documents, and that happens over a fairly short period of time and therefore with a limited set of available application versions (nobody in the real world uses MS Word prior to v6, do they?).

After publishing, my primary concern as a user is being able to read the document, exactly as published, time after time. PDF/A is the enabling format for this, supported by almost everyone. Whether this will be achieved is a little dependent on whether MSFT gets over its spat with Adobe and just uses PDF/A, rather than Adobe's proprietary PDF format.

This does not mean that organizations do not need the ability to edit published documents year in, year out. These types of vital documents are handled by retaining an editable version of the document alongside the published version. If the document is edited over time, the portability between tools will remain current, and changes to the standard tool used in an organization will be handled by saving to the new format on the next round of editing.


Document format portability is essential to allow organizations to select their editing application of choice, and to be sure that their partners can collaborate with them in the editing of work in progress documents. The portability of every combination of document format version across every version of the tool is not required, since editing should be over a relatively short period of time compared to the overall document lifecycle.

PDF has been adopted by almost every organization for publishing final documents, so there is no fear that they will not be able to read those documents into the future.

The ODF v. Open XML argument for long term viewing of documents is moot: do not rely on document formats designed for editing to provide long term viewing capability - use PDF/A instead.


Tuesday, August 08, 2006

Identity theft: banks must start monitoring

Identity theft is a big issue, as we are constantly being reminded by our banks and credit card companies. Some even appear to try and profit from the fear of this problem by offering fee-based monitoring services, often by offering customers a copy of their free credit report as an enticement.

A Bank Systems & Technology article, Agencies Issue Proposed Rule on Identity Theft 'Red Flags', reports that US federal banking regulators are proposing new rules requiring banks to monitor customers' accounts as part of their standard operations:
The proposed regulations include guidelines listing patterns, practices and specific forms of activity that should raise a "red flag" signaling a possible risk of identity theft. Under proposed regulations, an identity theft prevention program established by a financial institution or creditor would have to include policies and procedures for detecting any "red flag" relevant to its own operations and implementing a mitigation strategy appropriate for the level of risk, according to a release from the agencies.

Although it is likely that banks will come back with questions regarding this proposal, an identity theft program seems close enough in appearance to their ongoing anti-money laundering (AML) programs that there will be little additional compliance burden. Specifically, the program as mandated by the Bank Secrecy Act (BSA) requires monitoring for suspicious activity, including specific money laundering 'red-flags'.

It is likely that financial institutions will be able to leverage current technology, or use this event as a driver to invest in appropriate technology, to perform automated monitoring and analysis of transactions and activities to also encompass the identity theft 'red flags'. From the Bank Systems & Technology report:

The proposal lists 31 red flags in connection with an account application or an existing account, including:

  • A notice of address discrepancy is provided by a consumer reporting agency.
  • The photograph or physical description on the identification is not consistent with the appearance of the applicant or customer presenting the identification.
  • An account that has been inactive for a reasonably lengthy period of time is used.
  • The financial institution or creditor is notified that the customer is not receiving account statements.
  • An employee has accessed or downloaded an unusually large number of customer account records.

It would surprise me if some of these items were not already included in the AML program. For example, Know Your Customer requires that an institution verifies the identity of new customers. Discrepancies with other sources of information should automatically flag an issue. Not all 'red flags' will apply to every bank, and their risk assessments will help mould the scope of the new compliance program.

James Taylor often blogs about the capabilities of business rules and decision management to address these types of issues. Once in place these systems enable institutions to respond to this type of compliance monitoring rapidly and with minimal incremental cost. These approaches, along with basic monitoring within BPM processes such as New Account Opening could provide everything that is required across a range of identity theft, fraud and AML requirements. Another approach to look at is Aungate, which has examples of background monitoring capabilities.
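As a rough idea of what rule-based monitoring might look like, here is a minimal Python sketch that checks account activity against three of the red flags listed above. It is not how any particular rules engine or vendor product works; the field names, the rule wording and the 365-day dormancy threshold are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccountActivity:
    days_inactive_before_use: int
    address_discrepancy_notice: bool
    statements_not_received: bool


# Hypothetical threshold; each bank would set this from its own risk assessment.
DORMANT_DAYS = 365

# Each rule is a named predicate, so rules can be added without touching the checker.
RULES = {
    "address discrepancy reported": lambda a: a.address_discrepancy_notice,
    "dormant account used": lambda a: a.days_inactive_before_use >= DORMANT_DAYS,
    "customer not receiving statements": lambda a: a.statements_not_received,
}


def red_flags(activity):
    """Return the names of every rule the activity triggers."""
    return [name for name, rule in RULES.items() if rule(activity)]


flags = red_flags(AccountActivity(
    days_inactive_before_use=400,
    address_discrepancy_notice=False,
    statements_not_received=True,
))
```

Keeping the rules in a table rather than in code branches is what would let an institution add a new 'red flag' at low incremental cost, and the table itself doubles as documentation of what is being monitored.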

As with any compliance regulation, the documentation and periodic audit of the program and controls may end up being a larger effort than actually putting it in place. Well-designed automated systems reduce this burden by being effectively self-documenting and readily available for audit. A decent document management system, or an enterprise compliance management system (e.g. Certus), will hold all that documentation.

Since it seems that some banks already perform some of this monitoring as a fee-based service, this regulation may purely represent a revenue stream that may be going away soon.
