As I continue my offline delve into all things SOA and BPM, the concept of Master Data Management (MDM) keeps coming up. In the past I just assumed that MDM was about data warehousing and maybe even a grand vision for CRM - really just a way of collecting all of the information you have about your customers that is spread across a bunch of disparate legacy, COTS and home-grown systems, and having it available whenever you need it. When you bring SOA and BPM into the MDM picture, things seem to get complicated.
Pre-SOA, the problem with disparate systems was that applications typically required swivel-chair integration: re-keying portions of data from one system into another or, at best, running occasional batch loads. A single accurate view of a customer's information rarely existed, and any organization that has not replaced all its systems still shows how widely that data is spread around. For a different example, when you start a new job, look at how many times you enter the same personal information onto different forms. Each form exists to simplify the entry of your data into a separate system, and the duplicated information on each is an insight into this swivel-chair IT world.
When BPM was brought into the picture, traditional workflow systems typically added to the issue by copying data into a process instance at the start and never sending updates back. Where workflow did use live customer data, it tended to extract it from a point system with no regard for how reliable that data was; the choice was driven by how easy the system was to access, for lack of other good integration mechanisms.
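To make the copy-at-start problem concrete, here is a minimal sketch. Everything in it is hypothetical: `customer_system` is an in-memory stand-in for a backend system, and the two process classes are illustrations rather than any real workflow product's API.

```python
from dataclasses import dataclass, field
import copy

# Hypothetical in-memory stand-in for a backend customer system.
customer_system = {"C1": {"name": "Ada Smith", "address": "1 Old Road"}}


@dataclass
class SnapshotProcess:
    """Traditional workflow: copies customer data when the instance starts."""
    customer_id: str
    data: dict = field(init=False)

    def __post_init__(self):
        self.data = copy.deepcopy(customer_system[self.customer_id])


@dataclass
class ReferenceProcess:
    """Alternative: holds only the key and reads the owning system on demand."""
    customer_id: str

    @property
    def data(self):
        return customer_system[self.customer_id]


snapshot = SnapshotProcess("C1")
reference = ReferenceProcess("C1")

# The customer moves while both process instances are still running.
customer_system["C1"]["address"] = "2 New Street"

print(snapshot.data["address"])   # 1 Old Road (the stale copy)
print(reference.data["address"])  # 2 New Street (the current value)
```

The snapshot instance carries on working with an address that is no longer true, which is exactly the kind of duplication that a long-running process makes worse.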
Customer data reflects our current knowledge of our customers, and should be affected by everything we do with them: every transaction that is made, every interaction that we have and any background tasks that are going on. If every system that records customer data for itself is not effectively synchronized with the others, even SOA is going to struggle to pull disparate systems into meaningful and accurate business services. To me this seems like a fundamentally unreliable piece of SOA. Each service has to rely not only on the actions backend systems perform, but also on an understanding of the data each system uses and how it may be inconsistent with another system used by the same service.
MDM provides a way to synchronize and pull data together from the underlying systems into a central place, and this consistent and current layer of master data does appear to have some value. I can also see that it is useful to be able to build new business logic on top of reliable master data, abstracted once from all the underlying sources of unreliable and disjointed data. This makes data reusable and new business logic easier to build and more reliable.
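As a rough illustration of what "pulling data together" could mean, the sketch below merges records for one customer from three systems into a single master record. The system names, field names, dates and the "most recently updated non-empty value wins" rule are all assumptions made for the example; real MDM products offer many other survivorship rules.

```python
from datetime import date

# Hypothetical records for the same customer held by three systems.
source_records = [
    {"system": "billing", "updated": date(2007, 3, 1),
     "email": "ada@old.example", "phone": None},
    {"system": "crm", "updated": date(2007, 9, 15),
     "email": "ada@new.example", "phone": None},
    {"system": "support", "updated": date(2007, 6, 2),
     "email": None, "phone": "555-0100"},
]


def build_master_record(records, fields):
    """Per field, keep the most recently updated non-empty value."""
    master = {}
    for name in fields:
        candidates = [r for r in records if r.get(name) is not None]
        if candidates:
            newest = max(candidates, key=lambda r: r["updated"])
            # Record the source so the value can be traced back to its owner.
            master[name] = {"value": newest[name], "source": newest["system"]}
    return master


master = build_master_record(source_records, ["email", "phone"])
print(master["email"])  # {'value': 'ada@new.example', 'source': 'crm'}
print(master["phone"])  # {'value': '555-0100', 'source': 'support'}
```

Keeping the source system next to each value matters here: the master record is only a consolidated view, and the underlying systems still own the data.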
The problem is that I don't see how MDM helps SOA. SOA needs to work with disparate backend systems largely intact, benefiting from the logic they already provide. It should not be trying to replicate or rebuild the business logic in underlying systems, since if you are going to do that you might as well rip and replace those systems, not duplicate the logic in the integration layer.
To round it all out, SOA in combination with BPM needs to be aware of data inconsistency when using disparate backend systems. For them to be effective, every process and service call should feed up-to-date data back into the backend systems that own it. For BPM this requires strong data modeling and integration (with SOA interoperability) to prevent process data duplication - something we have not seen in traditional systems, but I'm seeing more of now.
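One way to picture that feedback loop is a write-back step that routes each changed field to the system that owns it. This is only a sketch under assumed names: the `FIELD_OWNER` map, the `backends` dictionaries and the `write_back` function are hypothetical, and a real implementation would call services rather than update dictionaries.

```python
# Hypothetical ownership map: which backend system owns each customer field.
FIELD_OWNER = {"address": "crm", "credit_limit": "billing"}

# In-memory stand-ins for the backend systems.
backends = {
    "crm": {"C1": {"address": "1 Old Road"}},
    "billing": {"C1": {"credit_limit": 500}},
}


def write_back(customer_id, changes):
    """Send each changed field to its owning system; return where each went."""
    routed = {}
    for field_name, value in changes.items():
        owner = FIELD_OWNER[field_name]
        backends[owner][customer_id][field_name] = value
        routed[field_name] = owner
    return routed


# A process finishes and pushes its results to the owners of the data.
routed = write_back("C1", {"address": "2 New Street", "credit_limit": 750})
print(routed)  # {'address': 'crm', 'credit_limit': 'billing'}
```

The point of the ownership map is that the process never becomes a second home for the data; it hands every change to the system that is meant to hold it.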
Maybe MDM can be useful as a component of SOA/BPM, but right now I'm struggling to see how it doesn't just become another layer of data that disagrees with everything else in the enterprise.