Maximum Data Modeling: July 2010

As Thomas Friedman famously pointed out, the world is flat (or at least, flattish). But with no universal language to unite us, English has become the de facto international language, the modern-day Esperanto. Web-based translation tools have become more accurate and agile in recent years, allowing multilingual conversations in forums and across enterprises. But a problem remains in this ever-diversifying world: how can we integrate complex systems that were defined in different native languages? In a world of international corporate mergers, how can we make our data play nice?

Data quality is key to the success of any company: the cleaner and more accurate your data, the better your business decisions will be. Some corporations continue to delay the Master Data Management (MDM) process due to a lack of leadership, or simply a lack of the time and resources to achieve these goals. But the longer they wait, the more these delays will almost certainly cost. As insurmountable as mastering one's sprawling data may appear, delaying the effort will be far more costly and time-consuming; after all, there will only be more data to wrangle. The reality is that data is a corporation's greatest resource, and staying ahead of the pack requires the ability to identify or predict trends, to focus efforts, and to avoid redundant work. All of this requires a deep understanding of our data. Of course, that understanding depends on the quality of the data, and a language barrier compounds the problem even further.

Luckily, regardless of which MDM method your corporation eventually implements, there are basic strategies that can facilitate the process. One of the most fundamental is to establish strongly defined Data Standards. For the purposes of this article, I am simply talking about creating a glossary of terms that becomes the universal vocabulary for an enterprise. In so doing, we facilitate communication across the entire enterprise: whether we are designing an application or generating a report of weekly sales, the meaning of the data is clear from its name. The same method enables international collaboration.

The following is an actual case in which an English-speaking US team and a Spanish team needed to integrate their systems. The example has been simplified, but the process would be valid for any larger project.

The first step was to have the US and Spanish teams each compile a glossary of business terms. Luckily, both teams had already created a glossary and were enforcing naming standards using a data modeling tool.

The next step was to translate the glossary values defined in each corporate standard and align them with each other. This was done mostly with available translation tools and the help of a bilingual employee.
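To make the alignment step concrete, here is a minimal Python sketch. It assumes each team's glossary is a plain list of terms. It also assumes the output of the translation tools, once checked by the bilingual reviewer, can be held as an English-to-Spanish dictionary. `align_glossaries` and every term shown are hypothetical illustrations, not the teams' actual data. A term is paired when its translation matches a Spanish entry, and everything else is reported as unaligned.

```python
"""Minimal sketch: align two glossaries through a translation table.

All terms are hypothetical; TRANSLATIONS stands in for the output of
translation tools reviewed by a bilingual employee.
"""

ENGLISH_TERMS = ["Employee", "Identifier", "Name", "Cost Center"]
SPANISH_TERMS = ["Empleado", "Identificador", "Nombre", "Centro de Coste", "Sucursal"]

# Hypothetical machine translations. "Cost Center" came back as
# "Centro de Costo", which is not the Spanish team's own term.
TRANSLATIONS = {
    "Employee": "Empleado",
    "Identifier": "Identificador",
    "Name": "Nombre",
    "Cost Center": "Centro de Costo",
}


def align_glossaries(english_terms, spanish_terms, translations):
    """Return (aligned pairs, unaligned English terms, unaligned Spanish terms)."""
    spanish_set = set(spanish_terms)
    aligned, unaligned_en = [], []
    for term in english_terms:
        candidate = translations.get(term)  # None if no translation exists
        if candidate in spanish_set:
            aligned.append((term, candidate))
        else:
            unaligned_en.append(term)
    matched_es = {es for _, es in aligned}
    unaligned_es = [t for t in spanish_terms if t not in matched_es]
    return aligned, unaligned_en, unaligned_es


if __name__ == "__main__":
    aligned, only_en, only_es = align_glossaries(ENGLISH_TERMS, SPANISH_TERMS, TRANSLATIONS)
    print("aligned:", aligned)
    print("unaligned English:", only_en)
    print("unaligned Spanish:", only_es)
```

In this hypothetical run, "Cost Center" stays unaligned because the machine translation does not match the Spanish team's own wording. That is the kind of near miss only a human review catches.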

Alignment was fairly straightforward, but a number of terms remained unaligned. All of these were discussed in a meeting between the two teams. The discussion revealed that some of them did in fact map to each other. Others existed in only one system, and these were added to the shared glossary. The rest needed to be discussed in greater detail. In the end, each of these was either added to the glossary or dropped in favor of an existing term. This turned out to be the most time-consuming part of the process.
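The outcome of that meeting can be recorded as one decision per unaligned term and then applied mechanically. The sketch below assumes the shared glossary is a dictionary mapping English terms to Spanish terms. `apply_resolutions`, the three action names, and all terms are hypothetical.

```python
"""Minimal sketch: record the outcome of the review meeting and apply it.

The shared glossary is a dict of English term -> Spanish term. Every term
and decision here is a hypothetical example.
"""

SHARED = {"Employee": "Empleado", "Identifier": "Identificador", "Name": "Nombre"}

# Outcome per unaligned term (all hypothetical):
#   "map"          - the term does match a counterpart after all
#   "add"          - the term exists in only one system; add a new pair
#   "use_existing" - drop the term and use an existing glossary term instead
RESOLUTIONS = {
    "Cost Center": ("map", ("Cost Center", "Centro de Coste")),
    "Sucursal": ("add", ("Branch", "Sucursal")),
    "Código": ("use_existing", "Identifier"),
}


def apply_resolutions(shared, resolutions):
    """Return (updated glossary, map of dropped term -> replacement term)."""
    updated = dict(shared)  # leave the input glossary untouched
    replacements = {}
    for term, (action, value) in resolutions.items():
        if action in ("map", "add"):
            # Both outcomes end with a new English/Spanish pair in the glossary.
            english, spanish = value
            updated[english] = spanish
        elif action == "use_existing":
            if value not in updated:
                raise KeyError(f"{value!r} is not in the shared glossary")
            replacements[term] = value
        else:
            raise ValueError(f"unknown action {action!r} for {term!r}")
    return updated, replacements


if __name__ == "__main__":
    glossary, replaced = apply_resolutions(SHARED, RESOLUTIONS)
    print(glossary)
    print(replaced)
```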

Next, the existing glossary file, which mapped English names to physical abbreviations, was updated, and another glossary was generated from it using the translations produced in the steps above. A section of the resulting files follows:

Each glossary value was replaced with its foreign counterpart in the Spanish version of the file:
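Here is a minimal sketch of generating the Spanish file. It assumes a simple two-column CSV layout (`name,abbreviation`). The layout and every entry are hypothetical, since a modeling tool keeps its naming standards in its own format. Only the names are translated; the abbreviations are copied unchanged.

```python
"""Minimal sketch: derive a Spanish glossary file from the English one.

The CSV layout (name,abbreviation) and every entry are hypothetical.
"""
import csv
import io

ENGLISH_GLOSSARY_CSV = """name,abbreviation
Employee,EMP
Identifier,ID
Name,NM
Department,DEPT
"""

# Agreed English -> Spanish translations from the alignment step (hypothetical).
TRANSLATIONS = {
    "Employee": "Empleado",
    "Identifier": "Identificador",
    "Name": "Nombre",
    "Department": "Departamento",
}


def translate_glossary(english_csv, translations):
    """Replace each name with its Spanish counterpart, keeping abbreviations.

    Raises KeyError if a glossary name has no agreed translation.
    """
    reader = csv.DictReader(io.StringIO(english_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "abbreviation"], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({"name": translations[row["name"]],
                         "abbreviation": row["abbreviation"]})
    return out.getvalue()


if __name__ == "__main__":
    print(translate_glossary(ENGLISH_GLOSSARY_CSV, TRANSLATIONS), end="")
```

Keeping the abbreviations identical is the point of this design. Both glossaries resolve to the same physical names, which lets the English and Spanish logical models share one physical design.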

Since the teams were using a data modeling tool and naming standards, the logical design was already synchronized with the physical table design. For example, the following Employee table...

...transformed by the glossary, defined the physical table...
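A naming standard of this kind can be sketched as a word-by-word lookup. Several things here are hypothetical: the rule itself (replace each word of the logical name with its glossary abbreviation and join with an underscore), the glossary entries, and the column names.

```python
"""Minimal sketch: apply a glossary to turn logical names into physical ones.

The glossary entries and the naming rule are hypothetical stand-ins for a
naming standard.
"""

GLOSSARY = {"Employee": "EMP", "Identifier": "ID", "Name": "NM", "Department": "DEPT"}


def to_physical(logical_name, glossary, sep="_"):
    """Convert e.g. 'Employee Identifier' to 'EMP_ID'."""
    parts = []
    for word in logical_name.split():
        if word not in glossary:
            raise KeyError(f"{word!r} is not a glossary term")
        parts.append(glossary[word])
    return sep.join(parts)


# Hypothetical logical Employee table: (table name, column names).
LOGICAL_EMPLOYEE = ("Employee", ["Employee Identifier", "Employee Name", "Department Identifier"])


def physical_table(logical_table, glossary):
    """Return (physical table name, physical column names)."""
    name, columns = logical_table
    return to_physical(name, glossary), [to_physical(c, glossary) for c in columns]


if __name__ == "__main__":
    print(physical_table(LOGICAL_EMPLOYEE, GLOSSARY))
```

Raising an error on unknown words mirrors what a naming standard is for: a logical name that uses a term outside the glossary is flagged rather than silently abbreviated.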

This physical model was used to derive a new logical model using the translated Spanish language glossary. The results look like...
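Going the other way, a Spanish logical name can be derived from a physical name by reversing the Spanish glossary. This sketch assumes the same hypothetical abbreviations and underscore separator. It only works if each abbreviation belongs to exactly one term, so the reversal refuses duplicates.

```python
"""Minimal sketch: derive Spanish logical names from physical names.

Reverses a hypothetical Spanish glossary (name -> abbreviation) and
substitutes word by word, joining with "_".
"""

SPANISH_GLOSSARY = {"Empleado": "EMP", "Identificador": "ID", "Nombre": "NM", "Departamento": "DEPT"}


def reverse_glossary(glossary):
    """Map abbreviation -> name, refusing abbreviations shared by two names."""
    reverse = {}
    for name, abbr in glossary.items():
        if abbr in reverse:
            raise ValueError(f"{abbr!r} is ambiguous: {reverse[abbr]!r} and {name!r}")
        reverse[abbr] = name
    return reverse


def to_logical(physical_name, glossary, sep="_"):
    """Convert e.g. 'EMP_ID' to 'Empleado_Identificador'."""
    reverse = reverse_glossary(glossary)
    return sep.join(reverse[part] for part in physical_name.split(sep))


if __name__ == "__main__":
    for column in ["EMP_ID", "EMP_NM", "DEPT_ID"]:
        print(column, "->", to_logical(column, SPANISH_GLOSSARY))
```

Because the substitution is word by word, the result keeps the English word order. It yields `Empleado_Identificador` rather than a more natural Spanish phrasing, which matches the column name reported in this example.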

Even more useful, the original Spanish-language model could be linked to our physical Gold Standard model, and our model comparison showed an apples-to-apples alignment of objects.
The two models appear as...

But when we add our Gold Standard physical model as the source for the Spanish model, we see that the objects align based on our defined abbreviations:
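The alignment can be illustrated by keying each model on physical name instead of logical name. Both models below are hypothetical dictionaries of physical column name to logical name. `compare_models` is an illustrative helper, not the modeling tool's compare feature.

```python
"""Minimal sketch: compare two models by physical name.

Each model maps physical column name -> logical name. Both models and
all names are hypothetical.
"""

GOLD_STANDARD = {
    "EMP_ID": "Employee Identifier",
    "EMP_NM": "Employee Name",
    "DEPT_ID": "Department Identifier",
}
SPANISH_MODEL = {
    "EMP_NM": "Empleado_Nombre",
    "DEPT_ID": "Departamento_Identificador",
}


def compare_models(source, target):
    """Return (matched, only in source, only in target), keyed by physical name."""
    matched = sorted(source.keys() & target.keys())
    only_source = sorted(source.keys() - target.keys())
    only_target = sorted(target.keys() - source.keys())
    return matched, only_source, only_target


if __name__ == "__main__":
    # Matching on logical names finds nothing, since the languages differ.
    print(set(GOLD_STANDARD.values()) & set(SPANISH_MODEL.values()))
    # Matching on physical names lines the objects up.
    print(compare_models(GOLD_STANDARD, SPANISH_MODEL))
```

Matching on logical names finds nothing in common, while matching on physical names lines the objects up. The hypothetical `EMP_ID` left only in the source is the kind of gap such a comparison exposes.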

Using the sample described above, we notice that there is no Unique Employee ID on the current Employee table in the Spanish model. We decide that this column should be implemented so that identifiers are unique across the enterprise. We export the column to the Spanish model, and the result is the following:

Notice that the exported column appears as Empleado_Identificador and that the translation was automated by the attached naming standard file.
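The export step amounts to copying each physical column that the gold standard has and the Spanish model lacks, then naming it through the Spanish glossary. This is a minimal sketch under the same hypothetical assumptions: physical-to-logical dictionaries, underscore-separated abbreviations, and one term per abbreviation. `export_missing` is an illustrative name.

```python
"""Minimal sketch: export missing columns into the Spanish model.

Copies each physical column that the (hypothetical) gold standard has and
the Spanish model lacks, naming it from a hypothetical Spanish glossary.
"""

SPANISH_GLOSSARY = {"Empleado": "EMP", "Identificador": "ID", "Nombre": "NM", "Departamento": "DEPT"}
GOLD_STANDARD_COLUMNS = ["EMP_ID", "EMP_NM", "DEPT_ID"]
SPANISH_MODEL = {"EMP_NM": "Empleado_Nombre", "DEPT_ID": "Departamento_Identificador"}


def export_missing(gold_columns, target_model, glossary, sep="_"):
    """Return a new physical -> logical map with missing columns added."""
    # Assumes each abbreviation belongs to exactly one glossary term.
    by_abbreviation = {abbr: name for name, abbr in glossary.items()}
    updated = dict(target_model)  # leave the original model untouched
    for column in gold_columns:
        if column not in updated:
            # Translate word by word through the glossary abbreviations.
            updated[column] = sep.join(by_abbreviation[p] for p in column.split(sep))
    return updated


if __name__ == "__main__":
    result = export_missing(GOLD_STANDARD_COLUMNS, SPANISH_MODEL, SPANISH_GLOSSARY)
    print(result["EMP_ID"])
```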

So what are the benefits of all this? By defining a clear, common language, we allow our designers to work in their own languages while updating a shared model. This provides more transparency and easier reporting. It also saves time, because the translation is automated rather than repeated at several points in the process. Alternatively, English could have been imposed on everyone’s design, but this can lead to problems when developers writing applications misunderstand the contents of a column because of the language barrier. Users requiring reporting services would also need to have their reports translated multiple times. The initial investment saves a great deal of time in the long term.

The takeaway is that a strong glossary of agreed upon terminology and concepts can make information sharing across a multilingual enterprise far easier and more transparent. A strongly defined data model greatly facilitates this.


Monday, July 12, 2010

Se Hable Data Model - Integrating Your Global Enterprise