Pages

Friday, June 18, 2010

Ensuring Data Quality and Governance Using Model Templates

In an ideal world the contents and purpose of any database table could be intuited solely on its name. Wouldn’t it help the development process to understand the content of a column based purely on its label? Wouldn’t data analysis be easily accomplished if system reports always implemented naming standards for all business objects on different systems? Wouldn't it facilitate data warehousing if similar data was always stored with similar datatypes regardless of the system in question? The answer, in every case, is yes. But this is easier said than done.

Whether you are a data architect designing a set of system tables, the database administrator who maintains that system, the development team who writes applications against an existing system, or even the business analyst that relies on report data to analyze current and future trends; the standardization of your design across your enterprise lends a level of transparency that facilitates work flow.

Data makes the world go round and the success of our business decisions, in large part, depends on the quality of our data. This has brought Master Data Management (MDM) to the forefront of every business discussion. As data stores continue to grow, the question becomes how to best ensure the transparency and quality of this data to all levels of the organization. The cornerstone of any MDM initiative should be a well defined data model.

A powerful data modeling tool can greatly facilitate this process by allowing you to define, maintain and share a set of corporate standards. Once these standards are defined, their reusability and convenient maintenance can save time and money.

The CA ERwin Data Modeler has an especially robust number of reusable features that can easily be shared as standalone template files as well as to the shared Model Manager repository. With that in mind, I am launching a series of articles relating to the reusability of design to ensure design integrity and data quality.

If you would like to try some of the exercises in this series, CA offers a limited Community edition of Erwin (available here) as well as a full version of the tool with a 15 day limited trial (available here).

In this first post I will be looking at the ability to define a reusable Domain object to define a column that changes depending on its context. Note that you can click on any image for a larger view.

First, we need to create a new model. We can define any physical database as our target server in this particular example. We create a new Domain in the Logical model and call it ID. Define the Domain Parent as Number. In the Datatype tab, define select INTEGER from the list.
In the general tab, notice that the default Name Inherited by Attribute is %AttDomain. This simply means that the domain name will be the domain name in a new occurrence of the attribute.
We modify this to read %OwnerEntity_%AttDomain:
This will prefix the domain name with the Owner entity. Note that this is not the Entity Name but rather the entity in which the column first occurs. This allows the name to remain constant as it migrates to other entities. So let us see the results.

Click OK to exit the editor. Now create two entities called Parent and Child.
Drag the ID domain into the parent column from the Model Explorer and release it. The result is:
Draw an identifying relationship from Parent to Child and note the migrating column name:
Now drag the ID domain into the Child table:
In this way, we can create domains that automatically follow a corporate standard to facilitate naming conventions. The Domain can further be customized to contain Defaults and Constraints and also to create UDPs and even to define default Data Warehouse rules on each column.
We will be covering the use of UDPs and implementation of Data Warehousing concepts in future entries in this series so stay tuned.

No comments:

Post a Comment