Pages

Tuesday, December 21, 2010

From Spreadsheet to Data Model

Happy Holidays everybody.

In an ideal world, the physical model is derived from a logical model. The modeling team has done their due diligence and compiled all the business requirements and a consensus of naming standards, notations, and process definitions have already been agreed upon.

However, we do not live in an ideal world. The reality is that we often have to begin with the current physical environment. Any process of documenting and standardizing our design requires working backward from the current status to a more generalized one.

This often leads to the fact that there is a spreadsheet somewhere (there always is) that contains our documented logical attribute names and a data model that was reverse engineered from the database environment. How can we integrate the two?

What follows is a step by step process of integrating your logical names and your physical model using CA ERwin Data Modeler. You will need a Physical Model to start with and a spreadsheet with the documented attribute names.

1 - Reverse engineer (RE) your database as a Physical Only (PO) model and save it. If you already have a combined Logical/Physical model, you can use the Tools | Split model to derive a PO model. The RE can be against any supported database version (even ODBC) or a flat file containing the SQL statements to generate the objects.


2 - We will need to format the documentation of the column to logical names so that the logical names are in the first column of the spreadsheet and so that there is no header information. So the example documentation below...


...is reformatted into the format below...

Don't worry if you have duplicate column names. Even with duplicates, the mapping will work. There may be issues that will need to be manually corrected but I will save that discussion for the end.

3 - Save this Excel file as a CSV file, using the File | Save As... feature.

4 - In ERwin, in the Tools | Names | Model Naming Options select the Use File option and click Edit


This opens the Naming Standards Editor. We can manually build a glossary here but we will be importing, instead.

5 - Go to the Glossary tab and select the Import button. Notice that the expected File type in the File Selection editor is a CSV file. Select the CSV file from step 3. This will populate the glossary


6 - After building the glossary, be sure to save this glossary using the File | Save. This will create a NSM file. The NSM file is a proprietary ERwin file used in Name Mapping.

7 - Once you save the NSM, exit the editor. You will be back in the Model Naming Options editor in the ERwin software. Browse to the newly created NSM file.

8 - While still in the Model Naming Options editor, be sure to click on the Name Mapping tab and select the option to use the glossary for mapping your logical to physical attribute names. This step is essential and often overlooked.



9 - Click OK to save these changes.

10 - Use the Tools | Derive New Model and select your model type as Logical only model.


Notice that in the Naming Standards options, in this editor, that the NSM file attached to the current physical model is already selected. This is where an alternate NSM file could also be used, if necessary, in the future.

11 - Click Derive.

12 - If all goes correctly, we will be seeing our expanded Logical names in the new Logical model. Be sure to save this new Logical Only model version.

We're almost done. Now, let's derive the final combined model.

13 - In the current Model use the Tools | Derive New Model. Specify the Model Type to be Logical/Physical and make sure that the database version matches the original model (in case it does not) and click Derive.


The resulting Logical Physical model will allow you to toggle from the logical to the physical display and show the appropriate expansion or abbreviations based on the mappings.









Final notes and a caveat:

I am often asked if this works with foreign languages or special characters. The answer is yes. A user can use this technique even with foreign characters. Another frequent question is if the glossary can be refined, afterward, so that phrases can be replaced with individual word mappings. Again, the answer is yes. The nicest thing about the above outlined technique is that the final derived combined model is actively mapping based on the NSM glossary. The glossary can be modified and the model will update accordingly.

There is one caveat to this process and that is that you may often find that the same column (perhaps ID) can exist in many places but can have different expanded logical names (Department Identifier in the Department table and Employee Identifier in the Employee table, perhaps). This technique will not automatically be able to differentiate these and will map both instances of column to the same expansion. It would be pretty easy to identify the duplicates in the spreadsheet by sorting on the column names. These duplicates would need to be manually fixed in the model, by renaming these attributes in the in the logical model.