Copyright © 2002 Essential Strategies, Inc.

The Common Warehouse Metamodel (CWM) and the Meta Object Facility (MOF)

David C. Hay

The Object Management Group (OMG) has published a UML data model of its view of metadata. The Common Warehouse Metamodel (CWM) is characterized by the OMG as a "standard ... for meta data interchange in the data warehousing and business analysis environments. CWM provides the long sought-after common language for describing meta data (based on a generic, but semantically replete, common data warehousing and business analysis domain metamodel)."[1]
In addition, they have published a UML model purporting to be the syntax used to create the CWM. Called the Meta Object Facility, it is "the 'abstract language' for defining different kinds of metadata."[2]
The only difficulty with this is that these models are very complex and, as rendered in the UML, very difficult to understand. They are documented by means of a Specification (http://www.omg.org/technology/documents/formal/cwm.htm for CWM and http://www.omg.org/technology/documents/formal/mof.htm for MOF), and by means of a book on CWM, Common Warehouse Metamodel, by John Poole, Dan Chang, Douglas Tolbert, and David Mellor. The book is a good overview, describing the various sections of the model, but its representations do not include relationship descriptions, nor do they include any indication of cardinality or optionality. Moreover, there is no glossary to define the various classes, and the definitions in the text are of varying quality. The Specification is more difficult to read, but it does include more detail, including some relationship names and the optionality and cardinality indicators for associations. It does include a glossary of technical terms, but this does not include definitions of the classes.
The models are very abstract and make extensive use of generalization. As a result, any portrayal of a detailed section of the model leaves out important associations that were defined at a higher level. One solution to this has been to add lower level relationships in some cases, but these are redundant in the overall model.
So, Essential Strategies has taken it upon itself to render the models into the entity relationship notation promoted by Richard Barker and the Oracle Corporation. This allows portrayal of sub-type boxes within super-type boxes, so that the full impact of the hierarchies can be seen.
The CWM models are organized in terms of the same "packages" portrayed in the book, although the relational package has a second drawing to show SQL aspects that are in the Specification but not the book. Each diagram is labeled with the figure number of the corresponding drawing in the book. The entity types in each diagram represent exactly the classes shown in the corresponding book diagram. In some cases, there were differences between the book and the Specification versions, and these are highlighted.
Because sub-types are shown inside super-types, the relationships among the associations at different levels of generalization are shown. This approach allows one to see the full array of associations between classes on all levels. This comes at the cost, however, of its being difficult to represent multiple inheritance. Where multiple inheritance comes up, it is shown by replicating the entity type boxes. This is not very rigorous, but it is to be hoped that it will suffice for purposes of this exposition.
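As a rough illustration of why nesting helps, the sketch below collects the relationships visible at one entity type by walking up its chain of super-types. The EntityType class and the entity and relationship names are hypothetical, invented for this example; they are not taken from the CWM or from the diagrams. Only single inheritance is handled, which mirrors the limitation noted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntityType:
    name: str
    supertype: Optional["EntityType"] = None
    # Relationships declared at this level only.
    relationships: List[str] = field(default_factory=list)

    def all_relationships(self) -> List[str]:
        """Relationships declared here plus those inherited from every super-type."""
        inherited = self.supertype.all_relationships() if self.supertype else []
        return inherited + self.relationships


# Hypothetical hierarchy, for illustration only (not taken from the CWM).
element = EntityType("ELEMENT", relationships=["part of NAMESPACE"])
structure = EntityType("STRUCTURE", supertype=element,
                       relationships=["composed of FEATURE"])
table = EntityType("TABLE", supertype=structure,
                   relationships=["keyed by KEY"])

print(table.relationships)        # what a detail-level drawing alone would show
print(table.all_relationships())  # what the nested boxes make visible
```

A drawing of TABLE alone would show only the relationship declared at that level, while the nested view corresponds to the second, longer list.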
In nearly every case, the relationships have been named. In some cases it was impossible to guess a name, and question marks denote the association roles. Wherever aggregation or composition was shown in the UML model, in almost every case, the names are the UML's standard "part of" and "composed of". In some cases, however, other names seemed more appropriate and were used.


How to Read a Data Model

This data modeling notation is analogous to UML. Instead of class boxes, you have entity type boxes. Instead of associations, you have relationship lines. In each case, the relationship lines are named (in both directions) so that the relationships can be read as normal English sentences.
A "crow's foot" at one end of a relationship means that more than one occurrence of the attached entity type may be associated with the entity type at the other end. Absence of a crow's foot means that only one occurrence may be associated.
A dashed half line means that an occurrence of the entity type on the far end of the relationship line is optional. It may or may not occur. A solid half line means that at least one occurrence of the entity type at the far end is required. These rules, combined with the naming convention, produce the following structure for naming relationships:

  • Each

  • [first entity type name]

  • must be (if the line is solid)
    (or)
    may be (if the line is dashed)

  • [relationship name]

  • one or more (if a crow's foot is present)
    (or)
    one and only one (if no crow's foot is present)

  • [second entity type name].




    For example, in the diagram above,

    1. Each DEPLOYED COMPONENT must be on one and only one MACHINE.
    2. Each MACHINE may be the site of one or more DEPLOYED COMPONENTS.
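
The reading rule above can be sketched in a few lines of Python. This is only an illustration of the sentence pattern, not part of the CWM or of any modeling tool; the RelationshipEnd class and its field names are assumptions made for this example, and the pluralisation is deliberately naive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEnd:
    """One direction of a relationship line (hypothetical helper)."""
    subject: str     # first entity type name
    name: str        # relationship name, e.g. "on"
    obj: str         # second entity type name
    required: bool   # True if the half line is solid, False if dashed
    many: bool       # True if a crow's foot is present at the far end

    def sentence(self) -> str:
        modality = "must be" if self.required else "may be"
        cardinality = "one or more" if self.many else "one and only one"
        # Naive plural: append "S" when many; enough for this illustration.
        target = self.obj + "S" if self.many else self.obj
        return f"Each {self.subject} {modality} {self.name} {cardinality} {target}."


on_machine = RelationshipEnd("DEPLOYED COMPONENT", "on", "MACHINE",
                             required=True, many=False)
site_of = RelationshipEnd("MACHINE", "the site of", "DEPLOYED COMPONENT",
                          required=False, many=True)

print(on_machine.sentence())
print(site_of.sentence())
```

Run as written, the two print calls reproduce the two example sentences given above, one for each direction of the same relationship line.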



Except for OLAP and Data Mining, the CWM packages may be seen by clicking on one or more of the following sections: