Copyright © 2006 Cutter Consortium.

Semantics, Ontologies, and Data Modeling

David C. Hay

Introduction

It has always been the case that as organizations get bigger and more diverse it becomes progressively harder for separate groups to communicate with each other.  Moreover, as each department’s functions become more specialized, a language arises from the specialty that further removes it from other departments.  In the past, companies dealt with this through the organizational hierarchy that limited the actual communication that took place between departments.

In the modern age, however, it has become progressively more important not only for people from different departments to work together, but also for their systems to work together.  In particular, as companies try to integrate systems, it has become necessary to take a corporate view that requires different departments, if not to use the same language, at least to understand how their languages differ.  It is, of course, one level of problem when an individual believes (s)he has passed along some information to another person, but the other has misunderstood it.  It is a greater problem when one system sends megabytes of data to another, and the meaning of those data is misunderstood (and therefore mishandled) at the receiving end.

For years now, some practitioners of data modeling have attempted to address this problem by using data models as a way of describing the semantics—the language and concepts—of an organization from a company-wide perspective.  The idea has always been that, if the concepts are correctly understood and these are the basis for system design, the resulting systems will be more robust and flexible—and they will then communicate with each other more effectively. 

This has been problematic, however, since the developers of systems have often not really understood or appreciated the more global view, and pressures upon them to deliver a solution quickly have prevented them from taking proper advantage of the insights.

The problem has gotten new attention, however, as semantics has been discovered from a different direction.  In particular, the advent of the Semantic Web, with its promise to allow more versatile manipulation of information on a world-wide basis, has highlighted the importance of unifying the semantics of different organizations.  On top of that, previously obscure disciplines such as artificial intelligence and linguistics have suddenly become relevant to the commercial systems development world.

The term ontology and its partner semantics have the distinction of being hot new buzzwords that are nearly 2500 years old.  Originally a branch of metaphysics, ontology now refers to a company’s vocabulary—finally recognized as something to be managed.  And semantics, the study of how language describes concepts, is getting new attention as we realize that how we communicate with each other is important.

The work being done in the field of semantics has sprung up over the past ten years or so quite independent of the efforts going on in data modeling and database design.  As with the companies we serve, those of us trying to address a company’s language have been split apart by our own specialties and haven’t known nearly enough about what the others are thinking.  

This paper is an attempt to bridge that gap.  In particular, the time has come to bring together the fields of

*        Data management (particularly data modeling)

*        Semantics and ontologies.

Specifically, it will take three sample data models and describe in detail the conversion of their semantics into the ontology language OWL.  In the process it should prove a good tutorial on as much of OWL as is being described by these models.  Without such an exercise, most books on OWL are difficult for novices to read and absorb.  The process of moving from a data model should make clearer what is going on in the OWL scripts thus produced.

Definitions

In keeping with the spirit of this paper, the place to start is with some definitions. 

Conceptual data modeling

Speaking of semantic confusion in a field of interest, the very term “data model” has several definitions in our industry.  For purposes of this article, a logical data model is defined to be a representation of data organized to be stored according to a particular data base technology.  This is where we find relational tables and columns, object-oriented classes, and IMS segments.  This is the basis for database design.

But it is not our concern here.

A conceptual data model, on the other hand, describes business concepts without regard for how they might be captured using a particular data base management technology.  To discuss semantics, this is the kind of data model that will be presented. 

A conceptual data model consists of the following elements:

*        Entity class* – The definition of a kind of thing of significance to the organization, about which it wishes to hold information.  These are named with common business terms, like order and product.

*        Relationship – An assertion about how two entity classes are associated with each other.  To be semantically meaningful the relationship names should accommodate formation of a formal sentence describing the assertion. 

Your author uses the form:

Each <entity class 1> [must be (or) may be] <relationship name> [one or more (or) one and only one] <entity class 2>.

So, for example, a relationship might assert that “Each purchase order must be composed of one or more line items”.

*        Attribute – The definition of a kind of data describing instances of an entity class.  Attributes define the structure of an entity class, while instances of the entity class take values for each attribute.

For example, “Color” could be an attribute of the entity class automobile; for an instance of automobile that is your author’s automobile, the value of the attribute “Color” could be “Bright red”.*
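The relationship sentence form described above can be sketched in a few lines of Python. This is only an illustration, not part of any modeling tool: the class name Relationship and its field names are assumptions made for the example, and the caller is expected to supply the plural form of the second entity class where it is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One direction of a relationship between two entity classes."""
    subject: str      # <entity class 1>
    name: str         # <relationship name>
    obj: str          # <entity class 2>
    mandatory: bool   # True -> "must be", False -> "may be"
    many: bool        # True -> "one or more", False -> "one and only one"

    def sentence(self) -> str:
        optionality = "must be" if self.mandatory else "may be"
        cardinality = "one or more" if self.many else "one and only one"
        return (f"Each {self.subject} {optionality} {self.name} "
                f"{cardinality} {self.obj}.")


rel = Relationship("purchase order", "composed of", "line items",
                   mandatory=True, many=True)
print(rel.sentence())
```

Reading the generated sentence aloud is a quick test of whether a relationship name is semantically meaningful.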

Semantics

According to the Philosophical Dictionary, semantics “refers to the theory of meaning; study of the signification of signs or symbols, as opposed to their formal relations (syntactics)”.[1]  That is, in this context, we are concerned with how a company uses language.  What do words mean and how are they related to the underlying concepts they represent?

Ontology

Looking at the same dictionary, ontology  “originally referred to the branch of metaphysics concerned with identifying, in the most general terms, the kinds of things that actually exist”.[2]   This is concerned with such questions as “What are things?  What is the essence that remains inside things even when they change (changes in their color, changes in their size, etc.)?”[3] 

“In the modern age, Emmanuel Kant (1724-1804) provoked a Copernican turn.  The essence of things is not only determined by the things themselves, but also by the contribution of whoever perceives and understands them. ... Jose Ortega Y Gasset (1883-1955) went one step further than Kant.  He stated that the world strongly depends on the person who perceives it.” [4]  Assuncion Gomez-Perez goes on to say “And then we could add that this is not valid only for persons but for information systems.  In fact, each system may represent the word in different ways, and it may even perceive different features of the same world. ...Information systems, data structures and knowledge bases are designed not to represent the world faithfully but to work more efficiently for the purpose they have been designed.”[5]

Ontology, then, is concerned not just with what exists, but also with how to classify the things we perceive to exist.  Specifically, it can be defined as a catalog of the types of things that are assumed to exist:

               -   …in a domain of interest

               -   …with rules governing how those terms can be combined to make valid statements

               -   …and the resulting “sanctioned inferences” that can be made.[6]

In short, ontologies tell us what exists, and semantics tells us how to describe it.

Approaches

This article will show how data models are in fact examples of ontologies portraying how language describes what exists in an organization.  The article will also describe the most prominent language currently in use by ontologists—the Web Ontology Language, or OWL.*  Ontologies represented in OWL constitute a different approach to the problem, and as such, describe many things in different ways.  In particular, data models are better as vehicles for discussing concepts, but ontologies are more suitable for automating analysis.

Data modeling/database design

A data model is a graphic representation of the things of significance to an organization, along with relationships among those things.  Because it is graphic, it is particularly suitable as a basis for discussing concepts with the business community.  Properly done, it graphically describes the semantics of the business and in so doing provides a good picture of its underlying structures.

Data modeling comes with a particular world view, however, which reflects that of the database design it drives.  First, the data modeler deals primarily with classes (specifically, “entity classes”).  Instances are used to test models, but they are not a significant part of either the model or the modeling process.

Second, data models are essentially for humans.  They are not in a form that a computer can process in more than the simplest way.

Third, a data model is created with the closed world assumption—namely, that only the statements made explicitly are considered to be true.  In conjunction with the business rules that accompany it, this is the assertion that if data do not conform to the model, by definition they are incorrect.  This is the “filter” approach.

Ontological engineering

Ontological engineering, on the other hand, involves creating text descriptions of assertions.  These are not exactly easy for the human observer to read, and it is not easy to understand all the implications the descriptions contain.  The text can be read by a computer, however, so it can be the basis for automated analysis and inference. 

The process of developing an ontology begins with instances.  If sets of instances share attributes or the values of particular attributes, class membership is inferred, and can then be described.  
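As a toy illustration of starting from instances, the sketch below groups instances by the set of attribute names they share and treats each group as a candidate class. The instance data and the function name candidate_classes are invented for the example; real ontological engineering tools use far richer criteria than this.

```python
from collections import defaultdict

# Hypothetical instances, each described only by the attributes it carries.
instances = {
    "Portland": {"name": "Portland", "located in": "Oregon"},
    "Augusta": {"name": "Augusta", "located in": "Maine"},
    "Oregon": {"name": "Oregon", "abbreviation": "Ore."},
}


def candidate_classes(instances):
    """Group instance identifiers by the set of attribute names they share."""
    groups = defaultdict(list)
    for identifier, attributes in instances.items():
        groups[frozenset(attributes)].append(identifier)
    return dict(groups)


for shared, members in candidate_classes(instances).items():
    print(sorted(shared), "->", members)
```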

Most significantly, ontological engineering makes the open world assumption: namely, that unless an assertion is specifically stated to be untrue, it may be true.  This makes it possible to analyze a large body of data and draw inferences about it.

For example, if the rule says that “each city must be located in one and only one state”, and the following data appear...

1.       “Portland” is the name of a City

2.       “Maine”, “Oregon” and “Ore.” are the names of States

3.       Portland is located in Maine

4.       Portland is located in Oregon

5.       Portland is located in Ore.

... a primitive database system would reject the fourth and fifth assertions. An ontological system, on the other hand, would not reject anything, but rather would ask:

1.       Are Portland in Maine, Portland in Oregon, and Portland in Ore. different cities?

2.       Are “Maine” and “Oregon” different names for the same state? What about “Oregon” and “Ore.”?

When analyzing unstructured data or data from multiple legacy systems, this kind of analysis can be critical.
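The contrast between the two approaches in the Portland example can be sketched as follows. This is a deliberately simplified illustration: the function names are assumptions, the "primitive database" is reduced to a first-come filter, and the "ontological system" merely lists the questions that the conflicting assertions raise.

```python
# Assertions 3-5 from the example: (city, state) pairs in arrival order.
located_in = [("Portland", "Maine"), ("Portland", "Oregon"), ("Portland", "Ore.")]


def closed_world_filter(assertions):
    """Accept the first state for each city; reject later conflicting ones."""
    accepted, rejected = {}, []
    for city, state in assertions:
        if city in accepted and accepted[city] != state:
            rejected.append((city, state))
        else:
            accepted[city] = state
    return accepted, rejected


def open_world_questions(assertions):
    """Keep every assertion and list the questions the conflicts raise."""
    states_by_city = {}
    for city, state in assertions:
        states_by_city.setdefault(city, []).append(state)
    questions = []
    for city, states in states_by_city.items():
        if len(states) > 1:
            places = ", ".join(f"{city} in {s}" for s in states)
            questions.append(f"Are {places} different cities?")
            for i, first in enumerate(states):
                for second in states[i + 1:]:
                    questions.append(
                        f'Are "{first}" and "{second}" different names '
                        "for the same state?"
                    )
    return questions


accepted, rejected = closed_world_filter(located_in)
print("accepted:", accepted)
print("rejected:", rejected)
for question in open_world_questions(located_in):
    print(question)
```

The filter discards two of the three assertions, while the open world version keeps all three and turns the conflict into something to be investigated.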

The Semantic Web

Ontologies and ontological engineering have been the domain of the artificial intelligence community for several decades, and in the last ten years have caught the attention of the knowledge management world.  What has brought them into the mainstream of the information technology world, however, has been the appearance of the idea of the Semantic Web.  

From the beginning, Tim Berners-Lee (inventor of the World-wide Web) envisioned that eventually it would go beyond its original objective of being a collaborative medium to the objective of being a semantic web—understandable, and thus processable, by machines. He saw it as “putting data on the web in a form that machines can naturally understand...a web of data that can be processed directly or indirectly by machines”.[7]  He sees it as an extension of the World-wide Web to allow for not just the retrieval of documents based on key words, but for their retrieval based on the semantics of their contents.

Michael Daconta and his colleagues see the Semantic Web as “a machine-processable web of smart data, [where] smart data is data that is application-independent, composeable, classified, and part of a larger information ecosystem (ontology)”.[8]

This differs from simply creating a networked database that validates data with application programs (which act as filters, as described above).  Rather, it means exploring the semantics of data via a wide range of processors.  In this case, it is the data themselves that contain the semantics necessary for understanding by any program. The data are now a raw material for exploration.  According to Michael Daconta and his colleagues, there are four stages in the “smart data” continuum:

*           Text and databases (pre-XML): Most data are proprietary to an application.  The “smarts” are in the application and not in the data.

*           XML documents in a single domain—Here data achieve application independence within a domain.  For example XML could describe standard semantics within the health care industry, the insurance industry, and so forth.

*           Taxonomies and documents with mixed vocabularies—In this stage data can be collected from multiple domains and accurately classified. This classification can then be used for discovery of data.  Simple relationships between categories in the taxonomy can be used to relate and thus combine data.  Data are now smart enough to be easily discovered and sensibly combined with other data.

*           Ontologies and rules—in this stage, new data can be inferred from existing data by following logical rules.  Data are now smart enough to be described with concrete relationships and sophisticated formalisms.  Logical calculations can be made on this “semantic algebra”.  In this stage data no longer exist as a blob but as a part of a sophisticated microcosm.[9]

The Semantic Web, then, is an attempt to build a World-wide Web of interconnected, machine-understandable information, with the “smarts” contained in the data, not in the application.  This is a logical extension from thirty years of attempting to build corporate databases that were application independent, but it moves the idea beyond the boundaries of the organization.

The architecture of the Semantic Web is based on six “layers”, much like the ISO concept of layers in data communications:

1.    Uniform Resource Identifiers (URIs) and Namespaces — how things are named.

2.    XML and XML Schema Data types — an underlying language for describing and communicating data via “tags”.

3.    Resource Description Framework (RDF) and RDF/XML — a set of XML tags for describing basic natural language sentences.

4.    RDF Schema  — a set of additional XML tags for constructing primitive ontologies, including the ability to distinguish between classes and instances.

5.    Ontology languages, such as OWL — additional XML tags for describing an ontology more completely and precisely.

6.    Applications — making use of ontologies.

The field is new, and it is not clear just what “applications” will entail, but we can begin by addressing the other layers.  Specifically, RDF and OWL are structured languages we can map back to what we already know how to do with data models.

1.      Uniform Resource Identifiers and Namespaces

In order to describe anything, it is necessary to name it.  The World-wide Web provides a ready-made scheme for doing exactly that.  The concept of the Uniform Resource Locator (URL) is known to all as the way to refer to a web page.  The ontology world has generalized this to include not only URLs but anything that is to be identified. The Uniform Resource Identifier (URI) is a formatted piece of text that identifies anything.  It is in two parts:

<scheme name>:<scheme-specific name>

 URLs are specifically licensed by the Internet Corporation for Assigned Names and Numbers (ICANN).  URIs include these plus anything else one could imagine.  Naturally, a URL, such as http://www.essentialstrategies.com  is also a URI, but trees:elm is a URI as well, even though it is not a URL. This more generalized approach to URIs allows ontologies to be specified outside the realm of the World-wide Web.
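The two-part structure of a URI can be illustrated with a short Python sketch. The helper name split_uri is an assumption made for this example; it simply divides the text at the first colon and does no further validation of the scheme.

```python
def split_uri(uri):
    """Split a URI into (scheme name, scheme-specific name) at the first colon."""
    scheme, separator, rest = uri.partition(":")
    if not separator or not scheme:
        raise ValueError(f"not a URI (no scheme name): {uri!r}")
    return scheme, rest


for uri in ("http://www.essentialstrategies.com", "trees:elm"):
    print(split_uri(uri))
```

Both strings split cleanly, which is the point: a URL is just one kind of URI, and "trees:elm" is equally acceptable as an identifier.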

XML and XML Schema

Here’s a whirlwind synopsis of XML[10]:

An XML document contains “tags” describing strings of text.  These are similar to the tags in HTML, but where HTML tags describe formatting components of a document, these tags describe the semantic content of it.  For example,

<product>
   <product_name>BlackBerry</product_name>
</product>

As you can see, the tag describes the text that follows.  The text is then demarked by a corresponding end tag in the form </…>.

Tags are typically defined in accompanying files called document type definitions (DTDs).  A DTD is itself a document with the following structure:

Header:                          <!DOCTYPE product>

Context for the tag:      <!ELEMENT product (product_name)>

Tag definition:              <!ELEMENT product_name (#PCDATA)>

Note that “(#PCDATA)” simply means that’s where actual data go.  In the context line, a character may be added after the tag name (for example, product_name+).  The character determines how many occurrences of the tag are required for each occurrence of the context tag:

(no character)   (Default) Mandatory, single valued (must be one and only one.)

               +   Mandatory, one or more occurrences (must be one or more).   

                ?   Optional, single valued (may be one and only one).

                *   Optional, one or more occurrences (may be one or more).
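The meaning of the four occurrence indicators can be sketched in Python as a simple count check. This is an illustration only: Python's standard ElementTree module does not validate documents against a DTD, so the check is written by hand, and the function name occurrences_ok and the "color" tag are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Occurrence indicator -> (minimum, maximum) occurrences; None means unbounded.
OCCURRENCE = {
    "":  (1, 1),      # mandatory, single valued
    "+": (1, None),   # mandatory, one or more
    "?": (0, 1),      # optional, single valued
    "*": (0, None),   # optional, any number of occurrences
}


def occurrences_ok(parent, child_tag, indicator):
    """Check how often child_tag occurs directly under parent."""
    minimum, maximum = OCCURRENCE[indicator]
    count = len(parent.findall(child_tag))
    return count >= minimum and (maximum is None or count <= maximum)


product = ET.fromstring(
    "<product><product_name>BlackBerry</product_name></product>"
)
print(occurrences_ok(product, "product_name", ""))   # exactly one is present
print(occurrences_ok(product, "color", "+"))         # none present, one required
print(occurrences_ok(product, "color", "?"))         # none present, none required
```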

XML schema is an alternative to DTDs.  XML Schema is an XML document that configures other documents. In fact, the languages being presented here—RDF, RDF Schema, and OWL, are examples of XML Schemas.  In each case, a namespace is included in the ontology script defining the language and with it the tags to be used.

2.      XML Namespaces

In the context of XML, an XML Namespace is a URI describing an Ontology.  All the terms contained in the ontology are identified in terms of that namespace.  For example, in the “Parties” ontology to be described below, the term “Person” appears.  The ontology namespace might be declared as:

xmlns="https://essentialstrategies.com/OWL/Parties#"

and the term “person” would be described as:

https://essentialstrategies.com/OWL/Parties#person

Note that the namespace could be abbreviated “parties” and defined thus:

xmlns:parties="https://essentialstrategies.com/OWL/Parties#"

Which would mean that “Person” could be simply referred to as

parties:person
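The abbreviation mechanism amounts to a lookup followed by concatenation, as the following sketch suggests. The function name expand is hypothetical, and the namespace URI is assumed to end in "#" so that the result matches the full form of the term shown above.

```python
# Prefix -> namespace URI, as an xmlns:parties declaration would establish.
NAMESPACES = {"parties": "https://essentialstrategies.com/OWL/Parties#"}


def expand(qname, namespaces=NAMESPACES):
    """Expand 'prefix:term' into a full URI using the declared namespaces."""
    prefix, separator, term = qname.partition(":")
    if not separator or prefix not in namespaces:
        raise KeyError(f"undeclared namespace prefix in {qname!r}")
    return namespaces[prefix] + term


print(expand("parties:person"))
```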

3.      Resource Description Framework (RDF)

The Resource Description Framework (RDF) is a collection of XML tags used to describe the structure of sentences.  Specifically, tags identify the following semantic concepts:

<rdf:subject> -        The subject of a sentence.  The actor.

<rdf:predicate> -     The action taken by the subject.

<rdf:object> -         Something that the subject acts upon, via the predicate.

This structure provides a way to catalogue documents written in otherwise unstructured text.  There is no distinction between classes and instances, however, and the manipulations available with this syntax are limited.

If a data model were to be rendered in RDF, all that could be represented would be:

*         Entity class:          <rdf:subject> or <rdf:object>

*         Relationship:                 <rdf:predicate> + <rdf:object>

*         Attribute:             <rdf:predicate>an attribute of</rdf:predicate>

And as mentioned, instances are not differentiated from classes, although there is a tag <rdf:type> that defines instances.  Absent the ability to define classes, however, that tag is not meaningful in the context of RDF alone. (More about that, below.)

For example:

<rdf:subject>Acura Integra</rdf:subject>
<rdf:predicate>was manufactured by</rdf:predicate>
<rdf:object>Honda Motors</rdf:object>

Note that RDF cannot distinguish between instances of facts, such as this, and classes such as the generic assertion that an automobile is manufactured by a company.
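The subject-predicate-object structure is easy to mimic with a three-part record, which also makes the limitation visible. In this sketch the name Triple is an assumption made for the example, and the second statement is the generic assertion mentioned above, written out as a sentence.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

statements = [
    # A fact about one particular kind of car (an instance-level statement).
    Triple("Acura Integra", "was manufactured by", "Honda Motors"),
    # A generic assertion about kinds of things (a class-level statement).
    Triple("automobile", "is manufactured by", "company"),
]

# Both statements have exactly the same shape; nothing in a triple itself
# says whether its subject names an instance or a class.
for s in statements:
    print(f"{s.subject} {s.predicate} {s.object}.")
```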

4.      Resource Description Framework Schema (RDFS)

The Resource Description Framework Schema (RDFS) is an additional set of XML tags, to be used in combination with the RDF tags.  This expands the semantic descriptiveness of the ontology.  Specifically, tags are added to describe:

<rdfs:Resource>

<rdfs:Class>

<rdfs:subClassOf>

<rdfs:range>

<rdfs:domain>

...and others

In addition, the RDF tag <rdf:type> is now meaningful as a way of distinguishing instances from classes.  Data models can be constructed from RDF Schema as follows:

Entity class -   <rdfs:Class rdf:ID="...">

Attribute -      <rdf:Property rdf:ID="...">
                 + <rdfs:domain rdf:resource="..."/>

Relationship -   <rdf:Property rdf:ID="...">
                 + <rdfs:domain rdf:resource="..."/>
                 + <rdfs:range rdf:resource="..."/>

Instance -       <rdf:type rdf:resource="..."/>

Note that both attributes and relationships are properties, defined independently of the things they are properties of.  Unlike our data modeling definition of “domain”, domain here refers to the entity class that a property (attribute or relationship) is a property of*.  In addition, if the property is a relationship, the range refers to the entity class that the relationship points to.  If the property is an attribute, range refers to an XML Schema datatype that defines its format.

Also note that the name of the class, property, etc. is in the form of the RDF tag “ID” used as an argument.

For example, you could define the classes status and color, and give both of them the attribute “Name” as follows:

<rdfs:Class rdf:ID="status"/>

<rdfs:Class rdf:ID="color"/>

 

<rdf:Property rdf:ID="Name">

      <rdfs:domain rdf:resource="#status"/>

      <rdfs:domain rdf:resource="#color"/>

</rdf:Property>
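To show that a schema of this kind is machine-readable, the sketch below parses a complete version of the status and color example with Python's standard ElementTree module. It uses the standard RDF Schema spellings rdfs:Class and rdf:Property, wraps the statements in an rdf:RDF element with the two namespace declarations they need, and reads back the classes and the domains of the property. The variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

SCHEMA = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:ID="status"/>
  <rdfs:Class rdf:ID="color"/>
  <rdf:Property rdf:ID="Name">
    <rdfs:domain rdf:resource="#status"/>
    <rdfs:domain rdf:resource="#color"/>
  </rdf:Property>
</rdf:RDF>
"""

root = ET.fromstring(SCHEMA)

# ElementTree spells namespaced names as "{namespace-uri}local-name".
classes = [c.get(f"{{{RDF}}}ID") for c in root.findall(f"{{{RDFS}}}Class")]

domains = {}
for prop in root.findall(f"{{{RDF}}}Property"):
    name = prop.get(f"{{{RDF}}}ID")
    domains[name] = [
        d.get(f"{{{RDF}}}resource").lstrip("#")
        for d in prop.findall(f"{{{RDFS}}}domain")
    ]

print(classes)
print(domains)
```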

 

5.      Web Ontology Language (OWL)

RDF and RDFS provide a good basis for developing ontologies, but they are unable to describe restrictions and rules for inference.  For this, a full-blown ontology language is needed.

As part of its IDEF series of notations, back in the 1990s the Federal Government sponsored the creation of IDEF5, an ontology expression graphical language.  A report describing it was published in 1994, but it has gotten little publicity since then, since it predated the efforts to develop the Semantic Web.  The report is a good synopsis of various ontological topics, however, and is worth reading for anyone interested in the field.[11]

The Defense Advanced Research Projects Agency (DARPA)  began a project in 2000 that resulted in the DARPA Agent Markup Language (DAML).  At the same time, the European Union was creating the Ontology Inference Layer (OIL), which covered much the same ground.  As the researchers became aware of each other’s efforts, DAML+OIL was born.   

Based on DAML+OIL, the Web Ontology Language (OWL) has been developed by the World Wide Web Consortium (W3C), the organization responsible for defining standards for the Semantic Web.[12]  It is more expressive than either RDF and RDFS or DAML+OIL in its ability to describe constraints on data.

OWL is an extensive language.  There are in fact three versions of it: OWL Lite, OWL DL, and OWL Full.  Each is richer than the last in its ability to describe constraints.  For purposes of this article, we have been using OWL DL, primarily because this is the version supported by most tools.   The “DL” stands for description logic.

Like the other languages discussed above, OWL consists of a set of XML tags, and encompasses several of those from RDFS.

A data model can be expressed in terms of the following OWL constructs:

*        Entity class -    <owl:Class rdf:ID="...">

*        Attribute -       <owl:DatatypeProperty rdf:ID="...">
                           + <rdfs:domain rdf:resource="..."/>
                           + <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#..."/>

*        Relationship -    <owl:ObjectProperty rdf:ID="...">
                           + <rdfs:domain rdf:resource="..."/>
                           + <rdfs:range rdf:resource="..."/>

Note that these tags are equivalent to the corresponding RDFS tags, but the values of the class, property, etc. are arguments within the tags.  One difference is that “Property” has been specialized into “DatatypeProperty” to represent attributes and “ObjectProperty” to represent relationships. 
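The mapping from data model elements to OWL constructs can be sketched as a small generator. This is an illustration under several assumptions: it uses the standard OWL spellings owl:Class, owl:DatatypeProperty, and owl:ObjectProperty together with rdfs:domain and rdfs:range; the helper names q and add_property are invented; and the attribute order_date and relationship composed_of are hypothetical examples built on the purchase order and line item entity classes mentioned earlier.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Ask ElementTree to use the customary prefixes when it writes the XML.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def q(namespace, name):
    """Build ElementTree's '{namespace}name' form of a qualified name."""
    return f"{{{namespace}}}{name}"


def add_property(root, kind, name, domain, range_uri):
    """Add a datatype or object property with its domain and range."""
    prop = ET.SubElement(root, q(OWL, kind), {q(RDF, "ID"): name})
    ET.SubElement(prop, q(RDFS, "domain"), {q(RDF, "resource"): "#" + domain})
    ET.SubElement(prop, q(RDFS, "range"), {q(RDF, "resource"): range_uri})
    return prop


root = ET.Element(q(RDF, "RDF"))

# Entity classes become owl:Class elements.
for entity_class in ("PURCHASE_ORDER", "LINE_ITEM"):
    ET.SubElement(root, q(OWL, "Class"), {q(RDF, "ID"): entity_class})

# An attribute becomes a datatype property whose range is an XML Schema type.
add_property(root, "DatatypeProperty", "order_date", "PURCHASE_ORDER", XSD + "date")

# A relationship becomes an object property whose range is another class.
add_property(root, "ObjectProperty", "composed_of", "PURCHASE_ORDER", "#LINE_ITEM")

print(ET.tostring(root, encoding="unicode"))
```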

6.      Applications

The Semantic Web is too new for there to be more than a few demonstration applications available, but several directions seem clear.  In particular, OWL’s open world assumption makes it, as it now exists, unsuitable for describing business rules. (This is discussed in more detail, below.)  After all, business rule specification is all about excluding data that don’t conform.  The need remains, however, and a great deal of work is going on in other areas to address this.  At the very least, one could imagine another layer of software being developed to parse ontological data to find business rule violations.  Whether this will be the solution or OWL will be extended to cover such things remains to be seen.

A sample Data Model – a Taxonomy

A taxonomy is an ontology that is constrained to represent a hierarchy.  The best example of that is from biology, with its phylum, family, genus, species, and so forth.  There is no multiple inheritance:  each element is a sub-class of only one other element.  An elephant is a mammal and can never be a reptile. 

In the world, there are in fact relatively few examples of natural taxonomies.  Most real-world things to be modeled belong to many different categories, typically overlapping, and certainly changing over time. There are certainly many examples of multiple inheritance in the world.

Taxonomies and ontologies with multiple inheritance are treated very differently in data modeling and in ontologies.  Our data model example will show this.

Figure 1 shows a data model of a taxonomy.  The notation used is originally from a consulting company, CACI.  It is used extensively both by the Oracle Corporation and as part of the European Structured Systems Analysis and Design Method (SSADM) notation.[13]  It is notable in that it shows sub-class boxes (“sub-types” in model-speak) graphically as being contained inside super-class boxes. 

That is, in this model, each instance of party must also be an instance of either organization or person, but not both.  Similarly, each organization must be either an internal organization or an external organization, but not both.  An instance of sole proprietorship, by definition must also be an instance of company, external organization, organization, and party.

Figure 1:  A Data Model of a Taxonomy

This data modeling approach makes several assumptions:

1.    The sub-types are exhaustive.  That is, each instance of party (in this example) must be either a person or an organization and can be nothing else.  There are no parties that are not one of these.

2.    The sub-types are mutually exclusive (also called disjoint).  An instance of party may not be both a person and an organization.

3.    An entity class may not be a sub-type of more than one other entity class.  (No multiple inheritance.)
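These three assumptions can be expressed as a small check, sketched below for part of the party hierarchy. The structure and function names are assumptions made for the example. Because each sub-type is stored with exactly one super-type, multiple inheritance cannot even be written down; the function then tests the exhaustive and mutually exclusive rules for one instance, given the set of entity classes it is said to belong to.

```python
# Each sub-type names its single super-type (assumption 3).
SUPER_TYPE = {
    "PERSON": "PARTY",
    "ORGANIZATION": "PARTY",
    "INTERNAL_ORGANIZATION": "ORGANIZATION",
    "EXTERNAL_ORGANIZATION": "ORGANIZATION",
}


def sub_types(entity_class):
    """Return the direct sub-types of an entity class."""
    return [sub for sub, sup in SUPER_TYPE.items() if sup == entity_class]


def check_instance(memberships, entity_class="PARTY"):
    """Return the rule violations for one instance's class memberships."""
    problems = []
    if entity_class not in memberships:
        return problems
    subs = sub_types(entity_class)
    if not subs:
        return problems  # a leaf class has nothing further to check
    chosen = [s for s in subs if s in memberships]
    if not chosen:
        problems.append(f"{entity_class}: sub-types must be exhaustive")
    elif len(chosen) > 1:
        problems.append(f"{entity_class}: sub-types must be mutually exclusive")
    for s in chosen:
        problems.extend(check_instance(memberships, s))
    return problems


print(check_instance({"PARTY", "PERSON"}))
print(check_instance({"PARTY", "ORGANIZATION"}))
print(check_instance({"PARTY", "PERSON", "ORGANIZATION"}))
```

Only the first instance passes. The second is an organization that is neither internal nor external, and the third claims to be both a person and an organization.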

In addition to party, the model also shows the entity class party type.  The relationship asserts that each party must be an example of  one and only one party type.  That is exactly the same as asserting (as the sub-type structure does) that each party must be either an instance of organization or an instance of person, but not both.  The similarity is not an accident.  The model is expressing exactly the same thing in two different ways.  By definition, the first instances of party type must be “Person”, “Organization”, “Internal Organization”, and so forth.

Note that each party type (such as “Organization”) may be a super-type of one or more other party types (such as “Internal Organization” and “External Organization”).

Converting The Taxonomy To OWL

Define classes

Numerous tools are available for creating an OWL ontology.  Most notable are Protégé[14], Cerebra[15], and Metatomix[16].  Only Cerebra has a graphic interface, so for purposes of this exercise, that is what will be used.  The other tools are actually better at allowing one to specify the details of a model.

Figure 2 shows the sample taxonomy as it is represented in the Cerebra tool.  Coincidentally enough, it looks much as it would in any data modeling tool that shows sub-types outside super-types.  Of course what were called “entity classes” in the conceptual model are here called “classes”, but that doesn’t matter to the meaning of the model.

There is one significant difference, though.  With two exceptions, the “attributes” are shown not as text inside the class boxes, but as properties (specifically datatype properties) represented by the greyed hexagons outside the boxes.  The lines from each datatype property to a class represent the class’s being the “domain” of that datatype property.


Figure 2:  The Taxonomy in Cerebra

 

Once the picture is drawn, then the next task is to export it to an .owl file containing the XML that is the OWL script.

After the XML heading, the first RDF statement defines the namespaces:

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF

  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"               (XML Schema)

  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"     (RDF)

  xmlns:owl="http://www.w3.org/2002/07/owl#"                  (OWL)

  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"          (RDF Schema)

  xmlns:dc="http://purl.org/dc/elements/1.1/"                 (Dublin Core)

  xmlns:essentialstrategies=                                  (Your author)

      "https://essentialstrategies.com/OWL/Parties#"

  xml:base=                                                   (Default)

      "https://essentialstrategies.com/OWL/Parties#">

Note that XML Schema, Dublin Core, RDF, RDF Schema, and OWL are themselves namespaces.  This is where the tags are in fact defined.  Also included is the “xml:base” argument that defines the default namespace for terms used in the OWL script.  

Also note that the tag “rdf:RDF” defines the entire document.  Namespaces shown here are arguments for that tag.  The </rdf:RDF> tag comes at the end of the document.

The ontology that is to be this OWL script is then defined.  This could be a simple place holder to give it a name and version, but in this case, one of the namespaces included in the script is from the Dublin Core.  That provides tags for various parameters describing documents, such as the creator, date, and so forth:

 <owl:Ontology rdf:about="">

    <dc:title/>

    <dc:date>5/21/2006 10:44:51 AM</dc:date>

    <dc:creator>David C. Hay</dc:creator>

    <dc:description/>

    <dc:subject/>

    <owl:versionInfo>1.0</owl:versionInfo>

  </owl:Ontology>

Then the party class is defined, always as a sub-class of the OWL generic class thing.

<owl:Class rdf:about="http://www.w3.org/2002/07/owl#Thing"/>

<owl:Class rdf:about="https://essentialstrategies.com/OWL/Parties#PARTY">

      <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>

</owl:Class>

Then come the rest of the classes and their associations with their super-types.

<owl:Class rdf:ID="PERSON">

      <owl:disjointWith rdf:resource="#ORGANIZATION"/>

      <rdfs:subClassOf rdf:resource="#PARTY"/>

</owl:Class>

 

<owl:Class rdf:ID="ORGANIZATION">

      <owl:disjointWith rdf:resource="#PERSON"/>

      <rdfs:subClassOf rdf:resource="#PARTY"/>

</owl:Class>

 

<owl:Class rdf:ID="INTERNAL_ORGANIZATION">

      <owl:disjointWith rdf:resource="#EXTERNAL_ORGANIZATION"/>

      <rdfs:subClassOf rdf:resource="#ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="EXTERNAL_ORGANIZATION">

      <owl:disjointWith rdf:resource="#INTERNAL_ORGANIZATION"/>

      <rdfs:subClassOf rdf:resource="#ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="COMPANY">

      <owl:disjointWith rdf:resource="#GOVERNMENT_ORGANIZATION"/>

      <owl:disjointWith rdf:resource="#OTHER_ORGANIZATION"/>

      <rdfs:subClassOf rdf:resource="#EXTERNAL_ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="GOVERNMENT_ORGANIZATION">

      <owl:disjointWith rdf:resource="#COMPANY"/>

      <owl:disjointWith rdf:resource="#OTHER_ORGANIZATION"/>

      <rdfs:subClassOf rdf:resource="#EXTERNAL_ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="OTHER_ORGANIZATION">

      <owl:disjointWith rdf:resource="#COMPANY"/>

      <owl:disjointWith rdf:resource="#GOVERNMENT_ORGANIZATION"/>

      <rdfs:subClassOf rdf:resource="#EXTERNAL_ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="REGULATORY_AGENCY">

      <owl:disjointWith rdf:resource="#TERRITORIAL_GOVERNMENT"/>

      <owl:disjointWith rdf:resource="#OTHER_GOVERNMENT_ORGANIZATION"/>

      <rdfs:subClassOf rdf:resource="#GOVERNMENT_ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="TERRITORIAL_GOVERNMENT">

      <owl:disjointWith rdf:resource="#REGULATORY_AGENCY"/>

      <owl:disjointWith rdf:resource="#OTHER_GOVERNMENT_ORGANIZATION"/>

      <rdfs:subClassOf rdf:resource="#GOVERNMENT_ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="OTHER_GOVERNMENT_ORGANIZATION">

      <owl:disjointWith rdf:resource="#REGULATORY_AGENCY"/>

      <owl:disjointWith rdf:resource="#TERRITORIAL_GOVERNMENT"/>

      <rdfs:subClassOf rdf:resource="#GOVERNMENT_ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="CORPORATION">

      <owl:disjointWith rdf:resource="#SOLE_PROPRIETORSHIP"/>

      <owl:disjointWith rdf:resource="#PARTNERSHIP"/>

      <rdfs:subClassOf rdf:resource="#COMPANY"/>

</owl:Class>

  

<owl:Class rdf:ID="SOLE_PROPRIETORSHIP">

      <owl:disjointWith rdf:resource="#CORPORATION"/>

      <owl:disjointWith rdf:resource="#PARTNERSHIP"/>

      <rdfs:subClassOf rdf:resource="#COMPANY"/>

</owl:Class>

 

<owl:Class rdf:ID="PARTNERSHIP">

      <owl:disjointWith rdf:resource="#CORPORATION"/>

      <owl:disjointWith rdf:resource="#SOLE_PROPRIETORSHIP"/>

      <rdfs:subClassOf rdf:resource="#COMPANY"/>

</owl:Class>

Important note:  In the Ontology world, simply making classes sub-classes of others does not impose any constraints.  The classes are neither exclusive nor exhaustive, nor is there anything preventing them from being sub-classes of more than one other class.

More significantly, there is nothing to prevent an instance of one class from being an instance of another.  This is significant because it is such a fundamental assumption among data modelers and database designers that it is never expressed.  You couldn’t imagine an instance of flight also being an instance of airport. Ontologies make no such assumptions and you must make them explicit.

Thus, each class must explicitly be made disjoint from every other class in the sub-class group, if the ontology is to mimic the data model.  Hence the extra line(s) in each class definition above.
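Because a missing disjointWith line is easy to overlook, a small check can help.  The following Python sketch (standard library only) parses an OWL fragment and lists every pair of sibling sub-classes that no disjointWith statement covers.  The function name and the deliberately incomplete sample are assumptions for illustration, and the check treats a statement in either direction as sufficient.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Illustrative fragment: the SOLE_PROPRIETORSHIP / PARTNERSHIP pair is
# deliberately left without a disjointWith statement.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="CORPORATION">
    <owl:disjointWith rdf:resource="#SOLE_PROPRIETORSHIP"/>
    <owl:disjointWith rdf:resource="#PARTNERSHIP"/>
    <rdfs:subClassOf rdf:resource="#COMPANY"/>
  </owl:Class>
  <owl:Class rdf:ID="SOLE_PROPRIETORSHIP">
    <rdfs:subClassOf rdf:resource="#COMPANY"/>
  </owl:Class>
  <owl:Class rdf:ID="PARTNERSHIP">
    <rdfs:subClassOf rdf:resource="#COMPANY"/>
  </owl:Class>
</rdf:RDF>"""


def undeclared_sibling_pairs(owl_xml):
    """Return sibling pairs that no owl:disjointWith statement covers."""
    siblings = {}      # parent name -> list of sub-class names
    disjoint = set()   # unordered pairs declared disjoint
    for cls in ET.fromstring(owl_xml).iter(f"{{{OWL}}}Class"):
        name = cls.get(f"{{{RDF}}}ID")
        if name is None:
            continue  # anonymous class, not a named sub-type
        for parent in cls.findall(f"{{{RDFS}}}subClassOf"):
            target = parent.get(f"{{{RDF}}}resource")
            if target:
                siblings.setdefault(target.lstrip("#"), []).append(name)
        for other in cls.findall(f"{{{OWL}}}disjointWith"):
            other_name = other.get(f"{{{RDF}}}resource").lstrip("#")
            disjoint.add(frozenset((name, other_name)))
    return sorted(
        tuple(sorted(pair))
        for group in siblings.values()
        for pair in combinations(group, 2)
        if frozenset(pair) not in disjoint
    )


missing = undeclared_sibling_pairs(SAMPLE)
print(missing)
```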

(The Cerebra tool allows you to add red lines to the drawing, but in the absence of good graphic control, these lines tend to clog the diagram, so they aren’t shown in Figure 2.)

Define attributes

Note that in Figure 2, attributes are shown external to the class boxes.  This is because in OWL, properties exist independent of the things they are properties of.  The kind of properties that concern us here are datatype properties, properties that describe one or more classes. Once a datatype property has been defined, it may then be linked to classes in one of two ways:


·       First, it may have a class defined as its domain.  That is, a datatype property’s domain is the class it is a property of.  In this case, it also may have an XMLSchema class defined as its range.  This defines its format.

·       The second approach is expressed as the fact that the class is a sub-class of a restriction that is represented by this property (described by its onProperty tag).  That is, the class is a sub-class of all things that have this property. Its allValuesFrom tag defines the XML Schema class that holds its format.

In Figure 2, the datatype properties in the first category are shown linked to their entity classes.  The datatype properties in the second category are shown alongside their classes, but not connected to them.  In addition, properties of this second type are also shown listed within the class boxes.

In this example, the attributes of person, party, corporation, and regulatory agency are described according to the first method in the OWL document as follows:

<owl:DatatypeProperty rdf:ID="Surname">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#PERSON"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#string"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Given_Name">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#PERSON"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#string"/>


</owl:DatatypeProperty>

<owl:DatatypeProperty rdf:ID="Middle_Name">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#PERSON"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#string"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Middle_Initial">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#PERSON"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#string"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Birthdate">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#PERSON"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#date"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="ID">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#PARTY"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#string"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Tax_ID">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#CORPORATION"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Domain">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#REGULATORY_AGENCY"/>

</owl:DatatypeProperty>

In each case, the domain of the property (and its range, where one is given) is shown as part of the definition of the property.

Note that these are all defined as functional properties. This means that for any instance of the class involved, the property can take at most one value.  This is one way to assert cardinality.  A later section describes how to define minimum and maximum cardinality constraints explicitly.

Because the domain class is part of the definition of the property, the same attribute name cannot appear in more than one class.  That is, if “ID” is defined as having the domain person it cannot also be used to identify product.  To do this, you have to take the second approach to defining properties.

In the example, this second approach is represented by the attributes of organization.  In the resulting OWL script, organization appears as a sub-class of two OWL classes: things that are restricted to (defined by) the datatype property “Creation_Date”; and things that are restricted to (defined by) the datatype property “Organization_Name”.  The <owl:onProperty> tag describes the attribute, and the <owl:allValuesFrom> tag describes its datatype or format.

First*, as before, the datatype properties are defined:

<owl:DatatypeProperty rdf:ID="Organization_Name">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Creation_Date">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>


Then the classes are associated with them:

<owl:Class rdf:ID="ORGANIZATION">

      <rdfs:subClassOf rdf:resource="#PARTY"/>

 

      <rdfs:subClassOf>      

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Organization_Name"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

 

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Creation_Date"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

 

</owl:Class>

Since the assignment of the class to a datatype property is not part of the definition of the datatype property, it can be used in more than one class.
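The difference between the two approaches can be made visible with a short Python sketch (standard library only) that reads an OWL fragment and reports, for each datatype property, which classes use it and by which approach.  The function and helper names are assumptions for illustration, and PRODUCT is a hypothetical second class added only to show a property being re-used.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def restricted_class(name, prop):
    """Build a class that is a sub-class of a restriction on one property."""
    return f"""
  <owl:Class rdf:ID="{name}">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#{prop}"/>
        <owl:allValuesFrom rdf:resource="{XSD_STRING}"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>"""


# PRODUCT is hypothetical; it is here only to show re-use of Creation_Date.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:DatatypeProperty rdf:ID="Surname">
    <rdfs:domain rdf:resource="#PERSON"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="Creation_Date"/>
{restricted_class('ORGANIZATION', 'Creation_Date')}
{restricted_class('PRODUCT', 'Creation_Date')}
</rdf:RDF>"""


def property_usage(owl_xml):
    """Map each property name to a list of (class name, approach) pairs."""
    root = ET.fromstring(owl_xml)
    usage = {}
    # Approach 1: the property names its own domain.
    for prop in root.iter(f"{{{OWL}}}DatatypeProperty"):
        for domain in prop.findall(f"{{{RDFS}}}domain"):
            cls_name = domain.get(f"{{{RDF}}}resource").lstrip("#")
            usage.setdefault(prop.get(f"{{{RDF}}}ID"), []).append((cls_name, "domain"))
    # Approach 2: a class is a sub-class of a restriction on the property.
    path = f"{{{RDFS}}}subClassOf/{{{OWL}}}Restriction/{{{OWL}}}onProperty"
    for cls in root.findall(f"{{{OWL}}}Class"):
        for on_prop in cls.findall(path):
            prop_name = on_prop.get(f"{{{RDF}}}resource").lstrip("#")
            usage.setdefault(prop_name, []).append((cls.get(f"{{{RDF}}}ID"), "restriction"))
    return usage


usage = property_usage(SAMPLE)
print(usage)
```

In this sample, “Surname” is tied to one class through its domain, while “Creation_Date” is picked up by two classes through restrictions.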

You will notice that on this model there is a relationship between party and party type that we have not yet explained.  That will be covered in the next section.

A second data model – with relationships

Most data models, though, are not taxonomies.  They are multidimensional with all kinds of entity classes related to each other in various ways. Figure 3 shows a typical business model, describing order, which is any contract between a buyer and a seller of products or services. 

A Data Model

Specifically, each order must be from one person or organization (the buyer in the order) and it must be to another party (the seller in the order).  Each order, in turn, may also be composed of one or more line items, each of which must be either for one service type or for one product type.*

 


Figure 3: A Data Model with Relationships

Note that these are product and service types.  That is, the order is placed for the specification of the product or service, as found in, for example, a catalogue.  An extension of the model would describe the actual delivery of either a product or a service. 

The service type is priced in terms of the hours that would be spent to carry it out.  The product type is priced in terms of the unit of measure by which the product will be delivered—“Each”, “Kilogram”, etc.

A limitation of ontologies

In the course of writing this article, your author has discovered a serious limitation of ontologies.  They are unable to describe calculated fields.  A data model cannot represent the calculations directly either, unless they are added as a note, as is done in UML, but at least the resulting attributes can be shown and annotated behind the scenes. Application generators can then implement these calculations.

This model has several calculated fields.  First, in line item, you see “Unit price”.  For each instance of line item, this is inferred from either the “Price per hour” in service type or “Unit price” in product type, depending on which relationship applies to the instance of line item.  (Note that “Unit of measure” is also inferred through two levels.)  Once this is obtained, “Extended value” can be computed as “Quantity” times “Unit price”.  Then, for each instance of order, it is possible to compute the “Total value” by taking the sum of line item’s “Extended value” for all instances of line item that are part of the order.
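To make these derivations concrete, here is a minimal Python sketch of the three calculations as an application generator might implement them.  The class and field names follow the model, but the code itself, the sample data, and the choice of Decimal for money are assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class ProductType:
    name: str
    unit_price: Decimal


@dataclass
class ServiceType:
    name: str
    price_per_hour: Decimal


@dataclass
class LineItem:
    quantity: Decimal
    product_type: Optional[ProductType] = None
    service_type: Optional[ServiceType] = None

    @property
    def unit_price(self) -> Decimal:
        # The arc in the model: exactly one of the two relationships applies.
        if (self.product_type is None) == (self.service_type is None):
            raise ValueError("line item must be for exactly one product or service type")
        if self.product_type is not None:
            return self.product_type.unit_price
        return self.service_type.price_per_hour

    @property
    def extended_value(self) -> Decimal:
        # "Extended value" = "Quantity" times "Unit price".
        return self.quantity * self.unit_price


def total_value(line_items) -> Decimal:
    """Sum "Extended value" over the line items that are part of one order."""
    return sum((item.extended_value for item in line_items), Decimal("0"))


# Illustrative data only.
widget = ProductType("Widget", Decimal("2.50"))
consulting = ServiceType("Consulting", Decimal("100.00"))
order_lines = [
    LineItem(Decimal("4"), product_type=widget),
    LineItem(Decimal("3"), service_type=consulting),
]
order_total = total_value(order_lines)
print(order_total)
```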

The ontology - classes


Because the Cerebra graphical representation of ontologies can be extremely complex, let’s start with how the classes look.  Figure 4 shows the classes from our model, with explicit lines showing that all the classes are disjoint.

 

Figure 4: Cerebra and Order Classes

In the OWL/XML script, first, we see that party, person, and organization are again defined as before.  Note that if person is disjoint with organization, it is not necessary to also say that organization is disjoint with person, although you may.

 

<owl:Class rdf:ID="PARTY">

      <owl:disjointWith rdf:resource="#ORDER"/>

      <owl:disjointWith rdf:resource="#LINE_ITEM"/>

      <owl:disjointWith rdf:resource="#SERVICE_TYPE"/>

      <owl:disjointWith rdf:resource="#PRODUCT_TYPE"/>

      <owl:disjointWith rdf:resource="#UNIT_OF_MEASURE"/>

</owl:Class>

 

<owl:Class rdf:ID="PERSON">

      <rdfs:subClassOf rdf:resource="#PARTY"/>

      <owl:disjointWith rdf:resource="#ORGANIZATION"/>

</owl:Class>

 

<owl:Class rdf:ID="ORGANIZATION">

      <rdfs:subClassOf rdf:resource="#PARTY"/>

      <owl:disjointWith rdf:resource="#PERSON"/>

</owl:Class>

The remaining classes are similarly defined:

<owl:Class rdf:ID="ORDER">

      <owl:disjointWith rdf:resource="#PARTY"/>

      <owl:disjointWith rdf:resource="#LINE_ITEM"/>

      <owl:disjointWith rdf:resource="#SERVICE_TYPE"/>

      <owl:disjointWith rdf:resource="#PRODUCT_TYPE"/>

      <owl:disjointWith rdf:resource="#UNIT_OF_MEASURE"/>

</owl:Class>

 

<owl:Class rdf:ID="LINE_ITEM">

      <owl:disjointWith rdf:resource="#PARTY"/>

      <owl:disjointWith rdf:resource="#ORDER"/>

      <owl:disjointWith rdf:resource="#SERVICE_TYPE"/>

      <owl:disjointWith rdf:resource="#PRODUCT_TYPE"/>

      <owl:disjointWith rdf:resource="#UNIT_OF_MEASURE"/>

</owl:Class>

 

<owl:Class rdf:ID="UNIT_OF_MEASURE">

      <owl:disjointWith rdf:resource="#PARTY"/>

      <owl:disjointWith rdf:resource="#ORDER"/>

      <owl:disjointWith rdf:resource="#LINE_ITEM"/>

      <owl:disjointWith rdf:resource="#SERVICE_TYPE"/>

      <owl:disjointWith rdf:resource="#PRODUCT_TYPE"/>

</owl:Class>

 


<owl:Class rdf:ID="SERVICE_TYPE">

      <owl:disjointWith rdf:resource="#PARTY"/>

      <owl:disjointWith rdf:resource="#ORDER"/>

      <owl:disjointWith rdf:resource="#LINE_ITEM"/>

      <owl:disjointWith rdf:resource="#PRODUCT_TYPE"/>

      <owl:disjointWith rdf:resource="#UNIT_OF_MEASURE"/>

</owl:Class>

 

<owl:Class rdf:ID="PRODUCT_TYPE">

      <owl:disjointWith rdf:resource="#PARTY"/>

      <owl:disjointWith rdf:resource="#ORDER"/>

      <owl:disjointWith rdf:resource="#LINE_ITEM"/>

      <owl:disjointWith rdf:resource="#SERVICE_TYPE"/>

      <owl:disjointWith rdf:resource="#UNIT_OF_MEASURE"/>

</owl:Class>

The ontology – relationships

A relationship in OWL-speak is called an object property, and is similar to a datatype property (described for the previous model) in its relationship to classes.  In effect, a property of a class may be either an attribute or a relationship to something else.

As with attributes, each object property is defined independently of the things it is relating.  It is then linked to classes in one of two ways:

·       In one case, it may have a class defined as its domain and a second class defined as its range.  As with datatype properties, an object property’s domain is the class it is a property of.  In addition, its range is the class it is related to.

·       The second approach is expressed as the fact that the class with the relationship is a sub-class of the restriction that is represented by this property.  This restriction is then expressed for the class in terms of the property involved (the “onProperty” tag), and the class whose values are to be linked to this property (the “someValuesFrom” tag). Both approaches are illustrated below.

Figure 5 shows the model with relationships. In this case, most of the relationships shown are of the “domain/range” type since the relationship names are unlikely to be re-used anywhere else in the model.  Between order and line item, however, the relationships “part of” and “composed of”  are of the “sub-class of restriction” type, since it is reasonable to believe that these relationship names could be reused.

 

 


Figure 5: Cerebra Order Relationships


Here is how the order/line_item  relationships appear in the script:

<owl:ObjectProperty rdf:ID="composed_of">

      <owl:inverseOf rdf:resource="#part_of"/>

</owl:ObjectProperty>

 

<owl:Class rdf:ID="ORDER">

      <owl:disjointWith rdf:resource="#PARTY"/>

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#composed_of"/>

                  <owl:someValuesFrom rdf:resource="#LINE_ITEM"/>

             </owl:Restriction>

      </rdfs:subClassOf>

</owl:Class>

The class order is a sub-class of an anonymous class that is the set of all things constrained by the property “composed of”.  It takes “some values from” the other class, line_item, to populate the relationship.

Note that in both cases, you can define that relationships are inverses of each other, although this doesn’t show in the Cerebra drawing.
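What declaring an inverse buys you can be sketched in a few lines of Python: every statement made with one property implies the mirror-image statement with the other.  The function name and the instance identifiers are hypothetical, chosen only for illustration.

```python
def add_inverses(triples, inverse_of):
    """Return the triples plus the inverse statement implied by each one."""
    both_ways = dict(inverse_of)
    both_ways.update({v: k for k, v in inverse_of.items()})
    closed = set(triples)
    for subject, prop, obj in triples:
        if prop in both_ways:
            closed.add((obj, both_ways[prop], subject))
    return closed


# Hypothetical instances: one order composed of one line item.
stated = {("order-1", "composed_of", "line-1")}
inferred = add_inverses(stated, {"composed_of": "part_of"})
print(sorted(inferred))
```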

Remember the arc between line item and either service type or product type in the data model?  One way to deal with this would have been—in the data model itself—to create a super-type, such as “sales item”,  with service type and product type as sub-types.  Then the relationship would not involve the arc at all.  It would simply be that “Each line item must be for one and only one sales item”.  If that’s not done, however, an option in OWL is to define a new class that is the union of the two classes.  Logically, this is equivalent to defining a super-type, in that it consists of the set of instances of product type plus the set of instances of service type.  The assertion that each line item must be for one and only one of either is equivalent to the assertion that each line item must be for one and only one instance of the union.
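In set terms, the union behaves as follows.  This is a minimal Python sketch with hypothetical instance identifiers; it only illustrates that “for one of either” and “for one instance of the union” pick out the same targets.

```python
# Hypothetical instance identifiers, for illustration only.
product_types = {"widget", "gadget"}
service_types = {"consulting", "training"}

# The union plays the role of the "sales item" super-type.
sales_items = product_types | service_types

# Each line item is "for" exactly one target.
line_item_for = {"line-1": "widget", "line-2": "training"}


def violations(assignments, allowed):
    """Return the line items whose target is not a member of the allowed set."""
    return sorted(line for line, target in assignments.items() if target not in allowed)


print(violations(line_item_for, sales_items))
```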

The domain and range of the inverse object properties “purchased_via” and “for” are described thus in OWL:

 


<owl:ObjectProperty rdf:ID="for">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <owl:inverseOf rdf:resource="#purchased_via"/>

      <rdfs:domain rdf:resource="#LINE_ITEM"/>

      <rdfs:range>

            <owl:Class>

                  <owl:unionOf rdf:parseType="Collection">

                        <owl:Class rdf:about="#SERVICE_TYPE"/>

                        <owl:Class rdf:about="#PRODUCT_TYPE"/>

                  </owl:unionOf>

            </owl:Class>

       </rdfs:range>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="purchased_via">

      <owl:inverseOf rdf:resource="#for"/>

      <rdfs:domain>

            <owl:Class>

                  <owl:unionOf rdf:parseType="Collection">

                        <owl:Class rdf:about="#SERVICE_TYPE"/>

                        <owl:Class rdf:about="#PRODUCT_TYPE"/>

                   </owl:unionOf>

            </owl:Class>

      </rdfs:domain>

      <rdfs:range rdf:resource="#LINE_ITEM"/>

</owl:ObjectProperty>

 

The remaining relationships use the standard domain/range structure in OWL:

<owl:ObjectProperty rdf:ID="from">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <owl:inverseOf rdf:resource="#buyer_in"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource="#PARTY"/>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="buyer_in">

      <owl:inverseOf rdf:resource="#from"/>

      <rdfs:domain rdf:resource="#PARTY"/>

      <rdfs:range rdf:resource="#ORDER"/>

</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="to">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <owl:inverseOf rdf:resource="#vendor_in"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource="#PARTY"/>

</owl:ObjectProperty>

 


<owl:ObjectProperty rdf:ID="vendor_in">

      <owl:inverseOf rdf:resource="#to"/>

      <rdfs:domain rdf:resource="#PARTY"/>

      <rdfs:range rdf:resource="#ORDER"/>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="priced_in_terms_of">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <owl:inverseOf rdf:resource="#term_for"/>

      <rdfs:domain rdf:resource="#PRODUCT_TYPE"/>

      <rdfs:range rdf:resource="#UNIT_OF_MEASURE"/>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="term_for">

      <owl:inverseOf rdf:resource="#priced_in_terms_of"/>

      <rdfs:domain rdf:resource="#UNIT_OF_MEASURE"/>

      <rdfs:range rdf:resource="#PRODUCT_TYPE"/>

</owl:ObjectProperty>

The ontology – attributes

Adding attributes (datatype properties) to this model is just like it was for the previous model.  As described above, there are two ways to do it:

·       First, it may have a class defined as its domain.  That is, a datatype property’s domain is the class it is a property of.  In addition, its range is defined as a class that defines its format.

·       The second approach is expressed as the fact that the class is a sub-class of a restriction that is represented by this property (described by its onProperty tag).  That is, the class is a sub-class of all things that have this property.

Figure 6 shows the Cerebra model, with datatype properties added as dark hexagons.  Note that the datatype properties for order and line_item   are described using the first approach.  These are attributes that are only to be used for these classes.  Attributes like “Name”, “ID”, “Definition”, and “Description” appear in multiple classes, so these are defined according to the second approach.  These appear not only as the dark hexagons, but also listed explicitly within each class box. “Price_per_hour” and “Unit_price” are also in this category.


Figure 6: Cerebra and Attributes

 

The domain/range datatype properties show up like this in the OWL script:

<owl:DatatypeProperty rdf:ID="Line_number">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#LINE_ITEM"/>

      <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Quantity">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#LINE_ITEM"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#decimal"/>

</owl:DatatypeProperty>

<owl:DatatypeProperty rdf:ID="Inferred_unit_price">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#LINE_ITEM"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#decimal"/>

      <rdfs:comment>Inferred from either SERVICE_TYPE.Price_per_hour or PRODUCT_TYPE.Unit_price</rdfs:comment>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Calculated_extended_value">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#LINE_ITEM"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#decimal"/>

      <rdfs:comment>Quantity*Inferred_unit_price</rdfs:comment>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Order_number">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#positiveInteger"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Order_date">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#date"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Close_date">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#date"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Summarized_total_value">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#string"/>

      <rdfs:comment>

            The sum of LINE_ITEM "Calculated_extended_value" across LINE_ITEM part of ORDER.

      </rdfs:comment>

</owl:DatatypeProperty>

 

Here are the datatype properties used for party, product_type, service_type, and unit_of_measure:

<owl:DatatypeProperty rdf:ID="ID">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Name">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Price_per_hour">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Unit_price">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Abbreviation">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Full_name">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>

 

<owl:DatatypeProperty rdf:ID="Description">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

</owl:DatatypeProperty>

... with the classes that use them:

<owl:Class rdf:ID="PARTY">

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#ID"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#integer"/>

            </owl:Restriction>

      </rdfs:subClassOf>

</owl:Class>

 

<owl:Class rdf:ID="SERVICE_TYPE">

     

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#ID"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

 

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Name"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

           </owl:Restriction>

      </rdfs:subClassOf>

 

      <rdfs:subClassOf>

           <owl:Restriction>

                  <owl:onProperty rdf:resource="#Price_per_hour"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

</owl:Class>

 

<owl:Class rdf:ID="PRODUCT_TYPE">

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#ID"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

 

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Name"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

 

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Unit_price"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

</owl:Class>

 


The Ontology – Cardinality

Until now, the only cardinality constraint we could apply to properties (either attributes or relationships) was “maximum 1”.  The “Functional” constraint accomplished this.  OWL also has the ability to specify minimum cardinality, as well as expressing maximum cardinality explicitly.  As it happens, Cerebra does not support this, but other tools such as Protégé do, so here is how you do it.

In the model above, the following mandatory relationships have cardinality constraints, to be implemented as object properties that explicitly define a single domain and range for each:

·       Each order must be from one and only one party.

·       Each order must be to one and only one party.

·       Each line_item must be for one and only one instance of the union of product_type and service_type.

·       Each product_type must be priced in terms of one and only one unit_of_measure.

One mandatory relationship, however, is implemented by making the class line_item a sub-class of the class of those things that are part of something.

·       Each line_item must be part of one and only one order.

In addition, note that the following mandatory attributes are implemented as datatype properties whose domains are explicit, which makes each property unique to its class.

·       order “Order_number”

·       order “Order_date”

·       line_item “Line_number”

·       line_item “Quantity”

Several of the attribute names are re-used, however, which means that they must be implemented as datatype properties of classes that are sub-classes of the class of those things with that property.

·       party “ID”

·       service_type “ID”

·       service_type “Name”

·       product_type “ID”

·       product_type “Name”

·       unit_of_measure “Name”

Beginning with the relationships order from party, order to party, and the attribute order “Order_number”, using the domain/range approach, cardinality for these appears in the OWL script as follows:

First, the object and datatype properties themselves are defined, specifying their domain and range:

<owl:ObjectProperty rdf:ID="from">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <owl:inverseOf rdf:resource="#buyer_in"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource="#PARTY"/>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="to">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <owl:inverseOf rdf:resource="#vendor_in"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource="#PARTY"/>

</owl:ObjectProperty>

 


<owl:DatatypeProperty rdf:ID="Order_number">

      <rdf:type rdf:resource=

            "http://www.w3.org/2002/07/owl#FunctionalProperty"/>

      <rdfs:domain rdf:resource="#ORDER"/>

      <rdfs:range rdf:resource=

            "http://www.w3.org/2001/XMLSchema#positiveInteger"/>

</owl:DatatypeProperty>

The cardinality constraint is then assigned as a restriction on the use of the relationships and attribute by this class:

<owl:Class rdf:ID="ORDER">

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#from"/>

                  <owl:cardinality rdf:datatype=

                        "http://www.w3.org/2001/XMLSchema#int">1</owl:cardinality>

            </owl:Restriction>

      </rdfs:subClassOf>

 

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#to"/>

                  <owl:cardinality rdf:datatype=

                        "http://www.w3.org/2001/XMLSchema#int">1</owl:cardinality>

            </owl:Restriction>

      </rdfs:subClassOf>

 

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Order_number"/>

                  <owl:cardinality rdf:datatype=

                        "http://www.w3.org/2001/XMLSchema#int">1</owl:cardinality>

            </owl:Restriction>

      </rdfs:subClassOf>

</owl:Class>

The object property line_item part of order is implemented with the sub-class of restriction method.  In this case, cardinality is simply another restriction.  Note that the object property definition is very simple:

  <owl:ObjectProperty rdf:ID="part_of"/>

Cardinality is then simply added as a second restriction, alongside the one that defines the use of the object property in the first place:

<owl:Class rdf:ID="LINE_ITEM">

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#part_of"/>

                  <owl:someValuesFrom rdf:resource="#ORDER"/>

            </owl:Restriction>

      </rdfs:subClassOf>

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#part_of"/>

                  <owl:cardinality rdf:datatype=

                        "http://www.w3.org/2001/XMLSchema#int">1</owl:cardinality>

            </owl:Restriction>

      </rdfs:subClassOf>

</owl:Class>
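
To make the meaning of "one and only one" concrete, the following is a minimal Python sketch that checks instance data against a cardinality of exactly 1.  It is not part of the OWL script, and the function name, the triple layout, and the sample identifiers are all assumptions made for illustration.  Note also that this is a closed-world check of the kind a database would make: an OWL reasoner, working under the open world assumption, would not treat a missing value as a violation, and would tend to infer that two values are the same individual unless they are declared different.

```python
from collections import Counter


def cardinality_violations(subjects, triples, prop, expected=1):
    """Return {subject: count} for every subject whose number of distinct
    values for `prop` differs from `expected` (closed-world check)."""
    # Collapse duplicate statements before counting.
    values = {(s, o) for s, p, o in triples if p == prop}
    counts = Counter(s for s, _ in values)
    return {s: counts[s] for s in subjects if counts[s] != expected}


# Hypothetical instance data as (subject, property, object) triples.
triples = [
    ("line_item_1", "part_of", "order_100"),
    ("line_item_2", "part_of", "order_100"),
    ("line_item_2", "part_of", "order_200"),  # part of two orders
]
# line_item_3 is part of no order at all.
line_items = ["line_item_1", "line_item_2", "line_item_3"]

violations = cardinality_violations(line_items, triples, "part_of")
print(violations)
```

Here line_item_2 and line_item_3 are reported, the first for having two orders and the second for having none.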

Classification in data models

The taxonomy has become very popular as an ontology form.  Part of this may be because we have all been raised under the influence of the Dewey Decimal System for classifying library books.  In the case of a library, it is  necessary to classify books in a hierarchical fashion, since that is the only practical way of organizing physical books.  A book can only be in one place, and the taxonomy tells you where that is. 

While the system has worked for libraries for over 100 years, it is clear that the actual taxonomy that has evolved is not without its problems.  The group labeled as the 100s has philosophy and psychology lumped together, while the 200s are assigned to religion (with only the 290s allocated to anything but Christianity). The 500s are the natural sciences and the 600s are assigned to technology and the applied sciences. Literature is in the 800s, while geography and history are in the 900s.

This means that a book on Islamic approaches to mathematics and the sciences throughout history is going to be very hard to classify.  It’s not possible to give it a 290-500-600-900 number.

The fact of the matter is that the body of knowledge represented by a library (and pretty much everything else, for that matter) is not strictly hierarchical.  While biologists have been successful at creating a hierarchical representation of life forms, other fields have been less successful.

In point of fact, the ontologists are quite happy to arrange their classes in a non-hierarchical fashion.  Note that above, it was necessary to identify the sub-classes explicitly as disjoint.  Absent an explicit constraint, there is nothing in OWL that prevents a company from being a sub-class of person or, more typically, a car from being a sub-class of both red car and high-horsepower car.
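
The effect of an explicit disjointness constraint can be sketched in a few lines of Python.  This is only an illustration under assumed names: the dictionary of super-classes and the two functions are hypothetical, and an OWL reasoner would report such a class as unsatisfiable (necessarily empty) rather than reject it outright.

```python
def ancestors(cls, parents):
    """Return all direct and indirect super-classes of `cls`."""
    seen = set()
    stack = list(parents.get(cls, ()))
    while stack:
        sup = stack.pop()
        if sup not in seen:
            seen.add(sup)
            stack.extend(parents.get(sup, ()))
    return seen


def disjointness_conflicts(parents, disjoint_pairs):
    """List classes that fall under both members of a declared disjoint pair."""
    conflicts = []
    for cls in parents:
        ups = ancestors(cls, parents) | {cls}
        for a, b in disjoint_pairs:
            if a in ups and b in ups:
                conflicts.append((cls, a, b))
    return conflicts


# Hypothetical class structure: each class maps to its super-classes.
parents = {
    "red_car": ["car"],
    "high_horsepower_car": ["car"],
    "red_sports_car": ["red_car", "high_horsepower_car"],
    "person": ["party"],
    "organization": ["party"],
    "company": ["organization", "person"],  # allowed unless declared disjoint
}

no_constraints = disjointness_conflicts(parents, [])
with_constraint = disjointness_conflicts(parents, [("person", "organization")])
print(no_constraints)
print(with_constraint)
```

With no constraints declared, nothing is flagged.  Only when person and organization are declared disjoint does company show up as a problem, while red_sports_car remains perfectly acceptable.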

Data modelers, on the other hand (well, this data modeler, anyway) are quite reluctant to permit multiple inheritance in a sub-type structure.  The idea is that sub-types should be limited to fundamental categories of the super-type.  In our example, by definition, an internal organization cannot be an instance of anything other than an organization.  A regulatory agency cannot also be a corporation.  And while the party type entity class is redundant in describing the sub-types, it does clearly say that each party may be an example of one and only one party type. This proves to be a useful restriction to place on the sub-typing process, because it forces analysts to really understand the nature of the classification.

This restriction is absolutely appropriate if what you are describing are taxonomies.  If not, what we need is a different approach.

A Classification Data Model

Data modeling is normally a two-dimensional representation, where taxonomies are the exception rather than the rule.  Data models are collections of entity classes, all related to each other in various ways.  It is true that data modelers tend to start with an understanding of the classes and only later speculate as to what instances of them might look like.  What ontologists tend to do, however, is to look at samples of real data and from there infer the kinds of classes they might be members of.  This results in more diverse classification of the same sets of data than typically might be done by data modelers.

There exists a particular data model, however, that can provide an approach to this diversity of classification, and this is shown in Figure 7.

Here, party category is recognized as an entity class.  A set of party categories constitutes a party category scheme.  Thus, a party category scheme might, for example, be “Income range”, and associated party categories could be “less than $20,000”, “$20,000 - $50,000”, “$50,000 - $100,000”, and “Greater than $100,000”.  The entity class party classification  then represents the fact that a particular party falls into the particular party category.

 

Figure 7: A More General Data Model

Note that “Income range” isn’t a particularly appropriate party category scheme for parties that are regulatory agencies (that is, parties that are an example of the party type “Regulatory Agency”).  Indeed, the only party type that this party category scheme is appropriate for is “Person”.  The entity class party category scheme assignment allows us to say that.  That is, party category scheme assignment is defined as the fact that a particular party category scheme is appropriate for classifying instances of party that are of a specified party type.  Each party category scheme assignment, then, must be of one and only one party category scheme and to one and only one party type.

Creating this entity class does not enforce any business rules, by the way.  It is still necessary for any applications collecting data here to enforce the rule that:

No party may be subject to a party classification into a party category that is not part of a party category scheme that is subject to at least one party category scheme assignment to the party type that the party is an example of.
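
As an illustration of what enforcing this rule in an application might look like, here is a minimal Python sketch.  The in-memory tables, the function name may_classify, and the sample parties are assumptions made for the example; a real application would read these from the tables that implement the entity classes in Figure 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    name: str
    party_type: str  # the party type this party is an example of


# party category -> the party category scheme it is part of (sample data)
CATEGORY_SCHEME = {
    "less than $20,000": "Income range",
    "$20,000 - $50,000": "Income range",
}

# party category scheme assignments, as (scheme, party type) pairs
SCHEME_ASSIGNMENTS = {("Income range", "Person")}


def may_classify(party, category):
    """True if the category's scheme is assigned to the party's party type."""
    scheme = CATEGORY_SCHEME.get(category)
    return scheme is not None and (scheme, party.party_type) in SCHEME_ASSIGNMENTS


# Hypothetical parties used only to exercise the rule.
jane = Party("Jane Smith", "Person")
agency = Party("Hypothetical Agency", "Regulatory Agency")

print(may_classify(jane, "less than $20,000"))
print(may_classify(agency, "less than $20,000"))
```

The person can be classified by income range; the regulatory agency cannot, because no party category scheme assignment links “Income range” to that party type.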

This gives us a much more flexible approach to classification than the sub-types of a taxonomy.  It means, for example, that the same party can be classified as many ways as anyone might want.  It also allows you to specify that each party classification must be by someone (another party).  It is possible, then, for different departments to maintain completely different classification schemes, potentially classifying parties in completely different ways.  Note that not only is the classifier of a party classification shown, but so too are the parties responsible for defining both each party category scheme and its party categories.

With this approach, you can also document when the party was classified (via “Effective date” and “Until date” attributes). 
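
A small Python sketch shows how the two date attributes might be used to ask which categories applied to a party on a given day.  The record layout, the function name, and the sample data are assumptions; in particular, the sketch assumes that a missing “Until date” means the classification is still in effect and that the until date is inclusive.

```python
from datetime import date


def categories_on(classifications, party, as_of):
    """Return the sorted categories in effect for `party` on `as_of`."""
    return sorted(
        c["category"]
        for c in classifications
        if c["party"] == party
        and c["effective"] <= as_of
        # Assumption: until=None means open-ended; the until date is inclusive.
        and (c["until"] is None or as_of <= c["until"])
    )


# Hypothetical party classifications made by two different departments.
classifications = [
    {"party": "Jane", "category": "less than $20,000", "classifier": "Marketing",
     "effective": date(2004, 1, 1), "until": date(2004, 12, 31)},
    {"party": "Jane", "category": "$20,000 - $50,000", "classifier": "Marketing",
     "effective": date(2005, 1, 1), "until": None},
    {"party": "Jane", "category": "Preferred customer", "classifier": "Sales",
     "effective": date(2004, 6, 1), "until": None},
]

mid_2004 = categories_on(classifications, "Jane", date(2004, 7, 1))
early_2005 = categories_on(classifications, "Jane", date(2005, 3, 1))
print(mid_2004)
print(early_2005)
```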

Converting the Model to OWL

Figure 8 shows the Cerebra drawing of our model. The classes are now those that correspond to the data model, and the datatype properties are again shown in dark shaded hexagons.  The object properties, however, have been added, and are shown with light grey hexagons.  The following relationships are defined using the domain/range approach to modeling relationships:

·       party SubjectTo / of party_classification (defined as the inverse of each other)

·       party_category_scheme subject_to / about party_category_scheme_assignment (defined as the inverse of each other)

·       party_type the_object_of / to party_category_scheme_assignment (defined as the inverse of each other)

In addition, three relationships are shown as restrictions on a class.  These are:

·       party_classification into party_category

·       party_category part of party_category_scheme

·       party an example of  party_type / party_type embodied in party (defined as the inverse of each other)

As an example of the first approach, where each object property is given a domain and range, the OWL looks like this:

<owl:ObjectProperty rdf:ID="SubjectTo">

      <owl:inverseOf rdf:resource="#of"/>

      <rdfs:domain rdf:resource="#PARTY"/>

      <rdfs:range rdf:resource="https://essentialstrategies.com/owl/parties#PARTY_CLASSIFICATION"/>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="of">

      <owl:inverseOf rdf:resource="#SubjectTo"/>

      <rdfs:domain rdf:resource="https://essentialstrategies.com/owl/parties#PARTY_CLASSIFICATION"/>

      <rdfs:range rdf:resource="#PARTY"/>

</owl:ObjectProperty>

 

 


Figure 8: Cerebra Model of Relationships

 

<owl:ObjectProperty rdf:ID="to">

      <owl:inverseOf rdf:resource="#the_object_of"/>

      <rdfs:domain rdf:resource="#PARTY_CATEGORY_SCHEME_ASSIGNMENT"/>

      <rdfs:range rdf:resource="#PARTY_TYPE"/>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="the_object_of">

      <owl:inverseOf rdf:resource="#to"/>

      <rdfs:domain rdf:resource="#PARTY_TYPE"/>

      <rdfs:range rdf:resource="#PARTY_CATEGORY_SCHEME_ASSIGNMENT"/>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="subject_to">

      <owl:inverseOf rdf:resource="#about"/>

      <rdfs:domain rdf:resource="#PARTY_CATEGORY_SCHEME"/>

      <rdfs:range rdf:resource="#PARTY_CATEGORY_SCHEME_ASSIGNMENT"/>

</owl:ObjectProperty>

 

<owl:ObjectProperty rdf:ID="about">

      <owl:inverseOf rdf:resource="#subject_to"/>

      <rdfs:domain rdf:resource="#PARTY_CATEGORY_SCHEME_ASSIGNMENT"/>

      <rdfs:range rdf:resource="#PARTY_CATEGORY_SCHEME"/>

</owl:ObjectProperty>

The relationship from party to party type is an example of the second approach and looks like this in OWL:

<owl:ObjectProperty rdf:about="https://essentialstrategies.com/OWL/Parties#an_example_of">

      <owl:inverseOf rdf:resource="#embodied_in"/>

</owl:ObjectProperty>

<owl:Class rdf:ID="PARTY">

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#an_example_of"/>

                  <owl:someValuesFrom rdf:resource="#PARTY_TYPE"/>

            </owl:Restriction>

      </rdfs:subClassOf>

</owl:Class>

Party_classification into party_category looks like this (without the property definition, which is elsewhere in the script):

<owl:Class rdf:about="https://essentialstrategies.com/owl/parties#PARTY_CLASSIFICATION">

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#into"/>

                  <owl:someValuesFrom rdf:resource="#PARTY_CATEGORY"/>

            </owl:Restriction>

      </rdfs:subClassOf>

</owl:Class>

And party_category part_of party_category_scheme has this form in OWL (along with the datatype properties that are also restrictions):

<owl:Class rdf:ID="PARTY_CATEGORY">

      <!-- Relationship -->

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#part_of"/>

                  <owl:someValuesFrom rdf:resource="#PARTY_CATEGORY_SCHEME"/>

            </owl:Restriction>

      </rdfs:subClassOf>

 

      <!-- Attributes -->

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Name"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

     

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Definition"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

 

      <rdfs:subClassOf>

            <owl:Restriction>

                  <owl:onProperty rdf:resource="#Description"/>

                  <owl:allValuesFrom rdf:resource=

                        "http://www.w3.org/2001/XMLSchema#string"/>

            </owl:Restriction>

      </rdfs:subClassOf>

 

</owl:Class>

An issue

An ongoing issue in the data modeling world has always been, how abstract should the model be?  In the taxonomy example above, party is an abstraction of person and organization, and it specifically does not directly represent “Customer”, “Employee”, “Contractor” or any of the other roles that people and organizations play.  This is done because systems built based on such models are more robust and reliable.  New roles can be implemented as data, rather than as changes to structure. 

Existing systems, however (and the data they contain), are often defined in much more concrete terms.  The kinds of assertions that an OWL analysis will have to deal with are such things as “Charlie is an employee”, “Sally is a customer”, and so forth. 

Indeed, in the classification example above, converting that rather abstract model into OWL classes does not make it easy to identify, for example, the instances of people who are high-income, Hispanic, and have 4 cars.  In the data model, it is difficult to do any correlations between people’s being members of different classes.

An ontologist would take a completely different approach.  Simply define classes as necessary.  A person can be a member of any or all of them.  Then by analyzing the unions and intersections of these classes, conclusions can be drawn.
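
The idea can be sketched with ordinary Python sets, where each class is simply the set of its known members.  The individuals and their memberships below are invented for illustration and are not drawn from any real data.

```python
# Each class is the set of individuals asserted to belong to it (sample data).
high_income = {"Ana", "Charlie", "Sally"}
hispanic = {"Ana", "Luis"}
four_cars = {"Ana", "Sally"}
employee = {"Charlie", "Sally"}
customer = {"Sally", "Luis"}

# People who are high-income, Hispanic, and have 4 cars: an intersection.
target = high_income & hispanic & four_cars

# Are "Employee" and "Customer" mutually exclusive?  Check their overlap.
both_roles = employee & customer

# Everyone who plays at least one of the two roles: a union.
either_role = employee | customer

print(target)
print(both_roles)
print(sorted(either_role))
```

A non-empty overlap between employee and customer is exactly the kind of finding that would tell us the two classes are not disjoint.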

Here is one area where the difference between the way data modelers view the world and the way ontologists view the world is significant.  It is not clear what we should do about it.

To be sure, one of the kinds of analysis that an OWL inference engine might accomplish is to discover that being an employee and being a customer are not mutually exclusive, so future systems should take that into account (and the data model should be made more general).  But this would be an interactive process, and we are a long way from understanding just how that would work in practice.

Implications

What ontologies do that data models do not

Ontology languages like OWL are fundamentally text based, with actionable semantics in every word.  This makes it possible for a computer program to parse the language, “understand” what it says, and draw inferences about what is not being said.  The significance of this, however, is not in the class structures that we have been describing in this paper.  Rather, this is an important facility in analyzing a body of data whose structure is not initially clear.  Each assertion is that a subject belongs to one or more classes.  The structure provided by an ontology language such as OWL provides a basis for determining just what those classes are, and what the implications of that membership are.

If Joe is described in one record as being an employee of a company, does that rule out the possibility that he is also a contractor?  Are “Employee” and “Contractor” intersecting sets?  Are Kansas City, Kansas and Kansas City, KS the same city?  What about Kansas City, Kansas and Kansas City, MO?  These are the kinds of questions ontologies allow us to ask.   

When gathering together either unstructured data or data from disparate legacy systems, these seem reasonable and useful questions.  The answers highlight things we didn’t think of and assumptions we didn’t know we were making.

While data modelers use examples to test models and to demonstrate them, they are not asking the same kinds of questions.  They have already defined exactly what is and is not an instance of a particular class.  The assumptions they may have made in doing so are not obvious, and can lead developers astray if they are not made explicit.

And of course, data models are harder to build (and often impossible to implement) when trying to address unstructured data, such as e-mails or product specifications.

What data models do that ontologies do not

Cerebra’s pale attempts to provide a graphic front-end notwithstanding, the problem with ontologies is the same as its advantage: they are fundamentally textual.  This makes books and articles on ontologies difficult to read, and absent an inference engine to highlight inconsistencies, it makes it difficult to ensure that an ontology is correct.  An OWL script is not typically a good subject for a discussion with management about the nature of its data.

Data models (properly done, it should be noted), with their graphic representation, make it much easier to see the assertions they make so that they can be discussed.  What do you mean by a “Customer”?  What roles are implied by the use of that word?  If you want to change a relationship or a cardinality constraint, it is easy to do.

To be sure, the ontology exercise revealed to your author assumptions he didn’t realize he was making, specifically about disjointness.  This has always been a point when discussing multiple inheritance in building sub-type structures, but the fact of the matter is that throughout the model, you have to ask if something can be an instance of two different entity classes, such as, for example, contract and order.  Discipline in defining entity classes clearly goes a long way towards addressing this concern, but it is good to make the issues explicit.

What neither one of them does, yet

For the past decade or so, it has been clear that data models are limited in their ability to describe business rules.  They can describe cardinality and optionality constraints, and some notations provide for mutually exclusive relationships, but that is about it.  With data models, you can assume universal disjointness.  You can represent computed attributes on a diagram, and some notations allow you to show the calculation alongside it, although the diagram itself is not really representing the calculation.  Object role modeling does have the ability to express things like limitations on recursion and constraints on navigation across relationships, but that notation is not as widely used as other, less expressive ones.

The problem is compounded by the fact that until recently there hasn’t been a formal way to express business rules, even in natural language.  This has been addressed recently by a group called the Business Rules Team, who in 2006 submitted to the Object Management Group a proposed standard called “Semantics of Business Vocabulary and Business Rules”[17].  This was the result of extensive work by linguists, philosophers, business rule experts, and vendors to define a semantic approach accessible enough that it can be used by the business community to describe business rules and other constraints, but rigorous enough that the results can be automated.  It is a formal derivation of the semantics of business rules, beginning with the notions of concepts and meanings, building to propositions and facts, and concluding with the definition of a business vocabulary and business rules.  But this still doesn’t address the problem of how to link those rules to the data model.

Do the ontology world’s open world assumption and inference engines help address this?

Alas, not yet.  In the Semantic Web’s layered architecture, ontology languages remain descriptive only.  Indeed, they cannot describe as many business rules as data models can.  OWL cannot, for example, describe a constraint on an attribute value, or compare it to another value.  As described above, it cannot deal with computed attributes.  OWL can describe (as the Greeks would have it) what exists.  But as with data models, it cannot describe what should or should not be. 

In the promise of the Semantic Web, this can so far only be addressed by the “Applications” layer. 

“Upper Ontologies”

It is common enough to begin trying to create an ontology by capturing the specific vocabulary of your organization.  As it happens, though, much of that vocabulary is derived from industry standard vocabulary.  How about starting with an industry ontology and simply refining it?

As it happens, the ontological world is moving in that direction.  Little by little industry specific ontologies are being developed, and the advantage of OWL is that the language permits the merging of ontologies.  Industry versions can provide a good starting point, whereupon they can be enhanced to add both enterprise-specific definitions and entirely new enterprise-specific words.

The ontology world has gone beyond these industry-specific models, however, to develop what are called “upper ontologies”.  This represents an attempt to establish a universal glossary of terms that can be used across all areas of interest.  Beginning in 1990, D. B. Lenat and R. V. Guha developed the “Cyc” project to capture what was hoped to be all of “common sense” in an ontological repository.  The model of this is nothing short of a model of knowledge itself.  As of 2004, it contained about 3,000 terms arranged in 43 topical groups (spatial relations, time and dates, etc.).  The terms were derived from over 1,000,000 assertions, all entered by hand.  Cyc begins with “Thing”, with its primary sub-types of “Tangible thing”, “Temporal thing”, and “Intangible thing”.

Along the same lines, RA Pease and I Niles set out to create an IEEE Standard Upper Ontology.  Similar to Cyc, this starts with “Entity” (ok, “Thing”), sub-divided into “Physical entity” and “Abstract entity”.   This attempts to construct an upper ontology from other ontologies available from public sources.

John Sowa has developed an upper level ontology using 27 basic concepts.  Under “Thing”, he has “Physical” and “Abstract” things, like the others, but he also adds “Independent”, “Relative”, and “Mediating”.  His ontology is clearly a network, with many terms falling into multiple categories.

The nice thing about ontological technology is that it is possible to build a specific ontology by starting first with a generic ontology such as one of these, and then adding domain-specific ontologies for your industry (such as pharmaceuticals, oil extraction and refining, or banking).  From there, you can then refine it further to include the specific terms of your company.

Since the early 1990s, the data modeling community has also recognized that there are standard models to describe standard business situations.  In 1995, your author wrote Data Model Patterns: Conventions of Thought[18], describing standard models for standard business situations.  The idea was that an analyst could start with the general models and then elaborate on them as necessary to meet particular requirements.  It is biased towards manufacturing, but includes a range of other models applicable to all industries.  In 2001 Len Silverston published The Data Model Resource Book, in two volumes[19].  The first volume continues with the idea of generalized patterns.  The second volume has extensive models for specific industries.

The problem with the data modeling approach to these efforts is that it is difficult mechanically to begin with a standard model and build on it.  In principle, CASE tools allow this, but the diversity of notations and levels of abstraction makes this difficult.

The diversity of these upper ontologies and data model patterns, combined with the enthusiasm with which they are being pursued, suggests that there ought to be a common language we can all use.  Ok, given the modern international political situation, there ought at least to be a common language we can use in developing systems.  Of course, this in turn requires a common language to be used throughout our organizations.

Ah, well.


What next?

This article showed that, except for the issue of level of abstraction described previously,  it is a reasonably mechanical transformation from a data model to an ontology language such as OWL.  It would be very nice if there were software available to do this automatically.  This would make it possible to bring the business and IT communities’ analysis of the nature of the enterprise’s data in a graphical environment together  with OWL and an OWL inference engine.  The graphical analysis is where  conflicting definitions are resolved, structures are made clear, and discussions about how abstract to make the models are carried out.  How wonderful it would be if this could then be supplemented with analysis of existing databases and unstructured data to determine where they conform to these structures and where they do not.  More importantly, such inference engines could describe why they do not.



*           When data modeling was invented, this was called an “entity type”, with the word “entity” used to refer to an instance of an entity type.  Person would be an entity type, for example, and “Jane Smith” would be an entity.  Over the years, however, people got sloppy and began using the word “entity” to mean the kind or class of things, relegating instances to be called simply “entity instances”, or some such.  Then the object-oriented movement came along and claimed that it was special because it talked about “classes” of things. Since this is nothing other than the entity type concept, your author is attempting to bring the worlds together by calling a class/entity type an “entity class”. 

 

*           It could be, but, alas, it is not...

[1]           Kemerling, Garth, (1997-2002) Philosophy Pages.  See at http://www.philosophypages.com/dy/s4.htm#sems

[2]           Ibid, http://www.philosophypages.com/dy/o.htm#onty

[3]           Gómez-Pérez, Asunción, Fernández-López, M, Corcho, O (2004). Ontological Engineering.  (London: Springer-Verlag). Page 3.

[4]           Ibid. Pages 4-5.

[5]           Ibid. Pages 4-5.

[6]           Knowledge Based Systems, Inc., Information for Concurrent Engineering. Prepared for Armstrong Laboratory AL/HRGA. 1994.

*           You may wonder why the language is not called “WOL”. In Winnie the Pooh, the character Owl tries to spell his name and comes up with “WOL”, whereupon all of his friends quickly point out his error.  Rumor has it that the W3C simply decided not to make the error in the first place.

[7]           Tim Berners-Lee, Weaving the Web.  Harper, San Francisco.  1999.

[8]           Michael C. Daconta, L.J. Obrst, and K.T. Smith, The Semantic Web. Indianapolis: Wiley Publishing, Inc.  2003. page 4.

[9]           Ibid., page 3.

[10]          This section is taken from David C. Hay. “Data Modeling, RDF, and OWL”.  The Data Administration Newsletter.  To be found at http://www.tdan.com/i036ht04.htm.

*           Again, note that this is a very different definition of the word “domain” than we are accustomed to in the relational database world.  There, the word refers to (for example) a list of possible values for an attribute.  Here it refers to the class the attribute belongs to.

[11]          Knowledge Based Systems, Inc. IDEF5 Method Report.  Prepared for Armstrong Laboratory AL/HRGA, Wright-Patterson Air Force Base, Ohio.  It can be found at http://www.idef.com/pdf/idef5.pdf.

[12]          An overview of OWL can be found at http://www.w3.org/TR/owl-features; The Reference Manual may be found at http://www.w3.org/TR/owl-ref.

[13]          Best documented in Barker, Richard. CASE*Method: Entity Relationship Modelling. Wokingham, UK: Addison-Wesley. 1990.

[14]          http://protege.stanford.edu

[15]          http://cerebra.com

[16]          http://metatomix.com

*           Note by the way, that, while this sequence is useful to the explanation, in the physical document, this may not be “first” at all.  There is no significance to the sequence of statements in an OWL script.  Each XML statement is processed independently of all others that are not part of its hierarchy.  In the example, all dataTypeProperties were defined at the end of the document.

*           Note the addition to the notation of the arc across the two “for” relationships.  This means that, for an instance of the entity class, there is either an occurrence of one relationship or an occurrence of the other relationship.

[17]          Business Rules Team. “Semantics of Business Vocabulary and Business Rules”. An Interim Convenience Document submitted to the Object Management Group. 2006.

[18]          David C. Hay. Data Model Patterns: Conventions of Thought. New York: Dorset House. 1995.

[19]          Len Silverston. The Data Model Resource Book (two volumes). Indianapolis: John Wiley, Inc. 2001.