Crossing Chasms: A Pattern Language for Object-RDBMS Integration
"The Static Patterns"
By Kyle Brown and Bruce G. Whitenack
Abstract
Crossing Chasms is a growing pattern language to help design and
build object-oriented applications that use a relational database
for persistent storage. Crossing Chasms' patterns are categorized
into three groups: static, dynamic and client-server. Static patterns
deal with the definition of the relational schema and the object
model. This section of Crossing Chasms specifically addresses the
static issues.
Introduction
Patterns can be used to express the rationale of architectural
designs. Crossing Chasms is an effort to establish the foundation
patterns for an architecture that joins a relational database to an
object-oriented application. It is a pattern language to help design
and build object-oriented applications that use a relational
persistent store.
Building such a system requires difficult design decisions that
involve maintainability, performance, simplicity, and interoperability
between the client and server systems. Any database architecture,
framework or tool used in object-oriented applications of this type
should support these patterns.
Relational databases are commonly used for applications (even those
that are developed in an OO language) for the following reasons:
- They exist in legacy systems that must be used by new systems.
- It is a technology that has been used and tested for a number
of years and is consequently well-understood.
- The table model is a simple model which has a sound mathematical
foundation.
The weaknesses of using a relational database with an object system
are:
- Limited modeling capabilities (object instantiation, behavior
and inheritance are not easy to define when compared to an object
database).
- Poor performance for complex applications. Multiple table joins
may be required to represent a complex object. This is much less
efficient than directly referencing an object.
- There is a semantic mismatch with OO languages. A relational
database offers a limited set of data types, while an object
model has a potentially unlimited number of classes.
These strengths and weaknesses are important to keep in mind as
the patterns are described. Crossing Chasms' patterns are categorized
into three groups: static, dynamic and client-server. Static patterns
describe how to define the structural relationships of the entities
(objects and tables) as well as their properties.
Static Patterns (Relational Side)
The Static Patterns for the relational side deal with when and
how to best define a database schema to support an object model.
The identity of the objects, their relationships (inheritance, aggregation,
semantic associations) and their state must be preserved in the
tables of a relational database. Table Design Time deals with the
best time during development to actually design the relational
schema. Representing Objects as Tables, Representing Object Relationships
as Tables, Representing Inheritance in a Relational Database, Representing
Collections in a Relational Database and Foreign-Key Reference deal
with defining the relationships between objects and defining each
object's state. Object Identifier (OID) defines how to establish
object identity in a relational database.
Table Design Time
Problem
When is the best time to design your relational database during
object-oriented development?
Forces
Assume no legacy database exists prior to development or if one
does exist, it is extremely flexible (i.e., it can be changed according
to application needs). When the database design is kept foremost
in mind during development, the object model will tend to be data
driven while the behavior and responsibilities of the objects will
be deprived of the thought and energy they deserve. Consequently,
the object model will tend to have separate data objects and controller
objects. This leads to a design that has heavy-duty controller objects
and stupid data objects rather than a better, more-distributed,
less-centralized design. If the database design is completely ignored
until the application is completed, the project may suffer. Since
25% to 50% of the code in such applications often deals with object-database
integration, the design of the database is crucial and should be
considered early in development. Consequently:
Solution
Design the tables based on your object model after you have implemented
it in an architectural prototype but before the application is in
full-stage production.
Discussion
Definition of domain object behavior and properties is in reality
a first pass at the database design. A stop-gap persistency approach
(perhaps using flat ASCII files) is often "good enough"
for an architectural prototype. A benefit of this approach is that
legacy data can be quickly exported from existing databases to an
ASCII file. The prototype can then be easily demonstrated on stand-alone
workstations that may not have a relational database and still show
"real" data familiar to customers.
Related Patterns
Representing Objects as Tables
Representing Object Relationships as Tables
Representing Inheritance in a Relational Database
Representing Collections in a Relational Database
Architectural Prototype (Kent Beck's Smalltalk Best Practices Patterns)
Representing Objects as Tables
Problem
How do you map an object structure into a relational database schema?
Forces
Objects do not map neatly into tables. For instance, object classes
do not have keys. Tables do not have the same identity property
that objects do. The datatypes of tables in a relational database
do not match the classes in the object model. Complex objects can
reference other complex objects and collections of objects.
Solution
Begin by creating a table for each persistent object in your object
model. Determine what type of object each instance variable is likely
to contain. For each object that is representable in a database
as a base datatype (i.e., String, Character, Integer, Float, Date,
Time) create a column in the table corresponding to that instance
variable, naming it the same as the instance variable. If an instance
variable contains a Collection subclass, use Representing Collections
in a Relational Database. If an instance variable contains any other
value, use Foreign-Key Reference.
Finally, create a column to contain this object's OID. (See Object
Identifier (OID).)
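The column rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Customer` class, the `SQL_TYPES` mapping and the `create_table_sql` helper are hypothetical names, and SQLite stands in for whatever relational database is actually in use.

```python
import sqlite3
from dataclasses import dataclass, fields
from datetime import date

# Assumed mapping from base datatypes to SQL column types.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL", date: "DATE"}


@dataclass
class Customer:  # hypothetical persistent class
    name: str
    credit_limit: float
    since: date


def create_table_sql(cls):
    """Build a CREATE TABLE statement: one OID column plus one column
    per instance variable that holds a base datatype."""
    columns = ["oid INTEGER PRIMARY KEY"]
    for field in fields(cls):
        if field.type in SQL_TYPES:
            columns.append(f"{field.name} {SQL_TYPES[field.type]}")
        # Collections and references to other objects are left to the
        # other patterns and are skipped in this sketch.
    return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(columns)})"


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Customer))
```

Each column takes the name of its instance variable, as the solution suggests, and the `oid` column is added last so that it exists for every persistent class.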
Discussion
The design of the database may need modification (for instance,
denormalization) depending upon the access patterns required for
particular scenarios. Remember that this design is an iterative
process. There are several variations of mappings between classes
and tables. These are:
- 1 object class maps to 1 table
- 1 object class maps to multiple tables
- multiple object classes map to 1 table
- collections of the same class map to 1 table
- multiple object classes map to multiple tables
The Database Access Architecture must handle each of these variations.
Sources
[Elmasri 94] [Rumbaugh 91]
Representing Object Relationships as Tables
Problem
How do you represent object relationships in a relational database
schema?
Forces
A variety of relationships exist between classes in an object model.
These relationships may be:
- 1 to 1 (husband - wife)
- 1 to many (mother - child)
- many to many (ancestor - child)
- ternary (or n-ary) associations (student - class - professor)
- qualified associations (company - office - person)
A qualified association is an association between two objects where
the association is constrained or identified in some way. For example,
a Company can be associated with a Person through a position held
by that Person. The position qualifies the association between the
Company and the Person.
The association between objects may represent containment or associated
properties, or may have some special semantic meaning in its own right
(e.g., a marriage is a special relationship between a man and a
woman).
The choices for 1 to 1, and 1 to many relationships, are either
to merge the association into a class or to create a class based
on the association.
It is important to remember that the semantics of a relationship
between objects can be significant. It is often useful to create
classes to represent the associations, especially if the relationship
has values of its own. These classes will be represented as tables
in the relational database. For many to many, 1 to many and 1 to
1 associations, when an association has a meaningful existence in
the problem domain, create a class for the association. An association
has a meaningful existence when the relationship itself carries
value, for example by possessing properties such as duration,
quality or type. A marriage is a relationship between a man and
a woman that can have all these properties.
Therefore:
Solution
Merge 1 to 1 associations with no special meaning into one of the
tables. If it has special meaning create a table based on the class
derived from the association.
For 1 to many associations, create a relationship table (see Representing
Collections in a Relational Database).
A many to many relationship always maps to a table whose columns are
foreign keys referencing the tables of the two objects.
Ternary and n-ary associations should have their own table that
references the participating classes by foreign key.
A qualified association should have its own table.
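As a rough sketch of an association that gets its own table, the qualified Company - Person association from the Forces section might look like the following. The table and column names, the `holder_of` helper, and the choice of (company, position) as the primary key are assumptions made for illustration; SQLite is used only because it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person  (oid INTEGER PRIMARY KEY, name TEXT);
-- The association gets its own table; each row links one company
-- to one person through the position that person holds.
CREATE TABLE employment (
    company_oid INTEGER NOT NULL REFERENCES company(oid),
    person_oid  INTEGER NOT NULL REFERENCES person(oid),
    position    TEXT    NOT NULL,        -- the qualifier
    PRIMARY KEY (company_oid, position)  -- assumed: one holder per position
);
INSERT INTO company VALUES (1, 'Acme');
INSERT INTO person  VALUES (10, 'Ann'), (11, 'Bob');
INSERT INTO employment VALUES (1, 10, 'CEO'), (1, 11, 'CFO');
""")


def holder_of(conn, company_oid, position):
    """Return the name of the person holding a position, or None."""
    row = conn.execute(
        """SELECT p.name FROM employment e
           JOIN person p ON p.oid = e.person_oid
           WHERE e.company_oid = ? AND e.position = ?""",
        (company_oid, position),
    ).fetchone()
    return row[0] if row else None
```

A plain many to many association would use the same shape with the two foreign keys together as the primary key; properties of the relationship itself, such as a duration, would become further columns of the association table.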
Discussion
Consideration of the forces of this pattern will often result in
changes to a first-pass object model. This is desirable, since it
will often generate a more general and flexible solution.
Sources
[Rumbaugh 91]
Related Patterns
Representing Inheritance in a Relational Database
Representing Collections in a Relational Database
Representing Inheritance in a Relational Database
Problem
How do you represent a set of classes in an inheritance hierarchy
in a relational database?
Forces
Relational databases do not provide support for inheritance of
attributes. It is impossible to do a true 1-1 mapping between a
relational table and a class when that class inherits attributes
from another class, or if other classes inherit from it.
This pattern has two possible contexts, depending upon which is
more important to your particular application: speed of queries,
or maintainability and flexibility of your relational schema.
Solution
(When ease of schema modification is paramount)
Create one table for each class in your hierarchy that has attributes.
This will include both concrete and abstract classes. The tables
will contain columns for each of the attributes defined in that
class, plus an additional column that represents the common key
shared between all subclass tables. An instance of a concrete subclass
is retrieved by doing a relational JOIN of all of the tables in
a path to the root with the common key as the join parameter.
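A minimal sketch of this first solution follows, assuming a hypothetical three-level hierarchy (Person, Employee, Manager) in which every table shares the `oid` column as the common key. The names and the use of SQLite are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per class; each holds only the attributes that class defines.
CREATE TABLE person   (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (oid INTEGER PRIMARY KEY REFERENCES person(oid), salary REAL);
CREATE TABLE manager  (oid INTEGER PRIMARY KEY REFERENCES employee(oid), budget REAL);
""")


def save_manager(conn, oid, name, salary, budget):
    """Write one row per class on the path from the root to Manager."""
    conn.execute("INSERT INTO person VALUES (?, ?)", (oid, name))
    conn.execute("INSERT INTO employee VALUES (?, ?)", (oid, salary))
    conn.execute("INSERT INTO manager VALUES (?, ?)", (oid, budget))


def load_manager(conn, oid):
    """Join every table on the path to the root using the common key."""
    return conn.execute(
        """SELECT p.oid, p.name, e.salary, m.budget
           FROM manager m
           JOIN employee e ON e.oid = m.oid
           JOIN person   p ON p.oid = e.oid
           WHERE m.oid = ?""",
        (oid,),
    ).fetchone()


save_manager(conn, 1, "Ann", 5000.0, 100000.0)
```

Retrieving a Manager costs two joins here; each extra level in the hierarchy adds one more.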
Discussion
This is a direct mapping, which makes it easy to change if a class
anywhere in the hierarchy changes. If a class changes, you must
change at most one table. Unfortunately, the overhead of doing multi-table
joins can become a problem if you have even a moderately deep hierarchy.
Solution
(When speed of queries is more important)
Create one table for each concrete subclass of your hierarchy that
contains ALL of the attributes defined in that subclass or inherited
from its superclasses. An instance is retrieved by querying that
table.
Discussion
This avoids the joins of the previous solution, making queries
more efficient. This is also a simple mapping, but has the drawback
that if a superclass is changed, then many tables must be modified.
It is also difficult to infer the object design from the relational
schema.
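The trade-off of this second solution can be illustrated with a small sketch. It assumes a hypothetical abstract Person class with two concrete subclasses, Employee and Manager; the helper names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per concrete class; inherited attributes are repeated in each.
CREATE TABLE employee (oid INTEGER PRIMARY KEY, name TEXT, salary REAL);
CREATE TABLE manager  (oid INTEGER PRIMARY KEY, name TEXT, salary REAL, budget REAL);
""")

# Tables that repeat the attributes inherited from Person.
PERSON_SUBCLASS_TABLES = ["employee", "manager"]


def add_person_attribute(conn, column_definition):
    """A change to the superclass must be applied to every subclass table."""
    for table in PERSON_SUBCLASS_TABLES:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column_definition}")


def column_names(conn, table):
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


add_person_attribute(conn, "email TEXT")
```

Loading a Manager is now a single-table query, but adding one attribute to Person touched every table in the list.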
There is a third solution that may be more appropriate in a multiple-inheritance
environment, but that does not have much to recommend itself beyond
that. It is possible to create a single table that represents ALL
of the superclasses' and subclasses' attributes, with SELECT statements
picking out only those that are appropriate for each class. Unfortunately,
this can lead to a large number of NULLs in your database, wasting
space.
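A sketch of this single-table variant follows. The `class_name` discriminator column is an assumption added so that rows can be told apart; the text above does not prescribe one. The hierarchy and helper names are hypothetical, and a single-inheritance hierarchy is used only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_hierarchy (
    oid        INTEGER PRIMARY KEY,
    class_name TEXT NOT NULL,   -- assumed discriminator column
    name       TEXT,
    salary     REAL,            -- NULL for plain Person rows
    budget     REAL             -- NULL unless the row is a Manager
);
INSERT INTO person_hierarchy VALUES
    (1, 'Person',   'Ann', NULL,   NULL),
    (2, 'Employee', 'Bob', 4000.0, NULL),
    (3, 'Manager',  'Cy',  6000.0, 90000.0);
""")

# Each class selects only the columns that apply to it.
COLUMNS = {
    "Person": "oid, name",
    "Employee": "oid, name, salary",
    "Manager": "oid, name, salary, budget",
}


def load_all(conn, class_name):
    sql = (f"SELECT {COLUMNS[class_name]} FROM person_hierarchy "
           "WHERE class_name = ? ORDER BY oid")
    return conn.execute(sql, (class_name,)).fetchall()


def null_count(conn):
    """Count the unused (NULL) subclass cells in the table."""
    return conn.execute(
        "SELECT SUM((salary IS NULL) + (budget IS NULL)) FROM person_hierarchy"
    ).fetchone()[0]
```

Even with three rows, half of the subclass-specific cells are empty, which is the space cost noted above.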
Sources
[Elmasri 94] contains a discussion of how this problem is dealt
with in the Extended ER (EER) model. Both [Jacobson 92] and [Rumbaugh 91]
discuss this problem and present this solution.
Related Patterns
Object Identifier (OID)
Representing Collections in a Relational Database
Problem
How do you represent Collection subclasses in a relational database?
Forces
The first normal form rule of Relational Databases prevents a relation
from containing a "Multivalued" attribute, or what we
would normally think of in Object terms as a Collection. The kind
of 1-N relationships represented in OO languages by collection classes
are represented in a very different form in a relational database.
Collection classes in Smalltalk often convey additional information
besides the relationship between the objects contained in the collection,
and the object that contains the collection. Order, sorting methods,
and type of the contained objects are all problems that must be
addressed.
Solution
Represent each collection in your object model (where one object
class is related to another object class by a 1-N has-a relationship)
by a relationship table. The table may also contain additional attributes
that address the other issues.
The basic solution involves creating a table that consists of at
least two columns, one which represents the primary key (usually
the OID) of the containing object (the object that holds the collection)
and another which represents the primary key of the contained objects
(the objects held in the collection). Each entry in the table shows
a relationship between the contained object and the containing object.
The primary key of the relationship table consists of both columns.
A third column may be needed which indicates either the class of
the object or the table that the object is located in. Collections
may contain objects of various classes.
Discussion
There are other possible representations of the 1-N relationships,
including back-pointers. Back-pointers have the drawback that it
is difficult to have an object be contained in more than one collection
at the same time when the two collections are contained in different
instances of the same class. The simplest, and most common additional
information to include in a relationship table is a column that
indicates the type of the contained object. This is necessary when
a Collection may be heterogeneous. If an OrderedCollection is utilized,
and the order is significant, the position of the object in the
collection may be stored in an additional column. It must be noted
that unless a distinguishing column indicating a position or OID
is added to a relation table and made part of its primary key then
the basic solution represents a Set, rather than a more general
collection, since the key constraint of relational databases prevents
a tuple from occurring more than once in the same table.
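One way to keep order and allow duplicates is sketched below. It assumes a hypothetical `playlist_tracks` relationship table in which a `position` column is part of the primary key; the helper names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE playlist_tracks (
    container_oid INTEGER NOT NULL,
    position      INTEGER NOT NULL,   -- index within the ordered collection
    element_oid   INTEGER NOT NULL,
    PRIMARY KEY (container_oid, position)
)""")


def save_ordered(conn, container_oid, element_oids):
    """Replace the stored collection, recording each element's position."""
    conn.execute(
        "DELETE FROM playlist_tracks WHERE container_oid = ?", (container_oid,)
    )
    conn.executemany(
        "INSERT INTO playlist_tracks VALUES (?, ?, ?)",
        [(container_oid, pos, oid) for pos, oid in enumerate(element_oids)],
    )


def load_ordered(conn, container_oid):
    rows = conn.execute(
        """SELECT element_oid FROM playlist_tracks
           WHERE container_oid = ? ORDER BY position""",
        (container_oid,),
    )
    return [row[0] for row in rows]


save_ordered(conn, 1, [7, 3, 7])
```

Because the position is part of the key, the same element (OID 7 here) can appear twice, which the two-column table would reject.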
Sources
[Elmasri 94] presents this as the primary solution for handling
1-N relationships in the E-R model. [Beck 94] and [Loomis 94] present
additional information about constraints on these relationships
when mapping Smalltalk collections to relational tables.
Related Patterns
Object Identifier (OID)
Object Identifier (OID)
Problem
How do you represent an object's individuality in a relational
database?
Forces
In object-oriented languages, objects are uniquely identifiable.
In Smalltalk, an equivalence comparison (==) determines if two objects
are exactly identical. This is accomplished through the comparison
of their Object Pointers (OOPs) which are uniquely assigned to each
object when it is instantiated.
In an environment where objects may become persistent, some way
of identifying what particular persistent structure (be it a row
in a relational database, or a structure in an OODBMS) corresponds
to that object has to be added to the mix. OOPs are reassigned and
reclaimed by the system, precluding their use as an object identifier.
Solution
Assign an identifier (an Object IDentifier or OID) to each object
that is guaranteed to be unique across image invocations. This identifier
will be part of the object, and will be used to identify it during
query operations, and update operations.
Discussion
OIDs can be generated either internally to your application, or
externally. Some relational databases include a sequence number
generator that can be used to generate OIDs, and it is preferable
to use that option when available. OIDs need only be unique within
a class, as long as some other way of identifying the class of an
object is provided by the persistence scheme. OIDs are customarily
long integers.
If an OID is generated within the application, it is common
to have a table that represents the latest available OID for each
class. The table will be locked, queried, updated and unlocked whenever
a new OID is required. To improve performance, sometimes an entire
block of OID numbers can be acquired at once.
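The application-side scheme could be sketched like this. The `OidAllocator` class, the `oid_counter` table and the block size are hypothetical; SQLite's `BEGIN IMMEDIATE` transaction stands in for the lock/unlock step, which a different database would express differently.

```python
import sqlite3


class OidAllocator:
    """Hands out per-class OIDs, reserving a whole block per database trip."""

    def __init__(self, conn, block_size=10):
        self.conn = conn
        self.block_size = block_size
        self._next = {}   # class name -> next unused OID in the current block
        self._limit = {}  # class name -> first OID beyond the current block
        conn.execute(
            "CREATE TABLE IF NOT EXISTS oid_counter "
            "(class_name TEXT PRIMARY KEY, next_oid INTEGER NOT NULL)"
        )

    def _reserve_block(self, class_name):
        # BEGIN IMMEDIATE takes the write lock for the read-and-update step.
        self.conn.execute("BEGIN IMMEDIATE")
        try:
            self.conn.execute(
                "INSERT OR IGNORE INTO oid_counter VALUES (?, 1)", (class_name,)
            )
            (start,) = self.conn.execute(
                "SELECT next_oid FROM oid_counter WHERE class_name = ?",
                (class_name,),
            ).fetchone()
            self.conn.execute(
                "UPDATE oid_counter SET next_oid = ? WHERE class_name = ?",
                (start + self.block_size, class_name),
            )
            self.conn.execute("COMMIT")
        except Exception:
            self.conn.execute("ROLLBACK")
            raise
        self._next[class_name] = start
        self._limit[class_name] = start + self.block_size

    def next_oid(self, class_name):
        if self._next.get(class_name, 0) >= self._limit.get(class_name, 0):
            self._reserve_block(class_name)
        oid = self._next[class_name]
        self._next[class_name] = oid + 1
        return oid


# isolation_level=None lets the sketch issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
allocator = OidAllocator(conn, block_size=3)
```

With a block size of 3, only every third call to `next_oid` reaches the database. OIDs left unused in a block when the application exits are simply skipped.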
OIDs can include type information encoded into the identifier.
In this case, it may be more appropriate to use a char or varchar
column rather than an integer.
Sources
[Jacobson 92] and [Rumbaugh 91] discuss the use of OIDs. Many papers
on OODBMSs also discuss the use of OIDs in their implementation.
Related Patterns
Foreign-Key Reference
Foreign-Key Reference
Problem
How do you represent the fact that in an object model an object
can contain not only "base datatypes" like Strings, Characters,
Integers and Dates, but other objects as well?
Forces
Given that the first normal form (1NF) rule of relational databases
specifically excludes a tuple from containing another tuple, you
must represent a contained object by a legal value that a column
can hold.
Solution
Assign each object in your object model a unique OID (see Object
Identifier (OID)). Add a column for each instance variable that contains
an object that is neither:
- a collection object
- a "base datatype"
In this column, store the OID of the contained object. If your
database supports the feature, declare the column to be a foreign
key to the table that represents the class of object whose OID is
stored in that column.
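A minimal sketch of such a column, assuming a hypothetical Person class whose `address` instance variable holds an Address object. SQLite is used for illustration; it enforces declared foreign keys only once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE address (oid INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE person (
    oid         INTEGER PRIMARY KEY,
    name        TEXT,
    address_oid INTEGER REFERENCES address(oid)  -- OID of the contained object
);
INSERT INTO address VALUES (50, 'Raleigh');
INSERT INTO person  VALUES (1, 'Ann', 50);
""")


def city_of(conn, person_oid):
    """Follow the foreign-key reference from a person to its address."""
    row = conn.execute(
        """SELECT a.city FROM person p
           JOIN address a ON a.oid = p.address_oid
           WHERE p.oid = ?""",
        (person_oid,),
    ).fetchone()
    return row[0] if row else None
```

With the constraint declared, the database rejects a person row whose `address_oid` does not match any address.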
Discussion
This restriction (the 1NF rule) is both a strength and the Achilles'
heel of the relational model. When this pattern is used in self-similar
objects (i.e., a Person has children, who are also Persons) it is
exceedingly difficult to retrieve a tree of connected objects rooted
on a single object in a single SQL query.
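To make the cost concrete, here is a sketch of one straightforward way to walk such a tree: one query per level of descendants. The `person` table with a `parent_oid` column and the `load_descendants` helper are hypothetical, and the data is assumed to contain no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    oid        INTEGER PRIMARY KEY,
    name       TEXT,
    parent_oid INTEGER REFERENCES person(oid)  -- self-referencing foreign key
);
INSERT INTO person VALUES
    (1, 'Root', NULL), (2, 'Ann', 1), (3, 'Bob', 1), (4, 'Cy', 2);
""")


def load_descendants(conn, root_oid):
    """Collect descendant OIDs level by level, issuing one query per level."""
    found, frontier = [], [root_oid]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT oid FROM person WHERE parent_oid IN ({marks}) ORDER BY oid",
            frontier,
        ).fetchall()
        frontier = [row[0] for row in rows]
        found.extend(frontier)
    return found
```

The number of round trips grows with the depth of the tree, which is the cost this discussion warns about.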
If you find that the vast majority of columns in your database
schema arise from this pattern, you may wish to reconsider the decision
to use a relational database as a persistent object store.
Sources
[Rumbaugh 91] discusses the use of foreign key references.
Related Patterns
Object Identifier (OID)
Representing Collections in a Relational Database
Static Patterns (Object Side)
The previous section discussed the relationships and the definition
of class properties as defined in the relational database schema.
However, we must also consider the definition of the object model
on the client. Foreign Key versus Direct Reference addresses how
to best define the relationships of complex objects to be instantiated
in the object image.
Foreign Key versus Direct Reference
Problem
In the domain object model when should you reference objects with
a "foreign key" and when should you have direct reference
with pointers?
Forces
In general, the object model should closely reflect the problem
domain and its behavior. However, the network of objects that support
this model can be complex and large. Modeling a large corporation
with its numerous organizations and branches, may require hundreds
of thousands of objects and multiple levels of objects of different
classes.
In object models, objects usually directly reference one another.
This makes navigation among the object network direct and easier
than via foreign-key reference.
Objects can reference other objects by using their foreign keys.
When this is the case, the objects must also have methods to dereference
the foreign key to get the referenced object. This makes maintaining
the object relationships in the object model more complex. If foreign
keys are used to reference the objects then more searches and more
caches are required to support the accessing methods. However, using
the foreign key makes it easier to map the domain objects to the
database tables during their instantiation and passivation. Relying
on foreign keys alone within the object model can result in recursive
relations and may also result in extremely poor performance,
as large collections of objects are needed to represent a complex
object.
In many cases, the application simply requires a list of names
to peruse in order to locate the object of interest. The number
of potential objects in such a list may be in the millions. This
puts a heavy strain on the memory requirements of such a system.
A great majority of the time the application just requires a foreign
key for display and selection purposes. This means keeping the supporting
application domain models "light," where they contain
only those attributes necessary for display purposes.
Solution
An object model should use direct reference as much as possible.
This permits fast navigation over the object structures. Build the
object network piece by piece as required using Proxy objects to
minimize storage. Make the associations only as complex as necessary.
When dealing with large collections or a set of complex objects
use foreign keys and names to represent the objects for user interface
display and selection. After selection is made, instantiate the
complex object depending upon memory constraints and performance.
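A Proxy of the kind mentioned in this solution could be sketched as follows. The `Proxy` class, the `Address` class and the `load_address` loader are hypothetical; the loader stands in for whatever code fetches a row by OID and builds the real object.

```python
from dataclasses import dataclass


class Proxy:
    """Stand-in that holds only an OID and fetches the real object on
    first use, so the object network is built piece by piece."""

    def __init__(self, oid, loader):
        self._oid = oid
        self._loader = loader
        self._target = None

    def __getattr__(self, name):
        # Called only for attributes the proxy itself does not have.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._target is None:
            self._target = self._loader(self._oid)
        return getattr(self._target, name)


@dataclass
class Address:  # hypothetical domain class
    oid: int
    city: str


LOADS = []  # records each simulated database fetch


def load_address(oid):
    """Placeholder for a database read keyed by OID."""
    LOADS.append(oid)
    return Address(oid, "Raleigh")


address = Proxy(5, load_address)
```

Client code navigates `address.city` as if it held a direct reference; the fetch happens on first access and the result is kept for later calls.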
Discussion
If each domain object maps to a single table then there is probably
a table model in the domain object layer. You may be adding complexity
to the whole system. If the domain objects have no behavior other
than being information holders, you may consider getting them out
of the way. Instead, have the application model refer directly to
broker objects. This way you do not have an object cache to keep
in sync with the relational tables. If domain behavior is required
(which it probably will be) then you can add domain objects as required.
Make the domain objects "prove" themselves. In reference
to using foreign keys within the object model instead of direct
references, one developer learning Smalltalk said: "What the
hell good is objects if you do not hold real objects? You might
as well use PowerBuilder."
Related Patterns
Foreign-Key Reference
Bibliography
[Beck 94] Beck, Bob and Hartley, Steve, "Persistent storage
in a workflow tool implemented in Smalltalk," OOPSLA '94 proceedings,
Portland, OR, ACM, 1994, pp. 373-387.
[Elmasri 94] Elmasri and Navathe, Fundamentals of Database Systems,
Benjamin Cummings, 1994.
[Jacobson 92] Ivar Jacobson, et al., "Object-Oriented Software
Engineering, A Use-Case Driven Approach," Addison-Wesley, 1992.
[Loomis 94] Loomis, Mary E.S., "Hitting the Relational Wall,"
Journal of Object-Oriented Programming, January 1994, pp. 56-59+.
[Rumbaugh 91] James Rumbaugh, Michael Blaha, William Premerlani,
Frederick Eddy, William Lorensen, "Object-Oriented Modeling
and Design," Prentice Hall, Englewood Cliffs, New Jersey, 1991.
[Wirfs-Brock 95] Rebecca Wirfs-Brock, "Characterizing your
Application's Control Style," Presentation at Smalltalk Solutions
'95.