Knowledge Systems Corporation

Crossing Chasms:
A Pattern Language for Object-RDBMS Integration
The Static Patterns

By Kyle Brown and Bruce G. Whitenack

Abstract

Crossing Chasms is a growing pattern language to help design and build object-oriented applications that use a relational database for persistent storage. Crossing Chasms' patterns are categorized into three groups: static, dynamic and client-server. Static patterns deal with the definition of the relational schema and the object model. This section of Crossing Chasms specifically addresses the static issues.

Introduction

Patterns can be used to express the rationale of architectural designs. Crossing Chasms is an effort to establish the foundation patterns for an architecture that connects object-oriented applications to relational databases. It is a pattern language to help design and build object-oriented applications that use a relational persistent store. Building such a system requires difficult design decisions that involve maintainability, performance, simplicity, and interoperability between the client and server systems. Any database architecture, framework or tool used in object-oriented applications of this type should support these patterns.

 


 

Relational databases are commonly used for applications (even those that are developed in an OO language) for the following reasons:

The weaknesses of using a relational database with an object system are:

These strengths and weaknesses are important to keep in mind as the patterns are described. Crossing Chasms' patterns are categorized into three groups: static, dynamic and client-server. Static patterns describe how to define the structural relationships of the entities (objects and tables) as well as their properties.

Static Patterns (Relational Side)

The Static Patterns for the relational side deal with when and how best to define a database schema to support an object model. The identity of the objects, their relationships (inheritance, aggregation, semantic associations) and their state must be preserved in the tables of a relational database. Table Design Time addresses the best time during development to design the relational schema. Representing Objects as Tables, Representing Object Relationships as Tables, Representing Inheritance in a Relational Database, Representing Collections in a Relational Database and Foreign-Key Reference deal with defining the relationships between objects and defining each object's state. Object Identifier (OID) defines how to establish object identity in a relational database.

Table Design Time

Problem

When is the best time to design your relational database during object-oriented development?

Forces

Assume that no legacy database exists prior to development or, if one does exist, that it is extremely flexible (i.e., it can be changed according to application needs). When the database design is kept foremost in mind during development, the object model will tend to be data driven, while the behavior and responsibilities of the objects will be deprived of the thought and energy they deserve. Consequently, the object model will tend to have separate data objects and controller objects. This leads to a design that has heavy-duty controller objects and stupid data objects rather than a better, more distributed, less centralized design. If the database design is completely ignored until the application is completed, the project may suffer. Since 25% to 50% of the code in such applications often deals with object-database integration, the design of the database is crucial and should be considered early in development. Consequently:

Solution

Design the tables based on your object model after you have implemented it in an architectural prototype but before the application is in full-stage production.

Discussion

Defining domain object behavior and properties is, in reality, a first pass at the database design. A stop-gap persistence approach (perhaps using flat ASCII files) is often "good enough" for an architectural prototype. A benefit of this approach is that legacy data can be quickly exported from existing databases to an ASCII file. The prototype can then be easily demonstrated on stand-alone workstations that may not have a relational database, while still showing "real" data familiar to customers.
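A stop-gap flat-file store of the kind described above might be sketched as follows. This is an illustration only, written in Python rather than Smalltalk; the `Customer` class and its attributes are hypothetical, and the standard `csv` module stands in for the "flat ASCII files".

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Hypothetical domain class used only for this sketch.
    oid: int
    name: str
    city: str


def save_all(objects, stream):
    """Write objects as CSV rows; the header row holds the attribute names."""
    names = [f.name for f in fields(Customer)]
    writer = csv.DictWriter(stream, fieldnames=names)
    writer.writeheader()
    for obj in objects:
        writer.writerow({n: getattr(obj, n) for n in names})


def load_all(stream):
    """Rebuild objects from CSV rows, converting the OID back to an int."""
    return [Customer(oid=int(row["oid"]), name=row["name"], city=row["city"])
            for row in csv.DictReader(stream)]


# Round-trip through an in-memory "file"; a prototype would use a real file.
buffer = io.StringIO()
save_all([Customer(1, "Acme", "Raleigh"), Customer(2, "Zenith", "Cary")], buffer)
buffer.seek(0)
customers = load_all(buffer)
```

Because the file format is just the attribute names plus one row per object, legacy data exported as ASCII can be loaded by the same code.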

Related Patterns

Representing Objects as Tables

Representing Object Relationships as Tables

Representing Inheritance in a Relational Database

Representing Collections in a Relational Database

Architectural Prototype (Kent Beck's Smalltalk Best Practice Patterns)

 

Representing Objects as Tables

Problem

How do you map an object structure into a relational database schema?

Forces

Objects do not map neatly into tables. For instance, object classes do not have keys. Tables do not have the same identity property that objects do. The datatypes of tables in a relational database do not match the classes in the object model. Complex objects can reference other complex objects and collections of objects.

Solution

Begin by creating a table for each persistent class in your object model. Determine what type of object each instance variable is likely to contain. For each object that is representable in a database as a base datatype (i.e., String, Character, Integer, Float, Date, Time), create a column in the table corresponding to that instance variable, naming it the same as the instance variable. If an instance variable contains a Collection subclass, use Representing Collections in a Relational Database. If an instance variable contains any other value, use Foreign-Key Reference.

Finally, create a column to contain this object's OID (see Object Identifier (OID)).
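The mapping rules above can be sketched as a small DDL generator. This is a minimal illustration in Python, with the standard `sqlite3` module standing in for the relational database; the `SQL_TYPES` table, the `table_ddl` helper and the `Person` attributes are assumptions, and Date/Time are omitted because SQLite has no dedicated date column type.

```python
import sqlite3

# Hypothetical mapping from Python base types to SQL column types.
SQL_TYPES = {str: "VARCHAR(255)", int: "INTEGER", float: "REAL"}


def table_ddl(class_name, attributes):
    """Build CREATE TABLE DDL for one persistent class.

    `attributes` maps instance-variable names to Python types. Only base
    datatypes become plain columns here; collections and references to
    other objects are left to the collection and foreign-key patterns.
    """
    columns = ["oid INTEGER PRIMARY KEY"]  # column holding the object's OID
    for name, py_type in attributes.items():
        if py_type not in SQL_TYPES:
            raise ValueError(f"{name}: not a base datatype; use another pattern")
        # Column named the same as the instance variable.
        columns.append(f"{name} {SQL_TYPES[py_type]}")
    return f"CREATE TABLE {class_name} ({', '.join(columns)})"


ddl = table_ddl("Person", {"name": str, "age": int, "salary": float})
db = sqlite3.connect(":memory:")
db.execute(ddl)
```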

Discussion

The design of the database may need modification (for instance, denormalization) depending upon the access patterns required for particular scenarios. Remember that this design is an iterative process. There are several variations of mappings between classes and tables. These are:

 

 

The Database Access Architecture must handle each of these variations.

Sources

[Elmasri 94] [Rumbaugh 91]

 

Representing Object Relationships as Tables

Problem

How do you represent object relationships in a relational database schema?

Forces

A variety of relationships exist between classes in an object model. These relationships may be:

 

 

A qualified association is an association between two objects where the association is constrained or identified in some way. For example, a Company can be associated with a Person through a position held by that Person. The position qualifies the association between the Company and the Person.

The association between objects may represent containment or associated properties, or it may have some special semantic meaning in its own right (e.g., a marriage is a special relationship between a man and a woman).

For 1 to 1 and 1 to many relationships, the choices are either to merge the association into a class or to create a class based on the association.

It is important to remember that the semantics of a relationship between objects can be significant. It is often useful to create classes to represent associations, especially if a relationship has values of its own. These classes will be represented as tables in the relational database. For many to many, 1 to many and 1 to 1 associations, when an association has a meaningful existence in the problem domain, create a class for the association. An association has a meaningful existence when the relationship itself has properties, such as duration, quality or type. A marriage is a relationship between a man and a woman that can have all these properties.

Therefore:

Solution

Merge 1 to 1 associations with no special meaning into one of the tables. If the association has special meaning, create a table based on the class derived from the association.

For 1 to many associations, create a relationship table (see Representing Collections in a Relational Database).

A many to many relationship always maps to its own table, whose columns hold the foreign keys of the two related objects.

Ternary and n-ary associations should have their own table that reference the participating classes by foreign key.

A qualified association should have its own table.
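As an illustration of the last two rules, the following sketch (Python with `sqlite3`; all table and column names are hypothetical) gives the qualified Company-Person association its own table, with the position as part of the key, and promotes a marriage, an association with properties of its own, to a table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
db.executescript("""
CREATE TABLE Person  (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Company (oid INTEGER PRIMARY KEY, name TEXT);

-- Qualified association: within a company, a position identifies one person.
CREATE TABLE Employment (
    company_oid INTEGER REFERENCES Company(oid),
    person_oid  INTEGER REFERENCES Person(oid),
    position    TEXT,
    PRIMARY KEY (company_oid, position)
);

-- Association with properties of its own, promoted to a class and a table.
CREATE TABLE Marriage (
    oid         INTEGER PRIMARY KEY,
    spouse1_oid INTEGER REFERENCES Person(oid),
    spouse2_oid INTEGER REFERENCES Person(oid),
    start_date  TEXT
);
""")

db.executemany("INSERT INTO Person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
db.execute("INSERT INTO Company VALUES (10, 'Acme')")
db.executemany("INSERT INTO Employment VALUES (?, ?, ?)",
               [(10, 1, "CEO"), (10, 2, "CTO")])
db.execute("INSERT INTO Marriage VALUES (100, 1, 2, '1990-06-01')")

# Navigating the qualified association: who holds the CEO position at Acme?
ceo = db.execute("""
    SELECT p.name FROM Employment e JOIN Person p ON p.oid = e.person_oid
    WHERE e.company_oid = 10 AND e.position = 'CEO'
""").fetchone()[0]
```

Putting the qualifier in the primary key lets the database enforce the constraint the qualifier expresses: a second "CEO" row for the same company is rejected.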

Discussion

Consideration of the forces of this pattern will often result in changes to a first-pass object model. This is desirable, since it will often generate a more general and flexible solution.

Sources

[Rumbaugh 91]

Related Patterns

Representing Inheritance in a Relational Database

Representing Collections in a Relational Database

Representing Inheritance in a Relational Database

Problem

How do you represent a set of classes in an inheritance hierarchy in a relational database?

Forces

Relational databases do not provide support for inheritance of attributes. It is impossible to do a true 1-1 mapping between a relational table and a class when that class inherits attributes from another class, or if other classes inherit from it.

This pattern has two possible contexts, depending on what matters more to your particular application: speed of queries, or maintainability and flexibility of your relational schema.

Solution

(When ease of schema modification is paramount)

Create one table for each class in your hierarchy that has attributes. This will include both concrete and abstract classes. The tables will contain columns for each of the attributes defined in that class, plus an additional column that represents the common key shared between all subclass tables. An instance of a concrete subclass is retrieved by doing a relational JOIN of all of the tables in a path to the root with the common key as the join parameter.

Discussion

This is a direct mapping, which makes it easy to change if a class anywhere in the hierarchy changes. If a class changes, you must change at most one table. Unfortunately, the overhead of doing multi-table joins can become a problem if you have even a moderately deep hierarchy.
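A minimal sketch of the one-table-per-class mapping, using a hypothetical three-level hierarchy (`Party` <- `Person` <- `Employee`) in SQLite. Every table shares the `oid` key, and reading one `Employee` requires a join along the path to the root.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Each class's table holds only the attributes that class defines,
# plus the common key (oid) shared by every table in the hierarchy.
db.executescript("""
CREATE TABLE Party    (oid INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE Person   (oid INTEGER PRIMARY KEY REFERENCES Party(oid), name TEXT);
CREATE TABLE Employee (oid INTEGER PRIMARY KEY REFERENCES Person(oid), salary REAL);
""")

# Storing one Employee writes one row into each table on its path to the root.
db.execute("INSERT INTO Party VALUES (1, '12 Elm St')")
db.execute("INSERT INTO Person VALUES (1, 'Ann')")
db.execute("INSERT INTO Employee VALUES (1, 50000.0)")

# Retrieving it joins all of those tables on the common key.
row = db.execute("""
    SELECT pa.oid, pa.address, pe.name, em.salary
    FROM Employee em
    JOIN Person pe ON pe.oid = em.oid
    JOIN Party  pa ON pa.oid = em.oid
    WHERE em.oid = ?
""", (1,)).fetchone()
```

Each additional level of the hierarchy adds one more join to this query, which is the overhead the discussion warns about.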

Solution

(When speed of queries is more important)

Create one table for each concrete subclass of your hierarchy that contains ALL of the attributes defined in that subclass or inherited from its superclasses. An instance is retrieved by querying that table.

Discussion

This avoids the joins of the previous solution, making queries more efficient. This is also a simple mapping, but has the drawback that if a superclass is changed, then many tables must be modified. It is also difficult to infer the object design from the relational schema.
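For comparison, a sketch of the one-table-per-concrete-class mapping for the same hypothetical hierarchy, assuming `Person` and `Employee` are concrete and `Party` is abstract. Reads need no joins, but a change to a superclass ripples into every table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Only concrete classes get tables, and each repeats every inherited attribute.
db.executescript("""
CREATE TABLE Person   (oid INTEGER PRIMARY KEY, address TEXT, name TEXT);
CREATE TABLE Employee (oid INTEGER PRIMARY KEY, address TEXT, name TEXT,
                       salary REAL);
""")
db.execute("INSERT INTO Employee VALUES (1, '12 Elm St', 'Ann', 50000.0)")

# A single-table query: no joins needed.
row = db.execute("SELECT * FROM Employee WHERE oid = ?", (1,)).fetchone()

# The cost: adding an attribute to the abstract superclass Party means
# altering the table of every concrete subclass.
for table in ("Person", "Employee"):
    db.execute(f"ALTER TABLE {table} ADD COLUMN phone TEXT")
```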

There is a third solution that may be more appropriate in a multiple-inheritance environment, but that has little else to recommend it. You can create a single table that holds ALL of the superclass and subclass attributes, with SELECT statements picking out only those columns that are appropriate for each class. Unfortunately, this can lead to a large number of NULLs in your database, wasting space.
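The single-table alternative might look like this, again with hypothetical names. A `class_name` column (an assumption; the text does not specify how rows are tagged with their class) records each row's class, and the count at the end shows the NULLs the text warns about.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One table for the whole hypothetical hierarchy; attributes that a
# row's class does not have are simply left NULL.
db.execute("""
CREATE TABLE Party (oid INTEGER PRIMARY KEY, class_name TEXT,
                    address TEXT, name TEXT, salary REAL, tax_id TEXT)
""")
db.executemany("INSERT INTO Party VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Employee",     "12 Elm St", "Ann", 50000.0, None),
    (2, "Organization", "1 Main St", None,  None,    "56-123"),
])

# Each class's SELECT picks out only the columns that apply to it.
employees = db.execute(
    "SELECT oid, address, name, salary FROM Party WHERE class_name = 'Employee'"
).fetchall()

# Count the wasted cells across the class-specific columns.
null_count = db.execute("""
    SELECT SUM((name IS NULL) + (salary IS NULL) + (tax_id IS NULL)) FROM Party
""").fetchone()[0]
```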

Sources

[Elmasri 94] contains a discussion of how this problem is dealt with in the Extended ER (EER) model. Both [Jacobson 92] and [Rumbaugh 91] discuss this problem and present this solution.

Related Patterns

Object Identifier (OID)

Representing Collections in a Relational Database

Problem

How do you represent Collection subclasses in a relational database?

Forces

The first normal form rule of relational databases prevents a relation from containing a "multivalued" attribute, or what we would normally think of in object terms as a Collection. The 1-N relationships that OO languages represent with collection classes take a very different form in a relational database.

Collection classes in Smalltalk often convey additional information besides the relationship between the objects contained in the collection, and the object that contains the collection. Order, sorting methods, and type of the contained objects are all problems that must be addressed.

Solution

Represent each collection in your object model (where one object class is related to another object class by a 1-N has-a relationship) by a relationship table. The table may also contain additional attributes that address the other issues.

The basic solution involves creating a table that consists of at least two columns, one which represents the primary key (usually the OID) of the containing object (the object that holds the collection) and another which represents the primary key of the contained objects (the objects held in the collection). Each entry in the table shows a relationship between the contained object and the containing object. The primary key of the relationship table is composed of both columns. A third column may be needed to indicate either the class of the object or the table that the object is located in, since collections may contain objects of various classes.
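A minimal sketch of the basic relationship table in SQLite, for a hypothetical `Department` that holds a heterogeneous collection. It assumes OIDs are unique across classes; if they are unique only within a class, `element_class` would also have to be part of the primary key.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE Department_members (
    container_oid INTEGER,  -- OID of the Department holding the collection
    element_oid   INTEGER,  -- OID of an object held in the collection
    element_class TEXT,     -- class (or table) the element lives in
    PRIMARY KEY (container_oid, element_oid)
)
""")
# Object 1 belongs to two departments at once: one row per membership.
db.executemany("INSERT INTO Department_members VALUES (?, ?, ?)", [
    (7, 1, "Person"), (7, 2, "Contractor"), (8, 1, "Person"),
])

# Rebuilding department 7's collection: one row per contained object.
members = db.execute(
    "SELECT element_oid, element_class FROM Department_members "
    "WHERE container_oid = ? ORDER BY element_oid", (7,)
).fetchall()
```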

Discussion

There are other possible representations of 1-N relationships, including back pointers. Back pointers have the drawback that it is difficult for an object to be contained in more than one collection at the same time when the two collections are held by different instances of the same class (a single back-pointer column can name only one container). The simplest and most common additional information to include in a relationship table is a column that indicates the type of the contained object. This is necessary when a Collection may be heterogeneous. If an OrderedCollection is used and the order is significant, the position of the object in the collection may be stored in an additional column. Note that unless a distinguishing column indicating a position or OID is added to a relationship table and made part of its primary key, the basic solution represents a Set rather than a more general collection, since the key constraint of relational databases prevents a tuple from occurring more than once in the same table.
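The Set-versus-collection point can be demonstrated directly. In this sketch (hypothetical table names, SQLite), the same three-element OrderedCollection, which contains a duplicate, is stored once without and once with a position column in the key.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Without a position column, the key is (container, element): a Set.
db.execute("CREATE TABLE AsSet (container_oid INTEGER, element_oid INTEGER, "
           "PRIMARY KEY (container_oid, element_oid))")
# With a position column in the key, order and duplicates survive,
# as an OrderedCollection requires.
db.execute("CREATE TABLE AsOrdered (container_oid INTEGER, position INTEGER, "
           "element_oid INTEGER, PRIMARY KEY (container_oid, position))")

items = [5, 3, 5]  # hypothetical OrderedCollection of element OIDs

# INSERT OR IGNORE silently drops the second 5: the Set table cannot hold it.
db.executemany("INSERT OR IGNORE INTO AsSet VALUES (1, ?)", [(i,) for i in items])
db.executemany("INSERT INTO AsOrdered VALUES (1, ?, ?)",
               list(enumerate(items, start=1)))

as_set = [r[0] for r in db.execute(
    "SELECT element_oid FROM AsSet WHERE container_oid = 1 ORDER BY element_oid")]
as_ordered = [r[0] for r in db.execute(
    "SELECT element_oid FROM AsOrdered WHERE container_oid = 1 ORDER BY position")]
```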

Sources

[Elmasri 94] presents this as the primary solution for handling 1-N relationships in the E-R model. [Beck 94] and [Loomis 94] present additional information about constraints on these relationships when mapping Smalltalk collections to relational tables.

Related Patterns

Object Identifier (OID)

Object Identifier (OID)

Problem

How do you represent an object's individuality in a relational database?

Forces

In object-oriented languages, objects are uniquely identifiable. In Smalltalk, an equivalence comparison (==) determines if two objects are exactly identical. This is accomplished through the comparison of their Object Pointers (OOPs) which are uniquely assigned to each object when it is instantiated.

In an environment where objects may become persistent, some way of identifying what particular persistent structure (be it a row in a relational database, or a structure in an OODBMS) corresponds to that object has to be added to the mix. OOPs are reassigned and reclaimed by the system, precluding their use as an object identifier.

Solution

Assign each object an identifier (an Object IDentifier, or OID) that is guaranteed to be unique across image invocations. This identifier will be part of the object and will be used to identify it during query and update operations.

Discussion

OIDs can be generated either internally to your application or externally. Some relational databases include a sequence number generator that can be used to generate OIDs, and it is preferable to use that option when available. OIDs need only be unique within a class, as long as the persistence scheme provides some other way of identifying the class of an object. OIDs are customarily long integers.

If OIDs are generated within the application, it is common to have a table that holds the latest available OID for each class. The table is locked, queried, updated and unlocked whenever a new OID is required. To improve performance, an entire block of OIDs can sometimes be acquired at once.

OIDs can include type information encoded into the identifier. In this case, it may be more appropriate to use a char or varchar column rather than an integer.
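One way to implement the internally generated scheme is sketched below. It is an illustration, not a prescribed design: the `oid_counter` table, the `OidAllocator` class and the block size are assumptions, and SQLite's `BEGIN IMMEDIATE` (which takes a database-wide write lock rather than a table lock) stands in for locking the counter table.

```python
import sqlite3


class OidAllocator:
    """Hands out per-class OIDs from a counter table, one block at a time."""

    def __init__(self, db, block_size=100):
        self.db = db
        self.block_size = block_size
        self.blocks = {}  # class name -> (next OID to hand out, end of block)
        db.execute("CREATE TABLE IF NOT EXISTS oid_counter "
                   "(class_name TEXT PRIMARY KEY, next_oid INTEGER)")

    def _reserve_block(self, class_name):
        self.db.execute("BEGIN IMMEDIATE")  # lock
        try:
            row = self.db.execute(            # query
                "SELECT next_oid FROM oid_counter WHERE class_name = ?",
                (class_name,)).fetchone()
            start = row[0] if row else 1
            self.db.execute(                  # update
                "INSERT OR REPLACE INTO oid_counter VALUES (?, ?)",
                (class_name, start + self.block_size))
            self.db.execute("COMMIT")         # unlock
        except Exception:
            self.db.execute("ROLLBACK")
            raise
        self.blocks[class_name] = (start, start + self.block_size)

    def next_oid(self, class_name):
        start, end = self.blocks.get(class_name, (0, 0))
        if start >= end:  # no block yet, or the current block is used up
            self._reserve_block(class_name)
            start, end = self.blocks[class_name]
        self.blocks[class_name] = (start + 1, end)
        return start


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN/COMMIT above control the transaction.
db = sqlite3.connect(":memory:", isolation_level=None)
alloc = OidAllocator(db, block_size=2)
oids = [alloc.next_oid("Person") for _ in range(3)]
```

With a block size of 2, the third call goes back to the counter table for a second block; larger blocks trade gaps in the OID sequence (unused numbers are lost when the image exits) for fewer locked round trips.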

Sources

[Jacobson 92] and [Rumbaugh 91] discuss the use of OIDs. Many papers on OODBMSs also discuss the use of OIDs in their implementation.

Related Patterns

Foreign-Key Reference

Foreign-Key Reference

Problem

How do you represent the fact that in an object model an object can contain not only "base datatypes" like Strings, Characters, Integers and Dates, but other objects as well?

Forces

The first normal form (1NF) rule of relational databases specifically excludes a tuple from containing another tuple, so you must represent a contained object by some legal value that a column can hold.

Solution

Assign each object in your object model a unique OID (see Object Identifier (OID)). Add a column for each instance variable that contains an object that is neither a collection object nor a "base datatype".

In this column, store the OID of the object referenced by that instance variable. If your database supports the feature, declare the column to be a foreign key to the table that represents the class of object whose OID is stored in that column.
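A minimal sketch of the pattern with hypothetical `Customer` and `PurchaseOrder` tables in SQLite: the reference is stored as the referenced object's OID and declared as a foreign key, so the database rejects dangling references.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
db.executescript("""
CREATE TABLE Customer (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE PurchaseOrder (
    oid          INTEGER PRIMARY KEY,
    order_date   TEXT,                            -- base datatype: plain column
    customer_oid INTEGER REFERENCES Customer(oid) -- OID of the referenced object
);
""")
db.execute("INSERT INTO Customer VALUES (1, 'Acme')")
db.execute("INSERT INTO PurchaseOrder VALUES (100, '1995-09-01', 1)")

# Following the reference means looking up that OID in the referenced table.
customer_name = db.execute("""
    SELECT c.name FROM PurchaseOrder o JOIN Customer c ON c.oid = o.customer_oid
    WHERE o.oid = 100
""").fetchone()[0]
```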

Discussion

This restriction (the 1NF rule) is both a strength and the Achilles' heel of the relational model. When this pattern is used for self-similar objects (e.g., a Person has children, who are also Persons), it is exceedingly difficult to retrieve a tree of connected objects rooted on a single object in a single SQL query.
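The difficulty can be seen in a small sketch (hypothetical `Person` table, SQLite): with plain SELECT statements and no advance knowledge of the tree's depth, collecting all descendants takes one query per generation, plus a final query that finds no more children.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Self-similar reference: each Person row holds the OID of its parent Person.
db.execute("CREATE TABLE Person (oid INTEGER PRIMARY KEY, name TEXT, "
           "parent_oid INTEGER REFERENCES Person(oid))")
db.executemany("INSERT INTO Person VALUES (?, ?, ?)", [
    (1, "Ann", None), (2, "Bob", 1), (3, "Cy", 1), (4, "Di", 2),
])


def descendants(root_oid):
    """Walk the tree one generation per query; return names and query count."""
    found, frontier, queries = [], [root_oid], 0
    while frontier:
        marks = ", ".join("?" for _ in frontier)
        rows = db.execute(
            f"SELECT oid, name FROM Person WHERE parent_oid IN ({marks}) "
            "ORDER BY oid", frontier).fetchall()
        queries += 1
        found.extend(name for _, name in rows)
        frontier = [oid for oid, _ in rows]  # next generation's parents
    return found, queries


names, query_count = descendants(1)
```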

If you find that the vast majority of columns in your database schema arise from this pattern, you may wish to reconsider the decision to use a relational database as a persistent object store.

Sources

[Rumbaugh 91] discusses the use of foreign key references.

Related Patterns

Object Identifier (OID)

Representing Collections in a Relational Database

Static Patterns (Object Side)

The previous section discussed the relationships and the definition of class properties as defined in the relational database schema. However, we must also consider the definition of the object model on the client. Foreign Key versus Direct Reference addresses how to best define the relationships of complex objects to be instantiated in the object image.

Foreign Key versus Direct Reference

Problem

In the domain object model, when should you reference objects with a "foreign key" and when should you use direct references (pointers)?

Forces

In general, the object model should closely reflect the problem domain and its behavior. However, the network of objects that supports this model can be complex and large. Modeling a large corporation with its numerous organizations and branches may require hundreds of thousands of objects and multiple levels of objects of different classes.

In object models, objects usually reference one another directly. This makes navigating the object network more direct and easier than navigating via foreign-key references.

Objects can also reference other objects by their foreign keys. In that case, the objects must also have methods that dereference the foreign key to get the referenced object. This makes maintaining the object relationships in the object model more complex. If foreign keys are used to reference objects, then more searches and more caches are required to support the accessing methods. However, using foreign keys makes it easier to map the domain objects to the database tables during their instantiation and passivation. Relying on foreign keys alone within the object model can result in recursive relations, and may also result in extremely poor performance when large collections of objects are needed to represent a complex object.

In many cases, the application simply requires a list of names to peruse in order to locate the object of interest. The number of potential objects in such a list may be in the millions, which puts a heavy strain on the memory requirements of such a system. Most of the time, the application requires just a foreign key for display and selection purposes. This means keeping the supporting application domain models "light," so that they contain only those attributes necessary for display.

Solution

An object model should use direct reference as much as possible. This permits fast navigation over the object structures. Build the object network piece by piece as required using Proxy objects to minimize storage. Make the associations only as complex as necessary. When dealing with large collections or a set of complex objects use foreign keys and names to represent the objects for user interface display and selection. After selection is made, instantiate the complex object depending upon memory constraints and performance.
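The Proxy idea can be sketched in Python as an object that holds only an OID and forwards messages to the real object, loading it on first use. The `Proxy`, `Company` and `load_company` names are hypothetical, and `load_company` merely simulates a database fetch. Python's `__getattr__` plays the role that `doesNotUnderstand:` typically plays in a Smalltalk proxy.

```python
class Proxy:
    """Stands in for a domain object until it is first used."""

    def __init__(self, oid, loader):
        self._oid = oid
        self._loader = loader  # function that builds the real object from an OID
        self._target = None

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself, so every
        # domain message is forwarded; the real object is loaded on first use.
        if self._target is None:
            self._target = self._loader(self._oid)
        return getattr(self._target, name)


class Company:
    def __init__(self, oid, name, branches):
        self.oid, self.name, self.branches = oid, name, branches


loads = []  # records each simulated database fetch


def load_company(oid):
    loads.append(oid)
    return Company(oid, f"Company {oid}", branches=["HQ"])


# A "light" pick list: only OIDs and names, for display and selection.
pick_list = [(1, "Company 1"), (2, "Company 2"), (3, "Company 3")]
proxies = {oid: Proxy(oid, load_company) for oid, _ in pick_list}

selected = proxies[2]          # the user picks an entry; nothing is loaded yet
branches = selected.branches   # first real use instantiates the complex object
```

Only the selected company is ever instantiated; the other entries stay as an OID and a name.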

Discussion

If each domain object maps to a single table, then there is probably a table model in the domain object layer, and you may be adding complexity to the whole system. If the domain objects have no behavior other than being information holders, consider getting them out of the way. Instead, have the application model refer directly to broker objects. This way you do not have an object cache to keep in sync with the relational tables. If domain behavior is required (which it probably will be), then you can add domain objects as required. Make the domain objects "prove" themselves. In reference to using foreign keys within the object model instead of direct references, one developer learning Smalltalk said: "What the hell good is objects if you do not hold real objects? You might as well use PowerBuilder."

Related Patterns

Foreign-Key Reference

 

Bibliography

 

[Beck 94] Bob Beck and Steve Hartley, "Persistent storage in a workflow tool implemented in Smalltalk," OOPSLA '94 Proceedings, Portland, OR, ACM, 1994, pp. 373-387.

[Elmasri 94] Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems, Benjamin/Cummings, 1994.

[Jacobson 92] Ivar Jacobson et al., "Object-Oriented Software Engineering: A Use Case Driven Approach," Addison-Wesley, 1992.

[Loomis 94] Mary E. S. Loomis, "Hitting the Relational Wall," Journal of Object-Oriented Programming, January 1994, pp. 56-59+.

[Rumbaugh 91] James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy and William Lorensen, "Object-Oriented Modeling and Design," Prentice Hall, Englewood Cliffs, NJ, 1991.

[Wirfs-Brock 95] Rebecca Wirfs-Brock, "Characterizing your Application's Control Style," presentation at Smalltalk Solutions '95.

 



Email:  Sales sales@ksc.com
Copyright 2002 - Knowledge Systems Corporation