Beginning of Test 2 Material

 

March 3, 2003

 

Normalization is an alternative database design tool to data modeling. It was developed as part of theoretical foundation for the relational model application of a series of rules that gradually improve the design. 

 

Data Modeling first emerged as E-R modeling in a paper by Chen. There were two main concepts

 

Normalization is a method for increasing the quality of database design. It is also a theoretical base for defining the properties of relations. Normalization is useful cross check for ensuring the soundness of a data model.

 

Relations can be classified by the types of modification anomalies to which they are vulnerable. Meaning certain types of tables can be prone to insertion anomalies. Some tables are prone to deletion anomalies and others are prone to an update anomaly. Research was done in the 1970’s to classify these types of tables and to find a set of techniques that would prevent these anomalies. The technique that was developed was normalization.

 

Terminology for Normalization:

 

Functional Dependency

 

One or more attributes determine the value of another attribute.

An identifier functionally determines all the attributes of an entity.

 

Basically if we are given the value of one attribute or set of attributes, we can look up the value for the other attribute.

 

It is denoted as

 

Stock code → firm name, stock price, stock quantity, stock dividend

In other words if we know the stock code we know the value of firm name etc.

 

An attribute, or set of attributes, that fully functionally determines another attribute is called a determinate. So in referring to stock code above we see that it is a determinant because it fully functionally determines stock PE.

 

Unlike a key a determinate need not be unique. In other words an attribute can determine another attribute. The example in the book shows that “course type determines fee.”

 

An identifier (key) can be and should be the determinate, however a determinate is not necessarily a key.

 

There are situations were a given value (attribute) determines multiple values.

 

A multi-determines B. Denoted as A →→B.

 

Example: Department multi-determines course.

 

Attribute Relationships

 

There is a one – to – one relationship

A value of an attribute determines the value of another attribute and vice versa

Denoted as  A →B and B →A.

 

When two attributes have a 1:1 relationship, they must occur together in at least one table in a database so that their equivalence is a recorded fact

 

There is a one to much relationship

A value of an attribute determines the value of another attribute but not vice versa

Denoted as  A →B

 

Example:

 

As a result,  if A and B occur in the same table, then A must be the key

 

There is a many to many relationship

Denoted as A not →B and B not →A

 

Example:

 

French and Flemish is spoken in Belgium and French is spoken in many countries

To record the m:m relationship between these attributes,  a table containing both attributes as a composite key is required.

 

Normal Forms

There are five levels of normalization.

 

The first level says A relation is in first normal form if and only if all columns are single-valued.

 

In other words if you have table with single values in the columns the data model is in first normal form.

 

 

The second level says A relation is in second normal form if and only if it is in first normal form, and all non key columns are dependent on the key.

 

According to this definition, if a relation has a single attribute as its key, then it is automatically in second normal form, by default, every nonkey attribute is dependent on all of the key. Thus, second normal form is of concern only in relations that have composite keys.

 

Slide 11 –

Second normal form is:

CUSTOMER-CREDIT in this case

 

ORDER

 

 

 

ITEMNO

CUSTOMERID

QUANTITY

CUSTOMER-CREDIT

12

57

25

OK

34

679

3

POOR

 

So to fix this problem we would break the ORDER table back to the original table structure of ITEM and CUSTOMER

 

The third level says A relation is in third normal form if and only if it is in second normal form and has no transitive dependencies.

 

Transitive dependencies….

 

March 5, 2003

 

Transitive dependencies finally addressed. A transitive dependency occurs when part of the attributes are determined by the key and then those attributes which are determined by the key are the determinates for another attribute

 

Example

 

SID

Building

Fee

100

Jesse Hall

3200

150

Graig Hall

3100

200

Jesse Hall

3200

250

Jesse Hall

3200

 

The functional dependency is that Building determines Fee and SID determines building.

This is remedied by splitting the table into two tables.

 

SID

Building

100

Jesse Hall

150

Graig Hall

200

Jesse Hall

250

Miller Hall

300

Jesse Hall

 

Building

Fee

Jesse Hall

3200

Graig Hall

3100

Jesse Hall

3200

Miller Hall

3100

Jesse Hall

3200

 

 

What happens when you have:

 

You cry and tear your hair …then you deal with it by inventing  a stronger version of 3NF (third normal form).

 

The precise definition is a relation is in Boyce-Codd normal form if and only if every determinant is a candidate key.

 

What is a candidate key?

By looking at the table below we see that by combining SID and Major we can come up with Fname. Remember when we use two attributes in combination this is called a composite key. The combination SID and Fname will get us Major. The combinations (SID , Major) and (SID, Fname) are candidate keys. The combination we pick to use will be the primary key.

 

Example:

This table is in third normal form however it fails the Boyce-Codd test.

 

SID

Major

Fname

100

Math

Cauchy

150

Psychology

Jung

200

Math

Riemann

250

Math

Cauchy

300

Psychology

Perls

300

Math

Reimann

 

Primary key(s) SID, Major

Key (candidate) SID, Fname

 

Functional Dependencies Fname determines Major

 

Solution:

 

SID

Fname

100

Cauchy

150

Jung

200

Riemann

250

Cauchy

300

Perls

300

Reimann

 

Fname

Subject

Cauchy

Math

Jung

Psychology

Riemann

Math

Cauchy

Math

Perls

Psychology

Reimann

Math

 

 

March 7, 2003

 

Normalization continued

The precise definition is a relation is in fourth normal form if it is in BCNF and has no multi-value dependencies.

 

Violation of fourth normal form

Student

 

 

 

Student ID

Sport

Subject

50

Football

English

50

Football

Music

50

Tennis

Botany

50

Karate

Botany

 

What is a multi-value dependency?

 

Refer to the above table. Note that Student ID can participate in many sports so Sport does not functionally depend on Student ID. In addition Student ID can participate in many subjects. In general a multi-value dependency exists when a relation has at least three attributes, two of them are multi-value,  and their values depend on the third attribute.

 

In other words, in a relation R(A,B,C), a multi-value dependency exists if A determines multiple values of B,  A determines multiple values of C and B and C are independent of each other.

 

Multi value dependencies lead to modification anomalies. The solution is to break the one table into two tables.

 

Fifth Normal Form

<![if !supportEmptons,  as we have been doing,  but then cannot be reconstructed. The condition under which this situation arises has no clear, intuitive meaning.  We do not know what the consequences of such dependencies are or even if they have practical consequences…”

 

March 10, 2003

 

Domain/Key Normal Form

 

The precise definition is a relation is in domain key/normal form if and only if every constraint on the relation is a logical consequence of the domain constraints and the key constraints that apply to the relation.

 

What are example of constraints?

 

What is a domain?

 

Unfortunately, there is no known algorithm for converting a relation to DK/NF, nor is it even known which relations can be converted to DK/NF. Finding, or designing, DK/NF relations is more of an art than a science.

 

In spite of this, in the practical work of database design, DK/NF is an exceedingly useful design objective. If we can define relations in a way that constraints on them are logical consequences of domains and keys, then there will be no modification anomalies.

 

Example 1 of Domain/Key Normal Form

 

STUDENT(SID, GradeLevel, Building, Fee)

            Key: SID

            Constraints: Building → Fee

                               SID must not begin with digit 1

 

We know:

  1. SID functionally determines the other three attributes so SID is a key
  2. We know that Building determines Fee.
  3. SID must not begin with one.

 

The question is how do we create relations from the above information so that there will be no modification anomalies?

 

We give the domains for the attributes definitions:

 

SID                    in    CDDD, where C is a decimal digit not = 1; D = decimal digit

GradeLevel        in {FR, SO, JR, SN, GR}

Building             in CHAR(4)

Fee                   in DEC(4)

 

We define relations and keys

 

STUDENT (SID, GradeLevel, Building)

Key: SID

 

BLDG-FEE (Building, Fee)

Key: Building

 

Other data modeling methods

 

The E-R (Entity –Relationship) model

 

One of the most widely known data modeling methods is the entity-relationship model, developed by Chen.  There is no standard for the E-R model, and it has been extended and modified in a number of ways.

 

Typically the shapes mean the following:

 

Rectangles represent entities

Diamonds represent relationships with the cardinality encapsulated inside the diamond

1:1 (one to one), 1:N (one to many) and M:N (many to many). One important difference is that a m:m is not shown as an associative entity; thus the database designer must convert this relationship to a table.

Ellipses represent attributes

 

Representing Relationships

The ways to represent relationships is varied (see page 222 of text) Some modeling techniques are very refined in their approach and other data modeling mediums have loose definition of relationships. Which ever model is chosen the same goal remains…improving database design. The data model chosen for use in this book was chosen for its simplicity and high quality results.

 

Chapter 9

 

The Relational Model

 

So far this semester we have covered data modeling, normalization and E-R diagram modeling of database design. This chapter officially introduces the relational model. We have touched on components of the relational model but now we will bring the components together.

 

The relational model was developed as a result of recognized short comings of hierarchical and network DBMSs. According to Codd there are three good reasons for adopting the relational mode over other database models

 

  1. Other models force the programmer to code at a level of structural detail. So application  programs are more complex and take longer to code and debug
  2. No commands are provided for processing multiple records at a time
  3. Only the relational model through a query language such as SQL, recognizes the client’s need for ad hoc queries.

 

Objectives of relational model research

 

  1. Create a clearly delineated boundary between the logical and physical aspects of database management. This is known as the data independence objective
  2. Create a simple model that was readily understood by a wide range of users and programmers. This is the communicability objective
  3. Increase processing capabilities from a record-at-a-time to multiple records at a time. This is known as the set processing objective.

 

The relational model has three components:

  1. data structures
  2. integrity rules
  3. operators used to retrieve, derive, or modify data

 

Data Structures

 

Domain – The set of values all of the same data type

Relations – a table of n columns and m rows. Each column has a unique name, and all the values in a column are drawn from the same domain. Each row is uniquely identified.

 

The cardinality of a relation is its number of rows.  The degree of a relation is the number of columns. Hence one can expect that the cardinality of a relation will change but the degree shall not. The degree changes if a column is added to a relation, but in terms of relational theory, it is considered to be a new relation.

 

A relational database collection of relations or tables. Tables are linked by a common tuple that need not have the same name in both tables but it must draw on the same domain.

 

The primary key is the unique identifier

 

In some situations there may be several attributes – known as candidate keys – that are potential primary keys. One of the candidate keys is chosen as the primary and the candidate keys remaining are then referred to as alternate keys.

 

A foreign key is an attribute (possibly composite) of a relation that is also a primary key of a relation. The foreign key and primary key may be in different relations or the same relation, but both keys should be drawn from the same domain.

 

Integrity Rules

 

No component of the primary key of a relation may be null.

 

This is the entity integrity rule. This rule ensures that each instance of an entity described in a relation is identifiable in some way. Its implementation means that each row in a relation can be uniquely identified.

 

A database must not contain any unmatched foreign key values

 

In short this enforces the definition of a foreign key. The referential integrity rule says that a foreign key must have a corresponding primary key.

 

We have a pretty good idea what a relational database consists of. We have been working with them all semester. According  to our text there are four ways to manipulate relational tables.

 

  1. SQL
  2. QBE
  3. Relational Algebra
  4. Relational Calculus

 

We are fairly familiar with SQL and QBE now we will take a closer look at relational algebra and relational calculus.

 

While relational algebra can be used to resolve queries, it is seldom employed because it requires you to specify what you want and how to get it. This makes relational algebra more difficult to use than SQL.

 

Relational calculus is a non-procedural language, there for it is less concerned with how you get information. While relational calculus over comes some of the shortcomings of relational algebra it is harder to learn. As a result language designers developed SQL and QBE.

 

Relational algebra is the standard for judging data retrieval languages. If a language, such as SQL,  can be used to express every relational algebra operator, it is said to be relationally complete,

 

Relational Algebra Operators

 

Restrict - Creates a new table from specified rows of an existing table

 

Project- Creates a new table from specified columns of an existing table

 

Product – Creates a new table from  all of the existing rows of two existing rows. If Table A has 2 columns and 3 rows; Table B has 3 columns and two rows. The product of A and B will have 5 columns and six rows.

 

Union – Creates a new table containing rows appearing in one or both tables of two existing tables. The relations must be union compatible, which means that they have the same column names,  in the same order, and drawn on the same domain. Union is the sum of the rows in both tables. If it is in Table A or Table B it will be in Table C less the duplications.

 

Intersect – Creates a new table containing rows appearing in both tables of two existing tables. If it is in Table A  and it is in Table B it will be in the resultant table. The relations must be union compatible.

 

Difference – Creates a new table containing rows appearing in one table but not in the other of two existing tables. The difference of two relations is a new relation containing all rows appearing in the first relation but not the second.

 

Join – creates a new relation from two relations for all combinations of rows satisfying the join condition. Technically a join is a product of two tables followed by a restrict.

 

Divide – Division requires that A and B have a set of attributes that are common to both relations.

 

March 12, 2003

 

The full set of relational operators is not required. Join, Divide and Intersect can be described and done by using combinations of the other operators. Hence there are five primitive relational algebra operators.

 

Comparison of relational algebra primitive operations and SQL:.

Operation

Relational algebra

SQL

Restrict

A where condition

SELECT * FROM A WHERE condition

Project

A[X]

SELECT X FROM A

Product

A times B

SELECT * FROM A, B

Union

A union B

SELECT * FROM A UNION SELECT * FROM B

Difference

A minus B

SELECT * FROM A

    WHERE NOT EXISTS

     (SELECT * FROM B WHERE

         A.X = B.X  AND  A.Y = B.Y AND …

 

 

A relational database has structures, integrity rules and a manipulation language. A fully relational database system provides complete support for each of these components. Many commercial systems support SQL, but do not provide support for domains or integrity rules. Such systems are not fully relational but are relationally complete

 

Chapter 13 – Object Oriented Data Management

 

The proponents of OO maintain that ODBMS offers several advantages over current implementation of relational database technology. They assert that the shortcomings of the relational model are

 

  1. It cannot handle complex objects such as multimedia;
  2. There is no support for general data types found in some programming languages;
  3. Performance degrades when large numbers of tables must be joined to respond to a query;
  4. There is a mismatch between the relational data model’s set at a time processing and the record-at- time processing of programming languages;
  5. There is no support for representing and recording change, such as different versions of objects.

 

It should be noted that there is considerable disagreement between representatives of OO and relational adherents. This chapter tries to present a balance of the view points.

 

UML

Unified Modeling Language was introduced in 1997 by the Object Management Group. UML has become the common modeling language for OO software development. UML standardizes the notation for describing processes and has nine predefined diagrams.

 

We will focus on the class diagram because of its correspondence to the data model.

 

E-business developers are using UML to meet customer demands.

 

Traditional relational database structure in not designed to handle complex data structures such as

 

 

Object Oriented Programming Languages (OOPL) have been in the making for about 40 years. Smalltalk was developed by Xerox PARC in the 1970’s and introduced the terms object and OO. As you know OO programming has become the standard in many development arenas these days.

 

Key OO concepts

 

OO applications are created by assembling and using objects – self contained units that can contain data – very much like entities in data modeling. An object is something in the real world. Objects also contain the necessary instructions, the algorithm to transform data. Objects are application development building blocks.

 

OO modeling and data modeling have a common objective of creating a representation of a real-world information system. Consequently, they frequently deal with the same concepts but use different terminology.

 

Data Modeling Term

OO Modeling Term

Entity

Object class or classes

Instance

Object

Attribute

Attribute

 

Data Abstraction is the process of creating an abstract model to enhance understanding of reality. An organizational chart shows the associations between roles in organization. A diagram of an investment portfolio is another example of data abstraction.

 

Encapsulation means that all processing that changes the state of an object (changes values) is done within that object. Encapsulation implies that an object is shielded from interference from other objects; that is, and object cannot directly change any other objects. Encapsulation means that data and methods can be packaged together.

 

Message passing

Objects communicate with each other by sending and receiving messages. The receiving object needs a method for handling the message. A message can trigger a change in a receiving object.

 

Inheritance

An important feature of OO is that classes in a generalization/specialization hierarchy will inherit data and methods from the superclass.

 

Reuse

Don’t reinvent the wheel. If a class with similar characteristics is in existence …reuse it. But be careful. Watch out for kangaroos.

 

Why use 00?

 

 

March 14, 2003

 

The slow growth in programmer productivity is exacerbated by the massive effort required to keep operational software systems current. Some companies spend as much as 80 percent of their potential development resources on maintenance of old systems.

 

    1. Programming is a difficult, intellectual task and prone to logic errors.
    2. Determining the client’s true needs is a major challenge
    3. Additional features may be needed
    4. The business environment changes
    5. Hardware changes

 

OO software in constructed by creating self contained object classes. Because each object is relative independent there is less ripple effect, which often happens with traditional programming methods. The code is easier to understand even if documentation is sparse and the maintenance programmer has little knowledge of the application.

 

Gathering client information is difficult at best. OO advocates say that the OO modeling techniques is more in tune to the way clients view their world.

 

Adding features is deemed to be easier with OO methodology due to the independence of the existing structure. Thus OO is suitable for evolutionary development in which the main features of the system are first developed, and less critical elements added later.

 

A model by definition is an abstraction of something else. When modeling, we omit some of the features or detail we find in the real world and incorporate only those we find useful to the client’s needs.

 

A city map is a model of some aspects of a city’s streets. A model should reduce the communication gap between the client and the analyst.  OO supporters assert that dealing with systems objects, which closely resemble real-life objects is more natural than dealing with entities, relationships and data flows.

 

Types of models

 

The procedural model describes what the system needs to do and how to do it. Often referred to as the dynamic part of the of an information systems model, a procedural model has two parts.

 

The static aspects are modeled in the class diagram, which shows the potential or the capabilities of the system as opposed to what the system will actually do in a given circumstance.

 

Scenarios

 

A simple banking system is good example.

 

Static OO modeling

 

The basic modeling concepts learned earlier apply to the OO modeling of the static part of the system. Like entities, object classes can be related to other object classes.

 

Two broad types of possible relationships are depicted in an OO model:

 

Finding Objects and Classes

 

Finding objects and classes is very similar to identifying entities. Which classes to include in a system is highly dependent on the context, the client, and the modeler. Although there is no correct set of classes the OO analyst should strive to create a high fidelity model of the client’s world.

 

One approach is to underline any nouns in the problem description. Most nouns are possible classes, and underlining ensures that no potential classes are overlooked.

 

We will use a four-layer model of OO analysis

 

 

Although presented sequentially, actual analysis will usually not strictly follow such a pattern.

 

Example – Class and Object layer

 

Customers can rent diving equipment and boats from DDT. When a customer has seen what is available and decided what to rent, a rental agreement or contract  is produced and signed.

 

Underline the nouns and we see that Customer, Diving equipment, Contract and Boat are prime candidates for classes

 

Structure Layer

 

There are two types of structures:

 

Generalization

 

Depending on how and with which classes we start, OO modeling can be top down or bottom up. If we look at BOAT and DIVING EQUIPMENT first and realize that they can both be part of a generalized class which we call RENTAL EQUIPMENT we have done bottom up modeling. If we started with RENTAL EQUIPMENT we have done top-down modeling.

 

RENTAL EQUIPMENT does not contain any objects, which is why it is an abstract class. It serves solely as a vehicle for specifying the common attributes and methods of DIVING EQUIPMENT and BOAT.

 

As we look at this situation a bit closer we realize that more detail may be required. DIVING EQUIPMENT would need specialized information about some of this equipment. For example suits are described by size, thickness and type. SUIT would be a subclass to DIVING EQUIPMENT.

 

An inheritance relation is between classes, but inheritance is not like a relationship in data modeling. The inheritance relationship between SUIT and DIVING EQUIPMENT does not imply that a SUIT object (one specific suit) is linked to a corresponding DIVING EQUIPMENT object.  Here, inheritance means that SUIT has the same methods and attributes and DIVING EQUIPMENT and some other methods and attributes that are unique to SUIT.

 

Multiple inheritance

 

So far, generalization structures have been simple tree structures. Each subclass has one superclass from which it inherits methods and attributes. Multiple inheritance means that a class can inherit attributes and methods from more than one superclass. The generalization structure becomes more like a lattice than a tree.

 

As you may anticipate, multiple inheritance can create problems when an application is executed. There may be conflicts between the inherited methods and attributes. Although there are many strategies for dealing with this problem the issue remains unresolved. One strategy is not to support multiple inheritance, which is the approach taken by some OOPL’s. Java does not support multiple inheritance.

 

March 17, 2003

 

Aggregation

 

An aggregation shows the whole assembly and its parts. See page 396 of your text for an example. The relationship is sometimes call a whole/part structure because a TRAILER is a part of  BOAT ASSEMBLY.  It is also sometimes called an owned relationship. Note the cardinality.

 

The composition relationshipis distinguished from the other types of relationships by open or filled diamonds. In this case, we have a composite aggregation between boat assembly and the boat’s hull; they are inseparable. The other two aggregations are shared because a boat assembly might not always have a trailer or motor.

 

An OO model, like a data model records details of the aggregation relationship. In the case of an OO model the relationship is between objects.

 

Read the aggregation model as follows: (figure 13-10)

  1. A boat has zero or one trailers,  and a trailer belongs to one boat;
  2. A boat has zero, one , or two motors, and a motor belongs to one boat;
  3. A boat has one hull,  and one hull belongs to one boat.

 

We will combine pieces to illustrate the current state of our OO model. Note this model is incomplete because it does not include CUSTOMER or CONTRACT.

 

We have covered the Class and object layer, and the structure layer. Now we will discuss the attribute layer.

 

The attribute layer has two purposes: specification of attributes and specification of association relationships. Specification is similar to data modeling. Some OO proponents contend that you need not define attributes until physical design or implementation. The author recommends that you specify attributes as you discover them. How much detail you include seems to depend on what you are modeling.

 

Association Relationships

 

Sometimes called reference relationships, association relationships depict links between occurrences of an object.  Also called instance connections, they are conceptually the same as relationships in data modeling.

 

The aggregation relationship is actually a special version of the association relationship. By using both aggregation and association modeling we come up with a model that is closer to depicting reality.

 

Service layer

 

The required processing for each object is described in the service layer. Simple and complex services are distinguished. Simple services are the standard services that most objects need to be able to perform. These are usually implied as always being available and are not shown in the model. Simple services typically include create, connect, access and release.

 

Complex or nonstandard services, which are shown explicitly,  fall into two categories: calculate and monitor. For example, the object class CONTRACT needs to calculate a rental fee. This nonstandard service is shown in the services window.

 

Comparison of OO Modeling and Relational Paradigms

 

Mapping OO model to a relational database…here are some alternatives:

 

 

Persistent Objects

 

In OO programming, objects only exist for the duration of a program’s execution. If we use OO for information systems, we need to be able to store details of persistent objects, those that last much longer. We need an ODBMS which stores simple and complex objects. It also supports the creation of abstract data types, encapsulation and inheritance.

 

Fundamental Differences between ODBMS and RDBMS Technology

 

March 19, 2003

 

Below is a useful URL which has journals associated with what we will cover this chapter.

http://lamp.infosys.deakin.edu.au/journals/index.php?page=alpha&alpha=D

 

 

Organizational intelligence is the outcome of an organization’s effort to collect, store, process, and interpret data from internal and external sources. The conclusions or clues gleaned from an organization’s data stores enable it to identify problems or opportunities, which is the first stage of decision making.

 

Transaction processing systems (TSPs) are a core component of organizational memory and thus an important source of data. Along with relevant external information,  the various TSPs are the bedrock of an organizational intelligence system. TSPs are creating a massive torrent of data that potentially reveals to an organization a great deal about its business and its customers.

 

Unfortunately, many organizations are unable to exploit, either effectively or efficiently,  the massive amount of data generated by TSPs for the following reasons:

 

 

In other words organizational memory is highly fragmented. A technology is needed that can accumulate a considerable proportion or organizational memory into one readily accessible system. In order to improve organizational performance, increase revenues and provide better customer service decision makers need to have facts readily available. An organization first needs to organize these data into one logical collection and then use software to sift through this collection to extract meaning.

 

A data warehouse is a subject oriented, integrated, time variant, and nonvolatile set of data that supports decision making.

 

 

Associated with data warehouse are data management aids, data extraction, analysis tools and applications.

 

A data warehouse is a snapshot of an organizatio, stored in operational data stores (ODS),  are the raw material of a data warehouse. Data must be extracted from many systems with different names and in different formats however some data may be the same from these different systems. At the time of operational system development there was no plan to ever merge the data from the systems. As a result the extraction of data from many different systems is time consuming and complex. Further extraction must be an on going process so that analysts can work with current data.

 

Transformation

Transformation is part of the data extraction process. In the warehouse, data must be standardized and follow consistent coding systems. There are several types of transformation:

 

 <