Relational database

From Wikipedia, the free encyclopedia

A relational database is a database that conforms to the relational model. The term refers to both a database's data and its schema (the structure describing how those data are arranged). The term "relational database management system" (RDBMS) properly refers to the software used to create and manage a relational database, though it is sometimes used loosely to mean the database itself.

The term relational database was originally defined and coined by E.F. Codd in 1970.[1]

Relational databases have become the overwhelming choice for storing the tabular information that supports the world economy, including financial records, manufacturing and logistical information, personnel data and much more. Relational databases replaced hierarchical databases and network databases and have survived challenges from object databases. More recently, relational databases have attempted to fend off challenges from XML databases, in part by incorporating basic XML functionality into relational database products.

The three leading relational database vendors are Oracle, Microsoft, and IBM. The leading open source implementations are MySQL and PostgreSQL.


Contents

Strictly, a relational database is a collection of relations (frequently called tables). Other items, such as views and constraints, are frequently considered part of the database as well, because they help organize and structure the data and force the database to conform to a set of requirements.

Terminology

[Figure: Relational database terminology.]

Relational database theory uses a different set of mathematically based terms, which are equivalent, or roughly equivalent, to SQL database terminology. The table below summarizes some of the most important relational database terms and their SQL database equivalents.

Relational term	SQL equivalent
relation, base relvar	table
derived relvar	view, query result
tuple	row
attribute	column

Relations or tables

Main articles: Relation (mathematics) and Table (database)

A relation is defined as a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints.

The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.
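The sketch below uses Python's built-in sqlite3 module as a stand-in for an RDBMS; the student and alumni tables and their data are hypothetical. It shows tuples inserted with explicit values and tuples derived from a query, queries identifying the tuples to update or delete, and an ORDER BY clause, which is needed whenever a particular row order matters because the relation itself imposes none.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, year INTEGER)")
conn.execute("CREATE TABLE alumni (id INTEGER PRIMARY KEY, name TEXT)")

# New tuples supplied with explicit values.
conn.executemany(
    "INSERT INTO student (id, name, year) VALUES (?, ?, ?)",
    [(1, "Ann", 4), (2, "Bob", 2), (3, "Cy", 4)],
)

# New tuples derived from a query.
conn.execute("INSERT INTO alumni (id, name) SELECT id, name FROM student WHERE year = 4")

# Queries identify the tuples to update or delete.
conn.execute("UPDATE student SET year = year + 1 WHERE year = 2")
conn.execute("DELETE FROM student WHERE year = 4")

# Project onto the name column; ORDER BY supplies an order the relation lacks.
rows = conn.execute("SELECT name FROM student ORDER BY name").fetchall()
alumni = conn.execute("SELECT name FROM alumni ORDER BY name").fetchall()
print(rows, alumni)  # [('Bob',)] [('Ann',), ('Cy',)]
```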

Base and derived relations

Main articles: Relvar and View (database)

In a relational database, all data are stored and accessed via relations. Relations that store data are called "base relations", and in implementations are called "tables". Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations these are called "views" or "queries". Derived relations are convenient in that, although they may draw information from several relations, they act as a single relation. Derived relations can also be used as an abstraction layer.
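As a hedged illustration, again using Python's sqlite3 module with hypothetical student and locker tables, the following sketch defines a view over two base tables. The view stores no rows of its own, yet it is queried like a single table and reflects changes to the base tables whenever it is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Base relations (tables); names and data are hypothetical.
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE locker (number INTEGER PRIMARY KEY, student_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO locker VALUES (101, 1), (102, 2);

    -- A derived relation: it stores no data and is computed from the base relations.
    CREATE VIEW student_locker AS
        SELECT s.name, l.number
        FROM student AS s JOIN locker AS l ON l.student_id = s.id;
""")

# The view is queried exactly like a single table.
rows = conn.execute("SELECT name, number FROM student_locker ORDER BY name").fetchall()
print(rows)  # [('Ann', 101), ('Bob', 102)]

# A change to a base relation shows up the next time the view is read.
conn.execute("UPDATE locker SET number = 201 WHERE student_id = 2")
bob = conn.execute("SELECT number FROM student_locker WHERE name = 'Bob'").fetchone()
print(bob)  # (201,)
```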

Domain

Main article: data domain

A domain describes the set of possible values for a given attribute. Because a domain constrains the attribute's values, it can be considered a constraint. Mathematically, attaching a domain to an attribute means that "all values for this attribute must be an element of the specified set."

The character data value 'ABC', for instance, is not in the integer domain, whereas the integer value 123 satisfies the domain constraint.
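A domain can be sketched in Python as a membership test: an attribute satisfies its domain when every value it holds passes the test. The function names and the grade domain below are hypothetical; real systems express domains through column types and constraints rather than Python functions.

```python
def integer_domain(value):
    """Membership test for the integer domain (hypothetical helper)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)

def satisfies_domain(values, domain):
    """True if every value of the attribute is an element of the domain."""
    return all(domain(v) for v in values)

# A finite domain can be given directly as a set of allowed values.
grade_domain = {"A", "B", "C", "D", "F"}

print(integer_domain("ABC"))                                 # False
print(integer_domain(123))                                   # True
print(satisfies_domain(["A", "C"], grade_domain.__contains__))  # True
print(satisfies_domain(["A", "Z"], grade_domain.__contains__))  # False
```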

Constraints

Main article: Constraint

Constraints are a way of implementing business rules in the database. For instance, a constraint can restrict an integer attribute to values between 1 and 10.

Constraints restrict the data that can be stored in relations. These are usually defined using expressions that result in a boolean value, indicating whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an entire relation.

Constraints are not formally part of the relational model, but because of the integral role that they play in organizing data, they are usually discussed together with relational concepts.
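The 1-to-10 example can be expressed as an SQL CHECK constraint, which is exactly a boolean expression that every row must satisfy. This sketch uses Python's sqlite3 module and a hypothetical survey table, with one constraint on a single attribute and one on a combination of attributes within a tuple; SQLite is only a convenient engine here, and other products report violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with an attribute-level and a tuple-level CHECK constraint.
conn.execute("""
    CREATE TABLE survey (
        id INTEGER PRIMARY KEY,
        rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 10),
        opened TEXT NOT NULL,
        closed TEXT NOT NULL,
        CHECK (opened <= closed)
    )
""")

def accepted(rating, opened, closed):
    """Try an insert and report whether the constraints allowed it."""
    try:
        conn.execute(
            "INSERT INTO survey (rating, opened, closed) VALUES (?, ?, ?)",
            (rating, opened, closed),
        )
        return True
    except sqlite3.IntegrityError:
        return False

# ISO-8601 date strings compare correctly as text.
print(accepted(7, "2024-01-01", "2024-02-01"))   # True
print(accepted(11, "2024-01-01", "2024-02-01"))  # False: rating out of range
print(accepted(5, "2024-03-01", "2024-02-01"))   # False: closed before opened
```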

Keys

Main article: Superkey

A unique key is a kind of constraint that ensures that an object, or critical information about the object, occurs in at most one tuple in a given relation. For example, a school might want each student to have a separate locker. To ensure this, the database designer creates a key on the locker attribute of the student relation. Keys can include more than one attribute. For example, a nation may impose a restriction that no province can have two cities with the same name; the key would then include both province and city name. This would still allow two different provinces to have a town called Springfield, because their provinces differ. A key over more than one attribute is called a compound key.
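The locker and Springfield examples map onto SQL UNIQUE constraints. A sketch with Python's sqlite3 module, assuming hypothetical student and city tables (the province names are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-attribute key: a locker number may appear in at most one tuple.
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        locker INTEGER UNIQUE
    );
    -- Compound key: no province may have two cities with the same name.
    CREATE TABLE city (
        province TEXT NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (province, name)
    );
""")

def try_insert(sql, params):
    """Return True if the insert succeeded, False if a key rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn.execute("INSERT INTO student (name, locker) VALUES ('Ann', 12)")
dup_locker = try_insert("INSERT INTO student (name, locker) VALUES (?, ?)", ("Bob", 12))

conn.execute("INSERT INTO city VALUES ('Ontario', 'Springfield')")
other_province = try_insert("INSERT INTO city VALUES (?, ?)", ("Manitoba", "Springfield"))
dup_city = try_insert("INSERT INTO city VALUES (?, ?)", ("Ontario", "Springfield"))
print(dup_locker, other_province, dup_city)  # False True False
```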

Foreign keys

Main article: Foreign key

A foreign key is a reference to a key in another relation, meaning that the referencing tuple has, as one of its attributes, the values of a key in the referenced tuple. Foreign keys need not have unique values in the referencing relation.

A foreign key could be described formally as "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes".
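The formal definition translates almost directly into a check over in-memory data. In this sketch, relations are lists of Python dicts and the customers/orders data are hypothetical; it does not model SQL's special handling of NULL values. The referencing relation repeats a key value, which, as noted above, foreign keys permit.

```python
def satisfies_foreign_key(referencing, referenced, pairs):
    """pairs: list of (referencing_attribute, referenced_attribute) tuples."""
    # Project the referenced relation over the referenced attributes.
    targets = {tuple(t[dst] for _, dst in pairs) for t in referenced}
    # For all referencing tuples, the projection must match some referenced tuple.
    return all(tuple(t[src] for src, _ in pairs) in targets for t in referencing)

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"order_no": 10, "customer_id": 1},
    {"order_no": 11, "customer_id": 1},  # repeated foreign-key value: allowed
    {"order_no": 12, "customer_id": 2},
]
fk = [("customer_id", "id")]
print(satisfies_foreign_key(orders, customers, fk))  # True

orphan = orders + [{"order_no": 13, "customer_id": 9}]
print(satisfies_foreign_key(orphan, customers, fk))  # False: no customer 9
```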

Stored procedures

Main article: Stored procedure

A stored procedure is executable code that is associated with the database. Stored procedures usually collect and customize common operations, like inserting a tuple into a relation, or gathering statistical information about usage patterns. Frequently they are used as an application programming interface (API) for security or simplicity.

Stored procedures are not part of the relational database model, but all commercial implementations include them.

Indices

Main article: Index (database)

An index is one way of providing quicker access to data. Indices can be created on any combination of attributes on a relation. Queries that filter using those attributes can locate matching tuples directly through the index, without having to check each tuple in turn. Relational databases typically supply multiple indexing techniques, each of which is optimal for some combination of data distribution, relation size, and typical access pattern; examples include B+ trees, R-trees, and bitmap indices.
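The effect of an index on query evaluation can be observed with SQLite's EXPLAIN QUERY PLAN, here through Python's sqlite3 module and a hypothetical student table. The exact wording of the plan varies between SQLite versions, so the comments show only typical output; what matters is that the plan switches from a full scan to a search through the named index once the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, surname TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO student (surname, year) VALUES (?, ?)",
    [(f"name{i}", i % 5) for i in range(1000)],
)

query = "SELECT * FROM student WHERE surname = ?"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("name42",))
    return " ".join(row[-1] for row in rows)

before = plan(query)  # typically "SCAN student": every tuple is checked
conn.execute("CREATE INDEX idx_student_surname ON student (surname)")
after = plan(query)   # typically "SEARCH student USING INDEX idx_student_surname (surname=?)"
print(before)
print(after)
```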

Indices are usually not considered part of the database, since they are an implementation detail, although they are usually maintained by the same group that maintains the other parts of the database.

Relational operations

Main article: Relational algebra

Queries made against the relational database, and the derived relvars in the database, are expressed in a relational calculus or a relational algebra. In his original relational algebra, Dr. Codd introduced eight relational operators in two groups of four operators each. The first four operators were based on the traditional mathematical set operations:

The union operator combines the tuples of two relations and removes all duplicate tuples from the result. The relational union operator is equivalent to the SQL UNION operator.
The intersection operator produces the set of tuples that two relations share in common. Intersection is implemented in SQL in the form of the INTERSECT operator.
The difference operator acts on two relations and produces the set of tuples from the first relation that do not exist in the second relation. Difference is implemented in SQL in the form of the EXCEPT or MINUS operator.
The Cartesian product of two relations is a join that is not restricted by any criteria, resulting in every tuple of the first relation being matched with every tuple of the second relation. The Cartesian product is implemented in SQL as the CROSS JOIN join operator.
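The four set-based operators can be sketched in Python by modeling a relation as a heading (a tuple of attribute names) plus a frozenset of value tuples, which gives the no-duplicates, no-order semantics the model requires. This representation and the sample relations are assumptions made for illustration, not how any RDBMS stores data.

```python
from itertools import product

# A relation is modeled as (heading, body): a tuple of attribute names and a
# frozenset of value tuples in heading order.

def _require_same_heading(r, s):
    if r[0] != s[0]:
        raise ValueError("operands must have the same heading (union-compatible)")

def union(r, s):
    _require_same_heading(r, s)
    return r[0], r[1] | s[1]  # set union drops duplicate tuples

def intersection(r, s):
    _require_same_heading(r, s)
    return r[0], r[1] & s[1]

def difference(r, s):
    _require_same_heading(r, s)
    return r[0], r[1] - s[1]  # tuples of r that do not occur in s

def cartesian_product(r, s):
    if set(r[0]) & set(s[0]):
        raise ValueError("operands must have disjoint attribute names")
    return r[0] + s[0], frozenset(t + u for t, u in product(r[1], s[1]))

morning = (("name",), frozenset({("Ann",), ("Bob",)}))
evening = (("name",), frozenset({("Bob",), ("Cy",)}))
rooms = (("room",), frozenset({("A1",), ("B2",)}))

print(sorted(union(morning, evening)[1]))         # [('Ann',), ('Bob',), ('Cy',)]
print(sorted(intersection(morning, evening)[1]))  # [('Bob',)]
print(sorted(difference(morning, evening)[1]))    # [('Ann',)]
print(len(cartesian_product(morning, rooms)[1]))  # 4
```

The SQL counterparts named above (UNION, INTERSECT, EXCEPT or MINUS, and CROSS JOIN) perform the same operations on tables.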

The remaining operators proposed by Dr. Codd involve special operations specific to relational databases:

The selection, or restriction, operation retrieves tuples from a relation, limiting the results to only those that meet a specific criterion, i.e. a subset in terms of set theory. The SQL equivalent of selection is the SELECT query statement with a WHERE clause.
The projection operation extracts only the specified attributes from the tuples of a relation and removes any duplicate tuples from the result. The SQL equivalent is a SELECT statement that lists only the desired columns; because SQL otherwise keeps duplicate rows, the DISTINCT keyword or a GROUP BY clause is needed to remove them from the result set.
The join operation defined for relational databases is often referred to as a natural join. In this type of join, two relations are connected by their common attributes. SQL's approximation of a natural join is the INNER JOIN join operator; standard SQL also provides a NATURAL JOIN operator.
The relational division operation is a slightly more complex operation, which essentially uses the tuples of one relation (the divisor) to partition a second relation (the dividend). The relational division operator is effectively the opposite of the Cartesian product operator (hence the name).
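Using a hypothetical representation of a relation as a heading (a tuple of attribute names) plus a frozenset of value tuples, the four relation-specific operators can be sketched as follows. The relation names and data are illustrative. Projection drops the repeated student automatically because the body is a set, and division returns only the students paired with every course in the divisor.

```python
def select(rel, predicate):
    """Restriction: keep only the tuples that satisfy the predicate."""
    heading, body = rel
    return heading, frozenset(t for t in body if predicate(dict(zip(heading, t))))

def project(rel, attrs):
    """Projection: keep only the named attributes; the set drops duplicates."""
    heading, body = rel
    idx = [heading.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in body)

def natural_join(r, s):
    """Combine tuples that agree on every attribute the two headings share."""
    (rh, rb), (sh, sb) = r, s
    common = [a for a in rh if a in sh]
    extra = [a for a in sh if a not in rh]
    body = set()
    for t in rb:
        left = dict(zip(rh, t))
        for u in sb:
            right = dict(zip(sh, u))
            if all(left[a] == right[a] for a in common):
                body.add(t + tuple(right[a] for a in extra))
    return rh + tuple(extra), frozenset(body)

def divide(dividend, divisor):
    """Tuples over the dividend's remaining attributes that are paired
    in the dividend with every tuple of the divisor."""
    (dh, db), (sh, sb) = dividend, divisor
    keep = tuple(a for a in dh if a not in sh)
    _, candidates = project(dividend, keep)

    def paired_with_all(c):
        base = dict(zip(keep, c))
        return all(
            tuple({**base, **dict(zip(sh, u))}[a] for a in dh) in db
            for u in sb
        )

    return keep, frozenset(c for c in candidates if paired_with_all(c))

enrolled = (("student", "course"),
            frozenset({("Ann", "Math"), ("Ann", "Art"), ("Bob", "Math")}))
teaches = (("course", "teacher"), frozenset({("Math", "Smith"), ("Art", "Jones")}))
required = (("course",), frozenset({("Math",), ("Art",)}))

print(sorted(select(enrolled, lambda t: t["course"] == "Math")[1]))
# [('Ann', 'Math'), ('Bob', 'Math')]
print(sorted(project(enrolled, ["student"])[1]))  # [('Ann',), ('Bob',)]
print(sorted(natural_join(enrolled, teaches)[1]))
# [('Ann', 'Art', 'Jones'), ('Ann', 'Math', 'Smith'), ('Bob', 'Math', 'Smith')]
print(sorted(divide(enrolled, required)[1]))      # [('Ann',)]
```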

Other operators have been introduced or proposed since Dr. Codd's introduction of the original eight, including relational comparison operators and extensions that offer support for nesting and hierarchical data, among others.

Normalization

Main article: Database normalization

Normalization was first proposed by Dr. Codd as an integral part of the relational model. It encompasses a set of best practices designed to eliminate the duplication of data, which in turn prevents data manipulation anomalies and loss of data integrity. The most common forms of normalization applied to databases are called the normal forms. Normalization trades reduced redundancy for a larger number of relations, which queries must then join to reassemble the data.
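A small Python sketch of the idea, with hypothetical data: a relation that repeats each course's teacher for every enrolled student is decomposed into two relations, and a join on the shared course attribute shows that no information is lost. This illustrates a single decomposition only; it does not test for any particular normal form.

```python
# Unnormalized relation: the teacher of a course is repeated for every
# student enrolled in it.
enrollment = {
    ("Ann", "Math", "Smith"),
    ("Bob", "Math", "Smith"),
    ("Ann", "Art", "Jones"),
}

# Decompose so that each course's teacher is stored exactly once.
course_teacher = {(course, teacher) for _, course, teacher in enrollment}
student_course = {(student, course) for student, course, _ in enrollment}

# Because each course has exactly one teacher, joining on course rebuilds
# the original relation without losing or inventing tuples.
rejoined = {
    (student, course, teacher)
    for student, course in student_course
    for c, teacher in course_teacher
    if c == course
}
print(rejoined == enrollment)  # True

# Changing Math's teacher now touches one tuple instead of one per student,
# avoiding the anomaly of updating some copies but not others.
updated = {(c, "Lee" if c == "Math" else t) for c, t in course_teacher}
```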

References

[1] Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387.

