Information Structuring in Databases

Every database has a database . A database schema defines the structure and organization of the database and the information that it contains. Hence, in order for information to be captured within a database, it must be analyzed and made to fit within an organized structure that is consistent with the purposes of the database.

For example, let's assume that we want to provide information about certain people in a database. Clearly, all the information about a person is incredibly complex. So, our first step is to define the purpose of the database and determine what information is relevant. The next step is to determine what pieces of information about that person are relevant for this particular database. Those pieces of information are called attributes. Finally, one of the most difficult steps is to determine the relationships that are connected to a person in the database. The following list identifies a few different types of databases and the types of attributes and relationships that might be captured in each. Notice that each of these items listed below refers to the attributes of a "person," yet the list of attributes is distinct.

  • Customer database: Attributes include name, address, telephone number, account balance, credit rating, credit card number, and so on. Relationships may include details of purchases or details of payments.

  • Movie star database: Attributes include stage name, real name, birthday, age, spouse name, and so on. Relationships may include details of movie or TV roles, previous marriages, or a gallery of photos.

  • University professor database: Attributes include name, address, telephone number, office number, office phone number, years of work, title, department, college, salary, and so on. Relationships may include details of publications, details of university committees, awards, current classes, previous classes, or student ratings.

  • Student database: Attributes include name, address, telephone number, total hours completed, current registered hours, total GPA, overall standing, and so on. Relationships may include details of courses and grades, current registered courses, library materials, or campus credit balance.

The above examples of diverse groups of people are rather simple. In many databases, attributes and relationships are easy to define, primarily due to the focused nature of the database. However, as we try to capture and provide information about diverse and complex elements of the real world, it can become difficult and challenging to determine how to best structure the information. Normally it is not just one entity that needs to be described, but an entire set of different, and often diverse, entities that need to be described and their relationships identified. For example, looking at the public databases in the previous section, how does one accurately describe the world's economic data, or detail the genomes and DNA sequences of all invertebrates? Without question, designing those databases must have required extensive understanding of real world contexts.

Even though the designing and structuring of complex data into meaningful information can be a challenging problem, it also has tremendous benefits. The mere process of trying to impose structure onto diverse sets of data helps us to understand the nature of data. Databases provide a tremendous benefit to our understanding of the real world as data is structured and codified. Figure 1.1 illustrates a sample database structure to maintain information about an incident such as an automobile accident or a workplace-related occurrence. Notice how many different pieces of information must be captured for a simple concept such as an "incident."

Figure 1.1: Sample "Incident" Database Structure

Let's briefly look at some of the processes required to codify data and turn it into meaningful information. Interesting questions about information include the following:

  • What is information?

  • How does it exist in the world?

  • How are databases used to help organize, structure, record, and provide information by identifying new relationships that were not seen previously?

There are many different definitions and philosophical concepts about information. Our short discussion will barely scratch the surface of defining information. However, we can note that information is more than just data or facts. For our purposes, let us define information as a set of data facts that (1) can be labeled, (2) are identified within a larger context, and (3) have identified relationships with other facts. For example, if we have a piece of data such as "2875039275," then we clearly do not have much information. However, if we label the data as a phone number, then the data begins to contain meaningful information. If I add that this phone number belongs to John Appleby, then I have added both a context and a relationship and even more information is available to me.

Additional information is added if I identify that it is a cell phone number; now I know that I can send a text message to that number. We also gain additional information on phone numbers by adding more contextual and relationship information. This might include whether a particular phone number is a business telephone or a residential telephone, and whether the phone is a landline (with a physical location), or cellular (with a current cell tower location).

As you can see, meaningful information is more than just a set of facts. Databases are uniquely qualified to store a large set of data facts, attach labels to them, maintain data about the context, and establish important relationships.

Entities and Attributes

Every database will contain information about many different entities. An is a specific type of object or type of thing. In the Unified Modeling Language, or UML, an entity is referred to as a class. It can also be considered a classification of the specific type of items. (Note: We will use the terms entity and entity set interchangeably to refer to the set of items that are the same type.) A database will describe both the details about each entity and the relationships between entities. Entities usually come from physical world items such as customers or vehicles. However, abstract items, such as purchases or sales, are also entities. Detailed information about the properties of each entity are called . For example, attributes of Customers in a database will be things like Name, Address, and Phone Number. The database schema will identify all the entities to be included in the database and the attributes for each entity. The schema describes the structure of the data in the database.

Figure 1.2 is an example of a visual data model of some of the information that is maintained in a database as part of the schema. This diagram is a partial database model for a shoe retail store that maintains detailed information about its customers. Information about each sale is kept in the database along with the individual pairs of shoes that are sold (SaleItems) as part of the sale. Other information includes product information (shoes) as well as the manufacturers of the shoes. As you can see in the figure, there are six different entities as identified by the boxes.

Figure 1.2: Data Entities for a Shoe Retail Store

Within each box is a list of the attributes for each entity. The attributes define the detailed information that is kept about each entity. The entities and their list of attributes are part of the database schema. In Figure 1.2, we see that attributes for Employee are EmployeeID, FirstName, LastName, Address, and so forth.

Each entity has one or more attributes that is defined as a attribute. In the figure above, we have identified those attributes with a small key icon in the attribute rectangle. For example, for a Customer entity, the key attribute is CustomerID. A key is defined as an attribute whose value is unique or distinct for each record in the entity. Notice that a SaleItem requires three fields as the key—SaleID, ProductID, and ItemSize. It takes all three attributes to uniquely identify each SaleItem. Examples of key values are given below in the explanation of data.

Relationships

The database also needs to capture and describe information about the between the entities. In UML. relationships are also called associations. In this example, we want to know which sales were done by which customers. Sale Items are associated with each sale. In other words, a sale may include several pairs of shoes. Each sale item, which is a pair of shoes, is described in detail by product information. Finally, each product is manufactured by a particular manufacturer. The visual data model in Figure 1.2 is not sufficient to provide this information. Figure 1.3 expands the model by defining the important relationships. These relationships are identified by lines connecting the boxes of related entities.

Figure 1.3: Data Entities with Relationships for a Shoe Retail Store

One good way to capture relationship information is by using an entity’s key. Because keys uniquely identify a record, if we put the same key value in the data of a related record, then the two records are related to each other. In Not Found the Sale entity includes the CustomerID attribute, which is another entity that gives information about the customer. Thus, we know which customer purchased items on a particular sale. The Product entity includes the ManufacturerID attribute, which, again, is its own entity; we know who the manufacturer is for any given product. When a key attribute for one entity is placed in a different entity, it is called a in that different entity. In other words, a foreign key is a key attribute that belongs in one entity, but is listed in a different entity. In SaleItem, SaleID is a foreign key because it is the primary key for Sale. SaleItem's primary key is made up of three, and exactly three, fields: SaleID, ProductID, and ItemSize.

There is also a relationship that connects an entity with itself. This is called a . In the Employee entity, the manager is also an employee, so the ManagerID value must refer to an EmployeeID.

The visual data model described in this section is an easy, and powerful, technique that is used to understand the schema for a given database. Obviously, the DBMS must contain this information within the database schema so that it can organize and maintain the correct data.

Data

As indicated above, the database schema describes the structure of the database. The schema is not the same as the actual data. Each one of the entities, along with its attributes, can be thought of as a database . A database table is like a spreadsheet with the name of the spreadsheet being the entity name, the attributes being the columns of the spreadsheet, and the data being the rows. In database terminology, the rows can be called rows or records. Figure 1.4 illustrates this concept for the Employee entity. (Note: Click on the three dots on either side of the page to expand the page and make the table bigger.)

Figure 1.4: Employee Data

Remember, it is important not to confuse the database schema with the actual data. The schema defines the structure and the data is the actual information about each individual Employee, Customer, or Product.

Earlier, we introduced the idea of a key attribute. We can see from the data how the key attribute serves as the mechanism to uniquely identify each row or each record in the table. This is important because we could have two employees who have the same name or live in the same house. The key attribute solves this problem by guaranteeing that each record will have a unique value. We are familiar with this concept. Your bank account has a unique number. Your medical records are identified with a key, which may be your Social Security number. In the example, we have used a somewhat standard notation by naming the key with "ID". At times, key fields are also identified by a “_no” or “_id” as part of the name. The DBMS enforces this requirement so that no two records in the same table have the same key value.