You are here

Introduction to SQL

Introduction to SQL

Database

A database is simply a collection of data. Computer databases such as Oracle, Access, SQL Server and so on are typically collections of data that are, in some way or another, related to each other. For example, a payroll database is a collection of pertinent data about people, wages, work hours, departments and so on.

Tables

The table is the basic organizational structure for data in a relational database. All of the data in a table should be based on one particular type of object. Using the payroll example from above, most database designers would decide that there should be a table for employee information, another for departments, another for recording hours worked, another for payroll deduction rules and so on. The process of breaking a database up into tables is called normalization (see below).

Each table in a database can be broken down into rows and columns (you can think of the individual tables in a database as if they were a spreadsheet).

Columns

Each column in a database table contains all the data about one piece of the information stored in the table. In a phone book we typically have three columns (name, address and phone number). The name of each column and the type of information it contains are a major part of the definition of the table.

Rows

Each row in a table contains one piece of information for each column in the table for one object. In our phone book example one row would contain information on the name, address and phone number of one person. It is possible that a piece of information could be left blank (null in database parlance) so a row in a database actually can contain up to one piece of information for each column in the table.

By the way, another commonly used word for row is record.

Fields

A field is simple the intersection of a row and a column in the table. In spreadsheet terminology this would be called a cell. For example, somebody's phone number in a phone book table would be an example of a field.

Relationships

The major concept that sets relational databases apart from spreadsheets (or flat file databases) is the ability to define and use relationships between tables. We can decide that certain rows from one table contain information that is related to information in another table.

Primary Keys
In almost all cases in a database it is important that every row of information in a table be somehow different from all of the other rows in that table. Not only that but as database designers we need to decide what characteristic about that row will be unique. This is necessary so that we can easily get information on or modify information for a single row of information in the table. This piece (or combination of pieces of information) that we decide must be unique is then a possible choice for primary key.

In short, the primary key is the combination of information for each row in a table that is used to uniquely identify each row in the table.

Foreign Keys

In almost all databases certain records or rows should only be allowed to be in the table if there is a corresponding record or row in another table. A foreign key is a rule that ensures that if a record with a certain primary key (unique identifier) does not exist in one table, we cannot create a corresponding entry in another table.

The foreign key also works in the opposite direction. It ensures that we cannot delete information from one table if there is corresponding information in another table It is, by the way, an extremely common misconception that a foreign key is a definition of a relationship for the purposes of retrieving information from the database. In a strict sense this is definitely not the case

Normalization

Normalization is the process by which a database designer decides what information belongs in which tables. Normalization is usually discussed in terms of normal forms (the most commonly used of which are the first, second and third normal forms which are commonly summed up as “every piece of information in a row in a table should depend on the primary key, the whole primary key and nothing but the primary key”).

Further discussions of normalization are well beyond the scope of this introduction.

Standard Query Language (SQL)

Standard Query Language or SQL is simply a language for communicating with relational databases. It allows programmers to read information from the database (DRL), add and update information in the database (DML) and change the definition of the information in the database (DDL).

Data Definition Language (DDL)

SQL has a set of commands such as alter table which allow a database programmer to change the definition of various objects in the database. Using this set of commands columns can be added to a table, deleted from a table or simply altered whenever the programmer wants to.

Data Retrieval Language (DRL)

SQL has one command for retrieving information from the database. It is the ‘select’ command. The basic purpose of the select command is to give you a result table (rows and columns of information) based on the requirements you give in the select statement clauses.

Clauses

Select

This is where you specify what information you wish to extract from the database.

From

This is where you specify the tables that contain the information you wish to retrieve.

Where

This is where you specify the conditions that the information has to meet to be retrieved from the database.

Group By

In some cases the actual information in the database is not of interest but some sort of grouping of the information (e.g., an average or a count). SQL gives you the ability to group information and display the results of these groupings.

By the way group functions are called aggregate or aggregation functions in most documentation.

Having

The having clause is very similar to the where clause except that it allows the rest of the query to run and then further reduces the result table based on the conditions specified.

Subqueries

In most implementations of SQL a query can contain another query. Although we will probably not be able to devote much time to subqueries they are an extremely powerful aspect of SQL.

Data Manipulation Language (DML)

The data manipulation commands allow you to modify information in the database.

Insert

The insert command is used to add new rows of data to a table in the database.

Update

The update command is used to modify information in row of information that are already in the database.

Delete

The delete command is used to remove data from the database.