SQL


Definition

According to Wikipedia, “SQL, commonly expanded as Structured Query Language, is a computer language designed for the retrieval and management of data in relational database management systems, database schema creation and modification, and database object access control management. SQL has been standardized by both ANSI and ISO.” (retrieved 11:47, 4 September 2007 (MEST))

See also: MySQL and the SQL and MySQL tutorial (this article just provides a short summary of SQL and points to further reading on Wikipedia)

Overview of language elements

This overview chapter is copied from the Wikipedia SQL article (retrieved 11:47, 4 September 2007 (MEST)) with minor markup modifications. Its contents are available under the GNU Free Documentation License. Links lead to specialized Wikipedia articles.

[Figure: chart showing several of the SQL language elements that compose a single statement.]

The SQL language is sub-divided into several language elements, including:

  • Statements which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics.
  • Queries which retrieve data based on specific criteria.
  • Expressions which can produce either (computing) scalar values or tables consisting of columns and (database) rows of data.
  • Predicates which specify conditions that can be evaluated to SQL three-valued logic (3VL) Boolean truth values and which are used to limit the effects of statements and queries, or to change program flow.
  • Clauses which are (in some cases optional) constituent components of statements and queries.[1]
  • Whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
  • SQL statements also include the ";" statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
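The three-valued logic mentioned for predicates can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and an in-memory SQLite database; the table contents are invented for the illustration, and other database systems may present the "unknown" truth value differently.

```python
import sqlite3

# In-memory database; the books table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO books (title, price) VALUES (?, ?)",
    [("Cheap", 20.0), ("Expensive", 150.0), ("Unpriced", None)],
)

# Comparing NULL with anything yields "unknown", which SQLite returns as NULL
# (None in Python), neither true nor false.
null_comparison = conn.execute("SELECT NULL = NULL").fetchone()[0]

# A row whose predicate is unknown is dropped by the predicate AND by its negation.
over_100 = [r[0] for r in conn.execute(
    "SELECT title FROM books WHERE price > 100 ORDER BY title")]
not_over_100 = [r[0] for r in conn.execute(
    "SELECT title FROM books WHERE NOT (price > 100) ORDER BY title")]

# IS NULL is the predicate that finds the row with the missing price.
unpriced = [r[0] for r in conn.execute(
    "SELECT title FROM books WHERE price IS NULL")]

print(null_comparison, over_100, not_over_100, unpriced)
```

The "Unpriced" row appears in neither of the first two result lists, which is the practical consequence of the third truth value.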

Queries

The most common operation in SQL databases is the query, which is performed with the declarative SELECT keyword. SELECT retrieves data from a specified table, or multiple related tables, in a database. While often grouped with Data Manipulation Language (DML) statements, the standard SELECT query is considered separate from SQL DML, as it has no persistent effects on the data stored in a database. Note that there are some platform-specific variations of SELECT that can persist their effects in a database, such as Microsoft SQL Server's proprietary SELECT INTO syntax. (SQL Server 2005 Books Online)

SQL queries allow the user to specify a description of the desired result set, but it is left to the database management system (DBMS) to plan, optimize, and perform the physical operations necessary to produce that result set in as efficient a manner as possible. A SQL query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used as a "wildcard" indicator to specify that all available columns of a table (or multiple tables) are to be returned. SELECT is the most complex statement in SQL, with several optional keywords and clauses, including:

  • The FROM clause which indicates the source table or tables from which the data is to be retrieved. The FROM clause can include optional JOIN clauses to join related tables to one another based on user-specified criteria.
  • The WHERE clause includes a comparison predicate, which is used to restrict the number of rows returned by the query. The WHERE clause is applied before the GROUP BY clause. The WHERE clause eliminates all rows from the result set where the comparison predicate does not evaluate to True.
  • The GROUP BY clause is used to combine, or group, rows with related values into elements of a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregate functions or to eliminate duplicate rows from a result set.
  • The HAVING clause includes a comparison predicate used to eliminate rows after the GROUP BY clause is applied to the result set. Because it acts on the results of the GROUP BY clause, aggregate functions can be used in the HAVING clause predicate.
  • The ORDER BY clause is used to identify which columns are used to sort the resulting data, and in which order they should be sorted (options are ascending or descending). The order of rows returned by a SQL query is never guaranteed unless an ORDER BY clause is specified.
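How these clauses interact can be sketched with one query that uses all of them. This is only an illustration run against an in-memory SQLite database through Python's sqlite3 module; the publisher column and the sample rows are assumptions that do not appear elsewhere in this article.

```python
import sqlite3

# In-memory database with a hypothetical books table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, publisher TEXT, price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("A", "Acme", 120.0),
        ("B", "Acme", 130.0),
        ("C", "Acme", 10.0),
        ("D", "Birch", 200.0),
        ("E", "Cedar", 5.0),
    ],
)

rows = conn.execute(
    """
    SELECT publisher, COUNT(*) AS expensive_books
    FROM books
    WHERE price > 100.00      -- filters individual rows first
    GROUP BY publisher        -- groups the rows that survived WHERE
    HAVING COUNT(*) >= 2      -- filters whole groups, may use aggregates
    ORDER BY publisher        -- sorts the final result
    """
).fetchall()

print(rows)
```

WHERE removes books C and E before grouping, so Acme is counted with two books and Birch with one; HAVING then discards the Birch group.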

The following is an example of a SELECT query that returns a list of expensive books. The query retrieves all rows from the books table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates that all columns of the books table should be included in the result set.

SELECT * 
FROM books
WHERE price > 100.00
ORDER BY title;

The example below demonstrates the use of multiple tables in a join, grouping, and aggregation in a SQL query, by returning a list of books and the number of authors associated with each book.

SELECT books.title, count(*) AS Authors
FROM books
JOIN book_authors 
ON books.isbn = book_authors.isbn
GROUP BY books.title;

Example output might resemble the following:

Title                   Authors
----------------------  -------
SQL Examples and Guide     3
The Joy of SQL             1
How to use Wikipedia       2
Pitfalls of SQL            1
How SQL Saved my Dog       1

(The underscore character "_" is often used as part of table and column names to separate descriptive words because other punctuation tends to conflict with SQL syntax. For example, a dash "-" would be interpreted as a minus sign.)

Under the precondition that isbn is the only common column name of the two tables and that a column named title only exists in the books table, the above query could be rewritten in the following form:

SELECT title, count(*) AS Authors
FROM books 
NATURAL JOIN book_authors 
GROUP BY title;

However, many vendors either do not support NATURAL JOIN or require certain column naming conventions for it to work, so it is less common in practice.
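The equivalence of the two join forms can be checked with a small sketch. It assumes SQLite (through Python's sqlite3 module), which happens to accept NATURAL JOIN; the author column and the sample rows are invented for the illustration.

```python
import sqlite3

# In-memory database; isbn is the only column name shared by the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE book_authors (isbn TEXT, author TEXT);
    INSERT INTO books VALUES ('111', 'How to use Wikipedia');
    INSERT INTO books VALUES ('222', 'The Joy of SQL');
    INSERT INTO book_authors VALUES ('111', 'Ann');
    INSERT INTO book_authors VALUES ('111', 'Bob');
    INSERT INTO book_authors VALUES ('222', 'Cy');
    """
)

# Join with an explicit ON condition.
explicit = conn.execute(
    """
    SELECT books.title, count(*) AS Authors
    FROM books
    JOIN book_authors ON books.isbn = book_authors.isbn
    GROUP BY books.title
    ORDER BY books.title
    """
).fetchall()

# Same query, letting the DBMS match the identically named isbn columns.
natural = conn.execute(
    """
    SELECT title, count(*) AS Authors
    FROM books
    NATURAL JOIN book_authors
    GROUP BY title
    ORDER BY title
    """
).fetchall()

print(explicit == natural, natural)
```

An ORDER BY clause was added to both queries so that the two result lists can be compared directly.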

Data retrieval is very often combined with data projection when the user is looking for calculated values and not just the verbatim data stored in primitive data types, or when the data needs to be expressed in a form that is different from how it's stored. SQL allows the use of expressions in the select list to project data, as in the following example which returns a list of books that cost more than 100.00 with an additional sales_tax column containing a sales tax figure calculated at 6% of the price.

SELECT isbn, title, price, price * 0.06 AS sales_tax
FROM books
WHERE price > 100.00
ORDER BY title;

Data manipulation

First, there are the standard Data Manipulation Language (DML) elements. DML is the subset of the language used to add, update and delete data:

  • INSERT is used to add rows (formally tuples) to an existing table, e.g.:
INSERT INTO my_table (field1, field2, field3) VALUES ('test', 'N', NULL);
  • UPDATE is used to modify the values of a set of existing table rows, e.g.:
UPDATE my_table SET field1 = 'updated value' WHERE field2 = 'N';
  • DELETE removes zero or more existing rows from a table, e.g.:
DELETE FROM my_table WHERE field2 = 'N';
  • MERGE is used to combine the data of multiple tables. It is something of a combination of the INSERT and UPDATE elements. It is defined in the SQL:2003 standard; prior to that, some databases provided similar functionality via different syntax, sometimes called an "upsert".
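The INSERT, UPDATE and DELETE statements above can be tried out with the following sketch, which runs them against an in-memory SQLite database via Python's sqlite3 module and records how many rows each statement affected. The column types and the extra 'keep' row are assumptions made for the illustration; MERGE is left out because SQLite does not implement it.

```python
import sqlite3

# In-memory database; column types are assumed, the article does not give them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (field1 TEXT, field2 TEXT, field3 TEXT)")

# INSERT adds rows; rowcount reports the number of rows affected.
inserted = conn.execute(
    "INSERT INTO my_table (field1, field2, field3) VALUES ('test', 'N', NULL)"
).rowcount
# A second, hypothetical row that none of the later statements should touch.
conn.execute(
    "INSERT INTO my_table (field1, field2, field3) VALUES ('keep', 'Y', NULL)"
)

# UPDATE changes only the rows matched by the WHERE predicate.
updated = conn.execute(
    "UPDATE my_table SET field1 = 'updated value' WHERE field2 = 'N'"
).rowcount
after_update = conn.execute(
    "SELECT field1, field2 FROM my_table ORDER BY field2"
).fetchall()

# DELETE removes zero or more rows: the second call finds nothing left to delete.
deleted = conn.execute("DELETE FROM my_table WHERE field2 = 'N'").rowcount
deleted_again = conn.execute("DELETE FROM my_table WHERE field2 = 'N'").rowcount
remaining = conn.execute("SELECT field1, field2 FROM my_table").fetchall()

print(inserted, updated, deleted, deleted_again, remaining)
```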

Transaction controls

Transactions, if available, can be used to wrap around the DML operations:

  • BEGIN WORK (or START TRANSACTION, depending on SQL dialect) can be used to mark the start of a database transaction, which either completes entirely or not at all.
  • COMMIT causes all data changes in a transaction to be made permanent.
  • ROLLBACK causes all data changes since the last COMMIT or ROLLBACK to be discarded, so that the state of the data is "rolled back" to the way it was prior to those changes being requested.

COMMIT and ROLLBACK interact with areas such as transaction control and locking. Strictly, both terminate any open transaction and release any locks held on data. In the absence of a BEGIN WORK or similar statement, the semantics of SQL are implementation-dependent. Example:

BEGIN WORK;
UPDATE inventory SET quantity = quantity - 3 WHERE item = 'pants';
COMMIT;
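The difference between COMMIT and ROLLBACK can be observed with a short sketch, assuming SQLite through Python's sqlite3 module. SQLite spells the opening statement BEGIN rather than BEGIN WORK, and the starting quantity of 10 is invented for the illustration.

```python
import sqlite3

# isolation_level=None keeps the sqlite3 module from opening transactions
# on its own, so BEGIN, COMMIT and ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('pants', 10)")


def quantity():
    """Return the current quantity of pants."""
    return conn.execute(
        "SELECT quantity FROM inventory WHERE item = 'pants'"
    ).fetchone()[0]


# First transaction: the change is visible inside it, then discarded.
conn.execute("BEGIN")
conn.execute("UPDATE inventory SET quantity = quantity - 3 WHERE item = 'pants'")
inside_transaction = quantity()
conn.execute("ROLLBACK")
after_rollback = quantity()

# Second transaction: the same change is made permanent.
conn.execute("BEGIN")
conn.execute("UPDATE inventory SET quantity = quantity - 3 WHERE item = 'pants'")
conn.execute("COMMIT")
after_commit = quantity()

print(inside_transaction, after_rollback, after_commit)
```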

Data definition

The second group of keywords is the Data Definition Language (DDL). DDL allows the user to define new tables and associated elements. Most commercial SQL databases have proprietary extensions in their DDL, which allow control over nonstandard features of the database system. The most basic items of DDL are the CREATE, ALTER, RENAME, TRUNCATE and DROP statements:

  • CREATE causes an object (a table, for example) to be created within the database.
  • DROP causes an existing object within the database to be deleted, usually irretrievably.
  • TRUNCATE deletes all data from a table (non-standard, but common SQL statement).
  • ALTER permits the user to modify an existing object in various ways, for example by adding a column to an existing table.

Example:

CREATE TABLE my_table (
 my_field1   INT,
 my_field2   VARCHAR (50),
 my_field3   DATE         NOT NULL,
 PRIMARY KEY (my_field1, my_field2) 
);
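The life cycle of a table under CREATE, ALTER and DROP can be followed with this sketch, again assuming SQLite via Python's sqlite3 module. The added column my_field4 is hypothetical, and the catalog lookups (PRAGMA table_info, sqlite_master) are SQLite-specific; other systems expose schema metadata differently. TRUNCATE is not shown because SQLite does not provide it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def columns(table):
    """List column names of a table (SQLite-specific PRAGMA; row[1] is the name)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def tables():
    """List the names of all tables in the database (SQLite-specific catalog)."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]


# CREATE: define the table from the example above.
conn.execute(
    """
    CREATE TABLE my_table (
        my_field1 INT,
        my_field2 VARCHAR(50),
        my_field3 DATE NOT NULL,
        PRIMARY KEY (my_field1, my_field2)
    )
    """
)
created = columns("my_table")

# ALTER: add a (hypothetical) fourth column to the existing table.
conn.execute("ALTER TABLE my_table ADD COLUMN my_field4 VARCHAR(20)")
altered = columns("my_table")

# DROP: remove the table and its data.
conn.execute("DROP TABLE my_table")
after_drop = tables()

print(created, altered, after_drop)
```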

Data control

The third group of SQL keywords is the Data Control Language (DCL). DCL handles the authorization aspects of data and permits the user to control who has access to see or manipulate data within the database. Its two main keywords are:

  • GRANT authorizes one or more users to perform an operation or a set of operations on an object.
  • REVOKE removes or restricts the capability of a user to perform an operation or a set of operations.

Example:

GRANT SELECT, UPDATE ON my_table TO some_user, another_user;

Other

  • ANSI-standard SQL supports the double dash, --, as a single-line comment identifier (some extensions also support curly brackets or C-style /* comments */ for multi-line comments).

Example:

SELECT * FROM inventory -- Retrieve everything from inventory table

References

  1. ANSI/ISO/IEC International Standard (IS). Database Language SQL, Part 2: Foundation (SQL/Foundation). 1999

Links

Overviews
  • SQL (Wikipedia)
Introductory Tutorials
  • What is SQL ? (SQL Course.com). Includes an online SQL interpreter where you can test expressions.
  • Greenspun, Philip (2003). SQL for Web Nerds, HTML
PHP
Tools
  • SchemaSpy analyzes schema metadata, letting you click through the hierarchy of tables' parent/child relationships. Works with just about any RDBMS that supports JDBC (Oracle/MySQL/DB2/SQL Server/PostgreSQL/Sybase/etc).