Part of what has unbottled community publishing is the use of databases to store and deliver content. It becomes clear to anyone working for an extended period on a website that there’s a lot of repetitive work. Unless you have some way to manage your content, a large site quickly becomes unmanageable. Databases are key to managing that growth.
Databases are one of the key publishing tools for your site. Here, we look at databases as isolated components – used for specific applications – on your site. We will cover the key terms and concepts you should understand.
What’s a Database?
A database, at its most basic, is a structured collection of information, and sometimes also a method for manipulating that data.
A spreadsheet can be a database (if a list of data is stored in it). A phone book is a database, though it’s not nearly as useful if it’s not in digital form. Apple’s “Address Book” is primarily a database, though a very specialized one. Apple’s iTunes music player also uses a database to keep track of a user’s music library.
Confusion arises because in common usage, the data and the program that lets you access it are both frequently called databases. Strictly speaking, this isn’t correct. A piece of software that lets you change or add to a database is a database application. For instance, Microsoft Excel is not a spreadsheet, it’s a spreadsheet application. Microsoft Word is not a document, it’s a word-processing application. And so Microsoft Access, one of the most commonly used database applications, isn’t really a database. But you should be prepared to hear people call MySQL or MS Access “databases.” Now you’ll know what they really mean.
One possible reason this concept is a little blurry is because of Microsoft Access, which often stores multiple bits of information (such as forms, scripts, images and queries) in the same file as the basic database information with a .db extension.
A database server, it should be noted, is a database application that also provides a way to share that data with other remote machines. This is different from a web server, which provides web pages to other remote machines, because a database server provides a direct, underlying connection to a database. A database file by itself isn’t enough. You’ll need a database server to interpret, order and share the data with web pages.
You may hear people talk about relational databases. This really means a database that has more than one set of data, with ties between that data. Most databases in use on the web are relational, and the software applications we cover here all build and use relational databases.
Do You Need a Database?
The short answer is an unequivocal yes. No modern site can be effectively run without an underlying database keeping track of revisions, creating multiple versions from the same content, and updating quickly and correctly across all pages.
A better set of questions might be: How many databases do you need? Should a particular set of data be made into an interactive database and how? At what point does the extra development of a database pay you back in time saved when adding, revising or tracking content on your site?
These are difficult questions to answer. If you have a set of data, for example, a list of high school sports teams, we recommend creating a database if it meets one or more of the following criteria:
- It’s a large volume of data. It’s simply easier to format a large amount of data by outputting from a database instead of coding a lot of HTML by hand.
- It’s going to be updated regularly. It’s usually simpler and less prone to error to update a database instead of tweaking HTML code.
- It’s going to be updated by multiple people. It’s easier for more than one person to edit information if it’s stored in a single central location rather than scattered through dozens of individual HTML files.
- It needs to be searchable. It’s more difficult to set up searches for databases, but the results are more flexible and precise. With a regular HTML page, you can only search the text of the page. With database data, you can search for specific portions of the page. For example, on a database-powered news site, you could search for every story whose headline contains the word ‘Belgium,’ or every story with a byline of ‘Bob Woodward’.
- It will be re-used in multiple ways. If you’re going to syndicate, repurpose, deliver by e-mail, by SMS, by RSS, or allow people to comment on or modify the information, you need to serve it out of a database so that you don’t end up doing work twice.
Key Parts of a Database
A database is made up of one or more tables. A table is a set of records. A record is made up of fields. Let’s start with fields.
Fields
Fields contain a unit of data, such as a name, an amount, a date, or a true/false value. (For instance, “Do you want our e-mail newsletter?”)
When you set up your database, you decide what kind of data each field will hold. Switching that later can be difficult, because you will likely have to modify your code in several places. Using the wrong data type also makes your database larger and slower than it needs to be. On the positive side, using the right field type makes it easier to sort, filter or use the data in that field.
Because of this, a lot of the development time in database work is done up front, creating the proper fields and tables.
Fields are sometimes referred to as columns because when you’re looking at a number of records, the data line up in columns corresponding to fields.
Records
A record of data is a collection of fields that are worked on at the same time. Computer scientists call them a tuple, but you don’t have to be that fancy. They’re usually visualized and commonly referred to as a row. When you work with data from a database you work on it one row at a time.
Each record in a database must have a unique way of identifying it. This is called a key. A key is a set of one or more fields that together will let the program isolate and identify a single row. Name, place of birth and birth date, for example, could be a key in a database of people.
Rows are more often identified through a single unique value, usually a number, known as a primary key. A customer ID number or an article ID can be the unique value. Sometimes you’ll choose as a primary key a value that you think will be unique — such as an e-mail address or a license plate number. But if it turns out that two people have the same one, then you’ll have a problem.
So, if two Henry Johnsons were born in Detroit on the same day, you would not be able to enter them both in the database without later causing problems.
There are some rare cases when you don’t need a key in a database table. You should think of hiring someone who knows about those cases.
Tables
A single database table is simply a collection of records.
What gets complicated is that, in a relational database, you have multiple tables so that data are stored efficiently. The ties between those tables can get complicated quickly.
Keeping tables as small and efficient as possible is very important. For example, if you track local events, you might end up with a table listing 150,000 events. If you had a phone number and address associated with each event, that would be 150,000 instances of perhaps 600 characters. That is a total of about 90 megs of disk space and would slow down all the sorting and searching of your database. If you were able to store phone numbers separately from events to avoid duplicating them over and over, your sorting and searching could run much more quickly.
SQL
SQL is the generic language used to access and modify databases. It stands for “Simple Query Language,” and most modern database applications understand requests made in SQL format.
Why is this important? Because it means that, if necessary, you can switch the underlying database application and not have to completely recode your website. The same commands sent to the old database application will be understood by the new application.
The catch is that even though SQL is a standard, every database application understands a slightly different set of SQL commands, and most also implement extra commands that control the special features of that particular application. In practice, changing databases is a little like putting a new motor in your car: It can be done, and will give you quite a speed boost, but it costs a lot and should be done by a professional.