Adding ‘Document’ Content Types

The core feature of the ‘Ark of the Government’ website is to store and catalog US Federal Government documents (and perhaps later, other governments).

This is already accomplished on a number of other public and private sites, obviously, but I need to learn Drupal and I can’t think of anything better. 🙂

In WordPress, the fundamental database entry is known as a 'post' and stored in the wp_posts table. In Drupal, the equivalent basic database entry is known as a 'node' and stored in the 'node' table.

We’ll need a special Drupal content type to store our Documents in a one-to-one relationship: every document gets exactly one Drupal node.

Defining a ‘Government Document’

As I began this project I thought it would be as simple as ‘create a repository of federal documents.’ I didn’t realize the full scope of that idea, or how oversimplified it was. The distinctions among laws, acts, statutes and more seem quite broad, and it may be difficult to sort them into neat little boxes.

Some simple questions help define the problem:

  1. “How many total documents are there?”
  2. “Who writes these documents?”
  3. “Where are these documents created?”
  4. “How can I get access to these documents?”

There is too much information to get started; I need to narrow the focus.

So to start, I’ll just be looking at enacted legislation by the US Congress.

A Brief Civics Refresher

As of this writing in the year 2017, it’s been 228 years since the first congressional session in 1789. Since every term is two years (as defined by the length of a House of Representatives term), 114 terms have been completed, and in January of 2017 we started on our 115th Congressional Term.
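The term arithmetic is easy to sanity-check. Here is a minimal shell sketch, assuming a new term starts every two years from 1789. The helper name congress_number is my own invention, and it ignores the detail that a term begins a few days into January, so the first days of an odd year technically belong to the previous term.

```shell
# Hypothetical helper: map a calendar year to its congressional term number.
# Assumes the 1st term began in 1789 and every term lasts two years.
congress_number() {
  echo $(( ($1 - 1789) / 2 + 1 ))
}

congress_number 2017   # prints 115
congress_number 1990   # prints 101
```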

A piece of legislation, a ‘bill,’ can be introduced in the Senate or the House of Representatives. The bill will be scrutinized, edited, and ultimately put to a vote in both bodies. If it passes both houses of Congress with a simple majority, it becomes ‘enrolled’ and goes to the President for approval. If the President approves and signs, it becomes a law. If the President vetoes, the legislation can still become a law if the House and Senate each override the veto with a two-thirds majority vote.

Over the course of the last 114+ congressional terms, some 20,000 pieces of legislation have become law in this manner. As mentioned above, to simplify this exercise for now, we’ll just be using these enacted laws.

Of course, Schoolhouse Rock! famously simplified this process with their three-minute ‘How a Bill Becomes a Law’.

An Example Document: The ADA

Congress.gov already hosts an excellent legislative document repository with simple field filtering. It lists 11,955 pieces of legislation that have become law.

Here is a famous government document we can examine: The Americans with Disabilities Act of 1990. Here is the filtered search to find it on Congress.gov.

The ADA started in the Senate in 1989, introduced and sponsored by Senator Tom Harkin of Iowa as S-933 from the 101st Congress.

When the bill became law it was formally enacted as Public Law 101-336 in 1990. It’s the Office of the Federal Register that assigns the Public Law number.

Here is the PDF as listed on the Government Publishing Office website. Below is an image of the document’s first of 52 pages.

So here we can easily identify the different ‘pieces’ of the document. It has both ‘short’ and ‘long’ titles, a table of contents, a Public Law number, a few dates and other pieces of codification.

Most of these ‘pieces’ will be commonly shared across all Public Law documents. It’s this common information we’ll set up as Drupal ‘nodes’ and ‘fields’ in the next sections.

‘Document’ Content Type Schema

Drupal ‘Node’ Basics

A Drupal ‘node’ is a basic piece of content stored in the node table of the database. It has a unique table ID, or node.nid, along with standard content information like title, created and edited dates. There are additional pieces of information stored in the node table, like the language, status and number of comments on the entry.

Each node also has a ‘content type’ to group similar types of content. The default Drupal content types are article (for dynamic blog-type content) and page (for more permanent static content).

Every node’s default URL has the form /node/[nid], for example /node/1. This can be overridden with a node alias, for example, using the title of the node as the URL /the-node-title. Like most modern CMSs, Drupal’s URL aliases can get very complex; more on that below.

Drupal also has ‘Fields’: arbitrary buckets of information that can be attached to nodes. For example, the ‘body’ text of the node and the ‘attached image’ are both stored as fields.

Lastly, Drupal groups nodes together with ‘Taxonomies’, which classify content using tags or categories.

Document Schema

Our ‘Document’ content type will start pretty generically at first, similar to an ‘Article’ content type. It will have these fields:

  • Title
    • This is a required field that all Documents must have
    • For now, we will also use a version of this field as the URL alias
    • For human readability, this will be the ‘short title’ of the document.
    • Using the ADA as an example, the ‘Title’ of our Document will be ‘Americans with Disabilities Act of 1990’
    • That raises the question, though: “Do all public laws have short titles?”
  • Body
    • This will potentially store all the text of the Public Law, though for now it will remain blank
  • Document Publication Date
    • This is the enactment date
    • For the ADA, this date is ‘7/26/1990’, or 648950400 as an epoch timestamp (midnight UTC).
    • The template should not show the Drupal created or changed timestamp.
  • Public Law Number
    • A simple text field
    • For the ADA, this is ‘101-336’
  • A ‘Featured Image’
    • A snapshot of the first page of the document
    • For the ADA, this would be the image used in the above example
  • No comments
  • No revisions

We’ll keep this content type simple for now. Ultimately though, the ‘Document’ content type will be extended with these possible taxonomies, or connections to other content types:

  • Country
  • Authoring Body
  • Sponsor
  • Official Links to Government Websites
  • Abstract ‘Declaration’ or ‘Rule’ the document attempts to implement

Creating Content Types from the Web Admin Interface

In Drupal, Content Types are stored as a configuration in the database, as opposed to a configuration in code. I’m accustomed to them being stored as code in WordPress, so this process may be a bit different for me at first.

In my opinion, site building configurations like this should be stored as code. However, I’m trying to learn as much as possible about Drupal, so I’ll try every way I can.

I ultimately want to know how Content Types are stored in the database, so first we’ll set up a bit of a backup configuration to reset when necessary. This way we can experiment a bit.

Backing up the Database with Drush

I may want to regularly back up the site during deployments. If the CI_RELEASE variable is set, that means it’s in the middle of a deployment, and I’ll want to use that date as the file name. Otherwise I can use the current date, formatted with date +%Y%m%d%H%M%S.

I’d like to update it to use the DRUPAL_ROOT instead of .., but this will do for now.
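For illustration, here is a minimal sketch of what a backup helper along these lines might look like. It is my reconstruction under assumptions, not the exact script: the function names and the ../backups directory are hypothetical, and it assumes you run it from the Drupal root so the relative path means the same thing to the shell and to Drush. The only Drush call is sql-dump with its --result-file option.

```shell
#!/bin/sh
# Sketch only: function names and the ../backups location are hypothetical.

# Use CI_RELEASE during a deployment, otherwise the current date and time.
# An empty CI_RELEASE is treated the same as an unset one.
backup_name() {
  echo "${CI_RELEASE:-$(date +%Y%m%d%H%M%S)}.sql"
}

# Dump the Drupal database into the backup directory with Drush.
# Assumes the current directory is the Drupal root.
backup_db() {
  dir="${1:-../backups}"
  mkdir -p "$dir" && drush sql-dump --result-file="$dir/$(backup_name)"
}
```

Calling backup_db on its own writes a date-stamped dump, while something like CI_RELEASE=20170301120000 backup_db names the file after the release.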

Quick restore the Database with Drush?

I might want a similar drush-restore.sh script in the future; for now I’m going to skip it and restore manually.

Creating the Content Type

Adding the Content Type is super fast. Go to /admin/structure/types/add to find the web GUI.

There we’ll configure these settings:

  • Name: Document
  • Description: A government document
  • Preview before submitting: Disabled
  • Explanation or submission guidelines: A Document must have a title
  • Display Settings: Uncheck “Display author and date information.”
  • Comment Settings: Select “Closed” for “Default comment setting for new content”

Click “Save Content Type.”

It’s crazy easy to add a content type in the admin interface.

Add Drupal Fields to ‘Document’ Content Types

According to the basic specification we created above, we need to add ‘Public Law Number’, ‘Publication Date’ and ‘Document Image’ fields.

Here’s how to add the fields. Go to /admin/structure/types/manage/document/fields or click the “Manage Fields” tab in the edit content type admin screen.

‘Public Law Number’ is the simplest field to add. It’s a simple text field that doesn’t need to store a lot of information, so let’s limit it to 10 characters. The example Public Law Number was 101-336.

Step through the GUI wizard to add a new text field.
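As a quick check that the 10-character limit is roomy enough, here is a small sketch that validates a value before it is entered. The digits-hyphen-digits pattern is my assumption, based only on the ADA example, and is_public_law_number is a hypothetical helper rather than anything Drupal provides.

```shell
# Hypothetical helper: succeed if the value looks like "101-336"
# (digits, one hyphen, digits) and fits in the 10-character field.
is_public_law_number() {
  [ "${#1}" -le 10 ] || return 1
  printf '%s\n' "$1" | grep -Eq '^[0-9]+-[0-9]+$'
}

if is_public_law_number "101-336"; then
  echo "101-336 fits the field"
fi
```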

Installing the ‘Date’ Module

By default, you cannot add ‘dates’ as a field content type. We more than likely want to store the ‘Publication Date’ as a Unix timestamp, though it might have some long-term boundary issues (a timestamp can only go so far back in time).

A timestamp will work for now though. To save fields with timestamps, we have to add and enable the ‘Date’ module.
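To make the boundary concern concrete, here is a rough sketch. It assumes the timestamp ends up in a signed 32-bit integer column, which I have not confirmed for this field; under that assumption the earliest storable moment is in December 1901. The helper name fits_int32 is hypothetical, and the second value is midnight UTC on 1789-03-04, the start of the first congressional term.

```shell
# Assumption: the field stores seconds since 1970 as a signed 32-bit integer.
INT32_MIN=-2147483648
INT32_MAX=2147483647

# Hypothetical helper: report whether an epoch value fits in that range.
fits_int32() {
  if [ "$1" -ge "$INT32_MIN" ] && [ "$1" -le "$INT32_MAX" ]; then
    echo "fits"
  else
    echo "out of range"
  fi
}

fits_int32 648950400      # ADA enactment, 1990-07-26: prints "fits"
fits_int32 -5706374400    # 1789-03-04: prints "out of range"
```

So the ADA is fine, but documents from the earliest terms would need a different storage type if that assumption holds.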

I added drush pm-enable date -y to the releases.sh file to install and enable the Date module. This way it will also be installed and activated later on the staging server, which was set up in the sub-theme post.

Now I can create the ‘Publication Date’ field.

I chose to keep the select drop downs. Your mileage may vary.

Lastly, add the ‘Document Image’ field as an ‘image’ type. I left all the meta fields as default for now.

Adding the first Document

Now that we have our ‘Document’ Content Type configured with fields we can add the ADA.

Navigate to /node/add/document/ and enter all the information as it appears below.

After saving, it will then appear like this on the front end. I later updated the URL alias field to americans-with-disabilities-act-of-1990.
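The alias is just the title lowercased with hyphens between the words. Here is a minimal sketch of that transformation, with title_to_alias as a hypothetical helper name. It only illustrates the rule I applied by hand and is not how Drupal generates aliases.

```shell
# Hypothetical helper: lowercase a title and join its words with hyphens.
title_to_alias() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

title_to_alias 'Americans with Disabilities Act of 1990'
# prints americans-with-disabilities-act-of-1990
```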

A Quick Look at the Database

Now that we have our ‘Document’ Content Type built and our first Node added, let’s take a quick look at the database.

  • The `node_type` table is updated with a ‘document’ record.
  • The `node` table has a new record with nid 3.
  • The `url_alias` table stores the default ‘source’ URL along with the ‘alias’ that overrides it.
  • The `field_config` table stores the additional ‘Document’ fields we created.
  • The `field_config_instance` table stores the relationship between `field_id` and content types.

Next Steps

This post gave a quick overview of defining and creating a new Drupal content type, along with showing how to create new nodes.

However, the content type configuration is still local to the development database and our configuration could use a bit of polish. The next steps will expand on this foundation and create a more usable admin.

Creating Content Types from Drush

This unfortunately does not seem possible. I would love to see something similar to Rails generators or wp-cli scaffold.

There might be workarounds for this, like maybe using raw PHP scripts in the releases.php deployment file.

Creating Content Types from Custom Modules

As mentioned, content types should really be stored in code. This way they are tied to the repository and can be updated programmatically.

Adding Advanced Fields and Relationships

We still need to add automatic URL aliasing based on the title, plus more fields, including external links and taxonomies.

Importing Bulk Documents with a Migration Script

Since the majority of this data is available publicly online, we should be able to rapidly import thousands of data points to quickly create a robust application.

Resources

  1. Drupal Docs: Understanding Drupal Content Types
  2. Drupal Docs: Working with content types and fields
  3. Drupal Docs: Backup Database with Drush
  4. Drupal Docs: About Nodes
  5. Drupal Project: Date
  6. USA: Laws and Regulations
  7. Wikipedia: ADA
  8. Wikipedia: United States Code
  9. Wikipedia: List of United States federal legislation
  10. Quora: What is the difference between law, act and statute?
  11. Senate.gov: Laws, Acts and Statutes
  12. House.gov: Legislative Process
