Shortened URLs with dotCMS

While there are many solutions out there for URL shortening, most of them lack one feature that can be really important for a large site (at least without paying a hefty fee): the ability to change where a shortened URL points after it has been created. This might not matter to a lot of folks, but it can be extremely important if you migrate a site, move pages, or otherwise use those links in a way that would leave them broken by other site changes.

In all fairness, there is a good reason for this: you don’t want people creating shortened URLs and then changing them to point at malicious pages down the road. But if you’re doing this all internally as a company (or even individually), you have some built-in trust with yourself that you won’t do something to compromise your own links. There are perfectly good free tools that can do this, like YOURLS, and there’s nothing wrong with that solution. Platforms like WordPress also have plugins that let you bolt the functionality onto your site. The nice part about rolling your own is that it eliminates an additional tool or technology stack to maintain.

Getting Started

Creating a shortener in dotCMS is actually pretty trivial thanks to URL Map Patterns and structured content. I’ll describe the simple approach here and, at the end, suggest some ways you could make it better. You’ll end up creating a new host, a structure, and a couple of pages. That’s it.

The Host

First, we’ll need a place to build the shortener. Since dotCMS supports multiple sites in one installation, simply create a new host. We’ll call it “URL Shortener” (note that you don’t have to name hosts after their domain or subdomain; the names can be human-readable). Then, in the Aliases field, list the domain name that will supply the shortened URLs.

Host Configuration

Here’s the reason for this particular approach: it allows you to use a single host to handle all your shortened URL needs. If you only need one base domain for the shortener, you can skip the fancy name and alias approach and just name the host after your shortened URL’s domain. Doing it this way lets you have multiple base domains, though, which can be useful if you’re a company that wants several branded short domains. Save and activate!

The Structure

This is where the magic happens. Because, you know, all the data is here. There are only two absolutely required fields:

  1. Shortened URL slug
  2. Destination URL

I also added a title field, just for labeling purposes, and left it optional. My completed structure looks like this:

Structure Field Setup

Tracking

In our case, I added a set of additional, optional fields that let the person creating the shortened URL specify tracking values compatible with Google Analytics (the utm_* parameters). This way, when the redirect is processed, those values are attached to the destination URL so that you know where the user came from. Much of what makes bit.ly and goo.gl so useful is the analytics they provide natively; with this approach, you can simply use Google Analytics to see how often shortened links are used.
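To make the content model concrete, here is a minimal sketch of the structure as a Python dataclass. The field names are hypothetical stand-ins for whatever variable names you give your structure fields, and only three of the utm_* parameters are shown. The one behavior worth copying is that blank tracking fields are skipped rather than sent as empty parameters.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ShortUrl:
    """Hypothetical mirror of the structure's fields (names are illustrative)."""
    slug: str                          # required: the path after the short domain
    destination: str                   # required: where the redirect sends the user
    title: Optional[str] = None        # optional label, not used by the redirect
    utm_source: Optional[str] = None   # optional Google Analytics tracking values
    utm_medium: Optional[str] = None
    utm_campaign: Optional[str] = None

    def tracking_params(self) -> Dict[str, str]:
        """Return only the utm_* values that were actually filled in."""
        candidates = {
            "utm_source": self.utm_source,
            "utm_medium": self.utm_medium,
            "utm_campaign": self.utm_campaign,
        }
        # Skip missing or whitespace-only values so the final URL stays clean.
        return {k: v.strip() for k, v in candidates.items() if v and v.strip()}


link = ShortUrl(slug="spring", destination="https://www.example.com/sale",
                utm_source="twitter", utm_campaign="spring-sale")
print(link.tracking_params())  # {'utm_source': 'twitter', 'utm_campaign': 'spring-sale'}
```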

URL Map Pattern

I feel like this deserves a special mention. This approach requires a root-level URL Map Pattern, which means every single HTTP request the system gets has to be checked against the pattern: pages, files, everything. In version 2.x, that can cause some performance issues. The same is theoretically true in 3.x, but the system should be efficient enough to handle it. In practice it means an index hit for each request, so stress test it before deploying on a live site to be sure it’s not a problem.
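A simplified model shows why a root-level pattern reaches every request. The `pattern_to_regex` helper is a hypothetical approximation of `{field}` placeholder matching, not dotCMS’s actual implementation, and it assumes a placeholder matches a single path segment. Under that assumption, any single-segment path, page or file alike, matches and has to be looked up, and most of those lookups are misses.

```python
import re
from typing import Dict, Optional


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a URL Map style pattern such as '/{slug}' (simplified model)."""
    # re.split with a capturing group alternates literal text and placeholder
    # names: "/{slug}" -> ["/", "slug", ""].
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            regex += re.escape(piece)          # literal text, matched exactly
        else:
            regex += f"(?P<{piece}>[^/]+)"     # placeholder: one path segment
    return re.compile("^" + regex + "/?$")


ROOT = pattern_to_regex("/{slug}")
INDEX: Dict[str, str] = {"spring": "https://www.example.com/sale"}  # stand-in index


def lookup(path: str) -> Optional[str]:
    """Every request is tested against the pattern; every match costs a lookup."""
    match = ROOT.match(path)
    if match is None:
        return None
    return INDEX.get(match.group("slug"))  # None here is a "pattern miss"


for path in ["/spring", "/about", "/logo.png", "/news/story"]:
    matched = ROOT.match(path) is not None
    print(f"{path}: matched={matched}, destination={lookup(path)}")
```

Here `/about` and `/logo.png` both match the pattern and then miss the index, which is exactly the kind of request that produces the log noise.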

At the very least, it will definitely cause a lot of log spam regardless of version, which means you’ll want to adjust your log4j settings to exclude the pattern misses. Doing that is relatively simple. Just add the following lines to your log4j.xml file in your config plugin:

The Code

Drop this into a VTL file, a simple widget, or whatever suits your needs functionally. It should work almost as-is for most purposes. Note that it detects whether your destination URL includes a query string and an anchor hash. The anchor is the important part, since we have to make sure it ends up at the very end of the URL.
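The essential step, appending tracking values to a destination that may already carry a query string and an anchor, can be sketched in Python as a language-neutral illustration. This is not the original VTL; `build_redirect` is a hypothetical name.

```python
from typing import Dict
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_redirect(destination: str, tracking: Dict[str, str]) -> str:
    """Append tracking parameters to destination, keeping any #anchor last."""
    parts = urlsplit(destination)
    extra = urlencode(tracking)            # e.g. "utm_source=twitter"
    if parts.query and extra:
        # Keep the original query exactly as entered and append ours after it.
        query = parts.query + "&" + extra
    else:
        query = parts.query or extra
    # urlunsplit re-attaches the fragment after the query, so "#anchor" stays last.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(build_redirect("https://www.example.com/sale?color=red#details",
                     {"utm_source": "twitter"}))
# https://www.example.com/sale?color=red&utm_source=twitter#details
```

Appending to the existing query string, rather than parsing and re-encoding it, leaves the destination’s own parameters exactly as the editor entered them.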

The 404

What you do with content 404s (people trying to use a shortened URL that doesn’t exist) is up to you. If you want, you can let them fall through to the default dotCMS 404 page. Note that, for code semantics, the Velocity code above uses an #if...#else with an error statement, even though you won’t normally reach that else branch when the map pattern doesn’t match (unless you’ve altered the URLMAP_FALLTHROUGH config value, in which case you could print the error message and also log the error with the $dotlogger.error() viewtool method).
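The #if...#else shape maps onto a handler like the following Python sketch. The dictionary stands in for the URL Map content lookup, `handle` is a hypothetical name, and the 302 status is an assumption rather than something the original code specifies; the reasoning is that browsers may cache a 301, which would undercut the point of being able to change destinations later.

```python
import logging
from typing import Dict, Optional, Tuple

log = logging.getLogger("shortener")  # hypothetical logger name


def handle(slug: str, index: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a request to /<slug>."""
    destination = index.get(slug)
    if destination is not None:
        # Assumed 302 (temporary) so browsers don't hold on to the target like a 301.
        return 302, destination
    # The "else" branch: log the miss so broken or mistyped links show up, then 404.
    log.error("No shortened URL found for slug %r", slug)
    return 404, None


links = {"spring": "https://www.example.com/sale"}  # stand-in for the content index
print(handle("spring", links))   # (302, 'https://www.example.com/sale')
print(handle("sprnig", links))   # (404, None), plus an error log line
```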

Making it Better

Front End Submission

As I’ve described it, making these URLs requires you to enter them through the backend. While that may not be an issue for some, you may want to make this available to other members of your company without sending them through the backend interface. You could easily build a form around the dotCMS RESTful content API to solve that problem.

This would also make it easy to replicate features like QR code generation, if that’s valuable to you.
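Whatever ends up posting to the REST API, the form should reject bad input first. Here is a minimal validation sketch; the slug rules and the `validate_submission` name are hypothetical, and restricting destinations to absolute http(s) URLs is one inexpensive guard against the malicious-redirect concern raised at the start.

```python
import re
from typing import List, Set
from urllib.parse import urlsplit

# Hypothetical slug rule: 1-64 letters, digits, hyphens or underscores.
SLUG_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_submission(slug: str, destination: str, taken: Set[str]) -> List[str]:
    """Return a list of problems with a submitted short URL (empty means OK)."""
    errors: List[str] = []
    if not SLUG_PATTERN.fullmatch(slug):
        errors.append("slug must be 1-64 letters, digits, '-' or '_'")
    elif slug in taken:
        errors.append("slug is already in use")
    parts = urlsplit(destination)
    # Only absolute http(s) URLs; rejects "javascript:..." and bare host/paths.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        errors.append("destination must be an absolute http(s) URL")
    return errors


print(validate_submission("spring", "https://www.example.com/sale", {"fall"}))  # []
print(validate_submission("spring sale", "javascript:alert(1)", set()))
```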

Random String Generation

In my described setup, the URL Alias field can act like a “normal” URL Title field (normal in the sense of the URL Title custom field VTL file that ships with the default dotCMS host). Titles aren’t always required or helpful for something like this, though, so you could tweak the custom field JS to generate a random string on load and/or add a button next to the field to generate the value.
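The same idea as the custom field JS can be sketched in Python. `random_slug` and `unique_slug` are hypothetical names, and the 62-character alphabet and default length of 7 are assumptions. The uniqueness check against existing slugs still has to happen wherever the content is saved, since a check made only when the field loads can race with another editor.

```python
import secrets
import string
from typing import Set

ALPHABET = string.ascii_letters + string.digits  # 62 characters (assumed)


def random_slug(length: int = 7) -> str:
    """Generate a random slug from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def unique_slug(taken: Set[str], length: int = 7, attempts: int = 10) -> str:
    """Retry a few times if the random slug is already in use."""
    for _ in range(attempts):
        candidate = random_slug(length)
        if candidate not in taken:
            return candidate
    raise RuntimeError("could not find an unused slug; try a longer length")


print(unique_slug({"spring", "sale"}))
```

With 62 symbols and 7 characters there are 62^7 (about 3.5 trillion) possible slugs, so collisions stay rare for any realistic number of links. Dropping lookalike characters such as 0/O and 1/l makes slugs easier to read aloud, at the cost of a smaller space.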

Autofill utm_source Field Value

If you include the options for Google utm_* (or equivalent) tracking, you could autofill the utm_source value either with a default in the field settings or dynamically based on the host alias it’s attached to (see Alias Limited URLs below). It could still be editable afterward, but I find it nice to have default values actually stored on the content rather than relying on a programmatic fallback in the Velocity code (for instance, it makes the content easier to search on).
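Here is a minimal sketch of the “fill only if blank” rule. It assumes a hypothetical mapping from host alias to a preferred source name; when an alias has no entry, the alias itself becomes the default. Because the value is written into the content’s fields, it is stored and searchable rather than computed at redirect time.

```python
from typing import Dict


def with_default_source(fields: Dict[str, str], alias: str,
                        source_by_alias: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the content fields with utm_source filled in if left blank."""
    result = dict(fields)
    if not (result.get("utm_source") or "").strip():
        host = alias.strip().lower()
        # Prefer an explicit per-alias source name; otherwise use the alias itself.
        result["utm_source"] = source_by_alias.get(host, host)
    return result


SOURCES = {"go.brand-a.com": "brand-a"}  # hypothetical alias -> source mapping
print(with_default_source({"slug": "spring"}, "go.brand-a.com", SOURCES))
# {'slug': 'spring', 'utm_source': 'brand-a'}
```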

Hit Counter

Assuming you aren’t using other analytics for tracking, or you want a secondary way to sanity-check shortened URL usage, you could add an integer counter field to the structure that gets incremented each time a redirect is processed. You could either write your own viewtool to do the incrementing or use the RESTful API.
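The main pitfall with a counter is concurrency: if two redirects both read 41 and both write 42, one hit is lost. This Python sketch uses an in-memory `Counter` behind a lock as a stand-in for the integer field (the `HitCounter` class is hypothetical). The lock makes each read-modify-write atomic within one process.

```python
import threading
from collections import Counter


class HitCounter:
    """In-memory stand-in for the structure's integer hit-count field."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._lock = threading.Lock()

    def record(self, slug: str) -> int:
        # Hold the lock across read-modify-write so concurrent redirects
        # can't overwrite each other's increments.
        with self._lock:
            self._counts[slug] += 1
            return self._counts[slug]

    def count(self, slug: str) -> int:
        with self._lock:
            return self._counts[slug]  # Counter returns 0 for unseen slugs


counter = HitCounter()


def burst() -> None:
    for _ in range(1000):
        counter.record("spring")


threads = [threading.Thread(target=burst) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.count("spring"))  # 8000
```

A lock in a single process won’t help if several app servers update the same field, so in a clustered install you would want the increment done in one place, or hits batched and written periodically.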

Host-Unique URLs

If you follow my instructions, you get one host that handles all of your shortened URLs. This may or may not be desirable. If you want truly unique shortened URLs per host, you don’t have much choice but to make a structure for each host, and even then you could still have conflicts, since multiple structures with root-level URL Map Patterns compete for the same paths.
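If you do end up with several root-level structures, a small audit script can flag slugs claimed by more than one of them before they cause confusing redirects. The structure names and the input shape below are hypothetical; in practice you would pull the slug lists from your content.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_collisions(slugs_by_structure: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each slug claimed by more than one structure to the structures claiming it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for structure, slugs in slugs_by_structure.items():
        for slug in set(slugs):          # ignore duplicates within one structure
            owners[slug].append(structure)
    return {slug: sorted(names) for slug, names in owners.items() if len(names) > 1}


# Hypothetical structures, one per short domain, each with a root-level URL Map.
print(find_collisions({
    "ShortUrlBrandA": ["sale", "jobs"],
    "ShortUrlBrandB": ["sale", "news"],
}))  # {'sale': ['ShortUrlBrandA', 'ShortUrlBrandB']}
```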

Alias Limited URLs

One half-solution to the domain-uniqueness problem is to include a selector field that requires you to pick the domain the shortened URL should be attached to (I’d use a custom field based on the $host.aliases values). This won’t remove the requirement that every path be unique, but it will let you throw a specific 404 if you detect a mismatch between $request.getHostName() and the alias field value on the $URLMapContent.
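The mismatch check might look like the following Python sketch. `alias_matches` and `resolve_for_host` are hypothetical names, the index maps each slug to an (alias, destination) pair standing in for the selector and destination fields, and the 302 status is an assumption. Host names are compared case-insensitively, since DNS names are not case-sensitive.

```python
from typing import Dict, Optional, Tuple


def alias_matches(request_host: str, content_alias: str) -> bool:
    """Compare the request's host name with the alias chosen on the content."""
    def normalize(host: str) -> str:
        # DNS names are case-insensitive; a trailing dot denotes the same name.
        return host.strip().lower().rstrip(".")
    return normalize(request_host) == normalize(content_alias)


def resolve_for_host(slug: str, request_host: str,
                     index: Dict[str, Tuple[str, str]]) -> Tuple[int, Optional[str]]:
    """Return (status, location); 404 if the slug is unknown or bound to another alias."""
    entry = index.get(slug)
    if entry is None:
        return 404, None
    alias, destination = entry
    if not alias_matches(request_host, alias):
        return 404, None  # right path, wrong short domain
    return 302, destination


links = {"sale": ("go.brand-a.com", "https://a.example.com/sale")}  # hypothetical
print(resolve_for_host("sale", "GO.brand-a.com", links))  # (302, 'https://a.example.com/sale')
print(resolve_for_host("sale", "go.brand-b.com", links))  # (404, None)
```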