Thursday, August 12, 2010

Eliminating index.cfm From Mura URLs With Apache and mod_rewrite

This may seem like it's a topic that's been beaten to death since there are several examples of how to do this, but none of the examples I found quite fit my exact situation. And since URL rewrite rules are all about getting the exact right characters in the exact right order (pesky computers), I thought I'd share my situation in the hopes of saving someone else some time.

First, in case some of you aren't using Mura but are reading out of general interest in URL rewriting (get a life! ;-), let's take a look at Mura URLs. Mura URLs are in the format http://server/siteid, where 'siteid' is an actual physical directory on disk. Since Mura is built on CFML it uses a directory index file of index.cfm, so when you hit http://server/siteid it's the equivalent of hitting http://server/siteid/index.cfm. The presence of a .cfm file is what triggers Apache to hand off the processing of that file to the CFML engine. This is rudimentary stuff I realize, but that index.cfm bit is important in this case since that's what we're going to be eliminating from the URL.

Pages other than the site index page in a Mura site have URLs along the lines of http://server/siteid/index.cfm/page-name, and the 'page-name' bit is not a directory or file on disk since all the Mura content is contained in the database. So if my site ID is 'foo' and my page name is 'bar', the URL for the 'bar' page would be http://server/foo/index.cfm/bar What I'd like my URL to be for that page, however, is http://server/foo/bar, so I want to eliminate index.cfm from the URL.

The solution for this is to use URL rewriting in your web server, which in my case is Apache and mod_rewrite. Using URL rewriting I can have the web server match certain URL patterns and translate these patterns into something different behind the scenes so things still work properly. So in this case I need a rewrite rule that will take the URL http://server/foo/bar and treat that as if it were http://server/foo/index.cfm/bar. Otherwise, Apache would be looking for a directory 'bar' under the 'foo' directory, which of course doesn't exist, so it would throw a 404.

Many of the URL rewrite examples I came across were designed for Mura sites that are also eliminating the site ID from the URL. This is all well and good, but since I'm working on an intranet site we want to keep the site IDs in the URLs (e.g. http://server/department), so that meant most of the examples I saw weren't quite right for my needs.

The tricky part (at least for me as a relative mod_rewrite amateur) is that instead of being able to say "take my server name, add 'index.cfm' to that, and then tack on the rest of the URL being requested," I had to take the first part of the URL being requested (the site ID, which again is a physical directory), then insert index.cfm, then tack on the rest of the URL being requested (i.e. everything after the site ID in the URL).

This is actually pretty easy if you want to create a rewrite rule for each site ID on the server, but that would mean additional Apache configuration and an Apache restart every time a new site is added to Mura. That seemed like a maintenance nightmare to me so I decided to try to come up with a solution that would handle eliminating index.cfm regardless of the site ID.

What I ended up doing was cobbling together several of the other great resources linked above for many of the other considerations with rewriting for Mura, and then came up with a rewrite rule that will leave the site ID in place, then add index.cfm, and then tack on the rest of the URL being requested.
Quite a lot of preamble for a one-line rewrite rule, but here it is in all its gory glory:

RewriteRule ^/([a-zA-Z0-9-]{1,})/(.*)$ ajp://%{HTTP_HOST}:8009/$1/index.cfm/$2 [NE,P]


  • Create a backreference match for alphanumeric characters and hyphens in the URL up to the first / (these are the allowed characters in Mura site IDs)
  • Create a second backreference for everything else in the URL
  • Proxy this to port 8009 on the server (since I'm using AJP proxying with Tomcat), but make the URL backreference 1 + /index.cfm/ + backreference 2
The backreference stuff wound up being my savior here. Basically whatever you put in parens in the regex of your rewrite rule becomes a backreference, so whatever matches that pattern gets assigned to a placeholder. That way when you want to do the rewriting you can refer to pieces of your URL using $ and the position. So in this case $1 refers to the first backreference (the Mura site ID), and $2 refers to the second backreference (everything in the URL following the site ID).

There are a few other rewrite rules involved to get this all working when you take into account things like image files, etc., and as I said above I'm standing on the shoulders of giants (thanks guys!), so here's the full set of rewrite rules:

  ProxyRequests Off

  <Proxy *>
    Order deny,allow
    Allow from

  ProxyPreserveHost On
  ProxyPassReverse / ajp://your-host-here:8009/

  RewriteEngine On

  # if it's a cfml request, proxy to tomcat
  RewriteRule ^([cm])(/.*)?$ ajp://%{HTTP_HOST}:8009$1$2 [P]
  # if it's a trailing slash and a real directory, append index.cfm and proxy
  RewriteRule ^(.+/)$ ajp://%{HTTP_HOST}:8009%{REQUEST_URI}index.cfm [P]

  # if it's a real file and haven't proxied, just serve it
  RewriteRule . - [L]

  # require trailing slash at this point if otherwise valid CMS url
  RewriteRule ^([a-zA-Z0-9/-]+[^/])$ $1/ [R=301,L]

  # valid cms url path is proxied to tomcat
  RewriteRule ^/([a-zA-Z0-9-]{1,})/(.*)$ ajp://%{HTTP_HOST}:8009/$1/index.cfm/$2 [NE,P]

Note that you can put the rewrite rules in a .htaccess file or in the virtual host configuration file. I prefer having the rewrite rules be defined as part of the virtual host, so in my case I have all this in the <VirtualHost> block in my virtual hosts config file. (I have a bunch of other stuff in the virtual host configuration as well but I left everything other than the parts relevant to the URL rewriting out to avoid any confusion.)

That's it! I hope that helps others looking into doing this with Mura, or that you at least learned a few things about mod_rewrite along the way.

1 comment:

samoir said...

thanks for the writeup matt, about to launch into this now (Search: "corz mod_rewrite" ) a gd reference you might like