Canonical URLs is one of the features that I worked on for WordPress 2.3. It’s sort of a geeky concept, but the end result has benefits that a non-geek can appreciate, so I’m going to break it down for you.
WordPress has traditionally been very lenient in the URLs that it will accept.
For instance, say your blog is hosted on http://www.example.com/blog/.
You can likely access the front page of your blog via these alternative URLs:
- http://example.com/blog/
- http://www.example.com/blog/index.php/
- http://example.com/blog/index.php/
- http://www.example.com/blog/?paged=1
- http://example.com/blog/?paged=1
- http://www.example.com/blog/page/1/
- http://example.com/blog/page/1/
And those are just the “sane” ones. Try this one on for size:
That’s the front page. We have additional issues for other views. For example, consider if you are using “fancy” permalinks and have a post up at http://www.example.com/blog/2007/09/17/dont-tase-me-bro/ with a post ID of 17. The following alternative URLs will work:
- http://www.example.com/blog/2007/09/17/dont-tase-me-bro
- http://example.com/blog/2007/09/17/dont-tase-me-bro/
- http://example.com/blog/2007/09/17/dont-tase-me-bro
- http://www.example.com/blog/index.php/2007/09/17/dont-tase-me-bro/
- http://www.example.com/blog/index.php/2007/09/17/dont-tase-me-bro
- http://example.com/blog/index.php/2007/09/17/dont-tase-me-bro/
- http://example.com/blog/index.php/2007/09/17/dont-tase-me-bro
- http://www.example.com/blog/?p=17
- http://example.com/blog/?p=17
- http://www.example.com/blog/index.php?p=17
- http://example.com/blog/index.php?p=17
The following issues comprise the majority of incorrect alternative WordPress URLs.
- Old URL structure when using “fancy” permalinks
- www.example.com vs. example.com
- “Fancy” permalinks with /index.php/ (called “PATH_INFO permalinks”) vs “fancy” permalinks without (“mod_rewrite permalinks”)
- URLs with trailing slashes vs. URLs without trailing slashes
- /page/1/ (always redundant)
- ?paged=4 vs. /page/4/
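To make that list concrete, here is a minimal Python sketch of how most of those variations could be collapsed into one URL. It is an illustration only, not the code that ships in WordPress (which is PHP). The function name canonicalize and the CANONICAL_HOST constant are my own inventions for this example, and I am assuming the blog prefers the www host and trailing slashes. Old-style ?p=17 URLs are left alone, because mapping a post ID to its permalink needs a database lookup.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption for this sketch: the blog's preferred host includes "www.".
CANONICAL_HOST = "www.example.com"


def canonicalize(url):
    """Collapse common alternative URLs into one form (hypothetical helper)."""
    scheme, host, path, query, _fragment = urlsplit(url)

    # www.example.com vs. example.com: always use the preferred host.
    bare = CANONICAL_HOST[4:] if CANONICAL_HOST.startswith("www.") else CANONICAL_HOST
    if host.lower() in (bare, "www." + bare):
        host = CANONICAL_HOST

    # PATH_INFO permalinks: drop the /index.php segment.
    path = re.sub(r"/index\.php(?=/|$)", "", path)

    # ?paged=4 vs. /page/4/: move the page number into the path.
    params = parse_qsl(query, keep_blank_values=True)
    paged = dict(params).get("paged", "")
    if paged.isdigit():
        params = [(k, v) for k, v in params if k != "paged"]
        path = path.rstrip("/") + "/page/" + paged + "/"

    # /page/1/ is always redundant.
    path = re.sub(r"/page/0*1/?$", "/", path)

    # Trailing slash, unless the last segment looks like a file name.
    if not path.endswith("/") and "." not in path.rsplit("/", 1)[-1]:
        path += "/"

    return urlunsplit((scheme, host, path, urlencode(params), ""))
```

With those assumptions, canonicalize("http://example.com/blog/index.php/?paged=1") returns http://www.example.com/blog/, and canonicalize("http://example.com/blog/?paged=4") returns http://www.example.com/blog/page/4/.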
So, what’s the problem with this? The URLs are all showing the exact same content, so why should it matter? Well, search engines can’t assume that all of these alternative URLs represent the same resource. So they don’t automatically get condensed into a single resource. As a result, you can actually end up competing against yourself in search engine rankings. So to avoid confusing search engines and to consolidate your rankings for your content, there should only be one URL for a resource. We call this URL the canonical URL. Canonical means “standard” or “authoritative.” It’s the one that WordPress generates, and it’s the one that you want everyone to use.
Since version 2.2, WordPress-generated rules have been very well standardized. I personally invested a lot of time making sure things like trailing slashes were consistently standardized. So that’s one piece of the puzzle: making sure that WordPress isn’t working against you by generating non-canonical URLs. But of course, you can’t control who links to you, and third parties can make errors when typing or copy-pasting your URLs. This canonical URL degeneration has a way of propagating. That is, Site A links to your site using a non-canonical URL. Then Site B sees Site A’s link, and copy-pastes it into their blog. If you allow a non-canonical URL to stay in the address bar, people will use it. That usage is detrimental to your search engine ranking, and damages the web of links that makes up the Web. By redirecting to the canonical URL, we can help stop these errors from propagating, and at least generate 301 redirects for the errors that people may make when linking to your blog.
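The redirect decision itself is simple, and a rough Python sketch may help. This is not WordPress’s actual code; the function name redirect_if_needed is hypothetical, and I am assuming the canonical URL for the request has already been worked out somewhere else.

```python
from http import HTTPStatus


def redirect_if_needed(requested_url, canonical_url):
    """Return (status, headers) for a permanent redirect, or None.

    Hypothetical helper: canonical_url is assumed to be computed elsewhere.
    """
    if requested_url == canonical_url:
        # Already canonical: serve the page normally.
        return None
    # 301 tells browsers and search engines that the move is permanent.
    return (HTTPStatus.MOVED_PERMANENTLY.value, {"Location": canonical_url})
```

The 301 status is the important part: it marks the redirect as permanent, which is the signal that lets a search engine treat the two URLs as one resource.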
My goal for WordPress 2.3 was to cover the majority of canonical URL issues that people have and make WordPress automatically redirect those requests to the correct (canonical) URL for that resource. Early tries at this functionality had issues with being too aggressive. I rewrote the functionality multiple times, until I settled upon the current incarnation. I’m quite happy with it.
Ideally, you shouldn’t even be aware of the feature. You might have issues, however, if you have enabled your own form of canonical URL redirection that isn’t redirecting to the URLs that WordPress thinks are the canonical version. For instance, if your blog is http://www.example.com/blog/ but you have a line in your .htaccess that redirects people to http://example.com/blog/, you’re not going to be able to access your site, as the two redirects will “fight” each other in an infinite loop until the browser gives up. You’ll also have issues if your server is generating a non-standard $_SERVER['REQUEST_URI'] value. For this reason, the feature has been disabled for IIS. WordPress can set a correct $_SERVER['REQUEST_URI'] for some IIS incarnations, but fails on others. This is an issue that I hope we’re able to fix in the future. That said, the vast majority of WordPress blogs are not running on IIS, so you’ll likely be fine.
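If the “fighting” redirects are hard to picture, this small Python simulation shows the loop. Everything in it is made up for illustration: htaccess_rule stands in for a hypothetical .htaccess line that forces the bare host, wordpress_rule stands in for WordPress preferring the www host, and the limit of 20 hops is an arbitrary stand-in for whatever limit a real browser applies.

```python
def htaccess_rule(url):
    # Hypothetical .htaccess redirect: force the host without "www.".
    return url.replace("://www.example.com", "://example.com", 1)


def wordpress_rule(url):
    # Stand-in for WordPress, which considers the "www." host canonical.
    return url.replace("://example.com", "://www.example.com", 1)


def follow(url, rules, max_hops=20):
    """Follow redirects like a browser; return (final_url, hops taken)."""
    for hop in range(max_hops):
        for rule in rules:
            target = rule(url)
            if target != url:
                url = target  # this rule issued a redirect
                break
        else:
            return url, hop  # no rule fired, so the page is served
    raise RuntimeError("too many redirects")
```

With only wordpress_rule active, follow("http://example.com/blog/", (wordpress_rule,)) settles on http://www.example.com/blog/ after one hop. With both rules active, each one undoes the other, and follow raises RuntimeError once it runs out of hops.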
If you’re having issues with infinite redirects, please open a ticket. And in the meantime, you can use this one-line plugin to disable the feature.