Tropical Web Works


The Infamous Canonical URL Issue

January 18, 2007 by Sonja Ray

Difficult as it may be to believe, as of January 2007 Google is still unable to recognize when URLs that obviously lead to the same page are in fact the same page. So what’s a URL, and what’s the problem here?

URL (pronounced you-are-ell, or sometimes “earl” as in Duke of) stands for Uniform Resource Locator. It’s the technical name for the address of a particular web page. For example, the URL of this site’s home page is https://www.tropicalwebworks.com, and the URL of this page is https://www.tropicalwebworks.com/2007/01/18/infamous-canonical-url/.

It’s common for a web page to be reachable at multiple URLs. If this site were not configured optimally, the home page might be reachable at both https://www.tropicalwebworks.com and http://tropicalwebworks.com (notice the missing “www.”). Normal people would logically think that this is desirable: after all, you don’t want people to get a “server not found” error if they try to get to your site without including the www part.

But Google sees these as two completely separate URLs that just happen to contain exactly the same content. There are two problems with such a situation:

  1. First, the “strength” of that page, and its ability to turn up in the search engine results, is diluted. Some of the page’s strength is allotted to one version, and some to the other, and neither “page” performs as well as it would if all the strength were concentrated in one page.
  2. And second, Google attempts to filter out pages containing duplicate content, based on the reasonable logic that people don’t want to see multiple results in their searches for the exact same thing. Thus, since both of these “pages” contain the exact same content, one of them will suffer in searches due to the dupe content filter.

It’s a double whammy. It’s not that your site actually has duplicate content. No, we could possibly call this situation “virtual duplicate content.” But it’s all the same to Google: It’s duplicate content, period.

And if that’s not bad enough, many people link to their home page like this: http://www.example.com/index.html. Now Google sees yet another instance of duplicate content: http://www.example.com and http://www.example.com/index.html. So ultimately what Google sees is four “duplicate content” pages:

  • http://www.example.com
  • http://example.com
  • http://www.example.com/index.html
  • http://example.com/index.html

And all this before we’ve even gotten past the home page of your site!

It’s easy-peasy to configure the server to do what’s called a “301 permanent redirect” from the non-www version to the www version of your site. This technique, which is recommended by Google, tells Google that the two are indeed the same and keeps the poor Googlebot from deciding that you have duplicate content and splitting your page’s strength among more than one version. “301” refers to the status code that’s returned by the web server to the browser (or the spider, in this case), and it says, in effect, “Hey, the correct, permanent URL for the page you’re requesting is actually over there. Don’t index it at this URL.”
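To make the mechanics concrete, here is a minimal sketch of the decision a server makes when it issues that redirect. It is written in Python purely for illustration and is not the server configuration used on this site. The hostname `www.example.com` and the names `CANONICAL_HOST`, `redirect_location`, and `CanonicalHostHandler` are assumptions made up for the example.

```python
from http.server import BaseHTTPRequestHandler

# Assumption for illustration: the www version is the canonical hostname.
CANONICAL_HOST = "www.example.com"


def redirect_location(host, path):
    """Return the URL to redirect to, or None if the host is already canonical."""
    # The Host header may carry a port ("example.com:80"); compare the name only.
    hostname = host.split(":")[0].lower()
    if hostname == CANONICAL_HOST:
        return None
    return "http://" + CANONICAL_HOST + path


class CanonicalHostHandler(BaseHTTPRequestHandler):
    """Answers non-canonical hosts with a 301 and serves a stub page otherwise."""

    def do_GET(self):
        location = redirect_location(self.headers.get("Host", ""), self.path)
        if location is not None:
            # 301 tells browsers and spiders that the move is permanent.
            self.send_response(301)
            self.send_header("Location", location)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"canonical page\n")
```

The handler can be passed to `http.server.HTTPServer` to try it locally. In practice the same rule is normally a line or two of web server configuration rather than application code.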

It’s likewise easy-peasy to link to your home page without the “index.html” (or other directory index name, such as home.htm or default.asp). For index pages in subdirectories, you simply link to the directory: http://www.example.com/subdirectory/, again leaving out the actual filename index.html.
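If a site already has links that include the filename, a small helper can rewrite them to the directory form. The following is a sketch under stated assumptions: the function name `strip_directory_index` and the `INDEX_NAMES` set are invented for this example, and the set should match whatever index filenames the server is actually configured to use.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed set of directory index filenames; adjust to match the server's setup.
INDEX_NAMES = {"index.html", "index.htm", "home.htm", "default.asp"}


def strip_directory_index(url):
    """Return url with a trailing directory index filename removed."""
    parts = urlsplit(url)
    # Split the path into its directory and the final filename component.
    directory, _, filename = parts.path.rpartition("/")
    if filename in INDEX_NAMES:
        parts = parts._replace(path=directory + "/")
    return urlunsplit(parts)
```

For example, `strip_directory_index("http://www.example.com/subdirectory/index.html")` gives `http://www.example.com/subdirectory/`, while a link to an ordinary page such as `http://www.example.com/about.html` is returned unchanged.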

I apply an appropriate 301 permanent redirect to the www version of every web site I develop. It’s not something I charge extra for, or something that I tout to my clients as being anything special. It’s about a 20-second task to set up the 301 properly. And I never link to directory index pages by filename. I don’t know why some of the big companies aren’t aware of this issue, or, if they are aware, why they don’t care enough to do it properly. It raises the question, if they’re so ignorant, or uncaring, about a thing that is so simple to do right, in how many other areas are they incompetent?

Filed Under: Google, Search Engines, Technology

About Sonja Ray

Hi! I'm Sonja Ray, the owner of Tropical Web Works, a boutique web design and development firm in Punta Gorda in sunny South Florida. For help with your next website project, feel free to contact me.

Copyright © 2003-2024 Tropical Web Works. All rights reserved.