What is Canonicalization?
The word canonical means something that conforms to an accepted standard. Canonicalization — or canonicalisation in British English — is the process whereby something is brought into conformity with the accepted standard. In the realm of computers, the term canonicalization is used to refer to meeting standards in several different areas. It is often taken to be the problem, when it is actually the solution to a variety of problems. Since it is such a long word, canonicalization is abbreviated using its first and last letters and the number of letters in between: c14n.
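The abbreviation rule can be illustrated with a few lines of Python. This is only a sketch: the function name `numeronym` and the decision to leave very short words unchanged are assumptions made for the example.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) <= 3:
        # Assumption: words this short are returned unchanged.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("canonicalization"))  # prints c14n
```

The word has 16 letters, so 14 of them fall between the leading "c" and the trailing "n".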
Canonicalization is used in IT (Information Technology) in several settings. It applies to email sender addresses, to filename construction, to string encoding in Unicode, to XML (Extensible Markup Language) documents, and to URL (Uniform Resource Locator) construction. In every case, the problem is that the same item can be represented in multiple formats, and canonicalization is the way to restore consistency and standardization.
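The Unicode case makes a compact illustration. The sketch below uses Python's standard `unicodedata` module; the sample word, the helper name `canonical_text`, and the choice of the NFC normalization form are assumptions made for the example, since other forms could serve equally well depending on the application.

```python
import unicodedata

# Two different code point sequences that display as the same word.
composed = "caf\u00e9"      # "é" stored as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent


def canonical_text(s: str) -> str:
    """Bring a string into one agreed-upon form (NFC is assumed here)."""
    return unicodedata.normalize("NFC", s)


print(composed == decomposed)                                  # prints False
print(canonical_text(composed) == canonical_text(decomposed))  # prints True
```

Comparing the raw strings reports a mismatch even though a reader sees identical text. Comparing the normalized strings gives the expected answer.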
Take XML as an example. XML allows syntactic variation, such as differences in attribute order, quoting style, or whitespace inside tags, that does not change a document's logical content. This means that two documents that are not identical byte for byte could have the same canonical form, and thus be functionally equivalent. The Canonical XML specification was designed to address this by defining a method for determining whether two documents are logically equivalent. The method for generating the canonical form for any given XML document is called the XML canonicalization method.
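As a rough illustration, Python 3.8 and later include `xml.etree.ElementTree.canonicalize`, which implements the C14N 2.0 version of the specification. The two sample documents below are invented for this sketch. They differ in attribute order, quoting style, spacing inside the tag, and the way the empty element is written.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same content, different syntax.
doc_a = '<page id="1" lang="en"><title>Home</title><br/></page>'
doc_b = "<page lang='en'   id='1'><title>Home</title><br></br></page>"

canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(doc_a == doc_b)      # prints False: the raw text differs
print(canon_a == canon_b)  # prints True: the canonical forms match
```

Comparing canonical forms rather than raw text is what lets applications such as digital signatures treat the two documents as the same.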
For URL canonicalization, the idea is to refer to a specific webpage consistently by one URL. The simplest example is a homepage that can be reached at two addresses, one of which begins with the three w’s and the other of which doesn’t.
This is a problem for SEO (Search Engine Optimization) because it splits the traffic statistics across several URLs, even though all of the traffic is actually going to the same place. The result is that a site with multiple URLs for the same pages seems to be performing more poorly than it actually is.
There are other issues besides the w’s. These include trailing slashes and differences between URL versions with upper and lower case letters. Matt Cutts of Google® recommends addressing this by using a permanent (301) redirect from all alternative URLs to the URL you want, which lets search engines identify the canonical URL.
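A site can also apply the same rules in its own code, for example when merging traffic reports. The following is a minimal sketch using Python's standard `urllib.parse` module. The function name `canonical_url`, the `example.com` addresses, and the specific policy (drop the "www." prefix, drop trailing slashes, lowercase the scheme and host) are assumptions chosen for illustration; a real site would pick whichever form it prefers and apply it consistently.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map equivalent spellings of a URL to one assumed canonical form."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme; .hostname is the lowercased host.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]       # assumed policy: prefer no "www."
    if parts.port:
        host = f"{host}:{parts.port}"   # keep an explicit port if present
    # Assumed policy: no trailing slash, except for the root path.
    path = parts.path.rstrip("/") or "/"
    # Path case is left alone because many servers treat it as significant.
    # Fragments and any user info are dropped in this sketch.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


print(canonical_url("HTTP://WWW.Example.com/About/"))  # http://example.com/About
print(canonical_url("http://example.com/About"))       # http://example.com/About
```

Both spellings map to the same string, so visits recorded under either one can be counted together. A 301 redirect achieves the same consolidation on the server side.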