What is Canonicalization?

Mary Elizabeth

The word canonical describes something that conforms to an accepted standard. Canonicalization (canonicalisation in British English) is the process of bringing something into conformity with that standard. In computing, the term is applied to standards in several different areas. It is often mistaken for the problem itself, when it is actually the solution to a variety of problems. Because it is such a long word, canonicalization is often abbreviated to its first and last letters with the number of letters in between: c14n.


Canonicalization is used in IT (Information Technology) in several settings. It applies to email sender addresses, filename construction, string encoding in Unicode, XML (Extensible Markup Language) documents, and URL (Uniform Resource Locator) construction. In every case, the problem is that multiple formats can represent the same item, and canonicalization is the route to consistency and standardization.
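As a rough illustration of two of these settings, the sketch below uses Python's standard library to canonicalize a Unicode string and a file path. The sample strings and the helper name `canonical_text` are made up for this example, and choosing NFC as the canonical Unicode form is an assumption; other normalization forms exist.

```python
import posixpath
import unicodedata

# Two ways to write the same visible text "café":
composed = "caf\u00e9"      # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (U+0301)


def canonical_text(text: str) -> str:
    """Return the NFC-normalized form of text (NFC is this sketch's choice)."""
    return unicodedata.normalize("NFC", text)


# The raw strings differ, but their canonical forms match.
print(composed == decomposed)
print(canonical_text(composed) == canonical_text(decomposed))

# Filenames have the same problem: several spellings can name one file.
# posixpath.normpath collapses "." and ".." segments and repeated slashes.
messy_path = "/var/www/../www/./index.html"
print(posixpath.normpath(messy_path))
```

Comparing the canonical forms, rather than the raw strings, is what lets a program treat the two spellings as the same item.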

Take XML as an example. XML permits syntactic variation that does not change what a document means, such as differences in attribute order or in whitespace inside tags. This means that two documents that are not identical character for character can still be logically equivalent. The Canonical XML specification was designed to address this by defining a canonical form: if two documents have the same canonical form, they can be treated as equivalent. The method for generating the canonical form for any given XML document is called the XML canonicalization method.
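A minimal sketch of this idea, assuming Python 3.8 or later, where the standard library's `xml.etree.ElementTree.canonicalize` function produces a canonical form (it follows the C14N 2.0 version of the specification). The two sample documents are invented for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Two documents that differ only in syntax: attribute order, quote style,
# whitespace inside the tag, and empty-element notation.
doc_a = '<item b="2" a="1"/>'
doc_b = "<item a='1'   b='2'></item>"

# A document whose content really is different.
doc_c = '<item a="1" b="3"/>'

canonical_a = canonicalize(xml_data=doc_a)
canonical_b = canonicalize(xml_data=doc_b)
canonical_c = canonicalize(xml_data=doc_c)

# The raw text differs, but the canonical forms of doc_a and doc_b match.
print(doc_a == doc_b)
print(canonical_a == canonical_b)

# A genuine difference in content survives canonicalization.
print(canonical_a == canonical_c)
```

This kind of comparison matters most where documents are hashed or digitally signed, since a signature computed over raw text would break on harmless syntactic differences.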

For URL canonicalization, the idea is to refer to a specific webpage consistently by one URL. The simplest example is two versions of a homepage address, one of which begins with the "www" prefix while the other does not.

This is a problem for SEO (Search Engine Optimization) because traffic reports are split across the different URLs, even though all of the traffic is actually going to the same place. The result is that a site with multiple URLs for the same pages seems to be performing more poorly than it actually is.

There are other issues besides the "www" prefix. These include trailing slashes and URL versions that differ only in upper and lower case letters. Matt Cutts of Google® recommends addressing this by using a permanent (301) redirect from all alternative URLs to the URL you want, which helps search engines judge which is the canonical URL.
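The redirect approach can be sketched in a few lines of Python. Everything named here is hypothetical: `example.com` stands in for a real site, and `PREFERRED_HOST`, `HOST_ALIASES`, `canonical_url`, and `redirect_for` are illustrative names, not part of any server's API. The sketch lowercases the scheme and host, maps known host aliases to one preferred host, and removes trailing slashes. It ignores ports and leaves the case of the path alone, since whether paths are case-sensitive depends on the server.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"                # hypothetical canonical host
HOST_ALIASES = {"example.com", "www.example.com"}  # hosts serving the same site


def canonical_url(url: str) -> str:
    """Return one consistent spelling of url (a sketch; ports are ignored)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    # An empty path and "/" name the same page; drop other trailing slashes.
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))


def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if url is not already canonical, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)


print(canonical_url("http://Example.com"))
print(redirect_for("http://example.com/About/"))
print(redirect_for("http://www.example.com/"))
```

In practice a rule like this usually lives in the web server or framework configuration, so that every alternative URL answers with a 301 pointing at the single preferred address.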

Mary Elizabeth

Mary Elizabeth is passionate about reading, writing, and research, and has a penchant for correcting misinformation on the Internet. In addition to contributing articles to EasyTechJunkie about art, literature, and music, Mary Elizabeth is a teacher, composer, and author. She has a B.A. from the University of Chicago’s writing program and an M.A. from the University of Vermont, and she has written books, study guides, and teacher materials on language and literature, as well as music composition content for Sibelius Software.
