Silence & Solitude makes...

Pu's mind space

Compare Latin Characters to Basic ASCII Characters

In some circumstances we meet words that contain accented Latin characters, like the word naïve, and especially names like simão. For some reason, we may want to convert them to naive and simao, or at least be able to tell that they are equivalent to naive and simao respectively.

I ran into this problem recently and tried to find a solution, but honestly, it’s hard to even describe such a question. After browsing all the possibly related pages that Google showed me, I found an ES6 function called normalize, which finally helped me out. If you look at the description of its argument and trace it to the concept of the so-called Canonical Decomposition, you will probably say WOW like I did. It’s almost exactly what we want: canonical decomposition splits an accented letter into its base letter followed by combining marks, so once the marks are dropped, È É Ê Ë all become E and ìíîï all become i. With this function we can easily solve the problem, and I’m really glad I found the solution even before I knew how to describe the problem.
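To make the idea of Canonical Decomposition concrete, here is a small sketch that prints the code points of a letter before and after normalize("NFD"). The helper name codePoints is made up for this illustration and is not part of any API.

```javascript
// Canonical decomposition (NFD) splits a precomposed letter into a base
// letter followed by one or more combining marks.
const composed = "\u00E9"; // "é" stored as a single code point
const decomposed = composed.normalize("NFD");

// Hypothetical helper: list the code points of a string as hex strings.
function codePoints(str) {
  return Array.from(str, (ch) => ch.codePointAt(0).toString(16));
}

console.log(codePoints(composed));   // ["e9"]
console.log(codePoints(decomposed)); // ["65", "301"]  (e + combining acute accent)
```

The second output shows why stripping the combining mark leaves a plain ASCII letter behind.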

So here is the utility function:

    function convertLatin(str) {
      // Decompose accented letters (NFD), then strip the combining marks (U+0300 to U+036F).
      return str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
    }

And you’ll find that convertLatin("naïve") === "naive" and convertLatin("simão") === "simao". Enjoy this small utility!

## Update (26 Mar 2017)

After some closer investigation, I found that the solution above is neither robust nor necessary, because the language already provides official support for this use case through the ECMAScript Internationalization API. Please check these APIs:
String.prototype.localeCompare
Intl.Collator
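As a rough sketch of how these two APIs could be used here, the example below compares strings with the sensitivity option set to "base", which ignores accent and case differences. The locale "en" and the helper name sameBaseLetters are assumptions chosen for illustration, and the results depend on the runtime shipping Intl support.

```javascript
// sensitivity: "base" treats strings as equal when they differ only
// in accents or case. "en" is an arbitrary locale picked for this sketch.
const collator = new Intl.Collator("en", { sensitivity: "base" });

// Hypothetical helper: true when two strings have the same base letters.
function sameBaseLetters(a, b) {
  return collator.compare(a, b) === 0;
}

console.log(sameBaseLetters("naïve", "naive")); // true
console.log(sameBaseLetters("simão", "simao")); // true
console.log(sameBaseLetters("naive", "native")); // false

// The same comparison through String.prototype.localeCompare.
console.log("naïve".localeCompare("naive", "en", { sensitivity: "base" })); // 0
```

Unlike the utility above, this does not produce a converted string. It only answers whether two strings are equivalent, which is often all that is needed, and reusing one Intl.Collator instance is the usual choice when comparing many strings.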