Remove letter accents from a given text
An answer to this question on Stack Overflow.
Question
Maybe I'm missing something obvious, but is there a "painless" way to replace the accented letters in a given text with their unaccented counterparts? I can only use the standard ANSI C libraries/headers, so my hands are tied. What I've tried so far:
unsigned char currentChar;
(...)
if (currentChar == 'à') {
    currentChar = 'a';
}
else if (currentChar == 'è' || currentChar == 'é') {
    currentChar = 'e';
}
else if (...)
However, this doesn't work. Detecting accented vowels with their extended ASCII value isn't an option, either, as I've noticed that it changes depending upon the system locale.
Any hints/suggestions?
(update)
Thanks for the answers, but I'm not really asking for the best approach for this problem - I'll think about it later. I'm simply asking for a way to detect the accented vowels, as the code above simply ignores them.
(update #2)
Okay. Let me clarify:
#include <stdio.h>
int main(void) {
    int i;
    char vowels[6] = {'à','è','é','ì','ò','ù'};
    for (i = 0; i < 6; i++) {
        switch (vowels[i]) {
            case 'à': vowels[i] = 'a'; break;
            case 'è': vowels[i] = 'e'; break;
            case 'é': vowels[i] = 'e'; break;
            case 'ì': vowels[i] = 'i'; break;
            case 'ò': vowels[i] = 'o'; break;
            case 'ù': vowels[i] = 'u'; break;
        }
    }
    printf("\n");
    for (i = 0; i < 6; i++) {
        printf("%c", vowels[i]);
    }
    printf("\n");
    return 0;
}
This code still prints "àèéìòù" as its output. This is my problem. I appreciate the answers, however it's pointless to tell me to implement a conversion map, or a switch/case structure. I'll think about it later.
Answer
The accented characters are most likely encoded as UTF-8, or in some other encoding beyond ASCII. Your program uses the char type, which holds a single byte and is usually interpreted according to the [ASCII character set][1].
In ASCII, each character is represented by a single byte, and the character set does not include accented letters such as à or é at all.
Other encodings do include these letters, but there they are probably not represented by a single byte (in UTF-8, for example, à is stored as the two bytes 0xC3 0xA0), so your code cannot match them with a single char comparison. The usual solution is to use [wide characters][2].
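As a rough illustration of why a single char comparison fails, the sketch below reports how many bytes a UTF-8 sequence occupies, judging only by its first byte. It assumes the text is UTF-8, and utf8_sequence_length is a hypothetical helper name, not a standard library function. All six accented vowels from the question start with the lead byte 0xC3, which announces a two-byte sequence, while plain ASCII letters take one byte.

```c
/* Hypothetical helper (not a standard function): returns the length in
   bytes of a UTF-8 sequence, judging only by its lead byte. Returns 0 for
   a continuation byte (10xxxxxx) or for 0xF8..0xFF. Overlong or otherwise
   invalid sequences are NOT rejected; this is only an illustration. */
int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;              /* 0xxxxxxx: plain ASCII, e.g. 'a' */
    if ((lead & 0xE0) == 0xC0)
        return 2;              /* 110xxxxx: e.g. 0xC3, which starts à, è, é, ì, ò, ù */
    if ((lead & 0xF0) == 0xE0)
        return 3;              /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;              /* 11110xxx */
    return 0;                  /* continuation byte or invalid lead byte */
}
```

For example, utf8_sequence_length(0xC3) returns 2 and utf8_sequence_length(0xA0) returns 0, because 0xA0 is the second half of à rather than a character of its own. Any code that looks at one char at a time sees those two bytes separately, which is why 'à' never matches.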
[This question][3] has a more general explanation.
[This question][4] may provide a solution for your case.
This code seems to do what you would like:
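#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(int argc, char **argv) {
    FILE *f;
    wint_t c; /* fgetwc returns wint_t, which holds every wchar_t value plus WEOF */
    /* Decode the file with the user's locale (e.g. a UTF-8 locale) instead of the default "C" locale */
    setlocale(LC_CTYPE, "");
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "r");
    if (!f)
        return 1;
    /* Read one wide character at a time. The L'à' literals below only match
       if the compiler reads this source file in the encoding it was saved in
       (e.g. UTF-8). */
    while ((c = fgetwc(f)) != WEOF) {
        switch (c) {
            case L'à': c = L'a'; break;
            case L'è': c = L'e'; break;
            case L'é': c = L'e'; break;
            case L'ì': c = L'i'; break;
            case L'ò': c = L'o'; break;
            case L'ù': c = L'u'; break;
            default: break; /* leave every other character unchanged */
        }
        wprintf(L"%lc", c); /* %lc expects a wint_t argument */
    }
    fclose(f);
    return 0;
}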
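If the text is already in memory as a wide string rather than in a file, the same switch can be moved into a small function. This is a sketch, not part of the original answer: remove_accent and remove_accents are hypothetical names, and the letters are written as universal character names (\u00E0 for à, and so on) so the comparison no longer depends on the encoding the source file was saved in. It still assumes the C library stores these letters in wchar_t as their Unicode code points, which holds on common platforms such as glibc and Windows but is not guaranteed by the standard.

```c
#include <wchar.h>

/* Hypothetical helper: map one wide character to its unaccented
   counterpart. Only the six letters from the question are handled;
   every other character is returned unchanged. */
wchar_t remove_accent(wchar_t c)
{
    switch (c) {
        case L'\u00E0': return L'a'; /* à */
        case L'\u00E8':              /* è */
        case L'\u00E9': return L'e'; /* é */
        case L'\u00EC': return L'i'; /* ì */
        case L'\u00F2': return L'o'; /* ò */
        case L'\u00F9': return L'u'; /* ù */
        default:        return c;
    }
}

/* Hypothetical helper: replace, in place, every accented letter handled
   by remove_accent in a null-terminated wide string. */
void remove_accents(wchar_t *s)
{
    for (; *s != L'\0'; s++)
        *s = remove_accent(*s);
}
```

For example, calling remove_accents on a buffer holding L"caff\u00E8" leaves L"caffe" in it. Other accented letters (â, ç, ü, ...) would need additional case labels; the default branch keeps everything not listed intact. Printing the result still requires setlocale(LC_CTYPE, "") and wprintf, as in the program above.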
[1]: http://www.asciitable.com/
[2]: https://en.wikipedia.org/wiki/Wide_character
[3]: https://stackoverflow.com/questions/11287213/what-is-a-wide-character-string-in-c-language
[4]: https://stackoverflow.com/questions/1373463/handling-special-characters-in-c-utf-8-encoding