Different apostrophes in C
An answer to this question on Stack Overflow.
Question
I'm writing a program that reads text files into an array, character by character.
As each character is read, I need to sanitize it: strip whitespace and punctuation, and normalize capitalization.
I have already written the code to do so. However, the assignment I am following specifies a particular text document that I have to sanitize.
The problem is that the apostrophes in the document are not being detected by the ispunct function. If I replace the apostrophes in the text document with normal apostrophes, it works fine. That is not good enough, though, because the program will be validated against the original document, not my edited copy.
I have tried adding a branch specifically for that apostrophe:
else if(c=='’') (where c comes from c=fgetc(fp)), but the compiler warns: multi-character character constant [-Wmultichar].
This small detail is driving me insane, and I can't tell why the apostrophes are different!
Here is a piece of text from the document: "that’s". Converted to hex, it is 74 68 61 74 e2 80 99 73.
Answer
A better approach might be to think about what characters are left after you've stripped all the illegal ones. If it's just a-z, and a few others, that's an easy-to-detect range.
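For context, the bytes e2 80 99 in the hex dump are the UTF-8 encoding of U+2019 (right single quotation mark). fgetc returns one byte at a time, so no single byte equals that character, and '’' in the source is three bytes, hence the -Wmultichar warning. A keep-list avoids the problem entirely. The following is only a minimal sketch of that idea, not the answerer's code: the function name sanitize is hypothetical, and it assumes the only characters worth keeping are the ASCII letters a-z (after lowercasing) in an ASCII-compatible encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy only ASCII letters from src to dst, lowercased.
 * Every other byte is dropped: whitespace, punctuation, and the bytes of
 * multi-byte UTF-8 sequences such as e2 80 99.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written, not counting the terminator. */
size_t sanitize(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; ++src) {
        /* Work on unsigned char so bytes >= 0x80 are not negative. */
        unsigned char c = (unsigned char)*src;

        /* Fold ASCII uppercase to lowercase (assumes contiguous A-Z, a-z). */
        if (c >= 'A' && c <= 'Z')
            c = (unsigned char)(c - 'A' + 'a');

        /* Keep-list test: anything outside a-z is silently skipped. */
        if (c >= 'a' && c <= 'z')
            dst[n++] = (char)c;
    }
    dst[n] = '\0';
    return n;
}
```

The same two range checks can be applied directly to each value returned by fgetc(fp) instead of to a string. If more characters need to survive (digits, for example), extending the keep-list test is the only change required.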