
Avoiding Comments w/ C++ getline()

An answer to this question on Stack Overflow.

Question

I'm using getline() to read a .cpp file:

getline(theFile, fileData);

I'm wondering if there is any way to have getline() skip C++ comments (/*, */ and //).

So far, trying something like this doesn't quite work.

if (fileData[i] == '/*')

Answer

As far as I know, getline() can't skip comments for you, so reading them is unavoidable, but you can discard them by going through the file one character at a time.
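As a rough illustration of reading one character at a time, the sketch below uses std::istream::get() and peek() to spot the two-character tokens // and /* by looking at a character together with the one after it. This is also why the comparison in the question fails: '/*' is a multi-character literal of type int with an implementation-defined value, so comparing a single char against it does not test for two characters. The function name count_comment_openers is hypothetical, and this naive scan ignores string literals and text already inside a comment.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts comment openers ("//" or "/*") in a stream,
// reading it one character at a time. A two-character token is detected by
// comparing the current character with the next one (via peek()), instead
// of comparing one char against the multi-character literal '/*'.
// Usage: std::istringstream src("a = 1; // note"); count_comment_openers(src);
std::size_t count_comment_openers(std::istream& in) {
    std::size_t count = 0;
    char c;
    while (in.get(c)) {
        if (c == '/') {
            int next = in.peek();  // look ahead without consuming
            if (next == '/' || next == '*') {
                ++count;
                in.get();  // consume the second character of the opener
            }
        }
    }
    return count;
}
```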

To do this, you can load the file into a string and build a state machine with the following states:

  1. This is actual code
  2. The previous character was /
  3. The previous character was *
  4. I am a single-line comment
  5. I am a multi-line comment
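Loading the whole file into a string, as suggested above, can be sketched as follows, assuming the file fits comfortably in memory. The helper name read_all is hypothetical; it takes any std::istream, so it works with an std::ifstream opened on your .cpp file as well as with an in-memory std::istringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads everything remaining in the stream into one
// string. Assumes the whole file fits in memory.
std::string read_all(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the stream's entire contents
    return buffer.str();
}
```

For example, with a hypothetical file main.cpp you would construct std::ifstream file("main.cpp") and call read_all(file).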

The state machine starts in State 1.

If the machine is in State 1 and hits a / character, transition to State 2.

If the machine is in State 2 and hits a / character, transition to State 4; if it hits a * character, transition to State 5. Otherwise, transition to State 1 (the earlier / was ordinary code, such as a division operator).

If the machine is in State 4 and hits a newline character, transition to State 1.

If the machine is in State 5 and hits a * character, transition to State 3.

If the machine is in State 3 and hits a / character, transition to State 1 (the multi-line comment ends). If it hits another * character, stay in State 3, so that a comment ending in **/ still closes. Otherwise, transition to State 5.
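Putting the transitions together, a minimal C++ sketch of the state machine might look like the following. The function name strip_comments and the enum names are hypothetical. It copies code characters to the output as it goes rather than marking positions, and it keeps the newline that ends a // comment. Like the machine described above, it does not recognize string or character literals, so a // inside "..." would be treated as a comment. It also simply drops comments, whereas a compiler treats each comment as a single space (so int/**/x becomes intx here).

```cpp
#include <string>

// Hypothetical helper: removes // and /* */ comments using the five states.
// Does NOT handle string or character literals.
std::string strip_comments(const std::string& src) {
    enum class State { Code, AfterSlash, AfterStar, LineComment, BlockComment };
    State state = State::Code;
    std::string out;
    out.reserve(src.size());

    for (char c : src) {
        switch (state) {
        case State::Code:  // State 1: actual code
            if (c == '/') state = State::AfterSlash;  // maybe a comment
            else out += c;
            break;
        case State::AfterSlash:  // State 2: previous character was '/'
            if (c == '/') {
                state = State::LineComment;
            } else if (c == '*') {
                state = State::BlockComment;
            } else {
                // The '/' was ordinary code (e.g. division): emit it and c.
                out += '/';
                out += c;
                state = State::Code;
            }
            break;
        case State::LineComment:  // State 4: single-line comment
            if (c == '\n') {
                out += c;  // keep the newline so line structure survives
                state = State::Code;
            }
            break;
        case State::BlockComment:  // State 5: multi-line comment
            if (c == '*') state = State::AfterStar;
            break;
        case State::AfterStar:  // State 3: previous character was '*'
            if (c == '/') state = State::Code;                // "*/" closes it
            else if (c != '*') state = State::BlockComment;   // "**" stays here
            break;
        }
    }
    if (state == State::AfterSlash) out += '/';  // input ended with a lone '/'
    return out;
}
```

For example, strip_comments("int a = 4 / 2; // half\n") returns "int a = 4 / 2; \n": the division slash survives because the character after it is a space.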

If you mark the positions where the machine enters and exits the comment states, you can then strip everything between those positions, comment delimiters included, from the string.
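A sketch of this mark-then-strip variant follows. It records a [begin, end) index range, delimiters included, each time the machine enters and then leaves a comment state, and a second pass copies everything outside those ranges. Both function names and the Range alias are hypothetical, and the same string-literal limitation applies.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // [begin, end) indices

// Hypothetical helper: returns the index range of every comment in src.
std::vector<Range> find_comment_ranges(const std::string& src) {
    enum class State { Code, AfterSlash, AfterStar, LineComment, BlockComment };
    std::vector<Range> ranges;
    State state = State::Code;
    std::size_t start = 0;  // index of the '/' that opened the current comment

    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        switch (state) {
        case State::Code:
            if (c == '/') state = State::AfterSlash;
            break;
        case State::AfterSlash:
            if (c == '/' || c == '*') {
                start = i - 1;  // the comment begins at the previous '/'
                state = (c == '/') ? State::LineComment : State::BlockComment;
            } else {
                state = State::Code;
            }
            break;
        case State::LineComment:
            if (c == '\n') {  // end before the newline so it is kept
                ranges.emplace_back(start, i);
                state = State::Code;
            }
            break;
        case State::BlockComment:
            if (c == '*') state = State::AfterStar;
            break;
        case State::AfterStar:
            if (c == '/') {
                ranges.emplace_back(start, i + 1);  // include the closing '/'
                state = State::Code;
            } else if (c != '*') {
                state = State::BlockComment;
            }
            break;
        }
    }
    // A comment still open at the end of the input runs to the end.
    if (state == State::LineComment || state == State::BlockComment ||
        state == State::AfterStar) {
        ranges.emplace_back(start, src.size());
    }
    return ranges;
}

// Hypothetical helper: copies src, skipping the given (sorted) ranges.
std::string erase_ranges(const std::string& src, const std::vector<Range>& ranges) {
    std::string out;
    std::size_t pos = 0;
    for (const Range& r : ranges) {
        out.append(src, pos, r.first - pos);  // keep the code before this comment
        pos = r.second;                       // skip the comment itself
    }
    out.append(src, pos, std::string::npos);  // keep whatever follows
    return out;
}
```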

Alternatively, you could explore regular expressions, which provide ways of describing this kind of state machine very succinctly.
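For comparison, a regular-expression version using std::regex might look like this. The pattern //[^\n]*|/\*[\s\S]*?\*/ is an assumption of this sketch, not part of the original answer: the first alternative matches a line comment up to (but not including) the newline, and the second uses the lazy *? so that a block comment ends at the first */. It shares the state machine's blind spot for string literals, and some std::regex implementations can be slow, or even exhaust the stack, on very long comments. The function name strip_comments_regex is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes comments with a single ECMAScript regex.
// Does NOT handle string or character literals.
std::string strip_comments_regex(const std::string& src) {
    // "//[^\n]*"        : line comment, newline itself is kept
    // "/\*[\s\S]*?\*/"  : block comment, lazy so it stops at the first "*/"
    static const std::regex comment(R"(//[^\n]*|/\*[\s\S]*?\*/)");
    return std::regex_replace(src, comment, "");
}
```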