I've been hacking away at the ANTLR parser generator documentation and finding that a lot has to be discovered by trial and error.
Here's a simple way to change the text of a token. This example is the common problem of converting escape characters in text strings.
tokens { BACKSLASH = '\\'; DOUBLEQUOTE = '"'; }
ESCAPE:
BACKSLASH (
| 'n' { setText("\n");}
| 'r' { setText("\r");}
| 't' { setText("\t");}
| DOUBLEQUOTE { setText("\"");}
);
stringQuote:
q=DOUBLEQUOTE { $q.setText("");};
string:
s=stringLiteral ->^(STRING[$s.text]);
stringLiteral:
stringQuote ( ESCAPE | ~( DOUBLEQUOTE | BACKSLASH))* stringQuote;
The salient point here is that setText() replaces the text of the whole token as it is ultimately presented to a parser rule. ESCAPE must therefore be a complete token, not a fragment and not a rule referenced from another token. That in turn makes stringLiteral a parser rule rather than a token; if it were a token, a setText() inside it would overwrite the text of the entire string.
Also, the stringQuote production strips the delimiting double quotes from the text of stringLiteral. If stringQuote were a token, my grammar would be ambiguous.
The string production tidies up the tree by condensing the glob of token children of stringLiteral into one node. If stringLiteral were a token, the fragments that compose it would combine into one node on their own, but having to make it a parser rule produces a node with every token as a child. I said this was simple?
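To make the combined effect of these rules concrete, here is a minimal sketch in plain Java that imitates what happens to the token texts for one sample input. It does not use ANTLR at all: the class EscapeSketch and the methods tokenTexts and stringText are hypothetical names invented for this illustration. It assumes, as the grammar above relies on, that $s.text is the concatenation of the texts of the tokens that stringLiteral matched. How an unrecognised escape is handled is also an assumption, since the grammar does not cover that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the token stream; these are not ANTLR classes.
class EscapeSketch {

    // Splits a quoted literal into one text per token, rewriting each text
    // the way the setText() actions in the grammar would.
    public static List<String> tokenTexts(String literal) {
        List<String> texts = new ArrayList<>();
        int i = 0;
        while (i < literal.length()) {
            char c = literal.charAt(i);
            if (c == '"') {
                // stringQuote: the delimiter's text is blanked out
                texts.add("");
                i++;
            } else if (c == '\\' && i + 1 < literal.length()) {
                // ESCAPE: the whole two-character token gets new text
                char next = literal.charAt(i + 1);
                switch (next) {
                    case 'n': texts.add("\n"); break;
                    case 'r': texts.add("\r"); break;
                    case 't': texts.add("\t"); break;
                    case '"': texts.add("\""); break;
                    // Assumption: an unknown escape keeps its original text
                    default: texts.add("\\" + next); break;
                }
                i += 2;
            } else {
                // Any other character is its own single-character token
                texts.add(String.valueOf(c));
                i++;
            }
        }
        return texts;
    }

    // Imitates STRING[$s.text]: every token text joined into one node's text.
    public static String stringText(String literal) {
        return String.join("", tokenTexts(literal));
    }

    public static void main(String[] args) {
        // Raw input characters: "a\tb\"c" including the delimiting quotes
        String source = "\"a\\tb\\\"c\"";
        System.out.println(tokenTexts(source).size() + " tokens");
        System.out.println(stringText(source));
    }
}
```

Each escape sequence counts as one token whose text is swapped wholesale, and the two quote tokens contribute empty text, which is why the joined result carries neither the backslashes nor the delimiters.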