About Whitespace and ANTLR

Are you unsure whether you need to mention whitespace explicitly in your rules?

Most grammars contain this lexer rule:

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;
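To make the effect of the hidden channel concrete, here is a minimal Python sketch (not ANTLR code). It assumes a toy lexer with only two token kinds, WORD and WS; the names Token, lex and parser_view and the channel numbers are invented for this illustration. The lexer still produces a token for every whitespace character, but the parser only looks at the default channel.

```python
import re
from typing import List, NamedTuple

# Channel numbers are arbitrary values chosen for this sketch.
DEFAULT, HIDDEN = 0, 99


class Token(NamedTuple):
    kind: str
    text: str
    channel: int


def lex(text: str) -> List[Token]:
    """Split text into WORD and WS tokens; WS tokens go to the hidden channel."""
    tokens = []
    for m in re.finditer(r"(?P<WS>[ \t\r\n])|(?P<WORD>[^ \t\r\n]+)", text):
        kind = m.lastgroup  # name of the alternative that matched
        channel = HIDDEN if kind == "WS" else DEFAULT
        tokens.append(Token(kind, m.group(), channel))
    return tokens


def parser_view(tokens: List[Token]) -> List[Token]:
    """Return only the tokens a parser reading the default channel would see."""
    return [t for t in tokens if t.channel == DEFAULT]


all_tokens = lex("a b\tc")
visible = parser_view(all_tokens)
print([t.text for t in all_tokens])  # ['a', ' ', 'b', '\t', 'c']
print([t.text for t in visible])     # ['a', 'b', 'c']
```

The whitespace is not thrown away by the lexer in this sketch; it is only filtered out before the parser rules are applied, which is why those rules never have to mention it.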

Now suppose you need a rule that matches “a sample string” (quotes included). How will ANTLR respond to the spaces inside it?

or

Suppose you want to parse a command-line-style parameter string such as “operation /option1 /option2” (quotes not included).

grammar test20091014;

// Thanks to ANTLRWorks 1.3 for its useful grammar wizard.

prog	:	STRING+ OPTION*;

OPTION	:	'/' STRING;

STRING
    :  '"' ( ESC_SEQ| ~('\\'|'"') )* '"'
    |	ID
    ;

fragment ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

Parse Tree:

input: str /opt1

input: "str with space" /"some option"

input: "str with space" another space /"some option"

Conclusion:

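There is no need to reference WS explicitly in your other lexer or parser rules. The WS rule still produces its own tokens, so whitespace splits unquoted input such as “a b c d” (quotes not included) into separate tokens, while whitespace inside a quoted STRING is consumed as part of that token. Because the WS tokens go to the hidden channel, the parser never sees them.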
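If you want to check this behaviour without running ANTLR, the following Python sketch approximates the STRING, OPTION and WS rules of the test grammar with regular expressions. It is only a rough stand-in: the escape handling is simplified to "backslash plus any character", Python tries the alternatives in order instead of picking the longest match as an ANTLR lexer does, and the helper name visible_tokens is made up for this example.

```python
import re

# Rough regex stand-ins for the lexer rules of the test grammar.
ID = r"[A-Za-z_][A-Za-z0-9_]*"
QUOTED = r'"(?:\\.|[^\\"])*"'  # simplified ESC_SEQ: backslash + any character
STRING = rf"(?:{QUOTED}|{ID})"
TOKEN_RE = re.compile(
    rf"(?P<OPTION>/{STRING})|(?P<STRING>{STRING})|(?P<WS>[ \t\r\n])"
)


def visible_tokens(text):
    """Return (type, text) pairs for the tokens the parser would see."""
    out = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at offset {pos}")
        if m.lastgroup != "WS":  # WS is hidden, so it never reaches the parser
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


for line in [
    'str /opt1',
    '"str with space" /"some option"',
    '"str with space" another space /"some option"',
]:
    print(visible_tokens(line))
```

For the third input the sketch yields three STRING tokens followed by one OPTION token: the spaces inside the quotes stay part of the STRING, while the unquoted spaces only separate tokens.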
