antlr4 - How to capture all data from part of the input - Stack Overflow

Updated: 2025-04-105

I'd like to replicate the 'here' document from the bash scripting language, so I need to capture all data between a start point and an end point.

Here's my grammar:

grammar Here;

hereDocument: '<<' IDENTIFIER '\n'  hereContent IDENTIFIER ;

hereContent:
        .*
        ;

IDENTIFIER
    : [a-zA-Z_][a-zA-Z_0-9]* ;

WS
    : [ \t\r\n]+ -> skip ;

Here's the data block:

<<here
    sdfsdf
    !@#$%^&*()_
    1234567890-
    a111
here

I want to capture all data between '<<here' and 'here' as hereContent, but ANTLR4 falls back to the definition of IDENTIFIER, and anything that does not match that definition is reported as extraneous input.

Asked Mar 23 at 5:15 by user1818726; edited Mar 23 at 5:26 by Ken White.

1 Answer

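You can't reasonably do that inside parser rules: the lexer has already split the input into tokens (and skipped the whitespace) before the parser sees anything. You'll need logic inside the lexer that checks whether a start token <<... has been encountered and, if so, treats everything up to the closing identifier as a single token. Lexical modes are one way to do this. Here's a quick demo: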

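lexer grammar HereLexer;

@members {
  // Identifier captured from the most recent '<<ID' token; null until one is seen.
  String hereStart = null;

  // True when the characters at the current input position spell out hereStart.
  boolean hereEndAhead() {
    for (int i = 1; i <= hereStart.length(); i++) {
      if (hereStart.charAt(i - 1) != _input.LA(i)) {
        return false;
      }
    }
    return true;
  }
}

ID
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;

// Remember the delimiter (the text after '<<') and switch to HereMode.
HERE_START
 : '<<' ID {hereStart = getText().substring(2);} -> pushMode(HereMode)
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

OTHER
 : .
 ;

mode HereMode;

// The closing delimiter: only matches when hereStart is next in the input.
HERE_END
 : {hereEndAhead()}? [a-zA-Z_] [a-zA-Z_0-9]* -> popMode
 ;

// Everything else: consume one character at a time until the delimiter is ahead.
DATA_BLOCK
 : ({!hereEndAhead()}? . )+
 ;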

If you now run the Java code:

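import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

// The statements below go inside a method such as main.
String source = "here\n" +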
        "there\n" +
        "<<here\n" +
        "    sdfsdf\n" +
        "    !@#$%^&*()_\n" +
        "    1234567890-\n" +
        "    a111\n" +
        "here\n" +
        "done";

HereLexer lexer = new HereLexer(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();

for (Token t : tokens.getTokens()) {
    System.out.printf("%-20s '%s'\n",
            HereLexer.VOCABULARY.getSymbolicName(t.getType()),
            t.getText().replace("\n", "\\n"));
}

you'll see the following output being printed:

ID                   'here'
ID                   'there'
HERE_START           '<<here'
DATA_BLOCK           '\n    sdfsdf\n    !@#$%^&*()_\n    1234567890-\n    a111\n'
HERE_END             'here'
ID                   'done'
EOF                  '<EOF>'
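
One thing to be aware of: the lookahead in the demo only checks whether the delimiter text starts at the current position, so it looks like it would also end the block at an occurrence inside the data (the "here" in "there", say), whereas bash only ends a here-document at a line consisting of the delimiter. The plain-Java sketch below is not ANTLR code and not part of the demo; HereDocSketch and its method names are made up for illustration. It puts the two stopping rules side by side on an ordinary String, assuming Unix newlines and a single here-document in the input:

```java
/** Hypothetical helper (not part of ANTLR): compares two stopping rules for a here-document. */
class HereDocSketch {

    // Reads an identifier ([a-zA-Z_][a-zA-Z_0-9]*) starting at `from`; returns "" if none.
    static String readIdentifier(String input, int from) {
        int i = from;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
            boolean digit = c >= '0' && c <= '9';
            if (letter || (digit && i > from)) i++; else break;
        }
        return input.substring(from, i);
    }

    // Mirrors the demo's lookahead: stop at the first position where the delimiter text starts.
    // Returns null if there is no '<<ID' or no closing delimiter.
    static String untilFirstOccurrence(String input) {
        int start = input.indexOf("<<");
        if (start < 0) return null;
        String delimiter = readIdentifier(input, start + 2);
        if (delimiter.isEmpty()) return null;
        int from = start + 2 + delimiter.length();
        int end = input.indexOf(delimiter, from);
        return end < 0 ? null : input.substring(from, end);
    }

    // Stricter variant: the delimiter only ends the block when it is alone on a line.
    // Assumes '\n' line endings; the content starts on the line after '<<ID'.
    static String untilDelimiterLine(String input) {
        int start = input.indexOf("<<");
        if (start < 0) return null;
        String delimiter = readIdentifier(input, start + 2);
        if (delimiter.isEmpty()) return null;
        int pos = input.indexOf('\n', start);
        if (pos < 0) return null;
        pos++; // first character of the content
        StringBuilder data = new StringBuilder();
        while (pos < input.length()) {
            int next = input.indexOf('\n', pos);
            String line = next < 0 ? input.substring(pos) : input.substring(pos, next);
            if (line.equals(delimiter)) return data.toString();
            data.append(line).append('\n');
            if (next < 0) break;
            pos = next + 1;
        }
        return null; // no terminating line found
    }

    public static void main(String[] args) {
        String source = "<<here\n  over there\nhere\ndone";
        System.out.println("[" + untilFirstOccurrence(source) + "]");
        System.out.println("[" + untilDelimiterLine(source) + "]");
    }
}
```

For the input in main, the first method returns "\n  over t" (it stops inside "there"), while the second returns "  over there\n". If you need the stricter behaviour in the lexer, one option would be to make hereEndAhead() also check that the delimiter sits at the start of a line and is followed by a line break or EOF, but I haven't tested that variant.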
