I'd like to replicate the 'here' document from the bash scripting language so I need to capture all data between a start and end point.
This is my parser grammar:
parser grammar HereDoc;
options { tokenVocab = HereLexer; }
doc : part* EOF ;
part : hereDocument | ID ;
hereDocument : HERE_START DATA_BLOCK HERE_END ;
This is the lexer (provided in the previous question, "How to capture all data from part of the input"):
lexer grammar HereLexer;
@members {
  // Text after '<<' in the most recent HERE_START; it also terminates the here-document
  String hereStart = null;
  // True when the upcoming characters spell out the terminator stored in hereStart
  boolean hereEndAhead() {
    for (int i = 1; i <= hereStart.length(); i++) {
      if (hereStart.charAt(i - 1) != _input.LA(i)) {
        return false;
      }
    }
    return true;
  }
}
ID
  : [a-zA-Z_] [a-zA-Z_0-9]*
  ;
HERE_START
  : '<<' ID {hereStart = getText().substring(2);} -> pushMode(HereMode)
  ;
SPACE
  : [ \t\r\n] -> skip
  ;
mode HereMode;
HERE_END
  : {hereEndAhead()}? [a-zA-Z_] [a-zA-Z_0-9]* -> popMode
  ;
DATA_BLOCK
  : ({!hereEndAhead()}? . )+
  ;
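As a rough illustration of what the `hereEndAhead()` predicate achieves, the following plain-Java sketch (no ANTLR involved) scans a string for `<<ID` and collects characters until the terminator text is directly ahead. The names `HereDocScan`, `tagAhead`, and `extractBody` are hypothetical and not part of the grammar above, and the sample input is shortened.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HereDocScan {
    // Hypothetical helper: true when `tag` starts exactly at position `pos` of `input`
    static boolean tagAhead(String input, int pos, String tag) {
        return input.startsWith(tag, pos);
    }

    // Returns the text between "<<TAG" and the next occurrence of TAG,
    // or null when there is no here-document or it is never terminated
    static String extractBody(String input) {
        Matcher m = Pattern.compile("<<([a-zA-Z_][a-zA-Z_0-9]*)").matcher(input);
        if (!m.find()) {
            return null;
        }
        String tag = m.group(1);
        int pos = m.end();
        StringBuilder body = new StringBuilder();
        // Consume one character at a time while the terminator is not ahead
        while (pos < input.length() && !tagAhead(input, pos, tag)) {
            body.append(input.charAt(pos++));
        }
        return pos < input.length() ? body.toString() : null;
    }

    public static void main(String[] args) {
        String source = "there\n<<here\n sdfsdf\n a111\nhere\ndone";
        System.out.println(extractBody(source).replace("\n", "\\n"));
    }
}
```

Like the lexer predicate, this sketch stops at the first occurrence of the terminator text anywhere in the body, not only at the start of a line.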
The lexer tokenizes the input correctly, but the parser/visitor still fails to parse it.
When I run this code:
String source = "there\n" +
    "<<here\n" +
    " sdfsdf\n" +
    " !@#$%^&*()_\n" +
    " 1234567890-\n" +
    " a111\n" +
    "here\n" +
    "done";
HereLexer lexer = new HereLexer(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();
for (Token t : tokens.getTokens()) {
    System.out.printf("%-20s '%s'\n",
        HereLexer.VOCABULARY.getSymbolicName(t.getType()),
        t.getText().replace("\n", "\\n"));
}
HereDocVisitor<?> visitor = new HereDocBaseVisitor<>();
HereDoc parser = new HereDoc(new CommonTokenStream(lexer));
visitor.visit(parser.hereDocument());
I get this result:
ID 'there'
HERE_START '<<here'
DATA_BLOCK '\n sdfsdf\n !@#$%^&*()_\n 1234567890-\n a111\n'
HERE_END 'here'
ID 'done'
EOF '<EOF>'
line 9:4 mismatched input '<EOF>' expecting HERE_START
The ANTLR4 tool shows this tree: [image not preserved]
Asked Mar 27 at 9:34 by user1818726
1 Answer
As mentioned in the comments: if you invoke hereDocument(), the input should only be a here-document and not the entire input you provided in your question.
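To make the entry-point issue concrete, here is a small hand-written sketch (not ANTLR-generated code) that matches the two rules against the token types listed in the question. The class `EntryPointDemo`, the enum `T`, and both methods are hypothetical stand-ins for the generated parser, written only to show why starting at `hereDocument` fails on input that begins with an `ID`.

```java
import java.util.List;

class EntryPointDemo {
    enum T { ID, HERE_START, DATA_BLOCK, HERE_END, EOF }

    // hereDocument : HERE_START DATA_BLOCK HERE_END ;
    // Returns the index just past the match, or -1 when the tokens at i do not match
    static int hereDocument(List<T> ts, int i) {
        if (i + 2 < ts.size()
                && ts.get(i) == T.HERE_START
                && ts.get(i + 1) == T.DATA_BLOCK
                && ts.get(i + 2) == T.HERE_END) {
            return i + 3;
        }
        return -1;
    }

    // doc : part* EOF ;   part : hereDocument | ID ;
    static boolean doc(List<T> ts) {
        int i = 0;
        while (i < ts.size() && ts.get(i) != T.EOF) {
            if (ts.get(i) == T.ID) {
                i++;
                continue;
            }
            int next = hereDocument(ts, i);
            if (next < 0) {
                return false;
            }
            i = next;
        }
        // Succeed only if we stopped on an EOF that is the last token
        return i == ts.size() - 1;
    }

    public static void main(String[] args) {
        // Token types printed in the question: there, <<here, data, here, done, EOF
        List<T> ts = List.of(T.ID, T.HERE_START, T.DATA_BLOCK, T.HERE_END, T.ID, T.EOF);
        System.out.println(hereDocument(ts, 0)); // -1: the input starts with an ID
        System.out.println(doc(ts));             // true: doc accepts the whole input
    }
}
```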
I don't know what ANTLR tool/plugin you're using, but many of these tools do not run embedded code, which might be the cause of the error you're getting.
When I run this Java code:
String source = "here\n" +
    "there\n" +
    "<<here\n" +
    " sdfsdf\n" +
    " !@#$%^&*()_\n" +
    " 1234567890-\n" +
    " a111\n" +
    "here\n" +
    "done";
HereLexer lexer = new HereLexer(CharStreams.fromString(source));
HereDoc parser = new HereDoc(new CommonTokenStream(lexer));
System.out.println(parser.doc().toStringTree(parser));
the following output is printed (without errors):
(doc
  (part here)
  (part there)
  (part
    (hereDocument <<here \n sdfsdf\n !@#$%^&*()_\n 1234567890-\n a111\n here))
  (part done)
  <EOF>)
(I added the indentation afterwards)
Comments:
[…] new CommonTokenStream(lexer) twice? Your lexer has to be reset(), which rewinds the input back to the first char, but you don't do that. And you don't override it to reset your hereStart field. You need to learn how to use a debugger. – kaby76 Commented Mar 27 at 12:26
[…] parser.doc() because the root of the tree is doc. The code you give says you called parser.hereDocument(). hereDocument() is the wrong entry point for the parse because the input starts with an ID "there". You call tokens.reset(), which is wrong, not only because it's deprecated, but it's on a token stream that you ignore because you create a new one: new CommonTokenStream(lexer) a second time. Call lexer.reset() after printing the tokens. Do not create a second token stream. Call the correct parser entry point parser.doc(). – kaby76 Commented Mar 27 at 19:03
[…] lexer object and new character stream as well. lexer.reset() works fine. If the grammar writer writes a lexer base class, he must implement a reset() method. Otherwise, the state of the base class will not be reset. – kaby76 Commented Mar 27 at 19:12
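The point about resetting can be pictured with a small self-contained model. This is only a sketch under stated assumptions: `OnePassSource` is a hypothetical stand-in for a lexer (it is not an ANTLR class), `fill` imitates draining a token stream up to EOF, and the `hereStart` field mirrors the custom state from the grammar's `@members` block.

```java
import java.util.ArrayList;
import java.util.List;

class ResetDemo {
    // Hypothetical stand-in for a lexer: hands out its tokens once, then only "<EOF>"
    static class OnePassSource {
        private final String[] tokens;
        private int pos = 0;
        String hereStart = null; // custom state, like the field in @members

        OnePassSource(String... tokens) {
            this.tokens = tokens;
        }

        String nextToken() {
            if (pos >= tokens.length) {
                return "<EOF>";
            }
            String t = tokens[pos++];
            if (t.startsWith("<<")) {
                hereStart = t.substring(2); // remember the terminator text
            }
            return t;
        }

        // Rewind the input and clear the custom state as well
        void reset() {
            pos = 0;
            hereStart = null;
        }
    }

    // Drain the source up to and including "<EOF>"
    static List<String> fill(OnePassSource src) {
        List<String> out = new ArrayList<>();
        String t;
        do {
            t = src.nextToken();
            out.add(t);
        } while (!t.equals("<EOF>"));
        return out;
    }

    public static void main(String[] args) {
        OnePassSource src = new OnePassSource("there", "<<here", "data", "here", "done");
        System.out.println(fill(src)); // all five tokens followed by <EOF>
        System.out.println(fill(src)); // only <EOF>: the source is already exhausted
        src.reset();
        System.out.println(fill(src)); // all tokens again after the reset
    }
}
```

The second `fill` call sees nothing but `<EOF>`, which is the situation the comments describe for a second token stream created over an already consumed lexer.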