Question: I'm working on a custom parser using ANTLR to define a small programming language. One of the requirements is that return statements can only appear inside the body of a function. If a return statement appears outside a function, the parser should throw an error.

Here's the simplified grammar I'm working with (in ANTLR):

grammar Grammar;

options {
    language=Python3;
}

// Parser Rules
program: (var_decl | fun_decl)*;

fun_decl: type_spec ID '(' param_decl* (';' param_decl)* ')' body; // Function declarations
param_decl: type_spec ID (',' ID)* ; // Parameters for functions
type_spec: 'int' | 'float' ; // Valid types

body: '{' stmt* '}'; 
expr: 'expr';

stmt: assignment | call | r_return | var_decl;
var_decl: param_decl ';'; // Variable declarations
assignment: ID '=' expr ';';
call: ID '(' expr* (',' expr)* ')' ';';
r_return: 'return' expr ';';

// Lexer Rules
WS: [ \t\r\n] -> skip ; // Skip whitespace
ID: [a-zA-Z]+ ; // Identifiers (variable and function names)
ERROR_CHAR: . {raise ErrorToken(self.text)} ; // Error handling

The issue is that this grammar allows return statements (r_return) to appear anywhere a stmt is allowed, including in the global scope. For example:

int x;
return x; // This should throw an error.

But inside a function, it should work:

int myFunction() {
    return 42; // Valid
}

I thought about it but I did not come up with a solution. Please help me.

Asked 2 days ago by hdz1412 (new contributor). 2 comments:
  • 1 Your grammar already disallows all stmt outside of a body (which only appears as part of a fun_decl), so it should already be giving an error. If it is not, the problem is not in the shown part of the grammar -- perhaps you have no way of dealing with any error, so it is simply exiting with no message when an error occurs? – Chris Dodd Commented yesterday
  • You are correct. My problem is forgetting to add EOF. – hdz1412 Commented yesterday

2 Answers

Add EOF to the end of your program parser rule...

program: (var_decl | fun_decl)* EOF;

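...to cause the parser to indicate an error in your first test case. Without EOF, the program rule is not required to consume the whole input, so it can match int x; and then stop at return without reporting anything.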
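To see why the missing EOF hides the error, here is a minimal sketch in plain Python. It is a toy hand-written parser, not ANTLR and not the code ANTLR generates; the names tokenize, parse_var_decl, parse_program and the require_eof flag are made up for illustration, and the grammar is cut down to program: var_decl* with var_decl: ('int' | 'float') ID ';'.

```python
import re

# One token is either a run of letters or a single punctuation character.
TOKEN_RE = re.compile(r"\s*([A-Za-z]+|[;{}(),=])")


def tokenize(text):
    """Split the input into word and punctuation tokens, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_var_decl(tokens, i):
    """var_decl: ('int' | 'float') ID ';'  Returns the index after the match, or None."""
    if (
        i + 2 < len(tokens)
        and tokens[i] in ("int", "float")
        and tokens[i + 1].isalpha()
        and tokens[i + 2] == ";"
    ):
        return i + 3
    return None


def parse_program(tokens, require_eof):
    """program: var_decl* (EOF)?  Returns the number of tokens consumed."""
    i = 0
    while True:
        nxt = parse_var_decl(tokens, i)
        if nxt is None:
            break  # the loop simply stops at the first token it cannot match
        i = nxt
    # Only when end-of-input is part of the rule do leftover tokens become an error.
    if require_eof and i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return i
```

Calling parse_program(tokenize("int x;\nreturn x;"), require_eof=False) consumes the three tokens of int x; and returns normally, leaving return x; unread. With require_eof=True the same input raises a SyntaxError that names the return token, which is the effect adding EOF to the program rule is meant to have.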

Not directly related to your question, I suggest defining lexer rules such as...

OPEN_PAREN: '(';
CLOSE_PAREN: ')';
SEMICOLON: ';';
COMMA: ',';
OPEN_CURLY: '{';
CLOSE_CURLY: '}';
EQ: '=';
INT: 'int';
FLOAT: 'float';
RETURN: 'return';

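...to use in your parser rules instead of character literals. Place the keyword rules (INT, FLOAT, RETURN) before ID, because when two lexer rules match the same text ANTLR picks the one defined first.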

Another option: allow r_return only as the last statement inside r_body (the body rule, renamed here, which fun_decl must then reference) and remove it from stmt:

r_body
 : '{' stmt* r_return? '}'
 ;

stmt
 : assignment
 | call
 //| r_return <-- removed
 | var_decl
 ;

r_return
 : 'return' expr ';'
 ;
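
One consequence of this restructuring is that a return followed by further statements in the same body is also rejected. The following small Python sketch models only the shape stmt* r_return? over a list of statement kinds; it is not ANTLR code, and body_is_valid and the kind strings are hypothetical names chosen for illustration.

```python
def body_is_valid(stmt_kinds):
    """Check a list of statement kinds against r_body: '{' stmt* r_return? '}'."""
    plain_stmts = {"assignment", "call", "var_decl"}
    # Strip the optional trailing return; everything left must be a plain stmt.
    if stmt_kinds and stmt_kinds[-1] == "r_return":
        stmt_kinds = stmt_kinds[:-1]
    return all(kind in plain_stmts for kind in stmt_kinds)
```

Under this model, ["var_decl", "r_return"] is accepted, while ["r_return", "assignment"] is not, so whether an early return should be legal in the language is worth deciding before adopting this rule.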

Tags: parsing. Source: "How to Restrict return Statements to Function Declarations in ANTLR Grammar", Stack Overflow