validation - Recursive Descent Parser in Python not validating code - Stack Overflow

IT技术

更新时间：2025-03-200

admin管理员组
文章数量:1336281

I was tasked to write a top-down recursive descent parser for the Puck-24.3 programming language.

Task:

Language Token Types:

Integers are non-empty sequences of digits optionally preceded with either a ‘+’ or ‘-’ sign.

Decimal numbers are Integers followed by a ‘.’, followed by a non-empty sequence of digits.

Strings are any non-space sequences of characters enclosed in “”. e.g. “hello” “abc123”.

Keywords are the following strings: ‘PRINT’, ‘.’, ‘[’, ‘]’, ‘(’, ‘)’ and ‘^’.

Notice: keywords are uppercase

Operators are the special strings. In this homework we will use ‘:-’, ‘~’, ‘<’, ‘>’, '=', ‘#’, ‘+’, ‘-’, ‘&’, ‘OR’, ‘*’, ‘/’ and ‘AND’.

Identifiers are sequences of digits or letters. The first character must be a letter, and an identifier cannot be a Keyword.

Reminder: In Puck-24.3 tokens are always separated by white-spaces.

NOTE: For this assignment, your parser does not need to handle line comments.

Examples

Sample 1

Input:
x :- 2 + 2 .
PRINT ( x * 100 ) .
Expected output:
VALID
Sample 2

Input:
x :- 1 .
a [ i ] :- 2 .
w [ 3 ] ^ ch :- "a" .
t ^ key :- s .
p ^ next ^ data :- alpha .
x :- x + y .
y :- x - y .
c :- c + 1 .
PRINT ( c ) . 
Expected output:
VALID
Sample 3

Input:
PRINT ( a # b ) .
flag :- a > b .
a :- a - b .  b :- b - a .
PRINT ( "Hello" & "World!" ) .
Expected output:
VALID
Sample 4

Input:
foo :- ( PRINT ) ;
PRINT ( * )
Expected output:
INVALID!
Error: factor expected, got "PRINT" 
Note: you may output a different error message.

Sample 4

Input:
void getToken ( ) {
  cin >> token ;
}
Expected output:
INVALID!
Error: ":-" expected, got "getToken" 
Note: you may output a different error message.

My approach

Given a grammar, in order to build a top-down recursive descent parser, I follow the following technique:

For each non-terminal, I designed a corresponding function that parses sentences that can be generated by that non-terminal.

The function getToken() retrieves the next token from the input and stores it in a global variable token.

A non-terminal that has more than one production requires a check of the token to determine which production to apply. The token is compared with the first symbol in each production until a match is found. If no match is found, the program outputs an error and exits.

My code

import sys

# Initialize token and jetton
token = None
jetton = 0
tokens = []

def getToken():
    """Fetches the next token from the input and increments jetton."""
    global token, jetton
    if tokens:
        token = tokens.pop(0)
        jetton += 2  # Increment jetton for each token
    else:
        token = None

def error(expected):
    """Raise an error with an expected message."""
    print("INVALID!")
    print(f"Error: {expected}, got '{token}'")
    sys.exit(1)

def parse_Factor():
    """Parses the Factor rule."""
    global token
    if token.isdigit() or (token[0] in "+-" and token[1:].isdigit()):  # Integer
        getToken()
    elif token.startswith('"') and token.endswith('"'):  # String
        getToken()
    elif token.isidentifier():  # Identifier as a variable or designator
        getToken()
    elif token == "(":
        getToken()
        parse_Expression()
        if token == ")":
            getToken()
        else:
            error("')' expected")
    elif token == "~":
        getToken()
        parse_Factor()
    else:
        error("Factor expected")

def parse_Term():
    """Parses the Term rule."""
    parse_Factor()
    while token in ["*", "/", "AND"]:
        getToken()
        parse_Factor()

def parse_SimpleExpression():
    """Parses the SimpleExpression rule."""
    parse_Term()
    while token in ["+", "-", "OR", "&"]:
        getToken()
        parse_Term()

def parse_Expression():
    """Parses the Expression rule."""
    parse_SimpleExpression()
    if token in ["<", ">", "=", "#"]:  # Relation
        getToken()
        parse_SimpleExpression()

def parse_Designator():
    """Parses the Designator rule."""
    if token.isidentifier():
        getToken()
        while token in ["^", "["]:
            if token == "^":
                getToken()
                if token.isidentifier():
                    getToken()
                else:
                    error("Identifier expected after '^'")
            elif token == "[":
                getToken()
                parse_Expression()
                if token == "]":
                    getToken()
                else:
                    error("']' expected")
    else:
        error("Identifier expected")

def parse_Assignment():
    """Parses the Assignment rule."""
    parse_Designator()
    if token == ":-":
        getToken()
        parse_Expression()
        if token == ".":
            getToken()
        else:
            error("'.' expected")
    else:
        error("':-' expected")

def parse_PrintStatement():
    """Parses the PrintStatement rule."""
    if token == "PRINT":
        getToken()
        if token == "(":
            getToken()
            parse_Expression()
            if token == ")":
                getToken()
                if token == ".":
                    getToken()
                else:
                    error("'.' expected")
            else:
                error("')' expected")
        else:
            error("'(' expected")
    else:
        error("'PRINT' expected")

def parse_Statement():
    """Parses a single Statement."""
    if token.isidentifier():
        parse_Assignment()
    elif token == "PRINT":
        parse_PrintStatement()
    else:
        error("Statement expected")

def parse_StatementSequence():
    """Parses the StatementSequence rule."""
    while token:
        parse_Statement()

def main():
    """Main function to handle input and parsing."""
    global tokens
    print("Enter your program (end input with Ctrl+D):")
    try:
        # Read program from standard input
        program = sys.stdin.read()
        # Tokenize by whitespace
        tokens = program.split()
        getToken()  # Initialize first token
        parse_StatementSequence()  # Start parsing
        print("VALID")
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()

Problem

For the first 3 input samples, the program prints "INVALID" with an error message when it should print "VALID".

Error messages:

(sample 1)
INVALID!
Error: ':-' expected, got '('

(sample 2)
INVALID!
Error: Factor expected, got 'i'

(sample 3)
INVALID!
Error: Factor expected, got 'i'

What is the problem in my code?

I was tasked to write a top-down recursive descent parser for the Puck-24.3 programming language.

Task:

Language Token Types:

Integers are non-empty sequences of digits optionally preceded with either a ‘+’ or ‘-’ sign.

Decimal numbers are Integers followed by a ‘.’, followed by a non-empty sequence of digits.

Strings are any non-space sequences of characters enclosed in “”. e.g. “hello” “abc123”.

Keywords are the following strings: ‘PRINT’, ‘.’, ‘[’, ‘]’, ‘(’, ‘)’ and ‘^’.

Notice: keywords are uppercase

Operators are the special strings. In this homework we will use ‘:-’, ‘~’, ‘<’, ‘>’, '=', ‘#’, ‘+’, ‘-’, ‘&’, ‘OR’, ‘*’, ‘/’ and ‘AND’.

Identifiers are sequences of digits or letters. The first character must be a letter, and an identifier cannot be a Keyword.

Reminder: In Puck-24.3 tokens are always separated by white-spaces.

NOTE: For this assignment, your parser does not need to handle line comments.

Examples

Sample 1

Input:
x :- 2 + 2 .
PRINT ( x * 100 ) .
Expected output:
VALID
Sample 2

Input:
x :- 1 .
a [ i ] :- 2 .
w [ 3 ] ^ ch :- "a" .
t ^ key :- s .
p ^ next ^ data :- alpha .
x :- x + y .
y :- x - y .
c :- c + 1 .
PRINT ( c ) . 
Expected output:
VALID
Sample 3

Input:
PRINT ( a # b ) .
flag :- a > b .
a :- a - b .  b :- b - a .
PRINT ( "Hello" & "World!" ) .
Expected output:
VALID
Sample 4

Input:
foo :- ( PRINT ) ;
PRINT ( * )
Expected output:
INVALID!
Error: factor expected, got "PRINT" 
Note: you may output a different error message.

Sample 4

Input:
void getToken ( ) {
  cin >> token ;
}
Expected output:
INVALID!
Error: ":-" expected, got "getToken" 
Note: you may output a different error message.

My approach

Given a grammar, in order to build a top-down recursive descent parser, I follow the following technique:

For each non-terminal, I designed a corresponding function that parses sentences that can be generated by that non-terminal.

The function getToken() retrieves the next token from the input and stores it in a global variable token.

A non-terminal that has more than one production requires a check of the token to determine which production to apply. The token is compared with the first symbol in each production until a match is found. If no match is found, the program outputs an error and exits.

My code

import sys

# Initialize token and jetton
token = None
jetton = 0
tokens = []

def getToken():
    """Fetches the next token from the input and increments jetton."""
    global token, jetton
    if tokens:
        token = tokens.pop(0)
        jetton += 2  # Increment jetton for each token
    else:
        token = None

def error(expected):
    """Raise an error with an expected message."""
    print("INVALID!")
    print(f"Error: {expected}, got '{token}'")
    sys.exit(1)

def parse_Factor():
    """Parses the Factor rule."""
    global token
    if token.isdigit() or (token[0] in "+-" and token[1:].isdigit()):  # Integer
        getToken()
    elif token.startswith('"') and token.endswith('"'):  # String
        getToken()
    elif token.isidentifier():  # Identifier as a variable or designator
        getToken()
    elif token == "(":
        getToken()
        parse_Expression()
        if token == ")":
            getToken()
        else:
            error("')' expected")
    elif token == "~":
        getToken()
        parse_Factor()
    else:
        error("Factor expected")

def parse_Term():
    """Parses the Term rule."""
    parse_Factor()
    while token in ["*", "/", "AND"]:
        getToken()
        parse_Factor()

def parse_SimpleExpression():
    """Parses the SimpleExpression rule."""
    parse_Term()
    while token in ["+", "-", "OR", "&"]:
        getToken()
        parse_Term()

def parse_Expression():
    """Parses the Expression rule."""
    parse_SimpleExpression()
    if token in ["<", ">", "=", "#"]:  # Relation
        getToken()
        parse_SimpleExpression()

def parse_Designator():
    """Parses the Designator rule."""
    if token.isidentifier():
        getToken()
        while token in ["^", "["]:
            if token == "^":
                getToken()
                if token.isidentifier():
                    getToken()
                else:
                    error("Identifier expected after '^'")
            elif token == "[":
                getToken()
                parse_Expression()
                if token == "]":
                    getToken()
                else:
                    error("']' expected")
    else:
        error("Identifier expected")

def parse_Assignment():
    """Parses the Assignment rule."""
    parse_Designator()
    if token == ":-":
        getToken()
        parse_Expression()
        if token == ".":
            getToken()
        else:
            error("'.' expected")
    else:
        error("':-' expected")

def parse_PrintStatement():
    """Parses the PrintStatement rule."""
    if token == "PRINT":
        getToken()
        if token == "(":
            getToken()
            parse_Expression()
            if token == ")":
                getToken()
                if token == ".":
                    getToken()
                else:
                    error("'.' expected")
            else:
                error("')' expected")
        else:
            error("'(' expected")
    else:
        error("'PRINT' expected")

def parse_Statement():
    """Parses a single Statement."""
    if token.isidentifier():
        parse_Assignment()
    elif token == "PRINT":
        parse_PrintStatement()
    else:
        error("Statement expected")

def parse_StatementSequence():
    """Parses the StatementSequence rule."""
    while token:
        parse_Statement()

def main():
    """Main function to handle input and parsing."""
    global tokens
    print("Enter your program (end input with Ctrl+D):")
    try:
        # Read program from standard input
        program = sys.stdin.read()
        # Tokenize by whitespace
        tokens = program.split()
        getToken()  # Initialize first token
        parse_StatementSequence()  # Start parsing
        print("VALID")
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()

Problem

For the first 3 input samples, the program prints "INVALID" with an error message when it should print "VALID".

Error messages:

(sample 1)
INVALID!
Error: ':-' expected, got '('

(sample 2)
INVALID!
Error: Factor expected, got 'i'

(sample 3)
INVALID!
Error: Factor expected, got 'i'

What is the problem in my code?

Share edited Nov 20, 2024 at 11:34 trincot 352k36 gold badges272 silver badges325 bronze badges asked Nov 19, 2024 at 18:25 DaneShuler 435 bronze badges

Add a comment |

1 Answer 1

Sorted by: Reset to default 0

The cause of the problem you list is that your code will never call parse_PrintStatement. This is because isidentifier evaluates to True when token is "PRINT". Whereever you call isindentifier you should exclude the case where the token is "PRINT" (and also exclude "OR" and "AND").

Note how this bug also affects the result on sample 4: although your program outputs INVALID for that case, it didn't spot the problem with the "PRINT" token, but only complained about the semi-colon. If you replace the semi-colon with a dot, the input should still be considered invalid, but your program will then find no more issues in that input. The reason is the same as described above.

For example, you could create this helper function:

def is_identifier():
    return token.isidentifier() and token not in ("AND", "OR", "PRINT")

And then use that function instead of calling token.isidentifier().

Also, be aware that the rules for identifiers applied by the native isidentifier method are slightly different than what is described in your task. For instance, underscores are allowed as identifier by Python, but maybe not in this Puck language. Similar concerns may arise when using isdigit, isalpha...

Other remarks

Without knowing the grammar rules for this language, there might be more problems in your parser, but:

A quoted string literal cannot be just " (one character). Yet your code would consider this a valid string literal. You should make sure there are at least two characters in the token.
Your code does not recognise decimal literals, like 3.14.

本文标签： validationRecursive Descent Parser in Python not validating codeStack Overflow

版权声明：本文标题：validation - Recursive Descent Parser in Python not validating code - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1742406180a2468865.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

validation - Recursive Descent Parser in Python not validating code - Stack Overflow

Task:

Language Token Types:

Examples

Sample 1

Sample 2

Sample 3

Sample 4

Sample 4

My approach

My code

Problem

Task:

Language Token Types:

Examples

Sample 1

Sample 2

Sample 3

Sample 4

Sample 4

My approach

My code

Problem

1 Answer 1

Other remarks

更多相关文章

validation - Recursive Descent Parser in Python not validating code - Stack Overflow

发表评论

推荐文章

javascript - Closing Dropdown Menu by Clicking Outside the Box - Stack Overflow

javascript - window.print() repeating modal on every page - Stack Overflow

java - Maven: Create ZIP and store in JAR - Stack Overflow

Cron job for creating posts not excecuting properly

swift - Downloading from Camera Roll - Stack Overflow

热门文章

javascript - How to import a JSON5 file (as one can regular JSON) in Typescript? - Stack Overflow

mysql - How do I run debezium with Confluent platform? - Stack Overflow

java - JPA repository query that returns all rows with a nth-to-nth relation - Stack Overflow

php - Limit Authors to their Own Posts on front-end excluding admins

javascript - How can I insert a character after every 2 characters? - Stack Overflow

javascript - The better approach to design AngularJS services - Stack Overflow

c# - How to predefine subscription prices and trial period? - Stack Overflow

javascript - Adding autofocus attribute when input is hovered - Stack Overflow

jms - Azure Service Bus and Azure Container Apps - Redeploy fails as there alread is a client using tha sub against that topic -

javascript - Implementing circular scroller in jquery - Stack Overflow

最新文章

Windows安装配置Maven

下载Domino V10以及其系统要求

Mac装Ｗindows双系统

Windows下Anaconda的下载,安装与使用

windows安装Oracle11下载地址以及安装步骤带图详解每个步骤

javascript - How to use svg icons just like &quot;font awesome&quot; icons - Stack Overflow

python - Pandas Dataframe ffill with One Greater the Previous Nonzero Value - Stack Overflow

javascript - How to do slow scrolling like these website are doing - Stack Overflow

mysql - how to access 2nd Mysqli Database from different server for wordpress

javascript - Merging 2 arrays based on one common key angular 2 - Stack Overflow

惠普OMEN 15-CE001TX 2EF91PA参数报价

苹果新款MacBook Pro 15英寸 i732GB1TBVega Pro 20参数报价

联想Y330A-PSE L参数报价

神舟战神Z7 D6 i7-12650H16GB512GBRTX4050旗舰版参数报价

神舟战神Z7 D6 i7-12650H16GB1TBRTX4050参数报价

javascript - How to use svg icons just like "font awesome" icons - Stack Overflow