I want to build a little lexer and parser by myself. I want the lexer to produce a vector of tokens that I feed into the parser later. Now I am wondering what belongs in which stage.

Let's look at this input:

xy = 1.23

My token stream could be one of the following - or a mixture of both:

  1. letter letter whitespace eqsign whitespace digit dot digit digit
  2. identifier eqsign decimal

To process the input further, I need (2), of course. But to what extent should the lexer stage do that job? I could also imagine two consecutive lexer stages, in which Lexer1 produces (1) from a String and Lexer2 produces (2) from a List<Lexer1Token>.
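
The two-stage idea can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function names `lexer1` and `lexer2`, the tuple-based token representation, and the rule that an identifier is a letter followed by letters or digits are made up for this sketch and are not taken from any particular language or library.

```python
def lexer1(text):
    """Stage 1: classify every character on its own (granularity 1)."""
    def kind(ch):
        if ch.isalpha():
            return "letter"
        if ch.isdigit():
            return "digit"
        if ch.isspace():
            return "whitespace"
        return {"=": "eqsign", ".": "dot"}.get(ch, "other")
    return [(kind(ch), ch) for ch in text]


def lexer2(chars):
    """Stage 2: merge stage-1 tokens into coarser tokens (granularity 2)."""
    def text_of(start, end):
        return "".join(ch for _, ch in chars[start:end])

    tokens = []
    i = 0
    while i < len(chars):
        kind, ch = chars[i]
        if kind == "whitespace":
            i += 1  # whitespace only separates tokens, so it is dropped
        elif kind == "letter":
            j = i
            while j < len(chars) and chars[j][0] in ("letter", "digit"):
                j += 1
            tokens.append(("identifier", text_of(i, j)))
            i = j
        elif kind == "digit":
            j = i
            while j < len(chars) and chars[j][0] == "digit":
                j += 1
            # Optional fractional part: a dot followed by at least one digit.
            if j + 1 < len(chars) and chars[j][0] == "dot" and chars[j + 1][0] == "digit":
                j += 1
                while j < len(chars) and chars[j][0] == "digit":
                    j += 1
            tokens.append(("decimal", text_of(i, j)))
            i = j
        elif kind == "eqsign":
            tokens.append(("eqsign", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


print(lexer2(lexer1("xy = 1.23")))
```

Stage 1 carries almost no information beyond a character class, which is why the two stages are commonly collapsed into a single lexer that reads characters and emits tokens of granularity (2) directly.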

Similarly, for <b>test</b> in HTML, the tokens might be

  1. lt string gt string lt slash string gt
  2. opentag[type=b] string closingtag[type=b]
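
For the HTML case, granularity (2) could be produced along these lines. This is a minimal sketch, not a real HTML tokenizer: the name `tokenize_html` and the token labels are hypothetical, and it assumes tags without attributes, comments, or entities.

```python
import re

# Either a tag such as <b> or </b>, or a run of text up to the next '<'.
TAG_OR_TEXT = re.compile(
    r"<(?P<slash>/?)(?P<name>[A-Za-z][A-Za-z0-9]*)>|(?P<text>[^<]+)"
)


def tokenize_html(source):
    """Return tag-level tokens: opentag, closingtag, and string."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TAG_OR_TEXT.match(source, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        if m.group("text") is not None:
            tokens.append(("string", m.group("text")))
        elif m.group("slash"):
            tokens.append(("closingtag", m.group("name")))
        else:
            tokens.append(("opentag", m.group("name")))
        pos = m.end()
    return tokens


print(tokenize_html("<b>test</b>"))
```

Note that this sketch does not check whether a closing tag matches the preceding opening tag; in this split that check would be left to the parser.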

asked Nov 21, 2024 at 18:16 by MrSnrub; edited Nov 21, 2024 at 18:23 by MJane

1 Answer

Obviously it depends on your language (e.g. your language might need special handling of `.`), but for most cases you just need version 2: [identifier, equal, decimal] (I would call the equal token assign).

Let the lexer do as much as possible without getting into the domain of the parser (e.g. deciding whether the order of the tokens is valid, which is the parser's job).
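
As a rough illustration of that split, here is a minimal single-stage lexer in Python that produces version 2 directly. The token names, the `TOKEN_SPEC` table, and the `tokenize` function are assumptions made for this sketch; a real language would need more token types (plain integers, for instance, are rejected here).

```python
import re

# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("decimal", r"\d+\.\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign", r"="),
    ("skip", r"\s+"),      # whitespace is consumed but not emitted
    ("mismatch", r"."),    # any other single character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn source text into (kind, lexeme) pairs of granularity 2."""
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "skip":
            continue
        if kind == "mismatch":
            raise SyntaxError(f"unexpected character {m.group()!r} at offset {m.start()}")
        tokens.append((kind, m.group()))
    return tokens


print(tokenize("xy = 1.23"))
print(tokenize("= = xy"))
```

The second call succeeds even though `= = xy` is not a meaningful statement: the lexer only recognizes individual tokens, and rejecting an invalid order is left to the parser.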

Tags: parsing. Title: Granularity of tokens for lexer (Stack Overflow)