
For reference, I am learning from the book "Compilers: Principles, Techniques, and Tools" (also known as "the dragon book").

I am making a language where you can add two natural numbers. I have finished writing my Lexical Analyser. The tokens for the Lexical Analyser consist of the following.

data Token = Digit String | Plus | Minus | Multiply | Divide
    deriving (Show, Eq)

The tokens do contain other operations but I am only concerned with adding two natural numbers. The way I have built my Lexical Analyser is as follows.

  1. I first read the input string and extract every word from it. A word is any contiguous run of characters terminated by a space. The following function does that.
getAllWords :: String -> String -> [String]
getAllWords string word = case string of
    []     -> [word]
    ' ':xs -> word : (getAllWords xs "")
    x:xs   -> getAllWords xs (word ++ [x])
  2. Then I remove all the empty strings from this list (consecutive spaces leave a "" behind).
removeWhiteSpace :: [String] -> [String]
removeWhiteSpace list = case list of
    []    -> []
    "":xs -> removeWhiteSpace xs
    x:xs  -> x : (removeWhiteSpace xs)
  3. I have a DFA for every token in the language. I run a function tokeniser that passes every word from the list produced by (removeWhiteSpace (getAllWords list "")) to every DFA. If a DFA recognises the word, it outputs the relevant token. This function returns a list of all the tokens, reading the original string from left to right.
tokeniserHelper :: [String] -> [Token]
tokeniserHelper list = case list of
    [] -> []
    x:xs
        | digitDFA x S0    -> Digit x  : tokeniserHelper xs
        | plusDFA x S0     -> Plus     : tokeniserHelper xs
        | minusDFA x S0    -> Minus    : tokeniserHelper xs
        | multiplyDFA x S0 -> Multiply : tokeniserHelper xs
        | divideDFA x S0   -> Divide   : tokeniserHelper xs
        | otherwise        -> error ("tokeniser: unrecognised word " ++ show x)
 
tokeniser :: String -> [Token]
tokeniser string = tokeniserHelper (removeWhiteSpace (getAllWords string ""))
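
As a side note, this split-then-filter pipeline does the same job as the Prelude's `words` for space-separated input (`words` additionally splits on tabs and newlines), so it can be cross-checked against it. A minimal standalone sketch of the splitting stage:

```haskell
-- Standalone sketch of the splitting stage, for cross-checking against
-- the Prelude's `words`. The two agree on purely space-separated input.
getAllWords :: String -> String -> [String]
getAllWords string word = case string of
    []     -> [word]
    ' ':xs -> word : getAllWords xs ""
    x:xs   -> getAllWords xs (word ++ [x])

-- Dropping the empty strings left behind by consecutive spaces.
removeWhiteSpace :: [String] -> [String]
removeWhiteSpace = filter (not . null)

splitInput :: String -> [String]
splitInput s = removeWhiteSpace (getAllWords s "")
```

For example, `splitInput "12 +  34"` and `words "12 +  34"` both give `["12","+","34"]`.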

I realise that this is a bit different from the general idea of combining all the DFAs into one big DFA, but I found this approach simpler for a first attempt. It was also suggested in the book, so I decided to go with it.


Now I have moved on to the parser part and I am trying to implement a recursive descent parser to construct a parse tree. For the parse tree, I am using a rose tree as my base data structure.

data ParseTree a = Node a [ParseTree a]
    deriving (Show, Eq)
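
For example (reusing the Token type from the lexer), the expression 1 + 2 would be the rose tree with the operator at the root and the operands as leaves. A small self-contained sketch:

```haskell
data Token = Digit String | Plus | Minus | Multiply | Divide
    deriving (Show, Eq)

data ParseTree a = Node a [ParseTree a]
    deriving (Show, Eq)

-- The tree for "1 + 2": Plus at the root, the two digits as leaves.
onePlusTwo :: ParseTree Token
onePlusTwo = Node Plus [Node (Digit "1") [], Node (Digit "2") []]
```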

The grammar that I want to use is as follows.

E -> E + E | 0E | . . . | 9E | \epsilon

But I am not sure how to implement this grammar for the recursive descent parser. Could someone guide me on how to do it?

Also, am I heading in the right direction with building a compiler? Is there anything that I am doing fundamentally wrong?


Asked Nov 22, 2024 by Seeker (edited Nov 22, 2024).
  • "":xs -> removeWhiteSpace xs looks suspicious. Should it be ' '? – talex Commented Nov 22, 2024 at 7:26
  • @talexmovedtoCodidact When the “getAllWords” function sees a space, it adds “” to the list. – Seeker Commented Nov 22, 2024 at 8:28
  • 3 If you are following the Dragon Book you should also be aware of the Happy tool (hackage.haskell./package/happy) which is the Haskell version of YACC, and Alex (hackage.haskell./package/alex) which is the Haskell version of Lex. – Paul Johnson Commented Nov 22, 2024 at 10:53
  • @PaulJohnson I thought Lex was a pretty cool tool when I read about it. Thanks for sharing those! – Seeker Commented Nov 22, 2024 at 11:09

1 Answer


The usual approach in Haskell is to use a parser monad. This isn't something covered in the Dragon Book. I'm going to assume you understand monads in general. If you don't then I suggest going and looking at how things like the State, Reader and Random monads work first.

A simple parser monad for your tokens will look like this:

newtype Parser a = Parser { runParser :: [Token] -> [(a, [Token])] }

That means that a Parser takes a list of tokens and outputs a list of possible parses, along with the residual list of tokens that follow each item parsed.

In this case I've made the return type a list. This is the most general solution: it means that every possible parse is returned, allowing you to have an ambiguous grammar. However, it can only signal an error by returning an empty list, with no clue as to what went wrong.

So now you can write a parser rule like:

digit :: Parser (ParseTree Token)
digit = Parser $ \tokens -> case tokens of
  (Digit str : rest) -> [(Node (Digit str) [], rest)]
  _ -> []

This says that a digit parser will succeed if the first token is a Digit. In this case it returns the digit as a single leaf in a parse tree, along with the rest of the list. If it's anything else then the parser fails, returning an empty list of results. The Parser $ \tokens -> bit is just wrapping an anonymous function in the Parser type.

On its own this isn't much use, but you can combine these primitive parsers into more complex ones. The two main combinators you use are concatenation (i.e. parse this, then that) and alternation (i.e. parse either this or that). This is where the monad instances come in.

First, here is the monad instance for your parser.

instance Monad Parser where
  return a = Parser $ \tokens -> [(a, tokens)]
  v >>= f = Parser $ \tokens -> concatMap f2 $ runParser v tokens
    where f2 (a, rest) = runParser (f a) rest
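
Note that on modern GHC, Monad has Applicative (and hence Functor) as a superclass, so you will also need instances along these lines. A sketch for the list-of-results parser, with the types repeated so the snippet stands alone:

```haskell
newtype Parser a = Parser { runParser :: [Token] -> [(a, [Token])] }

data Token = Digit String | Plus | Minus | Multiply | Divide
    deriving (Show, Eq)

instance Functor Parser where
  -- Apply f to each possible parse, leaving the residual tokens alone.
  fmap f p = Parser $ \tokens ->
    [ (f a, rest) | (a, rest) <- runParser p tokens ]

instance Applicative Parser where
  -- The null parser: succeed with a value, consuming nothing.
  pure a = Parser $ \tokens -> [(a, tokens)]
  -- Run pf, then run pa on whatever tokens each parse left over.
  pf <*> pa = Parser $ \tokens ->
    [ (f a, rest') | (f, rest)  <- runParser pf tokens
                   , (a, rest') <- runParser pa rest ]
```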

The return function is a null parser: it returns a value without consuming anything.

The >>= (aka "bind") operator takes an existing parser v of type Parser a and a second function f of type a -> Parser b. It converts this into a single parser which does the v parser and passes the result to f, giving a second parser. This parser is then run on the rest of the tokens (i.e. what was left by the first parser) and returns the result.

I'm going to assume you know how do notation desugars into >>= calls.

So you can write something like this:

expression :: Parser (ParseTree Token)
expression = do
   arg1 <- digit
   op <- operator  -- Exercise: define operator :: Parser Token
   arg2 <- digit
   return $ Node op [arg1, arg2]

That will parse a simple digit + digit expression.

You will also need an operator for alternatives, and then you can construct your general recursive descent parser. See also the Alternative typeclass, which gives you the general typeclass for this concept.
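
For reference, here is a sketch of what an Alternative instance can look like for the list-of-results parser, again with the types and the obvious Functor/Applicative instances repeated so the snippet stands alone:

```haskell
import Control.Applicative (Alternative (..))

data Token = Digit String | Plus | Minus | Multiply | Divide
    deriving (Show, Eq)

newtype Parser a = Parser { runParser :: [Token] -> [(a, [Token])] }

instance Functor Parser where
  fmap f p = Parser $ \ts -> [ (f a, rest) | (a, rest) <- runParser p ts ]

instance Applicative Parser where
  pure a = Parser $ \ts -> [(a, ts)]
  pf <*> pa = Parser $ \ts ->
    [ (f a, rest') | (f, rest)  <- runParser pf ts
                   , (a, rest') <- runParser pa rest ]

instance Alternative Parser where
  -- empty always fails: there are no possible parses.
  empty = Parser $ \_ -> []
  -- p <|> q returns every parse of p and every parse of q,
  -- which is what lets an ambiguous grammar yield several trees.
  p <|> q = Parser $ \ts -> runParser p ts ++ runParser q ts

-- A tiny primitive to demonstrate alternation with.
plusToken :: Parser Token
plusToken = Parser $ \ts -> case ts of
  (Plus : rest) -> [(Plus, rest)]
  _             -> []
```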

I said at the top that the parser type I gave isn't very good. It can't do error messages, and it can be very inefficient if there are multiple ways of parsing some tokens which then turn out not to work. So a more usual approach is like this:

data Result a = Result a | Error String

newtype Parser a = Parser { runParser :: [Token] -> (Result a, [Token]) }

This either returns a definite result or an error message. It's still not very sophisticated: there is no way of tracking line numbers, for instance. But it's an improvement, and will do for a simple expression parser exercise. I'll leave the Monad and other instances for you to figure out. They aren't complicated, except that you will need to decide what to do with multiple error messages in the alternative case. (Clue: make the error case data more sophisticated.)

In response to your queries: your lexer isn't very sophisticated, but it will do for now. It's actually easier to just make your lexer a recursive-descent parser. The only reason to separate lexer and parser in this case is to make error handling easier.

Your result shouldn't be a rose tree. You should define a custom data type for your expressions, something like:

data Expr = Value String | Plus Expr Expr | Times Expr Expr ....

Then you can mirror this structure in your parser.
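
A sketch of how that mirroring can look for the addition case, using hypothetical primitive names (valueP, plusToken) and repeating the list-of-results parser so the snippet stands alone:

```haskell
data Token = Digit String | Plus | Minus | Multiply | Divide
    deriving (Show, Eq)

newtype Parser a = Parser { runParser :: [Token] -> [(a, [Token])] }

instance Functor Parser where
  fmap f p = Parser $ \ts -> [ (f a, rest) | (a, rest) <- runParser p ts ]

instance Applicative Parser where
  pure a = Parser $ \ts -> [(a, ts)]
  pf <*> pa = Parser $ \ts ->
    [ (f a, rest') | (f, rest)  <- runParser pf ts
                   , (a, rest') <- runParser pa rest ]

instance Monad Parser where
  v >>= f = Parser $ \ts ->
    concatMap (\(a, rest) -> runParser (f a) rest) (runParser v ts)

-- The expression type mirrors the grammar directly.
data Expr = Value String | Add Expr Expr
    deriving (Show, Eq)

-- Hypothetical primitives: one for digits, one for '+'.
valueP :: Parser Expr
valueP = Parser $ \ts -> case ts of
  (Digit s : rest) -> [(Value s, rest)]
  _                -> []

plusToken :: Parser Token
plusToken = Parser $ \ts -> case ts of
  (Plus : rest) -> [(Plus, rest)]
  _             -> []

-- digit '+' digit, built straight into an Expr rather than a rose tree.
addP :: Parser Expr
addP = do
  a <- valueP
  _ <- plusToken
  b <- valueP
  return (Add a b)
```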
