Recursive Decent Parser (example)
- Posted by xecronix 1 week ago
- 192 views
Recursive Decent Parser (example) Just the parser bit is about 100 lines of code. Comments add a little bulk.
Update:
- I just noticed I forgot to include unary -/+. So, I fixed it.
- Noticed tabs and spaces both used for indentation. Fixed it. tab = space * 4
- Fixed "pyramid" code by reducing block level nesting.
forward function main() forward function opAddSub() forward function opMultDiv() forward function opPrefix() forward function opPower() forward function parseTerm() forward function advance_ptr() -- --- This parser is VERY unforgiving by way of expression format. --- -- Most notably the spaces between terms, operations, and parentheses are mandatory as written. -- I didn't write a robust tokenizer for this. It's just an expression parser as simple as -- I know how to make it. -- These were the test cases I used. Very light weight testing. Exactly -- Zero (0) -- tests -- for malformed expressions. This parser should not be considered production-ready without -- some real tests. But it's probably close. --sequence str = "2 + 3 * ( 4 + 5 )" --sequence str = "2 + 3 * 4" --sequence str = "2 * 3 + 4 * 6" --sequence str = "2 + 2 ^ 3" --sequence str = "4 ^ 3" --sequence str = "( 2 + 2 ) ^ 3" --sequence str = "1 + 2 + 4" --sequence str = "1 + 2 - 4" --sequence str = "4 ^ 2 ^ 2" --sequence str = "2 * 3 ^ 2" --sequence str = "( 2 + 3 ) * ( 4 + 5 )" --sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1" --sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )" --sequence str = "- 2 ^ 2" sequence str = "- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1" -- The simplest tokenizer I know of. sequence expression = split(str, " ") -- simulated pointer. Because for me, it kept things simple. integer ptr = 1 -- But even if my fake pointer is "simple," a little safety doesn't hurt. function advance_ptr() if ptr < length(expression) then ptr += 1 return 1 else return 0 end if end function -- In this parser, the lowest operation is addition and subtraction. -- So, this is the entry point to parsing expressions. function opAddSub() atom left = opMultDiv() -- Order of operations states multiply and divide come before add and subtract. sequence op = expression[ptr] sequence ops = "+-" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then exit -- if we couldn't advance the pointer, bail. end if atom right = opMultDiv() -- this moves the pointer to the next op. if op = "+" then left = left + right else left = left - right end if op = expression[ptr] -- now we're ready to read the next op end while return left end function -- This is very similar to opAddSub. The difference is this one is -- doing different operations and has a divide by 0 check. function opMultDiv() atom left = opPrefix() -- Order of operations states exponents come before multiply and divide. sequence op = expression[ptr] sequence ops = "*/" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then exit end if atom right = opPrefix() -- this moves the pointer to the next op. if op = "*" then left = left * right elsif right = 0 then puts(1, "Error, cannot divide by 0.\n") abort(1) else left = left / right end if op = expression[ptr] -- now we're ready to read the next op end while return left end function -- This is how unary signs are handled. Essentially, what's happening here is that there is a sign -- where a number should be. So, we'll handle it. function opPrefix() integer sign = 1 sequence tok = expression[ptr] while tok = "+" or tok = "-" do if tok = "-" then sign = -sign end if -- in case someone does something like ----4 if not advance_ptr() then exit end if tok = expression[ptr] end while return sign * opPower() end function -- This looks the same as the other two, but it isn't. Exponents are evaluated -- Right to Left. The other operations are evaluated from Left to Right. -- This tripped me up a little. Notice this is the first time -- We're actually doing recursion. The right side of the binary expression comes -- from calling itself. I'll point it out with a comment. function opPower() atom left = parseTerm() sequence op = expression[ptr] sequence ops = "^" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then exit end if atom right = opPower() -- this is calling itself and will cause 4 ^ 2 ^ 2 to be evaluated like 4 ^ ( 2 ^ 2 ) left = power(left, right) op = expression[ptr] -- now we're ready to read the next op end while return left end function -- NOTE -- This function moves the pointer. Be aware of that, please. -- The objective of this function is to return a single atom value. -- We'll deal with params here. If I were going to add variables or -- function calls to this parser, this is where I would do it. function parseTerm() sequence token = expression[ptr]; if atom(to_number(token)) then atom success = advance_ptr() -- <<-- This is why the pointer advances in a seemingly sneaky way. Sorry. return to_number(token); elsif token = "(" then atom success = advance_ptr() -- consume the ( <<-- pointer moves if not success then printf(1, "No term after open ( ", {token}) abort(0) end if atom retval = opAddSub() if expression[ptr] != ")" then printf(1, "expected ) but got %s", {expression[ptr]}) abort(0) end if success = advance_ptr() -- consume the ) <<-- pointer moves return retval else printf(1, "expected number or ( but got %s", {token}) abort(0); end if end function function main() atom num = opAddSub(); return num end function printf (1, "main exited with %f\n", {main()})