1. Recursive Decent Parser (example)
- Posted by xecronix 1 month ago
- 372 views
Recursive Decent Parser (example) Just the parser bit is about 100 lines of code. Comments add a little bulk.
Update:
- I just noticed I forgot to include unary -/+. So, I fixed it.
- Noticed tabs and spaces both used for indentation. Fixed it. tab = space * 4
- Fixed "pyramid" code by reducing block level nesting.
forward function main() forward function opAddSub() forward function opMultDiv() forward function opPrefix() forward function opPower() forward function parseTerm() forward function advance_ptr() -- --- This parser is VERY unforgiving by way of expression format. --- -- Most notably the spaces between terms, operations, and parentheses are mandatory as written. -- I didn't write a robust tokenizer for this. It's just an expression parser as simple as -- I know how to make it. -- These were the test cases I used. Very light weight testing. Exactly -- Zero (0) -- tests -- for malformed expressions. This parser should not be considered production-ready without -- some real tests. But it's probably close. --sequence str = "2 + 3 * ( 4 + 5 )" --sequence str = "2 + 3 * 4" --sequence str = "2 * 3 + 4 * 6" --sequence str = "2 + 2 ^ 3" --sequence str = "4 ^ 3" --sequence str = "( 2 + 2 ) ^ 3" --sequence str = "1 + 2 + 4" --sequence str = "1 + 2 - 4" --sequence str = "4 ^ 2 ^ 2" --sequence str = "2 * 3 ^ 2" --sequence str = "( 2 + 3 ) * ( 4 + 5 )" --sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1" --sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )" --sequence str = "- 2 ^ 2" sequence str = "- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1" -- The simplest tokenizer I know of. sequence expression = split(str, " ") -- simulated pointer. Because for me, it kept things simple. integer ptr = 1 -- But even if my fake pointer is "simple," a little safety doesn't hurt. function advance_ptr() if ptr < length(expression) then ptr += 1 return 1 else return 0 end if end function -- In this parser, the lowest operation is addition and subtraction. -- So, this is the entry point to parsing expressions. function opAddSub() atom left = opMultDiv() -- Order of operations states multiply and divide come before add and subtract. sequence op = expression[ptr] sequence ops = "+-" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then exit -- if we couldn't advance the pointer, bail. end if atom right = opMultDiv() -- this moves the pointer to the next op. if op = "+" then left = left + right else left = left - right end if op = expression[ptr] -- now we're ready to read the next op end while return left end function -- This is very similar to opAddSub. The difference is this one is -- doing different operations and has a divide by 0 check. function opMultDiv() atom left = opPrefix() -- Order of operations states exponents come before multiply and divide. sequence op = expression[ptr] sequence ops = "*/" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then exit end if atom right = opPrefix() -- this moves the pointer to the next op. if op = "*" then left = left * right elsif right = 0 then puts(1, "Error, cannot divide by 0.\n") abort(1) else left = left / right end if op = expression[ptr] -- now we're ready to read the next op end while return left end function -- This is how unary signs are handled. Essentially, what's happening here is that there is a sign -- where a number should be. So, we'll handle it. function opPrefix() integer sign = 1 sequence tok = expression[ptr] while tok = "+" or tok = "-" do if tok = "-" then sign = -sign end if -- in case someone does something like ----4 if not advance_ptr() then exit end if tok = expression[ptr] end while return sign * opPower() end function -- This looks the same as the other two, but it isn't. Exponents are evaluated -- Right to Left. The other operations are evaluated from Left to Right. -- This tripped me up a little. Notice this is the first time -- We're actually doing recursion. The right side of the binary expression comes -- from calling itself. I'll point it out with a comment. function opPower() atom left = parseTerm() sequence op = expression[ptr] sequence ops = "^" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then exit end if atom right = opPower() -- this is calling itself and will cause 4 ^ 2 ^ 2 to be evaluated like 4 ^ ( 2 ^ 2 ) left = power(left, right) op = expression[ptr] -- now we're ready to read the next op end while return left end function -- NOTE -- This function moves the pointer. Be aware of that, please. -- The objective of this function is to return a single atom value. -- We'll deal with params here. If I were going to add variables or -- function calls to this parser, this is where I would do it. function parseTerm() sequence token = expression[ptr]; if atom(to_number(token)) then atom success = advance_ptr() -- <<-- This is why the pointer advances in a seemingly sneaky way. Sorry. return to_number(token); elsif token = "(" then atom success = advance_ptr() -- consume the ( <<-- pointer moves if not success then printf(1, "No term after open ( ", {token}) abort(0) end if atom retval = opAddSub() if expression[ptr] != ")" then printf(1, "expected ) but got %s", {expression[ptr]}) abort(0) end if success = advance_ptr() -- consume the ) <<-- pointer moves return retval else printf(1, "expected number or ( but got %s", {token}) abort(0); end if end function function main() atom num = opAddSub(); return num end function printf (1, "main exited with %f\n", {main()})
2. Re: Recursive Decent Parser (example)
- Posted by petelomax 1 month ago
- 324 views
Just so everyone knows, that's a Phix program, btw (obvs I approve), and won't run as-is on Euphoria, not that it would be difficult to tweak (eg token="(" ==> equal(token,"(")).
3. Re: Recursive Decent Parser (example)
- Posted by xecronix 1 month ago
- 322 views
Now, pure Euphoria. Slightly less tested. :)
include std/convert.e include std/sequence.e -- --- This parser is VERY unforgiving by way of expression format. --- -- Most notably, the spaces between terms, operations, and parentheses are mandatory as written. -- I didn't write a robust tokenizer for this. It's just an expression parser as simple as -- I know how to make it. -- These were the test cases I used. Very light weight testing. Exactly -- Zero (0) -- tests -- for malformed expressions. This parser should not be considered production-ready without -- some real tests. But it's probably close. --sequence str = "2 + 3 * ( 4 + 5 )" --sequence str = "2 + 3 * 4" --sequence str = "2 * 3 + 4 * 6" --sequence str = "2 + 2 ^ 3" --sequence str = "4 ^ 3" --sequence str = "( 2 + 2 ) ^ 3" --sequence str = "1 + 2 + 4" --sequence str = "1 + 2 - 4" --sequence str = "4 ^ 2 ^ 2" --sequence str = "2 * 3 ^ 2" --sequence str = "( 2 + 3 ) * ( 4 + 5 )" --sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1" --sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )" --sequence str = "- 2 ^ 2" sequence str = "- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1" --sequence str = "( 4 + 5 )" -- The simplest tokenizer I know of. sequence expression = split(str, " ") -- simulated pointer. Because for me, it kept things simple. integer ptr = 1 -- But even if my fake pointer is "simple," a little safety doesn't hurt. function advance_ptr() if ptr < length(expression) then ptr += 1 return 1 end if return 0 end function -- In this parser, the lowest operation is addition and subtraction. -- So, this is the entry point to parsing expressions. function opAddSub() atom left = opMultDiv() -- Order of operations states multiply and divide come before add and subtract. sequence op = expression[ptr] sequence ops = "+-" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then -- Move to the next token, probably a number. exit -- if we couldn't advance the pointer, bail. end if atom right = opMultDiv() -- fetch an number and move the pointer to the next token. Probably an op. if equal(op, "+") then left = left + right else left = left - right end if op = expression[ptr] -- now we're ready to read the next op end while return left end function -- This is very similar to opAddSub. The difference is this one is -- doing different operations and has a divide by 0 check. function opMultDiv() atom left = opPrefix() -- Order of operations states exponents come before multiply and divide. sequence op = expression[ptr] sequence ops = "*/" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then exit end if atom right = opPrefix() -- this moves the pointer to the next op. if equal(op, "*") then left = left * right elsif right = 0 then puts(1, "Error, cannot divide by 0.\n") abort(1) else left = left / right end if op = expression[ptr] -- now we're ready to read the next op end while return left end function -- This is how unary signs are handled. Essentially, what's happening here is that there is a sign -- where a number should be. So, we'll handle it. function opPrefix() integer sign = 1 sequence tok = expression[ptr] while equal(tok, "+") or equal(tok, "-") do if equal(tok, "-") then sign = -sign end if -- in case someone does something like ----4 if not advance_ptr() then exit end if tok = expression[ptr] end while return sign * opPower() end function -- This looks the same as the other two, but it isn't. Exponents are evaluated -- Right to Left. The other operations are evaluated from Left to Right. -- This tripped me up a little. Notice this is the first time -- We're actually doing recursion. The right side of the binary expression comes -- from calling itself. I'll point it out with a comment. function opPower() atom left = parseTerm() sequence op = expression[ptr] sequence ops = "^" printf(1, "Looking for %s in %s\n", {op[1], ops}) while find(op[1], ops) do if not advance_ptr() then exit end if atom right = opPower() -- this is calling itself and will cause 4 ^ 2 ^ 2 to be evaluated like 4 ^ ( 2 ^ 2 ) left = power(left, right) op = expression[ptr] -- now we're ready to read the next op end while return left end function -- NOTE -- This function moves the pointer. Be aware of that, please. -- The objective of this function is to return a single atom value. -- We'll deal with parentheses here. If I were going to add variables or -- function calls to this parser, this is where I would do it. function parseTerm() sequence token = expression[ptr] if is_number(token) then atom success = advance_ptr() -- <<-- This is why the pointer advances in a seemingly sneaky way. Sorry. return to_number(token) elsif equal(token, "(") then atom success = advance_ptr() -- consume the ( <<-- pointer moves if not success then printf(1, "No term after open ( ", {token}) abort(0) end if atom retval = opAddSub() -- <<-- This is recursion if not equal(expression[ptr], ")") then printf(1, "expected ) but got %s", {expression[ptr]}) abort(0) end if success = advance_ptr() -- consume the ) <<-- pointer moves return retval else printf(1, "expected number or ( but got %s", {token}) abort(0) end if end function function is_number(sequence maybe) integer found_dot = 0 integer i = 1 integer c = 0 integer retval = 1 while i < (length(maybe) + 1) do c = maybe[i] if equal(c, '.') and found_dot then return 0 end if if equal(c, '.') then found_dot = 1 elsif not find(c, "0123456789") then return 0 end if i+=1 end while return retval end function function main() atom num = opAddSub() return num end function printf (1, "main exited with %f\n", {main()})
4. Re: Recursive Decent Parser (example)
- Posted by xecronix 1 month ago
- 304 views
FWIW. I replaced main() in both Phix and Euphoria code with:
function main() sequence tests = { {"2 + 3 * ( 4 + 5 )", "29.000000"}, {"2 + 3 * 4", "14.000000"}, {"2 * 3 + 4 * 6", "30.000000"}, {"2 + 2 ^ 3", "10.000000"}, {"4 ^ 3", "64.000000"}, {"( 2 + 2 ) ^ 3", "64.000000"}, {"1 + 2 + 4" , "7.000000"}, {"1 + 2 - 4" , "-1.000000"}, {"4 ^ 2 ^ 2", "256.000000"}, {"2 * 3 ^ 2", "18.000000"}, {"( 2 + 3 ) * ( 4 + 5 )", "45.000000"}, {"( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1", "-3.000122"}, {"( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )", "3.000122"}, {"- 2 ^ 2", "-4.000000"}, {"- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1", "6.000000"} } sequence results = {} integer i = 1 while i < (length(tests) + 1) do --The simplest tokenizer I know of. expression = split(tests[i][1], " ") ptr = 1 sequence result = sprintf("%f", {opAddSub()}) sequence pass_fail = "PASS" if not equal(tests[i][2], result) then pass_fail = "FAIL" end if results = append(results, sprintf("%s: Test: [%s] Expected: [%s] Got: [%s]\n", {pass_fail, tests[i][1], tests[i][2], result})) i += 1 end while i = 1 while i < length(results) + 1 do puts(1, results[i]) i += 1 end while return 0 end function
And this is the result: (Answers limited to 6 decimal places and verified by google)
PASS: Test: [2 + 3 * ( 4 + 5 )] Expected: [29.000000] Got: [29.000000] PASS: Test: [2 + 3 * 4] Expected: [14.000000] Got: [14.000000] PASS: Test: [2 * 3 + 4 * 6] Expected: [30.000000] Got: [30.000000] PASS: Test: [2 + 2 ^ 3] Expected: [10.000000] Got: [10.000000] PASS: Test: [4 ^ 3] Expected: [64.000000] Got: [64.000000] PASS: Test: [( 2 + 2 ) ^ 3] Expected: [64.000000] Got: [64.000000] PASS: Test: [1 + 2 + 4] Expected: [7.000000] Got: [7.000000] PASS: Test: [1 + 2 - 4] Expected: [-1.000000] Got: [-1.000000] PASS: Test: [4 ^ 2 ^ 2] Expected: [256.000000] Got: [256.000000] PASS: Test: [2 * 3 ^ 2] Expected: [18.000000] Got: [18.000000] PASS: Test: [( 2 + 3 ) * ( 4 + 5 )] Expected: [45.000000] Got: [45.000000] PASS: Test: [( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1] Expected: [-3.000122] Got: [-3.000122] PASS: Test: [( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )] Expected: [3.000122] Got: [3.000122] PASS: Test: [- 2 ^ 2] Expected: [-4.000000] Got: [-4.000000] PASS: Test: [- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1] Expected: [6.000000] Got: [6.000000] main exited with 0.000000
5. Re: Recursive Decent Parser (example)
- Posted by RobertS 3 weeks ago
- 186 views
I think I've saved myself a lot of trouble by using RPN for my Hypatia calculator ...
6. Re: Recursive Decent Parser (example)
- Posted by xecronix 3 weeks ago
- 174 views
Did you write it in Euphoria? Is it shareable?
UPDATE: I see it is shareable. Cool. Reverse Polish Notation Calculator
7. Re: Recursive Decent Parser (example)
- Posted by RobertS 3 weeks ago
- 168 views
Did you write it in Euphoria? Is it shareable?
Originally it was in Euphoria, but there were a few problems which I couldn't resolve - I don't fully remember what they were, but they went away when I tried Phix, so it's been Phix ever since.
Yes, you can freely download it from hypatia-rpn.net, also the source code, though this is regrettably poorly documented.
If you have a look at it, I'd be happy to hear what you think!
8. Re: Recursive Decent Parser (example)
- Posted by xecronix 3 weeks ago
- 154 views
Great work! There are quite a few gold nuggets in your code-some of which are directly relevant to BzScript, a language I'm currently working on. Honestly, it's probably the most digestible example of a REPL implementation I've seen so far.
To be truthful, I've spent about 30 minutes reviewing your code. With dedicated time, it would likely take a few days to fully digest it all. That said, it's absolutely worth the effort. There are clear study pieces in there that stand out.
Thanks again for sharing this solid example.
By the way, I'm curious about your comment that reading my RDP example made you feel good about choosing RPN. What was it that helped reinforce your decision? BZScript did some sort of hybrid Pratt/RDP to create an AST. Which, honestly, is only one step away from linear eval, RPN as I see it. I've strongly considered starting execution at this point (AST). But, one more iteration and I could have a linear representation (RPN). I'm wondering if it's worth the effort. (tree vs linear)
9. Re: Recursive Decent Parser (example)
- Posted by RobertS 3 weeks ago
- 116 views
Thank you for your praise! As I've said and you have now seen for yourself, the code is poorly documented, it has grown slowly and not always systematically (to put it mildly) over a period of almost 20 years, and I tend to use variable names that are short and convey little information ... If you have any questions please feel free to ask, you can reach me by mail, robert@drs.at
By the way, I'm curious about your comment that reading my RDP example made you feel good about choosing RPN. What was it that helped reinforce your decision?
Oh, there was no deep meaning to those words - looking at your code just showed me again that it isn't trivial to parse infix notation, and made me glad that I never tried it. :) Even though laziness wasn't my only motive to avoid it, I really like RPN.