1. Recursive Decent Parser (example)

Recursive Decent Parser (example) Just the parser bit is about 100 lines of code. Comments add a little bulk.

Update:

  • I just noticed I forgot to include unary -/+. So, I fixed it.
  • Noticed tabs and spaces both used for indentation. Fixed it. tab = space * 4
  • Fixed "pyramid" code by reducing block level nesting.
 
forward function main() 
forward function opAddSub() 
forward function opMultDiv() 
forward function opPrefix() 
forward function opPower() 
forward function parseTerm() 
forward function advance_ptr() 
 
-- ---    This parser is VERY unforgiving by way of expression format. --- 
-- Most notably the spaces between terms, operations, and parentheses are mandatory as written. 
-- I didn't write a robust tokenizer for this.    It's just an expression parser as simple as  
-- I know how to make it.     
 
-- These were the test cases I used.    Very light weight testing.    Exactly -- Zero (0) -- tests  
-- for malformed expressions.    This parser should not be considered production-ready without 
-- some real tests.    But it's probably close. 
 
--sequence str = "2 + 3 * ( 4 + 5 )" 
--sequence str = "2 + 3 * 4" 
--sequence str = "2 * 3 + 4 * 6" 
--sequence str = "2 + 2 ^ 3" 
--sequence str = "4 ^ 3" 
--sequence str = "( 2 + 2 ) ^ 3" 
--sequence str = "1 + 2 + 4"  
--sequence str = "1 + 2 - 4"  
--sequence str = "4 ^ 2 ^ 2" 
--sequence str = "2 * 3 ^ 2" 
--sequence str = "( 2 + 3 ) * ( 4 + 5 )" 
--sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1" 
--sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )" 
--sequence str = "- 2 ^ 2" 
sequence str = "- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1" 
 
-- The simplest tokenizer I know of. 
sequence expression = split(str, " ") 
 
-- simulated pointer. Because for me, it kept things simple. 
integer ptr = 1 
 
-- But even if my fake pointer is "simple," a little safety doesn't hurt. 
function advance_ptr()  
    if ptr < length(expression) then 
        ptr += 1 
    return 1  
    else 
        return 0 
    end if 
end function 
 
-- In this parser, the lowest operation is addition and subtraction.     
-- So, this is the entry point to parsing expressions. 
function opAddSub() 
    atom left = opMultDiv() -- Order of operations states multiply and divide come before add and subtract. 
    sequence op = expression[ptr] 
    sequence ops = "+-" 
    printf(1, "Looking for %s in %s\n", {op[1], ops}) 
    while find(op[1], ops) do  
        if not advance_ptr() then 
            exit -- if we couldn't advance the pointer, bail. 
        end if 
        atom right = opMultDiv()    -- this moves the pointer to the next op. 
        if op = "+" then 
            left = left + right 
        else 
            left = left - right 
        end if 
        op = expression[ptr] -- now we're ready to read the next op 
    end while 
    return left 
end function 
 
-- This is very similar to opAddSub.    The difference is this one is  
-- doing different operations and has a divide by 0 check. 
function opMultDiv() 
    atom left = opPrefix() -- Order of operations states exponents come before multiply and divide. 
    sequence op = expression[ptr] 
    sequence ops = "*/" 
    printf(1, "Looking for %s in %s\n", {op[1], ops}) 
    while find(op[1], ops) do  
        if not advance_ptr() then 
            exit 
        end if 
         
        atom right = opPrefix()    -- this moves the pointer to the next op. 
        if op = "*" then 
            left = left * right 
        elsif right = 0 then 
            puts(1, "Error, cannot divide by 0.\n") 
            abort(1) 
        else 
            left = left / right 
        end if 
        op = expression[ptr] -- now we're ready to read the next op 
    end while 
    return left 
end function 
 
-- This is how unary signs are handled.    Essentially, what's happening here is that there is a sign  
-- where a number should be.    So, we'll handle it. 
function opPrefix() 
    integer sign = 1 
    sequence tok = expression[ptr] 
    while tok = "+" or tok = "-" do 
        if tok = "-" then sign = -sign end if --    in case someone does something like ----4  
        if not advance_ptr() then exit end if 
        tok = expression[ptr] 
    end while 
    return sign * opPower() 
end function 
 
-- This looks the same as the other two, but it isn't.    Exponents are evaluated 
-- Right to Left.    The other operations are evaluated from Left to Right.     
-- This tripped me up a little.    Notice this is the first time  
-- We're actually doing recursion. The right side of the binary expression comes 
-- from calling itself.  I'll point it out with a comment.     
function opPower() 
    atom left = parseTerm() 
    sequence op = expression[ptr] 
    sequence ops = "^" 
    printf(1, "Looking for %s in %s\n", {op[1], ops}) 
    while find(op[1], ops) do  
        if not advance_ptr() then 
            exit 
        end if 
         
        atom right = opPower()    -- this is calling itself and will cause 4 ^ 2 ^ 2 to be evaluated like 4 ^ ( 2 ^ 2 ) 
        left = power(left, right) 
        op = expression[ptr] -- now we're ready to read the next op 
    end while 
    return left 
end function 
 
-- NOTE -- This function moves the pointer.    Be aware of that, please. 
-- The objective of this function is to return a single atom value. 
-- We'll deal with params here.    If I were going to add variables or 
-- function calls to this parser, this is where I would do it. 
function parseTerm() 
    sequence token = expression[ptr]; 
    if atom(to_number(token)) then 
        atom success = advance_ptr() -- <<-- This is why the pointer advances in a seemingly sneaky way.    Sorry. 
        return to_number(token); 
    elsif token = "(" then 
        atom success = advance_ptr() -- consume the (    <<-- pointer moves  
        if not success then 
            printf(1, "No term after open ( ", {token}) 
            abort(0) 
        end if 
        atom retval = opAddSub() 
        if expression[ptr] != ")" then 
            printf(1, "expected ) but got %s", {expression[ptr]})      
            abort(0)         
        end if 
        success = advance_ptr() -- consume the )    <<-- pointer moves  
        return retval 
    else 
        printf(1, "expected number or ( but got %s", {token}) 
        abort(0); 
    end if 
end function 
 
function main() 
    atom num = opAddSub(); 
    return num 
end function 
 
printf (1, "main exited with %f\n", {main()}) 

Ronald Weidner

new topic     » topic index » view message » categorize

2. Re: Recursive Decent Parser (example)

Just so everyone knows, that's a Phix program, btw (obvs I approve), and won't run as-is on Euphoria, not that it would be difficult to tweak (eg token="(" ==> equal(token,"(")).

new topic     » goto parent     » topic index » view message » categorize

3. Re: Recursive Decent Parser (example)

Now, pure Euphoria. Slightly less tested. :)

include std/convert.e 
include std/sequence.e 
 
-- ---    This parser is VERY unforgiving by way of expression format. --- 
-- Most notably, the spaces between terms, operations, and parentheses are mandatory as written. 
-- I didn't write a robust tokenizer for this.    It's just an expression parser as simple as  
-- I know how to make it.     
 
-- These were the test cases I used.    Very light weight testing.    Exactly -- Zero (0) -- tests  
-- for malformed expressions.    This parser should not be considered production-ready without 
-- some real tests.    But it's probably close. 
 
--sequence str = "2 + 3 * ( 4 + 5 )" 
--sequence str = "2 + 3 * 4" 
--sequence str = "2 * 3 + 4 * 6" 
--sequence str = "2 + 2 ^ 3" 
--sequence str = "4 ^ 3" 
--sequence str = "( 2 + 2 ) ^ 3" 
--sequence str = "1 + 2 + 4"  
--sequence str = "1 + 2 - 4"  
--sequence str = "4 ^ 2 ^ 2" 
--sequence str = "2 * 3 ^ 2" 
--sequence str = "( 2 + 3 ) * ( 4 + 5 )" 
--sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1" 
--sequence str = "( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )" 
--sequence str = "- 2 ^ 2" 
sequence str = "- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1" 
--sequence str = "( 4 + 5 )" 
 
 
-- The simplest tokenizer I know of. 
sequence expression = split(str, " ") 
 
-- simulated pointer. Because for me, it kept things simple. 
integer ptr = 1 
 
-- But even if my fake pointer is "simple," a little safety doesn't hurt. 
function advance_ptr()  
    if ptr < length(expression) then 
        ptr += 1 
        return 1  
    end if 
    return 0 
end function 
 
-- In this parser, the lowest operation is addition and subtraction.     
-- So, this is the entry point to parsing expressions. 
function opAddSub() 
    atom left = opMultDiv() -- Order of operations states multiply and divide come before add and subtract. 
    sequence op = expression[ptr] 
    sequence ops = "+-" 
    printf(1, "Looking for %s in %s\n", {op[1], ops}) 
    while find(op[1], ops) do  
        if not advance_ptr() then -- Move to the next token, probably a number. 
            exit -- if we couldn't advance the pointer, bail. 
        end if 
        atom right = opMultDiv()    -- fetch an number and move the pointer to the next token.  Probably an op. 
        if equal(op, "+") then 
            left = left + right 
        else 
            left = left - right 
        end if 
        op = expression[ptr] -- now we're ready to read the next op 
    end while 
    return left 
end function 
 
-- This is very similar to opAddSub.    The difference is this one is  
-- doing different operations and has a divide by 0 check. 
function opMultDiv() 
    atom left = opPrefix() -- Order of operations states exponents come before multiply and divide. 
    sequence op = expression[ptr] 
    sequence ops = "*/" 
    printf(1, "Looking for %s in %s\n", {op[1], ops}) 
    while find(op[1], ops) do  
        if not advance_ptr() then 
            exit 
        end if 
         
        atom right = opPrefix()    -- this moves the pointer to the next op. 
        if equal(op, "*") then 
            left = left * right 
        elsif right = 0 then 
            puts(1, "Error, cannot divide by 0.\n") 
            abort(1) 
        else 
            left = left / right 
        end if 
        op = expression[ptr] -- now we're ready to read the next op 
    end while 
    return left 
end function 
 
-- This is how unary signs are handled.    Essentially, what's happening here is that there is a sign  
-- where a number should be.    So, we'll handle it. 
function opPrefix() 
    integer sign = 1 
    sequence tok = expression[ptr] 
    while equal(tok, "+") or equal(tok, "-") do 
        if equal(tok, "-") then sign = -sign end if --    in case someone does something like ----4  
        if not advance_ptr() then exit end if 
        tok = expression[ptr] 
    end while 
    return sign * opPower() 
end function 
 
-- This looks the same as the other two, but it isn't.    Exponents are evaluated 
-- Right to Left.    The other operations are evaluated from Left to Right.     
-- This tripped me up a little.    Notice this is the first time  
-- We're actually doing recursion. The right side of the binary expression comes 
-- from calling itself.  I'll point it out with a comment.     
function opPower() 
    atom left = parseTerm() 
    sequence op = expression[ptr] 
    sequence ops = "^" 
    printf(1, "Looking for %s in %s\n", {op[1], ops}) 
    while find(op[1], ops) do  
        if not advance_ptr() then 
            exit 
        end if 
         
        atom right = opPower()    -- this is calling itself and will cause 4 ^ 2 ^ 2 to be evaluated like 4 ^ ( 2 ^ 2 ) 
        left = power(left, right) 
        op = expression[ptr] -- now we're ready to read the next op 
    end while 
    return left 
end function 
 
-- NOTE -- This function moves the pointer.    Be aware of that, please. 
-- The objective of this function is to return a single atom value. 
-- We'll deal with parentheses here.    If I were going to add variables or 
-- function calls to this parser, this is where I would do it. 
function parseTerm() 
    sequence token = expression[ptr] 
    if is_number(token) then 
        atom success = advance_ptr() -- <<-- This is why the pointer advances in a seemingly sneaky way.    Sorry. 
        return to_number(token) 
    elsif equal(token, "(") then 
        atom success = advance_ptr() -- consume the (    <<-- pointer moves  
        if not success then 
            printf(1, "No term after open ( ", {token}) 
            abort(0) 
        end if 
        atom retval = opAddSub() -- <<-- This is recursion 
        if not equal(expression[ptr], ")") then 
            printf(1, "expected ) but got %s", {expression[ptr]})      
            abort(0)         
        end if 
        success = advance_ptr() -- consume the )    <<-- pointer moves  
        return retval 
    else 
        printf(1, "expected number or ( but got %s", {token}) 
        abort(0) 
    end if 
end function 
 
function is_number(sequence maybe) 
    integer found_dot = 0 
    integer i = 1 
    integer c = 0 
    integer retval = 1 
    while i < (length(maybe) + 1) do 
        c = maybe[i] 
        if equal(c, '.') and found_dot then 
            return 0 
        end if 
 
        if equal(c, '.') then  
            found_dot = 1 
        elsif not find(c, "0123456789") then 
            return 0 
        end if 
        i+=1 
    end while 
    return retval 
end function 
 
function main() 
    atom num = opAddSub() 
    return num 
end function 
 
printf (1, "main exited with %f\n", {main()}) 
new topic     » goto parent     » topic index » view message » categorize

4. Re: Recursive Decent Parser (example)

FWIW. I replaced main() in both Phix and Euphoria code with:

function main() 
    sequence tests = { 
        {"2 + 3 * ( 4 + 5 )", "29.000000"}, 
        {"2 + 3 * 4", "14.000000"}, 
        {"2 * 3 + 4 * 6", "30.000000"}, 
        {"2 + 2 ^ 3", "10.000000"}, 
        {"4 ^ 3", "64.000000"}, 
        {"( 2 + 2 ) ^ 3", "64.000000"}, 
        {"1 + 2 + 4" , "7.000000"}, 
        {"1 + 2 - 4" , "-1.000000"}, 
        {"4 ^ 2 ^ 2", "256.000000"}, 
        {"2 * 3 ^ 2", "18.000000"}, 
        {"( 2 + 3 ) * ( 4 + 5 )", "45.000000"}, 
        {"( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1", "-3.000122"}, 
        {"( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )", "3.000122"}, 
        {"- 2 ^ 2", "-4.000000"}, 
        {"- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1", "6.000000"} 
    } 
     
    sequence results = {} 
     
    integer i = 1 
    while i < (length(tests) + 1) do 
        --The simplest tokenizer I know of. 
        expression = split(tests[i][1], " ") 
        ptr = 1 
        sequence result = sprintf("%f", {opAddSub()}) 
        sequence pass_fail = "PASS" 
        if not equal(tests[i][2], result) then 
            pass_fail = "FAIL" 
        end if 
        results = append(results, sprintf("%s: Test: [%s] Expected: [%s] Got: [%s]\n", {pass_fail, tests[i][1], tests[i][2], result})) 
        i += 1 
    end while 
     
    i = 1 
    while i < length(results) + 1 do 
        puts(1, results[i]) 
        i += 1 
    end while 
     
    return 0 
end function 

And this is the result: (Answers limited to 6 decimal places and verified by google)

PASS: Test: [2 + 3 * ( 4 + 5 )] Expected: [29.000000] Got: [29.000000] 
PASS: Test: [2 + 3 * 4] Expected: [14.000000] Got: [14.000000] 
PASS: Test: [2 * 3 + 4 * 6] Expected: [30.000000] Got: [30.000000] 
PASS: Test: [2 + 2 ^ 3] Expected: [10.000000] Got: [10.000000] 
PASS: Test: [4 ^ 3] Expected: [64.000000] Got: [64.000000] 
PASS: Test: [( 2 + 2 ) ^ 3] Expected: [64.000000] Got: [64.000000] 
PASS: Test: [1 + 2 + 4] Expected: [7.000000] Got: [7.000000] 
PASS: Test: [1 + 2 - 4] Expected: [-1.000000] Got: [-1.000000] 
PASS: Test: [4 ^ 2 ^ 2] Expected: [256.000000] Got: [256.000000] 
PASS: Test: [2 * 3 ^ 2] Expected: [18.000000] Got: [18.000000] 
PASS: Test: [( 2 + 3 ) * ( 4 + 5 )] Expected: [45.000000] Got: [45.000000] 
PASS: Test: [( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 ) * - 1] Expected: [-3.000122] Got: [-3.000122] 
PASS: Test: [( 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 )] Expected: [3.000122] Got: [3.000122] 
PASS: Test: [- 2 ^ 2] Expected: [-4.000000] Got: [-4.000000] 
PASS: Test: [- 3 + ( 4 * 2 ) ^ ( 1 + 1 ) / 8 - - 1] Expected: [6.000000] Got: [6.000000] 
main exited with 0.000000 
 

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu