Re: Py 2.5g Update


Kat wrote:

> Dan, can you tell me what "lexer" is? It's
> not in any of my dictionaries.

Darn! First I get caught for terrible spelling, and now nailed for making up
words.

There is a pair of popular Unix programs for writing new languages, called
'yacc' (yet another compiler compiler) and 'lex' (which does the
lexical analysis).

I don't have a dictionary handy, so this isn't precise: 'lexical' has
to do with words. The word 'lexer' is short for a program that
performs lexical analysis. With a language like English, it would
involve looking at the roots of the word (Latin/Greek/Old English, etc.), or
how the ending shows plurality - stuff like that.

In the case of computers, it's typically more simple-minded. A 'lexer' has a
series of rules as to what various kinds of words look like, and how they
can legally be assembled. For example, think of the Perl-type pattern
matchers:

   integer := {+|-}[0-9]+

says that an integer is composed of an optional sign, followed by one or
more characters in the set 0 through 9 (I can't recall the Perl syntax
exactly, so I'm winging it here).
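
For anyone who wants to try the pattern, here is a rough sketch using Python's 're' module instead of Perl. The names INTEGER and is_integer are made up for illustration; the pattern itself is just "optional sign, then one or more digits":

```python
import re

# Optional sign followed by one or more digits.
INTEGER = re.compile(r"[+-]?[0-9]+")

def is_integer(text):
    """Return True if the whole string fits the integer pattern."""
    # fullmatch only succeeds when the entire string matches.
    return INTEGER.fullmatch(text) is not None
```

A lone sign or a number with a decimal point does not count as an integer under this rule.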

The lexer in Ox is much stupider. Like any other lexer, it is responsible
for converting a string of characters into separate words (tokens), and
assigning meaning to those tokens (identifier, number, whitespace, etc.).
The parser then looks at how those tokens are combined, and takes some
action (generates assembly code, builds an executable parse tree, etc.).
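
To give a feel for the "characters in, tokens out" job, here is a minimal sketch in Python. This is not how Ox does it; the token kinds and the names TOKEN_SPEC and tokenize are assumptions picked for the example:

```python
import re

# Hypothetical token kinds; a real lexer would have more of them.
TOKEN_SPEC = [
    ("number",     r"[0-9]+(?:\.[0-9]+)?"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("whitespace", r"[ \t\r\n]+"),
    ("symbol",     r"[^A-Za-z0-9_ \t\r\n]"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))

def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace."""
    tokens = []
    # The four patterns between them cover every character, so
    # finditer walks the string without skipping anything.
    for match in MASTER.finditer(text):
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
    return tokens
```

Calling tokenize("count = 42") gives an identifier, a symbol and a number, and that list is what the parser would then work from.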

But in Ox, the lexer treats the optional sign (+,-) as a separate token:

   +12.4 -> { '+' '12.4' }

and it's up to the grammar to specify that the sign is part of a number.
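
Here is a rough idea of what "up to the grammar" means, sketched in Python rather than in Ox's own grammar notation. The function name parse_signed_number and the (kind, value) token layout are assumptions for the example; the rule it implements is "signed_number := [sign] number":

```python
def parse_signed_number(tokens):
    """Combine an optional sign token and a number token into one value.

    tokens is a list of (kind, value) pairs, as a lexer might produce.
    """
    sign = 1.0
    index = 0
    # The sign arrives as its own token; the grammar decides it belongs
    # to the number that follows.
    if tokens and tokens[0] in (("symbol", "+"), ("symbol", "-")):
        if tokens[0][1] == "-":
            sign = -1.0
        index = 1
    if index >= len(tokens) or tokens[index][0] != "number":
        raise ValueError("expected a number")
    return sign * float(tokens[index][1])
```

So the two tokens '+' and '12.4' go in, and a single numeric value comes out.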

The parser also adds further rules (based on context) to give meaning to
words. For example, the string of characters 'foo' represents an identifier.
But if it's followed by '(', it's probably a routine call. Context also
determines what is grammatically legal. For example, you can't legally write
this in Euphoria:

   integer i
   for i = 1 to 10 do

Although each bit is a grammatically legal 'sentence', the context of 'i'
being an already declared variable conflicts with the rule that loop
variables can't be declared beforehand.
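
Both kinds of context check can be sketched in a few lines of Python. This is only an illustration, not Euphoria's actual implementation; the names classify_identifier and check_for_loop are made up, and the set of declared names stands in for a real symbol table:

```python
def classify_identifier(tokens, index):
    """Guess what the identifier at tokens[index] is from the next token."""
    following = tokens[index + 1] if index + 1 < len(tokens) else None
    # An identifier followed by '(' is treated as a routine call.
    return "call" if following == "(" else "variable"

def check_for_loop(declared, loop_var):
    """Reject a loop variable whose name is already declared.

    declared is the set of names declared so far; the return value is
    the set of names visible inside the loop.
    """
    if loop_var in declared:
        raise NameError(f"'{loop_var}' is already declared")
    return declared | {loop_var}
```

With 'i' already in the declared set, check_for_loop raises an error, which mirrors the 'integer i' followed by 'for i = 1 to 10 do' case.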

Did that clarify things, or just make them worse?

-- David Cuny
