Re: Py 2.5g Update
- Posted by David Cuny <dcuny at LANSET.COM> Nov 03, 2000
Kat wrote:
> Dan, can you tell me what "lexer" is? It's
> not in any of my dictionaries.

Darn! First I get caught for terrible spelling, and now nailed for making up words.

There are two popular Unix programs for writing new languages: 'yacc' (yet another compiler compiler) and 'lex' (which does the lexical analysis). I don't have a dictionary handy, so this isn't precise: 'lexical' has to do with words. The word 'lexer' is short for a program that performs lexical analysis.

With a language like English, lexical analysis would involve looking at the roots of a word (Latin, Greek, Old English, etc.), or at how its ending shows plurality, that sort of thing. In the case of computers, it's typically more simple-minded. A lexer has a series of rules describing what the various kinds of words look like, that is, which characters can legally be assembled into each kind of word. For an example, think of Perl-style pattern matching:

    integer := [+-]?[0-9]+

says that an integer is composed of an optional sign, followed by one or more characters in the set 0 through 9 (that's roughly Perl regular-expression syntax, but I'm winging it here).

The lexer in Ox is much stupider. Like any other lexer, it is responsible for converting a string of characters into separate words (tokens), and assigning meaning to those tokens (identifier, number, whitespace, etc.). The parser then looks at how those tokens are combined, and takes some action (generating assembly code, building an executable parse tree, etc.).

But in Ox, the lexer treats the optional sign (+ or -) as a separate token:

    +12.4 -> { '+' '12.4' }

and it's up to the grammar to specify that the sign is part of a number.

The parser also adds further rules, based on context, to give meaning to words. For example, the string of characters 'foo' represents an identifier, but if it's followed by '(', it's probably a routine call. Context also determines what is grammatically legal.
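A rough Python sketch of that division of labor, for illustration only: it is not how Ox or lex are actually implemented. The token rules, the function names `tokenize` and `classify`, and the "a value is expected here" heuristic are all hypothetical, made up for this example. The lexer emits the sign as its own token (as Ox does with +12.4), and a later grammar-level pass uses context to decide whether the sign belongs to the number, and whether an identifier followed by '(' is a routine call.

```python
import re

# Hypothetical lex-style rule table: each token kind is a regular expression.
# Numbers are deliberately unsigned, so the lexer leaves '+'/'-' as separate tokens.
TOKEN_RULES = [
    ("number", r"[0-9]+(?:\.[0-9]+)?"),
    ("ident", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("op", r"[-+*/=()]"),
    ("space", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))


def tokenize(text):
    """Lexer: split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def classify(tokens):
    """Grammar-level pass: fold a sign into a following number when a value
    is expected, and mark an identifier followed by '(' as a routine call."""
    out = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # A value is expected at the start, or right after an operator other than ')'.
        value_expected = (not out) or (out[-1][0] == "op" and out[-1][1] != ")")
        if kind == "op" and text in ("+", "-") and value_expected \
                and nxt is not None and nxt[0] == "number":
            out.append(("number", text + nxt[1]))  # unary sign joins the number
            i += 2
            continue
        if kind == "ident" and nxt == ("op", "("):
            kind = "call"  # context: identifier followed by '(' is a call
        out.append((kind, text))
        i += 1
    return out


print(tokenize("+12.4"))  # [('op', '+'), ('number', '12.4')]
print(classify(tokenize("foo(+12.4 - 3)")))
```

The second call prints `[('call', 'foo'), ('op', '('), ('number', '+12.4'), ('op', '-'), ('number', '3'), ('op', ')')]`: the '+' after '(' becomes part of the number, while the '-' after a number stays a binary operator. Keeping the lexer dumb and pushing that decision into the grammar is the trade-off described above.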
As an example of context determining what's legal, you can't write this in Euphoria:

    integer i
    for i = 1 to 10 do

Although each line is a grammatically legal 'sentence' on its own, the context of 'i' already being a declared variable conflicts with the rule that loop variables can't be declared beforehand.

Did that clarify things, or just make them worse?

-- David Cuny