Re: Py 2.5g Update
- Posted by Raude Riwal <RAUDER at THMULTI.COM> Nov 10, 2000
He he. You probably mean lexiCOgraphical?

Riwal

> -----Original Message-----
> From: David Cuny [SMTP:dcuny at LANSET.COM]
> Sent: Saturday, November 04, 2000 4:04 AM
> To: EUPHORIA at LISTSERV.MUOHIO.EDU
> Subject: Re: Py 2.5g Update
>
> Kat wrote:
>
> > Dan, can you tell me what "lexer" is? It's
> > not in any of my dictionaries.
>
> Darn! First I get caught for terrible spelling, and now nailed for
> making up words.
>
> There is a pair of popular Unix programs for writing new languages,
> called 'yacc' (yet another compiler compiler) and 'lex' (which does
> the lexigraphical analysis).
>
> I don't have a dictionary handy, so this isn't precise:
> 'lexigraphical' has to do with the written word. The word 'lexer' is
> short for a program that performs lexigraphical analysis. With a
> language like English, it would involve looking at the roots of the
> word (Latin/Greek/Old English, etc.), or how the ending shows
> plurality - stuff like that.
>
> In the case of computers, it's typically more simple-minded. A 'lexer'
> has a series of rules as to what various kinds of words look like, and
> how they can legally be assembled. For an example, think of the
> Perl-type pattern matchers:
>
>     integer := {+|-}[0-9]*
>
> says that an integer is composed of an optional sign, followed by one
> or more characters in the set 0 through 9 (I can't recall the Perl
> syntax exactly, so I'm winging it here).
>
> The lexer in Ox is much stupider. Like any other lexer, it is
> responsible for converting a string of characters into separate words
> (tokens), and assigning meaning to those tokens (identifier, number,
> whitespace, etc.). The parser then looks at how those tokens are
> combined, and takes some action (generates assembly code, builds an
> executable parse tree, etc.).
>
> But in Ox, the lexer treats the optional sign (+,-) as a separate
> token:
>
>     +12.4 -> { '+' '12.4' }
>
> and it's up to the grammar to specify that the sign is part of a
> number.
>
> The parser also adds further rules (based on context) to give meaning
> to words. For example, the string of characters 'foo' represents an
> identifier. But if it's followed by '(', it's probably a routine call.
> Context also determines what is grammatically legal. For example, you
> can't legally write this in Euphoria:
>
>     integer i
>     for i = 1 to 10 do
>
> Although each bit is a grammatically legal 'sentence', the context of
> 'i' being a declared variable conflicts with the rule that loop
> variables can't be declared.
>
> Did that clarify things, or just make them worse?
>
> -- David Cuny
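
For anyone who wants to see the lexer/grammar split from the quoted message in running form, here is a rough Python sketch. It is not Ox's actual lexer: the rule table, the token names, and the helpers tokenize and parse_signed_number are all made up for illustration. It only mirrors the behaviour David describes, where the sign comes out as its own token and a separate grammar-level step decides that it belongs to the number. (In Python's re syntax, an optionally signed integer would be written [+-]?[0-9]+.)

```python
import re

# Hypothetical rule table: token names and patterns are illustrative only,
# not taken from Ox. Each rule says what one kind of "word" looks like.
TOKEN_RULES = [
    ("number", r"[0-9]+(?:\.[0-9]+)?"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("sign", r"[+-]"),
    ("punct", r"[()=]"),
    ("whitespace", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)


def tokenize(text):
    """Split text into (kind, value) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_signed_number(tokens):
    """Grammar-level rule: an optional sign token, then a number token."""
    factor = 1.0
    index = 0
    if tokens and tokens[0][0] == "sign":
        factor = -1.0 if tokens[0][1] == "-" else 1.0
        index = 1
    if index >= len(tokens) or tokens[index][0] != "number":
        raise ValueError("expected a number")
    return factor * float(tokens[index][1])


print(tokenize("+12.4"))
print(parse_signed_number(tokenize("-12.4")))
```

The lexer never sees '+12.4' as one word; it hands back a sign token and a number token, and parse_signed_number plays the part of the grammar rule that joins them. The same goes for 'foo(': the lexer reports an identifier followed by '(', and it would be up to the parser to read that pair as a routine call.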