
Author Topic: [bb] Lexer generator by Yasha [ 1+ years ago ]  (Read 559 times)

« on: June 29, 2017, 12:28:41 AM »
Title : Lexer generator
Author : Yasha
Posted : 1+ years ago

Description : Final update: There is no longer any reason to use this entry: it has been completely superseded by a <a href="codearcs54e8.html?code=2985" target="_blank">much more convenient generic lexer API</a>.
Updated 21/07/2011: Completely rewrote the regular expression engine in an attempt to fix some bugs, and tidied up the output of the scanner to use objects.

Updated 18/06/2010: Added the "Type" command; affects both files.

Updated 19/04/2010: Fixed a minor error with backslash-escaped backslashes in the regexen; added a couple of major speed enhancements.

Updated 01/03/2010: If you downloaded this previously, please do so again as there was a problem with the regular expression engine.

A lexical scanner, or tokeniser, is a tool that reads through source code or some other kind of input and breaks it up into separate tokens, identified by type. It's an absolutely essential component in a compiler or interpreter, and is useful as the first stage of parsing many other kinds of input as well, such as in a calculator or command-line interface. Writing them can be both difficult and tedious, however, so it's common to automate the process...

This generator is very loosely based on <a href="http://flex.sourceforge.net/" target="_blank">Flex</a>, although significantly less powerful and only aimed at producing a specific kind of output.

Input is in the form of a definitions file - this is a simple text file in the following format:
Code: [Select]
Case Insensitive

Constants: {
    Digit [0-9]
    Quote "
    Point \.
    Int 1
    Float 2
    String 3
}

Modes: {
    COMMENT Exclusive
    DOG Inclusive
}

Rules: {
    -?{Digit}+ Store Int
    -?{Digit}*{Point}{Digit}+ Store Float
    {Quote}[^\n]*{Quote} Store String
    \{- Mode <COMMENT>

    <COMMENT> -\} Mode <>
    <COMMENT,DOG,> doggy {
        print "nada"
    }
}

Code: {
;Anything in here is copied straight to the main body
;so make sure it's valid BB code
}


Input is arranged in "blocks" - anything not in one of these blocks is currently ignored, with the exception of the "case insensitive" directive which should appear outside the blocks (EDIT: There's now also a "case sensitive" directive, if you need to return it to the default, which also needs to be outside the blocks). There are four kinds of block, as shown above, although none are necessary and each type may appear more than once (a file with no "Rules" block will not generate a functioning lexer, though!) and in any order. Blocks begin with a type specifier ("Rules", "Code", "Constants" or "Modes") followed by a colon and an opening brace; the definitions in the block must then begin on a new line. The block ends when a closing brace is encountered on its own new line.
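As a rough illustration of the block layout described above, here is a minimal Python sketch (not Blitz code, and not the generator's actual implementation) that splits a definitions file into directives and blocks. The function name `find_blocks` is made up for this example. It assumes headers are matched after lowercasing and removing spaces and tabs, and it deliberately ignores braced action blocks inside "Rules", which a real parser has to handle separately.

```python
def find_blocks(text):
    """Split a definitions file into directives and (kind, body_lines) blocks."""
    kinds = {"rules:{": "rules", "code:{": "code",
             "constants:{": "constants", "modes:{": "modes"}
    blocks, directives = [], []
    lines = iter(text.splitlines())
    for line in lines:
        # Header matching ignores case, spaces and tabs.
        key = line.lower().replace("\t", "").replace(" ", "")
        if key in ("caseinsensitive", "casesensitive"):
            directives.append(key)
        elif key in kinds:
            body = []
            for inner in lines:  # keeps consuming from the same iterator
                if inner.strip().startswith("}"):
                    break  # a closing brace at the start of a line ends the block
                body.append(inner)
            blocks.append((kinds[key], body))
        # Anything else outside a block is ignored.
    return directives, blocks


sample = "Case Insensitive\n\nConstants: {\n    Digit [0-9]\n}\nstray text\nModes: {\n    COMMENT Exclusive\n}\n"
print(find_blocks(sample))
```

Blocks are returned in file order, so repeated blocks of the same kind simply show up more than once.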

The "Code" block is simplest: anything in this block is simply copied verbatim into the resulting file, in the "main body" of the code. You can specify include lines or helper functions here if you like.

The "Constants" block defines simple replacement constants for use in rule definitions and their actions (more on this below). These won't make it into the final BB code though, so don't reference them in any code sections. You can declare these before or after Rules; it makes no difference.
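The substitution itself is plain text replacement. A minimal Python sketch of the idea (not Blitz code; `expand_constants` is a hypothetical name, and the constant table is just the Digit entry from the example file):

```python
def expand_constants(pattern, constants):
    """Replace each {Name} reference in a rule pattern with the constant's value."""
    for name, value in constants.items():
        pattern = pattern.replace("{" + name + "}", value)
    return pattern


constants = {"Digit": "[0-9]"}
print(expand_constants("-?{Digit}+", constants))  # -?[0-9]+
```

Because it is a literal, case-sensitive text replacement, a reference such as {digit} would be left untouched in this sketch.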

The "Modes" block defines different modes of operation that will determine which rules are followed at any given time (supposed to imitate Flex's "start conditions"). For example, if you wanted to scan C or C++, you could set the token "/*" to trigger "comment" mode, which has only one rule: "*/", which puts the scanner back in normal mode and allows it to pick up names and commands again (in both cases the asterisk must be escaped). Modes are defined as "inclusive" or "exclusive": exclusive modes, like the comment mode, only allow rules explicitly assigned to them to be followed while active; an inclusive mode will also allow rules without any specific mode to be followed. As with constants, modes can be declared anywhere in the definitions file.
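The mode check can be sketched in Python as below. This mirrors the logic of BBLex_ModeMatch in the support file further down, under the assumption that mode ids follow the generator's convention: 0 for the default mode, positive ids for exclusive modes and negative ids for inclusive modes. The ids used in the example (COMMENT = 1, DOG = -2) assume ids are handed out in declaration order.

```python
def mode_match(rule_modes, current_mode):
    """Does a rule apply in the current mode?

    rule_modes is the list of mode ids attached to the rule (empty if none).
    Mode ids: 0 = default, positive = exclusive, negative = inclusive.
    """
    if not rule_modes:
        # Unassigned rules run in the default mode and in inclusive modes.
        return current_mode < 1
    for m in rule_modes:
        if (current_mode == 0 and m < 1) or current_mode == m:
            return True
    return False


COMMENT, DOG = 1, -2
print(mode_match([], COMMENT))         # False: exclusive mode hides unassigned rules
print(mode_match([], DOG))             # True: inclusive mode keeps them
print(mode_match([COMMENT], COMMENT))  # True
```

One thing the sketch makes visible: as written, a rule assigned only to an inclusive mode also matches in the default mode, because of the `current_mode == 0 and m < 1` test.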

Finally, the most important section is the Rules section. A rule begins with an optional list of modes, contained within <> (more than one mode can be assigned to a rule, separated with commas, including as above, the empty mode). Next is a regular expression that defines a pattern to match tokens to, following the rules outlined <a href="codearcs5392.html?code=2632" target="_blank">here</a> with the addition of ^ at the start of a pattern to force start-of-line or -file and $ at the end of a pattern to force end-of-line/file. Note that the braces are escaped with backslash to force their literal value. If you included the "case insensitive" directive somewhere in the file (outside the blocks) then all of the rules will be case-insensitive (if you need only some of the rules to be case-sensitive, you'll need to devise regex rules that reflect this).

After the pattern, you can specify an action for the lexer to take: you can store the token and an integer type for it, just store the type (useful for saving memory on things like operators where the token itself is always known), change mode, or execute arbitrary BB code within a braced block, again allowing a new line for the final brace (unlike the code in Code:{} blocks, this is placed into the scanner function scope). If you want to do more than one of these, at the moment the only way is to do so directly in BB code (will probably change this). You don't need to specify an action at all - in a C++ lexer you might have the rule //[^\n]*\n (double slash, then anything up until the next newline) with no action, to comment out the rest of a line.

If more than one pattern could match a character string (eg. "End" and "End Function") the lexer will go with the longer match. If the matches are the same length then the first specified in the list will be chosen.
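A minimal Python sketch of that selection rule follows. It uses Python's own `re` module as a stand-in for the regex library used here, so the pattern syntax is an assumption, and `pick_rule` is a made-up name. Only a strictly longer match replaces the current best, which is what makes the earlier rule win a tie.

```python
import re


def pick_rule(patterns, text, pos=0):
    """Return (rule_index, lexeme) for the longest match at pos, or None."""
    best = None
    for index, pattern in enumerate(patterns):
        m = re.compile(pattern).match(text, pos)
        # Strictly longer matches replace the current best, so on a tie
        # the rule listed first is kept. Empty matches never win.
        if m and len(m.group()) > (len(best[1]) if best else 0):
            best = (index, m.group())
    return best


rules = [r"End", r"End Function", r"[A-Za-z]+"]
print(pick_rule(rules, "End Function"))  # (1, 'End Function')
print(pick_rule(rules, "End"))           # (0, 'End')
```

In the second call both "End" and "[A-Za-z]+" match three characters, and the first-listed rule is the one returned.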

The generated scanner builds a list of tokens in a bank, which it returns (unless you use the BB code blocks to make it do something else). Each 4-byte entry in the bank is the handle of a BBLex_Token object holding the token's text and its integer type. So unlike some tokenising functions that are called repeatedly to cough up the next token, BBLex_ScanFile() only needs to be called once, and then the tokens can be obtained by navigating the resulting bank with BBLex_GetToken().
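To make the bank layout concrete, here is a toy Python model of it (not Blitz code). The class `TokenBank` and its dictionary of objects are inventions for this sketch; the dictionary merely stands in for Blitz's Handle/Object mechanism. What it is meant to show is that each token occupies one 4-byte slot, so the token count is the bank size divided by 4 and token i lives at byte offset i * 4.

```python
import struct


class TokenBank:
    """Toy model of the token bank: a byte buffer of 4-byte integer handles."""

    def __init__(self):
        self.data = bytearray()
        self.objects = {}  # handle -> (type, value)

    def store(self, ttype, value=""):
        handle = len(self.objects) + 1
        self.objects[handle] = (ttype, value)
        # Grow the buffer by 4 bytes and write the handle at the end.
        self.data += struct.pack("<i", handle)

    def count(self):
        return len(self.data) // 4

    def get(self, index):
        (handle,) = struct.unpack_from("<i", self.data, index * 4)
        return self.objects[handle]


bank = TokenBank()
bank.store(1, "12")
bank.store(2, "56.45")
for i in range(bank.count()):
    print(bank.get(i))
```

Storing a type without a value corresponds to calling `store` with the default empty string.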

Only three functions are actually created by this generator - the scanner itself (BBLex_ScanData, which BBLex_ScanFile calls) and two initialisation functions that it calls (BBLex_InitRegexen and BBLex_InitModes). The vast majority of the scanner actually consists of my <a href="codearcs5392.html?code=2632" target="_blank">regular expressions</a> library plus a few helper functions, and so is simply Included as BBLex_Functions.bb:

Code: [Select]
;===============================================================================
;This function library provides an interface through which you can use the lexer
;generated by BBLex, rather than calling it directly.


Include "Regex.bb" ;Get this Include from: http://www.blitzbasic.com/codearcs/codearcs.php?code=2632


;Extend this with any other useful info that can be read by your scanner,
;for example, line numbers (remember to extend the constructor too!)
Type BBLex_Token
Field val$
Field tType
End Type


;Run the lexical scanner on the contents of the given file
Function BBLex_ScanFile(filename$)
Local sBank,tBank
sBank=LoadFileBank(filename)
tBank=BBLex_ScanData(sBank)
;sBank is not freed here: the generated BBLex_ScanData already frees it
Return tBank
End Function

;Get a token from the list by index
Function BBLex_GetToken.BBLex_Token(tBank, index)
Return Object.BBLex_Token(PeekInt(tBank, index * 4))
End Function

;Free the token list when done with it
Function BBLex_FreeTokenBank(tBank)
Local i, token.BBLex_Token

For i = 0 To BankSize(tBank) - 4 Step 4
token = Object.BBLex_Token(PeekInt(tBank, i))
Delete token
Next

FreeBank tBank
End Function


;Internal functions to the scanner - don't use separately
;--------------------------------------------------------

;Internal function to the scanner - clean up regex objects once done
Function BBLex_DeleteRegexen(regexBank)
Local i
For i=0 To BankSize(regexBank)-4 Step 4
RegEx_Delete(Object.RegEx_Node(PeekInt(regexBank,i)))
Next
FreeBank regexBank
End Function

;Internal function to the scanner - clean up mode list when done
Function BBLex_ClearModes(modeBank)
Local i
For i=0 To BankSize(modeBank)-4 Step 4
If PeekInt(modeBank,i) Then FreeBank PeekInt(modeBank,i)
Next
FreeBank modeBank
End Function

;Internal function to the scanner - check that a rule applies in the current mode
Function BBLex_ModeMatch(rule, mBank, cMode)
Local i, ruleModes
ruleModes = PeekInt(mBank, rule * 4)
If ruleModes
For i = 0 To BankSize(ruleModes) - 4 Step 4
If (cMode = 0 And PeekInt(ruleModes, i) < 1) Or cMode = PeekInt(ruleModes, i) Then Return True
Next
Else
Return (cMode < 1)
EndIf
End Function

;Add a token to the token list - used by the scanner
Function BBLex_StoreToken(tBank, tType, token$)
ResizeBank tBank, BankSize(tBank) + 4
PokeInt tBank, BankSize(tBank) - 4, Handle(BBLex_MakeToken(token, tType))
End Function

;Add a token's type to the token list without keeping its value
Function BBLex_StoreType(tBank, tType) ;This is mainly here for compatibility
ResizeBank tBank,BankSize(tBank) + 4
PokeInt tBank, BankSize(tBank) - 4, Handle(BBLex_MakeToken("", tType))
End Function

;New token object
Function BBLex_MakeToken.BBLex_Token(val$, tType)
Local tok.BBLex_Token = New BBLex_Token
tok\val = val
tok\tType = tType
Return tok
End Function


;===============================================================================



;===============================================================================
;General utility functions (not really connected to this library)
;===============================================================================


Function StrToBank(s$) ;Return a bank containing the binary value of the given string
Local i,bank
bank=CreateBank(Len(s))
For i=0 To Len(s)-1
PokeByte bank,i,Asc(Mid(s,i+1,1))
Next
Return bank
End Function

Function BankToStr$(bank) ;Return a string containing the ASCII value of the given bank
Local i,s$
For i=0 To BankSize(bank)-1
s=s+Chr(PeekByte(bank,i))
Next
Return s
End Function

Function LoadFileBank(filename$) ;Load a file straight into a bank
Local bank,file
file=ReadFile(filename)
bank=CreateBank(FileSize(filename))
ReadBytes bank,file,0,BankSize(bank)
CloseFile file
Return bank
End Function


;===============================================================================


;~IDEal Editor Parameters:
;~F#B#12#1B#20#30#39#42#4F#55#5B#6C#75#7D
;~C#Blitz3D


Here's an example program to demonstrate the results:

Code: [Select]
Local i, tokenBank

tokenBank = BBLex_ScanFile("test.txt")

For i = 0 To BankSize(tokenBank) / 4 - 1
Local tok.BBLex_Token = BBLex_GetToken(tokenBank, i)
Print tok\tType + " : " + tok\val
Next

WaitKey
End

Include "testlex.bb"


...and a really simple test file to tokenise:

Code: [Select]
12 345 56.45 doggy {- This is a
comment and shouldn't be picked up -}
"String literal 1!"n
n"String literal 2!"
doggy
65 -12.34
boogledoggy 35.8 "doggy as a string literal!"


And finally, the generator itself:

Code :
Code: BlitzBasic
Write "Generating... "
BBLex_Generate("Scythe lexer.txt","BBLex_Scythe.bb")            ;Change these to the desired input and output files
Print "done!"

Print ""
Print "Press any key to exit..."

WaitKey
End


Const SIZEOF_CONST = 9

Function BBLex_Generate(defFile$,lexFile$)              ;Generate a .bb lexer from the definitions given in defFile and output it as lexFile
    Local dFile,dLine$,lFile,i,caseSen,userCodeOutput
    Local ruleBank,constBank,modeBank

    dFile=ReadFile(defFile)
    lFile=WriteFile(lexFile)
    constBank=CreateBank()
    modeBank=CreateBank(5)
    PokeInt modeBank,0,StrToBank("")
    ruleBank=CreateBank()

    WriteLine lFile,""
    WriteLine lFile,";This file was automatically generated using BBLex: http://www.blitzbasic.com/codearcs/codearcs.php?code=2636"
    WriteLine lFile,""

    While Not Eof(dFile)
        dLine=Replace(Replace(Lower(ReadLine(dFile)),Chr(9),"")," ","")
        Select dLine
            Case "caseinsensitive","case-insensitive"
                caseSen=False
            Case "casesensitive","case-sensitive"
                caseSen=True
            Case "constants:{"
                LoadConstants(constBank,dFile)
            Case "modes:{"
                LoadModes(modeBank,dFile)
            Case "rules:{"
                LoadRules(ruleBank,dFile)
            Case "code:{"
                WriteLine lFile,""
                dLine=ReadLine(dFile)
                While Not Eof(dFile)
                    If Left(Trim(dLine),1)="}" Then Exit
                    If userCodeOutput=False
                        WriteLine lFile,""
                        WriteLine lFile,Chr(9)+";User code:"
                        WriteLine lFile,""
                        userCodeOutput=True
                    EndIf
                    WriteLine lFile,dLine
                    dLine=ReadLine(dFile)
                Wend
                WriteLine lFile,""
        End Select
    Wend

    CloseFile dFile
    ProcessRules(ruleBank,modeBank,constBank)

    OutputLexer(constBank,ruleBank,lFile,caseSen,userCodeOutput)
    CloseFile lFile

    For i=0 To BankSize(constBank)-SIZEOF_CONST Step SIZEOF_CONST
        FreeBank PeekInt(constBank,i)
        FreeBank PeekInt(constBank,i+4)
    Next
    FreeBank constBank
    For i=0 To BankSize(modeBank)-5 Step 5
        FreeBank PeekInt(modeBank,i)
    Next
    FreeBank modeBank

    FreeBank ruleBank
End Function

Function OutputLexer(constBank,ruleBank,lexFile,caseSen,userCodeOutput)
    Local newLine$,i,j,action$

    newLine=Chr(13)+Chr(10)

    If userCodeOutput Then WriteLine lexFile,newLine+newLine+Chr(9)+";Generated code:"
    WriteLine lexFile,newLine+"Include "+Chr(34)+"BBLex_Functions.bb"+Chr(34)+newLine

    If BankSize(constBank)
        For i=0 To BankSize(constBank)-SIZEOF_CONST Step SIZEOF_CONST
            If PeekByte(constBank,i+8)=True
                WriteLine lexFile,"Const "+BankToStr(PeekInt(constBank,i))+" = "+BankToStr(PeekInt(constBank,i+4))
            EndIf
        Next
        WriteLine lexFile, ""
    EndIf

    WriteLine lexFile,"Function BBLex_ScanData(sBank)"
    WriteLine lexFile,Chr(9)+"Local rBank, mBank, tBank, cPtr"+Chr(9)+newLine+Chr(9)+"Local token$, cMatch$, rID, i, cMode"+newLine
    WriteLine lexFile,Chr(9)+"rBank = BBLex_InitRegexen()"+newLine+Chr(9)+"mBank = BBLex_InitModes()"
    WriteLine lexFile,Chr(9)+"tBank = CreateBank()"+newLine
    WriteLine lexFile,Chr(9)+"While cPtr < BankSize(sBank)"+newLine+Chr(9)+Chr(9)+"token = "+Chr(34)+Chr(34)+newLine

    WriteLine lexFile,Chr(9)+Chr(9)+"For i = 0 to "+((BankSize(ruleBank)/12)-1)
    WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+"If BBLex_ModeMatch(i, mBank, cMode)"
    WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+Chr(9)+"cMatch = Regex_Match(Object.RegEx_Node(PeekInt(rBank, i * 4)), sBank, cPtr)"
    WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+Chr(9)+"If Len(cMatch) > Len(token) Then token = cMatch : rID = i"
    WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+"EndIf"+newLine+Chr(9)+Chr(9)+"Next"+newLine

    WriteLine lexFile,Chr(9)+Chr(9)+"If token = "+Chr(34)+Chr(34)+newLine+Chr(9)+Chr(9)+Chr(9)+"cPtr = cPtr + 1"
    WriteLine lexFile,Chr(9)+Chr(9)+"Else"+newLine+Chr(9)+Chr(9)+Chr(9)+"Select rID"

    For i=0 To ((BankSize(ruleBank)/12)-1)
        WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+Chr(9)+"Case "+i
        action=BankToStr(PeekInt(ruleBank,i*12+8))
        Select Lower(Left(action,1))
            Case "s"
                WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+Chr(9)+Chr(9)+"BBLex_StoreToken tBank, "+Trim(Mid(action,6))+", token"
            Case "t"
                WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+Chr(9)+Chr(9)+"BBLex_StoreType tBank, "+Trim(Mid(action,6))
            Case "m"
                WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+Chr(9)+Chr(9)+"cMode = "+Trim(Mid(action,5))
            Case "{"
                WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+Chr(9)+Chr(9)+Mid(action,2)
        End Select
    Next
    WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+"End Select"
    WriteLine lexFile,Chr(9)+Chr(9)+Chr(9)+"cPtr = cPtr + Len(token)"+newLine+Chr(9)+Chr(9)+"EndIf"+newLine+Chr(9)+"Wend"+newLine
    WriteLine lexFile,Chr(9)+"BBLex_DeleteRegexen(rBank)"+newLine+Chr(9)+"BBLex_ClearModes(mBank)"
    WriteLine lexFile,Chr(9)+"FreeBank sBank"+newLine
    WriteLine lexFile,Chr(9)+"Return tBank"+newLine+"End Function"+newLine

    WriteLine lexFile,"Function BBLex_InitRegexen()"
    WriteLine lexFile,Chr(9)+"Local regexBank"+newLine
    WriteLine lexFile,Chr(9)+"regexBank = CreateBank("+(BankSize(ruleBank)/3)+")"+newLine

    For i=0 To BankSize(ruleBank)/12-1
        WriteLine lexFile,Chr(9)+"PokeInt regexBank, "+(i*4)+", Handle(Regex_Parse("+ExpandQuotes(BankToStr(PeekInt(ruleBank,i*12+4)))+", "+StrFromBool(caseSen)+"))"
    Next

    WriteLine lexFile,newLine+Chr(9)+"Return regexBank"+newLine+"End Function"+newLine

    WriteLine lexFile,"Function BBLex_InitModes()"
    WriteLine lexFile,Chr(9)+"Local modeBank"+newLine+Chr(9)+"modeBank = CreateBank("+(BankSize(ruleBank)/3)+")"+newLine

    For i = 0 To BankSize(ruleBank) / 12 - 1
        If BankSize(PeekInt(ruleBank, i * 12))
            WriteLine lexFile, Chr(9)+"PokeInt modeBank, " + i * 4 + ", CreateBank("+BankSize(PeekInt(ruleBank, i * 12)) + ")"
            For j = 0 To BankSize(PeekInt(ruleBank, i * 12)) - 4 Step 4
                WriteLine lexFile, Chr(9) + "PokeInt PeekInt(modeBank, "+i*4+"), "+j+", " + PeekInt(PeekInt(ruleBank, i * 12), j)
            Next
        Else
            WriteLine lexFile, Chr(9) + "PokeInt modeBank, " + i * 4 + ", 0"
        EndIf
    Next
    WriteLine lexFile,newLine+Chr(9)+"Return modeBank"+newLine+"End Function"+newLine
End Function

Function ExpandQuotes$(s$)
    Local i

    If s = Chr(34) Then Return "Chr(34)"

    Local l$ = Left(s, 1), r$ = Right(s, 1), m$ = Mid(s, 2, Len(s) - 2)

    If l = Chr(34) Then l = "Chr(34) + " + Chr(34) : Else l = Chr(34) + l
    If r = Chr(34) Then r = Chr(34) + " + Chr(34)" : Else r = r + Chr(34)
    m = Replace(m, Chr(34), Chr(34) + " + Chr(34) + " + Chr(34))

    Return l + m + r
End Function

Function StrFromBool$(b)
    If b Then Return "True" Else Return "False"
End Function

Function LoadConstants(constBank,dFile)
    Local dLine$,cName$,cValue$,i, export

    While Not Eof(dFile)
        dLine=Trim(ReadLine(dFile))
        If Left(dLine,1)="}" Then Exit

        If dLine<>""
            If Left(dLine,1)<>";"
                ;Read the name: whitespace ends it unless escaped with a backslash
                For i=1 To Len(dLine)
                    If i>1
                        If Mid(dLine,i-1,1)<>"\"
                            If Asc(Mid(dLine,i,1))<=32 Then Exit
                        EndIf
                    EndIf
                    cName=cName+Mid(dLine,i,1)
                Next

                dLine=Trim(Mid(dLine,i+1))

                ;Read the value in the same way
                For i=1 To Len(dLine)
                    If i>1
                        If Mid(dLine,i-1,1)<>"\"
                            If Asc(Mid(dLine,i,1))<=32 Then Exit
                        EndIf
                    EndIf
                    cValue=cValue+Mid(dLine,i,1)
                Next
                dLine=Trim(Mid(dLine,i))

                ResizeBank constBank,BankSize(constBank)+SIZEOF_CONST
                PokeInt constBank,BankSize(constBank)-SIZEOF_CONST,StrToBank(cName)
                PokeInt constBank,BankSize(constBank)-(SIZEOF_CONST-4),StrToBank(cValue)
                PokeByte constBank,BankSize(constBank)-(SIZEOF_CONST-8),(Lower(Left(dLine,6))="export")
                cName=""
                cValue=""
            EndIf
        EndIf
    Wend
End Function

Function LoadModes(modeBank,dFile)
    Local dLine$,mName$,i

    While Not Eof(dFile)
        dLine=Trim(ReadLine(dFile))
        If Left(dLine,1)="}" Then Exit

        If dLine<>""
            If Left(dLine,1)<>";"
                ;Read the mode name: whitespace ends it unless escaped with a backslash
                For i=1 To Len(dLine)
                    If i>1
                        If Mid(dLine,i-1,1)<>"\"
                            If Asc(Mid(dLine,i,1))<=32 Then Exit
                        EndIf
                    EndIf
                    mName=mName+Mid(dLine,i,1)
                Next

                dLine=Trim(Mid(dLine,i+1))

                ResizeBank modeBank,BankSize(modeBank)+5
                PokeInt modeBank,BankSize(modeBank)-5,StrToBank(mName)

                If Lower(Left(dLine,2))="in" Then PokeByte modeBank,BankSize(modeBank)-1,1:Else PokeByte modeBank,BankSize(modeBank)-1,0
                mName=""
            EndIf
        EndIf
    Wend
End Function

Function LoadRules(ruleBank,dFile)
    Local dLine$,cPtr,mode$,rule$,action$

    While Not Eof(dFile)
        dLine=Trim(ReadLine(dFile))
        If Left(dLine,1)="}" Then Exit

        If dLine<>""
            If Left(dLine,1)<>";"
                mode=""

                If Left(dLine,1)="<"
                    cPtr=2
                    While Mid(dLine,cPtr,1)<>">"
                        If Asc(Mid(dLine,cPtr,1))>32 Then mode=mode+Mid(dLine,cPtr,1)
                        cPtr=cPtr+1
                    Wend
                    dLine=Trim(Mid(dLine,cPtr+1))
                EndIf

                rule=""
                For cPtr=1 To Len(dLine)
                    If cPtr>2
                        If Mid(dLine,cPtr-1,1)<>"\"
                            If Asc(Mid(dLine,cPtr,1))<=32 Then Exit
                        ElseIf Mid(dLine,cPtr-2,2)="\\"
                            If Asc(Mid(dLine,cPtr,1))<=32 Then Exit ;If that backslash was part of the pattern
                        EndIf
                    ElseIf cPtr=2
                        If Left(dLine,1)<>"\"
                            If Asc(Mid(dLine,cPtr,1))<=32 Then Exit
                        EndIf
                    EndIf
                    rule=rule+Mid(dLine,cPtr,1)
                Next
                dLine=Trim(Mid(dLine,cPtr))

                action=dLine
                If Left(dLine,1)="{"
                    While Not Eof(dFile)
                        dLine=Trim(ReadLine(dFile))
                        If Left(dLine,1)="}" Then action=action+Chr(13)+Chr(10):Exit

                        action=action+Chr(13)+Chr(10)+dLine
                    Wend
                EndIf

                ResizeBank ruleBank,BankSize(ruleBank)+12
                PokeInt ruleBank,BankSize(ruleBank)-12,StrToBank(mode)
                PokeInt ruleBank,BankSize(ruleBank)-8,StrToBank(rule)
                PokeInt ruleBank,BankSize(ruleBank)-4,StrToBank(action)
            EndIf
        EndIf
    Wend
End Function

Function ProcessRules(ruleBank,modeBank,constBank)
    Local r,c,m,mode$,rule$,action$

    For r=0 To BankSize(ruleBank)-12 Step 12
        mode=BankToStr(PeekInt(ruleBank,r))
        FreeBank PeekInt(ruleBank,r)
        PokeInt ruleBank,r,CreateBank()

        While mode<>""
            If Right(mode,1)=","
                ResizeBank PeekInt(ruleBank,r),BankSize(PeekInt(ruleBank,r))+4
                PokeInt PeekInt(ruleBank,r),BankSize(PeekInt(ruleBank,r))-4,0
                mode=Trim(Left(mode,Len(mode)-1))
            EndIf
            If Instr(mode,",")>0
                For m=0 To BankSize(modeBank)-5 Step 5
                    If Left(mode,Instr(mode,",")-1)=BankToStr(PeekInt(modeBank,m))
                        ResizeBank PeekInt(ruleBank,r),BankSize(PeekInt(ruleBank,r))+4
                        If PeekByte(modeBank,m+4)=1
                            PokeInt PeekInt(ruleBank,r),BankSize(PeekInt(ruleBank,r))-4,-(m/5)
                        Else
                            PokeInt PeekInt(ruleBank,r),BankSize(PeekInt(ruleBank,r))-4,m/5
                        EndIf
                    EndIf
                Next
                mode=Mid(mode,Instr(mode,",")+1)
            Else
                For m=0 To BankSize(modeBank)-5 Step 5
                    If mode=BankToStr(PeekInt(modeBank,m))
                        ResizeBank PeekInt(ruleBank,r),BankSize(PeekInt(ruleBank,r))+4
                        If PeekByte(modeBank,m+4)=1
                            PokeInt PeekInt(ruleBank,r),BankSize(PeekInt(ruleBank,r))-4,-(m/5)
                        Else
                            PokeInt PeekInt(ruleBank,r),BankSize(PeekInt(ruleBank,r))-4,m/5
                        EndIf
                    EndIf
                Next
                mode=""
            EndIf
        Wend

        rule=BankToStr(PeekInt(ruleBank,r+4))
        FreeBank PeekInt(ruleBank,r+4)
        For c=0 To BankSize(constBank)-SIZEOF_CONST Step SIZEOF_CONST
            rule=Replace(rule,"{"+BankToStr(PeekInt(constBank,c))+"}",BankToStr(PeekInt(constBank,c+4)))
        Next
        PokeInt ruleBank,r+4,StrToBank(rule)

        action=BankToStr(PeekInt(ruleBank,r+8))
        If Left(action,1)<>"{"
            FreeBank PeekInt(ruleBank,r+8)
            If Lower(Left(action,5))="store"
                For c=0 To BankSize(constBank)-SIZEOF_CONST Step SIZEOF_CONST
                    If PeekByte(constBank, c + 8) = False
                        action="store "+Replace(Mid(action,6),"{"+BankToStr(PeekInt(constBank,c))+"}",BankToStr(PeekInt(constBank,c+4)))
                    EndIf
                Next
            ElseIf Lower(Left(action,4))="type"
                For c=0 To BankSize(constBank)-SIZEOF_CONST Step SIZEOF_CONST
                    If PeekByte(constBank, c + 8) = False
                        action="type "+Replace(Mid(action,5),"{"+BankToStr(PeekInt(constBank,c))+"}",BankToStr(PeekInt(constBank,c+4)))
                    EndIf
                Next
            ElseIf Lower(Left(action,4))="mode"
                For m=0 To BankSize(modeBank)-5 Step 5
                    If PeekByte(modeBank,m+4)=1
                        action=Replace(action,"<"+BankToStr(PeekInt(modeBank,m))+">",-(m/5))
                    Else
                        action=Replace(action,"<"+BankToStr(PeekInt(modeBank,m))+">",m/5)
                    EndIf
                Next
            EndIf
            PokeInt ruleBank,r+8,StrToBank(action)
        EndIf
    Next
End Function

  379. Function StrToBank(s$)          ;Return a bank containing the binary value of the given string
  380.         Local i,bank
  381.         bank=CreateBank(Len(s))
  382.         For i=0 To Len(s)-1
  383.                 PokeByte bank,i,Asc(Mid(s,i+1,1))
  384.         Next
  385.         Return bank
  386. End Function
  387.  
  388. Function BankToStr$(bank)               ;Return a string containing the ASCII value of the given bank
  389.         Local i,s$
  390.         For i=0 To BankSize(bank)-1
  391.                 s=s+Chr(PeekByte(bank,i))
  392.         Next
  393.         Return s
  394. End Function
  395.  
  396. ;~IDEal Editor Parameters:
  397. ;~F#E#4F#9D#AB#AF#D8#F6#12E#17B#184
  398. ;~C#Blitz3D
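
The constant expansion loop near the end of the function above is plain text substitution: every {Name} in a rule's pattern is replaced by that constant's value. As a rough standalone sketch of the effect (not the generator's own code), assuming the Digit and Point constants from the sample definitions file and using literal strings in place of the constant bank:

```blitzbasic
;Illustration only: the constant names and values are written out as literals
;here instead of being read from constBank as the generator does
Local rule$="-?{Digit}*{Point}{Digit}+"
rule=Replace(rule,"{Digit}","[0-9]")
rule=Replace(rule,"{Point}",".")
Print rule      ;prints -?[0-9]*.[0-9]+
WaitKey
End
```

Because the substitution is textual, a constant's value is pasted into the pattern exactly as written in the definitions file.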


Comments :


Yasha (Posted 1+ years ago)

Here are a couple more useful examples. This one generates a small lexer for a simple QuakeC-like language:
Code: [Select]

Modes: {
Comment Exclusive
InString Exclusive
}

Rules: {

;Punctuation
<= Store 1
>= Store 2
== Store 3
!= Store 4
:: Store 5
; Store 6
, Store 7
! Store 8
* Store 9
/ Store 10
( Store 11
) Store 12
- Store 13
+ Store 14
= Store 15
[ Store 16
] Store 17
{ Store 18
} Store 19
. Store 20
< Store 21
> Store 22
# Store 23
&&? Store 24
||? Store 25
^ Store 26
% Store 27
: Store 28

;Values
[0-9]+ { StoreNumericToken tBank,29,token
}
[0-9]*.[0-9]+ { StoreNumericToken tBank,30,token
}
" Mode <InString>
<InString> [^\n"]* Store 31
<InString> \n|" Mode <>

;Comments
/* Mode <Comment>
<Comment> */ Mode <>
//[^\n]*

;Names
[a-zA-Z_][a-zA-Z0-9_]* Store 32
}

Code: {
;If the previous token was a minus, check if it was subtraction or negation and store appropriately
Function StoreNumericToken(tBank,numType,token$)
    Local i
    If BankSize(tBank)
        If BBLex_TokenType(tBank,(BankSize(tBank)/8)-1)=13
            If BankSize(tBank)>8
                If TokenSubtractible(BBLex_TokenType(tBank,(BankSize(tBank)/8)-2))=False
                    RemoveLastToken(tBank)
                    token="-"+token
                EndIf
            Else
                RemoveLastToken(tBank)
                token="-"+token
            EndIf
        EndIf
    EndIf
    BBLex_StoreToken tBank,numType,token
End Function

;Removes the last token from the given token bank
Function RemoveLastToken(tBank)
    Local i
    For i=0 To BankSize(tBank)/8-1
        If PeekInt(tBank,BankSize(tBank)-8)=BBLex_TokenType(tBank,i) Then Exit
    Next
    If i=BankSize(tBank)/8 Then FreeBank PeekInt(tBank,BankSize(tBank)-4)
    ResizeBank tBank,BankSize(tBank)-8
End Function

;Take a token type and see if it's an operator or a term
Function TokenSubtractible(tokenType)
    If tokenType>=29 Or tokenType=12 Or tokenType=17
        Return True
    Else
        Return False
    EndIf
End Function
}
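
The minus handling in StoreNumericToken is easy to lose among the bank arithmetic, so here is a minimal standalone sketch of just the decision it makes. It assumes the token numbers from the definitions above (12 for ")", 17 for "]", 29 and up for values and names). IsNegation is a hypothetical helper written for illustration and is not part of the generated lexer; a prevType of 0 is an assumed stand-in for "there was no token before the minus".

```blitzbasic
;Hypothetical helper, not part of the generated lexer.
;prevType is the type of the token before the minus, or 0 (an assumption of
;this sketch) if the minus was the first token in the bank
Function IsNegation(prevType)
    If prevType=0 Then Return True      ;nothing to subtract from, as in "-5"
    If prevType>=29 Or prevType=12 Or prevType=17
        Return False                    ;follows a term, as in "a - 5" or "f(x) - 5"
    EndIf
    Return True                         ;follows an operator, as in "a * -5"
End Function

Print IsNegation(0)     ;minus is the first token
Print IsNegation(32)    ;minus follows a name
Print IsNegation(9)     ;minus follows "*"
WaitKey
End
```

When the result is true, the definitions above remove the stored minus token and prepend "-" to the number's text, so the parser receives a single negative literal instead of two tokens.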
This generates a complete lexer for the C programming language, as described in the reference grammar at the back of K&R (ANSI C89, not including preprocessor):
Code: [Select]

Constants: {
OCT 0[0-9]*
DEC [1-9][0-9]*
HEX 0[xX][0-9a-fA-F]+
INTSUFFIX ([uU]|([lL][uU]?))
FLTSUFFIX ([fF]|([lL][fF]?))
CHAR '\?.'
}

Modes: {
COMMENT Exclusive
INSTRING Exclusive
}

Rules: {
;Constants
{DEC}{INTSUFFIX}? { StoreNumericToken tBank,1,token
}
{OCT}{INTSUFFIX}? { StoreNumericToken tBank,1,token
}
{HEX}{INTSUFFIX}? { StoreNumericToken tBank,1,token
}
[0-9]*.[0-9]+([eE]-?{DEC})?{FLTSUFFIX}? { StoreNumericToken tBank,2,token
}
{CHAR} Store 3
L{CHAR} Store 4
" Mode <INSTRING>
<INSTRING> [^\n"]* Store 5
<INSTRING> [\n"] Mode <>

;Comments
/* Mode <COMMENT>
<COMMENT> */ Mode <>
//[^\n]*

;Punctuation
; Store 6
{ Store 7
} Store 8
, Store 9
= Store 10
: Store 11
( Store 12
) Store 13
[ Store 14
] Store 15
* Store 16
... Store 17
*= Store 18
/= Store 19
%= Store 20
+= Store 21
-= Store 22
<<= Store 23
>>= Store 24
&= Store 25
^= Store 26
|= Store 27
? Store 28
|| Store 29
&& Store 30
| Store 31
^ Store 32
& Store 33
== Store 34
!= Store 35
< Store 36
> Store 37
<= Store 38
>= Store 39
<< Store 40
>> Store 41
+ Store 42
- Store 43
* Store 44
/ Store 45
% Store 46
++ Store 47
-- Store 48
~ Store 49
! Store 50
. Store 51
-> Store 52

;Keywords
auto Store 53
register Store 54
static Store 55
extern Store 56
typedef Store 57
void Store 58
char Store 59
short Store 60
int Store 61
long Store 62
float Store 63
double Store 64
signed Store 65
unsigned Store 66
const Store 67
volatile Store 68
struct Store 69
union Store 70
enum Store 71
case Store 72
default Store 73
if Store 74
else Store 75
switch Store 76
while Store 77
do Store 78
for Store 79
goto Store 80
continue Store 81
break Store 82
return Store 83
sizeof Store 84

;Identifiers
[a-zA-Z_][a-zA-Z0-9_]* Store 85
}

Code: {
;If the previous token was a minus, check if it was subtraction or negation and store appropriately
Function StoreNumericToken(tBank,numType,token$)
    Local i
    If BankSize(tBank)
        If BBLex_TokenType(tBank,(BankSize(tBank)/8)-1)=43
            If BankSize(tBank)>8
                If TokenSubtractible(BBLex_TokenType(tBank,(BankSize(tBank)/8)-2))=False
                    RemoveLastToken(tBank)
                    token="-"+token
                EndIf
            Else
                RemoveLastToken(tBank)
                token="-"+token
            EndIf
        EndIf
    EndIf
    BBLex_StoreToken tBank,numType,token
End Function

;Removes the last token from the given token bank
Function RemoveLastToken(tBank)
    Local i
    For i=0 To BankSize(tBank)/8-1
        If PeekInt(tBank,BankSize(tBank)-8)=BBLex_TokenType(tBank,i) Then Exit
    Next
    If i=BankSize(tBank)/8 Then FreeBank PeekInt(tBank,BankSize(tBank)-4)
    ResizeBank tBank,BankSize(tBank)-8
End Function

;Take a token type and see if it's an operator or a term
Function TokenSubtractible(tokenType)
    Select tokenType
        Case 1,2,3,4,13,15,85
            Return True
        Default
            Return False
    End Select
End Function
}



Dabhand (Posted 1+ years ago)

Nice work! :) Dabz

 
