Processing the " within a string

Started by Mikey, June 12, 2018, 01:35:44


Mikey


I'm using this command in B3D:

Instr(DMY$,Chr$(34))

which does not work.
How can I process this using the instr command?

Matty

Save this file as test.bb in your folder and run it from the IDE. It finds the quote fine:

;here is a string with a " in it.
infile = ReadFile("test.bb")
While(Not(Eof(infile)))
    val$ = ReadLine(infile)
    If(Instr(val$,Chr$(34),1)>0) Then
        RuntimeError "found it!"
    EndIf
Wend
CloseFile infile

TomToad

You need to show us more of what you are doing.  This works fine on my computer.

l$ = "This is a "+Chr(34)+"quote"+Chr(34)



Print l$

Print Instr(l$,Chr$(34))
------------------------------------------------
8 rabbits equals 1 rabbyte.

RemiD

@TomToad>>Thanks! In the past I needed to add the " symbol to a string but did not know how to do it... Good :)

Mikey


I changed the code to something like this before seeing the posts here:

PL$= ReadLine(FH1):Print PL$:Print Len(PL$);Print Asc(Mid$(PL$,Instr(PL$,",")-1))
RL$=PL$
While Instr(RL$,",")
TC=Instr(RL$,",")
If TC
.CK If Asc(Mid$(RL$,TC-1,1))<>34
TC=TC+Instr(Mid$(RL$,TC+1),","):Goto CK
ElseIf Asc(Mid$(RL$,TC-1,1))=34
Print Mid$(RL$,2,Len(RL$)-1)
EndIf
SD=SD+1
If SD>=2
Print TC:Print Mid$(RL$,2,TC-1):HALT(1)
EndIf
RL$=Mid$(RL$,TC+1)
EndIf
; E=E+1
Wend


It's not so much the reading of the " character. The other issue is that the text I'm parsing has them in odd places.

Steve Elliott

Win11 64Gb 12th Gen Intel i9 12900K 3.2Ghz Nvidia RTX 3070Ti 8Gb
Win11 16Gb 12th Gen Intel i5 12450H 2Ghz Nvidia RTX 2050 8Gb
Win11  Pro 8Gb Celeron Intel UHD Graphics 600
Win10/Linux Mint 16Gb 4th Gen Intel i5 4570 3.2GHz, Nvidia GeForce GTX 1050 2Gb
macOS 32Gb Apple M2Max
pi5 8Gb
Spectrum Next 2Mb


col

https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."

Steve Elliott

lol you'll goto spaghetti hell...And nobody wants to go there  ;D

col

Hehe,

In all seriousness :) instead of making fun we could offer some help with parsing, if it's wanted of course.

If you were to look up 'lexical analyzer' ( don't worry about the name - it just looks all technical and sci-fi but it's not really ) you could create a powerful parser that can parse practically anything you want.

The use of a lexical analyzer is to break a big stream of characters into more manageable pieces called 'tokens'. You can then 'parse' the tokens much more easily. Think of it as breaking a huge sentence of text into words by using the space character as the separator between the words, then all 'non-space' characters get grouped into a single word - you now only need to deal with 'words' instead of the huge stream of characters.
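
As a rough illustration of the word-splitting analogy above, here is a minimal Blitz3D sketch. The names SplitWords and words$ are made up for this example, and the 32-word limit is an arbitrary assumption.

```blitzbasic
; Minimal sketch: split a sentence into words, using the space as the separator.
; SplitWords and words$ are hypothetical names invented for this example.
Dim words$(31) ; room for up to 32 words (arbitrary limit)

Function SplitWords(s$)
    Local count = 0
    Local start = 1
    Local i
    s$ = s$ + " " ; sentinel separator so the last word is flushed too
    For i = 1 To Len(s$)
        If Mid$(s$, i, 1) = " "
            ; only store non-empty runs of characters
            If (i > start) And (count < 32)
                words$(count) = Mid$(s$, start, i - start)
                count = count + 1
            EndIf
            start = i + 1
        EndIf
    Next
    Return count ; number of words stored in words$
End Function

n = SplitWords("break a big stream into words")
For k = 0 To n - 1
    Print words$(k)
Next
```

A real lexer does the same grouping, but with more separators than the space and with a type attached to each piece.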

Mikey

Quote: If you were to look up 'lexical analyzer' ( don't worry about the name - it just looks all technical and sci-fi but it's not really ) you could create a powerful parser that can parse practically anything you want.

Yes that would suffice but I'm using BB and it has its limitations.
On the old site I think someone made something that would do that but it might be too much additional code.

Yes, I do know that Goto statements can make your code disorganized, though one or two of them won't. However, if that's all you use for directing code, then: Look Out !!

Here is the line I'm trying to parse.

"07076051.wav","Two-stroke petrol engine driving small elevator, start, run, stop.","194","Engines: Petrol","EC117D","Diesel & Petrol Engines","4"

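For a line in this shape, one possible shortcut is to walk from quote to quote with Instr and ignore everything in between, so the commas inside a field never matter. This is only a sketch under two assumptions: every field is wrapped in " and no field contains a " itself. The variable names are made up for the example, and the line is shortened to three fields.

```blitzbasic
; Minimal sketch: print each quoted field of one line.
; Assumes every field is wrapped in " and contains no embedded " characters.
q$ = Chr$(34)
l$ = q$ + "07076051.wav" + q$ + ","
l$ = l$ + q$ + "Two-stroke petrol engine driving small elevator, start, run, stop." + q$ + ","
l$ = l$ + q$ + "194" + q$

scanPos = 1
Repeat
    openQ = Instr(l$, q$, scanPos)        ; next opening quote
    If openQ = 0 Then Exit                ; no more fields
    closeQ = Instr(l$, q$, openQ + 1)     ; its closing quote
    If closeQ = 0 Then Exit               ; unterminated field: stop
    Print Mid$(l$, openQ + 1, closeQ - openQ - 1)
    scanPos = closeQ + 1                  ; continue after the closing quote
Forever
```

If the data can contain quotes inside a field, this approach breaks down and a token-based parser is the safer route.
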
col

omg I'm so rusty with B3D and BB code  :))

I managed to throw this together in my lunch break. I ran out of time to fully test it though  ;)
It may seem complicated at first but it would be the beginnings of a full-on lexer and parser.
I'm so used to OO code nowadays that I tried to take an OO approach to creating a lexical analyzer and a parser to parse the line you mentioned above.

If it's too complicated then oh well, maybe you or someone may be able to cherry pick bits and pieces for their own benefit - or just throw it in the bin :D

For the sake of keeping the code and data in one source file, I changed the quotes ( " ) in what would be the text file to ~q ( a BlitzMax-style in-text quote ), and the code parses that format accordingly. You would need to change that part to handle the quote symbol if you're loading from a file.
EDIT: Coffee break time - I modified the code to handle the string data as you have it ( with " ).

Also there may be slight bugs in there as I doubt it handles 'corner' cases and will need thorough testing.



Const TOKEN_UNKNOWN = 0
Const TOKEN_STRING = 1
Const TOKEN_COMMA = 2
Const TOKEN_EOL = 3
Const TOKEN_EOF = 4

Type TToken
    Field value$
    Field tipe
End Type

Type TLexer
    Field in$
    Field inlength

    Field token_start
    Field token_end
End Type


Function TokenTypeToString$(TokenType)
    Select TokenType
        Case TOKEN_UNKNOWN Return "TOKEN_UNKNOWN"
        Case TOKEN_STRING Return  "TOKEN_STRING "
        Case TOKEN_COMMA Return   "TOKEN_COMMA  "
        Case TOKEN_EOL Return     "TOKEN_EOL    "
        Case TOKEN_EOF Return     "TOKEN_EOF    "
    End Select
End Function

Function Lexer_NextToken.TToken(lex.TLexer)
    ; default to an unknown token
    Local t.TToken = New TToken
    t\tipe = TOKEN_UNKNOWN

    ; reset the token start position
    lex\token_start = lex\token_end

    ; end of line?
    If Lexer_IsEndOfLine(lex) ; could deal with end-of-file also if wanted
        t\tipe = TOKEN_EOL
        t\value = "end-of-line"

    ; opening quote: Lexer_IsQuote has already stepped past it
    Else If Lexer_IsQuote(lex)
        ; test each character before advancing, so an empty string ( two quotes in a row ) is handled
        While lex\token_end <= lex\inlength
            If Lexer_IsQuote(lex) Exit
            lex\token_end = lex\token_end + 1
        Wend
        t\tipe = TOKEN_STRING
        ; the value keeps its surrounding quotes
        t\value = Mid(lex\in, lex\token_start, lex\token_end - lex\token_start)
    Else
        ; make a known symbol
        If Asc(Mid(lex\in, lex\token_end, 1)) = 44 ; ','
            t\tipe = TOKEN_COMMA
            t\value = ","
        Else
            ; unrecognised character: report it as a TOKEN_UNKNOWN value
            t\value = Mid(lex\in, lex\token_end, 1)
        EndIf
        ; always consume the character, otherwise an unknown one would loop forever
        lex\token_end = lex\token_end + 1
    EndIf

    Return t
End Function

Function Lexer_IsQuote(lex.TLexer)
    ; if the current character is a " ( Chr(34) ), step past it and report True
    If Mid(lex\in, lex\token_end, 1) = Chr(34)
        lex\token_end = lex\token_end + 1
        Return True
    EndIf
    Return False
End Function

Function Lexer_IsEndOfLine(lex.TLexer)
    ; positions are 1-based, so the input is exhausted only once token_end has moved past the last character
    Return lex\token_end > lex\inlength
End Function





; you would now create a TParser to handle the token types as per the syntax that you expect
Type TParser
    Field lexer.TLexer
    Field token.TToken
End Type

Function Parser_Parse(parser.TParser)
    Parser_NextToken(parser)

    While parser\token\tipe <> TOKEN_EOL
        ; Select the parser\token\tipe and do something meaningful with it, here the code prints out the token data
        ; You would do something more meaningful with the data
        Print TokenTypeToString(parser\token\tipe) + " : " + parser\token\value
        Parser_NextToken(parser)
    Wend
End Function

Function Parser_NextToken(parser.TParser)
    parser\token = Lexer_NextToken(parser\lexer)
End Function

Local in$ = Chr(34) + "07076051.wav" + Chr(34) + ","
in = in + Chr(34) + "Two-stroke petrol engine driving small elevator, start, run, stop." + Chr(34) + ","
in = in + Chr(34) + "194" + Chr(34) + ","
in = in + Chr(34) + "Engines: Petrol" + Chr(34) + ","
in = in + Chr(34) + "EC117D" + Chr(34) + ","
in = in + Chr(34) + "Diesel & Petrol Engines" + Chr(34) + ","
in = in + Chr(34) + "4"+Chr(34)

; set up a lexer with data
Local lexer.TLexer = New TLexer
lexer\token_start = 1
lexer\token_end = 1
lexer\in = in ; "~q07076051.wav~q,~qTwo-stroke petrol engine driving small elevator, start, run, Stop.~q,~q194~q,~qEngines: Petrol~q,~qEC117D~q,~qDiesel & Petrol Engines~q,~q4~q"
lexer\inlength = Len(lexer\in)


; setup a parser with the lexer
Local parser.TParser = New TParser
parser\lexer = lexer

; use the parser to control the lexer and parse the output of the lexer tokens
Parser_Parse(parser)
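
The parser loop above only prints each token, and a TOKEN_STRING value still carries its surrounding quotes. One way to "do something more meaningful" would be to strip them before storing the field. The helper below is a hypothetical sketch, not part of the code above; StripQuotes is a made-up name.

```blitzbasic
; Hypothetical helper: remove one leading and one trailing " from a token value.
; Values without a quote at both ends are returned unchanged.
Function StripQuotes$(v$)
    Local q$ = Chr$(34)
    If (Len(v$) >= 2) And (Left$(v$, 1) = q$) And (Right$(v$, 1) = q$)
        Return Mid$(v$, 2, Len(v$) - 2)
    EndIf
    Return v$
End Function

; stand-alone usage example
Print StripQuotes(Chr$(34) + "EC117D" + Chr$(34))
```

Inside Parser_Parse it could be applied to parser\token\value whenever the token type is TOKEN_STRING.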






3DzForMe

@col: lovely piece of code. I did a slice of data parsing myself in the past with B3D and still use my clunkier code to good effect. The only one who really benefits from it is moi. Hey Ho ;) :))
BLitz3D, IDEal, AGK Studio, BMax, Java Code, Cerberus
Recent Hardware: Dell Laptop
Oldest Hardware: Commodore Amiga 1200 with 1084S Monitor & Blitz Basic 2.1

Mikey

Using polymorphism would have made the source smaller, but it does work.