
Author Topic: [bmx] SplitString by Perturbatio [ 1+ years ago ]  (Read 3431 times)

BlitzBot
« on: June 29, 2017, 12:28:40 AM »
Title : SplitString
Author : Perturbatio
Posted : 1+ years ago

Description : Function that splits a string at the specified single-character delimiter and returns the tokens as a TList

Code :
Code: BlitzMax
Function SplitString:TList(inString:String, Delim:String)
    Local tempList : TList = New TList
    Local currentChar : String = ""
    Local count : Int = 0
    Local TokenStart : Int = 0

    'Only single-character delimiters are supported
    If Len(Delim)<>1 Then Return Null

    'Strip whitespace from both ends of the input
    inString = Trim(inString)

    For count = 0 Until Len(inString)
        'Compare the one-character slice at this position with the delimiter
        If inString[count..count+1] = delim Then
            tempList.AddLast(inString[TokenStart..Count])
            TokenStart = count + 1
        End If
    Next
    'Add the final token (everything after the last delimiter)
    tempList.AddLast(inString[TokenStart..Count])
    Return tempList
End Function

'Example usage:
Local myList:TList = SplitString("This is a longer test string that I am using to test this split string test thing", " ")

If myList Then
    For a$ = EachIn myList
        Print a$
    Next
EndIf


Comments : none...

Henri
« Reply #1 on: June 24, 2018, 02:09:41 PM »
Hi,

Sorry, I'm a bit late to comment :-(

BlitzMax already has a built-in string Split method:
Code: BlitzMax
Local txt:String = "This is a longer test string that I am using to test this split string test thing"
Local ar:String[] = txt.Split(" ")

For Local i:Int = 0 Until ar.length
    Print ar[i]
Next

Split is a basic tool for parsing text in an easy way. Yours is a slightly lower-level approach, but it is good when more advanced control is needed. For instance, Split can only take one delimiter, as far as I know.
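
For what it's worth, a common workaround when several single-character delimiters are needed is to normalise them to one separator first and then call Split. A minimal sketch, assuming it is acceptable to lose the distinction between the delimiters (the sample text is made up):

```blitzmax
SuperStrict

Local txt:String = "one two,three four"

' Turn every comma into a space, then split on the space alone.
Local parts:String[] = txt.Replace(",", " ").Split(" ")

For Local i:Int = 0 Until parts.length
    Print parts[i]
Next
```

This builds an intermediate string, so it trades some speed for brevity.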
 
-Henri
- Got 01100011 problems, but the bit ain't 00000001

Derron
« Reply #2 on: June 24, 2018, 02:34:49 PM »
His function does a couple of things that cost performance unnecessarily:

- It creates new objects and only afterwards returns Null when an invalid parameter is passed (better to return Null right at the beginning, as no precalculation is needed for that check).

- It does a "bla[c..c+1] = delim" check, which is a slice operation, instead of just checking the character at that index (I don't know whether the generated code ends up the same, but I always thought slicing was more expensive).
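
To illustrate the second point: indexing a string yields the character code as an Int, so the comparison needs no temporary string. A small sketch (the variable names are only for illustration):

```blitzmax
SuperStrict

Local s:String = "a,b,c"
Local delimCode:Int = Asc(",")    ' character code of the delimiter

For Local i:Int = 0 Until s.length
    ' s[i] is an Int character code; s[i..i+1] would build a new one-character string
    If s[i] = delimCode Then Print "delimiter at index " + i
Next
```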


bye
Ron

Henri
« Reply #3 on: June 24, 2018, 03:51:52 PM »
Yes, slicing for the check is unnecessary.

A better way is to compare against the character code.

Here is a slightly more optimised version that also accepts multiple delimiters:

Code: BlitzMax
Function SplitString:TList(inString:String, delim:String[])

    'Bail out before allocating anything if the input is unusable
    If Not inString Or Not delim Then Return Null

    Local tempList:TList = New TList
    Local currentChar:String = ""
    Local count:Int = 0
    Local tokenStart:Int = 0
    Local delimChar:Int
    Local delimPos:Int

    'Multiple delimiters: compare each character against the first character of every delimiter
    If delim.length > 1
        For count = 0 Until inString.length
            For delimPos = 0 Until delim.length
                If inString[count] = delim[delimPos][0] Then
                    tempList.AddLast(inString[tokenStart..Count])
                    tokenStart = count + 1
                    Exit
                End If
            Next
        Next

    'Single delimiter: handled separately just to save a few microseconds
    Else
        delimChar = Asc(delim[0])
        For count = 0 Until inString.length
            If inString[count] = delimChar Then
                tempList.AddLast(inString[tokenStart..Count])
                tokenStart = count + 1
            End If
        Next
    EndIf

    'Add the final token (everything after the last delimiter)
    tempList.AddLast(inString[TokenStart..Count])
    Return tempList
End Function

'Example usage:
Local myList:TList = SplitString("This is a longer test string that I am using,to,test,this split string test thing", [" ", ","])

For Local s:String = EachIn myList
    Print s
Next


-Henri

Derron
« Reply #4 on: June 24, 2018, 04:33:53 PM »
You have this line here:
inString = inString.Trim()

which _can_ cause trouble, as it removes whitespace.

Imagine someone used "    " as the delimiter (four spaces, which is common when tabs are replaced with spaces). The joined string for "empty string" + "empty string" + "hey" then becomes:
4 spaces
4 spaces
hey

With your Trim() call you destroy that data.


Regardless of how "efficient" something is: never manipulate the data unless you can be sure that the manipulation only removes unused data. A space " " is a valid delimiter.


bye
Ron

Henri
« Reply #5 on: June 24, 2018, 05:13:28 PM »
What I wrote doesn't support four-character delimiters, only single-character ones. It would have to be modified to allow them.
Also, this is very basic and doesn't cover advanced scenarios. Again, it would have to be modified for that.
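
One possible modification for delimiters of any length is to search for the whole delimiter with Find instead of comparing single characters. This is only a sketch: SplitByString is a hypothetical name, not part of the code posted above, and it handles a single delimiter string only.

```blitzmax
SuperStrict

' Hypothetical helper: splits source at every occurrence of delim (any length).
Function SplitByString:TList(source:String, delim:String)
    ' An empty delimiter would never advance, so reject it up front
    If delim.length = 0 Then Return Null

    Local tokens:TList = New TList
    Local tokenStart:Int = 0
    Local pos:Int = source.Find(delim, tokenStart)

    ' Find returns -1 once no further delimiter exists
    While pos <> -1
        tokens.AddLast(source[tokenStart..pos])
        tokenStart = pos + delim.length
        pos = source.Find(delim, tokenStart)
    Wend

    ' Remainder after the last delimiter (may be an empty string)
    tokens.AddLast(source[tokenStart..])
    Return tokens
End Function

' Example usage with a four-space delimiter:
Local list:TList = SplitByString("first    second    third", "    ")

If list Then
    For Local s:String = EachIn list
        Print s
    Next
EndIf
```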

Trim was left there because of the original poster's wish to trim unprintable characters at both ends of the input string. From my point of view there is a valid reason for it, as you might not want those in a token. It depends on what you want.

Here are some split examples http://pages.cs.wisc.edu/~hasti/cs302/examples/Parsing/parseString.html

What is shown above is simple parsing. Parsing something like computer code (which I assume you mean, given the case of replacing tabs with 4 spaces) requires more advanced handling, as you have to consider things like comments, blocks, different EOL characters, typing styles, start of line and so on.

-Henri

Derron
« Reply #6 on: June 24, 2018, 07:02:41 PM »
Ah yes... my fault. The code clearly only handles single-character delimiters, so you can most probably ignore what I wrote about the four-space delimiter.

Nonetheless, take the same example, but this time with "~t" (tab) as the delimiter, which is pretty common (e.g. tab-separated values, like a "csv" with tabs instead of commas).
Again the first serialized objects are empty strings, so the resulting string becomes "~t~they".

The trim now removes the leading tabs, which results in the first field becoming "hey" instead of "". This is why Trim() on the input is not useful when splitting. I would only trim the results, and only if I know that they must not contain any whitespace at the beginning or end, so that "hey " gets trimmed to "hey".
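
The effect can be checked with the built-in Split. This is a small sketch with illustrative variable names; the first count includes the two leading empty fields, while the second does not because Trim() has already stripped the leading tabs.

```blitzmax
SuperStrict

Local raw:String = "~t~they"    ' two empty fields followed by "hey"

Local untouched:String[] = raw.Split("~t")
Local trimmedFirst:String[] = raw.Trim().Split("~t")

Print "fields without Trim: " + untouched.length
Print "fields with Trim first: " + trimmedFirst.length
```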


bye
Ron

Henri
« Reply #7 on: June 24, 2018, 08:32:18 PM »
I agree with you that trimming should be left to the user, so I updated the code (although it was not my intention to supersede the OP :-) ).

On a side note, leading or trailing hidden characters on a token are almost never wanted.
 
-Henri

Derron
« Reply #8 on: June 24, 2018, 10:48:22 PM »
Yes, on a token it might not be desired, which is why you could trim the values before adding them to the list. But as described above (and seconded by you), trimming the original/raw string can lead to issues.


The OP has not been active for many years now, so I assume he won't come here full of anger because you optimized his outdated code :-)
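
Trimming the values rather than the raw string could look like the following. It is a sketch using the built-in Split with made-up sample data; the same Trim() call could equally be applied to each token just before AddLast in a list-based splitter.

```blitzmax
SuperStrict

Local raw:String = "~t hey ~tyou"
Local tokens:String[] = raw.Split("~t")

' Trim each token individually; the raw string itself is never modified,
' so the leading empty field survives.
For Local i:Int = 0 Until tokens.length
    tokens[i] = tokens[i].Trim()
    Print "[" + tokens[i] + "]"
Next
```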


bye
Ron

 
