JSON parser for NG?

Started by Yellownakji, April 06, 2019, 23:16:59


Derron

As said above: yes brl.json uses fromutf8string ...which does not support that emoji utf8 code range yet.
Let's wait for a fix for the linked issue.


Bye
Ron

TomToad

Been looking at this issue a little closer. 
Let me break down what exactly brl.json is doing when you call TJSON.Load()
First, before being parsed, the json string is converted to a UTF-8 string.  Note that this is before parsing, so escaped unicode characters are converted as literals, i.e. "\u1234" becomes '\','u','1','2','3','4'.  This string is then sent to pub.jansson where it is parsed.  The escaped unicode characters are then parsed and translated to their UTF-8 equivalents.  When you then try to access this string, it is converted from UTF-8 to a BlitzMax string (which is where we get the error).

The first two unicode values in Hezkore's example are actually a surrogate pair.  A surrogate pair encodes one of the code points above 0xFFFF (20 bits of payload once 0x10000 has been subtracted) as two 16-bit UTF-16 code units. So \ud83e\udd37 encodes code point 0x1F937, which is a shrug emoji.  This gets translated to the UTF-8 byte sequence F0 9F A4 B7.
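The arithmetic can be checked with a few lines of Python (used here only as a neutral sketch language; the function name is made up for illustration and is not part of brl.json or Jansson):

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 payload bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

code_point = combine_surrogates(0xD83E, 0xDD37)
utf8_bytes = chr(code_point).encode("utf-8")

print(hex(code_point))                            # 0x1f937
print(" ".join("%02X" % b for b in utf8_bytes))   # F0 9F A4 B7
```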

The solution here seems simpler than it first appears.  All that would need to be done is to modify the bbStringFromUTF8() function to work with 4 byte sequences (currently it only decodes up to 3 bytes).  If the value is above 65535, then convert it to a surrogate pair.  Of course, BlitzMax won't be able to do anything with the pair, it will most likely just print random characters as BlitzMax only recognizes the first 65536 code points as valid.  But you will still be able to process the characters and transfer them between programs, and most importantly, BlitzMax will not crash.
------------------------------------------------
8 rabbits equals 1 rabbyte.

TomToad

Success.  A small modification to brl.mod/blitz.mod/blitz_string.c fixes the problem.
Alter lines 267-268 like so
if( v & 0xffff0000 ) //was: bbExThrowCString( "Unicode character out of UCS-2 range" );
{
	//code point above 0xFFFF: store it as a UTF-16 surrogate pair
	v -= 0x10000;
	d = ((v >> 10) & 0x3ff) + 0xd800; //high surrogate, top 10 bits
	e = (v & 0x3ff) + 0xdc00;         //low surrogate, bottom 10 bits
	*q++=d;
	*q++=e;
}else{
	*q++=v;
}


Now when you get a UTF-8 code > 65535, it will create a surrogate pair instead of crashing.  Once again, if you try and print the string, you'll just end up with random characters.
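
For anyone who wants to experiment with the idea outside of the C source, here is a rough Python sketch of the same technique: decode UTF-8 sequences of 1 to 4 bytes and emit 16-bit units, splitting anything above 0xFFFF into a surrogate pair. It is not the actual bbStringFromUTF8() code; the function name is hypothetical and, to stay short, it assumes well-formed input and does no validation.

```python
from typing import List

def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Decode UTF-8 into 16-bit code units, using surrogate pairs above 0xFFFF."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:      # 1 byte:  0xxxxxxx
            v, extra = b, 0
        elif b < 0xE0:    # 2 bytes: 110xxxxx
            v, extra = b & 0x1F, 1
        elif b < 0xF0:    # 3 bytes: 1110xxxx
            v, extra = b & 0x0F, 2
        else:             # 4 bytes: 11110xxx
            v, extra = b & 0x07, 3
        for cont in data[i + 1 : i + 1 + extra]:
            v = (v << 6) | (cont & 0x3F)   # continuation byte: 10xxxxxx
        i += 1 + extra
        if v > 0xFFFF:
            v -= 0x10000
            units.append(0xD800 + ((v >> 10) & 0x3FF))  # high surrogate
            units.append(0xDC00 + (v & 0x3FF))          # low surrogate
        else:
            units.append(v)
    return units

print([hex(u) for u in utf8_to_utf16_units(b"A\xf0\x9f\xa4\xb7")])
# ['0x41', '0xd83e', '0xdd37']
```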

Yellownakji

Quote from: Derron on August 04, 2019, 09:13:26
It is your opinion to say Brucey's brl.json is crap.
Feel free to add your code so the community can benefit from your less "crap", better performing, better documented and overall better module.


I go along with that it needs examples and maybe some more documentation - but this is what the NG users could do for Brucey - so he does not need to take care of this alongside to other stuff.


@ own parser
So how did you solve the UTF8-handling which is broken in "brl.blitz" and not in "brl.json". Let me guess, you wrote your own code doing it superior but you hide it from us so only you can benefit?


So please: give your oo json implementation which covers the stuff brl.json covers and handles. Maybe make "persistence.mod" work with it too so it is far easier for us to serialize blitzmax objects into json data - and vice versa. For now there is just brl.json doing this and of course a more lightweight, less error prone and better performing lib is always appreciated.


bye
Ron

'Readline', 'contains' and 'compare' functions are your best friend.  Combine with a 'select case' loop.  -   You can easily create lists of entries and array entries just by comparing and checking for common '{}' and joined ',' patterns.  JSON follows a very linear format.   '{}' already inside of a defined '{}' clearly indicates an array list and ',' , at all times, clearly denotes 'new' as in 'new list' or 'next parent entry'.   Don't forget to include spaces (' ') into account.   Spaces are sometimes used as a 'break' point, but not often.

I only replied because i had 'notify me of replies' checked.    This is a necro thread, as Hezkore stated.  I cannot offer any other help on my topic, nor do i have the time to.   I'm unsubscribing from this thread as it's no longer of use to me nor am i a current member of this forum.  -  I do not utilize NG for my projects any longer except for existing projects still being maintained.  From time to time, i continue to load the forum and lurk around but this is halting.

Oh, and Derron?  Stop replying to current and new users like you have a stick up your rear.   You are one of the issues on this forum and a prime reason why a select few no longer participate.  Just stop!

I'm sorry i couldn't assist you further, Hezkore;  Good day, everyone.

Cheers.

Derron

@ TomToad
Cool beans. Dunno if this is more useful than a custom "callback" to allow individual handling (escaping them back with "\..." or some hexcode like in html) but it is surely offering help to hezkore.


@Yellownakji
Bye.

PS: "Jansson" (the library behind brl.json) does more and allows more than just splitting given strings/files into data objects according to some syntax. Also using a library allows you to skip writing the tedious stuff and rely on the bugfixes they do. Of course a stripped snippet (reading simple XML, json, ini, ... files) is more lightweight than a full blown (or in case of mxml "lightweight but feature-rich") library, yet you have a lot to do if you want to create these files (supporting comments, encoding, groups, enquoting,...). So it all depends on how much you like to reinvent wheels.

Have fun with the new language of choice.


bye
Ron

Brucey

Quote from: Yellownakji on August 04, 2019, 08:49:26
BRL.JSON in general is pure crap, so i wasn't going to stick with it in regardless.

Thanks.

I do not know of any bugs with the current implementation. The latest issue of large unicode values is a problem with BlitzMax specifically.
brl.json should now be able to process those problem json files with the latest updates to the brl modules, which add support for storing large unicode values in surrogate pairs - unfortunately, that's all it's doing at the moment, as BlitzMax doesn't currently understand the difference between a character and the pairs that may be combined to make it.
This will affect pretty much any string manipulation you do on such a string. I don't have a plan to do anything about it at the moment.
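
To illustrate what "affect" means here, a small Python sketch (not BlitzMax; the helper name is made up) that looks at a string the way a 16-bit string type does. The length counts code units rather than characters, and a slice can cut a pair in half:

```python
import struct

def utf16_units(s):
    """Return the 16-bit UTF-16 code units of a string."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

text = "a\U0001F937b"          # 'a', shrug emoji, 'b'
units = utf16_units(text)

print(len(text))               # 3 code points
print(len(units))              # 4 code units, the emoji takes two of them
print([hex(u) for u in units[:2]])
# ['0x61', '0xd83e'] -> a "left 2" style slice ends on a lone high surrogate
```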
However, Brl.TextStream has also been updated with improved support for both surrogate pairs and loading utf-8 files.

YMMV


Brucey

I'm also looking at a more useful serialiser/deserialiser for brl.json, along the lines of Gson (java). Unlike my persistence module, which stores info about the objects themselves, this is more of a raw serialiser - but probably more useful for "world data".

Here's a working example :

Local txt:String = "{~qx~q:10, ~qy~q:10, ~qw~q:20, ~qh~q:20}"

Local jconv:TJConv = New TJConv

Local box:TBox = TBox(jconv.FromJson(txt, "TBox"))
Print box.ToString()

Type TBox

	Field x:Int
	Field y:Int
	Field w:Int
	Field h:Int

	Method ToString:String()
		Return x + ", " + y + ", " + w + ", " + h
	End Method
End Type

which prints the result : 10, 10, 20, 20

and another

Local txt:String = "{~qname~q:~qFred~q, ~qage~q:22}"

Local jconv:TJConv = New TJConv

Local person:TPerson = TPerson(jconv.FromJson(txt, "TPerson"))
Print person.name + " : " + person.age

Type TPerson
	Field name:String
	Field age:Int
End Type

which prints the result : Fred : 22

This hides the complexity of JSON, and lets you create simple objects that you can populate directly from JSON data, which is probably what most people want to do.
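
For readers who want to see the general idea without the module, here is a minimal Python sketch of Gson-style mapping: JSON keys are matched to declared fields by name. This is an illustration only, not how TJConv is implemented; the function `from_json` and the class `Box` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Box:
    x: int = 0
    y: int = 0
    w: int = 0
    h: int = 0

def from_json(text, cls):
    """Build an instance of cls from a JSON object, matching keys to field names."""
    data = json.loads(text)
    known = {f.name for f in fields(cls)}
    # Keys without a matching field are dropped; missing keys keep their defaults.
    return cls(**{k: v for k, v in data.items() if k in known})

box = from_json('{"x":10, "y":10, "w":20, "h":20}', Box)
print(box.x, box.y, box.w, box.h)   # 10 10 20 20
```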

TomToad

@brucey:  can't wait to see what you do with this.  This would make it so much easier to transfer data to/from other languages.  AGK2 has ToJSON and FromJSON commands, and JavaScript has JSON.parse and JSON.stringify.

I actually tried using the bah.persistencejson module, but with all the extra data, it was useless.

Derron

questions about your "raw serialiser":

- what happens to removed fields?
- what happens to changed field types (x:int becomes a x:Vec2D)?
- could it be used as "base" to properly serialize objects and their relations to each other (like "persistence" does)?

Asking as you know that I use the persistence.mod to create and restore savestates for my games. Storing all the data with "IDs" and just referencing them by IDs in other objects would work somehow but that leads to "races" of what is serialized first (resolving references/validation) except I have lazy "Getters" which would fetch a reference on use ("GetPerson:TPerson(): if not person then person = GetPersonCollection().GetByID(personID); return person").
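
The lazy getter idea, sketched in Python rather than BlitzMax. All class names are hypothetical; the point is only that the object stores the ID and looks up the reference on first use, so the order in which things were deserialized stops mattering:

```python
class PersonCollection:
    """Hypothetical registry mapping IDs to already-restored persons."""
    def __init__(self):
        self._by_id = {}

    def add(self, person):
        self._by_id[person.id] = person

    def get_by_id(self, person_id):
        return self._by_id.get(person_id)

class Person:
    def __init__(self, id, name):
        self.id, self.name = id, name

class Contract:
    def __init__(self, person_id, collection):
        self.person_id = person_id      # only the ID is stored in the savegame
        self._collection = collection
        self._person = None             # resolved on first use

    def get_person(self):
        if self._person is None:
            self._person = self._collection.get_by_id(self.person_id)
        return self._person

people = PersonCollection()
contract = Contract(person_id=7, collection=people)   # restored before its person
people.add(Person(7, "Fred"))
print(contract.get_person().name)   # Fred
```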

While I could use your existing solution it would be - imho - better to have some kind of "basic functionality" on which the persistence module can build up. That way fixes in the base class are automatically done to the extended one.


Will have to test your tpersistence-json-mod more deeply. The XML-Savegames of some hours of playing (so "gameday 20" or so) are uncompressed 70MB already :-) And yes, I already prune too old game stats (archives of audience ratings, market situations ... ). Json is less "verbose" than XML so it saves a lot from skipping all the repeated fieldnames. Saves and loads faster ("writes/reads less").


PS: it uses brl.reflection. So for now (!) it is not thread safe. Keep that in mind.

PPS: Meanwhile TomToad posted.
Yes, I too think a "DataObject.FromJson()" and "DataObject.ToJson()" could be useful when interacting with other stuff. Brucey might know of my simple "TData" container which has some getters, setters and contains a "TMap" storing the properties. With the new brl.collections there might be mightier collection types than a "TMap" (or TStringMap in this case).
The idea is to have a "dynamic" collection which can hold all the objects - even other collections for "children lists".

So deserialized Json data could be accessed like this:
data.GetData("sprite1").GetInt("x")
data.GetData("sprite1").GetInt("y")
or
Vec2D(data.GetData("sprite1").Get("position")).x

So in essence: a generic data container which can serialize into json (or xml or ...) and deserialize from json, xml ...
You could even do it like BlitzMax likes it to do: register "drivers".
So there is an abstract serialize/deserialize. And then you can do 'data.serialize(uri, "xml")'. In addition the procedural interface "serializeToXML(data, uri)" can still be made available.
This way you can load whatever serializer-type you want - maybe even your custom "binary and encrypted"-serializer.
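
A rough Python sketch of such a container with registered "drivers". It is not the existing TData code; every name in it is invented for illustration, and it serializes to a string instead of a URI to stay self-contained:

```python
import json

class Data:
    """Hypothetical generic container: a dict of properties with typed getters."""
    _drivers = {}   # format name -> (serialize, deserialize)

    def __init__(self, props=None):
        self.props = dict(props or {})

    def get(self, key, default=None):
        return self.props.get(key, default)

    def get_int(self, key, default=0):
        return int(self.props.get(key, default))

    def get_data(self, key):
        # Child containers are stored as plain dicts and wrapped on access.
        return Data(self.props.get(key, {}))

    @classmethod
    def register_driver(cls, name, serialize, deserialize):
        cls._drivers[name] = (serialize, deserialize)

    def serialize(self, fmt):
        return self._drivers[fmt][0](self.props)

    @classmethod
    def deserialize(cls, text, fmt):
        return cls(cls._drivers[fmt][1](text))

# The JSON driver is two stdlib functions; an XML or binary one would plug in the same way.
Data.register_driver("json", json.dumps, json.loads)

data = Data.deserialize('{"sprite1": {"x": 3, "y": 4}}', "json")
print(data.get_data("sprite1").get_int("x"))   # 3
```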




bye
Ron

TomToad

There are a couple of ways that a Json serializer could be implemented.  One would be the way AGK2 does it.  The type needs to be created.  When calling ToJson, the json string is created with the same structure as the type.  Field identifiers would be the key and the value would be the object value.  So:
type Player
   position as Vec2
   speed as Vec2
endtype

myPlayer as Player
myPlayer.position.x = 100
myPlayer.position.y = 200
myPlayer.speed.x = 50
myPlayer.speed.y = 75

js as string
js = myPlayer.ToJson()

//js now contains the string {"position":{"x":100,"y":200},"speed":{"x":50,"y":75}}
js = "{'position':{'x':50},'speed':{'x':100},'extra':'The main player'}"
secondPlayer as Player
secondPlayer.FromJson(js)

//secondPlayer.position.x is 50, secondPlayer.position.y is set to default of 0 as it doesn't exist in the string
//secondPlayer.speed.x is 100, secondPlayer.speed.y is set to default of 0 as it doesn't exist in the string
//'extra' is ignored as there is no equivalent field in the type


JavaScript actually builds the object on the fly.  Every key found in the JSON becomes a property of the result, whether you expected it or not.

var myPlayer = {position:{x:50,y:100},speed:{x:10,y:20}}
var js = JSON.stringify(myPlayer)

//js contains the string {"position":{"x":50,"y":100},"speed":{"x":10,"y":20}}
js = '{"position":{"x":50},"speed":{"y":20},"extra":"The second player"}'
var secondPlayer = JSON.parse(js)
//secondPlayer.position does not contain a y field
var posY = secondPlayer.position.y
//due to javascript's dynamic nature, accessing a non-existent field is not an error:
//   the read simply yields undefined, so posY is undefined here. No field is created.
var extra = secondPlayer.extra
//the 'extra' key also creates a field in javascript, so extra now contains "The second player"

I think it would be best to use more of the AGK way with BlitzMax instead of the javascript way.  If a key has an object as its value, and there are no matching fields in the type, there's no way to create one as there is no way to know which type the object is referring to.
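
A small Python sketch of the AGK-style behaviour, assuming the declared field type is what tells the loader which nested type to build. The names `fill`, `Vec2` and `Player` are only for illustration:

```python
import json
from dataclasses import dataclass, field, fields, is_dataclass

@dataclass
class Vec2:
    x: int = 0
    y: int = 0

@dataclass
class Player:
    position: Vec2 = field(default_factory=Vec2)
    speed: Vec2 = field(default_factory=Vec2)

def fill(cls, data):
    """Populate cls from a dict; the declared field type decides the nested class."""
    obj = cls()
    for f in fields(cls):
        if f.name not in data:
            continue                       # missing key: keep the default
        value = data[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = fill(f.type, value)    # nested object: recurse into the field's type
        setattr(obj, f.name, value)
    return obj                             # keys with no matching field are never read

js = '{"position":{"x":50},"speed":{"x":100},"extra":"The main player"}'
second = fill(Player, json.loads(js))
print(second.position, second.speed)
# Vec2(x=50, y=0) Vec2(x=100, y=0)
```

The 'extra' key has no field to land in, so it is skipped, which is the case described above where the loader cannot know what type to create.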

Derron

As we could not add new "methods" to objects (also it adds a dependency to the json-module even if not used...) it would require two procedural functions:
Function ToJson:string(o:object)
and
Function FromJson:object(s:string)

and in your custom types you could have your methods calling these ".ToJson()" and ".FromJson()" (or prepend the modulescope...).


What it needs of course, is the use of meta data - so you can disable serialization of certain properties. Because: how to serialize a function pointer? You can serialize a "TImage" as it is based on TPixmap which is pixel data but sometimes you do not want to store this stuff ("big images").
And even bigger: TImage is a derivative of "TGLImageFrame" or "TDXImageFrame" or ... so you serialize an engine/hardware or OS-dependent object.
This means you need to make sure to only serialize stuff working across all platforms you want to support.

For "persistence mod" there is {nopersist} to avoid handling certain properties.

I myself added some stuff to call custom serializers - this led Brucey to add a similar thing too (you can register serializers for custom object types - like Maps, Lists, ..).
Similar stuff should be made available for the "ToJson/FromJson"-functionality provider too.

That means you eg. store the URI of the image and on deserialization try to fetch it again, same for music, ...


So: for basic types, like "number/string containers" this is a nice addition, but for complex types containing "byte ptr", "images", "sounds" ... stuff can become complex. Or it enforces "collections/managers" and objects relating to these "binaries" via ID instead of object references.
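
To make the two ideas concrete (opting fields out via metadata, and registering a custom serializer per type), here is a hedged Python sketch. It is not the persistence.mod or brl.jconv API: the "nopersist" key merely borrows the name of the BlitzMax metadata tag, and `Image`, `Sprite` and `SERIALIZERS` are hypothetical.

```python
import json
from dataclasses import dataclass, field, fields

class Image:
    """Hypothetical stand-in for a platform-specific image object."""
    def __init__(self, uri):
        self.uri = uri
        self.pixels = object()   # not serializable, and not portable anyway

# Custom serializers per type: an Image is stored as its URI only.
SERIALIZERS = {Image: lambda img: {"uri": img.uri}}

@dataclass
class Sprite:
    name: str
    image: Image
    on_click: object = field(default=None, metadata={"nopersist": True})

def to_json(obj):
    out = {}
    for f in fields(obj):
        if f.metadata.get("nopersist"):
            continue                         # field opted out via metadata
        value = getattr(obj, f.name)
        custom = SERIALIZERS.get(type(value))
        out[f.name] = custom(value) if custom else value
    return json.dumps(out, sort_keys=True)

print(to_json(Sprite("hero", Image("gfx/hero.png"), on_click=print)))
# {"image": {"uri": "gfx/hero.png"}, "name": "hero"}
```

On loading, the matching deserializer would fetch the image from the stored URI again.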


bye
Ron

Brucey

I've committed the initial version of brl.jconv.

It currently has support for serializing primitive types, Strings, and Objects.
I'll probably work on Arrays next.

Then some kind of type adaptor interface for custom serializing.
And support for metadata - I'm thinking of things like enable/disable of fields, support for different names.
And a bunch of more configurable stuff via the builder.

This is not designed to be a "BlitzMax Object Serializer" (there's already a module that can do that) - although I imagine once type adaptors are implemented, you might be able to use it as such.

Brucey

I've added support for arrays (to brl.jconv), which more or less covers the majority of your typical JSON object structures.

It's required some updates to brl.reflection in order to avoid jumping through too many hoops.
I also pushed MaxUnit into the BRL namespace, because it's useful to have it as part of the core functionality - I've added some unit tests to brl.jconv.

Next I'll be looking at some form of custom type mapping/serializing, and adding more options.

Derron

Similar to the persistence.mod any serializer in some way needs to be able to tackle circular references.

Nice to have is a way to avoid duplicate data (A references B and C references B. D references A and C. D is the one to "jsonify").
Persistence.mod does a good job (but in some NG builds I got warnings about duplicate references...)
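
One common way to handle both points, sketched in Python: write every object once into a flat list and refer to it by index everywhere else. This is only an illustration of the technique, not what persistence.mod does internally; `Node` and `to_json` are made-up names.

```python
import json

class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []

def to_json(root):
    """Serialize a graph: each object is written once and referenced by ID."""
    ids = {}        # id(object) -> assigned ID
    objects = []    # serialized objects, index == assigned ID

    def visit(node):
        key = id(node)
        if key in ids:
            return ids[key]            # already written (shared or circular reference)
        ids[key] = len(objects)
        entry = {"name": node.name, "refs": []}
        objects.append(entry)          # register before recursing so cycles terminate
        entry["refs"] = [visit(child) for child in node.refs]
        return ids[key]

    return json.dumps({"root": visit(root), "objects": objects})

a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
a.refs, c.refs, d.refs = [b], [b], [a, c]   # the example above: D is the one to "jsonify"
b.refs = [d]                                # plus a cycle: D -> A -> B -> D
print(to_json(d))
```

B appears once in the output even though A and C both reference it, and the cycle back to D ends at an ID instead of recursing forever.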

Bye
Ron