text.xml TxmlNode.setContent() appends content instead of replacing

Started by TomToad, December 08, 2024, 02:33:44

TomToad

Trying to change the content of a TxmlNode using setContent(), but it appends to what is there instead.  The example on blitzmax.org works fine.  I don't understand what I am doing differently.

BlitzMax NG code
SuperStrict

Framework text.xml
Import BRL.StandardIO

Local doc:TxmlDoc = TxmlDoc.parseFile("test.xml")
Local root:TxmlNode = doc.getRootElement()

ParseChildren(root.getChildren())
Print doc.toString()

Function ParseChildren(children:TList)
   For Local child:TxmlNode = EachIn children
      Local name:String = child.getName()
      If name = "ChapterString"
         Print child.toString()
         child.setContent("Happy")
         Print child.toString()
      EndIf
      ParseChildren(child.getChildren())
   Next
End Function
example test.xml file
<?xml version="1.0"?>
<!-- <!DOCTYPE Chapters SYSTEM "matroskachapters.dtd"> -->
<Chapters>
  <EditionEntry>
    <EditionFlagHidden>0</EditionFlagHidden>
    <EditionFlagDefault>1</EditionFlagDefault>
    <EditionUID>9866912279059327637</EditionUID>
    <ChapterAtom>
      <ChapterUID>1286159179066931568</ChapterUID>
      <ChapterTimeStart>00:00:00.000000000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:01:49.234125000</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 01</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
    <ChapterAtom>
      <ChapterUID>5098340377607805417</ChapterUID>
      <ChapterTimeStart>00:01:49.234125000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:04:27.642375000</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 02</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
    <ChapterAtom>
      <ChapterUID>1184190901793323768</ChapterUID>
      <ChapterTimeStart>00:04:27.642375000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:09:22.436875000</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 03</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
    <ChapterAtom>
      <ChapterUID>11649965882648966106</ChapterUID>
      <ChapterTimeStart>00:09:22.436875000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:12:27.455041666</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 04</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
    <ChapterAtom>
      <ChapterUID>14716554404388442181</ChapterUID>
      <ChapterTimeStart>00:12:27.455041666</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:16:19.436791666</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 05</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
  </EditionEntry>
</Chapters>
result
Building test
[ 86%] Processing:test.bmx
[ 93%] Compiling:test.bmx.gui.debug.win32.x64.c
[100%] Linking:test.debug.exe
Executing:test.debug.exe
<ChapterString>Chapter 01</ChapterString>
<ChapterString>Chapter 01Happy</ChapterString>
<ChapterString>Chapter 02</ChapterString>
<ChapterString>Chapter 02Happy</ChapterString>
<ChapterString>Chapter 03</ChapterString>
<ChapterString>Chapter 03Happy</ChapterString>
<ChapterString>Chapter 04</ChapterString>
<ChapterString>Chapter 04Happy</ChapterString>
<ChapterString>Chapter 05</ChapterString>
<ChapterString>Chapter 05Happy</ChapterString>
<?xml version="1.0"?>
<!-- <!DOCTYPE Chapters SYSTEM "matroskachapters.dtd"> -->
<Chapters>
  <EditionEntry>
    <EditionFlagHidden>0</EditionFlagHidden>
    <EditionFlagDefault>1</EditionFlagDefault>
    <EditionUID>9866912279059327637</EditionUID>
    <ChapterAtom>
      <ChapterUID>1286159179066931568</ChapterUID>
      <ChapterTimeStart>00:00:00.000000000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:01:49.234125000</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 01Happy</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
    <ChapterAtom>
      <ChapterUID>5098340377607805417</ChapterUID>
      <ChapterTimeStart>00:01:49.234125000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:04:27.642375000</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 02Happy</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
    <ChapterAtom>
      <ChapterUID>1184190901793323768</ChapterUID>
      <ChapterTimeStart>00:04:27.642375000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:09:22.436875000</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 03Happy</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
    <ChapterAtom>
      <ChapterUID>11649965882648966106</ChapterUID>
      <ChapterTimeStart>00:09:22.436875000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:12:27.455041666</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 04Happy</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
    <ChapterAtom>
      <ChapterUID>14716554404388442181</ChapterUID>
      <ChapterTimeStart>00:12:27.455041666</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:16:19.436791666</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 05Happy</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
  </EditionEntry>
</Chapters>

Process complete
expected result:  All <ChapterString> nodes' content should be replaced with Happy.
------------------------------------------------
8 rabbits equals 1 rabbyte.

SToS

Change...
child.setContent("Happy")
To...
child.setName("Happy")

TomToad

No, that doesn't work.  .setName() changes the tag, not the content.  I want to change <ChapterString>Chapter 01</ChapterString> into <ChapterString>Happy</ChapterString>.  Your example changes it to <Happy>Chapter 01</Happy>.

For a little context, I am writing a program to change the chapter names in an .mkv file.  I use mkvextract.exe to extract the chapters into an .xml file, then use the text.xml module to edit the file, eventually replacing the ChapterString content with the actual chapter names.  I am just using Happy here in order to reduce the problem to a basic example.

Midimaster

Looks like a bug in the XML module.

I know that Brucey did an update in the last few weeks. Do you already have the latest release of BlitzMax NG?
It is offered on the BlitzMax Discord channel or on the BlitzMax GitHub.

Would you like to report an issue on Discord, or should I do it for you?


...back from North Pole.

TomToad

Just downloaded and tried with the latest BlitzMax, v0.146.3.58.202412080206.  Still the same issue.

Baggey

Guys, I am absolutely no expert, but...

BlitzMax NG is a little F****R.

Uncheck all the build options and never get any problems!

But then someone else gives me code and they've got options checked? You need to experiment with these options. It could be something stupid? If you're not compiling the same way, it ain't going to work :-X
Running a PC that just Aint fast enough!? i7 4Ghz Quad core 32GB ram  2x1TB SSD and NVIDIA Quadro K1200 on 2 x HP Z24's . DID Technology stop! Or have we been assimulated!

Windows10, Parrot OS, Raspberry Pi Black Edition! , ZX Spectrum 48k, C64, Enterprise 128K, The SID chip. Im Misunderstood!

Midimaster

Quote from: Baggey on December 08, 2024, 15:47:58
...uncheck all the build options and never get any problems!....

What a funny idea!

Unchecking all the build options stops the compiler from showing the bugs. But they are still there.

During Development these should be active all the time:

QuickBuild ON
(...as long as you have not updated the BlitzMax release). Switch it OFF for the first run after you change something in the modules or update BlitzMax.

DebugBuild ON
(...as only this will point you to your bad code). Switch it OFF only when you produce the final build for your customers.

Overload Warnings ON
(...as this allows "auto-casting"). Keeping it ON means BlitzMax only prints a text warning instead of stopping with an error when you combine two different types, e.g. when you send an INTEGER to a function which expects a FLOAT.


Time-critical applications

...often need to switch DEBUG off to reach the expected performance. But also here you should check from time to time, whether your App "survives" a Debug run.  This will point you to things like "element beyond array length" or forbidden memory access.

Derron

The documentation sample works - because it starts a _new_ node (without content) ...

If you just want to change the content of a node ... then it must interpret the existing stuff as "content"


<?xml version="1.0"?>
<!-- <!DOCTYPE Chapters SYSTEM "matroskachapters.dtd"> -->
<Chapters>
  <EditionEntry>
    <EditionFlagHidden>0</EditionFlagHidden>
  </EditionEntry>
</Chapters>
... made a short version out of it.


and adjusted the doc sample a bit:
SuperStrict

Framework text.xml
Import BRL.StandardIO

Local doc:TxmlDoc = TxmlDoc.parseFile("test_short.xml")


If doc Then
   
    Local root:TxmlNode = doc.getRootElement() 'TxmlNode.newNode("root")
    'doc.setRootElement(root)

    ' take the first child of the root (the doc sample creates an empty node group here)
    Local nodegroup:TxmlNode = TXmlNode(root.GetFirstChild()) ' root.addChild("nodegroup")
   
    ' take that node's first child (the doc sample creates a new empty node here)
    Local node:TxmlNode = TXmlNode(nodegroup.GetFirstChild()) ' nodegroup.addChild("node")

    Print node.ToString()

    ' set the node content
    node.setContent("Some text content for the node")
   
    Print node.ToString()
   
    ' change the node content
    node.setContent("Modified content!")
   
    Print node.ToString()

print doc.ToString()
End If

And the output is:
<EditionFlagHidden>0</EditionFlagHidden>
<EditionFlagHidden>0Some text content for the node</EditionFlagHidden>
<EditionFlagHidden>0Modified content!</EditionFlagHidden>
<?xml version="1.0"?>
<!-- <!DOCTYPE Chapters SYSTEM "matroskachapters.dtd"> -->
<Chapters>
  <EditionEntry>
    <EditionFlagHidden>0Modified content!</EditionFlagHidden>
  </EditionEntry>
</Chapters>

Do you see how this "0" stays constant, but the "content" we change is replacing itself? This indicates that "0" is not part of the "content".

Checking what xml.mod is doing:
void bmx_mxmlSetContent(mxml_node_t * node, BBString * content) {
    mxml_node_t * child = mxmlGetFirstChild(node);
    while (child != NULL) {
        mxml_node_t * txt = NULL;
        if (mxmlGetType(child) == MXML_TEXT) {
            txt = child;
        }
        child = mxmlGetNextSibling(child);
        if (txt) {
            mxmlDelete(txt);
        }
    }
    char * c = bbStringToUTF8String(content);
    mxmlNewText(node, 0, c);
    bbMemFree(c);
}
So it loops over the node's direct children, deletes each one of type MXML_TEXT, and then appends a new text node.

I removed the "while" loop for a test..

and now the line is not replacing the "custom" content but only appending ...
    <EditionFlagHidden>0Some text content for the nodeModified content!</EditionFlagHidden>
which indicates that the "mxmlDelete()" is required there to remove our previously added value ... but it does not remove the "original" one.

Now I just removed the "type check" in there ...
// if (mxmlGetType(child) == MXML_TEXT) {
txt = child;
// }
And now my "setcontent" call is actually replacing _everything_ inside - Brucey might better explain why this might be incorrect to do (nodes having string content but also child nodes... or so).

Edit: The function seems to enforce looking for a sub-element / child (of type txt) ... dunno if a "loaded xml" differs there and the node itself is already MXML_TEXT ... and thus it would have to "replace itself".

Edit2: the passed node is of type MXML_ELEMENT (XML element with attributes) ...

bye
Ron

TomToad

Thanks Derron.  Your research has led me to a solution.  It appears that the content which is loaded is given the type MXML_OPAQUE, which I gather means a string that represents an unknown type (integer, real, text, etc...).  glue.c in text.xml tests for MXML_TEXT specifically.  Since .setContent() should replace the content regardless of type, with the exception of child nodes, I replaced the check with if (mxmlGetType(child) > MXML_ELEMENT) { and everything seems to work now.
:)
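
To make the difference between the two checks concrete, here is a minimal, self-contained C sketch. It does not use Mini-XML or the real glue.c: node, add_child, set_content, text_only, any_value and content_of are hypothetical names for a toy model, and the enum order (value types sorting after the element type) is an assumption that mirrors what the "> MXML_ELEMENT" comparison relies on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy node types. The ordering (value types after ELEMENT) is an assumption
   made for this sketch; check the enum in your own Mini-XML headers. */
typedef enum { T_ELEMENT, T_INTEGER, T_OPAQUE, T_REAL, T_TEXT } node_type;

typedef struct node {
    node_type type;
    const char *value;   /* element name, or the text payload */
    struct node *first;  /* first child */
    struct node *next;   /* next sibling */
} node;

/* Append a child to the end of parent's child list. */
node *add_child(node *parent, node_type type, const char *value) {
    node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->type = type;
    n->value = value;
    node **slot = &parent->first;
    while (*slot) slot = &(*slot)->next;
    *slot = n;
    return n;
}

/* Free all children of n, recursively. */
void free_children(node *n) {
    while (n->first) {
        node *dead = n->first;
        n->first = dead->next;
        free_children(dead);
        free(dead);
    }
}

typedef int (*delete_pred)(node_type);
int text_only(node_type t) { return t == T_TEXT; }    /* the original check */
int any_value(node_type t) { return t > T_ELEMENT; }  /* the modified check */

/* Delete the direct children selected by pred, then append a new text child. */
void set_content(node *n, const char *text, delete_pred pred) {
    node **slot = &n->first;
    while (*slot) {
        if (pred((*slot)->type)) {
            node *dead = *slot;
            *slot = dead->next;
            free_children(dead);
            free(dead);
        } else {
            slot = &(*slot)->next;
        }
    }
    add_child(n, T_TEXT, text);
}

/* Concatenate the payloads of all non-element children into out. */
void content_of(const node *n, char *out, size_t size) {
    out[0] = '\0';
    for (const node *c = n->first; c; c = c->next)
        if (c->type != T_ELEMENT)
            strncat(out, c->value, size - strlen(out) - 1);
}
```

In this model, a ChapterString element whose only child is a T_OPAQUE "Chapter 01" ends up with the content "Chapter 01Happy" after set_content(..., "Happy", text_only), because the opaque child never matches the text-only test. With any_value the same call leaves just "Happy".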

Derron

Brucey committed another approach to it ... simply deleting the children regardless of the subtype.

Dunno if yours is more "preserving" - or "wrong". I am more on your side and think he must have added this "text only" check for a reason - to not cut the forest for one tree so to say.


bye
Ron

Midimaster

It looks like Brucey has already started repairing xml.mod. I can see some activity tonight on text.mod in his GitHub. I guess updates will come on 2024-12-16.

TomToad

I don't think that's what we want.  At least that isn't what I want, maybe I'm misunderstanding the purpose of .setContent().  In Brucey's modification, everything gets replaced, including child elements.  I just need the text portion replaced.
SuperStrict

Framework text.xml
Import BRL.StandardIO

Local xmlFile:String = "<?xml version=~q1.0~q?>~r"
xmlFile :+ "<root>~r"
xmlFile :+ "   <parent>~r"
xmlFile :+ "      <child1>child1 text</child1>~r"
xmlFile :+ "      <child2>child2 text</child2>~r"
xmlFile :+ "      parent text~r"
xmlFile :+ "   </parent>~r"
xmlFile :+ "</root>"

Local doc:TxmlDoc = TxmlDoc.readDoc(xmlFile)
Print doc.ToStringFormat(True)+"~n-------------------------"

Local root:TxmlNode = doc.getRootElement()
Local parent:TxmlNode = TxmlNode(root.getFirstChild())

parent.setContent("New Parent Text")
Print doc.ToStringFormat(True)

With my modification you get:
<?xml version="1.0"?>
<root>  
  <parent>     
    <child1>child1 text</child1>     
    <child2>child2 text</child2>      parent text  
  </parent>
</root>
-------------------------
<?xml version="1.0"?>
<root>  
  <parent>
    <child1>child1 text</child1>
    <child2>child2 text</child2>New Parent Text
  </parent>
</root>

With Brucey's modification you get:
<?xml version="1.0"?>
<root>  
  <parent>     
    <child1>child1 text</child1>     
    <child2>child2 text</child2>      parent text  
  </parent>
</root>
-------------------------
<?xml version="1.0"?>
<root>  
  <parent>New Parent Text</parent>
</root>

Midimaster

These results are all nonsense!

As a user I would expect that when I have found the node and replace its content, the content is replaced afterwards.

If the node is a parent node with additional children, it may be acceptable that all children, including the structure, are exchanged for my new (text-only) content. But perhaps this should throw a warning: "no content here, but children".

But if I have iterated down to the very deepest child and its "true" content, I need to trust that this content will be exchanged. But this is also not done by the latest version of text.mod/xml.mod.

I did a test with your first test.xml and this short example code:

SuperStrict

Framework text.xml
Import BRL.StandardIO

Local doc:TxmlDoc = TxmlDoc.parseFile("chapter.xml")
Local root:TxmlNode = doc.getRootElement()

ParseChildren(root.getChildren())

Function ParseChildren(children:TList)
    For Local child:TxmlNode = EachIn children
        Local name:String = child.GetName()
        Print  "NAME =" + name '+ " CONTENT=" + child.toString() + "+++++"
        If name = "ChapterString"
            Print "*******************"
            Print "content before: " + child.GetContent()  + "!"
            child.setContent("Happy")
            Print " content after: " +child.GetContent()   + "!"
            Print "--done---------   "
            Print
        EndIf
        ParseChildren(child.getChildren())
    Next
End Function

This was the result:
Building XMLChange
[ 86%] Processing:XMLChange.bmx
[ 93%] Compiling:XMLChange.bmx.gui.debug.win32.x64.c
[100%] Linking:XMLChange.debug.exe
Executing:XMLChange.debug.exe
NAME =EditionEntry
NAME =EditionFlagHidden
NAME =EditionFlagDefault
NAME =EditionUID
NAME =ChapterAtom
NAME =ChapterUID
NAME =ChapterTimeStart
NAME =ChapterFlagHidden
NAME =ChapterFlagEnabled
NAME =ChapterTimeEnd
NAME =ChapterDisplay
NAME =ChapterString
*******************
content before: Chapter 01!
 content after: !
--done---------  
....

There is no longer any content after SetContent()

This is definitely a bug and I will now report it on GitHub.

GitHub-Issue here: https://github.com/bmx-ng/text.mod/issues/36

TomToad

It may just come down to how you interpret the more ambiguous parts of the XML specification.  As I understand it, these three structures are equivalent.
<!-- Text before children -->
<parent>
   SyntaxBomb
   <child />
</parent>

<!-- text after children -->
<parent>
   <child />
   SyntaxBomb
</parent>

<!-- text and children interspersed -->
<parent>
   Syntax<child />Bomb
</parent>

Is the content of an element just the text, or is it the text and all the children of the element?
Maybe .setContent() should have a flag to determine whether only the text or all of the element's children should be replaced, e.g. .setContent(content:String, replaceChildren:Int = True)
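
A rough, self-contained C sketch of what such a flag could mean, using a toy node model rather than Mini-XML or text.xml. Everything here (node, add_child, set_content, render_children and the replace_children parameter) is hypothetical and only illustrates the two behaviours being discussed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { K_ELEMENT, K_VALUE } node_kind;  /* toy kinds, not Mini-XML's */

typedef struct node {
    node_kind kind;
    const char *value;   /* element name, or the text payload */
    struct node *first;  /* first child */
    struct node *next;   /* next sibling */
} node;

/* Append a child to the end of parent's child list. */
node *add_child(node *parent, node_kind kind, const char *value) {
    node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->value = value;
    node **slot = &parent->first;
    while (*slot) slot = &(*slot)->next;
    *slot = n;
    return n;
}

/* Free all children of n, recursively. */
void free_children(node *n) {
    while (n->first) {
        node *dead = n->first;
        n->first = dead->next;
        free_children(dead);
        free(dead);
    }
}

/* replace_children != 0: drop every child, elements included.
   replace_children == 0: drop only value (text-like) children.
   Either way, the new text is appended as the last child. */
void set_content(node *n, const char *text, int replace_children) {
    node **slot = &n->first;
    while (*slot) {
        if (replace_children || (*slot)->kind == K_VALUE) {
            node *dead = *slot;
            *slot = dead->next;
            free_children(dead);
            free(dead);
        } else {
            slot = &(*slot)->next;
        }
    }
    add_child(n, K_VALUE, text);
}

/* Bounded append onto a NUL-terminated buffer. */
void append(char *out, size_t size, const char *s) {
    strncat(out, s, size - strlen(out) - 1);
}

/* Serialize the children of n (not n itself) onto the string already in out. */
void render_children(const node *n, char *out, size_t size) {
    for (const node *c = n->first; c; c = c->next) {
        if (c->kind == K_ELEMENT) {
            append(out, size, "<");
            append(out, size, c->value);
            append(out, size, ">");
            render_children(c, out, size);
            append(out, size, "</");
            append(out, size, c->value);
            append(out, size, ">");
        } else {
            append(out, size, c->value);
        }
    }
}
```

For a parent holding Syntax, a child element, and Bomb (the interspersed case), set_content(&parent, "New", 0) leaves the child element followed by New, while set_content(&parent, "New", 1) leaves only New. Note that in the first case the new text always lands after the remaining children, which is one of the ambiguities a real implementation would have to settle.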

Derron

I assume there should be different functions:
- one to set the "content" of the node (so, simply said, you could call node.SetContent("<childnode>hehe</childnode>")) -> I assume in this case the node should optionally be set to MXML_OPAQUE (preserving whitespace at the beginning/end)
- and one to set this node to a "text node" and put the content there (so it becomes an MXML_TEXT one, not preserving whitespace at the beginning/end)

For now setContent is doing the following:
- remove old nodes (matching some criteria)
- add a new text node (similar to "AddContent")

Maybe it should also care about "what is there" ... and maybe "what is there" is incorrect (so of the wrong "node type")?


It would possibly be a good idea to prepare a test file which does everything we want:
- replacing everything inside a node (so removing the children of a given node) with content
- replacing a node's value with content (text, numeric, ...) while preserving its children


bye
Ron