[Pharo-dev] XMLParser changes

monty monty2 at programmer.net
Mon Apr 25 11:47:26 EDT 2016


It's not a regression (see the changelog or the new comment in #ignorableWhitespace:). This is the behavior the spec requires, it is long overdue, and it is also how other parsers such as libxml2 behave.

From Sec. 2.10:
"An XML processor MUST always pass all characters in a document that are not markup through to the application. A validating XML processor MUST also inform the application which of these characters constitute white space appearing in element content."

The "white space appearing in element content" is the only whitespace the spec treats as ignorable, but deciding whether an element has "element content" requires validation and a DTD with an ELEMENT declaration that restricts that element's content to child elements.

From Sec. 3.2.1:
"An element type has element content when elements of that type MUST contain only child elements (no character data), optionally separated by white space."

So these declarations give "a" element content:
<!ELEMENT a (b,c,d)>
<!ELEMENT a (b|c)>
<!ELEMENT a (b+,c*)>

But these don't:
<!ELEMENT a (#PCDATA|b|c)*>
<!ELEMENT a ANY>
<!ELEMENT a EMPTY>

And if the parser is non-validating (or, in our case, if you disable validation) or there's no DTD, then all whitespace must be assumed to be non-ignorable.
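
If a tool only cares about child elements, one option is to ask for elements instead of nodes. This is a minimal sketch, not tested against a particular release. It reuses the selectors from the quoted snippet and assumes an element answers its child elements to #elements the same way the document does there. The shortened input string is only for illustration.

```smalltalk
| root |
"No DTD here, so no whitespace can be classified as ignorable."
root := (XMLDOMParser parse: '<Clones>
    <ClonedFragment cloneName="test"/>
    <ClonedFragment cloneName="test2"/>
</Clones>') elements first.

"All child nodes, including the whitespace-only string nodes
 that sit between and around the child elements."
root nodes.

"Only the child elements (assumed: #elements skips string nodes)."
root elements.
```

With whitespace preserved, "root nodes" includes the string nodes around the two ClonedFragment elements, while "root elements" should answer just the two elements.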

> Sent: Monday, April 25, 2016 at 6:14 AM
> From: "Cyril Ferlicot Delbecque" <cyril.ferlicot at gmail.com>
> To: pharo-dev at lists.pharo.org, monty <monty2 at programmer.net>
> Subject: XMLParser changes
>
> Hi,
> 
> last week there was a new stable version of XMLParser and some tests
> broke in some tools. I think that there was a regression in this version.
> 
> Snippet:
> 
> (XMLDOMParser parse: '<?xml version="1.0" encoding="UTF-8"?>
> <!--clones description file-->
> <Clones>
>     <ClonedFragment cloneName="test">
>         <Member fileName="ProgramA"/>
>         <Member fileName="ProgramB"/>
>     </ClonedFragment>
>     <ClonedFragment cloneName="test2">
>         <Member fileName="ProgramA"/>
>         <Member fileName="ProgramB"/>
>     </ClonedFragment>
> </Clones>') elements first nodes
> 
> With the release 2.7.4 we get 2 nodes but in release 2.7.6 we get 5
> nodes. The 2 previous ones and 3 empty String nodes.
> 
> I think this is not what we expect. Correct me if I am wrong :)
> 
> 
> -- 
> Cyril Ferlicot
> 
> http://www.synectique.eu
> 
> 165 Avenue Bretagne
> Lille 59000 France
> 
> 


