[Pharo-dev] problem while parsing xml

Stéphane Ducasse stephane.ducasse at inria.fr
Sun Dec 29 06:58:35 EST 2013


Thanks for reacting so quickly :)
I will add the following to the draft of the XML chapter. 

Monty, could you subscribe to the Pharo mailing list, since this is where people communicate?
You can set up a Gmail filter that only shows you mail about XML :)

Stef

> With Zinc patched using the fileOut attached (which I emailed to Sven) and the latest XMLParser packages, Satoshi's code works fine. But it takes a while to make the 12 (!) HTTP requests needed for the external DTD and external entities it includes. It is good to use the caching resolver for this sort of thing:
> 
> resolver := DTDCachingEntityResolver new.
> xml := '<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <!DOCTYPE score-partwise PUBLIC
>     "-//Recordare//DTD MusicXML 3.0 Partwise//EN"
>     "http://www.musicxml.org/dtds/partwise.dtd">
> <score-partwise version="3.0">
>   <part-list>
>     <score-part id="P1">
>       <part-name>Music</part-name>
>     </score-part>
>   </part-list>
>   <part id="P1">
>     <measure number="1">
>       <attributes>
>         <divisions>1</divisions>
>         <key>
>           <fifths>0</fifths>
>         </key>
>         <time>
>           <beats>4</beats>
>           <beat-type>4</beat-type>
>         </time>
>         <clef>
>           <sign>G</sign>
>           <line>2</line>
>         </clef>
>       </attributes>
>       <note>
>         <pitch>
>           <step>C</step>
>           <octave>4</octave>
>         </pitch>
>         <duration>4</duration>
>         <type>whole</type>
>       </note>
>     </measure>
>   </part>
> </score-partwise>
> '.
> (XMLDOMParser on: xml)
>     externalEntityResolver: resolver;
>     parseDocument.
> 
> If you evaluate it once and then evaluate just the last three lines again, the second run is faster because the HTTP requests do not have to be made again.
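
A rough way to see the effect of the caching resolver is to time two parses that share one resolver. This is only a sketch: it assumes network access to www.musicxml.org, the Zinc patch mentioned above, and the DTDCachingEntityResolver class name exactly as it appears in this message. The names parse, firstRun and secondRun are made up for illustration.

```smalltalk
"Sketch only: time two parses that share one caching resolver.
 Assumes network access and the class names used in this thread."
| resolver xml parse firstRun secondRun |
resolver := DTDCachingEntityResolver new.
xml := '<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC
    "-//Recordare//DTD MusicXML 3.0 Partwise//EN"
    "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="3.0">
  <part-list>
    <score-part id="P1">
      <part-name>Music</part-name>
    </score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key>
          <fifths>0</fifths>
        </key>
        <time>
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <clef>
          <sign>G</sign>
          <line>2</line>
        </clef>
      </attributes>
      <note>
        <pitch>
          <step>C</step>
          <octave>4</octave>
        </pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>
'.
"Each evaluation builds a fresh parser but reuses the same resolver."
parse := [
    (XMLDOMParser on: xml)
        externalEntityResolver: resolver;
        parseDocument ].
"First run fetches the external DTD and entities over HTTP."
firstRun := Time millisecondsToRun: parse.
"Second run is expected to be served from the resolver cache."
secondRun := Time millisecondsToRun: parse.
Transcript
    show: 'first parse (ms): ', firstRun printString; cr;
    show: 'second parse (ms): ', secondRun printString; cr.
```

The absolute numbers depend on the network and the server, so only the difference between the two runs is of interest.
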
> 
> I did remove one test you added that used the same external DTD as Satoshi, because the rest of the XML didn't conform to the DTD and still raised validation exceptions even though the "DNU #asciiValue" error is gone.
Excellent!

> I think it's best anyway not to have unit tests make HTTP requests if possible (especially not 12), otherwise you get random test failures because a server or network is down:
> http://xunitpatterns.com/Unit%20Test%20Rulz.html

Indeed, when I wrote the test I had no idea what the problem was.


> I try to test everything short of actual HTTP or File interaction (like the path resolution tests in testHTTPClientPaths/testFileClientPaths).
> 
> I don't have enough time to author anything at the moment, but I appreciate the offer.

If you take some notes let me know :)

> And I appreciate the work you've done on Pharo and Squeak over the years. I learned Smalltalk through Pharo By Example, and there are probably many others who also did.
>  
>> 
>> ----- Original Message -----
>> From: Stéphane Ducasse
>> Sent: 12/28/13 05:15 AM
>> To: monty
>> Subject: Re: [Pharo-dev] problem while parsing xml
>>  
>> Thanks a lot I forwarded your mail.
>> This makes me think that it would be great to have a little documentation of the XML ecosystem in Pharo.
>> If you are interested in co-authoring a book chapter with me, it could be really great and fun.
>>  
>> Stef
>>  
>>  
>>  
>>> 
>>> Thanks for the CCs.  I've identified the sources of the problem(s), am working on them now, and should have them fixed soon (though it's not entirely confined to XMLParser, the fixes outside it are minimal).
>>> 
>>> NOTE: You can always just disable resolvesExternalEntities: and isValidating: (or use parse:usingNamespaces:validation:externalEntities: with the last two args false) if they are causing problems.
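
As a sketch of the workaround in the NOTE above, both settings can be switched off with a cascade before parsing. The shortened document and the variable names are invented for illustration; the selectors isValidating:, resolvesExternalEntities: and parseDocument are the ones named in this thread.

```smalltalk
"Sketch only: parse without validating and without fetching the external DTD.
 The shortened document below is a hypothetical example."
| xml document |
xml := '<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC
    "-//Recordare//DTD MusicXML 3.0 Partwise//EN"
    "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="3.0">
  <part id="P1"/>
</score-partwise>'.
document := (XMLDOMParser on: xml)
    isValidating: false;
    resolvesExternalEntities: false;
    parseDocument.
"Inspect the root element name to confirm the parse succeeded."
document root name
```

With both options off the parser only checks well-formedness, so errors that validation against the DTD would report go unnoticed.
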
>>>  
>>>> ----- Original Message -----
>>>> From: Stéphane Ducasse
>>>> Sent: 12/27/13 04:42 AM
>>>> To: Pharo Development List
>>>> Subject: Re: [Pharo-dev] problem while parsing xml
>>>>  
>>>> OK, I missed the example in Satoshi's mail. I will try to have a look or at least write a test.
>>>>  
>>>> On 25 Dec 2013, at 19:04, Stéphane Ducasse <stephane.ducasse at inria.fr> wrote:
>>>>  
>>>>> 
>>>>> Hi Hernan
>>>>>  
>>>>> how did you reproduce the bug?
>>>>> Because it would be nice to fix it.
>>>>>  
>>>>> Stef
>>>>>  
>>>>> On 24 Dec 2013, at 18:48, Hernán Morales Durand <hernan.morales at gmail.com> wrote:
>>>>>  
>>>>>> 
>>>>>> I have tried in Pharo 2.0 configurations from Config Browser (ConfigurationOfXMLParser-StephaneDucasse.16) and latest from http://www.smalltalkhub.com/mc/PharoExtras/XMLParser/main (ConfigurationOfXMLParser-monty.37) and both have this bug.
>>>>>> Hernán
>>>>>>  
>>>>>> 2013/12/24 Stéphane Ducasse <stephane.ducasse at inria.fr>
>>>>>> Hi Satoshi
>>>>>> 
>>>>>> Did you try with the latest configuration of XMLParser?
>>>>>> Because I do not know which version is in Moose.
>>>>>> Stef
>>>>>> 
>>>>>> On 23 Dec 2013, at 18:26, NISHIHARA Satoshi <goonsh at gmail.com> wrote:
>>>>>> 
>>>>>> > Hello
>>>>>> >
>>>>>> > using: Pharo3.0 Latest update: #30659 (the same error occurs in
>>>>>> > Pharo2.0 Latest update: #20628)
>>>>>> > (Moose 4.9 with XML-Parser (DamienCassou.143) shows no error)
>>>>>> >
>>>>>> > 1. load XMLParser (StephaneDucasse.16) from Configuration browser
>>>>>> > 2. select and inspect it below:
>>>>>> > ^  XMLDOMParser parse: '<?xml version="1.0" encoding="UTF-8" standalone="no"?>
>>>>>> > <!DOCTYPE score-partwise PUBLIC
>>>>>> >    "-//Recordare//DTD MusicXML 3.0 Partwise//EN"
>>>>>> >    "http://www.musicxml.org/dtds/partwise.dtd">
>>>>>> > <score-partwise version="3.0">
>>>>>> >  <part-list>
>>>>>> >    <score-part id="P1">
>>>>>> >      <part-name>Music</part-name>
>>>>>> >    </score-part>
>>>>>> >  </part-list>
>>>>>> >  <part id="P1">
>>>>>> >    <measure number="1">
>>>>>> >      <attributes>
>>>>>> >        <divisions>1</divisions>
>>>>>> >        <key>
>>>>>> >          <fifths>0</fifths>
>>>>>> >        </key>
>>>>>> >        <time>
>>>>>> >          <beats>4</beats>
>>>>>> >          <beat-type>4</beat-type>
>>>>>> >        </time>
>>>>>> >        <clef>
>>>>>> >          <sign>G</sign>
>>>>>> >          <line>2</line>
>>>>>> >        </clef>
>>>>>> >      </attributes>
>>>>>> >      <note>
>>>>>> >        <pitch>
>>>>>> >          <step>C</step>
>>>>>> >          <octave>4</octave>
>>>>>> >        </pitch>
>>>>>> >        <duration>4</duration>
>>>>>> >        <type>whole</type>
>>>>>> >      </note>
>>>>>> >    </measure>
>>>>>> >  </part>
>>>>>> > </score-partwise>
>>>>>> > ' readStream.
>>>>>> >
>>>>>> > 3. and get DNU: SmallInteger>>asciiValue
>>>>>> >
>>>>>> > 'case of XMLZincClient>>#get:
>>>>>> > 'http://www.musicxml.com/dtds/partwise.dtd' timeout: ... onError: ...
>>>>>> > returns binaryStream?
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > --
>>>>>> > "NISHIHARA Satoshi"
>>>>>> > [:goonsh :nsh | ^ nishis perform: goonsh with: nsh]
>>>>>> > <PharoDebug.log>
>>>>>> 
>>>>>>  
>>>  
>  
> <ZnMimeType-isBinary.st>
