[Pharo-dev] Scraping HTML chapter 2 (soon chapter 3 coming)

Cédrick Béler cdrick65 at gmail.com
Wed Oct 4 09:37:28 EDT 2017


Hi Steph (sorry for the late reply; it seems my automatic mailing-list
filtering is misconfigured),

I'll publish it soon, and why not write a chapter on that subject? This
could be a really fun tutorial :)

I'll add the section tag in the code for nested tags.

Cheers,

Cédrick

2017-09-27 21:52 GMT+02:00 Stephane Ducasse <stepharo.self at gmail.com>:

> Hello Cedric
>
>>
>> I use Soup from time to time. I am also using it with students this
>> semester to build a small app that fetches leboncoin ads programmatically
>> to send alerts (leboncoin is a French website where people buy and sell
>> stuff among themselves).
>>
>
> Cool, I want it because I'm looking for games on leboncoin.
>
>
>> I think it should be updated to reflect important new HTML5 tags. For
>> instance, I needed to add the section tag to the « nestableBlockTags » …
>> It’s in SoupParserParameter initializeNestableBlockTags (see the screenshot;
>> I don’t have my image here, so it’s just the method). Without this tag, I
>> could not properly get the classified ads.
>>
>
>> I can publish it if you want, but I think there are more important HTML5
>> tags to include in this parameters class. As Soup is an "incomplete
>> parser" (as far as I understand), there is no need to include all HTML5
>> tags. But besides <section>, do other people think there are tags to
>> include there?
>>
>
> Please publish it.
> You could also write a chapter showing how you script
> leboncoin with Soup.
> You are welcome.
>
>
>>
>>
>>
>> Cheers,
>>
>> Cédrick
>>
>>
>>
>> On 27 Sept. 2017 at 09:25, Stephane Ducasse <stepharo.self at gmail.com>
>> wrote:
>>
>> I came up with the idea for this booklet thanks to Peter Kenny, who kindly
>> answered a question on the Pharo mailing list.
>> To help, Peter showed a Pharoer how to scrape a website
>> using XPath. In addition, some years ago
>> I was maintaining Soup, a scraping framework, because I was scraping
>> magic websites and I wanted an application to manage my magic cards.
>> Since then I have always wanted to try XPath, and in addition I wanted to
>> offer this booklet to Peter. Why? Because I asked Peter
>> if he would like to write something, and he told me that he was at a great
>> age where he would not take on any commitments.
>> I realised that I would like to get as old as him and still be able to hack
>> like mad in Pharo with new technology.
>> So this booklet is a gift to Peter, a great and gentle Pharoer.
>>
>> Stef
>> <scrapingChap2-min.pdf>
>>
>>
>>
>


-- 
Cédrick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 377433 bytes
Desc: not available
URL: <http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/attachments/20171004/9a0d9336/attachment.png>

