[Pharo-users] The opposite of encodeForHTTP

Stéphane Ducasse stephane.ducasse at inria.fr
Sun Jul 22 03:12:01 EDT 2012


On Jul 20, 2012, at 6:25 PM, Brenda Larcom wrote:

> I suppose I could unlurk at this point.  :)
> 
> I'm a security geek (specifically, a secure development geek focusing on security architecture) in my day job, and I have a long-unmaintained architecture security analysis tool written in Squeak (http://www.octotrike.org/ for the curious), which I have been unmothballing.  We are considering switching to Pharo, partly because we are planning to add some P2P collaboration features that we think will have an HTTP layer in there somewhere, and partly because we like it small, tidy, and self-compatible.  Hence my lurking.

Welcome and I would love to have more people working on these areas :).

> I've done some work on how data validation should be done for security purposes, for my day job.  This includes output encoding and decoding, like what Davide is talking about.  It's pretty tricky to get right because of the large number of contexts, with subtly different rules.  E.g. I would expect encodeForHTTP to be appropriate for HTTP headers, except that two things you usually want to put in HTTP headers are URIs and cookies, each of which has different rules (for different subparts, even) for what should be encoded.  The differences don't seem like much, but in the wild, my coworkers & I see these sorts of differences lead to vulnerabilities on a daily basis.
> 
> From a security architecture perspective, the absolute best way to handle encoding & decoding for a structured object like an HTTP request or response (or a URI, or a cookie, or an HTML document, or..) is to use a validating parser.  Basically, when you get an HTTP request, parse it & put it in an object structured like the request.  At that time, you know the meaning of each portion of the string you are parsing, so you can interpret the bits correctly/safely.  The object(s) should store the individual strings that are actually content (vs. structure & constants) in a decoded state.  The developer should get everything from the objects, in decoded form, and put everything into the objects in decoded form.  Then, when it is time to send the response, the objects encode everything safely/canonically based on the exact type of objects they are.  This design concentrates the hard stuff (encoding, decoding, canonicalization, layering encodings on top of each other) near the interfaces, at the first/last possible moment enough context is known to interpret the information accurately.  It separates the mechanics of using a protocol or format from the intent of using the protocol.  It lets someone like me easily QA both the library and application code for security.  It is also simple for the developer to use safely (all the dev needs to think about is what objects/content they want to assemble, and the data validation at that layer is taken care of automatically) & is therefore the only design pattern I have seen consistently avoid all encoding-related vulnerabilities in the wild.  
> 
> So what does this mean?  Basically, from a security perspective, encoding & decoding methods should live in the objects they encode and decode, and never be called from outside code.  That is, there should be an HTTPHeader>>fromString: or fromStream: method, which is called from an HTTPResponse>>fromString: or fromStream: method, and no String>>decodeFromHTTP.  Adding a String>>decodeFromHTTP method is easy from the library maintainer's point of view, approximately correct (way more correct than no method at all), and it matches what most languages are doing these days, but it shifts the burden of all that thought about the specific HTTP header & context to the application developer, who is usually just trying to write an application, not learn every single detail of HTTP & the gazillion other standards he would need to know to do this safely.
> 
> Since this is a suggestion for substantial architecture change that would cause significant backwards compatibility issues throughout the entire Web application stack, and I'm new to Pharo to boot, I am expecting some interesting discussion to occur next.  Or maybe profound silence.  :)
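
A rough sketch of the pattern described above, written in Python rather than Smalltalk so that it can be run anywhere. The class and method names (QueryString, from_wire, to_wire) are made up for illustration and are not Pharo or Zinc API. It only handles a URI query string, and it assumes plain percent-encoding (no "+" for spaces). The point is just that the object keeps decoded content, and that decoding and encoding happen only at the edges, where the structure is known.

```python
from urllib.parse import quote, unquote


class QueryString:
    """Hypothetical value object: holds decoded key/value pairs only."""

    def __init__(self, pairs=()):
        self.pairs = list(pairs)  # decoded (key, value) tuples

    @classmethod
    def from_wire(cls, raw):
        # Split on the structural characters first, then decode each
        # component, so an encoded '&' or '=' cannot change the structure.
        pairs = []
        for field in raw.split("&"):
            if not field:
                continue
            key, _, value = field.partition("=")
            pairs.append((unquote(key), unquote(value)))
        return cls(pairs)

    def to_wire(self):
        # Encode every component at the last possible moment; safe=""
        # makes quote() escape '/', '&' and '=' as well.
        return "&".join(
            f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in self.pairs
        )
```

Application code would only ever touch `pairs`, in decoded form; the two methods above are the only places that know the encoding rules for this context.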

Thanks for the explanation. It makes sense. String is a dead object, just counting and assembling characters.
So what I would love to see, if you are interested, is:
	- how can we improve the infrastructure of Pharo,
	step by step or via a big refactoring? :)

	- I would add a simple decodeFromHTTP as a convenience method and in the future point to the validators.
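
To make the trade-off concrete, here is a small sketch in Python (not Pharo code). It assumes that encodeForHTTP does percent-encoding; encode_for_http and decode_from_http are hypothetical stand-ins for String>>encodeForHTTP and the proposed String>>decodeFromHTTP. The round trip works for a single component, but decoding a whole query string before splitting it gives the wrong fields, which is the kind of context problem Brenda describes.

```python
from urllib.parse import quote, unquote


def encode_for_http(text):
    # Hypothetical stand-in for String>>encodeForHTTP: percent-encode
    # everything except unreserved characters.
    return quote(text, safe="")


def decode_from_http(text):
    # Hypothetical stand-in for the proposed String>>decodeFromHTTP.
    return unquote(text)


original = "fish & chips/50%"
wire = encode_for_http(original)
assert decode_from_http(wire) == original  # round trip on one component

# Caveat: decoding a whole query string before splitting it loses the
# difference between a literal '&' and a field separator.
raw_query = "q=" + wire + "&page=2"
fields_decoded_first = decode_from_http(raw_query).split("&")
fields_split_first = [decode_from_http(f) for f in raw_query.split("&")]
```

Here `fields_split_first` has the two intended fields, while `fields_decoded_first` has three, because the decoded "&" is taken for a separator.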


> In my back pocket somewhere amongst the code I am unmothballing, I have 95% of a thoroughly documented URI implementation and test suite that follows this pattern and is pedantically compliant with one or another of the URI RFCs (it's old, may not be the most recent).

Bring it to life. We were discussing internally that we would like to have a decent URI implementation and we would like to massively clean 
the URL/URI …. with ZnURL whatever. So it would be great to have a good part.
Now what I see from your mail :) is that you are a kind of perfectionist (I know some of them), so pay attention:
you should force yourself to be happy with 80% and release it.
	- 1 your 80% may be the 95% of somebody else
	- 2 releasing often and making progress is the best way to finish. :)

>  I believe Spoon & Slate are using a previous version of it or its derivatives.  I'll need a fully pedantic HTTP parsing stack to feel comfortable releasing a P2P architecture security analysis tool (high value target, large attack surface, potentially very large professional embarrassment), so whatever isn't available, I expect we'll end up writing.  If Pharo folks are interested in this pattern,

Yes, I am. I will let the others reply to you because I'm far down in the south of France, but I'm quite sure that we are all interested.

> I would love to contribute my libraries/changes as I finish them, get advice on backward compatibility, performance, and APIs people would like to see, review whatever related code you'd like for security issues, and/or collaborate with any other developer who is interested.

I would love to learn from your expertise.

Stef
> 
> Brenda
> 
> 
> On Jul 20, 2012, at 1:47 AM, Davide Varvello <varvello at yahoo.com> wrote:
> 
>> Good, Stef. I opened a new feature request as a reminder here: http://code.google.com/p/pharo/issues/detail?id=6430
>>  
>> Davide
>> 
>> 
>> From: Stéphane Ducasse [via Smalltalk] <[hidden email]>
>> To: Davide Varvello <[hidden email]> 
>> Sent: Thursday, July 19, 2012 10:43 PM
>> Subject: Re: The opposite of encodeForHTTP
>> 
>> Let us fix it and propose a decodeFromHTTP method 
>> 
>> Stef 
>> 
>> On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote: 
>> 
>> > Thanks Sven, 
>> > I was looking for String>>decode..whatever... with no luck :-) 
>> > Cheers 
>> > 
>> > -- 
>> > View this message in context: http://forum.world.st/The-opposite-of-encodeForHTTP-tp4640491p4640510.html
>> > Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com. 
>> > 
>> 
>> 
>> 
>> 




