<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw='http://wellformedweb.org/CommentAPI/' xmlns:dc='http://purl.org/dc/elements/1.1/' xmlns:rl='http://www.purl.org/RESTLog/'>
<channel>
<title>The Well-Formed Web</title>
<link>http://wellformedweb.org/news/</link>
<description>Exploring the limits of XML and HTTP</description>
<dc:creator>BitWorking, Inc</dc:creator>
<item>
<title>Should you use Content Negotiation in your Web Services?</title>
<link>http://bitworking.org/news/WebServicesAndContentNegotiation</link>
<description>
<p>Should you use Content Negotiation when building your web service?
The short answer is no. There are definite problems with <abbr title="Content Negotiation">conneg</abbr>:
I can give some examples of problems I have run into and also point to problems
others have run into.</p>
<p>First let's back up and explain Content Negotiation. Your browser is
a generic display program and can take in various kinds of media, such
as HTML, JPEGs, CSS, Flash, etc., and display them for you. The first thing to
note is that each of those kinds of media has a different mime type.
Each format has its own registered mime type, and when a client
does a GET on a URL it gets back not only the content but also
a <code>Content-Type:</code> header which lists
the mime-type of what is in the body.
</p>
<p>One of the interesting things about HTTP is that it allows
the same URI to have multiple representations. For example I
could have a URL that has both <code>text/plain</code> and <code>text/html</code>
representations. That leads to two obvious questions.</p>
<ol>
<li>How does the server know which representation to serve?</li>
<li>How can the browser influence the server's choice to get something it can handle?</li>
</ol>
<p>Let's start by answering question two first. The browser uses the <code>Accept:</code>
header to list out the mime-types that it is willing to accept. There is also a weighting
scheme that allows the client to specify a preference for one media type
over another. For example, here is the capture of some of the headers, including the <code>Accept:</code> header,
sent by Mozilla when it does a GET on a URI:</p>
<pre class="example"><code>Accept: text/xml,application/xml,application/xhtml+xml,\
text/html;q=0.9,text/plain;q=0.8,video/x-mng,\
image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate,compress;q=0.9
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
</code></pre>
<p>The <code>Accept:</code> header lists the mime-types that the browser can
handle along with weights of the form <code>q=</code> where the argument
is a floating point number between 0 and 1. The weights indicate a preference
for that media type, with a higher number indicating a higher preference. Note that
there are several bits of complexity I am going to ignore for now. The first is the last
type the Mozilla browser says it can accept, <code>*/*;q=0.1</code>. This is a wild card
match, which will match any mime-type that the server could want to serve up. The second
is that there are multiple Accept headers, one for language, one for encoding, another
for charset. How these overlap and influence the response sent won't be covered here.
</p>
<p>Now to answer the first question. The server looks at the representations
it has available and, based on the <code>Accept:</code> header,
serves up the one the client weights highest.
It sends that representation back and indicates the type it
chose using the <code>Content-Type:</code> header.</p>
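<p>To make the selection step concrete, here is a minimal Python sketch of how a server might
weigh its representations against an <code>Accept:</code> header. The function names are my own
for illustration, and the sketch deliberately ignores media type parameters other than <code>q</code>,
so treat it as an outline of the idea rather than a complete implementation of the HTTP rules.</p>

```python
def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header value."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0]
        if not media_range:
            continue
        q = 1.0  # a range without an explicit q parameter defaults to 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        ranges.append((media_range, q))
    return ranges


def matches(media_range, mime_type):
    """True if a range such as text/* or */* covers the given mime-type."""
    range_type, _, range_sub = media_range.partition("/")
    mime_main, _, mime_sub = mime_type.partition("/")
    return range_type in ("*", mime_main) and range_sub in ("*", mime_sub)


def specificity(media_range):
    """Exact types outrank type/* ranges, which outrank */*."""
    return sum(part != "*" for part in media_range.split("/"))


def choose(available, header):
    """Pick the available mime-type the client weights highest, or None."""
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for mime_type in available:
        candidates = [(specificity(r), q) for r, q in ranges if matches(r, mime_type)]
        if not candidates:
            continue
        q = max(candidates)[1]  # the most specific matching range decides q
        if q > best_q:
            best, best_q = mime_type, q
    return best


# The Accept header from the Mozilla capture above, joined onto one line.
ACCEPT = ("text/xml,application/xml,application/xhtml+xml,"
          "text/html;q=0.9,text/plain;q=0.8,video/x-mng,"
          "image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1")

print(choose(["text/html", "text/plain"], ACCEPT))
print(choose(["application/rss+xml", "text/html"], ACCEPT))
```

<p>With the Mozilla header, both calls print <code>text/html</code>: HTML outweighs plain text (0.9 against 0.8),
and the RSS type is only covered by the <code>*/*</code> wild card at 0.1.</p>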
<p>This seems like a really cool and vastly underutilized feature of HTTP. It also
seems particularly intriguing for web services. You could return
JPEGs from that mapping service for the older client platforms, but also
serve up SVG for the newer clients so they can scale and rotate their maps.
What could possibly go wrong?</p>
<p>The first thing that could go wrong is a bug or misconfiguration on the client or the server.
This has happened to me in the
past. The W3C does conneg on some of their recommendations, returning either HTML or plain
text based on the client's capabilities. This is fine, but one day their server was
either confused or misconfigured because it would only serve the recommendation in <code>text/plain</code>.
I really needed the HTML form, but after trying multiple browsers from multiple locations I could only retrieve the text
format. I ended up pulling the HTML version out of the Google cache.</p>
<p>The second problem that I ran across highlights the real core problem with conneg. I was
trying to use the W3C XSLT service to do some transformations on my web pages. Now the server-side
software I use to run Well-Formed Web does conneg and can return either HTML or an RSS item
fragment for each URI. At the time I was serving up XHTML 1.0, which is valid XML and
thus good input into an XSLT service. The way the XSLT service works is that you enter two URIs, one
for the source content and the other for the XSLT sheet to apply to the source content.
My transformation kept failing, and it was because of the
Accept headers that the XSLT service sent when it went to retrieve the source content.
My server kept returning the RSS item fragment and not
the XHTML. Now this would have been fine if I wanted to apply an XSLT sheet to my RSS item fragment, but in this
case I wanted it to apply to the XHTML. Note that the problem could have been completely reversed: I could have
been trying to apply the XSLT to the RSS item and not to the XHTML, and my server could have returned
the XHTML all the time. The crux of the problem is that when I gave the URI to the XSLT transformation
service I had no way of specifying what mime-type to request. I got no chance to tweak the
service's <code>Accept:</code> header.
</p>
<p>Let's cover that again to clarify. If I hand you a URI only, and that URI supports conneg,
then I get no control over which representation you retrieve. In the cases where you are
passing a URI into a service that is later going to retrieve a representation from that URI, you
really have no idea which representation it's going to get. That could mean that you end up
passing your RSS feed to the W3C HTML validator, or you end up passing XHTML instead of RSS into
an XSLT translator service, or you end up passing a 12MB PNG to a handheld instead of
that 20KB SVG file. You end up with a problem that is hard to debug and
one that wouldn't exist if each URI had only one mime-type.</p>
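<p>The flip side is that whoever actually issues the GET does control the header. The Python sketch
below builds, but does not send, a request that asks for exactly one mime-type. The URI and the
<code>build_request</code> helper are hypothetical; the point is only that this choice lives in the
retrieving program, which is why a bare URI handed to someone else's service cannot carry it.</p>

```python
import urllib.request


def build_request(uri, mime_type):
    """Build (but do not send) a GET request that asks for one mime-type only."""
    return urllib.request.Request(uri, headers={"Accept": mime_type})


# Hypothetical URI; nothing is fetched, we only inspect the prepared request.
req = build_request("http://example.org/news/some-entry", "application/xhtml+xml")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

<p>A service that wanted to give callers this control would have to accept the desired mime-type
as an extra input alongside the URI, which the services described here did not do.</p>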
<h3>Further Reading</h3>
<p><a href="http://norman.walsh.name/2003/07/02/conneg">Norman Walsh has also run into problems</a> with Content Negotiation.</p>
<p>The issue of using fragment identifiers with conneg has not only come up but was important enough to
merit mention in the W3C document <a href="http://www.w3.org/TR/webarch/#frag-conneg">Architecture of the World Wide Web</a>.</p>
</description>
<dc:date>2003-09-06T21:54:43-05:00</dc:date>
<wfw:comment>http://bitworking.org/news/comments/WebServicesAndContentNegotiation</wfw:comment>
<wfw:commentRss>http://bitworking.org/news/WebServicesAndContentNegotiation?crss</wfw:commentRss>
</item>
<item>
<title>Google2Atom</title>
<link>http://wellformedweb.org/news/Google2Atom</link>
<description>
<p>Welcome to the Google2Atom web service. Just enter your
search and your <a href="http://www.google.com/apis/">Google key</a>
below. Once you press "Search" you will get an <a href="http://www.mnot.net/drafts/draft-nottingham-atom-format-00.html">
Atom</a> feed of the search results.
</p>
<form method="get" action="http://wellformedweb.org/cgi-bin/google2atom.cgi">
<p><input size="50" name="q"/></p>
<p>Google Key: <input size="20" name="license_key"/></p>
<p><input type="submit" value=" Search "/></p>
</form>
<hr />
<p><strong>Note:</strong> The Google Key is no longer mandatory; if it's not
supplied the service will use my own key. In light of that please feel free to
use my key for experimentation, but if you start making heavy use
of this service please get your own Google API Key to avoid
limiting others' use of this service.</p>
<p>This is a REST-based reformulation of the Google API. As such it uses
query parameters in a GET-based HTTP request to do the search. That is, it works
just like the regular Google web page, but this form returns
a well-formed XML document instead of a web page. Why is this better?</p>
<dl>
<dt>Simplicity</dt>
<dd>
It works just like the Google web page, so it is
conceptually easier to understand.
</dd>
<dt>Composability</dt>
<dd>Since the request is just a simple GET the results of a query can be composed
with other web services. For example, the results could be transformed using
XSLT or fed into a validator.
</dd>
</dl>
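<p>Because the whole request is a GET with query parameters, a client only has to assemble a URI.
Here is a small Python sketch that does so using the form's action and field names
(<code>q</code> and <code>license_key</code>); the <code>search_url</code> helper is my own name for
illustration, and the sketch only builds the URI without contacting the service.</p>

```python
from urllib.parse import urlencode

# The form action shown above.
ENDPOINT = "http://wellformedweb.org/cgi-bin/google2atom.cgi"


def search_url(query, license_key=None):
    """Return the GET URI for a search; the key is left out when not supplied."""
    params = {"q": query}
    if license_key:
        params["license_key"] = license_key
    return ENDPOINT + "?" + urlencode(params)


print(search_url("well-formed web"))
```

<p>The resulting URI can be bookmarked, linked to, or handed to another service such as an XSLT
transformer, which is what makes the results composable.</p>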
<h3>Bonus Features</h3>
<p>One feature found in this interface that is not found
in the original Google API is the well-formedness of the
results content.
<a href="http://bitworking.org/news/Announcing_pyTidy">PyTidy</a>
is used to transform the HTML
snippets from the Google API into well-formed XML and place
those into 'content' elements with type='text/html' and
mode='xml'.
</p>
<h3>Colophon</h3>
<p>Google2Atom is written in <a href="http://www.python.org">Python</a> and uses
both the <a href="http://bitworking.org/news/Announcing_pyTidy">
pyTidy</a> and <a href="http://www.diveintomark.org/projects/pygoogle/">
pyGoogle</a> libraries.</p>
</description>
<dc:date>2003-11-22T01:18:42-05:00</dc:date>
<wfw:comment>http://wellformedweb.org/news/comments/Google2Atom</wfw:comment>
<wfw:commentRss>http://wellformedweb.org/news/Google2Atom?crss</wfw:commentRss>
</item>
<item>
<title>wfw namespace elements</title>
<link>http://wellformedweb.org/news/wfw_namespace_elements</link>
<description>
<p>The <code>wfw</code> namespace, http://wellformedweb.org/CommentAPI/,
contains multiple elements. As more are added in various places I will
endeavor to keep the list here updated.</p>
<dl>
<dt>wfw:comment</dt>
<dd>The first element to appear in this namespace is <code>comment</code>. This element appears
in RSS feeds and contains the URI that comment entries are to be POSTed to. The details
of this are outlined in the <a href="http://wellformedweb.org/story/9">CommentAPI Specification</a>.</dd>
<dt>wfw:commentRss</dt>
<dd>The second element to appear in the wfw namespace is <code>commentRss</code>. This element
also appears in RSS feeds and contains the URI of the RSS feed for comments on that Item.
This is documented in <a href="http://www.sellsbrothers.com/spout/default.aspx?content=archive.htm#exposingRssComments">Chris Sells' Specification</a>. Note that for quite a while this page has had a typo and erroneously referred to
this element as 'commentRSS' as opposed to the correct 'commentRss'. Feed consumers should be aware
that they may run into both spellings in the wild. Please see
<a href="http://www.intertwingly.net/blog/2006/04/16/commentRss">this page</a> for
more information.
</dd>
</dl>
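<p>For feed consumers, here is a minimal Python sketch of reading both elements from an RSS item
while tolerating the 'commentRSS' spelling. The <code>comment_links</code> helper and the
example.org URIs are made up for illustration, and the item is built in memory so the sketch
runs without fetching a feed.</p>

```python
import xml.etree.ElementTree as ET

WFW = "http://wellformedweb.org/CommentAPI/"


def comment_links(item):
    """Return (comment_post_uri, comment_feed_uri) for an RSS item element."""
    post_uri = item.findtext("{%s}comment" % WFW)
    # Try the correct spelling first, then fall back to the 'commentRSS' typo.
    feed_uri = (item.findtext("{%s}commentRss" % WFW)
                or item.findtext("{%s}commentRSS" % WFW))
    return post_uri, feed_uri


# A stand-in item that uses the misspelled element name.
item = ET.Element("item")
ET.SubElement(item, "{%s}comment" % WFW).text = "http://example.org/comments/1"
ET.SubElement(item, "{%s}commentRSS" % WFW).text = "http://example.org/entry/1?crss"

print(comment_links(item))
```

<p>Either value comes back as <code>None</code> when the item does not carry the element.</p>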
</description>
<dc:date>2003-10-10T13:11:46-05:00</dc:date>
<wfw:comment>http://wellformedweb.org/news/comments/wfw_namespace_elements</wfw:comment>
<wfw:commentRss>http://wellformedweb.org/news/wfw_namespace_elements?crss</wfw:commentRss>
</item>
<item>
<title>The HTTP verb PUT under Apache: Safe or Dangerous?</title>
<link>http://wellformedweb.org/news/PUT_SaferOrDangerous</link>
<description>
<p>"Is the HTTP verb PUT under Apache safe or dangerous?" This is a question I come across often, and I have now
run into it twice in the work on Atom. So is it safe? The answer is maybe.</p>
<p>Here are two such examples:</p>
<blockquote><p>
Using DELETE and PUT may be the "right thing to do"
in an ideal world, but the fact of the matter is that a
lot -- if not the vast majority -- of webservers do not allow these
operations. </p></blockquote>
<blockquote><p>If anyone knows of a newer article describing
HTTP PUT with apache, I would be very interested in seeing it. Because,
due to my experience with PUT, you have to define a single PUTScript in
httpd.conf, and if you PUT something to an apache server at the URI
www.example.com/blog/entries/1 or something similar, apache passes all
of the information to the PUTScript, not to anything else.</p></blockquote>
<p>Both of the above quotes are from the <a href="http://www.intertwingly.net/wiki/pie/RestEchoApiPutAndDelete">Atom Wiki discussion
of the use of PUT</a>. A little digging reveals that the ApacheWeek article
<a href="http://www.apacheweek.com/features/put">Publishing Pages with PUT</a>
is referenced most often when the danger of PUT is raised.</p>
<p>That ApacheWeek article does talk about the dangers of PUT and
the cautions you need to follow when writing a script that
does content publishing via PUT. The key part of that phrase
is <strong>content publishing</strong>. That means that PUT is being
used to upload arbitrary content to the server and the client
is determining via the URI where the content should be stored.
Now you can imagine how this might be dangerous, for example
not correctly checking URI paths that include <code>../..</code> could
let a malicious agent re-write your <code>.bashrc</code>.</p>
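<p>As a rough illustration of the kind of check a content-publishing PUT script needs, here is a
Python sketch that refuses any request path that resolves outside the document root. The
<code>resolve_put_target</code> name and the <code>/srv/www</code> root are assumptions for the
example, and a real script would need more than this one check.</p>

```python
import os


def resolve_put_target(document_root, uri_path):
    """Map a request path to a file path, refusing anything outside the root."""
    root = os.path.realpath(document_root)
    # realpath collapses any ../ segments (and symlinks) before the comparison.
    target = os.path.realpath(os.path.join(root, uri_path.lstrip("/")))
    if os.path.commonpath([root, target]) != root:
        raise ValueError("path escapes the document root: %r" % uri_path)
    return target


print(resolve_put_target("/srv/www", "/blog/entries/1"))
try:
    resolve_put_target("/srv/www", "/../../home/user/.bashrc")
except ValueError as err:
    print("rejected:", err)
```

<p>The comparison is done on the resolved path rather than on the raw string, so a path that
sneaks in <code>../..</code> is caught after it has been normalized.</p>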
<p>Implementing a PUT script can be difficult and a security hazard
in the context of content publishing, but that's the case because
the client is choosing the target URI and the client could upload
any content type. In the case of Web Services in general, and
the AtomAPI in particular, PUT is used in a much narrower manner
and avoids those potential security problems.</p>
<p>In the case of the AtomAPI, PUT is only allowed on URIs that point
to a pre-existing resource. The
AtomAPI follows a general idiom for editing resources of doing
a GET to retrieve the original XML, then a PUT on the same URI
to update that resource with the edited XML. No URIs are created
by doing a PUT. PUT is not accepted on arbitrary URIs. This makes
the use of PUT in the context of the AtomAPI just as safe as POST.</p>
<p>There are quite a few ways to configure Apache to process
incoming requests. In particular it is possible to have a single
script that handles all PUT requests below a chosen directory. This
strategy, and the security concerns associated with
it, are covered fully in the ApacheWeek article <a href="http://www.apacheweek.com/features/put">Publishing Pages with PUT</a>.</p>
<p>When processing requests with a CGI script all the PUT requests
will come through. The verb is passed to the CGI program via the REQUEST_METHOD environment
variable, and the program decides what to do with the content.</p>
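<p>Here is a minimal Python sketch of that dispatch, combined with the rule that PUT is only
accepted on resources that already exist. The <code>handle</code> function and the in-memory
<code>entries</code> store are stand-ins of my own; a real CGI program would read
<code>os.environ</code> and the request body from standard input instead.</p>

```python
def handle(environ, body, entries):
    """Return (status, content) for one request against an in-memory store."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "")
    if path not in entries:
        # Unknown URIs are refused for every verb, so PUT cannot create one.
        return "404 Not Found", ""
    if method == "GET":
        return "200 OK", entries[path]
    if method == "PUT":
        entries[path] = body  # replace the existing representation
        return "200 OK", body
    return "405 Method Not Allowed", ""


entries = {"/entries/1": "original XML"}
print(handle({"REQUEST_METHOD": "GET", "PATH_INFO": "/entries/1"}, "", entries))
print(handle({"REQUEST_METHOD": "PUT", "PATH_INFO": "/entries/1"}, "edited XML", entries))
print(handle({"REQUEST_METHOD": "PUT", "PATH_INFO": "/entries/99"}, "new XML", entries))
```

<p>The first two calls are the GET-then-PUT editing idiom; the third shows a PUT to a URI that
does not exist being turned away instead of creating a resource.</p>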
<p>Using PUT properly has advantages in Web Service development. First,
Apache lets you control access based on the verb using the
<a href="http://httpd.apache.org/docs-2.0/mod/core.html#limit">Limit</a>
and <a href="http://httpd.apache.org/docs-2.0/mod/core.html#limitexcept">LimitExcept</a>
directives. Here is a sample
of one of my <code>.htaccess</code> files that requires authentication for
all verbs except GET on the CGI program <code>Bulu.cgi</code>.</p>
<pre class="example"><code>&lt;Files Bulu.cgi>
AuthType Basic
AuthName myrealm
AuthUserFile /path/to/my/password/file
&lt;LimitExcept GET>
Require valid-user
&lt;/LimitExcept>
&lt;/Files>
</code></pre>
<p>In addition, the <a href="http://httpd.apache.org/docs-2.0/mod/mod_actions.html#script">Script</a>
directive can be used to dispatch to a CGI program based on the verb used:</p>
<pre class="example"><code>Script PUT /cgi-bin/put.cgi</code></pre>
<p>The second advantage using PUT brings is clarity. Given the idiom
of using GET/PUT in tandem on a URI to edit resources, PUT
clearly signals what the interface is doing.</p>
<h4>Resources</h4>
<p><a href="http://www.apacheweek.com">ApacheWeek</a>: <a href="http://www.apacheweek.com/features/put">Publishing Pages with PUT</a></p>
<p><a href="http://www.intertwingly.net/wiki/pie/RestEchoApiPutAndDelete">RestEchoApiPutAndDelete</a>: Discussion on the use of PUT
and DELETE in the AtomAPI.</p>
<p><a href="http://httpd.apache.org/docs-2.0/mod/mod_actions.html">mod_actions</a>: An Apache module for
controlling dispatching based on verb or content-type.</p>
<p><a href="http://www.w3.org/Amaya/User/Put.html">Configuring your WWW server to understand the PUT method</a>, from the W3C's Amaya project documentation.</p>
<p><a href="http://www.webdav.org/">WebDAV</a> is also something you may be interested in if you
are looking for ways to publish your content using HTTP. WebDAV stands for
"Web-based Distributed Authoring and Versioning". It is a set of extensions to the HTTP
protocol which allows users to collaboratively edit and manage files on remote web servers.
<a href="http://httpd.apache.org/docs-2.0/mod/mod_dav.html">
Mod_dav</a> is an Apache module that implements WebDAV.</p>
</description>
<dc:date>2003-08-23T00:45:25-05:00</dc:date>
<wfw:comment>http://wellformedweb.org/news/comments/PUT_SaferOrDangerous</wfw:comment>
<wfw:commentRss>http://wellformedweb.org/news/PUT_SaferOrDangerous?crss</wfw:commentRss>
</item>
<item>
<title>Six Plus One</title>
<link>http://wellformedweb.org/news/SixPlusOne</link>
<description>
<p>Previously I talked about the <a href="http://bitworking.org/news/Six_Places">six different places</a> there are to
store information in an HTTP transaction. This is slightly misleading.
</p>
<p> To review, the six places are:</p>
<ol>
<li>Request URI</li>
<li>Request Headers</li>
<li>Request Content</li>
<li>Response Status Code</li>
<li>Response Headers</li>
<li>Response Content</li>
</ol>
<p>This is slightly misleading because the URI is listed as a single
storage location. This isn't the best characterization, as it really
contains two different sets of information: the path, and the query parameters.</p>
<p>Now the path part of a URI usually corresponds to the directory structure on the server.
But remember that the path structure of a server is completely controlled
by that server and it need not correspond to any file or directory structure.
While it is at times convenient to map it to a directory structure, this isn't required,
and it is possible to pass path information to a
CGI program. For example, if you do a GET on the following URL:</p>
<pre class="example"><code>http://example.org/cgi-bin/test.py/fred/12
</code></pre>
<p>and there exists a program named <code>test.py</code> in the <code>cgi-bin</code> directory
then that program will be executed. The remaining path after the program is passed
to the CGI program in the PATH_INFO environment variable. In contrast, if query
parameters are passed in, they are passed to the CGI program
via the QUERY_STRING environment variable.</p>
<p>For example, if this is the script <code>test.py</code>:</p>
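<pre class="example"><code>import os

# The header line, then the blank line that ends the CGI headers.
print("Content-Type: text/plain\n")
# PATH_INFO is the path after the script name; QUERY_STRING is the raw query.
print("PATH_INFO = %s" % os.environ.get("PATH_INFO", ""))
print("QUERY_STRING = %s" % os.environ.get("QUERY_STRING", ""))</code></pre>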
<p>And it handles the GET for this URI:</p>
<pre class="example"><code>http://localhost/cgi-bin/test.py/reilly/12?id=234454</code></pre>
<p>It will display:</p>
<pre class="example"><code>PATH_INFO = /reilly/12
QUERY_STRING = id=234454
</code></pre>
<p>Note how the piece of the path below test.py has been stripped off and made
available via <code>PATH_INFO</code>, while the query parameters are
stored in the QUERY_STRING environment variable.
</p>
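<p>The same split can be reproduced on the client side with Python's standard library. The sketch
below assumes the CGI program lives at <code>/cgi-bin/test.py</code>, as in the example above, and
the <code>split_for_cgi</code> helper is a name I made up for illustration.</p>

```python
from urllib.parse import urlsplit, parse_qs

SCRIPT_NAME = "/cgi-bin/test.py"  # assumed location of the CGI program


def split_for_cgi(uri):
    """Return the (PATH_INFO, QUERY_STRING) pair the script would be handed."""
    parts = urlsplit(uri)
    # Whatever follows the script name in the path becomes PATH_INFO.
    if parts.path.startswith(SCRIPT_NAME):
        path_info = parts.path[len(SCRIPT_NAME):]
    else:
        path_info = ""
    return path_info, parts.query


path_info, query = split_for_cgi("http://localhost/cgi-bin/test.py/reilly/12?id=234454")
print("PATH_INFO = %s" % path_info)
print("QUERY_STRING = %s" % query)
print(parse_qs(query))
```

<p>The first two lines printed match the output of <code>test.py</code> shown above, and
<code>parse_qs</code> then breaks the query string into its individual parameters.</p>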
<p>So HTTP, via the structure of a URI, gives you two distinct places
to store information, one in the path and the second in the query parameters.
This isn't even the full story, because if you are running Apache and have
the ability to use .htaccess files you can use
<a href="http://httpd.apache.org/docs/mod/mod_rewrite.html">mod_rewrite</a> and map URIs so that they appear
as paths but show up in the CGI as query parameters, but we won't cover that
now.
</p>
</description>
<dc:date>2003-08-03T01:34:49-05:00</dc:date>
<wfw:comment>http://wellformedweb.org/news/comments/SixPlusOne</wfw:comment>
<wfw:commentRss>http://wellformedweb.org/news/SixPlusOne?crss</wfw:commentRss>
</item>
</channel>
</rss>