Ted Leung on the air
Ted Leung on the air: Open Source, Java, Python, and ...
Sun, 04 Jan 2004
RSS Feed cleanups
Brent Simmons posted tonight about Gzip compression of RSS feeds. There are a few things I'd like to point out.
If you are using Apache, mod_gzip will compress dynamically generated content as well -- my RSS feed is dynamically generated, and it's being compressed by mod_gzip just fine. You can control compression on a mimetype, response header, or URI pattern basis, among other things.
I'm glad that Brent posted because, as he pointed out, I'm keeping a list, and I did bug him about it, even before I had a Mac. But in truth I feel a little guilty about it, and here's why. After Brent implemented Gzip encoding, he mailed me to get checked off on the list and asked for feeds to test on. He said that he couldn't find that many gzipped feeds to test with. At the time, I thought he was mistaken. Well, Brent also implemented a statistics window in NetNewsWire that shows how many requests NNW made for a feed, and how many 304 and Gzip responses it got. So I've been looking at the statistics, and it's not looking good out there, folks. As far as gzipped feeds go, about 10% of the feeds in my NNW (about 900) are gzipped. That's a lot worse than I expected. I understand that this can be tough -- the easiest way to implement gzipping is to do what Brent suggested and shove it off to Apache. That means that people who are being hosted somewhere need to know enough Apache config to turn gzip on. Not likely. Or they need enlightened hosting admins who automatically turn it on, but that doesn't appear to be the case. So blogging software vendors could help a lot by turning gzip support on in the software.
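For blogging software that wants to do the compression itself rather than lean on Apache, here is a minimal Python sketch of the idea. It is only an illustration, not code from any particular weblog package: the names accepts_gzip, maybe_gzip, and COMPRESSIBLE_TYPES are hypothetical, and the set of feed mimetypes is an assumption. The body is gzipped only when the client's Accept-Encoding header allows it and the content type is a feed type.

```python
import gzip

# Assumption: only feed-like mimetypes are compressed in this sketch.
COMPRESSIBLE_TYPES = {"text/xml", "application/rss+xml"}


def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding value lists gzip with a nonzero q."""
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        coding = parts[0].strip().lower()
        if coding not in ("gzip", "x-gzip"):
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        return q > 0
    return False


def maybe_gzip(body, content_type, accept_encoding):
    """Return (body_bytes, headers), compressing only when it is allowed."""
    if content_type in COMPRESSIBLE_TYPES and accepts_gzip(accept_encoding or ""):
        compressed = gzip.compress(body)
        headers = {
            "Content-Encoding": "gzip",
            # Tell caches the response depends on the request's Accept-Encoding.
            "Vary": "Accept-Encoding",
            "Content-Length": str(len(compressed)),
        }
        return compressed, headers
    return body, {"Content-Length": str(len(body))}
```

A CGI-style script could call maybe_gzip(feed_bytes, "application/rss+xml", os.environ.get("HTTP_ACCEPT_ENCODING")) just before writing the response and merge the returned headers into its own.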
What's even more depressing is that for HTTP conditional get, the figure is only about 33% of feeds. And this is something that the blogging software folks should do. We are doing it in pyblosxom.
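For anyone who hasn't implemented HTTP conditional get, the server side is small. The following Python sketch is an illustration only and is not pyblosxom's actual code: the function names make_validators and respond are hypothetical, and using a SHA-256 hash of the body as the ETag is an assumption. The server sends ETag and Last-Modified with the feed, and answers 304 with an empty body when the reader sends back matching If-None-Match or If-Modified-Since values.

```python
import calendar
import hashlib
from email.utils import formatdate, parsedate


def make_validators(body, mtime):
    """Build an ETag and a Last-Modified value for a feed body."""
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    last_modified = formatdate(mtime, usegmt=True)  # HTTP-style GMT date
    return etag, last_modified


def respond(body, mtime, request_headers):
    """Return (status, headers, body) for a possibly conditional request."""
    etag, last_modified = make_validators(body, mtime)
    headers = {"ETag": etag, "Last-Modified": last_modified}

    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # If-None-Match takes precedence when both headers are present.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    elif if_modified_since is not None:
        parsed = parsedate(if_modified_since)  # None if the date is malformed
        if parsed is not None and int(mtime) <= calendar.timegm(parsed):
            return 304, headers, b""

    return 200, headers, body
```

A reader that stores the ETag and Last-Modified values from its last fetch and sends them back on the next poll gets the 304 and skips downloading the feed again.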
If you are using NetNewsWire, you can see the truth for yourself. Just go to the Window menu and select "Show Bandwidth Statistics" (you have to do this after you've pulled your feeds, though). If you are using some other RSS reader, well, you're on your own.
I was thinking of publishing a list of feeds that don't do either gzip or HTTP conditional get, but it would be too long. If you are interested, I've (with the help of some of the other NNW beta testers) written an AppleScript that exports the bandwidth statistics as an XML file. The script is available here and the output on my NNW feeds is here (there's no gzip info because there's no AppleScript property for it yet). Please only download the feed statistics data if you are really interested. It's about 8600 lines of XML.
So be a good citizen and fix your feed. You'll save bandwidth, and your readers will save both bandwidth and download time.
[00:27] | [computers/internet/weblogs] | 2 Comments
mod_gzip has a couple of problems that convinced me it was better disabled.
This is for regular HTML, which probably isn't what you were thinking about, but still it might explain why mod_gzip isn't as used as it might be.
The problems are with incompatible browsers, as usual.
The documented problem is that a "Vary: User-Agent" header needs to be sent out so that proxies know that different user agents might be served different content. Unfortunately there are many different flavours of IE, Mozilla, etc. so proxy-cachability of the page is compromised. Not that proxies are used much, but it still is a bummer.
The other problem I experienced (didn't search much, probably is known) is that Netscape 4.x only decompresses content with "well known" mimetypes, so text/html will show ok, but text/x-apache-log won't be decompressed.
I suspect that there are many many more problems similar to these.
I suppose that one could turn on mod_gzip only for text/xml and application/rss+xml and still save a lot of bandwidth though.
Posted by Duncan Wilcox at Sun Jan 4 13:42:47 2004
I haven't run into most of these problems -- or nobody cares enough to report them. I do know that I'm saving a ton of bandwidth compared to no compression.
But it's still amazing to me how little support for HTTP conditional get there is.
Posted by Ted Leung at Mon Jan 5 13:22:45 2004