Monday, May 31, 2004
Solving the "silent data loss" problem in RSS 2.0

I want to float a trial balloon for solving a problem pointed out by Mark Pilgrim and Sam Ruby.

The problem

Suppose I want to include a left angle bracket in a weblog post, where the bracket is not part of an HTML tag. I dutifully encode the left angle bracket, as required, as &lt;.

Now, when an aggregator reads the feed, it decodes the &lt;, turning it back into a left angle bracket. It then flows the text through an HTML renderer, which assumes that a left angle bracket begins an HTML element. Of course in this case it doesn't, and depending on whether the bracket is balanced or not, it can create big problems. In any case, the text immediately after the left angle bracket won't be visible, creating the "silent data loss" reported by Sam and Mark.
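Here is a rough sketch of that sequence in Python. It is only an illustration: naive_render is a made-up stand-in that drops anything shaped like a tag, which is cruder than what a real HTML renderer does, and the sample sentence is invented for the demo.

```python
import html
import re


def naive_render(markup: str) -> str:
    """Hypothetical stand-in for an HTML renderer: drop anything shaped like a tag."""
    return re.sub(r"<[^>]*>", "", markup)


# What the author wants readers to see (sample text, made up for this sketch).
intended = "The <title> element is required."

# The author encodes the brackets once, as required, before the text goes in the feed.
in_feed = html.escape(intended, quote=False)

# The aggregator decodes the entities, then hands the result to the renderer.
decoded = html.unescape(in_feed)
shown = naive_render(decoded)

print(in_feed)  # the encoded form stored in the feed
print(shown)    # what the reader sees; the bracketed word is missing
```

Nothing raises an error along the way, which is why the loss is silent: the renderer just treats the bracketed word as an element it doesn't need to display.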

The proposal

An addition to the proposed rssHints module of a single element, entitled:

<rssHints:descriptionFormat>

with one required attribute, rssHints:type

<rssHints:descriptionFormat rssHints:type="xxx">

which can take on one of two values.

1. text/plain

2. text/html

If the value is text/html, the processor should deal with the description exactly as it currently does, that is, flow it through an HTML parser, or strip HTML, or ???.

If the value is text/plain, the processor may flow the description through an HTML parser, but before doing so it must encode left angle brackets, quotes and ampersands, so as to neuter any text that the parser might otherwise interpret as HTML.
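A processor following the two rules above might look something like this sketch. The function name prepare_description and its parameters are hypothetical, and raising an error on any other value is my assumption, since the proposal only defines the two. Note that Python's html.escape also encodes right angle brackets, which is harmless here but goes slightly beyond what the proposal requires.

```python
import html


def prepare_description(decoded_text: str, description_format: str) -> str:
    """Return the text that is safe to hand to an HTML parser.

    decoded_text is the description after the XML layer has been decoded;
    description_format is the value of the proposed type attribute.
    Both names are hypothetical, chosen for this sketch.
    """
    if description_format == "text/plain":
        # Neuter anything the parser could mistake for markup:
        # ampersands, angle brackets, and both kinds of quotes.
        return html.escape(decoded_text, quote=True)
    if description_format == "text/html":
        # Handle the description exactly as the processor does today.
        return decoded_text
    # Assumption: the proposal does not say what to do with other values.
    raise ValueError(f"unsupported description format: {description_format!r}")


plain = prepare_description('1 <b & "c"', "text/plain")
marked_up = prepare_description("<b>bold</b>", "text/html")
print(plain)
print(marked_up)
```

With text/plain, the left angle bracket reaches the parser as an entity again, so it is displayed rather than swallowed.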

Status: This is just for discussion. Do not deploy. Please comment if you have concerns, objections, etc.

Groundrules for discussion: Detente, self-deprecating humor, it's even worse than it appears.

# Posted by Dave Winer on 5/31/04; 11:51:35 AM