Core Jabber Group
Recommended Practices Submission
Jabber-XML: 19990324
Category: Protocol Extension
Scott Robinson
Patrick McCuller
Many many others... August 1999

Multipurpose Internet Mail Extensions

A recommended practice specification for encoding MIME into the Jabber-XML protocol


RFC 2045 specifies the MIME standard for use in Internet mail. This document specifies the recommended additions to the current Jabber-XML protocol to support the RFC 2045 MIME standard as far as is functionally possible.

1. Introduction and rationale

Jabber is an XML-based, server-oriented protocol designed to be a universal transport for current Instant Messaging (IM) systems. The current Jabber-XML protocol was designed only to the level needed for universal transport of plain-text messages. However, the Jabber development group's plan to support non-plain-text messaging, such as that used by AOL Instant Messenger (AIM), requires the Jabber-XML protocol to expand. Discussion on the jdev mailing list has reached agreement on a practice for implementing the RFC 2045 MIME standard in the Jabber-XML protocol. This document describes the resulting extensions and gives one complex example of their use.

2. Definitions and terminology

This document assumes a basic understanding of Jabber-XML, of character encodings other than US-ASCII, and of MIME.

3. MIME header fields and the "<mime>" tag

Each MIME header field maps directly onto an attribute of the <mime/> tag. The <mime/> tag appears within the character data of a standard Jabber message's <say/> element.
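A minimal sketch of the mapping in Python, assuming (as the example in section 9 suggests) that each header name is lowercased to form the attribute name and that the field value is carried unchanged as the attribute value. The function name mime_start_tag is hypothetical; quoteattr from the standard library handles XML quoting of the value.

```python
from xml.sax.saxutils import quoteattr


def mime_start_tag(headers: dict[str, str]) -> str:
    """Build a <mime> start tag from MIME header fields (hypothetical helper).

    Assumption: header names become lowercased attribute names, and header
    values become attribute values, quoted and escaped for XML.
    """
    attrs = "".join(
        f" {name.lower()}={quoteattr(value)}" for name, value in headers.items()
    )
    return f"<mime{attrs}>"


# mime_start_tag({"MIME-Version": "1.0", "Content-Type": "text/plain; charset=utf-8"})
# yields: <mime mime-version="1.0" content-type="text/plain; charset=utf-8">
```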

3.1 MIME-Version Header Field

The MIME-Version header field has the same usage as the original RFC 2045 specification.

Note that the MIME-Version header field is required within the top-level <mime/> tag of a Jabber message. It is not required for each body part of a multipart entity. It is required for the embedded headers of a body of type "message/rfc822" or "message/partial", and in the <mime/> tag of a "message/jabber" body, if and only if the embedded message is itself claimed to be MIME-conformant.

NOTE TO IMPLEMENTORS: When checking MIME-Version values, any RFC 822 comment strings that are present must be ignored. In particular, the following four MIME-Version fields are equivalent:

     MIME-Version: 1.0

     MIME-Version: 1.0 (produced by MetaSend Vx.x)

     MIME-Version: (produced by MetaSend Vx.x) 1.0

     MIME-Version: 1.(produced by MetaSend Vx.x)0
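The equivalence above can be checked by removing comments before comparing. This Python sketch strips parenthesized RFC 822 comments (including nested ones, skipping backslash-quoted characters inside them) and all whitespace. The function name normalize_mime_version is hypothetical, and the input is assumed to be the field value without the "MIME-Version:" label.

```python
def normalize_mime_version(value: str) -> str:
    """Drop RFC 822 comments and whitespace from a MIME-Version value."""
    out = []
    depth = 0  # current comment nesting level
    i = 0
    while i < len(value):
        ch = value[i]
        if depth > 0 and ch == "\\":
            i += 2  # quoted-pair inside a comment: skip it and the escaped char
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0 and not ch.isspace():
            out.append(ch)  # keep only non-comment, non-whitespace characters
        i += 1
    return "".join(out)
```

All four fields shown above normalize to "1.0", so a receiver can compare the result as a plain string.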

In the absence of a MIME-Version field, a receiving Jabber client (whether or not it conforms to MIME requirements) may choose to interpret the body of the message according to client-specific conventions. In practice, non-MIME messages usually contain plain text in UTF-8 or UTF-16.

It is impossible to be certain that a non-MIME message is actually plain text in UTF-8 or UTF-16, since it might well be a message that, using some set of "international" conventions, includes text in another character set.

3.2 Content-Type Header Field

The Content-Type header field has the same usage as the RFC 2045 specification.

Beyond the syntax defined in RFC 2045, the only syntactic constraint on the definition of subtype names is that their uses must not conflict. That is, it would be undesirable to have two different communities using "Content-Type: application/foobar" to mean two different things. The process of defining new media subtypes is therefore not intended to be a mechanism for imposing restrictions, but simply a mechanism for publicizing their definition and usage. There are two acceptable mechanisms for defining new media subtypes:

  1. Private values (starting with "X-") may be defined bilaterally between two cooperating agents without outside registration or standardization. Such values cannot be registered or standardized.
  2. New standard values should be registered with IANA as described in RFC 2048.
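The distinction between the two mechanisms can be expressed as a simple check on the subtype. This is a minimal Python sketch; the function name is_private_subtype is hypothetical, the input is assumed to be a "type/subtype" string, and the "X-" prefix is matched case-insensitively because MIME type values are case-insensitive. Any subtype that is not private is expected to be registered with IANA.

```python
def is_private_subtype(media_type: str) -> bool:
    """True if the subtype is a private "X-" value (hypothetical helper)."""
    _, sep, subtype = media_type.partition("/")
    # Without a "/" there is no subtype to classify.
    return bool(sep) and subtype.strip().lower().startswith("x-")
```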

The second RFC pertaining to MIME, RFC 2046, defines the initial set of media types for MIME.

Jabber-XML messages without a MIME Content-Type header field are taken by this protocol to be plain text in the UTF-8 character set, which can be specified explicitly as:

     Content-type: text/plain; charset=utf-8

This default is assumed if no Content-Type header field is specified. It is also recommended that this default be assumed when a syntactically invalid Content-Type header field is encountered. In the presence of a MIME-Version header field and the absence of any Content-Type header field, a receiving User Agent can also assume that plain UTF-8 text was the sender's intent. Plain UTF-8 text may still be assumed in the absence of a MIME-Version header field or in the presence of a syntactically invalid Content-Type header field, but the sender's intent might have been otherwise.
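The defaulting rule can be sketched as a small Python parser that returns the default whenever the value is missing or fails to parse. This is an illustration under stated assumptions, not a complete RFC 2045 parser: the function name effective_content_type is hypothetical, RFC 822 comments inside the value are not handled, and whitespace is only tolerated around parameters.

```python
import re

# RFC 2045 token: printable US-ASCII except SPACE and tspecials ()<>@,;:\"/[]?=
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
TYPE_RE = re.compile(rf"\s*({TOKEN})/({TOKEN})")
PARAM_RE = re.compile(rf'\s*;\s*({TOKEN})\s*=\s*({TOKEN}|"(?:[^"\\]|\\.)*")')


def default_content_type():
    # The default from this section: text/plain; charset=utf-8
    return "text/plain", {"charset": "utf-8"}


def effective_content_type(header=None):
    """Return (type/subtype, params); missing or invalid values get the default."""
    if header is None:
        return default_content_type()
    m = TYPE_RE.match(header)
    if not m:
        return default_content_type()
    ctype = f"{m.group(1)}/{m.group(2)}".lower()
    params = {}
    pos = m.end()
    while (p := PARAM_RE.match(header, pos)) is not None:
        value = p.group(2)
        if value.startswith('"'):
            # Strip the surrounding quotes and undo backslash quoting.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        params[p.group(1).lower()] = value
        pos = p.end()
    if header[pos:].strip():
        return default_content_type()  # unparsed trailing text: invalid syntax
    return ctype, params
```

For example, effective_content_type("Text/Plain; charset=ISO-8859-1") gives ("text/plain", {"charset": "ISO-8859-1"}), while a value such as "garbage" falls back to the UTF-8 plain-text default.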

4.0 Content-Transfer-Encoding Header Field

The Content-Transfer-Encoding header field has the same usage as the original RFC 2045 specification.

Note, especially for the Jabber-XML internationalization effort, that the encoding mechanisms defined here explicitly encode all data in UTF-8. For example, suppose an entity has header fields such as:

     Content-Type: text/plain; charset=russian-korean
     Content-transfer-encoding: base64

This must be interpreted to mean that the body is a base64 UTF-8 encoding of data that was originally in russian-korean, and will be in that character set again after decoding.
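The two layers can be illustrated with a short Python sketch. Since "russian-korean" is a placeholder rather than a real charset name, Python's koi8_r codec stands in for it here as an assumption; the function names encode_body and decode_body are hypothetical. The base64 output uses only US-ASCII characters, so it is also valid UTF-8, while the decoded octets are again in the declared charset.

```python
import base64


def encode_body(text: str, charset: str) -> str:
    """Encode text in `charset`, then base64 the resulting octets.

    The base64 alphabet is pure US-ASCII, so the result is also valid UTF-8.
    """
    return base64.b64encode(text.encode(charset)).decode("ascii")


def decode_body(body: str, charset: str) -> str:
    """Undo the transfer encoding, then interpret the octets in `charset`."""
    return base64.b64decode(body).decode(charset)


# Usage with koi8_r standing in for the placeholder charset:
body = encode_body("Привет", "koi8_r")
text = decode_body(body, "koi8_r")
```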

4.1 Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that largely consists of octets corresponding to printable US-ASCII characters, which UTF-8 encodes unchanged as single octets. Octets outside that range (and the "=" character itself), including each octet of a multi-octet UTF-8 sequence, are written as "=" followed by two hexadecimal digits. In Internet mail, the encoding ensures that the resulting octets are unlikely to be modified by mail transport. Jabber-XML transport does not have this problem, so Quoted-Printable is included only for MIME compatibility. If the data being encoded are mostly US-ASCII text, the encoded form remains largely recognizable by humans. A body that is entirely UTF-8 may also be encoded in Quoted-Printable to ensure the integrity of the data should the message pass through a character-translating and/or line-wrapping gateway.
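A short Python sketch using the standard quopri module shows how UTF-8 text fares under Quoted-Printable: ASCII letters pass through, and each octet of a multi-octet UTF-8 character is escaped. The helper names qp_encode and qp_decode are hypothetical wrappers around quopri.encodestring and quopri.decodestring.

```python
import quopri


def qp_encode(text: str) -> bytes:
    """Quoted-Printable encode the UTF-8 octets of `text`."""
    return quopri.encodestring(text.encode("utf-8"))


def qp_decode(data: bytes) -> str:
    """Reverse qp_encode: decode Quoted-Printable, then UTF-8."""
    return quopri.decodestring(data).decode("utf-8")


encoded = qp_encode("Grüße")
print(encoded)  # b'Gr=C3=BC=C3=9Fe'
```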

5.0 Content-ID Header Field

The Content-ID header field has the same usage as the original RFC 2045 specification.

There are no implementation notes.

6.0 Content-Description Header Field

The Content-Description header field has the same usage as the original RFC 2045 specification.

The description is presumed to be given in the UTF-8 character set, although the mechanism specified in RFC 2047 may be used for non-UTF-8 Content-Description values.
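A receiver can handle both cases with Python's email.header module: decode_header splits out RFC 2047 encoded-words, and make_header reassembles them into a single string, while plain values pass through unchanged. The wrapper name content_description is hypothetical.

```python
from email.header import decode_header, make_header


def content_description(value: str) -> str:
    """Return a Content-Description value as text.

    Plain values are returned as-is; RFC 2047 encoded-words such as
    =?ISO-8859-1?Q?...?= are decoded from their declared charset.
    """
    return str(make_header(decode_header(value)))


# content_description("=?ISO-8859-1?Q?Caf=E9_menu?=") returns "Café menu"
```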

7.0 Additional MIME Header Fields

All additional MIME header fields are mapped directly onto attributes of the <mime/> tag, in the same way as the standard header fields.

8.0 Errata

In Jabber-XML with MIME extensions, there is no blank line between the <mime/> tag and the start of the contained data, unlike the blank line that separates the header fields from the body in RFC 2045. Moving the headers into the <mime/> tag makes this separator unnecessary.

In Jabber-XML with MIME extensions, the contents of the MIME data are ordinary XML character data and must be escaped accordingly; for example, "<" is written as "&lt;" and "&" as "&amp;".
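The escaping step can be sketched in Python with the standard xml.sax.saxutils helpers; the wrapper name to_mime_cdata is hypothetical. The sample input mirrors the text/enriched part of the example in section 9.

```python
from xml.sax.saxutils import escape, unescape


def to_mime_cdata(data: str) -> str:
    """Escape &, < and > so raw MIME data can sit inside a <mime> element."""
    return escape(data)


escaped = to_mime_cdata("This is <bold>cool</bold> & more")
# escaped == "This is &lt;bold&gt;cool&lt;/bold&gt; &amp; more"
original = unescape(escaped)  # a receiver restores the raw data
```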

9.0 A Complex Example

The following is the complex multipart example from RFC 2049, translated into Jabber-XML with MIME extensions.

<to name="Ned Freed"></to>
<subject>A multipart example</subject>
<say>This is the preamble area of a multipart message. Jabber clients that understand multipart format should ignore this preamble.
If you are reading this text, you might want to consider changing to a Jabber client that understands how to properly display multipart messages.
<mime content-type="multipart/mixed">
... Some text appears here ...
[Note that there is no blank line between the <mime> tag and the start of the text in this part. In Jabber-XML the headers are embedded in the <mime/> tag, so there is no need for a blank line or a unique boundary marker.]
<mime content-type="text/plain; charset=utf-8">
This could have been part of the previous part, but
illustrates explicit versus implicit typing of body parts.
<mime content-type="multipart/parallel">
<mime content-type="audio/basic" content-transfer-encoding="base64">
... base64-encoded 8000 Hz single-channel mu-law-format audio data goes here ...
<mime content-type="image/jpeg" content-transfer-encoding="base64">
... base64-encoded image data goes here ...
<mime content-type="text/enriched">
This is &lt;bold&gt;&lt;italic&gt;enriched&lt;/italic&gt;&lt;/bold&gt;&lt;smaller&gt;as defined in RFC 1896.&lt;/smaller&gt;
Isn't it &lt;bigger&gt;&lt;bigger&gt;cool?&lt;/bigger&gt;&lt;/bigger&gt;
<mime content-type="message/rfc822">
From: (mailbox in US-ASCII)
To: (address in US-ASCII)
Subject: (subject in US-ASCII)
Content-Type: Text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: Quoted-printable

... Additional text in ISO-8859-1 goes here ...
<mime content-type="message/jabber">
    &lt;to name='Jenny(work)'&gt;jenny&lt;/to&gt;
    &lt;to name='HAhah!'&gt;safdsgh@asdfg.asdfasdf&lt;/to&gt;
    &lt;subject&gt;Did you see that?&lt;/subject&gt;
    &lt;say&gt;asdgf asdfkjasgoijqwert asdgaldgjkas&lt;/say&gt;