

  INTERNET-DRAFT                                      Eric A. Hall, Editor 
  Document: draft-hall-dm-idns-00.txt                           Consultant 
  Expires: May 2002                                          November 2001 
      
      
                  The Internationalized Domain Name System 
      
      
     Status of this Memo 
      
     This document is an Internet-Draft and is in full conformance with 
     all provisions of Section 10 of RFC2026. 
      
     Internet-Drafts are working documents of the Internet Engineering 
     Task Force (IETF), its areas, and its working groups. Note that 
     other groups may also distribute working documents as Internet-
     Drafts. 
      
     Internet-Drafts are draft documents valid for a maximum of six 
     months and may be updated, replaced, or obsoleted by other 
     documents at any time. It is inappropriate to use Internet-Drafts 
     as reference material or to cite them other than as "work in 
     progress." 
      
     The list of current Internet-Drafts can be accessed at 
     http://www.ietf.org/ietf/1id-abstracts.txt. 
      
     The list of Internet-Draft Shadow Directories can be accessed at 
     http://www.ietf.org/shadow.html. 
      
      
  1.      Abstract 
      
      The principal intention of this specification is to facilitate the 
     deployment of a completely internationalized domain name syntax 
     and service which new protocols, applications and host systems can 
     use, but without disrupting the existing infrastructure. Towards 
     that end, this document describes a series of elective 
     encapsulation services and protocol extensions which cumulatively 
     allow internationalized domain names to be stored and transmitted 
     in the existing DNS message and within application data streams, 
     according to the compliance level of the participating systems. 
      
   
   
  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
   
   
      
     Table of Contents 
      
     1.   Abstract..................................................1 
     2.   Definitions and Terminology...............................3 
     3.   Introduction..............................................4 
       3.1.  Background.............................................4 
       3.2.  Objectives.............................................5 
       3.3.  Common Usage Scenarios.................................7 
       3.4.  User Audiences.........................................9 
       3.5.  Service Overview......................................11 
       3.6.  Process Example.......................................13 
     4.   The Internationalized Namespace..........................19 
       4.1.  Internationalized Domain Names and Labels.............20 
       4.2.  Internationalized Host Identifiers....................27 
       4.3.  STD13 Domain Names....................................28 
       4.4.  STD13 Host Identifiers................................29 
     5.   Transfer Encodings and Label Types.......................30 
       5.1.  The EDNS/UTF-8 Label Type.............................31 
       5.2.  The STD13 Legacy Label Type...........................33 
     6.   Application Guidelines...................................36 
       6.1.  Input and Output Charsets.............................37 
       6.2.  Protocol and Application Data.........................38 
       6.3.  DNS Lookups and Resolver Calls........................40 
     7.   Resolver Guidelines......................................42 
       7.1.  Resolver APIs.........................................42 
       7.2.  Query Processing Services.............................44 
       7.3.  The Hosts Database....................................48 
     8.   Server Guidelines........................................49 
       8.1.  Internationalized Zones...............................50 
       8.2.  Namespace Visibility Restrictions.....................51 
       8.3.  The Master File Format................................52 
     9.   Caching Guidelines.......................................53 
     10.  Security Considerations..................................53 
     11.  IANA Considerations......................................54 
     12.  References...............................................54 
     13.  Acknowledgements.........................................55 
     14.  Editor's Address.........................................55 
      
   
  Hall                    I-D Expires: May 2002               [page 2] 
   
   
      
  2.      Definitions and Terminology 
      
     This document unites, enhances and clarifies several pre-existing 
     technologies. Readers are expected to be familiar with the 
     following specifications: 
      
          [AMC-ACE-Z] <draft-ietf-idn-amc-ace-z>, "AMC-ACE-Z version 
            0.3.1" 
      
          [NAMEPREP] <draft-ietf-idn-nameprep>, "Preparation of 
            Internationalized Host Names" 
      
          [STD13] (RFC 1034) "Domain names - concepts and facilities", 
            (RFC 1035) "Domain names - implementation and 
            specification" 
      
          [STD3] (RFC 1122) "Requirements for Internet Hosts -- 
             Communication Layers", (RFC 1123) "Requirements for Internet 
            Hosts -- Application and Support" 
      
          [BCP18] (RFC 2277) "IETF Policy on Character Sets and 
            Languages" 
      
          [RFC2279] "UTF-8, a transformation format of ISO 10646" 
      
          [RFC2671] "Extension Mechanisms for DNS (EDNS0)" 
      
      
     The following abbreviations are used throughout this document: 
      
          UCS (Universal Character Set)  The ISO/IEC 10646 character 
            set repertoire, as represented by the Unicode 3.1 
            specification. 
      
          ACE (ASCII-Compatible Encoding)  A transfer encoding which 
            encodes UCS character codes into a seven-bit codespace 
            which is compatible with US-ASCII. 
      
          UTF-8 (UCS Transformation Format, Eight-Bit)  A transfer 
            encoding which encodes UCS characters into an eight-bit 
            codespace which is compatible with DNS message formats. 
      
     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 
     NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" 
      in this document are to be interpreted as described in RFC 2119. 
      
      
  3.      Introduction 
      
     The domain name system (DNS) [STD13] currently defines a message, 
     namespace and protocol. Although the DNS message is capable of 
     transferring eight-bit character codes as protocol data, 
     applications are currently limited to a subset of US-ASCII when 
     they interact with the DNS namespace, and this restricted syntax 
     is enforced by almost every TCP/IP application and protocol which 
     utilizes domain names as embedded data (including, surprisingly, 
     the DNS protocol). 
      
     In order to allow for the use of a larger range of characters in 
     the namespace, this document extends and clarifies a variety of 
     Internet specifications so that characters from the Universal 
      Character Set (UCS) [ISO10646] may be used in domain names. This 
      document also extends the DNS message structure so that UTF-8 
      [RFC2279] encoded characters can be used to transfer these domain 
      names. In addition, it provides an ASCII-compatible encoding (ACE) 
      [AMC-ACE-Z] of these character codes, which existing protocols and 
      applications can use to access the internationalized domain 
      names, along with identification mechanisms which allow the 
      end-point systems to fall back to the ACE encoding when needed. 
      Finally, this document defines behavior for 
     DNS systems which implement this architecture, including the end-
     point applications which generate and store DNS domain names, and 
     the resolvers, caches and servers which process them. 
      
     The mechanisms presented here are elective. Developers, zone 
     administrators and network operators who wish to make use of the 
     internationalized domain names may do so according to their own 
     schedule. Those developers, administrators and operators who 
     cannot or prefer not to implement the specified extensions can 
     continue to use their legacy systems, and will still be able to 
     access resources from the internationalized domain name system. 
      
      
  3.1.    Background 
      
     From one perspective, DNS is already an "eight-bit clean" system, 
     in that the structured DNS message is capable of storing and 
     transmitting eight-bit data without any additional effort. 
     However, this perspective only considers one particular facet of 
      the domain name system, and ignores the more critical aspect of 
     the DNS namespace, which has rules that are entirely different 
     from those which govern the message format. 
      
     The DNS namespace (or more appropriately, the view of the 
     namespace which applications use and enforce) is governed by rules 
     set forth in RFC952 [RFC952], STD3 [STD3], and STD13, which 
     collectively define the characters that are eligible for use with 
     host names. These rules are meant to provide a common template 
     which may be applied to either the DNS namespace or a local hosts 
     database, such that a query for "host.example.com" can be 
     processed through either system. The range of valid characters 
      currently defined consists of the letters, digits and hyphen 
     from US-ASCII [ASCII] (additional rules also govern the valid 
     order and length of a host name). Character code values outside of 
     this range are valid in domain name messages, but are undefined 
     when used in the namespace, and are subject to interpretation by 
     the applications which generate them. 
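      As an illustration of the legacy host name rules described above, 
      the following Python sketch checks a name against them. It is a 
      minimal sketch, assuming the RFC 1123 relaxation of RFC952 (a 
      label may begin with a digit) and the STD13 limits of 63 octets 
      per label and 253 characters for the textual name; the function 
      and constant names are hypothetical. 

```python
import re

# One legacy label: ASCII letters, digits and hyphen, 1-63 characters,
# not beginning or ending with a hyphen (RFC 952 as relaxed by RFC 1123).
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

MAX_NAME_CHARS = 253  # textual form of the STD13 255-octet name limit


def is_legacy_host_name(name: str) -> bool:
    """Return True if every label of name obeys the legacy host name rules."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a fully qualified trailing dot
    if not name or len(name) > MAX_NAME_CHARS:
        return False
    # fullmatch() anchors the pattern at both ends of each label;
    # an empty label (as in "a..b") never matches.
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))


print(is_legacy_host_name("host.example.com"))  # True
print(is_legacy_host_name("bücher.example"))    # False
```

      The second name is rejected by the namespace rules even though a 
      DNS message could carry its octets without difficulty. 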
      
     The host name rules are enforced by almost every application and 
     protocol which uses DNS to identify a host or system. This 
     includes network utilities such as ping and traceroute which 
     simply identify systems by name, and complex protocols such as 
     SMTP which use domain names to determine message-routing paths. 
     Portions of the DNS protocol itself are also affected by these 
     restrictions, such as the domain names which may be used for NS 
     resource records with sub-domain delegation operations (since 
     these servers are connection targets, they are also required to be 
     compliant with the host name rules). 
      
     Because these domain names are so pervasive throughout the 
     Internet (and even within proprietary applications that run on 
     private networks), it is not possible to declare a "flag day" at 
     which eight-bit domain names will be considered valid encodings of 
     a particular character set. Instead, an extended namespace with a 
     larger set of charset rules must be defined, an extended DNS 
     protocol capable of supporting these domain names must be 
     deployed, and a transitional mechanism which allows the old and 
     new systems to interact must be established. This document 
     attempts to meet these objectives. 
      
      
  3.2.    Objectives 
      
     In broad terms, this document has one overall goal, which is to 
     facilitate the creation and use of an internationalized domain 
      name system around a UCS namespace, a collection of UTF-8 and 
     legacy-compatible encodings which are suitable for transferring 
     internationalized domain names within DNS and the affected 
     application data streams, and a negotiation mechanism which allows 
     end-point systems to identify the encoding that they will use for 
     a particular operation. 
      
     One of the objectives stated above is to internationalize the 
     existing DNS namespace, by allowing UCS characters to be used in 
     host names and sub-domain delegations in old and new zones 
     equally. As such, this document does not define a new namespace, 
     but instead defines mechanisms by which leaf-nodes and sub-domains 
     may be created within the existing hierarchy. 
      
     UTF-8 was chosen as the primary transfer encoding of these domain 
     names for several reasons. For one, there is a wide availability 
     of tools and expertise surrounding UTF-8, and it is already widely 
     deployed within development environments, operating systems and 
     applications. Furthermore, BCP18 [BCP18] requires that new 
     application protocols be able to use UTF-8 as application data, 
     and for many applications, this specifically means domain names 
     which are passed as data. All signs indicate that UTF-8 is 
     currently and will continue to be the preferred eight-bit encoding 
     on the Internet, and this specification embraces this position in 
     its design. 
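      To make the UTF-8 choice concrete, the sketch below encodes a 
      single label with Python's standard UTF-8 codec. Non-ASCII 
      characters become multi-octet sequences whose octets all have the 
      high bit set, which is why an eight-bit clean message is needed. 
      It assumes, for illustration only, that the STD13 limit of 63 
      octets per label also applies to UTF-8 labels (counted in octets, 
      not characters); NAMEPREP is not applied, and the function name is 
      hypothetical. 

```python
MAX_LABEL_OCTETS = 63  # assumption: STD13 per-label limit, counted in octets


def utf8_label(label: str) -> bytes:
    """Encode one label as UTF-8, enforcing a 1-63 octet length."""
    data = label.encode("utf-8")
    if not 1 <= len(data) <= MAX_LABEL_OCTETS:
        raise ValueError(f"label is {len(data)} octets; expected 1-63")
    return data


word = utf8_label("bücher")
print(len("bücher"), len(word))  # 6 7  (characters, octets)
print(word)                      # b'b\xc3\xbccher'
```

      Under this assumption, 21 CJK characters (three octets each) 
      already reach the 63-octet ceiling. 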
      
     However, most of the network services currently in use are bound 
     by the legacy host naming restrictions, and those applications and 
     protocols will also need to be able to interact with resources 
     from the internationalized namespace, even though they will not be 
     compliant with the UTF-8 encoding mechanisms defined in this 
     document. In order to allow these systems to participate, this 
     specification also embraces the use of ACE as a seven-bit 
     backwards-compatible encoding for legacy systems to use. 
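      The following sketch shows the seven-bit property of an ACE. It 
      substitutes Python's built-in "punycode" codec, which implements 
      RFC 3492 Punycode, the later standardized descendant of AMC-ACE-Z; 
      since this document cites the earlier version, treat the exact 
      output as illustrative. The "xn--" prefix that marks ACE labels is 
      likewise an assumption borrowed from later IDNA work, NAMEPREP is 
      not applied, and the function names are hypothetical. 

```python
ACE_PREFIX = "xn--"  # assumption: prefix from later IDNA work (RFC 3490)


def to_ace(label: str) -> str:
    """Return a seven-bit form of one label; pure-ASCII labels pass through."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Recover the UCS label from an ACE label carrying the assumed prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


ace = to_ace("bücher")
print(ace)            # xn--bcher-kva
print(from_ace(ace))  # bücher
```

      The resulting label uses only letters, digits and hyphens, so it 
      satisfies the legacy host name rules and can pass through 
      unmodified software. 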
      
     Note that even though a single encoding could have been specified 
     by this document, past and present requirements would not have 
     been satisfied by a single choice. For example, supporting UTF-8 
     alone would mean isolating legacy systems from resources in the 
     UCS namespace, while supporting ACE alone would not have provided 
      a truly internationalized namespace (the ACE encoded domain names 
      would still surface in user data quite frequently). By allowing the UTF-8 
     and ACE encodings to coexist, the existing and emerging 
     communities can both be served. 
      
     Because both encodings will be active during the same time period, 
      this document also defines DNS protocol extensions which allow the 
     end-point systems to detect the encoding that is in use for a 
     particular query/response pair. Note that these negotiation 
     mechanisms not only allow new and legacy systems to interoperate, 
     but they also provide a transition service for developers, zone 
     administrators and end-users, in that ACE encoded domain names can 
     be initially deployed within existing applications and DNS 
     systems, while individual elements of the infrastructure can be 
     upgraded without disturbing other components. 
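      The negotiation described above can be sketched as "try UTF-8, 
      then fall back to ACE". In this hypothetical sketch the two 
      transport functions are placeholders for real resolver calls, and 
      the failure signal (defined by the protocol extensions later in 
      this document) is modeled by a hypothetical exception. Punycode 
      with an assumed "xn--" prefix stands in for AMC-ACE-Z. 

```python
from typing import Callable, List, Tuple


class LabelTypeRejected(Exception):
    """Hypothetical signal that an element in the path lacks UTF-8 label support."""


def ace_name(name: str) -> str:
    """Convert each non-ASCII label to Punycode with an assumed "xn--" prefix."""
    return ".".join(
        label if label.isascii() else "xn--" + label.encode("punycode").decode("ascii")
        for label in name.split(".")
    )


def lookup(name: str,
           send_utf8: Callable[[str], List[str]],
           send_legacy: Callable[[str], List[str]]) -> Tuple[str, List[str]]:
    """Query in UTF-8 first; on rejection, repeat the query in ACE form.

    The encoding actually used is returned with the answer so the caller
    knows which form of the name the response refers to.
    """
    try:
        return "utf-8", send_utf8(name)
    except LabelTypeRejected:
        return "ace", send_legacy(ace_name(name))


def legacy_path(_name: str) -> List[str]:
    raise LabelTypeRejected()  # simulate a non-compliant element in the path


print(lookup("bücher.example", legacy_path, lambda n: [n]))
# ('ace', ['xn--bcher-kva.example'])
```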
      
      
  3.3.    Common Usage Scenarios 
      
      Discussion of the mechanisms provided by this document depends upon 
     the usage context of the domain names themselves. Domain names are 
     extremely pervasive, and are used by almost every TCP/IP protocol 
     and application in one form or another. However, most usages fall 
     under one or more of the following scenarios: 
      
        *   Connection identifiers  Domain names are most commonly 
            used as host-specific identifiers for outbound connection 
            requests, whether this be for a command-line application 
            such as ping, or as a host name which is stored in an 
            application's configuration file. Another common usage 
            scenario for connection identifiers is with reverse 
            lookups, where a server is logging incoming connections by 
            the corresponding domain name, or where a program such as 
            netstat is displaying all of the application sessions which 
            are currently active on a host. In both of these cases, 
            domain names are passed through applications to a resolver, 
            resulting in DNS queries and responses which eventually 
            provide the requested DNS data. 
      
            A related use (but one which does not generate DNS 
            messages) is determining the host name of the local system. 
            This is commonly found with applications and protocols that 
            need to display the domain name of the local system as part 
            of a protocol operation (such as an SMTP greeting banner) 
            or as application data. 
      
            Connection identifiers (and lookups in general) are 
            probably the largest single use of domain names today, and 
            this is likely to be the case with internationalized domain 
            names as well. This document fully supports the use of 
            internationalized domain names for lookup operations, as 
            long as the calling application, the stub resolver, the 
             local caching servers, and the authoritative servers for 
            the specified domain name are compliant with this 
            specification. If any of these components are not capable 
            of supporting internationalized domain names in this 
            manner, the ACE equivalent domain name will be negotiated 
            for the operation at hand. 
      
        *   Protocol data  Some application protocols exchange domain 
            names as protocol data, with those domain names either 
            determining or altering a service-specific operation. 
            Examples of this usage include SMTP envelopes ("RCPT TO 
            <user@domain.dom>") where the domain name is used to 
            determine whether or not a particular email message should 
            be accepted for delivery, the HTTP HOST header field which 
            identifies a specific document tree on a shared server, 
            BOOTP/DHCP options, WHOIS input, and more. 
      
            Because these protocols treat domain names as protocol 
            data, most of these protocols also have specific formatting 
            requirements which must be addressed before UTF-8 domain 
            names can be used by these protocols directly. This 
            document is intended to facilitate the use of UTF-8 encoded 
            domain names in this manner, although it is expected that 
            most of the protocol development groups will need to 
            develop negotiation mechanisms before these protocols can 
            use internationalized domain names directly. Until such 
            work is completed, ACE equivalent domain names can be used 
            to provide these protocols with access to the 
            internationalized namespace. 
      
        *   Structured application data  Structured application data 
            is similar to protocol data in that it can trigger or 
            affect some protocol action, although this will not always 
            occur. For example, a web browser can process an embedded 
            IMG link which may be present in a web page, while a user 
            can manually follow an embedded email link which is also 
            stored in the same web page; even though both usage models 
            share the same structured data format (URLs), they are 
            processed differently by the application. Similarly, email 
            messages typically contain multiple domain names as 
            structured data in the message headers, and some of these 
            domain names will directly affect subsequent protocol 
            operations, while others will not. 
      
            Because of this ambiguity, this document defines no 
            specific treatment for structured application data. In some 
             cases, no additional mechanisms will be required, while 
            other scenarios will require negotiation mechanisms before 
            an internationalized domain name can be used in the 
            structured data (with ACE being required as the interim 
            format). Each protocol development group is encouraged to 
            analyze each usage independently, to classify the usage as 
            a connection identifier, protocol data, or unstructured 
            application data, and to determine the appropriate course 
            of action for each usage accordingly. 
      
        *   Unstructured application data  Many application protocols 
            provide free-text data which can contain domain names, but 
            with those domain names existing as unstructured data. For 
            example, an email message which is provided as a text/plain 
            MIME body part may contain a domain name which identifies a 
            system or service in the context of a specific application, 
            but in an unstructured form ("your files were moved from 
            server1 to server2"). Similarly, an email address may be 
            provided in WHOIS output, but as unstructured data which 
            does not affect the protocol. 
      
            Given the application-specific nature of this data, it 
            cannot be managed by any global protocol or process. Where 
            a protocol has rules or restrictions on the data itself, 
            then those rules are maintained, but some formatting rules 
            may need to be extended before internationalized domain 
            names (or their equivalents) can be encoded in the 
            application data. For example, internationalized domain 
            names in email messages may need to be converted to a 
            preferred display charset, while ACE equivalents may be 
            necessary for protocols which only support US-ASCII. 
      
      Each of the above scenarios represents a distinct handling case 
     where internationalized domain names may or may not be used 
     directly. In some cases, the internationalized domain names may be 
     used as soon as the applications and resolvers are configured to 
     use them, while in other cases, measured and cautious deployment 
     is required in order to prevent undue breakage. In the latter 
     cases, however, the backwards-compatible ACE encoding is available 
     so that the internationalized domain names can be used. 
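      The handling rules above can be summarized as a small decision 
      function. This is a hypothetical sketch: the enumeration, function 
      and flag names are illustrative, structured application data is 
      assumed to have been classified into one of the other scenarios 
      first, and ACE is the fallback wherever native support has not 
      been established. 

```python
from enum import Enum, auto


class Usage(Enum):
    """Usage scenarios; structured application data is classified first."""
    CONNECTION_IDENTIFIER = auto()
    PROTOCOL_DATA = auto()
    UNSTRUCTURED_DATA = auto()


def choose_form(usage: Usage, *, path_compliant: bool = False,
                channel_accepts_utf8: bool = False) -> str:
    """Return "native" or "ace" for a domain name used in the given way.

    path_compliant: the application, stub resolver, local caches and the
        authoritative servers all support internationalized domain names.
    channel_accepts_utf8: the protocol or data format has negotiated, or
        otherwise permits, non-ASCII domain names.
    """
    if usage is Usage.CONNECTION_IDENTIFIER:
        return "native" if path_compliant else "ace"
    # Protocol data and unstructured data follow the carrying protocol's rules.
    return "native" if channel_accepts_utf8 else "ace"


print(choose_form(Usage.CONNECTION_IDENTIFIER, path_compliant=True))  # native
print(choose_form(Usage.PROTOCOL_DATA))                               # ace
```

      For connection identifiers the DNS components perform this 
      negotiation automatically; the flag only makes the outcome 
      explicit. 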
      
      
  3.4.    User Audiences 
      
     Another perspective on the changes which will result from 
     deploying the mechanisms described in this document can be seen by 
      analyzing how any such changes will affect the different 
     "audiences" who work with domain names, and who have their own 
     unique context-specific usage requirements and objectives. The 
     three main audiences discussed in this document are: 
      
        *   Developers. Protocol and application developers need to be 
            able to incorporate internationalized domain names into 
            their systems as easily as possible, although there are 
            many factors which will affect such usage, including the 
            input and output charsets and encodings which are available 
            to the applications and protocols. Where feasible, this 
            specification allows developers to choose any charset or 
            encoding which may be required and suitable for use, 
            although in most cases, a recommendation is also made for 
            the use of UTF-8 in particular. 
      
            Developers may adopt internationalized domain names for 
            connection identifiers and lookup operations fairly 
             quickly, such that users can use those systems as soon as 
            they have compliant systems (and they have a target domain 
            name to communicate with). Implementing support for 
            internationalized domain names in protocols and application 
            data will require additional effort by the affected 
            development groups. 
      
            Support for ACE will be harder to implement, since it is a 
            relatively new and untested encoding syntax, with no 
            existing developer tools. This will likely be the largest 
            hurdle to overcome when developing applications for use 
            with this service. 
      
        *   Zone administrators. Organizations that wish to deploy 
            internationalized domain names should be able to do so 
            easily, at a reasonable cost, and without suffering 
            excessive pre-conditions. Towards this objective, the 
            mechanisms described by this document allow organizations 
            to deploy and use internationalized domain names within any 
            zone immediately, without requiring any other zone to have 
            been updated beforehand (although there are specific and 
            strong suggestions for upgrading the Internet's high-load 
            servers as soon as possible). 
      
            If an organization wishes to publish internationalized 
            domain names for users to access and utilize, the 
            authoritative servers for the affected zone must be 
            compliant with the naming rules and message formats 
             described by this document, which will almost certainly 
            require the administrators of that zone to upgrade their 
            servers. However, organizations may also choose to only 
            deploy ACE encoded domain names if an immediate migration 
            is not feasible, with the caveat that internationalized 
            domain names in their native form will not be available 
            from those zones. 
      
        *   Network operators. The systems and human users which 
            generate DNS lookups are another area of concern, as these 
            protocols, programs and users will expect these lookups to 
            succeed, and will also expect that the visible namespace 
            will be compatible with the capabilities of the requesting 
            system at a minimum investment. This is a broad range of 
            requirements. 
      
            At a minimum, applications must be capable of generating 
            and accepting the internationalized domain names if they 
            are to use those domain names (see the "Developers" 
            discussion above for the application requirements). 
            Similarly, the local resolvers, caches and forwarders on 
            the user's network must also support the message formats if 
            they are to relay internationalized domain names between 
            their local applications and the remote zones being 
            queried. If the applications, resolvers and caches do not 
            support these requirements, intermediary systems will 
            perform the down-level negotiation automatically on their 
            behalf such that additional effort is not required on the 
            user's part. 
      
     In summary, the developers, zone administrators and end-users can 
     immediately participate in the internationalized namespace at no 
     additional expense if they are content with using ACE encoded 
     domain names, and can use internationalized domain names in their 
     native form if they are willing to make the necessary investments. 
     Furthermore, since the native and backwards-compatible encodings 
     are not mutually exclusive, implementers of this specification 
     have the option of adopting ACE for immediate use and then 
     transitioning to internationalized domain names on a per-system, 
     per-zone, or per-application basis, according to their schedule. 
      
      
  3.5.    Service Overview 
      
     This document specifies a variety of extensions to several 
     different protocols and services in order to facilitate the use of 
      internationalized domain names anywhere this support exists or can 
     be implemented, and to provide a legacy-compatible domain name in 
     all other situations. 
      
     More specifically, this document defines or clarifies behavior for 
     the following elements: 
      
        *   Host name character restrictions. Legacy protocols and 
            applications are currently restricted to the legacy host 
            naming rules, which only allow for a subset of US-ASCII 
            characters (letters, digits and the hyphen character). This 
            document redefines the characters which are valid within a 
            host name so that system identifiers, domain name parts of 
            host names, and new network services can use most of the 
            characters from the UCS. 
      
        *   DNS message format. This document defines an extended label 
            format based on the extended label services provided by 
            RFC2671 (Extension Mechanisms for DNS - EDNS0) [RFC2671], 
            with this label format being used to encapsulate UTF-8 
            encoded internationalized domain names in DNS messages. Any 
            DNS message which carries the UTF-8 encoded domain names is 
            required to use the EDNS/UTF-8 label type defined in this 
            document. Any DNS message which carries legacy domain names 
            (including the ACE encoded equivalent domain names) is 
            required to use the traditional message format. 
      
         *   Application handling rules. Applications can use 
             internationalized domain names immediately for lookup 
             operations that do not directly affect external services or 
             protocols, can use ACE encoding sequences to specify 
             internationalized domain names in legacy protocol 
             operations, and can use both forms at the same time. 
      
        *   Stub resolvers. Stub resolvers will most likely need to 
            provide a series of internationalized APIs in order to 
            fully support applications that generate internationalized 
            domain name lookups. For example, these APIs will almost 
            certainly be required in order for the resolver to 
            determine that the calling application is compliant with 
            the host name requirements defined by this document, and 
            that the domain names should be encoded in the proper label 
            format. Although this specification does not dictate these 
            APIs, it encourages their use, and provides some guidance 
            on the issues surrounding their use. 

        *   Forwarders, resolving servers and caches. The user-side 
            servers which process internationalized domain names have 
            several protocol-specific requirements, including the 
            negotiated fall-back service when UTF-8 queries fail. 
      
        *   Authoritative servers. A key part of this specification is 
            the simultaneous support for internationalized and legacy 
            compatible domain names in the UCS namespace, thereby 
            allowing a domain name to be entered into an authoritative 
            zone database once, and for the appropriate response to be 
            generated by a server according to the label encoding from 
            the associated query. In order for this to work, this 
            specification requires authoritative servers which serve 
            internationalized domain names to comply with specific 
            conditions. This specification also allows existing servers 
            to serve ACE equivalent domain names when the authoritative 
            servers cannot be upgraded, although this typically results 
            in lower levels of functionality. 
      
     The elements listed above collectively define a completely 
     internationalized domain name system, which is capable of 
     servicing internationalized domain names in all compliant systems, 
     and which is also capable of providing ACE encoded equivalent 
     domain names when any component from the internationalized service 
     is not available. 
      
      
  3.6.    Process Example 
      
     This section illustrates a series of query/response transactions 
     under which the processes and protocols defined in this document 
     function. This example uses a reverse lookup for the PTR resource 
     record associated with the "14.2.0.192.in-addr.arpa." domain name 
     (forward lookups work similarly, but the issues are more fully 
     demonstrated by PTR lookups). Each of the technologies shown 
     below is described in a later section of this document. The 
     sole purpose of this example is to provide an illustration of 
     these mechanisms in order to facilitate better discussion. 
      
     Note that this illustration represents a worst-case scenario 
     (thereby exercising most of the functionality provided by this 
     specification), and does not represent a typical scenario. 
      
        a.  First, a PTR resource record for 14.2.0.192.in-addr.arpa. 
            is added to the internationalized zone database on the 
            replication master server for the 2.0.192.in-addr.arpa. 
            zone, with the resource record data value of 
            "host.<idn>.example.com." (where <idn> is an 
            internationalized domain name compliant with the host 
            naming rules provided in this document). Both of these 
            domain names have a primary representation consisting of 
            UCS characters in some local encoding, but are also 
            available as UTF-8 and ACE encoded data so they can be 
            encapsulated within DNS queries and responses. 
      
            Once the zone is reloaded and is replicated by the other 
            authoritative servers for that zone, the domain names can 
            be processed. 
      
        b.  An application on a remote system generates a DNS lookup 
            for the PTR resource record associated with the 
            14.2.0.192.in-addr.arpa. domain name. 
      
            If this is a legacy application, it issues the lookup using 
            the only method it knows, which is to pass the domain name 
            to the legacy resolver API. This would result in the 
            resolver issuing a legacy DNS query for the PTR resource 
            record associated with the specified domain name. 
      
            If this application is compliant with this specification, 
            it performs the following steps: 
      
            1.   Verify that the resolver is capable of processing 
                 queries for UTF-8 domain names by probing for an 
                 internationalized API. If this step fails, then the 
                 domain name is converted to the legacy STD13 octet 
                 encoding in step 3.6.b.3 and passed to the resolver's 
                 legacy API. 
      
            2.   Convert the domain name from its generated encoding to 
                 the canonical UCS characters, and then normalize and 
                 case-convert the UCS characters. 
      
            3.   Convert the normalized and lowercased UCS characters 
                 to the charset or encoding used by the resolver's 
                 internationalized API. 
      
            4.   Issue a lookup for the PTR resource record associated 
                 with the internationalized domain name, via the 
                 resolver's internationalized API. 

                 Note that even though the domain name is compatible 
                 with the legacy host name rules, the domain name is 
                 passed through the internationalized API so that 
                 servers can tell whether or not the original 
                 application is UTF-8 compliant, and can determine the 
                 format of any internationalized domain names which are 
                 to be returned in the response messages. This is 
                 required in case the queried resource record includes 
                 internationalized domain names as resource record data 
                 (as would be the case with PTR resource records), and 
                 is also required for the proper handling of any SOA or 
                 NS resource records which may be returned as 
                 additional data in the response. 
      
            For the purpose of this example, we will assume that each 
            of these steps was successfully performed. 
      
        c.  The client's stub resolver generates the query, with the 
            Question Section of the query containing the UTF-8 encoded 
            domain name encapsulated in an EDNS/UTF-8 extended label. 
      
        d.  The stub resolver sends the query to one of its configured 
            resolving servers. 
      
        e.  The resolving server will either answer the query from its 
            cache or forward the query to a name server which is 
            authoritative for the namespace hierarchy, as per the 
            normal query-resolution procedure. For the purpose of this 
            example, we will assume that the server has no information 
            about the specified domain name, so it forwards the query 
            to one of the root zone's authoritative servers in order to 
            begin the iterative resolution process. 
      
        f.  The queried server responds with a referral, providing 
            delegation data for a zone in the path to the queried 
            domain name. For the purposes of this example, we will use 
            192.in-addr.arpa. as the delegation domain specified in the 
            referral message. 
      
            The specific format of the referral will depend on whether 
            or not the queried server understands the EDNS/UTF-8 label 
            encoding. If the server is compliant with this 
            specification (which it is, or else it wouldn't have 
            answered with a referral), then the referral will also 
            provide EDNS/UTF-8 encoded domain names in the Authority 
            and Additional-Data Sections of the referral. If the server 
            were not compliant with this specification, it would return 
            an error upon seeing the extended label type, which would 
            cause the resolving server to restart the query using the 
            legacy label type. 
      
        g.  The resolving server decodes the UTF-8 encoded domain names 
            to their UCS character representation, caches the resource 
            records in their UCS form, and sends the query to one of 
            the authoritative servers for the referral zone. Note that 
            the cache does not normalize or case-convert the UCS 
            characters; only the end-systems perform this work. 
      
        h.  In this case, the queried server does not understand the 
            EDNS/UTF-8 label format, and returns a FORMERR 
            response code. 
      
        i.  When these errors are encountered, the current resolver 
            (whether this is the client's stub resolver or a caching 
            server in the query path) must convert the query domain 
            name from its current form to a legacy-compatible encoding 
            (either ACE or STD13 octet sequences, depending on the UCS 
            characters which have been encoded), and then has to 
            reissue the query in that format. 
      
            In this case, the domain name only contains printable 
            characters from US-ASCII, so the STD13 octet encoding is 
            used for the fall-back query. Because the UCS domain name 
            was normalized and lowercased before it was passed to the 
            client's stub resolver, the legacy domain name will also be 
            in this format (although it will be compared in a case-
            neutral form by the recipient server). 
      
            Note that once this conversion takes place, the legacy 
            label format is used for the remainder of the current query 
            chain (this prevents excessive delays from multiple fall-
            back operations, which could result in timeouts at the 
            original resolver or application). 

        j.  The queried server returns a delegation referral for the 
            2.0.192.in-addr.arpa. zone. Since the query arrived in the 
            STD13 octet encoding, the server has no indicator of the 
            client's capabilities, so the referral NS resource records 
            will also be returned in legacy compatible form (either as 
            STD13 octet sequences or as ACE encoded data, depending on 
            the character codes provided in each label from each of the 
            associated domain names). 
      
            Note that even though these NS resource records will be 
            restricted to legacy-compatible host names and label types, 
            they may contain and reference ACE domain names. In this 
            regard, a legacy server in the delegation path does not 
            prevent internationalized domain names from being delegated 
            or resolved, but only prevents them from being processed as 
            EDNS/UTF-8 extended labels. 
      
            Also note that once the authoritative servers for a zone 
            have been discovered and cached, any subsequent UTF-8 
            queries which are generated for the resources in that zone 
            will be sent directly to one of those servers, bypassing 
            the delegation hierarchy. As such, subsequent queries which 
            are provided in EDNS/UTF-8 labels can be processed directly 
            by the zone's authoritative servers, without the delegation 
            servers disrupting the process. 
      
        k.  The resolving server decodes the STD13 octet sequences and 
            ACE encoded domain names to their UCS character 
            representations, caches the resource records, and resends 
            the query to one of the authoritative servers for the 
            referral zone. 
      
        l.  The queried server processes the request. Since this query 
            arrived as an STD13 octet sequence, the server must compare 
            the seven-bit characters from the domain name (which is all 
            of them, in this example) in a case-neutral form. Note that 
            if the query had arrived as ACE or UTF-8 encoded domain 
            names, the server would have decoded the specified domain 
            name to its canonical UCS characters and performed a case-
            exact match against the resulting characters. 
      
        m.  The queried server responds with the requested data. Note 
            that the query was submitted in the legacy label form due 
            to the fall-back processing which occurred in step 3.6.i, 
            so the server will only respond to this query with STD13 
            octet sequences or ACE encoded domain names, using the 
            STD13 legacy label. 
      
        n.  The resolving server decodes the STD13 octet sequences and 
            ACE encoded domain names to their UCS character 
            representations, and caches the resource records. Since the 
            query was originally received as an internationalized 
            domain name (as indicated by the EDNS/UTF-8 extended label 
            from the original query), the resolving server has to 
            encode the answer data as UTF-8 before passing it back to 
            the client's stub resolver. However, since the input was 
            not provided in an encoded UCS form, the server has to 
            normalize and case-convert the STD13 octet sequence in 
            order to provide a valid internationalized domain name. 
      
        o.  The stub resolver decodes the UTF-8 encoded domain names 
            which have been provided in the response message to their 
            UCS character representation, and passes the data to the 
            original calling application using the charset or encoding 
            favored by the resolver. 
      
        p.  The application validates the received domain name by 
            decoding the internationalized domain name to its canonical 
            UCS characters, normalizing and down-casing the resulting 
            domain name, and comparing the results with the answer data 
            which was provided by the resolver. 
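     The client-side preparation in step 3.6.b and the one-time 
     fall-back in step 3.6.i can be sketched in Python as a small 
     simulation. This is a minimal sketch under stated assumptions, not 
     a resolver implementation: NFKC followed by str.lower() stands in 
     for the unspecified normalization and case conversion, Python's 
     "punycode" codec with a hypothetical "zq--" prefix stands in for 
     <ACE-Z>, and the server callables and the FormErr exception are 
     hypothetical stand-ins for real DNS transactions.

```python
import unicodedata

ACE_PREFIX = "zq--"  # hypothetical placeholder; <ACE-Z> defines the real prefix


class FormErr(Exception):
    """Hypothetical stand-in for a FORMERR response code."""


def prepare_name(name: str) -> str:
    # Step 3.6.b.2 (assumption): NFKC normalization, then lowercasing.
    return unicodedata.normalize("NFKC", name).lower()


def to_legacy(name: str) -> str:
    # Step 3.6.i: seven-bit labels become STD13 octet sequences and all
    # other labels become ACE (the punycode codec stands in for <ACE-Z>).
    labels = []
    for label in name.rstrip(".").split("."):
        if all(ord(c) < 0x80 for c in label):
            labels.append(label)
        else:
            labels.append(ACE_PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(labels) + "."


def resolve(name, servers):
    """Query each server in a delegation chain until one returns an answer.

    Each server is a hypothetical callable(qname, label_type) that returns
    answer data, returns None for a referral, or raises FormErr.
    """
    qname, label_type = prepare_name(name), "EDNS/UTF-8"
    for server in servers:
        try:
            answer = server(qname, label_type)
        except FormErr:
            # Fall back once; the legacy label type is then kept for the
            # remainder of this query chain.
            qname, label_type = to_legacy(qname), "STD13"
            answer = server(qname, label_type)
        if answer is not None:
            return answer, label_type
    return None, label_type


# Hypothetical servers modeled on steps 3.6.e through 3.6.m.
def root_server(qname, label_type):
    return None  # compliant server: referral for 192.in-addr.arpa.


def legacy_server(qname, label_type):
    if label_type != "STD13":
        raise FormErr()  # does not understand the EDNS/UTF-8 label
    return None  # referral for 2.0.192.in-addr.arpa.


def zone_server(qname, label_type):
    return "host.<idn>.example.com."


print(resolve("14.2.0.192.IN-ADDR.ARPA.", [root_server, legacy_server, zone_server]))
# prints ('host.<idn>.example.com.', 'STD13')
```

     Once label_type switches to "STD13" it is never switched back, so 
     a chain containing several legacy servers costs at most one extra 
     query/response pair, which mirrors the rationale given in step 
     3.6.i.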
      
     As can be seen, the UTF-8 name resolution process is identical to 
     the current resolution process, with the addition of a single 
     fall-back query in step 3.6.i which resulted in one extra 
     query/response pair (roughly equivalent to adding one extra 
     delegation referral into the query path), and with several 
     different encoding conversions, as required by the participating 
     systems and services. This example also illustrates the 
     requirements which are placed on developers, zone administrators, 
     and network operators in order for typical connection identifier 
     services to function with UTF-8 domain names. 
      
     However, if each system and service had used UTF-8 for encoding 
     purposes (including everything between the stub resolver's APIs 
     and the authoritative servers for the target zone), then no 
     additional queries or conversions would have been required (other 
     than the direct UCS conversions required for validation and 
     caching, the latter of which can be performed separately without 
     affecting the processing path). In this regard, the example above 
     illustrates how this system can function even when only a portion 
     of the participating systems utilize UTF-8, and also illustrates 
     how effective the entire operation would be if all of the 
     recommendations and requirements provided in this specification 
     were adopted. 
      
     It is also important to reiterate here that any such costs 
     associated with this compliance are entirely elective for the 
     affected parties. If they want to streamline the process, the 
     option is available to them, although the system also works when 
     very few optimizations are implemented. 
      
      
  4.      The Internationalized Namespace 
      
     In simple terms, this specification defines an internationalized 
     namespace which consists of domain names and labels that contain 
     UCS character codes, and also specifies a series of encoding 
     formats which may be used whenever the UCS values need to be 
     encapsulated for transmission within DNS messages or application 
     data streams. 
      
     In this regard, the internationalized namespace is the UCS 
     representation of the domain names and labels as they are used for 
     comparison operations once a domain name arrives for processing, 
     while the transfer encodings ensure that a domain name arrives at 
     the destination system intact, so that it may be processed in its 
     canonical form. 
      
     There are four conceptual elements to this model: 
      
        *   Character codes. Labels from internationalized domain names 
            have a single logical canonical representation as sequences 
            of UCS code point values. The UCS characters are used when 
            a particular label from a domain name is created by an 
            application, stored in a zone, hosts or cache database, and 
            is used whenever two sets of domain names or labels need to 
            be compared. However, different kinds of domain names have 
            different rules which govern the character codes that may 
            be used. 
      
        *   Storage encodings. Whenever a domain name is created or 
            copied from the network, it must be stored in a format that 
            is reversible to the canonical UCS character representation 
            of that domain name. This specification does not mandate or 
            require any particular storage encoding, and allows this 
            decision to be made on a per-implementation basis, as long 
            as the storage encoding supports character codes which can 
            be converted to UCS equivalent values for comparison 
            purposes. However, the use of UTF-8 for this purpose is 
            encouraged, since it is the most common UCS encoding. 
      
        *   Transfer encodings. Whenever a domain name needs to be sent 
            over the network, it must be packaged in a form which is 
            compliant with the capabilities of the transfer protocol in 
            use. This document specifies three transfer encodings which 
            may be used to encode canonical UCS character codes in DNS 
            messages or application streams, which are: the octet 
            encoding from STD13, the ACE encoding from <ACE-Z>, and the 
            UTF-8 encoding from RFC2279. Each encoding has different 
            costs and benefits in different usage scenarios. 
      
        *   Comparison operations. When two domain names need to be 
            compared, they also follow rules which are appropriate to 
            the type of domain name being provided, and the transfer 
            encoding which may have been used to provide the domain 
            name to the system. 
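     The two comparison rules used in this document (case-neutral 
     matching of seven-bit characters for STD13 octet sequences, as in 
     step 3.6.l, and case-exact matching of canonical UCS characters for 
     names decoded from ACE or UTF-8) might be sketched as follows. The 
     function names and encoding tags are hypothetical, and the sketch 
     assumes both names arrived with the same transfer encoding.

```python
def fold_ascii(name: str) -> str:
    # Lowercase only A-Z; characters above 0x7F are left untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def names_match(a: str, b: str, transfer_encoding: str) -> bool:
    """Compare two domain names received with the same transfer encoding.

    transfer_encoding is a hypothetical tag: "STD13", "ACE" or "UTF-8".
    ACE and UTF-8 names are assumed to be decoded to UCS characters already.
    """
    if transfer_encoding == "STD13":
        return fold_ascii(a) == fold_ascii(b)  # case-neutral, seven-bit only
    return a == b  # case-exact comparison of canonical UCS characters


print(names_match("Example.COM.", "example.com.", "STD13"))  # True
print(names_match("Example.COM.", "example.com.", "UTF-8"))  # False
```

     str.lower() is deliberately avoided in the STD13 branch because it 
     would also fold eight-bit characters such as U+00C9, which a 
     seven-bit case-neutral comparison leaves alone.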
      
     This document defines four distinct types of internationalized 
     domain names which may exist in the internationalized namespace, 
     and also describes how each of the above considerations affects 
     those domain names and their labels. These domain name types are 
     described throughout the remainder of this section. 
      
      
  4.1.    Internationalized Domain Names and Labels 
      
     This section describes the master template rules for all domain 
     names and labels which may be used in the internationalized 
     namespace, although subordinate rules and restrictions are also 
     applied as secondary filters, depending on the intended usage of 
     the domain name. 
      
     For example, domain names and labels which are to be used as 
     internationalized host identifiers (either as host names, or as 
     domain names which are used to specify a host) are restricted to a 
     specific subset of UCS characters. Meanwhile, domain names and 
     labels which are compliant with STD13's global rules are 
     restricted to eight-bit code values, while the domain names and 
     labels which are used as STD13 host identifiers are restricted to 
     a specific subset of US-ASCII. 

     The following diagram illustrates how the subordinate rules are 
     applied and interpreted against the master restrictions: 
      
                      +-----------------------+ 
                      | Internationalized DNs | 
                      +-----------------------+ 
                       any UCS character codes 
                          /       | 
                         /        | 
                        /         | 
                       /          | 
          +-----------+     +-----------+     +------------+ 
          | Int. Host |     | STD13 DNs +-----+ STD13 Host | 
          +-----------+     +-----------+     +------------+ 
          normalized        character         ASCII letters, 
          subset of         codes 0x00        numbers, and 
          UCS chars         through 0xFF      hyphen char 
      
     As can be seen, the internationalized domain name and label rules 
     allow any UCS character code to be stored, although each 
     particular usage of the domain names and labels has its own 
     secondary rules and restrictions. 
      
     In order to allow future documents to define additional rules as 
     required for their usage, this document defines very few global 
     rules on the core internationalized domain names and labels. 
      
      
  4.1.1.  IDN syntax and structure 
      
     In this specification, an internationalized domain name consists 
     of a variable number of labels, each of which contains a variable 
     number of UCS character codes, not all of which will have defined 
     UCS character interpretations. 
      
     Furthermore, the encoding system which is used to store and 
     interpret those values on a system is not relevant to this 
     specification, and is therefore not defined. The characters in a 
     label can be stored in memory or on disk as UTF-8, UCS-4, ACE, or 
     any other storage encoding which is desired by the operators and 
     implementers of the affected system, as long as that encoding 
     system is reversible to the canonical UCS character code values, 
     and is able to represent the necessary range of UCS characters 
     (the "necessary range" varies by operation). 
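     As a minimal illustration of the reversibility requirement, the 
     sketch below stores the same label in three different encodings and 
     confirms that each one decodes back to identical UCS code points. 
     It uses Python's built-in "utf-8", "utf-32-be" (standing in for 
     UCS-4) and "punycode" codecs; the punycode codec is only a stand-in 
     for ACE, and no ACE prefix is applied.

```python
def to_code_points(stored: bytes, storage_encoding: str) -> list:
    """Recover the canonical UCS code points from a stored label."""
    return [ord(c) for c in stored.decode(storage_encoding)]


label = "b\u00fccher"
for encoding in ("utf-8", "utf-32-be", "punycode"):
    stored = label.encode(encoding)
    # Every storage form must be reversible to the same canonical values.
    assert to_code_points(stored, encoding) == [ord(c) for c in label]
    print(encoding, stored, to_code_points(stored, encoding))
```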
      
   
     The only universal restrictions which apply to internationalized 
     domain names and labels are those which govern length. This 
     specification requires that labels from internationalized domain 
     names MUST be restricted to a minimum length of one character and 
     a maximum length of 63 characters, inclusive. The exception to 
     this rule is the root domain, which is always represented by a 
     zero-length label. Note that this rule specifically refers to the 
     canonical UCS characters, rather than any encoded form (encoding 
     overhead often means that fewer canonical characters will fit 
     within a label or domain name once it has been encoded). 
      
     A fully-qualified internationalized domain name is formed by 
     joining a series of labels together, with the most-contextually 
     specific label in the left-most position of the label sequence, 
     and with the root domain occupying the right-most position. The 
     sum total of all labels in an internationalized domain name MUST 
     NOT exceed 255 characters, inclusive. Any number of labels MAY be 
     stored in the domain name, but the sum total of their lengths MUST 
     NOT exceed this limit. 
      
     However, labels which contain UCS character codes greater than 
     U+007F will result in multi-byte UTF-8 and ACE encodings, so the 
     maximum length of a label or an internationalized domain name is 
     governed by their UTF-8 and ACE encoded lengths. Both encodings 
     MUST result in an encoded length of 63 octets or less in order to 
     be usable, with a maximum cumulative length of 255 octets. 
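     A validator for the maximum-length rules above could measure each 
     name both in canonical UCS characters and in encoded octets, as in 
     the sketch below. The helper names are hypothetical, Python's 
     "punycode" codec with a placeholder "zq--" prefix stands in for 
     <ACE-Z>, and the totals are taken as the sum of the label lengths 
     exactly as the text states (the per-label length octets of the 
     wire format are not counted).

```python
ACE_PREFIX = "zq--"  # hypothetical placeholder for the <ACE-Z> prefix


def ace_encode(label: str) -> str:
    # Seven-bit labels are never ACE encoded, so they pass through unchanged.
    if all(ord(c) < 0x80 for c in label):
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def length_problems(name: str) -> list:
    """Return the maximum-length rules a name violates (empty if none)."""
    labels = name.rstrip(".").split(".")
    problems = []
    if max(len(label) for label in labels) > 63:
        problems.append("label exceeds 63 characters")
    if sum(len(label) for label in labels) > 255:
        problems.append("name exceeds 255 characters")
    encoders = {
        "UTF-8": lambda label: label.encode("utf-8"),
        "ACE": lambda label: ace_encode(label).encode("ascii"),
    }
    for encoding, encode in encoders.items():
        sizes = [len(encode(label)) for label in labels]
        if max(sizes) > 63:
            problems.append(encoding + ": label exceeds 63 octets")
        if sum(sizes) > 255:
            problems.append(encoding + ": name exceeds 255 octets")
    return problems


print(length_problems("example.com."))  # []
print(length_problems("\u00fc" * 40 + ".example."))
```

     The second call shows a label of 40 canonical characters that 
     satisfies the 63-character rule but still fails, because its UTF-8 
     form needs 80 octets.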
      
      
  4.1.2.  IDN transfer encodings 
      
     The UCS currently occupies a 21-bit range of character code 
     values, containing tens of thousands of assigned characters, and 
     hundreds of thousands of unassigned characters. Due to the multi-
     byte nature of the code point values, UCS characters cannot be 
     passed as protocol or application data in most of the existing 
     Internet protocols (including DNS messages), at least not without 
     the help of some kind of encoding scheme. At the very least, the 
     UCS character values have to be encoded as eight-bit sequences if 
     they are to fit within existing eight-bit data structures, and 
     have to be encoded as a subset of US-ASCII characters if they are 
     to be usable with legacy protocols and applications which only use 
     STD13's host identifier rules for their structured domain name 
     data types. 
      
     With this objective in mind, this document defines three different 
     transfer encoding systems which can be used to convert 
     internationalized domain names and labels into a form which is 
     suitable for transfer in different data streams. These are the 
     legacy STD13 octet encoding, ACE, and UTF-8. Each of these 
     encoding schemes provides different benefits and capabilities to 
     the internationalized DNS effort. 
      
        *   STD13 octets. The STD13 octet encoding scheme provides a 
            direct one-to-one mapping between eight-bit characters and 
            their eight-bit values, but it is only capable of storing 
            character codes in the range of U+0000 through U+00FF, 
            which severely restricts its usefulness. 
      
        *   ACE. The ACE encoding scheme is capable of storing UCS 
            character code values as seven-bit sequences in STD13 legacy 
            labels. While this makes it practically compatible with the 
            legacy host identifier rules, the resulting data imposes 
            additional labor on the Internet community, and the reuse 
            of the legacy label also results in certain amounts of 
            ambiguity with some DNS domain names and labels. 
      
        *   UTF-8. The UTF-8 encoding scheme is capable of encoding all 
            UCS character code values as sequences of eight-bit data 
            which are compatible with legacy DNS message restrictions, 
            but the encoded output requires explicit support from 
            internationalized applications and protocols. UTF-8 output 
            uses a new label type in order to prevent additional 
            ambiguity problems from arising. 
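     The different capabilities of the three transfer encodings become 
     visible when one label is encoded each way. In this sketch, Python's 
     Latin-1 codec stands in for the STD13 one-octet-per-character 
     mapping, and its "punycode" codec with a hypothetical "zq--" prefix 
     stands in for <ACE-Z>; neither substitution is defined by this 
     document.

```python
ACE_PREFIX = "zq--"  # hypothetical placeholder for the <ACE-Z> prefix


def std13_octets(label: str) -> bytes:
    # One octet per character; raises UnicodeEncodeError above U+00FF.
    return label.encode("latin-1")


def ace(label: str) -> bytes:
    # Seven-bit output that fits the legacy host identifier rules.
    return (ACE_PREFIX + label.encode("punycode").decode("ascii")).encode("ascii")


def utf8(label: str) -> bytes:
    # Eight-bit output covering every UCS code point.
    return label.encode("utf-8")


label = "b\u00fccher"
print(std13_octets(label))  # b'b\xfccher'
print(ace(label))           # b'zq--bcher-kva'
print(utf8(label))          # b'b\xc3\xbccher'
```

     A label containing code points above U+00FF, such as "\u65e5\u672c", 
     makes std13_octets() raise UnicodeEncodeError, while ace() and 
     utf8() still succeed.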
      
     The table below illustrates the UCS character code sequences which 
     are supported by each of the different encoding schemes. 
      
                          STD13 
                          Octets   ACE    UTF-8 
                        +-------+-------+-------- 
                        |       |       | 
               US-ASCII |   Y   |       |   Y 
                        |       |       | 
              Eight-Bit |   Y   |   Y   |   Y 
                        |       |       | 
          Any UCS Chars |       |   Y   |   Y 
                        |       |       | 
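     The table can be restated as a lookup keyed on the highest code 
     point in a label, as in the sketch below. The range names mirror 
     the table rows; accept_ace() is a hypothetical illustration of the 
     seven-bit ACE rule given in the list that follows, and assumes the 
     ACE label has already been decoded to UCS characters.

```python
PERMITTED = {
    "US-ASCII": {"STD13 octets", "UTF-8"},
    "Eight-Bit": {"STD13 octets", "ACE", "UTF-8"},
    "Any UCS Chars": {"ACE", "UTF-8"},
}


def label_range(label: str) -> str:
    # Classify the label by its highest code point.
    highest = max((ord(c) for c in label), default=0)
    if highest <= 0x7F:
        return "US-ASCII"
    if highest <= 0xFF:
        return "Eight-Bit"
    return "Any UCS Chars"


def permitted_encodings(label: str) -> set:
    return PERMITTED[label_range(label)]


def accept_ace(decoded_label: str) -> bool:
    # An ACE label that decodes to seven-bit characters only is discarded.
    return label_range(decoded_label) != "US-ASCII"


print(sorted(permitted_encodings("example")))       # ['STD13 octets', 'UTF-8']
print(sorted(permitted_encodings("\u65e5\u672c")))  # ['ACE', 'UTF-8']
print(accept_ace("example"))                        # False
```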
      
   
     More specifically, the character code sequence ranges and their 
     valid encodings are: 
      
        *   US-ASCII. If a label only contains character codes from the 
            range of U+0000 through U+007F, then it MAY be encoded as a 
            legacy STD13 octet sequence or UTF-8, but MUST NOT be 
            encoded as ACE. 
      
            Note that this specification explicitly prohibits seven-bit 
            labels from being encoded as ACE data, since such an action 
            would be redundant, would result in greater processing 
            overhead for those labels, and would create multiple 
            representations which cause problems with caches on legacy 
            systems. Furthermore, certain security risks would be 
            introduced if this were allowed. For example, a malicious 
            user could register or purposefully create an ACE encoded 
            representation of the "example.com" label sequence such that 
            users mistakenly sent sensitive data to malicious systems. 
      
            In order to prevent these problems from occurring, this 
            specification requires that any ACE-encoded label which 
            consists entirely of seven-bit characters MUST be 
            immediately discarded with extreme prejudice. This rule 
            applies to every implementation of this specification, 
            including any applications, resolvers, caches or servers 
            which process labels. 
      
        *   Eight-bit codes. If a label contains character codes from 
            the eight-bit range of U+0000 through U+00FF, then it MAY 
            be encoded as STD13 octet sequences, ACE, or UTF-8. This 
            rule specifically requires that the label MUST contain at 
            least one character from the range U+0080 through U+00FF, 
            MAY contain any number of seven-bit characters, but MUST 
            NOT contain characters with code values which are greater 
            than U+00FF. 
      
            Since the STD13 octet encoding and ACE both use the legacy 
            STD13 label type, this specification relies on the input 
            encoding of a domain name in order to determine the output 
            encoding. In some cases, however, the input encoding will 
            not be clear, or will not be specified, and this can result 
            in some ambiguity with label sequences from this range. 
      
            For example, if the domain name provided in a query 
            consists of seven-bit labels, then the STD13 octet sequence 
            is the only valid encoding for the legacy STD13 label, 
            meaning that ACE could not have been used in the query. If 
            the specified domain name exists as a CNAME resource record 
            which refers to a domain name that contains eight-bit 
            character codes, then the proper output encoding for that 
            domain name will not be clearly discernible. Moreover, the 
            STD13 and ACE encodings will generate different results, 
            since the STD13 octet sequence will only contain a single 
            octet for the eight-bit character, while the ACE encoding 
            will contain multiple octets of encoded data. 
      
            When this situation arises, systems MUST give preference to 
            the ACE encoding, on the assumption that the referenced 
            character is more likely to represent a UCS character than 
            an eight-bit code value (the UCS characters in this range 
            are the Latin-1 characters, which are the most common 
            characters after the legacy US-ASCII set). Furthermore, the 
            ACE encoded representation of these characters allows for a 
            broader range of subsequent operations (since it complies 
            with the legacy host naming restrictions, it can be used in 
            CNAME resource records that refer to hosts), while the STD13 
            octet encoded representation does not. 
      
            It is possible to avoid this scenario on authoritative zone 
            servers (and thus the affected caches) by allowing the 
            operator to specify whether the input is Latin-1 UCS 
            character data or binary data, with the server generating 
            the proper output accordingly. Also note that the default 
            encoding specified by this document is UTF-8, which does 
            not suffer from the ambiguity problems described above. 
      
        *   Any UCS character codes. If a label contains any character 
            codes greater than U+00FF, then it MAY be encoded 
            as ACE or UTF-8, but MUST NOT be encoded as STD13 octet 
            sequences. STD13 is not capable of representing character 
            codes greater than U+00FF, so it cannot be used with any 
            UCS characters beyond the eight-bit range. 
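     The per-label rules above can be summarized in a short sketch. It 
     is an illustration only, not part of this specification: the 
     function names and the "STD13", "ACE" and "UTF-8" tags are 
     hypothetical, and labels are assumed to already be canonical UCS 
     strings. The second function applies the stated preference for 
     ACE when the input encoding of an eight-bit label is ambiguous.

```python
from __future__ import annotations

# Hypothetical encoding tags, used only for illustration.
STD13, ACE, UTF8 = "STD13", "ACE", "UTF-8"


def allowed_encodings(label: str) -> set[str]:
    """Return the transfer encodings permitted for one UCS label."""
    highest = max((ord(c) for c in label), default=0)
    if highest <= 0x7F:
        # Seven-bit labels (including zero-length ones) MUST NOT use ACE.
        return {STD13, UTF8}
    if highest <= 0xFF:
        # At least one code from U+0080 through U+00FF, none above U+00FF.
        return {STD13, ACE, UTF8}
    # STD13 octets cannot represent code values above U+00FF.
    return {ACE, UTF8}


def legacy_encoding(label: str) -> str:
    """Choose the encoding for an STD13 legacy label when the input
    encoding is ambiguous: ACE is preferred whenever it is allowed."""
    return ACE if ACE in allowed_encodings(label) else STD13
```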
      
     Encodings are performed on a per-label basis, and each label MUST 
     NOT be encoded more than once. Recursive encodings (such as 
     applying ACE to a label which is already ACE encoded) will cause 
     applications to discard the domain name. 
      
     When the STD13 octet encoding is used to encode labels for 
     transmission, the labels are encoded according to the rules 
     specified in STD13, and are encapsulated in STD13 legacy labels. 
      
     When ACE is used to encode labels for transmission, the labels are 
     encoded according to the rules specified in <ACE-Z>, and are 
     encapsulated in STD13 legacy labels (this process is described in 
     section 5.2). 
      
     When UTF-8 is used to encode labels for transmission, the labels 
     are encoded according to the rules specified in RFC2279, and are 
     encapsulated in EDNS/UTF-8 extended labels (the format of this 
     label is described in section 5.1). 
      
     Note that a domain name MAY contain any combination of STD13 octet 
     encoded labels and ACE encoded labels. However, if a domain name 
     contains any UTF-8 encoded labels, then ALL of the labels from 
     that domain name MUST be encoded as UTF-8 data. This rule 
     primarily exists so that DNS compression services can be 
     maintained consistently, but it also prevents mixed referrals 
     which can trigger unnecessary fall-back processing, and also 
     provides a single encoding representation to internationalized 
     systems which benefits efficiency. 
      
     The root domain (as specified by the zero-length label at the 
     right edge of the domain name) MUST NOT be encoded with ACE. More 
     specifically, zero-length labels MUST NOT contain any character 
     data of any kind, and since ACE labels have prefix strings, they 
     are explicitly forbidden from being used for the root domain. 
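     The two domain-level rules above (UTF-8 applies to every label of 
     a name or to none, and zero-length labels are never ACE) can be 
     checked with a sketch like the following. It assumes each label 
     has already been tagged with a hypothetical encoding name, and it 
     represents the root as a trailing empty label.

```python
from __future__ import annotations


def is_valid_name_encoding(labels: list[str], encodings: list[str]) -> bool:
    """Check the per-name encoding rules for one domain name."""
    if len(labels) != len(encodings):
        return False  # exactly one encoding tag is needed per label
    if "UTF-8" in encodings and any(e != "UTF-8" for e in encodings):
        return False  # UTF-8 labels cannot be mixed with legacy labels
    # Zero-length labels (including the root) carry no data, so they
    # can never hold the ACE prefix.
    return not any(label == "" and enc == "ACE"
                   for label, enc in zip(labels, encodings))
```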
      
      
  4.1.3.  IDN comparison operations 
      
     When an internationalized domain name label is received from the 
     network as ACE or UTF-8 encoded data, the label MUST be decoded to 
     its canonical UCS character representation, and the resulting UCS 
     characters MUST be compared as case-exact sequences to their 
     stored equivalents. Except where specifically required in this 
     specification (e.g., validity tests which are performed by 
     applications), normalization and case-conversion MUST NOT be 
     performed against the resulting UCS character codes prior to any 
     comparison operations. 
      
     However, internationalized domain name labels which are received 
     as STD13 octet sequences MUST be given special treatment, as these 
     domain names could have originated from legacy systems operating 
     under STD13's rules. In this case, the seven-bit US-ASCII 
     alphabetic characters (U+0041 through U+005A, and U+0061 through 
     U+007A) from those labels MUST be compared in a case-neutral form. 
     All other code values MUST be compared as case-exact code values 
     (this particularly includes eight-bit characters, which were not 
     defined by STD13). 
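     A minimal sketch of these comparison rules follows. It assumes 
     that received labels have already been decoded, with ACE and UTF-8 
     labels delivered as Python strings of canonical UCS characters and 
     STD13 octet labels delivered as raw bytes; the function names are 
     hypothetical.

```python
from __future__ import annotations


def _fold_ascii(text: str) -> str:
    """Lowercase only U+0041 through U+005A; all other codes are kept."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in text)


def labels_equal(received: str | bytes, stored: str) -> bool:
    """Compare a received label with its stored UCS form.

    bytes means an STD13 octet sequence; str means an ACE or UTF-8
    label that has already been decoded to canonical UCS characters.
    """
    if isinstance(received, bytes):
        # Each octet is the code value U+0000 through U+00FF ("latin-1").
        # Seven-bit letters compare case-neutrally, everything else exactly.
        return _fold_ascii(received.decode("latin-1")) == _fold_ascii(stored)
    # ACE and UTF-8: case-exact, with no normalization or case conversion.
    return received == stored
```

     Note that an eight-bit octet such as 0xC9 (U+00C9) will not match 
     a stored U+00E9, because case-neutral comparison is limited to the 
     seven-bit letters.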
      
      
  4.2.    Internationalized Host Identifiers 
      
     Internationalized host identifiers are a subset of the 
     internationalized domain names described in section 4.1, which 
     only use a subset of the allowable UCS characters, but which reuse 
     the global transfer encodings and comparison routines. 
      
     Most of the displayable characters from the UCS can be used in 
     host identifiers, and there are no additional rules governing the 
     ordering or length of their labels. However, the characters which 
     are used in internationalized host identifiers MUST be normalized 
     and case-converted before they are encoded for storage or 
     transfer. This requires more effort on the part of applications 
     and servers when the internationalized domain names are initially 
     created, but results in less ambiguity and lower processing 
     requirements for servers, caches and resolvers during subsequent 
     comparison operations. 
      
     The restrictions which govern the creation of internationalized 
     host identifiers are as follows: 
      
        a.  Labels MUST be restricted to the subset of characters which 
            are permitted by <nameprep> [nameprep]. Characters which 
            are prohibited by <nameprep> MUST NOT appear in any label 
            of any internationalized host identifier. 
      
        b.  Labels MUST be normalized through <nameprep> before they 
            are stored or encoded for transfer. Internationalized host 
            identifiers will not be normalized as part of any 
            comparison operation, so systems MUST normalize the labels 
            before they are stored or transmitted. 
      
        c.  Labels MUST be converted to lowercase according to the 
            case-mapping rules specified in <nameprep> before they are 
            stored or encoded for transfer. Internationalized host 
            identifiers will not be converted to lowercase as part of 
            any comparison operation, so systems MUST convert the 
            labels to lowercase before they are stored or transmitted. 
      
     According to the rules above, a label from an internationalized 
     host identifier which was originally created with the UCS 
     character sequence of <LATIN CAPITAL LETTER A><COMBINING ACUTE 
     ACCENT><LATIN CAPITAL LETTER B> (U+0041 U+0301 U+0042) would be 
     normalized and lowercased to <LATIN SMALL LETTER A WITH 
     ACUTE><LATIN SMALL LETTER B> (U+00E1 U+0062). The normalized, 
     lowercase form would be used as the canonical UCS character 
     representation of that label when it was encoded for storage and 
     transmission purposes, and would be the form which was used for 
     comparison operations on any resolvers, caches and servers. 
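     The preparation step in this example can be approximated with the 
     encodings.idna.nameprep() function from Python's standard library. 
     That function implements the later RFC 3491 Nameprep profile rather 
     than the <nameprep> draft cited here, so the two may differ in 
     detail; the sketch only illustrates the intended effect on this 
     particular label.

```python
from encodings.idna import nameprep

original = "\u0041\u0301\u0042"  # <A><COMBINING ACUTE ACCENT><B>
prepared = nameprep(original)      # case-map, then NFKC-normalize

print([f"U+{ord(c):04X}" for c in prepared])  # prints ['U+00E1', 'U+0062']
```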
      
     Internationalized host identifiers which are received from the 
     network can contain labels which have been encoded as STD13 octet 
     sequences, ACE or UTF-8. In all of these cases, the comparison 
     rules defined in section 4.1.3 MUST be applied. 
      
      
  4.3.    STD13 Domain Names 
      
     STD13 allows any eight-bit code values to be used in domain name 
     labels. However, STD13 host identifiers (as described in section 
     4.4 of this specification) are the most common form of STD13 
     domain names, and have much tighter restrictions. 
      
     There are common uses of STD13 domain names which do not comply 
     with the STD13 host identifier subset, however. One common example 
     of this is SRV identifiers, which use an underscore character 
     (U+005F) as part of their label syntax. Another common example is 
     found when email addresses are provided in SOA and RP resource 
     records, and where the left-hand side of the email address is 
     stored as an STD13 domain name label which does not represent a 
     host identifier. Furthermore, email addresses often contain extra 
     characters which are not legal in STD13 host identifiers, such as 
     a full-stop character (U+002E). For example, "joe.admin" could be 
     stored as a single STD13 domain name label in the fully-qualified 
     domain name "joe\.admin.example.com." (in STD13 master file 
     notation, the reverse-solidus marks the first full-stop as label 
     data rather than as a label separator), which would represent the 
     email address "joe.admin@example.com" when that domain name was 
     extracted from the SOA or RP resource record and processed. 
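     The full-stop causes no ambiguity on the wire, because STD13 
     messages carry each label as a length octet followed by the 
     label's octets. The hypothetical helper below builds that form 
     (without compression), showing that "joe.admin" travels as one 
     nine-octet label; the 63-octet limit is the one defined by STD13.

```python
from __future__ import annotations


def to_wire(labels: list[bytes]) -> bytes:
    """Encode STD13 legacy labels as length-prefixed octet strings."""
    out = bytearray()
    for label in labels:
        if len(label) > 63:
            raise ValueError("STD13 labels are limited to 63 octets")
        out.append(len(label))  # length octet (top two bits are 0b00)
        out += label
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)


wire = to_wire([b"joe.admin", b"example", b"com"])
```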
      
     Implementations of this specification MUST allow STD13 domain 
     names to be created and stored, using the following rules: 
      
        a.  Labels MUST be restricted to the code values of U+0000 
            through U+00FF. Restrictions on character content MUST NOT 
            be applied (note that if this domain name will be used as 
            part of an STD13 host identifier, the rules specified in 
            section 4.4 MUST be used instead). 
      
        b.  Labels MUST NOT be normalized or lowercased before they are 
            stored or encoded for transfer. 
      
        c.  Systems MUST allow STD13 domain names to be specified as 
            exact sequences of eight-bit octet values, and MUST NOT 
            treat these sequences as canonical UCS characters which are 
            normalized or lowercased. STD13 defines an escaping 
            mechanism whereby the decimal value of the octet is 
            prefaced with a reverse-solidus (such as "\193"), which is 
            suggested for this usage. 
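     The escaping mechanism can be sketched as follows. STD13 (RFC 1035, 
     section 5.1) defines "\DDD" as a three-digit decimal octet value 
     and "\X" as quoting any other character X. The function name is 
     hypothetical, and treating unescaped characters as US-ASCII is an 
     assumption of this sketch. The result is the exact octet sequence, 
     with no normalization or lowercasing applied.

```python
def unescape_label(text: str) -> bytes:
    """Turn one master-file label into its exact octet sequence."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("ascii")  # assumed US-ASCII input
            i += 1
            continue
        digits = text[i + 1 : i + 4]
        if len(digits) == 3 and all(d in "0123456789" for d in digits):
            value = int(digits)  # "\DDD": decimal octet value
            if value > 255:
                raise ValueError(f"octet value out of range: {value}")
            out.append(value)
            i += 4
        elif i + 1 < len(text):
            out += text[i + 1].encode("ascii")  # "\X": quoted character
            i += 2
        else:
            raise ValueError("dangling reverse-solidus at end of label")
    return bytes(out)
```

     For example, unescape_label(r"caf\233") yields the four octets 
     b"caf\xe9", which are stored exactly as given.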
      
     STD13 domain names which are received from the network can contain 
     labels which have been encoded as STD13 octet sequences, ACE or 
     UTF-8. In all of these cases, the comparison rules defined in 
     section 4.1.3 MUST be applied. Note that some of these sequences 
     can contain octet code values which have not been normalized or 
     lowercased by the originating system, since these values can be 
     used to specify binary domain names. 
      
      
  4.4.    STD13 Host Identifiers 
      
     This document does not deprecate, replace or modify the host name 
     rules defined by RFC952, STD3 or STD13 as they apply to legacy 
     host identifiers. However, there are several issues which affect 
     the usage of these domain names and their labels in this system. 
      
     The range of characters which is currently defined as valid in 
     STD13 host identifiers consists of the uppercase and lowercase 
     letters, digits and hyphen character from US-ASCII. No other 
     characters are allowed to be used. Furthermore, the current rules also 
     prohibit the use of the hyphen character in the first or last 
     character position of a host identifier label. 
      
     Implementations of this specification MUST allow STD13 host 
     identifiers to be created and stored, using the following rules: 
      
        a.  Labels MUST be restricted to the code values of U+002D, 
            U+0030 through U+0039, U+0041 through U+005A, and U+0061 
            through U+007A. 
      
        b.  Labels MUST NOT contain the code value of U+002D in either 
            the first or last character position of the label. 
      
        c.  The alphabetic characters MUST be converted to lowercase 
            before they are stored or transmitted. STD13 host 
            identifiers are always compared in a case-neutral form. 
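     The following minimal sketch applies rules a through c to a single 
     non-root label, allowing the US-ASCII letters, the digits 0 through 
     9 and the hyphen. The helper names are hypothetical, and the 
     63-octet length limit is taken from STD13 rather than from this 
     section.

```python
import string

_ALLOWED = frozenset(string.ascii_letters + string.digits + "-")


def is_valid_host_label(label: str) -> bool:
    """Rules a and b: allowed characters, no leading/trailing hyphen."""
    return (
        0 < len(label) <= 63
        and set(label) <= _ALLOWED
        and not label.startswith("-")
        and not label.endswith("-")
    )


def canonical_host_label(label: str) -> str:
    """Rule c: lowercase the letters before storage or transmission."""
    if not is_valid_host_label(label):
        raise ValueError(f"invalid STD13 host label: {label!r}")
    return label.lower()  # only US-ASCII letters can be present here
```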
      
     STD13 host identifiers which are received from the network can 
     contain labels which have been encoded as STD13 octet sequences 
     or UTF-8. In both cases, the comparison rules defined in section 
     4.1.3 MUST be applied. 
      
      
  5.      Transfer Encodings and Label Types 
      
     As was discussed in section 4.1.2, internationalized domain names 
     and labels are required to be encoded as either eight-bit or 
     seven-bit data whenever they are transmitted as protocol or 
     application data. 
      
     The particular output encoding format which will be used for any 
     given label will be primarily determined by the capabilities of 
     the participating end-point systems. If the application or 
     protocol which is relaying the domain name labels supports 
     internationalized domain names directly, then UTF-8 encoded labels 
     can be used, but if the protocol or application is only capable of 
     supporting STD13 host identifiers as domain name data, then the 
     STD13 octet and/or ACE encoded labels will have to be used. 
      
     With DNS messages in particular, the "data type" is the label 
     encapsulation in use. Although STD13 legacy labels allow for the 
     use of eight-bit codes, multiple encodings for the same basic 
     character data result in interpretation problems without some form 
     of ancillary tagging service. For this reason, each encoding is 
     represented differently by this specification. When the STD13 
     legacy label contains STD13 octet sequences then no tagging is 
     provided, but if the STD13 legacy label contains ACE encoded data 
     then the encoded sequence is tagged with an ACE identifier (a 
     character prefix which does not normally appear in labels). When 
     UTF-8 domain names are provided, an EDNS/UTF-8 extended label is 
     used to encapsulate the internationalized domain name. 
      
     Furthermore, the encoding which is used for any label in the 
     message will also determine the label type which is used to 
     encapsulate and transfer the entire domain name. If any label is 
     encoded as UTF-8, then all of the labels from that domain name are 
     required to be encapsulated for transfer in EDNS/UTF-8 extended 
     labels. Conversely, if a domain name contains ACE or STD13 octet 
     encoded labels, then all of the labels from 
     that domain name are required to be encapsulated for transfer 
     using the STD13 legacy label format. 
      
     Note that other legacy applications and protocols will most likely 
     be required to provide extended encodings or negotiation features 
     before they can exchange internationalized domain names directly. 
     However, new applications and protocols which are subsequently 
     written to comply with BCP18 and this specification should not 
     require any such effort, as they should be capable of transferring 
     UTF-8 domain names from the beginning. 
      
      
  5.1.    The EDNS/UTF-8 Label Type 
      
     Any internationalized domain name label which has been encoded as 
     UTF-8 for transmission in a DNS message MUST be encapsulated as an 
     EDNS/UTF-8 label. 
      
     The EDNS/UTF-8 extended label is an instance of EDNS extended 
     label types (as defined by RFC2671). Extended labels are indicated 
     by the leading bit pattern of 0b01 in the label type field (the 
     first two bits from the "label length" octet of the STD13 legacy 
     label type), with the remaining six bits of this octet indicating 
     the extended label type in use. The EDNS/UTF-8 label type uses the 
     binary value of 0b000011 for this indication (note that IANA may 
     change this assignment). 
      
     Following the label type octet, EDNS/UTF-8 labels contain two 
     subordinate units of data. The first is a length octet which works 
     exactly the same as the length octet used by STD13 legacy labels: 
     if the first two bits of this octet are 0b00, then the remaining 
     six bits provide the length of the label data field, but if the 
     first two bits are 0b11, then the label is a pointer to some other 
     label, and the remaining six bits of the length octet, together 
     with the octet which follows it, form a 14-bit offset which points 
     to the length octet of the referenced label, as per the rules 
     provided in section 4.1.4 of RFC 1035 (STD13, part 2). The second 
     unit is the label data itself. 
      
     The structure of the EDNS/UTF-8 extended label is illustrated by 
     the following figure. 
      
                              1 1 1 1 1 1 1 1 1 1 
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |0 1|0 0 0 0 1 1|    length     |  label data  ///  | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      
          0b01  The extended label identifier. 
      
          0b000011  The EDNS/UTF-8 extended label type identifier. 
      
          Length  The number of octets in the label data, or the offset 
            to the length octet of another EDNS/UTF-8 label. 
      
          Label data  The label data, encoded as UTF-8 octets. 
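     The layout above can be sketched as a simple encoder. This is an 
     illustration under stated assumptions: the function name is 
     hypothetical, compression pointers are never generated, the type 
     value 0b000011 is this draft's provisional assignment, and the 
     63-octet cap follows from the six-bit length field. Applied to the 
     name in the following example, it yields the same twelve octets 
     shown at offsets 20 through 31.

```python
from __future__ import annotations

EXTENDED = 0b01      # extended label indicator (RFC 2671)
UTF8_TYPE = 0b000011  # provisional EDNS/UTF-8 type; IANA may change it
TYPE_OCTET = (EXTENDED << 6) | UTF8_TYPE  # 0x43


def encode_utf8_name(labels: list[str]) -> bytes:
    """Encode a name (root label excluded) as EDNS/UTF-8 labels."""
    out = bytearray()
    for label in [*labels, ""]:  # "" appends the zero-length root label
        data = label.encode("utf-8")
        if len(data) > 63:  # the length is a six-bit field, as in STD13
            raise ValueError("label data exceeds 63 octets")
        out += bytes([TYPE_OCTET, len(data)]) + data
    return bytes(out)


print(encode_utf8_name(["m\u00e9", "com"]).hex(" "))
# 43 03 6d c3 a9 43 03 63 6f 6d 43 00
```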
      
     The following example shows the domain name of me.com, where the 
     "e" in "me" is the UCS character <LATIN SMALL LETTER E WITH ACUTE> 
     (U+00E9), which has the UTF-8 encoded octet sequence of 0xC3A9. 
      
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
         20 | 0  1  0  0  0  0  1  1|          0x03         | 
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
         22 |        0x6D (m)       |      0xC3 (e')        | 
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
         24 |      0xA9 (e')        | 0  1  0  0  0  0  1  1| 
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
         26 |         0x03          |        0x63 (c)       | 
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
         28 |        0x6F (o)       |        0x6D (m)       | 
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
         30 | 0  1  0  0  0  0  1  1|         0x00          | 
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
      
     Octet 20 identifies the EDNS/UTF-8 extended label type, while 
     octet 21 indicates that the label is three octets long. Octet 22 
     contains the UTF-8 value for lowercase "m", while octets 23 and 24 
     contain the UTF-8 value for the UCS character <LATIN SMALL LETTER 
     E WITH ACUTE> (encoded as 0xC3A9). 
      
     Similarly, octet 25 identifies another EDNS/UTF-8 extended label 
     type, octet 26 indicates that the label is three octets long, and 
     octets 27 through 29 contain the UTF-8 values for the lowercase 
     alphabetic sequence of "com". 
      
     Finally, octet 30 identifies another EDNS/UTF-8 extended label 
     type, while octet 31 indicates that the label is zero octets in 
     length, thereby signifying the root zone (the end of the queried 
     domain name). 
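     A decoder can walk the octets in the same order as the explanation 
     above. This minimal sketch assumes the buffer holds only EDNS/UTF-8 
     labels with the provisional 0x43 type octet, and it rejects 
     compression pointers rather than following them; the function name 
     is hypothetical.

```python
from __future__ import annotations

TYPE_OCTET = 0x43  # 0b01 (extended label) followed by 0b000011 (EDNS/UTF-8)


def decode_utf8_name(buf: bytes, pos: int = 0) -> list[str]:
    """Walk EDNS/UTF-8 labels starting at buf[pos] until the root label."""
    labels = []
    while True:
        if buf[pos] != TYPE_OCTET:
            raise ValueError(f"octet {pos}: not an EDNS/UTF-8 label")
        length = buf[pos + 1]
        if length >> 6 == 0b11:
            # Compression pointers are deliberately not followed here.
            raise NotImplementedError("compression pointers not handled")
        if length >> 6 != 0b00:
            raise ValueError(f"octet {pos + 1}: reserved length bits set")
        if length == 0:
            return labels  # zero-length label: the root, end of the name
        data = buf[pos + 2 : pos + 2 + length]
        if len(data) != length:
            raise ValueError("label data is truncated")
        labels.append(data.decode("utf-8"))
        pos += 2 + length


# Octets 20 through 31 of the example, re-based to offset 0.
wire = bytes.fromhex("43036dc3a94303636f6d4300")
labels = decode_utf8_name(wire)
```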
      
     Note that the use of the EDNS/UTF-8 extended label type serves 
     multiple purposes. On the one hand, it provides a method of 
     signaling the resolver's capabilities to the server, so that the 
     server can determine which format it needs to use when returning 
     answers, referrals or errors. Moreover, using an encapsulation 
     format which is not backwards compatible prevents certain 
     ambiguity problems which can result from overloading the STD13 
     legacy label with multiple encodings. These problems are seen in 
     certain situations with STD13 octet encoding and ACE, where a 
     server cannot adequately determine which encoding a resolver 
     desires. By using a separate extended label type for UTF-8, these 
     kinds of ambiguities are avoided. 
      
     There are additional benefits which come from using EDNS extended 
     label types, which are best expressed as "future possibilities". 
     Once the EDNS extended label mechanisms are widely deployed, it 
     becomes feasible to specify additional encoding mechanisms as soon 
     as the Internet community deems it desirable. In this regard, 
     defining alternative encodings is much easier the second time. 
      
      
  5.2.    The STD13 Legacy Label Type 
      
     Any internationalized domain name label which has been encoded as 
     ACE or STD13 octet sequences for transmission in a DNS message 
     MUST be encapsulated within an STD13 legacy label. 
      
     This document does not deprecate, replace or extend the STD13 
     octet encoding or label encapsulation rules defined by STD13. 
     However, this document does provide some guidance on the creation 
     and interpretation of ACE encoded labels when they are stored in 
     legacy labels, which is necessary in order for recipient systems 
     to properly detect and decode the label contents. 
      
     Note that STD13 octet sequences and ACE data MAY both be provided 
     within the same domain name. As such, each STD13 legacy label from 
     a DNS message must be examined and processed independently. 
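     Classifying each legacy label on its own can be sketched as below. 
     The function name is hypothetical, "zz--" is the placeholder prefix 
     described in section 5.2.1, and matching that prefix 
     case-sensitively is an assumption of this sketch rather than a 
     requirement stated here.

```python
from __future__ import annotations

ACE_PREFIX = b"zz--"  # placeholder prefix pending a formal assignment


def classify_legacy_labels(labels: list[bytes]) -> list[str]:
    """Tag every STD13 legacy label of one name independently."""
    return ["ACE" if label.startswith(ACE_PREFIX) else "STD13"
            for label in labels]
```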
      
      
  5.2.1.  ACE encoded labels 
      
     ACE encoded labels always begin with the character sequence of 
     <TBD> (this document uses "zz--" as a placeholder sequence until a 
     formal assignment is made). Any label which contains ACE encoded 
     data MUST begin with this character sequence prefix. Similarly, 
     any label which begins with this character sequence MUST be 
     recognized and processed as an ACE encoded label, according to the 
     rules defined in this specification. 
      
     Encoding and encapsulating a label as ACE data is a three-part 
     process, as follows: 
      
        a.  Encode the canonical UCS character data from the 
            internationalized domain name label into ACE using the 
            procedure defined in <ACE-Z>. 
      
        b.  Preface the encoded output with the "zz--" prefix sequence, 
            thereby indicating that this label contains ACE encoded UCS 
            character data. 
      
        c.  Determine the length of the encoded data and store this 
            value in the STD13 legacy label's length octet. 
      
     Decoding an ACE label reverses that process. 
      
     Note that whenever the ACE algorithm encounters a seven-bit 
     character code in the input, it is passed through unmodified to 
     the encoded output. If a label only contains seven-bit character 
     codes, the label MUST NOT be encoded as ACE, and MUST be encoded 
     as either STD13 octet sequences or UTF-8. Forcing a seven-bit 
     label to be encoded as ACE serves no benefit, incurs additional 
     processing on the end-point systems, and can also expose certain 
     security risks. Any system which is capable of generating and 
     deciphering ACE encoded labels is required to treat such sequences 
     as hostile, and MUST dispose of them immediately without any 
     further processing; systems are forbidden from returning these 
     labels, even in DNS error messages. 
      
     Similarly, ACE MUST NOT be used to encode any zero-length labels 
     (including but not specifically limited to the root domain), since 
     the presence of prefix characters in these labels can invalidate 
     their protocol-specific interpretations. 
      
     When an STD13 legacy label is received which has "zz--" in the 
     first four character positions, the label MUST be treated as an 
     ACE-encoded internationalized domain name, and MUST be decoded to 
     its canonical UCS character values for further processing. 
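     The encoding steps and receiving rules of this section can be 
     sketched as below. Because <ACE-Z> is not reproduced in this 
     document, the sketch substitutes Python's built-in "punycode" codec 
     (RFC 3492) purely as a stand-in, so its encoded octets will not 
     match real <ACE-Z> output. The "zz--" prefix is the placeholder 
     used above, and the function names are hypothetical.

```python
from __future__ import annotations

ACE_PREFIX = "zz--"  # placeholder prefix used by this draft


def _seven_bit_only(text: str) -> bool:
    return all(ord(c) < 0x80 for c in text)


def ace_wrap(label: str) -> bytes:
    """Steps a-c: encode, add the prefix, prepend the length octet."""
    if not label:
        raise ValueError("zero-length labels MUST NOT be ACE encoded")
    if _seven_bit_only(label):
        raise ValueError("seven-bit labels MUST NOT be ACE encoded")
    # Stand-in for <ACE-Z>: Python's punycode codec (NOT the real ACE).
    data = ACE_PREFIX.encode("ascii") + label.encode("punycode")
    if len(data) > 63:
        raise ValueError("encoded label exceeds the 63-octet STD13 limit")
    return bytes([len(data)]) + data


def ace_unwrap(data: bytes) -> str | None:
    """Decode STD13 label data that begins with the ACE prefix.

    Returns the canonical UCS label, or None when the label must be
    discarded because its decoded content is entirely seven-bit.
    """
    text = data.decode("ascii")
    if not text.startswith(ACE_PREFIX):
        raise ValueError("not an ACE encoded label")
    label = text[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    if _seven_bit_only(label):
        return None  # hostile: discard without further processing
    return label
```

     All later comparison, caching and storage work would then use the 
     string returned by ace_unwrap(), never the encoded octets.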
      
     Note that STD13 legacy labels MUST be verified before the ACE 
     encoded data is extracted (as per the rules defined in STD13 which 
     govern the STD13 legacy label type), but systems which are 
     compliant with this specification MUST perform all subsequent 
     comparison, caching, or storage operations against the canonical 
     UCS characters, and MUST NOT use the ACE encoded label sequence 
     for any of these operations. 
      
     Note that legacy systems which are not compliant with this 
     specification will treat ACE encoded labels like any other STD13 
     legacy label. 
      
      
  5.2.2.  STD13 octet encoded labels 
      
     Any STD13 legacy labels which do not begin with the ACE prefix 
     MUST be treated as STD13 octet encoding sequences. The rules for 
     this process are defined by STD13's default label encapsulation 
     services, although this document also provides some clarifications 
     on the use of this encoding with internationalized domain names 
     and labels. 
      
     Whenever the STD13 octet sequence is used to encode the labels 
     from an internationalized domain name, the code values of the 
     canonical UCS characters are stored directly in the label, one 
     octet per character. Because each code value must fit in a single 
     octet, the range of UCS character codes which are eligible for use 
     with STD13 octet sequences is limited to U+0000 through U+00FF. If any UCS character codes 
     outside this range need to be transferred, the internationalized 
     domain name label will have to be encoded as ACE or UTF-8. 
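     A minimal sketch of this encoding: Python's "latin-1" codec maps 
     each code value from U+0000 through U+00FF to the single octet of 
     the same value, which is exactly the mapping described here. The 
     function name is hypothetical.

```python
def std13_octets(label: str) -> bytes:
    """Store each canonical UCS code value as one octet of equal value."""
    try:
        return label.encode("latin-1")
    except UnicodeEncodeError:
        # Anything above U+00FF has to be sent as ACE or UTF-8 instead.
        raise ValueError("code values above U+00FF need ACE or UTF-8") from None
```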
      
     Note that comparison operations for the seven-bit range of 
     alphabetic character values MUST be performed in a case-neutral 
     form, although eight-bit code values MUST NOT be normalized or 
     case-converted as part of a comparison operation. These rules are 
     required in order to ensure backwards compatibility with the STD13 
     compliant systems which may be generating these labels as parts of 
     an STD13 domain name while also supporting the normalization and 
     case-conversion which may have been applied to the UCS characters 
     in the storage or transfer encoding systems. 
      
      
  6.      Application Guidelines 
      
     As was discussed in section 3.3, there are multiple scenarios in 
     which an application can make use of internationalized domain 
     names, ranging from simple lookups of connection identifiers to 
     abstract encapsulations of unstructured application data. This is 
     an extremely broad range of uses, which is further complicated by 
     the pervasiveness of applications and protocols that use domain 
     names for one or more of these purposes. 
      
     Furthermore, network applications face a complex array of input 
     and output operations which will cumulatively affect the ability 
     of that application to make use of the internationalized domain 
     name system for various services and functions. These issues are 
     illustrated by the figure below: 
      
                       [IDNs]              [IDNs] 
                         |                   ^ 
                         |                   | 
                  +------V------+     +------+------+ 
                  |    input    |     |   output    | 
                  |   charset   |     |   charset   | 
                  +-----------+-+     +-+-----------+ 
                               \       / 
                            +---+-----+---+ 
                            | Application | 
                            +---+-----+---+ 
                               /       \ 
                  +-----------+-+     +-+-----------+ 
                  |   lookups   |     |   app data  <---> [IDNs] 
                  +------+------+     +-------------+ 
                         | 
                  +------+------+ 
                  |   resolver  <---> [IDNs] 
                  +-------------+ 
      
     As can be seen, the ability of an application to completely adopt 
     internationalized domain names will be determined by many factors, 
     any one of which could prevent the application from fully 
     incorporating the restrictions and recommendations prescribed by 
     this specification. 
      
     In order to allow for a flexible adoption schedule, this 
     specification defines very few mandates that applications must 
     adopt, but instead focuses on recommendations which applications 
     should comply with whenever they need to use internationalized 
     domain names, and also provides recommendations for situations 
     where the preferred behavior is not feasible. Applications which 
     are compliant with all of the recommendations provided in this 
     specification will be able to generate, store, transfer and 
     resolve internationalized domain names throughout all of their 
     operations, using UTF-8 as a common encoding for all of these 
     operations. Meanwhile, applications which are not in complete 
     compliance with this specification will still be able to make use 
     of the internationalized domain names in these operations, 
     although such access may be limited to using backwards-compatible 
     encodings which require greater amounts of effort to implement and 
     which provide fewer benefits. 
      
      
  6.1.    Input and Output Charsets 
      
     If an application is unable to accept, process, store or display 
     characters from the complete UCS repertoire, that application's 
     support for internationalized domain names will be somewhat 
     limited, by definition. 
      
     Although this document does not mandate any particular charset or 
     encoding which all applications must use for all operations, 
     applications SHOULD use coded character sets or encodings which 
     can handle characters from a reasonable number of scripts. 
      
     In particular, the following areas have specific requirements: 
      
        *   Input charsets and encodings. Since UTF-8 is used as the 
            default encoding for internationalized domain names 
            throughout this specification (and others, such as BCP18), 
            UTF-8 is also RECOMMENDED as the input encoding for 
            internationalized domain names, although this is not 
            required. Many platforms and development environments 
            support UTF-8 as a local encoding of the UCS, and it can 
            reasonably be used with many types of input (such as 
            configuration files), although many systems will require a 
            specific encoding (such as UCS-2 or ISO/IEC 8859-1) in 
            situations which involve memory access or keyboard input. 
      
            Regardless of the input encodings used, implementations 
            MUST map domain names and labels to their canonical UCS 
            characters for any normalization and case-conversion work 
            which is subsequently required by any DNS lookups (see 
            section 6.3). 
      
        *   Output charsets and encodings. Output choices will likely 
            be limited to a system-preferred charset or encoding. In 
            general, this document RECOMMENDS that output systems 
            choose an output charset or encoding which can represent 
            the data being provided. However, applications MUST NOT 
            substitute generic replacement characters (such as boxes 
            or circles) for characters which are known to be 
            unavailable in the specified charset, as such substitutions 
            will almost certainly trigger failure conditions in 
            subsequent protocol operations. 
      
     In those situations where adequate input or output charsets or 
     encodings are unavailable, applications MAY use ACE to encode 
     internationalized domain names for the purpose of ensuring that 
     the data is provided intact. Since ACE is capable of representing 
     UCS characters as sequences of seven-bit characters, it is 
     functionally usable as a last line of defense in almost any 
     environment, with the caveat that ACE encoding sequences are 
     extremely cryptic and will likely result in lower levels of 
     usability and functionality. 
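     The last-resort behavior described above can be sketched in 
     Python as follows. This is a minimal illustration under stated 
     assumptions, not part of this specification: the helper name 
     display_form is hypothetical, and because this document does not 
     fix a particular ACE, Python's built-in "idna" codec (the 
     Punycode-based form with the "xn--" prefix, later standardized in 
     RFC3490 and RFC3492) stands in for it. 

```python
def display_form(domain: str, output_charset: str) -> str:
    """Return a form of `domain` that survives output in `output_charset`.

    Hypothetical helper: the name is returned unchanged when every
    character can be represented in the output charset; otherwise an ACE
    form is returned instead of substituting replacement characters.
    """
    try:
        domain.encode(output_charset)  # strict mode raises on unmappable chars
        return domain
    except UnicodeEncodeError:
        # Assumption: Python's IDNA 2003 codec ("xn--" Punycode labels)
        # stands in for the ACE referenced by this document.
        return domain.encode("idna").decode("ascii")


print(display_form("bücher.example", "iso-8859-1"))  # bücher.example
print(display_form("bücher.example", "ascii"))       # xn--bcher-kva.example
```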
      
      
  6.2.    Protocol and Application Data 
      
     There are several interrelated issues which will determine an 
     application's ability to provide or accept internationalized 
     domain names as protocol or application data, although the 
     principal determining factor for any such usage will generally be 
     the capabilities of the underlying protocol itself. 
      
     If a protocol allows negotiation or tagging services in order to 
     distinguish between different encodings, that protocol can likely 
     be extended to support the use of UTF-8 as protocol or application 
     data through command/response negotiation options or through data-
     type tags. Older protocols which do not provide any negotiation 
     services or which mandate the use of US-ASCII in all data will 
     likely require the use of ACE encoded domain names as a short-term 
     measure until the protocol is made compliant with BCP18. 
      
        *   Protocol data. If the protocol supports UTF-8 encoded 
            internationalized domain names in commands or responses, 
            then that encoding SHOULD be used wherever it is allowed. 
            If UTF-8 is not supported by the protocol, STD13 octet 
            sequences and/or ACE encoded equivalents of the 
            internationalized domain name MUST be used. 
      
            In some cases, this negotiation can be performed once per 
            session; in other cases it will need to be performed for 
            each transaction within the session; and in still other 
            cases the internationalized domain names will have to be 
            tagged whenever they are provided as protocol or 
            application data. 
      
            The DNS protocol is itself an example of a protocol which 
            requires tagging in order for internationalized domain 
            names to be exchanged within the existing DNS message (with 
            these indicators taking the form of ACE encoding prefixes 
            and EDNS/UTF-8 extended label type codes). Meanwhile, a 
            protocol such as WHOIS could theoretically support a 
            session-wide negotiation option which allows the use of 
            internationalized domain names as protocol and application 
            data for the duration of that session. Conversely, a 
            protocol such as SMTP will likely require the use of 
            session-specific identifiers for some operations, while 
            other operations may be able to use label tags (similar to 
            the existing support for domain literals, which are 
            identified by a pair of surrounding square brackets). 
      
            Regardless of the encodings which are used, implementations 
            MUST map domain names and labels to their canonical UCS 
            characters for any normalization and case-conversion work 
            which is subsequently required as part of a DNS lookup (see 
            section 6.3). 
      
        *   Structured application data. Structured application data 
            such as URLs and email addresses MUST be processed 
            according to the rules which govern those data formats. 
            Applications MUST NOT perform any conversion or 
            transliteration which is not explicitly prescribed by the 
            governing documents, since non-standard usages are likely 
            to result in misinterpreted data. 
      
        *   Unstructured application data. Domain names which appear as 
            unstructured data in application content are beyond the 
            control of this specification, and are generally subject to 
            the encoding and formatting desires of the end-users who 
            created the data. Generally speaking, it is RECOMMENDED 
            that applications allow users to enter or view documents in 
            whatever format they prefer, but that any conversion 
            between multiple source and destination charsets and 
            encodings use UCS as the translation intermediary, such 
            that internationalized domain names are properly converted 
            along with the rest of the application data. 
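     A minimal Python sketch of the recommended conversion path 
     follows, assuming Python's standard codecs; the function name 
     transcode and the sample text are illustrative only. Decoding 
     yields a string of UCS code points, which is then re-encoded, so 
     the domain name embedded in the text is carried across intact: 

```python
def transcode(data: bytes, source: str, destination: str) -> bytes:
    """Convert text between two charsets, using UCS as the intermediary."""
    text = data.decode(source)        # source octets -> UCS code points
    return text.encode(destination)   # UCS code points -> destination octets


message = "mail to info@例え.jp"
shift_jis_bytes = message.encode("shift_jis")
euc_jp_bytes = transcode(shift_jis_bytes, "shift_jis", "euc_jp")
print(euc_jp_bytes.decode("euc_jp") == message)  # True
```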
      
     In some cases, the application will need to probe the resolver 
     before it can use internationalized domain names as data. For 
     example, a participating system may need to determine the 
     internationalized domain name of the local system so that it can 
     provide this data in a protocol-specific banner message, and in 
     these cases, the application will have to communicate with the 
     resolver before this data can be provided. 
      
     Due to the usage-specific nature of internationalized domain names 
     within protocol and application data streams, each development 
     group will have to analyze the restrictions and capabilities which 
     affect their specific services independently. 
      
      
  6.3.    DNS Lookups and Resolver Calls 
      
     One of the most frequent uses for domain names is for lookup 
     operations, such as for locating the IP addresses associated with 
     a specified domain name, determining the domain name associated 
     with a specified IP address, or performing a protocol-specific 
     lookup operation for a specific resource record (such as the MX or 
     SOA resource records associated with a specific domain). 
      
     Since these lookup operations do not directly affect external 
     protocols or data, internationalized domain names can be used for 
     lookup operations at the application's discretion. For example, 
     applications such as ping and netstat only use domain names for 
     display purposes, and can therefore make immediate use of 
     internationalized domain names. Similarly, a protocol may limit 
     its protocol identifiers to STD13 host identifiers, which will 
     require the application to provide internationalized domain names 
     as ACE encoded sequences within that protocol, but any lookup 
     operations which are necessary for those internationalized domain 
     names can still be performed in their native form. In these 
     cases, the protocol operations and lookup operations are separate 
     tasks with separate rules. 
      
     Likewise, applications are not required to use internationalized 
     domain names and internationalized resolver APIs for every lookup. 
     In some cases, it may be more efficient for an application to use 
     internationalized domain names only for lookup operations against 
     connection identifiers, and to use STD13 octet sequences or ACE 
     encoded legacy lookups for domain names which were obtained as 
     protocol or application data (this will be especially true in 
     those cases where the protocol does not yet provide an 
     internationalized domain name data-type). In those cases where an 
     application prefers to use the legacy resolution path, the 
     application MUST use the resolver's legacy APIs. For lookups 
     against internationalized domain names, the application MUST use 
     the resolver's internationalized APIs. 
      
     Note that this specification does not define a mandatory encoding 
     which must be used between the applications and the local 
     resolver. However, resolvers MUST provide at least one encoding 
     which is capable of supporting the entire UCS repertoire of 
     character codes, including character codes which are currently 
     unassigned. Since UTF-8 is the default encoding which is used 
     throughout this specification, it is also RECOMMENDED for use with 
     resolver APIs, although this is not required. Resolvers MAY 
     dictate a local encoding, with the only requirement being support 
     for the entire range of UCS character codes. 
      
     Regardless of the data being provided or the charset or encoding 
     which is used to provide that data, applications MUST normalize 
     and case-convert any internationalized host identifiers which they 
     generate or receive from a lookup operation. This process MUST 
     use the canonical UCS characters of the domain name according to 
     the rules specified in <nameprep> for every host identifier which 
     is sent to or received from a resolver. 
      
     If the application knows that the requested data specifically 
     refers to a host identifier, then the domain name data which is 
     returned by the resolver MUST be normalized and case-converted, 
     and the resulting domain name MUST be compared to the original 
     domain name which was received prior to the normalization and 
     case-conversion steps. If the processed domain name does not match 
     the domain name which was received, the domain name MUST be 
     discarded as malformed. 
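     The validation rule above can be sketched as follows. This sketch 
     assumes that Python's encodings.idna.nameprep function (which 
     implements RFC3491, the published form of the nameprep profile) 
     is an acceptable stand-in for <nameprep>; the helper name 
     is_valid_host_identifier is hypothetical. Each label is prepared 
     again and compared with the label as it was received: 

```python
from encodings.idna import nameprep  # RFC 3491 nameprep (IDNA 2003)


def is_valid_host_identifier(domain: str) -> bool:
    """Return True if `domain` is unchanged by nameprep, label by label.

    Hypothetical helper: a host identifier returned by the resolver is
    prepared again and compared with what was received; any difference
    (or a prohibited character) marks the name as malformed.
    """
    name = domain[:-1] if domain.endswith(".") else domain  # drop root dot
    for label in name.split("."):
        if not label:
            return False  # empty label, e.g. "a..b"
        try:
            prepared = nameprep(label)
        except UnicodeError:
            return False  # prohibited character or bidi violation
        if prepared != label:
            return False  # case or normalization differs from the original
    return True


print(is_valid_host_identifier("bücher.example"))  # True
print(is_valid_host_identifier("Bücher.example"))  # False (not case-converted)
print(is_valid_host_identifier("ﬁle.example"))     # False (ligature not normalized)
```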
      
     This step is necessary in order to ensure the integrity and 
     veracity of internationalized domain names which are processed by 
     applications, since there are multiple opportunities for errors to 
     be introduced (such as mistyped entries in the resolver's hosts 
     database, or malicious data which has been purposefully provided 
     in a zone), and these errors can result in sensitive data being 
     directed to the wrong network. Note that the above rule 
     specifically applies to host identifiers and not to all 
     internationalized domain names as a whole; applications MUST NOT 
     arbitrarily normalize and case-convert any and all domain names, 
     but MUST apply these steps to any and all domain names which are 
     known to be used as host identifiers. 
      
     As part of the processing rules for DNS lookups, it is expected 
     that an application can exchange internationalized domain names 
     with the resolver using a charset or encoding which is capable of 
     representing the entire UCS character code range. Towards this 
     objective, applications SHOULD test the capabilities of the 
     resolver prior to transferring internationalized domain names. In 
     those situations where the resolver is unable to support this 
     usage, the application MUST encode the internationalized domain 
     name as STD13 octet sequences or ACE, and pass the resulting STD13 
     host identifier to the resolver. 
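     The capability test and fall-back described above might look like 
     the following Python sketch. The resolver interface 
     (supports_wide, lookup_wide, lookup_legacy) is entirely 
     hypothetical, since this document does not define API names, and 
     Python's "idna" codec is assumed as the ACE: 

```python
from typing import List, Protocol


class Resolver(Protocol):
    """Hypothetical resolver interface; this document defines no API names."""

    def supports_wide(self) -> bool: ...
    def lookup_wide(self, name: str) -> List[str]: ...
    def lookup_legacy(self, name: str) -> List[str]: ...


def resolve_host(resolver: Resolver, name: str) -> List[str]:
    """Look up `name`, falling back to an ACE legacy lookup if needed."""
    if resolver.supports_wide():
        # The resolver accepts the full UCS range: pass the name natively.
        return resolver.lookup_wide(name)
    # Assumption: the IDNA 2003 codec stands in for this document's ACE.
    ace_name = name.encode("idna").decode("ascii")
    return resolver.lookup_legacy(ace_name)


class LegacyOnlyResolver:
    """Toy resolver without internationalized support; echoes the query."""

    def supports_wide(self) -> bool:
        return False

    def lookup_wide(self, name: str) -> List[str]:
        raise NotImplementedError("no internationalized API")

    def lookup_legacy(self, name: str) -> List[str]:
        return [name]


print(resolve_host(LegacyOnlyResolver(), "bücher.example"))
# ['xn--bcher-kva.example']
```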
      
      
  7.      Resolver Guidelines 
      
     Resolvers play a crucial role in the use of internationalized 
     domain names, in that they provide the internationalized namespace 
     which applications work with. As part of this service, resolvers 
     provide encapsulation services for the internationalized domain 
     names which are exchanged with the applications, resolve queries 
     in the internationalized namespace on behalf of the applications, 
     and provide lookup matching for entries which are stored in a 
     local hosts database. Note that resolvers which cache answer data 
     for subsequent operations are also governed by the caching 
     restrictions provided in section 9. 
      
      
  7.1.    Resolver APIs 
      
     Stub resolvers which communicate directly with applications that 
     are compliant with this specification are strongly encouraged to 
     provide a separate set of APIs for those applications to use 
     whenever internationalized domain names need to be provided in 
     queries or response messages. 
      
     The use of an internationalized API will generally facilitate 
     smoother operations for the applications, in that it will allow 
     the application to determine the capabilities of the resolver, to 
     obtain the internationalized domain name of the local system, and 
     to process queries for internationalized domain names as special 
     data types. 
      
     Furthermore, the use of internationalized versus legacy APIs 
     provides a way for resolvers to separate internationalized and 
     legacy application query paths, such that the legacy APIs only 
     result in STD13 legacy labels, while the internationalized APIs 
     generate and trigger EDNS/UTF-8 extended labels. The output 
     formatting of DNS messages is controlled by tight restrictions, 
     and the use of separate APIs will likely result in simpler 
     resolver implementations. 
      
     For example, it is suggested that applications use the 
     internationalized APIs for all of the DNS lookups they generate, 
     even if the domain name only contains seven-bit characters. This 
     is necessary in case the queried domain name only exists as a 
     CNAME or PTR resource record which references an internationalized 
     domain name, since the server has to know which encoding to use 
     for that query. If the application had not used the 
     internationalized API for the original lookup of the domain name, 
     the resolver may have chosen the wrong label type, and the 
     response data would then only be returned as ACE encoded data. 
      
     Conversely, when older applications generate malformed eight-bit 
     queries through the legacy APIs, those queries will be properly 
     rejected by the DNS servers, preventing undue problems with these 
     applications. For example, an older application may process an 
     internationalized domain name through the system-default charset 
     or encoding (such as MacRoman), which would result in the domain 
     name being malformed when the application tried to do something 
     important with that domain name (such as send an email message 
     over SMTP). The use of separate APIs causes these misbehaving 
     applications to break, and the invalid domain names are kept out 
     of the application protocol space. 
      
     Internationalized APIs are optional to the extent that an 
     application MAY use an embedded resolver which is known to be 
     capable of generating and processing internationalized domain 
     names through the existing function calls. However, the use of 
     separate APIs for internationalized domain names is encouraged. 
      
     Although this document does not mandate any specific APIs, the 
     following functions SHOULD be provided for in some form: 
      
        *   Test Wide. Applications MUST be able to test the resolver 
            for compliance with this specification. In those cases 
            where this function is performed by some other function 
            (such as one of the following), the capabilities of the 
            resolver MUST be detectable even if the requested operation 
            fails. For example, if an application issues a call for the 
            internationalized domain name of the local system, the 
            capability of the resolver to handle internationalized 
            domain names MUST be uniquely represented even if the local 
            host name cannot be determined. 
      
        *   Get Wide X-By-Y. Applications SHOULD be able to specify any 
            resource record associated with any internationalized 
            domain name as part of a lookup operation. Whether this 
            service is provided as a series of lookup-specific APIs or 
            as a general purpose API is up to the resolver. 
      
        *   Get Wide Local Name. Applications which utilize 
            internationalized domain names as data will need to be able 
            to determine the internationalized form of their local 
            system name for some operations (such as a protocol-
            specific welcome banner). When this function is called, the 
            resulting data MUST be provided as the canonical UCS 
            character code values, or their equivalent as represented 
            by a locally mandated charset or encoding. 
      
            Note that an ACE equivalent of the system name SHOULD be 
            returned when the relevant legacy API is queried. In those 
            cases where the legacy and internationalized domain names 
            both contain seven-bit character codes (possibly because 
            the host name is only available in US-ASCII, or because the 
            host name was assigned as ACE by an external configuration 
            service), the internationalized host name MUST still be 
            accessible through the internationalized function. 
      
     Note that this document does not mandate a charset or encoding 
     which must be used by the resolver APIs. However, wherever an 
     internationalized API is presented, the resolver MUST utilize a 
     charset or encoding which supports the entire UCS repertoire of 
     character codes, including character codes which are currently 
     unassigned. Since UTF-8 is the default encoding for most of the 
     operations specified in this document, it is also RECOMMENDED for 
     this service, but is not required. 
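     As a rough illustration, the three functions listed above could 
     be expressed as the following Python interface. All class and 
     method names here are hypothetical, the StaticResolver class is a 
     toy backed by a fixed table, and Python's "idna" codec is assumed 
     as the ACE for the legacy local-name call: 

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class WideResolverAPI(ABC):
    """Hypothetical shape of the functions listed above (names invented)."""

    @abstractmethod
    def test_wide(self) -> bool:
        """'Test Wide': report support for internationalized names."""

    @abstractmethod
    def get_wide(self, name: str, rrtype: str) -> List[str]:
        """'Get Wide X-By-Y': any RR type for any internationalized name."""

    @abstractmethod
    def get_wide_local_name(self) -> Optional[str]:
        """'Get Wide Local Name': UCS text, or None if undeterminable."""

    def get_local_name(self) -> Optional[str]:
        """Legacy counterpart: the ACE equivalent of the local name."""
        wide = self.get_wide_local_name()
        if wide is None:
            return None
        # Assumption: the IDNA 2003 codec stands in for this document's ACE.
        return wide.encode("idna").decode("ascii")


class StaticResolver(WideResolverAPI):
    """Toy implementation backed by a fixed table, for illustration only."""

    def __init__(self, local_name: Optional[str],
                 records: Dict[Tuple[str, str], List[str]]) -> None:
        self._local_name = local_name
        self._records = records

    def test_wide(self) -> bool:
        # Reported independently of whether the local name is known.
        return True

    def get_wide(self, name: str, rrtype: str) -> List[str]:
        return self._records.get((name, rrtype), [])

    def get_wide_local_name(self) -> Optional[str]:
        return self._local_name


resolver = StaticResolver("bücher.example",
                          {("bücher.example", "A"): ["192.0.2.1"]})
print(resolver.get_wide("bücher.example", "A"))  # ['192.0.2.1']
print(resolver.get_local_name())                  # xn--bcher-kva.example
```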
      
      
  7.2.    Query Processing Services 
      
     Resolvers which are compliant with the recommendations provided in 
     this specification will provide two query paths, one of which 
     supports STD13 domain names and another which supports 
     internationalized domain names. Technically, there is no 
     requirement for two processing paths, although these paths will 
     likely exist as conceptual paths even if they are not represented 
     or implemented uniquely in all resolvers. 
      
     The legacy processing path is defined by STD13. This document does 
     not update, modify or extend the rules that resolvers operate 
     under when an STD13 compliant domain name is received by a legacy 
     application through any legacy APIs which may exist. However, when 
     an internationalized domain name is received from an 
     internationalized application through any internationalized APIs, 
     the processing rules defined in this section MUST be followed. 
     Note that these rules apply to all resolvers, whether they are 
     stub resolvers, forwarders or caching servers. 
      
     Generally speaking, the internationalized domain name resolution 
     process has two major components: processing internationalized 
     domain names as queries, and performing fall-back processing if an 
     EDNS/UTF-8 query is rejected by an authoritative server. 
      
      
  7.2.1.  Internationalized queries 
      
     Queries for internationalized domain names which are received 
     through internationalized APIs can be expected to have originated 
     at an application which is capable of accepting and processing 
     internationalized domain names in the response messages. 
      
     Resolvers MUST encode the labels from the queried domain name as 
     UTF-8 and encapsulate the resulting encoded labels into EDNS/UTF-8 
     extended labels for transfer within DNS messages, per the 
     instructions provided in section 5.1. 
      
     Any and all responses to these queries will also be encoded as 
     UTF-8 and encapsulated in EDNS/UTF-8 extended labels. Resolvers 
     MUST decode the provided response data, convert the labels to 
     their canonical UCS character codes, and return the requested data 
     to the calling application. 
      
     The resolver MUST NOT normalize or case-convert internationalized 
     domain names which may be received in queries or response 
     messages. Since the queries have originated from applications 
     which have indicated that they are compliant with this 
     specification (via the API) while the responses will have 
     originated from caches or servers which indicate that they are 
     also compliant (via the EDNS/UTF-8 extended labels), those systems 
     are assumed to have normalized and case-converted the domain names 
     before they were generated or stored. Also note that applications 
     will validate the host identifiers that they receive in response 
     messages, so an additional check is expected to be performed on 
     the answer data by those systems. 
      
      
  7.2.2.  Fall-back processing 
      
     If a queried server is unable to process EDNS/UTF-8 extended 
     labels, then it is required by STD13 to generate an error 
     signifying the problem. Resolvers MUST interpret these errors, 
     decode the UTF-8 queried domain name, re-encode it as STD13 octets 
     and/or ACE per the instructions provided in section 5.2, and then 
     reissue the query as an STD13 legacy label sequence. 
      
     The legacy DNS error responses which will trigger this series of 
     events are FORMERR and NOTIMPL. Any other errors indicate that the 
     EDNS/UTF-8 extended label was successfully processed but that the 
     query was not matched, and those errors MUST be returned to the 
     application. If the fallback processing results in any error 
     responses whatsoever, then the resolver MUST return those errors 
     to the calling application. 
      
     Any servers which subsequently receive the fall-back queries and 
     which are compliant with this specification will process the 
     queries as internationalized domain names, and will return the 
     answer data as STD13 octet sequences or ACE encoded data, using 
     the STD13 legacy label. 
      
     Generally speaking, fall-back processing serves two purposes: 
      
        *   Answering the initial query. If a UTF-8 domain name cannot 
            be resolved because a server in the delegation path does 
            not understand the EDNS/UTF-8 label type, the resolver can 
            reissue the query as an ACE encoded legacy label type so 
            that the query proceeds past the problematic server. 
      
        *   Seeding the resolver's cache. As a result of the above, the 
            resolver will learn about the authoritative name servers 
            for the target zone, and this information can be used for 
            any subsequent queries for domain names within the 
            specified zone (for as long as the data is cached, anyway). 
            As such, any subsequent EDNS/UTF-8 queries which are issued 
            for the portion of the namespace served by that zone will 
            be sent directly to one of those authoritative servers 
            where they can be answered directly. In this regard, 
            subsequent lookups do not require fall-back processing if 
            they are received during the cache window. 
      
     Regardless of whether or not fall-back processing has been 
     performed, if the calling application issued the original query as 
     an internationalized domain name, then the resolver MUST respond 
     to the query in that form as well. This means that the resolver 
     MUST convert any STD13 octet sequences or ACE encoded labels into 
     their canonical UCS characters, convert the answer data into the 
     resolver's native charset or encoding, and return the data to the 
     calling process. The resolver MUST NOT perform any normalization 
     or case-conversion during this process, as such an action can 
     corrupt domain names which are not used for host identifiers. 
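     The ACE-to-UCS conversion described above can be sketched as 
     follows, assuming an IDNA-style ACE with the "xn--" prefix and a 
     Punycode payload (this document does not fix the ACE). The helper 
     name ace_to_ucs is hypothetical; it calls Python's "punycode" 
     codec directly so that no nameprep step is involved, and it leaves 
     legacy labels exactly as received: 

```python
ACE_PREFIX = "xn--"  # assumption: IDNA-style prefix; this draft's ACE may differ


def ace_to_ucs(domain: str) -> str:
    """Convert ACE labels back to UCS text without normalizing anything.

    Labels without the ACE prefix are returned exactly as received, so
    no case-conversion is applied to legacy labels either.
    """
    converted = []
    for label in domain.split("."):
        if label[:len(ACE_PREFIX)].lower() == ACE_PREFIX:
            payload = label[len(ACE_PREFIX):]
            # Decode the Punycode payload only; no nameprep step is run.
            converted.append(payload.encode("ascii").decode("punycode"))
        else:
            converted.append(label)
    return ".".join(converted)


print(ace_to_ucs("xn--bcher-kva.Example"))  # bücher.Example
```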
      
     If the original query was received through the resolver's legacy 
     APIs, then the query MUST be generated and returned in the legacy 
     format, and MUST NOT be converted to an internationalized domain 
     name prior to the query or response being passed through. 
      
     Once fall-back processing occurs, the process MUST NOT be repeated 
     for any additional queries in the current lookup operation: any 
     remaining queries for that lookup operation MUST NOT be sent as 
     EDNS/UTF-8 extended labels, since multiple fall-back operations 
     can result in time-outs on the client systems. 
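     The fall-back rules above can be sketched as follows. This is a 
     hypothetical Python model rather than a resolver implementation: 
     the send callable stands in for the network transport, only the 
     response code is modelled, the RCODE values are those defined by 
     STD13, and Python's "idna" codec is assumed as the ACE. The 
     LookupState object ensures that fall-back happens at most once per 
     lookup operation: 

```python
from dataclasses import dataclass
from typing import Callable

# RCODE values from STD13 (RFC 1035, section 4.1.1).
NOERROR = 0
FORMERR = 1
NXDOMAIN = 3
NOTIMP = 4  # spelled "NOTIMPL" in the text above

FALLBACK_RCODES = {FORMERR, NOTIMP}


@dataclass
class LookupState:
    """State shared by all queries of one lookup operation."""
    fallen_back: bool = False


def query_with_fallback(name: str,
                        send: Callable[[str, bool], int],
                        state: LookupState) -> int:
    """Issue one query for `name` and return the final RCODE.

    `send(qname, utf8)` is a hypothetical transport: utf8=True stands for
    an EDNS/UTF-8 extended label query, utf8=False for an STD13 legacy
    label query.
    """
    if not state.fallen_back:
        rcode = send(name, True)
        if rcode not in FALLBACK_RCODES:
            return rcode  # answered, or a real error for the application
        # Fall back once; the rest of this lookup uses legacy labels only.
        state.fallen_back = True
    # Assumption: the IDNA 2003 codec stands in for this document's ACE.
    ace_name = name.encode("idna").decode("ascii")
    return send(ace_name, False)  # legacy errors are returned as-is


# Usage with a toy server that rejects extended labels with FORMERR.
sent = []


def fake_send(qname: str, utf8: bool) -> int:
    sent.append((qname, utf8))
    return FORMERR if utf8 else NOERROR


state = LookupState()
first = query_with_fallback("bücher.example", fake_send, state)
second = query_with_fallback("www.bücher.example", fake_send, state)
```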
      
     Because the fall-back process results in two lookups being issued 
     against the rejecting zone, eliminating the fall-back processing 
     as soon as possible will be an operational requirement for many 
     organizations. Any caches or forwarders which are used by stub 
     resolvers within an end-user network are practically required to 
     be able to process the EDNS/UTF-8 queries, since those servers 
     will receive every query which is issued by the stub resolvers. 
     While this isn't a technical requirement (fall-back processing 
     will get around the problematic servers), it will likely prove to 
     be a consideration for network operators looking to support 
     internationalized domain names on their local networks. 
      
     This document also strongly encourages the root and TLD servers to 
     be upgraded as soon as possible (even if they do not intend to 
     directly provide UTF-8 domain name delegations), in order to allow 
     those servers to read and process the EDNS/UTF-8 extended labels, 
     thereby reducing the number of fall-back queries which are sent to 
     those servers. 
      
      
  7.3.    The Hosts Database 
      
     Generally speaking, there are two areas of consideration for stub 
     resolvers that provide local hosts databases for name resolution 
     services. These are the input requirements for internationalized 
     domain names which will be added to the hosts database, and the 
     requirements which govern how queries will be compared to the 
     entries in the hosts database. 
      
     Note that resolvers are not required to implement a hosts database 
     or local lookup services (STD3 says "a host MAY also implement a 
     host name translation mechanism that searches a local Internet 
     host table"). However, wherever a hosts database is provided with 
     an internationalized resolver, compliance with the rules specified 
     in this section is required. 
      
     If a stub resolver offers the capability to compare 
     internationalized domain names against a local hosts database, 
     that database MUST be compatible with the internationalized domain 
     name rules specified in section 4 of this document. 
      
     In particular, the resolver SHOULD allow internationalized domain 
     names with any code values to be stored, even if the canonical UCS 
     characters for those values are undefined or are illegal for use 
     with internationalized host identifiers (this is required to 
     support domain names which are not host identifiers). In those 
     cases where an internationalized domain name specifies an exact 
     sequence of octets for binary comparison, the hosts database MUST 
     provide a mechanism for tagging the eight-bit characters so that 
     they are not interpreted, processed or compared as the canonical 
     UCS character equivalents of those codes. 
      
     However, entries which explicitly provide host identifiers MUST be 
     normalized and case-converted prior to being stored. In order to 
     satisfy both of these requirements, it is RECOMMENDED that hosts
     databases store internationalized host identifiers as untagged
     data, but that they also provide a tagging mechanism for
     character code values which are to be returned as-is. STD13
     defines an escaping mechanism whereby the decimal value of the
     octet is prefaced with a reverse-solidus (such as "\193"), which
     is suggested for this usage.
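     The following Python fragment is a minimal sketch, not a
     normative algorithm, of how a hosts database might apply both
     rules to a stored label. It assumes that tagged octets are
     written with the STD13 "\DDD" escape (other STD13 escapes such as
     "\." are not handled), and it approximates the normalization and
     case-conversion rules with Unicode NFKC and str.casefold() rather
     than the full [NAMEPREP] profile. The function prepare_label() and
     its return format are hypothetical.

```python
import re
import unicodedata

# STD13 escape: a reverse-solidus followed by exactly three decimal digits.
_ESCAPE = re.compile(r"\\([0-9]{3})")


def _prepare_text(text: str) -> str:
    # Approximation of the normalization and case-conversion rules.
    return unicodedata.normalize("NFKC", text).casefold()


def prepare_label(label: str) -> list[tuple[str, object]]:
    """Split a hosts-file label into ("text", str) and ("octet", int) parts.

    Untagged text is normalized and case-folded before storage; tagged
    octets are returned untouched so they can be compared as exact
    binary values instead of as their canonical UCS equivalents.
    """
    parts: list[tuple[str, object]] = []
    pos = 0
    for match in _ESCAPE.finditer(label):
        if match.start() > pos:
            parts.append(("text", _prepare_text(label[pos:match.start()])))
        value = int(match.group(1))
        if value > 255:
            raise ValueError(f"escape \\{match.group(1)} is not an octet value")
        parts.append(("octet", value))
        pos = match.end()
    if pos < len(label):
        parts.append(("text", _prepare_text(label[pos:])))
    return parts
```

     With this sketch, the hosts entry "Caf\233" yields
     [("text", "caf"), ("octet", 233)], so octet 233 is kept for
     binary comparison rather than being treated as U+00E9.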
      
     The storage format of the hosts database MAY use any charset or 
     encoding the resolver deems most suitable for that platform, as 
     long as the rules and restrictions provided above are followed. 
     Since UTF-8 is used as the default encoding throughout this
     specification, it is RECOMMENDED as the default encoding for hosts 
     databases as well, although this is not required. 
      
     Not all applications which use a resolver will be compliant with
     this specification, so resolvers MUST be able to interpret and
     process any queries from the legacy APIs which provide the ACE
     equivalent of an internationalized domain name that is stored in
     the hosts database. When such a
     query arrives, the domain name MUST be converted to the canonical 
     UCS character codes represented by the ACE encoded sequence and 
     compared to entries in the hosts database in that form (tagged 
     octets excluded). Any internationalized domain names which are 
     required to be returned through the legacy APIs MUST be converted 
     to STD13 octet sequences and/or ACE before they are returned. 
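     As an illustrative sketch only, the Python fragment below shows a
     legacy ACE query being converted to canonical UCS labels before
     it is compared to a hosts database, and a stored name being
     converted back to ACE for a legacy API. It assumes a hypothetical
     ACE prefix of "xn--" (this section does not define one) and uses
     Python's built-in "punycode" codec as a stand-in for [AMC-ACE-Z].
     Plain ASCII labels are compared case-neutrally and decoded UCS
     labels exactly, mirroring the comparison rules used elsewhere in
     this document. HOSTS and the function names are hypothetical.

```python
# Hypothetical ACE prefix; the "punycode" codec stands in for AMC-ACE-Z.
ACE_PREFIX = "xn--"

# Hypothetical hosts database keyed by canonical UCS labels.
HOSTS = {("münchen", "example", "com"): "192.0.2.1"}


def legacy_query_key(name: str) -> tuple[str, ...]:
    """Convert an ACE domain name from a legacy API into UCS labels."""
    name.encode("ascii")  # legacy APIs pass ASCII only; raises otherwise
    labels = []
    for label in name.rstrip(".").split("."):
        if label.lower().startswith(ACE_PREFIX):
            # Decoded UCS characters are compared as-is (not normalized).
            ace = label[len(ACE_PREFIX):].encode("ascii")
            labels.append(ace.decode("punycode"))
        else:
            # Plain ASCII labels are compared case-neutrally.
            labels.append(label.lower())
    return tuple(labels)


def lookup_legacy(name: str):
    """Look up an ACE name in the hosts database; None if absent."""
    return HOSTS.get(legacy_query_key(name))


def ucs_to_ace_name(labels: tuple[str, ...]) -> str:
    """Convert UCS labels to ACE before returning them to a legacy API."""
    out = []
    for label in labels:
        if label.isascii():
            out.append(label)
        else:
            out.append(ACE_PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(out) + "."
```

     In this sketch, lookup_legacy("xn--mnchen-3ya.Example.COM.")
     returns "192.0.2.1", and ucs_to_ace_name(("münchen", "example",
     "com")) returns "xn--mnchen-3ya.example.com.".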
      
      
  8.      Server Guidelines 
      
     When a zone administrator desires to provide internationalized 
     domain names in a zone, they are presented with two options: they 
     can add the STD13 octets or ACE encoded internationalized domain 
     names to an existing zone, or they can use internationalized zone 
     databases directly. Both of these usage scenarios have their own 
     benefits and restrictions. 
      
     Using STD13 octet sequences and ACE with legacy servers allows for 
     the immediate deployment of internationalized domain names on 
     existing servers, and within hierarchies which include 
     internationalized domain names. However, any such queries which 
     originate at applications that are compliant with this 
     specification will always initially fail, guaranteeing that fall-
     back processing will always occur for those zones. 
      
     Conversely, using internationalized zones directly allows servers 
     to process legacy, ACE and EDNS/UTF-8 queries equally, thereby 
     providing greater value to the applications and resolvers which 
     have been made compliant with this specification. However,
     internationalized zones have additional requirements (most
     notably, all of the servers which replicate such a zone are
     required to be upgraded simultaneously, as described in section
     8.1), and these may prove burdensome to some zone operators.
      
     This specification focuses on the processing requirements for 
     internationalized zones which support the use of internationalized 
     domain names as explicit data, and which also support the 
     necessary subordinate mechanisms such as EDNS/UTF-8 queries. When 
     STD13 octet sequences or ACE encoded domain names are used with
     legacy servers, the rules defined in STD13 for those servers MUST 
     be used. 
      
     Note that each zone SHOULD be configurable independently. If a 
     server hosts multiple zones, each of those zones SHOULD be 
     operable as independent entities, with any of them using ACE or 
     internationalized domain names as necessary. This rule is 
     necessary since each zone is likely to have different replication 
     partners and configuration rules which will require different 
     migration strategies. 
      
      
  8.1.    Internationalized Zones 
      
     All domain names which are published by an internationalized zone 
     MUST be compatible with the restrictions specified in section 4 of 
     this document. In particular, the zone database MUST allow binary
     domain names to contain any octet value, but MUST also comply
     with the normalization and case-mapping rules when a domain name
     represents a host identifier. These restrictions MUST be applied
     as part of the process in which the domain name is being added to
     the zone database. In those cases where an internationalized
     domain name specifies an exact sequence of octets for binary
     comparison, the zone database MUST provide a mechanism for
     tagging the eight-bit characters so that they are not interpreted,
     processed or compared as the canonical UCS character equivalents
     of those codes. STD13 defines an escaping mechanism whereby the
     decimal value of the octet is prefaced with a reverse-solidus
     (such as "\193"), which is suggested for this usage.
      
     Servers which are compliant with this specification MUST be
     capable of providing UTF-8 and ACE encoded representations of the
     UCS domain names which are stored in the zone. Servers MUST also
     restrict output to only one label type for any protocol
     operation: queries containing STD13 legacy labels MUST be answered
     with STD13 octet sequences and/or ACE encoded domain names, while
     EDNS/UTF-8 queries MUST only be answered with UTF-8 encoded domain
     names. This rule applies not only to basic operations such as
     simple queries, but also to advanced operations such as zone
     transfers (see section 8.2). Similarly, external operations such
     as exporting the contents of the zone to a master file (as
     discussed in section 8.3) MUST use a single encoding form for
     that specific operation.
      
     Note that the underlying zone database technology which may be 
     employed by any particular server is beyond the scope of this
     document. Servers MAY use any database technology, charset or 
     encoding deemed appropriate for the local environment, although 
     the contents of the zone MUST be mapped to the canonical UCS 
     character codes for all comparison operations (octet values 
     excluded). Since UTF-8 is used as the default encoding throughout 
     this specification, it is RECOMMENDED for use as the default 
     encoding with zone databases as well, but is not required. 
      
     Servers MUST NOT normalize or case-map any UCS characters which 
     are decoded from UTF-8 or ACE encoded labels, and MUST restrict 
     comparison operations of these labels to precise matches of the 
     UCS domain names which are stored in the zone database. However, 
     the seven-bit character codes from any labels which are received
     as STD13 octet sequences MUST be compared in a case-neutral form,
     and MUST NOT be normalized as part of the comparison operation.
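     The comparison rules above might be sketched in Python as
     follows. The sketch assumes that UTF-8 and ACE labels have
     already been decoded to UCS strings, and that each octet N of an
     STD13 octet sequence maps to the UCS character U+00NN; only the
     seven-bit letters A-Z are folded. The function labels_match() and
     its origin values are hypothetical.

```python
import string

# Fold only the seven-bit letters A-Z; eight-bit codes are untouched.
_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def labels_match(stored: str, received, origin: str) -> bool:
    """Compare a received label against a stored UCS label.

    origin is "utf8" or "ace" for labels already decoded to str, or
    "std13" for a raw legacy octet sequence passed as bytes.
    """
    if origin in ("utf8", "ace"):
        # Precise UCS match: no normalization and no case-mapping.
        return stored == received
    if origin == "std13":
        # Assumption: octet N corresponds to the UCS character U+00NN.
        as_ucs = received.decode("latin-1")
        # Case-neutral for seven-bit codes only, and never normalized.
        return stored.translate(_FOLD) == as_ucs.translate(_FOLD)
    raise ValueError(f"unknown label origin: {origin!r}")
```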
      
     When a zone is converted to support internationalized domain 
     names, all of the servers which replicate that zone MUST be 
     upgraded. This is required because of the ambiguities (described
     in detail in section 4.1.2) which can occur when a label only uses
     character codes from the eight-bit range, and may therefore be
     encoded as either an STD13 octet sequence or ACE data. In order to
     ensure that all of the servers for a zone respond to such queries
     consistently and correctly, all of the servers which replicate the
     zone MUST fully support this document and its requirements.
      
      
  8.2.    Namespace Visibility Restrictions 
      
     In all cases, the encoding format of the domain names which are 
     returned in response to a query MUST be the same as the encoding 
     format which was used by the query. If the query was provided as a 
     sequence of legacy labels, then all of the domain names which are 
     provided in the response message MUST be provided as legacy labels 
     (containing either ACE or STD13 octet encoded values). 
      
     Similarly, if a query is provided as EDNS/UTF-8 encoded data, all 
     domain names which are provided in the response message MUST be 
     provided as UTF-8 encoded data in EDNS/UTF-8 extended labels. In 
     some situations, this process may require the server to perform an 
     extra conversion. 
      
     For example, assume that the <idn>.example.com. domain name has 
     two associated MX resource records, one of which points to the UCS 
     domain name of mail.<idn>.example.com, while the other points to
     the ACE encoded domain name of mail.<ace>.example.net. (where the 
     "<ace>" label is the ACE equivalent of an internationalized sub-
     domain in the example.net. zone). If a UTF-8 query arrives for the 
     MX resource records associated with the <idn>.example.com. domain 
     name, both resource records MUST be returned as EDNS/UTF-8 data. 
     In order for this requirement to be satisfied, the server will 
     have to decode the <ace> label to its UCS canonical form for zone 
     storage purposes, and encode the domain name as UTF-8 for 
     transmission whenever an EDNS/UTF-8 answer set is required. 
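     A minimal Python sketch of this example follows, assuming that
     "bücher" stands in for <idn> and "münchen" for the label behind
     <ace>, a hypothetical ACE prefix of "xn--", and Python's
     "punycode" codec in place of [AMC-ACE-Z]. Both MX targets are
     stored as UCS labels and are encoded only when an answer set is
     built. The legacy path here emits ACE rather than STD13 octet
     sequences, and the returned values are label contents without
     length octets or extended label headers. All names are
     hypothetical.

```python
ACE_PREFIX = "xn--"  # hypothetical; "punycode" stands in for AMC-ACE-Z


def to_ucs(name: str) -> tuple[str, ...]:
    """Decode any ACE labels so the name is stored in canonical UCS form."""
    labels = []
    for label in name.rstrip(".").split("."):
        if label.lower().startswith(ACE_PREFIX):
            ace = label[len(ACE_PREFIX):].encode("ascii")
            labels.append(ace.decode("punycode"))
        else:
            labels.append(label)
    return tuple(labels)


def render(labels: tuple[str, ...], query_type: str) -> list[bytes]:
    """Encode stored UCS labels using the single form the query requires."""
    if query_type == "edns-utf8":
        return [label.encode("utf-8") for label in labels]
    if query_type == "legacy":
        return [
            label.encode("ascii") if label.isascii()
            else (ACE_PREFIX + label.encode("punycode").decode("ascii")).encode("ascii")
            for label in labels
        ]
    raise ValueError(f"unknown query type: {query_type!r}")


# Hypothetical MX targets for <idn>.example.com., as loaded into the zone.
MX_TARGETS = [
    to_ucs("mail.bücher.example.com."),          # entered as a UCS name
    to_ucs("mail.xn--mnchen-3ya.example.net."),  # entered as an ACE name
]


def answer_mx(query_type: str) -> list[list[bytes]]:
    """Build the MX answer set in exactly one encoding form."""
    return [render(target, query_type) for target in MX_TARGETS]
```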
      
     The visibility rules specified in this section are mandatory for 
     every domain name which is provided in any message. If a system 
     requests a zone transfer and uses the EDNS/UTF-8 extended label 
     type in the request, all of the domain names in all of the 
     messages which are sent as part of the zone transfer MUST be 
     provided in their UTF-8 encoded form. Similarly, if a zone 
     transfer is requested and uses the legacy label type, then all of 
     the domain names from all of the messages which are sent as part 
     of the zone transfer MUST be provided as either STD13 octet 
     sequences or ACE encoded data, using the legacy label type. 
      
      
  8.3.    The Master File Format 
      
     STD13 specifies a "master file" format which is used as a 
     platform-neutral storage and transfer format for importing and 
     exporting the contents of a particular zone. Note that the master 
     file is not the same as the operating database for a zone; the 
     master file format is used for copying a zone to another server,
     storing a copy of the zone database off-line, emailing a copy of
     the zone to another user or system, and performing other off-line
     actions against the database contents.
     Once a zone is loaded on a server, however, any database 
     technology can be used for managing the zones and generating 
     response messages. 
      
     In order to facilitate the continued use of master files, any zone 
     which is compliant with this specification MUST support the use of 
     UTF-8 as an import and export encoding format for the master file 
     associated with that zone. 
      
     Furthermore, compliant versions of a master file are required to 
     have the "$UTF-8" control literal at the beginning of the first 
     line of text in the master file if it contains UTF-8 encoded data. 
     Master files from zones which do not contain UTF-8 encoded domain
     names MUST NOT contain the "$UTF-8" control literal in the first 
     print position of any line. 
      
     If the master file contains the "$UTF-8" control literal, all of 
     the data within the master file MUST be encoded in UTF-8 as 
     specified by RFC2279, and SHOULD be managed with UTF-8 compliant 
     tools (such as UTF-8 text editors, mailers that support UTF-8 MIME 
     encodings, and so forth). 
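     The two master file rules above could be checked by an import
     tool along the lines of the following Python sketch. It only
     tests the placement of the "$UTF-8" control literal and the
     validity of the UTF-8 data; it does not parse resource records,
     and check_master_file() is a hypothetical name.

```python
MARKER = b"$UTF-8"


def check_master_file(data: bytes) -> bool:
    """Return True if the master file is declared UTF-8, else False.

    Raises ValueError if the "$UTF-8" control literal rules are broken.
    """
    lines = data.splitlines()
    declared = bool(lines) and lines[0].startswith(MARKER)
    if declared:
        # All data in a declared file must be valid UTF-8 (RFC 2279).
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError(f"declared $UTF-8 but not valid UTF-8: {exc}") from exc
        return True
    # Without the declaration, no line may start with the literal.
    for number, line in enumerate(lines, start=1):
        if line.startswith(MARKER):
            raise ValueError(f"line {number}: $UTF-8 must be on the first line")
    return False
```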
      
      
  9.      Caching Guidelines 
      
     Whenever an internationalized domain name is stored in a cache, it 
     MUST be stored in its canonical UCS character code form, 
     regardless of whether the domain name was received as STD13 octet
     sequences, UTF-8, or ACE data. Caches MUST NOT normalize or
     case-convert any domain names that they store, as such a process
     could invalidate domain names that are not used for host
     identifiers.
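     As a sketch under stated assumptions, the Python fragment below
     maps each received label form to the same canonical UCS key
     without normalizing or case-converting it. It assumes a
     hypothetical ACE prefix of "xn--", Python's "punycode" codec in
     place of [AMC-ACE-Z], and that each STD13 octet N maps to U+00NN;
     wire_label_to_ucs(), cache_store() and the cache dictionary are
     hypothetical.

```python
ACE_PREFIX = b"xn--"  # hypothetical; "punycode" stands in for AMC-ACE-Z


def wire_label_to_ucs(octets: bytes, utf8_label: bool) -> str:
    """Map a received label to canonical UCS without normalizing it."""
    if utf8_label:
        # Label carried in an EDNS/UTF-8 extended label.
        return octets.decode("utf-8")
    if octets.lower().startswith(ACE_PREFIX):
        # Legacy label carrying ACE data.
        return octets[len(ACE_PREFIX):].decode("punycode")
    # Legacy STD13 octets: octet N is taken as U+00NN (assumption).
    return octets.decode("latin-1")


# Hypothetical cache keyed by tuples of canonical UCS labels.
cache: dict[tuple[str, ...], str] = {}


def cache_store(labels: list[bytes], utf8_label: bool, rdata: str) -> None:
    """Store answer data under the canonical UCS form of the owner name."""
    key = tuple(wire_label_to_ucs(label, utf8_label) for label in labels)
    cache[key] = rdata
```

     In this sketch the ACE label b"xn--mnchen-3ya", the STD13 octet
     label b"m\xfcnchen" and the UTF-8 encoding of "münchen" all map
     to the same cache key, while a decomposed UTF-8 sequence is
     stored exactly as received.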
      
     Any subsequent queries which are processed through the cache MUST 
     be compared against the stored UCS characters. Internationalized 
     domain name labels which are decoded from UTF-8 or ACE labels MUST 
     NOT be normalized or case-converted as part of the comparison 
     operation, although labels which are provided as STD13 octet 
     sequences MUST be compared as case-neutral octet values. 
      
     Caches MUST be capable of providing UTF-8 and ACE encoded 
     representations of the UCS domain names which are stored in the 
     cache, with the appropriate format determined by the format used 
     in the corresponding query. However, answer data MUST be 
     restricted to only one encoding form for any protocol operation, 
     meaning that queries containing legacy labels MUST only be 
     answered with STD13 octet sequences and/or ACE encoded labels, 
     while UTF-8 queries MUST only be answered with UTF-8 encoded 
     domain names. 
      
      
  10.     Security Considerations 
      
     This document defines an extension to the domain name system, and 
     as such, it inherits the weaknesses which already exist in DNS. 
     Where possible, this specification strengthens DNS with multiple 
     checks. For example, this specification requires that domain names 
     be validated three times before they are used by applications: 
     once on specification, once on entry at the authoritative zone or
     hosts database, and once again when the answer data is received by 
     the requesting application. Despite these checks, the root 
     weaknesses inherent in DNS are still present. 
      
     Although this document uses multiple encoding algorithms, the
     boundary conditions of the existing DNS are preserved for both the
     source and the encoded representations.
      
      
  11.     IANA Considerations 
      
     This document requires the use of an EDNS extended label type
     (ELT) identification code, and uses the ELT code 0b000011
     (decimal 3).
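     As a minimal sketch based on the extended label encoding in
     [RFC2671], where the two most significant bits of the first label
     octet are 0b01 and the low six bits carry the ELT code, the code
     used by this document produces a first octet of 0x43. Only that
     octet is shown; the rest of the label format is defined elsewhere
     in this document, and the names below are hypothetical.

```python
# RFC 2671: label type 0b01 in the top two bits marks an extended label;
# the low six bits of the same octet carry the ELT code.
EXTENDED_LABEL_TYPE = 0b01
ELT_IDNS = 0b000011  # ELT code used by this document (decimal 3)


def extended_label_first_octet(elt: int) -> int:
    """Build the first octet of an extended label for the given ELT."""
    if not 0 <= elt <= 0b111111:
        raise ValueError("ELT codes are six bits wide")
    return (EXTENDED_LABEL_TYPE << 6) | elt
```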
      
      
  12.     References 
      
          [AMC-ACE-Z] <draft-ietf-idn-amc-ace-z>, "AMC-ACE-Z version 
            0.3.1" 
      
          [NAMEPREP] <draft-ietf-idn-nameprep>, "Preparation of 
            Internationalized Host Names" 
      
          [RFC2119] "Key words for use in RFCs to Indicate Requirement 
            Levels" 
      
          [RFC952] "DoD Internet host table specification" 
      
          [STD13] (RFC 1034) "Domain names - concepts and facilities", 
            (RFC 1035) "Domain names - implementation and 
            specification" 
      
          [STD3] (RFC 1122) "Requirements for Internet Hosts --
            Communication Layers", (RFC 1123) "Requirements for Internet
            Hosts -- Application and Support"
      
          [BCP18] (RFC 2277) "IETF Policy on Character Sets and 
            Languages" 
      
          [RFC2279] "UTF-8, a transformation format of ISO 10646" 
      
          [RFC2671] "Extension Mechanisms for DNS (EDNS0)" 
      
          [ASCII] "ANSI X3.4-1968. USA Standard Code for Information 
            Interchange"

          [ISO10646] "ISO/IEC 10646-1:2000. International Standard -- 
            Information technology -- Universal Multiple-Octet Coded 
            Character Set (UCS) -- Part 1: Architecture and Basic 
            Multilingual Plane" 
      
      
  13.     Acknowledgements 
      
     This document is an assembly of multiple ideas and proposals which 
     have been made on the IDN working group mailing list. Many of the 
     ideas presented here have been proposed by multiple parties in one 
     form or another, although Dan Oscarsson is credited for proposing 
     a dual-mode operation which is capable of simultaneously 
     supporting UTF-8 and legacy mode encodings. Other contributors to 
     key elements from this specification (some of them unknowingly or 
     unwillingly) include (alphabetically) Marc Blanchet, Adam
     Costello, Mark Davis, Martin Duerst, Patrik Faltstrom, Paul 
     Hoffman, David Hopwood, and many others. 
      
      
  14.     Editor's Address 
      
     Eric A. Hall 
     ehall@ehsco.com
