"Françoise Lefèvre"@example.com

I'm reading RFC 5321 to try to actually understand what constitutes a valid email address -- and I'm probably making this a lot more difficult than it needs to be -- but this has been bugging me.

               i.e., within a quoted string, any
               ASCII graphic or space is permitted
               without blackslash-quoting except
               double-quote and the backslash itself.

Does this mean that ASCII extended character sets are valid within quotes? Or does that imply standard ASCII table only?

EDIT - With the answers in mind, here's a simple jQuery validator that could supplement the plugin's built-in email validation by checking the characters.

jQuery.validator.addMethod("ascii_email", function( value, element ) {
    // In compliance with RFC 5321, this allows all standard printing ASCII characters in quoted text.
    // Unquoted text must be US-ASCII alphanumeric or one of the following: ! # $ % & ' * + - / = ? ^ _ ` { | } ~
    // @ and . get a free pass, as this is meant to be used together with the email validator

    // Matches the first double-quoted string, including backslash-escaped quotes inside it.
    var quotedString = /(["])(?:\\\1|.)*?\1/;
    // match() returns null when the address has no quoted part.
    var quoted = value.match(quotedString);

    var result = this.optional(element) ||
        (
            /^[\u002a\u002b\u003d\u003f\u0040\u0020-\u0027\u002d-\u002f\u0030-\u0039\u0041-\u005a\u005e-\u007e]+$/.test(value.replace(quotedString, "")) &&
            (quoted === null || /^[\u0020-\u007e]+$/.test(quoted[0]))
        );
    return result;
}, "Invalid characters");

The plugin's built-in validation appears to be pretty good, except for catching invalid characters. Out of the test cases listed here it only disallows comments, folding whitespace and addresses lacking a TLD (i.e. @localhost, @255.255.255.255) -- all of which I can easily live without.

Asked Aug 12, 2010 at 12:48 by Greg; edited Aug 12, 2010 at 13:57.
  • In general, the best answer to this sort of question is the address is valid if you can get a couple different MTAs to accept it. The IETF standards don't always specify things as clearly as you might want. – msw Commented Aug 12, 2010 at 12:57
  • Don't validate the individual characters. Rather validate the syntax. – BalusC Commented Aug 12, 2010 at 13:59
  • @BalusC I do validate the syntax. I would also like to stop people from cramming Sanskrit into an ASCII-only field. The two are not mutually exclusive. I do realize, however, that true email validation with a RegEx is, as one redditor put it, "like building a house using nothing but a power drill." Client-side validation is only there to tell someone "hey, this doesn't belong" -- and I believe this is a good, simple way of doing that. – Greg Commented Aug 12, 2010 at 14:03
  • Also, that regex in that link is terrible. I don't know why it was voted up like that. It's alright to accept bad emails, but you can't have a script turn away tons of perfectly valid ones. It fails on something as simple as [email protected]. Please, webmasters, if you don't want to put in the effort to do client-side validation correctly, just don't do it at all. Instead, fire off an email and see if it works. – Greg Commented Aug 12, 2010 at 14:23
4 Answers

According to this MSDN page the extended ASCII characters aren't valid, currently, but there is a proposed specification that would change this.

http://msdn.microsoft.com/en-us/library/system.net.mail.mailaddress(VS.90).aspx

The important part is here:

Thomas Lee is correct in that a quoted local part is valid in an email address and certain mail addresses may be invalid if not in a quoted string. However, the characters that others of you have mentioned such as the umlaut and the agave are not in the ASCII character set, they are extended ASCII. In RFC 2822 (and subsequent RFC's 5322 and 3696) the dtext specification (allowed in quoted local parts) only allows most ASCII values (RFC 2822, section 3.4.1) which includes values in ranges from 33-90 and 94-126. RFC 5335 has been proposed that would allow non-ascii characters in the addr-spec, however it is still labeled as experimental and as such is not supported in MailAddress.

In this RFC, ASCII means US-ASCII, i.e., no characters with a value greater than 127 are allowed. As evidence, here are some quotes from RFC 5321:

The mail data may contain any of the 128 ASCII character codes, [...]

[...]

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters (octets with the high order bit set to one) or ASCII "control characters" (decimal value 0-31 and 127). These characters MUST NOT be used in MAIL or RCPT commands or other commands that require mailbox names.

These quotes quite clearly imply that characters with a value greater than 127 are considered non-ASCII. Since such characters are explicitly forbidden in MAIL or RCPT commands, it is impossible to use them for e-mail addresses.

Thus, "Francoise Lefevre"@example.com is a perfectly valid address (according to the RFC), whereas "Françoise Lefèvre"@example.com is not.
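
As a rough illustration of the rule quoted above, the sketch below lists every character in an address that is either a control character (0-31, 127) or outside US-ASCII (above 127). The function name `findNonPrintableAscii` is made up for this example and is not part of jQuery or any plugin; it only checks character ranges, not address syntax.

```javascript
// Hypothetical helper (name is an assumption, not from any library).
// Returns the characters that RFC 5321 forbids in mailbox names:
// ASCII control characters (0-31, 127) and anything above 127.
function findNonPrintableAscii(address) {
    var offenders = [];
    // for...of walks the string by code point, so characters outside
    // the Basic Multilingual Plane are reported as a single entry.
    for (var ch of address) {
        var code = ch.codePointAt(0);
        if (code < 32 || code > 126) {
            offenders.push(ch);
        }
    }
    return offenders;
}
```

Calling it on "Francoise Lefevre"@example.com should return an empty array, while the accented version should report the ç and the è, assuming both are stored as single precomposed characters.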

Technically yes, but read on:

While the above definition for Local-part is relatively permissive, for maximum interoperability, a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form or where the Local-part is case-sensitive.

...

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters.
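
To check whether a given local part would need the Quoted-string form that the RFC advises against, one option is to test it against the Dot-string grammar from RFC 5321 (one or more atext atoms separated by single dots). This is only a sketch; `needsQuotedString` is a hypothetical name chosen for this example, and the function expects just the local part, not the full address.

```javascript
// atext as defined in RFC 5322 section 3.2.3: letters, digits and
// ! # $ % & ' * + - / = ? ^ _ ` { | } ~
var ATEXT = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~]";

// Dot-string = Atom *("." Atom), where Atom = 1*atext (RFC 5321).
var DOT_STRING = new RegExp("^" + ATEXT + "+(?:\\." + ATEXT + "+)*$");

// Hypothetical helper: true when the local part cannot be written as a
// plain Dot-string and would therefore have to be quoted.
function needsQuotedString(localPart) {
    return !DOT_STRING.test(localPart);
}
```

For example, `needsQuotedString("first.last")` is false, while `needsQuotedString("Francoise Lefevre")` is true because of the space.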

The HTML5 spec has an interesting take on the issue of valid email addresses:

A valid e-mail address is a string that matches the ABNF production 1*( atext / "." ) "@" ldh-str 1*( "." ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.

The nice thing about this, of course, is that you can then take a look at the open source browser's source code for validating it (look for the IsValidEmailAddress function). Of course it's in C, but not too hard to translate to JS.
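
The ABNF production quoted above can also be translated fairly directly into a regular expression. The following is a sketch under that reading, not a copy of any browser's code; `isHtml5Email` is a name invented for this example, and ldh-str is taken to mean one or more letters, digits or hyphens.

```javascript
// 1*( atext / "." ) "@" ldh-str 1*( "." ldh-str )
//   atext   = letters, digits and ! # $ % & ' * + - / = ? ^ _ ` { | } ~
//   ldh-str = one or more letters, digits or hyphens
var HTML5_EMAIL = /^[A-Za-z0-9!#$%&'*+\-\/=?^_`{|}~.]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+$/;

// Hypothetical helper: true when the string matches the production above.
function isHtml5Email(value) {
    return HTML5_EMAIL.test(value);
}
```

Under this production a quoted local part is rejected outright, and so is a domain without at least one dot, such as user@localhost.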
