Email Syntax Check in Python

3 May 2008

Sometimes you may want to check that an email address is not syntactically invalid, i.e. it looks like a recognisable email address. I use this approach in my zetact contact form processor.

Of course, it does not mean the address actually leads anywhere, but at least you know you are dealing with an email address that could exist.

This is the code I have been using, although I have changed it from a class method to a simple function to make this post simpler.

"""Email check using regex."""
    def invalidreg(emailkey):
        """Email validation, checks for syntactically invalid email
        courtesy of Mark Nenadov.
        See
        http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65215"""
        import re
        emailregex = ("^.+\\@(\\[?)[a-zA-Z0-9\\-\\.]+\\."
                      "([a-zA-Z]{2,3}|[0-9]{1,3})(\\]?)$")
        if len(emailkey) > 7:
            if re.match(emailregex, emailkey) is not None:
                return False
            return True
        else:
            return True
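
To see what this pattern accepts and rejects, here is a minimal sketch for Python 3 that compiles the same expression as a raw string and tries it on a few addresses. The sample addresses and the `EMAIL_REGEX` name are my own illustrations, not part of the original recipe.

```python
import re

# Same pattern as above, written as a raw string and compiled once.
EMAIL_REGEX = re.compile(
    r"^.+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"
)


def invalidreg(emailkey):
    """Return True if emailkey is syntactically invalid."""
    # Mirrors the original length test: 7 characters or fewer is rejected.
    if len(emailkey) > 7:
        return EMAIL_REGEX.match(emailkey) is None
    return True


# Illustrative sample addresses (assumptions for this sketch).
for address in ("warrior@example.com", "andy@lnmf.info", "not-an-address"):
    print(address, invalidreg(address))
```

Note that the `{2,3}` limit on the top-level domain means longer names such as .info are reported as invalid.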

I decided it would be more Pythonic to try to do this using the built-in string methods, rather than importing the re module and using a monster regular expression. Here was my first attempt.

"""Email checks using string methods - simple version."""
    def invalidemail(emailaddress):
        """Checks for a syntactically invalid email address."""
        try:
            emailitems = emailaddress.rsplit('@', 1)
            emailitems.extend(emailitems[1].rsplit('.', 1))
        except IndexError:
            return True

        if [x for x in emailitems if not x.replace(".","").isalnum()] \
                and emailaddress >= 7:
            return True
        else:
            return False
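
For comparison, here is a hedged sketch of how the same idea might look with the length test spelled out using len(). It assumes the intent was to reject addresses shorter than 7 characters and to reject any part containing characters other than letters, digits and dots; it is written for Python 3 and is not the code from zetact. Like the version above, it still accepts a domain with no dot in it.

```python
def invalidemail(emailaddress):
    """Return True if emailaddress is syntactically invalid."""
    # Assumption: anything shorter than 7 characters is rejected outright.
    if len(emailaddress) < 7:
        return True

    try:
        # [localpart, domain], then append [host, toplevel].
        emailitems = emailaddress.rsplit('@', 1)
        emailitems.extend(emailitems[1].rsplit('.', 1))
    except IndexError:
        return True  # No '@' in the address.

    # Invalid if any part has characters other than letters, digits, dots.
    # An empty part also fails, because "".isalnum() is False.
    return any(not item.replace(".", "").isalnum() for item in emailitems)


print(invalidemail("warrior@example.com"))
print(invalidemail("bad!name@example.com"))
```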

After a bit of testing and playing with this, a friend pointed me towards the relevant RFC on restrictions of email addresses. While the standard allows the use of many different special characters, in practice email addresses have to be much stricter if you actually want people in the real world to be able to send email to you.

For example, if we allow the email address []@commandline.org.uk, will whatever receives the output of this function be able to use it? As pointed out by Jan Goyvaerts, most software won't actually be able to handle obscure special characters.

We also don't want to water down the syntax check and allow junk for the sake of theoretical but non-existent addresses.

My compromise is to allow these special symbols -_.%+. in the local-part of the email address, and -_. in the domain name. I also do sanity checking on the top-level domain: it needs to be either a generic name or two characters long (country codes are all two letters).

So below is my current version. I added lots of comments and white space to make it easy to read.

"""Ditch nonsense email addresses."""

    GENERIC_DOMAINS = "aero", "asia", "biz", "cat", "com", "coop", \
        "edu", "gov", "info", "int", "jobs", "mil", "mobi", "museum", \
        "name", "net", "org", "pro", "tel", "travel"

    def invalid(emailaddress, domains = GENERIC_DOMAINS):
        """Checks for a syntactically invalid email address."""

        # Email address must be at least 7 characters in total.
        if len(emailaddress) < 7:
            return True # Address too short.

        # Split up email address into parts.
        try:
            localpart, domainname = emailaddress.rsplit('@', 1)
            host, toplevel = domainname.rsplit('.', 1)
        except ValueError:
            return True # Address does not have enough parts.

        # Check for Country code or Generic Domain.
        if len(toplevel) != 2 and toplevel not in domains:
            return True # Not a domain name.

        for i in '-_.%+.':
            localpart = localpart.replace(i, "")
        for i in '-_.':
            host = host.replace(i, "")

        if localpart.isalnum() and host.isalnum():
            return False # Email address is fine.
        else:
            return True # Email address has funny characters.

    # Start the ball rolling.
    if __name__ == "__main__":
        print(invalid("warrior@example.com"))
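
As a quick check of how these rules behave, here is a minimal sketch: a compact Python 3 restatement of the function above, run against a handful of sample addresses. The sample list and the `SAMPLES` name are illustrative assumptions, not part of the original code.

```python
GENERIC_DOMAINS = (
    "aero", "asia", "biz", "cat", "com", "coop", "edu", "gov", "info",
    "int", "jobs", "mil", "mobi", "museum", "name", "net", "org", "pro",
    "tel", "travel",
)


def invalid(emailaddress, domains=GENERIC_DOMAINS):
    """Return True if emailaddress is syntactically invalid."""
    if len(emailaddress) < 7:
        return True  # Address too short.
    try:
        localpart, domainname = emailaddress.rsplit('@', 1)
        host, toplevel = domainname.rsplit('.', 1)
    except ValueError:
        return True  # Address does not have enough parts.
    if len(toplevel) != 2 and toplevel not in domains:
        return True  # Neither a country code nor a generic domain.
    # Strip the permitted symbols, then require plain alphanumerics.
    for char in '-_.%+':
        localpart = localpart.replace(char, "")
    for char in '-_.':
        host = host.replace(char, "")
    return not (localpart.isalnum() and host.isalnum())


# Illustrative sample addresses (assumptions for this sketch).
SAMPLES = [
    "warrior@example.com",
    "ted@mail.example.com",
    "ted@example.co.uk",
    "blah+thesethingys@example.com",
    "[]@commandline.org.uk",
]

for address in SAMPLES:
    print(address, "invalid" if invalid(address) else "ok")
```

Only the last sample is reported as invalid, because square brackets are not among the permitted symbols.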

Discuss this post - Leave a comment

1 dbr says...

There's a better, if utterly horrible to read, way of doing this using regexes.

http://emailverification.pastecode.com/?show=f76a41a8b

This way isn't too bad; it allows blah+thesethingys@example.com, which a lot of websites reject (which is incredibly annoying). One thing I find a little weird: a return of False means the email is valid? I would have thought if valid(mail): print "Valid email" would be a more sensible way of doing things. That way: if not valid(email): print "Wrong" # would work

Posted at 4:33 p.m. on May 3, 2008


2 Ted Hosmann says...

I like the idea in your last example to check that the domain is valid. The problem is: what about users with subdomain email addresses (ted@mail.example.com) or users with country email domains (ted@example.co.uk)?

Posted at 7:43 a.m. on May 4, 2008


3 Zeth says...

@dbr,

Checking for syntactically invalid email addresses is what the function does, so:

if invalid(emailaddress):
  #do something

Otherwise the program can just carry on, no else clause required. Maybe my programming style is just different; you can easily change it to be the other way if you want.

Ted, if you read the code more carefully or try it out, you will see that both of your examples will pass the test.

Subdomains are not a problem because I allow dots in the hostname: for i in '-_.':

Country code domains are catered for by if len(toplevel) != 2

Posted at 10:06 a.m. on May 4, 2008


4 Zeth says...

@dbr

On regular expressions, the aim of this post is to use Python built-in string methods instead of regular expressions. Your example, blah+thesethingys@example.com will be considered valid by my function as I allow the plus sign: for i in '-_.%+.'

Posted at 10:10 a.m. on May 4, 2008


5 Zeth says...

Here is dbr's regular expression (the pastebin is only temporary).

import re

monster = "(?:[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:.[a-z0-9!#$%" + \
    "&'*+/=?^_{|}~-]+)*|\"(?:" + \
    "[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]" + \
    "|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9]" + \
    "(?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?" + \
    "|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.)" + \
    "{3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?" + \
    "|[a-z0-9-]*[a-z0-9]:(?:" + \
    "[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]"  + \
    "|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"

evil = re.compile(monster)

if evil.match("test+label@google.museum.au"):
    print("yay!")

Posted at 10:33 a.m. on May 4, 2008


6 John Reese says...

Just as an FYI, I get an 'XML Parsing Error: not well-formed' message in my newsreader (Liferea) for this entry. Line number 94, Column 98.

This is the first (mostly/enough) valid email checker I've seen that doesn't use a monster regex. I definitely like it.

Posted at 7:56 p.m. on May 4, 2008


7 Ted Hosmann says...

@Zeth

ARGH - I feel like such a n00b. You, my friend, are absolutely correct. Thanks for clearing that up for me.

Posted at 8:59 p.m. on May 5, 2008


8 Omar Zabaneh says...

Zeth,

Thank you for this post, very helpful. I used it as a basis for my own email validation function that I wish to share with you, in a selfish attempt to feel better about using your code.

I delegated the domain verification to DNS. It sounds like a good idea, and I'm not aware at the moment of any drawbacks. Please let me know what you think. Here's my code:

import dns.resolver

def valid( email_address ):
    # check email parts
    try:
        username, domain = email_address.rsplit('@', 1)
    except ValueError:
        return False
    # check username: allow alphanumeric characters and the dot
    if not username.replace( '.', '' ).isalnum():
        return False
    # check domain
    try:
        dns_response = dns.resolver.query( domain, 'MX')
    except dns.resolver.NoAnswer:
        # this host doesn't have MX records
        return False
    except dns.resolver.NXDOMAIN:
        # no such hostname
        return False
    return True

By the way, on line 11 of your second code snippet, shouldn't it be "len(emailaddress) >= 7" as opposed to "emailaddress >= 7"?

Also, you can call the function like this:

if not valid( email_address ):
    # do this

without requiring an else part as well. However, it's all a matter of taste; both ways seem valid to me.

Thanks again!

Posted at 2:26 a.m. on July 25, 2008


9 Zeth says...

Thanks Omar, that is a really good snippet, thanks for taking the time to share it with us.

Posted at 11:22 a.m. on July 25, 2008


10 Phill says...

Personally I don't bother with email syntax checking - it seems pointless to me. The majority of typos are going to be of the form tpyo@examlpe.com, which the syntax check wouldn't catch.

Omar - I like your script, checking DNS is a good idea, but there is one thing you should know. It looks like you're checking for the MX record, which is great, except that the RFC specifies that if no MX record is found but the domain has an address (A) record, you should use that instead.

All this means that email is actually a complicated business. I think the only way to be sure of an email address, if it's really important, is to send someone a link to that address with a confirmation token and have them click on it.

And the reason I'm posting this nearly three months after the original post is because I just noticed the comments :)

Posted at 3:45 p.m. on July 31, 2008


11 andylockran says...

I'm sure I found a workaround for this when I used zetact around the time of this post, but this regex excludes the email address andy@lnmf.info.

Any ideas how to change it to allow that form of address?

Posted at 3:27 p.m. on August 14, 2008


12 deesha says...

hey all, i just wanted to check the other stuf if nay1 can help me what i want is i have to convert the line starting with my function name to some other format , i have done with that much part but what i want ahead is the line wont be everytime starting as a first character so it may have like my function name if having some other check also in itself wherin say if chaeck is to be done to be verified for the same line if it is acceptable then convert the line else not

Posted at 5:40 a.m. on September 8, 2008


13 Amit says...

The regex in comment #5 is generally pretty good, but it incorrectly matches strings such as 'john@doe@johndoe.com'. As per Wikipedia, an email address can have only one '@' sign.

Posted at 9:21 p.m. on February 3, 2009


14 inportb says...

@Amit: so... I take it that you haven't checked out the RFC yet?

Posted at 5:51 a.m. on April 22, 2009


15 Thomas Damgaard says...

Omar, thanks.

Posted at 1:10 p.m. on May 13, 2009



