One simple way to increase the security of a form is to intelligently limit the string length of submitted data. The fewer characters you allow, the smaller the range of exploits an attacker can attempt through your form.
The trick, of course, is to strike the right balance between usability and limiting potential hacks to your site.
A CAPTCHA test is what I would consider an unnecessary obstacle.
I have excellent vision and a top-of-the-line monitor, yet I sometimes have trouble identifying the obfuscated characters I am expected to type in on some website forms. This lack of usability is simply bad design.
Clever design and a little bit of manual oversight are a better idea in the majority of cases.
Identifying a potentially suspicious form submission by the length of the data in the email field can be done without any user interaction or involvement. It takes a little work to implement but makes for a better user experience.
So how long is an email address?
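According to RFC 1034, a domain name cannot exceed 255 characters. The username, or local part, cannot exceed 64 characters (a limit that comes from the SMTP specification, RFC 5321, rather than RFC 1034). Adding the @ gives a total of 320 characters. RFC 5321 also caps the complete path at 256 characters, so the longest address that can actually be used is somewhat shorter, but 320 is the figure usually quoted.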
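As a rough illustration of the limits just described, the check below splits an address at its last @ and compares each part against the 64 and 255 character limits. The function name and constants are my own for this sketch, and it checks length only, not whether the address is otherwise well formed.

```python
# Length limits quoted in the text above (names are illustrative only).
MAX_LOCAL_PART = 64   # username / local part
MAX_DOMAIN = 255      # domain name


def within_rfc_limits(email: str) -> bool:
    """Return True if both parts of the address fit the quoted length limits."""
    # Split on the LAST "@", since the domain part can never contain one.
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return False  # no "@", or one side is empty
    return len(local) <= MAX_LOCAL_PART and len(domain) <= MAX_DOMAIN


if __name__ == "__main__":
    print(within_rfc_limits("jane.doe@example.com"))
    print(within_rfc_limits("a" * 65 + "@example.com"))
```

An address can pass this test and still be far longer than anything a real user would type, which is the point of the next few paragraphs.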
Allowing a user to enter 320 characters in an email field gives scope to some pretty elaborate XSS hacks. In practice, valid email addresses are much shorter than this.
To establish a practical limit on a suitable length to allow for an email field, I needed some data. One of the sites I work on had a nice collection of 50,496 email addresses, which I analysed to establish some rules for email length and, most importantly, the maximum length of the email field.
The distribution of email address lengths closely resembles the discrete probability distribution known as the Poisson distribution. This type of distribution would also occur if you were to count the number of vehicles passing a fixed point on a road in each fixed interval of time.
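If you want to try the same comparison on your own data, something along these lines would do it. This is only a sketch: the helper names are hypothetical, the input is assumed to be a plain list of address strings, and it simply puts the observed share of each length next to the Poisson probability for the same mean.

```python
import math
from collections import Counter


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of observing exactly k, for the given mean."""
    # Computed in log space so large k does not overflow the factorial.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def length_histogram(addresses):
    """Map each address length to the fraction of addresses with that length."""
    counts = Counter(len(a.strip()) for a in addresses)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def compare_to_poisson(addresses):
    """Return the mean length and {length: (observed share, Poisson share)}."""
    observed = length_histogram(addresses)
    mean = sum(length * share for length, share in observed.items())
    table = {
        length: (share, poisson_pmf(length, mean))
        for length, share in observed.items()
    }
    return mean, table


if __name__ == "__main__":
    # Made-up addresses purely to show the call; use your own list here.
    sample = ["jane.doe@example.com", "bob@example.org", "a.long.name@example.net"]
    mean, table = compare_to_poisson(sample)
    print(f"mean length: {mean:.1f}")
    for length, (seen, expected) in table.items():
        print(f"{length:3d}  observed {seen:.3f}  poisson {expected:.3f}")
```

With only a handful of addresses the two columns will not agree closely; the comparison only becomes meaningful with a sample of reasonable size.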
The mean of the distribution of email lengths in this case is 23 characters. For a Poisson distribution, the standard deviation is the square root of the mean, or 4.8 in this case. If we wanted to allow everything within 5 standard deviations of the mean, this would give 23 + 5 × 4.8 = 47 characters.
In our data sample, only 3 addresses exceeded this length, so this appears to be a sensible value to use as the maximum length of an email address.
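The arithmetic is simple enough to do by hand, but here it is as a small function in case you want to plug in the mean from your own data. The function name is made up for this example, and it relies on the Poisson assumption that the standard deviation equals the square root of the mean.

```python
import math


def poisson_upper_bound(mean_length: float, num_std_devs: float = 5.0) -> int:
    """Mean plus a number of standard deviations, rounded to a whole character count.

    Assumes a Poisson distribution, where std dev = sqrt(mean).
    """
    std_dev = math.sqrt(mean_length)
    return round(mean_length + num_std_devs * std_dev)


if __name__ == "__main__":
    # Mean of 23 characters, as measured in the sample described above.
    print(poisson_upper_bound(23))
```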
The data is also useful for establishing a lower bound on the length of an email address.
There were some very short email addresses, but none of them were valid. In the sample of 50,496 email addresses, only 2 were 8 characters long. Thus, to avoid being classed as suspicious, an email address should be at least 8 characters in length.
Using these length rules will mean that about 1 in 16,000 users will be classed as suspicious. At that point you can decide whether to automatically reject their input or provisionally accept it subject to approval by a site administrator. This is an acceptable compromise between usability and security.
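Put together, the two bounds make for a very small check. The sketch below uses the 8 and 47 character limits derived above; the function names and the "needs_review" / "accepted" labels are placeholders of mine, and what you actually do with a flagged submission is up to you.

```python
# Bounds derived from the sample data discussed above.
MIN_EMAIL_LENGTH = 8
MAX_EMAIL_LENGTH = 47


def is_suspicious_email_length(email: str) -> bool:
    """Flag an address whose length falls outside the expected range."""
    length = len(email.strip())  # ignore stray leading/trailing whitespace
    return length < MIN_EMAIL_LENGTH or length > MAX_EMAIL_LENGTH


def handle_submission(email: str) -> str:
    """Hold suspicious submissions for an administrator instead of rejecting them."""
    return "needs_review" if is_suspicious_email_length(email) else "accepted"


if __name__ == "__main__":
    for address in ["a@b.co", "jane.doe@example.com", "x" * 40 + "@example.com"]:
        print(address, handle_submission(address))
```

Enforcing the limit on the server matters here; a maxlength attribute on the input field alone is trivial for an attacker to bypass.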