What is the maximum safe length of a last name to allow in a website form?

Mark Edwards

One simple way to increase the security of a form is to intelligently limit the string length of submitted data. The fewer characters you allow, the narrower the range of exploits an attacker can attempt through your form.

The trick, of course, is to strike the right balance between usability and limiting potential attacks on your site.

A CAPTCHA test is what I would consider an unnecessary obstacle.

I have excellent vision and a top-of-the-line monitor, yet I sometimes have trouble identifying the obfuscated characters I am expected to type into some website forms. This lack of usability is simply bad design.

Clever design and a little manual oversight are a better idea in the majority of cases.

Identifying a potentially suspicious form submission by the length of the data in the last name field can be done without any user interaction or involvement. It takes a little work to implement but makes for a better user experience.

So how long is a last name?

To establish a practical limit for the length of a last name field I needed some data. One of the sites I work on had a nice collection of 26,558 last names, which I analysed to establish some rules for last name length and, most importantly, the maximum length of the last name field.

The distribution of last name lengths approximately follows the discrete probability distribution known as the Poisson distribution. This type of distribution would also occur if you were to count the number of vehicles passing a fixed point on a road in each fixed interval of time.
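One way to check this kind of fit is to compare the observed share of names at each length with the Poisson probability for that length. The following is only a sketch: the `sample` list is a made-up stand-in for the real data set, and the function names are illustrative rather than taken from the original analysis.

```python
import math
from collections import Counter


def poisson_pmf(k, mean):
    """Probability of observing exactly k under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def length_distribution(names):
    """Share of names at each length, keyed by length in characters."""
    counts = Counter(len(name.strip()) for name in names)
    total = sum(counts.values())
    return {length: count / total for length, count in sorted(counts.items())}


# Hypothetical sample; the real analysis used 26,558 names.
sample = ["Smith", "Jones", "Edwards", "Ng", "Featherstonehaugh"]
observed = length_distribution(sample)

# Print observed share next to the Poisson expectation for a mean of 6.
for length, share in observed.items():
    print(length, round(share, 2), round(poisson_pmf(length, 6), 4))
```

With a sample this small the two columns will not agree closely; the comparison only becomes meaningful with thousands of names.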

The mean of the distribution of last name lengths in this case is 6 characters. For a Poisson distribution, the standard deviation is the square root of the mean, or 2.45 in this case. If we wanted to allow everything within 6 standard deviations of the mean, that would give 6 + 6 x 2.45 = 20.7 characters. So for a practical limit we could set 20 characters.

In our data sample only 2 names exceeded this length, so this appears to be a sensible value to use as the maximum length of a last name.
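The threshold calculation can be sketched in a few lines. This assumes the Poisson property that the standard deviation equals the square root of the mean; the function name and the default of 6 standard deviations are illustrative choices. With a mean of 6 the threshold works out at about 20.7, which rounds down to the 20-character limit.

```python
import math


def max_length_limit(mean_length, num_std_devs=6):
    """Mean plus a number of Poisson standard deviations (sqrt of the mean)."""
    std_dev = math.sqrt(mean_length)
    return mean_length + num_std_devs * std_dev


threshold = max_length_limit(6)
print(round(threshold, 1))    # 20.7
print(math.floor(threshold))  # 20
```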

Using this length rule will mean that about 1 in 13,300 users (2 out of 26,558 in our sample) will be classed as suspicious. At that point you can decide whether to automatically reject their input or provisionally accept it subject to approval by a site administrator. This is an acceptable compromise between usability and security.
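A server-side check along these lines might look like the sketch below. The function name, the constant and the "accept"/"review" labels are hypothetical; what happens to a flagged submission (rejection or a moderation queue) is left to the site.

```python
MAX_LAST_NAME_LENGTH = 20  # limit derived above; adjust for your own data


def classify_last_name(last_name, max_length=MAX_LAST_NAME_LENGTH):
    """Return "accept" for normal-length names, "review" for suspiciously long ones."""
    cleaned = last_name.strip()  # ignore leading and trailing whitespace
    if len(cleaned) > max_length:
        return "review"
    return "accept"


print(classify_last_name("Edwards"))   # accept
print(classify_last_name("x" * 21))    # review
```

Note that `len()` in Python counts Unicode code points, not bytes, so accented characters count as one character each as long as they are stored in composed form.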

Figure: Graph of last name length data with Poisson distribution, mean and standard deviation

