1. If you match one of the standard English dialects (American, Canadian, etc.) pretty well, you are probably a native English speaker.
2. Non-native English speakers rarely use regionally specific language like 'I'm finished my homework.'
The algorithm is only beginning to guess native languages other than English. As it learns more, we'll know more about the features that distinguish different language backgrounds.
1. Canadian, Irish, and Scottish speakers accept 'I'm finished my homework' instead of 'with my homework.'
2. Americans, Canadians, and South Africans accept 'I sent my mother a letter' instead of 'to my mother.'
3. Some Australians and New Zealanders will say, 'She's raining outside' instead of 'It's raining outside.'
The algorithm that guesses your native language and dialect works like this:
We measure the Euclidean distance between your responses and the typical responses for each dialect. Whichever dialect you are closest to is likely your dialect.
For example, below is a table showing the percentage of speakers of Dialects A and B who answer "grammatical" to each of four sentences. In the rightmost column, you can see one participant's answers (1 = grammatical, 0 = ungrammatical).
|Sentence||Dialect A||Dialect B||Participant|
Euclidean distance may be familiar if you ever calculated distances in geometry. The major difference is that we have more than three dimensions to deal with. The table above has four dimensions, one per sentence. The actual quiz has about 80. But the basic procedure is the same.
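For readers who want the formula, the distance can be written as below. The symbols are our own labels for illustration: p_i is the participant's answer to sentence i (0 or 1), D_i is the proportion of a dialect's speakers who call sentence i grammatical (a percentage divided by 100, which is an assumption about how the table is scaled), and n is the number of sentences.

```latex
d(p, D) = \sqrt{\sum_{i=1}^{n} \left(p_i - D_i\right)^2}
```

With every value between 0 and 1, the distance over n sentences can never exceed the square root of n, so with four sentences it is at most 2.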
In our example, the distance between the participant and Dialect A is 0.5 whereas the distance between the participant and Dialect B is 1.7. The actual number is not especially meaningful -- the more questions in the survey, the larger the distances are going to be -- but what is meaningful is that our participant is closer to Dialect A and so is more likely to speak that dialect.
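The comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the quiz's actual code: the acceptance rates are made up (they are not the values from the table above), they are treated as proportions rather than percentages, and the names `DIALECT_PROFILES`, `euclidean_distance`, and `closest_dialect` are hypothetical.

```python
import math

# Hypothetical acceptance rates (as proportions) for four sentences.
# These are invented for illustration, NOT the values from the quiz.
DIALECT_PROFILES = {
    "Dialect A": [0.9, 0.1, 0.8, 0.2],
    "Dialect B": [0.2, 0.9, 0.1, 0.7],
}


def euclidean_distance(answers, profile):
    """Square root of the summed squared differences, one term per sentence."""
    if len(answers) != len(profile):
        raise ValueError("answers and profile must cover the same sentences")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(answers, profile)))


def closest_dialect(answers, profiles):
    """Return the name of the nearest profile plus all computed distances."""
    distances = {
        name: euclidean_distance(answers, profile)
        for name, profile in profiles.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# One participant's answers: 1 = grammatical, 0 = ungrammatical.
participant = [1, 0, 1, 0]
best, distances = closest_dialect(participant, DIALECT_PROFILES)
print(best, distances)
```

With these made-up numbers the participant lands much closer to Dialect A than to Dialect B, so Dialect A would be the guess.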
To look at native language as well, we define non-native "dialects" based on the typical answers given by people who share a native language. That is, we find the typical answers for the "Spanish dialect", the "Russian dialect", and so on. If a participant is closest to a traditional dialect of English (like American or Canadian), the algorithm guesses that they are a native English speaker. However, if their answers are closest to one of the non-native dialects (like the Spanish dialect), it guesses that they are a native Spanish speaker.
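Here is a minimal sketch of that idea, again under assumptions of our own: every profile, traditional or non-native, is stored together with the native language it implies, and the nearest profile decides the guess. The profile values and the names `PROFILES` and `guess_native_language` are hypothetical. `math.dist` is the standard-library Euclidean distance (Python 3.8+).

```python
import math

# Hypothetical profiles: name -> (implied native language, acceptance rates).
# All numbers are invented for illustration.
PROFILES = {
    "American": ("English", [0.1, 0.9, 0.1, 0.9]),
    "Canadian": ("English", [0.9, 0.9, 0.1, 0.9]),
    "Spanish dialect": ("Spanish", [0.1, 0.3, 0.8, 0.4]),
    "Russian dialect": ("Russian", [0.2, 0.2, 0.3, 0.1]),
}


def guess_native_language(answers, profiles):
    """Find the nearest profile and return (profile name, native language)."""
    nearest = min(
        profiles,
        key=lambda name: math.dist(answers, profiles[name][1]),
    )
    return nearest, profiles[nearest][0]


# Closest to a traditional dialect, so the guess is native English.
first_guess = guess_native_language([1, 1, 0, 1], PROFILES)

# Closest to a non-native "dialect", so the guess is that native language.
second_guess = guess_native_language([0, 0, 1, 0], PROFILES)

print(first_guess)
print(second_guess)
```

Treating non-native backgrounds as just more profiles means the same nearest-profile step yields both the dialect guess and the native-language guess.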
Of course, the algorithm can only make guesses about dialects for which it has a lot of information. So the algorithm cannot guess that someone's native language is (for example) Japanese until enough Japanese-speaking people have participated in the quiz. Only then does the algorithm have a good sense of how Japanese-speaking people use English grammar.
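One way to picture this requirement is a profile builder that skips any group with too few participants. The sketch below makes several assumptions that the description above does not confirm: "typical answers" are taken to be the per-sentence average, the cutoff `MIN_RESPONDENTS` is an arbitrary small number chosen so the example runs, and `build_profiles` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical cutoff; a real threshold would presumably be far larger.
MIN_RESPONDENTS = 3


def build_profiles(responses, min_respondents=MIN_RESPONDENTS):
    """Average answers per native language, skipping under-sampled groups.

    responses: list of (native_language, answers) pairs, where answers is a
    list of 1 (grammatical) / 0 (ungrammatical) values, one per sentence.
    """
    grouped = defaultdict(list)
    for language, answers in responses:
        grouped[language].append(answers)

    profiles = {}
    for language, rows in grouped.items():
        if len(rows) < min_respondents:
            continue  # not enough data yet to guess this language
        # zip(*rows) walks the answers sentence by sentence.
        profiles[language] = [sum(column) / len(rows) for column in zip(*rows)]
    return profiles


# Invented responses: three Spanish speakers, one Japanese speaker.
responses = [
    ("Spanish", [0, 0, 1, 0]),
    ("Spanish", [0, 1, 1, 0]),
    ("Spanish", [0, 0, 1, 1]),
    ("Japanese", [1, 0, 0, 0]),
]
profiles = build_profiles(responses)
print(profiles)
```

In this toy data only the Spanish group reaches the cutoff, so no Japanese profile exists yet and Japanese cannot be offered as a guess.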
This algorithm is fairly simple. However, it is surprisingly accurate, particularly for dialects about which it has a lot of information. We are also testing out more sophisticated algorithms. Whenever we have interesting results to report, we will describe them on the findings page and/or our blog.