I thought I would try out the MultiLingProfiler tool linked from the ncelp.org website.
The idea is that you can test a text you have sourced or written to see how many words fall outside the 2000 most frequent words NCELP use for their vocab frequency bank. I copied and pasted a French text from frenchteacher.net, one I wrote for Higher Tier GCSE pupils. It's an interview with a female astronaut, adapted from an online source somewhere.
The tool highlights in orange any words which don't feature in the top 2000. Have a quick look at the text below. You'll note that the tool doesn't deal easily with verb chunks such as "avez-vous", so you can discount examples like that. Frequency counts drawn from corpora always produce surprising anomalies. So in the case below, words you might be surprised to find in the top 2000 include:
formation, partenaire, exigences, recueillir, fonctionner, quotidiennes, s'entraîner, affecté
Meanwhile, words which are NOT in the top 2000, and which might (stress might) surprise you, are:
mathématiques, ingénieurs, secondaire, mécaniques, pilote, incroyable
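To make the highlighting idea concrete, here is a minimal Python sketch of what a profiler of this kind might do: split a text into words and flag any that are not on a frequency list. It is not how MultiLingProfiler actually works (I don't know its internals). The tiny word list below is a hypothetical stand-in for NCELP's real top 2000, and the tokeniser is an assumption: it keeps hyphenated and apostrophised forms whole, which is one plausible reason a chunk like "avez-vous" gets flagged even though "avez" and "vous" are both common. A real profiler would also probably lemmatise (treat "voulu" as a form of "vouloir"); this sketch does not.

```python
import re

# HYPOTHETICAL mini "frequency list": a handful of common French words standing
# in for NCELP's real top-2000 list, which is not reproduced here.
TOP_WORDS = {"vous", "avez", "je", "la", "le", "de", "et", "pour", "être",
             "toujours", "voulu", "depuis", "combien", "temps"}

# Assumed tokeniser: a token is a run of letters, optionally joined by
# hyphens or apostrophes, so "avez-vous" and "s'entraîner" stay whole.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")

def flag_rare_words(text, top_words):
    """Return the tokens of `text` not found in `top_words` (case-insensitive)."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    return [t for t in tokens if t not in top_words]

sample = "Avez-vous toujours voulu être pilote ?"
print(flag_rare_words(sample, TOP_WORDS))  # → ['avez-vous', 'pilote']
```

Both kinds of "anomaly" show up: "pilote" is flagged because it genuinely isn't on the list, while "avez-vous" is flagged only because of how the text was split into words.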
Now, we know that the sources of vocabulary frequency lists vary and may bear only partial resemblance to the language teenagers might encounter or want to know. So it would be surprising if there weren't apparent anomalies, when looked at from a teacher's point of view.
One thing which stands out to me (and this applies strongly to French) is the number of cognate words which help with comprehension. So the following words, which are NOT in the top 2000, are nevertheless easy to work out from an English speaker's point of view. (The same would not necessarily apply to a speaker of a different language.)
astronaute, mathématiques, secondaire, navale, aviation, océanographique, mécaniques, pilote
This assumes that the teenager would know these words in English; they may still struggle with aviation or océanographique.
Should we keep in mind the presence and frequency of cognates?
It seems to me that, while frequency is a very important factor to keep in mind when curriculum planning, you have to handle it carefully. You need to factor in the needs of the target audience, the availability of cognates (NCELP understandably decided to include them) and the thematic material you want students to hear, read, speak and write about.
The fact that NCELP is reluctant to specify topics or themes means that the frequency list they use may be too random, or in some ways inappropriate, for secondary learners. They rightly point out that texts largely contain high-frequency words (80% of words in a typical text would be from the top 2000), but you may still get anomalies.
NCELP have also mentioned that teachers should not follow frequency lists too slavishly. This is true. Interesting material often contains rarer words and you'd be crazy not to teach them. Whether the 2000 words allow examiners to produce interesting, usable texts in papers remains to be seen. NCELP seem to think this will work if a small amount of glossing is allowed (up to 2% of the words in a text).
In the example below, you would need to gloss around a dozen words (not including highlighted chunks such as travaillez-vous). Copied into Word, the text comes out at 368 words. My maths suggests that I would need to gloss just over 3% of the text, not too far off the NCELP figure of 2%. I could have simplified the text further too.
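For anyone who wants to check the sums, here is a small Python sketch of the glossing percentage. It assumes "around a dozen" means exactly 12 glossed words, and it simply divides by the 368-word Word count, as the post does; whether NCELP's 2% is meant to be measured this way is an assumption.

```python
def gloss_percentage(glossed_words, total_words):
    """Percentage of a text's running words that need a gloss."""
    return 100 * glossed_words / total_words

# Figures from the post: "around a dozen" glossed words (12 assumed)
# in a text of 368 words.
pct = gloss_percentage(12, 368)
print(f"{pct:.1f}% glossed")  # → 3.3% glossed

# Most whole words that could be glossed while staying within a 2% allowance
# (integer arithmetic: 2% of 368 is 7.36, rounded down).
max_glosses = 368 * 2 // 100
print(max_glosses)  # → 7
```

In other words, to hit the 2% figure with this text I would need to cut the glossed words from about 12 to 7, either by simplifying or by lengthening the text with high-frequency words.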
All this assumes that pupils would understand the high-frequency words, which of course they may or may not.
Anyway, just a bit of fun! You might like to try out the profiler yourself.
No one doubts that, for beginners especially, it's better to focus on common words rather than rare ones, but using published frequency lists can lead to some peculiar outcomes.
1. Depuis combien de temps pour la ?