On Word Frequency Analysis and Advanced Procrastination for Writers by Ian McHugh

Ian McHugh is a fellow member of the CSFG, and we were having a discussion on the mailing list the other day about a strange thing Ian had discovered about word frequency in fiction. So I asked him if he’d consider writing up his findings and guest posting here for me. After all, that saves me having to write up what he found, and it’s his baby anyway. He was foolish, er, kind enough to agree. So, many thanks to Ian, and hopefully you guys will find some of this interesting.

On Word Frequency Analysis and Advanced Procrastination for Writers

by Ian McHugh (ianmchugh.wordpress.com)

A few weeks ago, fellow CSFG member Phill Berrie wrote a post about word frequency analysis, a tool he uses in his work as an editor. In his post, Phill included a link to a free online word frequency analyser. Plug the text of your story in and it spits out:

  • the total word count of the story
  • how many unique words you’ve used (a, few, weeks, ago, etc)
  • and how many times you’ve used them (a=36, few=5, weeks=2, ago=2)
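For the curious, here is a minimal Python sketch of what an analyser like that might be doing under the hood. It is not the online tool's actual code: the function name `word_frequencies` and the tokenising rule (lowercase everything, count runs of letters with internal apostrophes as words) are assumptions made purely for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (total word count, unique word count, Counter of word -> uses)."""
    # Assumed tokenising rule: lowercase the text, then treat runs of
    # letters (optionally joined by straight or curly apostrophes) as words.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    counts = Counter(words)
    return len(words), len(counts), counts


if __name__ == "__main__":
    sample = "A few weeks ago, a few of us compared notes."
    total, unique, counts = word_frequencies(sample)
    print("total words:", total)
    print("unique words:", unique)
    print("most used:", counts.most_common(3))
```

A real analyser may well split words differently (hyphens, numbers, accented letters), so the exact counts it reports could differ from this sketch.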

Since I had set aside that weekend for working on the final draft of my novel, I decided instead (see “advanced procrastination”, above) to plug a few of my stories into the online analyser and see what the results were. Once I had plugged all of my stories in, the analyser told me a bunch of stuff that I already pretty well knew:

  • I’m using fewer adjectives and adverbs than I used to.
  • I have developed a habit of overusing the word “as” to join two clauses in a sentence.
  • I somehow don’t write stories between 3,000 and 4,000 words long. Like, ever.

What it also showed, that I hadn’t realised before, was that the number of unique words I use has fallen by about 20-25% since I first started writing. For stories over 6,000 words, my number of unique words per thousand has dropped from up near 300 to under 230.
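To make the “unique words per thousand” figure concrete, here is a rough Python sketch of how it could be calculated. It assumes the obvious definition (unique words divided by total words, times 1,000), which the analyser itself may or may not use; the function name `unique_per_thousand` and the tokenising rule are hypothetical.

```python
import re


def unique_per_thousand(text):
    """Unique words per 1,000 words of text (assumed definition)."""
    # Assumed tokenising rule: lowercase, letters with internal apostrophes.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    if not words:
        return 0.0  # avoid dividing by zero on empty input
    return 1000 * len(set(words)) / len(words)
```

Under that definition, a made-up 6,500-word story containing 1,495 different words would come out at 1000 × 1495 / 6500 = 230 unique words per thousand.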

So, why?

I had a couple of hypotheses:

Hypothesis #1
My vocabulary is shrinking. No, seriously. I had to look up synonyms for “theory” to find “hypothesis”. Then I had to look up “like” to find “synonym”. I was very hard on my brain in my late teens and early twenties – like, “I can’t really remember 1991 to 1994” kind of hard on my brain. I flunked out of art school because I was too stoned and drunk. Art school. That’s like flunking out of rock’n’roll for doing too much cocaine, only less cool. These days when I’m speaking, I often lose my words in mid-sentence. Maybe I’m using fewer words because I’m losing my words?

Hypothesis #2
Or, given that I’m using fewer adjectives and adverbs in my stories, maybe I’m just cutting out the crap?

So I wondered what the unique word counts would be for writers operating at a higher level than me. I just happened to have a softcopy of Kaaron Warren’s first short story collection, The Grinding House, so I plugged a few of Kaaron’s old stories into the analyser. Casting about, I also had a softcopy of a longish Lucius Shepard story from Issue 1 of Crowded Magazine. In both cases, I found that the unique word counts were down around 200 per 1,000 words.


Then I went to Tor.com and grabbed a few stories by authors who I immediately recognised as famous, award-winners, working novelists etc, and plugged those in. There was a wider range, but most of the unique word counts were still at or below the low end of my own stories.

So, does this mean that better writers use fewer words, but use them better? It’s an appealing idea. Had I cracked the secret code to being a better writer?

Yeah, no.

Nice idea, but it holds water about as well as… as one of them thingies that you wash lettuce in… like a bowl, but with holes in it… eh, never mind.

When I threw a wider net (this was still my novel-editing weekend, mind you – advanced procrastination, remember) and looked at a larger sample of stories from online SFWA pro-markets (including more stories from Tor.com and stories from Apex, Beneath Ceaseless Skies, Clarkesworld, Lightspeed and Strange Horizons) the unique word counts were all over the place. Including from some of the same authors I’d looked at in the first sample. So much so that it’s not even meaningful to talk about any kind of mean or median.

If anything, many of them were opposite to where my stories have been headed, with unique word counts above my high early average.

So where does this leave me? Back at Hypothesis #1? Was Kaaron also hard on her brain in her youth?

Is there maybe some superficial similarity between my writing style and Kaaron’s writing style? Or at least, Kaaron Warren circa 1994 to 2003? Hell, I’d take that, any day.


In all honesty, I wouldn’t say that my writing style really is like Kaaron’s in any way you’d notice, but if I have lifted something from her work and incorporated it into my own, it wouldn’t be at all surprising. The Grinding House was a book that made a big impression on me in the early part of my writing career. (Kaaron still uses a quote from my review of it.)

Similarly, if there’s any single story that most influenced me as a new writer, it was Tony Daniel’s “A Dry Quiet War”. Because of that story, I wrote “Bitter Dreams”, which is probably still my best story, and have kept on writing Westerns since then. “A Dry Quiet War” has a unique word count under 200 per thousand words.

Shepard was another early influence. While he does write elaborate fantasy stories (the Dragon Griaule tales, for example), he’s also written knuckle-dragging, hairy-backed manly stories for Playboy, with protagonists who are terse like the love-child of Clint Eastwood and Conan the Barbarian.

Maybe there’s a clue there. I tend to write in a close third-person or, occasionally, first-person point of view. A lot of my recent stories have featured protagonists who are in some way “simple” – mentally simple, children, from simple socio-cultural settings, or just plain terse. It follows that, with a close point-of-view, the narrative voice for a simple character should also be simple.

Simple character = simple language = lower unique word count.

And a lot of my more complex and elaborate stories are ones with higher unique word counts.

That seems like one of those revelations that’s bleeding obvious once you see it. “Well, of course I knew that!” I think there’s a lesson there, though, in terms of writing consciously for your character’s voice.

And another thing I found? One of the sweet spots for story length, at least for the SFWA pro markets I looked at, seems to be between 3,000 and 4,000 words.


Another sweet spot seems to be between 5,000 and 6,000 words – in which range my stories have, overall, been noticeably less successful than they have over 6,000 words or under 3,000.

Well, I guess if nothing else I found out what I need to work on.

And I did also write/edit nearly 10,000 words of the final draft of my novel that weekend.

Advanced procrastination.

Speaking of which: You should be writing! So go find your character’s voice, and get back to work!


The story’s the thing and the meaning of words

I like words. That much is blatantly obvious to anyone who knows me and most who don’t. Language, words and stories are the foundation of everything we’ve become as a cultural animal. Language and words evolve too. You might hate it when people say arks instead of ask, as in, “Can I arks you a question?” To which you reply, “You just did, now go back to fucking school!” But you’d be wrong, kinda. The modern dialectal “ax” is as old as Old English “acsian” and was an accepted literary variant until c.1600. So “arks” is closer to the old version than “ask”. Although the word does derive from the Old English “ascian” (not the variant “acsian”), so the correct word has always been “ask” really. Anyway, I’m rambling like an old man on a day trip from the care home. My point is that language evolves and changes.

It can be upsetting sometimes, when we feel like language is dying or being killed off by the uneducated youth of today. But it’s not. It’s an organic thing, doing what it’s always done. After all, you don’t call a happy man gay any more. Unless he’s happy and likes cock, then it’s okay. And you could call him gay even if he was unhappy. Woah, this crazy thing called language!

So I got to thinking about the nature of storytelling, as that’s my thang, and how it’s changing. And, by extension, how the language around storytelling is changing. It came up when I was sitting on the couch with my Kindle the other day and my wife called out from the other room, “What are you doing?”

I panicked and quickly checked that I wasn’t up to something, but rallied and replied, “Just reading a book… er, novel.”

And it surprised me. I was reading a book. Albeit an ebook. It was a novel. I could as easily have been reading a short story, novella or saved web page on my Kindle. I should have simply replied, “Just reading.” But it was out there. I was etymologically stunned for a moment. Why had I corrected myself? I wondered if the word “book” would change in meaning. At what point might it refer only to an actual paper and pages physical book? Would that ever happen? Would we then refer to ebooks by their type – novel, novella, collection and so on?

Let’s look at some definitions (all from dictionary.com):

1. a written or printed work of fiction or nonfiction, usually on sheets of paper fastened or bound together within covers.
2. a number of sheets of blank or ruled paper bound together for writing, recording business transactions, etc.
3. a division of a literary work, especially one of the larger divisions.

While “a written… work” is primary, the bit “usually on sheets of paper fastened or bound together within covers” is a key part of the definition. It seems that “book” applies to the artefact as much as, if not more than, the content. That’s why we specify “ebook” when we’re referring to an electronic copy.

So perhaps it’s better, when reading on my Kindle, to say, “I’m reading a novel.” I don’t think I’ll ever say, “I’m reading an ebook”, as it seems irrelevant in some way. It’s not a papery artefact, so I don’t say “book”. The fact that it’s an ebook does little to impart what I’m actually reading.

1. a fictitious prose narrative of considerable length and complexity, portraying characters and usually presenting a sequential organization of action and scenes.
(Interestingly – 2. (formerly) novella Origin: 1560–70; < Italian novella (storia) new kind of story. That's evolved now to mean a short novel.)

So that definitely describes better what activity I’m engaged in. Of course, I could say that I’m reading a story.

1. a narrative, either true or fictitious, in prose or verse, designed to interest, amuse, or instruct the hearer or reader; tale.
2. a fictitious tale, shorter and less elaborate than a novel.
3. such narratives or tales as a branch of literature: song and story.
4. the plot or succession of incidents of a novel, poem, drama, etc.: The characterizations were good, but the story was weak.
5. a narration of an incident or a series of events or an example of these that is or may be narrated, as an anecdote, joke, etc.

This would work well if I was reading a short story, collection or anthology. But, as you can see from the definition, it doesn’t really work linguistically in terms of a novel. It’s come to indicate something shorter.

Of course, when reading a short or a novel, we’re absolutely enjoying a story. After all, regardless of the delivery system, the story’s the thing. That’s what we’re there for. When it comes to my own work, much as I love the beautiful artefact that is a paper book, all I’m really interested in is people reading my stories, be they short or novel. Read them on paper, ereader, computer screen, whatever. I don’t care. You could read them transcribed in felt pen on a hooker’s breasts for all I care, as long as you’re enjoying the story. And now I have this urge, at some point in my life, to read a story written on a hooker’s breasts. Ah well, something else for the bucket list.

So have I solved the conundrum? Actually, no. Because what if I’m reading a non-fiction work on my Kindle? It’s an ebook, so not a book in the artefact sense. But it’s not a novel either. Maybe I could then say, “I’m reading a book about literature on hookers’ breasts in the early twenty-first century.”

My wife would come stumbling into the room saying, “What!? I haven’t seen a book like that lying around.”

To which I heft my Kindle and say, “It’s an ebook.” *sigh*

Language. It’s a funny old thing.


French translation reprint in Monstres! anthology

I’m still pretty tied up in the Kung Fu seminar, but it’s nearly at an end. My wife will be very glad when I get home and start pulling my weight again. In the meantime, I had to mention this bit of news. Some time ago I sold a reprint of my monster short story, Deep Sea Fishing, to the Monstres! anthology, coming soon from Celephais Press. The story was first published in Seizure, issue 4. It’s very exciting on many levels. Firstly, it’s my first foreign language translation – in this case into French. The anthology title should have been a clue – that wasn’t a typo. Not to mention the title of this post.

My story has been translated by Vincent Corlaix. I’m intrigued to think about what he may have done. I wonder how much of my voice and style survives a translation. I guess that’s the sign of a good translator – one who will keep those things intact. I’m sure Corlaix has done an excellent job. In translation, my story is called Pêche en haute mer. Which is kinda cool. It’s a Lovecraft-inspired yarn and fits the monsters theme well.

The other good thing is that Celephais have released the cover art, and it’s bloody brilliant.

You’ll notice the list of contributors on the back cover and I’m very proud to share a Table Of Contents with a couple of very good friends – Kaaron Warren and Bill Congreve. It’s also nice to see my name right next to Lavie Tidhar. It’s actually the second time I’ve shared a ToC with Mr Tidhar – last time in Murky Depths, #16. Lavie, we must stop meeting like this. People will talk.

This antho will be available in early January and I’ll drop another mention then for those French-reading friends and readers. Or perhaps you could buy a copy for the French friends in your life. You’ve got a French friend or two, right?

Here’s the full ToC:

Blue (Blue), de Pablo Dobrinin, traduction Jacques Fuentealba
Dieu est argent (Working for the God of the Love of Money), de Kaaron Warren, traduction Benoît Giuseppin
Les reines de l’évasion, de Célia Deiana
L’heure des suicidés, Marc R. Soto, trad. Jacques Fuentealba
Fantômes (Fantasmas), de Carlos Gardini, trad. Jacques Fuentealba
Blood Faerie, une symphonie nocturne, de Yohan Vasse
Tania (Tania), de Fermín Moreno, trad. Jacques Fuentealba
Les meilleurs partent toujours en premier, Nelly Chadour
À l’aube de la nuit (Until Sunrise), Bill Congreve, trad. Luc Kenoufi
Mater Insania, de Marija Nielsen
Altera in alteram, de Léonor Lara
Ma femme est un shoggoth (I married a Shoggoth), de Jeffrey Thomas, trad. de Maxime Le Dain
Lien de sang (Blood Relations), de Lewis Shiner, trad. Élodie Meste
En préparant le pot-au-feu, de Timothée Rey
Grand-père Loup (Grand-Father Wolf), de Steve Rasnic Tem, trad. Mathieu Rivero
L’Évolution des espèces (La evolución de las especies), de Nuria C. Botey, trad. Marie-Anne Cleden
Pêche en haute mer (Deep Sea Fishing), de Alan Baxter, trad. Vincent Corlaix
Le vieil homme et la mer. Et l’étranger. Et le Kraken. (El viejo y el mar. Y el extraño. Y el Kraken.), de Pedro Escudero, trad. Jacques Fuentealba
Zombi Revenge psyché, de Marc-Olivier Aiken
Lanjnoir (Blakenjel), de Lavie Tidhar, trad. Thomas Bauduret
Je ne suis pas un monstre, de David Pierru

I’ll get back to regular blogging when my mind and body recover from this seminar, hopefully towards the end of the week.


Service interruption due to kicking butt

I apologise if things are a bit quiet around here for a couple of weeks. As most of you probably know, my “day job” is teaching people to kick butt – I’m a Warrior Scribe. Martial arts practice and instruction, just like writing, requires constant practice and improvement, and the taking of every opportunity to learn. For the next two weeks I’m at an intensive Master training seminar with my teacher in Sydney, training six hours a day and spending the evenings drinking with training buddies, then collapsing into bed with phrases like, “Ow, my fucking arms!”, “Where did that bruise come from?” and “Holy shit, Kung Fu hurts but it’s so freaking cool!”

So posting here will be infrequent if not non-existent until mid-December. In the meantime, let me leave you with a word and a challenge. The word is collop. It’s a good, epiglottal sort of word, huh?

1. A small slice of meat.
2. A small slice, portion, or piece of anything.
3. A fold or roll of flesh on the body.

The challenge is this – use it in a sentence in everyday speech. If someone asks if you want ham on your sandwich, say, “Sure, just a collop, thanks.” Or perhaps say to your loved one, “Baby, let me lick your collops.” You know, that sort of thing. Do feel free to comment with any successful usages of the word. And you’re welcome – it’s a good’un, I know.


Bulwer-Lytton Fiction Contest 2011 Results

The Bulwer-Lytton Fiction Contest is one of my favourite literary events. It’s a brilliant idea. It stems from the awful writing of Edward George Bulwer-Lytton. You probably think you’ve never heard of him. But I can almost guarantee you have. Here, see if this is familiar:

“It was a dark and stormy night;”

Yep. You know him. But did you know just how bad he was? Here’s the rest of that line, from Paul Clifford (1830):

“It was a dark and stormy night; the rain fell in torrents–except at occasional intervals, when it was checked by a violent gust of wind which swept up the streets (for it is in London that our scene lies), rattling along the housetops, and fiercely agitating the scanty flame of the lamps that struggled against the darkness.”

Holy crap.

It’s writing like that which gave rise to the contest. During his studies Professor Scott Rice of the English Department at San Jose State University unearthed the source of that famous line, “It was a dark and stormy night”, as being the opening of the Edward George Bulwer-Lytton novel, Paul Clifford. And it is a very famous line. After all, Snoopy uses it all the time and that Beagle knows his shit.

For all his hideous writing skills, Lytton coined some phrases we all know well. Among them “the pen is mightier than the sword”, “the great unwashed”, and “the almighty dollar”. He’s had an impact, has Bulwer-Lytton.

So Professor Rice, with the help of San Jose State University, has, since 1982, put together the contest which seeks the worst opening lines to the worst of all novels. You can learn all about the contest here: http://www.bulwer-lytton.com/

Meanwhile, the 2011 results are in. The winner this year is the shortest entry to ever win the contest. It comes from Sue Fondrie of Oshkosh, WI. (Yeah, I thought that was a children’s clothing line for people with more money than sense, but apparently it’s a place too.) Here’s the winning line:

Cheryl’s mind turned like the vanes of a wind-powered turbine, chopping her sparrow-like thoughts into bloody pieces that fell onto a growing pile of forgotten memories.

Top work, Sue. Congratulations.

Rodney Reed of Ooltewah, TN takes out the runner-up prize with this one:

As I stood among the ransacked ruin that had been my home, surveying the aftermath of the senseless horrors and atrocities that had been perpetrated on my family and everything I hold dear, I swore to myself that no matter where I had to go, no matter what I had to do or endure, I would find the man who did this . . . and when I did, when I did, oh, there would be words.

There are other winners in several categories (Adventure, Crime, Sci-Fi, Vile Puns, etc.) and they’re all listed on the contest site here. Go and have a read. They’re hilarious.