On Word Frequency Analysis and Advanced Procrastination for Writers by Ian McHugh

Ian McHugh is a fellow member of the CSFG and we were having a discussion on the mailing list the other day about this strange thing Ian had discovered in terms of word frequency in fiction. So I asked him if he’d consider writing up his findings and guest posting here for me. After all, that saves me having to write up what he found and it’s his baby anyway. He was foolish, er, kind enough to agree. So, many thanks to Ian and hopefully you guys might find some of this quite interesting.

On Word Frequency Analysis and Advanced Procrastination for Writers

by Ian McHugh (ianmchugh.wordpress.com)

A few weeks ago, fellow CSFG member Phill Berrie wrote a post about word frequency analysis, a tool he uses in his work as an editor. In his post, Phill included a link to a free online word frequency analyser. Plug the text of your story in and it spits out:

  • the total word count of the story
  • how many unique words you’ve used (a, few, weeks, ago, etc)
  • and how many times you’ve used them (a=36, few=5, weeks=2, ago=2)
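For anyone who would rather not paste a story into a website, the three numbers above are easy to reproduce. What follows is a minimal Python sketch, not the online analyser's actual code: the function name `analyse` and the tokenising rule (lowercase everything, count runs of unaccented letters with internal apostrophes as words) are assumptions made for illustration, and a different rule will give slightly different counts.

```python
import re
from collections import Counter

# Assumed tokenising rule: a word is a run of a-z letters, optionally
# joined by straight or curly apostrophes (so "you’ve" stays one word).
# Accented letters and digits are ignored in this sketch.
WORD_PATTERN = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def analyse(text):
    """Return total word count, unique word count and per-word counts."""
    words = WORD_PATTERN.findall(text.lower())
    counts = Counter(words)
    return {
        "total": len(words),    # total word count of the story
        "unique": len(counts),  # how many unique words were used
        "counts": counts,       # how many times each word was used
    }


if __name__ == "__main__":
    result = analyse("A few weeks ago, a few days ago.")
    print(result["total"], result["unique"])  # prints: 8 5
    print(result["counts"].most_common(3))
```

To run it over a whole story, read the file into a string first and pass that to `analyse`.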

Since I had set aside that weekend for working on the final draft of my novel, I decided instead (see “advanced procrastination”, above) to plug a few of my stories into the online analyser and see what the results were. After plugging all of my stories into the analyser, it told me a bunch of stuff that I already pretty well knew:

  • I’m using fewer adjectives and adverbs than I used to.
  • I have developed a habit of overusing the word “as” to join two clauses in a sentence.
  • I somehow don’t write stories between 3,000 and 4,000 words long. Like, ever.

What it also showed, that I hadn’t realised before, was that the number of unique words that I use has fallen by about 20-25% since I first started writing. For stories over 6,000 words, my number of unique words per thousand has dropped from up near 300 to under 230.
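The arithmetic behind those figures is simple enough to check. Here is a small Python sketch of it; the function names are invented for illustration, and the 6,500-word story with 1,495 unique words is a made-up example, not one of the stories analysed. Only the 300 and 230 figures come from the numbers above.

```python
def unique_per_thousand(total_words, unique_words):
    """Unique words per 1,000 words of text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return unique_words / total_words * 1000


def percent_drop(before, after):
    """Percentage fall from an earlier rate to a later one."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Hypothetical story: 6,500 words, 1,495 of them unique.
    print(round(unique_per_thousand(6500, 1495)))  # prints: 230

    # Falling from about 300 to about 230 per thousand:
    print(round(percent_drop(300, 230), 1))  # prints: 23.3
```

A fall from 300 to 230 per thousand works out at a little over 23%, which sits inside the 20-25% range quoted above.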

So, why?

I had a couple of hypotheses:

Hypothesis #1
My vocabulary is shrinking. No, seriously. I had to look up synonyms for “theory” to find “hypothesis”. Then I had to look up “like” to find “synonym”. I was very hard on my brain in my late teens and early twenties – like, “I can’t really remember 1991 to 1994” kind of hard on my brain. I flunked out of art school because I was too stoned and drunk. Art school. That’s like flunking out of rock’n’roll for doing too much cocaine, only less cool. These days when I’m speaking, I often lose my words in mid-sentence. Maybe I’m using fewer words because I’m losing my words?

Hypothesis #2
Or, given that I’m using fewer adjectives and adverbs in my stories, maybe I’m just cutting out the crap?

So I wondered what the unique word counts would be for writers operating at a higher level than me. I just happened to have a softcopy of Kaaron Warren’s first short story collection, The Grinding House, so I plugged a few of Kaaron’s old stories into the analyser. Casting about, I also had a softcopy of a longish Lucius Shepard story from Issue 1 of Crowded Magazine. In both cases, I found that the unique word counts were down around 200 per 1,000 words.

Interesting!

Then I went to Tor.com and grabbed a few stories by authors who I immediately recognised as famous, award-winners, working novelists etc, and plugged those in. There was a wider range, but most of the unique word counts were still at or below the low end of my own stories.

So, does this mean that better writers use fewer words, but use them better? It’s an appealing idea. Had I cracked the secret code to being a better writer?

Yeah, no.

Nice idea, but it holds water about as well as… as one of them thingies that you wash lettuce in… like a bowl, but with holes in it… eh, never mind.

When I threw a wider net (this was still my novel-editing weekend, mind you – advanced procrastination, remember) and looked at a larger sample of stories from online SFWA pro-markets (including more stories from Tor.com and stories from Apex, Beneath Ceaseless Skies, Clarkesworld, Lightspeed and Strange Horizons) the unique word counts were all over the place. Including from some of the same authors I’d looked at in the first sample. So much so that it’s not even meaningful to talk about any kind of mean or median.

If anything, many of them were opposite to where my stories have been headed, with unique word counts above my high early average.

So where does this leave me? Back at Hypothesis #1? Was Kaaron also hard on her brain in her youth?

Is there maybe some superficial similarity between my writing style and Kaaron’s writing style? Or at least, Kaaron Warren circa 1994 to 2003? Hell, I’d take that, any day.

Colander!

In all honesty, I wouldn’t say that my writing style really is like Kaaron’s in any way you’d notice, but if I have lifted something from her work and incorporated it into my own, it wouldn’t be at all surprising. The Grinding House was a book that made a big impression on me in the early part of my writing career. (Kaaron still uses a quote from my review of it.)

Similarly, if there’s any single story that most influenced me as a new writer, it was Tony Daniel’s “A Dry Quiet War”. Because of that story, I wrote “Bitter Dreams”, which is probably still my best story, and have kept on writing Westerns since then. “A Dry Quiet War” has a unique word count under 200 per thousand words.

Shepard was another early influence. While he does write elaborate fantasy stories (the Dragon Griaule tales, for example), he’s also written knuckle-dragging, hairy-backed manly stories for Playboy, with protagonists who are terse like the love-child of Clint Eastwood and Conan the Barbarian.

Maybe there’s a clue there. I tend to write in a close third-person or, occasionally, first-person point of view. A lot of my recent stories have featured protagonists who are in some way “simple” – mentally simple, children, from simple socio-cultural settings, or just plain terse. It follows that, with a close point-of-view, the narrative voice for a simple character should also be simple.

Simple character = simple language = lower unique word count.

And a lot of my more complex and elaborate stories are ones with higher unique word counts.

That seems like one of those revelations that’s bleeding obvious once you see it. “Well, of course I knew that!” I think there’s a lesson there, though, in terms of writing consciously for your character’s voice.

And another thing I found? One of the sweet spots for story length, at least for the SFWA pro markets I looked at, seems to be between 3,000 and 4,000 words long.

Sigh.

Another sweet spot seems to be between 5,000 and 6,000 words – in which range my stories have, overall, been noticeably less successful than they have over 6,000 words or under 3,000.

Well, I guess if nothing else I found out what I need to work on.

And I did also write/edit nearly 10,000 words of the final draft of my novel that weekend.

Advanced procrastination.

Speaking of which: You should be writing! So go find your character’s voice, and get back to work!

4 thoughts on “On Word Frequency Analysis and Advanced Procrastination for Writers by Ian McHugh”

  1. I remember an awful meme going around that encouraged writers not to use ‘said’ and to replace it with words like ‘gushed’, ‘sighed’ or ‘exploded’. I think it’s quite possible that your low count means you are writing lovely, clean prose. Not that there’s anything wrong with wordiness, just that it often does go horribly wrong in practice. Appallingly, shockingly, hideously wrong.
