Count Word Frequency In A Text File

Have you ever wanted to count the word frequency in a text file? Have you ever wanted to know how many times you used a word in a document? Well, with just two simple commands (grep and wc) we’re going to learn how to do just that.

As a writer, I try to avoid using the same words over and over again. It’s not just bad form; using the same words repeatedly actually makes the document more difficult to comprehend. The brain needs variety (and key concept repetition), or else it just kinda checks out and doesn’t pay much attention.

There are other times when you may want to count how many times a word is used in a document, perhaps as a reference. “Your honor, the document in question contained my client’s name 245 times – and none of the uses were factual!” I’m sure you can concoct a scenario where you may want this information. You can probably come up with one that doesn’t even involve a judge!

You can also use it a little more broadly, and we’ll cover that. For now, let’s make sure we’re all working on the same page. Press CTRL + ALT + T and open up your terminal, and then enter the following commands:

cd ~/Downloads
wget https://linux-tips.us/files/rnd-num.txt

There. Now that we’re on the same page (you have the text file in your downloads folder) we can all work with the same list of random numbers – instead of random words. (All we care about is that there are characters.)

Count Word Frequency:

Well, you already have the terminal open and you’ve already downloaded my random numbers file. We’ll substitute numbers for words, as words are just strings of characters. So, seeing as you’re prepared…

Trust me, it won’t make a difference that we’re using numbers. By the time I’m done explaining this, you’ll understand and apply it to words (or other characters) all on your own. It’s pretty straightforward and easy to understand.

Let’s say we wanted to count the instances of 62829. The command would look like this:

grep -i -o 62829 rnd-num.txt | wc -l

If you run that command, you’ll see that this string of characters occurs just once. That’s expected and correct.

You can also do things like finding all the instances of 1 (or any single character) in the list with this command:

grep -i -o 1 rnd-num.txt | wc -l

You can be even more complicated and find all the times a 7 immediately follows a 2. That command would look like this:

grep -i -o 27 rnd-num.txt | wc -l

(There are three instances of 27.)

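So, what’s going on? Well, you are using grep to search the contents of the file. The -o flag tells grep to print each match on its own line, and -i makes the search ignore case (which matters for words, not for numbers). You are then piping that output to wc, where the -l flag counts the lines. One line per match means the line count is the number of instances.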
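If you’d like to see that for yourself, here is a small sketch on a throwaway file. The file name sample.txt and its contents are made up for illustration (they are not part of the download), and I’m assuming GNU grep, which is what most Linux distributions ship.

```shell
# Create a tiny throwaway file (hypothetical sample data, not rnd-num.txt).
tmpdir=$(mktemp -d)
printf '12712 27\n99999\n27027\n' > "$tmpdir/sample.txt"

# -o prints every match on its own line, so one match equals one line.
grep -o 27 "$tmpdir/sample.txt"

# wc -l then counts those lines, which gives the number of matches.
matches=$(grep -o 27 "$tmpdir/sample.txt" | wc -l)

# For comparison, grep -c counts the LINES that contain a match,
# not the individual matches.
lines=$(grep -c 27 "$tmpdir/sample.txt")

echo "matches: $matches, lines containing a match: $lines"

# Clean up the throwaway directory.
rm -r "$tmpdir"
```

On this sample, that should report 4 matches on 2 lines. The difference is why the article’s command uses -o with wc -l rather than grep -c: a line that contains the string twice would otherwise be counted only once.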

You can probably be pretty fancy with this, but I just wanted to give a quick overview. Mostly, I figured it’s a good excuse to dig out grep and wc – and who doesn’t like panning for nuggets in text?
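As one possible way of getting fancy, here is a sketch that uses actual words. It shows two things: the -w flag, which restricts grep to whole words, and a tr, sort, and uniq pipeline that builds a frequency table for every word at once. The file words.txt and its contents are invented for the example, and the pipeline is just one common approach rather than the only way to do it.

```shell
# Create a small throwaway text file (hypothetical sample data).
tmpdir=$(mktemp -d)
printf 'The cat sat there.\nThe other cat left the room.\n' > "$tmpdir/words.txt"

# Without -w, "the" also matches inside "there" and "other".
loose=$(grep -i -o the "$tmpdir/words.txt" | wc -l)

# With -w, only the whole word "the" is counted (-i ignores case).
strict=$(grep -i -o -w the "$tmpdir/words.txt" | wc -l)

echo "substring matches: $loose, whole-word matches: $strict"

# Frequency table for every word: turn each run of non-letters into a
# newline, lowercase everything, then sort so uniq -c can count the
# adjacent duplicates. The final sort puts the most frequent word first.
table=$(tr -cs '[:alpha:]' '\n' < "$tmpdir/words.txt" \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn)
echo "$table"

# Clean up the throwaway directory.
rm -r "$tmpdir"
```

With this sample text, the substring search finds 5 matches while the whole-word search finds 3, and the table lists "the" first. When you move from numbers to real words, -w is usually the flag you want.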

Closure:

Yup… This one isn’t a very long (or complicated) article. That’s okay. I like articles of all shapes and sizes. This article will help you count word frequency, something we all may need at one point or another. Sure enough, Linux makes this a pretty simple task.

KGIII

Retired mathematician, residing in the mountains of Maine. I may be old and wise, but I am not infallible. Please point out any errors. And, as always, thanks again for reading.