Turn PDF Into Text

Today’s exercise is simple, though it relies on the terminal: we’re going to turn PDF into text. This isn’t complicated and it’s fairly effective. It’s also an exercise anyone can follow along with. So, if you want to turn PDF into text, read this article!

I’m sure this works in multiple distros, but I only have instructions for a few in my notes. Most everyone should be able to follow along with this article and turn PDF into text. You’ll see…

While I’m sure everyone is familiar with PDF, I’ll explain…

PDF stands for Portable Document Format and is one of many standards for documents. Specifically, it’s ISO 32000, a file format created by Adobe. Adobe’s own software is proprietary but the standard is open, meaning you have your choice of PDF readers, editors, and creators.

On the other hand, PDF may not be as easily parsed as other file formats. You may also just want to extract some text from a PDF or turn it into something more information-dense, sans pictures and fluffy formatting. There are any number of reasons why you might want to turn PDF to text and it’s a simple operation that’s going to give you ‘acceptable’ results most of the time.

The tool we’ll be using…

pdftotext:

We’ll use a tool known as ‘pdftotext’ which does as its name implies. It’s a tool that lets you turn PDF into text, so from .pdf to .txt is the goal. Like many Linux tools, this is a terminal-based operation.

You can check to see if pdftotext is already installed with this command:

which pdftotext

If the output matches this, you can skip the installation step:

$ which pdftotext
/usr/bin/pdftotext
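If you’d rather do that check inside a script, here’s a minimal sketch. It assumes a POSIX shell and uses command -v in place of which, since command -v is built into the shell. The variable names and messages are only illustrative.

```shell
# Report whether pdftotext is available on this system.
# command -v prints the tool's location and returns non-zero if it is missing.
if tool_path=$(command -v pdftotext); then
    pdftotext_status="installed at $tool_path"
else
    pdftotext_status="not installed"
fi
echo "pdftotext is $pdftotext_status"
```

If it reports "not installed", carry on to the installation step.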

If you want, you can check the man page and see that it is indeed the correct tool for the job if your job is to turn PDF into text. That’s this command:

man pdftotext

That command will show you that the description is indeed what we want to accomplish in today’s article. That description is basically:

pdftotext – Portable Document Format (PDF) to text converter

(It may also tell you the version in that section, which is odd but is what it is.)

So, you can see that pdftotext is the correct tool for the job when you want to…

Turn PDF Into Text:

As I mentioned in the intro, if you want to turn PDF into text one of the ways to do so will require using the terminal. There are all sorts of GUI tools you can use to do this very same job, but we’ll do this in the terminal. So, you can usually get away with pressing CTRL + ALT + T to open your default terminal emulator. Otherwise, check your application menu and you’ll find a terminal option in there.

With your terminal open, we’ll first install the package that provides pdftotext so that we can turn PDF into text. The tool ships with ‘poppler’, a PDF rendering library, as one of its command-line utilities. Pick from the following to match your package manager.

Debian/Ubuntu/etc:

sudo apt install poppler-utils

Arch/Manjaro/etc:

sudo pacman -S poppler

RHEL/Fedora/etc:

sudo dnf install poppler-utils

The poppler-utils package (on Arch, the plain poppler package) contains pdftotext, which is the tool we’re after in our quest to turn PDF into text. It’s a noble quest!

Now, the syntax is quite simple:

pdftotext <file_name>.pdf

That will create a <file_name>.txt file in the same directory.
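If you have a whole folder of PDFs, a small loop saves some typing. This is only a sketch: convert_all is a made-up helper name (it is not part of poppler), and it assumes a POSIX shell with pdftotext already installed.

```shell
# convert_all is a hypothetical helper: convert every PDF in a directory
# to text, producing one .txt file per .pdf file.
convert_all() {
    dir=${1:-.}
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found; install it first" >&2
        return 1
    fi
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # no PDFs: the glob stays literal
        pdftotext "$pdf"               # writes the .txt next to the PDF
        echo "converted: $pdf -> ${pdf%.pdf}.txt"
    done
}

# Usage (not run here): convert_all ~/Documents
```

The ${pdf%.pdf}.txt expansion just swaps the extension, which mirrors the file name pdftotext picks on its own.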

Now, if you checked the man page above, you’d see that there’s not a whole lot to this application. You can largely ignore all the options (and we will), though there aren’t that many.

The two options we’re most interested in let you convert only a range of pages. For that, you want the -f (first page) and -l (last page) flags. They do exactly what you’d expect and the syntax is as follows:

pdftotext -f <page_number> -l <page_number> <file_name>.pdf

I’ll give you an example…

Let’s say you want to convert pages 1 through 3. The syntax would be:

pdftotext -f 1 -l 3 <file_name>.pdf

Sometimes this whole pdftotext thing doesn’t do a great job. If the PDF file is formatted in a fancy manner (multiple columns, tables, and the like), it may just not come out in text all that well. Fortunately, you can help it along with the -layout flag.

The -layout flag is described like this:

Maintain (as best as possible) the original physical layout of the text. The default is to ‘undo’ physical layout (columns, hyphenation, etc.) and output the text in reading order.

So, that flag will do its best to turn the layout into what it was in the original PDF. This is a handy flag for when the output isn’t usable. It’s possible to retain columns, advanced formatting, and all of that stuff, meaning the text file output is more useful. You won’t always need this option, but it can come in handy. You can safely ignore the remainder of the man page for the vast majority of what folks are going to do with this command.
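The flags can be combined, and pdftotext also accepts an optional second file name if you want to choose the output file yourself. Here’s a minimal sketch of that, assuming a POSIX shell. The helper name extract_pages and the file report.pdf are made up for illustration.

```shell
# extract_pages is a hypothetical helper: convert a page range while
# keeping the layout. Arguments: PDF file, first page, last page.
extract_pages() {
    pdf=$1 first=$2 last=$3
    out="${pdf%.pdf}-p${first}-${last}.txt"
    # -layout keeps columns and spacing; -f and -l bound the page range.
    # The optional second file argument names the text file explicitly.
    pdftotext -layout -f "$first" -l "$last" "$pdf" "$out" && echo "$out"
}

# Usage (not run here): extract_pages report.pdf 1 3   # writes report-p1-3.txt
```

Naming the output after the page range keeps a partial conversion from overwriting a full one.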

That’s pretty much all you need to know about the pdftotext application. It does what you think it’d do. It’s the tool you use to turn PDF into text, just like it says on the tin! Pretty handy!

Closure:

So, that’s an article…

If you’ve ever wanted to turn PDF into text, you now know how. You can use this to make a PDF easier to parse, easier to read, etc. It’s up to you how you use pdftotext. You now have the knowledge! You now have the power! Indeed, you have life by the horns. (Which is a rather silly place to grab onto.)

Man, this is a lot of articles… At this point, it’s almost habitual. Technically, I have published something every other day – for a long time. A couple of those articles weren’t really articles. They were placeholders because Mother Nature is a fickle beast and I live in a very remote location. We had a few major (deadly even) storms that took out our infrastructure. I think I can be forgiven for that – and I did upload articles saying that there’d be no article.

The site has come a long way…

I haven’t done a meta article in a while…

Seriously, without you (my readers) I’d have never kept going this long. It’s obviously not a money-making operation, but it is an educational operation. That’s more important than money.

Thanks for reading! If you want to help, or if the site has helped you, you can donate, register to help, write an article, or buy inexpensive hosting to start your site. If you scroll down, you can sign up for the newsletter, vote for the article, and comment.

Save A Web Page As Text

If you’re at all like me, you document all sorts of things and you too might find it handy to know how to save a web page as text. It’s not a complicated task; you can do it in the terminal easily enough. So, if you want to save a web page as text, read on!

This intro should be rather short. Imagine that!

I don’t have to explain what a web page is. It’s a page (just a page) on a website.

I don’t have to explain what text means. We’ll just be using .txt files.

While this isn’t something I’ve bothered with in a long time, you might find it interesting and helpful. If you’re into keeping notes of things you want to learn more about and remember, you may find saving a web page as text worthwhile.

You can organize the text files however you want and one of the best benefits is that you can perform searches on your local documents easily enough. This might be something that interests you, especially if you’re new and browsing around the web looking for things to learn.

We’ll only be using a couple of tools. We will be using the terminal.

curl:

The first application you’ll need to save a web page as a text file is curl. The curl application transfers data to or from a URL. By default, a curl command downloads the file and writes it to your standard output.

If you check the man page, you’ll see:

curl – transfer a URL

See? Exactly as I had said. It’s the correct tool for the job.

You can also see this article about curl:

Let’s Have a Limited Look at Linux’s cURL Application

html2text:

This should be obvious by the title. It should be made further obvious by the title of this article. This is an application that turns HTML (Hypertext Markup Language – what is used on web pages more often than not) into plain text.

If you check the man page, you’ll see:

html2text – an advanced HTML-to-text converter

Once again, a fine application for the task at hand. You’ll see!

Save A Web Page As Text:

As mentioned above, this is a terminal-based operation. We’re going to save a web page as text, but we’re going to do it in the Linux terminal. More often than not, a terminal can be opened by pressing CTRL + ALT + T on your keyboard.

I’ll give installation instructions for the apt-using distros out there. These packages will be available in your package manager if you’re using any of the major distros. Just adjust these commands to match your needs.

curl:

sudo apt install curl

html2text:

sudo apt install html2text

We’re interested only in the -o (output) flag for this application of html2text.

The Process:

The syntax to save a web page as text is simple. It looks like this:

curl <URL> | html2text -o <saved_filename>.txt

Simply put, we’re using the curl application to grab the data, then sending that data through a pipe (the | character) to the html2text application, which converts it to plain text and saves it.

An example would look like this:

curl https://linux-tips.us | html2text -o linux-tips.txt

You can, of course, save individual pages as text. Here’s an example:

curl https://linux-tips.us/create-a-new-user/ | html2text -o create_a_new_user.txt

The terminal output is interesting, as it shows curl’s progress meter:

$ curl https://linux-tips.us/create-a-new-user/ | html2text -o create_a_new_user.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  195k    0  195k    0     0  66303      0 --:--:--  0:00:03 --:--:-- 66319
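If you’d rather not see the progress meter, curl has flags for that. Here’s a minimal sketch, assuming a POSIX shell with curl and html2text installed. The helper name save_page is made up for illustration.

```shell
# save_page is a hypothetical helper: fetch a URL and save it as plain text.
save_page() {
    url=$1 outfile=$2
    # -s hides the progress meter, -S still shows errors,
    # -L follows redirects, -f makes curl fail on HTTP errors such as 404.
    curl -fsSL "$url" | html2text -o "$outfile"
}

# Usage (not run here):
# save_page https://linux-tips.us/create-a-new-user/ create_a_new_user.txt
```

One caveat: the pipeline’s exit status is html2text’s, so a failed download can still leave you with an empty text file.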

Then, you can use a plain text editor to read (and edit) the text file. You can view it in the terminal with just the cat command. That’d look like this:

cat <saved_filename>.txt

Though, it’s probably easier to read the saved file with a decent plain text editor that has a GUI. There’s an abundance of text editors available for Linux, so pick your favorite and use that to read the saved output.
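Once you’ve saved a few pages, searching them is a job for grep. This is a minimal sketch, assuming a POSIX shell. It builds a throwaway directory with two made-up sample files so that it runs on its own; you’d point grep at your own notes directory instead.

```shell
# Build a small throwaway notes directory so the example is self-contained.
notes=$(mktemp -d)
printf 'How to create a new user with adduser\n' > "$notes/create_a_new_user.txt"
printf 'Notes about the tail command\n' > "$notes/tail.txt"

# -r searches the directory recursively, -i ignores case,
# -l prints only the names of the files that contain a match.
matches=$(grep -ril 'ADDUSER' "$notes")
echo "$matches"
```

Only create_a_new_user.txt is listed, since it’s the only file that mentions adduser.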

Closure:

Well, if you have ever wanted to save a web page as text, you now know how to do that. This was an article that came not from my notes but from my memory. I used to do this with some regularity but I’ve stopped doing so as of late. I haven’t kept so many new notes lately, though I’m not sure why not.

Anyhow, this is a nice and simple exercise that anyone should be able to follow. If you’re using a different package manager it may take a bit more effort, but it’s not complicated. The packages should be available in all the major distros, or something similar. The curl application will certainly be available and might even be installed by default.

Send A Message To Another Logged In User

Today’s article might be useful for system administrators or just for fun, as we learn to send a message to another logged-in user (in the terminal, of course). This shouldn’t be a complicated or lengthy article, though many of my recent articles have been significantly longer than usual.

If you’re just a regular desktop user, this might not be all that interesting, but you can still test it if you want. Besides, you never know when you will want to send a message to another logged-in user! It could happen.

Let’s say you have a server with people logged in via SSH. This could also be a single computer with multiple people logged in, should you wish to test this and play around with sending a message to another logged-in user. Let’s also say that you want to send them a message in the terminal.

Perhaps you’re going to log them off? Maybe you’re going to reboot the server? Who knows, maybe you want to give them some sort of directions and the easiest way to do so is to send them a message that pops up in their terminal. You can do that!

We’ll be using a few tools for this. None of them are all that complicated and these little tools (do one thing and do it well) are tools that make the Linux world go around.

For starters, we’ll be using the ‘who’ command.

who – show who is logged on

We will also be making use of the ‘awk’ command.

gawk – pattern scanning and processing language

Next, we’ll be using the ‘echo’ command.

echo – display a line of text

There will also be the ‘write’ command.

write — send a message to another user

We will also be using a pipe. We will pipe the output from one command to another command. We’ve done that lots of times on this site, so regular readers will already be familiar with a pipe and how it works.

Briefly speaking, a pipe is just one way to take the output from one command for use in another command. It’s a pretty handy tool to add to your Linux toolbox if you haven’t already done so. It’s a simple tool, which is a good thing.

If all of the above looks complicated, don’t be alarmed. It’s not all that complicated and the commands I share will be simple enough for most anyone to follow. You’ll be able to adjust them to your needs quite easily.

Send A Message To Another Logged In User:

As mentioned in the intro, you’ll want an open terminal for this. So, open your default terminal emulator. You can usually just press CTRL + ALT + T and your default terminal will open. This isn’t always true, but it’s true in many cases. You will otherwise need to open the terminal on your own.

With your terminal now open, let’s find out who is logged in. To do that, we only need the following command:

who -u

However, we only care about the first two fields, so let’s narrow that output with the following command:

who -u | awk '{print $1, $2}'

The output from that command is all we need for the next part. You use the first column to identify the username. That makes them easy to identify, or at least easier for most folks.

The other column is the 2nd one. That identifies the terminal they’re logged in on (such as pts/3), and is also what we will use to specify the recipient of our message. Next, to send a message to another logged-in user, you use a command similar to this, filling in the username from the first column and the terminal from the second:

echo "<message>" | write <username> <terminal>

Or, take a look at this:

(Screenshot: identify and send a message to a logged-in user.) See? It’s not complicated. It’s harder to describe than it is to do.

So, in that case, the syntax of the command is easy. It’s just like this:

echo "this is a message" | write kgiii pts/3

You’ll notice that the output of the command isn’t on that screen. It was sent to the other screen, the screen where that user was logged in (specifically over SSH). It quite happily sends the message to the user logged in at that location.

You can give write a username alone, but the same username can be logged in on more than one terminal, in which case write picks one of them for you. Specifying the terminal as well identifies both the user and the location they’ve logged in from, so the message lands where you intend. It’s a pretty handy command like that. It might look a bit complex, but it isn’t.
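If you want the message to reach every terminal a user has open, the two steps can be glued together. This is a minimal sketch, assuming a POSIX shell. The helper name notify_user is made up for illustration, and the message text is just an example.

```shell
# notify_user is a hypothetical helper: send one message to every terminal
# on which the given user is logged in.
notify_user() {
    user=$1 message=$2
    # Column 1 of who is the username, column 2 the terminal (e.g. pts/3).
    who | awk -v u="$user" '$1 == u {print $2}' | while read -r tty; do
        echo "$message" | write "$user" "$tty"
    done
}

# Usage (not run here): notify_user kgiii "Rebooting in 10 minutes"
```

The awk filter keeps only the rows for the chosen user, and the loop runs the same echo and write pipeline shown above once per terminal.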

Closure:

So, if you’ve ever wanted to send a message to another logged-in user, you can now do that. It’s easier done than explained, but hopefully, you get the gist of it and can apply it to your personal computer usage.

It’s not always that easy to come up with ideas for articles. I often pull them from my notes, but my notes are a mess, and not all of them would make good articles. If there’s something you’d like covered, and I know the subject, feel free to contact me and let me know. Of course, don’t forget that I take guest articles when they’re about Linux.

A Little About The ‘tail’ Command

Today’s article is about the ‘tail’ command, seeing as the last article was about the ‘head’ command. The tail command is the head command’s counterpart. It only makes sense to cover one after covering the other, so today’s article will do just that.

Like the head command, the tail command has been with us for a long time, since pretty much the earliest days of Unix. Where head shows you the first lines in a file, the tail command shows you the lines from the end of the file. The man page describes tail as:

tail – output the last part of files

The tail command is pretty handy, often used by sysadmins to monitor log files. It can also be used like the head command to quickly check the contents of a text file, but it shows the material at the end of the file and not at the start of the file. That’s useful for remembering where you left off, for example. Anyhow, there are all sorts of ways to use it and this article will explain some of them.

Getting Started With The ‘tail’ Command:

I don’t think it’s all that important for this article (I’m not sure, I haven’t written it yet!), but we can start on the same page like we did with the head command.

We’ll need to get started with the terminal open. You can open your terminal with your keyboard – just press CTRL + ALT + T and your default terminal should open.

Once you get the terminal open, you can run the following two commands. Be sure to press the enter button after each of them and it will download a handy text file (just some random numbers) so that we’re all working on the same file.

cd Downloads 
wget https://linux-tips.us/files/rnd-num.txt

With that complete, we can head on into the main article! It shouldn’t be all that long or difficult.

The ‘tail’ command:

Seeing as you’ve already got the rnd-num.txt file downloaded and your terminal is already open, I think we can just jump into using the tail command. If you just want to view the last 10 lines of a file, you can use this command:

tail rnd-num.txt

On the other hand, you can use the -n flag to show a specific number of lines. If you only wanted to see the bottom 5 lines, you’d use this command:

tail -n 5 rnd-num.txt

Assuming you’re all playing the home game, and just to show a good example, the output from the final command would look similar to this:

(Screenshot: tail with the -n flag.) As you can see, it only shows the last five lines of text. Pretty neat, huh?
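If you don’t have the rnd-num.txt file handy, here’s a minimal sketch you can paste instead. It assumes a POSIX shell with seq available and builds its own 20-line sample file (the temporary file is just a stand-in). It also shows the +N form of -n, which starts the output at a given line instead of counting back from the end.

```shell
# Create a 20-line sample file (a stand-in for rnd-num.txt).
sample=$(mktemp)
seq 1 20 > "$sample"

# Show only the last 5 lines.
last_five=$(tail -n 5 "$sample")
echo "$last_five"
# prints 16, 17, 18, 19, 20 (one per line)

# With a leading +, -n counts from the top: start the output at line 18.
from_18=$(tail -n +18 "$sample")
echo "$from_18"
# prints 18, 19, 20 (one per line)

rm -f "$sample"
```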

Along the same lines, but not necessarily as useful, is the -c flag. It works the same way it does in the head command, namely showing the specified number of bytes. If you wanted to see the final 5 bytes, the command would look like:

tail -c 5 rnd-num.txt
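One thing to keep in mind is that the newline at the end of a line counts as a byte. A minimal sketch, assuming a POSIX shell, with a made-up sample string in place of a file:

```shell
# Newlines are bytes too, so the last 5 bytes of "hello world\n"
# are "orld" plus the trailing newline.
last_bytes=$(printf 'hello world\n' | tail -c 5)
echo "$last_bytes"
# prints: orld
```

That’s why asking for 5 bytes may show you only 4 visible characters.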

You can also use tail on more than one file at a time. When you do, tail prints a header with each file’s name before its output. The -v flag makes it print that header every time, even for a single file. The command would look a little like this:

tail -v rnd-num.txt test.txt

The output would look similar to this:

(Screenshot: tail being used on multiple files.) It helpfully shows you the file names, which can be handy if you’re using multiple files.

One of the command options available with tail isn’t available with head. That’s the -f (follow) flag, which is mostly used for logs. What happens is you use the -f flag and then tail keeps running, outputting new lines to your terminal as they occur. In that case, it’d be something like:

tail -f <log>

That should show the last 10 lines of the log file and then update when new lines are added to the log file you’ve opened. Use man tail for more usage information.
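To watch -f work without a real log, you can fake one. This is a minimal sketch under a few assumptions: a POSIX shell, the timeout command from GNU coreutils to stop tail automatically, and a temporary file standing in for the log. Normally you’d leave tail running and press CTRL + C when you’re done.

```shell
# Simulate a log file that receives a new line one second from now.
log=$(mktemp)
echo "old entry" > "$log"
( sleep 1; echo "new entry" >> "$log" ) &

# -n 0 skips the existing lines; -f keeps tail running to print new ones.
# timeout stops tail after 3 seconds so the example ends on its own.
followed=$(timeout 3 tail -n 0 -f "$log" || true)
echo "$followed"
# prints: new entry

wait
rm -f "$log"
```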

Closure:

And that’s it! There’s another article for the site and another article closer to reaching the project’s goals. This article covers the tail command, seeing as the head command was covered in the last article. Feel free to leave a comment sharing how you use the tail command, or maybe even just a comment or question to motivate me.

It’s Time To Learn A Little About The ‘head’ Command

Today’s article is about the ‘head’ command. The head command is a tool for viewing a file’s contents (or piped data), starting from the top. There’s not a whole lot to the command, and this will be a pretty short article that’s fit for a beginner.

The head command has been with us since the heady days of Unix (See what I did there?) and is still a useful command today. In fact, the man page defines it like this:

head – output the first part of files

As you can guess, it does exactly what it says on the tin. There are any number of circumstances when you might want to use it, but I often use it when I don’t remember the exact filename I’m after and just want to see the first few lines of text as a reminder. There are better uses.

Anyhow, there’s not a whole lot to it, but I’ll show you the basics. Like we did recently, let’s see if we can all get started on the same page. So, open your terminal and enter the following:

cd Downloads
wget https://linux-tips.us/files/rnd-num.txt

Doing that will put you in the Downloads directory and will download a text file (it’s perfectly harmless) and it means we are all working with the same settings. You do not need to do this, but it could help.

About The ‘head’ Command:

Seeing as you’ve already got the terminal open, and that you figured it out without me having to repeat it like I do in almost every article, we’ll just jump right into the first command.

head rnd-num.txt

That should output the first ten lines of the rnd-num.txt file, looking something like this:

(Screenshot: head in action.) If you used the rnd-num.txt file, your output should be the same. Pretty neat, huh?

That’s the first ten lines from the rnd-num.txt file or, in other words, the head of the file has been outputted to the terminal. This has a number of uses, including the pipe. You can easily pipe it to another command. It’d look something like this:

head rnd-num.txt | cat > test.txt

That’s not all that head can do, it can output a specified number of lines. To do that, you use the -n flag. It looks like this:

head -n 5 rnd-num.txt

You can also use the -c flag to show the first x-number of bytes in a file. That’s not very complicated. In this case, we’ll look at the first 25 bytes.

head -c 25 rnd-num.txt
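Both flags work on piped data as well as on files. Here’s a minimal sketch, assuming a POSIX shell with seq available. The generated numbers and the sample string are stand-ins for real data.

```shell
# head also reads piped data: keep the first 3 of 100 generated lines.
first_lines=$(seq 1 100 | head -n 3)
echo "$first_lines"
# prints 1, 2, 3 (one per line)

# -c counts bytes rather than lines: the first 4 bytes of the input.
first_bytes=$(printf 'abcdefghij\n' | head -c 4)
echo "$first_bytes"
# prints: abcd
```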

You’ll find the output looks pretty similar to the output from the previous head command. You can even work with multiple files and the head command will handle them easily. With multiple files, head prints each file’s name as a header by default, and the -v flag makes sure that header is always shown.

head -v rnd-num.txt test.txt

It’ll helpfully preface the start of each file with the name of the file. In this case, the first line of the output would look like this:

(Screenshot: head with multiple files.) See? It’ll handily list the filename before listing the output. Pretty helpful!

As shown in the image, you can easily deal with multiple files and whatnot, but there’s really not much more to be done with the head command. If you’re curious, you can also enter man head to get more usage information.
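If you’d like to see those file name headers for yourself, here’s a minimal sketch, assuming a POSIX shell. The two small files are made up and live in a temporary directory, standing in for rnd-num.txt and test.txt.

```shell
# Two small throwaway files stand in for rnd-num.txt and test.txt.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a.txt"
printf 'three\nfour\n' > "$dir/b.txt"

# With several files, head prints a "==> name <==" header before each one.
# -v forces the header even when only a single file is given.
headers=$(cd "$dir" && head -n 1 -v a.txt b.txt)
echo "$headers"

rm -rf "$dir"
```

Each header is followed by that file’s first line, so it’s easy to tell which output came from which file.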

Closure:

Yup… There’s another article. This is about the head command, a command that’s not used often but worth having in your toolbox. If you use it more often, feel free to leave a comment explaining what you do with it.