Networking

Save A Web Page As Text

If you’re at all like me, you document all sorts of things and you too might find it handy to know how to save a web page as text. It’s not a complicated task; you can do it in the terminal easily enough. So, if you want to save a web page as text, read on! 

This intro should be rather short. Imagine that!

I don’t have to explain what a web page is. It’s a page (just a page) on a website.

I don’t have to explain what text means. We’ll just be using .txt files.

While this isn’t something I’ve bothered with in a long time, you might find it interesting and helpful. If you’re into keeping notes of things you want to learn more about and remember, you may find saving a web page as text worthwhile.

You can organize the text files however you want and one of the best benefits is that you can perform searches on your local documents easily enough. This might be something that interests you, especially if you’re new and browsing around the web looking for things to learn.

We’ll only be using a couple of tools. We will be using the terminal.

curl:

The first application you’ll need to save a web page as a text file will be the curl application. The curl application is used to transfer a URL. A curl command downloads a file and shows it in your standard output.

If you check the man page, you’ll see:

curl – transfer a URL

See? Exactly as I had said. It’s the correct tool for the job. 

You can also see this article about curl:

Let’s Have a Limited Look at Linux’s cURL Application

html2text:

This should be obvious by the title. It should be made further obvious by the title of this article. This is an application that turns HTML (Hypertext Markup Language – what is used on web pages more often than not) into plain text.

If you check the man page, you’ll see:

html2text – an advanced HTML-to-text converter

Once again, a fine application for the task at hand. You’ll see!

Save A Web Page As Text:

As mentioned above, this is a terminal-based operation. We’re going to save a web page as text, but we’re going to do it in the Linux terminal. More often than not, a terminal can be opened by pressing CTRL + ALT + T on your keyboard.

I’ll give installation instructions for the apt-using distros out there. These packages will be available in your package manager if you’re using any of the major distros. Just adjust these commands to match your needs.

curl:
sudo apt install curl
html2text:
sudo apt install html2text

We’re interested only in the -o (output) flag for this application of html2text.

The Process:

The syntax to save a web page as text is simple. It looks like this:

curl <URL> | html2text -o <saved_filename>.txt

Simply, we’re using the curl application to grab the data, we then send that data through the pipe command where it’s processed by the html2text application.

An example would look like this:

curl https://linux-tips.us | html2text -o linux-tips.txt

You can, of course, save individual pages as text. Here’s an example:

curl https://linux-tips.us/create-a-new-user/ | html2text -o create_a_new_user.txt

The terminal output is interesting:

$ curl https://linux-tips.us/create-a-new-user/ | html2text -o create_a_new_user.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  195k    0  195k    0     0  66303      0 --:--:--  0:00:03 --:--:-- 66319

Then, you can use a plain text editor to read (and edit) the text file. You can view it in the terminal with just the cat command. That’d look like this:

cat <saved_filename>.txt

Though, it’s probably easier to read the saved file with a decent plain text editor that has a GUI. There’s an abundance of text editors available for Linux, so pick your favorite and use that to read the saved output.

Closure:

Well, if you have ever wanted to save a web page as text, you now know how to do that. This was an article that came not from my notes but from my memory. I used to do this with some regularity but I’ve stopped doing so as of late. I haven’t kept so many new notes lately, though I’m not sure why not.

Anyhow, this is a nice and simple exercise that anyone should be able to follow. If you’re using a different package manager it may take a bit more effort, but it’s not complicated. The packages should be available in all the major distros, or something similar. The curl application will certainly be available and might even be installed by default.

Thanks for reading! If you want to help, or if the site has helped you, you can donate, register to help, write an article, or buy inexpensive hosting to start your site. If you scroll down, you can sign up for the newsletter, vote for the article, and comment.

KGIII

Retired mathematician, residing in the mountains of Maine. I may be old and wise, but I am not infallible. Please point out any errors. And, as always, thanks again for reading.

Recent Posts

Short: The Halt Command

Today's article is intentionally short because we're simply going to learn about the basics of…

9 hours ago

Installing Flatpaks In Linux

Today we're going to have a pretty easy article where you'll learn about installing Flatpaks…

2 days ago

A Quick Look At The Shutdown Command

This is going to be just a quick article about a command many of you…

4 days ago

Setting Up Coding Environments on Linux for Educational Use

With so many strong attributes, such as robustness and flexibility, Linux stands as a powerful…

6 days ago

View Detailed Hardware Information

There are many tools for showing your hardware information and today we'll get to view…

1 week ago

How To: Install Wine In Lubuntu

Today's article isn't going to be complicated or long as we are going to cover…

1 week ago