Klaus Zimmermann's Corner

How to install all manpages in Alpine Linux

An old instruction book of telephones called 'The Telephone, and how we use it'. It depicts two children dressed in 50s fashion holding wired handsets to their ears and presumably talking to each other.

EDIT (20240717): it was pointed out to me by user @amk that the same end-result can be achieved by simply installing the "docs" Alpine package, which will automatically pull in all missing documentation on the system, including ones that the procedure below misses. If you're looking to install manpages and keep them updated as new packages arrive, this is the best way to go.

The text below remains useful as an exercise to learn how to query textual information in a Unix environment, and you may use it to practice those skills yourself.

It's no secret that one of my favorite distributions is Alpine Linux, which suits both my older computers and my portable USB installations. It has given new life to many an ancient machine in my house, and it is small and light enough to fit a complete desktop session on a bug-out USB stick.

In pursuing its mission of being small, fast and secure, aiming at things like containers and embedded systems, however, the Alpine project makes some choices that "feel" a little weird on your average desktop system: some common drivers are missing, presumably removed to save space; coreutils programs are replaced by smaller, watered-down BusyBox versions; and there is a complete lack of offline, local documentation.

So Alpine doesn't come with man. Including manpages takes up storage space in the distribution, and anyone running Alpine in a container or on a remote machine probably already has access to this documentation on the host machine. Still, what about my use case of Alpine as a daily driver? I don't have a "host" machine to look things up on, and I don't always have an internet connection to search for command examples. How can I make my Alpine machine self-reliant offline?

Turns out that the repositories neatly separate software from documentation: manpages and other docs live in their own packages, identified by the -doc suffix, so that helps. However, it leaves us with another problem: how do we find all the installed packages that are missing their documentation? Keep in mind that not all packages have an associated -doc package.

I had a hunch this job would involve some major text processing with the usual Unix tools (sed, awk, grep...).

And, boy, was I right.

Step 1: install man

The man program itself is available in the mandoc package. Apparently, being a documentation reader itself, it comes with its own documentation package preinstalled, so there's no mandoc-doc package to install:

apk add mandoc

Easy-peasy. So you now have a manual reader... but no manual pages except maybe for those about man itself.

Step 2: list all the installed packages in the system

Which packages need their documentation installed as well? apk can help you out here. First, apk list -I (short for --installed) shows all the packages currently on your system. You have to do some parsing, as it helpfully includes things like version information too, which isn't relevant to the task at hand. Save the bare package names to a file for later:

apk list -I | awk '{print $1}' | sed -E 's/-[^-]+-r[0-9]+$//' > installed_pkgs
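To see why the version-stripping step anchors on the end of the line, here is a small sketch that wraps it in a hypothetical strip_version function and feeds it two made-up lines in the usual apk list -I layout. Alpine versions take the form <pkgver>-r<N>, and pkgver itself contains no hyphen, so removing only the last two hyphen-separated fields leaves the name intact; a looser pattern like s/-[0-9].*// would cut a name such as font-adobe-100dpi down to font-adobe.

```shell
#!/bin/sh
# Hypothetical helper: turn `apk list -I` lines into bare package names.
# Keep the first field ("name-pkgver-rN"), then drop only the trailing
# "-pkgver-rN"; pkgver never contains a hyphen, so the name stays intact.
strip_version() {
    awk '{ print $1 }' | sed -E 's/-[^-]+-r[0-9]+$//'
}

# Made-up sample lines; prints "busybox" and "font-adobe-100dpi".
printf '%s\n' \
    'busybox-1.36.1-r15 x86_64 {busybox} (GPL-2.0-only) [installed]' \
    'font-adobe-100dpi-1.0.4-r1 noarch {font-adobe-100dpi} (MIT) [installed]' |
    strip_version
```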

OK, you now have a list of the packages currently on your system. Just tack -doc onto the end of each package name in that list, install them and you are good to go, right?

Nope, not quite.

Step 3: force an error upon apk and use it

This is because not all packages have an associated -doc package, and apk throws an error when an "unknown" package is required. So the question now is to find out exactly which of those installed packages have an associated documentation package and install only those.

So apk will throw an error if you attempt to install all those nonexistent docs. But who says we can't use that error to our own advantage? Why not use it as a way to flag which packages not to install?

Sure enough, we can. First, add -doc to every line in the list:

sed -i 's/$/-doc/g' installed_pkgs

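Then run apk add on the whole list (as root) and capture its error output into a second file for comparison. As long as at least one of the requested packages doesn't exist, apk refuses the entire request, so nothing gets installed at this point: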

xargs apk add < installed_pkgs 2> incompatible_pkgs_raw

Instead of using xargs, you could also turn installed_pkgs into a script: prefix it with apk add \, end every line but the last with a backslash, and execute it.

Conveniently, apk's error output lists each missing package on a line of its own, followed by a "required by" line, and the whole list is introduced by an ERROR line. Filtering out the "required by" and "ERROR" lines leaves one line per missing package, and a neat awk command extracts the package names in the end.

cat incompatible_pkgs_raw | 
    grep -v "required by" |
    grep -v "ERROR" |
    awk '{ print $1 }' > incompatible_pkgs
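The filter is easier to follow with the error message in front of you. The sketch below wraps the same grep/awk pipeline in a hypothetical missing_packages function and runs it over a hand-written approximation of apk's output; the exact wording is an assumption and may differ between apk-tools versions, and foo-doc and bar-doc are made-up names.

```shell
#!/bin/sh
# Hypothetical helper wrapping the same filter as above: drop the
# "required by" and "ERROR" lines, then keep the first word of what's left.
missing_packages() {
    grep -v -e 'required by' -e 'ERROR' | awk '{ print $1 }'
}

# Hand-written approximation of apk's error output (assumed format);
# foo-doc and bar-doc are made-up names. Prints "foo-doc" and "bar-doc".
missing_packages <<'EOF'
ERROR: unable to select packages:
  foo-doc (no such package):
    required by: world[foo-doc]
  bar-doc (no such package):
    required by: world[bar-doc]
EOF
```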

Now you should have two lists available: all the candidate -doc packages for what's installed, and the ones among them that don't exist. Can you use one to filter out the other?

Step 4: use diff's nemesis - comm

I struggled for a while trying to figure out how to use diff to compare these two lists and filter one with the other, until I came across a forum answer to a similar task that suggested using comm instead.

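I have to say that until I performed this task, I had never heard of the comm command, which is in a way the counterpart of diff: it compares two sorted files line by line and prints, in three columns, the lines unique to the first file, the lines unique to the second, and the lines common to both.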

The forum answer suggested using it like this:

comm -2 <(sort installed_pkgs) <(sort incompatible_pkgs)

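The -2 flag hides the second column (lines found only in incompatible_pkgs), so the output lists the packages unique to installed_pkgs unindented and prefixes with a tab character the lines common to both files, which are exactly the -doc packages that don't exist. Because I had trouble processing the tab character with grep later on, I decided to change it to something more manageable: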

comm -2 <(sort installed_pkgs) <(sort incompatible_pkgs) |
    sed 's/\t/#/g' > difference_list

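In short, you have to make sure both files are sorted before you compare them, otherwise many false positives will appear. The <(sort ...) process substitution sorts them on the fly, so you don't have to create extra temporary files. Note that process substitution is a bash feature: Alpine's default shell, BusyBox ash, doesn't support it, so run these commands from bash (apk add bash if needed).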
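If you'd rather stay in plain POSIX sh, one possible sketch (assuming BusyBox or GNU sort, comm and mktemp) sorts into temporary files instead and asks comm for the first column only: -23 hides both the lines unique to the second file and the lines common to both, leaving just the packages that exist. The function name valid_docs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the candidate -doc packages (file $1) that are
# not in the list of packages apk rejected (file $2).
# Works in plain POSIX sh: no bash process substitution needed.
valid_docs() {
    sorted_a=$(mktemp) || return 1
    sorted_b=$(mktemp) || { rm -f "$sorted_a"; return 1; }

    # comm needs both inputs sorted with the same collation.
    sort "$1" > "$sorted_a"
    sort "$2" > "$sorted_b"

    # -2 hides lines only in $2, -3 hides lines common to both,
    # leaving just the lines unique to $1.
    comm -23 "$sorted_a" "$sorted_b"

    rm -f "$sorted_a" "$sorted_b"
}
```

With the files from the steps above, the install list would then come straight out of

valid_docs installed_pkgs incompatible_pkgs > final_list

with no tab or # markers to clean up. grep -vxF -f incompatible_pkgs installed_pkgs is another option that doesn't need sorted input at all.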

And now you have a file in which the valid documentation packages stand apart from the invalid ones.

Step 5: clean up and create a new install list

With the commands above, the problematic packages were prefixed with a tab character (\t), which sed then turned into a #. That's a neat trick, because you can now filter them out with a simple grep:

cat difference_list | grep -v "^#" > final_list

Now you can perform a clean install of all the valid documentation packages for the ones in your system!

xargs apk add < final_list

Mind you, it's not exactly a small install either: on my system I ended up downloading more than 600MB of documentation this way. But come to think of it, this is quite negligible given the amount of disk space I have here.

Conclusion

Adding documentation to Alpine Linux isn't the hardest thing to do, but since Alpine doesn't install it by default, adding everything retroactively to an already established system is a bit of a hurdle. However, with some command-line kung-fu and the awesome Unix text processing tools, we can surely bump the "offline capabilities" of an Alpine system up to those of a more mainstream Linux distro and have meaningful local documentation.

I probably should take this one step further and write a script to automate this as I set up other Alpine systems; after all, disk space is quite cheap.
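For the curious, here is a rough sketch of what such a script might look like, assuming POSIX sh (BusyBox ash or dash), root privileges and the apk error format described in Step 3. It strings the steps together, strips only the trailing -<pkgver>-r<N> from each name, sorts into a temporary directory instead of relying on bash process substitution, and uses comm -23 to keep the candidates apk didn't reject. The function name install_all_docs is hypothetical, and this is a sketch rather than a polished tool.

```shell
#!/bin/sh
# Hypothetical script sketch: install every -doc package that exists for
# the packages currently installed. Needs root on a real Alpine system.
install_all_docs() {
    workdir=$(mktemp -d) || return 1

    # Installed package names (trailing "-pkgver-rN" removed) + "-doc", sorted.
    apk list -I | awk '{ print $1 }' |
        sed -E 's/-[^-]+-r[0-9]+$//; s/$/-doc/' |
        sort > "$workdir/candidates"

    # First attempt: if any candidate doesn't exist, apk refuses the whole
    # request and lists the missing packages on stderr.
    xargs apk add < "$workdir/candidates" 2> "$workdir/errors"

    # Same filter as in Step 3, sorted for comm.
    grep -v -e 'required by' -e 'ERROR' "$workdir/errors" |
        awk '{ print $1 }' | sort > "$workdir/missing"

    # Keep only candidates that apk did not reject, and install those.
    comm -23 "$workdir/candidates" "$workdir/missing" > "$workdir/final"
    if [ -s "$workdir/final" ]; then
        xargs apk add < "$workdir/final"
    fi

    rm -rf "$workdir"
}
```

Calling install_all_docs as root would then cover Steps 2 to 5 in one go.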

And now I gotta make sure I don't spoil this effort, by always installing a matching -doc package for every package I install from now on!


If you daily-drive Alpine Linux, do you miss having documentation readily available like in other distributions? How did you install it all on your system? Let me know on Mastodon!


This post is number #56 of my #100DaysToOffload project.


Last updated on 07/17/24