How Does Pacman Know What to Update
Posted by Suyash Singh on October 18, 2022

Pacman, short for PACkage MANager, is the default package manager on Arch Linux. I am a big fan of Arch’s rolling release model, and I find the way pacman manages it efficient and elegant. If, like me, you have wondered how it handles the rolling release model so well, this post keeps the explanation short and crisp.

Pacman’s official page on the Arch Linux website reads:

pacman is a utility which manages software packages in Linux. It uses simple compressed files as a package format, and maintains a text-based package database (more of a hierarchy), just in case some hand tweaking is necessary.

pacman does not strive to “do everything.” It will add, remove and upgrade packages in the system, and it will allow you to query the package database for installed packages, files and owners. It also attempts to handle dependencies automatically and can download packages from a remote server.

Core of pacman

At the core of pacman is the ALPM (Arch Linux Package Management) library, libalpm. Pacman is essentially a front-end to ALPM, which is where the actual package management work happens.

Server-client architecture

Pacman relies on a client-server model: remote repositories (mirrors) host the available packages along with a database describing the current version of each one, and users’ machines (the clients) synchronize their local copy of that database with the server’s.

Anatomy of a usual pacman system upgrade command

Generally, Arch users upgrade their system with the pacman -Syu command:

$ sudo pacman -Syu
:: Synchronizing package databases...
 core                  157.2 KiB  52.5 KiB/s 00:03 [###############] 100%
 extra                1722.1 KiB  1156 KiB/s 00:01 [###############] 100%
 community               7.1 MiB   936 KiB/s 00:08 [###############] 100%
:: Starting full system upgrade...
resolving dependencies...
looking for conflicting packages...
 
Packages (180) ...  haskell-aeson-2.1.0.0-2
               haskell-aeson-pretty-0.8.9-97
               haskell-ansi-terminal-0.11.3-31
               haskell-ansi-wl-pprint-0.6.9-348
               ...
 
Total Download Size:   127.70 MiB
Total Installed Size:  492.69 MiB
Net Upgrade Size:        0.09 MiB

Here, -S selects pacman’s sync operation, -y (--refresh) downloads a fresh copy of the package databases from the server, and -u (--sysupgrade) upgrades every installed package that is out of date.
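To make the upgrade step concrete, here is a minimal Python sketch of how a list of out-of-date packages could be computed from two name-to-version maps. It is an illustration only: the function names (version_key, upgrades) and the installed versions are made up, and the comparison is a simplification that handles the usual epoch:pkgver-pkgrel shape but not every rule in pacman's real vercmp logic (for example pre-release suffixes such as rc).

```python
import re


def _key(part):
    """Turn '2.1.0.0' into a sortable list; numeric tokens outrank alphabetic ones."""
    tokens = re.findall(r"[0-9]+|[A-Za-z]+", part)
    return [(1, int(t)) if t.isdigit() else (0, t) for t in tokens]


def version_key(version):
    """Sort key for 'epoch:pkgver-pkgrel' (simplified, not pacman's exact algorithm)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    if "-" in rest:
        pkgver, _, pkgrel = rest.rpartition("-")
    else:
        pkgver, pkgrel = rest, ""
    return (int(epoch), _key(pkgver), _key(pkgrel))


def upgrades(installed, sync):
    """Return {name: (installed version, newer version)} for out-of-date packages."""
    return {
        name: (old, sync[name])
        for name, old in installed.items()
        if name in sync and version_key(sync[name]) > version_key(old)
    }


# Hypothetical local state; the "new" versions come from the output above.
installed = {"haskell-aeson": "2.1.0.0-1", "haskell-aeson-pretty": "0.8.9-97"}
sync = {"haskell-aeson": "2.1.0.0-2", "haskell-aeson-pretty": "0.8.9-97"}
print(upgrades(installed, sync))
```

Only haskell-aeson is reported here, because its package release went from 1 to 2 while haskell-aeson-pretty is already current. Packages that exist locally but not in any repository are simply ignored.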

Cache & Databases

Pacman stores the packages it downloads in the /var/cache/pacman/pkg directory. A quick look inside shows two main types of files, package archives (.pkg.tar.zst) and their detached signatures (.sig):

$ ls /var/cache/pacman/pkg
 
<PACKAGE_A>.4rc5-14-x86_64.pkg.tar.zst
<PACKAGE_A>.4rc5-14-x86_64.pkg.tar.zst.sig

Pacman stores the repository databases it syncs, one per repository, in the /var/lib/pacman/sync directory:

$ ls /var/lib/pacman/sync
 community.db   core.db   extra.db
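Each of these .db files is a compressed tar archive holding one directory per package, with a desc file made of %FIELD% headers followed by value lines. The Python sketch below reads that layout. To stay runnable anywhere, it builds a tiny in-memory archive with made-up entries rather than touching a real database; the helper names (parse_desc, read_sync_db, make_demo_db) are my own and not part of pacman.

```python
import io
import tarfile


def parse_desc(text):
    """Parse a desc file: '%FIELD%' header lines, each followed by value lines."""
    fields, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if len(line) > 2 and line.startswith("%") and line.endswith("%"):
            current = line.strip("%")
            fields[current] = []
        elif line and current is not None:
            fields[current].append(line)
    return fields


def read_sync_db(fileobj):
    """Return {package name: version} for every desc entry in a repo db."""
    packages = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as db:
        for member in db:
            if member.isfile() and member.name.endswith("/desc"):
                text = db.extractfile(member).read().decode("utf-8")
                desc = parse_desc(text)
                packages[desc["NAME"][0]] = desc["VERSION"][0]
    return packages


def make_demo_db(entries):
    """Build a small in-memory db with the same layout (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as db:
        for name, version in entries.items():
            data = f"%NAME%\n{name}\n\n%VERSION%\n{version}\n".encode()
            info = tarfile.TarInfo(f"{name}-{version}/desc")
            info.size = len(data)
            db.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


demo = make_demo_db({"haskell-aeson": "2.1.0.0-2", "haskell-aeson-pretty": "0.8.9-97"})
print(read_sync_db(demo))
```

On an Arch machine, passing open("/var/lib/pacman/sync/core.db", "rb") to read_sync_db should work the same way, though I have only described the demo archive here.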

Behind the scenes

  • Information about every package in a repository is kept in the .db files in the /var/lib/pacman/sync directory. When a sync happens, these db files are refreshed locally first, and then a list of packages to be updated is prepared
  • Installed packages (recorded in the local database under /var/lib/pacman/local) are compared against the package list from the freshly updated repository dbs
  • Once pacman has worked out which packages and dependencies need to be updated, it presents the list to the user
  • Packages and dependencies that must be fetched from the remote are downloaded into /var/cache/pacman/pkg
  • It verifies the integrity of each downloaded package using the checksum recorded in the package’s desc entry in the repository db, along with the package’s PGP signature. The .MTREE file inside the package records per-file hashes and permissions, which pacman can use later to validate installed files (for example with pacman -Qkk)
  • Pacman runs the pre-transaction hooks, if any
  • Pacman then extracts the package’s files into their target locations with the proper permissions
  • Pacman runs the post-transaction hooks, if any
  • The high-level flow above is transactional, i.e. a lock (the /var/lib/pacman/db.lck file) is held while the system is going through an upgrade transaction
  • When a package needs to be removed, pacman deletes the files it copied during installation. Configuration files are usually kept: modified files listed in the package’s backup array are saved with a .pacsave extension
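Two of the steps above, the transaction lock and the integrity check, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how libalpm is implemented: db_lock, sha256_matches and run_transaction are hypothetical names, the lock lives in a temporary directory rather than /var/lib/pacman, and only a SHA-256 comparison is shown (no signature checking).

```python
import hashlib
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def db_lock(lock_path):
    """Hold an exclusive lock file for the duration of a transaction."""
    # O_EXCL makes creation fail if another transaction already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def sha256_matches(path, expected_hex):
    """Hash a file in chunks and compare against the expected digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


def run_transaction(lock_path, downloads):
    """downloads: list of (file path, expected sha256). Returns the verified paths."""
    with db_lock(lock_path):
        bad = [path for path, checksum in downloads if not sha256_matches(path, checksum)]
        if bad:
            raise ValueError(f"corrupted packages: {bad}")
        # A real transaction would run hooks and extract files at this point.
        return [path for path, _ in downloads]


with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "demo.pkg.tar.zst")
    with open(pkg, "wb") as f:
        f.write(b"not a real package")
    expected = hashlib.sha256(b"not a real package").hexdigest()
    lock = os.path.join(tmp, "db.lck")
    print(run_transaction(lock, [(pkg, expected)]))
    print(os.path.exists(lock))  # the lock is released once the transaction ends
```

Creating the lock file with O_EXCL means a second transaction fails immediately instead of waiting, which matches the familiar "unable to lock database" behaviour when two pacman runs overlap.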

Final thoughts

Pacman is a fast package manager because of the following:

  • It downloads prebuilt binary packages directly instead of building from source
  • Starting with v6, it supports parallel downloads through the ParallelDownloads option in pacman.conf (off by default in v6)
  • The number of packages maintained in the official repositories is relatively small
  • The package cache lets it reuse packages it has already downloaded
  • The repository db uses a simple directory-per-package layout, which is quick to read since the official repositories hold only ~13k packages in total (as of 2022)
  • The .MTREE file in each package helps it verify the integrity of package files quickly
  • The .PKGINFO file in each package helps it quickly work out the dependencies if necessary
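As a small illustration of why .PKGINFO is cheap to work with: it is a plain list of key = value lines, with repeated keys such as depend. The Python sketch below parses a made-up sample (demo-tool is not a real package) and pulls out the dependency names; parse_pkginfo and dependency_names are hypothetical helpers of my own.

```python
import re


def parse_pkginfo(text):
    """Collect 'key = value' lines into {key: [values]}, skipping comments."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            info.setdefault(key, []).append(value)
    return info


def dependency_names(info):
    """Strip version constraints such as '>=1.2' from each depend entry."""
    return [re.split(r"[<>=]", dep, maxsplit=1)[0] for dep in info.get("depend", [])]


# Made-up sample in the .PKGINFO format, not taken from a real package.
SAMPLE = """\
# example only
pkgname = demo-tool
pkgver = 1.2.3-1
arch = x86_64
depend = glibc
depend = zlib>=1.2
optdepend = bash-completion: tab completion
"""

info = parse_pkginfo(SAMPLE)
print(info["pkgname"][0], info["pkgver"][0])
print(dependency_names(info))
```

Because the format needs nothing more than line splitting, reading the dependency list does not require unpacking the package payload or running any scripts.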

Further reading