A play on the Audi slogan: Vorsprung Durch Technik. Except we’re going to talk about something that is clearly not progress. Systemd. Roughly 6 years ago, Systemd came to life as the new, event-based init mechanism, designed to replace the old serialized System V thingie. Today, it is the reality in most distributions, for better or worse. Mostly the latter.
Why would you oppose progress, one may say. To that end, we need to define progress. Is it merely the state of something being newer, AKA newer is always better, or is it the fact that it offers superior functionality that was missing in the old technology? After all, System V is 33 years old, so the new stuff ought to be smarter. The topic of my article today is to tell you a story of how I went about fixing a broken Fedora 24 system – powered by Systemd of course, and why, at the end of it, my conclusion was one of pain and defeat.
We all know that people are resistant to change, so sometimes, it is difficult to differentiate between legitimate opposition to inferior products and a kneejerk emotional response. This is why changes need to be measured in the same way biology measures its success. We need to look at the evolutionary model and apply it to technology. We need to ask, is the new stuff really better? And it comes down to how easy it is for a new product to survive – in the hands of its users, and how much investment is needed to sustain a successful level of existence. If you’re looking for an even more abstract analogy, it’s about energy. Any new model ought to be even more energy efficient, and require fewer resources to maintain the optimal, often minimal level of survival.
This model is highly relevant and applicable to software, much more than you may imagine. Take a look at Windows 8, and the introduction of the Start Screen, which effectively doubled the number of actions required from users to access common things previously available in the classic desktop setup. This increase in effort without any increase in productivity led to a huge backlash from the community, and eventually, Microsoft went back on its decision and restored the standard menu, because it is superior on the evolutionary scale.
The same thing applies to Windows 10 settings versus the old Control Panel. Close to home, the Gnome desktop environment is another good example of an evolutionary regression. Opening applications takes an extra click, as they are not readily available on any workspace or panel. Some of the criticism may sound petty, but it touches the core of what we are, as a human race. We are biological engines, and we are designed to be lazy.
If you have never used System V like, eh, systems, before, then you probably won’t care about Systemd. However, if you have performed system administration tasks on a Linux box in the old era and now need to do the same with Systemd, then you might be interested to know how much extra effort, apart from the obvious learning curve, you need to keep your systems healthy and going. This ties into our example. We have a broken Fedora box. And we want to try to fix it.
My ailing system exhibits the following symptoms – it cannot reach the Gnome desktop environment. It is stuck booting, and there are no indications as to what may have gone wrong. There are no virtual consoles available, and thus, I am unable to access any sort of debug information in order to try to resolve things.
The problem in more detail
So, the system is not booting. We only have the progress bar, which slows down as it reaches the right end, and then it never transitions into any kind of login screen or desktop. Using Ctrl + Alt + F1-7 or any combo thereof does not help in any way.
I realized the only way I was going to make progress was to boot into a Fedora live session and try to figure out from there what went wrong. This meant accessing the disk and mounting the filesystems first, which required a little bit of hackery, as I was using the default LVM setup. But as I’ve shown you in a tutorial on exactly this very topic, it’s not too difficult. In the live session, once I had the original /var/log available, I could finally see the root of the problem:
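For reference, the LVM hackery mentioned above boils down to a handful of commands. Here is a minimal Python sketch that only builds and prints them, so nothing is executed by accident. The volume group name (fedora), the logical volume name (root) and the mount point (/mnt/sysroot) are my assumptions for illustration; check the output of lvs in your own live session for the real names. Mounting read-only is also a cautious choice on my part, not something the recovery strictly requires.

```python
import shlex
from typing import List


def lvm_mount_plan(vg: str, lv: str, mount_point: str) -> List[List[str]]:
    """Return the commands (as argv lists) that activate LVM and mount one volume."""
    device = f"/dev/{vg}/{lv}"
    return [
        ["vgscan"],                                  # discover volume groups on attached disks
        ["vgchange", "-ay", vg],                     # activate the logical volumes in that group
        ["mount", "-o", "ro", device, mount_point],  # read-only, so the broken system is not touched
    ]


if __name__ == "__main__":
    # Hypothetical names, not taken from the broken box described here.
    for argv in lvm_mount_plan("fedora", "root", "/mnt/sysroot"):
        print("sudo " + shlex.join(argv))
```

If you later want to edit files on the broken system, the volume would have to be remounted read-write.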
[FAILED] Failed to start Journal Service
[DEPEND] Dependency Failed for Flush Journal to Persistent Storage
This is a Systemd message, and it tells us that something went wrong with the journaling service, which is governed by the journald daemon of the Systemd framework. At this point, I had to invest a lot of time and read on the finer details of the new init system, as well as a dozen forum threads discussing the issue. Unfortunately, everyone had an ever so slightly different manifestation of the problem, and the suggestions did not bear any fruit. Moreover, there wasn’t a single educated thread of information, more sort of trial & error guesses and hunches as to what should be done.
This led me to a troubling realization that the documentation and general knowledge on Systemd are very scarce. There just isn’t the abundance of tips and tricks like you could find on System V, which would help you almost immediately narrow down your problem and resolve it. The problem affects both official sources as well as the Internet collective. The reason is quite simple though – Systemd is too complex for its own good, and for everyone forced to use it.
The number of components is just overwhelming – you have daemons, targets and core elements, as well as utilities. Targets touch into the user messaging systems, graphical interface and even the user session. The effort needed to master Systemd is probably equal to mastering all other components of a typical Linux system. This breaks the simplicity principle on which Linux is built, and it also highlights the development-focused nature of the framework. Systemd is a technology designed to make development easy, not necessarily the end usage. Now, let’s go back to our example.
Why Systemd is not suited for purpose
I decided not to give up and do some more work on my own. The boot log was not sufficient, and I had to open the journal logs and read what is inside. This turned out to be much more difficult than ever before. Systemd does not keep logs in a simple way, nor does it utilize flat files. Nope. You get binary files, with an internal database structure. This probably makes it easier for developers to do their work, but it makes it impossible for administrators to troubleshoot.
When you’re running in a live session, you want the absolute minimum of tools to be able to debug your box. Alas, Systemd requires its own journalctl utility as a broker for reading and interpreting the information. Now, do not get me wrong, System V was not without its foibles. Pacct is a good example of complexity gone mad, but there, the binary method was required to cope with a huge volume of data. Systemd could use a simple method for booting, but because it is such a complex and inter-dependent framework, the boot process was sacrificed for the sake of all other components.
Now, journalctl normally expects the default root to be used, so you actually need to use the --root or --file flags to open files that are not part of the running system. Again, this complicates access to logs, as you need to sort-of chroot into a different filesystem. The location of logs and the naming convention is counter-intuitive. To access the logs I needed to cd into:
/run/media/liveuser/<mount point>/var/log/journal/<unintelligible journal id>/
Then, here, you have user and system logs, labeled user-<userid>@<random-numbers>.journal and system@<random numbers>.journal, respectively. There are also the user-<userid>.journal and system.journal files, which should be identical to some of the previously mentioned entries. Moreover, the journal ID should be unique and persistent across boots, and if you get different numbers for multiple entries, then you may have a problem, which could be the reason for the earlier boot failure.
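To make the naming a little less cryptic, here is a small Python sketch that sorts file names into system and user journals. The pattern is based only on the four name shapes described above, so treat it as an assumption rather than a specification of the on-disk format. As far as I understand, the names with an @ suffix are rotated files and the plain names are the ones currently being written, but the code only records whether a suffix is present.

```python
import re
from typing import NamedTuple, Optional

# Name shapes taken from the description above (not from a format spec):
#   system.journal, system@<suffix>.journal,
#   user-<uid>.journal, user-<uid>@<suffix>.journal
_JOURNAL_NAME = re.compile(r"^(?:(system)|user-(\d+))(?:@(.+))?\.journal$")


class JournalFile(NamedTuple):
    scope: str           # "system" or "user"
    uid: Optional[int]   # numeric user id for user journals, else None
    suffixed: bool       # True when the name carries an @<suffix>


def classify(name: str) -> Optional[JournalFile]:
    """Classify one journal file name, or return None if it does not match."""
    match = _JOURNAL_NAME.match(name)
    if match is None:
        return None
    system, uid, suffix = match.groups()
    return JournalFile(
        scope="system" if system else "user",
        uid=int(uid) if uid is not None else None,
        suffixed=suffix is not None,
    )


if __name__ == "__main__":
    # Made-up example names, for illustration only.
    for name in ["system.journal", "user-1000@abc.journal", "messages"]:
        print(name, "->", classify(name))
```

Feeding it the output of a directory listing gives you a quick overview of which files belong to the system and which to individual users.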
However, while this sounds easy when explained – it is very difficult to decipher while troubleshooting, and it is also not meaningful in any way. The numbers and the naming convention make sense from a purely transactional database perspective, similar to what you would have when working with cloud object storage, but not when you need to debug problems as a human. Moreover, we still have not accessed these logs.
After I located the logs, realized they were not accessible as text files, and used journalctl with the --file option, I then started going through the log to try to understand what may have gone wrong. The log files are very difficult to read. Long lines also do not wrap, so you do not actually see the full output. Worse yet, I wasn’t able to find a single line that indicated any problem.
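If you end up in the same spot, a command along these lines may save some scrolling. The Python sketch below only assembles the argument list. It uses --directory to read every journal file in a directory instead of --file for a single one, --priority to filter for errors and worse, and --no-pager so the output is not piped through a pager that cuts long lines at the terminal edge. The path in the example is a hypothetical placeholder, not the real one from my system.

```python
import shlex
from typing import List


def offline_journal_cmd(journal_dir: str, priority: str = "err") -> List[str]:
    """Build a journalctl command line for a journal directory outside the running system."""
    return [
        "journalctl",
        f"--directory={journal_dir}",  # read all journal files found in this directory
        "--no-pager",                  # plain output, no pager chopping long lines
        f"--priority={priority}",      # show this severity and anything more severe
    ]


if __name__ == "__main__":
    # Placeholder path: substitute your own mount point and journal id.
    cmd = offline_journal_cmd("/mnt/sysroot/var/log/journal/JOURNAL_ID")
    print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be handed to subprocess.run if you prefer to stay in Python.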
I then spent more time reading online, including Arch, Fedora, Mageia forums and other sources, to no avail. One of the suggestions mentioned a bug with journal persistence – hence the different random numbers in log files, which was the result of a wrong service type defined in the Systemd journal service unit file. I decided to explore this in more detail and see if I could fix it.
Unit files are similar to old System V RC files, explaining what each service needs to do when starting and stopping, except they are written in a far more convoluted way. The location of the unit files is also controversial, as they are located under: /usr/lib/systemd/system/<service name>.service. Specifically for the journaling daemon, the name corresponds to systemd-journald.service. I would expect system files to be located under /etc, especially when it comes to configurations. The new model breaks a well known convention.
In the unit file, I looked for wrongly declared directives – specifically, the service Type should be set to notify, which it already was. However, other than blindly bumping around after spending roughly two hours chasing solutions online, I was nowhere nearer to understanding, let alone resolving, my problem.
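The check itself is simple enough to script. Unit files look like INI files, so a rough Python sketch can lean on configparser to pull the Type directive out of the [Service] section. This is an approximation, as configparser is not a real unit file parser, and the sample text below is a made-up stand-in, not the actual contents of systemd-journald.service.

```python
import configparser
from typing import Optional

# Hypothetical unit text for illustration only.
SAMPLE_UNIT = """\
[Unit]
Description=Example Journal Service

[Service]
Type=notify
ExecStart=/usr/bin/example-daemon
"""


def service_type(unit_text: str) -> Optional[str]:
    """Return the Type= value from the [Service] section, or None if absent."""
    parser = configparser.ConfigParser(
        strict=False,        # unit files may legitimately repeat keys
        interpolation=None,  # '%' specifiers must not be treated as Python interpolation
        delimiters=("=",),
    )
    parser.optionxform = str  # keep key case, e.g. "Type" instead of "type"
    parser.read_string(unit_text)
    return parser.get("Service", "Type", fallback=None)


if __name__ == "__main__":
    print(service_type(SAMPLE_UNIT))
```

To check a real file, read its text from the mounted filesystem and pass it to service_type.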
I believe I’m a highly technical and tech-savvy person with a very good intuition around technology. I am able to learn and test new things and concepts with very little need for manuals, help files, online sources, wikis, or any other types of information. I can sit down and start using the software immediately, and this has always been the case. Good examples include KVM, Docker, and others, none of which are trivial. However, Systemd completely stumped me. It is one of the few technologies that has no regard for previous experience, knowledge or ability to work logically. It is designed as a dimensionless framework with an object-oriented approach to events and triggers, and as such, it does not follow the basics of logic that are imprinted in our brain. It is software that is best suited for AI, never for human interaction.
At the end of the day, I had learned a few things around Systemd, but very little of any practical value. I had mostly uncovered various flaws in the design, which I’m sure can be easily self-justified. I had realized that Systemd is complex, difficult to use and read, and not very helpful in solving problems. Indeed, my Fedora box remained unbootable, and I was forced to rebuild the installation. I do not recall a single case where I was unable to fix a Linux box when it was still powered by good ole init. I had worked on some really bad issues, but I was always able to recover. With Systemd, I had to concede defeat. The notion of events and timeouts is just completely wrong. And it goes against the Darwinist principles of evolution, be it biology or software.
We cannot stay in the past. We must change. Unfortunately, sometimes, future solutions fail to deliver, because they are trying to fix a problem that does not exist. I’m sure a ton of developers can easily point to a hundred flaws in init, but that does not mean that Systemd is the answer. The same way the internal combustion engine has its merits 130 years after it was created, System V and init are probably not ready to be relegated to history, especially not when Systemd is the current proposed alternative. It simply does not have what it takes to be the superior functional and evolutionary replacement.
It is a change, but one that introduces a huge amount of energy into the environment, and it offers very little to no practical advantage over the previous implementation. This is true of many other recent changes in the Linux world, including the windowing system, the audio framework, desktop environments, and more. The trend is quite alarming, as it is enabled and led by new technology, but not necessarily innovation. The two are not necessarily mutually inclusive or given. And that is the distinction that seems to have eluded the Linux world a lot recently. It is evident in the decline of popularity, stability and acceptance of Linux distributions, most likely because the focus is on re-inventing that which does not require reinvention.
As far as Systemd is concerned, I am concerned, because it is a technology that does not correlate to knowledge or experience, and it poses a great risk to the prosperity of Linux. Evolution has its ways of telling us when we’ve done something wrong, so it will be interesting to judge what is happening today 15-20 years from now. I do not foresee bright times. And you might as well practice Linux installations, since they may be the answer to when Systemd goes bad, as I cannot foresee any easy, helpful way out of trouble. Stay strong.