A few days ago, after not touching my Arch install for months, I decided to do an update. This installed a new kernel, so of course I did a reboot.
After grumbling about my up-time, I went about my day. Things were fine until I tried to start thunar, and it just... didn't.
...
...
...
Oh, there it is!
Yes, my computer has a new issue. Thunar takes almost exactly 25 seconds to actually start up. This issue also applies to pavucontrol, my screenshot utility, my podcast manager, and a few other programs.
Luckily, Linux has a lot of useful tools built-in for debugging crazy issues like this! The first one I like to reach for is strace
. This is a lovely utility for logging system calls to the terminal. I find this is an absolutely perfect level of abstraction in a lot of situations. It's high-level enough that I'm not stepping through assembler instructions, but still low-level enough that I have a precise view of what's going on.
I fire up strace
with strace pavucontrol
. It spits out a ton of logs, before eventually hanging:
write(8, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55cfbbf8d300, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55cfbbf88b78, FUTEX_WAKE_PRIVATE, 1) = 1 eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 11 futex(0x7fe8833f1ba0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55cfbbf8dff0, FUTEX_WAKE_PRIVATE, 1) = 0 write(8, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55cfbbf88b78, FUTEX_WAKE_PRIVATE, 1) = 1 poll([{fd=11, events=POLLIN}], 1, 25000
Well, there's our 25 second timeout - 25000 milliseconds. At least, I assume this is that, but I'm really just making a guess. Anyway, what's fd=11?
lsof
is the next tool to reach for. First, I start pavucontrol
a second time.
I grab its PID with ps -aux | grep pavucontrol
:
stephen 2319431 0.6 0.0 10480 3800 pts/0 S+ 22:39 0:00 strace pavucontrol stephen 2319434 1.0 0.1 316492 36768 pts/0 Sl+ 22:39 0:00 pavucontrol stephen 2319680 0.0 0.0 9112 2400 pts/1 S+ 22:39 0:00 grep --color=auto pavucontrol
Now we have its PID - 2319434. We can feed this into lsof to get a bit more information about the file descriptor.
lsof -p 2319434
... pavucontr 2319434 stephen 0u CHR 136,0 0t0 3 /dev/pts/0 pavucontr 2319434 stephen 1u CHR 136,0 0t0 3 /dev/pts/0 pavucontr 2319434 stephen 2u CHR 136,0 0t0 3 /dev/pts/0 pavucontr 2319434 stephen 3u unix 0x00000000b0fa1eb9 0t0 5191565 type=STREAM (CONNECTED) pavucontr 2319434 stephen 4u a_inode 0,14 0 1076 [eventfd:245] pavucontr 2319434 stephen 5u a_inode 0,14 0 1076 [eventfd:246] pavucontr 2319434 stephen 6u CHR 136,0 0t0 3 /dev/pts/0 pavucontr 2319434 stephen 7u unix 0x000000005f742058 0t0 5205220 type=STREAM (CONNECTED) pavucontr 2319434 stephen 8u a_inode 0,14 0 1076 [eventfd:247] pavucontr 2319434 stephen 9u unix 0x0000000000946e4a 0t0 5205222 type=STREAM (CONNECTED) pavucontr 2319434 stephen 10u a_inode 0,14 0 1076 [eventfd:248] pavucontr 2319434 stephen 11u a_inode 0,14 0 1076 [eventfd:254]
From the last line (the one with 11u in it) we can see it's an anon_inode eventfd. I have no idea what this means but we can work with it.
Years of experience with Linux has given me trust issues, so let's do a quick confirm. ls -l /proc/2319434/fd
lrwx------ 1 stephen stephen 64 May 18 22:39 11 -> 'anon_inode:[eventfd]'
Yup. Still an anon_inode eventfd. And I still have no clue what this means. I did try to research it but didn't get too far. I guess I need some more prerequisite knowledge here!
From my years of experience with C (there was a time when it was my favourite language, for some reason), gdb
is what I want to reach for next. We can use it to step through pavucontrol
line by line. It gives us a bit more granularity than just using strace
.
I start pavucontrol once again, this time using gdb pavucontrol
.
$ gdb pavucontrol GNU gdb (GDB) 13.1 Copyright (C) 2023 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from pavucontrol... This GDB supports auto-downloading debuginfo from the following URLs: <https://debuginfod.archlinux.org> Enable debuginfod for this session? (y or [n]) y Debuginfod has been enabled. To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit. Downloading separate debug info for /usr/bin/pavucontrol Reading symbols from /home/stephen/.cache/debuginfod_client/74e85fe333bf717a0434dec5c4444aa66cd73d41/debuginfo... (gdb) r Starting program: /usr/bin/pavucontrol Downloading separate debug info for /lib64/ld-linux-x86-64.so.2 Downloading separate debug info for system-supplied DSO at 0x7ffff7fc8000 Downloading separate debug info for /usr/lib/libgtkmm-3.0.so.1 Downloading separate debug info for /usr/lib/libatkmm-1.6.so.1 ... [lots more of this] Downloading separate debug info for /usr/lib/libgpg-error.so.0 Downloading separate debug info for /usr/lib/gtk-3.0/modules/libcanberra-gtk-module.so Downloading separate debug info for /usr/lib/gio/modules/libgvfsdbus.so Downloading separate debug info for /usr/lib/gvfs/libgvfscommon.so [New Thread 0x7ffff25ff6c0 (LWP 3875580)] [New Thread 0x7fffebfff6c0 (LWP 3875581)] [New Thread 0x7ffff1dfe6c0 (LWP 3875582)] (pavucontrol:3873842): Gtk-WARNING **: 19:39:10.626: Theme parsing error: gtk.css:3611:22: 'none' is not a valid color name ^C Thread 1 "pavucontrol" received signal SIGINT, Interrupt. 0x00007ffff6913c0f in __GI___poll (fds=0x55555574b570, nfds=1, timeout=25000) at ../sysdeps/unix/sysv/linux/poll.c:29 Downloading source file /usr/src/debug/glibc/glibc/io/../sysdeps/unix/sysv/linux/poll.c 29 return SYSCALL_CANCEL (poll, fds, nfds, timeout);
Alright, so there's a few things there I want to point out. First of all, gdb helpfully asks if we want to download debuginfo automatically. To be honest, I'm not entirely sure on the mechanism here, but it's a massive quality-of-life improvement, so of course we say yes. Then gdb downloads all the debuginfo, obviously. Once this is done, and the program is allowed to run for a few seconds, we Ctrl+C it to pause execution.
We're apparently hanging at that SYSCALL_CANCEL
line. A simple p timeout
shows that yes, our timeout is indeed 25000. Exactly what we were seeing before!
Armed with this new information, I started doing more research. Eventually, I stumbled my way onto this post. To be honest, I'm not entirely sure how I got there, but the fix, sudo pacman -Rndd xdg-desktop-portal xdg-desktop-portal-gtk
, immediately had things starting instantly again!
So, why did I make this post? I think documenting my debugging steps is useful - even if only for myself. This is basically my very generic process for digging into issues when I have so little to work on that I can't even search it up.