summaryrefslogtreecommitdiff
path: root/pkgs/development/python-modules/termplotlib/gnuplot-subprocess.patch
diff options
context:
space:
mode:
authorSamuel Dionne-Riel <samuel@dionne-riel.com>2024-12-31 20:06:56 -0500
committerSamuel Dionne-Riel <samuel@dionne-riel.com>2025-01-01 00:32:55 -0500
commit40fb9e7c5a45af3f63fc821067f044d3d8a4befc (patch)
tree1b4bd36cc438b71f5da67449396fa1ff9ea58812 /pkgs/development/python-modules/termplotlib/gnuplot-subprocess.patch
parent88195a94f390381c6afcdaa933c2f6ff93959cb4 (diff)
switch-to-configuration-ng: Fix exit status on bootloader install error
The problem ----------- When rebuilding a system, if `switch-to-configuration-ng` fails to install bootloader files, it will (most likely) `exit(0)`. ``` /etc/nixos $ sudo nixos-rebuild --fast boot && reboot [sudo] password for samuel: building the system configuration... updating GRUB 2 menu... cannot copy /nix/store/.../initrd to /boot/kernels/...-initrd.tmp: No space left on device Failed to install bootloader Broadcast message from samuel@... on pts/1 (Tue 2024-12-31 16:48:26 EST): The system will reboot now! ``` This is a quite awkward breaking change with the expected behaviour. * * * The investigation ----------------- Compare: - https://github.com/NixOS/nixpkgs/blob/85b5f3e959327a3fa46f843848ebb8799069bb95/pkgs/by-name/sw/switch-to-configuration-ng/src/src/main.rs#L171-L179 - https://github.com/NixOS/nixpkgs/blob/85b5f3e959327a3fa46f843848ebb8799069bb95/nixos/modules/system/activation/switch-to-configuration.pl#L115-L117 Let's see what `die()` is all about: - https://github.com/NixOS/nixpkgs/blob/85b5f3e959327a3fa46f843848ebb8799069bb95/pkgs/by-name/sw/switch-to-configuration-ng/src/src/main.rs#L121-L125 ***sus.*** There are multiple issues converging here. **Incorrect port** The original implementation did not use `die`, but `exit 1`. So porting from `perl` following the script's idiosyncrasies was not done appropriately. **Incorrect `die` fac-simile** The `die` method is incomplete with regard to the semantics of perl. - https://perldoc.perl.org/5.40.0/functions/die Of importance to us: > If [die is called], the exit code is determined from the values of > `$!` and `$?` with this pseudocode: > > ``` > exit $! if $!; # errno > exit $? >> 8 if $? >> 8; # child exit status > exit 255; # last resort > ``` The `die()` method in `switch-to-configuration-ng` *only* checks `errno`, using its value directly to `exit()`. It does not handle some form of implicit child process exit status. And, due to incorrect assumptions, it will not fall back to anything. **Incorrect implementation** (Note that from this point on, I'm not a Rust expert, so bear with me if some nuances are lost or incorrectly represented.) The `die()` function implementation, as a port, might not even work correctly. Already, the `spawn` method does not mention it would be setting `errno`, so any `die()` following a `status.success()` is *sus* and should be investigated. Since it's not attempting to do anything "smart" with child processes. - https://doc.rust-lang.org/1.83.0/std/process/struct.Command.html#method.spawn And I'd argue that using *errno* in this manner in Rust is probably a mistake, and should not be done. > This should be called immediately after a call to a platform function, > otherwise the state of the error value is indeterminate. - https://doc.rust-lang.org/1.83.0/std/io/struct.Error.html#method.last_os_error Considering *platform function* is largely left undefined, I would (probably wrongly) intuit that it should be considered undefined behaviour to rely on it. Note that `raw_os_error` might have a surprising interface. > If this `Error` was constructed via `last_os_error` [...], > then this function will return `Some`, otherwise it will return `None`. - https://doc.rust-lang.org/1.83.0/std/io/struct.Error.html#method.raw_os_error Since it's used as `std::io::Error::last_os_error().raw_os_error()`, AFAIUI it will always return `Some`. Since this is exposing `errno`, the libc concept, it will behave the same, and may be set to `0` by default, just like here: ``` $ printf '#include <stdlib.h>\n#include <errno.h>\nint main() { exit(errno); }' \ | cc -x c - && ./a.out; echo $? 0 ``` Which means that, since no *platform function*[sic] changed its value, it will be zero, the `die()` function will be equivalent to `exit(errno)`, and the program will have failed “successfully” wrongly. * * * The fix ------- I've fixed the `do_pre_switch_check` and `do_install_bootloader` methods, both of which share the same defects (the original script uses `exit 1` for both). They were the only `status.success()` checks using `die()`. * * * Reproducing the issue --------------------- Remember how I said: > Considering *platform function* is largely left undefined, I would > (probably wrongly) intuit that it should be considered undefined > behaviour to rely on it. Here's why it's not some vague FUD. First, make sure a `nixos-rebuild boot` would need to write new files to the boot partition. Removing an older (but still alive) generation's initrd can do that. Fill the `/boot` partition to force an error. ``` $ sudo dd if=/dev/zero of=/boot/BOGUS.FILLINGS ``` Then, and here's the fun part, observe: ``` ~ $ sudo rm -r /run/nixos ~ $ sudo nixos-rebuild --fast boot ; echo $? building the system configuration... updating GRUB 2 menu... cannot copy /nix/store/x91w4p91l7iclkdp38chvdxcw6nr5113-mobile-nixos-initrd-generic/initrd to /boot/kernels/x91w4p91l7iclkdp38chvdxcw6nr5113-mobile-nixos-initrd-generic-initrd.tmp: No space left on device Failed to install bootloader 0 ~ $ sudo nixos-rebuild --fast boot ; echo $? building the system configuration... updating GRUB 2 menu... cannot copy /nix/store/x91w4p91l7iclkdp38chvdxcw6nr5113-mobile-nixos-initrd-generic/initrd to /boot/kernels/x91w4p91l7iclkdp38chvdxcw6nr5113-mobile-nixos-initrd-generic-initrd.tmp: No space left on device Failed to install bootloader warning: error(s) occurred while switching to the new configuration 1 ``` So... What's the deal with /run/nixos? It's where the lock file will reside. (And other transient files.) - https://github.com/NixOS/nixpkgs/blob/85b5f3e959327a3fa46f843848ebb8799069bb95/pkgs/by-name/sw/switch-to-configuration-ng/src/src/main.rs#L1018-L1027 But why does that matter here? ``` ~ $ errno 17 EEXIST 17 File exists ``` This error is produced by some *platform functions*[sic] that create either the directory, or the lockfile. The file already exists. So the script would end-up failing this way *only for the first invocation*. Which is why it's possible any of you all reviewing this ~~novel~~ PR haven't faced that issue. * * * Future work ----------- I believe `die()` *probably* should be switched to check the value, and `exit 255` if it's 0. Though I also believe `die()` shouldn't try to port perl semantics into Rust. I don't think it's working out. Additionally, a NixOS test should be authored to ensure that errors in these phases actually are handled appropriately. Signed-off-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Diffstat (limited to 'pkgs/development/python-modules/termplotlib/gnuplot-subprocess.patch')
0 files changed, 0 insertions, 0 deletions