freebsd-src.git - FreeBSD sources

Age	Commit message	Author
3 days	jail(3): fix common usage after mac.label support	Kyle Evans
	Nobody else's mac.conf(5) has any entries for jails, so the lookup trivially fails with ENOENT and we fail before we can fetch any jail parameters. Most notably, this breaks `jls -s` / `jls -n` if you do not have any loaded policy that applies jail labels. Add an entry that works for everyone, and hardcode that as an ENOENT fallback in libjail to provide a smoother transition. This is probably not harmful to leave in long-term, since mac.conf(5) will override it. This unearthed one additional issue: mac_get_prison() in the MAC framework handled the case where no loaded policy uses labels incorrectly. We don't want to break jail utilities that enumerate jail parameters automatically, so we must ingest the label in all cases; we can still use the no-label-policies case as a small optimization to avoid trying to copy out any label. We will break things if a non-optional element is specified in the copied-in label, but that's expected. The APIs dedicated to jaildescs remain unaffected, since they won't be used in the same way. Fixes: db3b39f063d9f05 ("libjail: extend struct handlers [...]") Fixes: bd55cbb50c58876 ("kern: add a mac.label jail parameter") Reported by: jlduran (on behalf of Jenkins) Reviewed by: jlduran Differential Revision: https://reviews.freebsd.org/D54786
7 days	kern: add a mac.label jail parameter	Kyle Evans
	Have it take a `struct mac` and we'll paper over the difference for jail(8)/jls(8) in libjail(3). The mac_syscalls.h model follows the one previously used for mac_set_proc_*(). Reviewed by: olce Differential Revision: https://reviews.freebsd.org/D53958
7 days	kern: mac: pull mac_label_copyin_string out	Kyle Evans
	A future commit to the area will further our jail integration and add a use for this: the struct mac itself was already copied in as part of vfs_buildopts(), so we only need to copyin the strings. We add an explicit flag argument because the jail operation will need to do it while holding the prison lock. Reviewed by: olce Differential Revision: https://reviews.freebsd.org/D53957
7 days	mac_set_fd(3): add support for jail descriptors	Kyle Evans
	We'll still add an old-fashioned jail param to configure jail MAC labels, but for testing it's really easy to grab a jaildesc and use that. Reviewed by: jamie, olce Differential Revision: https://reviews.freebsd.org/D53956
7 days	kern: mac: add various jail MAC hooks	Kyle Evans
	This adds the following hooks:
- mpo_prison_check_attach: check for subject capability to attach to a given jail
- mpo_prison_check_create: check for subject capability to create a jail with the given option set
- mpo_prison_check_get: check for subject capability to fetch the given parameters for a jail
- mpo_prison_check_set: check for subject capability to set the given parameters for a jail
- mpo_prison_check_remove: check for subject capability to remove the jail
check_get wouldn't typically be a privileged operation, but is included to give MAC policies a wider range of capabilities at a relatively low cost. We also add two more for the purpose of label propagation:
- mpo_prison_created: surface the creation of a jail so that one can do propagation to, e.g., the root vnode or any mounts
- mpo_prison_attached: surface the attachment of an existing process to the jail so that one can propagate the jail label to the process, as appropriate.
It is unclear whether this is preferable to having separate associate entry points for each type of object we might associate. That would split these up like so:
- prison_created -> prison_associate_vnode
- prison_attached -> prison_associate_proc
Some sample policy ideas that should be feasible to implement with this set of hooks, in case it's inspiring:
- mac_bomb: policy that allows a poudriere user to construct jails without root privilege, given a restricted set of jail parameters. Slap a warning label on it.
- mac_capsule: policy that realizes the capsule idea that I pitched[0] on -jail@ to create jails that are effectively immutable once sealed, using these hooks and a label.
Perhaps a silly idea, but a downstream could consider a scenario where it can implement special jail enumeration using a MAC policy and a cooperating application that specifies non-parameter options to filter the results.
[0] https://lists.freebsd.org/archives/freebsd-jail/2025-September/000550.html
Reviewed by: olce (slightly earlier version) Differential Revision: https://reviews.freebsd.org/D53954
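The mpo_prison_check_* hooks let any loaded policy veto a jail operation. A minimal user-space sketch of how such check hooks might compose across policies, using a toy policy in the spirit of the mac_capsule idea. All types and functions here (`struct cred_stub`, `struct prison_stub`, `struct policy_ops`, `framework_prison_check_set()`) are hypothetical simplifications and do not match the kernel's MAC framework signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel types. */
struct cred_stub {
	int	uid;
};

struct prison_stub {
	int	jid;
	int	sealed;	/* set once a "capsule" jail has been sealed */
};

/*
 * Hypothetical subset of a policy operations table. The real
 * mpo_prison_check_set hook receives different arguments (e.g., the
 * parameters being set and the jail_set(2) flags).
 */
struct policy_ops {
	const char *name;
	int (*prison_check_set)(const struct cred_stub *,
	    const struct prison_stub *);
};

/* Toy policy in the spirit of "mac_capsule": sealed jails are immutable. */
static int
capsule_prison_check_set(const struct cred_stub *cred,
    const struct prison_stub *pr)
{
	(void)cred;
	return (pr->sealed ? EPERM : 0);
}

/* A policy that does not implement a hook leaves its pointer NULL. */
static const struct policy_ops policies[] = {
	{ "capsule", capsule_prison_check_set },
	{ "other", NULL },
};

/* Check hooks compose as a conjunction: any single policy can deny. */
static int
framework_prison_check_set(const struct cred_stub *cred,
    const struct prison_stub *pr)
{
	size_t i;

	for (i = 0; i < sizeof(policies) / sizeof(policies[0]); i++) {
		int error;

		if (policies[i].prison_check_set == NULL)
			continue;
		error = policies[i].prison_check_set(cred, pr);
		if (error != 0)
			return (error);
	}
	return (0);
}
```

The notification hooks (mpo_prison_created, mpo_prison_attached) would instead be called on every policy without consulting a return value, since they only propagate labels.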
7 days	mac: add macros for 5-argument SDT probes	Kyle Evans
	A last-minute change to the jail MAC entry points in D53954 is going to pass the jail_[gs]et(2) flags to mac_prison_check_[gs]et() so that a policy can, e.g., reject or allow a change if the intent is to immediately attach, or disallow some fetching of dying jails. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D54658
7 days	kern: mac: add a MAC label to struct prison	Kyle Evans
	Reviewed by: olce Differential Revision: https://reviews.freebsd.org/D53953
2025-12-04	MAC: Rename mac_cred_create_swapper to mac_cred_create_kproc0	John Baldwin
	Reported by: markj Reviewed by: olce Differential Revision: https://reviews.freebsd.org/D54052
2025-11-24	MAC: Use the current thread's user ABI to determine the layout of struct mac	John Baldwin
	This removes mac_label_copyin32() as mac_label_copyin() can now handle both native and 32-bit struct mac objects. Reviewed by: olce, brooks Obtained from: CheriBSD Sponsored by: AFRL, DARPA Differential Revision: https://reviews.freebsd.org/D53755
2025-10-29	audit(4): Fix a typo in a kernel error message	Gordon Bergling
	- s/Authenticateion/Authentication/ MFC after: 5 days
2025-10-27	audit: convert audit event class lookup to lockless	Andrew Gallatin
	When system call auditing is enabled, every audited call does a lookup in the evclass hash table. This table appears to be insert-only (i.e., nothing can be removed), and protecting it with an rwlock is overkill. With an rwlock, the atomic operations needed just to maintain uncontended rwlock state cause measurable overhead on high-core-count servers making lots of system calls. Given that the evclass hash table can never have items removed, only added, using a mutex to serialize additions and converting to ck_list provides sufficient protection for lockless lookups. In a contrived benchmark with 64 cores, each reading 1 byte from its own file, this change increases performance from 5M reads/sec to 70M reads/sec on an AMD 7502P. Reviewed by: markj, mjg, glebius (privately) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D53176
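A minimal user-space sketch of the insert-only, lockless-read pattern described above, using C11 atomics and a pthread mutex in place of the kernel's ck_list and mutex. The names `evclass_map()`, `evclass_lookup()`, `struct evclass_entry` and the bucket count are assumptions for illustration; this is not the audit code itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define EVCLASS_BUCKETS	64

struct evclass_entry {
	unsigned int			event;
	_Atomic unsigned int		mask;	/* may be updated in place */
	struct evclass_entry *_Atomic	next;
};

static struct evclass_entry *_Atomic evclass_hash[EVCLASS_BUCKETS];
static pthread_mutex_t evclass_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Writers serialize on a mutex; readers never take it. */
static int
evclass_map(unsigned int event, unsigned int mask)
{
	struct evclass_entry *_Atomic *bucket =
	    &evclass_hash[event % EVCLASS_BUCKETS];
	struct evclass_entry *head, *e;

	pthread_mutex_lock(&evclass_mtx);
	head = atomic_load_explicit(bucket, memory_order_relaxed);
	for (e = head; e != NULL;
	    e = atomic_load_explicit(&e->next, memory_order_relaxed)) {
		if (e->event == event) {
			atomic_store_explicit(&e->mask, mask,
			    memory_order_relaxed);
			pthread_mutex_unlock(&evclass_mtx);
			return (0);
		}
	}
	e = malloc(sizeof(*e));
	if (e == NULL) {
		pthread_mutex_unlock(&evclass_mtx);
		return (-1);
	}
	e->event = event;
	atomic_init(&e->mask, mask);
	atomic_init(&e->next, head);
	/* Release ordering publishes a fully initialized entry to readers. */
	atomic_store_explicit(bucket, e, memory_order_release);
	pthread_mutex_unlock(&evclass_mtx);
	return (0);
}

/* Lockless lookup: safe because entries are never removed or freed. */
static unsigned int
evclass_lookup(unsigned int event)
{
	struct evclass_entry *e;

	e = atomic_load_explicit(&evclass_hash[event % EVCLASS_BUCKETS],
	    memory_order_acquire);
	for (; e != NULL;
	    e = atomic_load_explicit(&e->next, memory_order_acquire)) {
		if (e->event == event)
			return (atomic_load_explicit(&e->mask,
			    memory_order_relaxed));
	}
	return (0);
}
```

Because nothing is ever unlinked, readers need no hazard pointers or epoch-based reclamation; that insert-only guarantee is what makes dropping the read lock safe.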
2025-10-18	knotes: kqueue: handle copy for trivial filters	Konstantin Belousov
	Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D52045
2025-10-13	MAC: Use proper prototype for SYSINIT functions	Zhenlei Huang
	MFC after: 1 week
2025-10-13	audit: Use proper prototype for SYSINIT functions	Zhenlei Huang
	MFC after: 1 week
2025-09-29	MAC/do: Check executable path from the current jail's root	Olivier Certner
	Contrary to my initial belief, vn_fullpath() does return a vnode's path from the current chroot, and not from the global root (which would have been a bug also, but without security consequences). This enables a "confused deputy"-like scenario where a chroot(2) can change which executable can be authorized by MAC/do, which is even more problematic for unprivileged chroot(2). This was found by re-examining the code following two close events:
1. Shawn Webb sent a mail to freebsd-hackers@ on 08/05 saying that in HardenedBSD they had added a check on P2_NO_NEW_PRIVS (in mac_do_priv_grant()), to which I responded on 08/20 that P2_NO_NEW_PRIVS was not necessary for mac_do(4), with reasoning that was correct but based on the wrong assumption about vn_fullpath() mentioned above.
2. I reviewed some code by Kushagra Srivastava (GSoC 2025 student working on mac_do(4)/mdo(1)) adding the ability to specify which executables can spawn processes that mac_do(4) may decide to authorize (others are simply ignored), which is currently hardcoded to '/usr/bin/mdo'.
MFC after: 3 days Event: EuroBSDCon 2025 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D52758
2025-09-17	MAC/do: Restore matching the first supplementary group	Olivier Certner
	As 'cr_gid' was in fact stored in cr_groups[0], rule_grant_supplementary_groups() would loop only on further elements of cr_groups[]. Now that cr_groups[0] is not 'cr_gid' anymore, but some supplementary group, take it into account. Fixes: be1f7435ef218b1d ("kern: start tracking cr_gid outside of cr_groups[]") MFC after: 5 days MFC to: stable/15 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D52271
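A small sketch of the off-by-one this fix addresses, assuming a simplified credential in which cr_gid is stored separately and every cr_groups[] element is a supplementary group. `struct cred_sketch` and `is_supplementary_member()` are hypothetical and stand in for, rather than reproduce, rule_grant_supplementary_groups().

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical, simplified credential: after the cr_gid split, cr_gid lives
 * in its own field and every cr_groups[] entry is a supplementary group.
 */
struct cred_sketch {
	gid_t	cr_gid;
	int	cr_ngroups;
	gid_t	cr_groups[16];
};

static bool
is_supplementary_member(const struct cred_sketch *cred, gid_t gid)
{
	int i;

	/*
	 * Start at index 0. A loop starting at 1 was only correct while
	 * cr_groups[0] held the effective GID; now it would skip a real
	 * supplementary group.
	 */
	for (i = 0; i < cred->cr_ngroups; i++) {
		if (cred->cr_groups[i] == gid)
			return (true);
	}
	return (false);
}
```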
2025-09-17	MAC/bsdextended: Restore matching subjects' effective GID	Olivier Certner
	Fixes: be1f7435ef218b1d ("kern: start tracking cr_gid outside of cr_groups[]") MFC after: 5 days MFC to: stable/15 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D52270
2025-09-16	jail: Optionally allow audit session state to be configured in a jail	Mark Johnston
	Currently it is impossible for a privileged, jailed process to set audit session state. This can result in surprising audit event misattribution. For example, suppose a user ssh'es into a jail and restarts a service; normally, sshd sets audit state such that events generated by the SSH session are attributed to the newly authenticated user, but in a jail, the corresponding setaudit(2) call fails, so events are attributed to the user who had started sshd in the jail (typically the user who had started the jail itself by some means). While this behaviour is reasonable, administrators might want to trust the jailed sshd to reset audit state, such that the authenticated user appears in audit logs. Add a jail knob to enable this. Add a simple regression test. This is a reapplication of commit 246d7e9fc23928 following a revert. The audit system calls must preserve the old behaviour of returning ENOSYS if the system call is disallowed within a jail, as some applications depend on that behaviour. Reviewed by: kevans, jamie (previous version) MFC after: 1 week Sponsored by: Modirum MDPay Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D51719 Differential Revision: https://reviews.freebsd.org/D52572
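A sketch of the jail-side decision the reapplied change has to make, under the assumption that the opt-in knob is a simple per-jail boolean. The field name `allow_setaudit`, `struct jail_sketch` and `audit_syscall_jail_check()` are hypothetical; the log does not name the actual jail parameter. The key point from the commit is that a disallowed jailed call still returns ENOSYS rather than EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the caller's jail; the field name is illustrative. */
struct jail_sketch {
	bool	jailed;
	bool	allow_setaudit;	/* hypothetical name for the new jail knob */
};

/*
 * Decide whether a setaudit(2)-style call may proceed. Jailed callers keep
 * getting ENOSYS unless the jail opts in, preserving the old behaviour that
 * some applications depend on. A privilege check would follow (omitted).
 */
static int
audit_syscall_jail_check(const struct jail_sketch *j)
{
	if (j->jailed && !j->allow_setaudit)
		return (ENOSYS);
	return (0);
}
```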
2025-09-16	Revert "jail: Optionally allow audit session state to be configured in a jail"	Mark Johnston
	Changing audit system calls to return EPERM instead of ENOSYS when invoked from a jail breaks some userspace applications. Revert for now until a more complete change is reviewed. This reverts commit 246d7e9fc23928be22db38220f5439f5cdee5264. PR: 289645
2025-09-15	jail: Optionally allow audit session state to be configured in a jail	Mark Johnston
	Currently it is impossible for a privileged, jailed process to set audit session state. This can result in surprising audit event misattribution. For example, suppose a user ssh'es into a jail and restarts a service; normally, sshd sets audit state such that events generated by the SSH session are attributed to the newly authenticated user, but in a jail, the corresponding setaudit(2) call fails, so events are attributed to the user who had started sshd in the jail (typically the user who had started the jail itself by some means). While this behaviour is reasonable, administrators might want to trust the jailed sshd to reset audit state, such that the authenticated user appears in audit logs. Add a jail knob to enable this. Add a simple regression test. Reviewed by: kevans, jamie MFC after: 1 week Sponsored by: Modirum MDPay Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D51719
2025-08-21	MAC/do: Rename the internal malloc type	Kushagra Srivastava
	From M_DO to M_MAC_DO. While here, make the descriptions more accurate. (Commit message by olce@.) Reviewed by: olce MFC after: 3 days Sponsored by: Google LLC (GSoC 2025) Sponsored by: The FreeBSD Foundation
2025-08-03	mac: Remove uses of DEBUG_VFS_LOCKS	Mark Johnston
	We can assert that a vnode lock is held whenever INVARIANTS is configured. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D51698
2025-07-24	kern: adopt the cr_gid macro for cr_groups[0] more widely	Kyle Evans
	A future change may split cr_gid out of cr_groups[0] so that there's a cleaner separation between the supplemental groups and the effective group. Do the mechanical conversion where we can, and drop some comments where we need further work because some assumptions about cr_gid == cr_groups[0] have been made. This should not be a functional change, but downstreams and other out-of-tree code are advised to investigate their usage of cr_groups sooner rather than later, as a future change will render assumptions about these two being equivalent harmful. Reviewed by: asomers, kib, olce Differential Revision: https://reviews.freebsd.org/D51153
2025-06-18	audit: move the wait for the queue length from commit to alloc	Konstantin Belousov
	AUDIT_SYSCALL_EXIT() and, indirectly, audit_commit() are intended to be called from arbitrary top-level context. This means that the caller may own any sleepable locks, which makes sleeping in audit_commit() forbidden. Since we need to sleep for the record in audit_alloc() anyway, move the sleep for the queue limit there. At worst, if auditing is suspended or disabled by the time we actually reach the commit location, we have lost time uselessly. PR: 287566 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D50879
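A user-space sketch of the restructuring, using a pthread condition variable as a stand-in for the kernel sleep. All names (`struct audit_queue_sketch`, `record_alloc_wait()`, `record_commit()`, `record_consume()`) are hypothetical. The allocation side is the only place that waits for queue space; the commit side never blocks waiting for space, so it is safe for callers holding arbitrary locks.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the audit record queue and its length limit. */
struct audit_queue_sketch {
	pthread_mutex_t	lock;
	pthread_cond_t	not_full;
	int		len;
	int		limit;
};

/* Allocation side: the only place that waits for the queue to drain. */
static void
record_alloc_wait(struct audit_queue_sketch *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->len >= q->limit)
		pthread_cond_wait(&q->not_full, &q->lock);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Commit side: never blocks waiting for queue space. In this sketch,
 * concurrent allocators may briefly push len past limit; that is the
 * price of not sleeping here.
 */
static void
record_commit(struct audit_queue_sketch *q)
{
	pthread_mutex_lock(&q->lock);
	q->len++;
	pthread_mutex_unlock(&q->lock);
}

/* Consumer (e.g., the audit worker) drains one record and wakes a waiter. */
static void
record_consume(struct audit_queue_sketch *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->len > 0)
		q->len--;
	pthread_cond_signal(&q->not_full);
	pthread_mutex_unlock(&q->lock);
}
```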
2025-06-11	machine/stdarg.h -> sys/stdarg.h	Brooks Davis
	Switch to using sys/stdarg.h for va_list type and va_* builtins. Make an attempt to insert the include in a sensible place. Where style(9) was followed this is easy, where it was ignored, aim for the first block of sys/*.h headers and don't get too fussy or try to fix other style bugs. Reviewed by: imp Exp-run by: antoine (PR 286274) Pull Request: https://github.com/freebsd/freebsd-src/pull/1595
2025-05-27	MAC/do: Fix a too stringent debug assertion for a target of 'uid=*'	Olivier Certner
	MDF_HAS_PRIMARY_CLAUSE only concerns groups, not users, and is thus not set in the latter case. This change only has an effect on INVARIANTS builds. PR: 287057 MFC after: 10 minutes Sponsored by: The FreeBSD Foundation
2025-05-16	grantbylabel_syscall check p_textvp != NULL	Simon J. Gerraty
	Kernel processes do not have a valid p_textvp. Reviewed by: stevek Differential Revision: https://reviews.freebsd.org/D50368
2025-04-02	MAC/do: Rules: <from> and <to> parts now to be separated by '>'	Olivier Certner
	Previously, we would accept only ':' as the separator, which makes parsing of the rule specification harder for humans, especially those people that are used to UNIX systems where ':' is used as the separator in PATH. With ':', the <from> and <to> parts can look like two different elements that are unrelated, especially to these eyes. Change parse_single_rule() so that '>' is also accepted as a separator between <from> and <to>, and promote it as the one to use. During a transition period, we will still allow the use of ':' for backwards compatibility. The manual page update comes from separate revision D49628. ':' has been completely removed from it on purpose. Reviewed by: bapt, manpages (ziaee) MFC after: 5 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D49627
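A minimal sketch of accepting '>' as the preferred separator while still tolerating the legacy ':'. The helper `split_rule()` is hypothetical, and trying '>' before ':' is an assumption about ordering, not a description of what parse_single_rule() actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "<from>><to>" (or legacy "<from>:<to>") in
 * place. On success, the input now holds <from> and the return value
 * points at <to>. Returns NULL if no separator is present.
 */
static char *
split_rule(char *rule)
{
	char *sep;

	sep = strchr(rule, '>');
	if (sep == NULL)
		sep = strchr(rule, ':');	/* transitional compatibility */
	if (sep == NULL)
		return (NULL);
	*sep = '\0';
	return (sep + 1);
}
```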
2025-04-02	MAC/do: parse_single_rule(): Fix herald comment's first line	Olivier Certner
	No functional change. MFC after: 5 days Sponsored by: The FreeBSD Foundation
2025-02-09	MAC: mac_biba, mac_lomac: Fix setting loader tunables	Zhenlei Huang
	A string loader tunable requires setting the len parameter to a nonzero value, typically the size of the string, to have the flag CTLFLAG_TUN work correctly [1] [2]. Without this fix security.mac.{biba,lomac}.trusted_interfaces would have no effect at all. [1] 3da1cf1e88f8 Extend the meaning of the CTLFLAG_TUN flag to automatically ... [2] 6a3287f889b0 Fix regression issue after r267961. Handle special string case ... Reviewed by: olce, kib Fixes: af3b2549c4ba Pull in r267961 and r267973 again ... MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D48898
2025-02-06	audit/audit.c: fix typo KERNEL_PANICED->KERNEL_PANICKED	Konstantin Belousov
	Noted by: cy Fixes: 53ece2bea9ffa654aaa50e (audit(9): do not touch VFS if panicing) MFC after: 3 days
2025-02-05	audit(9): do not touch VFS if panicing	Konstantin Belousov
	Reported by: bz
2025-01-14	audit: Fix short-circuiting in syscallenter()	Mark Johnston
	syscallenter() has a slow path to handle syscall auditing and dtrace syscall tracing. It uses AUDIT_SYSCALL_ENTER() to check whether to take the slow path, but this macro also has side effects: it writes the audit log entry. When systrace (dtrace syscall tracing) is enabled, this would get short-circuited, and we end up not writing audit log entries. Introduce a pure macro to check whether auditing is enabled, use it in syscallenter() instead of AUDIT_SYSCALL_ENTER(). Reviewed by: kib Reported by: Joe Duin <jd@firexfly.com> Fixes: 2f7292437d0c ("Merge audit and systrace checks") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D48448
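A self-contained illustration of the short-circuit bug and its fix, with hypothetical function names standing in for the macros: `audit_syscall_enter()` plays the side-effecting AUDIT_SYSCALL_ENTER(), and `audit_syscall_enabled()` plays the new pure check. With systrace enabled, `||` never evaluates the right operand, so the buggy variant writes no audit record.

```c
#include <assert.h>
#include <stdbool.h>

static bool systrace_enabled;
static bool audit_enabled;
static int audit_records_written;

/* Side-effecting check, analogous to AUDIT_SYSCALL_ENTER(). */
static bool
audit_syscall_enter(void)
{
	if (!audit_enabled)
		return (false);
	audit_records_written++;	/* "writes the audit log entry" */
	return (true);
}

/* Pure predicate, analogous to the macro the fix introduces. */
static bool
audit_syscall_enabled(void)
{
	return (audit_enabled);
}

/* Buggy: if systrace is on, || short-circuits and no record is written. */
static void
syscallenter_buggy(void)
{
	if (systrace_enabled || audit_syscall_enter()) {
		/* slow path: systrace hooks would run here */
	}
}

/* Fixed: choose the slow path with a pure check, then audit inside it. */
static void
syscallenter_fixed(void)
{
	if (systrace_enabled || audit_syscall_enabled()) {
		(void)audit_syscall_enter();
		/* systrace hooks would run here */
	}
}
```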
2024-12-17	MAC/do: Fix a compilation warning about an unused function	Olivier Certner
	grant_supplementary_group_from_flags() had been used in previous versions of the recent changes, but recently has not been needed anymore. It had been deliberately kept around, just in case, by analogy with grant_primary_group_from_flags() (which is still used).
2024-12-16	MAC/do: Update copyright	Olivier Certner
	Approved by: emaste (mentor) Sponsored by: The FreeBSD Foundation
2024-12-16	MAC/do: Apply a rule on real UID/GID instead of effective ones	Olivier Certner
	We intend MAC/do to authorize transitions based on the "real" identity information of the calling process, rather than transiently-acquired effective IDs. Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47845
2024-12-16	MAC/do: Convert internal TAILQs to STAILQs	Olivier Certner
	We only browse these forward and never need to remove arbitrary elements from them. No functional change (intended). Reviewed by: bapt, emaste Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47624
2024-12-16	MAC/do: parse_rules(): Tolerate blanks around tokens	Olivier Certner
	To this end, we introduce the strsep_noblanks() function, designed to be a drop-in replacement for strsep(), and use it in place of the latter. We had taken care of calling strsep() even when the remaining sub-string was not delimited (i.e., with empty string as its second argument), so this commit only has mechanical replacements of existing calls. Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47623
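A possible implementation of a blank-trimming, drop-in strsep(3) replacement, assuming "blanks" means spaces and tabs as classified by isblank(3). This is an illustrative sketch with the same name and signature shape, not the MAC/do source.

```c
#define _DEFAULT_SOURCE	/* expose strsep() on glibc */
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Like strsep(3), but strips blanks (spaces and tabs) around the
 * returned token. Returns NULL once *stringp is NULL.
 */
static char *
strsep_noblanks(char **stringp, const char *delim)
{
	char *tok, *end;

	tok = strsep(stringp, delim);
	if (tok == NULL)
		return (NULL);
	while (isblank((unsigned char)*tok))
		tok++;
	end = tok + strlen(tok);
	while (end > tok && isblank((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return (tok);
}
```

Keeping the strsep() contract (including the NULL sentinel) is what lets every existing call site switch over mechanically.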
2024-12-16	MAC/do: toast_rules(): Minor simplification	Olivier Certner
	Use the most common pattern to browse and delete elements of a list, as it reads quicker. Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47622
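One common idiom for draining a singly-linked tail queue with the queue(3) macros: pop the head until the list is empty. This is a hypothetical sketch (`struct rule` here carries only an ID, and `toast_rules_sketch()` returns a count for testing); the commit may use a different but equivalent form. Because only the head is ever removed, STAILQ suffices, consistent with the TAILQ-to-STAILQ conversion above.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, stripped-down rule element. */
struct rule {
	int			id;
	STAILQ_ENTRY(rule)	r_entries;
};
STAILQ_HEAD(rulehead, rule);

/* Pop and free elements from the head until the list is empty. */
static int
toast_rules_sketch(struct rulehead *head)
{
	struct rule *r;
	int freed = 0;

	while ((r = STAILQ_FIRST(head)) != NULL) {
		STAILQ_REMOVE_HEAD(head, r_entries);
		free(r);
		freed++;
	}
	return (freed);
}
```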
2024-12-16	MAC/do: Interpret the new rules specification; Monitor setcred()	Olivier Certner
	TL;DR: Now monitor setcred() calls, and reject or grant them according to the new rules specification. Drop monitoring setuid() and setgroups(). As previously explained in the commit introducing the setcred() system call, MAC/do must know the entire new credentials while the old ones are still available to be able to approve or reject the requested changes. To this end, the chosen approach was to introduce a new system call, setcred(), instead of modifying existing ones to be able to participate in a "prepare then commit"-like protocol. ****** The MAC framework typically calls several hooks of its registered policies as part of the privilege checking/granting process. Each system call calls some dedicated hook early, to which it usually passes the same arguments it received, and whose goal is to forcibly deny access to the functionality when needed (i.e., a single deny by any policy globally denies the access). Then, the system call usually calls priv_check() or priv_check_cred() an unspecified number of times, each of which may trigger calls to two generic MAC hooks. The first such call is to mac_priv_check(), and always happens. Its role is to deny access early and forcibly, as can be done also in system calls' dedicated early hooks (with different reach, however). The second, mac_priv_grant(), is called only if the priv_check() and prison_priv_check() generic code doesn't handle the request by itself, i.e., doesn't explicitly grant access (to the super user, or to all users for a few specific privileges). It allows any single policy to grant the requested access (regardless of whether the other policies do so or not). MAC/do currently only has an effect on processes spawned from the '/usr/bin/mdo' executable. It implements all setcred() hooks, called via mac_cred_setcred_enter(), mac_cred_check_setcred() and mac_cred_setcred_exit(). 
In the first one, implemented in mac_do_setcred_enter(), it checks if MAC/do has to apply to the current process, allocates (or re-uses) per-thread data to be later used by the other hooks (those of setcred() and the mac_priv_grant() one, called by priv_check()) and fills them with the current context (the rules to apply). This is both because memory allocations cannot be performed while holding the process lock and to ensure that all hooks called by a single setcred() see the same rules to apply (not doing this would be a security hazard, as rules may be changed concurrently by the administrator, as explained in more detail below). In the second one (implemented by mac_do_check_setcred()), it stores the new credentials in MAC/do's per-thread data. Indeed, the next MAC/do hook implementation to be called, mac_do_priv_grant() (implementing the mac_priv_grant() hook), must have knowledge of the new credentials that setcred() wants to install in order to validate them (or not), which the MAC framework can't provide as the priv_check*() API only passes the current credentials and a specific privilege number to the mac_priv_check() and mac_priv_grant() hooks. By contrast, the very point of MAC/do is to grant the privilege of changing credentials based not only on the current ones but also on the requested ones. The MAC framework's constraints that mac_priv_grant() hooks are called without context and that MAC modules must compose (each module may implement any of the available hooks, and in particular those of setcred()) impose some aspects of MAC/do's design. Because MAC/do's rules are tied to jails, accessing the current rules requires holding the corresponding jail's lock. As other policies might try to grab the same jail's lock in the same hooks, it is not possible to keep the rules' jail's lock between mac_do_setcred_enter() and mac_do_priv_grant() to ensure that the rules are still alive. 
We have thus augmented 'struct rules' with a reference count, and its lifecycle is now decoupled from being referenced or not by a jail. As a thread enters mac_cred_setcred_enter(), it grabs a hold on the current rules and keeps a pointer to them in the per-thread data. In mac_do_setcred_exit(), MAC/do just "frees" the per-thread data, in particular by dropping the referenced rules (we put "frees" in quotation marks because the per-thread structure is in fact reused, and only freed when a thread exits or the module is unloaded). Additionally, ensuring that all hooks have a consistent view of the rules to apply might become crucial if we augment MAC/do with forceful access denial policies in the future (i.e., policies that forcibly disable access regardless of other MAC policies wanting to grant that access). Indeed, without the above-mentioned design, if newly installed rules start to forcibly deny some specific transitions, and some thread is past the mac_cred_check_setcred() hook but before the mac_priv_grant() one, the latter may grant some privileges that should have been rejected first by the former (depending on the content of user-supplied rules). A previous version of this change used to implement access denial mandated by the '!' and '-' GID flags in mac_do_check_setcred() with the goal of having this rejection prevail over potential other MAC modules authorizing the transition. However, this approach had two drawbacks. First, it was incompatible both conceptually and in the current implementation with multiple rules being treated as an inclusive disjunction, where any single rule granting access is enough for MAC/do to grant access. Explicit denial requested by one matching rule could prevent another rule from granting access. The implementation could have been fixed, but the conflation of rules being treated as a disjunction for explicit granting but as a conjunction for forced denial would have remained. 
Second, MAC/do applies only to processes spawned from a particular executable, and imposing system-wide restrictions on only these processes is conceptually strange and probably not very useful. In the end, we moved the implementation of explicit access denial into mac_do_priv_grant(), along with the interpretation of other target clauses. The separate definition of 'struct mac_do_data_header' may seem odd, as it is only used in 'struct mac_do_setcred_data'. It is a remnant of an earlier version that was not using setcred(), but rather implemented hooks for setuid() and setgroups(). We kept it nonetheless, as it clearly separates the machinery to pass data from dedicated system call hooks to priv_grant() from the actual data that MAC/do needs to monitor a call to setcred() specifically. It may be useful in the future if we evolve MAC/do to also grant privileges through other system calls (each seen as a complete credentials transition on its own). The target supplementary groups are checked with merge-like algorithms leveraging the fact that all supplementary groups in credentials ('struct ucred') and in each rule ('struct rule') are sorted, avoiding a binary search for each considered GID, which would be asymptotically more costly. All access granting/denial is thus linear in at most the sum of the number of requested groups, currently held ones and those contained in the rule, per applicable rule. This should be enough in all practical cases. There is however still room for more optimizations, without or with changes in rules' data structures, if the need ever arises. Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47620
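A sketch of the merge-like idea for sorted GID arrays: check that every requested supplementary group is either currently held or allowed by the rule, walking each array once. `groups_subset_of_union()` is hypothetical and deliberately ignores the '+', '-' and '!' flag semantics that MAC/do layers on top; it only shows why sorted inputs give a linear bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Returns true if every element of the sorted array req[] appears in the
 * sorted array cur[] or in the sorted array allowed[]. Cursors only move
 * forward, so the cost is O(nreq + ncur + nallowed).
 */
static bool
groups_subset_of_union(const gid_t *req, size_t nreq,
    const gid_t *cur, size_t ncur, const gid_t *allowed, size_t nallowed)
{
	size_t i, j = 0, k = 0;

	for (i = 0; i < nreq; i++) {
		gid_t g = req[i];

		/* Advance each candidate cursor past values smaller than g. */
		while (j < ncur && cur[j] < g)
			j++;
		while (k < nallowed && allowed[k] < g)
			k++;
		if ((j < ncur && cur[j] == g) ||
		    (k < nallowed && allowed[k] == g))
			continue;
		return (false);
	}
	return (true);
}
```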
2024-12-16	MAC/do: Introduce rules reference counting	Olivier Certner
	This is going to be used in subsequent commits to keep rules alive even if disconnected from their jail in the meantime. We'll indeed have to release the prison lock between two uses (outright rejection, final granting) where the rules must absolutely stay the same for security reasons. Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47619
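A minimal reference-counting sketch for rules that must outlive their jail's reference, using C11 atomics. `struct rules_sketch`, `rules_alloc()`, `rules_hold()` and `rules_drop()` are hypothetical. In the kernel, the hold would have to be taken while the prison lock still guarantees the rules are alive; that locking is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct rules_sketch {
	atomic_uint	refcount;
	/* ... parsed rules would live here ... */
};

static struct rules_sketch *
rules_alloc(void)
{
	struct rules_sketch *r = malloc(sizeof(*r));

	if (r != NULL)
		atomic_init(&r->refcount, 1);	/* reference held by the jail */
	return (r);
}

static void
rules_hold(struct rules_sketch *r)
{
	atomic_fetch_add_explicit(&r->refcount, 1, memory_order_relaxed);
}

/* Returns 1 if this call dropped the last reference and freed the rules. */
static int
rules_drop(struct rules_sketch *r)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&r->refcount, 1, memory_order_acq_rel);
	assert(old > 0);
	if (old == 1) {
		free(r);
		return (1);
	}
	return (0);
}
```

A thread that took a hold keeps seeing the same rules even if the administrator installs new ones in the meantime; the old rules are freed only when that thread drops its reference.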
2024-12-16	New setcred() system call and associated MAC hooks	Olivier Certner
	This new system call allows setting all necessary credentials of a process in one go: effective, real and saved UIDs, effective, real and saved GIDs, supplementary groups and the MAC label. Its advantage over standard credential-setting system calls (such as setuid(), seteuid(), etc.) is that it enables MAC modules, such as MAC/do, to restrict the set of credentials some process may gain in a fine-grained manner. Traditionally, credential changes rely on setuid binaries that call multiple credential system calls in a specific order (setuid() must be last, so as to remain root for all other credential-setting calls, which would otherwise fail with insufficient privileges). This piecewise approach causes the process to transiently hold credentials that are neither the original nor the final ones. For the kernel to enforce that only certain transitions of credentials are allowed, either these possibly non-compliant transient states have to disappear (by setting all relevant attributes in one go), or the kernel must delay setting or checking the new credentials. Delaying setting credentials could be done, e.g., by having some mode where the standard system calls contribute to building new credentials but without committing them. It could be started and ended by a special system call. Delaying checking could mean that, e.g., the kernel only verifies the credentials transition at the next non-credential-setting system call (we just mention this possibility for completeness, but are certainly not endorsing it). We chose the simpler approach of a new system call, as we don't expect the set of credentials one can set to change often. It has the advantages that the traditional system calls' code doesn't have to be changed and that we can establish a special MAC protocol for it, by having some cleanup function called just before returning (this is a requirement for MAC/do), without disturbing the existing ones. 
The mac_cred_check_setcred() hook is passed the flags received by setcred() (including the version) and both the old and new kernel's 'struct ucred' instead of 'struct setcred' as this should simplify evolving existing hooks as the 'struct setcred' structure evolves. The mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always called by pairs around potential calls to mac_cred_check_setcred(). They allow MAC modules to allocate/free data they may need in their mac_cred_check_setcred() hook, as the latter is called under the current process' lock, rendering sleepable allocations impossible. MAC/do is going to leverage these in a subsequent commit. A scheme where mac_cred_check_setcred() could return ERESTART was considered but is incompatible with proper composition of MAC modules. While here, add missing includes and declarations for standalone inclusion of <sys/ucred.h> both from kernel and userspace (for the latter, it has been working thanks to <bsm/audit.h> already including <sys/types.h>). Reviewed by: brooks Approved by: markj (mentor) Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47618
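A user-space sketch of the enter / locked check / exit protocol, assuming a policy that needs scratch memory in its check hook. `policy_setcred_enter()`, `policy_check_setcred()`, `policy_setcred_exit()`, `setcred_sketch()` and the pthread mutex standing in for the process lock are all hypothetical. The point is that allocation happens in the enter hook, before the lock is taken, and the exit hook always runs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread data a policy prepares before the locked check. */
struct setcred_scratch {
	int	new_uid;
};

static _Thread_local struct setcred_scratch *scratch;
static pthread_mutex_t proc_lock = PTHREAD_MUTEX_INITIALIZER;

/* enter: no lock held yet, so allocating (which may sleep) is allowed. */
static void
policy_setcred_enter(void)
{
	scratch = calloc(1, sizeof(*scratch));
}

/* check: called under the process lock; only fills pre-allocated memory. */
static int
policy_check_setcred(int new_uid)
{
	if (scratch == NULL)
		return (ENOMEM);
	scratch->new_uid = new_uid;
	return (0);
}

/* exit: always paired with enter, whether or not check ran or succeeded. */
static void
policy_setcred_exit(void)
{
	free(scratch);
	scratch = NULL;
}

/* Hypothetical syscall skeleton showing the enter / check / exit order. */
static int
setcred_sketch(int new_uid)
{
	int error;

	policy_setcred_enter();
	pthread_mutex_lock(&proc_lock);
	error = policy_check_setcred(new_uid);
	/* On success, the new credentials would be installed here. */
	pthread_mutex_unlock(&proc_lock);
	policy_setcred_exit();
	return (error);
}
```

MAC/do itself reuses its per-thread structure rather than freeing it on every call; the allocate/free pair here just keeps the sketch short.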
2024-12-16	MAC/do: Output errors when parsing rules	Olivier Certner
	So that administrators can more easily know what the problem is with the rules they are trying to set. The new sysctl 'security.mac.do.print_parse_error' controls whether trying to set sysctl 'security.mac.do.rules' with invalid rules triggers printing of the error on the system console. Setting jail parameters directly reports an error to the calling process thanks to the VFS options mechanism used by the jail machinery, so is not controlled by the new sysctl setting. Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47617
2024-12-16	MAC/do: Support multiple users and groups as single rule's targets	Olivier Certner
	Supporting group targets is a requirement for MAC/do to be able to enforce a limited set of valid new groups passed to setgroups(). Additionally, it must be possible for this set of groups to also depend on the target UID, since users and groups are quite tied in UNIX (users are automatically placed in only the groups specified through '/etc/passwd' (primary group) and '/etc/group' (supplementary ones)). These requirements call for a redesign of the rules specification string and of 'struct rule'. A rules specification string is now a list of rules separated by ';' (instead of ','). One rule is still composed of a "from" part and a "to" (or "target") part, both being separated by ':' (as before). The first part, "from", is matched against the credentials of the process calling setuid()/setgroups(). Its specification remains unchanged: It is a '<type>=<id>' clause, where <type> is either "uid" or "gid" and <id> a UID or GID. The second part, "to", is now a comma-separated (',') list of '<flags><type>=<id>' clauses similar to that of the "from" part, with the extensions that <id> may also be "*" or "any" or ".", and that <flags> may contain at most one of the '+', '-' and '!' characters when <type> is GID. "*" and "any" both designate any ID for the <type>, and are aliases to each other. In front of them, only the "+" flag is allowed (in addition to the previous rules). "." designates the process' current IDs for the <type>, as explained below. For GIDs, the absence of a flag indicates that the specified GID is allowed as the real, effective and/or saved GIDs (the "primary" groups). Conversely, the presence of any allowed flag indicates that the specification concerns supplementary groups. The '+' flag in front of "gid" indicates that the ID is allowed as a supplementary group. The '!' flag indicates that the ID is mandatory, i.e., must be listed in the supplementary groups. 
The '-' flag indicates that the GID must not be listed in the supplementary groups. A specification with '-' is only useful in conjunction with a '+'-tagged specification where only one of them has <id> ".", or if other MAC policies are loaded that would give access to other, unwanted groups. "." indicates some ID that the calling process already has at the time of the privilege check. For type "uid", it designates any of the real, effective or saved UIDs. For type "gid", its effect depends on the presence of one of the '+', '-' or '!' flags: if no flag is present, it designates any of the real, effective or saved GIDs; if one is present, it designates any of the supplementary groups. If the "to" part doesn't specify any explicit UID, any of the UIDs of the calling process is implied (as if "uid=." had been specified). Similarly, if it doesn't specify any explicit GID, "gid=.,!gid=." is assumed, meaning that all the groups of the calling process are implied and must be present. More precisely, each of the desired real, effective and saved GIDs must be one of the current real, effective or saved GIDs, whereas all others (the supplementary ones) must be the same as the current ones. No two clauses in a single "to" list may specify the same <id>, except for GIDs, and only if, each time the same <id> appears, it does so with a different flag (no flag counting as a separate flag) and the specified flags are not contradictory (e.g., the same GID may appear with no flag and with the '+' flag, but the same GID with both '+' and '-' will be rejected). 'struct rule' now holds arrays of UIDs (field 'uids') and GIDs (field 'gids') that are admissible as targets, with accompanying flags (such as MDF_SUPP_MUST, representing the '!' flag). Some flags are also held per ID type, and these may include flags normally associated with individual IDs: MDF_CURRENT in these per-type flags stands for the current IDs of the process being privilege-checked, and the ID flags then apply to those IDs.
As a departure from this scheme, "*" or "any" as <id> for GIDs is represented by either MDF_ANY or MDF_ANY_SUPP. This allows it to coexist with a "."/MDF_CURRENT specification for the other category of groups (primary or supplementary), which needs to be qualified by the usual GID flags. This commit contains only the changes to parse the new rules and build their representation. The privilege-granting part is not fixed here, beyond what is required to make compilation work (and, in preparation for a subsequent commit, minimal adaptations to the matching logic in check_setuid()). Approved by: markj (mentor) Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47616
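A minimal sketch of how a single "to" clause ('<flags><type>=<id>') could be parsed under the grammar described above. The names parse_to_clause(), struct clause and the CT_/CF_/CI_ constants are hypothetical and do not correspond to MAC/do's actual parse_rule_element() or 'struct rule'. The sketch checks one clause in isolation; it does not detect duplicate or contradictory clauses across a list.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not MAC/do's 'struct rule'. */
enum clause_type { CT_UID, CT_GID };
enum clause_flag {
	CF_NONE,	/* no flag: primary (real/effective/saved) GID */
	CF_SUPP_ALLOW,	/* '+': allowed as a supplementary group */
	CF_SUPP_MUST,	/* '!': must be a supplementary group */
	CF_SUPP_FORBID	/* '-': must not be a supplementary group */
};
enum clause_id { CI_EXPLICIT, CI_ANY, CI_CURRENT };

struct clause {
	enum clause_type type;
	enum clause_flag flag;
	enum clause_id	 id_kind;
	unsigned int	 id;	/* meaningful only if id_kind == CI_EXPLICIT */
};

/* Parse one '<flags><type>=<id>' clause; return 0 on success, -1 if malformed. */
int
parse_to_clause(const char *s, struct clause *c)
{
	const char *p = s;
	char *end;
	unsigned long v;

	c->flag = CF_NONE;
	c->id = 0;

	/* At most one flag character; a second one makes the type check fail. */
	switch (*p) {
	case '+': c->flag = CF_SUPP_ALLOW; p++; break;
	case '!': c->flag = CF_SUPP_MUST; p++; break;
	case '-': c->flag = CF_SUPP_FORBID; p++; break;
	default: break;
	}

	if (strncmp(p, "uid=", 4) == 0)
		c->type = CT_UID;
	else if (strncmp(p, "gid=", 4) == 0)
		c->type = CT_GID;
	else
		return (-1);
	p += 4;

	/* Flags are only defined for GIDs. */
	if (c->type == CT_UID && c->flag != CF_NONE)
		return (-1);

	if (strcmp(p, "*") == 0 || strcmp(p, "any") == 0) {
		/* Only '+' (or no flag) may precede "*"/"any". */
		if (c->flag != CF_NONE && c->flag != CF_SUPP_ALLOW)
			return (-1);
		c->id_kind = CI_ANY;
		return (0);
	}
	if (strcmp(p, ".") == 0) {
		c->id_kind = CI_CURRENT;
		return (0);
	}

	/* Explicit numeric ID: must start with a digit (no sign, no spaces). */
	if (!isdigit((unsigned char)*p))
		return (-1);
	errno = 0;
	v = strtoul(p, &end, 10);
	if (errno == ERANGE || *end != '\0' || v > UINT_MAX)
		return (-1);
	c->id_kind = CI_EXPLICIT;
	c->id = (unsigned int)v;
	return (0);
}
```

For example, "+gid=any" is accepted as "any supplementary group", while "!gid=*" (wrong flag before "*") and "+uid=5" (flag on a UID) are rejected.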
2024-12-16	MAC/do: Rename private OSD slot by removing 'mac_do_' prefix	Olivier Certner
	This static variable holds the number of the jail OSD slot that MAC/do uses to store rules. In the same vein as previous renames, simplify its name by removing the redundant prefix, as this name cannot appear in code outside of 'mac_do.c' nor in stack traces on panic. Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47772
2024-12-16	MAC/do: Ease input/output of ID types	Olivier Certner
	Have a static constant array mapping numerical ID types to their canonical representations ('id_type_to_str'). Add parse_id_type(), which parses a type using 'id_type_to_str', with a special case to also accept 'any'. Have parse_rule_element() use parse_id_type(). A later commit will add a second call to parse_id_type() for the destination ID. Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47615
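One possible shape for the table and lookup described above, as a sketch. Only the names 'id_type_to_str' and parse_id_type() come from the commit message; the enum id_type values and their ordering are assumptions. Concrete types are found through the table, while "any" is accepted through a separate special case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type codes for illustration only. */
enum id_type { IT_INVALID, IT_UID, IT_GID, IT_ANY };

/* Canonical spelling of each concrete ID type, indexed by type code. */
static const char *const id_type_to_str[] = {
	[IT_UID] = "uid",
	[IT_GID] = "gid",
};

/*
 * Map a type string to its code.  Concrete types are looked up in the
 * table; "any" is handled as a special case.  Returns IT_INVALID on failure.
 */
enum id_type
parse_id_type(const char *s)
{
	for (size_t i = 0;
	    i < sizeof(id_type_to_str) / sizeof(id_type_to_str[0]); i++)
		if (id_type_to_str[i] != NULL &&
		    strcmp(s, id_type_to_str[i]) == 0)
			return ((enum id_type)i);
	if (strcmp(s, "any") == 0)
		return (IT_ANY);
	return (IT_INVALID);
}
```

Keeping the canonical spellings in one table lets the same array serve both parsing and printing of rules, so the two cannot drift apart.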
2024-12-16	MAC/do: Better parsing for IDs (strtoui_strict())	Olivier Certner
	Introduce strtoui_strict(), which signals an error on overflow, contrary to the in-kernel strto*() family of functions, which have no 'errno' to set and thus do not allow callers to distinguish a genuine maximum value on input from overflow. It is built on top of strtoq() and the 'quad_t' type in order to achieve this distinction while still supporting negative inputs with the usual meaning for these functions. See the introduced comments for more details. Use strtoui_strict() to read IDs instead of strtol(). Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47614
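The idea behind strtoui_strict() can be sketched in userland C. This is not the kernel function: the signature and the EINVAL/ERANGE return codes are assumptions, and strtoll()/'long long' stand in for the kernel's strtoq()/'quad_t'. Because the intermediate type is wider than 'unsigned int' (on common platforms, 64 bits versus 32), an input of exactly UINT_MAX and an input that overflows yield different intermediate values, so the range check can tell them apart.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Sketch: convert 's' to an unsigned int, reporting overflow instead of
 * silently clamping.  Negative inputs keep the strtoul() convention that
 * "-N" yields the unsigned wraparound of N.
 *
 * Returns 0 on success, EINVAL if no digits were found, ERANGE on overflow.
 * '*endp', if non-NULL, receives a pointer to the first unparsed character.
 */
int
strtoui_strict(const char *s, char **endp, int base, unsigned int *res)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, base);
	if (endp != NULL)
		*endp = end;
	if (end == s)
		return (EINVAL);
	/* The wider type makes overflow visible as an out-of-range value. */
	if (errno == ERANGE || v > (long long)UINT_MAX ||
	    v < -(long long)UINT_MAX)
		return (ERANGE);
	/* Conversion to unsigned is modular: "-1" becomes UINT_MAX. */
	*res = (unsigned int)v;
	return (0);
}
```

Callers that need the whole string to be an ID still have to check that '*endp' points at the terminating NUL.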
2024-12-16	MAC/do: 'struct rule': IDs and types as 'u_int', rename fields	Olivier Certner
	This is in preparation for introducing a common conversion function for IDs, and simplifies the code a bit by removing the from-IDs union and avoiding the need to introduce a new one for to-IDs in a later commit. Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47613
2024-12-16	MAC/do: parse_rule_element(): Bug in parsing the origin ID	Olivier Certner
	The ID field was allowed to be empty, which would then be parsed as 0 by strtol(). Bugs remain in this function, where parsing of from- or to-IDs accepts spaces and produces 0, but these will conveniently be fixed by a later commit introducing strtoui_strict(). Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47612
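The empty-ID bug comes from strtol() reporting "no digits" only through its end pointer: strtol("", &end, 10) returns 0 and leaves end equal to the input. A minimal sketch of the corresponding checks follows. parse_id_checked() is a hypothetical helper, not the code from this commit, and it also rejects the leading-space case mentioned above, which strtol() would otherwise skip.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal ID, rejecting inputs that a bare
 * strtol() call silently turns into 0.  Returns 0 on success, -1 otherwise.
 */
int
parse_id_checked(const char *s, long *out)
{
	char *end;
	long v;

	/*
	 * Require a leading digit.  This rejects the empty field (which
	 * strtol() maps to 0 with end == s) and leading whitespace (which
	 * strtol() skips).
	 */
	if (!isdigit((unsigned char)*s))
		return (-1);
	errno = 0;
	v = strtol(s, &end, 10);
	/* Reject overflow and trailing characters such as spaces. */
	if (errno == ERANGE || *end != '\0')
		return (-1);
	*out = v;
	return (0);
}
```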
2024-12-16	MAC/do: parse_rule_element(): Style, more clarity	Olivier Certner
	Add newlines to separate logical blocks. Remove braces around non-compound substatements of 'if's. No functional change (intended). Reviewed by: bapt Approved by: markj (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47611