path: root/sys/dev/netmap/netmap_kern.h
Age | Commit message | Author
2024-10-14 | netmap: Make memory pools NUMA-aware | Mark Johnston
Each netmap adapter associated with a physical adapter is attached to a netmap memory pool. contigmalloc() is used to allocate physically contiguous memory for the pool, but ideally we would ensure that all such memory is allocated from the NUMA domain local to the adapter. Augment netmap's memory pools with a NUMA domain ID, similar to how IOMMU groups are handled in the Linux port. That is, when attaching to a physical adapter, ensure that the associated memory pools are local to the adapter's associated memory domain, creating new pools as needed. Some types of ifnets do not have any defined NUMA affinity; in this case the domain ID in question is the sentinel value -1. Add a sysctl, dev.netmap.port_numa_affinity, which can be used to enable the new behaviour. Keep it disabled for now to avoid surprises in case netmap applications are relying on zero-copy optimizations to forward packets between ports belonging to different NUMA domains. Reviewed by: vmaffione MFC after: 2 weeks Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D46666
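The pool selection described in this commit can be pictured with a small userland model. This is only a sketch under stated assumptions: `struct pool`, `pool_for_domain()`, `MAX_POOLS` and `DOMAIN_ANY` are hypothetical names, and the `port_numa_affinity` variable merely stands in for the sysctl; none of this is the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POOLS  8      /* hypothetical table size */
#define DOMAIN_ANY (-1)   /* sentinel: no defined NUMA affinity */

/* Hypothetical stand-in for a netmap memory pool tagged with a domain. */
struct pool {
    int in_use;
    int domain;
};

static struct pool pools[MAX_POOLS];
static int port_numa_affinity = 0; /* models the sysctl; disabled by default */

/*
 * Return a pool local to the adapter's domain, creating one if needed.
 * With the knob disabled every adapter maps to the DOMAIN_ANY pool.
 */
static struct pool *
pool_for_domain(int adapter_domain)
{
    int want = port_numa_affinity ? adapter_domain : DOMAIN_ANY;
    struct pool *free_slot = NULL;

    for (size_t i = 0; i < MAX_POOLS; i++) {
        if (pools[i].in_use && pools[i].domain == want)
            return &pools[i];           /* reuse the matching pool */
        if (!pools[i].in_use && free_slot == NULL)
            free_slot = &pools[i];      /* remember first free slot */
    }
    if (free_slot == NULL)
        return NULL;                    /* table exhausted */
    free_slot->in_use = 1;
    free_slot->domain = want;
    return free_slot;
}
```

In this model, adapters in different domains share one pool while the knob is off, which matches the stated reason for keeping it disabled: zero-copy forwarding between such ports keeps working.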
2024-03-26 | netmap: Address errors on memory free in netmap_generic | Tom Jones
netmap_generic keeps a pool of mbufs for handling transfers; these mbufs have an external buffer attached to them. In some cases other parts of the network stack can chain these mbufs; when this happens the normal pool destructor function can end up freeing the pool mbufs twice:
- A first time if a pool mbuf has been chained with another mbuf when its chain is freed
- A second time when its entry in the pool is freed
Additionally, if other parts of the stack demote a pool mbuf, its interface reference will be cleared. In this case we dereference a NULL pointer when trying to free the mbuf through the destructor. Store a reference to the adapter in ext_arg1 with the destructor callback so we can find the correct adapter when freeing a pool mbuf. This change enables using netmap with epair interfaces. Reviewed By: vmaffione MFC after: 1 week Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D44371
2023-12-27 | netmap: Ignore errors in CSB_WRITE() | Mark Johnston
The CSB_WRITE() and _READ() macros respectively write to and read from userspace memory and so can in principle fault. However, we do not check for errors and will proceed blindly if they fail. Add assertions to verify that they do not. This is in preparation for annotating copyin() and related functions with __result_use_check. Reviewed by: vmaffione MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D43200
2023-08-16 | sys: Remove $FreeBSD$: one-line .h pattern | Warner Losh
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-05-12 | spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD | Warner Losh
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
2023-04-05 | netmap: Fix queue stalls with generic interfaces | Mark Johnston
In emulated mode, the FreeBSD netmap port attempts to perform zero-copy transmission. This works as follows: the kernel ring is populated with mbuf headers to which netmap buffers are attached. When transmitting, the mbuf refcount is initialized to 2, and when the counter value has been decremented to 1 netmap infers that the driver has freed the mbuf and thus transmission is complete. This scheme does not generalize to the situation where netmap is attaching to a software interface which may transmit packets among multiple "queues", as is the case with bridge or lagg interfaces. In that case, we would be relying on backing hardware drivers to free transmitted mbufs promptly, but this isn't guaranteed; a driver may reasonably defer freeing a small number of transmitted buffers indefinitely. If such a buffer ends up at the tail of a netmap transmit ring, further transmits can end up blocked indefinitely. Fix the problem by removing the zero-copy scheme (which is also not implemented in the Linux port of netmap). Instead, the kernel ring is populated with regular mbuf clusters into which netmap buffers are copied by nm_os_generic_xmit_frame(). The refcounting scheme is preserved, and this lets us avoid allocating a fresh cluster per transmitted packet in the common case. If the transmit ring is full, a callout is used to free the "stuck" mbuf, avoiding the queue deadlock described above. Furthermore, when recycling mbuf clusters, be sure to fully reinitialize the mbuf header instead of simply re-setting M_PKTHDR. Some software interfaces, like if_vlan, may set fields in the header which should be reset before the mbuf is reused. Reviewed by: vmaffione MFC after: 1 month Sponsored by: Zenarmor Sponsored by: OPNsense Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38065
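The refcount convention this commit describes (start at 2, treat a drop to 1 as "the driver is done") can be sketched in a few lines. This is a toy model, not the kernel code: `struct fake_mbuf` and the three helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an mbuf whose buffer is reference counted. */
struct fake_mbuf {
    int refcnt;
};

/* Before handing the mbuf to the driver: one ref for netmap, one for the driver. */
static void
tx_prepare(struct fake_mbuf *m)
{
    m->refcnt = 2;
}

/* What the driver's free amounts to in this model: drop its reference. */
static void
driver_free(struct fake_mbuf *m)
{
    assert(m->refcnt > 0);
    m->refcnt--;
}

/* netmap infers completion once only its own reference remains. */
static bool
tx_completed(const struct fake_mbuf *m)
{
    return m->refcnt == 1;
}
```

Because netmap keeps its own reference, the buffer survives the driver's free and can be reused for the next packet. The stall the commit fixes corresponds to `driver_free()` never being called for a slot at the tail of the ring.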
2023-03-14 | netmap: get rid of save_if_input for emulated adapters | Vincenzo Maffione
The save_if_input function pointer was meant to save the previous value of ifp->if_input before replacing it with the emulated adapter hook. However, the same pointer value is already stored in the if_input field of the netmap_adapter struct, to be used for host TX ring processing. Reuse the netmap_adapter if_input field to simplify the code and save some space. MFC after: 14 days
2023-03-11 | netmap: get rid of WNA() macro | Vincenzo Maffione
MFC after: 7 days
2023-02-14 | Mechanically convert netmap(4) to IfAPI | Justin Hibbits
Reviewed by: vmaffione, zlei Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D37814
2023-01-23 | netmap: Correct a comment | Mark Johnston
Reviewed by: vmaffione MFC after: 1 week Sponsored by: Zenarmor Sponsored by: OPNsense Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38063
2022-12-24 | debug_put_get: don't crash on null pointers | Vincenzo Maffione
MFC after: 7 days
2022-12-24 | netmap: drop compatibility FreeBSD code | Vincenzo Maffione
Netmap users on FreeBSD are not supposed to import code from the github netmap repository anymore. They should use the code that is available in the src repo. We can therefore drop the compatibility code. MFC after: 7 days
2022-12-03 | netmap_update_config: update na->name to cope with reconfigurations | Vincenzo Maffione
MFC after: 1 week
2022-11-02 | sys: Nuke double-semicolons | Elliott Mitchell
A distinct number of double-semicolons have ended up in FreeBSD. Take a pass at getting rid of many of these harmless typos. Reviewed by: emaste, rrs Pull Request: https://github.com/freebsd/freebsd-src/pull/609 Differential Revision: https://reviews.freebsd.org/D31716
2022-03-06 | netmap: add a tunable for the maximum number of VALE switches | Vincenzo Maffione
The new dev.netmap.max_bridges sysctl tunable can be set in loader.conf(5) to change the default maximum number of VALE switches that can be created. The current default is 8. MFC after: 2 weeks
2021-08-22 | netmap: import changes from upstream | Vincenzo Maffione
- make sure rings are disabled during resets
- introduce netmap_update_hostrings_mode(), with support for multiple host rings
- always initialize ni_bufs_head in netmap_if
  ni_bufs_head was not properly initialized when no external buffers were requested and contained the ni_bufs_head from the last request. This was causing spurious buffer frees when alternating between apps that used external buffers and apps that did not use them.
- check na validity under lock on detach
- netmap_mem: fix leak on error path
- nm_dispatch: fix compilation on Raspberry Pi
MFC after: 2 weeks
2021-04-17 | wpa: Import wpa_supplicant/hostapd commit f91680c15 | Cy Schubert
This is the April update to vendor/wpa committed upstream 2021/04/07. This is MFV efec8223892b3e677acb46eae84ec3534989971f. Suggested by: philip Reviewed by: philip MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D29744
2021-04-17 | netmap: make sure rings are disabled during resets | Vincenzo Maffione
Explicitly disable ring synchronization before calling callbacks that may result in a hardware reset. Before this patch we relied on capturing the down/up events which, however, may not be issued by all drivers.
2021-04-02 | netmap: several typo fixes | Vincenzo Maffione
No functional changes intended.
2021-04-02 | netmap: fix typo bug in netmap_compute_buf_len | Vincenzo Maffione
2021-03-29 | netmap: add kernel support for the "offsets" feature | Vincenzo Maffione
This feature enables applications to ask netmap to transmit or receive packets starting at a user-specified offset from the beginning of the netmap buffer. This is meant to ease those packet manipulation operations such as pushing or popping packet headers, that may be useful to implement software switches, routers and other packet processors. To use the feature, drivers (e.g., iflib, vtnet, etc.) must have explicit support. This change does not add support for any driver, but introduces the necessary kernel changes. However, offsets support is already included for VALE ports and pipes.
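To see why a per-slot offset eases header push and pop, consider the following userland sketch. It does not use the netmap API: `struct pkt_slot`, `pkt_data()`, `push_header()` and `pop_header()` are hypothetical names, and the 2048-byte buffer size is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 2048 /* assumed buffer size for this sketch */

/* Hypothetical slot: the packet occupies buf[off .. off+len). */
struct pkt_slot {
    uint8_t buf[BUF_SIZE];
    size_t  off;  /* user-chosen offset from the start of the buffer */
    size_t  len;  /* packet length */
};

static uint8_t *
pkt_data(struct pkt_slot *s)
{
    assert(s->off + s->len <= BUF_SIZE);
    return s->buf + s->off;
}

/* Prepend a header by moving the start backwards; no payload copy needed. */
static int
push_header(struct pkt_slot *s, const void *hdr, size_t hlen)
{
    if (hlen > s->off)
        return -1;              /* not enough headroom */
    s->off -= hlen;
    s->len += hlen;
    memcpy(pkt_data(s), hdr, hlen);
    return 0;
}

/* Strip a header by moving the start forwards. */
static int
pop_header(struct pkt_slot *s, size_t hlen)
{
    if (hlen > s->len)
        return -1;
    s->off += hlen;
    s->len -= hlen;
    return 0;
}
```

Receiving at a non-zero offset leaves headroom in front of the packet, so an encapsulating switch or router can add a header without shifting the payload.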
2021-01-24 | netmap: simplify parameter passing | Vincenzo Maffione
Changes imported from the netmap github.
2020-06-11 | netmap: introduce netmap_kring_on() | Vincenzo Maffione
This function returns the corresponding kring if the ring identified by queue id and direction is in netmap mode. Otherwise it returns NULL. Use this function to replace vtnet_netmap_queue_on(). MFC after: 1 week Notes: svn path=/head/; revision=362076
2020-02-07 | netmap: improve netmap(4) and vale(4) man pages | Vincenzo Maffione
Clean up obsolete sysctl descriptions and add missing ones. PR: 243838 Reviewed by: bcr MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23546 Notes: svn path=/head/; revision=357663
2020-01-13 | netmap: disable passthrough with no hypervisor support | Vincenzo Maffione
The netmap passthrough subsystem requires proper support in the hypervisor. In particular, two PCI device ids (from the Red Hat PCI vendor id 0x1b36) need to be assigned to the two netmap virtual devices. We then disable these devices until the ids have been assigned, in order to avoid conflicts with other virtual devices emulated by upstream QEMU. PR: 241774 MFC after: 3 days Notes: svn path=/head/; revision=356704
2019-09-01 | netmap: import changes from upstream (SHA 137f537eae513) | Vincenzo Maffione
- Rework option processing.
- Use larger integers for memory size values in the memory management code.
MFC after: 2 weeks Notes: svn path=/head/; revision=351657
2019-02-18 | netmap: don't schedule kqueue notify task when kqueue is not used | Vincenzo Maffione
This change adds a counter (kqueue_users) to keep track of how many kqueue users are referencing a given struct nm_selinfo. In this way, nm_os_selwakeup() can schedule the kevent notification task only when kqueue is actually being used. This is important to avoid wasting CPU in the common case where kqueue is not used. Reviewed by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19177 Notes: svn path=/head/; revision=344253
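The counter-gated wakeup can be modelled as follows. This is a sketch, not the kernel implementation: `struct sel_info`, its `tasks_scheduled` field, and the helper names are hypothetical; only the `kqueue_users` counter comes from the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for struct nm_selinfo. */
struct sel_info {
    int kqueue_users;     /* how many kqueue users reference this object */
    int tasks_scheduled;  /* counts notification tasks, for illustration */
};

static void
kq_attach(struct sel_info *si)
{
    si->kqueue_users++;
}

static void
kq_detach(struct sel_info *si)
{
    assert(si->kqueue_users > 0);
    si->kqueue_users--;
}

/* Schedule the (comparatively expensive) notification task only if needed. */
static void
sel_wakeup(struct sel_info *si)
{
    if (si->kqueue_users > 0)
        si->tasks_scheduled++;
    /* poll()/select() waiters would be woken here regardless. */
}
```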
2019-02-05 | netmap: refactor logging macros and pipes | Vincenzo Maffione
Changelist:
- Replace ND, D and RD macros with nm_prdis, nm_prinf, nm_prerr and nm_prlim, to avoid possible naming conflicts.
- Add netmap_krings_mode_commit() helper function and use that to reduce code duplication.
- Refactor pipes control code to export some functions that can be reused by the veth driver (on Linux) and epair(4).
- Add check to reject API requests with version less than 11.
- Small code refactoring for the null adapter.
MFC after: 1 week Notes: svn path=/head/; revision=343772
2019-02-02 | netmap: upgrade sync-kloop support | Vincenzo Maffione
Add SYNC_KLOOP_MODE option, and add support for direct mode, where the application executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback. MFC after: 5 days Notes: svn path=/head/; revision=343689
2019-01-30 | netmap: fix lock order reversal related to kqueue usage | Vincenzo Maffione
When using poll(), select() or kevent() on netmap file descriptors, netmap executes the equivalent of NIOCTXSYNC and NIOCRXSYNC commands, before collecting the events that are ready. In other words, the poll/kevent callback has side effects. This is done to avoid the overhead of two system calls per iteration (e.g., poll() + ioctl(NIOC*XSYNC)). When the kqueue subsystem invokes the kqueue(9) f_event callback (netmap_knrw), it holds the lock of the struct knlist object associated to the netmap port (the lock is provided at initialization, by calling knlist_init_mtx). However, netmap_knrw() may need to wake up another netmap port (or even the same one), which means that it may need to call knote(). Since knote() needs the lock of the struct knlist object associated to the to-be-woken-up netmap port, it is possible to have a lock order reversal problem (AB/BA deadlock). This change prevents the deadlock by executing the knote() call in a per-selinfo taskqueue, where it is possible to hold a mutex. Reviewed by: aleksandr.fedorov_itglobal.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18956 Notes: svn path=/head/; revision=343579
2019-01-23 | netmap: improvements to the netmap kloop (CSB mode) | Vincenzo Maffione
Changelist:
- Add the proper memory barriers in the kloop ring processing functions.
- Fix memory barriers usage in the user helpers (nm_sync_kloop_appl_write, nm_sync_kloop_appl_read).
- Fix nm_kr_txempty() helper to look at rhead rather than rcur. This is important since the kloop can read a value of rcur which is ahead of the value of rhead (see explanation in nm_sync_kloop_appl_write)
- Remove obsolete ptnetmap_guest_write_kring_csb() and ptnet_guest_read_kring_csb(), and update if_ptnet(4) to use those.
- Prepare in advance the arguments for netmap_sync_kloop_[tr]x_ring(), to make the kloop faster.
- Provide kernel and user implementation for nm_ldld_barrier() and nm_ldst_barrier()
MFC after: 2 weeks Notes: svn path=/head/; revision=343346
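The nm_kr_txempty() item can be illustrated with a toy ring state. This is a sketch under assumptions: `struct ring_state` and `tx_empty()` are hypothetical names, and the comparison against a hardware tail index is assumed here for illustration rather than quoted from the header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the indices the kernel loop reads. */
struct ring_state {
    uint32_t rhead;   /* first slot the application has released */
    uint32_t rcur;    /* wakeup point; may run ahead of rhead */
    uint32_t hwtail;  /* first slot not yet available to the application */
};

/*
 * The ring counts as empty when head has caught up with the tail.
 * rcur is deliberately ignored: a reader may observe an rcur that is
 * already ahead of the rhead it read, so comparing rcur could report
 * "not empty" when no slot has actually been released.
 */
static bool
tx_empty(const struct ring_state *k)
{
    return k->rhead == k->hwtail;
}
```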
2019-01-23 | netmap: fix knote() argument to match the mutex state | Vincenzo Maffione
The nm_os_selwakeup function needs to call knote() to wake up kqueue(9) users. However, this function can be called from different code paths, with different lock requirements. This patch fixes the knote() call argument to match the relevant lock state. Also, comments have been updated to reflect current code. PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219846 Reported by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18876 Notes: svn path=/head/; revision=343344
2018-12-21 | netmap: move buf_size validation code to its own function | Vincenzo Maffione
This code validates the netmap buf_size against the interface MTU and maximum descriptor size, to make sure the values are consistent. Moving this functionality to its own function is needed because this function is also called by Linux-specific code. MFC after: 3 days Notes: svn path=/head/; revision=342300
2018-12-05 | netmap: align codebase to the current upstream (760279cfb2730a585) | Vincenzo Maffione
Changelist:
- Replace netmap passthrough host support with a more general mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop. No kernel threads are used by this feature: the application is required to spawn a thread (or a process) and issue a SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The kernel loop is executed by the ioctl implementation, which returns to userspace only when a different thread calls SYNC_KLOOP_STOP or the netmap file descriptor is closed.
- Update the if_ptnet driver to cope with the new data structures, and prune all the obsolete ptnetmap code.
- Add support for "null" netmap ports, useful to allocate netmap_if, netmap_ring and netmap buffers to be used by specialized applications (e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect.
- Various fixes and code refactoring.
Sponsored by: Sunny Valley Networks Differential Revision: https://reviews.freebsd.org/D18015 Notes: svn path=/head/; revision=341516
2018-10-23 | netmap: align codebase to the current upstream (sha 8374e1a7e6941) | Vincenzo Maffione
Changelist:
- Move large parts of VALE code to a new file and header netmap_bdg.[ch]. This is useful to reuse the code within upcoming projects.
- Improvements and bug fixes to pipes and monitors.
- Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to handle differences between FreeBSD and Linux.
- Introduce some new helper functions to handle more host rings and fake rings (netmap_all_rings(), netmap_real_rings(), ...)
- Added new sysctl to enable/disable hw checksum in emulated netmap mode.
- nm_inject: add support for NS_MOREFRAG
Approved by: gnn (mentor) Differential Revision: https://reviews.freebsd.org/D17364 Notes: svn path=/head/; revision=339639
2018-05-18 | netmap: pull fix for 32-bit support from upstream | Matt Macy
Approved by: sbruno Notes: svn path=/head/; revision=333778
2018-04-12 | netmap: align codebase to the current upstream (commit id 3fb001303718146) | Vincenzo Maffione
Changelist:
- Turn tx_rings and rx_rings arrays into arrays of pointers to kring structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib, vtnet and ptnet drivers to cope with the change.
- Generalize the nm_config() callback to accept a struct containing many parameters.
- Introduce NKR_FAKERING to support buffers sharing (used for netmap pipes)
- Improved API for external VALE modules.
- Various bug fixes and improvements to the netmap memory allocator, including support for externally (userspace) allocated memory.
- Refactoring of netmap pipes: now linked rings share the same netmap buffers, with a separate set of kring pointers (rhead, rcur, rtail). Buffer swapping does not need to happen anymore.
- Large refactoring of the control API towards an extensible solution; the goal is to allow the addition of more commands and extension of existing ones (with new options) without the need of hacks or the risk of running out of configuration space. A new NIOCCTRL ioctl has been added to handle all the requests of the new control API, which cover all the functionalities so far supported. The netmap API bumps from 11 to 12 with this patch. Full backward compatibility is provided for the old control command (NIOCREGIF), by means of a new netmap_legacy module. Many parts of the old netmap.h header have now been moved to netmap_legacy.h (included by netmap.h).
Approved by: hrs (mentor) Notes: svn path=/head/; revision=332423
2018-04-09 | netmap: align codebase to upstream version v11.4 | Vincenzo Maffione
Changelist:
- remove unused nkr_slot_flags
- new nm_intr adapter callback to enable/disable interrupts
- remove unused sysctls and document the other sysctls
- new infrastructure to support NS_MOREFRAG for NIC ports
- support for external memory allocator (for now linux-only), including linux-specific changes in common headers
- optimizations within netmap pipes datapath
- improvements on VALE control API
- new nm_parse() helper function in netmap_user.h
- various bug fixes and code clean up
Approved by: hrs (mentor) Notes: svn path=/head/; revision=332319
2018-04-04 | netmap: align if_ptnet guest driver to the upstream code (commit 0e15788) | Vincenzo Maffione
The change upgrades the driver to use the split Communication Status Block (CSB) format. In this way the variables written by the guest and read by the host are allocated in a different cacheline than the variables written by the host and read by the guest; this is needed to avoid cache thrashing. Approved by: hrs (mentor) Notes: svn path=/head/; revision=332047
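The cache-line separation behind the split CSB can be sketched with C11 alignment specifiers. The layout below is illustrative only: the struct and field names are hypothetical, and the 64-byte line size is an assumption rather than something the commit states.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumed cache line size */

/* Written by the guest, read by the host. */
struct guest_to_host {
    _Alignas(CACHELINE) uint32_t head;
    uint32_t cur;
};

/* Written by the host, read by the guest. */
struct host_to_guest {
    _Alignas(CACHELINE) uint32_t hwcur;
    uint32_t hwtail;
};

/*
 * Each half is padded to a full cache line, so a write by one side
 * never invalidates the line holding the other side's variables.
 */
struct csb_pair {
    struct guest_to_host g2h;
    struct host_to_guest h2g;
};
```

With both halves in one line, every guest update would force the host's cached copy to be refetched and vice versa, which is the thrashing the commit message refers to.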
2017-11-27 | sys/dev: further adoption of SPDX licensing ID tags. | Pedro F. Giffuni
Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Notes: svn path=/head/; revision=326255
2017-06-12 | Update the current version of netmap to bring it in sync with the github version. | Luiz Otavio O Souza
This commit contains mostly refactoring, a few fixes and minor added functionality. Submitted by: Vincenzo Maffione <v.maffione at gmail.com> Requested by: many Sponsored by: Rubicon Communications, LLC (Netgate) Notes: svn path=/head/; revision=319881
2016-10-27 | Various fixes for ptnet/ptnetmap (passthrough of netmap ports). In detail: | Luigi Rizzo
- use PCI_VENDOR and PCI_DEVICE ids from a publicly allocated range (thanks to RedHat)
- export memory pool information through PCI registers
- improve mechanism for configuring passthrough on different hypervisors
Code is from Vincenzo Maffione as a follow up to his GSOC work. Notes: svn path=/head/; revision=308000
2016-10-18 | remove stale and unused code from various files | Luigi Rizzo
fix build on 32 bit platforms
simplify logic in netmap_virt.h
The commands (in net/netmap.h) to configure communication with the hypervisor may be revised soon. At the moment they are unused so this will not be a change of API. Notes: svn path=/head/; revision=307574
2016-10-18 | Restore svn r306772 that was overwritten by netmap import at svn r307394 | Sean Bruno
#include <sys/selinfo.h> should be here as all drivers that support netmap need to use this file regardless. Notes: svn path=/head/; revision=307569
2016-10-16 | Import the current version of netmap, aligned with the one on github. | Luigi Rizzo
This commit, long overdue, contains contributions in the last 2 years from Stefano Garzarella, Giuseppe Lettieri, Vincenzo Maffione, including:
+ fixes on monitor ports
+ the 'ptnet' virtual device driver, and ptnetmap backend, for high speed virtual passthrough on VMs (bhyve fixes in an upcoming commit)
+ improved emulated netmap mode
+ more robust error handling
+ removal of stale code
+ various fixes to code and documentation (some mixup between RX and TX parameters, and private and public variables)
We also include an additional tool, nmreplay, which is functionally equivalent to tcpreplay but operating on netmap ports. Notes: svn path=/head/; revision=307394
2016-10-06 | Move netmap selinfo.h in to sensible location. | Sean Bruno
netmap_kern.h currently requires all drivers including it to include selinfo.h. Submitted by: mmacy@nextbsd.org Reviewed by: gnn MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D5334 Notes: svn path=/head/; revision=306772
2015-07-19 | add a use count so the netmap module cannot be unloaded while in use. | Luigi Rizzo
Notes: svn path=/head/; revision=285699
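A use count of this kind typically gates the unload path, roughly as in the sketch below. The names `mod_get()`, `mod_put()` and `mod_unload()` are hypothetical, and returning EBUSY is an assumption about how a refusal would be reported, not a detail taken from the commit.

```c
#include <assert.h>
#include <errno.h>

static int use_count; /* number of active users of the module */

/* Called when a user starts using the module (e.g. opens the device). */
static void
mod_get(void)
{
    use_count++;
}

/* Called when that user is done. */
static void
mod_put(void)
{
    assert(use_count > 0);
    use_count--;
}

/* Refuse to unload while anyone still holds a reference. */
static int
mod_unload(void)
{
    return use_count > 0 ? EBUSY : 0;
}
```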
2015-07-19 | small documentation update | Luigi Rizzo
Notes: svn path=/head/; revision=285695
2015-07-10 | staticize functions only used in netmap.c | Luigi Rizzo
(detected by jenkins run with gcc 4.9) Update documentation on the use of netmap_priv_d, rename the refcount and use the same structure in FreeBSD and linux No functional changes. Notes: svn path=/head/; revision=285359
2015-07-10 | Sync netmap sources with the version in our private tree. | Luigi Rizzo
This commit contains large contributions from Giuseppe Lettieri and Stefano Garzarella, is partly supported by grants from Verisign and Cisco, and brings in the following:
- fix zerocopy monitor ports and introduce copying monitor ports (the latter are lower performance but give access to all traffic in parallel with the application)
- exclusive open mode, useful to implement solutions that recover from crashes of the main netmap client (suggested by Patrick Kelsey)
- revised memory allocator in preparation for the 'passthrough mode' (ptnetmap) recently presented at bsdcan. ptnetmap is described in S. Garzarella, G. Lettieri, L. Rizzo; Virtual device passthrough for high speed VM networking, ACM/IEEE ANCS 2015, Oakland (CA) May 2015 http://info.iet.unipi.it/~luigi/research.html
- fix rx CRC handling on ixl
- add module dependencies for netmap when building drivers as modules
- minor simplifications to device-specific routines (*txsync, *rxsync)
- general code cleanup (remove unused variables, introduce macros to access rings and remove duplicate code)
Applications do not need to be recompiled, unless of course they want to use the new features (monitors and exclusive open). Those willing to try this code on stable/10 can just update the sys/dev/netmap/*, sys/net/netmap* with the version in HEAD and apply the small patches to individual device drivers. MFC after: 1 month Sponsored by: (partly) Verisign, Cisco Notes: svn path=/head/; revision=285349