summaryrefslogtreecommitdiff
path: root/sys/dev/netmap/netmap.c
AgeCommit message (Collapse)Author
2024-10-14netmap: Make memory pools NUMA-awareMark Johnston
Each netmap adapter associated with a physical adapter is attached to a netmap memory pool. contigmalloc() is used to allocate physically contiguous memory for the pool, but ideally we would ensure that all such memory is allocated from the NUMA domain local to the adapter. Augment netmap's memory pools with a NUMA domain ID, similar to how IOMMU groups are handled in the Linux port. That is, when attaching to a physical adapter, ensure that the associated memory pools are local to the adapter's associated memory domain, creating new pools as needed. Some types of ifnets do not have any defined NUMA affinity; in this case the domain ID in question is the sentinel value -1. Add a sysctl, dev.netmap.port_numa_affinity, which can be used to enable the new behaviour. Keep it disabled by now to avoid surprises in case netmap applications are relying on zero-copy optimizations to forward packets between ports belonging to different NUMA domains. Reviewed by: vmaffione MFC after: 2 weeks Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D46666
2023-08-16sys: Remove $FreeBSD$: one-line .h patternWarner Losh
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-05-12spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSDWarner Losh
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
2023-03-21netmap: fix copyin/copyout of nmreq options listVincenzo Maffione
The previous code unsuccesfully attempted to report a precise error for each option in the user list. Moreover, commit 253b2ec199b broke some ctrl-api-test (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260547). With this patch we bail out as soon as an unrecoverable error is detected and we properly check for copy boundaries. EOPNOTSUPP no longer immediately returns an error, so that any other option in the list may be examined by the caller code and a precise report of the (un)supported options can be returned to the user. With this patch, all ctrl-api-test unit tests pass again. PR: 260547 Submitted by: giuseppe.lettieri@unipi.it Reviewed by: vmaffione MFC after: 14 days
2023-02-14Mechanically convert netmap(4) to IfAPIJustin Hibbits
Reviewed by: vmaffione, zlei Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D37814
2023-01-23netmap: Try to count packet drops in emulated modeMark Johnston
Right now we have little visibility into packet drops within netmap. Start trying to make packet loss issues more visible by counting queue drops in the transmit path, and in the input path for interfaces running in emulated mode, where we place received packets in a bounded software queue that is processed by rxsync. Reviewed by: vmaffione MFC after: 1 week Sponsored by: Zenarmor Sponsored by: OPNsense Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38064
2023-01-23netmap: Tell the compiler to avoid reloading ring indicesMark Johnston
Per the removed comments these fields should be loaded only once, since they can in principle be modified concurrently, though this would be a violation of the userspace contract with netmap. No functional change intended. Reviewed by: vmaffione MFC after: 1 week Sponsored by: Zenarmor Sponsored by: OPNsense Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38061
2022-12-03netmap_update_config: update na->name to cope with reconfigurationsVincenzo Maffione
MFC after: 1 week
2022-03-16netmap: Fix TOCTOU vulnerability in nmreq_copyinVincenzo Maffione
The total size of the user-provided nmreq was first computed and then trusted during the copyin. This might lead to kernel memory corruption and escape from jails/containers. Reported by: Lucas Leong (@_wmliang_) of Trend Micro Zero Day Initiative Security: CVE-2022-23084 MFC after: 3 days
2022-03-16netmap: Fix integer overflow in nmreq_copyinVincenzo Maffione
An unsanitized field in an option could be abused, causing an integer overflow followed by kernel memory corruption. This might be used to escape jails/containers. Reported by: Reno Robert and Lucas Leong (@_wmliang_) of Trend Micro Zero Day Initiative Security: CVE-2022-23085
2022-03-06netmap: fix refcount bug in netmap allocatorVincenzo Maffione
Symptom: when a single extmem memory region is provided to netmap multiple times, for multiple interfaces, the memory region is never released by netmap once all the existing file descriptors are closed. Fix the relevant condition in netmap_mem_drop(): release the memory when the last user of netmap_adapter is gone, rather then when the last user of netmap_mem_d is gone. MFC after: 2 weeks
2021-08-22netmap: import changes from upstreamVincenzo Maffione
- make sure rings are disabled during resets - introduce netmap_update_hostrings_mode(), with support for multiple host rings - always initialize ni_bufs_head in netmap_if ni_bufs_head was not properly initialized when no external buffers were requestedx and contained the ni_bufs_head from the last request. This was causing spurious buffer frees when alternating between apps that used external buffers and apps that did not use them. - check na validitity under lock on detach - netmap_mem: fix leak on error path - nm_dispatch: fix compilation on Raspberry Pi MFC after: 2 weeks
2021-04-18netmap: use safer defaults for hwbuf_lenVincenzo Maffione
We must make sure that incoming packets will never overflow the netmap buffers, even when the user is using the offset feature. In the typical scenario, the netmap buffer is 2KiB and, with an MTU of 1500, there are ~500 bytes available for user offsets. Unfortunately, some NICs accept incoming packets even when they are larger then the MTU. This means that the only way to stop DMA from overflowing the netmap buffers, when offsets are allowed, is to choose a hardware buffer length which is smaller than the netmap buffer length. For most NICs and for 2KiB netmap buffers, this means 1024 bytes, which is unconveniently small. The current code will select the small hardware buf size even when offsets are not in use. The main purpose of this change is to fix this bug by returning to the normal behavior for the no-offsets case. At the same time, the patch pushes the handling of the offset case to the lower level driver code, so that it can be made NIC-specific (in future patches).
2021-04-17wpa: Import wpa_supplicant/hostapd commit f91680c15Cy Schubert
This is the April update to vendor/wpa committed upstream 2021/04/07. This is MFV efec8223892b3e677acb46eae84ec3534989971f. Suggested by: philip Reviewed by: philip MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D29744
2021-04-17netmap: make sure rings are disabled during resetsVincenzo Maffione
Explicitly disable ring synchronization before calling callbacks that may result in a hardware reset. Before this patch we relied on capturing the down/up events which, however, may not be issued by all drivers.
2021-04-02netmap: several typo fixesVincenzo Maffione
No functional changes intended.
2021-04-02netmap: fix typo bug in netmap_compute_buf_lenVincenzo Maffione
2021-03-29netmap: add kernel support for the "offsets" featureVincenzo Maffione
This feature enables applications to ask netmap to transmit or receive packets starting at a user-specified offset from the beginning of the netmap buffer. This is meant to ease those packet manipulation operations such as pushing or popping packet headers, that may be useful to implement software switches, routers and other packet processors. To use the feature, drivers (e.g., iflib, vtnet, etc.) must have explicit support. This change does not add support for any driver, but introduces the necessary kernel changes. However, offsets support is already included for VALE ports and pipes.
2021-03-15netmap: fix memory leak in NETMAP_REQ_PORT_INFO_GETVincenzo Maffione
The netmap_ioctl() function has a reference counting bug in case of NETMAP_REQ_PORT_INFO_GET command. When `hdr->nr_name[0] == '\0'`, the function does not decrease the refcount of "nmd", which is increased by netmap_mem_find(), causing a refcount leak. Reported by: Xiyu Yang <sherllyyang00@gmail.com> Submitted by: Carl Smith <carl.smith@alliedtelesis.co.nz> MFC after: 3 days PR: 254311
2021-03-05netmap: Stop printing a line to the dmesg in netmap_init()Mark Johnston
netmap is compiled into the kernel by default so initialization was always reported, and netmap uses a formatting convention not used in the rest of the kernel. Reviewed by: vmaffione MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29099
2021-01-24netmap: simplify parameter passingVincenzo Maffione
Changes imported from the netmap github.
2021-01-10netmap: restore hwofs and support it in iflibVincenzo Maffione
Restore the hwofs functionality temporarily disabled by 7ba6ecf216fb15e8b147db2 to prevent issues with iflib. This patch brings the necessary changes to iflib to enable howfs to allow interface restarts without disrupting netmap applications actively using its rings. After this change, it becomes possible for multiple non-cooperating netmap applications to use non-overlapping subsets of the available netmap rings without clashing with each other. PR: 252453 MFC after: 1 week
2021-01-10netmap: vtnet: enable/disable krings on any interface reinitVincenzo Maffione
See 3d65fd97e85ab807f3b for a detailed explanation. PR: 252453 MFC after: 1 week
2021-01-09netmap: refactor netmap_resetVincenzo Maffione
The netmap_reset() function is meant to be called by the driver when they initialize (or re-initialize) a hardware ring. However, since the introduction of support for opening (in netmap mode) a subset of the available rings, netmap_reset() may be called multiple times on actively used rings, causing both kring and netmap ring to transition to an inconsistent state. This changes improves the situation by resetting all the indices fields of the kring to 0, as expected after the reinitialization of a hardware ring. PR: 252518 MFC after: 1 week
2021-01-09netmap: iflib: stop krings during interface resetVincenzo Maffione
When different processes open separate subsets of the available rings of a same netmap interface, a device reset may be performed while one of the processes is actively using some rings (e.g., caused by another process executing a nmport_open()). With this patch, such situation will cause the active process to get a POLLERR, so that it can have a chance to detect the situation. We also guarantee that no process is running a txsync or rxsync (ioctl or poll) while an iflib device reset is in progress. PR: 252453 MFC after: 1 week
2020-08-24netmap: use FreeBSD guards for epoch callsVincenzo Maffione
EPOCH calls are FreeBSD specific. Use guards to protect these, so that the code can compile under Linux. MFC after: 1 week Notes: svn path=/head/; revision=364731
2020-02-26Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)Pawel Biernacki
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718 Notes: svn path=/head/; revision=358333
2020-01-23In netmap() call ether_input() within the network epoch.Gleb Smirnoff
Notes: svn path=/head/; revision=357008
2019-10-20netmap: minor misc improvementsVincenzo Maffione
- use ring->head rather than ring->cur in lb(8) - use strlcat() rather than strncat() - fix bandwidth computation in pkt-gen(8) MFC after: 1 week Notes: svn path=/head/; revision=353775
2019-09-01netmap: import changes from upstream (SHA 137f537eae513)Vincenzo Maffione
- Rework option processing. - Use larger integers for memory size values in the memory management code. MFC after: 2 weeks Notes: svn path=/head/; revision=351657
2019-03-18netmap: add support for multiple host ringsVincenzo Maffione
Some applications forward from/to host rings most or all the traffic received or sent on a physical interface. In this cases it is desirable to have more than a pair of RX/TX host rings, and use multiple threads to speed up forwarding. This change adds support for multiple host rings. On registering a netmap port, the user can specify the number of desired receive and transmit host rings in the nr_host_tx_rings and nr_host_rx_rings fields of the nmreq_register structure. MFC after: 2 weeks Notes: svn path=/head/; revision=345269
2019-02-07netmap: revert netmap_attach_ext() to pre-r343772Vincenzo Maffione
Reported by: marius MFC after: 1 week Notes: svn path=/head/; revision=343867
2019-02-05netmap: refactor logging macros and pipesVincenzo Maffione
Changelist: - Replace ND, D and RD macros with nm_prdis, nm_prinf, nm_prerr and nm_prlim, to avoid possible naming conflicts. - Add netmap_krings_mode_commit() helper function and use that to reduce code duplication. - Refactor pipes control code to export some functions that can be reused by the veth driver (on Linux) and epair(4). - Add check to reject API requests with version less than 11. - Small code refactoring for the null adapter. MFC after: 1 week Notes: svn path=/head/; revision=343772
2019-02-02netmap: upgrade sync-kloop supportVincenzo Maffione
Add SYNC_KLOOP_MODE option, and add support for direct mode, where application executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback. MFC after: 5 days Notes: svn path=/head/; revision=343689
2019-01-30netmap: fix lock order reversal related to kqueue usageVincenzo Maffione
When using poll(), select() or kevent() on netmap file descriptors, netmap executes the equivalent of NIOCTXSYNC and NIOCRXSYNC commands, before collecting the events that are ready. In other words, the poll/kevent callback has side effects. This is done to avoid the overhead of two system call per iteration (e.g., poll() + ioctl(NIOC*XSYNC)). When the kqueue subsystem invokes the kqueue(9) f_event callback (netmap_knrw), it holds the lock of the struct knlist object associated to the netmap port (the lock is provided at initialization, by calling knlist_init_mtx). However, netmap_knrw() may need to wake up another netmap port (or even the same one), which means that it may need to call knote(). Since knote() needs the lock of the struct knlist object associated to the to-be-wake-up netmap port, it is possible to have a lock order reversal problem (AB/BA deadlock). This change prevents the deadlock by executing the knote() call in a per-selinfo taskqueue, where it is possible to hold a mutex. Reviewed by: aleksandr.fedorov_itglobal.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18956 Notes: svn path=/head/; revision=343579
2019-01-23netmap: fix knote() argument to match the mutex stateVincenzo Maffione
The nm_os_selwakeup function needs to call knote() to wake up kqueue(9) users. However, this function can be called from different code paths, with different lock requirements. This patch fixes the knote() call argument to match the relavant lock state. Also, comments have been updated to reflect current code. PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219846 Reported by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18876 Notes: svn path=/head/; revision=343344
2018-12-22netmap: fix txsync check in netmap pollVincenzo Maffione
To check if txsync can be skipped, it is necessary to look for unseen TX space. However, this means comparing ring->cur against ring->tail, rather than ring->head against ring->tail (like nm_ring_empty() does). This change also adds some more comments to explain the optimization performed at the beginning of netmap_poll(). MFC after: 3 days Sponsored by: Sunny Valley Networks Notes: svn path=/head/; revision=342369
2018-12-22netmap: fix bug in netmap_poll() optimizationVincenzo Maffione
The bug was introduced by r339639, although it is present in the upstream netmap code since 2015. It is due to resetting the want_rx variable to POLLIN, rather than resetting it to POLLIN|POLLRDNORM. It only affects select(), which uses POLLRDNORM. poll() is not affected, because it uses POLLIN. Also, it only affects FreeBSD, because Linux skips the optimization implemented by the piece of code where the bug occurs. MFC after: 3 days Sponsored by: Sunny Valley Networks Notes: svn path=/head/; revision=342368
2018-12-21netmap: move buf_size validation code to its own functionVincenzo Maffione
This code validates the netmap buf_size against the interface MTU and maximum descriptor size, to make sure the values are consistent. Moving this functionality to its own function is needed because this function is also called by Linux-specific code. MFC after: 3 days Notes: svn path=/head/; revision=342300
2018-12-06netmap: netmap_transmit should honor bpf packet tap hookVincenzo Maffione
This allows tcpdump to capture outbound kernel packets while in netmap mode Submitted by: Marc de la Gueronniere <mdelagueronniere@verisign.com> Reviewed by: vmaffione MFC after: 1 week Sponsored by: Verisign, Inc. Differential Revision: https://reviews.freebsd.org/D17896 Notes: svn path=/head/; revision=341624
2018-12-05netmap: align codebase to the current upstream (760279cfb2730a585)Vincenzo Maffione
Changelist: - Replace netmap passthrough host support with a more general mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop. No kernel threads are used to use this feature: the application is required to spawn a thread (or a process) and issue a SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The kernel loop is executed by the ioctl implementation, which returns to userspace only when a different thread calls SYNC_KLOOP_STOP or the netmap file descriptor is closed. - Update the if_ptnet driver to cope with the new data structures, and prune all the obsolete ptnetmap code. - Add support for "null" netmap ports, useful to allocate netmap_if, netmap_ring and netmap buffers to be used by specialized applications (e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect. - Various fixes and code refactoring. Sponsored by: Sunny Valley Networks Differential Revision: https://reviews.freebsd.org/D18015 Notes: svn path=/head/; revision=341516
2018-10-23netmap: align codebase to the current upstream (sha 8374e1a7e6941)Vincenzo Maffione
Changelist: - Move large parts of VALE code to a new file and header netmap_bdg.[ch]. This is useful to reuse the code within upcoming projects. - Improvements and bug fixes to pipes and monitors. - Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to handle differences between FreeBSD and Linux. - Introduce some new helper functions to handle more host rings and fake rings (netmap_all_rings(), netmap_real_rings(), ...) - Added new sysctl to enable/disable hw checksum in emulated netmap mode. - nm_inject: add support for NS_MOREFRAG Approved by: gnn (mentor) Differential Revision: https://reviews.freebsd.org/D17364 Notes: svn path=/head/; revision=339639
2018-05-18netmap: pull fix for 32-bit support from upstreamMatt Macy
Approved by: sbruno Notes: svn path=/head/; revision=333778
2018-04-12netmap: align codebase to the current upstream (commit id 3fb001303718146)Vincenzo Maffione
Changelist: - Turn tx_rings and rx_rings arrays into arrays of pointers to kring structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib, vtnet and ptnet drivers to cope with the change. - Generalize the nm_config() callback to accept a struct containing many parameters. - Introduce NKR_FAKERING to support buffers sharing (used for netmap pipes) - Improved API for external VALE modules. - Various bug fixes and improvements to the netmap memory allocator, including support for externally (userspace) allocated memory. - Refactoring of netmap pipes: now linked rings share the same netmap buffers, with a separate set of kring pointers (rhead, rcur, rtail). Buffer swapping does not need to happen anymore. - Large refactoring of the control API towards an extensible solution; the goal is to allow the addition of more commands and extension of existing ones (with new options) without the need of hacks or the risk of running out of configuration space. A new NIOCCTRL ioctl has been added to handle all the requests of the new control API, which cover all the functionalities so far supported. The netmap API bumps from 11 to 12 with this patch. Full backward compatibility is provided for the old control command (NIOCREGIF), by means of a new netmap_legacy module. Many parts of the old netmap.h header has now been moved to netmap_legacy.h (included by netmap.h). Approved by: hrs (mentor) Notes: svn path=/head/; revision=332423
2018-04-09netmap: align codebase to upstream version v11.4Vincenzo Maffione
Changelist: - remove unused nkr_slot_flags - new nm_intr adapter callback to enable/disable interrupts - remove unused sysctls and document the other sysctls - new infrastructure to support NS_MOREFRAG for NIC ports - support for external memory allocator (for now linux-only), including linux-specific changes in common headers - optimizations within netmap pipes datapath - improvements on VALE control API - new nm_parse() helper function in netmap_user.h - various bug fixes and code clean up Approved by: hrs (mentor) Notes: svn path=/head/; revision=332319
2017-11-27sys/dev: further adoption of SPDX licensing ID tags.Pedro F. Giffuni
Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Notes: svn path=/head/; revision=326255
2017-06-12Update the current version of netmap to bring it in sync with the githubLuiz Otavio O Souza
version. This commit contains mostly refactoring, a few fixes and minor added functionality. Submitted by: Vincenzo Maffione <v.maffione at gmail.com> Requested by: many Sponsored by: Rubicon Communications, LLC (Netgate) Notes: svn path=/head/; revision=319881
2016-10-27Various fixes for ptnet/ptnetmap (passthrough of netmap ports). In detail:Luigi Rizzo
- use PCI_VENDOR and PCI_DEVICE ids from a publicly allocated range (thanks to RedHat) - export memory pool information through PCI registers - improve mechanism for configuring passthrough on different hypervisors Code is from Vincenzo Maffione as a follow up to his GSOC work. Notes: svn path=/head/; revision=308000
2016-10-18remove stale and unused code from various filesLuigi Rizzo
fix build on 32 bit platforms simplify logic in netmap_virt.h The commands (in net/netmap.h) to configure communication with the hypervisor may be revised soon. At the moment they are unused so this will not be a change of API. Notes: svn path=/head/; revision=307574
2016-10-16Import the current version of netmap, aligned with the one on github.Luigi Rizzo
This commit, long overdue, contains contributions in the last 2 years from Stefano Garzarella, Giuseppe Lettieri, Vincenzo Maffione, including: + fixes on monitor ports + the 'ptnet' virtual device driver, and ptnetmap backend, for high speed virtual passthrough on VMs (bhyve fixes in an upcoming commit) + improved emulated netmap mode + more robust error handling + removal of stale code + various fixes to code and documentation (some mixup between RX and TX parameters, and private and public variables) We also include an additional tool, nmreplay, which is functionally equivalent to tcpreplay but operating on netmap ports. Notes: svn path=/head/; revision=307394