| Age | Commit message (Collapse) | Author |
|
Differential Revision: https://reviews.freebsd.org/D53098
Reviewed by: tuexen
Sponsored by: Netflix
|
|
Just like vmxnet3_intr_disable_all, iflib may invoke this routine
before vmxnet3_attach_post() has run, which is before the top-level
shared data area is initialized and the device made aware of it.
MFC after: 1 week
Sponsored by: Dell Inc.
|
|
This driver prefers connection speed over sas port speed. On stable/13
branch the stack-allocated CCB is not cleared thus the cam layer may
report weird speed on boot.
```
da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
da0: 4294967.295MB/s transfers
```
-current and stable/14 have the change [1] which clears stack-allocated
CCB thus are not affected, but I want -current and stable/14 to have this
fix in to reduce drift between branches.
1. ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D48438
|
|
As of 9e6544dd6e02c46b805d11ab925c4f3b18ad7a4b contigfree(9) is no longer
needed and should not be used anymore. We leave a wrapper for 3rd party
code in at least 15.x but remove (almost) all other cases from the tree.
This leaves one use of contigfree(9) untouched; that was the original
trigger for 9e6544dd6e02 and is handled in D45813 (to be committed
seperately later).
Sponsored by: The FreeBSD Foundation
Reviewed by: markj, kib
Tested by: pho (10h stress test run)
Differential Revision: https://reviews.freebsd.org/D46099
|
|
When we update credits there is a potential for a race causing an
overflow of vxcr_next (i.e. incrementing it past vxcr_ndesc). Change the
check to >= rather than == to be more robust against this.
Reviewed by: emaste
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43712
|
|
- s/withing/within/
MFC after: 3 days
|
|
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
|
|
At least one user reports issues with vmx interfaces after 725e4008ef,
where we default to not resetting the interface on VLAN changes. This
was on an ESXi 7.0.3 setup.
Reported by: Marcos Mendoza <mmendoza@netgate.com>
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
|
In rS360398, a new iflib device method was added with default of opt out
for VLAN events needing an interface reset.
This re-init is unintentional for vmxnet3(4).
MFC after: 2 weeks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D41558
|
|
Since d49e83eac3baf16a22b1c5d42e8438b68b17e6f9, iflib(9) is ready
for this change.
While at it, make isc_driver_version strings (static) const where
not apparently un-const on purpose, too.
This reduces the size of the amd64 GENERIC by about 10 KiB.
|
|
Remove /^\s*\$FreeBSD\$$\n/
|
|
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
|
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
|
Fix the number of targets we inquiry to be one less than the maximum
number of targets adapter reports. This gets rid of the errors reported
on VMware Workstation:
(probe36:pvscsi0:0:65:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe36:pvscsi0:0:65:0): CAM status: CCB request completed with an error
While here, print the maximum number of targets.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D39867
|
|
Summary:
Convert iflib(4) and the following drivers:
* axgbe
* em
* ice
* ixl
* vmxnet
Sponsored by: Juniper Networks, Inc.
Reviewed by: kbowling, #iflib
Differential Revision: https://reviews.freebsd.org/D37768
|
|
vcpuHint has been expanded to 16 bit on host side to enable
interruptions to be routed to more CPUs. Guest side should align with
the change.
This change has been tested with hosts with 8-bit and 16-bit vcpuHint,
on both platforms host side can get correct value.
This driver is for ESXi product which only supports x86/x64. They are
little-endian. So there is no need to consider big-endian system.
PR: 264840
Reviewed by: imp@, Zhenlei Huang
|
|
- s/occured/occurred/
MFC after: 3 days
|
|
|
|
|
|
This variable is unsed in a debug trace conditional on a private
debugging macro (so not eligible for __diagused).
|
|
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
|
Passing up such descriptors to iflib is obviously wasteful.
But the main conern is that we may overrun iri_frags array because of
them. That's been observed in practice.
Also, assert that the number of fragments / descriptors / segments is
less than IFLIB_MAX_RX_SEGS.
Reviewed by: gallatin, pkelsey
MFC after: 3 weeks
Sponsored by: Panzura LLC
Differential Revision: https://reviews.freebsd.org/D33189
|
|
No functional change intended.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
|
|
Summary:
An error mapping PCI resources results in a panic due to unallocated
resources being freed up. This change puts the appropriate checks in
place to prevent the panic.
PR: 252445
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
Tested by: marcus
MFC after: 1 week
Sponsored by: VMware
Test Plan:
Along with user testing, also simulated error by inserting a ENXIO
return in vmci_map_bars().
Reviewed by: marcus
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D32016
|
|
We have no space there for such a long strings. Also it makes no
sense to set description if there is only one interrupt.
MFC after: 2 weeks
|
|
xpt_bus_register and xpt_bus_deregister returns a hybrid error that's
neither a cam_status, nor an errno, but a mix of both. Update
xpt_bus_register and xpt_bus_deregister to return an errno. The vast
majority of current users compare against zero, which can also be
spelled CAM_SUCCESS. Nobody uses CAM_FAILURE, so remove that symbol
to prevent comfusion (nothing returns it either).
Where the return value is saved, ensure that the variable 'error' is
used to store an errno and 'status' is used to store a cam_status where
it makes the code clearer (usually just in functions that already mix
and match). Where the return value isn't used at all, avoid storing it
at all.
Reviewed by: scottl@, mav@ (earlier version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30860
|
|
While the PV SCSI SG list can handle 512k of SG entries, it can only do
so for I/O that's aligned to 4k or better. newfs_msdos does unaligned
I/O, so triggers too long for host errors in cam when a 512k I/O is
attempted. Prefer power of 2 256k to the absolute maximum 508k, though
that can be revisited should the latter show to give significant
performance improvement.
MFC After: 3 days
Tested by: darius on discord (508k version of patch)
Sponsored by: Netflix
|
|
Doing a 'dd' over iscsi will reliably cause stalls. Tx
cleaning _should_ reliably happen as data is sent.
However, currently if the transmit queue fills it will
wait until the iflib timer (hz/2) runs.
This change causes the the tx taskq thread to be run
if there are completed descriptors.
While here:
- make timer interrupt delay a sysctl
- simplify txd_db_check handling
- comment on INTR types
Background on the change:
Initially doorbell updates were minimized by only writing to the register
on every fourth packet. If txq_drain would return without writing to the
doorbell it scheduled a callout on the next tick to do the doorbell write
to ensure that the write otherwise happened "soon". At that time a sysctl
was added for users to avoid the potential added latency by simply writing
to the doorbell register on every packet. This worked perfectly well for
e1000 and ixgbe ... and appeared to work well on ixl. However, as it
turned out there was a race to this approach that would lockup the ixl MAC.
It was possible for a lower producer index to be written after a higher one.
On e1000 and ixgbe this was harmless - on ixl it was fatal. My initial
response was to add a lock around doorbell writes - fixing the problem but
adding an unacceptable amount of lock contention.
The next iteration was to use transmit interrupts to drive delayed doorbell
writes. If there were no packets in the queue all doorbell writes would be
immediate as the queue started to fill up we could delay doorbell writes
further and further. At the start of drain if we've cleaned any packets we
know we've moved the state machine along and we write the doorbell (an
obvious missing optimization was to skip that doorbell write if db_pending
is zero). This change required that tx interrupts be scheduled periodically
as opposed to just when the hardware txq was full. However, that just leads
to our next problem.
Initially dedicated msix vectors were used for both tx and rx. However, it
was often possible to use up all available vectors before we set up all the
queues we wanted. By having rx and tx share a vector for a given queue we
could halve the number of vectors used by a given configuration. The problem
here is that with this change only e1000 passed the necessary value to have
the fast interrupt drive tx when appropriate.
Reported by: mav@
Tested by: mav@
Reviewed by: gallatin@
MFC after: 1 month
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D27683
|
|
Notes:
svn path=/head/; revision=365089
|
|
The pidx argument of isc_rxd_flush() indicates which is the last valid
receive descriptor to be used by the NIC. However, current code has
multiple issues:
- Intel drivers write pidx to their RDT register, which means that
NICs will only use the descriptors up to pidx-1 (modulo ring size N),
and won't actually use the one pointed by pidx. This does not break
reception, but it is anyway confusing and suboptimal (the NIC will
actually see only N-2 descriptors as available, rather than N-1).
Other drivers (if_vmx, if_bnxt, if_mgb) adhere to this semantic).
- The semantic used by Intel (RDT is one descriptor past the last
valid one) is used by most (if not all) NICs, and it is also used
on the TX side (also in iflib). Since iflib is not currently
using this semantic for RX, it must decrement fl->ifl_pidx
(modulo N) before calling isc_rxd_flush(), and then the
per-driver callback implementation must increment the index
again (to match the real semantic). This is confusing and suboptimal.
- The iflib refill function is also called at initialization.
However, in case the ring size is smaller than 128 (e.g. if_mgb),
the refill function will actually prepare all the receive
descriptors (N), without leaving one unused, as most of NICs assume
(e.g. to avoid RDT to overrun RDH). I can speculate that the code
looks like this right now because this issue showed up during
testing (e.g. with if_mgb), and it was easy to workaround by
decrementing pidx before isc_rxd_flush().
The goal of this change is to simplify the code (removing a bunch
of instructions from the RX fast path), and to make the semantic of
isc_rxd_flush() consistent across drivers. To achieve this, we:
- change the semantics of the pidx argument to the usual one (that
is the index one past the last valid one), so that both iflib and
drivers avoid the decrement/increment dance.
- fix the initialization code to prepare at most N-1 descriptors.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26191
Notes:
svn path=/head/; revision=365061
|
|
When vmx(4) was converted to an iflib driver in r343291, the
power-of-2 queue count constraint was removed as it appeared that
current implementations of the VMXNET3 virtual device no longer
required that constraint. It turns out that some of the
implementations still do, and on such systems, the device will fail to
initialize when configured with a non-power-of-2 RX or TX queue count.
PR: 237321
Reported by: ncrogers@gmail.com
MFC after: 1 week
Notes:
svn path=/head/; revision=359029
|
|
These adjustments improve performance with jumbo frames and/or LRO
enabled (i.e., when there may be multiple descriptors per packet) by
increasing the default size of the receive queues and by always using
page-sized buffers for the body type receive ring.
This patch also adjust the initialization of the max frame size to
remove cases where certain configuration sequences would result in 2K
receive buffers being used instead of 4K ones when jumbo frames were
enabled.
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23950
Notes:
svn path=/head/; revision=359001
|
|
skipping receive descriptors
This fixes a bug where the checksum offload status of received packets
was being taken from the first descriptor instead of the last, which
affected LRO packets.
The driver has been hardened against the device skipping receive
descriptors, although it is not believed that this can occur given the
way this implementation configures the receive rings.
Additionally, for packets received with the error indicator set, the
driver now forces the length of all fragments in that packet to zero
prior to passing it to iflib. Such packets should wind up being
discarded at some point in the stack anyway, but this removes any
questions by killing them in the driver.
Counters have been added (and exposed via sysctls) for skipped receive
descriptors, zero-length packets received, and packets received with
the error indicator set so that these conditions can be easily
observed in the field.
PR: 243126, 243392, 240628
Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23949
Notes:
svn path=/head/; revision=359000
|
|
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Notes:
svn path=/head/; revision=358333
|
|
We observe at least one problem: if a UDP socket is connect(2)-ed, then a
received packet that matches the connection cannot be matched to the
corresponding PCB because of an incorrect flow ID. That was oberved for DNS
requests from the libc resolver. We got this problem because FreeBSD
r343291 enabled code that can set rsstype of received packets to values
other than M_HASHTYPE_OPAQUE_HASH. Earlier that code was under 'ifdef
notyet'.
The essence of this change is to use the system-wide RSS key instead of
some historic hardcoded key when the software RSS is enabled and it is
configured to use Toeplitz algorithm (the default).
In all other cases, the driver reports the opaque hash type for received
packets while still using Toeplitz algorithm with the internal key.
PR: 242890
Reviewed by: pkelsey
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D23147
Notes:
svn path=/head/; revision=357042
|
|
Fix a mistake introduced by r343291, which ported the vmx(4)
driver to iflib.
In case of TSO, the hlen field of the (first) tx descriptor must
be initialized to the cumulative length of Ethernet, IP and TCP
headers. The length of the TCP header was missing.
PR: 236999
Reported by: pkelsey
Reviewed by: avg
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22967
Notes:
svn path=/head/; revision=356703
|
|
Fix suggested by: jhb, scottl
Sponsored by: Panzura
Notes:
svn path=/head/; revision=354716
|
|
This driver allows to usage of the paravirt SCSI controller
in VMware products like ESXi. The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.
Error handling in this driver has not been extensively tested
yet.
Submitted by: vbhakta@vmware.com
Relnotes: yes
Sponsored by: VMware, Panzura
Differential Revision: D18613
Notes:
svn path=/head/; revision=354715
|
|
Notes:
svn path=/head/; revision=353803
|
|
As of r347221 the iflib legacy interrupt mode setup assumes that drivers
perform both receive and transmit processing from the interrupt handler.
This assumption is invalid in the vmxnet3 driver, so introduce the
IFLIB_SINGLE_IRQ_RX_ONLY flag to make iflib avoid tx processing in the
interrupt handler.
PR: 239118
Reported and tested by: Juraj Lutter <otis@sk.freebsd.org>
Obtained from: marius
Reviewed by: gallatin
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21831
Notes:
svn path=/head/; revision=352906
|
|
kernel module automatically when FreeBSD is running on VMware.
Reviewed by: mp
Differential Revision: https://reviews.freebsd.org/D21182
Notes:
svn path=/head/; revision=351482
|
|
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
Notes:
svn path=/head/; revision=347984
|
|
When in MSI mode, the device was only being configured with one
interrupt index, but it needs two - one for the actual interrupt and
one to park the tx queue at.
Also clarified comments relating to interrupt index assignment.
Reported by: Yuri Pankov <yuripv@yuripv.net>
MFC after: 1 day
Notes:
svn path=/head/; revision=343688
|
|
bus_teardown_intr(9) before pci_release_msi(9).
- Ensure that iflib(4) and associated drivers pass correct RIDs to
bus_release_resource(9) by obtaining the RIDs via rman_get_rid(9)
on the corresponding resources instead of using the RIDs initially
passed to bus_alloc_resource_any(9) as the latter function may
change those RIDs. Solely em(4) for the ioport resource (but not
others) and bnxt(4) were using the correct RIDs by caching the ones
returned by bus_alloc_resource_any(9).
- Change the logic of iflib_msix_init() around to only map the MSI-X
BAR if MSI-X is actually supported, i. e. pci_msix_count(9) returns
> 0. Otherwise the "Unable to map MSIX table " message triggers for
devices that simply don't support MSI-X and the user may think that
something is wrong while in fact everything works as expected.
- Put some (mostly redundant) debug messages emitted by iflib(4)
and em(4) during attachment under bootverbose. The non-verbose
output of em(4) seen during attachment now is close to the one
prior to the conversion to iflib(4).
- Replace various variants of spelling "MSI-X" (several in messages)
with "MSI-X" as used in the PCI specifications.
- Remove some trailing whitespace from messages emitted by iflib(4)
and change them to consistently start with uppercase.
- Remove some obsolete comments about releasing interrupts from
drivers and correct a few others.
Reviewed by: erj, Jacob Keller, shurd
Differential Revision: https://reviews.freebsd.org/D18980
Notes:
svn path=/head/; revision=343578
|
|
Also, expose IFLIB_MAX_RX_SEGS to iflib drivers and add
iflib_dma_alloc_align() to the iflib API.
Performance is generally better with the tunable/sysctl
dev.vmx.<index>.iflib.tx_abdicate=1.
Reviewed by: shurd
MFC after: 1 week
Relnotes: yes
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D18761
Notes:
svn path=/head/; revision=343291
|
|
Run on LLNW canaries and tested by pho@
gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.
When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:
InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32
4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32
4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32
4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32
4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32
4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32
4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32
After the patch
InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51
5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51
5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51
5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51
5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52
5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52
Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch
Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366
Notes:
svn path=/head/; revision=333813
|
|
and VMware legal:
- Add a dual BSD-2 Clause/GPLv2 LICENSE file in the VMCI directory
- Remove the use of "All Rights Reserved"
- Per best practice, remove copyright/license info from Makefile
Reviewed by: imp, emaste, jhb, Vishnu Dasa <vdasa@vmware.com>
Approved by: VMware legal via Mark Peek <markpeek@vmware.com>
Differential Revision: https://reviews.freebsd.org/D14979
Notes:
svn path=/head/; revision=332263
|
|
Approved by: Vishnu Dasa <vdasa@vmware.com>
Notes:
svn path=/head/; revision=331609
|
|
To fix the GCC build, remove multiple redundant declarations of
vmci_send_datagram() (the copy in vmci.h as well as the extern definition in
vmci_queue_pair.c were wholly redundant).
Also to fix the GCC build, include a non-empty format string in the vmci(4)
definition of ASSERT(). It seems harmless either way, but adding the
stringified invariant is easier than masking the warning.
The other vmci_kernel_defs.h changes are cosmetic and simply match macros to
existing definitions.
Reported by: GCC 6.4.0
Sponsored by: Dell EMC Isilon
Notes:
svn path=/head/; revision=331566
|
|
In a virtual machine, VMCI is exposed as a regular PCI device. The primary
communication mechanisms supported are a point-to-point bidirectional
transport based on a pair of memory-mapped queues, and asynchronous
notifications in the form of datagrams and doorbells. These features are
available to kernel level components such as vSockets through the VMCI
kernel API. In addition to this, the VMCI kernel API provides support for
receiving events related to the state of the VMCI communication channels,
and the virtual machine itself.
Submitted by: Vishnu Dasa <vdasa@vmware.com>
Reviewed by: bcr, imp
Obtained from: VMware
Differential Revision: https://reviews.freebsd.org/D14289
Notes:
svn path=/head/; revision=331510
|