linux.git/drivers/block/drbd, branch v3.16-rc2

drbd: use list_first_entry_or_null in first_peer_device/first_connection

2014-04-30T19:46:56+00:00

If there are no peer_devices or connections, I'd rather have NULL
than some "arbitrary" address pretending to point to a struct.

This helps to avoid hard-to-debug symptoms in case we ever try to
dereference a drbd_connection or drbd_peer_device
where we in fact don't have any connection at all.

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe
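
The difference between the two accessors can be sketched in userspace C.
This is only an illustration, not the DRBD code: the list helpers below
are simplified stand-ins for the kernel's <linux/list.h>, and struct
device, struct peer_device and list_add_tail_entry() are hypothetical
names made up for this example.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Hypothetical helper: append item at the tail of the list. */
static void list_add_tail_entry(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Member pointer -> pointer to the enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Never yields NULL: on an empty list, head->next is the head itself, so
 * the result is an address derived from the list head, not a real entry.
 */
#define list_first_entry(head, type, member) \
    container_of((head)->next, type, member)

/* Yields NULL on an empty list, so a mistaken dereference fails loudly. */
#define list_first_entry_or_null(head, type, member) \
    ((head)->next != (head) ? list_first_entry(head, type, member) : NULL)

/* Hypothetical, heavily simplified stand-ins for the DRBD structures. */
struct peer_device {
    int id;
    struct list_head peer_devices;   /* link in device->peer_devices */
};

struct device {
    struct list_head peer_devices;   /* list of struct peer_device */
};

static struct peer_device *first_peer_device(struct device *device)
{
    return list_first_entry_or_null(&device->peer_devices,
                                    struct peer_device, peer_devices);
}
```

With an empty list, first_peer_device() in this sketch returns NULL
instead of a pointer computed from the list head.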

drbd: Allow attaching of a newly created device to any backing device

2014-04-30T19:46:56+00:00

A newly created device was never exposed before, i.e. it has an
exposed_data_uuid of 0. It is then valid to attach it to a backing device
with any current_uuid (of course also to a newly created one (4)).

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe
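
The rule can be sketched as a small predicate. This is a simplified
illustration under assumptions, not the check from the driver:
attach_allowed() is a hypothetical name, and the plain equality test for
the "already exposed" case is an assumption, since the commit message
does not spell out how the UUIDs are compared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value the commit message gives for a newly created backing device. */
#define UUID_JUST_CREATED ((uint64_t)4)

/*
 * Hypothetical predicate: may a device that exposed exposed_data_uuid
 * attach to a backing device whose current UUID is backing_current_uuid?
 */
static bool attach_allowed(uint64_t exposed_data_uuid,
                           uint64_t backing_current_uuid)
{
    /* Never exposed any data: no backing device can contradict it. */
    if (exposed_data_uuid == 0)
        return true;

    /* Assumption: otherwise the data generations have to match exactly. */
    return exposed_data_uuid == backing_current_uuid;
}
```

In this sketch a device with an exposed_data_uuid of 0 may attach to any
backing device, including one whose current UUID is UUID_JUST_CREATED.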

drbd: Test cstate while holding req_lock

2014-04-30T19:46:56+00:00

In case a connection transitions into C_TIMEOUT within the timer
function (request_timer_fn()) we need to make sure that the receiver
thread (potentially running on a different CPU) sees the updated
cstate later on.

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe

drbd: use blk_set_stacking_limits()

2014-04-30T19:46:55+00:00

...instead of directly assigning to q->limits.discard_zeroes_data

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe

drbd: evaluate disk and network timeout on different requests

2014-04-30T19:46:55+00:00

Just because it is the oldest not yet completed request
does not make it the oldest request waiting for disk.
Or waiting for the peer.

And we completely missed already completed requests
that would still hold references to activity log extents,
waiting only for the barrier ack.

Find the two oldest not yet completely processed requests,
one that is still waiting for local completion,
and one that is still waiting for some response from the peer.
These may or may not be the same request object.

Then separately apply the network and disk timeouts, respectively.

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe
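
The search described above can be sketched as follows. This is a
minimal userspace model, not the code from request_timer_fn(): struct
request, struct oldest, find_oldest() and timed_out() are hypothetical
names, and the request array is assumed to be ordered oldest first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request; real DRBD requests carry more state. */
struct request {
    unsigned long start_time;  /* tick at which the request was submitted */
    bool pending_local;        /* still waiting for local disk completion */
    bool pending_peer;         /* still waiting for a response from the peer */
};

struct oldest {
    const struct request *waiting_for_disk;
    const struct request *waiting_for_peer;
};

/* reqs[] is assumed to be ordered oldest first. */
static struct oldest find_oldest(const struct request *reqs, size_t n)
{
    struct oldest o = { NULL, NULL };

    for (size_t i = 0; i < n; i++) {
        if (!o.waiting_for_disk && reqs[i].pending_local)
            o.waiting_for_disk = &reqs[i];
        if (!o.waiting_for_peer && reqs[i].pending_peer)
            o.waiting_for_peer = &reqs[i];
        if (o.waiting_for_disk && o.waiting_for_peer)
            break;  /* both found, possibly the same request object */
    }
    return o;
}

/* A missing request (NULL) never times out. */
static bool timed_out(const struct request *req, unsigned long now,
                      unsigned long timeout)
{
    return req && now - req->start_time >= timeout;
}
```

A caller would then evaluate timed_out(o.waiting_for_disk, now,
disk_timeout) and timed_out(o.waiting_for_peer, now, net_timeout)
independently, where disk_timeout and net_timeout stand for the two
configured limits.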

drbd: Fix a hole in the challenge-response connection authentication

2014-04-30T19:46:55+00:00

In the implementation as it was, the two peers sent each other
a challenge, and each expected the challenge hashed with the shared
secret back.

An attacker could simply wait for the challenge of the peer, and
send the same challenge back. Then it waits for the response, and
sends the same response back.

Prevent this by not accepting a challenge from the peer that is
the same as the challenge sent to the peer.

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe
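
The added check amounts to a byte comparison of the two challenges. The
following is a sketch under assumptions, not the driver code:
peer_challenge_acceptable() is a hypothetical name, and the fixed
CHALLENGE_LEN of 64 bytes is assumed for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define CHALLENGE_LEN 64  /* assumed length, for illustration only */

/*
 * Reject a peer challenge that is byte-for-byte identical to the one we
 * sent: answering it would hand the peer exactly the response it needs
 * in order to answer our own challenge (a reflection attack).
 */
static bool peer_challenge_acceptable(const unsigned char *sent,
                                      const unsigned char *received)
{
    return memcmp(sent, received, CHALLENGE_LEN) != 0;
}
```

With this check in place, an attacker who merely echoes our challenge
back is refused before any response is computed.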

drbd: always implicitly close last epoch when idle

2014-04-30T19:46:55+00:00

Once our sender thread needs to wait_for_work(),
and actually needs to schedule(), just before we do that,
we already check if it is useful to implicitly close the last epoch.

The condition was too strict: only implicitly close the epoch,
if there have been no new (write) requests at all.

The assumption was that if there were new requests, they would
always be communicated one way or another, and would send necessary
epoch separating barriers explicitly.

This is not always true, e.g. when becoming diskless,
or while explicitly starting a full resync.

The last communicated epoch could stay open for a long time,
locking down corresponding activity log extents.

It is safe to always implicitly send that last barrier, as soon as we
determine that there cannot be more requests in the last communicated
epoch, even if there have been (uncommunicated) new requests in new
epochs meanwhile.

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe

drbd: add back some fairness to AL transactions

2014-04-30T19:46:55+00:00

When batching more updates to the activity log into single transactions,
we lost the ability for new requests to force themselves into the active
set: all preparation steps became non-blocking, and if all currently
hot extents keep busy, they could starve out new incoming requests
to cold extents for quite a while.

This can only happen if your IO backend accepts more IO operations per
average DRBD replication round trip time than you have al-extents
configured.

If we have incoming requests to cold extents,
at least do one blocking update per transaction.

In an artificial worst-case workload on SSD with an asynchronous 600 ms
replication link, with al-extents = 7 (the minimum we allow), and a
concurrent full resync, without this patch, some write requests have
been observed to be starved for 40 seconds.
With this patch, the application observed a worst-case latency of twice
the replication round trip time.

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe

drbd: keep max-bio size during detach/attach on disconnected primary

2014-04-30T19:46:55+00:00

We want to store in persistent meta data what the peer DRBD can handle,
which, due to spreading requests to multiple bios,
may be more than its backing device can handle.

Otherwise, if a disconnected Primary temporarily loses access to its local
data as well, we may accidentally shrink the max-bio setting, potentially
causing already assembled, but not yet processed, application bios to be
spuriously failed due to device limits.

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe

drbd: fix a race between start_resync and send_and_submit

2014-04-30T19:46:55+00:00

In the drbd make request function, specifically in
drbd_send_and_submit(), we decide whether we want to send the actual
write request, or only a "set this block out of sync" notification.

We do so based on the current connection state, while holding the req_lock.
The connection state is not supposed to change while holding the req_lock.

But in drbd_start_resync, we did change that state anyway,
while holding only the global_state_lock, which is enough to change
sync-after dependencies (paused vs active resync), but
not good enough to change the connection state.

Fix: in drbd_start_resync, first grab the req_lock to serialize with
drbd_send_and_submit(), before grabbing the global_state_lock
to be able to evaluate the sync-after dependencies.

Signed-off-by: Philipp Reisner 
Signed-off-by: Lars Ellenberg 
Signed-off-by: Jens Axboe