Commit Graph

23 Commits

Author SHA1 Message Date
Cory Benfield be4ea8ac74 Add SocketOptionChannel for wider socket options. (#589)
Motivation:

A small number of socket options have values that do not fit into a C
int type. Our current ChannelOption based approach for setting these
simply does not work, and cannot be extended to support the truly arbitrary
types that the setsockopt/getsockopt functions allow here.

This makes it impossible to use some socket options, which is hardly a
good place to be.

There were a number of ways we could have addressed this: we could have
special-cased all socket options with non-integer types in ChannelOption,
but I believe that would be too manual, and risk limiting users that need
to set other socket options. We could also have added a ChannelOption
that simply allows users to pass a buffer to write into or read from,
but that is a very un-Swift-like API that feels pretty gross to hold.

Ultimately, the nicest seemed to be a new protocol users could check
for, and that would provide APIs that let users hold the correct concrete
type. As with setsockopt/getsockopt, while this API is typed it is
not type-safe: ultimately, the struct we have here is treated just as a
buffer by setsockopt/getsockopt. We do not attempt to prevent users from
shooting themselves in the foot here.

This PR does not include an example use case in any server, as I will
provide such an example in a subsequent multicast PR.

Modifications:

- Added a SocketOptionChannel protocol.
- Conformed BaseSocketChannel to SocketOptionChannel.
- Wrote some tests for this.

Result:

Users can set and get sockopts with data that is larger than a C int.
2018-09-18 13:40:38 +02:00
Johannes Weiss df764fedf0
don't crash when Channel goes inactive in read triggered by write error (#594)
Motivation:

Previously we asserted that a Channel cannot go inactive after calling
out for a read that was triggered by (draining the receive buffer after)
a write error.

This assertion was put in place to guard against `readComplete` events
sent on inactive channels. It did that job just fine but crashes aren't
great so we now conditionally fire the `readComplete` event if the
Channel stays active.

Modifications:

make the readComplete event firing conditional

Result:

fewer crashes, more happy faces
2018-08-27 11:49:58 +01:00
Johannes Weiss a9072729bb
output a hint if we get a reset connection w/o error set (#573)
Motivation:

We have one case in the code where we expect a socket error to be set
when we hit a connection that got the `.reset` event. On Linux that's
expected and therefore we just map this to `ECONNRESET`.
On other platforms however that is unexpected. Now I just found an
[Envoy commit](62f8e51bdd (diff-3315c7b2)    91f78a482ebd4cbfc0545362R345)
which suggests this might not work on FreeBSD. This should prompt our
users to let us know if we ever see this error on non-Linux systems.

Modifications:

Add a message that makes it clear this is unexpected and link our
ticket.

Result:

we should learn if this actually happens on macOS.
2018-08-16 15:13:02 +01:00
Johannes Weiss d32d5c6a50 add forgotten selfs (#574)
Motivation:

Swift code that doesn't use explicit `self.` is hard to read, even
harder to search for and quite bug prone. During my analysis I found
quite a number of them, this fixes the ones I came across.

Modifications:

Add a bunch of missing explicit `self.`s

Result:

code easier to read and less bug prone
2018-08-16 09:28:24 +01:00
Cory Benfield 695afc5205
Widen assertion on readEOF. (#453)
Motivation:

In Darwin it is possible for readEOF to be received before writable,
if we race connection teardown with registering for writable. See
https://github.com/apple/swift-nio/issues/452 for a more in-depth
explanation.

Modifications:

Widened the assertion.

Result:

People's code won't go bang.
2018-05-30 14:35:10 +01:00
Johannes Weiss ec880b0853 don't just drop Channels on the floor if registration fails
Motivation:

If registration of a Channel fails we need to close that Channel.
Previously we'd just drop it on the floor which triggers an assertion
that an open channel got leaked.

Modifications:

In cases where the Channel registration fails, we need to close that
channel.

Result:

- No crashes if registration fails.
- fixes #417
2018-05-22 11:29:20 +01:00
Johannes Weiß 2f55baf537 implement lazy selector registration (#388)
Motivation:

Previously we would eagerly register any fd with the selector
(epoll/kqueue) which works well with kqueue. With epoll however, you'll
get `EPOLLHUP` immediately if you supplied a socket that hasn't had
connect/bind called on it.
We got away with this because we took quite a bit of care to always make
sure we call connect/bind pretty much immediately after we registered
it.  And crucially before we ever asked the selector to tell us about
new events.

Modifications:

made selector registration lazy (when we try activating the channel by
calling connect/bind)

Result:

the user can now register and connect whenever they feel like it.
2018-05-04 19:09:59 +02:00
Johannes Weiß a5db2a6751
fix event loop hop between registration and activation of accepted channels (#389)
Motivation:

Once again, we had an extra event loop hop between a channel
registration and its activation. Usually this shows up as `EPOLLHUP` but
not so for accepted channels. What happened instead is that we had a
small race window after we accepted a channel. It was in a state where
it was not marked active _yet_ and therefore we'd not read out its data
in case we received a `.readEOF`. That usually leads to a stale
connection. Fortunately it doesn't happen very often that the client
connects, immediately sends some data and then shuts the write end of
the socket.

Modifications:

prevent the event loop hop between registration and activation

Result:

will always read out the read buffer on .readEOF
2018-05-04 15:48:08 +01:00
Johannes Weiß 5bebbf5565 don't crash if close is called when close fails (#387)
Motivation:

@vlm hit an interesting situation which is very likely the sign of
another bug that we have yet to find:

During a close, @vlm hit an error in close which lead to a user callout
which then closed again.

The immediate fix is simple (this PR) is as always not to call out to
the user before reconciling our own state. But hitting this bug also
means that either a `socket.close` or a `selectableEventLoop.deregister`
failed as we only called out in those situations. That definitely hides
a condition that is hard to imagine without another bug that we still
need to find.

Modifications:

in `BaseSocketChannel.close0` follow rule 0 and don't call out before
reconciling state.

Result:

fewer crashes, less potential to hide other bugs.
2018-05-03 17:35:49 +02:00
Johannes Weiß 86fce7fce8
don't nil out connectPromise prematurely and wait fully (#337)
Motivation:

We can't nil out the connectPromise before we fail it. This totally
slipped through because we never fully waited for it because of the
reasons below.

It's quite easy to misuse `eventLoop.submit` as if you do some
asynchronous operation (will return `EventLoopFuture<T>`) the result of
the submit will be `EventLoopFuture<EventLoopFuture<T>>`. So if you just
do

    try eventLoop.submit {
        return somethingAsynchronous
    }.wait()

then you will have only waited for the asynchronous operation to have
started (it might still be running). Fortunately, the compiler usually
warns you about that unless it returns `Void`. So it naturally tells you
to write

    try eventLoop.submit {
        return somethingAsynchronous
    }.wait().wait()

which will wait for both. Unfortunately, `XCTAssertNoThrow` will just
eat away any type so there it's advisable to write

    XCTAssertNoThrow(try eventLoop.submit { ... }.wait.wait() as Void)

to be sure it's actually a void. The Swift compiler however has an
annoying bug where it won't always realise that that's a problem
(https://bugs.swift.org/browse/SR-7478). Fortunately the 4.1 release
behaves okay so CI will catch it.

Modifications:

- replace the bad pattern
    _ = try el.submit { ... }
- added `as Void`s inside `XCTAssertNoThrow`s that do submit
- added missing `.wait()`s

Result:

tests should be more correct.
2018-05-01 15:24:20 +01:00
Norman Maurer 9f01374169
Ensure localAddress / remoteAddress are still accessible in channelInactive / handlerRemoved (#346)
Motivation:

Often its useful to be still be able to access the local / remote address during channelInactive / handlerRemoved callbacks to for example log it. We should ensure its still accessible during it.

Modifications:

- Fallback to slow-path in ChannelHandlerContext.localAddress0 / remoteAddress0 if fast-path fails to try accessing the address via the cache.
- Clear cached addresses after all callbacks are run.
- Add unit test.

Result:

Be able to access addresses while handlers are notified.
2018-04-23 12:51:01 +02:00
Johannes Weiß 902b18df16
don't deliver events for unregistered fds (#341)
Motivation:

Since forever we had a major bug in the Selector: In this condition:

- kqueue/epoll had many events
- in one of the earlier events we unregister a Channel whose fd is on of
  the later events
- we subsequently (still in the same event loop tick) register a new
  channel which gets the same fd as the previously closed one

then we would deliver an event that was meant for a previous channel to
a newly opened one.

Thanks to @mcdappdev for hitting this bug, helping us debug it and also
providing a repeatedly working repro.

Modifications:

if during event delivery any fd gets unregistered, we stop delivering
the remaining events and rely on the selector to redeliver them
again next time.

Result:

we don't deliver events for previously closed channels to new ones.
2018-04-20 16:19:10 +01:00
Norman Maurer 91ef938c60
Ensure correct ordering of promise notification when connect is still pending. (#330)
Motivation:

When a pending connect is still in process and the Channel is closed we need to ensure we use the correct ordering when notify the promises and ChannelHandlers in the pipeline.

The ordering should be:

- Notify pending connect promise
- Notify close promise
- Call channelInactive(...)

Modifications:

- Correct ordering of notification
- Add unit test

Result:

Ensure correct ordering when connect is still pending
2018-04-19 14:06:21 +02:00
Lev Walkin 338cfb541b support for initializing off the system file descriptors (#285)
Motivation:

In some contexts it is important to be able to pass file descriptors
around instead of specifying the particular host and port. This allows removing
certain windows for race conditions, and has positive security implications.

Modifications:

Added `withConnectedSocket(_:)` call to the `ClientBootstrap` that can
be used to create a new `Channel` out of existing file descriptor
representing the connected socket.

Result:

Allows initializing `Channel`s off the existing file descriptors.
2018-04-19 09:58:39 +02:00
Johannes Weiß a1ab3f32b3
propagate socket errors on .reset (#325)
Motivation:

Since #286, we do eagerly detect connection resets which is good. To
resolve the situation we first try to read everything off the socket. If
something fails there, everything is good and the client gets the
correct error. If that however doesn't lead to any errors, we would
close the connection with `error: ChannelError.eof` which the client
would then see for example on a connect that happened to get
`ECONNRESET` or `ECONNREFUSED`. That's not right so these situations are
fixed here.

Modifications:

- on reset, try to get the socket error if any
- if no socket error was detected, we send a `ECONNRESET` because we
  know we have been reset but not why and didn't encounter any errors
- write tests for these situations

Result:

connection resets that don't hit any other errors will now see the
correct error
2018-04-18 18:56:00 +01:00
Cory Benfield 707b413ec6 Synchronous connection failures must close channels. (#329)
Motivation:

When a connect() call returns an error other than EINPROGRESS, we
currently leave the FD registered and don't close the channel. This is
wrong, we should take this as a signal to close the socket immediately,
rathern than letting the selector tell us this FD is dead: after all,
we know it's dead.

Modifications:

Call `close0()` in synchronous connect failures.

Result:

Faster failures!
2018-04-18 16:07:05 +01:00
Johannes Weiß b109389c54 eagerly process input EOF/connection resets (fixes #277) (#286)
Motivation:

We had a number of problems:

1. We wanted to lazily process input EOFs and connection resets only
   when the user actually calls `read()`. On Linux however you cannot
   unsubscribe from `EPOLLHUP` so that's not possible.
2. Lazily processing input EOFs/connection resets wastes kernel
   resources and that could potentially lead to a DOS
3. The very low-level `Selector` interpreted the eventing mechanism's
   events quite a lot so the `EventLoop`/`Channel` only ever saw
   `readable` or `writable` without further information what exactly
   happened.
4. We completely ignored `EPOLLHUP` until now which on Unix Domain
   Socket close leads to a 100% CPU spin (issue #277)

Modifications:

- made the `Selector` interface richer, it now sends the following
  events: `readable`, `writable`, `readEOF` (input EOF), `reset`
  (connection reset or some error)
- process input EOFs and connection resets/errors eagerly
- change all tests which relied on using unconnected and unbound sockets
  to user connected/bound ones as `epoll_wait` otherwise would keep
  sending us a stream of `EPOLLHUP`s which would now lead to an eager
  close

Result:

- most importantly: fix issue #277
- waste less kernel resources (by dealing with input EOFs/connection
  resets eagerly)
- bring kqueue/epoll more in line
2018-04-16 19:36:47 +02:00
Cory Benfield 3275ff7c9c
Don't call to user code before reconciling Channel state. (#310)
Motivation:

Callouts to user code allows the user to make calls that re-enter
channel code. In the case of channel closure, we could call out to the
user before the channel knew it was completely closed, which would
trigger a safety assertion (in debug mode) or hit a fatalError
(in release mode).

Modifications:

Reconciled channel state before we call out to user code.
Added a test for this case.

Result:

Fewer crashes, better channel state management.
2018-04-12 13:49:09 +01:00
Johannes Weiß aee4aad3f9 fix a crash when async connect failed (#302) (#309)
Motivation:

When an asynchronous connect failed we would crash because we didn't
correctly close the connection to start tearing everything down.
That correctly made the `SocketChannelLifecycleManager` trip.

This was filed as #302.

Modifications:

Correctly deregister and close the connection if an async connect fails.

Result:

Fewer crashes, fixes #302.
2018-04-12 14:21:12 +02:00
Johannes Weiß 748e08e62a fix issues found by TSan (#294)
Motivation:

NIO 1.3.1 was TSan clean but `master` regressed. In total I could see
two breakages both proudly presented by `SocketChannelLifecycleManager`.
Sorry :| :

1. accessing the `isActiveAtomic` from arbitrary threads:
   this was clearly a bug because `SocketChannelLifecycleManager` is
   usually held mutabe. So `isActiveAtomic` needs to be injected into
   `SocketChannelLifecycleManager` rather than contstructed there
2. accessing the `ChannelPipeline` through `SocketChannelLifecycleManager`:
   same problem as above really.

Modifications:

inject both the ChannelPipeline and the isActiveAtomic in the
appropriate places.

Result:

NIO much more correct, TSan happy
2018-04-09 12:05:54 +01:00
Johannes Weiß 23744213e6 make Channel lifecycle statemachine explicit (#220)
Motivation:

We had a lot of problems with the Channel lifecycle statemachine as it
wasn't explicit, this fixes this. Additionally, it asserts a lot more.

Modifications:

- made Channel lifecycle statemachine explicit
- lots of asserts

Result:

- hopefully the state machine works better
- the asserts should guide our way to work on in corner cases as well
2018-03-29 19:50:07 +02:00
Cory Benfield 0a23199d7d
Remove busywork left over from flush promises. (#234)
Motivation:

The Socket channels and PendingDatagramWritesManager had a bunch of
overhead left over from when they were handling flush promises. This was
only adding to code complexity and perf cost.

Modifications:

Removed the promise label in markPendingWritePoint.

Reworked the code to stop expecting write promises in
PendingDatagramWritesManager.

Result:

Smaller, faster code.
2018-03-26 17:07:56 +01:00
Johannes Weiß f58e8318fb
move BaseSocketChannel to its own file (#212)
Motivation:

{,Server,Datagram,Base}SocketChannel hold quite a bit of complexity and
also a many of lines of code. Yet they all shared the very same file
which isn't great.

Modifications:

- moved BaseSocketChannel into its own file
- made the methods that need overriding internal (from `fileprivate`)

Result:

SocketChannel.swift has a lot fewer lines of code.
2018-03-21 17:15:53 +00:00