Motivation:
The HappyEyeballs connector synchronises state on an event loop but
calls out to a 'Resolver' to do DNS lookups. The resolver returns
results as a future which may be on a different loop than the connector.
The connector does not hop back to its own event loop before processing
the results.
For client bootstraps, if no resolver is specified then the default
resolver uses the same event loop as the connector, so in many cases this
is not an issue. However, if a custom resolver is used this guarantee is
lost and data races are much more likely.
Modifications:
- Hop back to the connector's event loop after calling the resolver.
- Add a test.
Result:
Fewer data races.
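The hop can be sketched without NIO by letting a serial `DispatchQueue` stand in for the connector's event loop. `Connector`, `start(resolve:onResolved:)` and the resolver closure are hypothetical. The only point is that results from a resolver completing on another queue are re-dispatched onto the connector's own queue before any state is touched. In NIO itself, an `EventLoopFuture` can be moved onto a given loop with `hop(to:)`.

```swift
import Dispatch

// Hypothetical stand-in for the connector. A serial DispatchQueue plays the role
// of its event loop: all mutable state may only be touched on `loop`.
final class Connector {
    let loop = DispatchQueue(label: "connector.loop")
    private var resolvedAddresses: [String] = []

    /// `resolve` models a custom resolver whose callback may run on any queue.
    func start(
        resolve: (@escaping ([String]) -> Void) -> Void,
        onResolved: @escaping ([String]) -> Void
    ) {
        resolve { addresses in
            // Hop back to the connector's own loop before touching any state,
            // rather than processing the results wherever the resolver completed.
            self.loop.async {
                self.resolvedAddresses.append(contentsOf: addresses)
                onResolved(self.resolvedAddresses)
            }
        }
    }
}
```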
* Add `AsyncChannel` based `ServerBootstrap.bind()` methods
# Motivation
In my previous PR, we added a new async bridge from a NIO `Channel` to Swift Concurrency primitives in the form of the `NIOAsyncChannel`. This type alone is already helpful in bridging `Channel`s to Concurrency; however, it is hard to use because the `Channel` must be wrapped at exactly the right time, otherwise reads will be dropped. Furthermore, protocol negotiation makes this even trickier, since we need to wait until negotiation finishes and only then wrap the `Channel`.
# Modification
This PR introduces a few things:
1. New methods on the `ServerBootstrap` which allow the creation of `NIOAsyncChannel` based channels. This can be used in all cases where no protocol negotiation is involved.
2. A new protocol and type called `NIOProtocolNegotiationHandler` and `NIOProtocolNegotiationResult`, which are used to identify channel handlers that perform protocol negotiation.
3. New methods on the `ServerBootstrap` that are aware of protocol negotiation.
# Result
We can now easily and safely create new `AsyncChannel`s from the `ServerBootstrap`.
* Code review
* Fix typo
* Fix up tests
* Stop finishing the writer when an error is caught
* Code review
* Fix up writer tests
* Introduce shared protocol negotiation handler state machine
* Correctly handle multi threaded event loops
* Adapt test to assert the channel was closed correctly.
* Code review
Motivation:
The fix provided in #2407 was subtly wrong. ignoreSIGPIPE, which throws
the error in question, closes the FD on error _except_ on EINVAL from
fcntl, where it instead does not. This inconsistent behaviour is the
source of the bug. Because this behaviour is inconsistent, the fix from
PR #2407 is also inconsistent and can in some cases double-close the
socket.
The actual issue is not as old as I expected: it was introduced by the
change in #1598, which incorrectly inserted the error transformation
before the call to close.
Modifications:
- Revert the change from #2407.
- Move the close in ignoreSIGPIPE to before the error check, rather than
after, so we unconditionally execute it.
Result:
More resilient fix.
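The corrected ordering can be sketched in plain Swift. `configureOrClose`, the error types, and the injected `configure`/`close` closures are hypothetical stand-ins for `ignoreSIGPIPE`, NIO's errors and the real syscalls; `einval` is 22, the value of EINVAL on Linux and Darwin.

```swift
// Hypothetical error types standing in for NIO's real ones.
struct IOError: Error { let errnoCode: Int32 }
struct FcntlFailedError: Error {}

let einval: Int32 = 22  // EINVAL on Linux and Darwin

/// Configures a freshly created descriptor. `configure` returns 0 on success
/// or an errno value on failure; `close` stands in for close(2).
func configureOrClose(
    fd: Int32,
    configure: (Int32) -> Int32,
    close: (Int32) -> Void
) throws {
    let errnoCode = configure(fd)
    guard errnoCode != 0 else { return }
    // Close first, unconditionally, so no error-specific branch can skip it.
    close(fd)
    // Only then transform the error; every branch below has already closed `fd`.
    if errnoCode == einval {
        throw FcntlFailedError()
    }
    throw IOError(errnoCode: errnoCode)
}
```

Because the descriptor is closed before any error-specific branch runs, every failure path closes it exactly once and callers must never close it again, which rules out the double close.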
Motivation:
When an error is hit during a read loop, a channel is able to tolerate
that error without closing. This is done for a number of reasons, but
the most important one is accepting sockets for already-closed
connections, which can trigger all kinds of errors on the read path.
Unfortunately, there was an edge-case in the code for handling this
case. If one or more reads in the loop had succeeded before the error
was caught, the inner code would be expecting a call to readIfNeeded,
but the outer code wouldn't make it. This would lead to autoRead
channels being wedged open.
Modifications:
This patch extends the Syscall Abstraction Layer to add support for
server sockets. It adds two tests: one for the basic accept flow, and
then one for the case discussed above.
This patch also refactors the code in BaseSocketChannel.readable0 to
more clearly show the path through the error case. There were a number
of early returns and partial conditionals that led to us checking the
same condition in a number of places. This refactor makes it clearer
that it is possible to exit this code on the happy path with a
tolerated error, and that this should be treated the same as having
read _something_.
Result:
Harder to wedge a channel open.
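The single exit path described above can be sketched as a pure function. `ReadAttempt`, `ReadCycleOutcome` and `runReadCycle` are hypothetical names, not NIO's `readable0`. The point is that a tolerated error, even after earlier successful reads, yields the same outcome as a successful read, so the caller always knows to call `readIfNeeded`.

```swift
/// Hypothetical result of a single read syscall.
enum ReadAttempt {
    case gotData
    case wouldBlock
}

/// What the outer code must do after a read cycle.
enum ReadCycleOutcome: Equatable {
    case readSomething   // call readIfNeeded() so an autoRead channel keeps reading
    case readNothing
    case closeChannel    // an intolerable error was hit
}

/// A tolerated error is folded into the same "read something" outcome as a
/// successful read, so reads that succeeded before the error are never lost.
func runReadCycle(
    maxMessagesPerRead: Int,
    readOnce: () throws -> ReadAttempt,
    shouldTolerate: (Error) -> Bool
) -> ReadCycleOutcome {
    var readAnything = false
    for _ in 0..<maxMessagesPerRead {
        do {
            guard try readOnce() == .gotData else { break }
            readAnything = true
        } catch let error where shouldTolerate(error) {
            // Treat a tolerated error like having read something; end this cycle.
            readAnything = true
            break
        } catch {
            return .closeChannel
        }
    }
    return readAnything ? .readSomething : .readNothing
}
```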
Motivation:
In some circumstances we can accept a socket that is already closed. In
those cases, creating the underlying Socket type will fail, as
attempting to ignore SIGPIPE will fail. On Apple platforms, this causes
us to leak the accepted socket, and can lead to file descriptor
exhaustion.
Modifications:
- Close the accepted socket if we fail to create a Socket class
Result:
No FD leaks
* Pool buffers for messages and addresses.
* Revert changes related to controlMessageStorage
* Cosmetic fix.
---------
Co-authored-by: Cory Benfield <lukasa@apple.com>
Motivation:
Support was added for UDP_SEGMENT in #2372 which allows for large UDP
datagrams to be written to a socket by letting the kernel or NIC segment
the data across multiple datagrams. This reduces traversals across the
network stack which can lead to performance improvements. UDP_GRO is the
receive-side counterpart allowing the kernel/NIC to aggregate datagrams
and reduce network stack traversals.
Modifications:
- Add a function in CNIOLinux to check whether UDP_GRO is supported
- Add the relevant socket and channel options
- Add tests
Result:
- UDP_GRO can be enabled where supported and applications may receive
large buffers.
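An application receiving GRO-coalesced buffers may need to split them back into datagrams. The following is a minimal sketch that assumes the segment size is already known; on Linux it is reported per read via a `UDP_GRO` control message, which is not parsed here. `splitAggregatedDatagram` is a hypothetical helper.

```swift
/// Splits a GRO-aggregated receive buffer back into individual datagrams.
/// The kernel coalesces datagrams of equal size `segmentSize`; only the
/// final one may be shorter.
func splitAggregatedDatagram(_ buffer: [UInt8], segmentSize: Int) -> [ArraySlice<UInt8>] {
    precondition(segmentSize > 0, "segment size must be positive")
    return stride(from: 0, to: buffer.count, by: segmentSize).map { start in
        buffer[start..<min(start + segmentSize, buffer.count)]
    }
}
```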
mark syncShutdownGracefully noasync
Motivation:
The code as-is blocks the calling thread.
Modifications:
* mark `EventLoopGroup.syncShutdownGracefully()` and `NIOThreadPool.syncShutdownGracefully()` noasync on Swift > 5.7
* offer NIOThreadPool.shutdownGracefully()
* add a `renamed:` hint to `syncShutdownGracefully()`
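The annotation pattern can be sketched on a hypothetical `WorkerPool` type, not NIO's `EventLoopGroup` or `NIOThreadPool`. It requires Swift 5.7 or later for `noasync`.

```swift
/// Hypothetical stand-in for a type offering both blocking and async shutdown.
final class WorkerPool {
    private(set) var isShutDown = false

    /// Blocks the calling thread. `noasync` makes calling this from an async
    /// context a compiler diagnostic, and `renamed:` points at the replacement.
    @available(*, noasync, message: "this can end up blocking the calling thread", renamed: "shutdownGracefully()")
    func syncShutdownGracefully() {
        isShutDown = true
    }

    /// The non-blocking counterpart for async callers.
    func shutdownGracefully() async {
        isShutDown = true
    }
}
```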
setOption forms a raw pointer to a generic argument. The compiler will
warn on this as of:
[proposal] Constrain implicit raw pointer conversion... #1963 https://github.com/apple/swift-evolution/pull/1963
/Sources/NIOPosix/BaseSocket.swift:286:31: warning: forming
'UnsafeRawPointer' to a variable of type 'T'; this is likely incorrect
because 'T' may contain an object reference.
option_value: &val,
^
Ideally, this would be fixed by adding a BitwiseCopyable constraint to
the 'value' parameter of 'BaseSocket.setOption'. That would not only
eliminate the warning, but would make the API safer. But
BitwiseCopyable isn't quite ready for public use. In the meantime,
this is a reasonable workaround.
Co-authored-by: Cory Benfield <lukasa@apple.com>
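One way to avoid the implicit conversion is to form the raw pointer explicitly with `withUnsafeBytes(of:)`. This is a sketch under that assumption, not necessarily the workaround the commit applied. `fakeSetsockopt` is a hypothetical stand-in for setsockopt(2).

```swift
/// Hypothetical stand-in for setsockopt(2): like the real call, it only sees an
/// untyped pointer and a length. Here it just copies the bytes out.
func fakeSetsockopt(_ value: UnsafeRawPointer, _ length: Int) -> [UInt8] {
    Array(UnsafeRawBufferPointer(start: value, count: length))
}

/// Passing `&value` for a generic `T` implicitly converts it to UnsafeRawPointer,
/// which newer compilers warn about. Spelling the raw access out with
/// `withUnsafeBytes(of:)` avoids that implicit conversion. Callers must still
/// only pass trivial (bitwise-copyable) types such as Int32 or linger.
func setOption<T>(_ value: T) -> [UInt8] {
    var copy = value
    return withUnsafeBytes(of: &copy) { bytes in
        fakeSetsockopt(bytes.baseAddress!, bytes.count)
    }
}
```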
Motivation:
On Linux, the UDP_SEGMENT socket option allows for large buffers to be
written to the kernel and segmented by the kernel (or in some cases the
NIC) into smaller datagrams. This can substantially decrease the number
of syscalls.
This can be set on a per-message basis or on a per-socket basis. This
change adds per-socket configuration.
Modifications:
- Add a CNIOLinux function to check whether UDP_SEGMENT is supported on
that particular Linux.
- Add a helper to `System` to check whether UDP_SEGMENT is supported on
the current platform.
- On Linux only:
  - add the udp socket option level
  - add the udp_segment socket option
- Add the `DatagramSegmentSize` channel option.
- Get/Set the option in `DatagramChannel`
Results:
UDP GSO is supported on Linux.
Co-authored-by: Cory Benfield <lukasa@apple.com>
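The syscall saving can be sketched as simple arithmetic. `gsoDatagramCount` is a hypothetical helper. The kernel also enforces limits on segment count and total size, which are not modelled here.

```swift
/// With UDP_SEGMENT (UDP GSO), a single write of `payloadSize` bytes is split
/// into datagrams of `segmentSize` bytes; only the last may be shorter.
/// Returns how many datagrams that one write produces on the wire.
func gsoDatagramCount(payloadSize: Int, segmentSize: Int) -> Int {
    precondition(payloadSize >= 0 && segmentSize > 0, "invalid sizes")
    // Ceiling division: a partial final segment still becomes its own datagram.
    return (payloadSize + segmentSize - 1) / segmentSize
}
```

For example, a 65,000-byte write with a 1,400-byte segment size is sent as 47 datagrams from a single syscall.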
Motivation:
Channels can read `ChannelOptions.maxMessagesPerRead` times from a
socket in each read cycle. They typically re-use the same buffer for
each read and rely on it CoWing if necessary. If we read more than once
in a cycle then we may CoW the buffer. Instead of reusing one buffer we
can reuse a pool of buffers limited by `maxMessagesPerRead` and cycle
through each, reducing the chance of CoWing the buffers.
Modifications:
- Extend `RecvByteBufferAllocator` to provide the size of the next
buffer with a default implementation returning `nil`.
- Add a recv buffer pool which lazily grows up to a fixed size and
attempts to reuse buffers where possible if doing so avoids CoWing.
Results:
Fewer allocations
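The following is a minimal sketch of the pooling idea. It assumes a class-backed buffer so that `isKnownUniquelyReferenced` can tell whether anyone besides the pool still holds it; NIO's `ByteBuffer` CoW check is analogous but not shown. `RecvBuffer`, `RecvBufferPool` and `get()` are hypothetical names, and `maxBuffers` plays the role of `maxMessagesPerRead`.

```swift
/// Hypothetical class-backed receive buffer; being a class lets the pool ask
/// whether anyone else still holds a reference to it.
final class RecvBuffer {
    var bytes: [UInt8] = []
    init(capacity: Int) { bytes.reserveCapacity(capacity) }
}

/// Grows lazily up to `maxBuffers` and cycles through its buffers, preferring
/// one that only the pool references so it can be reused without copying.
struct RecvBufferPool {
    let maxBuffers: Int
    let bufferCapacity: Int
    private var buffers: [RecvBuffer] = []
    private var next = 0

    init(maxBuffers: Int, bufferCapacity: Int) {
        self.maxBuffers = maxBuffers
        self.bufferCapacity = bufferCapacity
    }

    var count: Int { buffers.count }

    mutating func get() -> RecvBuffer {
        // 1. Reuse a pooled buffer that nobody outside the pool still holds.
        for offset in 0..<buffers.count {
            let index = (next + offset) % buffers.count
            if isKnownUniquelyReferenced(&buffers[index]) {
                next = (index + 1) % buffers.count
                return buffers[index]
            }
        }
        // 2. Otherwise grow the pool, up to its fixed limit.
        if buffers.count < maxBuffers {
            let buffer = RecvBuffer(capacity: bufferCapacity)
            buffers.append(buffer)
            return buffer
        }
        // 3. Pool exhausted and every buffer in use: hand out an unpooled one.
        return RecvBuffer(capacity: bufferCapacity)
    }
}
```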
Motivation:
To know when we next need to wake up, we keep track of what the next
deadline will be. This works great, but in order to keep track of this
UInt64 we save off an entire ScheduledTask. This object is quite wide (6
pointers wide), and two of those pointers require ARC traffic, so doing
this saving produces unnecessary overhead.
Worse, saving this task plays poorly with task cancellation. If the
saved task is cancelled, this has the effect of "retaining" that task
until the next event loop tick. This is unlikely to produce catastrophic
bugs in real programs, where the loop does tick, but it violates our
tests which rigorously assume that we will always drop a task when it is
cancelled. In specific manufactured cases it's possible to produce leaks
of non-trivial duration.
Modifications:
- Wrote a weirdly complex test.
- Moved the implementation of Task.readyIn to a method on NIODeadline
- Saved a NIODeadline instead of a ScheduledTask
Result:
Minor performance improvement in the core event loop processing, minor
correctness improvement.
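The bookkeeping change can be sketched as follows. It assumes a simplified `Deadline` type standing in for `NIODeadline` (only a monotonic nanosecond count) and a hypothetical `NextWakeup` holder. `readyIn(_:)` mirrors the method the commit moves onto the deadline type, but its exact signature here is an assumption.

```swift
/// Hypothetical stand-in for NIODeadline: nanoseconds on a monotonic clock.
struct Deadline: Comparable {
    var uptimeNanoseconds: UInt64

    static func < (lhs: Deadline, rhs: Deadline) -> Bool {
        lhs.uptimeNanoseconds < rhs.uptimeNanoseconds
    }

    /// Nanoseconds until this deadline, clamped to zero once it has passed.
    func readyIn(_ now: Deadline) -> UInt64 {
        uptimeNanoseconds > now.uptimeNanoseconds ? uptimeNanoseconds - now.uptimeNanoseconds : 0
    }
}

/// Remembers only the earliest deadline value, with no reference to the task
/// itself, so cancelling a task cannot keep it alive until the next tick.
struct NextWakeup {
    private(set) var nextDeadline: Deadline?

    init() {}

    mutating func taskScheduled(at deadline: Deadline) {
        if let current = nextDeadline, current <= deadline { return }
        nextDeadline = deadline
    }
}
```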
Motivation:
PooledBuffer is an inherently unsafe type, but its original incarnation
was less safe than it needed to be. In particular, we can rewrite it to
ensure that it is compatible with automatic reference counting.
Modifications:
- Rewrite PooledBuffer to use ManagedBuffer
- Clean up alignment math
- Use scoped accessors
- Add hooks for future non-scoped access
Result:
Safer, clearer code
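The following is a minimal sketch of the ManagedBuffer approach with a scoped accessor, assuming a plain byte store. `PoolHeader` and `ByteStorage` are hypothetical names, and the real `PooledBuffer` layout is not reproduced here.

```swift
/// Header stored in-line with the elements. Keeping the capacity here avoids
/// depending on `ManagedBuffer.capacity`, which is derived from the allocator.
struct PoolHeader {
    var capacity: Int
}

/// A ManagedBuffer-backed byte store: ARC owns the allocation, and raw memory
/// is only reachable inside the closure passed to `withUnsafeMutableBytes`.
final class ByteStorage {
    private let storage: ManagedBuffer<PoolHeader, UInt8>

    init(capacity: Int) {
        precondition(capacity >= 0, "capacity must not be negative")
        storage = ManagedBuffer<PoolHeader, UInt8>.create(minimumCapacity: capacity) { _ in
            PoolHeader(capacity: capacity)
        }
        // UInt8 is trivial, so initialising once up front is enough and no
        // explicit deinitialisation is needed when the buffer is freed.
        storage.withUnsafeMutablePointerToElements { elements in
            elements.initialize(repeating: 0, count: capacity)
        }
    }

    var capacity: Int { storage.header.capacity }

    /// Scoped accessor: the pointer must not escape `body`.
    func withUnsafeMutableBytes<R>(_ body: (UnsafeMutableRawBufferPointer) throws -> R) rethrows -> R {
        try storage.withUnsafeMutablePointers { header, elements in
            try body(UnsafeMutableRawBufferPointer(start: elements, count: header.pointee.capacity))
        }
    }
}
```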
* Pool buffers for ivecs and storage refs in the event loop.
* Introduce PoolElement for poolable objects and add some bounds checks for the pooled buffers.
* Some polish.
* Fix build failure with Swift 5.5/5.6
* Use raw pointers instead of typed pointers.
Motivation:
The `PendingDatagramWritesManager` unconditionally creates an array and
reserves capacity... only to never use it.
Modifications:
Remove the unused code.
Result:
Fewer allocations.
Motivation:
According to the Linux man page, the msg_len field is used to return the number of bytes sent for that particular message.
It does not make sense to initialize it with the size of the message.
Modifications:
Change the msg_len field initialization to use 0 instead of the message size.
Result:
Use sendmmsg() call properly.
Co-authored-by: Cory Benfield <lukasa@apple.com>
Motivation:
Empty UDP datagrams can carry meaning.
Empty datagrams were being silently dropped on write.
Receiving an empty datagram caused an assertion failure (a possible denial-of-service vector).
Modifications:
Remove the early exit when writing empty datagrams and the non-empty assertion when reading them.
Result:
We can now write and read empty datagrams.
Motivation:
The less code we have, the fewer bugs we have.
This change removes a few lines of code while keeping the same functionality.
Modifications:
Remove some unneeded instance variables.
Result:
Less code.
Cap NonBlockingFileIO reads at Int32.max
Motivation:
We wish to avoid overly large reads, which fail with EINVAL and surface
as errors. We work around the issue at the NonBlockingFileIO level to
keep the lower levels as simple as possible.
Modifications:
`NonBlockingFileIO` `read0` amends read `byteCount`s to be `Int32.max` if they are larger than that value.
Result:
Large `NonBlockingFileIO` reads no longer result in precondition
failures.
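The capping rule is small enough to show directly. `cappedReadSize` and `readPlan` are hypothetical helpers, not NIO API; `readPlan` illustrates that an oversized request simply becomes several capped reads. Since read(2) is always allowed to return fewer bytes than requested, callers must handle short reads anyway.

```swift
/// Caps a requested read size so a single read never asks for more than
/// Int32.max bytes, a size some platforms reject with EINVAL.
func cappedReadSize(_ byteCount: Int) -> Int {
    min(byteCount, Int(Int32.max))
}

/// Splits a large read into a sequence of per-syscall sizes, each capped as above.
func readPlan(totalBytes: Int) -> [Int] {
    var remaining = totalBytes
    var plan: [Int] = []
    while remaining > 0 {
        let chunk = cappedReadSize(remaining)
        plan.append(chunk)
        remaining -= chunk
    }
    return plan
}
```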
* RawSocket prototype
* Conform `ProtocolSubtype` to `Hashable`
* Add public `NIOIPProtocol` type
Make `ProtocolSubtype` internal
* Subset of IANA protocols with an RFC
* Add `CustomStringConvertible` to `NIOIPProtocol`
* Add `init(_ rawValue: Int)`
* Rename `NIOBSDSocket.ProtocolSubtype.ip` to `.default`
* Add `NIOBSDSocket.ProtocolSubtype.mptcp`
and remove `NIOBSDSocket.mptcpProtocolSubtype`
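The shape of such a protocol-number type can be sketched as below. `IPProtocolNumber` is a hypothetical stand-in, not the actual `NIOIPProtocol` API. The values are IANA's assigned Internet protocol numbers (ICMP 1, TCP 6, UDP 17, IPv6-ICMP 58).

```swift
/// Hypothetical wrapper around an IANA assigned Internet protocol number.
struct IPProtocolNumber: Hashable, CustomStringConvertible {
    let rawValue: UInt8

    init(rawValue: UInt8) { self.rawValue = rawValue }

    /// Convenience for values coming from C APIs that use `Int`; traps if out of range.
    init(_ rawValue: Int) { self.init(rawValue: UInt8(rawValue)) }

    static let icmp = IPProtocolNumber(rawValue: 1)
    static let tcp = IPProtocolNumber(rawValue: 6)
    static let udp = IPProtocolNumber(rawValue: 17)
    static let ipv6ICMP = IPProtocolNumber(rawValue: 58)

    var description: String {
        switch self {
        case .icmp: return "ICMP"
        case .tcp: return "TCP"
        case .udp: return "UDP"
        case .ipv6ICMP: return "IPv6-ICMP"
        default: return "IP protocol \(rawValue)"
        }
    }
}
```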
Motivation
MPTCP provides multipath capability for TCP connections. This
allows TCP connections to consume multiple independent network
paths, providing devices with a number of capabilities to
improve throughput, latency, or reliability.
MPTCP is not totally transparent, and requires servers to support
the functionality as well as clients. To that end, we should expose
some MPTCP capability.
Importantly, MPTCP uses a number of new socket flags and options.
To enable us to support this when it is available but gracefully fail
when it is not, we've hardcoded a number of Linux kernel constants
instead of relying on libc to expose them. This is safe to do on Linux
because its syscall layer is ABI stable.
Modifications
- Add ClientBootstrap and ServerBootstrap flags for MPTCP
- Plumb MPTCP through the stack
- Add new socket options for MPTCP
Result
MPTCP is supported on Linux
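The hardcoded-constant approach can be sketched as follows. `StreamProtocolSubtype`, `SocketCreationError` and `makeStreamSocket` are hypothetical, and `socketCall` is injected in place of socket(2) so the sketch runs on any platform. The one real value is Linux's `IPPROTO_MPTCP` (262).

```swift
/// Hypothetical protocol subtype: MPTCP is requested by passing a different
/// protocol number to socket(2) while keeping SOCK_STREAM as the type.
enum StreamProtocolSubtype {
    case `default`   // 0: let the kernel choose (TCP for SOCK_STREAM)
    case mptcp

    /// Linux's IPPROTO_MPTCP, hardcoded so the code builds even against libc
    /// headers that predate MPTCP. Safe because the Linux syscall ABI is stable.
    static let linuxIPPROTO_MPTCP: Int32 = 262

    var protocolNumber: Int32 {
        switch self {
        case .default: return 0
        case .mptcp: return Self.linuxIPPROTO_MPTCP
        }
    }
}

struct SocketCreationError: Error {
    let errorCode: Int32
}

/// `socketCall` stands in for socket(2). On a kernel without MPTCP the call
/// fails at runtime, surfacing as an error rather than as a build failure.
func makeStreamSocket(
    subtype: StreamProtocolSubtype,
    socketCall: (Int32) -> Result<Int32, SocketCreationError>
) throws -> Int32 {
    try socketCall(subtype.protocolNumber).get()
}
```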
Motivation
Get the Android build working again
Modifications
- Modify close/open/readdir arguments, as they can be null
- Remove mkpath_np, as it's not there on Android
Result
Android builds again and the same tests pass
Motivation:
The basic set of functions that NonBlockingFileIO provides is enough to read and write files, but as we move further into a fully asynchronous world we need other non-blocking file-related functions.
Modifications:
- Adds support for getting file information using lstat
- Adds ability to create a symlink, delete a symlink and read its destination
- Adds ability to list directories
- Adds ability to rename and remove files
This extension does not port cleanly to Windows as the time structures
on Windows are different. This happens to be unused, so simply remove
the extension on Windows.
* Throw fatalError when scheduling on shutdown EL if SWIFTNIO_STRICT is set
Signed-off-by: Si Beaumont <beaumont@apple.com>
* Add CrashTest for SWIFTNIO_STRICT crash
Signed-off-by: Si Beaumont <beaumont@apple.com>
* fixup: Extract env var parsing to static let
Signed-off-by: Si Beaumont <beaumont@apple.com>
Co-authored-by: Cory Benfield <lukasa@apple.com>
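Extracting the environment lookup into a `static let` can be sketched as below. The accepted spellings in `parseStrictFlag` and the `StrictMode` name are assumptions, not the commit's actual parsing rules.

```swift
import Foundation

/// Hypothetical parsing rule: "1", "true" or "yes" (any case) enable the flag.
func parseStrictFlag(_ value: String?) -> Bool {
    guard let value = value?.lowercased() else { return false }
    return ["1", "true", "yes"].contains(value)
}

enum StrictMode {
    /// Static stored properties are initialised lazily, exactly once and in a
    /// thread-safe way, so the environment is parsed a single time and every
    /// later check is just a load.
    static let isEnabled = parseStrictFlag(ProcessInfo.processInfo.environment["SWIFTNIO_STRICT"])
}

/// Sketch of how a scheduling path on a shut-down loop might consult the flag.
func checkSchedulingOnShutDownLoop() {
    if StrictMode.isEnabled {
        fatalError("Cannot schedule tasks on an EventLoop that has already shut down.")
    }
    // Otherwise keep the existing, non-crashing behaviour.
}
```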
Add an import of `NIOCore` on Windows which mirrors the other platforms.
This greatly reduces the noise in the error list.
Co-authored-by: Cory Benfield <lukasa@apple.com>
The member names are not identical across platforms. Add a case to
handle the name difference on Windows.
Co-authored-by: Cory Benfield <lukasa@apple.com>
Use Win32 APIs to properly validate if a file is a pipe on Windows.
This enables providing the same semantics without leaking additional
Windows specifics.
Co-authored-by: Cory Benfield <lukasa@apple.com>
Reorganise the ECN constants to colocate the definitions for the
different platforms. Define the constants for Windows as the platform
does not provide them in the system headers.
Replace the use of raw constants with the internal enumeration. This
ensures that the constant names are uniform and don't leak structural
information from the underlying platform.
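An internal enumeration of the ECN codepoints can be sketched as below. It assumes the RFC 3168 encoding in the two low-order bits of the IPv4 TOS / IPv6 Traffic Class byte. `ECNCodepoint` and its case names are hypothetical.

```swift
/// Hypothetical internal enumeration of the RFC 3168 ECN codepoints. Defining
/// the values here means they do not depend on any platform's system headers.
enum ECNCodepoint: UInt8 {
    case notECNCapable = 0b00          // Not-ECT
    case ect1 = 0b01                   // ECT(1)
    case ect0 = 0b10                   // ECT(0)
    case congestionExperienced = 0b11  // CE

    /// Extracts the codepoint from a TOS / Traffic Class byte.
    init(trafficClass: UInt8) {
        // The mask guarantees a value in 0...3, so the force unwrap cannot fail.
        self.init(rawValue: trafficClass & 0b11)!
    }
}
```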