Motivation:
dtrace is extrmely useful when debugging allocation related issues. This
adds two scripts that have helped me many times.
Modifications:
- `dev/malloc-aggregation.d` which prints an aggregation of all
stacks that have allocated
- `dev/boxed-existentials.d` which live prints all allocations of boxed
existentials
Result:
sharing debugging tools is great
Motivation:
In the new CI, ChannelTests.testGeneralConnectTimeout often fails with
```
error: ChannelTests.testGeneralConnectTimeout : threw error "connection reset (error set): Network is unreachable (errno: 101) "
```
It's not a great test but this just works around it by saying that if
the network's down that's fine too.
Modifications:
also accept ENETDOWN and ENETUNREACH
Result:
tests should pass in new CI.
Motivation:
HTTP{Request,Response}Head were too large even with a less than 3 word
ByteBuffer, this shrinks them box CoW boxing them. Should also reduce
ARC traffic when passing around a bunch.
Modifications:
CoW box HTTP{Request,Response}Head
Result:
fewer existential containers created, more performance
Motivation:
NIO's dynamic pipeline comes with one performance caveat: If your type
isn't specialised in NIOAny (only types in the NIO module can be) and
it's over 3 words (24 bytes on 64-bit platforms), you'll suffer an
allocation for every time you box it in NIOAny. Very unfortunately, both
ByteBuffer and FileRegion were exactly 3 words wide which means that any
enum containing those would be wider than 3 words. Worse, both
HTTP{Request,Response}Head contained types that were wider than 3 words.
The best solution to this problem is to shrink ByteBuffer and FileRegion
to just under 3 words and that's exactly what this PR is doing. That is
slightly tricky as ByteBuffer was already bit packed fairly well
(reader/writer indices & slice begin/end were stored as a UInt32). The
trick we employ now is to store the slice beginning in a UInt24 and the
file region reader index in a UInt56. That saves one byte for both
ByteBuffer and FileRegion with very moderate tradeoffs: The reader index
in a file now needs to be within 64 PiB (peta bytes) and a byte buffer
slice beginning must start within 16 MiB (mega bytes). Note: The
reader/writer indices as well as slice ends are _not_ affected and can
still be within 4 GiB. Clearly no one would care about the restrictions
for FileRegions in the real world but we might hit the ByteBuffer slice
beginning limit in which case the slice would be copied out. But
given that slices are mostly used to slice off headers in network
protocols, 16 MiB should be _plenty_.
Norman was kind enough to measure the perf differences:
master (before this PR):
```
$ wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
4 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 93.02ms 158.13ms 1.97s 93.07%
Req/Sec 112.79k 26.05k 258.75k 84.50%
4501098 requests in 10.04s, 214.63MB read
Socket errors: connect 0, read 111, write 0, timeout 89
Requests/sec: 448485.61
Transfer/sec: 21.39MB
```
after this PR:
```
$ wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
4 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 107.53ms 206.56ms 1.99s 90.55%
Req/Sec 124.15k 26.56k 290.41k 89.25%
4952904 requests in 10.03s, 236.17MB read
Socket errors: connect 0, read 161, write 0, timeout 22
Requests/sec: 493852.65
Transfer/sec: 23.55MB
```
so we see a nice 10% improvement
Modifications:
- shrank ByteBuffer by 1 byte, making it 23 bytes in total
- shrank FileRegion by 1 byte, making it 23 bytes in total
- added @_inlineable to NIOAny where appropriate to not suffer boxed existentials in different places
- added tests for the new edge cases
Result:
- more speed
- fewer allocations
Refactored extension HTTPResponseStatus: Equatable, no change to the efficacy of the code.
Motivation:
Code readability is essential to a long-lasting maintainable project, the code this PR changes was "readable" however this PR will enhance the readability by removing excess lines of code.
Modifications:
Changed a long case statement, to a much shorter one.
Result:
There will be fewer lines of code, no change to the efficacy of product should occur. I am not familiar enough with the compiler to know if this will have a deleterious effect on performance I don't think it should be based on the way public var code: UInt { get } is implemented.
Motivation:
Its often useful to be able to add / retrieve elements on both sides of a CircularBuffer.
Modifications:
- Add last / prepend to be able to act on both ends.
- Add tests
Result:
Fixes https://github.com/apple/swift-nio/issues/279
Motivation:
The pipelining handler made the assumption that `channelRead` is never
called recursively. That's mostly true but there is at least one
situation where that's not true:
- pipelining handler seen a response .end and delivers a .head (which is
done in `channelRead`)
- a handler further down stream writes and flushes some response data
- the flushes fail which leads to us draining the receive buffer
- if the receive buffer contained more requests, the pipelining
handler's `channelRead` is called again (recursively)
The net result of that was that the new request parts from the receive
buffer would now jump the queue and go through the channel pipeline
next, before other already buffered messages.
Modifications:
made the pipelining handler buffer if a `channelRead` comes in from the
pipeline and there is already at least one message buffered.
Result:
the ordering of the incoming messages should now be respected which is
very important...
Motivation:
Often its useful to be still be able to access the local / remote address during channelInactive / handlerRemoved callbacks to for example log it. We should ensure its still accessible during it.
Modifications:
- Fallback to slow-path in ChannelHandlerContext.localAddress0 / remoteAddress0 if fast-path fails to try accessing the address via the cache.
- Clear cached addresses after all callbacks are run.
- Add unit test.
Result:
Be able to access addresses while handlers are notified.
Motivation:
There were two errors in testShuttingDownFailsRegistration that
could cause spurious test failures.
The first was that the test did not wait for connection establishment
before it began shutting the selector down, due to a change made in
The second was that the server side of the connection was not wedged open,
and so it would be closed by the call to loop.closeGently. That closure
could, if the main thread took long enough, lead to the client side
reading EOF and being forcibly closed.
Modifications:
Added a promise to wait for connection establishment.
Added support for wedging both the server and client side open.
Result:
Fewer test failures.
Motivation:
We should use power of two for the underlying storage to allow us to make use of bitmasking which is faster then modulo operations.
Modifications:
Use power of two and replace module with bitmasking.
Result:
Faster implementation of CircularBuffer.
Motivation:
It's quite common that someone asks for the canonical form of a header
and we don't have any header values. There's no need to do string
comparisons then, we can just return an empty array.
Modifications:
Check if we're asking for a header that isn't present and if so, just
return the empty array.
Result:
Should be ever so slightly faster.
Motivation:
Since forever we had a major bug in the Selector: In this condition:
- kqueue/epoll had many events
- in one of the earlier events we unregister a Channel whose fd is on of
the later events
- we subsequently (still in the same event loop tick) register a new
channel which gets the same fd as the previously closed one
then we would deliver an event that was meant for a previous channel to
a newly opened one.
Thanks to @mcdappdev for hitting this bug, helping us debug it and also
providing a repeatedly working repro.
Modifications:
if during event delivery any fd gets unregistered, we stop delivering
the remaining events and rely on the selector to redeliver them
again next time.
Result:
we don't deliver events for previously closed channels to new ones.
Motivation:
The NIOWebSocketServer demo works much better, when it actually
configures the WebSocketUpgrader.
You should have a test for that!
Modifications:
This adds the `upgrader` to the `HTTPUpgradeConfiguration`.
Result:
The demo will actually perform something web-sockety.
Motivation:
We recently changed the backing for the HTTP request and response
objects so that the URI string was no longer visible in
`request.description`. This patch fixes that and prints a format like
this:
HTTPRequestHead { method: GET, uri: "/", version: HTTP/1.1, headers: [("Host", "localhost:8888"), ("User-Agent", "curl/7.54.0"), ("Accept", "*/*")] }
and
HTTPResponseHead { version: HTTP/1.1, status: ok, headers: [("content-length", "12")] }
Modifications:
improved the HTTP request/response descriptions
Result:
debugging is easier
Motivation:
When a pending connect is still in process and the Channel is closed we need to ensure we use the correct ordering when notify the promises and ChannelHandlers in the pipeline.
The ordering should be:
- Notify pending connect promise
- Notify close promise
- Call channelInactive(...)
Modifications:
- Correct ordering of notification
- Add unit test
Result:
Ensure correct ordering when connect is still pending
Motivation:
In some contexts it is important to be able to pass file descriptors
around instead of specifying the particular host and port. This allows removing
certain windows for race conditions, and has positive security implications.
Modifications:
Added `withConnectedSocket(_:)` call to the `ClientBootstrap` that can
be used to create a new `Channel` out of existing file descriptor
representing the connected socket.
Result:
Allows initializing `Channel`s off the existing file descriptors.
Motivation:
If a server binds "localhost", and then we try to connect with an
AF_INET socket, then on macOS at least some of the time we will have
a protocol mismatch, as at least some of the time localhost will have
resolved to ::1 before 127.0.0.1.
That's bad.
Modifications:
Create a socket whose address family matches the server's.
Result:
This test will pass more often
Motivation:
Immutable members of structs (properties defined with let) can never be changed
even if the struct itself is mutable (defined withh var). This makes it difficult
to build rich convenience APIs on top of these structs.
Modifications:
Changed all 'let' stored properties on HTTP{Request|Response}Head to 'var'.
Result:
HTTP{Request|Response}Head are easier to use now.
Motivation:
Since #286, we do eagerly detect connection resets which is good. To
resolve the situation we first try to read everything off the socket. If
something fails there, everything is good and the client gets the
correct error. If that however doesn't lead to any errors, we would
close the connection with `error: ChannelError.eof` which the client
would then see for example on a connect that happened to get
`ECONNRESET` or `ECONNREFUSED`. That's not right so these situations are
fixed here.
Modifications:
- on reset, try to get the socket error if any
- if no socket error was detected, we send a `ECONNRESET` because we
know we have been reset but not why and didn't encounter any errors
- write tests for these situations
Result:
connection resets that don't hit any other errors will now see the
correct error
Motivation:
The integration tests' info mode lets us see the number of allocations
the did, that's great.
Modifications:
run integration tests in info mode
Result:
mode information
Motivation:
Opening a file, seeking it or querying its size (or other information)
is blocking on UNIX so `NonBlockingFileIO` should support that too.
Modifications:
Added a method to `NonBlockingFileIO` which lets the user open a file
without blocking the calling thread.
Result:
Less blocking is good :)
Motivation:
When a connect() call returns an error other than EINPROGRESS, we
currently leave the FD registered and don't close the channel. This is
wrong, we should take this as a signal to close the socket immediately,
rathern than letting the selector tell us this FD is dead: after all,
we know it's dead.
Modifications:
Call `close0()` in synchronous connect failures.
Result:
Faster failures!
Motivation:
PR #286 started processing EPOLLHUPs. Sadly epoll also immediately
throws EPOLLHUP at you if you register a socket that isn't either
connected or listening. This fixes two more instances of this problem.
Modifications:
made sure sockets that are registered are either listening or
connected (and before we let the Selector wait for events)
Result:
tests should be more reliable
Motivation:
When implementing a custom ChannelCore, you will probably need access
to the data inside a NIOAny. It should be possible to unwrap that.
Modifications:
Added an extension to ChannelCore to allow unwrapping a NIOAny.
Added @_versioned to almost all of NIOAny.
Result:
ChannelCore is implementable outside NIO
Motivation:
The ping/pong test should only do 4 allocations:
- readFromSocket does unconditionally allocate one ByteBuffer
(allocating one ByteBuffer does 2 allocations)
- we have a ping and a pong side, each call readFromSocket
so 2 * 2 = 4 allocations per message. Before adding the `@_inlineable`
to the read/write/get/set integer methods in ByteBuffer however I was
seeing way more allocations. So looked at them and fixed them.
Turns out that massively reduces the number of allocations for HTTP too.
On Linux I was seeing this change:
1000_reqs_1_conn: 282000 -> 70000
1_reqs_1000_conn: 1287000 -> 907000
Modifications:
- made the ByteBuffer integer methods inlineable
- added a test that makes sure we don't regress on the simplest possible
use of NIO: a ping/pong server
Result:
After your change, what will change.
Motivation:
Sometimes it's useful if the integration tests can output something that
isn't always useful. `-v` for verbose is way too verbose so this adds a
`-i` flag to the integration tests that lets 'info' messages through.
Modifications:
- add `-i` flags to integration tests
- make use of it in the integration counter tests
Result:
if we enable `-i` in CI, then we can constantly monitor the number of
allocations we do even if we don't fail the tests.
IdleStateHandler.IdleStateEvent is currently internal. Make it public.
Motivation:
The IdleStateEvent is delivered to the user viapublic func userInboundEventTriggered(ctx: ChannelHandlerContext, event: Any). In this method, it must be possible to have a check on the type of event. IdleStateEvent must be public.
Modifications:
Change access level of IdleStateHandler.IdleStateEvent to public.
Result:
IdleStateEvent will become a part of the NIO API.
* Allow to specify the max websockets frame size when using the upgrader.
Motivation:
At the moment its not possible to adjust the max websockets frame size when using the upgrader while its possible when directly use the WebSocketFrameDecoder.
Modifications:
Allow to specify max frame size via an init argument (with default value).
Result:
More flexible way to use the upgrader.
Motivation:
We had a number of problems:
1. We wanted to lazily process input EOFs and connection resets only
when the user actually calls `read()`. On Linux however you cannot
unsubscribe from `EPOLLHUP` so that's not possible.
2. Lazily processing input EOFs/connection resets wastes kernel
resources and that could potentially lead to a DOS
3. The very low-level `Selector` interpreted the eventing mechanism's
events quite a lot so the `EventLoop`/`Channel` only ever saw
`readable` or `writable` without further information what exactly
happened.
4. We completely ignored `EPOLLHUP` until now which on Unix Domain
Socket close leads to a 100% CPU spin (issue #277)
Modifications:
- made the `Selector` interface richer, it now sends the following
events: `readable`, `writable`, `readEOF` (input EOF), `reset`
(connection reset or some error)
- process input EOFs and connection resets/errors eagerly
- change all tests which relied on using unconnected and unbound sockets
to user connected/bound ones as `epoll_wait` otherwise would keep
sending us a stream of `EPOLLHUP`s which would now lead to an eager
close
Result:
- most importantly: fix issue #277
- waste less kernel resources (by dealing with input EOFs/connection
resets eagerly)
- bring kqueue/epoll more in line
Motivation:
We had a bug which is happens in the combination of these states:
- we held a request in the pipelining handler (because we're procesing a
previous one)
- a http handler error happened whilst a response's `.head` had already
been sent (but not yet the `.end`)
- the HTTPServerProtocolErrors handler is in use
That would lead to this situation:
- the error isn't held by the pipelining handler
- the error handler then just sends a full response (`.head` and `.end`)
but the actual http server already send a `.head`. So all in all, we
sent `.head`, `.head`, `.end` which is illegal
- the pipelining handler didn't notice this and beause it saw an `.end`
it would send through the next requst
- now the http server handler is in the situation that it gets `.head`,
`.head` too (which is illegal)
Modifications:
- hold HTTP errors in the pipelining handler too
Result:
- more correctness
Motivation:
We returned an callback which needed to be executed by the caller. This will cause extra allocations and is not needed. We can just return a EventLoopPromise which will cascade the notification.
Modification:
Not return a closure but promise.
Result:
Less allocations.
Motivation:
Previously the integration tests ran in verbose mode wouldn't actually
fail as we didn't correctly propagate the exit code.
Modifications:
correctly propagate the exit code
Result:
integration tests correctly fail in verbose mode too
Motivation:
Its very likely that the response will be HTTP/1.1 or HTTP/1.0 so we can optimize for it by minimizing the buffer.write(...) calls.
Beside this we should also change the pattern of *.write(into: inout ByteBuffer) to extension on ByteBuffer itself.
Modificiations:
- Optimize for HTTP/1.0 and 1.1
- Use extensions on ByteBuffer
Result:
Faster and more clean code.
Motivation:
HTTP message framing has a number of edge cases that NIO currently does
not tolerate. We should decide what our position is on each of these edge
cases and handle it appropriately.
Modifications:
Provide an extensive test suite that codifies our expected handling of
these edge cases. Fix divergences from this behaviour.
Result:
Better tolerance for the weird corners of HTTP.
Motivation:
Callouts to user code allows the user to make calls that re-enter
channel code. In the case of channel closure, we could call out to the
user before the channel knew it was completely closed, which would
trigger a safety assertion (in debug mode) or hit a fatalError
(in release mode).
Modifications:
Reconciled channel state before we call out to user code.
Added a test for this case.
Result:
Fewer crashes, better channel state management.
Motivation:
When an asynchronous connect failed we would crash because we didn't
correctly close the connection to start tearing everything down.
That correctly made the `SocketChannelLifecycleManager` trip.
This was filed as #302.
Modifications:
Correctly deregister and close the connection if an async connect fails.
Result:
Fewer crashes, fixes#302.
Motivation:
The Swift compiler seems to get very nervous when variadic inits
are used for types that have constructors with optional values. In
general we're not worried about breaking downstream consumers'
extensions when we update our code, but in this case we break the
ability to conform HTTPHeaders to ExpressibleByDictionaryLiteral,
which is a bit annoying. See https://bugs.swift.org/browse/SR-7415
for more details.
For 1.5.0 we should conform HTTPHeaders ourselves, but until that
time we can restore anyone who was conforming HTTPHeaders to
ExpressibleByDictionaryLiteral by restoring the old unary initializer
and delegating it to the new one. This presents an effect that is
equivalent to the old behaviour, but is new.
As a side note, in general it's a bad idea to conform types that you
do not own to standard library protocols. NIO reserves the right to
add conformances to our types in any Semver Minor release, so having
that conformance in your own code risks breakage without a Semver
Major patch bump. Please be aware.
Modifications:
Restored the unary initializer for HTTPHeaders.
Result:
Less breakage, more happiness.
Motivation:
`ab -k` behaves weirdly: it'll send an HTTP/1.0 request with keep-alive set
but stops doing anything at all if the server doesn't also set Connection: keep-alive
which our example HTTP1Server didn't do.
Modifications:
In the HTTP1Server example if we receive an HTTP/1.0 request with
Connection: keep-alive, we'll now set keep-alive too.
Result:
ab -k doesn't get stuck anymore.
Motivation:
TailChannelHandler does not implement any _ChannelOutboundHandler methods and so has no need to conform to it.
Modifications:
Not implement _ChannelOutboundHandler for TailChannelHandler
Result:
More correct code.
Motivation:
If we not filled the whole buffer with read(...) we should stop reading and wait until we get notified again. Otherwise chances are good that the next read(...) call will either read nothing or only a very small amount of data.
Also this will allow us to call fireChannelReadComplete() which may give the user the chance to flush out all pending writes.
Modifications:
Stop reading and wait for next wakup when we not fill the whole buffer.
Result:
Less wasted read calls and early chance of flushing out pending data.
Motivation:
We are currently parsing each header eagly which means we need to convert from bytes to String frequently. The reality is that most of the times the user is not really interested in all the headers and so it is kind of wasteful to do so.
Modification:
Rewrite our internal storage of HTTPHeaders to use a ByteBuffer as internal storage and so only parse headers on demand.
Result:
Less overhead for parsing headers.
Motivation:
We have one integration test left disabled, but actually it basically
already passes: it just expects the wrong format of error. We should
have all our integration tests enabled.
Modifications:
Enabled the test.
Changed the wording of the error it expected.
Result:
No disabled integration tests.
* Add EventLoopFuture reduce(initialResult:) function (#240)
Motivation:
Add a new function to allow us to work with multiple EventLoopFutures, that return values, by combining their results and wrapping it in a new
EventLoopFuture, using a specified reducer function.
Modifications:
- Add a new function
EventLoopFuture<T>.reduce<U>(initialResult:futures:eventLoop:nextPartialResult)
- Changed the implementation of andAll<Void> to use
EventLoopFuture<Void>.reduce(initialResult)
- Add a new function EventLoopFuture<T>.reduce<U>(into:)
Result:
Can reduce multiple EventLoopFutures without having to deal with
creating EventLoopFutures.
Simplified EventLoopFuture andAll<Void>
- An API which can be used to construct collections from future values in a linear time.
Motivation:
Generics and closures across modules can be problematic as the cause
allocations. ByteBuffer is one of NIO's core data types and therefore
should be well optimised. Generic/closure taking methods so far were a
bit of a problem. And with
[SE-0193](https://github.com/apple/swift-evolution/blob/master/proposals/0193-cross-module-inlining-and-specialization.md)
accepted, there's not much reason to use those features even though they
are (at the moment) @_.
For the 1000 requests over 1 connection allocation benchmark we go from
"total_allocations": 325378
to
"total_allocations": 321366
so we lose 4 allocations per HTTP request/response pair, not super
impressive but not nothing either.
Modifications:
Make generic/closure taking methods on ByteBuffer `@_inlineable`
Result:
Fewer allocations
Motivation:
NIO 1.3.1 was TSan clean but `master` regressed. In total I could see
two breakages both proudly presented by `SocketChannelLifecycleManager`.
Sorry :| :
1. accessing the `isActiveAtomic` from arbitrary threads:
this was clearly a bug because `SocketChannelLifecycleManager` is
usually held mutabe. So `isActiveAtomic` needs to be injected into
`SocketChannelLifecycleManager` rather than contstructed there
2. accessing the `ChannelPipeline` through `SocketChannelLifecycleManager`:
same problem as above really.
Modifications:
inject both the ChannelPipeline and the isActiveAtomic in the
appropriate places.
Result:
NIO much more correct, TSan happy
Motivation:
HTTPHeaders had an unusual API for Swift: `getCanonicalForm(String)` which is
better suited as a subscript `[canonicalForm: String]`.
Also the implementation was needlessly inefficient.
Modifications:
- renamed HTTPHeaders.getCanonicalForm to HTTPHeaders[canonicalForm]
- improved efficiency by replacing anti-pattern
Array.map { ... }.reduce(+, []) by flatMap
Result:
more Swift in both meanings
Motivation:
Unfortunately we cannot guarantee that a weak ref is going to be nil in
a timely fashion if a strong reference to the object exists in another
thread. This makes us fail tests on slow machines, which is bad.
Modifications:
Wait for up to a second for the other thread to get its house in order
before we give up.
Result:
Hopefully I'll never have to "fix" this test again.
Motivation:
Currently the HTTP decoders can throw errors, but they will be ignored
and lead to a simple EOF. That's not ideal: in most cases we should make
a best-effort attempt to send a 4XX error code before we shut the client
down.
Modifications:
Provided a new ChannelHandler that generates 400 errors when the HTTP
decoder fails.
Added a flag to automatically add that handler to the channel pipeline.
Added the handler to the HTTP sample server.
Enabled integration test 12.
Result:
Easier error handling for HTTP servers.