### Motivation:
In my previous PR https://github.com/apple/swift-nio/pull/2010, I already decreased the allocations for both `scheduleTask` and `execute` by one. Happily, there are no more allocations left to remove from `execute`; however, `scheduleTask` still has a couple of allocations that we can try to get rid of.
### Modifications:
This PR removes two allocations inside `Scheduled`, where we were using the passed-in `EventLoopPromise` to call the `cancellationTask` once the `EventLoopFuture` of the promise fails. This required one allocation inside `whenFailure` and another inside `_whenComplete`. However, since we are passing the `cancellationTask` to `Scheduled` anyhow, and `Scheduled` is also the one that fails the promise from its `cancel()` method, we can simply store the `cancellationTask` inside `Scheduled` and call it directly from `cancel()` instead of going through the future.
Importantly, the `cancellationTask` is not allowed to retain the `ScheduledTask.task`; otherwise we would change the semantics and retain the `ScheduledTask.task` longer than necessary. My previous PR https://github.com/apple/swift-nio/pull/2010 already did the work to remove the retain from the `cancellationTask` closure, so we are good to store the `cancellationTask` inside `Scheduled` now.
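The shape of the change can be sketched with a simplified stand-in type (a hedged sketch, not NIO's actual `Scheduled` implementation; the `cancelled` flag stands in for failing the stored promise):

```swift
// Simplified sketch: `Scheduled` keeps the cancellation closure itself,
// so `cancel()` can invoke it directly instead of registering a
// `whenFailure` callback on the promise's future.
struct Scheduled {
    private let cancellationTask: () -> Void
    private var cancelled = false // stands in for failing the promise

    init(cancellationTask: @escaping () -> Void) {
        self.cancellationTask = cancellationTask
    }

    mutating func cancel() {
        guard !self.cancelled else { return }
        self.cancelled = true
        // The real implementation fails the stored promise here, then:
        self.cancellationTask()
    }
}
```

Because `cancel()` calls the stored closure directly, no callback needs to be allocated and attached to the future.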
### Result:
`scheduleTask` requires two fewer allocations
### Motivation:
In my previous PR https://github.com/apple/swift-nio/pull/2009, I added baseline performance and allocation tests around `scheduleTask` and `execute`. After analysing the various allocations that happen when scheduling a task, there were only a few that could potentially be optimized away.
### Modifications:
This PR converts the `ScheduledTask` class to a struct, which will reduce the number of allocations for scheduling tasks by one. The only thing that needs to be worked around when converting to a struct is giving it an identity so that we can implement `Equatable` conformance properly. I explored two options: first, using an `ObjectIdentifier` passed to the init; second, using an atomic counter per `EventLoop`. I went with the latter since the former requires an additional allocation in the case of calling `execute`.
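A hedged sketch of the chosen option, using a plain counter in place of the real per-`EventLoop` atomic (the type names here are illustrative, not NIO's actual code):

```swift
// Each task gets a unique ID from a counter owned by the loop, giving
// the struct an identity for Equatable without a class allocation.
struct ScheduledTaskStub: Equatable {
    let id: UInt64

    static func == (lhs: Self, rhs: Self) -> Bool {
        lhs.id == rhs.id
    }
}

final class LoopStub {
    // The real implementation uses an atomic; a plain integer is enough
    // for a single-threaded sketch.
    private var nextID: UInt64 = 0

    func makeTask() -> ScheduledTaskStub {
        defer { self.nextID += 1 }
        return ScheduledTaskStub(id: self.nextID)
    }
}
```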
### Result:
`scheduleTask` and `execute` require one fewer allocation
### Motivation:
In issue https://github.com/apple/swift-nio/issues/1316, we see a large number of allocations happening when scheduling tasks. This can definitely be optimized. This PR adds a number of baseline allocation and performance tests for both `scheduleTask` and `execute`. In the next PRs, I am going to try a few optimizations to reduce the number of allocations.
### Modifications:
Added baseline performance and allocation tests for `scheduleTask` and `execute`
Motivation:
To justify performance changes we need to measure the code being
changed. We believe that `HTTPHeaders.subscript(canonicalForm:)` is a
little slow.
Modifications:
- Add allocation and performance tests for fetching header values in
their canonical form
Results:
More benchmarks!
Motivation:
Peter thinks that result-erasing maps should not allocate, and we have
special code paths to try to make Void -> Void maps not allocate.
Sadly, both code paths currently do allocate.
Per our rule of not making optimizations without data, we
should start measuring these closures so we can make optimizations.
Modifications:
- Added an alloc counter test for result-erasing maps.
Result:
Alloc counter test suitable for any fix of #1697.
Motivation:
5.0 is several years old now, and while we still support it, there is
beginning to be some conflict with performance optimizations for newer
Swift versions. We should stop checking the allocation counts on this
version now: we no longer care as much about regressions.
Modifications:
- Stop setting alloc counter limits on 5.0
- Remove the script that tries to add them back.
Result:
No longer measure allocs on 5.0
Motivation:
We allocate quite a lot in our implementations of `flatMapThrowing` and `flatMapErrorThrowing`.
While we don't use futures much in NIO itself, a lot of our users depend quite a bit on them. Let's make their code faster.
Modifications:
Create a `Promise` and use `_whenComplete` directly instead of going through another `flatMap` method.
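The technique can be sketched with a toy single-threaded promise type (not NIO's `EventLoopFuture`/`EventLoopPromise`): the throwing transform completes a single new promise from one completion callback, rather than chaining through another `flatMap`:

```swift
// Toy single-threaded promise; callbacks run once a result arrives.
final class ToyPromise<Value> {
    private var result: Result<Value, Error>?
    private var callbacks: [(Result<Value, Error>) -> Void] = []

    func complete(_ result: Result<Value, Error>) {
        guard self.result == nil else { return }
        self.result = result
        self.callbacks.forEach { $0(result) }
        self.callbacks.removeAll()
    }

    func whenComplete(_ callback: @escaping (Result<Value, Error>) -> Void) {
        if let result = self.result {
            callback(result)
        } else {
            self.callbacks.append(callback)
        }
    }

    // One new promise, completed directly from a single callback,
    // instead of building an intermediate future chain.
    func flatMapThrowing<New>(_ body: @escaping (Value) throws -> New) -> ToyPromise<New> {
        let next = ToyPromise<New>()
        self.whenComplete { result in
            next.complete(result.flatMap { value in Result { try body(value) } })
        }
        return next
    }
}
```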
Result:
In my testing I see a reduction of 3 allocs per invocation 🎉
Motivation:
Due to https://bugs.swift.org/browse/SR-14516, we sometimes get
allocating (!?) `subscript.read` accessors in `CircularBuffer.first`,
depending on the `Element` type.
Modifications:
Implement `CircularBuffer.first` instead of inheriting it from
Collection.
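A minimal stand-in illustrating the idea (not NIO's `CircularBuffer`): the type provides its own `first` that reads the head slot directly, instead of relying on `Collection`'s default implementation:

```swift
// Tiny ring buffer over a fixed array; only what's needed to show `first`.
struct ToyRingBuffer<Element> {
    private var storage: [Element?]
    private var head = 0
    private var count = 0

    init(capacity: Int) {
        self.storage = Array(repeating: nil, count: capacity)
    }

    mutating func append(_ element: Element) {
        precondition(self.count < self.storage.count, "buffer full")
        self.storage[(self.head + self.count) % self.storage.count] = element
        self.count += 1
    }

    mutating func removeFirst() -> Element {
        precondition(self.count > 0, "buffer empty")
        let element = self.storage[self.head]!
        self.storage[self.head] = nil
        self.head = (self.head + 1) % self.storage.count
        self.count -= 1
        return element
    }

    // Implemented directly rather than inherited from Collection.
    var first: Element? {
        self.count == 0 ? nil : self.storage[self.head]
    }
}
```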
Result:
Fewer allocs in some cases.
Motivation:
Usually, we add a more or less random number of slack allocations to
make sure the tests don't spuriously fail. This makes it quite costly to
support new Swift versions.
Modifications:
Add a script which spits out the right allocation limits,
including slack.
Result:
Easier to support new Swift versions
Motivation:
Any version of ChannelHandler removal that does not have a
ChannelHandlerContext already in hand is currently excessively
expensive. This is because it allocates a promise and a callback for
finding the context, despite already having a promise in hand for users
to complete.
We can remove a pair of allocations here by jumping to the event loop
directly and then running our operations synchronously.
Modifications:
- Rewrite removeHandler(name:promise:) and removeHandler(_:promise:) to
jump directly to the event loops and then work synchronously.
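The pattern can be sketched with toy types (hypothetical names, not the actual `ChannelPipeline` code): hop to the loop once if needed, then do all the work synchronously and complete the user's promise directly, with no intermediate future for finding the context:

```swift
// Toy "event loop" that just runs work inline.
final class ToyLoop {
    var inEventLoop = true
    func execute(_ work: @escaping () -> Void) { work() }
}

struct ToyError: Error {}

final class ToyPipeline {
    let loop = ToyLoop()
    private var handlers = ["decoder", "encoder"]

    func removeHandler(named name: String,
                       promise: @escaping (Result<Void, Error>) -> Void) {
        // Jump to the loop directly and work synchronously there;
        // no extra promise/callback pair is allocated.
        if self.loop.inEventLoop {
            self.removeHandler0(named: name, promise: promise)
        } else {
            self.loop.execute { self.removeHandler0(named: name, promise: promise) }
        }
    }

    private func removeHandler0(named name: String,
                                promise: (Result<Void, Error>) -> Void) {
        if let index = self.handlers.firstIndex(of: name) {
            self.handlers.remove(at: index)
            promise(.success(()))
        } else {
            promise(.failure(ToyError()))
        }
    }
}
```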
Result:
Cheaper code
Motivation:
Allocation counter tests are good, and we aren't measuring handler removal today.
Modifications:
- Wrote some add/remove tests that use different remove functions.
Result:
Better insight into performance.
Motivation:
We recently added a synchronous view of the `ChannelPipeline` so that
callers can avoid allocating futures when they know they're on the right
event loop. We also offer convenience APIs to configure the pipeline for
particular use cases, like an HTTP/1 server, but we don't have
synchronous versions of these APIs yet. We should have parity
between the synchronous and asynchronous APIs where feasible.
Modifications:
- Add synchronous helpers to configure HTTP1 client and server pipelines
Result:
Callers can synchronously configure HTTP1 client and server pipelines.
Motivation:
We added synchronous pipeline operations to allow the caller to save
allocations when they know they are already on the correct event loop.
However, we missed a trick! In some cases the caller cannot guarantee
they are on the correct event loop and must use an asynchronous method
instead. If that method returns a void future and is called on the event
loop, then we can perform the operation synchronously and return a
cached void future.
Modifications:
- Add API to `EventLoop` for creating a 'completed' future with a
`Result` (similar to `EventLoopPromise.completeWith`)
- Add an equivalent for making completed void futures
- Use these when asynchronously adding handlers and the caller is
already on the right event loop.
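Both additions can be sketched with toy stand-in types (illustrative names, not NIO's actual API surface): completing a future from a `Result` immediately, plus a single cached succeeded `Void` future that the loop hands out repeatedly:

```swift
// Toy already-completed future.
final class ToyFuture<Value> {
    let result: Result<Value, Error>
    init(result: Result<Value, Error>) { self.result = result }
}

final class ToyEventLoop {
    // Allocated once, handed out every time it's requested.
    private lazy var succeededVoidFuture = ToyFuture<Void>(result: .success(()))

    // Make a future that is completed with the given Result up front.
    func makeCompletedFuture<Value>(_ result: Result<Value, Error>) -> ToyFuture<Value> {
        ToyFuture(result: result)
    }

    // Return the cached succeeded Void future instead of allocating.
    func makeSucceededVoidFuture() -> ToyFuture<Void> {
        self.succeededVoidFuture
    }
}
```

The identity check below is the point: every call returns the same cached instance, so the happy path allocates nothing.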
Result:
- Fewer allocations on the happiest of happy paths when adding handlers
asynchronously to a pipeline.
* Add synchronous channel options
Motivation:
The functions for getting and setting channel options are currently
asynchronous. This ensures that options are set and retrieved safely.
However, in some cases the caller knows they are on the correct event
loop but still has to pay the cost of allocating a future to either get
or set an option.
Modifications:
- Add a 'NIOSynchronousChannelOptions' protocol for getting and setting
options
- Add a customisation point to 'Channel' to return 'NIOSynchronousChannelOptions'.
- Default implementation returns nil so as to not break API.
- Add implementations for 'EmbeddedChannel' and 'BaseSocketChannel'
- Allocation tests for getting and setting autoRead
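A hedged sketch of the shape of such a protocol, using toy option types rather than NIO's `ChannelOption` (all names here are illustrative):

```swift
// Toy option marker types; the real API is generic over ChannelOption.
protocol ToyChannelOption { associatedtype Value }
struct AutoReadOption: ToyChannelOption { typealias Value = Bool }

struct UnsupportedOptionError: Error {}

// Synchronous getting/setting: plain throwing calls, no futures.
protocol ToySynchronousChannelOptions {
    func getOption<Option: ToyChannelOption>(_ option: Option) throws -> Option.Value
    func setOption<Option: ToyChannelOption>(_ option: Option, value: Option.Value) throws
}

// A toy channel that supports only autoRead; the force casts are safe
// because we check the option type first.
final class ToyChannel: ToySynchronousChannelOptions {
    private var autoRead = true

    func getOption<Option: ToyChannelOption>(_ option: Option) throws -> Option.Value {
        guard option is AutoReadOption else { throw UnsupportedOptionError() }
        return self.autoRead as! Option.Value
    }

    func setOption<Option: ToyChannelOption>(_ option: Option, value: Option.Value) throws {
        guard option is AutoReadOption else { throw UnsupportedOptionError() }
        self.autoRead = value as! Bool
    }
}
```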
Results:
Options can be retrieved and set synchronously.
Motivation:
`ChannelPipeline` is explicitly thread-safe: any of the operations may
be called from outside of the channel's event loop. However, there are
often cases where it is known that the caller is on the right event
loop, and an asynchronous API is unnecessary.
In some cases -- such as when a pipeline is configured dynamically and
handlers are added from the 'channelRead' implementation of one handler
-- it forces the caller to write code that they might not actually need:
such as buffering events which may happen before the future completes.
This is unnecessary complexity when the caller knows that they must
already be on an event loop.
Modifications:
- Add a 'SynchronousOperations' view to the 'ChannelPipeline' which is
available to callers via 'syncOperations'.
- Supported operations include: adding a handler, adding multiple
handlers, retrieving a context via various predicates and retrieving a
handler of a given type.
- Some of the operations in 'ChannelPipeline' were refactored to have an
explicitly synchronous version; asynchronous versions complete their
promise based on the result of these calls.
- Various minor documentation fixes and addition of 'self' where it was
not used explicitly.
Result:
Users can perform synchronous operations on the 'ChannelPipeline' if
they know they are on the right event loop.
Motivation:
Succeeded `EventLoopFuture<Void>`s are quite important in SwiftNIO; they
happen all over the place. Unfortunately, we usually allocate each time,
unnecessarily.
Modifications:
Offer `EventLoop`s the option to cache succeeded void futures.
Result:
Fewer allocations.
Motivation:
When you make a change that affects many performance tests, it's often
easier to just copy the results from CI. Unfortunately, that makes the
diff hard to read because the order is arbitrary.
Modifications:
Sort the list so you can always easily get to the same order as the
docker file by using `| sort` or `:sort` in vim.
Result:
Easier to update perf tests.
* Add allocation test for adding multiple handlers
Motivation:
I believe there is at least 1 avoidable allocation in this area.
Even if there isn't, making sure we don't increase allocations is good.
Modifications:
Add a test of allocations when adding multiple handlers.
Set limits for docker images.
Result:
Allocations when adding multiple handlers are now checked.
* Remove an allocation from addHandlers
Motivation:
Fewer allocations should improve performance.
Modifications:
Split out a sub function from addHandlers.
I originally thought I'd have to change the part of this
function which reads `var handlers = handlers` as there was
a surprising allocation at the beginning of this function.
It seems that breaking out some of the logic is sufficient
to remove an allocation.
Result:
1 fewer allocation.
* Fix up alloc tests.
Motivation:
When running load through EmbeddedChannel we spend an enormous amount of
time screwing around with removing things from Arrays. Arrays are not a
natural data type for `removeFirst()`, and in fact that method is
linear-time on Array due to the need for Array to be zero-indexed. Let's
stop using (and indeed misusing) Array on EmbeddedChannel.
While we're here, if we add some judicious @inlinable annotations we can
also save additional work generating results that users don't need.
Modifications:
- Replace arrays with circular buffers (including marked versions).
- Avoid CoWs and extra allocations on flush.
- Make some API methods inlinable to make them cheaper.
Result:
- Much cheaper EmbeddedChannel for benchmark purposes.
Motivation:
Currently, adding multiple channel handlers makes a call to the
async version of `addHandler` for each handler, resulting in
n+1 futures. It feels better to use just one future and add
all the handlers synchronously.
Modifications:
Change sync functions which take a promise to instead return a `Result`.
Feed this back until reaching addHandlers.
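The approach can be sketched with a toy pipeline (hypothetical names): the synchronous worker returns a `Result`, and `addHandlers` folds those results into one outcome instead of creating a future per handler:

```swift
struct DuplicateHandlerError: Error {}

final class ToyBatchPipeline {
    private(set) var handlers: [String] = []

    // Synchronous worker: returns a Result rather than taking a promise.
    private func addHandler0(_ name: String) -> Result<Void, Error> {
        guard !self.handlers.contains(name) else {
            return .failure(DuplicateHandlerError())
        }
        self.handlers.append(name)
        return .success(())
    }

    // Adds everything synchronously and feeds the Results back, so the
    // caller only ever needs one future for the whole batch.
    func addHandlers(_ names: [String]) -> Result<Void, Error> {
        for name in names {
            if case .failure(let error) = self.addHandler0(name) {
                return .failure(error)
            }
        }
        return .success(())
    }
}
```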
Result:
Multiple handlers can be added more quickly.
Motivation:
Ruby on Xenial is too old for the CocoaPods downloader which Jazzy uses.
Modifications:
Change the Dockerfile to not install Jazzy on Xenial.
Result:
Dockerfile will build again, but bionic or later is required for docs.
Motivation:
There is at least a theoretical race to flush before close in the prior version.
Having terrible code in the NIO repo is asking for someone to copy it.
Modifications:
flatMap the various parts of the client together, which also ensures the
flush is complete before close is called.
Result:
Slightly nicer code, slightly fewer allocations.
Motivation:
SO_REUSEADDR on Linux allows multiple bindings to the same port simultaneously. This has not been seen to happen in practice, but it is best to be cautious.
Modifications:
Remove address reuse from the UDP allocation tests.
Result:
Reduced danger of binding the same port twice.
Motivation:
We now have sufficiently many integration tests that the number of TCP
sockets in TIME_WAIT begins to cause resource pressure on the system.
This can lead to the integration tests failing due to being unable to
assign a socket.
We can force this to not happen by disabling linger. As we're running on
a lossless fabric (localhost), this is entirely safe.
Modifications:
- Disable TIME_WAIT for integration tests.
Result:
Integration tests will pass.
Co-authored-by: Johannes Weiss <johannesweiss@apple.com>
Co-authored-by: Peter Adams <pp_adams@apple.com>
Motivation:
The 1000 TCP connections allocation counter test did way too many thread
crossings and other things that allocated for unrelated reasons.
Modifications:
Remove those.
Result:
Much more stable
Motivation:
Allocation tests are very hard to use productively if they don't produce stable results.
Modifications:
Change TCP and UDP connection tests to monitor on the server side as well as the client side, to make sure all data is completely sent and received before we stop counting allocations.
Result:
Allocation tests are more stable than before.
Motivation:
There are multiple sub-optimal ByteBuffer creation patterns that occur
in the wild. Most often they happen when people don't actually have
access to a `Channel` and just want to "convert" a `String` into a
`ByteBuffer`. To do this, they are forced to type
```swift
var buffer = ByteBufferAllocator().buffer(capacity: string.utf8.count)
buffer.writeString(string)
```
Sometimes, they don't get the capacity calculation right or just put a
`0`.
Similar problems happen if NIO users want to cache a ByteBuffer in their
`ChannelHandler`. You will then find this code:
```swift
if self.buffer == nil {
    self.buffer = receivedBuffer
} else {
    var receivedBuffer = receivedBuffer
    self.buffer!.writeBuffer(&receivedBuffer)
}
```
And lastly, sometimes people want to append one `ByteBuffer` to another
without mutating the appendee. That's also cumbersome because we only
support a mutable version of `writeBuffer`.
Modifications:
- add `ByteBuffer` convenience initialisers
- add convenience `writeBuffer` methods to `Optional<ByteBuffer>`
- add `writeBufferImmutable` which doesn't mutate the appendee.
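The `Optional` convenience can be sketched with a stand-in buffer type (hypothetical names, not the real `ByteBuffer` API): writing into an optional buffer either adopts the incoming buffer or appends to the existing one, replacing the manual nil-check dance shown above:

```swift
// Stand-in for ByteBuffer, just enough to show the pattern.
struct ToyBuffer {
    var bytes: [UInt8]

    init(string: String) { self.bytes = Array(string.utf8) }

    mutating func writeBuffer(_ other: inout ToyBuffer) {
        self.bytes.append(contentsOf: other.bytes)
        other.bytes.removeAll() // mimic "reading" the other buffer
    }
}

extension Optional where Wrapped == ToyBuffer {
    // nil? adopt the incoming buffer. Otherwise append to what we have.
    mutating func setOrWriteBuffer(_ buffer: inout ToyBuffer) {
        switch self {
        case .none:
            self = buffer
            buffer.bytes.removeAll()
        case .some:
            self!.writeBuffer(&buffer)
        }
    }
}
```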
Result:
More convenience.
Co-authored-by: Cory Benfield <lukasa@apple.com>
Fine-grained UDP allocation tests
Motivation:
The existing UDP allocation tests count allocations everywhere.
Split out 3 tests to cover:
- Making connections.
- Making bootstraps.
- Transferring data.
Modifications:
Three new tests.
Result:
Easier to debug changes to allocation profiles in UDP.
Motivation:
The existing TCP allocation tests cover everything from bootstrap creation through to sending data in one massive test. The parameters for how many iterations to run control the focus of the test. It would be better to be more explicit about what we're trying to test.
Modifications:
2 new tests:
* client bootstrap creation.
* connection establishment.
(Data transfer is already covered by the ping_pong test)
Result:
Easier to assess the impact of changes to memory allocation patterns.
Motivation:
Values were previously missing (5.3) or incorrectly specified (5.1),
meaning that allocations were not checked for these Swift versions.
Modifications:
Correct the variable name in the 5.1 docker file and add the missing variables in the 5.3 docker files.
Result:
UDP allocations will be checked.
Co-authored-by: Cory Benfield <lukasa@apple.com>