Track sent items using SmallVec #1657

Draft · wants to merge 11 commits into main

Conversation

martinthomson
Member

DO NOT MERGE THIS - IT MAKES THINGS SLOWER

I thought that this was worth sharing though.

I was looking at the profiles that @KershawChang shared and noticed that write_stream_frame was eating a decent amount of CPU because there was an allocation in there. It took me a while to track it down, but it turns out that this is the first place that we add a RecoveryToken to the Vec of those that we hold in each SentPacket.
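
For illustration, here is a minimal sketch of the behaviour described above (placeholder types, not the actual neqo code): an empty Vec defers its heap allocation until the first push, so the first token added to a packet's token list is where the allocation shows up.

```rust
fn main() {
    // An empty Vec does not allocate; the first push is where the heap
    // allocation happens, which is what showed up in the profile.
    let mut tokens: Vec<u64> = Vec::new();
    assert_eq!(tokens.capacity(), 0);

    tokens.push(1); // heap buffer allocated here
    assert!(tokens.capacity() >= 1);
}
```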

The obvious thing to test in this case is SmallVec. SmallVec is designed for places where you usually add only a few items to a collection. It holds a small inline array into which values are stored; only when you add more items than the array can hold do the values spill onto the heap, just like a normal Vec. This means that if you stay within the array, you avoid a heap allocation, which can save a bunch of work. The cost is a little complexity, the extra inline storage, and the fact that the values need to move if you ever overflow the backing array. So if your prediction is wrong, it is more expensive.
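
As a rough sketch of the idea, assuming the smallvec crate (the RecoveryToken and SentPacket definitions below are simplified stand-ins, not the real neqo types):

```rust
use smallvec::SmallVec;

// Simplified stand-ins for the real types; only the collection choice matters here.
enum RecoveryToken {
    Stream(u64),
    HandshakeDone,
}

struct SentPacket {
    // Up to 4 tokens live inline inside the struct; adding a 5th spills the
    // whole collection onto the heap, just like a plain Vec from the start.
    tokens: SmallVec<[RecoveryToken; 4]>,
}

fn main() {
    let mut pkt = SentPacket { tokens: SmallVec::new() };
    pkt.tokens.push(RecoveryToken::Stream(0));
    pkt.tokens.push(RecoveryToken::HandshakeDone);
    // Still within the inline array, so no heap allocation has happened.
    assert!(!pkt.tokens.spilled());
}
```

As long as a packet carries no more tokens than the inline capacity, the collection never touches the heap; one more token moves everything into a heap buffer.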

I tried this out and it makes performance worse. With a backing array of size 1, the performance change isn't observable, but larger values progressively degrade performance: increasing the array size to 8 makes things worse than the value of 4 I started with. My theory is that the increased size of the base object is the problem. This collection is typically kept in a SentPacket, and our acknowledgement code needs extra time to move the larger struct around.
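
A quick way to see that size effect (the Token type is a made-up stand-in and the exact byte counts are platform-dependent, so this only illustrates the relative growth):

```rust
use std::mem::size_of;

use smallvec::SmallVec;

struct Token([u8; 16]); // hypothetical 16-byte token payload

fn main() {
    // A plain Vec is three words regardless of its contents; SmallVec grows
    // with its inline capacity, and so does any struct that embeds it.
    println!("Vec<Token>:           {} bytes", size_of::<Vec<Token>>());
    println!("SmallVec<[Token; 1]>: {} bytes", size_of::<SmallVec<[Token; 1]>>());
    println!("SmallVec<[Token; 4]>: {} bytes", size_of::<SmallVec<[Token; 4]>>());
    println!("SmallVec<[Token; 8]>: {} bytes", size_of::<SmallVec<[Token; 8]>>());
}
```

Every time a SentPacket moves by value, those extra inline bytes have to be copied along with it, which is where the acknowledgement path would pay the cost.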

The size of the performance regression suggests that there are some ways we could further improve how SentPacket is handled.

Benchmark results
Run multiple transfers with varying seeds
                        time:   [42.395 ms 42.547 ms 42.712 ms]
                        change: [+3.3754% +4.3870% +5.2776%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Run multiple transfers with the same seed
                        time:   [42.513 ms 42.584 ms 42.657 ms]
                        change: [+3.4443% +3.7777% +4.0976%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

@codecov-commenter

codecov-commenter commented Feb 16, 2024

Codecov Report

Attention: Patch coverage is 99.09910% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.93%. Comparing base (76630a5) to head (809926f).

❗ Current head 809926f differs from pull request most recent head 167816c. Consider uploading reports for the commit 167816c to get more accurate results.

| Files | Patch % | Lines |
|-------|---------|-------|
| neqo-transport/src/send_stream.rs | 92.30% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1657      +/-   ##
==========================================
- Coverage   93.08%   92.93%   -0.15%     
==========================================
  Files         117      119       +2     
  Lines       36422    37263     +841     
==========================================
+ Hits        33903    34632     +729     
- Misses       2519     2631     +112     

☔ View full report in Codecov by Sentry.

martinthomson and others added 2 commits February 19, 2024 22:15
Co-authored-by: Lars Eggert <[email protected]>
Signed-off-by: Martin Thomson <[email protected]>
@martinthomson
Member Author

@larseggert, I'm looking at the profiles that were generated here and it looks like these are off the main branch. The flamegraph for transfer shows calls into BTreeMap, but I don't see any way that could happen.

@larseggert
Collaborator

I can't get to it this week. Wonder if the checkout action needs another parameter?

@larseggert
Collaborator

Let's see what the new flamegraphs show (also because LTO is now enabled).


Benchmark results

Performance differences relative to 76630a5.

  • drain a timer quickly time: [395.29 ns 402.55 ns 409.33 ns]
    change: [-0.6787% +0.9859% +2.7112%] (p = 0.26 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1+1 entries
    time: [193.97 ns 194.36 ns 194.79 ns]
    change: [-0.5237% -0.1538% +0.2215%] (p = 0.41 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 3+1 entries
    time: [235.93 ns 236.55 ns 237.18 ns]
    change: [-0.8063% -0.4103% -0.0496%] (p = 0.04 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 10+1 entries
    time: [235.78 ns 236.59 ns 237.55 ns]
    change: [-0.4030% +3.6386% +13.911%] (p = 0.42 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1000+1 entries
    time: [217.27 ns 222.01 ns 232.88 ns]
    change: [+1.4742% +5.6444% +13.185%] (p = 0.06 > 0.05)
    No change in performance detected.

  • RxStreamOrderer::inbound_frame()
    time: [117.57 ms 117.65 ms 117.73 ms]
    change: [+0.0927% +0.1982% +0.2949%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • transfer/Run multiple transfers with varying seeds
    time: [124.35 ms 124.61 ms 124.88 ms]
    thrpt: [32.031 MiB/s 32.099 MiB/s 32.169 MiB/s]
    change:
    time: [+5.4576% +5.7824% +6.1037%] (p = 0.00 < 0.05)
    thrpt: [-5.7526% -5.4664% -5.1751%]
    💔 Performance has regressed.

  • transfer/Run multiple transfers with the same seed
    time: [124.82 ms 125.00 ms 125.18 ms]
    thrpt: [31.953 MiB/s 32.000 MiB/s 32.047 MiB/s]
    change:
    time: [+5.2953% +5.5022% +5.6981%] (p = 0.00 < 0.05)
    thrpt: [-5.3909% -5.2152% -5.0290%]
    💔 Performance has regressed.

Client/server transfer results

Transfer of 134217728 bytes over loopback.

| Client | Server | CC    | Pacing | Mean [ms]      | Min [ms] | Max [ms] | Relative |
|:-------|:-------|:------|:-------|---------------:|---------:|---------:|---------:|
| msquic | msquic |       |        | 428.3 ± 58.8   | 359.7    | 538.1    | 1.00     |
| neqo   | msquic | reno  | on     | 1921.6 ± 35.0  | 1850.3   | 1960.9   | 1.00     |
| neqo   | msquic | reno  |        | 1917.7 ± 30.3  | 1857.9   | 1959.7   | 1.00     |
| neqo   | msquic | cubic | on     | 1898.8 ± 56.6  | 1775.9   | 1971.2   | 1.00     |
| neqo   | msquic | cubic |        | 1901.9 ± 46.7  | 1811.5   | 1960.3   | 1.00     |
| msquic | neqo   | reno  | on     | 4418.4 ± 170.0 | 4222.4   | 4692.8   | 1.00     |
| msquic | neqo   | reno  |        | 4510.8 ± 232.7 | 4225.0   | 4969.3   | 1.00     |
| msquic | neqo   | cubic | on     | 4475.1 ± 154.6 | 4259.7   | 4672.4   | 1.00     |
| msquic | neqo   | cubic |        | 4434.9 ± 168.7 | 4204.7   | 4705.9   | 1.00     |
| neqo   | neqo   | reno  | on     | 3664.9 ± 172.2 | 3387.1   | 3957.8   | 1.00     |
| neqo   | neqo   | reno  |        | 3805.0 ± 275.7 | 3529.5   | 4374.1   | 1.00     |
| neqo   | neqo   | cubic | on     | 4329.7 ± 141.6 | 4114.5   | 4523.6   | 1.00     |
| neqo   | neqo   | cubic |        | 4372.4 ± 141.9 | 4184.3   | 4553.5   | 1.00     |

⬇️ Download logs
