
Execution order changes benchmark results #3766

Open
snnsnn opened this issue Dec 24, 2024 · 9 comments · May be fixed by #3777

snnsnn commented Dec 24, 2024

Hi there! 👋

It appears that the execution order of items in the benchmark suite significantly impacts the reported outcomes. When running the same benchmark multiple times, the fastest library changes depending on the order in which the benchmarks are executed. Below are three separate runs of the handle-event benchmark, with varying results:

Run 1

$:hono/benchmarks/handle-event$ node index.mjs 
Hello hey
worktop x 140,112 ops/sec ±3.25% (88 runs sampled)
custom-implementation x 365,763 ops/sec ±2.15% (88 runs sampled)
Hono x 317,173 ops/sec ±4.31% (83 runs sampled)
itty-router x 321,088 ops/sec ±11.34% (72 runs sampled)
Fastest is custom-implementation

Run 2

$:hono/benchmarks/handle-event$ node index.mjs 
Hello hey
custom-implementation x 374,533 ops/sec ±1.89% (90 runs sampled)
worktop x 149,032 ops/sec ±1.58% (93 runs sampled)
Hono x 320,424 ops/sec ±4.13% (84 runs sampled)
itty-router x 307,371 ops/sec ±18.18% (72 runs sampled)
Fastest is custom-implementation

Run 3

$:hono/benchmarks/handle-event$ node index.mjs 
Hello hey
itty-router x 312,020 ops/sec ±17.62% (74 runs sampled)
custom-implementation x 105,017 ops/sec ±37.42% (36 runs sampled)
worktop x 16,306 ops/sec ±65.94% (30 runs sampled)
Hono x 82,493 ops/sec ±43.59% (34 runs sampled)
Fastest is itty-router

Libraries earlier in the execution order tend to perform better, while those executed later show significant performance degradation, likely due to resource constraints or runtime factors like garbage collection.

Could you consider implementing a solution to improve the accuracy and reliability of these benchmarks? It would ensure fair comparisons and provide more actionable performance insights.
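
For what it's worth, one way to expose the order sensitivity would be to shuffle the suite before each run. Below is only an illustrative sketch using Benchmark.js; the case functions are placeholders for the actual worktop/Hono/itty-router/custom dispatchers:

import Benchmark from 'benchmark'

// Placeholder cases; in the real suite each function would dispatch
// a request through the corresponding library.
const cases = {
  'worktop': () => {},
  'custom-implementation': () => {},
  'Hono': () => {},
  'itty-router': () => {},
}

// Fisher-Yates shuffle so each run registers the cases in a random order.
const names = Object.keys(cases)
for (let i = names.length - 1; i > 0; i--) {
  const j = Math.floor(Math.random() * (i + 1))
  const tmp = names[i]
  names[i] = names[j]
  names[j] = tmp
}

const suite = new Benchmark.Suite()
for (const name of names) suite.add(name, cases[name])

suite
  .on('cycle', (event) => console.log(String(event.target)))
  .on('complete', function () {
    console.log('Fastest is ' + this.filter('fastest').map('name'))
  })
  .run()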

Thank you!

yusukebe (Member) commented Dec 24, 2024

Hi @snnsnn

I can't reproduce this behavior on my machine (Apple M1 Pro, 32 GB RAM). From fastest to slowest, the results are Hono, sunder, itty-router, and worktop, and they stay the same even if I change the execution order.

EdamAme-x (Contributor) commented

I have previously asked the author of mitata about this issue. He said: "It is possible for the results to vary depending on the order of execution. However, that is in an environment with significantly less memory; in an environment with sufficient memory, most of the overhead is shaved off by GC, etc., so the results rarely change depending on the order."

snnsnn (Author) commented Dec 24, 2024

@yusukebe The runtime I used is node v20.16.0 (x64-linux) with 16 GB RAM, so it is probably GC. To run the benchmarks, I cd into the benchmarks/handle-event folder and run pnpm run start after installing the dependencies and modifying the files. I removed sunder because it is no longer maintained and had module-related issues.

It is probably GC-related: when I tried to run the benchmark with deno v2.1.4, it ran out of memory:

<--- JS stacktrace --->
#
# Fatal JavaScript out of memory: Reached heap limit
#
==== C stack trace ===============================

    deno(+0x2d8b133) [0x5c91184c4133]
    deno(+0x2d8a9fb) [0x5c91184c39fb]
    deno(+0x2d85f18) [0x5c91184bef18]
    deno(+0x2ddc1ac) [0x5c91185151ac]
    deno(+0x2f90467) [0x5c91186c9467]
    deno(+0x2f8e613) [0x5c91186c7613]
    deno(+0x2f85211) [0x5c91186be211]
    deno(+0x2f85ce2) [0x5c91186bece2]
    deno(+0x2f6b55a) [0x5c91186a455a]
    deno(+0x33e0d22) [0x5c9118b19d22]
    deno(+0x426ad36) [0x5c91199a3d36]
Trace/breakpoint trap (core dumped)

With bun, the benchmark failed to execute due to an error:

bun index.js 
64 |   }
65 | };
66 | var tryDecodeURI = (str) => tryDecode(str, decodeURI);
67 | var getPath = (request) => {
68 |   const url = request.url;
69 |   const start = url.indexOf("/", 8);
                     ^
TypeError: undefined is not an object (evaluating 'url.indexOf')
      at getPath (/home/user/hono/dist/utils/url.js:69:17)
      at #dispatch (/home/user/hono/dist/hono-base.js:170:18)
      at /home/user/hono/benchmarks/handle-event/index.js:131:19
      at fn (/home/user/hono/benchmarks/handle-event/index.js:130:12)
      at /home/user/hono/benchmarks/handle-event/index.js:143:1

Edit: Updating bun to v1.1.42 fixed the issue.

yusukebe (Member) commented

I am unable to reproduce this behavior, so I cannot do anything about this issue.

snnsnn (Author) commented Dec 26, 2024

Regarding the quote from the mitata author above: the handle-event benchmark uses benchmark, not mitata:

import Benchmark from 'benchmark'

snnsnn (Author) commented Dec 26, 2024

@yusukebe Perhaps these tests could be migrated to the latest version of mitata, which provides configurations to mitigate external factors like garbage collection (GC): Mitata#garbage-collection-pressure.
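
A minimal sketch of what the migration could look like (the case functions are stand-ins for the suite's actual dispatchers; on node, the gc hooks may require running with --expose-gc):

import { bench, run } from 'mitata'

// Stand-ins for the real per-library dispatch functions.
const libs = {
  'Hono': () => {},
  'itty-router': () => {},
  'worktop': () => {},
}

for (const [name, fn] of Object.entries(libs)) {
  // gc('inner') asks mitata to collect garbage before each iteration,
  // the knob documented under "garbage collection pressure".
  bench(name, fn).gc('inner')
}

await run()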

Additionally, using more realistic test cases and increasing the number of tests to better reflect real-world scenarios might yield more reliable results. I’m not sure if using tsx impacts the benchmarks, but running the tests against compiled outputs might help eliminate any inconsistencies.

On the topic of realistic data, some benchmarks use GitHub API paths for testing. For instance, this Rust-based example could provide useful paths: Matchit Benchmark. Extracting similar paths could enhance the relevance of the tests.

At the moment, I’m on a tight schedule and don’t have in-depth knowledge of Hono’s routers. I came across these tests while searching for a fast router implementation in JavaScript. Please feel free to close the issue or keep it open in case someone else decides to tackle these improvements.

Thank you for the great work on Hono!

Best regards

EdamAme-x linked a pull request Dec 26, 2024 that will close this issue
EdamAme-x (Contributor) commented

Hi @snnsnn, I sent a PR about this. Can you check it?

snnsnn (Author) commented Dec 26, 2024

The tests run successfully as PR'd, but the dependencies are not updated to their latest versions.

> hono-benchmark@0.0.1 start
> bun --experimental-specifier-resolution=node index.js

Hello hey
Hello hey
clk: ~4.18 GHz
cpu: AMD Ryzen 7 5800H with Radeon Graphics
runtime: bun 1.1.42 (x64-linux)

benchmark                   avg (min … max) p75   p99    (min … top 1%)
------------------------------------------- -------------------------------
Hello hey
Hello hey
Hono                           2.54 µs/iter   2.50 µs   █▂                 
                      (1.67 µs … 639.07 µs)   6.04 µs ▁███▅▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁
itty-router                    3.13 µs/iter   3.13 µs ▄█                   
                        (2.88 µs … 4.24 µs)   4.17 µs ██▆█▇▂▄▁▂▁▂▂▁▂▁▁▂▁▁▁▂
sunder                         3.17 µs/iter   3.24 µs    ▄ █▄▂             
                        (2.95 µs … 3.61 µs)   3.59 µs ▇▇▃█▇███▃▇▅▃▅▁▁▃▁▁▁▃▅
worktop                        4.66 µs/iter   4.79 µs  ▂  ▂▂█      ▂       
                        (4.29 µs … 5.77 µs)   5.24 µs ▇█▄▁███▄▇▁▇▁▁█▄▄▁▁▁▁▄

Updating the competing libraries to their latest versions would make the comparisons fairer and more accurate. However, updating them currently breaks the tests, leaving only Hono operational. Additionally, the start command can't be run with node v22.12.0 because of missing .js extensions in the import statements; the --experimental-specifier-resolution=node flag appears to be unsupported in this version.

> hono-benchmark@0.0.1 start
> bun --experimental-specifier-resolution=node index.js

Hello hey
clk: ~2.14 GHz
cpu: AMD Ryzen 7 5800H with Radeon Graphics
runtime: bun 1.1.42 (x64-linux)

benchmark                   avg (min … max) p75   p99    (min … top 1%)
------------------------------------------- -------------------------------
Hono                           2.57 µs/iter   2.65 µs ▃ █  ▂               
                        (2.37 µs … 3.36 µs)   3.23 µs █▇█▆▅█▇▇▂▄▂▁▁▁▁▁▁▁▁▂▂
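
As for the node failure, node's ESM resolver does not infer file extensions, so the relative imports would need to spell them out (the specifiers below are illustrative):

// bare specifiers still resolve through package.json
import { Hono } from 'hono'
// relative specifiers must include the '.js' extension under node ESM
import { router } from './router.js'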

To improve the benchmarks, it might help to clean up the test cases and run them in complete isolation. Using a different request and event for each case, along with mitata's gc('inner') to trigger garbage collection before every iteration, could lead to more consistent results. Using a single handler per case could also reduce variability between handlers and keep the benchmark focused on routing logic. Adding a more realistic number of routes would bring the tests closer to real-world scenarios; see https://github.com/DuskSystems/wayfind/tree/main/benches for routes taken from different websites.
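
For example, a rough sketch along those lines, assuming mitata and Hono (the route table and request URL are illustrative, not taken from the suite):

import { bench, run } from 'mitata'
import { Hono } from 'hono'

const app = new Hono()

// A larger, more realistic route table; a single shared handler keeps
// the measurement focused on routing rather than handler work.
const routes = [
  '/user/:id',
  '/user/:id/repos',
  '/repos/:owner/:repo',
  '/repos/:owner/:repo/issues/:number',
]
for (const path of routes) app.get(path, (c) => c.text('ok'))

bench('Hono', async () => {
  // a fresh Request per iteration keeps cases isolated
  const res = await app.fetch(new Request('http://localhost/repos/honojs/hono/issues/3766'))
  await res.text()
}).gc('inner') // collect garbage before each iteration

await run()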

EdamAme-x (Contributor) commented Dec 26, 2024

It may be necessary to move the benchmarks to a separate repository and link them with a submodule.
