About 2% improvement on WS/HTTP benchmarks, possibly unlocking more
optimizations in the future.
---------
Co-authored-by: Matt Mastracci <matthew@mastracci.com>
This commit changes "eager ops" to directly return a response value
instead of calling "opresponse" callback in JavaScript. This saves
one boundary crossing and has a fantastic impact on the "async_ops.js"
benchmark:
```
v1.32.4
$ deno run cli/bench/async_ops.js
time 329 ms rate 3039513
time 322 ms rate 3105590
time 307 ms rate 3257328
time 301 ms rate 3322259
time 303 ms rate 3300330
time 306 ms rate 3267973
time 300 ms rate 3333333
time 301 ms rate 3322259
time 301 ms rate 3322259
time 301 ms rate 3322259
time 302 ms rate 3311258
time 301 ms rate 3322259
time 302 ms rate 3311258
time 302 ms rate 3311258
time 303 ms rate 3300330
```
```
this branch
$ ./target/release/deno run -A cli/bench/async_ops.js
time 257 ms rate 3891050
time 248 ms rate 4032258
time 251 ms rate 3984063
time 246 ms rate 4065040
time 238 ms rate 4201680
time 227 ms rate 4405286
time 228 ms rate 4385964
time 229 ms rate 4366812
time 228 ms rate 4385964
time 226 ms rate 4424778
time 226 ms rate 4424778
time 227 ms rate 4405286
time 228 ms rate 4385964
time 227 ms rate 4405286
time 228 ms rate 4385964
time 227 ms rate 4405286
time 229 ms rate 4366812
time 228 ms rate 4385964
```
Prerequisite for https://github.com/denoland/deno/pull/18652
- bump deps: the newest `lazy-regex` need newer `oncecell` and
`regex`
- reduce `unwrap`
- remove dep `lazy_static`
- make more regex cached
---------
Co-authored-by: Bartek Iwańczuk <biwanczuk@gmail.com>
This commit changes the build process in a way that preserves already
registered ops in the snapshot. This allows us to skip creating hundreds of
"v8::String" on each startup, but sadly there is still some op registration
going on startup (however we're registering 49 ops instead of >200 ops).
This situation could be further improved, by moving some of the ops
from "runtime/" to a separate extension crates.
---------
Co-authored-by: Divy Srivastava <dj.srivastava23@gmail.com>
Join two independent ops into one. A fast impl of one + a slow callback
of another. Here's an example showing optimized paths for latin-1 via
fast call and the next-best fallback using V8 apis.
```rust
#[op(v8)]
fn op_encoding_encode_into_fallback(
scope: &mut v8::HandleScope,
input: serde_v8::Value,
// ...
#[op(fast, slow = op_encoding_encode_into_fallback)]
fn op_encoding_encode_into(
input: Cow<'_, str>,
// ...
```
Benchmark results of the fallback path:
```
time target/release/deno run -A --unstable ./cli/tests/testdata/benches/text_encoder_into_perf.js
________________________________________________________
Executed in 70.90 millis fish external
usr time 57.76 millis 0.23 millis 57.53 millis
sys time 17.02 millis 1.28 millis 15.74 millis
target/release/deno_main run -A --unstable ./cli/tests/testdata/benches/text_encoder_into_perf.js
________________________________________________________
Executed in 154.00 millis fish external
usr time 67.14 millis 0.26 millis 66.88 millis
sys time 38.82 millis 1.47 millis 37.35 millis
```
Currently fast ops will always check for the alignment of a TypedArray
when getting a slice out of them. A match is then done to ensure that
some slice was received and if not a fallback will be requested.
For Uint8Arrays (and WasmMemory which is equivalent to a Uint8Array) the
alignment will always be okay. Rust probably optimises this away for the
most part (since the Uint8Array check is `x % 1 != 0`), but what it
cannot optimise away is the fast ops path's request for fallback options
parameter.
The extra parameter's cost is likely negligible but V8 will need to
check if a fallback was requested and prepare the fallback call just in
case it was. In the future the lack of a fallback may also enable V8 to
much better optimise the result handling.
For V8 created buffers, it seems like all buffers are actually always
guaranteed to be properly aligned: All buffers seem to always be created
8-byte aligned, and creating a 32 bit array or 64 bit array with a
non-aligned offset from an ArrayBuffer is not allowed. Unfortunately,
Deno FFI cannot give the same guarantees, and it is actually possible
for eg. 32 bit arrays to be created unaligned using it. These arrays
work fine (at least on Linux) so it seems like this is not illegal, it
just means that we cannot remove the alignment checking for 32 bit
arrays.
In Rust, it is UB if a slice is mutated while borrowed except through
the slice itself, and it is also UB if a mutable slice is read while
borrowed. The op macro allows borrowing an `ArrayBuffer{,View}` as a
memory slice for the duration of an op, but this is not sound for async
ops, since the `ArrayBuffer` could be accessed from JS during the await
points. This PR therefore disallows such automatic borrowing only for
async ops.
Co-authored-by: Divy Srivastava <dj.srivastava23@gmail.com>
Currently realms are supported on `deno_core`, but there was no support
for async ops anywhere other than the main realm. The main issue is that
the `js_recv_cb` callback, which resolves promises corresponding to
async ops, was only set for the main realm, so async ops in other realms
would never resolve. Furthermore, promise ID's are specific to each
realm, which meant that async ops from other realms would result in a
wrong promise from the main realm being resolved.
This change takes the `ContextState` struct added in #17050, and adds to
it a `js_recv_cb` callback for each realm. Combined with the fact that
that same PR also added a list of known realms to `JsRuntimeState`, and
that #17174 made `OpCtx` instances realm-specific and had them include
an index into that list of known realms, this makes it possible to know
the current realm in the `queue_async_op` and `queue_fast_async_op`
methods, and therefore to send the results of promises for each realm to
that realm, and prevent the ID's from getting mixed up.
Additionally, since promise ID's are no longer unique to the isolate,
having a single set of unrefed ops doesn't work. This change therefore
also moves `unrefed_ops` from `JsRuntimeState` to `ContextState`, and
adds the lengths of the unrefed op sets for all known realms to get the
total number of unrefed ops to compare in the event loop.
This PR is a reland of #14734 after it was reverted in #16366, except
that `ContextState` and `JsRuntimeState::known_realms` were previously
relanded in #17050. Another significant difference with the original PR
is passing around an index into `JsRuntimeState::known_realms` instead
of a `v8::Global<v8::Context>` to identify the realm, because async op
queuing in fast calls cannot call into V8, and therefore cannot have
access to V8 globals. This also simplified the implementation of
`resolve_async_ops`.
Co-authored-by: Luis Malheiro <luismalheiro@gmail.com>
Fixes https://github.com/denoland/deno/issues/16934
Example compiler error:
```
error: mutable opstate is not supported in async ops
--> core/ops_builtin.rs:122:1
|
122 | #[op]
| ^^^^^
|
= note: this error originates in the attribute macro `op` (in Nightly builds, run with -Z macro-backtrace for more info)
```
Uses SeqOneByteString optimization to do zero-copy `&str` arguments in
fast calls.
- [x] Depends on https://github.com/denoland/rusty_v8/pull/1129
- [x] Depends on
https://chromium-review.googlesource.com/c/v8/v8/+/4036884
- [x] Disable in async ops
- [x] Make it work with owned `String` with an extra alloc in fast path.
- [x] Support `Cow<'_, str>`. Owned for slow case, Borrowed for fast
case
```rust
#[op]
fn op_string_len(s: &str) -> u32 {
str.len() as u32
}
```