Join two independent ops into one. A fast impl of one + a slow callback
of another. Here's an example showing optimized paths for latin-1 via
fast call and the next-best fallback using V8 apis.
```rust
#[op(v8)]
fn op_encoding_encode_into_fallback(
scope: &mut v8::HandleScope,
input: serde_v8::Value,
// ...
#[op(fast, slow = op_encoding_encode_into_fallback)]
fn op_encoding_encode_into(
input: Cow<'_, str>,
// ...
```
Benchmark results of the fallback path:
```
time target/release/deno run -A --unstable ./cli/tests/testdata/benches/text_encoder_into_perf.js
________________________________________________________
Executed in 70.90 millis fish external
usr time 57.76 millis 0.23 millis 57.53 millis
sys time 17.02 millis 1.28 millis 15.74 millis
target/release/deno_main run -A --unstable ./cli/tests/testdata/benches/text_encoder_into_perf.js
________________________________________________________
Executed in 154.00 millis fish external
usr time 67.14 millis 0.26 millis 66.88 millis
sys time 38.82 millis 1.47 millis 37.35 millis
```
Currently fast ops will always check for the alignment of a TypedArray
when getting a slice out of them. A match is then done to ensure that
some slice was received and if not a fallback will be requested.
For Uint8Arrays (and WasmMemory which is equivalent to a Uint8Array) the
alignment will always be okay. Rust probably optimises this away for the
most part (since the Uint8Array check is `x % 1 != 0`), but what it
cannot optimise away is the fast ops path's request for fallback options
parameter.
The extra parameter's cost is likely negligible but V8 will need to
check if a fallback was requested and prepare the fallback call just in
case it was. In the future the lack of a fallback may also enable V8 to
much better optimise the result handling.
For V8 created buffers, it seems like all buffers are actually always
guaranteed to be properly aligned: All buffers seem to always be created
8-byte aligned, and creating a 32 bit array or 64 bit array with a
non-aligned offset from an ArrayBuffer is not allowed. Unfortunately,
Deno FFI cannot give the same guarantees, and it is actually possible
for eg. 32 bit arrays to be created unaligned using it. These arrays
work fine (at least on Linux) so it seems like this is not illegal, it
just means that we cannot remove the alignment checking for 32 bit
arrays.
In Rust, it is UB if a slice is mutated while borrowed except through
the slice itself, and it is also UB if a mutable slice is read while
borrowed. The op macro allows borrowing an `ArrayBuffer{,View}` as a
memory slice for the duration of an op, but this is not sound for async
ops, since the `ArrayBuffer` could be accessed from JS during the await
points. This PR therefore disallows such automatic borrowing only for
async ops.
Co-authored-by: Divy Srivastava <dj.srivastava23@gmail.com>
Currently realms are supported on `deno_core`, but there was no support
for async ops anywhere other than the main realm. The main issue is that
the `js_recv_cb` callback, which resolves promises corresponding to
async ops, was only set for the main realm, so async ops in other realms
would never resolve. Furthermore, promise ID's are specific to each
realm, which meant that async ops from other realms would result in a
wrong promise from the main realm being resolved.
This change takes the `ContextState` struct added in #17050, and adds to
it a `js_recv_cb` callback for each realm. Combined with the fact that
that same PR also added a list of known realms to `JsRuntimeState`, and
that #17174 made `OpCtx` instances realm-specific and had them include
an index into that list of known realms, this makes it possible to know
the current realm in the `queue_async_op` and `queue_fast_async_op`
methods, and therefore to send the results of promises for each realm to
that realm, and prevent the ID's from getting mixed up.
Additionally, since promise ID's are no longer unique to the isolate,
having a single set of unrefed ops doesn't work. This change therefore
also moves `unrefed_ops` from `JsRuntimeState` to `ContextState`, and
adds the lengths of the unrefed op sets for all known realms to get the
total number of unrefed ops to compare in the event loop.
This PR is a reland of #14734 after it was reverted in #16366, except
that `ContextState` and `JsRuntimeState::known_realms` were previously
relanded in #17050. Another significant difference with the original PR
is passing around an index into `JsRuntimeState::known_realms` instead
of a `v8::Global<v8::Context>` to identify the realm, because async op
queuing in fast calls cannot call into V8, and therefore cannot have
access to V8 globals. This also simplified the implementation of
`resolve_async_ops`.
Co-authored-by: Luis Malheiro <luismalheiro@gmail.com>
Fixes https://github.com/denoland/deno/issues/16934
Example compiler error:
```
error: mutable opstate is not supported in async ops
--> core/ops_builtin.rs:122:1
|
122 | #[op]
| ^^^^^
|
= note: this error originates in the attribute macro `op` (in Nightly builds, run with -Z macro-backtrace for more info)
```
Uses SeqOneByteString optimization to do zero-copy `&str` arguments in
fast calls.
- [x] Depends on https://github.com/denoland/rusty_v8/pull/1129
- [x] Depends on
https://chromium-review.googlesource.com/c/v8/v8/+/4036884
- [x] Disable in async ops
- [x] Make it work with owned `String` with an extra alloc in fast path.
- [x] Support `Cow<'_, str>`. Owned for slow case, Borrowed for fast
case
```rust
#[op]
fn op_string_len(s: &str) -> u32 {
str.len() as u32
}
```
This PR introduces Wasm ops. These calls are optimized for entry from
Wasm land.
The `#[op(wasm)]` attribute is opt-in.
Last parameter `Option<&mut [u8]>` is the memory slice of the Wasm
module *when entered from a Fast API call*. Otherwise, the user is
expected to implement logic to obtain the memory if `None`
```rust
#[op(wasm)]
pub fn op_args_get(
offset: i32,
buffer_offset: i32,
memory: Option<&mut [u8]>,
) {
// ...
}
```
- [x] Avoid copying buffers.
https://encoding.spec.whatwg.org/#dom-textdecoder-decode
> Implementations are strongly encouraged to use an implementation
strategy that avoids this copy. When doing so they will have to make
sure that changes to input do not affect future calls to
[decode()](https://encoding.spec.whatwg.org/#dom-textdecoder-decode).
- [x] Special op to avoid string label deserialization and parsing.
(Ideally we should map labels to integers in JS)
- [x] Avoid webidl `Object.assign` when options is undefined.
Implements fast scheduling of deferred op futures.
```rs
#[op(fast)]
async fn op_read(
state: Rc<RefCell<OpState>>,
rid: ResourceId,
buf: &mut [u8],
) -> Result<u32, Error> {
// ...
}
```
The future is scheduled via a fast API call and polled by the event loop
after being woken up by its waker.
V8's JIT can do a better job knowing the argument count and also enable
fast call path (in future).
This also lets us call async ops without `opAsync`:
```js
const { ops } = Deno.core;
await ops.op_void_async();
```
this patch: 4405286 ops/sec
main: 3508771 ops/sec
example writeFile benchmark:
```
# before
time 188 ms rate 53191
time 168 ms rate 59523
time 167 ms rate 59880
time 166 ms rate 60240
time 168 ms rate 59523
time 173 ms rate 57803
time 183 ms rate 54644
# after
time 157 ms rate 63694
time 152 ms rate 65789
time 151 ms rate 66225
time 151 ms rate 66225
time 152 ms rate 65789
```