Say Hello to 1-Based indexing in JavaScript with a 3-line change in V8 engine!
If you like my work, you can sponsor me on GitHub!. That helps me sharing more of this content
One of my favorite things in programming is extend functionalities and bisect the whole system. Today I’m publishing the result of my latest experiment with the V8 interpreter.
Indexing by 0
is something that when I was beginning as a programmer never made a lot of sense. In my brain, 1
was always the first element, at least my intuition was considering that as logical sense. But years later, after looking at pointer arithmetic and its implications in arrays you may get why 0
is the beginning (I’m gonna write more deeply about that later).
I made some changes in V8 to make start reading from an array at 1
instead of 0
.
The V8 workflow is:
- Read the source code.
- Parse that source code and convert it into a abstract syntax tree (if valid JS or throws if invalid JS).
- Generate a bytecode from that abstract syntax tree.
- -Optional, may or may not happen - Optimize that code, make it a machine code instead.
In this experiment I wanted to change the interpretation of the expression array[1]
to be ran as it was array[0]
. Thankfully V8 allow us to print the byte code of each function in JS.
I’ve created a simple script that will return the max number in two provided args.
function getMax (a,b) {
return Math.max(a,b);
}
console.log(getMax(process.argv[2], process.argv[3]))
I can ask V8 for the bytecode for that function, even using Node.js, by running that piece of code with the next flags:
--print-bytecode
for printing the bytecode.--print-bytecode-filter="getMax"
and just print ourgetMax
function.
So doing a simple node --print-bytecode --print-bytecode-filter="getMax" /tmp/foo.js 1 2
I’ll run that function comparing 1
and 2
, feel free to change. That will output something like this.
[generated bytecode for function: getMax (0x265690714ce1 <SharedFunctionInfo getMax>)]
Bytecode length: 16
Parameter count 3
Register count 2
Frame size 16
Bytecode age: 0
26 S> 0x265690715d7e @ 0 : 21 00 00 LdaGlobal [0], [0]
0x265690715d81 @ 3 : c3 Star1
38 E> 0x265690715d82 @ 4 : 2d f9 01 02 GetNamedProperty r1, [1], [2]
0x265690715d86 @ 8 : c4 Star0
38 E> 0x265690715d87 @ 9 : 5f fa f9 03 04 04 CallProperty2 r0, r1, a0, a1, [4]
47 S> 0x265690715d8d @ 15 : a9 Return
Constant pool (size = 2)
0x265690715d29: [FixedArray] in OldSpace
- map: 0x30cd62300211 <Map(FIXED_ARRAY_TYPE)>
- length: 2
0: 0x07392aed6c91 <String[4]: #Math>
1: 0x07392aed5f69 <String[3]: #max>
Handler Table (size = 0)
Source Position Table (size = 10)
0x265690715d91 <ByteArray[10]>
We can split that into simpler chunks.
[generated bytecode for function: getMax (0x265690714ce1 <SharedFunctionInfo getMax>)]
Bytecode length: 16
Parameter count 3
Register count 2
Frame size 16
Bytecode age: 0
That is telling us that log corresponds to the getMax
function, as we specify it in the flags.
The bottom chunk is a bit more different
Constant pool (size = 2)
0x265690715d29: [FixedArray] in OldSpace
- map: 0x30cd62300211 <Map(FIXED_ARRAY_TYPE)>
- length: 2
0: 0x07392aed6c91 <String[4]: #Math>
1: 0x07392aed5f69 <String[3]: #max>
Handler Table (size = 0)
Source Position Table (size = 10)
0x265690715d91 <ByteArray[10]>
It is a constant pool (as specified), but look, we have there our global Math
and its method max
loaded. Both indexed, Math
is located at 0
and max
at 1
.
The interesting chunk is the bytecode itself.
26 S> 0x265690715d7e @ 0 : 21 00 00 LdaGlobal [0], [0]
0x265690715d81 @ 3 : c3 Star1
38 E> 0x265690715d82 @ 4 : 2d f9 01 02 GetNamedProperty r1, [1], [2]
0x265690715d86 @ 8 : c4 Star0
38 E> 0x265690715d87 @ 9 : 5f fa f9 03 04 04 CallProperty2 r0, r1, a0, a1, [4]
47 S> 0x265690715d8d @ 15 : a9 Return
Let’s look at the first instruction of that bytecode.
LdaGlobal [0], [0]
It says, LdaGlobal
or “Load Global at 0”, which is Math
global, according to the constant pool. The [0]
after is something for code caching, a feedback vector -not covered in this entry-. We’ll just ignore that for now.
The next two instructions are:
Star1
GetNamedProperty r1, [1], [2]
The Star1
means, store that Math
constant into register 1.
The GetNamedProperty r1, [1]
means, load the property of Math
(or the value of register 1 in this case) that corresponds the constant 1
in our constant pool, which is the max
function.
The next chunk is:
Star0
CallProperty2 r0, r1, a0, a1, [4]
Return
The Star0
is known now, that will store Math.max
into register 0.
The CallProperty2
, it is a little more complicated structure; I’ll put the source code.
void BytecodeGraphBuilder::VisitCallProperty2() {
Node* callee =
environment()->LookupRegister(bytecode_iterator().GetRegisterOperand(0));
Node* receiver =
environment()->LookupRegister(bytecode_iterator().GetRegisterOperand(1));
Node* arg0 =
environment()->LookupRegister(bytecode_iterator().GetRegisterOperand(2));
Node* arg1 =
environment()->LookupRegister(bytecode_iterator().GetRegisterOperand(3));
int const slot_id = bytecode_iterator().GetIndexOperand(4);
BuildCall(ConvertReceiverMode::kNotNullOrUndefined,
{callee, receiver, arg0, arg1, feedback_vector_node()}, slot_id);
}
So… by having this bytecode instruction CallProperty2 r0, r1, a0, a1, [4]
it means that r0
is the callee (function execution), in this point, r0
is max
function. The receiver is Math
as the r1
is the “loaded global Math
”, and the a0
and a1
corresponds to both numbers provided.
So in our case the expression CallProperty2 r0, r1, a0, a1, [4]
could means something like “Invoke/call Math.max with the two provided arguments", but the next instruction is a simple Return
. And yes, that will return the value obtained from Math.max(a,b)
.
We have read the whole bytecode now!
LdaGlobal [0], [0]
Star1
GetNamedProperty r1, [1], [2]
Star0
CallProperty2 r0, r1, a0, a1, [4]
Return
Constant pool (size = 2)
0x265690715d29: [FixedArray] in OldSpace
- map: 0x30cd62300211 <Map(FIXED_ARRAY_TYPE)>
- length: 2
0: 0x07392aed6c91 <String[4]: #Math>
1: 0x07392aed5f69 <String[3]: #max>
Handler Table (size = 0)
Source Position Table (size = 10)
0x265690715d91 <ByteArray[10]>
Now, let’s explore this JavaScript snippet that will return the index 3 of a [1,2,3,4,5]
array, which is the number 4
function getNumberInIndex3 () {
const numbers = [1, 2, 3, 4, 5]
return numbers[3]
}
console.log(getNumberInIndex3())
Let’s print the getNumberInIndex3
's bytecode and just focus on that.
node --print-bytecode --print-bytecode-filter="getNumberInIndex3" /tmp/foo.js
CreateArrayLiteral [0], [0], #37
Star0
LdaSmi [3]
GetKeyedProperty r0, [1]
Return
We have three new bytecodes:
CreateArrayLiteral
will create an array from our codebase, literally[1, 2, 3, 4, 5]
LdaSmi [3]
which means load small integer 3GetKeyedProperty r0
will load not a named key, but a indexed key.
So that byte code will create an array, and load the index 3.
That is fine, as we are expecting V8 to read that as 0-based indexing. Our next goal is modify the interpreter to by 1-based indexing instead. That is as simple as decreasing every index we try to load from the array. So if numbers[3]
is provided, the interpreter should be reading numbers[2]
. Our task is simple here, just decrease 1
in the requested index.
Every single time, before calling a GetKeyedProperty
, just decrease in 1
the loaded integer. Thankfully there is a simple bytecode for that called Dec
, we need to inject that function inside the GetKeyedProperty
producer. Let’s see that producer’s source code.
BytecodeArrayBuilder& BytecodeArrayBuilder::LoadKeyedProperty(
Register object, int feedback_slot) {
OutputGetKeyedProperty(object, feedback_slot);
return *this;
}
As said, the task is simple, decrease by one the loaded integer, by appending this chunk, we will achieve that behaviour.
BytecodeArrayBuilder& BytecodeArrayBuilder::LoadKeyedProperty(
Register object, int feedback_slot) {
{
// The 0 is for the feedback vector, we don't care about that.
OutputDec(0); // Decrement the loaded integer first.
}
OutputGetKeyedProperty(object, feedback_slot);
return *this;
}
After recompiling and running that same snippet, we are expecting the returned value to be 3
, instead of 4
as in 1-based indexing array in [1,2,3,4,5]
, the expression numbers[3]
must be 3
.
Let’s run it!
➜ ./out/arm64.debug/d8 --print-bytecode --print-bytecode-filter="getNumberInIndex3" /tmp/foo.js
[generated bytecode for function: getNumberInIndex3 (0x2f9100298509 <SharedFunctionInfo getNumberInIndex3>)]
Bytecode length: 13
Parameter count 1
Register count 1
Frame size 8
0x3099000400a8 @ 0 : 7e 00 00 25 CreateArrayLiteral [0], [0], #37
0x3099000400ac @ 4 : ca Star0
0x3099000400ad @ 5 : 0d 03 LdaSmi [3]
0x3099000400af @ 7 : 54 00 Dec [0]
0x3099000400b1 @ 9 : 31 f9 01 GetKeyedProperty r0, [1]
0x3099000400b4 @ 12 : af Return
Constant pool (size = 1)
0x309900040075: [TrustedFixedArray]
- map: 0x2f9100000595 <Map(TRUSTED_FIXED_ARRAY_TYPE)>
- length: 1
0: 0x2f9100298651 <ArrayBoilerplateDescription PACKED_SMI_ELEMENTS, 0x2f9100298635 <FixedArray[5]>>
Handler Table (size = 0)
Source Position Table (size = 0)
3
👆 we made it, we got a 3
as a return value and the Dec [0]
bytecode is now part of the interpretation of that snippet. Now we can guarantee that every time we try to access an index; it will be decremented by one; making the syntax follow the 1-based indexing rule but the storage under the hood is the same!
With a 3-line change in V8 source code, now JavaScript follows 1-based indexing in arrays!
You can be my companion on this road; I post on Twitter (yes, Twitter), read more of my blog, and visit my website.
If you like my work, you can sponsor me on GitHub!. That helps me sharing more of this content
Side notes:
- This blog entry was heavily inspired by the amazing Franziska Hinkelmann’s blogpost about V8’s bytecode, go read it! It is incredible!
- For the last run, I used d8, changing V8 and re-compile is easier than changing and link it to Node.js.
- This is just gonna work
arrAt[index]
. - This is experimental, this could potentially break something else, somewhere, somehow.