Juan José

Say Hello to 1-based indexing in JavaScript!

Juan Arboleda, June 24, 202412 minutes readingin v8 ,javascript

Say Hello to 1-Based indexing in JavaScript with a 3-line change in V8 engine!

If you like my work, you can sponsor me on GitHub!. That helps me sharing more of this content

One of my favorite things in programming is extend functionalities and bisect the whole system. Today I’m publishing the result of my latest experiment with the V8 interpreter.

Indexing by 0 is something that when I was beginning as a programmer never made a lot of sense. In my brain, 1 was always the first element, at least my intuition was considering that as logical sense. But years later, after looking at pointer arithmetic and its implications in arrays you may get why 0 is the beginning (I’m gonna write more deeply about that later).

I made some changes in V8 to make start reading from an array at 1 instead of 0.

The V8 workflow is:

  1. Read the source code.
  2. Parse that source code and convert it into a abstract syntax tree (if valid JS or throws if invalid JS).
  3. Generate a bytecode from that abstract syntax tree.
  4. -Optional, may or may not happen - Optimize that code, make it a machine code instead.

In this experiment I wanted to change the interpretation of the expression array[1] to be ran as it was array[0] . Thankfully V8 allow us to print the byte code of each function in JS.

I’ve created a simple script that will return the max number in two provided args.

function getMax (a,b) {
  return Math.max(a,b);
}

console.log(getMax(process.argv[2], process.argv[3]))

I can ask V8 for the bytecode for that function, even using Node.js, by running that piece of code with the next flags:

  1. --print-bytecode for printing the bytecode.
  2. --print-bytecode-filter="getMax" and just print our getMax function.

So doing a simple node --print-bytecode --print-bytecode-filter="getMax" /tmp/foo.js 1 2 I’ll run that function comparing 1 and 2, feel free to change. That will output something like this.

[generated bytecode for function: getMax (0x265690714ce1 <SharedFunctionInfo getMax>)]
Bytecode length: 16
Parameter count 3
Register count 2
Frame size 16
Bytecode age: 0
   26 S> 0x265690715d7e @    0 : 21 00 00          LdaGlobal [0], [0]
         0x265690715d81 @    3 : c3                Star1
   38 E> 0x265690715d82 @    4 : 2d f9 01 02       GetNamedProperty r1, [1], [2]
         0x265690715d86 @    8 : c4                Star0
   38 E> 0x265690715d87 @    9 : 5f fa f9 03 04 04 CallProperty2 r0, r1, a0, a1, [4]
   47 S> 0x265690715d8d @   15 : a9                Return
Constant pool (size = 2)
0x265690715d29: [FixedArray] in OldSpace
 - map: 0x30cd62300211 <Map(FIXED_ARRAY_TYPE)>
 - length: 2
           0: 0x07392aed6c91 <String[4]: #Math>
           1: 0x07392aed5f69 <String[3]: #max>
Handler Table (size = 0)
Source Position Table (size = 10)
0x265690715d91 <ByteArray[10]>

We can split that into simpler chunks.

[generated bytecode for function: getMax (0x265690714ce1 <SharedFunctionInfo getMax>)]
Bytecode length: 16
Parameter count 3
Register count 2
Frame size 16
Bytecode age: 0

That is telling us that log corresponds to the getMax function, as we specify it in the flags.

The bottom chunk is a bit more different

Constant pool (size = 2)
0x265690715d29: [FixedArray] in OldSpace
 - map: 0x30cd62300211 <Map(FIXED_ARRAY_TYPE)>
 - length: 2
           0: 0x07392aed6c91 <String[4]: #Math>
           1: 0x07392aed5f69 <String[3]: #max>
Handler Table (size = 0)
Source Position Table (size = 10)
0x265690715d91 <ByteArray[10]>

It is a constant pool (as specified), but look, we have there our global Math and its method max loaded. Both indexed, Math is located at 0 and max at 1.

The interesting chunk is the bytecode itself.

   26 S> 0x265690715d7e @    0 : 21 00 00          LdaGlobal [0], [0]
         0x265690715d81 @    3 : c3                Star1
   38 E> 0x265690715d82 @    4 : 2d f9 01 02       GetNamedProperty r1, [1], [2]
         0x265690715d86 @    8 : c4                Star0
   38 E> 0x265690715d87 @    9 : 5f fa f9 03 04 04 CallProperty2 r0, r1, a0, a1, [4]
   47 S> 0x265690715d8d @   15 : a9                Return

Let’s look at the first instruction of that bytecode.

LdaGlobal [0], [0]

It says, LdaGlobal or “Load Global at 0”, which is Math global, according to the constant pool. The [0] after is something for code caching, a feedback vector -not covered in this entry-. We’ll just ignore that for now.

The next two instructions are:

Star1
GetNamedProperty r1, [1], [2]

The Star1 means, store that Math constant into register 1.

The GetNamedProperty r1, [1] means, load the property of Math (or the value of register 1 in this case) that corresponds the constant 1 in our constant pool, which is the max function.

The next chunk is:

Star0
CallProperty2 r0, r1, a0, a1, [4]
Return

The Star0 is known now, that will store Math.max into register 0.

The CallProperty2 , it is a little more complicated structure; I’ll put the source code.

void BytecodeGraphBuilder::VisitCallProperty2() {
  Node* callee =
      environment()->LookupRegister(bytecode_iterator().GetRegisterOperand(0));
  Node* receiver =
      environment()->LookupRegister(bytecode_iterator().GetRegisterOperand(1));
  Node* arg0 =
      environment()->LookupRegister(bytecode_iterator().GetRegisterOperand(2));
  Node* arg1 =
      environment()->LookupRegister(bytecode_iterator().GetRegisterOperand(3));
  int const slot_id = bytecode_iterator().GetIndexOperand(4);
  BuildCall(ConvertReceiverMode::kNotNullOrUndefined,
            {callee, receiver, arg0, arg1, feedback_vector_node()}, slot_id);
}

Source: https://github.com/v8/v8/blob/8b52e5f026dc859cab827ec584f3eedbc0510459/src/compiler/bytecode-graph-builder.cc#L2548

So… by having this bytecode instruction CallProperty2 r0, r1, a0, a1, [4] it means that r0 is the callee (function execution), in this point, r0 is max function. The receiver is Math as the r1 is the “loaded global Math”, and the a0 and a1 corresponds to both numbers provided.

So in our case the expression CallProperty2 r0, r1, a0, a1, [4] could means something like “Invoke/call Math.max with the two provided arguments", but the next instruction is a simple Return. And yes, that will return the value obtained from Math.max(a,b) .

We have read the whole bytecode now!

LdaGlobal [0], [0]
Star1
GetNamedProperty r1, [1], [2]
Star0
CallProperty2 r0, r1, a0, a1, [4]
Return

Constant pool (size = 2)
0x265690715d29: [FixedArray] in OldSpace
 - map: 0x30cd62300211 <Map(FIXED_ARRAY_TYPE)>
 - length: 2
           0: 0x07392aed6c91 <String[4]: #Math>
           1: 0x07392aed5f69 <String[3]: #max>
Handler Table (size = 0)
Source Position Table (size = 10)
0x265690715d91 <ByteArray[10]>

Now, let’s explore this JavaScript snippet that will return the index 3 of a [1,2,3,4,5] array, which is the number 4

function getNumberInIndex3 () {
  const numbers = [1, 2, 3, 4, 5]
  return numbers[3]
}

console.log(getNumberInIndex3())

Let’s print the getNumberInIndex3's bytecode and just focus on that.

node --print-bytecode --print-bytecode-filter="getNumberInIndex3" /tmp/foo.js
CreateArrayLiteral [0], [0], #37
Star0
LdaSmi [3]
GetKeyedProperty r0, [1]
Return

We have three new bytecodes:

  1. CreateArrayLiteral will create an array from our codebase, literally [1, 2, 3, 4, 5]
  2. LdaSmi [3] which means load small integer 3
  3. GetKeyedProperty r0 will load not a named key, but a indexed key.

So that byte code will create an array, and load the index 3.

That is fine, as we are expecting V8 to read that as 0-based indexing. Our next goal is modify the interpreter to by 1-based indexing instead. That is as simple as decreasing every index we try to load from the array. So if numbers[3] is provided, the interpreter should be reading numbers[2] . Our task is simple here, just decrease 1 in the requested index.

Every single time, before calling a GetKeyedProperty, just decrease in 1 the loaded integer. Thankfully there is a simple bytecode for that called Dec , we need to inject that function inside the GetKeyedProperty producer. Let’s see that producer’s source code.

BytecodeArrayBuilder& BytecodeArrayBuilder::LoadKeyedProperty(
    Register object, int feedback_slot) {
  OutputGetKeyedProperty(object, feedback_slot);
  return *this;
}

Source: https://github.com/v8/v8/blob/8b52e5f026dc859cab827ec584f3eedbc0510459/src/interpreter/bytecode-array-builder.cc#L875

As said, the task is simple, decrease by one the loaded integer, by appending this chunk, we will achieve that behaviour.

BytecodeArrayBuilder& BytecodeArrayBuilder::LoadKeyedProperty(
    Register object, int feedback_slot) {
  {
	  // The 0 is for the feedback vector, we don't care about that.
	  OutputDec(0); // Decrement the loaded integer first.
  }
  OutputGetKeyedProperty(object, feedback_slot);
  return *this;
}

After recompiling and running that same snippet, we are expecting the returned value to be 3, instead of 4 as in 1-based indexing array in [1,2,3,4,5] , the expression numbers[3] must be 3.

Let’s run it!

➜ ./out/arm64.debug/d8 --print-bytecode --print-bytecode-filter="getNumberInIndex3" /tmp/foo.js
[generated bytecode for function: getNumberInIndex3 (0x2f9100298509 <SharedFunctionInfo getNumberInIndex3>)]
Bytecode length: 13
Parameter count 1
Register count 1
Frame size 8
         0x3099000400a8 @    0 : 7e 00 00 25       CreateArrayLiteral [0], [0], #37
         0x3099000400ac @    4 : ca                Star0
         0x3099000400ad @    5 : 0d 03             LdaSmi [3]
         0x3099000400af @    7 : 54 00             Dec [0]
         0x3099000400b1 @    9 : 31 f9 01          GetKeyedProperty r0, [1]
         0x3099000400b4 @   12 : af                Return
Constant pool (size = 1)
0x309900040075: [TrustedFixedArray]
 - map: 0x2f9100000595 <Map(TRUSTED_FIXED_ARRAY_TYPE)>
 - length: 1
           0: 0x2f9100298651 <ArrayBoilerplateDescription PACKED_SMI_ELEMENTS, 0x2f9100298635 <FixedArray[5]>>
Handler Table (size = 0)
Source Position Table (size = 0)
3

👆 we made it, we got a 3 as a return value and the Dec [0] bytecode is now part of the interpretation of that snippet. Now we can guarantee that every time we try to access an index; it will be decremented by one; making the syntax follow the 1-based indexing rule but the storage under the hood is the same!

With a 3-line change in V8 source code, now JavaScript follows 1-based indexing in arrays!

You can be my companion on this road; I post on Twitter (yes, Twitter), read more of my blog, and visit my website.

If you like my work, you can sponsor me on GitHub!. That helps me sharing more of this content

Side notes:

  • This blog entry was heavily inspired by the amazing Franziska Hinkelmann’s blogpost about V8’s bytecode, go read it! It is incredible!
  • For the last run, I used d8, changing V8 and re-compile is easier than changing and link it to Node.js.
  • This is just gonna work arrAt[index].
  • This is experimental, this could potentially break something else, somewhere, somehow.