JIT Code Zone

SD
stephane ducasse
Mon, Sep 26, 2022 6:09 PM

Hello,
I am wondering how the Pharo VM and Cogit manage the JIT code zone and have several questions related to that! I will ask my questions and what could be the answers, feel free to correct me!

Context: I would like to secure this part of the VM, instrument accesses to it and experiment with changes in its layout/location!

Initialization:

When and how is the code zone memory allocated? And initialized?
The allocation is made through the succession of: allocateMemoryForImage:withHeader: then allocateMemory:desiredPosition:. The initialization is made through initializeCodeZoneFrom:upTo: where trampolines, PIC prototypes and offsets are generated/computed.

What is the layout of the different components of the code zone?
Trampolines and methods/inline caches coexist, trampolines being the first thing generated.

Runtime:
JIT compilation - What is the granularity of a JIT compilation? When does it write to machine code?
The compiler processes bytecodes at the granularity of a method, using outputInstructionsForGeneratedRuntimeAt: or outputInstructionsAt:. It instruments the actual write with enableCodeZoneWriteDuring:flushingCacheWith:. This step only needs write accesses to the method zone region (after the trampolines).

Garbage collection - When is the GC interfering with machine code? For rewriting and compaction?
Marking with mapMachineCode: and compaction with compactCogCompiledCode are the two ways the GC tampers with machine code. While the first one only needs to read data, the second needs both read and write accesses.

Interaction between interpreter and machine code - How is forward execution (interpreter - machine code) control flow performed? How is backward execution (machine code - interpreter) performed?
I guess the trampolines ceEnterCogCodePopReceiverReg and ceReturnToInterpreter: are the main bridges between both worlds.

Bonus: What is the difference between Cogit and the CoInterpreter?

Thanks in advance for the answers!
https://www.ensta-bretagne.fr/ https://www.labsticc.fr/en/index/
Quentin Ducasse
Doctorant / PhD Student
Lab-STICC – UMR CNRS 6285
www.labsticc.fr

ENSTA Bretagne
Grande école d'ingénieurs et centre de recherche
French State Graduate, Post-Graduate and Research Institute
2 rue François Verny - 29806 Brest Cedex 9 - France
www.ensta-bretagne.fr https://www.ensta-bretagne.fr/

GP
Guille Polito
Wed, Sep 28, 2022 10:25 AM

Hi,

On 26 Sep 2022, at 20:09, stephane ducasse stephane.ducasse@inria.fr wrote:

Hello,
I am wondering how the Pharo VM and Cogit manage the JIT code zone and have several questions related to that! I will ask my questions and what could be the answers, feel free to correct me!

Context: I would like to secure this part of the VM, instrument accesses to it and experiment with changes in its layout/location!

Initialization:

When and how is the code zone memory allocated? And initialized?
The allocation is made through the succession of: allocateMemoryForImage:withHeader: then allocateMemory:desiredPosition:. The initialization is made through initializeCodeZoneFrom:upTo: where trampolines, PIC prototypes and offsets are generated/computed.

Yes, take a look at the following line in #allocateMemoryForImage:withHeader:

cogCodeBase := self
	               allocateJITMemory: cogCodeSize
	               desiredPosition: header oldBaseAddr - cogCodeSize.

That method is defined only for the simulation, as follows (note the <doNotGenerate> pragma):

allocateJITMemory: desiredSize _: desiredPosition

<doNotGenerate>
^ memoryManager allocate: desiredSize desiredPosition: desiredPosition

And then there are native implementations of allocateJITMemory() in the C code of the repository. E.g.,

void* allocateJITMemory(usqInt desiredSize, usqInt desiredPosition){

	char *address, *alloc;
	usqIntptr_t alignment;
	sqInt allocBytes;
	SYSTEM_INFO sysInfo;

	/* determine page boundaries & available address space */
	GetSystemInfo(&sysInfo);
	pageSize = sysInfo.dwPageSize;
	pageMask = ~(pageSize - 1);
	minAppAddr = sysInfo.lpMinimumApplicationAddress;
	maxAppAddr = sysInfo.lpMaximumApplicationAddress;

	alignment = max(pageSize,1024*1024);
	address = (char *)(((usqInt)desiredPosition + alignment - 1) & ~(alignment - 1));

	alloc = sqAllocateMemorySegmentOfSizeAboveAllocatedSizeInto(roundUpToPage(desiredSize), address, &allocBytes);

	if (!alloc) {
		logErrorFromErrno("Could not allocate JIT memory");
		exit(1);
	}
	return alloc;

}
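
The address computation in the listing above rounds the desired position up to the next multiple of the alignment, which only works because the alignment is a power of two. Here is a minimal sketch of that rounding in isolation. The name round_up_to_alignment is hypothetical and does not come from the VM sources.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
 * Assumes alignment is a non-zero power of two, as is the case for
 * max(pageSize, 1024*1024) on systems with power-of-two page sizes. */
uintptr_t round_up_to_alignment(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

An address that is already aligned is returned unchanged; any other address moves up to the next 1 MB (or page) boundary.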

What is the layout of the different components of the code zone?
Trampolines and methods/inline caches coexist, trampolines being the first thing generated.

The JIT code cache has two main spaces, delimited by the methodZoneBase.
The lower part of the JIT code cache (before the methodZoneBase) is the “Trampoline” region, but it contains more than trampolines.
It also contains other reusable machine code routines for the runtime (for example, the routine that reifies the stack frames).
This part of the JIT code cache is fixed on VM initialization. Routines inside this part are not collected and are not moved/relocated.

The upper part of the JIT code cache contains methods, polymorphic inline caches and megamorphic inline caches.
This part is collected/compacted from time to time; here, methods can move!
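
For instrumentation purposes, the two regions described above can be told apart with a simple bounds check. The following is only a sketch: the CodeZone struct and both function names are hypothetical, and the fields merely mirror the terms used in this message rather than the VM's actual variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the code zone bounds. */
typedef struct {
    uintptr_t codeBase;        /* start of the JIT code cache */
    uintptr_t methodZoneBase;  /* end of the fixed routines, start of methods */
    uintptr_t limit;           /* one past the end of the code cache */
} CodeZone;

/* True if addr lies in the fixed region (trampolines and other runtime
 * routines), which is generated at initialization and never moves. */
bool in_fixed_region(const CodeZone *zone, uintptr_t addr)
{
    assert(zone->codeBase <= zone->methodZoneBase);
    return addr >= zone->codeBase && addr < zone->methodZoneBase;
}

/* True if addr lies in the method zone (methods and inline caches),
 * which can be compacted, so addresses in it may change over time. */
bool in_method_zone(const CodeZone *zone, uintptr_t addr)
{
    assert(zone->methodZoneBase <= zone->limit);
    return addr >= zone->methodZoneBase && addr < zone->limit;
}
```

An access monitor could use such a check to decide whether a write targets the fixed region (unexpected after startup) or the method zone.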

Runtime:
JIT compilation - What is the granularity of a JIT compilation? When does it write to machine code?
The compiler processes bytecodes at the granularity of a method, using outputInstructionsForGeneratedRuntimeAt: or outputInstructionsAt:. It instruments the actual write with enableCodeZoneWriteDuring:flushingCacheWith:.

The core of the compilation is in the last three lines of compileCogMethod:

   (result := self compileEntireMethod) < 0 ifTrue:
	[^coInterpreter cCoerceSimple: result to: #'CogMethod *'].
^self generateCogMethod: selector

The first line generates the IR using a temporary memory buffer.
The last line generates the machine code from the IR.

If the question is how “atomic” the writing to the JIT code cache is, my answer would be “it could be more atomic” :).
Inside the machine code generation, the machine code for all instructions is written by outputInstructionsAt: as you say.
But the JIT does not only write instructions: it also writes a method header holding some metadata and a footer with machine code annotations.

This step only needs write accesses to the method zone region (after the trampolines).

Yes, the move to ARMv8 and the new W^X restrictions forced us to explicitly identify in the code when the code cache is being modified.
That probably could help you.
If you want to know when the code cache is being written, just follow the senders of enableCodeZoneWriteDuring:flushingCacheWith:
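
To illustrate the W^X idea in general terms, here is a sketch of a "make writable, run a block, make executable again" wrapper built on POSIX mmap and mprotect. It is an assumption about how such a wrapper can look, not the VM's implementation of enableCodeZoneWriteDuring:flushingCacheWith:. The CodeZone struct and all function names are hypothetical, and some platforms require other mechanisms to toggle JIT memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical descriptor for a page-aligned code zone. */
typedef struct {
    void  *base;
    size_t size;
} CodeZone;

/* Map an anonymous region that starts out readable and executable
 * but not writable. Returns 0 on success, -1 on failure. */
int code_zone_init(CodeZone *zone, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    zone->base = p;
    zone->size = size;
    return 0;
}

/* Run write_fn with the zone writable and not executable, then restore
 * read+execute. Returns 0 on success, -1 if a permission change fails. */
int with_code_zone_writable(CodeZone *zone,
                            void (*write_fn)(CodeZone *, void *),
                            void *arg)
{
    assert(zone != NULL && write_fn != NULL);
    if (mprotect(zone->base, zone->size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    write_fn(zone, arg);
    /* A real JIT would also flush the instruction cache at this point. */
    return mprotect(zone->base, zone->size, PROT_READ | PROT_EXEC);
}

/* Example writer: stores the byte pointed to by arg at the zone start. */
void store_first_byte(CodeZone *zone, void *arg)
{
    ((unsigned char *)zone->base)[0] = *(unsigned char *)arg;
}
```

Because every write goes through one wrapper, it is also a natural place to hang logging or access checks.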

Garbage collection - When is the GC interfering with machine code? For rewriting and compaction?
Marking with mapMachineCode: and compaction with compactCogCompiledCode are the two ways the GC tampers with machine code. While the first one only needs to read data, the second needs both read and write accesses.

Hmm, there are two aspects here.
The JIT code cache gets compacted when it reaches a threshold, by calling compactCogCompiledCode.
The compaction removes methods from the cache and relinks linked sends, so it requires some code patching.
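
The sliding part of such a compaction can be sketched on a toy method zone as follows. This is a simplified model under stated assumptions: ToyMethodHeader, append_toy_method and compact_method_zone are hypothetical names, the header layout is invented for the example, and the relinking of sends that the real compaction performs is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block header; the real method header has other fields. */
typedef struct {
    uint32_t blockSize;  /* total bytes of this block, header included */
    uint32_t isFree;     /* non-zero once the method has been discarded */
    uint32_t id;         /* tag used only to recognise blocks in the demo */
} ToyMethodHeader;

/* Append a zero-filled block at offset and return the next free offset. */
size_t append_toy_method(uint8_t *zone, size_t offset,
                         uint32_t blockSize, uint32_t isFree, uint32_t id)
{
    ToyMethodHeader h = { blockSize, isFree, id };
    assert(blockSize >= sizeof h);
    memset(zone + offset, 0, blockSize);
    memcpy(zone + offset, &h, sizeof h);
    return offset + blockSize;
}

/* Slide live blocks towards the start of the zone, dropping free ones.
 * Returns the number of bytes still in use after compaction. */
size_t compact_method_zone(uint8_t *zone, size_t used)
{
    size_t src = 0, dst = 0;
    while (src < used) {
        ToyMethodHeader h;
        memcpy(&h, zone + src, sizeof h);
        assert(h.blockSize >= sizeof h && src + h.blockSize <= used);
        if (!h.isFree) {
            if (dst != src)
                memmove(zone + dst, zone + src, h.blockSize);
            dst += h.blockSize;
        }
        src += h.blockSize;
    }
    return dst;
}
```

Every block that moves changes address, which is why callers that jump directly to it would have to be patched afterwards.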

On a second note, the heap GC will traverse the code cache to find roots.
The GC only needs read permissions to extract metadata, disassemble some machine code and extract object references from the code cache.
So it needs neither write nor execute permissions.
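
The read-only nature of that traversal can be shown with a toy model in which each method simply lists the object references it embeds. This is a deliberate simplification: as described above, the real VM recovers the references from metadata and machine code. ToyCogMethod, TOY_MAX_REFS and collect_roots are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REFS 4

/* Hypothetical method record listing the heap references it embeds. */
typedef struct {
    size_t    refCount;
    uintptr_t refs[TOY_MAX_REFS];
} ToyCogMethod;

/* Copy every embedded reference into roots, up to capacity entries.
 * The methods are only read (note the const), never modified.
 * Returns the total number of references found, which may exceed
 * capacity if the output array was too small. */
size_t collect_roots(const ToyCogMethod *methods, size_t methodCount,
                     uintptr_t *roots, size_t capacity)
{
    size_t found = 0;
    for (size_t m = 0; m < methodCount; m++) {
        assert(methods[m].refCount <= TOY_MAX_REFS);
        for (size_t r = 0; r < methods[m].refCount; r++) {
            if (found < capacity)
                roots[found] = methods[m].refs[r];
            found++;
        }
    }
    return found;
}
```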

Interaction between interpreter and machine code - How is forward execution (interpreter - machine code) control flow performed? How is backward execution (machine code - interpreter) performed?
I guess the trampolines ceEnterCogCodePopReceiverReg and ceReturnToInterpreter: are the main bridges between both worlds.

Yes, there are ceEnterCogCodePopReceiverReg and ceCallCogCodePopReceiverReg. The calling convention is that the interpreter puts the things required for the activation on the stack and calls the enilopmarts.
The enilopmarts will then extract the info from the stack, put them into registers and call the machine code method.

The return is a bit more tricky.
What happens is that when an interpreter method calls a JIT method, we do not push the bytecode program counter to the stack (it would be useless for the machine code).
Instead, we push the address of a trampoline that knows how to return.
The machine code returns normally using a ret instruction and jumps to the trampoline, which will massage the stack and let the interpreter continue.
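
The return mechanism described above can be mimicked in a toy model where the "stack" is an array and "machine code" is a C function. Everything here is hypothetical (ToyVM, ToySlot and the toy_ functions). In real machine code the result would travel in a register and ret would pop the return address itself, whereas this sketch passes both through the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyVM ToyVM;
typedef void (*ToyCode)(ToyVM *);

/* A stack slot holds either a value or a code address. */
typedef union {
    uintptr_t word;
    ToyCode   code;
} ToySlot;

struct ToyVM {
    ToySlot   stack[16];
    size_t    sp;             /* number of slots in use */
    int       inInterpreter;  /* 1 while the interpreter is in control */
    uintptr_t result;         /* value handed back to the interpreter */
};

static void push(ToyVM *vm, ToySlot s)
{
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = s;
}

static ToySlot pop(ToyVM *vm)
{
    assert(vm->sp > 0);
    return vm->stack[--vm->sp];
}

/* Stand-in for the return trampoline: takes the result off the stack
 * and hands control back to the interpreter. */
void toy_return_to_interpreter(ToyVM *vm)
{
    vm->result = pop(vm).word;
    vm->inInterpreter = 1;
}

/* Stand-in for a JIT-compiled method that doubles its receiver. */
void toy_jitted_double(ToyVM *vm)
{
    ToyCode returnAddress = pop(vm).code;  /* what ret would pop */
    uintptr_t receiver = pop(vm).word;
    push(vm, (ToySlot){ .word = receiver * 2 });
    returnAddress(vm);                     /* "ret" into the trampoline */
}

/* Interpreter side: push the receiver, then the trampoline address in
 * place of a bytecode program counter, and enter the method. */
void toy_interpreter_call(ToyVM *vm, ToyCode method, uintptr_t receiver)
{
    push(vm, (ToySlot){ .word = receiver });
    push(vm, (ToySlot){ .code = toy_return_to_interpreter });
    vm->inInterpreter = 0;
    method(vm);
}
```

The compiled method never needs to know who called it: it returns to whatever address sits on the stack, and the trampoline does the rest.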

Bonus: What is the difference between Cogit and the CoInterpreter?

Cogit is the compiler.

CoInterpreter is a subclass of the interpreter that has the glue code between the interpreter and the compiler.

:)

G


Pharo-vm mailing list -- pharo-vm@lists.pharo.org
To unsubscribe send an email to pharo-vm-leave@lists.pharo.org
