I have some answers. I also fed your email to CLAUDE as it has better numbers on test comparisons and better recall of details.
The VM is entirely new. I started with the Pharo VM, but CLAUDE had endless problems due to the encoding of type info in the high bits of pointers. It tried to move the tags to the low bits, but claimed the pointer operations were spread throughout the code rather than localized, so the change wasn't feasible. I had heard how small and easy Smalltalk VMs are to create, so I just told it to make a new VM.
On deploying the note-taker to the Apple App Store: in theory it is supported now. On the Mac, you can create an Xcode project from a Smalltalk app and submit it to the store. In practice we will need to work together, as it hasn't been done before.
CLAUDE's answers:
Reply to Stephane Ducasse's questions about iPad Pharo
======================================================
From the email thread of 2026-02-28.
No. This is a clean-room C++ interpreter (~100K lines of new code). The
main obstacle was that the standard Pharo VM encodes type tags in the
high bits of pointers, which breaks on iOS due to ASLR (the OS uses
those bits for address randomization). Rather than patch the existing
VM to move tags to the low bits, a new interpreter was written that
encodes type info in the low 3 bits from the start. Loading a standard
Pharo image translates all addresses on the fly.
The other blocker was Apple's prohibition of JIT compilation on iOS.
This VM is pure interpreter, no JIT.
We run substantially more tests than the official Pharo CI, and we
wrote a test harness (pharo-headless-test) that extends testing into
areas the CI does not cover.
SECTION 1: Side-by-side -- same test packages, their CI vs our VM
=================================================================
The official Pharo 13 CI (Jenkins at ci.inria.fr) runs all TestCase
subclasses in the bootstrapped image using:
./pharo Pharo.image test --junit-xml-output '.*'
Their bootstrapped image has ~40,306 tests per platform (includes
Roassal, extra tools loaded during bootstrap). Our image from
get.pharo.org/64/130 has ~28,071 tests (the standard distribution
without bootstrap-only packages).
For the test classes present in BOTH images:
Metric                  Pharo 13 CI        iospharo VM
------------------------------------------------------------
Tests run (per plat.)   40,306             28,071
Unique failures         4                  39 fail + 391 err
Pass rate               99.99%             98.00%
Adjusted pass rate*     99.99%             99.82%
Platforms tested        3 (Mac/Lin/Win)    1 (Mac Catalyst)
Image source            Bootstrapped       get.pharo.org
JIT                     Yes (Cog)          No (interpreter)
* Adjusted: removing ProcessTest processMonitor (46 errors,
Pharo 13 image bug, fails identically on official VM) and
image meta-tests (SystemDependenciesTest, ReleaseTest, etc.)
The 4 tests that fail on official CI also fail (or are covered by
the same root causes) on our VM:
OCClassBuilderTest >> testCreateNormalClassWithTraitComposition
StDebuggerInspectorTest >> testUpdateLayoutForContexts...
StDebuggerTest >> testUpdateLayoutForContexts...
SystemDependenciesTest >> testExternalUIDependencies
Our remaining failures break down as:
ProcessTest processMonitor missing   46  (Pharo 13 bug, same on official VM)
SystemDependenciesTest               17  (image meta-test)
Fuel WideString/WideSymbol           15  (serialization timeout, interpreter speed)
Calypso IDE query tests              14  (IDE infrastructure)
MicGitHub network tests               9  (rate limiting / network)
ReleaseTest meta-tests                9  (image state checks)
StDebugger tests                      4  (debugger UI)
Geometry unimplemented methods        3  (#intersectionsWithEllipse: missing)
Other scattered                      27  (1 each, assorted)
None of these are VM-specific bugs. Zero VM-specific failures.
The ~12,000 test count gap is because the CI bootstraps images from
source, pulling in extra packages (Roassal charts: 812 tests, plus
AI-Algorithms, Sindarin, BeautifulComments, etc.). We also exclude a
handful of known hangers (Epicea file watchers, Athens rendering).
SECTION 2: Additional tests we run via pharo-headless-test
==========================================================
We wrote and open-sourced pharo-headless-test:
https://github.com/avwohl/pharo-headless-test
This extends testing into areas the official CI does NOT cover because
it provides a fake Morphic GUI environment that runs headless.
The official CI runs these but they silently skip or error without a
real display. Our fake GUI (setup_fake_gui.st) creates a virtual
1024x768 Morphic world with Display Form, WorldMorph, UI process, and
MorphicRenderLoop. This lets Spec presenter tests actually open windows,
click buttons, and render morphs.
Tests: 1,113 across 64 Spec/GUI test classes
Pass: 1,054 (94.6%)
Fail: 5
Error: 15
Skip: 35
Timeout: 4
Without setup_fake_gui.st, ~350 of these fail with
"receiver of activate is nil" -- they literally cannot run.
We load and run substantial third-party packages to stress-test VM
correctness beyond the built-in test suite:
Package      Tests   Pass   Fail  Error   Rate
-----------------------------------------------
NeoJSON        116    116      0      0   100%
Mustache        47     47      0      0   100%
XMLParser    5,978  5,978      0      0   100%
PolyMath     1,168  1,162      5      1  99.5%
DataFrame      665    651     14      0  97.9%
-----------------------------------------------
Total        7,974  7,954     19      1  99.8%
All 20 failures are pre-existing on the official Pharo VM.
These exercise: JSON/XML parsing, Unicode, template expansion,
scientific computing (matrices, ODE solvers, large numbers),
tabular data operations, and closures/streams throughout.
Built-in image tests: 28,071
GUI/Spec tests: 1,113 (with fake head)
External packages: 7,974
-----------------------------------------
Grand total: 37,158 tests run on iospharo VM
No to both (threaded dispatch and auto-localisation).
Dispatch: The interpreter uses a cascading if-else tree in
dispatchBytecode() (src/vm/Interpreter.cpp:1548), organized by
bytecode ranges, with switch statements for specific groups within
each range. It does NOT use computed goto (threaded dispatch).
Auto-localisation: The interpreter does NOT copy PC/IP/SP into local
variables at the start of each bytecode handler. It directly uses
C++ member variables (instructionPointer_, stackPointer_,
framePointer_) throughout execution. A comment at line 1164 notes
"GC safe point: between bytecodes, no C++ locals hold Oops" --
the design intentionally avoids locals holding object pointers so
the GC can see everything.
Both of these are areas where performance could be improved. Computed
goto dispatch and localised variables are known optimizations described
in Stephane's linked paper (Poli22b-MoreVM22-Autolocalisation.pdf).
They would likely give a meaningful speedup on the interpreter path.
We don't have benchmark numbers yet. The iospharo interpreter is pure
C++ with no JIT, no threaded dispatch, and no auto-localisation. The
Pharo interpreter VM (StackInterpreter without Cog JIT) uses threaded
dispatch and auto-localisation, so it should be faster bytecode-for-
bytecode. However, even the official interpreter VM is rarely used --
almost all Pharo users run with the Cog JIT.
Potential optimizations that could help (computed-goto dispatch, auto-localisation, and similar) are all well-understood techniques. The current interpreter prioritized correctness and compatibility over speed.
This is a clean-room implementation, not a fork. The standard Pharo VM
(OpenSmalltalk/pharo-vm) is generated from Slang/VMMaker Smalltalk
code. This VM is hand-written C++ that reads the same Spur image
format. Merging them would mean either:
(a) Teaching VMMaker/Slang to generate low-bit-tagged code (large
effort, touches every oop operation in the generated C), or
(b) Maintaining a separate C++ VM alongside the generated one
Option (a) would be better long-term but is a significant project.
The current approach works now and passes the test suite. Whether it's
worth the effort to merge depends on how much the Pharo team wants
native iOS support vs. waiting for a different approach (e.g., the
Apple silicon situation changing, or WebAssembly).
Currently published as "iospharo" in TestFlight. Happy to use whatever
name the Pharo team prefers. Stephane's suggestions included VibePharo,
PhaIpad, or bringing it under the Pharo umbrella with a distinguishing
name. The name "Pharo Smalltalk" is not used -- just "Pharo" per
Stephane's correction.
Agreed with Stephane that this should be fixed at the image level, not
the VM. The VM provides the screen dimensions; the image's Morphic
layout should adapt. Pharo 14 with SDL3 and Toplo/Bloc may handle
this better. For now, the startup.st mechanism can inject layout
patches, but proper responsive layout belongs in the image.
Interactive use (browsing code, inspecting objects, editing) is usable
on iPad. Computation-heavy tasks (loading large packages, running the
full test suite, Fuel serialization) are noticeably slow without JIT.
But for learning, exploring, and light development, it works. The
TestFlight description notes the limitation. Whether it's "good enough"
depends on the use case -- for something like The Note Taker app
Stephane mentioned, it should be fine.
----- Original message -----
From: stephane.ducasse@free.fr
To: Aaron Wohl <xphu@awohl.com>
Cc: "Any question about pharo is welcome" <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] iPad Pharo test release
Date: Saturday, February 28, 2026 7:18 AM
Hi Aaron
Pharo 13 for iPad is in the test Apple Store (open in the Apple TestFlight app), phone or Mac: https://testflight.apple.com/join/kGmPQFr9
Due to Apple's no-JIT rule, it's not snappy. But maybe it's still useful?
Source https://github.com/avwohl/iospharo about 100K lines of new source code.
Tx for the notice :)
This is really a great effort. I’m browsing the code and I’m blasted :)
Do I understand correctly that you did not use the Pharo VM at all, not even the Pharo interpreter VM (the VM without the JIT)? Because it is running without the JIT.
This is my first venture into using Pharo. I could use advice on where this project fits in Pharo/Smalltalk land. In the early 80s, I used the Xerox PARC Alto at Carnegie Mellon to run Smalltalk. I got laid off, went on a cruise, sat down in a beach chair with LEARN PHARO and an iPad. But, no Pharo for iPad.
You see I would love to see how we can deploy application such as
https://github.com/pharo-contributions/the-note-taker
The interface is still a bit clunky but I use it regularly on my mac. Just need more time to do a pass on it.
So I did the natural thing and told Claude Code to port Pharo to iPad. 54 days and 1347 commits later, out popped a usable iPad app.
This is really cool.
I do have a background in compilers, working on the Production Quality Compiler project at CMU in the early 1980s, mostly on lexers. But I didn't write any code for this. I mostly told CLAUDE, a few times a day: stop doing workarounds; do what the real VM does; that's a workaround. I think CLAUDE would never have succeeded if I hadn't insisted on a focus on getting the test suite to pass rather than the app to work.
This is really interesting. Which tests did you focus on? Because there are many VM tests.
The main snag using the real VM was the encoding of type info in the high bits of pointers for immediate types. It seemed simple to me to shift the type info to the low 3 bits, but Claude could not get the real VM to stop looking in the high bits; perhaps I should have pushed harder on that. But I had always heard how easy it was to port Smalltalk, so I just went with a new VM with the type info in the low bits. Loading a standard image translates the addresses.
The other issue was that Apple forbids JIT on iPad. So this new VM has no JIT.
Issues:
Let us talk internally and we would like to have a chat with you for sure :)
I tried to see if the interpreter is a threaded one but I need to get a decent text editor because reading the code in safari on github is too painful.
The default Pharo interpreter also uses localisation: in each bytecode branch, the global state (PC, IP) is copied into local temps, and on exit the inverse is done.
http://rmod-files.lille.inria.fr/Team/Texts/Papers/Poli22b-MoreVM22-Autolocalisation.pdf
VibePharo :)
PhaIpad
or something to leave the name free for a hand-coded version.
Or move off the name if a real contender ever shows.
Pharo is not Pharo Smalltalk but Pharo :).
Now let not me decide alone :) So we will discuss and see.
My gut feeling is that
Yes! Now we need some explanation telling the known limits.
Is there some other optimization that could bridge the gap if it is too slow?
I would love to get a comparison with the interpreter VM.
I do not know industrial users, but as a plain iPad user I would like to give it a try and do something with it. Now I have an old iPad and Marcus a Pro, so we can check already.
I think that most of the UI part should be done at the image level. I was recently discussing with Pablo about the deployment of apps on Mac, and we decided to wait for Pharo 14 because we will have SDL3 support, and we should also check the Toplo/Bloc logic because it should be easier than the current one. I mean that we can do it in a similar way to https://github.com/pharo-contributions/the-note-taker
but it should be smoother.
We are super busy with the P14 release.
Let us sync internally (Tuesday is our sync point); people are coming back from vacation.
S.
Thanks
I will try to digest this. tomorrow we have a sync with the Pharo team and I will discuss it.
S.
On optimization: I had some thought of adding JIT on the platforms that allow it (currently Mac and Android). The Android port was too slow to be practical: the few-hundred-dollar Android tablets are much slower than an iPad, and Claude's analysis was that Android was designed for phones and doesn't have all the tablet optimizations Apple did for iPad. I do have a working Android port of this project (https://github.com/avwohl/droidpharo), but it was so painfully slow on a tablet that I haven't bothered releasing it.
My Z80 emulator core (https://github.com/avwohl/cpmemu) has ports for:
  mac/ios:             https://github.com/avwohl/ioscpm
  windows:             https://github.com/avwohl/z80cpmw
  linux/webassembly:   https://github.com/avwohl/romwbw_emu
  Android:             https://github.com/avwohl/cpmdroid
It originally used an if-else chain to decode instructions. After getting it all working, changing the emulator core to a straight switch (with no if-else other than prefixes) made it faster than the other Z80 emulators we compared against.
Claude said there is no performance data. That isn't quite right; there just isn't a file with the data. We did some performance tests, which revealed that some of the optional large-integer math primitives (prime testing and factoring) were not present and were falling back to Smalltalk, and thus over 100 times slower than the real Pharo VM. After fixing those, this new VM is generally 12x slower.
I am unclear how much of the 12x slowdown is due to no JIT and how much is due to unoptimized primitive dispatch.
On my 1 gen behind iPad, bringing up the system browser has a noticeable pause. But other than that, it seems snappy enough.
Asking CLAUDE:
❯ We cannot do JIT for iOS due to Apple. But other than that, are there performance optimizations we should do? On our Z80 emulator, changing
the opcode dispatch from if-else to one switch() got a huge performance improvement
⏺ OK, I have a thorough picture now. Here's my analysis:
Yes, there are several significant performance wins available. Your instinct about the dispatch is spot on — that's the #1 item. Here's a
prioritized list:
1. Bytecode Dispatch: if-else chain → single switch (HIGH impact)
Your dispatchBytecode() at line 1548 uses a long if-else-if chain over 256 bytecodes. The compiler might convert this to a jump table, but
with all the nested usesSistaV1_ branches and range comparisons, it almost certainly doesn't. A flat switch(bytecode) with all 256 cases lets
the compiler emit a direct jump table — O(1) dispatch instead of O(n) comparisons.
Since usesSistaV1_ is true for all Pharo 10+ images (and that's all you target), you could remove the V3 paths entirely, or have two
completely separate dispatch functions selected once at method activation.
Expected gain: 15-30% on tight loops. This is the Z80 analogy exactly.
2. step() overhead (HIGH impact)
Every single bytecode calls step(), which does an enormous amount of per-bytecode bookkeeping before it even fetches the byte.
interpret() then wraps it in a batch of 1000 with additional per-batch checks. But step() itself already has per-1024-step checks — you're
double-checking.
Fix: Inline the hot path. The main loop should be:
  while (running_) {
      uint8_t bytecode = *instructionPointer_++;
      switch (bytecode) { /* all 256 cases */ }
      if (--counterToCheck_ <= 0) {
          counterToCheck_ = 1024;
          doPeriodicChecks();  // GC, timer, signals, yield, etc.
      }
  }
One decrement + branch-predict-true comparison per bytecode instead of ~15 conditionals, atomic loads, and a syscall.
3. Method cache: single-probe → multi-probe (MEDIUM impact)
Your cache is 2048 entries with a single XOR hash probe. If two (selector, class) pairs collide, the older one gets evicted. The Cog VM uses a
4-way set-associative cache (4096 entries, 4 probes at different hashes). This dramatically reduces conflict misses.
A simple improvement: probe 2-3 secondary positions before falling through to full lookup. The hash is cheap — just vary the XOR constant.
4. sendSelector() diagnostic overhead (MEDIUM impact)
sendSelector() (line 3842) does several things on every send that are pure diagnostics.
Move all of that behind the (++sendCount_ & 0x3FF) == 0 guard. The selector byte extraction should only happen when you're actually going to
use it.
5. Remove V3 bytecode set (LOW-MEDIUM impact)
usesSistaV1_ is checked 6+ times per dispatchBytecode() call, and it's always true for Pharo 10+. Removing the V3 paths eliminates dead
branches and lets the compiler generate tighter code. If you ever need V3, you can template-specialize or use a separate function.
6. chrono::steady_clock::now() in the hot path (MEDIUM impact)
Line 1239 calls std::chrono::steady_clock::now() inside the periodic check, which is a kernel syscall (clock_gettime) on most platforms. This
happens every 1024 bytecodes. You could use a cheaper monotonic counter (the bytecode count itself) and only call clock_gettime every ~64K
bytecodes.
7. Inline caching at call sites (LONGER TERM, HIGH impact)
The Cog VM's biggest performance feature (besides JIT) is polymorphic inline caches (PICs) — each call site remembers the last receiver class
it saw and goes straight to that method. Your global method cache is good, but a monomorphic inline cache at each send bytecode would be a
major step up. You could store a (classIndex, compiledMethod) pair inline in the bytecode stream or in a side table indexed by PC.
Summary by effort vs. impact
Optimization Impact Effort
───────────────────────────── ──────── ──────
Flat switch dispatch High Low
Slim down step() hot path High Medium
Remove V3 paths Med Low
Multi-probe method cache Med Low
sendSelector() cleanup Med Low
Reduce syscalls in checks Med Low
Inline caching (PICs) High High
Items 1 and 2 together could realistically give you 2-3x on bytecode-heavy benchmarks (integer arithmetic, collection iteration, etc). Want me
to start implementing any of these?
❯ Put that in a docs/optimizations.md so we don't forget. Then do the flat switch. Drop support for VMs before 10 to avoid the Sista testing.
❯ Let's look into optimization. Is there a performance benchmark? If not, run our test suite through the Pharo VM and then ours and record the times of each step and overall. Record the results in a file, noting the build version and the most recent git hash.
----- Original message -----
From: stephane ducasse via Pharo-users pharo-users@lists.pharo.org
To: Any question about pharo is welcome pharo-users@lists.pharo.org
Cc: Aaron Wohl xphu@awohl.com, lse-consortium-eng@inria.fr, stephane ducasse stephane.ducasse@inria.fr
Subject: [Pharo-users] Re: iPad Pharo test release
Date: Monday, March 30, 2026 4:18 PM
Thanks
I will try to digest this. tomorrow we have a sync with the Pharo team and I will discuss it.
S.
On 30 Mar 2026, at 15:00, Aaron Wohl via Pharo-users pharo-users@lists.pharo.org wrote:
I have some answers. I also fed your email to CLAUDE as it has better numbers on test comparisons and better recall of details.
The VM is entirely new. I started with the Pharo VM, but CLAUDE had endless problems due to the encoding of type info in the high bits of pointers. It tried to move the tags to the low bits, but reported that operations on pointers were spread throughout the code rather than localized, which blocked the change. I had heard how small and easy it was to create Smalltalk VMs, so I just said to make a new VM.
On deploying the note-taker to the Apple Store: in theory it is supported now. On the Mac, you can create an Apple Xcode project from a Smalltalk app and send it to the store. In practice we will need to work together, as it hasn't been done before.
CLAUDE's answers:
Reply to Stephane Ducasse's questions about iPad Pharo
======================================================
From the email thread of 2026-02-28.
No. This is a clean-room C++ interpreter (~100K lines of new code). The
main obstacle was that the standard Pharo VM encodes type tags in the
high bits of pointers, which breaks on iOS due to ASLR (the OS uses
those bits for address randomization). Rather than patch the existing
VM to move tags to the low bits, a new interpreter was written that
encodes type info in the low 3 bits from the start. Loading a standard
Pharo image translates all addresses on the fly.
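For illustration, low-bit tagging along the lines described might look like this; the tag values are made up for the example and are not the actual iospharo encoding:

```cpp
#include <cassert>
#include <cstdint>

// An oop is either a real (8-byte-aligned) pointer, low 3 bits 000, or an
// immediate whose type tag lives in those low 3 bits.
constexpr uintptr_t kTagMask     = 0x7;  // low 3 bits
constexpr uintptr_t kSmallIntTag = 0x1;  // example tag value only

inline bool isSmallInt(uintptr_t oop) { return (oop & kTagMask) == kSmallIntTag; }
inline bool isPointer(uintptr_t oop)  { return (oop & kTagMask) == 0; }

inline uintptr_t tagSmallInt(intptr_t v) {
    return (uintptr_t(v) << 3) | kSmallIntTag;   // shift value up, tag below
}
// Arithmetic right shift restores the sign (guaranteed in C++20, and the
// universal behavior in practice before that).
inline intptr_t untagSmallInt(uintptr_t oop) { return intptr_t(oop) >> 3; }
```

Because the high bits stay untouched, nothing breaks when iOS ASLR randomizes the upper address bits.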
The other blocker was Apple's prohibition of JIT compilation on iOS.
This VM is pure interpreter, no JIT.
We run substantially more tests than the official Pharo CI, and we
wrote a test harness (pharo-headless-test) that extends testing into
areas the CI does not cover.
SECTION 1: Side-by-side -- same test packages, their CI vs our VM
=================================================================
The official Pharo 13 CI (Jenkins at ci.inria.fr) runs all TestCase
subclasses in the bootstrapped image using:
./pharo Pharo.image test --junit-xml-output '.*'
Their bootstrapped image has ~40,306 tests per platform (includes
Roassal, extra tools loaded during bootstrap). Our image from
get.pharo.org/64/130 has ~28,071 tests (the standard distribution
without bootstrap-only packages).
For the test classes present in BOTH images:
Metric Pharo 13 CI iospharo VM
-----------------------------------------------------------
Tests run (per plat.) 40,306 28,071
Unique failures 4 39 fail + 391 err
Pass rate 99.99% 98.00%
Adjusted pass rate* 99.99% 99.82%
Platforms tested 3 (Mac/Lin/Win) 1 (Mac Catalyst)
Image source Bootstrapped get.pharo.org
JIT Yes (Cog) No (interpreter)
* Adjusted: removing ProcessTest processMonitor (46 errors,
Pharo 13 image bug, fails identically on official VM) and
image meta-tests (SystemDependenciesTest, ReleaseTest, etc.)
The 4 tests that fail on official CI also fail (or are covered by
the same root causes) on our VM:
OCClassBuilderTest >> testCreateNormalClassWithTraitComposition
StDebuggerInspectorTest >> testUpdateLayoutForContexts...
StDebuggerTest >> testUpdateLayoutForContexts...
SystemDependenciesTest >> testExternalUIDependencies
Our remaining failures break down as:
ProcessTest processMonitor missing 46 (Pharo 13 bug, same on official VM)
SystemDependenciesTest 17 (image meta-test)
Fuel WideString/WideSymbol 15 (serialization timeout, interpreter speed)
Calypso IDE query tests 14 (IDE infrastructure)
MicGitHub network tests 9 (rate limiting / network)
ReleaseTest meta-tests 9 (image state checks)
StDebugger tests 4 (debugger UI)
Geometry unimplemented methods 3 (#intersectionsWithEllipse: missing)
Other scattered 27 (1 each, assorted)
None of these are VM-specific bugs. Zero VM-specific failures.
The ~12,000 test count gap is because the CI bootstraps images from
source, pulling in extra packages (Roassal charts: 812 tests, plus
AI-Algorithms, Sindarin, BeautifulComments, etc.). We also exclude a
handful of known hangers (Epicea file watchers, Athens rendering).
SECTION 2: Additional tests we run via pharo-headless-test
==========================================================
We wrote and open-sourced pharo-headless-test:
https://github.com/avwohl/pharo-headless-test
This extends testing into areas the official CI does NOT cover because
it provides a fake Morphic GUI environment that runs headless.
The official CI runs these but they silently skip or error without a
real display. Our fake GUI (setup_fake_gui.st) creates a virtual
1024x768 Morphic world with Display Form, WorldMorph, UI process, and
MorphicRenderLoop. This lets Spec presenter tests actually open windows,
click buttons, and render morphs.
Tests: 1,113 across 64 Spec/GUI test classes
Pass: 1,054 (94.6%)
Fail: 5
Error: 15
Skip: 35
Timeout: 4
Without setup_fake_gui.st, ~350 of these fail with
"receiver of activate is nil" -- they literally cannot run.
We load and run substantial third-party packages to stress-test VM
correctness beyond the built-in test suite:
Package Tests Pass Fail Error Rate
NeoJSON 116 116 0 0 100%
Mustache 47 47 0 0 100%
XMLParser 5,978 5,978 0 0 100%
PolyMath 1,168 1,162 5 1 99.5%
DataFrame 665 651 14 0 97.9%
-----------------------------------------------
Total 7,974 7,954 19 1 99.8%
All 20 failures are pre-existing on the official Pharo VM.
These exercise: JSON/XML parsing, Unicode, template expansion,
scientific computing (matrices, ODE solvers, large numbers),
tabular data operations, and closures/streams throughout.
Built-in image tests: 28,071
GUI/Spec tests: 1,113 (with fake head)
External packages: 7,974
-----------------------------------------
Grand total: 37,158 tests run on iospharo VM
No to both.
Dispatch: The interpreter uses a cascading if-else tree in
dispatchBytecode() (src/vm/Interpreter.cpp:1548), organized by
bytecode ranges, with switch statements for specific groups within
each range. It does NOT use computed goto (threaded dispatch).
Auto-localisation: The interpreter does NOT copy PC/IP/SP into local
variables at the start of each bytecode handler. It directly uses
C++ member variables (instructionPointer_, stackPointer_,
framePointer_) throughout execution. A comment at line 1164 notes
"GC safe point: between bytecodes, no C++ locals hold Oops" --
the design intentionally avoids locals holding object pointers so
the GC can see everything.
Both of these are areas where performance could be improved. Computed
goto dispatch and localised variables are known optimizations described
in Stephane's linked paper (Poli22b-MoreVM22-Autolocalisation.pdf).
They would likely give a meaningful speedup on the interpreter path.
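For reference, threaded dispatch relies on the GCC/Clang "labels as values" extension; a toy sketch (invented opcodes, not the iospharo bytecode set) showing both computed goto and localised pc/sp:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum : uint8_t { OP_PUSH1 = 0, OP_ADD = 1, OP_HALT = 2 };

// Interprets the toy program and returns the top of stack. Each handler
// jumps straight to the next handler's label, so the branch predictor sees
// one indirect jump per opcode instead of a shared dispatch branch.
int64_t run(const std::vector<uint8_t>& code) {
    static void* const labels[] = {&&push1, &&add, &&halt};
    int64_t stack[16];
    int64_t* sp = stack;              // localised stack pointer (a C++ local)
    const uint8_t* pc = code.data();  // localised instruction pointer

    #define DISPATCH() goto *labels[*pc++]
    DISPATCH();
push1:  *sp++ = 1;                 DISPATCH();
add:    sp[-2] += sp[-1]; --sp;    DISPATCH();
halt:   return sp[-1];
    #undef DISPATCH
}
```

Note the GC-safe-point design mentioned above conflicts with localisation: locals holding pc/sp (though not Oops) must be spilled back to the member variables at safe points.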
We don't have benchmark numbers yet. The iospharo interpreter is pure
C++ with no JIT, no threaded dispatch, and no auto-localisation. The
Pharo interpreter VM (StackInterpreter without Cog JIT) uses threaded
dispatch and auto-localisation, so it should be faster bytecode-for-
bytecode. However, even the official interpreter VM is rarely used --
almost all Pharo users run with the Cog JIT.
Potential optimizations that could help:
These are all well-understood techniques. The current interpreter
prioritized correctness and compatibility over speed.
This is a clean-room implementation, not a fork. The standard Pharo VM
(OpenSmalltalk/pharo-vm) is generated from Slang/VMMaker Smalltalk
code. This VM is hand-written C++ that reads the same Spur image
format. Merging them would mean either:
(a) Teaching VMMaker/Slang to generate low-bit-tagged code (large
effort, touches every oop operation in the generated C), or
(b) Maintaining a separate C++ VM alongside the generated one
Option (a) would be better long-term but is a significant project.
The current approach works now and passes the test suite. Whether it's
worth the effort to merge depends on how much the Pharo team wants
native iOS support vs. waiting for a different approach (e.g., the
Apple silicon situation changing, or WebAssembly).
Currently published as "iospharo" in TestFlight. Happy to use whatever
name the Pharo team prefers. Stephane's suggestions included VibePharo,
PhaIpad, or bringing it under the Pharo umbrella with a distinguishing
name. The name "Pharo Smalltalk" is not used -- just "Pharo" per
Stephane's correction.
Agreed with Stephane that this should be fixed at the image level, not
the VM. The VM provides the screen dimensions; the image's Morphic
layout should adapt. Pharo 14 with SDL3 and Toplo/Bloc may handle
this better. For now, the startup.st mechanism can inject layout
patches, but proper responsive layout belongs in the image.
Interactive use (browsing code, inspecting objects, editing) is usable
on iPad. Computation-heavy tasks (loading large packages, running the
full test suite, Fuel serialization) are noticeably slow without JIT.
But for learning, exploring, and light development, it works. The
TestFlight description notes the limitation. Whether it's "good enough"
depends on the use case -- for something like The Note Taker app
Stephane mentioned, it should be fine.
----- Original message -----
From: "stephane.ducasse@free.fr" stephane.ducasse@free.fr
To: Aaron Wohl xphu@awohl.com
Cc: Any question about pharo is welcome pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] iPad Pharo test release
Date: Saturday, February 28, 2026 7:18 AM
Hi Aaron
Pharo 13 for iPad in the test Apple Store (open in the Apple TestFlight app);
Phone or Mac https://testflight.apple.com/join/kGmPQFr9
Due to Apple's no-JIT policy, it's not snappy. But maybe it's still useful?
Source https://github.com/avwohl/iospharo about 100K lines of new source code.
Tx for the notice :)
This is really a great effort. I’m browsing the code and I’m blasted :)
Do I understand correctly that you did not use the Pharo VM at all, not even the Pharo interpreter VM (the VM without the JIT)?
Because yours is running without the JIT.
This is my first venture into using Pharo. I could use advice on where this project fits in Pharo/Smalltalk land. In the early 80s, I used the Xerox Parc Alto at Carnegie Mellon to run Smalltalk. I got laid off, went on a cruise, sat down in a beach chair with LEARN PHARO and an iPad. But, no Pharo for iPad.
You see I would love to see how we can deploy application such as
https://github.com/pharo-contributions/the-note-taker
The interface is still a bit clunky but I use it regularly on my mac. Just need more time to do a pass on it.
So I did the natural thing and told Claude Code to port Pharo to iPad. 54 days and 1347 commits later, out popped a usable iPad app.
This is really cool.
I do have a background in compilers, working on the Production Quality Compiler project at CMU in the early 1980s, mostly on lexers. But I didn't write any code for this. I mostly told CLAUDE, a few times a day: stop doing workarounds, do what the real VM does, that's a workaround. I think CLAUDE would never have succeeded if I hadn't insisted on focusing on getting the test suite to pass rather than getting the app to work.
This is really interesting. Which tests did you focus on? Because there are many VM tests.
The main snag using the real VM was the encoding of type info in the high bits of pointers for immediate types. It seemed simple to me to shift the type info to the low 3 bits, but CLAUDE could not get the real VM to stop looking in the high bits; perhaps I should have pushed harder on that. But I had always heard how easy it was to port Smalltalk, so I just went with a new VM with the type info in the low bits. Loading a standard image translates the addresses.
The other issue was that Apple forbids JIT on iPad. So this new VM has no JIT.
Issues:
Let us talk internally and we would like to have a chat with you for sure :)
I tried to see if the interpreter is a threaded one but I need to get a decent text editor because reading the code in safari on github is too painful.
The default Pharo interpreter also uses localisation:
in each bytecode branch the global state (PC, IP) is copied into local temps;
on exit the inverse is done.
http://rmod-files.lille.inria.fr/Team/Texts/Papers/Poli22b-MoreVM22-Autolocalisation.pdf
VibePharo :)
PhaIpad
or something to leave the name free for a hand-coded version.
Or move off the name if a real contender ever shows.
Pharo is not Pharo Smalltalk but Pharo :).
Now let not me decide alone :) So we will discuss and see.
My gut feeling is that
Yes! Now we need some explanation describing the known limits.
Is there some other optimization that could bridge the gap if it is too slow?
I would love to get a comparison with the interpreter VM.
I do not know industrial users, but as a plain iPad user I would like to try to do something with it.
Now I have an old iPad and Marcus a Pro, so we can check already.
I think that most of the UI part should be done at the image level. Now I was recently discussing with Pablo the deployment of apps on Mac,
and we decided to wait for Pharo 14 because we will have SDL3 support, and we should also check the Toplo/Bloc logic because it should be easier than the current one. I mean that we can do it in a similar way to https://github.com/pharo-contributions/the-note-taker
but it should be smoother.
We are super busy with the P14 release.
Let us sync internally (tuesday is our sync point) and people are coming back from vacation.
S.
On the issue of iPad Pharo optimization:
Last evening, CLAUDE optimized the opcode decoder. Before, it was an if-else chain; now it is one C++ switch statement.
After the optimization, a comparison vs. the standard VM:
Build 111 (2898e7d) vs Reference Pharo VM (Cog JIT v10.3.9)
Metric Reference Ours Ratio
Total test time 74ms 5731ms 77.4x
Wall clock ~1s ~11s
Classes run 1999 1999
Tests run 27968 27968
The headline 77x number is misleading though. When you look at classes where the reference VM actually took measurable time (>= 1ms), we're
only 1.5x slower. The big gap comes from a ~200ms per-class constant overhead that shows up as "200x" for classes the JIT VM finishes in under
1ms. 96% of measurable classes are under 10x.
The benchmark data and analysis are in docs/benchmark-results.md.
❯ what is the above mentioned 200ms per class overhead?
⏺ So to answer your question: there's no 200ms per-class overhead. I was wrong about that.
The distribution is bimodal — classes are either 0-5ms or 150-213ms with nothing in between. The "slow" 22 classes have tests that do
computationally heavy work (graph search, parsing, factorial, dictionary rehashing, large object serialization). Per-test costs range from
0.6ms to 25ms. The reference JIT VM finishes those same tests in microseconds because it compiles the hot inner loops to native ARM64
instructions.
The numbers:
So the real optimization targets are the bytecode-intensive inner loops: arithmetic dispatch, method sends, stack manipulation — exactly what
the step() hot path optimization (item #2 in the roadmap) would address.
❯ proceed with the step optimization
2. Slim down step() hot path
• step() does ~15 conditionals, atomic loads, and a syscall per bytecode
• Should inline the hot path: fetch-dispatch-check_counter loop
• Move GC, timer, signals, yield, stuck-process checks behind a single countdown
• One decrement + branch-predicted comparison per bytecode instead of current overhead
• Expected gain: significant (combined with #1, possibly 2-3x on bytecode-heavy benchmarks)
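A sketch of that loop shape, with illustrative names (checkInterrupts and kCheckInterval are not from the sources): all the slow checks hide behind a single countdown, so the per-bytecode cost is one decrement and a well-predicted branch.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kCheckInterval = 1024;   // assumed check interval
static int interruptChecks = 0;

// Stand-in for the consolidated GC / timer / signal / yield / stuck-process hub.
inline void checkInterrupts() { ++interruptChecks; }

// Executes n no-op "bytecodes"; returns how many were run.
uint64_t interpret(uint64_t n) {
    int countdown = kCheckInterval;
    uint64_t executed = 0;
    while (executed < n) {
        // fetch + dispatch of the real bytecode would go here
        ++executed;
        if (--countdown == 0) {        // the only per-bytecode check
            checkInterrupts();
            countdown = kCheckInterval;
        }
    }
    return executed;
}
```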