The "Ahl Benchmark" of BASIC performance

Chuck(G) · Friday at 5:19 AM

cjs said:
Opcode mnemonics are part of the syntax, and very often don't match the CPU documentation. I can't remember which of LDA A,#0 or LDAA #0 is official Motorola 6800 syntax...

For a non-isomorphism, consider 8086 assembly. What does "MOV" translate to? One thing that I found moderately amusing is that the same mnemonic and syntax translates to a different binary rendition when run through, say, MASM vs. DEBUG.

One thing that I found annoying about MASM 2.0 or thereabouts was the automatic and silent transmogrification of LEA reg, imm to MOV reg, imm. Damnit--when I say "LEA" I mean "LEA".

voidstar78 · Friday at 5:59 AM

Hmm, but when you do a LIST, you are effectively "de-compiling" the set of BASIC tokens back into a set of symbolic keywords. So, from a parsing perspective it just seems a lot of similar concepts: if your opcode or p-code is one byte, then you have a set of 256 symbols that get interpreted into some kind of operation or action.

I'm just saying there is a reason BASIC became popular on those late 70s micros rather than something like FORTRAN or C. With very limited ROM resources and effectively no (affordable) file system (the 76-79 timeframe), something like C gets a lot harder to pull off even if you have 64K. Implementing BASIC is similar to an assembler, in just being a kind of token-translator (but of course opcodes get implemented by some microcode, while BASIC tokens are implemented in a ton of opcodes).

Most BASICs (at least that I'm aware of) aren't compiled and are continuously interpreted. Like looking at that 1974 "Illinois" BASIC, they chose to use ASCII 134 for "IF". So if you write your code on the screen

Code:

10 IF A = 5 THEN PRINT "FIVE"

BASIC's "workspace" stores ASCII 134 somewhere, corresponding to the token of "IF". The interpretation of that token changes depending on your "mode" (i.e. doing a LIST versus RUN). LIST will just expand ASCII 134 back to the letters "IF", while during RUN a whole slew of things has to happen (starting with looking at the operands following that token), with the support of those tokens all implemented in ROM.

Similar stuff for an assembler - it encounters the symbolic sequence "ADD" and narrows down a set of opcodes, then parses a few more operands to resolve addressing mode and such, to make a final decision about the corresponding byte code.

I know some "TinyC" and "TinyPascal" implementations were written for micros - I'm just not sure how effective they were. Like for the SuperPET, I can't recall if its Pascal was in ROM? Or just mentally think about how to implement a "real" high level language without a file system: you're parsing say 10KB of plain-text that's in RAM. That plain-text is parsed into multiple assembler files (say one for each function, so say it's 5 of those - you have to decide where in RAM that gets stored because "who knows when" any of those functions gets called {yes by analysis you could figure that out and decide to inline or not}). Now you have to load your assembler -- maybe that's in a ROM, so you can just initialize a vector of the addresses where your assembly is in RAM (all while not stomping over the original 10K plaintext -- which maybe you could flush to cassette storage and just make users reload -- which I think C on the C64 was like {a very painful experience}), and pass that to the assembler. That assembler is still going to need its own RAM for some buffering... Finally you get a linker, which has to stitch it all together using either "relocatable code" or carefully not interfering with any of the RAM addresses that your dev tools are using. Not impossible, but I'd say a lot more to coordinate and tackle than tokenized BASIC. And that's how I can appreciate BASIC for what it is - make some logic decisions and control output reactions, and store stuff in arrays, in a much easier symbolic fashion even on very resource-limited systems.

krebizfan · Friday at 6:27 AM

Some of the Microsoft BASIC descendants used double byte tokens, piggybacking off one of the higher value characters to permit an extra 128 possible tokens. The Color Computer had only one set of double byte tokens. IBM BASIC for the IBM PC had 3 sets of double byte tokens providing a potential of 512 tokens in addition to the 128 characters of 7-bit ASCII. AFAICT, no MS BASIC variant tried giving a token a value under 128, even as the second byte of a two byte token.

BASIC wasn't the only ROM choice for micros. Forth was famously used by the Jupiter Ace and a number of trainers*. Some trainers had a simple editor and assembler in ROM. The Nascom had an option for Blue Label Pascal in ROM which saved 16K of RAM. Blue Label became Turbo Pascal. Then computers reached the point of switching to software that wasn't for development with applications in ROM.

* The Multitech MPF-1/88 might be a leader in this. ROMs could have BASIC, Assembler with Editor, and Forth installed with a menu to decide which was in use. Plus, with the addition of an expansion board, graphics cards, floppy controllers, and memory cards could be added resulting in a system that could run some IBM PC software. Multitech became Acer and they switched to more conventional clones fairly soon.

Plasma · Friday at 1:48 PM

cjs said:
Opcode mnemonics are part of the syntax, and very often don't match the CPU documentation. I can't remember which of LDA A,#0 or LDAA #0 is official Motorola 6800 syntax (I think the former), but many programs use the latter and there's no issue there. For a less trivial example, I use only Z80 mnemonics for 8080 programming, even though they don't match the documentation of the 8085 CPU on my trainer board where I run them.

The key again is isomorphisms: there is a strict one between an 8080 opcode and its operands and its 8080-syntax assembly language representation, and the same between an 8080 opcode and its operands and its Z80-syntax assembly language representation. So all three represent the exact same thing, and you can derive any one from any of the others.

If you want others to be able to read your code, they should match or at least be close (LDA vs LDAA). I would consider the Z80 vs 8080 mnemonics a special case, and you are still matching the documentation for one of those. It would be very unwise for an assembler author to "invent" their own mnemonics that are completely different from any CPU docs.

cjs · Friday at 2:57 PM

Chuck(G) said:
For a non-isomorphism, consider 8086 assembly. What does "MOV" translate to? One thing that I found moderately amusing is that the same mnemonic and syntax translates to a different binary rendition when run through, say, MASM vs. DEBUG.

This is why I was careful to say that "Opcode mnemonics are part of the syntax" and "there is a strict one between an 8080 opcode and its operands." MOV itself could translate to multiple different opcodes; you need the operands in order to know the opcode.

voidstar78 said:
Hmm, but when you do a LIST, you are effectively "de-compiling" the set of BASIC tokens back into a set of symbolic keywords. So, from a parsing perspective it just seems a lot of similar concepts: if your opcode or p-code is one byte, then you have a set of 256 symbols that get interpreted into some kind of operation or action.

No, tokenised BASIC is very different from P-code. Tokenised BASIC is another form of source code; P-code is object code generated by a compiler that is one implementation of a compilation of the source code (out of many possible ones).

That there's a direct and easy isomorphism (performed by the tokeniser and the LIST command) should make it clear that tokenisation is source code. And you can also observe that tokenisation does no checks of even syntax, much less semantics: you can type 10 =IF)7 into a C64, which will happily accept it, and show it back to you with LIST, though this is nonsense as a BASIC program.
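A minimal sketch of that kind of substitution, in Python rather than anything period-appropriate. The token table is an assumption for illustration only (the byte values are not claimed to match any particular ROM), the names tokenise and detokenise are made up, and the line number is left as text, which a real tokeniser would not do. The point is only that the mapping is a plain two-way substitution with no syntax checking:

```python
# Hypothetical token table: byte values are illustrative only and are
# not claimed to match any particular ROM.
TOKENS = {"PRINT": 0x99, "THEN": 0xA7, "GOTO": 0x89, "IF": 0x8B}
KEYWORDS = {code: word for word, code in TOKENS.items()}


def tokenise(text: str) -> bytes:
    """Replace keywords outside string literals with one-byte tokens."""
    out = bytearray()
    i, in_string = 0, False
    while i < len(text):
        if text[i] == '"':
            in_string = not in_string
        # Keywords are only recognised outside quoted strings.
        word = None if in_string else next(
            (w for w in TOKENS if text.startswith(w, i)), None)
        if word:
            out.append(TOKENS[word])
            i += len(word)
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def detokenise(data: bytes) -> str:
    """What LIST does: expand each token byte back to its keyword."""
    return "".join(KEYWORDS.get(b, chr(b)) for b in data)


line = '10 IF A = 5 THEN PRINT "FIVE"'
assert detokenise(tokenise(line)) == line

# Nonsense is accepted just as happily: nothing here checks syntax.
nonsense = "10 =IF)7"
assert detokenise(tokenise(nonsense)) == nonsense
```

Nothing in either direction knows what IF means; it only knows how IF is spelled.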

voidstar78 said:
I'm just saying there is a reason BASIC became popular on those late 70s micros rather than something like FORTRAN or C.

Yes. And that would be because Paul Allen, Monte Davidoff and Bill Gates wrote MS-BASIC right at the start of the personal computer revolution, marketed it well, and sold enough of it early on that it essentially became a standard that soon (nearly) every manufacturer decided to follow.

They chose BASIC because it was what they happened to know; there were other options that were both more powerful and easier to implement, such as Lisp.

voidstar78 said:
Implementing BASIC is similar to an assembler, in just being a kind of token-translator....

No, that's very, very wrong. To see why that's so wrong, write a little "token-translator" that can deal with something even as simple as:

Code:

10 J=0
20 GOSUB 100
30 END
100 IF J > 100 THEN RETURN
110 FOR I = 1 TO 3: J = J + I : NEXT
120 GOSUB 100

What you will come up with will be very much different from an assembler.
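To give a sense of scale, here is a minimal sketch of an interpreter for just that listing, in Python rather than period assembly. It rests on loud assumptions: the names load and run are made up, Python's eval stands in for a real expression evaluator, a false IF skips the rest of its line, and only the statement forms in the listing are handled. Even so, it needs runtime state (a GOSUB return stack, a FOR stack, a variable table) that an assembler never has:

```python
import re

SOURCE = """\
10 J=0
20 GOSUB 100
30 END
100 IF J > 100 THEN RETURN
110 FOR I = 1 TO 3: J = J + I : NEXT
120 GOSUB 100
"""


def load(source):
    """Flatten the source into a list of (line_number, statement) pairs."""
    stmts = []
    for line in source.splitlines():
        number, _, body = line.strip().partition(" ")
        for stmt in body.split(":"):
            stmts.append((int(number), stmt.strip()))
    return stmts


def run(source):
    """Interpret the program; return (variables, deepest GOSUB nesting)."""
    stmts = load(source)
    first = {}  # line number -> index of its first statement
    for i, (number, _) in enumerate(stmts):
        first.setdefault(number, i)
    variables, gosub_stack, for_stack = {}, [], []
    max_depth, pc = 0, 0
    while pc < len(stmts):
        line_no, stmt = stmts[pc]
        pc += 1
        m = re.fullmatch(r"IF (.+) THEN (.+)", stmt)
        if m:
            # eval is a shortcut for a real expression evaluator.
            if not eval(m.group(1), {}, variables):
                # Condition false: skip the rest of this line.
                while pc < len(stmts) and stmts[pc][0] == line_no:
                    pc += 1
                continue
            stmt = m.group(2)
        if stmt == "END":
            break
        elif stmt == "RETURN":
            pc = gosub_stack.pop()
        elif stmt.startswith("GOSUB"):
            gosub_stack.append(pc)  # resume after the GOSUB
            max_depth = max(max_depth, len(gosub_stack))
            pc = first[int(stmt.split()[1])]
        elif stmt.startswith("FOR"):
            m = re.fullmatch(r"FOR (\w+) *= *(.+) TO (.+)", stmt)
            variables[m.group(1)] = eval(m.group(2), {}, variables)
            limit = eval(m.group(3), {}, variables)
            for_stack.append((m.group(1), limit, pc))
        elif stmt == "NEXT":
            var, limit, body = for_stack[-1]
            variables[var] += 1
            if variables[var] <= limit:
                pc = body  # loop again
            else:
                for_stack.pop()
        else:  # anything else is treated as an assignment
            name, _, expr = stmt.partition("=")
            variables[name.strip()] = eval(expr.strip(), {}, variables)
    return variables, max_depth
```

With this sketch, run(SOURCE) finishes with J at 102 after the GOSUB stack has grown to 18 entries, none of which exists anywhere in the program text.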

voidstar78 said:
Or just mentally think about how to implement a "real" high level language without a file system: you're parsing say 10KB of plain-text that's in RAM. That plain-text is parsed into multiple assembler files....

Well, yes, that would be very difficult, if you chose such a ridiculous way to implement it. Not even compilers with a file system available are quite so foolish.

If you're going to implement a compiled high-level language without a filesystem, you'd generally do it the same way assemblers without filesystems were done: you have an area of RAM for source code, another area of RAM for object code, and you directly assemble/compile the source to object code. There's no need for or point to a linker in such a situation; linkers are useful when you have library archives, which clearly you don't in that situation.

And, in fact, at least one (somewhat) high-level language was done in exactly this way. I can't recall the name of it now, but I believe it was something from Motorola designed for doing mathematical programs.

Chuck(G) · Friday at 4:00 PM

cjs said:
This is why I was careful to say that "Opcode mnemonics are part of the syntax" and "there is a strict one between an 8080 opcode and its operands." MOV itself could translate to multiple different opcodes; you need the operands in order to know the opcode.

But that was my point--MOV syntax being exactly the same between MASM and DEBUG won't generate the same code in some instances. Note that there are special-cased "short" forms of MOV as well as long forms to accomplish the same thing. MASM will pick the short form, while DEBUG always uses the long general form.

krebizfan · Friday at 4:08 PM

cjs said:
If you're going to implement a compiled high-level language without a filesystem, you'd generally do it the same way assemblers without filesystems were done: you have an area of RAM for source code, another area of RAM for object code, and you directly assemble/compile the source to object code. There's no need for or point to a linker in such a situation; linkers are useful when you have library archives, which clearly you don't in that situation.

And, in fact, at least one (somewhat) high-level language was done in exactly this way. I can't recall the name of it now, but I believe it was something from Motorola designed for doing mathematical programs.

Turbo Pascal did that in its early versions as befits a design that started off on a cassette only system.

The heavily documented p-code system was UCSD's and information on that can be found at http://pascal.hansotten.com/ucsd-p-system/more-on-p-code/

voidstar78 · 2024-05-18T09:38:12+0100

cjs said:
They chose BASIC because it was what they happened to know; there were other options that were both more powerful and easier to implement, such as Lisp.

I don't think it was quite that simple. Here is the Wang2200 BASIC manual, also from 1974 (pre-dating the Altair), which also included floating point support.

https://www.wang2200.org/docs/language/2200_AB_BasicProgrammingManual.700-3231.2-74.pdf

Show me a ROM Lisp in under 8K that can also do floating point. APL was more favored for the IBM 5100, but even IBM was convinced that BASIC was a necessity (though using a whopping 36KB of ROM). They took the extra year needed to integrate that into their system.

cjs said:
No, that's very, very wrong. To see why that's so wrong, write a little "token-translator" that can deal with something even as simple as:

Sure, it is going to LOOK very different, but conceptually there are similarities. With some notional CISC-like instructions, that BASIC might look like this:

Code:

data.j: .byte 0
data.i: .byte 1

  .org
  lda data.j    ; j = 0
  bga 100, end  ; if j > 100 then end
  ldb data.i    ; i = 1
again:
  incb          ; i = i+1
  bgb 3, end    ; if i > 3 then end
  jmp again

end:

And emphasis on "similar" - yes in the assembler you use the default .org as akin to "where the program starts". Since BASIC doesn't require a file system, it can't depend on the row in a file to imply an ordering - so the actual address of the "code" (meaning that tokenized stuff, not native machine code) is abstracted away by those horrid line numbers (an unfortunate necessary evil just due to not having an editor or file system). Once you do have a file system with a row/line designator, then the line numbers almost immediately become moot.

Conversely, of course, you can embed your "inline machine code" in BASIC with a bunch of DATA sequences (only if you have POKE and SYS, which not all did).

And to clarify, I'm using p-code like described in the TinyPascal article here:

Byte Magazine Volume 03 Number 09 - Graphic Manipulations (Internet Archive, archive.org); see p.58, "A 'Tiny' Pascal Compiler, Part 1: The P-Code Interpreter".

cjs said:
Not even compilers with a file system available are quite so foolish.

That was the intent of the thought exercise - to recognize how much we take for granted on having a file system available (or once that becomes available, then yes quickly far better things than BASIC become available - but the practicality of BASIC, if just for that immediate-mode, is still nice).

A linker is needed once your code becomes larger than a segment size. Or like in cc65 where there is that linker configuration file, to "blank out" certain region of memory as "off limits" during the final linking stage. Linking isn't just about bringing in a .lib or .a files. It's become that in the modern sense, but fundamentally it's deciding where in the address space your object code needs to be placed (then an "executable" file helps guide that placement more organically each time the code is loaded -- it used to be that code-size mattered -- hence the optimize for size options - but that's rare these days).

cjs · 2024-05-18T12:25:50+0100

voidstar78 said:
Show me a ROM Lisp in under 8K that can also do floating point.

I don't happen to have one handy (I'm not even aware, off-hand, of any Lisps put into ROM), but early Lisps pre-dating microcomputers were often that small and did floating point, IIRC.

But I'm more than a little mystified by how you could even think that this wouldn't be possible. The size of the floating point code doesn't change significantly whether you're using it for BASIC, Lisp, or any other language, and a minimal Lisp interpreter is far smaller than a BASIC interpreter. So anything you think it was important BASIC did in a small space by nature would have been done by Lisp in as small or smaller a space simply because Lisp has far less cruft to deal with.

voidstar78 said:
Sure, it is going to LOOK very different, but conceptually there are similarities. With some notional CISC-like instructions, that BASIC might look like this:

I'm starting to wonder if you're trolling me here. Hand-compiling half of a program you've been given is not at all the same thing as writing an interpreter or compiler, any more than hand-assembling a bit of assembly code is the same thing as having an assembler.

Try again, this time writing a program that reads the program I gave, or very similar ones, and interprets or compiles it.

voidstar78 said:
Since BASIC doesn't require a file system, it can't depend on the row in a file to imply an ordering - so the actual address of the "code" (meaning that tokenized stuff, not native machine code) is abstracted away by those horrid line numbers....

It's not "abstracted away" by those line numbers: the line numbers (which are in the tokenised code) are the addresses for GOTO, GOSUB, etc.

voidstar78 said:
...but the practicality of BASIC, if just for that immediate-mode, is still nice).

And many other languages have that same "immediate mode" as well, including Lisp. In fact, that's where BASIC gets the whole idea of its interactivity from: Thomas E. Kurtz, while he was visiting MIT, met John McCarthy who suggested that time sharing would get around the issue of long delays in feedback due to having to submit jobs on cards to be run on batch systems. There's no question in my mind that Kurtz had seen the interactive Lisp systems that were in common use in McCarthy's lab at MIT at the time and modeled BASIC on that idea ("sit down at a terminal, type some program code, and see what comes back").

voidstar78 said:
A linker is needed once your code becomes larger than a segment size. Or like in cc65 where there is that linker configuration file, to "blank out" certain region of memory as "off limits" during the final linking stage....

I think you're getting distracted by the whole linking thing: it's clearly not necessary for the sort of systems we're talking about: neither MS-BASIC nor many popular Lisp implementations used it.

voidstar78 · 2024-05-18T19:17:00+0100

Well let's try this: below is output from the IBM5110 emulator I put together, just using it as a way to demonstrate what tokenized BASIC looks like. (And sure, I can agree that whether it is Lisp, BASIC, Java, or whatever syntax sugar you pick, they could be tokenized - it's just that some choices of syntax take a bit more effort to tokenize.)

I just know the IBM5110 starts tokenizing at address 0x120E. Also note syntactically, the IBM 5110's BASIC doesn't use THEN and doesn't support compound statements (i.e. no ":"). So a few minor adjustments were necessary. It ends up tokenized like this:

Code:

                         @120E
                         NEXT PREV LINE  CODE
10 J=0                   121B 0000 00 10 33 D1 7E 3A 00 00 FF  : (J=D1)
20 GOSUB 100             1225 120E 00 20 31 01 00 FF
30 END                   122D 121B 00 30 34 FF                 v-"100"
100 IF J>100 GOTO 130    123F 1225 01 00 2E 00 6E D1 6E 3A 00 64 FE 01 30 FF
110 FOR I=1 TO 3         1250 122D 01 10 29 C9 40 3A 00 01 FE 3A 00 03 FF
115 J=J+I                125D 123F 01 15 33 D1 7E D1 4E C9 FF
116 NEXT I               127B 1250 01 16 2A C9 40 00 ..<x18>.. 00 FF
120 GOSUB 100            1285 125D 01 20 31 01 00 FF
130 RETURN               0000 127B 01 30 26 FF

That's not native PALM machine code for that system, it's just the set of representative tokens they chose to use (so the "plain text" of the program doesn't need to be stored in memory, but rather this "compressed tokenized form"). Another system could technically "run" that same p-code (and maybe a CPU could even be microcoded to run it as its native instruction set, though it would be hugely inefficient to do so).

The BASIC line number is not the memory address - but, like you said, the tokenized form (which is stored at some "system decided address", as indicated by the next/prev linked list addresses prior to the executive tokens) also contains the original user-supplied line number. That's how I meant "abstracted away" to the end user (on the specifics of what actual physical address the program tokens are stored). To support much larger programs, or combinations of programs, and you need to start using CHAIN or for other reasons, it's no guarantee your program will get tokenized always to these same addresses in the general-case (for tiny examples like this they likely will get the same address).
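The linked-list layout can be checked against the dump itself. Here is a small Python sketch, assuming (from the column headings only) that each record starts with a 6-byte header of NEXT, PREV and LINE, and that records sit back to back starting at 0x120E. The names HEADER, RECORDS, record_size and check_links are made up for illustration; the data is copied from the first four rows of the dump:

```python
# Assumed layout: 2 bytes NEXT + 2 bytes PREV + 2 bytes LINE, then the code
# bytes. This is inferred from the dump's columns, not from documentation.
HEADER = 6

# (address, next, prev, code bytes) for lines 10, 20, 30 and 100.
RECORDS = [
    (0x120E, 0x121B, 0x0000, "33 D1 7E 3A 00 00 FF"),
    (0x121B, 0x1225, 0x120E, "31 01 00 FF"),
    (0x1225, 0x122D, 0x121B, "34 FF"),
    (0x122D, 0x123F, 0x1225, "2E 00 6E D1 6E 3A 00 64 FE 01 30 FF"),
]


def record_size(code_hex: str) -> int:
    """Header plus the number of code bytes in the record."""
    return HEADER + len(bytes.fromhex(code_hex))


def check_links(records) -> bool:
    """True if every NEXT is address + size and every PREV is the prior address."""
    prev_addr = 0x0000
    for addr, nxt, prev, code in records:
        if nxt != addr + record_size(code) or prev != prev_addr:
            return False
        prev_addr = addr
    return True


print(check_links(RECORDS))
```

Under those assumptions the NEXT pointer is simply the record's own address plus its length, so the user's line number plays no part in where the record ends up.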

But from the above token, you can see some trends: <33> is their code for assignment (=), D1 is associated with the variable J, the difference between END vs RETURN is just token 34 vs 26, GOSUB is represented by "opcode" <31>, etc. So you're right, the similarity may be more to an instruction set than "assembly language" per se - the lines there get blurred to me (yeah, blasphemy to some; I do recognize that assembly generally does need two-passes to sort out the branch distances - I recall Gary Kildall writing some early 70's ones needing three passes - so I'm not trying to trivialize the specifics of an assembler).

But the similarity I meant is just that principle of <opcode> <operands> getting interpreted (and ok, BASIC isn't unique in that - but its particular approach could fit in 4KB ROMs, and when trying to mass produce inexpensive systems, every cent counts). The code already exists to parse your sample into that fashion of <opcode> <operand> and to interpret it: it's nearly any ROM BASIC you can find. Maybe other "languages" could have fit - we can do a lot now in hindsight - but for whatever reason, it seems BASIC was the main choice.

And I can agree the relative "success" of BASIC can be attributed more to marketing than any technical "goodness." But I think the overall appealing aspects were accepted several years before Microsoft (just from seeing earlier Wang, Tektronix, and HP systems "adopting" BASIC in their more mainstream desk top systems). Wang went so far as to even implement their BASIC using TTL chips. The name itself is part of that marketing, and maybe that also contributed to the appeal. APL struggled (IMO) just since finding a "normal" keyboard at all (early/mid 70s) was a challenge, let alone dealing with an extra translation of "funny symbols".

cjs said:
I think you're getting distracted by the whole linking thing: it's clearly not necessary for the sort of systems we're talking about: neither MS-BASIC nor many popular Lisp implementations used it.

From what I've seen, most of the "micros" didn't bother supporting the CHAIN keyword in BASIC. Sure, that's not the only way to handle "large programs" in BASIC. The general idea is to keep your variable/array state while loading in a new program (like a fancy sort/reporting routine, or in a game maybe some fancy path-searching routine to make interesting opponent AI). An excellent implementation of Hangman on the X16, done in BASIC, was running out of FRE(0) space to add any more features. You're right that systems of that scale will generally have a file system of some sort. Just in most BASICs, you can't DIM more than about 5000-6000 (or FRE(0)/5, I think?). Sure, it's a system resource limitation (to have that much data resident at once in addition to the code) more than an inherent language limitation. To me it just shows how BASIC had its place for a time in resource-limited systems (and others have argued that boot-up BASIC systems were just "fancy programmable calculators" -- maybe, but some million-dollar businesses did run using BASIC for payroll, etc.).

Chuck(G) · 2024-05-18T20:09:39+0100

As I've mentioned earlier, we used the MCBA accounting/inventory/payroll suite on an 8085 system. MCBA was acquired about a decade ago by Geneva Software: https://www.genevasoftware.com/AboutUs.aspx

Serious stuff.

cjs · 2024-05-19T03:15:44+0100

voidstar78 said:

Code:

                         @120E
                         NEXT PREV LINE  CODE
10 J=0                   121B 0000 00 10 33 D1 7E 3A 00 00 FF  : (J=D1)
20 GOSUB 100             1225 120E 00 20 31 01 00 FF
30 END                   122D 121B 00 30 34 FF                 v-"100"
100 IF J>100 GOTO 130    123F 1225 01 00 2E 00 6E D1 6E 3A 00 64 FE 01 30 FF
110 FOR I=1 TO 3         1250 122D 01 10 29 C9 40 3A 00 01 FE 3A 00 03 FF
115 J=J+I                125D 123F 01 15 33 D1 7E D1 4E C9 FF
116 NEXT I               127B 1250 01 16 2A C9 40 00 ..<x18>.. 00 FF
120 GOSUB 100            1285 125D 01 20 31 01 00 FF
130 RETURN               0000 127B 01 30 26 FF

So that is actual IBM 5110 "tokenised" BASIC code? As far as I can tell that is not "tokenised" in the sense that MS-BASIC is, but instead seems to be some intermediate form; though it would also be a lot clearer if you annotated it. Here's the beginnings of a reverse-engineered annotation of the first few lines:

Code:

                       @120E
                      │ NEXT │ PREV │ LINE │ CODE
10 J=0                │ 121B │ 0000 │ 0010 │ 33 D1 7E 3A 00 00 FF  : (J=D1)
                                             '3 D1 '~ ': ?  ?  Ω
20 GOSUB 100          │ 1225 │ 120E │ 0020 │ 31 01 00 FF
                                             τS  100  Ω

I've marked ASCII characters with ' followed by the character, though these may not actually be representing ASCII characters in the source, what appear to be line terminators with Ω, and keywords with τ followed by a letter, e.g., τS for GOSUB.

It's pretty clear to see here that this is not BASIC tokenization in the MS-BASIC sense of the term: it's doing some significant additional processing that involves not just removing spaces and the like (which is one step in heading towards an AST) but it also seems to be renaming variables and, well, I've not investigated enough further to see what's really going on there. If you can produce a small, simple translator between the two forms and post it here, that would explain a lot more.

In MS-BASIC, as I mentioned before, the tokenisation is a very simple substitution of tokens for certain strings and back again: both are source code just expressed in a slightly different form. (This is true even in the later BASICs where numbers are tokenised; there's a bit more work because numeric tokenisation form depends on what comes before the number - a GOTO gets a line number instead of a float, etc. - but that's not changing the core of what it's doing.)

If you want to examine how that works you can have a look at my MSX-BASIC de-/re-tokeniser on a branch in r8format. That includes the code itself, plenty of test data (under programs/) that's easy to examine, de- and re-tokenisation command line tools, and a program to hexdump a tokenised MS-BASIC program in a more readable format than a regular hexdump.

voidstar78 said:
Another system could technically "run" that same p-code (and maybe a CPU could even be microcoded to run it as its native instruction set, though it would be hugely inefficient to do so).

I think you may not be clear on how much difference there is between an interpreter that can run this kind of code and an "interpreter" that runs standard machine code. For example, CPUs do not normally have "allocate this variable name as a storage location in a heap and maintain the mapping" as an instruction. (And nor should they, most people would argue.)

voidstar78 said:
The BASIC line number is not the memory address - but, like you said, the tokenized form (which is stored at some "system decided address", as indicated by the next/prev linked list addresses prior to the executive tokens) also contains the original user-supplied line number. That's how I meant "abstracted away" to the end user (on the specifics of what actual physical address the program tokens are stored).

Yes, but that's a trivial abstraction and, in earlier versions of MS-BASIC, that abstraction doesn't even exist. Every time the interpreter sees a GOTO 100 statement it doesn't "turn that into another number," but instead just searches its linked list for line number 100. (This is why it's good to put frequently called subroutines as early as possible in your program.)
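As a rough illustration of why placement matters, here is a sketch in Python of that search. It assumes the simplest possible scheme, a walk from the first line of the program on every jump; the names Line, build and find_line are made up, and real interpreters differ in the details:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    number: int
    text: str
    next: Optional["Line"] = None


def build(lines):
    """Link (number, text) pairs into a list ordered by line number."""
    head = None
    for number, text in sorted(lines, reverse=True):
        head = Line(number, text, head)
    return head


def find_line(head, target):
    """Walk from the start of the program; return (line, links followed)."""
    steps, node = 0, head
    while node is not None:
        if node.number == target:
            return node, steps
        node, steps = node.next, steps + 1
    raise LookupError(f"undefined line {target}")


# A 100-line program numbered 10, 20, ... 1000.
program = build([(n, "REM") for n in range(10, 1010, 10)])
print(find_line(program, 20)[1])    # a subroutine near the top
print(find_line(program, 1000)[1])  # the same subroutine at the bottom
```

In this sketch a GOSUB to line 20 follows 1 link and a GOSUB to line 1000 follows 99, and that cost is paid again on every call.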

voidstar78 said:
To support much larger programs, or combinations of programs, and you need to start using CHAIN or for other reasons, it's no guarantee your program will get tokenized always to these same addresses in the general-case (for tiny examples like this they likely will get the same address).

Right. And in MS-BASIC, that's completely unimportant as well, since it uses the line numbers directly as the addresses; it cares not at all at what physical addresses the line numbers are stored. You can even freeze program execution, remove the entire first line (let's assume it's a REM or whatever), shift everything else down, patch up all the linked list pointers, and carry on and everything will be fine. (This may depend on what the subroutine stack is holding in terms of addresses, though, but let's assume we're not in a subroutine.)

In other words, addresses in an MS-BASIC program are the line numbers, not the locations in memory.

voidstar78 said:
But from the above token, you can see some trends: <33> is their code for assignment (=), D1 is associated with the variable J...

What are the $7E and $3A in that first line?

Anyway, even here we suddenly see a major difference between this and MS-BASIC's tokenisation of the same line:

Code:

                      │ NEXT │ LINE  │ CODE
10 J = 0              │ .... │ 0A 00 │ 4A 20 EF 20 00
                                       'J sp =  sp Ω

You'll note here that MS tokenisation is simply doing direct replacement of one symbol for another: there is no syntax change as the IBM 5110 system above is doing. (The IBM system is, from the looks of it, at least starting to build an AST, if $33 represents '='.) That's a major program transformation step, and generally the first one towards interpreting or compiling code. MS-BASIC does not do this when it tokenises code; MS-BASIC's tokenisation is really just a form of compression that also happens to make the next step of interpretation a bit easier.

voidstar78 said:
...the similarity may be more to an instruction set than "assembly language" per se - the lines there get blurred to me (yeah, blasphemy to some; I do recognize that assembly generally does need two-passes to sort out the branch distances - I recall Gary Kildall writing some early 70's ones needing three passes - so I'm not trying to trivialize the specifics of an assembler).

But you can "trivialise" the specifics of an assembler compared to this, because an assembler doesn't actually need an AST (except perhaps for expressions in an operand field). The passes are irrelevant here; they're needed only to resolve forward references. Plenty of assembly-language programs can be assembled in a single pass.
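A toy example of the single-pass case, sketched in Python. The two-instruction machine is entirely made up (nop assembles to 0x00, jmp to 0x01 followed by a one-byte address), as is the name assemble_one_pass. It shows that a backward reference needs nothing more than a symbol table filled in as labels go by, while a forward reference is exactly the point where one pass stops being enough:

```python
def assemble_one_pass(lines):
    """Assemble a made-up two-instruction language in a single pass."""
    symbols, code = {}, bytearray()
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            # A label records the current output address.
            symbols[line[:-1]] = len(code)
        elif line == "nop":
            code.append(0x00)
        elif line.startswith("jmp "):
            target = line.split()[1]
            if target not in symbols:
                # Not seen yet: this is what a second pass is for.
                raise KeyError(f"forward reference to {target!r}")
            code += bytes([0x01, symbols[target]])
        else:
            raise ValueError(f"unknown instruction {line!r}")
    return bytes(code)


# Backward branch only: one pass is enough.
print(assemble_one_pass(["again:", "nop", "jmp again"]).hex())
```

Note there is no tree anywhere: each line is translated on its own, with a table lookup for the operand.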

voidstar78 said:
But the similarity I meant is just that principle of <opcode> <operands> getting interpreted (and ok, BASIC isn't unique in that - but its particular approach could fit in 4KB ROMs....

Well, yes. Building an AST isn't terribly hard and in fact, for Lisp, it's so easy that it will take up less space in ROM than for almost any other language. (This is because LISP S-expression syntax is already an expression of an AST; you simply need to convert from using parens to indicate tree nodes to an actual tree data structure with pointers.)
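A minimal sketch of that conversion, in Python. The function name read_sexpr is made up, nested Python lists stand in for the cons cells a real Lisp would use, and there is no error handling (unbalanced parens simply raise IndexError):

```python
def read_sexpr(text):
    """Turn S-expression text into nested lists: the parens become the tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        tok = tokens[pos]
        if tok == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = parse(pos)
                node.append(child)
            return node, pos + 1  # step over the closing paren
        # An atom: a number if it looks like one, otherwise a symbol.
        return (int(tok) if tok.isdigit() else tok), pos + 1

    tree, _ = parse(0)
    return tree


# (I+1)*(J+2) written in prefix form.
print(read_sexpr("(* (+ I 1) (+ J 2))"))
# → ['*', ['+', 'I', 1], ['+', 'J', 2]]
```

There is no operator precedence, no keyword table and no statement grammar to deal with, which is where the space saving comes from.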

voidstar78 said:
The code already exists to parse your sample into that fashion of <opcode> <operand> and to interpret it, it's nearly any ROM BASIC you can find.

Well, yes, that code has to exist for any language. But it's not "<opcode> <operand>"; it's an AST. Have a look at what your 5110 BASIC does for K = (I+1)*(J+2) for an example.

voidstar78 said:
Wang going so far as to even implement their BASIC using TTL chips.

I very much doubt that happened; I'd be interested to see what led you to that conclusion. I expect that Wang implemented a CPU in TTL, but had a software BASIC interpreter or compiler like everybody else.

voidstar78 said:
APL struggled (IMO) just since finding a "normal" keyboard at all (early/mid 70s) was a challenge, let alone dealing with an extra translation of "funny symbols".

I certainly believe that was part of it. Another part would simply be the more mathematical approach. For whatever reason, people are fine with using rather sophisticated mathematical concepts if they learned them in elementary school (look up the history of zero and the equals sign if you like), but not equally or in some cases less sophisticated stuff that is taught only later on.

voidstar78 said:
From what I've seen, most of the "micros" didn't bother supporting the CHAIN keyword in BASIC.

I'm not really seeing how that's relevant.
