Hate UML?

Draw sequence diagrams in seconds.
http://www.websequencediagrams.com

qb.js: An implementation of QBASIC in Javascript
Posted on: 2010-01-08 21:30:00
If your browser supports the proposed CANVAS tag, you will see a screen below containing a BASIC program.

Many programmers seldom think about how their compiler or scripting language is implemented. To them, it is a tool, and the less it gets in the way, the better. However, for a craftsman, knowing how your tools work can help you do a better job. It helps to think on several levels at once. Take this code, for example:

function createDummyString( length )
{
    var result = "";
    for ( var i = 0; i < length; i++ ) {
        result = result + " "; 
    }

    return result;
}

In some languages, strings are immutable, so result could be repeatedly created, copied, and destroyed millions of times in this one function call.

Those who are familiar with compilers have the ability to think on another level. For them, visible about 10 cm behind the computer screen is a whole other level of code, which contains how the programming language might be implemented. Beyond that, further in the distance, lies assembly language, with its precarious branch prediction, happy cache hits, and misrable misses.

Here is an implementation of QBASIC in Javascript. In this next series of blog entries, we will explore its inner workings, covering all parts of the compilation process.

What works

Only text mode is supported. The most common commands (enough to run nibbles) are implemented. These include:

  • Subs and functions
  • Arrays
  • User types
  • Shared variables
  • Loops
  • Input from screen

What doesn't work

  • Graphics modes are not supported
  • No statements are allowed on the same line as IF/THEN
  • Line numbers are not supported
  • Only the built-in functions used by NIBBLES.BAS are implemented
  • All subroutines and functions must be declared using DECLARE

This is far from being done. In the comments, AC0KG points out that P=1-1 doesn't work. Update in 2014: If I ever do this again I would use a "Packrat" parser.

The parser is slow because we are using a simple Earley parser. (EarleyParser.js). A faster implementation of a different kind of parser is in the works.

License

License is GPL v3.

Overview of the system

The compiler takes your BASIC program and converts it into a list of user data types, data from DATA statements, and bytecode statements. Here's pretty picture of the previous sentence:

If you just look at the Javascript source it will be confusing to figure out what goes where. So here is a map of all the important "classes" of the system:

Console

(Source) A canvas that represents the screen and captures keyboard input

Virtual machine

(Source) The virtual machine executes bytecode. The bytecode instructions may manipulate the stack, jump to a new address, or use the console functions. They may also execute system functions (such as LEFT$) or system subroutines (such as CLS).

Types

(Source) Each type (single, double, integer, long, array, user) has functions to create an initial value, or to copy a value into a variable. Upon a copy, for instance, the integer type performs rounding and truncates the value to 16 bits.

Codegenerator

(Source) The code generator visits each node of the abstract syntax tree abd generates bytecode for it. It uses the type information added by the TypeChecker as an aid.

TypeChecker

(Source) The TypeChecker's job is to catch any errors before we try compiling. Without it, you could write a program that tries to multiply an array by the string "George". Then the virtual machine would say "WTF?!" and crash. It fills in the type for any expression in the syntax tree.

GlrParser

(Source) The GlrParser is an implementation of Tomita's GLR parser. It uses a RuleSet to parse the program into an abstract syntax tree.

Unfortunately, there is a problem with my implementation of the GLR parser. I think I know how to solve it, but at the moment, the system uses the very slow, but concise Earley parser, in EarleyParser.js EarleyParser.js.

Tokenizer

(Source) If you try to use javascript's built in Regex object for splitting text into tokens, you will soon pull your hair out and run screaming through the halls. Instead, the tokenizer implements a simple Thompson NFA with lazy evaluation. In some cases, this technique is faster than Javascript's own RegEx functions!

Ruleset

(Source) The ruleset contains grammar rules. In addition, it can remove redundant rules, and compute the FIRST and FOLLOW sets.

Ruleparser

(Source) The RuleParser is implemented on top of the RuleSet. It uses the parser to parse rules, and transforms them to add goodies like comma separated lists, kleene star, and alternation operators. But it's main purpose is to allow me to say, "My parser uses itself to parse its own rules!"

QBasic

(Source) Finally, qbasic.js contains all of the grammar rules and Abstract Syntax Tree nodes for BASIC programs.

Virtual machine

The most straightforward way of creating a basic compiler in javascript is to directly translate the basic into javascript functions. But this approach will not work for two reasons. First, there is "goto" which, although it is a reserved word, is not yet in Javascript. (Obviously, ECMAScript community finds "with", prototype inheritance, and the rules of the 'this' keyword to be far less confusing than allowing "goto"). It is possible to automatically move statements around to eliminate GOTO, but you don't want to go there.

The other problem is that browsers tend to freeze until javascript programs finish running. To avoid freezing the browser until the program ends, we break the program into small chunks, and execute a few of those chunks every so often using a javascript timer. This gives the appearance of a running program and doesn't freeze the browser.

Bytecode solves both of those problems. By breaking the program into bytecode instructions, we can implement goto by just changing which instruction we are going to execute next. We can also suspend execution any time to allow the user to interact with the browser.

The virtual machine executes programs, which consist of types, an array of instructions, and a set of variable names which are shared. It has some data:

  • pc, the program counter (the index of the next instruction to execute)
  • a stack of frames. A frame maps variable names to their values. Each time a function is called, the frame is added to the stack. Each time it returns, the frame is removed, thus destroying all local variables.
  • an execution stack. The instructions can manipulate this stack. Two types of things can be pushed onto the stack: either a value (which is.a javascript string, number, or null), or a reference to a value (which can be the ScalarVariable or ArrayVariable objects).

Executing Instructions

Javascript excels at looking up things in objects and mapping strings to functions. This makes the virtual machine instruction lookup very efficient.

All of the information about each instruction is stored in a single object, called "Instructions". That means we can run dispatch instructions very simply, leaving the heavy work of figuring out where the code is to the javascript runtime:

while( this.pc < this.instructions.length ) {
    var next = this.instructions[this.pc++];
    next.instr.execute( this, instr.arg );
}

Each instruction can manipulate the stack, set variables, or change the pc to jump to another location.

Example

Here's some basic code:

A = 1
B = 2
PRINT A + B

And here's the bytecode produced to run the above statements:

   ' L1 A = 1
[0] pushconst 1
[1] pushref A
[2] assign
   ' L2 B = 2
[3] pushconst 2
[4] pushref B
[5] assign
   ' L3 PRINT A + B
[6] pushvalue A
[7] pushvalue B
[8] +
[9] syscall print
[10] pushconst 'n'
[11] syscall print
[12] ret
[13] end

PUSHCONST 1 pushes a number 1 onto the stack. PUSHREF A is a bit complicated, though. It pushes a reference to variable A onto the stack. Since there was no prior value of A, it has to create one with the default type of SINGLE and adds the mapping from the name "A" to that variable. After all that housekeepint, it does the push. The state of the virtual machine looks like this:

The ASSIGN instruction expects a reference and a constant on the stack. It removes them, and assigns the reference to the variable. Here's the actual javascript implementation of the instruction:

    ASSIGN: {
        name: "assign",
        numArgs: 0,
        execute: function( vm, arg )
        {
            // Copy the value into the variable reference.
            // Stack: left hand side: variable reference
            // right hand side: value to assign.

            var lhs = vm.stack.pop();
            var rhs = vm.stack.pop();

            lhs.value = lhs.type.copy( rhs );
        }

Let's look at instructions 6 to 8. We've seen PUSHREF, and PUSHCONST, now what's PUSHVALUE? This instructions the variable name and pushes its current value on the stack. After instruction 7, the state of the virtual machine is this:

All of the instructions are pretty simple to implement. Here's how the "+" instruction works. Thanks to Javascript, it works for both strings and numbers.

    "+": {
        name: "+",
        numArgs: 0,
        execute: function( vm, arg )
        {
            var rhs = vm.stack.pop();
            var lhs = vm.stack.pop();
            vm.stack.push( lhs + rhs );
        }
    },

Finally, we call the "print" system function, which just pops the result off the stack and displays it on the screen.

Here are the other instructions:

ASSIGN

Pop two things off the stack, and assign the value to the reference.

MEMBER_VALUE

Pop the reference to the user defined structure, and push the value of the named member.

MEMBER_DEREF

Like MEMBER_VALUE, but pushes a reference to the named mamber.

ARRAY_DEREF

Pop the array indices, and push a reference to the given location of the array.

PUSHCONST

Push the literal value onto the stack.

RET

destroy the current variable map, and restore the previous one, jumping to the previous value of PC.

GOSUB

Like call, but instead of using a new variable map, copy the current one.

CALL

Create a new, empty variable map, store PC in it, and jump to the given location.

JMP

Immediately jump to the given location.

BNZ

pop the top of the stack and jump to the given location if it is non-zero

MOD, /, *, +, -, AND, OR, NOT, <>, >=, <=, <, >, =

Pop one or two arguments off the top of the stack, perform the given operation, and push the result onto the stack.

BZ

Pop the top of the stack and jump to the given location if it is zero.

END

Halt the VM.

NEW

Create a new unnamed instance of the variable of the specified type, and push it onto the stack.

PUSHTYPE

Push the named type onto the stack

PUSHVALUE

Push the value of the given variable onto the stack.

PUSHREF

Push a reference to the given variable onto the stack.

POP

Pop the stack and discard

POPVAL

Set the given variable's value to be the value at the top of the stack.

RESTORE

Set data index to the given value

COPYTOP

duplicate the top of the stack.

FORLOOP

Using counter, step, and end value on the stack, determine if the for loop should continue. If not, jump to the given location.

SYSCALL

Call the given system function or subroutine (eg, LOCATE or CLS or LEFT$)

Console

The console performs these functions:

  • Printing to the screen
  • Converting QBASIC's colour codes to HTML colours
  • Keeping track of cursor position
  • Blinking the cursor
  • Allowing line oriented user input
  • Keeping a keyboard buffer for INKEY$, and converting DOM keycodes to qbasic keyboard codes.

The console uses an HTML Canvas object for display. It has an image of every character in the IBM character set. When you want to print something, it first draws a solid rectangle of the background colour at that position. Then it copies the character's image, leaving alone any transparent pixels.

In input mode, anything the user types is displayed on the screen and copied to a buffer. When you hit enter, input mode ends, and a completion function is called to restart the virtual machine and process the input.

While not in input mode, and you hit a key, it is converted into a QBASIC character code and added to the keyboard buffer. This buffer is used for the INKEY$ function.

Until next time

We've covered the runtime system of our virtual computer. We've left out how to handle arrays and function calls. For those features, you'll have to look up how the ARRAY_DEREF and CALL instructions are implemented, in virtualmachine.js.

Further reading...

Want more programming tech talk?
Add to Circles on Google Plus
Subscribe to posts

Post comment

Real Name:
Your Email (Not displayed):

Text only. No HTML. If you write "http:" your message will be ignored.
Choose an edit password if you want to be able to edit or delete your comment later.
Editing Password (Optional):

John Haugeland

2010-01-09 11:52:11
"In modern languages, strings are immutable"

LOL. Novice zealot flag detected. Article ignored.

Steve

2010-01-09 12:17:49
Corrected.

meanroy

2010-01-09 13:19:39
Great!

Bill went on to rule the world after HE wrote a Basic, what are your plans?

;-)

Roy.

Markus

2010-01-09 14:45:30
This is awesome. Reminds me of my first steps with computers :)

Michael

2010-01-09 20:53:38
STR$(curLevel) -> str$ needs a left space (" ")!

nhyone

2010-01-10 09:42:50
I was thinking of re-implementing some old BASIC games in JavaScript. Looks like writing a whole VM to do it is much cooler! :-D

Jake

2010-01-10 10:27:22
Thanks for a great posting - very useful and informative

I was looking at doing a *very* similar project for another version of BASIC and was thinking of modelling the interpreter along similar lines

I'm glad I wasn't barking up the wrong tree - Cheers

Swart

2010-01-10 16:36:59
this is madness! XD

Brian McKelvey

2010-01-11 21:13:57
This is inspirational! I hope I can learn quite a bit from examining your implementation. That, and there's a ton of nostalgia for me in QBasic... :)

I recently wrote an interpreter for an old RPN stack-based language called Iptscrae (Pig-latin for "Script"). It somewhat resembles Forth. It was a custom scripting language for an old avatar-chat system for which I'm writing a new open source client in Flash. Source is available on github: www.github.com/theturtle32/OpenPalace

It's my first attempt at writing my own language interpreter. Some of the principles seem to be the same as what you did, which makes me feel good about it, but I love the idea of doing it as a virtual machine. I don't think it would take much to refactor it as such... So exciting!

Thanks so much for creating this!!

wilq32

2010-01-12 05:03:58
Great one! Congrats :)

fakejoe

2010-01-13 03:49:03
Wicked Cool. Wayyyyy too cool.

What have you done Anakin!

You just gave the Emperor a new galaxy to conqu.. I mean repackage their all time high revenue legendary programs ;-)

AC0KG

2010-01-16 21:48:32
Looks like the parser has a bug:

This fails:

P=1-1

While these work:

P=1+1

P=1-(1)

Nate

2010-02-11 09:42:32
Thank you! From now on all my code will be in Basic.

Jeff Geurts

2010-03-02 17:21:45
I like you.

The Wife (Alice)

2010-03-02 22:04:40
Back off Geurts he's mine!

Daniel Reed

2011-02-06 18:17:01
You should really try code documentation sometime, it really works wonders for the readability of your code....

Angelo

2011-05-24 01:41:03
OWWWWWWWWWWWWWWWW

My respects!

Steve Hanov

2011-09-07 17:19:22
I think the code is sufficiently documented. The sources for most compilers, including qb.js, are almost the same. I assume that anyone reading the code is already familiar with lexing, parsing, syntax trees, and QBASIC.

Majid

2011-09-11 04:14:55
Hi dear Steve.

i am not familiar with java script enough. please help me how could i use this project for my school site.

i just download this page but when i click on button for compiling , it dosnt work.

Majid

2011-09-11 04:18:29
Hi dear Steve.

I copy this page and qb.js into my htdocs , when i enter submit the program of printing "hello" , it was the result:

Bad token!

Parse failed.

Bad token!

Parse failed.

Bad token!

Parse failed.

Bad token!

Parse failed.

Bad token!

Parse failed.

Program successfully parsed. 3 statements.

Program successfully parsed. 3 statements.

Program successfully parsed. 1 statements.

pls help me. should i add some php or js codes?

Jesse Ramirez Silva

2012-07-03 13:58:37
Nice article. Regards from BigField in Rio/Brazil.

De Lorean

2013-10-09 09:01:15
I also saw that immutable string comment ... :D , Ignorance duly noted.

taylor swift

2013-11-17 17:14:48
hey just wanted to tell these are not accepting

a. REM statements

b. CLS statements

c. IF-THEN-ENDIF statements

d. IF- THEN- ELSE- ENDIF statements

but anyway it has helped me a lot for preparing for exams and hats off to you creator

Caue Rego

2013-12-14 21:27:42
I got a very old basic game of mine which simply doesn't run here.

I stripped just the COMMAND$ part of it, really a few lines of code, and it does run on qb64.net

Would you be interested in grabbing it and making it work? I'd love to be able to play with it in your emulator, as it is much faster than qb64!

Email
steve.hanov@gmail.com

Other posts by Steve

Yes, You Absolutely Might Possibly Need an EIN to Sell Software to the US How Asana Breaks the Rules About Per-Seat Pricing 5 Ways PowToon Made Me Want to Buy Their Software How I run my business selling software to Americans 0, 1, Many, a Zillion Give your Commodore 64 new life with an SD card reader 20 lines of code that will beat A/B testing every time [comic] Appreciation of xkcd comics vs. technical ability VP trees: A data structure for finding stuff fast Why you should go to the Business of Software Conference Next Year Four ways of handling asynchronous operations in node.js Type-checked CoffeeScript with jzbuild Zero load time file formats Finding the top K items in a list efficiently An instant rhyming dictionary for any web site Succinct Data Structures: Cramming 80,000 words into a Javascript file. Throw away the keys: Easy, Minimal Perfect Hashing Why don't web browsers do this? Fun with Colour Difference Compressing dictionaries with a DAWG Fast and Easy Levenshtein distance using a Trie The Curious Complexity of Being Turned On Cross-domain communication the HTML5 way Five essential steps to prepare for your next programming interview Minimal usable Ubuntu with one command Finding awesome developers in programming interviews Compress your JSON with automatic type extraction JZBUILD - An Easy Javascript Build System Pssst! Want to stream your videos to your iPod? "This is stupid. Your program doesn't work," my wife told me The simple and obvious way to walk through a graph Asking users for steps to reproduce bugs, and other dumb ideas Creating portable binaries on Linux Bending over: How to sell your software to large companies Regular Expression Matching can be Ugly and Slow C++: A language for next generation web apps qb.js: An implementation of QBASIC in Javascript Zwibbler: A simple drawing program using Javascript and Canvas You don't need a project/solution to use the VC++ debugger Boring Date (comic) barcamp (comic) How IE <canvas> tag emulation works I didn't know you could mix and match (comic) Sign here (comic) It's a dirty job... (comic) The PenIsland Problem: Text-to-speech for domain names Pitching to VCs #2 (comic) Building a better rhyming dictionary Does Android team with eccentric geeks? (comic) Comment spam defeated at last Pitching to VCs (comic) How QBASIC almost got me killed Blame the extensions (comic) How to run a linux based home web server Microsoft's generosity knows no end for a year (comic) Using the Acer Aspire One as a web server When programmers design web sites (comic) Finding great ideas for your startup Game Theory, Salary Negotiation, and Programmers Coding tips they don't teach you in school When a reporter mangles your elevator pitch Test Driven Development without Tears Drawing Graphs with Physics Free up disk space in Ubuntu Keeping Abreast of Pornographic Research in Computer Science Exploiting perceptual colour difference for edge detection Experiment: Deleting a post from the Internet Is 2009 the year of Linux malware? Email Etiquette How a programmer reads your resume (comic) How wide should you make your web page? Usability Nightmare: Xfce Settings Manager cairo blur image surface Automatically remove wordiness from your writing Why Perforce is more scalable than Git Optimizing Ubuntu to run from a USB key or SD card UMA Questions Answered Make Windows XP look like Ubuntu, with Spinning Cube Effect See sound without drugs Standby Preventer Stock Picking using Python Spoke.com scam Stackoverflow.com Copy a cairo surface to the windows clipboard Simulating freehand drawing with Cairo Free, Raw Stock Data Installing Ubuntu on the Via Artigo Why are all my lines fuzzy in cairo? A simple command line calculator Tool for Creating UML Sequence Diagrams Exploring sound with Wavelets UMA and free long distance UMA's dirty secrets Installing the Latest Debian on an Ancient Laptop Dissecting Adsense HTML/ Javascript/ CSS Pretty Printer Web Comic Aggregator Experiments in making money online How much cash do celebrities make? Draw waveforms and hear them Cell Phones on Airplanes Detecting C++ memory leaks What does your phone number spell? A Rhyming Engine Rules for Effective C++ Cell Phone Secrets