bAdkOde
A long, long time ago, before there was Facebook or Twitter, and when Google was simply a search engine, I had a revelation. I realized that programming languages were simply a set of tokens. Of course, this is not so mindblowing now. But to a young 17-year old who had just started college, this was an amazing revelation. Being a nerd and all, I decided that I wanted to make my own programming language. For some reason I wanted to make it look really ugly and difficult to read. Had I known about Perl or other esoteric languages, I probably wouldn’t have decided to write it. Who am I kidding, I probably would have anyway and also would have felt a little less weird. Anyway, the very first incarnation of my language, bAdkOde, was written in Java. At this point, I had never heard about EBNF’s,http://en.wikipedia.org/wiki/Recursive_descent_parser[ recursive-descent parsers], or anything related to language parsing. But I was eager, knew how to write code, and therefore I was dangerous. My first implementation had no recursion whatsoever. I parsed the text iteratively. I had this horribly complex function that tried to validate and parse a printable string-of-text (for example, print "This is variable {a} "). I even had interpolation in my language (I thought it was novel, but I eventually found out that Perl already had it) and so that made the parsing even harder. My language was also typed and you could even create variables (I maintained a list of declared variables - I eventually found out that this was called a "Symbol Table"). It also supported basic mathematical operations. There was no way to do flow control though, and so the language didn’t have any if-statements or looping constructs. I had plans to add all of this in but never really got around to it.
Eventually, I took a class at ASU (CSE 240) that taught me about parsing languages. Using this information I rewrote bAdkOde using recursive-descent parsers. Then something happened and I lost interest. bAdkOde languished and was forgotten for a time. Then, I was suddenly interested in writing a language again because I was going through a phase where I read a lot about esoteric programming-languages. I thought I’d write my own. At this time, I think I was also taking an assembly class (possibly CSE 421) and so the eventual design of the language was definitely influenced by a few assembly concepts. This time, I decided to /api/media/5e9b98e5-60e1-453c-bce0-8fa1f5bcadf4/content the language in Perl] since it is suited to parsing text. In retrospect, I probably should have written the language using a recursive-descent parser (even if it had to be in Perl) instead of my current implementation.
First, I wrote up a quick spec of the features of the language. It was going to be an extremely minimal language with only a few operators. It would have a stack, memory, and two registers. Memory and stack are unlimited (theoretically, but it depends on how much physical memory you have - the language enforces no constraint), but the actual code you write doesn’t reside in the memory that you’re able to use from the code. After I wrote the specs, I wrote an EBNF:
stmt ::= b-stmt | u-stmt | while-stmt ;
b-stmt ::= binary-op, (number|mem-reg), mem-reg ;
u-stmt ::= unary-op, mem-reg ;
while-stmt ::= "{", logic-op, mem-reg, {stmt}, "}" ;
binary-op ::= > | + | - ;
unary-op ::= stack-op | ' | " | ? ;
mem-reg ::= mem | reg ;
logic-op ::= = | ! | + | - ;
stack-op ::= push | pull ;
push ::= ")", (number|mem-reg) ;
pull ::= "(", mem-reg ;
mem ::= "[", reg ;
reg ::= a | b ;
number ::= digit, {digit} ;
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ;
*Note*: That’s right. The EBNF shows that you can’t use negative numbers as operands. This wasn’t a design decision… I think I just forgot about it. It just means that makes things a little more fun
From the very first definition in the EBNF, you can see that the language has three types of statements: binary-operator statement, unary-operator statement, and a while statement. What this means is that there are statements with an operator that requires two operands, statements with operators that require only on operand, and a while (looping) statement.
Registers, Memory, and Stack
There are two registers, a and b. You can also use these registers as pointers, so [a and [b mean "The memory addresses identified by the values in register a and register b". For example, if a contained the value 10, then [a refers to memory address 10. Memory is unlimited (theoretically, but in practice is limited by the amount of physical memory you have), and so is the Stack. You cannot directly refer to the stack and there is no stack pointer (so there is no way to directly manipulate the stack). In processors, memory is typically addressed by BYTES, WORDS, LONGWORDS, or QUADWORDS. There is no such addressing scheme in bAdkOde. Also, in normal memory, the largest value a typical memory location (usually a byte) can hold is FF16 (25510). In bAdkOde however, the largest number a location can hold really depends on the implementation. I implemented this grammar in Perl and so depending on how it was built, you can have -253 to 253 (32-bit ints and double floats), or -263 to 264 - 1 (64-bit ints and double floats), or -2113 to 2113 (64-bit ints and quadruple floats). These limits also apply to the value that can be stored in the registers or on the stack.
Binary-operator statements
There are three binary operators: > or move, + or add, and - or subtraction.
> or move
Usage:
> [src] [dest]
The move operator takes two operands. The first operand (source) can either be a number, register, or a memory location. The second operand (destination) can either be a register or a memory location.
+ or add
Usage:
+ [src] [src2/dest]
The add operator takes two parameters. It adds the values represented by both operands and stores it in the location represented by the second. The first operand (source) can either be a number, register, or a memory location. The second operand (destination) can either be a register or a memory location.
Examples:
# add the values of the a and b register and store the result in b
+ab
# add the value in the a register and the value in the memory
# location that a points to, and then store the result in that
# memory location
+a[b
#add 3 to the value in register a
+3a
# add two memory locations together and store the value in
# the second
+[a[b
- or subtract
Usage:
- [src] [src2/dest]
The subtract operator takes two parameters. It subtracts the value represented by the first operand from the value represented by the second operand and stores it in the location represented by the second. The first operand (source) can either be a number, register, or a memory location. The second operand (destination) can either be a register or a memory location.
Examples:
# negate the value in register a
>0b
-ab
# subtract two memory locations together and store the value in
# the second
-[a[b
Unary-operator statements
There are five unary operators: ) or push, ( or pull, ' or print value, " or print character representation of value, and ? or input.
) or push
Usage:
) [src]
The push operator takes the value supplied and pushes it onto the stack. The source parameter can be a number, register, or memory location.
Examples:
# push 0 onto the stack
)0
# push the value in register a onto the stack
)a
# push the value in the memory location that a
# points to, onto the stack
)[a
( or pull
Usage:
( [src]
The pull operator pulls a value off the stack and stores it in the source parameter. The source parameter can either be a register or a memory location.
Examples:
# pull a value off the top of the stack and store it in register a
(a
# pull a value off the top of the stack and store it in the
# location pointed to by register b
([b
' or print value
Usage:
' [src]
The print value operator prints out the value supplied to it. The operand can be a number, register or memory location.
Examples:
# print the number 10
'10
# print the value in register a
'a
# print the value pointed to, by register b
'[b
" or print character representation of value
The print character representation of value operator takes the source parameter and prints the character representation of the value in the source parameter.
Examples:
# print a space
"32
# print character representation of value in register a
"a
# print character representation of value pointed to by register b
"[b
? or input
Usage:
? [dest]
The input operator accepts a character of input from the user and stores it in the location specified by the destination parameter. For example, if the user entered a space, the number "32" would be stored.
Examples:
# accept a character of input and store the ASCII value of that character
# in register a
?a
# accept a character of input and store the ASCII value of that character
# in the memory location that register b points to
?[b
Looping statement bAdkOde has one looping-statement, which is analogous to a while loop.
Usage:
{ [logic-operator] [src]
...
[statements]
...
}
There are four logic operators which are = or equal to zero, ! or not equal to zero, + or greater than zero and - or lesser than zero. The source parameter can either be a register or a memory location. The loop will run as long as the condition specified by the logic operator and source parameter is true.
Examples:
# Loop while the value in register a is not zero
{!a
...
}
# Loop while the value in location pointed to by register b is zero
{=[b
...
}
# Loop while the value in register a is positive
{+a
...
}
# Loop while the value in location pointed to by register b is negative
{-[b
...
}
Macros
Usage:
Macro definition:
@[macro-name]([param1], ..., [paramN]) = [block of bAdkOde statements
that use param1, ..., paramN];
Macro usage:
&[macro-name]([param1], ..., [paramN])
Macros are a useful feature in many assemblers. They provide a convenient way to re-use blocks of code. I decided to implement macros in bAdkOde as well. Notice the semi-colon at the end of the macro definition. This tells the parser that the definition has ended.
Examples:
# Macro to convert ASCII representation of integer to actual
# integer value
@atoi(CHR) = -48CHR;
# Macro to convert integer value to ASCII representation of
# integer value
@itoa(NUM) = +48NUM;
# Input a character
?a
# Convert to actual integer value
&atoi(a)
# Store the number 9 in location pointed to by value in
# register a
>9[a
# Convert 9 to ASCII value of the character '9' (57)
&itoa([a)
*Note*: Macros aren’t specified in the EBNF because they really aren’t part of the language itself. They’re simply concise representations of a set of instructions. Labels Labels are place-holders for "magic numbers". They make the code much easier to read.
Usage:
Label definition:
*[label-name] = [label-value];
Label usage:
$[label-name]$
In the following example, I’m going to use a label to replace the number '48':
Examples:
# Define the offset that we're going to use
*OFFSET = 48;
# Macro to convert ASCII representation of integer to actual
# integer value
@atoi(CHR) = -$OFFSET$CHR;
# Macro to convert integer value to ASCII representation of
# integer value
@itoa(NUM) = +$OFFSET$NUM;
Importing files
Usage:
%[filename];
After I implemented labels and macros, I figured that it would be nice if I was able to define them all in one place and then re-use them over and over again. Hence, the concept of imports.
Examples:
In file macros.b:
# Macro to convert ASCII representation of integer to actual
# integer value
*OFFSET = 48;
@atoi(CHR) = -$OFFSET$CHR;
# Macro to convert integer value to ASCII representation of
# integer value
@itoa(NUM) = +$OFFSET$NUM;
In file macro-use.bad:
%macros.b
# Input a character
?a
# Convert to actual integer value
&atoi(a)
# Store the number 9 in location pointed to by value in
# register a
>9[a
# Convert 9 to ASCII value of the character '9' (57)
&itoa([a)
By convention, bAdkOde scripts end with the extension bad, whereas macro-definition files end with the extension b. Also, macro-definition files shouldn’t contain any actual code. Any such code is ignored. It’s possible to "overload" macros. You can have macros that share the same name, but they must have different numbers of parameters.
Example bAdkOde scripts
Here are a few example scripts:
hello-world.bad
)0)33)100)108)114)111)87)32)111)108)108)101)72(a{!a"a(a}
fibonacci.bad
# prints the first 10 fibonacci numbers
)0
)1
>10a
>a[a
{![a
(a
(b
)b
'b"32
+ab
(a
)b
)a
>10a
-1[a
}
"8"10
reverse.bad
# reverse prints what ever the user enters
)0
>1a
{!a
?a
)a
-10a
}
>1a
{!a
(a
"a
}
echo.bad
# takes whatever the user enters and stores it in memory
# and then prints it out
>0b>1a{!a?a>a[b+1b-10a}>0b>1a{!a>[ba"a+1b-10a}
As you can see, pretty unreadable. I’ve tried to provide a semblance of readability in some scripts by indenting. But you can pretty much write everything in one line and it’ll work.
Translation into C bAdkOde can easily be translated into C. Assuming that we have three (long int) variables called a, b, and top that refer to register a, register b, and top of the stack, and also assuming that we have two (long int) arrays mem and stack, we can translate bAdkOde into C via the following rules:
1. Any references to the registers a and b directly translate to the variables
a and b.
2. [b or [a translates to mem[b] or mem[a].
3. ) and ( both use stack[top]. However, in ), stack[top] is on the
right-hand side of the assignment expression whereas in ( it is on the
left-hand side.
4. >, +, and - map to =, +, and - respectively.
5. ', ", and ? map to printf (or printf("%d", value)), putchar (from stdio.h)
and getchar (also from stdio.h).
6. { maps to while, and =, !, +, and - map to == 0, != 0, >= 0, and < 0. }
maps to the closing } brace for the while.
*Note:* My current implementation (which I wrote around four or five years ago) really sucks! I think the interpreter could be written more elegantly. Also, I think I should probably figure out another way to implement the memory and stack in the C translation. Right now I create a fixed-size array. I think the ideal implementation would be a linked list that grows and shrinks as required. Translation of fibonacci.bad into fibonacci.c
#include <stdio.h>
int main()
{
long int a = 0;
long int b = 0;
long int mem[131072];
long int stack[131072];
long int top = 0;
long int i = 0;
/* zero out memory and stack */
for(i = 0; i < 131072; i++)
{
mem[i] = 0;
stack[i] = 0;
}
stack[top] = 0;
top++;
stack[top] = 1;
top++;
a = 10;
mem[a] = a;
while(mem[a] != 0)
{
top--;
a = stack[top];
top--;
b = stack[top];
stack[top] = b;
top++;
printf("%d", b);
putchar(32);
b += a;
top--;
a = stack[top];
stack[top] = b;
top++;
stack[top] = a;
top++;
a = 10;
mem[a]--;
}
putchar(8);
putchar(10);
return 0;
}
Turing completeness and self-modifying code
I have no idea if bAdkOde is Turing Complete. I suspect it might be, but I have no proof. I recall reading somewhere that if you are able to write a quine (program that prints itself) in a language, then that language is Turing Complete. I figure that if I’m able to write a quine then I should be good! Regarding self-modifying code, I expect that it should be trivial to modify the interpreter so that program code exists in the same memory-space as data. However, this would make the translation of the program into C a little trickier.
Oh, and /api/media/5e9b98e5-60e1-453c-bce0-8fa1f5bcadf4/content] is a link to the bAdkOde interpreter that I wrote in Perl (I’m not proud of this code at all! Could be written better :p).
Closing remarks bAdkOde isn’t meant to be a practical language. It’s an esoteric language and its creation was an academic exercise for me. If you’re trying to write actual software in bAdkOde, then God help you! :)