{
Kaori
Kaori is a statically typed programming language designed to be simple, expressive, and readable. Its syntax combines familiar ideas from modern languages such as Python and Rust while keeping a minimal and clean structure.
Here is a quick look at some syntax:
def main() {
    fib_n: number = 5;
    print(fib(fib_n));
}
def fib(n: number) -> number {
    if n == 0 {
        return 0;
    }
    if n == 1 {
        return 1;
    }
    return fib(n - 1) + fib(n - 2);
}
A programming language also has its own grammar. In English grammar classes, we learned the rules for building sentences out of words. In the compiler world, statements, expressions, and declarations are built out of tokens, and each token is formed by a sequence of one or more characters.
We are going to enforce a set of rules known as a grammar. This is a very important step in developing a compiler: it takes us from a sequence of tokens to an Abstract Syntax Tree, which represents a program in a more meaningful way.
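To make the idea of "tokens formed by characters" concrete, here is a minimal tokenizer sketch in Rust. The `Token` enum and the `tokenize` function are hypothetical names made up for this illustration; Kaori's real lexer and `TokenKind` are likely richer and shaped differently.

```rust
// Hypothetical token type for illustration; Kaori's real TokenKind may differ.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String),
    Number(f64),
    Colon,
    Assign,
    Plus,
    Semicolon,
}

// Turns a string of characters into a flat list of tokens.
fn tokenize(source: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            // A number token is formed by one or more digit characters.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let value = text
                .parse::<f64>()
                .map_err(|_| format!("bad number: {text}"))?;
            tokens.push(Token::Number(value));
        } else if c.is_alphabetic() || c == '_' {
            // An identifier is a letter or underscore followed by letters,
            // digits, or underscores.
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Token::Identifier(chars[start..i].iter().collect()));
        } else {
            // Single-character tokens.
            let token = match c {
                ':' => Token::Colon,
                '=' => Token::Assign,
                '+' => Token::Plus,
                ';' => Token::Semicolon,
                other => return Err(format!("unexpected character: {other}")),
            };
            tokens.push(token);
            i += 1;
        }
    }
    Ok(tokens)
}

fn main() {
    println!("{:?}", tokenize("foo: number = 5;"));
}
```

The parser never looks at raw characters; it only sees the resulting list of tokens.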
Here is the grammar, written in a custom non-EBNF notation (with syntax highlighting built from regular expressions) so that non-compiler developers can understand it without having to dive into EBNF syntax:
program -> (function_declaration)* "end of file"
variable_declaration -> identifier ":" type "=" expression
parameter -> identifier ":" type
parameters -> (parameter ("," parameter)*)?
function_declaration -> "def" identifier "(" parameters ")" ("->" type)? block_statement
What does any of this even mean? Take a look at the variable declaration rule. It expects an identifier, then a colon, then it tries to parse a type, then it expects an assignment operator token, and finally it tries to parse an expression. After all those steps, it builds the declaration node for the variable.
Here is an example where the rule is not followed: what happens if the next token after the type annotation is not an assignment operator? That is what is known as a syntax error! According to the rules, a variable declaration always expects an assignment operator after the type annotation, so make sure not to miss it in your code.
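The steps of the variable declaration rule can be sketched in Rust as follows. This is a simplified illustration, not Kaori's actual parser: the `Token`, `Parser`, and `VariableDeclaration` types, the `advance`, `identifier`, and `consume` helpers, and the error strings are all assumptions, and a single number literal stands in for a full expression.

```rust
// Hypothetical, simplified types for illustration only; Kaori's real
// TokenKind, token stream, and AST nodes may look different.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String),
    Colon,
    Assign,
    Number(f64),
}

#[derive(Debug, PartialEq)]
struct VariableDeclaration {
    name: String,
    type_name: String,
    value: f64, // stands in for a full expression node
}

struct Parser {
    tokens: Vec<Token>,
    position: usize,
}

impl Parser {
    // Returns the next token and advances, or reports what was expected.
    fn advance(&mut self, expected: &str) -> Result<Token, String> {
        let token = self.tokens.get(self.position).cloned();
        self.position += 1;
        token.ok_or_else(|| format!("expected {expected}, found end of input"))
    }

    // Expects the next token to be an identifier and returns its name.
    fn identifier(&mut self, expected: &str) -> Result<String, String> {
        match self.advance(expected)? {
            Token::Identifier(name) => Ok(name),
            other => Err(format!("expected {expected}, found {other:?}")),
        }
    }

    // Expects the next token to be exactly `wanted`.
    fn consume(&mut self, wanted: Token, expected: &str) -> Result<(), String> {
        let found = self.advance(expected)?;
        if found == wanted {
            Ok(())
        } else {
            Err(format!("expected {expected}, found {found:?}"))
        }
    }

    // variable_declaration -> identifier ":" type "=" expression
    fn parse_variable_declaration(&mut self) -> Result<VariableDeclaration, String> {
        let name = self.identifier("identifier")?;
        self.consume(Token::Colon, "':'")?;
        let type_name = self.identifier("type")?;
        self.consume(Token::Assign, "'='")?;
        let value = match self.advance("expression")? {
            Token::Number(value) => value,
            other => return Err(format!("expected expression, found {other:?}")),
        };
        Ok(VariableDeclaration { name, type_name, value })
    }
}

fn main() {
    // Tokens for `foo: number 5`, where the "=" is missing.
    let tokens = vec![
        Token::Identifier("foo".into()),
        Token::Colon,
        Token::Identifier("number".into()),
        Token::Number(5.0),
    ];
    let mut parser = Parser { tokens, position: 0 };
    // The result is an Err saying that '=' was expected.
    println!("{:?}", parser.parse_variable_declaration());
}
```

Each `?` stops parsing at the first token that breaks the rule, which is exactly the syntax error described above.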
block_statement -> "{" ( expression_statement
                       | print_statement
                       | if_statement
                       | while_statement
                       | for_statement
                       | block_statement
                       | variable_declaration ";")* "}"
expression_statement -> expression ";"
print_statement -> "print" "(" expression ")" ";"
if_statement -> "if" expression block_statement ("else" (if_statement | block_statement))?
while_statement -> "while" expression block_statement
for_statement -> "for" variable_declaration ";" expression ";" expression_statement block_statement
The parser is a recursive descent parser, and the good thing about this technique is that the parser code mirrors every single non-terminal in the grammar. Take a look at the while statement non-terminal and let's compare it with the Rust parser code for it:
fn parse_while_loop_statement(&mut self) -> Result<Stmt, KaoriError> {
    let span = self.token_stream.span();
    self.token_stream.consume(TokenKind::While)?;
    let condition = self.parse_expression()?;
    let block = self.parse_block_statement()?;
    Ok(Stmt::while_loop(condition, block, span))
}
It consumes the while token, parses an expression, which is the condition for the loop, then parses a block statement, and that's it. This is the magic of it! Let's look at another example if you're still not convinced:
fn parse_print_statement(&mut self) -> Result<Stmt, KaoriError> {
    let span = self.token_stream.span();
    self.token_stream.consume(TokenKind::Print)?;
    self.token_stream.consume(TokenKind::LeftParen)?;
    let expression = self.parse_expression()?;
    self.token_stream.consume(TokenKind::RightParen)?;
    self.token_stream.consume(TokenKind::Semicolon)?;
    Ok(Stmt::print(expression, span))
}
According to the grammar, a print statement is a print token, followed by a left parenthesis, then an expression, then a right parenthesis, and finally a semicolon token.
There are still unanswered questions about our parsing. Look at the following example:
2 + 3 * 5;
A long time ago, mathematicians created the order of operations convention so that we don't need as many parentheses to write an expression that others can read without ambiguity. They killed two birds with one stone: the expression becomes far less verbose, and the ambiguity is gone! So here is the question: what is the value of that expression according to them?
The multiplication is done before the addition, so the result is 17. Multiplication and division are both part of the factor rule because they share the same precedence level, while addition and subtraction are part of the term rule.
term -> factor (("+" | "-") factor)*
factor -> prefix_unary (("*" | "/") prefix_unary)*
To parse an addition or a subtraction, the parser first tries to parse a factor on the left and on the right side of the operator. This ensures that all multiplications, divisions, and other operators with higher precedence are parsed first, which is how the grammar enforces the order of operations.
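The term and factor rules can be tried out with a small Rust sketch that evaluates the expression while parsing it. It is a simplification under stated assumptions: the `Parser` struct here is hypothetical, tokens are single characters, operands are single-digit literals instead of the grammar's prefix_unary rule, and the result is computed directly instead of building AST nodes.

```rust
// A sketch of the term and factor rules as a tiny evaluator.
// Tokens are single characters (digits and operators), an assumption
// made to keep the example short.
struct Parser {
    chars: Vec<char>,
    position: usize,
}

impl Parser {
    fn new(source: &str) -> Self {
        let chars = source.chars().filter(|c| !c.is_whitespace()).collect();
        Parser { chars, position: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.chars.get(self.position).copied()
    }

    // term -> factor (("+" | "-") factor)*
    fn parse_term(&mut self) -> Result<f64, String> {
        let mut left = self.parse_factor()?;
        while let Some(op @ ('+' | '-')) = self.peek() {
            self.position += 1;
            // The right operand is a whole factor, so any "*" or "/"
            // to the right is handled before the addition happens.
            let right = self.parse_factor()?;
            left = if op == '+' { left + right } else { left - right };
        }
        Ok(left)
    }

    // factor -> primary (("*" | "/") primary)*
    fn parse_factor(&mut self) -> Result<f64, String> {
        let mut left = self.parse_primary()?;
        while let Some(op @ ('*' | '/')) = self.peek() {
            self.position += 1;
            let right = self.parse_primary()?;
            left = if op == '*' { left * right } else { left / right };
        }
        Ok(left)
    }

    // Single-digit number literals only.
    fn parse_primary(&mut self) -> Result<f64, String> {
        match self.peek().and_then(|c| c.to_digit(10)) {
            Some(digit) => {
                self.position += 1;
                Ok(f64::from(digit))
            }
            None => Err(format!("expected a digit at position {}", self.position)),
        }
    }
}

fn main() {
    let result = Parser::new("2 + 3 * 5").parse_term();
    println!("{:?}", result); // prints Ok(17.0)
}
```

Because parse_term calls parse_factor for both operands, `3 * 5` is grouped before the addition is applied, without any explicit precedence table.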
expression -> assignment | logic_or
assignment -> identifier "=" expression // Changing in the future
logic_or -> logic_and ("||" logic_and)*
logic_and -> equality ("&&" equality)*
equality -> comparison (("!=" | "==") comparison)*
comparison -> term ((">" | ">=" | "<" | "<=") term)*
term -> factor (("+" | "-") factor)*
factor -> prefix_unary (("*" | "/") prefix_unary)*
prefix_unary -> ("!" | "-") prefix_unary | primary
primary -> number_literal
         | string_literal
         | boolean_literal
         | postfix_unary
         | "(" expression ")"
postfix_unary -> identifier ("++" | "--")? | function_call // Changing in the future
arguments -> (expression ("," expression)*)?
function_call -> identifier ("(" arguments ")")* // Changing in the future
Last but not least are the parsing rules for types, because types can also be represented by recursive trees.
type -> function_type | simple_type
simple_type -> primitive_type | identifier
primitive_type -> "bool" | "number"
arguments -> (type ("," type)*)?
function_type -> "(" arguments ")" "->" type
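To illustrate why types form recursive trees, here is a hedged Rust sketch of how the type grammar could be modeled. The `Type` enum and its variant names are hypothetical; Kaori's real type nodes may be shaped differently.

```rust
use std::fmt;

// Hypothetical representation of the type grammar; Kaori's real type
// nodes may be shaped differently.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Bool,          // primitive_type -> "bool"
    Number,        // primitive_type -> "number"
    Named(String), // simple_type -> identifier
    // function_type -> "(" arguments ")" "->" type
    Function {
        parameters: Vec<Type>,
        return_type: Box<Type>,
    },
}

impl fmt::Display for Type {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Type::Bool => write!(f, "bool"),
            Type::Number => write!(f, "number"),
            Type::Named(name) => write!(f, "{name}"),
            Type::Function { parameters, return_type } => {
                // Each parameter is itself a Type, so printing recurses.
                let parameters: Vec<String> =
                    parameters.iter().map(|p| p.to_string()).collect();
                write!(f, "({}) -> {}", parameters.join(", "), return_type)
            }
        }
    }
}

fn main() {
    // A function type that takes a number and another function type.
    let predicate = Type::Function {
        parameters: vec![Type::Number],
        return_type: Box::new(Type::Bool),
    };
    let apply = Type::Function {
        parameters: vec![Type::Number, predicate],
        return_type: Box::new(Type::Bool),
    };
    println!("{apply}"); // prints (number, (number) -> bool) -> bool
}
```

The `Box` is what lets a function type contain other types, including other function types, to any depth.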
Writing your first program in Kaori is quite simple. The main function does not need a return type annotation, because the entry point of the program does not return a value.
def main() {
    print("hello world");
}
Variable declarations require a type annotation and must always be initialized with a value on the right-hand side.
String, number, and bool are the most basic types:
def main() {
    foo: number = 5;
    bar: String = "hello world";
    foo_bar: bool = true;
}
In the next example, foo is declared in the global scope. That is not allowed: all objects can only live in a local scope.
foo: number = 5;
def main() {
    bar: String = "hello world";
    foo_bar: bool = true;
}
Operators are the building blocks of expressions, and each operator has a fixed precedence, which determines the order in which expressions are evaluated when multiple operators appear together.
For example, multiplication and division have higher precedence than addition and subtraction:
3 + 4 * 5; // 23
(3 + 4) * 5; // 35
Comparison operators like >, <=, or == always evaluate to a boolean value:
12 > 7; // true
7 == 12; // false
95 >= 95; // true
Logical operators such as && and || allow combining boolean expressions:
true && false; // false
true || false; // true
!(5 == 6); // true
The assignment operator has the lowest precedence, ensuring that the expression on the right-hand side is fully evaluated before being assigned to the variable on the left:
a = 3 + 4 * 2; // a = 11;
Parentheses can always be used to make the evaluation order explicit. If parentheses are omitted, parsing follows operator precedence.
Control flow allows you to decide how the code executes: you can branch into different paths or repeat code with loops.
An if statement runs a block of code only if its condition is true.
def main() {
    if 10 > 5 {
        print("10 is bigger");
    } else if 2 < 3 {
        print("2 is smaller");
    } else {
        print("all the other branch conditions were false");
    }
}
A while loop runs a block of code repeatedly as long as the condition remains true.
def main() {
    i: number = 0;
    while i < 3 {
        print(i);
        i++;
    }
}
A for loop is just syntactic sugar for the while loop, and it also runs a block of code as long as the condition remains true. It has a variable declaration, a condition, and an expression statement that increments the variable.
def main() {
    for i: number = 0; i < 3; i++ {
        print(i);
    }
}
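One common way to treat a for loop as sugar is to rewrite it into a block that holds the declaration followed by a while loop whose body ends with the increment. The Rust sketch below shows that rewrite on a hypothetical mini AST; the `Stmt` enum and the `desugar` function are assumptions for illustration, and Kaori's compiler may lower for loops differently.

```rust
// Hypothetical mini AST; expressions are kept as plain strings so the
// example stays focused on the shape of the rewrite.
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    Declaration(String),
    Expression(String),
    Print(String),
    Block(Vec<Stmt>),
    While { condition: String, body: Box<Stmt> },
    For {
        declaration: Box<Stmt>,
        condition: String,
        increment: Box<Stmt>,
        body: Box<Stmt>,
    },
}

// Rewrites `for decl; cond; inc { body }` into
// `{ decl; while cond { { body } inc; } }` and recurses into nested
// blocks and loops. Other statements are returned unchanged.
fn desugar(stmt: Stmt) -> Stmt {
    match stmt {
        Stmt::For { declaration, condition, increment, body } => Stmt::Block(vec![
            *declaration,
            Stmt::While {
                condition,
                body: Box::new(Stmt::Block(vec![desugar(*body), *increment])),
            },
        ]),
        Stmt::Block(statements) => {
            Stmt::Block(statements.into_iter().map(desugar).collect())
        }
        Stmt::While { condition, body } => Stmt::While {
            condition,
            body: Box::new(desugar(*body)),
        },
        other => other,
    }
}

fn main() {
    // for i: number = 0; i < 3; i++ { print(i); }
    let for_loop = Stmt::For {
        declaration: Box::new(Stmt::Declaration("i: number = 0".into())),
        condition: "i < 3".into(),
        increment: Box::new(Stmt::Expression("i++".into())),
        body: Box::new(Stmt::Block(vec![Stmt::Print("i".into())])),
    };
    println!("{:#?}", desugar(for_loop));
}
```

Wrapping the result in an outer block keeps the loop variable scoped to the loop, so it does not leak into the surrounding code.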
Loops can be nested, which is useful for iterating over multiple dimensions.
def main() {
    for x: number = 0; x < 2; x++ {
        for y: number = 0; y < 2; y++ {
            print(x + y);
        }
    }
}
A function is declared with the def keyword, followed by its name, parameters, and an optional return type.
def square(n: number) -> number {
    return n * n;
}
def main() {
    result: number = square(5);
    print(result); // 25
}
Functions can also call themselves recursively; just remember to include a base case. :D
def fib(n: number) -> number {
    if n == 0 {
        return 0;
    }
    if n == 1 {
        return 1;
    }
    return fib(n - 1) + fib(n - 2);
}
def main() {
    print(fib(6));
}
Error reporting is one of the core features. A programming language without clear diagnostics misses one of the most important pillars of usability. In the current implementation, Kaori provides detailed error messages that show both the line and the column where the error occurred and point exactly to the problematic code. This makes debugging much easier and helps developers understand what went wrong.
def main() {
    print(2 +);
}
What do we expect to happen in the code above? Can you guess? It is a syntax error: an addition expects a left and a right operand, but a right parenthesis is not a valid operand.
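As a rough idea of how a diagnostic with a line, a column, and a pointer to the offending code can be produced, here is a small Rust sketch. The `render_error` function, the message wording, and the output layout are assumptions for illustration; they are not Kaori's actual error format.

```rust
// Hypothetical diagnostic renderer; the message text and layout are
// assumptions, not Kaori's actual output.
// `line` and `column` are 1-based.
fn render_error(source: &str, line: usize, column: usize, message: &str) -> String {
    // Look up the offending line, falling back to an empty string if the
    // line number is out of range.
    let source_line = line
        .checked_sub(1)
        .and_then(|index| source.lines().nth(index))
        .unwrap_or("");
    // Pad with spaces so the caret lands under the reported column.
    let padding = " ".repeat(column.saturating_sub(1));
    format!("error at {line}:{column}: {message}\n{source_line}\n{padding}^")
}

fn main() {
    let source = "def main() {\n    print(2 +);\n}";
    // The right parenthesis sits on line 2, column 14 of this source text.
    println!("{}", render_error(source, 2, 14, "expected an operand"));
}
```

The caret line is just padding followed by `^`, which is enough to point at the exact character the parser stopped on.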
I believe we can now confidently call Kaori a Turing-complete programming language. Many core features have already been implemented, and the journey so far has been both fun and challenging. Kaori is now more than 5x faster than its original Java implementation, since it has been fully rewritten in Rust and no longer relies on a naive tree-walking interpreter.
In fact, we already outperform Python in hot loops, running about 2x faster. PyPy is still about 4x faster than us, but we have more optimizations planned for the future. Our goal is to get as close as possible to PyPy's JIT-level performance without a JIT. That sounds very unlikely, but at the same time it is going to be a very exciting journey, don't you think?
Features:
The name "Kaori" is inspired by Kaori Miyazono from the anime "Your Lie in April". She represents inspiration, motivation, and the desire to create something different from the standard, the same spirit behind creating this programming language. 🙂