LemonHX

CEO of Limit-LAB. Loves tinkering with low-level code and intends to change the world.

Programming Language Design Pitfall Chronicles (Experts, Please Avoid)

I have learned a lot from the failures of Sap and CN, two languages I previously tried to create, and I have also received input from many community members and industry engineers. Finally, I have compiled those lessons here while attempting to build this language.

I hope it can help the next person who wants to create a programming language.

Speed: Why worry about it before it's even done?

It's 2022, and there are countless ways to speed up a program with JIT compilation. When designing a programming language, speed should be considered last, not first.

Even if you design something as terrible as JS, you can still get speedups for free through GraalVM.

The exception is if you are designing a replacement for a certain language called C, in which case all I can do is wish you luck.

Generics: Never overestimate the intelligence of users, never!

Generics can be done, but they should not be static. There are two considerations here. First, dynamic generics can be separated from the rest of the type system and handled with dynamic dispatch. Second, the type system behind static generics is too complex for ordinary users.

Don't assume you can handle generics yourself, either. Spend two days writing Rust and doing some type gymnastics, and you will see.

So I think type erasure plus TypeID-based generics best meets the needs of normal people: you get code reuse without much extra mental overhead. This may cost some runtime speed, and the extra memory usage can be partly mitigated through GC and pre-allocation.

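import java.util.*;

List<Integer> typed = new ArrayList<Integer>(); // element type checked at compile time
List untyped = typed;                           // raw type: same object, type argument erased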

This way, the compiler can check the generic types for us, reducing mental burden, and we can still drop the type arguments when code reuse calls for it.
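To make "type erasure plus TypeID" concrete, here is a minimal Python sketch. The class name ErasedList and the type_id parameter are hypothetical, not taken from any particular language: a single container implementation serves every element type, and a runtime type tag takes over the checking that static generics would do at compile time.

```python
from typing import Any, Iterator, List


class ErasedList:
    """One runtime representation for every element type (type erasure).

    Elements are stored as plain objects; the only trace of the type
    argument is type_id, which is checked at runtime on insertion.
    """

    def __init__(self, type_id: type) -> None:
        self.type_id = type_id          # the "TypeID" kept at runtime
        self._items: List[Any] = []     # erased storage, shared by all instantiations

    def add(self, item: Any) -> None:
        # A runtime check replaces the compile-time check of static generics.
        if not isinstance(item, self.type_id):
            raise TypeError(
                f"expected {self.type_id.__name__}, got {type(item).__name__}"
            )
        self._items.append(item)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


ints = ErasedList(int)
ints.add(1)
ints.add(2)
names = ErasedList(str)
names.add("Ada")

# The same class (same code, same layout) serves both "instantiations".
print(type(ints) is type(names))  # True
print(list(ints))                 # [1, 2]
try:
    ints.add("three")
except TypeError as err:
    print(err)                    # expected int, got str
```

The trade-off matches the one described above: no per-type code generation and a simple mental model, paid for with a runtime check on every insertion.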

Syntactic Sugar and Features: Don't do it if you're not sure how

If you cannot guarantee that a feature is compatible with and orthogonal to the other features, will stay relevant for a long time, and remains maintainable and easy to learn, then don't add it.

Let's illustrate what the above statement means with an example:

using System.Collections.Generic;
using System.Linq;

// Specify the data source.
int[] scores = { 97, 92, 81, 60 };

// Define the query expression.
IEnumerable<int> scoreQuery =
    from score in scores
    where score > 80
    select score;

When LINQ was first introduced, its query syntax seemed like a great feature. Today it draws criticism: there are countless ways to speed up access to a collection, and being slightly more "semantic" is not a good enough reason to build an SQL-like sublanguage into the language.

scores.filter(_ > 80)

This is not only shorter than the query version, but also orthogonal to the rest of the language: filter is an ordinary method, not dedicated syntax. It can also be parallelized through iterators for significant speed improvements.
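The orthogonality argument is that filter is just an ordinary function over iterators, so it needs no dedicated syntax and can be swapped for a parallel version without touching the language. A minimal Python sketch, where lazy_filter and parallel_filter are hypothetical helper names:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def lazy_filter(pred: Callable[[T], bool], items: Iterable[T]) -> Iterator[T]:
    """Plain library code: a generator that yields matching items on demand."""
    for item in items:
        if pred(item):
            yield item


def parallel_filter(
    pred: Callable[[T], bool], items: Iterable[T], workers: int = 4
) -> List[T]:
    """Same semantics, but the predicate is evaluated on a thread pool."""
    materialized = list(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so output order is stable.
        keep = list(pool.map(pred, materialized))
    return [item for item, ok in zip(materialized, keep) if ok]


scores = [97, 92, 81, 60]
print(list(lazy_filter(lambda s: s > 80, scores)))  # [97, 92, 81]
print(parallel_filter(lambda s: s > 80, scores))    # [97, 92, 81]
# Composes with any other iterator tool, e.g. the built-in sum:
print(sum(lazy_filter(lambda s: s > 80, scores)))   # 270
```

In CPython, threads only give real parallelism when the predicate releases the GIL (for example, I/O); for CPU-bound predicates, concurrent.futures.ProcessPoolExecutor is the usual drop-in swap, provided the predicate is a picklable top-level function.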

Now let's talk about what it means to be incompatible.

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

def where_is(point):
    match point:
        case Point(x=0, y=0):
            print("Origin")
        case Point(x=0, y=y):
            print(f"Y={y}")
        case Point(x=x, y=0):
            print(f"X={x}")
        case Point():
            print("Somewhere else")
        case _:
            print("Not a point")

This is the structural pattern matching that a certain garbage language, Python, added in version 3.10 just to follow the trend.

In many pattern matching implementations, each case clause establishes its own separate scope. Then, variables bound by patterns are only visible within the respective case block. However, in Python, this makes no sense. Establishing separate scopes essentially means that each case clause is a separate function and cannot directly access variables in the surrounding scope (without resorting to nonlocal). In addition, case clauses can no longer affect any surrounding control flow with standard statements such as return or break. Therefore, this strict scoping leads to unintuitive and surprising behavior.

-- PEP 635

In my opinion, what is needed here is not a new match syntax but a better visitor interface and a better switch, because what you have built baffles anyone who has actually used pattern matching.
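For comparison, here is a rough Python sketch of the "visitor plus plain switch" style. It uses the standard library's functools.singledispatch for type-based dispatch as one possible stand-in for a visitor interface; this is an illustration, not a design the original proposes. Because the value checks are ordinary if statements, scoping and return behave exactly as they do everywhere else in the language.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Point:
    x: int
    y: int


@singledispatch
def where_is(obj: object) -> str:
    # Fallback handler: anything without a registered type lands here.
    return "Not a point"


@where_is.register
def _(point: Point) -> str:
    # A plain if-chain acts as the "better switch": ordinary scoping,
    # and return/break work exactly as they do everywhere else.
    if point.x == 0 and point.y == 0:
        return "Origin"
    if point.x == 0:
        return f"Y={point.y}"
    if point.y == 0:
        return f"X={point.x}"
    return "Somewhere else"


print(where_is(Point(0, 0)))    # Origin
print(where_is(Point(0, 5)))    # Y=5
print(where_is(Point(3, 0)))    # X=3
print(where_is(Point(1, 2)))    # Somewhere else
print(where_is("not a point"))  # Not a point
```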

Programming languages evolve as history moves on, new papers appear, and better ideas emerge. This inevitably leaves behind many outdated features. Some of them coexist perfectly with current theory, while others are thoroughly obsolete and give compiler developers headaches.

Take C#'s delegates as an example. They hold up well because they coexist perfectly with lambda expressions: even before lambdas existed, we could simulate lambda behavior with anonymous methods.

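using System.Collections.Generic;

var list = new List<int> { 1, 2, 3, 4, 5, 6 };

// C# 2: anonymous method (delegate syntax)
List<int> evens2 = list.FindAll(
    delegate (int no)
    {
        return no % 2 == 0;
    });

// C# 3: lambda expression
List<int> evens3 = list.FindAll(i => i % 2 == 0);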

But some programming languages added strange syntactic sugar without considering the consequences, and it eventually became outdated.

For example, variadic args... parameters are an insult to our intelligence (as the Zig language puts it).

Concurrency: Async or Stackful Coroutines?

Although async looks good at first glance, it raises some issues:

First, async code can never have a stable ABI. Async functions are stackless coroutines: each one's stack frame is compiled into a struct that holds every variable of the current environment that must survive an await, and the layout of that struct keeps changing depending on the optimizer's choices.

Without a stable ABI, you cannot export such a function on its own, which makes cross-language calls very painful.
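Roughly speaking, a stackless coroutine is lowered into a state machine whose fields are the locals that must survive a suspension point. The Python sketch below hand-lowers a tiny generator to show the idea; the class AddTwoInputs and its resume method are hypothetical illustrations, not any compiler's actual output. Note that b never becomes a field because it is not live across a suspension: which locals end up in the struct depends on liveness analysis and optimization, which is why its layout is not something you can freeze into an ABI.

```python
from typing import Any, Optional, Tuple


def add_two_inputs():
    """The source-level coroutine: two suspension points."""
    a = yield "need a"
    b = yield "need b"
    return a + b


class AddTwoInputs:
    """Hand-lowered equivalent of add_two_inputs as an explicit state machine."""

    def __init__(self) -> None:
        self.state = 0                # which suspension point we are at
        self.a: Optional[int] = None  # 'a' is live across a suspension, so it becomes a field

    def resume(self, value: Any = None) -> Tuple[str, Any]:
        if self.state == 0:
            self.state = 1
            return ("yield", "need a")
        if self.state == 1:
            self.a = value
            self.state = 2
            return ("yield", "need b")
        if self.state == 2:
            # 'b' is used immediately and never stored: no field needed.
            self.state = 3
            return ("done", self.a + value)
        raise RuntimeError("coroutine already finished")


sm = AddTwoInputs()
print(sm.resume())   # ('yield', 'need a')
print(sm.resume(2))  # ('yield', 'need b')
print(sm.resume(3))  # ('done', 5)

gen = add_two_inputs()
print(next(gen))     # need a
print(gen.send(2))   # need b
try:
    gen.send(3)
except StopIteration as stop:
    print(stop.value)  # 5
```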

Second, .await is tedious to write and may not even do what you expect.

async fn caller() {
    let ares = a(xxx).await;
    let bres = b(xxx).await;
    let cres = c(xxx).await;
}

At first glance, this code looks asynchronous, so you might expect the three calls to run concurrently... but they don't: each .await suspends caller until that call finishes, so a, b, and c still run one after another. Many people simply wrap main in async, so the awaits inside effectively become blocking calls. And without async, ordinary users cannot call these functions correctly from blocking interfaces. Don't think your users will write a bunch of combinators. They won't. Your users will just open Stack Overflow and search:

<< how to await on multiple async function at the same time? >>

-- Your lovely users

So much for your super-powerful compiler: code in a language with a CPS transform ends up worse than code in this supposedly weak language.

func WhatEver() {
    var wg sync.WaitGroup

    wg.Add(1)
    go func() {
        defer wg.Done()
        a(xxx)
    }()
    wg.Add(1)
    go func() {
        defer wg.Done()
        b(xxx)
    }()
    wg.Add(1)
    go func() {
        defer wg.Done()
        c(xxx)
    }()

    wg.Wait() // block until all three goroutines have finished
}

This code at least guarantees that the three calls run concurrently (and in parallel on a multi-core machine).
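The sequential-await trap from the Rust snippet exists in Python's asyncio too, which makes it easy to demonstrate. In the minimal sketch below, fetch is a hypothetical stand-in for a, b, and c that just sleeps. Awaiting the calls one by one takes roughly the sum of the delays, while asyncio.gather (the answer those Stack Overflow searches usually land on) runs them concurrently, returns results in argument order, and takes roughly the longest single delay.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for a(xxx), b(xxx), c(xxx): waits, then returns.
    await asyncio.sleep(delay)
    return name


async def sequentially() -> list:
    # Each await suspends until that call finishes: a, then b, then c.
    ares = await fetch("a", 0.1)
    bres = await fetch("b", 0.1)
    cres = await fetch("c", 0.1)
    return [ares, bres, cres]


async def concurrently() -> list:
    # gather schedules all three at once and waits for every result.
    return list(await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1)))


def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequentially)
con_result, con_time = timed(concurrently)
print(seq_result)  # ['a', 'b', 'c']
print(con_result)  # ['a', 'b', 'c']
print(f"sequential: {seq_time:.2f}s, gather: {con_time:.2f}s")
```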

Developers: Don't be self-righteous

Most programming languages are designed to be Turing complete, so the only difference is whether something is easy or hard to write; there is basically nothing that cannot be written. When a well-established design pattern is widely considered easy to write and a large amount of code proves it, please don't add more "advanced" extensions just because you think they are better.

Normal people know that the two most common data structures are:

  • array
  • map

But our PHP author thinks: Hey, look, this map is just an array, but with strings as indexes!

$array = array("foo", "bar", "hello", "world");
$map   = array("foo" => "bar", "hello" => "world");

I have to say that this design is truly amazing!

Abstraction: Appropriate coupling is better than carefully designed decoupling

Splitting even basic functionality out of the standard library, such as random number generation in Rust, only means that every user has to do this before writing any code:

[dependencies]
rand = "*"

In my opinion, this is foolish. If a platform cannot support the feature, report a compile error on that platform; don't solve the problem through package management.

Similarly, most code is not generic but project-specific. Don't try to provide a very general, generic standard for everything; handle the special cases through partial specialization instead of providing a default implementation for every generic.

Well... for this, just look at Haskell. Its default implementations are a pile of abstract nonsense.

And let's not forget Haskell's interfaces for strings, numbers, and regexes:

Haskell officially provides two things for strings:

  1. [Char] = String
  2. ByteString

The first may seem intuitive, but don't forget that Haskell lists are lazy and boxed by default, so [Char] cannot be understood as a C char[]: it is a linked list of boxed characters. The second is more intuitive and widely used, but the standard library mostly uses the first.

So every developer struggles to convert string literals into ByteStrings. Many libraries I have seen carry a conversion layer for exactly this.

And let's talk about how confusing Haskell's regex is.

By the time you figure out which one to use, our Perl users have already finished writing this regex. Why can't such a basic feature be built-in?

Most regexes can be compiled ahead of time, or even JIT-compiled. What you are doing... can only be called overthinking.

Truly ambitious Haskell programmers don't use regex; they use Parsec. Next.

The Real World is Not Pure!

It wasn't until a year ago that I realized that, although state management is difficult, its difficulties cannot be bypassed with pure functions. Even if you manage to bypass them, the cost you pay far outweighs the benefits.

And you don't want to force your users to learn this, do you?

Look at the OCaml users next door. They have been doing fine for so many years.

let x = ref 0 in
let y = x in  (* y aliases the same mutable cell as x *)
x := 1;
!y            (* evaluates to 1 *)

Coq itself is even written in OCaml.

So don't bother with fancy monads and monad transformers unless you really run into concurrency problems on multiple cores, and even then, isn't there STM to help you?

Conclusion

Before designing a programming language, understand what your target user group needs or how to attract more people through design, rather than showcasing your extraordinary programming skills and intelligence.

Of course, if you don't want to design a language that satisfies everyone and is enjoyable to use, then ignore what I said.

In short, following this advice can produce a language that is moderately maintainable, has an unsound type system, and runs at decent speed.

It may not appeal to many people, since there are already plenty of shit-heap languages out there...

-- Potato TooLarge
