LemonHX

CEO of Limit-LAB. Likes tinkering with low-level code, intent on changing the world.

My understanding of OS

The Target of Criticism

This article may seem overly aggressive, so I do not recommend it for any middle-aged or older readers. If there are parts of this article that you disagree with, please comment and tell me your detailed reasons, thank you~.

What is an Operating System#

Software used to manage the interaction between humans and hardware: this is my definition of an operating system. Operating systems, like the IE browser, are software, and they can also be quite terrible.

Generally, operating systems exhibit similar behavior across different hardware, but this is not a requirement. I actually don't care much about compatibility, but isolating hardware details and unifying software interfaces is indeed a good thing for developers, as it allows their software to be ported to more platforms.

Current Operating System Camps#

The ancient camps have no practical significance; Solaris is left with just ZFS, UNIX is dead, and Minix... will forever live in textbooks.

Operating systems generally have standards, and we currently have three standards:

  • POSIX Standard
  • GNU/Linux De Facto Standard
  • WINDOWS/MAC/Other: Everyone-Plays-By-Their-Own-Rules Standards

First, let's talk about what POSIX is. I think POSIX was created to allow software from the old UNIX to be migrated to new platforms. In my view, it is a standard that forces everyone to comply due to the historical baggage of UNIX operating systems.

So the POSIX standard is just mediocre; I have nothing good to say about it. In fact, everyone is only partially compatible, which is much scarier than being fully compatible.

Next, let's talk about LINUX. The code of LINUX itself is already quite spaghetti-like, but if we drop GNU and use the userland provided by FreeBSD instead, you will find that your kernel is LINUX, yet no software can run. This is why I mention the two together; without GLIBC and GCC, LINUX is essentially useless.

LINUX has now become the de facto standard for server development, or more concretely, CentOS8 compatibility has. This is actually understandable; after all, commercial server operators can look after their own systems. The problem is that this group of idiots open-sourced their code, squeezing out other open-source projects and occupying users' minds, so everyone else is forced onto the CentOS ship.

WINDOWS Standard: Any standard with documentation is a good standard!

MAC Standard: It's just crap; all discussions are rejected.

API: Shell? What a disgusting thing#

Oh my God, does anyone really like writing shell scripts? The shell is essentially a way to connect a bunch of C programs (or libc) with the operating system's interface, providing a very awkward syntax.

The reason the shell can exist is that the designers of UNIX did not think to create a good abstraction to interface with various C codes, and the impoverished semantics of the C language are insufficient to support the complexity of this system.

Now we see the shell living in the YAML of k8s, which is quite frightening.

In terms of the cleanliness of the operating system's API, Windows, despite having encountered DLL hell in the past, indeed outperforms all *nix designs. APIs should use a universal description format; Windows now uses a method similar to .net IL to generate various interface descriptions, successfully interfacing with languages like Rust, C#, and C++, abandoning the header format and the clumsy shell.

Security#

If operating systems were not written in C but used a slightly safer language, more than half of the security issues would not exist.

-- Me

The security issues of operating systems, in my view, are an endless extension of C. If we bring it into the Warhammer universe, there is actually a slight correlation.

Every day, thousands of psychics (excellent programmers) bring various operating systems to the Golden Throne to extend the life of the Emperor C, and the sacrificed psychics believe this is a supreme honor. The once glorious technology UNIX has become a superstitious idol. The chaotic power of syscalls is everywhere, and a soulless swarm of bugs attempts to devour the entire galaxy.

First, let's not talk about the current CPU hardware design, but about the design of rings.#

A ring is a circle, with levels from 0 to N, where a smaller number indicates a higher privilege level. Ring 0 is generally where the kernel lives, and developers typically use syscalls to call things in ring 0. Ring 1 and ring 2 are generally for drivers that users should not touch, while ring 3 is where users reside. At first glance this seems reasonable, but in fact it does not seem necessary, and it adds the overhead of context switching. Anyone who has written low-level code knows that to be fast, you should reduce syscalls. For example: mutex -> futex, vfs -> fuse, io -> spdk. We always "know" that what we write is absolutely safe, so to gain speed we abandon this design and move common things to user space. So... is the ring design outdated?
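To make "mutex -> futex" concrete, here is a minimal sketch of a lock whose uncontended path never leaves user space. It is Linux-only, the names `sketch_mutex`, `sketch_lock`, `sketch_unlock`, and `futex_calls` are made up for illustration, and the three-state scheme is the commonly described textbook one, not anything this article or glibc prescribes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock states: 0 = free, 1 = locked, 2 = locked and someone may be waiting. */
typedef struct { atomic_int state; } sketch_mutex;   /* hypothetical name */

static atomic_long futex_calls;   /* how often we actually entered the kernel */

static long futex(atomic_int *addr, int op, int val) {
    atomic_fetch_add(&futex_calls, 1);
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void sketch_lock(sketch_mutex *m) {
    assert(m != NULL);
    int c = 0;
    /* Fast path: one atomic compare-and-swap, no syscall. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock as contended and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex(&m->state, FUTEX_WAIT, 2);   /* sleeps only while state == 2 */
        c = atomic_exchange(&m->state, 2);
    }
}

static void sketch_unlock(sketch_mutex *m) {
    assert(m != NULL);
    /* Wake a sleeper (a syscall) only if the lock was marked contended. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex(&m->state, FUTEX_WAKE, 1);
    }
}
```

With a single thread locking and unlocking, `futex_calls` stays at zero: the kernel is only involved when two threads actually collide, which is the whole point of moving the common case out of ring 0.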

Some who study deeper may think, "Hey, isn't ring -1 used for virtualization?"

I don't oppose virtualization; I just believe that this thing needs a ring, while the rest does not.

What's wrong with system calls (syscall)?#

Next, let's talk about the system calls we just mentioned. I wonder if anyone here has played with eBPF. If so, you should already understand my thoughts. I mainly want to introduce this to those who have not played with eBPF. We know that C does not have closures, and even if C had closures, they could not be passed across processes, memory spaces, or rings. This is why a twisted syscall mechanism was invented to call things inside the kernel. In fact, syscalls in many OSs also need to be POSIX compatible. I couldn't care less; if I could directly write a closure and pass it to the operating system's kernel (like an eBPF binary), and have it run directly in ring 0 after security verification and compilation, with trendy pure async IO, that would look more like a program written by a normal person than manually saving a bunch of state and worrying about thread safety.

What is a file?#

I see people complaining that post-00s kids don’t even know what a file is. I want to retort, Do you even know what a file is? I think most people think they know what a file is, but they give me such shallow answers that I can't be bothered to refute them. My personal view is: Why should we post-00s kids pay for your stupid designs?

Let's return to the most basic definition of a file: an A4 sheet of paper. It is visible, tangible, and has limited capacity (you can only write a limited amount). You can name the file, for example, this A4 sheet is called "Weekly Report for the Fourth Week of August." But what does this have to do with files in computers?

Clearly, we initially abstracted in this direction, but now it seems there is not much connection. If we were to describe a file in a modern operating system, we... would try to describe it.

In a Linux system, a file is like a toilet (flush indicated). You can defecate in it or flush it down the drain. You can pour leftovers into it as long as it doesn't clog. You can squat on this toilet for as long as you want. Normally, only one person is allowed to defecate on the toilet at a time. There are many forms of toilets, so files also come in many forms, such as flushing toilets, wooden barrels, and village cesspools. For example, you have tmpfs, rootfs, and procfs.

So this is actually far from our A4 paper model, so I think we need to properly abstract the concept of a file. For example: procfs.

In the garbage operating system of Linux, there is something called procfs, which stores a bunch of dynamic things, usually located in the /proc directory, such as /proc/cpuinfo. We know that normal filesystems can be written to, but procfs... is a bit special. It looks like a toilet, but when you actually use it, you find it is an artwork.

Why is the current filesystem designed so twisted? Because, as we mentioned earlier, the shell is more often a string processing language, and when it comes to processing non-string data, it becomes inadequate. Therefore, the UNIX philosophy of everything is a file was designed, because otherwise the omnipotent C + Shell could not handle this data!

Timers, signals, semaphores, processes, network devices, drivers: files are like songbirds!

For example, when normal people want to read information from an operating system, they might ask whether there is an API, while UNIX programmers like to use cat /proc/xxx | sed 's/.*aaa.*/bbbb/' | grep yyy. Until the end of the last century, this was revered as sacred simplicity. But what if one day there is information that does not exist in text form... like a directed graph? Oh, the shell can't handle that, but directed graphs are far more common in real life than text.
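A small sketch of the contrast, asking the same question ("how many CPUs?") both ways. The function names are hypothetical, `_SC_NPROCESSORS_ONLN` is a glibc extension rather than strict POSIX, and the `processor` prefix match is an assumption that holds on x86 and arm64 but not on every architecture.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* The "API" way: one call, one typed integer back. */
long cpus_via_api(void) {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* The "everything is text" way: count lines starting with "processor".
 * Returns -1 if the file cannot be opened (for example, not on Linux). */
long cpus_via_procfs(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long count = 0;
    int at_line_start = 1;          /* long lines arrive in several chunks */
    char chunk[1024];
    while (fgets(chunk, sizeof chunk, f) != NULL) {
        if (at_line_start && strncmp(chunk, "processor", 9) == 0)
            count++;
        at_line_start = (strchr(chunk, '\n') != NULL);
    }
    fclose(f);
    return count;
}
```

Calling `cpus_via_procfs("/proc/cpuinfo")` usually agrees with `cpus_via_api()`, but only the second version has to guess at field names, buffer sizes, and line layout.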

Implementation of Filesystems#

Just now we mentioned that VFS is quite twisted; how did it come to be? To complete my complaints about procfs, we find that if this VFS does not intertwine with the operating system, we cannot read things like cpuinfo that only ring 0 has permission to read. So VFS depends on the operating system, but the operating system depends on various shells, and shells depend on various VFS, creating a mutual dependency that crosses rings. This means that from a design perspective, this whole thing is already very unsafe...

Some experienced users may know that recently everyone has been discussing SPDK or FUSE. Why does FUSE exist? Because VFS is not only unsafe but also pretends to be safe. What happens when we execute cat ~/.README.md?

  1. Call a function called open in user space.
  2. Open is a glibc-wrapped syscall.
  3. Trap into the kernel.
  4. Read VFS.
  5. Read XFS.
  6. Read block.
  7. Send IO request.
  8. Hard drive reads into the buffer.
  9. Initiate IO interrupt.
  10. Copy to kernel buffer.
  11. Copy to user buffer.
  12. Return.
  13. Return.
  14. Return.
  15. Exit the kernel.
  16. Exit syscall and switch back to user space.
  17. Obtain the string.

Oh, we copied twice, and we had two context switches. We made three abstract function calls without knowing what they do, and we completed several memory mappings. In fact, only steps 4, 6, 7, 9, 11, and 17 are what we care about. These steps do not require any switches or changes to user space. So we can understand FUSE as a filesystem implementation that only includes these steps.
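For reference, this is roughly what the user-space side of that walkthrough looks like. It is a minimal sketch with a made-up helper name (`read_whole`), and the step numbers in the comments are only an approximate mapping onto the list above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads up to cap bytes of `path` into buf. Returns bytes read, or -1. */
ssize_t read_whole(const char *path, char *buf, size_t cap) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);   /* steps 1-3: libc wrapper, trap into the kernel */
    if (fd < 0)
        return -1;
    size_t total = 0;
    while (total < cap) {
        /* Every read() is another user -> kernel -> user round trip, ending
         * with a copy from the kernel's buffer into buf (step 11). */
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                   /* end of file */
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;           /* step 17: the bytes are finally ours */
}
```

Everything between the trap and the return is invisible from here, which is exactly the complaint: the caller sees three function names and has no idea what they cost.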

But have you noticed that VFS in the FUSE system only serves to identify the resource we want... So now RedoxOS has started using URLs for files.

By the way, after FUSE, filesystems have sprung up like mushrooms after rain because the complexity has decreased, and no one cares about what the garbage kernel is doing; they all connect to VFS and implement the rest themselves.

However, using URLs does not seem to solve the core problem compared to doing VFS well.

What were files originally used for?#

Many people might directly say, to store files! What else?! But you are really wrong. Ancient humans had the ability to calculate offsets and write to floppy disks!

In fact, the original design of VFS and files was to provide persistence for IPC. Although Microsoft's VFS did a terrible job, at least they got the COM interface right.

So you say this "everything is a file" doesn't seem particularly twisted; is there something more stimulating?!#

Yes! PID. A PID (Process ID) is a typical twisted example. It tries to use a file named with a number to describe a running program. In real life, this is undoubtedly crazy behavior, just like saying I lifted my spoon for breakfast and I call it 3.

As someone with a solid computer foundation, you should know that this number has an upper limit. Suppose I call lifting my spoon 3, being alive 5, and breathing 6. You will find that the duration of these events varies from very short to very long, and that they can be enumerated without end. So when PIDs are exhausted, the system starts looking for any that have become free. Oh, and what if my long-lived states hold on to every PID from 0 to 2^22...

That's right, fork failed: Resource temporarily unavailable.

A process that has finished but still holds its PID is called a zombie process, because it describes something useless that occupies a process slot and refuses to die.
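A zombie is easy to produce on purpose. The sketch below (helper names are hypothetical) forks a child that exits at once, shows that its PID stays allocated until the parent calls `waitpid`, and only then is the number released.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately with `code`. Returns its PID, or -1. */
pid_t spawn_short_lived(int code) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);              /* child: finished, but not yet reaped */
    return pid;
}

/* Nonzero while the PID is still allocated (running or zombie). */
int pid_in_use(pid_t pid) {
    assert(pid > 0);
    /* Signal 0 sends nothing; it only performs the existence check. */
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Reaps the child, which finally frees the PID. Returns its exit code, or -1. */
int reap(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Between `spawn_short_lived` and `reap`, the child does nothing at all and still counts against the PID limit. A parent that never reaps simply leaks one number per child.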

So what's stimulating about this?

Because I said a PID is a file, and files also have an identifier representing the currently opened file, FD. Oh... but at least this thing lives inside the process. However, the interesting thing about this value is that it changes frequently because opening IO is too common. Most IOs are indeed short-lived, and the UNIX philosophy is that everything is a file. So once you obtain this FD, performing operations on it could potentially backfire.
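One way this backfires is descriptor reuse. A minimal sketch, with a made-up function name, assuming a single-threaded program so that nothing else opens files in between:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens `first`, closes it, then opens `second`.
 * Stores both descriptor numbers; returns 0 on success, -1 on failure. */
int fd_reuse_demo(const char *first, const char *second, int *fd_a, int *fd_b) {
    assert(fd_a != NULL && fd_b != NULL);
    *fd_a = open(first, O_RDONLY);
    if (*fd_a < 0)
        return -1;
    close(*fd_a);                 /* the number in *fd_a is free again */
    *fd_b = open(second, O_RDONLY);
    if (*fd_b < 0)
        return -1;
    /* open() returns the lowest free descriptor, so *fd_b == *fd_a here:
     * any stale copy of the old number now refers to a different file. */
    close(*fd_b);
    return 0;
}
```

Calling it with `/dev/null` and `/dev/zero` yields the same integer twice. Code that kept the first number around and writes to it later will happily hit the second file.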

Security 2: The Superposition of We Completely Trust Users and Users are Dogs#

Continuing from the previous section, we talked about how twisted the design of the filesystem is. Now we can expand on the security mechanisms of the everything is a file operating system.

Linux has a very simple and foolish permission system where files have owners.

Those who read the previous chapter might exclaim, "Huh?! Huh?!" That's right, even PIDs have owners, and the owner can set three permissions: read R, write W, execute X. If all three are enabled, it equals 7.

R + W + X
4 + 2 + 1

The first permission is User, the second is Group, and the third is Other. So 777 is the highest permission; Boeing 777 flew by.

Huh?! Other? Yes... Other, so vague that normal people dare not grant it. Then an application crashes, saying it has no permission, and you say, "Just enable it." The problem arises: small and beautiful applications will scan you like crazy...

So you have no idea what to enable. As a user, you don't know who accessed your stuff, and you can't prevent a specific member of the group from accessing it, nor can you kick them out of the group (like the video group)...

So from the previous security point, users are dogs, and now we completely trust users. Truly, it's like a whore setting up a sign for virtue in *nix.

TOCTOU Problem (Inconsistency Between Check Time and Use Time)#

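// Privileged (e.g. setuid) program: check first, use later.
if (access("file", W_OK) != 0) {   // check: may the real user write "file"?
    exit(1);
}
// Not part of this program: the attacker runs this in the gap between the
// check above and the open below, after replacing "file".
symlink("/etc/passwd", "file");
int fd = open("file", O_WRONLY);   // use: follows the symlink to /etc/passwd
write(fd, buffer, sizeof(buffer));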

Oh, even though we don't have permission for /etc/passwd, we still wrote to it!

Here we should mention the Windows kids who did it right.#

Windows stores file permissions in metadata, which has something called Access Control List, and another called Mandatory Access Control, which can meticulously control permissions for processes, threads, files, directories, ports, memory, and devices. Moreover, Windows even provides a GUI from the last century.

Some might say, "Look, Linux also has SELINUX." Would you dare to enable it? If it breaks, you take the blame.

Windows' NTFS has TxF (Transactional NTFS), which can avoid this TOCTOU problem to some extent (those who have studied databases know how atomicity is defined for transactions, so that problem cannot occur), but Microsoft does not recommend using it.

Lemon suggests that you OS people quickly learn from databases; they are two generations ahead of you!#

IPC, Socket, and Net#

I believe many newcomers cannot distinguish what these three things are for. IPC is for interaction between different processes on the same machine, Socket can be a form of IPC on the same machine, and Net is for accessing others.

Oh, here we have to criticize Linux again. Look at Windows; they have a set of COM and a set of DLL. Once the COM Object is registered, MSVC can directly write C++ and inherit from it to complete IPC. But on Linux you are twisted into using sockets; dbus is available, but dbus has three implementations and still lives in user space. The result is 114514 copies of information, which is painfully slow. Then Android said, "I'm out; garbage dbus is too slow. Why not implement it in the kernel?" Hence the invention of binder for Linux, which achieves single-copy efficiency.
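To see where the copies come from, here is a minimal sketch of socket-based IPC: one request and one reply between a parent and a forked child over a Unix socket pair. The function name is made up, and it assumes short messages (at most 256 bytes) that arrive in a single `read`.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends `msg` to a forked child over a Unix socket pair; the child echoes it
 * back. Returns the number of bytes received into `reply`, or -1. */
ssize_t echo_via_socketpair(const char *msg, char *reply, size_t cap) {
    assert(msg != NULL && reply != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                                 /* child: the "service" */
        char buf[256];
        close(sv[0]);
        ssize_t got = read(sv[1], buf, sizeof buf); /* copy: kernel -> child */
        if (got > 0 && write(sv[1], buf, (size_t)got) < 0) /* copy: child -> kernel */
            _exit(1);
        _exit(0);
    }
    close(sv[1]);
    ssize_t n = -1;
    if (write(sv[0], msg, strlen(msg)) >= 0)        /* copy: parent -> kernel */
        n = read(sv[0], reply, cap);                /* copy: kernel -> parent */
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

That is four copies and four syscalls for one round trip, before any message bus in the middle adds its own hop.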

However, I am too exhausted to discuss network stacks because it's truly draining. Let's talk about it another day.

IO Chapter#

Signal Queue IO, you can just die; no one uses you.

First, let me clarify for the newbies reading this article that IO here refers not only to pure files but to all IO, such as networks and printers.

For modern people, the days of penny-pinching are over: "Oh, it's just IO; read and write on the spot, memory is so small, why allocate?" If it's slow, just wait!

First, let's turn our attention back to the era of Linux 2.x and the select and poll functions... which were born just for laughs!

We loop through all IOs, and for each IO, we call select/poll, then copy all the fds from user space to kernel space, and wait, wait, wait, wait, and your computer feels like it has crashed~!
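In code, the poll version of that ritual looks roughly like this. It is a minimal sketch with a hypothetical helper name and an arbitrary cap of 64 descriptors:

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>   /* pipe(), write(), close() for callers trying this out */

/* Waits until one of fds[0..n) is readable or timeout_ms elapses.
 * Returns the index of the first readable descriptor, -1 on timeout or error. */
int first_readable(const int *fds, size_t n, int timeout_ms) {
    assert(fds != NULL && n > 0 && n <= 64);
    struct pollfd set[64];
    for (size_t i = 0; i < n; i++) {
        set[i].fd = fds[i];
        set[i].events = POLLIN;
        set[i].revents = 0;
    }
    /* The whole array is handed to the kernel again on every single call... */
    if (poll(set, (nfds_t)n, timeout_ms) <= 0)
        return -1;
    /* ...and then we scan the whole array ourselves to find out who is ready. */
    for (size_t i = 0; i < n; i++) {
        if (set[i].revents & POLLIN)
            return (int)i;
    }
    return -1;
}
```

Both the copy-in and the scan are linear in the number of descriptors, per call, whether or not anything changed since the last call.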

syslet/LCA: Does anyone seriously use such a clownish implementation?

Linux AIO#

  • AIO: We support asynchronous IO in Linux!
  • User: I want to use asynchronous IO and the "everything is a file" feature to operate a socket~
  • AIO: Get lost! Who spoils you!

Then AIO gets criticized heavily!

Through the hard work of countless mediocre engineers... Linus himself expressed some small opinions.

So I think this is ridiculously ugly. AIO is a horrible ad-hoc design, with the main excuse being “other, less gifted people, made that design, and we are implementing it for compatibility because database people — who seldom have any shred of taste — actually use it”.

— Linus Torvalds

Ugh, this is too damn disgusting! AIO is a scary toy, with the main excuse being “this design was proposed by amateurs, and we had to implement this crap for compatibility with database people, who really have no taste and can swallow this crap!”.

— Linus Torvalds (translated with integrity)

Because after all that design, it only works for O_DIRECT, which is really direct access that bypasses the cache. This is almost a piece of crap tailored for databases in the kernel, and this crap looks asynchronous but can be blocked with many tricks! Wow!

Epoll and kqueue: let's leave epoll out, since even when copying homework it managed to ruin kqueue. kqueue is indeed async, but I think it is closer to a notification mechanism, so let's not discuss it for now.
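For readers who have not met that notification model, a minimal Linux-only sketch with hypothetical helper names: interest is registered once with `epoll_ctl`, and each `epoll_wait` returns only what is ready.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers `fd` for readability once; the kernel keeps the interest set. */
int watch_readable(int epfd, int fd) {
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Waits for one ready descriptor. Returns it, or -1 on timeout or error. */
int next_ready(int epfd, int timeout_ms) {
    assert(epfd >= 0);
    struct epoll_event ev;
    /* Only ready events come back; nothing is re-submitted per call. */
    if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
        return -1;
    return ev.data.fd;
}
```

Create the instance with `epoll_create1(0)`, call `watch_readable` for each descriptor, then loop on `next_ready`. It still only tells you that a `read` would not block; the `read` itself remains a separate, synchronous syscall.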

io_uring (uring: urine) emerges!#

Oh, the Linux community can finally rightfully suppress BSD! (How many years has IOCP been around in Windows?)

You can check out other articles for more details; so far, I think this is just passable. The only question is when we will abandon the "everything is a file" design.

*nix People's Taste in UI#

It's 2022! Please, application developers, use GUI; stop writing a bunch of CLI! The computers of our grandparents' generation lacked the capability to support GUIs, but look at the Smalltalk and LISP machines of our parents' generation; weren't they GUI-first? What about composability? What about typing faster? Oh, that's not a reason to avoid GUIs; it's just that developers are too lazy to provide shortcuts.

I think my views on GUI design could be discussed in a separate article.

Conclusion#

If OSs could generate revenue like databases, they would have innovated countless generations by now.
