Operating systems might be one of the most complex pieces of software, yet their underlying concepts can be pretty intuitive.
I really had no idea what I was getting into when I decided to take Brown’s CS1690 in my sophomore year. I had just finished my introductory computer systems and architecture course, and from day one this course delved into the intricate details of operating systems during lectures, which, while informative, were often complex and overwhelming. By the time I finally grasped a concept, I forgot where this thing was supposed to fit into the big picture. The real understanding began when I implemented these concepts practically by building Weenix, a Unix-based OS we implemented over the semester. This hands-on experience made all of the theory we learned more intuitive, yet I still found myself occasionally lost in the myriad of details.
To gain a clearer perspective, I often found myself taking a step back to focus on the high-level objectives of what we were trying to build. This approach of (over)simplifying the essence of operating systems and all the concepts proved to be incredibly helpful. It grounded my understanding and provided a solid foundation from which to explore the more nuanced aspects of these complex systems. This strategy of breaking down and understanding the core goals of a system has not only helped me in my operating systems course but has also become a valuable tool in my broader work within computer science and software engineering. The aim of this article is to share this high-level understanding I’ve gained in my OS class, providing a simplified yet comprehensive overview of the fundamental components and functions of an operating system.
What is an operating system?
A simple way to conceptualize an operating system is as a program that acts as an intermediary between software applications and the computer hardware, allowing developers to build applications without worrying about the underlying hardware of the system. This program manages system resources and facilitates communication between hardware and software components. Early operating systems were small enough to fit into a single executable: all components lived in one file that was loaded into memory when the computer booted up. This structure is known as the monolithic approach; modern operating systems, however, have evolved to adopt a more complex and modular design. For this article, we will center our discussion on the monolithic approach, since it is much simpler and more elegant to conceptualize, and it has influenced many modern operating systems like Windows, Linux, and macOS.
Nearly all computers have two modes of operation: user mode (with the least privileges) and privileged mode (complete access over the system). The distinction between these two modes is crucial for the security and stability of the operating system.
In user mode, also called userland, software applications run with limited privileges. This means they cannot directly access hardware or manipulate critical system resources. This limitation is essential as it provides a layer of protection against malicious or faulty software. If an application running in user mode tries to perform an operation that requires higher privileges, like modifying system files or accessing restricted areas of memory, the operating system will intervene and prevent it. This mode is where user applications, like web browsers and word processors, operate.
In contrast, privileged mode, or kernel mode, is only used to run code that is part of the operating system itself. The kernel generally refers to the portion of the OS that runs in privileged mode and is responsible for managing system resources like memory, the CPU, and device drivers. Applications running in user mode must call on the kernel to perform operations that require direct hardware access, such as writing data to a disk, on their behalf.
The separation between user mode and privileged mode is vital for system security and reliability. By restricting application access to critical system resources and hardware, the operating system isolates issues to userland and prevents many types of software errors and malicious activities from bringing down or corrupting the whole system.
Since user applications cannot directly access the hardware or perform certain operations, they need some way to request the kernel to perform these privileged operations on their behalf.
System calls (syscalls) are mechanisms through which user programs request services from the kernel. Syscalls appear as normal function calls to the application developer and serve as a controlled interface to the kernel, allowing user programs to perform tasks like file operations, network communications, or access hardware devices. When a syscall is made, the execution mode switches from user mode to kernel mode, enabling the kernel to safely perform the requested operation on behalf of the user program.
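To make this concrete, Python's `os` module exposes thin wrappers over a few of these syscalls (assuming a Unix-like system), so we can watch a user program ask the kernel for help:

```python
import os

# os.write is a thin wrapper around the write syscall: we hand the kernel
# a file descriptor (1 = standard output) and some bytes, and the kernel
# performs the privileged I/O on our behalf, returning the byte count.
n = os.write(1, b"hello from a syscall\n")

# os.getpid wraps the getpid syscall: only the kernel knows which
# process we are, so even this simple question requires a mode switch.
pid = os.getpid()
```

From the program's point of view these look like ordinary function calls; the switch into kernel mode and back is invisible.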
Processes and Threads
Probably the most important abstraction from a programmer's point of view is the process. Processes are like individual workspaces for each program, providing an isolated environment where programs can operate without affecting each other.
More specifically, a process in our system is an executing program. It contains an address space, a list of references to open files, and a bunch of other information shared by a set of threads. The OS ensures that processes are isolated from other processes running on the system (i.e. one process cannot directly access the memory of another process).
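A quick way to see this isolation on a Unix-like system is `fork`, the syscall that creates a new process as a copy of the caller: after the fork, each process has its own address space, so a write in the child never shows up in the parent. A minimal sketch:

```python
import os

value = 42

pid = os.fork()          # create a child process: a near-copy of this one
if pid == 0:
    value = 99           # mutate the *child's* copy of the variable
    os._exit(0)          # exit the child immediately

os.waitpid(pid, 0)       # parent waits for the child to finish
# The parent's value is untouched: the child wrote into its own,
# separate address space.
```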
Threads (threads of execution) are the smallest unit of work that can be run by the OS. Think of them as individual tasks, chunks of code being sequentially executed, to which the OS can allocate CPU time. Multiple threads can run in parallel, and threads within the same process share resources such as memory and open files.
Before the early 1990s, when operating systems primarily supported a single-thread-per-process model, the distinction between processes and threads was much less clear. Now that multithreaded processes are standard, it’s important to distinguish the two. In many cases, you can think of a process as the program itself and its associated resources, and threads as the individual units of work being done by the program. More complicated applications will have multiple processes (Chrome, for example, runs each tab in its own process), but nevertheless this is a useful mental model to conceptualize the difference between the two.
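Unlike separate processes, threads within one process really do share memory, which is both convenient and dangerous: concurrent updates to shared data must be synchronized. A small sketch using Python's `threading` module:

```python
import threading

counter = 0
lock = threading.Lock()

def work():
    global counter
    for _ in range(100_000):
        # All four threads see the same `counter`; the lock makes each
        # read-modify-write update atomic with respect to the others.
        with lock:
            counter += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# counter is now exactly 400_000 because every update was synchronized.
```

If we dropped the lock, updates from different threads could interleave and some increments would be lost, which is exactly the kind of hazard process isolation prevents across programs.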
When a program runs, it needs to store its instructions (the code) and its data (like variables) somewhere in the computer for quick access by the CPU. Physical memory, commonly referred to as RAM (Random Access Memory), is a key component in a computer system where the computer stores data that is actively being used or processed. Think of it like a workspace where all the data currently in use is laid out for quick access. Physical memory is made up of a series of locations, each with a unique physical address. These locations can be accessed very quickly by the CPU, allowing for fast read and write operations.
Physical memory is finite. There's only so much space available for storing and processing data. This limitation poses a challenge: how do we run large or multiple programs within this limited space? Additionally, direct management of physical memory can make writing applications complex and risky. Without proper management, programs could overwrite each other's data or run out of memory.
Virtual memory is a key abstraction over physical memory: each process is given its own virtual address space, the range of memory addresses it can read from or write to. At its core, an address space is a set of mappings that define how virtual addresses (used by a program) translate to physical addresses in memory.
From a programmer's perspective, virtual address spaces are a godsend. They simplify programming by providing a consistent and private memory environment for each process. Programmers deal with virtual addresses, and the operating system, along with the hardware, takes care of mapping these virtual addresses to physical memory locations.
Virtual memory is also crucial in ensuring processes are executed in isolation. Because each process operates in its own disjoint address space, it cannot accidentally or intentionally access or corrupt the data belonging to another program (or worse, the OS kernel). This isolation is essential for maintaining the stability and security of the whole system.
Virtual address spaces are typically much larger than the actual amount of physical memory available. Through memory paging (also called swapping), the computer can use secondary storage (like a hard drive) to compensate for shortages of physical memory by temporarily transferring data to disk storage. This mechanism allows processes to use more memory than is available in physical memory at the expense of performance. Of course, to the application, this is all abstracted away and handled by the OS.
Virtual memory is foundational to many other features in modern operating systems, such as memory protection (ensuring that programs don't access unauthorized memory areas) and the efficient sharing of code across different processes through shared libraries.
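One concrete way to poke at this abstraction from userland is `mmap`, the syscall that asks the kernel to map new pages into the calling process's virtual address space. A sketch in Python:

```python
import mmap

# Ask the kernel to map 4096 bytes (one typical page) of anonymous
# memory into our virtual address space. Physical frames are typically
# assigned lazily, the first time each page is actually touched.
buf = mmap.mmap(-1, 4096)

buf[:5] = b"hello"        # touching the page faults it in and stores bytes
data = bytes(buf[:5])     # read our bytes back through the same mapping

buf.close()               # unmap the pages
```

The program only ever sees virtual addresses; where (or whether) the page lives in physical RAM at any moment is entirely the kernel's business.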
Each core of the CPU can only execute one instruction at a time (so a dual-core CPU can run two instructions simultaneously), and yet it seems like our computers are doing many more things at once. This is because the OS cleverly manages the CPU's time, quickly switching between tasks so that everything gets a turn, making it appear as if all the programs are running at the same time.
The OS schedules threads for execution, determining which gets access to the CPU and for how long. Time slicing is a technique where the OS allocates a time slot or 'slice' to each thread, rotating (context switching) between them rapidly to give the impression of simultaneous execution. This is crucial for multitasking, allowing multiple processes and threads to run seemingly simultaneously. This also means that programs can be paused (preempted) at any point by the OS and be resumed at a later time. Of course, all this behavior is abstracted away from the programmer by the OS.
Context switching between threads in different processes is more expensive than switching between threads in the same process, since we need to swap out the address space.
File systems define how files are named, stored, and retrieved from a storage device. Every time you open a file, your OS uses a file system to load it from a storage device.
A file is essentially a chunk of data (an array of bytes) that is managed by the OS. This is a crucial abstraction that allows the OS and user applications to read and persist data onto a storage device without needing to know how that device works. It’s useful to think of a file as two components: the metadata that describes the file (name, type, size, creation date, owners, permissions, etc.) and the content itself.
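The two halves of this abstraction are easy to see from userland: `stat` returns a file's metadata, while `read` returns its content. A small sketch:

```python
import os
import tempfile

# Create a throwaway file and write nine bytes of content into it.
fd, path = tempfile.mkstemp()
os.write(fd, b"some data")
os.close(fd)

meta = os.stat(path)            # metadata: size, permissions, timestamps...
with open(path, "rb") as f:
    content = f.read()          # content: the raw bytes themselves

os.remove(path)                 # clean up the scratch file
```

Notice that neither call says anything about disk sectors or SSD blocks; the OS translates both into whatever the storage device needs.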
A directory is a special kind of file that stores references to a collection of other files. Directories help users and programs locate files more quickly, prevent naming conflicts, and logically group related files.
Layers of abstraction
The term “file system” is pretty loaded. In many operating systems, there are three layers of abstraction that build upon each other to provide user applications a consistent, high-level interface, allowing them to interact with the file system without worrying about how the data is physically represented on a storage device.
The actual low-level implementation of a file system is the physical file system. This layer is responsible for physically storing the data on the storage device, which can be a disk, an SSD, or even something like a network connection. It reads and writes file blocks, fixed-size sequences of data, to the device by interacting with device drivers.
The virtual file system acts as an abstraction layer that provides a uniform interface to the underlying physical file systems. This allows different file system types (like ext4, NTFS, FAT32, etc.) to be accessed in a consistent way by the OS, regardless of their specific implementation details. It facilitates operations like mounting, unmounting, and managing different file systems without the need for applications to understand the specifics of each file system type. Essentially, it serves as a bridge between the physical file system and the logical file system, ensuring compatibility and ease of access across various file systems.
Finally, we have the interface that user applications actually use to interact with the file system. This is the logical file system; it is responsible for the commands and system calls that users and applications invoke (e.g., mkdir in Unix-like systems). This layer also provides file access, directory operations, and security.
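These calls are exactly what Python's `os` module passes through to the kernel; for instance, `os.mkdir` wraps the mkdir syscall and `os.listdir` wraps the directory-reading calls:

```python
import os
import tempfile

root = tempfile.mkdtemp()             # a scratch directory to work in

# os.mkdir wraps the mkdir syscall: ask the logical file system to
# create a new directory entry named "docs" under `root`.
os.mkdir(os.path.join(root, "docs"))

entries = os.listdir(root)            # read the directory's entries back

os.rmdir(os.path.join(root, "docs"))  # clean up
os.rmdir(root)
```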
Device drivers are programs that allow the OS to communicate with hardware devices. They deal with device-level details and provide a standard interface to the rest of the system. These allow us to interact with devices such as displays, disks, keyboards, mice, printers, and sound cards without having to worry about the low-level details of how they work.
A storage device driver is a program that acts as a mediator between the operating system and the physical storage devices like hard disk drives (HDDs), solid-state drives (SSDs), or any other forms of storage media.
When an application requests a file operation (like reading a file), the storage device driver translates this high-level request into a series of low-level commands that the storage device can understand. For example, it translates a file read request into commands to move the disk's read/write head to the correct location on the disk (in the case of HDDs) or to access the right memory block (in the case of SSDs). It also takes care of error handling, such as dealing with unreadable sectors or communication errors, ensuring data integrity and reliability.
The driver also plays a role in the performance of the system by optimizing how data is read from or written to the disk, like caching frequently accessed data or managing how data is physically arranged on the storage medium.
An I/O (Input/Output) device driver manages the data exchange between the computer and external devices such as keyboards, mice, printers, and network interfaces. These drivers are responsible for translating the general input/output instructions from the OS to device-specific commands.
For example, when a user presses a key on a keyboard, the keyboard driver interprets the electrical signal produced by the keystroke and translates it into a format that the operating system can understand, such as a specific character or command. Similarly, when a print command is issued, the printer driver converts the file into a format that the printer can process and ensures that the data is transferred correctly to the printer for output.
In addition to basic data transfer, I/O device drivers may also provide additional functionalities specific to the device, such as configuring network settings for a network interface card or managing the different printing modes in a printer. They also handle any status messages or errors from the device, ensuring the OS can respond appropriately to situations like a low ink warning from a printer or a disconnection notice from a network interface.
An interrupt is a signal sent to the processor by hardware or software indicating an event that needs immediate attention. It literally "interrupts" the processor as it is executing a program, allowing the processor to deal with a situation that has arisen, and then return to its previous task. Interrupts are a key feature in a computer's operation and are used to handle asynchronous events, like input from the user or a device, or the completion of a disk I/O operation.
An interrupt handler, also known as an interrupt service routine (ISR), is a special function in the operating system or a device driver designed to handle specific types of interrupts. When an interrupt occurs, the system pauses the currently executing task, saves its state, and runs the code of the appropriate interrupt handler to deal with the event.
In our keyboard example, pressing a key would trigger an interrupt, causing the OS to execute the appropriate interrupt handler (defined within the driver) to read the keystroke data from the keyboard and translate it into a format that the operating system understands. This might involve determining which key was pressed and whether any modifier keys (like Shift or Ctrl) were held down at the time. Once the interrupt handler has completed its task, it signals the CPU that the interrupt has been handled, and the normal execution of the previously interrupted task is resumed.
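You can't handle hardware interrupts from userland, but Unix signals are a close analogy you can actually play with: the kernel interrupts your program, runs a handler you registered, and then resumes execution where it left off. A sketch in Python (assuming a Unix-like system):

```python
import os
import signal

events = []

# Register a handler, much like a driver registers an interrupt handler.
def handler(signum, frame):
    events.append(signum)

signal.signal(signal.SIGUSR1, handler)

# Deliver the signal to ourselves: the kernel pauses normal execution,
# runs the handler, and then resumes the program right here.
os.kill(os.getpid(), signal.SIGUSR1)
```

The handler ran "in between" two lines of our program without us ever calling it directly, which is precisely how interrupt handlers feel from the CPU's perspective.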
Hopefully this writeup has given you a rough idea of what operating systems are supposed to do and what their main components are. Clearly, we glossed over many intricate details, but I hope that a strong, high-level understanding will help you stay grounded in your further exploration into the specifics, like how different operating systems implement these concepts, the various optimizations and trade-offs they make, and how they adapt to different hardware architectures and use cases.