Skip to content

Royalty-Codes/howk_shell

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HOWK SHELL: Complete Architecture Guide

Table of Contents

  1. Foundations
  2. Architecture Overview
  3. Advanced I/O & Execution
  4. Data Structures & Job Design
  5. Concurrency & Synchronization
  6. File Organization

Foundations

What is a Shell?

A shell is an operating system abstraction layer that sits between the user and the kernel. It provides:

  1. Command parsing: Reading user text ("ls -la") and understanding what program to run
  2. Process creation: Creating new processes (fork/exec)
  3. Job control: Managing background/foreground jobs, stopping/resuming
  4. I/O redirection: Connecting stdin/stdout to files or pipes
  5. Environment management: Variables, working directory, command history

When you type a command, the shell:

User input
    ↓
Parse into program name + arguments
    ↓
Lookup if it's a builtin or external program
    ↓
If builtin: run in shell process (affects shell state)
If external: fork() child process, exec() the program, wait() for exit
    ↓
Print results to stdout
    ↓
Back to command prompt

Processes and fork/exec/wait

Understanding Processes

A process is a running instance of a program. Every process has:

  • PID: Process ID (unique identifier assigned by kernel)
  • Memory: Own heap, stack, variables
  • File descriptors: Open files, pipes, sockets
  • Working directory: Current directory ("pwd")
  • Environment: Variables like PATH, HOME, USER

The fork() System Call

pid_t child_pid = fork();

fork() does something remarkable: it creates a COMPLETE COPY of the current process.

Before fork():
    Parent process [PID 1000]

After fork():
    Parent process [PID 1000]  
    Child process  [PID 1001]  (exact copy of parent)

Both processes continue executing from the exact same line of code, but fork() returns:

  • In parent: child's PID (e.g., 1001)
  • In child: 0
pid_t pid = fork();
if (pid > 0) {
    // This code runs in PARENT (pid is the child's PID)
    printf("I am parent, created child %d\n", pid);
} else if (pid == 0) {
    // This code runs in CHILD (pid is 0)
    printf("I am child\n");
} else {
    // pid < 0 means fork() failed
    perror("fork failed");
}

Critical insight: Parent and child share NO memory after fork(). Changes in the child don't affect the parent.

This is why cd must be a builtin:

pid_t pid = fork();
if (pid == 0) {
    // CHILD PROCESS
    chdir("/tmp");          // Child changes directory
    exit(0);                // Child exits
}
// PARENT PROCESS continues here
// pwd still shows original directory! (child's chdir was ignored)

The execvp() System Call

execvp("ls", {"ls", "-la", NULL});

execvp() REPLACES the current process with a new program.

Before execvp():
    Current process memory: [shell code, variables]

execvp("ls", ...):
    1. Load "/bin/ls" into memory (replace all code)
    2. Set up argv (command-line arguments)
    3. Never returns (old code is gone)

After execvp():
    Current process memory: [ls code, variables]

Critical: execvp() doesn't return. If it fails, your code continues (and that's an error).

pid_t pid = fork();
if (pid == 0) {
    // CHILD PROCESS
    execvp("ls", argv);     // Becomes ls, never returns
    perror("execvp");       // Only runs if execvp FAILED
    exit(127);              // Error exit code
}
// PARENT continues here

The waitpid() System Call

int status;
waitpid(child_pid, &status, 0);

waitpid() blocks the parent until the child exits.

Parent: ---[fork]---[waitpid blocks]---[child exits]---[continue]---
Child:        [exec ls]--------[ls runs]--------[exit]

Flags matter:

  • waitpid(pid, &status, 0): Block until child exits (foreground behavior)
  • waitpid(-1, &status, WNOHANG): Check for exited children WITHOUT blocking
  • waitpid(pid, &status, WUNTRACED): Also return if child is stopped

Why Zombies Exist

If a child exits but the parent never calls waitpid(), the child becomes a zombie process:

  • Child's resources are freed (memory, open files)
  • Child's exit status remains in the kernel's process table
  • Process table entry cannot be freed until parent waitpid()s
  • If parent never waitpid()s, zombie persists (even after parent exits)

The HOWK shell solution: Call reap_background_processes() in the main loop.

while (1) {
    reap_background_processes();  // Clean up finished background jobs
    // ... read and execute commands ...
}

Using WNOHANG ensures we don't block the shell:

while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
    // pid is a child that exited, reap it
}
// Returns immediately if no children exited

File Descriptors Deep Dive

A file descriptor is an integer handle to an open file/stream. The kernel maintains a file descriptor table per process:

FD Table (in kernel, per process):
    0 -> STDIN_FILENO   (keyboard/input)
    1 -> STDOUT_FILENO  (terminal/output)
    2 -> STDERR_FILENO  (error output)
    3 -> [your files]
    4 -> [pipes]
    5 -> [sockets]

Open a File

int fd = open("myfile.txt", O_RDONLY);  // Returns 3 (next available FD)

After this, fd = 3 and FD table entry 3 points to myfile.txt's kernel file structure.

Duplicate a File Descriptor

int saved_stdout = dup(1);  // Create another reference to stdout

Now both FD 1 and saved_stdout point to the same kernel file structure.

Redirect Using dup2()

int fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC);
dup2(fd, 1);  // Make FD 1 (stdout) point to output.txt

After this:

  • All printf(), cout, writes to stdout go to output.txt
  • FD table entry 1 now points to output.txt
  • Original stdout is still referenced by the kernel internally, but our process can't access it

This is how redirection works:

Original: printf() → stdout (FD 1) → terminal
After dup2(): printf() → stdout (FD 1) → output.txt

Architecture Overview

High-Level Execution Flow

┌─────────────────────────────────────────────────────┐
│                  HOWK SHELL MAIN LOOP                │
│                                                      │
│  while (1) {                                         │
│      reap_background_processes()  (WNOHANG)         │
│      read_command_line()          (from user)       │
│      parse_input()                (tokenize)        │
│      inspect_command()            (builtin check)   │
│                                                      │
│      if (BUILTIN):                                  │
│          execute_builtin()        (in parent)       │
│      else if (BACKGROUND):                          │
│          submit_to_threadpool()   (or fork+return)  │
│      else:                                          │
│          execute_process()        (wait for exit)   │
│  }                                                   │
└─────────────────────────────────────────────────────┘

Component Interactions

shell.c
├─ parser.c (parse input)
├─ commands.c (classify: builtin or external)
├─ builtin_executor.c (run builtins with FD redirect)
├─ process_executor.c (fork/exec external programs)
├─ job_tracker.c (track background jobs)
├─ threadpool.c (worker threads for parallel jobs)
│   └─ queue.c (priority-sorted job queue)
└─ line_editor.c (read input with history)

File Descriptor Layout in Pipes

When you type: cat file.txt | grep "pattern" | wc -l

The shell creates pipes and connects them:

cat file.txt
    │
    │ (pipe 0)
    ↓
grep "pattern"
    │
    │ (pipe 1)
    ↓
wc -l
    │
    ↓
stdout (terminal)

In HOWK:

pipe(pipes[0]);      // Create pipe 0: [read_end, write_end]
pipe(pipes[1]);      // Create pipe 1: [read_end, write_end]

// Child 0 (cat)
fork();
if (child) {
    dup2(pipes[0][1], STDOUT_FILENO);  // stdout → pipe 0 write
    execvp("cat", argv);
}

// Child 1 (grep)
fork();
if (child) {
    dup2(pipes[0][0], STDIN_FILENO);   // stdin ← pipe 0 read
    dup2(pipes[1][1], STDOUT_FILENO);  // stdout → pipe 1 write
    execvp("grep", argv);
}

// Child 2 (wc)
fork();
if (child) {
    dup2(pipes[1][0], STDIN_FILENO);   // stdin ← pipe 1 read
    execvp("wc", argv);
}

// Parent: close all pipes and wait
close_all_pipes();
waitpid(child0); waitpid(child1); waitpid(child2);

Advanced I/O & Execution

Pipes at the Kernel Level

When you create a pipe with pipe(fds):

int fds[2];
pipe(fds);
// fds[0] = read end
// fds[1] = write end

The kernel creates a circular buffer in memory:

Kernel memory:
    ┌──────────────────────────────────┐
    │   PIPE BUFFER (e.g., 64KB)       │
    │                                  │
    │  write_ptr → [data] ← read_ptr   │
    └──────────────────────────────────┘

Write behavior:

  • write(fds[1], data) appends to the pipe buffer
  • If buffer is full, write blocks

Read behavior:

  • read(fds[0], buffer) removes from the pipe buffer
  • If buffer is empty, read blocks

Closure:

  • If all write ends are closed and buffer is empty, read returns 0 (EOF)
  • If read end is closed, write raises SIGPIPE (broken pipe signal)

This is why close_all_pipes() is critical:

// If child 1 (grep) keeps the write end of pipe 0 open,
// child 2 (wc) will wait forever for EOF
// Solution: child 1 and 2 must close all unused pipe ends

Signals and SIGCHLD

When a child process exits, the kernel sends SIGCHLD to the parent:

Child: [runs]...[exits]
        └─→ Kernel: Child exited!
            └─→ Parent: Receives SIGCHLD

The parent can:

  1. Ignore the signal (not recommended, causes zombies)
  2. Install a signal handler to reap children
  3. Poll with waitpid(WNOHANG) in main loop (HOWK uses this)

HOWK uses polling in the main loop:

while (1) {
    reap_background_processes();  // Clean up exited children
    // ...
}

This is simpler than signal handlers and avoids race conditions.

Redirection Logic

Input redirection (<):

int fd = open("input.txt", O_RDONLY);
dup2(fd, 0);  // stdin now points to input.txt

Output redirection (>):

int fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
dup2(fd, 1);  // stdout now points to output.txt

Append redirection (>>):

int fd = open("output.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
dup2(fd, 1);  // stdout now points to output.txt (append mode)

Critical for builtins: Since builtins run in the shell process, we MUST:

  1. Save original FD
  2. Redirect
  3. Run builtin
  4. Restore original FD

Without restoring, all subsequent output goes to the redirected file!


Data Structures & Job Design

Job Abstraction

A job is a user-facing concept: a command submitted by the user.

HOWK jobs have three states:

  1. onTop (Foreground): User is waiting for it to complete

    • Parent blocks in waitpid()
    • User sees command progress/output
  2. BG (Background): Job runs while shell accepts more commands

    • Parent returns immediately
    • Job's output goes to stdout (can be noisy!)
    • Job tracked in job_tracker
  3. STOPPED: Job was running but received SIGSTOP/SIGTSTP

    • Can be resumed with onTop or BG
    • Can be killed with kill

Job Tracker Design

HOWK uses a simple array-based tracker:

#define MAX_JOBS 100
static TrackedJob jobs[MAX_JOBS];

// A tracked job:
typedef struct {
    int job_id;           // User-facing ID (1, 2, 3, ...)
    pid_t pid;            // Process ID
    char command[256];    // Command string for display
    JobState state;       // BG or STOPPED
} TrackedJob;

Design choice: Fixed-size array instead of dynamic list.

Advantages:

  • No malloc/free in hot path
  • Bounded memory usage
  • Predictable performance
  • Simple bounds checking

Disadvantages:

  • Limited to 100 concurrent jobs
  • Unused slots are wasted space

Priority Queue for Job Scheduling

HOWK threads execute jobs from a priority queue:

┌─────────────────────────────────┐
│   PRIORITY QUEUE (Linked List)   │
│                                  │
│ head → [HIGH, cmd1]              │
│        [HIGH, cmd2]              │
│        [NORMAL, cmd3]            │
│        [NORMAL, cmd4]            │
│        [LOW, cmd5]               │
│        → NULL                     │
└─────────────────────────────────┘

Insertion: O(n) scan to find correct position (maintain sorted order)

Dequeue: Return highest-priority job that fits available threads

if (current->job.requested_threads <= available_threads) {
    // This job can run! Dequeue it
}

Fairness: FIFO within same priority. Jobs with same priority execute in order received.

Why not just "highest priority first"?

  • Risk of starvation: if high-priority jobs keep arriving, low-priority never run
  • FIFO within priority ensures low-priority jobs eventually run

Thread Slots

Each job can request multiple "slots" (threads):

Total threads = 8
Available = 8

Job1 requests 4 slots → Available = 4
Job2 requests 2 slots → Available = 2
Job3 requests 4 slots → BLOCKED (need 4, have 2)

Job1 finishes → Available = 6
Job3 can now run → Available = 2

This allows:

  • Multiple independent jobs in parallel
  • Single heavy job claiming multiple workers
  • Fine-grained resource control

Concurrency & Synchronization

Thread Pool Architecture

┌─────────────────────────────────────────────────┐
│        HOWK THREADPOOL (Fixed N workers)         │
├─────────────────────────────────────────────────┤
│                                                  │
│  Worker 1  ─┐                  ┌─ [Available=2] │
│  Worker 2  ─┼─[READY]─→ [QUEUE]─┤               │
│  Worker 3  ─┼─[SLEEP]           ├─ [Priority]   │
│  Worker 4  ─┤                  ├─ [Slots]       │
│             │[EXECUTING]          │              │
│             └──────────────────────┘             │
│                                                  │
└─────────────────────────────────────────────────┘

Worker Loop:

while (1) {
    lock();
    while (!shutdown && !dequeue_job()) {
        cond_wait();        // Sleep if no work
    }
    if (shutdown) break;
    
    available_threads -= slots;
    unlock();
    
    execute_job();          // Outside lock
    
    lock();
    available_threads += slots;
    cond_broadcast();       // Wake others
    unlock();
}

Producer-Consumer Pattern

HOWK implements the classic producer-consumer pattern:

┌──────────────────────────────────────────┐
│         SHELL (Main Thread) [PRODUCER]    │
│                                          │
│  submit_to_threadpool()                  │
│  ├─ lock()                               │
│  ├─ enqueue_job()                        │
│  ├─ cond_broadcast()    [WAKE WORKERS]   │
│  └─ unlock()                             │
└──────────────────────────────────────────┘
        ↓
┌──────────────────────────────────────────┐
│         PRIORITY QUEUE (Shared State)     │
│         [Protected by pool_mutex]         │
└──────────────────────────────────────────┘
        ↑
┌──────────────────────────────────────────┐
│      WORKER THREADS N [CONSUMERS]         │
│                                          │
│  worker_loop()                           │
│  ├─ lock()                               │
│  ├─ dequeue_job()                        │
│  ├─ (job not available? cond_wait())     │
│  ├─ unlock()                             │
│  ├─ execute_job()                        │
│  └─ (repeat)                             │
└──────────────────────────────────────────┘

Mutexes: Protecting Shared State

A mutex (mutual exclusion lock) ensures only one thread accesses shared data at a time:

pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;

// Critical section
pthread_mutex_lock(&pool_mutex);
available_threads -= slots;  // Only one thread at a time
pthread_mutex_unlock(&pool_mutex);

Why needed?

Without mutex:

Thread A:  read available_threads (= 8)
Thread B:  read available_threads (= 8)    [RACE CONDITION!]
Thread A:  write available_threads = 6
Thread B:  write available_threads = 6     [Lost update! Should be 4]

With mutex:

Thread A:  lock(), read (8), write (6), unlock()
Thread B:  lock() [blocks], read (6), write (4), unlock()

Condition Variables: Efficient Waiting

A condition variable lets threads sleep efficiently instead of polling:

// WITHOUT condition variable (busy-wait):
while (queue_is_empty()) {
    sleep(1);    // Wasteful: sleep even if job arrives immediately
}

// WITH condition variable:
while (queue_is_empty()) {
    pthread_cond_wait(&cond, &mutex);  // Sleep until signaled
}

Critical semantics:

pthread_cond_wait(&cond, &mutex):
    1. Unlock mutex
    2. Sleep (atomically, no race window)
    3. When signaled, re-acquire mutex
    4. Return

// IMPORTANT: Recheck condition after returning from wait!
while (!shutdown && !dequeue_job()) {
    pthread_cond_wait(&cond, &mutex);
}
// Loop ensures we handle spurious wakeups

Spurious Wakeups: Some systems wake threads without a broadcast (spurious wakeup). The while loop handles this:

  • If woken but condition still false, loop again and wait
  • Safe and correct

Race Condition Example

Without proper locking:

Main thread:          Worker thread:
submit_to_threadpool():
  enqueue_job()       worker_loop():
  [NEW QUEUE]          [Read queue - might miss new job!]
  cond_broadcast()     [Sleep on condition]

With proper locking:

Main thread:          Worker thread:
  lock()               lock() [BLOCKED]
  enqueue_job()
  cond_broadcast()
  unlock() ─────→     [ACQUIRED]
                       dequeue_job() [Sees new job!]
                       unlock()

File Organization

Header Files: Public Interfaces

include/shell/parser.h

Defines ParsedCommand struct and parse_input() function.

typedef struct {
    char raw[1024];              // Original input
    char buffer[4096];           // Working buffer
    char name[128];              // Program name
    char *args[MAX_ARGS];        // argv for execvp
    int argcount;                // Number of args
    int is_background;           // & flag
    int requested_threads;       // Threadpool slots
    int priority;                // Job priority
    char *input_file;            // < file
    char *output_file;           // > or >> file
    int append_output;           // >> vs >
    SimpleCommand pipeline[MAX_PIPE_COMMANDS];
    int pipeline_count;
} ParsedCommand;

include/shell/commands.h

Defines builtin command IDs and CommandRequest struct.

typedef enum {
    COMMAND_EXTERNAL = 0,
    COMMAND_BUILTIN
} CommandType;

typedef enum {
    BUILTIN_NONE = 0,
    BUILTIN_EXIT,
    BUILTIN_HELP,
    BUILTIN_HISTORY,
    BUILTIN_JOB_LIST,  // j*b
    BUILTIN_ONTOP,     // onTop <job_id>
    BUILTIN_BG,        // BG <job_id>
    BUILTIN_TPOOL      // tpool <threads>
} BuiltinId;

include/jobs/job_tracker.h

Job tracking registry.

include/jobs/queue.h

Priority queue for threadpool jobs.

typedef enum {
    PRIORITY_HIGH = 0,
    PRIORITY_NORMAL = 1,
    PRIORITY_LOW = 2
} JobPriority;

include/jobs/threadpool.h

Thread pool manager.

Source Files: Implementation

shell.c

Main event loop (REPL: Read, Evaluate, Print, Loop).

int main() {
    while (1) {
        reap_background_processes();      // WNOHANG: don't block
        read_command_line(prompt, ...);   // Get user input
        parse_input(...);                 // Tokenize
        inspect_command(...);             // Builtin or external?
        if (BUILTIN)
            execute_builtin(...);
        else
            execute_process(...);
    }
}

parser.c

Tokenizes raw input into ParsedCommand.

  • trim_spaces(): Remove leading/trailing whitespace
  • normalize_operators(): Add spaces around |, <, >
  • read_word(): Extract one token (handling quotes)
  • parse_input(): Main parser

commands.c

Registry of builtins.

  • Looks up command name in a table
  • Returns CommandRequest with metadata

builtin_executor.c

Executes commands that must run in parent shell process.

  • redirect_builtin_io(): Save original FDs, redirect
  • execute_tpool(): Initialize threadpool
  • execute_ontop(): Bring background job to foreground
  • execute_bg(): Resume stopped job
  • execute_job_list(): Print j*b
  • restore_builtin_io(): Restore original FDs

process_executor.c

Executes external programs.

  • fork(): Create child
  • execvp(): Replace child with program
  • waitpid(): Wait for child (or WNOHANG for background)
  • pipe(): Create pipes between commands
  • dup2(): Connect FDs for pipes and redirection

job_tracker.c

Registry of background jobs.

  • Array-based: static TrackedJob jobs[MAX_JOBS]
  • add_job(): Register new job
  • remove_job_by_pid(): Unregister finished job
  • get_job_by_id(): Lookup job by ID
  • print_jobs(): List all active jobs (j*b)

queue.c

Priority queue for threadpool jobs.

  • Linked list implementation
  • enqueue_job(): Insert in priority order, FIFO within priority
  • dequeue_job(): Remove highest-priority job that fits available threads
  • is_queue_empty(): Simple check

threadpool.c

Worker thread pool.

  • init_threadpool(): Create N worker threads
  • submit_to_threadpool(): Add job to queue, broadcast workers
  • worker_loop(): Each worker's main function
  • shutdown_threadpool(): Gracefully stop all workers

line_editor.c

Read user input with history and arrow key support.

  • Raw terminal mode (ICANON disabled)
  • ESC [ A/B for up/down history
  • Terminal state save/restore

Key Takeaways

Processes & Forking

  • fork() creates a complete process copy
  • execvp() replaces process memory with new program
  • waitpid() blocks until child exits
  • Use WNOHANG to poll without blocking

File Descriptors

  • 0 = stdin, 1 = stdout, 2 = stderr
  • open() creates a new FD
  • dup() copies an FD
  • dup2() redirects (e.g., stdout → file)
  • Close all unused FD's in pipes (critical!)

Shells & Builtins

  • Builtin commands run in parent shell (affect shell state)
  • External commands run in child processes
  • Pipes connect stdout of one to stdin of next
  • Redirection uses dup2() to rewire FDs

Job Control

  • Track background jobs in a table
  • Use WNOHANG to reap finished children without blocking
  • Bring jobs to foreground with waitpid(WUNTRACED)
  • Send SIGCONT to resume stopped jobs

Threading & Synchronization

  • Mutex: Protect shared data (available_threads, queue)
  • Condition variable: Wake threads efficiently (not busy-wait)
  • Always use while with condition variables (handle spurious wakeups)
  • Release locks before long operations (job execution)

Building and Running

cd d:\Bunker\BaseCamp\howk_shell
make clean
make
./howk

Example Session

HOWK -> ($) ls
[listing files]

HOWK -> ($) sleep 100 BG
[Threadpool] Submitting job to threadpool...
[1] Background job spawned...

HOWK -> ($) j*b
--- HOWK ACTIVE JOBS ---
  [1] Running (BG)  PID: 1234   CMD: sleep 100 BG
------------------------

HOWK -> ($) onTop 1
Bringing Job [1] (sleep 100 BG) onTop...
[waits until process finishes]
[Background Job Completed: PID 1234]

HOWK -> ($) exit

HOWK Shell Architecture Document
Comprehensive explanation of processes, fork/exec/wait, file descriptors, job control, pipes, signals, data structures, queues, scheduling, threads, mutexes, and condition variables.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors