Microkernel Based Systems

23.02.2023 - 10:56

Kernel Level

Interfacing with Hardware

undefined behavior is alright in output
- tell the users not to depend on the last bits of the output
not alright in input
- you can define different behaviors based on the last bit

For kernel can

program a part of memory for its use
it’s basically just another application

Linking and loading

You build the kernel outside of a standard linux environment

no dynamic liking
no RTTI
no stdlib

Thread Switch

userlevel → kernellevel
save the registers in TCB
- to switch back restore those registers

IPC

To send messages between threads you don’t save and restore those register. The receiving end will declare beforehand to the kernel to be willing to this and ready to have its registers overwritten.

to avoid priority inversions if high priority T call and wait on low priority T the latter get a priority boost to not let the high one hanging

In Multiprocessor System

ways of interaction
- shared memory
- inter-processor interrupts
- direct memory access
  - usually not available

Different interfaces are possible.

migrating threads
- a caller thread could give its resources to the callee, which doesn’t even need the data structure itself.
asynchronous help
- caller thread waits or gets back to the callee when he needs the product of the call

Option 2 is local communication partners

pool of workers, with a middle thread deciding to whom to dispatch
3-way communication
- inspect
- forward
portals, more efficient way to remove the middleman
- hides away information to the caller, it doesn’t need to know which CPU it runs on
- receivers wait at the portal
- queue sender if no receiver is waiting
- queue receivers if no message is pending
- policy? the queue adds some problems
  - round robin
  - LIFO
  - queue

MESI cache lines

modified
exclusive
shared
invalid

Evict data into L2

Costs

most costly are sysenter and sysexit
can we avoid them?
LIPC
- tells the kernel to switch context
- when preemption is needed lazily switch kernel context

Memory management

Loader
- Construct the address space for applications.
Copy on write
- create Data memory as read only and replace with private writable copy on the first write
filesystem
anon
- provides resources
HDD driver

Cache Coloring

Index (middle part of the address) as a color indicating what row of cache to write (cache set). This way other processes cannot interfere with other colors.

This can be done at page level too by coloring in regard to the page offset.

Memory for the kernel itself

prevent denial of service through memory exhaustion
- quotas

Segments

Protezione dei segmenti fatta dalla MPU. Mapping of data and code with permissions.

Page table

Multi-level, basically a tree data-structure with 512 entry per-root for example. Traverse the data structure using the 2 indexes and the offset

Interface

cannot have MM directly map virtual-physical memory
- for access to kernel memory
- also this would not abstract hardware architecture

A good interface that resolves the problem of having different FM that can overlap and conflict is one that just gives directly the PT entry to the receiver of the memory. This is a map operation, the contract for this is that the MM can revoke the mapping at any time. When the system is running out of memory and needs it to give to other processes. For pages getting revoked a FM could pin a page that is important will not change in the moment (maybe during a DMA)¹.

The concept of mapping was known before as capability

pointer + access rights

Mapping is basically copying capability into a destination with reduced access rights.

diminish
grant (L4 map)
revoke (L4 unmap)

This is more general in the sense that allows having capabilities toward all kind of objects, like threads. Revocation has to resolve even indirect mappings, if a lower level removes access rights all higher levels need to have them removed. This constructs a key of dependencies that need to be traversed to access all the page tables entries to change the access rights. It is possible for an attacker to create a infinite tree, there is no good way to bound such a revoke operation. You could bound the tree depth, this creates problems for developing applications around that. The main problem remains that to start and end with a consistent system you need the operation to be preemptive.

On a forward pass through the tree you can remove the privileges so that children do not start growing the tree while the system is collapsing it from the other children. You can also use a lock in the root to indicate that the operation is still ongoing.

Demand Paging

The old linux interface just forked and then cleared the old address space. You can also create a clean address space and a thread inside it, then this will page fault transitioning to the kernel via IPC to the pager asking for the page of its code. Then the exception will be handled and then the thread resumes its execution.

Region Manager

In user level, so an application would IPC this RM to ask for pages and then this one would manipulate it and forward to the pager what it needs. Then the pager responds directly to the application.

Writing a syscall

the kernel crashes if the root is invoked
at the end of the syscall you have to continue the execution to the user code
- get out of the kernel and go to the address space of the user code
- on the binary (hypervisor.o) you can see that user stack is located on address 0x1000
- the user code is at address 0x2000 SEL_USER_CODE
- the mapping is done to the pagetable
- basically prepare the stack with the values needed and invoke iret
in usercode:
1. fault immediately {there is a opcode for undefined behavior}

The syshandler receives the codes of the syscalls and has to switch over the different possibilities and invoke the correct one. NB the kernel mode is always entered with a clean stack

Yield

continuation
thread + stack
- have a EC object with
  - stack
  - instruction pointer
    - could have that as a return address left on the stack before switching to jump in and continue
  - stack pointer
- have a switch function making current the other EC

Scheduling

real-time vs best effort
time as a resource

Best effort:

focus on common case behavior
as many tasks as possible
good responsiveness, no hard done if late

Real-time:

guarantees on timeliness
- hard, any deadline miss is consider catastrophic
  - easier to build
- soft
  - only probabilistic guarantees
- firm
  - out of a number of deadlines a number of them are allowed to miss
focus on worst case behavior
being late has negative effects
- the bound is defined by the environment
task, work to be scheduled
- periodic, instances reoccur every period \(T\)
- sporadic, instance reoccur no earlier than period \(T\)
job, concrete instance of a task
- these are usually implemented through a thread
time triggered
event triggered, fixed / dynamic priority
partitioned, assign to CPU and schedule locally
global, scheduling decides both when and where

The mixed criticality is controversial.

the failure probability events have to be independent, so they are multiplied and they quickly become ignorable
if they aren’t they are a problem
fault tree analysis

in Microkernel Based Systems

Different ideas:

Brian Ford - CPU Inheritance Scheduling
- event → mk → root scheduler → particular scheduler
- pro
  - decisions at user level
- cons
  - very high scheduling overhead because of all the IPC
  - need to conceive a user-level protocol to tell root when to switch to different user schedulers

Critical Sections and Resources

priority inheritance protocol
- give your own time to help the currently scheduled with the resources
ceiling priority protocol

User Level

Paper Reviews

idea → implementation and experiments → paper → review gets written → accept / reject / revise
purpose is to score and decide the most relevant papers to slot in conferences
- ideally you don’t show reviews to authors and tell them what to fix to get the paper accepted

Structure of a Review

summary, showing the author you understood the paper
points in favor and against, summary of the review
specific questions, classify the paper, ask for confidence
details, description of what is right, wrong, questionable

As a reviewer:

the weakness is in the paper not the authors
give clear instructions on how to improve

As an author:

respect the reviewer
- font size
- spell checking
- caution with symbols and abbreviations

Structure:

score
confidence in correctness
summary
points in favor
points against
details

Program Verification

Source: The Semantics of a Programming Language

syntax
semantics
- easy for booleans, harder for integers
- we can represent a subset
  - cant use modulo math in the definition of semantics because you lose the capability to recognize integer overflow
program logic
- sequential, assignment, consequence, etc.

Simple C: Imp

Denotational semantics
Operational semantics
- small and big step

Parallel systems verification with rely & guarantee as preconditions and postconditions. Considering interleaving between programs.

the different programs rely on others not to interfere and guarantee not to interfere

Memory Model

separation logic
underspecified semantics
- need at least 1 bit extra for encoding bytes, encode a dirty bit
plain memory
- memory where all is going well
- verify a property at least in a part of memory
- set of memory is blessed

Processor Architecture

not part of exam

basic knowledge of processor design
why accelerators are fast
transistor are created by doping (via aluminium and phosphor) the silicon
- create a electrical imbalance in the atoms
- before doping you mask the regions

Field effect transistor:

n region electrons go to the p regions after you apply an electrical charge
applying source and drain works creating an isolating layer and a gate on top
field effect from the changed gate make the p expand and get closer to the n regions getting to the point where the current can frow through
the transistor works like a digital switch
there is a margin of intermediate states which will cause problems
there is a propagation delay from input change to output charge

You use this by designing CMOS, having your functionality and then disabling any current after towards the GRD

there will be electrons leaking to the GND and energy dissipation through heat
the circuit is concluded with an inverter circuit

To have memory you need to connect the combinational logic device with memory devices capturing new states and storing current state

DLatch stores between two inverters for exactly 1 cycle
by combining the DLatch with the clock signal and the input and a second DLatch to always have a stable signal, this creates a register
it is costly in regards to the number of transistors, it is also the fastest
putting memory in between the logical devices allows to clock the system faster, this technique is called pipelining

Pipelining is not that easy because of dependencies between instructions

read after write hazard
write after write
write after read

To resolve these hazard you can introduce stalling in the pipeline if instruction requires a resource in use

control flow hazard are left to programmer and compiler to resolve
self modifying code with pipeline flushing

These RAW are so common that ALU has a bypass to itself to have the result back in the input in a single cycle, same thing for read from memory to have the result next cycle

For cache coherence there are protocols like MESI The burden for resolving dependencies in instruction can be on the user:

very long instruction word anchitecture (VLIW)
Itanium
bundle instruction to be fetched together by the CPU

Different instruction need different amount of cycles to complete

extend the pipeline to allow different instruction to complete at different points in time and optimize the latency

Another way to resolve data hazards the processor can choose different registers from that specified by the programmer

the only ensurance is that the state is what is specified by the time it is checked
register renaming, the names are only needed by the programmer/compiler
having a rearder buffer keeping track of the instruction coming in and recording their results the CPU can use them with speculative execution to cheat when exceptions happens and it needs to report to the OS the actual registers used
- Spectre and Meltdown used speculation, first with information leakage as a side effect and second as lazy privilege checks in tandem with speculation

We try to fetch as many instructions as possible but still have to deal with branch prediction

predict whether the branch is taken or not, statically and dynamically
predict the location of the branch
predict for special constructs
- predict backward jumps as not likely
- predict forward jumps as likely
- doing this predictions are correct most of the time

At this point you can get faster by parallelism

more CPU
symmetric multiprocessing, one CPU and thread selecting with pipelining
- in this case the probability of dependencies is much lower

Or faster with accelerators

so critical functionality gets dedicated hardware
matrix multiplication

Unlimited ?? Pinning smtsmt ↩︎

Dan's Brain