Microkernel Based Systems
Kernel Level
Interfacing with Hardware
- undefined behavior is alright in output
- tell the users not to depend on the last bits of the output
- not alright in input
- you can define different behaviors based on the last bit
For kernel can
- program a part of memory for its use
- it’s basically just another application
Linking and loading
You build the kernel outside of a standard linux environment
- no dynamic liking
- no RTTI
- no stdlib
Thread Switch
- userlevel → kernellevel
- save the registers in
TCB
- to switch back restore those registers
IPC
To send messages between threads you don’t save and restore those register. The receiving end will declare beforehand to the kernel to be willing to this and ready to have its registers overwritten.
- to avoid priority inversions if high priority
T
call and wait on low priorityT
the latter get a priority boost to not let the high one hanging
In Multiprocessor System
- ways of interaction
- shared memory
- inter-processor interrupts
- direct memory access
- usually not available
Different interfaces are possible.
- migrating threads
- a caller thread could give its resources to the callee, which doesn’t even need the data structure itself.
- asynchronous help
- caller thread waits or gets back to the callee when he needs the product of the call
Option 2 is local communication partners
- pool of workers, with a middle thread deciding to whom to dispatch
- 3-way communication
inspect
forward
- portals, more efficient way to remove the middleman
- hides away information to the caller, it doesn’t need to know which CPU it runs on
- receivers wait at the portal
- queue sender if no receiver is waiting
- queue receivers if no message is pending
- policy? the queue adds some problems
- round robin
- LIFO
- queue
MESI
cache lines
- modified
- exclusive
- shared
- invalid
Evict data into L2
Costs
- most costly are
sysenter
andsysexit
- can we avoid them?
LIPC
- tells the kernel to switch context
- when preemption is needed lazily switch kernel context
Memory management
- Loader
- Construct the address space for applications.
- Copy on write
- create Data memory as read only and replace with private writable copy on the first write
- filesystem
- anon
- provides resources
- HDD driver
Cache Coloring
Index (middle part of the address) as a color indicating what row of cache to write (cache set). This way other processes cannot interfere with other colors.
This can be done at page level too by coloring in regard to the page offset.
Memory for the kernel itself
- prevent denial of service through memory exhaustion
- quotas
Segments
Protezione dei segmenti fatta dalla MPU
.
Mapping of data and code with permissions.
Page table
Multi-level, basically a tree data-structure with 512 entry per-root for example. Traverse the data structure using the 2 indexes and the offset
Interface
- cannot have
MM
directly map virtual-physical memory- for access to kernel memory
- also this would not abstract hardware architecture
A good interface that resolves the problem of having different FM
that can overlap and conflict is one that just gives directly the PT
entry to the receiver of the memory.
This is a map
operation, the contract for this is that the MM
can revoke the mapping at any time. When the system is running out of memory and needs it to give to other processes.
For pages getting revoked a FM
could pin a page that is important will not change in the moment (maybe during a DMA
)1.
The concept of mapping was known before as capability
- pointer + access rights
Mapping is basically copying capability into a destination with reduced access rights.
diminish
grant
(L4
map)revoke
(L4
unmap)
This is more general in the sense that allows having capabilities toward all kind of objects, like threads.
Revocation has to resolve even indirect mappings, if a lower level removes access rights all higher levels need to have them removed.
This constructs a key of dependencies that need to be traversed to access all the page tables entries to change the access rights.
It is possible for an attacker to create a infinite tree, there is no good way to bound such a revoke
operation.
You could bound the tree depth, this creates problems for developing applications around that.
The main problem remains that to start and end with a consistent system you need the operation to be preemptive.
On a forward pass through the tree you can remove the privileges so that children do not start growing the tree while the system is collapsing it from the other children. You can also use a lock in the root to indicate that the operation is still ongoing.
Demand Paging
The old linux interface just forked and then cleared the old address space.
You can also create a clean address space and a thread inside it, then this will page fault transitioning to the kernel via IPC
to the pager asking for the page of its code. Then the exception will be handled and then the thread resumes its execution.
Region Manager
In user level, so an application would IPC
this RM
to ask for pages and then this one would manipulate it and forward to the pager what it needs. Then the pager responds directly to the application.
Writing a syscall
- the kernel crashes if the root is invoked
- at the end of the syscall you have to continue the execution to the user code
- get out of the kernel and go to the address space of the user code
- on the binary (hypervisor.o) you can see that user stack is located on address 0x1000
- the user code is at address 0x2000
SEL_USER_CODE
- the mapping is done to the pagetable
- basically prepare the stack with the values needed and invoke
iret
- in
usercode
:- fault immediately {there is a opcode for undefined behavior}
The syshandler receives the codes of the syscalls and has to switch over the different possibilities and invoke the correct one. NB the kernel mode is always entered with a clean stack
Yield
- continuation
- thread + stack
- have a
EC
object with- stack
- instruction pointer
- could have that as a return address left on the stack before switching to jump in and continue
- stack pointer
- have a switch function making current the other
EC
- have a
Scheduling
- real-time vs best effort
- time as a resource
Best effort:
- focus on common case behavior
- as many tasks as possible
- good responsiveness, no hard done if late
Real-time:
- guarantees on timeliness
- hard, any deadline miss is consider catastrophic
- easier to build
- soft
- only probabilistic guarantees
- firm
- out of a number of deadlines a number of them are allowed to miss
- hard, any deadline miss is consider catastrophic
- focus on worst case behavior
- being late has negative effects
- the bound is defined by the environment
- task, work to be scheduled
- periodic, instances reoccur every period \(T\)
- sporadic, instance reoccur no earlier than period \(T\)
- job, concrete instance of a task
- these are usually implemented through a thread
- time triggered
- event triggered, fixed / dynamic priority
- partitioned, assign to CPU and schedule locally
- global, scheduling decides both when and where
The mixed criticality is controversial.
- the failure probability events have to be independent, so they are multiplied and they quickly become ignorable
- if they aren’t they are a problem
- fault tree analysis
in Microkernel Based Systems
Different ideas:
- Brian Ford - CPU Inheritance Scheduling
- event → mk → root scheduler → particular scheduler
- pro
- decisions at user level
- cons
- very high scheduling overhead because of all the
IPC
- need to conceive a user-level protocol to tell root when to switch to different user schedulers
- very high scheduling overhead because of all the
Critical Sections and Resources
- priority inheritance protocol
- give your own time to help the currently scheduled with the resources
- ceiling priority protocol
User Level
Paper Reviews
- idea → implementation and experiments → paper → review gets written → accept / reject / revise
- purpose is to score and decide the most relevant papers to slot in conferences
- ideally you don’t show reviews to authors and tell them what to fix to get the paper accepted
Structure of a Review
- summary, showing the author you understood the paper
- points in favor and against, summary of the review
- specific questions, classify the paper, ask for confidence
- details, description of what is right, wrong, questionable
As a reviewer:
- the weakness is in the paper not the authors
- give clear instructions on how to improve
As an author:
- respect the reviewer
- font size
- spell checking
- caution with symbols and abbreviations
Structure:
- score
- confidence in correctness
- summary
- points in favor
- points against
- details
Program Verification
Source: The Semantics of a Programming Language
- syntax
- semantics
- easy for booleans, harder for integers
- we can represent a subset
- cant use modulo math in the definition of semantics because you lose the capability to recognize integer overflow
- program logic
- sequential, assignment, consequence, etc.
Simple C: Imp
- Denotational semantics
- Operational semantics
- small and big step
Parallel systems verification with rely & guarantee as preconditions and postconditions. Considering interleaving between programs.
- the different programs rely on others not to interfere and guarantee not to interfere
Memory Model
- separation logic
- underspecified semantics
- need at least 1 bit extra for encoding bytes, encode a dirty bit
- plain memory
- memory where all is going well
- verify a property at least in a part of memory
- set of memory is blessed
Processor Architecture
not part of exam
- basic knowledge of processor design
- why accelerators are fast
- transistor are created by doping (via aluminium and phosphor) the silicon
- create a electrical imbalance in the atoms
- before doping you mask the regions
Field effect transistor:
n
region electrons go to thep
regions after you apply an electrical charge- applying source and drain works creating an isolating layer and a gate on top
- field effect from the changed gate make the
p
expand and get closer to then
regions getting to the point where the current can frow through - the transistor works like a digital switch
- there is a margin of intermediate states which will cause problems
- there is a propagation delay from input change to output charge
You use this by designing CMOS
, having your functionality and then disabling any current after towards the GRD
- there will be electrons leaking to the
GND
and energy dissipation through heat - the circuit is concluded with an inverter circuit
To have memory you need to connect the combinational logic device with memory devices capturing new states and storing current state
DLatch
stores between two inverters for exactly 1 cycle- by combining the
DLatch
with the clock signal and the input and a secondDLatch
to always have a stable signal, this creates a register - it is costly in regards to the number of transistors, it is also the fastest
- putting memory in between the logical devices allows to clock the system faster, this technique is called pipelining
Pipelining is not that easy because of dependencies between instructions
- read after write hazard
- write after write
- write after read
To resolve these hazard you can introduce stalling in the pipeline if instruction requires a resource in use
- control flow hazard are left to programmer and compiler to resolve
- self modifying code with pipeline flushing
These RAW
are so common that ALU
has a bypass to itself to have the result back in the input in a single cycle, same thing for read from memory to have the result next cycle
For cache coherence there are protocols like MESI
The burden for resolving dependencies in instruction can be on the user:
- very long instruction word anchitecture (
VLIW
) - Itanium
- bundle instruction to be fetched together by the
CPU
Different instruction need different amount of cycles to complete
- extend the pipeline to allow different instruction to complete at different points in time and optimize the latency
Another way to resolve data hazards the processor can choose different registers from that specified by the programmer
- the only ensurance is that the state is what is specified by the time it is checked
- register renaming, the names are only needed by the programmer/compiler
- having a rearder buffer keeping track of the instruction coming in and recording their results the
CPU
can use them with speculative execution to cheat when exceptions happens and it needs to report to theOS
the actual registers used- Spectre and Meltdown used speculation, first with information leakage as a side effect and second as lazy privilege checks in tandem with speculation
We try to fetch as many instructions as possible but still have to deal with branch prediction
- predict whether the branch is taken or not, statically and dynamically
- predict the location of the branch
- predict for special constructs
- predict backward jumps as not likely
- predict forward jumps as likely
- doing this predictions are correct most of the time
At this point you can get faster by parallelism
- more
CPU
- symmetric multiprocessing, one
CPU
and thread selecting with pipelining- in this case the probability of dependencies is much lower
Or faster with accelerators
- so critical functionality gets dedicated hardware
- matrix multiplication
-
Unlimited ?? Pinning smtsmt ↩︎