Software Vulnerabilities: Exploitation and Mitigation
Principles
Computer Security and Software Development
Software Life Cycle
Confidentiality, Integrity, Availability
Authentication & Access Control
Software Vulnerabilities
Computer Architecture
Memory Attacks and Defenses
Buffer overflow
Can occur in languages with no built-in memory safety, such as C and C++.
A buffer is a data container holding a sequence of elements, basically an array, accessed using an index. If you access it beyond its length, several things can happen:
- nothing happens
  - could happen if the OS allocated more than needed
- segfault
  - reaching a zone not allocated to the program
  - this is detected by the OS and the program is stopped
  - this could corrupt data
- custom code execution
  - by understanding what the overflow overwrites, an attacker can place something in this segment that calls custom code
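A minimal sketch of the missing check, assuming a fixed-size buffer of `BUF_LEN` bytes. The name `copy_checked` and the size are illustrative, not from any library. The function refuses the copy instead of writing past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 8   /* illustrative buffer size */

/* Copies src into dst only if it fits, terminator included.
 * Returns 0 on success and -1 if the copy would overflow dst. */
int copy_checked(char dst[BUF_LEN], const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);
    if (n >= BUF_LEN) {
        return -1;           /* would write past the end of dst */
    }
    memcpy(dst, src, n + 1); /* n + 1 copies the '\0' as well */
    return 0;
}
```

An unchecked `strcpy(dst, src)` would perform the same copy without the length test, which is where the overflow comes from.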
C functions are identified by their memory addresses:
- when a function \(F_{1}\) calls \(F_{2}\), the address to return to in \(F_{1}\) is stored on the stack
- this way \(F_{2}\) knows where to return once done
- local variables of the callee are stored on the stack
An attacker can hijack the control flow if the memory addresses of the stack and of the functions are fixed. This is not the case anymore.
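The overwrite can be modelled without touching a real stack. The sketch below assumes a toy frame layout (a local buffer directly followed by the saved return address); real layouts depend on the compiler and ABI, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a stack frame, not a real ABI layout. */
struct frame {
    char     buf[8];   /* local variable of the callee */
    uint64_t ret;      /* where the callee returns to */
};

/* Models an unchecked copy: writes len bytes starting at buf[0] and
 * spills into ret when len > sizeof buf. len is capped at the frame
 * size so the simulation itself stays inside the struct. */
void unchecked_copy(struct frame *f, const unsigned char *src, size_t len)
{
    assert(f != NULL && len <= sizeof *f);
    memcpy((unsigned char *)f, src, len);
}

/* Builds a payload (filler for buf, then the attacker's address) and
 * returns the address the function would "return" to afterwards. */
uint64_t smash(uint64_t legit_ret, uint64_t attacker_addr)
{
    struct frame f;
    unsigned char payload[sizeof f];

    memset(&f, 0, sizeof f);
    f.ret = legit_ret;

    memset(payload, 'A', sizeof payload);
    memcpy(payload + offsetof(struct frame, ret),
           &attacker_addr, sizeof attacker_addr);

    unchecked_copy(&f, payload, sizeof payload);
    return f.ret;
}
```

`smash(0x401000, 0xdeadbeef)` returns `0xdeadbeef`: the saved return address now holds the attacker's value, which only helps the attacker if that address is predictable.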
Usually the custom code executed is a reverse shell to take control of the computer. The code runs with the privileges of the vulnerable process.
The vulnerability is usually used as a stepping stone to exploit further vulnerabilities.
Approaches to limit exploitation:
- stack canary
- Data Execution Prevention (DEP)
- Address Space Layout Randomization (ASLR)
- Control Flow Integrity (CFI)
  - checks you are coming back from the right addresses
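A stack canary can be sketched on a toy frame: a secret value sits between the buffer and the saved return address and is checked before returning. The layout, names and the `0x401000` address are assumptions for illustration; real canaries are inserted by the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy frame with a canary between the buffer and the return address. */
struct guarded_frame {
    char     buf[8];
    uint64_t canary;
    uint64_t ret;
};

/* Simulates one function activation: set up the frame, copy the input
 * without a bounds check, then verify the canary before "returning".
 * Returns 0 for a normal return and -1 if an overflow was detected. */
int run_guarded(const unsigned char *input, size_t len, uint64_t secret)
{
    struct guarded_frame f;

    assert(input != NULL && len <= sizeof f);
    memset(&f, 0, sizeof f);
    f.canary = secret;
    f.ret = 0x401000;                        /* hypothetical return address */

    memcpy((unsigned char *)&f, input, len); /* unchecked copy into buf */

    return f.canary == secret ? 0 : -1;      /* check before using f.ret */
}
```

An overflow long enough to reach `ret` has to cross `canary` first, so it is detected unless the attacker knows the secret.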
The problem still exists: vulnerabilities in Intel and AMD processors were found as of 2018.
ASLR can still be attacked if there is an information leak in the code giving insight into memory addresses.
- once the mapping is leaked any code block is reusable
Control Flow Integrity
- mechanisms ensuring the control flows executed are the intended ones
- control flow graphs (CFG)
  - one per function
- a call graph is also used
  - one for the entire program
In code, a basic block is a sequence of instructions with only one entry and one exit point.
The control flow can jump in
- a direct way, with the target written in the code, which cannot be overridden
- an indirect way, with the target in a variable in memory or in a register, to be protected with ICFG
ICFG is checked with labels in the code, added by the compiler. This is done by a new instruction carrying an ID, which continues with the call to the address contained in DST only if the ID is the same as the one annotated at the start of the called method. This is needed for indirect calls, so for virtual methods for example.
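The label check can be imitated at the C level. This is only a sketch of the idea, not how a compiler implements it: each possible target carries an ID, and the hypothetical `checked_call` transfers control only when the IDs agree. The IDs and function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Each valid indirect-call target is annotated with an ID. */
struct labeled_fn {
    unsigned id;
    int (*fn)(int);
};

#define ID_MATH  0x1234u   /* illustrative labels */
#define ID_OTHER 0x9999u

static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

static const struct labeled_fn SQUARE = { ID_MATH,  square };
static const struct labeled_fn NEGATE = { ID_OTHER, negate };

/* Models the checked call: jump to dst only if its label equals the ID
 * expected at the call site. Returns 0 and stores the result on
 * success, -1 on a label mismatch. */
int checked_call(unsigned expected_id, const struct labeled_fn *dst,
                 int arg, int *out)
{
    assert(dst != NULL && out != NULL);
    if (dst->id != expected_id) {
        return -1;          /* refuse to transfer control */
    }
    *out = dst->fn(arg);
    return 0;
}
```

A call site expecting `ID_MATH` reaches `SQUARE` but is refused for `NEGATE`, even though both are valid function pointers.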
Another solution is a shadow stack, a protected stack keeping track of the calling trace.
- stores a copy of the return address of each call
- after a ret the processor checks if the return address stored in the normal stack is the same as the one in the shadow stack
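The two bullets above can be sketched as a small software model. In hardware the shadow stack lives in protected memory; here it is an ordinary array, and `SHADOW_DEPTH` and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64   /* illustrative capacity */

struct shadow_stack {
    uint64_t addr[SHADOW_DEPTH];
    size_t   top;
};

/* On call: keep a copy of the return address.
 * Returns 0 on success, -1 if the shadow stack is full. */
int shadow_push(struct shadow_stack *s, uint64_t ret)
{
    assert(s != NULL);
    if (s->top == SHADOW_DEPTH) {
        return -1;
    }
    s->addr[s->top++] = ret;
    return 0;
}

/* On ret: pop the saved copy and compare it with the address found on
 * the normal stack. Returns 1 on a match, 0 on a mismatch or if the
 * shadow stack is empty. */
int shadow_check_ret(struct shadow_stack *s, uint64_t ret_on_stack)
{
    assert(s != NULL);
    if (s->top == 0) {
        return 0;
    }
    return s->addr[--s->top] == ret_on_stack;
}
```

A return address overwritten on the normal stack no longer matches the saved copy, so the check returns 0.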
Compiling with CFI can add up to 20% overhead.
Heap overflow
When data has no fixed size it is stored on the heap. The heap is important because data on it outlives the life cycle of a single function. When chunks on the heap are freed, adjacent free chunks are joined to help with fragmentation.
Metadata of chunks:
- prev_size
- size
  - lowest 3 bits are flags, lowest bit is PREV_INUSE
- data ← pointer returned
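A sketch of how the flags share the size field, assuming the layout listed above. The struct and helper names are illustrative and simplified compared to a real allocator.

```c
#include <assert.h>
#include <stddef.h>

#define PREV_INUSE ((size_t)0x1)  /* lowest bit of size */
#define FLAG_MASK  ((size_t)0x7)  /* lowest 3 bits of size are flags */

/* Simplified chunk header; the user data follows it in memory. */
struct chunk_header {
    size_t prev_size;
    size_t size;       /* chunk size with the flags in the low bits */
};

/* Real size of the chunk, with the flag bits masked out. */
size_t chunk_size(const struct chunk_header *c)
{
    assert(c != NULL);
    return c->size & ~FLAG_MASK;
}

/* 1 if the previous chunk is allocated, 0 if it is free. */
int prev_in_use(const struct chunk_header *c)
{
    assert(c != NULL);
    return (c->size & PREV_INUSE) != 0;
}
```

With `size = 0x21` the chunk is `0x20` bytes long and the previous chunk is in use; clearing the low bit marks the previous chunk as free.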
If a chunk is no longer needed it is marked as unallocated; when this happens the pointers fd and bk are added to the chunk's data section.
- this way, on every free, blocks of free memory are doubly linked
free calls unlink, which updates the size of the chunk and removes the links from/to this new bigger chunk; this is done for performance reasons
- this procedure is used to write a return address with a heap overflow
- consolidate backward/forward
/* Remove chunk P from the doubly linked list of free chunks. */
#define unlink(P, BK, FD) \
{                         \
    BK = P->bk;           \
    FD = P->fd;           \
    FD->bk = BK;          \
    BK->fd = FD;          \
}
By manipulating bk and fd we can write to any place in memory.
Steps:
- overwrite the PREV_INUSE of the next chunk with 0
- overwrite the size of the next chunk with -4 (this on 32-bit systems with 4 B fields)
  - when the checker sees this it will check the next PREV_INUSE, so it checks the same PREV_INUSE with 0, and now it will call unlink to merge the chunks
- overwrite bk with @shellcode
- overwrite fd with @free-12
  - @free is the address of the GOT entry of free; this entry will now be overwritten by unlink to point to the shellcode
- now calls to free execute the shellcode
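The write performed by unlink can be replayed on simulated memory. The sketch assumes the 32-bit layout from the steps (4 B fields, so fd at offset 8 and bk at offset 12) and uses a small word array in place of the process address space. All addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64   /* simulated 32-bit memory, 256 bytes */
#define FD_OFF    8u   /* fd comes after prev_size and size */
#define BK_OFF    12u  /* bk comes after fd */

static uint32_t mem[MEM_WORDS];

static uint32_t load(uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return mem[addr / 4];
}

static void store(uint32_t addr, uint32_t value)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    mem[addr / 4] = value;
}

/* The four assignments of the unlink macro on simulated memory. */
void sim_unlink(uint32_t p)
{
    uint32_t bk = load(p + BK_OFF);
    uint32_t fd = load(p + FD_OFF);
    store(fd + BK_OFF, bk);   /* FD->bk = BK */
    store(bk + FD_OFF, fd);   /* BK->fd = FD */
}

/* Plays the last steps of the attack and returns the new content of
 * the simulated GOT entry of free. */
uint32_t demo_attack(void)
{
    const uint32_t chunk = 0x20, got_free = 0x80, shellcode = 0xA0;

    store(got_free, 0x1111);                   /* pretend address of free */
    store(chunk + FD_OFF, got_free - BK_OFF);  /* fd = @free - 12 */
    store(chunk + BK_OFF, shellcode);          /* bk = @shellcode */
    sim_unlink(chunk);
    return load(got_free);
}
```

`demo_attack()` returns `0xA0`: since bk sits at offset 12, `FD->bk = BK` lands exactly on the GOT entry. The second assignment also writes fd at `shellcode + 8`, so a real payload has to tolerate those bytes being overwritten.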
Integer overflow
String format vulnerabilities
Type confusion
Use After Free
High Level Attacks and Defenses
Injection
Broad class of attack vectors.
- an attacker supplies untrusted input
- input is processed as part of command or query
- this alters the execution
Code injection enables bypass of authorization checks and/or execution of arbitrary code on the server. This way the attacker gets access to a privileged environment and can, for example, dump the database.
The reason for this vulnerability is usually insufficient user input validation and sanitization.
SQL injection (SQLi)
<?php
// Vulnerable on purpose: user input is concatenated straight into the SQL string.
$conn->query("SELECT * FROM users WHERE username = \"$username\" AND password = \"$password\"");
With username: " OR 1 = 1 -- you can bypass the password check via injection. [1]
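To see what the database receives, the same concatenation can be reproduced in C and the resulting string inspected. This is a sketch: `build_query` and `is_plain_username` are hypothetical helpers, and the allowlist is just one possible validation rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Builds the query by plain concatenation, like the PHP snippet.
 * Returns 0 on success, -1 if out is too small. */
int build_query(char *out, size_t cap, const char *user, const char *pass)
{
    assert(out != NULL && user != NULL && pass != NULL);
    int n = snprintf(out, cap,
        "SELECT * FROM users WHERE username = \"%s\" AND password = \"%s\"",
        user, pass);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Allowlist validation: accept only non-empty ASCII letters and digits. */
int is_plain_username(const char *s)
{
    assert(s != NULL);
    if (*s == '\0') {
        return 0;
    }
    for (; *s != '\0'; s++) {
        if (!isalnum((unsigned char)*s)) {
            return 0;
        }
    }
    return 1;
}
```

With the malicious username the built query reads `... WHERE username = "" OR 1 = 1 -- " AND password = "x"`: the condition is always true and the password test is commented out. The allowlist rejects that input; in practice parameterized queries are the usual fix, because they keep data out of the SQL text altogether.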
OS command injection
Cross-Site Scripting (XSS)
Confused deputy
Ask someone to do something in your stead.
Finding Software Vulnerabilities
Static Program Analysis
Analyse only the code, without running it, to find vulnerabilities.
Dynamic Program Analysis
Analyse the execution of the program to find vulnerabilities.
Fuzzing
Widely used in both the hacking and the industry world to find vulnerabilities. Generate a large number of inputs to exercise the program.
- could be clicks, text, network interactions…
The input is totally random, in both size and value.
For analysis you can use an oracle: the program's result for each input is tested against the oracle's response for that input, which is the expected result. With oracles one can detect anomalies other than crashes.
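A small sketch of random testing against an oracle. The function under test (`midpoint_naive`) is a made-up example whose bug never crashes, so only the comparison with the oracle reveals it; the xorshift generator is there just to make the run reproducible.

```c
#include <assert.h>
#include <stdint.h>

/* Function under test (hypothetical): the sum wraps around when it
 * exceeds 32 bits, giving a wrong result without any crash. */
uint32_t midpoint_naive(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) / 2u;
}

/* Oracle: the same computation in 64-bit arithmetic, which cannot wrap. */
uint32_t midpoint_oracle(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) / 2u);
}

/* xorshift32: small deterministic pseudo-random generator. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Feeds random input pairs to both functions and returns how many
 * times the result differs from the oracle. */
unsigned fuzz_with_oracle(unsigned rounds, uint32_t seed)
{
    assert(seed != 0);   /* xorshift stays at 0 forever */
    uint32_t rng = seed;
    unsigned mismatches = 0;

    for (unsigned i = 0; i < rounds; i++) {
        uint32_t a = next_random(&rng);
        uint32_t b = next_random(&rng);
        if (midpoint_naive(a, b) != midpoint_oracle(a, b)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

A crash-only fuzzer would report nothing here; the oracle flags every input pair whose sum overflows.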
First publication on the topic, in 1990: An Empirical Study of the Reliability of UNIX Utilities.
State of the art for C/C++ is AFL, American Fuzzy Lop.
It checks the code coverage of the binary (for example the number of statements covered when running the test case).
AFL mutates the best inputs it finds during testing, maximizing a goal function (coverage).
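The mutate-and-keep-the-best loop can be sketched on a toy target. This is not AFL's algorithm, only the general idea: `coverage` stands in for real coverage instrumentation, and the target, mutation rule and names are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INPUT_LEN 4

/* Toy target with nested branches. Returns how many branches the input
 * reached (0..4), a stand-in for measured code coverage. */
int coverage(const unsigned char in[INPUT_LEN])
{
    int reached = 0;
    if (in[0] == 'F') {
        reached++;
        if (in[1] == 'U') {
            reached++;
            if (in[2] == 'Z') {
                reached++;
                if (in[3] == 'Z') {
                    reached++;
                }
            }
        }
    }
    return reached;
}

/* xorshift32: small deterministic pseudo-random generator. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Mutates one byte of the best input per round and keeps the mutant
 * only if it reaches more branches. Returns the best coverage found;
 * the corresponding input is left in best. */
int fuzz(unsigned rounds, uint32_t seed, unsigned char best[INPUT_LEN])
{
    assert(seed != 0);
    uint32_t rng = seed;
    memset(best, 0, INPUT_LEN);
    int best_cov = coverage(best);

    for (unsigned i = 0; i < rounds; i++) {
        unsigned char cand[INPUT_LEN];
        uint32_t r = next_random(&rng);

        memcpy(cand, best, INPUT_LEN);
        cand[r % INPUT_LEN] = (unsigned char)(r >> 8);  /* mutate one byte */

        int cov = coverage(cand);
        if (cov > best_cov) {                           /* goal function */
            memcpy(best, cand, INPUT_LEN);
            best_cov = cov;
        }
    }
    return best_cov;
}
```

Because each partial match is kept, the search advances one branch at a time instead of having to guess all four bytes at once, which is the advantage of coverage feedback over totally random input.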
Unlike manual analysis, not all code can be covered and the fuzzer might take a long time to find acceptable inputs.
[1] -- denotes a comment in SQL