Software Vulnerabilities: Exploitation and Mitigation

23.02.2023 - 10:58

Principles

Computer Security and Software Development

Software Life Cycle

Confidentiality, Integrity, Availability

Authentication & Access Control

Software Vulnerabilities

Computer Architecture

Memory Attacks and Defenses

Buffer overflow

Can occur in languages with no built-in memory safety.

C, C++

A buffer is a sequential data container, basically an array, accessed using an index. If you access past the end of the buffer, one of the following happens:

nothing happens
- could happen if the OS allocated more than needed
segfault
- reaching a zone not allocated to the program
- this is detected by OS and stopped
- this could corrupt data
custom code execution
- by understanding what the overflow overwrites, an attacker can place data there that redirects execution to custom code
- C functions are identified by their memory addresses
- when a function \(F_{1}\) calls \(F_{2}\), the return address into \(F_{1}\) is stored on the stack
  - this way \(F_{2}\) knows where to return once done
- local variables of the callee are stored on the stack

The attacker can hijack the control flow if the memory addresses of the stack and of the functions are fixed. On modern systems this is no longer the case.

Usually the custom code executed is a reverse shell to take control of the computer. The code runs with the privileges of the vulnerable process.

The vulnerability is usually used as a stepping stone to exploit further vulnerabilities.
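
To make the overwrite concrete, here is a minimal sketch that models a stack frame as a struct: a local buffer followed by the saved return address. The struct `frame` and both copy functions are hypothetical names for illustration; real frame layouts depend on the compiler and the ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a stack frame: a local buffer followed by the
 * saved return address. Real layouts depend on compiler and ABI. */
struct frame {
    char      buf[8];
    uintptr_t ret;
};

/* No bounds check: copies len bytes starting at buf. The caller must keep
 * len <= sizeof(struct frame) so the demo stays inside the model. */
void unsafe_copy(struct frame *f, const void *src, size_t len) {
    memcpy((unsigned char *)f + offsetof(struct frame, buf), src, len);
}

/* Bounds-checked variant: refuses input that does not fit in buf. */
int safe_copy(struct frame *f, const void *src, size_t len) {
    if (len > sizeof f->buf)
        return -1;
    memcpy(f->buf, src, len);
    return 0;
}
```

An input longer than `buf` passed to `unsafe_copy` spills into `ret`, which is the modelled equivalent of overwriting the return address; `safe_copy` rejects the same input.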

Approaches to limit exploitation

stack canary
data execution prevention (DEP)
Address Space Layout Randomization (ASLR)
Control Flow Integrity (CFI)
- checks that control returns to the intended addresses
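
As an illustration of the stack canary idea, the sketch below places a secret value between the buffer and the saved return address and checks it before "returning". The struct and function names are hypothetical; real compilers insert this check automatically and choose the secret at program start.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame model with a canary between the buffer and the
 * saved return address. */
struct guarded_frame {
    char      buf[8];
    uintptr_t canary;
    uintptr_t ret;
};

/* Function prologue: store the secret canary next to the return address. */
void frame_enter(struct guarded_frame *f, uintptr_t secret, uintptr_t ret) {
    memset(f->buf, 0, sizeof f->buf);
    f->canary = secret;
    f->ret = ret;
}

/* Function epilogue: returns 1 if the canary is intact, 0 if it was
 * overwritten (a real implementation would abort the process). */
int frame_check(const struct guarded_frame *f, uintptr_t secret) {
    return f->canary == secret;
}
```

A linear overflow from `buf` has to cross `canary` before it reaches `ret`, so the corruption is detected before the return address is used.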

The problem still exists: vulnerabilities in Intel and AMD processors were found as recently as 2018.

ASLR can still be attacked if an information leak in the code gives insight into memory addresses.

once the mapping is leaked any code block is reusable
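
The arithmetic behind such a leak is simple. A minimal sketch, assuming the attacker knows the offset of the leaked symbol inside the module (the function names and the offsets used with them are hypothetical):

```c
#include <stdint.h>

/* Recover the randomized load base from one leaked pointer whose offset
 * inside the module is known, for instance from the binary on disk. */
uintptr_t base_from_leak(uintptr_t leaked_addr, uintptr_t known_offset) {
    return leaked_addr - known_offset;
}

/* With the base known, any other code block in the module can be located. */
uintptr_t resolve(uintptr_t base, uintptr_t offset) {
    return base + offset;
}
```

For example, a leaked address 0x7f001234 for a symbol at offset 0x1234 gives the base 0x7f000000, and every other offset in the module follows from it.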

Control Flow Integrity

mechanisms ensuring that the control flows executed are the intended ones
control flow graphs (CFG)
- one per function
a call graph is also used
- one for the entire program

In code, a basic block is a sequence of instructions with only one entry point and one exit point.

The control flow can jump in

a direct way with target written in the code, cannot be overridden
an indirect way, with the target in a variable in memory or in a register; these jumps must be protected with the ICFG

The ICFG is enforced with labels inserted into the code by the compiler. A new check instruction carrying an ID is added before each indirect call: the call to the address contained in DST proceeds only if that ID is the same as the one annotated at the start of the called method. This is needed for indirect calls, for example calls to virtual methods.
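
The label check can be sketched in plain C. This is only a model: the struct `labeled_fn`, the ID value and `checked_call` are hypothetical stand-ins for what the compiler would emit as instrumentation.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Hypothetical stand-in for a compiler-inserted label: every function
 * that may be the target of an indirect call carries an ID. */
struct labeled_fn {
    unsigned   id;   /* label annotated at the start of the callee */
    handler_fn fn;
};

/* Performs the indirect call only if the label of the target matches the
 * ID expected at this call site. Returns 0 on success, -1 on a violation
 * (a real implementation would abort instead). */
int checked_call(const struct labeled_fn *dst, unsigned expected_id,
                 int arg, int *result) {
    if (dst == NULL || dst->fn == NULL || dst->id != expected_id)
        return -1;
    *result = dst->fn(arg);
    return 0;
}

static int twice(int x) { return 2 * x; }

/* Example target with an arbitrary label value. */
const struct labeled_fn TWICE = { 0x1234u, twice };
```

Calling `checked_call(&TWICE, 0x1234u, 21, &r)` runs `twice`; with any other expected ID the call is refused, which is what stops a corrupted pointer from reaching an unintended target.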

Another solution is a shadow stack, a protected stack keeping track of the calling trace.

stores a copy of the return address of each call
on a ret, the processor checks that the return address stored on the normal stack is the same as the one in the shadow stack
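
A software model of the two operations above, as a rough sketch. The names and the fixed depth are assumptions; a hardware shadow stack does this transparently on call and ret.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64   /* arbitrary capacity for the sketch */

struct shadow_stack {
    uintptr_t slots[SHADOW_DEPTH];
    size_t    top;
};

/* On call: keep a protected copy of the return address.
 * Returns -1 if the shadow stack is full. */
int shadow_push(struct shadow_stack *s, uintptr_t ret_addr) {
    if (s->top == SHADOW_DEPTH)
        return -1;
    s->slots[s->top++] = ret_addr;
    return 0;
}

/* On ret: pop the protected copy and compare it with the address found on
 * the normal stack. Returns 1 on a match, 0 on a mismatch or if empty. */
int shadow_check_ret(struct shadow_stack *s, uintptr_t ret_on_stack) {
    if (s->top == 0)
        return 0;
    return s->slots[--s->top] == ret_on_stack;
}
```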

Compiling with CFI can add up to 20% overhead.

Heap overflow

When data has no fixed size it is stored on the heap. The heap is important because it outlives the life cycle of a single function. When a chunk in the heap is freed, adjacent free chunks are joined with it to reduce fragmentation.

Metadata of chunks:

prev_size
size
- lowest 3 bits are flags, lowest bit PREV_INUSE
data ← pointer returned
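
The header fields and the flag bits can be sketched as follows, assuming 32-bit fields as in the steps later in this section. The struct and helper names are hypothetical and the real allocator layout has more detail.

```c
#include <stdint.h>

/* Sketch of the chunk header with 32-bit fields. */
struct chunk_hdr {
    uint32_t prev_size;  /* size of the previous chunk */
    uint32_t size;       /* chunk size; the lowest 3 bits are flags */
};

#define PREV_INUSE 0x1u  /* lowest bit: previous chunk is allocated */
#define FLAG_MASK  0x7u  /* the three flag bits */

/* Real size of the chunk, with the flag bits masked out. */
uint32_t chunk_size(const struct chunk_hdr *c) {
    return c->size & ~FLAG_MASK;
}

/* 1 if the previous chunk is in use, 0 if it is free. */
int prev_in_use(const struct chunk_hdr *c) {
    return (c->size & PREV_INUSE) != 0;
}
```

Because sizes are multiples of 8, a size field of 0x19 means a 0x18-byte chunk whose predecessor is in use.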

If a chunk is no longer needed it is marked as unallocated; when this happens the pointers fd and bk are stored in the chunk's data section.

this way the free chunks of memory form a doubly linked list
when free merges adjacent chunks it calls unlink, which removes the merged neighbour from the free list; the size of the new, bigger chunk is then updated. This is done for performance reasons
- this procedure can be abused through a heap overflow to overwrite a return address
- consolidation can go backward or forward

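/* Classic unlink without integrity checks: removes chunk P from its doubly linked free list. */
#define unlink(P, BK, FD)                                         \
{                                                                 \
    BK = P->bk;                                                   \
    FD = P->fd;                                                   \
    FD->bk = BK; /* writes BK at the address FD + offset of bk */ \
    BK->fd = FD; /* writes FD at the address BK + offset of fd */ \
}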

By manipulating bk and fd an attacker can write to any place in memory.
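
The write primitive can be reproduced safely on a toy address space. In this sketch addresses are byte offsets into a small array and the field offsets (fd at +8, bk at +12) assume 4-byte fields; `mem`, `load`, `store` and `toy_unlink` are hypothetical names, not allocator code.

```c
#include <stdint.h>

/* Toy 32-bit address space: addresses are byte offsets into mem and must
 * be 4-byte aligned and smaller than MEM_WORDS * 4. */
#define MEM_WORDS 64
static uint32_t mem[MEM_WORDS];

#define FD_OFF 8u    /* offset of fd inside a free chunk (4-byte fields) */
#define BK_OFF 12u   /* offset of bk inside a free chunk */

uint32_t load(uint32_t addr)              { return mem[addr / 4]; }
void     store(uint32_t addr, uint32_t v) { mem[addr / 4] = v; }

/* The same four assignments as the unlink macro, on the toy memory. */
void toy_unlink(uint32_t p) {
    uint32_t bk = load(p + BK_OFF);
    uint32_t fd = load(p + FD_OFF);
    store(fd + BK_OFF, bk);   /* FD->bk = BK */
    store(bk + FD_OFF, fd);   /* BK->fd = FD */
}
```

If the chunk at `p` has fd set to `target - 12` and bk set to an arbitrary value, `toy_unlink(p)` stores that value at `target`. As a side effect the second assignment also writes fd at `bk + 8`, so the location bk points to must be writable.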

Steps:

overwrite the PREV_INUSE bit of the next chunk with 0
overwrite the size of the next chunk with -4 (on 32-bit systems with 4 B fields)
- to decide whether the next chunk is free, the allocator reads the PREV_INUSE bit of the chunk after it, which it locates at next + size; with size -4 it ends up reading a field the attacker has already set to 0, so it calls unlink to merge the chunks
overwrite bk with @shellcode
overwrite fd with @free-12
- @free is the address of the GOT entry of free; unlink writes bk at fd + 12, so this entry now points to the shellcode
now calls to free execute the shellcode

Integer overflow

String format vulnerabilities

Type confusion

Use After Free

High Level Attacks and Defenses

Injection

Broad class of attack vectors.

an attacker supplies untrusted input
input is processed as part of command or query
this alters the execution

Code injection enables the bypass of authorization checks and/or the execution of arbitrary code on the server. This way the attacker gets access to a privileged environment and can, for example, dump the database.

The reason for this vulnerability is usually insufficient user input validation and sanitization.

SQL injection

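SQLi

<?php
// Vulnerable: user input is interpolated directly into the SQL string.
$conn->query("SELECT * FROM users WHERE username = \"$username\" AND password = \"$password\"");

With the username " OR 1 = 1 --¹ you can bypass the password check via injection: the leading quote closes the string, the condition is always true, and everything after -- is ignored.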
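
The effect on the query text can be checked with the same kind of string concatenation in C. This is a sketch only: `build_query` mirrors the vulnerable pattern and `is_valid_username` is a hypothetical allowlist check, one possible form of input validation.

```c
#include <ctype.h>
#include <stdio.h>

/* Mirrors the vulnerable pattern: user input is pasted directly into the
 * query text. Returns the value of snprintf. */
int build_query(char *out, size_t n, const char *username, const char *password) {
    return snprintf(out, n,
        "SELECT * FROM users WHERE username = \"%s\" AND password = \"%s\"",
        username, password);
}

/* Hypothetical allowlist validation: letters, digits and underscore only. */
int is_valid_username(const char *s) {
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;
    return 1;
}
```

With the injected username the generated text contains `username = "" OR 1 = 1 --`, so the password comparison ends up inside a comment. The allowlist rejects that input before a query is built; parameterized queries are the more robust fix because the input is never parsed as SQL.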

OS command injection

Cross-Site Scripting

XSS

Confused deputy

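Ask someone to do something in your stead: the attacker tricks a more privileged program (the deputy) into performing an action on their behalf.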

Finding Software Vulnerabilities

Static Program Analysis

Analyse only the code, without running it, to find vulnerabilities.

Dynamic Program Analysis

Analyse the execution of the program to find vulnerabilities.

Fuzzing

Widely used by both hackers and industry to find vulnerabilities. A large number of inputs is generated to exercise the program.

could be clicks, text, network interactions…

The input can be totally random, in both size and value.

For analysis you can use an oracle, which provides the expected result for each input, and compare the program's output against it. With oracles one can detect anomalies other than crashes.
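
A minimal sketch of random testing against an oracle. The function under test, the oracle and the small random generator are all made up for illustration; the bug is a silent wrap-around, assuming a 32-bit unsigned int, so it never crashes and only the oracle reveals it.

```c
#include <stdint.h>

/* Function under test: the sum wraps around for large inputs
 * (assuming uint32_t arithmetic is done in 32 bits). */
uint32_t mid_under_test(uint32_t a, uint32_t b) {
    return (a + b) / 2;
}

/* Oracle: computes the same midpoint without overflowing. */
uint32_t mid_oracle(uint32_t a, uint32_t b) {
    return a / 2 + b / 2 + (a & b & 1u);
}

/* Small linear congruential generator so that runs are reproducible. */
static uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Feeds n random input pairs to the function under test and counts the
 * cases where its result disagrees with the oracle. */
unsigned count_anomalies(uint32_t seed, unsigned n) {
    unsigned anomalies = 0;
    for (unsigned i = 0; i < n; i++) {
        uint32_t a = lcg_next(&seed);
        uint32_t b = lcg_next(&seed);
        if (mid_under_test(a, b) != mid_oracle(a, b))
            anomalies++;
    }
    return anomalies;
}
```

Small inputs such as (2, 4) agree with the oracle, while two large values whose sum exceeds 32 bits disagree, which a crash-only fuzzer would not notice.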

First publication on the topic, in 1990: An Empirical Study of the Reliability of UNIX Utilities.

State of the art for C/C++ is AFL, American Fuzzy Lop. It tracks the code coverage of the binary (for example, the number of statements covered when running a test case).

AFL mutates the best inputs it finds during testing in order to maximize a goal function (coverage). Unlike with manual analysis, not all code may be covered, and the fuzzer might take a long time to find acceptable inputs.
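
The mutate-and-keep loop can be sketched in a few lines. This is not AFL's algorithm, only a toy with the same shape: the target, the coverage flags and the one-byte mutation are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

#define INPUT_LEN 4

static unsigned char coverage[4];   /* one flag per branch of the toy target */

/* Toy target: each nested branch is reached only if the previous bytes match. */
void target(const unsigned char *in) {
    coverage[0] = 1;
    if (in[0] == 'F') {
        coverage[1] = 1;
        if (in[1] == 'U') {
            coverage[2] = 1;
            if (in[2] == 'Z')
                coverage[3] = 1;
        }
    }
}

/* Number of branches reached so far. */
static unsigned covered(void) {
    return coverage[0] + coverage[1] + coverage[2] + coverage[3];
}

/* Small reproducible generator; the high bits are the better ones. */
static uint32_t fuzz_rand(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Mutates one byte of the best input so far and keeps the mutant only if
 * it reaches a new branch. Returns the number of branches covered. */
unsigned fuzz(uint32_t seed, unsigned iterations) {
    unsigned char best[INPUT_LEN] = { 'A', 'A', 'A', 'A' };
    memset(coverage, 0, sizeof coverage);
    target(best);
    unsigned best_cov = covered();
    for (unsigned i = 0; i < iterations; i++) {
        unsigned char mutant[INPUT_LEN];
        memcpy(mutant, best, INPUT_LEN);
        uint32_t pos = fuzz_rand(&seed) % INPUT_LEN;
        mutant[pos] = (unsigned char)fuzz_rand(&seed);
        target(mutant);
        if (covered() > best_cov) {
            best_cov = covered();
            memcpy(best, mutant, INPUT_LEN);
        }
    }
    return best_cov;
}
```

Keeping a mutant that reaches `in[0] == 'F'` makes the next branch reachable with a single further mutation, which is why coverage feedback finds nested conditions faster than purely random input.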

¹ -- denotes a comment in SQL

Dan's Brain