Internally, Reko uses its own Register Transfer Language (RTL) in all the analyses it performs. Those analyses have no knowledge of processor-specific machine instructions. Each processor architecture implementation is responsible for providing a suitable Rewriter that translates that processor's machine instructions to RTL.
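To make the Rewriter's role concrete, here is a toy sketch in Python. The `rewrite` function, its (mnemonic, operands) input format, and its string output are all hypothetical simplifications for illustration. An actual Reko rewriter is written in C# and builds RTL objects rather than text.

```python
from typing import List

# Hypothetical toy rewriter: translates a few x86-like instructions into
# RTL text. It only illustrates the "machine instruction -> RTL" step.
def rewrite(mnemonic: str, operands: List[str]) -> List[str]:
    """Translate one machine instruction into a list of RTL statements."""
    if mnemonic == "inc":
        (dst,) = operands
        return [f"{dst} = {dst} + 1"]   # condition-code updates omitted
    if mnemonic == "mov":
        dst, src = operands
        return [f"{dst} = {src}"]
    if mnemonic == "jmp":
        (target,) = operands
        return [f"goto {target}"]
    raise NotImplementedError(f"no RTL rewrite for {mnemonic!r}")
```

Note that x86 `inc` also updates several condition flags. A complete rewrite would emit an additional RTL assignment for them, which this sketch deliberately leaves out.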
RTL consists of two distinct kinds of constructs: instructions and expressions.
The instructions are the following:
RtlAssignment
: models an assignment, e.g.:
eax = eax + 1
Mem0[r1:byte] = 0x20
RtlBranch
: models a conditional branch to an address, e.g.:
branch Test(NZ, eax) 00402344
RtlCall
: models either a direct subroutine call (to a constant address) or an indirect call (to a computed expression):
call 00401580
call Mem0[eax + 00000018:word32]
RtlGoto
: models an unconditional direct or indirect branch (like the kind produced by switch statements):
goto 00401890
goto Mem0[r1 + r2 * 4]
RtlIf
: models a conditionally executed statement, present in some architectures:
if (r1 > 0) r1 = 0
RtlReturn
: models a return to the caller, including how many bytes are removed from the stack on return (if applicable):
return (4)
RtlSideEffect
: models an instruction that has no observable effect on registers, e.g. the out instruction of the x86 architecture, which writes a value to an I/O port:
__outb(edx,al)
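The instruction kinds above can be sketched as a small family of Python dataclasses whose string forms reproduce the example notation. These classes are hypothetical simplifications: expressions are plain strings here, whereas Reko's real C# classes hold expression trees and carry more information.

```python
from dataclasses import dataclass

# Hypothetical, simplified RTL instruction classes (not Reko's API).
# Each __str__ renders the instruction in the notation used above.

@dataclass
class RtlAssignment:
    dst: str
    src: str
    def __str__(self) -> str:
        return f"{self.dst} = {self.src}"

@dataclass
class RtlBranch:
    condition: str
    target: str
    def __str__(self) -> str:
        return f"branch {self.condition} {self.target}"

@dataclass
class RtlCall:
    target: str  # constant address (direct) or expression (indirect)
    def __str__(self) -> str:
        return f"call {self.target}"

@dataclass
class RtlGoto:
    target: str
    def __str__(self) -> str:
        return f"goto {self.target}"

@dataclass
class RtlIf:
    condition: str
    instr: object  # the conditionally executed instruction
    def __str__(self) -> str:
        return f"if ({self.condition}) {self.instr}"

@dataclass
class RtlReturn:
    bytes_popped: int
    def __str__(self) -> str:
        return f"return ({self.bytes_popped})"

@dataclass
class RtlSideEffect:
    expr: str
    def __str__(self) -> str:
        return self.expr
```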
All expressions modeled by the decompiler have a data type. At the very least, the data type will be one of the neutral byte or word<XX> types, whose only attribute is their size in bits.
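As a minimal sketch of such a size-only type, the hypothetical `NeutralType` class below (a Python stand-in, not Reko's C# type system) carries nothing but a bit width and derives the `byte` / `word<XX>` name from it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NeutralType:
    """Hypothetical size-only data type: its only attribute is the bit width."""
    bits: int

    @property
    def name(self) -> str:
        # The 8-bit neutral type is called "byte"; wider ones are word<bits>.
        return "byte" if self.bits == 8 else f"word{self.bits}"

    @property
    def size_bytes(self) -> int:
        return self.bits // 8
```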
Base expressions constitute the leaf nodes of expression trees. There are three kinds of base expressions:
Constants: these model constant values, such as booleans, integers, characters, or real numbers. Integer constants may be signed or unsigned:
false
-1234
3e-3
'c'
Later stages of the decompilation process may produce string constants, which are also modeled as constants.
Addresses are special constants that are known to be pointers to locations. Addresses are especially useful to Reko because they allow it to determine which locations the program refers to. Addresses must also model byzantine addressing schemes, such as the infamous x86 segmented addresses, which consist of a segment selector and an offset.
004079A0
0C00:1253
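For x86 real mode specifically, a segment:offset pair maps to a linear address as segment × 16 + offset. The sketch below (hypothetical helper names) shows that arithmetic. It does not model protected-mode selectors, which index a descriptor table instead, or the 1 MiB wrap-around of the original 8086.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of an x86 real-mode segment:offset pair.

    The 16-bit segment is shifted left by 4 bits (multiplied by 16) and the
    16-bit offset is added. No 1 MiB wrap-around is applied.
    """
    return (segment << 4) + offset

def fmt_segmented(segment: int, offset: int) -> str:
    """Render the pair in the SSSS:OOOO notation used above."""
    return f"{segment:04X}:{offset:04X}"
```

Both 0C00:1253 and 0D25:0003 map to linear address 0D253. This illustrates one reason segmented addresses are awkward: distinct selector:offset pairs can alias the same memory location.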
Identifiers model locations accessed by the program. Their names are derived from register names, or synthesized from other values such as stack offsets or addresses:
r1
dwLoc04
global_00403120
fn04001670
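The synthesized names above appear to encode a size prefix, a kind, and a hex offset or address. The sketch below reproduces the examples under that assumption. The prefix table and the functions are hypothetical, inferred only from the examples, and are not Reko's actual naming code.

```python
from typing import Dict

# Assumed size prefixes, inferred from names like "dwLoc04" (hypothetical).
SIZE_PREFIX: Dict[int, str] = {8: "b", 16: "w", 32: "dw", 64: "qw"}

def local_name(bits: int, frame_offset: int) -> str:
    """Name for a stack local; frame_offset is negative below the frame pointer."""
    return f"{SIZE_PREFIX[bits]}Loc{-frame_offset:02X}"

def global_name(address: int) -> str:
    """Name for a global variable at a fixed address."""
    return f"global_{address:08X}"

def procedure_name(address: int) -> str:
    """Name for a procedure whose entry point is at address."""
    return f"fn{address:08X}"
```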
Base expressions can be combined into larger expressions using the following constructs:
Unary operators model negation, bit-wise complement, and other single-operand expressions:
!cx
&dwLoc04
Binary operators model arithmetic operations, logical operations, shift operations, and comparisons:
dwLoc04 + 0x0004
r1 << 0x02
al >= '0'
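Unary and binary operators are the interior nodes of expression trees whose leaves are constants and identifiers. A minimal evaluator sketch follows. The class names are hypothetical, every value is assumed to be an unsigned 32-bit word, operators follow C-like semantics, and address-of (`&`) is omitted because it needs a memory layout to evaluate.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical expression-tree classes for illustration; not Reko's API.
@dataclass
class Const:
    value: int

@dataclass
class Id:
    name: str

@dataclass
class Unary:
    op: str          # "-" (negate), "~" (bit-wise complement), "!" (logical not)
    operand: "Expr"

@dataclass
class Binary:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Id, Unary, Binary]

MASK32 = 0xFFFFFFFF  # assumption: every value is an unsigned 32-bit word

ARITH = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "<<": operator.lshift, ">>": operator.rshift,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
}
COMPARE = {
    "==": operator.eq, "!=": operator.ne, "<": operator.lt,
    "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}

def evaluate(e: Expr, env: Dict[str, int]) -> int:
    """Evaluate an expression tree, looking identifiers up in env."""
    if isinstance(e, Const):
        return e.value & MASK32
    if isinstance(e, Id):
        return env[e.name] & MASK32
    if isinstance(e, Unary):
        v = evaluate(e.operand, env)
        if e.op == "-":
            return -v & MASK32
        if e.op == "~":
            return ~v & MASK32
        if e.op == "!":
            return int(v == 0)
        raise ValueError(f"unsupported unary operator {e.op!r}")
    if isinstance(e, Binary):
        a, b = evaluate(e.left, env), evaluate(e.right, env)
        if e.op in ARITH:
            return ARITH[e.op](a, b) & MASK32   # wrap to 32 bits
        if e.op in COMPARE:
            return int(COMPARE[e.op](a, b))     # unsigned comparison
        raise ValueError(f"unsupported binary operator {e.op!r}")
    raise TypeError(f"not an expression: {e!r}")
```

For example, `Binary("<<", Id("r1"), Const(2))` with `r1 = 3` evaluates to 12, mirroring `r1 << 0x02`.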
Memory accesses model loads from and stores to memory. A special version of the memory access expression models Intel x86 segmented memory accesses:
Mem0[fp - 0x12] = r10
ax = Mem1[es:bx + 0x04:word16]
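To illustrate what a sized access such as `Mem0[r1:byte]` or `Mem0[addr:word16]` reads or writes, here is a toy little-endian memory. The `Memory` class is hypothetical. It assumes a flat byte-addressed space in which unwritten bytes read as zero. A segmented access would first combine its segment and offset into such an address.

```python
from typing import Dict

class Memory:
    """Toy byte-addressed, little-endian memory (hypothetical helper)."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}

    def load(self, addr: int, bits: int) -> int:
        """Read a bits-wide value, least significant byte at the lowest address."""
        data = bytes(self.cells.get(addr + i, 0) for i in range(bits // 8))
        return int.from_bytes(data, "little")

    def store(self, addr: int, bits: int, value: int) -> None:
        """Write the low `bits` bits of value in little-endian byte order."""
        value &= (1 << bits) - 1
        for i, b in enumerate(value.to_bytes(bits // 8, "little")):
            self.cells[addr + i] = b
```

Storing the `word16` value 0x1234 at address 0x1000 puts 0x34 at 0x1000 and 0x12 at 0x1001. Loading the `word16` at 0x1000 returns 0x1234 again.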
Sequences model expressions that occupy ordered sequences of consecutive registers. They are commonly used when register pairs represent values that are too wide to fit in one register:
dx:ax
es:bx
hi:lo
The sequence construction operator SEQ is used to build sequences of things other than registers. For instance, the expression SEQ(Mem0[ds:bx + 0x0004:word16],Mem0[ds:bx + 0x0002:word16])
models a 32-bit sequence constructed by fetching two 16-bit words from memory in little-endian order.
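Numerically, a sequence places its first component in the high bits and its last component in the low bits. The sketch below uses hypothetical helper names and, as a simplifying assumption, treats `ds:bx` as offset 0 of a flat byte buffer. It shows `dx:ax` concatenation and checks that the SEQ of the two words equals a single little-endian 32-bit load starting at `bx + 2`.

```python
from typing import Tuple

def seq(hi: int, lo: int, lo_bits: int) -> int:
    """Concatenate hi:lo, with lo occupying the low lo_bits bits (e.g. dx:ax)."""
    return (hi << lo_bits) | (lo & ((1 << lo_bits) - 1))

def split(value: int, lo_bits: int) -> Tuple[int, int]:
    """Inverse of seq: return (hi, lo)."""
    return value >> lo_bits, value & ((1 << lo_bits) - 1)

def word16_at(mem: bytes, addr: int) -> int:
    """Little-endian 16-bit load from a byte buffer (stand-in for Mem0[...:word16])."""
    return int.from_bytes(mem[addr:addr + 2], "little")
```

With the bytes `00 00 78 56 34 12`, the word at offset 4 is 0x1234 and the word at offset 2 is 0x5678. Their sequence is 0x12345678, the same value a 32-bit little-endian load at offset 2 would produce.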
Casts are used to coerce the data type of an expression to another. This construct is used to model type conversions, sign extension, and truncation:
(word16) eax
(int32) 'a'
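The bit-level effect of truncating and sign-extending casts can be sketched with two helpers (hypothetical names). The sketch operates on plain Python integers and ignores everything about the data type except its width.

```python
def truncate(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, e.g. (word16) eax."""
    return value & ((1 << bits) - 1)

def sign_extend(value: int, from_bits: int) -> int:
    """Interpret the low from_bits bits as a two's-complement signed integer."""
    value = truncate(value, from_bits)
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit
```

For example, `truncate(0x12345678, 16)` yields 0x5678. `sign_extend(0xFF, 8)` yields -1, while the character `'a'` (0x61) sign-extends unchanged to 97, matching `(int32) 'a'`.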
Applications model calls to functions:
fn0124_0123(ecx)
The DPB function is derived from a Common Lisp function that takes its name from a PDP-10 instruction, also called DPB (http://pdp10.nocrew.org/docs/instruction-set/Byte.html), which deposits a byte inside a larger word. Reko uses DPB to model how, on some architectures, a byte loaded from memory is stored into an architectural register wider than a byte without modifying the remainder of the register. For instance, the m68k instruction:
move.b (a5),d3
will be "lifted" to the RTL:
tmp = Mem0[a5:byte] // load a byte from memory
d3 = DPB(d3, tmp, 0) // deposit the byte at offset 0 of d3.
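The arithmetic behind `DPB(d3, tmp, 0)` is a masked merge: clear the target bit field, then OR in the new bits. The sketch below is hypothetical. In particular, the explicit `src_bits` width parameter is an assumption; in the RTL above the width is implied by the data type of `tmp`.

```python
def dpb(dst: int, src: int, bit_pos: int, src_bits: int = 8) -> int:
    """Deposit the low src_bits bits of src into dst starting at bit_pos.

    All bits of dst outside the deposited field are preserved.
    """
    mask = ((1 << src_bits) - 1) << bit_pos
    return (dst & ~mask) | ((src << bit_pos) & mask)
```

If `d3` holds 0x12345678 and the byte at `(a5)` is 0xAB, then `dpb(0x12345678, 0xAB, 0)` gives 0x123456AB. Only the low byte changes, as `move.b` requires.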