Help File:ASM Basics 1
ASM Basics 1
Originally posted by DABhand
The Basics
Opcodes
OK, what are opcodes? An opcode is an instruction the processor can understand. For example:
SUB, ADD and DIV
The SUB instruction subtracts one number from another. Most opcodes have operands:
SUB destination, source
like the following:
SUB eax, ecx
SUB has 2 operands: a source and a destination. It subtracts the source value from the destination value and then stores the result in the destination. Operands can be of different types: registers, memory locations, immediate values.
So basically, if for example eax contained 20 and ecx contained 10, that instruction does this:
eax = eax - ecx
eax = 20 - 10
eax = 10
Easy that bit huh
Registers
Ahhh, here is the main force of asm. Registers contain values and information which a program uses to keep track of things. When you are new to ASM it does look messy, but the system is honestly very efficient.
Let's take a look at the main register used: eax. Say it contains the value FFEEDDCCh (the h means hexadecimal). When working later with SoftICE you will see hex values a lot, so get used to it now.
OK, I'll show how the registers are constructed:
EAX  FFEEDDCC
AX       DDCC
AH       DD
AL         CC
ax, ah, al are part of eax. EAX is a 32-bit register (available only on 386+), ax contains the lower 16 bits (2 bytes) of eax, ah contains the high byte of ax, and al contains the low byte of ax. So ax is 16 bit, al and ah are 8 bit. So, in the example above, these are the values of the registers:
eax = FFEEDDCC (32-bit)
ax  = DDCC (16-bit)
ah  = DD (8-bit)
al  = CC (8-bit)
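If it helps, the way ax, ah and al sit inside eax can be modelled with a few masks and shifts. This is only a Python sketch of the arithmetic, not assembly, and the function name sub_registers is made up for this illustration.

```python
def sub_registers(eax):
    """Split a 32-bit EAX value into its AX, AH and AL parts."""
    eax &= 0xFFFFFFFF        # keep only 32 bits
    ax = eax & 0xFFFF        # AX is the lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # AH is the high byte of AX
    al = ax & 0xFF           # AL is the low byte of AX
    return ax, ah, al


ax, ah, al = sub_registers(0xFFEEDDCC)
print(f"ax={ax:04X} ah={ah:02X} al={al:02X}")  # prints: ax=DDCC ah=DD al=CC
```

Note that the upper 16 bits of eax (FFEE here) have no register name of their own.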
Understand? I know it's a lot to take in, but that's how registers work. Here are some more examples of opcodes and the registers used...
mov eax, 002130DFh ; mov loads a value into a register
mov cl, ah         ; move the high byte of ax (30h) into cl
sub cl, 10         ; subtract 10 (dec.) from the value in cl
mov al, cl         ; and store it in the lowest byte of eax
So at the start...
eax = 002130DF
At the end...
eax = 00213026
Did you follow what happened? I hope so, because I'm trying to make this as easy as I can.
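To check the walk-through by hand, here is a rough Python model of those four instructions. It is a sketch that only mimics the register arithmetic, and run_example is a name invented for this illustration.

```python
def run_example():
    """Model: mov eax,002130DFh / mov cl,ah / sub cl,10 / mov al,cl."""
    eax = 0x002130DF                 # mov eax, 002130DFh
    ah = (eax >> 8) & 0xFF           # AH is bits 8..15 of EAX (30h here)
    cl = ah                          # mov cl, ah
    cl = (cl - 10) & 0xFF            # sub cl, 10 (decimal), kept to 8 bits
    eax = (eax & 0xFFFFFF00) | cl    # mov al, cl changes only the low byte
    return eax


print(f"{run_example():08X}")  # prints: 00213026
```

30h is 48 in decimal, 48 - 10 = 38, and 38 is 26h, which is why the low byte ends up as 26.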
OK, let's discuss the types of registers. There are 4 types mainly used (there are others, but I will tell you about them later).
General Purpose Registers
These 32-bit registers (and their 16-bit and 8-bit sub-registers) can be used for anything, but their main purpose is shown after them.
eax (ax/ah/al)  Accumulator
ebx (bx/bh/bl)  Base
ecx (cx/ch/cl)  Counter
edx (dx/dh/dl)  Data
As said, these are hardly used nowadays for their main purpose and are used to ferry around information within programs and games (such as scores, health values etc).
Segment Registers
Segment registers define the segment of memory that is used. You probably won't need them with win32asm, because Windows has a flat memory system. In DOS, memory is divided into segments of 64 KB, so if you want to define a memory address, you specify a segment and an offset (like 0172:0500 (segment:offset)). In Windows, segments have sizes of 4 GB, so you won't need segments in win. Segment registers are always 16-bit.
CS  code segment
DS  data segment
SS  stack segment
ES  extra segment
FS  (only 386+) general purpose segment
GS  (only 386+) general purpose segment
Pointer Registers
Actually, you can use pointer registers as general purpose registers (except for eip), as long as you preserve their original values. Pointer registers are called pointer registers because they're often used for storing memory addresses. Some opcodes (such as movsb, scasb, etc.) use them implicitly.
esi (si)  Source index
edi (di)  Destination index
eip (ip)  Instruction pointer
EIP (or IP in 16-bit programs) contains a pointer to the instruction the processor is about to execute. So you can't use eip as a general purpose register.
Stack Registers
There are 2 stack registers: esp & ebp. ESP holds the current stack position in memory (more about this in one of the next tutorials). EBP is used in functions as pointer to the local variables.
esp (sp)  Stack pointer
ebp (bp)  Base pointer
MEMORY
How is memory used within ASM, and how is it laid out? Well, hopefully this will answer some questions. Bear in mind there are more advanced things than what is explained here, but hey, you lot aren't advanced yet, so start from the basics.
Let's look at the different types...
DOS
In 16-bit programs, like those for DOS (and Win 3.1), memory is divided into segments. These segments have sizes of 64 KB. To access memory, a segment pointer and an offset pointer are needed. The segment pointer indicates which segment (section of 64 KB) to use, the offset pointer indicates the place in the segment itself.
Take a look at this
----------------------------MEMORY--------------------------------
|SEGMENT 1 (64kb)|SEGMENT 2 (64kb)|SEGMENT 3 (64kb)|etc...........|
Hope that shows well
Note that the following explanation is for 16-bit programs, more on 32-bit later (but don't skip this part, it is important to understand 32-bits).
The table above is the total memory, divided in segments of 64kb. There's a maximum of 65536 segments. Now take one of the segments:
-------------------SEGMENT 1(64kb)----------------------
|Offset 1|Offset 2|Offset 3|Offset 4|Offset 5|etc.......|
To point to a location in a segment, offsets are used. An offset is a location inside the segment. There's a maximum of 65536 offsets per segment. The notation of an address in memory is:
SEGMENT:OFFSET
For example:
0145:42A2 (all hex numbers, remember)
This means: segment 145, offset 42A2. To see what is at that address, you first go to segment 145, and then to offset 42A2 in that segment.
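For the curious, here is a small Python sketch of how a real-mode DOS processor turns segment:offset into one linear address. It assumes plain 8086 real mode, where the segment value is multiplied by 16 and the offset is added, so the neat side-by-side picture above is a simplification: segments actually start every 16 bytes and overlap. The function name linear_address is made up for this illustration.

```python
def linear_address(segment, offset):
    """Real-mode rule (assumed here): linear = segment * 16 + offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


print(f"{linear_address(0x0145, 0x42A2):05X}")  # prints: 056F2
# A different segment:offset pair can name the same byte:
print(f"{linear_address(0x0146, 0x4292):05X}")  # prints: 056F2
```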
Hopefully you remember reading about those segment registers a while ago. Remember them:
CS - Code Segment
DS - Data Segment
SS - Stack Segment
ES - Extra Segment
FS - General Purpose
GS - General Purpose
The names explain their function: code segment (CS) contains the number of the section where the current code that is being executed is. Data segment for the current segment to get data from. Stack indicates the stack segment (more on the stacks later), ES, FS, GS are general purpose registers and can be used for any segment (not in win32 though).
Pointer registers most of the time hold an offset, but general purpose registers (ax, bx, cx, dx etc.) can also be used for this. IP (Pointer register) indicates the offset (in the CS (code segment)) of the instruction that is currently executed. SP (Stack register) holds the offset (in the SS (stack segment)) of the current stack position.
Phew, and you thought 16-bit memory was hard, huh?
Sorry if that's all confusing, but it's the easiest way to explain it. Reread it a few times and it will eventually sink in how memory works and how it is accessed to be read from and written to.
Now we move to
32-bit Windows
You have probably noticed that all this about segments really isn't fun. In 16-bit programming, segments are essential. Fortunately, this problem is solved in 32-bit Windows (9x and NT).
You still have segments, but you don't have to care about them because they aren't 64 KB, but 4 GB. Windows will probably even crash if you try to change one of the segment registers.
This is called the flat memory model. There are only offsets, and they now are 32-bit, so in a range from 0 to 4,294,967,295. Every location in memory is indicated only by an offset.
This is really one of the best advantages of 32-bit over 16-bit. So you can forget the segment registers now and focus on the other registers.
Oh the madness of it all, wow, 4 gigabytes to work with!
The Fun Part
The Fun Part begins!!!
It's
THE OPCODES
Here is a list of a few opcodes you will notice a lot when making trainers or cracking etc.
MOV
This instruction is used to move (or actually copy) a value from one place to another. This 'place' can be a register, a memory location or an immediate value (only as source value of course). The syntax of the mov instruction is:
mov destination, source
You can move a value from one register to another (note that the instruction copies the value, in spite of its name 'move', to the destination).
mov edx, ecx
The instruction above copies the contents of ecx to edx. The size of source and destination should be the same; this instruction, for example, is NOT valid:
mov al, ecx ; NOT VALID
This opcode tries to put a DWORD (32-bit) value into a byte (8-bit). This can't be done by the mov instruction (there are other instructions to do this). But these instructions are allowed, because source and destination don't differ in size:
mov al, bl
mov cl, dl
mov cx, dx
mov ecx, ebx
Memory locations are indicated with an offset (in win32, for more info see the memory section above). You can also get a value from a certain memory location and put it in a register. Take the following table as example:
offset  34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data    0D 0A 50 32 44 57 25 7A 5E 72 EF 7D FF AD C7
(each block represents a byte)
The offset value is indicated as a byte here, but it is a 32-bit value. Take for example 3A (which isn't a common value for an offset, but otherwise the table won't fit...), this also is a 32-bit value: 0000003Ah. Just to save space, some unusual and low offsets are used. All values are hexcodes.
Look at offset 3A in the table above. The data at that offset is 25, 7A, 5E, 72, EF, etc. To put the value at offset 3A in, for example, a register you use the mov instruction, too:
mov eax, dword ptr [0000003Ah]
...but you will see this more commonly in programs as
mov eax, dword ptr [ecx+45h]
This means ecx+45h will point to the memory location to take the 32-bit data from. We know it's 32-bit because of the dword in the instruction. To take, say, 16 bits of data we use WORD PTR, or for 8 bits BYTE PTR, like the following examples:
mov cl, byte ptr [34h] ; cl will get the value 0Dh (see table above)
mov dx, word ptr [3Eh] ; dx will get the value 7DEFh (see table above, remember that the bytes are reversed)
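The "bytes are reversed" part trips a lot of people up, so here is a Python sketch that reads the table above the way an x86 processor would (little-endian: the byte at the lowest offset is the lowest byte of the value). The MEMORY dictionary and the read function are names made up for this illustration.

```python
# The table above: offsets 34h..42h and the byte stored at each one.
MEMORY = dict(zip(range(0x34, 0x43),
                  bytes.fromhex("0D0A50324457257A5E72EF7DFFADC7")))


def read(offset, size):
    """Read `size` bytes starting at `offset`, little-endian like x86."""
    raw = bytes(MEMORY[offset + i] for i in range(size))
    return int.from_bytes(raw, "little")


print(f"{read(0x34, 1):02X}")  # byte ptr [34h]  -> prints: 0D
print(f"{read(0x3E, 2):04X}")  # word ptr [3Eh]  -> prints: 7DEF
print(f"{read(0x3A, 4):08X}")  # dword ptr [3Ah] -> prints: 725E7A25
```

So a dword read at offset 3A picks up the bytes 25, 7A, 5E, 72 and the register ends up holding 725E7A25h.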
The size sometimes isn't necessary:
mov eax, [00403045h]
because eax is a 32-bit register, the assembler assumes (and this is the only way to do it, too) it should take a 32-bit value from memory location 403045.
Immediate numbers are also allowed:
mov edx, 5006
This will just make the register edx contain the value 5006. The brackets, [ and ], are used to get a value from the memory location between the brackets; without brackets it is just a value. A register as memory location is allowed too (it should be a 32-bit register in 32-bit programs):
mov eax, 403045h ; make eax have the value 403045 hex.
mov cx, [eax]    ; put the word size value at the memory location EAX (403045) into register CX.
In mov cx, [eax], the processor first looks at what value (= memory location) eax holds, then at what value is at that location in memory, and puts this word (16 bits because the destination, cx, is a 16-bit register) into CX.
Phew
ADD,SUB,MUL and DIV
These are easy to understand. Good old maths, I'm sure everyone can add and subtract and multiply and divide.
Anyways on with the info
The add-opcode has the following syntax:
add destination, source
The calculation performed is destination = destination + source. The following forms are allowed:
Destination  Source           Example
Register     Register         add ecx, edx
Register     Memory           add ecx, dword ptr [104h] / add ecx, [edx]
Register     Immediate value  add eax, 102
Memory       Immediate value  add dword ptr [401231h], 80
Memory       Register         add dword ptr [401231h], edx
This instruction is very simple. It just takes the source value, adds the destination value to it and then puts the result in the destination. Other mathematical instructions are:
SUB destination, source  (destination = destination - source)
MUL source               (edx:eax = eax * source)
DIV source               (eax = edx:eax / source, edx = remainder)
It's easy peasy, ain't it? Or is it?
Subtraction works the same as add. Multiplication with MUL takes only one operand: eax is multiplied by the source and the 64-bit result is stored in edx:eax, with edx holding the upper 32 bits (the two-operand form dest = dest * source belongs to IMUL, the signed multiply). Division is a little different. Because registers hold integer values (i.e. whole numbers, not floating point numbers), the result of a division is split into a quotient and a remainder. For example:
28 / 6  --> quotient = 4, remainder = 4
30 / 9  --> quotient = 3, remainder = 3
97 / 10 --> quotient = 9, remainder = 7
18 / 6  --> quotient = 3, remainder = 0
Now, depending on the size of the source, the quotient is stored in (a part of) eax, the remainder in (a part of) edx:
Source size      Division            Quotient stored in   Remainder stored in
BYTE (8-bits)    ax / source         AL                   AH
WORD (16-bits)   dx:ax* / source     AX                   DX
DWORD (32-bits)  edx:eax* / source   EAX                  EDX
* = For example: if dx = 2030h and ax = 0040h, dx:ax = 20300040h. dx:ax is a dword value where dx represents the higher word and ax the lower. Edx:eax is a quadword value (64 bits) where the higher dword is edx and the lower eax.
The source of the div-opcode can be:
an 8-bit register (al, ah, cl,...)
a 16-bit register (ax, dx, ...)
a 32-bit register (eax, edx, ecx...)
an 8-bit memory value (byte ptr [xxxx])
a 16-bit memory value (word ptr [xxxx])
a 32-bit memory value (dword ptr [xxxx])
The source can not be an immediate value because then the processor cannot determine the size of the source operand.
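Here is a rough Python model of the unsigned DIV behaviour described above, in case you want to try a few numbers. It is a sketch, not the processor's real implementation: the function name div and its parameters are made up, the high and low halves stand for ah:al, dx:ax or edx:eax depending on the size, and the error cases model the divide error the processor raises when the source is zero or the quotient does not fit.

```python
def div(high, low, source, bits):
    """Model unsigned DIV. `bits` is the source size: 8, 16 or 32.

    high:low is the dividend (ah:al, dx:ax or edx:eax).
    Returns (quotient, remainder) as stored in al/ah, ax/dx or eax/edx.
    """
    if source == 0:
        raise ZeroDivisionError("the CPU raises a divide error here")
    dividend = (high << bits) | low          # glue the two halves together
    quotient, remainder = divmod(dividend, source)
    if quotient >> bits:                     # quotient too big for its register
        raise OverflowError("the CPU raises a divide error here")
    return quotient, remainder


print(div(0, 97, 10, 8))                     # prints: (9, 7)
q, r = div(0x2030, 0x0040, 0x4000, 16)       # dx:ax = 20300040h
print(f"{q:04X} {r:04X}")                    # prints: 80C0 0040
```

This is also why you normally clear edx (or dx, or ah) before a DIV when the number you want to divide fits in the low half: whatever is left in the high half counts as part of the dividend.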
BITWISE OPS
These instructions all take a destination and a source, except the 'NOT' instruction. Each bit in the destination is compared to the same bit in the source, and depending on the instruction, a 0 or a 1 is placed in the destination bit:
Instruction         AND     OR      XOR   NOT
Source bit       |0 0 1 1|0 0 1 1|0 0 1 1|0 1|
Destination bit  |0 1 0 1|0 1 0 1|0 1 0 1|X X|
Output bit       |0 0 0 1|0 1 1 1|0 1 1 0|1 0|
AND sets the output bit to 1 if both the source and destination bits are 1.
OR sets the output bit if either the source or destination bit is 1.
XOR sets the output bit if the source bit is different from the destination bit.
NOT inverts the source bit.
An example:
mov ax, 3406
mov dx, 13EAh
xor ax, dx
ax = 3406 (decimal), which is 0000110101001110 in binary.
dx = 13EA (hex), which is 0001001111101010 in binary.
Perform the XOR operation on these bits:
Source       0001001111101010 (dx)
Destination  0000110101001110 (ax)
Output       0001111010100100 (new ax)
The new ax is 0001111010100100 (7844 decimal, 1EA4 in hex) after the instruction. dx is unchanged, because the result always goes into the destination.
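The XOR example is easy to double-check with a couple of lines of Python. This is only a sketch of the arithmetic for xor ax, dx, and the helper name xor16 is made up for this illustration.

```python
MASK16 = 0xFFFF  # ax and dx are 16-bit registers


def xor16(dest, src):
    """Model `xor dest, src` on 16-bit registers: the result goes to dest."""
    return (dest ^ src) & MASK16


ax = 3406        # mov ax, 3406   (decimal)
dx = 0x13EA      # mov dx, 13EAh
ax = xor16(ax, dx)
print(f"{ax:016b}  hex {ax:04X}  dec {ax}")
# prints: 0001111010100100  hex 1EA4  dec 7844
```

The result lands in ax (the destination) and dx keeps its value 13EAh.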
Another example:
mov ecx, 0FFFF0000h ; the leading 0 tells the assembler this is a number, not a name
not ecx
FFFF0000 is in binary 11111111111111110000000000000000 (16 1's, 16 0's)
If you take the inverse of every bit, you get:
00000000000000001111111111111111 (16 0's, 16 1's), which is 0000FFFF in hex.
So ecx is after the NOT operation 0000FFFFh.
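A small Python sketch of the same NOT, for checking by hand. Python integers have no fixed width, so the result is masked to 32 bits to mimic ecx; the name not32 is made up for this illustration.

```python
def not32(value):
    """Model `not reg` on a 32-bit register: flip every one of the 32 bits."""
    return ~value & 0xFFFFFFFF   # the mask throws away Python's extra sign bits


print(f"{not32(0xFFFF0000):08X}")  # prints: 0000FFFF
```

Doing NOT twice gives back the original value, since every bit is flipped and then flipped back.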
The last one is handy for serial generating, as is XOR. In fact XOR is used more for serials than any other instruction, and is widely used for serial checking in Winzip, Winrar, EA Games, Vivendi Universalis
I WONT TELL YOU HOW TO MAKE KEYGENS SO DONT ASK :)
INC/DEC(REMENTS)
There are 2 very simple instructions, DEC and INC. These instructions increase or decrease a memory location or register by one. Simply put:
inc reg -> reg = reg + 1
dec reg -> reg = reg - 1
inc dword ptr [103405] -> value at [103405] will increase by one.
dec dword ptr [103405] -> value at [103405] will decrease by one.
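One thing worth knowing: a register has a fixed size, so counting past the largest value wraps around to zero (and counting below zero wraps to the largest value). A minimal Python sketch for a 32-bit register, with the made-up names inc32 and dec32:

```python
def inc32(value):
    """Model `inc reg` on a 32-bit register, including wraparound."""
    return (value + 1) & 0xFFFFFFFF


def dec32(value):
    """Model `dec reg` on a 32-bit register, including wraparound."""
    return (value - 1) & 0xFFFFFFFF


print(f"{inc32(0x00000063):08X}")  # prints: 00000064
print(f"{inc32(0xFFFFFFFF):08X}")  # prints: 00000000
print(f"{dec32(0x00000000):08X}")  # prints: FFFFFFFF
```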
Ahh, an easy one to understand. So is the next one.
NOP
This instruction does absolutely nothing. This instruction just occupies space and time. It is used for filling purposes and patching codes.
BIT rotation and shifting
Note: Most of the examples below use 8-bit numbers, but this is just to make the picture clear.
Shifting functions
SHL destination, count
SHR destination, count
SHL and SHR shift a count number of bits in a register/memlocation left or right.
Example:
; al = 01011011 (binary) here
shr al, 3
This means: shift all the bits of the al register 3 places to the right. So al will become 00001011. The bits on the left are filled up with zeroes and the bits on the right are shifted out. The last bit that is shifted out is saved in the carry flag.
The carry bit is a bit in the processor's Flags register. This is not a register like eax or ecx that you can directly access (although there are opcodes to do this), but its contents depend on the result of the instruction. This will be explained later; the only thing you have to remember now is that the carry is a bit in the flag register and that it can be on or off. This bit equals the last bit shifted out.
shl is the same as shr, but shifts to the left.
; bl = 11100101 (binary) here
shl bl, 2
bl is 10010100 (binary) after the instruction. The last two bits are filled up with zeroes, and the carry bit is 1, because the bit that was last shifted out is a 1.
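Both examples can be replayed with a short Python sketch that also tracks the carry bit. It models 8-bit registers only and assumes a count between 1 and 8; the names shr8 and shl8 are made up for this illustration.

```python
def shr8(value, count):
    """Model `shr reg8, count`. Returns (result, carry). Assumes 1 <= count <= 8."""
    value &= 0xFF
    carry = (value >> (count - 1)) & 1       # last bit pushed out on the right
    return value >> count, carry


def shl8(value, count):
    """Model `shl reg8, count`. Returns (result, carry). Assumes 1 <= count <= 8."""
    value &= 0xFF
    carry = (value >> (8 - count)) & 1       # last bit pushed out on the left
    return (value << count) & 0xFF, carry


al, cf = shr8(0b01011011, 3)
print(f"{al:08b} carry={cf}")  # prints: 00001011 carry=0
bl, cf = shl8(0b11100101, 2)
print(f"{bl:08b} carry={cf}")  # prints: 10010100 carry=1
```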
Then there are two other opcodes:
SAL destination, count (Shift Arithmetic Left)
SAR destination, count (Shift Arithmetic Right)
SAL is the same as SHL, but SAR is not quite the same as SHR. SAR does not shift in zeroes but copies the MSB (most significant bit, the leftmost bit): if it is 1, 1's are moved in from the left, and if it is 0, 0's are placed from the left. Example:
al = 10100110
sar al, 3
al = 11110100
sar al, 2
al = 11111101

bl = 00100110
sar bl, 3
bl = 00000100
This one you may have problems getting to grips with.
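If SAR feels strange, it may help to see it as a shift on a signed number: the sign stays the same. Here is a Python sketch for 8-bit registers that reproduces the three examples; the name sar8 is made up for this illustration, and it leans on the fact that Python's >> on a negative integer already fills with the sign.

```python
def sar8(value, count):
    """Model `sar reg8, count`: shift right, copying the sign bit in on the left."""
    value &= 0xFF
    # Reinterpret the 8 bits as a signed number (-128..127).
    signed = value - 0x100 if value & 0x80 else value
    return (signed >> count) & 0xFF


print(f"{sar8(0b10100110, 3):08b}")  # prints: 11110100
print(f"{sar8(0b11110100, 2):08b}")  # prints: 11111101
print(f"{sar8(0b00100110, 3):08b}")  # prints: 00000100
```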
Rotation functions
rol destination, count ; rotate left
ror destination, count ; rotate right
rcl destination, count ; rotate through carry left
rcr destination, count ; rotate through carry right
Rotation looks like shifting, with the difference that the bits that are shifted out are shifted in again on the other side:
Example: ror (rotate right)
                 Bit 7  Bit 6  Bit 5  Bit 4  Bit 3  Bit 2  Bit 1  Bit 0
Before             1      0      0      1      1      0      1      1
Rotate count 3                          1      0      0      1      1    0 1 1 (shift out)
Result             0      1      1      1      0      0      1      1
As you can see in the figure above, the bits are rotated, i.e. every bit that is pushed out is shifted in again on the other side. Like shifting, the carry bit holds the last bit that's shifted out. RCL and RCR are not quite the same as ROL and ROR: they rotate through the carry bit, so the carry acts as an extra bit in the rotation. The bit that is pushed out goes into the carry, and the old value of the carry is shifted in on the other side.
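To see what rotating does to the carry, here is a Python sketch for 8-bit registers. ror8 rotates the 8 bits among themselves, while rcr8 treats the carry flag as a ninth bit sitting above bit 7 and rotates all 9 bits together. The function names are made up for this illustration, and the counts are assumed to be between 1 and 7.

```python
def ror8(value, count):
    """Model `ror reg8, count`. Returns (result, carry). Assumes 1 <= count <= 7."""
    value &= 0xFF
    result = ((value >> count) | (value << (8 - count))) & 0xFF
    carry = (result >> 7) & 1    # the last bit rotated out lands in bit 7
    return result, carry


def rcr8(value, carry, count):
    """Model `rcr reg8, count`: a 9-bit rotate of carry:value. Assumes 1 <= count <= 7."""
    wide = ((carry & 1) << 8) | (value & 0xFF)   # carry sits above bit 7
    wide = ((wide >> count) | (wide << (9 - count))) & 0x1FF
    return wide & 0xFF, wide >> 8


result, cf = ror8(0b10011011, 3)
print(f"{result:08b} carry={cf}")  # prints: 01110011 carry=0
result, cf = rcr8(0b10011011, 0, 3)
print(f"{result:08b} carry={cf}")  # prints: 11010011 carry=0
```

Starting from the same value, the rotate through carry gives a different result because the old carry (0 here) is shifted in as one of the bits.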
Exchange
Quite straightforward this, I won't go into major details. It just swaps the values of two registers (values, addresses). For example:
eax = 237h
ecx = 978h
xchg eax, ecx
eax = 978h
ecx = 237h
Anyways, end of day 1. If you get this into your head, the following days will get easier rather than harder. These are the basics I've taught you. Learn 'em well.