:: A Computing Geek's Copy-Paste Notes

Getting rid of screen flicker with double buffering

Posted by 빵빵빵

2011/03/08 14:41 Computing (Computers)/PC-Windows

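Goal: prevent flicker when the screen is redrawn.

Key ideas:

- Flicker means that an unrelated, unwanted frame briefly shows up between two frames.

- The less time it takes to draw the screen, the less time a half-finished frame is visible, and the cleaner the transition looks overall.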

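Theory:

In the figure above, the white area in the conventional redraw is what causes the flicker: the window is cleared to its background before the new frame is drawn.

Double buffering composes the new frame off-screen and then outputs the finished frame all at once, so no unwanted frame appears between the two.

If you redraw the screen with InvalidateRect(hWnd, NULL, FALSE);, the previous contents are not erased first, so the basic flicker goes away, but the drawing process itself may become visible. Flicker can still occur, especially where drawings overlap.

In simple programs drawing is fast enough that redraws look fine with or without double buffering, but when you write programs that use a fair amount of CPU, such as games, double buffering is practically essential if you want a comfortable experience for the user.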

Example source:

#include <windows.h>
#include <stdio.h> // for sprintf_s

char Str[32]; // a plain string buffer

RECT crt; // rectangle holding the client area

// Double buffering is usually combined with something like a timer.

// This example tries to code only what relates to double buffering,

// so it is normal that, when run, it looks hardly different from ordinary drawing.

LRESULT CALLBACK WndProc( HWND hWnd, UINT iMessage, WPARAM wParam, LPARAM lParam )
{
    HDC hdc, hMemDC; // declare one more HDC. An HDC is a handle to a device context, the "drawing job".

    HBITMAP hBitmap, OldBitmap; // an HBITMAP is roughly a sheet of paper. Declare two sheets.
    PAINTSTRUCT ps;

    switch( iMessage ) {
    case WM_CREATE:

        GetClientRect(hWnd, &crt); // remember the current client area in crt at startup

        return 0;

    case WM_ERASEBKGND: // skip the default background erase (the white flash);
        return 1;       // WM_PAINT below covers the whole client area anyway

    case WM_PAINT: // paint event, triggered e.g. by InvalidateRect(hWnd, NULL, FALSE);
        hdc = BeginPaint(hWnd,&ps); // start drawing

        GetClientRect(hWnd, &crt); // re-read the size in case the window was resized

        hMemDC = CreateCompatibleDC(hdc); // create a memory DC compatible with the screen DC (hdc)
        hBitmap = CreateCompatibleBitmap(hdc, crt.right, crt.bottom); // create paper the size of crt
        OldBitmap = (HBITMAP)SelectObject(hMemDC, hBitmap); // swap in the new paper

        // A new DC comes with default objects (pen, brush, paper and so on),
        // but its default bitmap is only a 1x1 monochrome pixel, so we have to
        // swap in paper the size of crt, much like selecting a font.

        // Clear the paper first: the contents of a new bitmap are undefined.
        FillRect(hMemDC, &crt, (HBRUSH)GetStockObject(WHITE_BRUSH));

        // Draw here. Note that you must draw on hMemDC instead of hdc.

        sprintf_s(Str, sizeof(Str), "Double buffering"); TextOutA(hMemDC,10,10,Str,lstrlenA(Str));

        sprintf_s(Str, sizeof(Str), "No flicker on redraw"); TextOutA(hMemDC,10,30,Str,lstrlenA(Str));

        BitBlt(hdc, 0, 0, crt.right, crt.bottom, hMemDC, 0, 0, SRCCOPY); // copy the finished image

        // Copies the area of hMemDC that starts at (0,0) and is crt.right wide and
        // crt.bottom high (that is, the crt area) to position (0,0) of hdc.

        DeleteObject(SelectObject(hMemDC, OldBitmap)); // put the original paper back, then delete ours
        DeleteDC(hMemDC); // delete hMemDC

        EndPaint(hWnd,&ps); // finish drawing
        return 0;

    case WM_DESTROY:
        PostQuitMessage( 0 );
        return 0;
    }

    return DefWindowProc( hWnd, iMessage, wParam, lParam );
}

int APIENTRY WinMain( HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpszCmdParam, int nCmdShow )
{ ...(omitted; see http://cafe.naver.com/buildgame/71)... }

Source: http://cafe.naver.com/buildgame.cafe?iframe_url=/ArticleRead.nhn%3Farticleid=94&

Sites and examples explaining ARM assembly (asm vs. C comparison)

Posted by 빵빵빵

2011/02/22 15:27 Computing (Computers)/Mobile-CE&PPC

http://kkamagui.springnote.com/pages/432792

http://downrg.com/i/entry/417

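※ Terminology
 Rd: Destination Register / Rn: Operand1 Register / Rm: Operand2 Register
 <cond>: Execution Condition code
 <S>: S suffix (Status update suffix): updates the condition flags (N, Z, C, V) in CPSR from the result. If Rd is pc, it instead copies SPSR into CPSR (used to return from an exception).
 <!>: ! suffix (Writeback suffix): after the pre-indexed address calculation inside [ , ], writes the updated address back to the base register
 <Operand2>: the forms Operand2 can take
a. #Immediate: in 32-bit (ARM) instructions, an immediate must be an 8-bit pattern rotated right by an even number of bits
b. Rm{, shift #immediate}: the value of register Rm, shifted by #immediate

   § Shift types
   - asr (Arithmetic Shift Right): shift right by Immediate; the vacated top bits are filled with the sign bit (sign extension)
   - lsr (Logical Shift Right): shift right by Immediate; the vacated top bits are filled with 0
   - lsl (Logical Shift Left): shift left by Immediate; the vacated bottom bits are filled with 0
   - ror (ROtate Right): rotate right by Immediate; the last bit rotated out of bit 0 (which becomes bit 31) goes to carry
   - rrx (Rotate Right eXtend): rotate right by exactly 1 bit through carry; the old carry enters bit 31 and bit 0 goes to carry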

1. Data processing instructions (General Data Processing Instructions)

1.1 Arithmetic

Syntax: add<cond><S> Rd, Rn, <Operand2>

add: Rd := Rn + <Operand2>
sub: Rd := Rn - <Operand2>
adc (ADd with Carry): Rd := Rn + <Operand2> + C
sbc (SuBtract with Carry): Rd := Rn - <Operand2> - NOT C

rsb (Reverse SuBtract): Rd := <Operand2> - Rn
rsc (Reverse Subtract with Carry): Rd := <Operand2> - Rn - NOT C

1.2 Logical operations

Syntax: and<cond><S> Rd, Rn, <Operand2>

and: Rd := Rn & <Operand2>
orr: Rd := Rn | <Operand2>
eor: Rd := Rn ^ <Operand2>
bic (BIt Clear): Rd := Rn & ~<Operand2>

1.3 Moving values into registers

Syntax: mov<cond><S> Rd, <Operand2>

mov: Rd := <Operand2>
mvn (MoVe Not): Rd := ~<Operand2>

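1.4 Comparison

Syntax: cmp<cond> Rn, <Operand2>

cmp: subtracts Operand2 from Rn and reflects the result in the status flags; same as SUBS, except that the result is discarded
cmn: adds Operand2 to Rn and reflects the result in the status flags; same as ADDS, except that the result is discarded

tst: performs a bitwise AND of Rn and Operand2 and reflects the result in the status flags; same as ANDS, except that the result is discarded
teq: performs a bitwise XOR of Rn and Operand2 and reflects the result in the status flags; same as EORS, except that the result is discarded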

2. Memory access instructions (Memory Access Instructions)

Syntax: ldr<cond><B> Rd, label
 ldr<cond><B><T> Rd, [Rn]
 ldr<cond><B> Rd, [Rn, FlexOffset]<!> ;Pre-Indexed<Auto-Indexing>
 ldr<cond><B><T> Rd, [Rn], FlexOffset ;Post-Indexed

<B>: with the B suffix, memory is accessed as an 8-bit unsigned byte; without it, as a 32-bit word
<T>: with the T suffix, the processor performs the memory access as if it were in User mode
FlexOffset:
 a. #Immediate: a constant between -4095 and +4095
 b. {-}Rm{, shift}: Rm may carry a minus sign (it is subtracted), and Rm may also be shifted

 2.1 Load and store examples

ldr r0, [r1]: loads into r0 the value in memory at the address held in r1
str r0, [r1], #4: stores r0 to memory at the address in r1, then adds 4 to r1
Note) To load a signed halfword or signed byte, use SH (Signed Halfword) or SB (Signed Byte) <-- (ldr only).
To load or store an unsigned halfword, use H.
For a doubleword, use D. For these halfword, signed-byte and doubleword forms, the offset is limited to an 8-bit immediate or {-}Rm (no shift).

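 2.2 Multiple load or store instructions

Syntax: ldm<cond><addrmode> Rn<!>, {reglist}<^>

<addrmode>: there are 8 address modes in total; 4 are distinguished by how the address
is stepped and 4 by the type of stack.
- IA (Increment After each transfer), - IB (Increment Before each transfer)
- DA (Decrement After each transfer), - DB (Decrement Before each transfer)

- FD (Full Descending stack): the stack pointer points at the last stored item, and the stack grows toward lower addresses
- ED (Empty Descending stack): the stack pointer points at the next free slot, and the stack grows toward lower addresses
- FA (Full Ascending stack): the stack pointer points at the last stored item, and the stack grows toward higher addresses
- EA (Empty Ascending stack): the stack pointer points at the next free slot, and the stack grows toward higher addresses

<!>: with the ! suffix, the final address (where the pointer ended up) is written back to Rn
<^>: for ldm with pc in reglist, also copies SPSR into CPSR (like the S suffix; used to return from an exception); otherwise the User-mode registers are transferred

ldm: loads as many values as there are registers in reglist, starting at the address in Rn
stm: stores the values of the registers in reglist to memory, starting at the address in Rn

[Caution] Whatever order you write them in reglist, lower-numbered registers are always stored to
or loaded from lower memory addresses. reglist can be written as 'r1,r2,r3' or 'r1-r3'.
[Common form] STMFD sp!, {r4-r7,lr} / LDMFD sp!, {r4-r7,pc}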

3. Branch instructions (Branch Instruction)

Syntax: b<cond> label

   b: branches to the address of label (loads the address of label into pc)
   bl: stores the address of the next instruction in lr, then branches to label like b

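4. Other instructions
   4.1 Software Interrupt

Syntax: swi<cond> Immediate_24bit

swi: raises a software interrupt with the given number and branches to the SWI vector (the handler then uses the number to decide what to do)
(When a software interrupt is taken, the processor switches to Supervisor mode.)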

4.2 PSR Access

Syntax: mrs<cond> Rd, psr

Reads the PSR named by psr (cpsr or spsr) and stores it in Rd (Register <- PSR)

Syntax: msr<cond> psr_<fields>, #Immediate_8bit
msr<cond> psr_<fields>, Rm

Stores the value of register Rm or an 8-bit immediate into the psr (cpsr or spsr) (Register -> PSR)
<fields>: any combination of f, s, x, c (f = bits 31-24, s = 23-16, x = 15-8, c = 7-0); only the selected fields are written.

[Caution] Do not access SPSR while the processor is in User or System mode (those modes have no SPSR).
[Common form] msr CPSR_c, r0

5. Status flags and execution condition codes (Status Flags & Execution Condition Codes)
   N: Set '1' when the result is negative
   Z: Set '1' when the result is zero
   C: Set '1' when the operation produces a carry
   V: Set '1' when the operation causes an overflow

<ARM Instruction Set>

① opcode<cond><S> Rd, Rn, #Immediate
② opcode<cond><S> Rd, Rn, Rm OP #Imm
③ opcode<cond><S> Rd, Rn, Rm OP Rs
- For cmp and cmn (and likewise tst and teq), the Rd field must always be '0' (SBZ, Should Be Zero).

④ opcode<cond> Rd, Rn, #Immediate
⑤ opcode<cond> Rd, Rn, Rm OP #Imm
⑥ opcode<cond> Rd, <address>
⑦ opcode<cond><addrmode> Rn, Register_List^
⑧ opcode<cond><addrmode> Rn<!>, Register_List
⑨ opcode<cond><addrmode> Rn<!>, Register_List^
- P='1' Pre, P='0' Post / U='1' Increment, U='0' Decrement / B='1' Byte load, B='0' Word load /
W='1' Write-back (Auto-Index), W='0' no write-back / L='1' the opcode is ldr, L='0' str /
I='1' the two Addr_mode fields together form the offset, I='0' the first Addr_mode field is '0' and the second is Rm /
S='1' Signed, S='0' Unsigned / H='1' Half Word, H='0' Word or Byte

⑩ b<cond> #Target Address (24-bit offset): if L is '1', it is the bl instruction

⑪ SWI #SWI Number

⑫ mrs<cond><S> Rd, PSR
⑬ msr<cond><S> PSR_<Field_Mask>, Rm
⑭ msr<cond><S> PSR_f, #Immediate
- If S is '1' the SPSR is used, if '0' the CPSR (ARM's documentation calls this the R bit).
- SBO (Should Be One) fields must be set to '1' and SBZ (Should Be Zero) fields to '0'.

ARM_Reference-rE.Ejected.pdf

<References>
- ARM Developer Suite 1.2 Assembler Guide (ARM DUI 0068B):
http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/DUI0068.pdf
- ARM Assembly Language Programming: http://www.arm.com/miscPDFs/9658.pdf
- kkamagui's programming workshop, ARM assembly: http://kkamagui.springnote.com/pages/432792
- ARM Instruction Quick Finder: http://www.heyrick.co.uk/assembler/qfinder.html
- ARM Reference - rE Ejected: http://re-eject.gbadev.org/ => source of ARM_Reference-rE.Ejected.pdf


More on ARM assembly

; generated by Thumb C Compiler, ADS1.2 [Build 805]

; commandline [-O2 -S -IC:\apps\ADS12\INCLUDE]
        CODE16                          ; 1. So this was compiled in Thumb mode.

        AREA ||.text||, CODE, READONLY  ; 2. The section is named ||.text|| and contains code.

calc PROC                               ; 3. The function seems to be called calc.
        PUSH   {r3-r7,lr}               ; 4. On entry it saves r3-r7 and lr on the stack (matching the POP in step 17).

        LDR    r6,|L1.32|               ; 5. Load the word stored at |L1.32| (the address of data) into r6.
        MOV    r5,r0                    ; 6. r0 is the incoming argument, whatever it is; copy it into r5.
        MOV    r4,#0                    ; 7. Put 0 in r4.
        MOV    r7,#2                    ; 8. Put 2 in r7.

|L1.10|
        LSL    r5,r5,r7                 ; 9. Shift r5 left by r7 (= 2) bits, i.e. multiply it by 4, and keep it in r5.
        ADD    r4,r4,r5                 ; 10. Store r4 + r5 back into r4.
        MOV    r0,r4                    ; 11. Copy r4 into r0,
        LDR    r1,[r6,#4]               ; data  12. and load into r1 the word at r6 + 4.
        BL     manual                   ; 13. Call manual with r0 and r1; bl puts the address of the following CMP in lr.
        CMP    r0,#0                    ; 14. Check whether manual returned 0:
        BNE    |L1.10|                  ; 15. if not, jump back to |L1.10|.
        MOV    r0,r4                    ; 16. To return, put r4 in r0 (the return value)
        POP    {r3-r7,pc}               ; 17. and return to the caller.
        DCW    0000                     ; padding so the DCD below is word-aligned
|L1.32| DATA
        DCD    data
        ENDP

        AREA ||.data||, DATA, ALIGN=2

||.data$0||
data
        DCB    0x0000000a
        DCB    0x00000014
        DCB    0x0000001e
        DCB    0x0000002

So the code runs a loop that keeps going as long as the function manual returns a nonzero value. Inside the loop, r5 is shifted left by 2 (multiplied by 4) each time and added into r4. manual takes two arguments: the running total r4 and r1, a value fetched from a particular memory location.

The function finally returns the accumulated value r4. Does that give you a rough idea?

If we roughly reverse engineer this back into C code, we can imagine something like the following:

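int data[] = { 10, 20, 30, 40 }; // LDR r1,[r6,#4] reads a 32-bit word at offset 4, i.e. data[1]

int manual(int sum, int value); // defined elsewhere, reached with BL

int calc(int a)
{
    int sum = 0; // probably r4

    do
    {
        a = a << 2; // LSL by r7 (= 2), the same as a * 4 (in C, a^4 would be XOR, not a power)
                    // probably r5

        sum = sum + a; // r4 = r4 + r5
    } while (manual(sum, data[1]) != 0); // r4 and the word at r6 + 4 go to manual; BNE loops while it returns nonzero

    return sum; // return r4
}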

Source: http://recipes.egloos.com/5027277

Directives

1) EXPORT i (== a global variable)
-> Tells the assembler that the symbol i, defined for the first time in this file, is referenced
from other object files.

2) IMPORT i (== extern int i;)
-> Tells the assembler that the variable i used in this source file is defined in another
object file.

3) DCD (like "=" initialization in C)
-> "table DCD 1,2,4" allocates three 32-bit words, puts 1, 2 and 4 in them as initial values,
and the start address is represented by the label table.
In C this would be unsigned int table[3] = {1,2,4};

4) AREA (== a segment)
-> Every program contains two broad kinds of machine-code regions:
-> one holding the code that is actually executed, and one holding the variables that code reads
and writes. The former is the CODE area and the latter the DATA area.

--------------------------------------------Not tidied up yet-------------------------------------------------------

5) ALIGN
-> In ARM state every instruction is 4 bytes long, so the machine code is decoded in 4-byte units. If you define an address or variable of 1 or 2 bytes in the middle of the assembly code, what follows may no longer start at a multiple of 4, which can cause errors. ALIGN fixes this by padding with zero-initialized bytes up to the next multiple of 4 (the default boundary).

6) SPACE
-> Reserves the given number of bytes of memory, initialized to zero.

------------------------------------------------
EXPORT INT_Loaded_Flag
INT_Loaded_Flag DCD 0

-> Reserves a 4-byte variable named INT_Loaded_Flag, initialized to 0. This variable is also referenced from other files.
------------------------------------------------
IMPORT INT_Initialize
INT_Resel_Addr DCD INT_Initialize

-> INT_Initialize is a function or variable defined in some file other than this one. This file refers to it and stores its address in a 4-byte word under the new name INT_Resel_Addr.
------------------------------------------------
INT_IRQ_Vectors
DCD INT_IRQ_Shell

-> Stores the start address of the function INT_IRQ_Shell in a word labeled INT_IRQ_Vectors, a name that is easy for a human to recognize.
-----------------------------------------------
INT_bss_start
DCD |Image$$bss$$Base|

-> To understand this, you need to know what "Image$$bss$$Base" means. Image$$bss$$Base is a label that the ARM assembler/linker understands. It is predefined by convention, so the programmer does not define it in the assembly code; the ARM linker generates it automatically. In short, Image$$bss$$Base is the start address of the bss region.
In other words, this directive gives the start address of the bss region the new label INT_bss_start
to make the program easier to write.

Source: http://gauss.egloos.com/643866

An ARM assembly example for WinCE

Posted by 빵빵빵

2011/02/17 14:45 Computing (Computers)/Mobile-CE&PPC

Source: http://blogs.arm.com/software-enablement/155-how-to-call-a-function-from-arm-assembler/

Translation: 빵빵빵

There may be mistranslations, so please don't trust it 100%... corrections are welcome.

How to Call a Function from ARM Assembler

Posted by ARM_DaveB (original author)

26 February 2010

Once you move beyond short sequences of optimised ARM assembler, the next likely step will be to manage more complex, optimised routines using macros and functions. Macros are good for short repeated sequences, but often quickly increase the size of your code. As lower power and smaller code sizes are often closely tied, it will not be long before you need to make effective and efficient use of the processor by calling functions from your carefully hand-crafted code.

(Translator's note: a macro makes a somewhat complex piece of code look as simple as a single word. If you use a macro many times, the source looks short, but the assembler assembles the full original code at every use, so the code size inevitably grows.)

Leaving, only to Return

To start, here is a small example in ARM Assembler with one function calling another.

.globl main
.extern abs
.extern printf

.text
output_str:
.ascii "The answer is %d\n\0"

@ returns abs(z)+x+y
@ r0 = x, r1 = y, r2 = z
.align 4
do_something:
push {r4, lr}
add r4, r0, r1
mov r0, r2
bl abs
add r0, r4, r0
pop {r4, pc}

main:
push {ip, lr}
mov r0, #1
mov r1, #3
mov r2, #-4
bl do_something
mov r1, r0
ldr r0, =output_str
bl printf
mov r0, #0
pop {ip, pc}

The interesting instructions, at least when we are talking about the link register and the stack, are push, pop and bl. If you are familiar with other assembler languages, then I suspect push and pop are no mystery. They simply take the provided register list and push them onto the stack - or pop them off and into the provided registers. bl, as you may have guessed, is no more than branch with link, where the address of the next instruction after the branch is loaded into the link register lr. Once the routine we are calling has been executed, lr can be copied back to pc, which will enable the CPU to continue from the code after the bl instruction.

(Translator's note: "other assembler languages" means something like x86 assembly on the PC. pc is the program counter, the register that decides which instruction the CPU executes next.)

In do_something we push the link register to the stack, so that we can pop it back off again to return, even though the call to abs will have overwritten the original contents of the link register. The program stores r4, because the ARM procedure call standard specifies that r4-r11 must be preserved between function calls and that the called function is responsible for that preservation. This means both that do_something needs to preserve the result of r0 + r1 in a register that will not be destroyed by abs, and that we must also preserve the contents of whichever register we use to hold that result. Of course in this particular case, we could have just used r3, but it is something that needs to be considered.

We push and pop the ip register, even though we do not have to preserve it, because the procedure call standard requires that the stack be 64-bit aligned. This gives a performance benefit when using the stack operations as they can take advantage of 64-bit data paths within the CPU.

We could just push the value, after all if abs needs the register, then that is how it will preserve it. There is a minor performance case for pushing r4 rather than the value we know we will need, but the strongest argument is probably that just pushing/popping any registers you need at the start and end of the function makes for less error prone and more readable code.

You will also notice that the 'main' function also pushes and pops the contents of lr. That is because while the main code may be the first thing in my code to be executed, it is not the first thing to be executed when my program is loaded. The compiler will insert calls to some basic setup functions before main is called, and to some final clean-up calls for when we exit.

The Special Case of Windows CE

Windows CE uses a technique known as Structured Exception Handling to unwind the stack when an exception occurs. This requires anyone writing assembler code to take notice of some additional restrictions when implementing for that OS. Coding examples are available on MSDN, and should be consulted, but the general idea is that there should be no changes to the value of sp other than as the very first and very last instructions in your function. If you perform a stack push or pop at any other point the virtual unwinder can cause your application some very non-virtual trouble.

Passing on

It is almost certainly worth your time becoming familiar with the details of the ARM Procedure Call Standard, but apart from the list of registers that need to be preserved, which was covered earlier, it is probably worth quickly covering the passing in of parameters and the returning of results.

The first four 32-bit values are passed in the registers r0-r3. If one of the parameters is 64 bits long, then either r0 and r1 or r2 and r3 will be used - but not r1 and r2. The endianness used is officially defined to be "as if the value had been loaded from memory representation with a single LDM instruction". Rather than looking up what that means, I would suggest simply writing some code to test it. If there are more parameters than will fit in r0-r3, then the last of the values are written to the stack before the function is called.

Results are returned in r0, or r0 and r1 if it requires 64-bits. Check the link above for more detailed information, but that should cover most cases.

Need for Speed?

One important thing to remember when working with the link register is that the latest ARM processors provide Return Stack Prediction in addition to normal branch prediction. If the processor comes across an instruction like pop {...,pc} or bx lr it will try to 'branch predict' the return. This allows the processor to successfully predict return branches when common code is called from many points and normal branch prediction techniques could not be used. On processors with longer pipelines this can be a useful optimisation. To make use of it from your assembler code you need to follow some simple guidelines:

Do

Use instructions like pop {pc} when you are returning normally
Use b instead of bl or blx if you do not expect to return to execute the next instruction
Use blx when calling code indirectly (using a value in a register) rather than loading directly to pc

Addendum: http://recipes.egloos.com/4988629

The conventions for how registers are used are collectively called

a PCS (Procedure Call Standard):

APCS : ARM Procedure Call Standard (old version)

TPCS : Thumb Procedure Call Standard (old version)

ATPCS : ARM-Thumb Procedure Call Standard (the predecessor of AAPCS)

AAPCS : Procedure Call Standard for the ARM Architecture (the current version)

So the procedure call standard (register usage convention) in use today is properly called AAPCS.

If you learn this convention well now, it will be much easier later to understand how functions are structured, how the stack is used and so on, so be sure to learn it.

The usage of each register under AAPCS is shown in the table.