Oftentimes, it can be advantageous to create small application helpers in assembly language to decrease codespace and increase execution speed of the overall program. An example of this would be the transfer of characters to a FIFO (First-In, First-Out) storage buffer for transmission over a serial port. This simple action could be performed by an assembly language driver which would execute much quicker than coding it in C. The following discussion outlines a method of interfacing a C program with an assembly language subroutine.
The first step in creating the assembly language code for the driver is
to determine how to pass the C arguments to the assembly language
routine. The cc65 toolset allows the user to specify whether the data
is passed to a subroutine via the stack or by the processor registers by
using the __fastcall__
function declaration (note that there
are two underscore characters in front of and two behind the
fastcall
declaration). When __fastcall__
is
specified, the rightmost argument in the function call is passed to the
subroutine using the 6502 registers instead of the stack. Note that if
there is only one argument in the function call, the execution overhead
required by the stack interface routines is completely avoided.
Without __fastcall__
, the argument is loaded in the A and X
registers and then pushed onto the stack via a call to pushax
.
The first thing the subroutine does is retrieve the argument from the
stack via a call to ldax0sp
, which copies the values into the A
and X. When the subroutine is finished, the values on the stack must be
popped off and discarded via a jump to incsp2
, which includes
the RTS subroutine return command. This is shown in the following code
sample.
Calling sequence:
lda #<(L0001) ; Load A with the high order byte
ldx #>(L0001) ; Load X with the low order byte
jsr pushax ; Push A and X onto the stack
jsr _foo ; Call foo, i.e., foo (arg)
Subroutine code:
_foo: jsr ldax0sp ; Retrieve A and X from the stack
sta ptr ; Store A in ptr
stx ptr+1 ; Store X in ptr+1
... ; (more subroutine code goes here)
jmp incsp2 ; Pop A and X from the stack (includes return)
If __fastcall__
is specified, the argument is loaded into the A
and X registers as before, but the subroutine is then called
immediately. The subroutine does not need to retrieve the argument
since the value is already available in the A and X registers.
Furthermore, the subroutine can be terminated with an RTS statement
since there is no stack cleanup which needs to be performed. This is
shown in the following code sample.
Calling sequence:
lda #<(L0001) ; Load A with the high order byte
ldx #>(L0001) ; Load X with the low order byte
jsr _foo ; Call foo, i.e., foo (arg)
Subroutine code:
_foo: sta ptr ; Store A in ptr
stx ptr+1 ; Store X in ptr+1
... ; (more subroutine code goes here)
rts ; Return from subroutine
The hardware driver in this example writes a string of character data to a hardware FIFO located at memory location $1000. Each character is read and is compared to the C string termination value ($00), which will terminate the loop. All other character data is written to the FIFO. For convenience, a carriage return/line feed sequence is automatically appended to the serial stream. The driver defines a local pointer variable which is stored in the zero page memory space in order to allow for retrieval of each character in the string via the indirect indexed addressing mode.
The assembly language routine is stored in a file names "rs232_tx.s" and is shown below:
; ---------------------------------------------------------------------------
; rs232_tx.s
; ---------------------------------------------------------------------------
;
; Write a string to the transmit UART FIFO
.export _rs232_tx
.exportzp _rs232_data: near
.define TX_FIFO $1000 ; Transmit FIFO memory location
.zeropage
_rs232_data: .res 2, $00 ; Reserve a local zero page pointer
.segment "CODE"
.proc _rs232_tx: near
; ---------------------------------------------------------------------------
; Store pointer to zero page memory and load first character
sta _rs232_data ; Set zero page pointer to string address
stx _rs232_data+1 ; (pointer passed in via the A/X registers)
ldy #00 ; Initialize Y to 0
lda (_rs232_data) ; Load first character
; ---------------------------------------------------------------------------
; Main loop: read data and store to FIFO until \0 is encountered
loop: sta TX_FIFO ; Loop: Store character in FIFO
iny ; Increment Y index
lda (_rs232_data),y ; Get next character
bne loop ; If character == 0, exit loop
; ---------------------------------------------------------------------------
; Append CR/LF to output stream and return
lda #$0D ; Store CR
sta TX_FIFO
lda #$0A ; Store LF
sta TX_FIFO
rts ; Return
.endproc