My journey with ATTiny4313 (part 4)

Part 4: Dealing with interrupts

Managing the /STROBE line

The protocol between the driver and the Soundpool MO4

The communication between the Atari and the Soundpool MO4 is achieved via the parallel port (or printer port, sometimes called "Centronics") and a specific driver; I've already detailed the connection in this article.

What is /STROBE used for?

Since the parallel port is asynchronous, /STROBE is used to tell the MO4 that a new byte is available.
But /STROBE is also used to reset the MO4 when it is pulsed 4 times in a row. In response, BUSY is expected to go high in less than 2 microseconds.
The reset is requested when the driver starts, and after each batch of data has been sent.
The driver always sends data to the 4 outputs of the MO4, from output 1 to 4: for each output, one byte is sent to indicate how many bytes will follow for this output, then the data bytes:

  1. Send data to the MO4:
    1. Send data for output 1
      1. Write the quantity of bytes to follow
      2. Pulse /STROBE: the MO4 reads the quantity
      3. Write byte #1 to the parallel port
      4. Pulse /STROBE: the MO4 reads byte #1 and sends it to MIDI
      5. Write byte #2 to the parallel port
      6. Pulse /STROBE: the MO4 reads byte #2 and sends it to MIDI
      7. ...
    2. Repeat for output 2
    3. Repeat for output 3
    4. Repeat for output 4
  2. Reset the MO4
    1. Write 0 to the parallel port
    2. Pulse /STROBE: 0 bytes to follow for output 1
    3. Pulse /STROBE: 0 bytes to follow for output 2
    4. Pulse /STROBE: 0 bytes to follow for output 3
    5. Pulse /STROBE: 0 bytes to follow for output 4
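
To make the framing above concrete, here is a minimal C sketch that builds the byte stream for one transfer: one count byte per output, followed by that many data bytes. It is only an illustration of the steps listed above, not the driver's code; the names mo4_build_frame and MO4_OUTPUTS are made up for this example, and each byte in the result stands for one "write to the port, then pulse /STROBE" step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MO4_OUTPUTS 4   /* the MO4 has 4 MIDI outputs */

/*
 * Build the byte stream for one transfer (hypothetical helper).
 * data[i] points to the bytes for output i+1, len[i] is their count.
 * Returns the number of bytes written to out (one /STROBE pulse each),
 * or 0 if out is too small (a valid frame is never shorter than 4 bytes).
 */
size_t mo4_build_frame(const uint8_t *data[MO4_OUTPUTS],
                       const uint8_t len[MO4_OUTPUTS],
                       uint8_t *out, size_t out_size)
{
    size_t n = 0;

    assert(out != NULL);
    for (int port = 0; port < MO4_OUTPUTS; port++) {
        if (n + 1u + len[port] > out_size)
            return 0;                     /* not enough room */
        out[n++] = len[port];             /* quantity of bytes to follow */
        for (uint8_t i = 0; i < len[port]; i++)
            out[n++] = data[port][i];     /* data bytes for this output */
    }
    return n;
}
```

With this model, a transfer where all four outputs have nothing to send produces four zero bytes, which is exactly the reset sequence of step 2: 0 on the data lines and four pulses on /STROBE.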
(Assembler MC68000)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Driver write routines, called by MROS.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
driver_write_port1:
    ...
    bra.b  driver_write
driver_write_port2:
    ...
    bra.b  driver_write
driver_write_port3:
    ...
    bra.b  driver_write
driver_write_port4:
    ...
driver_write:
    ...
    bsr.w  reset_hardware
    ...
    bne.b  .dowrite
    ...
    rts
    
.dowrite:
    ...
    bsr.w   write_byte_hardware   ; Actually write the data to the hardware.
    ...
    rts

Timing issue

The tight timing doesn't allow the micro-controller to properly manage the signal (i.e. counting pulses in a time frame and raising BUSY accordingly). To overcome this issue, I decided to delegate the counting to an external chip, the decade counter 74LS90, in a divide-by-five configuration with CLKA connected to /STROBE and Qc connected to BUSY.

Each transition of /STROBE from low to high increments the counter. The outputs Qa and Qc are used to detect the changes: Qa triggers PCINT11, which starts Timer 1, and Qc triggers INT1, which stops it. Otherwise, the ATTiny must reset the decade counter by pulsing R1 and R2 (pins 2 & 3) for at least 15 ns, which brings Qa, Qb and Qc back to their initial state.
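
The behaviour described above can be modelled in a few lines of C. This is a rough software model, not a description of the real wiring: it assumes the counter simply counts pulses in binary on Qa (bit 0), Qb (bit 1) and Qc (bit 2), so that Qa changes on the first pulse and Qc (BUSY) goes high on the fourth. The type and function names (counter90 and friends) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the decade counter, for illustration only. */
typedef struct {
    uint8_t count;                        /* 0..9 */
} counter90;

/* Reset inputs (pins 2 & 3) pulsed: all outputs return to 0. */
void counter90_reset(counter90 *c)
{
    assert(c != NULL);
    c->count = 0;
}

/* One /STROBE pulse clocks the counter once. */
void counter90_pulse(counter90 *c)
{
    assert(c != NULL);
    c->count = (uint8_t)((c->count + 1u) % 10u);
}

/* Qa: changes on every pulse (PCINT11 in the text). */
bool counter90_qa(const counter90 *c) { return (c->count & 0x01u) != 0; }

/* Qc: high once four pulses have been counted (BUSY and INT1 in the text). */
bool counter90_qc(const counter90 *c) { return (c->count & 0x04u) != 0; }
```

In this model, a single data byte leaves Qa high and Qc low, so the ATTiny has to reset the counter itself, while four pulses in a row raise Qc, and therefore BUSY, with no software involved.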

Chronograms

Instructions and chronogram when data is sent

Note: on the Atari, the parallel port is managed via the sound chip, the Yamaha YM-2149. On this chip, we use 2 otherwise unused registers: register #14 (port A) and #15 (port B). Port A drives /STROBE (bit 5) and port B carries the data lines.

(Assembler MC68000)
write_byte_hardware:
    (...)
    lea     ym_base.w,a1 ; a1 contains the address of YM-2149
    lea     2(a1),a2     ; a2 = ym_data.

    move.b	#15,(a1)     ; Select the port B of YM-2149
    move.b	port_bitfield_copy(a5),(a2)	; Write port bitfield.
.loop:
    move.b	#14,(a1)     ; Select the port A of YM-2149
    move.b	d1,(a2)      ; Pulse strobe (bit 5)
    move.b	d0,(a2)
    move.b	#15,(a1)     ; Select the port B of YM-2149
    move.b	(a3)+,(a2)   ; Write data.
    dbf	d3,.loop

    move.b	#14,(a1)     ; Select the port A of YM-2149
    move.b	d1,(a2)      ; Pulse strobe.
    move.b	d0,(a2)
    
    move.b	mfp_gpio_reg.w,d0
    move.b	#$FE,interrupt_pending_reg.w ; Clear interrupt.
    and.w	#1,d0                        ; Set hardware_ready if busy line was 0.
    subq.w	#1,d0
    or.w	d0,hardware_ready(a5)

    move.w	#$2400,sr                    ; Interrupts on.
    movem.l	(a7)+,d0-d3/a0-a3
    rts
Given the timing sheet of the MC68000, this gives the following chronogram on /STROBE:

Instructions and chronogram when a reset is requested

(Assembler MC68000)
reset_hardware:
    (...)
    move.b	#15,ym_base.w    ; Select the port B of YM-2149
    move.b	#0,(a5)          ; Set data lines to 0.
    (...)
    move.b	d1,(a5)          ; Pulse strobe low-high four times.
    move.b	d0,(a5)          ; This gets the hardware back to a state where
    move.b	d1,(a5)          ; it's expecting a port bitfield, regardless
    move.b	d0,(a5)          ; of what state it's currently in.
    move.b	d1,(a5)
    move.b	d0,(a5)
    move.b	d1,(a5)
    move.b	d0,(a5)          ; End with strobe high.
    
    move.b	mfp_gpio_reg.w,d0            ; Fetch busy line status.
    move.b	#$FE,interrupt_pending_reg.w ; Clear busy interrupt.
    lea 	vars(pc),a5
    and.w	#1,d0
    eori.w	#1,d0                        ; Invert bit 0 of d0 (busy input).
    move.w	d0,hardware_ready(a5)        ; Conditionally mark hardware ready.

    move.w	(a7)+,sr
    movem.l	(a7)+,d0-d1/a5
    rts
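
Both routines above end by turning the BUSY input (bit 0 of the MFP GPIO register) into the hardware_ready flag, in two slightly different ways. The C sketch below mirrors that arithmetic so it is easier to follow; the function names are made up for this example and the rest of the driver state is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * End of write_byte_hardware: and.w #1 / subq.w #1 / or.w.
 * BUSY = 0 gives 0 - 1 = 0xFFFF, which sets every bit of the flag;
 * BUSY = 1 gives 1 - 1 = 0, which leaves the flag unchanged.
 */
uint16_t ready_after_write(uint16_t hardware_ready, uint8_t mfp_gpio)
{
    uint16_t d0 = (uint16_t)(mfp_gpio & 1u);
    d0 = (uint16_t)(d0 - 1u);
    assert(d0 == 0x0000u || d0 == 0xFFFFu);
    return (uint16_t)(hardware_ready | d0);
}

/*
 * End of reset_hardware: and.w #1 / eori.w #1 / move.w.
 * The flag is overwritten with the inverse of the BUSY bit.
 */
uint16_t ready_after_reset(uint8_t mfp_gpio)
{
    return (uint16_t)((mfp_gpio & 1u) ^ 1u);
}
```

So the write path can only set the flag, never clear it, while the reset path overwrites it with 1 when BUSY is low and 0 when BUSY is high.
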
The overall schematic is the following.

And now?

Speed

A better understanding of the protocol between the Atari and the MO4 raised a question: is the ATTiny4313 still the best fit for this project? The decade counter (74LS90), which I chose somewhat by chance, turned out to be a real help, since it handles the BUSY signal faster than the ATTiny could, but it also added a bit of complexity.

Originally, the Soundpool MO4 is built upon a CPLD, the ispLSI1016 from Lattice. For simple logic, CPLDs and FPGAs are faster than an 8-bit CPU clocked at 8 MHz. Still, the ATTiny is very efficient: most instructions execute in 1 or 2 clock cycles. For example, an LDI rr,#xx instruction (load the immediate value #xx into register rr) executes in 1 cycle on the ATTiny; the equivalent on the Atari's CPU, the Motorola MC68000 (move.b #xx,rr), takes 8 cycles. I think this efficiency is due to the architecture: the ATTiny is based on a Harvard architecture, while the MC68000 is based on a von Neumann architecture.
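
To put these cycle counts into time, here is a small C helper. It assumes both CPUs run at 8 MHz: the text gives that figure for the ATTiny, and using the same clock for the Atari's MC68000 is my assumption. The name cycles_to_ns is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of `cycles` clock cycles at `clock_hz`, in nanoseconds. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0);
    return (uint32_t)(((uint64_t)cycles * 1000000000ull) / clock_hz);
}
```

At 8 MHz, cycles_to_ns(1, 8000000) gives 125 ns for the 1-cycle LDI and cycles_to_ns(8, 8000000) gives 1000 ns for the 8-cycle move.b. Seen the other way round, the 2 microsecond window allowed for BUSY is only 16 ATTiny cycles.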

The ijmp instruction

This instruction jumps to the address held in Z (a 16-bit pointer made of registers r30 and r31). It's very efficient and runs in 2 cycles. Since I know the first incoming byte always represents the quantity of data to follow, I can use the Z register to select which part of the code to run. Then, every time /STROBE goes low, it triggers an interrupt; I know what is expected and execute only the relevant part of the code.
I use the RAM as a buffer; pointer X is used to store the data. Another part of the code reads the buffer and sends the data.
Note: The code is pretty ugly, with several duplicates, but the idea is to minimize the CPU cycles, and I currently use less than 10% of program memory. So sometimes, duplicating a part of the code saves a jump.

...
; =========================================================
; Interrupt vectors
; =========================================================
INT1addr:       rjmp    strobe
...
; =========================================================
; Interrupt Service Routine (ISR)
; =========================================================
strobe:
        ijmp                   ; Indirect jump to the handler selected in Z
counter:
        in      r21, PINB      ; Read counter from the Port B pins (PORTB is the output latch)
        cpi     r21, 0x00      ; Data expected ?
        breq    counter_end    ; No -> end
        cpi     r22, ME        ; Are these data for me ?
        breq    counter_next   ; Yes -> prepare
        ldi     ZH, hi8(skip)  ; The next bytes are to be skipped
        ldi     ZL, lo8(skip)
        rjmp    counter_timer  ; Do not fall through, or Z would be overwritten

counter_next:
        ldi     ZH, hi8(data)  ; The next bytes are data
        ldi     ZL, lo8(data)
counter_timer:
        clr     r16
        STORE   TCNT1H, r16
        STORE   TCNT1L, r16    ; Reset timer 1 counter
        ldi     r16, T1_START
        STORE   TCCR1B, r16    ; Start timer 1
        reti
counter_end:
        inc     r22            ; prepare for next unit
        
        clr     r16
        STORE   TCNT1H, r16
        STORE   TCNT1L, r16    ; Reset timer 1 counter
        ldi     r16, T1_START
        STORE   TCCR1B, r16    ; Start timer 1
        reti

skip:
        dec     r21            ; Decrement counter
        brne    next           ; Loop while r21 > 0 (brge would also branch on 0)
        ldi     ZH, hi8(counter) ; The next byte is a counter
        ldi     ZL, lo8(counter)
        reti
data:
        in      r20, PINB      ; Read data from the Port B pins
        st      X+, r20        ; Store in RAM buffer, increment X
        andi    XL, 0x7f       ; Make sure X never exceeds 127
        dec     r21            ; Decrement counter
        brne    next           ; Loop while r21 > 0 (brge would also branch on 0)
        ldi     ZH, hi8(counter) ; The next byte is a counter
        ldi     ZL, lo8(counter)
next:
        reti
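
For readers more at ease with C than with AVR assembly, the dispatch idea behind ijmp can be sketched with a function pointer playing the role of Z. This is a simplified host-side model, not a translation of the ISR: the timer handling is left out, the value of ME is arbitrary, and the way the output index advances and wraps after the fourth output is my assumption. All names (rx_t, rx_strobe, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 128u   /* same wrap as "andi XL, 0x7f" */
#define ME       1u     /* hypothetical: index (0..3) of the output we listen to */

typedef struct rx rx_t;
typedef void (*handler_t)(rx_t *rx, uint8_t byte);

struct rx {
    handler_t next;          /* role of Z: what to run on the next strobe */
    uint8_t   remaining;     /* role of r21: bytes left for this output */
    uint8_t   unit;          /* role of r22: output currently being received */
    uint8_t   head;          /* role of X: write index into the buffer */
    uint8_t   buf[BUF_SIZE];
};

static void on_counter(rx_t *rx, uint8_t byte);

/* Current output is finished: expect a count byte for the next one. */
static void next_output(rx_t *rx)
{
    rx->unit = (uint8_t)((rx->unit + 1u) % 4u);
    rx->next = on_counter;
}

static void on_skip(rx_t *rx, uint8_t byte)
{
    (void)byte;                               /* not for us: ignore */
    if (--rx->remaining == 0)
        next_output(rx);
}

static void on_data(rx_t *rx, uint8_t byte)
{
    rx->buf[rx->head] = byte;                 /* store in the ring buffer */
    rx->head = (uint8_t)((rx->head + 1u) & (BUF_SIZE - 1u));
    if (--rx->remaining == 0)
        next_output(rx);
}

static void on_counter(rx_t *rx, uint8_t byte)
{
    rx->remaining = byte;
    if (byte == 0)
        next_output(rx);                      /* nothing for this output */
    else
        rx->next = (rx->unit == ME) ? on_data : on_skip;
}

void rx_init(rx_t *rx)
{
    rx->next = on_counter;
    rx->remaining = 0;
    rx->unit = 0;
    rx->head = 0;
}

/* Called once per /STROBE pulse: one indirect call, like ijmp. */
void rx_strobe(rx_t *rx, uint8_t byte)
{
    assert(rx != NULL && rx->next != NULL);
    rx->next(rx, byte);
}
```

Each strobe costs a single indirect call and only the handler for the current state runs, which is the point of keeping the handler address in Z instead of testing a state variable on every interrupt.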
  

Follow up:  Part 1  Part 2  Part 3  Part 4.
