My journey with ATTiny4313 (part 4)
Part 4: Dealing with interrupts
Managing the /STROBE line
The protocol between the driver and the Soundpool MO4
The communication between the Atari and the Soundpool MO4 is achieved via the parallel port (or printer port, sometimes called "Centronics") and a specific driver; I've already detailed the connection in this article.
What is /STROBE used for?
Since the parallel port is asynchronous, /STROBE is used to warn the MO4 that a new byte is available.
/STROBE is also used to reset the MO4, by pulsing it 4 times in a row. In response, BUSY is expected to go high in less than 2 microseconds.
The reset is requested when the driver starts, and after a bunch of data has been sent.
The driver always sends data to all 4 outputs of the MO4, from output 1 to output 4. For each output, one byte is sent to indicate how many bytes will follow for this output, then the data bytes:
- Send data to the MO4:
  - Send data for output 1:
    - Write the quantity of bytes to follow
    - Pulse /STROBE: the MO4 reads the quantity
    - Write byte #1 to the parallel port
    - Pulse /STROBE: the MO4 reads byte #1 and sends it to MIDI
    - Write byte #2 to the parallel port
    - Pulse /STROBE: the MO4 reads byte #2 and sends it to MIDI
    - ...
  - Repeat for output 2
  - Repeat for output 3
  - Repeat for output 4
- Reset the MO4:
  - Write 0 to the parallel port
  - Pulse /STROBE: 0 byte to follow for output 1
  - Pulse /STROBE: 0 byte to follow for output 2
  - Pulse /STROBE: 0 byte to follow for output 3
  - Pulse /STROBE: 0 byte to follow for output 4
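The framing described in the list above can be modelled on a host machine. This is only a sketch in Python, not the driver's code: the function names `frame_outputs` and `parse_stream` are made up for illustration, and it assumes the count is a plain 8-bit value, which the list suggests but does not state.

```python
def frame_outputs(outputs):
    """Return the bytes strobed to the MO4 for one transfer.

    `outputs` is a list of four byte sequences, one per MIDI output.
    Each output contributes a count byte followed by its data bytes.
    """
    if len(outputs) != 4:
        raise ValueError("the MO4 has exactly four outputs")
    stream = []
    for data in outputs:
        if len(data) > 255:
            # Assumption: the count is a plain 8-bit value.
            raise ValueError("the count must fit in one byte")
        stream.append(len(data))   # quantity of bytes to follow
        stream.extend(data)        # each byte is followed by a /STROBE pulse
    return stream


def parse_stream(stream):
    """Inverse of frame_outputs: split a strobed byte stream per output."""
    outputs, pos = [], 0
    for _ in range(4):
        count = stream[pos]
        outputs.append(list(stream[pos + 1:pos + 1 + count]))
        pos += 1 + count
    return outputs


# A MIDI note-on for output 1, nothing for the other three outputs.
example = frame_outputs([[0x90, 0x3C, 0x7F], [], [], []])
# The reset sequence is the degenerate case: four zero counts.
reset = frame_outputs([[], [], [], []])
```

Seen this way, the reset is nothing special on the data lines: with 0 on the port, four pulses simply announce "0 byte to follow" for each output, which brings the receiver back to the start of a frame.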
(Assembler MC68000)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Driver write routines, called by MROS.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
driver_write_port1:
    ...
    bra.b   driver_write
driver_write_port2:
    ...
    bra.b   driver_write
driver_write_port3:
    ...
    bra.b   driver_write
driver_write_port4:
    ...
driver_write:
    ...
    bsr.w   reset_hardware
    ...
    bne.b   .dowrite
    ...
    rts
.dowrite:
    ...
    bsr.w   write_byte_hardware     ; Actually write the data to the hardware.
    ...
    rts
Timing issue
The timing is too tight for the micro-controller to manage the signal on its own (i.e. count the pulses within a time frame and raise BUSY accordingly). To overcome this issue, I decided to delegate the counting to an external chip, the 74LS90 decade counter, in a divide-by-five configuration with CLKA connected to /STROBE and Qc connected to BUSY.
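A rough cycle budget shows how tight this is. The sketch below assumes both CPUs run at 8 MHz, that the reset pulses are produced by back-to-back `move.b Dn,(An)` writes (as in the reset routine shown later in this article), and that bus wait states on the Atari can be ignored. The cycle counts are the nominal values from the MC68000 and AVR instruction timing tables.

```python
CPU_HZ_68000 = 8_000_000   # assumed Atari clock, wait states ignored
CPU_HZ_AVR = 8_000_000     # assumed ATTiny clock

# Nominal cycle counts from the respective timing tables.
MOVE_B_DN_TO_MEM = 8       # 68000: move.b Dn,(An)
AVR_IRQ_RESPONSE = 4       # AVR: minimum interrupt response time
AVR_RJMP = 2               # AVR: rjmp located in the vector table
AVR_RETI = 4               # AVR: return from interrupt


def us(cycles, hz):
    """Convert a cycle count to microseconds."""
    return cycles * 1_000_000 / hz


# One reset pulse = one write driving strobe low + one write driving it high.
pulse_period_us = us(2 * MOVE_B_DN_TO_MEM, CPU_HZ_68000)

# AVR cycles available between two consecutive pulses.
avr_cycles_per_pulse = int(pulse_period_us * CPU_HZ_AVR / 1_000_000)

# Cycles spent just entering and leaving an otherwise empty ISR.
isr_overhead = AVR_IRQ_RESPONSE + AVR_RJMP + AVR_RETI
cycles_left = avr_cycles_per_pulse - isr_overhead
```

Under these assumptions a pulse arrives every 2 µs, i.e. every 16 AVR cycles, and 10 of those are consumed by interrupt entry and exit alone. That leaves about 6 cycles to count the pulse and drive BUSY, before even saving a register.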
Each low-to-high transition of /STROBE increments the counter. The outputs Qa and Qc are used to detect the changes: Qa triggers PCINT11, which starts Timer 1, and Qc triggers INT1, which stops it. Otherwise (Qc never changes), the ATTiny must reset the decade counter by pulsing R1 and R2 (pins 2 & 3) for at least 15 ns, which brings Qa, Qb and Qc back to their initial state.
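To check the logic, the counter can be modelled in a few lines of Python. This is a behavioural sketch only, with no timing. It assumes the usual BCD wiring where Qa feeds CLKB, which this article does not spell out; the class and function names are invented for the example.

```python
class Counter7490:
    """Behavioural model of a 74LS90 counting /STROBE pulses.

    Assumption: Qa is fed back into CLKB, so the chip counts 0..9 on
    successive clock pulses. Only the logic is modelled, not the timing.
    """

    def __init__(self):
        self.count = 0

    def clock(self):
        """One /STROBE pulse on CLKA."""
        self.count = (self.count + 1) % 10

    def reset(self):
        """R1 and R2 pulsed high together: all outputs return to 0."""
        self.count = 0

    @property
    def qa(self):
        return self.count & 1          # bit 0 toggles on every pulse

    @property
    def qc(self):
        return (self.count >> 2) & 1   # bit 2 goes high on the 4th pulse


def outputs_after(pulses):
    """Return (Qa, Qc) after a burst of pulses starting from the reset state."""
    counter = Counter7490()
    for _ in range(pulses):
        counter.clock()
    return counter.qa, counter.qc
```

With this wiring, `outputs_after(1)` gives Qa high (the event that starts the timer) and `outputs_after(4)` gives Qc high, which is the BUSY response expected after four pulses. A single data pulse leaves Qc low, so the ATTiny has to clear the counter itself.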
Chronograms
Instructions and chronogram when data is sent
Note: on the Atari, the parallel port is managed via the sound chip, the Yamaha YM-2149. On this chip, we use the 2 I/O port registers: register #14 (port A) and register #15 (port B). Port A carries /STROBE (bit 5) and port B carries the data byte.
(Assembler MC68000)
write_byte_hardware:
    (...)
    lea     ym_base.w,a1                ; a1 contains the address of YM-2149
    lea     2(a1),a2                    ; a2 = ym_data.
    move.b  #15,(a1)                    ; Select the port B of YM-2149
    move.b  port_bitfield_copy(a5),(a2) ; Write port bitfield.
.loop:
    move.b  #14,(a1)                    ; Select the port A of YM-2149
    move.b  d1,(a2)                     ; Pulse strobe (bit 5)
    move.b  d0,(a2)
    move.b  #15,(a1)                    ; Select the port B of YM-2149
    move.b  (a3)+,(a2)                  ; Write data.
    dbf     d3,.loop
    move.b  #14,(a1)                    ; Select the port A of YM-2149
    move.b  d1,(a2)                     ; Pulse strobe.
    move.b  d0,(a2)
    move.b  mfp_gpio_reg.w,d0
    move.b  #$FE,interrupt_pending_reg.w ; Clear interrupt.
    and.w   #1,d0                       ; Set hardware_ready if busy line was 0.
    subq.w  #1,d0
    or.w    d0,hardware_ready(a5)
    move.w  #$2400,sr                   ; Interrupts on.
    movem.l (a7)+,d0-d3/a0-a3
    rts
Given the timing sheet of the MC68000, this gives the following chronogram on /STROBE:
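The nominal figures behind that chronogram can be estimated from the MC68000 instruction timing tables. The sketch below assumes an 8 MHz clock and ignores any bus wait states on the Atari, so the real values may be slightly longer. It also assumes, as the reset routine's comments suggest, that `d1` drives strobe low and `d0` drives it high.

```python
CPU_HZ = 8_000_000  # assumed Atari clock, wait states ignored

# Nominal MC68000 cycle counts for the instructions inside .loop
LOOP_CYCLES = {
    "move.b #14,(a1)": 12,    # select port A
    "move.b d1,(a2)": 8,      # strobe low
    "move.b d0,(a2)": 8,      # strobe high
    "move.b #15,(a1)": 12,    # select port B
    "move.b (a3)+,(a2)": 12,  # write the next data byte
    "dbf d3,.loop": 10,       # branch taken
}


def to_us(cycles, hz=CPU_HZ):
    """Convert a cycle count to microseconds."""
    return cycles * 1_000_000 / hz


# /STROBE stays low from the d1 write until the d0 write, one instruction later.
strobe_low_us = to_us(LOOP_CYCLES["move.b d0,(a2)"])

# Time between two consecutive data bytes in the loop.
byte_period_us = to_us(sum(LOOP_CYCLES.values()))
```

This gives a /STROBE low time of about 1 µs and one byte roughly every 7.75 µs. For comparison, a MIDI byte takes 320 µs on the wire at 31250 baud, so the parallel port delivers data far faster than a MIDI output can drain it.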
Instructions and chronogram when a reset is requested
(Assembler MC68000)
reset_hardware:
    (...)
    move.b  #15,ym_base.w               ; Select the port B of YM-2149
    move.b  #0,(a5)                     ; Set data lines to 0.
    (...)
    move.b  d1,(a5)                     ; Pulse strobe low-high four times.
    move.b  d0,(a5)                     ; This gets the hardware back to a state where
    move.b  d1,(a5)                     ; it's expecting a port bitfield, regardless
    move.b  d0,(a5)                     ; of what state it's currently in.
    move.b  d1,(a5)
    move.b  d0,(a5)
    move.b  d1,(a5)
    move.b  d0,(a5)                     ; End with strobe high.
    move.b  mfp_gpio_reg.w,d0           ; Fetch busy line status.
    move.b  #$FE,interrupt_pending_reg.w ; Clear busy interrupt.
    lea     vars(pc),a5
    and.w   #1,d0
    eori.w  #1,d0                       ; Invert bit 0 of d0 (busy input).
    move.w  d0,hardware_ready(a5)       ; Conditionally mark hardware ready.
    move.w  (a7)+,sr
    movem.l (a7)+,d0-d1/a5
    rts
The overall schematic is the following.
And now?
Speed
Understanding the protocol between the Atari and the MO4 better raised a question: is the ATTiny4313 still the best fit for this project? The decade counter (74LS90), which I chose somewhat by chance, turned out to be a real help, since it drives the BUSY signal faster than the ATTiny could, but it also added a bit of complexity.
The original Soundpool MO4 is built upon a CPLD, the ispLSI1016 from Lattice. For simple logic, CPLDs and FPGAs are faster than an 8-bit CPU clocked at 8 MHz. The ATTiny is nevertheless very efficient: most of its instructions execute in 1 or 2 clock cycles. For example, an LDI rr,#xx instruction (load the immediate value #xx into register rr) executes in 1 cycle on the ATTiny; the equivalent on the Atari's CPU, the Motorola MC68000 (move.b #xx,rr), takes 8 cycles. I think this efficiency is due to the architecture: the ATTiny is based on a Harvard architecture, while the MC68000 is based on a Von Neumann architecture.
The ijmp instruction
This instruction jumps to the address held in Z (a 16-bit pointer made of registers r30 and r31). It is very efficient and runs in 2 cycles. Since I know the first incoming byte always represents the quantity of data to follow, I can use the Z register to select which part of the code to run. Then, every time /STROBE goes low, it triggers an interrupt; I know what is expected and execute only the right part of the code.
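The same idea can be written as a small host-side model, where a reference to the current handler plays the role of Z. This is a sketch of the intended behaviour as described above, not a translation of the assembly. The class name, the `me` parameter (the output number this chip serves) and the handler names are assumptions made for the example.

```python
class StrobeDispatcher:
    """Host-side model of the Z-pointer dispatch.

    `self.handler` plays the role of Z: it always refers to the routine
    that must process the next strobed byte.
    """

    def __init__(self, me):
        self.me = me            # output number served by this chip (1..4)
        self.received = []      # bytes accepted for this output
        self.reset()

    def reset(self):
        """Return to the state expected after four /STROBE pulses."""
        self.unit = 1
        self.remaining = 0
        self.handler = self.on_count

    def strobe(self, byte):
        """Equivalent of the ISR: a single indirect jump."""
        self.handler(byte)

    def on_count(self, byte):
        self.remaining = byte
        if byte == 0:
            self._next_unit()               # nothing to read for this output
        elif self.unit == self.me:
            self.handler = self.on_data     # the next bytes are for this chip
        else:
            self.handler = self.on_skip     # the next bytes are ignored

    def on_data(self, byte):
        self.received.append(byte)
        self._consume()

    def on_skip(self, byte):
        self._consume()

    def _consume(self):
        self.remaining -= 1
        if self.remaining == 0:
            self._next_unit()

    def _next_unit(self):
        self.unit = self.unit % 4 + 1       # 1 -> 2 -> 3 -> 4 -> 1
        self.handler = self.on_count        # the next byte is a count
```

For example, a chip created with `StrobeDispatcher(me=2)` and fed the stream `3, 0x90, 0x3C, 0x7F, 2, 0xB0, 0x07, 0, 0` skips the three bytes of output 1, keeps `0xB0, 0x07`, and ends up waiting for the count of output 1 again. Each call to `strobe` does no decision-making of its own, which is the point of the `ijmp` approach.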
I use the RAM as a buffer; pointer X is used to store the data. Another part of the code reads the buffer and sends the data.
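A minimal model of such a buffer, assuming a 128-byte ring to match the 0x7f mask applied to XL in the ISR. The read index `tail` and the method names are assumptions, since the sending code is not shown here.

```python
BUFFER_SIZE = 128          # matches the 0x7f mask applied to XL
MASK = BUFFER_SIZE - 1


class RingBuffer:
    """Fixed-size FIFO whose indices wrap with a bit mask."""

    def __init__(self):
        self.data = [0] * BUFFER_SIZE
        self.head = 0      # write index, the role played by X
        self.tail = 0      # read index used by the sending code (assumed)

    def put(self, byte):
        """Store one byte; called from the strobe handler."""
        self.data[self.head] = byte
        self.head = (self.head + 1) & MASK

    def get(self):
        """Return the oldest byte, or None when the buffer is empty."""
        if self.tail == self.head:
            return None
        byte = self.data[self.tail]
        self.tail = (self.tail + 1) & MASK
        return byte
```

Masking the index costs a single instruction on the AVR, which is why a power-of-two size is convenient. Note that this sketch, like the mask alone, does not detect overflow: if the writer gets a full buffer ahead of the reader, old data is silently overwritten.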
Note: the code is pretty ugly, with several duplicated sections, but the idea is to minimize CPU cycles, and I currently use less than 10% of program memory. So sometimes, duplicating a piece of code saves a jump.
...
; =========================================================
; Interrupt vectors
; =========================================================
INT1addr: rjmp strobe
...
; =========================================================
; Interrupt Service Routine (ISR)
; =========================================================
strobe:
    ijmp                    ; Indirect jump to execute the good section
counter:
    in    r21, PORTB        ; Read counter from Port B
    cpi   r21, 0x00         ; Data expected ?
    breq  counter_end       ; No -> end
    cpi   r22, ME           ; Are these data for me ?
    breq  counter_next      ; Yes -> prepare
    ldi   ZH, hi8(skip)     ; The next bytes are to be skipped
    ldi   ZL, lo8(skip)
counter_next:
    ldi   ZH, hi8(data)     ; The next bytes are data
    ldi   ZL, lo8(data)
    clr   r16
    STORE TCNT1H, r16
    STORE TCNT1L, r16       ; Reset timer 1 counter
    ldi   r16, T1_START
    STORE TCCR1B, r16       ; Start timer 1
    reti
counter_end:
    inc   r22               ; prepare for next unit
    clr   r16
    STORE TCNT1H, r16
    STORE TCNT1L, r16       ; Reset timer 1 counter
    ldi   r16, T1_START
    STORE TCCR1B, r16       ; Start timer 1
    reti
skip:
    dec   r21               ; Decrement counter
    brge  next              ; Loop while r21 > 0
    ldi   ZH, hi8(counter)  ; The next byte is a counter
    ldi   ZL, lo8(counter)
    reti
data:
    in    r20, PORTB        ; Read data from Port B
    st    X+, r20           ; Store in RAM buffer, increment X
    andi  XL, 0x7f          ; Make sure X never exceed 127
    dec   r21               ; Decrement counter
    brge  next              ; Loop while r21 > 0
    ldi   ZH, hi8(counter)  ; The next byte is a counter
    ldi   ZL, lo8(counter)
next:
    reti