6502 Homebrew – Mark Hindsbo’s Blog

June 18, 2020

Reading FAT data Part 2

Let’s just recap a little before we move on. In FAT the SD Card is divided into clusters. Each cluster has the same size, which is a power of 2 times the SD sector size (1 sector = 512 bytes). On my current FAT32 formatted SD card the cluster size is 4096 bytes or 8 sectors. A cluster represents the minimum size of data you can allocate on the card and files always take up a whole number of clusters (which might be more than the actual file size).

The FAT32 system uses 4 byte numbers for the clusters, starting with $00000000, $00000001, $00000002, etc. They are stored in little endian in the file allocation table (FAT) and you can see an example of this here. A file is a simple chain of clusters containing its data and that chain can be found in the FAT. The table below shows an example of the FAT entries for a file starting in cluster 3 and continuing to cluster 6. Each cluster entry contains the number of the next cluster in the chain. In the example below the value for cluster 3 can be found in bytes 12-15 (4 byte values per cluster) and contains the number 4, which is the next cluster in the chain. The chain continues to cluster 6 where we encounter the end of file marker (in this case $0FFFFFFF). Note that clusters in a chain do not have to be consecutive numbers.

Offset	+0	+1	+2	+3	+4	+5	+6	+7
$0000	F0	FF	FF	0F	FF	FF	FF	0F
$0008	FF	FF	FF	0F	04	00	00	00
$0010	05	00	00	00	06	00	00	00
$0018	FF	FF	FF	0F	…	…	…	…

Example FAT table with a file starting in cluster 3 and continuing through cluster 6
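
To double check the table on a PC, the chain can be walked with a few lines of Python. This is only a host-side sketch, not 6502 code: the names `FAT`, `END_OF_CHAIN` and `follow_chain` are made up for illustration, and it assumes the chain is well formed (no loops).

```python
import struct

# The 28 bytes of the example FAT above (entries for clusters 0-6).
FAT = bytes.fromhex(
    "F0FFFF0F" "FFFFFF0F"   # clusters 0 and 1 (reserved entries)
    "FFFFFF0F" "04000000"   # cluster 2 (end of chain), cluster 3 -> 4
    "05000000" "06000000"   # cluster 4 -> 5, cluster 5 -> 6
    "FFFFFF0F"              # cluster 6 (end of chain)
)

END_OF_CHAIN = 0x0FFFFFF8   # values from here up mark the end of a file


def follow_chain(fat, first_cluster):
    """Return the list of clusters in the chain starting at first_cluster."""
    chain = []
    cluster = first_cluster
    while cluster < END_OF_CHAIN:
        chain.append(cluster)
        # each entry is a 4 byte little endian value at offset cluster * 4
        (value,) = struct.unpack_from("<I", fat, cluster * 4)
        cluster = value & 0x0FFFFFFF   # only the low 28 bits are used
    return chain


print(follow_chain(FAT, 3))   # [3, 4, 5, 6]
```

The end-of-chain value itself is not part of the returned list, whereas the 6502 code in this post stores it in its chain buffer as a terminator.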

As mentioned in the previous post the root directory can typically be found in cluster 2 and contains top level directory and file information. Each directory entry is 32 bytes long and can contain short file names, long file names, directories, etc. If it is a short file entry then the first 8 bytes contain the name and the next 3 the file extension. The first cluster of the file is stored in bytes 26, 27, 20 and 21 of the directory entry (don’t ask why … read the Wikipedia article 🙂).

The last remaining thing to mention before we get to the code is that you need the address of a given sector on the SD card to read it with command 17. E.g. sector 5 is at address 5*512 = $00000A00. Since FAT gives us cluster numbers we need a way to convert from a cluster number to a sector address. In my previous post we used the information in the MBR and boot sector to find the address of the first root cluster, and we also know its cluster number (typically 2). We can use this information to find the start address of any given cluster.

cluster_address = root_start_address + (cluster_number - root_cluster_number) * cluster_byte_size
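
As a quick cross-check of the formula, here is the same calculation as a small Python sketch. The function name, the default values and the root start address of $00400000 are assumptions picked only for the example.

```python
SECTOR_SIZE = 512


def cluster_address(cluster_number, root_start_address,
                    root_cluster_number=2, sectors_per_cluster=8):
    """Byte address of the first sector of a cluster (formula from the text)."""
    cluster_byte_size = sectors_per_cluster * SECTOR_SIZE
    return (root_start_address
            + (cluster_number - root_cluster_number) * cluster_byte_size)


# Hypothetical card: root directory (cluster 2) starts at $00400000,
# clusters are 8 sectors = 4096 bytes.
print(hex(cluster_address(2, 0x00400000)))   # 0x400000
print(hex(cluster_address(3, 0x00400000)))   # 0x401000
```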

With zp_temp storing a 4 byte cluster number the following code returns the address in sd_file_sector. For the math helper functions see my previous post or the full source code.

.cluster_adr    sec                    ; cluster number - root cluster number 
                lda zp_temp+0
                sbc sd_root_cluster+0 
                sta zp_temp+0
                lda zp_temp+1
                sbc sd_root_cluster+1 
                sta zp_temp+1
                lda zp_temp+2
                sbc sd_root_cluster+2 
                sta zp_temp+2
                lda zp_temp+3
                sbc sd_root_cluster+3 
                sta zp_temp+3
                
                ldx #8                 ; 512 = 2 power 9 (loop always adds 1) 
                lda sd_cluster_size 
                beq .cfe_end           ; error handling  
@power2         inx                    ; shift right to calculate how many powers of 2 the cluster size is  
                lsr 
                bcc @power2            ; carry will be set when we encounter first (and only) bit 
                
                jsr m.shift2w          ; offset address to this cluster (multiply w 512 * sectors/cluster)
                
                ldx #<sd_root_sector   ; copy root start address ... 
                ldy #<sd_file_sector                                                                            
                jsr m.move2w_osd 
        
                ldx #<sd_file_sector   ; ... add offset to the right sector for this cluster number                                                                                  
                ldy #>sd_file_sector 
                jmp m.add2w
                ; rts

Note that the commented rts statement is just the way I remind myself that the code returns after the jump. It is one of the numerous small 6502 optimizations to replace “jsr, rts” with “jmp”, which saves 9 cycles (6 + 6 for the jsr/rts pair versus 3 for the jmp) and 1 byte. Probably overkill in this context, but it’s second nature for me.

Let’s get into the meat of the code for loading a file. As noted previously my implementation only supports files in the root directory. I might later extend it to support sub-directories, but for most usage the root directory limitation should not be an issue. I also only support the short file name, which must be stored in sd_file_name in the 8+3 format: the name padded with spaces to 8 characters, followed by the 3 character extension and without the dot, e.g. “boot    prg” for a file named boot.prg
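
For illustration, this is how the 11 byte name could be built on a PC. It is a Python sketch with a made-up function name; the upper-casing is an assumption based on how FAT normally stores short names, since the 6502 code simply compares the bytes as they are.

```python
def to_short_name(filename):
    """Convert e.g. 'boot.prg' to the 11 byte 8+3 form used in a directory entry."""
    name, _, ext = filename.upper().partition(".")
    if not name or len(name) > 8 or len(ext) > 3:
        raise ValueError("not a valid 8+3 file name: " + filename)
    # pad name to 8 and extension to 3 characters with spaces, drop the dot
    return name.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii")


print(to_short_name("boot.prg"))   # b'BOOT    PRG'
```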

sd.load_file    jsr sd.load_root                        ; load first root sector                        
                bcc .stlb_end                           ; carry clear = failure 

                
@check_sector   ldx #(SD_SECTOR_SIZE/F32_ENTRY_SIZE)    ; number of entries per sector (is 16 assuming 512 byte sectors)
                lda #<sd_data_buffer
                sta zp_source+0
                lda #>sd_data_buffer
                sta zp_source+1

                
@cmp_attribute  ldy #11                                 ; file attribute 
                lda (zp_source),y
                and #%00011000                          ; check volume label or directory (will include LFN)
                bne @next_entry                         ; ... then skip                         
                dey
                
@cmp_file_name  lda sd_file_name,y                      ; check short name only (11 chars)
                cmp (zp_source),y
                bne @next_entry 
                dey 
                bpl @cmp_file_name
                bmi @found_file                         ; if all letters match file name we have the right entry 

                
@next_entry     clc                                     ; advance to the next entry in the root 
                lda zp_source+0
                adc #F32_ENTRY_SIZE
                sta zp_source+0
                bcc @move_entry_end
                inc zp_source+1
@move_entry_end                 
                dex                                     ; have we checked this entire sector? 
                bne @cmp_attribute
                
@load_sector    jsr sd.load_rootx                       ; load next sector                      
                bcs @check_sector                       ; carry clear = failure or end of root  
                                
.stlb_error     lda #SD_FILE_ERR                        ; return file error code 
                sta sd_cmd_dat+0 
.boot_error     clc                                     ; clear carry to indicate failure 
                rts

Hopefully this is straightforward. Bytes 0-10 of a directory entry contain the short file name and are compared against our target file name stored in sd_file_name. Perhaps the only thing to comment on is the file attribute in byte 11, which allows us to skip the check if the entry is a directory, long file name, or similar. Any of those entries will have bits 3 and/or 4 of the attribute set. The code simply goes through each sector of the root and loads a new sector when needed. When a match is found it breaks out of the loop; if the end of the root is reached without a match it returns an error.
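
The same scan over one 512 byte sector can be sketched in Python as follows. The helper names and the three demo entries are invented for the example; only the 32 byte entry size, the attribute byte at offset 11 and the mask %00011000 come from the code above.

```python
ENTRY_SIZE = 32
SKIP_MASK = 0b00011000   # volume label (bit 3) or directory (bit 4); LFN has both


def make_entry(short_name, attribute):
    """Build a 32 byte directory entry for the demo (all other fields zero)."""
    entry = bytearray(ENTRY_SIZE)
    entry[0:11] = short_name
    entry[11] = attribute
    return bytes(entry)


def find_entry(sector, short_name):
    """Return the offset of the matching file entry in the sector, or None."""
    for offset in range(0, len(sector), ENTRY_SIZE):
        entry = sector[offset:offset + ENTRY_SIZE]
        if entry[11] & SKIP_MASK:
            continue                      # directory, volume label or LFN
        if entry[0:11] == short_name:
            return offset
    return None


NAME = b"BOOT".ljust(8) + b"PRG"
sector = (
    make_entry(b"LFNDATA....", 0x0F)                 # long file name entry
    + make_entry(b"SUBDIR".ljust(11), 0x10)          # directory
    + make_entry(NAME, 0x20)                         # the file we want
).ljust(512, b"\x00")

print(find_entry(sector, NAME))   # 64
```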

The root directory is loaded sector by sector using the following function. Calling sd.load_root loads the first sector and subsequent calls to sd.load_rootx load the next sector, returning carry clear once we have gone through all sectors of the root. Note it uses the zp_temp+5 to zp_temp+7 variables to keep track of where it is and those cannot be clobbered between calls. It also assumes that no other SD card commands or similar are called, so the sd_cmd_dat structure is unchanged between calls. As a reminder I have allocated 512 bytes starting at sd_data_buffer to contain the sector data loaded by calling sd.get_block

sd.load_root    ldx #3
@root_cluster   lda sd_root_cluster,x      ; copy cluster number of root to zp_temp 0-3
                sta zp_temp,x
                dex 
                bpl @root_cluster
                jsr sd.get_fchain          ; get fat chain for root folder  
                bcc .sdlf_end              ; error if carry is clear 
                
                ldy #0                     ; first cluster   
                
.sdlr_sector1   ldx #0                     ; store cluster number 
@copy_cn        lda sd_fat_chain,y 
                sta zp_temp,x 
                iny 
                inx
                cpx #4
                bne @copy_cn
                
                jsr .chk_fat_entry         ; check if we are at end of file 
                bcs .clear_carry           ; ... if so return carry clear 
                
                sty zp_temp+7              ; store/update index to next entry 
                lda sd_cluster_size        ; number of sectors to load per cluster  
                sta zp_temp+5
                                
                jsr .cluster_adr           ; convert cluster number in zp_temp to address (returned in sd_file_sector)
@load_fsector   ldx #<sd_file_sector       ; get sector in this cluster of file                                    
                ldy #>sd_file_sector 
                jmp sd.get_block
                ; rts 


sd.load_rootx   ldy zp_temp+7              ; index to current entry in fat chain (if we need to load a new cluster)     
                dec zp_temp+5              ; sectors left in this cluster?
                beq .sdlr_sector1          ; ... if not get next cluster  
                                
@next_sector    ldx #<(sd_cmd_dat+1)       ; add 512 to current sector/load address                                                                                                                                               
                ldy #>(sd_cmd_dat+1)     
                jsr m.add2w_512 

@load_sector    jmp sd.get_blockx          ; load next sector                      
                ; rts

The root directory can be spread over multiple clusters, similar to a file, and as such has its own FAT chain. The following function will find the chain of any file (or directory) given the first cluster number in zp_temp (4 bytes). It returns the chain in the sd_fat_chain buffer. Note that since I only have 32K RAM a file spans at most 32K / 512 = 64 clusters (the smallest cluster size is 512 bytes or one sector), so the maximal cluster chain length of a file will be 64 * 4 = 256 bytes. Most cards will have cluster sizes greater than 1 sector, but for worst case I have allocated 256 bytes of buffer for the cluster chains.

sd.get_fchain   ldx #3                ; copy first cluster number (4 bytes) into fat chain              
@first_entry    lda zp_temp,x 
                sta sd_fat_chain,x 
                dex 
                bpl @first_entry
                                
                ldx #4 
@calc_sector    stx zp_temp+6         ; index to entry in fat-chain table (XR = 4 first time we get here) 
                                
                ldx #2 
                jsr m.shift2w         ; multiply by 4 to get address offset to the right fat (4 bytes per cluster entry)  
                
                lda zp_temp+0         ; index into sector where this cluster numbers entry is located
                sta zp_temp+4                           ; temp+4/temp+5 lo/hi range 0-508   
                lda zp_temp+1
                and #%00000001 
                sta zp_temp+5                    
                
                lda #$00              ; truncate to get offset to address of start of fat sector 
                sta zp_temp+0                    
                lda zp_temp+1
                and #%11111110 
                sta zp_temp+1                    
                        
                ldx #<sd_fat_sector   ; copy fat sector start address ... 
                ldy #<sd_file_sector                                            
                jsr m.move2w_osd 
                
                ldx #<sd_file_sector  ; ... add offset to the right fat sector for this entry                                                                                 
                ldy #>sd_file_sector 
                jsr m.add2w
                
                ldx #3  
@sector_loaded  lda sd_file_sector,x  ; did we just load this sector? (compare to address in sd command data)
                cmp sd_cmd_dat+1,x 
                bne @load_fat_sctr    ; ... if not then load the new one 
                dex 
                bpl @sector_loaded    ; ... otherwise no need to load it again (next cluster in the chain is often the sequential next one)
                
@set_source     clc                   ; location of next cluster number in FAT sector data 
                lda #<sd_data_buffer
                adc zp_temp+4         ; low byte of index into FAT sector 
                sta zp_source+0
                lda #>sd_data_buffer
                adc zp_temp+5         ; hi byte of index into FAT sector 
                sta zp_source+1
                
                ldx zp_temp+6         ; index to entry in fat-chain table
                ldy #0
                                         
@get_cluster    lda (zp_source),y     ; get next cluster number in chain 
                sta sd_fat_chain,x    ; store in next location in FAT chain 
                sta zp_temp,y 
                inx 
                beq @error            ; overflow in fat chain means file is too long
                iny
                cpy #4
                bne @get_cluster         
                
                jsr .chk_fat_entry    ; check if normal cluster value or EOF  
                bcc @calc_sector      ; normal -> carry clear, so continue with next entry 
                rts     
                
              
@load_fat_sctr  ldx #<sd_file_sector  ; get fat sector that holds the next cluster of this file chain                    
                ldy #>sd_file_sector 
                jsr sd.get_block
                bcs @set_source       ; no error -> continue  
                
                
@error          ldx #3                ; if overflow we have an issue  
                lda #$FF
@eof_loop       sta sd_fat_chain,x    ; store end-of-file marker as first cluster @ error 
                dex 
                bpl @eof_loop
 
                clc                   ; clear carry to indicate error                 
@end            rts

The first thing this code does is to copy the start cluster value into the FAT chain we are creating. The start cluster of a file is obviously the first one we will load. Then it calculates where in the FAT table to look for the chain we are after. If e.g. the start cluster is 312, then it is located 312*4 = 1248 = $4E0 bytes into the FAT table. That will be in the 3rd sector of the FAT table (first sector contains bytes 0-511, second 512 – 1023, third 1024 – 1535, etc.) and at offset 224 in that sector (1248-1024=224). So in this example the address of the FAT sector we are after is fat_sector_address + 1024 and the number of the next cluster in the chain will be the 4 byte value located at offset 224.
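
The arithmetic in this example can be checked with a short Python sketch. The function name is made up, and `fat_start_address` defaults to 0 so the result is relative to the start of the FAT; the masks mirror the `and #%00000001` / `and #%11111110` on the second byte in the 6502 code.

```python
def fat_entry_location(cluster, fat_start_address=0):
    """Return (address of FAT sector, offset within that sector) for a cluster."""
    byte_offset = cluster * 4                    # 4 bytes per FAT entry
    offset_in_sector = byte_offset & 0x1FF       # low 9 bits: 0-508
    sector_address = fat_start_address + (byte_offset & ~0x1FF)
    return sector_address, offset_in_sector


print(fat_entry_location(312))   # (1024, 224)
```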

The code then simply loads that sector and begins building the chain by reading the value at the offset we calculated (224 in this example) and saves it in the next spot in the sd_fat_chain buffer/table. It then uses this value to go find the next cluster in the chain, etc. Since clusters in a chain are often consecutive I check to see if the next cluster is in the sector we already loaded; if it is there is no need to load it again from the SD card.

If we for some reason have an error I store the end-of-file marker in the first slot of the chain, which would indicate that there are no clusters in the chain.

Every time a cluster number is read we need to check if it is an actual cluster number or if we have reached the end of the file. Valid data cluster numbers run from $00000002 to $0FFFFFEF, a bad cluster is indicated by $?FFFFFF7, and values higher than that indicate end-of-file. The following code does that check and returns carry clear for a normal cluster number.

.chk_fat_entry  lda zp_temp+3                
                cmp #$0F            ; High byte >= $0F?
                bcc .cfe_end        ; ... if not then normal cluster number 
                
                lda zp_temp+2       ; middle bytes are $FF for special/reserved values
                cmp #$FF
                bne .cfe_end
                lda zp_temp+1                            
                cmp #$FF
                bne .cfe_end
                
                lda zp_temp+0
                cmp #$F0            ; highest valid cluster value is $EF 

.cfe_end        rts

Zero indicates a free cluster and would not be a valid value for a file chain (note to self: I really should check for this as well and return an error). Technically, when carry is set I should also check whether the value really is the end-of-file marker rather than the bad cluster marker, but since a bad cluster cannot be part of a file chain we should be OK here.
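
For reference, a fuller classification of a FAT32 entry could look like the Python sketch below. It is more detailed than the 6502 routine above, which only separates "normal" from "everything at or above $0FFFFFF0". The function name and the returned strings are made up for illustration.

```python
def classify_fat_entry(value):
    """Classify a 32 bit FAT32 entry; only the low 28 bits are significant."""
    value &= 0x0FFFFFFF
    if value == 0:
        return "free"
    if value == 1:
        return "reserved"
    if value <= 0x0FFFFFEF:
        return "cluster"          # number of the next cluster in the chain
    if value <= 0x0FFFFFF6:
        return "reserved"
    if value == 0x0FFFFFF7:
        return "bad"
    return "end-of-chain"         # $0FFFFFF8 - $0FFFFFFF


print(classify_fat_entry(0x00000004))   # cluster
print(classify_fat_entry(0x0FFFFFFF))   # end-of-chain
```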

We can now find the cluster chain for the root directory (and any file or directory) which allows us to cycle through its entries and look for the desired file name. Once we find the right entry we can use it to extract the first cluster of the file. The number of the first cluster in a file is stored in bytes 26, 27, 20 and 21 of the directory entry and the 4 byte file size is stored in bytes 28-31 ($1C-$1F). The following code extracts this information once we have found a match, and creates the cluster chain of that file.

@found_file     ldy #$1F                    ; file size 
                ldx #3                                  
@get_file_size  lda (zp_source),y
                sta sd_file_size,x  
                dey 
                dex 
                bpl @get_file_size

                lda sd_file_size+3          ; check if file size is over max limit 
                ora sd_file_size+2          ; high word must be zero 
                bne .stlb_error         
@get_file_pages ldx sd_file_size+1          ; calculate file size in number of pages (1 page = 256 bytes) 
                lda sd_file_size+0
                beq @check_pages            ; do we have remaining bytes, beyond number of full pages?  
                inx                         ; ... then increase page number by one 
                beq .stlb_error             ; page count wrapped past 255 -> file too large 
@check_pages    cpx #(>SD_MAX_FILESIZE)+1   ; more pages than max size allows? (also checked when low byte is 0) 
                bcs .stlb_error 
@set_file_pages stx sd_file_pages           ; number of pages in file (remaining to load)

        
@first_cluster  ldy #$1A                    ; first cluster of file  
                lda (zp_source),y
                sta zp_temp+0
                iny  
                lda (zp_source),y
                sta zp_temp+1
                ldy #$14                    ; high word of first cluster  
                lda (zp_source),y
                sta zp_temp+2
                iny  
                lda (zp_source),y
                sta zp_temp+3
                                
@get_fchain     jsr sd.get_fchain           ; get fat chain of clusters for this file 
                bcc .sdlf_end               ; error if carry is clear

Note that I also do a check to see if the file size is larger than 32K and return an error if it is. Since my system only has 32K there is no reason to load a larger file. In addition I calculate the number of pages (1 page = 256 bytes) to load. The reason for this will be clear once we go through the function I use to load the actual file into the system.
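
Pulling those fields out of a directory entry can be sketched in Python like this. The function name is invented, and the 32K limit is an assumption standing in for SD_MAX_FILESIZE, whose actual value is not shown in this post.

```python
import struct


def parse_entry(entry, max_file_size=0x8000):
    """Return (first_cluster, file_size, pages) from a 32 byte directory entry."""
    (cluster_low,) = struct.unpack_from("<H", entry, 26)    # bytes 26, 27
    (cluster_high,) = struct.unpack_from("<H", entry, 20)   # bytes 20, 21
    (file_size,) = struct.unpack_from("<I", entry, 28)      # bytes 28-31
    if file_size > max_file_size:
        raise ValueError("file too large for this system")
    pages = (file_size + 255) // 256     # round up to whole 256 byte pages
    return (cluster_high << 16) | cluster_low, file_size, pages


# Demo entry: first cluster $00012345, file size 5000 bytes.
demo = bytearray(32)
struct.pack_into("<H", demo, 26, 0x2345)
struct.pack_into("<H", demo, 20, 0x0001)
struct.pack_into("<I", demo, 28, 5000)
print(parse_entry(bytes(demo)))   # (74565, 5000, 20)
```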

Finally we are ready to load the actual data from the file into memory. But since this blog post is already too long I will do this in the next one. As always feel free to comment, ask questions and point out mistakes, etc.

June 6, 2020 (updated June 22, 2020)

Reading FAT data Part 1

OK with the SD card initialized in SPI mode it is finally time to read some data. To do this we need to understand the FAT format and I found the following links helpful as I was getting into the specifics. Note in my implementation I only support FAT32.

As mentioned in the previous posts the data is stored in 512 byte blocks on the SD card, which is the smallest increment of data you can read. To get a block of data from the SD card you can send it command 17 with the argument being the address of the block (4 byte address). Once the command is sent we get the usual R1 response and then have to wait until the card is ready to transmit data, which it indicates by sending the OK token ($FE). The following function accomplishes this (and calls the sd.send_cmd function described in my previous post)

.sd_block_cmd   stx zp_source+0       ; low byte in XR
                sty zp_source+1       ; high byte in YR

                ldy #3                ; move block address to command buffer 
@set_cmd_data   lda (zp_source),y
                sta sd_cmd_dat+1,y 
                dey 
                bpl @set_cmd_data
                
.sd_block_cmdx  lda #17               ; entry point if sd_cmd_dat structure set directly 
                jsr sd.send_cmd       ; command 17 = read a single block (arg = block number)
                bne @end  
                
                ldy #0
@wait_data      jsr spi.get_byte      ; wait until card is ready to send block data 
                cmp #$FF 
                bne @return
                dey
                bne @wait_data
                
@return         cmp #$FE              ; did we get 'ok' token ($FE) or error?
@end            rts

The function needs a pointer to the 4 byte block address passed in XR/YR and will return ‘equal’ (Zero flag set) on success. Assuming that it returns without an error we are now ready to read 512 bytes of data from the specified block on the SD card. The following function reads the 512 bytes of data into a memory buffer that in my current implementation is $0300 – $04FF. To change it simply set sd_data_buffer to the location in memory you want to use.

sd.get_block    jsr .sd_block_cmd     ; send 'read block' command to SD card          
                bne .sdgb_error

.sd_get_bdata   ldx #0 
                
@first_256      lda VIA1_IFR          ; check IRQ flag
                and #%00000100        ; check for SR flag  
                beq @first_256          
                lda VIA1_SR           ; get data 
                sta sd_data_buffer,x 
                inx 
                bne @first_256
                                
                ; ldx #0              ; read last 256 bytes into buffer 
@last_256       lda VIA1_IFR          ; check IRQ flag
                and #%00000100        ; check for SR flag  
                beq @last_256           
                lda VIA1_SR           ; get data 
                sta sd_data_buffer+$100,x 
                inx 
                bne @last_256
                
                jsr spi.get_byte      ; get CRC 
                jsr spi.get_byte                                        

                sec                   ; set carry on success 
                rts 
                

sd.get_blockx   jsr .sd_block_cmdx    ; address already set in sd_cmd_dat structure
                beq .sd_get_bdata

.sdgb_error     lda #SD_FILE_ERR      ; return file error code 
                sta sd_cmd_dat+0 
                clc                   ; clear carry on failure  
                rts

I have chosen not to call my generic spi.get_byte function while reading the data, but implement it inline for speed, and I also have two 256 byte loops to read the 512 bytes of data for the same reason. If you want to optimize for code size you could use spi.get_byte in a loop instead, but I have optimized for the speed of reading the data. Note the small optimization of not setting XR = 0 in the second loop, since it will have that value exiting the first loop. Once the 512 bytes are received the card sends a two byte CRC code which I simply ignore, but if you want error detection you could implement logic for that.

The loop takes 22 cycles to complete each pass, which is slightly longer than the 16 cycles it takes to read a byte using the system clock to drive the shift register clock on the VIA. This implementation has the advantage of being able to work at different clock speeds, since it waits for the flag to be set, but if you know you are running at a fixed speed you could optimize it to run ~25% faster, e.g. with the following:

@first_256      lda VIA1_SR             
                sta sd_data_buffer,x
                nop        
                inx 
                bne @first_256

I have not currently tested this and at exactly 16 cycles the timing might be too tight.

The first data we want to read from a FAT formatted card is the Master Boot Record (MBR). The MBR is located on block 0. To get the address of any block we have to multiply by the block size (512 or $200) … of course for block 0 this is still $00000000 😉

sd.load_mbr     lda #$00                     ; load block 0 on SD card = MBR 
                sta sd_cmd_dat+4                                        
                sta sd_cmd_dat+3                                        
                sta sd_cmd_dat+2                                        
                sta sd_cmd_dat+1 
                jsr sd.get_blockx

@boot_adr       lda #$00                     ; calculate & store boot sector address (little endian)  
                sta sd_boot_sector+0         ; address = LBA * $200 (sector size)
                lda sd_data_buffer+$1BE+8    ; location of first partition info ($1BE) + LBA (#sectors between MBR and first sector in partition) 
                asl 
                sta sd_boot_sector+1
                lda sd_data_buffer+$1BE+9 
                rol 
                sta sd_boot_sector+2
                lda sd_data_buffer+$1BE+10 
                rol 
                sta sd_boot_sector+3

If you want more info on what is stored on the MBR there is a quick overview here and more info in the Wikipedia article linked above. I am assuming that we only have one partition (or at least my current implementation will only support the first partition on the card) and the only piece of information we are after in the MBR is the address of the boot sector. The first partition info starts at offset $1BE and the number of blocks/sectors between the MBR and the boot sector is stored as a 4 byte value starting in byte 8 of the partition info. Multiply this value by 512 and you have the boot sector address.
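
As a PC-side illustration of the same lookup, here is a small Python sketch. The function name and the example LBA of 2048 are assumptions. Unlike the 6502 code, which only shifts the low three bytes of the LBA, it uses the full 4 byte value.

```python
import struct

PARTITION_TABLE = 0x1BE   # offset of the first partition entry in the MBR
SECTOR_SIZE = 512


def boot_sector_address(mbr):
    """Byte address of the boot sector of the first partition."""
    # 4 byte little endian LBA at byte 8 of the partition entry
    (lba,) = struct.unpack_from("<I", mbr, PARTITION_TABLE + 8)
    return lba * SECTOR_SIZE


# Demo MBR where the first partition starts at sector 2048.
mbr = bytearray(512)
struct.pack_into("<I", mbr, PARTITION_TABLE + 8, 2048)
print(hex(boot_sector_address(bytes(mbr))))   # 0x100000
```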

I should really also check that this partition is active and that it is FAT32 formatted, but for now I will leave that up to the reader.

The boot sector contains more information that we will need to read files from the SD card. Most importantly we want to find the location of the File Allocation Table (FAT) and its length. We also need the number of sectors per cluster for the FAT formatting of the card. If you have formatted a USB stick you might have noticed an “allocation unit size” which is the smallest unit size for the file format. It will be a power of 2 multiplier of the SD block size, e.g. an allocation unit size of 4096 bytes is 8 * 512. This means that the smallest size of FAT data you can write to the card in this example is 4096 bytes or 8 blocks of 512 bytes. Every file will be saved as a whole number of clusters with any remaining bytes “wasted”, so a 5000 byte file will be 2 clusters long in this example, with 3,192 bytes unused in the second cluster. What a luxury on an 8 bit system, but obviously FAT32 was designed for systems where a couple of thousand bytes can be considered a pittance 🙂
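
The numbers in this example are easy to verify with a couple of lines of Python (the function name is made up for the illustration):

```python
def allocation(file_size, allocation_unit=4096):
    """Return (allocation units used, bytes left unused in the last unit)."""
    units = -(-file_size // allocation_unit)        # ceiling division
    unused = units * allocation_unit - file_size
    return units, unused


print(allocation(5000))   # (2, 3192)
```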

In addition to the location of the FAT we also want to find the location of the root sector where information about file and folder names, etc. is stored. This location is not directly stored in the boot sector, but can be calculated since it starts immediately after the FAT sectors. For error correction etc. there is more than one FAT (typically 2) so you need to multiply the length of a FAT by the number of FATs to find the root sector location. (Strictly speaking it is cluster 2 that starts right after the FATs; since the root directory is almost always cluster 2 the two coincide.) Lastly we want the root cluster number, which typically is 2 but could be a different value. We need this since the FAT stores the cluster numbers of files and we will have to convert those cluster numbers to addresses of blocks on the SD Card.

sd.load_bsctr   ldx #<sd_boot_sector       ; get boot sector 
                ldy #>sd_boot_sector 
                jsr sd.get_block
                bcc .stlb_end              ; carry clear = failure         

@sector_size    lda sd_data_buffer+$0B     ; check that sector size is 512 
                bne .init_sd.error                      
                lda sd_data_buffer+$0C           
                cmp #$02
                bne .init_sd.error                      

@reserved       lda sd_data_buffer+$0E     ; sectors to first FAT (number of reserved sectors)
                sta zp_temp+0   
                lda sd_data_buffer+$0F                  
                sta zp_temp+1   
                lda #$00
                sta zp_temp+2   
                jsr m.mult512              ; convert from sector length to address length 
                
                ldx #<sd_boot_sector       ; copy boot sector address to fat sector address... 
                ldy #<sd_fat_sector             
                jsr m.move2w_osd 
                
                ldx #<sd_fat_sector        ; ... and add reserved sector length                            
                ldy #>sd_fat_sector 
                jsr m.add2w

                ldx #3
@fat_length     lda sd_data_buffer+$24,x   ; sector length of FAT (for FAT32)
                sta sd_fat_length,x 
                sta zp_temp,x   
                dex 
                bpl @fat_length
                jsr m.mult512              ; convert from sector length to address length  

                ldx #<sd_fat_sector        ; copy fat sector address to root sector address... 
                ldy #<sd_root_sector            
                jsr m.move2w_osd                

                lda sd_data_buffer+$10     ; number of FATs 
                sta zp_temp+5
@add_fat_length ldx #<sd_root_sector       ; add FAT length once per FAT (number of FATs times)                            
                ldy #>sd_root_sector 
                jsr m.add2w
                dec zp_temp+5
                bne @add_fat_length             
                
@cluster_size   lda sd_data_buffer+$0D     ; sectors / cluster 
                sta sd_cluster_size
                               
                ldx #3 
@root_cluster   lda sd_data_buffer+$2C,x   ; root cluster number (typically 2, but is theoretically a 2word)
                sta sd_root_cluster,x 
                dex
                bpl @root_cluster
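
To cross-check the offsets used above, the same fields can be read with a short Python sketch. The function name, the dictionary keys and the demo values (32 reserved sectors, 2 FATs of 1000 sectors each) are assumptions for the example; the offsets $0B, $0D, $0E, $10, $24 and $2C are the ones used in the 6502 code.

```python
import struct

SECTOR_SIZE = 512


def parse_boot_sector(bs, boot_address):
    """Extract the FAT32 values needed to locate the FAT and the root directory."""
    bytes_per_sector, sectors_per_cluster, reserved = struct.unpack_from("<HBH", bs, 0x0B)
    if bytes_per_sector != SECTOR_SIZE:
        raise ValueError("only 512 byte sectors are supported")
    number_of_fats = bs[0x10]
    (fat_length,) = struct.unpack_from("<I", bs, 0x24)      # sectors per FAT
    (root_cluster,) = struct.unpack_from("<I", bs, 0x2C)
    fat_address = boot_address + reserved * SECTOR_SIZE
    root_address = fat_address + number_of_fats * fat_length * SECTOR_SIZE
    return {
        "sectors_per_cluster": sectors_per_cluster,
        "fat_address": fat_address,
        "root_address": root_address,
        "root_cluster": root_cluster,
    }


# Demo boot sector for a partition starting at $00100000.
bs = bytearray(512)
struct.pack_into("<HBH", bs, 0x0B, 512, 8, 32)
bs[0x10] = 2
struct.pack_into("<I", bs, 0x24, 1000)
struct.pack_into("<I", bs, 0x2C, 2)
info = parse_boot_sector(bytes(bs), 0x00100000)
print(hex(info["fat_address"]), hex(info["root_address"]))   # 0x104000 0x1fe000
```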

I created a couple of helper math routines to help with multiplying and adding 4 byte (double word) numbers. Nothing special about these, but for completeness they are listed below.

I will stop this post here. I had intended to cover everything needed to load a file, but it is already getting too long and I am late in publishing. So the next blog post will cover the FAT and root sector, and finally we will load a file. However if you are impatient everything is in the complete code listing with a decent number of comments and you can always ask questions in the comments.

;--------------------------------------------------------------------------
; copy 2word address to another location, both in os_data range   
; source/dest low byte pointer in XR/YR 
;--------------------------------------------------------------------------

m.move2w_osd    lda #>os_data                           ; both source and destination in the same page range 
m.move2w        sta zp_source+1
                sta zp_dest+1

                stx zp_source+0                         ; low byte of source/dest       
                sty zp_dest+0

                ldy #3
@move           lda (zp_source),y 
                sta (zp_dest),y 
                dey 
                bpl @move
                
                rts 
                
                
;--------------------------------------------------------------------------
; add 2word zp_temp to (zp_source) and store in (zp_source). 
; pointer to other 2word in XR/YR 
;--------------------------------------------------------------------------

m.add2w_512     lda #$00
                sta zp_temp+0                           ; store 512 in zp_temp
                sta zp_temp+2
                sta zp_temp+3
                lda #$02 
                sta zp_temp+1 

                
m.add2w         stx zp_source+0                         ; low byte in XR
                sty zp_source+1                         ; high byte in YR               
                ldy #0                                  ; note: cannot run a loop, since cpy would affect flag 

                clc
                lda (zp_source),y 
                adc zp_temp+0
                sta (zp_source),y 
                iny
                lda (zp_source),y
                adc zp_temp+1
                sta (zp_source),y 
                iny
                lda (zp_source),y
                adc zp_temp+2 
                sta (zp_source),y 
                iny
                lda (zp_source),y
                adc zp_temp+3 
                sta (zp_source),y

                rts 


;--------------------------------------------------------------------------
; left shift 2word in zp_temp. number of shifts in XR 
;--------------------------------------------------------------------------     

m.shift2w       asl zp_temp+0           
                rol zp_temp+1
                rol zp_temp+2
                rol zp_temp+3 
                dex 
                bne m.shift2w 
                rts 

                
;--------------------------------------------------------------------------
; multiply 2word in zp_temp with 512 
;--------------------------------------------------------------------------

m.mult512       asl zp_temp+0           
                rol zp_temp+1
                rol zp_temp+2
                lda zp_temp+2 
                sta zp_temp+3 
                lda zp_temp+1 
                sta zp_temp+2 
                lda zp_temp+0 
                sta zp_temp+1 
                lda #$00                                 
                sta zp_temp+0                                           
                rts
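
As a cross-check of the helper routines, the following Python sketch mirrors m.mult512 byte by byte and compares it with a plain multiplication modulo 2^32. The function names are hypothetical and the model only follows the instructions shown in the listing (shift the three low bytes left by one bit, then move every byte up one position).

```python
MASK32 = 0xFFFFFFFF  # the 6502 routines work on 4 byte (32 bit) values


def add2w(a: int, b: int) -> int:
    """Model of m.add2w: 32 bit addition, the final carry is dropped."""
    return (a + b) & MASK32


def mult512_bytewise(value: int) -> int:
    """Model of m.mult512 operating on four little endian bytes."""
    t = list((value & MASK32).to_bytes(4, "little"))

    # asl zp_temp+0 / rol zp_temp+1 / rol zp_temp+2: times 2 on the low 3 bytes
    carry = 0
    for i in range(3):
        shifted = (t[i] << 1) | carry
        carry = shifted >> 8
        t[i] = shifted & 0xFF

    # Move each byte one position up and clear the lowest byte: times 256
    t[3], t[2], t[1], t[0] = t[2], t[1], t[0], 0
    return int.from_bytes(bytes(t), "little")


def mult512_reference(value: int) -> int:
    """What the routine should compute: value * 512 modulo 2^32."""
    return (value * 512) & MASK32
```

The two versions agree because the original top byte and the bit shifted out of zp_temp+2 would overflow 32 bits in any case.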

May 24, 2020

Accessing the SD Card

Now that we can send and receive bytes using the SPI protocol it is time to begin accessing the SD card itself. In this post I will go through what is needed to initialize the SD Card and in the next post go into the FAT format so we can read files from the SD Card. My implementation is not fully general, but should support most newer cards and you can find the entire code here.

I found this series of blog posts from Lucky Resistor helpful in understanding the SD commands, responses, and data format. The pictures were especially helpful in getting an overview of what is being sent back and forth and I would highly encourage you to read Part 2 and Part 3 in combination with this post of mine. There is also a great overview here that would be a good pre-read to this post and the whole initialization process is summarized in this flow chart.

The procedure to initialize the SD Card is as follows

Put card into SPI mode, by sending at least 74 clock pulses
Send Command 0 repeatedly until card is idle (resets)
Send Command 8 and check for the right response
Send ACMD41 command to initialize newer / high capacity cards
Send Command 58 to check working voltage range
Finally send Command 16 to ensure block size is 512 bytes

The first step is sending at least 74 clock pulses (I send 80 as the shift register triggers 8 clock pulses every time we send a byte). For this first step you have to set CS (slave select for the SD card) high and MOSI should also be 1 during this procedure. For all “normal” commands you should of course select the SD card by pulling CS low, but the first step is different. The following code achieves this (see SPI Part 2 for spi.send_byte etc. or you can find the full code here)

sd.init         jsr spi.set_output              ; setup for SPI output 
                lda #(SPI_OUTPUT+SPI_NOSS)      ; set SR to output and SD card CS = 1 (special first time init)
                sta VIA1_PORTB                
                                
                ldx #10                         ; send 80 clock signals to initialize SPI mode for SD card      
@init0          lda #$FF                        ; MOSI = 1 and CS = 1 
                jsr spi.send_byte
                dex 
                bne @init0

Note that this initialization should be run at a clock speed between 100 and 400 kHz. You can subsequently (on newer cards) increase the speed up to about 25 MHz. Check my previous post for notes on how to vary SPI clock speed and setting it to the max of 1/2 the system clock speed.

Sending a command to the SD card follows a 6 byte format. The first byte is the command number with bit 7 = 0 and bit 6 = 1. E.g. for command 8 the first byte is $48. This is then followed by 4 bytes that are the argument for the command, and finally a CRC (error checking) byte. The CRC byte is optional for SPI mode, with some exceptions that we cover later. Before sending a command you should check that the SD card is idle, indicated by receiving $FF. After sending the command the SD card will send one of several responses, depending on the command. The first byte of the response (at least for the commands used here) will contain several flags and have bit 7 = 0. For many commands this is the only response (denoted R1).
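
To make the 6 byte format concrete, here is a small Python sketch that builds a command frame and computes the CRC byte. It is an illustration only, not the code running on the 6502: the function names are made up, and the CRC routine assumes the usual SD CRC7 (polynomial x^7 + x^3 + 1, shifted left by one with the lowest bit set to 1). With that assumption it reproduces the two CRC bytes used in this post, $95 for command 0 and $87 for command 8 with argument $000001AA.

```python
def crc7(data: bytes) -> int:
    """Bitwise CRC7 with polynomial x^7 + x^3 + 1, initial value 0."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):          # most significant bit first
            in_bit = (byte >> bit) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if in_bit ^ top:
                crc ^= 0x09
    return crc


def sd_command_frame(command: int, argument: int = 0) -> bytes:
    """Return the 6 bytes in the order they go out on the wire."""
    # First byte: bit 7 = 0, bit 6 = 1, low 6 bits = command number
    body = bytes([0x40 | (command & 0x3F)]) + argument.to_bytes(4, "big")
    # Last byte: 7 bit CRC followed by a stop bit that is always 1
    return body + bytes([(crc7(body) << 1) | 0x01])


def as_sd_cmd_dat(frame: bytes) -> bytes:
    """Reverse the frame, mirroring a buffer that is sent from index 5 down to 0."""
    return bytes(reversed(frame))
```

For example, `sd_command_frame(8, 0x1AA).hex()` gives the wire order, while `as_sd_cmd_dat(...)` shows how the same bytes would sit in a buffer that is transmitted from the last index downwards.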

sd.send_cmd     ora #$40            ; set bit 6 = 1 for a command 
                sta sd_cmd_dat+5    ; store at end of command data (we send from end due to dex)
                
@set_crc        ldx #$FF            ; standard CRC for most commands                
                cmp #$48            ; cmd8 has a different CRC
                bne @store_crc                           
                ldx #$87            ; crc for cmd8                 
@store_crc      stx sd_cmd_dat+0    ; store at start of command data (we send from end due to dex)


@check_idle     jsr spi.set_input
                jsr spi.get_byte
                cmp #$FF            ; idle = $FF
                bne .sdcmd_error    ; not idle: return a negative value 
                
@wait           lda VIA1_IFR        ; check IRQ flag
                and #%00000100      ; check for SR flag  
                beq @wait
                

sd.send_cmdx    jsr spi.set_output  ; setup SPI to output 
                                                
                ldx #5
@send_cmd       lda sd_cmd_dat,x 
                jsr spi.send_byte 
                dex 
                bpl @send_cmd

       
sd.get_response jsr spi.set_input

                ldx #10             ; wait for response (should be within 8 bytes)             
@get_byte       jsr spi.get_byte
                bpl .sdcmd_return   ; first bit of first byte of response will always be 0  
                dex                 ; retry 
                bne @get_byte   

.sdcmd_error    ora #$80            ; return negative value on error 
                
.sdcmd_return   sta sd_cmd_dat+0    ; save return value for later retrieval 
                rts

Couple of comments on the above. Since a valid R1 response from the SD card always has bit 7 = 0 I return a negative value in case of a failure. Also in the above you will notice that the command bytes are stored in the “opposite” sequence (e.g. the first byte to send is stored in the 6th byte of the command data). I do this in general to optimize my indexed loops, so I don’t have to do a compare at the end. Doing “dex, bne” saves both time and space over “inx, cpx #val, bne”. Most of the time it is possible to design your code so a loop runs from max value down to zero.

With the code for sending a command to the SD card in place we can now proceed with the actual initialization process. Firstly (and perhaps most importantly) we send command 0 repeatedly until the card is in idle state. I found that if a previous command was not completed this can take quite a number of retries. For instance, if you reset the computer while there are still 100’s of bytes left to receive from a read command, command 0 needs to be sent dozens of times to get to idle state.

Since the card might not initially be in idle mode we can’t wait for the idle byte to be received, as we generally do when sending commands. Therefore I jump to sd.send_cmdx to skip this step in the code below. Also for the initialization process it is recommended to send the correct CRC byte ($95) for command 0.

                ldy #100             ; number of retries to get card in idle state   
                
@cmd0           lda #0               ; command 0 (go to idle) - send without waiting for idle byte! 
                sta sd_cmd_dat+4                                        
                sta sd_cmd_dat+3                                        
                sta sd_cmd_dat+2                                        
                sta sd_cmd_dat+1                                
                ldx #$95             ; crc for cmd0 
                stx sd_cmd_dat+0     ; store at start of command data (we send from end due to dex)
                ora #$40             ; set the command bit 
                sta sd_cmd_dat+5     ; store at end of command data (we send from end due to dex)
                jsr sd.send_cmdx     ; call send command w/o checking idle 
                cmp #$01                                        
                beq @cmd8            ; returns 1 when in idle state
                
                dey                  ; retry cmd0 
                bne @cmd0       
                beq .init_sd.error

Next step is to “send if cond” or command 8 with a special argument of $000001AA and the correct CRC code of $87. The $AA is a pattern that the card should repeat back on success. It could be any pattern, but $AA is the standard one ($AA = %10101010). If we have a type 2+ card the 4th byte of the response should be our pattern (response to command 8 is the R1 byte plus 4 more). If we have an older card then the initialization process is slightly different and I have chosen not to support this (at least for now).

@cmd8           ;lda #$00                
                ;sta sd_cmd_dat+4                                        
                ;sta sd_cmd_dat+3                                        
                lda #$01                
                sta sd_cmd_dat+2                                        
                lda #$AA               ; pattern for response                          
                sta sd_cmd_dat+1                        
                lda #8                 ; command 8 = send if cond  
                jsr sd.send_cmd                                 
                ;cmp #$01              ; skip error checking as response will trigger error if there is one 
                ;bne @error                             
                jsr spi.get_byte       ; response should be $00, $00, $01, $AA 
                jsr spi.get_byte
                jsr spi.get_byte
                jsr spi.get_byte
                cmp #$AA 
                bne .init_sd.error
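
The check on the command 8 response can be sketched in Python as follows. This is a simplified model with a made-up function name: it takes the five response bytes (R1 plus four more) and assumes, in addition to the pattern echo described above, that bit 2 of R1 flags an illegal command (which is how an older card would answer) and that the low nibble of the fourth byte reports the accepted voltage range.

```python
def check_cmd8_response(response: bytes, pattern: int = 0xAA) -> bool:
    """Return True if the card accepted command 8 and echoed our pattern."""
    if len(response) != 5:
        raise ValueError("expected R1 plus 4 more bytes")

    r1, _reserved_hi, _reserved_lo, voltage, echo = response

    # Assumption: bit 2 of R1 means "illegal command", i.e. an older card
    # that does not know command 8 and needs a different init sequence.
    if r1 & 0x04:
        return False

    # Assumption: low nibble $1 means the 2.7V to 3.6V range was accepted.
    voltage_ok = (voltage & 0x0F) == 0x01
    return voltage_ok and echo == pattern
```

The assembly above only compares the last byte with $AA, which is the shorter test; the extra checks here just spell out what else the response carries.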

Since the command data already contains zeros in the 3rd and 4th location from command 0 I skip setting them. It’s just second nature for me to optimize for size and speed in 6502 assembly, but I leave the code commented out for readability and in case the preceding code should change and the assumption is no longer valid. I also skip checking if we got the right R1 response (should be $01) since if there is an error then we won’t get the right pattern back either and that will trigger the error. Again just second nature to save bytes and cycles.

With a newer card we then need to send it a special application command that starts the card’s initialization and indicates that we support high capacity (SDHC and SDXC) cards. Command 55 tells the card that the next command is an application command. The application command we need is ACMD41 with bit 30 set in its argument ($40000000). This sequence might also have to be repeated a number of times until the SD card has executed the command. When the card is ready the ACMD41 command will return zero.

                ldy #200             ; number of retries until card is ready 
        
@cmd55          lda #$00             ; command arguments 
                sta sd_cmd_dat+4                                        
                sta sd_cmd_dat+3                                        
                sta sd_cmd_dat+2                                        
                sta sd_cmd_dat+1 
                lda #55              ; command 55 = application command   
                jsr sd.send_cmd
                ;cmp #$01
                ;bne @error                             
                
@amc41          lda #$40             ; command arguments 
                sta sd_cmd_dat+4                                        
                ;lda #$00 
                ;sta sd_cmd_dat+3                                       
                ;sta sd_cmd_dat+2                                       
                ;sta sd_cmd_dat+1 
                lda #41              ; ACMD41 when following command 55  
                jsr sd.send_cmd         
                beq @cmd58
                
                dey                  ; retry until card is ready 
                bne @cmd55
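
The retry logic of the loop above can be sketched in Python like this. Everything here is a stand-in for illustration: `send_cmd` is a hypothetical callable that takes a command number and a 32 bit argument and returns the R1 byte, and `FakeCard` is a toy model that stays busy for a fixed number of polls; neither is part of the real code.

```python
from typing import Callable

R1_IDLE = 0x01          # card is still initializing
R1_READY = 0x00         # card has left the idle state
HCS = 0x40000000        # bit 30 of the ACMD41 argument


def wait_until_ready(send_cmd: Callable[[int, int], int],
                     retries: int = 200) -> bool:
    """Repeat command 55 + ACMD41 until the card returns zero."""
    for _ in range(retries):
        send_cmd(55, 0)                      # next command is an app command
        if send_cmd(41, HCS) == R1_READY:
            return True
    return False                             # gave up: treat as init error


class FakeCard:
    """Toy card model that reports idle for `busy_polls` ACMD41 attempts."""

    def __init__(self, busy_polls: int) -> None:
        self.remaining = busy_polls
        self.app_cmd = False

    def send_cmd(self, command: int, argument: int) -> int:
        if command == 55:
            self.app_cmd = True
            return R1_IDLE
        if command == 41 and self.app_cmd:
            self.app_cmd = False
            if self.remaining > 0:
                self.remaining -= 1
                return R1_IDLE
            return R1_READY
        return 0x05                          # illegal command while idle
```

A call such as `wait_until_ready(FakeCard(3).send_cmd)` succeeds on the fourth attempt, while a card that never becomes ready exhausts the retries and the function returns False.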

The final steps are to send command 58 (read OCR) to check that the card supports the right voltage supply ranges. This is optional and could be skipped entirely if your supply voltage is 2.7V to 3.6V. I still send the command just in case, but ignore the response. Lastly we need to send command 16 to set the block size to 512 bytes to ensure compatibility with FAT. This is done by setting the argument to $00000200 = 512.

@cmd58          lda #$00 
                sta sd_cmd_dat+4                                        
                ;sta sd_cmd_dat+3                                       
                ;sta sd_cmd_dat+2                                       
                ;sta sd_cmd_dat+1 
                lda #58                  ; command 58 = read OCR    
                jsr sd.send_cmd         
                bne .init_sd.error 
                jsr spi.get_byte         ; don't use this response for anything 
                jsr spi.get_byte         ; is used to check that voltage is ok for this card     
                jsr spi.get_byte
                jsr spi.get_byte

                                                        
@cmd16          ;lda #$00 
                ;sta sd_cmd_dat+4                                       
                ;sta sd_cmd_dat+3                                       
                ;sta sd_cmd_dat+1 
                lda #$02                  ; 512 bytes size = $00000200
                sta sd_cmd_dat+2                                        
                lda #16                   ; command 16 = set block size     
                jsr sd.send_cmd         
                bne .init_sd.error
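
If you do want to look at the four bytes that command 58 returns instead of discarding them, a decoder could look like the Python sketch below. The function name is made up and the bit positions are assumptions based on my reading of the SD specification: bit 31 as the power up status, bit 30 as the card capacity flag, and bits 15 to 23 as one flag per 0.1V step from 2.7V to 3.6V.

```python
def decode_ocr(ocr_bytes: bytes) -> dict:
    """Decode the 4 OCR bytes that follow the R1 byte of command 58."""
    if len(ocr_bytes) != 4:
        raise ValueError("the OCR register is 4 bytes long")

    ocr = int.from_bytes(ocr_bytes, "big")   # first byte received is the MSB
    voltage_window = (ocr >> 15) & 0x1FF     # assumed: bits 15..23

    return {
        "powered_up": bool(ocr & (1 << 31)),        # assumed: bit 31
        "high_capacity": bool(ocr & (1 << 30)),     # assumed: bit 30
        "voltage_window": voltage_window,
        "full_2v7_to_3v6": voltage_window == 0x1FF,
    }
```

A card that supports the whole 2.7V to 3.6V range would then have all nine bits of the voltage window set.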

With this the card should be initialized and ready for reading and writing using the FAT file format. I repeat this initialization each time I begin an operation with the SD card, just in case the card has been removed and/or replaced with another one. There might be a way of detecting that a card has been ejected so that you can limit the initialization to once. If someone knows how to do this I would love to hear it in the comments.

The full code incl. reading files in the FAT format can be found here. Feel free to skip ahead if you can’t wait for the next post on the FAT file format.

May 15, 2020 (updated May 18, 2020)

SPI Part 2

In this post we will take a look at the basic SPI code and then subsequently at the more complex SD card and FAT file format implementation. Before we get going some definitions. The way my address decoding logic is set up the VIA registers have the following addresses and names that I will use in the code. Should be easy to modify for your system (see VIA documentation for more details)

VIA1_PORTB      = $F100      ; address of port b
VIA1_PORTA      = $F101      ; address of port a
VIA1_DDRB       = $F102      ; data direction register for port b
VIA1_DDRA       = $F103      ; data direction register for port a
VIA1_T1CL       = $F104      ; timer 1 counter low 
VIA1_T1CH       = $F105      ; timer 1 counter high
VIA1_T1LL       = $F106      ; timer 1 latch low
VIA1_T1LH       = $F107      ; timer 1 latch high
VIA1_T2CL       = $F108      ; timer 2 counter low 
VIA1_T2CH       = $F109      ; timer 2 counter high
VIA1_SR         = $F10A      ; shift register 
VIA1_ACR        = $F10B      ; auxiliary control register 
VIA1_PCR        = $F10C      ; peripheral control register 
VIA1_IFR        = $F10D      ; interrupt flag register 
VIA1_IER        = $F10E      ; interrupt enable register
VIA1_ORA        = $F10F      ; same as port A, except "no handshake"

My initialization code disables the shift register interrupt, sets port B to output and sets the T2 clock frequency (for the fastest speed set SPI_CLKN = 0).

spi.init        lda #%00000100  ; disable shift-register interrupt
                sta VIA1_IER     
        
                lda #$FF        ; set port B to output
                sta VIA1_DDRB
        
                lda #SPI_CLKN   ; set number of counts for T2/clock
                sta VIA1_T2CL

                rts

You can set an interrupt to trigger when 8 bits have been sent/received, but I have chosen to have the load/save running in the “main thread”. I have certain kernel update routines in IRQ (e.g. updating screen, clock, etc.) that will then interrupt when needed. Hence the SR interrupt is disabled. You could easily choose to do this differently.

Since we are using the shift-register for both MOSI and MISO communication (see my SPI Part 1 post) we need to set up both the SR register and the buffer chip for either output or input. For the SR register we do this by setting bits 2-4 in the VIA auxiliary control register (ACR) to either %001 (shift in under control of T2) or %101 (shift out under control of T2).

The input line (connected to PB0) or the output line (connected to PB1) on the buffer is enabled by pulling it low (obviously only pull one low and keep the other high … I know I really should implement this in HW). This will enable or disable the respective gates on the buffer.

Finally pull the SD card’s Slave Select low to indicate it is the device we are communicating with (SS1 is connected to PB2). E.g. writing %xxxxxx10 to Port B will disable output (bit 1 = 1) and enable input (bit 0 = 0). And %xx1110xx will set SS1 = 0 to select the SD card and deselect SS2-4 by setting them to 1.

spi.set_input   lda VIA1_ACR
                and #%11100011    ; mask out SR control bits
                ora #%00000100    ; SR in under control of T2 
                sta VIA1_ACR          
     
                lda #%00111010    ; buffer input & SD card CS = 0 
                sta VIA1_PORTB          
                rts 

                
spi.set_output  lda VIA1_ACR
                and #%11100011    ; mask out SR control bits
                ora #%00010100    ; SR out under control of T2
                sta VIA1_ACR

                lda #%00111001    ; buffer output & SD card CS = 0
                sta VIA1_PORTB
                rts
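
The bit patterns used by the two routines above can be double-checked with a few lines of Python. This is only a host-side sketch with made-up names; it assumes the wiring described in this post (PB0 = buffer input enable, PB1 = buffer output enable, PB2 to PB5 = SS1 to SS4, all active low) and also lists the two PHI2 shift modes mentioned in Note 1 further down.

```python
SR_MODE_MASK = 0b00011100   # ACR bits 2-4 select the shift register mode

SR_IN_T2 = 0b001            # shift in under control of T2
SR_OUT_T2 = 0b101           # shift out under control of T2
SR_IN_PHI2 = 0b010          # shift in under control of PHI2 (Note 1)
SR_OUT_PHI2 = 0b110         # shift out under control of PHI2 (Note 1)


def set_sr_mode(acr: int, mode: int) -> int:
    """Same as 'and #%11100011' followed by 'ora #mode<<2' on the ACR value."""
    return (acr & ~SR_MODE_MASK & 0xFF) | (mode << 2)


def port_b_value(receive: bool, selected_device: int = 1) -> int:
    """Port B byte for the buffer direction and the selected SPI device."""
    # Active low enables: bit 0 = input gate, bit 1 = output gate
    buffer_bits = 0b10 if receive else 0b01
    # Active low selects on PB2..PB5: clear only the chosen device's bit
    select_bits = 0b1111 & ~(1 << (selected_device - 1))
    return (select_bits << 2) | buffer_bits
```

With these definitions `port_b_value(True)` and `port_b_value(False)` give the two constants written to VIA1_PORTB in spi.set_input and spi.set_output.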

So far all we have done is to set up things and we need the code to actually send a byte over SPI. Note that any reading or writing to the shift register (SR) clears the respective flag in the IFR. Once 8 bits have been sent or received the flag (bit 2 in IFR) is set to signal that data is ready and if the interrupt is enabled it will also trigger an interrupt.

spi.send_byte   sta VIA1_SR      ; send data to SR (also clears SR flag in IFR)
@wait           lda VIA1_IFR     ; check IFR flags
                and #%00000100   ; isolate SR flag  
                beq @wait        ; wait until done sending byte 
                rts
                

spi.get_byte    lda VIA1_IFR     ; check IFR flag
                and #%00000100   ; isolate SR flag  
                beq spi.get_byte ; wait until SR flag is set (when previous shift operation is completed)
                lda VIA1_SR      ; get data (also clears SR flag in IFR)
                rts

You have to be a little careful as you can only clear the SR flag in IFR by reading or writing to the shift register. For SD card communication you will always send bytes before receiving (more about this in the next blog post where we will dive into the actual SD card and FAT format). You will notice that my send_byte function sends first and then waits (until the byte has been sent), whereas the get_byte function waits first and then reads. This works because of this sequence, but if you are not careful you could be caught in a loop waiting for a flag that is never set or cleared.

Ideally you want to do other things while the SR register is sending/receiving, so for sending the above can certainly be optimized. Later you will see in some of my code where I send or receive large quantities of data that I have made these optimizations.

At the maximal T2 clock speed it will take 32 clock cycles to send/receive a byte, which can conveniently be used to e.g. store the previous byte in memory. If you count the cycles of the code you execute between reads and writes you wouldn’t even have to check the SR flag before triggering the next byte to be sent/received (but if you are not careful you could trigger the shift register before it is ready for the next byte).

These were the basics of sending and receiving bytes over SPI and in my next blog posts I will go over the specifics of communicating with an SD card in SPI mode.

Note 1: As I was writing this blog post I realized the maximum read/write speed is attained by driving the SPI clock signal from PHI2 rather than Timer 2. The speed is then fixed at half the system clock speed and sending/receiving a byte takes 16 clock cycles, rather than 32 or more with T2. To do this you simply set bits 2-4 of ACR to %010 for receiving and %110 for sending. I tested this with my SD card and it works nicely. However, if you need variable speeds for SPI communication with other devices (that might not be as fast) you still want to be able to change the SPI clock speed through T2.

Note 2: The IDE I used for coding supports standard 6502 instructions, but not the extended 65C02 instruction set, so there are places where my code could be optimized. This is a project for me to begin at some point. Suggestions for good 65C02 assemblers and/or IDEs are very welcome. From my C64 coding I have been using the excellent C64Studio from Georg Rottensteiner.

May 10, 2020

SPI part 1

I wanted to have a modern storage solution for my 6502 homebrew computer and that naturally leads to SD cards. The easiest way to interact with an SD card is probably in SPI mode, with the added benefit that you can later add more SPI devices. The web is full of intros to the SPI protocol, incl. https://www.circuitbasics.com/basics-of-the-spi-communication-protocol/

In short we will need a clock line (CLK), serial out (MOSI), serial in (MISO), and a “Slave Select” (SS) line for each device. At least for now I chose to set it up so I can have 3 slave devices connected to the SPI port, controlled by PB2-PB4 on the VIA.

The most important design decision was to use the VIA timer to create the SPI clock signal. The T2 timer can create a clock signal that is output on CB1 and the frequency is PHI2 / (2*(N+2)), where N is the value you set T2 to count down from (and PHI2 the system clock). Choosing N = 0 the SPI clock becomes 1/4 of the system clock or e.g. 250 kHz if you are using a 1MHz clock for the 6502.
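
The formula is easy to play with in Python. The sketch below uses hypothetical helper names and simply evaluates PHI2 / (2*(N+2)); it also derives the number of system clock cycles per byte (8 bits per byte) and searches for the smallest N that keeps the SPI clock at or below a given limit, e.g. the 400 kHz ceiling during SD card initialization.

```python
def spi_clock_hz(phi2_hz: float, n: int) -> float:
    """SPI clock on CB1 when T2 counts down from n."""
    return phi2_hz / (2 * (n + 2))


def cycles_per_byte(n: int) -> int:
    """System clock cycles needed to shift 8 bits at T2 value n."""
    return 8 * 2 * (n + 2)


def smallest_n_for(phi2_hz: float, max_spi_hz: float) -> int:
    """Smallest T2 value that keeps the SPI clock at or below max_spi_hz."""
    n = 0
    while spi_clock_hz(phi2_hz, n) > max_spi_hz:
        n += 1
    return n
```

For a 1 MHz system clock N = 0 already gives 250 kHz, so no slowdown is needed for initialization; with a faster system clock, `smallest_n_for` tells you what to load into T2 first.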

Most other solutions I could find on the web manually created a clock signal by the 6502 writing directly to the VIA. Two writes to set the clock high/low and one read for each bit is at least 12 clock cycles, so the max frequency using this method is 1/12th the system clock frequency. Using T2 (and the VIA’s shift register) we should get at least 3X the speed.

The other advantage is that T2 can be set up to control the VIA’s shift register with bits being sent or received on CB2 and as such the 6502 processor does not have to read every single bit; it can just e.g. read a full byte once the shift register is full. This further increases speed of communications, rather than having the 6502 read and shift every bit.

However there is only one shift register (that can be set to either send or receive) and the SPI protocol calls for separate lines for sending and receiving bits. So I chose to add a buffer to switch between MOSI and MISO being routed to CB2 on the VIA. I use PB0 and PB1 to set which of the lines are enabled on the buffer – and I have pulled MOSI high when the output on the buffer is disabled. This works since – at least for the SD card – there is no simultaneous sending and receiving of bits.

I could of course reduce that to only using one pin on the VIA to control output (MOSI) or input (MISO) by adding a not-gate, but at least for now I have not added the extra chip. If I need more glue logic for something else later I certainly will change this (it does annoy me a little to “waste” a pin on the VIA that could be used for something else).

Hopefully this helps explain the HW setup of the SPI interface and in my next blog post I will describe the code for implementing the SPI protocol.

April 28, 2020 (updated May 10, 2020)

A short video

I need to find some time to document some of the components of the computer, but for now you will have to make do with a short video of it in action.

http://hindsbo.dk/blog/wp-content/uploads/2020/04/20200427_160603.mp4

So far you can load the root directory from an SD card, load and/or run a file, and inspect the contents of memory. The file format is the same as C64 with the load address in the first two bytes.
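
For readers who have not met the C64 file format: the first two bytes of the file are the load address in little endian order, and everything after that is the data to be placed at that address. A minimal Python sketch of splitting such a file (the function name is made up for illustration) could look like this:

```python
from typing import Tuple


def split_prg(data: bytes) -> Tuple[int, bytes]:
    """Split a C64 style file into (load address, payload)."""
    if len(data) < 2:
        raise ValueError("file is too short to contain a load address")
    # Low byte first, then high byte
    load_address = int.from_bytes(data[:2], "little")
    return load_address, data[2:]
```

A file starting with the bytes $01 $08 would therefore be loaded at address $0801.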

The kernel also contains a number of functions related to the LCD screen and to e.g. display menus and allow for selection of an item, etc.

The little timer in the upper right hand corner is an IRQ wedge I was testing. The system runs an IRQ that updates the screen, reads the buttons, etc. It also allows you to wedge your own IRQ routine (and continue system IRQ if you so choose) which is what I am testing in the boot.prg file I load in the video.

April 18, 2020 (updated May 10, 2020)

6502 Homebrew Overview

Below is the full schematic of my 6502 homebrew computer, as it stands today. I have a couple of expansions in mind, incl. an RS232 serial interface, that are not yet hooked up. The main components are:

CPU – W65C02 processor (IC1)
32K ROM – AT28C256 15PU EEPROM (IC2)
32K RAM – HM62256B or similar (IC3)
Address Decoder – AT28C64B EEPROM (IC4)
I/O Controller – W65C22 VIA (IC5)
SPI buffer chip – SN74HC125N (IC11)
MIKROE-3 SD board (connected to SPI (J1))
LCD 20×4 Display (DS1)
74HC245 Bus Transceiver for display (IC13)
74HC574N Flip-Flop for connecting buttons (IC12)
W65C51 Asynch Comms (coming)
MAX232 RS-232 driver (coming)
X MHz Clock Oscillator (X1)
DS1813-10 Reset circuit (X2)
74HC00N NAND gates for misc glue logic (IC10)

I will go into more detail on sub-components in future blog posts and later also on the programming/code.

April 17, 2020 (updated May 10, 2020)

Hello 6502

Having coded on the 6502 since I was about 11 years old I wanted to build my own 8-bit computer. I was inspired by Ben Eater’s YouTube series and it has been a lot of fun getting into HW (again). I have extended his design and now have LCD, SD Card, etc. up and running. It has been a great learning experience that I can only recommend! In the following blogs I will detail my design and code further. As I do, feel free to comment; I’m a novice HW designer and any suggestions and improvements will be greatly appreciated.

6502 homebrew computer