Apple claims the new M2 chip has the following specs.
M2 features. Image by Apple via YouTube.
We all know that these numbers are probably a little fluffy. Maybe a lot fluffy, and in practical applications, they are probably pretty far off. Benchmarking in a lab is fine, but the numbers rarely reflect real-world performance.
After Gregorio posted this image earlier this week, it sparked a fair amount of discussion on the interwebs about the memory transfer speed of a 6502 processor.
The 6502 on Commodore machines shares its clock with the video chip. Since dual-ported RAM wasn’t financially feasible at the time, they chose a memory access trick that allowed both the video chip and the processor to access memory during a single clock cycle. I think it’s the same on most Commodores, but on the VIC-20, the processor accesses memory on the high part of the signal and the VIC chip on the low part. Anyhoo, you get the point.
VIC-20 PAL Clock signal from the 6561
Memory at 1MB per second
Going back to the slide, this 1MB per second memory bandwidth is what folks are questioning.
On every clock cycle, the 6502 accesses memory somewhere… an opcode, an operand, the stack, a zero-page pointer, etc. So at 1 MHz, typical for Commodore machines, this 1MB per second bandwidth is probably accurate in a vacuum, where marketing people hang out.
It’s important to note that Gregorio Naçu’s slide was a parody and not intended to be a hard-numbers, accurate kind of thing. Please remember that because if you don’t, the rest of this discussion will ruffle your feathers.
We’ll try some memory transfers to get an idea of what actual transfer speeds might look like using standard Commodore hardware. Other 6502-based platforms might be faster or slower, so I encourage you to try some tests of your own, and please let me know what you find.
Again, remember that transferring memory takes more clock cycles than just reading or writing…
Let’s give this a go on the most popular 6502-based system of all time, the Commodore 64.
Everyone has a heads-up display for their Commodore 64 these days.
The transfer
We’ll take a cue right from the venerable Rodnay Zaks.
Incidentally, Robin did a long video fixing this book’s implementation bug. I’ll be using the revised version as I think it’s a well-established example of doing a real-world block transfer. Sure there may be faster ways, but this is a realistic way, which is what we’re going for.
You can read this excellent chapter on how this works, and Robin’s video goes into it in great detail. Here’s what we’re going to do:
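source = $0800          ; copy from here
dest   = $4800          ; to here
len    = $4000          ; 16K
from   = $fb            ; zero-page source pointer ($fb/$fc)
to     = $fd            ; zero-page destination pointer ($fd/$fe)
tmpx   = $a6            ; scratch byte for the caller's loop counter

copyr
.block
        lda #<source    ; source pointer, low byte then high byte
        sta from
        lda #>source
        sta from+1
        lda #<dest      ; same for the destination pointer
        sta to
        lda #>dest
        sta to+1
        ldy #0
        ldx #>len       ; number of whole 256-byte pages
        beq remain
next    lda (from),y    ; copy one page
        sta (to),y
        iny
        bne next
        inc from+1      ; advance both pointers to the next page
        inc to+1
        dex
        bne next
remain  ldx #<len       ; leftover bytes (listing truncated here;
        beq done        ;  tail assumed from the standard routine)
move    lda (from),y
        sta (to),y
        iny
        dex
        bne move
done    rts
.bend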
We can count jiffies on a Commodore to give us an idea of how long this copy takes. Sure, there's a slight overhead in the setup, but I think it's marginal enough that we can ignore it for our purposes.
$12 (18) jiffies
Okay, that's pretty fast. Since that's 16k transferred, it works out to about 54.6 k per second.
Let's do a bunch of them and see what it comes out as.
We can call this pretty quickly 255 times and do the same math.
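        lda #$00        ; reset the jiffy clock ($a0-$a2)
        sta $a2
        sta $a1
        sta $a0
        ldx #255        ; number of copies to run
        stx tmpx
lp
        jsr copyr       ; copyr uses X, so the counter lives in tmpx
        dec tmpx
        ldx tmpx
        bne lp
        lda $a0         ; print the elapsed jiffies, high byte first
        jsr printbyte
        lda $a1
        jsr printbyte
        lda $a2
        jsr printbyte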
$1128 (4,392) jiffies
So at $1128 (4,392) jiffies and 255 transfers of 16,384 bytes, we're seeing around 57K per second.
Grain of salt, yes, but real-world enough.
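For anyone who wants to check the arithmetic, here is a minimal Python sketch of the jiffy-to-throughput conversion. It assumes the jiffy clock ticks 60 times a second; the function name is mine, purely for illustration.

```python
JIFFIES_PER_SECOND = 60  # assumption: the jiffy clock ticks about 60 times a second


def bytes_per_second(total_bytes, jiffies, jiffy_hz=JIFFIES_PER_SECOND):
    """Convert a byte count and an elapsed jiffy count into bytes per second."""
    seconds = jiffies / jiffy_hz
    return total_bytes / seconds


single = bytes_per_second(16_384, 0x12)            # one 16K copy, $12 jiffies
repeated = bytes_per_second(255 * 16_384, 0x1128)  # 255 copies, $1128 jiffies

print(f"single copy: {single:,.0f} bytes/s")
print(f"255 copies:  {repeated:,.0f} bytes/s")
```

The single copy is only resolved to the nearest jiffy, so the 255-copy figure is the more trustworthy of the two.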
Yeah, there's some overhead in the setup and running of the transfer. We could probably make this loop a few percentage points faster. Maybe if we make it tight, we could get 15% better out of it. But the point was real-world uses, and this is a pretty good example of a tight but flexible loop to transfer. Let's not get TOO pedantic here.
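What's important to note is that transferring memory takes several clock cycles per byte. If we count them, the inner loop comes to 16 cycles per byte (5 for the load, 6 for the store, 2 for the increment, and 3 for the branch), which puts the ceiling at roughly 64K per second at 1.023MHz and tracks roughly with our results.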
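To put a number on the cycle counting, here is a rough Python model of the whole-page part of the copy loop. It uses the standard NMOS 6502 timings and assumes no page-crossing penalties, no interrupts, and nothing else stealing cycles. Setup and the final rts are ignored, and the 1.023 MHz clock is the NTSC Commodore 64 figure. Treat it as a sketch, not a cycle-exact simulation.

```python
# NMOS 6502 cycle counts for the instructions in the page-copy loop.
# Assumes page-aligned buffers and branches that stay within one page.
LDA_IND_Y = 5      # lda (from),y
STA_IND_Y = 6      # sta (to),y
INY = 2
DEX = 2
INC_ZP = 5         # inc from+1 / inc to+1
BNE_TAKEN = 3
BNE_NOT_TAKEN = 2


def page_copy_cycles(pages):
    """Cycles spent copying `pages` whole 256-byte pages (setup and rts ignored)."""
    if pages == 0:
        return 0
    per_byte = LDA_IND_Y + STA_IND_Y + INY + BNE_TAKEN      # 16 cycles
    fall_through = BNE_TAKEN - BNE_NOT_TAKEN                # last bne is not taken
    inner = 256 * per_byte - fall_through
    outer = 2 * INC_ZP + DEX + BNE_TAKEN                    # per-page bookkeeping
    return pages * (inner + outer) - fall_through


def ideal_bytes_per_second(pages, clock_hz):
    """Best-case throughput if the CPU gets every cycle to itself."""
    return pages * 256 * clock_hz / page_copy_cycles(pages)


cycles = page_copy_cycles(0x40)   # len = $4000 is 64 pages
print(cycles, "cycles for 16K")
print(f"{ideal_bytes_per_second(0x40, 1_023_000):,.0f} bytes/s at 1.023 MHz")
```

The model lands a little above the measured figure, which is what you would expect once interrupts and video-chip cycle stealing are added back in.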
The KIM-1 is arguably the most simple and pure 6502 platform, so it will be interesting to try and do memory transfers on it.
It IS clocked a little slower than a Commodore 64, so I expect it to transfer slightly slower. But it doesn't have to compete with the VIC-II for bus access on "badlines," so maybe it'll be pretty close.
Let's find out.
I don't own a "real" KIM-1, but I do own what are considered the two best clones. Today, let's use the Corsham KIM-1 Clone. I'm going to call it a KIM-1 from here forward, mostly because I enjoy getting angry letters about this. You've been warned.
Measuring time
The KIM-1 doesn't have a jiffy clock like the other Commodore machines.
The "Application ports" are easily accessible, so if we set a pin high when we start and set it low again when we finish, we can easily use an oscilloscope to measure the time.
With the expansion bus hooked up on my Corsham KIM board, the Application port A direction is set to output with:
lda #$ff
sta $1603
Set all pins to output
And then, we can toggle pin PA0 by setting it high or low. We'll use $FF and $0 for that for simplicity.
Side note: this is a non-standard location for this port, your KIM-1 or clone probably has it in the $1700 range. Check your documentation.
16k in 262 milliseconds is around 62.5k per second. Slightly faster than a Commodore 64, even though an NTSC Commodore 64 runs at a slightly higher clock speed (1.023MHz) than our KIM here.
Let's do this 255 times in a tight loop, ignoring the overhead of things like JSR, which takes a few clock cycles each loop. We're going for a ballpark here.
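So our loop code then looks something like:
        lda #$ff
        sta $1603       ; port A direction: all pins output
        sta $1601       ; technically setting all pins high here
                        ; could just use #$01
        ldx #255        ; number of copies to run
        stx tmpx
lp
        jsr copyr       ; copyr uses X, so the counter lives in tmpx
        dec tmpx
        ldx tmpx
        bne lp
        lda #$00
        sta $1601       ; pins low again: end of the timed pulse
        brk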
Then if we probe it with an oscilloscope, we can measure the 1+ minute square wave.
So 255 transfers of 16,384 bytes take 67 seconds, or about 62k per second.
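Turning the scope readings around gives the average number of CPU cycles spent per byte, which is a handy sanity check against the instruction timings of the copy loop (5 + 6 + 2 + 3 = 16 for the load, store, increment, and branch). This Python sketch assumes the KIM-1 clone runs at exactly 1 MHz; the function name is just for illustration.

```python
KIM_CLOCK_HZ = 1_000_000  # assumption: the KIM-1 (and this clone) runs at 1 MHz


def cycles_per_byte(seconds, clock_hz, total_bytes):
    """Average CPU cycles spent per byte copied, from a wall-clock measurement."""
    return seconds * clock_hz / total_bytes


one_copy = cycles_per_byte(0.262, KIM_CLOCK_HZ, 16_384)        # single 16K copy
many_copies = cycles_per_byte(67, KIM_CLOCK_HZ, 255 * 16_384)  # 255 copies

print(f"{one_copy:.2f} and {many_copies:.2f} cycles per byte")
```

Both readings come out within a whisker of 16, so with no video chip in the way the KIM-1 spends essentially all of its time in the inner loop.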
I happen to have a Cerberus 2080 board. As far as I know, mine is the only green one in the world.
This has dual-ported RAM and can clock the brand new (yes, they still make them) WDC 65C02S processor at a blazing 8MHz. Let's see what kind of results we get from it.
Again, we have the no-jiffy-clock problem, so I'm going to skip right to the 4MB transfer, time it over the video capture, and have it show "done" on the screen when it finishes. Unlike the KIM-1, I don't have a straightforward way to time it with an I/O pin. It'll give us a good enough idea of where we are.
About 6.29 seconds
16,384 bytes 255 times took 6.29 seconds, so maxed out, a modern 6502 at 8MHz can do about 664.2k per second. Not too bad!
Sure, this was not a comprehensive set of tests. But in the real world, a 6502 can copy the entire contents of a Commodore 64's memory from one place to another in about a second. Pretty respectable, and it was pretty fast for the time.
Unrolling
You could certainly use self-modifying code and unroll this copy routine to get better performance at the price of flexibility and, arguably, of readability for the average casual 6502 assembly coder.
Again, this was not a "how fast can we absolutely make it" but an everyday use examination.
This copy can handle from one to 2^16 bytes and every number in between. And as my favorite YouTuber is fond of saying, "I know I know, but I didn't do that. Let the angry emails begin."
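For a feel of what unrolling might buy, here is a back-of-the-envelope Python estimate for one hypothetical scheme (not something tested in this post): a fixed-address loop using absolute,X addressing, with one lda/sta pair per 256-byte page and a single inx/bne shared by all pairs. The cycle counts are the standard NMOS 6502 timings, assuming page-aligned buffers.

```python
# Cycle counts for absolute,X addressing on an NMOS 6502, assuming page-aligned
# buffers so that lda abs,x never pays the page-crossing penalty.
LDA_ABS_X = 4
STA_ABS_X = 5
INX = 2
BNE_TAKEN = 3


def unrolled_cycles_per_byte(pairs_per_pass):
    """Approximate cycles per byte when each loop pass holds N lda/sta pairs.

    Hypothetical loop shape, one pair per 256-byte page:
        lda src,x / sta dst,x / lda src+$100,x / sta dst+$100,x / ... / inx / bne
    A relative branch only reaches back 128 bytes, so a pass tops out at
    about 20 pairs before it needs a jmp instead.
    """
    copy = LDA_ABS_X + STA_ABS_X
    loop_overhead = INX + BNE_TAKEN
    return copy + loop_overhead / pairs_per_pass


for pairs in (1, 4, 16):
    cpb = unrolled_cycles_per_byte(pairs)
    print(f"{pairs:2d} pairs: {cpb:.2f} cycles/byte, "
          f"about {1_000_000 / cpb:,.0f} bytes/s at 1 MHz")
```

The catch is the one already mentioned: the addresses and length are baked into the code, so you either assemble a copy per use or patch the operands at run time.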
REU
If you have an REU on your Commodore, that can theoretically swap out the memory at a byte per clock cycle. A true 1MB per second. I heard that games like Sam's Journey make use of this feature quite a bit.
Sam's Journey First Level
I'd love to hear your thoughts on how you'd approach this, pedantic, nit-picky, and otherwise. Bonus points if you demonstrate methods that show dramatically better results.
Whatever you do, be sure to have fun and don't take marketing slides too seriously.