After checking each of the 256 cells, the cell with the lowest difference is chosen.
This is essentially the fundamental operation of a vector quantizer codec, which was popular in the early days of lossy video compression due to its low processing requirements. The fact that this term doesn't appear in the article means the author has independently discovered and reinvented this theory, which is itself quite impressive.
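For anyone unfamiliar with the term, here is a minimal sketch of that nearest-cell search. The 4-pixel cells, the sum-of-absolute-differences metric, and the `nearest_cell` name are assumptions for illustration, not the author's actual code.

```python
def nearest_cell(block, codebook):
    """Return the index of the codebook cell with the lowest difference from block."""
    best_index = 0
    best_diff = None
    for index, cell in enumerate(codebook):
        # Difference metric (assumed): sum of absolute per-pixel differences.
        diff = sum(abs(a - b) for a, b in zip(block, cell))
        if best_diff is None or diff < best_diff:
            best_index, best_diff = index, diff
    return best_index


# Hypothetical 256-entry codebook: cell i is four pixels of brightness i.
codebook = [(i, i, i, i) for i in range(256)]
print(nearest_cell((200, 200, 200, 200), codebook))
```

The encoder then stores only the one-byte cell index instead of the pixels, which is why decoding is so cheap.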
flykespice 5 hours ago [-]
Those kinds of things can be figured out with intuition, and it's more productive this way than spending time looking through academic articles online.
crazygringo 6 hours ago [-]
I've read the post but I still have no idea what this is.
Where is the data coming from? Why does it involve playing part of SMB to start with? Besides the sprites used, what does this have to do with SMB at all? How is this different from just a custom ROM that would play this music video? Could you even fit this much data into a ROM?
I feel like there's a bunch of context you need to understand what this is about, and I clearly don't have any of it. And the "objectives" list at the top isn't helping ("1. Do a little tomfoolery 2. Execute arbitrary code..."). What are the actual parameters of this challenge, to understand what is actually being achieved?
Can anybody here help explain?
Edit: thank you so much for all the replies! Now I get why this is wild.
layer8 5 hours ago [-]
Some old console games have bugs that effectively allow you to reprogram them to do whatever you want, just by pressing the right buttons on the gamepad in the right sequence at the right time (due to the bugs leading to out-of-bounds memory writes or similar). In this particular case, SMB is being reprogrammed into streaming Bad Apple from controller input, where the data being streamed is input via thousands of button presses per second.
You could in theory perform this on the original NES hardware with the original SMB game cartridge, if you can press the buttons on the gamepad fast enough and with the right timing.
dlcarrier 5 hours ago [-]
There is a community of speed runners, who try to reach goals in video games as fast as possible, either with or without using glitches in the game, sometimes by hand and sometimes with tools, including tools that completely automate the entire process. The goals are usually related to completing the game or portions of the game, often as quickly as possible, as thoroughly as possible, or while doing as little as possible.
There are online forums for posting speed-running accomplishments and challenging others' accomplishments.
In this case, someone used the tools for automated speed running to execute a glitch in SMB that allows for arbitrary code execution, then entered and ran a program that plays a Bad Apple music video, all with controller input. This is posted as a speed run with an accomplishment of playing the video.
A custom cartridge could be made to make a Bad Apple music video, but this is showing off a way to play the video using only SMB cartridges.
The data has to fit into RAM, which is much smaller than a typical cartridge ROM. The music and graphic data are streamed as controller inputs, but the rest is done in RAM, using some data already in the SMB cartridge for video output, due to the low bitrate of controller input but also because the video processor can't display directly from RAM.
BuildTheRobots 5 hours ago [-]
My understanding is that they're glitching the game to inject data into memory, in this case processor instructions. And then they use another glitch to jump to and execute that code. No modifications needed to the hardware or ROM, you're just being extremely selective about how to play the game to cause memory to get flipped.
They are most likely using a controller hooked up to a bot/laptop that exploits flaws in the code with inputs at high speed (much faster than a human could manage), injecting code into the live game.
Controller inputs, same as if you were playing SMB yourself. This "just" presses buttons a lot faster and with more precision.
For a bit more background, this type of thing originally grew out of the game speedrunning community. Some people were interested in how much faster a speedrun could become if a "perfect" human with complete knowledge of the game state and perfect execution ran a game, and they modified games/created tools to mimic what such a run would look like, resulting in tool-assisted speedruns.
Because of this goal of showing off what might be theoretically possible, TASes are nearly universally restricted to ~the same input methods that humans would use. Where TASes differ is in how they're created, as they make use of external tools such as memory inspection (e.g., to observe RNG, player/enemy positions/speeds, etc.), frame advance/programmed input (for perfectly precise/reproducible inputs), and save states (so sections can be tried over and over again to correct mistakes, try different RNG, etc.).
> Why does it involve playing part of SMB to start with?
This is also tied to TASes showing off what is theoretically possible. Because of this, TASes pretty much always start from some "normal" state - game power-on (the normal start state for TASes submitted to tasvideos.org), a game state that is reachable by a normal human (e.g., done for one of the Ocarina of Time showcases at one of the Games Done Quick events IIRC), etc.
> Besides the sprites used, what does this have to do with SMB at all?
One of the "holy grails" of TASes is arbitrary code execution (ACE), where you manipulate a game into a state where you can get it to execute arbitrary code. This TAS shows that only using controller inputs it's possible to manipulate SMB and the game hardware into a state where ACE is possible.
> How is this different from just a custom ROM that would play this music video?
The difference is in the journey, not the destination. An analogy might be installing "plain" malware on a computer vs. finding/writing an RCE chain that gets you the same capabilities. The latter might be considered more technically impressive/interesting to certain communities.
> Could you even fit this much data into a ROM?
Not as-is in this case. Here the data is streamed in, sort of as if the controllers were serving as extended memory.
> And the "objectives" list at the top isn't helping ("1. Do a little tomfoolery 2. Execute arbitrary code..."). What are the actual parameters of this challenge, to understand what is actually being achieved?
TASes submitted to tasvideos.org come with a submitter-chosen list of objectives for the TAS since different TASes for the same game may aim to do different things. One TAS may try to finish a game as fast as possible using any possible method, another TAS may try to finish a game as fast as possible while restricting the use of some approaches, some TASes may try to finish a game as fast as possible while also getting a "true" ending or "full" completion, some TASes may just try to be entertaining without trying to be as fast as possible, etc.
Not all objectives are deemed worthy of official publication by tasvideos.org, but IIRC there aren't any hard-and-fast rules about what is acceptable and what isn't.
This particular TAS was submitted on April Fool's, so there's a bit of additional silliness involved.
toast0 1 hours ago [-]
> This TAS shows that only using controller inputs it's possible to manipulate SMB and the game hardware into a state where ACE is possible.
It's not just controller inputs; it also needs an initial RAM setup. The SMB ACE happens because of an out-of-bounds access when defeating Bowser with fireballs on level N, which is only accessible due to initial memory manipulation through another cartridge (or similar). This is in contrast to ACE on SMB3, which can happen on the title screen and is used to set up the initial RAM.
pipes 4 hours ago [-]
Me too!
mbStavola 16 hours ago [-]
I thought the audio was just overlaid on top, but it was streamed in via the controller. It sounds AMAZING, incredibly even on the console!
jofzar 9 hours ago [-]
Same, I was like "oh shame its audio is overlaid" then realized it wasn't. Amazing.
baobun 2 days ago [-]
Super cool.
Is it mentioned anywhere how big the payload is? How many button presses? Are the audio samples "streamed" or does it all fit in NES RAM?
100th_Coin 18 hours ago [-]
Hey, I'm the TASer who put this run together.
This was 5.8 million inputs.
To summarize what I put in the writeup, the 7-bit PCM audio was streamed in at approximately 25 kHz (reading from the controller and writing to address $4011 every 71 CPU cycles), occasionally dipping to 9 kHz while streaming in the graphics data.
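As a quick sanity check on those numbers, the sample rate follows from dividing the CPU clock by the cycles spent per sample. The sketch below assumes an NTSC console with a CPU clock of roughly 1,789,773 Hz; the `sample_rate` helper is just for illustration.

```python
NTSC_CPU_HZ = 1_789_773  # approximate NTSC NES CPU clock (assumption: NTSC console)


def sample_rate(cycles_per_sample, cpu_hz=NTSC_CPU_HZ):
    """Samples per second when one sample is written every cycles_per_sample cycles."""
    return cpu_hz / cycles_per_sample


print(round(sample_rate(71)))  # one write to $4011 every 71 cycles
```

That works out to about 25,208 Hz, which lines up with the 25208 Hz figure quoted further down the thread.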
Sesse__ 11 hours ago [-]
I never really got the part about the audio conversion; are you just rounding it from 16 to 7 bits, or are you doing dithering + noise shaping (as you definitely should at such low bit depths)? And similarly, how are you downsampling from 44 kHz to a time-varying sample rate; are you properly filtering, or are you getting tons of aliasing?
100th_Coin 8 hours ago [-]
The conversion from 16 bits to 7 bits was using rounding.
My method of downsampling was complicated. Since the creation of the TAS was being automated, and since I also needed to stream in graphics data occasionally, I ran into the issue of needing to know exactly what byte to read from the .wav file at any given moment. I used a custom NES emulator to emulate the generated inputs, and I had it count CPU cycles so I can convert that into seconds, then parse the .wav file with that info.
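Roughly, the cycle-to-sample lookup described above could look like the sketch below. It assumes a 44.1 kHz source file and an NTSC CPU clock; the function name is hypothetical, and the actual tooling (the custom emulator and .wav parser) is not shown in this thread.

```python
def wav_index_for_cycle(cpu_cycle, wav_rate_hz=44_100, cpu_hz=1_789_773):
    """Map an emulated CPU cycle count to the nearest sample index in the .wav file."""
    seconds = cpu_cycle / cpu_hz          # elapsed playback time at this cycle
    return round(seconds * wav_rate_hz)   # nearest-neighbor pick, no filtering


# The second audio write (71 cycles in) lands near source sample 2.
print(wav_index_for_cycle(71))
```

Because the lookup is driven by the emulated cycle count, it keeps working when the write interval changes while graphics data is being streamed.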
To be completely honest, this project was my first time directly reading the contents of a .wav file like this, and I had no prior experience writing code for audio conversion or playback. If I were to do this project again, I'd look into noise dithering + noise shaping, as well as filtering methods. I know at the very end of the TAS, there are certainly some weird audio artifacts that I couldn't figure out how to fix at the time.
Sesse__ 6 hours ago [-]
> The conversion from 16 bits to 7 bits was using rounding.
As a very quick fix, you can dither by just adding a random value from [-0.5, +0.5] before rounding (to -64..+63 or whatever your range is). It will give you a dither, and probably sound slightly better; a bit more noise for much less distortion. Noise shaping is left as an exercise for the reader :-) (It is probably nontrivial to get perfect with variable sample rate anyway.)
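In code, the quick fix might look something like this. It is a sketch assuming signed 16-bit input and the -64..+63 output range mentioned above; the `quantize_7bit` name is made up for the example, and no noise shaping is attempted.

```python
import random


def quantize_7bit(sample16, rng=random):
    """Reduce a signed 16-bit sample to 7 bits, adding simple dither before rounding."""
    scaled = sample16 / 512                      # 16 bits -> 7 bits is a factor of 2**9
    dithered = scaled + rng.uniform(-0.5, 0.5)   # random value from [-0.5, +0.5]
    return max(-64, min(63, round(dithered)))    # clamp to the 7-bit range


print([quantize_7bit(s) for s in (-32768, -300, 0, 300, 32767)])
```

The output differs from run to run near rounding boundaries, which is the point: the quantization error becomes noise rather than distortion correlated with the signal.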
> I used a custom NES emulator to emulate the generated inputs, and I had it count CPU cycles so I can convert that into seconds, then parse the .wav file with that info.
It sounds like you are just picking one sample without any filtering/averaging/anything (nearest neighbor); this will cause aliasing, which is another part of the reason for the “roughness” you may hear in the sound. You can do a very cheap trick here as well: Take some audio software you trust (say, Audacity) and convert the .wav file to 25208 Hz. This means that you'll get good filtering for most of your audio, and less bad filtering for the 13.85 kHz parts.
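If you wanted to stay in your own tooling instead, even plain averaging is better than picking a single sample. The sketch below is a crude box filter, not what a proper resampler such as Audacity's does; `box_downsample` is a hypothetical helper.

```python
def box_downsample(samples, src_rate, dst_rate):
    """Downsample by averaging the source samples that fall in each output period."""
    out = []
    n_out = int(len(samples) * dst_rate / src_rate)
    for k in range(n_out):
        start = int(k * src_rate / dst_rate)
        # Always average at least one source sample.
        end = max(start + 1, int((k + 1) * src_rate / dst_rate))
        window = samples[start:end]
        out.append(sum(window) / len(window))
    return out


print(box_downsample([0, 2, 4, 6], src_rate=4, dst_rate=2))
```

A box filter only weakly attenuates content above the new Nyquist frequency, so some aliasing remains; it is a cheap improvement, not a replacement for a real low-pass filter.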
Aurornis 7 hours ago [-]
Very fun project. Thanks for the writeup and for coming here to answer questions.
temperceve 12 hours ago [-]
I'd love a layman's overview of what you've done here.
GrantMoyer 6 hours ago [-]
I'm not the author, but these video-in-game projects typically work with a few phases:
1. Get the game into a specific state by performing specific actions, moving to specific positions, performing specific inputs, etc. so that a portion of the game state in RAM happens to be an executable program.
2. Jump to that executable code, such as by corrupting the return address on the stack with a buffer overflow.
3. (optional) The program from 1 may be a simple "bootstrap" program which lets the player directly write a new, larger program using controller inputs then jumps to the new program.
4. The program reads the video and audio from the stream of controller inputs, decodes them, and displays them. The encoding is usually an ad-hoc scheme designed to take advantage of the available hardware. The stream of replayed inputs is computed directly from the media files.
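To make step 4 a bit more concrete, here is a sketch of how one byte of media data can be expressed as a set of held buttons and recovered again. The standard NES controller reports its eight buttons in the order A, B, Select, Start, Up, Down, Left, Right; mapping the first button read to the most significant bit is an assumption here, and this is not necessarily the encoding this TAS uses.

```python
# Order in which the standard NES controller reports its buttons.
BUTTONS = ("A", "B", "Select", "Start", "Up", "Down", "Left", "Right")


def byte_to_buttons(value):
    """Encoder side: choose which buttons to hold so one poll yields this byte."""
    # Assumption: the first button read (A) carries the most significant bit.
    return {name for bit, name in enumerate(BUTTONS) if value & (0x80 >> bit)}


def buttons_to_byte(pressed):
    """Console side: shift in one bit per button, first button read first."""
    value = 0
    for name in BUTTONS:
        value = (value << 1) | (name in pressed)
    return value


payload = bytes([0x80, 0x01, 0x4C])
inputs = [byte_to_buttons(b) for b in payload]
print(bytes(buttons_to_byte(p) for p in inputs) == payload)
```

A real run has to respect whatever the injected program expects per poll, but the principle is the same: the input file is generated from the media, not recorded from a person playing.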
voidUpdate 15 hours ago [-]
yooo, it's funny cartridge swap guy!
gregdeon 19 hours ago [-]
The entire TAS file takes about 16 MB, far more than the 4 KB of RAM on the NES. During the audio + video playback, the TAS is streaming via the controller by making inputs roughly 500 times per frame (15 kHz).
ninjin 19 hours ago [-]
Indeed, if you want to see what is possible with only a Famicom (or NES) itself you can have a look at Little Limit's incredible Bad Apple "port" [1]. This recording is from an emulator [2], but I know from personal experience that it plays perfectly fine on my "New Famicom" (HVC-101). This is not to detract from how amazing the posted ACE is, but it is indeed different in terms of data limitations.
It's impressive what can be done if a lot of effort is put in.
ninjin 16 hours ago [-]
It really is impressive, but it is also fair to point out that it uses the MXM-1 mapper which only came into existence in 2022 [1]. I find it pointless to argue whether it is "cheating" or not as the technology it uses was used for other consoles at the time and it is fun to see new mappers like this, but it is, again, very different compared to keeping it within the realm of original Famicom/NES mappers and limits.
TASBot uses ACE to take over Pokémon Red running in a Super Nintendo Super Game Boy, then takes over the Super Nintendo itself, then streams Twitch chat through the controllers to display on screen.
This video is done by hand and explains a bit more about how and why it works: https://www.youtube.com/watch?v=hB6eY73sLV0
A good video that makes this tech more understandable (with explanations at the beginning and an amazing demo at the end): https://www.youtube.com/watch?v=PNbkv_DJ0f0
and some background https://tasvideos.org/ArbitraryCodeExecutionHowTo
I share the full assembly code in the tasvideos writeup: https://tasvideos.org/8991S#HereSTheAsmCode
[1]: https://littlelimit.net/bad_apple_2_5.htm
[2]: https://www.youtube.com/watch?v=eNU1lzr_m4Q
[1]: https://somethingnerdy.com/unlocking-the-nes-for-former-dawn
They use a modified ROM to set the memory state, but that could be done with SMB3 manually, apparently.
This is a video of it running on a console, though I'm uncertain how the inputs were provided
Could it run doom?