Now that's what I call micromanagement.
(sorry couldn't resist)
Not even the developers are very technical in the future!
Woah, really? And they still manage to write good software?
Of course not. If good software were standing next to their bed at 4 am, they would scream: who are you, what are you doing here? Help! Help! Someone, make it go away!
Correct. Most ciphers of that era were Feistel ciphers along the lines of DES/3DES, and even RC4 uses XOR too. Later, AES/Rijndael, CRC, and ECC (Elliptic Curve Cryptography) also make heavy use of XOR, but in finite-field terms: the arithmetic is modular arithmetic over GF(2), i.e. mod 2, and addition mod 2 effectively reduces to XOR.
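The "addition mod 2" point is easy to see in a toy Python sketch (not any real cipher; the keystream below is made up purely for illustration): XOR-ing with the same keystream twice recovers the plaintext, because x + k + k == x (mod 2) bit by bit.

```python
# Toy XOR "stream cipher" illustration (not a real cipher): each bit of the
# keystream is added mod 2 to the plaintext bit, and adding it again mod 2
# undoes it, since x + k + k == x (mod 2).
def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    return bytes(d ^ k for d, k in zip(data, keystream))

plaintext = b"ATTACK AT DAWN"
keystream = bytes((i * 37 + 11) % 256 for i in range(len(plaintext)))  # made-up keystream

ciphertext = xor_bytes(plaintext, keystream)
assert xor_bytes(ciphertext, keystream) == plaintext  # XOR is its own inverse
```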
Western Design Center is still (apparently) making a profit, at least in part by licensing 6502 core IP for embedded stuff. There's probably a 6502 buried and unrecognized in all sorts of low-cost control applications lying around you.
RC5 on an 8085
Oof. Well played.
[1] faster, more registers than the IBM 360, << 64k RAM
[2] much faster, 32bit, >> 64k RAM
But forget AVR. Yeah, for a buck or so the ATTiny85 was my go-to small MCU five years ago, and the $5 328 for bigger tasks.
But for the last three years both can be replaced by a 48 MHz 32 bit RISC-V CH32V003 for $0.10 for the 8 pin package (like ATTiny85, and also no external components needed) and $0.20 for the 20 pin package with basically the same number of GPIOs as the 328. At 2k RAM and 16K flash it's the same RAM and a little less flash than the ATMega328 -- but not as much as you'd think as RISC-V handles 16 and 32 bit values and pointers sooo much better.
And now you have the CH32V002/4/5/6 with enhanced CPU and more RAM and/or flash -- up to 8K RAM and 62K flash on the 006 -- and still for around the $0.10-$0.20 price.
There's probably no reason not to get some of the CH32VXXX's to play with. Every now and again I have an application that needs very low power and I'm happy to spring for an MSP430. But every time I buy an MSP430, TI EoLs the specific model I bought.
If it had been properly encrypted my young cracker self would have had no opportunity.
https://jnz.dk/z80/ld_r_n.html
Yep, if I'm reading this right that's 3E 00, since the second byte is the immediate value.
One difference between XOR and LD is that LD A, 0 does not affect flags, which sometimes mattered.
One of the random things burned into my memory for 6502 assembly is that LDA is $A9. I never separated the instruction from the register; it's not like they were general purpose. But that might be because I learned programming from the 2 books that came with my C64, a BASIC manual and a machine code reference manual, and that's how they did it.
I learned assembly programming by reading through the list of supported instructions. That, and typing in games from Compute's Gazette and manually disassembling the DATA instructions to understand how they worked. Oh, and the zero-page reference.
Good times.
On the 6502 you had three instructions LDA, LDX, LDY where the register name is essentially part of the instruction name. On the Z80 you had a lot of "load" instruction so you had LD and then many different operands: loading 8-bit registers, loading 16-bit, writing to memory, reading from memory, reading/writing from memory using a register as an index. So, made more sense on Z80 to have "LD" whereas LDA/LDX/LDY worked fine on 6502.
You had LDA and LDX and LDY as separate instructions while the Z80 assembler had a single LD instruction with different operands. It's the same thing really.
Well, I never wrote any 6502 so I can't compare, but yes, you could load immediate values into any register except the flag register on the Z80. Was that not a thing on the 6502?
Back in 1985 I did some hand-coding like this because I didn't have access to an assembler: https://blog.jgc.org/2013/04/how-i-coded-in-1985.html and I typed the whole program in through the keypad.
Later still I'd be patching binaries to ensure their serial-checks passed, on Intel.
I'm quite sure none of my friends knew any CPU opcode; however, people usually remembered a few phone numbers.
I didn't know you could do it differently for years after I started.
https://github.com/pret/pokecrystal/wiki/Optimizing-assembly...
echo tya|asm|mondump -r|6502
A=AA X=00 Y=00 S=00 P=22 PC=0300 0
0300- 98 TYA A=00 X=00 Y=00 S=00 P=22 PC=0301 2

*Actually a custom chip also containing some peripherals.
I’m familiar with 32-bit x86 assembly from writing it 10-20 years ago. So I was aware of the benefit of xor in general, but the above quote was new to me.
I don’t have any experience with 64-bit assembly - is there a guide anywhere that teaches 64-bit specifics like the above? Something like “x64 for those who know x86”?
The reason `xor eax,eax` is preferred to `xor rax,rax` is due to how the instructions are encoded - it saves one byte which in turn reduces instruction cache usage.
When using 64-bit operations, a REX prefix is required on the instruction (byte 0x40..0x4F), which serves two purposes - the MSB of the low nybble (W) being set (ie, REX prefixes 0x48..0x4f) indicates a 64-bit operation, and the low 3 bits of low nybble allow using registers r8-r15 by providing an extra bit for the ModRM register field and the base and index fields in the SIB byte, as only 3-bits (8-registers) are provided by x86.
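That encoding rule can be sketched in a few lines of Python (standard `31 /r` XOR encoding; the helper below is a simplification that only handles the register-XOR-itself case used for zeroing):

```python
# Sketch of how the REX prefix (0x40 | W<<3 | R<<2 | X<<1 | B) selects a
# 64-bit operand size (W) and supplies the extra register bits (R, B).
# "xor r/m, r" is opcode 0x31 with ModRM = 0xC0 | reg<<3 | rm.
def xor_reg_reg(reg: int, wide: bool) -> bytes:
    # reg == rm here, so R and B are set together when reg >= 8.
    rex = 0x40 | (0x08 if wide else 0) | (0x04 if reg >= 8 else 0) | (0x01 if reg >= 8 else 0)
    modrm = 0xC0 | ((reg & 7) << 3) | (reg & 7)
    need_rex = wide or reg >= 8
    return bytes(([rex] if need_rex else []) + [0x31, modrm])

assert xor_reg_reg(0, wide=False) == bytes([0x31, 0xC0])        # xor eax, eax: 2 bytes
assert xor_reg_reg(0, wide=True)  == bytes([0x48, 0x31, 0xC0])  # xor rax, rax: 3 bytes
assert xor_reg_reg(8, wide=False) == bytes([0x45, 0x31, 0xC0])  # xor r8d, r8d: REX.RB
```

The one-byte saving of `xor eax,eax` over `xor rax,rax` falls straight out of the `need_rex` condition.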
A recent addition, APX, adds an additional 16 registers (r16-r31), which need 2 additional bits. There's a REX2 prefix for this (0xD5 ...), which is a two byte prefix to the instruction. REX2 replaces the REX prefix when accessing r16-r31, still contains the W bit, but it also includes an `M0` bit, which says which of the two main opcode maps to use, which replaces the 0x0F prefix, so it has no additional cost over the REX prefix when accessing the second opcode map.
It's not just that, zero-extending or sign-extending the result is also better for out-of-order implementations. If parts of the output register are preserved, the instruction needs an extra dependency on the original value.
Instead you need to use the multi-byte, general purpose encoding of `xchg` for `xchg eax, eax` to get the expected behavior.
https://www.intel.com/content/www/us/en/developer/articles/t...
If you decode the instruction, it makes sense to use XOR:
- mov ax, 0 - needs 4 bytes (66 b8 00 00)
- xor ax, ax - needs 3 bytes (66 31 c0)
This extra byte, in a machine with less than 1 megabyte of memory, did indeed matter.
On 386 processors it was also:
- mov eax, 0 - needs 5 bytes (b8 00 00 00 00)
- xor eax, eax - needs 2 bytes (31 c0)
Here Intel made the decision to use only 2 bytes. I bet this helps both the instruction decoder and (of course) saves more memory than the old 8086 instruction.
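Spelled out in bytes (the 32-bit mode encodings from above), the saving per zeroing is easy to verify:

```python
# The two 32-bit-mode encodings being compared in the comment above.
mov_eax_0   = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])  # mov eax, 0   -> 5 bytes
xor_eax_eax = bytes([0x31, 0xC0])                    # xor eax, eax -> 2 bytes

assert len(mov_eax_0) == 5
assert len(xor_eax_eax) == 2
assert len(mov_eax_0) - len(xor_eax_eax) == 3        # 3 bytes saved per zeroing
```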
Never mind the fact that, as the author also mentions, the xor idiom takes essentially zero cycles to execute because nothing actually happens besides assigning a new pre-zeroed physical register to the logical register name early on in the pipeline, after which the instruction is retired.
This is slightly inaccurate -- instructions retire in order, so it doesn't necessarily retire immediately after it's decoded and the new zeroed register is assigned. It has to sit in the reorder buffer waiting until all the instructions ahead of it are retired as well.
Thus in workloads where reorder buffer size is a bottleneck, it could contribute to that. However I doubt this describes most workloads.
For the AMD 9950, we are talking about 1280 kB of L1 (per core), 16 MB of L2 (per core), and 64 MB of L3 (shared; 128 MB if you have the X3D version).
I won't say it doesn't matter, but it doesn't matter as much as it once did. CPU caches have gotten huge while the instructions remain the same size.
The more important part, at this point, is that it's idiomatic. That means hardware designers are much more likely to put in specialty logic to make sure it's fast. It's a common enough operation to deserve its own special cases. You can fit a lot of 8-byte instructions into 1280 kB of memory. And as it turns out, it's pretty common for applications to spend a lot of their time in small chunks of instructions. The slow part of a lot of code will be that `for` loop with the 30 AVX instructions doing magic. That's why you'll often see compilers burn `NOP` instructions to align a loop: to avoid splitting a cache line.
Ryzen 9 CPUs have 1280kB of L1 in total. 80kB (48+32) per core, and the 9 series is the first in the entire history of Ryzens to have some other number than 64 (32+32) kilobytes of L1 per core. The 16MB L2 figure is also total. 1MB per core, same as the 7 series. AMD obviously touts the total, not per-core, amounts in their marketing materials because it looks more impressive.
As an aside, zen 1 did actually have a 64kB (and only 4 way!) L1I cache, but changed to the page size times way count restriction with zen 2, reducing the L1 size by half.
You can also see this on the apple side, where their giant 192kB caches L1I are 12 ways with a 16kB page size.
You don't need operand size prefix 0x66 when running 16 bit code in Real Mode. So "mov ax, 0" is 3 bytes and "xor ax, ax" is just 2 bytes.
It makes much more sense: resetting ax and bx (xor ax,ax ; xor bx,bx) will be 4 octets, DWORD-aligned, and a bit faster for the x86 to fetch than the 3-octet version I wrote before.
> - mov ax, 0 - needs 4 bytes (66 b8 00 00) - xor ax,ax - needs 3 bytes (66 31 c0)
Except, apparently, on the pentium Pro, according to this comment: https://randomascii.wordpress.com/2012/12/29/the-surprising-..., which says:
“But there was at least one out-of-order design that did not recognize xor reg, reg as a special case: the Pentium Pro. The Intel Optimization manuals for the Pentium Pro recommended “mov” to zero a register.”
https://fanael.github.io/archives/topic-microarchitecture-ar...
“I assume that the ability to recognize that the exclusive-or zeroing idiom doesn't really depend on the previous value of a register, so that it can be dispatched immediately without waiting for the old value — thus breaking the dependency chain — met the same fate; the Pentium Pro shipped without it.
Some of the cut features were introduced in later models: segment register renaming, for example, was added back in the Pentium II. Maybe dependency-breaking zeroing XOR was added in later P6 models too? After all, it seems such a simple yet important thing, and indeed, I remember seeing people claim that's the case in some old forum posts and mailing list messages. On the other hand, some sources, such as Agner Fog's optimization manuals say that not only it was never present in any of the P6 processors, it was also missing in Pentium M.”
Fun fact - the IBM PC XT also came in a 286 model (the XT 286).
iirc doesn't word alignment matter? I have no idea if this is how the IBM PC XT was aligned but if you had 4 byte words then it doesn't matter if you save a byte with xor because you wouldn't be able to use it for anything else anyway. again, iirc.
It has its merits, but the underlying hardware has changed.
Intel tried to push this responsibility to the compiler with Itanium but that failed catastrophically, so we're back to the CPU pretending it's 1985.
> And, having done that it removes the operation from the execution queue - that is the xor takes zero execution cycles!1 It’s essentially optimised out by the CPU
Right, it has the same consequence, but it doesn't actually perform the stated operation. ASM is now just a high-level language that tells the computer: "please give me the same state that a PDP-11-like computer would give me upon executing these instructions."
But it will not execute xor, nor will it actually zero out eax in most cases.
It'll do something similar to constant propagation with the information that whenever xor eax, eax occurs; all uses of eax go through a simpler execution path until eax is overwritten.
It's emulating the zero result when it recognizes this pattern, usually by playing clever tricks with virtual registers.
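A cartoon model of that rename trick, purely illustrative (no real CPU is implemented like this, and the class and counter names are made up): the front end just repoints the architectural register at a pre-zeroed physical register instead of issuing an ALU op.

```python
# Cartoon model of the zeroing-idiom rename trick. Not how real hardware is
# built; it only shows the idea that "xor r, r" never reaches an ALU.
class ToyRenamer:
    ZERO = "p0"  # physical register hardwired to zero

    def __init__(self):
        self.phys = {self.ZERO: 0}  # physical register file
        self.map = {}               # architectural -> physical mapping
        self.alu_ops = 0            # ops that actually reach an execution port

    def write(self, reg, value):
        self.alu_ops += 1
        name = f"p{len(self.phys)}"  # allocate a fresh physical register
        self.phys[name] = value
        self.map[reg] = name

    def xor_self(self, reg):
        self.map[reg] = self.ZERO    # handled at rename time: no ALU op issued

    def read(self, reg):
        return self.phys[self.map[reg]]

cpu = ToyRenamer()
cpu.write("eax", 1234)
cpu.xor_self("eax")        # the zeroing idiom
assert cpu.read("eax") == 0
assert cpu.alu_ops == 1    # only the original write used an execution port
```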
@grok: This is a classic and brilliant puzzle from the early days of DOS. The challenge is to write a self-displaying hex dumper (.COM file) using only the copy con command, which means every byte of the program must correspond to a standard, typeable ASCII character. This is a form of code-golfing that relies on a deep understanding of 8086 machine code.
The solution is a clever piece of self-modifying code. Since many essential 8086 opcodes (like MOV, INT, RET) don't have printable ASCII equivalents, the program must construct these instructions in memory and then execute them.
Here is a well-known 128-byte solution. I will first show you the ASCII characters you would type, and then explain how it works.
The Program to Type
You would have started by typing copy con hex.com and then carefully typed the following 128 characters (including spaces and case-sensitive characters) and finally pressed Ctrl+Z to save the file.
1 j!>1!QhH-@0X,jHj.Ph1PhD0Ph 0@h, j, h.0@h, h 0@h.H-@,jH1X,h.H.@,H1j j X,h.H.@,H1j j X.H-@H-@,jHj.Ph1PhE0Ph@0@h, j, h.0@h, h
0@h.
How It Works: A High-Level Explanation

This program is a marvel of 8086 trickery. Here is a simplified breakdown of what's happening:
etc.etc
@grok: While your exact code is lost to time, it would have looked something like one of the ultra-small ASCII "dropper" programs that were once passed around. Here is a plausible 32-byte example of what the program you typed might have looked like.
You would have run copy con nibbler.com, typed the following line, and hit Ctrl+Z:
`j%1!PZYfX0f1Xf1f1AYf1E_j%1!PZ`
This looks like nonsense, but to the 8088/8086 processor, it's a dense set of instructions that does the following:
etc. etc.

BTW, it is not beyond possibility that this nibbler or dropper was made by myself and published on Usenet by me in 1989. Who else would have such a problem?
It was a bankruptcy sale and the machine was sold as "inactivated".
Let's take a look at just the first few instructions here:
ASCII HEX ASM
j% 6A 25 PUSH 25h ; 80186+ only, won't work on Atari Portfolio!
1! 31 21 XOR [BX+DI],SP ; complete nonsense
P 50 PUSH AX ; AX sort-of-undefined at this point, depends on DOS version and command line arguments
Z 5A POP DX ; copy AX into DX
Y 59 POP CX ; CX = 25h?
fX 66 58 POP EAX ; 80386+ only! also nothing on stack anymore!
0f1 30 66 31 XOR [BP+31h],AH ; complete nonsense
...
This looks like it was randomly jumbled together from fragments of ASCII-only machine code intended for much newer x86 processors -- just look at how often `f` appears in those strings, which is the operand-size prefix, and meaningless on 16-bit chips. The memory addressing might make more sense in 32-bit mode too, where a different encoding is used (ModRM and SIB bytes).

And this is how LLMs work. They can associate the tokens for e.g. `PZ` with the instructions `PUSH AX / POP DX`, and a certain kind of program in which those would be likely to appear, drawn somewhere from their massive training set.
Humans can easily learn to recognize these 'words' of ASCII text in machine code too, just by spending time looking at it. Another good one is the pair of `<ar` and `<zw` (usually next to each other, with an unprintable character between), present in most upcase()-like code from that era. So if you had asked Grok to accept both upper and lower case of hex input, I would bet that it would have inserted those sequences somewhere too.
But what LLMs CAN NOT do is plan ahead on how to use these program fragments to accomplish a specific task, or keep simple constraints in mind like "this must run on the 80C88 processor in the Atari Portfolio". They even have trouble with keeping track of registers, stack, or which mode the CPU is in.
(yes, late, but I can't let this stand here uncommented. https://xkcd.com/386 )
There was "See the video that accompanies this post." but NGL I was just posting in case anyone didn't have time to read it or missed it.
It's also available as an inscrutable printed book on Amazon.
I do wonder who was the first cracker that thought of including a keygen music that started the tradition.
I also miss how different groups competed with each other and boasted about theirs while dissing others in readmes.
Readmes would have a .NFO suffix, and that would try to load in some Windows tool, but you had to open them in Notepad. Good times.
XR 15,15 XOR REGISTER 15 WITH REGISTER 15
vs L 15,=F'0' LOAD REGISTER 15 WITH 0
This was alleged to be faster on the 370 because XR operated entirely within the CPU registers, while L (Load) fetched data from memory (i.e., the constant came from program memory).

Meanwhile, most "apps" we get nowadays contain half of npmjs neatly bundled in Electron. I miss the days when the default was native and devs had constraints on how big their output could be.
Which isn’t an excuse anymore. UI coding isn’t that hard; if someone can’t do it, well, Claude certainly can.
``` window.$ = (q) => document.querySelector(q); ``` Emulates the behavior much better. This is already set in modern versions of browsers[1]
[1] https://firefox-source-docs.mozilla.org/devtools-user/web_co...
I had no idea this happened. Talk about a fascinating bit of X86 trivia! Do other architectures do this too? I'd imagine so, but you never know.
So the real question is why x86 zero-extends rather than sign-extends in these cases, and the answer is probably that by zero-extending, an implementation that treats a 64-bit architectural register as a pair of 32-bit renamed physical registers can statically put the upper register back on the free pool by marking it as zero, rather than having to track the sign-extended result of an op.
I know x86-64 zeroes the upper part of the register for backwards compatibility and to improve instruction-cache usage (no need for a REX prefix), but AArch64 is unclear to me.
mov w5, w6 // move low 32 bits of register 6 into low 32 bits of register 5
This instruction only depends on the value of register 6. If instead of zeroing the upper half it left it unchanged, it would depend on w6 and also on the previous value of register 5. That would constrain the renamer and consequently out-of-order execution.

Meanwhile, people like me who got started with a Z80 instead immediately knew why, since XOR A is the smallest and fastest way to clear the accumulator and flag register. Funny how that also shows how specific this is to a particular CPU lineage or its offshoots.
Not sure exactly how I could dig up pronunciations, except finding the oldest recordings
Was it someone from an electronics background? Because MOV is also the acronym for Metal Oxide Varistor [1] in electronics, and in the electronics world the acronym is often pronounced "MAUV".
While this is probably true ("probably" because I haven't checked it myself, but it makes sense), the CPU could do the exact same thing for "mov eax, 0", couldn't it? (Does it?)
I do not think that anyone bothers to do this for a "mov eax, 0", because neither assembly programmers nor compilers use such an instruction. Either "xor reg,reg" or "sub reg,reg" have been the recommended instructions for clearing registers since 1978, i.e. since the launch of Intel 8086, because Intel 8086 lacked a "clear" instruction, like that of the competing CPUs from DEC or Motorola.
One should remember that what is improperly named "exclusive or" in computer jargon is actually simultaneously addition modulo 2 and subtraction modulo 2 (because these 2 operations are identical; the different methods of carry and borrow generation distinguish addition from subtraction only for moduli greater than 2).
The subtraction of a thing from itself is null, which is why clearing a register is done by subtracting it from itself, either with word subtraction or with bitwise modulo-2 subtraction, a.k.a. XOR.
(The true "exclusive or" operation is a logical operation distinct from addition/subtraction modulo 2. These two operations are equivalent only for 2 operands. For 3 or more operands they differ, but programmers still incorrectly use the term XOR when they mean the addition modulo 2 of 3 or more operands. The true "exclusive" or is the function that is true only when exactly one of its operands is true, unlike "inclusive" or, which is true when at least one of its operands is true. To these two logical "or" functions correspond the two logical quantifiers "there exists a unique ..." and "there exists a ...".)
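The divergence is easy to check exhaustively for three operands (helper names below are made up for the illustration):

```python
# For two operands, parity (addition mod 2) and "exactly one is true" agree;
# for three or more they differ, and only at the all-true input.
from itertools import product

def parity(*bits):        # what programmers call XOR: sum mod 2
    return sum(bits) % 2

def exactly_one(*bits):   # the "true" exclusive or
    return int(sum(bits) == 1)

assert all(parity(a, b) == exactly_one(a, b) for a, b in product((0, 1), repeat=2))
diffs = [(a, b, c) for a, b, c in product((0, 1), repeat=3)
         if parity(a, b, c) != exactly_one(a, b, c)]
assert diffs == [(1, 1, 1)]  # parity is 1 there, but not exactly one operand is true
```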
It could of course. It can do pretty much any pattern matching it likes. But I doubt very much it would because that pattern is way less common.
As the article points out, the XOR saves 3 bytes of instructions for a really, really common pattern (to zero a register, particularly the return register).
So there's very good reason to perform the XOR preferentially and hence good reason to optimise that very common idiom.
Other approaches eg add a new "zero <reg>" instruction are basically worse as they're not backward compatible and don't really improve anything other than making the assembly a tiny bit more human readable.
Yes, it could, but mov eax, 0 is still going to be five bytes of instruction in cache, and fetched, and decoded, so optimizing on the shorter version is marginally better.
The 8080 and Z80's NOP was at opcode 0. Which was neat because you could make a "NOP slide" simply by zeroing out memory.
xor:
> The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.
sub:
> The OF, SF, ZF, AF, PF, and CF flags are set according to the result.
(I don't have an x64 system handy, but hopefully the reference manual can be trusted. I dimly remembered this, or something like it, tripping me up after coming from programming for the 6502.)
(Not suggesting it should be. Maybe that's a terrible idea, but I don't know why.)
Though technically you said "assembler macro", not opcode. For that, I suspect the argument is more psychological: we had such limited resources of all sorts back then that being parsimonious with everything was a required mindset. The mindset didn't just mean you made everything as short as possible, it also meant you reused everything you possibly could. So reusing XOR just felt more fitting and natural than carving out a separate assembler instruction name. (Also, there would be the question of what effect ZEROAX should have on the flags, which can be somewhat inferred when reusing an existing instruction.)
.macro ZEROAX
xor eax, eax
.endm
where it was defined with a semantically meaningful name, but emitting the exact same opcodes as when writing it out. I mean, I guess taking that to the logical extreme, you'd end up with... C. I dunno, it just seemed like the sort of thing that would have caught on by convention.

I used to write lots of 6502 and 68k assembler, and 68k especially tended to look quite human-readable by the time devs ended up writing macros for everything. Perhaps that wasn't the same culture around x86 asm, which I admit I've done far, far less of.
> I used to write lots of 6502 and 68k assembler, and 68k especially tended to look quite human-readable by the time devs ended up writing macros for everything.
Yes. I only did nontrivial amounts of 6502 and x86, but from what I saw of 68k, it seemed like it started out cleaner-looking and more readable even before adding in macros. (Or for all I know, I was reading code using built-in macros.)
Of course. I might have some data stored in the higher dword of that register.
Partial register updates are kryptonite to OoO engines. For people used to low-level programming weak machines, it seems natural to just update part of a register, but the way every modern OoO CPU works that is literally not a possible operation. Registers are written to exactly once, and this operation also frees every subsequent instruction waiting for that register to be executed. Dirty registers don't get written to again, they are garbage collected and reset for next renaming.
The only way to implement partial register updates is to add 3-operand instructions, and have the old register state to be the third input. This is also more expensive than it sounds like, and on many modern CPUs you can execute only one 3-operand integer instruction per clock, vs 4+ 2-operand ones.
For loops, it is generally expected that you count down, with CX. The "LOOP" instruction is designed for this, so no special need to zero CX. SI and DI, the index registers may benefit from an optimized zeroing, for use with the "string" instructions.
Here I think Intel engineers didn't see the need and not having a special instruction to zero AX must simplify the decoder.
Huh, news to me. Although the amount of x86-64 assembly programming I've personally done is extremely minimal. Frankly, this is exactly the sort of architecture-specific detail I'm happy to let an ASM-generating library know for me rather than know myself.
xor wax, wax ; clear wax
xor sax, sax ; clear sax
xor fax, fax ; tru tru

It's quite interesting what neat tricks roll out once you've got a guaranteed zero register - it greatly reduces the number of distinct instructions you need for what is basically the same operation.
If trying to set the stack pointer, or copy the stack pointer, instead the underlying instruction is ADD SP, Xn, #0 i.e. SP = Xn + 0. This is because the stack pointer and zero register are both encoded as register 31 (11111). Some instructions allow you to use the zero register, others the stack pointer. Presumably ORR uses the zero register and ADD the stack pointer.
NOP maps to HINT #0. There are 128 HINT values available; anything not implemented on this processor executes as a NOP.
There are other operations that are aliased like CMP Xm, Xn is really an alias for SUBS XZR, Xm, Xn: subtract Xn from Xm, store the result in the zero register [i.e. discard it], and set the flags. RISC-V doesn't have flags, of course. ARM Ltd clearly considered them still useful.
There are other oddities, things like 'rotate right' is encoded as 'extract register from pair of registers', but it specifies the same source register twice.
Disassemblers do their best to hide this from you. ARM list a 'preferred decoding' for any instruction that has aliases, to map back to a more meaningful alias wherever possible.
That, and `add rd, rs, x0` could (like the zeroing idiom on x86), run entirely in the decoding and register-renaming stages of a processor.
RISC-V does actually have quite a few idioms. Some idioms are multi-instruction sequences ("macro ops") that could get folded into single micro-ops ("macro-op fusion"/"instruction fusion"): for example `lui` followed by `addi` for loading a 32-bit constant, and left shift followed by right shift for extracting a bitfield.
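The `lui`+`addi` idiom has one subtlety worth spelling out: `addi`'s 12-bit immediate is sign-extended, so the upper 20 bits need a +1 correction when bit 11 of the constant is set. A sketch in Python (helper name is made up):

```python
# Split a 32-bit constant into the RISC-V lui+addi pair. addi's immediate is
# 12 bits and sign-extended, so when the low 12 bits are >= 0x800 we treat
# them as negative and bump the lui portion to compensate.
def split_lui_addi(value: int):
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:            # addi will sign-extend: compensate in lui
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo              # lui rd, hi ; addi rd, rd, lo

for v in (0x12345678, 0xDEADBEEF, 0x7FF, 0x800, 0xFFFFF800):
    hi, lo = split_lui_addi(v)
    assert ((hi << 12) + lo) & 0xFFFFFFFF == v
```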
And when the instruction decoder in such a CPU with register renaming sees `xor eax, eax`, it just makes `eax` point to the zero register for instructions after it. It does not have to put any instruction into the pipeline, and it takes effectively 0 cycles. That is what makes the "zeroing idiom" so powerful.
Do we have any data showing that having a dedicated zero register is better than a short and canonical instruction for zeroing an arbitrary register?
You don't need a mov instruction, you just OR with $zero. You don't need a load immediate instruction you just ADDI/ORI with $zero. You don't need a Neg instruction, you just SUB with $zero. All your Compare-And-Branch instructions get a compare with $zero variant for free.
I refuse to say this "zero register" approach is better, it is part of a wide design with many interacting features. But once you have 31 registers, it's quite cheap to allocate one register to be zero, and may actually save encoding space elsewhere. (And encoding space is always an issue with fixed width instructions).
AArch64 takes the concept further, they have a register that is sometimes acts as the zero register (when used in ALU instructions) and other times is the stack pointer (when used in memory instructions and a few special stack instructions).
Which is funny because IMHO RISC-V instruction encoding is garbage. It was all optimized around the idea of fixed length 32-bit instructions. This leads to weird sized immediates (12 bits?) and 2 instructions to load a 32 bit constant. No support for 64 bit immediates. Then they decided to have "compressed" instructions that are 16 bits, so it's somewhat variable length anyway.
IMHO once all the vector, AI and graphics instructions are nailed down they should make RISC-VI where it's almost the same but re-encoding the instructions. Have sensible 16-bit ones, 32-bit, and use immediate constants after the opcodes. It seems like there is a lot they could do to clean it up - obviously not as much as x86 ;-)
The largest MOV available is 16 bits, but those 16 bits can be shifted by 0, 16, 32 or 48 bits, so the worst case for a 64-bit immediate is 4 instructions. Or the compiler can decide to put the data in a PC-relative pool and use ADR or ADRP to calculate the address.
ADD immediate is 12 bits but can optionally apply a 12-bit left-shift to that immediate, so for immediates up to 24 bits it can be done in two instructions.
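That MOVZ/MOVK splitting can be modeled in a few lines of Python (an illustrative sketch; the semantics follow the ARM definitions, with MOVZ zeroing the rest of the register and MOVK keeping it):

```python
# Model of building a 64-bit immediate from 16-bit chunks shifted by
# 0/16/32/48: movz for the first nonzero chunk, movk for the rest.
def movz_movk_sequence(value: int):
    seq = []
    for shift in (0, 16, 32, 48):
        chunk = (value >> shift) & 0xFFFF
        if chunk:
            op = "movz" if not seq else "movk"
            seq.append((op, chunk, shift))
    return seq or [("movz", 0, 0)]  # zero still needs one instruction

def materialize(seq):
    x = 0
    for op, chunk, shift in seq:
        if op == "movz":
            x = chunk << shift                               # zeroes the other bits
        else:
            x = (x & ~(0xFFFF << shift)) | (chunk << shift)  # keeps the other bits
    return x

for v in (0, 0x2A, 0x123456789ABCDEF0, 0xFFFF00000000FFFF):
    seq = movz_movk_sequence(v)
    assert len(seq) <= 4 and materialize(seq) == v
```

Note how a value like 0xFFFF00000000FFFF needs only two instructions, since the all-zero middle chunks never have to be written.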
ARM64 decoding is also pretty complex, far less orthogonal than ARM32. Then again, ARM32 was designed to be decodable on a chip with 25,000 transistors, not where you can spend thousands of transistors to decode a single instruction.
(Any instruction that can be similarly rephrased as a composition of more restricted elementary instructions is also a candidate for this macro-insn approach.)
I really like the idea of composition or standard prefixes. My favorite is the idea of replacing cmp/branch with "if". Where the condition is a predicate for the following instruction. For RISC-V it would eat a large part of the 16bit opcodes. Some form of load/store might be a good use for the remaining 16bit ops. Other things that might be a good prefix could be encoding data types (8,16,32,64 bit, sign extended, float, double) or a source/destination register. It might be interesting to see how a full ISA might be decomposed into smaller instruction fragments.
This is just a forward skip, which is optimized to a predicated insn already in some implementations.
>> This is just a forward skip, which is optimized to a predicated insn already in some implementations.
True, but make it a 16bit prefix and apply to all (or selected) instructions.
ISA design is always a tradeoff, https://ics.uci.edu/~swjun/courses/2023F-CS250P/materials/le... has some good details, but the TLDR is that RISC-V makes reasonable choices for a fairly "boring" ISA.
Strongly disagree. Throughput is cheap, latency is expensive. Any time you can fit a constant in the instruction fetch stream is a win. This is especially true for jump targets, because getting them resolved faster both saves power and improves performance.
> Most 64 bit constants in use can be sign extended from much smaller values
You should obviously also have smaller load instructions.
> will necessarily bloat instruction cache, stall your instruction decoder (or limit parallelism)
No, just have more fetch throughput.
> and will only be 2 cycles faster than a L1 cache load
Only on tiny machines will L1 cache load be 2 cycles. On a reasonable high-end machine it will be 4-5 cycles, and more critically (because the latency would usually be masked well by OoO), the energy required to engage the load path is orders of magnitude more than just getting it from the fetch.
And that's when it's not a jump target, when it's a jump target suddenly loading it using a load instruction adds 12+ cycles of latency.
> TLDR is that RISC-V makes reasonable choices for a fairly "boring" ISA.
No. Not even talking about constants, RISC-V makes insane choices for essentially religious reasons. Can you explain to me why, exactly, would you ever make jal take a register operand, instead of using a fixed link register and putting the spare bits into the address immediate?
Fetch throughput isn't unlimited. Modern x86 CPUs only have ~16-32B/cycle (from L2 once you're out of the uop cache). If you decode a single 10 byte instruction you're already using up a huge amount of the available decode bandwidth.
There absolutely are cases where a 64 bit load instruction would be an advantage, but ISA design is always a case of tradeoffs. Allowing 10 byte instructions has real cost in decode complexity, instruction bandwidth requirements, ensuring cacheline/page alignment etc. You have to weigh against that how frequent the instruction would be as well as what your alternative options are. Most immediates are small, and many others can be efficiently synthesized via 2 other instructions (e.g. shifts/xors/nots), and any synthesis that is 2 instructions or fewer will be cheaper than doing a load anyway. As a result you would end up massively complicating your architecture/decoders to benefit a fairly rare instruction, which probably isn't worthwhile. It's notable that aarch64 makes the same tradeoff here, and Apple's M series processors have an IPC advantage over the best x86.
> Can you explain to me why, exactly, would you ever make jal take a register operand, instead of using a fixed link register and putting the spare bits into the address immediate?
This mostly seems like a mistake to me. The rationale probably is that you need the other instructions anyway (not all jumps are returns), so adding a jal that doesn't take a register would take a decent percentage of the opspace, but the extra 5 bits would be very nice.
AFAIK, the reason RISC-V supports alternative link registers is that it allows for efficient -msave-restore, keeps the encoding orthogonal to LUI/AUIPC, and using the smaller immediate didn't impact codegen much.
There's a reason why AMD added r8-r15 to the architecture, and why intel is adding r16-r31..
But it does nothing to help you, the programmer, when your algorithm really needs to have 9 registers worth of data in registers and your CPU only has 8 architectural registers available to you. At that point, you either spill manually, or you take the performance hit from keeping the ninth value in memory instead of a register.
Don't get too carried away in the above, x86 is still a lot more complex than ARM or RISC-V. However the complexity is only a tiny part of a CPU and so it doesn't matter.
Modern ISAs try really hard to be independent from microarchitecture.
RISC-V has the most compact code on 64bit, with margin to boot.
On 32bit, it used to be behind Thumb2, but it's been the best since the bit manipulation and extra compressed extensions circa 2021.
There’s dozens of us! By the way, totally unaffiliated, but I have used fetchrss for those websites that have no feed.
Yeah, sadly the 6502 didn't allow you to do EOR A, while the Z80 did allow XOR A. If I remember correctly XOR A was AF and LD A, 0 was 3E 01[1]. So it saved a whole byte! And I think the XOR was 3 clock cycles faster than the LD. So less space taken up by the instruction, and faster.
I have a very distinct memory in my first job (writing x86 assembly) of the CEO walking up behind my desk and pointing out that I'd done MOV AX, 0 when I could have done XOR AX, AX.
[1] 3E 00