Want faster Linux? For a long time this has been CPU dependent.
Switch from 32-bit to 64-bit. Get a CPU with a faster clock speed. Get a CPU with more cores.
But there is a limit to all of these strategies. So what do we do next?
This has been discussed a little here already, but I wanted to find out about the details.
Modern software often requires specific CPU capabilities to run efficiently or even at all. These capabilities are grouped into levels called x86-64-v1 through v4, with each level adding newer instruction sets that improve performance. For example, x86-64-v3 includes features like AVX2, FMA, and BMI2, which allow the processor to handle more data at once, perform complex math operations faster, and move data more efficiently. These improvements can lead to significant speedups—sometimes 2 to 3 times faster—for tasks like scientific computing, video processing, and machine learning. However, not all CPUs support these newer levels. If your processor doesn’t support x86-64-v3, you won’t be able to run software that requires it, such as RL 10. Checking your CPU’s compatibility ensures you can take full advantage of modern software performance and avoid compatibility issues.
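One way to check this on Linux is to compare the CPU flags listed in /proc/cpuinfo against the feature list for each level. The sketch below is a minimal Python version, not an official tool. It assumes the flag spellings Linux uses (for example `pni` for SSE3 and `abm` for LZCNT), and the names `LEVEL_FLAGS`, `x86_64_level` and `read_cpu_flags` are my own.

```python
# Rough x86-64 microarchitecture level check based on Linux CPU flags.
# Assumption: flag names follow the /proc/cpuinfo spelling. The v3 level also
# requires OSXSAVE, which is not listed there, so "xsave" is used as a stand-in.
LEVEL_FLAGS = [
    (1, {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"}),
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest level (0 to 4) whose flags, and all lower levels' flags, are present."""
    present = set(flags)
    level = 0
    for candidate, required in LEVEL_FLAGS:
        if not required <= present:
            break  # levels are cumulative, so stop at the first gap
        level = candidate
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU entry; returns an empty set if unavailable."""
    try:
        with open(path) as cpuinfo:
            for line in cpuinfo:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    level = x86_64_level(read_cpu_flags())
    if level:
        print(f"This CPU appears to support x86-64-v{level}")
    else:
        print("Could not detect an x86-64 level (not Linux/x86-64?)")
```

Inside a VM this reports what the hypervisor exposes to the guest, which can be less than what the host CPU actually supports.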
Newer instruction sets (like AVX2, FMA, BMI1/2) allow the CPU to process more data per instruction:
- Example: AVX2 can handle 256-bit wide vectors, meaning it can process 8 floats or 4 doubles at once — instead of one at a time.
- Fewer instructions = fewer CPU cycles = faster execution.
- FMA (Fused Multiply-Add) combines multiplication and addition in one step, improving both speed and accuracy.
- Instructions like MOVBE and BMI2 help with faster data movement and bit manipulation.
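The accuracy point about FMA comes from rounding once instead of twice. The sketch below emulates that in plain Python using exact fractions. It does not execute a hardware FMA instruction; `fused_multiply_add` is a hypothetical helper name, and the inputs are chosen by me to make the difference visible.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate a*b + c with one rounding at the end (software stand-in for FMA)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact rational arithmetic
    return float(exact)  # the single rounding step


# Illustrative inputs: the exact product is 1 - 2**-60, which a double cannot hold.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: the product is rounded to 1.0 before the add,
# so the tiny term is lost.
unfused = a * b + c

# FMA-like: the tiny term survives because rounding happens only once.
fused = fused_multiply_add(a, b, c)

print("separate multiply and add:", unfused)
print("single rounding (FMA-like):", fused)
```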
| Workload | Typical speedup |
| --- | --- |
| Scientific computing / AI / ML | 1.5× to 3× faster |
| Video encoding / decoding | 1.2× to 2× faster |
| Cryptography / Compression | 1.5× to 4× faster |
| General desktop apps | 5% to 20% faster |
| Gaming / Graphics | Depends on engine, often 10–30% faster |
Note: Gains are not automatic — software must be compiled to use these instructions.
I have tested a few of these, and the results are actually pretty impressive. Right now only Red Hat (and clones, e.g. Alma, CentOS, Rocky, Oracle) support this, but I suspect other Linux distros will follow. Will this "force" people to buy newer hardware? Not necessarily. Some distros have more than one ISO: much like a 32-bit version and a 64-bit version, they have an x86_64_v1 version and an x86_64_v3 version. I haven't seen anything that suggests what Debian-based distros will do (if anything). As of writing this, I haven't seen any response from Arch, but they are usually quick to adapt.
Edit: It appears CachyOS does support v3/v4.
Let’s say you’re running a matrix multiplication in a scientific app:
On a v1-compatible CPU, it might use scalar instructions (1 element at a time).
On a v3-compatible CPU, it can use AVX2 to process 8 elements at once — potentially making it up to 8× faster in that specific loop.
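Both the "8 elements at once" figure and the "in that specific loop" caveat can be put into numbers. The sketch below is back-of-the-envelope arithmetic, not a benchmark. Lane count is vector width divided by element width, and the whole-program estimate uses Amdahl's law. The function names and the 50% loop share are made-up inputs for illustration.

```python
def lanes(vector_bits, element_bits):
    """How many elements fit in one vector register."""
    return vector_bits // element_bits


def overall_speedup(loop_fraction, loop_speedup):
    """Amdahl's law: whole-program speedup when only part of the runtime gets faster."""
    return 1.0 / ((1.0 - loop_fraction) + loop_fraction / loop_speedup)


print(lanes(256, 32))  # 32-bit floats in a 256-bit vector: 8
print(lanes(256, 64))  # 64-bit doubles in a 256-bit vector: 4

# Hypothetical app that spends half its time in the vectorized loop:
# an ideal 8x in the loop gives well under 2x overall.
print(round(overall_speedup(0.5, 8), 2))  # 1.78
```

This is why a kernel that runs several times faster can still show up as a much smaller gain for a whole application.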
Sources:
- Question on AVX2 and FMA (Haswell), forums.anandtech.com
- AnandTech Forums: Technology, Hardware, Software, and Deals, www.anandtech.com
This has caused quite a stir among cloud providers, and it caught a few of them off guard. Some cloud providers aren't making these distros available yet, as their hardware doesn't support them. Not only do you have the microarchitecture nuances, but now you also have the fact that md-based disk mappers aren't supported on NVMe drives. Out of my 9 computers, only 1 supports v4 and only 1 supports v3. Everything else won't run distros built for these newer microarchitecture levels. If you're planning on running these in a VM, note that the host computer has to support the microarchitecture level. But if it does, the gains are noticeable.