However, GPU profiling in a VM should never come out the same as on a bare-metal system.
I ran a couple more tests, and it turns out to be related to my config.
When I started playing with the VMs, I noticed they were too slow to stream HD video in the browser. What I did was add a kernel option to share virtual slices of the GPU (i915.enable_gvt=1). This results in the WebGL headers of the VM being the same as the host's, and running the test on the host does indeed yield the same score/headers.
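In case anyone wants to check whether that option is actually active, here is a minimal Python sketch. The function names are my own for illustration; the only thing it relies on is that Linux exposes the boot parameters in /proc/cmdline. If a parameter appears more than once, it reports the last occurrence.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    A bare flag without "=value" is reported as an empty string.
    If the parameter is repeated, the last occurrence is returned.
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            found = value if sep else ""
    return found


def host_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()


# Illustrative command line, not taken from a real machine
example = "quiet splash i915.enable_gvt=1"
gvt = kernel_param(example, "i915.enable_gvt")
```

On a real host, `kernel_param(host_cmdline(), "i915.enable_gvt")` should give "1" if the option was picked up at boot, and None if it is missing.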
Interestingly, the different ISO VMs I set up to play with (Arch niri, Debian Trixie KDE, Fedora 43) all achieve roughly the same score, give or take some variance from the other setup variables covered on the EFF page (e.g. Debian's Firefox ESR version, one VM defaulting to en-US and the others to en-GB, one using UTC and the others GMT, some installing more default fonts than others, etc.). It's quite interesting that they all arrive at the same overall score.
Then, without the GPU kernel option, a VM uses software rendering (llvmpipe), which is much more common (halving the WebGL identifier score to 7). I think this is what you had in mind. But obviously this hurts browsing performance. I know because my old notebook uses llvmpipe on bare metal as well.
All in all, I don't think one can tell from the EFF test results that the browser is running in a VM, which is good.
I'd wager that the newer the hardware is, the higher the default overall score, simply because there are fewer such machines in the testing pool, so newer hardware (like the GPU) sticks out. To blend in, one might need to heavily restrict capabilities (which takes the fun out of computing to an extent, see the streaming example).
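To put numbers on the pool-size argument: as far as I understand it, the EFF test reports each characteristic in bits of identifying information, where a value shared by one in N browsers carries log2(N) bits. A rough Python sketch of that conversion follows (function names are mine; the 14 is only my inference from "halving to 7", not a figure read off the test):

```python
import math


def bits_from_rarity(one_in_n):
    """Bits of identifying information for a value seen in one out of `one_in_n` browsers."""
    return math.log2(one_in_n)


def rarity_from_bits(bits):
    """Inverse: how many browsers you would need, on average, to find one with this value."""
    return 2 ** bits


# llvmpipe case from above: 7 bits
llvmpipe_pool = rarity_from_bits(7)

# shared-GPU case, assuming roughly double the bits (an inference, not a measurement)
gvt_pool = rarity_from_bits(14)
```

So 7 bits means about one browser in 128 shares the value, while 14 bits would mean about one in 16384. Halving the score is a far bigger gain in anonymity than it sounds.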
Of course the other mentioned advantages of browsing in a VM do hold. For example, on encountering the unfortunate defacement this week, I simply rebooted the VM (and could have deleted it without loss).