
  • S_x96x_S

    addict

    Software support for Zen 4 is rather slow...
    but it is getting there...

    GCC 13 Now Enables 512-bit Vector For AMD Zen 4 Tuning
    https://www.phoronix.com/news/GCC-13-Zen-4-Znver4-512b-Vector

    Enable 512 bit vector for zen4

    While internally 512-bit registers are split into two 256-bit halves, 512-bit vectors
    reduce the number of instructions to retire and have a chance to improve parallelism.
    There are a few TSVC benchmarks that improve significantly:

    Runtime (change = 512-bit relative to 256-bit):

    benchmark   256-bit   512-bit   change
    s2275         48.57     20.67     -58%
    s311          32.29     16.06     -50%
    s312          32.30     16.07     -50%
    vsumr         32.30     16.07     -50%
    s314          10.77      5.42     -50%
    s313          21.52     10.85     -50%
    vdotr         43.05     21.69     -50%
    s316          10.80      5.64     -48%
    s235          61.72     33.91     -45%
    s161          15.91      9.95     -38%
    s3251         32.13     20.31     -36%
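    To repeat this kind of measurement, a kernel in the spirit of vdotr (a float dot product) can be compiled once per vector width and timed, and the change column computed as the relative runtime difference. This is a minimal sketch, assuming GCC 13 on a Zen 4 machine; `dot_product` and `runtime_change_pct` are hypothetical names, not the TSVC source. The flags in the comment are the standard GCC options `-march=znver4` and `-mprefer-vector-width=`; a floating-point sum is only vectorized when reassociation is allowed, hence `-ffast-math`.

```c
#include <stddef.h>

/* Hypothetical vdotr-like kernel (not the TSVC source): dot product of two
 * float arrays. Build it twice and time each binary, e.g. with GCC 13:
 *   gcc -O3 -march=znver4 -ffast-math -mprefer-vector-width=256 ...
 *   gcc -O3 -march=znver4 -ffast-math -mprefer-vector-width=512 ...
 * -ffast-math lets GCC reassociate the float sum so it can vectorize it;
 * a 512-bit vector holds 16 floats versus 8, so the vector loop needs
 * roughly half as many iterations. */
float dot_product(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Relative runtime change in percent; negative means the 512-bit build is faster. */
double runtime_change_pct(double t256, double t512)
{
    return (t512 - t256) / t256 * 100.0;
}
```

    For example, `runtime_change_pct(32.29, 16.06)` gives about -50.3, in line with the -50% shown for s311.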

    And there are no benchmarks with regressions beyond noise. The basic matrix
    multiplication loop improves by 32%. It is also expected that 512-bit
    vectors are more power efficient (I can't measure that).
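    The commit does not show which matrix multiplication loop was measured. As a hypothetical stand-in, the naive loop below (`matmul` is an illustrative name) uses i-k-j order so the innermost loop walks contiguous rows of B and C, which GCC can auto-vectorize at -O3. Treat it as a sketch for experimenting with `-mprefer-vector-width=256` versus `512`, not as the benchmark behind the 32% figure.

```c
#include <stddef.h>

/* Hypothetical "basic" matrix multiplication C = A * B for n x n
 * row-major float matrices. The i-k-j loop order makes the innermost
 * loop a unit-stride sweep over a row of B and a row of C, so the
 * compiler can vectorize it; with 512-bit vectors each vector
 * multiply-add in that loop covers 16 floats instead of 8. */
void matmul(const float *restrict a, const float *restrict b,
            float *restrict c, size_t n)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0f;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            float aik = a[i * n + k];          /* reused across the row */
            for (size_t j = 0; j < n; j++)     /* vectorizable inner loop */
                c[i * n + j] += aik * b[k * n + j];
        }
}
```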

    The downside is that loops with low trip counts may get slower when the
    unvectorized prologue and epilogue are hit more often. In SPEC CPU this
    problem shows up with x264 (12% regression) and bwaves (6% regression);
    it is tracked in
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
    and will need more work on the vectorizer to support masked epilogues.
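    Why low trip counts hurt can be seen with a little arithmetic. The sketch below is a deliberately simplified model (the hypothetical `split_loop` is not GCC code): it ignores alignment peeling in the prologue and any narrower-vector epilogue the compiler may emit, and only counts how many iterations of an n-element loop run as vectors versus scalar epilogue code, with an option for the masked epilogue mentioned above.

```c
#include <stddef.h>

/* Simplified model of how a vectorized loop over n elements is split.
 * Without masking: full vector iterations plus a scalar epilogue for the
 * remainder. With a masked epilogue: the remainder runs as one extra
 * masked vector iteration instead of scalar code.
 * Prologue peeling and narrower-vector epilogues are ignored. */
struct loop_split {
    size_t vector_iters; /* iterations run with full (or masked) vectors */
    size_t scalar_iters; /* elements left to the scalar epilogue */
};

struct loop_split split_loop(size_t n, size_t vector_bits,
                             size_t elem_bits, int masked_epilogue)
{
    size_t lanes = vector_bits / elem_bits;   /* e.g. 512 / 32 = 16 floats */
    struct loop_split s;
    s.vector_iters = n / lanes;
    s.scalar_iters = n % lanes;
    if (masked_epilogue && s.scalar_iters > 0) {
        s.vector_iters += 1;                  /* one masked tail iteration */
        s.scalar_iters = 0;
    }
    return s;
}
```

    With n = 15 floats, 256-bit vectors (8 lanes) give 1 vector iteration and 7 scalar ones, while 512-bit vectors (16 lanes) give no vector iterations and leave all 15 elements to scalar code; a masked epilogue would turn that into a single masked 512-bit iteration.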

    After some additional testing, it seems that using 512-bit vectors by
    default is now the better choice overall.
