
  • S_x96x_S

    addict

    AMD Zen 4 Cost Table & Tuning Patches Posted For The GCC Compiler
    https://www.phoronix.com/news/AMD-Zen4-Cost-Table-Tuning-GCC

    "this patch updates cost of znver4 mostly based on data measued by Agner Fog. Compared to previous generations x87 became bit slower which is probably not big deal (and we have minimal benchmarking coverage for it). One interesting improvement is reduction of FMA cost. I also updated costs of AVX256 loads/stores based on latencies (not throughput which is twice of avx256). Overall AVX512 vectorization seems to improve noticeably some of TSVC benchmarks but since internally 512 vectors are split to 256 vectors it is somewhat risky and does not win in SPEC scores (mostly by regressing benchmarks with loop that have small trip count like x264 and exchange), so for now I am going to set AVX256_OPTIMAL tune but I am still playing with it. We improved since ZNVER1 on choosing vectorization size and also have vectorized prologues/epilogues so it may be possible to make avx512 small win overall.

    In general I would like to keep cost tables latency based unless we have a good reason to not do so. There are some interesting differences in the znver3 tables that I also patched and that seem performance neutral. I will send that separately."
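
For anyone who wants to try the 256-bit vs. 512-bit question from the quote on their own code, here is a minimal sketch in C. The function names and the trip count of 6 are made up for illustration; the only real pieces are the GCC options `-march=znver4`, `-mprefer-vector-width=` and `-fopt-info-vec`, and they assume a GCC release that already knows znver4. It only shows where to look; it does not demonstrate that either width is faster.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical kernel, not taken from the patch: y[i] = a * x[i] + y[i].
 * The multiply-add in the body is the kind of operation that can be
 * emitted as an FMA instruction when the target supports it.
 *
 * One way to compare what the vectorizer does (assumes a GCC that
 * accepts -march=znver4):
 *   gcc -O3 -march=znver4 -fopt-info-vec -c saxpy.c
 *   gcc -O3 -march=znver4 -mprefer-vector-width=512 -fopt-info-vec -c saxpy.c
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Illustrative small trip count; the value 6 is an assumption. */
enum { SHORT_TRIP = 6 };

/*
 * Same kernel with a fixed, small trip count. The quote names loops
 * like this as the case where wide vectors were seen to regress.
 */
void saxpy_short(float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < SHORT_TRIP; ++i)
        y[i] = a * x[i] + y[i];
}
```

Comparing the `-fopt-info-vec` notes (or the generated assembly) of the two builds shows which vector width was picked for each loop; actual timings would still have to be measured on Zen 4 hardware.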
