I have been told that I should not use -O3 or -Ofast in my projects. Well… they tend to offer the best optimizations available. In the table below I compare them with compiling without any optimization option (which I called "generic") and with optimization explicitly disabled (-O0). The -Os option, in theory, generates small code (the 's' stands for size), but it is essentially the same as -O2, except for the possible optimization of strlen calls (which, in my experience, the compiler almost never manages to apply):
Option | generic | -Os | -O0 | -O1 | -O2 | -O3 | -Ofast |
-faggressive-loop-optimizations | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fasynchronous-unwind-tables | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fauto-inc-dec | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-check-incomplete-type | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-check-read | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-check-write | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-instrument-calls | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-narrow-bounds | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-optimize | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-store-bounds | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-use-static-bounds | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-use-static-const-bounds | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fchkp-use-wrappers | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fcommon | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fearly-inlining | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ffunction-cse | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fgcse-lm | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fira-hoist-pressure | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fira-share-save-slots | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fira-share-spill-slots | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fivopts | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fkeep-static-consts | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fleading-underscore | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-flifetime-dse | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-flto-odr-type-merging | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fpeephole | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fprefetch-loop-arrays | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-freg-struct-return | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-critical-path-heuristic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-dep-count-heuristic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-group-heuristic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-interblock | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-last-insn-heuristic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-rank-heuristic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-spec | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-spec-insn-heuristic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsched-stalled-insns-dep | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fschedule-fusion | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsemantic-interposition | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fshow-column | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsplit-ivs-in-unroller | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fstack-protector-strong | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fstdarg-opt | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fstrict-volatile-bitfields | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsync-libcalls | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ftree-loop-if-convert | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ftree-loop-im | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ftree-loop-ivcanon | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ftree-loop-optimize | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ftree-parallelize-loops | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ftree-phiprop | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ftree-reassoc | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-ftree-scev-cprop | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-funit-at-a-time | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-funwind-tables | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-malign-stringops | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mavx256-split-unaligned-load | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mavx256-split-unaligned-store | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mfancy-math-387 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mfp-ret-in-387 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mfxsr | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mglibc | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mno-sse4 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mpush-args | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mred-zone | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-msse | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-msse2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mtls-direct-seg-refs | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-mvzeroupper | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-fsigned-zeros | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
-fdelete-null-pointer-checks | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fident | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-finline-atomics | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftrapping-math | ✓ | ✓ | ✓ | ||||
-fbranch-count-reg | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fcombine-stack-adjustments | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fcompare-elim | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fcprop-registers | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fdefer-pop | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fforward-propagate | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fguess-branch-probability | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fhoist-adjacent-loads | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fif-conversion | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fif-conversion2 | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-finline | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-finline-functions-called-once | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fipa-profile | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fipa-pure-const | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fipa-reference | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fmerge-constants | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fmove-loop-invariants | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fomit-frame-pointer | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fshrink-wrap | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fsplit-wide-types | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-fssa-phiopt | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftoplevel-reorder | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-bit-ccp | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-ccp | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-ch | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-coalesce-vars | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-copy-prop | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-copyrename | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-cselim | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-dce | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-dominator-opts | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-dse | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-forwprop | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-fre | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-pta | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-sink | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-slsr | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-sra | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-ftree-ter | ✓ | ✓ | ✓ | ✓ | ✓ | ||
-falign-labels | ✓ | ✓ | ✓ | ✓ | |||
-fcaller-saves | ✓ | ✓ | ✓ | ✓ | |||
-fcrossjumping | ✓ | ✓ | ✓ | ✓ | |||
-fcse-follow-jumps | ✓ | ✓ | ✓ | ✓ | |||
-fdevirtualize | ✓ | ✓ | ✓ | ✓ | |||
-fdevirtualize-speculatively | ✓ | ✓ | ✓ | ✓ | |||
-fexpensive-optimizations | ✓ | ✓ | ✓ | ✓ | |||
-fgcse | ✓ | ✓ | ✓ | ✓ | |||
-findirect-inlining | ✓ | ✓ | ✓ | ✓ | |||
-finline-small-functions | ✓ | ✓ | ✓ | ✓ | |||
-fipa-cp | ✓ | ✓ | ✓ | ✓ | |||
-fipa-cp-alignment | ✓ | ✓ | ✓ | ✓ | |||
-fipa-icf | ✓ | ✓ | ✓ | ✓ | |||
-fipa-icf-functions | ✓ | ✓ | ✓ | ✓ | |||
-fipa-icf-variables | ✓ | ✓ | ✓ | ✓ | |||
-fipa-ra | ✓ | ✓ | ✓ | ✓ | |||
-fipa-sra | ✓ | ✓ | ✓ | ✓ | |||
-fisolate-erroneous-paths-dereference | ✓ | ✓ | ✓ | ✓ | |||
-flra-remat | ✓ | ✓ | ✓ | ✓ | |||
-foptimize-sibling-calls | ✓ | ✓ | ✓ | ✓ | |||
-fpartial-inlining | ✓ | ✓ | ✓ | ✓ | |||
-fpeephole2 | ✓ | ✓ | ✓ | ✓ | |||
-free | ✓ | ✓ | ✓ | ✓ | |||
-freorder-blocks | ✓ | ✓ | ✓ | ✓ | |||
-freorder-blocks-and-partition | ✓ | ✓ | ✓ | ✓ | |||
-freorder-functions | ✓ | ✓ | ✓ | ✓ | |||
-frerun-cse-after-loop | ✓ | ✓ | ✓ | ✓ | |||
-fschedule-insns2 | ✓ | ✓ | ✓ | ✓ | |||
-fstrict-aliasing | ✓ | ✓ | ✓ | ✓ | |||
-fstrict-overflow | ✓ | ✓ | ✓ | ✓ | |||
-fthread-jumps | ✓ | ✓ | ✓ | ✓ | |||
-ftree-builtin-call-dce | ✓ | ✓ | ✓ | ✓ | |||
-ftree-switch-conversion | ✓ | ✓ | ✓ | ✓ | |||
-ftree-tail-merge | ✓ | ✓ | ✓ | ✓ | |||
-ftree-vrp | ✓ | ✓ | ✓ | ✓ | |||
-foptimize-strlen | ✓ | ✓ | ✓ | ||||
-ftree-pre | ✓ | ✓ | ✓ | ||||
-finline-functions | ✓ | ✓ | ✓ | ||||
-fgcse-after-reload | ✓ | ✓ | |||||
-fipa-cp-clone | ✓ | ✓ | |||||
-fpredictive-commoning | ✓ | ✓ | |||||
-ftree-loop-distribute-patterns | ✓ | ✓ | |||||
-ftree-loop-vectorize | ✓ | ✓ | |||||
-ftree-partial-pre | ✓ | ✓ | |||||
-ftree-slp-vectorize | ✓ | ✓ | |||||
-funswitch-loops | ✓ | ✓ | |||||
-fassociative-math | ✓ | ||||||
-fcx-limited-range | ✓ | ||||||
-ffinite-math-only | ✓ | ||||||
-freciprocal-math | ✓ | ||||||
-funsafe-math-optimizations | ✓ |
Note that, in this table, only -O3 and -Ofast give us loop vectorization. Of course, it could also be turned on by hand with the -ftree-loop-vectorize switch. Either way, these two options offer many advantages, both in performance and in the size of the code the compiler generates.
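To see the vectorizer at work, a small loop like the one below is enough. This is only a sketch: the function name `saxpy` and its parameters are mine, chosen for illustration, and whether the loop is actually vectorized depends on your GCC version and target. Compiling it with something like `gcc -O3 -c saxpy.c -fopt-info-vec` (or `gcc -O2 -ftree-loop-vectorize -c saxpy.c -fopt-info-vec`) should make GCC report the loops it vectorized.

```c
#include <stddef.h>

/* Hypothetical kernel for illustration: y[i] = a * x[i] + y[i].
 * `restrict` (C99 or later) promises the compiler that x and y do not
 * overlap, which removes one common obstacle to vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Comparing the assembly from `-O2` and `-O3` (with `-S`) is a quick way to check whether the extra switches made any difference for a loop you care about.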
If you use only -O2, note that some things are not on offer: automatic inlining of functions that were not declared inline (-finline-functions), some interprocedural optimizations (such as -fipa-cp-clone) and, besides vectorization, some global common subexpression elimination (-fgcse-after-reload) and predictive commoning in loops (-fpredictive-commoning, which reuses values computed in earlier iterations). Code that depends on these may be seriously compromised.
Granted, -Ofast is only really useful in two situations: when we do not need strict IEEE 754 conformance, and when even more aggressive (but "unsafe") optimizations are worth having. I would avoid -Ofast unless you know what you are doing. But I would adopt -O3 as the default.
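A tiny example of why the -Ofast math switches are called "unsafe": floating-point addition is not associative, so letting the compiler regroup sums (-fassociative-math) can change results. The sketch below uses two helper functions I made up for illustration, `sum_left` and `sum_right`. Compiled without -Ofast or -ffast-math, they give different answers for the same three values.

```c
/* Adds left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Adds right to left: a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}

/* Under IEEE 754 double arithmetic:
 *   sum_left(1e30, -1e30, 1.0)  yields 1.0  (the big terms cancel first)
 *   sum_right(1e30, -1e30, 1.0) yields 0.0  (1.0 is absorbed by -1e30)
 * With -fassociative-math the compiler is free to pick either grouping. */
```

If your code relies on a particular evaluation order, or on NaN and infinity behaving as the standard says, that is a good reason to stay away from -Ofast.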
UPDATE: I had noticed this before, but I made a point of running some tests… It seems that -O2 really does generate faster code than -O3 or -Ofast!
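If you want to repeat this kind of comparison yourself, a minimal timing harness could look like the sketch below. It is not the test I ran; the workload (`pass`) and the helper `time_passes` are hypothetical names, and `clock()` only gives coarse CPU time. The idea is to build the same file with -O2, -O3 and -Ofast and compare the numbers on your own machine.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: one in-place pass, x[i] = x[i] * 0.5 + 1.0.
 * Each pass depends on the previous one, so the work cannot simply be
 * computed once and reused. */
static void pass(double *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] * 0.5 + 1.0;
}

/* Runs `reps` passes over x and returns the CPU time spent, in seconds.
 * The caller should read x afterwards (e.g. print a checksum) so the
 * results are actually used. */
double time_passes(double *x, size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        pass(x, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Use a large array and enough repetitions for the run to take at least a few hundred milliseconds, and repeat the measurement several times; a single short run tells you very little.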