Discussion about this post

andaja:
There are several points worth mentioning, I think:

- while DS investments look enormous, it is projected that almost half of them may remain on paper due to shortages of cutting-edge chips and high-bandwidth memory (for at least 3 years)

- memory access, not raw computation speed, now seems to be the main bottleneck for the top models.

- even if the scaling laws still hold (though model size seems to have stalled at around 2T parameters), model training has not been the main consumer of compute for quite some time now.
