
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw



Machine Learning Seminar


Conditional Computation in Transformers

Speaker: Sebastian Jaszczur

2022-06-02 12:15

Note: the seminar will be held in person, with a follow-up lunch.


The Transformer architecture is widely used in Natural Language Processing to achieve state-of-the-art results. Unfortunately, such quality is usually attainable only with extremely large models, which require significant resources during both training and inference. This problem can be tackled by designing a neural network that conditionally skips a significant portion of the model's weights and computation, keeping only the parts useful for the example at hand. I will describe some recent developments in this area, including my work published at NeurIPS 2021, "Sparse is Enough in Scaling Transformers", and my ongoing research.
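To make the idea of conditionally skipping weights concrete, here is a minimal NumPy sketch of a feed-forward layer that, for each input, keeps only the k hidden units with the largest pre-activations and multiplies only the corresponding rows of the output matrix. This is a simplified illustration of conditional computation in general, not the specific controller-based scheme of the NeurIPS 2021 paper; all names and shapes here are assumptions for the example.

```python
import numpy as np

def sparse_ffn(x, W1, b1, W2, b2, k):
    """Feed-forward layer with conditional computation (illustrative sketch,
    not the paper's exact mechanism): per input, only the top-k hidden units
    by absolute pre-activation are kept, so only k rows of W2 are used."""
    h = x @ W1 + b1                    # dense hidden pre-activations, shape (d_ff,)
    idx = np.argsort(np.abs(h))[-k:]   # indices of the k most active units
    # ReLU on the selected units, then multiply only the matching rows of W2
    return np.maximum(h[idx], 0.0) @ W2[idx] + b2

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
W1 = rng.normal(size=(d_model, d_ff)); b1 = rng.normal(size=d_ff)
W2 = rng.normal(size=(d_ff, d_model)); b2 = rng.normal(size=d_model)
x = rng.normal(size=d_model)

# With k = d_ff every unit is kept, so the layer matches a dense ReLU FFN.
dense = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2
assert np.allclose(sparse_ffn(x, W1, b1, W2, b2, k=d_ff), dense)
```

With k much smaller than d_ff, the second matrix multiplication touches only k of the d_ff rows of W2, which is where the savings in weights read and computation come from.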