Tweaked/added notes to docs/Multithreading.md.

Details:
- Added language to docs/Multithreading.md cautioning the reader about
  the nuances of setting multithreading parameters via the manual and
  automatic ways simultaneously, and also about how these parameters
  behave when multithreading is disabled at configure-time. These
  changes are an attempt to address the issues that arose in issue #362.
  Thanks to Jérémie du Boisberranger for his feedback on this topic.
- CREDITS file update.
Field G. Van Zee
2019-11-12 15:32:57 -06:00
parent bdc7ee3394
commit 8f399c8940
2 changed files with 80 additions and 75 deletions

149
CREDITS

@@ -5,88 +5,89 @@ Acknowledgements
The BLIS framework was primarily authored by
Field Van Zee @fgvanzee (The University of Texas at Austin)
but many others have contributed code and feedback, including
Sameer Agarwal @sandwichmaker (Google)
Murtaza Ali (Texas Instruments)
Sajid Ali @s-sajid-ali (Northwestern University)
Erling Andersen @erling-d-andersen
Alex Arslan @ararslan
Vernon Austel (IBM, T.J. Watson Research Center)
Matthew Brett @matthew-brett (University of Birmingham)
Jed Brown @jedbrown (Argonne National Laboratory)
Robin Christ @robinchrist
Kay Dewhurst @jkd2016 (Max Planck Institute, Halle, Germany)
Jeff Diamond (Oracle)
Johannes Dieterich @iotamudelta
Krzysztof Drewniak @krzysz00
Marat Dukhan @Maratyszcza (Google)
Victor Eijkhout @VictorEijkhout (Texas Advanced Computing Center)
Evgeny Epifanovsky @epifanovsky (Q-Chem)
Isuru Fernando @isuruf
Roman Gareev @gareevroman
Richard Goldschmidt @SuperFluffy
Chris Goodyer
John Gunnels @jagunnels (IBM, T.J. Watson Research Center)
Ali Emre Gülcü @Lephar
Jeff Hammond @jeffhammond (Intel)
Jacob Gorm Hansen @jacobgorm
Jérémie du Boisberranger @jeremiedbb
Jean-Michel Hautbois @jhautbois
Ian Henriksen @insertinterestingnamehere (The University of Texas at Austin)
Minh Quan Ho @hominhquan
Matthew Honnibal @honnibal
Stefan Husmann @stefanhusmann
Francisco Igual @figual (Universidad Complutense de Madrid)
Tony Kelman @tkelman
Lee Killough @leekillough (Cray)
Mike Kistler @mkistler (IBM, Austin Research Laboratory)
Michael Lehn @michael-lehn
@ShmuelLevine
Dave Love @loveshack
Tze Meng Low (The University of Texas at Austin)
Ye Luo @ye-luo (Argonne National Laboratory)
Ricardo Magana @magania (Hewlett Packard Enterprise)
Bryan Marker @bamarker (The University of Texas at Austin)
Devin Matthews @devinamatthews (The University of Texas at Austin)
Stefanos Mavros @smavros
Nisanth Padinharepatt (AMD)
Devangi Parikh @dnparikh (The University of Texas at Austin)
Elmar Peise @elmar-peise (RWTH-Aachen)
Clément Pernet @ClementPernet
Ilya Polkovnichenko
Jack Poulson @poulson (Stanford)
Mathieu Poumeyrol @kali
Christos Psarras @ChrisPsa (RWTH-Aachen)
@qnerd
Michael Rader @mrader1248
Pradeep Rao @pradeeptrgit (AMD)
Aleksei Rechinskii
Karl Rupp @karlrupp
Martin Schatz (The University of Texas at Austin)
Nico Schlömer @nschloe
Rene Sitt
Tony Skjellum @tonyskjellum (The University of Tennessee at Chattanooga)
Mikhail Smelyanskiy (Intel, Parallel Computing Lab)
Nathaniel Smith @njsmith
Shaden Smith @ShadenSmith
Tyler Smith @tlrmchlsmth (The University of Texas at Austin)
Paul Springer @springer13 (RWTH-Aachen)
Adam J. Stewart @adamjstewart (University of Illinois at Urbana-Champaign)
Vladimir Sukarev
Santanu Thangaraj (AMD)
Nicholai Tukanov @nicholaiTukanov (The University of Texas at Austin)
Rhys Ulerich @RhysU (The University of Texas at Austin)
Robert van de Geijn @rvdg (The University of Texas at Austin)
Kiran Varaganti @kvaragan (AMD)
Natalia Vassilieva (Hewlett Packard Enterprise)
Zhang Xianyi @xianyi (Chinese Academy of Sciences)
Benda Xu @heroxbd
Costas Yamin @cosstas
Chenhan Yu @ChenhanYu (The University of Texas at Austin)
Roman Yurchak @rth (Symerio)
M. Zhou @cdluminate
BLIS's development was partially funded by grants from industry
partners, including


@@ -107,7 +107,11 @@ This pattern--automatic or manual--holds regardless of which of the three method
Regardless of which method is employed, and which specific way within each method, after setting the number of threads, the application may call the desired level-3 operation (via either the [typed API](docs/BLISTypedAPI.md) or the [object API](docs/BLISObjectAPI.md)) and the operation will execute in a multithreaded manner. (When calling BLIS via the BLAS API, only the first two (global) methods are available.)
**Note**: Please be aware of what happens if you try to specify both the automatic and manual ways, as it could otherwise confuse new users. Regardless of which broad method is used, **if multithreading is specified via both the automatic and manual ways, the manual way will always take precedence.** Also, specifying parallelism for even *one* loop counts as specifying the manual way (in which case the ways of parallelism for the remaining loops will be assumed to be 1).
**Note**: Please be aware of what happens if you try to specify both the automatic and manual ways, as it could otherwise confuse new users. Here are the important points:
* Regardless of which broad method is used, **if multithreading is specified via both the automatic and manual ways, the values set via the manual way will always take precedence.**
* Specifying parallelism for even *one* loop counts as specifying the manual way (in which case the ways of parallelism for the remaining loops will be assumed to be 1).
* If you have specified multithreading via *both* the automatic and manual ways, BLIS will **not** complain if the values are inconsistent with one another. (For example, you may request 8 total threads be used while also specifying 4 ways of parallelism within each of two matrix multiplication loops, for a total of 16 ways.) Furthermore, you will be able to query these inconsistent values via the runtime API both before and after multithreading executes.
* If multithreading is disabled, you **may still** specify multithreading values via either the manual or automatic ways. However, BLIS will silently ignore **all** of these values. A BLIS library that is built with multithreading disabled at configure-time will always run sequentially (from the perspective of a single application thread).
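
The precedence rules in the note above can be sketched as a small, self-contained Python model. This is only an illustration of the documented behavior, not BLIS code: `ThreadRequest`, `resolve`, and the loop labels in `LOOPS` are hypothetical names chosen for the example, and the model assumes that the total number of ways is the product of the per-loop ways (consistent with the 4 x 4 = 16 example in the note).

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

# Placeholder labels for the loops that can be parallelized (assumed names).
LOOPS = ("jc", "pc", "ic", "jr", "ir")


@dataclass
class ThreadRequest:
    """Hypothetical record of what the user asked for; not a BLIS type."""
    num_threads: Optional[int] = None      # automatic way: total thread count
    ways: Optional[Dict[str, int]] = None  # manual way: ways per loop


def resolve(req: ThreadRequest, threading_enabled: bool = True) -> dict:
    """Model which request takes effect under the rules in the note."""
    if not threading_enabled:
        # Multithreading disabled at configure-time: all values are ignored.
        return {"mode": "sequential", "total": 1}
    if req.ways:
        # Setting even one loop counts as the manual way, which takes
        # precedence; loops that were not mentioned are assumed to be 1.
        full = {loop: req.ways.get(loop, 1) for loop in LOOPS}
        return {"mode": "manual", "ways": full,
                "total": math.prod(full.values())}
    if req.num_threads is not None:
        return {"mode": "automatic", "total": req.num_threads}
    return {"mode": "sequential", "total": 1}


# Inconsistent request from the note: 8 total threads, but 4 ways in two loops.
req = ThreadRequest(num_threads=8, ways={"jc": 4, "ic": 4})
print(resolve(req))                           # manual way is used
print(req.num_threads)                        # the stored request is unchanged
print(resolve(req, threading_enabled=False))  # everything is ignored
```

In this model the inconsistent request is accepted without complaint: the manual values are the ones used, while the automatic value stays stored and queryable, mirroring the third bullet above.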
## Globally via environment variables