diff --git a/CREDITS b/CREDITS
index be218e1b3..6e47f8717 100644
--- a/CREDITS
+++ b/CREDITS
@@ -5,88 +5,89 @@ Acknowledgements
 The BLIS framework was primarily authored by
- Field Van Zee @fgvanzee (The University of Texas at Austin)
+ Field Van Zee @fgvanzee (The University of Texas at Austin)
 but many others have contributed code and feedback, including
- Sameer Agarwal @sandwichmaker (Google)
- Murtaza Ali (Texas Instruments)
- Sajid Ali @s-sajid-ali (Northwestern University)
- Erling Andersen @erling-d-andersen
- Alex Arslan @ararslan
- Vernon Austel (IBM, T.J. Watson Research Center)
- Matthew Brett @matthew-brett (University of Birmingham)
- Jed Brown @jedbrown (Argonne National Laboratory)
- Robin Christ @robinchrist
- Kay Dewhurst @jkd2016 (Max Planck Institute, Halle, Germany)
- Jeff Diamond (Oracle)
- Johannes Dieterich @iotamudelta
- Krzysztof Drewniak @krzysz00
- Marat Dukhan @Maratyszcza (Google)
- Victor Eijkhout @VictorEijkhout (Texas Advanced Computing Center)
- Evgeny Epifanovsky @epifanovsky (Q-Chem)
- Isuru Fernando @isuruf
- Roman Gareev @gareevroman
- Richard Goldschmidt @SuperFluffy
+ Sameer Agarwal @sandwichmaker (Google)
+ Murtaza Ali (Texas Instruments)
+ Sajid Ali @s-sajid-ali (Northwestern University)
+ Erling Andersen @erling-d-andersen
+ Alex Arslan @ararslan
+ Vernon Austel (IBM, T.J. Watson Research Center)
+ Matthew Brett @matthew-brett (University of Birmingham)
+ Jed Brown @jedbrown (Argonne National Laboratory)
+ Robin Christ @robinchrist
+ Kay Dewhurst @jkd2016 (Max Planck Institute, Halle, Germany)
+ Jeff Diamond (Oracle)
+ Johannes Dieterich @iotamudelta
+ Krzysztof Drewniak @krzysz00
+ Marat Dukhan @Maratyszcza (Google)
+ Victor Eijkhout @VictorEijkhout (Texas Advanced Computing Center)
+ Evgeny Epifanovsky @epifanovsky (Q-Chem)
+ Isuru Fernando @isuruf
+ Roman Gareev @gareevroman
+ Richard Goldschmidt @SuperFluffy
 Chris Goodyer
- John Gunnels @jagunnels (IBM, T.J. Watson Research Center)
- Ali Emre Gülcü @Lephar
- Jeff Hammond @jeffhammond (Intel)
- Jacob Gorm Hansen @jacobgorm
- Jean-Michel Hautbois @jhautbois
- Ian Henriksen @insertinterestingnamehere (The University of Texas at Austin)
- Minh Quan Ho @hominhquan
- Matthew Honnibal @honnibal
- Stefan Husmann @stefanhusmann
- Francisco Igual @figual (Universidad Complutense de Madrid)
- Tony Kelman @tkelman
- Lee Killough @leekillough (Cray)
- Mike Kistler @mkistler (IBM, Austin Research Laboratory)
- Michael Lehn @michael-lehn
- @ShmuelLevine
- Dave Love @loveshack
- Tze Meng Low (The University of Texas at Austin)
- Ye Luo @ye-luo (Argonne National Laboratory)
- Ricardo Magana @magania (Hewlett Packard Enterprise)
- Bryan Marker @bamarker (The University of Texas at Austin)
- Devin Matthews @devinamatthews (The University of Texas at Austin)
- Stefanos Mavros @smavros
- Nisanth Padinharepatt (AMD)
- Devangi Parikh @dnparikh (The University of Texas at Austin)
- Elmar Peise @elmar-peise (RWTH-Aachen)
- Clément Pernet @ClementPernet
+ John Gunnels @jagunnels (IBM, T.J. Watson Research Center)
+ Ali Emre Gülcü @Lephar
+ Jeff Hammond @jeffhammond (Intel)
+ Jacob Gorm Hansen @jacobgorm
+ Jérémie du Boisberranger @jeremiedbb
+ Jean-Michel Hautbois @jhautbois
+ Ian Henriksen @insertinterestingnamehere (The University of Texas at Austin)
+ Minh Quan Ho @hominhquan
+ Matthew Honnibal @honnibal
+ Stefan Husmann @stefanhusmann
+ Francisco Igual @figual (Universidad Complutense de Madrid)
+ Tony Kelman @tkelman
+ Lee Killough @leekillough (Cray)
+ Mike Kistler @mkistler (IBM, Austin Research Laboratory)
+ Michael Lehn @michael-lehn
+ @ShmuelLevine
+ Dave Love @loveshack
+ Tze Meng Low (The University of Texas at Austin)
+ Ye Luo @ye-luo (Argonne National Laboratory)
+ Ricardo Magana @magania (Hewlett Packard Enterprise)
+ Bryan Marker @bamarker (The University of Texas at Austin)
+ Devin Matthews @devinamatthews (The University of Texas at Austin)
+ Stefanos Mavros @smavros
+ Nisanth Padinharepatt (AMD)
+ Devangi Parikh @dnparikh (The University of Texas at Austin)
+ Elmar Peise @elmar-peise (RWTH-Aachen)
+ Clément Pernet @ClementPernet
 Ilya Polkovnichenko
- Jack Poulson @poulson (Stanford)
- Mathieu Poumeyrol @kali
- Christos Psarras @ChrisPsa (RWTH-Aachen)
- @qnerd
- Michael Rader @mrader1248
- Pradeep Rao @pradeeptrgit (AMD)
+ Jack Poulson @poulson (Stanford)
+ Mathieu Poumeyrol @kali
+ Christos Psarras @ChrisPsa (RWTH-Aachen)
+ @qnerd
+ Michael Rader @mrader1248
+ Pradeep Rao @pradeeptrgit (AMD)
 Aleksei Rechinskii
- Karl Rupp @karlrupp
- Martin Schatz (The University of Texas at Austin)
- Nico Schlömer @nschloe
+ Karl Rupp @karlrupp
+ Martin Schatz (The University of Texas at Austin)
+ Nico Schlömer @nschloe
 Rene Sitt
- Tony Skjellum @tonyskjellum (The University of Tennessee at Chattanooga)
- Mikhail Smelyanskiy (Intel, Parallel Computing Lab)
- Nathaniel Smith @njsmith
- Shaden Smith @ShadenSmith
- Tyler Smith @tlrmchlsmth (The University of Texas at Austin)
- Paul Springer @springer13 (RWTH-Aachen)
- Adam J. Stewart @adamjstewart (University of Illinois at Urbana-Champaign)
+ Tony Skjellum @tonyskjellum (The University of Tennessee at Chattanooga)
+ Mikhail Smelyanskiy (Intel, Parallel Computing Lab)
+ Nathaniel Smith @njsmith
+ Shaden Smith @ShadenSmith
+ Tyler Smith @tlrmchlsmth (The University of Texas at Austin)
+ Paul Springer @springer13 (RWTH-Aachen)
+ Adam J. Stewart @adamjstewart (University of Illinois at Urbana-Champaign)
 Vladimir Sukarev
- Santanu Thangaraj (AMD)
- Nicholai Tukanov @nicholaiTukanov (The University of Texas at Austin)
- Rhys Ulerich @RhysU (The University of Texas at Austin)
- Robert van de Geijn @rvdg (The University of Texas at Austin)
- Kiran Varaganti @kvaragan (AMD)
- Natalia Vassilieva (Hewlett Packard Enterprise)
- Zhang Xianyi @xianyi (Chinese Academy of Sciences)
- Benda Xu @heroxbd
- Costas Yamin @cosstas
- Chenhan Yu @ChenhanYu (The University of Texas at Austin)
- Roman Yurchak @rth (Symerio)
- M. Zhou @cdluminate
+ Santanu Thangaraj (AMD)
+ Nicholai Tukanov @nicholaiTukanov (The University of Texas at Austin)
+ Rhys Ulerich @RhysU (The University of Texas at Austin)
+ Robert van de Geijn @rvdg (The University of Texas at Austin)
+ Kiran Varaganti @kvaragan (AMD)
+ Natalia Vassilieva (Hewlett Packard Enterprise)
+ Zhang Xianyi @xianyi (Chinese Academy of Sciences)
+ Benda Xu @heroxbd
+ Costas Yamin @cosstas
+ Chenhan Yu @ChenhanYu (The University of Texas at Austin)
+ Roman Yurchak @rth (Symerio)
+ M. Zhou @cdluminate
 BLIS's development was partially funded by grants from industry partners, including
diff --git a/docs/Multithreading.md b/docs/Multithreading.md
index a2630b18d..98f9539ad 100644
--- a/docs/Multithreading.md
+++ b/docs/Multithreading.md
@@ -107,7 +107,11 @@ This pattern--automatic or manual--holds regardless of which of the three method
 Regardless of which method is employed, and which specific way within each method, after setting the number of threads, the application may call the desired level-3 operation (via either the [typed API](docs/BLISTypedAPI.md) or the [object API](docs/BLISObjectAPI.md)) and the operation will execute in a multithreaded manner. (When calling BLIS via the BLAS API, only the first two (global) methods are available.)
-**Note**: Please be aware of what happens if you try to specify both the automatic and manual ways, as it could otherwise confuse new users. Regardless of which broad method is used, **if multithreading is specified via both the automatic and manual ways, the manual way will always take precedence.** Also, specifying parallelism for even *one* loop counts as specifying the manual way (in which case the ways of parallelism for the remaining loops will be assumed to be 1).
+**Note**: Please be aware of what happens if you try to specify both the automatic and manual ways, as it could otherwise confuse new users. Here are the important points:
+ * Regardless of which broad method is used, **if multithreading is specified via both the automatic and manual ways, the values set via the manual way will always take precedence.**
+ * Specifying parallelism for even *one* loop counts as specifying the manual way (in which case the ways of parallelism for the remaining loops will be assumed to be 1).
+ * If you have specified multithreading via *both* the automatic and manual ways, BLIS will **not** complain if the values are inconsistent with one another. (For example, you may request 8 total threads be used while also specifying 4 ways of parallelism within each of two matrix multiplication loops, for a total of 16 ways.) Furthermore, you will be able to query these inconsistent values via the runtime API both before and after multithreading executes.
+ * If multithreading is disabled, you **may still** specify multithreading values via either the manual or automatic ways. However, BLIS will silently ignore **all** of these values. A BLIS library that is built with multithreading disabled at configure-time will always run sequentially (from the perspective of a single application thread).
 ## Globally via environment variables
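The precedence rules in the new note can be illustrated with a small, self-contained C model. This is a hypothetical sketch, not BLIS code: `thread_request_t`, `thread_plan_t`, `resolve_threading()`, and `ways_or_one()` are invented names, a value of 0 stands for "not specified", and the model deliberately does not try to reproduce how BLIS factors an automatic thread count across the individual loops.

```c
#include <stdbool.h>

/* Hypothetical model, not the BLIS API: 0 (or less) means "not specified". */
typedef struct
{
    long nt;                  /* automatic way: total number of threads   */
    long jc, pc, ic, jr, ir;  /* manual way: ways of parallelism per loop */
} thread_request_t;

typedef struct
{
    bool manual;              /* true if the manual way was used            */
    long jc, pc, ic, jr, ir;  /* per-loop ways (all 1 unless manual)        */
    long total;               /* total number of threads that would be used */
} thread_plan_t;

/* Under the manual way, loops left unspecified default to 1. */
static long ways_or_one( long ways ) { return ways > 0 ? ways : 1; }

static thread_plan_t resolve_threading( thread_request_t req, bool mt_enabled )
{
    /* Default plan: sequential execution. */
    thread_plan_t plan = { false, 1, 1, 1, 1, 1, 1 };

    /* Multithreading disabled at configure-time: every value is ignored. */
    if ( !mt_enabled ) return plan;

    bool any_manual = req.jc > 0 || req.pc > 0 || req.ic > 0 ||
                      req.jr > 0 || req.ir > 0;

    if ( any_manual )
    {
        /* Manual way takes precedence; req.nt is ignored and is never
           checked for consistency with the per-loop values. */
        plan.manual = true;
        plan.jc = ways_or_one( req.jc );
        plan.pc = ways_or_one( req.pc );
        plan.ic = ways_or_one( req.ic );
        plan.jr = ways_or_one( req.jr );
        plan.ir = ways_or_one( req.ir );
        plan.total = plan.jc * plan.pc * plan.ic * plan.jr * plan.ir;
    }
    else if ( req.nt > 1 )
    {
        /* Automatic way: this sketch records only the total and does not
           model how BLIS distributes nt among the individual loops. */
        plan.total = req.nt;
    }

    return plan;
}
```

With `nt = 8` and 4 ways assigned to, say, the jc and ic loops, the sketch resolves to the manual way with 16 threads in total, mirroring the inconsistent-values example in the note; passing `mt_enabled = false` yields a single thread regardless of what was requested.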