565.77

565.57.01
560.35.03
2026-01-27 19:49:47 +00:00 · 2024-12-05 16:37:35 +01:00 · 2024-10-22 17:38:58 +02:00 · 2024-08-19 10:46:21 -07:00 · 2024-07-31 11:27:06 -07:00 · 2024-07-19 15:45:15 -07:00
1583 changed files with 327964 additions and 177042 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,209 +0,0 @@
-# Changelog
-
-## Release 550 Entries
-
-### [550.40.55] 2024-03-07
-
-### [550.40.53] 2024-02-28
-
-#### Added
-
- Added vGPU Host and vGPU Guest support. For vGPU Host, please refer to the README.vgpu packaged in the vGPU Host Package for more details.
-
-### [550.40.07] 2024-01-24
-
-#### Fixed
-
- Set INSTALL_MOD_DIR only if it's not defined, [#570](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/570) by @keelung-yang
-## Release 545 Entries
-
-#### Fixed
-
- The brightness control of NVIDIA seems to be broken, [#573](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/573)
-
-### [545.29.02] 2023-10-31
-
-### [545.23.06] 2023-10-17
-
-#### Fixed
-
- Fix always-false conditional, [#493](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/493) by @meme8383
-
-#### Added
-
- Added beta-quality support for GeForce and Workstation GPUs. Please see the "Open Linux Kernel Modules" chapter in the NVIDIA GPU driver end user README for details.
-
-## Release 535 Entries
-
-### [535.129.03] 2023-10-31
-
-### [535.113.01] 2023-09-21
-
-#### Fixed
-
- Fixed building main against current centos stream 8 fails, [#550](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/550) by @airlied
-
-### [535.104.05] 2023-08-22
-
-### [535.98] 2023-08-08
-
-### [535.86.10] 2023-07-31
-
-### [535.86.05] 2023-07-18
-
-### [535.54.03] 2023-06-14
-
-### [535.43.02] 2023-05-30
-
-#### Fixed
-
- Fixed console restore with traditional VGA consoles.
-
-#### Added
-
- Added support for Run Time D3 (RTD3) on Ampere and later GPUs.
- Added support for G-Sync on desktop GPUs.
-
-## Release 530 Entries
-
-### [530.41.03] 2023-03-23
-
-### [530.30.02] 2023-02-28
-
-#### Changed
-
- GSP firmware is now distributed as `gsp_tu10x.bin` and `gsp_ga10x.bin` to better reflect the GPU architectures supported by each firmware file in this release.
-    - The .run installer will continue to install firmware to /lib/firmware/nvidia/<version> and the nvidia.ko kernel module will load the appropriate firmware for each GPU at runtime.
-  
-#### Fixed
-
- Add support for resizable BAR on Linux when NVreg_EnableResizableBar=1 module param is set. [#3](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/3) by @sjkelly
-
-#### Added
-
- Support for power management features like Suspend, Hibernate and Resume.
-
-## Release 525 Entries
-
-### [525.147.05] 2023-10-31
-
-#### Fixed
-
- Fix nvidia_p2p_get_pages(): Fix double-free in register-callback error path, [#557](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/557) by @BrendanCunningham
-
-### [525.125.06] 2023-06-26
-
-### [525.116.04] 2023-05-09
-
-### [525.116.03] 2023-04-25
-
-### [525.105.17] 2023-03-30
-
-### [525.89.02] 2023-02-08
-
-### [525.85.12] 2023-01-30
-
-### [525.85.05] 2023-01-19
-
-#### Fixed
-
- Fix build problems with Clang 15.0, [#377](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/377) by @ptr1337
-
-### [525.78.01] 2023-01-05
-
-### [525.60.13] 2022-12-05
-
-### [525.60.11] 2022-11-28
-
-#### Fixed
-
- Fixed nvenc compatibility with usermode clients [#104](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/104)
-
-### [525.53] 2022-11-10
-
-#### Changed
-
- GSP firmware is now distributed as multiple firmware files: this release has `gsp_tu10x.bin` and `gsp_ad10x.bin` replacing `gsp.bin` from previous releases.
-    - Each file is named after a GPU architecture and supports GPUs from one or more architectures. This allows GSP firmware to better leverage each architecture's capabilities.
-    - The .run installer will continue to install firmware to `/lib/firmware/nvidia/<version>` and the `nvidia.ko` kernel module will load the appropriate firmware for each GPU at runtime.
-
-#### Fixed
-
- Add support for IBT (indirect branch tracking) on supported platforms, [#256](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/256) by @rnd-ash
- Return EINVAL when [failing to] allocating memory, [#280](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/280) by @YusufKhan-gamedev
- Fix various typos in nvidia/src/kernel, [#16](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/16) by @alexisgeoffrey
- Added support for rotation in X11, Quadro Sync, Stereo, and YUV 4:2:0 on Turing.
-
-## Release 520 Entries
-
-### [520.61.07] 2022-10-20
-
-### [520.56.06] 2022-10-12
-
-#### Added
-
- Introduce support for GeForce RTX 4090 GPUs.
-
-### [520.61.05] 2022-10-10
-
-#### Added
-
- Introduce support for NVIDIA H100 GPUs.
-
-#### Fixed
-
- Fix/Improve Makefile, [#308](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/308/) by @izenynn
- Make nvLogBase2 more efficient, [#177](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/177/) by @DMaroo
- nv-pci: fixed always true expression, [#195](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/195/) by @ValZapod
-
-## Release 515 Entries
-
-### [515.76] 2022-09-20
-
-#### Fixed
-
- Improved compatibility with new Linux kernel releases
- Fixed possible excessive GPU power draw on an idle X11 or Wayland desktop when driving high resolutions or refresh rates
-
-### [515.65.07] 2022-10-19
-
-### [515.65.01] 2022-08-02
-
-#### Fixed
-
- Collection of minor fixes to issues, [#6](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/61) by @Joshua-Ashton
- Remove unnecessary use of acpi_bus_get_device().
-
-### [515.57] 2022-06-28
-
-#### Fixed
-
- Backtick is deprecated, [#273](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/273) by @arch-user-france1
-
-### [515.48.07] 2022-05-31
-
-#### Added
-
- List of compatible GPUs in README.md.
-
-#### Fixed
-
- Fix various README capitalizations, [#8](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/8) by @27lx 
- Automatically tag bug report issues, [#15](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/15) by @thebeanogamer
- Improve conftest.sh Script, [#37](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/37) by @Nitepone
- Update HTTP link to HTTPS, [#101](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/101) by @alcaparra
- moved array sanity check to before the array access, [#117](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/117) by @RealAstolfo
- Fixed some typos, [#122](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/122) by @FEDOyt
- Fixed capitalization, [#123](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/123) by @keroeslux
- Fix typos in NVDEC Engine Descriptor, [#126](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/126) from @TrickyDmitriy
- Extranous apostrohpes in a makefile script [sic], [#14](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/14) by @kiroma
- HDMI no audio @ 4K above 60Hz, [#75](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/75) by @adolfotregosa
- dp_configcaps.cpp:405: array index sanity check in wrong place?, [#110](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/110) by @dcb314
- NVRM kgspInitRm_IMPL: missing NVDEC0 engine, cannot initialize GSP-RM, [#116](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/116) by @kfazz
- ERROR: modpost: "backlight_device_register" [...nvidia-modeset.ko] undefined, [#135](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/135) by @sndirsch
- aarch64 build fails, [#151](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/151) by @frezbo
-
-### [515.43.04] 2022-05-11
-
- Initial release.
-
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # NVIDIA Linux Open GPU Kernel Module Source

 This is the source release of the NVIDIA Linux open GPU kernel modules,
-version 550.40.55.
+version 565.77.


 ## How to Build
@@ -17,7 +17,7 @@ as root:

 Note that the kernel modules built here must be used with GSP
 firmware and user-space NVIDIA GPU driver components from a corresponding
-550.40.55 driver release.  This can be achieved by installing
+565.77 driver release.  This can be achieved by installing
 the NVIDIA GPU driver from the .run file using the `--no-kernel-modules`
 option.  E.g.,

@@ -74,7 +74,7 @@ kernel.

 The NVIDIA open kernel modules support the same range of Linux kernel
 versions that are supported with the proprietary NVIDIA kernel modules.
-This is currently Linux kernel 3.10 or newer.
+This is currently Linux kernel 4.15 or newer.


 ## How to Contribute
@@ -179,16 +179,13 @@ software applications.

 ## Compatible GPUs

-The NVIDIA open kernel modules can be used on any Turing or later GPU
-(see the table below). However, in the __DRIVER_VERION__ release, GeForce and
-Workstation support is considered to be Beta quality. The open kernel modules
-are suitable for broad usage, and NVIDIA requests feedback on any issues
-encountered specific to them.
+The NVIDIA open kernel modules can be used on any Turing or later GPU (see the
+table below).

 For details on feature support and limitations, see the NVIDIA GPU driver
 end user README here:

-https://us.download.nvidia.com/XFree86/Linux-x86_64/550.40.55/README/kernel_open.html
+https://us.download.nvidia.com/XFree86/Linux-x86_64/565.77/README/kernel_open.html

 For vGPU support, please refer to the README.vgpu packaged in the vGPU Host
 Package for more details.
@@ -202,6 +199,7 @@ Subsystem Device ID.
 | NVIDIA TITAN RTX                                | 1E02           |
 | NVIDIA GeForce RTX 2080 Ti                      | 1E04           |
 | NVIDIA GeForce RTX 2080 Ti                      | 1E07           |
+| NVIDIA CMP 50HX                                 | 1E09           |
 | Quadro RTX 6000                                 | 1E30           |
 | Quadro RTX 8000                                 | 1E30 1028 129E |
 | Quadro RTX 8000                                 | 1E30 103C 129E |
@@ -394,6 +392,7 @@ Subsystem Device ID.
 | NVIDIA GeForce RTX 2070                         | 1F07           |
 | NVIDIA GeForce RTX 2060                         | 1F08           |
 | NVIDIA GeForce GTX 1650                         | 1F0A           |
+| NVIDIA CMP 40HX                                 | 1F0B           |
 | NVIDIA GeForce RTX 2070                         | 1F10           |
 | NVIDIA GeForce RTX 2070 with Max-Q Design       | 1F10 1025 132D |
 | NVIDIA GeForce RTX 2070 with Max-Q Design       | 1F10 1025 1342 |
@@ -651,6 +650,7 @@ Subsystem Device ID.
 | NVIDIA T1000 8GB                                | 1FF0 17AA 1612 |
 | NVIDIA T400 4GB                                 | 1FF2 1028 1613 |
 | NVIDIA T400 4GB                                 | 1FF2 103C 1613 |
+| NVIDIA T400E                                    | 1FF2 103C 18FF |
 | NVIDIA T400 4GB                                 | 1FF2 103C 8A80 |
 | NVIDIA T400 4GB                                 | 1FF2 10DE 1613 |
 | NVIDIA T400E                                    | 1FF2 10DE 18FF |
@@ -693,6 +693,7 @@ Subsystem Device ID.
 | NVIDIA GeForce GTX 1660                         | 2184           |
 | NVIDIA GeForce GTX 1650 SUPER                   | 2187           |
 | NVIDIA GeForce GTX 1650                         | 2188           |
+| NVIDIA CMP 30HX                                 | 2189           |
 | NVIDIA GeForce GTX 1660 Ti                      | 2191           |
 | NVIDIA GeForce GTX 1660 Ti with Max-Q Design    | 2191 1028 0949 |
 | NVIDIA GeForce GTX 1660 Ti with Max-Q Design    | 2191 103C 85FB |
@@ -753,14 +754,20 @@ Subsystem Device ID.
 | NVIDIA H800                                     | 2324 10DE 17A8 |
 | NVIDIA H20                                      | 2329 10DE 198B |
 | NVIDIA H20                                      | 2329 10DE 198C |
+| NVIDIA H20-3e                                   | 232C 10DE 2063 |
+| NVIDIA H20-3e                                   | 232C 10DE 2064 |
 | NVIDIA H100 80GB HBM3                           | 2330 10DE 16C0 |
 | NVIDIA H100 80GB HBM3                           | 2330 10DE 16C1 |
 | NVIDIA H100 PCIe                                | 2331 10DE 1626 |
+| NVIDIA H200                                     | 2335 10DE 18BE |
+| NVIDIA H200                                     | 2335 10DE 18BF |
 | NVIDIA H100                                     | 2339 10DE 17FC |
 | NVIDIA H800 NVL                                 | 233A 10DE 183A |
+| NVIDIA H200 NVL                                 | 233B 10DE 1996 |
 | NVIDIA GH200 120GB                              | 2342 10DE 16EB |
 | NVIDIA GH200 120GB                              | 2342 10DE 1805 |
 | NVIDIA GH200 480GB                              | 2342 10DE 1809 |
+| NVIDIA GH200 144G HBM3e                         | 2348 10DE 18D2 |
 | NVIDIA GeForce RTX 3060 Ti                      | 2414           |
 | NVIDIA GeForce RTX 3080 Ti Laptop GPU           | 2420           |
 | NVIDIA RTX A5500 Laptop GPU                     | 2438           |
@@ -829,6 +836,16 @@ Subsystem Device ID.
 | NVIDIA GeForce RTX 3050 4GB Laptop GPU          | 25AB           |
 | NVIDIA GeForce RTX 3050 6GB Laptop GPU          | 25AC           |
 | NVIDIA GeForce RTX 2050                         | 25AD           |
+| NVIDIA RTX A1000                                | 25B0 1028 1878 |
+| NVIDIA RTX A1000                                | 25B0 103C 1878 |
+| NVIDIA RTX A1000                                | 25B0 103C 8D96 |
+| NVIDIA RTX A1000                                | 25B0 10DE 1878 |
+| NVIDIA RTX A1000                                | 25B0 17AA 1878 |
+| NVIDIA RTX A400                                 | 25B2 1028 1879 |
+| NVIDIA RTX A400                                 | 25B2 103C 1879 |
+| NVIDIA RTX A400                                 | 25B2 103C 8D95 |
+| NVIDIA RTX A400                                 | 25B2 10DE 1879 |
+| NVIDIA RTX A400                                 | 25B2 17AA 1879 |
 | NVIDIA A16                                      | 25B6 10DE 14A9 |
 | NVIDIA A2                                       | 25B6 10DE 157E |
 | NVIDIA RTX A2000 Laptop GPU                     | 25B8           |
@@ -847,6 +864,7 @@ Subsystem Device ID.
 | NVIDIA RTX A500 Embedded GPU                    | 25FB           |
 | NVIDIA GeForce RTX 4090                         | 2684           |
 | NVIDIA GeForce RTX 4090 D                       | 2685           |
+| NVIDIA GeForce RTX 4070 Ti SUPER                | 2689           |
 | NVIDIA RTX 6000 Ada Generation                  | 26B1 1028 16A1 |
 | NVIDIA RTX 6000 Ada Generation                  | 26B1 103C 16A1 |
 | NVIDIA RTX 6000 Ada Generation                  | 26B1 10DE 16A1 |
@@ -864,9 +882,11 @@ Subsystem Device ID.
 | NVIDIA L40S                                     | 26B9 10DE 1851 |
 | NVIDIA L40S                                     | 26B9 10DE 18CF |
 | NVIDIA L20                                      | 26BA 10DE 1957 |
+| NVIDIA L20                                      | 26BA 10DE 1990 |
 | NVIDIA GeForce RTX 4080 SUPER                   | 2702           |
 | NVIDIA GeForce RTX 4080                         | 2704           |
 | NVIDIA GeForce RTX 4070 Ti SUPER                | 2705           |
+| NVIDIA GeForce RTX 4070                         | 2709           |
 | NVIDIA GeForce RTX 4090 Laptop GPU              | 2717           |
 | NVIDIA RTX 5000 Ada Generation Laptop GPU       | 2730           |
 | NVIDIA GeForce RTX 4090 Laptop GPU              | 2757           |
@@ -874,6 +894,7 @@ Subsystem Device ID.
 | NVIDIA GeForce RTX 4070 Ti                      | 2782           |
 | NVIDIA GeForce RTX 4070 SUPER                   | 2783           |
 | NVIDIA GeForce RTX 4070                         | 2786           |
+| NVIDIA GeForce RTX 4060 Ti                      | 2788           |
 | NVIDIA GeForce RTX 4080 Laptop GPU              | 27A0           |
 | NVIDIA RTX 4000 SFF Ada Generation              | 27B0 1028 16FA |
 | NVIDIA RTX 4000 SFF Ada Generation              | 27B0 103C 16FA |
@@ -896,7 +917,9 @@ Subsystem Device ID.
 | NVIDIA RTX 3500 Ada Generation Embedded GPU     | 27FB           |
 | NVIDIA GeForce RTX 4060 Ti                      | 2803           |
 | NVIDIA GeForce RTX 4060 Ti                      | 2805           |
+| NVIDIA GeForce RTX 4060                         | 2808           |
 | NVIDIA GeForce RTX 4070 Laptop GPU              | 2820           |
+| NVIDIA GeForce RTX 3050 A Laptop GPU            | 2822           |
 | NVIDIA RTX 3000 Ada Generation Laptop GPU       | 2838           |
 | NVIDIA GeForce RTX 4070 Laptop GPU              | 2860           |
 | NVIDIA GeForce RTX 4060                         | 2882           |
@@ -904,8 +927,11 @@ Subsystem Device ID.
 | NVIDIA GeForce RTX 4050 Laptop GPU              | 28A1           |
 | NVIDIA RTX 2000 Ada Generation                  | 28B0 1028 1870 |
 | NVIDIA RTX 2000 Ada Generation                  | 28B0 103C 1870 |
+| NVIDIA RTX 2000E Ada Generation                 | 28B0 103C 1871 |
 | NVIDIA RTX 2000 Ada Generation                  | 28B0 10DE 1870 |
+| NVIDIA RTX 2000E Ada Generation                 | 28B0 10DE 1871 |
 | NVIDIA RTX 2000 Ada Generation                  | 28B0 17AA 1870 |
+| NVIDIA RTX 2000E Ada Generation                 | 28B0 17AA 1871 |
 | NVIDIA RTX 2000 Ada Generation Laptop GPU       | 28B8           |
 | NVIDIA RTX 1000 Ada Generation Laptop GPU       | 28B9           |
 | NVIDIA RTX 500 Ada Generation Laptop GPU        | 28BA           |
--- a/kernel-open/Kbuild
+++ b/kernel-open/Kbuild
@@ -72,7 +72,7 @@ EXTRA_CFLAGS += -I$(src)/common/inc
 EXTRA_CFLAGS += -I$(src)
 EXTRA_CFLAGS += -Wall $(DEFINES) $(INCLUDES) -Wno-cast-qual -Wno-format-extra-args
 EXTRA_CFLAGS += -D__KERNEL__ -DMODULE -DNVRM
-EXTRA_CFLAGS += -DNV_VERSION_STRING=\"550.40.55\"
+EXTRA_CFLAGS += -DNV_VERSION_STRING=\"565.77\"

 ifneq ($(SYSSRCHOST1X),)
 EXTRA_CFLAGS += -I$(SYSSRCHOST1X)
@@ -118,7 +118,7 @@ ifeq ($(ARCH),x86_64)
 endif

 ifeq ($(ARCH),powerpc)
- EXTRA_CFLAGS += -mlittle-endian -mno-strict-align -mno-altivec
+ EXTRA_CFLAGS += -mlittle-endian -mno-strict-align
 endif

 EXTRA_CFLAGS += -DNV_UVM_ENABLE
@@ -170,6 +170,9 @@ NV_CONFTEST_CMD := /bin/sh $(NV_CONFTEST_SCRIPT) \
 NV_CFLAGS_FROM_CONFTEST := $(shell $(NV_CONFTEST_CMD) build_cflags)

 NV_CONFTEST_CFLAGS = $(NV_CFLAGS_FROM_CONFTEST) $(EXTRA_CFLAGS) -fno-pie
+NV_CONFTEST_CFLAGS += $(call cc-disable-warning,pointer-sign)
+NV_CONFTEST_CFLAGS += $(call cc-option,-fshort-wchar,)
+NV_CONFTEST_CFLAGS += $(call cc-option,-Werror=incompatible-pointer-types,)
 NV_CONFTEST_CFLAGS += -Wno-error

 NV_CONFTEST_COMPILE_TEST_HEADERS := $(obj)/conftest/macros.h
--- a/kernel-open/Makefile
+++ b/kernel-open/Makefile
@@ -28,7 +28,7 @@ else
  else
    KERNEL_UNAME ?= $(shell uname -r)
    KERNEL_MODLIB := /lib/modules/$(KERNEL_UNAME)
-    KERNEL_SOURCES := $(shell test -d $(KERNEL_MODLIB)/source && echo $(KERNEL_MODLIB)/source || echo $(KERNEL_MODLIB)/build)
+    KERNEL_SOURCES := $(shell ((test -d $(KERNEL_MODLIB)/source && echo $(KERNEL_MODLIB)/source) || (test -d $(KERNEL_MODLIB)/build/source && echo $(KERNEL_MODLIB)/build/source)) || echo $(KERNEL_MODLIB)/build)
  endif

  KERNEL_OUTPUT := $(KERNEL_SOURCES)
@@ -42,12 +42,32 @@ else
  else
    KERNEL_UNAME ?= $(shell uname -r)
    KERNEL_MODLIB := /lib/modules/$(KERNEL_UNAME)
-    ifeq ($(KERNEL_SOURCES), $(KERNEL_MODLIB)/source)
+    # $(filter pattern...,text) - Returns all whitespace-separated words in text that
+    # do match any of the pattern words, removing any words that do not match.
+    # Set the KERNEL_OUTPUT only if either $(KERNEL_MODLIB)/source or
+    # $(KERNEL_MODLIB)/build/source path matches the KERNEL_SOURCES.
+    ifneq ($(filter $(KERNEL_SOURCES),$(KERNEL_MODLIB)/source $(KERNEL_MODLIB)/build/source),)
      KERNEL_OUTPUT := $(KERNEL_MODLIB)/build
      KBUILD_PARAMS := KBUILD_OUTPUT=$(KERNEL_OUTPUT)
    endif
  endif

+  # If CC hasn't been set explicitly, check the value of CONFIG_CC_VERSION_TEXT.
+  # Look for the compiler specified there, and use it by default, if found.
+  ifeq ($(origin CC),default)
+    cc_version_text=$(firstword $(shell . $(KERNEL_OUTPUT)/.config; \
+                      echo "$$CONFIG_CC_VERSION_TEXT"))
+
+    ifneq ($(cc_version_text),)
+      ifeq ($(shell command -v $(cc_version_text)),)
+          $(warning WARNING: Unable to locate the compiler $(cc_version_text) \
+            from CONFIG_CC_VERSION_TEXT in the kernel configuration.)
+      else
+          CC=$(cc_version_text)
+      endif
+    endif
+  endif
+
  CC ?= cc
  LD ?= ld
  OBJDUMP ?= objdump
@@ -61,6 +81,16 @@ else
    )
  endif

+  KERNEL_ARCH = $(ARCH)
+
+  ifneq ($(filter $(ARCH),i386 x86_64),)
+    KERNEL_ARCH = x86
+  else
+    ifeq ($(filter $(ARCH),arm64 powerpc),)
+        $(error Unsupported architecture $(ARCH))
+    endif
+  endif
+
  NV_KERNEL_MODULES ?= $(wildcard nvidia nvidia-uvm nvidia-vgpu-vfio nvidia-modeset nvidia-drm nvidia-peermem)
  NV_KERNEL_MODULES := $(filter-out $(NV_EXCLUDE_KERNEL_MODULES), \
                                    $(NV_KERNEL_MODULES))
@@ -102,8 +132,9 @@ else
  # module symbols on which the Linux kernel's module resolution is dependent
  # and hence must be used whenever present.

-  LD_SCRIPT ?= $(KERNEL_SOURCES)/scripts/module-common.lds      \
-               $(KERNEL_SOURCES)/arch/$(ARCH)/kernel/module.lds \
+  LD_SCRIPT ?= $(KERNEL_SOURCES)/scripts/module-common.lds             \
+               $(KERNEL_SOURCES)/arch/$(KERNEL_ARCH)/kernel/module.lds \
+               $(KERNEL_OUTPUT)/arch/$(KERNEL_ARCH)/module.lds         \
               $(KERNEL_OUTPUT)/scripts/module.lds
  NV_MODULE_COMMON_SCRIPTS := $(foreach s, $(wildcard $(LD_SCRIPT)), -T $(s))
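The `KERNEL_ARCH` selection added in this Makefile hunk amounts to a small mapping. `i386` and `x86_64` collapse to `x86`, `arm64` and `powerpc` pass through unchanged, and any other value stops the build with "Unsupported architecture". A minimal sketch of that logic in C, for illustration only: the function name `kernel_arch_for` is hypothetical and does not exist in the driver sources, and returning `NULL` stands in for `$(error ...)`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the Makefile's KERNEL_ARCH logic.
 * Returns the directory name used under arch/ for a given ARCH value,
 * or NULL where the Makefile would abort with "Unsupported architecture".
 */
const char *kernel_arch_for(const char *arch)
{
    assert(arch != NULL);

    /* i386 and x86_64 both map to x86. */
    if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0)
        return "x86";

    /* arm64 and powerpc keep their own name. */
    if (strcmp(arch, "arm64") == 0 || strcmp(arch, "powerpc") == 0)
        return arch;

    return NULL;
}
```

With this mapping, the `LD_SCRIPT` candidates on an `x86_64` build resolve under `arch/x86/` rather than `arch/x86_64/`, which is what the switch from `$(ARCH)` to `$(KERNEL_ARCH)` in the hunk above achieves.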

--- a/kernel-open/common/inc/nv-firmware.h
+++ b/kernel-open/common/inc/nv-firmware.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -44,6 +44,7 @@ typedef enum
    NV_FIRMWARE_CHIP_FAMILY_GA10X = 4,
    NV_FIRMWARE_CHIP_FAMILY_AD10X = 5,
    NV_FIRMWARE_CHIP_FAMILY_GH100 = 6,
+    NV_FIRMWARE_CHIP_FAMILY_GB10X = 8,
    NV_FIRMWARE_CHIP_FAMILY_END,
 } nv_firmware_chip_family_t;

@@ -52,6 +53,7 @@ static inline const char *nv_firmware_chip_family_to_string(
 )
 {
    switch (fw_chip_family) {
+        case NV_FIRMWARE_CHIP_FAMILY_GB10X: return "gb10x";
        case NV_FIRMWARE_CHIP_FAMILY_GH100: return "gh100";
        case NV_FIRMWARE_CHIP_FAMILY_AD10X: return "ad10x";
        case NV_FIRMWARE_CHIP_FAMILY_GA10X: return "ga10x";
@@ -66,13 +68,13 @@ static inline const char *nv_firmware_chip_family_to_string(
    return NULL;
 }

-// The includer (presumably nv.c) may optionally define
-// NV_FIRMWARE_PATH_FOR_FILENAME(filename)
-// to return a string "path" given a gsp_*.bin or gsp_log_*.bin filename.
+// The includer may optionally define
+// NV_FIRMWARE_FOR_NAME(name)
+// to return a platform-defined string for a given gsp_* or gsp_log_* name.
 //
-// The function nv_firmware_path will then be available.
-#if defined(NV_FIRMWARE_PATH_FOR_FILENAME)
-static inline const char *nv_firmware_path(
+// The function nv_firmware_for_chip_family will then be available.
+#if defined(NV_FIRMWARE_FOR_NAME)
+static inline const char *nv_firmware_for_chip_family(
    nv_firmware_type_t fw_type,
    nv_firmware_chip_family_t fw_chip_family
 )
@@ -81,15 +83,16 @@ static inline const char *nv_firmware_path(
    {
        switch (fw_chip_family)
        {
+            case NV_FIRMWARE_CHIP_FAMILY_GB10X:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_GH100:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_AD10X:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_GA10X:
-                return NV_FIRMWARE_PATH_FOR_FILENAME("gsp_ga10x.bin");
+                return NV_FIRMWARE_FOR_NAME("gsp_ga10x");

            case NV_FIRMWARE_CHIP_FAMILY_GA100:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_TU11X:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_TU10X:
-                return NV_FIRMWARE_PATH_FOR_FILENAME("gsp_tu10x.bin");
+                return NV_FIRMWARE_FOR_NAME("gsp_tu10x");

            case NV_FIRMWARE_CHIP_FAMILY_END:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_NULL:
@@ -100,15 +103,16 @@ static inline const char *nv_firmware_path(
    {
        switch (fw_chip_family)
        {
+            case NV_FIRMWARE_CHIP_FAMILY_GB10X:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_GH100:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_AD10X:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_GA10X:
-                return NV_FIRMWARE_PATH_FOR_FILENAME("gsp_log_ga10x.bin");
+                return NV_FIRMWARE_FOR_NAME("gsp_log_ga10x");

            case NV_FIRMWARE_CHIP_FAMILY_GA100:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_TU11X:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_TU10X:
-                return NV_FIRMWARE_PATH_FOR_FILENAME("gsp_log_tu10x.bin");
+                return NV_FIRMWARE_FOR_NAME("gsp_log_tu10x");

            case NV_FIRMWARE_CHIP_FAMILY_END:  // fall through
            case NV_FIRMWARE_CHIP_FAMILY_NULL:
@@ -118,15 +122,15 @@ static inline const char *nv_firmware_path(

    return "";
 }
-#endif  // defined(NV_FIRMWARE_PATH_FOR_FILENAME)
+#endif  // defined(NV_FIRMWARE_FOR_NAME)

-// The includer (presumably nv.c) may optionally define
-// NV_FIRMWARE_DECLARE_GSP_FILENAME(filename)
+// The includer may optionally define
+// NV_FIRMWARE_DECLARE_GSP(name)
 // which will then be invoked (at the top-level) for each
-// gsp_*.bin (but not gsp_log_*.bin)
-#if defined(NV_FIRMWARE_DECLARE_GSP_FILENAME)
-NV_FIRMWARE_DECLARE_GSP_FILENAME("gsp_ga10x.bin")
-NV_FIRMWARE_DECLARE_GSP_FILENAME("gsp_tu10x.bin")
-#endif  // defined(NV_FIRMWARE_DECLARE_GSP_FILENAME)
+// gsp_* (but not gsp_log_*)
+#if defined(NV_FIRMWARE_DECLARE_GSP)
+NV_FIRMWARE_DECLARE_GSP("gsp_ga10x")
+NV_FIRMWARE_DECLARE_GSP("gsp_tu10x")
+#endif  // defined(NV_FIRMWARE_DECLARE_GSP)

-#endif  // NV_FIRMWARE_DECLARE_GSP_FILENAME
+#endif  // NV_FIRMWARE_DECLARE_GSP
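After this change the header passes extensionless names such as `gsp_ga10x` to `NV_FIRMWARE_FOR_NAME`, leaving it to the includer to turn a name into whatever string the platform needs. The sketch below shows one way an includer might do that. Everything here is an assumption made for illustration: the macro body (a path built from a version directory and a `.bin` suffix), the `example_` enum with its implicit values, and both function names. Only the family-to-name grouping is taken from the diff.

```c
#include <assert.h>
#include <string.h>

/*
 * ASSUMPTION: an includer-side definition invented for this sketch.
 * Adjacent string literals are concatenated at compile time.
 */
#define NV_FIRMWARE_FOR_NAME(name) "nvidia/565.77/" name ".bin"

/* Local stand-ins for the chip families that appear in the diff. */
typedef enum {
    EXAMPLE_FAMILY_TU10X,
    EXAMPLE_FAMILY_TU11X,
    EXAMPLE_FAMILY_GA100,
    EXAMPLE_FAMILY_GA10X,
    EXAMPLE_FAMILY_AD10X,
    EXAMPLE_FAMILY_GH100,
    EXAMPLE_FAMILY_GB10X
} example_chip_family_t;

/* Same grouping as the GSP switch in the diff: two images cover all families. */
const char *example_gsp_firmware(example_chip_family_t family)
{
    switch (family) {
        case EXAMPLE_FAMILY_GB10X:  /* fall through */
        case EXAMPLE_FAMILY_GH100:  /* fall through */
        case EXAMPLE_FAMILY_AD10X:  /* fall through */
        case EXAMPLE_FAMILY_GA10X:
            return NV_FIRMWARE_FOR_NAME("gsp_ga10x");

        case EXAMPLE_FAMILY_GA100:  /* fall through */
        case EXAMPLE_FAMILY_TU11X:  /* fall through */
        case EXAMPLE_FAMILY_TU10X:
            return NV_FIRMWARE_FOR_NAME("gsp_tu10x");
    }
    return "";
}

/* Usage: the newly added GB10X family resolves to the ga10x image. */
void example_gsp_firmware_usage(void)
{
    assert(strcmp(example_gsp_firmware(EXAMPLE_FAMILY_GB10X),
                  "nvidia/565.77/gsp_ga10x.bin") == 0);
    assert(strcmp(example_gsp_firmware(EXAMPLE_FAMILY_TU11X),
                  "nvidia/565.77/gsp_tu10x.bin") == 0);
}
```

Keeping the suffix and directory out of the shared header means a platform that identifies firmware by something other than a file path can define the macro differently without touching the switch.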
--- a/kernel-open/common/inc/nv-hypervisor.h
+++ b/kernel-open/common/inc/nv-hypervisor.h
@@ -37,13 +37,11 @@ typedef enum _HYPERVISOR_TYPE
    OS_HYPERVISOR_UNKNOWN
 } HYPERVISOR_TYPE;

-#define CMD_VGPU_VFIO_WAKE_WAIT_QUEUE         0
-#define CMD_VGPU_VFIO_INJECT_INTERRUPT        1
-#define CMD_VGPU_VFIO_REGISTER_MDEV           2
-#define CMD_VGPU_VFIO_PRESENT                 3
-#define CMD_VFIO_PCI_CORE_PRESENT             4
+#define CMD_VFIO_WAKE_REMOVE_GPU              1
+#define CMD_VGPU_VFIO_PRESENT                 2
+#define CMD_VFIO_PCI_CORE_PRESENT             3

-#define MAX_VF_COUNT_PER_GPU 64
+#define MAX_VF_COUNT_PER_GPU                  64

 typedef enum _VGPU_TYPE_INFO
 {
@@ -54,17 +52,11 @@ typedef enum _VGPU_TYPE_INFO

 typedef struct
 {
-    void  *vgpuVfioRef;
-    void  *waitQueue;
    void  *nv;
-    NvU32 *vgpuTypeIds;
-    NvU8 **vgpuNames;
-    NvU32  numVgpuTypes;
-    NvU32  domain;
-    NvU8   bus;
-    NvU8   slot;
-    NvU8   function;
-    NvBool is_virtfn;
+    NvU32 domain;
+    NvU32 bus;
+    NvU32 device;
+    NvU32 return_status;
 } vgpu_vfio_info;

 typedef struct
--- a/kernel-open/common/inc/nv-kthread-q-os.h
+++ b/kernel-open/common/inc/nv-kthread-q-os.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2016-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -43,6 +43,8 @@ struct nv_kthread_q
    atomic_t main_loop_should_exit;

    struct task_struct *q_kthread;
+
+    bool is_unload_flush_ongoing;
 };

 struct nv_kthread_q_item
--- a/kernel-open/common/inc/nv-linux.h
+++ b/kernel-open/common/inc/nv-linux.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2001-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2001-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -58,14 +58,10 @@
 #include <linux/version.h>
 #include <linux/utsname.h>

-#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32)
-#error "This driver does not support kernels older than 2.6.32!"
-#elif LINUX_VERSION_CODE < KERNEL_VERSION(2, 7, 0)
-#  define KERNEL_2_6
-#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 0, 0)
-#  define KERNEL_3
-#else
-#error "This driver does not support development kernels!"
+#if LINUX_VERSION_CODE == KERNEL_VERSION(4, 4, 0)
+// Version 4.4 is allowed, temporarily, although not officially supported.
+#elif LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
+#error "This driver does not support kernels older than Linux 4.15!"
 #endif

 #if defined (CONFIG_SMP) && !defined (__SMP__)
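The new version gate in the hunk above compares packed integers, so its behavior follows from the encoding. The sketch below reproduces the check as a runtime function. It assumes the classic `KERNEL_VERSION(a, b, c)` encoding of `(a << 16) + (b << 8) + c`; the `EXAMPLE_` macro and both function names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION: same packing as the kernel's classic KERNEL_VERSION macro. */
#define EXAMPLE_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Runtime mirror of the preprocessor check: true where no #error fires. */
bool example_kernel_is_accepted(unsigned int code)
{
    /* 4.4.0 is tolerated temporarily, although not officially supported. */
    if (code == EXAMPLE_KERNEL_VERSION(4, 4, 0))
        return true;

    /* Everything else must be 4.15 or newer. */
    return code >= EXAMPLE_KERNEL_VERSION(4, 15, 0);
}

/* Usage: the boundary cases around the gate. */
void example_kernel_gate_usage(void)
{
    assert(example_kernel_is_accepted(EXAMPLE_KERNEL_VERSION(4, 15, 0)));
    assert(!example_kernel_is_accepted(EXAMPLE_KERNEL_VERSION(4, 14, 255)));
    assert(example_kernel_is_accepted(EXAMPLE_KERNEL_VERSION(4, 4, 0)));
    assert(!example_kernel_is_accepted(EXAMPLE_KERNEL_VERSION(4, 4, 1)));
}
```

Under this encoding the equality test matches only a version code with a sublevel of 0, so a code packed from 4.4.1 would still hit the `#error` branch.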
@@ -474,7 +470,9 @@ static inline void *nv_vmalloc(unsigned long size)
    void *ptr = __vmalloc(size, GFP_KERNEL);
 #endif
    if (ptr)
+    {
        NV_MEMDBG_ADD(ptr, size);
+    }
    return ptr;
 }

@@ -492,7 +490,9 @@ static inline void *nv_ioremap(NvU64 phys, NvU64 size)
    void *ptr = ioremap(phys, size);
 #endif
    if (ptr)
+    {
        NV_MEMDBG_ADD(ptr, size);
+    }
    return ptr;
 }

@@ -528,8 +528,9 @@ static inline void *nv_ioremap_cache(NvU64 phys, NvU64 size)
 #endif

    if (ptr)
+    {
        NV_MEMDBG_ADD(ptr, size);
-
+    }
    return ptr;
 }

@@ -545,8 +546,9 @@ static inline void *nv_ioremap_wc(NvU64 phys, NvU64 size)
 #endif

    if (ptr)
+    {
        NV_MEMDBG_ADD(ptr, size);
-
+    }
    return ptr;
 }

@@ -675,7 +677,9 @@ static inline NvUPtr nv_vmap(struct page **pages, NvU32 page_count,
    /* All memory cached in PPC64LE; can't honor 'cached' input. */
    ptr = vmap(pages, page_count, VM_MAP, prot);
    if (ptr)
+    {
        NV_MEMDBG_ADD(ptr, page_count * PAGE_SIZE);
+    }
    return (NvUPtr)ptr;
 }

@@ -720,6 +724,7 @@ static inline dma_addr_t nv_phys_to_dma(struct device *dev, NvU64 pa)
 #endif
 }

+#define NV_GET_OFFSET_IN_PAGE(phys_page) offset_in_page(phys_page)
 #define NV_GET_PAGE_STRUCT(phys_page) virt_to_page(__va(phys_page))
 #define NV_VMA_PGOFF(vma)             ((vma)->vm_pgoff)
 #define NV_VMA_SIZE(vma)              ((vma)->vm_end - (vma)->vm_start)
@@ -836,16 +841,16 @@ static inline dma_addr_t nv_phys_to_dma(struct device *dev, NvU64 pa)
 #define NV_PRINT_AT(nv_debug_level,at)                                           \
    {                                                                            \
        nv_printf(nv_debug_level,                                                \
-            "NVRM: VM: %s:%d: 0x%p, %d page(s), count = %d, flags = 0x%08x, "    \
+            "NVRM: VM: %s:%d: 0x%p, %d page(s), count = %d, "                    \
            "page_table = 0x%p\n",  __FUNCTION__, __LINE__, at,                  \
            at->num_pages, NV_ATOMIC_READ(at->usage_count),                      \
-            at->flags, at->page_table);                                          \
+            at->page_table);                                                     \
    }

 #define NV_PRINT_VMA(nv_debug_level,vma)                                                 \
    {                                                                                    \
        nv_printf(nv_debug_level,                                                        \
-            "NVRM: VM: %s:%d: 0x%lx - 0x%lx, 0x%08x bytes @ 0x%016llx, 0x%p, 0x%p\n",    \
+            "NVRM: VM: %s:%d: 0x%lx - 0x%lx, 0x%08lx bytes @ 0x%016llx, 0x%p, 0x%p\n",    \
            __FUNCTION__, __LINE__, vma->vm_start, vma->vm_end, NV_VMA_SIZE(vma),        \
            NV_VMA_OFFSET(vma), NV_VMA_PRIVATE(vma), NV_VMA_FILE(vma));                  \
    }
@@ -947,14 +952,14 @@ static inline int nv_remap_page_range(struct vm_area_struct *vma,
 }

 static inline int nv_io_remap_page_range(struct vm_area_struct *vma,
-    NvU64 phys_addr, NvU64 size, NvU32 extra_prot)
+    NvU64 phys_addr, NvU64 size, NvU32 extra_prot, NvU64 start)
 {
    int ret = -1;
 #if !defined(NV_XEN_SUPPORT_FULLY_VIRTUALIZED_KERNEL)
-    ret = nv_remap_page_range(vma, vma->vm_start, phys_addr, size,
+    ret = nv_remap_page_range(vma, start, phys_addr, size,
        nv_adjust_pgprot(vma->vm_page_prot, extra_prot));
 #else
-    ret = io_remap_pfn_range(vma, vma->vm_start, (phys_addr >> PAGE_SHIFT),
+    ret = io_remap_pfn_range(vma, start, (phys_addr >> PAGE_SHIFT),
        size, nv_adjust_pgprot(vma->vm_page_prot, extra_prot));
 #endif
    return ret;
@@ -1078,6 +1083,8 @@ static inline void nv_kmem_ctor_dummy(void *arg)
        kmem_cache_destroy(kmem_cache);     \
    }

+#define NV_KMEM_CACHE_ALLOC_ATOMIC(kmem_cache)     \
+    kmem_cache_alloc(kmem_cache, GFP_ATOMIC)
 #define NV_KMEM_CACHE_ALLOC(kmem_cache)     \
    kmem_cache_alloc(kmem_cache, GFP_KERNEL)
 #define NV_KMEM_CACHE_FREE(ptr, kmem_cache) \
@@ -1104,6 +1111,23 @@ static inline void *nv_kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
 #endif
 }

+static inline int nv_kmem_cache_alloc_stack_atomic(nvidia_stack_t **stack)
+{
+    nvidia_stack_t *sp = NULL;
+#if defined(NVCPU_X86_64)
+    if (rm_is_altstack_in_use())
+    {
+        sp = NV_KMEM_CACHE_ALLOC_ATOMIC(nvidia_stack_t_cache);
+        if (sp == NULL)
+            return -ENOMEM;
+        sp->size = sizeof(sp->stack);
+        sp->top = sp->stack + sp->size;
+    }
+#endif
+    *stack = sp;
+    return 0;
+}
+
 static inline int nv_kmem_cache_alloc_stack(nvidia_stack_t **stack)
 {
    nvidia_stack_t *sp = NULL;
@@ -1159,6 +1183,16 @@ typedef struct nvidia_pte_s {
    unsigned int    page_count;
 } nvidia_pte_t;

+#if defined(CONFIG_DMA_SHARED_BUFFER)
+/* Standard dma_buf-related information. */
+struct nv_dma_buf
+{
+    struct dma_buf *dma_buf;
+    struct dma_buf_attachment *dma_attach;
+    struct sg_table *sgt;
+};
+#endif // CONFIG_DMA_SHARED_BUFFER
+
 typedef struct nv_alloc_s {
    struct nv_alloc_s *next;
    struct device     *dev;
@@ -1174,6 +1208,7 @@ typedef struct nv_alloc_s {
        NvBool physical    : 1;
        NvBool unencrypted : 1;
        NvBool coherent    : 1;
+        NvBool carveout    : 1;
    } flags;
    unsigned int   cache_type;
    unsigned int   num_pages;
@@ -1614,6 +1649,10 @@ typedef struct nv_linux_state_s {
    nv_kthread_q_t open_q;
    NvBool is_accepting_opens;
    struct semaphore open_q_lock;
+#if defined(NV_VGPU_KVM_BUILD)
+    wait_queue_head_t wait;
+    NvS32 return_status;
+#endif
 } nv_linux_state_t;

 extern nv_linux_state_t *nv_linux_devices;
@@ -1803,20 +1842,6 @@ static inline int nv_is_control_device(struct inode *inode)
 #endif
 #endif

-static inline NvU64 nv_pci_bus_address(struct pci_dev *dev, NvU8 bar_index)
-{
-    NvU64 bus_addr = 0;
-#if defined(NV_PCI_BUS_ADDRESS_PRESENT)
-    bus_addr = pci_bus_address(dev, bar_index);
-#elif defined(CONFIG_PCI)
-    struct pci_bus_region region;
-
-    pcibios_resource_to_bus(dev, &region, &dev->resource[bar_index]);
-    bus_addr = region.start;
-#endif
-    return bus_addr;
-}
-
 /*
 * Decrements the usage count of the allocation, and moves the allocation to
 * the given nvlfp's free list if the usage count drops to zero.
@@ -1989,31 +2014,6 @@ static inline NvBool nv_platform_use_auto_online(nv_linux_state_t *nvl)
    return nvl->numa_info.use_auto_online;
 }

-typedef struct {
-    NvU64 base;
-    NvU64 size;
-    NvU32 nodeId;
-    int ret;
-} remove_numa_memory_info_t;
-
-static void offline_numa_memory_callback
-(
-    void *args
-)
-{
-#ifdef NV_OFFLINE_AND_REMOVE_MEMORY_PRESENT
-    remove_numa_memory_info_t *pNumaInfo = (remove_numa_memory_info_t *)args;
-#ifdef NV_REMOVE_MEMORY_HAS_NID_ARG
-    pNumaInfo->ret = offline_and_remove_memory(pNumaInfo->nodeId,
-                                               pNumaInfo->base,
-                                               pNumaInfo->size);
-#else
-    pNumaInfo->ret = offline_and_remove_memory(pNumaInfo->base,
-                                               pNumaInfo->size);
-#endif
-#endif
-}
-
 typedef enum
 {
    NV_NUMA_STATUS_DISABLED             = 0,
--- a/kernel-open/common/inc/nv-mm.h
+++ b/kernel-open/common/inc/nv-mm.h
@@ -29,17 +29,17 @@
 typedef int vm_fault_t;
 #endif

-/* pin_user_pages
+/*
+ * pin_user_pages()
+ *
 * Presence of pin_user_pages() also implies the presence of unpin_user_page().
- * Both were added in the v5.6-rc1
+ * Both were added in v5.6.
 *
- * pin_user_pages() was added by commit eddb1c228f7951d399240
- * ("mm/gup: introduce pin_user_pages*() and FOLL_PIN") in v5.6-rc1 (2020-01-30)
- *
- * Removed vmas parameter from pin_user_pages() by commit 40896a02751
- * ("mm/gup: remove vmas parameter from pin_user_pages()")
- * in linux-next, expected in v6.5-rc1 (2023-05-17)
+ * pin_user_pages() was added by commit eddb1c228f79
+ * ("mm/gup: introduce pin_user_pages*() and FOLL_PIN") in v5.6.
 *
+ * Removed vmas parameter from pin_user_pages() by commit 4c630f307455
+ * ("mm/gup: remove vmas parameter from pin_user_pages()") in v6.5.
 */

 #include <linux/mm.h>
@@ -63,25 +63,28 @@ typedef int vm_fault_t;
    #define NV_UNPIN_USER_PAGE put_page
 #endif // NV_PIN_USER_PAGES_PRESENT

-/* get_user_pages
+/*
+ * get_user_pages()
 *
- * The 8-argument version of get_user_pages was deprecated by commit
- * (2016 Feb 12: cde70140fed8429acf7a14e2e2cbd3e329036653)for the non-remote case
+ * The 8-argument version of get_user_pages() was deprecated by commit
+ * cde70140fed8 ("mm/gup: Overload get_user_pages() functions") in v4.6-rc1
 * (calling get_user_pages with current and current->mm).
 *
- * Completely moved to the 6 argument version of get_user_pages -
- * 2016 Apr 4: c12d2da56d0e07d230968ee2305aaa86b93a6832
+ * Completely moved to the 6-argument version of get_user_pages() by
+ * commit c12d2da56d0e ("mm/gup: Remove the macro overload API migration
+ * helpers from the get_user*() APIs") in v4.6-rc4.
 *
- * write and force parameters were replaced with gup_flags by -
- * 2016 Oct 12: 768ae309a96103ed02eb1e111e838c87854d8b51
+ * write and force parameters were replaced with gup_flags by
+ * commit 768ae309a961 ("mm: replace get_user_pages() write/force parameters
+ * with gup_flags") in v4.9.
 *
 * A 7-argument version of get_user_pages was introduced into linux-4.4.y by
- * commit 8e50b8b07f462ab4b91bc1491b1c91bd75e4ad40 which cherry-picked the
- * replacement of the write and force parameters with gup_flags
+ * commit 8e50b8b07f462 ("mm: replace get_user_pages() write/force parameters
+ * with gup_flags") which cherry-picked the replacement of the write and
+ * force parameters with gup_flags.
 *
- * Removed vmas parameter from get_user_pages() by commit 7bbf9c8c99
- * ("mm/gup: remove unused vmas parameter from get_user_pages()")
- * in linux-next, expected in v6.5-rc1 (2023-05-17)
+ * Removed vmas parameter from get_user_pages() by commit 54d020692b34
+ * ("mm/gup: remove unused vmas parameter from get_user_pages()") in v6.5.
 *
 */

@@ -112,18 +115,19 @@ typedef int vm_fault_t;
    }
 #endif // NV_GET_USER_PAGES_HAS_ARGS_FLAGS

-/* pin_user_pages_remote
+/*
+ * pin_user_pages_remote()
 *
- * pin_user_pages_remote() was added by commit eddb1c228f7951d399240
- * ("mm/gup: introduce pin_user_pages*() and FOLL_PIN") in v5.6 (2020-01-30)
+ * pin_user_pages_remote() was added by commit eddb1c228f79
+ * ("mm/gup: introduce pin_user_pages*() and FOLL_PIN") in v5.6.
 *
 * pin_user_pages_remote() removed 'tsk' parameter by commit
- * 64019a2e467a ("mm/gup: remove task_struct pointer for  all gup code")
- * in v5.9-rc1 (2020-08-11). *
+ * 64019a2e467a ("mm/gup: remove task_struct pointer for all gup code")
+ * in v5.9.
 *
 * Removed unused vmas parameter from pin_user_pages_remote() by commit
- * 83bcc2e132("mm/gup: remove unused vmas parameter from pin_user_pages_remote()")
- * in linux-next, expected in v6.5-rc1 (2023-05-14)
+ * 0b295316b3a9 ("mm/gup: remove unused vmas parameter from
+ * pin_user_pages_remote()") in v6.5.
 *
 */

@@ -143,7 +147,7 @@ typedef int vm_fault_t;

 /*
 * get_user_pages_remote() was added by commit 1e9877902dc7
- * ("mm/gup: Introduce get_user_pages_remote()") in v4.6 (2016-02-12).
+ * ("mm/gup: Introduce get_user_pages_remote()") in v4.6.
 *
 * Note that get_user_pages_remote() requires the caller to hold a reference on
 * the task_struct (if non-NULL and if this API has tsk argument) and the mm_struct.
@@ -153,19 +157,17 @@ typedef int vm_fault_t;
 *
 * get_user_pages_remote() write/force parameters were replaced
 * with gup_flags by commit 9beae1ea8930 ("mm: replace get_user_pages_remote()
- * write/force parameters with gup_flags") in v4.9 (2016-10-13).
+ * write/force parameters with gup_flags") in v4.9.
 *
 * get_user_pages_remote() added 'locked' parameter by commit 5b56d49fc31d
- * ("mm: add locked parameter to get_user_pages_remote()") in
- * v4.10 (2016-12-14).
+ * ("mm: add locked parameter to get_user_pages_remote()") in v4.10.
 *
 * get_user_pages_remote() removed 'tsk' parameter by
 * commit 64019a2e467a ("mm/gup: remove task_struct pointer for
- * all gup code") in v5.9-rc1 (2020-08-11).
+ * all gup code") in v5.9.
 *
- * Removed vmas parameter from get_user_pages_remote() by commit a4bde14d549 
- * ("mm/gup: remove vmas parameter from get_user_pages_remote()")
- * in linux-next, expected in v6.5-rc1 (2023-05-14)
+ * Removed vmas parameter from get_user_pages_remote() by commit ca5e863233e8
+ * ("mm/gup: remove vmas parameter from get_user_pages_remote()") in v6.5.
 *
 */

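Reviewer note on the nv-mm.h comments above: they record how the get_user_pages() family lost arguments over successive kernel releases. A minimal userspace sketch of how a compatibility macro can absorb one such change at compile time is given here. All names in it (`fake_get_user_pages`, `FAKE_GUP_HAS_VMAS_ARG`, `COMPAT_GET_USER_PAGES`, `pin_two_pages`) are hypothetical stand-ins chosen for illustration; they are not the driver's macros and not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical feature macro. A real build would derive something like it
 * from a configure-time test against the kernel headers. Leave it undefined
 * to model the newer signature without the trailing vmas argument.
 */
/* #define FAKE_GUP_HAS_VMAS_ARG 1 */

#if defined(FAKE_GUP_HAS_VMAS_ARG)
/* Stand-in for the older signature, which still takes a vmas pointer. */
static long fake_get_user_pages(unsigned long start, unsigned long nr_pages,
                                unsigned int gup_flags, void **pages,
                                void **vmas)
{
    (void)start; (void)gup_flags; (void)pages; (void)vmas;
    return (long)nr_pages;   /* pretend every requested page was pinned */
}
/* Callers never see vmas; the wrapper passes NULL on their behalf. */
#define COMPAT_GET_USER_PAGES(start, nr, flags, pages) \
    fake_get_user_pages(start, nr, flags, pages, NULL)
#else
/* Stand-in for the newer signature, with vmas removed. */
static long fake_get_user_pages(unsigned long start, unsigned long nr_pages,
                                unsigned int gup_flags, void **pages)
{
    (void)start; (void)gup_flags; (void)pages;
    return (long)nr_pages;   /* pretend every requested page was pinned */
}
#define COMPAT_GET_USER_PAGES(start, nr, flags, pages) \
    fake_get_user_pages(start, nr, flags, pages)
#endif

/* Call site written once, independent of which signature is available. */
static long pin_two_pages(void **pages)
{
    assert(pages != NULL);
    return COMPAT_GET_USER_PAGES(0x1000UL, 2UL, 0U, pages);
}
```

The call site compiles unchanged whichever branch is selected, which is the property the version notes in the comments exist to support.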
--- a/kernel-open/common/inc/nv-proto.h
+++ b/kernel-open/common/inc/nv-proto.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 1999-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -59,6 +59,8 @@ NV_STATUS   nv_uvm_resume               (void);
 void        nv_uvm_notify_start_device  (const NvU8 *uuid);
 void        nv_uvm_notify_stop_device   (const NvU8 *uuid);
 NV_STATUS   nv_uvm_event_interrupt      (const NvU8 *uuid);
+NV_STATUS   nv_uvm_drain_P2P            (const NvU8 *uuid);
+NV_STATUS   nv_uvm_resume_P2P           (const NvU8 *uuid);

 /* Move these to nv.h once implemented by other UNIX platforms */
 NvBool      nvidia_get_gpuid_list       (NvU32 *gpu_ids, NvU32 *gpu_count);
--- a/kernel-open/common/inc/nv.h
+++ b/kernel-open/common/inc/nv.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 1999-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -44,6 +44,7 @@
 #include <nv-ioctl.h>
 #include <nv-ioctl-numa.h>
 #include <nvmisc.h>
+#include <os/nv_memory_area.h>

 extern nv_cap_t *nvidia_caps_root;

@@ -110,15 +111,15 @@ typedef enum _TEGRASOC_WHICH_CLK
    TEGRASOC_WHICH_CLK_DSIPLL_CLKOUTPN,
    TEGRASOC_WHICH_CLK_DSIPLL_CLKOUTA,
    TEGRASOC_WHICH_CLK_SPPLL0_VCO,
-    TEGRASOC_WHICH_CLK_SPPLL0_CLKOUTPN,
    TEGRASOC_WHICH_CLK_SPPLL0_CLKOUTA,
    TEGRASOC_WHICH_CLK_SPPLL0_CLKOUTB,
+    TEGRASOC_WHICH_CLK_SPPLL0_CLKOUTPN,
+    TEGRASOC_WHICH_CLK_SPPLL1_CLKOUTPN,
+    TEGRASOC_WHICH_CLK_SPPLL0_DIV27,
+    TEGRASOC_WHICH_CLK_SPPLL1_DIV27,
    TEGRASOC_WHICH_CLK_SPPLL0_DIV10,
    TEGRASOC_WHICH_CLK_SPPLL0_DIV25,
-    TEGRASOC_WHICH_CLK_SPPLL0_DIV27,
    TEGRASOC_WHICH_CLK_SPPLL1_VCO,
-    TEGRASOC_WHICH_CLK_SPPLL1_CLKOUTPN,
-    TEGRASOC_WHICH_CLK_SPPLL1_DIV27,
    TEGRASOC_WHICH_CLK_VPLL0_REF,
    TEGRASOC_WHICH_CLK_VPLL0,
    TEGRASOC_WHICH_CLK_VPLL1,
@@ -132,7 +133,7 @@ typedef enum _TEGRASOC_WHICH_CLK
    TEGRASOC_WHICH_CLK_DSI_PIXEL,
    TEGRASOC_WHICH_CLK_PRE_SOR0,
    TEGRASOC_WHICH_CLK_PRE_SOR1,
-    TEGRASOC_WHICH_CLK_DP_LINK_REF,
+    TEGRASOC_WHICH_CLK_DP_LINKA_REF,
    TEGRASOC_WHICH_CLK_SOR_LINKA_INPUT,
    TEGRASOC_WHICH_CLK_SOR_LINKA_AFIFO,
    TEGRASOC_WHICH_CLK_SOR_LINKA_AFIFO_M,
@@ -143,7 +144,7 @@ typedef enum _TEGRASOC_WHICH_CLK
    TEGRASOC_WHICH_CLK_PLLHUB,
    TEGRASOC_WHICH_CLK_SOR0,
    TEGRASOC_WHICH_CLK_SOR1,
-    TEGRASOC_WHICH_CLK_SOR_PAD_INPUT,
+    TEGRASOC_WHICH_CLK_SOR_PADA_INPUT,
    TEGRASOC_WHICH_CLK_PRE_SF0,
    TEGRASOC_WHICH_CLK_SF0,
    TEGRASOC_WHICH_CLK_SF1,
@@ -279,8 +280,7 @@ typedef struct nv_usermap_access_params_s
    NvU64    offset;
    NvU64   *page_array;
    NvU64    num_pages;
-    NvU64    mmap_start;
-    NvU64    mmap_size;
+    MemoryArea memArea;
    NvU64    access_start;
    NvU64    access_size;
    NvU64    remap_prot_extra;
@@ -296,8 +296,7 @@ typedef struct nv_alloc_mapping_context_s {
    NvU64  page_index;
    NvU64 *page_array;
    NvU64  num_pages;
-    NvU64  mmap_start;
-    NvU64  mmap_size;
+    MemoryArea memArea;
    NvU64  access_start;
    NvU64  access_size;
    NvU64  remap_prot_extra;
@@ -330,9 +329,11 @@ typedef struct nv_soc_irq_info_s {
    NvS32 ref_count;
 } nv_soc_irq_info_t;

-#define NV_MAX_SOC_IRQS              6
+#define NV_MAX_SOC_IRQS              10
 #define NV_MAX_DPAUX_NUM_DEVICES     4
-#define NV_MAX_SOC_DPAUX_NUM_DEVICES 2 // From SOC_DEV_MAPPING
+
+#define NV_MAX_SOC_DPAUX_NUM_DEVICES 2
+

 #define NV_IGPU_LEGACY_STALL_IRQ     70
 #define NV_IGPU_MAX_STALL_IRQS       3
@@ -495,12 +496,6 @@ typedef struct nv_state_t
    } iommus;
 } nv_state_t;

-// These define need to be in sync with defines in system.h
-#define OS_TYPE_LINUX   0x1
-#define OS_TYPE_FREEBSD 0x2
-#define OS_TYPE_SUNOS   0x3
-#define OS_TYPE_VMWARE  0x4
-
 #define NVFP_TYPE_NONE       0x0
 #define NVFP_TYPE_REFCOUNTED 0x1
 #define NVFP_TYPE_REGISTERED 0x2
@@ -539,6 +534,7 @@ typedef struct UvmGpuAddressSpaceInfo_tag           *nvgpuAddressSpaceInfo_t;
 typedef struct UvmGpuAllocInfo_tag                  *nvgpuAllocInfo_t;
 typedef struct UvmGpuP2PCapsParams_tag              *nvgpuP2PCapsParams_t;
 typedef struct UvmGpuFbInfo_tag                     *nvgpuFbInfo_t;
+typedef struct UvmGpuNvlinkInfo_tag                 *nvgpuNvlinkInfo_t;
 typedef struct UvmGpuEccInfo_tag                    *nvgpuEccInfo_t;
 typedef struct UvmGpuFaultInfo_tag                  *nvgpuFaultInfo_t;
 typedef struct UvmGpuAccessCntrInfo_tag             *nvgpuAccessCntrInfo_t;
@@ -549,6 +545,7 @@ typedef struct UvmPmaAllocationOptions_tag          *nvgpuPmaAllocationOptions_t
 typedef struct UvmPmaStatistics_tag                 *nvgpuPmaStatistics_t;
 typedef struct UvmGpuMemoryInfo_tag                 *nvgpuMemoryInfo_t;
 typedef struct UvmGpuExternalMappingInfo_tag        *nvgpuExternalMappingInfo_t;
+typedef struct UvmGpuExternalPhysAddrInfo_tag       *nvgpuExternalPhysAddrInfo_t;
 typedef struct UvmGpuChannelResourceInfo_tag        *nvgpuChannelResourceInfo_t;
 typedef struct UvmGpuChannelInstanceInfo_tag        *nvgpuChannelInstanceInfo_t;
 typedef struct UvmGpuChannelResourceBindParams_tag  *nvgpuChannelResourceBindParams_t;
@@ -609,6 +606,15 @@ typedef enum
    NV_POWER_STATE_RUNNING
 } nv_power_state_t;

+typedef struct
+{
+    const char *vidmem_power_status;
+    const char *dynamic_power_status;
+    const char *gc6_support;
+    const char *gcoff_support;
+    const char *s0ix_status;
+} nv_power_info_t;
+
 #define NV_PRIMARY_VGA(nv)      ((nv)->primary_vga)

 #define NV_IS_CTL_DEVICE(nv)    ((nv)->flags & NV_FLAG_CONTROL)
@@ -778,7 +784,7 @@ nv_state_t*  NV_API_CALL  nv_get_ctl_state       (void);

 void   NV_API_CALL  nv_set_dma_address_size      (nv_state_t *, NvU32 );

-NV_STATUS  NV_API_CALL  nv_alias_pages           (nv_state_t *, NvU32, NvU32, NvU32, NvU64, NvU64 *, void **);
+NV_STATUS  NV_API_CALL  nv_alias_pages           (nv_state_t *, NvU32, NvU64, NvU32, NvU32, NvU64, NvU64 *, NvBool, void **);
 NV_STATUS  NV_API_CALL  nv_alloc_pages           (nv_state_t *, NvU32, NvU64, NvBool, NvU32, NvBool, NvBool, NvS32, NvU64 *, void **);
 NV_STATUS  NV_API_CALL  nv_free_pages            (nv_state_t *, NvU32, NvBool, NvU32, void *);

@@ -822,6 +828,7 @@ void   NV_API_CALL  nv_acpi_methods_init         (NvU32 *);
 void   NV_API_CALL  nv_acpi_methods_uninit       (void);

 NV_STATUS  NV_API_CALL  nv_acpi_method           (NvU32, NvU32, NvU32, void *, NvU16, NvU32 *, void *, NvU16 *);
+NV_STATUS  NV_API_CALL  nv_acpi_d3cold_dsm_for_upstream_port (nv_state_t *, NvU8 *, NvU32, NvU32, NvU32 *);
 NV_STATUS  NV_API_CALL  nv_acpi_dsm_method       (nv_state_t *, NvU8 *, NvU32, NvBool, NvU32, void *, NvU16, NvU32 *, void *, NvU16 *);
 NV_STATUS  NV_API_CALL  nv_acpi_ddc_method       (nv_state_t *, void *, NvU32 *, NvBool);
 NV_STATUS  NV_API_CALL  nv_acpi_dod_method       (nv_state_t *, NvU32 *, NvU32 *);
@@ -883,8 +890,6 @@ void      NV_API_CALL nv_cap_drv_exit(void);
 NvBool    NV_API_CALL nv_is_gpu_accessible(nv_state_t *);
 NvBool    NV_API_CALL nv_match_gpu_os_info(nv_state_t *, void *);

-NvU32     NV_API_CALL nv_get_os_type(void);
-
 void      NV_API_CALL nv_get_updated_emu_seg(NvU32 *start, NvU32 *end);
 void      NV_API_CALL nv_get_screen_info(nv_state_t *, NvU64 *, NvU32 *, NvU32 *, NvU32 *, NvU32 *, NvU64 *);

@@ -900,6 +905,9 @@ void      NV_API_CALL nv_dma_release_dma_buf     (nv_dma_buf_t *);

 void      NV_API_CALL nv_schedule_uvm_isr        (nv_state_t *);

+NV_STATUS NV_API_CALL nv_schedule_uvm_drain_p2p  (NvU8 *);
+void      NV_API_CALL nv_schedule_uvm_resume_p2p (NvU8 *);
+
 NvBool    NV_API_CALL nv_platform_supports_s0ix  (void);
 NvBool    NV_API_CALL nv_s2idle_pm_configured    (void);

@@ -990,15 +998,15 @@ NV_STATUS  NV_API_CALL  rm_p2p_init_mapping       (nvidia_stack_t *, NvU64, NvU6
 NV_STATUS  NV_API_CALL  rm_p2p_destroy_mapping    (nvidia_stack_t *, NvU64);
 NV_STATUS  NV_API_CALL  rm_p2p_get_pages          (nvidia_stack_t *, NvU64, NvU32, NvU64, NvU64, NvU64 *, NvU32 *, NvU32 *, NvU32 *, NvU8 **, void *);
 NV_STATUS  NV_API_CALL  rm_p2p_get_gpu_info       (nvidia_stack_t *, NvU64, NvU64, NvU8 **, void **);
-NV_STATUS  NV_API_CALL  rm_p2p_get_pages_persistent (nvidia_stack_t *,  NvU64, NvU64, void **, NvU64 *, NvU32 *, void *, void *);
+NV_STATUS  NV_API_CALL  rm_p2p_get_pages_persistent (nvidia_stack_t *,  NvU64, NvU64, void **, NvU64 *, NvU32 *, void *, void *, void **);
 NV_STATUS  NV_API_CALL  rm_p2p_register_callback  (nvidia_stack_t *, NvU64, NvU64, NvU64, void *, void (*)(void *), void *);
 NV_STATUS  NV_API_CALL  rm_p2p_put_pages          (nvidia_stack_t *, NvU64, NvU32, NvU64, void *);
-NV_STATUS  NV_API_CALL  rm_p2p_put_pages_persistent(nvidia_stack_t *, void *, void *);
+NV_STATUS  NV_API_CALL  rm_p2p_put_pages_persistent(nvidia_stack_t *, void *, void *, void *);
 NV_STATUS  NV_API_CALL  rm_p2p_dma_map_pages      (nvidia_stack_t *, nv_dma_device_t *, NvU8 *, NvU64, NvU32, NvU64 *, void **);
 NV_STATUS  NV_API_CALL  rm_dma_buf_dup_mem_handle (nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle, NvHandle, void *, NvHandle, NvU64, NvU64, NvHandle *, void **);
 void       NV_API_CALL  rm_dma_buf_undup_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle);
-NV_STATUS  NV_API_CALL  rm_dma_buf_map_mem_handle (nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvU64, NvU64, void *, nv_phys_addr_range_t **, NvU32 *);
-void       NV_API_CALL  rm_dma_buf_unmap_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvU64, nv_phys_addr_range_t **, NvU32);
+NV_STATUS  NV_API_CALL  rm_dma_buf_map_mem_handle (nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, MemoryRange, void *, NvBool, MemoryArea *);
+void       NV_API_CALL  rm_dma_buf_unmap_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, void *, NvBool, MemoryArea);
 NV_STATUS  NV_API_CALL  rm_dma_buf_get_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle *, NvHandle *, NvHandle *, void **, NvBool *);
 void       NV_API_CALL  rm_dma_buf_put_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle, void *);
 NV_STATUS  NV_API_CALL  rm_log_gpu_crash          (nv_stack_t *, nv_state_t *);
@@ -1027,9 +1035,7 @@ void       NV_API_CALL rm_enable_dynamic_power_management(nvidia_stack_t *, nv_s
 NV_STATUS  NV_API_CALL rm_ref_dynamic_power(nvidia_stack_t *, nv_state_t *, nv_dynamic_power_mode_t);
 void       NV_API_CALL rm_unref_dynamic_power(nvidia_stack_t *, nv_state_t *, nv_dynamic_power_mode_t);
 NV_STATUS  NV_API_CALL rm_transition_dynamic_power(nvidia_stack_t *, nv_state_t *, NvBool, NvBool *);
-const char* NV_API_CALL rm_get_vidmem_power_status(nvidia_stack_t *, nv_state_t *);
-const char* NV_API_CALL rm_get_dynamic_power_management_status(nvidia_stack_t *, nv_state_t *);
-const char* NV_API_CALL rm_get_gpu_gcx_support(nvidia_stack_t *, nv_state_t *, NvBool);
+void       NV_API_CALL rm_get_power_info(nvidia_stack_t *, nv_state_t *, nv_power_info_t *);

 void       NV_API_CALL rm_acpi_notify(nvidia_stack_t *, nv_state_t *, NvU32);
 void       NV_API_CALL rm_acpi_nvpcf_notify(nvidia_stack_t *);
@@ -1041,13 +1047,12 @@ NV_STATUS  NV_API_CALL  nv_vgpu_create_request(nvidia_stack_t *, nv_state_t *, c
 NV_STATUS  NV_API_CALL  nv_vgpu_delete(nvidia_stack_t *, const NvU8 *, NvU16);
 NV_STATUS  NV_API_CALL  nv_vgpu_get_type_ids(nvidia_stack_t *, nv_state_t *, NvU32 *, NvU32 *, NvBool, NvU8, NvBool);
 NV_STATUS  NV_API_CALL  nv_vgpu_get_type_info(nvidia_stack_t *, nv_state_t *, NvU32, char *, int, NvU8);
-NV_STATUS  NV_API_CALL  nv_vgpu_get_bar_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU32, void *, NvBool *);
+NV_STATUS  NV_API_CALL  nv_vgpu_get_bar_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *,
+                                             NvU64 *, NvU64 *, NvU32 *, NvBool *, NvU8 *);
 NV_STATUS  NV_API_CALL  nv_vgpu_get_hbm_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU64 *);
-NV_STATUS  NV_API_CALL  nv_vgpu_start(nvidia_stack_t *, const NvU8 *, void *, NvS32 *, NvU8 *, NvU32);
-NV_STATUS  NV_API_CALL  nv_vgpu_get_sparse_mmap(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 **, NvU64 **, NvU32 *);
 NV_STATUS  NV_API_CALL  nv_vgpu_process_vf_info(nvidia_stack_t *, nv_state_t *, NvU8, NvU32, NvU8, NvU8, NvU8, NvBool, void *);
-NV_STATUS  NV_API_CALL  nv_vgpu_update_request(nvidia_stack_t *, const NvU8 *, NvU32, NvU64 *, NvU64 *, const char *);
-NV_STATUS  NV_API_CALL  nv_gpu_bind_event(nvidia_stack_t *);
+NV_STATUS  NV_API_CALL  nv_gpu_bind_event(nvidia_stack_t *, NvU32, NvBool *);
+NV_STATUS  NV_API_CALL  nv_gpu_unbind_event(nvidia_stack_t *, NvU32, NvBool *);

 NV_STATUS NV_API_CALL nv_get_usermap_access_params(nv_state_t*, nv_usermap_access_params_t*);
 nv_soc_irq_type_t NV_API_CALL nv_get_current_irq_type(nv_state_t*);
@@ -1078,6 +1083,9 @@ NV_STATUS   NV_API_CALL rm_run_nano_timer_callback(nvidia_stack_t *, nv_state_t
 void        NV_API_CALL nv_cancel_nano_timer(nv_state_t *, nv_nano_timer_t *);
 void        NV_API_CALL nv_destroy_nano_timer(nv_state_t *nv, nv_nano_timer_t *);

+// Host1x specific functions.
+NV_STATUS NV_API_CALL nv_get_syncpoint_aperture(NvU32, NvU64 *, NvU64 *, NvU32 *);
+
 #if defined(NVCPU_X86_64)

 static inline NvU64 nv_rdtsc(void)
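Reviewer note on the nv.h changes above: three string getters (rm_get_vidmem_power_status, rm_get_dynamic_power_management_status, rm_get_gpu_gcx_support) are replaced by a single rm_get_power_info() call that fills an nv_power_info_t. A minimal userspace sketch of that calling pattern follows. `power_info_sketch_t` mirrors the fields added in this diff, while `fake_get_power_info`, `format_power_info`, and every status string are hypothetical placeholders, not values the driver is known to return.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of the nv_power_info_t fields introduced in this diff. */
typedef struct
{
    const char *vidmem_power_status;
    const char *dynamic_power_status;
    const char *gc6_support;
    const char *gcoff_support;
    const char *s0ix_status;
} power_info_sketch_t;

/*
 * Hypothetical stand-in for rm_get_power_info(): one call fills every field.
 * The strings are placeholders chosen for this example only.
 */
static void fake_get_power_info(power_info_sketch_t *info)
{
    info->vidmem_power_status  = "Active";
    info->dynamic_power_status = "Disabled";
    info->gc6_support          = "Supported";
    info->gcoff_support        = "Supported";
    info->s0ix_status          = "Disabled";
}

/* Query once, then format all fields from the same snapshot. */
static int format_power_info(char *buf, size_t len)
{
    power_info_sketch_t info;

    assert(buf != NULL);
    memset(&info, 0, sizeof(info));
    fake_get_power_info(&info);

    return snprintf(buf, len,
                    "vidmem=%s dynamic=%s gc6=%s gcoff=%s s0ix=%s",
                    info.vidmem_power_status, info.dynamic_power_status,
                    info.gc6_support, info.gcoff_support, info.s0ix_status);
}
```

One plausible benefit of the aggregated form is that a reader of the status gets all fields from a single call rather than from several independent queries; the diff itself does not state the motivation.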
--- a/kernel-open/common/inc/nv_uvm_interface.h
+++ b/kernel-open/common/inc/nv_uvm_interface.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2013-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2013-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -1085,6 +1085,22 @@ NV_STATUS nvUvmInterfaceRegisterUvmCallbacks(struct UvmOpsUvmEvents *importedUvm
 //
 void nvUvmInterfaceDeRegisterUvmOps(void);

+/*******************************************************************************
+    nvUvmInterfaceGetNvlinkInfo
+
+    Gets NVLINK information from RM.
+
+    Arguments:
+        device[IN]      - GPU device handle
+        nvlinkInfo[OUT] - Pointer to UvmGpuNvlinkInfo structure
+
+    Error codes:
+      NV_ERROR
+      NV_ERR_INVALID_ARGUMENT
+*/
+NV_STATUS nvUvmInterfaceGetNvlinkInfo(uvmGpuDeviceHandle device,
+                                      UvmGpuNvlinkInfo *nvlinkInfo);
+
 /*******************************************************************************
    nvUvmInterfaceP2pObjectCreate

@@ -1161,6 +1177,48 @@ NV_STATUS nvUvmInterfaceGetExternalAllocPtes(uvmGpuAddressSpaceHandle vaSpace,
                                             NvU64 size,
                                             UvmGpuExternalMappingInfo *gpuExternalMappingInfo);

+/*******************************************************************************
+    nvUvmInterfaceGetExternalAllocPhysAddrs
+
+    The interface builds the RM physical addrs using the provided input parameters.
+
+    Arguments:
+        vaSpace[IN]                     -  vaSpace handle.
+        hMemory[IN]                     -  Memory handle.
+        offset [IN]                     -  Offset from the beginning of the allocation
+                                           where PTE mappings should begin.
+                                           Should be aligned with mappingPagesize
+                                           in gpuExternalMappingInfo associated
+                                           with the allocation.
+        size [IN]                       -  Length of the allocation for which PhysAddrs
+                                           should be built.
+                                           Should be aligned with mappingPagesize
+                                           in gpuExternalMappingInfo associated
+                                           with the allocation.
+                                           size = 0 will be interpreted as the total size
+                                           of the allocation.
+        gpuExternalPhysAddrsInfo[IN/OUT] - See nv_uvm_types.h for more information.
+
+   Error codes:
+        NV_ERR_INVALID_ARGUMENT         - One or more invalid parameters are passed.
+        NV_ERR_INVALID_OBJECT_HANDLE    - Invalid memory handle is passed.
+        NV_ERR_NOT_SUPPORTED            - Functionality is not supported (see comments in nv_gpu_ops.c)
+        NV_ERR_INVALID_BASE             - offset is beyond the allocation size
+        NV_ERR_INVALID_LIMIT            - (offset + size) is beyond the allocation size.
+        NV_ERR_BUFFER_TOO_SMALL         - gpuExternalPhysAddrsInfo.physAddrBufferSize is insufficient to
+                                          store a single physAddr.
+        NV_ERR_NOT_READY                - Returned when querying the physAddrs requires a deferred setup
+                                          which has not yet completed. It is expected that the caller
+                                          will reattempt the call until a different code is returned.
+                                          As an example, multi-node systems which require querying
+                                          physAddrs from the Fabric Manager may return this code.
+*/
+NV_STATUS nvUvmInterfaceGetExternalAllocPhysAddrs(uvmGpuAddressSpaceHandle vaSpace,
+                                                  NvHandle hMemory,
+                                                  NvU64 offset,
+                                                  NvU64 size,
+                                                  UvmGpuExternalPhysAddrInfo *gpuExternalPhysAddrsInfo);
+
 /*******************************************************************************
    nvUvmInterfaceRetainChannel

@@ -1462,6 +1520,16 @@ NV_STATUS nvUvmInterfacePagingChannelPushStream(UvmGpuPagingChannelHandle channe
                                                char *methodStream,
                                                NvU32 methodStreamSize);

+/*******************************************************************************
+    nvUvmInterfaceReportFatalError
+
+    Reports a global fatal error so RM can inform the clients that a node reboot
+    is necessary to recover from this error. This function can be called from
+    any lock environment, bottom half or non-interrupt context.
+
+*/
+void nvUvmInterfaceReportFatalError(NV_STATUS error);
+
 /*******************************************************************************
    Cryptography Services Library (CSL) Interface
 */
@@ -1505,23 +1573,35 @@ NV_STATUS nvUvmInterfaceCslInitContext(UvmCslContext *uvmCslContext,
 void nvUvmInterfaceDeinitCslContext(UvmCslContext *uvmCslContext);

 /*******************************************************************************
-    nvUvmInterfaceCslUpdateContext
+    nvUvmInterfaceCslRotateKey

-    Updates a context after a key rotation event and can only be called once per
-    key rotation event. Following a key rotation event, and before
-    nvUvmInterfaceCslUpdateContext is called, data encrypted by the GPU with the
-    previous key can be decrypted with nvUvmInterfaceCslDecrypt.
+    Disables channels and rotates keys.

-    Locking: This function acquires an API lock.
-    Memory : This function does not dynamically allocate memory.
+    This function disables channels and rotates associated keys. The channels
+    associated with the given CSL contexts must be idled before this function is
+    called. To trigger key rotation all allocated channels for a given key must
+    be present in the list. If the function returns successfully then the CSL
+    contexts have been updated with the new key.
+
+    Locking: This function attempts to acquire the GPU lock. If the lock cannot
+             be acquired, the return code is NV_ERR_STATE_IN_USE. The caller must
+             guarantee that no CSL function, including this one, is invoked
+             concurrently with the CSL contexts in contextList.
+    Memory : This function dynamically allocates memory.

    Arguments:
-        uvmCslContext[IN] - The CSL context associated with a channel.
-
+        contextList[IN/OUT]  - An array of pointers to CSL contexts.
+        contextListCount[IN] - Number of CSL contexts in contextList. Its value
+                               must be greater than 0.
    Error codes:
-        NV_ERR_INVALID_ARGUMENT - The CSL context is not associated with a channel.
+        NV_ERR_INVALID_ARGUMENT - contextList is NULL or contextListCount is 0.
+        NV_ERR_STATE_IN_USE     - Unable to acquire lock / resource. Caller
+                                  can retry at a later time.
+        NV_ERR_GENERIC          - A failure other than _STATE_IN_USE occurred
+                                  when attempting to acquire a lock.
 */
-NV_STATUS nvUvmInterfaceCslUpdateContext(UvmCslContext *uvmCslContext);
+NV_STATUS nvUvmInterfaceCslRotateKey(UvmCslContext *contextList[],
+                                     NvU32 contextListCount);

 /*******************************************************************************
    nvUvmInterfaceCslRotateIv
@@ -1529,17 +1609,13 @@ NV_STATUS nvUvmInterfaceCslUpdateContext(UvmCslContext *uvmCslContext);
    Rotates the IV for a given channel and operation.

    This function will rotate the IV on both the CPU and the GPU.
-    Outstanding messages that have been encrypted by the GPU should first be
-    decrypted before calling this function with operation equal to
-    UVM_CSL_OPERATION_DECRYPT. Similarly, outstanding messages that have been
-    encrypted by the CPU should first be decrypted before calling this function
-    with operation equal to UVM_CSL_OPERATION_ENCRYPT. For a given operation
-    the channel must be idle before calling this function. This function can be
-    called regardless of the value of the IV's message counter.
+    For a given operation the channel must be idle before calling this function.
+    This function can be called regardless of the value of the IV's message counter.

-    Locking: This function attempts to acquire the GPU lock.
-             In case of failure to acquire the return code
-             is NV_ERR_STATE_IN_USE.
+    Locking: This function attempts to acquire the GPU lock. If the lock cannot be
+             acquired, the return code is NV_ERR_STATE_IN_USE. The caller must guarantee
+             that no CSL function, including this one, is invoked concurrently with
+             the same CSL context.
    Memory : This function does not dynamically allocate memory.

 Arguments:
@@ -1573,8 +1649,8 @@ NV_STATUS nvUvmInterfaceCslRotateIv(UvmCslContext *uvmCslContext,
    However, it is optional. If it is NULL, the next IV in line will be used.

    Locking: This function does not acquire an API or GPU lock.
-             If called concurrently in different threads with the same UvmCslContext
-             the caller must guarantee exclusion.
+             The caller must guarantee that no CSL function, including this one,
+             is invoked concurrently with the same CSL context.
    Memory : This function does not dynamically allocate memory.

 Arguments:
@@ -1610,9 +1686,14 @@ NV_STATUS nvUvmInterfaceCslEncrypt(UvmCslContext *uvmCslContext,
    maximized when the input and output buffers are 16-byte aligned. This is
    natural alignment for AES block.

+    During a key rotation event the previous key is stored in the CSL context.
+    This allows data encrypted by the GPU to be decrypted with the previous key.
+    The keyRotationId parameter identifies which key is used. Key rotation IDs
+    start at 0 and increment by one for each key rotation event.
+
    Locking: This function does not acquire an API or GPU lock.
-             If called concurrently in different threads with the same UvmCslContext
-             the caller must guarantee exclusion.
+             The caller must guarantee that no CSL function, including this one,
+             is invoked concurrently with the same CSL context.
    Memory : This function does not dynamically allocate memory.

    Arguments:
@@ -1622,6 +1703,8 @@ NV_STATUS nvUvmInterfaceCslEncrypt(UvmCslContext *uvmCslContext,
        decryptIv[IN]         - IV used to decrypt the ciphertext. Its value can either be given by
                                nvUvmInterfaceCslIncrementIv, or, if NULL, the CSL context's
                                internal counter is used.
+        keyRotationId[IN]     - Specifies the key that is used for decryption.
+                                A value of NV_U32_MAX specifies the current key.
        inputBuffer[IN]       - Address of ciphertext input buffer.
        outputBuffer[OUT]     - Address of plaintext output buffer.
        addAuthData[IN]       - Address of the plaintext additional authenticated data used to
@@ -1642,6 +1725,7 @@ NV_STATUS nvUvmInterfaceCslDecrypt(UvmCslContext *uvmCslContext,
                                   NvU32 bufferSize,
                                   NvU8 const *inputBuffer,
                                   UvmCslIv const *decryptIv,
+                                   NvU32 keyRotationId,
                                   NvU8 *outputBuffer,
                                   NvU8 const *addAuthData,
                                   NvU32 addAuthDataSize,
@@ -1656,8 +1740,8 @@ NV_STATUS nvUvmInterfaceCslDecrypt(UvmCslContext *uvmCslContext,
    undefined behavior.

    Locking: This function does not acquire an API or GPU lock.
-             If called concurrently in different threads with the same UvmCslContext
-             the caller must guarantee exclusion.
+             The caller must guarantee that no CSL function, including this one,
+             is invoked concurrently with the same CSL context.
    Memory : This function does not dynamically allocate memory.

    Arguments:
@@ -1685,8 +1769,8 @@ NV_STATUS nvUvmInterfaceCslSign(UvmCslContext *uvmCslContext,

    Locking: This function does not acquire an API or GPU lock.
    Memory : This function does not dynamically allocate memory.
-             If called concurrently in different threads with the same UvmCslContext
-             the caller must guarantee exclusion.
+             The caller must guarantee that no CSL function, including this one,
+             is invoked concurrently with the same CSL context.

    Arguments:
        uvmCslContext[IN/OUT] - The CSL context.
@@ -1711,8 +1795,8 @@ NV_STATUS nvUvmInterfaceCslQueryMessagePool(UvmCslContext *uvmCslContext,
    the returned IV can be used in nvUvmInterfaceCslDecrypt.

    Locking: This function does not acquire an API or GPU lock.
-             If called concurrently in different threads with the same UvmCslContext
-             the caller must guarantee exclusion.
+             The caller must guarantee that no CSL function, including this one,
+             is invoked concurrently with the same CSL context.
    Memory : This function does not dynamically allocate memory.

 Arguments:
@@ -1734,28 +1818,41 @@ NV_STATUS nvUvmInterfaceCslIncrementIv(UvmCslContext *uvmCslContext,
                                       UvmCslIv *iv);

 /*******************************************************************************
-    nvUvmInterfaceCslLogExternalEncryption
+    nvUvmInterfaceCslLogEncryption

-    Checks and logs information about non-CSL encryptions, such as those that
-    originate from the GPU.
+    Checks and logs information about encryptions associated with the given
+    CSL context.

-    This function does not modify elements of the UvmCslContext.
+    For contexts associated with channels, this function does not modify elements of
+    the UvmCslContext, and must be called for every CPU/GPU encryption.
+
+    For the context associated with fault buffers, bufferSize can encompass multiple
+    encryption invocations, and the UvmCslContext will be updated following a key
+    rotation event.
+
+    In either case the IV remains unmodified after this function is called.

    Locking: This function does not acquire an API or GPU lock.
    Memory : This function does not dynamically allocate memory.
-             If called concurrently in different threads with the same UvmCslContext
-             the caller must guarantee exclusion.
+             The caller must guarantee that no CSL function, including this one,
+             is invoked concurrently with the same CSL context.

    Arguments:
        uvmCslContext[IN/OUT] - The CSL context.
-        bufferSize[OUT]       - The size of the buffer encrypted by the
+        operation[IN]         - If the CSL context is associated with a fault
+                                buffer, this argument is ignored. If it is
+                                associated with a channel, it must be either
+                                - UVM_CSL_OPERATION_ENCRYPT
+                                - UVM_CSL_OPERATION_DECRYPT
+        bufferSize[IN]        - The size of the buffer(s) encrypted by the
                                external entity in units of bytes.

    Error codes:
-      NV_ERR_INSUFFICIENT_RESOURCES - The device encryption would cause a counter
+      NV_ERR_INSUFFICIENT_RESOURCES - The encryption would cause a counter
                                      to overflow.
 */
-NV_STATUS nvUvmInterfaceCslLogExternalEncryption(UvmCslContext *uvmCslContext,
-                                                 NvU32 bufferSize);
+NV_STATUS nvUvmInterfaceCslLogEncryption(UvmCslContext *uvmCslContext,
+                                         UvmCslOperation operation,
+                                         NvU32 bufferSize);

 #endif // _NV_UVM_INTERFACE_H_
--- a/kernel-open/common/inc/nv_uvm_types.h
+++ b/kernel-open/common/inc/nv_uvm_types.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2014-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -39,12 +39,13 @@
 // are multiple BIG page sizes in RM. These defines are used as flags to "0"
 // should be OK when user is not sure which pagesize allocation it wants
 //
-#define UVM_PAGE_SIZE_DEFAULT    0x0
-#define UVM_PAGE_SIZE_4K         0x1000
-#define UVM_PAGE_SIZE_64K        0x10000
-#define UVM_PAGE_SIZE_128K       0x20000
-#define UVM_PAGE_SIZE_2M         0x200000
-#define UVM_PAGE_SIZE_512M       0x20000000
+#define UVM_PAGE_SIZE_DEFAULT    0x0ULL
+#define UVM_PAGE_SIZE_4K         0x1000ULL
+#define UVM_PAGE_SIZE_64K        0x10000ULL
+#define UVM_PAGE_SIZE_128K       0x20000ULL
+#define UVM_PAGE_SIZE_2M         0x200000ULL
+#define UVM_PAGE_SIZE_512M       0x20000000ULL
+#define UVM_PAGE_SIZE_256G       0x4000000000ULL

 //
 // When modifying flags, make sure they are compatible with the mirrored
@@ -267,6 +268,7 @@ typedef struct UvmGpuChannelInfo_tag

    // The errorNotifier is filled out when the channel hits an RC error.
    NvNotification    *errorNotifier;
+    NvNotification    *keyRotationNotifier;

    NvU32              hwRunlistId;
    NvU32              hwChannelId;
@@ -292,13 +294,13 @@ typedef struct UvmGpuChannelInfo_tag

    // GPU VAs of both GPFIFO and GPPUT are needed in Confidential Computing
    // so a channel can be controlled via another channel (SEC2 or WLC/LCIC)
-    NvU64             gpFifoGpuVa;
-    NvU64             gpPutGpuVa;
-    NvU64             gpGetGpuVa;
+    NvU64              gpFifoGpuVa;
+    NvU64              gpPutGpuVa;
+    NvU64              gpGetGpuVa;
    // GPU VA of work submission offset is needed in Confidential Computing
    // so CE channels can ring doorbell of other channels as required for
    // WLC/LCIC work submission
-    NvU64             workSubmissionOffsetGpuVa;
+    NvU64              workSubmissionOffsetGpuVa;
 } UvmGpuChannelInfo;

 typedef enum
@@ -394,6 +396,7 @@ typedef enum
    UVM_LINK_TYPE_NVLINK_2,
    UVM_LINK_TYPE_NVLINK_3,
    UVM_LINK_TYPE_NVLINK_4,
+    UVM_LINK_TYPE_NVLINK_5,
    UVM_LINK_TYPE_C2C,
 } UVM_LINK_TYPE;

@@ -540,6 +543,36 @@ typedef struct UvmGpuExternalMappingInfo_tag
    NvU32 pteSize;
 } UvmGpuExternalMappingInfo;

+typedef struct UvmGpuExternalPhysAddrInfo_tag
+{
+    // In: Virtual permissions. Returns
+    // NV_ERR_INVALID_ACCESS_TYPE if input is
+    // inaccurate
+    UvmRmGpuMappingType mappingType;
+
+    // In: Size of the buffer to store PhysAddrs (in bytes).
+    NvU64 physAddrBufferSize;
+
+    // In: Page size for mapping
+    //     If this field is passed as 0, the page size
+    //     of the allocation is used for mapping.
+    //     nvUvmInterfaceGetExternalAllocPtes must pass
+    //     this field as zero.
+    NvU64 mappingPageSize;
+
+    // In: Pointer to a buffer to store PhysAddrs.
+    // Out: The interface will fill the buffer with PhysAddrs
+    NvU64 *physAddrBuffer;
+
+    // Out: Number of PhysAddrs filled in to the buffer.
+    NvU64 numWrittenPhysAddrs;
+
+    // Out: Number of PhysAddrs remaining to be filled
+    //      if the buffer is not sufficient to accommodate
+    //      requested PhysAddrs.
+    NvU64 numRemainingPhysAddrs;
+} UvmGpuExternalPhysAddrInfo;
+
 typedef struct UvmGpuP2PCapsParams_tag
 {
    // Out: peerId[i] contains gpu[i]'s peer id of gpu[1 - i]. Only defined if
@@ -565,11 +598,6 @@ typedef struct UvmGpuP2PCapsParams_tag
    // second, not taking into account the protocols overhead. The reported
    // bandwidth for indirect peers is zero.
    NvU32 totalLinkLineRateMBps;
-
-    // Out: True if the peers have a indirect link to communicate. On P9
-    // systems, this is true if peers are connected to different NPUs that
-    // forward the requests between them.
-    NvU32 indirectAccess      : 1;
 } UvmGpuP2PCapsParams;

 // Platform-wide information
@@ -604,6 +632,8 @@ typedef struct UvmGpuConfComputeCaps_tag
 {
    // Out: GPU's confidential compute mode
    UvmGpuConfComputeMode mode;
+    // Out: Whether key rotation is enabled for UVM keys
+    NvBool bKeyRotationEnabled;
 } UvmGpuConfComputeCaps;

 #define UVM_GPU_NAME_LENGTH 0x40
@@ -660,6 +690,9 @@ typedef struct UvmGpuInfo_tag
    // Maximum number of TPCs per GPC
    NvU32 maxTpcPerGpcCount;

+    // Number of access counter buffers.
+    NvU32 accessCntrBufferCount;
+
    // NV_TRUE if SMC is enabled on this GPU.
    NvBool smcEnabled;

@@ -706,6 +739,13 @@ typedef struct UvmGpuInfo_tag

    // EGM base address to offset in the GMMU PTE entry for EGM mappings
    NvU64    egmBaseAddr;
+
+    // If connectedToSwitch is NV_TRUE,
+    // nvswitchEgmMemoryWindowStart tells the base address for the GPU's EGM memory in the
+    // NVSwitch address space. It is used when creating PTEs of GPU memory mappings
+    // to NVSwitch peers.
+    NvU64 nvswitchEgmMemoryWindowStart;
+
 } UvmGpuInfo;

 typedef struct UvmGpuFbInfo_tag
@@ -714,10 +754,12 @@ typedef struct UvmGpuFbInfo_tag
    // RM regions that are not registered with PMA either.
    NvU64 maxAllocatableAddress;

-    NvU32 heapSize;          // RAM in KB available for user allocations
-    NvU32 reservedHeapSize;  // RAM in KB reserved for internal RM allocation
-    NvBool bZeroFb;          // Zero FB mode enabled.
-    NvU64 maxVidmemPageSize; // Largest GPU page size to access vidmem.
+    NvU32  heapSize;           // RAM in KB available for user allocations
+    NvU32  reservedHeapSize;   // RAM in KB reserved for internal RM allocation
+    NvBool bZeroFb;            // Zero FB mode enabled.
+    NvU64  maxVidmemPageSize;  // Largest GPU page size to access vidmem.
+    NvBool bStaticBar1Enabled; // Static BAR1 mode is enabled
+    NvU64  staticBar1Size;     // The size of the static mapping
 } UvmGpuFbInfo;

 typedef struct UvmGpuEccInfo_tag
@@ -729,6 +771,15 @@ typedef struct UvmGpuEccInfo_tag
    NvBool   bEccEnabled;
 } UvmGpuEccInfo;

+typedef struct UvmGpuNvlinkInfo_tag
+{
+    unsigned nvlinkMask;
+    unsigned nvlinkOffset;
+    void    *nvlinkReadLocation;
+    NvBool  *nvlinkErrorNotifier;
+    NvBool   bNvlinkRecoveryEnabled;
+} UvmGpuNvlinkInfo;
+
 typedef struct UvmPmaAllocationOptions_tag
 {
    NvU32 flags;
@@ -845,6 +896,41 @@ typedef NV_STATUS (*uvmEventIsrTopHalf_t) (const NvProcessorUuid *pGpuUuidStruct
 typedef void (*uvmEventIsrTopHalf_t) (void);
 #endif

+/*******************************************************************************
+    uvmEventDrainP2P
+    This function will be called by the GPU driver to signal to UVM that the
+    GPU has encountered an uncontained error, and all peer work must be drained
+    to recover.  When it is called, the following assumptions/guarantees are
+    valid/made:
+
+      * Impacted user channels have been preempted and disabled
+      * UVM channels are still running normally and will continue to do
+        so unless an unrecoverable error is hit on said channels
+      * UVM must not return from this function until all enqueued work on
+        peer channels has drained
+      * In the context of this function call, RM will still service faults
+      * UVM must prevent new peer work from being enqueued until the
+        uvmEventResumeP2P callback is issued
+
+    Returns:
+        NV_OK if UVM has idled peer work and will prevent new peer workloads.
+        NV_ERR_TIMEOUT if peer work was unable to be drained within a timeout
+        XXX NV_ERR_* for any other failure (TBD)
+
+*/
+typedef NV_STATUS (*uvmEventDrainP2P_t) (const NvProcessorUuid *pGpuUuidStruct);
+
+/*******************************************************************************
+    uvmEventResumeP2P
+    This function will be called by the GPU driver to signal to UVM that the
+    GPU has recovered from the previously reported uncontained NVLINK error.
+    When it is called, the following assumptions/guarantees are valid/made:
+
+      * UVM is again allowed to enqueue peer work
+      * UVM channels are still running normally
+*/
+typedef NV_STATUS (*uvmEventResumeP2P_t) (const NvProcessorUuid *pGpuUuidStruct);
+
 struct UvmOpsUvmEvents
 {
    uvmEventSuspend_t     suspend;
@@ -857,6 +943,8 @@ struct UvmOpsUvmEvents
    uvmEventWddmRestartAfterTimeout_t wddmRestartAfterTimeout;
    uvmEventServiceInterrupt_t serviceInterrupt;
 #endif
+    uvmEventDrainP2P_t drainP2P;
+    uvmEventResumeP2P_t resumeP2P;
 };

 #define UVM_CSL_SIGN_AUTH_TAG_SIZE_BYTES 32
@@ -1064,11 +1152,13 @@ typedef UvmGpuAccessCntrConfig gpuAccessCntrConfig;
 typedef UvmGpuFaultInfo gpuFaultInfo;
 typedef UvmGpuMemoryInfo gpuMemoryInfo;
 typedef UvmGpuExternalMappingInfo gpuExternalMappingInfo;
+typedef UvmGpuExternalPhysAddrInfo gpuExternalPhysAddrInfo;
 typedef UvmGpuChannelResourceInfo gpuChannelResourceInfo;
 typedef UvmGpuChannelInstanceInfo gpuChannelInstanceInfo;
 typedef UvmGpuChannelResourceBindParams gpuChannelResourceBindParams;
 typedef UvmGpuFbInfo gpuFbInfo;
 typedef UvmGpuEccInfo gpuEccInfo;
+typedef UvmGpuNvlinkInfo gpuNvlinkInfo;
 typedef UvmGpuPagingChannel *gpuPagingChannelHandle;
 typedef UvmGpuPagingChannelInfo gpuPagingChannelInfo;
 typedef UvmGpuPagingChannelAllocParams gpuPagingChannelAllocParams;
@@ -1086,4 +1176,21 @@ typedef enum UvmCslOperation
    UVM_CSL_OPERATION_DECRYPT
 } UvmCslOperation;

+typedef enum UVM_KEY_ROTATION_STATUS {
+    // Key rotation complete/not in progress
+    UVM_KEY_ROTATION_STATUS_IDLE = 0,
+    // RM is waiting for clients to report their channels are idle for key rotation
+    UVM_KEY_ROTATION_STATUS_PENDING = 1,
+    // Key rotation is in progress
+    UVM_KEY_ROTATION_STATUS_IN_PROGRESS = 2,
+    // Key rotation timeout failure, RM will RC non-idle channels.
+    // UVM should never see this status value.
+    UVM_KEY_ROTATION_STATUS_FAILED_TIMEOUT = 3,
+    // Key rotation failed because upper threshold was crossed, RM will RC non-idle channels
+    UVM_KEY_ROTATION_STATUS_FAILED_THRESHOLD = 4,
+    // Internal RM failure while rotating keys for a certain channel, RM will RC the channel.
+    UVM_KEY_ROTATION_STATUS_FAILED_ROTATION = 5,
+    UVM_KEY_ROTATION_STATUS_MAX_COUNT = 6,
+} UVM_KEY_ROTATION_STATUS;
+
 #endif // _NV_UVM_TYPES_H_
--- a/kernel-open/common/inc/nvkms-api-types.h
+++ b/kernel-open/common/inc/nvkms-api-types.h
@@ -50,6 +50,8 @@
 #define NVKMS_LOG2_LUT_ARRAY_SIZE             10
 #define NVKMS_LUT_ARRAY_SIZE                  (1 << NVKMS_LOG2_LUT_ARRAY_SIZE)

+#define NVKMS_OLUT_FP_NORM_SCALE_DEFAULT      0xffffffff
+
 typedef NvU32 NvKmsDeviceHandle;
 typedef NvU32 NvKmsDispHandle;
 typedef NvU32 NvKmsConnectorHandle;
@@ -245,6 +247,80 @@ struct NvKmsLutRamps {
    NvU16 blue[NVKMS_LUT_ARRAY_SIZE];  /*! in */
 };

+/* Datatypes for LUT capabilities */
+enum NvKmsLUTFormat {
+    /*
+     * Normalized fixed-point format mapping [0, 1] to [0x0, 0xFFFF].
+     */
+    NVKMS_LUT_FORMAT_UNORM16,
+
+    /*
+     * Half-precision floating point.
+     */
+    NVKMS_LUT_FORMAT_FP16,
+
+    /*
+     * 14-bit fixed-point format required to work around hardware bug 813188.
+     *
+     * To convert from UNORM16 to UNORM14_WAR_813188:
+     * unorm14_war_813188 = ((unorm16 >> 2) & ~7) + 0x6000
+     */
+    NVKMS_LUT_FORMAT_UNORM14_WAR_813188
+};
+
+enum NvKmsLUTVssSupport {
+    NVKMS_LUT_VSS_NOT_SUPPORTED,
+    NVKMS_LUT_VSS_SUPPORTED,
+    NVKMS_LUT_VSS_REQUIRED,
+};
+
+enum NvKmsLUTVssType {
+    NVKMS_LUT_VSS_TYPE_NONE,
+    NVKMS_LUT_VSS_TYPE_LINEAR,
+    NVKMS_LUT_VSS_TYPE_LOGARITHMIC,
+};
+
+struct NvKmsLUTCaps {
+    /*! Whether this layer or head on this device supports this LUT stage. */
+    NvBool supported;
+
+    /*! Whether this LUT supports VSS. */
+    enum NvKmsLUTVssSupport vssSupport;
+
+    /*!
+     * The type of VSS segmenting this LUT uses.
+     */
+    enum NvKmsLUTVssType vssType;
+
+    /*!
+     * Expected number of VSS segments.
+     */
+    NvU32 vssSegments;
+
+    /*!
+     * Expected number of LUT entries.
+     */
+    NvU32 lutEntries;
+
+    /*!
+     * Format for each of the LUT entries.
+     */
+    enum NvKmsLUTFormat entryFormat;
+};
+
+/* each LUT entry uses this many bytes */
+#define NVKMS_LUT_CAPS_LUT_ENTRY_SIZE (4 * sizeof(NvU16))
+
+/* if the LUT surface uses VSS, size of the VSS header */
+#define NVKMS_LUT_VSS_HEADER_SIZE (4 * NVKMS_LUT_CAPS_LUT_ENTRY_SIZE)
+
+struct NvKmsLUTSurfaceParams {
+    NvKmsSurfaceHandle surfaceHandle;
+    NvU64 offset NV_ALIGN_BYTES(8);
+    NvU32 vssSegments;
+    NvU32 lutEntries;
+};
+
 /*
 * A 3x4 row-major colorspace conversion matrix.
 *
@@ -440,9 +516,9 @@ struct NvKmsLayerCapabilities {
    NvBool supportsWindowMode              :1;

    /*!
-     * Whether layer supports HDR pipe.
+     * Whether layer supports ICtCp pipe.
     */
-    NvBool supportsHDR                     :1;
+    NvBool supportsICtCp                   :1;


    /*!
@@ -463,6 +539,10 @@ struct NvKmsLayerCapabilities {
     * still expected to honor the NvKmsUsageBounds for each head.
     */
    NvU64 supportedSurfaceMemoryFormats NV_ALIGN_BYTES(8);
+
+    /* Capabilities for each LUT stage in the EVO3 precomp pipeline. */
+    struct NvKmsLUTCaps ilut;
+    struct NvKmsLUTCaps tmo;
 };

 /*!
@@ -683,4 +763,20 @@ struct NvKmsSuperframeInfo {
    } view[NVKMS_MAX_SUPERFRAME_VIEWS];
 };

+/* Fields within NvKmsVblankSemControlDataOneHead::flags */
+#define NVKMS_VBLANK_SEM_CONTROL_SWAP_INTERVAL          15:0
+
+struct NvKmsVblankSemControlDataOneHead {
+    NvU32 requestCounterAccel;
+    NvU32 requestCounter;
+    NvU32 flags;
+
+    NvU32 semaphore;
+    NvU64 vblankCount NV_ALIGN_BYTES(8);
+};
+
+struct NvKmsVblankSemControlData {
+    struct NvKmsVblankSemControlDataOneHead head[NV_MAX_HEADS];
+};
+
 #endif /* NVKMS_API_TYPES_H */
--- a/kernel-open/common/inc/nvkms-kapi.h
+++ b/kernel-open/common/inc/nvkms-kapi.h
@@ -124,6 +124,14 @@ struct NvKmsKapiDisplayMode {
 #define NVKMS_KAPI_LAYER_INVALID_IDX           0xff
 #define NVKMS_KAPI_LAYER_PRIMARY_IDX              0

+struct NvKmsKapiLutCaps {
+    struct {
+        struct NvKmsLUTCaps ilut;
+        struct NvKmsLUTCaps tmo;
+    } layer[NVKMS_KAPI_LAYER_MAX];
+    struct NvKmsLUTCaps olut;
+};
+
 struct NvKmsKapiDeviceResourcesInfo {

    NvU32 numHeads;
@@ -158,13 +166,19 @@ struct NvKmsKapiDeviceResourcesInfo {

        NvU32 hasVideoMemory;

+        NvU32 numDisplaySemaphores;
+
        NvU8  genericPageKind;

        NvBool  supportsSyncpts;
+
+        NvBool requiresVrrSemaphores;
    } caps;

    NvU64 supportedSurfaceMemoryFormats[NVKMS_KAPI_LAYER_MAX];
-    NvBool supportsHDR[NVKMS_KAPI_LAYER_MAX];
+    NvBool supportsICtCp[NVKMS_KAPI_LAYER_MAX];
+
+    struct NvKmsKapiLutCaps lutCaps;
 };

 #define NVKMS_KAPI_LAYER_MASK(layerType) (1 << (layerType))
@@ -210,18 +224,26 @@ struct NvKmsKapiStaticDisplayInfo {
    NvU32 headMask;
 };

-struct NvKmsKapiSyncpt {
+struct NvKmsKapiSyncParams {
+    union {
+        struct {
+            /*!
+             * Possible syncpt use case in kapi.
+             * For pre-syncpt, use only id and value
+             * and for post-syncpt, use only fd.
+             */
+            NvU32   preSyncptId;
+            NvU32   preSyncptValue;
+        } syncpt;

-    /*!
-     * Possible syncpt use case in kapi.
-     * For pre-syncpt, use only id and value
-     * and for post-syncpt, use only fd.
-     */
-    NvBool  preSyncptSpecified;
-    NvU32   preSyncptId;
-    NvU32   preSyncptValue;
+        struct {
+            NvU32 index;
+        } semaphore;
+    } u;

-    NvBool  postSyncptRequested;
+    NvBool preSyncptSpecified;
+    NvBool postSyncptRequested;
+    NvBool semaphoreSpecified;
 };

 struct NvKmsKapiLayerConfig {
@@ -231,7 +253,7 @@ struct NvKmsKapiLayerConfig {
        NvU8 surfaceAlpha;
    } compParams;
    struct NvKmsRRParams rrParams;
-    struct NvKmsKapiSyncpt syncptParams;
+    struct NvKmsKapiSyncParams syncParams;

    struct {
        struct NvKmsHDRStaticMetadata val;
@@ -250,21 +272,54 @@ struct NvKmsKapiLayerConfig {
    NvU16 dstWidth, dstHeight;

    enum NvKmsInputColorSpace inputColorSpace;
+
+    struct {
+        NvBool enabled;
+        struct NvKmsKapiSurface *lutSurface;
+        NvU64 offset;
+        NvU32 vssSegments;
+        NvU32 lutEntries;
+    } ilut;
+
+    struct {
+        NvBool enabled;
+        struct NvKmsKapiSurface *lutSurface;
+        NvU64 offset;
+        NvU32 vssSegments;
+        NvU32 lutEntries;
+    } tmo;
+
    struct NvKmsCscMatrix csc;
    NvBool cscUseMain;
+
+    struct {
+        struct NvKmsCscMatrix lmsCtm;
+        struct NvKmsCscMatrix lmsToItpCtm;
+        struct NvKmsCscMatrix itpToLmsCtm;
+        struct NvKmsCscMatrix blendCtm;
+        struct {
+            NvBool lmsCtm      : 1;
+            NvBool lmsToItpCtm : 1;
+            NvBool itpToLmsCtm : 1;
+            NvBool blendCtm    : 1;
+        } enabled;
+    } matrixOverrides;
 };

 struct NvKmsKapiLayerRequestedConfig {
    struct NvKmsKapiLayerConfig config;
    struct {
-        NvBool surfaceChanged     : 1;
-        NvBool srcXYChanged       : 1;
-        NvBool srcWHChanged       : 1;
-        NvBool dstXYChanged       : 1;
-        NvBool dstWHChanged       : 1;
-        NvBool cscChanged         : 1;
-        NvBool tfChanged          : 1;
-        NvBool hdrMetadataChanged : 1;
+        NvBool surfaceChanged          : 1;
+        NvBool srcXYChanged            : 1;
+        NvBool srcWHChanged            : 1;
+        NvBool dstXYChanged            : 1;
+        NvBool dstWHChanged            : 1;
+        NvBool cscChanged              : 1;
+        NvBool tfChanged               : 1;
+        NvBool hdrMetadataChanged      : 1;
+        NvBool matrixOverridesChanged  : 1;
+        NvBool ilutChanged             : 1;
+        NvBool tmoChanged              : 1;
    } flags;
 };

@@ -319,7 +374,6 @@ struct NvKmsKapiHeadModeSetConfig {

    struct {
        struct {
-            NvBool specified;
            NvU32 depth;
            NvU32 start;
            NvU32 end;
@@ -327,22 +381,34 @@ struct NvKmsKapiHeadModeSetConfig {
        } input;

        struct {
-            NvBool specified;
            NvBool enabled;
            struct NvKmsLutRamps *pRamps;
        } output;
    } lut;
+
+    struct {
+        NvBool enabled;
+        struct NvKmsKapiSurface *lutSurface;
+        NvU64 offset;
+        NvU32 vssSegments;
+        NvU32 lutEntries;
+    } olut;
+
+    NvU32 olutFpNormScale;
 };

 struct NvKmsKapiHeadRequestedConfig {
    struct NvKmsKapiHeadModeSetConfig modeSetConfig;
    struct {
-        NvBool activeChanged       : 1;
-        NvBool displaysChanged     : 1;
-        NvBool modeChanged         : 1;
-        NvBool hdrInfoFrameChanged : 1;
-        NvBool colorimetryChanged  : 1;
-        NvBool lutChanged      : 1;
+        NvBool activeChanged          : 1;
+        NvBool displaysChanged        : 1;
+        NvBool modeChanged            : 1;
+        NvBool hdrInfoFrameChanged    : 1;
+        NvBool colorimetryChanged     : 1;
+        NvBool legacyIlutChanged      : 1;
+        NvBool legacyOlutChanged      : 1;
+        NvBool olutChanged            : 1;
+        NvBool olutFpNormScaleChanged : 1;
    } flags;

    struct NvKmsKapiCursorRequestedConfig cursorRequestedConfig;
@@ -368,6 +434,8 @@ struct NvKmsKapiHeadReplyConfig {

 struct NvKmsKapiModeSetReplyConfig {
    enum NvKmsFlipResult flipResult;
+    NvBool vrrFlip;
+    NvS32 vrrSemaphoreIndex;
    struct NvKmsKapiHeadReplyConfig
        headReplyConfig[NVKMS_KAPI_MAX_HEADS];
 };
@@ -1159,21 +1227,6 @@ struct NvKmsKapiFunctionsTable {
        NvU64 *pPages
    );

-     /*!
-     * Check if this memory object can be scanned out for display.
-     *
-     * \param [in]  device  A device allocated using allocateDevice().
-     *
-     * \param [in]  memory  The memory object to check for display support.
-     *
-     * \return NV_TRUE if this memory can be displayed, NV_FALSE if not.
-     */
-    NvBool (*isMemoryValidForDisplay)
-    (
-        const struct NvKmsKapiDevice *device,
-        const struct NvKmsKapiMemory *memory
-    );
-
    /*
     * Import SGT as a memory handle.
     *
@@ -1410,6 +1463,97 @@ struct NvKmsKapiFunctionsTable {
    (
        NvKmsKapiSuspendResumeCallbackFunc *function
    );
+
+    /*!
+     * Immediately initialize the specified display semaphore to the pending state.
+     *
+     * Must be called prior to applying a mode set that utilizes the specified
+     * display semaphore for synchronization.
+     *
+     * \param [in] device         The device which will utilize the semaphore.
+     *
+     * \param [in] semaphoreIndex Index of the desired semaphore within the
+     *                            NVKMS semaphore pool. Must be less than
+     *                            NvKmsKapiDeviceResourcesInfo::caps::numDisplaySemaphores
+     *                            for the specified device.
+     */
+    NvBool
+    (*tryInitDisplaySemaphore)
+    (
+        struct NvKmsKapiDevice *device,
+        NvU32 semaphoreIndex
+    );
+
+    /*!
+     * Immediately set the specified display semaphore to the displayable state.
+     *
+     * Must be called after \ref tryInitDisplaySemaphore to indicate a mode
+     * configuration change that utilizes the specified display semaphore for
+     * synchronization may proceed.
+     *
+     * \param [in] device         The device which will utilize the semaphore.
+     *
+     * \param [in] semaphoreIndex Index of the desired semaphore within the
+     *                            NVKMS semaphore pool. Must be less than
+     *                            NvKmsKapiDeviceResourcesInfo::caps::numDisplaySemaphores
+     *                            for the specified device.
+     */
+    void
+    (*signalDisplaySemaphore)
+    (
+        struct NvKmsKapiDevice *device,
+        NvU32 semaphoreIndex
+    );
+
+    /*!
+     * Immediately cancel use of a display semaphore by resetting its value to
+     * its initial state.
+     *
+     * This can be used by clients to restore a semaphore to a consistent state
+     * when they have prepared it for use by previously calling
+     * \ref tryInitDisplaySemaphore() on it, but are then prevented from
+     * submitting the associated hardware operations to consume it due to the
+     * subsequent failure of some software or hardware operation.
+     *
+     * \param [in] device         The device which will utilize the semaphore.
+     *
+     * \param [in] semaphoreIndex Index of the desired semaphore within the
+     *                            NVKMS semaphore pool. Must be less than
+     *                            NvKmsKapiDeviceResourcesInfo::caps::numDisplaySemaphores
+     *                            for the specified device.
+     */
+    void
+    (*cancelDisplaySemaphore)
+    (
+        struct NvKmsKapiDevice *device,
+        NvU32 semaphoreIndex
+    );
+
+    /*!
+     * Signal the VRR semaphore at the specified index from the CPU.
+     * If the device does not support VRR semaphores, this is a no-op.
+     * Returns true if the signal succeeds or is a no-op; otherwise returns false.
+     *
+     * \param [in]  device  A device allocated using allocateDevice().
+     *
+     * \param [in]  index   The VRR semaphore index to be signalled.
+     */
+    NvBool
+    (*signalVrrSemaphore)
+    (
+        struct NvKmsKapiDevice *device,
+        NvS32 index
+    );
+
+    /*
+     * Notify NVKMS that the system's framebuffer console has been disabled and
+     * the reserved allocation for the old framebuffer console can be unmapped.
+     */
+    void
+    (*framebufferConsoleDisabled)
+    (
+        struct NvKmsKapiDevice *device
+    );
 };

 /** @} */
@@ -1424,6 +1568,20 @@ NvBool nvKmsKapiGetFunctionsTable
    struct NvKmsKapiFunctionsTable *funcsTable
 );

+NvU32 nvKmsKapiF16ToF32(NvU16 a);
+
+NvU16 nvKmsKapiF32ToF16(NvU32 a);
+
+NvU32 nvKmsKapiF32Mul(NvU32 a, NvU32 b);
+
+NvU32 nvKmsKapiF32Div(NvU32 a, NvU32 b);
+
+NvU32 nvKmsKapiF32Add(NvU32 a, NvU32 b);
+
+NvU32 nvKmsKapiF32ToUI32RMinMag(NvU32 a, NvBool exact);
+
+NvU32 nvKmsKapiUI32ToF32(NvU32 a);
+
 /** @} */

 #endif /* defined(__NVKMS_KAPI_H__) */
--- a/kernel-open/common/inc/nvlimits.h
+++ b/kernel-open/common/inc/nvlimits.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2017 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2017-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -34,19 +34,25 @@
 /*
 * This is the maximum number of GPUs supported in a single system.
 */
-#define NV_MAX_DEVICES          32
+#define NV_MAX_DEVICES                32

 /*
 * This is the maximum number of subdevices within a single device.
 */
-#define NV_MAX_SUBDEVICES       8
+#define NV_MAX_SUBDEVICES             8

 /*
 * This is the maximum length of the process name string.
 */
-#define NV_PROC_NAME_MAX_LENGTH 100U
+#define NV_PROC_NAME_MAX_LENGTH       100U

 /*
 * This is the maximum number of heads per GPU.
 */
-#define NV_MAX_HEADS            4
+#define NV_MAX_HEADS                  4
+
+/*
+ * Maximum length of a MIG device UUID. It is a 36-byte UUID string plus a
+ * 4-byte prefix and NUL terminator: 'M' 'I' 'G' '-' UUID '\0'
+ */
+#define NV_MIG_DEVICE_UUID_STR_LENGTH 41U
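
The length arithmetic in the comment above (4-byte prefix + 36-byte UUID string + NUL = 41) can be checked with a small standalone sketch. `formatMigUuid` is a hypothetical helper written for illustration only and is not part of the driver; the define is repeated locally so the snippet compiles on its own.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors the value added to nvlimits.h; redefined so the sketch stands alone. */
#define NV_MIG_DEVICE_UUID_STR_LENGTH 41U

/*
 * Hypothetical helper (not part of the driver): writes "MIG-" followed by a
 * 36-character UUID string into out. Returns 0 on success, or -1 if the UUID
 * does not have the expected length.
 */
static int formatMigUuid(const char *uuid, char out[NV_MIG_DEVICE_UUID_STR_LENGTH])
{
    if (strlen(uuid) != 36)
    {
        return -1;
    }
    /* 4 prefix bytes + 36 UUID bytes + 1 NUL terminator == 41 bytes. */
    snprintf(out, NV_MIG_DEVICE_UUID_STR_LENGTH, "MIG-%s", uuid);
    return 0;
}
```

A buffer declared as `char buf[NV_MIG_DEVICE_UUID_STR_LENGTH]` is therefore exactly large enough for the prefixed string and its terminator.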
--- a/kernel-open/common/inc/nvmisc.h
+++ b/kernel-open/common/inc/nvmisc.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 1993-2020 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -67,6 +67,9 @@ extern "C" {
 #define NVBIT64(b)                NVBIT_TYPE(b, NvU64)
 #endif

+// Concatenate two 32-bit values into a 64-bit value
+#define NV_CONCAT_32_TO_64(hi, lo) ((((NvU64)(hi)) << 32) | ((NvU64)(lo)))
+
 // Helper macro's for 32 bit bitmasks
 #define NV_BITMASK32_ELEMENT_SIZE            (sizeof(NvU32) << 3)
 #define NV_BITMASK32_IDX(chId)               (((chId) & ~(0x1F)) >> 5)  
@@ -494,6 +497,23 @@ do                                                      \
 //
 #define NV_TWO_N_MINUS_ONE(n) (((1ULL<<(n/2))<<((n+1)/2))-1)

+//
+// Create a 64b bitmask with n bits set
+// This is the same as ((1ULL<<n) - 1), but it doesn't overflow for n=64
+//
+// ...
+// n=-1, 0x0000000000000000
+// n=0,  0x0000000000000000
+// n=1,  0x0000000000000001
+// ...
+// n=63, 0x7FFFFFFFFFFFFFFF
+// n=64, 0xFFFFFFFFFFFFFFFF
+// n=65, 0xFFFFFFFFFFFFFFFF
+// n=66, 0xFFFFFFFFFFFFFFFF
+// ...
+//
+#define NV_BITMASK64(n) (((n)<1) ? 0ULL : (NV_U64_MAX>>(((n)>64) ? 0 : (64-(n)))))
+
 #define DRF_READ_1WORD_BS(d,r,f,v) \
    ((DRF_EXTENT_MW(NV##d##r##f)<8)?DRF_READ_1BYTE_BS(NV##d##r##f,(v)): \
    ((DRF_EXTENT_MW(NV##d##r##f)<16)?DRF_READ_2BYTE_BS(NV##d##r##f,(v)): \
@@ -574,6 +594,13 @@ nvMaskPos32(const NvU32 mask, const NvU32 bitIdx)
    n32 = BIT_IDX_32(LOWESTBIT(n32));\
 }

+// Destructive operation on n64
+#define LOWESTBITIDX_64(n64)         \
+{                                    \
+    n64 = BIT_IDX_64(LOWESTBIT(n64));\
+}
+
+
 // Destructive operation on n32
 #define HIGHESTBITIDX_32(n32)   \
 {                               \
@@ -694,6 +721,42 @@ nvPrevPow2_U64(const NvU64 x )
    }                                                       \
 }

+//
+// Bug 4851259: Newly added functions must be hidden from certain HS-signed
+// ucode compilers to avoid signature mismatch.
+//
+#ifndef NVDEC_1_0
+/*!
+ * Returns the position of nth set bit in the given mask.
+ *
+ * Returns -1 if mask has n or fewer bits set.
+ *
+ * n is 0 indexed and has valid values 0..31 inclusive, so "zeroth" set bit is
+ * the first set LSB.
+ *
+ * Example, if mask = 0x000000F0u and n = 1, the return value will be 5.
+ * Example, if mask = 0x000000F0u and n = 4, the return value will be -1.
+ */
+static NV_FORCEINLINE NvS32
+nvGetNthSetBitIndex32(NvU32 mask, NvU32 n)
+{
+    NvU32 seenSetBitsCount = 0;
+    NvS32 index;
+    FOR_EACH_INDEX_IN_MASK(32, index, mask)
+    {
+        if (seenSetBitsCount == n)
+        {
+            return index;
+        }
+        ++seenSetBitsCount;
+    }
+    FOR_EACH_INDEX_IN_MASK_END;
+
+    return -1;
+}
+
+#endif // NVDEC_1_0
+
 //
 // Size to use when declaring variable-sized arrays
 //
@@ -918,6 +981,11 @@ static NV_FORCEINLINE void *NV_NVUPTR_TO_PTR(NvUPtr address)
 // Use (lo) if (b) is less than 64, and (hi) if >= 64.
 //
 #define NV_BIT_SET_128(b, lo, hi)              { nvAssert( (b) < 128 ); if ( (b) < 64 ) (lo) |= NVBIT64(b); else (hi) |= NVBIT64( b & 0x3F ); }
+//
+// Clear the bit at pos (b) for U64 which is < 128.
+// Use (lo) if (b) is less than 64, and (hi) if >= 64.
+//
+#define NV_BIT_CLEAR_128(b, lo, hi)            { nvAssert( (b) < 128 ); if ( (b) < 64 ) (lo) &= ~NVBIT64(b); else (hi) &= ~NVBIT64( b & 0x3F ); }

 // Get the number of elements the specified fixed-size array
 #define NV_ARRAY_ELEMENTS(x)                   ((sizeof(x)/sizeof((x)[0])))
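
Two of the helpers added to nvmisc.h in this file can be exercised outside the driver with a minimal sketch. The `SKETCH_` and `sketch` names are assumptions made for this example: they stand in for `NV_BITMASK64` and `nvGetNthSetBitIndex32`, use stdint types instead of `NvU32`/`NvU64`, and replace `FOR_EACH_INDEX_IN_MASK` with a plain loop.

```c
#include <stdint.h>

/* Stand-in for NV_U64_MAX so the sketch compiles on its own. */
#define SKETCH_U64_MAX UINT64_MAX

/* Same shape as NV_BITMASK64 in the diff, with the argument parenthesized. */
#define SKETCH_BITMASK64(n) \
    (((n) < 1) ? 0ULL : (SKETCH_U64_MAX >> (((n) > 64) ? 0 : (64 - (n)))))

/*
 * Plain-loop equivalent of nvGetNthSetBitIndex32(): index of the nth (0-based)
 * set bit counting from the LSB, or -1 if the mask has n or fewer bits set.
 */
static int32_t sketchGetNthSetBitIndex32(uint32_t mask, uint32_t n)
{
    uint32_t seen = 0;
    int32_t index;

    for (index = 0; index < 32; index++)
    {
        if ((mask & (UINT32_C(1) << index)) == 0)
        {
            continue; /* bit not set, keep scanning */
        }
        if (seen == n)
        {
            return index;
        }
        seen++;
    }
    return -1;
}
```

The shift amount in the mask macro is clamped to the range 0..63, which is why n=64 yields all ones instead of the undefined behavior of `1ULL << 64`.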
--- a/kernel-open/common/inc/nvstatuscodes.h
+++ b/kernel-open/common/inc/nvstatuscodes.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2014-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -151,6 +151,10 @@ NV_STATUS_CODE(NV_ERR_RISCV_ERROR,                     0x00000079, "Generic RISC
 NV_STATUS_CODE(NV_ERR_FABRIC_MANAGER_NOT_PRESENT,      0x0000007A, "Fabric Manager is not loaded")
 NV_STATUS_CODE(NV_ERR_ALREADY_SIGNALLED,               0x0000007B, "Semaphore Surface value already >= requested wait value")
 NV_STATUS_CODE(NV_ERR_QUEUE_TASK_SLOT_NOT_AVAILABLE,   0x0000007C, "PMU RPC error due to no queue slot available for this event")
+NV_STATUS_CODE(NV_ERR_KEY_ROTATION_IN_PROGRESS,        0x0000007D, "Operation not allowed as key rotation is in progress")
+NV_STATUS_CODE(NV_ERR_TEST_ONLY_CODE_NOT_ENABLED,      0x0000007E, "Test-only code path not enabled")
+NV_STATUS_CODE(NV_ERR_SECURE_BOOT_FAILED,              0x0000007F, "GFW secure boot failed")
+NV_STATUS_CODE(NV_ERR_INSUFFICIENT_ZBC_ENTRY,          0x00000080, "No more ZBC entry for the client")

 // Warnings:
 NV_STATUS_CODE(NV_WARN_HOT_SWITCH,                     0x00010001, "WARNING Hot switch")
--- a/kernel-open/common/inc/nvtypes.h
+++ b/kernel-open/common/inc/nvtypes.h
@@ -152,6 +152,12 @@ typedef   signed short     NvS16; /* -32768 to 32767                         */
     (((NvU32)(c) & 0xff) << 8)  | \
     (((NvU32)(d) & 0xff))))

+// Macro to build an NvU64 from two DWORDS, listed from msb to lsb
+#define NvU64_BUILD(a, b) \
+    ((NvU64)( \
+     (((NvU64)(a) & ~0U) << 32) | \
+     (((NvU64)(b) & ~0U))))
+
 #if NVTYPES_USE_STDINT
 typedef uint32_t           NvV32; /* "void": enumerated or multiple fields   */
 typedef uint32_t           NvU32; /* 0 to 4294967295                         */
--- a/kernel-open/common/inc/os-interface.h
+++ b/kernel-open/common/inc/os-interface.h
@@ -40,8 +40,11 @@
 #include "nv_stdarg.h"
 #include <nv-kernel-interface-api.h>
 #include <os/nv_memory_type.h>
+#include <os/nv_memory_area.h>
 #include <nv-caps.h>

+#include "rs_access.h"
+


 typedef struct
@@ -102,8 +105,10 @@ NvBool      NV_API_CALL  os_pci_remove_supported     (void);
 void        NV_API_CALL  os_pci_remove               (void *);
 void*       NV_API_CALL  os_map_kernel_space         (NvU64, NvU64, NvU32);
 void        NV_API_CALL  os_unmap_kernel_space       (void *, NvU64);
-void*       NV_API_CALL  os_map_user_space           (NvU64, NvU64, NvU32, NvU32, void **);
+#if defined(NV_VMWARE)
+void*       NV_API_CALL  os_map_user_space           (MemoryArea *, NvU32, NvU32, void **);
 void        NV_API_CALL  os_unmap_user_space         (void *, NvU64, void *);
+#endif
 NV_STATUS   NV_API_CALL  os_flush_cpu_cache_all      (void);
 NV_STATUS   NV_API_CALL  os_flush_user_cache         (void);
 void        NV_API_CALL  os_flush_cpu_write_combine_buffer(void);
@@ -114,7 +119,7 @@ void        NV_API_CALL  os_io_write_byte            (NvU32, NvU8);
 void        NV_API_CALL  os_io_write_word            (NvU32, NvU16);
 void        NV_API_CALL  os_io_write_dword           (NvU32, NvU32);
 NvBool      NV_API_CALL  os_is_administrator         (void);
-NvBool      NV_API_CALL  os_allow_priority_override  (void);
+NvBool      NV_API_CALL  os_check_access             (RsAccessRight accessRight);
 void        NV_API_CALL  os_dbg_init                 (void);
 void        NV_API_CALL  os_dbg_breakpoint           (void);
 void        NV_API_CALL  os_dbg_set_level            (NvU32);
@@ -130,7 +135,8 @@ void        NV_API_CALL  os_free_spinlock            (void *);
 NvU64       NV_API_CALL  os_acquire_spinlock         (void *);
 void        NV_API_CALL  os_release_spinlock         (void *, NvU64);
 NV_STATUS   NV_API_CALL  os_queue_work_item          (struct os_work_queue *, void *);
-NV_STATUS   NV_API_CALL  os_flush_work_queue         (struct os_work_queue *);
+NV_STATUS   NV_API_CALL  os_flush_work_queue         (struct os_work_queue *, NvBool);
+NvBool      NV_API_CALL  os_is_queue_flush_ongoing   (struct os_work_queue *);
 NV_STATUS   NV_API_CALL  os_alloc_mutex              (void **);
 void        NV_API_CALL  os_free_mutex               (void *);
 NV_STATUS   NV_API_CALL  os_acquire_mutex            (void *);
@@ -151,6 +157,7 @@ void        NV_API_CALL  os_release_rwlock_read      (void *);
 void        NV_API_CALL  os_release_rwlock_write     (void *);
 NvBool      NV_API_CALL  os_semaphore_may_sleep      (void);
 NV_STATUS   NV_API_CALL  os_get_version_info         (os_version_info*);
+NV_STATUS   NV_API_CALL  os_get_is_openrm            (NvBool *);
 NvBool      NV_API_CALL  os_is_isr                   (void);
 NvBool      NV_API_CALL  os_pat_supported            (void);
 void        NV_API_CALL  os_dump_stack               (void);
@@ -218,6 +225,8 @@ extern NvU32 os_page_size;
 extern NvU64 os_page_mask;
 extern NvU8  os_page_shift;
 extern NvBool os_cc_enabled;
+extern NvBool os_cc_sev_snp_enabled;
+extern NvBool os_cc_snp_vtom_enabled;
 extern NvBool os_cc_tdx_enabled;
 extern NvBool os_dma_buf_enabled;
 extern NvBool os_imex_channel_is_supported;
--- a/kernel-open/common/inc/os/nv_memory_area.h
+++ b/kernel-open/common/inc/os/nv_memory_area.h
@@ -0,0 +1,104 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: MIT
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef NV_MEMORY_AREA_H
+#define NV_MEMORY_AREA_H
+
+typedef struct MemoryRange
+{
+    NvU64 start;
+    NvU64 size;
+} MemoryRange;
+
+typedef struct MemoryArea
+{
+    MemoryRange *pRanges;
+    NvU64 numRanges;
+} MemoryArea;
+
+static inline NvU64 memareaSize(MemoryArea memArea)
+{
+    NvU64 size = 0;
+    NvU64 idx = 0;
+    for (idx = 0; idx < memArea.numRanges; idx++)
+    {
+        size += memArea.pRanges[idx].size;
+    }
+    return size;
+}
+
+static inline MemoryRange
+mrangeMake
+(
+    NvU64 start,
+    NvU64 size
+)
+{
+    MemoryRange range;
+    range.start = start;
+    range.size = size;
+    return range;
+}
+
+static inline NvU64
+mrangeLimit
+(
+    MemoryRange a
+)
+{
+    return a.start + a.size;
+}
+
+static inline NvBool
+mrangeIntersects
+(
+    MemoryRange a,
+    MemoryRange b
+)
+{
+    return ((a.start >= b.start) && (a.start < mrangeLimit(b))) ||
+        ((b.start >= a.start) && (b.start < mrangeLimit(a)));
+}
+
+static inline NvBool
+mrangeContains
+(
+    MemoryRange outer,
+    MemoryRange inner
+)
+{
+    return (inner.start >= outer.start) && (mrangeLimit(inner) <= mrangeLimit(outer));
+}
+
+static inline MemoryRange
+mrangeOffset
+(
+    MemoryRange range,
+    NvU64 amt
+)
+{
+    range.start += amt;
+    return range;
+}
+
+#endif /* NV_MEMORY_AREA_H */
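
The range helpers in the new header treat a `MemoryRange` as the half-open interval [start, start + size). The following standalone sketch copies four of them with stdint types (an assumption made so it compiles without the NVIDIA headers) to show the consequence: adjacent ranges do not intersect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the nv_memory_area.h helpers, using stdint types. */
typedef struct MemoryRange
{
    uint64_t start;
    uint64_t size;
} MemoryRange;

static inline MemoryRange mrangeMake(uint64_t start, uint64_t size)
{
    MemoryRange range = { start, size };
    return range;
}

/* One past the last byte covered by the range. */
static inline uint64_t mrangeLimit(MemoryRange a)
{
    return a.start + a.size;
}

/* True if either range starts inside the other. */
static inline bool mrangeIntersects(MemoryRange a, MemoryRange b)
{
    return ((a.start >= b.start) && (a.start < mrangeLimit(b))) ||
           ((b.start >= a.start) && (b.start < mrangeLimit(a)));
}

/* True if inner lies entirely within outer. */
static inline bool mrangeContains(MemoryRange outer, MemoryRange inner)
{
    return (inner.start >= outer.start) &&
           (mrangeLimit(inner) <= mrangeLimit(outer));
}
```

Note that `mrangeLimit` does not guard against `start + size` wrapping around, so callers of this sketch would need to validate inputs themselves.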
--- a/kernel-open/common/inc/rm-gpu-ops.h
+++ b/kernel-open/common/inc/rm-gpu-ops.h
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 1999-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -85,9 +85,11 @@ NV_STATUS  NV_API_CALL rm_gpu_ops_enable_access_cntr(nvidia_stack_t *, nvgpuDevi
 NV_STATUS  NV_API_CALL rm_gpu_ops_disable_access_cntr(nvidia_stack_t *, nvgpuDeviceHandle_t, nvgpuAccessCntrInfo_t);
 NV_STATUS  NV_API_CALL  rm_gpu_ops_set_page_directory (nvidia_stack_t *, nvgpuAddressSpaceHandle_t, NvU64, unsigned, NvBool, NvU32);
 NV_STATUS  NV_API_CALL  rm_gpu_ops_unset_page_directory (nvidia_stack_t *, nvgpuAddressSpaceHandle_t);
+NV_STATUS  NV_API_CALL rm_gpu_ops_get_nvlink_info(nvidia_stack_t *, nvgpuDeviceHandle_t, nvgpuNvlinkInfo_t);
 NV_STATUS  NV_API_CALL rm_gpu_ops_p2p_object_create(nvidia_stack_t *, nvgpuDeviceHandle_t, nvgpuDeviceHandle_t, NvHandle *);
 void       NV_API_CALL rm_gpu_ops_p2p_object_destroy(nvidia_stack_t *, nvgpuSessionHandle_t, NvHandle);
 NV_STATUS  NV_API_CALL rm_gpu_ops_get_external_alloc_ptes(nvidia_stack_t*, nvgpuAddressSpaceHandle_t, NvHandle, NvU64, NvU64, nvgpuExternalMappingInfo_t);
+NV_STATUS  NV_API_CALL rm_gpu_ops_get_external_alloc_phys_addrs(nvidia_stack_t*, nvgpuAddressSpaceHandle_t, NvHandle, NvU64, NvU64, nvgpuExternalPhysAddrInfo_t);
 NV_STATUS  NV_API_CALL rm_gpu_ops_retain_channel(nvidia_stack_t *, nvgpuAddressSpaceHandle_t, NvHandle, NvHandle, void **, nvgpuChannelInstanceInfo_t);
 NV_STATUS  NV_API_CALL rm_gpu_ops_bind_channel_resources(nvidia_stack_t *, void *, nvgpuChannelResourceBindParams_t);
 void       NV_API_CALL rm_gpu_ops_release_channel(nvidia_stack_t *, void *);
@@ -100,17 +102,18 @@ void       NV_API_CALL rm_gpu_ops_paging_channel_destroy(nvidia_stack_t *, nvgpu
 NV_STATUS  NV_API_CALL rm_gpu_ops_paging_channels_map(nvidia_stack_t *, nvgpuAddressSpaceHandle_t, NvU64, nvgpuDeviceHandle_t, NvU64 *);
 void       NV_API_CALL rm_gpu_ops_paging_channels_unmap(nvidia_stack_t *, nvgpuAddressSpaceHandle_t, NvU64, nvgpuDeviceHandle_t);
 NV_STATUS  NV_API_CALL rm_gpu_ops_paging_channel_push_stream(nvidia_stack_t *, nvgpuPagingChannelHandle_t, char *, NvU32);
+void       NV_API_CALL rm_gpu_ops_report_fatal_error(nvidia_stack_t *, NV_STATUS error);

 NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_context_init(nvidia_stack_t *, struct ccslContext_t **, nvgpuChannelHandle_t);
 NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_context_clear(nvidia_stack_t *, struct ccslContext_t *);
-NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_context_update(nvidia_stack_t *, struct ccslContext_t *);
+NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_rotate_key(nvidia_stack_t *, UvmCslContext *[], NvU32);
 NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_rotate_iv(nvidia_stack_t *, struct ccslContext_t *, NvU8);
 NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_encrypt(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8 *, NvU8 *);
 NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_encrypt_with_iv(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8*, NvU8 *, NvU8 *);
-NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_decrypt(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8 const *, NvU8 *, NvU8 const *, NvU32, NvU8 const *);
+NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_decrypt(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8 const *, NvU32, NvU8 *, NvU8 const *, NvU32, NvU8 const *);
 NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_sign(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8 *);
 NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_query_message_pool(nvidia_stack_t *, struct ccslContext_t *, NvU8, NvU64 *);
 NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_increment_iv(nvidia_stack_t *, struct ccslContext_t *, NvU8, NvU64, NvU8 *);
-NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_log_device_encryption(nvidia_stack_t *, struct ccslContext_t *, NvU32);
+NV_STATUS  NV_API_CALL rm_gpu_ops_ccsl_log_encryption(nvidia_stack_t *, struct ccslContext_t *, NvU8, NvU32);

 #endif
--- a/kernel-open/common/inc/rs_access.h
+++ b/kernel-open/common/inc/rs_access.h
@@ -0,0 +1,276 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2019-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: MIT
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#pragma once
+
+#include <nvtypes.h>
+#if defined(_MSC_VER)
+#pragma warning(disable:4324)
+#endif
+
+//
+// This file was generated with FINN, an NVIDIA coding tool.
+// Source file:      rs_access.finn
+//
+
+
+
+
+#include "nvtypes.h"
+#include "nvmisc.h"
+
+
+/****************************************************************************/
+/*                       Access right definitions                           */
+/****************************************************************************/
+
+//
+// The meaning of each access right is documented in
+//   resman/docs/rmapi/resource_server/rm_capabilities.adoc
+//
+// RS_ACCESS_COUNT is the number of access rights that have been defined
+// and are in use. All integers in the range [0, RS_ACCESS_COUNT) should
+// represent valid access rights.
+//
+// When adding a new access right, don't forget to update
+//   1) The descriptions in the resman/docs/rmapi/resource_server/rm_capabilities.adoc
+//   2) RS_ACCESS_COUNT, defined below
+//   3) The declaration of g_rsAccessMetadata in rs_access_rights.c
+//   4) The list of access rights in drivers/common/chip-config/Chipcontrols.pm
+//   5) Any relevant access right callbacks
+//
+
+#define RS_ACCESS_DUP_OBJECT 0U
+#define RS_ACCESS_NICE       1U
+#define RS_ACCESS_DEBUG      2U
+#define RS_ACCESS_PERFMON    3U
+#define RS_ACCESS_COUNT      4U
+
+
+/****************************************************************************/
+/*                     Access right data structures                         */
+/****************************************************************************/
+
+/*!
+ * @brief A type that can be used to represent any access right.
+ */
+typedef NvU16 RsAccessRight;
+
+/*!
+ * @brief An internal type used to represent one limb in an access right mask.
+ */
+typedef NvU32 RsAccessLimb;
+#define SDK_RS_ACCESS_LIMB_BITS 32
+
+/*!
+ * @brief The number of limbs in the RS_ACCESS_MASK struct.
+ */
+#define SDK_RS_ACCESS_MAX_LIMBS 1
+
+/*!
+ * @brief The maximum number of possible access rights supported by the
+ *        current data structure definition.
+ *
+ * You probably want RS_ACCESS_COUNT instead, which is the number of actual
+ * access rights defined.
+ */
+#define SDK_RS_ACCESS_MAX_COUNT (0x20) /* finn: Evaluated from "(SDK_RS_ACCESS_LIMB_BITS * SDK_RS_ACCESS_MAX_LIMBS)" */
+
+/**
+ * @brief A struct representing a set of access rights.
+ *
+ * Note that the values of bit positions at or above RS_ACCESS_COUNT are
+ * undefined, and should not be assumed to be 0 (see RS_ACCESS_MASK_FILL).
+ */
+typedef struct RS_ACCESS_MASK {
+    RsAccessLimb limbs[SDK_RS_ACCESS_MAX_LIMBS];
+} RS_ACCESS_MASK;
+
+/**
+ * @brief A struct representing auxiliary information about each access right.
+ */
+typedef struct RS_ACCESS_INFO {
+    NvU32 flags;
+} RS_ACCESS_INFO;
+
+
+/****************************************************************************/
+/*                           Access right macros                            */
+/****************************************************************************/
+
+#define SDK_RS_ACCESS_LIMB_INDEX(index) ((index) / SDK_RS_ACCESS_LIMB_BITS)
+#define SDK_RS_ACCESS_LIMB_POS(index)   ((index) % SDK_RS_ACCESS_LIMB_BITS)
+
+#define SDK_RS_ACCESS_LIMB_ELT(pAccessMask, index) \
+    ((pAccessMask)->limbs[SDK_RS_ACCESS_LIMB_INDEX(index)])
+#define SDK_RS_ACCESS_OFFSET_MASK(index) \
+    NVBIT_TYPE(SDK_RS_ACCESS_LIMB_POS(index), RsAccessLimb)
+
+/*!
+ * @brief Checks that accessRight represents a valid access right.
+ *
+ * The valid range of access rights is [0, RS_ACCESS_COUNT).
+ *
+ * @param[in] accessRight The access right value to check
+ *
+ * @return true if accessRight is valid
+ * @return false otherwise
+ */
+#define RS_ACCESS_BOUNDS_CHECK(accessRight) \
+    (accessRight < RS_ACCESS_COUNT)
+
+/*!
+ * @brief Test whether an access right is present in a set
+ *
+ * @param[in] pAccessMask The set of access rights to read
+ * @param[in] index The access right to examine
+ *
+ * @return NV_TRUE if the access right specified by index was present in the set,
+ *         and NV_FALSE otherwise
+ */
+#define RS_ACCESS_MASK_TEST(pAccessMask, index) \
+    (RS_ACCESS_BOUNDS_CHECK(index) && \
+        (SDK_RS_ACCESS_LIMB_ELT(pAccessMask, index) & SDK_RS_ACCESS_OFFSET_MASK(index)) != 0)
+
+/*!
+ * @brief Add an access right to a mask
+ *
+ * @param[in] pAccessMask The set of access rights to modify
+ * @param[in] index The access right to set
+ */
+#define RS_ACCESS_MASK_ADD(pAccessMask, index) \
+    do \
+    { \
+        if (RS_ACCESS_BOUNDS_CHECK(index)) { \
+            SDK_RS_ACCESS_LIMB_ELT(pAccessMask, index) |= SDK_RS_ACCESS_OFFSET_MASK(index); \
+        } \
+    } while (NV_FALSE)
+
+/*!
+ * @brief Remove an access right from a mask
+ *
+ * @param[in] pAccessMask The set of access rights to modify
+ * @param[in] index The access right to unset
+ */
+#define RS_ACCESS_MASK_REMOVE(pAccessMask, index) \
+    do \
+    { \
+        if (RS_ACCESS_BOUNDS_CHECK(index)) { \
+            SDK_RS_ACCESS_LIMB_ELT(pAccessMask, index) &= ~SDK_RS_ACCESS_OFFSET_MASK(index); \
+        } \
+    } while (NV_FALSE)
+
+/*!
+ * @brief Performs an in-place union between two access right masks
+ *
+ * @param[in,out] pMaskOut The access rights mask to be updated
+ * @param[in] pMaskIn The set of access rights to be added to pMaskOut
+ */
+#define RS_ACCESS_MASK_UNION(pMaskOut, pMaskIn) \
+    do \
+    { \
+        NvLength limb; \
+        for (limb = 0; limb < SDK_RS_ACCESS_MAX_LIMBS; limb++) \
+        { \
+            SDK_RS_ACCESS_LIMB_ELT(pMaskOut, limb) |= SDK_RS_ACCESS_LIMB_ELT(pMaskIn, limb); \
+        } \
+    } while (NV_FALSE)
+
+/*!
+ * @brief Performs an in-place subtract of one mask's rights from another
+ *
+ * @param[in,out] pMaskOut The access rights mask to be updated
+ * @param[in] pMaskIn The set of access rights to be removed from pMaskOut
+ */
+#define RS_ACCESS_MASK_SUBTRACT(pMaskOut, pMaskIn) \
+    do \
+    { \
+        NvLength limb; \
+        for (limb = 0; limb < SDK_RS_ACCESS_MAX_LIMBS; limb++) \
+        { \
+            SDK_RS_ACCESS_LIMB_ELT(pMaskOut, limb) &= ~SDK_RS_ACCESS_LIMB_ELT(pMaskIn, limb); \
+        } \
+    } while (NV_FALSE)
+
+/*!
+ * @brief Removes all rights from an access rights mask
+ *
+ * @param[in,out] pAccessMask The access rights mask to be updated
+ */
+#define RS_ACCESS_MASK_CLEAR(pAccessMask) \
+    do \
+    { \
+        portMemSet(pAccessMask, 0, sizeof(*pAccessMask)); \
+    } while (NV_FALSE)
+
+/*!
+ * @brief Adds all rights to an access rights mask
+ *
+ * @param[in,out] pAccessMask The access rights mask to be updated
+ */
+#define RS_ACCESS_MASK_FILL(pAccessMask) \
+    do \
+    { \
+        portMemSet(pAccessMask, 0xff, sizeof(*pAccessMask)); \
+    } while (NV_FALSE)
+
+
+/****************************************************************************/
+/*                           Share definitions                              */
+/****************************************************************************/
+
+//
+// The usage of Share Policy and the meaning of each share type is documented in
+//   resman/docs/rmapi/resource_server/rm_capabilities.adoc
+//
+#define RS_SHARE_TYPE_NONE              (0U)
+#define RS_SHARE_TYPE_ALL               (1U)
+#define RS_SHARE_TYPE_OS_SECURITY_TOKEN (2U)
+#define RS_SHARE_TYPE_CLIENT            (3U)
+#define RS_SHARE_TYPE_PID               (4U)
+#define RS_SHARE_TYPE_SMC_PARTITION     (5U)
+#define RS_SHARE_TYPE_GPU               (6U)
+#define RS_SHARE_TYPE_FM_CLIENT         (7U)
+// Must be last. Update when a new SHARE_TYPE is added
+#define RS_SHARE_TYPE_MAX               (8U)
+
+
+//
+// Use Revoke to remove an existing policy from the list.
+// Allow is based on OR logic, Require is based on AND logic.
+// To share a right, at least one Allow (non-Require) must match, and all Require must pass.
+// If Compose is specified, policies will be added to the list. Otherwise, they will replace the list.
+//
+#define RS_SHARE_ACTION_FLAG_REVOKE      NVBIT(0)
+#define RS_SHARE_ACTION_FLAG_REQUIRE     NVBIT(1)
+#define RS_SHARE_ACTION_FLAG_COMPOSE     NVBIT(2)
+
+/****************************************************************************/
+/*                       Share flag data structures                         */
+/****************************************************************************/
+
+typedef struct RS_SHARE_POLICY {
+    NvU32          target;
+    RS_ACCESS_MASK accessMask;
+    NvU16          type;                         ///< RS_SHARE_TYPE_
+    NvU8           action;                        ///< RS_SHARE_ACTION_
+} RS_SHARE_POLICY;
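
The access-mask macros in rs_access.h can be tried in isolation with a reduced sketch. Everything prefixed `SKETCH_` is a local stand-in written for this example (the real macros rely on `NVBIT_TYPE` and `portMemSet`, which are replaced here by a plain shift and `memset`); `buildExampleMask` is likewise hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring rs_access.h so the sketch compiles on its own. */
#define RS_ACCESS_NICE    1U
#define RS_ACCESS_DEBUG   2U
#define RS_ACCESS_COUNT   4U
#define SKETCH_LIMB_BITS  32
#define SKETCH_MAX_LIMBS  1

typedef struct RS_ACCESS_MASK {
    uint32_t limbs[SKETCH_MAX_LIMBS];
} RS_ACCESS_MASK;

#define SKETCH_LIMB_ELT(pMask, index) \
    ((pMask)->limbs[(index) / SKETCH_LIMB_BITS])
#define SKETCH_OFFSET_MASK(index) \
    (UINT32_C(1) << ((index) % SKETCH_LIMB_BITS))
#define SKETCH_BOUNDS_CHECK(index) ((index) < RS_ACCESS_COUNT)

#define SKETCH_MASK_TEST(pMask, index) \
    (SKETCH_BOUNDS_CHECK(index) && \
        (SKETCH_LIMB_ELT(pMask, index) & SKETCH_OFFSET_MASK(index)) != 0)

#define SKETCH_MASK_ADD(pMask, index) \
    do { \
        if (SKETCH_BOUNDS_CHECK(index)) { \
            SKETCH_LIMB_ELT(pMask, index) |= SKETCH_OFFSET_MASK(index); \
        } \
    } while (0)

#define SKETCH_MASK_REMOVE(pMask, index) \
    do { \
        if (SKETCH_BOUNDS_CHECK(index)) { \
            SKETCH_LIMB_ELT(pMask, index) &= ~SKETCH_OFFSET_MASK(index); \
        } \
    } while (0)

#define SKETCH_MASK_CLEAR(pMask) memset((pMask), 0, sizeof(*(pMask)))

/* Builds a mask holding NICE and DEBUG, then drops NICE again. */
static RS_ACCESS_MASK buildExampleMask(void)
{
    RS_ACCESS_MASK mask;

    SKETCH_MASK_CLEAR(&mask);
    SKETCH_MASK_ADD(&mask, RS_ACCESS_NICE);
    SKETCH_MASK_ADD(&mask, RS_ACCESS_DEBUG);
    SKETCH_MASK_ADD(&mask, 7U); /* out of range: silently ignored */
    SKETCH_MASK_REMOVE(&mask, RS_ACCESS_NICE);
    return mask;
}
```

Because the test macro includes the bounds check, bits at or above `RS_ACCESS_COUNT` never report as present, even after a fill sets every bit in the limb.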
--- a/kernel-open/conftest.sh
+++ b/kernel-open/conftest.sh
@@ -71,7 +71,7 @@ test_header_presence() {
    TEST_CFLAGS="-E -M $CFLAGS"

    file="$1"
-    file_define=NV_`echo $file | tr '/.\-a-z' '___A-Z'`_PRESENT
+    file_define=NV_`echo $file | tr '/.-' '___' | tr 'a-z' 'A-Z'`_PRESENT

    CODE="#include <$file>"

@@ -661,23 +661,6 @@ compile_test() {
            compile_check_conftest "$CODE" "NV_PCI_GET_DOMAIN_BUS_AND_SLOT_PRESENT" "" "functions"
        ;;

-        pci_bus_address)
-            #
-            # Determine if the pci_bus_address() function is
-            # present.
-            #
-            # Added by commit 06cf56e497c8 ("PCI: Add pci_bus_address() to
-            # get bus address of a BAR") in v3.14
-            #
-            CODE="
-            #include <linux/pci.h>
-            void conftest_pci_bus_address(void) {
-                pci_bus_address();
-            }"
-
-            compile_check_conftest "$CODE" "NV_PCI_BUS_ADDRESS_PRESENT" "" "functions"
-        ;;
-
        hash__remap_4k_pfn)
            #
            # Determine if the hash__remap_4k_pfn() function is
@@ -1416,6 +1399,42 @@ compile_test() {
            compile_check_conftest "$CODE" "NV_VFIO_REGISTER_EMULATED_IOMMU_DEV_PRESENT" "" "functions"
        ;;

+        bus_type_has_iommu_ops)
+            #
+            # Determine if 'bus_type' structure has an 'iommu_ops' field.
+            #
+            # This field was removed by commit 17de3f5fdd35 (iommu: Retire bus ops)
+            # in v6.8
+            #
+            CODE="
+            #include <linux/device.h>
+
+            int conftest_bus_type_has_iommu_ops(void) {
+                return offsetof(struct bus_type, iommu_ops);
+            }"
+
+            compile_check_conftest "$CODE" "NV_BUS_TYPE_HAS_IOMMU_OPS" "" "types"
+        ;;
+
+        eventfd_signal_has_counter_arg)
+            #
+            # Determine if eventfd_signal() function has an additional 'counter' argument.
+            #
+            # This argument was removed by commit 3652117f8548 (eventfd: simplify
+            # eventfd_signal()) in v6.8
+            #
+            CODE="
+            #include <linux/eventfd.h>
+
+            void conftest_eventfd_signal_has_counter_arg(void) {
+                struct eventfd_ctx *ctx;
+
+                eventfd_signal(ctx, 1);
+            }"
+
+            compile_check_conftest "$CODE" "NV_EVENTFD_SIGNAL_HAS_COUNTER_ARG" "" "types"
+        ;;
+
        drm_available)
            # Determine if the DRM subsystem is usable
            CODE="
@@ -1502,23 +1521,6 @@ compile_test() {
            compile_check_conftest "$CODE" "NV_GET_NUM_PHYSPAGES_PRESENT" "" "functions"
        ;;

-        backing_dev_info)
-            #
-            # Determine if the 'address_space' structure has
-            # a 'backing_dev_info' field.
-            #
-            # Removed by commit b83ae6d42143 ("fs: remove
-            # mapping->backing_dev_info") in v4.0
-            #
-            CODE="
-            #include <linux/fs.h>
-            int conftest_backing_dev_info(void) {
-                return offsetof(struct address_space, backing_dev_info);
-            }"
-
-            compile_check_conftest "$CODE" "NV_ADDRESS_SPACE_HAS_BACKING_DEV_INFO" "" "types"
-        ;;
-
        xen_ioemu_inject_msi)
            # Determine if the xen_ioemu_inject_msi() function is present.
            CODE="
@@ -2373,45 +2375,6 @@ compile_test() {
            compile_check_conftest "$CODE" "NV_DRM_ATOMIC_HELPER_LEGACY_GAMMA_SET_PRESENT" "" "functions"
        ;;

-        wait_on_bit_lock_argument_count)
-            #
-            # Determine how many arguments wait_on_bit_lock takes.
-            #
-            # Changed by commit 743162013d40 ("sched: Remove proliferation
-            # of wait_on_bit() action functions") in v3.17 (2014-07-07)
-            #
-            echo "$CONFTEST_PREAMBLE
-            #include <linux/wait.h>
-            void conftest_wait_on_bit_lock(void) {
-                wait_on_bit_lock(NULL, 0, 0);
-            }" > conftest$$.c
-
-            $CC $CFLAGS -c conftest$$.c > /dev/null 2>&1
-            rm -f conftest$$.c
-
-            if [ -f conftest$$.o ]; then
-                rm -f conftest$$.o
-                echo "#define NV_WAIT_ON_BIT_LOCK_ARGUMENT_COUNT 3" | append_conftest "functions"
-                return
-            fi
-
-            echo "$CONFTEST_PREAMBLE
-            #include <linux/wait.h>
-            void conftest_wait_on_bit_lock(void) {
-                wait_on_bit_lock(NULL, 0, NULL, 0);
-            }" > conftest$$.c
-
-            $CC $CFLAGS -c conftest$$.c > /dev/null 2>&1
-            rm -f conftest$$.c
-
-            if [ -f conftest$$.o ]; then
-                rm -f conftest$$.o
-                echo "#define NV_WAIT_ON_BIT_LOCK_ARGUMENT_COUNT 4" | append_conftest "functions"
-                return
-            fi
-            echo "#error wait_on_bit_lock() conftest failed!" | append_conftest "functions"
-        ;;
-
        pci_stop_and_remove_bus_device)
            #
            # Determine if the pci_stop_and_remove_bus_device() function is present.
@@ -2487,29 +2450,20 @@ compile_test() {
            fi
        ;;

-        mm_context_t)
+        file_operations_fop_unsigned_offset_present)
            #
-            # Determine if the 'mm_context_t' data type is present
-            # and if it has an 'id' member.
-            # It does not exist on all architectures.
+            # Determine if the FOP_UNSIGNED_OFFSET define is present.
            #
-            echo "$CONFTEST_PREAMBLE
-            #include <linux/mm.h>
-            int conftest_mm_context_t(void) {
-                return offsetof(mm_context_t, id);
-            }" > conftest$$.c
+            # Added by commit 641bb4394f40 ("fs: move FMODE_UNSIGNED_OFFSET to
+            # fop_flags") in v6.12.
+            #
+            CODE="
+            #include <linux/fs.h>
+            int conftest_file_operations_fop_unsigned_offset_present(void) {
+                return FOP_UNSIGNED_OFFSET;
+            }"

-            $CC $CFLAGS -c conftest$$.c > /dev/null 2>&1
-            rm -f conftest$$.c
-
-            if [ -f conftest$$.o ]; then
-                echo "#define NV_MM_CONTEXT_T_HAS_ID" | append_conftest "types"
-                rm -f conftest$$.o
-                return
-            else
-                echo "#undef NV_MM_CONTEXT_T_HAS_ID" | append_conftest "types"
-                return
-            fi
+            compile_check_conftest "$CODE" "NV_FILE_OPERATIONS_FOP_UNSIGNED_OFFSET_PRESENT" "" "types"
        ;;

        pci_dev_has_ats_enabled)
@@ -5066,6 +5020,42 @@ compile_test() {
            compile_check_conftest "$CODE" "NV_CC_PLATFORM_PRESENT" "" "functions"
        ;;

+        cc_attr_guest_sev_snp)
+            #
+            # Determine if 'CC_ATTR_GUEST_SEV_SNP' is present.
+            #
+            # Added by commit aa5a461171f9 ("x86/mm: Extend cc_attr to
+            # include AMD SEV-SNP") in v5.19.
+            #
+            CODE="
+            #if defined(NV_LINUX_CC_PLATFORM_H_PRESENT)
+            #include <linux/cc_platform.h>
+            #endif
+
+            enum cc_attr cc_attributes = CC_ATTR_GUEST_SEV_SNP;
+            "
+
+            compile_check_conftest "$CODE" "NV_CC_ATTR_SEV_SNP" "" "types"
+        ;;
+
+        hv_get_isolation_type)
+            #
+            # Determine if 'hv_get_isolation_type()' is present.
+            # Added by commit faff44069ff5 ("x86/hyperv: Add Write/Read MSR
+            # registers via ghcb page") in v5.16.
+            #
+            CODE="
+            #if defined(NV_ASM_MSHYPERV_H_PRESENT)
+            #include <asm/mshyperv.h>
+            #endif
+            void conftest_hv_get_isolation_type(void) {
+                int i;
+                hv_get_isolation_type(i);
+            }"
+
+            compile_check_conftest "$CODE" "NV_HV_GET_ISOLATION_TYPE" "" "functions"
+        ;;
+
        drm_prime_pages_to_sg_has_drm_device_arg)
            #
            # Determine if drm_prime_pages_to_sg() has 'dev' argument.
@@ -5168,11 +5158,15 @@ compile_test() {
            # commit 49a3f51dfeee ("drm/gem: Use struct dma_buf_map in GEM
            # vmap ops and convert GEM backends") in v5.11.
            #
+            # Note that the 'map' argument type is changed from 'struct dma_buf_map'
+            # to 'struct iosys_map' by commit 7938f4218168 ("dma-buf-map: Rename
+            # to iosys-map") in v5.18.
+            #
            CODE="
            #include <drm/drm_gem.h>
            int conftest_drm_gem_object_vmap_has_map_arg(
-                    struct drm_gem_object *obj, struct dma_buf_map *map) {
-                return obj->funcs->vmap(obj, map);
+                    struct drm_gem_object *obj) {
+                return obj->funcs->vmap(obj, NULL);
            }"

            compile_check_conftest "$CODE" "NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG" "" "types"
@@ -5212,25 +5206,23 @@ compile_test() {
            compile_check_conftest "$CODE" "NV_PCI_CLASS_MULTIMEDIA_HD_AUDIO_PRESENT" "" "generic"
        ;;

-        unsafe_follow_pfn)
+        follow_pfn)
            #
-            # Determine if unsafe_follow_pfn() is present.
+            # Determine if follow_pfn() is present.
            #
-            # unsafe_follow_pfn() was added by commit 69bacee7f9ad
-            # ("mm: Add unsafe_follow_pfn") in v5.13-rc1.
-            #
-            # Note: this commit never made it to the linux kernel, so
-            # unsafe_follow_pfn() never existed.
+            # follow_pfn() was added by commit 3b6748e2dd69
+            # ("mm: introduce follow_pfn()") in v2.6.31-rc1, and removed
+            # by commit 233eb0bf3b94 ("mm: remove follow_pfn") in
+            # linux-next.
            #
            CODE="
            #include <linux/mm.h>
-            void conftest_unsafe_follow_pfn(void) {
-                unsafe_follow_pfn();
+            void conftest_follow_pfn(void) {
+                follow_pfn();
            }"

-            compile_check_conftest "$CODE" "NV_UNSAFE_FOLLOW_PFN_PRESENT" "" "functions"
+            compile_check_conftest "$CODE" "NV_FOLLOW_PFN_PRESENT" "" "functions"
        ;;
-
        drm_plane_atomic_check_has_atomic_state_arg)
            #
            # Determine if drm_plane_helper_funcs::atomic_check takes 'state'
@@ -5516,7 +5508,8 @@ compile_test() {

        of_dma_configure)
            #
-            # Determine if of_dma_configure() function is present
+            # Determine if of_dma_configure() function is present, and how
+            # many arguments it takes.
            #
            # Added by commit 591c1ee465ce ("of: configure the platform
            # device dma parameters") in v3.16.  However, it was a static,
@@ -5526,17 +5519,69 @@ compile_test() {
            # commit 1f5c69aa51f9 ("of: Move of_dma_configure() to device.c
            # to help re-use") in v4.1.
            #
-            CODE="
+            # It subsequently began taking a third parameter with commit
+            # 3d6ce86ee794 ("drivers: remove force dma flag from buses")
+            # in v4.18.
+            #
+
+            echo "$CONFTEST_PREAMBLE
            #if defined(NV_LINUX_OF_DEVICE_H_PRESENT)
            #include <linux/of_device.h>
            #endif
+
            void conftest_of_dma_configure(void)
            {
                of_dma_configure();
            }
-            "
+            " > conftest$$.c

-            compile_check_conftest "$CODE" "NV_OF_DMA_CONFIGURE_PRESENT" "" "functions"
+            $CC $CFLAGS -c conftest$$.c > /dev/null 2>&1
+            rm -f conftest$$.c
+
+            if [ -f conftest$$.o ]; then
+                rm -f conftest$$.o
+
+                echo "#undef NV_OF_DMA_CONFIGURE_PRESENT" | append_conftest "functions"
+                echo "#undef NV_OF_DMA_CONFIGURE_ARGUMENT_COUNT" | append_conftest "functions"
+            else
+                echo "#define NV_OF_DMA_CONFIGURE_PRESENT" | append_conftest "functions"
+
+                echo "$CONFTEST_PREAMBLE
+                #if defined(NV_LINUX_OF_DEVICE_H_PRESENT)
+                #include <linux/of_device.h>
+                #endif
+
+                void conftest_of_dma_configure(void) {
+                    of_dma_configure(NULL, NULL, false);
+                }" > conftest$$.c
+
+                $CC $CFLAGS -c conftest$$.c > /dev/null 2>&1
+                rm -f conftest$$.c
+
+                if [ -f conftest$$.o ]; then
+                    rm -f conftest$$.o
+                    echo "#define NV_OF_DMA_CONFIGURE_ARGUMENT_COUNT 3" | append_conftest "functions"
+                    return
+                fi
+
+                echo "$CONFTEST_PREAMBLE
+                #if defined(NV_LINUX_OF_DEVICE_H_PRESENT)
+                #include <linux/of_device.h>
+                #endif
+
+                void conftest_of_dma_configure(void) {
+                    of_dma_configure(NULL, NULL);
+                }" > conftest$$.c
+
+                $CC $CFLAGS -c conftest$$.c > /dev/null 2>&1
+                rm -f conftest$$.c
+
+                if [ -f conftest$$.o ]; then
+                    rm -f conftest$$.o
+                    echo "#define NV_OF_DMA_CONFIGURE_ARGUMENT_COUNT 2" | append_conftest "functions"
+                    return
+                fi
+            fi
        ;;

        icc_get)
@@ -6505,7 +6550,9 @@ compile_test() {
            # Determine whether drm_fbdev_generic_setup is present.
            #
            # Added by commit 9060d7f49376 ("drm/fb-helper: Finish the
-            # generic fbdev emulation") in v4.19.
+            # generic fbdev emulation") in v4.19. Removed by commit
+            # aae4682e5d66 ("drm/fbdev-generic: Convert to fbdev-ttm")
+            # in v6.11.
            #
            CODE="
            #include <drm/drm_fb_helper.h>
@@ -6517,6 +6564,48 @@ compile_test() {
            }"

            compile_check_conftest "$CODE" "NV_DRM_FBDEV_GENERIC_SETUP_PRESENT" "" "functions"
+        ;;
+
+        drm_fbdev_ttm_setup)
+            #
+            # Determine whether drm_fbdev_ttm_setup is present.
+            #
+            # Added by commit aae4682e5d66 ("drm/fbdev-generic:
+            # Convert to fbdev-ttm") in v6.11.
+            #
+            CODE="
+            #include <drm/drm_fb_helper.h>
+            #if defined(NV_DRM_DRM_FBDEV_TTM_H_PRESENT)
+            #include <drm/drm_fbdev_ttm.h>
+            #endif
+            void conftest_drm_fbdev_ttm_setup(void) {
+                drm_fbdev_ttm_setup();
+            }"
+
+            compile_check_conftest "$CODE" "NV_DRM_FBDEV_TTM_SETUP_PRESENT" "" "functions"
+        ;;
+
+        drm_output_poll_changed)
+            #
+            # Determine whether drm_mode_config_funcs.output_poll_changed
+            # callback is present.
+            #
+            # Removed by commit 446d0f4849b1 ("drm: Remove struct
+            # drm_mode_config_funcs.output_poll_changed") in v6.12. Hotplug
+            # event support is handled through the fbdev emulation interface
+            # going forward.
+            #
+            CODE="
+            #if defined(NV_DRM_DRM_MODE_CONFIG_H_PRESENT)
+            #include <drm/drm_mode_config.h>
+            #else
+            #include <drm/drm_crtc.h>
+            #endif
+            int conftest_drm_output_poll_changed_available(void) {
+                return offsetof(struct drm_mode_config_funcs, output_poll_changed);
+            }"
+
+            compile_check_conftest "$CODE" "NV_DRM_OUTPUT_POLL_CHANGED_PRESENT" "" "types"
        ;;

        drm_aperture_remove_conflicting_pci_framebuffers)
@@ -6757,12 +6846,45 @@ compile_test() {
            compile_check_conftest "$CODE" "NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG" "" "types"
        ;;

+        drm_syncobj_features_present)
+            # Determine if DRIVER_SYNCOBJ and DRIVER_SYNCOBJ_TIMELINE DRM
+            # driver features are present. Timeline DRM synchronization objects
+            # may only be used if both of these are supported by the driver.
+            #
+            # DRIVER_SYNCOBJ_TIMELINE was added by commit 060cebb20cdb ("drm:
+            # introduce a capability flag for syncobj timeline support") in
+            # v5.2.
+            #
+            # DRIVER_SYNCOBJ was added by commit e9083420bbac ("drm: introduce
+            # sync objects (v4)") in v4.12.
+            CODE="
+            #if defined(NV_DRM_DRM_DRV_H_PRESENT)
+            #include <drm/drm_drv.h>
+            #endif
+            int features = DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE;"
+
+            compile_check_conftest "$CODE" "NV_DRM_SYNCOBJ_FEATURES_PRESENT" "" "types"
+        ;;
+
+        stack_trace)
+            # Determine if functions stack_trace_{save,print} are present.
+            # Added by commit e9b98e162 ("stacktrace: Provide helpers for
+            # common stack trace operations") in v5.2.
+            CODE="
+            #include <linux/stacktrace.h>
+            void conftest_stack_trace(void) {
+                stack_trace_save();
+                stack_trace_print();
+            }"
+
+            compile_check_conftest "$CODE" "NV_STACK_TRACE_PRESENT" "" "functions"
+        ;;
+
        drm_unlocked_ioctl_flag_present)
            # Determine if DRM_UNLOCKED IOCTL flag is present.
            #
            # DRM_UNLOCKED was removed by commit 2798ffcc1d6a ("drm: Remove
-            # locking for legacy ioctls and DRM_UNLOCKED") in Linux
-            # next-20231208.
+            # locking for legacy ioctls and DRM_UNLOCKED") in v6.8.
            #
            # DRM_UNLOCKED definition was moved from drmP.h to drm_ioctl.h by
            # commit 2640981f3600 ("drm: document drm_ioctl.[hc]") in v4.12.
@@ -6778,6 +6900,280 @@ compile_test() {
            compile_check_conftest "$CODE" "NV_DRM_UNLOCKED_IOCTL_FLAG_PRESENT" "" "types"
        ;;

+        fault_flag_remote_present)
+            # Determine if FAULT_FLAG_REMOTE is present in the kernel, either
+            # as a define or an enum
+            #
+            # FAULT_FLAG_REMOTE define added by kernel commit 1b2ee1266ea6
+            # ("mm/core: Do not enforce PKEY permissions on remote mm access")
+            # in v4.6.
+            # FAULT_FLAG_REMOTE changed from define to enum by kernel commit
+            # da2f5eb3d344 ("mm/doc: turn fault flags into an enum") in v5.13.
+            # FAULT_FLAG_REMOTE moved from `mm.h` to `mm_types.h` by kernel
+            # commit 36090def7bad ("mm: move tlb_flush_pending inline helpers
+            # to mm_inline.h") in v5.17.
+            #
+            CODE="
+            #include <linux/mm.h>
+            int fault_flag_remote = FAULT_FLAG_REMOTE;
+            "
+
+            compile_check_conftest "$CODE" "NV_MM_HAS_FAULT_FLAG_REMOTE" "" "types"
+        ;;
+
+        drm_framebuffer_obj_present)
+            #
+            # Determine if the drm_framebuffer struct has an obj member.
+            #
+            # Added by commit 4c3dbb2c312c ("drm: Add GEM backed framebuffer
+            # library") in v4.14.
+            #
+            CODE="
+            #if defined(NV_DRM_DRMP_H_PRESENT)
+            #include <drm/drmP.h>
+            #endif
+
+            #if defined(NV_DRM_DRM_FRAMEBUFFER_H_PRESENT)
+            #include <drm/drm_framebuffer.h>
+            #endif
+
+            int conftest_drm_framebuffer_obj_present(void) {
+                return offsetof(struct drm_framebuffer, obj);
+            }"
+
+            compile_check_conftest "$CODE" "NV_DRM_FRAMEBUFFER_OBJ_PRESENT" "" "types"
+        ;;
+
+        drm_color_ctm_3x4_present)
+            # Determine if struct drm_color_ctm_3x4 is present.
+            #
+            # struct drm_color_ctm_3x4 was added by commit 6872a189be50
+            # ("drm/amd/display: Add 3x4 CTM support for plane CTM") in v6.8.
+            CODE="
+            #include <uapi/drm/drm_mode.h>
+            struct drm_color_ctm_3x4 ctm;"
+
+            compile_check_conftest "$CODE" "NV_DRM_COLOR_CTM_3X4_PRESENT" "" "types"
+        ;;
+
+        drm_color_lut)
+            # Determine if struct drm_color_lut is present.
+            #
+            # struct drm_color_lut was added by commit 5488dc16fde7
+            # ("drm: introduce pipe color correction properties") in v4.6.
+            CODE="
+            #include <uapi/drm/drm_mode.h>
+            struct drm_color_lut lut;"
+
+            compile_check_conftest "$CODE" "NV_DRM_COLOR_LUT_PRESENT" "" "types"
+        ;;
+
+        drm_property_blob_put)
+            #
+            # Determine if function drm_property_blob_put() is present.
+            #
+            # Added by commit 6472e5090be7 ("drm: Introduce
+            # drm_property_blob_{get,put}()") in v4.12, when it replaced
+            # drm_property_unreference_blob().
+            #
+
+            CODE="
+            #if defined(NV_DRM_DRM_PROPERTY_H_PRESENT)
+            #include <drm/drm_property.h>
+            #endif
+            void conftest_drm_property_blob_put(void) {
+                drm_property_blob_put();
+            }"
+
+            compile_check_conftest "$CODE" "NV_DRM_PROPERTY_BLOB_PUT_PRESENT" "" "functions"
+        ;;
+
+        drm_driver_has_gem_prime_mmap)
+            #
+            # Determine if the 'drm_driver' structure has a 'gem_prime_mmap'
+            # function pointer.
+            #
+            # Removed by commit 0adec22702d4 ("drm: Remove struct
+            # drm_driver.gem_prime_mmap") in v6.6.
+            #
+            CODE="
+            #if defined(NV_DRM_DRMP_H_PRESENT)
+            #include <drm/drmP.h>
+            #endif
+
+            #if defined(NV_DRM_DRM_DRV_H_PRESENT)
+            #include <drm/drm_drv.h>
+            #endif
+
+            int conftest_drm_driver_has_gem_prime_mmap(void) {
+                return offsetof(struct drm_driver, gem_prime_mmap);
+            }"
+
+            compile_check_conftest "$CODE" "NV_DRM_DRIVER_HAS_GEM_PRIME_MMAP" "" "types"
+        ;;
+
+        drm_gem_prime_mmap)
+            #
+            # Determine if the function drm_gem_prime_mmap() is present.
+            #
+            # Added by commit 7698799f95 ("drm/prime: Add
+            # drm_gem_prime_mmap()") in v5.0.
+            #
+            CODE="
+            #if defined(NV_DRM_DRMP_H_PRESENT)
+            #include <drm/drmP.h>
+            #endif
+            #if defined(NV_DRM_DRM_PRIME_H_PRESENT)
+            #include <drm/drm_prime.h>
+            #endif
+            void conftest_drm_gem_prime_mmap(void) {
+                drm_gem_prime_mmap();
+            }"
+
+            compile_check_conftest "$CODE" "NV_DRM_GEM_PRIME_MMAP_PRESENT" "" "functions"
+        ;;
+
+        vmf_insert_mixed)
+            #
+            # Determine if the function vmf_insert_mixed() is present.
+            #
+            # Added by commit 1c8f422059ae ("mm: change return type to
+            # vm_fault_t") in v4.17.
+            #
+            CODE="
+            #include <linux/mm.h>
+            void conftest_vmf_insert_mixed() {
+                vmf_insert_mixed();
+            }"
+
+            compile_check_conftest "$CODE" "NV_VMF_INSERT_MIXED_PRESENT" "" "functions"
+        ;;
+
+        pfn_to_pfn_t)
+            #
+            # Determine if the function pfn_to_pfn_t() is present.
+            #
+            # Added by commit 34c0fd540e79 ("mm, dax, pmem: introduce pfn_t") in
+            # v4.5.
+            #
+            CODE="
+            #if defined(NV_LINUX_PFN_T_H_PRESENT)
+            #include <linux/pfn_t.h>
+            #endif
+            void conftest_pfn_to_pfn_t() {
+                pfn_to_pfn_t();
+            }"
+
+            compile_check_conftest "$CODE" "NV_PFN_TO_PFN_T_PRESENT" "" "functions"
+        ;;
+
+        drm_gem_dmabuf_mmap)
+            #
+            # Determine if the drm_gem_dmabuf_mmap() function is present.
+            #
+            # drm_gem_dmabuf_mmap() was exported by commit c308279f8798 ("drm:
+            # export gem dmabuf_ops for drivers to reuse") in v4.17.
+            #
+            CODE="
+            #if defined(NV_DRM_DRM_PRIME_H_PRESENT)
+            #include <drm/drm_prime.h>
+            #endif
+            void conftest_drm_gem_dmabuf_mmap(void) {
+                drm_gem_dmabuf_mmap();
+            }"
+
+            compile_check_conftest "$CODE" "NV_DRM_GEM_DMABUF_MMAP_PRESENT" "" "functions"
+        ;;
+
+        drm_gem_prime_export_has_dev_arg)
+            #
+            # Determine if drm_gem_prime_export() function has a 'dev' argument.
+            #
+            # This argument was removed by commit e4fa8457b219 ("drm/prime:
+            # Align gem_prime_export with obj_funcs.export") in v5.4.
+            #
+            CODE="
+            #if defined(NV_DRM_DRMP_H_PRESENT)
+            #include <drm/drmP.h>
+            #endif
+            #if defined(NV_DRM_DRM_PRIME_H_PRESENT)
+            #include <drm/drm_prime.h>
+            #endif
+
+            void conftest_drm_gem_prime_export_has_dev_arg(
+                    struct drm_device *dev,
+                    struct drm_gem_object *obj) {
+                (void) drm_gem_prime_export(dev, obj, 0);
+            }"
+
+            compile_check_conftest "$CODE" "NV_DRM_GEM_PRIME_EXPORT_HAS_DEV_ARG" "" "types"
+        ;;
+
+        dma_buf_ops_has_cache_sgt_mapping)
+            #
+            # Determine if dma_buf_ops structure has a 'cache_sgt_mapping'
+            # member.
+            #
+            # dma_buf_ops::cache_sgt_mapping was added by commit f13e143e7444
+            # ("dma-buf: start caching of sg_table objects v2") in v5.3.
+            #
+            CODE="
+            #include <linux/dma-buf.h>
+            int conftest_dma_ops_has_cache_sgt_mapping(void) {
+                return offsetof(struct dma_buf_ops, cache_sgt_mapping);
+            }"
+
+            compile_check_conftest "$CODE" "NV_DMA_BUF_OPS_HAS_CACHE_SGT_MAPPING" "" "types"
+        ;;
+
+        drm_gem_object_funcs)
+            #
+            # Determine if the 'struct drm_gem_object_funcs' type is present.
+            #
+            # Added by commit b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
+            # in v5.0.
+            #
+            CODE="
+            #if defined(NV_DRM_DRM_GEM_H_PRESENT)
+            #include <drm/drm_gem.h>
+            #endif
+            struct drm_gem_object_funcs funcs;"
+
+            compile_check_conftest "$CODE" "NV_DRM_GEM_OBJECT_FUNCS_PRESENT" "" "types"
+        ;;
+
+        struct_page_has_zone_device_data)
+            #
+            # Determine if struct page has a 'zone_device_data' field.
+            #
+            # Added by commit 8a164fef9c4c ("mm: simplify ZONE_DEVICE page
+            # private data") in v5.3.
+            #
+            CODE="
+            #include <linux/mm_types.h>
+            int conftest_struct_page_has_zone_device_data(void) {
+                return offsetof(struct page, zone_device_data);
+            }"
+
+            compile_check_conftest "$CODE" "NV_STRUCT_PAGE_HAS_ZONE_DEVICE_DATA" "" "types"
+        ;;
+
+        folio_test_swapcache)
+            #
+            # Determine if the folio_test_swapcache() function is present.
+            #
+            # folio_test_swapcache() was exported by commit d389a4a811551 ("mm:
+            # Add folio flag manipulation functions") in v5.16.
+            #
+            CODE="
+            #include <linux/page-flags.h>
+            void conftest_folio_test_swapcache(void) {
+                folio_test_swapcache();
+            }"
+
+            compile_check_conftest "$CODE" "NV_FOLIO_TEST_SWAPCACHE_PRESENT" "" "functions"
+        ;;
+
        # When adding a new conftest entry, please use the correct format for
        # specifying the relevant upstream Linux kernel commit.  Please
        # avoid specifying -rc kernels, and only use SHAs that actually exist
--- a/kernel-open/header-presence-tests.mk
+++ b/kernel-open/header-presence-tests.mk
@@ -15,6 +15,7 @@ NV_HEADER_PRESENCE_TESTS = \
  drm/drm_atomic_uapi.h \
  drm/drm_drv.h \
  drm/drm_fbdev_generic.h \
+  drm/drm_fbdev_ttm.h \
  drm/drm_framebuffer.h \
  drm/drm_connector.h \
  drm/drm_probe_helper.h \
@@ -28,6 +29,7 @@ NV_HEADER_PRESENCE_TESTS = \
  drm/drm_device.h \
  drm/drm_mode_config.h \
  drm/drm_modeset_lock.h \
+  drm/drm_property.h \
  dt-bindings/interconnect/tegra_icc_id.h \
  generated/autoconf.h \
  generated/compile.h \
@@ -52,6 +54,7 @@ NV_HEADER_PRESENCE_TESTS = \
  linux/dma-resv.h \
  soc/tegra/chip-id.h \
  soc/tegra/fuse.h \
+  soc/tegra/fuse-helper.h \
  soc/tegra/tegra_bpmp.h \
  video/nv_internal.h \
  linux/platform/tegra/dce/dce-client-ipc.h \
@@ -97,5 +100,7 @@ NV_HEADER_PRESENCE_TESTS = \
  linux/sync_file.h \
  linux/cc_platform.h \
  asm/cpufeature.h \
-  linux/mpi.h
+  linux/mpi.h \
+  asm/mshyperv.h \
+  linux/pfn_t.h

--- a/kernel-open/nvidia-drm/nv-kthread-q.c
+++ b/kernel-open/nvidia-drm/nv-kthread-q.c
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2016-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -176,7 +176,7 @@ static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),
 {

    unsigned i, j;
-    const static unsigned attempts = 3;
+    static const unsigned attempts = 3;
    struct task_struct *thread[3];

    for (i = 0;; i++) {
@@ -201,7 +201,7 @@ static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),

        // Ran out of attempts - return thread even if its stack may not be
        // allocated on the preferred node
-        if ((i == (attempts - 1)))
+        if (i == (attempts - 1))
            break;

        // Get the NUMA node where the first page of the stack is resident. If
--- a/kernel-open/nvidia-drm/nv_common_utils.h
+++ b/kernel-open/nvidia-drm/nv_common_utils.h
@@ -0,0 +1,120 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2015 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: MIT
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __NV_COMMON_UTILS_H__
+#define __NV_COMMON_UTILS_H__
+
+#include "nvtypes.h"
+#include "nvmisc.h"
+
+#if !defined(TRUE)
+#define TRUE NV_TRUE
+#endif
+
+#if !defined(FALSE)
+#define FALSE NV_FALSE
+#endif
+
+#define NV_IS_UNSIGNED(x) ((__typeof__(x))-1 > 0)
+
+/* Get the length of a statically-sized array. */
+#define ARRAY_LEN(_arr) (sizeof(_arr) / sizeof(_arr[0]))
+
+#define NV_INVALID_HEAD         0xFFFFFFFF
+
+#define NV_INVALID_CONNECTOR_PHYSICAL_INFORMATION (~0)
+
+#if !defined(NV_MIN)
+# define NV_MIN(a,b) (((a)<(b))?(a):(b))
+#endif
+
+#define NV_MIN3(a,b,c) NV_MIN(NV_MIN(a, b), c)
+#define NV_MIN4(a,b,c,d) NV_MIN3(NV_MIN(a,b),c,d)
+
+#if !defined(NV_MAX)
+# define NV_MAX(a,b) (((a)>(b))?(a):(b))
+#endif
+
+#define NV_MAX3(a,b,c) NV_MAX(NV_MAX(a, b), c)
+#define NV_MAX4(a,b,c,d) NV_MAX3(NV_MAX(a,b),c,d)
+
+static inline int NV_LIMIT_VAL_TO_MIN_MAX(int val, int min, int max)
+{
+    if (val < min) {
+        return min;
+    }
+    if (val > max) {
+        return max;
+    }
+    return val;
+}
+
+#define NV_ROUNDUP_DIV(x,y) ((x) / (y) + (((x) % (y)) ? 1 : 0))
+
+/*
+ * Macros used for computing palette entries:
+ *
+ * NV_UNDER_REPLICATE(val, source_size, result_size) expands a value
+ * of source_size bits into a value of result_size bits by shifting
+ * the source value into the high bits and replicating the high bits
+ * of the value into the low bits of the result.
+ *
+ * PALETTE_DEPTH_SHIFT(val, w) maps a colormap entry for a component
+ * that has w bits to an appropriate entry in a LUT of 256 entries.
+ */
+static inline unsigned int NV_UNDER_REPLICATE(unsigned short val,
+                                              int source_size,
+                                              int result_size)
+{
+    return (val << (result_size - source_size)) |
+        (val >> ((source_size << 1) - result_size));
+}
+
+
+static inline unsigned short PALETTE_DEPTH_SHIFT(unsigned short val, int depth)
+{
+    return NV_UNDER_REPLICATE(val, depth, 8);
+}
+
+/*
+ *  Use __builtin_ffs where it is supported, or provide an equivalent
+ *  implementation for platforms like riscv where it is not.
+ */
+#if defined(__GNUC__) && !NVCPU_IS_RISCV64
+static inline int nv_ffs(int x)
+{
+    return __builtin_ffs(x);
+}
+#else
+static inline int nv_ffs(int x)
+{
+    if (x == 0)
+        return 0;
+
+    LOWESTBITIDX_32(x);
+
+    return 1 + x;
+}
+#endif
+
+#endif /* __NV_COMMON_UTILS_H__ */
--- a/kernel-open/nvidia-drm/nvidia-drm-conftest.h
+++ b/kernel-open/nvidia-drm/nvidia-drm-conftest.h
@@ -85,7 +85,11 @@

 /* For nv_drm_gem_prime_force_fence_signal */
 #ifndef spin_is_locked
+#if ((__FreeBSD_version >= 1500000) && (__FreeBSD_version < 1500018)) || (__FreeBSD_version < 1401501)
 #define spin_is_locked(lock) mtx_owned(lock.m)
+#else
+#define spin_is_locked(lock) mtx_owned(lock)
+#endif
 #endif

 #ifndef rwsem_is_locked
--- a/kernel-open/nvidia-drm/nvidia-drm-crtc.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-crtc.c
--- a/kernel-open/nvidia-drm/nvidia-drm-crtc.h
+++ b/kernel-open/nvidia-drm/nvidia-drm-crtc.h
@@ -38,6 +38,13 @@
 #include "nvtypes.h"
 #include "nvkms-kapi.h"

+enum nv_drm_transfer_function {
+    NV_DRM_TRANSFER_FUNCTION_DEFAULT,
+    NV_DRM_TRANSFER_FUNCTION_LINEAR,
+    NV_DRM_TRANSFER_FUNCTION_PQ,
+    NV_DRM_TRANSFER_FUNCTION_MAX,
+};
+
 struct nv_drm_crtc {
    NvU32 head;

@@ -63,6 +70,8 @@ struct nv_drm_crtc {
     */
    struct drm_file *modeset_permission_filep;

+    struct NvKmsLUTCaps olut_caps;
+
    struct drm_crtc base;
 };

@@ -129,9 +138,6 @@ struct nv_drm_crtc_state {
     */
    struct NvKmsKapiHeadRequestedConfig req_config;

-    struct NvKmsLutRamps *ilut_ramps;
-    struct NvKmsLutRamps *olut_ramps;
-
    /**
     * @nv_flip:
     *
@@ -145,6 +151,12 @@ struct nv_drm_crtc_state {
     * nv_drm_atomic_crtc_destroy_state().
     */
    struct nv_drm_flip *nv_flip;
+
+    enum nv_drm_transfer_function regamma_tf;
+    struct drm_property_blob *regamma_lut;
+    uint64_t regamma_divisor;
+    struct nv_drm_lut_surface *regamma_drm_lut_surface;
+    NvBool regamma_changed;
 };

 static inline struct nv_drm_crtc_state *to_nv_crtc_state(struct drm_crtc_state *state)
@@ -152,6 +164,11 @@ static inline struct nv_drm_crtc_state *to_nv_crtc_state(struct drm_crtc_state *
    return container_of(state, struct nv_drm_crtc_state, base);
 }

+static inline const struct nv_drm_crtc_state *to_nv_crtc_state_const(const struct drm_crtc_state *state)
+{
+    return container_of(state, struct nv_drm_crtc_state, base);
+}
+
 struct nv_drm_plane {
    /**
     * @base:
@@ -173,6 +190,9 @@ struct nv_drm_plane {
     * Index of this plane in the per head array of layers.
     */
    uint32_t layer_idx;
+
+    struct NvKmsLUTCaps ilut_caps;
+    struct NvKmsLUTCaps tmo_caps;
 };

 static inline struct nv_drm_plane *to_nv_plane(struct drm_plane *plane)
@@ -183,6 +203,22 @@ static inline struct nv_drm_plane *to_nv_plane(struct drm_plane *plane)
    return container_of(plane, struct nv_drm_plane, base);
 }

+struct nv_drm_lut_surface {
+    struct NvKmsKapiDevice *pDevice;
+    struct NvKmsKapiMemory *nvkms_memory;
+    struct NvKmsKapiSurface *nvkms_surface;
+    struct {
+        NvU32 vssSegments;
+        enum NvKmsLUTVssType vssType;
+
+        NvU32 lutEntries;
+        enum NvKmsLUTFormat entryFormat;
+
+    } properties;
+    void *buffer;
+    struct kref refcount;
+};
+
 struct nv_drm_plane_state {
    struct drm_plane_state base;
    s32 __user *fd_user_ptr;
@@ -190,6 +226,20 @@ struct nv_drm_plane_state {
 #if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
    struct drm_property_blob *hdr_output_metadata;
 #endif
+    struct drm_property_blob *lms_ctm;
+    struct drm_property_blob *lms_to_itp_ctm;
+    struct drm_property_blob *itp_to_lms_ctm;
+    struct drm_property_blob *blend_ctm;
+
+    enum nv_drm_transfer_function degamma_tf;
+    struct drm_property_blob *degamma_lut;
+    uint64_t degamma_multiplier; /* S31.32 Sign-Magnitude Format */
+    struct nv_drm_lut_surface *degamma_drm_lut_surface;
+    NvBool degamma_changed;
+
+    struct drm_property_blob *tmo_lut;
+    struct nv_drm_lut_surface *tmo_drm_lut_surface;
+    NvBool tmo_changed;
 };

 static inline struct nv_drm_plane_state *to_nv_drm_plane_state(struct drm_plane_state *state)
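Note on the hunks above: to_nv_crtc_state_const() applies the usual container_of pattern to a const pointer, so read-only callers can reach the wrapping state without dropping the qualifier at the call site. The following is a minimal userspace sketch of that pattern. The struct names, to_outer_state_const() and the local container_of macro are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's container_of(). This is an assumption
 * for illustration and lacks the kernel macro's type checking.
 */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical "base" object embedded in a larger driver-private struct. */
struct base_state {
    int enable;
};

/* Hypothetical wrapper, analogous in shape to struct nv_drm_crtc_state. */
struct outer_state {
    int extra;
    struct base_state base;
};

/*
 * Recover the wrapper from a const pointer to its embedded member. The
 * result is returned as a pointer-to-const, so the caller cannot modify
 * the wrapper through it.
 */
static inline const struct outer_state *
to_outer_state_const(const struct base_state *state)
{
    return container_of(state, struct outer_state, base);
}
```

The pointer arithmetic is the same as in the non-const helper; only the return type differs, which lets the compiler reject accidental writes through the result.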
--- a/kernel-open/nvidia-drm/nvidia-drm-drv.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-drv.c
@@ -64,12 +64,14 @@
 #include <drm/drm_ioctl.h>
 #endif

-#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
+#if defined(NV_DRM_FBDEV_AVAILABLE)
 #include <drm/drm_aperture.h>
 #include <drm/drm_fb_helper.h>
 #endif

-#if defined(NV_DRM_DRM_FBDEV_GENERIC_H_PRESENT)
+#if defined(NV_DRM_DRM_FBDEV_TTM_H_PRESENT)
+#include <drm/drm_fbdev_ttm.h>
+#elif defined(NV_DRM_DRM_FBDEV_GENERIC_H_PRESENT)
 #include <drm/drm_fbdev_generic.h>
 #endif

@@ -105,16 +107,16 @@ static int nv_drm_revoke_sub_ownership(struct drm_device *dev);

 static struct nv_drm_device *dev_list = NULL;

-static const char* nv_get_input_colorspace_name(
+static char* nv_get_input_colorspace_name(
    enum NvKmsInputColorSpace colorSpace)
 {
    switch (colorSpace) {
        case NVKMS_INPUT_COLORSPACE_NONE:
            return "None";
        case NVKMS_INPUT_COLORSPACE_SCRGB_LINEAR:
-            return "IEC 61966-2-2 linear FP";
+            return "scRGB Linear FP16";
        case NVKMS_INPUT_COLORSPACE_BT2100_PQ:
-            return "ITU-R BT.2100-PQ YCbCr";
+            return "BT.2100 PQ";
        default:
            /* We shoudn't hit this */
            WARN_ON("Unsupported input colorspace");
@@ -122,8 +124,30 @@ static const char* nv_get_input_colorspace_name(
    }
 };

+static char* nv_get_transfer_function_name(
+    enum nv_drm_transfer_function tf)
+{
+    switch (tf) {
+        case NV_DRM_TRANSFER_FUNCTION_LINEAR:
+            return "Linear";
+        case NV_DRM_TRANSFER_FUNCTION_PQ:
+            return "PQ (Perceptual Quantizer)";
+        default:
+            /* We shouldn't hit this */
+            WARN_ON("Unsupported transfer function");
+#if defined(fallthrough)
+            fallthrough;
+#else
+            /* Fallthrough */
+#endif
+        case NV_DRM_TRANSFER_FUNCTION_DEFAULT:
+            return "Default";
+    }
+};
+
 #if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)

+#if defined(NV_DRM_OUTPUT_POLL_CHANGED_PRESENT)
 static void nv_drm_output_poll_changed(struct drm_device *dev)
 {
    struct drm_connector *connector = NULL;
@@ -167,6 +191,7 @@ static void nv_drm_output_poll_changed(struct drm_device *dev)
    nv_drm_connector_list_iter_end(&conn_iter);
 #endif
 }
+#endif /* NV_DRM_OUTPUT_POLL_CHANGED_PRESENT */

 static struct drm_framebuffer *nv_drm_framebuffer_create(
    struct drm_device *dev,
@@ -204,7 +229,9 @@ static const struct drm_mode_config_funcs nv_mode_config_funcs = {
    .atomic_check  = nv_drm_atomic_check,
    .atomic_commit = nv_drm_atomic_commit,

+    #if defined(NV_DRM_OUTPUT_POLL_CHANGED_PRESENT)
    .output_poll_changed = nv_drm_output_poll_changed,
+    #endif
 };

 static void nv_drm_event_callback(const struct NvKmsKapiEvent *event)
@@ -364,31 +391,33 @@ static void nv_drm_enumerate_encoders_and_connectors
 */
 static int nv_drm_create_properties(struct nv_drm_device *nv_dev)
 {
-    struct drm_prop_enum_list enum_list[3] = { };
+    struct drm_prop_enum_list colorspace_enum_list[3] = { };
+    struct drm_prop_enum_list tf_enum_list[NV_DRM_TRANSFER_FUNCTION_MAX] = { };
    int i, len = 0;

    for (i = 0; i < 3; i++) {
-        enum_list[len].type = i;
-        enum_list[len].name = nv_get_input_colorspace_name(i);
+        colorspace_enum_list[len].type = i;
+        colorspace_enum_list[len].name = nv_get_input_colorspace_name(i);
        len++;
    }

-#if defined(NV_LINUX_NVHOST_H_PRESENT) && defined(CONFIG_TEGRA_GRHOST)
-    if (!nv_dev->supportsSyncpts) {
-        return 0;
+    for (i = 0; i < NV_DRM_TRANSFER_FUNCTION_MAX; i++) {
+        tf_enum_list[i].type = i;
+        tf_enum_list[i].name = nv_get_transfer_function_name(i);
    }

-    nv_dev->nv_out_fence_property =
-        drm_property_create_range(nv_dev->dev, DRM_MODE_PROP_ATOMIC,
-            "NV_DRM_OUT_FENCE_PTR", 0, U64_MAX);
-    if (nv_dev->nv_out_fence_property == NULL) {
-        return -ENOMEM;
+    if (nv_dev->supportsSyncpts) {
+        nv_dev->nv_out_fence_property =
+            drm_property_create_range(nv_dev->dev, DRM_MODE_PROP_ATOMIC,
+                    "NV_DRM_OUT_FENCE_PTR", 0, U64_MAX);
+        if (nv_dev->nv_out_fence_property == NULL) {
+            return -ENOMEM;
+        }
    }
-#endif

    nv_dev->nv_input_colorspace_property =
        drm_property_create_enum(nv_dev->dev, 0, "NV_INPUT_COLORSPACE",
-                                 enum_list, len);
+                                 colorspace_enum_list, len);
    if (nv_dev->nv_input_colorspace_property == NULL) {
        NV_DRM_LOG_ERR("Failed to create NV_INPUT_COLORSPACE property");
        return -ENOMEM;
@@ -403,6 +432,109 @@ static int nv_drm_create_properties(struct nv_drm_device *nv_dev)
    }
 #endif

+    nv_dev->nv_plane_lms_ctm_property =
+        drm_property_create(nv_dev->dev, DRM_MODE_PROP_BLOB,
+            "NV_PLANE_LMS_CTM", 0);
+    if (nv_dev->nv_plane_lms_ctm_property == NULL) {
+        return -ENOMEM;
+    }
+
+    nv_dev->nv_plane_lms_to_itp_ctm_property =
+        drm_property_create(nv_dev->dev, DRM_MODE_PROP_BLOB,
+            "NV_PLANE_LMS_TO_ITP_CTM", 0);
+    if (nv_dev->nv_plane_lms_to_itp_ctm_property == NULL) {
+        return -ENOMEM;
+    }
+
+    nv_dev->nv_plane_itp_to_lms_ctm_property =
+        drm_property_create(nv_dev->dev, DRM_MODE_PROP_BLOB,
+            "NV_PLANE_ITP_TO_LMS_CTM", 0);
+    if (nv_dev->nv_plane_itp_to_lms_ctm_property == NULL) {
+        return -ENOMEM;
+    }
+
+    nv_dev->nv_plane_blend_ctm_property =
+        drm_property_create(nv_dev->dev, DRM_MODE_PROP_BLOB,
+            "NV_PLANE_BLEND_CTM", 0);
+    if (nv_dev->nv_plane_blend_ctm_property == NULL) {
+        return -ENOMEM;
+    }
+
+    // Degamma TF + LUT + LUT Size + Multiplier
+
+    nv_dev->nv_plane_degamma_tf_property =
+        drm_property_create_enum(nv_dev->dev, 0,
+            "NV_PLANE_DEGAMMA_TF", tf_enum_list,
+            NV_DRM_TRANSFER_FUNCTION_MAX);
+    if (nv_dev->nv_plane_degamma_tf_property == NULL) {
+        return -ENOMEM;
+    }
+    nv_dev->nv_plane_degamma_lut_property =
+        drm_property_create(nv_dev->dev, DRM_MODE_PROP_BLOB,
+            "NV_PLANE_DEGAMMA_LUT", 0);
+    if (nv_dev->nv_plane_degamma_lut_property == NULL) {
+        return -ENOMEM;
+    }
+    nv_dev->nv_plane_degamma_lut_size_property =
+        drm_property_create_range(nv_dev->dev, DRM_MODE_PROP_IMMUTABLE,
+            "NV_PLANE_DEGAMMA_LUT_SIZE", 0, UINT_MAX);
+    if (nv_dev->nv_plane_degamma_lut_size_property == NULL) {
+        return -ENOMEM;
+    }
+    nv_dev->nv_plane_degamma_multiplier_property =
+        drm_property_create_range(nv_dev->dev, 0,
+            "NV_PLANE_DEGAMMA_MULTIPLIER", 0,
+            U64_MAX & ~(((NvU64) 1) << 63)); // No negative values
+    if (nv_dev->nv_plane_degamma_multiplier_property == NULL) {
+        return -ENOMEM;
+    }
+
+    // TMO LUT + LUT Size
+
+    nv_dev->nv_plane_tmo_lut_property =
+        drm_property_create(nv_dev->dev, DRM_MODE_PROP_BLOB,
+            "NV_PLANE_TMO_LUT", 0);
+    if (nv_dev->nv_plane_tmo_lut_property == NULL) {
+        return -ENOMEM;
+    }
+    nv_dev->nv_plane_tmo_lut_size_property =
+        drm_property_create_range(nv_dev->dev, DRM_MODE_PROP_IMMUTABLE,
+            "NV_PLANE_TMO_LUT_SIZE", 0, UINT_MAX);
+    if (nv_dev->nv_plane_tmo_lut_size_property == NULL) {
+        return -ENOMEM;
+    }
+
+    // REGAMMA TF + LUT + LUT Size + Divisor
+
+    nv_dev->nv_crtc_regamma_tf_property =
+        drm_property_create_enum(nv_dev->dev, 0,
+            "NV_CRTC_REGAMMA_TF", tf_enum_list,
+            NV_DRM_TRANSFER_FUNCTION_MAX);
+    if (nv_dev->nv_crtc_regamma_tf_property == NULL) {
+        return -ENOMEM;
+    }
+    nv_dev->nv_crtc_regamma_lut_property =
+        drm_property_create(nv_dev->dev, DRM_MODE_PROP_BLOB,
+            "NV_CRTC_REGAMMA_LUT", 0);
+    if (nv_dev->nv_crtc_regamma_lut_property == NULL) {
+        return -ENOMEM;
+    }
+    nv_dev->nv_crtc_regamma_lut_size_property =
+        drm_property_create_range(nv_dev->dev, DRM_MODE_PROP_IMMUTABLE,
+            "NV_CRTC_REGAMMA_LUT_SIZE", 0, UINT_MAX);
+    if (nv_dev->nv_crtc_regamma_lut_size_property == NULL) {
+        return -ENOMEM;
+    }
+    // S31.32
+    nv_dev->nv_crtc_regamma_divisor_property =
+        drm_property_create_range(nv_dev->dev, 0,
+            "NV_CRTC_REGAMMA_DIVISOR",
+            (((NvU64) 1) << 32), // No values between 0 and 1
+            U64_MAX & ~(((NvU64) 1) << 63)); // No negative values
+    if (nv_dev->nv_crtc_regamma_divisor_property == NULL) {
+        return -ENOMEM;
+    }
+
    return 0;
 }

@@ -434,7 +566,7 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)

    struct NvKmsKapiAllocateDeviceParams allocateDeviceParams;
    struct NvKmsKapiDeviceResourcesInfo resInfo;
-#endif
+#endif /* defined(NV_DRM_ATOMIC_MODESET_AVAILABLE) */
 #if defined(NV_DRM_FORMAT_MODIFIERS_PRESENT)
    NvU64 kind;
    NvU64 gen;
@@ -480,6 +612,22 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)
        return -ENODEV;
    }

+#if defined(NV_DRM_FBDEV_AVAILABLE)
+    /*
+     * If fbdev is enabled, take modeset ownership now before other DRM clients
+     * can take master (and thus NVKMS ownership).
+     */
+    if (nv_drm_fbdev_module_param) {
+        if (!nvKms->grabOwnership(pDevice)) {
+            nvKms->freeDevice(pDevice);
+            NV_DRM_DEV_LOG_ERR(nv_dev, "Failed to grab NVKMS modeset ownership");
+            return -EBUSY;
+        }
+
+        nv_dev->hasFramebufferConsole = NV_TRUE;
+    }
+#endif
+
    mutex_lock(&nv_dev->lock);

    /* Set NvKmsKapiDevice */
@@ -505,6 +653,12 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)
    nv_dev->semsurf_max_submitted_offset =
        resInfo.caps.semsurf.maxSubmittedOffset;

+    nv_dev->display_semaphores.count =
+        resInfo.caps.numDisplaySemaphores;
+    nv_dev->display_semaphores.next_index = 0;
+
+    nv_dev->requiresVrrSemaphores = resInfo.caps.requiresVrrSemaphores;
+
 #if defined(NV_DRM_FORMAT_MODIFIERS_PRESENT)
    gen = nv_dev->pageKindGeneration;
    kind = nv_dev->genericPageKind;
@@ -530,6 +684,13 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)

    ret = nv_drm_create_properties(nv_dev);
    if (ret < 0) {
+        drm_mode_config_cleanup(dev);
+#if defined(NV_DRM_FBDEV_AVAILABLE)
+        if (nv_dev->hasFramebufferConsole) {
+            nvKms->releaseOwnership(nv_dev->pDevice);
+        }
+#endif
+        nvKms->freeDevice(nv_dev->pDevice);
        return -ENODEV;
    }

@@ -590,6 +751,15 @@ static void __nv_drm_unload(struct drm_device *dev)
        return;
    }

+    /* Release modeset ownership if fbdev is enabled */
+
+#if defined(NV_DRM_FBDEV_AVAILABLE)
+    if (nv_dev->hasFramebufferConsole) {
+        drm_atomic_helper_shutdown(dev);
+        nvKms->releaseOwnership(nv_dev->pDevice);
+    }
+#endif
+
    cancel_delayed_work_sync(&nv_dev->hotplug_event_work);
    mutex_lock(&nv_dev->lock);

@@ -652,7 +822,6 @@ static int __nv_drm_master_set(struct drm_device *dev,
        !nvKms->grabOwnership(nv_dev->pDevice)) {
        return -EINVAL;
    }
-    nv_dev->drmMasterChangedSinceLastAtomicCommit = NV_TRUE;

    return 0;
 }
@@ -684,36 +853,37 @@ void nv_drm_master_drop(struct drm_device *dev, struct drm_file *file_priv)
 #endif
 {
    struct nv_drm_device *nv_dev = to_nv_device(dev);
-    int err;

    nv_drm_revoke_modeset_permission(dev, file_priv, 0);
    nv_drm_revoke_sub_ownership(dev);

-    /*
-     * After dropping nvkms modeset onwership, it is not guaranteed that
-     * drm and nvkms modeset state will remain in sync.  Therefore, disable
-     * all outputs and crtcs before dropping nvkms modeset ownership.
-     *
-     * First disable all active outputs atomically and then disable each crtc one
-     * by one, there is not helper function available to disable all crtcs
-     * atomically.
-     */
-
-    drm_modeset_lock_all(dev);
-
-    if ((err = nv_drm_atomic_helper_disable_all(
-            dev,
-            dev->mode_config.acquire_ctx)) != 0) {
-
-        NV_DRM_DEV_LOG_ERR(
-            nv_dev,
-            "nv_drm_atomic_helper_disable_all failed with error code %d !",
-            err);
-    }
-
-    drm_modeset_unlock_all(dev);
-
    if (!nv_dev->hasFramebufferConsole) {
+        int err;
+
+        /*
+         * After dropping nvkms modeset ownership, it is not guaranteed that drm
+         * and nvkms modeset state will remain in sync.  Therefore, disable all
+         * outputs and crtcs before dropping nvkms modeset ownership.
+         *
+         * First disable all active outputs atomically and then disable each
+         * crtc one by one; there is no helper function available to disable
+         * all crtcs atomically.
+         */
+
+        drm_modeset_lock_all(dev);
+
+        if ((err = nv_drm_atomic_helper_disable_all(
+                dev,
+                dev->mode_config.acquire_ctx)) != 0) {
+
+            NV_DRM_DEV_LOG_ERR(
+                nv_dev,
+                "nv_drm_atomic_helper_disable_all failed with error code %d !",
+                err);
+        }
+
+        drm_modeset_unlock_all(dev);
+
        nvKms->releaseOwnership(nv_dev->pDevice);
    }
 }
@@ -781,6 +951,14 @@ static int nv_drm_get_dev_info_ioctl(struct drm_device *dev,
    return 0;
 }

+static int nv_drm_get_drm_file_unique_id_ioctl(struct drm_device *dev,
+                                               void *data, struct drm_file *filep)
+{
+    struct drm_nvidia_get_drm_file_unique_id_params *params = data;
+    params->id = (u64)(filep->driver_priv);
+    return 0;
+}
+
 static int nv_drm_dmabuf_supported_ioctl(struct drm_device *dev,
                                         void *data, struct drm_file *filep)
 {
@@ -834,13 +1012,18 @@ static int nv_drm_get_dpy_id_for_connector_id_ioctl(struct drm_device *dev,
                                                    struct drm_file *filep)
 {
    struct drm_nvidia_get_dpy_id_for_connector_id_params *params = data;
+    struct drm_connector *connector;
+    struct nv_drm_connector *nv_connector;
+    int ret = 0;
+
+    if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
+        return -EOPNOTSUPP;
+    }
+
    // Importantly, drm_connector_lookup (with filep) will only return the
    // connector if we are master, a lessee with the connector, or not master at
    // all. It will return NULL if we are a lessee with other connectors.
-    struct drm_connector *connector =
-        nv_drm_connector_lookup(dev, filep, params->connectorId);
-    struct nv_drm_connector *nv_connector;
-    int ret = 0;
+    connector = nv_drm_connector_lookup(dev, filep, params->connectorId);

    if (!connector) {
        return -EINVAL;
@@ -873,6 +1056,11 @@ static int nv_drm_get_connector_id_for_dpy_id_ioctl(struct drm_device *dev,
    int ret = -EINVAL;
 #if defined(NV_DRM_CONNECTOR_LIST_ITER_PRESENT)
    struct drm_connector_list_iter conn_iter;
+#endif
+    if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
+        return -EOPNOTSUPP;
+    }
+#if defined(NV_DRM_CONNECTOR_LIST_ITER_PRESENT)
    nv_drm_connector_list_iter_begin(dev, &conn_iter);
 #endif

@@ -1085,6 +1273,10 @@ static int nv_drm_grant_permission_ioctl(struct drm_device *dev, void *data,
 {
    struct drm_nvidia_grant_permissions_params *params = data;

+    if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
+        return -EOPNOTSUPP;
+    }
+
    if (params->type == NV_DRM_PERMISSIONS_TYPE_MODESET) {
        return nv_drm_grant_modeset_permission(dev, params, filep);
    } else if (params->type == NV_DRM_PERMISSIONS_TYPE_SUB_OWNER) {
@@ -1250,6 +1442,10 @@ static int nv_drm_revoke_permission_ioctl(struct drm_device *dev, void *data,
 {
    struct drm_nvidia_revoke_permissions_params *params = data;

+    if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
+        return -EOPNOTSUPP;
+    }
+
    if (params->type == NV_DRM_PERMISSIONS_TYPE_MODESET) {
        if (!params->dpyId) {
            return -EINVAL;
@@ -1279,6 +1475,17 @@ static void nv_drm_postclose(struct drm_device *dev, struct drm_file *filep)
 }
 #endif /* NV_DRM_ATOMIC_MODESET_AVAILABLE */

+static int nv_drm_open(struct drm_device *dev, struct drm_file *filep)
+{
+    _Static_assert(sizeof(filep->driver_priv) >= sizeof(u64),
+                   "filep->driver_priv cannot hold a u64");
+    static atomic64_t id = ATOMIC_INIT(0);
+
+    filep->driver_priv = (void *)atomic64_inc_return(&id);
+
+    return 0;
+}
+
 #if defined(NV_DRM_MASTER_HAS_LEASES)
 static struct drm_master *nv_drm_find_lessee(struct drm_master *master,
                                             int lessee_id)
@@ -1504,6 +1711,10 @@ static const struct file_operations nv_drm_fops = {
    .read           = drm_read,

    .llseek         = noop_llseek,
+
+#if defined(NV_FILE_OPERATIONS_FOP_UNSIGNED_OFFSET_PRESENT)
+    .fop_flags   = FOP_UNSIGNED_OFFSET,
+#endif
 };

 static const struct drm_ioctl_desc nv_drm_ioctls[] = {
@@ -1522,6 +1733,9 @@ static const struct drm_ioctl_desc nv_drm_ioctls[] = {
    DRM_IOCTL_DEF_DRV(NVIDIA_GET_DEV_INFO,
                      nv_drm_get_dev_info_ioctl,
                      DRM_RENDER_ALLOW|DRM_UNLOCKED),
+    DRM_IOCTL_DEF_DRV(NVIDIA_GET_DRM_FILE_UNIQUE_ID,
+                      nv_drm_get_drm_file_unique_id_ioctl,
+                      DRM_RENDER_ALLOW|DRM_UNLOCKED),

 #if defined(NV_DRM_FENCE_AVAILABLE)
    DRM_IOCTL_DEF_DRV(NVIDIA_FENCE_SUPPORTED,
@@ -1604,6 +1818,9 @@ static struct drm_driver nv_drm_driver = {
    .driver_features        =
 #if defined(NV_DRM_DRIVER_PRIME_FLAG_PRESENT)
                               DRIVER_PRIME |
+#endif
+#if defined(NV_DRM_SYNCOBJ_FEATURES_PRESENT)
+                               DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE |
 #endif
                               DRIVER_GEM  | DRIVER_RENDER,

@@ -1615,14 +1832,19 @@ static struct drm_driver nv_drm_driver = {
    .num_ioctls             = ARRAY_SIZE(nv_drm_ioctls),

 /*
- * linux-next commit 71a7974ac701 ("drm/prime: Unexport helpers for fd/handle
- * conversion") unexports drm_gem_prime_handle_to_fd() and
- * drm_gem_prime_fd_to_handle().
+ * Linux kernel v6.6 commit 6b85aa68d9d5 ("drm: Enable PRIME import/export for
+ * all drivers") made drm_gem_prime_handle_to_fd() /
+ * drm_gem_prime_fd_to_handle() the default when .prime_handle_to_fd /
+ * .prime_fd_to_handle are unspecified, respectively.
 *
- * Prior linux-next commit 6b85aa68d9d5 ("drm: Enable PRIME import/export for
- * all drivers") made these helpers the default when .prime_handle_to_fd /
- * .prime_fd_to_handle are unspecified, so it's fine to just skip specifying
- * them if the helpers aren't present.
+ * Linux kernel v6.6 commit 71a7974ac701 ("drm/prime: Unexport helpers for
+ * fd/handle conversion") unexports drm_gem_prime_handle_to_fd() and
+ * drm_gem_prime_fd_to_handle(). However, because of the aforementioned commit,
+ * it's fine to just skip specifying them in this case.
+ *
+ * Linux kernel v6.7 commit 0514f63cfff3 ("Revert "drm/prime: Unexport helpers
+ * for fd/handle conversion"") exported the helpers again, but left the default
+ * behavior intact. Nonetheless, it does not hurt to specify them.
 */
 #if NV_IS_EXPORT_SYMBOL_PRESENT_drm_gem_prime_handle_to_fd
    .prime_handle_to_fd     = drm_gem_prime_handle_to_fd,
@@ -1634,6 +1856,21 @@ static struct drm_driver nv_drm_driver = {
    .gem_prime_import       = nv_drm_gem_prime_import,
    .gem_prime_import_sg_table = nv_drm_gem_prime_import_sg_table,

+/*
+ * Linux kernel v5.0 commit 7698799f95 ("drm/prime: Add drm_gem_prime_mmap()")
+ * added drm_gem_prime_mmap().
+ *
+ * Linux kernel v6.6 commit 0adec22702d4 ("drm: Remove struct
+ * drm_driver.gem_prime_mmap") removed .gem_prime_mmap, but replaced it with a
+ * direct call to drm_gem_prime_mmap().
+ *
+ * TODO: Support .gem_prime_mmap on Linux < v5.0 using internal implementation.
+ */
+#if defined(NV_DRM_GEM_PRIME_MMAP_PRESENT) && \
+    defined(NV_DRM_DRIVER_HAS_GEM_PRIME_MMAP)
+    .gem_prime_mmap         = drm_gem_prime_mmap,
+#endif
+
 #if defined(NV_DRM_DRIVER_HAS_GEM_PRIME_CALLBACKS)
    .gem_prime_export       = drm_gem_prime_export,
    .gem_prime_get_sg_table = nv_drm_gem_prime_get_sg_table,
@@ -1656,6 +1893,7 @@ static struct drm_driver nv_drm_driver = {
 #if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
    .postclose              = nv_drm_postclose,
 #endif
+    .open                   = nv_drm_open,

    .fops                   = &nv_drm_fops,

@@ -1714,6 +1952,7 @@ void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
    struct nv_drm_device *nv_dev = NULL;
    struct drm_device *dev = NULL;
    struct device *device = gpu_info->os_device_ptr;
+    bool bus_is_pci;

    DRM_DEBUG(
        "Registering device for NVIDIA GPU ID 0x08%x",
@@ -1747,7 +1986,7 @@ void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
    dev->dev_private = nv_dev;
    nv_dev->dev = dev;

-    bool bus_is_pci =
+    bus_is_pci =
 #if defined(NV_LINUX)
        device->bus == &pci_bus_type;
 #elif defined(NV_BSD)
@@ -1767,15 +2006,10 @@ void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
        goto failed_drm_register;
    }

-#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
+#if defined(NV_DRM_FBDEV_AVAILABLE)
    if (nv_drm_fbdev_module_param &&
        drm_core_check_feature(dev, DRIVER_MODESET)) {

-        if (!nvKms->grabOwnership(nv_dev->pDevice)) {
-            NV_DRM_DEV_LOG_ERR(nv_dev, "Failed to grab NVKMS modeset ownership");
-            goto failed_grab_ownership;
-        }
-
        if (bus_is_pci) {
            struct pci_dev *pdev = to_pci_dev(device);

@@ -1784,12 +2018,15 @@ void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
 #else
            drm_aperture_remove_conflicting_pci_framebuffers(pdev, nv_drm_driver.name);
 #endif
+            nvKms->framebufferConsoleDisabled(nv_dev->pDevice);
        }
+        #if defined(NV_DRM_FBDEV_TTM_AVAILABLE)
+        drm_fbdev_ttm_setup(dev, 32);
+        #elif defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
        drm_fbdev_generic_setup(dev, 32);
-
-        nv_dev->hasFramebufferConsole = NV_TRUE;
+        #endif
    }
-#endif /* defined(NV_DRM_FBDEV_GENERIC_AVAILABLE) */
+#endif /* defined(NV_DRM_FBDEV_AVAILABLE) */

    /* Add NVIDIA-DRM device into list */

@@ -1798,12 +2035,6 @@ void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)

    return; /* Success */

-#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
-failed_grab_ownership:
-
-    drm_dev_unregister(dev);
-#endif
-
 failed_drm_register:

    nv_drm_dev_free(dev);
@@ -1870,12 +2101,6 @@ void nv_drm_remove_devices(void)
        struct nv_drm_device *next = dev_list->next;
        struct drm_device *dev = dev_list->dev;

-#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
-        if (dev_list->hasFramebufferConsole) {
-            drm_atomic_helper_shutdown(dev);
-            nvKms->releaseOwnership(dev_list->pDevice);
-        }
-#endif
        drm_dev_unregister(dev);
        nv_drm_dev_free(dev);

@@ -1943,12 +2168,12 @@ void nv_drm_suspend_resume(NvBool suspend)

        if (suspend) {
            drm_kms_helper_poll_disable(dev);
-#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
+#if defined(NV_DRM_FBDEV_AVAILABLE)
            drm_fb_helper_set_suspend_unlocked(dev->fb_helper, 1);
 #endif
            drm_mode_config_reset(dev);
        } else {
-#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
+#if defined(NV_DRM_FBDEV_AVAILABLE)
            drm_fb_helper_set_suspend_unlocked(dev->fb_helper, 0);
 #endif
            drm_kms_helper_poll_enable(dev);
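Note on the property ranges created in nv_drm_create_properties() above: NV_PLANE_DEGAMMA_MULTIPLIER and NV_CRTC_REGAMMA_DIVISOR are described as S31.32 sign-magnitude values, with bit 63 masked out of the upper bound ("No negative values") and, for the divisor, a lower bound of 1 << 32 ("No values between 0 and 1"). The sketch below shows how such values could be encoded and range-checked in userspace. All helper and macro names are hypothetical, and the saturation and NaN handling are assumptions of this sketch, not behavior taken from the driver.

```c
#include <stdint.h>

/* Hypothetical names for illustration; none of these exist in the driver. */
#define S31_32_SIGN_BIT (UINT64_C(1) << 63)             /* sign in bit 63 */
#define S31_32_ONE      (UINT64_C(1) << 32)             /* encodes 1.0 */
#define S31_32_MAX_POS  (UINT64_MAX & ~S31_32_SIGN_BIT) /* largest positive */

/*
 * Encode a double as S31.32 sign-magnitude: bit 63 holds the sign and
 * bits 62..0 hold the magnitude scaled by 2^32. Out-of-range magnitudes
 * saturate and NaN maps to 0 (both are choices made for this sketch).
 */
static inline uint64_t s31_32_from_double(double v)
{
    uint64_t sign = 0;
    double scaled;

    if (v != v) {
        return 0; /* NaN */
    }
    if (v < 0.0) {
        sign = S31_32_SIGN_BIT;
        v = -v;
    }
    scaled = v * 4294967296.0; /* 2^32 */
    if (scaled >= 9223372036854775808.0) { /* 2^63 */
        return sign | S31_32_MAX_POS;
    }
    return sign | (uint64_t)scaled;
}

/* Decode an S31.32 sign-magnitude value back to a double. */
static inline double s31_32_to_double(uint64_t raw)
{
    double mag = (double)(raw & S31_32_MAX_POS) / 4294967296.0;

    return (raw & S31_32_SIGN_BIT) ? -mag : mag;
}

/* Mirrors the NV_CRTC_REGAMMA_DIVISOR range: 1.0 up to the largest positive. */
static inline int s31_32_is_valid_divisor(uint64_t raw)
{
    return raw >= S31_32_ONE && raw <= S31_32_MAX_POS;
}
```

Because the sign lives in the top bit, any negative encoding compares greater than S31_32_MAX_POS as an unsigned integer, so a plain unsigned range check is enough to exclude negative values.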
--- a/kernel-open/nvidia-drm/nvidia-drm-fb.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-fb.c
@@ -36,12 +36,15 @@

 static void __nv_drm_framebuffer_free(struct nv_drm_framebuffer *nv_fb)
 {
+    struct drm_framebuffer *fb = &nv_fb->base;
    uint32_t i;

    /* Unreference gem object */
-    for (i = 0; i < ARRAY_SIZE(nv_fb->nv_gem); i++) {
-        if (nv_fb->nv_gem[i] != NULL) {
-            nv_drm_gem_object_unreference_unlocked(nv_fb->nv_gem[i]);
+    for (i = 0; i < NVKMS_MAX_PLANES_PER_SURFACE; i++) {
+        struct drm_gem_object *gem = nv_fb_get_gem_obj(fb, i);
+        if (gem != NULL) {
+            struct nv_drm_gem_object *nv_gem = to_nv_gem_object(gem);
+            nv_drm_gem_object_unreference_unlocked(nv_gem);
        }
    }

@@ -69,10 +72,8 @@ static int
 nv_drm_framebuffer_create_handle(struct drm_framebuffer *fb,
                                 struct drm_file *file, unsigned int *handle)
 {
-    struct nv_drm_framebuffer *nv_fb = to_nv_framebuffer(fb);
-
    return nv_drm_gem_handle_create(file,
-                                    nv_fb->nv_gem[0],
+                                    to_nv_gem_object(nv_fb_get_gem_obj(fb, 0)),
                                    handle);
 }

@@ -88,6 +89,7 @@ static struct nv_drm_framebuffer *nv_drm_framebuffer_alloc(
 {
    struct nv_drm_device *nv_dev = to_nv_device(dev);
    struct nv_drm_framebuffer *nv_fb;
+    struct nv_drm_gem_object *nv_gem;
    const int num_planes = nv_drm_format_num_planes(cmd->pixel_format);
    uint32_t i;

@@ -101,21 +103,22 @@ static struct nv_drm_framebuffer *nv_drm_framebuffer_alloc(
        return ERR_PTR(-ENOMEM);
    }

-    if (num_planes > ARRAY_SIZE(nv_fb->nv_gem)) {
+    if (num_planes > NVKMS_MAX_PLANES_PER_SURFACE) {
        NV_DRM_DEV_DEBUG_DRIVER(nv_dev, "Unsupported number of planes");
        goto failed;
    }

    for (i = 0; i < num_planes; i++) {
-        if ((nv_fb->nv_gem[i] = nv_drm_gem_object_lookup(
-                        dev,
-                        file,
-                        cmd->handles[i])) == NULL) {
+        nv_gem = nv_drm_gem_object_lookup(dev, file, cmd->handles[i]);
+
+        if (nv_gem == NULL) {
            NV_DRM_DEV_DEBUG_DRIVER(
                nv_dev,
                "Failed to find gem object of type nvkms memory");
            goto failed;
        }
+
+        nv_fb_set_gem_obj(&nv_fb->base, i, &nv_gem->base);
    }

     return nv_fb;
@@ -135,12 +138,14 @@ static int nv_drm_framebuffer_init(struct drm_device *dev,
 {
    struct nv_drm_device *nv_dev = to_nv_device(dev);
    struct NvKmsKapiCreateSurfaceParams params = { };
+    struct nv_drm_gem_object *nv_gem;
+    struct drm_framebuffer *fb = &nv_fb->base;
    uint32_t i;
    int ret;

    /* Initialize the base framebuffer object and add it to drm subsystem */

-    ret = drm_framebuffer_init(dev, &nv_fb->base, &nv_framebuffer_funcs);
+    ret = drm_framebuffer_init(dev, fb, &nv_framebuffer_funcs);
    if (ret != 0) {
        NV_DRM_DEV_DEBUG_DRIVER(
            nv_dev,
@@ -148,23 +153,18 @@ static int nv_drm_framebuffer_init(struct drm_device *dev,
        return ret;
    }

-    for (i = 0; i < ARRAY_SIZE(nv_fb->nv_gem); i++) {
-        if (nv_fb->nv_gem[i] != NULL) {
-            if (!nvKms->isMemoryValidForDisplay(nv_dev->pDevice,
-                                                nv_fb->nv_gem[i]->pMemory)) {
-                NV_DRM_DEV_LOG_INFO(
-                        nv_dev,
-                        "Framebuffer memory not appropriate for scanout");
-                goto fail;
-            }
+    for (i = 0; i < NVKMS_MAX_PLANES_PER_SURFACE; i++) {
+        struct drm_gem_object *gem = nv_fb_get_gem_obj(fb, i);
+        if (gem != NULL) {
+            nv_gem = to_nv_gem_object(gem);

-            params.planes[i].memory = nv_fb->nv_gem[i]->pMemory;
-            params.planes[i].offset = nv_fb->base.offsets[i];
-            params.planes[i].pitch = nv_fb->base.pitches[i];
+            params.planes[i].memory = nv_gem->pMemory;
+            params.planes[i].offset = fb->offsets[i];
+            params.planes[i].pitch = fb->pitches[i];
        }
    }
-    params.height = nv_fb->base.height;
-    params.width = nv_fb->base.width;
+    params.height = fb->height;
+    params.width = fb->width;
    params.format = format;

    if (have_modifier) {
@@ -199,7 +199,7 @@ static int nv_drm_framebuffer_init(struct drm_device *dev,
    return 0;

 fail:
-    drm_framebuffer_cleanup(&nv_fb->base);
+    drm_framebuffer_cleanup(fb);
    return -EINVAL;
 }

--- a/kernel-open/nvidia-drm/nvidia-drm-fb.h
+++ b/kernel-open/nvidia-drm/nvidia-drm-fb.h
@@ -41,8 +41,10 @@
 struct nv_drm_framebuffer {
    struct NvKmsKapiSurface *pSurface;

-    struct nv_drm_gem_object*
-        nv_gem[NVKMS_MAX_PLANES_PER_SURFACE];
+#if !defined(NV_DRM_FRAMEBUFFER_OBJ_PRESENT)
+    struct drm_gem_object*
+        obj[NVKMS_MAX_PLANES_PER_SURFACE];
+#endif

    struct drm_framebuffer base;
 };
@@ -56,6 +58,29 @@ static inline struct nv_drm_framebuffer *to_nv_framebuffer(
    return container_of(fb, struct nv_drm_framebuffer, base);
 }

+static inline struct drm_gem_object *nv_fb_get_gem_obj(
+    struct drm_framebuffer *fb,
+    uint32_t plane)
+{
+#if defined(NV_DRM_FRAMEBUFFER_OBJ_PRESENT)
+    return fb->obj[plane];
+#else
+    return to_nv_framebuffer(fb)->obj[plane];
+#endif
+}
+
+static inline void nv_fb_set_gem_obj(
+    struct drm_framebuffer *fb,
+    uint32_t plane,
+    struct drm_gem_object *obj)
+{
+#if defined(NV_DRM_FRAMEBUFFER_OBJ_PRESENT)
+    fb->obj[plane] = obj;
+#else
+    to_nv_framebuffer(fb)->obj[plane] = obj;
+#endif
+}
+
 struct drm_framebuffer *nv_drm_internal_framebuffer_create(
    struct drm_device *dev,
    struct drm_file *file,
--- a/kernel-open/nvidia-drm/nvidia-drm-fence.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-fence.c
@@ -293,14 +293,12 @@ __nv_drm_prime_fence_context_new(
     * to check a return value.
     */

-    *nv_prime_fence_context = (struct nv_drm_prime_fence_context) {
-        .base.ops = &nv_drm_prime_fence_context_ops,
-        .base.nv_dev = nv_dev,
-        .base.context = nv_dma_fence_context_alloc(1),
-        .base.fenceSemIndex = p->index,
-        .pSemSurface = pSemSurface,
-        .pLinearAddress = pLinearAddress,
-    };
+    nv_prime_fence_context->base.ops = &nv_drm_prime_fence_context_ops;
+    nv_prime_fence_context->base.nv_dev = nv_dev;
+    nv_prime_fence_context->base.context = nv_dma_fence_context_alloc(1);
+    nv_prime_fence_context->base.fenceSemIndex = p->index;
+    nv_prime_fence_context->pSemSurface = pSemSurface;
+    nv_prime_fence_context->pLinearAddress = pLinearAddress;

    INIT_LIST_HEAD(&nv_prime_fence_context->pending);

@@ -465,10 +463,15 @@ int nv_drm_prime_fence_context_create_ioctl(struct drm_device *dev,
 {
    struct nv_drm_device *nv_dev = to_nv_device(dev);
    struct drm_nvidia_prime_fence_context_create_params *p = data;
-    struct nv_drm_prime_fence_context *nv_prime_fence_context =
-        __nv_drm_prime_fence_context_new(nv_dev, p);
+    struct nv_drm_prime_fence_context *nv_prime_fence_context;
    int err;

+    if (nv_dev->pDevice == NULL) {
+        return -EOPNOTSUPP;
+    }
+
+    nv_prime_fence_context = __nv_drm_prime_fence_context_new(nv_dev, p);
+
    if (!nv_prime_fence_context) {
        goto done;
    }
@@ -523,6 +526,11 @@ int nv_drm_gem_prime_fence_attach_ioctl(struct drm_device *dev,
    struct nv_drm_fence_context *nv_fence_context;
    nv_dma_fence_t *fence;

+    if (nv_dev->pDevice == NULL) {
+        ret = -EOPNOTSUPP;
+        goto done;
+    }
+
    if (p->__pad != 0) {
        NV_DRM_DEV_LOG_ERR(nv_dev, "Padding fields must be zeroed");
        goto done;
@@ -1261,18 +1269,16 @@ __nv_drm_semsurf_fence_ctx_new(
     * to check a return value.
     */

-    *ctx = (struct nv_drm_semsurf_fence_ctx) {
-        .base.ops = &nv_drm_semsurf_fence_ctx_ops,
-        .base.nv_dev = nv_dev,
-        .base.context = nv_dma_fence_context_alloc(1),
-        .base.fenceSemIndex = p->index,
-        .pSemSurface = pSemSurface,
-        .pSemMapping.pVoid = semMapping,
-        .pMaxSubmittedMapping = (volatile NvU64 *)maxSubmittedMapping,
-        .callback.local = NULL,
-        .callback.nvKms = NULL,
-        .current_wait_value = 0,
-    };
+    ctx->base.ops = &nv_drm_semsurf_fence_ctx_ops;
+    ctx->base.nv_dev = nv_dev;
+    ctx->base.context = nv_dma_fence_context_alloc(1);
+    ctx->base.fenceSemIndex = p->index;
+    ctx->pSemSurface = pSemSurface;
+    ctx->pSemMapping.pVoid = semMapping;
+    ctx->pMaxSubmittedMapping = (volatile NvU64 *)maxSubmittedMapping;
+    ctx->callback.local = NULL;
+    ctx->callback.nvKms = NULL;
+    ctx->current_wait_value = 0;

    spin_lock_init(&ctx->lock);
    INIT_LIST_HEAD(&ctx->pending_fences);
@@ -1312,6 +1318,10 @@ int nv_drm_semsurf_fence_ctx_create_ioctl(struct drm_device *dev,
    struct nv_drm_semsurf_fence_ctx *ctx;
    int err;

+    if (nv_dev->pDevice == NULL) {
+        return -EOPNOTSUPP;
+    }
+
    if (p->__pad != 0) {
        NV_DRM_DEV_LOG_ERR(nv_dev, "Padding fields must be zeroed");
        return -EINVAL;
@@ -1473,6 +1483,11 @@ int nv_drm_semsurf_fence_create_ioctl(struct drm_device *dev,
    int ret = -EINVAL;
    int fd;

+    if (nv_dev->pDevice == NULL) {
+        ret = -EOPNOTSUPP;
+        goto done;
+    }
+
    if (p->__pad != 0) {
        NV_DRM_DEV_LOG_ERR(nv_dev, "Padding fields must be zeroed");
        goto done;
@@ -1635,6 +1650,10 @@ int nv_drm_semsurf_fence_wait_ioctl(struct drm_device *dev,
    unsigned long flags;
    int ret = -EINVAL;

+    if (nv_dev->pDevice == NULL) {
+        return -EOPNOTSUPP;
+    }
+
    if (p->pre_wait_value >= p->post_wait_value) {
        NV_DRM_DEV_LOG_ERR(
            nv_dev,
@@ -1743,6 +1762,11 @@ int nv_drm_semsurf_fence_attach_ioctl(struct drm_device *dev,
    nv_dma_fence_t *fence;
    int ret = -EINVAL;

+    if (nv_dev->pDevice == NULL) {
+        ret = -EOPNOTSUPP;
+        goto done;
+    }
+
    nv_gem = nv_drm_gem_object_lookup(nv_dev->dev, filep, p->handle);

    if (!nv_gem) {
--- a/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c
@@ -71,9 +71,20 @@ static void __nv_drm_gem_nvkms_memory_free(struct nv_drm_gem_object *nv_gem)
    nv_drm_free(nv_nvkms_memory);
 }

+static int __nv_drm_gem_nvkms_map(
+    struct nv_drm_gem_nvkms_memory *nv_nvkms_memory);
+
 static int __nv_drm_gem_nvkms_mmap(struct nv_drm_gem_object *nv_gem,
                                   struct vm_area_struct *vma)
 {
+    struct nv_drm_gem_nvkms_memory *nv_nvkms_memory =
+        to_nv_nvkms_memory(nv_gem);
+
+    int ret = __nv_drm_gem_nvkms_map(nv_nvkms_memory);
+    if (ret) {
+        return ret;
+    }
+
    return drm_gem_mmap_obj(&nv_gem->base,
                drm_vma_node_size(&nv_gem->base.vma_node) << PAGE_SHIFT, vma);
 }
@@ -146,11 +157,18 @@ static struct drm_gem_object *__nv_drm_gem_nvkms_prime_dup(
 static int __nv_drm_gem_nvkms_map(
    struct nv_drm_gem_nvkms_memory *nv_nvkms_memory)
 {
+    int ret = 0;
    struct nv_drm_device *nv_dev = nv_nvkms_memory->base.nv_dev;
    struct NvKmsKapiMemory *pMemory = nv_nvkms_memory->base.pMemory;

+    mutex_lock(&nv_nvkms_memory->map_lock);
+
+    if (nv_nvkms_memory->physically_mapped) {
+        goto done;
+    }
+
    if (!nv_dev->hasVideoMemory) {
-        return 0;
+        goto done;
    }

    if (!nvKms->mapMemory(nv_dev->pDevice,
@@ -161,7 +179,8 @@ static int __nv_drm_gem_nvkms_map(
            nv_dev,
            "Failed to map NvKmsKapiMemory 0x%p",
            pMemory);
-        return -ENOMEM;
+        ret = -ENOMEM;
+        goto done;
    }

    nv_nvkms_memory->pWriteCombinedIORemapAddress = ioremap_wc(
@@ -177,7 +196,9 @@ static int __nv_drm_gem_nvkms_map(

    nv_nvkms_memory->physically_mapped = true;

-    return 0;
+done:
+    mutex_unlock(&nv_nvkms_memory->map_lock);
+    return ret;
 }

 static void *__nv_drm_gem_nvkms_prime_vmap(
@@ -186,14 +207,38 @@ static void *__nv_drm_gem_nvkms_prime_vmap(
    struct nv_drm_gem_nvkms_memory *nv_nvkms_memory =
        to_nv_nvkms_memory(nv_gem);

-    if (!nv_nvkms_memory->physically_mapped) {
-        int ret = __nv_drm_gem_nvkms_map(nv_nvkms_memory);
-        if (ret) {
-           return ERR_PTR(ret);
-        }
+    int ret = __nv_drm_gem_nvkms_map(nv_nvkms_memory);
+    if (ret) {
+        return ERR_PTR(ret);
    }

-    return nv_nvkms_memory->pWriteCombinedIORemapAddress;
+    if (nv_nvkms_memory->physically_mapped) {
+        return nv_nvkms_memory->pWriteCombinedIORemapAddress;
+    }
+
+    /*
+     * If this buffer isn't physically mapped, it might be backed by struct
+     * pages. Use vmap in that case.
+     */
+    if (nv_nvkms_memory->pages_count > 0) {
+        return nv_drm_vmap(nv_nvkms_memory->pages,
+                           nv_nvkms_memory->pages_count);
+    }
+
+    return ERR_PTR(-ENOMEM);
+}
+
+static void __nv_drm_gem_nvkms_prime_vunmap(
+    struct nv_drm_gem_object *nv_gem,
+    void *address)
+{
+    struct nv_drm_gem_nvkms_memory *nv_nvkms_memory =
+        to_nv_nvkms_memory(nv_gem);
+
+    if (!nv_nvkms_memory->physically_mapped &&
+        nv_nvkms_memory->pages_count > 0) {
+        nv_drm_vunmap(address);
+    }
 }

 static int __nv_drm_gem_map_nvkms_memory_offset(
@@ -201,17 +246,7 @@ static int __nv_drm_gem_map_nvkms_memory_offset(
    struct nv_drm_gem_object *nv_gem,
    uint64_t *offset)
 {
-    struct nv_drm_gem_nvkms_memory *nv_nvkms_memory =
-        to_nv_nvkms_memory(nv_gem);
-
-    if (!nv_nvkms_memory->physically_mapped) {
-        int ret = __nv_drm_gem_nvkms_map(nv_nvkms_memory);
-        if (ret) {
-           return ret;
-        }
-    }
-
-    return nv_drm_gem_create_mmap_offset(&nv_nvkms_memory->base, offset);
+    return nv_drm_gem_create_mmap_offset(nv_gem, offset);
 }

 static struct sg_table *__nv_drm_gem_nvkms_memory_prime_get_sg_table(
@@ -223,7 +258,7 @@ static struct sg_table *__nv_drm_gem_nvkms_memory_prime_get_sg_table(
    struct sg_table *sg_table;

    if (nv_nvkms_memory->pages_count == 0) {
-        NV_DRM_DEV_LOG_ERR(
+        NV_DRM_DEV_DEBUG_DRIVER(
                nv_dev,
                "Cannot create sg_table for NvKmsKapiMemory 0x%p",
                nv_gem->pMemory);
@@ -241,6 +276,7 @@ const struct nv_drm_gem_object_funcs nv_gem_nvkms_memory_ops = {
    .free = __nv_drm_gem_nvkms_memory_free,
    .prime_dup = __nv_drm_gem_nvkms_prime_dup,
    .prime_vmap = __nv_drm_gem_nvkms_prime_vmap,
+    .prime_vunmap = __nv_drm_gem_nvkms_prime_vunmap,
    .mmap = __nv_drm_gem_nvkms_mmap,
    .handle_vma_fault = __nv_drm_gem_nvkms_handle_vma_fault,
    .create_mmap_offset = __nv_drm_gem_map_nvkms_memory_offset,
@@ -265,6 +301,7 @@ static int __nv_drm_nvkms_gem_obj_init(
        return -EINVAL;
    }

+    mutex_init(&nv_nvkms_memory->map_lock);
    nv_nvkms_memory->pPhysicalAddress = NULL;
    nv_nvkms_memory->pWriteCombinedIORemapAddress = NULL;
    nv_nvkms_memory->physically_mapped = false;
@@ -380,7 +417,7 @@ int nv_drm_gem_import_nvkms_memory_ioctl(struct drm_device *dev,
    int ret;

    if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
-        ret = -EINVAL;
+        ret = -EOPNOTSUPP;
        goto failed;
    }

@@ -430,7 +467,7 @@ int nv_drm_gem_export_nvkms_memory_ioctl(struct drm_device *dev,
    int ret = 0;

    if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
-        ret = -EINVAL;
+        ret = -EOPNOTSUPP;
        goto done;
    }

@@ -483,7 +520,7 @@ int nv_drm_gem_alloc_nvkms_memory_ioctl(struct drm_device *dev,
    int ret = 0;

    if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
-        ret = -EINVAL;
+        ret = -EOPNOTSUPP;
        goto failed;
    }

@@ -551,14 +588,12 @@ static struct drm_gem_object *__nv_drm_gem_nvkms_prime_dup(
 {
    struct nv_drm_device *nv_dev = to_nv_device(dev);
    const struct nv_drm_device *nv_dev_src;
-    const struct nv_drm_gem_nvkms_memory *nv_nvkms_memory_src;
    struct nv_drm_gem_nvkms_memory *nv_nvkms_memory;
    struct NvKmsKapiMemory *pMemory;

    BUG_ON(nv_gem_src == NULL || nv_gem_src->ops != &nv_gem_nvkms_memory_ops);

    nv_dev_src = to_nv_device(nv_gem_src->base.dev);
-    nv_nvkms_memory_src = to_nv_nvkms_memory_const(nv_gem_src);

    if ((nv_nvkms_memory =
            nv_drm_calloc(1, sizeof(*nv_nvkms_memory))) == NULL) {
--- a/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.h
+++ b/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.h
@@ -32,8 +32,15 @@
 struct nv_drm_gem_nvkms_memory {
    struct nv_drm_gem_object base;

+    /*
+     * Lock to protect concurrent writes to physically_mapped, pPhysicalAddress,
+     * and pWriteCombinedIORemapAddress.
+     *
+     * __nv_drm_gem_nvkms_map(), the sole writer, is structured such that
+     * readers are not required to hold the lock.
+     */
+    struct mutex map_lock;
    bool physically_mapped;
-
    void *pPhysicalAddress;
    void *pWriteCombinedIORemapAddress;

--- a/kernel-open/nvidia-drm/nvidia-drm-gem-user-memory.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-gem-user-memory.c
@@ -36,6 +36,10 @@
 #include "linux/mm.h"
 #include "nv-mm.h"

+#if defined(NV_LINUX_PFN_T_H_PRESENT)
+#include "linux/pfn_t.h"
+#endif
+
 #if defined(NV_BSD)
 #include <vm/vm_pageout.h>
 #endif
@@ -103,6 +107,37 @@ static int __nv_drm_gem_user_memory_mmap(struct nv_drm_gem_object *nv_gem,
    return 0;
 }

+#if defined(NV_LINUX) && !defined(NV_VMF_INSERT_MIXED_PRESENT)
+static vm_fault_t __nv_vm_insert_mixed_helper(
+    struct vm_area_struct *vma,
+    unsigned long address,
+    unsigned long pfn)
+{
+    int ret;
+
+#if defined(NV_PFN_TO_PFN_T_PRESENT)
+    ret = vm_insert_mixed(vma, address, pfn_to_pfn_t(pfn));
+#else
+    ret = vm_insert_mixed(vma, address, pfn);
+#endif
+
+    switch (ret) {
+        case 0:
+        case -EBUSY:
+            /*
+             * EBUSY indicates that another thread already handled
+             * the faulted range.
+             */
+            return VM_FAULT_NOPAGE;
+        case -ENOMEM:
+            return VM_FAULT_OOM;
+        default:
+            WARN_ONCE(1, "Unhandled error in %s: %d\n", __FUNCTION__, ret);
+            return VM_FAULT_SIGBUS;
+    }
+}
+#endif
+
 static vm_fault_t __nv_drm_gem_user_memory_handle_vma_fault(
    struct nv_drm_gem_object *nv_gem,
    struct vm_area_struct *vma,
@@ -112,36 +147,19 @@ static vm_fault_t __nv_drm_gem_user_memory_handle_vma_fault(
    unsigned long address = nv_page_fault_va(vmf);
    struct drm_gem_object *gem = vma->vm_private_data;
    unsigned long page_offset;
-    vm_fault_t ret;
+    unsigned long pfn;

    page_offset = vmf->pgoff - drm_vma_node_start(&gem->vma_node);
-
    BUG_ON(page_offset >= nv_user_memory->pages_count);
+    pfn = page_to_pfn(nv_user_memory->pages[page_offset]);

 #if !defined(NV_LINUX)
-    ret = vmf_insert_pfn(vma, address, page_to_pfn(nv_user_memory->pages[page_offset]));
-#else /* !defined(NV_LINUX) */
-    ret = vm_insert_page(vma, address, nv_user_memory->pages[page_offset]);
-    switch (ret) {
-        case 0:
-        case -EBUSY:
-            /*
-             * EBUSY indicates that another thread already handled
-             * the faulted range.
-             */
-            ret = VM_FAULT_NOPAGE;
-            break;
-        case -ENOMEM:
-            ret = VM_FAULT_OOM;
-            break;
-        default:
-            WARN_ONCE(1, "Unhandled error in %s: %d\n", __FUNCTION__, ret);
-            ret = VM_FAULT_SIGBUS;
-            break;
-    }
-#endif /* !defined(NV_LINUX) */
-
-    return ret;
+    return vmf_insert_pfn(vma, address, pfn);
+#elif defined(NV_VMF_INSERT_MIXED_PRESENT)
+    return vmf_insert_mixed(vma, address, pfn_to_pfn_t(pfn));
+#else
+    return __nv_vm_insert_mixed_helper(vma, address, pfn);
+#endif
 }

 static int __nv_drm_gem_user_create_mmap_offset(
--- a/kernel-open/nvidia-drm/nvidia-drm-gem.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-gem.c
@@ -144,6 +144,12 @@ void nv_drm_gem_object_init(struct nv_drm_device *nv_dev,
 #endif

    drm_gem_private_object_init(dev, &nv_gem->base, size);
+
+    /* Create mmap offset early for drm_gem_prime_mmap(), if possible. */
+    if (nv_gem->ops->create_mmap_offset) {
+        uint64_t offset;
+        nv_gem->ops->create_mmap_offset(nv_dev, nv_gem, &offset);
+    }
 }

 struct drm_gem_object *nv_drm_gem_prime_import(struct drm_device *dev,
@@ -232,6 +238,7 @@ int nv_drm_gem_map_offset_ioctl(struct drm_device *dev,
        return -EINVAL;
    }

+    /* mmap offset creation is idempotent, fetch it by creating it again. */
    if (nv_gem->ops->create_mmap_offset) {
        ret = nv_gem->ops->create_mmap_offset(nv_dev, nv_gem, &params->offset);
    } else {
@@ -319,7 +326,7 @@ int nv_drm_gem_identify_object_ioctl(struct drm_device *dev,
    struct nv_drm_gem_object *nv_gem = NULL;

    if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
-        return -EINVAL;
+        return -EOPNOTSUPP;
    }

    nv_dma_buf = nv_drm_gem_object_dma_buf_lookup(dev, filep, p->handle);
--- a/kernel-open/nvidia-drm/nvidia-drm-helper.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-helper.c
@@ -45,8 +45,7 @@

 /*
 * The inclusion of drm_framebuffer.h was removed from drm_crtc.h by commit
- * 720cf96d8fecde29b72e1101f8a567a0ce99594f ("drm: Drop drm_framebuffer.h from
- * drm_crtc.h") in linux-next, expected in v5.19-rc7.
+ * 720cf96d8fec ("drm: Drop drm_framebuffer.h from drm_crtc.h") in v6.0.
 *
 * We only need drm_framebuffer.h for drm_framebuffer_put(), and it is always
 * present (v4.9+) when drm_framebuffer_{put,get}() is present (v4.12+), so it
--- a/kernel-open/nvidia-drm/nvidia-drm-helper.h
+++ b/kernel-open/nvidia-drm/nvidia-drm-helper.h
@@ -40,8 +40,13 @@
 #include <drm/drm_blend.h>
 #endif

-#if defined(NV_DRM_ROTATION_AVAILABLE)
-/* For DRM_MODE_ROTATE_* and DRM_MODE_REFLECT_* */
+#if defined(NV_DRM_ROTATION_AVAILABLE) || \
+    defined(NV_DRM_COLOR_CTM_3X4_PRESENT) || \
+    defined(NV_DRM_COLOR_LUT_PRESENT)
+/*
+ * For DRM_MODE_ROTATE_*, DRM_MODE_REFLECT_*, struct drm_color_ctm_3x4, and
+ * struct drm_color_lut.
+ */
 #include <uapi/drm/drm_mode.h>
 #endif

@@ -358,6 +363,24 @@ static inline void nv_drm_connector_put(struct drm_connector *connector)
 #endif
 }

+static inline void nv_drm_property_blob_put(struct drm_property_blob *blob)
+{
+#if defined(NV_DRM_PROPERTY_BLOB_PUT_PRESENT)
+    drm_property_blob_put(blob);
+#else
+    drm_property_unreference_blob(blob);
+#endif
+}
+
+static inline void nv_drm_property_blob_get(struct drm_property_blob *blob)
+{
+#if defined(NV_DRM_PROPERTY_BLOB_PUT_PRESENT)
+    drm_property_blob_get(blob);
+#else
+    drm_property_reference_blob(blob);
+#endif
+}
+
 static inline struct drm_crtc *
 nv_drm_crtc_find(struct drm_device *dev, struct drm_file *filep, uint32_t id)
 {
@@ -613,8 +636,8 @@ static inline int nv_drm_format_num_planes(uint32_t format)
 #endif /* defined(NV_DRM_FORMAT_MODIFIERS_PRESENT) */

 /*
- * DRM_UNLOCKED was removed with linux-next commit 2798ffcc1d6a ("drm: Remove
- * locking for legacy ioctls and DRM_UNLOCKED"), but it was previously made
+ * DRM_UNLOCKED was removed with commit 2798ffcc1d6a ("drm: Remove locking for
+ * legacy ioctls and DRM_UNLOCKED") in v6.8, but it was previously made
 * implicit for all non-legacy DRM driver IOCTLs since Linux v4.10 commit
 * fa5386459f06 "drm: Used DRM_LEGACY for all legacy functions" (Linux v4.4
 * commit ea487835e887 "drm: Enforce unlocked ioctl operation for kms driver
@@ -625,6 +648,31 @@ static inline int nv_drm_format_num_planes(uint32_t format)
 #define DRM_UNLOCKED 0
 #endif

+/*
+ * struct drm_color_ctm_3x4 was added by commit 6872a189be50 ("drm/amd/display:
+ * Add 3x4 CTM support for plane CTM") in v6.8. For backwards compatibility,
+ * define it when not present.
+ */
+#if !defined(NV_DRM_COLOR_CTM_3X4_PRESENT)
+struct drm_color_ctm_3x4 {
+    __u64 matrix[12];
+};
+#endif
+
+/*
+ * struct drm_color_lut was added by commit 5488dc16fde7 ("drm: introduce pipe
+ * color correction properties") in v4.6. For backwards compatibility, define it
+ * when not present.
+ */
+#if !defined(NV_DRM_COLOR_LUT_PRESENT)
+struct drm_color_lut {
+    __u16 red;
+    __u16 green;
+    __u16 blue;
+    __u16 reserved;
+};
+#endif
+
 /*
 * drm_vma_offset_exact_lookup_locked() were added
 * by kernel commit 2225cfe46bcc which was Signed-off-by:
--- a/kernel-open/nvidia-drm/nvidia-drm-ioctl.h
+++ b/kernel-open/nvidia-drm/nvidia-drm-ioctl.h
@@ -52,6 +52,7 @@
 #define DRM_NVIDIA_SEMSURF_FENCE_CREATE             0x15
 #define DRM_NVIDIA_SEMSURF_FENCE_WAIT               0x16
 #define DRM_NVIDIA_SEMSURF_FENCE_ATTACH             0x17
+#define DRM_NVIDIA_GET_DRM_FILE_UNIQUE_ID           0x18

 #define DRM_IOCTL_NVIDIA_GEM_IMPORT_NVKMS_MEMORY                           \
    DRM_IOWR((DRM_COMMAND_BASE + DRM_NVIDIA_GEM_IMPORT_NVKMS_MEMORY),      \
@@ -157,6 +158,11 @@
              DRM_NVIDIA_SEMSURF_FENCE_ATTACH),                         \
              struct drm_nvidia_semsurf_fence_attach_params)

+#define DRM_IOCTL_NVIDIA_GET_DRM_FILE_UNIQUE_ID                         \
+    DRM_IOWR((DRM_COMMAND_BASE +                                        \
+              DRM_NVIDIA_GET_DRM_FILE_UNIQUE_ID),                       \
+              struct drm_nvidia_get_drm_file_unique_id_params)
+
 struct drm_nvidia_gem_import_nvkms_memory_params {
    uint64_t mem_size;           /* IN */

@@ -385,4 +391,8 @@ struct drm_nvidia_semsurf_fence_attach_params {
    uint64_t wait_value;            /* IN Semaphore value to reach before signal */
 };

+struct drm_nvidia_get_drm_file_unique_id_params {
+    uint64_t id;                    /* OUT Unique ID of the DRM file */
+};
+
 #endif /* _UAPI_NVIDIA_DRM_IOCTL_H_ */
--- a/kernel-open/nvidia-drm/nvidia-drm-linux.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-linux.c
@@ -34,7 +34,7 @@ MODULE_PARM_DESC(
    "Enable atomic kernel modesetting (1 = enable, 0 = disable (default))");
 module_param_named(modeset, nv_drm_modeset_module_param, bool, 0400);

-#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
+#if defined(NV_DRM_FBDEV_AVAILABLE)
 MODULE_PARM_DESC(
    fbdev,
    "Create a framebuffer device (1 = enable, 0 = disable (default)) (EXPERIMENTAL)");
--- a/kernel-open/nvidia-drm/nvidia-drm-modeset.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-modeset.c
@@ -42,6 +42,16 @@
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_crtc.h>

+#if defined(NV_LINUX_NVHOST_H_PRESENT) && defined(CONFIG_TEGRA_GRHOST)
+#include <linux/nvhost.h>
+#elif defined(NV_LINUX_HOST1X_NEXT_H_PRESENT)
+#include <linux/host1x-next.h>
+#endif
+
+#if defined(NV_DRM_FENCE_AVAILABLE)
+#include "nvidia-dma-fence-helper.h"
+#endif
+
 struct nv_drm_atomic_state {
    struct NvKmsKapiRequestedModeSetConfig config;
    struct drm_atomic_state base;
@@ -146,6 +156,165 @@ static int __nv_drm_put_back_post_fence_fd(
    return ret;
 }

+#if defined(NV_DRM_FENCE_AVAILABLE)
+struct nv_drm_plane_fence_cb_data {
+    nv_dma_fence_cb_t dma_fence_cb;
+    struct nv_drm_device *nv_dev;
+    NvU32 semaphore_index;
+};
+
+static void
+__nv_drm_plane_fence_cb(
+    nv_dma_fence_t *fence,
+    nv_dma_fence_cb_t *cb_data
+)
+{
+    struct nv_drm_plane_fence_cb_data *fence_data =
+        container_of(cb_data, typeof(*fence_data), dma_fence_cb);
+    struct nv_drm_device *nv_dev = fence_data->nv_dev;
+
+    nv_dma_fence_put(fence);
+    nvKms->signalDisplaySemaphore(nv_dev->pDevice, fence_data->semaphore_index);
+    nv_drm_free(fence_data);
+}
+
+static int __nv_drm_convert_in_fences(
+    struct nv_drm_device *nv_dev,
+    struct drm_atomic_state *state,
+    struct drm_crtc *crtc,
+    struct drm_crtc_state *crtc_state)
+{
+    struct drm_plane *plane = NULL;
+    struct drm_plane_state *plane_state = NULL;
+    struct nv_drm_plane *nv_plane = NULL;
+    struct NvKmsKapiLayerRequestedConfig *plane_req_config = NULL;
+    struct NvKmsKapiHeadRequestedConfig *head_req_config =
+        &to_nv_crtc_state(crtc_state)->req_config;
+    struct nv_drm_plane_fence_cb_data *fence_data;
+    uint32_t semaphore_index;
+    uint32_t idx_count;
+    int ret, i;
+
+    if (!crtc_state->active) {
+        return 0;
+    }
+
+    nv_drm_for_each_new_plane_in_state(state, plane, plane_state, i) {
+        if ((plane->type == DRM_PLANE_TYPE_CURSOR) ||
+            (plane_state->crtc != crtc) ||
+            (plane_state->fence == NULL)) {
+            continue;
+        }
+
+        nv_plane = to_nv_plane(plane);
+        plane_req_config =
+            &head_req_config->layerRequestedConfig[nv_plane->layer_idx];
+
+        if (nv_dev->supportsSyncpts) {
+#if defined(NV_LINUX_NVHOST_H_PRESENT) && defined(CONFIG_TEGRA_GRHOST)
+#if defined(NV_NVHOST_DMA_FENCE_UNPACK_PRESENT)
+            int ret =
+                nvhost_dma_fence_unpack(
+                    plane_state->fence,
+                    &plane_req_config->config.syncParams.u.syncpt.preSyncptId,
+                    &plane_req_config->config.syncParams.u.syncpt.preSyncptValue);
+            if (ret == 0) {
+                plane_req_config->config.syncParams.preSyncptSpecified = true;
+                continue;
+            }
+#endif
+#elif defined(NV_LINUX_HOST1X_NEXT_H_PRESENT)
+            int ret =
+                host1x_fence_extract(
+                    plane_state->fence,
+                    &plane_req_config->config.syncParams.u.syncpt.preSyncptId,
+                    &plane_req_config->config.syncParams.u.syncpt.preSyncptValue);
+            if (ret == 0) {
+                plane_req_config->config.syncParams.preSyncptSpecified = true;
+                continue;
+            }
+#endif
+        }
+
+        /*
+         * Syncpt extraction failed, or syncpts are not supported.
+         * Use general DRM fence support with semaphores instead.
+         */
+        if (plane_req_config->config.syncParams.postSyncptRequested) {
+            // Can't mix Syncpts and semaphores in a given request.
+            return -EINVAL;
+        }
+
+        for (idx_count = 0; idx_count < nv_dev->display_semaphores.count; idx_count++) {
+            semaphore_index = nv_drm_next_display_semaphore(nv_dev);
+            if (nvKms->tryInitDisplaySemaphore(nv_dev->pDevice, semaphore_index)) {
+                break;
+            }
+        }
+
+        if (idx_count == nv_dev->display_semaphores.count) {
+            NV_DRM_DEV_LOG_ERR(
+                nv_dev,
+                "Failed to initialize semaphore for plane fence");
+            /*
+             * This should only happen if the semaphore pool was somehow
+             * exhausted. Waiting a bit and retrying may help in that case.
+             */
+            return -EAGAIN;
+        }
+
+        plane_req_config->config.syncParams.semaphoreSpecified = true;
+        plane_req_config->config.syncParams.u.semaphore.index = semaphore_index;
+
+        fence_data = nv_drm_calloc(1, sizeof(*fence_data));
+
+        if (!fence_data) {
+            NV_DRM_DEV_LOG_ERR(
+                nv_dev,
+                "Failed to allocate callback data for plane fence");
+            nvKms->cancelDisplaySemaphore(nv_dev->pDevice, semaphore_index);
+            return -ENOMEM;
+        }
+
+        fence_data->nv_dev = nv_dev;
+        fence_data->semaphore_index = semaphore_index;
+
+        ret = nv_dma_fence_add_callback(plane_state->fence,
+                                        &fence_data->dma_fence_cb,
+                                        __nv_drm_plane_fence_cb);
+
+        switch (ret) {
+        case -ENOENT:
+            /* The fence is already signaled */
+            __nv_drm_plane_fence_cb(plane_state->fence,
+                                    &fence_data->dma_fence_cb);
+#if defined(fallthrough)
+            fallthrough;
+#else
+            /* Fallthrough */
+#endif
+        case 0:
+            /*
+             * The plane state's fence reference has either been consumed or
+             * belongs to the outstanding callback now.
+             */
+            plane_state->fence = NULL;
+            break;
+        default:
+            NV_DRM_DEV_LOG_ERR(
+                nv_dev,
+                "Failed plane fence callback registration");
+            /* Fence callback registration failed */
+            nvKms->cancelDisplaySemaphore(nv_dev->pDevice, semaphore_index);
+            nv_drm_free(fence_data);
+            return ret;
+        }
+    }
+
+    return 0;
+}
+#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
+
 static int __nv_drm_get_syncpt_data(
    struct nv_drm_device *nv_dev,
    struct drm_crtc *crtc,
@@ -245,6 +414,31 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
        return -EINVAL;
    }

+#if defined(NV_DRM_FRAMEBUFFER_OBJ_PRESENT)
+    if (commit) {
+        /*
+         * This function does what is necessary to prepare the framebuffers
+         * attached to each new plane in the state for scan out, mostly by
+         * calling back into driver callbacks the NVIDIA driver does not
+         * provide. The end result is that all it does on the NVIDIA driver
+         * is populate the plane state's dma fence pointers with any implicit
+         * sync fences attached to the GEM objects associated with those planes
+         * in the new state, preferring explicit sync fences when appropriate.
+         * This must be done prior to converting the per-plane fences to
+         * semaphore waits below.
+         *
+         * Note this only works when the drm_framebuffer:obj[] field is present
+         * and populated, so skip calling this function on kernels where that
+         * field is not present.
+         */
+        ret = drm_atomic_helper_prepare_planes(dev, state);
+
+        if (ret) {
+            return ret;
+        }
+    }
+#endif /* defined(NV_DRM_FRAMEBUFFER_OBJ_PRESENT) */
+
    memset(requested_config, 0, sizeof(*requested_config));

    /* Loop over affected crtcs and construct NvKmsKapiRequestedModeSetConfig */
@@ -258,11 +452,6 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
                               commit ? crtc->state : crtc_state;
        struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);

-        requested_config->headRequestedConfig[nv_crtc->head] =
-            to_nv_crtc_state(new_crtc_state)->req_config;
-
-        requested_config->headsMask |= 1 << nv_crtc->head;
-
        if (commit) {
            struct drm_crtc_state *old_crtc_state = crtc_state;
            struct nv_drm_crtc_state *nv_new_crtc_state =
@@ -282,7 +471,27 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,

                nv_new_crtc_state->nv_flip = NULL;
            }
+
+#if defined(NV_DRM_FENCE_AVAILABLE)
+            ret = __nv_drm_convert_in_fences(nv_dev,
+                                             state,
+                                             crtc,
+                                             new_crtc_state);
+
+            if (ret != 0) {
+                return ret;
+            }
+#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
        }
+
+        /*
+         * Do this deep copy after calling __nv_drm_convert_in_fences,
+         * which modifies the new CRTC state's req_config member
+         */
+        requested_config->headRequestedConfig[nv_crtc->head] =
+            to_nv_crtc_state(new_crtc_state)->req_config;
+
+        requested_config->headsMask |= 1 << nv_crtc->head;
    }

    if (commit && nvKms->systemInfo.bAllowWriteCombining) {
@@ -313,6 +522,10 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
        }
    }

+    if (commit && nv_dev->requiresVrrSemaphores && reply_config.vrrFlip) {
+        nvKms->signalVrrSemaphore(nv_dev->pDevice, reply_config.vrrSemaphoreIndex);
+    }
+
    return 0;
 }

@@ -506,7 +719,6 @@ int nv_drm_atomic_commit(struct drm_device *dev,

        goto done;
    }
-    nv_dev->drmMasterChangedSinceLastAtomicCommit = NV_FALSE;

    nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
        struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
@@ -587,6 +799,9 @@ int nv_drm_atomic_commit(struct drm_device *dev,
                NV_DRM_DEV_LOG_ERR(
                    nv_dev,
                    "Flip event timeout on head %u", nv_crtc->head);
+                while (!list_empty(&nv_crtc->flip_list)) {
+                    __nv_drm_handle_flip_event(nv_crtc);
+                }
            }
        }
    }
--- a/kernel-open/nvidia-drm/nvidia-drm-os-interface.h
+++ b/kernel-open/nvidia-drm/nvidia-drm-os-interface.h
@@ -59,14 +59,20 @@ typedef struct nv_timer nv_drm_timer;
 #endif

 #if defined(NV_DRM_FBDEV_GENERIC_SETUP_PRESENT) && defined(NV_DRM_APERTURE_REMOVE_CONFLICTING_PCI_FRAMEBUFFERS_PRESENT)
+#define NV_DRM_FBDEV_AVAILABLE
 #define NV_DRM_FBDEV_GENERIC_AVAILABLE
 #endif

+#if defined(NV_DRM_FBDEV_TTM_SETUP_PRESENT) && defined(NV_DRM_APERTURE_REMOVE_CONFLICTING_PCI_FRAMEBUFFERS_PRESENT)
+#define NV_DRM_FBDEV_AVAILABLE
+#define NV_DRM_FBDEV_TTM_AVAILABLE
+#endif
+
 struct page;

 /* Set to true when the atomic modeset feature is enabled. */
 extern bool nv_drm_modeset_module_param;
-#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
+#if defined(NV_DRM_FBDEV_AVAILABLE)
 /* Set to true when the nvidia-drm driver should install a framebuffer device */
 extern bool nv_drm_fbdev_module_param;
 #endif
--- a/kernel-open/nvidia-drm/nvidia-drm-priv.h
+++ b/kernel-open/nvidia-drm/nvidia-drm-priv.h
@@ -147,29 +147,56 @@ struct nv_drm_device {
    NvBool hasVideoMemory;

    NvBool supportsSyncpts;
+    NvBool requiresVrrSemaphores;
    NvBool subOwnershipGranted;
    NvBool hasFramebufferConsole;

-    /**
-     * @drmMasterChangedSinceLastAtomicCommit:
-     *
-     * This flag is set in nv_drm_master_set and reset after a completed atomic
-     * commit. It is used to restore or recommit state that is lost by the
-     * NvKms modeset owner change, such as the CRTC color management
-     * properties.
-     */
-    NvBool drmMasterChangedSinceLastAtomicCommit;
-
    struct drm_property *nv_out_fence_property;
    struct drm_property *nv_input_colorspace_property;

+    struct {
+        NvU32 count;
+        NvU32 next_index;
+    } display_semaphores;
+
 #if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
    struct drm_property *nv_hdr_output_metadata_property;
 #endif

+    struct drm_property *nv_plane_lms_ctm_property;
+    struct drm_property *nv_plane_lms_to_itp_ctm_property;
+    struct drm_property *nv_plane_itp_to_lms_ctm_property;
+    struct drm_property *nv_plane_blend_ctm_property;
+
+    struct drm_property *nv_plane_degamma_tf_property;
+    struct drm_property *nv_plane_degamma_lut_property;
+    struct drm_property *nv_plane_degamma_lut_size_property;
+    struct drm_property *nv_plane_degamma_multiplier_property;
+
+    struct drm_property *nv_plane_tmo_lut_property;
+    struct drm_property *nv_plane_tmo_lut_size_property;
+
+    struct drm_property *nv_crtc_regamma_tf_property;
+    struct drm_property *nv_crtc_regamma_lut_property;
+    struct drm_property *nv_crtc_regamma_lut_size_property;
+    struct drm_property *nv_crtc_regamma_divisor_property;
+
    struct nv_drm_device *next;
 };

+static inline NvU32 nv_drm_next_display_semaphore(
+    struct nv_drm_device *nv_dev)
+{
+    NvU32 current_index = nv_dev->display_semaphores.next_index++;
+
+    if (nv_dev->display_semaphores.next_index >=
+        nv_dev->display_semaphores.count) {
+        nv_dev->display_semaphores.next_index = 0;
+    }
+
+    return current_index;
+}
+
 static inline struct nv_drm_device *to_nv_device(
    struct drm_device *dev)
 {
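Review note on `nv_drm_next_display_semaphore()` in nvidia-drm-priv.h: the helper returns the current index and then advances `next_index`, wrapping to 0 once it reaches `count`. The standalone sketch below models only that arithmetic. `struct sem_ring` and `sem_ring_next` are hypothetical names used for illustration and are not part of the driver. Like the helper in the diff, the sketch takes no lock, so any serialization has to come from the caller; how the driver arranges that is not visible in this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the display_semaphores member of nv_drm_device. */
struct sem_ring {
    uint32_t count;      /* number of semaphores in the pool */
    uint32_t next_index; /* index the next caller will receive */
};

/* Hand out the current index, then advance and wrap at count. */
static uint32_t sem_ring_next(struct sem_ring *ring)
{
    uint32_t current_index;

    assert(ring != NULL);
    current_index = ring->next_index++;

    if (ring->next_index >= ring->count) {
        ring->next_index = 0;
    }

    return current_index;
}
```

With `count` set to 3 and `next_index` starting at 0, successive calls yield 0, 1, 2 and then 0 again.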
--- a/kernel-open/nvidia-drm/nvidia-drm-sources.mk
+++ b/kernel-open/nvidia-drm/nvidia-drm-sources.mk
@@ -67,10 +67,14 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += fence_set_error
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += sync_file_get_fence
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_aperture_remove_conflicting_pci_framebuffers
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_fbdev_generic_setup
+NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_fbdev_ttm_setup
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_attach_hdr_output_metadata_property
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_helper_crtc_enable_color_mgmt
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_crtc_enable_color_mgmt
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_atomic_helper_legacy_gamma_set
+NV_CONFTEST_FUNCTION_COMPILE_TESTS += vmf_insert_mixed
+NV_CONFTEST_FUNCTION_COMPILE_TESTS += pfn_to_pfn_t
+NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_prime_mmap

 NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_present
 NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_bus_type
@@ -128,4 +132,12 @@ NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_dumb_destroy
 NV_CONFTEST_TYPE_COMPILE_TESTS += fence_ops_use_64bit_seqno
 NV_CONFTEST_TYPE_COMPILE_TESTS += drm_aperture_remove_conflicting_pci_framebuffers_has_driver_arg
 NV_CONFTEST_TYPE_COMPILE_TESTS += drm_mode_create_dp_colorspace_property_has_supported_colorspaces_arg
+NV_CONFTEST_TYPE_COMPILE_TESTS += drm_syncobj_features_present
 NV_CONFTEST_TYPE_COMPILE_TESTS += drm_unlocked_ioctl_flag_present
+NV_CONFTEST_TYPE_COMPILE_TESTS += drm_framebuffer_obj_present
+NV_CONFTEST_TYPE_COMPILE_TESTS += drm_color_ctm_3x4_present
+NV_CONFTEST_TYPE_COMPILE_TESTS += drm_color_lut
+NV_CONFTEST_TYPE_COMPILE_TESTS += drm_property_blob_put
+NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_gem_prime_mmap
+NV_CONFTEST_TYPE_COMPILE_TESTS += drm_output_poll_changed
+NV_CONFTEST_TYPE_COMPILE_TESTS += file_operations_fop_unsigned_offset_present
--- a/kernel-open/nvidia-modeset/nv-kthread-q.c
+++ b/kernel-open/nvidia-modeset/nv-kthread-q.c
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2016-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -176,7 +176,7 @@ static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),
 {

    unsigned i, j;
-    const static unsigned attempts = 3;
+    static const unsigned attempts = 3;
    struct task_struct *thread[3];

    for (i = 0;; i++) {
@@ -201,7 +201,7 @@ static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),

        // Ran out of attempts - return thread even if its stack may not be
        // allocated on the preferred node
-        if ((i == (attempts - 1)))
+        if (i == (attempts - 1))
            break;

        // Get the NUMA node where the first page of the stack is resident. If
--- a/kernel-open/nvidia-modeset/nvidia-modeset-linux.c
+++ b/kernel-open/nvidia-modeset/nvidia-modeset-linux.c
@@ -77,15 +77,18 @@ module_param_named(disable_hdmi_frl, disable_hdmi_frl, bool, 0400);
 static bool disable_vrr_memclk_switch = false;
 module_param_named(disable_vrr_memclk_switch, disable_vrr_memclk_switch, bool, 0400);

-static bool hdmi_deepcolor = false;
+static bool hdmi_deepcolor = true;
 module_param_named(hdmi_deepcolor, hdmi_deepcolor, bool, 0400);

-static bool vblank_sem_control = false;
+static bool vblank_sem_control = true;
 module_param_named(vblank_sem_control, vblank_sem_control, bool, 0400);

 static bool opportunistic_display_sync = true;
 module_param_named(opportunistic_display_sync, opportunistic_display_sync, bool, 0400);

+static enum NvKmsDebugForceColorSpace debug_force_color_space = NVKMS_DEBUG_FORCE_COLOR_SPACE_NONE;
+module_param_named(debug_force_color_space, debug_force_color_space, uint, 0400);
+
 /* These parameters are used for fault injection tests.  Normally the defaults
 * should be used. */
 MODULE_PARM_DESC(fail_malloc, "Fail the Nth call to nvkms_alloc");
@@ -139,6 +142,28 @@ NvBool nvkms_opportunistic_display_sync(void)
    return opportunistic_display_sync;
 }

+enum NvKmsDebugForceColorSpace nvkms_debug_force_color_space(void)
+{
+    if (debug_force_color_space >= NVKMS_DEBUG_FORCE_COLOR_SPACE_MAX) {
+        return NVKMS_DEBUG_FORCE_COLOR_SPACE_NONE;
+    }
+    return debug_force_color_space;
+}
+
+NvBool nvkms_kernel_supports_syncpts(void)
+{
+/*
+ * Note this only checks that the kernel has the prerequisite
+ * support for syncpts; callers must also check that the hardware
+ * supports syncpts.
+ */
+#if (defined(CONFIG_TEGRA_GRHOST) || defined(NV_LINUX_HOST1X_NEXT_H_PRESENT))
+    return NV_TRUE;
+#else
+    return NV_FALSE;
+#endif
+}
+
 #define NVKMS_SYNCPT_STUBS_NEEDED

 /*************************************************************************
@@ -1070,7 +1095,7 @@ static void nvkms_kapi_event_kthread_q_callback(void *arg)
    nvKmsKapiHandleEventQueueChange(device);
 }

-struct nvkms_per_open *nvkms_open_common(enum NvKmsClientType type,
+static struct nvkms_per_open *nvkms_open_common(enum NvKmsClientType type,
                                         struct NvKmsKapiDevice *device,
                                         int *status)
 {
@@ -1122,7 +1147,7 @@ failed:
    return NULL;
 }

-void nvkms_close_pm_locked(struct nvkms_per_open *popen)
+static void nvkms_close_pm_locked(struct nvkms_per_open *popen)
 {
    /*
     * Don't use down_interruptible(): we need to free resources
@@ -1185,7 +1210,7 @@ static void nvkms_close_popen(struct nvkms_per_open *popen)
    }
 }

-int nvkms_ioctl_common
+static int nvkms_ioctl_common
 (
    struct nvkms_per_open *popen,
    NvU32 cmd, NvU64 address, const size_t size
@@ -1234,6 +1259,26 @@ void nvkms_close_from_kapi(struct nvkms_per_open *popen)
    nvkms_close_pm_unlocked(popen);
 }

+NvBool nvkms_ioctl_from_kapi_try_pmlock
+(
+    struct nvkms_per_open *popen,
+    NvU32 cmd, void *params_address, const size_t param_size
+)
+{
+    NvBool ret;
+
+    if (nvkms_read_trylock_pm_lock()) {
+        return NV_FALSE;
+    }
+
+    ret = nvkms_ioctl_common(popen,
+                             cmd,
+                             (NvU64)(NvUPtr)params_address, param_size) == 0;
+    nvkms_read_unlock_pm_lock();
+
+    return ret;
+}
+
 NvBool nvkms_ioctl_from_kapi
 (
    struct nvkms_per_open *popen,
@@ -1524,6 +1569,48 @@ NvBool nvKmsKapiGetFunctionsTable
 }
 EXPORT_SYMBOL(nvKmsKapiGetFunctionsTable);

+NvU32 nvKmsKapiF16ToF32(NvU16 a)
+{
+    return nvKmsKapiF16ToF32Internal(a);
+}
+EXPORT_SYMBOL(nvKmsKapiF16ToF32);
+
+NvU16 nvKmsKapiF32ToF16(NvU32 a)
+{
+    return nvKmsKapiF32ToF16Internal(a);
+}
+EXPORT_SYMBOL(nvKmsKapiF32ToF16);
+
+NvU32 nvKmsKapiF32Mul(NvU32 a, NvU32 b)
+{
+    return nvKmsKapiF32MulInternal(a, b);
+}
+EXPORT_SYMBOL(nvKmsKapiF32Mul);
+
+NvU32 nvKmsKapiF32Div(NvU32 a, NvU32 b)
+{
+    return nvKmsKapiF32DivInternal(a, b);
+}
+EXPORT_SYMBOL(nvKmsKapiF32Div);
+
+NvU32 nvKmsKapiF32Add(NvU32 a, NvU32 b)
+{
+    return nvKmsKapiF32AddInternal(a, b);
+}
+EXPORT_SYMBOL(nvKmsKapiF32Add);
+
+NvU32 nvKmsKapiF32ToUI32RMinMag(NvU32 a, NvBool exact)
+{
+    return nvKmsKapiF32ToUI32RMinMagInternal(a, exact);
+}
+EXPORT_SYMBOL(nvKmsKapiF32ToUI32RMinMag);
+
+NvU32 nvKmsKapiUI32ToF32(NvU32 a)
+{
+    return nvKmsKapiUI32ToF32Internal(a);
+}
+EXPORT_SYMBOL(nvKmsKapiUI32ToF32);
+
 /*************************************************************************
 * File operation callback functions.
 *************************************************************************/
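Review note on `nvkms_debug_force_color_space()` in nvidia-modeset-linux.c: the module parameter is registered as a plain `uint`, so the accessor maps any value at or above `NVKMS_DEBUG_FORCE_COLOR_SPACE_MAX` back to `NVKMS_DEBUG_FORCE_COLOR_SPACE_NONE` before callers see it. The sketch below reproduces that clamp outside the kernel. The enum mirrors the one added in nvidia-modeset-os-interface.h, while `sanitize_color_space` and `color_space_name` are hypothetical helpers introduced only for this example.

```c
#include <assert.h>

/* Mirrors enum NvKmsDebugForceColorSpace from the diff. */
enum force_color_space {
    FORCE_COLOR_SPACE_NONE,
    FORCE_COLOR_SPACE_RGB,
    FORCE_COLOR_SPACE_YUV444,
    FORCE_COLOR_SPACE_YUV422,
    FORCE_COLOR_SPACE_MAX,
};

/* Out-of-range raw parameter values fall back to NONE. */
static enum force_color_space sanitize_color_space(unsigned int raw)
{
    if (raw >= FORCE_COLOR_SPACE_MAX) {
        return FORCE_COLOR_SPACE_NONE;
    }
    return (enum force_color_space)raw;
}

/* Hypothetical helper: safe to index only after sanitizing. */
static const char *color_space_name(enum force_color_space cs)
{
    static const char *const names[] = { "none", "rgb", "yuv444", "yuv422" };

    assert(cs < FORCE_COLOR_SPACE_MAX);
    return names[cs];
}
```

Sanitizing at the accessor means a table lookup such as `color_space_name(sanitize_color_space(raw))` cannot index past the end of the array, whatever value was passed at module load.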
--- a/kernel-open/nvidia-modeset/nvidia-modeset-os-interface.h
+++ b/kernel-open/nvidia-modeset/nvidia-modeset-os-interface.h
@@ -67,6 +67,14 @@ enum NvKmsSyncPtOp {
    NVKMS_SYNCPT_OP_READ_MINVAL,
 };

+enum NvKmsDebugForceColorSpace {
+    NVKMS_DEBUG_FORCE_COLOR_SPACE_NONE,
+    NVKMS_DEBUG_FORCE_COLOR_SPACE_RGB,
+    NVKMS_DEBUG_FORCE_COLOR_SPACE_YUV444,
+    NVKMS_DEBUG_FORCE_COLOR_SPACE_YUV422,
+    NVKMS_DEBUG_FORCE_COLOR_SPACE_MAX,
+};
+
 typedef struct {

    struct {
@@ -102,6 +110,7 @@ NvBool nvkms_disable_vrr_memclk_switch(void);
 NvBool nvkms_hdmi_deepcolor(void);
 NvBool nvkms_vblank_sem_control(void);
 NvBool nvkms_opportunistic_display_sync(void);
+enum NvKmsDebugForceColorSpace nvkms_debug_force_color_space(void);

 void   nvkms_call_rm    (void *ops);
 void*  nvkms_alloc      (size_t size,
@@ -304,6 +313,11 @@ NvU32 nvkms_enumerate_gpus(nv_gpu_info_t *gpu_info);

 NvBool nvkms_allow_write_combining(void);

+/*!
+ * Check if OS supports syncpoints.
+ */
+NvBool nvkms_kernel_supports_syncpts(void);
+
 /*!
 * Checks whether the fd is associated with an nvidia character device.
 */
@@ -328,6 +342,16 @@ NvBool nvkms_ioctl_from_kapi
    NvU32 cmd, void *params_address, const size_t params_size
 );

+/*!
+ * Like nvkms_ioctl_from_kapi, but return NV_FALSE instead of waiting if the
+ * power management read lock cannot be acquired.
+ */
+NvBool nvkms_ioctl_from_kapi_try_pmlock
+(
+    struct nvkms_per_open *popen,
+    NvU32 cmd, void *params_address, const size_t params_size
+);
+
 /*!
 * APIs for locking.
 */
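Review note on `nvkms_ioctl_from_kapi_try_pmlock()`: per its header comment, it returns `NV_FALSE` instead of waiting when the power management read lock cannot be acquired. The userspace sketch below shows the same "try, bail out, otherwise work and release" shape. It is an assumption-laden model: a C11 `atomic_flag` stands in for the kernel's PM lock (so it is exclusive rather than a reader lock), `do_ioctl` is a hypothetical stand-in for `nvkms_ioctl_common()`, and `pm_trylock` returns true on success, whereas the call site in the diff treats a nonzero return from `nvkms_read_trylock_pm_lock()` as failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the PM lock; set means "held". */
static atomic_flag pm_busy = ATOMIC_FLAG_INIT;

/* Returns true if the lock was acquired, false if it was already held. */
static bool pm_trylock(void)
{
    return !atomic_flag_test_and_set(&pm_busy);
}

static void pm_unlock(void)
{
    atomic_flag_clear(&pm_busy);
}

/* Hypothetical operation; 0 means success, as with nvkms_ioctl_common(). */
static int do_ioctl(int cmd)
{
    return cmd >= 0 ? 0 : -1;
}

/* Never blocks: reports failure if the lock is unavailable. */
static bool ioctl_try_pmlock(int cmd)
{
    bool ok;

    if (!pm_trylock()) {
        return false;
    }

    ok = (do_ioctl(cmd) == 0);
    pm_unlock();

    return ok;
}
```

A `false` result is ambiguous by design: the caller cannot tell a busy lock from a failed operation, which matches the single `NvBool` return in the diff.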
--- a/kernel-open/nvidia-modeset/nvidia-modeset.Kbuild
+++ b/kernel-open/nvidia-modeset/nvidia-modeset.Kbuild
@@ -105,3 +105,4 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += list_is_first
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += ktime_get_real_ts64
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += ktime_get_raw_ts64
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += acpi_video_backlight_use_native
+NV_CONFTEST_FUNCTION_COMPILE_TESTS += kernel_read_has_pointer_pos_arg
--- a/kernel-open/nvidia-modeset/nvkms.h
+++ b/kernel-open/nvidia-modeset/nvkms.h
@@ -110,4 +110,18 @@ NvBool nvKmsSetBacklight(NvU32 display_id, void *drv_priv, NvU32 brightness);

 NvBool nvKmsOpenDevHasSubOwnerPermissionOrBetter(const struct NvKmsPerOpenDev *pOpenDev);

+NvU32 nvKmsKapiF16ToF32Internal(NvU16 a);
+
+NvU16 nvKmsKapiF32ToF16Internal(NvU32 a);
+
+NvU32 nvKmsKapiF32MulInternal(NvU32 a, NvU32 b);
+
+NvU32 nvKmsKapiF32DivInternal(NvU32 a, NvU32 b);
+
+NvU32 nvKmsKapiF32AddInternal(NvU32 a, NvU32 b);
+
+NvU32 nvKmsKapiF32ToUI32RMinMagInternal(NvU32 a, NvBool exact);
+
+NvU32 nvKmsKapiUI32ToF32Internal(NvU32 a);
+
 #endif /* __NV_KMS_H__ */
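Review note on the `nvKmsKapiF32*` and `nvKmsKapiF16*` helpers: they pass and return `NvU32` and `NvU16` rather than `float`, which suggests the values travel as raw bit patterns. The sketch below assumes the IEEE 754 binary32 encoding for those patterns; that assumption is not confirmed anywhere in this diff. `f32_to_bits` and `bits_to_f32` are hypothetical userspace helpers showing how a test harness might prepare arguments for, or decode results from, such functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Copy the object representation; avoids pointer-cast aliasing problems. */
static uint32_t f32_to_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_f32(uint32_t u)
{
    float f;

    memcpy(&f, &u, sizeof f);
    return f;
}
```

Under the stated assumption, `f32_to_bits(1.0f)` is `0x3f800000`, the kind of value a caller would hand to `nvKmsKapiF32Mul()` as one operand.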
--- a/kernel-open/nvidia-peermem/nvidia-peermem.c
+++ b/kernel-open/nvidia-peermem/nvidia-peermem.c
@@ -60,6 +60,13 @@ static int peerdirect_support = NV_MEM_PEERDIRECT_SUPPORT_DEFAULT;
 module_param(peerdirect_support, int, S_IRUGO);
 MODULE_PARM_DESC(peerdirect_support, "Set level of support for Peer-direct, 0 [default] or 1 [legacy, for example MLNX_OFED 4.9 LTS]");

+enum {
+        NV_MEM_PERSISTENT_API_SUPPORT_LEGACY = 0,
+        NV_MEM_PERSISTENT_API_SUPPORT_DEFAULT = 1,
+};
+static int persistent_api_support = NV_MEM_PERSISTENT_API_SUPPORT_DEFAULT;
+module_param(persistent_api_support, int, S_IRUGO);
+MODULE_PARM_DESC(persistent_api_support, "Set level of support for persistent APIs, 0 [legacy] or 1 [default]");

 #define peer_err(FMT, ARGS...) printk(KERN_ERR "nvidia-peermem" " %s:%d ERROR " FMT, __FUNCTION__, __LINE__, ## ARGS)
 #ifdef NV_MEM_DEBUG
@@ -479,32 +486,8 @@ static struct peer_memory_client nv_mem_client_nc = {
    .release        = nv_mem_release,
 };

-#endif /* NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT */
-
-static int nv_mem_param_conf_check(void)
+static int nv_mem_legacy_client_init(void)
 {
-    int rc = 0;
-    switch (peerdirect_support) {
-    case NV_MEM_PEERDIRECT_SUPPORT_DEFAULT:
-    case NV_MEM_PEERDIRECT_SUPPORT_LEGACY:
-        break;
-    default:
-        peer_err("invalid peerdirect_support param value %d\n", peerdirect_support);
-        rc = -EINVAL;
-        break;
-    }
-    return rc;
-}
-
-static int __init nv_mem_client_init(void)
-{
-    int rc;
-    rc = nv_mem_param_conf_check();
-    if (rc) {
-        return rc;
-    }
-
-#if defined (NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT)
    // off by one, to leave space for the trailing '1' which is flagging
    // the new client type
    BUG_ON(strlen(DRV_NAME) > IB_PEER_MEMORY_NAME_MAX-1);
@@ -533,19 +516,96 @@ static int __init nv_mem_client_init(void)
                         &mem_invalidate_callback);
    if (!reg_handle) {
        peer_err("nv_mem_client_init -- error while registering traditional client\n");
-        rc = -EINVAL;
-        goto out;
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int nv_mem_nc_client_init(void)
+{
+    // The nc client enables support for persistent pages.
+    if (persistent_api_support == NV_MEM_PERSISTENT_API_SUPPORT_LEGACY)
+    {
+        //
+        // If legacy behavior is forced via module param,
+        // both legacy and persistent clients are registered and are named
+        // "nv_mem"(legacy) and "nv_mem_nc"(persistent).
+        //
+        strcpy(nv_mem_client_nc.name, DRV_NAME "_nc");
+    }
+    else
+    {
+        //
+        // With default persistent behavior, the client name shall be "nv_mem"
+        // so that libraries can use the persistent client under the same name.
+        //
+        strcpy(nv_mem_client_nc.name, DRV_NAME);
    }

-    // The nc client enables support for persistent pages.
-    strcpy(nv_mem_client_nc.name, DRV_NAME "_nc");
    strcpy(nv_mem_client_nc.version, DRV_VERSION);
    reg_handle_nc = ib_register_peer_memory_client(&nv_mem_client_nc, NULL);
    if (!reg_handle_nc) {
        peer_err("nv_mem_client_init -- error while registering nc client\n");
-        rc = -EINVAL;
-        goto out;
+        return -EINVAL;
    }
+    return 0;
+}
+
+#endif /* NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT */
+
+static int nv_mem_param_peerdirect_conf_check(void)
+{
+    int rc = 0;
+    switch (peerdirect_support) {
+    case NV_MEM_PEERDIRECT_SUPPORT_DEFAULT:
+    case NV_MEM_PEERDIRECT_SUPPORT_LEGACY:
+        break;
+    default:
+        peer_err("invalid peerdirect_support param value %d\n", peerdirect_support);
+        rc = -EINVAL;
+        break;
+    }
+    return rc;
+}
+
+static int nv_mem_param_persistent_api_conf_check(void)
+{
+    int rc = 0;
+    switch (persistent_api_support) {
+    case NV_MEM_PERSISTENT_API_SUPPORT_DEFAULT:
+    case NV_MEM_PERSISTENT_API_SUPPORT_LEGACY:
+        break;
+    default:
+        peer_err("invalid persistent_api_support param value %d\n", persistent_api_support);
+        rc = -EINVAL;
+        break;
+    }
+    return rc;
+}
+
+static int __init nv_mem_client_init(void)
+{
+#if defined (NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT)
+    int rc;
+    rc = nv_mem_param_peerdirect_conf_check();
+    if (rc) {
+        return rc;
+    }
+
+    rc = nv_mem_param_persistent_api_conf_check();
+    if (rc) {
+        return rc;
+    }
+
+    if (persistent_api_support == NV_MEM_PERSISTENT_API_SUPPORT_LEGACY) {
+        rc = nv_mem_legacy_client_init();
+        if (rc)
+            goto out;
+    }
+
+    rc = nv_mem_nc_client_init();
+    if (rc)
+        goto out;

 out:
    if (rc) {
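Review note on the nvidia-peermem.c rework: `persistent_api_support` now decides both which clients register and what the persistent ("nc") client is called. With the default value 1, only the nc client registers, under the name "nv_mem". With the legacy value 0, the traditional client registers as well and the nc client becomes "nv_mem_nc". Any other value is rejected with `-EINVAL`. The sketch below models that decision table outside the kernel. `nc_client_name` and `registers_legacy_client` are hypothetical helpers, and -1 stands in for `-EINVAL`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DRV_NAME "nv_mem" /* the name quoted in the diff's comments */

enum {
    PERSISTENT_API_LEGACY  = 0,
    PERSISTENT_API_DEFAULT = 1,
};

/* Writes the nc client's name to out; returns 0, or -1 for a bad parameter. */
static int nc_client_name(int persistent_api_support, char *out, size_t out_len)
{
    assert(out != NULL);

    switch (persistent_api_support) {
    case PERSISTENT_API_LEGACY:
        snprintf(out, out_len, "%s_nc", DRV_NAME);
        return 0;
    case PERSISTENT_API_DEFAULT:
        snprintf(out, out_len, "%s", DRV_NAME);
        return 0;
    default:
        return -1;
    }
}

/* The traditional client is registered only in legacy mode. */
static int registers_legacy_client(int persistent_api_support)
{
    return persistent_api_support == PERSISTENT_API_LEGACY;
}
```

The practical effect is that a library looking up "nv_mem" gets the persistent client by default and the traditional client only when legacy mode is requested.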
--- a/kernel-open/nvidia-uvm/clc365.h
+++ b/kernel-open/nvidia-uvm/clc365.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2023 NVIDIA Corporation
+    Copyright (c) 2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
--- a/kernel-open/nvidia-uvm/clc369.h
+++ b/kernel-open/nvidia-uvm/clc369.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2023 NVIDIA Corporation
+    Copyright (c) 2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
--- a/kernel-open/nvidia-uvm/clc96f.h
+++ b/kernel-open/nvidia-uvm/clc96f.h
@@ -0,0 +1,329 @@
+/*******************************************************************************
+    Copyright (c) 2012-2015 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be
+    included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+
+#ifndef _clc96f_h_
+#define _clc96f_h_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "nvtypes.h"
+
+/* class BLACKWELL_CHANNEL_GPFIFO  */
+/*
+ * Documentation for BLACKWELL_CHANNEL_GPFIFO can be found in dev_pbdma.ref,
+ * chapter "User Control Registers". It is documented as device NV_UDMA.
+ * The GPFIFO format itself is also documented in dev_pbdma.ref,
+ * NV_PPBDMA_GP_ENTRY_*. The pushbuffer format is documented in dev_ram.ref,
+ * chapter "FIFO DMA RAM", NV_FIFO_DMA_*.
+ *
+ * Note there is no .mfs file for this class.
+ */
+#define  BLACKWELL_CHANNEL_GPFIFO_A                           (0x0000C96F)
+
+#define NVC96F_TYPEDEF                             BLACKWELL_CHANNELChannelGPFifoA
+
+/* dma flow control data structure */
+typedef volatile struct Nvc96fControl_struct {
+ NvU32 Ignored00[0x23];        /*                                  0000-008b*/
+ NvU32 GPPut;                   /* GP FIFO put offset               008c-008f*/
+ NvU32 Ignored01[0x5c];
+} Nvc96fControl, BlackwellAControlGPFifo;
+
+/* fields and values */
+#define NVC96F_NUMBER_OF_SUBCHANNELS                               (8)
+#define NVC96F_SET_OBJECT                                          (0x00000000)
+#define NVC96F_SET_OBJECT_NVCLASS                                         15:0
+#define NVC96F_SET_OBJECT_ENGINE                                         20:16
+#define NVC96F_SET_OBJECT_ENGINE_SW                                 0x0000001f
+#define NVC96F_NOP                                                 (0x00000008)
+#define NVC96F_NOP_HANDLE                                                 31:0
+#define NVC96F_NON_STALL_INTERRUPT                                 (0x00000020)
+#define NVC96F_NON_STALL_INTERRUPT_HANDLE                                 31:0
+#define NVC96F_FB_FLUSH                                            (0x00000024) // Deprecated - use MEMBAR TYPE SYS_MEMBAR
+#define NVC96F_FB_FLUSH_HANDLE                                            31:0
+// NOTE - MEM_OP_A and MEM_OP_B have been replaced in gp100 with methods for
+// specifying the page address for a targeted TLB invalidate and the uTLB for
+// a targeted REPLAY_CANCEL for UVM.
+// The previous MEM_OP_A/B functionality is in MEM_OP_C/D, with slightly
+// rearranged fields.
+#define NVC96F_MEM_OP_A                                            (0x00000028)
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_CLIENT_UNIT_ID        5:0  // only relevant for REPLAY_CANCEL_TARGETED
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_INVALIDATION_SIZE                   5:0  // Used to specify size of invalidate, used for invalidates which are not of the REPLAY_CANCEL_TARGETED type
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_GPC_ID               10:6  // only relevant for REPLAY_CANCEL_TARGETED
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_INVAL_SCOPE                         7:6  // only relevant for invalidates with NVC96F_MEM_OP_C_TLB_INVALIDATE_REPLAY_NONE for invalidating  link TLB only, or non-link TLB only or all TLBs
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_INVAL_SCOPE_ALL_TLBS                  0
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_INVAL_SCOPE_LINK_TLBS                 1
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_INVAL_SCOPE_NON_LINK_TLBS             2
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_INVAL_SCOPE_RSVRVD                    3
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_CANCEL_MMU_ENGINE_ID                8:0  // only relevant for REPLAY_CANCEL_VA_GLOBAL
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR                         11:11
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR_EN                 0x00000001
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR_DIS                0x00000000
+#define NVC96F_MEM_OP_A_TLB_INVALIDATE_TARGET_ADDR_LO                    31:12
+#define NVC96F_MEM_OP_B                                            (0x0000002c)
+#define NVC96F_MEM_OP_B_TLB_INVALIDATE_TARGET_ADDR_HI                     31:0
+#define NVC96F_MEM_OP_C                                            (0x00000030)
+#define NVC96F_MEM_OP_C_MEMBAR_TYPE                                        2:0
+#define NVC96F_MEM_OP_C_MEMBAR_TYPE_SYS_MEMBAR                      0x00000000
+#define NVC96F_MEM_OP_C_MEMBAR_TYPE_MEMBAR                          0x00000001
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PDB                                 0:0
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PDB_ONE                      0x00000000
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PDB_ALL                      0x00000001  // Probably nonsensical for MMU_TLB_INVALIDATE_TARGETED
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_GPC                                 1:1
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_GPC_ENABLE                   0x00000000
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_GPC_DISABLE                  0x00000001
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_REPLAY                              4:2  // only relevant if GPC ENABLE
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_REPLAY_NONE                  0x00000000
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_REPLAY_START                 0x00000001
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_REPLAY_START_ACK_ALL         0x00000002
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_TARGETED       0x00000003
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_GLOBAL         0x00000004
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_VA_GLOBAL      0x00000005
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE                            6:5  // only relevant if GPC ENABLE
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_NONE                0x00000000
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_GLOBALLY            0x00000001
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_INTRANODE           0x00000002
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE                         9:7 //only relevant for REPLAY_CANCEL_VA_GLOBAL
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_READ                 0
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_WRITE                1
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_STRONG        2
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_RSVRVD               3
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_WEAK          4
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_ALL           5
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_WRITE_AND_ATOMIC     6
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ALL                  7
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL                    9:7  // Invalidate affects this level and all below
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_ALL         0x00000000  // Invalidate tlb caches at all levels of the page table
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_PTE_ONLY    0x00000001
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE0  0x00000002
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE1  0x00000003
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE2  0x00000004
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3  0x00000005
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE4  0x00000006
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE5  0x00000007
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE                          11:10  // only relevant if PDB_ONE
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_VID_MEM             0x00000000
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_SYS_MEM_COHERENT    0x00000002
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_SYS_MEM_NONCOHERENT 0x00000003
+#define NVC96F_MEM_OP_C_TLB_INVALIDATE_PDB_ADDR_LO                       31:12  // only relevant if PDB_ONE
+#define NVC96F_MEM_OP_C_ACCESS_COUNTER_CLR_TARGETED_NOTIFY_TAG            19:0
+// MEM_OP_D MUST be preceded by MEM_OPs A-C.
+#define NVC96F_MEM_OP_D                                            (0x00000034)
+#define NVC96F_MEM_OP_D_TLB_INVALIDATE_PDB_ADDR_HI                        26:0  // only relevant if PDB_ONE
+#define NVC96F_MEM_OP_D_OPERATION                                        31:27
+#define NVC96F_MEM_OP_D_OPERATION_MEMBAR                            0x00000005
+#define NVC96F_MEM_OP_D_OPERATION_MMU_TLB_INVALIDATE                0x00000009
+#define NVC96F_MEM_OP_D_OPERATION_MMU_TLB_INVALIDATE_TARGETED       0x0000000a
+#define NVC96F_MEM_OP_D_OPERATION_MMU_OPERATION                     0x0000000b
+#define NVC96F_MEM_OP_D_OPERATION_L2_PEERMEM_INVALIDATE             0x0000000d
+#define NVC96F_MEM_OP_D_OPERATION_L2_SYSMEM_INVALIDATE              0x0000000e
+// CLEAN_LINES is an alias for Tegra/GPU IP usage
+#define NVC96F_MEM_OP_B_OPERATION_L2_INVALIDATE_CLEAN_LINES         0x0000000e
+#define NVC96F_MEM_OP_D_OPERATION_L2_CLEAN_COMPTAGS                 0x0000000f
+#define NVC96F_MEM_OP_D_OPERATION_L2_FLUSH_DIRTY                    0x00000010
+#define NVC96F_MEM_OP_D_OPERATION_L2_SYSMEM_NCOH_INVALIDATE         0x00000011
+#define NVC96F_MEM_OP_D_OPERATION_L2_SYSMEM_COH_INVALIDATE          0x00000012
+#define NVC96F_MEM_OP_D_OPERATION_L2_WAIT_FOR_SYS_PENDING_READS     0x00000015
+#define NVC96F_MEM_OP_D_OPERATION_ACCESS_COUNTER_CLR                0x00000016
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE                            1:0
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_MIMC                0x00000000
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_MOMC                0x00000001
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_ALL                 0x00000002
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_TARGETED            0x00000003
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE                   2:2
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE_MIMC       0x00000000
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE_MOMC       0x00000001
+#define NVC96F_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_BANK                   6:3
+#define NVC96F_MEM_OP_D_MMU_OPERATION_TYPE                               23:20
+#define NVC96F_MEM_OP_D_MMU_OPERATION_TYPE_RESERVED                 0x00000000
+#define NVC96F_MEM_OP_D_MMU_OPERATION_TYPE_VIDMEM_ACCESS_BIT_DUMP   0x00000001
+#define NVC96F_SEM_ADDR_LO                                         (0x0000005c)
+#define NVC96F_SEM_ADDR_LO_OFFSET                                         31:2
+#define NVC96F_SEM_ADDR_HI                                         (0x00000060)
+#define NVC96F_SEM_ADDR_HI_OFFSET                                         24:0
+#define NVC96F_SEM_PAYLOAD_LO                                      (0x00000064)
+#define NVC96F_SEM_PAYLOAD_LO_PAYLOAD                                     31:0
+#define NVC96F_SEM_PAYLOAD_HI                                      (0x00000068)
+#define NVC96F_SEM_PAYLOAD_HI_PAYLOAD                                     31:0
+#define NVC96F_SEM_EXECUTE                                         (0x0000006c)
+#define NVC96F_SEM_EXECUTE_OPERATION                                       2:0
+#define NVC96F_SEM_EXECUTE_OPERATION_ACQUIRE                        0x00000000
+#define NVC96F_SEM_EXECUTE_OPERATION_RELEASE                        0x00000001
+#define NVC96F_SEM_EXECUTE_OPERATION_ACQ_STRICT_GEQ                 0x00000002
+#define NVC96F_SEM_EXECUTE_OPERATION_ACQ_CIRC_GEQ                   0x00000003
+#define NVC96F_SEM_EXECUTE_OPERATION_ACQ_AND                        0x00000004
+#define NVC96F_SEM_EXECUTE_OPERATION_ACQ_NOR                        0x00000005
+#define NVC96F_SEM_EXECUTE_OPERATION_REDUCTION                      0x00000006
+#define NVC96F_SEM_EXECUTE_ACQUIRE_SWITCH_TSG                            12:12
+#define NVC96F_SEM_EXECUTE_ACQUIRE_SWITCH_TSG_DIS                   0x00000000
+#define NVC96F_SEM_EXECUTE_ACQUIRE_SWITCH_TSG_EN                    0x00000001
+#define NVC96F_SEM_EXECUTE_ACQUIRE_RECHECK                               18:18
+#define NVC96F_SEM_EXECUTE_ACQUIRE_RECHECK_DIS                      0x00000000
+#define NVC96F_SEM_EXECUTE_ACQUIRE_RECHECK_EN                       0x00000001
+#define NVC96F_SEM_EXECUTE_RELEASE_WFI                                   20:20
+#define NVC96F_SEM_EXECUTE_RELEASE_WFI_DIS                          0x00000000
+#define NVC96F_SEM_EXECUTE_RELEASE_WFI_EN                           0x00000001
+#define NVC96F_SEM_EXECUTE_PAYLOAD_SIZE                                  24:24
+#define NVC96F_SEM_EXECUTE_PAYLOAD_SIZE_32BIT                       0x00000000
+#define NVC96F_SEM_EXECUTE_PAYLOAD_SIZE_64BIT                       0x00000001
+#define NVC96F_SEM_EXECUTE_RELEASE_TIMESTAMP                             25:25
+#define NVC96F_SEM_EXECUTE_RELEASE_TIMESTAMP_DIS                    0x00000000
+#define NVC96F_SEM_EXECUTE_RELEASE_TIMESTAMP_EN                     0x00000001
+#define NVC96F_SEM_EXECUTE_REDUCTION                                     30:27
+#define NVC96F_SEM_EXECUTE_REDUCTION_IMIN                           0x00000000
+#define NVC96F_SEM_EXECUTE_REDUCTION_IMAX                           0x00000001
+#define NVC96F_SEM_EXECUTE_REDUCTION_IXOR                           0x00000002
+#define NVC96F_SEM_EXECUTE_REDUCTION_IAND                           0x00000003
+#define NVC96F_SEM_EXECUTE_REDUCTION_IOR                            0x00000004
+#define NVC96F_SEM_EXECUTE_REDUCTION_IADD                           0x00000005
+#define NVC96F_SEM_EXECUTE_REDUCTION_INC                            0x00000006
+#define NVC96F_SEM_EXECUTE_REDUCTION_DEC                            0x00000007
+#define NVC96F_SEM_EXECUTE_REDUCTION_FORMAT                              31:31
+#define NVC96F_SEM_EXECUTE_REDUCTION_FORMAT_SIGNED                  0x00000000
+#define NVC96F_SEM_EXECUTE_REDUCTION_FORMAT_UNSIGNED                0x00000001
+#define NVC96F_WFI                                                 (0x00000078)
+#define NVC96F_WFI_SCOPE                                                   0:0
+#define NVC96F_WFI_SCOPE_CURRENT_SCG_TYPE                           0x00000000
+#define NVC96F_WFI_SCOPE_CURRENT_VEID                               0x00000000
+#define NVC96F_WFI_SCOPE_ALL                                        0x00000001
+#define NVC96F_YIELD                                               (0x00000080)
+#define NVC96F_YIELD_OP                                                    1:0
+#define NVC96F_YIELD_OP_NOP                                         0x00000000
+#define NVC96F_YIELD_OP_TSG                                         0x00000003
+#define NVC96F_CLEAR_FAULTED                                       (0x00000084)
+// Note: RM provides the HANDLE as an opaque value; the internal detail fields
+// are intentionally not exposed to the driver through these defines.
+#define NVC96F_CLEAR_FAULTED_HANDLE                                       30:0
+#define NVC96F_CLEAR_FAULTED_TYPE                                        31:31
+#define NVC96F_CLEAR_FAULTED_TYPE_PBDMA_FAULTED                     0x00000000
+#define NVC96F_CLEAR_FAULTED_TYPE_ENG_FAULTED                       0x00000001
+
+
+/* GPFIFO entry format */
+#define NVC96F_GP_ENTRY__SIZE                                          8
+#define NVC96F_GP_ENTRY0_FETCH                                       0:0
+#define NVC96F_GP_ENTRY0_FETCH_UNCONDITIONAL                  0x00000000
+#define NVC96F_GP_ENTRY0_FETCH_CONDITIONAL                    0x00000001
+#define NVC96F_GP_ENTRY0_GET                                        31:2
+#define NVC96F_GP_ENTRY0_OPERAND                                    31:0
+#define NVC96F_GP_ENTRY0_PB_EXTENDED_BASE_OPERAND                   24:8
+#define NVC96F_GP_ENTRY1_GET_HI                                      7:0
+#define NVC96F_GP_ENTRY1_LEVEL                                       9:9
+#define NVC96F_GP_ENTRY1_LEVEL_MAIN                           0x00000000
+#define NVC96F_GP_ENTRY1_LEVEL_SUBROUTINE                     0x00000001
+#define NVC96F_GP_ENTRY1_LENGTH                                    30:10
+#define NVC96F_GP_ENTRY1_SYNC                                      31:31
+#define NVC96F_GP_ENTRY1_SYNC_PROCEED                         0x00000000
+#define NVC96F_GP_ENTRY1_SYNC_WAIT                            0x00000001
+#define NVC96F_GP_ENTRY1_OPCODE                                      7:0
+#define NVC96F_GP_ENTRY1_OPCODE_NOP                           0x00000000
+#define NVC96F_GP_ENTRY1_OPCODE_ILLEGAL                       0x00000001
+#define NVC96F_GP_ENTRY1_OPCODE_GP_CRC                        0x00000002
+#define NVC96F_GP_ENTRY1_OPCODE_PB_CRC                        0x00000003
+#define NVC96F_GP_ENTRY1_OPCODE_SET_PB_SEGMENT_EXTENDED_BASE  0x00000004
+
+/* dma method formats */
+#define NVC96F_DMA_METHOD_ADDRESS_OLD                              12:2
+#define NVC96F_DMA_METHOD_ADDRESS                                  11:0
+#define NVC96F_DMA_SUBDEVICE_MASK                                  15:4
+#define NVC96F_DMA_METHOD_SUBCHANNEL                               15:13
+#define NVC96F_DMA_TERT_OP                                         17:16
+#define NVC96F_DMA_TERT_OP_GRP0_INC_METHOD                         (0x00000000)
+#define NVC96F_DMA_TERT_OP_GRP0_SET_SUB_DEV_MASK                   (0x00000001)
+#define NVC96F_DMA_TERT_OP_GRP0_STORE_SUB_DEV_MASK                 (0x00000002)
+#define NVC96F_DMA_TERT_OP_GRP0_USE_SUB_DEV_MASK                   (0x00000003)
+#define NVC96F_DMA_TERT_OP_GRP2_NON_INC_METHOD                     (0x00000000)
+#define NVC96F_DMA_METHOD_COUNT_OLD                                28:18
+#define NVC96F_DMA_METHOD_COUNT                                    28:16
+#define NVC96F_DMA_IMMD_DATA                                       28:16
+#define NVC96F_DMA_SEC_OP                                          31:29
+#define NVC96F_DMA_SEC_OP_GRP0_USE_TERT                            (0x00000000)
+#define NVC96F_DMA_SEC_OP_INC_METHOD                               (0x00000001)
+#define NVC96F_DMA_SEC_OP_GRP2_USE_TERT                            (0x00000002)
+#define NVC96F_DMA_SEC_OP_NON_INC_METHOD                           (0x00000003)
+#define NVC96F_DMA_SEC_OP_IMMD_DATA_METHOD                         (0x00000004)
+#define NVC96F_DMA_SEC_OP_ONE_INC                                  (0x00000005)
+#define NVC96F_DMA_SEC_OP_RESERVED6                                (0x00000006)
+#define NVC96F_DMA_SEC_OP_END_PB_SEGMENT                           (0x00000007)
+/* dma incrementing method format */
+#define NVC96F_DMA_INCR_ADDRESS                                    11:0
+#define NVC96F_DMA_INCR_SUBCHANNEL                                 15:13
+#define NVC96F_DMA_INCR_COUNT                                      28:16
+#define NVC96F_DMA_INCR_OPCODE                                     31:29
+#define NVC96F_DMA_INCR_OPCODE_VALUE                               (0x00000001)
+#define NVC96F_DMA_INCR_DATA                                       31:0
+/* dma non-incrementing method format */
+#define NVC96F_DMA_NONINCR_ADDRESS                                 11:0
+#define NVC96F_DMA_NONINCR_SUBCHANNEL                              15:13
+#define NVC96F_DMA_NONINCR_COUNT                                   28:16
+#define NVC96F_DMA_NONINCR_OPCODE                                  31:29
+#define NVC96F_DMA_NONINCR_OPCODE_VALUE                            (0x00000003)
+#define NVC96F_DMA_NONINCR_DATA                                    31:0
+/* dma increment-once method format */
+#define NVC96F_DMA_ONEINCR_ADDRESS                                 11:0
+#define NVC96F_DMA_ONEINCR_SUBCHANNEL                              15:13
+#define NVC96F_DMA_ONEINCR_COUNT                                   28:16
+#define NVC96F_DMA_ONEINCR_OPCODE                                  31:29
+#define NVC96F_DMA_ONEINCR_OPCODE_VALUE                            (0x00000005)
+#define NVC96F_DMA_ONEINCR_DATA                                    31:0
+/* dma no-operation format */
+#define NVC96F_DMA_NOP                                             (0x00000000)
+/* dma immediate-data format */
+#define NVC96F_DMA_IMMD_ADDRESS                                    11:0
+#define NVC96F_DMA_IMMD_SUBCHANNEL                                 15:13
+#define NVC96F_DMA_IMMD_DATA                                       28:16
+#define NVC96F_DMA_IMMD_OPCODE                                     31:29
+#define NVC96F_DMA_IMMD_OPCODE_VALUE                               (0x00000004)
+/* dma set sub-device mask format */
+#define NVC96F_DMA_SET_SUBDEVICE_MASK_VALUE                        15:4
+#define NVC96F_DMA_SET_SUBDEVICE_MASK_OPCODE                       31:16
+#define NVC96F_DMA_SET_SUBDEVICE_MASK_OPCODE_VALUE                 (0x00000001)
+/* dma store sub-device mask format */
+#define NVC96F_DMA_STORE_SUBDEVICE_MASK_VALUE                      15:4
+#define NVC96F_DMA_STORE_SUBDEVICE_MASK_OPCODE                     31:16
+#define NVC96F_DMA_STORE_SUBDEVICE_MASK_OPCODE_VALUE               (0x00000002)
+/* dma use sub-device mask format */
+#define NVC96F_DMA_USE_SUBDEVICE_MASK_OPCODE                       31:16
+#define NVC96F_DMA_USE_SUBDEVICE_MASK_OPCODE_VALUE                 (0x00000003)
+/* dma end-segment format */
+#define NVC96F_DMA_ENDSEG_OPCODE                                   31:29
+#define NVC96F_DMA_ENDSEG_OPCODE_VALUE                             (0x00000007)
+/* dma legacy incrementing/non-incrementing formats */
+#define NVC96F_DMA_ADDRESS                                         12:2
+#define NVC96F_DMA_SUBCH                                           15:13
+#define NVC96F_DMA_OPCODE3                                         17:16
+#define NVC96F_DMA_OPCODE3_NONE                                    (0x00000000)
+#define NVC96F_DMA_COUNT                                           28:18
+#define NVC96F_DMA_OPCODE                                          31:29
+#define NVC96F_DMA_OPCODE_METHOD                                   (0x00000000)
+#define NVC96F_DMA_OPCODE_NONINC_METHOD                            (0x00000002)
+#define NVC96F_DMA_DATA                                            31:0
+
+#ifdef __cplusplus
+};     /* extern "C" */
+#endif
+
+#endif /* _clc96f_h_ */
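
Note on the "hi:lo" field defines in clc96f.h above: values such as 28:16 are not usable as plain C expressions; they are meant to be consumed by range-extracting macros. The sketch below is illustrative only and is not part of this patch. The FIELD_* helpers and dma_incr_header() are hypothetical names invented for this example, and it assumes the ADDRESS field carries the method byte offset divided by four, which the legacy 12:2 ADDRESS layout suggests but this header does not state outright. It shows one way an incrementing-method header word could be packed from the NVC96F_DMA_INCR_* fields.

```c
#include <stdint.h>

/* Copied verbatim from the clc96f.h hunk above. */
#define NVC96F_WFI                      (0x00000078)
#define NVC96F_DMA_INCR_ADDRESS         11:0
#define NVC96F_DMA_INCR_SUBCHANNEL      15:13
#define NVC96F_DMA_INCR_COUNT           28:16
#define NVC96F_DMA_INCR_OPCODE          31:29
#define NVC96F_DMA_INCR_OPCODE_VALUE    (0x00000001)

/*
 * Hypothetical helpers (not from the patch). A "hi:lo" range becomes the
 * tail of a conditional expression, so (1 ? 28:16) evaluates to 28 and
 * (0 ? 28:16) evaluates to 16.
 */
#define FIELD_HI(range)     (1 ? range)
#define FIELD_LO(range)     (0 ? range)
/* Right-aligned mask with (hi - lo + 1) bits set. */
#define FIELD_MASK(range)   (0xFFFFFFFFu >> (31 - FIELD_HI(range) + FIELD_LO(range)))
/* Truncate v to the field width and shift it into position. */
#define FIELD_NUM(range, v) ((((uint32_t)(v)) & FIELD_MASK(range)) << FIELD_LO(range))

/*
 * Pack the header word for an incrementing method.
 * ASSUMPTION: the ADDRESS field holds the method byte offset >> 2.
 */
static inline uint32_t dma_incr_header(uint32_t subch, uint32_t method,
                                       uint32_t count)
{
    return FIELD_NUM(NVC96F_DMA_INCR_OPCODE, NVC96F_DMA_INCR_OPCODE_VALUE) |
           FIELD_NUM(NVC96F_DMA_INCR_COUNT, count) |
           FIELD_NUM(NVC96F_DMA_INCR_SUBCHANNEL, subch) |
           FIELD_NUM(NVC96F_DMA_INCR_ADDRESS, method >> 2);
}
```

Under those assumptions, dma_incr_header(0, NVC96F_WFI, 1) yields 0x2001001E: opcode 1 in bits 31:29, count 1 in bits 28:16, subchannel 0, and 0x78 >> 2 = 0x1E in bits 11:0. The data words for the method would follow the header in the pushbuffer.
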
--- a/kernel-open/nvidia-uvm/clc9b5.h
+++ b/kernel-open/nvidia-uvm/clc9b5.h
@@ -0,0 +1,460 @@
+/*******************************************************************************
+    Copyright (c) 1993-2004 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be
+    included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+
+
+#include "nvtypes.h"
+
+#ifndef _clc9b5_h_
+#define _clc9b5_h_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define BLACKWELL_DMA_COPY_A                                                            (0x0000C9B5)
+
+typedef volatile struct _clc9b5_tag0 {
+    NvV32 Reserved00[0x40];
+    NvV32 Nop;                                                                  // 0x00000100 - 0x00000103
+    NvV32 Reserved01[0xF];
+    NvV32 PmTrigger;                                                            // 0x00000140 - 0x00000143
+    NvV32 Reserved02[0x36];
+    NvV32 SetMonitoredFenceType;                                                // 0x0000021C - 0x0000021F
+    NvV32 SetMonitoredFenceSignalAddrBaseUpper;                                 // 0x00000220 - 0x00000223
+    NvV32 SetMonitoredFenceSignalAddrBaseLower;                                 // 0x00000224 - 0x00000227
+    NvV32 Reserved03[0x6];
+    NvV32 SetSemaphoreA;                                                        // 0x00000240 - 0x00000243
+    NvV32 SetSemaphoreB;                                                        // 0x00000244 - 0x00000247
+    NvV32 SetSemaphorePayload;                                                  // 0x00000248 - 0x0000024B
+    NvV32 SetSemaphorePayloadUpper;                                             // 0x0000024C - 0x0000024F
+    NvV32 Reserved04[0x1];
+    NvV32 SetRenderEnableA;                                                     // 0x00000254 - 0x00000257
+    NvV32 SetRenderEnableB;                                                     // 0x00000258 - 0x0000025B
+    NvV32 SetRenderEnableC;                                                     // 0x0000025C - 0x0000025F
+    NvV32 SetSrcPhysMode;                                                       // 0x00000260 - 0x00000263
+    NvV32 SetDstPhysMode;                                                       // 0x00000264 - 0x00000267
+    NvV32 Reserved05[0x26];
+    NvV32 LaunchDma;                                                            // 0x00000300 - 0x00000303
+    NvV32 Reserved06[0x3F];
+    NvV32 OffsetInUpper;                                                        // 0x00000400 - 0x00000403
+    NvV32 OffsetInLower;                                                        // 0x00000404 - 0x00000407
+    NvV32 OffsetOutUpper;                                                       // 0x00000408 - 0x0000040B
+    NvV32 OffsetOutLower;                                                       // 0x0000040C - 0x0000040F
+    NvV32 PitchIn;                                                              // 0x00000410 - 0x00000413
+    NvV32 PitchOut;                                                             // 0x00000414 - 0x00000417
+    NvV32 LineLengthIn;                                                         // 0x00000418 - 0x0000041B
+    NvV32 LineCount;                                                            // 0x0000041C - 0x0000041F
+    NvV32 Reserved07[0x38];
+    NvV32 SetSecureCopyMode;                                                    // 0x00000500 - 0x00000503
+    NvV32 SetDecryptIv0;                                                        // 0x00000504 - 0x00000507
+    NvV32 SetDecryptIv1;                                                        // 0x00000508 - 0x0000050B
+    NvV32 SetDecryptIv2;                                                        // 0x0000050C - 0x0000050F
+    NvV32 Reserved_SetAESCounter;                                               // 0x00000510 - 0x00000513
+    NvV32 SetDecryptAuthTagCompareAddrUpper;                                    // 0x00000514 - 0x00000517
+    NvV32 SetDecryptAuthTagCompareAddrLower;                                    // 0x00000518 - 0x0000051B
+    NvV32 Reserved08[0x5];
+    NvV32 SetEncryptAuthTagAddrUpper;                                           // 0x00000530 - 0x00000533
+    NvV32 SetEncryptAuthTagAddrLower;                                           // 0x00000534 - 0x00000537
+    NvV32 SetEncryptIvAddrUpper;                                                // 0x00000538 - 0x0000053B
+    NvV32 SetEncryptIvAddrLower;                                                // 0x0000053C - 0x0000053F
+    NvV32 Reserved09[0x10];
+    NvV32 SetCompressionParameters;                                             // 0x00000580 - 0x00000583
+    NvV32 SetDecompressOutLength;                                               // 0x00000584 - 0x00000587
+    NvV32 SetDecompressOutLengthAddrUpper;                                      // 0x00000588 - 0x0000058B
+    NvV32 SetDecompressOutLengthAddrLower;                                      // 0x0000058C - 0x0000058F
+    NvV32 SetDecompressChecksum;                                                // 0x00000590 - 0x00000593
+    NvV32 Reserved10[0x5A];
+    NvV32 SetMemoryScrubParameters;                                             // 0x000006FC - 0x000006FF
+    NvV32 SetRemapConstA;                                                       // 0x00000700 - 0x00000703
+    NvV32 SetRemapConstB;                                                       // 0x00000704 - 0x00000707
+    NvV32 SetRemapComponents;                                                   // 0x00000708 - 0x0000070B
+    NvV32 SetDstBlockSize;                                                      // 0x0000070C - 0x0000070F
+    NvV32 SetDstWidth;                                                          // 0x00000710 - 0x00000713
+    NvV32 SetDstHeight;                                                         // 0x00000714 - 0x00000717
+    NvV32 SetDstDepth;                                                          // 0x00000718 - 0x0000071B
+    NvV32 SetDstLayer;                                                          // 0x0000071C - 0x0000071F
+    NvV32 SetDstOrigin;                                                         // 0x00000720 - 0x00000723
+    NvV32 Reserved11[0x1];
+    NvV32 SetSrcBlockSize;                                                      // 0x00000728 - 0x0000072B
+    NvV32 SetSrcWidth;                                                          // 0x0000072C - 0x0000072F
+    NvV32 SetSrcHeight;                                                         // 0x00000730 - 0x00000733
+    NvV32 SetSrcDepth;                                                          // 0x00000734 - 0x00000737
+    NvV32 SetSrcLayer;                                                          // 0x00000738 - 0x0000073B
+    NvV32 SetSrcOrigin;                                                         // 0x0000073C - 0x0000073F
+    NvV32 Reserved12[0x1];
+    NvV32 SrcOriginX;                                                           // 0x00000744 - 0x00000747
+    NvV32 SrcOriginY;                                                           // 0x00000748 - 0x0000074B
+    NvV32 DstOriginX;                                                           // 0x0000074C - 0x0000074F
+    NvV32 DstOriginY;                                                           // 0x00000750 - 0x00000753
+    NvV32 Reserved13[0x270];
+    NvV32 PmTriggerEnd;                                                         // 0x00001114 - 0x00001117
+    NvV32 Reserved14[0x3BA];
+} blackwell_dma_copy_aControlPio;
+
+#define NVC9B5_NOP                                                              (0x00000100)
+#define NVC9B5_NOP_PARAMETER                                                    31:0
+#define NVC9B5_PM_TRIGGER                                                       (0x00000140)
+#define NVC9B5_PM_TRIGGER_V                                                     31:0
+#define NVC9B5_SET_MONITORED_FENCE_TYPE                                         (0x0000021C)
+#define NVC9B5_SET_MONITORED_FENCE_TYPE_TYPE                                    0:0
+#define NVC9B5_SET_MONITORED_FENCE_TYPE_TYPE_MONITORED_FENCE                    (0x00000000)
+#define NVC9B5_SET_MONITORED_FENCE_TYPE_TYPE_MONITORED_FENCE_EXT                (0x00000001)
+#define NVC9B5_SET_MONITORED_FENCE_SIGNAL_ADDR_BASE_UPPER                       (0x00000220)
+#define NVC9B5_SET_MONITORED_FENCE_SIGNAL_ADDR_BASE_UPPER_UPPER                 24:0
+#define NVC9B5_SET_MONITORED_FENCE_SIGNAL_ADDR_BASE_LOWER                       (0x00000224)
+#define NVC9B5_SET_MONITORED_FENCE_SIGNAL_ADDR_BASE_LOWER_LOWER                 31:0
+#define NVC9B5_SET_SEMAPHORE_A                                                  (0x00000240)
+#define NVC9B5_SET_SEMAPHORE_A_UPPER                                            24:0
+#define NVC9B5_SET_SEMAPHORE_B                                                  (0x00000244)
+#define NVC9B5_SET_SEMAPHORE_B_LOWER                                            31:0
+#define NVC9B5_SET_SEMAPHORE_PAYLOAD                                            (0x00000248)
+#define NVC9B5_SET_SEMAPHORE_PAYLOAD_PAYLOAD                                    31:0
+#define NVC9B5_SET_SEMAPHORE_PAYLOAD_UPPER                                      (0x0000024C)
+#define NVC9B5_SET_SEMAPHORE_PAYLOAD_UPPER_PAYLOAD                              31:0
+#define NVC9B5_SET_RENDER_ENABLE_A                                              (0x00000254)
+#define NVC9B5_SET_RENDER_ENABLE_A_UPPER                                        24:0
+#define NVC9B5_SET_RENDER_ENABLE_B                                              (0x00000258)
+#define NVC9B5_SET_RENDER_ENABLE_B_LOWER                                        31:0
+#define NVC9B5_SET_RENDER_ENABLE_C                                              (0x0000025C)
+#define NVC9B5_SET_RENDER_ENABLE_C_MODE                                         2:0
+#define NVC9B5_SET_RENDER_ENABLE_C_MODE_FALSE                                   (0x00000000)
+#define NVC9B5_SET_RENDER_ENABLE_C_MODE_TRUE                                    (0x00000001)
+#define NVC9B5_SET_RENDER_ENABLE_C_MODE_CONDITIONAL                             (0x00000002)
+#define NVC9B5_SET_RENDER_ENABLE_C_MODE_RENDER_IF_EQUAL                         (0x00000003)
+#define NVC9B5_SET_RENDER_ENABLE_C_MODE_RENDER_IF_NOT_EQUAL                     (0x00000004)
+#define NVC9B5_SET_SRC_PHYS_MODE                                                (0x00000260)
+#define NVC9B5_SET_SRC_PHYS_MODE_TARGET                                         1:0
+#define NVC9B5_SET_SRC_PHYS_MODE_TARGET_LOCAL_FB                                (0x00000000)
+#define NVC9B5_SET_SRC_PHYS_MODE_TARGET_COHERENT_SYSMEM                         (0x00000001)
+#define NVC9B5_SET_SRC_PHYS_MODE_TARGET_NONCOHERENT_SYSMEM                      (0x00000002)
+#define NVC9B5_SET_SRC_PHYS_MODE_TARGET_PEERMEM                                 (0x00000003)
+#define NVC9B5_SET_SRC_PHYS_MODE_BASIC_KIND                                     5:2
+#define NVC9B5_SET_SRC_PHYS_MODE_PEER_ID                                        8:6
+#define NVC9B5_SET_SRC_PHYS_MODE_FLA                                            9:9
+#define NVC9B5_SET_DST_PHYS_MODE                                                (0x00000264)
+#define NVC9B5_SET_DST_PHYS_MODE_TARGET                                         1:0
+#define NVC9B5_SET_DST_PHYS_MODE_TARGET_LOCAL_FB                                (0x00000000)
+#define NVC9B5_SET_DST_PHYS_MODE_TARGET_COHERENT_SYSMEM                         (0x00000001)
+#define NVC9B5_SET_DST_PHYS_MODE_TARGET_NONCOHERENT_SYSMEM                      (0x00000002)
+#define NVC9B5_SET_DST_PHYS_MODE_TARGET_PEERMEM                                 (0x00000003)
+#define NVC9B5_SET_DST_PHYS_MODE_BASIC_KIND                                     5:2
+#define NVC9B5_SET_DST_PHYS_MODE_PEER_ID                                        8:6
+#define NVC9B5_SET_DST_PHYS_MODE_FLA                                            9:9
+#define NVC9B5_LAUNCH_DMA                                                       (0x00000300)
+#define NVC9B5_LAUNCH_DMA_DATA_TRANSFER_TYPE                                    1:0
+#define NVC9B5_LAUNCH_DMA_DATA_TRANSFER_TYPE_NONE                               (0x00000000)
+#define NVC9B5_LAUNCH_DMA_DATA_TRANSFER_TYPE_PIPELINED                          (0x00000001)
+#define NVC9B5_LAUNCH_DMA_DATA_TRANSFER_TYPE_NON_PIPELINED                      (0x00000002)
+#define NVC9B5_LAUNCH_DMA_FLUSH_ENABLE                                          2:2
+#define NVC9B5_LAUNCH_DMA_FLUSH_ENABLE_FALSE                                    (0x00000000)
+#define NVC9B5_LAUNCH_DMA_FLUSH_ENABLE_TRUE                                     (0x00000001)
+#define NVC9B5_LAUNCH_DMA_FLUSH_TYPE                                            25:25
+#define NVC9B5_LAUNCH_DMA_FLUSH_TYPE_SYS                                        (0x00000000)
+#define NVC9B5_LAUNCH_DMA_FLUSH_TYPE_GL                                         (0x00000001)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_TYPE                                        4:3
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_TYPE_NONE                                   (0x00000000)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_TYPE_RELEASE_SEMAPHORE_NO_TIMESTAMP         (0x00000001)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_TYPE_RELEASE_SEMAPHORE_WITH_TIMESTAMP       (0x00000002)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_TYPE_RELEASE_ONE_WORD_SEMAPHORE             (0x00000001)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_TYPE_RELEASE_FOUR_WORD_SEMAPHORE            (0x00000002)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_TYPE_RELEASE_CONDITIONAL_INTR_SEMAPHORE     (0x00000003)
+#define NVC9B5_LAUNCH_DMA_INTERRUPT_TYPE                                        6:5
+#define NVC9B5_LAUNCH_DMA_INTERRUPT_TYPE_NONE                                   (0x00000000)
+#define NVC9B5_LAUNCH_DMA_INTERRUPT_TYPE_BLOCKING                               (0x00000001)
+#define NVC9B5_LAUNCH_DMA_INTERRUPT_TYPE_NON_BLOCKING                           (0x00000002)
+#define NVC9B5_LAUNCH_DMA_SRC_MEMORY_LAYOUT                                     7:7
+#define NVC9B5_LAUNCH_DMA_SRC_MEMORY_LAYOUT_BLOCKLINEAR                         (0x00000000)
+#define NVC9B5_LAUNCH_DMA_SRC_MEMORY_LAYOUT_PITCH                               (0x00000001)
+#define NVC9B5_LAUNCH_DMA_DST_MEMORY_LAYOUT                                     8:8
+#define NVC9B5_LAUNCH_DMA_DST_MEMORY_LAYOUT_BLOCKLINEAR                         (0x00000000)
+#define NVC9B5_LAUNCH_DMA_DST_MEMORY_LAYOUT_PITCH                               (0x00000001)
+#define NVC9B5_LAUNCH_DMA_MULTI_LINE_ENABLE                                     9:9
+#define NVC9B5_LAUNCH_DMA_MULTI_LINE_ENABLE_FALSE                               (0x00000000)
+#define NVC9B5_LAUNCH_DMA_MULTI_LINE_ENABLE_TRUE                                (0x00000001)
+#define NVC9B5_LAUNCH_DMA_REMAP_ENABLE                                          10:10
+#define NVC9B5_LAUNCH_DMA_REMAP_ENABLE_FALSE                                    (0x00000000)
+#define NVC9B5_LAUNCH_DMA_REMAP_ENABLE_TRUE                                     (0x00000001)
+#define NVC9B5_LAUNCH_DMA_COMPRESSION_ENABLE                                    11:11
+#define NVC9B5_LAUNCH_DMA_COMPRESSION_ENABLE_FALSE                              (0x00000000)
+#define NVC9B5_LAUNCH_DMA_COMPRESSION_ENABLE_TRUE                               (0x00000001)
+#define NVC9B5_LAUNCH_DMA_SRC_TYPE                                              12:12
+#define NVC9B5_LAUNCH_DMA_SRC_TYPE_VIRTUAL                                      (0x00000000)
+#define NVC9B5_LAUNCH_DMA_SRC_TYPE_PHYSICAL                                     (0x00000001)
+#define NVC9B5_LAUNCH_DMA_DST_TYPE                                              13:13
+#define NVC9B5_LAUNCH_DMA_DST_TYPE_VIRTUAL                                      (0x00000000)
+#define NVC9B5_LAUNCH_DMA_DST_TYPE_PHYSICAL                                     (0x00000001)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION                                   17:14
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_IMIN                              (0x00000000)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_IMAX                              (0x00000001)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_IXOR                              (0x00000002)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_IAND                              (0x00000003)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_IOR                               (0x00000004)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_IADD                              (0x00000005)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_INC                               (0x00000006)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_DEC                               (0x00000007)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_INVALIDA                          (0x00000008)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_INVALIDB                          (0x00000009)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_FADD                              (0x0000000A)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_FMIN                              (0x0000000B)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_FMAX                              (0x0000000C)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_INVALIDC                          (0x0000000D)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_INVALIDD                          (0x0000000E)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_INVALIDE                          (0x0000000F)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_SIGN                              18:18
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_SIGN_SIGNED                       (0x00000000)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_SIGN_UNSIGNED                     (0x00000001)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_ENABLE                            19:19
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_ENABLE_FALSE                      (0x00000000)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_REDUCTION_ENABLE_TRUE                       (0x00000001)
+#define NVC9B5_LAUNCH_DMA_COPY_TYPE                                             21:20
+#define NVC9B5_LAUNCH_DMA_COPY_TYPE_PROT2PROT                                   (0x00000000)
+#define NVC9B5_LAUNCH_DMA_COPY_TYPE_DEFAULT                                     (0x00000000)
+#define NVC9B5_LAUNCH_DMA_COPY_TYPE_SECURE                                      (0x00000001)
+#define NVC9B5_LAUNCH_DMA_COPY_TYPE_NONPROT2NONPROT                             (0x00000002)
+#define NVC9B5_LAUNCH_DMA_COPY_TYPE_RESERVED                                    (0x00000003)
+#define NVC9B5_LAUNCH_DMA_VPRMODE                                               22:22
+#define NVC9B5_LAUNCH_DMA_VPRMODE_VPR_NONE                                      (0x00000000)
+#define NVC9B5_LAUNCH_DMA_VPRMODE_VPR_VID2VID                                   (0x00000001)
+#define NVC9B5_LAUNCH_DMA_MEMORY_SCRUB_ENABLE                                   23:23
+#define NVC9B5_LAUNCH_DMA_MEMORY_SCRUB_ENABLE_FALSE                             (0x00000000)
+#define NVC9B5_LAUNCH_DMA_MEMORY_SCRUB_ENABLE_TRUE                              (0x00000001)
+#define NVC9B5_LAUNCH_DMA_RESERVED_START_OF_COPY                                24:24
+#define NVC9B5_LAUNCH_DMA_DISABLE_PLC                                           26:26
+#define NVC9B5_LAUNCH_DMA_DISABLE_PLC_FALSE                                     (0x00000000)
+#define NVC9B5_LAUNCH_DMA_DISABLE_PLC_TRUE                                      (0x00000001)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_PAYLOAD_SIZE                                27:27
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_PAYLOAD_SIZE_ONE_WORD                       (0x00000000)
+#define NVC9B5_LAUNCH_DMA_SEMAPHORE_PAYLOAD_SIZE_TWO_WORD                       (0x00000001)
+#define NVC9B5_LAUNCH_DMA_RESERVED_ERR_CODE                                     31:28
+#define NVC9B5_OFFSET_IN_UPPER                                                  (0x00000400)
+#define NVC9B5_OFFSET_IN_UPPER_UPPER                                            24:0
+#define NVC9B5_OFFSET_IN_LOWER                                                  (0x00000404)
+#define NVC9B5_OFFSET_IN_LOWER_VALUE                                            31:0
+#define NVC9B5_OFFSET_OUT_UPPER                                                 (0x00000408)
+#define NVC9B5_OFFSET_OUT_UPPER_UPPER                                           24:0
+#define NVC9B5_OFFSET_OUT_LOWER                                                 (0x0000040C)
+#define NVC9B5_OFFSET_OUT_LOWER_VALUE                                           31:0
+#define NVC9B5_PITCH_IN                                                         (0x00000410)
+#define NVC9B5_PITCH_IN_VALUE                                                   31:0
+#define NVC9B5_PITCH_OUT                                                        (0x00000414)
+#define NVC9B5_PITCH_OUT_VALUE                                                  31:0
+#define NVC9B5_LINE_LENGTH_IN                                                   (0x00000418)
+#define NVC9B5_LINE_LENGTH_IN_VALUE                                             31:0
+#define NVC9B5_LINE_COUNT                                                       (0x0000041C)
+#define NVC9B5_LINE_COUNT_VALUE                                                 31:0
+#define NVC9B5_SET_SECURE_COPY_MODE                                             (0x00000500)
+#define NVC9B5_SET_SECURE_COPY_MODE_MODE                                        0:0
+#define NVC9B5_SET_SECURE_COPY_MODE_MODE_ENCRYPT                                (0x00000000)
+#define NVC9B5_SET_SECURE_COPY_MODE_MODE_DECRYPT                                (0x00000001)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_SRC_TARGET                         20:19
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_SRC_TARGET_LOCAL_FB                (0x00000000)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_SRC_TARGET_COHERENT_SYSMEM         (0x00000001)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_SRC_TARGET_NONCOHERENT_SYSMEM      (0x00000002)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_SRC_TARGET_PEERMEM                 (0x00000003)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_SRC_PEER_ID                        23:21
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_SRC_FLA                            24:24
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_DST_TARGET                         26:25
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_DST_TARGET_LOCAL_FB                (0x00000000)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_DST_TARGET_COHERENT_SYSMEM         (0x00000001)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_DST_TARGET_NONCOHERENT_SYSMEM      (0x00000002)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_DST_TARGET_PEERMEM                 (0x00000003)
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_DST_PEER_ID                        29:27
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_DST_FLA                            30:30
+#define NVC9B5_SET_SECURE_COPY_MODE_RESERVED_END_OF_COPY                        31:31
+#define NVC9B5_SET_DECRYPT_IV0                                                  (0x00000504)
+#define NVC9B5_SET_DECRYPT_IV0_VALUE                                            31:0
+#define NVC9B5_SET_DECRYPT_IV1                                                  (0x00000508)
+#define NVC9B5_SET_DECRYPT_IV1_VALUE                                            31:0
+#define NVC9B5_SET_DECRYPT_IV2                                                  (0x0000050C)
+#define NVC9B5_SET_DECRYPT_IV2_VALUE                                            31:0
+#define NVC9B5_RESERVED_SET_AESCOUNTER                                          (0x00000510)
+#define NVC9B5_RESERVED_SET_AESCOUNTER_VALUE                                    31:0
+#define NVC9B5_SET_DECRYPT_AUTH_TAG_COMPARE_ADDR_UPPER                          (0x00000514)
+#define NVC9B5_SET_DECRYPT_AUTH_TAG_COMPARE_ADDR_UPPER_UPPER                    24:0
+#define NVC9B5_SET_DECRYPT_AUTH_TAG_COMPARE_ADDR_LOWER                          (0x00000518)
+#define NVC9B5_SET_DECRYPT_AUTH_TAG_COMPARE_ADDR_LOWER_LOWER                    31:0
+#define NVC9B5_SET_ENCRYPT_AUTH_TAG_ADDR_UPPER                                  (0x00000530)
+#define NVC9B5_SET_ENCRYPT_AUTH_TAG_ADDR_UPPER_UPPER                            24:0
+#define NVC9B5_SET_ENCRYPT_AUTH_TAG_ADDR_LOWER                                  (0x00000534)
+#define NVC9B5_SET_ENCRYPT_AUTH_TAG_ADDR_LOWER_LOWER                            31:0
+#define NVC9B5_SET_ENCRYPT_IV_ADDR_UPPER                                        (0x00000538)
+#define NVC9B5_SET_ENCRYPT_IV_ADDR_UPPER_UPPER                                  24:0
+#define NVC9B5_SET_ENCRYPT_IV_ADDR_LOWER                                        (0x0000053C)
+#define NVC9B5_SET_ENCRYPT_IV_ADDR_LOWER_LOWER                                  31:0
+#define NVC9B5_SET_COMPRESSION_PARAMETERS                                       (0x00000580)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_OPERATION                             0:0
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_OPERATION_DECOMPRESS                  (0x00000000)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_OPERATION_COMPRESS                    (0x00000001)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO                                  3:1
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO_SNAPPY                           (0x00000000)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO_LZ4_DATA_ONLY                    (0x00000001)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO_LZ4_BLOCK                        (0x00000002)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO_LZ4_BLOCK_CHECKSUM               (0x00000003)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO_DEFLATE                          (0x00000004)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO_SNAPPY_WITH_LONG_FETCH           (0x00000005)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM                             29:28
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM_NONE                        (0x00000000)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM_ADLER32                     (0x00000001)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM_CRC32                       (0x00000002)
+#define NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM_SNAPPY_CRC                  (0x00000003)
+#define NVC9B5_SET_DECOMPRESS_OUT_LENGTH                                        (0x00000584)
+#define NVC9B5_SET_DECOMPRESS_OUT_LENGTH_V                                      31:0
+#define NVC9B5_SET_DECOMPRESS_OUT_LENGTH_ADDR_UPPER                             (0x00000588)
+#define NVC9B5_SET_DECOMPRESS_OUT_LENGTH_ADDR_UPPER_UPPER                       24:0
+#define NVC9B5_SET_DECOMPRESS_OUT_LENGTH_ADDR_LOWER                             (0x0000058C)
+#define NVC9B5_SET_DECOMPRESS_OUT_LENGTH_ADDR_LOWER_LOWER                       31:0
+#define NVC9B5_SET_DECOMPRESS_CHECKSUM                                          (0x00000590)
+#define NVC9B5_SET_DECOMPRESS_CHECKSUM_V                                        31:0
+#define NVC9B5_SET_MEMORY_SCRUB_PARAMETERS                                      (0x000006FC)
+#define NVC9B5_SET_MEMORY_SCRUB_PARAMETERS_DISCARDABLE                          0:0
+#define NVC9B5_SET_MEMORY_SCRUB_PARAMETERS_DISCARDABLE_FALSE                    (0x00000000)
+#define NVC9B5_SET_MEMORY_SCRUB_PARAMETERS_DISCARDABLE_TRUE                     (0x00000001)
+#define NVC9B5_SET_REMAP_CONST_A                                                (0x00000700)
+#define NVC9B5_SET_REMAP_CONST_A_V                                              31:0
+#define NVC9B5_SET_REMAP_CONST_B                                                (0x00000704)
+#define NVC9B5_SET_REMAP_CONST_B_V                                              31:0
+#define NVC9B5_SET_REMAP_COMPONENTS                                             (0x00000708)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_X                                       2:0
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_X_SRC_X                                 (0x00000000)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_X_SRC_Y                                 (0x00000001)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_X_SRC_Z                                 (0x00000002)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_X_SRC_W                                 (0x00000003)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_X_CONST_A                               (0x00000004)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_X_CONST_B                               (0x00000005)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_X_NO_WRITE                              (0x00000006)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Y                                       6:4
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Y_SRC_X                                 (0x00000000)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Y_SRC_Y                                 (0x00000001)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Y_SRC_Z                                 (0x00000002)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Y_SRC_W                                 (0x00000003)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Y_CONST_A                               (0x00000004)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Y_CONST_B                               (0x00000005)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Y_NO_WRITE                              (0x00000006)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Z                                       10:8
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Z_SRC_X                                 (0x00000000)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Z_SRC_Y                                 (0x00000001)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Z_SRC_Z                                 (0x00000002)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Z_SRC_W                                 (0x00000003)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Z_CONST_A                               (0x00000004)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Z_CONST_B                               (0x00000005)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_Z_NO_WRITE                              (0x00000006)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_W                                       14:12
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_W_SRC_X                                 (0x00000000)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_W_SRC_Y                                 (0x00000001)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_W_SRC_Z                                 (0x00000002)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_W_SRC_W                                 (0x00000003)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_W_CONST_A                               (0x00000004)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_W_CONST_B                               (0x00000005)
+#define NVC9B5_SET_REMAP_COMPONENTS_DST_W_NO_WRITE                              (0x00000006)
+#define NVC9B5_SET_REMAP_COMPONENTS_COMPONENT_SIZE                              17:16
+#define NVC9B5_SET_REMAP_COMPONENTS_COMPONENT_SIZE_ONE                          (0x00000000)
+#define NVC9B5_SET_REMAP_COMPONENTS_COMPONENT_SIZE_TWO                          (0x00000001)
+#define NVC9B5_SET_REMAP_COMPONENTS_COMPONENT_SIZE_THREE                        (0x00000002)
+#define NVC9B5_SET_REMAP_COMPONENTS_COMPONENT_SIZE_FOUR                         (0x00000003)
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_SRC_COMPONENTS                          21:20
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_SRC_COMPONENTS_ONE                      (0x00000000)
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_SRC_COMPONENTS_TWO                      (0x00000001)
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_SRC_COMPONENTS_THREE                    (0x00000002)
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_SRC_COMPONENTS_FOUR                     (0x00000003)
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_DST_COMPONENTS                          25:24
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_DST_COMPONENTS_ONE                      (0x00000000)
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_DST_COMPONENTS_TWO                      (0x00000001)
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_DST_COMPONENTS_THREE                    (0x00000002)
+#define NVC9B5_SET_REMAP_COMPONENTS_NUM_DST_COMPONENTS_FOUR                     (0x00000003)
+#define NVC9B5_SET_DST_BLOCK_SIZE                                               (0x0000070C)
+#define NVC9B5_SET_DST_BLOCK_SIZE_WIDTH                                         3:0
+#define NVC9B5_SET_DST_BLOCK_SIZE_WIDTH_ONE_GOB                                 (0x00000000)
+#define NVC9B5_SET_DST_BLOCK_SIZE_HEIGHT                                        7:4
+#define NVC9B5_SET_DST_BLOCK_SIZE_HEIGHT_ONE_GOB                                (0x00000000)
+#define NVC9B5_SET_DST_BLOCK_SIZE_HEIGHT_TWO_GOBS                               (0x00000001)
+#define NVC9B5_SET_DST_BLOCK_SIZE_HEIGHT_FOUR_GOBS                              (0x00000002)
+#define NVC9B5_SET_DST_BLOCK_SIZE_HEIGHT_EIGHT_GOBS                             (0x00000003)
+#define NVC9B5_SET_DST_BLOCK_SIZE_HEIGHT_SIXTEEN_GOBS                           (0x00000004)
+#define NVC9B5_SET_DST_BLOCK_SIZE_HEIGHT_THIRTYTWO_GOBS                         (0x00000005)
+#define NVC9B5_SET_DST_BLOCK_SIZE_DEPTH                                         11:8
+#define NVC9B5_SET_DST_BLOCK_SIZE_DEPTH_ONE_GOB                                 (0x00000000)
+#define NVC9B5_SET_DST_BLOCK_SIZE_DEPTH_TWO_GOBS                                (0x00000001)
+#define NVC9B5_SET_DST_BLOCK_SIZE_DEPTH_FOUR_GOBS                               (0x00000002)
+#define NVC9B5_SET_DST_BLOCK_SIZE_DEPTH_EIGHT_GOBS                              (0x00000003)
+#define NVC9B5_SET_DST_BLOCK_SIZE_DEPTH_SIXTEEN_GOBS                            (0x00000004)
+#define NVC9B5_SET_DST_BLOCK_SIZE_DEPTH_THIRTYTWO_GOBS                          (0x00000005)
+#define NVC9B5_SET_DST_BLOCK_SIZE_GOB_HEIGHT                                    15:12
+#define NVC9B5_SET_DST_BLOCK_SIZE_GOB_HEIGHT_GOB_HEIGHT_FERMI_8                 (0x00000001)
+#define NVC9B5_SET_DST_WIDTH                                                    (0x00000710)
+#define NVC9B5_SET_DST_WIDTH_V                                                  31:0
+#define NVC9B5_SET_DST_HEIGHT                                                   (0x00000714)
+#define NVC9B5_SET_DST_HEIGHT_V                                                 31:0
+#define NVC9B5_SET_DST_DEPTH                                                    (0x00000718)
+#define NVC9B5_SET_DST_DEPTH_V                                                  31:0
+#define NVC9B5_SET_DST_LAYER                                                    (0x0000071C)
+#define NVC9B5_SET_DST_LAYER_V                                                  31:0
+#define NVC9B5_SET_DST_ORIGIN                                                   (0x00000720)
+#define NVC9B5_SET_DST_ORIGIN_X                                                 15:0
+#define NVC9B5_SET_DST_ORIGIN_Y                                                 31:16
+#define NVC9B5_SET_SRC_BLOCK_SIZE                                               (0x00000728)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_WIDTH                                         3:0
+#define NVC9B5_SET_SRC_BLOCK_SIZE_WIDTH_ONE_GOB                                 (0x00000000)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_HEIGHT                                        7:4
+#define NVC9B5_SET_SRC_BLOCK_SIZE_HEIGHT_ONE_GOB                                (0x00000000)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_HEIGHT_TWO_GOBS                               (0x00000001)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_HEIGHT_FOUR_GOBS                              (0x00000002)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_HEIGHT_EIGHT_GOBS                             (0x00000003)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_HEIGHT_SIXTEEN_GOBS                           (0x00000004)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_HEIGHT_THIRTYTWO_GOBS                         (0x00000005)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_DEPTH                                         11:8
+#define NVC9B5_SET_SRC_BLOCK_SIZE_DEPTH_ONE_GOB                                 (0x00000000)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_DEPTH_TWO_GOBS                                (0x00000001)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_DEPTH_FOUR_GOBS                               (0x00000002)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_DEPTH_EIGHT_GOBS                              (0x00000003)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_DEPTH_SIXTEEN_GOBS                            (0x00000004)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_DEPTH_THIRTYTWO_GOBS                          (0x00000005)
+#define NVC9B5_SET_SRC_BLOCK_SIZE_GOB_HEIGHT                                    15:12
+#define NVC9B5_SET_SRC_BLOCK_SIZE_GOB_HEIGHT_GOB_HEIGHT_FERMI_8                 (0x00000001)
+#define NVC9B5_SET_SRC_WIDTH                                                    (0x0000072C)
+#define NVC9B5_SET_SRC_WIDTH_V                                                  31:0
+#define NVC9B5_SET_SRC_HEIGHT                                                   (0x00000730)
+#define NVC9B5_SET_SRC_HEIGHT_V                                                 31:0
+#define NVC9B5_SET_SRC_DEPTH                                                    (0x00000734)
+#define NVC9B5_SET_SRC_DEPTH_V                                                  31:0
+#define NVC9B5_SET_SRC_LAYER                                                    (0x00000738)
+#define NVC9B5_SET_SRC_LAYER_V                                                  31:0
+#define NVC9B5_SET_SRC_ORIGIN                                                   (0x0000073C)
+#define NVC9B5_SET_SRC_ORIGIN_X                                                 15:0
+#define NVC9B5_SET_SRC_ORIGIN_Y                                                 31:16
+#define NVC9B5_SRC_ORIGIN_X                                                     (0x00000744)
+#define NVC9B5_SRC_ORIGIN_X_VALUE                                               31:0
+#define NVC9B5_SRC_ORIGIN_Y                                                     (0x00000748)
+#define NVC9B5_SRC_ORIGIN_Y_VALUE                                               31:0
+#define NVC9B5_DST_ORIGIN_X                                                     (0x0000074C)
+#define NVC9B5_DST_ORIGIN_X_VALUE                                               31:0
+#define NVC9B5_DST_ORIGIN_Y                                                     (0x00000750)
+#define NVC9B5_DST_ORIGIN_Y_VALUE                                               31:0
+#define NVC9B5_PM_TRIGGER_END                                                   (0x00001114)
+#define NVC9B5_PM_TRIGGER_END_V                                                 31:0
+
+#ifdef __cplusplus
+};     /* extern "C" */
+#endif
+#endif // _clc9b5_h
+
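Reviewer note on the `hi:lo` field definitions in the clc9b5.h hunk above: a field such as `NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO 3:1` is not a value but a bit range, and it is typically consumed by placing it inside a ternary expression so that the colon selects the high or low bit. The sketch below shows one way to pack and unpack such fields. The `EX_*` macros, `ex_pack`, and `ex_compression_params_deflate_crc32` are hypothetical names introduced only for illustration; they are not part of this change, and the driver's own helper macros may differ. Only the `NVC9B5_*` definitions are copied from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Copied verbatim from the clc9b5.h hunk above. */
#define NVC9B5_SET_COMPRESSION_PARAMETERS                     (0x00000580)
#define NVC9B5_SET_COMPRESSION_PARAMETERS_OPERATION           0:0
#define NVC9B5_SET_COMPRESSION_PARAMETERS_OPERATION_COMPRESS  (0x00000001)
#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO                3:1
#define NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO_DEFLATE        (0x00000004)
#define NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM           29:28
#define NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM_CRC32     (0x00000002)

/* Hypothetical helpers (assumption, not the driver's API).
 * "(1 ? 3:1)" evaluates to 3 (high bit), "(0 ? 3:1)" to 1 (low bit). */
#define EX_FIELD_HI(f)    (1 ? f)
#define EX_FIELD_LO(f)    (0 ? f)
/* Right-aligned mask as wide as the field; valid for fields within 31:0. */
#define EX_FIELD_MASK(f)  (0xFFFFFFFFu >> (31 - EX_FIELD_HI(f) + EX_FIELD_LO(f)))

/* Shift a value into position after checking that it fits in the field. */
static inline uint32_t ex_pack(uint32_t value, uint32_t mask, unsigned lo)
{
    assert((value & ~mask) == 0u);
    return value << lo;
}

#define EX_FIELD_NUM(f, v) \
    ex_pack((uint32_t)(v), EX_FIELD_MASK(f), (unsigned)EX_FIELD_LO(f))
#define EX_FIELD_GET(f, word) \
    (((uint32_t)(word) >> EX_FIELD_LO(f)) & EX_FIELD_MASK(f))

/* Builds the data word for NVC9B5_SET_COMPRESSION_PARAMETERS selecting
 * COMPRESS + DEFLATE + CRC32. The combination is illustrative only; the
 * diff does not state which combinations the hardware accepts. */
static inline uint32_t ex_compression_params_deflate_crc32(void)
{
    return EX_FIELD_NUM(NVC9B5_SET_COMPRESSION_PARAMETERS_OPERATION,
                        NVC9B5_SET_COMPRESSION_PARAMETERS_OPERATION_COMPRESS)
         | EX_FIELD_NUM(NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO,
                        NVC9B5_SET_COMPRESSION_PARAMETERS_ALGO_DEFLATE)
         | EX_FIELD_NUM(NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM,
                        NVC9B5_SET_COMPRESSION_PARAMETERS_CHECK_SUM_CRC32);
}
```

Calling `ex_compression_params_deflate_crc32()` yields a 32-bit word with bit 0 set, the value 4 in bits 3:1, and the value 2 in bits 29:28. `EX_FIELD_GET` reverses the packing for a single field. The range check in `ex_pack` is a design choice of this sketch: it catches an enum value that does not fit its field instead of silently truncating it.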
--- a/kernel-open/nvidia-uvm/ctrl2080mc.h
+++ b/kernel-open/nvidia-uvm/ctrl2080mc.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2013-2023 NVIDIA Corporation
+    Copyright (c) 2013-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -34,6 +34,7 @@
 #define NV2080_CTRL_MC_ARCH_INFO_ARCHITECTURE_GA100                (0x00000170)
 #define NV2080_CTRL_MC_ARCH_INFO_ARCHITECTURE_GH100                (0x00000180)
 #define NV2080_CTRL_MC_ARCH_INFO_ARCHITECTURE_AD100                (0x00000190)
+#define NV2080_CTRL_MC_ARCH_INFO_ARCHITECTURE_GB100                (0x000001A0)

 /* valid ARCHITECTURE_GP10x implementation values */
 #define NV2080_CTRL_MC_ARCH_INFO_IMPLEMENTATION_GP100              (0x00000000)
--- a/kernel-open/nvidia-uvm/hwref/blackwell/gb100/dev_fault.h
+++ b/kernel-open/nvidia-uvm/hwref/blackwell/gb100/dev_fault.h
@@ -0,0 +1,547 @@
+/*******************************************************************************
+    Copyright (c) 2003-2016 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be
+    included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+
+#ifndef __gb100_dev_fault_h__
+#define __gb100_dev_fault_h__
+/* This file is autogenerated.  Do not edit */
+#define NV_PFAULT                                              /* ----G */
+#define NV_PFAULT_MMU_ENG_ID_GRAPHICS          384 /*       */
+#define NV_PFAULT_MMU_ENG_ID_DISPLAY           1 /*       */
+#define NV_PFAULT_MMU_ENG_ID_GSP               2 /*       */
+#define NV_PFAULT_MMU_ENG_ID_IFB               55 /*       */
+#define NV_PFAULT_MMU_ENG_ID_FLA               4 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1              256 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2              320 /*       */
+#define NV_PFAULT_MMU_ENG_ID_SEC               6 /*       */
+#define NV_PFAULT_MMU_ENG_ID_FSP               7 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF              10 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF0             10 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF1             11 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF2             12 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF3             13 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF4             14 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF5             15 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF6             16 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF7             17 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF8             18 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PERF9             19 /*       */
+#define NV_PFAULT_MMU_ENG_ID_GSPLITE          20 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC             28 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC0            28 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC1            29 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC2            30 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC3            31 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC4            32 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC5            33 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC6            34 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVDEC7            35 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVJPG0            36 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVJPG1            37 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVJPG2            38 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVJPG3            39 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVJPG4            40 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVJPG5            41 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVJPG6            42 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVJPG7            43 /*       */
+#define NV_PFAULT_MMU_ENG_ID_GRCOPY            65 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE0               65 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE1               66 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE2               67 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE3               68 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE4               69 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE5               70 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE6               71 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE7               72 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE8               73 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE9               74 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE10               75 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE11               76 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE12               77 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE13               78 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE14               79 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE15               80 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE16               81 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE17               82 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE18               83 /*       */
+#define NV_PFAULT_MMU_ENG_ID_CE19               84 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PWR_PMU           5 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PTP               3 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVENC0            44 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVENC1            45 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVENC2            46 /*       */
+#define NV_PFAULT_MMU_ENG_ID_NVENC3            47 /*       */
+#define NV_PFAULT_MMU_ENG_ID_OFA0              48 /*       */
+#define NV_PFAULT_MMU_ENG_ID_PHYSICAL          56 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST0             85 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST1             86 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST2             87 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST3             88 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST4             89 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST5             90 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST6             91 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST7             92 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST8             93 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST9             94 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST10            95 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST11            96 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST12            97 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST13            98 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST14            99 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST15            100 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST16            101 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST17            102 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST18            103 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST19            104 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST20            105 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST21            106 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST22            107 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST23            108 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST24            109 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST25            110 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST26            111 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST27            112 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST28            113 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST29            114 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST30            115 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST31            116 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST32            117 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST33            118 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST34            119 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST35            120 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST36            121 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST37            122 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST38            123 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST39            124 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST40            125 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST41            126 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST42            127 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST43            128 /*       */
+#define NV_PFAULT_MMU_ENG_ID_HOST44            129 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN0          256  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN1          257  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN2          258  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN3          259  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN4          260  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN5          261  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN6          262  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN7          263  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN8          264  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN9          265  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN10         266 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN11         267 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN12         268 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN13         269 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN14         270 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN15         271 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN16         272 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN17         273 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN18         274 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN19         275 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN20         276 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN21         277 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN22         278 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN23         279 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN24         280 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN25         281 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN26         282 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN27         283 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN28         284 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN29         285 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN30         286 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN31         287 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN32         288 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN33         289 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN34         290 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN35         291 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN36         292 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN37         293 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN38         294 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN39         295 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN40         296 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN41         297 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN42         298 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN43         299 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN44         300 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN45         301 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN46         302 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN47         303 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN48         304 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN49         305 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN50         306 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN51         307 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN52         308 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN53         309 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN54         310 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN55         311 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN56         312 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN57         313 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN58         314 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN59         315 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN60         316 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN61         317 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN62         318 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR1_FN63         319 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN0          320  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN1          321  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN2          322  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN3          323  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN4          324  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN5          325  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN6          326  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN7          327  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN8          328  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN9          329  /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN10         330 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN11         331 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN12         332 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN13         333 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN14         334 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN15         335 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN16         336 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN17         337 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN18         338 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN19         339 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN20         340 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN21         341 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN22         342 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN23         343 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN24         344 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN25         345 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN26         346 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN27         347 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN28         348 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN29         349 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN30         350 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN31         351 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN32         352 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN33         353 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN34         354 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN35         355 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN36         356 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN37         357 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN38         358 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN39         359 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN40         360 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN41         361 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN42         362 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN43         363 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN44         364 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN45         365 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN46         366 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN47         367 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN48         368 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN49         369 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN50         370 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN51         371 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN52         372 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN53         373 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN54         374 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN55         375 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN56         376 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN57         377 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN58         378 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN59         379 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN60         380 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN61         381 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN62         382 /*       */
+#define NV_PFAULT_MMU_ENG_ID_BAR2_FN63         383 /*       */
+#define NV_PFAULT_FAULT_TYPE                             4:0 /*       */
+#define NV_PFAULT_FAULT_TYPE_PDE                  0x00000000 /*       */
+#define NV_PFAULT_FAULT_TYPE_PDE_SIZE             0x00000001 /*       */
+#define NV_PFAULT_FAULT_TYPE_PTE                  0x00000002 /*       */
+#define NV_PFAULT_FAULT_TYPE_VA_LIMIT_VIOLATION   0x00000003 /*       */
+#define NV_PFAULT_FAULT_TYPE_UNBOUND_INST_BLOCK   0x00000004 /*       */
+#define NV_PFAULT_FAULT_TYPE_PRIV_VIOLATION       0x00000005 /*       */
+#define NV_PFAULT_FAULT_TYPE_RO_VIOLATION         0x00000006 /*       */
+#define NV_PFAULT_FAULT_TYPE_WO_VIOLATION         0x00000007 /*       */
+#define NV_PFAULT_FAULT_TYPE_PITCH_MASK_VIOLATION 0x00000008 /*       */
+#define NV_PFAULT_FAULT_TYPE_WORK_CREATION        0x00000009 /*       */
+#define NV_PFAULT_FAULT_TYPE_UNSUPPORTED_APERTURE 0x0000000a /*       */
+#define NV_PFAULT_FAULT_TYPE_CC_VIOLATION         0x0000000b /*       */
+#define NV_PFAULT_FAULT_TYPE_UNSUPPORTED_KIND     0x0000000c /*       */
+#define NV_PFAULT_FAULT_TYPE_REGION_VIOLATION     0x0000000d /*       */
+#define NV_PFAULT_FAULT_TYPE_POISONED             0x0000000e /*       */
+#define NV_PFAULT_FAULT_TYPE_ATOMIC_VIOLATION     0x0000000f /*       */
+#define NV_PFAULT_CLIENT                       14:8 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_0        0x00000000 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_1        0x00000001 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_2        0x00000002 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_3        0x00000003 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_4        0x00000004 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_5        0x00000005 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_6        0x00000006 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_7        0x00000007 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_0        0x00000008 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_1        0x00000009 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_2        0x0000000A /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_3        0x0000000B /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_4        0x0000000C /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_5        0x0000000D /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_6        0x0000000E /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_7        0x0000000F /*       */
+#define NV_PFAULT_CLIENT_GPC_RAST        0x00000010 /*       */
+#define NV_PFAULT_CLIENT_GPC_GCC         0x00000011 /*       */
+#define NV_PFAULT_CLIENT_GPC_GPCCS       0x00000012 /*       */
+#define NV_PFAULT_CLIENT_GPC_PROP_0      0x00000013 /*       */
+#define NV_PFAULT_CLIENT_GPC_PROP_1      0x00000014 /*       */
+#define NV_PFAULT_CLIENT_GPC_PROP_2      0x00000015 /*       */
+#define NV_PFAULT_CLIENT_GPC_PROP_3      0x00000016 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_8        0x00000021 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_9        0x00000022 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_10       0x00000023 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_11       0x00000024 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_12       0x00000025 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_13       0x00000026 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_14       0x00000027 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_15       0x00000028 /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_0     0x00000029 /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_1     0x0000002A /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_2     0x0000002B /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_3     0x0000002C /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_4     0x0000002D /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_5     0x0000002E /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_6     0x0000002F /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_7     0x00000030 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_8        0x00000031 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_9        0x00000032 /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_8     0x00000033 /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_9     0x00000034 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_16       0x00000035 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_17       0x00000036 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_18       0x00000037 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_19       0x00000038 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_10       0x00000039 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_11       0x0000003A /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_10    0x0000003B /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_11    0x0000003C /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_20       0x0000003D /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_21       0x0000003E /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_22       0x0000003F /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_23       0x00000040 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_12       0x00000041 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_13       0x00000042 /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_12    0x00000043 /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_13    0x00000044 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_24       0x00000045 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_25       0x00000046 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_26       0x00000047 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_27       0x00000048 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_14       0x00000049 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_15       0x0000004A /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_14    0x0000004B /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_15    0x0000004C /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_28       0x0000004D /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_29       0x0000004E /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_30       0x0000004F /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_31       0x00000050 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_16       0x00000051 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_17       0x00000052 /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_16    0x00000053 /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_17    0x00000054 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_32       0x00000055 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_33       0x00000056 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_34       0x00000057 /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_35       0x00000058 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_18       0x00000059 /*       */
+#define NV_PFAULT_CLIENT_GPC_PE_19       0x0000005A /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_18    0x0000005B /*       */
+#define NV_PFAULT_CLIENT_GPC_TPCCS_19    0x0000005C /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_36       0x0000005D /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_37       0x0000005E /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_38       0x0000005F /*       */
+#define NV_PFAULT_CLIENT_GPC_T1_39       0x00000060 /*       */
+#define NV_PFAULT_CLIENT_GPC_ROP_0       0x00000070 /*       */
+#define NV_PFAULT_CLIENT_GPC_ROP_1       0x00000071 /*       */
+#define NV_PFAULT_CLIENT_GPC_ROP_2       0x00000072 /*       */
+#define NV_PFAULT_CLIENT_GPC_ROP_3       0x00000073 /*       */
+#define NV_PFAULT_CLIENT_GPC_GPM          0x00000017 /*       */
+#define NV_PFAULT_CLIENT_HUB_VIP         0x00000000 /*       */
+#define NV_PFAULT_CLIENT_HUB_CE0         0x00000001 /*       */
+#define NV_PFAULT_CLIENT_HUB_CE1         0x00000002 /*       */
+#define NV_PFAULT_CLIENT_HUB_DNISO       0x00000003 /*       */
+#define NV_PFAULT_CLIENT_HUB_DISPNISO    0x00000003 /*       */
+#define NV_PFAULT_CLIENT_HUB_FE0         0x00000004 /*       */
+#define NV_PFAULT_CLIENT_HUB_FE          0x00000004 /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS0       0x00000005 /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS        0x00000005 /*       */
+#define NV_PFAULT_CLIENT_HUB_HOST        0x00000006 /*       */
+#define NV_PFAULT_CLIENT_HUB_HOST_CPU    0x00000007 /*       */
+#define NV_PFAULT_CLIENT_HUB_HOST_CPU_NB 0x00000008 /*       */
+#define NV_PFAULT_CLIENT_HUB_ISO         0x00000009 /*       */
+#define NV_PFAULT_CLIENT_HUB_MMU         0x0000000A /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC0      0x0000000B /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC       0x0000000B /*       */
+#define NV_PFAULT_CLIENT_HUB_CE3         0x0000000C /*       */
+#define NV_PFAULT_CLIENT_HUB_NVENC1      0x0000000D /*       */
+#define NV_PFAULT_CLIENT_HUB_NISO        0x0000000E /*       */
+#define NV_PFAULT_CLIENT_HUB_ACTRS       0x0000000E /*       */
+#define NV_PFAULT_CLIENT_HUB_P2P         0x0000000F /*       */
+#define NV_PFAULT_CLIENT_HUB_PD          0x00000010 /*       */
+#define NV_PFAULT_CLIENT_HUB_PD0         0x00000010 /*       */
+#define NV_PFAULT_CLIENT_HUB_PERF0       0x00000011 /*       */
+#define NV_PFAULT_CLIENT_HUB_PERF        0x00000011 /*       */
+#define NV_PFAULT_CLIENT_HUB_PMU         0x00000012 /*       */
+#define NV_PFAULT_CLIENT_HUB_RASTERTWOD  0x00000013 /*       */
+#define NV_PFAULT_CLIENT_HUB_RASTERTWOD0 0x00000013 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC         0x00000014 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC0        0x00000014 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC_NB      0x00000015 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC_NB0     0x00000015 /*       */
+#define NV_PFAULT_CLIENT_HUB_SEC         0x00000016 /*       */
+#define NV_PFAULT_CLIENT_HUB_SSYNC       0x00000017 /*       */
+#define NV_PFAULT_CLIENT_HUB_SSYNC0      0x00000017 /*       */
+#define NV_PFAULT_CLIENT_HUB_GRCOPY      0x00000018 /*       */
+#define NV_PFAULT_CLIENT_HUB_CE2         0x00000018 /*       */
+#define NV_PFAULT_CLIENT_HUB_XV          0x00000019 /*       */
+#define NV_PFAULT_CLIENT_HUB_MMU_NB      0x0000001A /*       */
+#define NV_PFAULT_CLIENT_HUB_NVENC0      0x0000001B /*       */
+#define NV_PFAULT_CLIENT_HUB_NVENC       0x0000001B /*       */
+#define NV_PFAULT_CLIENT_HUB_DFALCON     0x0000001C /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED0       0x0000001D /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED        0x0000001D /*       */
+#define NV_PFAULT_CLIENT_HUB_PD1         0x0000001E /*       */
+#define NV_PFAULT_CLIENT_HUB_DONT_CARE   0x0000001F /*       */
+#define NV_PFAULT_CLIENT_HUB_HSCE0       0x00000020 /*       */
+#define NV_PFAULT_CLIENT_HUB_HSCE1       0x00000021 /*       */
+#define NV_PFAULT_CLIENT_HUB_HSCE2       0x00000022 /*       */
+#define NV_PFAULT_CLIENT_HUB_HSCE3       0x00000023 /*       */
+#define NV_PFAULT_CLIENT_HUB_HSCE4       0x00000024 /*       */
+#define NV_PFAULT_CLIENT_HUB_HSCE5       0x00000025 /*       */
+#define NV_PFAULT_CLIENT_HUB_HSCE6       0x00000026 /*       */
+#define NV_PFAULT_CLIENT_HUB_HSCE7       0x00000027 /*       */
+#define NV_PFAULT_CLIENT_HUB_SSYNC1      0x00000028 /*       */
+#define NV_PFAULT_CLIENT_HUB_SSYNC2      0x00000029 /*       */
+#define NV_PFAULT_CLIENT_HUB_HSHUB       0x0000002A /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X0      0x0000002B /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X1      0x0000002C /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X2      0x0000002D /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X3      0x0000002E /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X4      0x0000002F /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X5      0x00000030 /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X6      0x00000031 /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X7      0x00000032 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVENC2      0x00000033 /*       */
+#define NV_PFAULT_CLIENT_HUB_VPR_SCRUBBER0 0x00000034 /*       */
+#define NV_PFAULT_CLIENT_HUB_VPR_SCRUBBER1 0x00000035 /*       */
+#define NV_PFAULT_CLIENT_HUB_SSYNC3      0x00000036 /*       */
+#define NV_PFAULT_CLIENT_HUB_FBFALCON    0x00000037 /*       */
+#define NV_PFAULT_CLIENT_HUB_CE_SHIM     0x00000038 /*       */
+#define NV_PFAULT_CLIENT_HUB_CE_SHIM0    0x00000038 /*       */
+#define NV_PFAULT_CLIENT_HUB_GSP         0x00000039 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC1      0x0000003A /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC2      0x0000003B /*       */
+#define NV_PFAULT_CLIENT_HUB_NVJPG0      0x0000003C /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC3      0x0000003D /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC4      0x0000003E /*       */
+#define NV_PFAULT_CLIENT_HUB_OFA0        0x0000003F /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC1        0x00000040 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC_NB1     0x00000041 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC2        0x00000042 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC_NB2     0x00000043 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC3        0x00000044 /*       */
+#define NV_PFAULT_CLIENT_HUB_SCC_NB3     0x00000045 /*       */
+#define NV_PFAULT_CLIENT_HUB_RASTERTWOD1 0x00000046 /*       */
+#define NV_PFAULT_CLIENT_HUB_PTP_X8      0x00000046 /*       */
+#define NV_PFAULT_CLIENT_HUB_RASTERTWOD2 0x00000047 /*       */
+#define NV_PFAULT_CLIENT_HUB_RASTERTWOD3 0x00000048 /*       */
+#define NV_PFAULT_CLIENT_HUB_GSPLITE1    0x00000049 /*       */
+#define NV_PFAULT_CLIENT_HUB_GSPLITE2    0x0000004A /*       */
+#define NV_PFAULT_CLIENT_HUB_GSPLITE3    0x0000004B /*       */
+#define NV_PFAULT_CLIENT_HUB_PD2         0x0000004C /*       */
+#define NV_PFAULT_CLIENT_HUB_PD3         0x0000004D /*       */
+#define NV_PFAULT_CLIENT_HUB_FE1         0x0000004E /*       */
+#define NV_PFAULT_CLIENT_HUB_FE2         0x0000004F /*       */
+#define NV_PFAULT_CLIENT_HUB_FE3         0x00000050 /*       */
+#define NV_PFAULT_CLIENT_HUB_FE4         0x00000051 /*       */
+#define NV_PFAULT_CLIENT_HUB_FE5         0x00000052 /*       */
+#define NV_PFAULT_CLIENT_HUB_FE6         0x00000053 /*       */
+#define NV_PFAULT_CLIENT_HUB_FE7         0x00000054 /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS1       0x00000055 /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS2       0x00000056 /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS3       0x00000057 /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS4       0x00000058 /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS5       0x00000059 /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS6       0x0000005A /*       */
+#define NV_PFAULT_CLIENT_HUB_FECS7       0x0000005B /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED1       0x0000005C /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED2       0x0000005D /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED3       0x0000005E /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED4       0x0000005F /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED5       0x00000060 /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED6       0x00000061 /*       */
+#define NV_PFAULT_CLIENT_HUB_SKED7       0x00000062 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC          0x00000063 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC0         0x00000063 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC1         0x00000064 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC2         0x00000065 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC3         0x00000066 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC4         0x00000067 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC5         0x00000068 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC6         0x00000069 /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC7         0x0000006a /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC8         0x0000006b /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC9         0x0000006c /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC10        0x0000006d /*       */
+#define NV_PFAULT_CLIENT_HUB_ESC11        0x0000006e /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC5      0x0000006F /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC6      0x00000070 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVDEC7      0x00000071 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVJPG1      0x00000072 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVJPG2      0x00000073 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVJPG3      0x00000074 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVJPG4      0x00000075 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVJPG5      0x00000076 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVJPG6      0x00000077 /*       */
+#define NV_PFAULT_CLIENT_HUB_NVJPG7      0x00000078 /*       */
+#define NV_PFAULT_CLIENT_HUB_FSP         0x00000079 /*       */
+#define NV_PFAULT_CLIENT_HUB_BSI         0x0000007A /*       */
+#define NV_PFAULT_CLIENT_HUB_GSPLITE     0x0000007B /*       */
+#define NV_PFAULT_CLIENT_HUB_GSPLITE0    0x0000007B /*       */
+#define NV_PFAULT_CLIENT_HUB_VPR_SCRUBBER2 0x0000007C /*       */
+#define NV_PFAULT_CLIENT_HUB_VPR_SCRUBBER3 0x0000007D /*       */
+#define NV_PFAULT_CLIENT_HUB_VPR_SCRUBBER4 0x0000007E /*       */
+#define NV_PFAULT_CLIENT_HUB_NVENC3      0x0000007F /*       */
+#define NV_PFAULT_ACCESS_TYPE                 19:16 /*       */
+#define NV_PFAULT_ACCESS_TYPE_READ       0x00000000 /*       */
+#define NV_PFAULT_ACCESS_TYPE_WRITE      0x00000001 /*       */
+#define NV_PFAULT_ACCESS_TYPE_ATOMIC     0x00000002 /*       */
+#define NV_PFAULT_ACCESS_TYPE_PREFETCH   0x00000003 /*       */
+#define NV_PFAULT_ACCESS_TYPE_VIRT_READ          0x00000000 /*       */
+#define NV_PFAULT_ACCESS_TYPE_VIRT_WRITE         0x00000001 /*       */
+#define NV_PFAULT_ACCESS_TYPE_VIRT_ATOMIC        0x00000002 /*       */
+#define NV_PFAULT_ACCESS_TYPE_VIRT_ATOMIC_STRONG 0x00000002 /*       */
+#define NV_PFAULT_ACCESS_TYPE_VIRT_PREFETCH      0x00000003 /*       */
+#define NV_PFAULT_ACCESS_TYPE_VIRT_ATOMIC_WEAK   0x00000004 /*       */
+#define NV_PFAULT_ACCESS_TYPE_PHYS_READ          0x00000008 /*       */
+#define NV_PFAULT_ACCESS_TYPE_PHYS_WRITE         0x00000009 /*       */
+#define NV_PFAULT_ACCESS_TYPE_PHYS_ATOMIC        0x0000000a /*       */
+#define NV_PFAULT_ACCESS_TYPE_PHYS_PREFETCH      0x0000000b /*       */
+#define NV_PFAULT_MMU_CLIENT_TYPE             20:20 /*       */
+#define NV_PFAULT_MMU_CLIENT_TYPE_GPC    0x00000000 /*       */
+#define NV_PFAULT_MMU_CLIENT_TYPE_HUB    0x00000001 /*       */
+#define NV_PFAULT_GPC_ID                      28:24 /*       */
+#define NV_PFAULT_PROTECTED_MODE              29:29 /*       */
+#define NV_PFAULT_REPLAYABLE_FAULT_EN         30:30 /*       */
+#define NV_PFAULT_VALID                       31:31 /*       */
+#endif // __gb100_dev_fault_h__
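
Review note on the `NV_PFAULT_*` field ranges above: the `hi:lo` pairs (`4:0`, `14:8`, `19:16`, `20:20`, `28:24`, `29:29`, `30:30`, `31:31`) read as inclusive bit ranges within one 32-bit word. The sketch below shows how such a word could be unpacked under that reading. It is not code from this patch. `pfault_field`, `pfault_decode`, and `struct pfault_info` are hypothetical names introduced only for illustration, and how the word is obtained from hardware is out of scope.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract bits hi..lo (inclusive) from a 32-bit word.
 * Mirrors the hi:lo notation used by the defines in the header. */
static uint32_t pfault_field(uint32_t word, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi < 32);
    unsigned width = hi - lo + 1;
    /* Avoid shifting by 32, which is undefined for a 32-bit type. */
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}

/* Hypothetical container for the decoded fields; not part of the header. */
struct pfault_info {
    uint32_t fault_type;      /* NV_PFAULT_FAULT_TYPE           4:0  */
    uint32_t client;          /* NV_PFAULT_CLIENT               14:8 */
    uint32_t access_type;     /* NV_PFAULT_ACCESS_TYPE          19:16 */
    uint32_t mmu_client_type; /* NV_PFAULT_MMU_CLIENT_TYPE      20:20 */
    uint32_t gpc_id;          /* NV_PFAULT_GPC_ID               28:24 */
    uint32_t protected_mode;  /* NV_PFAULT_PROTECTED_MODE       29:29 */
    uint32_t replayable_en;   /* NV_PFAULT_REPLAYABLE_FAULT_EN  30:30 */
    uint32_t valid;           /* NV_PFAULT_VALID                31:31 */
};

/* Unpack every field using the bit ranges listed in the header. */
static struct pfault_info pfault_decode(uint32_t word)
{
    struct pfault_info info;
    info.fault_type      = pfault_field(word, 4, 0);
    info.client          = pfault_field(word, 14, 8);
    info.access_type     = pfault_field(word, 19, 16);
    info.mmu_client_type = pfault_field(word, 20, 20);
    info.gpc_id          = pfault_field(word, 28, 24);
    info.protected_mode  = pfault_field(word, 29, 29);
    info.replayable_en   = pfault_field(word, 30, 30);
    info.valid           = pfault_field(word, 31, 31);
    return info;
}
```

Because the `NV_PFAULT_CLIENT_GPC_*` and `NV_PFAULT_CLIENT_HUB_*` values overlap (for example, both `GPC_T1_0` and `HUB_VIP` are `0x00000000`), a decoder would presumably need to check `mmu_client_type` before mapping `client` to a name.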
--- a/kernel-open/nvidia-uvm/hwref/blackwell/gb100/dev_mmu.h
+++ b/kernel-open/nvidia-uvm/hwref/blackwell/gb100/dev_mmu.h
@@ -0,0 +1,560 @@
+/*******************************************************************************
+    Copyright (c) 2003-2016 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be
+    included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+
+#ifndef __gb100_dev_mmu_h__
+#define __gb100_dev_mmu_h__
+/* This file is autogenerated.  Do not edit */
+#define NV_MMU_PDE                                                      /* ----G */
+#define NV_MMU_PDE_APERTURE_BIG                       (0*32+1):(0*32+0) /* RWXVF */
+#define NV_MMU_PDE_APERTURE_BIG_INVALID                      0x00000000 /* RW--V */
+#define NV_MMU_PDE_APERTURE_BIG_VIDEO_MEMORY                 0x00000001 /* RW--V */
+#define NV_MMU_PDE_APERTURE_BIG_SYSTEM_COHERENT_MEMORY       0x00000002 /* RW--V */
+#define NV_MMU_PDE_APERTURE_BIG_SYSTEM_NON_COHERENT_MEMORY   0x00000003 /* RW--V */
+#define NV_MMU_PDE_SIZE                               (0*32+3):(0*32+2) /* RWXVF */
+#define NV_MMU_PDE_SIZE_FULL                                 0x00000000 /* RW--V */
+#define NV_MMU_PDE_SIZE_HALF                                 0x00000001 /* RW--V */
+#define NV_MMU_PDE_SIZE_QUARTER                              0x00000002 /* RW--V */
+#define NV_MMU_PDE_SIZE_EIGHTH                               0x00000003 /* RW--V */
+#define NV_MMU_PDE_ADDRESS_BIG_SYS                   (0*32+31):(0*32+4) /* RWXVF */
+#define NV_MMU_PDE_ADDRESS_BIG_VID                   (0*32+31-3):(0*32+4) /* RWXVF */
+#define NV_MMU_PDE_ADDRESS_BIG_VID_PEER             (0*32+31):(0*32+32-3) /* RWXVF */
+#define NV_MMU_PDE_ADDRESS_BIG_VID_PEER_0                    0x00000000 /* RW--V */
+#define NV_MMU_PDE_APERTURE_SMALL                     (1*32+1):(1*32+0) /* RWXVF */
+#define NV_MMU_PDE_APERTURE_SMALL_INVALID                    0x00000000 /* RW--V */
+#define NV_MMU_PDE_APERTURE_SMALL_VIDEO_MEMORY               0x00000001 /* RW--V */
+#define NV_MMU_PDE_APERTURE_SMALL_SYSTEM_COHERENT_MEMORY     0x00000002 /* RW--V */
+#define NV_MMU_PDE_APERTURE_SMALL_SYSTEM_NON_COHERENT_MEMORY 0x00000003 /* RW--V */
+#define NV_MMU_PDE_VOL_SMALL                          (1*32+2):(1*32+2) /* RWXVF */
+#define NV_MMU_PDE_VOL_SMALL_TRUE                            0x00000001 /* RW--V */
+#define NV_MMU_PDE_VOL_SMALL_FALSE                           0x00000000 /* RW--V */
+#define NV_MMU_PDE_VOL_BIG                            (1*32+3):(1*32+3) /* RWXVF */
+#define NV_MMU_PDE_VOL_BIG_TRUE                              0x00000001 /* RW--V */
+#define NV_MMU_PDE_VOL_BIG_FALSE                             0x00000000 /* RW--V */
+#define NV_MMU_PDE_ADDRESS_SMALL_SYS                 (1*32+31):(1*32+4) /* RWXVF */
+#define NV_MMU_PDE_ADDRESS_SMALL_VID                 (1*32+31-3):(1*32+4) /* RWXVF */
+#define NV_MMU_PDE_ADDRESS_SMALL_VID_PEER           (1*32+31):(1*32+32-3) /* RWXVF */
+#define NV_MMU_PDE_ADDRESS_SMALL_VID_PEER_0                  0x00000000 /* RW--V */
+#define NV_MMU_PDE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_PDE__SIZE                                              8
+#define NV_MMU_PTE                                                      /* ----G */
+#define NV_MMU_PTE_VALID                              (0*32+0):(0*32+0) /* RWXVF */
+#define NV_MMU_PTE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_PTE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_PTE_PRIVILEGE                          (0*32+1):(0*32+1) /* RWXVF */
+#define NV_MMU_PTE_PRIVILEGE_TRUE                                   0x1 /* RW--V */
+#define NV_MMU_PTE_PRIVILEGE_FALSE                                  0x0 /* RW--V */
+#define NV_MMU_PTE_READ_ONLY                          (0*32+2):(0*32+2) /* RWXVF */
+#define NV_MMU_PTE_READ_ONLY_TRUE                                  0x1  /* RW--V */
+#define NV_MMU_PTE_READ_ONLY_FALSE                                 0x0  /* RW--V */
+#define NV_MMU_PTE_ENCRYPTED                          (0*32+3):(0*32+3) /* RWXVF */
+#define NV_MMU_PTE_ENCRYPTED_TRUE                            0x00000001 /* R---V */
+#define NV_MMU_PTE_ENCRYPTED_FALSE                           0x00000000 /* R---V */
+#define NV_MMU_PTE_ADDRESS_SYS                      (0*32+31):(0*32+4) /* RWXVF */
+#define NV_MMU_PTE_ADDRESS_VID                      (0*32+31-3):(0*32+4) /* RWXVF */
+#define NV_MMU_PTE_ADDRESS_VID_PEER                (0*32+31):(0*32+32-3) /* RWXVF */
+#define NV_MMU_PTE_ADDRESS_VID_PEER_0                       0x00000000 /* RW--V */
+#define NV_MMU_PTE_ADDRESS_VID_PEER_1                       0x00000001 /* RW--V */
+#define NV_MMU_PTE_ADDRESS_VID_PEER_2                       0x00000002 /* RW--V */
+#define NV_MMU_PTE_ADDRESS_VID_PEER_3                       0x00000003 /* RW--V */
+#define NV_MMU_PTE_ADDRESS_VID_PEER_4                       0x00000004 /* RW--V */
+#define NV_MMU_PTE_ADDRESS_VID_PEER_5                       0x00000005 /* RW--V */
+#define NV_MMU_PTE_ADDRESS_VID_PEER_6                       0x00000006 /* RW--V */
+#define NV_MMU_PTE_ADDRESS_VID_PEER_7                       0x00000007 /* RW--V */
+#define NV_MMU_PTE_VOL                                (1*32+0):(1*32+0) /* RWXVF */
+#define NV_MMU_PTE_VOL_TRUE                                  0x00000001 /* RW--V */
+#define NV_MMU_PTE_VOL_FALSE                                 0x00000000 /* RW--V */
+#define NV_MMU_PTE_APERTURE                           (1*32+2):(1*32+1) /* RWXVF */
+#define NV_MMU_PTE_APERTURE_VIDEO_MEMORY                     0x00000000 /* RW--V */
+#define NV_MMU_PTE_APERTURE_PEER_MEMORY                      0x00000001 /* RW--V */
+#define NV_MMU_PTE_APERTURE_SYSTEM_COHERENT_MEMORY           0x00000002 /* RW--V */
+#define NV_MMU_PTE_APERTURE_SYSTEM_NON_COHERENT_MEMORY       0x00000003 /* RW--V */
+#define NV_MMU_PTE_LOCK                               (1*32+3):(1*32+3) /* RWXVF */
+#define NV_MMU_PTE_LOCK_TRUE                                        0x1 /* RW--V */
+#define NV_MMU_PTE_LOCK_FALSE                                       0x0 /* RW--V */
+#define NV_MMU_PTE_ATOMIC_DISABLE                     (1*32+3):(1*32+3) /* RWXVF */
+#define NV_MMU_PTE_ATOMIC_DISABLE_TRUE                              0x1 /* RW--V */
+#define NV_MMU_PTE_ATOMIC_DISABLE_FALSE                             0x0 /* RW--V */
+#define NV_MMU_PTE_COMPTAGLINE                      (1*32+20+11):(1*32+12) /* RWXVF */
+#define NV_MMU_PTE_READ_DISABLE                     (1*32+30):(1*32+30) /* RWXVF */
+#define NV_MMU_PTE_READ_DISABLE_TRUE                               0x1  /* RW--V */
+#define NV_MMU_PTE_READ_DISABLE_FALSE                              0x0  /* RW--V */
+#define NV_MMU_PTE_WRITE_DISABLE                    (1*32+31):(1*32+31) /* RWXVF */
+#define NV_MMU_PTE_WRITE_DISABLE_TRUE                              0x1  /* RW--V */
+#define NV_MMU_PTE_WRITE_DISABLE_FALSE                             0x0  /* RW--V */
+#define NV_MMU_PTE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_PTE__SIZE                                             8
+#define NV_MMU_PTE_COMPTAGS_NONE                                    0x0 /*       */
+#define NV_MMU_PTE_COMPTAGS_1                                       0x1 /*       */
+#define NV_MMU_PTE_COMPTAGS_2                                       0x2 /*       */
+#define NV_MMU_PTE_KIND                              (1*32+7):(1*32+4) /* RWXVF */
+#define NV_MMU_PTE_KIND_INVALID                       0x07 /* R---V */
+#define NV_MMU_PTE_KIND_PITCH                         0x00 /* R---V */
+#define NV_MMU_PTE_KIND_GENERIC_MEMORY                                                  0x6 /* R---V */
+#define NV_MMU_PTE_KIND_Z16                                                             0x1 /* R---V */
+#define NV_MMU_PTE_KIND_S8                                                              0x2 /* R---V */
+#define NV_MMU_PTE_KIND_S8Z24                                                           0x3 /* R---V */
+#define NV_MMU_PTE_KIND_ZF32_X24S8                                                      0x4 /* R---V */
+#define NV_MMU_PTE_KIND_Z24S8                                                           0x5 /* R---V */
+#define NV_MMU_PTE_KIND_GENERIC_MEMORY_COMPRESSIBLE                                     0x8 /* R---V */
+#define NV_MMU_PTE_KIND_GENERIC_MEMORY_COMPRESSIBLE_DISABLE_PLC                         0x9 /* R---V */
+#define NV_MMU_PTE_KIND_S8_COMPRESSIBLE_DISABLE_PLC                                     0xA /* R---V */
+#define NV_MMU_PTE_KIND_Z16_COMPRESSIBLE_DISABLE_PLC                                    0xB /* R---V */
+#define NV_MMU_PTE_KIND_S8Z24_COMPRESSIBLE_DISABLE_PLC                                  0xC /* R---V */
+#define NV_MMU_PTE_KIND_ZF32_X24S8_COMPRESSIBLE_DISABLE_PLC                             0xD /* R---V */
+#define NV_MMU_PTE_KIND_Z24S8_COMPRESSIBLE_DISABLE_PLC                                  0xE /* R---V */
+#define NV_MMU_PTE_KIND_SMSKED_MESSAGE                                                  0xF /* R---V */
+#define NV_MMU_VER1_PDE                                                      /* ----G */
+#define NV_MMU_VER1_PDE_APERTURE_BIG                       (0*32+1):(0*32+0) /* RWXVF */
+#define NV_MMU_VER1_PDE_APERTURE_BIG_INVALID                      0x00000000 /* RW--V */
+#define NV_MMU_VER1_PDE_APERTURE_BIG_VIDEO_MEMORY                 0x00000001 /* RW--V */
+#define NV_MMU_VER1_PDE_APERTURE_BIG_SYSTEM_COHERENT_MEMORY       0x00000002 /* RW--V */
+#define NV_MMU_VER1_PDE_APERTURE_BIG_SYSTEM_NON_COHERENT_MEMORY   0x00000003 /* RW--V */
+#define NV_MMU_VER1_PDE_SIZE                               (0*32+3):(0*32+2) /* RWXVF */
+#define NV_MMU_VER1_PDE_SIZE_FULL                                 0x00000000 /* RW--V */
+#define NV_MMU_VER1_PDE_SIZE_HALF                                 0x00000001 /* RW--V */
+#define NV_MMU_VER1_PDE_SIZE_QUARTER                              0x00000002 /* RW--V */
+#define NV_MMU_VER1_PDE_SIZE_EIGHTH                               0x00000003 /* RW--V */
+#define NV_MMU_VER1_PDE_ADDRESS_BIG_SYS                   (0*32+31):(0*32+4) /* RWXVF */
+#define NV_MMU_VER1_PDE_ADDRESS_BIG_VID                   (0*32+31-3):(0*32+4) /* RWXVF */
+#define NV_MMU_VER1_PDE_ADDRESS_BIG_VID_PEER             (0*32+31):(0*32+32-3) /* RWXVF */
+#define NV_MMU_VER1_PDE_ADDRESS_BIG_VID_PEER_0                    0x00000000 /* RW--V */
+#define NV_MMU_VER1_PDE_APERTURE_SMALL                     (1*32+1):(1*32+0) /* RWXVF */
+#define NV_MMU_VER1_PDE_APERTURE_SMALL_INVALID                    0x00000000 /* RW--V */
+#define NV_MMU_VER1_PDE_APERTURE_SMALL_VIDEO_MEMORY               0x00000001 /* RW--V */
+#define NV_MMU_VER1_PDE_APERTURE_SMALL_SYSTEM_COHERENT_MEMORY     0x00000002 /* RW--V */
+#define NV_MMU_VER1_PDE_APERTURE_SMALL_SYSTEM_NON_COHERENT_MEMORY 0x00000003 /* RW--V */
+#define NV_MMU_VER1_PDE_VOL_SMALL                          (1*32+2):(1*32+2) /* RWXVF */
+#define NV_MMU_VER1_PDE_VOL_SMALL_TRUE                            0x00000001 /* RW--V */
+#define NV_MMU_VER1_PDE_VOL_SMALL_FALSE                           0x00000000 /* RW--V */
+#define NV_MMU_VER1_PDE_VOL_BIG                            (1*32+3):(1*32+3) /* RWXVF */
+#define NV_MMU_VER1_PDE_VOL_BIG_TRUE                              0x00000001 /* RW--V */
+#define NV_MMU_VER1_PDE_VOL_BIG_FALSE                             0x00000000 /* RW--V */
+#define NV_MMU_VER1_PDE_ADDRESS_SMALL_SYS                 (1*32+31):(1*32+4) /* RWXVF */
+#define NV_MMU_VER1_PDE_ADDRESS_SMALL_VID                 (1*32+31-3):(1*32+4) /* RWXVF */
+#define NV_MMU_VER1_PDE_ADDRESS_SMALL_VID_PEER           (1*32+31):(1*32+32-3) /* RWXVF */
+#define NV_MMU_VER1_PDE_ADDRESS_SMALL_VID_PEER_0                  0x00000000 /* RW--V */
+#define NV_MMU_VER1_PDE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_VER1_PDE__SIZE                                              8
+#define NV_MMU_VER1_PTE                                                      /* ----G */
+#define NV_MMU_VER1_PTE_VALID                              (0*32+0):(0*32+0) /* RWXVF */
+#define NV_MMU_VER1_PTE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_VER1_PTE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_VER1_PTE_PRIVILEGE                          (0*32+1):(0*32+1) /* RWXVF */
+#define NV_MMU_VER1_PTE_PRIVILEGE_TRUE                                   0x1 /* RW--V */
+#define NV_MMU_VER1_PTE_PRIVILEGE_FALSE                                  0x0 /* RW--V */
+#define NV_MMU_VER1_PTE_READ_ONLY                          (0*32+2):(0*32+2) /* RWXVF */
+#define NV_MMU_VER1_PTE_READ_ONLY_TRUE                                   0x1 /* RW--V */
+#define NV_MMU_VER1_PTE_READ_ONLY_FALSE                                  0x0 /* RW--V */
+#define NV_MMU_VER1_PTE_ENCRYPTED                          (0*32+3):(0*32+3) /* RWXVF */
+#define NV_MMU_VER1_PTE_ENCRYPTED_TRUE                            0x00000001 /* R---V */
+#define NV_MMU_VER1_PTE_ENCRYPTED_FALSE                           0x00000000 /* R---V */
+#define NV_MMU_VER1_PTE_ADDRESS_SYS                      (0*32+31):(0*32+4) /* RWXVF */
+#define NV_MMU_VER1_PTE_ADDRESS_VID                      (0*32+31-3):(0*32+4) /* RWXVF */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER                (0*32+31):(0*32+32-3) /* RWXVF */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER_0                       0x00000000 /* RW--V */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER_1                       0x00000001 /* RW--V */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER_2                       0x00000002 /* RW--V */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER_3                       0x00000003 /* RW--V */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER_4                       0x00000004 /* RW--V */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER_5                       0x00000005 /* RW--V */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER_6                       0x00000006 /* RW--V */
+#define NV_MMU_VER1_PTE_ADDRESS_VID_PEER_7                       0x00000007 /* RW--V */
+#define NV_MMU_VER1_PTE_VOL                                (1*32+0):(1*32+0) /* RWXVF */
+#define NV_MMU_VER1_PTE_VOL_TRUE                                  0x00000001 /* RW--V */
+#define NV_MMU_VER1_PTE_VOL_FALSE                                 0x00000000 /* RW--V */
+#define NV_MMU_VER1_PTE_APERTURE                           (1*32+2):(1*32+1) /* RWXVF */
+#define NV_MMU_VER1_PTE_APERTURE_VIDEO_MEMORY                     0x00000000 /* RW--V */
+#define NV_MMU_VER1_PTE_APERTURE_PEER_MEMORY                      0x00000001 /* RW--V */
+#define NV_MMU_VER1_PTE_APERTURE_SYSTEM_COHERENT_MEMORY           0x00000002 /* RW--V */
+#define NV_MMU_VER1_PTE_APERTURE_SYSTEM_NON_COHERENT_MEMORY       0x00000003 /* RW--V */
+#define NV_MMU_VER1_PTE_ATOMIC_DISABLE                     (1*32+3):(1*32+3) /* RWXVF */
+#define NV_MMU_VER1_PTE_ATOMIC_DISABLE_TRUE                              0x1 /* RW--V */
+#define NV_MMU_VER1_PTE_ATOMIC_DISABLE_FALSE                             0x0 /* RW--V */
+#define NV_MMU_VER1_PTE_COMPTAGLINE                      (1*32+20+11):(1*32+12) /* RWXVF */
+#define NV_MMU_VER1_PTE_KIND                              (1*32+11):(1*32+4) /* RWXVF */
+#define NV_MMU_VER1_PTE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_VER1_PTE__SIZE                                              8
+#define NV_MMU_VER1_PTE_COMPTAGS_NONE                                    0x0 /*       */
+#define NV_MMU_VER1_PTE_COMPTAGS_1                                       0x1 /*       */
+#define NV_MMU_VER1_PTE_COMPTAGS_2                                       0x2 /*       */
+#define NV_MMU_NEW_PDE                                                      /* ----G */
+#define NV_MMU_NEW_PDE_IS_PTE                                           0:0 /* RWXVF */
+#define NV_MMU_NEW_PDE_IS_PTE_TRUE                                      0x1 /* RW--V */
+#define NV_MMU_NEW_PDE_IS_PTE_FALSE                                     0x0 /* RW--V */
+#define NV_MMU_NEW_PDE_IS_PDE                                           0:0 /* RWXVF */
+#define NV_MMU_NEW_PDE_IS_PDE_TRUE                                      0x0 /* RW--V */
+#define NV_MMU_NEW_PDE_IS_PDE_FALSE                                     0x1 /* RW--V */
+#define NV_MMU_NEW_PDE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_NEW_PDE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_NEW_PDE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_NEW_PDE_APERTURE                                         2:1 /* RWXVF */
+#define NV_MMU_NEW_PDE_APERTURE_INVALID                          0x00000000 /* RW--V */
+#define NV_MMU_NEW_PDE_APERTURE_VIDEO_MEMORY                     0x00000001 /* RW--V */
+#define NV_MMU_NEW_PDE_APERTURE_SYSTEM_COHERENT_MEMORY           0x00000002 /* RW--V */
+#define NV_MMU_NEW_PDE_APERTURE_SYSTEM_NON_COHERENT_MEMORY       0x00000003 /* RW--V */
+#define NV_MMU_NEW_PDE_VOL                                              3:3 /* RWXVF */
+#define NV_MMU_NEW_PDE_VOL_TRUE                                  0x00000001 /* RW--V */
+#define NV_MMU_NEW_PDE_VOL_FALSE                                 0x00000000 /* RW--V */
+#define NV_MMU_NEW_PDE_NO_ATS                                            5:5 /* RWXVF */
+#define NV_MMU_NEW_PDE_NO_ATS_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_NEW_PDE_NO_ATS_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_NEW_PDE_ADDRESS_SYS                                     53:8 /* RWXVF */
+#define NV_MMU_NEW_PDE_ADDRESS_VID             (35-3):8 /* RWXVF */
+#define NV_MMU_NEW_PDE_ADDRESS_VID_PEER       35:(36-3) /* RWXVF */
+#define NV_MMU_NEW_PDE_ADDRESS_VID_PEER_0                        0x00000000 /* RW--V */
+#define NV_MMU_NEW_PDE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_NEW_PDE__SIZE                                              8
+#define NV_MMU_NEW_DUAL_PDE                                                      /* ----G */
+#define NV_MMU_NEW_DUAL_PDE_IS_PTE                                           0:0 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_IS_PTE_TRUE                                      0x1 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_IS_PTE_FALSE                                     0x0 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_IS_PDE                                           0:0 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_IS_PDE_TRUE                                      0x0 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_IS_PDE_FALSE                                     0x1 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_BIG                                     2:1 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_BIG_INVALID                      0x00000000 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_BIG_VIDEO_MEMORY                 0x00000001 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_BIG_SYSTEM_COHERENT_MEMORY       0x00000002 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_BIG_SYSTEM_NON_COHERENT_MEMORY   0x00000003 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_VOL_BIG                                          3:3 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_VOL_BIG_TRUE                              0x00000001 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_VOL_BIG_FALSE                             0x00000000 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_NO_ATS                                       5:5 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_NO_ATS_TRUE                                  0x1 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_NO_ATS_FALSE                                 0x0 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_BIG_SYS                                 53:(8-4) /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_BIG_VID         (35-3):(8-4) /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_BIG_VID_PEER   35:(36-3) /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_BIG_VID_PEER_0                    0x00000000 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_SMALL                                 66:65 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_SMALL_INVALID                    0x00000000 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_SMALL_VIDEO_MEMORY               0x00000001 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_SMALL_SYSTEM_COHERENT_MEMORY     0x00000002 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_APERTURE_SMALL_SYSTEM_NON_COHERENT_MEMORY 0x00000003 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_VOL_SMALL                                      67:67 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_VOL_SMALL_TRUE                            0x00000001 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_VOL_SMALL_FALSE                           0x00000000 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_SMALL_SYS                             117:72 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_SMALL_VID      (99-3):72 /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_SMALL_VID_PEER 99:(100-3) /* RWXVF */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_SMALL_VID_PEER_0                  0x00000000 /* RW--V */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_NEW_DUAL_PDE_ADDRESS_BIG_SHIFT 8 /*       */
+#define NV_MMU_NEW_DUAL_PDE__SIZE                                             16
+#define NV_MMU_NEW_PTE                                                      /* ----G */
+#define NV_MMU_NEW_PTE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_NEW_PTE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_NEW_PTE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_NEW_PTE_APERTURE                                         2:1 /* RWXVF */
+#define NV_MMU_NEW_PTE_APERTURE_VIDEO_MEMORY                     0x00000000 /* RW--V */
+#define NV_MMU_NEW_PTE_APERTURE_PEER_MEMORY                      0x00000001 /* RW--V */
+#define NV_MMU_NEW_PTE_APERTURE_SYSTEM_COHERENT_MEMORY           0x00000002 /* RW--V */
+#define NV_MMU_NEW_PTE_APERTURE_SYSTEM_NON_COHERENT_MEMORY       0x00000003 /* RW--V */
+#define NV_MMU_NEW_PTE_VOL                                              3:3 /* RWXVF */
+#define NV_MMU_NEW_PTE_VOL_TRUE                                  0x00000001 /* RW--V */
+#define NV_MMU_NEW_PTE_VOL_FALSE                                 0x00000000 /* RW--V */
+#define NV_MMU_NEW_PTE_ENCRYPTED                                        4:4 /* RWXVF */
+#define NV_MMU_NEW_PTE_ENCRYPTED_TRUE                            0x00000001 /* R---V */
+#define NV_MMU_NEW_PTE_ENCRYPTED_FALSE                           0x00000000 /* R---V */
+#define NV_MMU_NEW_PTE_PRIVILEGE                                        5:5 /* RWXVF */
+#define NV_MMU_NEW_PTE_PRIVILEGE_TRUE                                   0x1 /* RW--V */
+#define NV_MMU_NEW_PTE_PRIVILEGE_FALSE                                  0x0 /* RW--V */
+#define NV_MMU_NEW_PTE_READ_ONLY                                        6:6 /* RWXVF */
+#define NV_MMU_NEW_PTE_READ_ONLY_TRUE                                   0x1 /* RW--V */
+#define NV_MMU_NEW_PTE_READ_ONLY_FALSE                                  0x0 /* RW--V */
+#define NV_MMU_NEW_PTE_ATOMIC_DISABLE                                   7:7 /* RWXVF */
+#define NV_MMU_NEW_PTE_ATOMIC_DISABLE_TRUE                              0x1 /* RW--V */
+#define NV_MMU_NEW_PTE_ATOMIC_DISABLE_FALSE                             0x0 /* RW--V */
+#define NV_MMU_NEW_PTE_ADDRESS_SYS                                     53:8 /* RWXVF */
+#define NV_MMU_NEW_PTE_ADDRESS_VID             (35-3):8 /* RWXVF */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER       35:(36-3) /* RWXVF */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER_0                        0x00000000 /* RW--V */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER_1                        0x00000001 /* RW--V */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER_2                        0x00000002 /* RW--V */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER_3                        0x00000003 /* RW--V */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER_4                        0x00000004 /* RW--V */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER_5                        0x00000005 /* RW--V */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER_6                        0x00000006 /* RW--V */
+#define NV_MMU_NEW_PTE_ADDRESS_VID_PEER_7                        0x00000007 /* RW--V */
+#define NV_MMU_NEW_PTE_COMPTAGLINE   (20+35):36 /* RWXVF */
+#define NV_MMU_NEW_PTE_KIND                                           63:56 /* RWXVF */
+#define NV_MMU_NEW_PTE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_NEW_PTE__SIZE                                              8
+#define NV_MMU_VER2_PDE                                                      /* ----G */
+#define NV_MMU_VER2_PDE_IS_PTE                                           0:0 /* RWXVF */
+#define NV_MMU_VER2_PDE_IS_PTE_TRUE                                      0x1 /* RW--V */
+#define NV_MMU_VER2_PDE_IS_PTE_FALSE                                     0x0 /* RW--V */
+#define NV_MMU_VER2_PDE_IS_PDE                                           0:0 /* RWXVF */
+#define NV_MMU_VER2_PDE_IS_PDE_TRUE                                      0x0 /* RW--V */
+#define NV_MMU_VER2_PDE_IS_PDE_FALSE                                     0x1 /* RW--V */
+#define NV_MMU_VER2_PDE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_VER2_PDE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_VER2_PDE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_VER2_PDE_APERTURE                                         2:1 /* RWXVF */
+#define NV_MMU_VER2_PDE_APERTURE_INVALID                          0x00000000 /* RW--V */
+#define NV_MMU_VER2_PDE_APERTURE_VIDEO_MEMORY                     0x00000001 /* RW--V */
+#define NV_MMU_VER2_PDE_APERTURE_SYSTEM_COHERENT_MEMORY           0x00000002 /* RW--V */
+#define NV_MMU_VER2_PDE_APERTURE_SYSTEM_NON_COHERENT_MEMORY       0x00000003 /* RW--V */
+#define NV_MMU_VER2_PDE_VOL                                              3:3 /* RWXVF */
+#define NV_MMU_VER2_PDE_VOL_TRUE                                  0x00000001 /* RW--V */
+#define NV_MMU_VER2_PDE_VOL_FALSE                                 0x00000000 /* RW--V */
+#define NV_MMU_VER2_PDE_NO_ATS                                           5:5 /* RWXVF */
+#define NV_MMU_VER2_PDE_NO_ATS_TRUE                                      0x1 /* RW--V */
+#define NV_MMU_VER2_PDE_NO_ATS_FALSE                                     0x0 /* RW--V */
+#define NV_MMU_VER2_PDE_ADDRESS_SYS                                     53:8 /* RWXVF */
+#define NV_MMU_VER2_PDE_ADDRESS_VID             (35-3):8 /* RWXVF */
+#define NV_MMU_VER2_PDE_ADDRESS_VID_PEER       35:(36-3) /* RWXVF */
+#define NV_MMU_VER2_PDE_ADDRESS_VID_PEER_0                        0x00000000 /* RW--V */
+#define NV_MMU_VER2_PDE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_VER2_PDE__SIZE                                              8
+#define NV_MMU_VER2_DUAL_PDE                                                      /* ----G */
+#define NV_MMU_VER2_DUAL_PDE_IS_PTE                                           0:0 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_IS_PTE_TRUE                                      0x1 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_IS_PTE_FALSE                                     0x0 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_IS_PDE                                           0:0 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_IS_PDE_TRUE                                      0x0 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_IS_PDE_FALSE                                     0x1 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_BIG                                     2:1 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_BIG_INVALID                      0x00000000 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_BIG_VIDEO_MEMORY                 0x00000001 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_BIG_SYSTEM_COHERENT_MEMORY       0x00000002 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_BIG_SYSTEM_NON_COHERENT_MEMORY   0x00000003 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_VOL_BIG                                          3:3 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_VOL_BIG_TRUE                              0x00000001 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_VOL_BIG_FALSE                             0x00000000 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_NO_ATS                                      5:5 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_NO_ATS_TRUE                                 0x1 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_NO_ATS_FALSE                                0x0 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_BIG_SYS                                 53:(8-4) /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_BIG_VID         (35-3):(8-4) /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_BIG_VID_PEER   35:(36-3) /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_BIG_VID_PEER_0                    0x00000000 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_SMALL                                 66:65 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_SMALL_INVALID                    0x00000000 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_SMALL_VIDEO_MEMORY               0x00000001 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_SMALL_SYSTEM_COHERENT_MEMORY     0x00000002 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_APERTURE_SMALL_SYSTEM_NON_COHERENT_MEMORY 0x00000003 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_VOL_SMALL                                      67:67 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_VOL_SMALL_TRUE                            0x00000001 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_VOL_SMALL_FALSE                           0x00000000 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_SMALL_SYS                             117:72 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_SMALL_VID      (99-3):72 /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_SMALL_VID_PEER 99:(100-3) /* RWXVF */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_SMALL_VID_PEER_0                  0x00000000 /* RW--V */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_VER2_DUAL_PDE_ADDRESS_BIG_SHIFT 8 /*       */
+#define NV_MMU_VER2_DUAL_PDE__SIZE                                             16
+#define NV_MMU_VER2_PTE                                                      /* ----G */
+#define NV_MMU_VER2_PTE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_VER2_PTE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_VER2_PTE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_VER2_PTE_APERTURE                                         2:1 /* RWXVF */
+#define NV_MMU_VER2_PTE_APERTURE_VIDEO_MEMORY                     0x00000000 /* RW--V */
+#define NV_MMU_VER2_PTE_APERTURE_PEER_MEMORY                      0x00000001 /* RW--V */
+#define NV_MMU_VER2_PTE_APERTURE_SYSTEM_COHERENT_MEMORY           0x00000002 /* RW--V */
+#define NV_MMU_VER2_PTE_APERTURE_SYSTEM_NON_COHERENT_MEMORY       0x00000003 /* RW--V */
+#define NV_MMU_VER2_PTE_VOL                                              3:3 /* RWXVF */
+#define NV_MMU_VER2_PTE_VOL_TRUE                                  0x00000001 /* RW--V */
+#define NV_MMU_VER2_PTE_VOL_FALSE                                 0x00000000 /* RW--V */
+#define NV_MMU_VER2_PTE_ENCRYPTED                                        4:4 /* RWXVF */
+#define NV_MMU_VER2_PTE_ENCRYPTED_TRUE                            0x00000001 /* R---V */
+#define NV_MMU_VER2_PTE_ENCRYPTED_FALSE                           0x00000000 /* R---V */
+#define NV_MMU_VER2_PTE_PRIVILEGE                                        5:5 /* RWXVF */
+#define NV_MMU_VER2_PTE_PRIVILEGE_TRUE                                   0x1 /* RW--V */
+#define NV_MMU_VER2_PTE_PRIVILEGE_FALSE                                  0x0 /* RW--V */
+#define NV_MMU_VER2_PTE_READ_ONLY                                        6:6 /* RWXVF */
+#define NV_MMU_VER2_PTE_READ_ONLY_TRUE                                   0x1 /* RW--V */
+#define NV_MMU_VER2_PTE_READ_ONLY_FALSE                                  0x0 /* RW--V */
+#define NV_MMU_VER2_PTE_ATOMIC_DISABLE                                   7:7 /* RWXVF */
+#define NV_MMU_VER2_PTE_ATOMIC_DISABLE_TRUE                              0x1 /* RW--V */
+#define NV_MMU_VER2_PTE_ATOMIC_DISABLE_FALSE                             0x0 /* RW--V */
+#define NV_MMU_VER2_PTE_ADDRESS_SYS                                     53:8 /* RWXVF */
+#define NV_MMU_VER2_PTE_ADDRESS_VID             (35-3):8 /* RWXVF */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER       35:(36-3) /* RWXVF */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER_0                        0x00000000 /* RW--V */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER_1                        0x00000001 /* RW--V */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER_2                        0x00000002 /* RW--V */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER_3                        0x00000003 /* RW--V */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER_4                        0x00000004 /* RW--V */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER_5                        0x00000005 /* RW--V */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER_6                        0x00000006 /* RW--V */
+#define NV_MMU_VER2_PTE_ADDRESS_VID_PEER_7                        0x00000007 /* RW--V */
+#define NV_MMU_VER2_PTE_COMPTAGLINE   (20+35):36 /* RWXVF */
+#define NV_MMU_VER2_PTE_KIND                                           63:56 /* RWXVF */
+#define NV_MMU_VER2_PTE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_VER2_PTE__SIZE                                              8
+#define NV_MMU_VER3_PDE                                                      /* ----G */
+#define NV_MMU_VER3_PDE_IS_PTE                                           0:0 /* RWXVF */
+#define NV_MMU_VER3_PDE_IS_PTE_TRUE                                      0x1 /* RW--V */
+#define NV_MMU_VER3_PDE_IS_PTE_FALSE                                     0x0 /* RW--V */
+#define NV_MMU_VER3_PDE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_VER3_PDE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_VER3_PDE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_VER3_PDE_APERTURE                                         2:1 /* RWXVF */
+#define NV_MMU_VER3_PDE_APERTURE_INVALID                          0x00000000 /* RW--V */
+#define NV_MMU_VER3_PDE_APERTURE_VIDEO_MEMORY                     0x00000001 /* RW--V */
+#define NV_MMU_VER3_PDE_APERTURE_SYSTEM_COHERENT_MEMORY           0x00000002 /* RW--V */
+#define NV_MMU_VER3_PDE_APERTURE_SYSTEM_NON_COHERENT_MEMORY       0x00000003 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF                                                                        5:3 /* RWXVF */
+#define NV_MMU_VER3_PDE_PCF_VALID_CACHED_ATS_ALLOWED__OR__INVALID_ATS_ALLOWED               0x00000000 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_VALID_CACHED_ATS_ALLOWED                                        0x00000000 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_INVALID_ATS_ALLOWED                                             0x00000000 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_VALID_UNCACHED_ATS_ALLOWED__OR__SPARSE_ATS_ALLOWED              0x00000001 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_VALID_UNCACHED_ATS_ALLOWED                                      0x00000001 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_SPARSE_ATS_ALLOWED                                              0x00000001 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_VALID_CACHED_ATS_NOT_ALLOWED__OR__INVALID_ATS_NOT_ALLOWED       0x00000002 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_VALID_CACHED_ATS_NOT_ALLOWED                                    0x00000002 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_INVALID_ATS_NOT_ALLOWED                                         0x00000002 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_VALID_UNCACHED_ATS_NOT_ALLOWED__OR__SPARSE_ATS_NOT_ALLOWED      0x00000003 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_VALID_UNCACHED_ATS_NOT_ALLOWED                                  0x00000003 /* RW--V */
+#define NV_MMU_VER3_PDE_PCF_SPARSE_ATS_NOT_ALLOWED                                          0x00000003 /* RW--V */
+#define NV_MMU_VER3_PDE_ADDRESS                                             51:12 /* RWXVF */
+#define NV_MMU_VER3_PDE_ADDRESS_SHIFT                                  0x0000000c /*       */
+#define NV_MMU_VER3_PDE__SIZE                                              8
+#define NV_MMU_VER3_DUAL_PDE                                                      /* ----G */
+#define NV_MMU_VER3_DUAL_PDE_IS_PTE                                           0:0 /* RWXVF */
+#define NV_MMU_VER3_DUAL_PDE_IS_PTE_TRUE                                      0x1 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_IS_PTE_FALSE                                     0x0 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_VER3_DUAL_PDE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_BIG                                     2:1 /* RWXVF */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_BIG_INVALID                      0x00000000 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_BIG_VIDEO_MEMORY                 0x00000001 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_BIG_SYSTEM_COHERENT_MEMORY       0x00000002 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_BIG_SYSTEM_NON_COHERENT_MEMORY   0x00000003 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG                                                                        5:3 /* RWXVF */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_CACHED_ATS_ALLOWED__OR__INVALID_ATS_ALLOWED               0x00000000 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_CACHED_ATS_ALLOWED                                        0x00000000 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_INVALID_ATS_ALLOWED                                             0x00000000 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_UNCACHED_ATS_ALLOWED__OR__SPARSE_ATS_ALLOWED              0x00000001 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_UNCACHED_ATS_ALLOWED                                      0x00000001 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_SPARSE_ATS_ALLOWED                                              0x00000001 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_CACHED_ATS_NOT_ALLOWED__OR__INVALID_ATS_NOT_ALLOWED       0x00000002 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_CACHED_ATS_NOT_ALLOWED                                    0x00000002 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_INVALID_ATS_NOT_ALLOWED                                         0x00000002 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_UNCACHED_ATS_NOT_ALLOWED__OR__SPARSE_ATS_NOT_ALLOWED      0x00000003 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_UNCACHED_ATS_NOT_ALLOWED                                  0x00000003 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_BIG_SPARSE_ATS_NOT_ALLOWED                                          0x00000003 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_ADDRESS_BIG                                     51:8 /* RWXVF */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_SMALL                                 66:65 /* RWXVF */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_SMALL_INVALID                    0x00000000 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_SMALL_VIDEO_MEMORY               0x00000001 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_SMALL_SYSTEM_COHERENT_MEMORY     0x00000002 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_APERTURE_SMALL_SYSTEM_NON_COHERENT_MEMORY 0x00000003 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL                                                                      69:67 /* RWXVF */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_CACHED_ATS_ALLOWED__OR__INVALID_ATS_ALLOWED               0x00000000 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_CACHED_ATS_ALLOWED                                        0x00000000 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_INVALID_ATS_ALLOWED                                             0x00000000 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_UNCACHED_ATS_ALLOWED__OR__SPARSE_ATS_ALLOWED              0x00000001 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_UNCACHED_ATS_ALLOWED                                      0x00000001 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_SPARSE_ATS_ALLOWED                                              0x00000001 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_CACHED_ATS_NOT_ALLOWED__OR__INVALID_ATS_NOT_ALLOWED       0x00000002 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_CACHED_ATS_NOT_ALLOWED                                    0x00000002 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_INVALID_ATS_NOT_ALLOWED                                         0x00000002 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_UNCACHED_ATS_NOT_ALLOWED__OR__SPARSE_ATS_NOT_ALLOWED      0x00000003 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_UNCACHED_ATS_NOT_ALLOWED                                  0x00000003 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_PCF_SMALL_SPARSE_ATS_NOT_ALLOWED                                          0x00000003 /* RW--V */
+#define NV_MMU_VER3_DUAL_PDE_ADDRESS_SMALL                                 115:76 /* RWXVF */
+#define NV_MMU_VER3_DUAL_PDE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_VER3_DUAL_PDE_ADDRESS_BIG_SHIFT 8 /*       */
+#define NV_MMU_VER3_DUAL_PDE__SIZE                                             16
+#define NV_MMU_VER3_PTE                                                      /* ----G */
+#define NV_MMU_VER3_PTE_VALID                                            0:0 /* RWXVF */
+#define NV_MMU_VER3_PTE_VALID_TRUE                                       0x1 /* RW--V */
+#define NV_MMU_VER3_PTE_VALID_FALSE                                      0x0 /* RW--V */
+#define NV_MMU_VER3_PTE_APERTURE                                         2:1 /* RWXVF */
+#define NV_MMU_VER3_PTE_APERTURE_VIDEO_MEMORY                     0x00000000 /* RW--V */
+#define NV_MMU_VER3_PTE_APERTURE_PEER_MEMORY                      0x00000001 /* RW--V */
+#define NV_MMU_VER3_PTE_APERTURE_SYSTEM_COHERENT_MEMORY           0x00000002 /* RW--V */
+#define NV_MMU_VER3_PTE_APERTURE_SYSTEM_NON_COHERENT_MEMORY       0x00000003 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF                                                                        7:3 /* RWXVF */
+#define NV_MMU_VER3_PTE_PCF_INVALID                                                         0x00000000 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_SPARSE                                                          0x00000001 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_MAPPING_NOWHERE                                                 0x00000002 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_NO_VALID_4KB_PAGE                                               0x00000003 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RW_ATOMIC_CACHED_ACE                                    0x00000000 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RW_ATOMIC_UNCACHED_ACE                                  0x00000001 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RW_ATOMIC_CACHED_ACE                                  0x00000002 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RW_ATOMIC_UNCACHED_ACE                                0x00000003 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RO_ATOMIC_CACHED_ACE                                    0x00000004 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RO_ATOMIC_UNCACHED_ACE                                   0x00000005 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RO_ATOMIC_CACHED_ACE                                  0x00000006 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RO_ATOMIC_UNCACHED_ACE                                0x00000007 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RW_NO_ATOMIC_CACHED_ACE                                 0x00000008 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RW_NO_ATOMIC_UNCACHED_ACE                               0x00000009 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RW_NO_ATOMIC_CACHED_ACE                               0x0000000A /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RW_NO_ATOMIC_UNCACHED_ACE                             0x0000000B /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RO_NO_ATOMIC_CACHED_ACE                                 0x0000000C /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RO_NO_ATOMIC_UNCACHED_ACE                               0x0000000D /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RO_NO_ATOMIC_CACHED_ACE                               0x0000000E /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RO_NO_ATOMIC_UNCACHED_ACE                             0x0000000F /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RW_ATOMIC_CACHED_ACD                                    0x00000010 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RW_ATOMIC_UNCACHED_ACD                                  0x00000011 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RW_ATOMIC_CACHED_ACD                                  0x00000012 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RW_ATOMIC_UNCACHED_ACD                                0x00000013 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RO_ATOMIC_CACHED_ACD                                    0x00000014 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RO_ATOMIC_UNCACHED_ACD                                  0x00000015 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RO_ATOMIC_CACHED_ACD                                  0x00000016 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RO_ATOMIC_UNCACHED_ACD                                0x00000017 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RW_NO_ATOMIC_CACHED_ACD                                 0x00000018 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RW_NO_ATOMIC_UNCACHED_ACD                               0x00000019 /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RW_NO_ATOMIC_CACHED_ACD                               0x0000001A /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RW_NO_ATOMIC_UNCACHED_ACD                             0x0000001B /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RO_NO_ATOMIC_CACHED_ACD                                 0x0000001C /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_REGULAR_RO_NO_ATOMIC_UNCACHED_ACD                               0x0000001D /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RO_NO_ATOMIC_CACHED_ACD                               0x0000001E /* RW--V */
+#define NV_MMU_VER3_PTE_PCF_PRIVILEGE_RO_NO_ATOMIC_UNCACHED_ACD                             0x0000001F /* RW--V */
+#define NV_MMU_VER3_PTE_KIND                                           11:8 /* RWXVF */
+#define NV_MMU_VER3_PTE_ADDRESS                                         51:12 /* RWXVF */
+#define NV_MMU_VER3_PTE_ADDRESS_SYS                                     51:12 /* RWXVF */
+#define NV_MMU_VER3_PTE_ADDRESS_PEER                                    51:12 /* RWXVF */
+#define NV_MMU_VER3_PTE_ADDRESS_VID                                     39:12 /* RWXVF */
+#define NV_MMU_VER3_PTE_PEER_ID                63:(64-3) /* RWXVF */
+#define NV_MMU_VER3_PTE_PEER_ID_0                                 0x00000000 /* RW--V */
+#define NV_MMU_VER3_PTE_PEER_ID_1                                 0x00000001 /* RW--V */
+#define NV_MMU_VER3_PTE_PEER_ID_2                                 0x00000002 /* RW--V */
+#define NV_MMU_VER3_PTE_PEER_ID_3                                 0x00000003 /* RW--V */
+#define NV_MMU_VER3_PTE_PEER_ID_4                                 0x00000004 /* RW--V */
+#define NV_MMU_VER3_PTE_PEER_ID_5                                 0x00000005 /* RW--V */
+#define NV_MMU_VER3_PTE_PEER_ID_6                                 0x00000006 /* RW--V */
+#define NV_MMU_VER3_PTE_PEER_ID_7                                 0x00000007 /* RW--V */
+#define NV_MMU_VER3_PTE_ADDRESS_SHIFT                             0x0000000c /*       */
+#define NV_MMU_VER3_PTE__SIZE                                              8
+#define NV_MMU_CLIENT                                             /* ----G */
+#define NV_MMU_CLIENT_KIND                                    2:0 /* RWXVF */
+#define NV_MMU_CLIENT_KIND_Z16                                0x1 /* R---V */
+#define NV_MMU_CLIENT_KIND_S8                                 0x2 /* R---V */
+#define NV_MMU_CLIENT_KIND_S8Z24                              0x3 /* R---V */
+#define NV_MMU_CLIENT_KIND_ZF32_X24S8                         0x4 /* R---V */
+#define NV_MMU_CLIENT_KIND_Z24S8                              0x5 /* R---V */
+#define NV_MMU_CLIENT_KIND_GENERIC_MEMORY                     0x6 /* R---V */
+#define NV_MMU_CLIENT_KIND_INVALID                            0x7 /* R---V */
+#endif // __gb100_dev_mmu_h__
--- a/kernel-open/nvidia-uvm/nv-kthread-q-selftest.c
+++ b/kernel-open/nvidia-uvm/nv-kthread-q-selftest.c
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2016 NVIDIA Corporation
+    Copyright (c) 2016-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -81,7 +81,7 @@
 #define NUM_Q_ITEMS_IN_MULTITHREAD_TEST (NUM_TEST_Q_ITEMS * NUM_TEST_KTHREADS)

 // This exists in order to have a function to place a breakpoint on:
-void on_nvq_assert(void)
+static void on_nvq_assert(void)
 {
    (void)NULL;
 }
--- a/kernel-open/nvidia-uvm/nv-kthread-q.c
+++ b/kernel-open/nvidia-uvm/nv-kthread-q.c
@@ -1,5 +1,5 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2016-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
@@ -176,7 +176,7 @@ static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),
 {

    unsigned i, j;
-    const static unsigned attempts = 3;
+    static const unsigned attempts = 3;
    struct task_struct *thread[3];

    for (i = 0;; i++) {
@@ -201,7 +201,7 @@ static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),

        // Ran out of attempts - return thread even if its stack may not be
        // allocated on the preferred node
-        if ((i == (attempts - 1)))
+        if (i == (attempts - 1))
            break;

        // Get the NUMA node where the first page of the stack is resident. If
--- a/kernel-open/nvidia-uvm/nvidia-uvm-sources.Kbuild
+++ b/kernel-open/nvidia-uvm/nvidia-uvm-sources.Kbuild
@@ -1,14 +1,16 @@
 NVIDIA_UVM_SOURCES ?=
 NVIDIA_UVM_SOURCES_CXX ?=

-NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ats_sva.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_conf_computing.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_sec2_test.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_maxwell_sec2.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_hopper_sec2.c
+NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_blackwell.c
+NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_blackwell_fault_buffer.c
+NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_blackwell_mmu.c
+NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_blackwell_host.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_common.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_linux.c
-NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_debug_optimized.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/nvstatus.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/nvCpuUuid.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/nv-kthread-q.c
@@ -32,6 +34,7 @@ NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_tree.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_rb_tree.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_allocator.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_va_range.c
+NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_va_range_device_p2p.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_va_policy.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_va_block.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_group.c
@@ -72,6 +75,7 @@ NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_turing_host.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ampere.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ampere_ce.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ampere_host.c
+NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ampere_fault_buffer.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ampere_mmu.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_hopper.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_hopper_fault_buffer.c
@@ -96,6 +100,7 @@ NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_perf_prefetch.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ats.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ats_ibm.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ats_faults.c
+NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_ats_sva.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_test.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_test_rng.c
 NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_tree_test.c
--- a/kernel-open/nvidia-uvm/nvidia-uvm.Kbuild
+++ b/kernel-open/nvidia-uvm/nvidia-uvm.Kbuild
@@ -13,19 +13,6 @@ NVIDIA_UVM_OBJECTS =
 include $(src)/nvidia-uvm/nvidia-uvm-sources.Kbuild
 NVIDIA_UVM_OBJECTS += $(patsubst %.c,%.o,$(NVIDIA_UVM_SOURCES))

-# Some linux kernel functions rely on being built with optimizations on and
-# to work around this we put wrappers for them in a separate file that's built
-# with optimizations on in debug builds and skipped in other builds.
-# Notably gcc 4.4 supports per function optimization attributes that would be
-# easier to use, but is too recent to rely on for now.
-NVIDIA_UVM_DEBUG_OPTIMIZED_SOURCE := nvidia-uvm/uvm_debug_optimized.c
-NVIDIA_UVM_DEBUG_OPTIMIZED_OBJECT := $(patsubst %.c,%.o,$(NVIDIA_UVM_DEBUG_OPTIMIZED_SOURCE))
-
-ifneq ($(UVM_BUILD_TYPE),debug)
-  # Only build the wrappers on debug builds
-  NVIDIA_UVM_OBJECTS := $(filter-out $(NVIDIA_UVM_DEBUG_OPTIMIZED_OBJECT), $(NVIDIA_UVM_OBJECTS))
-endif
-
 obj-m += nvidia-uvm.o
 nvidia-uvm-y := $(NVIDIA_UVM_OBJECTS)

@@ -36,15 +23,14 @@ NVIDIA_UVM_KO = nvidia-uvm/nvidia-uvm.ko
 #

 ifeq ($(UVM_BUILD_TYPE),debug)
-  NVIDIA_UVM_CFLAGS += -DDEBUG -O1 -g
-else
-  ifeq ($(UVM_BUILD_TYPE),develop)
-    # -DDEBUG is required, in order to allow pr_devel() print statements to
-    # work:
-    NVIDIA_UVM_CFLAGS += -DDEBUG
-    NVIDIA_UVM_CFLAGS += -DNVIDIA_UVM_DEVELOP
-  endif
-  NVIDIA_UVM_CFLAGS += -O2
+  NVIDIA_UVM_CFLAGS += -DDEBUG -g
+endif
+
+ifeq ($(UVM_BUILD_TYPE),develop)
+  # -DDEBUG is required, in order to allow pr_devel() print statements to
+  # work:
+  NVIDIA_UVM_CFLAGS += -DDEBUG
+  NVIDIA_UVM_CFLAGS += -DNVIDIA_UVM_DEVELOP
 endif

 NVIDIA_UVM_CFLAGS += -DNVIDIA_UVM_ENABLED
@@ -56,30 +42,17 @@ NVIDIA_UVM_CFLAGS += -I$(src)/nvidia-uvm

 $(call ASSIGN_PER_OBJ_CFLAGS, $(NVIDIA_UVM_OBJECTS), $(NVIDIA_UVM_CFLAGS))

-ifeq ($(UVM_BUILD_TYPE),debug)
-  # Force optimizations on for the wrappers
-  $(call ASSIGN_PER_OBJ_CFLAGS, $(NVIDIA_UVM_DEBUG_OPTIMIZED_OBJECT), $(NVIDIA_UVM_CFLAGS) -O2)
-endif
-
 #
 # Register the conftests needed by nvidia-uvm.ko
 #

 NV_OBJECTS_DEPEND_ON_CONFTEST += $(NVIDIA_UVM_OBJECTS)

-NV_CONFTEST_FUNCTION_COMPILE_TESTS += wait_on_bit_lock_argument_count
-NV_CONFTEST_FUNCTION_COMPILE_TESTS += pde_data
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += radix_tree_empty
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += radix_tree_replace_slot
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += pnv_npu2_init_context
-NV_CONFTEST_FUNCTION_COMPILE_TESTS += vmf_insert_pfn
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += cpumask_of_node
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += list_is_first
-NV_CONFTEST_FUNCTION_COMPILE_TESTS += timer_setup
-NV_CONFTEST_FUNCTION_COMPILE_TESTS += pci_bus_address
-NV_CONFTEST_FUNCTION_COMPILE_TESTS += set_memory_uc
-NV_CONFTEST_FUNCTION_COMPILE_TESTS += set_pages_uc
-NV_CONFTEST_FUNCTION_COMPILE_TESTS += ktime_get_raw_ts64
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += ioasid_get
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += mm_pasid_drop
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += mmget_not_zero
@@ -88,32 +61,21 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += iommu_sva_bind_device_has_drvdata_arg
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += vm_fault_to_errno
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += find_next_bit_wrap
 NV_CONFTEST_FUNCTION_COMPILE_TESTS += iommu_is_dma_domain
+NV_CONFTEST_FUNCTION_COMPILE_TESTS += folio_test_swapcache

-NV_CONFTEST_TYPE_COMPILE_TESTS += backing_dev_info
-NV_CONFTEST_TYPE_COMPILE_TESTS += mm_context_t
-NV_CONFTEST_TYPE_COMPILE_TESTS += get_user_pages_remote
-NV_CONFTEST_TYPE_COMPILE_TESTS += get_user_pages
-NV_CONFTEST_TYPE_COMPILE_TESTS += pin_user_pages_remote
-NV_CONFTEST_TYPE_COMPILE_TESTS += pin_user_pages
-NV_CONFTEST_TYPE_COMPILE_TESTS += vm_fault_has_address
 NV_CONFTEST_TYPE_COMPILE_TESTS += vm_ops_fault_removed_vma_arg
-NV_CONFTEST_TYPE_COMPILE_TESTS += kmem_cache_has_kobj_remove_work
-NV_CONFTEST_TYPE_COMPILE_TESTS += sysfs_slab_unlink
-NV_CONFTEST_TYPE_COMPILE_TESTS += vm_fault_t
 NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_notifier_ops_invalidate_range
 NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_notifier_ops_arch_invalidate_secondary_tlbs
-NV_CONFTEST_TYPE_COMPILE_TESTS += proc_ops
-NV_CONFTEST_TYPE_COMPILE_TESTS += timespec64
-NV_CONFTEST_TYPE_COMPILE_TESTS += mm_has_mmap_lock
 NV_CONFTEST_TYPE_COMPILE_TESTS += migrate_vma_added_flags
 NV_CONFTEST_TYPE_COMPILE_TESTS += migrate_device_range
-NV_CONFTEST_TYPE_COMPILE_TESTS += vm_area_struct_has_const_vm_flags
 NV_CONFTEST_TYPE_COMPILE_TESTS += handle_mm_fault_has_mm_arg
 NV_CONFTEST_TYPE_COMPILE_TESTS += handle_mm_fault_has_pt_regs_arg
 NV_CONFTEST_TYPE_COMPILE_TESTS += mempolicy_has_unified_nodes
 NV_CONFTEST_TYPE_COMPILE_TESTS += mempolicy_has_home_node
 NV_CONFTEST_TYPE_COMPILE_TESTS += mpol_preferred_many_present
 NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_interval_notifier
+NV_CONFTEST_TYPE_COMPILE_TESTS += fault_flag_remote_present
+NV_CONFTEST_TYPE_COMPILE_TESTS += struct_page_has_zone_device_data

 NV_CONFTEST_SYMBOL_COMPILE_TESTS += is_export_symbol_present_int_active_memcg
 NV_CONFTEST_SYMBOL_COMPILE_TESTS += is_export_symbol_present_migrate_vma_setup
--- a/kernel-open/nvidia-uvm/uvm.c
+++ b/kernel-open/nvidia-uvm/uvm.c
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2015-2023 NVIDIA Corporation
+    Copyright (c) 2015-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -127,9 +127,9 @@ static NV_STATUS uvm_api_mm_initialize(UVM_MM_INITIALIZE_PARAMS *params, struct
        goto err;
    }

-    old_fd_type = nv_atomic_long_cmpxchg((atomic_long_t *)&filp->private_data,
-                                         UVM_FD_UNINITIALIZED,
-                                         UVM_FD_INITIALIZING);
+    old_fd_type = atomic_long_cmpxchg((atomic_long_t *)&filp->private_data,
+                                      UVM_FD_UNINITIALIZED,
+                                      UVM_FD_INITIALIZING);
    old_fd_type &= UVM_FD_TYPE_MASK;
    if (old_fd_type != UVM_FD_UNINITIALIZED) {
        status = NV_ERR_IN_USE;
@@ -222,10 +222,6 @@ static int uvm_open(struct inode *inode, struct file *filp)
    // assigning f_mapping.
    mapping->a_ops = inode->i_mapping->a_ops;

-#if defined(NV_ADDRESS_SPACE_HAS_BACKING_DEV_INFO)
-    mapping->backing_dev_info = inode->i_mapping->backing_dev_info;
-#endif
-
    filp->private_data = NULL;
    filp->f_mapping = mapping;

@@ -325,21 +321,21 @@ static int uvm_release_entry(struct inode *inode, struct file *filp)

 static void uvm_destroy_vma_managed(struct vm_area_struct *vma, bool make_zombie)
 {
-    uvm_va_range_t *va_range, *va_range_next;
+    uvm_va_range_managed_t *managed_range, *managed_range_next;
    NvU64 size = 0;

    uvm_assert_rwsem_locked_write(&uvm_va_space_get(vma->vm_file)->lock);
-    uvm_for_each_va_range_in_vma_safe(va_range, va_range_next, vma) {
+    uvm_for_each_va_range_managed_in_vma_safe(managed_range, managed_range_next, vma) {
        // On exit_mmap (process teardown), current->mm is cleared so
        // uvm_va_range_vma_current would return NULL.
-        UVM_ASSERT(uvm_va_range_vma(va_range) == vma);
-        UVM_ASSERT(va_range->node.start >= vma->vm_start);
-        UVM_ASSERT(va_range->node.end   <  vma->vm_end);
-        size += uvm_va_range_size(va_range);
+        UVM_ASSERT(uvm_va_range_vma(managed_range) == vma);
+        UVM_ASSERT(managed_range->va_range.node.start >= vma->vm_start);
+        UVM_ASSERT(managed_range->va_range.node.end   <  vma->vm_end);
+        size += uvm_va_range_size(&managed_range->va_range);
        if (make_zombie)
-            uvm_va_range_zombify(va_range);
+            uvm_va_range_zombify(managed_range);
        else
-            uvm_va_range_destroy(va_range, NULL);
+            uvm_va_range_destroy(&managed_range->va_range, NULL);
    }

    if (vma->vm_private_data) {
@@ -351,18 +347,17 @@ static void uvm_destroy_vma_managed(struct vm_area_struct *vma, bool make_zombie

 static void uvm_destroy_vma_semaphore_pool(struct vm_area_struct *vma)
 {
+    uvm_va_range_semaphore_pool_t *semaphore_pool_range;
    uvm_va_space_t *va_space;
-    uvm_va_range_t *va_range;

    va_space = uvm_va_space_get(vma->vm_file);
    uvm_assert_rwsem_locked(&va_space->lock);
-    va_range = uvm_va_range_find(va_space, vma->vm_start);
-    UVM_ASSERT(va_range &&
-               va_range->node.start   == vma->vm_start &&
-               va_range->node.end + 1 == vma->vm_end &&
-               va_range->type == UVM_VA_RANGE_TYPE_SEMAPHORE_POOL);
+    semaphore_pool_range = uvm_va_range_semaphore_pool_find(va_space, vma->vm_start);
+    UVM_ASSERT(semaphore_pool_range &&
+               semaphore_pool_range->va_range.node.start   == vma->vm_start &&
+               semaphore_pool_range->va_range.node.end + 1 == vma->vm_end);

-    uvm_mem_unmap_cpu_user(va_range->semaphore_pool.mem);
+    uvm_mem_unmap_cpu_user(semaphore_pool_range->mem);
 }

 // If a fault handler is not set, paths like handle_pte_fault in older kernels
@@ -478,7 +473,7 @@ static void uvm_vm_open_failure(struct vm_area_struct *original,
 static void uvm_vm_open_managed(struct vm_area_struct *vma)
 {
    uvm_va_space_t *va_space = uvm_va_space_get(vma->vm_file);
-    uvm_va_range_t *va_range;
+    uvm_va_range_managed_t *managed_range;
    struct vm_area_struct *original;
    NV_STATUS status;
    NvU64 new_end;
@@ -534,13 +529,13 @@ static void uvm_vm_open_managed(struct vm_area_struct *vma)
        goto out;
    }

-    // There can be multiple va_ranges under the vma already. Check if one spans
+    // There can be multiple ranges under the vma already. Check if one spans
    // the new split boundary. If so, split it.
-    va_range = uvm_va_range_find(va_space, new_end);
-    UVM_ASSERT(va_range);
-    UVM_ASSERT(uvm_va_range_vma_current(va_range) == original);
-    if (va_range->node.end != new_end) {
-        status = uvm_va_range_split(va_range, new_end, NULL);
+    managed_range = uvm_va_range_managed_find(va_space, new_end);
+    UVM_ASSERT(managed_range);
+    UVM_ASSERT(uvm_va_range_vma_current(managed_range) == original);
+    if (managed_range->va_range.node.end != new_end) {
+        status = uvm_va_range_split(managed_range, new_end, NULL);
        if (status != NV_OK) {
            UVM_DBG_PRINT("Failed to split VA range, destroying both: %s. "
                          "original vma [0x%lx, 0x%lx) new vma [0x%lx, 0x%lx)\n",
@@ -552,10 +547,10 @@ static void uvm_vm_open_managed(struct vm_area_struct *vma)
        }
    }

-    // Point va_ranges to the new vma
-    uvm_for_each_va_range_in_vma(va_range, vma) {
-        UVM_ASSERT(uvm_va_range_vma_current(va_range) == original);
-        va_range->managed.vma_wrapper = vma->vm_private_data;
+    // Point managed_ranges to the new vma
+    uvm_for_each_va_range_managed_in_vma(managed_range, vma) {
+        UVM_ASSERT(uvm_va_range_vma_current(managed_range) == original);
+        managed_range->vma_wrapper = vma->vm_private_data;
    }

 out:
@@ -657,12 +652,12 @@ static struct vm_operations_struct uvm_vm_ops_managed =
 };

 // vm operations on semaphore pool allocations only control CPU mappings. Unmapping GPUs,
-// freeing the allocation, and destroying the va_range are handled by UVM_FREE.
+// freeing the allocation, and destroying the range are handled by UVM_FREE.
 static void uvm_vm_open_semaphore_pool(struct vm_area_struct *vma)
 {
    struct vm_area_struct *origin_vma = (struct vm_area_struct *)vma->vm_private_data;
    uvm_va_space_t *va_space = uvm_va_space_get(origin_vma->vm_file);
-    uvm_va_range_t *va_range;
+    uvm_va_range_semaphore_pool_t *semaphore_pool_range;
    bool is_fork = (vma->vm_mm != origin_vma->vm_mm);
    NV_STATUS status;

@@ -670,14 +665,17 @@ static void uvm_vm_open_semaphore_pool(struct vm_area_struct *vma)

    uvm_va_space_down_write(va_space);

-    va_range = uvm_va_range_find(va_space, origin_vma->vm_start);
-    UVM_ASSERT(va_range);
-    UVM_ASSERT_MSG(va_range->type == UVM_VA_RANGE_TYPE_SEMAPHORE_POOL &&
-                   va_range->node.start == origin_vma->vm_start &&
-                   va_range->node.end + 1 == origin_vma->vm_end,
+    semaphore_pool_range = uvm_va_range_semaphore_pool_find(va_space, origin_vma->vm_start);
+    UVM_ASSERT(semaphore_pool_range);
+    UVM_ASSERT_MSG(semaphore_pool_range &&
+                   semaphore_pool_range->va_range.node.start == origin_vma->vm_start &&
+                   semaphore_pool_range->va_range.node.end + 1 == origin_vma->vm_end,
                   "origin vma [0x%llx, 0x%llx); va_range [0x%llx, 0x%llx) type %d\n",
-                   (NvU64)origin_vma->vm_start, (NvU64)origin_vma->vm_end, va_range->node.start,
-                   va_range->node.end + 1, va_range->type);
+                   (NvU64)origin_vma->vm_start,
+                   (NvU64)origin_vma->vm_end,
+                   semaphore_pool_range->va_range.node.start,
+                   semaphore_pool_range->va_range.node.end + 1,
+                   semaphore_pool_range->va_range.type);

    // Semaphore pool vmas do not have vma wrappers, but some functions will
    // assume vm_private_data is a wrapper.
@@ -689,9 +687,9 @@ static void uvm_vm_open_semaphore_pool(struct vm_area_struct *vma)

        // uvm_disable_vma unmaps in the parent as well; clear the uvm_mem CPU
        // user mapping metadata and then remap.
-        uvm_mem_unmap_cpu_user(va_range->semaphore_pool.mem);
+        uvm_mem_unmap_cpu_user(semaphore_pool_range->mem);

-        status = uvm_mem_map_cpu_user(va_range->semaphore_pool.mem, va_range->va_space, origin_vma);
+        status = uvm_mem_map_cpu_user(semaphore_pool_range->mem, semaphore_pool_range->va_range.va_space, origin_vma);
        if (status != NV_OK) {
            UVM_DBG_PRINT("Failed to remap semaphore pool to CPU for parent after fork; status = %d (%s)",
                    status, nvstatusToString(status));
@@ -702,7 +700,7 @@ static void uvm_vm_open_semaphore_pool(struct vm_area_struct *vma)
        origin_vma->vm_private_data = NULL;
        origin_vma->vm_ops = &uvm_vm_ops_disabled;
        vma->vm_ops = &uvm_vm_ops_disabled;
-        uvm_mem_unmap_cpu_user(va_range->semaphore_pool.mem);
+        uvm_mem_unmap_cpu_user(semaphore_pool_range->mem);
    }

    uvm_va_space_up_write(va_space);
@@ -751,10 +749,81 @@ static struct vm_operations_struct uvm_vm_ops_semaphore_pool =
 #endif
 };

+static void uvm_vm_open_device_p2p(struct vm_area_struct *vma)
+{
+    struct vm_area_struct *origin_vma = (struct vm_area_struct *)vma->vm_private_data;
+    uvm_va_space_t *va_space = uvm_va_space_get(origin_vma->vm_file);
+    uvm_va_range_t *va_range;
+    bool is_fork = (vma->vm_mm != origin_vma->vm_mm);
+
+    uvm_record_lock_mmap_lock_write(current->mm);
+
+    uvm_va_space_down_write(va_space);
+
+    va_range = uvm_va_range_find(va_space, origin_vma->vm_start);
+    UVM_ASSERT(va_range);
+    UVM_ASSERT_MSG(va_range->type == UVM_VA_RANGE_TYPE_DEVICE_P2P &&
+                   va_range->node.start == origin_vma->vm_start &&
+                   va_range->node.end + 1 == origin_vma->vm_end,
+                   "origin vma [0x%llx, 0x%llx); va_range [0x%llx, 0x%llx) type %d\n",
+                   (NvU64)origin_vma->vm_start, (NvU64)origin_vma->vm_end, va_range->node.start,
+                   va_range->node.end + 1, va_range->type);
+
+    // Device P2P vmas do not have vma wrappers, but some functions will
+    // assume vm_private_data is a wrapper.
+    vma->vm_private_data = NULL;
+
+    if (is_fork) {
+        // If we forked, leave the parent vma alone.
+        uvm_disable_vma(vma);
+
+        // uvm_disable_vma unmaps in the parent as well so remap the parent
+        uvm_va_range_device_p2p_map_cpu(va_range->va_space, origin_vma, uvm_va_range_to_device_p2p(va_range));
+    }
+    else {
+        // mremap will free the backing pages via unmap so we can't support it.
+        origin_vma->vm_private_data = NULL;
+        origin_vma->vm_ops = &uvm_vm_ops_disabled;
+        vma->vm_ops = &uvm_vm_ops_disabled;
+        unmap_mapping_range(va_space->mapping, va_range->node.start, va_range->node.end - va_range->node.start + 1, 1);
+    }
+
+    uvm_va_space_up_write(va_space);
+
+    uvm_record_unlock_mmap_lock_write(current->mm);
+}
+
+static void uvm_vm_open_device_p2p_entry(struct vm_area_struct *vma)
+{
+    UVM_ENTRY_VOID(uvm_vm_open_device_p2p(vma));
+}
+
+// Device P2P pages are only mapped on the CPU. Pages are allocated externally
+// to UVM but destroying the range must unpin the RM object.
+static void uvm_vm_close_device_p2p(struct vm_area_struct *vma)
+{
+}
+
+static void uvm_vm_close_device_p2p_entry(struct vm_area_struct *vma)
+{
+    UVM_ENTRY_VOID(uvm_vm_close_device_p2p(vma));
+}
+
+static struct vm_operations_struct uvm_vm_ops_device_p2p =
+{
+    .open         = uvm_vm_open_device_p2p_entry,
+    .close        = uvm_vm_close_device_p2p_entry,
+
+#if defined(NV_VM_OPS_FAULT_REMOVED_VMA_ARG)
+    .fault        = uvm_vm_fault_sigbus_wrapper_entry,
+#else
+    .fault        = uvm_vm_fault_sigbus_entry,
+#endif
+};
+
 static int uvm_mmap(struct file *filp, struct vm_area_struct *vma)
 {
    uvm_va_space_t *va_space;
-    uvm_va_range_t *va_range;
    NV_STATUS status = uvm_global_get_status();
    int ret = 0;
    bool vma_wrapper_allocated = false;
@@ -845,18 +914,28 @@ static int uvm_mmap(struct file *filp, struct vm_area_struct *vma)
    status = uvm_va_range_create_mmap(va_space, current->mm, vma->vm_private_data, NULL);

    if (status == NV_ERR_UVM_ADDRESS_IN_USE) {
+        uvm_va_range_semaphore_pool_t *semaphore_pool_range;
+        uvm_va_range_device_p2p_t *device_p2p_range;
        // If the mmap is for a semaphore pool, the VA range will have been
        // allocated by a previous ioctl, and the mmap just creates the CPU
        // mapping.
-        va_range = uvm_va_range_find(va_space, vma->vm_start);
-        if (va_range && va_range->node.start == vma->vm_start &&
-                va_range->node.end + 1 == vma->vm_end &&
-                va_range->type == UVM_VA_RANGE_TYPE_SEMAPHORE_POOL) {
+        semaphore_pool_range = uvm_va_range_semaphore_pool_find(va_space, vma->vm_start);
+        device_p2p_range = uvm_va_range_device_p2p_find(va_space, vma->vm_start);
+        if (semaphore_pool_range && semaphore_pool_range->va_range.node.start == vma->vm_start &&
+                semaphore_pool_range->va_range.node.end + 1 == vma->vm_end) {
            uvm_vma_wrapper_destroy(vma->vm_private_data);
            vma_wrapper_allocated = false;
            vma->vm_private_data = vma;
            vma->vm_ops = &uvm_vm_ops_semaphore_pool;
-            status = uvm_mem_map_cpu_user(va_range->semaphore_pool.mem, va_range->va_space, vma);
+            status = uvm_mem_map_cpu_user(semaphore_pool_range->mem, semaphore_pool_range->va_range.va_space, vma);
+        }
+        else if (device_p2p_range && device_p2p_range->va_range.node.start == vma->vm_start &&
+                 device_p2p_range->va_range.node.end + 1 == vma->vm_end) {
+            uvm_vma_wrapper_destroy(vma->vm_private_data);
+            vma_wrapper_allocated = false;
+            vma->vm_private_data = vma;
+            vma->vm_ops = &uvm_vm_ops_device_p2p;
+            status = uvm_va_range_device_p2p_map_cpu(va_space, vma, device_p2p_range);
        }
    }

@@ -914,8 +993,9 @@ static NV_STATUS uvm_api_initialize(UVM_INITIALIZE_PARAMS *params, struct file *
    // attempt to be made. This is safe because other threads will have only had
    // a chance to observe UVM_FD_INITIALIZING and not UVM_FD_VA_SPACE in this
    // case.
-    old_fd_type = nv_atomic_long_cmpxchg((atomic_long_t *)&filp->private_data,
-                                         UVM_FD_UNINITIALIZED, UVM_FD_INITIALIZING);
+    old_fd_type = atomic_long_cmpxchg((atomic_long_t *)&filp->private_data,
+                                      UVM_FD_UNINITIALIZED,
+                                      UVM_FD_INITIALIZING);
    old_fd_type &= UVM_FD_TYPE_MASK;
    if (old_fd_type == UVM_FD_UNINITIALIZED) {
        status = uvm_va_space_create(filp->f_mapping, &va_space, params->flags);
@@ -1001,6 +1081,9 @@ static long uvm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
        UVM_ROUTE_CMD_STACK_INIT_CHECK(UVM_CLEAN_UP_ZOMBIE_RESOURCES,      uvm_api_clean_up_zombie_resources);
        UVM_ROUTE_CMD_STACK_INIT_CHECK(UVM_POPULATE_PAGEABLE,              uvm_api_populate_pageable);
        UVM_ROUTE_CMD_STACK_INIT_CHECK(UVM_VALIDATE_VA_RANGE,              uvm_api_validate_va_range);
+        UVM_ROUTE_CMD_STACK_INIT_CHECK(UVM_TOOLS_GET_PROCESSOR_UUID_TABLE_V2,uvm_api_tools_get_processor_uuid_table_v2);
+        UVM_ROUTE_CMD_STACK_INIT_CHECK(UVM_ALLOC_DEVICE_P2P,               uvm_api_alloc_device_p2p);
+        UVM_ROUTE_CMD_STACK_INIT_CHECK(UVM_CLEAR_ALL_ACCESS_COUNTERS,      uvm_api_clear_all_access_counters);
    }

    // Try the test ioctls if none of the above matched
--- a/kernel-open/nvidia-uvm/uvm.h
+++ b/kernel-open/nvidia-uvm/uvm.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2013-2023 NVIDIA Corporation
+    Copyright (c) 2013-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -45,20 +45,20 @@
 //     #endif
 // 3) Do the same thing for the function definition, and for any structs that
 //    are taken as arguments to these functions.
-// 4) Let this change propagate over to cuda_a and dev_a, so that the CUDA and
-//    nvidia-cfg libraries can start using the new API by bumping up the API
+// 4) Let this change propagate over to cuda_a and bugfix_main, so that the CUDA
+//    and nvidia-cfg libraries can start using the new API by bumping up the API
 //    version number it's using.
 //    Places where UVM_API_REVISION is defined are:
 //      drivers/gpgpu/cuda/cuda.nvmk (cuda_a)
-//      drivers/setup/linux/nvidia-cfg/makefile.nvmk (dev_a)
-// 5) Once the dev_a and cuda_a changes have made it back into chips_a,
+//      drivers/setup/linux/nvidia-cfg/makefile.nvmk (bugfix_main)
+// 5) Once the bugfix_main and cuda_a changes have made it back into chips_a,
 //    remove the old API declaration, definition, and any old structs that were
 //    in use.

 #ifndef _UVM_H_
 #define _UVM_H_

-#define UVM_API_LATEST_REVISION 11
+#define UVM_API_LATEST_REVISION 13

 #if !defined(UVM_API_REVISION)
 #error "please define UVM_API_REVISION macro to a desired version number or UVM_API_LATEST_REVISION macro"
@@ -167,7 +167,7 @@ NV_STATUS UvmSetDriverVersion(NvU32 major, NvU32 changelist);
 //
 // Error codes:
 //     NV_ERR_NOT_SUPPORTED:
-//         The Linux kernel is not able to support UVM. This could be because
+//         The kernel is not able to support UVM. This could be because
 //         the kernel is too old, or because it lacks a feature that UVM
 //         requires. The kernel log will have details.
 //
@@ -384,36 +384,8 @@ NV_STATUS UvmIsPageableMemoryAccessSupportedOnGpu(const NvProcessorUuid *gpuUuid
 //         because it is not very informative.
 //
 //------------------------------------------------------------------------------
-#if UVM_API_REV_IS_AT_MOST(8)
-NV_STATUS UvmRegisterGpu(const NvProcessorUuid *gpuUuid);
-#else
 NV_STATUS UvmRegisterGpu(const NvProcessorUuid *gpuUuid,
                         const UvmGpuPlatformParams *platformParams);
-#endif
-
-#if UVM_API_REV_IS_AT_MOST(8)
-//------------------------------------------------------------------------------
-// UvmRegisterGpuSmc
-//
-// The same as UvmRegisterGpu, but takes additional parameters to specify the
-// GPU partition being registered if SMC is enabled.
-//
-// Arguments:
-//     gpuUuid: (INPUT)
-//         UUID of the physical GPU of the SMC partition to register.
-//
-//     platformParams: (INPUT)
-//         User handles identifying the partition to register.
-//
-// Error codes (see UvmRegisterGpu also):
-//
-//     NV_ERR_INVALID_STATE:
-//         SMC was not enabled, or the partition identified by the user
-//         handles or its configuration changed.
-//
-NV_STATUS UvmRegisterGpuSmc(const NvProcessorUuid *gpuUuid,
-                            const UvmGpuPlatformParams *platformParams);
-#endif

 //------------------------------------------------------------------------------
 // UvmUnregisterGpu
@@ -1364,6 +1336,86 @@ NV_STATUS UvmAllocSemaphorePool(void                          *base,
                                const UvmGpuMappingAttributes *perGpuAttribs,
                                NvLength                       gpuAttribsCount);

+//------------------------------------------------------------------------------
+// UvmAllocDeviceP2P
+//
+// Create a VA range within the process's address space reserved for use by
+// other devices to directly access GPU memory. The memory associated with the
+// RM handle is mapped into the user address space associated with the range for
+// direct access from the CPU.
+//
+// The VA range must not overlap with an existing VA range, irrespective of
+// whether the existing range corresponds to a UVM allocation or an external
+// allocation.
+//
+// Multiple VA ranges may be created mapping the same physical memory associated
+// with the RM handle. The associated GPU memory will not be freed until all VA
+// ranges have been destroyed either explicitly or implicitly and all non-UVM
+// users (e.g. third party device drivers) have stopped using the associated
+// GPU memory.
+//
+// The VA range can be unmapped and freed by calling UvmFree.
+//
+// Destroying the final range mapping the RM handle may block until all third
+// party device drivers and other kernel users have stopped using the memory.
+//
+// These VA ranges are only associated with a single GPU.
+//
+// Arguments:
+//     gpuUuid: (INPUT)
+//         UUID of the physical GPU if the GPU is not SMC capable or SMC
+//         enabled, or the GPU instance UUID of the partition containing the
+//         memory to be mapped on the CPU.
+//
+//     base: (INPUT)
+//         Base address of the virtual address range.
+//
+//     length: (INPUT)
+//         Length, in bytes, of the range.
+//
+//     offset: (INPUT)
+//         Offset, in bytes, from the start of the externally allocated memory
+//         to map from.
+//
+//     platformParams: (INPUT)
+//         Platform specific parameters that identify the allocation.
+//         On Linux: RM ctrl fd, hClient and the handle (hMemory) of the
+//         externally allocated memory to map.
+//
+// Errors:
+//
+//     NV_ERR_INVALID_ADDRESS:
+//         base is NULL or length is zero or at least one of base and length is
+//         not aligned to 4K.
+//
+//     NV_ERR_INVALID_DEVICE:
+//         The gpuUuid was either not registered or has no GPU VA space
+//         registered for it.
+//
+//     NV_ERR_INVALID_ARGUMENT:
+//         base + offset + length exceeds the end of the externally allocated
+//         memory handle or the externally allocated handle is not valid.
+//
+//     NV_ERR_UVM_ADDRESS_IN_USE:
+//         The requested virtual address range overlaps with an existing
+//         allocation.
+//
+//     NV_ERR_NO_MEMORY:
+//         Internal memory allocation failed.
+//
+//     NV_ERR_NOT_SUPPORTED:
+//         The device peer-to-peer feature is not supported by the current
+//         system configuration. This may be because the GPU doesn't support
+//         the peer-to-peer feature or the kernel was not built with the correct
+//         configuration options.
+//
+//------------------------------------------------------------------------------
+NV_STATUS UvmAllocDeviceP2P(NvProcessorUuid gpuUuid,
+                            void     *base,
+                            NvLength length,
+                            NvLength offset,
+                            const UvmDeviceP2PPlatformParams *platformParams);
+
 //------------------------------------------------------------------------------
 // UvmMigrate
 //
@@ -1448,7 +1500,9 @@ NV_STATUS UvmAllocSemaphorePool(void                          *base,
 //
 //     preferredCpuMemoryNode: (INPUT)
 //         Preferred CPU NUMA memory node used if the destination processor is
-//         the CPU.
+//         the CPU. -1 indicates no preference, in which case the pages used
+//         can be on any of the available CPU NUMA nodes. If NUMA is disabled
+//         only 0 and -1 are allowed.
 //
 // Error codes:
 //     NV_ERR_INVALID_ADDRESS:
@@ -1462,6 +1516,11 @@ NV_STATUS UvmAllocSemaphorePool(void                          *base,
 //         The VA range exceeds the largest virtual address supported by the
 //         destination processor.
 //
+//     NV_ERR_INVALID_ARGUMENT:
+//         preferredCpuMemoryNode is not a valid CPU NUMA node or it corresponds
+//         to a NUMA node ID for a registered GPU. If NUMA is disabled, it
+//         indicates that preferredCpuMemoryNode was neither 0 nor -1.
+//
 //     NV_ERR_INVALID_DEVICE:
 //         destinationUuid does not represent a valid processor such as a CPU or
 //         a GPU with a GPU VA space registered for it. Or destinationUuid is a
@@ -1528,8 +1587,9 @@ NV_STATUS UvmMigrate(void                  *base,
 //
 //     preferredCpuMemoryNode: (INPUT)
 //         Preferred CPU NUMA memory node used if the destination processor is
-//         the CPU. This argument is ignored if the given virtual address range
-//         corresponds to managed memory.
+//         the CPU. -1 indicates no preference, in which case the pages used
+//         can be on any of the available CPU NUMA nodes. If NUMA is disabled
+//         only 0 and -1 are allowed.
 //
 //     semaphoreAddress: (INPUT)
 //         Base address of the semaphore.
@@ -1586,8 +1646,8 @@ NV_STATUS UvmMigrateAsync(void                  *base,
 //
 // Migrates the backing of all virtual address ranges associated with the given
 // range group to the specified destination processor. The behavior of this API
-// is equivalent to calling UvmMigrate on each VA range associated with this
-// range group.
+// is equivalent to calling UvmMigrate with preferredCpuMemoryNode = -1 on each
+// VA range associated with this range group.
 //
 // Any errors encountered during migration are returned immediately. No attempt
 // is made to migrate the remaining unmigrated ranges and the ranges that are
@@ -2169,7 +2229,8 @@ NV_STATUS UvmMapDynamicParallelismRegion(void                  *base,
 //
 // If any page in the VA range has a preferred location, then the migration and
 // mapping policies associated with this API take precedence over those related
-// to the preferred location.
+// to the preferred location. If the preferred location is a specific CPU NUMA
+// node, that NUMA node will be used for a CPU-resident copy of the page.
 //
 // If any pages in this VA range have any processors present in their
 // accessed-by list, the migration and mapping policies associated with this
@@ -2300,7 +2361,7 @@ NV_STATUS UvmDisableReadDuplication(void     *base,
 // UvmPreventMigrationRangeGroups has not been called on the range group that
 // those pages are associated with, then the migration and mapping policies
 // associated with UvmEnableReadDuplication override the policies outlined
-// above. Note that enabling read duplication on on any pages in this VA range
+// above. Note that enabling read duplication on any pages in this VA range
 // does not clear the state set by this API for those pages. It merely overrides
 // the policies associated with this state until read duplication is disabled
 // for those pages.
@@ -2333,7 +2394,8 @@ NV_STATUS UvmDisableReadDuplication(void     *base,
 //     preferredCpuMemoryNode: (INPUT)
 //         Preferred CPU NUMA memory node used if preferredLocationUuid is the
 //         UUID of the CPU. -1 is a special value which indicates all CPU nodes
-//         allowed by the global and thread memory policies.
+//         allowed by the global and thread memory policies. If NUMA is disabled
+//         only 0 and -1 are allowed.
 //
 // Errors:
 //     NV_ERR_INVALID_ADDRESS:
@@ -3266,7 +3328,7 @@ NV_STATUS UvmEventGetGpuUuidTable(NvProcessorUuid *gpuUuidTable,
 //------------------------------------------------------------------------------
 NV_STATUS UvmEventFetch(UvmDebugSession      sessionHandle,
                        UvmEventQueueHandle  queueHandle,
-                        UvmEventEntry_V1    *pBuffer,
+                        UvmEventEntry       *pBuffer,
                        NvU64               *nEntries);

 //------------------------------------------------------------------------------
@@ -3462,35 +3524,21 @@ NV_STATUS UvmToolsDestroySession(UvmToolsSessionHandle session);
 // 4. Destroy event Queue using UvmToolsDestroyEventQueue
 //

-#if UVM_API_REV_IS_AT_MOST(10)
-// This is deprecated and replaced by sizeof(UvmToolsEventControlData_V1) or
-// sizeof(UvmToolsEventControlData_V2).
-NvLength UvmToolsGetEventControlSize(void);
-
-// This is deprecated and replaced by sizeof(UvmEventEntry_V1) or
-// sizeof(UvmEventEntry_V2).
-NvLength UvmToolsGetEventEntrySize(void);
-#endif
-
 NvLength UvmToolsGetNumberOfCounters(void);

 //------------------------------------------------------------------------------
 // UvmToolsCreateEventQueue
 //
-// This call creates an event queue that can hold the given number of events.
-// All events are disabled by default. Event queue data persists lifetime of the
-// target process.
+// This function is deprecated. See UvmToolsCreateEventQueue_V2.
+//
+// This call creates an event queue that can hold the given number of
+// UvmEventEntry events. All events are disabled by default. Event queue data
+// persists beyond the lifetime of the target process.
 //
 // Arguments:
 //     session: (INPUT)
 //         Handle to the tools session.
 //
-//     version: (INPUT)
-//         Requested version for events or counters.
-//         See UvmEventEntry_V1 and UvmEventEntry_V2.
-//         UvmToolsEventControlData_V2::version records the entry version that
-//         will be generated.
-//
 //     event_buffer: (INPUT)
 //         User allocated buffer. Must be page-aligned. Must be large enough to
 //         hold at least event_buffer_size events. Gets pinned until queue is
@@ -3502,8 +3550,7 @@ NvLength UvmToolsGetNumberOfCounters(void);
 //
 //     event_control (INPUT)
 //         User allocated buffer. Must be page-aligned. Must be large enough to
-//         hold UvmToolsEventControlData_V1 if version is UvmEventEntry_V1 or
-//         UvmToolsEventControlData_V2 (although single page-size allocation
+//         hold UvmToolsEventControlData (although single page-size allocation
 //         should be more than enough). Gets pinned until queue is destroyed.
 //
 //     queue: (OUTPUT)
@@ -3514,7 +3561,6 @@ NvLength UvmToolsGetNumberOfCounters(void);
 //         Session handle does not refer to a valid session
 //
 //     NV_ERR_INVALID_ARGUMENT:
-//         The version is not UvmEventEntry_V1 or UvmEventEntry_V2.
 //         One of the parameters: event_buffer, event_buffer_size, event_control
 //         is not valid
 //
@@ -3525,21 +3571,66 @@ NvLength UvmToolsGetNumberOfCounters(void);
 //         pinned (e.g. because of OS limitation of pinnable memory). Also it
 //         could not have been possible to create UvmToolsEventQueueDescriptor.
 //
-//------------------------------------------------------------------------------
-#if UVM_API_REV_IS_AT_MOST(10)
-NV_STATUS UvmToolsCreateEventQueue(UvmToolsSessionHandle     session,
-                                   void                     *event_buffer,
-                                   NvLength                  event_buffer_size,
-                                   void                     *event_control,
-                                   UvmToolsEventQueueHandle *queue);
-#else
 NV_STATUS UvmToolsCreateEventQueue(UvmToolsSessionHandle        session,
-                                   UvmToolsEventQueueVersion    version,
                                   void                        *event_buffer,
                                   NvLength                     event_buffer_size,
                                   void                        *event_control,
                                   UvmToolsEventQueueHandle    *queue);
-#endif
+
+//------------------------------------------------------------------------------
+// UvmToolsCreateEventQueue_V2
+//
+// This call creates an event queue that can hold the given number of
+// UvmEventEntry_V2 events. All events are disabled by default. Event queue data
+// persists beyond the lifetime of the target process.
+//
+// Arguments:
+//     session: (INPUT)
+//         Handle to the tools session.
+//
+//     event_buffer: (INPUT)
+//         User allocated buffer. Must be page-aligned. Must be large enough to
+//         hold at least event_buffer_size events. Gets pinned until queue is
+//         destroyed.
+//
+//     event_buffer_size: (INPUT)
+//         Size of the event queue buffer in units of UvmEventEntry_V2's. Must
+//         be a power of two, and greater than 1.
+//
+//     event_control (INPUT)
+//         User allocated buffer. Must be page-aligned. Must be large enough to
+//         hold UvmToolsEventControlData (although single page-size allocation
+//         should be more than enough). Gets pinned until queue is destroyed.
+//
+//     queue: (OUTPUT)
+//         Handle to the created queue.
+//
+// Error codes:
+//     NV_ERR_INSUFFICIENT_PERMISSIONS:
+//         Session handle does not refer to a valid session
+//
+//     NV_ERR_INVALID_ARGUMENT:
+//         One of the parameters: event_buffer, event_buffer_size, event_control
+//         is not valid
+//
+//     NV_ERR_NOT_SUPPORTED:
+//         A queue of the requested version could not be created
+//         (i.e., the UVM kernel driver is older and doesn't support
+//         UvmToolsEventQueueVersion_V2).
+//
+//     NV_ERR_INSUFFICIENT_RESOURCES:
+//         There could be multiple reasons for this error. One would be that
+//         it's not possible to allocate a queue of requested size. Another
+//         would be either event_buffer or event_control memory couldn't be
+//         pinned (e.g. because of OS limitation of pinnable memory). Also it
+//         could not have been possible to create UvmToolsEventQueueDescriptor.
+//
+//------------------------------------------------------------------------------
+NV_STATUS UvmToolsCreateEventQueue_V2(UvmToolsSessionHandle        session,
+                                      void                        *event_buffer,
+                                      NvLength                     event_buffer_size,
+                                      void                        *event_control,
+                                      UvmToolsEventQueueHandle    *queue);

 UvmToolsEventQueueDescriptor UvmToolsGetEventQueueDescriptor(UvmToolsEventQueueHandle queue);

@@ -3955,6 +4046,8 @@ NV_STATUS UvmToolsWriteProcessMemory(UvmToolsSessionHandle  session,
 //------------------------------------------------------------------------------
 // UvmToolsGetProcessorUuidTable
 //
+// This function is deprecated. See UvmToolsGetProcessorUuidTable_V2.
+//
 // Populate a table with the UUIDs of all the currently registered processors
 // in the target process. When a GPU is registered, it is added to the table.
 // When a GPU is unregistered, it is removed. As long as a GPU remains
@@ -3967,61 +4060,63 @@ NV_STATUS UvmToolsWriteProcessMemory(UvmToolsSessionHandle  session,
 //     session: (INPUT)
 //         Handle to the tools session.
 //
-//     version: (INPUT)
-//         Requested version for the UUID table returned. The version must
-//         match the requested version of the event queue created with
-//         UvmToolsCreateEventQueue().
-//         See UvmEventEntry_V1 and UvmEventEntry_V2.
-//
 //     table: (OUTPUT)
 //         Array of processor UUIDs, including the CPU's UUID which is always
-//         at index zero.  The srcIndex and dstIndex fields of the
-//         UvmEventMigrationInfo struct index this array.  Unused indices will
-//         have a UUID of zero. Version UvmEventEntry_V1 only uses GPU UUIDs
-//         for the UUID of the physical GPU and only supports a single SMC
-//         partition registered per process. Version UvmEventEntry_V2 supports
-//         multiple SMC partitions registered per process and uses physical GPU
-//         UUIDs if the GPU is not SMC capable or SMC enabled and GPU instance
-//         UUIDs for SMC partitions.
-//         The table pointer can be NULL in which case, the size of the table
-//         needed to hold all the UUIDs is returned in 'count'.
-//
-//     table_size: (INPUT)
-//         The size of the table in number of array elements. This can be
-//         zero if the table pointer is NULL.
-//
-//     count: (OUTPUT)
-//         On output, it is set by UVM to the number of UUIDs needed to hold
-//         all the UUIDs, including any gaps in the table due to unregistered
-//         GPUs.
+//         at index zero. The number of elements in the array must be greater
+//         than or equal to UVM_MAX_PROCESSORS_V1.
+//         The srcIndex and dstIndex fields of the UvmEventMigrationInfo struct
+//         index this array. Unused indices will have a UUID of zero.
+//         The reported UUID will be that of the corresponding physical GPU,
+//         even if multiple SMC partitions are registered under that physical
+//         GPU.
 //
 // Error codes:
 //     NV_ERR_INVALID_ADDRESS:
-//         writing to table failed or the count pointer was invalid.
-//
-//     NV_ERR_INVALID_ARGUMENT:
-//         The version is not UvmEventEntry_V1 or UvmEventEntry_V2.
-//         The count pointer is NULL.
-//         See UvmToolsEventQueueVersion.
-//
-//     NV_WARN_MISMATCHED_TARGET:
-//         The kernel returned a table suitable for UvmEventEntry_V1 events.
-//         (i.e., the kernel is older and doesn't support UvmEventEntry_V2).
+//         writing to table failed.
 //
 //     NV_ERR_NO_MEMORY:
 //         Internal memory allocation failed.
 //------------------------------------------------------------------------------
-#if UVM_API_REV_IS_AT_MOST(10)
-NV_STATUS UvmToolsGetProcessorUuidTable(UvmToolsSessionHandle  session,
-                                        NvProcessorUuid       *table,
-                                        NvLength              *count);
-#else
-NV_STATUS UvmToolsGetProcessorUuidTable(UvmToolsSessionHandle      session,
-                                        UvmToolsEventQueueVersion  version,
-                                        NvProcessorUuid           *table,
-                                        NvLength                   table_size,
-                                        NvLength                  *count);
-#endif
+NV_STATUS UvmToolsGetProcessorUuidTable(UvmToolsSessionHandle     session,
+                                        NvProcessorUuid          *table);
+
+//------------------------------------------------------------------------------
+// UvmToolsGetProcessorUuidTable_V2
+//
+// Populate a table with the UUIDs of all the currently registered processors
+// in the target process. When a GPU is registered, it is added to the table.
+// When a GPU is unregistered, it is removed. As long as a GPU remains
+// registered, its index in the table does not change.
+// Note that the index in the table corresponds to the processor ID reported
+// in UvmEventEntry event records and that the table is not contiguously packed
+// with non-zero UUIDs even with no GPU unregistrations.
+//
+// Arguments:
+//     session: (INPUT)
+//         Handle to the tools session.
+//
+//     table: (OUTPUT)
+//         Array of processor UUIDs, including the CPU's UUID which is always
+//         at index zero. The number of elements in the array must be greater
+//         than or equal to UVM_MAX_PROCESSORS.
+//         The srcIndex and dstIndex fields of the UvmEventMigrationInfo struct
+//         index this array. Unused indices will have a UUID of zero.
+//         The reported UUID will be the GPU instance UUID if SMC is enabled,
+//         otherwise it will be the UUID of the physical GPU.
+//
+// Error codes:
+//     NV_ERR_INVALID_ADDRESS:
+//         writing to table failed.
+//
+//     NV_ERR_NOT_SUPPORTED:
+//         The UVM kernel driver is older and doesn't support
+//         UvmToolsGetProcessorUuidTable_V2.
+//
+//     NV_ERR_NO_MEMORY:
+//         Internal memory allocation failed.
+//------------------------------------------------------------------------------
+NV_STATUS UvmToolsGetProcessorUuidTable_V2(UvmToolsSessionHandle     session,
+                                           NvProcessorUuid          *table);

 //------------------------------------------------------------------------------
 // UvmToolsFlushEvents
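Reviewer note on the uvm.h changes above: the new comments state a few argument constraints that a caller could check before making the call. UvmAllocDeviceP2P documents NV_ERR_INVALID_ADDRESS when base is NULL, length is zero, or either is not aligned to 4K, and UvmToolsCreateEventQueue_V2 requires event_buffer_size to be a power of two greater than 1. A minimal sketch of such pre-checks follows. The helper names and the SKETCH_PAGE_SIZE_4K macro are hypothetical and are not part of the UVM API, and the driver remains the authority on what it accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// "aligned to 4K" as stated in the UvmAllocDeviceP2P comment.
#define SKETCH_PAGE_SIZE_4K 4096u

// True if base/length avoid the documented NV_ERR_INVALID_ADDRESS cases:
// base is non-NULL, length is non-zero, and both are 4K-aligned.
static bool sketch_p2p_range_ok(const void *base, uint64_t length)
{
    uintptr_t addr = (uintptr_t)base;

    if (base == NULL || length == 0)
        return false;

    return (addr % SKETCH_PAGE_SIZE_4K) == 0 && (length % SKETCH_PAGE_SIZE_4K) == 0;
}

// True if n is a power of two and greater than 1, as required for
// event_buffer_size in the UvmToolsCreateEventQueue_V2 comment.
static bool sketch_event_queue_size_ok(uint64_t n)
{
    return n > 1 && (n & (n - 1)) == 0;
}
```

These checks cover only the constraints spelled out in the comments. Errors such as NV_ERR_UVM_ADDRESS_IN_USE or NV_ERR_INVALID_DEVICE depend on driver state and cannot be predicted from the arguments alone.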
--- a/kernel-open/nvidia-uvm/uvm_ampere_fault_buffer.c
+++ b/kernel-open/nvidia-uvm/uvm_ampere_fault_buffer.c
@@ -0,0 +1,75 @@
+/*******************************************************************************
+    Copyright (c) 2024 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be
+        included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+#include "uvm_linux.h"
+#include "uvm_global.h"
+#include "uvm_gpu.h"
+#include "uvm_hal.h"
+#include "hwref/ampere/ga100/dev_fault.h"
+
+static bool client_id_ce(NvU16 client_id)
+{
+    if (client_id >= NV_PFAULT_CLIENT_HUB_HSCE0 && client_id <= NV_PFAULT_CLIENT_HUB_HSCE9)
+        return true;
+
+    if (client_id >= NV_PFAULT_CLIENT_HUB_HSCE10 && client_id <= NV_PFAULT_CLIENT_HUB_HSCE15)
+        return true;
+
+    switch (client_id) {
+        case NV_PFAULT_CLIENT_HUB_CE0:
+        case NV_PFAULT_CLIENT_HUB_CE1:
+        case NV_PFAULT_CLIENT_HUB_CE2:
+            return true;
+    }
+
+    return false;
+}
+
+uvm_mmu_engine_type_t uvm_hal_ampere_fault_buffer_get_mmu_engine_type(NvU16 mmu_engine_id,
+                                                                      uvm_fault_client_type_t client_type,
+                                                                      NvU16 client_id)
+{
+    // Servicing CE and Host (HUB clients) faults.
+    if (client_type == UVM_FAULT_CLIENT_TYPE_HUB) {
+        if (client_id_ce(client_id)) {
+            UVM_ASSERT(mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_CE0 && mmu_engine_id <= NV_PFAULT_MMU_ENG_ID_CE9);
+
+            return UVM_MMU_ENGINE_TYPE_CE;
+        }
+
+        if (client_id == NV_PFAULT_CLIENT_HUB_HOST || client_id == NV_PFAULT_CLIENT_HUB_ESC) {
+            UVM_ASSERT(mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_HOST0 && mmu_engine_id <= NV_PFAULT_MMU_ENG_ID_HOST31);
+
+            return UVM_MMU_ENGINE_TYPE_HOST;
+        }
+    }
+
+    // We shouldn't be servicing faults from any engines other than GR.
+    UVM_ASSERT_MSG(client_id <= NV_PFAULT_CLIENT_GPC_ROP_3, "Unexpected client ID: 0x%x\n", client_id);
+    UVM_ASSERT_MSG(mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_GRAPHICS && mmu_engine_id < NV_PFAULT_MMU_ENG_ID_BAR1,
+                   "Unexpected engine ID: 0x%x\n",
+                   mmu_engine_id);
+    UVM_ASSERT(client_type == UVM_FAULT_CLIENT_TYPE_GPC);
+
+    return UVM_MMU_ENGINE_TYPE_GRAPHICS;
+}
--- a/kernel-open/nvidia-uvm/uvm_ampere_host.c
+++ b/kernel-open/nvidia-uvm/uvm_ampere_host.c
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2018-2023 NVIDIA Corporation
+    Copyright (c) 2018-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -205,17 +205,18 @@ void uvm_hal_ampere_host_clear_faulted_channel_sw_method(uvm_push_t *push,
                     CLEAR_FAULTED_B, HWVALUE(C076, CLEAR_FAULTED_B, INST_HI, instance_ptr_hi));
 }

-// Copy from Pascal, this version sets TLB_INVALIDATE_INVAL_SCOPE.
+// Copy from Turing, this version sets TLB_INVALIDATE_INVAL_SCOPE.
 void uvm_hal_ampere_host_tlb_invalidate_all(uvm_push_t *push,
-                                            uvm_gpu_phys_address_t pdb,
-                                            NvU32 depth,
-                                            uvm_membar_t membar)
+                                           uvm_gpu_phys_address_t pdb,
+                                           NvU32 depth,
+                                           uvm_membar_t membar)
 {
    NvU32 aperture_value;
    NvU32 page_table_level;
    NvU32 pdb_lo;
    NvU32 pdb_hi;
    NvU32 ack_value = 0;
+    NvU32 sysmembar_value = 0;

    UVM_ASSERT_MSG(pdb.aperture == UVM_APERTURE_VID || pdb.aperture == UVM_APERTURE_SYS, "aperture: %u", pdb.aperture);

@@ -230,8 +231,8 @@ void uvm_hal_ampere_host_tlb_invalidate_all(uvm_push_t *push,
    pdb_lo = pdb.address & HWMASK(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
    pdb_hi = pdb.address >> HWSIZE(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);

-    // PDE3 is the highest level on Pascal, see the comment in uvm_pascal_mmu.c
-    // for details.
+    // PDE3 is the highest level on Pascal-Ampere, see the comment in
+    // uvm_pascal_mmu.c for details.
    UVM_ASSERT_MSG(depth < NVC56F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3, "depth %u", depth);
    page_table_level = NVC56F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3 - depth;

@@ -242,7 +243,12 @@ void uvm_hal_ampere_host_tlb_invalidate_all(uvm_push_t *push,
        ack_value = HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
    }

-    NV_PUSH_4U(C56F, MEM_OP_A, HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS) |
+    if (membar == UVM_MEMBAR_SYS)
+        sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
+    else
+        sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
+
+    NV_PUSH_4U(C56F, MEM_OP_A, sysmembar_value |
                               HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
                     MEM_OP_B, 0,
                     MEM_OP_C, HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_PDB, ONE) |
@@ -255,16 +261,18 @@ void uvm_hal_ampere_host_tlb_invalidate_all(uvm_push_t *push,
                     MEM_OP_D, HWCONST(C56F, MEM_OP_D, OPERATION, MMU_TLB_INVALIDATE) |
                               HWVALUE(C56F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));

-    uvm_hal_tlb_invalidate_membar(push, membar);
+    // GPU membar still requires an explicit membar method.
+    if (membar == UVM_MEMBAR_GPU)
+        uvm_push_get_gpu(push)->parent->host_hal->membar_gpu(push);
 }

-// Copy from Volta, this version sets TLB_INVALIDATE_INVAL_SCOPE.
+// Copy from Turing, this version sets TLB_INVALIDATE_INVAL_SCOPE.
 void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
                                           uvm_gpu_phys_address_t pdb,
                                           NvU32 depth,
                                           NvU64 base,
                                           NvU64 size,
-                                           NvU32 page_size,
+                                           NvU64 page_size,
                                           uvm_membar_t membar)
 {
    NvU32 aperture_value;
@@ -272,6 +280,7 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
    NvU32 pdb_lo;
    NvU32 pdb_hi;
    NvU32 ack_value = 0;
+    NvU32 sysmembar_value = 0;
    NvU32 va_lo;
    NvU32 va_hi;
    NvU64 end;
@@ -281,9 +290,9 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
    NvU32 log2_invalidation_size;
    uvm_gpu_t *gpu = uvm_push_get_gpu(push);

-    UVM_ASSERT_MSG(IS_ALIGNED(page_size, 1 << 12), "page_size 0x%x\n", page_size);
-    UVM_ASSERT_MSG(IS_ALIGNED(base, page_size), "base 0x%llx page_size 0x%x\n", base, page_size);
-    UVM_ASSERT_MSG(IS_ALIGNED(size, page_size), "size 0x%llx page_size 0x%x\n", size, page_size);
+    UVM_ASSERT_MSG(IS_ALIGNED(page_size, 1 << 12), "page_size 0x%llx\n", page_size);
+    UVM_ASSERT_MSG(IS_ALIGNED(base, page_size), "base 0x%llx page_size 0x%llx\n", base, page_size);
+    UVM_ASSERT_MSG(IS_ALIGNED(size, page_size), "size 0x%llx page_size 0x%llx\n", size, page_size);
    UVM_ASSERT_MSG(size > 0, "size 0x%llx\n", size);

    // The invalidation size must be a power-of-two number of pages containing
@@ -325,7 +334,7 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
    pdb_lo = pdb.address & HWMASK(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
    pdb_hi = pdb.address >> HWSIZE(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);

-    // PDE3 is the highest level on Pascal-Ampere , see the comment in
+    // PDE3 is the highest level on Pascal-Ampere, see the comment in
    // uvm_pascal_mmu.c for details.
    UVM_ASSERT_MSG(depth < NVC56F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3, "depth %u", depth);
    page_table_level = NVC56F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3 - depth;
@@ -337,10 +346,15 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
        ack_value = HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
    }

+    if (membar == UVM_MEMBAR_SYS)
+        sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
+    else
+        sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
+
    NV_PUSH_4U(C56F, MEM_OP_A, HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_INVALIDATION_SIZE, log2_invalidation_size) |
-                               HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS) |
-                               HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo) |
-                               HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
+                               HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS) |
+                               sysmembar_value |
+                               HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo),
                     MEM_OP_B, HWVALUE(C56F, MEM_OP_B, TLB_INVALIDATE_TARGET_ADDR_HI, va_hi),
                     MEM_OP_C, HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_PDB, ONE) |
                               HWVALUE(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO, pdb_lo) |
@@ -352,21 +366,23 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
                     MEM_OP_D, HWCONST(C56F, MEM_OP_D, OPERATION, MMU_TLB_INVALIDATE_TARGETED) |
                               HWVALUE(C56F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));

-    uvm_hal_tlb_invalidate_membar(push, membar);
+    // GPU membar still requires an explicit membar method.
+    if (membar == UVM_MEMBAR_GPU)
+        gpu->parent->host_hal->membar_gpu(push);
 }

-// Copy from Pascal, this version sets TLB_INVALIDATE_INVAL_SCOPE.
+// Copy from Turing, this version sets TLB_INVALIDATE_INVAL_SCOPE.
 void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
                                             uvm_gpu_phys_address_t pdb,
                                             UVM_TEST_INVALIDATE_TLB_PARAMS *params)
 {
    NvU32 ack_value = 0;
+    NvU32 sysmembar_value = 0;
    NvU32 invalidate_gpc_value = 0;
    NvU32 aperture_value = 0;
    NvU32 pdb_lo = 0;
    NvU32 pdb_hi = 0;
    NvU32 page_table_level = 0;
-    uvm_membar_t membar;

    UVM_ASSERT_MSG(pdb.aperture == UVM_APERTURE_VID || pdb.aperture == UVM_APERTURE_SYS, "aperture: %u", pdb.aperture);
    if (pdb.aperture == UVM_APERTURE_VID)
@@ -381,7 +397,7 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
    pdb_hi = pdb.address >> HWSIZE(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);

    if (params->page_table_level != UvmInvalidatePageTableLevelAll) {
-        // PDE3 is the highest level on Pascal, see the comment in
+        // PDE3 is the highest level on Pascal-Ampere, see the comment in
        // uvm_pascal_mmu.c for details.
        page_table_level = min((NvU32)UvmInvalidatePageTableLevelPde3, params->page_table_level) - 1;
    }
@@ -393,6 +409,11 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
        ack_value = HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
    }

+    if (params->membar == UvmInvalidateTlbMemBarSys)
+        sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
+    else
+        sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
+
    if (params->disable_gpc_invalidate)
        invalidate_gpc_value = HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_GPC, DISABLE);
    else
@@ -403,9 +424,9 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,

        NvU32 va_lo = va & HWMASK(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO);
        NvU32 va_hi = va >> HWSIZE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO);
-        NV_PUSH_4U(C56F, MEM_OP_A, HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS) |
-                                   HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo) |
-                                   HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
+        NV_PUSH_4U(C56F, MEM_OP_A, sysmembar_value |
+                                   HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS) |
+                                   HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo),
                         MEM_OP_B, HWVALUE(C56F, MEM_OP_B, TLB_INVALIDATE_TARGET_ADDR_HI, va_hi),
                         MEM_OP_C, HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_REPLAY, NONE) |
                                   HWVALUE(C56F, MEM_OP_C, TLB_INVALIDATE_PAGE_TABLE_LEVEL, page_table_level) |
@@ -418,7 +439,7 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
                                   HWVALUE(C56F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
    }
    else {
-        NV_PUSH_4U(C56F, MEM_OP_A, HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS) |
+        NV_PUSH_4U(C56F, MEM_OP_A, sysmembar_value |
                                   HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
                         MEM_OP_B, 0,
                         MEM_OP_C, HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_REPLAY, NONE) |
@@ -432,12 +453,7 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
                                   HWVALUE(C56F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
    }

-    if (params->membar == UvmInvalidateTlbMemBarSys)
-        membar = UVM_MEMBAR_SYS;
-    else if (params->membar == UvmInvalidateTlbMemBarLocal)
-        membar = UVM_MEMBAR_GPU;
-    else
-        membar = UVM_MEMBAR_NONE;
-
-    uvm_hal_tlb_invalidate_membar(push, membar);
+    // GPU membar still requires an explicit membar method.
+    if (params->membar == UvmInvalidateTlbMemBarLocal)
+        uvm_push_get_gpu(push)->parent->host_hal->membar_gpu(push);
 }
--- a/kernel-open/nvidia-uvm/uvm_ampere_mmu.c
+++ b/kernel-open/nvidia-uvm/uvm_ampere_mmu.c
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2018-2020 NVIDIA Corporation
+    Copyright (c) 2018-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -36,22 +36,7 @@
 #include "uvm_ampere_fault_buffer.h"
 #include "hwref/ampere/ga100/dev_fault.h"

-uvm_mmu_engine_type_t uvm_hal_ampere_mmu_engine_id_to_type(NvU16 mmu_engine_id)
-{
-    if (mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_HOST0 && mmu_engine_id <= NV_PFAULT_MMU_ENG_ID_HOST31)
-        return UVM_MMU_ENGINE_TYPE_HOST;
-
-    if (mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_CE0 && mmu_engine_id <= NV_PFAULT_MMU_ENG_ID_CE9)
-        return UVM_MMU_ENGINE_TYPE_CE;
-
-    // We shouldn't be servicing faults from any other engines
-    UVM_ASSERT_MSG(mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_GRAPHICS && mmu_engine_id < NV_PFAULT_MMU_ENG_ID_BAR1,
-                   "Unexpected engine ID: 0x%x\n", mmu_engine_id);
-
-    return UVM_MMU_ENGINE_TYPE_GRAPHICS;
-}
-
-static NvU32 page_table_depth_ampere(NvU32 page_size)
+static NvU32 page_table_depth_ampere(NvU64 page_size)
 {
    // The common-case is page_size == UVM_PAGE_SIZE_2M, hence the first check
    if (page_size == UVM_PAGE_SIZE_2M)
@@ -62,14 +47,14 @@ static NvU32 page_table_depth_ampere(NvU32 page_size)
        return 4;
 }

-static NvU32 page_sizes_ampere(void)
+static NvU64 page_sizes_ampere(void)
 {
    return UVM_PAGE_SIZE_512M | UVM_PAGE_SIZE_2M | UVM_PAGE_SIZE_64K | UVM_PAGE_SIZE_4K;
 }

 static uvm_mmu_mode_hal_t ampere_mmu_mode_hal;

-uvm_mmu_mode_hal_t *uvm_hal_mmu_mode_ampere(NvU32 big_page_size)
+uvm_mmu_mode_hal_t *uvm_hal_mmu_mode_ampere(NvU64 big_page_size)
 {
    static bool initialized = false;

--- a/kernel-open/nvidia-uvm/uvm_api.h
+++ b/kernel-open/nvidia-uvm/uvm_api.h
@@ -47,7 +47,7 @@
    {                                                                               \
        params_type params;                                                         \
        BUILD_BUG_ON(sizeof(params) > UVM_MAX_IOCTL_PARAM_STACK_SIZE);              \
-        if (nv_copy_from_user(&params, (void __user*)arg, sizeof(params)))          \
+        if (copy_from_user(&params, (void __user*)arg, sizeof(params)))             \
            return -EFAULT;                                                         \
                                                                                    \
        params.rmStatus = uvm_global_get_status();                                  \
@@ -60,7 +60,7 @@
                params.rmStatus = function_name(&params, filp);                     \
        }                                                                           \
                                                                                    \
-        if (nv_copy_to_user((void __user*)arg, &params, sizeof(params)))            \
+        if (copy_to_user((void __user*)arg, &params, sizeof(params)))               \
            return -EFAULT;                                                         \
                                                                                    \
        return 0;                                                                   \
@@ -84,7 +84,7 @@
        if (!params)                                                                    \
            return -ENOMEM;                                                             \
        BUILD_BUG_ON(sizeof(*params) <= UVM_MAX_IOCTL_PARAM_STACK_SIZE);                \
-        if (nv_copy_from_user(params, (void __user*)arg, sizeof(*params))) {            \
+        if (copy_from_user(params, (void __user*)arg, sizeof(*params))) {               \
            uvm_kvfree(params);                                                         \
            return -EFAULT;                                                             \
        }                                                                               \
@@ -99,7 +99,7 @@
                params->rmStatus = function_name(params, filp);                         \
        }                                                                               \
                                                                                        \
-        if (nv_copy_to_user((void __user*)arg, params, sizeof(*params)))                \
+        if (copy_to_user((void __user*)arg, params, sizeof(*params)))                   \
            ret = -EFAULT;                                                              \
                                                                                        \
        uvm_kvfree(params);                                                             \
@@ -244,6 +244,7 @@ NV_STATUS uvm_api_migrate(UVM_MIGRATE_PARAMS *params, struct file *filp);
 NV_STATUS uvm_api_enable_system_wide_atomics(UVM_ENABLE_SYSTEM_WIDE_ATOMICS_PARAMS *params, struct file *filp);
 NV_STATUS uvm_api_disable_system_wide_atomics(UVM_DISABLE_SYSTEM_WIDE_ATOMICS_PARAMS *params, struct file *filp);
 NV_STATUS uvm_api_tools_init_event_tracker(UVM_TOOLS_INIT_EVENT_TRACKER_PARAMS *params, struct file *filp);
+NV_STATUS uvm_api_tools_init_event_tracker_v2(UVM_TOOLS_INIT_EVENT_TRACKER_V2_PARAMS *params, struct file *filp);
 NV_STATUS uvm_api_tools_set_notification_threshold(UVM_TOOLS_SET_NOTIFICATION_THRESHOLD_PARAMS *params, struct file *filp);
 NV_STATUS uvm_api_tools_event_queue_enable_events(UVM_TOOLS_EVENT_QUEUE_ENABLE_EVENTS_PARAMS *params, struct file *filp);
 NV_STATUS uvm_api_tools_event_queue_disable_events(UVM_TOOLS_EVENT_QUEUE_DISABLE_EVENTS_PARAMS *params, struct file *filp);
@@ -256,5 +257,7 @@ NV_STATUS uvm_api_unmap_external(UVM_UNMAP_EXTERNAL_PARAMS *params, struct file
 NV_STATUS uvm_api_migrate_range_group(UVM_MIGRATE_RANGE_GROUP_PARAMS *params, struct file *filp);
 NV_STATUS uvm_api_alloc_semaphore_pool(UVM_ALLOC_SEMAPHORE_POOL_PARAMS *params, struct file *filp);
 NV_STATUS uvm_api_populate_pageable(const UVM_POPULATE_PAGEABLE_PARAMS *params, struct file *filp);
+NV_STATUS uvm_api_alloc_device_p2p(UVM_ALLOC_DEVICE_P2P_PARAMS *params, struct file *filp);
+NV_STATUS uvm_api_clear_all_access_counters(UVM_CLEAR_ALL_ACCESS_COUNTERS_PARAMS *params, struct file *filp);

 #endif // __UVM_API_H__
--- a/kernel-open/nvidia-uvm/uvm_ats.c
+++ b/kernel-open/nvidia-uvm/uvm_ats.c
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2018-2021 NVIDIA Corporation
+    Copyright (c) 2018-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
--- a/kernel-open/nvidia-uvm/uvm_ats.h
+++ b/kernel-open/nvidia-uvm/uvm_ats.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2018-2021 NVIDIA Corporation
+    Copyright (c) 2018-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -29,10 +29,9 @@
 #include "uvm_ats_ibm.h"
 #include "nv_uvm_types.h"
 #include "uvm_lock.h"
+#include "uvm_ats_sva.h"

-    #include "uvm_ats_sva.h"
-
-    #define UVM_ATS_SUPPORTED() (UVM_ATS_IBM_SUPPORTED() || UVM_ATS_SVA_SUPPORTED())
+#define UVM_ATS_SUPPORTED() (UVM_ATS_IBM_SUPPORTED() || UVM_ATS_SVA_SUPPORTED())

 typedef struct
 {
--- a/kernel-open/nvidia-uvm/uvm_ats_faults.c
+++ b/kernel-open/nvidia-uvm/uvm_ats_faults.c
@@ -90,18 +90,19 @@ static NV_STATUS service_ats_requests(uvm_gpu_va_space_t *gpu_va_space,

    uvm_migrate_args_t uvm_migrate_args =
    {
-        .va_space                       = va_space,
-        .mm                             = mm,
-        .dst_id                         = ats_context->residency_id,
-        .dst_node_id                    = ats_context->residency_node,
-        .start                          = start,
-        .length                         = length,
-        .populate_permissions           = populate_permissions,
-        .touch                          = fault_service_type,
-        .skip_mapped                    = fault_service_type,
-        .populate_on_cpu_alloc_failures = fault_service_type,
-        .user_space_start               = &user_space_start,
-        .user_space_length              = &user_space_length,
+        .va_space                           = va_space,
+        .mm                                 = mm,
+        .dst_id                             = ats_context->residency_id,
+        .dst_node_id                        = ats_context->residency_node,
+        .start                              = start,
+        .length                             = length,
+        .populate_permissions               = populate_permissions,
+        .touch                              = fault_service_type,
+        .skip_mapped                        = fault_service_type,
+        .populate_on_cpu_alloc_failures     = fault_service_type,
+        .populate_on_migrate_vma_failures   = fault_service_type,
+        .user_space_start                   = &user_space_start,
+        .user_space_length                  = &user_space_length,
    };

    UVM_ASSERT(uvm_ats_can_service_faults(gpu_va_space, mm));
@@ -112,7 +113,7 @@ static NV_STATUS service_ats_requests(uvm_gpu_va_space_t *gpu_va_space,
    // set skip_mapped to true. For pages already mapped, this will only handle
    // PTE upgrades if needed.
    status = uvm_migrate_pageable(&uvm_migrate_args);
-    if (status == NV_WARN_NOTHING_TO_DO)
+    if (fault_service_type && (status == NV_WARN_NOTHING_TO_DO))
        status = NV_OK;

    UVM_ASSERT(status != NV_ERR_MORE_PROCESSING_REQUIRED);
@@ -379,14 +380,20 @@ static NV_STATUS ats_compute_residency_mask(uvm_gpu_va_space_t *gpu_va_space,

 static void ats_compute_prefetch_mask(uvm_gpu_va_space_t *gpu_va_space,
                                      struct vm_area_struct *vma,
+                                      uvm_ats_service_type_t service_type,
                                      uvm_ats_fault_context_t *ats_context,
                                      uvm_va_block_region_t max_prefetch_region)
 {
-    uvm_page_mask_t *accessed_mask = &ats_context->accessed_mask;
+    uvm_page_mask_t *accessed_mask;
    uvm_page_mask_t *residency_mask = &ats_context->prefetch_state.residency_mask;
    uvm_page_mask_t *prefetch_mask = &ats_context->prefetch_state.prefetch_pages_mask;
    uvm_perf_prefetch_bitmap_tree_t *bitmap_tree = &ats_context->prefetch_state.bitmap_tree;

+    if (service_type == UVM_ATS_SERVICE_TYPE_FAULTS)
+        accessed_mask = &ats_context->faults.accessed_mask;
+    else
+        accessed_mask = &ats_context->access_counters.accessed_mask;
+
    if (uvm_page_mask_empty(accessed_mask))
        return;

@@ -406,7 +413,7 @@ static NV_STATUS ats_compute_prefetch(uvm_gpu_va_space_t *gpu_va_space,
                                      uvm_ats_fault_context_t *ats_context)
 {
    NV_STATUS status;
-    uvm_page_mask_t *accessed_mask = &ats_context->accessed_mask;
+    uvm_page_mask_t *accessed_mask;
    uvm_page_mask_t *prefetch_mask = &ats_context->prefetch_state.prefetch_pages_mask;
    uvm_va_block_region_t max_prefetch_region = uvm_ats_region_from_vma(vma, base);

@@ -420,6 +427,11 @@ static NV_STATUS ats_compute_prefetch(uvm_gpu_va_space_t *gpu_va_space,
    if (!uvm_perf_prefetch_enabled(gpu_va_space->va_space))
        return status;

+    if (service_type == UVM_ATS_SERVICE_TYPE_FAULTS)
+        accessed_mask = &ats_context->faults.accessed_mask;
+    else
+        accessed_mask = &ats_context->access_counters.accessed_mask;
+
    if (uvm_page_mask_empty(accessed_mask))
        return status;

@@ -432,12 +444,12 @@ static NV_STATUS ats_compute_prefetch(uvm_gpu_va_space_t *gpu_va_space,
        uvm_page_mask_init_from_region(prefetch_mask, max_prefetch_region, NULL);
    }
    else {
-        ats_compute_prefetch_mask(gpu_va_space, vma, ats_context, max_prefetch_region);
+        ats_compute_prefetch_mask(gpu_va_space, vma, service_type, ats_context, max_prefetch_region);
    }

    if (service_type == UVM_ATS_SERVICE_TYPE_FAULTS) {
-        uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
-        uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
+        uvm_page_mask_t *read_fault_mask = &ats_context->faults.read_fault_mask;
+        uvm_page_mask_t *write_fault_mask = &ats_context->faults.write_fault_mask;

        uvm_page_mask_or(read_fault_mask, read_fault_mask, prefetch_mask);

@@ -459,10 +471,10 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
    NV_STATUS status = NV_OK;
    uvm_va_block_region_t subregion;
    uvm_va_block_region_t region = uvm_va_block_region(0, PAGES_PER_UVM_VA_BLOCK);
-    uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
-    uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
-    uvm_page_mask_t *faults_serviced_mask = &ats_context->faults_serviced_mask;
-    uvm_page_mask_t *reads_serviced_mask = &ats_context->reads_serviced_mask;
+    uvm_page_mask_t *read_fault_mask = &ats_context->faults.read_fault_mask;
+    uvm_page_mask_t *write_fault_mask = &ats_context->faults.write_fault_mask;
+    uvm_page_mask_t *faults_serviced_mask = &ats_context->faults.faults_serviced_mask;
+    uvm_page_mask_t *reads_serviced_mask = &ats_context->faults.reads_serviced_mask;
    uvm_fault_client_type_t client_type = ats_context->client_type;
    uvm_ats_service_type_t service_type = UVM_ATS_SERVICE_TYPE_FAULTS;

@@ -637,6 +649,8 @@ NV_STATUS uvm_ats_service_access_counters(uvm_gpu_va_space_t *gpu_va_space,
    UVM_ASSERT(gpu_va_space->ats.enabled);
    UVM_ASSERT(uvm_gpu_va_space_state(gpu_va_space) == UVM_GPU_VA_SPACE_STATE_ACTIVE);

+    uvm_page_mask_zero(&ats_context->access_counters.migrated_mask);
+
    uvm_assert_mmap_lock_locked(vma->vm_mm);
    uvm_assert_rwsem_locked(&gpu_va_space->va_space->lock);

@@ -650,21 +664,24 @@ NV_STATUS uvm_ats_service_access_counters(uvm_gpu_va_space_t *gpu_va_space,

    // Remove pages which are already resident at the intended destination from
    // the accessed_mask.
-    uvm_page_mask_andnot(&ats_context->accessed_mask,
-                         &ats_context->accessed_mask,
+    uvm_page_mask_andnot(&ats_context->access_counters.accessed_mask,
+                         &ats_context->access_counters.accessed_mask,
                         &ats_context->prefetch_state.residency_mask);

-    for_each_va_block_subregion_in_mask(subregion, &ats_context->accessed_mask, region) {
+    for_each_va_block_subregion_in_mask(subregion, &ats_context->access_counters.accessed_mask, region) {
        NV_STATUS status;
        NvU64 start = base + (subregion.first * PAGE_SIZE);
        size_t length = uvm_va_block_region_num_pages(subregion) * PAGE_SIZE;
        uvm_fault_access_type_t access_type = UVM_FAULT_ACCESS_TYPE_COUNT;
+        uvm_page_mask_t *migrated_mask = &ats_context->access_counters.migrated_mask;

        UVM_ASSERT(start >= vma->vm_start);
        UVM_ASSERT((start + length) <= vma->vm_end);

        status = service_ats_requests(gpu_va_space, vma, start, length, access_type, service_type, ats_context);
-        if (status != NV_OK)
+        if (status == NV_OK)
+            uvm_page_mask_region_fill(migrated_mask, subregion);
+        else if (status != NV_WARN_NOTHING_TO_DO)
            return status;
    }

--- a/kernel-open/nvidia-uvm/uvm_ats_faults.h
+++ b/kernel-open/nvidia-uvm/uvm_ats_faults.h
@@ -29,18 +29,18 @@

 // Service ATS faults in the range (base, base + UVM_VA_BLOCK_SIZE) with service
 // type for individual pages in the range requested by page masks set in
-// ats_context->read_fault_mask/write_fault_mask. base must be aligned to
+// ats_context->faults.read_fault_mask/write_fault_mask. base must be aligned to
 // UVM_VA_BLOCK_SIZE. The caller is responsible for ensuring that faulting
 // addresses fall completely within the VMA. The caller is also responsible for
 // ensuring that the faulting addresses don't overlap a GMMU region. (See
 // uvm_ats_check_in_gmmu_region). The caller is also responsible for handling
 // any errors returned by this function (fault cancellations etc.).
 //
-// Returns the fault service status in ats_context->faults_serviced_mask. In
-// addition, ats_context->reads_serviced_mask returns whether read servicing
-// worked on write faults iff the read service was also requested in the
-// corresponding bit in read_fault_mask. These returned masks are only valid if
-// the return status is NV_OK. Status other than NV_OK indicate system global
+// Returns the fault service status in ats_context->faults.faults_serviced_mask.
+// In addition, ats_context->faults.reads_serviced_mask returns whether read
+// servicing worked on write faults iff the read service was also requested in
+// the corresponding bit in read_fault_mask. These returned masks are only valid
+// if the return status is NV_OK. Status other than NV_OK indicate system global
 // fault servicing failures.
 //
 // LOCKING: The caller must retain and hold the mmap_lock and hold the va_space
@@ -52,9 +52,9 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,

 // Service access counter notifications on ATS regions in the range (base, base
 // + UVM_VA_BLOCK_SIZE) for individual pages in the range requested by page_mask
-// set in ats_context->accessed_mask. base must be aligned to UVM_VA_BLOCK_SIZE.
-// The caller is responsible for ensuring that the addresses in the
-// accessed_mask is completely covered by the VMA. The caller is also
+// set in ats_context->access_counters.accessed_mask. base must be aligned to
+// UVM_VA_BLOCK_SIZE. The caller is responsible for ensuring that the addresses
+// in the accessed_mask are completely covered by the VMA. The caller is also
 // responsible for handling any errors returned by this function.
 //
 // Returns NV_OK if servicing was successful. Any other error indicates an error
--- a/kernel-open/nvidia-uvm/uvm_ats_sva.c
+++ b/kernel-open/nvidia-uvm/uvm_ats_sva.c
@@ -127,12 +127,12 @@ static NvU32 smmu_vintf_read32(void __iomem *smmu_cmdqv_base, int reg)

 // We always use VCMDQ127 for the WAR
 #define VCMDQ 127
-void smmu_vcmdq_write32(void __iomem *smmu_cmdqv_base, int reg, NvU32 val)
+static void smmu_vcmdq_write32(void __iomem *smmu_cmdqv_base, int reg, NvU32 val)
 {
    iowrite32(val, SMMU_VCMDQ_BASE_ADDR(smmu_cmdqv_base, VCMDQ) + reg);
 }

-NvU32 smmu_vcmdq_read32(void __iomem *smmu_cmdqv_base, int reg)
+static NvU32 smmu_vcmdq_read32(void __iomem *smmu_cmdqv_base, int reg)
 {
    return ioread32(SMMU_VCMDQ_BASE_ADDR(smmu_cmdqv_base, VCMDQ) + reg);
 }
--- a/kernel-open/nvidia-uvm/uvm_blackwell.c
+++ b/kernel-open/nvidia-uvm/uvm_blackwell.c
@@ -0,0 +1,105 @@
+/*******************************************************************************
+    Copyright (c) 2022-2023 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be
+        included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+#include "uvm_global.h"
+#include "uvm_hal.h"
+#include "uvm_gpu.h"
+#include "uvm_mem.h"
+#include "uvm_blackwell_fault_buffer.h"
+
+void uvm_hal_blackwell_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
+{
+    parent_gpu->tlb_batch.va_invalidate_supported = true;
+
+    parent_gpu->tlb_batch.va_range_invalidate_supported = true;
+
+    // TODO: Bug 1767241: Run benchmarks to figure out a good number
+    parent_gpu->tlb_batch.max_ranges = 8;
+
+    parent_gpu->utlb_per_gpc_count = uvm_blackwell_get_utlbs_per_gpc(parent_gpu);
+
+    parent_gpu->fault_buffer_info.replayable.utlb_count = parent_gpu->rm_info.maxGpcCount *
+                                                          parent_gpu->utlb_per_gpc_count;
+    {
+        uvm_fault_buffer_entry_t *dummy;
+        UVM_ASSERT(parent_gpu->fault_buffer_info.replayable.utlb_count <= (1 <<
+                                                                           (sizeof(dummy->fault_source.utlb_id) * 8)));
+    }
+
+    // A single top level PDE on Blackwell covers 64 PB and that's the minimum
+    // size that can be used.
+    parent_gpu->rm_va_base = 0;
+    parent_gpu->rm_va_size = 64 * UVM_SIZE_1PB;
+
+    parent_gpu->uvm_mem_va_base = parent_gpu->rm_va_size + 384 * UVM_SIZE_1TB;
+    parent_gpu->uvm_mem_va_size = UVM_MEM_VA_SIZE;
+
+    // See uvm_mmu.h for mapping placement
+    parent_gpu->flat_vidmem_va_base = (64 * UVM_SIZE_1PB) + (32 * UVM_SIZE_1TB);
+
+    // TODO: Bug 3953852: Set this to true pending Blackwell changes
+    parent_gpu->ce_phys_vidmem_write_supported = !uvm_parent_gpu_is_coherent(parent_gpu);
+
+    parent_gpu->peer_copy_mode = g_uvm_global.peer_copy_mode;
+
+    // All GR context buffers may be mapped to 57b wide VAs. All "compute" units
+    // accessing GR context buffers support the 57-bit VA range.
+    parent_gpu->max_channel_va = 1ull << 57;
+
+    parent_gpu->max_host_va = 1ull << 57;
+
+    // Blackwell can map sysmem with any page size
+    parent_gpu->can_map_sysmem_with_large_pages = true;
+
+    // Prefetch instructions will generate faults
+    parent_gpu->prefetch_fault_supported = true;
+
+    // Blackwell can place GPFIFO in vidmem
+    parent_gpu->gpfifo_in_vidmem_supported = true;
+
+    parent_gpu->replayable_faults_supported = true;
+
+    parent_gpu->non_replayable_faults_supported = true;
+
+    parent_gpu->access_counters_supported = true;
+
+    parent_gpu->access_counters_can_use_physical_addresses = false;
+
+    parent_gpu->fault_cancel_va_supported = true;
+
+    parent_gpu->scoped_atomics_supported = true;
+
+    parent_gpu->has_clear_faulted_channel_sw_method = true;
+
+    parent_gpu->has_clear_faulted_channel_method = false;
+
+    parent_gpu->smc.supported = true;
+
+    parent_gpu->sparse_mappings_supported = true;
+
+    parent_gpu->map_remap_larger_page_promotion = false;
+
+    parent_gpu->plc_supported = true;
+
+    parent_gpu->no_ats_range_required = true;
+}
--- a/kernel-open/nvidia-uvm/uvm_blackwell_fault_buffer.c
+++ b/kernel-open/nvidia-uvm/uvm_blackwell_fault_buffer.c
@@ -0,0 +1,122 @@
+/*******************************************************************************
+    Copyright (c) 2023-2024 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be
+        included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+#include "uvm_linux.h"
+#include "uvm_global.h"
+#include "uvm_gpu.h"
+#include "uvm_hal.h"
+#include "uvm_hal_types.h"
+#include "hwref/blackwell/gb100/dev_fault.h"
+#include "clc369.h"
+
+// NV_PFAULT_FAULT_TYPE_COMPRESSION_FAILURE fault type is deprecated on
+// Blackwell.
+uvm_fault_type_t uvm_hal_blackwell_fault_buffer_get_fault_type(const NvU32 *fault_entry)
+{
+    NvU32 hw_fault_type_value = READ_HWVALUE_MW(fault_entry, C369, BUF_ENTRY, FAULT_TYPE);
+
+    switch (hw_fault_type_value) {
+        case NV_PFAULT_FAULT_TYPE_PDE:
+            return UVM_FAULT_TYPE_INVALID_PDE;
+        case NV_PFAULT_FAULT_TYPE_PTE:
+            return UVM_FAULT_TYPE_INVALID_PTE;
+        case NV_PFAULT_FAULT_TYPE_RO_VIOLATION:
+            return UVM_FAULT_TYPE_WRITE;
+        case NV_PFAULT_FAULT_TYPE_ATOMIC_VIOLATION:
+            return UVM_FAULT_TYPE_ATOMIC;
+        case NV_PFAULT_FAULT_TYPE_WO_VIOLATION:
+            return UVM_FAULT_TYPE_READ;
+
+        case NV_PFAULT_FAULT_TYPE_PDE_SIZE:
+            return UVM_FAULT_TYPE_PDE_SIZE;
+        case NV_PFAULT_FAULT_TYPE_VA_LIMIT_VIOLATION:
+            return UVM_FAULT_TYPE_VA_LIMIT_VIOLATION;
+        case NV_PFAULT_FAULT_TYPE_UNBOUND_INST_BLOCK:
+            return UVM_FAULT_TYPE_UNBOUND_INST_BLOCK;
+        case NV_PFAULT_FAULT_TYPE_PRIV_VIOLATION:
+            return UVM_FAULT_TYPE_PRIV_VIOLATION;
+        case NV_PFAULT_FAULT_TYPE_PITCH_MASK_VIOLATION:
+            return UVM_FAULT_TYPE_PITCH_MASK_VIOLATION;
+        case NV_PFAULT_FAULT_TYPE_WORK_CREATION:
+            return UVM_FAULT_TYPE_WORK_CREATION;
+        case NV_PFAULT_FAULT_TYPE_UNSUPPORTED_APERTURE:
+            return UVM_FAULT_TYPE_UNSUPPORTED_APERTURE;
+        case NV_PFAULT_FAULT_TYPE_CC_VIOLATION:
+            return UVM_FAULT_TYPE_CC_VIOLATION;
+        case NV_PFAULT_FAULT_TYPE_UNSUPPORTED_KIND:
+            return UVM_FAULT_TYPE_UNSUPPORTED_KIND;
+        case NV_PFAULT_FAULT_TYPE_REGION_VIOLATION:
+            return UVM_FAULT_TYPE_REGION_VIOLATION;
+        case NV_PFAULT_FAULT_TYPE_POISONED:
+            return UVM_FAULT_TYPE_POISONED;
+    }
+
+    UVM_ASSERT_MSG(false, "Invalid fault type value: %u\n", hw_fault_type_value);
+
+    return UVM_FAULT_TYPE_COUNT;
+}
+
+static bool client_id_ce(NvU16 client_id)
+{
+    if (client_id >= NV_PFAULT_CLIENT_HUB_HSCE0 && client_id <= NV_PFAULT_CLIENT_HUB_HSCE7)
+        return true;
+
+    switch (client_id) {
+        case NV_PFAULT_CLIENT_HUB_CE0:
+        case NV_PFAULT_CLIENT_HUB_CE1:
+        case NV_PFAULT_CLIENT_HUB_CE2:
+        case NV_PFAULT_CLIENT_HUB_CE3:
+            return true;
+    }
+
+    return false;
+}
+
+uvm_mmu_engine_type_t uvm_hal_blackwell_fault_buffer_get_mmu_engine_type(NvU16 mmu_engine_id,
+                                                                         uvm_fault_client_type_t client_type,
+                                                                         NvU16 client_id)
+{
+    // Servicing CE and Host (HUB clients) faults.
+    if (client_type == UVM_FAULT_CLIENT_TYPE_HUB) {
+        if (client_id_ce(client_id)) {
+            UVM_ASSERT(mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_CE0 && mmu_engine_id <= NV_PFAULT_MMU_ENG_ID_CE19);
+
+            return UVM_MMU_ENGINE_TYPE_CE;
+        }
+
+        if (client_id == NV_PFAULT_CLIENT_HUB_HOST ||
+            (client_id >= NV_PFAULT_CLIENT_HUB_ESC0 && client_id <= NV_PFAULT_CLIENT_HUB_ESC11)) {
+            UVM_ASSERT((mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_HOST0 && mmu_engine_id <= NV_PFAULT_MMU_ENG_ID_HOST44) ||
+                       (mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_GRAPHICS));
+
+            return UVM_MMU_ENGINE_TYPE_HOST;
+        }
+    }
+
+    // We shouldn't be servicing faults from any engines other than GR.
+    UVM_ASSERT_MSG(client_id <= NV_PFAULT_CLIENT_GPC_ROP_3, "Unexpected client ID: 0x%x\n", client_id);
+    UVM_ASSERT_MSG(mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_GRAPHICS, "Unexpected engine ID: 0x%x\n", mmu_engine_id);
+    UVM_ASSERT(client_type == UVM_FAULT_CLIENT_TYPE_GPC);
+
+    return UVM_MMU_ENGINE_TYPE_GRAPHICS;
+}
--- a/kernel-open/nvidia-uvm/uvm_blackwell_fault_buffer.h
+++ b/kernel-open/nvidia-uvm/uvm_blackwell_fault_buffer.h
@@ -0,0 +1,92 @@
+/*******************************************************************************
+    Copyright (c) 2022 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be
+        included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+#ifndef __UVM_HAL_BLACKWELL_FAULT_BUFFER_H__
+#define __UVM_HAL_BLACKWELL_FAULT_BUFFER_H__
+
+#include "nvtypes.h"
+#include "uvm_common.h"
+#include "uvm_gpu.h"
+
+// There are up to 10 TPCs per GPC in Blackwell, and there are 2 LTP uTLBs per
+// TPC. Besides, there is one active RGG uTLB per GPC. Each TPC has a number of
+// clients that can make requests to its uTLBs: 1xTPCCS, 1xPE, 2xT1. Requests
+// from these units are routed as follows to the 2 LTP uTLBs:
+//
+// --------                    ---------
+// | T1_0 | -----------------> | uTLB0 |
+// --------                    ---------
+//
+// --------                    ---------
+// | T1_1 | -----------------> | uTLB1 |
+// --------          --------> ---------
+//                   |             ^
+// -------           |             |
+// | PE  | -----------             |
+// -------                         |
+//                                 |
+// ---------                       |
+// | TPCCS | -----------------------
+// ---------
+//
+//
+// The client ids are local to their GPC and the id mapping is linear across
+// TPCs: TPC_n has TPCCS_n, PE_n, T1_p, and T1_q, where p=2*n and q=p+1.
+//
+// NV_PFAULT_CLIENT_GPC_LTP_UTLB_n and NV_PFAULT_CLIENT_GPC_RGG_UTLB enums can
+// be ignored. These will never be reported in a fault message, and should
+// never be used in an invalidate. Therefore, we define our own values.
+typedef enum {
+    UVM_BLACKWELL_GPC_UTLB_ID_RGG = 0,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP0 = 1,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP1 = 2,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP2 = 3,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP3 = 4,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP4 = 5,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP5 = 6,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP6 = 7,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP7 = 8,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP8 = 9,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP9 = 10,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP10 = 11,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP11 = 12,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP12 = 13,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP13 = 14,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP14 = 15,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP15 = 16,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP16 = 17,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP17 = 18,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP18 = 19,
+    UVM_BLACKWELL_GPC_UTLB_ID_LTP19 = 20,
+
+    UVM_BLACKWELL_GPC_UTLB_COUNT,
+} uvm_blackwell_gpc_utlb_id_t;
+
+static NvU32 uvm_blackwell_get_utlbs_per_gpc(uvm_parent_gpu_t *parent_gpu)
+{
+    NvU32 utlbs = parent_gpu->rm_info.maxTpcPerGpcCount * 2 + 1;
+    UVM_ASSERT(utlbs <= UVM_BLACKWELL_GPC_UTLB_COUNT);
+    return utlbs;
+}
+
+#endif
--- a/kernel-open/nvidia-uvm/uvm_blackwell_host.c
+++ b/kernel-open/nvidia-uvm/uvm_blackwell_host.c
@@ -0,0 +1,256 @@
+/*******************************************************************************
+    Copyright (c) 2024 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be
+        included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+#include "uvm_hal.h"
+#include "uvm_push.h"
+#include "uvm_push_macros.h"
+#include "clc96f.h"
+
+// TODO: Bug 3210931: Rename HOST references and files to ESCHED.
+
+void uvm_hal_blackwell_host_tlb_invalidate_all(uvm_push_t *push,
+                                               uvm_gpu_phys_address_t pdb,
+                                               NvU32 depth,
+                                               uvm_membar_t membar)
+{
+    NvU32 aperture_value;
+    NvU32 page_table_level;
+    NvU32 pdb_lo;
+    NvU32 pdb_hi;
+    NvU32 ack_value = 0;
+    NvU32 sysmembar_value = 0;
+
+    UVM_ASSERT_MSG(pdb.aperture == UVM_APERTURE_VID || pdb.aperture == UVM_APERTURE_SYS, "aperture: %u", pdb.aperture);
+
+    if (pdb.aperture == UVM_APERTURE_VID)
+        aperture_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_APERTURE, VID_MEM);
+    else
+        aperture_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_APERTURE, SYS_MEM_COHERENT);
+
+    UVM_ASSERT_MSG(IS_ALIGNED(pdb.address, 1 << 12), "pdb 0x%llx\n", pdb.address);
+    pdb.address >>= 12;
+
+    pdb_lo = pdb.address & HWMASK(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
+    pdb_hi = pdb.address >> HWSIZE(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
+
+    // PDE4 is the highest level on Blackwell, see the comment in
+    // uvm_blackwell_mmu.c for details.
+    UVM_ASSERT_MSG(depth < NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE4, "depth %u", depth);
+    page_table_level = NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE4 - depth;
+
+    if (membar != UVM_MEMBAR_NONE)
+        ack_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
+
+    if (membar == UVM_MEMBAR_SYS)
+        sysmembar_value = HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
+    else
+        sysmembar_value = HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
+
+    NV_PUSH_4U(C96F, MEM_OP_A, sysmembar_value |
+                               HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
+                     MEM_OP_B, 0,
+                     MEM_OP_C, HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB, ONE) |
+                               HWVALUE(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO, pdb_lo) |
+                               HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_GPC, ENABLE) |
+                               HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_REPLAY, NONE) |
+                               HWVALUE(C96F, MEM_OP_C, TLB_INVALIDATE_PAGE_TABLE_LEVEL, page_table_level) |
+                               aperture_value |
+                               ack_value,
+                     MEM_OP_D, HWCONST(C96F, MEM_OP_D, OPERATION, MMU_TLB_INVALIDATE) |
+                               HWVALUE(C96F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
+}
+
+void uvm_hal_blackwell_host_tlb_invalidate_va(uvm_push_t *push,
+                                              uvm_gpu_phys_address_t pdb,
+                                              NvU32 depth,
+                                              NvU64 base,
+                                              NvU64 size,
+                                              NvU64 page_size,
+                                              uvm_membar_t membar)
+{
+    NvU32 aperture_value;
+    NvU32 page_table_level;
+    NvU32 pdb_lo;
+    NvU32 pdb_hi;
+    NvU32 ack_value = 0;
+    NvU32 sysmembar_value = 0;
+    NvU32 va_lo;
+    NvU32 va_hi;
+    NvU64 end;
+    NvU64 actual_base;
+    NvU64 actual_size;
+    NvU64 actual_end;
+    NvU32 log2_invalidation_size;
+    uvm_gpu_t *gpu = uvm_push_get_gpu(push);
+
+    UVM_ASSERT_MSG(IS_ALIGNED(page_size, 1 << 12), "page_size 0x%llx\n", page_size);
+    UVM_ASSERT_MSG(IS_ALIGNED(base, page_size), "base 0x%llx page_size 0x%llx\n", base, page_size);
+    UVM_ASSERT_MSG(IS_ALIGNED(size, page_size), "size 0x%llx page_size 0x%llx\n", size, page_size);
+    UVM_ASSERT_MSG(size > 0, "size 0x%llx\n", size);
+
+    // The invalidation size must be a power-of-two number of pages containing
+    // the passed interval
+    end = base + size - 1;
+    log2_invalidation_size = __fls((unsigned long)(end ^ base)) + 1;
+
+    if (log2_invalidation_size == 64) {
+        // Invalidate everything
+        gpu->parent->host_hal->tlb_invalidate_all(push, pdb, depth, membar);
+        return;
+    }
+
+    // The hardware aligns the target address down to the invalidation size.
+    actual_size = 1ULL << log2_invalidation_size;
+    actual_base = UVM_ALIGN_DOWN(base, actual_size);
+    actual_end = actual_base + actual_size - 1;
+    UVM_ASSERT(actual_end >= end);
+
+    // The invalidation size field expects log2(invalidation size in 4K), not
+    // log2(invalidation size in bytes)
+    log2_invalidation_size -= 12;
+
+    // Address to invalidate, as a multiple of 4K.
+    base >>= 12;
+    va_lo = base & HWMASK(C96F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO);
+    va_hi = base >> HWSIZE(C96F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO);
+
+    UVM_ASSERT_MSG(pdb.aperture == UVM_APERTURE_VID || pdb.aperture == UVM_APERTURE_SYS, "aperture: %u", pdb.aperture);
+
+    if (pdb.aperture == UVM_APERTURE_VID)
+        aperture_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_APERTURE, VID_MEM);
+    else
+        aperture_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_APERTURE, SYS_MEM_COHERENT);
+
+    UVM_ASSERT_MSG(IS_ALIGNED(pdb.address, 1 << 12), "pdb 0x%llx\n", pdb.address);
+    pdb.address >>= 12;
+
+    pdb_lo = pdb.address & HWMASK(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
+    pdb_hi = pdb.address >> HWSIZE(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
+
+    // PDE4 is the highest level on Blackwell, see the comment in
+    // uvm_blackwell_mmu.c for details.
+    UVM_ASSERT_MSG(depth < NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE4, "depth %u", depth);
+    page_table_level = NVC96F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE4 - depth;
+
+    if (membar != UVM_MEMBAR_NONE)
+        ack_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
+
+    if (membar == UVM_MEMBAR_SYS)
+        sysmembar_value = HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
+    else
+        sysmembar_value = HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
+
+    NV_PUSH_4U(C96F, MEM_OP_A, HWVALUE(C96F, MEM_OP_A, TLB_INVALIDATE_INVALIDATION_SIZE, log2_invalidation_size) |
+                               sysmembar_value |
+                               HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS) |
+                               HWVALUE(C96F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo),
+                     MEM_OP_B, HWVALUE(C96F, MEM_OP_B, TLB_INVALIDATE_TARGET_ADDR_HI, va_hi),
+                     MEM_OP_C, HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB, ONE) |
+                               HWVALUE(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO, pdb_lo) |
+                               HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_GPC, ENABLE) |
+                               HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_REPLAY, NONE) |
+                               HWVALUE(C96F, MEM_OP_C, TLB_INVALIDATE_PAGE_TABLE_LEVEL, page_table_level) |
+                               aperture_value |
+                               ack_value,
+                     MEM_OP_D, HWCONST(C96F, MEM_OP_D, OPERATION, MMU_TLB_INVALIDATE_TARGETED) |
+                               HWVALUE(C96F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
+}
+
+void uvm_hal_blackwell_host_tlb_invalidate_test(uvm_push_t *push,
+                                                uvm_gpu_phys_address_t pdb,
+                                                UVM_TEST_INVALIDATE_TLB_PARAMS *params)
+{
+    NvU32 ack_value = 0;
+    NvU32 sysmembar_value = 0;
+    NvU32 invalidate_gpc_value = 0;
+    NvU32 aperture_value = 0;
+    NvU32 pdb_lo = 0;
+    NvU32 pdb_hi = 0;
+    NvU32 page_table_level = 0;
+
+    UVM_ASSERT_MSG(pdb.aperture == UVM_APERTURE_VID || pdb.aperture == UVM_APERTURE_SYS, "aperture: %u", pdb.aperture);
+    if (pdb.aperture == UVM_APERTURE_VID)
+        aperture_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_APERTURE, VID_MEM);
+    else
+        aperture_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_APERTURE, SYS_MEM_COHERENT);
+
+    UVM_ASSERT_MSG(IS_ALIGNED(pdb.address, 1 << 12), "pdb 0x%llx\n", pdb.address);
+    pdb.address >>= 12;
+
+    pdb_lo = pdb.address & HWMASK(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
+    pdb_hi = pdb.address >> HWSIZE(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
+
+    if (params->page_table_level != UvmInvalidatePageTableLevelAll) {
+        // PDE4 is the highest level on Blackwell, see the comment in
+        // uvm_blackwell_mmu.c for details.
+        page_table_level = min((NvU32)UvmInvalidatePageTableLevelPde4, params->page_table_level) - 1;
+    }
+
+    if (params->membar != UvmInvalidateTlbMemBarNone)
+        ack_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
+
+    if (params->membar == UvmInvalidateTlbMemBarSys)
+        sysmembar_value = HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
+    else
+        sysmembar_value = HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
+
+    if (params->disable_gpc_invalidate)
+        invalidate_gpc_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_GPC, DISABLE);
+    else
+        invalidate_gpc_value = HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_GPC, ENABLE);
+
+    if (params->target_va_mode == UvmTargetVaModeTargeted) {
+        NvU64 va = params->va >> 12;
+
+        NvU32 va_lo = va & HWMASK(C96F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO);
+        NvU32 va_hi = va >> HWSIZE(C96F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO);
+
+        NV_PUSH_4U(C96F, MEM_OP_A, sysmembar_value |
+                                   HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS) |
+                                   HWVALUE(C96F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo),
+                         MEM_OP_B, HWVALUE(C96F, MEM_OP_B, TLB_INVALIDATE_TARGET_ADDR_HI, va_hi),
+                         MEM_OP_C, HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_REPLAY, NONE) |
+                                   HWVALUE(C96F, MEM_OP_C, TLB_INVALIDATE_PAGE_TABLE_LEVEL, page_table_level) |
+                                   HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB, ONE) |
+                                   HWVALUE(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO, pdb_lo) |
+                                   invalidate_gpc_value |
+                                   aperture_value |
+                                   ack_value,
+                         MEM_OP_D, HWCONST(C96F, MEM_OP_D, OPERATION, MMU_TLB_INVALIDATE_TARGETED) |
+                                   HWVALUE(C96F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
+    }
+    else {
+        NV_PUSH_4U(C96F, MEM_OP_A, sysmembar_value |
+                                   HWCONST(C96F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
+                         MEM_OP_B, 0,
+                         MEM_OP_C, HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_REPLAY, NONE) |
+                                   HWVALUE(C96F, MEM_OP_C, TLB_INVALIDATE_PAGE_TABLE_LEVEL, page_table_level) |
+                                   HWCONST(C96F, MEM_OP_C, TLB_INVALIDATE_PDB, ONE) |
+                                   HWVALUE(C96F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO, pdb_lo) |
+                                   invalidate_gpc_value |
+                                   aperture_value |
+                                   ack_value,
+                         MEM_OP_D, HWCONST(C96F, MEM_OP_D, OPERATION, MMU_TLB_INVALIDATE) |
+                                   HWVALUE(C96F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
+    }
+}
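The range computation in uvm_hal_blackwell_host_tlb_invalidate_va() above can be sketched in isolation as follows. This is a minimal user-space illustration, not the driver's code: highest_set_bit(), inval_range_t and covering_range() are hypothetical names introduced here, highest_set_bit() stands in for the kernel's __fls(), and the alignment checks are specialised to 4K pages.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for the kernel's __fls(): 0-based index of the
// highest set bit. The argument must be non-zero.
static unsigned highest_set_bit(uint64_t x)
{
    unsigned bit = 0;

    assert(x != 0);
    while ((x >>= 1) != 0)
        bit++;
    return bit;
}

// Hypothetical result type, for illustration only.
typedef struct {
    unsigned log2_bytes;   // log2 of the invalidated size in bytes (64: everything)
    uint64_t actual_base;  // base aligned down to the invalidation size
    uint64_t actual_end;   // last byte covered by the invalidation
} inval_range_t;

// Smallest naturally aligned power-of-two range containing [base, base + size).
static inval_range_t covering_range(uint64_t base, uint64_t size)
{
    inval_range_t r;
    uint64_t end;
    uint64_t actual_size;

    // Preconditions modelled on the asserts in the patch, assuming 4K pages.
    assert(size > 0);
    assert((base & 0xfff) == 0 && (size & 0xfff) == 0);

    end = base + size - 1;

    // The highest bit in which base and end differ bounds the range size.
    r.log2_bytes = highest_set_bit(end ^ base) + 1;

    if (r.log2_bytes == 64) {
        // The patch falls back to a full invalidate in this case.
        r.actual_base = 0;
        r.actual_end = UINT64_MAX;
        return r;
    }

    actual_size = 1ULL << r.log2_bytes;
    r.actual_base = base & ~(actual_size - 1);
    r.actual_end = r.actual_base + actual_size - 1;
    assert(r.actual_end >= end);
    return r;
}
```

Under these assumptions, an 8K range starting at 0x1000 crosses an 8K boundary, so it widens to the 16K block [0x0, 0x3fff] and log2_bytes is 14. The patch then subtracts 12 before writing the size field, because the hardware expects the size in units of 4K.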
--- a/kernel-open/nvidia-uvm/uvm_blackwell_mmu.c
+++ b/kernel-open/nvidia-uvm/uvm_blackwell_mmu.c
@@ -0,0 +1,165 @@
+/*******************************************************************************
+    Copyright (c) 2022-2024 NVIDIA Corporation
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to
+    deal in the Software without restriction, including without limitation the
+    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be
+        included in all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+// On Blackwell, the UVM page tree 'depth' maps to hardware as follows:
+//
+// UVM depth   HW level                            VA bits
+// 0           PDE4                                56:56
+// 1           PDE3                                55:47
+// 2           PDE2 (or 256G PTE)                  46:38
+// 3           PDE1 (or 512M PTE)                  37:29
+// 4           PDE0 (dual 64K/4K PDE, or 2M PTE)   28:21
+// 5           PTE_64K / PTE_4K                    20:16 / 20:12
+
+#include "uvm_types.h"
+#include "uvm_global.h"
+#include "uvm_hal.h"
+#include "uvm_hal_types.h"
+#include "uvm_blackwell_fault_buffer.h"
+#include "hwref/blackwell/gb100/dev_fault.h"
+#include "hwref/blackwell/gb100/dev_mmu.h"
+
+static uvm_mmu_mode_hal_t blackwell_mmu_mode_hal;
+
+static NvU32 page_table_depth_blackwell(NvU64 page_size)
+{
+    switch (page_size) {
+        case UVM_PAGE_SIZE_2M:
+            return 4;
+        case UVM_PAGE_SIZE_512M:
+            return 3;
+        case UVM_PAGE_SIZE_256G:
+            return 2;
+        default:
+            return 5;
+    }
+}
+
+static NvU64 page_sizes_blackwell(void)
+{
+    return UVM_PAGE_SIZE_256G | UVM_PAGE_SIZE_512M | UVM_PAGE_SIZE_2M | UVM_PAGE_SIZE_64K | UVM_PAGE_SIZE_4K;
+}
+
+uvm_mmu_mode_hal_t *uvm_hal_mmu_mode_blackwell(NvU64 big_page_size)
+{
+    static bool initialized = false;
+
+    UVM_ASSERT(big_page_size == UVM_PAGE_SIZE_64K || big_page_size == UVM_PAGE_SIZE_128K);
+
+    // TODO: Bug 1789555: RM should reject the creation of GPU VA spaces with
+    // 128K big page size for Pascal+ GPUs
+    if (big_page_size == UVM_PAGE_SIZE_128K)
+        return NULL;
+
+    if (!initialized) {
+        uvm_mmu_mode_hal_t *hopper_mmu_mode_hal = uvm_hal_mmu_mode_hopper(big_page_size);
+        UVM_ASSERT(hopper_mmu_mode_hal);
+
+        // This assumes that arch_hal->mmu_mode_hal() is called under the
+        // global lock the first time, so check it here.
+        uvm_assert_mutex_locked(&g_uvm_global.global_lock);
+
+        blackwell_mmu_mode_hal = *hopper_mmu_mode_hal;
+        blackwell_mmu_mode_hal.page_table_depth = page_table_depth_blackwell;
+        blackwell_mmu_mode_hal.page_sizes = page_sizes_blackwell;
+
+        initialized = true;
+    }
+
+    return &blackwell_mmu_mode_hal;
+}
+
+NvU16 uvm_hal_blackwell_mmu_client_id_to_utlb_id(NvU16 client_id)
+{
+    switch (client_id) {
+        case NV_PFAULT_CLIENT_GPC_RAST:
+        case NV_PFAULT_CLIENT_GPC_GCC:
+        case NV_PFAULT_CLIENT_GPC_GPCCS:
+            return UVM_BLACKWELL_GPC_UTLB_ID_RGG;
+        case NV_PFAULT_CLIENT_GPC_T1_0:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP0;
+        case NV_PFAULT_CLIENT_GPC_T1_1:
+        case NV_PFAULT_CLIENT_GPC_PE_0:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_0:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP1;
+        case NV_PFAULT_CLIENT_GPC_T1_2:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP2;
+        case NV_PFAULT_CLIENT_GPC_T1_3:
+        case NV_PFAULT_CLIENT_GPC_PE_1:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_1:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP3;
+        case NV_PFAULT_CLIENT_GPC_T1_4:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP4;
+        case NV_PFAULT_CLIENT_GPC_T1_5:
+        case NV_PFAULT_CLIENT_GPC_PE_2:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_2:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP5;
+        case NV_PFAULT_CLIENT_GPC_T1_6:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP6;
+        case NV_PFAULT_CLIENT_GPC_T1_7:
+        case NV_PFAULT_CLIENT_GPC_PE_3:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_3:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP7;
+        case NV_PFAULT_CLIENT_GPC_T1_8:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP8;
+        case NV_PFAULT_CLIENT_GPC_T1_9:
+        case NV_PFAULT_CLIENT_GPC_PE_4:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_4:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP9;
+        case NV_PFAULT_CLIENT_GPC_T1_10:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP10;
+        case NV_PFAULT_CLIENT_GPC_T1_11:
+        case NV_PFAULT_CLIENT_GPC_PE_5:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_5:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP11;
+        case NV_PFAULT_CLIENT_GPC_T1_12:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP12;
+        case NV_PFAULT_CLIENT_GPC_T1_13:
+        case NV_PFAULT_CLIENT_GPC_PE_6:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_6:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP13;
+        case NV_PFAULT_CLIENT_GPC_T1_14:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP14;
+        case NV_PFAULT_CLIENT_GPC_T1_15:
+        case NV_PFAULT_CLIENT_GPC_PE_7:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_7:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP15;
+        case NV_PFAULT_CLIENT_GPC_T1_16:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP16;
+        case NV_PFAULT_CLIENT_GPC_T1_17:
+        case NV_PFAULT_CLIENT_GPC_PE_8:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_8:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP17;
+        case NV_PFAULT_CLIENT_GPC_T1_18:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP18;
+        case NV_PFAULT_CLIENT_GPC_T1_19:
+        case NV_PFAULT_CLIENT_GPC_PE_9:
+        case NV_PFAULT_CLIENT_GPC_TPCCS_9:
+            return UVM_BLACKWELL_GPC_UTLB_ID_LTP19;
+
+        default:
+            UVM_ASSERT_MSG(false, "Invalid client value: 0x%x\n", client_id);
+    }
+
+    return 0;
+}
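The switch in uvm_hal_blackwell_mmu_client_id_to_utlb_id() above follows the routing described in uvm_blackwell_fault_buffer.h: T1_p goes to LTP uTLB p, while PE_n and TPCCS_n share LTP uTLB 2n+1. A minimal sketch of that arithmetic is shown below. It works on logical unit indices (the n and p from the header comment), not on NV_PFAULT_CLIENT_GPC_* hardware values, whose numeric encoding is not shown in this patch; all names in the sketch are hypothetical.

```c
#include <assert.h>

// Hypothetical constants mirroring the enum layout in the patch:
// RGG is 0 and LTP_k is k + 1.
enum {
    UTLB_ID_RGG  = 0,
    UTLB_ID_LTP0 = 1,
};

// T1_p is routed to LTP uTLB p.
static unsigned utlb_for_t1(unsigned t1_index)
{
    return UTLB_ID_LTP0 + t1_index;
}

// PE_n and TPCCS_n share the second LTP uTLB of TPC n, i.e. LTP uTLB 2n+1.
static unsigned utlb_for_pe_or_tpccs(unsigned tpc_index)
{
    return UTLB_ID_LTP0 + 2 * tpc_index + 1;
}

// Two LTP uTLBs per TPC plus one RGG uTLB per GPC.
static unsigned utlbs_per_gpc(unsigned max_tpc_per_gpc)
{
    return max_tpc_per_gpc * 2 + 1;
}
```

With 10 TPCs per GPC this gives 21 uTLB ids (0 through 20), which matches UVM_BLACKWELL_GPC_UTLB_COUNT in the enum.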
--- a/kernel-open/nvidia-uvm/uvm_ce_test.c
+++ b/kernel-open/nvidia-uvm/uvm_ce_test.c
@@ -855,6 +855,7 @@ static NV_STATUS cpu_decrypt_in_order(uvm_channel_t *channel,
                                      uvm_mem_t *dst_mem,
                                      uvm_mem_t *src_mem,
                                      const UvmCslIv *decrypt_iv,
+                                      NvU32 key_version,
                                      uvm_mem_t *auth_tag_mem,
                                      size_t size,
                                      NvU32 copy_size)
@@ -869,6 +870,7 @@ static NV_STATUS cpu_decrypt_in_order(uvm_channel_t *channel,
                                                         dst_plain + i * copy_size,
                                                         src_cipher + i * copy_size,
                                                         decrypt_iv + i,
+                                                         key_version,
                                                         copy_size,
                                                         auth_tag_buffer + i * UVM_CONF_COMPUTING_AUTH_TAG_SIZE));
    }
@@ -879,6 +881,7 @@ static NV_STATUS cpu_decrypt_out_of_order(uvm_channel_t *channel,
                                          uvm_mem_t *dst_mem,
                                          uvm_mem_t *src_mem,
                                          const UvmCslIv *decrypt_iv,
+                                          NvU32 key_version,
                                          uvm_mem_t *auth_tag_mem,
                                          size_t size,
                                          NvU32 copy_size)
@@ -896,6 +899,7 @@ static NV_STATUS cpu_decrypt_out_of_order(uvm_channel_t *channel,
                                                         dst_plain + i * copy_size,
                                                         src_cipher + i * copy_size,
                                                         decrypt_iv + i,
+                                                         key_version,
                                                         copy_size,
                                                         auth_tag_buffer + i * UVM_CONF_COMPUTING_AUTH_TAG_SIZE));
    }
@@ -959,7 +963,7 @@ static void gpu_encrypt(uvm_push_t *push,
                                                          i * UVM_CONF_COMPUTING_AUTH_TAG_SIZE,
                                                          dst_cipher);

-        uvm_conf_computing_log_gpu_encryption(push->channel, decrypt_iv);
+        uvm_conf_computing_log_gpu_encryption(push->channel, copy_size, decrypt_iv);

        if (i > 0)
            uvm_push_set_flag(push, UVM_PUSH_FLAG_CE_NEXT_PIPELINED);
@@ -1020,6 +1024,7 @@ static NV_STATUS test_cpu_to_gpu_roundtrip(uvm_gpu_t *gpu,
    size_t auth_tag_buffer_size = (size / copy_size) * UVM_CONF_COMPUTING_AUTH_TAG_SIZE;
    UvmCslIv *decrypt_iv = NULL;
    UvmCslIv *encrypt_iv = NULL;
+    NvU32 key_version;
    uvm_tracker_t tracker;
    size_t src_plain_size;

@@ -1089,6 +1094,11 @@ static NV_STATUS test_cpu_to_gpu_roundtrip(uvm_gpu_t *gpu,

    gpu_encrypt(&push, dst_cipher, dst_plain_gpu, auth_tag_mem, decrypt_iv, size, copy_size);

+    // There shouldn't be any key rotation between the end of the push and the
+    // CPU decryption(s), but it is more robust against test changes to force
+    // decryption to use the saved key.
+    key_version = uvm_channel_pool_key_version(push.channel->pool);
+
    TEST_NV_CHECK_GOTO(uvm_push_end_and_wait(&push), out);

    TEST_CHECK_GOTO(!mem_match(src_plain, src_cipher, size), out);
@@ -1101,6 +1111,7 @@ static NV_STATUS test_cpu_to_gpu_roundtrip(uvm_gpu_t *gpu,
                                                dst_plain,
                                                dst_cipher,
                                                decrypt_iv,
+                                                key_version,
                                                auth_tag_mem,
                                                size,
                                                copy_size),
@@ -1111,6 +1122,7 @@ static NV_STATUS test_cpu_to_gpu_roundtrip(uvm_gpu_t *gpu,
                                                    dst_plain,
                                                    dst_cipher,
                                                    decrypt_iv,
+                                                    key_version,
                                                    auth_tag_mem,
                                                    size,
                                                    copy_size),
--- a/kernel-open/nvidia-uvm/uvm_channel.c
+++ b/kernel-open/nvidia-uvm/uvm_channel.c
--- a/kernel-open/nvidia-uvm/uvm_channel.h
+++ b/kernel-open/nvidia-uvm/uvm_channel.h
@@ -228,21 +228,65 @@ typedef struct
    // variant is required when the thread holding the pool lock must sleep
    // (ex: acquire another mutex) deeper in the call stack, either in UVM or
    // RM.
-    union {
+    union
+    {
        uvm_spinlock_t spinlock;
        uvm_mutex_t mutex;
    };

-    // Secure operations require that uvm_push_begin order matches
-    // uvm_push_end order, because the engine's state is used in its internal
-    // operation and each push may modify this state. push_locks is protected by
-    // the channel pool lock.
-    DECLARE_BITMAP(push_locks, UVM_CHANNEL_MAX_NUM_CHANNELS_PER_POOL);
+    struct
+    {
+        // Secure operations require that uvm_push_begin order matches
+        // uvm_push_end order, because the engine's state is used in its
+        // internal operation and each push may modify this state.
+        // push_locks is protected by the channel pool lock.
+        DECLARE_BITMAP(push_locks, UVM_CHANNEL_MAX_NUM_CHANNELS_PER_POOL);

-    // Counting semaphore for available and unlocked channels, it must be
-    // acquired before submitting work to a channel when the Confidential
-    // Computing feature is enabled.
-    uvm_semaphore_t push_sem;
+        // Counting semaphore for available and unlocked channels, it must be
+        // acquired before submitting work to a channel when the Confidential
+        // Computing feature is enabled.
+        uvm_semaphore_t push_sem;
+
+        // Per channel buffers in unprotected sysmem.
+        uvm_rm_mem_t *pool_sysmem;
+
+        // Per channel buffers in protected vidmem.
+        uvm_rm_mem_t *pool_vidmem;
+
+       struct
+       {
+            // Current encryption key version, incremented upon key rotation.
+            // While there are separate keys for encryption and decryption, the
+            // two keys are rotated at once, so the versioning applies to both.
+            NvU32 version;
+
+            // Lock used to ensure mutual exclusion during key rotation.
+            uvm_mutex_t mutex;
+
+            // CSL contexts passed to RM for key rotation. This is usually an
+            // array containing the CSL contexts associated with the channels in
+            // the pool. In the case of the WLC pool, the array also includes
+            // CSL contexts associated with LCIC channels.
+            UvmCslContext **csl_contexts;
+
+            // Number of elements in the CSL context array.
+            unsigned num_csl_contexts;
+
+            // Number of bytes encrypted, or decrypted, on the engine associated
+            // with the pool since the last key rotation. Only used during
+            // testing, to force key rotations after a certain encryption size,
+            // see UVM_CONF_COMPUTING_KEY_ROTATION_LOWER_THRESHOLD.
+            //
+            // Encryptions on a LCIC pool are accounted for in the paired WLC
+            // pool.
+            //
+            // TODO: Bug 4612912: these accounting variables can be removed once
+            // RM exposes an API to set the key rotation lower threshold.
+            atomic64_t encrypted;
+            atomic64_t decrypted;
+        } key_rotation;
+
+    } conf_computing;
 } uvm_channel_pool_t;

 struct uvm_channel_struct
@@ -322,43 +366,14 @@ struct uvm_channel_struct
        // work launches to match the order of push end-s that triggered them.
        volatile NvU32 gpu_put;

-        // Static pushbuffer for channels with static schedule (WLC/LCIC)
-        uvm_rm_mem_t *static_pb_protected_vidmem;
-
-        // Static pushbuffer staging buffer for WLC
-        uvm_rm_mem_t *static_pb_unprotected_sysmem;
-        void *static_pb_unprotected_sysmem_cpu;
-        void *static_pb_unprotected_sysmem_auth_tag_cpu;
-
-        // The above static locations are required by the WLC (and LCIC)
-        // schedule. Protected sysmem location completes WLC's independence
-        // from the pushbuffer allocator.
+        // Protected sysmem location makes WLC independent from the pushbuffer
+        // allocator. Unprotected sysmem and protected vidmem counterparts
+        // are allocated from the channel pool (sysmem, vidmem).
        void *static_pb_protected_sysmem;

-        // Static tracking semaphore notifier values
-        // Because of LCIC's fixed schedule, the secure semaphore release
-        // mechanism uses two additional static locations for incrementing the
-        // notifier values. See:
-        // . channel_semaphore_secure_release()
-        // . setup_lcic_schedule()
-        // . internal_channel_submit_work_wlc()
-        uvm_rm_mem_t *static_notifier_unprotected_sysmem;
-        NvU32 *static_notifier_entry_unprotected_sysmem_cpu;
-        NvU32 *static_notifier_exit_unprotected_sysmem_cpu;
-        uvm_gpu_address_t static_notifier_entry_unprotected_sysmem_gpu_va;
-        uvm_gpu_address_t static_notifier_exit_unprotected_sysmem_gpu_va;
-
-        // Explicit location for push launch tag used by WLC.
-        // Encryption auth tags have to be located in unprotected sysmem.
-        void *launch_auth_tag_cpu;
-        NvU64 launch_auth_tag_gpu_va;
-
        // Used to decrypt the push back to protected sysmem.
        // This happens when profilers register callbacks for migration data.
        uvm_push_crypto_bundle_t *push_crypto_bundles;
-
-        // Accompanying authentication tags for the crypto bundles
-        uvm_rm_mem_t *push_crypto_bundle_auth_tags;
    } conf_computing;

    // RM channel information
@@ -418,7 +433,7 @@ struct uvm_channel_manager_struct
    unsigned num_channel_pools;

    // Mask containing the indexes of the usable Copy Engines. Each usable CE
-    // has at least one pool associated with it.
+    // has at least one pool of type UVM_CHANNEL_POOL_TYPE_CE associated with it
    DECLARE_BITMAP(ce_mask, UVM_COPY_ENGINE_COUNT_MAX);

    struct
@@ -451,6 +466,16 @@ struct uvm_channel_manager_struct
        UVM_BUFFER_LOCATION gpput_loc;
        UVM_BUFFER_LOCATION pushbuffer_loc;
    } conf;
+
+    struct
+    {
+        // Flag indicating that the WLC/LCIC mechanism is ready/setup; should
+        // only be false during (de)initialization.
+        bool wlc_ready;
+
+        // True indicates that key rotation is enabled (UVM-wise).
+        bool key_rotation_enabled;
+    } conf_computing;
 };

 // Create a channel manager for the GPU
@@ -501,6 +526,12 @@ uvm_channel_t *uvm_channel_lcic_get_paired_wlc(uvm_channel_t *lcic_channel);

 uvm_channel_t *uvm_channel_wlc_get_paired_lcic(uvm_channel_t *wlc_channel);

+NvU64 uvm_channel_get_static_pb_protected_vidmem_gpu_va(uvm_channel_t *channel);
+
+NvU64 uvm_channel_get_static_pb_unprotected_sysmem_gpu_va(uvm_channel_t *channel);
+
+char* uvm_channel_get_static_pb_unprotected_sysmem_cpu(uvm_channel_t *channel);
+
 static bool uvm_channel_pool_is_proxy(uvm_channel_pool_t *pool)
 {
    UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
@@ -532,6 +563,17 @@ static uvm_channel_type_t uvm_channel_proxy_channel_type(void)
    return UVM_CHANNEL_TYPE_MEMOPS;
 }

+// Force key rotation in the engine associated with the given channel pool.
+// Rotation may still not happen if RM cannot acquire the necessary locks (in
+// which case the function returns NV_ERR_STATE_IN_USE).
+//
+// This function should be only invoked in pools in which key rotation is
+// enabled.
+NV_STATUS uvm_channel_pool_rotate_key(uvm_channel_pool_t *pool);
+
+// Retrieve the current encryption key version associated with the channel pool.
+NvU32 uvm_channel_pool_key_version(uvm_channel_pool_t *pool);
+
 // Privileged channels support all the Host and engine methods, while
 // non-privileged channels don't support privileged methods.
 //
@@ -579,12 +621,9 @@ NvU32 uvm_channel_manager_update_progress(uvm_channel_manager_t *channel_manager
 // beginning.
 NV_STATUS uvm_channel_manager_wait(uvm_channel_manager_t *manager);

-// Check if WLC/LCIC mechanism is ready/setup
-// Should only return false during initialization
 static bool uvm_channel_manager_is_wlc_ready(uvm_channel_manager_t *manager)
 {
-    return (manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_WLC] != NULL) &&
-           (manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_LCIC] != NULL);
+    return manager->conf_computing.wlc_ready;
 }
 // Get the GPU VA of semaphore_channel's tracking semaphore within the VA space
 // associated with access_channel.
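The header changes above add a per-pool key version (`key_rotation.version`) and an accessor, `uvm_channel_pool_key_version()`. The reason a caller saves the version at encryption time can be shown with a toy model. This is only a sketch under loud assumptions: `toy_pool_t`, `toy_rotate`, `toy_key_version`, and `toy_xor` are hypothetical names, the "cipher" is a single-byte XOR used purely for illustration, and none of it reflects how UVM or RM actually store or apply keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_VERSIONS 16

/* Hypothetical pool: one toy key per version, plus the current version. */
typedef struct {
    uint32_t version;
    uint8_t keys[TOY_MAX_VERSIONS];
} toy_pool_t;

static void toy_pool_init(toy_pool_t *pool)
{
    size_t i;

    pool->version = 0;
    for (i = 0; i < TOY_MAX_VERSIONS; i++)
        pool->keys[i] = (uint8_t)(0x5a + 7 * i); /* distinct toy keys */
}

/* Advance to the next key version. Returns 0 on success, -1 if exhausted. */
static int toy_rotate(toy_pool_t *pool)
{
    if (pool->version + 1 >= TOY_MAX_VERSIONS)
        return -1;

    pool->version++;
    return 0;
}

static uint32_t toy_key_version(const toy_pool_t *pool)
{
    return pool->version;
}

/* XOR with the key of an explicit version. The same call both "encrypts" and
 * "decrypts" in this toy model. */
static void toy_xor(const toy_pool_t *pool,
                    uint32_t key_version,
                    uint8_t *dst,
                    const uint8_t *src,
                    size_t size)
{
    size_t i;

    assert(key_version < TOY_MAX_VERSIONS);

    for (i = 0; i < size; i++)
        dst[i] = src[i] ^ pool->keys[key_version];
}
```

In this model, a buffer encrypted under the version returned by `toy_key_version()` decrypts correctly after any number of `toy_rotate()` calls only if the saved version, not the current one, is passed back to `toy_xor()`.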
--- a/kernel-open/nvidia-uvm/uvm_channel_test.c
+++ b/kernel-open/nvidia-uvm/uvm_channel_test.c
@@ -340,9 +340,9 @@ static NV_STATUS uvm_test_iommu_rc_for_gpu(uvm_gpu_t *gpu)
    if (!domain || !iommu_is_dma_domain(domain))
        return NV_OK;

-    // Only run if ATS is enabled. Otherwise the CE doesn't get response on
-    // writing to unmapped location.
-    if (!g_uvm_global.ats.enabled)
+    // Only run if ATS is enabled with 64kB base page.
+    // Otherwise the CE doesn't get response on writing to unmapped location.
+    if (!g_uvm_global.ats.enabled || PAGE_SIZE != UVM_PAGE_SIZE_64K)
        return NV_OK;

    status = uvm_mem_alloc_sysmem_and_map_cpu_kernel(data_size, NULL, &sysmem);
@@ -691,12 +691,16 @@ static NV_STATUS stress_test_all_gpus_in_va(uvm_va_space_t *va_space,
            if (uvm_test_rng_range_32(&rng, 0, 1) == 0) {
                NvU32 random_stream_index = uvm_test_rng_range_32(&rng, 0, num_streams - 1);
                uvm_test_stream_t *random_stream = &streams[random_stream_index];
-                uvm_push_acquire_tracker(&stream->push, &random_stream->tracker);
-                snapshot_counter(&stream->push,
-                                 random_stream->counter_mem,
-                                 stream->other_stream_counter_snapshots_mem,
-                                 i,
-                                 random_stream->queued_counter_repeat);
+
+                if ((random_stream->push.gpu == gpu) || uvm_push_allow_dependencies_across_gpus()) {
+                    uvm_push_acquire_tracker(&stream->push, &random_stream->tracker);
+
+                    snapshot_counter(&stream->push,
+                                     random_stream->counter_mem,
+                                     stream->other_stream_counter_snapshots_mem,
+                                     i,
+                                     random_stream->queued_counter_repeat);
+                }
            }

            uvm_push_end(&stream->push);
@@ -789,14 +793,11 @@ done:
 // This test verifies that concurrent pushes using the same channel pool
 // select different channels, when the Confidential Computing feature is
 // enabled.
-NV_STATUS test_conf_computing_channel_selection(uvm_va_space_t *va_space)
+static NV_STATUS test_conf_computing_channel_selection(uvm_va_space_t *va_space)
 {
    NV_STATUS status = NV_OK;
-    uvm_channel_pool_t *pool;
-    uvm_push_t *pushes;
-    uvm_gpu_t *gpu;
-    NvU32 i;
-    NvU32 num_pushes;
+    uvm_push_t *pushes = NULL;
+    uvm_gpu_t *gpu = NULL;

    if (!g_uvm_global.conf_computing_enabled)
        return NV_OK;
@@ -806,9 +807,19 @@ NV_STATUS test_conf_computing_channel_selection(uvm_va_space_t *va_space)
    for_each_va_space_gpu(gpu, va_space) {
        uvm_channel_type_t channel_type;

+        // Key rotation is disabled because this test relies on nested pushes,
+        // which is illegal. If any push other than the first one triggers key
+        // rotation, the test won't complete. This is because key rotation
+        // depends on waiting for ongoing pushes to end, which doesn't happen
+        // if those pushes are ended after the current one begins.
+        uvm_conf_computing_disable_key_rotation(gpu);
+
        for (channel_type = 0; channel_type < UVM_CHANNEL_TYPE_COUNT; channel_type++) {
-            pool = gpu->channel_manager->pool_to_use.default_for_type[channel_type];
-            TEST_CHECK_RET(pool != NULL);
+            NvU32 i;
+            NvU32 num_pushes;
+            uvm_channel_pool_t *pool = gpu->channel_manager->pool_to_use.default_for_type[channel_type];
+
+            TEST_CHECK_GOTO(pool != NULL, error);

            // Skip LCIC channels as those can't accept any pushes
            if (uvm_channel_pool_is_lcic(pool))
@@ -820,7 +831,7 @@ NV_STATUS test_conf_computing_channel_selection(uvm_va_space_t *va_space)
            num_pushes = min(pool->num_channels, (NvU32)UVM_PUSH_MAX_CONCURRENT_PUSHES);

            pushes = uvm_kvmalloc_zero(sizeof(*pushes) * num_pushes);
-            TEST_CHECK_RET(pushes != NULL);
+            TEST_CHECK_GOTO(pushes != NULL, error);

            for (i = 0; i < num_pushes; i++) {
                uvm_push_t *push = &pushes[i];
@@ -837,19 +848,25 @@ NV_STATUS test_conf_computing_channel_selection(uvm_va_space_t *va_space)

            uvm_kvfree(pushes);
        }
+
+        uvm_conf_computing_enable_key_rotation(gpu);
    }

    uvm_thread_context_lock_enable_tracking();

    return status;
+
 error:
+    if (gpu != NULL)
+        uvm_conf_computing_enable_key_rotation(gpu);
+
    uvm_thread_context_lock_enable_tracking();
    uvm_kvfree(pushes);

    return status;
 }

-NV_STATUS test_channel_iv_rotation(uvm_va_space_t *va_space)
+static NV_STATUS test_channel_iv_rotation(uvm_va_space_t *va_space)
 {
    uvm_gpu_t *gpu;

@@ -944,7 +961,319 @@ release:
    return NV_OK;
 }

-NV_STATUS test_write_ctrl_gpfifo_noop(uvm_va_space_t *va_space)
+static NV_STATUS force_key_rotations(uvm_channel_pool_t *pool, unsigned num_rotations)
+{
+    unsigned num_tries;
+    unsigned max_num_tries = 20;
+    unsigned num_rotations_completed = 0;
+
+    if (num_rotations == 0)
+        return NV_OK;
+
+    // The number of accepted rotations is kept low, so failed rotation
+    // invocations due to RM not acquiring the necessary locks (which imply a
+    // sleep in the test) do not balloon the test execution time.
+    UVM_ASSERT(num_rotations <= 10);
+
+    for (num_tries = 0; (num_tries < max_num_tries) && (num_rotations_completed < num_rotations); num_tries++) {
+        // Force key rotation, irrespective of encryption usage.
+        NV_STATUS status = uvm_channel_pool_rotate_key(pool);
+
+        // Key rotation may not be able to complete due to RM failing to acquire
+        // the necessary locks. Detect the situation, sleep for a bit, and then
+        // try again.
+        //
+        // The maximum time spent sleeping in a single rotation call is
+        // (max_num_tries * max_sleep_us)
+        if (status == NV_ERR_STATE_IN_USE) {
+            NvU32 min_sleep_us = 1000;
+            NvU32 max_sleep_us = 10000;
+
+            usleep_range(min_sleep_us, max_sleep_us);
+            continue;
+        }
+
+        TEST_NV_CHECK_RET(status);
+
+        num_rotations_completed++;
+    }
+
+    // If not a single key rotation occurred, the dependent tests still pass,
+    // but there is not much value to them. Instead, return an error so the
+    // maximum number of tries, or the maximum sleep time, are adjusted to
+    // ensure that at least one rotation completes.
+    if (num_rotations_completed > 0)
+        return NV_OK;
+    else
+        return NV_ERR_STATE_IN_USE;
+}
+
+static NV_STATUS force_key_rotation(uvm_channel_pool_t *pool)
+{
+    return force_key_rotations(pool, 1);
+}
+
+// Test key rotation in all pools. This is useful because key rotation may not
+// happen otherwise on certain engines during UVM test execution. For example,
+// if the MEMOPS channel type is mapped to a CE not shared with any other
+// channel type, then the only encryption taking place in the engine is due to
+// semaphore releases (4 bytes each). This small encryption size makes it
+// unlikely to exceed even small rotation thresholds.
+static NV_STATUS test_channel_key_rotation_basic(uvm_gpu_t *gpu)
+{
+    uvm_channel_pool_t *pool;
+
+    uvm_for_each_pool(pool, gpu->channel_manager) {
+        if (!uvm_conf_computing_is_key_rotation_enabled_in_pool(pool))
+            continue;
+
+        TEST_NV_CHECK_RET(force_key_rotation(pool));
+    }
+
+    return NV_OK;
+}
+
+// Interleave GPU encryptions and decryptions, and their CPU counterparts, with
+// key rotations.
+static NV_STATUS test_channel_key_rotation_interleave(uvm_gpu_t *gpu)
+{
+    int i;
+    uvm_channel_pool_t *gpu_to_cpu_pool;
+    uvm_channel_pool_t *cpu_to_gpu_pool;
+    NV_STATUS status = NV_OK;
+    size_t size = UVM_CONF_COMPUTING_DMA_BUFFER_SIZE;
+    void *initial_plain_cpu = NULL;
+    void *final_plain_cpu = NULL;
+    uvm_mem_t *plain_gpu = NULL;
+    uvm_gpu_address_t plain_gpu_address;
+
+    cpu_to_gpu_pool = gpu->channel_manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_CPU_TO_GPU];
+    TEST_CHECK_RET(uvm_conf_computing_is_key_rotation_enabled_in_pool(cpu_to_gpu_pool));
+
+    gpu_to_cpu_pool = gpu->channel_manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_GPU_TO_CPU];
+    TEST_CHECK_RET(uvm_conf_computing_is_key_rotation_enabled_in_pool(gpu_to_cpu_pool));
+
+    initial_plain_cpu = uvm_kvmalloc_zero(size);
+    if (initial_plain_cpu == NULL) {
+        status = NV_ERR_NO_MEMORY;
+        goto out;
+    }
+
+    final_plain_cpu = uvm_kvmalloc_zero(size);
+    if (final_plain_cpu == NULL) {
+        status = NV_ERR_NO_MEMORY;
+        goto out;
+    }
+
+    TEST_NV_CHECK_GOTO(uvm_mem_alloc_vidmem(size, gpu, &plain_gpu), out);
+    TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_kernel(plain_gpu, gpu), out);
+    plain_gpu_address = uvm_mem_gpu_address_virtual_kernel(plain_gpu, gpu);
+
+    memset(initial_plain_cpu, 1, size);
+
+    for (i = 0; i < 5; i++) {
+        TEST_NV_CHECK_GOTO(force_key_rotation(gpu_to_cpu_pool), out);
+        TEST_NV_CHECK_GOTO(force_key_rotation(cpu_to_gpu_pool), out);
+
+        TEST_NV_CHECK_GOTO(uvm_conf_computing_util_memcopy_cpu_to_gpu(gpu,
+                                                                      plain_gpu_address,
+                                                                      initial_plain_cpu,
+                                                                      size,
+                                                                      NULL,
+                                                                      "CPU > GPU"),
+                           out);
+
+        TEST_NV_CHECK_GOTO(force_key_rotation(gpu_to_cpu_pool), out);
+        TEST_NV_CHECK_GOTO(force_key_rotation(cpu_to_gpu_pool), out);
+
+        TEST_NV_CHECK_GOTO(uvm_conf_computing_util_memcopy_gpu_to_cpu(gpu,
+                                                                      final_plain_cpu,
+                                                                      plain_gpu_address,
+                                                                      size,
+                                                                      NULL,
+                                                                      "GPU > CPU"),
+                           out);
+
+        TEST_CHECK_GOTO(!memcmp(initial_plain_cpu, final_plain_cpu, size), out);
+
+        memset(final_plain_cpu, 0, size);
+    }
+
+out:
+    uvm_mem_free(plain_gpu);
+    uvm_kvfree(final_plain_cpu);
+    uvm_kvfree(initial_plain_cpu);
+
+    return status;
+}
+
+static NV_STATUS memset_vidmem(uvm_mem_t *mem, NvU8 val)
+{
+    uvm_push_t push;
+    uvm_gpu_address_t gpu_address;
+    uvm_gpu_t *gpu = mem->backing_gpu;
+
+    UVM_ASSERT(uvm_mem_is_vidmem(mem));
+
+    TEST_NV_CHECK_RET(uvm_push_begin(gpu->channel_manager, UVM_CHANNEL_TYPE_GPU_INTERNAL, &push, "zero vidmem"));
+
+    gpu_address = uvm_mem_gpu_address_virtual_kernel(mem, gpu);
+    gpu->parent->ce_hal->memset_1(&push, gpu_address, val, mem->size);
+
+    TEST_NV_CHECK_RET(uvm_push_end_and_wait(&push));
+
+    return NV_OK;
+}
+
+// Custom version of uvm_conf_computing_util_memcopy_gpu_to_cpu that allows
+// testing to insert key rotations between the push end and the CPU
+// decryption.
+static NV_STATUS encrypted_memcopy_gpu_to_cpu(uvm_gpu_t *gpu,
+                                              void *dst_plain,
+                                              uvm_gpu_address_t src_gpu_address,
+                                              size_t size,
+                                              unsigned num_rotations_to_insert)
+{
+    NV_STATUS status;
+    uvm_push_t push;
+    uvm_conf_computing_dma_buffer_t *dma_buffer;
+    uvm_gpu_address_t dst_gpu_address, auth_tag_gpu_address;
+    void *src_cipher, *auth_tag;
+    uvm_channel_t *channel;
+
+    UVM_ASSERT(g_uvm_global.conf_computing_enabled);
+    UVM_ASSERT(size <= UVM_CONF_COMPUTING_DMA_BUFFER_SIZE);
+
+    status = uvm_conf_computing_dma_buffer_alloc(&gpu->conf_computing.dma_buffer_pool, &dma_buffer, NULL);
+    if (status != NV_OK)
+        return status;
+
+    status = uvm_push_begin(gpu->channel_manager, UVM_CHANNEL_TYPE_GPU_TO_CPU, &push, "Small GPU > CPU encryption");
+    if (status != NV_OK)
+        goto out;
+
+    channel = push.channel;
+    uvm_conf_computing_log_gpu_encryption(channel, size, dma_buffer->decrypt_iv);
+    dma_buffer->key_version[0] = uvm_channel_pool_key_version(channel->pool);
+
+    dst_gpu_address = uvm_mem_gpu_address_virtual_kernel(dma_buffer->alloc, gpu);
+    auth_tag_gpu_address = uvm_mem_gpu_address_virtual_kernel(dma_buffer->auth_tag, gpu);
+    gpu->parent->ce_hal->encrypt(&push, dst_gpu_address, src_gpu_address, size, auth_tag_gpu_address);
+
+    status = uvm_push_end_and_wait(&push);
+    if (status != NV_OK)
+        goto out;
+
+    TEST_NV_CHECK_GOTO(force_key_rotations(channel->pool, num_rotations_to_insert), out);
+
+    // If num_rotations_to_insert is not zero, the current encryption key will
+    // be different from the one used during CE encryption.
+
+    src_cipher = uvm_mem_get_cpu_addr_kernel(dma_buffer->alloc);
+    auth_tag = uvm_mem_get_cpu_addr_kernel(dma_buffer->auth_tag);
+    status = uvm_conf_computing_cpu_decrypt(channel,
+                                            dst_plain,
+                                            src_cipher,
+                                            dma_buffer->decrypt_iv,
+                                            dma_buffer->key_version[0],
+                                            size,
+                                            auth_tag);
+
+ out:
+    uvm_conf_computing_dma_buffer_free(&gpu->conf_computing.dma_buffer_pool, dma_buffer, NULL);
+    return status;
+}
+
+static NV_STATUS test_channel_key_rotation_cpu_decryption(uvm_gpu_t *gpu,
+                                                          unsigned num_repetitions,
+                                                          unsigned num_rotations_to_insert)
+{
+    unsigned i;
+    uvm_channel_pool_t *gpu_to_cpu_pool;
+    NV_STATUS status = NV_OK;
+    size_t size = UVM_CONF_COMPUTING_DMA_BUFFER_SIZE;
+    NvU8 *plain_cpu = NULL;
+    uvm_mem_t *plain_gpu = NULL;
+    uvm_gpu_address_t plain_gpu_address;
+
+    if (!uvm_conf_computing_is_key_rotation_enabled(gpu))
+        return NV_OK;
+
+    gpu_to_cpu_pool = gpu->channel_manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_GPU_TO_CPU];
+    TEST_CHECK_RET(uvm_conf_computing_is_key_rotation_enabled_in_pool(gpu_to_cpu_pool));
+
+    plain_cpu = (NvU8 *) uvm_kvmalloc_zero(size);
+    if (plain_cpu == NULL) {
+        status = NV_ERR_NO_MEMORY;
+        goto out;
+    }
+
+    TEST_NV_CHECK_GOTO(uvm_mem_alloc_vidmem(size, gpu, &plain_gpu), out);
+    TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_kernel(plain_gpu, gpu), out);
+    TEST_NV_CHECK_GOTO(memset_vidmem(plain_gpu, 1), out);
+
+    plain_gpu_address = uvm_mem_gpu_address_virtual_kernel(plain_gpu, gpu);
+
+    for (i = 0; i < num_repetitions; i++) {
+        unsigned j;
+
+        TEST_NV_CHECK_GOTO(encrypted_memcopy_gpu_to_cpu(gpu,
+                                                        plain_cpu,
+                                                        plain_gpu_address,
+                                                        size,
+                                                        num_rotations_to_insert),
+                          out);
+
+        for (j = 0; j < size; j++)
+            TEST_CHECK_GOTO(plain_cpu[j] == 1, out);
+
+        memset(plain_cpu, 0, size);
+
+    }
+out:
+    uvm_mem_free(plain_gpu);
+    uvm_kvfree(plain_cpu);
+
+    return status;
+}
+
+// Test that CPU decryptions can use old keys i.e. previous versions of the keys
+// that are no longer the current key, due to key rotation. Given that SEC2
+// does not expose encryption capabilities, the "decrypt-after-rotation" problem
+// is exclusive to CE encryptions.
+static NV_STATUS test_channel_key_rotation_decrypt_after_key_rotation(uvm_gpu_t *gpu)
+{
+    // Instruct encrypted_memcopy_gpu_to_cpu to insert several key rotations
+    // between the GPU encryption, and the associated CPU decryption.
+    unsigned num_rotations_to_insert = 8;
+
+    TEST_NV_CHECK_RET(test_channel_key_rotation_cpu_decryption(gpu, 1, num_rotations_to_insert));
+
+    return NV_OK;
+}
+
+static NV_STATUS test_channel_key_rotation(uvm_va_space_t *va_space)
+{
+    uvm_gpu_t *gpu;
+
+    if (!g_uvm_global.conf_computing_enabled)
+        return NV_OK;
+
+    for_each_va_space_gpu(gpu, va_space) {
+        if (!uvm_conf_computing_is_key_rotation_enabled(gpu))
+            break;
+
+        TEST_NV_CHECK_RET(test_channel_key_rotation_basic(gpu));
+
+        TEST_NV_CHECK_RET(test_channel_key_rotation_interleave(gpu));
+
+        TEST_NV_CHECK_RET(test_channel_key_rotation_decrypt_after_key_rotation(gpu));
+    }
+
+    return NV_OK;
+}
+
+static NV_STATUS test_write_ctrl_gpfifo_noop(uvm_va_space_t *va_space)
 {
    uvm_gpu_t *gpu;

@@ -983,7 +1312,7 @@ NV_STATUS test_write_ctrl_gpfifo_noop(uvm_va_space_t *va_space)
    return NV_OK;
 }

-NV_STATUS test_write_ctrl_gpfifo_and_pushes(uvm_va_space_t *va_space)
+static NV_STATUS test_write_ctrl_gpfifo_and_pushes(uvm_va_space_t *va_space)
 {
    uvm_gpu_t *gpu;

@@ -1031,7 +1360,7 @@ NV_STATUS test_write_ctrl_gpfifo_and_pushes(uvm_va_space_t *va_space)
    return NV_OK;
 }

-NV_STATUS test_write_ctrl_gpfifo_tight(uvm_va_space_t *va_space)
+static NV_STATUS test_write_ctrl_gpfifo_tight(uvm_va_space_t *va_space)
 {
    NV_STATUS status = NV_OK;
    uvm_gpu_t *gpu;
@@ -1090,7 +1419,7 @@ NV_STATUS test_write_ctrl_gpfifo_tight(uvm_va_space_t *va_space)
        TEST_NV_CHECK_GOTO(uvm_channel_write_ctrl_gpfifo(channel, entry), error);

        // Release the semaphore.
-        UVM_WRITE_ONCE(*cpu_ptr, 1);
+        WRITE_ONCE(*cpu_ptr, 1);

        TEST_NV_CHECK_GOTO(uvm_push_wait(&push), error);

@@ -1199,6 +1528,10 @@ NV_STATUS uvm_test_channel_sanity(UVM_TEST_CHANNEL_SANITY_PARAMS *params, struct
    if (status != NV_OK)
        goto done;

+    status = test_channel_key_rotation(va_space);
+    if (status != NV_OK)
+        goto done;
+
    // The following tests have side effects, they reset the GPU's
    // channel_manager.
    status = test_channel_pushbuffer_extension_base(va_space);
@@ -1334,6 +1667,126 @@ done:
    return status;
 }

+static NV_STATUS channel_stress_key_rotation_cpu_encryption(uvm_gpu_t *gpu, UVM_TEST_CHANNEL_STRESS_PARAMS *params)
+{
+    int i;
+    uvm_channel_pool_t *cpu_to_gpu_pool;
+    NV_STATUS status = NV_OK;
+    size_t size = UVM_CONF_COMPUTING_DMA_BUFFER_SIZE;
+    void *initial_plain_cpu = NULL;
+    uvm_mem_t *plain_gpu = NULL;
+    uvm_gpu_address_t plain_gpu_address;
+
+    UVM_ASSERT(params->key_rotation_operation == UVM_TEST_CHANNEL_STRESS_KEY_ROTATION_OPERATION_CPU_TO_GPU);
+
+    cpu_to_gpu_pool = gpu->channel_manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_CPU_TO_GPU];
+    TEST_CHECK_RET(uvm_conf_computing_is_key_rotation_enabled_in_pool(cpu_to_gpu_pool));
+
+    initial_plain_cpu = uvm_kvmalloc_zero(size);
+    if (initial_plain_cpu == NULL) {
+        status = NV_ERR_NO_MEMORY;
+        goto out;
+    }
+
+    TEST_NV_CHECK_GOTO(uvm_mem_alloc_vidmem(size, gpu, &plain_gpu), out);
+    TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_kernel(plain_gpu, gpu), out);
+    plain_gpu_address = uvm_mem_gpu_address_virtual_kernel(plain_gpu, gpu);
+
+    memset(initial_plain_cpu, 1, size);
+
+    for (i = 0; i < params->iterations; i++) {
+        TEST_NV_CHECK_GOTO(uvm_conf_computing_util_memcopy_cpu_to_gpu(gpu,
+                                                                      plain_gpu_address,
+                                                                      initial_plain_cpu,
+                                                                      size,
+                                                                      NULL,
+                                                                      "CPU > GPU"),
+                           out);
+    }
+
+out:
+    uvm_mem_free(plain_gpu);
+    uvm_kvfree(initial_plain_cpu);
+
+    return status;
+}
+
+static NV_STATUS channel_stress_key_rotation_cpu_decryption(uvm_gpu_t *gpu, UVM_TEST_CHANNEL_STRESS_PARAMS *params)
+{
+    unsigned num_rotations_to_insert = 0;
+
+    UVM_ASSERT(params->key_rotation_operation == UVM_TEST_CHANNEL_STRESS_KEY_ROTATION_OPERATION_GPU_TO_CPU);
+
+    return test_channel_key_rotation_cpu_decryption(gpu, params->iterations, num_rotations_to_insert);
+}
+
+static NV_STATUS channel_stress_key_rotation_rotate(uvm_gpu_t *gpu, UVM_TEST_CHANNEL_STRESS_PARAMS *params)
+{
+    NvU32 i;
+
+    UVM_ASSERT(params->key_rotation_operation == UVM_TEST_CHANNEL_STRESS_KEY_ROTATION_OPERATION_ROTATE);
+
+    for (i = 0; i < params->iterations; ++i) {
+        NV_STATUS status;
+        uvm_channel_pool_t *pool;
+        uvm_channel_type_t type;
+
+        if ((i % 3) == 0)
+            type = UVM_CHANNEL_TYPE_CPU_TO_GPU;
+        else if ((i % 3) == 1)
+            type = UVM_CHANNEL_TYPE_GPU_TO_CPU;
+        else
+            type = UVM_CHANNEL_TYPE_WLC;
+
+        pool = gpu->channel_manager->pool_to_use.default_for_type[type];
+
+        if (!uvm_conf_computing_is_key_rotation_enabled_in_pool(pool))
+            return NV_ERR_INVALID_STATE;
+
+        status = force_key_rotation(pool);
+        if (status != NV_OK)
+            return status;
+    }
+
+    return NV_OK;
+}
+
+// The objective of this test is documented in the user-level function
+static NV_STATUS uvm_test_channel_stress_key_rotation(uvm_va_space_t *va_space, UVM_TEST_CHANNEL_STRESS_PARAMS *params)
+{
+    uvm_test_rng_t rng;
+    uvm_gpu_t *gpu;
+    NV_STATUS status = NV_OK;
+
+    if (!g_uvm_global.conf_computing_enabled)
+        return NV_OK;
+
+    uvm_test_rng_init(&rng, params->seed);
+
+    uvm_va_space_down_read(va_space);
+
+    // Key rotation should be enabled, or disabled, in all GPUs. Pick a random
+    // one.
+    gpu = random_va_space_gpu(&rng, va_space);
+
+    if (!uvm_conf_computing_is_key_rotation_enabled(gpu))
+        goto out;
+
+    if (params->key_rotation_operation == UVM_TEST_CHANNEL_STRESS_KEY_ROTATION_OPERATION_CPU_TO_GPU)
+        status = channel_stress_key_rotation_cpu_encryption(gpu, params);
+    else if (params->key_rotation_operation == UVM_TEST_CHANNEL_STRESS_KEY_ROTATION_OPERATION_GPU_TO_CPU)
+        status = channel_stress_key_rotation_cpu_decryption(gpu, params);
+    else if (params->key_rotation_operation == UVM_TEST_CHANNEL_STRESS_KEY_ROTATION_OPERATION_ROTATE)
+        status = channel_stress_key_rotation_rotate(gpu, params);
+    else
+        status = NV_ERR_INVALID_PARAMETER;
+
+out:
+    uvm_va_space_up_read(va_space);
+
+    return status;
+}
+
 NV_STATUS uvm_test_channel_stress(UVM_TEST_CHANNEL_STRESS_PARAMS *params, struct file *filp)
 {
    uvm_va_space_t *va_space = uvm_va_space_get(filp);
@@ -1345,6 +1798,8 @@ NV_STATUS uvm_test_channel_stress(UVM_TEST_CHANNEL_STRESS_PARAMS *params, struct
            return uvm_test_channel_stress_update_channels(va_space, params);
        case UVM_TEST_CHANNEL_STRESS_MODE_NOOP_PUSH:
            return uvm_test_channel_noop_push(va_space, params);
+        case UVM_TEST_CHANNEL_STRESS_MODE_KEY_ROTATION:
+            return uvm_test_channel_stress_key_rotation(va_space, params);
        default:
            return NV_ERR_INVALID_PARAMETER;
    }
--- a/kernel-open/nvidia-uvm/uvm_common.c
+++ b/kernel-open/nvidia-uvm/uvm_common.c
@@ -281,29 +281,6 @@ NV_STATUS uvm_spin_loop(uvm_spin_loop_t *spin)
    return NV_OK;
 }

-// This formats a GPU UUID, in a UVM-friendly way. That is, nearly the same as
-// what nvidia-smi reports.  It will always prefix the UUID with UVM-GPU so
-// that we know that we have a real, binary formatted UUID that will work in
-// the UVM APIs.
-//
-// It comes out like this:
-//
-//     UVM-GPU-d802726c-df8d-a3c3-ec53-48bdec201c27
-//
-//  This routine will always null-terminate the string for you. This is true
-//  even if the buffer was too small!
-//
-//  Return value is the number of non-null characters written.
-//
-// Note that if you were to let the NV2080_CTRL_CMD_GPU_GET_GID_INFO command
-// return it's default format, which is ascii, not binary, then you would get
-// this back:
-//
-//     GPU-d802726c-df8d-a3c3-ec53-48bdec201c27
-//
-//  ...which is actually a character string, and won't work for UVM API calls.
-//  So it's very important to be able to see the difference.
-//
 static char uvm_digit_to_hex(unsigned value)
 {
    if (value >= 10)
@@ -312,27 +289,19 @@ static char uvm_digit_to_hex(unsigned value)
        return value + '0';
 }

-int format_uuid_to_buffer(char *buffer, unsigned bufferLength, const NvProcessorUuid *pUuidStruct)
+void uvm_uuid_string(char *buffer, const NvProcessorUuid *pUuidStruct)
 {
-    char *str = buffer+8;
+    char *str = buffer;
    unsigned i;
    unsigned dashMask = 1 << 4 | 1 << 6 | 1 << 8 | 1 << 10;

-    if (bufferLength < (8 /*prefix*/+ 16 * 2 /*digits*/ + 4 * 1 /*dashes*/ + 1 /*null*/))
-        return *buffer = 0;
-
-    memcpy(buffer, "UVM-GPU-", 8);
-
    for (i = 0; i < 16; i++) {
        *str++ = uvm_digit_to_hex(pUuidStruct->uuid[i] >> 4);
        *str++ = uvm_digit_to_hex(pUuidStruct->uuid[i] & 0xF);

-        if (dashMask & (1 << (i+1)))
+        if (dashMask & (1 << (i + 1)))
            *str++ = '-';
    }

    *str = 0;
-
-    return (int)(str-buffer);
 }
-
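The reworked `uvm_uuid_string()` drops the `UVM-GPU-` prefix and the length check, so the caller must supply a buffer of at least `UVM_UUID_STRING_LENGTH` bytes. A minimal user-space sketch of the same dash-mask formatting follows. The `sketch_*` names are hypothetical stand-ins, a plain `uint8_t[16]` replaces `NvProcessorUuid`, and lowercase hex output is assumed from the sample UUID in the removed comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for UVM_UUID_STRING_LENGTH: 36 characters plus NUL.
#define SKETCH_UUID_STRING_LENGTH ((8 + 1) + 3 * (4 + 1) + 12 + 1)

// Assumption: lowercase hex digits, as in the sample UUID from the old comment.
char sketch_digit_to_hex(unsigned value)
{
    return (char)(value >= 10 ? value - 10 + 'a' : value + '0');
}

// buffer must hold at least SKETCH_UUID_STRING_LENGTH bytes.
void sketch_uuid_string(char *buffer, const uint8_t uuid[16])
{
    char *str = buffer;
    // Bit n set means "emit a dash after the n-th byte": bytes 4, 6, 8 and 10.
    unsigned dash_mask = 1u << 4 | 1u << 6 | 1u << 8 | 1u << 10;
    unsigned i;

    assert(buffer != NULL);

    for (i = 0; i < 16; i++) {
        *str++ = sketch_digit_to_hex(uuid[i] >> 4);
        *str++ = sketch_digit_to_hex(uuid[i] & 0xF);

        if (dash_mask & (1u << (i + 1)))
            *str++ = '-';
    }

    *str = 0;
}
```

Because the function no longer returns a length or truncates, sizing the buffer with the length macro is the only protection against overflow.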
--- a/kernel-open/nvidia-uvm/uvm_common.h
+++ b/kernel-open/nvidia-uvm/uvm_common.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2013-2023 NVIDIA Corporation
+    Copyright (c) 2013-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -50,9 +50,12 @@ enum {
    NVIDIA_UVM_NUM_MINOR_DEVICES
 };

-#define UVM_GPU_UUID_TEXT_BUFFER_LENGTH (8+16*2+4+1)
+// UUID has the format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+#define UVM_UUID_STRING_LENGTH ((8 + 1) + 3 * (4 + 1) + 12 + 1)

-int format_uuid_to_buffer(char *buffer, unsigned bufferLength, const NvProcessorUuid *pGpuUuid);
+// Writes UVM_UUID_STRING_LENGTH characters into buffer, including the
+// terminating NUL character.
+void uvm_uuid_string(char *buffer, const NvProcessorUuid *uuid);

 #define UVM_PRINT_FUNC_PREFIX(func, prefix, fmt, ...) \
    func(prefix "%s:%u %s[pid:%d]" fmt,               \
@@ -98,27 +101,9 @@ bool uvm_debug_prints_enabled(void);
 #define UVM_INFO_PRINT(fmt, ...) \
    UVM_PRINT_FUNC_PREFIX_CHECK(printk, KERN_INFO NVIDIA_UVM_PRETTY_PRINTING_PREFIX, " " fmt, ##__VA_ARGS__)

-//
-// Please see the documentation of format_uuid_to_buffer, for details on what
-// this routine prints for you.
-//
-#define UVM_DBG_PRINT_UUID(msg, uuidPtr)                                \
-    do {                                                                \
-        char uuidBuffer[UVM_GPU_UUID_TEXT_BUFFER_LENGTH];               \
-        format_uuid_to_buffer(uuidBuffer, sizeof(uuidBuffer), uuidPtr); \
-        UVM_DBG_PRINT("%s: %s\n", msg, uuidBuffer);                     \
-    } while (0)
-
 #define UVM_ERR_PRINT_NV_STATUS(msg, rmStatus, ...)                        \
    UVM_ERR_PRINT("ERROR: %s : " msg "\n", nvstatusToString(rmStatus), ##__VA_ARGS__)

-#define UVM_ERR_PRINT_UUID(msg, uuidPtr, ...)                              \
-    do {                                                                   \
-        char uuidBuffer[UVM_GPU_UUID_TEXT_BUFFER_LENGTH];                  \
-        format_uuid_to_buffer(uuidBuffer, sizeof(uuidBuffer), uuidPtr);    \
-        UVM_ERR_PRINT("ERROR: %s : " msg "\n", uuidBuffer, ##__VA_ARGS__); \
-    } while (0)
-
 #define UVM_PANIC()             UVM_PRINT_FUNC(panic, "\n")
 #define UVM_PANIC_MSG(fmt, ...) UVM_PRINT_FUNC(panic, ": " fmt, ##__VA_ARGS__)

@@ -395,7 +380,7 @@ static inline void uvm_touch_page(struct page *page)
    UVM_ASSERT(page);

    mapping = (char *) kmap(page);
-    (void)UVM_READ_ONCE(*mapping);
+    (void)READ_ONCE(*mapping);
    kunmap(page);
 }

@@ -423,7 +408,9 @@ static void uvm_get_unaddressable_range(NvU32 num_va_bits, NvU64 *first, NvU64 *
    UVM_ASSERT(first);
    UVM_ASSERT(outer);

-    if (uvm_platform_uses_canonical_form_address()) {
+    // Maxwell GPUs (num_va_bits == 40b) do not support canonical form address
+    // even when plugged into platforms using it.
+    if (uvm_platform_uses_canonical_form_address() && num_va_bits > 40) {
        *first = 1ULL << (num_va_bits - 1);
        *outer = (NvU64)((NvS64)(1ULL << 63) >> (64 - num_va_bits));
    }
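As a worked example of the canonical-form branch above: with 48 VA bits the unaddressable hole runs from bit 47 up to the sign-extended upper half. The sketch below is a user-space illustration only. `sketch_canonical_hole` is a hypothetical name, the non-canonical branch is not modeled because it is not shown in this hunk, and the upper bound is computed with an unsigned shift that yields the same value as the arithmetic right shift used in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Returns true and fills [*first, *outer) when the canonical-form hole applies.
// Returns false for num_va_bits <= 40, mirroring the new "num_va_bits > 40" check.
bool sketch_canonical_hole(unsigned num_va_bits, uint64_t *first, uint64_t *outer)
{
    assert(first && outer);
    assert(num_va_bits < 64);

    if (num_va_bits <= 40)
        return false;

    // First address above the lower canonical half.
    *first = 1ULL << (num_va_bits - 1);

    // First address of the upper canonical half: all bits from
    // (num_va_bits - 1) upward are set.
    *outer = ~0ULL << (num_va_bits - 1);

    return true;
}
```

For `num_va_bits == 48` this gives `first == 0x0000800000000000` and `outer == 0xFFFF800000000000`.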
--- a/kernel-open/nvidia-uvm/uvm_conf_computing.c
+++ b/kernel-open/nvidia-uvm/uvm_conf_computing.c
@@ -33,6 +33,15 @@
 #include "nv_uvm_interface.h"
 #include "uvm_va_block.h"

+// Amount of encrypted data on a given engine that triggers key rotation. This
+// is a UVM internal threshold, different from that of RM, and used only during
+// testing.
+//
+// Key rotation is triggered when the total encryption size, or the total
+// decryption size (whichever comes first), exceeds this lower threshold on the
+// engine.
+#define UVM_CONF_COMPUTING_KEY_ROTATION_LOWER_THRESHOLD (UVM_SIZE_1MB * 8)
+
 // The maximum number of secure operations per push is:
 // UVM_MAX_PUSH_SIZE / min(CE encryption size, CE decryption size)
 // + 1 (tracking semaphore) =  128 * 1024 / 56 + 1 = 2342
@@ -352,6 +361,19 @@ error:
    return status;
 }

+// The production key rotation defaults are such that key rotations rarely
+// happen. During UVM testing more frequent rotations are triggered by relying
+// on internal encryption usage accounting. When key rotations are triggered by
+// UVM, the driver does not rely on channel key rotation notifiers.
+//
+// TODO: Bug 4612912: UVM should be able to programmatically set the rotation
+// lower threshold. This function, and all the metadata associated with it
+// (per-pool encryption accounting, for example) can be removed at that point.
+static bool key_rotation_is_notifier_driven(void)
+{
+    return !uvm_enable_builtin_tests;
+}
+
 NV_STATUS uvm_conf_computing_gpu_init(uvm_gpu_t *gpu)
 {
    NV_STATUS status;
@@ -394,17 +416,35 @@ void uvm_conf_computing_gpu_deinit(uvm_gpu_t *gpu)
    conf_computing_dma_buffer_pool_deinit(&gpu->conf_computing.dma_buffer_pool);
 }

-void uvm_conf_computing_log_gpu_encryption(uvm_channel_t *channel, UvmCslIv *iv)
+void uvm_conf_computing_log_gpu_encryption(uvm_channel_t *channel, size_t size, UvmCslIv *iv)
 {
    NV_STATUS status;
+    uvm_channel_pool_t *pool;
+
+    if (uvm_channel_is_lcic(channel))
+        pool = uvm_channel_lcic_get_paired_wlc(channel)->pool;
+    else
+        pool = channel->pool;

    uvm_mutex_lock(&channel->csl.ctx_lock);
+
+    if (uvm_conf_computing_is_key_rotation_enabled_in_pool(pool)) {
+        status = nvUvmInterfaceCslLogEncryption(&channel->csl.ctx, UVM_CSL_OPERATION_DECRYPT, size);
+
+        // Informing RM of an encryption/decryption should not fail
+        UVM_ASSERT(status == NV_OK);
+
+        if (!key_rotation_is_notifier_driven())
+            atomic64_add(size, &pool->conf_computing.key_rotation.encrypted);
+    }
+
    status = nvUvmInterfaceCslIncrementIv(&channel->csl.ctx, UVM_CSL_OPERATION_DECRYPT, 1, iv);
-    uvm_mutex_unlock(&channel->csl.ctx_lock);

    // IV rotation is done preemptively as needed, so the above
    // call cannot return failure.
    UVM_ASSERT(status == NV_OK);
+
+    uvm_mutex_unlock(&channel->csl.ctx_lock);
 }

 void uvm_conf_computing_acquire_encryption_iv(uvm_channel_t *channel, UvmCslIv *iv)
@@ -428,27 +468,46 @@ void uvm_conf_computing_cpu_encrypt(uvm_channel_t *channel,
                                    void *auth_tag_buffer)
 {
    NV_STATUS status;
+    uvm_channel_pool_t *pool;

    UVM_ASSERT(size);

+    if (uvm_channel_is_lcic(channel))
+        pool = uvm_channel_lcic_get_paired_wlc(channel)->pool;
+    else
+        pool = channel->pool;
+
    uvm_mutex_lock(&channel->csl.ctx_lock);
+
    status = nvUvmInterfaceCslEncrypt(&channel->csl.ctx,
                                      size,
                                      (NvU8 const *) src_plain,
                                      encrypt_iv,
                                      (NvU8 *) dst_cipher,
                                      (NvU8 *) auth_tag_buffer);
-    uvm_mutex_unlock(&channel->csl.ctx_lock);

    // IV rotation is done preemptively as needed, so the above
    // call cannot return failure.
    UVM_ASSERT(status == NV_OK);
+
+    if (uvm_conf_computing_is_key_rotation_enabled_in_pool(pool)) {
+        status = nvUvmInterfaceCslLogEncryption(&channel->csl.ctx, UVM_CSL_OPERATION_ENCRYPT, size);
+
+        // Informing RM of an encryption/decryption should not fail
+        UVM_ASSERT(status == NV_OK);
+
+        if (!key_rotation_is_notifier_driven())
+            atomic64_add(size, &pool->conf_computing.key_rotation.decrypted);
+    }
+
+    uvm_mutex_unlock(&channel->csl.ctx_lock);
 }

 NV_STATUS uvm_conf_computing_cpu_decrypt(uvm_channel_t *channel,
                                         void *dst_plain,
                                         const void *src_cipher,
                                         const UvmCslIv *src_iv,
+                                         NvU32 key_version,
                                         size_t size,
                                         const void *auth_tag_buffer)
 {
@@ -469,10 +528,19 @@ NV_STATUS uvm_conf_computing_cpu_decrypt(uvm_channel_t *channel,
                                      size,
                                      (const NvU8 *) src_cipher,
                                      src_iv,
+                                      key_version,
                                      (NvU8 *) dst_plain,
                                      NULL,
                                      0,
                                      (const NvU8 *) auth_tag_buffer);
+
+    if (status != NV_OK) {
+        UVM_ERR_PRINT("nvUvmInterfaceCslDecrypt() failed: %s, channel %s, GPU %s\n",
+                      nvstatusToString(status),
+                      channel->name,
+                      uvm_gpu_name(uvm_channel_get_gpu(channel)));
+    }
+
    uvm_mutex_unlock(&channel->csl.ctx_lock);

    return status;
@@ -485,6 +553,8 @@ NV_STATUS uvm_conf_computing_fault_decrypt(uvm_parent_gpu_t *parent_gpu,
                                           NvU8 valid)
 {
    NV_STATUS status;
+    NvU32 fault_entry_size = parent_gpu->fault_buffer_hal->entry_size(parent_gpu);
+    UvmCslContext *csl_context = &parent_gpu->fault_buffer_info.rm_info.replayable.cslCtx;

    // There is no dedicated lock for the CSL context associated with replayable
    // faults. The mutual exclusion required by the RM CSL API is enforced by
@@ -494,36 +564,48 @@ NV_STATUS uvm_conf_computing_fault_decrypt(uvm_parent_gpu_t *parent_gpu,

    UVM_ASSERT(g_uvm_global.conf_computing_enabled);

-    status = nvUvmInterfaceCslDecrypt(&parent_gpu->fault_buffer_info.rm_info.replayable.cslCtx,
-                                      parent_gpu->fault_buffer_hal->entry_size(parent_gpu),
+    status = nvUvmInterfaceCslLogEncryption(csl_context, UVM_CSL_OPERATION_DECRYPT, fault_entry_size);
+
+    // Informing RM of an encryption/decryption should not fail
+    UVM_ASSERT(status == NV_OK);
+
+    status = nvUvmInterfaceCslDecrypt(csl_context,
+                                      fault_entry_size,
                                      (const NvU8 *) src_cipher,
                                      NULL,
+                                      NV_U32_MAX,
                                      (NvU8 *) dst_plain,
                                      &valid,
                                      sizeof(valid),
                                      (const NvU8 *) auth_tag_buffer);

-    if (status != NV_OK)
+    if (status != NV_OK) {
        UVM_ERR_PRINT("nvUvmInterfaceCslDecrypt() failed: %s, GPU %s\n",
                      nvstatusToString(status),
                      uvm_parent_gpu_name(parent_gpu));

+    }
+
    return status;
 }

-void uvm_conf_computing_fault_increment_decrypt_iv(uvm_parent_gpu_t *parent_gpu, NvU64 increment)
+void uvm_conf_computing_fault_increment_decrypt_iv(uvm_parent_gpu_t *parent_gpu)
 {
    NV_STATUS status;
+    NvU32 fault_entry_size = parent_gpu->fault_buffer_hal->entry_size(parent_gpu);
+    UvmCslContext *csl_context = &parent_gpu->fault_buffer_info.rm_info.replayable.cslCtx;

    // See comment in uvm_conf_computing_fault_decrypt
    UVM_ASSERT(uvm_sem_is_locked(&parent_gpu->isr.replayable_faults.service_lock));

    UVM_ASSERT(g_uvm_global.conf_computing_enabled);

-    status = nvUvmInterfaceCslIncrementIv(&parent_gpu->fault_buffer_info.rm_info.replayable.cslCtx,
-                                          UVM_CSL_OPERATION_DECRYPT,
-                                          increment,
-                                          NULL);
+    status = nvUvmInterfaceCslLogEncryption(csl_context, UVM_CSL_OPERATION_DECRYPT, fault_entry_size);
+
+    // Informing RM of an encryption/decryption should not fail
+    UVM_ASSERT(status == NV_OK);
+
+    status = nvUvmInterfaceCslIncrementIv(csl_context, UVM_CSL_OPERATION_DECRYPT, 1, NULL);

    UVM_ASSERT(status == NV_OK);
 }
@@ -625,3 +707,231 @@ NV_STATUS uvm_conf_computing_maybe_rotate_channel_ivs_retry_busy(uvm_channel_t *
 {
    return uvm_conf_computing_rotate_channel_ivs_below_limit(channel, uvm_conf_computing_channel_iv_rotation_limit, true);
 }
+
+void uvm_conf_computing_enable_key_rotation(uvm_gpu_t *gpu)
+{
+    if (!g_uvm_global.conf_computing_enabled)
+        return;
+
+    // Key rotation cannot be enabled on UVM if it is disabled on RM
+    if (!gpu->parent->rm_info.gpuConfComputeCaps.bKeyRotationEnabled)
+        return;
+
+    gpu->channel_manager->conf_computing.key_rotation_enabled = true;
+}
+
+void uvm_conf_computing_disable_key_rotation(uvm_gpu_t *gpu)
+{
+    if (!g_uvm_global.conf_computing_enabled)
+        return;
+
+    gpu->channel_manager->conf_computing.key_rotation_enabled = false;
+}
+
+bool uvm_conf_computing_is_key_rotation_enabled(uvm_gpu_t *gpu)
+{
+    return gpu->channel_manager->conf_computing.key_rotation_enabled;
+}
+
+bool uvm_conf_computing_is_key_rotation_enabled_in_pool(uvm_channel_pool_t *pool)
+{
+    if (!uvm_conf_computing_is_key_rotation_enabled(pool->manager->gpu))
+        return false;
+
+    // TODO: Bug 4586447: key rotation must be disabled in the SEC2 engine,
+    // because currently the encryption key is shared between UVM and RM, but
+    // UVM is not able to idle SEC2 channels owned by RM.
+    if (uvm_channel_pool_is_sec2(pool))
+        return false;
+
+    // Key rotation happens as part of channel reservation, and LCIC channels
+    // are never reserved directly. Rotation of keys in LCIC channels happens
+    // as the result of key rotation in WLC channels.
+    //
+    // Return false even though nothing fundamental prohibits direct key
+    // rotation on LCIC pools.
+    if (uvm_channel_pool_is_lcic(pool))
+        return false;
+
+    return true;
+}
+
+static bool conf_computing_is_key_rotation_pending_use_stats(uvm_channel_pool_t *pool)
+{
+    NvU64 decrypted, encrypted;
+
+    UVM_ASSERT(!key_rotation_is_notifier_driven());
+
+    decrypted = atomic64_read(&pool->conf_computing.key_rotation.decrypted);
+
+    if (decrypted > UVM_CONF_COMPUTING_KEY_ROTATION_LOWER_THRESHOLD)
+        return true;
+
+    encrypted = atomic64_read(&pool->conf_computing.key_rotation.encrypted);
+
+    if (encrypted > UVM_CONF_COMPUTING_KEY_ROTATION_LOWER_THRESHOLD)
+        return true;
+
+    return false;
+}
+
+static bool conf_computing_is_key_rotation_pending_use_notifier(uvm_channel_pool_t *pool)
+{
+    // If key rotation is pending for the pool's engine, then the key rotation
+    // notifier in any of the engine channels can be used by UVM to detect the
+    // situation. Note that RM doesn't update all the notifiers in a single
+    // atomic operation, so it is possible that the channel read by UVM (the
+    // first one in the pool) indicates that a key rotation is pending, but
+    // another channel in the pool (temporarily) indicates the opposite, or vice
+    // versa.
+    uvm_channel_t *first_channel = pool->channels;
+
+    UVM_ASSERT(key_rotation_is_notifier_driven());
+    UVM_ASSERT(first_channel != NULL);
+
+    return first_channel->channel_info.keyRotationNotifier->status == UVM_KEY_ROTATION_STATUS_PENDING;
+}
+
+bool uvm_conf_computing_is_key_rotation_pending_in_pool(uvm_channel_pool_t *pool)
+{
+    if (!uvm_conf_computing_is_key_rotation_enabled_in_pool(pool))
+        return false;
+
+    if (key_rotation_is_notifier_driven())
+        return conf_computing_is_key_rotation_pending_use_notifier(pool);
+    else
+        return conf_computing_is_key_rotation_pending_use_stats(pool);
+}
+
+NV_STATUS uvm_conf_computing_rotate_pool_key(uvm_channel_pool_t *pool)
+{
+    NV_STATUS status;
+
+    UVM_ASSERT(uvm_conf_computing_is_key_rotation_enabled_in_pool(pool));
+    UVM_ASSERT(pool->conf_computing.key_rotation.csl_contexts != NULL);
+    UVM_ASSERT(pool->conf_computing.key_rotation.num_csl_contexts > 0);
+
+    // NV_ERR_STATE_IN_USE indicates that RM was not able to acquire the
+    // required locks at this time. This status is not interpreted as an error,
+    // but as a sign for UVM to try again later. This is the same "protocol"
+    // used in IV rotation.
+    status = nvUvmInterfaceCslRotateKey(pool->conf_computing.key_rotation.csl_contexts,
+                                        pool->conf_computing.key_rotation.num_csl_contexts);
+
+    if (status == NV_OK) {
+        pool->conf_computing.key_rotation.version++;
+
+        if (!key_rotation_is_notifier_driven()) {
+            atomic64_set(&pool->conf_computing.key_rotation.decrypted, 0);
+            atomic64_set(&pool->conf_computing.key_rotation.encrypted, 0);
+        }
+    }
+    else if (status != NV_ERR_STATE_IN_USE) {
+        UVM_DBG_PRINT("nvUvmInterfaceCslRotateKey() failed in engine %u: %s\n",
+                      pool->engine_index,
+                      nvstatusToString(status));
+    }
+
+    return status;
+}
+
+__attribute__ ((format(printf, 6, 7)))
+NV_STATUS uvm_conf_computing_util_memcopy_cpu_to_gpu(uvm_gpu_t *gpu,
+                                                     uvm_gpu_address_t dst_gpu_address,
+                                                     void *src_plain,
+                                                     size_t size,
+                                                     uvm_tracker_t *tracker,
+                                                     const char *format,
+                                                     ...)
+{
+    NV_STATUS status;
+    uvm_push_t push;
+    uvm_conf_computing_dma_buffer_t *dma_buffer;
+    uvm_gpu_address_t src_gpu_address, auth_tag_gpu_address;
+    void *dst_cipher, *auth_tag;
+    va_list args;
+
+    UVM_ASSERT(g_uvm_global.conf_computing_enabled);
+    UVM_ASSERT(size <= UVM_CONF_COMPUTING_DMA_BUFFER_SIZE);
+
+    status = uvm_conf_computing_dma_buffer_alloc(&gpu->conf_computing.dma_buffer_pool, &dma_buffer, NULL);
+    if (status != NV_OK)
+        return status;
+
+    va_start(args, format);
+    status = uvm_push_begin_acquire(gpu->channel_manager, UVM_CHANNEL_TYPE_CPU_TO_GPU, tracker, &push, format, args);
+    va_end(args);
+
+    if (status != NV_OK)
+        goto out;
+
+    dst_cipher = uvm_mem_get_cpu_addr_kernel(dma_buffer->alloc);
+    auth_tag = uvm_mem_get_cpu_addr_kernel(dma_buffer->auth_tag);
+    uvm_conf_computing_cpu_encrypt(push.channel, dst_cipher, src_plain, NULL, size, auth_tag);
+
+    src_gpu_address = uvm_mem_gpu_address_virtual_kernel(dma_buffer->alloc, gpu);
+    auth_tag_gpu_address = uvm_mem_gpu_address_virtual_kernel(dma_buffer->auth_tag, gpu);
+    gpu->parent->ce_hal->decrypt(&push, dst_gpu_address, src_gpu_address, size, auth_tag_gpu_address);
+
+    status = uvm_push_end_and_wait(&push);
+
+out:
+    uvm_conf_computing_dma_buffer_free(&gpu->conf_computing.dma_buffer_pool, dma_buffer, NULL);
+    return status;
+}
+
+__attribute__ ((format(printf, 6, 7)))
+NV_STATUS uvm_conf_computing_util_memcopy_gpu_to_cpu(uvm_gpu_t *gpu,
+                                                     void *dst_plain,
+                                                     uvm_gpu_address_t src_gpu_address,
+                                                     size_t size,
+                                                     uvm_tracker_t *tracker,
+                                                     const char *format,
+                                                     ...)
+{
+    NV_STATUS status;
+    uvm_push_t push;
+    uvm_conf_computing_dma_buffer_t *dma_buffer;
+    uvm_gpu_address_t dst_gpu_address, auth_tag_gpu_address;
+    void *src_cipher, *auth_tag;
+    va_list args;
+
+    UVM_ASSERT(g_uvm_global.conf_computing_enabled);
+    UVM_ASSERT(size <= UVM_CONF_COMPUTING_DMA_BUFFER_SIZE);
+
+    status = uvm_conf_computing_dma_buffer_alloc(&gpu->conf_computing.dma_buffer_pool, &dma_buffer, NULL);
+    if (status != NV_OK)
+        return status;
+
+    va_start(args, format);
+    status = uvm_push_begin_acquire(gpu->channel_manager, UVM_CHANNEL_TYPE_GPU_TO_CPU, tracker, &push, format, args);
+    va_end(args);
+
+    if (status != NV_OK)
+        goto out;
+
+    uvm_conf_computing_log_gpu_encryption(push.channel, size, dma_buffer->decrypt_iv);
+    dma_buffer->key_version[0] = uvm_channel_pool_key_version(push.channel->pool);
+
+    dst_gpu_address = uvm_mem_gpu_address_virtual_kernel(dma_buffer->alloc, gpu);
+    auth_tag_gpu_address = uvm_mem_gpu_address_virtual_kernel(dma_buffer->auth_tag, gpu);
+    gpu->parent->ce_hal->encrypt(&push, dst_gpu_address, src_gpu_address, size, auth_tag_gpu_address);
+
+    status = uvm_push_end_and_wait(&push);
+    if (status != NV_OK)
+        goto out;
+
+    src_cipher = uvm_mem_get_cpu_addr_kernel(dma_buffer->alloc);
+    auth_tag = uvm_mem_get_cpu_addr_kernel(dma_buffer->auth_tag);
+    status = uvm_conf_computing_cpu_decrypt(push.channel,
+                                            dst_plain,
+                                            src_cipher,
+                                            dma_buffer->decrypt_iv,
+                                            dma_buffer->key_version[0],
+                                            size,
+                                            auth_tag);
+
+out:
+    uvm_conf_computing_dma_buffer_free(&gpu->conf_computing.dma_buffer_pool, dma_buffer, NULL);
+    return status;
+}
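The statistics-driven path added in this file (used when builtin tests are enabled) can be modeled in user space as follows. This is a sketch under stated assumptions, not the driver code: the `sketch_*` names are hypothetical, C11 atomics stand in for the kernel's `atomic64_t`, `UVM_SIZE_1MB` is assumed to be 1024 * 1024 bytes, and only the successful rotation case is modeled (the driver resets the counters only when `nvUvmInterfaceCslRotateKey()` returns `NV_OK`).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Assumed equivalent of UVM_CONF_COMPUTING_KEY_ROTATION_LOWER_THRESHOLD (8 MB).
#define SKETCH_KEY_ROTATION_LOWER_THRESHOLD (8ULL * 1024 * 1024)

// Hypothetical per-pool accounting state.
typedef struct {
    atomic_uint_fast64_t encrypted;
    atomic_uint_fast64_t decrypted;
    uint32_t version;
} sketch_pool_t;

void sketch_pool_init(sketch_pool_t *pool)
{
    atomic_init(&pool->encrypted, 0);
    atomic_init(&pool->decrypted, 0);
    pool->version = 0;
}

void sketch_log_encrypted(sketch_pool_t *pool, uint64_t size)
{
    atomic_fetch_add(&pool->encrypted, size);
}

void sketch_log_decrypted(sketch_pool_t *pool, uint64_t size)
{
    atomic_fetch_add(&pool->decrypted, size);
}

// Rotation is pending once either counter strictly exceeds the threshold.
bool sketch_is_rotation_pending(sketch_pool_t *pool)
{
    if (atomic_load(&pool->decrypted) > SKETCH_KEY_ROTATION_LOWER_THRESHOLD)
        return true;

    return atomic_load(&pool->encrypted) > SKETCH_KEY_ROTATION_LOWER_THRESHOLD;
}

// Models a successful rotation: bump the key version and reset the counters.
void sketch_rotate(sketch_pool_t *pool)
{
    pool->version++;
    atomic_store(&pool->decrypted, 0);
    atomic_store(&pool->encrypted, 0);
}
```

Note that the comparison is strict, so a pool that has processed exactly 8 MB is not yet pending; one more byte in either direction tips it over.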
--- a/kernel-open/nvidia-uvm/uvm_conf_computing.h
+++ b/kernel-open/nvidia-uvm/uvm_conf_computing.h
@@ -87,9 +87,9 @@ typedef struct
    // a free buffer.
    uvm_tracker_t tracker;

-    // When the DMA buffer is used as the destination of a GPU encryption, SEC2
-    // writes the authentication tag here. Later when the buffer is decrypted
-    // on the CPU the authentication tag is used again (read) for CSL to verify
+    // When the DMA buffer is used as the destination of a GPU encryption, the
+    // engine (CE or SEC2) writes the authentication tag here. When the buffer
+    // is decrypted on the CPU the authentication tag is used by CSL to verify
    // the authenticity. The allocation is big enough for one authentication
    // tag per PAGE_SIZE page in the alloc buffer.
    uvm_mem_t *auth_tag;
@@ -98,7 +98,12 @@ typedef struct
    // to the authentication tag. The allocation is big enough for one IV per
    // PAGE_SIZE page in the alloc buffer. The granularity between the decrypt
    // IV and authentication tag must match.
-    UvmCslIv decrypt_iv[(UVM_CONF_COMPUTING_DMA_BUFFER_SIZE / PAGE_SIZE)];
+    UvmCslIv decrypt_iv[UVM_CONF_COMPUTING_DMA_BUFFER_SIZE / PAGE_SIZE];
+
+    // When the DMA buffer is used as the destination of a GPU encryption, the
+    // key version used during GPU encryption of each PAGE_SIZE page can be
+    // saved here, so CPU decryption uses the correct decryption key.
+    NvU32 key_version[UVM_CONF_COMPUTING_DMA_BUFFER_SIZE / PAGE_SIZE];

    // Bitmap of the encrypted pages in the backing allocation
    uvm_page_mask_t encrypted_page_mask;
@@ -147,7 +152,7 @@ NV_STATUS uvm_conf_computing_gpu_init(uvm_gpu_t *gpu);
 void uvm_conf_computing_gpu_deinit(uvm_gpu_t *gpu);

 // Logs encryption information from the GPU and returns the IV.
-void uvm_conf_computing_log_gpu_encryption(uvm_channel_t *channel, UvmCslIv *iv);
+void uvm_conf_computing_log_gpu_encryption(uvm_channel_t *channel, size_t size, UvmCslIv *iv);

 // Acquires next CPU encryption IV and returns it.
 void uvm_conf_computing_acquire_encryption_iv(uvm_channel_t *channel, UvmCslIv *iv);
@@ -167,10 +172,14 @@ void uvm_conf_computing_cpu_encrypt(uvm_channel_t *channel,
 // CPU side decryption helper. Decrypts data from src_cipher and writes the
 // plain text in dst_plain. src_cipher and dst_plain can't overlap. IV obtained
 // from uvm_conf_computing_log_gpu_encryption() needs to be be passed to src_iv.
+//
+// The caller must indicate which key to use for decryption by passing the
+// appropriate key version number.
 NV_STATUS uvm_conf_computing_cpu_decrypt(uvm_channel_t *channel,
                                         void *dst_plain,
                                         const void *src_cipher,
                                         const UvmCslIv *src_iv,
+                                         NvU32 key_version,
                                         size_t size,
                                         const void *auth_tag_buffer);

@@ -191,12 +200,12 @@ NV_STATUS uvm_conf_computing_fault_decrypt(uvm_parent_gpu_t *parent_gpu,
                                           NvU8 valid);

 // Increment the CPU-side decrypt IV of the CSL context associated with
-// replayable faults. The function is a no-op if the given increment is zero.
+// replayable faults.
 //
 // The IV associated with a fault CSL context is a 64-bit counter.
 //
 // Locking: this function must be invoked while holding the replayable ISR lock.
-void uvm_conf_computing_fault_increment_decrypt_iv(uvm_parent_gpu_t *parent_gpu, NvU64 increment);
+void uvm_conf_computing_fault_increment_decrypt_iv(uvm_parent_gpu_t *parent_gpu);

 // Query the number of remaining messages before IV needs to be rotated.
 void uvm_conf_computing_query_message_pools(uvm_channel_t *channel,
@@ -214,4 +223,71 @@ NV_STATUS uvm_conf_computing_maybe_rotate_channel_ivs_retry_busy(uvm_channel_t *
 // Check if there are fewer than 'limit' messages available in either direction
 // and rotate if not.
 NV_STATUS uvm_conf_computing_rotate_channel_ivs_below_limit(uvm_channel_t *channel, NvU64 limit, bool retry_if_busy);
+
+// Rotate the engine key associated with the given channel pool.
+NV_STATUS uvm_conf_computing_rotate_pool_key(uvm_channel_pool_t *pool);
+
+// Returns true if key rotation is allowed in the channel pool.
+bool uvm_conf_computing_is_key_rotation_enabled_in_pool(uvm_channel_pool_t *pool);
+
+// Returns true if key rotation is pending in the channel pool.
+bool uvm_conf_computing_is_key_rotation_pending_in_pool(uvm_channel_pool_t *pool);
+
+// Enable/disable key rotation in the passed GPU. Note that UVM enablement is
+// dependent on RM enablement: key rotation may still be disabled upon calling
+// this function, if it is disabled in RM. On the other hand, key rotation can
+// be disabled in UVM, even if it is enabled in RM.
+//
+// Enablement/Disablement affects only kernel key rotation in keys owned by UVM.
+// It doesn't affect user key rotation (CUDA, Video...), nor does it affect RM
+// kernel key rotation.
+void uvm_conf_computing_enable_key_rotation(uvm_gpu_t *gpu);
+void uvm_conf_computing_disable_key_rotation(uvm_gpu_t *gpu);
+
+// Returns true if key rotation is enabled on UVM in the given GPU. Key rotation
+// can be enabled on the GPU but disabled on some of its engines (LCEs or SEC2);
+// see uvm_conf_computing_is_key_rotation_enabled_in_pool.
+bool uvm_conf_computing_is_key_rotation_enabled(uvm_gpu_t *gpu);
+
+// Launch a synchronous, encrypted copy from CPU to GPU.
+//
+// The maximum copy size allowed is UVM_CONF_COMPUTING_DMA_BUFFER_SIZE.
+//
+// The source CPU buffer pointed to by src_plain contains the unencrypted (plain
+// text) contents; the function internally performs a CPU-side encryption step
+// before launching the GPU-side CE decryption. The source buffer can be in
+// protected or unprotected sysmem, while the destination buffer must be in
+// protected vidmem.
+//
+// The input tracker, if not NULL, is internally acquired by the push
+// responsible for the encrypted copy.
+__attribute__ ((format(printf, 6, 7)))
+NV_STATUS uvm_conf_computing_util_memcopy_cpu_to_gpu(uvm_gpu_t *gpu,
+                                                     uvm_gpu_address_t dst_gpu_address,
+                                                     void *src_plain,
+                                                     size_t size,
+                                                     uvm_tracker_t *tracker,
+                                                     const char *format,
+                                                     ...);
+
+// Launch a synchronous, encrypted copy from GPU to CPU.
+//
+// The maximum copy size allowed is UVM_CONF_COMPUTING_DMA_BUFFER_SIZE.
+//
+// The destination CPU buffer pointed to by dst_plain receives the unencrypted
+// (plain text) contents; the function internally performs a CPU-side decryption
+// step after the GPU-side CE encryption completes. The source buffer must be in
+// protected vidmem, while the destination buffer can be in protected or
+// unprotected sysmem.
+//
+// The input tracker, if not NULL, is internally acquired by the push
+// responsible for the encrypted copy.
+__attribute__ ((format(printf, 6, 7)))
+NV_STATUS uvm_conf_computing_util_memcopy_gpu_to_cpu(uvm_gpu_t *gpu,
+                                                     void *dst_plain,
+                                                     uvm_gpu_address_t src_gpu_address,
+                                                     size_t size,
+                                                     uvm_tracker_t *tracker,
+                                                     const char *format,
+                                                     ...);
 #endif // __UVM_CONF_COMPUTING_H__
--- a/kernel-open/nvidia-uvm/uvm_debug_optimized.c
+++ b/kernel-open/nvidia-uvm/uvm_debug_optimized.c
@@ -1,53 +0,0 @@
-/*******************************************************************************
-    Copyright (c) 2015 NVIDIA Corporation
-
-    Permission is hereby granted, free of charge, to any person obtaining a copy
-    of this software and associated documentation files (the "Software"), to
-    deal in the Software without restriction, including without limitation the
-    rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
-    sell copies of the Software, and to permit persons to whom the Software is
-    furnished to do so, subject to the following conditions:
-
-        The above copyright notice and this permission notice shall be
-        included in all copies or substantial portions of the Software.
-
-    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
-    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
-    DEALINGS IN THE SOFTWARE.
-
-*******************************************************************************/
-
-// This file provides simple wrappers that are always built with optimizations
-// turned on to WAR issues with functions that don't build correctly otherwise.
-
-#include "uvm_linux.h"
-
-int nv_atomic_xchg(atomic_t *val, int new)
-{
-    return atomic_xchg(val, new);
-}
-
-int nv_atomic_cmpxchg(atomic_t *val, int old, int new)
-{
-    return atomic_cmpxchg(val, old, new);
-}
-
-long nv_atomic_long_cmpxchg(atomic_long_t *val, long old, long new)
-{
-    return atomic_long_cmpxchg(val, old, new);
-}
-
-unsigned long nv_copy_from_user(void *to, const void __user *from, unsigned long n)
-{
-    return copy_from_user(to, from, n);
-}
-
-unsigned long nv_copy_to_user(void __user *to, const void *from, unsigned long n)
-{
-    return copy_to_user(to, from, n);
-}
-
--- a/kernel-open/nvidia-uvm/uvm_fault_buffer_flush_test.c
+++ b/kernel-open/nvidia-uvm/uvm_fault_buffer_flush_test.c
@@ -51,8 +51,10 @@ NV_STATUS uvm_test_fault_buffer_flush(UVM_TEST_FAULT_BUFFER_FLUSH_PARAMS *params

    uvm_va_space_up_read(va_space);

-    if (uvm_processor_mask_empty(retained_gpus))
-        return NV_ERR_INVALID_DEVICE;
+    if (uvm_processor_mask_empty(retained_gpus)) {
+        status = NV_ERR_INVALID_DEVICE;
+        goto out;
+    }

    for (i = 0; i < params->iterations; i++) {
        if (fatal_signal_pending(current)) {
--- a/kernel-open/nvidia-uvm/uvm_forward_decl.h
+++ b/kernel-open/nvidia-uvm/uvm_forward_decl.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2015-2022 NVIDIA Corporation
+    Copyright (c) 2015-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -42,7 +42,6 @@ typedef struct uvm_gpu_semaphore_struct uvm_gpu_semaphore_t;
 typedef struct uvm_gpu_tracking_semaphore_struct uvm_gpu_tracking_semaphore_t;
 typedef struct uvm_gpu_semaphore_pool_struct uvm_gpu_semaphore_pool_t;
 typedef struct uvm_gpu_semaphore_pool_page_struct uvm_gpu_semaphore_pool_page_t;
-typedef struct uvm_gpu_peer_struct uvm_gpu_peer_t;
 typedef struct uvm_mmu_mode_hal_struct uvm_mmu_mode_hal_t;

 typedef struct uvm_channel_manager_struct uvm_channel_manager_t;
@@ -57,6 +56,12 @@ typedef struct uvm_gpfifo_entry_struct uvm_gpfifo_entry_t;

 typedef struct uvm_va_policy_struct uvm_va_policy_t;
 typedef struct uvm_va_range_struct uvm_va_range_t;
+typedef struct uvm_va_range_managed_struct uvm_va_range_managed_t;
+typedef struct uvm_va_range_external_struct uvm_va_range_external_t;
+typedef struct uvm_va_range_channel_struct uvm_va_range_channel_t;
+typedef struct uvm_va_range_sked_reflected_struct uvm_va_range_sked_reflected_t;
+typedef struct uvm_va_range_semaphore_pool_struct uvm_va_range_semaphore_pool_t;
+typedef struct uvm_va_range_device_p2p_struct uvm_va_range_device_p2p_t;
 typedef struct uvm_va_block_struct uvm_va_block_t;
 typedef struct uvm_va_block_test_struct uvm_va_block_test_t;
 typedef struct uvm_va_block_wrapper_struct uvm_va_block_wrapper_t;
--- a/kernel-open/nvidia-uvm/uvm_get_rm_ptes_test.c
+++ b/kernel-open/nvidia-uvm/uvm_get_rm_ptes_test.c
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2016-2023 NVIDIA Corporation
+    Copyright (c) 2016-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -115,14 +115,10 @@ static NV_STATUS verify_mapping_info(uvm_va_space_t *va_space,

    TEST_CHECK_RET(skip);

-    memory_owning_gpu = uvm_va_space_get_gpu_by_uuid(va_space, &memory_info->uuid);
-    if (memory_owning_gpu == NULL)
+    memory_owning_gpu = uvm_va_space_get_gpu_by_mem_info(va_space, memory_info);
+    if (!memory_owning_gpu)
        return NV_ERR_INVALID_DEVICE;

-    // TODO: Bug 1903234: Once RM supports indirect peer mappings, we'll need to
-    //       update this test since the aperture will be SYS. Depending on how
-    //       RM implements things, we might not be able to compare the physical
-    //       addresses either.
    aperture = get_aperture(va_space, memory_owning_gpu, memory_mapping_gpu, memory_info, sli_supported);

    if (is_cacheable(ext_mapping_info, aperture))
@@ -133,7 +129,8 @@ static NV_STATUS verify_mapping_info(uvm_va_space_t *va_space,
    phys_offset = mapping_offset;

    // Add the physical offset for nvswitch connected peer mappings
-    if (uvm_aperture_is_peer(aperture) && uvm_gpus_are_nvswitch_connected(memory_mapping_gpu, memory_owning_gpu))
+    if (uvm_aperture_is_peer(aperture) &&
+        uvm_parent_gpus_are_nvswitch_connected(memory_mapping_gpu->parent, memory_owning_gpu->parent))
        phys_offset += memory_owning_gpu->parent->nvswitch_info.fabric_memory_window_start;

    for (index = 0; index < ext_mapping_info->numWrittenPtes; index++) {
--- a/kernel-open/nvidia-uvm/uvm_global.c
+++ b/kernel-open/nvidia-uvm/uvm_global.c
@@ -412,7 +412,7 @@ void uvm_global_set_fatal_error_impl(NV_STATUS error)

    UVM_ASSERT(error != NV_OK);

-    previous_error = nv_atomic_cmpxchg(&g_uvm_global.fatal_error, NV_OK, error);
+    previous_error = atomic_cmpxchg(&g_uvm_global.fatal_error, NV_OK, error);

    if (previous_error == NV_OK) {
        UVM_ERR_PRINT("Encountered a global fatal error: %s\n", nvstatusToString(error));
@@ -421,6 +421,8 @@ void uvm_global_set_fatal_error_impl(NV_STATUS error)
        UVM_ERR_PRINT("Encountered a global fatal error: %s after a global error has been already set: %s\n",
                nvstatusToString(error), nvstatusToString(previous_error));
    }
+
+    nvUvmInterfaceReportFatalError(error);
 }

 NV_STATUS uvm_global_reset_fatal_error(void)
@@ -430,7 +432,7 @@ NV_STATUS uvm_global_reset_fatal_error(void)
        return NV_ERR_INVALID_STATE;
    }

-    return nv_atomic_xchg(&g_uvm_global.fatal_error, NV_OK);
+    return atomic_xchg(&g_uvm_global.fatal_error, NV_OK);
 }

 void uvm_global_gpu_retain(const uvm_processor_mask_t *mask)
--- a/kernel-open/nvidia-uvm/uvm_global.h
+++ b/kernel-open/nvidia-uvm/uvm_global.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2015-2023 NVIDIA Corporation
+    Copyright (c) 2015-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -52,19 +52,23 @@ struct uvm_global_struct
    // Created on module load and destroyed on module unload
    uvmGpuSessionHandle rm_session_handle;

-    // peer-to-peer table
-    // peer info is added and removed from this table when usermode
-    // driver calls UvmEnablePeerAccess and UvmDisablePeerAccess
-    // respectively.
-    uvm_gpu_peer_t peers[UVM_MAX_UNIQUE_GPU_PAIRS];
+    // Peer-to-peer table for storing parent GPU and MIG instance peer info.
+    // Note that MIG instances can be peers within a single parent GPU or
+    // be peers in different parent GPUs if NVLINK or PCIe peers are enabled.
+    // PCIe and MIG peer info is added and removed from this table when
+    // usermode driver calls UvmEnablePeerAccess() and UvmDisablePeerAccess()
+    // respectively. NvLink and MIG peers are updated when UvmRegisterGpu() and
+    // UvmUnregisterGpu() are called. Peer-to-peer state for MIG instances
+    // within the same parent GPU is not stored here.
+    uvm_parent_gpu_peer_t parent_gpu_peers[UVM_MAX_UNIQUE_PARENT_GPU_PAIRS];

    // peer-to-peer copy mode
    // Pascal+ GPUs support virtual addresses in p2p copies.
    // Ampere+ GPUs add support for physical addresses in p2p copies.
    uvm_gpu_peer_copy_mode_t peer_copy_mode;

-    // Stores an NV_STATUS, once it becomes != NV_OK, the driver should refuse to
-    // do most anything other than try and clean up as much as possible.
+    // Stores an NV_STATUS, once it becomes != NV_OK, the driver should refuse
+    // to do most anything other than try and clean up as much as possible.
    // An example of a fatal error is an unrecoverable ECC error on one of the
    // GPUs.
    atomic_t fatal_error;
@@ -232,12 +236,12 @@ static uvmGpuSessionHandle uvm_global_session_handle(void)
 // suspended.
 #define UVM_GPU_WRITE_ONCE(x, val) do {         \
        UVM_ASSERT(!uvm_global_is_suspended()); \
-        UVM_WRITE_ONCE(x, val);                 \
+        WRITE_ONCE(x, val);                     \
    } while (0)

 #define UVM_GPU_READ_ONCE(x) ({                 \
        UVM_ASSERT(!uvm_global_is_suspended()); \
-        UVM_READ_ONCE(x);                       \
+        READ_ONCE(x);                           \
    })

 static bool global_is_fatal_error_assert_disabled(void)
@@ -384,7 +388,7 @@ static uvm_gpu_t *uvm_gpu_find_next_valid_gpu_in_parent(uvm_parent_gpu_t *parent
         (parent_gpu) = uvm_global_find_next_parent_gpu((parent_gpu)))

 // LOCKING: Must hold the global_lock
-#define for_each_gpu_in_parent(parent_gpu, gpu)                                                 \
+#define for_each_gpu_in_parent(gpu, parent_gpu)                                                 \
    for (({uvm_assert_mutex_locked(&g_uvm_global.global_lock);                                  \
         (gpu) = uvm_gpu_find_next_valid_gpu_in_parent((parent_gpu), NULL);});                  \
         (gpu) != NULL;                                                                         \
@@ -409,4 +413,10 @@ NV_STATUS uvm_service_block_context_init(void);
 // Release fault service contexts if any exist.
 void uvm_service_block_context_exit(void);

+// Allocate a service block context
+uvm_service_block_context_t *uvm_service_block_context_alloc(struct mm_struct *mm);
+
+// Free a service block context
+void uvm_service_block_context_free(uvm_service_block_context_t *service_context);
+
 #endif // __UVM_GLOBAL_H__
--- a/kernel-open/nvidia-uvm/uvm_gpu.c
+++ b/kernel-open/nvidia-uvm/uvm_gpu.c
--- a/kernel-open/nvidia-uvm/uvm_gpu.h
+++ b/kernel-open/nvidia-uvm/uvm_gpu.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2015-2023 NVIDIA Corporation
+    Copyright (c) 2015-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -49,9 +49,13 @@
 #include <linux/mmu_notifier.h>
 #include "uvm_conf_computing.h"

-// Buffer length to store uvm gpu id, RM device name and gpu uuid.
-#define UVM_GPU_NICE_NAME_BUFFER_LENGTH (sizeof("ID 999: : ") + \
-            UVM_GPU_NAME_LENGTH + UVM_GPU_UUID_TEXT_BUFFER_LENGTH)
+#define UVM_PARENT_GPU_UUID_PREFIX "GPU-"
+#define UVM_GPU_UUID_PREFIX "GI-"
+
+// UVM_UUID_STRING_LENGTH already includes the NUL terminator, so don't
+// double-count it with sizeof().
+#define UVM_PARENT_GPU_UUID_STRING_LENGTH (sizeof(UVM_PARENT_GPU_UUID_PREFIX) - 1 + UVM_UUID_STRING_LENGTH)
+#define UVM_GPU_UUID_STRING_LENGTH (sizeof(UVM_GPU_UUID_PREFIX) - 1 + UVM_UUID_STRING_LENGTH)

 #define UVM_GPU_MAGIC_VALUE 0xc001d00d12341993ULL

@@ -160,6 +164,10 @@ struct uvm_service_block_context_struct
    // Pages whose permissions need to be revoked from other processors
    uvm_page_mask_t revocation_mask;

+    // Temporary mask used in service_va_block_locked() in
+    // uvm_gpu_access_counters.c.
+    uvm_processor_mask_t update_processors;
+
    struct
    {
        // Per-processor mask with the pages that will be resident after
@@ -180,29 +188,45 @@ struct uvm_service_block_context_struct

 typedef struct
 {
-    // Mask of read faulted pages in a UVM_VA_BLOCK_SIZE aligned region of a SAM
-    // VMA. Used for batching ATS faults in a vma. This is unused for access
-    // counter service requests.
-    uvm_page_mask_t read_fault_mask;
+    union
+    {
+        struct
+        {
+            // Mask of read faulted pages in a UVM_VA_BLOCK_SIZE aligned region
+            // of a SAM VMA. Used for batching ATS faults in a vma.
+            uvm_page_mask_t read_fault_mask;

-    // Mask of write faulted pages in a UVM_VA_BLOCK_SIZE aligned region of a
-    // SAM VMA. Used for batching ATS faults in a vma. This is unused for access
-    // counter service requests.
-    uvm_page_mask_t write_fault_mask;
+            // Mask of write faulted pages in a UVM_VA_BLOCK_SIZE aligned region
+            // of a SAM VMA. Used for batching ATS faults in a vma.
+            uvm_page_mask_t write_fault_mask;

-    // Mask of successfully serviced pages in a UVM_VA_BLOCK_SIZE aligned region
-    // of a SAM VMA. Used to return ATS fault status. This is unused for access
-    // counter service requests.
-    uvm_page_mask_t faults_serviced_mask;
+            // Mask of all faulted pages in a UVM_VA_BLOCK_SIZE aligned region
+            // of a SAM VMA. This is the logical OR of read_fault_mask and
+            // write_fault_mask.
+            uvm_page_mask_t accessed_mask;

-    // Mask of successfully serviced read faults on pages in write_fault_mask.
-    // This is unused for access counter service requests.
-    uvm_page_mask_t reads_serviced_mask;
+            // Mask of successfully serviced pages in a UVM_VA_BLOCK_SIZE
+            // aligned region of a SAM VMA. Used to return ATS fault status.
+            uvm_page_mask_t faults_serviced_mask;

-    // Mask of all accessed pages in a UVM_VA_BLOCK_SIZE aligned region of a SAM
-    // VMA. This is used as input for access counter service requests and output
-    // of fault service requests.
-    uvm_page_mask_t accessed_mask;
+            // Mask of successfully serviced read faults on pages in
+            // write_fault_mask.
+            uvm_page_mask_t reads_serviced_mask;
+
+        } faults;
+
+        struct
+        {
+            // Mask of all accessed pages in a UVM_VA_BLOCK_SIZE aligned region
+            // of a SAM VMA.
+            uvm_page_mask_t accessed_mask;
+
+            // Mask of successfully migrated pages in a UVM_VA_BLOCK_SIZE
+            // aligned region of a SAM VMA.
+            uvm_page_mask_t migrated_mask;
+
+        } access_counters;
+    };

    // Client type of the service requestor.
    uvm_fault_client_type_t client_type;
@@ -275,6 +299,10 @@ struct uvm_fault_service_batch_context_struct
    // pick one to be the target of the cancel sequence.
    uvm_va_space_t *fatal_va_space;

+    // TODO: Bug 3900733: refactor service_fault_batch_for_cancel() to handle
+    // iterating over multiple GPU VA spaces and remove fatal_gpu.
+    uvm_gpu_t *fatal_gpu;
+
    bool has_throttled_faults;

    NvU32 num_invalid_prefetch_faults;
@@ -589,20 +617,26 @@ typedef enum
    UVM_GPU_LINK_NVLINK_2,
    UVM_GPU_LINK_NVLINK_3,
    UVM_GPU_LINK_NVLINK_4,
+    UVM_GPU_LINK_NVLINK_5,
    UVM_GPU_LINK_C2C,
    UVM_GPU_LINK_MAX
 } uvm_gpu_link_type_t;

-// UVM does not support P2P copies on pre-Pascal GPUs. Pascal+ GPUs only
-// support virtual addresses in P2P copies. Therefore, a peer identity mapping
-// needs to be created.
-// Ampere+ GPUs support physical peer copies, too, so identity mappings are not
-// needed
 typedef enum
 {
+    // Peer copies can be disallowed for a variety of reasons. For example,
+    // P2P transfers are disabled in pre-Pascal GPUs because there is no
+    // compelling use case for direct peer migrations.
    UVM_GPU_PEER_COPY_MODE_UNSUPPORTED,
+
+    // Pascal+ GPUs support virtual addresses in P2P copies. Virtual peer copies
+    // require the creation of peer identity mappings.
    UVM_GPU_PEER_COPY_MODE_VIRTUAL,
+
+    // Ampere+ GPUs support virtual and physical peer copies. Physical peer
+    // copies do not depend on peer identity mappings.
    UVM_GPU_PEER_COPY_MODE_PHYSICAL,
+
    UVM_GPU_PEER_COPY_MODE_COUNT
 } uvm_gpu_peer_copy_mode_t;

@@ -619,9 +653,10 @@ struct uvm_gpu_struct
    NvProcessorUuid uuid;

    // Nice printable name in the format:
-    // ID: 999: GPU-<parent_uuid> UVM-GI-<gi_uuid>.
-    // UVM_GPU_UUID_TEXT_BUFFER_LENGTH includes the null character.
-    char name[9 + 2 * UVM_GPU_UUID_TEXT_BUFFER_LENGTH];
+    // ID: 999: GPU-<parent_uuid> GI-<gi_uuid>
+    // UVM_PARENT_GPU_UUID_STRING_LENGTH includes a NULL character but will be
+    // used for a space instead.
+    char name[sizeof("ID: 999: ") - 1 + UVM_PARENT_GPU_UUID_STRING_LENGTH - 1 + 1 + UVM_GPU_UUID_STRING_LENGTH];

    // Refcount of the gpu, i.e. how many times it has been retained. This is
    // roughly a count of how many times it has been registered with a VA space,
@@ -668,6 +703,12 @@ struct uvm_gpu_struct
            bool enabled;
            unsigned int node_id;
        } numa;
+
+        // Physical address of the start of statically mapped fb memory in BAR1
+        NvU64 static_bar1_start;
+
+        // Size of statically mapped fb memory in BAR1.
+        NvU64 static_bar1_size;
    } mem_info;

    struct
@@ -692,9 +733,6 @@ struct uvm_gpu_struct
    struct
    {
        // Mask of peer_gpus set
-        //
-        // We can use a regular processor id because P2P is not allowed between
-        // partitioned GPUs when SMC is enabled
        uvm_processor_mask_t peer_gpu_mask;

        // lazily-populated array of peer GPUs, indexed by the peer's GPU index
@@ -845,16 +883,19 @@ struct uvm_gpu_struct

    struct
    {
+        // "gpus/UVM-GPU-${physical-UUID}/${sub_processor_index}/"
        struct proc_dir_entry *dir;

+        // "gpus/${gpu_id}" -> "UVM-GPU-${physical-UUID}/${sub_processor_index}"
        struct proc_dir_entry *dir_symlink;

-        // The GPU instance UUID symlink if SMC is enabled.
+        // The GPU instance UUID symlink.
+        // "gpus/UVM-GI-${GI-UUID}" ->
+        //     "UVM-GPU-${physical-UUID}/${sub_processor_index}"
        struct proc_dir_entry *gpu_instance_uuid_symlink;

+        // "gpus/UVM-GPU-${physical-UUID}/${sub_processor_index}/info"
        struct proc_dir_entry *info_file;
-
-        struct proc_dir_entry *dir_peers;
    } procfs;

    // Placeholder for per-GPU performance heuristics information
@@ -862,6 +903,13 @@ struct uvm_gpu_struct

    // Force pushbuffer's GPU VA to be >= 1TB; used only for testing purposes.
    bool uvm_test_force_upper_pushbuffer_segment;
+
+    // Whether device p2p pages have been initialised.
+    bool device_p2p_initialised;
+
+    // Used to protect allocation of p2p_mem and assignment of the page
+    // zone_device_data fields.
+    uvm_mutex_t device_p2p_lock;
 };

 // In order to support SMC/MIG GPU partitions, we split UVM GPUs into two
@@ -891,7 +939,7 @@ struct uvm_parent_gpu_struct
    NvProcessorUuid uuid;

    // Nice printable name including the uvm gpu id, ascii name from RM and uuid
-    char name[UVM_GPU_NICE_NAME_BUFFER_LENGTH];
+    char name[sizeof("ID 999: : ") - 1 + UVM_GPU_NAME_LENGTH + UVM_PARENT_GPU_UUID_STRING_LENGTH];

    // GPU information and provided by RM (architecture, implementation,
    // hardware classes, etc.).
@@ -1073,11 +1121,17 @@ struct uvm_parent_gpu_struct

    struct
    {
+        // "gpus/UVM-GPU-${physical-UUID}/"
        struct proc_dir_entry *dir;

+        // "gpus/UVM-GPU-${physical-UUID}/fault_stats"
        struct proc_dir_entry *fault_stats_file;

+        // "gpus/UVM-GPU-${physical-UUID}/access_counters"
        struct proc_dir_entry *access_counters_file;
+
+        // "gpus/UVM-GPU-${physical-UUID}/peers/"
+        struct proc_dir_entry *dir_peers;
    } procfs;

    // Interrupt handling state and locks
@@ -1225,58 +1279,66 @@ static uvmGpuDeviceHandle uvm_gpu_device_handle(uvm_gpu_t *gpu)
    return gpu->parent->rm_device;
 }

-struct uvm_gpu_peer_struct
+typedef struct
+{
+    // ref_count also controls state maintained in each GPU instance
+    // (uvm_gpu_t). See init_peer_access().
+    NvU64 ref_count;
+} uvm_gpu_peer_t;
+
+typedef struct
 {
    // The fields in this global structure can only be inspected under one of
    // the following conditions:
    //
-    // - The VA space lock is held for either read or write, both GPUs are
-    //   registered in the VA space, and the corresponding bit in the
+    // - The VA space lock is held for either read or write, both parent GPUs
+    //   are registered in the VA space, and the corresponding bit in the
    //   va_space.enabled_peers bitmap is set.
    //
    // - The global lock is held.
    //
-    // - While the global lock was held in the past, the two GPUs were detected
-    //   to be SMC peers and were both retained.
+    // - While the global lock was held in the past, the two parent GPUs were
+    //   both retained.
    //
-    // - While the global lock was held in the past, the two GPUs were detected
-    //   to be NVLINK peers and were both retained.
+    // - While the global lock was held in the past, the two parent GPUs were
+    //   detected to be NVLINK peers and were both retained.
    //
-    // - While the global lock was held in the past, the two GPUs were detected
-    //   to be PCIe peers and uvm_gpu_retain_pcie_peer_access() was called.
+    // - While the global lock was held in the past, the two parent GPUs were
+    //   detected to be PCIe peers and uvm_gpu_retain_pcie_peer_access() was
+    //   called.
    //
    // - The peer_gpus_lock is held on one of the GPUs. In this case, the other
    //   GPU must be read from the original GPU's peer_gpus table. The fields
    //   will not change while the lock is held, but they may no longer be valid
    //   because the other GPU might be in teardown.

-    // Peer Id associated with this device w.r.t. to a peer GPU.
+    // This field is used to determine when this struct has been initialized
+    // (ref_count != 0). NVLink peers are initialized at GPU registration time.
+    // PCIe peers are initialized when retain_pcie_peers_from_uuids() is called.
+    NvU64 ref_count;
+
+    // Saved values from UvmGpuP2PCapsParams to be used after GPU instance
+    // creation. This should be per GPU instance since LCEs are associated with
+    // GPU instances, not parent GPUs, but for now MIG is not supported for
+    // NVLINK peers so RM associates this state with the parent GPUs. This will
+    // need to be revisited if NVLINK MIG peer support is added.
+    NvU8 optimalNvlinkWriteCEs[2];
+
+    // Peer Id associated with this device with respect to a peer parent GPU.
    // Note: peerId (A -> B) != peerId (B -> A)
    // peer_id[0] from min(gpu_id_1, gpu_id_2) -> max(gpu_id_1, gpu_id_2)
    // peer_id[1] from max(gpu_id_1, gpu_id_2) -> min(gpu_id_1, gpu_id_2)
    NvU8 peer_ids[2];

-    // Indirect peers are GPUs which can coherently access each others' memory
-    // over NVLINK, but are routed through the CPU using the SYS aperture rather
-    // than a PEER aperture
-    NvU8 is_indirect_peer : 1;
-
-    // The link type between the peer GPUs, currently either PCIe or NVLINK.
-    // This field is used to determine the when this peer struct has been
-    // initialized (link_type != UVM_GPU_LINK_INVALID). NVLink peers are
-    // initialized at GPU registration time. PCIe peers are initialized when
-    // the refcount below goes from 0 to 1.
+    // The link type between the peer parent GPUs, currently either PCIe or
+    // NVLINK.
    uvm_gpu_link_type_t link_type;

    // Maximum unidirectional bandwidth between the peers in megabytes per
-    // second, not taking into account the protocols' overhead. The reported
-    // bandwidth for indirect peers is zero. See UvmGpuP2PCapsParams.
+    // second, not taking into account the protocols' overhead.
+    // See UvmGpuP2PCapsParams.
    NvU32 total_link_line_rate_mbyte_per_s;

-    // For PCIe, the number of times that this has been retained by a VA space.
-    // For NVLINK this will always be 1.
-    NvU64 ref_count;
-
    // This handle gets populated when enable_peer_access successfully creates
    // an NV50_P2P object. disable_peer_access resets the same on the object
    // deletion.
@@ -1290,9 +1352,13 @@ struct uvm_gpu_peer_struct
        // GPU-A <-> GPU-B link is bidirectional, pairs[x][0] is always the
        // local GPU, while pairs[x][1] is the remote GPU. The table shall be
        // filled like so: [[GPU-A, GPU-B], [GPU-B, GPU-A]].
-        uvm_gpu_t *pairs[2][2];
+        uvm_parent_gpu_t *pairs[2][2];
    } procfs;
-};
+
+    // Peer-to-peer state for MIG instance pairs between two different parent
+    // GPUs.
+    uvm_gpu_peer_t gpu_peers[UVM_MAX_UNIQUE_SUB_PROCESSOR_PAIRS];
+} uvm_parent_gpu_peer_t;

 // Initialize global gpu state
 NV_STATUS uvm_gpu_init(void);
@@ -1371,12 +1437,12 @@ static NvU64 uvm_gpu_retained_count(uvm_gpu_t *gpu)
    return atomic64_read(&gpu->retained_count);
 }

-// Decrease the refcount on the parent GPU object, and actually delete the object
-// if the refcount hits zero.
+// Decrease the refcount on the parent GPU object, and actually delete the
+// object if the refcount hits zero.
 void uvm_parent_gpu_kref_put(uvm_parent_gpu_t *gpu);

-// Calculates peer table index using GPU ids.
-NvU32 uvm_gpu_peer_table_index(const uvm_gpu_id_t gpu_id0, const uvm_gpu_id_t gpu_id1);
+// Returns a GPU peer pair index in the range [0 .. UVM_MAX_UNIQUE_GPU_PAIRS).
+NvU32 uvm_gpu_pair_index(const uvm_gpu_id_t id0, const uvm_gpu_id_t id1);

 // Either retains an existing PCIe peer entry or creates a new one. In both
 // cases the two GPUs are also each retained.
@@ -1387,46 +1453,27 @@ NV_STATUS uvm_gpu_retain_pcie_peer_access(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1);
 // LOCKING: requires the global lock to be held
 void uvm_gpu_release_pcie_peer_access(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1);

+uvm_gpu_link_type_t uvm_parent_gpu_peer_link_type(uvm_parent_gpu_t *parent_gpu0, uvm_parent_gpu_t *parent_gpu1);
+
 // Get the aperture for local_gpu to use to map memory resident on remote_gpu.
 // They must not be the same gpu.
 uvm_aperture_t uvm_gpu_peer_aperture(uvm_gpu_t *local_gpu, uvm_gpu_t *remote_gpu);

+// Return the reference count for the P2P state between the given GPUs.
+// The two GPUs must have different parents.
+NvU64 uvm_gpu_peer_ref_count(const uvm_gpu_t *gpu0, const uvm_gpu_t *gpu1);
+
 // Get the processor id accessible by the given GPU for the given physical
 // address.
 uvm_processor_id_t uvm_gpu_get_processor_id_by_address(uvm_gpu_t *gpu, uvm_gpu_phys_address_t addr);

-// Get the P2P capabilities between the gpus with the given indexes
-uvm_gpu_peer_t *uvm_gpu_index_peer_caps(const uvm_gpu_id_t gpu_id0, const uvm_gpu_id_t gpu_id1);
+bool uvm_parent_gpus_are_nvswitch_connected(const uvm_parent_gpu_t *parent_gpu0, const uvm_parent_gpu_t *parent_gpu1);

-// Get the P2P capabilities between the given gpus
-static uvm_gpu_peer_t *uvm_gpu_peer_caps(const uvm_gpu_t *gpu0, const uvm_gpu_t *gpu1)
+static bool uvm_gpus_are_smc_peers(const uvm_gpu_t *gpu0, const uvm_gpu_t *gpu1)
 {
-    return uvm_gpu_index_peer_caps(gpu0->id, gpu1->id);
-}
+    UVM_ASSERT(gpu0 != gpu1);

-static bool uvm_gpus_are_nvswitch_connected(const uvm_gpu_t *gpu0, const uvm_gpu_t *gpu1)
-{
-    if (gpu0->parent->nvswitch_info.is_nvswitch_connected && gpu1->parent->nvswitch_info.is_nvswitch_connected) {
-        UVM_ASSERT(uvm_gpu_peer_caps(gpu0, gpu1)->link_type >= UVM_GPU_LINK_NVLINK_2);
-        return true;
-    }
-
-    return false;
-}
-
-static bool uvm_gpus_are_indirect_peers(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1)
-{
-    uvm_gpu_peer_t *peer_caps = uvm_gpu_peer_caps(gpu0, gpu1);
-
-    if (peer_caps->link_type != UVM_GPU_LINK_INVALID && peer_caps->is_indirect_peer) {
-        UVM_ASSERT(gpu0->mem_info.numa.enabled);
-        UVM_ASSERT(gpu1->mem_info.numa.enabled);
-        UVM_ASSERT(peer_caps->link_type != UVM_GPU_LINK_PCIE);
-        UVM_ASSERT(!uvm_gpus_are_nvswitch_connected(gpu0, gpu1));
-        return true;
-    }
-
-    return false;
+    return gpu0->parent == gpu1->parent;
 }

 // Retrieve the virtual address corresponding to the given vidmem physical
@@ -1596,9 +1643,6 @@ static bool uvm_parent_gpu_needs_proxy_channel_pool(const uvm_parent_gpu_t *pare

 uvm_aperture_t uvm_get_page_tree_location(const uvm_parent_gpu_t *parent_gpu);

-// Debug print of GPU properties
-void uvm_gpu_print(uvm_gpu_t *gpu);
-
 // Add the given instance pointer -> user_channel mapping to this GPU. The
 // bottom half GPU page fault handler uses this to look up the VA space for GPU
 // faults.
@@ -1611,16 +1655,25 @@ void uvm_parent_gpu_remove_user_channel(uvm_parent_gpu_t *parent_gpu, uvm_user_c
 //  NV_ERR_PAGE_TABLE_NOT_AVAIL  Entry's instance pointer is valid but the entry
 //                               targets an invalid subcontext
 //
-// out_va_space is valid if NV_OK is returned, otherwise it's NULL. The caller
-// is responsibile for ensuring that the returned va_space can't be destroyed,
-// so these functions should only be called from the bottom half.
+// out_va_space is valid if NV_OK is returned, otherwise it's NULL.
+// out_gpu is valid if NV_OK is returned, otherwise it's NULL.
+// The caller is responsible for ensuring that the returned va_space and gpu
+// can't be destroyed, so this function should only be called from the bottom
+// half.
 NV_STATUS uvm_parent_gpu_fault_entry_to_va_space(uvm_parent_gpu_t *parent_gpu,
-                                                 uvm_fault_buffer_entry_t *fault,
-                                                 uvm_va_space_t **out_va_space);
+                                                 const uvm_fault_buffer_entry_t *fault,
+                                                 uvm_va_space_t **out_va_space,
+                                                 uvm_gpu_t **out_gpu);

+// Return the GPU VA space for the given instance pointer and ve_id in the
+// access counter entry. This function can only be used for virtual address
+// entries.
+// The return values are the same as uvm_parent_gpu_fault_entry_to_va_space()
+// but for virtual access counter entries.
 NV_STATUS uvm_parent_gpu_access_counter_entry_to_va_space(uvm_parent_gpu_t *parent_gpu,
-                                                          uvm_access_counter_buffer_entry_t *entry,
-                                                          uvm_va_space_t **out_va_space);
+                                                          const uvm_access_counter_buffer_entry_t *entry,
+                                                          uvm_va_space_t **out_va_space,
+                                                          uvm_gpu_t **out_gpu);

 typedef enum
 {
@@ -1629,4 +1682,7 @@ typedef enum
    UVM_GPU_BUFFER_FLUSH_MODE_WAIT_UPDATE_PUT,
 } uvm_gpu_buffer_flush_mode_t;

+// PCIe BAR containing static framebuffer memory mappings for PCIe P2P
+int uvm_device_p2p_static_bar(uvm_gpu_t *gpu);
+
 #endif // __UVM_GPU_H__
--- a/kernel-open/nvidia-uvm/uvm_gpu_access_counters.c
+++ b/kernel-open/nvidia-uvm/uvm_gpu_access_counters.c
@@ -24,6 +24,7 @@
 #include "nv_uvm_interface.h"
 #include "uvm_gpu_access_counters.h"
 #include "uvm_global.h"
+#include "uvm_api.h"
 #include "uvm_gpu.h"
 #include "uvm_hal.h"
 #include "uvm_kvmalloc.h"
@@ -43,8 +44,9 @@
 #define UVM_PERF_ACCESS_COUNTER_THRESHOLD_MAX       ((1 << 16) - 1)
 #define UVM_PERF_ACCESS_COUNTER_THRESHOLD_DEFAULT   256

-#define UVM_ACCESS_COUNTER_ACTION_CLEAR     0x1
-#define UVM_ACCESS_COUNTER_PHYS_ON_MANAGED  0x2
+#define UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR       0x1
+#define UVM_ACCESS_COUNTER_ACTION_TARGETED_CLEAR    0x2
+#define UVM_ACCESS_COUNTER_PHYS_ON_MANAGED          0x4

 // Each page in a tracked physical range may belong to a different VA Block. We
 // preallocate an array of reverse map translations. However, access counter
@@ -600,7 +602,7 @@ NV_STATUS uvm_gpu_access_counters_enable(uvm_gpu_t *gpu, uvm_va_space_t *va_spac
    uvm_parent_gpu_access_counters_isr_lock(gpu->parent);

    if (uvm_parent_processor_mask_test(&va_space->access_counters_enabled_processors, gpu->parent->id)) {
-        status = NV_ERR_INVALID_DEVICE;
+        status = NV_OK;
    }
    else {
        UvmGpuAccessCntrConfig default_config =
@@ -684,7 +686,10 @@ static void access_counter_buffer_flush_locked(uvm_parent_gpu_t *parent_gpu,

    while (get != put) {
        // Wait until valid bit is set
-        UVM_SPIN_WHILE(!parent_gpu->access_counter_buffer_hal->entry_is_valid(parent_gpu, get), &spin);
+        UVM_SPIN_WHILE(!parent_gpu->access_counter_buffer_hal->entry_is_valid(parent_gpu, get), &spin) {
+            if (uvm_global_get_status() != NV_OK)
+                goto done;
+        }

        parent_gpu->access_counter_buffer_hal->entry_clear_valid(parent_gpu, get);
        ++get;
@@ -692,6 +697,7 @@ static void access_counter_buffer_flush_locked(uvm_parent_gpu_t *parent_gpu,
            get = 0;
    }

+done:
    write_get(parent_gpu, get);
 }

@@ -734,9 +740,18 @@ static int cmp_sort_virt_notifications_by_instance_ptr(const void *_a, const voi
    return cmp_access_counter_instance_ptr(a, b);
 }

+// Compare two GPUs by ID value. A NULL GPU is treated as ID 0.
+static inline int cmp_gpu(const uvm_gpu_t *a, const uvm_gpu_t *b)
+{
+    NvU32 id_a = a ? uvm_id_value(a->id) : 0;
+    NvU32 id_b = b ? uvm_id_value(b->id) : 0;
+
+    return UVM_CMP_DEFAULT(id_a, id_b);
+}
+
 // Sort comparator for pointers to GVA access counter notification buffer
-// entries that sorts by va_space, and fault address.
-static int cmp_sort_virt_notifications_by_va_space_address(const void *_a, const void *_b)
+// entries that sorts by va_space, GPU ID, and fault address.
+static int cmp_sort_virt_notifications_by_va_space_gpu_address(const void *_a, const void *_b)
 {
    const uvm_access_counter_buffer_entry_t **a = (const uvm_access_counter_buffer_entry_t **)_a;
    const uvm_access_counter_buffer_entry_t **b = (const uvm_access_counter_buffer_entry_t **)_b;
@@ -747,6 +762,10 @@ static int cmp_sort_virt_notifications_by_va_space_address(const void *_a, const
    if (result != 0)
        return result;

+    result = cmp_gpu((*a)->gpu, (*b)->gpu);
+    if (result != 0)
+        return result;
+
    return UVM_CMP_DEFAULT((*a)->address.address, (*b)->address.address);
 }

@@ -774,7 +793,7 @@ typedef enum
    NOTIFICATION_FETCH_MODE_ALL,
 } notification_fetch_mode_t;

-static NvU32 fetch_access_counter_buffer_entries(uvm_gpu_t *gpu,
+static NvU32 fetch_access_counter_buffer_entries(uvm_parent_gpu_t *parent_gpu,
                                                 uvm_access_counter_service_batch_context_t *batch_context,
                                                 notification_fetch_mode_t fetch_mode)
 {
@@ -783,12 +802,12 @@ static NvU32 fetch_access_counter_buffer_entries(uvm_gpu_t *gpu,
    NvU32 notification_index;
    uvm_access_counter_buffer_entry_t *notification_cache;
    uvm_spin_loop_t spin;
-    uvm_access_counter_buffer_info_t *access_counters = &gpu->parent->access_counter_buffer_info;
+    uvm_access_counter_buffer_info_t *access_counters = &parent_gpu->access_counter_buffer_info;
    NvU32 last_instance_ptr_idx = 0;
    uvm_aperture_t last_aperture = UVM_APERTURE_PEER_MAX;

-    UVM_ASSERT(uvm_sem_is_locked(&gpu->parent->isr.access_counters.service_lock));
-    UVM_ASSERT(gpu->parent->access_counters_supported);
+    UVM_ASSERT(uvm_sem_is_locked(&parent_gpu->isr.access_counters.service_lock));
+    UVM_ASSERT(parent_gpu->access_counters_supported);

    notification_cache = batch_context->notification_cache;

@@ -817,19 +836,25 @@ static NvU32 fetch_access_counter_buffer_entries(uvm_gpu_t *gpu,
           (fetch_mode == NOTIFICATION_FETCH_MODE_ALL || notification_index < access_counters->max_batch_size)) {
        uvm_access_counter_buffer_entry_t *current_entry = &notification_cache[notification_index];

-        // We cannot just wait for the last entry (the one pointed by put) to become valid, we have to do it
-        // individually since entries can be written out of order
-        UVM_SPIN_WHILE(!gpu->parent->access_counter_buffer_hal->entry_is_valid(gpu->parent, get), &spin) {
+        // We cannot just wait for the last entry (the one pointed to by put)
+        // to become valid. Each entry must be checked individually, since
+        // entries can be written out of order.
+        UVM_SPIN_WHILE(!parent_gpu->access_counter_buffer_hal->entry_is_valid(parent_gpu, get), &spin) {
            // We have some entry to work on. Let's do the rest later.
            if (fetch_mode != NOTIFICATION_FETCH_MODE_ALL && notification_index > 0)
                goto done;
+
+            // There's no entry to work on and something has gone wrong. Ignore
+            // the rest.
+            if (uvm_global_get_status() != NV_OK)
+                goto done;
        }

        // Prevent later accesses being moved above the read of the valid bit
        smp_mb__after_atomic();

        // Got valid bit set. Let's cache.
-        gpu->parent->access_counter_buffer_hal->parse_entry(gpu->parent, get, current_entry);
+        parent_gpu->access_counter_buffer_hal->parse_entry(parent_gpu, get, current_entry);

        if (current_entry->address.is_virtual) {
            batch_context->virt.notifications[batch_context->virt.num_notifications++] = current_entry;
@@ -845,26 +870,38 @@ static NvU32 fetch_access_counter_buffer_entries(uvm_gpu_t *gpu,
            }
        }
        else {
-            const NvU64 translation_size = get_config_for_type(access_counters, current_entry->counter_type)->translation_size;
+            NvU64 translation_size;
+            uvm_gpu_t *gpu;
+
+            translation_size = get_config_for_type(access_counters,
+                                                   current_entry->counter_type)->translation_size;
            current_entry->address.address = UVM_ALIGN_DOWN(current_entry->address.address, translation_size);

            batch_context->phys.notifications[batch_context->phys.num_notifications++] = current_entry;

-            current_entry->physical_info.resident_id =
-                uvm_gpu_get_processor_id_by_address(gpu, uvm_gpu_phys_address(current_entry->address.aperture,
-                                                                              current_entry->address.address));
-
-            if (batch_context->phys.is_single_aperture) {
-                if (batch_context->phys.num_notifications == 1)
-                    last_aperture = current_entry->address.aperture;
-                else if (current_entry->address.aperture != last_aperture)
-                    batch_context->phys.is_single_aperture = false;
+            gpu = uvm_parent_gpu_find_first_valid_gpu(parent_gpu);
+            if (!gpu) {
+                current_entry->physical_info.resident_id = UVM_ID_INVALID;
+                current_entry->gpu = NULL;
            }
+            else {
+                current_entry->gpu = gpu;
+                current_entry->physical_info.resident_id =
+                    uvm_gpu_get_processor_id_by_address(gpu, uvm_gpu_phys_address(current_entry->address.aperture,
+                                                                                  current_entry->address.address));

-            if (current_entry->counter_type == UVM_ACCESS_COUNTER_TYPE_MOMC)
-                UVM_ASSERT(uvm_id_equal(current_entry->physical_info.resident_id, gpu->id));
-            else
-                UVM_ASSERT(!uvm_id_equal(current_entry->physical_info.resident_id, gpu->id));
+                if (batch_context->phys.is_single_aperture) {
+                    if (batch_context->phys.num_notifications == 1)
+                        last_aperture = current_entry->address.aperture;
+                    else if (current_entry->address.aperture != last_aperture)
+                        batch_context->phys.is_single_aperture = false;
+                }
+
+                if (current_entry->counter_type == UVM_ACCESS_COUNTER_TYPE_MOMC)
+                    UVM_ASSERT(uvm_id_equal(current_entry->physical_info.resident_id, gpu->id));
+                else
+                    UVM_ASSERT(!uvm_id_equal(current_entry->physical_info.resident_id, gpu->id));
+            }
        }

        ++notification_index;
@@ -874,7 +911,7 @@ static NvU32 fetch_access_counter_buffer_entries(uvm_gpu_t *gpu,
    }

 done:
-    write_get(gpu->parent, get);
+    write_get(parent_gpu, get);

    return notification_index;
 }
@@ -895,12 +932,16 @@ static void translate_virt_notifications_instance_ptrs(uvm_parent_gpu_t *parent_
            // simply be ignored in subsequent processing.
            status = uvm_parent_gpu_access_counter_entry_to_va_space(parent_gpu,
                                                                     current_entry,
-                                                                     &current_entry->virtual_info.va_space);
-            if (status != NV_OK)
+                                                                     &current_entry->virtual_info.va_space,
+                                                                     &current_entry->gpu);
+            if (status != NV_OK) {
                UVM_ASSERT(current_entry->virtual_info.va_space == NULL);
+                UVM_ASSERT(current_entry->gpu == NULL);
+            }
        }
        else {
            current_entry->virtual_info.va_space = batch_context->virt.notifications[i - 1]->virtual_info.va_space;
+            current_entry->gpu = batch_context->virt.notifications[i - 1]->gpu;
        }
    }
 }
@@ -924,7 +965,7 @@ static void preprocess_virt_notifications(uvm_parent_gpu_t *parent_gpu,
    sort(batch_context->virt.notifications,
         batch_context->virt.num_notifications,
         sizeof(*batch_context->virt.notifications),
-         cmp_sort_virt_notifications_by_va_space_address,
+         cmp_sort_virt_notifications_by_va_space_gpu_address,
         NULL);
 }

@@ -942,13 +983,17 @@ static void preprocess_phys_notifications(uvm_access_counter_service_batch_conte
    }
 }

-static NV_STATUS notify_tools_and_process_flags(uvm_gpu_t *gpu,
-                                                uvm_access_counter_buffer_entry_t **notification_start,
-                                                NvU32 num_entries,
-                                                NvU32 flags)
+static NV_STATUS notify_tools_broadcast_and_process_flags(uvm_parent_gpu_t *parent_gpu,
+                                                          uvm_access_counter_buffer_entry_t **notification_start,
+                                                          NvU32 num_entries,
+                                                          NvU32 flags)
 {
+    uvm_gpu_t *gpu = uvm_parent_gpu_find_first_valid_gpu(parent_gpu);
    NV_STATUS status = NV_OK;

+    if (!gpu)
+        return NV_OK;
+
    if (uvm_enable_builtin_tests) {
        // TODO: Bug 4310744: [UVM][TOOLS] Attribute access counter tools events
        //                    to va_space instead of broadcasting.
@@ -958,12 +1003,72 @@ static NV_STATUS notify_tools_and_process_flags(uvm_gpu_t *gpu,
            uvm_tools_broadcast_access_counter(gpu, notification_start[i], flags & UVM_ACCESS_COUNTER_PHYS_ON_MANAGED);
    }

-    if (flags & UVM_ACCESS_COUNTER_ACTION_CLEAR)
+    UVM_ASSERT(!(flags & UVM_ACCESS_COUNTER_ACTION_TARGETED_CLEAR));
+
+    if (flags & UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR)
        status = access_counter_clear_notifications(gpu, notification_start, num_entries);

    return status;
 }

+static NV_STATUS notify_tools_and_process_flags(uvm_va_space_t *va_space,
+                                                uvm_gpu_t *gpu,
+                                                NvU64 base,
+                                                uvm_access_counter_buffer_entry_t **notification_start,
+                                                NvU32 num_entries,
+                                                NvU32 flags,
+                                                uvm_page_mask_t *migrated_mask)
+{
+    NV_STATUS status = NV_OK;
+
+    if (uvm_enable_builtin_tests) {
+        NvU32 i;
+
+        for (i = 0; i < num_entries; i++) {
+            uvm_tools_record_access_counter(va_space,
+                                            gpu->id,
+                                            notification_start[i],
+                                            flags & UVM_ACCESS_COUNTER_PHYS_ON_MANAGED);
+        }
+    }
+
+    if (flags & UVM_ACCESS_COUNTER_ACTION_TARGETED_CLEAR) {
+        NvU32 i;
+
+        UVM_ASSERT(base);
+        UVM_ASSERT(migrated_mask);
+
+        for (i = 0; i < num_entries; i++) {
+            NvU32 start_index = i;
+            NvU32 end_index;
+
+            for (end_index = i; end_index < num_entries; end_index++) {
+                NvU32 mask_index = (notification_start[end_index]->address.address - base) / PAGE_SIZE;
+
+                if (!uvm_page_mask_test(migrated_mask, mask_index))
+                    break;
+            }
+
+            if (end_index > start_index) {
+                status = access_counter_clear_notifications(gpu,
+                                                            &notification_start[start_index],
+                                                            end_index - start_index);
+                if (status != NV_OK)
+                    return status;
+            }
+
+            i = end_index;
+        }
+    }
+    else if (flags & UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR) {
+        UVM_ASSERT(!base);
+        UVM_ASSERT(!migrated_mask);
+        status = access_counter_clear_notifications(gpu, notification_start, num_entries);
+    }
+
+    return status;
+}
+
 static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
                                         uvm_va_block_t *va_block,
                                         uvm_va_block_retry_t *va_block_retry,
@@ -1087,12 +1192,12 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
    // pages to be serviced
    if (page_count > 0) {
        uvm_processor_id_t id;
-        uvm_processor_mask_t update_processors;
+        uvm_processor_mask_t *update_processors = &service_context->update_processors;

-        uvm_processor_mask_and(&update_processors, &va_block->resident, &service_context->resident_processors);
+        uvm_processor_mask_and(update_processors, &va_block->resident, &service_context->resident_processors);

        // Remove pages that are already resident in the destination processors
-        for_each_id_in_mask(id, &update_processors) {
+        for_each_id_in_mask(id, update_processors) {
            bool migrate_pages;
            uvm_page_mask_t *residency_mask = uvm_va_block_resident_mask_get(va_block, id, NUMA_NO_NODE);
            UVM_ASSERT(residency_mask);
@@ -1169,13 +1274,13 @@ static void reverse_mappings_to_va_block_page_mask(uvm_va_block_t *va_block,
    }
 }

-static NV_STATUS service_phys_single_va_block(uvm_gpu_t *gpu,
-                                              uvm_access_counter_service_batch_context_t *batch_context,
+static NV_STATUS service_phys_single_va_block(uvm_access_counter_service_batch_context_t *batch_context,
                                              const uvm_access_counter_buffer_entry_t *current_entry,
                                              const uvm_reverse_map_t *reverse_mappings,
                                              size_t num_reverse_mappings,
                                              NvU32 *out_flags)
 {
+    uvm_gpu_t *gpu = current_entry->gpu;
    size_t index;
    uvm_va_block_t *va_block = reverse_mappings[0].va_block;
    uvm_va_space_t *va_space = NULL;
@@ -1184,7 +1289,7 @@ static NV_STATUS service_phys_single_va_block(uvm_gpu_t *gpu,
    const uvm_processor_id_t processor = current_entry->counter_type == UVM_ACCESS_COUNTER_TYPE_MIMC?
                                             gpu->id: UVM_ID_CPU;

-    *out_flags &= ~UVM_ACCESS_COUNTER_ACTION_CLEAR;
+    *out_flags &= ~UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR;

    UVM_ASSERT(num_reverse_mappings > 0);

@@ -1246,7 +1351,7 @@ static NV_STATUS service_phys_single_va_block(uvm_gpu_t *gpu,
        }

        if (status == NV_OK)
-            *out_flags |= UVM_ACCESS_COUNTER_ACTION_CLEAR;
+            *out_flags |= UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR;
    }

 done:
@@ -1262,8 +1367,7 @@ done:
    return status;
 }

-static NV_STATUS service_phys_va_blocks(uvm_gpu_t *gpu,
-                                        uvm_access_counter_service_batch_context_t *batch_context,
+static NV_STATUS service_phys_va_blocks(uvm_access_counter_service_batch_context_t *batch_context,
                                        const uvm_access_counter_buffer_entry_t *current_entry,
                                        const uvm_reverse_map_t *reverse_mappings,
                                        size_t num_reverse_mappings,
@@ -1272,12 +1376,11 @@ static NV_STATUS service_phys_va_blocks(uvm_gpu_t *gpu,
    NV_STATUS status = NV_OK;
    size_t index;

-    *out_flags &= ~UVM_ACCESS_COUNTER_ACTION_CLEAR;
+    *out_flags &= ~UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR;

    for (index = 0; index < num_reverse_mappings; ++index) {
        NvU32 out_flags_local = 0;
-        status = service_phys_single_va_block(gpu,
-                                              batch_context,
+        status = service_phys_single_va_block(batch_context,
                                              current_entry,
                                              reverse_mappings + index,
                                              1,
@@ -1285,7 +1388,7 @@ static NV_STATUS service_phys_va_blocks(uvm_gpu_t *gpu,
        if (status != NV_OK)
            break;

-        UVM_ASSERT((out_flags_local & ~UVM_ACCESS_COUNTER_ACTION_CLEAR) == 0);
+        UVM_ASSERT((out_flags_local & ~UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR) == 0);
        *out_flags |= out_flags_local;
    }

@@ -1326,8 +1429,7 @@ static bool are_reverse_mappings_on_single_block(const uvm_reverse_map_t *revers
 // Service the given translation range. It will return the count of the reverse
 // mappings found during servicing in num_reverse_mappings, even if the function
 // doesn't return NV_OK.
-static NV_STATUS service_phys_notification_translation(uvm_gpu_t *gpu,
-                                                       uvm_gpu_t *resident_gpu,
+static NV_STATUS service_phys_notification_translation(uvm_gpu_t *resident_gpu,
                                                       uvm_access_counter_service_batch_context_t *batch_context,
                                                       const uvm_gpu_access_counter_type_config_t *config,
                                                       const uvm_access_counter_buffer_entry_t *current_entry,
@@ -1336,6 +1438,7 @@ static NV_STATUS service_phys_notification_translation(uvm_gpu_t *gpu,
                                                       size_t *num_reverse_mappings,
                                                       NvU32 *out_flags)
 {
+    uvm_gpu_t *gpu = current_entry->gpu;
    NV_STATUS status;
    NvU32 region_start, region_end;

@@ -1373,16 +1476,14 @@ static NV_STATUS service_phys_notification_translation(uvm_gpu_t *gpu,

    // Service all the translations
    if (are_reverse_mappings_on_single_block(batch_context->phys.translations, *num_reverse_mappings)) {
-        status = service_phys_single_va_block(gpu,
-                                              batch_context,
+        status = service_phys_single_va_block(batch_context,
                                              current_entry,
                                              batch_context->phys.translations,
                                              *num_reverse_mappings,
                                              out_flags);
    }
    else {
-        status = service_phys_va_blocks(gpu,
-                                        batch_context,
+        status = service_phys_va_blocks(batch_context,
                                        current_entry,
                                        batch_context->phys.translations,
                                        *num_reverse_mappings,
@@ -1392,14 +1493,14 @@ static NV_STATUS service_phys_notification_translation(uvm_gpu_t *gpu,
    return status;
 }

-static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
-                                           uvm_access_counter_service_batch_context_t *batch_context,
-                                           const uvm_access_counter_buffer_entry_t *current_entry,
-                                           NvU32 *out_flags)
+static NV_STATUS service_phys_notification(uvm_access_counter_service_batch_context_t *batch_context,
+                                           uvm_access_counter_buffer_entry_t *current_entry)
 {
    NvU64 address;
    NvU64 translation_index;
-    uvm_access_counter_buffer_info_t *access_counters = &gpu->parent->access_counter_buffer_info;
+    uvm_gpu_t *gpu = current_entry->gpu;
+    uvm_parent_gpu_t *parent_gpu = gpu->parent;
+    uvm_access_counter_buffer_info_t *access_counters = &parent_gpu->access_counter_buffer_info;
    uvm_access_counter_type_t counter_type = current_entry->counter_type;
    const uvm_gpu_access_counter_type_config_t *config = get_config_for_type(access_counters, counter_type);
    unsigned long sub_granularity;
@@ -1419,7 +1520,7 @@ static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
        resident_gpu = uvm_gpu_get(current_entry->physical_info.resident_id);
        UVM_ASSERT(resident_gpu != NULL);

-        if (gpu != resident_gpu && uvm_gpus_are_nvswitch_connected(gpu, resident_gpu)) {
+        if (gpu != resident_gpu && uvm_parent_gpus_are_nvswitch_connected(gpu->parent, resident_gpu->parent)) {
            UVM_ASSERT(address >= resident_gpu->parent->nvswitch_info.fabric_memory_window_start);
            address -= resident_gpu->parent->nvswitch_info.fabric_memory_window_start;
        }
@@ -1429,14 +1530,13 @@ static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
        // fall outside of the allocatable address range. We just drop
        // them.
        if (address >= resident_gpu->mem_info.max_allocatable_address)
-            return NV_OK;
+            goto out;
    }

    for (translation_index = 0; translation_index < config->translations_per_counter; ++translation_index) {
        size_t num_reverse_mappings;
        NvU32 out_flags_local = 0;
-        status = service_phys_notification_translation(gpu,
-                                                       resident_gpu,
+        status = service_phys_notification_translation(resident_gpu,
                                                       batch_context,
                                                       config,
                                                       current_entry,
@@ -1446,7 +1546,7 @@ static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
                                                       &out_flags_local);
        total_reverse_mappings += num_reverse_mappings;

-        UVM_ASSERT((out_flags_local & ~UVM_ACCESS_COUNTER_ACTION_CLEAR) == 0);
+        UVM_ASSERT((out_flags_local & ~UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR) == 0);
        flags |= out_flags_local;

        if (status != NV_OK)
@@ -1457,37 +1557,32 @@ static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
    }

    if (uvm_enable_builtin_tests)
-        *out_flags |= ((total_reverse_mappings != 0) ? UVM_ACCESS_COUNTER_PHYS_ON_MANAGED : 0);
-
-    if (status == NV_OK && (flags & UVM_ACCESS_COUNTER_ACTION_CLEAR))
-        *out_flags |= UVM_ACCESS_COUNTER_ACTION_CLEAR;
+        flags |= ((total_reverse_mappings != 0) ? UVM_ACCESS_COUNTER_PHYS_ON_MANAGED : 0);

+out:
+    notify_tools_broadcast_and_process_flags(parent_gpu, &current_entry, 1, flags);
    return status;
 }

 // TODO: Bug 2018899: Add statistics for dropped access counter notifications
-static NV_STATUS service_phys_notifications(uvm_gpu_t *gpu,
+static NV_STATUS service_phys_notifications(uvm_parent_gpu_t *parent_gpu,
                                            uvm_access_counter_service_batch_context_t *batch_context)
 {
    NvU32 i;
    uvm_access_counter_buffer_entry_t **notifications = batch_context->phys.notifications;

-    UVM_ASSERT(gpu->parent->access_counters_can_use_physical_addresses);
+    UVM_ASSERT(parent_gpu->access_counters_can_use_physical_addresses);

    preprocess_phys_notifications(batch_context);

    for (i = 0; i < batch_context->phys.num_notifications; ++i) {
        NV_STATUS status;
        uvm_access_counter_buffer_entry_t *current_entry = notifications[i];
-        NvU32 flags = 0;

        if (!UVM_ID_IS_VALID(current_entry->physical_info.resident_id))
            continue;

-        status = service_phys_notification(gpu, batch_context, current_entry, &flags);
-
-        notify_tools_and_process_flags(gpu, &notifications[i], 1, flags);
-
+        status = service_phys_notification(batch_context, current_entry);
        if (status != NV_OK)
            return status;
    }
@@ -1562,7 +1657,7 @@ static void expand_notification_block(uvm_gpu_va_space_t *gpu_va_space,
        return;

    if (UVM_ID_IS_GPU(resident_id))
-        resident_gpu = uvm_va_space_get_gpu(gpu_va_space->va_space, resident_id);
+        resident_gpu = uvm_gpu_get(resident_id);

    if (uvm_va_block_get_physical_size(va_block, resident_id, page_index) != granularity) {
        uvm_page_mask_set(accessed_pages, page_index);
@@ -1624,16 +1719,14 @@ static NV_STATUS service_virt_notifications_in_block(uvm_gpu_va_space_t *gpu_va_
        uvm_access_counter_buffer_entry_t *current_entry = notifications[i];
        NvU64 address = current_entry->address.address;

-        if ((current_entry->virtual_info.va_space == va_space) && (address <= va_block->end)) {
-            expand_notification_block(gpu_va_space,
-                                      va_block,
-                                      batch_context->block_service_context.block_context,
-                                      accessed_pages,
-                                      current_entry);
-        }
-        else {
+        if (current_entry->virtual_info.va_space != va_space || current_entry->gpu != gpu || address > va_block->end)
            break;
-        }
+
+        expand_notification_block(gpu_va_space,
+                                  va_block,
+                                  batch_context->block_service_context.block_context,
+                                  accessed_pages,
+                                  current_entry);
    }

    *out_index = i;
@@ -1646,9 +1739,15 @@ static NV_STATUS service_virt_notifications_in_block(uvm_gpu_va_space_t *gpu_va_
    uvm_mutex_unlock(&va_block->lock);

    if (status == NV_OK)
-        flags |= UVM_ACCESS_COUNTER_ACTION_CLEAR;
+        flags |= UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR;

-    flags_status = notify_tools_and_process_flags(gpu, &notifications[index], *out_index - index, flags);
+    flags_status = notify_tools_and_process_flags(va_space,
+                                                  gpu,
+                                                  0,
+                                                  &notifications[index],
+                                                  *out_index - index,
+                                                  flags,
+                                                  NULL);

    if ((status == NV_OK) && (flags_status != NV_OK))
        status = flags_status;
@@ -1667,7 +1766,6 @@ static NV_STATUS service_virt_notification_ats(uvm_gpu_va_space_t *gpu_va_space,
    NvU64 base;
    NvU64 end;
    NvU64 address;
-    NvU32 flags = UVM_ACCESS_COUNTER_ACTION_CLEAR;
    NV_STATUS status = NV_OK;
    NV_STATUS flags_status;
    struct vm_area_struct *vma = NULL;
@@ -1687,7 +1785,13 @@ static NV_STATUS service_virt_notification_ats(uvm_gpu_va_space_t *gpu_va_space,
    if (!vma) {
        // Clear the notification entry to continue receiving access counter
        // notifications when a new VMA is allocated in this range.
-        status = notify_tools_and_process_flags(gpu, &notifications[index], 1, flags);
+        status = notify_tools_and_process_flags(va_space,
+                                                gpu,
+                                                0,
+                                                &notifications[index],
+                                                1,
+                                                UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR,
+                                                NULL);
        *out_index = index + 1;
        return status;
    }
@@ -1695,16 +1799,16 @@ static NV_STATUS service_virt_notification_ats(uvm_gpu_va_space_t *gpu_va_space,
    base = UVM_VA_BLOCK_ALIGN_DOWN(address);
    end = min(base + UVM_VA_BLOCK_SIZE, (NvU64)vma->vm_end);

-    uvm_page_mask_zero(&ats_context->accessed_mask);
+    uvm_page_mask_zero(&ats_context->access_counters.accessed_mask);

    for (i = index; i < batch_context->virt.num_notifications; i++) {
        uvm_access_counter_buffer_entry_t *current_entry = notifications[i];
        address = current_entry->address.address;

-        if ((current_entry->virtual_info.va_space == va_space) && (address < end))
-            uvm_page_mask_set(&ats_context->accessed_mask, (address - base) / PAGE_SIZE);
-        else
+        if (current_entry->virtual_info.va_space != va_space || current_entry->gpu != gpu || address >= end)
            break;
+
+        uvm_page_mask_set(&ats_context->access_counters.accessed_mask, (address - base) / PAGE_SIZE);
    }

    *out_index = i;
@@ -1716,10 +1820,15 @@ static NV_STATUS service_virt_notification_ats(uvm_gpu_va_space_t *gpu_va_space,
    //                    location is set
    // If no pages were actually migrated, don't clear the access counters.
    status = uvm_ats_service_access_counters(gpu_va_space, vma, base, ats_context);
-    if (status != NV_OK)
-        flags &= ~UVM_ACCESS_COUNTER_ACTION_CLEAR;

-    flags_status = notify_tools_and_process_flags(gpu, &notifications[index], *out_index - index, flags);
+    flags_status = notify_tools_and_process_flags(va_space,
+                                                  gpu,
+                                                  base,
+                                                  &notifications[index],
+                                                  *out_index - index,
+                                                  UVM_ACCESS_COUNTER_ACTION_TARGETED_CLEAR,
+                                                  &ats_context->access_counters.migrated_mask);
+
    if ((status == NV_OK) && (flags_status != NV_OK))
        status = flags_status;

@@ -1753,25 +1862,32 @@ static NV_STATUS service_virt_notifications_batch(uvm_gpu_va_space_t *gpu_va_spa
        // Avoid clearing the entry by default.
        NvU32 flags = 0;
        uvm_va_block_t *va_block = NULL;
+        uvm_va_range_managed_t *managed_range = uvm_va_range_to_managed_or_null(va_range);

-        if (va_range->type == UVM_VA_RANGE_TYPE_MANAGED) {
-            size_t index = uvm_va_range_block_index(va_range, address);
+        if (managed_range) {
+            size_t index = uvm_va_range_block_index(managed_range, address);

-            va_block = uvm_va_range_block(va_range, index);
+            va_block = uvm_va_range_block(managed_range, index);

            // If the va_range is a managed range, the notification belongs to a
            // recently freed va_range if va_block is NULL. If va_block is not
            // NULL, service_virt_notifications_in_block will process flags.
            // Clear the notification entry to continue receiving notifications
            // when a new va_range is allocated in that region.
-            flags = UVM_ACCESS_COUNTER_ACTION_CLEAR;
+            flags = UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR;
        }

        if (va_block) {
            status = service_virt_notifications_in_block(gpu_va_space, mm, va_block, batch_context, index, out_index);
        }
        else {
-            status = notify_tools_and_process_flags(gpu_va_space->gpu, batch_context->virt.notifications, 1, flags);
+            status = notify_tools_and_process_flags(va_space,
+                                                    gpu_va_space->gpu,
+                                                    0,
+                                                    batch_context->virt.notifications,
+                                                    1,
+                                                    flags,
+                                                    NULL);
            *out_index = index + 1;
        }
    }
@@ -1793,7 +1909,7 @@ static NV_STATUS service_virt_notifications_batch(uvm_gpu_va_space_t *gpu_va_spa
        // - If the va_block isn't HMM, the notification belongs to a recently
        // freed va_range. Clear the notification entry to continue receiving
        // notifications when a new va_range is allocated in this region.
-        flags = va_block ? 0 : UVM_ACCESS_COUNTER_ACTION_CLEAR;
+        flags = va_block ? 0 : UVM_ACCESS_COUNTER_ACTION_BATCH_CLEAR;

        UVM_ASSERT((status == NV_ERR_OBJECT_NOT_FOUND) ||
                   (status == NV_ERR_INVALID_ADDRESS)  ||
@@ -1801,7 +1917,13 @@ static NV_STATUS service_virt_notifications_batch(uvm_gpu_va_space_t *gpu_va_spa

        // Clobber status to continue processing the rest of the notifications
        // in the batch.
-        status = notify_tools_and_process_flags(gpu_va_space->gpu, batch_context->virt.notifications, 1, flags);
+        status = notify_tools_and_process_flags(va_space,
+                                                gpu_va_space->gpu,
+                                                0,
+                                                batch_context->virt.notifications,
+                                                1,
+                                                flags,
+                                                NULL);

        *out_index = index + 1;
    }
@@ -1809,7 +1931,7 @@ static NV_STATUS service_virt_notifications_batch(uvm_gpu_va_space_t *gpu_va_spa
    return status;
 }

-static NV_STATUS service_virt_notifications(uvm_gpu_t *gpu,
+static NV_STATUS service_virt_notifications(uvm_parent_gpu_t *parent_gpu,
                                            uvm_access_counter_service_batch_context_t *batch_context)
 {
    NvU32 i = 0;
@@ -1817,18 +1939,19 @@ static NV_STATUS service_virt_notifications(uvm_gpu_t *gpu,
    struct mm_struct *mm = NULL;
    uvm_va_space_t *va_space = NULL;
    uvm_va_space_t *prev_va_space = NULL;
+    uvm_gpu_t *prev_gpu = NULL;
    uvm_gpu_va_space_t *gpu_va_space = NULL;

    // TODO: Bug 4299018 : Add support for virtual access counter migrations on
    //                     4K page sizes.
    if (PAGE_SIZE == UVM_PAGE_SIZE_4K) {
-        return notify_tools_and_process_flags(gpu,
-                                              batch_context->virt.notifications,
-                                              batch_context->virt.num_notifications,
-                                              0);
+        return notify_tools_broadcast_and_process_flags(parent_gpu,
+                                                        batch_context->virt.notifications,
+                                                        batch_context->virt.num_notifications,
+                                                        0);
    }

-    preprocess_virt_notifications(gpu->parent, batch_context);
+    preprocess_virt_notifications(parent_gpu, batch_context);

    while (i < batch_context->virt.num_notifications) {
        uvm_access_counter_buffer_entry_t *current_entry = batch_context->virt.notifications[i];
@@ -1842,25 +1965,40 @@ static NV_STATUS service_virt_notifications(uvm_gpu_t *gpu,
                uvm_va_space_mm_release_unlock(prev_va_space, mm);

                mm = NULL;
-                gpu_va_space = NULL;
+                prev_gpu = NULL;
            }

            // Acquire locks for the new va_space.
            if (va_space) {
                mm = uvm_va_space_mm_retain_lock(va_space);
                uvm_va_space_down_read(va_space);
-
-                gpu_va_space = uvm_gpu_va_space_get_by_parent_gpu(va_space, gpu->parent);
            }

            prev_va_space = va_space;
        }

-        if (va_space && gpu_va_space && uvm_va_space_has_access_counter_migrations(va_space)) {
-            status = service_virt_notifications_batch(gpu_va_space, mm, batch_context, i, &i);
+        if (va_space) {
+            if (prev_gpu != current_entry->gpu) {
+                prev_gpu = current_entry->gpu;
+                gpu_va_space = uvm_gpu_va_space_get(va_space, current_entry->gpu);
+            }
+
+            if (gpu_va_space && uvm_va_space_has_access_counter_migrations(va_space)) {
+                status = service_virt_notifications_batch(gpu_va_space, mm, batch_context, i, &i);
+            }
+            else {
+                status = notify_tools_and_process_flags(va_space,
+                                                        current_entry->gpu,
+                                                        0,
+                                                        &batch_context->virt.notifications[i],
+                                                        1,
+                                                        0,
+                                                        NULL);
+                i++;
+            }
        }
        else {
-            status = notify_tools_and_process_flags(gpu, &batch_context->virt.notifications[i], 1, 0);
+            status = notify_tools_broadcast_and_process_flags(parent_gpu, &batch_context->virt.notifications[i], 1, 0);
            i++;
        }

@@ -1876,19 +2014,18 @@ static NV_STATUS service_virt_notifications(uvm_gpu_t *gpu,
    return status;
 }

-
-void uvm_gpu_service_access_counters(uvm_gpu_t *gpu)
+void uvm_parent_gpu_service_access_counters(uvm_parent_gpu_t *parent_gpu)
 {
    NV_STATUS status = NV_OK;
-    uvm_access_counter_service_batch_context_t *batch_context = &gpu->parent->access_counter_buffer_info.batch_service_context;
+    uvm_access_counter_service_batch_context_t *batch_context = &parent_gpu->access_counter_buffer_info.batch_service_context;

-    UVM_ASSERT(gpu->parent->access_counters_supported);
+    UVM_ASSERT(parent_gpu->access_counters_supported);

-    if (gpu->parent->access_counter_buffer_info.notifications_ignored_count > 0)
+    if (parent_gpu->access_counter_buffer_info.notifications_ignored_count > 0)
        return;

    while (1) {
-        batch_context->num_cached_notifications = fetch_access_counter_buffer_entries(gpu,
+        batch_context->num_cached_notifications = fetch_access_counter_buffer_entries(parent_gpu,
                                                                                      batch_context,
                                                                                      NOTIFICATION_FETCH_MODE_BATCH_READY);
        if (batch_context->num_cached_notifications == 0)
@@ -1897,13 +2034,13 @@ void uvm_gpu_service_access_counters(uvm_gpu_t *gpu)
        ++batch_context->batch_id;

        if (batch_context->virt.num_notifications) {
-            status = service_virt_notifications(gpu, batch_context);
+            status = service_virt_notifications(parent_gpu, batch_context);
            if (status != NV_OK)
                break;
        }

        if (batch_context->phys.num_notifications) {
-            status = service_phys_notifications(gpu, batch_context);
+            status = service_phys_notifications(parent_gpu, batch_context);
            if (status != NV_OK)
                break;
        }
@@ -1912,10 +2049,68 @@ void uvm_gpu_service_access_counters(uvm_gpu_t *gpu)
    if (status != NV_OK) {
        UVM_DBG_PRINT("Error %s servicing access counter notifications on GPU: %s\n",
                      nvstatusToString(status),
-                      uvm_gpu_name(gpu));
+                      uvm_parent_gpu_name(parent_gpu));
    }
 }

+NV_STATUS uvm_api_clear_all_access_counters(UVM_CLEAR_ALL_ACCESS_COUNTERS_PARAMS *params, struct file *filp)
+{
+    uvm_gpu_t *gpu;
+    uvm_parent_gpu_t *parent_gpu = NULL;
+    NV_STATUS status = NV_OK;
+    uvm_va_space_t *va_space = uvm_va_space_get(filp);
+    uvm_processor_mask_t *retained_gpus;
+
+    retained_gpus = uvm_processor_mask_cache_alloc();
+    if (!retained_gpus)
+        return NV_ERR_NO_MEMORY;
+
+    uvm_processor_mask_zero(retained_gpus);
+
+    uvm_va_space_down_read(va_space);
+
+    for_each_va_space_gpu(gpu, va_space) {
+
+        if (gpu->parent == parent_gpu)
+            continue;
+
+        uvm_gpu_retain(gpu);
+        uvm_processor_mask_set(retained_gpus, gpu->id);
+        parent_gpu = gpu->parent;
+    }
+
+    uvm_va_space_up_read(va_space);
+
+    for_each_gpu_in_mask(gpu, retained_gpus) {
+
+        if (!gpu->parent->access_counters_supported)
+            continue;
+
+        uvm_parent_gpu_access_counters_isr_lock(gpu->parent);
+
+        // If access counters are not enabled, there is nothing to clear.
+        if (gpu->parent->isr.access_counters.handling_ref_count) {
+            uvm_access_counter_buffer_info_t *access_counters = &gpu->parent->access_counter_buffer_info;
+
+            status = access_counter_clear_all(gpu);
+            if (status == NV_OK)
+                status = uvm_tracker_wait(&access_counters->clear_tracker);
+        }
+
+        uvm_parent_gpu_access_counters_isr_unlock(gpu->parent);
+
+        if (status != NV_OK)
+            break;
+    }
+
+    for_each_gpu_in_mask(gpu, retained_gpus)
+        uvm_gpu_release(gpu);
+
+    uvm_processor_mask_cache_free(retained_gpus);
+
+    return status;
+}
+
 static const NvU32 g_uvm_access_counters_threshold_max = (1 << 15) - 1;

 static NV_STATUS access_counters_config_from_test_params(const UVM_TEST_RECONFIGURE_ACCESS_COUNTERS_PARAMS *params,
--- a/kernel-open/nvidia-uvm/uvm_gpu_access_counters.h
+++ b/kernel-open/nvidia-uvm/uvm_gpu_access_counters.h
@@ -31,7 +31,7 @@ NV_STATUS uvm_parent_gpu_init_access_counters(uvm_parent_gpu_t *parent_gpu);
 void uvm_parent_gpu_deinit_access_counters(uvm_parent_gpu_t *parent_gpu);
 bool uvm_parent_gpu_access_counters_pending(uvm_parent_gpu_t *parent_gpu);

-void uvm_gpu_service_access_counters(uvm_gpu_t *gpu);
+void uvm_parent_gpu_service_access_counters(uvm_parent_gpu_t *parent_gpu);

 void uvm_parent_gpu_access_counter_buffer_flush(uvm_parent_gpu_t *parent_gpu);

--- a/kernel-open/nvidia-uvm/uvm_gpu_isr.c
+++ b/kernel-open/nvidia-uvm/uvm_gpu_isr.c
@@ -479,17 +479,14 @@ void uvm_parent_gpu_deinit_isr(uvm_parent_gpu_t *parent_gpu)
    uvm_kvfree(parent_gpu->isr.access_counters.stats.cpu_exec_count);
 }

-static uvm_gpu_t *find_first_valid_gpu(uvm_parent_gpu_t *parent_gpu)
+uvm_gpu_t *uvm_parent_gpu_find_first_valid_gpu(uvm_parent_gpu_t *parent_gpu)
 {
    uvm_gpu_t *gpu;

    // When SMC is enabled, there's no longer a 1:1 relationship between the
-    // parent and the partitions.  But because all relevant interrupt paths
-    // are shared, as is the fault reporting logic, it's sufficient here
-    // to proceed with any valid uvm_gpu_t, even if the corresponding partition
-    // didn't cause all, or even any of the interrupts.
-    // The bottom half handlers will later find the appropriate partitions by
-    // attributing the notifications to VA spaces as necessary.
+    // parent and the partitions. It's sufficient to return any valid uvm_gpu_t
+    // since the purpose is to have a channel and push buffer for operations
+    // that affect the whole parent GPU.
    if (parent_gpu->smc.enabled) {
        NvU32 sub_processor_index;

@@ -518,13 +515,8 @@ static uvm_gpu_t *find_first_valid_gpu(uvm_parent_gpu_t *parent_gpu)
 static void replayable_faults_isr_bottom_half(void *args)
 {
    uvm_parent_gpu_t *parent_gpu = (uvm_parent_gpu_t *)args;
-    uvm_gpu_t *gpu;
    unsigned int cpu;

-    gpu = find_first_valid_gpu(parent_gpu);
-    if (gpu == NULL)
-        goto put_kref;
-
    UVM_ASSERT(parent_gpu->replayable_faults_supported);

    // Record the lock ownership
@@ -545,11 +537,10 @@ static void replayable_faults_isr_bottom_half(void *args)
    ++parent_gpu->isr.replayable_faults.stats.cpu_exec_count[cpu];
    put_cpu();

-    uvm_gpu_service_replayable_faults(gpu);
+    uvm_parent_gpu_service_replayable_faults(parent_gpu);

    uvm_parent_gpu_replayable_faults_isr_unlock(parent_gpu);

-put_kref:
    // It is OK to drop a reference on the parent GPU if a bottom half has
    // been retriggered within uvm_parent_gpu_replayable_faults_isr_unlock,
    // because the rescheduling added an additional reference.
@@ -564,13 +555,8 @@ static void replayable_faults_isr_bottom_half_entry(void *args)
 static void non_replayable_faults_isr_bottom_half(void *args)
 {
    uvm_parent_gpu_t *parent_gpu = (uvm_parent_gpu_t *)args;
-    uvm_gpu_t *gpu;
    unsigned int cpu;

-    gpu = find_first_valid_gpu(parent_gpu);
-    if (gpu == NULL)
-        goto put_kref;
-
    UVM_ASSERT(parent_gpu->non_replayable_faults_supported);

    uvm_parent_gpu_non_replayable_faults_isr_lock(parent_gpu);
@@ -584,11 +570,10 @@ static void non_replayable_faults_isr_bottom_half(void *args)
    ++parent_gpu->isr.non_replayable_faults.stats.cpu_exec_count[cpu];
    put_cpu();

-    uvm_gpu_service_non_replayable_fault_buffer(gpu);
+    uvm_parent_gpu_service_non_replayable_fault_buffer(parent_gpu);

    uvm_parent_gpu_non_replayable_faults_isr_unlock(parent_gpu);

-put_kref:
    uvm_parent_gpu_kref_put(parent_gpu);
 }

@@ -600,13 +585,8 @@ static void non_replayable_faults_isr_bottom_half_entry(void *args)
 static void access_counters_isr_bottom_half(void *args)
 {
    uvm_parent_gpu_t *parent_gpu = (uvm_parent_gpu_t *)args;
-    uvm_gpu_t *gpu;
    unsigned int cpu;

-    gpu = find_first_valid_gpu(parent_gpu);
-    if (gpu == NULL)
-        goto put_kref;
-
    UVM_ASSERT(parent_gpu->access_counters_supported);

    uvm_record_lock(&parent_gpu->isr.access_counters.service_lock, UVM_LOCK_FLAGS_MODE_SHARED);
@@ -620,11 +600,10 @@ static void access_counters_isr_bottom_half(void *args)
    ++parent_gpu->isr.access_counters.stats.cpu_exec_count[cpu];
    put_cpu();

-    uvm_gpu_service_access_counters(gpu);
+    uvm_parent_gpu_service_access_counters(parent_gpu);

    uvm_parent_gpu_access_counters_isr_unlock(parent_gpu);

-put_kref:
    uvm_parent_gpu_kref_put(parent_gpu);
 }

--- a/kernel-open/nvidia-uvm/uvm_gpu_isr.h
+++ b/kernel-open/nvidia-uvm/uvm_gpu_isr.h
@@ -1,5 +1,5 @@
 /*******************************************************************************
-    Copyright (c) 2016-2023 NVIDIA Corporation
+    Copyright (c) 2016-2024 NVIDIA Corporation

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to
@@ -193,4 +193,10 @@ void uvm_parent_gpu_access_counters_intr_disable(uvm_parent_gpu_t *parent_gpu);
 // parent_gpu->isr.interrupts_lock must be held to call this function.
 void uvm_parent_gpu_access_counters_intr_enable(uvm_parent_gpu_t *parent_gpu);

+// Return the first valid GPU given the parent GPU or NULL if no MIG instances
+// are registered. This should only be called from bottom halves or if the
+// g_uvm_global.global_lock is held so that the returned pointer remains valid.
+//
+uvm_gpu_t *uvm_parent_gpu_find_first_valid_gpu(uvm_parent_gpu_t *parent_gpu);
+
 #endif // __UVM_GPU_ISR_H__
--- a/kernel-open/nvidia-uvm/uvm_gpu_non_replayable_faults.c
+++ b/kernel-open/nvidia-uvm/uvm_gpu_non_replayable_faults.c
@@ -212,6 +212,7 @@ static NV_STATUS fetch_non_replayable_fault_buffer_entries(uvm_parent_gpu_t *par

        // Make sure that all fields in the entry are properly initialized
        fault_entry->va_space = NULL;
+        fault_entry->gpu = NULL;
        fault_entry->is_fatal = (fault_entry->fault_type >= UVM_FAULT_TYPE_FATAL);
        fault_entry->filtered = false;

@@ -235,7 +236,7 @@ static NV_STATUS fetch_non_replayable_fault_buffer_entries(uvm_parent_gpu_t *par
    return NV_OK;
 }

-static bool use_clear_faulted_channel_sw_method(uvm_gpu_t *gpu)
+static bool use_clear_faulted_channel_sw_method(uvm_parent_gpu_t *parent_gpu)
 {
    // If true, UVM uses a SW method to request RM to do the clearing on its
    // behalf.
@@ -243,7 +244,7 @@ static bool use_clear_faulted_channel_sw_method(uvm_gpu_t *gpu)

    // In SRIOV, the UVM (guest) driver does not have access to the privileged
    // registers used to clear the faulted bit.
-    if (uvm_parent_gpu_is_virt_mode_sriov(gpu->parent))
+    if (uvm_parent_gpu_is_virt_mode_sriov(parent_gpu))
        use_sw_method = true;

    // In Confidential Computing access to the privileged registers is blocked,
@@ -253,17 +254,17 @@ static bool use_clear_faulted_channel_sw_method(uvm_gpu_t *gpu)
        use_sw_method = true;

    if (use_sw_method)
-        UVM_ASSERT(gpu->parent->has_clear_faulted_channel_sw_method);
+        UVM_ASSERT(parent_gpu->has_clear_faulted_channel_sw_method);

    return use_sw_method;
 }

-static NV_STATUS clear_faulted_method_on_gpu(uvm_gpu_t *gpu,
-                                             uvm_user_channel_t *user_channel,
+static NV_STATUS clear_faulted_method_on_gpu(uvm_user_channel_t *user_channel,
                                             const uvm_fault_buffer_entry_t *fault_entry,
                                             NvU32 batch_id,
                                             uvm_tracker_t *tracker)
 {
+    uvm_gpu_t *gpu = user_channel->gpu;
    NV_STATUS status;
    uvm_push_t push;
    uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &gpu->parent->fault_buffer_info.non_replayable;
@@ -283,7 +284,7 @@ static NV_STATUS clear_faulted_method_on_gpu(uvm_gpu_t *gpu,
        return status;
    }

-    if (use_clear_faulted_channel_sw_method(gpu))
+    if (use_clear_faulted_channel_sw_method(gpu->parent))
        gpu->parent->host_hal->clear_faulted_channel_sw_method(&push, user_channel, fault_entry);
    else
        gpu->parent->host_hal->clear_faulted_channel_method(&push, user_channel, fault_entry);
@@ -305,12 +306,12 @@ static NV_STATUS clear_faulted_method_on_gpu(uvm_gpu_t *gpu,
    return status;
 }

-static NV_STATUS clear_faulted_register_on_gpu(uvm_gpu_t *gpu,
-                                               uvm_user_channel_t *user_channel,
+static NV_STATUS clear_faulted_register_on_gpu(uvm_user_channel_t *user_channel,
                                               const uvm_fault_buffer_entry_t *fault_entry,
                                               NvU32 batch_id,
                                               uvm_tracker_t *tracker)
 {
+    uvm_gpu_t *gpu = user_channel->gpu;
    NV_STATUS status;

    UVM_ASSERT(!gpu->parent->has_clear_faulted_channel_method);
@@ -328,25 +329,26 @@ static NV_STATUS clear_faulted_register_on_gpu(uvm_gpu_t *gpu,
    return NV_OK;
 }

-static NV_STATUS clear_faulted_on_gpu(uvm_gpu_t *gpu,
-                                      uvm_user_channel_t *user_channel,
+static NV_STATUS clear_faulted_on_gpu(uvm_user_channel_t *user_channel,
                                      const uvm_fault_buffer_entry_t *fault_entry,
                                      NvU32 batch_id,
                                      uvm_tracker_t *tracker)
 {
-    if (gpu->parent->has_clear_faulted_channel_method || use_clear_faulted_channel_sw_method(gpu))
-        return clear_faulted_method_on_gpu(gpu, user_channel, fault_entry, batch_id, tracker);
+    uvm_gpu_t *gpu = user_channel->gpu;

-    return clear_faulted_register_on_gpu(gpu, user_channel, fault_entry, batch_id, tracker);
+    if (gpu->parent->has_clear_faulted_channel_method || use_clear_faulted_channel_sw_method(gpu->parent))
+        return clear_faulted_method_on_gpu(user_channel, fault_entry, batch_id, tracker);
+
+    return clear_faulted_register_on_gpu(user_channel, fault_entry, batch_id, tracker);
 }

-static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
-                                                       uvm_va_block_t *va_block,
+static NV_STATUS service_managed_fault_in_block_locked(uvm_va_block_t *va_block,
                                                       uvm_va_block_retry_t *va_block_retry,
                                                       uvm_fault_buffer_entry_t *fault_entry,
                                                       uvm_service_block_context_t *service_context,
                                                       const bool hmm_migratable)
 {
+    uvm_gpu_t *gpu = fault_entry->gpu;
    NV_STATUS status = NV_OK;
    uvm_page_index_t page_index;
    uvm_perf_thrashing_hint_t thrashing_hint;
@@ -441,13 +443,13 @@ static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
    return status;
 }

-static NV_STATUS service_managed_fault_in_block(uvm_gpu_t *gpu,
-                                                uvm_va_block_t *va_block,
+static NV_STATUS service_managed_fault_in_block(uvm_va_block_t *va_block,
                                                uvm_fault_buffer_entry_t *fault_entry,
                                                const bool hmm_migratable)
 {
    NV_STATUS status, tracker_status;
    uvm_va_block_retry_t va_block_retry;
+    uvm_gpu_t *gpu = fault_entry->gpu;
    uvm_service_block_context_t *service_context = &gpu->parent->fault_buffer_info.non_replayable.block_service_context;

    service_context->operation = UVM_SERVICE_OPERATION_NON_REPLAYABLE_FAULTS;
@@ -459,8 +461,7 @@ static NV_STATUS service_managed_fault_in_block(uvm_gpu_t *gpu,
    uvm_mutex_lock(&va_block->lock);

    status = UVM_VA_BLOCK_RETRY_LOCKED(va_block, &va_block_retry,
-                                       service_managed_fault_in_block_locked(gpu,
-                                                                             va_block,
+                                       service_managed_fault_in_block_locked(va_block,
                                                                             &va_block_retry,
                                                                             fault_entry,
                                                                             service_context,
@@ -502,16 +503,14 @@ static void kill_channel_delayed_entry(void *user_channel)
    UVM_ENTRY_VOID(kill_channel_delayed(user_channel));
 }

-static void schedule_kill_channel(uvm_gpu_t *gpu,
-                                  uvm_fault_buffer_entry_t *fault_entry,
-                                  uvm_user_channel_t *user_channel)
+static void schedule_kill_channel(uvm_fault_buffer_entry_t *fault_entry, uvm_user_channel_t *user_channel)
 {
    uvm_va_space_t *va_space = fault_entry->va_space;
-    uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &gpu->parent->fault_buffer_info.non_replayable;
+    uvm_parent_gpu_t *parent_gpu = fault_entry->gpu->parent;
+    uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &parent_gpu->fault_buffer_info.non_replayable;
    void *packet = (char *)non_replayable_faults->shadow_buffer_copy +
-                   (fault_entry->non_replayable.buffer_index * gpu->parent->fault_buffer_hal->entry_size(gpu->parent));
+                   (fault_entry->non_replayable.buffer_index * parent_gpu->fault_buffer_hal->entry_size(parent_gpu));

-    UVM_ASSERT(gpu);
    UVM_ASSERT(va_space);
    UVM_ASSERT(user_channel);

@@ -522,7 +521,7 @@ static void schedule_kill_channel(uvm_gpu_t *gpu,
    user_channel->kill_channel.va_space = va_space;

    // Save the packet to be handled by RM in the channel structure
-    memcpy(user_channel->kill_channel.fault_packet, packet, gpu->parent->fault_buffer_hal->entry_size(gpu->parent));
+    memcpy(user_channel->kill_channel.fault_packet, packet, parent_gpu->fault_buffer_hal->entry_size(parent_gpu));

    // Retain the channel here so it is not prematurely destroyed. It will be
    // released after forwarding the fault to RM in kill_channel_delayed.
@@ -533,7 +532,7 @@ static void schedule_kill_channel(uvm_gpu_t *gpu,
                           kill_channel_delayed_entry,
                           user_channel);

-    nv_kthread_q_schedule_q_item(&gpu->parent->isr.kill_channel_q,
+    nv_kthread_q_schedule_q_item(&parent_gpu->isr.kill_channel_q,
                                 &user_channel->kill_channel.kill_channel_q_item);
 }

@@ -550,6 +549,7 @@ static NV_STATUS service_non_managed_fault(uvm_gpu_va_space_t *gpu_va_space,
                                           uvm_fault_buffer_entry_t *fault_entry,
                                           NV_STATUS lookup_status)
 {
+    uvm_va_space_t *va_space = gpu_va_space->va_space;
    uvm_gpu_t *gpu = gpu_va_space->gpu;
    uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &gpu->parent->fault_buffer_info.non_replayable;
    uvm_ats_fault_invalidate_t *ats_invalidate = &non_replayable_faults->ats_invalidate;
@@ -557,9 +557,11 @@ static NV_STATUS service_non_managed_fault(uvm_gpu_va_space_t *gpu_va_space,
    NV_STATUS fatal_fault_status = NV_ERR_INVALID_ADDRESS;

    UVM_ASSERT(!fault_entry->is_fatal);
+    UVM_ASSERT(fault_entry->va_space == va_space);
+    UVM_ASSERT(fault_entry->gpu == gpu);

    // Avoid dropping fault events when the VA block is not found or cannot be created
-    uvm_perf_event_notify_gpu_fault(&fault_entry->va_space->perf_events,
+    uvm_perf_event_notify_gpu_fault(&va_space->perf_events,
                                    NULL,
                                    gpu->id,
                                    UVM_ID_INVALID,
@@ -577,32 +579,36 @@ static NV_STATUS service_non_managed_fault(uvm_gpu_va_space_t *gpu_va_space,
        uvm_fault_access_type_t fault_access_type = fault_entry->fault_access_type;
        uvm_ats_fault_context_t *ats_context = &non_replayable_faults->ats_context;

-        uvm_page_mask_zero(&ats_context->read_fault_mask);
-        uvm_page_mask_zero(&ats_context->write_fault_mask);
+        uvm_page_mask_zero(&ats_context->faults.read_fault_mask);
+        uvm_page_mask_zero(&ats_context->faults.write_fault_mask);
+        uvm_page_mask_zero(&ats_context->faults.accessed_mask);

        ats_context->client_type = UVM_FAULT_CLIENT_TYPE_HUB;

        ats_invalidate->tlb_batch_pending = false;

-        va_range_next = uvm_va_space_iter_first(gpu_va_space->va_space, fault_entry->fault_address, ~0ULL);
+        va_range_next = uvm_va_space_iter_first(va_space, fault_entry->fault_address, ~0ULL);

        // The VA isn't managed. See if ATS knows about it.
        vma = find_vma_intersection(mm, fault_address, fault_address + 1);
-        if (!vma || uvm_ats_check_in_gmmu_region(gpu_va_space->va_space, fault_address, va_range_next)) {
+        if (!vma || uvm_ats_check_in_gmmu_region(va_space, fault_address, va_range_next)) {

            // Do not return error due to logical errors in the application
            status = NV_OK;
        }
        else {
            NvU64 base = UVM_VA_BLOCK_ALIGN_DOWN(fault_address);
-            uvm_page_mask_t *faults_serviced_mask = &ats_context->faults_serviced_mask;
+            uvm_page_mask_t *faults_serviced_mask = &ats_context->faults.faults_serviced_mask;
+            uvm_page_mask_t *accessed_mask = &ats_context->faults.accessed_mask;
            uvm_page_index_t page_index = (fault_address - base) / PAGE_SIZE;
            uvm_page_mask_t *fault_mask = (fault_access_type >= UVM_FAULT_ACCESS_TYPE_WRITE) ?
-                                                                                       &ats_context->write_fault_mask :
-                                                                                       &ats_context->read_fault_mask;
+                                                                                &ats_context->faults.write_fault_mask :
+                                                                                &ats_context->faults.read_fault_mask;

            uvm_page_mask_set(fault_mask, page_index);

+            uvm_page_mask_set(accessed_mask, page_index);
+
            status = uvm_ats_service_faults(gpu_va_space, vma, base, ats_context);
            if (status == NV_OK) {
                // Invalidate ATS TLB entries if needed
@@ -631,19 +637,24 @@ static NV_STATUS service_non_managed_fault(uvm_gpu_va_space_t *gpu_va_space,
    return status;
 }

-static NV_STATUS service_fault_once(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fault_entry, const bool hmm_migratable)
+static NV_STATUS service_fault_once(uvm_parent_gpu_t *parent_gpu,
+                                    uvm_fault_buffer_entry_t *fault_entry,
+                                    const bool hmm_migratable)
 {
    NV_STATUS status;
    uvm_user_channel_t *user_channel;
    uvm_va_block_t *va_block;
-    uvm_va_space_t *va_space = NULL;
+    uvm_va_space_t *va_space;
    struct mm_struct *mm;
    uvm_gpu_va_space_t *gpu_va_space;
-    uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &gpu->parent->fault_buffer_info.non_replayable;
-    uvm_va_block_context_t *va_block_context =
-        gpu->parent->fault_buffer_info.non_replayable.block_service_context.block_context;
+    uvm_gpu_t *gpu;
+    uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &parent_gpu->fault_buffer_info.non_replayable;
+    uvm_va_block_context_t *va_block_context = non_replayable_faults->block_service_context.block_context;

-    status = uvm_parent_gpu_fault_entry_to_va_space(gpu->parent, fault_entry, &va_space);
+    status = uvm_parent_gpu_fault_entry_to_va_space(parent_gpu,
+                                                    fault_entry,
+                                                    &va_space,
+                                                    &gpu);
    if (status != NV_OK) {
        // The VA space lookup will fail if we're running concurrently with
        // removal of the channel from the VA space (channel unregister, GPU VA
@@ -657,10 +668,12 @@ static NV_STATUS service_fault_once(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fa
        // replayable faults only use the address space of their channel.
        UVM_ASSERT(status == NV_ERR_INVALID_CHANNEL);
        UVM_ASSERT(!va_space);
+        UVM_ASSERT(!gpu);
        return NV_OK;
    }

    UVM_ASSERT(va_space);
+    UVM_ASSERT(gpu);

    // If an mm is registered with the VA space, we have to retain it
    // in order to lock it before locking the VA space. It is guaranteed
@@ -671,8 +684,7 @@ static NV_STATUS service_fault_once(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fa

    uvm_va_space_down_read(va_space);

-    gpu_va_space = uvm_gpu_va_space_get_by_parent_gpu(va_space, gpu->parent);
-
+    gpu_va_space = uvm_gpu_va_space_get(va_space, gpu);
    if (!gpu_va_space) {
        // The va_space might have gone away. See the comment above.
        status = NV_OK;
@@ -680,6 +692,7 @@ static NV_STATUS service_fault_once(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fa
    }

    fault_entry->va_space = va_space;
+    fault_entry->gpu = gpu;

    user_channel = uvm_gpu_va_space_get_user_channel(gpu_va_space, fault_entry->instance_ptr);
    if (!user_channel) {
@@ -692,26 +705,25 @@ static NV_STATUS service_fault_once(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fa

    if (!fault_entry->is_fatal) {
        if (mm) {
-            status = uvm_va_block_find_create(fault_entry->va_space,
+            status = uvm_va_block_find_create(va_space,
                                              fault_entry->fault_address,
                                              &va_block_context->hmm.vma,
                                              &va_block);
        }
        else {
-            status = uvm_va_block_find_create_managed(fault_entry->va_space,
+            status = uvm_va_block_find_create_managed(va_space,
                                                      fault_entry->fault_address,
                                                      &va_block);
        }
        if (status == NV_OK)
-            status = service_managed_fault_in_block(gpu_va_space->gpu, va_block, fault_entry, hmm_migratable);
+            status = service_managed_fault_in_block(va_block, fault_entry, hmm_migratable);
        else
            status = service_non_managed_fault(gpu_va_space, mm, fault_entry, status);

        // We are done, we clear the faulted bit on the channel, so it can be
        // re-scheduled again
        if (status == NV_OK && !fault_entry->is_fatal) {
-            status = clear_faulted_on_gpu(gpu,
-                                          user_channel,
+            status = clear_faulted_on_gpu(user_channel,
                                          fault_entry,
                                          non_replayable_faults->batch_id,
                                          &non_replayable_faults->fault_service_tracker);
@@ -720,13 +732,13 @@ static NV_STATUS service_fault_once(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fa
    }

    if (fault_entry->is_fatal)
-        uvm_tools_record_gpu_fatal_fault(gpu->id, fault_entry->va_space, fault_entry, fault_entry->fatal_reason);
+        uvm_tools_record_gpu_fatal_fault(gpu->id, va_space, fault_entry, fault_entry->fatal_reason);

    if (fault_entry->is_fatal ||
        (status != NV_OK &&
         status != NV_WARN_MORE_PROCESSING_REQUIRED &&
         status != NV_WARN_MISMATCHED_TARGET))
-        schedule_kill_channel(gpu, fault_entry, user_channel);
+        schedule_kill_channel(fault_entry, user_channel);

 exit_no_channel:
    uvm_va_space_up_read(va_space);
@@ -735,22 +747,23 @@ exit_no_channel:
    if (status != NV_OK &&
        status != NV_WARN_MORE_PROCESSING_REQUIRED &&
        status != NV_WARN_MISMATCHED_TARGET)
-        UVM_DBG_PRINT("Error servicing non-replayable faults on GPU: %s\n", uvm_gpu_name(gpu));
+        UVM_DBG_PRINT("Error servicing non-replayable faults on GPU: %s\n",
+                      uvm_parent_gpu_name(parent_gpu));

    return status;
 }

-static NV_STATUS service_fault(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fault_entry)
+static NV_STATUS service_fault(uvm_parent_gpu_t *parent_gpu, uvm_fault_buffer_entry_t *fault_entry)
 {
    uvm_service_block_context_t *service_context =
-        &gpu->parent->fault_buffer_info.non_replayable.block_service_context;
+        &parent_gpu->fault_buffer_info.non_replayable.block_service_context;
    NV_STATUS status;
    bool hmm_migratable = true;

    service_context->num_retries = 0;

    do {
-        status = service_fault_once(gpu, fault_entry, hmm_migratable);
+        status = service_fault_once(parent_gpu, fault_entry, hmm_migratable);
        if (status == NV_WARN_MISMATCHED_TARGET) {
            hmm_migratable = false;
            status = NV_WARN_MORE_PROCESSING_REQUIRED;
@@ -760,7 +773,7 @@ static NV_STATUS service_fault(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fault_e
    return status;
 }

-void uvm_gpu_service_non_replayable_fault_buffer(uvm_gpu_t *gpu)
+void uvm_parent_gpu_service_non_replayable_fault_buffer(uvm_parent_gpu_t *parent_gpu)
 {
    NvU32 cached_faults;

@@ -772,7 +785,7 @@ void uvm_gpu_service_non_replayable_fault_buffer(uvm_gpu_t *gpu)
        NV_STATUS status;
        NvU32 i;

-        status = fetch_non_replayable_fault_buffer_entries(gpu->parent, &cached_faults);
+        status = fetch_non_replayable_fault_buffer_entries(parent_gpu, &cached_faults);
        if (status != NV_OK)
            return;

@@ -780,7 +793,7 @@ void uvm_gpu_service_non_replayable_fault_buffer(uvm_gpu_t *gpu)
        // non-replayable faults since getting multiple faults on the same
        // memory region is not very likely
        for (i = 0; i < cached_faults; ++i) {
-            status = service_fault(gpu, &gpu->parent->fault_buffer_info.non_replayable.fault_cache[i]);
+            status = service_fault(parent_gpu, &parent_gpu->fault_buffer_info.non_replayable.fault_cache[i]);
            if (status != NV_OK)
                return;
        }
Author	SHA1	Message	Date
Bernhard Stoeckner	9d0b0414a5	565.77	2024-12-05 16:37:35 +01:00
Bernhard Stoeckner	d5a0858f90	565.57.01	2024-10-22 17:38:58 +02:00
Gaurav Juvekar	ed4be64962	560.35.03	2024-08-19 10:46:21 -07:00
Gaurav Juvekar	315fd96d2d	560.31.02	2024-07-31 11:27:06 -07:00
Gaurav Juvekar	448d5cc656	560.28.03	2024-07-19 15:45:15 -07:00
Bernhard Stoeckner	5fdf5032fb	555.58.02 (cherry picked from commit `1795a8bb20`)	2024-07-19 15:38:11 -07:00
Milos Tijanic	171c735e57	555.58 (cherry picked from commit `af77e083a2`)	2024-07-19 15:38:08 -07:00
Bernhard Stoeckner	74ee05e160	555.52.04 (cherry picked from commit `78d807e001`)	2024-07-19 15:38:04 -07:00
Bernhard Stoeckner	3084c04453	555.42.02 (cherry picked from commit `5a1c474040`)	2024-07-19 15:38:00 -07:00
Bernhard Stoeckner	caa2dd11a0	550.100	2024-07-09 15:49:19 +02:00
Bernhard Stoeckner	e45d91de02	550.90.07	2024-06-04 13:48:03 +02:00
Bernhard Stoeckner	083cd9cf17	550.78	2024-04-25 16:24:58 +02:00
Bernhard Stoeckner	ea4c27fad6	550.76	2024-04-17 17:23:37 +02:00
Bernhard Stoeckner	3bf16b890c	550.67	2024-03-19 16:56:28 +01:00
Bernhard Stoeckner	12933b2d3c	550.54.15	2024-03-18 17:52:11 +01:00