Compare commits

..

11 Commits

Author SHA1 Message Date
Bernhard Stoeckner 8845de1ce4 535.216.03 2024-11-19 17:42:03 +01:00
Bernhard Stoeckner 60d85c464b 535.216.01 2024-10-22 17:35:00 +02:00
Bernhard Stoeckner c588c3877f 535.183.06 2024-07-09 17:24:25 +02:00
Bernhard Stoeckner 4459285b60 535.183.01 2024-06-04 10:45:14 +02:00
Gaurav Juvekar f4bdce9a0a 535.179 2024-05-08 08:14:09 -07:00
Bernhard Stoeckner c042c7903d 535.171.04 2024-03-21 14:23:59 +01:00
Bernhard Stoeckner 044f70bbb8 535.161.08 2024-03-18 17:57:23 +01:00
Bernhard Stoeckner 6d33efe502 535.161.07 2024-02-22 17:28:26 +01:00
Bernhard Stoeckner ee55481a49 535.154.05 2024-01-16 14:59:49 +01:00
Bernhard Stoeckner 7165299dee 535.146.02 2023-12-07 15:10:34 +01:00
Bernhard Stoeckner e573018659 535.129.03 2023-10-31 14:22:38 +01:00
994 changed files with 124641 additions and 144695 deletions

View File

@@ -32,14 +32,6 @@ body:
description: "Which kernel are you running? (output of `uname -a`, say if you built it yourself)."
validations:
required: true
- type: checkboxes
id: sw_host_kernel_stable
attributes:
label: "Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels."
options:
- label: "I am running on a stable kernel release."
validations:
required: true
- type: textarea
id: bug_description
attributes:

View File

@@ -1,196 +0,0 @@
# Changelog
## Release 545 Entries
### [545.29.06] 2023-11-22
#### Fixed
- Fixed broken NVIDIA brightness control, [#573](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/573)
### [545.29.02] 2023-10-31
### [545.23.06] 2023-10-17
#### Fixed
- Fix always-false conditional, [#493](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/493) by @meme8383
#### Added
- Added beta-quality support for GeForce and Workstation GPUs. Please see the "Open Linux Kernel Modules" chapter in the NVIDIA GPU driver end user README for details.
## Release 535 Entries
### [535.129.03] 2023-10-31
### [535.113.01] 2023-09-21
#### Fixed
- Fixed build failure of main against current CentOS Stream 8, [#550](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/550) by @airlied
### [535.104.05] 2023-08-22
### [535.98] 2023-08-08
### [535.86.10] 2023-07-31
### [535.86.05] 2023-07-18
### [535.54.03] 2023-06-14
### [535.43.02] 2023-05-30
#### Fixed
- Fixed console restore with traditional VGA consoles.
#### Added
- Added support for Run Time D3 (RTD3) on Ampere and later GPUs.
- Added support for G-Sync on desktop GPUs.
## Release 530 Entries
### [530.41.03] 2023-03-23
### [530.30.02] 2023-02-28
#### Changed
- GSP firmware is now distributed as `gsp_tu10x.bin` and `gsp_ga10x.bin` to better reflect the GPU architectures supported by each firmware file in this release.
- The .run installer will continue to install firmware to /lib/firmware/nvidia/<version> and the nvidia.ko kernel module will load the appropriate firmware for each GPU at runtime.
#### Fixed
- Add support for resizable BAR on Linux when NVreg_EnableResizableBar=1 module param is set. [#3](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/3) by @sjkelly
#### Added
- Support for power management features like Suspend, Hibernate and Resume.
## Release 525 Entries
### [525.147.05] 2023-10-31
#### Fixed
- Fix nvidia_p2p_get_pages(): Fix double-free in register-callback error path, [#557](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/557) by @BrendanCunningham
### [525.125.06] 2023-06-26
### [525.116.04] 2023-05-09
### [525.116.03] 2023-04-25
### [525.105.17] 2023-03-30
### [525.89.02] 2023-02-08
### [525.85.12] 2023-01-30
### [525.85.05] 2023-01-19
#### Fixed
- Fix build problems with Clang 15.0, [#377](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/377) by @ptr1337
### [525.78.01] 2023-01-05
### [525.60.13] 2022-12-05
### [525.60.11] 2022-11-28
#### Fixed
- Fixed nvenc compatibility with usermode clients [#104](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/104)
### [525.53] 2022-11-10
#### Changed
- GSP firmware is now distributed as multiple firmware files: this release has `gsp_tu10x.bin` and `gsp_ad10x.bin` replacing `gsp.bin` from previous releases.
- Each file is named after a GPU architecture and supports GPUs from one or more architectures. This allows GSP firmware to better leverage each architecture's capabilities.
- The .run installer will continue to install firmware to `/lib/firmware/nvidia/<version>` and the `nvidia.ko` kernel module will load the appropriate firmware for each GPU at runtime.
#### Fixed
- Add support for IBT (indirect branch tracking) on supported platforms, [#256](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/256) by @rnd-ash
- Return EINVAL when [failing to] allocate memory, [#280](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/280) by @YusufKhan-gamedev
- Fix various typos in nvidia/src/kernel, [#16](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/16) by @alexisgeoffrey
- Added support for rotation in X11, Quadro Sync, Stereo, and YUV 4:2:0 on Turing.
## Release 520 Entries
### [520.61.07] 2022-10-20
### [520.56.06] 2022-10-12
#### Added
- Introduce support for GeForce RTX 4090 GPUs.
### [520.61.05] 2022-10-10
#### Added
- Introduce support for NVIDIA H100 GPUs.
#### Fixed
- Fix/Improve Makefile, [#308](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/308/) by @izenynn
- Make nvLogBase2 more efficient, [#177](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/177/) by @DMaroo
- nv-pci: fixed always true expression, [#195](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/195/) by @ValZapod
## Release 515 Entries
### [515.76] 2022-09-20
#### Fixed
- Improved compatibility with new Linux kernel releases
- Fixed possible excessive GPU power draw on an idle X11 or Wayland desktop when driving high resolutions or refresh rates
### [515.65.07] 2022-10-19
### [515.65.01] 2022-08-02
#### Fixed
- Collection of minor fixes to issues, [#61](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/61) by @Joshua-Ashton
- Remove unnecessary use of acpi_bus_get_device().
### [515.57] 2022-06-28
#### Fixed
- Backtick is deprecated, [#273](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/273) by @arch-user-france1
### [515.48.07] 2022-05-31
#### Added
- List of compatible GPUs in README.md.
#### Fixed
- Fix various README capitalizations, [#8](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/8) by @27lx
- Automatically tag bug report issues, [#15](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/15) by @thebeanogamer
- Improve conftest.sh Script, [#37](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/37) by @Nitepone
- Update HTTP link to HTTPS, [#101](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/101) by @alcaparra
- moved array sanity check to before the array access, [#117](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/117) by @RealAstolfo
- Fixed some typos, [#122](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/122) by @FEDOyt
- Fixed capitalization, [#123](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/123) by @keroeslux
- Fix typos in NVDEC Engine Descriptor, [#126](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/126) from @TrickyDmitriy
- Extranous apostrohpes in a makefile script [sic], [#14](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/14) by @kiroma
- HDMI no audio @ 4K above 60Hz, [#75](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/75) by @adolfotregosa
- dp_configcaps.cpp:405: array index sanity check in wrong place?, [#110](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/110) by @dcb314
- NVRM kgspInitRm_IMPL: missing NVDEC0 engine, cannot initialize GSP-RM, [#116](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/116) by @kfazz
- ERROR: modpost: "backlight_device_register" [...nvidia-modeset.ko] undefined, [#135](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/135) by @sndirsch
- aarch64 build fails, [#151](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/151) by @frezbo
### [515.43.04] 2022-05-11
- Initial release.

View File

@@ -1,7 +1,7 @@
# NVIDIA Linux Open GPU Kernel Module Source
This is the source release of the NVIDIA Linux open GPU kernel modules,
version 545.29.06.
version 535.216.03.
## How to Build
@@ -17,7 +17,7 @@ as root:
Note that the kernel modules built here must be used with GSP
firmware and user-space NVIDIA GPU driver components from a corresponding
545.29.06 driver release. This can be achieved by installing
535.216.03 driver release. This can be achieved by installing
the NVIDIA GPU driver from the .run file using the `--no-kernel-modules`
option. E.g.,
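(Illustrative only — the exact `.run` file name depends on the driver version and target architecture; `NVIDIA-Linux-x86_64-535.216.03.run` below is an assumed package name.)

    sh ./NVIDIA-Linux-x86_64-535.216.03.run --no-kernel-modules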
@@ -179,16 +179,16 @@ software applications.
## Compatible GPUs
The NVIDIA open kernel modules can be used on any Turing or later GPU
(see the table below). However, in the __DRIVER_VERSION__ release, GeForce and
Workstation support is considered to be Beta quality. The open kernel modules
are suitable for broad usage, and NVIDIA requests feedback on any issues
encountered specific to them.
The open-gpu-kernel-modules can be used on any Turing or later GPU
(see the table below). However, in the 535.216.03 release,
GeForce and Workstation support is still considered alpha-quality.
For details on feature support and limitations, see the NVIDIA GPU driver
end user README here:
To enable use of the open kernel modules on GeForce and Workstation GPUs,
set the "NVreg_OpenRmEnableUnsupportedGpus" nvidia.ko kernel module
parameter to 1. For more details, see the NVIDIA GPU driver end user
README here:
https://us.download.nvidia.com/XFree86/Linux-x86_64/545.29.06/README/kernel_open.html
https://us.download.nvidia.com/XFree86/Linux-x86_64/535.216.03/README/kernel_open.html
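As an illustration, the parameter can be set persistently with a modprobe configuration entry such as the following (the file name `/etc/modprobe.d/nvidia-open.conf` is an assumed example; the option can equally be passed on the module or kernel command line):

    options nvidia NVreg_OpenRmEnableUnsupportedGpus=1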
In the below table, if three IDs are listed, the first is the PCI Device
ID, the second is the PCI Subsystem Vendor ID, and the third is the PCI
@@ -648,6 +648,7 @@ Subsystem Device ID.
| NVIDIA T1000 8GB | 1FF0 17AA 1612 |
| NVIDIA T400 4GB | 1FF2 1028 1613 |
| NVIDIA T400 4GB | 1FF2 103C 1613 |
| NVIDIA T400E | 1FF2 103C 18FF |
| NVIDIA T400 4GB | 1FF2 103C 8A80 |
| NVIDIA T400 4GB | 1FF2 10DE 1613 |
| NVIDIA T400 4GB | 1FF2 17AA 1613 |
@@ -683,6 +684,7 @@ Subsystem Device ID.
| NVIDIA A800 40GB Active | 20F6 103C 180A |
| NVIDIA A800 40GB Active | 20F6 10DE 180A |
| NVIDIA A800 40GB Active | 20F6 17AA 180A |
| NVIDIA AX800 | 20FD 10DE 17F8 |
| NVIDIA GeForce GTX 1660 Ti | 2182 |
| NVIDIA GeForce GTX 1660 | 2184 |
| NVIDIA GeForce GTX 1650 SUPER | 2187 |
@@ -745,12 +747,16 @@ Subsystem Device ID.
| NVIDIA H800 PCIe | 2322 10DE 17A4 |
| NVIDIA H800 | 2324 10DE 17A6 |
| NVIDIA H800 | 2324 10DE 17A8 |
| NVIDIA H20 | 2329 10DE 198B |
| NVIDIA H20 | 2329 10DE 198C |
| NVIDIA H20-3e | 232C 10DE 2063 |
| NVIDIA H100 80GB HBM3 | 2330 10DE 16C0 |
| NVIDIA H100 80GB HBM3 | 2330 10DE 16C1 |
| NVIDIA H100 PCIe | 2331 10DE 1626 |
| NVIDIA H100 | 2339 10DE 17FC |
| NVIDIA H800 NVL | 233A 10DE 183A |
| NVIDIA GH200 120GB | 2342 10DE 16EB |
| NVIDIA GH200 120GB | 2342 10DE 1805 |
| NVIDIA GH200 480GB | 2342 10DE 1809 |
| NVIDIA GeForce RTX 3060 Ti | 2414 |
| NVIDIA GeForce RTX 3080 Ti Laptop GPU | 2420 |
@@ -804,6 +810,7 @@ Subsystem Device ID.
| NVIDIA RTX A2000 12GB | 2571 10DE 1611 |
| NVIDIA RTX A2000 12GB | 2571 17AA 1611 |
| NVIDIA GeForce RTX 3050 | 2582 |
| NVIDIA GeForce RTX 3050 | 2584 |
| NVIDIA GeForce RTX 3050 Ti Laptop GPU | 25A0 |
| NVIDIA GeForce RTX 3050Ti Laptop GPU | 25A0 103C 8928 |
| NVIDIA GeForce RTX 3050Ti Laptop GPU | 25A0 103C 89F9 |
@@ -844,10 +851,13 @@ Subsystem Device ID.
| NVIDIA RTX 5000 Ada Generation | 26B2 103C 17FA |
| NVIDIA RTX 5000 Ada Generation | 26B2 10DE 17FA |
| NVIDIA RTX 5000 Ada Generation | 26B2 17AA 17FA |
| NVIDIA RTX 5880 Ada Generation | 26B3 103C 1934 |
| NVIDIA RTX 5880 Ada Generation | 26B3 10DE 1934 |
| NVIDIA L40 | 26B5 10DE 169D |
| NVIDIA L40 | 26B5 10DE 17DA |
| NVIDIA L40S | 26B9 10DE 1851 |
| NVIDIA L40S | 26B9 10DE 18CF |
| NVIDIA L20 | 26BA 10DE 1957 |
| NVIDIA GeForce RTX 4080 | 2704 |
| NVIDIA GeForce RTX 4090 Laptop GPU | 2717 |
| NVIDIA RTX 5000 Ada Generation Laptop GPU | 2730 |
@@ -868,6 +878,7 @@ Subsystem Device ID.
| NVIDIA RTX 4000 Ada Generation | 27B2 103C 181B |
| NVIDIA RTX 4000 Ada Generation | 27B2 10DE 181B |
| NVIDIA RTX 4000 Ada Generation | 27B2 17AA 181B |
| NVIDIA L2 | 27B6 10DE 1933 |
| NVIDIA L4 | 27B8 10DE 16CA |
| NVIDIA L4 | 27B8 10DE 16EE |
| NVIDIA RTX 4000 Ada Generation Laptop GPU | 27BA |
@@ -883,6 +894,9 @@ Subsystem Device ID.
| NVIDIA GeForce RTX 4060 Laptop GPU | 28A0 |
| NVIDIA GeForce RTX 4050 Laptop GPU | 28A1 |
| NVIDIA RTX 2000 Ada Generation Laptop GPU | 28B8 |
| NVIDIA RTX 1000 Ada Generation Laptop GPU | 28B9 |
| NVIDIA RTX 500 Ada Generation Laptop GPU | 28BA |
| NVIDIA RTX 500 Ada Generation Laptop GPU | 28BB |
| NVIDIA GeForce RTX 4060 Laptop GPU | 28E0 |
| NVIDIA GeForce RTX 4050 Laptop GPU | 28E1 |
| NVIDIA RTX 2000 Ada Generation Embedded GPU | 28F8 |

View File

@@ -72,24 +72,12 @@ EXTRA_CFLAGS += -I$(src)/common/inc
EXTRA_CFLAGS += -I$(src)
EXTRA_CFLAGS += -Wall $(DEFINES) $(INCLUDES) -Wno-cast-qual -Wno-error -Wno-format-extra-args
EXTRA_CFLAGS += -D__KERNEL__ -DMODULE -DNVRM
EXTRA_CFLAGS += -DNV_VERSION_STRING=\"545.29.06\"
EXTRA_CFLAGS += -DNV_VERSION_STRING=\"535.216.03\"
ifneq ($(SYSSRCHOST1X),)
EXTRA_CFLAGS += -I$(SYSSRCHOST1X)
endif
# Some Android kernels prohibit driver use of filesystem functions like
# filp_open() and kernel_read(). Disable the NV_FILESYSTEM_ACCESS_AVAILABLE
# functionality that uses those functions when building for Android.
PLATFORM_IS_ANDROID ?= 0
ifeq ($(PLATFORM_IS_ANDROID),1)
EXTRA_CFLAGS += -DNV_FILESYSTEM_ACCESS_AVAILABLE=0
else
EXTRA_CFLAGS += -DNV_FILESYSTEM_ACCESS_AVAILABLE=1
endif
EXTRA_CFLAGS += -Wno-unused-function
ifneq ($(NV_BUILD_TYPE),debug)
@@ -104,6 +92,7 @@ endif
ifeq ($(NV_BUILD_TYPE),debug)
EXTRA_CFLAGS += -g
EXTRA_CFLAGS += $(call cc-option,-gsplit-dwarf,)
endif
EXTRA_CFLAGS += -ffreestanding
@@ -134,6 +123,9 @@ ifneq ($(wildcard /proc/sgi_uv),)
EXTRA_CFLAGS += -DNV_CONFIG_X86_UV
endif
ifdef VGX_FORCE_VFIO_PCI_CORE
EXTRA_CFLAGS += -DNV_VGPU_FORCE_VFIO_PCI_CORE
endif
#
# The conftest.sh script tests various aspects of the target kernel.
@@ -160,6 +152,8 @@ NV_CONFTEST_CMD := /bin/sh $(NV_CONFTEST_SCRIPT) \
NV_CFLAGS_FROM_CONFTEST := $(shell $(NV_CONFTEST_CMD) build_cflags)
NV_CONFTEST_CFLAGS = $(NV_CFLAGS_FROM_CONFTEST) $(EXTRA_CFLAGS) -fno-pie
NV_CONFTEST_CFLAGS += $(call cc-disable-warning,pointer-sign)
NV_CONFTEST_CFLAGS += $(call cc-option,-fshort-wchar,)
NV_CONFTEST_COMPILE_TEST_HEADERS := $(obj)/conftest/macros.h
NV_CONFTEST_COMPILE_TEST_HEADERS += $(obj)/conftest/functions.h
@@ -225,7 +219,6 @@ $(obj)/conftest/patches.h: $(NV_CONFTEST_SCRIPT)
NV_HEADER_PRESENCE_TESTS = \
asm/system.h \
drm/drmP.h \
drm/drm_aperture.h \
drm/drm_auth.h \
drm/drm_gem.h \
drm/drm_crtc.h \
@@ -236,7 +229,6 @@ NV_HEADER_PRESENCE_TESTS = \
drm/drm_encoder.h \
drm/drm_atomic_uapi.h \
drm/drm_drv.h \
drm/drm_fbdev_generic.h \
drm/drm_framebuffer.h \
drm/drm_connector.h \
drm/drm_probe_helper.h \
@@ -270,7 +262,6 @@ NV_HEADER_PRESENCE_TESTS = \
linux/sched/task_stack.h \
xen/ioemu.h \
linux/fence.h \
linux/dma-fence.h \
linux/dma-resv.h \
soc/tegra/chip-id.h \
soc/tegra/fuse.h \
@@ -316,7 +307,6 @@ NV_HEADER_PRESENCE_TESTS = \
linux/mdev.h \
soc/tegra/bpmp-abi.h \
soc/tegra/bpmp.h \
linux/sync_file.h \
linux/cc_platform.h \
asm/cpufeature.h

View File

@@ -28,7 +28,7 @@ else
else
KERNEL_UNAME ?= $(shell uname -r)
KERNEL_MODLIB := /lib/modules/$(KERNEL_UNAME)
KERNEL_SOURCES := $(shell test -d $(KERNEL_MODLIB)/source && echo $(KERNEL_MODLIB)/source || echo $(KERNEL_MODLIB)/build)
KERNEL_SOURCES := $(shell ((test -d $(KERNEL_MODLIB)/source && echo $(KERNEL_MODLIB)/source) || (test -d $(KERNEL_MODLIB)/build/source && echo $(KERNEL_MODLIB)/build/source)) || echo $(KERNEL_MODLIB)/build)
endif
KERNEL_OUTPUT := $(KERNEL_SOURCES)
@@ -42,7 +42,11 @@ else
else
KERNEL_UNAME ?= $(shell uname -r)
KERNEL_MODLIB := /lib/modules/$(KERNEL_UNAME)
ifeq ($(KERNEL_SOURCES), $(KERNEL_MODLIB)/source)
# $(filter pattern...,text) - Returns all whitespace-separated words in text that
# do match any of the pattern words, removing any words that do not match.
# Set the KERNEL_OUTPUT only if either $(KERNEL_MODLIB)/source or
# $(KERNEL_MODLIB)/build/source path matches the KERNEL_SOURCES.
ifneq ($(filter $(KERNEL_SOURCES),$(KERNEL_MODLIB)/source $(KERNEL_MODLIB)/build/source),)
KERNEL_OUTPUT := $(KERNEL_MODLIB)/build
KBUILD_PARAMS := KBUILD_OUTPUT=$(KERNEL_OUTPUT)
endif

View File

@@ -37,13 +37,11 @@ typedef enum _HYPERVISOR_TYPE
OS_HYPERVISOR_UNKNOWN
} HYPERVISOR_TYPE;
#define CMD_VGPU_VFIO_WAKE_WAIT_QUEUE 0
#define CMD_VGPU_VFIO_INJECT_INTERRUPT 1
#define CMD_VGPU_VFIO_REGISTER_MDEV 2
#define CMD_VGPU_VFIO_PRESENT 3
#define CMD_VFIO_PCI_CORE_PRESENT 4
#define CMD_VFIO_WAKE_REMOVE_GPU 1
#define CMD_VGPU_VFIO_PRESENT 2
#define CMD_VFIO_PCI_CORE_PRESENT 3
#define MAX_VF_COUNT_PER_GPU 64
typedef enum _VGPU_TYPE_INFO
{
@@ -54,17 +52,11 @@ typedef enum _VGPU_TYPE_INFO
typedef struct
{
void *vgpuVfioRef;
void *waitQueue;
void *nv;
NvU32 *vgpuTypeIds;
NvU8 **vgpuNames;
NvU32 numVgpuTypes;
NvU32 domain;
NvU8 bus;
NvU8 slot;
NvU8 function;
NvBool is_virtfn;
NvU32 domain;
NvU32 bus;
NvU32 device;
NvU32 return_status;
} vgpu_vfio_info;
typedef struct

View File

@@ -25,12 +25,14 @@
#ifndef NV_IOCTL_NUMA_H
#define NV_IOCTL_NUMA_H
#if defined(NV_LINUX)
#include <nv-ioctl-numbers.h>
#if defined(NV_KERNEL_INTERFACE_LAYER) && defined(NV_LINUX)
#if defined(NV_KERNEL_INTERFACE_LAYER)
#include <linux/types.h>
#elif defined (NV_KERNEL_INTERFACE_LAYER) && defined(NV_BSD)
#include <sys/stdint.h>
#else
#include <stdint.h>
@@ -79,3 +81,5 @@ typedef struct nv_ioctl_set_numa_status
#define NV_IOCTL_NUMA_STATUS_OFFLINE_FAILED 6
#endif
#endif

View File

@@ -24,14 +24,13 @@
#ifndef __NV_KTHREAD_QUEUE_H__
#define __NV_KTHREAD_QUEUE_H__
struct nv_kthread_q;
struct nv_kthread_q_item;
typedef struct nv_kthread_q nv_kthread_q_t;
typedef struct nv_kthread_q_item nv_kthread_q_item_t;
#include <linux/types.h> // atomic_t
#include <linux/list.h> // list
#include <linux/sched.h> // task_struct
#include <linux/numa.h> // NUMA_NO_NODE
#include <linux/semaphore.h>
typedef void (*nv_q_func_t)(void *args);
#include "nv-kthread-q-os.h"
#include "conftest.h"
////////////////////////////////////////////////////////////////////////////////
// nv_kthread_q:
@@ -86,6 +85,38 @@ typedef void (*nv_q_func_t)(void *args);
//
////////////////////////////////////////////////////////////////////////////////
typedef struct nv_kthread_q nv_kthread_q_t;
typedef struct nv_kthread_q_item nv_kthread_q_item_t;
typedef void (*nv_q_func_t)(void *args);
struct nv_kthread_q
{
struct list_head q_list_head;
spinlock_t q_lock;
// This is a counting semaphore. It gets incremented and decremented
// exactly once for each item that is added to the queue.
struct semaphore q_sem;
atomic_t main_loop_should_exit;
struct task_struct *q_kthread;
};
struct nv_kthread_q_item
{
struct list_head q_list_node;
nv_q_func_t function_to_run;
void *function_args;
};
#ifndef NUMA_NO_NODE
#define NUMA_NO_NODE (-1)
#endif
#define NV_KTHREAD_NO_NODE NUMA_NO_NODE
//
// The queue must not be used before calling this routine.
//
@@ -124,7 +155,10 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q,
// This routine is the same as nv_kthread_q_init_on_node() with the exception
// that the queue stack will be allocated on the NUMA node of the caller.
//
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname);
static inline int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
{
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
}
//
// The caller is responsible for stopping all queues, by calling this routine

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2001-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2001-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -248,7 +248,7 @@ NV_STATUS nvos_forward_error_to_cray(struct pci_dev *, NvU32,
#undef NV_SET_PAGES_UC_PRESENT
#endif
#if !defined(NVCPU_AARCH64) && !defined(NVCPU_PPC64LE) && !defined(NVCPU_RISCV64)
#if !defined(NVCPU_AARCH64) && !defined(NVCPU_PPC64LE)
#if !defined(NV_SET_MEMORY_UC_PRESENT) && !defined(NV_SET_PAGES_UC_PRESENT)
#error "This driver requires the ability to change memory types!"
#endif
@@ -430,11 +430,6 @@ extern NvBool nvos_is_chipset_io_coherent(void);
#define CACHE_FLUSH() asm volatile("sync; \n" \
"isync; \n" ::: "memory")
#define WRITE_COMBINE_FLUSH() CACHE_FLUSH()
#elif defined(NVCPU_RISCV64)
#define CACHE_FLUSH() mb()
#define WRITE_COMBINE_FLUSH() CACHE_FLUSH()
#else
#error "CACHE_FLUSH() and WRITE_COMBINE_FLUSH() need to be defined for this architecture."
#endif
typedef enum
@@ -445,7 +440,7 @@ typedef enum
NV_MEMORY_TYPE_DEVICE_MMIO, /* All kinds of MMIO referred by NVRM e.g. BARs and MCFG of device */
} nv_memory_type_t;
#if defined(NVCPU_AARCH64) || defined(NVCPU_PPC64LE) || defined(NVCPU_RISCV64)
#if defined(NVCPU_AARCH64) || defined(NVCPU_PPC64LE)
#define NV_ALLOW_WRITE_COMBINING(mt) 1
#elif defined(NVCPU_X86_64)
#if defined(NV_ENABLE_PAT_SUPPORT)
@@ -504,7 +499,9 @@ static inline void *nv_vmalloc(unsigned long size)
void *ptr = __vmalloc(size, GFP_KERNEL);
#endif
if (ptr)
{
NV_MEMDBG_ADD(ptr, size);
}
return ptr;
}
@@ -522,7 +519,9 @@ static inline void *nv_ioremap(NvU64 phys, NvU64 size)
void *ptr = ioremap(phys, size);
#endif
if (ptr)
{
NV_MEMDBG_ADD(ptr, size);
}
return ptr;
}
@@ -558,8 +557,9 @@ static inline void *nv_ioremap_cache(NvU64 phys, NvU64 size)
#endif
if (ptr)
{
NV_MEMDBG_ADD(ptr, size);
}
return ptr;
}
@@ -575,8 +575,9 @@ static inline void *nv_ioremap_wc(NvU64 phys, NvU64 size)
#endif
if (ptr)
{
NV_MEMDBG_ADD(ptr, size);
}
return ptr;
}
@@ -705,7 +706,9 @@ static inline NvUPtr nv_vmap(struct page **pages, NvU32 page_count,
/* All memory cached in PPC64LE; can't honor 'cached' input. */
ptr = vmap(pages, page_count, VM_MAP, prot);
if (ptr)
{
NV_MEMDBG_ADD(ptr, page_count * PAGE_SIZE);
}
return (NvUPtr)ptr;
}
@@ -758,6 +761,7 @@ static inline dma_addr_t nv_phys_to_dma(struct device *dev, NvU64 pa)
#define NV_VMA_FILE(vma) ((vma)->vm_file)
#define NV_DEVICE_MINOR_NUMBER(x) minor((x)->i_rdev)
#define NV_CONTROL_DEVICE_MINOR 255
#define NV_PCI_DISABLE_DEVICE(pci_dev) \
{ \
@@ -1607,6 +1611,10 @@ typedef struct nv_linux_state_s {
struct nv_dma_device dma_dev;
struct nv_dma_device niso_dma_dev;
#if defined(NV_VGPU_KVM_BUILD)
wait_queue_head_t wait;
NvS32 return_status;
#endif
} nv_linux_state_t;
extern nv_linux_state_t *nv_linux_devices;
@@ -1650,11 +1658,20 @@ typedef struct nvidia_event
nv_event_t event;
} nvidia_event_t;
typedef enum
{
NV_FOPS_STACK_INDEX_MMAP,
NV_FOPS_STACK_INDEX_IOCTL,
NV_FOPS_STACK_INDEX_COUNT
} nvidia_entry_point_index_t;
typedef struct
{
nv_file_private_t nvfp;
nvidia_stack_t *sp;
nvidia_stack_t *fops_sp[NV_FOPS_STACK_INDEX_COUNT];
struct semaphore fops_sp_lock[NV_FOPS_STACK_INDEX_COUNT];
nv_alloc_t *free_list;
void *nvptr;
nvidia_event_t *event_data_head, *event_data_tail;
@@ -1684,6 +1701,28 @@ static inline nv_linux_file_private_t *nv_get_nvlfp_from_nvfp(nv_file_private_t
#define NV_STATE_PTR(nvl) &(((nv_linux_state_t *)(nvl))->nv_state)
static inline nvidia_stack_t *nv_nvlfp_get_sp(nv_linux_file_private_t *nvlfp, nvidia_entry_point_index_t which)
{
#if defined(NVCPU_X86_64)
if (rm_is_altstack_in_use())
{
down(&nvlfp->fops_sp_lock[which]);
return nvlfp->fops_sp[which];
}
#endif
return NULL;
}
static inline void nv_nvlfp_put_sp(nv_linux_file_private_t *nvlfp, nvidia_entry_point_index_t which)
{
#if defined(NVCPU_X86_64)
if (rm_is_altstack_in_use())
{
up(&nvlfp->fops_sp_lock[which]);
}
#endif
}
#define NV_ATOMIC_READ(data) atomic_read(&(data))
#define NV_ATOMIC_SET(data,val) atomic_set(&(data), (val))
#define NV_ATOMIC_INC(data) atomic_inc(&(data))
@@ -1955,31 +1994,6 @@ static inline NvBool nv_platform_use_auto_online(nv_linux_state_t *nvl)
return nvl->numa_info.use_auto_online;
}
typedef struct {
NvU64 base;
NvU64 size;
NvU32 nodeId;
int ret;
} remove_numa_memory_info_t;
static void offline_numa_memory_callback
(
void *args
)
{
#ifdef NV_OFFLINE_AND_REMOVE_MEMORY_PRESENT
remove_numa_memory_info_t *pNumaInfo = (remove_numa_memory_info_t *)args;
#ifdef NV_REMOVE_MEMORY_HAS_NID_ARG
pNumaInfo->ret = offline_and_remove_memory(pNumaInfo->nodeId,
pNumaInfo->base,
pNumaInfo->size);
#else
pNumaInfo->ret = offline_and_remove_memory(pNumaInfo->base,
pNumaInfo->size);
#endif
#endif
}
typedef enum
{
NV_NUMA_STATUS_DISABLED = 0,
@@ -2040,4 +2054,7 @@ typedef enum
#include <linux/clk-provider.h>
#endif
#define NV_EXPORT_SYMBOL(symbol) EXPORT_SYMBOL_GPL(symbol)
#define NV_CHECK_EXPORT_SYMBOL(symbol) NV_IS_EXPORT_SYMBOL_PRESENT_##symbol
#endif /* _NV_LINUX_H_ */

View File

@@ -119,13 +119,6 @@ static inline pgprot_t pgprot_modify_writecombine(pgprot_t old_prot)
#define NV_PGPROT_WRITE_COMBINED(old_prot) old_prot
#define NV_PGPROT_READ_ONLY(old_prot) \
__pgprot(pgprot_val((old_prot)) & ~NV_PAGE_RW)
#elif defined(NVCPU_RISCV64)
#define NV_PGPROT_WRITE_COMBINED_DEVICE(old_prot) \
pgprot_writecombine(old_prot)
/* Don't attempt to mark sysmem pages as write combined on riscv */
#define NV_PGPROT_WRITE_COMBINED(old_prot) old_prot
#define NV_PGPROT_READ_ONLY(old_prot) \
__pgprot(pgprot_val((old_prot)) & ~_PAGE_WRITE)
#else
/* Writecombine is not supported */
#undef NV_PGPROT_WRITE_COMBINED_DEVICE(old_prot)

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 1999-2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -25,8 +25,10 @@
#define _NV_PROTO_H_
#include "nv-pci.h"
#include "nv-register-module.h"
extern const char *nv_device_name;
extern nvidia_module_t nv_fops;
void nv_acpi_register_notifier (nv_linux_state_t *);
void nv_acpi_unregister_notifier (nv_linux_state_t *);
@@ -84,7 +86,7 @@ void nv_shutdown_adapter(nvidia_stack_t *, nv_state_t *, nv_linux_state
void nv_dev_free_stacks(nv_linux_state_t *);
NvBool nv_lock_init_locks(nvidia_stack_t *, nv_state_t *);
void nv_lock_destroy_locks(nvidia_stack_t *, nv_state_t *);
int nv_linux_add_device_locked(nv_linux_state_t *);
void nv_linux_add_device_locked(nv_linux_state_t *);
void nv_linux_remove_device_locked(nv_linux_state_t *);
NvBool nv_acpi_power_resource_method_present(struct pci_dev *);

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 1999-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -42,7 +42,6 @@
#include <nv-caps.h>
#include <nv-firmware.h>
#include <nv-ioctl.h>
#include <nv-ioctl-numa.h>
#include <nvmisc.h>
extern nv_cap_t *nvidia_caps_root;
@@ -51,6 +50,9 @@ extern const NvBool nv_is_rm_firmware_supported_os;
#include <nv-kernel-interface-api.h>
/* NVIDIA's reserved major character device number (Linux). */
#define NV_MAJOR_DEVICE_NUMBER 195
#define GPU_UUID_LEN (16)
/*
@@ -476,6 +478,8 @@ typedef struct nv_state_t
/* Bool to check if dma-buf is supported */
NvBool dma_buf_supported;
NvBool printed_openrm_enable_unsupported_gpus_error;
/* Check if NVPCF DSM function is implemented under NVPCF or GPU device scope */
NvBool nvpcf_dsm_in_gpu_scope;
@@ -501,7 +505,6 @@ struct nv_file_private_t
NvHandle *handles;
NvU16 maxHandles;
NvU32 deviceInstance;
NvU32 gpuInstanceId;
NvU8 metadata[64];
nv_file_private_t *ctl_nvfp;
@@ -612,6 +615,14 @@ typedef enum
#define NV_IS_DEVICE_IN_SURPRISE_REMOVAL(nv) \
(((nv)->flags & NV_FLAG_IN_SURPRISE_REMOVAL) != 0)
/*
* For console setup by EFI GOP, the base address is BAR1.
* For console setup by VBIOS, the base address is BAR2 + 16MB.
*/
#define NV_IS_CONSOLE_MAPPED(nv, addr) \
(((addr) == (nv)->bars[NV_GPU_BAR_INDEX_FB].cpu_address) || \
((addr) == ((nv)->bars[NV_GPU_BAR_INDEX_IMEM].cpu_address + 0x1000000)))
#define NV_SOC_IS_ISO_IOMMU_PRESENT(nv) \
((nv)->iso_iommu_present)
@@ -762,7 +773,7 @@ nv_state_t* NV_API_CALL nv_get_ctl_state (void);
void NV_API_CALL nv_set_dma_address_size (nv_state_t *, NvU32 );
NV_STATUS NV_API_CALL nv_alias_pages (nv_state_t *, NvU32, NvU32, NvU32, NvU64, NvU64 *, void **);
NV_STATUS NV_API_CALL nv_alloc_pages (nv_state_t *, NvU32, NvU64, NvBool, NvU32, NvBool, NvBool, NvS32, NvU64 *, void **);
NV_STATUS NV_API_CALL nv_alloc_pages (nv_state_t *, NvU32, NvBool, NvU32, NvBool, NvBool, NvS32, NvU64 *, void **);
NV_STATUS NV_API_CALL nv_free_pages (nv_state_t *, NvU32, NvBool, NvU32, void *);
NV_STATUS NV_API_CALL nv_register_user_pages (nv_state_t *, NvU64, NvU64 *, void *, void **);
@@ -871,6 +882,8 @@ NvBool NV_API_CALL nv_match_gpu_os_info(nv_state_t *, void *);
NvU32 NV_API_CALL nv_get_os_type(void);
void NV_API_CALL nv_get_updated_emu_seg(NvU32 *start, NvU32 *end);
void NV_API_CALL nv_get_screen_info(nv_state_t *, NvU64 *, NvU16 *, NvU16 *, NvU16 *, NvU16 *, NvU64 *);
struct dma_buf;
typedef struct nv_dma_buf nv_dma_buf_t;
struct drm_gem_object;
@@ -921,6 +934,7 @@ NV_STATUS NV_API_CALL rm_ioctl (nvidia_stack_t *, nv_state_t *
NvBool NV_API_CALL rm_isr (nvidia_stack_t *, nv_state_t *, NvU32 *);
void NV_API_CALL rm_isr_bh (nvidia_stack_t *, nv_state_t *);
void NV_API_CALL rm_isr_bh_unlocked (nvidia_stack_t *, nv_state_t *);
NvBool NV_API_CALL rm_is_msix_allowed (nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_power_management (nvidia_stack_t *, nv_state_t *, nv_pm_action_t);
NV_STATUS NV_API_CALL rm_stop_user_channels (nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_restart_user_channels (nvidia_stack_t *, nv_state_t *);
@@ -978,7 +992,7 @@ NV_STATUS NV_API_CALL rm_dma_buf_dup_mem_handle (nvidia_stack_t *, nv_state_t
void NV_API_CALL rm_dma_buf_undup_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle);
NV_STATUS NV_API_CALL rm_dma_buf_map_mem_handle (nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvU64, NvU64, void *, nv_phys_addr_range_t **, NvU32 *);
void NV_API_CALL rm_dma_buf_unmap_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvU64, nv_phys_addr_range_t **, NvU32);
NV_STATUS NV_API_CALL rm_dma_buf_get_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle *, NvHandle *, NvHandle *, void **, NvBool *);
NV_STATUS NV_API_CALL rm_dma_buf_get_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle *, NvHandle *, NvHandle *, void **, NvBool *);
void NV_API_CALL rm_dma_buf_put_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle, void *);
NV_STATUS NV_API_CALL rm_log_gpu_crash (nv_stack_t *, nv_state_t *);
@@ -990,7 +1004,7 @@ NvBool NV_API_CALL rm_gpu_need_4k_page_isolation(nv_state_t *);
NvBool NV_API_CALL rm_is_chipset_io_coherent(nv_stack_t *);
NvBool NV_API_CALL rm_init_event_locks(nvidia_stack_t *, nv_state_t *);
void NV_API_CALL rm_destroy_event_locks(nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_get_gpu_numa_info(nvidia_stack_t *, nv_state_t *, nv_ioctl_numa_info_t *);
NV_STATUS NV_API_CALL rm_get_gpu_numa_info(nvidia_stack_t *, nv_state_t *, NvS32 *, NvU64 *, NvU64 *, NvU64 *, NvU32 *);
NV_STATUS NV_API_CALL rm_gpu_numa_online(nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_gpu_numa_offline(nvidia_stack_t *, nv_state_t *);
NvBool NV_API_CALL rm_is_device_sequestered(nvidia_stack_t *, nv_state_t *);
@@ -1005,7 +1019,7 @@ void NV_API_CALL rm_cleanup_dynamic_power_management(nvidia_stack_t *, nv_
void NV_API_CALL rm_enable_dynamic_power_management(nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_ref_dynamic_power(nvidia_stack_t *, nv_state_t *, nv_dynamic_power_mode_t);
void NV_API_CALL rm_unref_dynamic_power(nvidia_stack_t *, nv_state_t *, nv_dynamic_power_mode_t);
NV_STATUS NV_API_CALL rm_transition_dynamic_power(nvidia_stack_t *, nv_state_t *, NvBool, NvBool *);
NV_STATUS NV_API_CALL rm_transition_dynamic_power(nvidia_stack_t *, nv_state_t *, NvBool);
const char* NV_API_CALL rm_get_vidmem_power_status(nvidia_stack_t *, nv_state_t *);
const char* NV_API_CALL rm_get_dynamic_power_management_status(nvidia_stack_t *, nv_state_t *);
const char* NV_API_CALL rm_get_gpu_gcx_support(nvidia_stack_t *, nv_state_t *, NvBool);
@@ -1020,13 +1034,11 @@ NV_STATUS NV_API_CALL nv_vgpu_create_request(nvidia_stack_t *, nv_state_t *, c
NV_STATUS NV_API_CALL nv_vgpu_delete(nvidia_stack_t *, const NvU8 *, NvU16);
NV_STATUS NV_API_CALL nv_vgpu_get_type_ids(nvidia_stack_t *, nv_state_t *, NvU32 *, NvU32 *, NvBool, NvU8, NvBool);
NV_STATUS NV_API_CALL nv_vgpu_get_type_info(nvidia_stack_t *, nv_state_t *, NvU32, char *, int, NvU8);
NV_STATUS NV_API_CALL nv_vgpu_get_bar_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU32, void *, NvBool *);
NV_STATUS NV_API_CALL nv_vgpu_get_hbm_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU64 *);
NV_STATUS NV_API_CALL nv_vgpu_start(nvidia_stack_t *, const NvU8 *, void *, NvS32 *, NvU8 *, NvU32);
NV_STATUS NV_API_CALL nv_vgpu_get_sparse_mmap(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 **, NvU64 **, NvU32 *);
NV_STATUS NV_API_CALL nv_vgpu_get_bar_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *,
NvU64 *, NvU64 *, NvU32 *, NvU8 *);
NV_STATUS NV_API_CALL nv_vgpu_process_vf_info(nvidia_stack_t *, nv_state_t *, NvU8, NvU32, NvU8, NvU8, NvU8, NvBool, void *);
NV_STATUS NV_API_CALL nv_vgpu_update_request(nvidia_stack_t *, const NvU8 *, NvU32, NvU64 *, NvU64 *, const char *);
NV_STATUS NV_API_CALL nv_gpu_bind_event(nvidia_stack_t *);
NV_STATUS NV_API_CALL nv_gpu_bind_event(nvidia_stack_t *, NvU32, NvBool *);
NV_STATUS NV_API_CALL nv_gpu_unbind_event(nvidia_stack_t *, NvU32, NvBool *);
NV_STATUS NV_API_CALL nv_get_usermap_access_params(nv_state_t*, nv_usermap_access_params_t*);
nv_soc_irq_type_t NV_API_CALL nv_get_current_irq_type(nv_state_t*);

View File

@@ -86,7 +86,7 @@
/* Not currently implemented for MSVC/ARM64. See bug 3366890. */
# define nv_speculation_barrier()
# define speculation_barrier() nv_speculation_barrier()
#elif defined(NVCPU_IS_RISCV64)
#elif defined(NVCPU_NVRISCV64) && NVOS_IS_LIBOS
# define nv_speculation_barrier()
#else
#error "Unknown compiler/chip family"

View File

@@ -104,10 +104,6 @@ typedef struct UvmGpuMemoryInfo_tag
// Out: Set to TRUE, if the allocation is in sysmem.
NvBool sysmem;
// Out: Set to TRUE, if this allocation is treated as EGM.
// sysmem is also TRUE when egm is TRUE.
NvBool egm;
// Out: Set to TRUE, if the allocation is constructed
// under a Device or Subdevice.
// All permutations of sysmem and deviceDescendant are valid.
@@ -129,8 +125,6 @@ typedef struct UvmGpuMemoryInfo_tag
// Out: Uuid of the GPU to which the allocation belongs.
// This is only valid if deviceDescendant is NV_TRUE.
// When egm is NV_TRUE, this is also the UUID of the GPU
// for which EGM is local.
// Note: If the allocation is owned by a device in
// an SLI group and the allocation is broadcast
// across the SLI group, this UUID will be any one
@@ -338,7 +332,7 @@ typedef struct UvmGpuPagingChannelAllocParams_tag
// The max number of Copy Engines supported by a GPU.
// The gpu ops build has a static assert that this is the correct number.
#define UVM_COPY_ENGINE_COUNT_MAX 64
#define UVM_COPY_ENGINE_COUNT_MAX 10
typedef struct
{
@@ -572,8 +566,11 @@ typedef struct UvmPlatformInfo_tag
// Out: ATS (Address Translation Services) is supported
NvBool atsSupported;
// Out: AMD SEV (Secure Encrypted Virtualization) is enabled
NvBool sevEnabled;
// Out: True if HW trusted execution, such as AMD's SEV-SNP or Intel's TDX,
// is enabled in the VM, indicating that Confidential Computing must also be
// enabled in the GPU(s); these two security features are either both
// enabled, or both disabled.
NvBool confComputingEnabled;
} UvmPlatformInfo;
typedef struct UvmGpuClientInfo_tag
@@ -686,10 +683,6 @@ typedef struct UvmGpuInfo_tag
// to NVSwitch peers.
NvBool connectedToSwitch;
NvU64 nvswitchMemoryWindowStart;
// local EGM properties
NvBool egmEnabled;
NvU8 egmPeerId;
} UvmGpuInfo;
typedef struct UvmGpuFbInfo_tag

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2014-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2014-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -45,11 +45,6 @@
#define NVKMS_DEVICE_ID_TEGRA 0x0000ffff
#define NVKMS_MAX_SUPERFRAME_VIEWS 4
#define NVKMS_LOG2_LUT_ARRAY_SIZE 10
#define NVKMS_LUT_ARRAY_SIZE (1 << NVKMS_LOG2_LUT_ARRAY_SIZE)
typedef NvU32 NvKmsDeviceHandle;
typedef NvU32 NvKmsDispHandle;
typedef NvU32 NvKmsConnectorHandle;
@@ -184,14 +179,6 @@ enum NvKmsEventType {
NVKMS_EVENT_TYPE_FLIP_OCCURRED,
};
enum NvKmsFlipResult {
NV_KMS_FLIP_RESULT_SUCCESS = 0, /* Success */
NV_KMS_FLIP_RESULT_INVALID_PARAMS, /* Parameter validation failed */
NV_KMS_FLIP_RESULT_IN_PROGRESS, /* Flip would fail because an outstanding
flip containing changes that cannot be
queued is in progress */
};
typedef enum {
NV_EVO_SCALER_1TAP = 0,
NV_EVO_SCALER_2TAPS = 1,
@@ -234,16 +221,6 @@ struct NvKmsUsageBounds {
} layer[NVKMS_MAX_LAYERS_PER_HEAD];
};
/*!
* Per-component arrays of NvU16s describing the LUT; used for both the input
* LUT and output LUT.
*/
struct NvKmsLutRamps {
NvU16 red[NVKMS_LUT_ARRAY_SIZE]; /*! in */
NvU16 green[NVKMS_LUT_ARRAY_SIZE]; /*! in */
NvU16 blue[NVKMS_LUT_ARRAY_SIZE]; /*! in */
};
/*
* A 3x4 row-major colorspace conversion matrix.
*
@@ -554,18 +531,6 @@ typedef struct {
NvBool noncoherent;
} NvKmsDispIOCoherencyModes;
enum NvKmsInputColorRange {
/*
* If DEFAULT is provided, driver will assume full range for RGB formats
* and limited range for YUV formats.
*/
NVKMS_INPUT_COLORRANGE_DEFAULT = 0,
NVKMS_INPUT_COLORRANGE_LIMITED = 1,
NVKMS_INPUT_COLORRANGE_FULL = 2,
};
enum NvKmsInputColorSpace {
/* Unknown colorspace; no de-gamma will be applied */
NVKMS_INPUT_COLORSPACE_NONE = 0,
@@ -577,12 +542,6 @@ enum NvKmsInputColorSpace {
NVKMS_INPUT_COLORSPACE_BT2100_PQ = 2,
};
enum NvKmsOutputColorimetry {
NVKMS_OUTPUT_COLORIMETRY_DEFAULT = 0,
NVKMS_OUTPUT_COLORIMETRY_BT2100 = 1,
};
enum NvKmsOutputTf {
/*
* NVKMS itself won't apply any OETF (clients are still
@@ -593,17 +552,6 @@ enum NvKmsOutputTf {
NVKMS_OUTPUT_TF_PQ = 2,
};
/*!
* EOTF Data Byte 1 as per CTA-861-G spec.
* This is expected to match exactly with the spec.
*/
enum NvKmsInfoFrameEOTF {
NVKMS_INFOFRAME_EOTF_SDR_GAMMA = 0,
NVKMS_INFOFRAME_EOTF_HDR_GAMMA = 1,
NVKMS_INFOFRAME_EOTF_ST2084 = 2,
NVKMS_INFOFRAME_EOTF_HLG = 3,
};
/*!
* HDR Static Metadata Type1 Descriptor as per CEA-861.3 spec.
* This is expected to match exactly with the spec.
@@ -657,29 +605,4 @@ struct NvKmsHDRStaticMetadata {
NvU16 maxFALL;
};
/*!
* A superframe is made of two or more video streams that are combined in
* a specific way. A DP serializer (an external device connected to a Tegra
* ARM SOC over DP or HDMI) can receive a video stream comprising multiple
* videos combined into a single frame and then split it into multiple
* video streams. The following structure describes the number of views
* and dimensions of each view inside a superframe.
*/
struct NvKmsSuperframeInfo {
NvU8 numViews;
struct {
/* x offset inside superframe at which this view starts */
NvU16 x;
/* y offset inside superframe at which this view starts */
NvU16 y;
/* Horizontal active width in pixels for this view */
NvU16 width;
/* Vertical active height in lines for this view */
NvU16 height;
} view[NVKMS_MAX_SUPERFRAME_VIEWS];
};
#endif /* NVKMS_API_TYPES_H */

View File

@@ -49,8 +49,6 @@ struct NvKmsKapiDevice;
struct NvKmsKapiMemory;
struct NvKmsKapiSurface;
struct NvKmsKapiChannelEvent;
struct NvKmsKapiSemaphoreSurface;
struct NvKmsKapiSemaphoreSurfaceCallback;
typedef NvU32 NvKmsKapiConnector;
typedef NvU32 NvKmsKapiDisplay;
@@ -69,14 +67,6 @@ typedef NvU32 NvKmsKapiDisplay;
*/
typedef void NvKmsChannelEventProc(void *dataPtr, NvU32 dataU32);
/*
* Note: Same as above, this function must not call back into NVKMS-KAPI, nor
* directly into RM. Doing so could cause deadlocks given the notification
* function will most likely be called from within RM's interrupt handler
* callchain.
*/
typedef void NvKmsSemaphoreSurfaceCallbackProc(void *pData);
/** @} */
/**
@@ -136,11 +126,6 @@ struct NvKmsKapiDeviceResourcesInfo {
NvU32 validCursorCompositionModes;
NvU64 supportedCursorSurfaceMemoryFormats;
struct {
NvU64 maxSubmittedOffset;
NvU64 stride;
} semsurf;
struct {
NvU16 validRRTransforms;
NvU32 validCompositionModes;
@@ -233,10 +218,8 @@ struct NvKmsKapiLayerConfig {
struct NvKmsRRParams rrParams;
struct NvKmsKapiSyncpt syncptParams;
struct {
struct NvKmsHDRStaticMetadata val;
NvBool enabled;
} hdrMetadata;
struct NvKmsHDRStaticMetadata hdrMetadata;
NvBool hdrMetadataSpecified;
enum NvKmsOutputTf tf;
@@ -250,21 +233,16 @@ struct NvKmsKapiLayerConfig {
NvU16 dstWidth, dstHeight;
enum NvKmsInputColorSpace inputColorSpace;
struct NvKmsCscMatrix csc;
NvBool cscUseMain;
};
struct NvKmsKapiLayerRequestedConfig {
struct NvKmsKapiLayerConfig config;
struct {
NvBool surfaceChanged : 1;
NvBool srcXYChanged : 1;
NvBool srcWHChanged : 1;
NvBool dstXYChanged : 1;
NvBool dstWHChanged : 1;
NvBool cscChanged : 1;
NvBool tfChanged : 1;
NvBool hdrMetadataChanged : 1;
NvBool surfaceChanged : 1;
NvBool srcXYChanged : 1;
NvBool srcWHChanged : 1;
NvBool dstXYChanged : 1;
NvBool dstWHChanged : 1;
} flags;
};
@@ -308,41 +286,14 @@ struct NvKmsKapiHeadModeSetConfig {
struct NvKmsKapiDisplayMode mode;
NvBool vrrEnabled;
struct {
NvBool enabled;
enum NvKmsInfoFrameEOTF eotf;
struct NvKmsHDRStaticMetadata staticMetadata;
} hdrInfoFrame;
enum NvKmsOutputColorimetry colorimetry;
struct {
struct {
NvBool specified;
NvU32 depth;
NvU32 start;
NvU32 end;
struct NvKmsLutRamps *pRamps;
} input;
struct {
NvBool specified;
NvBool enabled;
struct NvKmsLutRamps *pRamps;
} output;
} lut;
};
struct NvKmsKapiHeadRequestedConfig {
struct NvKmsKapiHeadModeSetConfig modeSetConfig;
struct {
NvBool activeChanged : 1;
NvBool displaysChanged : 1;
NvBool modeChanged : 1;
NvBool hdrInfoFrameChanged : 1;
NvBool colorimetryChanged : 1;
NvBool lutChanged : 1;
NvBool activeChanged : 1;
NvBool displaysChanged : 1;
NvBool modeChanged : 1;
} flags;
struct NvKmsKapiCursorRequestedConfig cursorRequestedConfig;
@@ -367,7 +318,6 @@ struct NvKmsKapiHeadReplyConfig {
};
struct NvKmsKapiModeSetReplyConfig {
enum NvKmsFlipResult flipResult;
struct NvKmsKapiHeadReplyConfig
headReplyConfig[NVKMS_KAPI_MAX_HEADS];
};
@@ -484,12 +434,6 @@ enum NvKmsKapiAllocationType {
NVKMS_KAPI_ALLOCATION_TYPE_OFFSCREEN = 2,
};
typedef enum NvKmsKapiRegisterWaiterResultRec {
NVKMS_KAPI_REG_WAITER_FAILED,
NVKMS_KAPI_REG_WAITER_SUCCESS,
NVKMS_KAPI_REG_WAITER_ALREADY_SIGNALLED,
} NvKmsKapiRegisterWaiterResult;
struct NvKmsKapiFunctionsTable {
/*!
@@ -575,8 +519,8 @@ struct NvKmsKapiFunctionsTable {
);
/*!
* Revoke modeset permissions previously granted. Only one (dispIndex,
* head, display) is currently supported.
* Revoke permissions previously granted. Only one (dispIndex, head,
* display) is currently supported.
*
* \param [in] device A device returned by allocateDevice().
*
@@ -593,34 +537,6 @@ struct NvKmsKapiFunctionsTable {
NvKmsKapiDisplay display
);
/*!
* Grant modeset sub-owner permissions to fd. This is used by clients to
* convert drm 'master' permissions into nvkms sub-owner permission.
*
* \param [in] fd fd from opening /dev/nvidia-modeset.
*
* \param [in] device A device returned by allocateDevice().
*
* \return NV_TRUE on success, NV_FALSE on failure.
*/
NvBool (*grantSubOwnership)
(
NvS32 fd,
struct NvKmsKapiDevice *device
);
/*!
* Revoke sub-owner permissions previously granted.
*
* \param [in] device A device returned by allocateDevice().
*
* \return NV_TRUE on success, NV_FALSE on failure.
*/
NvBool (*revokeSubOwnership)
(
struct NvKmsKapiDevice *device
);
/*!
* Registers for notification, via
* NvKmsKapiAllocateDeviceParams::eventCallback, of the events specified
@@ -1206,199 +1122,6 @@ struct NvKmsKapiFunctionsTable {
NvP64 dmaBuf,
NvU32 limit);
/*!
* Import a semaphore surface allocated elsewhere to NVKMS and return a
* handle to the new object.
*
* \param [in] device A device allocated using allocateDevice().
*
* \param [in] nvKmsParamsUser Userspace pointer to driver-specific
* parameters describing the semaphore
* surface being imported.
*
* \param [in] nvKmsParamsSize Size of the driver-specific parameter
* struct.
*
* \param [out] pSemaphoreMap Returns a CPU mapping of the semaphore
* surface's semaphore memory to the client.
*
* \param [out] pMaxSubmittedMap Returns a CPU mapping of the semaphore
* surface's semaphore memory to the client.
*
* \return struct NvKmsKapiSemaphoreSurface* on success, NULL on failure.
*/
struct NvKmsKapiSemaphoreSurface* (*importSemaphoreSurface)
(
struct NvKmsKapiDevice *device,
NvU64 nvKmsParamsUser,
NvU64 nvKmsParamsSize,
void **pSemaphoreMap,
void **pMaxSubmittedMap
);
/*!
* Free an imported semaphore surface.
*
* \param [in] device The device passed to
* importSemaphoreSurface() when creating
* semaphoreSurface.
*
* \param [in] semaphoreSurface A semaphore surface returned by
* importSemaphoreSurface().
*/
void (*freeSemaphoreSurface)
(
struct NvKmsKapiDevice *device,
struct NvKmsKapiSemaphoreSurface *semaphoreSurface
);
/*!
* Register a callback to be called when a semaphore reaches a value.
*
* The callback will be called when the semaphore at index in
* semaphoreSurface reaches the value wait_value. The callback will
* be called at most once and is automatically unregistered when called.
* It may also be unregistered (i.e., cancelled) explicitly using the
* unregisterSemaphoreSurfaceCallback() function. To avoid leaking the
* memory used to track the registered callback, callers must ensure one
* of these methods of unregistration is used for every successful
* callback registration that returns a non-NULL pCallbackHandle.
*
* \param [in] device The device passed to
* importSemaphoreSurface() when creating
* semaphoreSurface.
*
* \param [in] semaphoreSurface A semaphore surface returned by
* importSemaphoreSurface().
*
* \param [in] pCallback A pointer to the function to call when
* the specified value is reached. NULL
* means no callback.
*
* \param [in] pData Arbitrary data to be passed back to the
* callback as its sole parameter.
*
* \param [in] index The index of the semaphore within
* semaphoreSurface.
*
* \param [in] wait_value The value the semaphore must reach or
* exceed before the callback is called.
*
* \param [in] new_value The value the semaphore will be set to
* when it reaches or exceeds <wait_value>.
* 0 means do not update the value.
*
* \param [out] pCallbackHandle On success, the value pointed to will
* contain an opaque handle to the
* registered callback that may be used to
* cancel it if needed. Unused if pCallback
* is NULL.
*
* \return NVKMS_KAPI_REG_WAITER_SUCCESS if the waiter was registered or if
* no callback was requested and the semaphore at <index> has
* already reached or exceeded <wait_value>
*
* NVKMS_KAPI_REG_WAITER_ALREADY_SIGNALLED if a callback was
* requested and the semaphore at <index> has already reached or
* exceeded <wait_value>
*
* NVKMS_KAPI_REG_WAITER_FAILED if waiter registration failed.
*/
NvKmsKapiRegisterWaiterResult
(*registerSemaphoreSurfaceCallback)
(
struct NvKmsKapiDevice *device,
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
NvKmsSemaphoreSurfaceCallbackProc *pCallback,
void *pData,
NvU64 index,
NvU64 wait_value,
NvU64 new_value,
struct NvKmsKapiSemaphoreSurfaceCallback **pCallbackHandle
);
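/*
 * Illustrative sketch only (not part of the original header): one way a KAPI
 * client could register a semaphore callback through the functions table
 * documented above. The table pointer "nvKms", the device/surface handles,
 * and the helper names are hypothetical client-side names.
 */
static void example_sem_callback(void *pData)
{
    /* Called at most once, when the semaphore reaches the waited-for value. */
}

static NvBool example_register_waiter(const struct NvKmsKapiFunctionsTable *nvKms,
                                      struct NvKmsKapiDevice *device,
                                      struct NvKmsKapiSemaphoreSurface *semSurf,
                                      NvU64 index, NvU64 waitValue)
{
    struct NvKmsKapiSemaphoreSurfaceCallback *handle = NULL;
    NvKmsKapiRegisterWaiterResult res;

    res = nvKms->registerSemaphoreSurfaceCallback(device, semSurf,
                                                  example_sem_callback,
                                                  NULL /* pData */,
                                                  index, waitValue,
                                                  0 /* do not update the value */,
                                                  &handle);

    /* ALREADY_SIGNALLED means the value had been reached when registration was attempted. */
    return (res == NVKMS_KAPI_REG_WAITER_SUCCESS ||
            res == NVKMS_KAPI_REG_WAITER_ALREADY_SIGNALLED);
}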
/*!
* Unregister a callback registered via registerSemaphoreSurfaceCallback()
*
* If the callback has not yet been called, this function will cancel the
* callback and free its associated resources.
*
* Note this function treats the callback handle as a pointer. While this
* function does not dereference that pointer itself, the underlying call
* to RM does within a properly guarded critical section that first ensures
* it is not in the process of being used within a callback. This means
* the callstack must take into consideration that pointers are not in
* general unique handles if they may have been freed, since a subsequent
* malloc could return the same pointer value at that point. This callchain
* avoids that by leveraging the behavior of the underlying RM APIs:
*
* 1) A callback handle is referenced relative to its corresponding
* (semaphore surface, index, wait_value) tuple here and within RM. It
* is not a valid handle outside of that scope.
*
* 2) A callback can not be registered against an already-reached value
* for a given semaphore surface index.
*
* 3) A given callback handle can not be registered twice against the same
* (semaphore surface, index, wait_value) tuple, so unregistration will
* never race with registration at the RM level, and would only race at
* a higher level if used incorrectly. Since this is kernel code, we
* can safely assume there won't be malicious clients purposely misusing
* the API, but the burden is placed on the caller to ensure its usage
* does not lead to races at higher levels.
*
* These factors considered together ensure any valid registered handle is
* either still in the relevant waiter list and refers to the same event/
* callback as when it was registered, or has been removed from the list
* as part of a critical section that also destroys the list itself and
* makes future lookups in that list impossible, and hence eliminates the
* chance of comparing a stale handle with a new handle of the same value
* as part of a lookup.
*
* \param [in] device The device passed to
* importSemaphoreSurface() when creating
* semaphoreSurface.
*
* \param [in] semaphoreSurface The semaphore surface passed to
* registerSemaphoreSurfaceCallback() when
* registering the callback.
*
* \param [in] index The index passed to
* registerSemaphoreSurfaceCallback() when
* registering the callback.
*
* \param [in] wait_value The wait_value passed to
* registerSemaphoreSurfaceCallback() when
* registering the callback.
*
* \param [in] callbackHandle The callback handle returned by
* registerSemaphoreSurfaceCallback().
*/
NvBool
(*unregisterSemaphoreSurfaceCallback)
(
struct NvKmsKapiDevice *device,
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
NvU64 index,
NvU64 wait_value,
struct NvKmsKapiSemaphoreSurfaceCallback *callbackHandle
);
/*!
* Update the value of a semaphore surface from the CPU.
*
* Update the semaphore value at the specified index from the CPU, then
* wake up any pending CPU waiters associated with that index that are
* waiting on it reaching a value <= the new value.
*/
NvBool
(*setSemaphoreSurfaceValue)
(
struct NvKmsKapiDevice *device,
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
NvU64 index,
NvU64 new_value
);
};
/** @} */

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 1999-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -162,10 +162,9 @@ NvBool NV_API_CALL os_is_vgx_hyper (void);
NV_STATUS NV_API_CALL os_inject_vgx_msi (NvU16, NvU64, NvU32);
NvBool NV_API_CALL os_is_grid_supported (void);
NvU32 NV_API_CALL os_get_grid_csp_support (void);
void NV_API_CALL os_get_screen_info (NvU64 *, NvU32 *, NvU32 *, NvU32 *, NvU32 *, NvU64, NvU64);
void NV_API_CALL os_bug_check (NvU32, const char *);
NV_STATUS NV_API_CALL os_lock_user_pages (void *, NvU64, void **, NvU32);
NV_STATUS NV_API_CALL os_lookup_user_io_memory (void *, NvU64, NvU64 **, void**);
NV_STATUS NV_API_CALL os_lookup_user_io_memory (void *, NvU64, NvU64 **);
NV_STATUS NV_API_CALL os_unlock_user_pages (NvU64, void *);
NV_STATUS NV_API_CALL os_match_mmap_offset (void *, NvU64, NvU64 *);
NV_STATUS NV_API_CALL os_get_euid (NvU32 *);
@@ -230,14 +229,12 @@ extern NvBool os_dma_buf_enabled;
* ---------------------------------------------------------------------------
*/
#define NV_DBG_INFO 0x1
#define NV_DBG_SETUP 0x2
#define NV_DBG_INFO 0x0
#define NV_DBG_SETUP 0x1
#define NV_DBG_USERERRORS 0x2
#define NV_DBG_WARNINGS 0x3
#define NV_DBG_ERRORS 0x4
#define NV_DBG_HW_ERRORS 0x5
#define NV_DBG_FATAL 0x6
#define NV_DBG_FORCE_LEVEL(level) ((level) | (1 << 8))
void NV_API_CALL out_string(const char *str);
int NV_API_CALL nv_printf(NvU32 debuglevel, const char *printf_format, ...);
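/*
 * Illustrative only (not part of the original header): nv_printf() takes one
 * of the NV_DBG_* levels above as its first argument. NV_DBG_FORCE_LEVEL()
 * merely sets bit 8 of the level value; how that bit is honored is up to the
 * OS layer's nv_printf() implementation.
 */
static void example_debug_prints(void)
{
    nv_printf(NV_DBG_ERRORS, "NVRM: example error message\n");
    nv_printf(NV_DBG_FORCE_LEVEL(NV_DBG_INFO), "NVRM: message with the force bit set\n");
}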

File diff suppressed because it is too large

View File

@@ -1,334 +0,0 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
#include "nv-kthread-q.h"
#include "nv-list-helpers.h"
#include <linux/kthread.h>
#include <linux/interrupt.h>
#include <linux/completion.h>
#include <linux/module.h>
#include <linux/mm.h>
#if defined(NV_LINUX_BUG_H_PRESENT)
#include <linux/bug.h>
#else
#include <asm/bug.h>
#endif
// Today's implementation is a little simpler and more limited than the
// API description allows for in nv-kthread-q.h. Details include:
//
// 1. Each nv_kthread_q instance is a first-in, first-out queue.
//
// 2. Each nv_kthread_q instance is serviced by exactly one kthread.
//
// You can create any number of queues, each of which gets its own
// named kernel thread (kthread). You can then insert arbitrary functions
// into the queue, and those functions will be run in the context of the
// queue's kthread.
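//
// Illustrative usage sketch (not part of the original file): create a queue,
// schedule one item on it, then tear it down. nv_kthread_q_item_init() and
// nv_kthread_q_schedule_q_item() are assumed to be the item helpers declared
// in nv-kthread-q.h alongside nv_kthread_q_init() and nv_kthread_q_stop().
//
static void example_work(void *args)
{
    // Runs in the context of the queue's kthread.
}

static int example_use_queue(void)
{
    nv_kthread_q_t q;
    nv_kthread_q_item_t item;
    int ret = nv_kthread_q_init(&q, "nv_example_q");

    if (ret != 0)
        return ret;

    nv_kthread_q_item_init(&item, example_work, NULL);
    nv_kthread_q_schedule_q_item(&q, &item);

    // nv_kthread_q_stop() flushes pending items (so 'item' has completed)
    // before stopping the kthread, making the stack-allocated q and item safe.
    nv_kthread_q_stop(&q);
    return 0;
}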
#ifndef WARN
// Only *really* old kernels (2.6.9) end up here. Just use a simple printk
// to implement this, because such kernels won't be supported much longer.
#define WARN(condition, format...) ({ \
int __ret_warn_on = !!(condition); \
if (unlikely(__ret_warn_on)) \
printk(KERN_ERR format); \
unlikely(__ret_warn_on); \
})
#endif
#define NVQ_WARN(fmt, ...) \
do { \
if (in_interrupt()) { \
WARN(1, "nv_kthread_q: [in interrupt]: " fmt, \
##__VA_ARGS__); \
} \
else { \
WARN(1, "nv_kthread_q: task: %s: " fmt, \
current->comm, \
##__VA_ARGS__); \
} \
} while (0)
static int _main_loop(void *args)
{
nv_kthread_q_t *q = (nv_kthread_q_t *)args;
nv_kthread_q_item_t *q_item = NULL;
unsigned long flags;
while (1) {
// Normally this thread is never interrupted. However,
// down_interruptible (instead of down) is called here,
// in order to avoid being classified as a potentially
// hung task, by the kernel watchdog.
while (down_interruptible(&q->q_sem))
NVQ_WARN("Interrupted during semaphore wait\n");
if (atomic_read(&q->main_loop_should_exit))
break;
spin_lock_irqsave(&q->q_lock, flags);
// The q_sem semaphore prevents us from getting here unless there is
// at least one item in the list, so an empty list indicates a bug.
if (unlikely(list_empty(&q->q_list_head))) {
spin_unlock_irqrestore(&q->q_lock, flags);
NVQ_WARN("_main_loop: Empty queue: q: 0x%p\n", q);
continue;
}
// Consume one item from the queue
q_item = list_first_entry(&q->q_list_head,
nv_kthread_q_item_t,
q_list_node);
list_del_init(&q_item->q_list_node);
spin_unlock_irqrestore(&q->q_lock, flags);
// Run the item
q_item->function_to_run(q_item->function_args);
// Make debugging a little simpler by clearing this between runs:
q_item = NULL;
}
while (!kthread_should_stop())
schedule();
return 0;
}
void nv_kthread_q_stop(nv_kthread_q_t *q)
{
// check if queue has been properly initialized
if (unlikely(!q->q_kthread))
return;
nv_kthread_q_flush(q);
// If this assertion fires, then a caller likely either broke the API rules
// by adding items after calling nv_kthread_q_stop, or failed to adequately
// flush self-rescheduling q_items.
if (unlikely(!list_empty(&q->q_list_head)))
NVQ_WARN("list not empty after flushing\n");
if (likely(!atomic_read(&q->main_loop_should_exit))) {
atomic_set(&q->main_loop_should_exit, 1);
// Wake up the kthread so that it can see that it needs to stop:
up(&q->q_sem);
kthread_stop(q->q_kthread);
q->q_kthread = NULL;
}
}
// When CONFIG_VMAP_STACK is defined, the kernel thread stack allocator used by
// kthread_create_on_node relies on a 2 entry, per-core cache to minimize
// vmalloc invocations. The cache is NUMA-unaware, so when there is a hit, the
// stack location ends up being a function of the core assigned to the current
// thread, instead of being a function of the specified NUMA node. The cache was
// added to the kernel in commit ac496bf48d97f2503eaa353996a4dd5e4383eaf0
// ("fork: Optimize task creation by caching two thread stacks per CPU if
// CONFIG_VMAP_STACK=y")
//
// To work around the problematic cache, we create up to three kernel threads:
//   - If the first thread's stack is resident on the preferred node, return
//     this thread.
//   - Otherwise, create a second thread. If its stack is resident on the
//     preferred node, stop the first thread and return this one.
//   - Otherwise, create a third thread. The stack allocator does not find a
//     cached stack, and so falls back to vmalloc, which takes the NUMA hint
//     into consideration. The first two threads are then stopped.
//
// When CONFIG_VMAP_STACK is not defined, the first kernel thread is returned.
//
// This function is never invoked when there is no NUMA preference (preferred
// node is NUMA_NO_NODE).
static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),
nv_kthread_q_t *q,
int preferred_node,
const char *q_name)
{
unsigned i, j;
static const unsigned attempts = 3;
struct task_struct *thread[3];
for (i = 0;; i++) {
struct page *stack;
thread[i] = kthread_create_on_node(threadfn, q, preferred_node, q_name);
if (unlikely(IS_ERR(thread[i]))) {
// Instead of failing, pick the previous thread, even if its
// stack is not allocated on the preferred node.
if (i > 0)
i--;
break;
}
// vmalloc is not used to allocate the stack, so simply return the
// thread, even if its stack may not be allocated on the preferred node
if (!is_vmalloc_addr(thread[i]->stack))
break;
// Ran out of attempts - return thread even if its stack may not be
// allocated on the preferred node
if ((i == (attempts - 1)))
break;
// Get the NUMA node where the first page of the stack is resident. If
// it is the preferred node, select this thread.
stack = vmalloc_to_page(thread[i]->stack);
if (page_to_nid(stack) == preferred_node)
break;
}
for (j = i; j > 0; j--)
kthread_stop(thread[j - 1]);
return thread[i];
}
int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferred_node)
{
memset(q, 0, sizeof(*q));
INIT_LIST_HEAD(&q->q_list_head);
spin_lock_init(&q->q_lock);
sema_init(&q->q_sem, 0);
if (preferred_node == NV_KTHREAD_NO_NODE) {
q->q_kthread = kthread_create(_main_loop, q, q_name);
}
else {
q->q_kthread = thread_create_on_node(_main_loop, q, preferred_node, q_name);
}
if (IS_ERR(q->q_kthread)) {
int err = PTR_ERR(q->q_kthread);
// Clear q_kthread before returning so that nv_kthread_q_stop() can be
// safely called on it, making error handling easier.
q->q_kthread = NULL;
return err;
}
wake_up_process(q->q_kthread);
return 0;
}
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
{
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
}
// Returns true (non-zero) if the item was actually scheduled, and false if the
// item was already pending in a queue.
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)
{
unsigned long flags;
int ret = 1;
spin_lock_irqsave(&q->q_lock, flags);
if (likely(list_empty(&q_item->q_list_node)))
list_add_tail(&q_item->q_list_node, &q->q_list_head);
else
ret = 0;
spin_unlock_irqrestore(&q->q_lock, flags);
if (likely(ret))
up(&q->q_sem);
return ret;
}
void nv_kthread_q_item_init(nv_kthread_q_item_t *q_item,
nv_q_func_t function_to_run,
void *function_args)
{
INIT_LIST_HEAD(&q_item->q_list_node);
q_item->function_to_run = function_to_run;
q_item->function_args = function_args;
}
// Returns true (non-zero) if the q_item got scheduled, false otherwise.
int nv_kthread_q_schedule_q_item(nv_kthread_q_t *q,
nv_kthread_q_item_t *q_item)
{
if (unlikely(atomic_read(&q->main_loop_should_exit))) {
NVQ_WARN("Not allowed: nv_kthread_q_schedule_q_item was "
"called with a non-alive q: 0x%p\n", q);
return 0;
}
return _raw_q_schedule(q, q_item);
}
static void _q_flush_function(void *args)
{
struct completion *completion = (struct completion *)args;
complete(completion);
}
static void _raw_q_flush(nv_kthread_q_t *q)
{
nv_kthread_q_item_t q_item;
DECLARE_COMPLETION_ONSTACK(completion);
nv_kthread_q_item_init(&q_item, _q_flush_function, &completion);
_raw_q_schedule(q, &q_item);
// Wait for the flush item to run. Once it has run, then all of the
// previously queued items in front of it will have run, so that means
// the flush is complete.
wait_for_completion(&completion);
}
void nv_kthread_q_flush(nv_kthread_q_t *q)
{
if (unlikely(atomic_read(&q->main_loop_should_exit))) {
NVQ_WARN("Not allowed: nv_kthread_q_flush was called after "
"nv_kthread_q_stop. q: 0x%p\n", q);
return;
}
// This 2x flush is not a typing mistake. The queue really does have to be
// flushed twice, in order to take care of the case of a q_item that
// reschedules itself.
_raw_q_flush(q);
_raw_q_flush(q);
}
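//
// For illustration only: a hypothetical self-rescheduling q_item of the kind
// the double flush above handles. After the first _raw_q_flush(), one
// rescheduled instance of the item can still be pending behind the flush
// item; the second _raw_q_flush() drains it. Compiled out; example_q,
// example_item, and example_remaining are not part of this file.
#if 0
static nv_kthread_q_t example_q;
static nv_kthread_q_item_t example_item;
static atomic_t example_remaining = ATOMIC_INIT(1);

static void example_self_rescheduling_fn(void *args)
{
    // Reschedule this same item at most once, from the queue's own kthread.
    if (atomic_dec_return(&example_remaining) >= 0)
        nv_kthread_q_schedule_q_item(&example_q, &example_item);
}
#endif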

View File

@@ -43,13 +43,9 @@
#if defined(NV_LINUX_FENCE_H_PRESENT)
typedef struct fence nv_dma_fence_t;
typedef struct fence_ops nv_dma_fence_ops_t;
typedef struct fence_cb nv_dma_fence_cb_t;
typedef fence_func_t nv_dma_fence_func_t;
#else
typedef struct dma_fence nv_dma_fence_t;
typedef struct dma_fence_ops nv_dma_fence_ops_t;
typedef struct dma_fence_cb nv_dma_fence_cb_t;
typedef dma_fence_func_t nv_dma_fence_func_t;
#endif
#if defined(NV_LINUX_FENCE_H_PRESENT)
@@ -101,14 +97,6 @@ static inline int nv_dma_fence_signal(nv_dma_fence_t *fence) {
#endif
}
static inline int nv_dma_fence_signal_locked(nv_dma_fence_t *fence) {
#if defined(NV_LINUX_FENCE_H_PRESENT)
return fence_signal_locked(fence);
#else
return dma_fence_signal_locked(fence);
#endif
}
static inline u64 nv_dma_fence_context_alloc(unsigned num) {
#if defined(NV_LINUX_FENCE_H_PRESENT)
return fence_context_alloc(num);
@@ -120,7 +108,7 @@ static inline u64 nv_dma_fence_context_alloc(unsigned num) {
static inline void
nv_dma_fence_init(nv_dma_fence_t *fence,
const nv_dma_fence_ops_t *ops,
spinlock_t *lock, u64 context, uint64_t seqno) {
spinlock_t *lock, u64 context, unsigned seqno) {
#if defined(NV_LINUX_FENCE_H_PRESENT)
fence_init(fence, ops, lock, context, seqno);
#else
@@ -128,29 +116,6 @@ nv_dma_fence_init(nv_dma_fence_t *fence,
#endif
}
static inline void
nv_dma_fence_set_error(nv_dma_fence_t *fence,
int error) {
#if defined(NV_DMA_FENCE_SET_ERROR_PRESENT)
return dma_fence_set_error(fence, error);
#elif defined(NV_FENCE_SET_ERROR_PRESENT)
return fence_set_error(fence, error);
#else
fence->status = error;
#endif
}
static inline int
nv_dma_fence_add_callback(nv_dma_fence_t *fence,
nv_dma_fence_cb_t *cb,
nv_dma_fence_func_t func) {
#if defined(NV_LINUX_FENCE_H_PRESENT)
return fence_add_callback(fence, cb, func);
#else
return dma_fence_add_callback(fence, cb, func);
#endif
}
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
#endif /* __NVIDIA_DMA_FENCE_HELPER_H__ */

View File

@@ -121,20 +121,6 @@ static inline void nv_dma_resv_add_excl_fence(nv_dma_resv_t *obj,
#endif
}
static inline void nv_dma_resv_add_shared_fence(nv_dma_resv_t *obj,
nv_dma_fence_t *fence)
{
#if defined(NV_LINUX_DMA_RESV_H_PRESENT)
#if defined(NV_DMA_RESV_ADD_FENCE_PRESENT)
dma_resv_add_fence(obj, fence, DMA_RESV_USAGE_READ);
#else
dma_resv_add_shared_fence(obj, fence);
#endif
#else
reservation_object_add_shared_fence(obj, fence);
#endif
}
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
#endif /* __NVIDIA_DMA_RESV_HELPER_H__ */

View File

@@ -61,15 +61,4 @@
#undef NV_DRM_FENCE_AVAILABLE
#endif
/*
* We can support color management if either drm_helper_crtc_enable_color_mgmt()
* or drm_crtc_enable_color_mgmt() exist.
*/
#if defined(NV_DRM_HELPER_CRTC_ENABLE_COLOR_MGMT_PRESENT) || \
defined(NV_DRM_CRTC_ENABLE_COLOR_MGMT_PRESENT)
#define NV_DRM_COLOR_MGMT_AVAILABLE
#else
#undef NV_DRM_COLOR_MGMT_AVAILABLE
#endif
#endif /* defined(__NVIDIA_DRM_CONFTEST_H__) */

View File

@@ -349,125 +349,10 @@ nv_drm_connector_best_encoder(struct drm_connector *connector)
return NULL;
}
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
static const NvU32 __nv_drm_connector_supported_colorspaces =
BIT(DRM_MODE_COLORIMETRY_BT2020_RGB) |
BIT(DRM_MODE_COLORIMETRY_BT2020_YCC);
#endif
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
static int
__nv_drm_connector_atomic_check(struct drm_connector *connector,
struct drm_atomic_state *state)
{
struct drm_connector_state *new_connector_state =
drm_atomic_get_new_connector_state(state, connector);
struct drm_connector_state *old_connector_state =
drm_atomic_get_old_connector_state(state, connector);
struct nv_drm_device *nv_dev = to_nv_device(connector->dev);
struct drm_crtc *crtc = new_connector_state->crtc;
struct drm_crtc_state *crtc_state;
struct nv_drm_crtc_state *nv_crtc_state;
struct NvKmsKapiHeadRequestedConfig *req_config;
if (!crtc) {
return 0;
}
crtc_state = drm_atomic_get_new_crtc_state(state, crtc);
nv_crtc_state = to_nv_crtc_state(crtc_state);
req_config = &nv_crtc_state->req_config;
/*
* Override metadata for the entire head instead of allowing NVKMS to derive
* it from the layers' metadata.
*
* This is the metadata that will be sent to the display, and if applicable,
* layers will be tone mapped to this metadata rather than that of the
* display.
*/
req_config->flags.hdrInfoFrameChanged =
!drm_connector_atomic_hdr_metadata_equal(old_connector_state,
new_connector_state);
if (new_connector_state->hdr_output_metadata &&
new_connector_state->hdr_output_metadata->data) {
/*
* Note that HDMI definitions are used here even though we might not
* be using HDMI. While that seems odd, it is consistent with
* upstream behavior.
*/
struct hdr_output_metadata *hdr_metadata =
new_connector_state->hdr_output_metadata->data;
struct hdr_metadata_infoframe *info_frame =
&hdr_metadata->hdmi_metadata_type1;
unsigned int i;
if (hdr_metadata->metadata_type != HDMI_STATIC_METADATA_TYPE1) {
return -EINVAL;
}
for (i = 0; i < ARRAY_SIZE(info_frame->display_primaries); i++) {
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.displayPrimaries[i].x =
info_frame->display_primaries[i].x;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.displayPrimaries[i].y =
info_frame->display_primaries[i].y;
}
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.whitePoint.x =
info_frame->white_point.x;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.whitePoint.y =
info_frame->white_point.y;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxDisplayMasteringLuminance =
info_frame->max_display_mastering_luminance;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.minDisplayMasteringLuminance =
info_frame->min_display_mastering_luminance;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxCLL =
info_frame->max_cll;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxFALL =
info_frame->max_fall;
req_config->modeSetConfig.hdrInfoFrame.eotf = info_frame->eotf;
req_config->modeSetConfig.hdrInfoFrame.enabled = NV_TRUE;
} else {
req_config->modeSetConfig.hdrInfoFrame.enabled = NV_FALSE;
}
req_config->flags.colorimetryChanged =
(old_connector_state->colorspace != new_connector_state->colorspace);
// When adding a case here, also add to __nv_drm_connector_supported_colorspaces
switch (new_connector_state->colorspace) {
case DRM_MODE_COLORIMETRY_DEFAULT:
req_config->modeSetConfig.colorimetry =
NVKMS_OUTPUT_COLORIMETRY_DEFAULT;
break;
case DRM_MODE_COLORIMETRY_BT2020_RGB:
case DRM_MODE_COLORIMETRY_BT2020_YCC:
// Ignore RGB/YCC
// See https://patchwork.freedesktop.org/patch/525496/?series=111865&rev=4
req_config->modeSetConfig.colorimetry =
NVKMS_OUTPUT_COLORIMETRY_BT2100;
break;
default:
// XXX HDR TODO: Add support for more color spaces
NV_DRM_DEV_LOG_ERR(nv_dev, "Unsupported color space");
return -EINVAL;
}
return 0;
}
#endif /* defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT) */
static const struct drm_connector_helper_funcs nv_connector_helper_funcs = {
.get_modes = nv_drm_connector_get_modes,
.mode_valid = nv_drm_connector_mode_valid,
.best_encoder = nv_drm_connector_best_encoder,
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
.atomic_check = __nv_drm_connector_atomic_check,
#endif
};
static struct drm_connector*
@@ -520,32 +405,6 @@ nv_drm_connector_new(struct drm_device *dev,
DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT;
}
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
if (nv_connector->type == NVKMS_CONNECTOR_TYPE_HDMI) {
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
if (drm_mode_create_hdmi_colorspace_property(
&nv_connector->base,
__nv_drm_connector_supported_colorspaces) == 0) {
#else
if (drm_mode_create_hdmi_colorspace_property(&nv_connector->base) == 0) {
#endif
drm_connector_attach_colorspace_property(&nv_connector->base);
}
drm_connector_attach_hdr_output_metadata_property(&nv_connector->base);
} else if (nv_connector->type == NVKMS_CONNECTOR_TYPE_DP) {
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
if (drm_mode_create_dp_colorspace_property(
&nv_connector->base,
__nv_drm_connector_supported_colorspaces) == 0) {
#else
if (drm_mode_create_dp_colorspace_property(&nv_connector->base) == 0) {
#endif
drm_connector_attach_colorspace_property(&nv_connector->base);
}
drm_connector_attach_hdr_output_metadata_property(&nv_connector->base);
}
#endif /* defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT) */
/* Register connector with DRM subsystem */
ret = drm_connector_register(&nv_connector->base);

View File

@@ -48,11 +48,6 @@
#include <linux/host1x-next.h>
#endif
#if defined(NV_DRM_DRM_COLOR_MGMT_H_PRESENT)
#include <drm/drm_color_mgmt.h>
#endif
#if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
static int
nv_drm_atomic_replace_property_blob_from_id(struct drm_device *dev,
@@ -404,25 +399,27 @@ plane_req_config_update(struct drm_plane *plane,
}
for (i = 0; i < ARRAY_SIZE(info_frame->display_primaries); i ++) {
req_config->config.hdrMetadata.val.displayPrimaries[i].x =
req_config->config.hdrMetadata.displayPrimaries[i].x =
info_frame->display_primaries[i].x;
req_config->config.hdrMetadata.val.displayPrimaries[i].y =
req_config->config.hdrMetadata.displayPrimaries[i].y =
info_frame->display_primaries[i].y;
}
req_config->config.hdrMetadata.val.whitePoint.x =
req_config->config.hdrMetadata.whitePoint.x =
info_frame->white_point.x;
req_config->config.hdrMetadata.val.whitePoint.y =
req_config->config.hdrMetadata.whitePoint.y =
info_frame->white_point.y;
req_config->config.hdrMetadata.val.maxDisplayMasteringLuminance =
req_config->config.hdrMetadata.maxDisplayMasteringLuminance =
info_frame->max_display_mastering_luminance;
req_config->config.hdrMetadata.val.minDisplayMasteringLuminance =
req_config->config.hdrMetadata.minDisplayMasteringLuminance =
info_frame->min_display_mastering_luminance;
req_config->config.hdrMetadata.val.maxCLL =
req_config->config.hdrMetadata.maxCLL =
info_frame->max_cll;
req_config->config.hdrMetadata.val.maxFALL =
req_config->config.hdrMetadata.maxFALL =
info_frame->max_fall;
req_config->config.hdrMetadataSpecified = true;
switch (info_frame->eotf) {
case HDMI_EOTF_SMPTE_ST2084:
req_config->config.tf = NVKMS_OUTPUT_TF_PQ;
@@ -435,21 +432,10 @@ plane_req_config_update(struct drm_plane *plane,
NV_DRM_DEV_LOG_ERR(nv_dev, "Unsupported EOTF");
return -1;
}
req_config->config.hdrMetadata.enabled = true;
} else {
req_config->config.hdrMetadata.enabled = false;
req_config->config.hdrMetadataSpecified = false;
req_config->config.tf = NVKMS_OUTPUT_TF_NONE;
}
req_config->flags.hdrMetadataChanged =
((old_config.hdrMetadata.enabled !=
req_config->config.hdrMetadata.enabled) ||
memcmp(&old_config.hdrMetadata.val,
&req_config->config.hdrMetadata.val,
sizeof(struct NvKmsHDRStaticMetadata)));
req_config->flags.tfChanged = (old_config.tf != req_config->config.tf);
#endif
/*
@@ -706,11 +692,9 @@ static inline void __nv_drm_plane_atomic_destroy_state(
#endif
#if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
{
struct nv_drm_plane_state *nv_drm_plane_state =
to_nv_drm_plane_state(state);
drm_property_blob_put(nv_drm_plane_state->hdr_output_metadata);
}
struct nv_drm_plane_state *nv_drm_plane_state =
to_nv_drm_plane_state(state);
drm_property_blob_put(nv_drm_plane_state->hdr_output_metadata);
#endif
}
@@ -816,9 +800,6 @@ nv_drm_atomic_crtc_duplicate_state(struct drm_crtc *crtc)
&(to_nv_crtc_state(crtc->state)->req_config),
&nv_state->req_config);
nv_state->ilut_ramps = NULL;
nv_state->olut_ramps = NULL;
return &nv_state->base;
}
@@ -842,9 +823,6 @@ static void nv_drm_atomic_crtc_destroy_state(struct drm_crtc *crtc,
__nv_drm_atomic_helper_crtc_destroy_state(crtc, &nv_state->base);
nv_drm_free(nv_state->ilut_ramps);
nv_drm_free(nv_state->olut_ramps);
nv_drm_free(nv_state);
}
@@ -855,9 +833,6 @@ static struct drm_crtc_funcs nv_crtc_funcs = {
.destroy = nv_drm_crtc_destroy,
.atomic_duplicate_state = nv_drm_atomic_crtc_duplicate_state,
.atomic_destroy_state = nv_drm_atomic_crtc_destroy_state,
#if defined(NV_DRM_ATOMIC_HELPER_LEGACY_GAMMA_SET_PRESENT)
.gamma_set = drm_atomic_helper_legacy_gamma_set,
#endif
};
/*
@@ -891,198 +866,6 @@ static int head_modeset_config_attach_connector(
return 0;
}
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
static int color_mgmt_config_copy_lut(struct NvKmsLutRamps *nvkms_lut,
struct drm_color_lut *drm_lut,
uint64_t lut_len)
{
uint64_t i = 0;
if (lut_len != NVKMS_LUT_ARRAY_SIZE) {
return -EINVAL;
}
/*
* Both NvKms and drm LUT values are 16-bit linear values. NvKms LUT ramps
* are in arrays in a single struct while drm LUT ramps are an array of
* structs.
*/
for (i = 0; i < lut_len; i++) {
nvkms_lut->red[i] = drm_lut[i].red;
nvkms_lut->green[i] = drm_lut[i].green;
nvkms_lut->blue[i] = drm_lut[i].blue;
}
return 0;
}
static void color_mgmt_config_ctm_to_csc(struct NvKmsCscMatrix *nvkms_csc,
struct drm_color_ctm *drm_ctm)
{
int y;
/* CTM is a 3x3 matrix while ours is 3x4. Zero out the last column. */
nvkms_csc->m[0][3] = nvkms_csc->m[1][3] = nvkms_csc->m[2][3] = 0;
for (y = 0; y < 3; y++) {
int x;
for (x = 0; x < 3; x++) {
/*
* Values in the CTM are encoded in S31.32 sign-magnitude fixed-
* point format, while NvKms CSC values are signed 2's-complement
* S15.16 (Ssign-extend12-3.16?) fixed-point format.
*/
NvU64 ctmVal = drm_ctm->matrix[y*3 + x];
NvU64 signBit = ctmVal & (1ULL << 63);
NvU64 magnitude = ctmVal & ~signBit;
/*
* Drop the low 16 bits of the fractional part and the high 17 bits
* of the integral part. Drop 17 bits to avoid corner cases where
* the highest resulting bit is a 1, causing the `cscVal = -cscVal`
* line to result in a positive number.
*/
NvS32 cscVal = (magnitude >> 16) & ((1ULL << 31) - 1);
if (signBit) {
cscVal = -cscVal;
}
nvkms_csc->m[y][x] = cscVal;
}
}
}
static int color_mgmt_config_set(struct nv_drm_crtc_state *nv_crtc_state,
struct NvKmsKapiHeadRequestedConfig *req_config)
{
struct NvKmsKapiHeadModeSetConfig *modeset_config =
&req_config->modeSetConfig;
struct drm_crtc_state *crtc_state = &nv_crtc_state->base;
int ret = 0;
struct drm_color_lut *degamma_lut = NULL;
struct drm_color_ctm *ctm = NULL;
struct drm_color_lut *gamma_lut = NULL;
uint64_t degamma_len = 0;
uint64_t gamma_len = 0;
int i;
struct drm_plane *plane;
struct drm_plane_state *plane_state;
/*
* According to the comment in the Linux kernel's
* drivers/gpu/drm/drm_color_mgmt.c, if any of these properties are NULL,
* that LUT or CTM needs to be changed to a linear LUT or identity matrix
* respectively.
*/
req_config->flags.lutChanged = NV_TRUE;
if (crtc_state->degamma_lut) {
nv_crtc_state->ilut_ramps = nv_drm_calloc(1, sizeof(*nv_crtc_state->ilut_ramps));
if (!nv_crtc_state->ilut_ramps) {
ret = -ENOMEM;
goto fail;
}
degamma_lut = (struct drm_color_lut *)crtc_state->degamma_lut->data;
degamma_len = crtc_state->degamma_lut->length /
sizeof(struct drm_color_lut);
if ((ret = color_mgmt_config_copy_lut(nv_crtc_state->ilut_ramps,
degamma_lut,
degamma_len)) != 0) {
goto fail;
}
modeset_config->lut.input.specified = NV_TRUE;
modeset_config->lut.input.depth = 30; /* specify the full LUT */
modeset_config->lut.input.start = 0;
modeset_config->lut.input.end = degamma_len - 1;
modeset_config->lut.input.pRamps = nv_crtc_state->ilut_ramps;
} else {
/* setting input.end to 0 is equivalent to disabling the LUT, which
* should be equivalent to a linear LUT */
modeset_config->lut.input.specified = NV_TRUE;
modeset_config->lut.input.depth = 30; /* specify the full LUT */
modeset_config->lut.input.start = 0;
modeset_config->lut.input.end = 0;
modeset_config->lut.input.pRamps = NULL;
}
nv_drm_for_each_new_plane_in_state(crtc_state->state, plane,
plane_state, i) {
struct nv_drm_plane *nv_plane = to_nv_plane(plane);
uint32_t layer = nv_plane->layer_idx;
struct NvKmsKapiLayerRequestedConfig *layer_config;
if (layer == NVKMS_KAPI_LAYER_INVALID_IDX || plane_state->crtc != crtc_state->crtc) {
continue;
}
layer_config = &req_config->layerRequestedConfig[layer];
if (layer == NVKMS_KAPI_LAYER_PRIMARY_IDX && crtc_state->ctm) {
ctm = (struct drm_color_ctm *)crtc_state->ctm->data;
color_mgmt_config_ctm_to_csc(&layer_config->config.csc, ctm);
layer_config->config.cscUseMain = NV_FALSE;
} else {
/* When crtc_state->ctm is unset, this also sets the main layer to
* the identity matrix.
*/
layer_config->config.csc = NVKMS_IDENTITY_CSC_MATRIX;
}
layer_config->flags.cscChanged = NV_TRUE;
}
if (crtc_state->gamma_lut) {
nv_crtc_state->olut_ramps = nv_drm_calloc(1, sizeof(*nv_crtc_state->olut_ramps));
if (!nv_crtc_state->olut_ramps) {
ret = -ENOMEM;
goto fail;
}
gamma_lut = (struct drm_color_lut *)crtc_state->gamma_lut->data;
gamma_len = crtc_state->gamma_lut->length /
sizeof(struct drm_color_lut);
if ((ret = color_mgmt_config_copy_lut(nv_crtc_state->olut_ramps,
gamma_lut,
gamma_len)) != 0) {
goto fail;
}
modeset_config->lut.output.specified = NV_TRUE;
modeset_config->lut.output.enabled = NV_TRUE;
modeset_config->lut.output.pRamps = nv_crtc_state->olut_ramps;
} else {
/* disabling the output LUT should be equivalent to setting a linear
* LUT */
modeset_config->lut.output.specified = NV_TRUE;
modeset_config->lut.output.enabled = NV_FALSE;
modeset_config->lut.output.pRamps = NULL;
}
return 0;
fail:
/* free allocated state */
nv_drm_free(nv_crtc_state->ilut_ramps);
nv_drm_free(nv_crtc_state->olut_ramps);
/* remove dangling pointers */
nv_crtc_state->ilut_ramps = NULL;
nv_crtc_state->olut_ramps = NULL;
modeset_config->lut.input.pRamps = NULL;
modeset_config->lut.output.pRamps = NULL;
/* prevent attempts at reading NULLs */
modeset_config->lut.input.specified = NV_FALSE;
modeset_config->lut.output.specified = NV_FALSE;
return ret;
}
#endif /* NV_DRM_COLOR_MGMT_AVAILABLE */
/**
* nv_drm_crtc_atomic_check() can fail after it has modified
* the 'nv_drm_crtc_state::req_config', that is fine because 'nv_drm_crtc_state'
@@ -1104,9 +887,6 @@ static int nv_drm_crtc_atomic_check(struct drm_crtc *crtc,
struct NvKmsKapiHeadRequestedConfig *req_config =
&nv_crtc_state->req_config;
int ret = 0;
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
struct nv_drm_device *nv_dev = to_nv_device(crtc_state->crtc->dev);
#endif
if (crtc_state->mode_changed) {
drm_mode_to_nvkms_display_mode(&crtc_state->mode,
@@ -1145,25 +925,6 @@ static int nv_drm_crtc_atomic_check(struct drm_crtc *crtc,
req_config->flags.activeChanged = NV_TRUE;
}
#if defined(NV_DRM_CRTC_STATE_HAS_VRR_ENABLED)
req_config->modeSetConfig.vrrEnabled = crtc_state->vrr_enabled;
#endif
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
if (nv_dev->drmMasterChangedSinceLastAtomicCommit &&
(crtc_state->degamma_lut ||
crtc_state->ctm ||
crtc_state->gamma_lut)) {
crtc_state->color_mgmt_changed = NV_TRUE;
}
if (crtc_state->color_mgmt_changed) {
if ((ret = color_mgmt_config_set(nv_crtc_state, req_config)) != 0) {
return ret;
}
}
#endif
return ret;
}
@@ -1395,8 +1156,6 @@ nv_drm_plane_create(struct drm_device *dev,
plane,
validLayerRRTransforms);
nv_drm_free(formats);
return plane;
failed_plane_init:
@@ -1461,22 +1220,6 @@ static struct drm_crtc *__nv_drm_crtc_create(struct nv_drm_device *nv_dev,
drm_crtc_helper_add(&nv_crtc->base, &nv_crtc_helper_funcs);
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
#if defined(NV_DRM_CRTC_ENABLE_COLOR_MGMT_PRESENT)
drm_crtc_enable_color_mgmt(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE, true,
NVKMS_LUT_ARRAY_SIZE);
#else
drm_helper_crtc_enable_color_mgmt(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE,
NVKMS_LUT_ARRAY_SIZE);
#endif
ret = drm_mode_crtc_set_gamma_size(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE);
if (ret != 0) {
NV_DRM_DEV_LOG_WARN(
nv_dev,
"Failed to initialize legacy gamma support for head %u", head);
}
#endif
return &nv_crtc->base;
failed_init_crtc:
@@ -1585,16 +1328,10 @@ static void NvKmsKapiCrcsToDrm(const struct NvKmsKapiCrcs *crcs,
{
drmCrcs->outputCrc32.value = crcs->outputCrc32.value;
drmCrcs->outputCrc32.supported = crcs->outputCrc32.supported;
drmCrcs->outputCrc32.__pad0 = 0;
drmCrcs->outputCrc32.__pad1 = 0;
drmCrcs->rasterGeneratorCrc32.value = crcs->rasterGeneratorCrc32.value;
drmCrcs->rasterGeneratorCrc32.supported = crcs->rasterGeneratorCrc32.supported;
drmCrcs->rasterGeneratorCrc32.__pad0 = 0;
drmCrcs->rasterGeneratorCrc32.__pad1 = 0;
drmCrcs->compositorCrc32.value = crcs->compositorCrc32.value;
drmCrcs->compositorCrc32.supported = crcs->compositorCrc32.supported;
drmCrcs->compositorCrc32.__pad0 = 0;
drmCrcs->compositorCrc32.__pad1 = 0;
}
int nv_drm_get_crtc_crc32_v2_ioctl(struct drm_device *dev,

View File

@@ -129,9 +129,6 @@ struct nv_drm_crtc_state {
*/
struct NvKmsKapiHeadRequestedConfig req_config;
struct NvKmsLutRamps *ilut_ramps;
struct NvKmsLutRamps *olut_ramps;
/**
* @nv_flip:
*

View File

@@ -44,10 +44,6 @@
#include <drm/drmP.h>
#endif
#if defined(NV_DRM_DRM_ATOMIC_UAPI_H_PRESENT)
#include <drm/drm_atomic_uapi.h>
#endif
#if defined(NV_DRM_DRM_VBLANK_H_PRESENT)
#include <drm/drm_vblank.h>
#endif
@@ -64,15 +60,6 @@
#include <drm/drm_ioctl.h>
#endif
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
#include <drm/drm_aperture.h>
#include <drm/drm_fb_helper.h>
#endif
#if defined(NV_DRM_DRM_FBDEV_GENERIC_H_PRESENT)
#include <drm/drm_fbdev_generic.h>
#endif
#include <linux/pci.h>
/*
@@ -97,11 +84,6 @@
#include <drm/drm_atomic_helper.h>
#endif
static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
struct drm_file *filep,
NvU32 dpyId);
static int nv_drm_revoke_sub_ownership(struct drm_device *dev);
static struct nv_drm_device *dev_list = NULL;
static const char* nv_get_input_colorspace_name(
@@ -123,6 +105,7 @@ static const char* nv_get_input_colorspace_name(
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
#if defined(NV_DRM_OUTPUT_POLL_CHANGED_PRESENT)
static void nv_drm_output_poll_changed(struct drm_device *dev)
{
struct drm_connector *connector = NULL;
@@ -166,6 +149,7 @@ static void nv_drm_output_poll_changed(struct drm_device *dev)
nv_drm_connector_list_iter_end(&conn_iter);
#endif
}
#endif /* NV_DRM_OUTPUT_POLL_CHANGED_PRESENT */
static struct drm_framebuffer *nv_drm_framebuffer_create(
struct drm_device *dev,
@@ -203,7 +187,9 @@ static const struct drm_mode_config_funcs nv_mode_config_funcs = {
.atomic_check = nv_drm_atomic_check,
.atomic_commit = nv_drm_atomic_commit,
#if defined(NV_DRM_OUTPUT_POLL_CHANGED_PRESENT)
.output_poll_changed = nv_drm_output_poll_changed,
#endif
};
static void nv_drm_event_callback(const struct NvKmsKapiEvent *event)
@@ -478,11 +464,6 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)
nv_dev->supportsSyncpts = resInfo.caps.supportsSyncpts;
nv_dev->semsurf_stride = resInfo.caps.semsurf.stride;
nv_dev->semsurf_max_submitted_offset =
resInfo.caps.semsurf.maxSubmittedOffset;
#if defined(NV_DRM_FORMAT_MODIFIERS_PRESENT)
gen = nv_dev->pageKindGeneration;
kind = nv_dev->genericPageKind;
@@ -569,8 +550,6 @@ static void __nv_drm_unload(struct drm_device *dev)
mutex_lock(&nv_dev->lock);
WARN_ON(nv_dev->subOwnershipGranted);
/* Disable event handling */
atomic_set(&nv_dev->enable_event_handling, false);
@@ -620,15 +599,9 @@ static int __nv_drm_master_set(struct drm_device *dev,
{
struct nv_drm_device *nv_dev = to_nv_device(dev);
/*
* If this device is driving a framebuffer, then nvidia-drm already has
* modeset ownership. Otherwise, grab ownership now.
*/
if (!nv_dev->hasFramebufferConsole &&
!nvKms->grabOwnership(nv_dev->pDevice)) {
if (!nvKms->grabOwnership(nv_dev->pDevice)) {
return -EINVAL;
}
nv_dev->drmMasterChangedSinceLastAtomicCommit = NV_TRUE;
return 0;
}
@@ -662,9 +635,6 @@ void nv_drm_master_drop(struct drm_device *dev, struct drm_file *file_priv)
struct nv_drm_device *nv_dev = to_nv_device(dev);
int err;
nv_drm_revoke_modeset_permission(dev, file_priv, 0);
nv_drm_revoke_sub_ownership(dev);
/*
* After dropping nvkms modeset ownership, it is not guaranteed that
* drm and nvkms modeset state will remain in sync. Therefore, disable
@@ -689,9 +659,7 @@ void nv_drm_master_drop(struct drm_device *dev, struct drm_file *file_priv)
drm_modeset_unlock_all(dev);
if (!nv_dev->hasFramebufferConsole) {
nvKms->releaseOwnership(nv_dev->pDevice);
}
nvKms->releaseOwnership(nv_dev->pDevice);
}
#endif /* NV_DRM_ATOMIC_MODESET_AVAILABLE */
@@ -729,30 +697,15 @@ static int nv_drm_get_dev_info_ioctl(struct drm_device *dev,
params->gpu_id = nv_dev->gpu_info.gpu_id;
params->primary_index = dev->primary->index;
params->supports_alloc = false;
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
params->generic_page_kind = nv_dev->genericPageKind;
params->page_kind_generation = nv_dev->pageKindGeneration;
params->sector_layout = nv_dev->sectorLayout;
#else
params->generic_page_kind = 0;
params->page_kind_generation = 0;
params->sector_layout = 0;
params->supports_sync_fd = false;
params->supports_semsurf = false;
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
/* Memory allocation and semaphore surfaces are only supported
* if the modeset = 1 parameter is set */
if (nv_dev->pDevice != NULL) {
params->supports_alloc = true;
params->generic_page_kind = nv_dev->genericPageKind;
params->page_kind_generation = nv_dev->pageKindGeneration;
params->sector_layout = nv_dev->sectorLayout;
if (nv_dev->semsurf_stride != 0) {
params->supports_semsurf = true;
#if defined(NV_SYNC_FILE_GET_FENCE_PRESENT)
params->supports_sync_fd = true;
#endif /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
}
}
#endif /* defined(NV_DRM_ATOMIC_MODESET_AVAILABLE) */
#endif
return 0;
}
@@ -884,10 +837,10 @@ static NvU32 nv_drm_get_head_bit_from_connector(struct drm_connector *connector)
return 0;
}
static int nv_drm_grant_modeset_permission(struct drm_device *dev,
struct drm_nvidia_grant_permissions_params *params,
struct drm_file *filep)
static int nv_drm_grant_permission_ioctl(struct drm_device *dev, void *data,
struct drm_file *filep)
{
struct drm_nvidia_grant_permissions_params *params = data;
struct nv_drm_device *nv_dev = to_nv_device(dev);
struct nv_drm_connector *target_nv_connector = NULL;
struct nv_drm_crtc *target_nv_crtc = NULL;
@@ -1009,102 +962,26 @@ done:
return ret;
}
static int nv_drm_grant_sub_ownership(struct drm_device *dev,
struct drm_nvidia_grant_permissions_params *params)
static bool nv_drm_revoke_connector(struct nv_drm_device *nv_dev,
struct nv_drm_connector *nv_connector)
{
int ret = -EINVAL;
struct nv_drm_device *nv_dev = to_nv_device(dev);
struct drm_modeset_acquire_ctx *pctx;
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
struct drm_modeset_acquire_ctx ctx;
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
ret);
pctx = &ctx;
#else
mutex_lock(&dev->mode_config.mutex);
pctx = dev->mode_config.acquire_ctx;
#endif
if (nv_dev->subOwnershipGranted ||
!nvKms->grantSubOwnership(params->fd, nv_dev->pDevice)) {
goto done;
}
/*
* When creating an ownership grant, shut down all heads and disable flip
* notifications.
*/
ret = nv_drm_atomic_helper_disable_all(dev, pctx);
if (ret != 0) {
NV_DRM_DEV_LOG_ERR(
nv_dev,
"nv_drm_atomic_helper_disable_all failed with error code %d!",
ret);
}
atomic_set(&nv_dev->enable_event_handling, false);
nv_dev->subOwnershipGranted = NV_TRUE;
ret = 0;
done:
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
#else
mutex_unlock(&dev->mode_config.mutex);
#endif
return 0;
}
static int nv_drm_grant_permission_ioctl(struct drm_device *dev, void *data,
struct drm_file *filep)
{
struct drm_nvidia_grant_permissions_params *params = data;
if (params->type == NV_DRM_PERMISSIONS_TYPE_MODESET) {
return nv_drm_grant_modeset_permission(dev, params, filep);
} else if (params->type == NV_DRM_PERMISSIONS_TYPE_SUB_OWNER) {
return nv_drm_grant_sub_ownership(dev, params);
}
return -EINVAL;
}
static int
nv_drm_atomic_disable_connector(struct drm_atomic_state *state,
struct nv_drm_connector *nv_connector)
{
struct drm_crtc_state *crtc_state;
struct drm_connector_state *connector_state;
int ret = 0;
bool ret = true;
if (nv_connector->modeset_permission_crtc) {
crtc_state = drm_atomic_get_crtc_state(
state, &nv_connector->modeset_permission_crtc->base);
if (!crtc_state) {
return -EINVAL;
}
crtc_state->active = false;
ret = drm_atomic_set_mode_prop_for_crtc(crtc_state, NULL);
if (ret < 0) {
return ret;
if (nv_connector->nv_detected_encoder) {
ret = nvKms->revokePermissions(
nv_dev->pDevice, nv_connector->modeset_permission_crtc->head,
nv_connector->nv_detected_encoder->hDisplay);
}
nv_connector->modeset_permission_crtc->modeset_permission_filep = NULL;
nv_connector->modeset_permission_crtc = NULL;
}
connector_state = drm_atomic_get_connector_state(state, &nv_connector->base);
if (!connector_state) {
return -EINVAL;
}
return drm_atomic_set_crtc_for_connector(connector_state, NULL);
nv_connector->modeset_permission_filep = NULL;
return ret;
}
static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
struct drm_file *filep, NvU32 dpyId)
static int nv_drm_revoke_permission(struct drm_device *dev,
struct drm_file *filep, NvU32 dpyId)
{
struct drm_modeset_acquire_ctx *pctx;
struct drm_atomic_state *state;
struct drm_connector *connector;
struct drm_crtc *crtc;
int ret = 0;
@@ -1115,19 +992,10 @@ static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
struct drm_modeset_acquire_ctx ctx;
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
ret);
pctx = &ctx;
#else
mutex_lock(&dev->mode_config.mutex);
pctx = dev->mode_config.acquire_ctx;
#endif
state = drm_atomic_state_alloc(dev);
if (!state) {
ret = -ENOMEM;
goto done;
}
state->acquire_ctx = pctx;
/*
* If dpyId is set, only revoke those specific resources. Otherwise,
* it is from closing the file so revoke all resources for that filep.
@@ -1139,13 +1007,10 @@ static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
struct nv_drm_connector *nv_connector = to_nv_connector(connector);
if (nv_connector->modeset_permission_filep == filep &&
(!dpyId || nv_drm_connector_is_dpy_id(connector, dpyId))) {
ret = nv_drm_atomic_disable_connector(state, nv_connector);
if (ret < 0) {
goto done;
if (!nv_drm_connector_revoke_permissions(dev, nv_connector)) {
ret = -EINVAL;
// Continue trying to revoke as much as possible.
}
// Continue trying to revoke as much as possible.
nv_drm_connector_revoke_permissions(dev, nv_connector);
}
}
#if defined(NV_DRM_CONNECTOR_LIST_ITER_PRESENT)
@@ -1159,25 +1024,6 @@ static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
}
}
ret = drm_atomic_commit(state);
done:
#if defined(NV_DRM_ATOMIC_STATE_REF_COUNTING_PRESENT)
drm_atomic_state_put(state);
#else
if (ret != 0) {
drm_atomic_state_free(state);
} else {
/*
* In case of success, drm_atomic_commit() takes care to cleanup and
* free @state.
*
* Comment placed above drm_atomic_commit() says: The caller must not
* free or in any other way access @state. If the function fails then
* the caller must clean up @state itself.
*/
}
#endif
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
#else
@@ -1187,55 +1033,14 @@ done:
return ret;
}
static int nv_drm_revoke_sub_ownership(struct drm_device *dev)
{
int ret = -EINVAL;
struct nv_drm_device *nv_dev = to_nv_device(dev);
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
struct drm_modeset_acquire_ctx ctx;
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
ret);
#else
mutex_lock(&dev->mode_config.mutex);
#endif
if (!nv_dev->subOwnershipGranted) {
goto done;
}
if (!nvKms->revokeSubOwnership(nv_dev->pDevice)) {
NV_DRM_DEV_LOG_ERR(nv_dev, "Failed to revoke sub-ownership from NVKMS");
goto done;
}
nv_dev->subOwnershipGranted = NV_FALSE;
atomic_set(&nv_dev->enable_event_handling, true);
ret = 0;
done:
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
#else
mutex_unlock(&dev->mode_config.mutex);
#endif
return ret;
}
static int nv_drm_revoke_permission_ioctl(struct drm_device *dev, void *data,
struct drm_file *filep)
{
struct drm_nvidia_revoke_permissions_params *params = data;
if (params->type == NV_DRM_PERMISSIONS_TYPE_MODESET) {
if (!params->dpyId) {
return -EINVAL;
}
return nv_drm_revoke_modeset_permission(dev, filep, params->dpyId);
} else if (params->type == NV_DRM_PERMISSIONS_TYPE_SUB_OWNER) {
return nv_drm_revoke_sub_ownership(dev);
if (!params->dpyId) {
return -EINVAL;
}
return -EINVAL;
return nv_drm_revoke_permission(dev, filep, params->dpyId);
}
static void nv_drm_postclose(struct drm_device *dev, struct drm_file *filep)
@@ -1250,7 +1055,7 @@ static void nv_drm_postclose(struct drm_device *dev, struct drm_file *filep)
dev->mode_config.num_connector > 0 &&
dev->mode_config.connector_list.next != NULL &&
dev->mode_config.connector_list.prev != NULL) {
nv_drm_revoke_modeset_permission(dev, filep, 0);
nv_drm_revoke_permission(dev, filep, 0);
}
}
#endif /* NV_DRM_ATOMIC_MODESET_AVAILABLE */
@@ -1509,23 +1314,23 @@ static const struct drm_ioctl_desc nv_drm_ioctls[] = {
DRM_IOCTL_DEF_DRV(NVIDIA_GEM_PRIME_FENCE_ATTACH,
nv_drm_gem_prime_fence_attach_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_CTX_CREATE,
nv_drm_semsurf_fence_ctx_create_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_CREATE,
nv_drm_semsurf_fence_create_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_WAIT,
nv_drm_semsurf_fence_wait_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_ATTACH,
nv_drm_semsurf_fence_attach_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
#endif
/*
* DRM_UNLOCKED is implicit for all non-legacy DRM driver IOCTLs since Linux
* v4.10 commit fa5386459f06 "drm: Used DRM_LEGACY for all legacy functions"
* (Linux v4.4 commit ea487835e887 "drm: Enforce unlocked ioctl operation
* for kms driver ioctls" previously did it only for drivers that set the
* DRM_MODESET flag), so this will race with SET_CLIENT_CAP. Linux v4.11
* commit dcf727ab5d17 "drm: setclientcap doesn't need the drm BKL" also
* removed locking from SET_CLIENT_CAP so there is no use attempting to lock
* manually. The latter commit acknowledges that this can expose userspace
* to inconsistent behavior when racing with itself, but accepts that risk.
*/
DRM_IOCTL_DEF_DRV(NVIDIA_GET_CLIENT_CAPABILITY,
nv_drm_get_client_capability_ioctl,
0),
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
DRM_IOCTL_DEF_DRV(NVIDIA_GET_CRTC_CRC32,
nv_drm_get_crtc_crc32_ioctl,
@@ -1724,30 +1529,6 @@ static void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
goto failed_drm_register;
}
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
if (nv_drm_fbdev_module_param &&
drm_core_check_feature(dev, DRIVER_MODESET)) {
if (!nvKms->grabOwnership(nv_dev->pDevice)) {
NV_DRM_DEV_LOG_ERR(nv_dev, "Failed to grab NVKMS modeset ownership");
goto failed_grab_ownership;
}
if (device->bus == &pci_bus_type) {
struct pci_dev *pdev = to_pci_dev(device);
#if defined(NV_DRM_APERTURE_REMOVE_CONFLICTING_PCI_FRAMEBUFFERS_HAS_DRIVER_ARG)
drm_aperture_remove_conflicting_pci_framebuffers(pdev, &nv_drm_driver);
#else
drm_aperture_remove_conflicting_pci_framebuffers(pdev, nv_drm_driver.name);
#endif
}
drm_fbdev_generic_setup(dev, 32);
nv_dev->hasFramebufferConsole = NV_TRUE;
}
#endif /* defined(NV_DRM_FBDEV_GENERIC_AVAILABLE) */
/* Add NVIDIA-DRM device into list */
nv_dev->next = dev_list;
@@ -1755,12 +1536,6 @@ static void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
return; /* Success */
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
failed_grab_ownership:
drm_dev_unregister(dev);
#endif
failed_drm_register:
nv_drm_dev_free(dev);
@@ -1823,16 +1598,9 @@ void nv_drm_remove_devices(void)
{
while (dev_list != NULL) {
struct nv_drm_device *next = dev_list->next;
struct drm_device *dev = dev_list->dev;
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
if (dev_list->hasFramebufferConsole) {
drm_atomic_helper_shutdown(dev);
nvKms->releaseOwnership(dev_list->pDevice);
}
#endif
drm_dev_unregister(dev);
nv_drm_dev_free(dev);
drm_dev_unregister(dev_list->dev);
nv_drm_dev_free(dev_list->dev);
nv_drm_free(dev_list);

File diff suppressed because it is too large

View File

@@ -41,22 +41,6 @@ int nv_drm_prime_fence_context_create_ioctl(struct drm_device *dev,
int nv_drm_gem_prime_fence_attach_ioctl(struct drm_device *dev,
void *data, struct drm_file *filep);
int nv_drm_semsurf_fence_ctx_create_ioctl(struct drm_device *dev,
void *data,
struct drm_file *filep);
int nv_drm_semsurf_fence_create_ioctl(struct drm_device *dev,
void *data,
struct drm_file *filep);
int nv_drm_semsurf_fence_wait_ioctl(struct drm_device *dev,
void *data,
struct drm_file *filep);
int nv_drm_semsurf_fence_attach_ioctl(struct drm_device *dev,
void *data,
struct drm_file *filep);
#endif /* NV_DRM_FENCE_AVAILABLE */
#endif /* NV_DRM_AVAILABLE */

View File

@@ -243,6 +243,15 @@ static int __nv_drm_nvkms_gem_obj_init(
NvU64 *pages = NULL;
NvU32 numPages = 0;
if ((size % PAGE_SIZE) != 0) {
NV_DRM_DEV_LOG_ERR(
nv_dev,
"NvKmsKapiMemory 0x%p size should be in a multiple of page size to "
"create a gem object",
pMemory);
return -EINVAL;
}
nv_nvkms_memory->pPhysicalAddress = NULL;
nv_nvkms_memory->pWriteCombinedIORemapAddress = NULL;
nv_nvkms_memory->physically_mapped = false;
@@ -465,7 +474,7 @@ int nv_drm_gem_alloc_nvkms_memory_ioctl(struct drm_device *dev,
goto failed;
}
if ((p->__pad0 != 0) || (p->__pad1 != 0)) {
if (p->__pad != 0) {
ret = -EINVAL;
NV_DRM_DEV_LOG_ERR(nv_dev, "non-zero value in padding field");
goto failed;

View File

@@ -95,16 +95,6 @@ static inline struct nv_drm_gem_object *to_nv_gem_object(
* 3e70fd160cf0b1945225eaa08dd2cb8544f21cb8 (2018-11-15).
*/
static inline void
nv_drm_gem_object_reference(struct nv_drm_gem_object *nv_gem)
{
#if defined(NV_DRM_GEM_OBJECT_GET_PRESENT)
drm_gem_object_get(&nv_gem->base);
#else
drm_gem_object_reference(&nv_gem->base);
#endif
}
static inline void
nv_drm_gem_object_unreference_unlocked(struct nv_drm_gem_object *nv_gem)
{

View File

@@ -306,36 +306,6 @@ int nv_drm_atomic_helper_disable_all(struct drm_device *dev,
for_each_plane_in_state(__state, plane, plane_state, __i)
#endif
/*
* for_each_new_plane_in_state() was added by kernel commit
* 581e49fe6b411f407102a7f2377648849e0fa37f which was Signed-off-by:
* Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* Daniel Vetter <daniel.vetter@ffwll.ch>
*
* This commit also added the old_state and new_state pointers to
* __drm_planes_state. Because of this, the best that can be done on kernel
* versions without this macro is for_each_plane_in_state.
*/
/**
* nv_drm_for_each_new_plane_in_state - iterate over all planes in an atomic update
* @__state: &struct drm_atomic_state pointer
* @plane: &struct drm_plane iteration cursor
* @new_plane_state: &struct drm_plane_state iteration cursor for the new state
* @__i: int iteration cursor, for macro-internal use
*
* This iterates over all planes in an atomic update, tracking only the new
* state. This is useful in enable functions, where we need the new state the
* hardware should be in when the atomic commit operation has completed.
*/
#if !defined(for_each_new_plane_in_state)
#define nv_drm_for_each_new_plane_in_state(__state, plane, new_plane_state, __i) \
nv_drm_for_each_plane_in_state(__state, plane, new_plane_state, __i)
#else
#define nv_drm_for_each_new_plane_in_state(__state, plane, new_plane_state, __i) \
for_each_new_plane_in_state(__state, plane, new_plane_state, __i)
#endif
static inline struct drm_connector *
nv_drm_connector_lookup(struct drm_device *dev, struct drm_file *filep,
uint32_t id)
@@ -612,6 +582,19 @@ static inline int nv_drm_format_num_planes(uint32_t format)
#endif /* defined(NV_DRM_FORMAT_MODIFIERS_PRESENT) */
/*
* DRM_UNLOCKED was removed with linux-next commit 2798ffcc1d6a ("drm: Remove
* locking for legacy ioctls and DRM_UNLOCKED"), but it was previously made
* implicit for all non-legacy DRM driver IOCTLs since Linux v4.10 commit
* fa5386459f06 "drm: Used DRM_LEGACY for all legacy functions" (Linux v4.4
* commit ea487835e887 "drm: Enforce unlocked ioctl operation for kms driver
* ioctls" previously did it only for drivers that set the DRM_MODESET flag), so
* it was effectively a no-op anyway.
*/
#if !defined(NV_DRM_UNLOCKED_IOCTL_FLAG_PRESENT)
#define DRM_UNLOCKED 0
#endif
/*
* drm_vma_offset_exact_lookup_locked() were added
* by kernel commit 2225cfe46bcc which was Signed-off-by:

View File

@@ -48,10 +48,6 @@
#define DRM_NVIDIA_GET_CONNECTOR_ID_FOR_DPY_ID 0x11
#define DRM_NVIDIA_GRANT_PERMISSIONS 0x12
#define DRM_NVIDIA_REVOKE_PERMISSIONS 0x13
#define DRM_NVIDIA_SEMSURF_FENCE_CTX_CREATE 0x14
#define DRM_NVIDIA_SEMSURF_FENCE_CREATE 0x15
#define DRM_NVIDIA_SEMSURF_FENCE_WAIT 0x16
#define DRM_NVIDIA_SEMSURF_FENCE_ATTACH 0x17
#define DRM_IOCTL_NVIDIA_GEM_IMPORT_NVKMS_MEMORY \
DRM_IOWR((DRM_COMMAND_BASE + DRM_NVIDIA_GEM_IMPORT_NVKMS_MEMORY), \
@@ -137,26 +133,6 @@
DRM_IOWR((DRM_COMMAND_BASE + DRM_NVIDIA_REVOKE_PERMISSIONS), \
struct drm_nvidia_revoke_permissions_params)
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_CTX_CREATE \
DRM_IOWR((DRM_COMMAND_BASE + \
DRM_NVIDIA_SEMSURF_FENCE_CTX_CREATE), \
struct drm_nvidia_semsurf_fence_ctx_create_params)
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_CREATE \
DRM_IOWR((DRM_COMMAND_BASE + \
DRM_NVIDIA_SEMSURF_FENCE_CREATE), \
struct drm_nvidia_semsurf_fence_create_params)
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_WAIT \
DRM_IOW((DRM_COMMAND_BASE + \
DRM_NVIDIA_SEMSURF_FENCE_WAIT), \
struct drm_nvidia_semsurf_fence_wait_params)
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_ATTACH \
DRM_IOW((DRM_COMMAND_BASE + \
DRM_NVIDIA_SEMSURF_FENCE_ATTACH), \
struct drm_nvidia_semsurf_fence_attach_params)
struct drm_nvidia_gem_import_nvkms_memory_params {
uint64_t mem_size; /* IN */
@@ -178,15 +154,10 @@ struct drm_nvidia_get_dev_info_params {
uint32_t gpu_id; /* OUT */
uint32_t primary_index; /* OUT; the "card%d" value */
uint32_t supports_alloc; /* OUT */
/* The generic_page_kind, page_kind_generation, and sector_layout
* fields are only valid if supports_alloc is true.
* See DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D definitions of these. */
/* See DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D definitions of these */
uint32_t generic_page_kind; /* OUT */
uint32_t page_kind_generation; /* OUT */
uint32_t sector_layout; /* OUT */
uint32_t supports_sync_fd; /* OUT */
uint32_t supports_semsurf; /* OUT */
};
struct drm_nvidia_prime_fence_context_create_params {
@@ -208,7 +179,6 @@ struct drm_nvidia_gem_prime_fence_attach_params {
uint32_t handle; /* IN GEM handle to attach fence to */
uint32_t fence_context_handle; /* IN GEM handle to fence context on which fence is run on */
uint32_t sem_thresh; /* IN Semaphore value to reach before signal */
uint32_t __pad;
};
struct drm_nvidia_get_client_capability_params {
@@ -220,8 +190,6 @@ struct drm_nvidia_get_client_capability_params {
struct drm_nvidia_crtc_crc32 {
uint32_t value; /* Read value, undefined if supported is false */
uint8_t supported; /* Supported boolean, true if readable by hardware */
uint8_t __pad0;
uint16_t __pad1;
};
struct drm_nvidia_crtc_crc32_v2_out {
@@ -261,11 +229,10 @@ struct drm_nvidia_gem_alloc_nvkms_memory_params {
uint32_t handle; /* OUT */
uint8_t block_linear; /* IN */
uint8_t compressible; /* IN/OUT */
uint16_t __pad0;
uint16_t __pad;
uint64_t memory_size; /* IN */
uint32_t flags; /* IN */
uint32_t __pad1;
};
struct drm_nvidia_gem_export_dmabuf_memory_params {
@@ -299,90 +266,13 @@ struct drm_nvidia_get_connector_id_for_dpy_id_params {
uint32_t connectorId; /* OUT */
};
enum drm_nvidia_permissions_type {
NV_DRM_PERMISSIONS_TYPE_MODESET = 2,
NV_DRM_PERMISSIONS_TYPE_SUB_OWNER = 3
};
struct drm_nvidia_grant_permissions_params {
int32_t fd; /* IN */
uint32_t dpyId; /* IN */
uint32_t type; /* IN */
};
struct drm_nvidia_revoke_permissions_params {
uint32_t dpyId; /* IN */
uint32_t type; /* IN */
};
struct drm_nvidia_semsurf_fence_ctx_create_params {
uint64_t index; /* IN Index of the desired semaphore in the
* fence context's semaphore surface */
/* Params for importing userspace semaphore surface */
uint64_t nvkms_params_ptr; /* IN */
uint64_t nvkms_params_size; /* IN */
uint32_t handle; /* OUT GEM handle to fence context */
uint32_t __pad;
};
struct drm_nvidia_semsurf_fence_create_params {
uint32_t fence_context_handle; /* IN GEM handle to fence context on which
* fence is run on */
uint32_t timeout_value_ms; /* IN Timeout value in ms for the fence
* after which the fence will be signaled
* with its error status set to -ETIMEDOUT.
* Default timeout value is 5000ms */
uint64_t wait_value; /* IN Semaphore value to reach before signal */
int32_t fd; /* OUT sync FD object representing the
* semaphore at the specified index reaching
* a value >= wait_value */
uint32_t __pad;
};
/*
* Note there is no provision for timeouts in this ioctl. The kernel
* documentation asserts timeouts should be handled by fence producers, and
* that waiters should not second-guess their logic, as it is producers rather
* than consumers that have better information when it comes to determining a
* reasonable timeout for a given workload.
*/
struct drm_nvidia_semsurf_fence_wait_params {
uint32_t fence_context_handle; /* IN GEM handle to fence context which will
* be used to wait on the sync FD. Need not
* be the fence context used to create the
* sync FD. */
int32_t fd; /* IN sync FD object to wait on */
uint64_t pre_wait_value; /* IN Wait for the semaphore represented by
* fence_context to reach this value before
* waiting for the sync file. */
uint64_t post_wait_value; /* IN Signal the semaphore represented by
* fence_context to this value after waiting
* for the sync file */
};
struct drm_nvidia_semsurf_fence_attach_params {
uint32_t handle; /* IN GEM handle of buffer */
uint32_t fence_context_handle; /* IN GEM handle of fence context */
uint32_t timeout_value_ms; /* IN Timeout value in ms for the fence
* after which the fence will be signaled
* with its error status set to -ETIMEDOUT.
* Default timeout value is 5000ms */
uint32_t shared; /* IN If true, fence will reserve shared
* access to the buffer, otherwise it will
* reserve exclusive access */
uint64_t wait_value; /* IN Semaphore value to reach before signal */
};
#endif /* _UAPI_NVIDIA_DRM_IOCTL_H_ */

View File

@@ -35,13 +35,7 @@
#include <drm/drmP.h>
#endif
#if defined(NV_LINUX_SYNC_FILE_H_PRESENT)
#include <linux/file.h>
#include <linux/sync_file.h>
#endif
#include <linux/vmalloc.h>
#include <linux/sched.h>
#include "nv-mm.h"
@@ -51,14 +45,6 @@ MODULE_PARM_DESC(
bool nv_drm_modeset_module_param = false;
module_param_named(modeset, nv_drm_modeset_module_param, bool, 0400);
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
MODULE_PARM_DESC(
fbdev,
"Create a framebuffer device (1 = enable, 0 = disable (default)) (EXPERIMENTAL)");
bool nv_drm_fbdev_module_param = false;
module_param_named(fbdev, nv_drm_fbdev_module_param, bool, 0400);
#endif
void *nv_drm_calloc(size_t nmemb, size_t size)
{
size_t total_size = nmemb * size;
@@ -95,10 +81,14 @@ char *nv_drm_asprintf(const char *fmt, ...)
#if defined(NVCPU_X86) || defined(NVCPU_X86_64)
#define WRITE_COMBINE_FLUSH() asm volatile("sfence":::"memory")
#elif defined(NVCPU_FAMILY_ARM)
#if defined(NVCPU_ARM)
#define WRITE_COMBINE_FLUSH() { dsb(); outer_sync(); }
#elif defined(NVCPU_AARCH64)
#define WRITE_COMBINE_FLUSH() mb()
#endif
#elif defined(NVCPU_PPC64LE)
#define WRITE_COMBINE_FLUSH() asm volatile("sync":::"memory")
#else
#define WRITE_COMBINE_FLUSH() mb()
#endif
void nv_drm_write_combine_flush(void)
@@ -170,122 +160,6 @@ void nv_drm_vunmap(void *address)
vunmap(address);
}
bool nv_drm_workthread_init(nv_drm_workthread *worker, const char *name)
{
worker->shutting_down = false;
if (nv_kthread_q_init(&worker->q, name)) {
return false;
}
spin_lock_init(&worker->lock);
return true;
}
void nv_drm_workthread_shutdown(nv_drm_workthread *worker)
{
unsigned long flags;
spin_lock_irqsave(&worker->lock, flags);
worker->shutting_down = true;
spin_unlock_irqrestore(&worker->lock, flags);
nv_kthread_q_stop(&worker->q);
}
void nv_drm_workthread_work_init(nv_drm_work *work,
void (*callback)(void *),
void *arg)
{
nv_kthread_q_item_init(work, callback, arg);
}
int nv_drm_workthread_add_work(nv_drm_workthread *worker, nv_drm_work *work)
{
unsigned long flags;
int ret = 0;
spin_lock_irqsave(&worker->lock, flags);
if (!worker->shutting_down) {
ret = nv_kthread_q_schedule_q_item(&worker->q, work);
}
spin_unlock_irqrestore(&worker->lock, flags);
return ret;
}
void nv_drm_timer_setup(nv_drm_timer *timer, void (*callback)(nv_drm_timer *nv_drm_timer))
{
nv_timer_setup(timer, callback);
}
void nv_drm_mod_timer(nv_drm_timer *timer, unsigned long timeout_native)
{
mod_timer(&timer->kernel_timer, timeout_native);
}
unsigned long nv_drm_timer_now(void)
{
return jiffies;
}
unsigned long nv_drm_timeout_from_ms(NvU64 relative_timeout_ms)
{
return jiffies + msecs_to_jiffies(relative_timeout_ms);
}
bool nv_drm_del_timer_sync(nv_drm_timer *timer)
{
if (del_timer_sync(&timer->kernel_timer)) {
return true;
} else {
return false;
}
}
#if defined(NV_DRM_FENCE_AVAILABLE)
int nv_drm_create_sync_file(nv_dma_fence_t *fence)
{
#if defined(NV_LINUX_SYNC_FILE_H_PRESENT)
struct sync_file *sync;
int fd = get_unused_fd_flags(O_CLOEXEC);
if (fd < 0) {
return fd;
}
/* sync_file_create() generates its own reference to the fence */
sync = sync_file_create(fence);
if (IS_ERR(sync)) {
put_unused_fd(fd);
return PTR_ERR(sync);
}
fd_install(fd, sync->file);
return fd;
#else /* defined(NV_LINUX_SYNC_FILE_H_PRESENT) */
return -EINVAL;
#endif /* defined(NV_LINUX_SYNC_FILE_H_PRESENT) */
}
nv_dma_fence_t *nv_drm_sync_file_get_fence(int fd)
{
#if defined(NV_SYNC_FILE_GET_FENCE_PRESENT)
return sync_file_get_fence(fd);
#else /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
return NULL;
#endif /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
}
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
void nv_drm_yield(void)
{
set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(1);
}
#endif /* NV_DRM_AVAILABLE */
/*************************************************************************

View File

@@ -237,14 +237,6 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
int i;
int ret;
/*
* If sub-owner permission was granted to another NVKMS client, disallow
* modesets through the DRM interface.
*/
if (nv_dev->subOwnershipGranted) {
return -EINVAL;
}
memset(requested_config, 0, sizeof(*requested_config));
/* Loop over affected crtcs and construct NvKmsKapiRequestedModeSetConfig */
@@ -282,6 +274,9 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
nv_new_crtc_state->nv_flip = NULL;
}
#if defined(NV_DRM_CRTC_STATE_HAS_VRR_ENABLED)
requested_config->headRequestedConfig[nv_crtc->head].modeSetConfig.vrrEnabled = new_crtc_state->vrr_enabled;
#endif
}
}
@@ -297,9 +292,7 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
requested_config,
&reply_config,
commit)) {
if (commit || reply_config.flipResult != NV_KMS_FLIP_RESULT_IN_PROGRESS) {
return -EINVAL;
}
return -EINVAL;
}
if (commit && nv_dev->supportsSyncpts) {
@@ -395,56 +388,42 @@ int nv_drm_atomic_commit(struct drm_device *dev,
struct nv_drm_device *nv_dev = to_nv_device(dev);
/*
* XXX: drm_mode_config_funcs::atomic_commit() mandates to return -EBUSY
* for nonblocking commit if the commit would need to wait for previous
* updates (commit tasks/flip event) to complete. In case of blocking
* commits it mandates to wait for previous updates to complete. However,
* the kernel DRM-KMS documentation does explicitly allow maintaining a
* queue of outstanding commits.
*
* Our system already implements such a queue, but due to
* bug 4054608, it is currently not used.
* drm_mode_config_funcs::atomic_commit() mandates to return -EBUSY
* for nonblocking commit if previous updates (commit tasks/flip event) are
* pending. In case of blocking commits it mandates to wait for previous
* updates to complete.
*/
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
if (nonblock) {
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
/*
* Here you aren't required to hold nv_drm_crtc::flip_list_lock
* because:
*
* The core DRM driver acquires lock for all affected crtcs before
* calling into ->commit() hook, therefore it is not possible for
* other threads to call into ->commit() hook affecting same crtcs
* and enqueue flip objects into flip_list -
*
* nv_drm_atomic_commit_internal()
* |-> nv_drm_atomic_apply_modeset_config(commit=true)
* |-> nv_drm_crtc_enqueue_flip()
*
* Only possibility is list_empty check races with code path
* dequeuing flip object -
*
* __nv_drm_handle_flip_event()
* |-> nv_drm_crtc_dequeue_flip()
*
* But this race condition can't lead list_empty() to return
* incorrect result. nv_drm_crtc_dequeue_flip() in the middle of
* updating the list could not trick us into thinking the list is
* empty when it isn't.
*/
if (nonblock) {
/*
* Here you aren't required to hold nv_drm_crtc::flip_list_lock
* because:
*
* The core DRM driver acquires lock for all affected crtcs before
* calling into ->commit() hook, therefore it is not possible for
* other threads to call into ->commit() hook affecting same crtcs
* and enqueue flip objects into flip_list -
*
* nv_drm_atomic_commit_internal()
* |-> nv_drm_atomic_apply_modeset_config(commit=true)
* |-> nv_drm_crtc_enqueue_flip()
*
* Only possibility is list_empty check races with code path
* dequeuing flip object -
*
* __nv_drm_handle_flip_event()
* |-> nv_drm_crtc_dequeue_flip()
*
* But this race condition can't lead list_empty() to return
* incorrect result. nv_drm_crtc_dequeue_flip() in the middle of
* updating the list could not trick us into thinking the list is
* empty when it isn't.
*/
if (!list_empty(&nv_crtc->flip_list)) {
return -EBUSY;
}
} else {
if (wait_event_timeout(
nv_dev->flip_event_wq,
list_empty(&nv_crtc->flip_list),
3 * HZ /* 3 seconds */) == 0) {
NV_DRM_DEV_LOG_ERR(
nv_dev,
"Flip event timeout on head %u", nv_crtc->head);
}
}
}
@@ -488,7 +467,6 @@ int nv_drm_atomic_commit(struct drm_device *dev,
goto done;
}
nv_dev->drmMasterChangedSinceLastAtomicCommit = NV_FALSE;
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);

View File

@@ -29,47 +29,10 @@
#if defined(NV_DRM_AVAILABLE)
#if defined(NV_DRM_FENCE_AVAILABLE)
#include "nvidia-dma-fence-helper.h"
#endif
#if defined(NV_LINUX)
#include "nv-kthread-q.h"
#include "linux/spinlock.h"
typedef struct nv_drm_workthread {
spinlock_t lock;
struct nv_kthread_q q;
bool shutting_down;
} nv_drm_workthread;
typedef nv_kthread_q_item_t nv_drm_work;
#else /* defined(NV_LINUX) */
#error "Need to define deferred work primitives for this OS"
#endif /* else defined(NV_LINUX) */
#if defined(NV_LINUX)
#include "nv-timer.h"
typedef struct nv_timer nv_drm_timer;
#else /* defined(NV_LINUX) */
#error "Need to define kernel timer callback primitives for this OS"
#endif /* else defined(NV_LINUX) */
#if defined(NV_DRM_FBDEV_GENERIC_SETUP_PRESENT) && defined(NV_DRM_APERTURE_REMOVE_CONFLICTING_PCI_FRAMEBUFFERS_PRESENT)
#define NV_DRM_FBDEV_GENERIC_AVAILABLE
#endif
struct page;
/* Set to true when the atomic modeset feature is enabled. */
extern bool nv_drm_modeset_module_param;
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
/* Set to true when the nvidia-drm driver should install a framebuffer device */
extern bool nv_drm_fbdev_module_param;
#endif
void *nv_drm_calloc(size_t nmemb, size_t size);
@@ -88,37 +51,6 @@ void *nv_drm_vmap(struct page **pages, unsigned long pages_count);
void nv_drm_vunmap(void *address);
bool nv_drm_workthread_init(nv_drm_workthread *worker, const char *name);
/* Can be called concurrently with nv_drm_workthread_add_work() */
void nv_drm_workthread_shutdown(nv_drm_workthread *worker);
void nv_drm_workthread_work_init(nv_drm_work *work,
void (*callback)(void *),
void *arg);
/* Can be called concurrently with nv_drm_workthread_shutdown() */
int nv_drm_workthread_add_work(nv_drm_workthread *worker, nv_drm_work *work);
void nv_drm_timer_setup(nv_drm_timer *timer,
void (*callback)(nv_drm_timer *nv_drm_timer));
void nv_drm_mod_timer(nv_drm_timer *timer, unsigned long relative_timeout_ms);
bool nv_drm_del_timer_sync(nv_drm_timer *timer);
unsigned long nv_drm_timer_now(void);
unsigned long nv_drm_timeout_from_ms(NvU64 relative_timeout_ms);
#if defined(NV_DRM_FENCE_AVAILABLE)
int nv_drm_create_sync_file(nv_dma_fence_t *fence);
nv_dma_fence_t *nv_drm_sync_file_get_fence(int fd);
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
void nv_drm_yield(void);
#endif /* defined(NV_DRM_AVAILABLE) */
#endif
#endif /* __NVIDIA_DRM_OS_INTERFACE_H__ */

View File

@@ -46,33 +46,12 @@
#define NV_DRM_LOG_ERR(__fmt, ...) \
DRM_ERROR("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
/*
* DRM_WARN() was added in v4.9 by kernel commit
* 30b0da8d556e65ff935a56cd82c05ba0516d3e4a
*
* Before this commit, only DRM_INFO and DRM_ERROR were defined and
* DRM_INFO(fmt, ...) was defined as
* printk(KERN_INFO "[" DRM_NAME "] " fmt, ##__VA_ARGS__). So, if
* DRM_WARN is undefined this defines NV_DRM_LOG_WARN following the
* same pattern as DRM_INFO.
*/
#ifdef DRM_WARN
#define NV_DRM_LOG_WARN(__fmt, ...) \
DRM_WARN("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
#else
#define NV_DRM_LOG_WARN(__fmt, ...) \
printk(KERN_WARNING "[" DRM_NAME "] [nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
#endif
#define NV_DRM_LOG_INFO(__fmt, ...) \
DRM_INFO("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
#define NV_DRM_DEV_LOG_INFO(__dev, __fmt, ...) \
NV_DRM_LOG_INFO("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
#define NV_DRM_DEV_LOG_WARN(__dev, __fmt, ...) \
NV_DRM_LOG_WARN("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
#define NV_DRM_DEV_LOG_ERR(__dev, __fmt, ...) \
NV_DRM_LOG_ERR("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
@@ -138,26 +117,9 @@ struct nv_drm_device {
#endif
#if defined(NV_DRM_FENCE_AVAILABLE)
NvU64 semsurf_stride;
NvU64 semsurf_max_submitted_offset;
#endif
NvBool hasVideoMemory;
NvBool supportsSyncpts;
NvBool subOwnershipGranted;
NvBool hasFramebufferConsole;
/**
* @drmMasterChangedSinceLastAtomicCommit:
*
* This flag is set in nv_drm_master_set and reset after a completed atomic
* commit. It is used to restore or recommit state that is lost by the
* NvKms modeset owner change, such as the CRTC color management
* properties.
*/
NvBool drmMasterChangedSinceLastAtomicCommit;
struct drm_property *nv_out_fence_property;
struct drm_property *nv_input_colorspace_property;

View File

@@ -19,7 +19,6 @@ NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-modeset.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-fence.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-linux.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-helper.c
NVIDIA_DRM_SOURCES += nvidia-drm/nv-kthread-q.c
NVIDIA_DRM_SOURCES += nvidia-drm/nv-pci-table.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-nvkms-memory.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-user-memory.c
@@ -80,17 +79,6 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_rotation_available
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_vma_offset_exact_lookup_locked
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_object_put_unlocked
NV_CONFTEST_FUNCTION_COMPILE_TESTS += nvhost_dma_fence_unpack
NV_CONFTEST_FUNCTION_COMPILE_TESTS += list_is_first
NV_CONFTEST_FUNCTION_COMPILE_TESTS += timer_setup
NV_CONFTEST_FUNCTION_COMPILE_TESTS += dma_fence_set_error
NV_CONFTEST_FUNCTION_COMPILE_TESTS += fence_set_error
NV_CONFTEST_FUNCTION_COMPILE_TESTS += sync_file_get_fence
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_aperture_remove_conflicting_pci_framebuffers
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_fbdev_generic_setup
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_attach_hdr_output_metadata_property
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_helper_crtc_enable_color_mgmt
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_crtc_enable_color_mgmt
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_atomic_helper_legacy_gamma_set
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_present
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_bus_type
@@ -145,6 +133,5 @@ NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_lookup
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_put
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_area_struct_has_const_vm_flags
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_dumb_destroy
NV_CONFTEST_TYPE_COMPILE_TESTS += fence_ops_use_64bit_seqno
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_aperture_remove_conflicting_pci_framebuffers_has_driver_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_mode_create_dp_colorspace_property_has_supported_colorspaces_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_unlocked_ioctl_flag_present
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_output_poll_changed

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2016-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -176,7 +176,7 @@ static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),
{
unsigned i, j;
const static unsigned attempts = 3;
static const unsigned attempts = 3;
struct task_struct *thread[3];
for (i = 0;; i++) {
@@ -247,11 +247,6 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferr
return 0;
}
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
{
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
}
// Returns true (non-zero) if the item was actually scheduled, and false if the
// item was already pending in a queue.
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2015-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2015-21 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -54,7 +54,11 @@
#include "nv-time.h"
#include "nv-lock.h"
#if !defined(CONFIG_RETPOLINE)
/*
* Commit aefb2f2e619b ("x86/bugs: Rename CONFIG_RETPOLINE =>
* CONFIG_MITIGATION_RETPOLINE) in v6.8 renamed CONFIG_RETPOLINE.
*/
#if !defined(CONFIG_RETPOLINE) && !defined(CONFIG_MITIGATION_RETPOLINE)
#include "nv-retpoline.h"
#endif
@@ -65,14 +69,11 @@
static bool output_rounding_fix = true;
module_param_named(output_rounding_fix, output_rounding_fix, bool, 0400);
static bool disable_hdmi_frl = false;
module_param_named(disable_hdmi_frl, disable_hdmi_frl, bool, 0400);
static bool disable_vrr_memclk_switch = false;
module_param_named(disable_vrr_memclk_switch, disable_vrr_memclk_switch, bool, 0400);
static bool hdmi_deepcolor = false;
module_param_named(hdmi_deepcolor, hdmi_deepcolor, bool, 0400);
static bool opportunistic_display_sync = true;
module_param_named(opportunistic_display_sync, opportunistic_display_sync, bool, 0400);
/* These parameters are used for fault injection tests. Normally the defaults
* should be used. */
@@ -84,7 +85,6 @@ MODULE_PARM_DESC(malloc_verbose, "Report information about malloc calls on modul
static bool malloc_verbose = false;
module_param_named(malloc_verbose, malloc_verbose, bool, 0400);
#if NVKMS_CONFIG_FILE_SUPPORTED
/* This parameter is used to find the dpy override conf file */
#define NVKMS_CONF_FILE_SPECIFIED (nvkms_conf != NULL)
@@ -93,7 +93,6 @@ MODULE_PARM_DESC(config_file,
"(default: disabled)");
static char *nvkms_conf = NULL;
module_param_named(config_file, nvkms_conf, charp, 0400);
#endif
static atomic_t nvkms_alloc_called_count;
@@ -102,19 +101,14 @@ NvBool nvkms_output_rounding_fix(void)
return output_rounding_fix;
}
NvBool nvkms_disable_hdmi_frl(void)
{
return disable_hdmi_frl;
}
NvBool nvkms_disable_vrr_memclk_switch(void)
{
return disable_vrr_memclk_switch;
}
NvBool nvkms_hdmi_deepcolor(void)
NvBool nvkms_opportunistic_display_sync(void)
{
return hdmi_deepcolor;
return opportunistic_display_sync;
}
#define NVKMS_SYNCPT_STUBS_NEEDED
@@ -367,7 +361,7 @@ NvU64 nvkms_get_usec(void)
struct timespec64 ts;
NvU64 ns;
ktime_get_raw_ts64(&ts);
ktime_get_real_ts64(&ts);
ns = timespec64_to_ns(&ts);
return ns / 1000;
@@ -1087,7 +1081,7 @@ static void nvkms_kapi_event_kthread_q_callback(void *arg)
nvKmsKapiHandleEventQueueChange(device);
}
struct nvkms_per_open *nvkms_open_common(enum NvKmsClientType type,
static struct nvkms_per_open *nvkms_open_common(enum NvKmsClientType type,
struct NvKmsKapiDevice *device,
int *status)
{
@@ -1139,7 +1133,7 @@ failed:
return NULL;
}
void nvkms_close_pm_locked(struct nvkms_per_open *popen)
static void nvkms_close_pm_locked(struct nvkms_per_open *popen)
{
/*
* Don't use down_interruptible(): we need to free resources
@@ -1202,7 +1196,7 @@ static void nvkms_close_popen(struct nvkms_per_open *popen)
}
}
int nvkms_ioctl_common
static int nvkms_ioctl_common
(
struct nvkms_per_open *popen,
NvU32 cmd, NvU64 address, const size_t size
@@ -1414,7 +1408,6 @@ static void nvkms_proc_exit(void)
/*************************************************************************
* NVKMS Config File Read
************************************************************************/
#if NVKMS_CONFIG_FILE_SUPPORTED
static NvBool nvkms_fs_mounted(void)
{
return current->fs != NULL;
@@ -1522,11 +1515,6 @@ static void nvkms_read_config_file_locked(void)
nvkms_free(buffer, buf_size);
}
#else
static void nvkms_read_config_file_locked(void)
{
}
#endif
/*************************************************************************
* NVKMS KAPI functions

View File

@@ -97,9 +97,9 @@ typedef struct {
} NvKmsSyncPtOpParams;
NvBool nvkms_output_rounding_fix(void);
NvBool nvkms_disable_hdmi_frl(void);
NvBool nvkms_disable_vrr_memclk_switch(void);
NvBool nvkms_hdmi_deepcolor(void);
NvBool nvkms_opportunistic_display_sync(void);
void nvkms_call_rm (void *ops);
void* nvkms_alloc (size_t size,

View File

@@ -58,18 +58,6 @@ nvidia-modeset-y += $(NVIDIA_MODESET_BINARY_OBJECT_O)
NVIDIA_MODESET_CFLAGS += -I$(src)/nvidia-modeset
NVIDIA_MODESET_CFLAGS += -UDEBUG -U_DEBUG -DNDEBUG -DNV_BUILD_MODULE_INSTANCES=0
# Some Android kernels prohibit driver use of filesystem functions like
# filp_open() and kernel_read(). Disable the NVKMS_CONFIG_FILE_SUPPORTED
# functionality that uses those functions when building for Android.
PLATFORM_IS_ANDROID ?= 0
ifeq ($(PLATFORM_IS_ANDROID),1)
NVIDIA_MODESET_CFLAGS += -DNVKMS_CONFIG_FILE_SUPPORTED=0
else
NVIDIA_MODESET_CFLAGS += -DNVKMS_CONFIG_FILE_SUPPORTED=1
endif
$(call ASSIGN_PER_OBJ_CFLAGS, $(NVIDIA_MODESET_OBJECTS), $(NVIDIA_MODESET_CFLAGS))

View File

@@ -66,8 +66,6 @@ enum NvKmsClientType {
NVKMS_CLIENT_KERNEL_SPACE,
};
struct NvKmsPerOpenDev;
NvBool nvKmsIoctl(
void *pOpenVoid,
NvU32 cmd,
@@ -106,6 +104,4 @@ NvBool nvKmsKapiGetFunctionsTableInternal
NvBool nvKmsGetBacklight(NvU32 display_id, void *drv_priv, NvU32 *brightness);
NvBool nvKmsSetBacklight(NvU32 display_id, void *drv_priv, NvU32 brightness);
NvBool nvKmsOpenDevHasSubOwnerPermissionOrBetter(const struct NvKmsPerOpenDev *pOpenDev);
#endif /* __NV_KMS_H__ */

View File

@@ -1,20 +1,25 @@
/* SPDX-License-Identifier: Linux-OpenIB */
/*
* Copyright (c) 2006, 2007 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2007, 2008 Mellanox Technologies. All rights reserved.
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
@@ -43,7 +48,9 @@
MODULE_AUTHOR("Yishai Hadas");
MODULE_DESCRIPTION("NVIDIA GPU memory plug-in");
MODULE_LICENSE("Linux-OpenIB");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(DRV_VERSION);
enum {
NV_MEM_PEERDIRECT_SUPPORT_DEFAULT = 0,
@@ -53,7 +60,13 @@ static int peerdirect_support = NV_MEM_PEERDIRECT_SUPPORT_DEFAULT;
module_param(peerdirect_support, int, S_IRUGO);
MODULE_PARM_DESC(peerdirect_support, "Set level of support for Peer-direct, 0 [default] or 1 [legacy, for example MLNX_OFED 4.9 LTS]");
#define peer_err(FMT, ARGS...) printk(KERN_ERR "nvidia-peermem" " %s:%d " FMT, __FUNCTION__, __LINE__, ## ARGS)
#define peer_err(FMT, ARGS...) printk(KERN_ERR "nvidia-peermem" " %s:%d ERROR " FMT, __FUNCTION__, __LINE__, ## ARGS)
#ifdef NV_MEM_DEBUG
#define peer_trace(FMT, ARGS...) printk(KERN_DEBUG "nvidia-peermem" " %s:%d TRACE " FMT, __FUNCTION__, __LINE__, ## ARGS)
#else
#define peer_trace(FMT, ARGS...) do {} while (0)
#endif
#if defined(NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT)
@@ -74,7 +87,10 @@ invalidate_peer_memory mem_invalidate_callback;
static void *reg_handle = NULL;
static void *reg_handle_nc = NULL;
#define NV_MEM_CONTEXT_MAGIC ((u64)0xF1F4F1D0FEF0DAD0ULL)
struct nv_mem_context {
u64 pad1;
struct nvidia_p2p_page_table *page_table;
struct nvidia_p2p_dma_mapping *dma_mapping;
u64 core_context;
@@ -86,8 +102,22 @@ struct nv_mem_context {
struct task_struct *callback_task;
int sg_allocated;
struct sg_table sg_head;
u64 pad2;
};
#define NV_MEM_CONTEXT_CHECK_OK(MC) ({ \
struct nv_mem_context *mc = (MC); \
int rc = ((0 != mc) && \
(READ_ONCE(mc->pad1) == NV_MEM_CONTEXT_MAGIC) && \
(READ_ONCE(mc->pad2) == NV_MEM_CONTEXT_MAGIC)); \
if (!rc) { \
peer_trace("invalid nv_mem_context=%px pad1=%016llx pad2=%016llx\n", \
mc, \
mc?mc->pad1:0, \
mc?mc->pad2:0); \
} \
rc; \
})
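/*
 * Sketch of the guard-value pattern above (illustrative only; this helper is
 * hypothetical): NV_MEM_CONTEXT_MAGIC is stamped into pad1/pad2 when the
 * context is allocated, checked before the context is trusted in asynchronous
 * callbacks, and cleared (via memset) before the context is freed, so a stale
 * or corrupted pointer fails the check instead of being dereferenced.
 */
static int example_context_is_trustworthy(struct nv_mem_context *ctx)
{
    /* Non-zero only while ctx is between stamping and clearing. */
    return NV_MEM_CONTEXT_CHECK_OK(ctx);
}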
static void nv_get_p2p_free_callback(void *data)
{
@@ -97,8 +127,9 @@ static void nv_get_p2p_free_callback(void *data)
struct nvidia_p2p_dma_mapping *dma_mapping = NULL;
__module_get(THIS_MODULE);
if (!nv_mem_context) {
peer_err("nv_get_p2p_free_callback -- invalid nv_mem_context\n");
if (!NV_MEM_CONTEXT_CHECK_OK(nv_mem_context)) {
peer_err("detected invalid context, skipping further processing\n");
goto out;
}
@@ -169,9 +200,11 @@ static int nv_mem_acquire(unsigned long addr, size_t size, void *peer_mem_privat
/* Error case handled as not mine */
return 0;
nv_mem_context->pad1 = NV_MEM_CONTEXT_MAGIC;
nv_mem_context->page_virt_start = addr & GPU_PAGE_MASK;
nv_mem_context->page_virt_end = (addr + size + GPU_PAGE_SIZE - 1) & GPU_PAGE_MASK;
nv_mem_context->mapped_size = nv_mem_context->page_virt_end - nv_mem_context->page_virt_start;
nv_mem_context->pad2 = NV_MEM_CONTEXT_MAGIC;
ret = nvidia_p2p_get_pages(0, 0, nv_mem_context->page_virt_start, nv_mem_context->mapped_size,
&nv_mem_context->page_table, nv_mem_dummy_callback, nv_mem_context);
@@ -195,6 +228,7 @@ static int nv_mem_acquire(unsigned long addr, size_t size, void *peer_mem_privat
return 1;
err:
memset(nv_mem_context, 0, sizeof(*nv_mem_context));
kfree(nv_mem_context);
/* Error case handled as not mine */
@@ -249,8 +283,8 @@ static int nv_dma_map(struct sg_table *sg_head, void *context,
nv_mem_context->sg_allocated = 1;
for_each_sg(sg_head->sgl, sg, nv_mem_context->npages, i) {
sg_set_page(sg, NULL, nv_mem_context->page_size, 0);
sg_dma_address(sg) = dma_mapping->dma_addresses[i];
sg_dma_len(sg) = nv_mem_context->page_size;
sg->dma_address = dma_mapping->dma_addresses[i];
sg->dma_length = nv_mem_context->page_size;
}
nv_mem_context->sg_head = *sg_head;
*nmap = nv_mem_context->npages;
@@ -304,13 +338,8 @@ static void nv_mem_put_pages_common(int nc,
return;
if (nc) {
#ifdef NVIDIA_P2P_CAP_GET_PAGES_PERSISTENT_API
ret = nvidia_p2p_put_pages_persistent(nv_mem_context->page_virt_start,
nv_mem_context->page_table, 0);
#else
ret = nvidia_p2p_put_pages(0, 0, nv_mem_context->page_virt_start,
nv_mem_context->page_table);
#endif
} else {
ret = nvidia_p2p_put_pages(0, 0, nv_mem_context->page_virt_start,
nv_mem_context->page_table);
@@ -347,6 +376,7 @@ static void nv_mem_release(void *context)
sg_free_table(&nv_mem_context->sg_head);
nv_mem_context->sg_allocated = 0;
}
memset(nv_mem_context, 0, sizeof(*nv_mem_context));
kfree(nv_mem_context);
module_put(THIS_MODULE);
return;
@@ -417,15 +447,9 @@ static int nv_mem_get_pages_nc(unsigned long addr,
nv_mem_context->core_context = core_context;
nv_mem_context->page_size = GPU_PAGE_SIZE;
#ifdef NVIDIA_P2P_CAP_GET_PAGES_PERSISTENT_API
ret = nvidia_p2p_get_pages_persistent(nv_mem_context->page_virt_start,
nv_mem_context->mapped_size,
&nv_mem_context->page_table, 0);
#else
ret = nvidia_p2p_get_pages(0, 0, nv_mem_context->page_virt_start, nv_mem_context->mapped_size,
&nv_mem_context->page_table, NULL, NULL);
#endif
if (ret < 0) {
peer_err("error %d while calling nvidia_p2p_get_pages() with NULL callback\n", ret);
return ret;
@@ -470,6 +494,8 @@ static int __init nv_mem_client_init(void)
}
#if defined (NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT)
int status = 0;
// off by one, to leave space for the trailing '1' which flags
// the new client type
BUG_ON(strlen(DRV_NAME) > IB_PEER_MEMORY_NAME_MAX-1);
@@ -498,7 +524,7 @@ static int __init nv_mem_client_init(void)
&mem_invalidate_callback);
if (!reg_handle) {
peer_err("nv_mem_client_init -- error while registering traditional client\n");
rc = -EINVAL;
status = -EINVAL;
goto out;
}
@@ -508,12 +534,12 @@ static int __init nv_mem_client_init(void)
reg_handle_nc = ib_register_peer_memory_client(&nv_mem_client_nc, NULL);
if (!reg_handle_nc) {
peer_err("nv_mem_client_init -- error while registering nc client\n");
rc = -EINVAL;
status = -EINVAL;
goto out;
}
out:
if (rc) {
if (status) {
if (reg_handle) {
ib_unregister_peer_memory_client(reg_handle);
reg_handle = NULL;
@@ -525,7 +551,7 @@ out:
}
}
return rc;
return status;
#else
return -EINVAL;
#endif

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2023 NVIDIA Corporation
Copyright (c) 2022 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2023 NVIDIA Corporation
Copyright (c) 2022 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016 NVIDIA Corporation
Copyright (c) 2016-2024 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -81,7 +81,7 @@
#define NUM_Q_ITEMS_IN_MULTITHREAD_TEST (NUM_TEST_Q_ITEMS * NUM_TEST_KTHREADS)
// This exists in order to have a function to place a breakpoint on:
void on_nvq_assert(void)
static void on_nvq_assert(void)
{
(void)NULL;
}

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2016-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -176,7 +176,7 @@ static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),
{
unsigned i, j;
const static unsigned attempts = 3;
static const unsigned attempts = 3;
struct task_struct *thread[3];
for (i = 0;; i++) {
@@ -247,11 +247,6 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferr
return 0;
}
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
{
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
}
// Returns true (non-zero) if the item was actually scheduled, and false if the
// item was already pending in a queue.
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)

View File

@@ -27,7 +27,6 @@ NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_rm_mem.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_channel.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_lock.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_hal.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_processors.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_tree.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_rb_tree.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_allocator.c

View File

@@ -82,12 +82,10 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += set_pages_uc
NV_CONFTEST_FUNCTION_COMPILE_TESTS += ktime_get_raw_ts64
NV_CONFTEST_FUNCTION_COMPILE_TESTS += ioasid_get
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mm_pasid_drop
NV_CONFTEST_FUNCTION_COMPILE_TESTS += migrate_vma_setup
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mmget_not_zero
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mmgrab
NV_CONFTEST_FUNCTION_COMPILE_TESTS += iommu_sva_bind_device_has_drvdata_arg
NV_CONFTEST_FUNCTION_COMPILE_TESTS += vm_fault_to_errno
NV_CONFTEST_FUNCTION_COMPILE_TESTS += find_next_bit_wrap
NV_CONFTEST_TYPE_COMPILE_TESTS += backing_dev_info
NV_CONFTEST_TYPE_COMPILE_TESTS += mm_context_t
@@ -116,3 +114,4 @@ NV_CONFTEST_TYPE_COMPILE_TESTS += mpol_preferred_many_present
NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_interval_notifier
NV_CONFTEST_SYMBOL_COMPILE_TESTS += is_export_symbol_present_int_active_memcg
NV_CONFTEST_SYMBOL_COMPILE_TESTS += is_export_symbol_present_migrate_vma_setup

View File

@@ -24,11 +24,11 @@
#include "nvstatus.h"
#if !defined(NV_PRINTF_STRING_SECTION)
#if defined(NVRM) && NVOS_IS_LIBOS
#if defined(NVRM) && NVCPU_IS_RISCV64
#define NV_PRINTF_STRING_SECTION __attribute__ ((section (".logging")))
#else // defined(NVRM) && NVOS_IS_LIBOS
#else // defined(NVRM) && NVCPU_IS_RISCV64
#define NV_PRINTF_STRING_SECTION
#endif // defined(NVRM) && NVOS_IS_LIBOS
#endif // defined(NVRM) && NVCPU_IS_RISCV64
#endif // !defined(NV_PRINTF_STRING_SECTION)
/*

View File

@@ -216,10 +216,6 @@ NV_STATUS UvmDeinitialize(void);
// Note that it is not required to release VA ranges that were reserved with
// UvmReserveVa().
//
// This is useful for per-process checkpoint and restore, where kernel-mode
// state needs to be reconfigured to match the expectations of a pre-existing
// user-mode process.
//
// UvmReopen() closes the open file returned by UvmGetFileDescriptor() and
// replaces it with a new open file with the same name.
//

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2021 NVIDIA Corporation
Copyright (c) 2021-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -94,4 +94,6 @@ void uvm_hal_ada_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->map_remap_larger_page_promotion = false;
parent_gpu->plc_supported = true;
parent_gpu->no_ats_range_required = false;
}

View File

@@ -101,4 +101,6 @@ void uvm_hal_ampere_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->map_remap_larger_page_promotion = false;
parent_gpu->plc_supported = true;
parent_gpu->no_ats_range_required = false;
}

View File

@@ -34,16 +34,6 @@
#define UVM_ATS_SUPPORTED() (UVM_ATS_IBM_SUPPORTED() || UVM_ATS_SVA_SUPPORTED())
// ATS prefetcher uses hmm_range_fault() to query residency information.
// hmm_range_fault() needs CONFIG_HMM_MIRROR. To detect racing CPU invalidates
// of memory regions while hmm_range_fault() is being called, MMU interval
// notifiers are needed.
#if defined(CONFIG_HMM_MIRROR) && defined(NV_MMU_INTERVAL_NOTIFIER)
#define UVM_ATS_PREFETCH_SUPPORTED() 1
#else
#define UVM_ATS_PREFETCH_SUPPORTED() 0
#endif
typedef struct
{
// Mask of gpu_va_spaces which are registered for ATS access. The mask is

View File

@@ -30,23 +30,36 @@
#include <linux/mempolicy.h>
#include <linux/mmu_notifier.h>
#if UVM_ATS_PREFETCH_SUPPORTED()
#if UVM_HMM_RANGE_FAULT_SUPPORTED()
#include <linux/hmm.h>
#endif
static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 start,
size_t length,
uvm_fault_access_type_t access_type,
uvm_ats_fault_context_t *ats_context)
typedef enum
{
UVM_ATS_SERVICE_TYPE_FAULTS = 0,
UVM_ATS_SERVICE_TYPE_ACCESS_COUNTERS,
UVM_ATS_SERVICE_TYPE_COUNT
} uvm_ats_service_type_t;
static NV_STATUS service_ats_requests(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 start,
size_t length,
uvm_fault_access_type_t access_type,
uvm_ats_service_type_t service_type,
uvm_ats_fault_context_t *ats_context)
{
uvm_va_space_t *va_space = gpu_va_space->va_space;
struct mm_struct *mm = va_space->va_space_mm.mm;
bool write = (access_type >= UVM_FAULT_ACCESS_TYPE_WRITE);
NV_STATUS status;
NvU64 user_space_start;
NvU64 user_space_length;
bool write = (access_type >= UVM_FAULT_ACCESS_TYPE_WRITE);
bool fault_service_type = (service_type == UVM_ATS_SERVICE_TYPE_FAULTS);
uvm_populate_permissions_t populate_permissions = fault_service_type ?
(write ? UVM_POPULATE_PERMISSIONS_WRITE : UVM_POPULATE_PERMISSIONS_ANY) :
UVM_POPULATE_PERMISSIONS_INHERIT;
// Request uvm_migrate_pageable() to touch the corresponding page after
// population.
@@ -83,10 +96,10 @@ static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
.dst_node_id = ats_context->residency_node,
.start = start,
.length = length,
.populate_permissions = write ? UVM_POPULATE_PERMISSIONS_WRITE : UVM_POPULATE_PERMISSIONS_ANY,
.touch = true,
.skip_mapped = true,
.populate_on_cpu_alloc_failures = true,
.populate_permissions = populate_permissions,
.touch = fault_service_type,
.skip_mapped = fault_service_type,
.populate_on_cpu_alloc_failures = fault_service_type,
.user_space_start = &user_space_start,
.user_space_length = &user_space_length,
};
@@ -107,26 +120,24 @@ static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
return status;
}
static void flush_tlb_write_faults(uvm_gpu_va_space_t *gpu_va_space,
NvU64 addr,
size_t size,
uvm_fault_client_type_t client_type)
static void flush_tlb_va_region(uvm_gpu_va_space_t *gpu_va_space,
NvU64 addr,
size_t size,
uvm_fault_client_type_t client_type)
{
uvm_ats_fault_invalidate_t *ats_invalidate;
uvm_ats_smmu_invalidate_tlbs(gpu_va_space, addr, size);
if (client_type == UVM_FAULT_CLIENT_TYPE_GPC)
ats_invalidate = &gpu_va_space->gpu->parent->fault_buffer_info.replayable.ats_invalidate;
else
ats_invalidate = &gpu_va_space->gpu->parent->fault_buffer_info.non_replayable.ats_invalidate;
if (!ats_invalidate->write_faults_in_batch) {
uvm_tlb_batch_begin(&gpu_va_space->page_tables, &ats_invalidate->write_faults_tlb_batch);
ats_invalidate->write_faults_in_batch = true;
if (!ats_invalidate->tlb_batch_pending) {
uvm_tlb_batch_begin(&gpu_va_space->page_tables, &ats_invalidate->tlb_batch);
ats_invalidate->tlb_batch_pending = true;
}
uvm_tlb_batch_invalidate(&ats_invalidate->write_faults_tlb_batch, addr, size, PAGE_SIZE, UVM_MEMBAR_NONE);
uvm_tlb_batch_invalidate(&ats_invalidate->tlb_batch, addr, size, PAGE_SIZE, UVM_MEMBAR_NONE);
}
static void ats_batch_select_residency(uvm_gpu_va_space_t *gpu_va_space,
@@ -235,7 +246,7 @@ static uvm_va_block_region_t uvm_ats_region_from_vma(struct vm_area_struct *vma,
return uvm_ats_region_from_start_end(start, end);
}
#if UVM_ATS_PREFETCH_SUPPORTED()
#if UVM_HMM_RANGE_FAULT_SUPPORTED()
static bool uvm_ats_invalidate_notifier(struct mmu_interval_notifier *mni, unsigned long cur_seq)
{
@@ -273,12 +284,12 @@ static NV_STATUS ats_compute_residency_mask(uvm_gpu_va_space_t *gpu_va_space,
uvm_ats_fault_context_t *ats_context)
{
NV_STATUS status = NV_OK;
uvm_page_mask_t *residency_mask = &ats_context->prefetch_state.residency_mask;
#if UVM_ATS_PREFETCH_SUPPORTED()
#if UVM_HMM_RANGE_FAULT_SUPPORTED()
int ret;
NvU64 start;
NvU64 end;
uvm_page_mask_t *residency_mask = &ats_context->prefetch_state.residency_mask;
struct hmm_range range;
uvm_page_index_t page_index;
uvm_va_block_region_t vma_region;
@@ -359,78 +370,83 @@ static NV_STATUS ats_compute_residency_mask(uvm_gpu_va_space_t *gpu_va_space,
mmu_interval_notifier_remove(range.notifier);
#else
uvm_page_mask_zero(residency_mask);
#endif
return status;
}
static void ats_expand_fault_region(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
uvm_ats_fault_context_t *ats_context,
uvm_va_block_region_t max_prefetch_region,
uvm_page_mask_t *faulted_mask)
static void ats_compute_prefetch_mask(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
uvm_ats_fault_context_t *ats_context,
uvm_va_block_region_t max_prefetch_region)
{
uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
uvm_page_mask_t *accessed_mask = &ats_context->accessed_mask;
uvm_page_mask_t *residency_mask = &ats_context->prefetch_state.residency_mask;
uvm_page_mask_t *prefetch_mask = &ats_context->prefetch_state.prefetch_pages_mask;
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree = &ats_context->prefetch_state.bitmap_tree;
if (uvm_page_mask_empty(faulted_mask))
if (uvm_page_mask_empty(accessed_mask))
return;
uvm_perf_prefetch_compute_ats(gpu_va_space->va_space,
faulted_mask,
uvm_va_block_region_from_mask(NULL, faulted_mask),
accessed_mask,
uvm_va_block_region_from_mask(NULL, accessed_mask),
max_prefetch_region,
residency_mask,
bitmap_tree,
prefetch_mask);
uvm_page_mask_or(read_fault_mask, read_fault_mask, prefetch_mask);
if (vma->vm_flags & VM_WRITE)
uvm_page_mask_or(write_fault_mask, write_fault_mask, prefetch_mask);
}
static NV_STATUS ats_fault_prefetch(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context)
static NV_STATUS ats_compute_prefetch(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_service_type_t service_type,
uvm_ats_fault_context_t *ats_context)
{
NV_STATUS status = NV_OK;
uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
uvm_page_mask_t *faulted_mask = &ats_context->faulted_mask;
NV_STATUS status;
uvm_page_mask_t *accessed_mask = &ats_context->accessed_mask;
uvm_page_mask_t *prefetch_mask = &ats_context->prefetch_state.prefetch_pages_mask;
uvm_va_block_region_t max_prefetch_region = uvm_ats_region_from_vma(vma, base);
// Residency mask needs to be computed even if prefetching is disabled since
// the residency information is also needed by access counters servicing in
// uvm_ats_service_access_counters()
status = ats_compute_residency_mask(gpu_va_space, vma, base, ats_context);
if (status != NV_OK)
return status;
if (!uvm_perf_prefetch_enabled(gpu_va_space->va_space))
return status;
if (uvm_page_mask_empty(faulted_mask))
return status;
status = ats_compute_residency_mask(gpu_va_space, vma, base, ats_context);
if (status != NV_OK)
if (uvm_page_mask_empty(accessed_mask))
return status;
// Prefetch the entire region if none of the pages are resident on any node
// and if preferred_location is the faulting GPU.
if (ats_context->prefetch_state.has_preferred_location &&
ats_context->prefetch_state.first_touch &&
uvm_id_equal(ats_context->residency_id, gpu_va_space->gpu->parent->id)) {
(ats_context->prefetch_state.first_touch || (service_type == UVM_ATS_SERVICE_TYPE_ACCESS_COUNTERS)) &&
uvm_id_equal(ats_context->residency_id, gpu_va_space->gpu->id)) {
uvm_page_mask_init_from_region(prefetch_mask, max_prefetch_region, NULL);
}
else {
ats_compute_prefetch_mask(gpu_va_space, vma, ats_context, max_prefetch_region);
}
if (service_type == UVM_ATS_SERVICE_TYPE_FAULTS) {
uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
uvm_page_mask_or(read_fault_mask, read_fault_mask, prefetch_mask);
if (vma->vm_flags & VM_WRITE)
uvm_page_mask_or(write_fault_mask, write_fault_mask, prefetch_mask);
return status;
}
ats_expand_fault_region(gpu_va_space, vma, ats_context, max_prefetch_region, faulted_mask);
else {
uvm_page_mask_or(accessed_mask, accessed_mask, prefetch_mask);
}
return status;
}
@@ -448,6 +464,7 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
uvm_page_mask_t *faults_serviced_mask = &ats_context->faults_serviced_mask;
uvm_page_mask_t *reads_serviced_mask = &ats_context->reads_serviced_mask;
uvm_fault_client_type_t client_type = ats_context->client_type;
uvm_ats_service_type_t service_type = UVM_ATS_SERVICE_TYPE_FAULTS;
UVM_ASSERT(vma);
UVM_ASSERT(IS_ALIGNED(base, UVM_VA_BLOCK_SIZE));
@@ -456,6 +473,9 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
UVM_ASSERT(gpu_va_space->ats.enabled);
UVM_ASSERT(uvm_gpu_va_space_state(gpu_va_space) == UVM_GPU_VA_SPACE_STATE_ACTIVE);
uvm_assert_mmap_lock_locked(vma->vm_mm);
uvm_assert_rwsem_locked(&gpu_va_space->va_space->lock);
uvm_page_mask_zero(faults_serviced_mask);
uvm_page_mask_zero(reads_serviced_mask);
@@ -481,7 +501,7 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
ats_batch_select_residency(gpu_va_space, vma, ats_context);
ats_fault_prefetch(gpu_va_space, vma, base, ats_context);
ats_compute_prefetch(gpu_va_space, vma, base, service_type, ats_context);
for_each_va_block_subregion_in_mask(subregion, write_fault_mask, region) {
NvU64 start = base + (subregion.first * PAGE_SIZE);
@@ -493,12 +513,13 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
UVM_ASSERT(start >= vma->vm_start);
UVM_ASSERT((start + length) <= vma->vm_end);
status = service_ats_faults(gpu_va_space, vma, start, length, access_type, ats_context);
status = service_ats_requests(gpu_va_space, vma, start, length, access_type, service_type, ats_context);
if (status != NV_OK)
return status;
if (vma->vm_flags & VM_WRITE) {
uvm_page_mask_region_fill(faults_serviced_mask, subregion);
uvm_ats_smmu_invalidate_tlbs(gpu_va_space, start, length);
// The Linux kernel never invalidates TLB entries on mapping
// permission upgrade. This is a problem if the GPU has cached
@@ -509,7 +530,7 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
// infinite loop because we just forward the fault to the Linux
// kernel and it will see that the permissions in the page table are
// correct. Therefore, we flush TLB entries on ATS write faults.
flush_tlb_write_faults(gpu_va_space, start, length, client_type);
flush_tlb_va_region(gpu_va_space, start, length, client_type);
}
else {
uvm_page_mask_region_fill(reads_serviced_mask, subregion);
@@ -527,11 +548,20 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
UVM_ASSERT(start >= vma->vm_start);
UVM_ASSERT((start + length) <= vma->vm_end);
status = service_ats_faults(gpu_va_space, vma, start, length, access_type, ats_context);
status = service_ats_requests(gpu_va_space, vma, start, length, access_type, service_type, ats_context);
if (status != NV_OK)
return status;
uvm_page_mask_region_fill(faults_serviced_mask, subregion);
// Similarly to the permission upgrade scenario discussed above, the GPU
// will not re-fetch the entry if the PTE is invalid and the page size
// is 4K. To avoid an infinite faulting loop, explicitly invalidate the
// TLB for every new translation written, as in the permission upgrade
// case.
if (PAGE_SIZE == UVM_PAGE_SIZE_4K)
flush_tlb_va_region(gpu_va_space, start, length, client_type);
}
return status;
@@ -566,7 +596,7 @@ NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,
NV_STATUS status;
uvm_push_t push;
if (!ats_invalidate->write_faults_in_batch)
if (!ats_invalidate->tlb_batch_pending)
return NV_OK;
UVM_ASSERT(gpu_va_space);
@@ -578,7 +608,7 @@ NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,
"Invalidate ATS entries");
if (status == NV_OK) {
uvm_tlb_batch_end(&ats_invalidate->write_faults_tlb_batch, &push, UVM_MEMBAR_NONE);
uvm_tlb_batch_end(&ats_invalidate->tlb_batch, &push, UVM_MEMBAR_NONE);
uvm_push_end(&push);
// Add this push to the GPU's tracker so that fault replays/clears can
@@ -586,7 +616,57 @@ NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,
status = uvm_tracker_add_push_safe(out_tracker, &push);
}
ats_invalidate->write_faults_in_batch = false;
ats_invalidate->tlb_batch_pending = false;
return status;
}
NV_STATUS uvm_ats_service_access_counters(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context)
{
uvm_va_block_region_t subregion;
uvm_va_block_region_t region = uvm_va_block_region(0, PAGES_PER_UVM_VA_BLOCK);
uvm_ats_service_type_t service_type = UVM_ATS_SERVICE_TYPE_ACCESS_COUNTERS;
UVM_ASSERT(vma);
UVM_ASSERT(IS_ALIGNED(base, UVM_VA_BLOCK_SIZE));
UVM_ASSERT(g_uvm_global.ats.enabled);
UVM_ASSERT(gpu_va_space);
UVM_ASSERT(gpu_va_space->ats.enabled);
UVM_ASSERT(uvm_gpu_va_space_state(gpu_va_space) == UVM_GPU_VA_SPACE_STATE_ACTIVE);
uvm_assert_mmap_lock_locked(vma->vm_mm);
uvm_assert_rwsem_locked(&gpu_va_space->va_space->lock);
ats_batch_select_residency(gpu_va_space, vma, ats_context);
// Ignoring the return value of ats_compute_prefetch is ok since prefetching
// is just an optimization and servicing access counter migrations is still
// worthwhile even without any prefetching added. So, let servicing continue
// instead of returning early even if the prefetch computation fails.
ats_compute_prefetch(gpu_va_space, vma, base, service_type, ats_context);
// Remove pages which are already resident at the intended destination from
// the accessed_mask.
uvm_page_mask_andnot(&ats_context->accessed_mask,
&ats_context->accessed_mask,
&ats_context->prefetch_state.residency_mask);
for_each_va_block_subregion_in_mask(subregion, &ats_context->accessed_mask, region) {
NV_STATUS status;
NvU64 start = base + (subregion.first * PAGE_SIZE);
size_t length = uvm_va_block_region_num_pages(subregion) * PAGE_SIZE;
uvm_fault_access_type_t access_type = UVM_FAULT_ACCESS_TYPE_COUNT;
UVM_ASSERT(start >= vma->vm_start);
UVM_ASSERT((start + length) <= vma->vm_end);
status = service_ats_requests(gpu_va_space, vma, start, length, access_type, service_type, ats_context);
if (status != NV_OK)
return status;
}
return NV_OK;
}

View File

@@ -42,17 +42,37 @@
// corresponding bit in read_fault_mask. These returned masks are only valid if
// the return status is NV_OK. A status other than NV_OK indicates a
// system-global fault servicing failure.
//
// LOCKING: The caller must retain and hold the mmap_lock and hold the va_space
// lock.
NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context);
// Service access counter notifications on ATS regions in the range (base, base
// + UVM_VA_BLOCK_SIZE) for individual pages in the range requested by page_mask
// set in ats_context->accessed_mask. base must be aligned to UVM_VA_BLOCK_SIZE.
// The caller is responsible for ensuring that the addresses in the
// accessed_mask is completely covered by the VMA. The caller is also
// responsible for handling any errors returned by this function.
//
// Returns NV_OK if servicing was successful. Any other error indicates an error
// while servicing the range.
//
// LOCKING: The caller must retain and hold the mmap_lock and hold the va_space
// lock.
NV_STATUS uvm_ats_service_access_counters(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context);
// Return whether there are any VA ranges (and thus GMMU mappings) within the
// UVM_GMMU_ATS_GRANULARITY-aligned region containing address.
bool uvm_ats_check_in_gmmu_region(uvm_va_space_t *va_space, NvU64 address, uvm_va_range_t *next);
// This function performs pending TLB invalidations for ATS and clears the
// ats_invalidate->write_faults_in_batch flag
// ats_invalidate->tlb_batch_pending flag
NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,
uvm_ats_fault_invalidate_t *ats_invalidate,
uvm_tracker_t *out_tracker);

View File

@@ -30,6 +30,7 @@
#include "uvm_va_space_mm.h"
#include <asm/io.h>
#include <linux/log2.h>
#include <linux/iommu.h>
#include <linux/mm_types.h>
#include <linux/acpi.h>
@@ -50,6 +51,12 @@
#define UVM_IOMMU_SVA_BIND_DEVICE(dev, mm) iommu_sva_bind_device(dev, mm)
#endif
// Type to represent a 128-bit SMMU command queue command.
struct smmu_cmd {
NvU64 low;
NvU64 high;
};
// Base address of SMMU CMDQ-V for GSMMU0.
#define SMMU_CMDQV_BASE_ADDR(smmu_base) (smmu_base + 0x200000)
#define SMMU_CMDQV_BASE_LEN 0x00830000
@@ -101,9 +108,9 @@
// Base address offset for the VCMDQ registers.
#define SMMU_VCMDQ_CMDQ_BASE 0x10000
// Size of the command queue. Each command is 8 bytes and we can't
// have a command queue greater than one page.
#define SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE 9
// Size of the command queue. Each command is 16 bytes and we can't
// have a command queue greater than one page in size.
#define SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE (PAGE_SHIFT - ilog2(sizeof(struct smmu_cmd)))
#define SMMU_VCMDQ_CMDQ_ENTRIES (1UL << SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE)
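/*
 * Worked example (illustrative, assuming 4 KiB pages): PAGE_SHIFT == 12 and
 * sizeof(struct smmu_cmd) == 16, so ilog2(16) == 4 and
 * SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE == 12 - 4 == 8. That yields
 * SMMU_VCMDQ_CMDQ_ENTRIES == 256 commands, and 256 * 16 bytes == 4096 bytes,
 * i.e. the command queue exactly fills one page as required above.
 */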
// We always use VINTF63 for the WAR
@@ -175,7 +182,6 @@ static NV_STATUS uvm_ats_smmu_war_init(uvm_parent_gpu_t *parent_gpu)
iowrite32((VINTF << SMMU_CMDQV_CMDQ_ALLOC_MAP_VIRT_INTF_INDX_SHIFT) | SMMU_CMDQV_CMDQ_ALLOC_MAP_ALLOC,
smmu_cmdqv_base + SMMU_CMDQV_CMDQ_ALLOC_MAP(VCMDQ));
BUILD_BUG_ON((SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE + 3) > PAGE_SHIFT);
smmu_vcmdq_write64(smmu_cmdqv_base, SMMU_VCMDQ_CMDQ_BASE,
page_to_phys(parent_gpu->smmu_war.smmu_cmdq) | SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE);
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_CONS, 0);

View File

@@ -722,7 +722,17 @@ static void internal_channel_submit_work_wlc(uvm_push_t *push)
// Wait for the WLC/LCIC to be primed. This means that PUT == GET + 2
// and a WLC doorbell ring is enough to start work.
UVM_SPIN_WHILE(!uvm_gpu_tracking_semaphore_is_completed(&lcic_channel->tracking_sem), &spin);
UVM_SPIN_WHILE(!uvm_gpu_tracking_semaphore_is_completed(&lcic_channel->tracking_sem), &spin) {
NV_STATUS status = uvm_channel_check_errors(lcic_channel);
if (status != NV_OK) {
UVM_ASSERT(uvm_global_get_status() != NV_OK);
// If there's a global fatal error we can't communicate with the GPU
// and the below launch sequence doesn't work.
UVM_ERR_PRINT_NV_STATUS("Failed to wait for LCIC channel (%s) completion.", status, lcic_channel->name);
return;
}
}
// Executing WLC adds an extra job to LCIC
++lcic_channel->tracking_sem.queued_value;
@@ -2683,7 +2693,7 @@ static void init_channel_manager_conf(uvm_channel_manager_t *manager)
// caches vidmem (and sysmem), we place GPFIFO and GPPUT on sysmem to avoid
// cache thrash. The memory access latency is reduced, despite the required
// access through the bus, because no cache coherence message is exchanged.
if (uvm_parent_gpu_is_coherent(gpu->parent)) {
if (uvm_gpu_is_coherent(gpu->parent)) {
manager->conf.gpfifo_loc = UVM_BUFFER_LOCATION_SYS;
// On GPUs with limited ESCHED addressing range, e.g., Volta on P9, RM
@@ -3250,7 +3260,17 @@ static void channel_manager_stop_wlc(uvm_channel_manager_t *manager)
// Wait for the WLC/LCIC to be primed. This means that PUT == GET + 2
// and a WLC doorbell ring is enough to start work.
UVM_SPIN_WHILE(!uvm_gpu_tracking_semaphore_is_completed(&channel->tracking_sem), &spin);
UVM_SPIN_WHILE(!uvm_gpu_tracking_semaphore_is_completed(&channel->tracking_sem), &spin) {
status = uvm_channel_check_errors(channel);
if (status != NV_OK) {
UVM_ERR_PRINT_NV_STATUS("Failed to wait for LCIC channel (%s) completion", status, channel->name);
break;
}
}
// Continue on error and attempt to stop WLC below. This can lead to
// channel destruction with mismatched GET and PUT pointers. RM will
// print errors if that's the case, but channel destruction succeeds.
}
status = uvm_push_begin(manager, UVM_CHANNEL_TYPE_SEC2, &push, "Stop WLC channels");

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2013-2023 NVIDIA Corporation
Copyright (c) 2013-2021 NVIDIA Corporation
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
@@ -233,6 +233,18 @@ unsigned uvm_get_stale_thread_id(void)
return (unsigned)task_pid_vnr(current);
}
//
// A simple security rule for allowing access to UVM user space memory: if you
// are the same user as the owner of the memory, or if you are root, then you
// are granted access. The idea is to allow debuggers and profilers to work, but
// without opening up any security holes.
//
NvBool uvm_user_id_security_check(uid_t euidTarget)
{
return (NV_CURRENT_EUID() == euidTarget) ||
(UVM_ROOT_UID == euidTarget);
}
void on_uvm_test_fail(void)
{
(void)NULL;

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2013-2023 NVIDIA Corporation
Copyright (c) 2013-2024 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -21,8 +21,8 @@
*******************************************************************************/
#ifndef _UVM_COMMON_H
#define _UVM_COMMON_H
#ifndef __UVM_COMMON_H__
#define __UVM_COMMON_H__
#ifdef DEBUG
#define UVM_IS_DEBUG() 1
@@ -282,6 +282,9 @@ static inline void kmem_cache_destroy_safe(struct kmem_cache **ppCache)
}
}
static const uid_t UVM_ROOT_UID = 0;
typedef struct
{
NvU64 start_time_ns;
@@ -332,6 +335,7 @@ NV_STATUS errno_to_nv_status(int errnoCode);
int nv_status_to_errno(NV_STATUS status);
unsigned uvm_get_stale_process_id(void);
unsigned uvm_get_stale_thread_id(void);
NvBool uvm_user_id_security_check(uid_t euidTarget);
extern int uvm_enable_builtin_tests;
@@ -409,4 +413,42 @@ static inline void uvm_touch_page(struct page *page)
// Return true if the VMA is one used by UVM managed allocations.
bool uvm_vma_is_managed(struct vm_area_struct *vma);
#endif /* _UVM_COMMON_H */
static bool uvm_platform_uses_canonical_form_address(void)
{
if (NVCPU_IS_PPC64LE)
return false;
return true;
}
// Similar to the GPU MMU HAL num_va_bits(), it returns the CPU's num_va_bits().
static NvU32 uvm_cpu_num_va_bits(void)
{
return fls64(TASK_SIZE - 1) + 1;
}
// Return the unaddressable range in a num_va_bits-wide VA space, [first, outer)
static void uvm_get_unaddressable_range(NvU32 num_va_bits, NvU64 *first, NvU64 *outer)
{
UVM_ASSERT(num_va_bits < 64);
UVM_ASSERT(first);
UVM_ASSERT(outer);
// Maxwell GPUs (num_va_bits == 40b) do not support canonical form address
// even when plugged into platforms using it.
if (uvm_platform_uses_canonical_form_address() && num_va_bits > 40) {
*first = 1ULL << (num_va_bits - 1);
*outer = (NvU64)((NvS64)(1ULL << 63) >> (64 - num_va_bits));
}
else {
*first = 1ULL << num_va_bits;
*outer = ~0ULL;
}
}
static void uvm_cpu_get_unaddressable_range(NvU64 *first, NvU64 *outer)
{
return uvm_get_unaddressable_range(uvm_cpu_num_va_bits(), first, outer);
}
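/*
 * Worked example (illustrative only; example_unaddressable_hole() is
 * hypothetical): on a 48-bit canonical-form VA space, such as x86-64 with
 * 4-level paging where uvm_cpu_num_va_bits() returns 48, the hole computed
 * by uvm_get_unaddressable_range() is
 *   first = 1ULL << 47                       = 0x0000800000000000
 *   outer = (NvS64)(1ULL << 63) >> (64 - 48) = 0xffff800000000000
 * so every address in [first, outer) is unaddressable. For num_va_bits <= 40,
 * or on PPC64LE, the range is simply [1ULL << num_va_bits, ~0ULL).
 */
static void example_unaddressable_hole(NvU64 *first, NvU64 *outer)
{
    /* Equivalent to uvm_cpu_get_unaddressable_range() on a 48-bit CPU. */
    uvm_get_unaddressable_range(48, first, outer);
}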
#endif /* __UVM_COMMON_H__ */

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2021 NVIDIA Corporation
Copyright (c) 2021-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -54,23 +54,26 @@ bool uvm_conf_computing_mode_is_hcc(const uvm_gpu_t *gpu)
return uvm_conf_computing_get_mode(gpu->parent) == UVM_GPU_CONF_COMPUTE_MODE_HCC;
}
NV_STATUS uvm_conf_computing_init_parent_gpu(const uvm_parent_gpu_t *parent)
void uvm_conf_computing_check_parent_gpu(const uvm_parent_gpu_t *parent)
{
UvmGpuConfComputeMode cc, sys_cc;
uvm_gpu_t *first;
uvm_gpu_t *first_gpu;
uvm_assert_mutex_locked(&g_uvm_global.global_lock);
// The Confidential Computing state of the GPU should match that of the
// system.
UVM_ASSERT(uvm_conf_computing_mode_enabled_parent(parent) == g_uvm_global.conf_computing_enabled);
// TODO: Bug 2844714: since we have no routine to traverse parent GPUs,
// find first child GPU and get its parent.
first = uvm_global_processor_mask_find_first_gpu(&g_uvm_global.retained_gpus);
if (!first)
return NV_OK;
first_gpu = uvm_global_processor_mask_find_first_gpu(&g_uvm_global.retained_gpus);
if (first_gpu == NULL)
return;
sys_cc = uvm_conf_computing_get_mode(first->parent);
cc = uvm_conf_computing_get_mode(parent);
return cc == sys_cc ? NV_OK : NV_ERR_NOT_SUPPORTED;
// All GPUs derive Confidential Computing status from their parent. By
// current policy all parent GPUs have identical Confidential Computing
// status.
UVM_ASSERT(uvm_conf_computing_get_mode(parent) == uvm_conf_computing_get_mode(first_gpu->parent));
}
static void dma_buffer_destroy_locked(uvm_conf_computing_dma_buffer_pool_t *dma_buffer_pool,

View File

@@ -60,10 +60,8 @@
// UVM_METHOD_SIZE * 2 * 10 = 80.
#define UVM_CONF_COMPUTING_SIGN_BUF_MAX_SIZE 80
// All GPUs derive confidential computing status from their parent.
// By current policy all parent GPUs have identical confidential
// computing status.
NV_STATUS uvm_conf_computing_init_parent_gpu(const uvm_parent_gpu_t *parent);
void uvm_conf_computing_check_parent_gpu(const uvm_parent_gpu_t *parent);
bool uvm_conf_computing_mode_enabled_parent(const uvm_parent_gpu_t *parent);
bool uvm_conf_computing_mode_enabled(const uvm_gpu_t *gpu);
bool uvm_conf_computing_mode_is_hcc(const uvm_gpu_t *gpu);

View File

@@ -71,11 +71,6 @@ static void uvm_unregister_callbacks(void)
}
}
static void sev_init(const UvmPlatformInfo *platform_info)
{
g_uvm_global.sev_enabled = platform_info->sevEnabled;
}
NV_STATUS uvm_global_init(void)
{
NV_STATUS status;
@@ -124,8 +119,7 @@ NV_STATUS uvm_global_init(void)
uvm_ats_init(&platform_info);
g_uvm_global.num_simulated_devices = 0;
sev_init(&platform_info);
g_uvm_global.conf_computing_enabled = platform_info.confComputingEnabled;
status = uvm_gpu_init();
if (status != NV_OK) {

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2015-2021 NVIDIA Corporation
Copyright (c) 2015-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -143,11 +143,16 @@ struct uvm_global_struct
struct page *page;
} unload_state;
// AMD Secure Encrypted Virtualization (SEV) status. True if VM has SEV
// enabled. This field is set once during global initialization
// (uvm_global_init), and can be read afterwards without acquiring any
// locks.
bool sev_enabled;
// True if the VM has AMD's SEV, or equivalent HW security extensions such
// as Intel's TDX, enabled. The flag is always false on the host.
//
// This value moves in tandem with that of Confidential Computing in the
// GPU(s) in all supported configurations, so it is used as a proxy for the
// Confidential Computing state.
//
// This field is set once during global initialization (uvm_global_init),
// and can be read afterwards without acquiring any locks.
bool conf_computing_enabled;
};
// Initialize global uvm state
@@ -233,8 +238,10 @@ static uvm_gpu_t *uvm_gpu_get_by_processor_id(uvm_processor_id_t id)
return gpu;
}
static uvmGpuSessionHandle uvm_global_session_handle(void)
static uvmGpuSessionHandle uvm_gpu_session_handle(uvm_gpu_t *gpu)
{
if (gpu->parent->smc.enabled)
return gpu->smc.rm_session_handle;
return g_uvm_global.rm_session_handle;
}

View File

@@ -99,8 +99,8 @@ static void fill_gpu_info(uvm_parent_gpu_t *parent_gpu, const UvmGpuInfo *gpu_in
parent_gpu->system_bus.link_rate_mbyte_per_s = gpu_info->sysmemLinkRateMBps;
if (gpu_info->systemMemoryWindowSize > 0) {
// memory_window_end is inclusive but uvm_parent_gpu_is_coherent()
// checks memory_window_end > memory_window_start as its condition.
// memory_window_end is inclusive but uvm_gpu_is_coherent() checks
// memory_window_end > memory_window_start as its condition.
UVM_ASSERT(gpu_info->systemMemoryWindowSize > 1);
parent_gpu->system_bus.memory_window_start = gpu_info->systemMemoryWindowStart;
parent_gpu->system_bus.memory_window_end = gpu_info->systemMemoryWindowStart +
@@ -136,12 +136,12 @@ static NV_STATUS get_gpu_caps(uvm_gpu_t *gpu)
return status;
if (gpu_caps.numaEnabled) {
UVM_ASSERT(uvm_parent_gpu_is_coherent(gpu->parent));
UVM_ASSERT(uvm_gpu_is_coherent(gpu->parent));
gpu->mem_info.numa.enabled = true;
gpu->mem_info.numa.node_id = gpu_caps.numaNodeId;
}
else {
UVM_ASSERT(!uvm_parent_gpu_is_coherent(gpu->parent));
UVM_ASSERT(!uvm_gpu_is_coherent(gpu->parent));
}
return NV_OK;
@@ -218,19 +218,12 @@ static bool gpu_supports_uvm(uvm_parent_gpu_t *parent_gpu)
return parent_gpu->rm_info.subdeviceCount == 1;
}
static bool platform_uses_canonical_form_address(void)
{
if (NVCPU_IS_PPC64LE)
return false;
return true;
}
bool uvm_gpu_can_address(uvm_gpu_t *gpu, NvU64 addr, NvU64 size)
{
// Lower and upper address spaces are typically found in platforms that use
// the canonical address form.
NvU64 max_va_lower;
NvU64 min_va_upper;
NvU64 addr_end = addr + size - 1;
NvU8 gpu_addr_shift;
NvU8 cpu_addr_shift;
@@ -243,7 +236,7 @@ bool uvm_gpu_can_address(uvm_gpu_t *gpu, NvU64 addr, NvU64 size)
UVM_ASSERT(size > 0);
gpu_addr_shift = gpu->address_space_tree.hal->num_va_bits();
cpu_addr_shift = fls64(TASK_SIZE - 1) + 1;
cpu_addr_shift = uvm_cpu_num_va_bits();
addr_shift = gpu_addr_shift;
// Pascal+ GPUs are capable of accessing kernel pointers in various modes
@@ -279,9 +272,7 @@ bool uvm_gpu_can_address(uvm_gpu_t *gpu, NvU64 addr, NvU64 size)
// 0 +----------------+ 0 +----------------+
// On canonical form address platforms and Pascal+ GPUs.
if (platform_uses_canonical_form_address() && gpu_addr_shift > 40) {
NvU64 min_va_upper;
if (uvm_platform_uses_canonical_form_address() && gpu_addr_shift > 40) {
// On x86, when cpu_addr_shift > gpu_addr_shift, it means the CPU uses
// 5-level paging and the GPU is pre-Hopper. On Pascal-Ada GPUs (49b
// wide VA) we set addr_shift to match a 4-level paging x86 (48b wide).
@@ -292,15 +283,11 @@ bool uvm_gpu_can_address(uvm_gpu_t *gpu, NvU64 addr, NvU64 size)
addr_shift = gpu_addr_shift;
else
addr_shift = cpu_addr_shift;
}
min_va_upper = (NvU64)((NvS64)(1ULL << 63) >> (64 - addr_shift));
max_va_lower = 1ULL << (addr_shift - 1);
return (addr_end < max_va_lower) || (addr >= min_va_upper);
}
else {
max_va_lower = 1ULL << addr_shift;
return addr_end < max_va_lower;
}
uvm_get_unaddressable_range(addr_shift, &max_va_lower, &min_va_upper);
return (addr_end < max_va_lower) || (addr >= min_va_upper);
}
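The removed inline arithmetic above (a power-of-two bound for the lower half, sign-extension for the upper half) is what the new uvm_get_unaddressable_range() call stands in for. The user-space sketch below reuses exactly those removed expressions and prints two worked shifts; the helper's actual definition lives outside this excerpt.

#include <stdint.h>
#include <stdio.h>

// Sketch of the canonical-form hole computation, mirroring the inline code
// removed in this hunk (right-shifting a negative value is arithmetic on the
// compilers the driver targets, as in the original expression).
static void get_unaddressable_range(uint8_t addr_shift, uint64_t *max_va_lower, uint64_t *min_va_upper)
{
    // Lower half ends below 1 << (addr_shift - 1); the upper half starts at
    // the sign-extension of that same bit.
    *max_va_lower = 1ULL << (addr_shift - 1);
    *min_va_upper = (uint64_t)((int64_t)(1ULL << 63) >> (64 - addr_shift));
}

int main(void)
{
    uint64_t lo, hi;

    // 48-bit VA (4-level x86 paging): hole is [0x0000800000000000, 0xffff800000000000).
    get_unaddressable_range(48, &lo, &hi);
    printf("48-bit: hole [%#llx, %#llx)\n", (unsigned long long)lo, (unsigned long long)hi);

    // 49-bit VA (Pascal-Ada GPUs): hole is [0x0001000000000000, 0xffff000000000000).
    get_unaddressable_range(49, &lo, &hi);
    printf("49-bit: hole [%#llx, %#llx)\n", (unsigned long long)lo, (unsigned long long)hi);
    return 0;
}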
// The internal UVM VAS does not use canonical form addresses.
@@ -326,14 +313,14 @@ NvU64 uvm_parent_gpu_canonical_address(uvm_parent_gpu_t *parent_gpu, NvU64 addr)
NvU8 addr_shift;
NvU64 input_addr = addr;
if (platform_uses_canonical_form_address()) {
if (uvm_platform_uses_canonical_form_address()) {
// When the CPU VA width is larger than GPU's, it means that:
// On ARM: the CPU is on LVA mode and the GPU is pre-Hopper.
// On x86: the CPU uses 5-level paging and the GPU is pre-Hopper.
// We sign-extend on the 48b on ARM and on the 47b on x86 to mirror the
// behavior of CPUs with smaller (than GPU) VA widths.
gpu_addr_shift = parent_gpu->arch_hal->mmu_mode_hal(UVM_PAGE_SIZE_64K)->num_va_bits();
cpu_addr_shift = fls64(TASK_SIZE - 1) + 1;
cpu_addr_shift = uvm_cpu_num_va_bits();
if (cpu_addr_shift > gpu_addr_shift)
addr_shift = NVCPU_IS_X86_64 ? 48 : 49;
@@ -1089,7 +1076,7 @@ static NV_STATUS init_parent_gpu(uvm_parent_gpu_t *parent_gpu,
{
NV_STATUS status;
status = uvm_rm_locked_call(nvUvmInterfaceDeviceCreate(uvm_global_session_handle(),
status = uvm_rm_locked_call(nvUvmInterfaceDeviceCreate(g_uvm_global.rm_session_handle,
gpu_info,
gpu_uuid,
&parent_gpu->rm_device,
@@ -1099,12 +1086,7 @@ static NV_STATUS init_parent_gpu(uvm_parent_gpu_t *parent_gpu,
return status;
}
status = uvm_conf_computing_init_parent_gpu(parent_gpu);
if (status != NV_OK) {
UVM_ERR_PRINT("Confidential computing: %s, GPU %s\n",
nvstatusToString(status), parent_gpu->name);
return status;
}
uvm_conf_computing_check_parent_gpu(parent_gpu);
parent_gpu->pci_dev = gpu_platform_info->pci_dev;
parent_gpu->closest_cpu_numa_node = dev_to_node(&parent_gpu->pci_dev->dev);
@@ -1166,8 +1148,19 @@ static NV_STATUS init_gpu(uvm_gpu_t *gpu, const UvmGpuInfo *gpu_info)
{
NV_STATUS status;
// Presently, an RM client can only subscribe to a single partition per
// GPU. Therefore, UVM needs to create several RM clients. For simplicity,
// and since P2P is not supported when SMC partitions are created, we
// create a client (session) per GPU partition.
if (gpu->parent->smc.enabled) {
status = uvm_rm_locked_call(nvUvmInterfaceDeviceCreate(uvm_global_session_handle(),
UvmPlatformInfo platform_info;
status = uvm_rm_locked_call(nvUvmInterfaceSessionCreate(&gpu->smc.rm_session_handle, &platform_info));
if (status != NV_OK) {
UVM_ERR_PRINT("Creating RM session failed: %s\n", nvstatusToString(status));
return status;
}
status = uvm_rm_locked_call(nvUvmInterfaceDeviceCreate(uvm_gpu_session_handle(gpu),
gpu_info,
uvm_gpu_uuid(gpu),
&gpu->smc.rm_device,
@@ -1537,6 +1530,9 @@ static void deinit_gpu(uvm_gpu_t *gpu)
if (gpu->parent->smc.enabled) {
if (gpu->smc.rm_device != 0)
uvm_rm_locked_call_void(nvUvmInterfaceDeviceDestroy(gpu->smc.rm_device));
if (gpu->smc.rm_session_handle != 0)
uvm_rm_locked_call_void(nvUvmInterfaceSessionDestroy(gpu->smc.rm_session_handle));
}
gpu->magic = 0;
@@ -2566,7 +2562,7 @@ static void disable_peer_access(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1)
uvm_mmu_destroy_peer_identity_mappings(gpu0, gpu1);
uvm_mmu_destroy_peer_identity_mappings(gpu1, gpu0);
uvm_rm_locked_call_void(nvUvmInterfaceP2pObjectDestroy(uvm_global_session_handle(), p2p_handle));
uvm_rm_locked_call_void(nvUvmInterfaceP2pObjectDestroy(uvm_gpu_session_handle(gpu0), p2p_handle));
UVM_ASSERT(uvm_gpu_get(gpu0->global_id) == gpu0);
UVM_ASSERT(uvm_gpu_get(gpu1->global_id) == gpu1);
@@ -2692,9 +2688,9 @@ uvm_processor_id_t uvm_gpu_get_processor_id_by_address(uvm_gpu_t *gpu, uvm_gpu_p
return id;
}
uvm_gpu_peer_t *uvm_gpu_index_peer_caps(const uvm_gpu_id_t gpu_id0, const uvm_gpu_id_t gpu_id1)
uvm_gpu_peer_t *uvm_gpu_index_peer_caps(const uvm_gpu_id_t gpu_id1, const uvm_gpu_id_t gpu_id2)
{
NvU32 table_index = uvm_gpu_peer_table_index(gpu_id0, gpu_id1);
NvU32 table_index = uvm_gpu_peer_table_index(gpu_id1, gpu_id2);
return &g_uvm_global.peers[table_index];
}
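uvm_gpu_peer_table_index() itself is not shown in this excerpt, so the triangular-number mapping below is only an assumption about how an unordered pair of GPU ids could be packed into the flat peers[] array; the driver's real formula may differ.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Illustrative pair-index scheme (assumption, not the driver's definition):
// every unordered pair of distinct ids gets a unique, gap-free slot.
static uint32_t peer_table_index(uint32_t id0, uint32_t id1)
{
    uint32_t lo, hi;

    assert(id0 != id1); // peer caps are only defined for two distinct GPUs

    lo = id0 < id1 ? id0 : id1;
    hi = id0 < id1 ? id1 : id0;

    // Pairs (0,1), (0,2), (1,2), (0,3), ... packed without gaps.
    return (hi * (hi - 1)) / 2 + lo;
}

int main(void)
{
    // The index is symmetric, matching uvm_gpu_index_peer_caps() accepting its
    // two GPU ids in either order.
    printf("%u %u\n", peer_table_index(2, 5), peer_table_index(5, 2)); // both print 12
    return 0;
}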

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2015-2023 NVIDIA Corporation
Copyright (c) 2015-2022 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -57,14 +57,16 @@
typedef struct
{
// Number of faults from this uTLB that have been fetched but have not been serviced yet
// Number of faults from this uTLB that have been fetched but have not been
// serviced yet.
NvU32 num_pending_faults;
// Whether the uTLB contains fatal faults
bool has_fatal_faults;
// We have issued a replay of type START_ACK_ALL while containing fatal faults. This puts
// the uTLB in lockdown mode and no new translations are accepted
// We have issued a replay of type START_ACK_ALL while containing fatal
// faults. This puts the uTLB in lockdown mode and no new translations are
// accepted.
bool in_lockdown;
// We have issued a cancel on this uTLB
@@ -126,8 +128,8 @@ struct uvm_service_block_context_struct
struct list_head service_context_list;
// A mask of GPUs that need to be checked for ECC errors before the CPU
// fault handler returns, but after the VA space lock has been unlocked to
// avoid the RM/UVM VA space lock deadlocks.
// fault handler returns, but after the VA space lock has been unlocked
// to avoid the RM/UVM VA space lock deadlocks.
uvm_processor_mask_t gpus_to_check_for_ecc;
// This is set to throttle page fault thrashing.
@@ -160,14 +162,14 @@ struct uvm_service_block_context_struct
struct
{
// Per-processor mask with the pages that will be resident after servicing.
// We need one mask per processor because we may coalesce faults that
// trigger migrations to different processors.
// Per-processor mask with the pages that will be resident after
// servicing. We need one mask per processor because we may coalesce
// faults that trigger migrations to different processors.
uvm_page_mask_t new_residency;
} per_processor_masks[UVM_ID_MAX_PROCESSORS];
// State used by the VA block routines called by the servicing routine
uvm_va_block_context_t *block_context;
uvm_va_block_context_t block_context;
// Prefetch state hint
uvm_perf_prefetch_hint_t prefetch_hint;
@@ -179,23 +181,28 @@ struct uvm_service_block_context_struct
typedef struct
{
// Mask of read faulted pages in a UVM_VA_BLOCK_SIZE aligned region of a SAM
// VMA. Used for batching ATS faults in a vma.
// VMA. Used for batching ATS faults in a vma. This is unused for access
// counter service requests.
uvm_page_mask_t read_fault_mask;
// Mask of write faulted pages in a UVM_VA_BLOCK_SIZE aligned region of a
// SAM VMA. Used for batching ATS faults in a vma.
// SAM VMA. Used for batching ATS faults in a vma. This is unused for access
// counter service requests.
uvm_page_mask_t write_fault_mask;
// Mask of successfully serviced pages in a UVM_VA_BLOCK_SIZE aligned region
// of a SAM VMA. Used to return ATS fault status.
// of a SAM VMA. Used to return ATS fault status. This is unused for access
// counter service requests.
uvm_page_mask_t faults_serviced_mask;
// Mask of successfully serviced read faults on pages in write_fault_mask.
// This is unused for access counter service requests.
uvm_page_mask_t reads_serviced_mask;
// Mask of all faulted pages in a UVM_VA_BLOCK_SIZE aligned region of a
// SAM VMA. This is used as input to the prefetcher.
uvm_page_mask_t faulted_mask;
// Mask of all accessed pages in a UVM_VA_BLOCK_SIZE aligned region of a SAM
// VMA. This is used as input for access counter service requests and output
// of fault service requests.
uvm_page_mask_t accessed_mask;
// Client type of the service requestor.
uvm_fault_client_type_t client_type;
@@ -294,11 +301,8 @@ struct uvm_fault_service_batch_context_struct
struct uvm_ats_fault_invalidate_struct
{
// Whether the TLB batch contains any information
bool write_faults_in_batch;
// Batch of TLB entries to be invalidated
uvm_tlb_batch_t write_faults_tlb_batch;
bool tlb_batch_pending;
uvm_tlb_batch_t tlb_batch;
};
typedef struct
@@ -443,20 +447,9 @@ struct uvm_access_counter_service_batch_context_struct
NvU32 num_notifications;
// Boolean used to avoid sorting the fault batch by instance_ptr if we
// determine at fetch time that all the access counter notifications in the
// batch report the same instance_ptr
// determine at fetch time that all the access counter notifications in
// the batch report the same instance_ptr
bool is_single_instance_ptr;
// Scratch space, used to generate artificial physically addressed notifications.
// Virtual address notifications are always aligned to 64k. This means up to 16
// different physical locations could have been accessed to trigger one notification.
// The sub-granularity mask can correspond to any of them.
struct
{
uvm_processor_id_t resident_processors[16];
uvm_gpu_phys_address_t phys_addresses[16];
uvm_access_counter_buffer_entry_t phys_entry;
} scratch;
} virt;
struct
@@ -467,8 +460,8 @@ struct uvm_access_counter_service_batch_context_struct
NvU32 num_notifications;
// Boolean used to avoid sorting the fault batch by aperture if we
// determine at fetch time that all the access counter notifications in the
// batch report the same aperture
// determine at fetch time that all the access counter notifications in
// the batch report the same aperture
bool is_single_aperture;
} phys;
@@ -478,6 +471,9 @@ struct uvm_access_counter_service_batch_context_struct
// Structure used to coalesce access counter servicing in a VA block
uvm_service_block_context_t block_service_context;
// Structure used to service access counter migrations in an ATS block.
uvm_ats_fault_context_t ats_context;
// Unique id (per-GPU) generated for tools events recording
NvU32 batch_id;
};
@@ -664,8 +660,8 @@ struct uvm_gpu_struct
struct
{
// Big page size used by the internal UVM VA space
// Notably it may be different than the big page size used by a user's VA
// space in general.
// Notably it may be different than the big page size used by a user's
// VA space in general.
NvU32 internal_size;
} big_page;
@@ -691,8 +687,8 @@ struct uvm_gpu_struct
// lazily-populated array of peer GPUs, indexed by the peer's GPU index
uvm_gpu_t *peer_gpus[UVM_ID_MAX_GPUS];
// Leaf spinlock used to synchronize access to the peer_gpus table so that
// it can be safely accessed from the access counters bottom half
// Leaf spinlock used to synchronize access to the peer_gpus table so
// that it can be safely accessed from the access counters bottom half
uvm_spinlock_t peer_gpus_lock;
} peer_info;
@@ -828,6 +824,8 @@ struct uvm_gpu_struct
{
NvU32 swizz_id;
uvmGpuSessionHandle rm_session_handle;
// RM device handle used in many of the UVM/RM APIs.
//
// Do not read this field directly, use uvm_gpu_device_handle instead.
@@ -981,6 +979,10 @@ struct uvm_parent_gpu_struct
bool plc_supported;
// If true, page_tree initialization pre-populates no_ats_ranges. It only
// affects ATS systems.
bool no_ats_range_required;
// Parameters used by the TLB batching API
struct
{
@@ -1052,14 +1054,16 @@ struct uvm_parent_gpu_struct
// Interrupt handling state and locks
uvm_isr_info_t isr;
// Fault buffer info. This is only valid if supports_replayable_faults is set to true
// Fault buffer info. This is only valid if supports_replayable_faults is
// set to true.
uvm_fault_buffer_info_t fault_buffer_info;
// PMM lazy free processing queue.
// TODO: Bug 3881835: revisit whether to use nv_kthread_q_t or workqueue.
nv_kthread_q_t lazy_free_q;
// Access counter buffer info. This is only valid if supports_access_counters is set to true
// Access counter buffer info. This is only valid if
// supports_access_counters is set to true.
uvm_access_counter_buffer_info_t access_counter_buffer_info;
// Number of uTLBs per GPC. This information is only valid on Pascal+ GPUs.
@@ -1109,7 +1113,7 @@ struct uvm_parent_gpu_struct
uvm_rb_tree_t instance_ptr_table;
uvm_spinlock_t instance_ptr_table_lock;
// This is set to true if the GPU belongs to an SLI group. Else, set to false.
// This is set to true if the GPU belongs to an SLI group.
bool sli_enabled;
struct
@@ -1136,8 +1140,8 @@ struct uvm_parent_gpu_struct
// environment, rather than using the peer-id field of the PTE (which can
// only address 8 gpus), all gpus are assigned a 47-bit physical address
// space by the fabric manager. Any physical address access to these
// physical address spaces are routed through the switch to the corresponding
// peer.
// physical address spaces are routed through the switch to the
// corresponding peer.
struct
{
bool is_nvswitch_connected;
@@ -1347,7 +1351,7 @@ static NvU64 uvm_gpu_retained_count(uvm_gpu_t *gpu)
void uvm_parent_gpu_kref_put(uvm_parent_gpu_t *gpu);
// Calculates peer table index using GPU ids.
NvU32 uvm_gpu_peer_table_index(const uvm_gpu_id_t gpu_id0, const uvm_gpu_id_t gpu_id1);
NvU32 uvm_gpu_peer_table_index(uvm_gpu_id_t gpu_id1, uvm_gpu_id_t gpu_id2);
// Either retains an existing PCIe peer entry or creates a new one. In both
// cases the two GPUs are also each retained.
@@ -1362,11 +1366,12 @@ void uvm_gpu_release_pcie_peer_access(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1);
// They must not be the same gpu.
uvm_aperture_t uvm_gpu_peer_aperture(uvm_gpu_t *local_gpu, uvm_gpu_t *remote_gpu);
// Get the processor id accessible by the given GPU for the given physical address
// Get the processor id accessible by the given GPU for the given physical
// address.
uvm_processor_id_t uvm_gpu_get_processor_id_by_address(uvm_gpu_t *gpu, uvm_gpu_phys_address_t addr);
// Get the P2P capabilities between the gpus with the given indexes
uvm_gpu_peer_t *uvm_gpu_index_peer_caps(const uvm_gpu_id_t gpu_id0, const uvm_gpu_id_t gpu_id1);
uvm_gpu_peer_t *uvm_gpu_index_peer_caps(uvm_gpu_id_t gpu_id1, uvm_gpu_id_t gpu_id2);
// Get the P2P capabilities between the given gpus
static uvm_gpu_peer_t *uvm_gpu_peer_caps(const uvm_gpu_t *gpu0, const uvm_gpu_t *gpu1)
@@ -1374,10 +1379,10 @@ static uvm_gpu_peer_t *uvm_gpu_peer_caps(const uvm_gpu_t *gpu0, const uvm_gpu_t
return uvm_gpu_index_peer_caps(gpu0->id, gpu1->id);
}
static bool uvm_gpus_are_nvswitch_connected(const uvm_gpu_t *gpu0, const uvm_gpu_t *gpu1)
static bool uvm_gpus_are_nvswitch_connected(uvm_gpu_t *gpu1, uvm_gpu_t *gpu2)
{
if (gpu0->parent->nvswitch_info.is_nvswitch_connected && gpu1->parent->nvswitch_info.is_nvswitch_connected) {
UVM_ASSERT(uvm_gpu_peer_caps(gpu0, gpu1)->link_type >= UVM_GPU_LINK_NVLINK_2);
if (gpu1->parent->nvswitch_info.is_nvswitch_connected && gpu2->parent->nvswitch_info.is_nvswitch_connected) {
UVM_ASSERT(uvm_gpu_peer_caps(gpu1, gpu2)->link_type >= UVM_GPU_LINK_NVLINK_2);
return true;
}
@@ -1459,9 +1464,9 @@ NV_STATUS uvm_gpu_check_ecc_error(uvm_gpu_t *gpu);
// Check for ECC errors without calling into RM
//
// Calling into RM is problematic in many places, this check is always safe to do.
// Returns NV_WARN_MORE_PROCESSING_REQUIRED if there might be an ECC error and
// it's required to call uvm_gpu_check_ecc_error() to be sure.
// Calling into RM is problematic in many places, this check is always safe to
// do. Returns NV_WARN_MORE_PROCESSING_REQUIRED if there might be an ECC error
// and it's required to call uvm_gpu_check_ecc_error() to be sure.
NV_STATUS uvm_gpu_check_ecc_error_no_rm(uvm_gpu_t *gpu);
// Map size bytes of contiguous sysmem on the GPU for physical access
@@ -1518,11 +1523,13 @@ bool uvm_gpu_can_address(uvm_gpu_t *gpu, NvU64 addr, NvU64 size);
// The GPU must be initialized before calling this function.
bool uvm_gpu_can_address_kernel(uvm_gpu_t *gpu, NvU64 addr, NvU64 size);
bool uvm_platform_uses_canonical_form_address(void);
// Returns addr's canonical form for host systems that use canonical form
// addresses.
NvU64 uvm_parent_gpu_canonical_address(uvm_parent_gpu_t *parent_gpu, NvU64 addr);
static bool uvm_parent_gpu_is_coherent(const uvm_parent_gpu_t *parent_gpu)
static bool uvm_gpu_is_coherent(const uvm_parent_gpu_t *parent_gpu)
{
return parent_gpu->system_bus.memory_window_end > parent_gpu->system_bus.memory_window_start;
}
@@ -1564,8 +1571,9 @@ uvm_aperture_t uvm_gpu_page_tree_init_location(const uvm_gpu_t *gpu);
// Debug print of GPU properties
void uvm_gpu_print(uvm_gpu_t *gpu);
// Add the given instance pointer -> user_channel mapping to this GPU. The bottom
// half GPU page fault handler uses this to look up the VA space for GPU faults.
// Add the given instance pointer -> user_channel mapping to this GPU. The
// bottom half GPU page fault handler uses this to look up the VA space for GPU
// faults.
NV_STATUS uvm_gpu_add_user_channel(uvm_gpu_t *gpu, uvm_user_channel_t *user_channel);
void uvm_gpu_remove_user_channel(uvm_gpu_t *gpu, uvm_user_channel_t *user_channel);

View File

@@ -33,17 +33,18 @@
#include "uvm_va_space_mm.h"
#include "uvm_pmm_sysmem.h"
#include "uvm_perf_module.h"
#include "uvm_ats.h"
#include "uvm_ats_faults.h"
#define UVM_PERF_ACCESS_COUNTER_BATCH_COUNT_MIN 1
#define UVM_PERF_ACCESS_COUNTER_BATCH_COUNT_DEFAULT 256
#define UVM_PERF_ACCESS_COUNTER_GRANULARITY_DEFAULT "2m"
#define UVM_PERF_ACCESS_COUNTER_GRANULARITY UVM_ACCESS_COUNTER_GRANULARITY_2M
#define UVM_PERF_ACCESS_COUNTER_THRESHOLD_MIN 1
#define UVM_PERF_ACCESS_COUNTER_THRESHOLD_MAX ((1 << 16) - 1)
#define UVM_PERF_ACCESS_COUNTER_THRESHOLD_DEFAULT 256
#define UVM_ACCESS_COUNTER_ACTION_NOTIFY 0x1
#define UVM_ACCESS_COUNTER_ACTION_CLEAR 0x2
#define UVM_ACCESS_COUNTER_ON_MANAGED 0x4
#define UVM_ACCESS_COUNTER_ACTION_CLEAR 0x1
#define UVM_ACCESS_COUNTER_PHYS_ON_MANAGED 0x2
// Each page in a tracked physical range may belong to a different VA Block. We
// preallocate an array of reverse map translations. However, access counter
@@ -54,12 +55,6 @@
#define UVM_MAX_TRANSLATION_SIZE (2 * 1024 * 1024ULL)
#define UVM_SUB_GRANULARITY_REGIONS 32
// The GPU offers the following tracking granularities: 64K, 2M, 16M, 16G
//
// Use the largest granularity to minimize the number of access counter
// notifications. This is fine because we simply drop the notifications during
// normal operation, and tests override these values.
static UVM_ACCESS_COUNTER_GRANULARITY g_uvm_access_counter_granularity;
static unsigned g_uvm_access_counter_threshold;
// Per-VA space access counters information
@@ -87,7 +82,6 @@ static int uvm_perf_access_counter_momc_migration_enable = -1;
static unsigned uvm_perf_access_counter_batch_count = UVM_PERF_ACCESS_COUNTER_BATCH_COUNT_DEFAULT;
// See module param documentation below
static char *uvm_perf_access_counter_granularity = UVM_PERF_ACCESS_COUNTER_GRANULARITY_DEFAULT;
static unsigned uvm_perf_access_counter_threshold = UVM_PERF_ACCESS_COUNTER_THRESHOLD_DEFAULT;
// Module parameters for the tunables
@@ -100,10 +94,6 @@ MODULE_PARM_DESC(uvm_perf_access_counter_momc_migration_enable,
"Whether MOMC access counters will trigger migrations."
"Valid values: <= -1 (default policy), 0 (off), >= 1 (on)");
module_param(uvm_perf_access_counter_batch_count, uint, S_IRUGO);
module_param(uvm_perf_access_counter_granularity, charp, S_IRUGO);
MODULE_PARM_DESC(uvm_perf_access_counter_granularity,
"Size of the physical memory region tracked by each counter. Valid values as"
"of Volta: 64k, 2m, 16m, 16g");
module_param(uvm_perf_access_counter_threshold, uint, S_IRUGO);
MODULE_PARM_DESC(uvm_perf_access_counter_threshold,
"Number of remote accesses on a region required to trigger a notification."
@@ -136,7 +126,7 @@ static va_space_access_counters_info_t *va_space_access_counters_info_get(uvm_va
// Whether access counter migrations are enabled or not. The policy is as
// follows:
// - MIMC migrations are enabled by default on P9 systems with ATS support
// - MIMC migrations are disabled by default on all non-ATS systems.
// - MOMC migrations are disabled by default on all systems
// - Users can override this policy by specifying on/off
static bool is_migration_enabled(uvm_access_counter_type_t type)
@@ -159,7 +149,10 @@ static bool is_migration_enabled(uvm_access_counter_type_t type)
if (type == UVM_ACCESS_COUNTER_TYPE_MOMC)
return false;
return g_uvm_global.ats.supported;
if (UVM_ATS_SUPPORTED())
return g_uvm_global.ats.supported;
return false;
}
// Create the access counters tracking struct for the given VA space
@@ -225,30 +218,18 @@ static NV_STATUS config_granularity_to_bytes(UVM_ACCESS_COUNTER_GRANULARITY gran
return NV_OK;
}
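Only the tail of config_granularity_to_bytes() is visible above. The sketch below assumes the obvious mapping for the granularities the driver's comments list (64K, 2M, 16M, 16G); the enum and status spellings are simplified stand-ins.

#include <stdint.h>

typedef enum { GRANULARITY_64K, GRANULARITY_2M, GRANULARITY_16M, GRANULARITY_16G } granularity_t;

// Hedged sketch of the granularity-to-bytes mapping; the real function takes
// UVM_ACCESS_COUNTER_GRANULARITY and returns NV_STATUS.
static int granularity_to_bytes(granularity_t granularity, uint64_t *bytes)
{
    switch (granularity) {
        case GRANULARITY_64K: *bytes = 64 * 1024ULL;               break;
        case GRANULARITY_2M:  *bytes = 2 * 1024 * 1024ULL;         break;
        case GRANULARITY_16M: *bytes = 16 * 1024 * 1024ULL;        break;
        case GRANULARITY_16G: *bytes = 16 * 1024 * 1024 * 1024ULL; break;
        default:              return -1; // invalid argument
    }
    return 0; // NV_OK in the driver
}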
// Clear the given access counter and add it to the per-GPU clear tracker
static NV_STATUS access_counter_clear_targeted(uvm_gpu_t *gpu,
const uvm_access_counter_buffer_entry_t *entry)
// Clear the access counter notifications and add them to the per-GPU clear
// tracker.
static NV_STATUS access_counter_clear_notifications(uvm_gpu_t *gpu,
uvm_access_counter_buffer_entry_t **notification_start,
NvU32 num_notifications)
{
NvU32 i;
NV_STATUS status;
uvm_push_t push;
uvm_access_counter_buffer_info_t *access_counters = &gpu->parent->access_counter_buffer_info;
if (entry->address.is_virtual) {
status = uvm_push_begin(gpu->channel_manager,
UVM_CHANNEL_TYPE_MEMOPS,
&push,
"Clear access counter with virtual address: 0x%llx",
entry->address.address);
}
else {
status = uvm_push_begin(gpu->channel_manager,
UVM_CHANNEL_TYPE_MEMOPS,
&push,
"Clear access counter with physical address: 0x%llx:%s",
entry->address.address,
uvm_aperture_string(entry->address.aperture));
}
status = uvm_push_begin(gpu->channel_manager, UVM_CHANNEL_TYPE_MEMOPS, &push, "Clear access counter batch");
if (status != NV_OK) {
UVM_ERR_PRINT("Error creating push to clear access counters: %s, GPU %s\n",
nvstatusToString(status),
@@ -256,7 +237,8 @@ static NV_STATUS access_counter_clear_targeted(uvm_gpu_t *gpu,
return status;
}
gpu->parent->host_hal->access_counter_clear_targeted(&push, entry);
for (i = 0; i < num_notifications; i++)
gpu->parent->host_hal->access_counter_clear_targeted(&push, notification_start[i]);
uvm_push_end(&push);
@@ -381,25 +363,6 @@ NV_STATUS uvm_gpu_init_access_counters(uvm_parent_gpu_t *parent_gpu)
g_uvm_access_counter_threshold = uvm_perf_access_counter_threshold;
}
if (strcmp(uvm_perf_access_counter_granularity, "64k") == 0) {
g_uvm_access_counter_granularity = UVM_ACCESS_COUNTER_GRANULARITY_64K;
}
else if (strcmp(uvm_perf_access_counter_granularity, "2m") == 0) {
g_uvm_access_counter_granularity = UVM_ACCESS_COUNTER_GRANULARITY_2M;
}
else if (strcmp(uvm_perf_access_counter_granularity, "16m") == 0) {
g_uvm_access_counter_granularity = UVM_ACCESS_COUNTER_GRANULARITY_16M;
}
else if (strcmp(uvm_perf_access_counter_granularity, "16g") == 0) {
g_uvm_access_counter_granularity = UVM_ACCESS_COUNTER_GRANULARITY_16G;
}
else {
g_uvm_access_counter_granularity = UVM_ACCESS_COUNTER_GRANULARITY_2M;
pr_info("Invalid value '%s' for uvm_perf_access_counter_granularity, using '%s' instead",
uvm_perf_access_counter_granularity,
UVM_PERF_ACCESS_COUNTER_GRANULARITY_DEFAULT);
}
uvm_assert_mutex_locked(&g_uvm_global.global_lock);
UVM_ASSERT(parent_gpu->access_counter_buffer_hal != NULL);
@@ -422,7 +385,7 @@ NV_STATUS uvm_gpu_init_access_counters(uvm_parent_gpu_t *parent_gpu)
UVM_ASSERT(access_counters->rm_info.bufferSize %
parent_gpu->access_counter_buffer_hal->entry_size(parent_gpu) == 0);
status = config_granularity_to_bytes(g_uvm_access_counter_granularity, &granularity_bytes);
status = config_granularity_to_bytes(UVM_PERF_ACCESS_COUNTER_GRANULARITY, &granularity_bytes);
UVM_ASSERT(status == NV_OK);
if (granularity_bytes > UVM_MAX_TRANSLATION_SIZE)
UVM_ASSERT(granularity_bytes % UVM_MAX_TRANSLATION_SIZE == 0);
@@ -641,8 +604,8 @@ NV_STATUS uvm_gpu_access_counters_enable(uvm_gpu_t *gpu, uvm_va_space_t *va_spac
else {
UvmGpuAccessCntrConfig default_config =
{
.mimcGranularity = g_uvm_access_counter_granularity,
.momcGranularity = g_uvm_access_counter_granularity,
.mimcGranularity = UVM_PERF_ACCESS_COUNTER_GRANULARITY,
.momcGranularity = UVM_PERF_ACCESS_COUNTER_GRANULARITY,
.mimcUseLimit = UVM_ACCESS_COUNTER_USE_LIMIT_FULL,
.momcUseLimit = UVM_ACCESS_COUNTER_USE_LIMIT_FULL,
.threshold = g_uvm_access_counter_threshold,
@@ -717,7 +680,10 @@ static void access_counter_buffer_flush_locked(uvm_gpu_t *gpu, uvm_gpu_buffer_fl
while (get != put) {
// Wait until valid bit is set
UVM_SPIN_WHILE(!gpu->parent->access_counter_buffer_hal->entry_is_valid(gpu->parent, get), &spin);
UVM_SPIN_WHILE(!gpu->parent->access_counter_buffer_hal->entry_is_valid(gpu->parent, get), &spin) {
if (uvm_global_get_status() != NV_OK)
goto done;
}
gpu->parent->access_counter_buffer_hal->entry_clear_valid(gpu->parent, get);
++get;
@@ -725,6 +691,7 @@ static void access_counter_buffer_flush_locked(uvm_gpu_t *gpu, uvm_gpu_buffer_fl
get = 0;
}
done:
write_get(gpu->parent, get);
}
@@ -767,6 +734,22 @@ static int cmp_sort_virt_notifications_by_instance_ptr(const void *_a, const voi
return cmp_access_counter_instance_ptr(a, b);
}
// Sort comparator for pointers to GVA access counter notification buffer
// entries that sorts by va_space and notification address.
static int cmp_sort_virt_notifications_by_va_space_address(const void *_a, const void *_b)
{
const uvm_access_counter_buffer_entry_t **a = (const uvm_access_counter_buffer_entry_t **)_a;
const uvm_access_counter_buffer_entry_t **b = (const uvm_access_counter_buffer_entry_t **)_b;
int result;
result = UVM_CMP_DEFAULT((*a)->virtual_info.va_space, (*b)->virtual_info.va_space);
if (result != 0)
return result;
return UVM_CMP_DEFAULT((*a)->address.address, (*b)->address.address);
}
// Sort comparator for pointers to GPA access counter notification buffer
// entries that sorts by physical address' aperture
static int cmp_sort_phys_notifications_by_processor_id(const void *_a, const void *_b)
@@ -834,12 +817,18 @@ static NvU32 fetch_access_counter_buffer_entries(uvm_gpu_t *gpu,
(fetch_mode == NOTIFICATION_FETCH_MODE_ALL || notification_index < access_counters->max_batch_size)) {
uvm_access_counter_buffer_entry_t *current_entry = &notification_cache[notification_index];
// We cannot just wait for the last entry (the one pointed by put) to become valid, we have to do it
// individually since entries can be written out of order
// We cannot just wait for the last entry (the one pointed by put) to
// become valid, we have to do it individually since entries can be
// written out of order
UVM_SPIN_WHILE(!gpu->parent->access_counter_buffer_hal->entry_is_valid(gpu->parent, get), &spin) {
// We have some entry to work on. Let's do the rest later.
if (fetch_mode != NOTIFICATION_FETCH_MODE_ALL && notification_index > 0)
goto done;
// There's no entry to work on and something has gone wrong. Ignore
// the rest.
if (uvm_global_get_status() != NV_OK)
goto done;
}
// Prevent later accesses being moved above the read of the valid bit
@@ -924,12 +913,11 @@ static void translate_virt_notifications_instance_ptrs(uvm_gpu_t *gpu,
// GVA notifications provide an instance_ptr and ve_id that can be directly
// translated to a VA space. In order to minimize translations, we sort the
// entries by instance_ptr.
// entries by instance_ptr, va_space and notification address in that order.
static void preprocess_virt_notifications(uvm_gpu_t *gpu,
uvm_access_counter_service_batch_context_t *batch_context)
{
if (!batch_context->virt.is_single_instance_ptr) {
// Sort by instance_ptr
sort(batch_context->virt.notifications,
batch_context->virt.num_notifications,
sizeof(*batch_context->virt.notifications),
@@ -938,6 +926,12 @@ static void preprocess_virt_notifications(uvm_gpu_t *gpu,
}
translate_virt_notifications_instance_ptrs(gpu, batch_context);
sort(batch_context->virt.notifications,
batch_context->virt.num_notifications,
sizeof(*batch_context->virt.notifications),
cmp_sort_virt_notifications_by_va_space_address,
NULL);
}
// GPA notifications provide a physical address and an aperture. Sort
@@ -946,7 +940,6 @@ static void preprocess_virt_notifications(uvm_gpu_t *gpu,
static void preprocess_phys_notifications(uvm_access_counter_service_batch_context_t *batch_context)
{
if (!batch_context->phys.is_single_aperture) {
// Sort by instance_ptr
sort(batch_context->phys.notifications,
batch_context->phys.num_notifications,
sizeof(*batch_context->phys.notifications),
@@ -955,6 +948,28 @@ static void preprocess_phys_notifications(uvm_access_counter_service_batch_conte
}
}
static NV_STATUS notify_tools_and_process_flags(uvm_gpu_t *gpu,
uvm_access_counter_buffer_entry_t **notification_start,
NvU32 num_entries,
NvU32 flags)
{
NV_STATUS status = NV_OK;
if (uvm_enable_builtin_tests) {
// TODO: Bug 4310744: [UVM][TOOLS] Attribute access counter tools events
// to va_space instead of broadcasting.
NvU32 i;
for (i = 0; i < num_entries; i++)
uvm_tools_broadcast_access_counter(gpu, notification_start[i], flags & UVM_ACCESS_COUNTER_PHYS_ON_MANAGED);
}
if (flags & UVM_ACCESS_COUNTER_ACTION_CLEAR)
status = access_counter_clear_notifications(gpu, notification_start, num_entries);
return status;
}
static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
uvm_va_block_t *va_block,
uvm_va_block_retry_t *va_block_retry,
@@ -985,7 +1000,7 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
return NV_OK;
if (uvm_processor_mask_test(&va_block->resident, processor))
residency_mask = uvm_va_block_resident_mask_get(va_block, processor, NUMA_NO_NODE);
residency_mask = uvm_va_block_resident_mask_get(va_block, processor);
else
residency_mask = NULL;
@@ -1036,8 +1051,8 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
// If the underlying VMA is gone, skip HMM migrations.
if (uvm_va_block_is_hmm(va_block)) {
status = uvm_hmm_find_vma(service_context->block_context->mm,
&service_context->block_context->hmm.vma,
status = uvm_hmm_find_vma(service_context->block_context.mm,
&service_context->block_context.hmm.vma,
address);
if (status == NV_ERR_INVALID_ADDRESS)
continue;
@@ -1048,7 +1063,7 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
policy = uvm_va_policy_get(va_block, address);
new_residency = uvm_va_block_select_residency(va_block,
service_context->block_context,
&service_context->block_context,
page_index,
processor,
uvm_fault_access_type_mask_bit(UVM_FAULT_ACCESS_TYPE_PREFETCH),
@@ -1083,7 +1098,7 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
// Remove pages that are already resident in the destination processors
for_each_id_in_mask(id, &update_processors) {
bool migrate_pages;
uvm_page_mask_t *residency_mask = uvm_va_block_resident_mask_get(va_block, id, NUMA_NO_NODE);
uvm_page_mask_t *residency_mask = uvm_va_block_resident_mask_get(va_block, id);
UVM_ASSERT(residency_mask);
migrate_pages = uvm_page_mask_andnot(&service_context->per_processor_masks[uvm_id_value(id)].new_residency,
@@ -1101,9 +1116,9 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
if (uvm_va_block_is_hmm(va_block)) {
status = NV_ERR_INVALID_ADDRESS;
if (service_context->block_context->mm) {
if (service_context->block_context.mm) {
status = uvm_hmm_find_policy_vma_and_outer(va_block,
&service_context->block_context->hmm.vma,
&service_context->block_context.hmm.vma,
first_page_index,
&policy,
&outer);
@@ -1163,7 +1178,7 @@ static NV_STATUS service_phys_single_va_block(uvm_gpu_t *gpu,
const uvm_access_counter_buffer_entry_t *current_entry,
const uvm_reverse_map_t *reverse_mappings,
size_t num_reverse_mappings,
unsigned *out_flags)
NvU32 *out_flags)
{
size_t index;
uvm_va_block_t *va_block = reverse_mappings[0].va_block;
@@ -1190,7 +1205,6 @@ static NV_STATUS service_phys_single_va_block(uvm_gpu_t *gpu,
// If an mm is registered with the VA space, we have to retain it
// in order to lock it before locking the VA space.
mm = uvm_va_space_mm_retain_lock(va_space);
uvm_va_space_down_read(va_space);
// Re-check that the VA block is valid after taking the VA block lock.
@@ -1206,7 +1220,7 @@ static NV_STATUS service_phys_single_va_block(uvm_gpu_t *gpu,
service_context->operation = UVM_SERVICE_OPERATION_ACCESS_COUNTERS;
service_context->num_retries = 0;
service_context->block_context->mm = mm;
service_context->block_context.mm = mm;
if (uvm_va_block_is_hmm(va_block)) {
uvm_hmm_service_context_init(service_context);
@@ -1251,7 +1265,7 @@ static NV_STATUS service_phys_va_blocks(uvm_gpu_t *gpu,
const uvm_access_counter_buffer_entry_t *current_entry,
const uvm_reverse_map_t *reverse_mappings,
size_t num_reverse_mappings,
unsigned *out_flags)
NvU32 *out_flags)
{
NV_STATUS status = NV_OK;
size_t index;
@@ -1259,7 +1273,7 @@ static NV_STATUS service_phys_va_blocks(uvm_gpu_t *gpu,
*out_flags &= ~UVM_ACCESS_COUNTER_ACTION_CLEAR;
for (index = 0; index < num_reverse_mappings; ++index) {
unsigned out_flags_local = 0;
NvU32 out_flags_local = 0;
status = service_phys_single_va_block(gpu,
batch_context,
current_entry,
@@ -1318,7 +1332,7 @@ static NV_STATUS service_phys_notification_translation(uvm_gpu_t *gpu,
NvU64 address,
unsigned long sub_granularity,
size_t *num_reverse_mappings,
unsigned *out_flags)
NvU32 *out_flags)
{
NV_STATUS status;
NvU32 region_start, region_end;
@@ -1327,7 +1341,10 @@ static NV_STATUS service_phys_notification_translation(uvm_gpu_t *gpu,
// Get the reverse_map translations for all the regions set in the
// sub_granularity field of the counter.
for_each_sub_granularity_region(region_start, region_end, sub_granularity, config->sub_granularity_regions_per_translation) {
for_each_sub_granularity_region(region_start,
region_end,
sub_granularity,
config->sub_granularity_regions_per_translation) {
NvU64 local_address = address + region_start * config->sub_granularity_region_size;
NvU32 local_translation_size = (region_end - region_start) * config->sub_granularity_region_size;
uvm_reverse_map_t *local_reverse_mappings = batch_context->phys.translations + *num_reverse_mappings;
@@ -1376,7 +1393,7 @@ static NV_STATUS service_phys_notification_translation(uvm_gpu_t *gpu,
static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
uvm_access_counter_service_batch_context_t *batch_context,
const uvm_access_counter_buffer_entry_t *current_entry,
unsigned *out_flags)
NvU32 *out_flags)
{
NvU64 address;
NvU64 translation_index;
@@ -1387,7 +1404,7 @@ static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
size_t total_reverse_mappings = 0;
uvm_gpu_t *resident_gpu = NULL;
NV_STATUS status = NV_OK;
unsigned flags = 0;
NvU32 flags = 0;
address = current_entry->address.address;
UVM_ASSERT(address % config->translation_size == 0);
@@ -1415,7 +1432,7 @@ static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
for (translation_index = 0; translation_index < config->translations_per_counter; ++translation_index) {
size_t num_reverse_mappings;
unsigned out_flags_local = 0;
NvU32 out_flags_local = 0;
status = service_phys_notification_translation(gpu,
resident_gpu,
batch_context,
@@ -1437,11 +1454,8 @@ static NV_STATUS service_phys_notification(uvm_gpu_t *gpu,
sub_granularity = sub_granularity >> config->sub_granularity_regions_per_translation;
}
// Currently we only report events for our tests, not for tools
if (uvm_enable_builtin_tests) {
*out_flags |= UVM_ACCESS_COUNTER_ACTION_NOTIFY;
*out_flags |= ((total_reverse_mappings != 0) ? UVM_ACCESS_COUNTER_ON_MANAGED : 0);
}
if (uvm_enable_builtin_tests)
*out_flags |= ((total_reverse_mappings != 0) ? UVM_ACCESS_COUNTER_PHYS_ON_MANAGED : 0);
if (status == NV_OK && (flags & UVM_ACCESS_COUNTER_ACTION_CLEAR))
*out_flags |= UVM_ACCESS_COUNTER_ACTION_CLEAR;
@@ -1454,22 +1468,21 @@ static NV_STATUS service_phys_notifications(uvm_gpu_t *gpu,
uvm_access_counter_service_batch_context_t *batch_context)
{
NvU32 i;
uvm_access_counter_buffer_entry_t **notifications = batch_context->phys.notifications;
preprocess_phys_notifications(batch_context);
for (i = 0; i < batch_context->phys.num_notifications; ++i) {
NV_STATUS status;
uvm_access_counter_buffer_entry_t *current_entry = batch_context->phys.notifications[i];
unsigned flags = 0;
uvm_access_counter_buffer_entry_t *current_entry = notifications[i];
NvU32 flags = 0;
if (!UVM_ID_IS_VALID(current_entry->physical_info.resident_id))
continue;
status = service_phys_notification(gpu, batch_context, current_entry, &flags);
if (flags & UVM_ACCESS_COUNTER_ACTION_NOTIFY)
uvm_tools_broadcast_access_counter(gpu, current_entry, flags & UVM_ACCESS_COUNTER_ON_MANAGED);
if (status == NV_OK && (flags & UVM_ACCESS_COUNTER_ACTION_CLEAR))
status = access_counter_clear_targeted(gpu, current_entry);
notify_tools_and_process_flags(gpu, &notifications[i], 1, flags);
if (status != NV_OK)
return status;
@@ -1478,187 +1491,375 @@ static NV_STATUS service_phys_notifications(uvm_gpu_t *gpu,
return NV_OK;
}
static int cmp_sort_gpu_phys_addr(const void *_a, const void *_b)
static NV_STATUS service_notification_va_block_helper(struct mm_struct *mm,
uvm_va_block_t *va_block,
uvm_processor_id_t processor,
uvm_access_counter_service_batch_context_t *batch_context)
{
return uvm_gpu_phys_addr_cmp(*(uvm_gpu_phys_address_t*)_a,
*(uvm_gpu_phys_address_t*)_b);
}
uvm_va_block_retry_t va_block_retry;
uvm_page_mask_t *accessed_pages = &batch_context->accessed_pages;
uvm_service_block_context_t *service_context = &batch_context->block_service_context;
static bool gpu_phys_same_region(uvm_gpu_phys_address_t a, uvm_gpu_phys_address_t b, NvU64 granularity)
{
if (a.aperture != b.aperture)
return false;
UVM_ASSERT(is_power_of_2(granularity));
return UVM_ALIGN_DOWN(a.address, granularity) == UVM_ALIGN_DOWN(b.address, granularity);
}
static bool phys_address_in_accessed_sub_region(uvm_gpu_phys_address_t address,
NvU64 region_size,
NvU64 sub_region_size,
NvU32 accessed_mask)
{
const unsigned accessed_index = (address.address % region_size) / sub_region_size;
// accessed_mask is only filled for tracking granularities larger than 64K
if (region_size == UVM_PAGE_SIZE_64K)
return true;
UVM_ASSERT(accessed_index < 32);
return ((1 << accessed_index) & accessed_mask) != 0;
}
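The removed helper above reduces to a bit test: with UVM_MAX_TRANSLATION_SIZE of 2M split into UVM_SUB_GRANULARITY_REGIONS (32) sub-regions of 64K, the accessed bit index is simply the 64K-aligned offset within the 2M region. A stand-alone restatement with a worked offset:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Constants mirror UVM_MAX_TRANSLATION_SIZE / UVM_SUB_GRANULARITY_REGIONS.
#define REGION_SIZE     (2ULL * 1024 * 1024)
#define SUB_REGION_SIZE (REGION_SIZE / 32)  // 64K per sub-region

static bool address_was_accessed(uint64_t address, uint32_t accessed_mask)
{
    unsigned index = (unsigned)((address % REGION_SIZE) / SUB_REGION_SIZE);

    assert(index < 32);
    return ((1u << index) & accessed_mask) != 0;
}

int main(void)
{
    // Offset 0x30000 (192K) into the 2M region lands in sub-region 3, so only
    // masks with bit 3 set report it as accessed.
    assert(address_was_accessed(0x30000, 1u << 3));
    assert(!address_was_accessed(0x30000, 1u << 4));
    return 0;
}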
static NV_STATUS service_virt_notification(uvm_gpu_t *gpu,
uvm_access_counter_service_batch_context_t *batch_context,
const uvm_access_counter_buffer_entry_t *current_entry,
unsigned *out_flags)
{
NV_STATUS status = NV_OK;
NvU64 notification_size;
NvU64 address;
uvm_processor_id_t *resident_processors = batch_context->virt.scratch.resident_processors;
uvm_gpu_phys_address_t *phys_addresses = batch_context->virt.scratch.phys_addresses;
int num_addresses = 0;
int i;
// Virtual address notifications are always 64K aligned
NvU64 region_start = current_entry->address.address;
NvU64 region_end = current_entry->address.address + UVM_PAGE_SIZE_64K;
uvm_access_counter_buffer_info_t *access_counters = &gpu->parent->access_counter_buffer_info;
uvm_access_counter_type_t counter_type = current_entry->counter_type;
const uvm_gpu_access_counter_type_config_t *config = get_config_for_type(access_counters, counter_type);
uvm_va_space_t *va_space = current_entry->virtual_info.va_space;
UVM_ASSERT(counter_type == UVM_ACCESS_COUNTER_TYPE_MIMC);
// Entries with NULL va_space are simply dropped.
if (!va_space)
if (uvm_page_mask_empty(accessed_pages))
return NV_OK;
status = config_granularity_to_bytes(config->rm.granularity, &notification_size);
if (status != NV_OK)
return status;
uvm_assert_mutex_locked(&va_block->lock);
// Collect physical locations that could have been touched
// in the reported 64K VA region. The notification mask can
// correspond to any of them.
uvm_va_space_down_read(va_space);
for (address = region_start; address < region_end;) {
uvm_va_block_t *va_block;
service_context->operation = UVM_SERVICE_OPERATION_ACCESS_COUNTERS;
service_context->num_retries = 0;
service_context->block_context.mm = mm;
NV_STATUS local_status = uvm_va_block_find(va_space, address, &va_block);
if (local_status == NV_ERR_INVALID_ADDRESS || local_status == NV_ERR_OBJECT_NOT_FOUND) {
address += PAGE_SIZE;
continue;
}
return UVM_VA_BLOCK_RETRY_LOCKED(va_block,
&va_block_retry,
service_va_block_locked(processor,
va_block,
&va_block_retry,
service_context,
accessed_pages));
}
uvm_mutex_lock(&va_block->lock);
while (address < va_block->end && address < region_end) {
const unsigned page_index = uvm_va_block_cpu_page_index(va_block, address);
static void expand_notification_block(uvm_gpu_va_space_t *gpu_va_space,
uvm_va_block_t *va_block,
uvm_page_mask_t *accessed_pages,
const uvm_access_counter_buffer_entry_t *current_entry)
{
NvU64 addr;
NvU64 granularity = 0;
uvm_gpu_t *resident_gpu = NULL;
uvm_processor_id_t resident_id;
uvm_page_index_t page_index;
uvm_gpu_t *gpu = gpu_va_space->gpu;
const uvm_access_counter_buffer_info_t *access_counters = &gpu->parent->access_counter_buffer_info;
const uvm_gpu_access_counter_type_config_t *config = get_config_for_type(access_counters,
UVM_ACCESS_COUNTER_TYPE_MIMC);
// UVM va_block always maps the closest resident location to processor
const uvm_processor_id_t res_id = uvm_va_block_page_get_closest_resident(va_block, page_index, gpu->id);
config_granularity_to_bytes(config->rm.granularity, &granularity);
// Add physical location if it's valid and not local vidmem
if (UVM_ID_IS_VALID(res_id) && !uvm_id_equal(res_id, gpu->id)) {
uvm_gpu_phys_address_t phys_address = uvm_va_block_res_phys_page_address(va_block, page_index, res_id, gpu);
if (phys_address_in_accessed_sub_region(phys_address,
notification_size,
config->sub_granularity_region_size,
current_entry->sub_granularity)) {
resident_processors[num_addresses] = res_id;
phys_addresses[num_addresses] = phys_address;
++num_addresses;
}
else {
UVM_DBG_PRINT_RL("Skipping phys address %llx:%s, because it couldn't have been accessed in mask %x",
phys_address.address,
uvm_aperture_string(phys_address.aperture),
current_entry->sub_granularity);
}
}
// Granularities other than 2MB can only be enabled by UVM tests. Do nothing
// in that case.
if (granularity != UVM_PAGE_SIZE_2M)
return;
address += PAGE_SIZE;
}
uvm_mutex_unlock(&va_block->lock);
addr = current_entry->address.address;
uvm_assert_rwsem_locked(&gpu_va_space->va_space->lock);
uvm_assert_mutex_locked(&va_block->lock);
page_index = uvm_va_block_cpu_page_index(va_block, addr);
resident_id = uvm_va_block_page_get_closest_resident(va_block, page_index, gpu->id);
// resident_id might be invalid or might already be the same as the GPU
// which received the notification if the memory was already migrated before
// acquiring the locks either during the servicing of previous notifications
// or during faults or because of explicit migrations or if the VA range was
// freed after receiving the notification. Return early in such cases.
if (!UVM_ID_IS_VALID(resident_id) || uvm_id_equal(resident_id, gpu->id))
return;
if (UVM_ID_IS_GPU(resident_id))
resident_gpu = uvm_va_space_get_gpu(gpu_va_space->va_space, resident_id);
if (uvm_va_block_get_physical_size(va_block, resident_id, page_index) != granularity) {
uvm_page_mask_set(accessed_pages, page_index);
}
uvm_va_space_up_read(va_space);
else {
NvU32 region_start;
NvU32 region_end;
unsigned long sub_granularity = current_entry->sub_granularity;
NvU32 num_regions = config->sub_granularity_regions_per_translation;
NvU32 num_sub_pages = config->sub_granularity_region_size / PAGE_SIZE;
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, resident_id);
// The addresses need to be sorted to aid coalescing.
sort(phys_addresses,
num_addresses,
sizeof(*phys_addresses),
cmp_sort_gpu_phys_addr,
NULL);
UVM_ASSERT(num_sub_pages >= 1);
for (i = 0; i < num_addresses; ++i) {
uvm_access_counter_buffer_entry_t *fake_entry = &batch_context->virt.scratch.phys_entry;
// Skip the current pointer if the physical region was already handled
if (i > 0 && gpu_phys_same_region(phys_addresses[i - 1], phys_addresses[i], notification_size)) {
UVM_ASSERT(uvm_id_equal(resident_processors[i - 1], resident_processors[i]));
continue;
// region_start and region_end refer to sub_granularity indices, not
// page_indices.
for_each_sub_granularity_region(region_start, region_end, sub_granularity, num_regions) {
uvm_page_mask_region_fill(accessed_pages,
uvm_va_block_region(region_start * num_sub_pages,
region_end * num_sub_pages));
}
UVM_DBG_PRINT_RL("Faking MIMC address[%i/%i]: %llx (granularity mask: %llx) in aperture %s on device %s\n",
i,
num_addresses,
phys_addresses[i].address,
notification_size - 1,
uvm_aperture_string(phys_addresses[i].aperture),
uvm_gpu_name(gpu));
// Construct a fake phys addr AC entry
fake_entry->counter_type = current_entry->counter_type;
fake_entry->address.address = UVM_ALIGN_DOWN(phys_addresses[i].address, notification_size);
fake_entry->address.aperture = phys_addresses[i].aperture;
fake_entry->address.is_virtual = false;
fake_entry->physical_info.resident_id = resident_processors[i];
fake_entry->counter_value = current_entry->counter_value;
fake_entry->sub_granularity = current_entry->sub_granularity;
// Remove pages in the va_block which are not resident on resident_id.
// If the GPU is heavily accessing those pages, future access counter
// migrations will migrate them to the GPU.
uvm_page_mask_and(accessed_pages, accessed_pages, resident_mask);
}
}
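expand_notification_block() turns the 32-bit sub_granularity mask into page indices by walking contiguous runs of set bits and scaling by sub_granularity_region_size / PAGE_SIZE. A minimal user-space model of that expansion (the mask-walking helper simplifies for_each_sub_granularity_region(), and one page per sub-region assumes a 64K-page kernel with 2M tracking granularity):

#include <stdint.h>
#include <stdio.h>

static void expand_mask(uint32_t sub_granularity, unsigned num_sub_pages, uint8_t *pages, unsigned num_pages)
{
    unsigned bit = 0;

    while (bit < 32) {
        unsigned start, end, page;

        // Find the next contiguous run of set bits in the mask.
        while (bit < 32 && !(sub_granularity & (1u << bit)))
            bit++;
        start = bit;
        while (bit < 32 && (sub_granularity & (1u << bit)))
            bit++;
        end = bit;

        // Mark the pages covered by that run of sub-regions.
        for (page = start * num_sub_pages; page < end * num_sub_pages && page < num_pages; page++)
            pages[page] = 1;
    }
}

int main(void)
{
    uint8_t pages[32] = {0};
    unsigned i;

    // Bits 4..7 set: with one page per sub-region, pages 4..7 become migration
    // candidates (before intersecting with the resident mask).
    expand_mask(0xF0, 1, pages, 32);
    for (i = 0; i < 32; i++)
        if (pages[i])
            printf("page %u\n", i);
    return 0;
}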
status = service_phys_notification(gpu, batch_context, fake_entry, out_flags);
if (status != NV_OK)
static NV_STATUS service_virt_notifications_in_block(uvm_gpu_va_space_t *gpu_va_space,
struct mm_struct *mm,
uvm_va_block_t *va_block,
uvm_access_counter_service_batch_context_t *batch_context,
NvU32 index,
NvU32 *out_index)
{
NvU32 i;
NvU32 flags = 0;
NV_STATUS status = NV_OK;
NV_STATUS flags_status;
uvm_gpu_t *gpu = gpu_va_space->gpu;
uvm_va_space_t *va_space = gpu_va_space->va_space;
uvm_page_mask_t *accessed_pages = &batch_context->accessed_pages;
uvm_access_counter_buffer_entry_t **notifications = batch_context->virt.notifications;
UVM_ASSERT(va_block);
UVM_ASSERT(index < batch_context->virt.num_notifications);
uvm_assert_rwsem_locked(&va_space->lock);
uvm_page_mask_zero(accessed_pages);
uvm_mutex_lock(&va_block->lock);
for (i = index; i < batch_context->virt.num_notifications; i++) {
uvm_access_counter_buffer_entry_t *current_entry = notifications[i];
NvU64 address = current_entry->address.address;
if ((current_entry->virtual_info.va_space == va_space) && (address <= va_block->end))
expand_notification_block(gpu_va_space, va_block, accessed_pages, current_entry);
else
break;
}
*out_index = i;
// At least one notification should have been processed.
UVM_ASSERT(index < *out_index);
status = service_notification_va_block_helper(mm, va_block, gpu->id, batch_context);
uvm_mutex_unlock(&va_block->lock);
if (status == NV_OK)
flags |= UVM_ACCESS_COUNTER_ACTION_CLEAR;
flags_status = notify_tools_and_process_flags(gpu, &notifications[index], *out_index - index, flags);
if ((status == NV_OK) && (flags_status != NV_OK))
status = flags_status;
return status;
}
static NV_STATUS service_virt_notification_ats(uvm_gpu_va_space_t *gpu_va_space,
struct mm_struct *mm,
uvm_access_counter_service_batch_context_t *batch_context,
NvU32 index,
NvU32 *out_index)
{
NvU32 i;
NvU64 base;
NvU64 end;
NvU64 address;
NvU32 flags = UVM_ACCESS_COUNTER_ACTION_CLEAR;
NV_STATUS status = NV_OK;
NV_STATUS flags_status;
struct vm_area_struct *vma = NULL;
uvm_gpu_t *gpu = gpu_va_space->gpu;
uvm_va_space_t *va_space = gpu_va_space->va_space;
uvm_ats_fault_context_t *ats_context = &batch_context->ats_context;
uvm_access_counter_buffer_entry_t **notifications = batch_context->virt.notifications;
UVM_ASSERT(index < batch_context->virt.num_notifications);
uvm_assert_mmap_lock_locked(mm);
uvm_assert_rwsem_locked(&va_space->lock);
address = notifications[index]->address.address;
vma = find_vma_intersection(mm, address, address + 1);
if (!vma) {
// Clear the notification entry to continue receiving access counter
// notifications when a new VMA is allocated in this range.
status = notify_tools_and_process_flags(gpu, &notifications[index], 1, flags);
*out_index = index + 1;
return status;
}
base = UVM_VA_BLOCK_ALIGN_DOWN(address);
end = min(base + UVM_VA_BLOCK_SIZE, (NvU64)vma->vm_end);
uvm_page_mask_zero(&ats_context->accessed_mask);
for (i = index; i < batch_context->virt.num_notifications; i++) {
uvm_access_counter_buffer_entry_t *current_entry = notifications[i];
address = current_entry->address.address;
if ((current_entry->virtual_info.va_space == va_space) && (address < end))
uvm_page_mask_set(&ats_context->accessed_mask, (address - base) / PAGE_SIZE);
else
break;
}
*out_index = i;
// At least one notification should have been processed.
UVM_ASSERT(index < *out_index);
// TODO: Bug 2113632: [UVM] Don't clear access counters when the preferred
// location is set
// If no pages were actually migrated, don't clear the access counters.
status = uvm_ats_service_access_counters(gpu_va_space, vma, base, ats_context);
if (status != NV_OK)
flags &= ~UVM_ACCESS_COUNTER_ACTION_CLEAR;
flags_status = notify_tools_and_process_flags(gpu, &notifications[index], *out_index - index, flags);
if ((status == NV_OK) && (flags_status != NV_OK))
status = flags_status;
return status;
}
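service_virt_notification_ats() batches notifications into one VA-block-sized window clipped to the VMA, then records each address as a page index relative to the window base. A toy computation of that window and index (UVM_VA_BLOCK_SIZE of 2M and a 64K page are assumptions here, not taken from this excerpt):

#include <stdint.h>
#include <stdio.h>

#define VA_BLOCK_SIZE (2ULL * 1024 * 1024)
#define PAGE_SIZE_64K (64ULL * 1024)

int main(void)
{
    uint64_t address = 0x7f00002a0000ULL;               // notification VA (64K aligned)
    uint64_t vm_end  = 0x7f0000300000ULL;               // end of the VMA
    uint64_t base    = address & ~(VA_BLOCK_SIZE - 1);  // UVM_VA_BLOCK_ALIGN_DOWN()
    uint64_t end     = base + VA_BLOCK_SIZE < vm_end ? base + VA_BLOCK_SIZE : vm_end;

    // Prints window [0x7f0000200000, 0x7f0000300000), page index 10.
    printf("window [%#llx, %#llx), page index %llu\n",
           (unsigned long long)base,
           (unsigned long long)end,
           (unsigned long long)((address - base) / PAGE_SIZE_64K));
    return 0;
}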
static NV_STATUS service_virt_notifications_batch(uvm_gpu_va_space_t *gpu_va_space,
struct mm_struct *mm,
uvm_access_counter_service_batch_context_t *batch_context,
NvU32 index,
NvU32 *out_index)
{
NV_STATUS status;
uvm_va_range_t *va_range;
uvm_va_space_t *va_space = gpu_va_space->va_space;
uvm_access_counter_buffer_entry_t *current_entry = batch_context->virt.notifications[index];
NvU64 address = current_entry->address.address;
UVM_ASSERT(va_space);
if (mm)
uvm_assert_mmap_lock_locked(mm);
uvm_assert_rwsem_locked(&va_space->lock);
// Virtual address notifications are always 64K aligned
UVM_ASSERT(IS_ALIGNED(address, UVM_PAGE_SIZE_64K));
va_range = uvm_va_range_find(va_space, address);
if (va_range) {
// Avoid clearing the entry by default.
NvU32 flags = 0;
uvm_va_block_t *va_block = NULL;
if (va_range->type == UVM_VA_RANGE_TYPE_MANAGED) {
size_t index = uvm_va_range_block_index(va_range, address);
va_block = uvm_va_range_block(va_range, index);
// If the va_range is a managed range, the notification belongs to a
// recently freed va_range if va_block is NULL. If va_block is not
// NULL, service_virt_notifications_in_block will process flags.
// Clear the notification entry to continue receiving notifications
// when a new va_range is allocated in that region.
flags = UVM_ACCESS_COUNTER_ACTION_CLEAR;
}
if (va_block) {
status = service_virt_notifications_in_block(gpu_va_space, mm, va_block, batch_context, index, out_index);
}
else {
status = notify_tools_and_process_flags(gpu_va_space->gpu, batch_context->virt.notifications, 1, flags);
*out_index = index + 1;
}
}
else if (uvm_ats_can_service_faults(gpu_va_space, mm)) {
status = service_virt_notification_ats(gpu_va_space, mm, batch_context, index, out_index);
}
else {
NvU32 flags;
uvm_va_block_t *va_block = NULL;
status = uvm_hmm_va_block_find(va_space, address, &va_block);
// TODO: Bug 4309292: [UVM][HMM] Re-enable access counter HMM block
// migrations for virtual notifications
//
// - If the va_block is HMM, don't clear the notification since HMM
// migrations are currently disabled.
//
// - If the va_block isn't HMM, the notification belongs to a recently
// freed va_range. Clear the notification entry to continue receiving
// notifications when a new va_range is allocated in this region.
flags = va_block ? 0 : UVM_ACCESS_COUNTER_ACTION_CLEAR;
UVM_ASSERT((status == NV_ERR_OBJECT_NOT_FOUND) ||
(status == NV_ERR_INVALID_ADDRESS) ||
uvm_va_block_is_hmm(va_block));
// Clobber status to continue processing the rest of the notifications
// in the batch.
status = notify_tools_and_process_flags(gpu_va_space->gpu, batch_context->virt.notifications, 1, flags);
*out_index = index + 1;
}
return status;
}
static NV_STATUS service_virt_notifications(uvm_gpu_t *gpu,
uvm_access_counter_service_batch_context_t *batch_context)
{
NvU32 i;
NvU32 i = 0;
NV_STATUS status = NV_OK;
struct mm_struct *mm = NULL;
uvm_va_space_t *va_space = NULL;
uvm_va_space_t *prev_va_space = NULL;
uvm_gpu_va_space_t *gpu_va_space = NULL;
// TODO: Bug 4299018 : Add support for virtual access counter migrations on
// 4K page sizes.
if (PAGE_SIZE == UVM_PAGE_SIZE_4K) {
return notify_tools_and_process_flags(gpu,
batch_context->virt.notifications,
batch_context->virt.num_notifications,
0);
}
preprocess_virt_notifications(gpu, batch_context);
for (i = 0; i < batch_context->virt.num_notifications; ++i) {
unsigned flags = 0;
while (i < batch_context->virt.num_notifications) {
uvm_access_counter_buffer_entry_t *current_entry = batch_context->virt.notifications[i];
va_space = current_entry->virtual_info.va_space;
status = service_virt_notification(gpu, batch_context, current_entry, &flags);
if (va_space != prev_va_space) {
UVM_DBG_PRINT_RL("Processed virt access counter (%d/%d): %sMANAGED (status: %d) clear: %s\n",
i + 1,
batch_context->virt.num_notifications,
(flags & UVM_ACCESS_COUNTER_ON_MANAGED) ? "" : "NOT ",
status,
(flags & UVM_ACCESS_COUNTER_ACTION_CLEAR) ? "YES" : "NO");
// New va_space detected, drop locks of the old va_space.
if (prev_va_space) {
uvm_va_space_up_read(prev_va_space);
uvm_va_space_mm_release_unlock(prev_va_space, mm);
if (uvm_enable_builtin_tests)
uvm_tools_broadcast_access_counter(gpu, current_entry, flags & UVM_ACCESS_COUNTER_ON_MANAGED);
mm = NULL;
gpu_va_space = NULL;
}
if (status == NV_OK && (flags & UVM_ACCESS_COUNTER_ACTION_CLEAR))
status = access_counter_clear_targeted(gpu, current_entry);
// Acquire locks for the new va_space.
if (va_space) {
mm = uvm_va_space_mm_retain_lock(va_space);
uvm_va_space_down_read(va_space);
gpu_va_space = uvm_gpu_va_space_get_by_parent_gpu(va_space, gpu->parent);
}
prev_va_space = va_space;
}
if (va_space && gpu_va_space && uvm_va_space_has_access_counter_migrations(va_space)) {
status = service_virt_notifications_batch(gpu_va_space, mm, batch_context, i, &i);
}
else {
status = notify_tools_and_process_flags(gpu, &batch_context->virt.notifications[i], 1, 0);
i++;
}
if (status != NV_OK)
break;
}
if (va_space) {
uvm_va_space_up_read(va_space);
uvm_va_space_mm_release_unlock(va_space, mm);
}
return status;
}
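service_virt_notifications() relies on the batch being pre-sorted by va_space, so it only drops and re-takes locks when the owning va_space changes between consecutive entries. A skeleton of that key-change pattern, with stub lock calls standing in for the UVM primitives:

#include <stddef.h>
#include <stdio.h>

struct entry { int va_space; long address; };

static void lock_space(int s)   { printf("lock space %d\n", s); }
static void unlock_space(int s) { printf("unlock space %d\n", s); }

static void service_batch(const struct entry *entries, size_t count)
{
    int prev_space = -1; // -1 == no space locked yet
    size_t i;

    for (i = 0; i < count; i++) {
        if (entries[i].va_space != prev_space) {
            if (prev_space != -1)
                unlock_space(prev_space);    // drop locks of the previous space
            lock_space(entries[i].va_space); // take locks for the new one
            prev_space = entries[i].va_space;
        }
        // ... service entries[i] under the locks of its va_space ...
    }

    if (prev_space != -1)
        unlock_space(prev_space);
}

int main(void)
{
    const struct entry batch[] = { {1, 0x10000}, {1, 0x20000}, {2, 0x10000} };
    service_batch(batch, 3);
    return 0;
}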
@@ -1941,6 +2142,7 @@ NV_STATUS uvm_test_reset_access_counters(UVM_TEST_RESET_ACCESS_COUNTERS_PARAMS *
}
else {
uvm_access_counter_buffer_entry_t entry = { 0 };
uvm_access_counter_buffer_entry_t *notification = &entry;
if (params->counter_type == UVM_TEST_ACCESS_COUNTER_TYPE_MIMC)
entry.counter_type = UVM_ACCESS_COUNTER_TYPE_MIMC;
@@ -1950,7 +2152,7 @@ NV_STATUS uvm_test_reset_access_counters(UVM_TEST_RESET_ACCESS_COUNTERS_PARAMS *
entry.bank = params->bank;
entry.tag = params->tag;
status = access_counter_clear_targeted(gpu, &entry);
status = access_counter_clear_notifications(gpu, &notification, 1);
}
if (status == NV_OK)

View File

@@ -292,7 +292,6 @@ NV_STATUS uvm_gpu_init_isr(uvm_parent_gpu_t *parent_gpu)
{
NV_STATUS status = NV_OK;
char kthread_name[TASK_COMM_LEN + 1];
uvm_va_block_context_t *block_context;
if (parent_gpu->replayable_faults_supported) {
status = uvm_gpu_fault_buffer_init(parent_gpu);
@@ -312,12 +311,6 @@ NV_STATUS uvm_gpu_init_isr(uvm_parent_gpu_t *parent_gpu)
if (!parent_gpu->isr.replayable_faults.stats.cpu_exec_count)
return NV_ERR_NO_MEMORY;
block_context = uvm_va_block_context_alloc(NULL);
if (!block_context)
return NV_ERR_NO_MEMORY;
parent_gpu->fault_buffer_info.replayable.block_service_context.block_context = block_context;
parent_gpu->isr.replayable_faults.handling = true;
snprintf(kthread_name, sizeof(kthread_name), "UVM GPU%u BH", uvm_id_value(parent_gpu->id));
@@ -340,12 +333,6 @@ NV_STATUS uvm_gpu_init_isr(uvm_parent_gpu_t *parent_gpu)
if (!parent_gpu->isr.non_replayable_faults.stats.cpu_exec_count)
return NV_ERR_NO_MEMORY;
block_context = uvm_va_block_context_alloc(NULL);
if (!block_context)
return NV_ERR_NO_MEMORY;
parent_gpu->fault_buffer_info.non_replayable.block_service_context.block_context = block_context;
parent_gpu->isr.non_replayable_faults.handling = true;
snprintf(kthread_name, sizeof(kthread_name), "UVM GPU%u KC", uvm_id_value(parent_gpu->id));
@@ -369,13 +356,6 @@ NV_STATUS uvm_gpu_init_isr(uvm_parent_gpu_t *parent_gpu)
return status;
}
block_context = uvm_va_block_context_alloc(NULL);
if (!block_context)
return NV_ERR_NO_MEMORY;
parent_gpu->access_counter_buffer_info.batch_service_context.block_service_context.block_context =
block_context;
nv_kthread_q_item_init(&parent_gpu->isr.access_counters.bottom_half_q_item,
access_counters_isr_bottom_half_entry,
parent_gpu);
@@ -430,8 +410,6 @@ void uvm_gpu_disable_isr(uvm_parent_gpu_t *parent_gpu)
void uvm_gpu_deinit_isr(uvm_parent_gpu_t *parent_gpu)
{
uvm_va_block_context_t *block_context;
// Return ownership to RM:
if (parent_gpu->isr.replayable_faults.was_handling) {
// No user threads could have anything left on
@@ -461,18 +439,8 @@ void uvm_gpu_deinit_isr(uvm_parent_gpu_t *parent_gpu)
// It is safe to deinitialize access counters even if they have not been
// successfully initialized.
uvm_gpu_deinit_access_counters(parent_gpu);
block_context =
parent_gpu->access_counter_buffer_info.batch_service_context.block_service_context.block_context;
uvm_va_block_context_free(block_context);
}
if (parent_gpu->non_replayable_faults_supported) {
block_context = parent_gpu->fault_buffer_info.non_replayable.block_service_context.block_context;
uvm_va_block_context_free(block_context);
}
block_context = parent_gpu->fault_buffer_info.replayable.block_service_context.block_context;
uvm_va_block_context_free(block_context);
uvm_kvfree(parent_gpu->isr.replayable_faults.stats.cpu_exec_count);
uvm_kvfree(parent_gpu->isr.non_replayable_faults.stats.cpu_exec_count);
uvm_kvfree(parent_gpu->isr.access_counters.stats.cpu_exec_count);

View File

@@ -235,17 +235,27 @@ static NV_STATUS fetch_non_replayable_fault_buffer_entries(uvm_parent_gpu_t *par
return NV_OK;
}
// In SRIOV, the UVM (guest) driver does not have access to the privileged
// registers used to clear the faulted bit. Instead, UVM requests host RM to do
// the clearing on its behalf, using a SW method.
static bool use_clear_faulted_channel_sw_method(uvm_gpu_t *gpu)
{
if (uvm_gpu_is_virt_mode_sriov(gpu)) {
UVM_ASSERT(gpu->parent->has_clear_faulted_channel_sw_method);
return true;
}
// If true, UVM uses a SW method to request RM to do the clearing on its
// behalf.
bool use_sw_method = false;
return false;
// In SRIOV, the UVM (guest) driver does not have access to the privileged
// registers used to clear the faulted bit.
if (uvm_gpu_is_virt_mode_sriov(gpu))
use_sw_method = true;
// In Confidential Computing access to the privileged registers is blocked,
// in order to prevent interference between guests, or between the
// (untrusted) host and the guests.
if (g_uvm_global.conf_computing_enabled)
use_sw_method = true;
if (use_sw_method)
UVM_ASSERT(gpu->parent->has_clear_faulted_channel_sw_method);
return use_sw_method;
}
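As a descriptive aid only, the predicate above reduces to a small decision table; none of this is new driver behavior:
// use_clear_faulted_channel_sw_method() decision sketch:
//
//   Bare metal, Confidential Computing disabled -> false: UVM clears the
//       faulted bit by writing the privileged registers directly.
//   SRIOV guest                                 -> true: host RM performs the
//       clear on the guest's behalf via a SW method.
//   Confidential Computing enabled              -> true: privileged register
//       access is blocked to isolate guests from the host and each other.
//
// In both "true" cases the parent GPU must report
// has_clear_faulted_channel_sw_method, which is what the UVM_ASSERT checks.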
static NV_STATUS clear_faulted_method_on_gpu(uvm_gpu_t *gpu,
@@ -370,7 +380,7 @@ static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
// Check logical permissions
status = uvm_va_block_check_logical_permissions(va_block,
service_context->block_context,
&service_context->block_context,
gpu->id,
uvm_va_block_cpu_page_index(va_block,
fault_entry->fault_address),
@@ -393,7 +403,7 @@ static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
// Compute new residency and update the masks
new_residency = uvm_va_block_select_residency(va_block,
service_context->block_context,
&service_context->block_context,
page_index,
gpu->id,
fault_entry->access_type_mask,
@@ -570,7 +580,7 @@ static NV_STATUS service_non_managed_fault(uvm_gpu_va_space_t *gpu_va_space,
ats_context->client_type = UVM_FAULT_CLIENT_TYPE_HUB;
ats_invalidate->write_faults_in_batch = false;
ats_invalidate->tlb_batch_pending = false;
va_range_next = uvm_va_space_iter_first(gpu_va_space->va_space, fault_entry->fault_address, ~0ULL);
@@ -629,7 +639,7 @@ static NV_STATUS service_fault(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fault_e
uvm_gpu_va_space_t *gpu_va_space;
uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &gpu->parent->fault_buffer_info.non_replayable;
uvm_va_block_context_t *va_block_context =
gpu->parent->fault_buffer_info.non_replayable.block_service_context.block_context;
&gpu->parent->fault_buffer_info.non_replayable.block_service_context.block_context;
status = uvm_gpu_fault_entry_to_va_space(gpu, fault_entry, &va_space);
if (status != NV_OK) {
@@ -655,7 +665,7 @@ static NV_STATUS service_fault(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fault_e
// to remain valid until we release. If no mm is registered, we
// can only service managed faults, not ATS/HMM faults.
mm = uvm_va_space_mm_retain_lock(va_space);
uvm_va_block_context_init(va_block_context, mm);
va_block_context->mm = mm;
uvm_va_space_down_read(va_space);

View File

@@ -362,7 +362,8 @@ static NV_STATUS push_cancel_on_gpu(uvm_gpu_t *gpu,
"Cancel targeting instance_ptr {0x%llx:%s}\n",
instance_ptr.address,
uvm_aperture_string(instance_ptr.aperture));
} else {
}
else {
status = uvm_push_begin_acquire(gpu->channel_manager,
UVM_CHANNEL_TYPE_MEMOPS,
&replayable_faults->replay_tracker,
@@ -623,7 +624,15 @@ static NV_STATUS fault_buffer_flush_locked(uvm_gpu_t *gpu,
while (get != put) {
// Wait until valid bit is set
UVM_SPIN_WHILE(!parent_gpu->fault_buffer_hal->entry_is_valid(parent_gpu, get), &spin);
UVM_SPIN_WHILE(!parent_gpu->fault_buffer_hal->entry_is_valid(parent_gpu, get), &spin) {
// Channels might be idle (e.g. in teardown) so check for errors
// actively. In that case the gpu pointer is valid.
NV_STATUS status = gpu ? uvm_channel_manager_check_errors(gpu->channel_manager) : uvm_global_get_status();
if (status != NV_OK) {
write_get(parent_gpu, get);
return status;
}
}
fault_buffer_skip_replayable_entry(parent_gpu, get);
++get;
@@ -856,6 +865,10 @@ static NV_STATUS fetch_fault_buffer_entries(uvm_gpu_t *gpu,
// We have some entry to work on. Let's do the rest later.
if (fetch_mode == FAULT_FETCH_MODE_BATCH_READY && fault_index > 0)
goto done;
status = uvm_global_get_status();
if (status != NV_OK)
goto done;
}
// Prevent later accesses being moved above the read of the valid bit
@@ -1234,7 +1247,7 @@ static uvm_fault_access_type_t check_fault_access_permissions(uvm_gpu_t *gpu,
UvmEventFatalReason fatal_reason;
uvm_fault_cancel_va_mode_t cancel_va_mode;
uvm_fault_access_type_t ret = UVM_FAULT_ACCESS_TYPE_COUNT;
uvm_va_block_context_t *va_block_context = service_block_context->block_context;
uvm_va_block_context_t *va_block_context = &service_block_context->block_context;
perm_status = uvm_va_block_check_logical_permissions(va_block,
va_block_context,
@@ -1349,7 +1362,7 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
if (uvm_va_block_is_hmm(va_block)) {
policy = uvm_hmm_find_policy_end(va_block,
block_context->block_context->hmm.vma,
block_context->block_context.hmm.vma,
ordered_fault_cache[first_fault_index]->fault_address,
&end);
}
@@ -1473,7 +1486,7 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
// Compute new residency and update the masks
new_residency = uvm_va_block_select_residency(va_block,
block_context->block_context,
&block_context->block_context,
page_index,
gpu->id,
service_access_type_mask,
@@ -1516,7 +1529,7 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
++block_context->num_retries;
if (status == NV_OK && batch_context->fatal_va_space)
status = uvm_va_block_set_cancel(va_block, block_context->block_context, gpu);
status = uvm_va_block_set_cancel(va_block, &block_context->block_context, gpu);
return status;
}
@@ -1631,23 +1644,23 @@ static NV_STATUS service_fault_batch_ats_sub_vma(uvm_gpu_va_space_t *gpu_va_spac
const uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
const uvm_page_mask_t *reads_serviced_mask = &ats_context->reads_serviced_mask;
uvm_page_mask_t *faults_serviced_mask = &ats_context->faults_serviced_mask;
uvm_page_mask_t *faulted_mask = &ats_context->faulted_mask;
uvm_page_mask_t *accessed_mask = &ats_context->accessed_mask;
UVM_ASSERT(vma);
ats_context->client_type = UVM_FAULT_CLIENT_TYPE_GPC;
uvm_page_mask_or(faulted_mask, write_fault_mask, read_fault_mask);
uvm_page_mask_or(accessed_mask, write_fault_mask, read_fault_mask);
status = uvm_ats_service_faults(gpu_va_space, vma, base, &batch_context->ats_context);
// Remove prefetched pages from the serviced mask since fault servicing
// failures belonging to prefetch pages need to be ignored.
uvm_page_mask_and(faults_serviced_mask, faults_serviced_mask, faulted_mask);
uvm_page_mask_and(faults_serviced_mask, faults_serviced_mask, accessed_mask);
UVM_ASSERT(uvm_page_mask_subset(faults_serviced_mask, faulted_mask));
UVM_ASSERT(uvm_page_mask_subset(faults_serviced_mask, accessed_mask));
if ((status != NV_OK) || uvm_page_mask_equal(faults_serviced_mask, faulted_mask)) {
if ((status != NV_OK) || uvm_page_mask_equal(faults_serviced_mask, accessed_mask)) {
(*block_faults) += (fault_index_end - fault_index_start);
return status;
}
@@ -1679,7 +1692,8 @@ static NV_STATUS service_fault_batch_ats_sub_vma(uvm_gpu_va_space_t *gpu_va_spac
if (access_type <= UVM_FAULT_ACCESS_TYPE_READ) {
cancel_va_mode = UVM_FAULT_CANCEL_VA_MODE_ALL;
}
else if (access_type >= UVM_FAULT_ACCESS_TYPE_WRITE) {
else {
UVM_ASSERT(access_type >= UVM_FAULT_ACCESS_TYPE_WRITE);
if (uvm_fault_access_type_mask_test(current_entry->access_type_mask, UVM_FAULT_ACCESS_TYPE_READ) &&
!uvm_page_mask_test(reads_serviced_mask, page_index))
cancel_va_mode = UVM_FAULT_CANCEL_VA_MODE_ALL;
@@ -1864,7 +1878,7 @@ static NV_STATUS service_fault_batch_dispatch(uvm_va_space_t *va_space,
uvm_va_block_t *va_block;
uvm_gpu_t *gpu = gpu_va_space->gpu;
uvm_va_block_context_t *va_block_context =
gpu->parent->fault_buffer_info.replayable.block_service_context.block_context;
&gpu->parent->fault_buffer_info.replayable.block_service_context.block_context;
uvm_fault_buffer_entry_t *current_entry = batch_context->ordered_fault_cache[fault_index];
struct mm_struct *mm = va_block_context->mm;
NvU64 fault_address = current_entry->fault_address;
@@ -1955,7 +1969,7 @@ static NV_STATUS service_fault_batch_for_cancel(uvm_gpu_t *gpu, uvm_fault_servic
struct mm_struct *mm;
uvm_replayable_fault_buffer_info_t *replayable_faults = &gpu->parent->fault_buffer_info.replayable;
uvm_service_block_context_t *service_context = &gpu->parent->fault_buffer_info.replayable.block_service_context;
uvm_va_block_context_t *va_block_context = service_context->block_context;
uvm_va_block_context_t *va_block_context = &service_context->block_context;
UVM_ASSERT(gpu->parent->replayable_faults_supported);
UVM_ASSERT(va_space);
@@ -1965,7 +1979,7 @@ static NV_STATUS service_fault_batch_for_cancel(uvm_gpu_t *gpu, uvm_fault_servic
// modifications (mmap, munmap, mprotect) from happening between the time HW
// takes the fault and we cancel it.
mm = uvm_va_space_mm_retain_lock(va_space);
uvm_va_block_context_init(va_block_context, mm);
va_block_context->mm = mm;
uvm_va_space_down_read(va_space);
// We saw fatal faults in this VA space before. Flush while holding
@@ -2065,7 +2079,7 @@ static NV_STATUS service_fault_batch_for_cancel(uvm_gpu_t *gpu, uvm_fault_servic
uvm_ats_fault_invalidate_t *ats_invalidate = &gpu->parent->fault_buffer_info.replayable.ats_invalidate;
NvU32 block_faults;
ats_invalidate->write_faults_in_batch = false;
ats_invalidate->tlb_batch_pending = false;
uvm_hmm_service_context_init(service_context);
// Service all the faults that we can. We only really need to search
@@ -2147,11 +2161,11 @@ static NV_STATUS service_fault_batch(uvm_gpu_t *gpu,
gpu->parent->fault_buffer_info.replayable.replay_policy == UVM_PERF_FAULT_REPLAY_POLICY_BLOCK;
uvm_service_block_context_t *service_context =
&gpu->parent->fault_buffer_info.replayable.block_service_context;
uvm_va_block_context_t *va_block_context = service_context->block_context;
uvm_va_block_context_t *va_block_context = &service_context->block_context;
UVM_ASSERT(gpu->parent->replayable_faults_supported);
ats_invalidate->write_faults_in_batch = false;
ats_invalidate->tlb_batch_pending = false;
uvm_hmm_service_context_init(service_context);
for (i = 0; i < batch_context->num_coalesced_faults;) {
@@ -2183,7 +2197,7 @@ static NV_STATUS service_fault_batch(uvm_gpu_t *gpu,
// to remain valid until we release. If no mm is registered, we
// can only service managed faults, not ATS/HMM faults.
mm = uvm_va_space_mm_retain_lock(va_space);
uvm_va_block_context_init(va_block_context, mm);
va_block_context->mm = mm;
uvm_va_space_down_read(va_space);
gpu_va_space = uvm_gpu_va_space_get_by_parent_gpu(va_space, gpu->parent);

View File

@@ -794,7 +794,7 @@ uvm_membar_t uvm_hal_downgrade_membar_type(uvm_gpu_t *gpu, bool is_local_vidmem)
// memory, including those from other processors like the CPU or peer GPUs,
// must come through this GPU's L2. In all current architectures, MEMBAR_GPU
// is sufficient to resolve ordering at the L2 level.
if (is_local_vidmem && !uvm_parent_gpu_is_coherent(gpu->parent) && !uvm_downgrade_force_membar_sys)
if (is_local_vidmem && !uvm_gpu_is_coherent(gpu->parent) && !uvm_downgrade_force_membar_sys)
return UVM_MEMBAR_GPU;
// If the mapped memory was remote, or if a coherence protocol can cache

View File

@@ -60,8 +60,6 @@ module_param(uvm_disable_hmm, bool, 0444);
#include "uvm_gpu.h"
#include "uvm_pmm_gpu.h"
#include "uvm_hal_types.h"
#include "uvm_push.h"
#include "uvm_hal.h"
#include "uvm_va_block_types.h"
#include "uvm_va_space_mm.h"
#include "uvm_va_space.h"
@@ -112,7 +110,20 @@ typedef struct
bool uvm_hmm_is_enabled_system_wide(void)
{
return !uvm_disable_hmm && !g_uvm_global.ats.enabled && uvm_va_space_mm_enabled_system();
if (uvm_disable_hmm)
return false;
if (g_uvm_global.ats.enabled)
return false;
// Confidential Computing and HMM impose mutually exclusive constraints. In
// Confidential Computing the GPU can only access pages resident in vidmem,
// but in HMM pages may be required to be resident in sysmem: file backed
// VMAs, huge pages, etc.
if (g_uvm_global.conf_computing_enabled)
return false;
return uvm_va_space_mm_enabled_system();
}
bool uvm_hmm_is_enabled(uvm_va_space_t *va_space)
@@ -129,100 +140,6 @@ static uvm_va_block_t *hmm_va_block_from_node(uvm_range_tree_node_t *node)
return container_of(node, uvm_va_block_t, hmm.node);
}
// Copies the contents of the source device-private page to the
// destination CPU page. This will invalidate mappings, so cannot be
// called while holding any va_block locks.
static NV_STATUS uvm_hmm_copy_devmem_page(struct page *dst_page, struct page *src_page, uvm_tracker_t *tracker)
{
uvm_gpu_phys_address_t src_addr;
uvm_gpu_phys_address_t dst_addr;
uvm_gpu_chunk_t *gpu_chunk;
NvU64 dma_addr;
uvm_push_t push;
NV_STATUS status = NV_OK;
uvm_gpu_t *gpu;
// Holding a reference on the device-private page ensures the gpu
// is already retained. This is because when a GPU is unregistered
// all device-private pages are migrated back to the CPU and freed
// before releasing the GPU. Therefore, if we can get a reference
// to the page, the GPU must still be retained.
UVM_ASSERT(is_device_private_page(src_page) && page_count(src_page));
gpu_chunk = uvm_pmm_devmem_page_to_chunk(src_page);
gpu = uvm_gpu_chunk_get_gpu(gpu_chunk);
status = uvm_mmu_chunk_map(gpu_chunk);
if (status != NV_OK)
return status;
status = uvm_gpu_map_cpu_pages(gpu->parent, dst_page, PAGE_SIZE, &dma_addr);
if (status != NV_OK)
goto out_unmap_gpu;
dst_addr = uvm_gpu_phys_address(UVM_APERTURE_SYS, dma_addr);
src_addr = uvm_gpu_phys_address(UVM_APERTURE_VID, gpu_chunk->address);
status = uvm_push_begin_acquire(gpu->channel_manager,
UVM_CHANNEL_TYPE_GPU_TO_CPU,
tracker,
&push,
"Copy for remote process fault");
if (status != NV_OK)
goto out_unmap_cpu;
gpu->parent->ce_hal->memcopy(&push,
uvm_gpu_address_copy(gpu, dst_addr),
uvm_gpu_address_copy(gpu, src_addr),
PAGE_SIZE);
uvm_push_end(&push);
status = uvm_tracker_add_push_safe(tracker, &push);
out_unmap_cpu:
uvm_gpu_unmap_cpu_pages(gpu->parent, dma_addr, PAGE_SIZE);
out_unmap_gpu:
uvm_mmu_chunk_unmap(gpu_chunk, NULL);
return status;
}
static NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn)
{
unsigned long src_pfn = 0;
unsigned long dst_pfn = 0;
struct page *dst_page;
NV_STATUS status = NV_OK;
int ret;
ret = migrate_device_range(&src_pfn, pfn, 1);
if (ret)
return errno_to_nv_status(ret);
if (src_pfn & MIGRATE_PFN_MIGRATE) {
uvm_tracker_t tracker = UVM_TRACKER_INIT();
dst_page = alloc_page(GFP_HIGHUSER_MOVABLE);
if (!dst_page) {
status = NV_ERR_NO_MEMORY;
goto out;
}
lock_page(dst_page);
if (WARN_ON(uvm_hmm_copy_devmem_page(dst_page, migrate_pfn_to_page(src_pfn), &tracker) != NV_OK))
memzero_page(dst_page, 0, PAGE_SIZE);
dst_pfn = migrate_pfn(page_to_pfn(dst_page));
migrate_device_pages(&src_pfn, &dst_pfn, 1);
uvm_tracker_wait_deinit(&tracker);
}
out:
migrate_device_finalize(&src_pfn, &dst_pfn, 1);
if (!(src_pfn & MIGRATE_PFN_MIGRATE))
status = NV_ERR_BUSY_RETRY;
return status;
}
void uvm_hmm_va_space_initialize(uvm_va_space_t *va_space)
{
uvm_hmm_va_space_t *hmm_va_space = &va_space->hmm;
@@ -282,9 +199,6 @@ void uvm_hmm_unregister_gpu(uvm_va_space_t *va_space, uvm_gpu_t *gpu, struct mm_
{
uvm_range_tree_node_t *node;
uvm_va_block_t *va_block;
struct range range = gpu->pmm.devmem.pagemap.range;
unsigned long pfn;
bool retry;
if (!uvm_hmm_is_enabled(va_space))
return;
@@ -293,29 +207,6 @@ void uvm_hmm_unregister_gpu(uvm_va_space_t *va_space, uvm_gpu_t *gpu, struct mm_
uvm_assert_mmap_lock_locked(mm);
uvm_assert_rwsem_locked_write(&va_space->lock);
// There could be pages with page->zone_device_data pointing to the va_space
// which may be about to be freed. Migrate those back to the CPU so we don't
// fault on them. Normally infinite retries are bad, but we don't have any
// option here. Device-private pages can't be pinned so migration should
// eventually succeed. Even if we did eventually bail out of the loop we'd
// just stall in memunmap_pages() anyway.
do {
retry = false;
for (pfn = __phys_to_pfn(range.start); pfn <= __phys_to_pfn(range.end); pfn++) {
struct page *page = pfn_to_page(pfn);
UVM_ASSERT(is_device_private_page(page));
// This check is racy because nothing stops the page from being freed or
// even reused. That doesn't matter though - worst case the
// migration fails, we retry and find the va_space doesn't match.
if (page->zone_device_data == va_space)
if (uvm_hmm_pmm_gpu_evict_pfn(pfn) != NV_OK)
retry = true;
}
} while (retry);
uvm_range_tree_for_each(node, &va_space->hmm.blocks) {
va_block = hmm_va_block_from_node(node);
@@ -677,7 +568,7 @@ bool uvm_hmm_check_context_vma_is_valid(uvm_va_block_t *va_block,
void uvm_hmm_service_context_init(uvm_service_block_context_t *service_context)
{
// TODO: Bug 4050579: Remove this when swap cached pages can be migrated.
service_context->block_context->hmm.swap_cached = false;
service_context->block_context.hmm.swap_cached = false;
}
NV_STATUS uvm_hmm_migrate_begin(uvm_va_block_t *va_block)
@@ -740,6 +631,47 @@ static NV_STATUS hmm_migrate_range(uvm_va_block_t *va_block,
return status;
}
void uvm_hmm_evict_va_blocks(uvm_va_space_t *va_space)
{
// We can't use uvm_va_space_mm_retain(), because the va_space_mm
// should already be dead by now.
struct mm_struct *mm = va_space->va_space_mm.mm;
uvm_hmm_va_space_t *hmm_va_space = &va_space->hmm;
uvm_range_tree_node_t *node, *next;
uvm_va_block_t *va_block;
uvm_va_block_context_t *block_context;
uvm_down_read_mmap_lock(mm);
uvm_va_space_down_write(va_space);
uvm_range_tree_for_each_safe(node, next, &hmm_va_space->blocks) {
uvm_va_block_region_t region;
struct vm_area_struct *vma;
va_block = hmm_va_block_from_node(node);
block_context = uvm_va_space_block_context(va_space, mm);
uvm_hmm_migrate_begin_wait(va_block);
uvm_mutex_lock(&va_block->lock);
for_each_va_block_vma_region(va_block, mm, vma, &region) {
if (!uvm_hmm_vma_is_valid(vma, vma->vm_start, false))
continue;
block_context->hmm.vma = vma;
uvm_hmm_va_block_migrate_locked(va_block,
NULL,
block_context,
UVM_ID_CPU,
region,
UVM_MAKE_RESIDENT_CAUSE_API_MIGRATE);
}
uvm_mutex_unlock(&va_block->lock);
uvm_hmm_migrate_finish(va_block);
}
uvm_va_space_up_write(va_space);
uvm_up_read_mmap_lock(mm);
}
NV_STATUS uvm_hmm_test_va_block_inject_split_error(uvm_va_space_t *va_space, NvU64 addr)
{
uvm_va_block_test_t *block_test;
@@ -1544,59 +1476,40 @@ static NV_STATUS hmm_va_block_cpu_page_populate(uvm_va_block_t *va_block,
return status;
}
status = uvm_va_block_map_cpu_chunk_on_gpus(va_block, chunk, page_index);
status = uvm_va_block_map_cpu_chunk_on_gpus(va_block, page_index);
if (status != NV_OK) {
uvm_cpu_chunk_remove_from_block(va_block, page_to_nid(page), page_index);
uvm_cpu_chunk_remove_from_block(va_block, page_index);
uvm_cpu_chunk_free(chunk);
}
return status;
}
static void hmm_va_block_cpu_unpopulate_chunk(uvm_va_block_t *va_block,
uvm_cpu_chunk_t *chunk,
int chunk_nid,
uvm_page_index_t page_index)
static void hmm_va_block_cpu_page_unpopulate(uvm_va_block_t *va_block,
uvm_page_index_t page_index)
{
uvm_cpu_chunk_t *chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, page_index);
UVM_ASSERT(uvm_va_block_is_hmm(va_block));
if (!chunk)
return;
UVM_ASSERT(!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
!uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
UVM_ASSERT(uvm_cpu_chunk_get_size(chunk) == PAGE_SIZE);
!uvm_page_mask_test(&va_block->cpu.resident, page_index));
uvm_cpu_chunk_remove_from_block(va_block, chunk_nid, page_index);
uvm_cpu_chunk_remove_from_block(va_block, page_index);
uvm_va_block_unmap_cpu_chunk_on_gpus(va_block, chunk, page_index);
uvm_cpu_chunk_free(chunk);
}
static void hmm_va_block_cpu_page_unpopulate(uvm_va_block_t *va_block, uvm_page_index_t page_index, struct page *page)
{
uvm_cpu_chunk_t *chunk;
UVM_ASSERT(uvm_va_block_is_hmm(va_block));
if (page) {
chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, page_to_nid(page), page_index);
hmm_va_block_cpu_unpopulate_chunk(va_block, chunk, page_to_nid(page), page_index);
}
else {
int nid;
for_each_possible_uvm_node(nid) {
chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, nid, page_index);
hmm_va_block_cpu_unpopulate_chunk(va_block, chunk, nid, page_index);
}
}
}
static bool hmm_va_block_cpu_page_is_same(uvm_va_block_t *va_block,
uvm_page_index_t page_index,
struct page *page)
{
struct page *old_page = uvm_va_block_get_cpu_page(va_block, page_index);
struct page *old_page = uvm_cpu_chunk_get_cpu_page(va_block, page_index);
UVM_ASSERT(uvm_cpu_chunk_is_hmm(uvm_cpu_chunk_get_chunk_for_page(va_block, page_to_nid(page), page_index)));
UVM_ASSERT(uvm_cpu_chunk_is_hmm(uvm_cpu_chunk_get_chunk_for_page(va_block, page_index)));
return old_page == page;
}
@@ -1609,7 +1522,7 @@ static void clear_service_context_masks(uvm_service_block_context_t *service_con
uvm_processor_id_t new_residency,
uvm_page_index_t page_index)
{
uvm_page_mask_clear(&service_context->block_context->caller_page_mask, page_index);
uvm_page_mask_clear(&service_context->block_context.caller_page_mask, page_index);
uvm_page_mask_clear(&service_context->per_processor_masks[uvm_id_value(new_residency)].new_residency,
page_index);
@@ -1636,6 +1549,7 @@ static void cpu_mapping_set(uvm_va_block_t *va_block,
uvm_page_index_t page_index)
{
uvm_processor_mask_set(&va_block->mapped, UVM_ID_CPU);
uvm_page_mask_set(&va_block->maybe_mapped_pages, page_index);
uvm_page_mask_set(&va_block->cpu.pte_bits[UVM_PTE_BITS_CPU_READ], page_index);
if (is_write)
uvm_page_mask_set(&va_block->cpu.pte_bits[UVM_PTE_BITS_CPU_WRITE], page_index);
@@ -1785,7 +1699,7 @@ static NV_STATUS sync_page_and_chunk_state(uvm_va_block_t *va_block,
// migrate_vma_finalize() will release the reference so we should
// clear our pointer to it.
// TODO: Bug 3660922: Need to handle read duplication at some point.
hmm_va_block_cpu_page_unpopulate(va_block, page_index, page);
hmm_va_block_cpu_page_unpopulate(va_block, page_index);
}
}
@@ -1811,7 +1725,7 @@ static void clean_up_non_migrating_page(uvm_va_block_t *va_block,
else {
UVM_ASSERT(page_ref_count(dst_page) == 1);
hmm_va_block_cpu_page_unpopulate(va_block, page_index, dst_page);
hmm_va_block_cpu_page_unpopulate(va_block, page_index);
}
unlock_page(dst_page);
@@ -1846,7 +1760,7 @@ static void lock_block_cpu_page(uvm_va_block_t *va_block,
unsigned long *dst_pfns,
uvm_page_mask_t *same_devmem_page_mask)
{
uvm_cpu_chunk_t *chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, page_to_nid(src_page), page_index);
uvm_cpu_chunk_t *chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, page_index);
uvm_va_block_region_t chunk_region;
struct page *dst_page;
@@ -1872,7 +1786,7 @@ static void lock_block_cpu_page(uvm_va_block_t *va_block,
// hmm_va_block_cpu_page_unpopulate() or block_kill(). If the page
// does not migrate, it will be freed though.
UVM_ASSERT(!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
!uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
!uvm_page_mask_test(&va_block->cpu.resident, page_index));
UVM_ASSERT(chunk->type == UVM_CPU_CHUNK_TYPE_PHYSICAL);
UVM_ASSERT(page_ref_count(dst_page) == 1);
uvm_cpu_chunk_make_hmm(chunk);
@@ -2020,7 +1934,7 @@ static NV_STATUS alloc_and_copy_to_cpu(uvm_va_block_t *va_block,
}
UVM_ASSERT(!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
!uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
!uvm_page_mask_test(&va_block->cpu.resident, page_index));
// Allocate a user system memory page for the destination.
// This is the typical case since Linux will free the source page when
@@ -2098,8 +2012,8 @@ static NV_STATUS uvm_hmm_devmem_fault_alloc_and_copy(uvm_hmm_devmem_fault_contex
service_context = devmem_fault_context->service_context;
va_block_retry = devmem_fault_context->va_block_retry;
va_block = devmem_fault_context->va_block;
src_pfns = service_context->block_context->hmm.src_pfns;
dst_pfns = service_context->block_context->hmm.dst_pfns;
src_pfns = service_context->block_context.hmm.src_pfns;
dst_pfns = service_context->block_context.hmm.dst_pfns;
// Build the migration page mask.
// Note that thrashing pinned pages and prefetch pages are already
@@ -2108,7 +2022,7 @@ static NV_STATUS uvm_hmm_devmem_fault_alloc_and_copy(uvm_hmm_devmem_fault_contex
uvm_page_mask_copy(page_mask, &service_context->per_processor_masks[UVM_ID_CPU_VALUE].new_residency);
status = alloc_and_copy_to_cpu(va_block,
service_context->block_context->hmm.vma,
service_context->block_context.hmm.vma,
src_pfns,
dst_pfns,
service_context->region,
@@ -2143,8 +2057,8 @@ static NV_STATUS uvm_hmm_devmem_fault_finalize_and_map(uvm_hmm_devmem_fault_cont
prefetch_hint = &service_context->prefetch_hint;
va_block = devmem_fault_context->va_block;
va_block_retry = devmem_fault_context->va_block_retry;
src_pfns = service_context->block_context->hmm.src_pfns;
dst_pfns = service_context->block_context->hmm.dst_pfns;
src_pfns = service_context->block_context.hmm.src_pfns;
dst_pfns = service_context->block_context.hmm.dst_pfns;
region = service_context->region;
page_mask = &devmem_fault_context->page_mask;
@@ -2251,7 +2165,8 @@ static NV_STATUS populate_region(uvm_va_block_t *va_block,
// Since we have a stable snapshot of the CPU pages, we can
// update the residency and protection information.
uvm_va_block_cpu_set_resident_page(va_block, page_to_nid(page), page_index);
uvm_processor_mask_set(&va_block->resident, UVM_ID_CPU);
uvm_page_mask_set(&va_block->cpu.resident, page_index);
cpu_mapping_set(va_block, pfns[page_index] & HMM_PFN_WRITE, page_index);
}
@@ -2338,7 +2253,7 @@ static void hmm_release_atomic_pages(uvm_va_block_t *va_block,
uvm_page_index_t page_index;
for_each_va_block_page_in_region(page_index, region) {
struct page *page = service_context->block_context->hmm.pages[page_index];
struct page *page = service_context->block_context.hmm.pages[page_index];
if (!page)
continue;
@@ -2354,14 +2269,14 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
uvm_service_block_context_t *service_context)
{
uvm_va_block_region_t region = service_context->region;
struct page **pages = service_context->block_context->hmm.pages;
struct page **pages = service_context->block_context.hmm.pages;
int npages;
uvm_page_index_t page_index;
uvm_make_resident_cause_t cause;
NV_STATUS status;
if (!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
!uvm_va_block_cpu_is_region_resident_on(va_block, NUMA_NO_NODE, region)) {
!uvm_page_mask_region_full(&va_block->cpu.resident, region)) {
// There is an atomic GPU fault. We need to make sure no pages are
// GPU resident so that make_device_exclusive_range() doesn't call
// migrate_to_ram() and cause a va_space lock recursion problem.
@@ -2374,7 +2289,7 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
status = uvm_hmm_va_block_migrate_locked(va_block,
va_block_retry,
service_context->block_context,
&service_context->block_context,
UVM_ID_CPU,
region,
cause);
@@ -2384,7 +2299,7 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
// make_device_exclusive_range() will try to call migrate_to_ram()
// and deadlock with ourself if the data isn't CPU resident.
if (!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
!uvm_va_block_cpu_is_region_resident_on(va_block, NUMA_NO_NODE, region)) {
!uvm_page_mask_region_full(&va_block->cpu.resident, region)) {
status = NV_WARN_MORE_PROCESSING_REQUIRED;
goto done;
}
@@ -2394,7 +2309,7 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
// mmap() files so we check for that here and report a fatal fault.
// Otherwise with the current Linux 6.1 make_device_exclusive_range(),
// it doesn't make the page exclusive and we end up in an endless loop.
if (service_context->block_context->hmm.vma->vm_flags & (VM_SHARED | VM_HUGETLB)) {
if (service_context->block_context.hmm.vma->vm_flags & VM_SHARED) {
status = NV_ERR_NOT_SUPPORTED;
goto done;
}
@@ -2403,7 +2318,7 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
uvm_mutex_unlock(&va_block->lock);
npages = make_device_exclusive_range(service_context->block_context->mm,
npages = make_device_exclusive_range(service_context->block_context.mm,
uvm_va_block_cpu_page_address(va_block, region.first),
uvm_va_block_cpu_page_address(va_block, region.outer - 1) + PAGE_SIZE,
pages + region.first,
@@ -2441,13 +2356,15 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
if (uvm_page_mask_test(&va_block->cpu.allocated, page_index)) {
UVM_ASSERT(hmm_va_block_cpu_page_is_same(va_block, page_index, page));
UVM_ASSERT(uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU));
UVM_ASSERT(uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
UVM_ASSERT(uvm_page_mask_test(&va_block->cpu.resident, page_index));
}
else {
NV_STATUS s = hmm_va_block_cpu_page_populate(va_block, page_index, page);
if (s == NV_OK)
uvm_va_block_cpu_set_resident_page(va_block, page_to_nid(page), page_index);
if (s == NV_OK) {
uvm_processor_mask_set(&va_block->resident, UVM_ID_CPU);
uvm_page_mask_set(&va_block->cpu.resident, page_index);
}
}
cpu_mapping_clear(va_block, page_index);
@@ -2502,7 +2419,7 @@ static NV_STATUS hmm_block_cpu_fault_locked(uvm_processor_id_t processor_id,
uvm_service_block_context_t *service_context)
{
uvm_va_block_region_t region = service_context->region;
struct migrate_vma *args = &service_context->block_context->hmm.migrate_vma_args;
struct migrate_vma *args = &service_context->block_context.hmm.migrate_vma_args;
NV_STATUS status;
int ret;
uvm_hmm_devmem_fault_context_t fault_context = {
@@ -2536,8 +2453,8 @@ static NV_STATUS hmm_block_cpu_fault_locked(uvm_processor_id_t processor_id,
}
status = hmm_make_resident_cpu(va_block,
service_context->block_context->hmm.vma,
service_context->block_context->hmm.src_pfns,
service_context->block_context.hmm.vma,
service_context->block_context.hmm.src_pfns,
region,
service_context->access_type,
&fault_context.same_devmem_page_mask);
@@ -2559,9 +2476,9 @@ static NV_STATUS hmm_block_cpu_fault_locked(uvm_processor_id_t processor_id,
}
}
args->vma = service_context->block_context->hmm.vma;
args->src = service_context->block_context->hmm.src_pfns + region.first;
args->dst = service_context->block_context->hmm.dst_pfns + region.first;
args->vma = service_context->block_context.hmm.vma;
args->src = service_context->block_context.hmm.src_pfns + region.first;
args->dst = service_context->block_context.hmm.dst_pfns + region.first;
args->start = uvm_va_block_region_start(va_block, region);
args->end = uvm_va_block_region_end(va_block, region) + 1;
args->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
@@ -2641,7 +2558,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
// TODO: Bug 4050579: Remove this when swap cached pages can be
// migrated.
if (service_context) {
service_context->block_context->hmm.swap_cached = true;
service_context->block_context.hmm.swap_cached = true;
break;
}
@@ -2657,7 +2574,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
if (uvm_page_mask_test(&va_block->cpu.allocated, page_index)) {
UVM_ASSERT(hmm_va_block_cpu_page_is_same(va_block, page_index, src_page));
UVM_ASSERT(uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU));
UVM_ASSERT(uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
UVM_ASSERT(uvm_page_mask_test(&va_block->cpu.resident, page_index));
}
else {
status = hmm_va_block_cpu_page_populate(va_block, page_index, src_page);
@@ -2671,7 +2588,8 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
// migrate_vma_setup() was able to isolate and lock the page;
// therefore, it is CPU resident and not mapped.
uvm_va_block_cpu_set_resident_page(va_block, page_to_nid(src_page), page_index);
uvm_processor_mask_set(&va_block->resident, UVM_ID_CPU);
uvm_page_mask_set(&va_block->cpu.resident, page_index);
}
// The call to migrate_vma_setup() will have inserted a migration
@@ -2686,7 +2604,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
if (uvm_page_mask_test(&va_block->cpu.allocated, page_index)) {
UVM_ASSERT(!uvm_va_block_page_resident_processors_count(va_block, page_index));
hmm_va_block_cpu_page_unpopulate(va_block, page_index, NULL);
hmm_va_block_cpu_page_unpopulate(va_block, page_index);
}
}
@@ -2700,7 +2618,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
}
if (uvm_page_mask_empty(page_mask) ||
(service_context && service_context->block_context->hmm.swap_cached))
(service_context && service_context->block_context.hmm.swap_cached))
status = NV_WARN_MORE_PROCESSING_REQUIRED;
if (status != NV_OK)
@@ -2731,8 +2649,8 @@ static NV_STATUS uvm_hmm_gpu_fault_alloc_and_copy(struct vm_area_struct *vma,
service_context = uvm_hmm_gpu_fault_event->service_context;
region = service_context->region;
prefetch_hint = &service_context->prefetch_hint;
src_pfns = service_context->block_context->hmm.src_pfns;
dst_pfns = service_context->block_context->hmm.dst_pfns;
src_pfns = service_context->block_context.hmm.src_pfns;
dst_pfns = service_context->block_context.hmm.dst_pfns;
// Build the migration mask.
// Note that thrashing pinned pages are already accounted for in
@@ -2790,8 +2708,8 @@ static NV_STATUS uvm_hmm_gpu_fault_finalize_and_map(uvm_hmm_gpu_fault_event_t *u
va_block = uvm_hmm_gpu_fault_event->va_block;
va_block_retry = uvm_hmm_gpu_fault_event->va_block_retry;
service_context = uvm_hmm_gpu_fault_event->service_context;
src_pfns = service_context->block_context->hmm.src_pfns;
dst_pfns = service_context->block_context->hmm.dst_pfns;
src_pfns = service_context->block_context.hmm.src_pfns;
dst_pfns = service_context->block_context.hmm.dst_pfns;
region = service_context->region;
page_mask = &uvm_hmm_gpu_fault_event->page_mask;
@@ -2834,11 +2752,11 @@ NV_STATUS uvm_hmm_va_block_service_locked(uvm_processor_id_t processor_id,
uvm_va_block_retry_t *va_block_retry,
uvm_service_block_context_t *service_context)
{
struct mm_struct *mm = service_context->block_context->mm;
struct vm_area_struct *vma = service_context->block_context->hmm.vma;
struct mm_struct *mm = service_context->block_context.mm;
struct vm_area_struct *vma = service_context->block_context.hmm.vma;
uvm_va_block_region_t region = service_context->region;
uvm_hmm_gpu_fault_event_t uvm_hmm_gpu_fault_event;
struct migrate_vma *args = &service_context->block_context->hmm.migrate_vma_args;
struct migrate_vma *args = &service_context->block_context.hmm.migrate_vma_args;
int ret;
NV_STATUS status = NV_ERR_INVALID_ADDRESS;
@@ -2862,8 +2780,8 @@ NV_STATUS uvm_hmm_va_block_service_locked(uvm_processor_id_t processor_id,
uvm_hmm_gpu_fault_event.service_context = service_context;
args->vma = vma;
args->src = service_context->block_context->hmm.src_pfns + region.first;
args->dst = service_context->block_context->hmm.dst_pfns + region.first;
args->src = service_context->block_context.hmm.src_pfns + region.first;
args->dst = service_context->block_context.hmm.dst_pfns + region.first;
args->start = uvm_va_block_region_start(va_block, region);
args->end = uvm_va_block_region_end(va_block, region) + 1;
args->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE | MIGRATE_VMA_SELECT_SYSTEM;
@@ -2897,8 +2815,8 @@ NV_STATUS uvm_hmm_va_block_service_locked(uvm_processor_id_t processor_id,
// since migrate_vma_setup() would have reported that information.
// Try to make it resident in system memory and retry the migration.
status = hmm_make_resident_cpu(va_block,
service_context->block_context->hmm.vma,
service_context->block_context->hmm.src_pfns,
service_context->block_context.hmm.vma,
service_context->block_context.hmm.src_pfns,
region,
service_context->access_type,
NULL);
@@ -3044,6 +2962,16 @@ static NV_STATUS uvm_hmm_migrate_finalize(uvm_hmm_migrate_event_t *uvm_hmm_migra
&uvm_hmm_migrate_event->same_devmem_page_mask);
}
static bool is_resident(uvm_va_block_t *va_block,
uvm_processor_id_t dest_id,
uvm_va_block_region_t region)
{
if (!uvm_processor_mask_test(&va_block->resident, dest_id))
return false;
return uvm_page_mask_region_full(uvm_va_block_resident_mask_get(va_block, dest_id), region);
}
// Note that migrate_vma_*() doesn't handle asynchronous migrations so the
// migration flag UVM_MIGRATE_FLAG_SKIP_CPU_MAP doesn't have an effect.
// TODO: Bug 3900785: investigate ways to implement async migration.
@@ -3135,7 +3063,9 @@ NV_STATUS uvm_hmm_va_block_migrate_locked(uvm_va_block_t *va_block,
uvm_page_mask_init_from_region(page_mask, region, NULL);
for_each_id_in_mask(id, &va_block->resident) {
if (!uvm_page_mask_andnot(page_mask, page_mask, uvm_va_block_resident_mask_get(va_block, id, NUMA_NO_NODE)))
if (!uvm_page_mask_andnot(page_mask,
page_mask,
uvm_va_block_resident_mask_get(va_block, id)))
return NV_OK;
}
@@ -3263,7 +3193,6 @@ static NV_STATUS hmm_va_block_evict_chunks(uvm_va_block_t *va_block,
uvm_page_mask_t *page_mask = &uvm_hmm_migrate_event.page_mask;
const uvm_va_policy_t *policy;
uvm_va_policy_node_t *node;
uvm_page_mask_t *cpu_resident_mask = uvm_va_block_resident_mask_get(va_block, UVM_ID_CPU, NUMA_NO_NODE);
unsigned long npages;
NV_STATUS status;
@@ -3286,7 +3215,7 @@ static NV_STATUS hmm_va_block_evict_chunks(uvm_va_block_t *va_block,
// Pages resident on the GPU should not have a resident page in system
// memory.
// TODO: Bug 3660922: Need to handle read duplication at some point.
UVM_ASSERT(uvm_page_mask_region_empty(cpu_resident_mask, region));
UVM_ASSERT(uvm_page_mask_region_empty(&va_block->cpu.resident, region));
status = alloc_and_copy_to_cpu(va_block,
NULL,
@@ -3385,34 +3314,35 @@ NV_STATUS uvm_hmm_va_block_evict_pages_from_gpu(uvm_va_block_t *va_block,
NULL);
}
NV_STATUS uvm_hmm_remote_cpu_fault(struct vm_fault *vmf)
NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn)
{
unsigned long src_pfn = 0;
unsigned long dst_pfn = 0;
struct page *dst_page;
NV_STATUS status = NV_OK;
unsigned long src_pfn;
unsigned long dst_pfn;
struct migrate_vma args;
struct page *src_page = vmf->page;
uvm_tracker_t tracker = UVM_TRACKER_INIT();
int ret;
args.vma = vmf->vma;
args.src = &src_pfn;
args.dst = &dst_pfn;
args.start = nv_page_fault_va(vmf);
args.end = args.start + PAGE_SIZE;
args.pgmap_owner = &g_uvm_global;
args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
args.fault_page = src_page;
// We don't call migrate_vma_setup_locked() here because we don't
// have a va_block and don't want to ignore invalidations.
ret = migrate_vma_setup(&args);
UVM_ASSERT(!ret);
ret = migrate_device_range(&src_pfn, pfn, 1);
if (ret)
return errno_to_nv_status(ret);
if (src_pfn & MIGRATE_PFN_MIGRATE) {
struct page *dst_page;
dst_page = alloc_page(GFP_HIGHUSER_MOVABLE);
// All the code for copying a vidmem page to sysmem relies on
// having a va_block. However certain combinations of mremap()
// and fork() can result in device-private pages being mapped
// in a child process without a va_block.
//
// We don't expect the above to be a common occurrence so for
// now we allocate a fresh zero page when evicting without a
// va_block. However this results in child processes losing
// data so make sure we warn about it. Ideally we would just
// not migrate and SIGBUS the child if it tries to access the
// page. However that would prevent unloading of the driver so
// we're stuck with this until we fix the problem.
// TODO: Bug 3902536: add code to migrate GPU memory without having a
// va_block.
WARN_ON(1);
dst_page = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_ZERO);
if (!dst_page) {
status = NV_ERR_NO_MEMORY;
goto out;
@@ -3421,15 +3351,11 @@ NV_STATUS uvm_hmm_remote_cpu_fault(struct vm_fault *vmf)
lock_page(dst_page);
dst_pfn = migrate_pfn(page_to_pfn(dst_page));
status = uvm_hmm_copy_devmem_page(dst_page, src_page, &tracker);
if (status == NV_OK)
status = uvm_tracker_wait_deinit(&tracker);
migrate_device_pages(&src_pfn, &dst_pfn, 1);
}
migrate_vma_pages(&args);
out:
migrate_vma_finalize(&args);
migrate_device_finalize(&src_pfn, &dst_pfn, 1);
return status;
}
@@ -3680,3 +3606,4 @@ bool uvm_hmm_must_use_sysmem(uvm_va_block_t *va_block,
}
#endif // UVM_IS_CONFIG_HMM()

View File

@@ -307,10 +307,10 @@ typedef struct
uvm_migrate_mode_t mode,
uvm_tracker_t *out_tracker);
// Handle a fault to a device-private page from a process other than the
// process which created the va_space that originally allocated the
// device-private page.
NV_STATUS uvm_hmm_remote_cpu_fault(struct vm_fault *vmf);
// Evicts all va_blocks in the va_space to the CPU. Unlike the
// other va_block eviction functions this is based on virtual
// address and therefore takes mmap_lock for read.
void uvm_hmm_evict_va_blocks(uvm_va_space_t *va_space);
// This sets the va_block_context->hmm.src_pfns[] to the ZONE_DEVICE private
// PFN for the GPU chunk memory.
@@ -343,6 +343,14 @@ typedef struct
const uvm_page_mask_t *pages_to_evict,
uvm_va_block_region_t region);
// Migrate a GPU device-private page to system memory. This is
// called to remove CPU page table references to device private
// struct pages for the given GPU after all other references in
// va_blocks have been released and the GPU is in the process of
// being removed/torn down. Note that there is no mm, VMA,
// va_block or any user channel activity on this GPU.
NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn);
// This returns what would be the intersection of va_block start/end and
// VMA start/end-1 for the given 'lookup_address' if
// uvm_hmm_va_block_find_create() was called.
@@ -584,10 +592,8 @@ typedef struct
return NV_ERR_INVALID_ADDRESS;
}
static NV_STATUS uvm_hmm_remote_cpu_fault(struct vm_fault *vmf)
static void uvm_hmm_evict_va_blocks(uvm_va_space_t *va_space)
{
UVM_ASSERT(0);
return NV_ERR_INVALID_ADDRESS;
}
static NV_STATUS uvm_hmm_va_block_evict_chunk_prep(uvm_va_block_t *va_block,
@@ -616,6 +622,11 @@ typedef struct
return NV_OK;
}
static NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn)
{
return NV_OK;
}
static NV_STATUS uvm_hmm_va_block_range_bounds(uvm_va_space_t *va_space,
struct mm_struct *mm,
NvU64 lookup_address,

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2020-2023 NVIDIA Corporation
Copyright (c) 2020-2022 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -59,12 +59,12 @@ void uvm_hal_hopper_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
// Physical CE writes to vidmem are non-coherent with respect to the CPU on
// GH180.
parent_gpu->ce_phys_vidmem_write_supported = !uvm_parent_gpu_is_coherent(parent_gpu);
parent_gpu->ce_phys_vidmem_write_supported = !uvm_gpu_is_coherent(parent_gpu);
// TODO: Bug 4174553: [HGX-SkinnyJoe][GH180] channel errors discussion/debug
// portion for the uvm tests became nonresponsive after
// some time and then failed even after reboot
parent_gpu->peer_copy_mode = uvm_parent_gpu_is_coherent(parent_gpu) ?
parent_gpu->peer_copy_mode = uvm_gpu_is_coherent(parent_gpu) ?
UVM_GPU_PEER_COPY_MODE_VIRTUAL : g_uvm_global.peer_copy_mode;
// All GR context buffers may be mapped to 57b wide VAs. All "compute" units
@@ -103,5 +103,7 @@ void uvm_hal_hopper_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->map_remap_larger_page_promotion = false;
parent_gpu->plc_supported = true;
parent_gpu->no_ats_range_required = true;
}

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2020-2022 NVIDIA Corporation
Copyright (c) 2020-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -33,6 +33,7 @@
#include "uvm_types.h"
#include "uvm_global.h"
#include "uvm_common.h"
#include "uvm_hal.h"
#include "uvm_hal_types.h"
#include "uvm_hopper_fault_buffer.h"
@@ -42,6 +43,10 @@
#define MMU_BIG 0
#define MMU_SMALL 1
// Used in pde_pcf().
#define ATS_ALLOWED 0
#define ATS_NOT_ALLOWED 1
uvm_mmu_engine_type_t uvm_hal_hopper_mmu_engine_id_to_type(NvU16 mmu_engine_id)
{
if (mmu_engine_id >= NV_PFAULT_MMU_ENG_ID_HOST0 && mmu_engine_id <= NV_PFAULT_MMU_ENG_ID_HOST44)
@@ -260,7 +265,108 @@ static NvU64 poisoned_pte_hopper(void)
return WRITE_HWCONST64(pte_bits, _MMU_VER3, PTE, PCF, PRIVILEGE_RO_NO_ATOMIC_UNCACHED_ACD);
}
static NvU64 single_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc, NvU32 depth)
typedef enum
{
PDE_TYPE_SINGLE,
PDE_TYPE_DUAL_BIG,
PDE_TYPE_DUAL_SMALL,
PDE_TYPE_COUNT,
} pde_type_t;
static const NvU8 valid_pcf[][2] = { { NV_MMU_VER3_PDE_PCF_VALID_UNCACHED_ATS_ALLOWED,
NV_MMU_VER3_PDE_PCF_VALID_UNCACHED_ATS_NOT_ALLOWED },
{ NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_UNCACHED_ATS_ALLOWED,
NV_MMU_VER3_DUAL_PDE_PCF_BIG_VALID_UNCACHED_ATS_NOT_ALLOWED },
{ NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_UNCACHED_ATS_ALLOWED,
NV_MMU_VER3_DUAL_PDE_PCF_SMALL_VALID_UNCACHED_ATS_NOT_ALLOWED } };
static const NvU8 invalid_pcf[][2] = { { NV_MMU_VER3_PDE_PCF_INVALID_ATS_ALLOWED,
NV_MMU_VER3_PDE_PCF_INVALID_ATS_NOT_ALLOWED },
{ NV_MMU_VER3_DUAL_PDE_PCF_BIG_INVALID_ATS_ALLOWED,
NV_MMU_VER3_DUAL_PDE_PCF_BIG_INVALID_ATS_NOT_ALLOWED },
{ NV_MMU_VER3_DUAL_PDE_PCF_SMALL_INVALID_ATS_ALLOWED,
NV_MMU_VER3_DUAL_PDE_PCF_SMALL_INVALID_ATS_NOT_ALLOWED } };
static const NvU8 va_base[] = { 56, 47, 38, 29, 21 };
static bool is_ats_range_valid(uvm_page_directory_t *dir, NvU32 child_index)
{
NvU64 pde_base_va;
NvU64 min_va_upper;
NvU64 max_va_lower;
NvU32 index_in_dir;
uvm_cpu_get_unaddressable_range(&max_va_lower, &min_va_upper);
UVM_ASSERT(dir->depth < ARRAY_SIZE(va_base));
// We can use UVM_PAGE_SIZE_AGNOSTIC because page_size is only used in
// index_bits_hopper() for PTE table, i.e., depth 5+, which does not use a
// PDE PCF or an ATS_ALLOWED/NOT_ALLOWED setting.
UVM_ASSERT(child_index < (1ull << index_bits_hopper(dir->depth, UVM_PAGE_SIZE_AGNOSTIC)));
pde_base_va = 0;
index_in_dir = child_index;
while (dir) {
pde_base_va += index_in_dir * (1ull << va_base[dir->depth]);
index_in_dir = dir->index_in_parent;
dir = dir->host_parent;
}
pde_base_va = (NvU64)((NvS64)(pde_base_va << (64 - num_va_bits_hopper())) >> (64 - num_va_bits_hopper()));
if (pde_base_va < max_va_lower || pde_base_va >= min_va_upper)
return true;
return false;
}
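The cast/shift pair at the end of is_ats_range_valid() is a standard canonical-address sign extension: the top (64 - num_va_bits) bits are filled with a copy of the most significant implemented VA bit before the address is compared against the CPU's unaddressable hole. A standalone sketch follows; the 57-bit width is an assumption taken from the 57b VA comment elsewhere in this diff, not from num_va_bits_hopper() itself:
#include <stdint.h>
#include <stdio.h>

// Sign-extend a VA with 'va_bits' implemented bits to a 64-bit canonical
// address, mirroring the (NvS64) cast and shift pair in is_ats_range_valid().
// Like the driver code, this relies on arithmetic right shift of signed values.
static uint64_t canonical_extend(uint64_t va, unsigned va_bits)
{
    unsigned shift = 64 - va_bits;
    return (uint64_t)(((int64_t)(va << shift)) >> shift);
}

int main(void)
{
    // Bit 56 set: the address extends into the upper canonical half.
    printf("0x%016llx\n", (unsigned long long)canonical_extend(0x0100000000000000ull, 57));
    // Prints 0xff00000000000000
    // Bit 56 clear: the address is already canonical and is unchanged.
    printf("0x%016llx\n", (unsigned long long)canonical_extend(0x00ff000000000000ull, 57));
    // Prints 0x00ff000000000000
    return 0;
}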
// PDE Permission Control Flags
static NvU32 pde_pcf(bool valid, pde_type_t pde_type, uvm_page_directory_t *dir, NvU32 child_index)
{
const NvU8 (*pcf)[2] = valid ? valid_pcf : invalid_pcf;
NvU8 depth = dir->depth;
UVM_ASSERT(pde_type < PDE_TYPE_COUNT);
UVM_ASSERT(depth < 5);
// On non-ATS systems, PDE PCF only sets the valid and volatile/cache bits.
if (!g_uvm_global.ats.enabled)
return pcf[pde_type][ATS_ALLOWED];
// We assume all supported ATS platforms use canonical form address.
// See comments in uvm_gpu.c:uvm_gpu_can_address() and in
// uvm_mmu.c:page_tree_ats_init();
UVM_ASSERT(uvm_platform_uses_canonical_form_address());
// Hopper GPUs on ATS-enabled systems perform a parallel lookup on both
// ATS and GMMU page tables. For managed memory we need to prevent this
// parallel lookup since we would not get any GPU fault if the CPU has
// a valid mapping. Also, for external ranges that are known to be
// mapped entirely on the GMMU page table we can skip the ATS lookup
// for performance reasons. Parallel ATS lookup is disabled in PDE1
// (depth 3) and, therefore, it applies to the underlying 512MB VA
// range.
//
// UVM sets ATS_NOT_ALLOWED for all Hopper+ mappings on ATS systems.
// This is fine because CUDA ensures that all managed and external
// allocations are properly compartmentalized in 512MB-aligned VA
// regions. For cudaHostRegister CUDA cannot control the VA range, but
// we rely on ATS for those allocations so they can't choose the
// ATS_NOT_ALLOWED mode.
// TODO: Bug 3254055: Relax the NO_ATS setting from 512MB (pde1) range to
// PTEs.
// HW complies with the leaf PDE's ATS_ALLOWED/ATS_NOT_ALLOWED settings,
// enabling us to treat any upper-level PDE as a don't care as long as there
// are leaf PDEs for the entire upper-level PDE range. We assume PDE4
// entries (depth == 0) are always ATS enabled, and the no_ats_range is in
// PDE3 or lower.
if (depth == 0 || (!valid && is_ats_range_valid(dir, child_index)))
return pcf[pde_type][ATS_ALLOWED];
return pcf[pde_type][ATS_NOT_ALLOWED];
}
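To tie the new table lookup back to the constant that the old code below hard-codes, here is one concrete trace through pde_pcf(); it is descriptive only:
// pde_pcf(valid = true, PDE_TYPE_SINGLE, dir, child_index) for a PDE1 entry
// (depth 3) on an ATS-enabled system:
//
//   g_uvm_global.ats.enabled     -> the early ATS_ALLOWED return is skipped
//   depth != 0 and valid == true -> the is_ats_range_valid() branch is not taken
//   result: valid_pcf[PDE_TYPE_SINGLE][ATS_NOT_ALLOWED]
//        == NV_MMU_VER3_PDE_PCF_VALID_UNCACHED_ATS_NOT_ALLOWED
//
// This is the same VALID_UNCACHED_ATS_NOT_ALLOWED value that the depth == 3
// special case in the old single_pde_hopper() below sets with HWCONST64.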
static NvU64 single_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc, uvm_page_directory_t *dir, NvU32 child_index)
{
NvU64 pde_bits = 0;
@@ -280,38 +386,17 @@ static NvU64 single_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc, NvU32 dep
break;
}
// PCF (permission control flags) 5:3
// Hopper GPUs on ATS-enabled systems perform a parallel lookup on both
// ATS and GMMU page tables. For managed memory we need to prevent this
// parallel lookup since we would not get any GPU fault if the CPU has
// a valid mapping. Also, for external ranges that are known to be
// mapped entirely on the GMMU page table we can skip the ATS lookup
// for performance reasons. Parallel ATS lookup is disabled in PDE1
// (depth 3) and, therefore, it applies to the underlying 512MB VA
// range.
//
// UVM sets ATS_NOT_ALLOWED for all Hopper+ mappings on ATS systems.
// This is fine because CUDA ensures that all managed and external
// allocations are properly compartmentalized in 512MB-aligned VA
// regions. For cudaHostRegister CUDA cannot control the VA range, but
// we rely on ATS for those allocations so they can't choose the
// ATS_NOT_ALLOWED mode.
//
// TODO: Bug 3254055: Relax the NO_ATS setting from 512MB (pde1) range
// to PTEs.
if (depth == 3 && g_uvm_global.ats.enabled)
pde_bits |= HWCONST64(_MMU_VER3, PDE, PCF, VALID_UNCACHED_ATS_NOT_ALLOWED);
else
pde_bits |= HWCONST64(_MMU_VER3, PDE, PCF, VALID_UNCACHED_ATS_ALLOWED);
// address 51:12
pde_bits |= HWVALUE64(_MMU_VER3, PDE, ADDRESS, address);
}
// PCF (permission control flags) 5:3
pde_bits |= HWVALUE64(_MMU_VER3, PDE, PCF, pde_pcf(phys_alloc != NULL, PDE_TYPE_SINGLE, dir, child_index));
return pde_bits;
}
static NvU64 big_half_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc)
static NvU64 big_half_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc, uvm_page_directory_t *dir, NvU32 child_index)
{
NvU64 pde_bits = 0;
@@ -330,17 +415,20 @@ static NvU64 big_half_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc)
break;
}
// PCF (permission control flags) 5:3
pde_bits |= HWCONST64(_MMU_VER3, DUAL_PDE, PCF_BIG, VALID_UNCACHED_ATS_NOT_ALLOWED);
// address 51:8
pde_bits |= HWVALUE64(_MMU_VER3, DUAL_PDE, ADDRESS_BIG, address);
}
// PCF (permission control flags) 5:3
pde_bits |= HWVALUE64(_MMU_VER3,
DUAL_PDE,
PCF_BIG,
pde_pcf(phys_alloc != NULL, PDE_TYPE_DUAL_BIG, dir, child_index));
return pde_bits;
}
static NvU64 small_half_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc)
static NvU64 small_half_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc, uvm_page_directory_t *dir, NvU32 child_index)
{
NvU64 pde_bits = 0;
@@ -359,29 +447,40 @@ static NvU64 small_half_pde_hopper(uvm_mmu_page_table_alloc_t *phys_alloc)
break;
}
// PCF (permission control flags) 69:67 [5:3]
pde_bits |= HWCONST64(_MMU_VER3, DUAL_PDE, PCF_SMALL, VALID_UNCACHED_ATS_NOT_ALLOWED);
// address 115:76 [51:12]
pde_bits |= HWVALUE64(_MMU_VER3, DUAL_PDE, ADDRESS_SMALL, address);
}
// PCF (permission control flags) 69:67 [5:3]
pde_bits |= HWVALUE64(_MMU_VER3,
DUAL_PDE,
PCF_SMALL,
pde_pcf(phys_alloc != NULL, PDE_TYPE_DUAL_SMALL, dir, child_index));
return pde_bits;
}
static void make_pde_hopper(void *entry, uvm_mmu_page_table_alloc_t **phys_allocs, NvU32 depth)
static void make_pde_hopper(void *entry,
uvm_mmu_page_table_alloc_t **phys_allocs,
uvm_page_directory_t *dir,
NvU32 child_index)
{
NvU32 entry_count = entries_per_index_hopper(depth);
NvU32 entry_count;
NvU64 *entry_bits = (NvU64 *)entry;
UVM_ASSERT(dir);
entry_count = entries_per_index_hopper(dir->depth);
if (entry_count == 1) {
*entry_bits = single_pde_hopper(*phys_allocs, depth);
*entry_bits = single_pde_hopper(*phys_allocs, dir, child_index);
}
else if (entry_count == 2) {
entry_bits[MMU_BIG] = big_half_pde_hopper(phys_allocs[MMU_BIG]);
entry_bits[MMU_SMALL] = small_half_pde_hopper(phys_allocs[MMU_SMALL]);
entry_bits[MMU_BIG] = big_half_pde_hopper(phys_allocs[MMU_BIG], dir, child_index);
entry_bits[MMU_SMALL] = small_half_pde_hopper(phys_allocs[MMU_SMALL], dir, child_index);
// This entry applies to the whole dual PDE but is stored in the lower
// bits
// bits.
entry_bits[MMU_BIG] |= HWCONST64(_MMU_VER3, DUAL_PDE, IS_PTE, FALSE);
}
else {

View File

@@ -114,6 +114,16 @@ static inline const struct cpumask *uvm_cpumask_of_node(int node)
#define UVM_IS_CONFIG_HMM() 0
#endif
// ATS prefetcher uses hmm_range_fault() to query residency information.
// hmm_range_fault() needs CONFIG_HMM_MIRROR. To detect racing CPU invalidates
// of memory regions while hmm_range_fault() is being called, MMU interval
// notifiers are needed.
#if defined(CONFIG_HMM_MIRROR) && defined(NV_MMU_INTERVAL_NOTIFIER)
#define UVM_HMM_RANGE_FAULT_SUPPORTED() 1
#else
#define UVM_HMM_RANGE_FAULT_SUPPORTED() 0
#endif
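A capability macro like this is normally consumed at preprocessing time so that the ATS prefetcher compiles to a stub when hmm_range_fault() cannot be used. The following is a hedged sketch of such usage; the function name is illustrative and does not appear in this diff:
#if UVM_HMM_RANGE_FAULT_SUPPORTED()
static bool ats_prefetch_can_query_residency(void)
{
    // hmm_range_fault() and MMU interval notifiers are available, so CPU
    // residency can be queried and racing invalidates can be detected.
    return true;
}
#else
static bool ats_prefetch_can_query_residency(void)
{
    // Fall back to prefetching without residency information.
    return false;
}
#endif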
// Various issues prevent us from using mmu_notifiers in older kernels. These
// include:
// - ->release being called under RCU instead of SRCU: fixed by commit
@@ -349,47 +359,6 @@ static inline NvU64 NV_GETTIME(void)
(bit) = find_next_zero_bit((addr), (size), (bit) + 1))
#endif
#if !defined(NV_FIND_NEXT_BIT_WRAP_PRESENT)
static inline unsigned long find_next_bit_wrap(const unsigned long *addr, unsigned long size, unsigned long offset)
{
unsigned long bit = find_next_bit(addr, size, offset);
if (bit < size)
return bit;
bit = find_first_bit(addr, offset);
return bit < offset ? bit : size;
}
#endif
// for_each_set_bit_wrap and __for_each_wrap were introduced in v6.1-rc1
// by commit 4fe49b3b97c2640147c46519c2a6fdb06df34f5f
#if !defined(for_each_set_bit_wrap)
static inline unsigned long __for_each_wrap(const unsigned long *bitmap,
unsigned long size,
unsigned long start,
unsigned long n)
{
unsigned long bit;
if (n > start) {
bit = find_next_bit(bitmap, size, n);
if (bit < size)
return bit;
n = 0;
}
bit = find_next_bit(bitmap, start, n);
return bit < start ? bit : size;
}
#define for_each_set_bit_wrap(bit, addr, size, start) \
for ((bit) = find_next_bit_wrap((addr), (size), (start)); \
(bit) < (size); \
(bit) = __for_each_wrap((addr), (size), (start), (bit) + 1))
#endif
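A small usage sketch of these fallbacks, assuming an ordinary kernel bitmap: with a 64-bit map holding bits 3 and 60 and a starting hint of 32, the loop visits bit 60 first and then wraps around to bit 3.
DECLARE_BITMAP(mask, 64);
unsigned long bit;
bitmap_zero(mask, 64);
__set_bit(3, mask);
__set_bit(60, mask);
// Round-robin style scan: start at the hint, wrap to the beginning.
for_each_set_bit_wrap(bit, mask, 64, 32)
    pr_info("visited bit %lu\n", bit);   // prints 60, then 3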
// Added in 2.6.24
#ifndef ACCESS_ONCE
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
@@ -621,5 +590,4 @@ static inline pgprot_t uvm_pgprot_decrypted(pgprot_t prot)
#include <asm/page.h>
#define page_to_virt(x) __va(PFN_PHYS(page_to_pfn(x)))
#endif
#endif // _UVM_LINUX_H

View File

@@ -355,7 +355,6 @@ static uvm_membar_t va_range_downgrade_membar(uvm_va_range_t *va_range, uvm_ext_
if (!ext_gpu_map->mem_handle)
return UVM_MEMBAR_GPU;
// EGM uses the same barriers as sysmem.
return uvm_hal_downgrade_membar_type(ext_gpu_map->gpu,
!ext_gpu_map->is_sysmem && ext_gpu_map->gpu == ext_gpu_map->owning_gpu);
}
@@ -634,8 +633,6 @@ static NV_STATUS set_ext_gpu_map_location(uvm_ext_gpu_map_t *ext_gpu_map,
const UvmGpuMemoryInfo *mem_info)
{
uvm_gpu_t *owning_gpu;
if (mem_info->egm)
UVM_ASSERT(mem_info->sysmem);
if (!mem_info->deviceDescendant && !mem_info->sysmem) {
ext_gpu_map->owning_gpu = NULL;
@@ -644,7 +641,6 @@ static NV_STATUS set_ext_gpu_map_location(uvm_ext_gpu_map_t *ext_gpu_map,
}
// This is a local or peer allocation, so the owning GPU must have been
// registered.
// This also checks whether the EGM owning GPU is registered.
owning_gpu = uvm_va_space_get_gpu_by_uuid(va_space, &mem_info->uuid);
if (!owning_gpu)
return NV_ERR_INVALID_DEVICE;
@@ -655,10 +651,13 @@ static NV_STATUS set_ext_gpu_map_location(uvm_ext_gpu_map_t *ext_gpu_map,
// crashes when it's eventually freed.
// TODO: Bug 1811006: Bug tracking the RM issue, its fix might change the
// semantics of sysmem allocations.
if (mem_info->sysmem) {
ext_gpu_map->owning_gpu = owning_gpu;
ext_gpu_map->is_sysmem = true;
return NV_OK;
}
// Check if peer access for peer memory is enabled.
// This path also handles EGM allocations.
if (owning_gpu != mapping_gpu && (!mem_info->sysmem || mem_info->egm)) {
if (owning_gpu != mapping_gpu) {
// TODO: Bug 1757136: In SLI, the returned UUID may be different but a
// local mapping must be used. We need to query SLI groups to know
// that.
@@ -667,9 +666,7 @@ static NV_STATUS set_ext_gpu_map_location(uvm_ext_gpu_map_t *ext_gpu_map,
}
ext_gpu_map->owning_gpu = owning_gpu;
ext_gpu_map->is_sysmem = mem_info->sysmem;
ext_gpu_map->is_egm = mem_info->egm;
ext_gpu_map->is_sysmem = false;
return NV_OK;
}
@@ -722,7 +719,6 @@ static NV_STATUS uvm_ext_gpu_map_split(uvm_range_tree_t *tree,
new->gpu = existing_map->gpu;
new->owning_gpu = existing_map->owning_gpu;
new->is_sysmem = existing_map->is_sysmem;
new->is_egm = existing_map->is_egm;
// Initialize the new ext_gpu_map tracker as a copy of the existing_map tracker.
// This way, any operations on any of the two ext_gpu_maps will be able to

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2021 NVIDIA Corporation
Copyright (c) 2016-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -71,4 +71,6 @@ void uvm_hal_maxwell_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->smc.supported = false;
parent_gpu->plc_supported = false;
parent_gpu->no_ats_range_required = false;
}

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2021 NVIDIA Corporation
Copyright (c) 2016-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -106,10 +106,16 @@ static NvU64 small_half_pde_maxwell(uvm_mmu_page_table_alloc_t *phys_alloc)
return pde_bits;
}
static void make_pde_maxwell(void *entry, uvm_mmu_page_table_alloc_t **phys_allocs, NvU32 depth)
static void make_pde_maxwell(void *entry,
uvm_mmu_page_table_alloc_t **phys_allocs,
uvm_page_directory_t *dir,
NvU32 child_index)
{
NvU64 pde_bits = 0;
UVM_ASSERT(depth == 0);
UVM_ASSERT(dir);
UVM_ASSERT(dir->depth == 0);
pde_bits |= HWCONST64(_MMU, PDE, SIZE, FULL);
pde_bits |= big_half_pde_maxwell(phys_allocs[MMU_BIG]) | small_half_pde_maxwell(phys_allocs[MMU_SMALL]);

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2022 NVIDIA Corporation
Copyright (c) 2016-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -93,8 +93,9 @@ static bool sysmem_can_be_mapped_on_gpu(uvm_mem_t *sysmem)
{
UVM_ASSERT(uvm_mem_is_sysmem(sysmem));
// If SEV is enabled, only unprotected memory can be mapped
if (g_uvm_global.sev_enabled)
// In Confidential Computing, only unprotected memory can be mapped on the
// GPU
if (g_uvm_global.conf_computing_enabled)
return uvm_mem_is_sysmem_dma(sysmem);
return true;
@@ -737,7 +738,7 @@ static NV_STATUS mem_map_cpu_to_sysmem_kernel(uvm_mem_t *mem)
pages[page_index] = mem_cpu_page(mem, page_index * PAGE_SIZE);
}
if (g_uvm_global.sev_enabled && uvm_mem_is_sysmem_dma(mem))
if (g_uvm_global.conf_computing_enabled && uvm_mem_is_sysmem_dma(mem))
prot = uvm_pgprot_decrypted(PAGE_KERNEL_NOENC);
mem->kernel.cpu_addr = vmap(pages, num_pages, VM_MAP, prot);

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2021 NVIDIA Corporation
Copyright (c) 2016-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -44,10 +44,10 @@ static NvU32 first_page_size(NvU32 page_sizes)
static inline NV_STATUS __alloc_map_sysmem(NvU64 size, uvm_gpu_t *gpu, uvm_mem_t **sys_mem)
{
if (g_uvm_global.sev_enabled)
if (g_uvm_global.conf_computing_enabled)
return uvm_mem_alloc_sysmem_dma_and_map_cpu_kernel(size, gpu, current->mm, sys_mem);
else
return uvm_mem_alloc_sysmem_and_map_cpu_kernel(size, current->mm, sys_mem);
return uvm_mem_alloc_sysmem_and_map_cpu_kernel(size, current->mm, sys_mem);
}
static NV_STATUS check_accessible_from_gpu(uvm_gpu_t *gpu, uvm_mem_t *mem)
@@ -335,9 +335,6 @@ error:
static bool should_test_page_size(size_t alloc_size, NvU32 page_size)
{
if (g_uvm_global.sev_enabled)
return false;
if (g_uvm_global.num_simulated_devices == 0)
return true;

View File

@@ -130,9 +130,9 @@ static NV_STATUS block_migrate_map_unmapped_pages(uvm_va_block_t *va_block,
NV_STATUS status = NV_OK;
NV_STATUS tracker_status;
// Get the mask of unmapped pages because it will change after the
// Save the mask of unmapped pages because it will change after the
// first map operation
uvm_va_block_unmapped_pages_get(va_block, region, &va_block_context->caller_page_mask);
uvm_page_mask_complement(&va_block_context->caller_page_mask, &va_block->maybe_mapped_pages);
if (uvm_va_block_is_hmm(va_block) && !UVM_ID_IS_CPU(dest_id)) {
// Do not map pages that are already resident on the CPU. This is in
@@ -147,7 +147,7 @@ static NV_STATUS block_migrate_map_unmapped_pages(uvm_va_block_t *va_block,
// such pages at all, when migrating.
uvm_page_mask_andnot(&va_block_context->caller_page_mask,
&va_block_context->caller_page_mask,
uvm_va_block_resident_mask_get(va_block, UVM_ID_CPU, NUMA_NO_NODE));
uvm_va_block_resident_mask_get(va_block, UVM_ID_CPU));
}
// Only map those pages that are not mapped anywhere else (likely due
@@ -377,7 +377,7 @@ static bool va_block_should_do_cpu_preunmap(uvm_va_block_t *va_block,
mapped_pages_cpu = uvm_va_block_map_mask_get(va_block, UVM_ID_CPU);
if (uvm_processor_mask_test(&va_block->resident, dest_id)) {
const uvm_page_mask_t *resident_pages_dest = uvm_va_block_resident_mask_get(va_block, dest_id, NUMA_NO_NODE);
const uvm_page_mask_t *resident_pages_dest = uvm_va_block_resident_mask_get(va_block, dest_id);
uvm_page_mask_t *do_not_unmap_pages = &va_block_context->scratch_page_mask;
// TODO: Bug 1877578

View File

@@ -836,6 +836,17 @@ static NV_STATUS migrate_pageable_vma_region(struct vm_area_struct *vma,
return NV_OK;
}
NV_STATUS uvm_test_skip_migrate_vma(UVM_TEST_SKIP_MIGRATE_VMA_PARAMS *params, struct file *filp)
{
uvm_va_space_t *va_space = uvm_va_space_get(filp);
uvm_va_space_down_write(va_space);
va_space->test.skip_migrate_vma = params->skip;
uvm_va_space_up_write(va_space);
return NV_OK;
}
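A hedged sketch of how this hook would be driven from a test; the ioctl plumbing and the open UVM file pointer (filp) are assumed, not shown in this hunk:
// Sketch only: once the knob is set, migrate_pageable_vma() below returns
// NV_WARN_NOTHING_TO_DO instead of driving the migrate_vma path.
UVM_TEST_SKIP_MIGRATE_VMA_PARAMS params = { .skip = NV_TRUE };
NV_STATUS status = uvm_test_skip_migrate_vma(&params, filp);  // filp: assumed UVM fd
UVM_ASSERT(status == NV_OK);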
static NV_STATUS migrate_pageable_vma(struct vm_area_struct *vma,
unsigned long start,
unsigned long outer,
@@ -858,6 +869,9 @@ static NV_STATUS migrate_pageable_vma(struct vm_area_struct *vma,
start = max(start, vma->vm_start);
outer = min(outer, vma->vm_end);
if (va_space->test.skip_migrate_vma)
return NV_WARN_NOTHING_TO_DO;
// TODO: Bug 2419180: support file-backed pages in migrate_vma, when
// support for it is added to the Linux kernel
if (!vma_is_anonymous(vma))
@@ -920,7 +934,9 @@ static NV_STATUS migrate_pageable(migrate_vma_state_t *state)
bool touch = uvm_migrate_args->touch;
uvm_populate_permissions_t populate_permissions = uvm_migrate_args->populate_permissions;
UVM_ASSERT(!vma_is_anonymous(vma) || uvm_processor_mask_empty(&va_space->registered_gpus));
UVM_ASSERT(va_space->test.skip_migrate_vma ||
!vma_is_anonymous(vma) ||
uvm_processor_mask_empty(&va_space->registered_gpus));
// We can't use migrate_vma to move the pages as desired. Normally
// this fallback path is supposed to populate the memory then inform

View File

@@ -51,7 +51,7 @@ typedef struct
#if defined(CONFIG_MIGRATE_VMA_HELPER)
#define UVM_MIGRATE_VMA_SUPPORTED 1
#else
#if defined(CONFIG_DEVICE_PRIVATE) && defined(NV_MIGRATE_VMA_SETUP_PRESENT)
#if NV_IS_EXPORT_SYMBOL_PRESENT_migrate_vma_setup
#define UVM_MIGRATE_VMA_SUPPORTED 1
#endif
#endif
@@ -218,6 +218,9 @@ NV_STATUS uvm_migrate_pageable(uvm_migrate_args_t *uvm_migrate_args);
NV_STATUS uvm_migrate_pageable_init(void);
void uvm_migrate_pageable_exit(void);
NV_STATUS uvm_test_skip_migrate_vma(UVM_TEST_SKIP_MIGRATE_VMA_PARAMS *params, struct file *filp);
#else // UVM_MIGRATE_VMA_SUPPORTED
static NV_STATUS uvm_migrate_pageable(uvm_migrate_args_t *uvm_migrate_args)
@@ -251,6 +254,10 @@ static void uvm_migrate_pageable_exit(void)
{
}
static inline NV_STATUS uvm_test_skip_migrate_vma(UVM_TEST_SKIP_MIGRATE_VMA_PARAMS *params, struct file *filp)
{
return NV_OK;
}
#endif // UVM_MIGRATE_VMA_SUPPORTED
#endif

View File

@@ -323,37 +323,156 @@ static void uvm_mmu_page_table_cpu_memset_16(uvm_gpu_t *gpu,
uvm_mmu_page_table_cpu_unmap(gpu, phys_alloc);
}
static void pde_fill_cpu(uvm_page_tree_t *tree,
uvm_page_directory_t *directory,
NvU32 start_index,
NvU32 pde_count,
uvm_mmu_page_table_alloc_t **phys_addr)
{
NvU64 pde_data[2], entry_size;
NvU32 i;
UVM_ASSERT(uvm_mmu_use_cpu(tree));
entry_size = tree->hal->entry_size(directory->depth);
UVM_ASSERT(sizeof(pde_data) >= entry_size);
for (i = 0; i < pde_count; i++) {
tree->hal->make_pde(pde_data, phys_addr, directory, start_index + i);
if (entry_size == sizeof(pde_data[0]))
uvm_mmu_page_table_cpu_memset_8(tree->gpu, &directory->phys_alloc, start_index + i, pde_data[0], 1);
else
uvm_mmu_page_table_cpu_memset_16(tree->gpu, &directory->phys_alloc, start_index + i, pde_data, 1);
}
}
static void pde_fill_gpu(uvm_page_tree_t *tree,
uvm_page_directory_t *directory,
NvU32 start_index,
NvU32 pde_count,
uvm_mmu_page_table_alloc_t **phys_addr,
uvm_push_t *push)
{
NvU64 pde_data[2], entry_size;
uvm_gpu_address_t pde_entry_addr = uvm_mmu_gpu_address(tree->gpu, directory->phys_alloc.addr);
NvU32 max_inline_entries;
uvm_push_flag_t push_membar_flag = UVM_PUSH_FLAG_COUNT;
uvm_gpu_address_t inline_data_addr;
uvm_push_inline_data_t inline_data;
NvU32 entry_count, i, j;
UVM_ASSERT(!uvm_mmu_use_cpu(tree));
entry_size = tree->hal->entry_size(directory->depth);
UVM_ASSERT(sizeof(pde_data) >= entry_size);
max_inline_entries = UVM_PUSH_INLINE_DATA_MAX_SIZE / entry_size;
if (uvm_push_get_and_reset_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE))
push_membar_flag = UVM_PUSH_FLAG_NEXT_MEMBAR_NONE;
else if (uvm_push_get_and_reset_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_GPU))
push_membar_flag = UVM_PUSH_FLAG_NEXT_MEMBAR_GPU;
pde_entry_addr.address += start_index * entry_size;
for (i = 0; i < pde_count;) {
// All but the first memory operation can be pipelined. We respect the
// caller's pipelining settings for the first push.
if (i != 0)
uvm_push_set_flag(push, UVM_PUSH_FLAG_CE_NEXT_PIPELINED);
entry_count = min(pde_count - i, max_inline_entries);
// No membar is needed until the last memory operation. Otherwise,
// use caller's membar flag.
if ((i + entry_count) < pde_count)
uvm_push_set_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE);
else if (push_membar_flag != UVM_PUSH_FLAG_COUNT)
uvm_push_set_flag(push, push_membar_flag);
uvm_push_inline_data_begin(push, &inline_data);
for (j = 0; j < entry_count; j++) {
tree->hal->make_pde(pde_data, phys_addr, directory, start_index + i + j);
uvm_push_inline_data_add(&inline_data, pde_data, entry_size);
}
inline_data_addr = uvm_push_inline_data_end(&inline_data);
tree->gpu->parent->ce_hal->memcopy(push, pde_entry_addr, inline_data_addr, entry_count * entry_size);
i += entry_count;
pde_entry_addr.address += entry_size * entry_count;
}
}
// pde_fill() populates pde_count PDE entries (starting at start_index) with
// the same mapping, i.e., with the same physical address (phys_addr).
// pde_fill() is optimized for pde_count == 1, which is the common case.
static void pde_fill(uvm_page_tree_t *tree,
uvm_page_directory_t *directory,
NvU32 start_index,
NvU32 pde_count,
uvm_mmu_page_table_alloc_t **phys_addr,
uvm_push_t *push)
{
UVM_ASSERT(start_index + pde_count <= uvm_mmu_page_tree_entries(tree, directory->depth, UVM_PAGE_SIZE_AGNOSTIC));
if (push)
pde_fill_gpu(tree, directory, start_index, pde_count, phys_addr, push);
else
pde_fill_cpu(tree, directory, start_index, pde_count, phys_addr);
}
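As a usage sketch mirroring the pde_write() call further down in this diff, rewriting a single entry goes through pde_fill() with pde_count == 1; the CPU memset path or the GPU inline-data copy path is chosen solely by whether a push is supplied. child_dir and entry_index here are placeholders, not names from this hunk.
uvm_mmu_page_table_alloc_t *phys_allocs[2] = {NULL, NULL};   // NULL entries stay invalid
phys_allocs[0] = &child_dir->phys_alloc;                     // child_dir: hypothetical child directory
pde_fill(tree, dir, entry_index, 1, phys_allocs, push);      // push == NULL selects the CPU path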
static void phys_mem_init(uvm_page_tree_t *tree, NvU32 page_size, uvm_page_directory_t *dir, uvm_push_t *push)
{
NvU64 clear_bits[2];
uvm_mmu_mode_hal_t *hal = tree->hal;
NvU32 entries_count = uvm_mmu_page_tree_entries(tree, dir->depth, page_size);
NvU8 max_pde_depth = tree->hal->page_table_depth(UVM_PAGE_SIZE_AGNOSTIC) - 1;
if (dir->depth == tree->hal->page_table_depth(page_size)) {
*clear_bits = 0; // Invalid PTE
}
else {
// passing in NULL for the phys_allocs will mark the child entries as invalid
uvm_mmu_page_table_alloc_t *phys_allocs[2] = {NULL, NULL};
hal->make_pde(clear_bits, phys_allocs, dir->depth);
// Passing in NULL for the phys_allocs will mark the child entries as
// invalid.
uvm_mmu_page_table_alloc_t *phys_allocs[2] = {NULL, NULL};
// Make sure that using only clear_bits[0] will work
UVM_ASSERT(hal->entry_size(dir->depth) == sizeof(clear_bits[0]) || clear_bits[0] == clear_bits[1]);
}
// Init with an invalid PTE or clean PDE. Only Maxwell PDEs can have more
// than 512 entries. In this case, we initialize them all with the same
// clean PDE. ATS systems may require clean PDEs with
// ATS_ALLOWED/ATS_NOT_ALLOWED bit settings based on the mapping VA.
// We only set clear_bits to 0 at the lowest page table level (PTE table),
// i.e., when depth is greater than the max_pde_depth.
if ((dir->depth > max_pde_depth) || (entries_count > 512 && !g_uvm_global.ats.enabled)) {
NvU64 clear_bits[2];
// initialize the memory to a reasonable value
if (push) {
tree->gpu->parent->ce_hal->memset_8(push,
uvm_mmu_gpu_address(tree->gpu, dir->phys_alloc.addr),
// If it is not a PTE, make a clean PDE.
if (dir->depth != tree->hal->page_table_depth(page_size)) {
// make_pde() child index is zero/ignored, since it is only used in
// PDEs on ATS-enabled systems where pde_fill() is preferred.
tree->hal->make_pde(clear_bits, phys_allocs, dir, 0);
// Make sure that using only clear_bits[0] will work.
UVM_ASSERT(tree->hal->entry_size(dir->depth) == sizeof(clear_bits[0]) || clear_bits[0] == clear_bits[1]);
}
else {
*clear_bits = 0;
}
// Initialize the memory to a reasonable value.
if (push) {
tree->gpu->parent->ce_hal->memset_8(push,
uvm_mmu_gpu_address(tree->gpu, dir->phys_alloc.addr),
*clear_bits,
dir->phys_alloc.size);
}
else {
uvm_mmu_page_table_cpu_memset_8(tree->gpu,
&dir->phys_alloc,
0,
*clear_bits,
dir->phys_alloc.size);
dir->phys_alloc.size / sizeof(*clear_bits));
}
}
else {
uvm_mmu_page_table_cpu_memset_8(tree->gpu,
&dir->phys_alloc,
0,
*clear_bits,
dir->phys_alloc.size / sizeof(*clear_bits));
pde_fill(tree, dir, 0, entries_count, phys_allocs, push);
}
}
static uvm_page_directory_t *allocate_directory(uvm_page_tree_t *tree,
@@ -367,8 +486,10 @@ static uvm_page_directory_t *allocate_directory(uvm_page_tree_t *tree,
NvLength phys_alloc_size = hal->allocation_size(depth, page_size);
uvm_page_directory_t *dir;
// The page tree doesn't cache PTEs so space is not allocated for entries that are always PTEs.
// 2M PTEs may later become PDEs so pass UVM_PAGE_SIZE_AGNOSTIC, not page_size.
// The page tree doesn't cache PTEs so space is not allocated for entries
// that are always PTEs.
// 2M PTEs may later become PDEs so pass UVM_PAGE_SIZE_AGNOSTIC, not
// page_size.
if (depth == hal->page_table_depth(UVM_PAGE_SIZE_AGNOSTIC))
entry_count = 0;
else
@@ -409,108 +530,6 @@ static inline NvU32 index_to_entry(uvm_mmu_mode_hal_t *hal, NvU32 entry_index, N
return hal->entries_per_index(depth) * entry_index + hal->entry_offset(depth, page_size);
}
static void pde_fill_cpu(uvm_page_tree_t *tree,
NvU32 depth,
uvm_mmu_page_table_alloc_t *directory,
NvU32 start_index,
NvU32 pde_count,
uvm_mmu_page_table_alloc_t **phys_addr)
{
NvU64 pde_data[2], entry_size;
UVM_ASSERT(uvm_mmu_use_cpu(tree));
entry_size = tree->hal->entry_size(depth);
UVM_ASSERT(sizeof(pde_data) >= entry_size);
tree->hal->make_pde(pde_data, phys_addr, depth);
if (entry_size == sizeof(pde_data[0]))
uvm_mmu_page_table_cpu_memset_8(tree->gpu, directory, start_index, pde_data[0], pde_count);
else
uvm_mmu_page_table_cpu_memset_16(tree->gpu, directory, start_index, pde_data, pde_count);
}
static void pde_fill_gpu(uvm_page_tree_t *tree,
NvU32 depth,
uvm_mmu_page_table_alloc_t *directory,
NvU32 start_index,
NvU32 pde_count,
uvm_mmu_page_table_alloc_t **phys_addr,
uvm_push_t *push)
{
NvU64 pde_data[2], entry_size;
uvm_gpu_address_t pde_entry_addr = uvm_mmu_gpu_address(tree->gpu, directory->addr);
UVM_ASSERT(!uvm_mmu_use_cpu(tree));
entry_size = tree->hal->entry_size(depth);
UVM_ASSERT(sizeof(pde_data) >= entry_size);
tree->hal->make_pde(pde_data, phys_addr, depth);
pde_entry_addr.address += start_index * entry_size;
if (entry_size == sizeof(pde_data[0])) {
tree->gpu->parent->ce_hal->memset_8(push, pde_entry_addr, pde_data[0], sizeof(pde_data[0]) * pde_count);
}
else {
NvU32 max_inline_entries = UVM_PUSH_INLINE_DATA_MAX_SIZE / sizeof(pde_data);
uvm_gpu_address_t inline_data_addr;
uvm_push_inline_data_t inline_data;
uvm_push_flag_t push_membar_flag = UVM_PUSH_FLAG_COUNT;
NvU32 i;
if (uvm_push_get_and_reset_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE))
push_membar_flag = UVM_PUSH_FLAG_NEXT_MEMBAR_NONE;
else if (uvm_push_get_and_reset_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_GPU))
push_membar_flag = UVM_PUSH_FLAG_NEXT_MEMBAR_GPU;
for (i = 0; i < pde_count;) {
NvU32 j;
NvU32 entry_count = min(pde_count - i, max_inline_entries);
uvm_push_inline_data_begin(push, &inline_data);
for (j = 0; j < entry_count; j++)
uvm_push_inline_data_add(&inline_data, pde_data, sizeof(pde_data));
inline_data_addr = uvm_push_inline_data_end(&inline_data);
// All but the first memcopy can be pipelined. We respect the
// caller's pipelining settings for the first push.
if (i != 0)
uvm_push_set_flag(push, UVM_PUSH_FLAG_CE_NEXT_PIPELINED);
// No membar is needed until the last copy. Otherwise, use
// caller's membar flag.
if (i + entry_count < pde_count)
uvm_push_set_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE);
else if (push_membar_flag != UVM_PUSH_FLAG_COUNT)
uvm_push_set_flag(push, push_membar_flag);
tree->gpu->parent->ce_hal->memcopy(push, pde_entry_addr, inline_data_addr, entry_count * sizeof(pde_data));
i += entry_count;
pde_entry_addr.address += sizeof(pde_data) * entry_count;
}
}
}
// pde_fill() populates pde_count PDE entries (starting at start_index) with
// the same mapping, i.e., with the same physical address (phys_addr).
static void pde_fill(uvm_page_tree_t *tree,
NvU32 depth,
uvm_mmu_page_table_alloc_t *directory,
NvU32 start_index,
NvU32 pde_count,
uvm_mmu_page_table_alloc_t **phys_addr,
uvm_push_t *push)
{
UVM_ASSERT(start_index + pde_count <= uvm_mmu_page_tree_entries(tree, depth, UVM_PAGE_SIZE_AGNOSTIC));
if (push)
pde_fill_gpu(tree, depth, directory, start_index, pde_count, phys_addr, push);
else
pde_fill_cpu(tree, depth, directory, start_index, pde_count, phys_addr);
}
static uvm_page_directory_t *host_pde_write(uvm_page_directory_t *dir,
uvm_page_directory_t *parent,
NvU32 index_in_parent)
@@ -540,7 +559,7 @@ static void pde_write(uvm_page_tree_t *tree,
phys_allocs[i] = &entry->phys_alloc;
}
pde_fill(tree, dir->depth, &dir->phys_alloc, entry_index, 1, phys_allocs, push);
pde_fill(tree, dir, entry_index, 1, phys_allocs, push);
}
static void host_pde_clear(uvm_page_tree_t *tree, uvm_page_directory_t *dir, NvU32 entry_index, NvU32 page_size)
@@ -800,7 +819,6 @@ static void free_unused_directories(uvm_page_tree_t *tree,
}
}
}
}
static NV_STATUS allocate_page_table(uvm_page_tree_t *tree, NvU32 page_size, uvm_mmu_page_table_alloc_t *out)
@@ -811,10 +829,93 @@ static NV_STATUS allocate_page_table(uvm_page_tree_t *tree, NvU32 page_size, uvm
return phys_mem_allocate(tree, alloc_size, tree->location, UVM_PMM_ALLOC_FLAGS_EVICT, out);
}
static bool page_tree_ats_init_required(uvm_page_tree_t *tree)
{
// We have full control of the kernel page table mappings, so no ATS
// address aliases are expected.
if (tree->type == UVM_PAGE_TREE_TYPE_KERNEL)
return false;
// Enable uvm_page_tree_init() from the page_tree test.
if (uvm_enable_builtin_tests && tree->gpu_va_space == NULL)
return false;
if (!tree->gpu_va_space->ats.enabled)
return false;
return tree->gpu->parent->no_ats_range_required;
}
static NV_STATUS page_tree_ats_init(uvm_page_tree_t *tree)
{
NV_STATUS status;
NvU64 min_va_upper, max_va_lower;
NvU32 page_size;
if (!page_tree_ats_init_required(tree))
return NV_OK;
page_size = uvm_mmu_biggest_page_size(tree);
uvm_cpu_get_unaddressable_range(&max_va_lower, &min_va_upper);
// Potential violation of the UVM internal get/put_ptes contract. get_ptes()
// creates and initializes enough PTEs to populate all PDEs covering the
// no_ats_ranges. We store the no_ats_ranges in the tree, so they can be
// put_ptes()'ed on deinit(). It doesn't preclude the range from being used by a
// future get_ptes(), since we don't write to the PTEs (range->table) from
// the tree->no_ats_ranges.
//
// Lower half
status = uvm_page_tree_get_ptes(tree,
page_size,
max_va_lower,
page_size,
UVM_PMM_ALLOC_FLAGS_EVICT,
&tree->no_ats_ranges[0]);
if (status != NV_OK)
return status;
UVM_ASSERT(tree->no_ats_ranges[0].entry_count == 1);
if (uvm_platform_uses_canonical_form_address()) {
// Upper half
status = uvm_page_tree_get_ptes(tree,
page_size,
min_va_upper - page_size,
page_size,
UVM_PMM_ALLOC_FLAGS_EVICT,
&tree->no_ats_ranges[1]);
if (status != NV_OK)
return status;
UVM_ASSERT(tree->no_ats_ranges[1].entry_count == 1);
}
return NV_OK;
}
static void page_tree_ats_deinit(uvm_page_tree_t *tree)
{
size_t i;
if (page_tree_ats_init_required(tree)) {
for (i = 0; i < ARRAY_SIZE(tree->no_ats_ranges); i++) {
if (tree->no_ats_ranges[i].entry_count)
uvm_page_tree_put_ptes(tree, &tree->no_ats_ranges[i]);
}
memset(tree->no_ats_ranges, 0, sizeof(tree->no_ats_ranges));
}
}
static void map_remap_deinit(uvm_page_tree_t *tree)
{
if (tree->map_remap.pde0.size)
phys_mem_deallocate(tree, &tree->map_remap.pde0);
if (tree->map_remap.pde0) {
phys_mem_deallocate(tree, &tree->map_remap.pde0->phys_alloc);
uvm_kvfree(tree->map_remap.pde0);
tree->map_remap.pde0 = NULL;
}
if (tree->map_remap.ptes_invalid_4k.size)
phys_mem_deallocate(tree, &tree->map_remap.ptes_invalid_4k);
@@ -839,10 +940,16 @@ static NV_STATUS map_remap_init(uvm_page_tree_t *tree)
// PDE1-depth(512M) PTE. We first map it to the pde0 directory, then we
// return the PTE for the get_ptes()'s caller.
if (tree->hal->page_sizes() & UVM_PAGE_SIZE_512M) {
status = allocate_page_table(tree, UVM_PAGE_SIZE_2M, &tree->map_remap.pde0);
if (status != NV_OK)
tree->map_remap.pde0 = allocate_directory(tree,
UVM_PAGE_SIZE_2M,
tree->hal->page_table_depth(UVM_PAGE_SIZE_2M),
UVM_PMM_ALLOC_FLAGS_EVICT);
if (tree->map_remap.pde0 == NULL) {
status = NV_ERR_NO_MEMORY;
goto error;
}
}
status = page_tree_begin_acquire(tree, &tree->tracker, &push, "map remap init");
if (status != NV_OK)
goto error;
@@ -864,22 +971,23 @@ static NV_STATUS map_remap_init(uvm_page_tree_t *tree)
uvm_mmu_page_table_alloc_t *phys_allocs[2] = {NULL, NULL};
NvU32 depth = tree->hal->page_table_depth(UVM_PAGE_SIZE_4K) - 1;
size_t index_4k = tree->hal->entry_offset(depth, UVM_PAGE_SIZE_4K);
// pde0 depth equals UVM_PAGE_SIZE_2M.
NvU32 pde0_depth = tree->hal->page_table_depth(UVM_PAGE_SIZE_2M);
NvU32 pde0_entries = tree->map_remap.pde0.size / tree->hal->entry_size(pde0_depth);
NvU32 pde0_entries = tree->map_remap.pde0->phys_alloc.size / tree->hal->entry_size(tree->map_remap.pde0->depth);
// The big-page entry is NULL which makes it an invalid entry.
phys_allocs[index_4k] = &tree->map_remap.ptes_invalid_4k;
// By default CE operations include a MEMBAR_SYS. MEMBAR_GPU is
// sufficient when pde0 is allocated in VIDMEM.
if (tree->map_remap.pde0.addr.aperture == UVM_APERTURE_VID)
if (tree->map_remap.pde0->phys_alloc.addr.aperture == UVM_APERTURE_VID)
uvm_push_set_flag(&push, UVM_PUSH_FLAG_NEXT_MEMBAR_GPU);
// This is an orphan directory: make_pde() requires a directory to
// compute the VA. The depth map_remap() operates on is not in the
// range make_pde() must operate on, so we only need to supply the
// fields used by make_pde() to avoid accessing invalid memory.
pde_fill(tree,
pde0_depth,
&tree->map_remap.pde0,
tree->map_remap.pde0,
0,
pde0_entries,
(uvm_mmu_page_table_alloc_t **)&phys_allocs,
@@ -906,11 +1014,10 @@ error:
// --------------|-------------------------||----------------|----------------
// vidmem | - || vidmem | false
// sysmem | - || sysmem | false
// default | <not set> || vidmem | true (1)
// default | <not set> || vidmem | true
// default | vidmem || vidmem | false
// default | sysmem || sysmem | false
//
// (1) When SEV mode is enabled, the fallback path is disabled.
//
// In SR-IOV heavy the page tree must be in vidmem, to prevent guest drivers
// from updating GPU page tables without hypervisor knowledge.
@@ -926,28 +1033,27 @@ error:
//
static void page_tree_set_location(uvm_page_tree_t *tree, uvm_aperture_t location)
{
bool should_location_be_vidmem;
UVM_ASSERT(tree->gpu != NULL);
UVM_ASSERT_MSG((location == UVM_APERTURE_VID) ||
(location == UVM_APERTURE_SYS) ||
(location == UVM_APERTURE_DEFAULT),
"Invalid location %s (%d)\n", uvm_aperture_string(location), (int)location);
should_location_be_vidmem = uvm_gpu_is_virt_mode_sriov_heavy(tree->gpu)
|| uvm_conf_computing_mode_enabled(tree->gpu);
// The page tree of a "fake" GPU used during page tree testing can be in
// sysmem even if should_location_be_vidmem is true. A fake GPU can be
// identified by having no channel manager.
if ((tree->gpu->channel_manager != NULL) && should_location_be_vidmem)
UVM_ASSERT(location == UVM_APERTURE_VID);
// sysmem in scenarios where a "real" GPU must be in vidmem. Fake GPUs can
// be identified by having no channel manager.
if (tree->gpu->channel_manager != NULL) {
if (uvm_gpu_is_virt_mode_sriov_heavy(tree->gpu))
UVM_ASSERT(location == UVM_APERTURE_VID);
else if (uvm_conf_computing_mode_enabled(tree->gpu))
UVM_ASSERT(location == UVM_APERTURE_VID);
}
if (location == UVM_APERTURE_DEFAULT) {
if (page_table_aperture == UVM_APERTURE_DEFAULT) {
tree->location = UVM_APERTURE_VID;
// See the comment (1) above.
tree->location_sys_fallback = !g_uvm_global.sev_enabled;
tree->location_sys_fallback = true;
}
else {
tree->location = page_table_aperture;
@@ -1008,11 +1114,22 @@ NV_STATUS uvm_page_tree_init(uvm_gpu_t *gpu,
return status;
phys_mem_init(tree, UVM_PAGE_SIZE_AGNOSTIC, tree->root, &push);
return page_tree_end_and_wait(tree, &push);
status = page_tree_end_and_wait(tree, &push);
if (status != NV_OK)
return status;
status = page_tree_ats_init(tree);
if (status != NV_OK)
return status;
return NV_OK;
}
void uvm_page_tree_deinit(uvm_page_tree_t *tree)
{
page_tree_ats_deinit(tree);
UVM_ASSERT(tree->root->ref_count == 0);
// Take the tree lock only to avoid assertions. It is not required for
@@ -1251,7 +1368,6 @@ static NV_STATUS try_get_ptes(uvm_page_tree_t *tree,
UVM_ASSERT(uvm_gpu_can_address_kernel(tree->gpu, start, size));
while (true) {
// index of the entry, for the first byte of the range, within its
// containing directory
NvU32 start_index;
@@ -1283,7 +1399,8 @@ static NV_STATUS try_get_ptes(uvm_page_tree_t *tree,
if (dir_cache[dir->depth] == NULL) {
*cur_depth = dir->depth;
// Undo the changes to the tree so that the dir cache remains private to the thread
// Undo the changes to the tree so that the dir cache
// remains private to the thread.
for (i = 0; i < used_count; i++)
host_pde_clear(tree, dirs_used[i]->host_parent, dirs_used[i]->index_in_parent, page_size);
@@ -1334,10 +1451,9 @@ static NV_STATUS map_remap(uvm_page_tree_t *tree, NvU64 start, NvLength size, uv
if (uvm_page_table_range_aperture(range) == UVM_APERTURE_VID)
uvm_push_set_flag(&push, UVM_PUSH_FLAG_NEXT_MEMBAR_GPU);
phys_alloc[0] = &tree->map_remap.pde0;
phys_alloc[0] = &tree->map_remap.pde0->phys_alloc;
pde_fill(tree,
range->table->depth,
&range->table->phys_alloc,
range->table,
range->start_index,
range->entry_count,
(uvm_mmu_page_table_alloc_t **)&phys_alloc,
@@ -1382,7 +1498,8 @@ NV_STATUS uvm_page_tree_get_ptes_async(uvm_page_tree_t *tree,
dir_cache)) == NV_ERR_MORE_PROCESSING_REQUIRED) {
uvm_mutex_unlock(&tree->lock);
// try_get_ptes never needs depth 0, so store a directory at its parent's depth
// try_get_ptes never needs depth 0, so store a directory at its
// parent's depth.
// TODO: Bug 1766655: Allocate everything below cur_depth instead of
// retrying for every level.
dir_cache[cur_depth] = allocate_directory(tree, page_size, cur_depth + 1, pmm_flags);
@@ -1665,8 +1782,12 @@ NV_STATUS uvm_page_table_range_vec_init(uvm_page_tree_t *tree,
range);
if (status != NV_OK) {
UVM_ERR_PRINT("Failed to get PTEs for subrange %zd [0x%llx, 0x%llx) size 0x%llx, part of [0x%llx, 0x%llx)\n",
i, range_start, range_start + range_size, range_size,
start, size);
i,
range_start,
range_start + range_size,
range_size,
start,
size);
goto out;
}
}

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2015-2022 NVIDIA Corporation
Copyright (c) 2015-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -215,11 +215,14 @@ struct uvm_mmu_mode_hal_struct
// memory out-of-range error so we can immediately identify bad PTE usage.
NvU64 (*poisoned_pte)(void);
// write a PDE bit-pattern to entry based on the data in entries (which may
// Write a PDE bit-pattern to entry based on the data in allocs (which may
// point to two items for dual PDEs).
// any of allocs are allowed to be NULL, in which case they are to be
// treated as empty.
void (*make_pde)(void *entry, uvm_mmu_page_table_alloc_t **allocs, NvU32 depth);
// Any of allocs are allowed to be NULL, in which case they are to be
// treated as empty. make_pde() uses dir and child_index to compute the
// mapping PDE VA. On ATS-enabled systems, we may set PDE's PCF as
// ATS_ALLOWED or ATS_NOT_ALLOWED based on the mapping PDE VA, even for
// invalid/clean PDE entries.
void (*make_pde)(void *entry, uvm_mmu_page_table_alloc_t **allocs, uvm_page_directory_t *dir, NvU32 child_index);
// size of an entry in a directory/table. Generally either 8 or 16 bytes.
// (in the case of Pascal dual PDEs)
@@ -229,7 +232,7 @@ struct uvm_mmu_mode_hal_struct
NvU32 (*entries_per_index)(NvU32 depth);
// For dual PDEs, this is either 1 or 0, depending on the page size.
// This is used to index the host copy only. GPU PDEs are always entirely
// This is used to index the host copy only. GPU PDEs are always entirely
// re-written using make_pde.
NvLength (*entry_offset)(NvU32 depth, NvU32 page_size);
@@ -295,11 +298,16 @@ struct uvm_page_tree_struct
// PDE0 where all big-page entries are invalid, and small-page entries
// point to ptes_invalid_4k.
// pde0 is only used on Pascal-Ampere, i.e., they have the same PDE
// format.
uvm_mmu_page_table_alloc_t pde0;
// pde0 is used on Pascal+ GPUs, i.e., they have the same PDE format.
uvm_page_directory_t *pde0;
} map_remap;
// On ATS-enabled systems where the CPU VA width is smaller than the GPU VA
// width, the excess address range is set with ATS_NOT_ALLOWED on all leaf
// PDEs covering that range. We have at most 2 no_ats_ranges, due to
// canonical form address systems.
uvm_page_table_range_t no_ats_ranges[2];
// Tracker for all GPU operations on the tree
uvm_tracker_t tracker;
};
@@ -365,21 +373,32 @@ void uvm_page_tree_deinit(uvm_page_tree_t *tree);
// the same page size without an intervening put_ptes. To duplicate a subset of
// an existing range or change the size of an existing range, use
// uvm_page_table_range_get_upper() and/or uvm_page_table_range_shrink().
NV_STATUS uvm_page_tree_get_ptes(uvm_page_tree_t *tree, NvU32 page_size, NvU64 start, NvLength size,
uvm_pmm_alloc_flags_t pmm_flags, uvm_page_table_range_t *range);
NV_STATUS uvm_page_tree_get_ptes(uvm_page_tree_t *tree,
NvU32 page_size,
NvU64 start,
NvLength size,
uvm_pmm_alloc_flags_t pmm_flags,
uvm_page_table_range_t *range);
// Same as uvm_page_tree_get_ptes(), but doesn't synchronize the GPU work.
//
// All pending operations can be waited on with uvm_page_tree_wait().
NV_STATUS uvm_page_tree_get_ptes_async(uvm_page_tree_t *tree, NvU32 page_size, NvU64 start, NvLength size,
uvm_pmm_alloc_flags_t pmm_flags, uvm_page_table_range_t *range);
NV_STATUS uvm_page_tree_get_ptes_async(uvm_page_tree_t *tree,
NvU32 page_size,
NvU64 start,
NvLength size,
uvm_pmm_alloc_flags_t pmm_flags,
uvm_page_table_range_t *range);
// Returns a single-entry page table range for the addresses passed.
// The size parameter must be a page size supported by this tree.
// This is equivalent to calling uvm_page_tree_get_ptes() with size equal to
// page_size.
NV_STATUS uvm_page_tree_get_entry(uvm_page_tree_t *tree, NvU32 page_size, NvU64 start,
uvm_pmm_alloc_flags_t pmm_flags, uvm_page_table_range_t *single);
NV_STATUS uvm_page_tree_get_entry(uvm_page_tree_t *tree,
NvU32 page_size,
NvU64 start,
uvm_pmm_alloc_flags_t pmm_flags,
uvm_page_table_range_t *single);
// For a single-entry page table range, write the PDE (which could be a dual
// PDE) to the GPU.
@@ -478,8 +497,8 @@ NV_STATUS uvm_page_table_range_vec_create(uvm_page_tree_t *tree,
// new_range_vec will contain the upper portion of range_vec, starting at
// new_end + 1.
//
// new_end + 1 is required to be within the address range of range_vec and be aligned to
// range_vec's page_size.
// new_end + 1 is required to be within the address range of range_vec and be
// aligned to range_vec's page_size.
//
// On failure, the original range vector is left unmodified.
NV_STATUS uvm_page_table_range_vec_split_upper(uvm_page_table_range_vec_t *range_vec,
@@ -501,18 +520,22 @@ void uvm_page_table_range_vec_destroy(uvm_page_table_range_vec_t *range_vec);
// for each offset.
// The caller_data pointer is what the caller passed in as caller_data to
// uvm_page_table_range_vec_write_ptes().
typedef NvU64 (*uvm_page_table_range_pte_maker_t)(uvm_page_table_range_vec_t *range_vec, NvU64 offset,
void *caller_data);
typedef NvU64 (*uvm_page_table_range_pte_maker_t)(uvm_page_table_range_vec_t *range_vec,
NvU64 offset,
void *caller_data);
// Write all PTEs covered by the range vector using the given PTE making function.
// Write all PTEs covered by the range vector using the given PTE making
// function.
//
// After writing all the PTEs a TLB invalidate operation is performed including
// the passed in tlb_membar.
//
// See comments about uvm_page_table_range_pte_maker_t for details about the
// PTE making callback.
NV_STATUS uvm_page_table_range_vec_write_ptes(uvm_page_table_range_vec_t *range_vec, uvm_membar_t tlb_membar,
uvm_page_table_range_pte_maker_t pte_maker, void *caller_data);
NV_STATUS uvm_page_table_range_vec_write_ptes(uvm_page_table_range_vec_t *range_vec,
uvm_membar_t tlb_membar,
uvm_page_table_range_pte_maker_t pte_maker,
void *caller_data);
// Set all PTEs covered by the range vector to an empty PTE
//
@@ -636,8 +659,9 @@ static NvU64 uvm_page_table_range_size(uvm_page_table_range_t *range)
// Get the physical address of the entry at entry_index within the range
// (counted from range->start_index).
static uvm_gpu_phys_address_t uvm_page_table_range_entry_address(uvm_page_tree_t *tree, uvm_page_table_range_t *range,
size_t entry_index)
static uvm_gpu_phys_address_t uvm_page_table_range_entry_address(uvm_page_tree_t *tree,
uvm_page_table_range_t *range,
size_t entry_index)
{
NvU32 entry_size = uvm_mmu_pte_size(tree, range->page_size);
uvm_gpu_phys_address_t entry = range->table->phys_alloc.addr;

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2015-2022 NVIDIA Corporation
Copyright (c) 2015-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -146,9 +146,15 @@ static void fake_tlb_invals_disable(void)
g_fake_tlb_invals_tracking_enabled = false;
}
// Fake TLB invalidate VA that just saves off the parameters so that they can be verified later
static void fake_tlb_invalidate_va(uvm_push_t *push, uvm_gpu_phys_address_t pdb,
NvU32 depth, NvU64 base, NvU64 size, NvU32 page_size, uvm_membar_t membar)
// Fake TLB invalidate VA that just saves off the parameters so that they can be
// verified later.
static void fake_tlb_invalidate_va(uvm_push_t *push,
uvm_gpu_phys_address_t pdb,
NvU32 depth,
NvU64 base,
NvU64 size,
NvU32 page_size,
uvm_membar_t membar)
{
if (!g_fake_tlb_invals_tracking_enabled)
return;
@@ -210,8 +216,8 @@ static bool assert_and_reset_last_invalidate(NvU32 expected_depth, bool expected
}
if ((g_last_fake_inval->membar == UVM_MEMBAR_NONE) == expected_membar) {
UVM_TEST_PRINT("Expected %s membar, got %s instead\n",
expected_membar ? "a" : "no",
uvm_membar_string(g_last_fake_inval->membar));
expected_membar ? "a" : "no",
uvm_membar_string(g_last_fake_inval->membar));
result = false;
}
@@ -230,7 +236,8 @@ static bool assert_last_invalidate_all(NvU32 expected_depth, bool expected_memba
}
if (g_last_fake_inval->base != 0 || g_last_fake_inval->size != -1) {
UVM_TEST_PRINT("Expected invalidate all but got range [0x%llx, 0x%llx) instead\n",
g_last_fake_inval->base, g_last_fake_inval->base + g_last_fake_inval->size);
g_last_fake_inval->base,
g_last_fake_inval->base + g_last_fake_inval->size);
return false;
}
if (g_last_fake_inval->depth != expected_depth) {
@@ -247,15 +254,16 @@ static bool assert_invalidate_range_specific(fake_tlb_invalidate_t *inval,
UVM_ASSERT(g_fake_tlb_invals_tracking_enabled);
if (g_fake_invals_count == 0) {
UVM_TEST_PRINT("Expected an invalidate for range [0x%llx, 0x%llx), but got none\n",
base, base + size);
UVM_TEST_PRINT("Expected an invalidate for range [0x%llx, 0x%llx), but got none\n", base, base + size);
return false;
}
if ((inval->base != base || inval->size != size) && inval->base != 0 && inval->size != -1) {
UVM_TEST_PRINT("Expected invalidate range [0x%llx, 0x%llx), but got range [0x%llx, 0x%llx) instead\n",
base, base + size,
inval->base, inval->base + inval->size);
base,
base + size,
inval->base,
inval->base + inval->size);
return false;
}
if (inval->depth != expected_depth) {
@@ -270,7 +278,13 @@ static bool assert_invalidate_range_specific(fake_tlb_invalidate_t *inval,
return true;
}
static bool assert_invalidate_range(NvU64 base, NvU64 size, NvU32 page_size, bool allow_inval_all, NvU32 range_depth, NvU32 all_depth, bool expected_membar)
static bool assert_invalidate_range(NvU64 base,
NvU64 size,
NvU32 page_size,
bool allow_inval_all,
NvU32 range_depth,
NvU32 all_depth,
bool expected_membar)
{
NvU32 i;
@@ -488,7 +502,6 @@ static NV_STATUS alloc_adjacent_pde_64k_memory(uvm_gpu_t *gpu)
return NV_OK;
}
static NV_STATUS alloc_nearby_pde_64k_memory(uvm_gpu_t *gpu)
{
uvm_page_tree_t tree;
@@ -842,6 +855,7 @@ static NV_STATUS get_two_free_apart(uvm_gpu_t *gpu)
TEST_CHECK_RET(range2.entry_count == 256);
TEST_CHECK_RET(range2.table->ref_count == 512);
TEST_CHECK_RET(range1.table == range2.table);
// 4k page is second entry in a dual PDE
TEST_CHECK_RET(range1.table == tree.root->entries[0]->entries[0]->entries[0]->entries[1]);
TEST_CHECK_RET(range1.start_index == 256);
@@ -871,6 +885,7 @@ static NV_STATUS get_overlapping_dual_pdes(uvm_gpu_t *gpu)
MEM_NV_CHECK_RET(test_page_tree_get_ptes(&tree, UVM_PAGE_SIZE_64K, size, size, &range64k), NV_OK);
TEST_CHECK_RET(range64k.entry_count == 16);
TEST_CHECK_RET(range64k.table->ref_count == 16);
// 4k page is second entry in a dual PDE
TEST_CHECK_RET(range64k.table == tree.root->entries[0]->entries[0]->entries[0]->entries[0]);
TEST_CHECK_RET(range64k.start_index == 16);
@@ -1030,10 +1045,13 @@ static NV_STATUS test_tlb_invalidates(uvm_gpu_t *gpu)
// Depth 4
NvU64 extent_pte = UVM_PAGE_SIZE_2M;
// Depth 3
NvU64 extent_pde0 = extent_pte * (1ull << 8);
// Depth 2
NvU64 extent_pde1 = extent_pde0 * (1ull << 9);
// Depth 1
NvU64 extent_pde2 = extent_pde1 * (1ull << 9);
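// For reference, the extents above evaluate to 2 MiB, 512 MiB (2 MiB * 256),
// 256 GiB (512 MiB * 512), and 128 TiB (256 GiB * 512), respectively.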
@@ -1081,7 +1099,11 @@ static NV_STATUS test_tlb_invalidates(uvm_gpu_t *gpu)
return status;
}
static NV_STATUS test_tlb_batch_invalidates_case(uvm_page_tree_t *tree, NvU64 base, NvU64 size, NvU32 min_page_size, NvU32 max_page_size)
static NV_STATUS test_tlb_batch_invalidates_case(uvm_page_tree_t *tree,
NvU64 base,
NvU64 size,
NvU32 min_page_size,
NvU32 max_page_size)
{
NV_STATUS status = NV_OK;
uvm_push_t push;
@@ -1205,7 +1227,11 @@ static bool assert_range_vec_ptes(uvm_page_table_range_vec_t *range_vec, bool ex
NvU64 expected_pte = expecting_cleared ? 0 : range_vec->size + offset;
if (*pte != expected_pte) {
UVM_TEST_PRINT("PTE is 0x%llx instead of 0x%llx for offset 0x%llx within range [0x%llx, 0x%llx)\n",
*pte, expected_pte, offset, range_vec->start, range_vec->size);
*pte,
expected_pte,
offset,
range_vec->start,
range_vec->size);
return false;
}
offset += range_vec->page_size;
@@ -1226,7 +1252,11 @@ static NV_STATUS test_range_vec_write_ptes(uvm_page_table_range_vec_t *range_vec
TEST_CHECK_RET(data.status == NV_OK);
TEST_CHECK_RET(data.count == range_vec->size / range_vec->page_size);
TEST_CHECK_RET(assert_invalidate_range_specific(g_last_fake_inval,
range_vec->start, range_vec->size, range_vec->page_size, page_table_depth, membar != UVM_MEMBAR_NONE));
range_vec->start,
range_vec->size,
range_vec->page_size,
page_table_depth,
membar != UVM_MEMBAR_NONE));
TEST_CHECK_RET(assert_range_vec_ptes(range_vec, false));
fake_tlb_invals_disable();
@@ -1249,7 +1279,11 @@ static NV_STATUS test_range_vec_clear_ptes(uvm_page_table_range_vec_t *range_vec
return NV_OK;
}
static NV_STATUS test_range_vec_create(uvm_page_tree_t *tree, NvU64 start, NvU64 size, NvU32 page_size, uvm_page_table_range_vec_t **range_vec_out)
static NV_STATUS test_range_vec_create(uvm_page_tree_t *tree,
NvU64 start,
NvU64 size,
NvU32 page_size,
uvm_page_table_range_vec_t **range_vec_out)
{
uvm_page_table_range_vec_t *range_vec;
uvm_pmm_alloc_flags_t pmm_flags = UVM_PMM_ALLOC_FLAGS_EVICT;
@@ -1544,25 +1578,28 @@ static NV_STATUS entry_test_maxwell(uvm_gpu_t *gpu)
uvm_mmu_page_table_alloc_t alloc_sys = fake_table_alloc(UVM_APERTURE_SYS, 0x9999999000LL);
uvm_mmu_page_table_alloc_t alloc_vid = fake_table_alloc(UVM_APERTURE_VID, 0x1BBBBBB000LL);
uvm_mmu_mode_hal_t *hal;
uvm_page_directory_t dir;
NvU32 i, j, big_page_size, page_size;
dir.depth = 0;
for (i = 0; i < ARRAY_SIZE(big_page_sizes); i++) {
big_page_size = big_page_sizes[i];
hal = gpu->parent->arch_hal->mmu_mode_hal(big_page_size);
memset(phys_allocs, 0, sizeof(phys_allocs));
hal->make_pde(&pde_bits, phys_allocs, 0);
hal->make_pde(&pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits == 0x0L);
phys_allocs[0] = &alloc_sys;
phys_allocs[1] = &alloc_vid;
hal->make_pde(&pde_bits, phys_allocs, 0);
hal->make_pde(&pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits == 0x1BBBBBBD99999992LL);
phys_allocs[0] = &alloc_vid;
phys_allocs[1] = &alloc_sys;
hal->make_pde(&pde_bits, phys_allocs, 0);
hal->make_pde(&pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits == 0x9999999E1BBBBBB1LL);
for (j = 0; j <= 2; j++) {
@@ -1632,38 +1669,47 @@ static NV_STATUS entry_test_pascal(uvm_gpu_t *gpu, entry_test_page_size_func ent
uvm_mmu_page_table_alloc_t *phys_allocs[2] = {NULL, NULL};
uvm_mmu_page_table_alloc_t alloc_sys = fake_table_alloc(UVM_APERTURE_SYS, 0x399999999999000LL);
uvm_mmu_page_table_alloc_t alloc_vid = fake_table_alloc(UVM_APERTURE_VID, 0x1BBBBBB000LL);
uvm_page_directory_t dir;
// big versions have [11:8] set as well to test the page table merging
uvm_mmu_page_table_alloc_t alloc_big_sys = fake_table_alloc(UVM_APERTURE_SYS, 0x399999999999900LL);
uvm_mmu_page_table_alloc_t alloc_big_vid = fake_table_alloc(UVM_APERTURE_VID, 0x1BBBBBBB00LL);
uvm_mmu_mode_hal_t *hal = gpu->parent->arch_hal->mmu_mode_hal(UVM_PAGE_SIZE_64K);
dir.index_in_parent = 0;
dir.host_parent = NULL;
dir.depth = 0;
// Make sure cleared PDEs work as expected
hal->make_pde(pde_bits, phys_allocs, 0);
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0);
memset(pde_bits, 0xFF, sizeof(pde_bits));
hal->make_pde(pde_bits, phys_allocs, 3);
dir.depth = 3;
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0 && pde_bits[1] == 0);
// Sys and vidmem PDEs
phys_allocs[0] = &alloc_sys;
hal->make_pde(pde_bits, phys_allocs, 0);
dir.depth = 0;
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0x3999999999990C);
phys_allocs[0] = &alloc_vid;
hal->make_pde(pde_bits, phys_allocs, 0);
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0x1BBBBBB0A);
// Dual PDEs
phys_allocs[0] = &alloc_big_sys;
phys_allocs[1] = &alloc_vid;
hal->make_pde(pde_bits, phys_allocs, 3);
dir.depth = 3;
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0x3999999999999C && pde_bits[1] == 0x1BBBBBB0A);
phys_allocs[0] = &alloc_big_vid;
phys_allocs[1] = &alloc_sys;
hal->make_pde(pde_bits, phys_allocs, 3);
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0x1BBBBBBBA && pde_bits[1] == 0x3999999999990C);
// uncached, i.e., the sysmem data is not cached in GPU's L2 cache. Clear
@@ -1719,6 +1765,7 @@ static NV_STATUS entry_test_volta(uvm_gpu_t *gpu, entry_test_page_size_func entr
uvm_mmu_page_table_alloc_t *phys_allocs[2] = {NULL, NULL};
uvm_mmu_page_table_alloc_t alloc_sys = fake_table_alloc(UVM_APERTURE_SYS, 0x399999999999000LL);
uvm_mmu_page_table_alloc_t alloc_vid = fake_table_alloc(UVM_APERTURE_VID, 0x1BBBBBB000LL);
uvm_page_directory_t dir;
// big versions have [11:8] set as well to test the page table merging
uvm_mmu_page_table_alloc_t alloc_big_sys = fake_table_alloc(UVM_APERTURE_SYS, 0x399999999999900LL);
@@ -1726,37 +1773,45 @@ static NV_STATUS entry_test_volta(uvm_gpu_t *gpu, entry_test_page_size_func entr
uvm_mmu_mode_hal_t *hal = gpu->parent->arch_hal->mmu_mode_hal(UVM_PAGE_SIZE_64K);
dir.index_in_parent = 0;
dir.host_parent = NULL;
dir.depth = 0;
// Make sure cleared PDEs work as expected
hal->make_pde(pde_bits, phys_allocs, 0);
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0);
memset(pde_bits, 0xFF, sizeof(pde_bits));
hal->make_pde(pde_bits, phys_allocs, 3);
dir.depth = 3;
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0 && pde_bits[1] == 0);
// Sys and vidmem PDEs
phys_allocs[0] = &alloc_sys;
hal->make_pde(pde_bits, phys_allocs, 0);
dir.depth = 0;
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0x3999999999990C);
phys_allocs[0] = &alloc_vid;
hal->make_pde(pde_bits, phys_allocs, 0);
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0x1BBBBBB0A);
// Dual PDEs
phys_allocs[0] = &alloc_big_sys;
phys_allocs[1] = &alloc_vid;
hal->make_pde(pde_bits, phys_allocs, 3);
dir.depth = 3;
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0x3999999999999C && pde_bits[1] == 0x1BBBBBB0A);
phys_allocs[0] = &alloc_big_vid;
phys_allocs[1] = &alloc_sys;
hal->make_pde(pde_bits, phys_allocs, 3);
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
TEST_CHECK_RET(pde_bits[0] == 0x1BBBBBBBA && pde_bits[1] == 0x3999999999990C);
// NO_ATS PDE1 (depth 2)
phys_allocs[0] = &alloc_vid;
hal->make_pde(pde_bits, phys_allocs, 2);
dir.depth = 2;
hal->make_pde(pde_bits, phys_allocs, &dir, 0);
if (g_uvm_global.ats.enabled)
TEST_CHECK_RET(pde_bits[0] == 0x1BBBBBB2A);
else
@@ -1791,104 +1846,203 @@ static NV_STATUS entry_test_ampere(uvm_gpu_t *gpu, entry_test_page_size_func ent
static NV_STATUS entry_test_hopper(uvm_gpu_t *gpu, entry_test_page_size_func entry_test_page_size)
{
NV_STATUS status = NV_OK;
NvU32 page_sizes[MAX_NUM_PAGE_SIZES];
NvU64 pde_bits[2];
uvm_page_directory_t *dirs[5];
size_t i, num_page_sizes;
uvm_mmu_page_table_alloc_t *phys_allocs[2] = {NULL, NULL};
uvm_mmu_page_table_alloc_t alloc_sys = fake_table_alloc(UVM_APERTURE_SYS, 0x9999999999000LL);
uvm_mmu_page_table_alloc_t alloc_vid = fake_table_alloc(UVM_APERTURE_VID, 0xBBBBBBB000LL);
// big versions have [11:8] set as well to test the page table merging
// Big versions have [11:8] set as well to test the page table merging
uvm_mmu_page_table_alloc_t alloc_big_sys = fake_table_alloc(UVM_APERTURE_SYS, 0x9999999999900LL);
uvm_mmu_page_table_alloc_t alloc_big_vid = fake_table_alloc(UVM_APERTURE_VID, 0xBBBBBBBB00LL);
uvm_mmu_mode_hal_t *hal = gpu->parent->arch_hal->mmu_mode_hal(UVM_PAGE_SIZE_64K);
// Make sure cleared PDEs work as expected
hal->make_pde(pde_bits, phys_allocs, 0);
TEST_CHECK_RET(pde_bits[0] == 0);
memset(dirs, 0, sizeof(dirs));
// Fake directory tree.
for (i = 0; i < ARRAY_SIZE(dirs); i++) {
dirs[i] = uvm_kvmalloc_zero(sizeof(uvm_page_directory_t) + sizeof(dirs[i]->entries[0]) * 512);
TEST_CHECK_GOTO(dirs[i] != NULL, cleanup);
dirs[i]->depth = i;
dirs[i]->index_in_parent = 0;
if (i == 0)
dirs[i]->host_parent = NULL;
else
dirs[i]->host_parent = dirs[i - 1];
}
// Make sure cleared PDEs work as expected.
hal->make_pde(pde_bits, phys_allocs, dirs[0], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0, cleanup);
// Cleared PDEs work as expected for big and small PDEs.
memset(pde_bits, 0xFF, sizeof(pde_bits));
hal->make_pde(pde_bits, phys_allocs, 4);
TEST_CHECK_RET(pde_bits[0] == 0 && pde_bits[1] == 0);
hal->make_pde(pde_bits, phys_allocs, dirs[4], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0 && pde_bits[1] == 0, cleanup);
// Sys and vidmem PDEs, uncached ATS allowed.
phys_allocs[0] = &alloc_sys;
hal->make_pde(pde_bits, phys_allocs, 0);
TEST_CHECK_RET(pde_bits[0] == 0x999999999900C);
hal->make_pde(pde_bits, phys_allocs, dirs[0], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999900C, cleanup);
phys_allocs[0] = &alloc_vid;
hal->make_pde(pde_bits, phys_allocs, 0);
TEST_CHECK_RET(pde_bits[0] == 0xBBBBBBB00A);
hal->make_pde(pde_bits, phys_allocs, dirs[0], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0xBBBBBBB00A, cleanup);
// Dual PDEs, uncached.
// Dual PDEs, uncached. We don't use child_dir in the depth 4 checks because
// our policy decides the PDE's PCF without using it.
phys_allocs[0] = &alloc_big_sys;
phys_allocs[1] = &alloc_vid;
hal->make_pde(pde_bits, phys_allocs, 4);
TEST_CHECK_RET(pde_bits[0] == 0x999999999991C && pde_bits[1] == 0xBBBBBBB01A);
hal->make_pde(pde_bits, phys_allocs, dirs[4], 0);
if (g_uvm_global.ats.enabled)
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999991C && pde_bits[1] == 0xBBBBBBB01A, cleanup);
else
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999990C && pde_bits[1] == 0xBBBBBBB00A, cleanup);
phys_allocs[0] = &alloc_big_vid;
phys_allocs[1] = &alloc_sys;
hal->make_pde(pde_bits, phys_allocs, 4);
TEST_CHECK_RET(pde_bits[0] == 0xBBBBBBBB1A && pde_bits[1] == 0x999999999901C);
hal->make_pde(pde_bits, phys_allocs, dirs[4], 0);
if (g_uvm_global.ats.enabled)
TEST_CHECK_GOTO(pde_bits[0] == 0xBBBBBBBB1A && pde_bits[1] == 0x999999999901C, cleanup);
else
TEST_CHECK_GOTO(pde_bits[0] == 0xBBBBBBBB0A && pde_bits[1] == 0x999999999900C, cleanup);
// We only need to test make_pde() on ATS when the CPU VA width < GPU's.
if (g_uvm_global.ats.enabled && uvm_cpu_num_va_bits() < hal->num_va_bits()) {
phys_allocs[0] = &alloc_sys;
dirs[1]->index_in_parent = 0;
hal->make_pde(pde_bits, phys_allocs, dirs[0], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999900C, cleanup);
dirs[2]->index_in_parent = 0;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999901C, cleanup);
dirs[2]->index_in_parent = 1;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 1);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999901C, cleanup);
dirs[2]->index_in_parent = 2;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 2);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999901C, cleanup);
dirs[2]->index_in_parent = 511;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 511);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999901C, cleanup);
dirs[1]->index_in_parent = 1;
hal->make_pde(pde_bits, phys_allocs, dirs[0], 1);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999900C, cleanup);
dirs[2]->index_in_parent = 0;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999901C, cleanup);
dirs[2]->index_in_parent = 509;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 509);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999901C, cleanup);
dirs[2]->index_in_parent = 510;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 510);
TEST_CHECK_GOTO(pde_bits[0] == 0x999999999901C, cleanup);
phys_allocs[0] = NULL;
dirs[1]->index_in_parent = 0;
hal->make_pde(pde_bits, phys_allocs, dirs[0], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0x0, cleanup);
dirs[2]->index_in_parent = 0;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 0);
TEST_CHECK_GOTO(pde_bits[0] == 0x0, cleanup);
dirs[2]->index_in_parent = 2;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 2);
TEST_CHECK_GOTO(pde_bits[0] == 0x10, cleanup);
dirs[1]->index_in_parent = 1;
dirs[2]->index_in_parent = 509;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 509);
TEST_CHECK_GOTO(pde_bits[0] == 0x10, cleanup);
dirs[2]->index_in_parent = 510;
hal->make_pde(pde_bits, phys_allocs, dirs[1], 510);
TEST_CHECK_GOTO(pde_bits[0] == 0x0, cleanup);
}
// uncached, i.e., the sysmem data is not cached in GPU's L2 cache, and
// access counters disabled.
TEST_CHECK_RET(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_WRITE_ATOMIC,
UVM_MMU_PTE_FLAGS_ACCESS_COUNTERS_DISABLED) == 0x999999999968D);
TEST_CHECK_GOTO(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_WRITE_ATOMIC,
UVM_MMU_PTE_FLAGS_ACCESS_COUNTERS_DISABLED) == 0x999999999968D,
cleanup);
// change to cached.
TEST_CHECK_RET(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_WRITE_ATOMIC,
UVM_MMU_PTE_FLAGS_CACHED | UVM_MMU_PTE_FLAGS_ACCESS_COUNTERS_DISABLED) ==
0x9999999999685);
TEST_CHECK_GOTO(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_WRITE_ATOMIC,
UVM_MMU_PTE_FLAGS_CACHED | UVM_MMU_PTE_FLAGS_ACCESS_COUNTERS_DISABLED) ==
0x9999999999685,
cleanup);
// enable access counters.
TEST_CHECK_RET(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_WRITE_ATOMIC,
UVM_MMU_PTE_FLAGS_CACHED) == 0x9999999999605);
TEST_CHECK_GOTO(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_WRITE_ATOMIC,
UVM_MMU_PTE_FLAGS_CACHED) == 0x9999999999605,
cleanup);
// remove atomic
TEST_CHECK_RET(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_WRITE,
UVM_MMU_PTE_FLAGS_CACHED) == 0x9999999999645);
TEST_CHECK_GOTO(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_WRITE,
UVM_MMU_PTE_FLAGS_CACHED) == 0x9999999999645,
cleanup);
// read only
TEST_CHECK_RET(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_ONLY,
UVM_MMU_PTE_FLAGS_CACHED) == 0x9999999999665);
TEST_CHECK_GOTO(hal->make_pte(UVM_APERTURE_SYS,
0x9999999999000LL,
UVM_PROT_READ_ONLY,
UVM_MMU_PTE_FLAGS_CACHED) == 0x9999999999665,
cleanup);
// local video
TEST_CHECK_RET(hal->make_pte(UVM_APERTURE_VID,
0xBBBBBBB000LL,
UVM_PROT_READ_ONLY,
UVM_MMU_PTE_FLAGS_CACHED) == 0xBBBBBBB661);
TEST_CHECK_GOTO(hal->make_pte(UVM_APERTURE_VID,
0xBBBBBBB000LL,
UVM_PROT_READ_ONLY,
UVM_MMU_PTE_FLAGS_CACHED) == 0xBBBBBBB661,
cleanup);
// peer 1
TEST_CHECK_RET(hal->make_pte(UVM_APERTURE_PEER_1,
0xBBBBBBB000LL,
UVM_PROT_READ_ONLY,
UVM_MMU_PTE_FLAGS_CACHED) == 0x200000BBBBBBB663);
TEST_CHECK_GOTO(hal->make_pte(UVM_APERTURE_PEER_1,
0xBBBBBBB000LL,
UVM_PROT_READ_ONLY,
UVM_MMU_PTE_FLAGS_CACHED) == 0x200000BBBBBBB663,
cleanup);
// sparse
TEST_CHECK_RET(hal->make_sparse_pte() == 0x8);
TEST_CHECK_GOTO(hal->make_sparse_pte() == 0x8, cleanup);
// sked reflected
TEST_CHECK_RET(hal->make_sked_reflected_pte() == 0xF09);
TEST_CHECK_GOTO(hal->make_sked_reflected_pte() == 0xF09, cleanup);
num_page_sizes = get_page_sizes(gpu, page_sizes);
for (i = 0; i < num_page_sizes; i++)
TEST_NV_CHECK_RET(entry_test_page_size(gpu, page_sizes[i]));
TEST_NV_CHECK_GOTO(entry_test_page_size(gpu, page_sizes[i]), cleanup);
return NV_OK;
cleanup:
for (i = 0; i < ARRAY_SIZE(dirs); i++)
uvm_kvfree(dirs[i]);
return status;
}
static NV_STATUS alloc_4k_maxwell(uvm_gpu_t *gpu)
@@ -2303,7 +2457,8 @@ NV_STATUS uvm_test_page_tree(UVM_TEST_PAGE_TREE_PARAMS *params, struct file *fil
gpu->parent = parent_gpu;
// At least test_tlb_invalidates() relies on global state
// (g_tlb_invalidate_*) so make sure only one test instance can run at a time.
// (g_tlb_invalidate_*) so make sure only one test instance can run at a
// time.
uvm_mutex_lock(&g_uvm_global.global_lock);
// Allocate the fake TLB tracking state. Notably tests still need to enable
@@ -2311,7 +2466,13 @@ NV_STATUS uvm_test_page_tree(UVM_TEST_PAGE_TREE_PARAMS *params, struct file *fil
// calls.
TEST_NV_CHECK_GOTO(fake_tlb_invals_alloc(), done);
TEST_NV_CHECK_GOTO(maxwell_test_page_tree(gpu), done);
// We prevent the maxwell_test_page_tree test from running on ATS-enabled
// systems. On "fake" Maxwell-based ATS systems pde_fill() may push more
// methods than what we support in UVM. Specifically, on
// uvm_page_tree_init() which eventually calls phys_mem_init(). On Maxwell,
// upper PDE levels have more than 512 entries.
if (!g_uvm_global.ats.enabled)
TEST_NV_CHECK_GOTO(maxwell_test_page_tree(gpu), done);
TEST_NV_CHECK_GOTO(pascal_test_page_tree(gpu), done);
TEST_NV_CHECK_GOTO(volta_test_page_tree(gpu), done);
TEST_NV_CHECK_GOTO(ampere_test_page_tree(gpu), done);

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2020 NVIDIA Corporation
Copyright (c) 2016-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -100,4 +100,6 @@ void uvm_hal_pascal_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->smc.supported = false;
parent_gpu->plc_supported = false;
parent_gpu->no_ats_range_required = false;
}

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2015-2020 NVIDIA Corporation
Copyright (c) 2015-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -140,11 +140,18 @@ static NvU64 small_half_pde_pascal(uvm_mmu_page_table_alloc_t *phys_alloc)
return pde_bits;
}
static void make_pde_pascal(void *entry, uvm_mmu_page_table_alloc_t **phys_allocs, NvU32 depth)
static void make_pde_pascal(void *entry,
uvm_mmu_page_table_alloc_t **phys_allocs,
uvm_page_directory_t *dir,
NvU32 child_index)
{
NvU32 entry_count = entries_per_index_pascal(depth);
NvU32 entry_count;
NvU64 *entry_bits = (NvU64 *)entry;
UVM_ASSERT(dir);
entry_count = entries_per_index_pascal(dir->depth);
if (entry_count == 1) {
*entry_bits = single_pde_pascal(*phys_allocs);
}
@@ -152,7 +159,8 @@ static void make_pde_pascal(void *entry, uvm_mmu_page_table_alloc_t **phys_alloc
entry_bits[MMU_BIG] = big_half_pde_pascal(phys_allocs[MMU_BIG]);
entry_bits[MMU_SMALL] = small_half_pde_pascal(phys_allocs[MMU_SMALL]);
// This entry applies to the whole dual PDE but is stored in the lower bits
// This entry applies to the whole dual PDE but is stored in the lower
// bits.
entry_bits[MMU_BIG] |= HWCONST64(_MMU_VER2, DUAL_PDE, IS_PDE, TRUE);
}
else {

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2023 NVIDIA Corporation
Copyright (c) 2016-2019 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -22,7 +22,10 @@
*******************************************************************************/
#include "uvm_perf_events.h"
#include "uvm_va_block.h"
#include "uvm_va_range.h"
#include "uvm_va_space.h"
#include "uvm_kvmalloc.h"
#include "uvm_test.h"
// Global variable used to check that callbacks are correctly executed
@@ -43,7 +46,10 @@ static NV_STATUS test_events(uvm_va_space_t *va_space)
NV_STATUS status;
uvm_perf_event_data_t event_data;
uvm_va_block_t block;
test_data = 0;
memset(&event_data, 0, sizeof(event_data));
// Use CPU id to avoid triggering the GPU stats update code
@@ -52,7 +58,6 @@ static NV_STATUS test_events(uvm_va_space_t *va_space)
// Register a callback for page fault
status = uvm_perf_register_event_callback(&va_space->perf_events, UVM_PERF_EVENT_FAULT, callback_inc_1);
TEST_CHECK_GOTO(status == NV_OK, done);
// Register a callback for page fault
status = uvm_perf_register_event_callback(&va_space->perf_events, UVM_PERF_EVENT_FAULT, callback_inc_2);
TEST_CHECK_GOTO(status == NV_OK, done);
@@ -60,14 +65,13 @@ static NV_STATUS test_events(uvm_va_space_t *va_space)
// va_space read lock is required for page fault event notification
uvm_va_space_down_read(va_space);
// Notify (fake) page fault. The two registered callbacks for this event
// increment the value of test_value
// Notify (fake) page fault. The two registered callbacks for this event increment the value of test_value
event_data.fault.block = &block;
uvm_perf_event_notify(&va_space->perf_events, UVM_PERF_EVENT_FAULT, &event_data);
uvm_va_space_up_read(va_space);
// test_data was initialized to zero. It should have been incremented by 1
// and 2, respectively in the callbacks
// test_data was initialized to zero. It should have been incremented by 1 and 2, respectively in the callbacks
TEST_CHECK_GOTO(test_data == 3, done);
done:
@@ -92,3 +96,4 @@ NV_STATUS uvm_test_perf_events_sanity(UVM_TEST_PERF_EVENTS_SANITY_PARAMS *params
done:
return status;
}
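// Hedged sketch of the callback pattern this test relies on. The bodies of
// callback_inc_1()/callback_inc_2() are not shown in this diff; assuming the
// standard two-argument event callback signature taken by
// uvm_perf_register_event_callback(), they presumably just bump the shared
// test_data counter, so one uvm_perf_event_notify() leaves test_data == 1 + 2 == 3.
static void callback_inc_1(uvm_perf_event_t event_id, uvm_perf_event_data_t *event_data)
{
    // Both callbacks receive the same event_data for the notified fault.
    test_data += 1;
}

static void callback_inc_2(uvm_perf_event_t event_id, uvm_perf_event_data_t *event_data)
{
    test_data += 2;
}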

View File

@@ -355,7 +355,7 @@ static NvU32 uvm_perf_prefetch_prenotify_fault_migrations(uvm_va_block_t *va_blo
uvm_page_mask_zero(prefetch_pages);
if (UVM_ID_IS_CPU(new_residency) || va_block->gpus[uvm_id_gpu_index(new_residency)] != NULL)
resident_mask = uvm_va_block_resident_mask_get(va_block, new_residency, NUMA_NO_NODE);
resident_mask = uvm_va_block_resident_mask_get(va_block, new_residency);
// If this is a first-touch fault and the destination processor is the
// preferred location, populate the whole max_prefetch_region.

View File

@@ -164,7 +164,7 @@ typedef struct
uvm_spinlock_t lock;
uvm_va_block_context_t *va_block_context;
uvm_va_block_context_t va_block_context;
// Flag used to avoid scheduling delayed unpinning operations after
// uvm_perf_thrashing_stop has been called.
@@ -601,14 +601,6 @@ static va_space_thrashing_info_t *va_space_thrashing_info_create(uvm_va_space_t
va_space_thrashing = uvm_kvmalloc_zero(sizeof(*va_space_thrashing));
if (va_space_thrashing) {
uvm_va_block_context_t *block_context = uvm_va_block_context_alloc(NULL);
if (!block_context) {
uvm_kvfree(va_space_thrashing);
return NULL;
}
va_space_thrashing->pinned_pages.va_block_context = block_context;
va_space_thrashing->va_space = va_space;
va_space_thrashing_info_init_params(va_space_thrashing);
@@ -629,7 +621,6 @@ static void va_space_thrashing_info_destroy(uvm_va_space_t *va_space)
if (va_space_thrashing) {
uvm_perf_module_type_unset_data(va_space->perf_modules_data, UVM_PERF_MODULE_TYPE_THRASHING);
uvm_va_block_context_free(va_space_thrashing->pinned_pages.va_block_context);
uvm_kvfree(va_space_thrashing);
}
}
@@ -1113,7 +1104,7 @@ static NV_STATUS unmap_remote_pinned_pages(uvm_va_block_t *va_block,
!uvm_processor_mask_test(&policy->accessed_by, processor_id));
if (uvm_processor_mask_test(&va_block->resident, processor_id)) {
const uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, processor_id, NUMA_NO_NODE);
const uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, processor_id);
if (!uvm_page_mask_andnot(&va_block_context->caller_page_mask,
&block_thrashing->pinned_pages.mask,
@@ -1321,8 +1312,9 @@ void thrashing_event_cb(uvm_perf_event_t event_id, uvm_perf_event_data_t *event_
if (block_thrashing->last_time_stamp == 0 ||
uvm_id_equal(block_thrashing->last_processor, processor_id) ||
time_stamp - block_thrashing->last_time_stamp > va_space_thrashing->params.lapse_ns)
time_stamp - block_thrashing->last_time_stamp > va_space_thrashing->params.lapse_ns) {
goto done;
}
num_block_pages = uvm_va_block_size(va_block) / PAGE_SIZE;
@@ -1811,7 +1803,7 @@ static void thrashing_unpin_pages(struct work_struct *work)
struct delayed_work *dwork = to_delayed_work(work);
va_space_thrashing_info_t *va_space_thrashing = container_of(dwork, va_space_thrashing_info_t, pinned_pages.dwork);
uvm_va_space_t *va_space = va_space_thrashing->va_space;
uvm_va_block_context_t *va_block_context = va_space_thrashing->pinned_pages.va_block_context;
uvm_va_block_context_t *va_block_context = &va_space_thrashing->pinned_pages.va_block_context;
// Take the VA space lock so that VA blocks don't go away during this
// operation.
@@ -1945,6 +1937,7 @@ void uvm_perf_thrashing_unload(uvm_va_space_t *va_space)
// Make sure that there are no pending work items
if (va_space_thrashing) {
UVM_ASSERT(va_space_thrashing->pinned_pages.in_va_space_teardown);
UVM_ASSERT(list_empty(&va_space_thrashing->pinned_pages.list));
va_space_thrashing_info_destroy(va_space);

View File

@@ -3377,47 +3377,76 @@ uvm_gpu_id_t uvm_pmm_devmem_page_to_gpu_id(struct page *page)
return gpu->id;
}
// Check there are no orphan pages. This should only be called as part of
// removing a GPU: after all work is stopped and all va_blocks have been
// destroyed. By now there should be no device-private page references left as
// there are no va_space's left on this GPU and orphan pages should be removed
// by va_space destruction or unregistration from the GPU.
static bool uvm_pmm_gpu_check_orphan_pages(uvm_pmm_gpu_t *pmm)
static void evict_orphan_pages(uvm_pmm_gpu_t *pmm, uvm_gpu_chunk_t *chunk)
{
NvU32 i;
UVM_ASSERT(chunk->state == UVM_PMM_GPU_CHUNK_STATE_IS_SPLIT);
UVM_ASSERT(chunk->suballoc);
for (i = 0; i < num_subchunks(chunk); i++) {
uvm_gpu_chunk_t *subchunk = chunk->suballoc->subchunks[i];
uvm_spin_lock(&pmm->list_lock);
if (subchunk->state == UVM_PMM_GPU_CHUNK_STATE_IS_SPLIT) {
uvm_spin_unlock(&pmm->list_lock);
evict_orphan_pages(pmm, subchunk);
continue;
}
if (subchunk->state == UVM_PMM_GPU_CHUNK_STATE_ALLOCATED && subchunk->is_referenced) {
unsigned long pfn = uvm_pmm_gpu_devmem_get_pfn(pmm, subchunk);
// TODO: Bug 3368756: add support for large GPU pages.
UVM_ASSERT(uvm_gpu_chunk_get_size(subchunk) == PAGE_SIZE);
uvm_spin_unlock(&pmm->list_lock);
// The above check for subchunk state is racy because the
// chunk may be freed after the lock is dropped. It is
// still safe to proceed in that case because the struct
// page reference will have dropped to zero and cannot
// have been re-allocated as this is only called during
// GPU teardown. Therefore migrate_device_range() will
// simply fail.
uvm_hmm_pmm_gpu_evict_pfn(pfn);
continue;
}
uvm_spin_unlock(&pmm->list_lock);
}
}
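// Note (editorial): evict_orphan_pages() above takes and drops pmm->list_lock
// once per subchunk, releasing it before recursing into split subchunks and
// before calling uvm_hmm_pmm_gpu_evict_pfn(), so no spinlock is held across
// the eviction itself.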
// Free any orphan pages.
// This should be called as part of removing a GPU: after all work is stopped
// and all va_blocks have been destroyed. There normally won't be any
// device private struct page references left but there can be cases after
// fork() where a child process still holds a reference. This function searches
// for pages that still have a reference and migrates the page to the GPU in
// order to release the reference in the CPU page table.
static void uvm_pmm_gpu_free_orphan_pages(uvm_pmm_gpu_t *pmm)
{
size_t i;
bool ret = true;
unsigned long pfn;
struct range range = pmm->devmem.pagemap.range;
if (!pmm->initialized || !uvm_hmm_is_enabled_system_wide())
return ret;
if (!pmm->initialized)
return;
// This is only safe to call during GPU teardown where chunks
// cannot be re-allocated.
UVM_ASSERT(uvm_gpu_retained_count(uvm_pmm_to_gpu(pmm)) == 0);
// Scan all the root chunks looking for subchunks which are still
// referenced.
// referenced. This is slow, but we only do this when unregistering a GPU
// and it is not critical for performance.
for (i = 0; i < pmm->root_chunks.count; i++) {
uvm_gpu_root_chunk_t *root_chunk = &pmm->root_chunks.array[i];
root_chunk_lock(pmm, root_chunk);
if (root_chunk->chunk.state == UVM_PMM_GPU_CHUNK_STATE_IS_SPLIT)
ret = false;
evict_orphan_pages(pmm, &root_chunk->chunk);
root_chunk_unlock(pmm, root_chunk);
}
for (pfn = __phys_to_pfn(range.start); pfn <= __phys_to_pfn(range.end); pfn++) {
struct page *page = pfn_to_page(pfn);
if (!is_device_private_page(page)) {
ret = false;
break;
}
if (page_count(page)) {
ret = false;
break;
}
}
return ret;
}
static void devmem_page_free(struct page *page)
@@ -3450,7 +3479,7 @@ static vm_fault_t devmem_fault(struct vm_fault *vmf)
{
uvm_va_space_t *va_space = vmf->page->zone_device_data;
if (!va_space)
if (!va_space || va_space->va_space_mm.mm != vmf->vma->vm_mm)
return VM_FAULT_SIGBUS;
return uvm_va_space_cpu_fault_hmm(va_space, vmf->vma, vmf);
@@ -3539,9 +3568,8 @@ static void devmem_deinit(uvm_pmm_gpu_t *pmm)
{
}
static bool uvm_pmm_gpu_check_orphan_pages(uvm_pmm_gpu_t *pmm)
static void uvm_pmm_gpu_free_orphan_pages(uvm_pmm_gpu_t *pmm)
{
return true;
}
#endif // UVM_IS_CONFIG_HMM()
@@ -3716,7 +3744,7 @@ void uvm_pmm_gpu_deinit(uvm_pmm_gpu_t *pmm)
gpu = uvm_pmm_to_gpu(pmm);
UVM_ASSERT(uvm_pmm_gpu_check_orphan_pages(pmm));
uvm_pmm_gpu_free_orphan_pages(pmm);
nv_kthread_q_flush(&gpu->parent->lazy_free_q);
UVM_ASSERT(list_empty(&pmm->root_chunks.va_block_lazy_free));
release_free_root_chunks(pmm);

View File

@@ -749,7 +749,6 @@ NV_STATUS uvm_cpu_chunk_map_gpu(uvm_cpu_chunk_t *chunk, uvm_gpu_t *gpu)
}
static struct page *uvm_cpu_chunk_alloc_page(uvm_chunk_size_t alloc_size,
int nid,
uvm_cpu_chunk_alloc_flags_t alloc_flags)
{
gfp_t kernel_alloc_flags;
@@ -765,27 +764,18 @@ static struct page *uvm_cpu_chunk_alloc_page(uvm_chunk_size_t alloc_size,
kernel_alloc_flags |= GFP_HIGHUSER;
// For allocation sizes higher than PAGE_SIZE, use __GFP_NORETRY in order
// to avoid higher allocation latency from the kernel compacting memory to
// satisfy the request.
// Use __GFP_NOWARN to avoid printing allocation failure to the kernel log.
// High order allocation failures are handled gracefully by the caller.
// For allocation sizes higher than PAGE_SIZE, use __GFP_NORETRY in
// order to avoid higher allocation latency from the kernel compacting
// memory to satisfy the request.
if (alloc_size > PAGE_SIZE)
kernel_alloc_flags |= __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN;
kernel_alloc_flags |= __GFP_COMP | __GFP_NORETRY;
if (alloc_flags & UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO)
kernel_alloc_flags |= __GFP_ZERO;
UVM_ASSERT(nid < num_online_nodes());
if (nid == NUMA_NO_NODE)
page = alloc_pages(kernel_alloc_flags, get_order(alloc_size));
else
page = alloc_pages_node(nid, kernel_alloc_flags, get_order(alloc_size));
if (page) {
if (alloc_flags & UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO)
SetPageDirty(page);
}
page = alloc_pages(kernel_alloc_flags, get_order(alloc_size));
if (page && (alloc_flags & UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO))
SetPageDirty(page);
return page;
}
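// Worked example of the flag composition above (an illustration; the exact
// set depends on which side of this hunk is applied): for a zeroed 2MB
// request on a 4K-page kernel, kernel_alloc_flags ends up including at least
//   GFP_HIGHUSER | __GFP_COMP | __GFP_NORETRY | __GFP_ZERO
// (plus __GFP_NOWARN in the NUMA-aware variant), and
//   get_order(2 << 20) == 9, i.e. a 512-page compound allocation.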
@@ -815,7 +805,6 @@ static uvm_cpu_physical_chunk_t *uvm_cpu_chunk_create(uvm_chunk_size_t alloc_siz
NV_STATUS uvm_cpu_chunk_alloc(uvm_chunk_size_t alloc_size,
uvm_cpu_chunk_alloc_flags_t alloc_flags,
int nid,
uvm_cpu_chunk_t **new_chunk)
{
uvm_cpu_physical_chunk_t *chunk;
@@ -823,7 +812,7 @@ NV_STATUS uvm_cpu_chunk_alloc(uvm_chunk_size_t alloc_size,
UVM_ASSERT(new_chunk);
page = uvm_cpu_chunk_alloc_page(alloc_size, nid, alloc_flags);
page = uvm_cpu_chunk_alloc_page(alloc_size, alloc_flags);
if (!page)
return NV_ERR_NO_MEMORY;
@@ -858,13 +847,6 @@ NV_STATUS uvm_cpu_chunk_alloc_hmm(struct page *page,
return NV_OK;
}
int uvm_cpu_chunk_get_numa_node(uvm_cpu_chunk_t *chunk)
{
UVM_ASSERT(chunk);
UVM_ASSERT(chunk->page);
return page_to_nid(chunk->page);
}
NV_STATUS uvm_cpu_chunk_split(uvm_cpu_chunk_t *chunk, uvm_cpu_chunk_t **new_chunks)
{
NV_STATUS status = NV_OK;

View File

@@ -304,24 +304,11 @@ uvm_chunk_sizes_mask_t uvm_cpu_chunk_get_allocation_sizes(void);
// Allocate a physical CPU chunk of the specified size.
//
// The nid argument is used to indicate a memory node preference. If the
// value is a memory node ID, the chunk allocation will be attempted on
// that memory node. If the chunk cannot be allocated on that memory node,
// it will be allocated on any memory node allowed by the process's policy.
//
// If the value of nid is a memory node ID that is not in the set of
// current process's allowed memory nodes, it will be allocated on one of the
// nodes in the allowed set.
//
// If the value of nid is NUMA_NO_NODE, the chunk will be allocated from any
// of the allowed memory nodes by the process policy.
//
// If a CPU chunk allocation succeeds, NV_OK is returned. new_chunk will be set
// to point to the newly allocated chunk. On failure, NV_ERR_NO_MEMORY is
// returned.
NV_STATUS uvm_cpu_chunk_alloc(uvm_chunk_size_t alloc_size,
uvm_cpu_chunk_alloc_flags_t flags,
int nid,
uvm_cpu_chunk_t **new_chunk);
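// Illustrative caller (a sketch, not part of this diff). It uses the
// nid-taking form documented above; for the three-parameter form on the other
// side of this hunk, drop the NUMA_NO_NODE argument.
uvm_cpu_chunk_t *chunk;
NV_STATUS status = uvm_cpu_chunk_alloc(PAGE_SIZE, UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO, NUMA_NO_NODE, &chunk);

if (status == NV_OK)
    uvm_cpu_chunk_free(chunk);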
// Allocate a HMM CPU chunk.
@@ -388,9 +375,6 @@ static uvm_cpu_logical_chunk_t *uvm_cpu_chunk_to_logical(uvm_cpu_chunk_t *chunk)
return container_of((chunk), uvm_cpu_logical_chunk_t, common);
}
// Return the NUMA node ID of the physical page backing the chunk.
int uvm_cpu_chunk_get_numa_node(uvm_cpu_chunk_t *chunk);
// Free a CPU chunk.
// This may not result in the immediate freeing of the physical pages of the
// chunk if this is a logical chunk and there are other logical chunks holding

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2017-2023 NVIDIA Corporation
Copyright (c) 2017-2019 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -664,7 +664,6 @@ done:
static NV_STATUS test_cpu_chunk_alloc(uvm_chunk_size_t size,
uvm_cpu_chunk_alloc_flags_t flags,
int nid,
uvm_cpu_chunk_t **out_chunk)
{
uvm_cpu_chunk_t *chunk;
@@ -676,7 +675,7 @@ static NV_STATUS test_cpu_chunk_alloc(uvm_chunk_size_t size,
// It is possible that the allocation fails due to lack of large pages
// rather than an API issue, which will result in a false negative.
// However, that should be very rare.
TEST_NV_CHECK_RET(uvm_cpu_chunk_alloc(size, flags, nid, &chunk));
TEST_NV_CHECK_RET(uvm_cpu_chunk_alloc(size, flags, &chunk));
// Check general state of the chunk:
// - chunk should be a physical chunk,
@@ -686,12 +685,6 @@ static NV_STATUS test_cpu_chunk_alloc(uvm_chunk_size_t size,
TEST_CHECK_GOTO(uvm_cpu_chunk_get_size(chunk) == size, done);
TEST_CHECK_GOTO(uvm_cpu_chunk_num_pages(chunk) == size / PAGE_SIZE, done);
// It is possible for the kernel to allocate a chunk on a NUMA node other
// than the one requested. However, that should not be an issue with
// sufficient memory on each NUMA node.
if (nid != NUMA_NO_NODE)
TEST_CHECK_GOTO(uvm_cpu_chunk_get_numa_node(chunk) == nid, done);
if (flags & UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO) {
NvU64 *cpu_addr;
@@ -726,7 +719,7 @@ static NV_STATUS test_cpu_chunk_mapping_basic_verify(uvm_gpu_t *gpu,
NvU64 dma_addr;
NV_STATUS status = NV_OK;
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, flags, NUMA_NO_NODE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, flags, &chunk));
phys_chunk = uvm_cpu_chunk_to_physical(chunk);
// Check state of the physical chunk:
@@ -770,27 +763,27 @@ static NV_STATUS test_cpu_chunk_mapping_basic(uvm_gpu_t *gpu, uvm_cpu_chunk_allo
return NV_OK;
}
static NV_STATUS test_cpu_chunk_mapping_array(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1, uvm_gpu_t *gpu2)
static NV_STATUS test_cpu_chunk_mapping_array(uvm_gpu_t *gpu1, uvm_gpu_t *gpu2, uvm_gpu_t *gpu3)
{
NV_STATUS status = NV_OK;
uvm_cpu_chunk_t *chunk;
uvm_cpu_physical_chunk_t *phys_chunk;
NvU64 dma_addr_gpu1;
NvU64 dma_addr_gpu2;
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(PAGE_SIZE, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, NUMA_NO_NODE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(PAGE_SIZE, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, &chunk));
phys_chunk = uvm_cpu_chunk_to_physical(chunk);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu3), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu3), done);
dma_addr_gpu2 = uvm_cpu_chunk_get_gpu_phys_addr(chunk, gpu2->parent);
uvm_cpu_chunk_unmap_gpu_phys(chunk, gpu3->parent);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu1), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu1), done);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu1), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu2), done);
dma_addr_gpu1 = uvm_cpu_chunk_get_gpu_phys_addr(chunk, gpu1->parent);
uvm_cpu_chunk_unmap_gpu_phys(chunk, gpu2->parent);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu1), done);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu0), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu0), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu1), done);
// DMA mapping addresses for different GPUs live in different IOMMU spaces,
// so it would be perfectly legal for them to have the same IOVA, and even
@@ -800,7 +793,7 @@ static NV_STATUS test_cpu_chunk_mapping_array(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1,
// GPU1. It's true that we may get a false negative if both addresses
// happened to alias and we had a bug in how the addresses are shifted in
// the dense array, but that's better than intermittent failure.
TEST_CHECK_GOTO(uvm_cpu_chunk_get_gpu_phys_addr(chunk, gpu1->parent) == dma_addr_gpu1, done);
TEST_CHECK_GOTO(uvm_cpu_chunk_get_gpu_phys_addr(chunk, gpu2->parent) == dma_addr_gpu2, done);
done:
uvm_cpu_chunk_free(chunk);
@@ -918,7 +911,7 @@ static NV_STATUS test_cpu_chunk_split_and_merge(uvm_gpu_t *gpu)
uvm_cpu_chunk_t *chunk;
NV_STATUS status;
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, NUMA_NO_NODE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, &chunk));
status = do_test_cpu_chunk_split_and_merge(chunk, gpu);
uvm_cpu_chunk_free(chunk);
@@ -1000,7 +993,7 @@ static NV_STATUS test_cpu_chunk_dirty(uvm_gpu_t *gpu)
uvm_cpu_physical_chunk_t *phys_chunk;
size_t num_pages;
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, NUMA_NO_NODE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, &chunk));
phys_chunk = uvm_cpu_chunk_to_physical(chunk);
num_pages = uvm_cpu_chunk_num_pages(chunk);
@@ -1012,7 +1005,7 @@ static NV_STATUS test_cpu_chunk_dirty(uvm_gpu_t *gpu)
uvm_cpu_chunk_free(chunk);
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO, NUMA_NO_NODE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO, &chunk));
phys_chunk = uvm_cpu_chunk_to_physical(chunk);
num_pages = uvm_cpu_chunk_num_pages(chunk);
@@ -1177,35 +1170,13 @@ NV_STATUS test_cpu_chunk_free(uvm_va_space_t *va_space, uvm_processor_mask_t *te
size_t size = uvm_chunk_find_next_size(alloc_sizes, PAGE_SIZE);
for_each_chunk_size_from(size, alloc_sizes) {
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, NUMA_NO_NODE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, &chunk));
TEST_NV_CHECK_RET(do_test_cpu_chunk_free(chunk, va_space, test_gpus));
}
return NV_OK;
}
static NV_STATUS test_cpu_chunk_numa_alloc(uvm_va_space_t *va_space)
{
uvm_cpu_chunk_t *chunk;
uvm_chunk_sizes_mask_t alloc_sizes = uvm_cpu_chunk_get_allocation_sizes();
size_t size;
for_each_chunk_size(size, alloc_sizes) {
int nid;
for_each_possible_uvm_node(nid) {
// Do not test CPU allocation on nodes that have no memory or CPU
if (!node_state(nid, N_MEMORY) || !node_state(nid, N_CPU))
continue;
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, nid, &chunk));
uvm_cpu_chunk_free(chunk);
}
}
return NV_OK;
}
NV_STATUS uvm_test_cpu_chunk_api(UVM_TEST_CPU_CHUNK_API_PARAMS *params, struct file *filp)
{
uvm_va_space_t *va_space = uvm_va_space_get(filp);
@@ -1226,7 +1197,6 @@ NV_STATUS uvm_test_cpu_chunk_api(UVM_TEST_CPU_CHUNK_API_PARAMS *params, struct f
}
TEST_NV_CHECK_GOTO(test_cpu_chunk_free(va_space, &test_gpus), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_numa_alloc(va_space), done);
if (uvm_processor_mask_get_gpu_count(&test_gpus) >= 3) {
uvm_gpu_t *gpu2, *gpu3;

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2015-2022 NVIDIA Corporation
Copyright (c) 2015-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -324,7 +324,7 @@ static NV_STATUS gpu_mem_check(uvm_gpu_t *gpu,
// TODO: Bug 3839176: [UVM][HCC][uvm_test] Update tests that assume GPU
// engines can directly access sysmem
// Skip this test for now. To enable this test under SEV,
// Skip this test for now. To enable this test in Confidential Computing,
// The GPU->CPU CE copy needs to be updated so it uses encryption when
// CC is enabled.
if (uvm_conf_computing_mode_enabled(gpu))
@@ -1068,7 +1068,7 @@ static NV_STATUS test_pmm_reverse_map_single(uvm_gpu_t *gpu, uvm_va_space_t *va_
uvm_mutex_lock(&va_block->lock);
is_resident = uvm_processor_mask_test(&va_block->resident, gpu->id) &&
uvm_page_mask_full(uvm_va_block_resident_mask_get(va_block, gpu->id, NUMA_NO_NODE));
uvm_page_mask_full(uvm_va_block_resident_mask_get(va_block, gpu->id));
if (is_resident)
phys_addr = uvm_va_block_gpu_phys_page_address(va_block, 0, gpu);
@@ -1154,7 +1154,7 @@ static NV_STATUS test_pmm_reverse_map_many_blocks(uvm_gpu_t *gpu, uvm_va_space_t
uvm_mutex_lock(&va_block->lock);
// Verify that all pages are populated on the GPU
is_resident = uvm_page_mask_region_full(uvm_va_block_resident_mask_get(va_block, gpu->id, NUMA_NO_NODE),
is_resident = uvm_page_mask_region_full(uvm_va_block_resident_mask_get(va_block, gpu->id),
reverse_mapping->region);
uvm_mutex_unlock(&va_block->lock);
@@ -1223,8 +1223,6 @@ static NV_STATUS test_indirect_peers(uvm_gpu_t *owning_gpu, uvm_gpu_t *accessing
if (!chunks)
return NV_ERR_NO_MEMORY;
UVM_ASSERT(!g_uvm_global.sev_enabled);
TEST_NV_CHECK_GOTO(uvm_mem_alloc_sysmem_and_map_cpu_kernel(UVM_CHUNK_SIZE_MAX, current->mm, &verif_mem), out);
TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_kernel(verif_mem, owning_gpu), out);
TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_kernel(verif_mem, accessing_gpu), out);

View File

@@ -176,9 +176,7 @@ static NV_STATUS preferred_location_unmap_remote_pages(uvm_va_block_t *va_block,
mapped_mask = uvm_va_block_map_mask_get(va_block, preferred_location);
if (uvm_processor_mask_test(&va_block->resident, preferred_location)) {
const uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block,
preferred_location,
NUMA_NO_NODE);
const uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, preferred_location);
if (!uvm_page_mask_andnot(&va_block_context->caller_page_mask, mapped_mask, resident_mask))
goto done;
@@ -640,7 +638,7 @@ static NV_STATUS va_block_set_read_duplication_locked(uvm_va_block_t *va_block,
for_each_id_in_mask(src_id, &va_block->resident) {
NV_STATUS status;
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, src_id, NUMA_NO_NODE);
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, src_id);
// Calling uvm_va_block_make_resident_read_duplicate will break all
// SetAccessedBy and remote mappings
@@ -697,7 +695,7 @@ static NV_STATUS va_block_unset_read_duplication_locked(uvm_va_block_t *va_block
// If preferred_location is set and has resident copies, give it preference
if (UVM_ID_IS_VALID(preferred_location) &&
uvm_processor_mask_test(&va_block->resident, preferred_location)) {
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, preferred_location, NUMA_NO_NODE);
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, preferred_location);
bool is_mask_empty = !uvm_page_mask_and(break_read_duplication_pages,
&va_block->read_duplicated_pages,
resident_mask);
@@ -725,7 +723,7 @@ static NV_STATUS va_block_unset_read_duplication_locked(uvm_va_block_t *va_block
if (uvm_id_equal(processor_id, preferred_location))
continue;
resident_mask = uvm_va_block_resident_mask_get(va_block, processor_id, NUMA_NO_NODE);
resident_mask = uvm_va_block_resident_mask_get(va_block, processor_id);
is_mask_empty = !uvm_page_mask_and(break_read_duplication_pages,
&va_block->read_duplicated_pages,
resident_mask);

View File

@@ -1,40 +0,0 @@
/*******************************************************************************
Copyright (c) 2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
*******************************************************************************/
#include "uvm_processors.h"
int uvm_find_closest_node_mask(int src, const nodemask_t *mask)
{
int nid;
int closest_nid = NUMA_NO_NODE;
if (node_isset(src, *mask))
return src;
for_each_set_bit(nid, mask->bits, MAX_NUMNODES) {
if (closest_nid == NUMA_NO_NODE || node_distance(src, nid) < node_distance(src, closest_nid))
closest_nid = nid;
}
return closest_nid;
}

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2023 NVIDIA Corporation
Copyright (c) 2016-2019 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -26,7 +26,6 @@
#include "uvm_linux.h"
#include "uvm_common.h"
#include <linux/numa.h>
#define UVM_MAX_UNIQUE_GPU_PAIRS SUM_FROM_0_TO_N(UVM_MAX_GPUS - 1)
@@ -38,11 +37,11 @@
// provide type safety, they are wrapped within the uvm_processor_id_t struct.
// The range of valid identifiers needs to cover the maximum number of
// supported GPUs on a system plus the CPU. CPU is assigned value 0, and GPUs
// range: [1, UVM_PARENT_ID_MAX_GPUS].
// range: [1, UVM_ID_MAX_GPUS].
//
// There are some functions that only expect GPU identifiers and, in order to
// make it clearer, the uvm_parent_gpu_id_t alias type is provided. However, as
// this type is just a typedef of uvm_processor_id_t, there is no type checking
// make it clearer, the uvm_gpu_id_t alias type is provided. However, as this
// type is just a typedef of uvm_processor_id_t, there is no type checking
// performed by the compiler.
//
// Identifier value vs index
@@ -61,25 +60,22 @@
// the GPU within the GPU id space (basically id - 1).
//
// In the diagram below, MAX_SUB is used to abbreviate
// UVM_PARENT_ID_MAX_SUB_PROCESSORS.
// UVM_ID_MAX_SUB_PROCESSORS.
//
// TODO: Bug 4195538: uvm_parent_processor_id_t is currently, but only
// temporarily, the same as uvm_processor_id_t.
// |-------------------------- uvm_processor_id_t ----------------------|
// | |
// | |----------------------- uvm_gpu_id_t ------------------------||
// | | ||
// Proc type | CPU | GPU ... GPU ... GPU ||
// | | ||
// ID values | 0 | 1 ... i+1 ... UVM_ID_MAX_PROCESSORS-1 ||
//
// |-------------------------- uvm_parent_processor_id_t ----------------------|
// | |
// | |----------------------- uvm_parent_gpu_id_t ------------------------||
// | | ||
// Proc type | CPU | GPU ... GPU ... GPU ||
// | | ||
// ID values | 0 | 1 ... i+1 ... UVM_PARENT_ID_MAX_PROCESSORS-1 ||
//
// GPU index 0 ... i ... UVM_PARENT_ID_MAX_GPUS-1
// GPU index 0 ... i ... UVM_ID_MAX_GPUS-1
// | | | |
// | | | |
// | |-------------| | |------------------------------------|
// | | | |
// | | | |
// | |-------------| | |-----------------------------|
// | | | |
// | | | |
// GPU index 0 ... MAX_SUB-1 ... i*MAX_SUB ... (i+1)*MAX_SUB-1 ... UVM_GLOBAL_ID_MAX_GPUS-1
//
// ID values | 0 | 1 ... MAX_SUB ... (i*MAX_SUB)+1 ... (i+1)*MAX_SUB ... UVM_GLOBAL_ID_MAX_PROCESSORS-1 ||
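// Worked example of the index arithmetic sketched above (an illustration,
// not driver code): with MAX_SUB == 8, the GPU with parent/GPU index 2 and
// sub-processor index 3 has
//   global GPU index    = 2 * 8 + 3 = 19
//   global GPU id value = 19 + UVM_GLOBAL_ID_GPU0_VALUE = 20
// which matches what uvm_global_gpu_id_from_sub_processor_index() computes
// later in this header.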
@@ -214,7 +210,7 @@ static proc_id_t prefix_fn_mask##_find_first_id(const mask_t *mask)
\
static proc_id_t prefix_fn_mask##_find_first_gpu_id(const mask_t *mask) \
{ \
return proc_id_ctor(find_next_bit(mask->bitmap, (maxval), UVM_PARENT_ID_GPU0_VALUE)); \
return proc_id_ctor(find_next_bit(mask->bitmap, (maxval), UVM_ID_GPU0_VALUE)); \
} \
\
static proc_id_t prefix_fn_mask##_find_next_id(const mask_t *mask, proc_id_t min_id) \
@@ -256,7 +252,7 @@ static NvU32 prefix_fn_mask##_get_gpu_count(const mask_t *mask)
{ \
NvU32 gpu_count = prefix_fn_mask##_get_count(mask); \
\
if (prefix_fn_mask##_test(mask, proc_id_ctor(UVM_PARENT_ID_CPU_VALUE))) \
if (prefix_fn_mask##_test(mask, proc_id_ctor(UVM_ID_CPU_VALUE))) \
--gpu_count; \
\
return gpu_count; \
@@ -265,55 +261,55 @@ static NvU32 prefix_fn_mask##_get_gpu_count(const mask_t *mask)
typedef struct
{
NvU32 val;
} uvm_parent_processor_id_t;
} uvm_processor_id_t;
typedef struct
{
NvU32 val;
} uvm_global_processor_id_t;
typedef uvm_parent_processor_id_t uvm_parent_gpu_id_t;
typedef uvm_processor_id_t uvm_gpu_id_t;
typedef uvm_global_processor_id_t uvm_global_gpu_id_t;
// Static value assigned to the CPU
#define UVM_PARENT_ID_CPU_VALUE 0
#define UVM_PARENT_ID_GPU0_VALUE (UVM_PARENT_ID_CPU_VALUE + 1)
#define UVM_ID_CPU_VALUE 0
#define UVM_ID_GPU0_VALUE (UVM_ID_CPU_VALUE + 1)
// ID values for the CPU and first GPU, respectively; the values for both types
// of IDs must match to enable sharing of UVM_PROCESSOR_MASK().
#define UVM_GLOBAL_ID_CPU_VALUE UVM_PARENT_ID_CPU_VALUE
#define UVM_GLOBAL_ID_GPU0_VALUE UVM_PARENT_ID_GPU0_VALUE
#define UVM_GLOBAL_ID_CPU_VALUE UVM_ID_CPU_VALUE
#define UVM_GLOBAL_ID_GPU0_VALUE UVM_ID_GPU0_VALUE
// Maximum number of GPUs/processors that can be represented with the id types
#define UVM_PARENT_ID_MAX_GPUS UVM_MAX_GPUS
#define UVM_PARENT_ID_MAX_PROCESSORS UVM_MAX_PROCESSORS
#define UVM_ID_MAX_GPUS UVM_MAX_GPUS
#define UVM_ID_MAX_PROCESSORS UVM_MAX_PROCESSORS
#define UVM_PARENT_ID_MAX_SUB_PROCESSORS 8
#define UVM_ID_MAX_SUB_PROCESSORS 8
#define UVM_GLOBAL_ID_MAX_GPUS (UVM_PARENT_ID_MAX_GPUS * UVM_PARENT_ID_MAX_SUB_PROCESSORS)
#define UVM_GLOBAL_ID_MAX_GPUS (UVM_MAX_GPUS * UVM_ID_MAX_SUB_PROCESSORS)
#define UVM_GLOBAL_ID_MAX_PROCESSORS (UVM_GLOBAL_ID_MAX_GPUS + 1)
#define UVM_PARENT_ID_CPU ((uvm_parent_processor_id_t) { .val = UVM_PARENT_ID_CPU_VALUE })
#define UVM_PARENT_ID_INVALID ((uvm_parent_processor_id_t) { .val = UVM_PARENT_ID_MAX_PROCESSORS })
#define UVM_ID_CPU ((uvm_processor_id_t) { .val = UVM_ID_CPU_VALUE })
#define UVM_ID_INVALID ((uvm_processor_id_t) { .val = UVM_ID_MAX_PROCESSORS })
#define UVM_GLOBAL_ID_CPU ((uvm_global_processor_id_t) { .val = UVM_GLOBAL_ID_CPU_VALUE })
#define UVM_GLOBAL_ID_INVALID ((uvm_global_processor_id_t) { .val = UVM_GLOBAL_ID_MAX_PROCESSORS })
#define UVM_PARENT_ID_CHECK_BOUNDS(id) UVM_ASSERT_MSG(id.val <= UVM_PARENT_ID_MAX_PROCESSORS, "id %u\n", id.val)
#define UVM_ID_CHECK_BOUNDS(id) UVM_ASSERT_MSG(id.val <= UVM_ID_MAX_PROCESSORS, "id %u\n", id.val)
#define UVM_GLOBAL_ID_CHECK_BOUNDS(id) UVM_ASSERT_MSG(id.val <= UVM_GLOBAL_ID_MAX_PROCESSORS, "id %u\n", id.val)
static int uvm_parent_id_cmp(uvm_parent_processor_id_t id1, uvm_parent_processor_id_t id2)
static int uvm_id_cmp(uvm_processor_id_t id1, uvm_processor_id_t id2)
{
UVM_PARENT_ID_CHECK_BOUNDS(id1);
UVM_PARENT_ID_CHECK_BOUNDS(id2);
UVM_ID_CHECK_BOUNDS(id1);
UVM_ID_CHECK_BOUNDS(id2);
return UVM_CMP_DEFAULT(id1.val, id2.val);
}
static bool uvm_parent_id_equal(uvm_parent_processor_id_t id1, uvm_parent_processor_id_t id2)
static bool uvm_id_equal(uvm_processor_id_t id1, uvm_processor_id_t id2)
{
UVM_PARENT_ID_CHECK_BOUNDS(id1);
UVM_PARENT_ID_CHECK_BOUNDS(id2);
UVM_ID_CHECK_BOUNDS(id1);
UVM_ID_CHECK_BOUNDS(id2);
return id1.val == id2.val;
}
@@ -334,30 +330,30 @@ static bool uvm_global_id_equal(uvm_global_processor_id_t id1, uvm_global_proces
return id1.val == id2.val;
}
#define UVM_PARENT_ID_IS_CPU(id) uvm_parent_id_equal(id, UVM_PARENT_ID_CPU)
#define UVM_PARENT_ID_IS_INVALID(id) uvm_parent_id_equal(id, UVM_PARENT_ID_INVALID)
#define UVM_PARENT_ID_IS_VALID(id) (!UVM_PARENT_ID_IS_INVALID(id))
#define UVM_PARENT_ID_IS_GPU(id) (!UVM_PARENT_ID_IS_CPU(id) && !UVM_PARENT_ID_IS_INVALID(id))
#define UVM_ID_IS_CPU(id) uvm_id_equal(id, UVM_ID_CPU)
#define UVM_ID_IS_INVALID(id) uvm_id_equal(id, UVM_ID_INVALID)
#define UVM_ID_IS_VALID(id) (!UVM_ID_IS_INVALID(id))
#define UVM_ID_IS_GPU(id) (!UVM_ID_IS_CPU(id) && !UVM_ID_IS_INVALID(id))
#define UVM_GLOBAL_ID_IS_CPU(id) uvm_global_id_equal(id, UVM_GLOBAL_ID_CPU)
#define UVM_GLOBAL_ID_IS_INVALID(id) uvm_global_id_equal(id, UVM_GLOBAL_ID_INVALID)
#define UVM_GLOBAL_ID_IS_VALID(id) (!UVM_GLOBAL_ID_IS_INVALID(id))
#define UVM_GLOBAL_ID_IS_GPU(id) (!UVM_GLOBAL_ID_IS_CPU(id) && !UVM_GLOBAL_ID_IS_INVALID(id))
static uvm_parent_processor_id_t uvm_parent_id_from_value(NvU32 val)
static uvm_processor_id_t uvm_id_from_value(NvU32 val)
{
uvm_parent_processor_id_t ret = { .val = val };
uvm_processor_id_t ret = { .val = val };
UVM_PARENT_ID_CHECK_BOUNDS(ret);
UVM_ID_CHECK_BOUNDS(ret);
return ret;
}
static uvm_parent_gpu_id_t uvm_parent_gpu_id_from_value(NvU32 val)
static uvm_gpu_id_t uvm_gpu_id_from_value(NvU32 val)
{
uvm_parent_gpu_id_t ret = uvm_parent_id_from_value(val);
uvm_gpu_id_t ret = uvm_id_from_value(val);
UVM_ASSERT(!UVM_PARENT_ID_IS_CPU(ret));
UVM_ASSERT(!UVM_ID_IS_CPU(ret));
return ret;
}
@@ -380,34 +376,34 @@ static uvm_global_gpu_id_t uvm_global_gpu_id_from_value(NvU32 val)
return ret;
}
// Create a parent GPU id from the given parent GPU id index (previously
// obtained via uvm_parent_id_gpu_index)
static uvm_parent_gpu_id_t uvm_parent_gpu_id_from_index(NvU32 index)
// Create a GPU id from the given GPU id index (previously obtained via
// uvm_id_gpu_index)
static uvm_gpu_id_t uvm_gpu_id_from_index(NvU32 index)
{
return uvm_parent_gpu_id_from_value(index + UVM_PARENT_ID_GPU0_VALUE);
return uvm_gpu_id_from_value(index + UVM_ID_GPU0_VALUE);
}
static uvm_parent_processor_id_t uvm_parent_id_next(uvm_parent_processor_id_t id)
static uvm_processor_id_t uvm_id_next(uvm_processor_id_t id)
{
++id.val;
UVM_PARENT_ID_CHECK_BOUNDS(id);
UVM_ID_CHECK_BOUNDS(id);
return id;
}
static uvm_parent_gpu_id_t uvm_parent_gpu_id_next(uvm_parent_gpu_id_t id)
static uvm_gpu_id_t uvm_gpu_id_next(uvm_gpu_id_t id)
{
UVM_ASSERT(UVM_PARENT_ID_IS_GPU(id));
UVM_ASSERT(UVM_ID_IS_GPU(id));
++id.val;
UVM_PARENT_ID_CHECK_BOUNDS(id);
UVM_ID_CHECK_BOUNDS(id);
return id;
}
// Same as uvm_parent_gpu_id_from_index but for uvm_global_processor_id_t
// Same as uvm_gpu_id_from_index but for uvm_global_processor_id_t
static uvm_global_gpu_id_t uvm_global_gpu_id_from_index(NvU32 index)
{
return uvm_global_gpu_id_from_value(index + UVM_GLOBAL_ID_GPU0_VALUE);
@@ -433,11 +429,11 @@ static uvm_global_gpu_id_t uvm_global_gpu_id_next(uvm_global_gpu_id_t id)
return id;
}
// This function returns the numerical value within
// [0, UVM_PARENT_ID_MAX_PROCESSORS) of the given parent processor id.
static NvU32 uvm_parent_id_value(uvm_parent_processor_id_t id)
// This function returns the numerical value within [0, UVM_ID_MAX_PROCESSORS)
// of the given processor id
static NvU32 uvm_id_value(uvm_processor_id_t id)
{
UVM_ASSERT(UVM_PARENT_ID_IS_VALID(id));
UVM_ASSERT(UVM_ID_IS_VALID(id));
return id.val;
}
@@ -452,12 +448,12 @@ static NvU32 uvm_global_id_value(uvm_global_processor_id_t id)
}
// This function returns the index of the given GPU id within the GPU id space
// [0, UVM_PARENT_ID_MAX_GPUS)
static NvU32 uvm_parent_id_gpu_index(uvm_parent_gpu_id_t id)
// [0, UVM_ID_MAX_GPUS)
static NvU32 uvm_id_gpu_index(uvm_gpu_id_t id)
{
UVM_ASSERT(UVM_PARENT_ID_IS_GPU(id));
UVM_ASSERT(UVM_ID_IS_GPU(id));
return id.val - UVM_PARENT_ID_GPU0_VALUE;
return id.val - UVM_ID_GPU0_VALUE;
}
// This function returns the index of the given GPU id within the GPU id space
@@ -469,61 +465,61 @@ static NvU32 uvm_global_id_gpu_index(const uvm_global_gpu_id_t id)
return id.val - UVM_GLOBAL_ID_GPU0_VALUE;
}
static NvU32 uvm_global_id_gpu_index_from_parent_gpu_id(const uvm_parent_gpu_id_t id)
static NvU32 uvm_global_id_gpu_index_from_gpu_id(const uvm_gpu_id_t id)
{
UVM_ASSERT(UVM_PARENT_ID_IS_GPU(id));
UVM_ASSERT(UVM_ID_IS_GPU(id));
return uvm_parent_id_gpu_index(id) * UVM_PARENT_ID_MAX_SUB_PROCESSORS;
return uvm_id_gpu_index(id) * UVM_ID_MAX_SUB_PROCESSORS;
}
static NvU32 uvm_parent_id_gpu_index_from_global_gpu_id(const uvm_global_gpu_id_t id)
static NvU32 uvm_id_gpu_index_from_global_gpu_id(const uvm_global_gpu_id_t id)
{
UVM_ASSERT(UVM_GLOBAL_ID_IS_GPU(id));
return uvm_global_id_gpu_index(id) / UVM_PARENT_ID_MAX_SUB_PROCESSORS;
return uvm_global_id_gpu_index(id) / UVM_ID_MAX_SUB_PROCESSORS;
}
static uvm_global_gpu_id_t uvm_global_gpu_id_from_parent_gpu_id(const uvm_parent_gpu_id_t id)
static uvm_global_gpu_id_t uvm_global_gpu_id_from_gpu_id(const uvm_gpu_id_t id)
{
UVM_ASSERT(UVM_PARENT_ID_IS_GPU(id));
UVM_ASSERT(UVM_ID_IS_GPU(id));
return uvm_global_gpu_id_from_index(uvm_global_id_gpu_index_from_parent_gpu_id(id));
return uvm_global_gpu_id_from_index(uvm_global_id_gpu_index_from_gpu_id(id));
}
static uvm_global_gpu_id_t uvm_global_gpu_id_from_parent_index(NvU32 index)
{
UVM_ASSERT(index < UVM_PARENT_ID_MAX_GPUS);
UVM_ASSERT(index < UVM_MAX_GPUS);
return uvm_global_gpu_id_from_parent_gpu_id(uvm_parent_gpu_id_from_value(index + UVM_GLOBAL_ID_GPU0_VALUE));
return uvm_global_gpu_id_from_gpu_id(uvm_gpu_id_from_value(index + UVM_GLOBAL_ID_GPU0_VALUE));
}
static uvm_global_gpu_id_t uvm_global_gpu_id_from_sub_processor_index(const uvm_parent_gpu_id_t id, NvU32 sub_index)
static uvm_global_gpu_id_t uvm_global_gpu_id_from_sub_processor_index(const uvm_gpu_id_t id, NvU32 sub_index)
{
NvU32 index;
UVM_ASSERT(sub_index < UVM_PARENT_ID_MAX_SUB_PROCESSORS);
UVM_ASSERT(sub_index < UVM_ID_MAX_SUB_PROCESSORS);
index = uvm_global_id_gpu_index_from_parent_gpu_id(id) + sub_index;
index = uvm_global_id_gpu_index_from_gpu_id(id) + sub_index;
return uvm_global_gpu_id_from_index(index);
}
static uvm_parent_gpu_id_t uvm_parent_gpu_id_from_global_gpu_id(const uvm_global_gpu_id_t id)
static uvm_gpu_id_t uvm_gpu_id_from_global_gpu_id(const uvm_global_gpu_id_t id)
{
UVM_ASSERT(UVM_GLOBAL_ID_IS_GPU(id));
return uvm_parent_gpu_id_from_index(uvm_parent_id_gpu_index_from_global_gpu_id(id));
return uvm_gpu_id_from_index(uvm_id_gpu_index_from_global_gpu_id(id));
}
static NvU32 uvm_global_id_sub_processor_index(const uvm_global_gpu_id_t id)
{
return uvm_global_id_gpu_index(id) % UVM_PARENT_ID_MAX_SUB_PROCESSORS;
return uvm_global_id_gpu_index(id) % UVM_ID_MAX_SUB_PROCESSORS;
}
UVM_PROCESSOR_MASK(uvm_processor_mask_t, \
uvm_processor_mask, \
UVM_PARENT_ID_MAX_PROCESSORS, \
uvm_parent_processor_id_t, \
uvm_parent_id_from_value)
UVM_ID_MAX_PROCESSORS, \
uvm_processor_id_t, \
uvm_id_from_value)
UVM_PROCESSOR_MASK(uvm_global_processor_mask_t, \
uvm_global_processor_mask, \
@@ -537,19 +533,19 @@ static bool uvm_processor_mask_gpu_subset(const uvm_processor_mask_t *subset, co
{
uvm_processor_mask_t subset_gpus;
uvm_processor_mask_copy(&subset_gpus, subset);
uvm_processor_mask_clear(&subset_gpus, UVM_PARENT_ID_CPU);
uvm_processor_mask_clear(&subset_gpus, UVM_ID_CPU);
return uvm_processor_mask_subset(&subset_gpus, mask);
}
#define for_each_id_in_mask(id, mask) \
for ((id) = uvm_processor_mask_find_first_id(mask); \
UVM_PARENT_ID_IS_VALID(id); \
(id) = uvm_processor_mask_find_next_id((mask), uvm_parent_id_next(id)))
UVM_ID_IS_VALID(id); \
(id) = uvm_processor_mask_find_next_id((mask), uvm_id_next(id)))
#define for_each_gpu_id_in_mask(gpu_id, mask) \
for ((gpu_id) = uvm_processor_mask_find_first_gpu_id((mask)); \
UVM_PARENT_ID_IS_VALID(gpu_id); \
(gpu_id) = uvm_processor_mask_find_next_id((mask), uvm_parent_gpu_id_next(gpu_id)))
UVM_ID_IS_VALID(gpu_id); \
(gpu_id) = uvm_processor_mask_find_next_id((mask), uvm_gpu_id_next(gpu_id)))
#define for_each_global_id_in_mask(id, mask) \
for ((id) = uvm_global_processor_mask_find_first_id(mask); \
@@ -563,36 +559,21 @@ static bool uvm_processor_mask_gpu_subset(const uvm_processor_mask_t *subset, co
// Helper to iterate over all valid gpu ids
#define for_each_gpu_id(i) \
for (i = uvm_parent_gpu_id_from_value(UVM_PARENT_ID_GPU0_VALUE); UVM_PARENT_ID_IS_VALID(i); i = uvm_parent_gpu_id_next(i))
for (i = uvm_gpu_id_from_value(UVM_ID_GPU0_VALUE); UVM_ID_IS_VALID(i); i = uvm_gpu_id_next(i))
#define for_each_global_gpu_id(i) \
for (i = uvm_global_gpu_id_from_value(UVM_GLOBAL_ID_GPU0_VALUE); UVM_GLOBAL_ID_IS_VALID(i); i = uvm_global_gpu_id_next(i))
#define for_each_global_sub_processor_id_in_gpu(id, i) \
for (i = uvm_global_gpu_id_from_parent_gpu_id(id); \
for (i = uvm_global_gpu_id_from_gpu_id(id); \
UVM_GLOBAL_ID_IS_VALID(i) && \
(uvm_global_id_value(i) < uvm_global_id_value(uvm_global_gpu_id_from_parent_gpu_id(id)) + UVM_PARENT_ID_MAX_SUB_PROCESSORS); \
(uvm_global_id_value(i) < uvm_global_id_value(uvm_global_gpu_id_from_gpu_id(id)) + UVM_ID_MAX_SUB_PROCESSORS); \
i = uvm_global_gpu_id_next(i))
// Helper to iterate over all valid processor ids
#define for_each_processor_id(i) for (i = UVM_PARENT_ID_CPU; UVM_PARENT_ID_IS_VALID(i); i = uvm_parent_id_next(i))
#define for_each_processor_id(i) for (i = UVM_ID_CPU; UVM_ID_IS_VALID(i); i = uvm_id_next(i))
#define for_each_global_id(i) for (i = UVM_GLOBAL_ID_CPU; UVM_GLOBAL_ID_IS_VALID(i); i = uvm_global_id_next(i))
// Find the node in mask with the shortest distance (as returned by
// node_distance) from src.
// Note that the search is inclusive of src.
// If mask has no bits set, NUMA_NO_NODE is returned.
int uvm_find_closest_node_mask(int src, const nodemask_t *mask);
// Iterate over all nodes in mask with increasing distance from src.
// Note that this iterator is destructive of the mask.
#define for_each_closest_uvm_node(nid, src, mask) \
for ((nid) = uvm_find_closest_node_mask((src), &(mask)); \
(nid) != NUMA_NO_NODE; \
node_clear((nid), (mask)), (nid) = uvm_find_closest_node_mask((src), &(mask)))
#define for_each_possible_uvm_node(nid) for_each_node_mask((nid), node_possible_map)
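// Minimal usage sketch of the iterator above (hypothetical caller and
// function name): because the macro clears each visited node from the mask,
// iterate over a local copy rather than a mask you still need.
static void visit_nodes_by_distance_example(void)
{
    nodemask_t candidates = node_states[N_MEMORY];
    int nid;

    for_each_closest_uvm_node(nid, numa_node_id(), candidates) {
        // nid visits the nodes in candidates in order of increasing
        // node_distance() from the local node, starting with the local
        // node itself if it is set in the mask.
    }
}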
static bool uvm_processor_uuid_eq(const NvProcessorUuid *uuid1, const NvProcessorUuid *uuid2)
{
return memcmp(uuid1, uuid2, sizeof(*uuid1)) == 0;
@@ -604,78 +585,4 @@ static void uvm_processor_uuid_copy(NvProcessorUuid *dst, const NvProcessorUuid
memcpy(dst, src, sizeof(*dst));
}
// TODO: Bug 4195538: [uvm][multi-SMC] Get UVM internal data structures ready to
// meet multi-SMC requirements. Temporary aliases, they must be removed once
// the data structures are converted.
typedef uvm_parent_processor_id_t uvm_processor_id_t;
typedef uvm_parent_gpu_id_t uvm_gpu_id_t;
#define UVM_ID_CPU_VALUE UVM_PARENT_ID_CPU_VALUE
#define UVM_ID_GPU0_VALUE UVM_PARENT_ID_GPU0_VALUE
#define UVM_ID_MAX_GPUS UVM_PARENT_ID_MAX_GPUS
#define UVM_ID_MAX_PROCESSORS UVM_PARENT_ID_MAX_PROCESSORS
#define UVM_ID_MAX_SUB_PROCESSORS UVM_PARENT_ID_MAX_SUB_PROCESSORS
#define UVM_ID_CPU UVM_PARENT_ID_CPU
#define UVM_ID_INVALID UVM_PARENT_ID_INVALID
static int uvm_id_cmp(uvm_parent_processor_id_t id1, uvm_parent_processor_id_t id2)
{
return UVM_CMP_DEFAULT(id1.val, id2.val);
}
static bool uvm_id_equal(uvm_parent_processor_id_t id1, uvm_parent_processor_id_t id2)
{
return uvm_parent_id_equal(id1, id2);
}
#define UVM_ID_IS_CPU(id) uvm_id_equal(id, UVM_ID_CPU)
#define UVM_ID_IS_INVALID(id) uvm_id_equal(id, UVM_ID_INVALID)
#define UVM_ID_IS_VALID(id) (!UVM_ID_IS_INVALID(id))
#define UVM_ID_IS_GPU(id) (!UVM_ID_IS_CPU(id) && !UVM_ID_IS_INVALID(id))
static uvm_parent_gpu_id_t uvm_gpu_id_from_value(NvU32 val)
{
return uvm_parent_gpu_id_from_value(val);
}
static NvU32 uvm_id_value(uvm_parent_processor_id_t id)
{
return uvm_parent_id_value(id);
}
static NvU32 uvm_id_gpu_index(uvm_parent_gpu_id_t id)
{
return uvm_parent_id_gpu_index(id);
}
static NvU32 uvm_id_gpu_index_from_global_gpu_id(const uvm_global_gpu_id_t id)
{
return uvm_parent_id_gpu_index_from_global_gpu_id(id);
}
static uvm_parent_gpu_id_t uvm_gpu_id_from_index(NvU32 index)
{
return uvm_parent_gpu_id_from_index(index);
}
static uvm_parent_gpu_id_t uvm_gpu_id_next(uvm_parent_gpu_id_t id)
{
return uvm_parent_gpu_id_next(id);
}
static uvm_parent_gpu_id_t uvm_gpu_id_from_global_gpu_id(const uvm_global_gpu_id_t id)
{
return uvm_parent_gpu_id_from_global_gpu_id(id);
}
static NvU32 uvm_global_id_gpu_index_from_gpu_id(const uvm_parent_gpu_id_t id)
{
return uvm_global_id_gpu_index_from_parent_gpu_id(id);
}
static uvm_global_gpu_id_t uvm_global_gpu_id_from_gpu_id(const uvm_parent_gpu_id_t id)
{
return uvm_global_gpu_id_from_parent_gpu_id(id);
}
#endif

Some files were not shown because too many files have changed in this diff.