Mirror of https://github.com/NVIDIA/open-gpu-kernel-modules.git, synced 2026-01-28 03:59:48 +00:00
Compare commits
5 Commits
| Author | SHA1 | Date |
|---|---|---|
| | c4ea803a64 | |
| | b5bf85a8e3 | |
| | f59818b751 | |
| | a8e01be6b2 | |
| | 12c0739352 | |
CHANGELOG.md (22 changed lines)
@@ -1,7 +1,29 @@
# Changelog

## Release 545 Entries

### [545.23.08] 2023-11-17

#### Fixed

- Fix always-false conditional, [#493](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/493) by @meme8383

#### Added

- Added beta-quality support for GeForce and Workstation GPUs. Please see the "Open Linux Kernel Modules" chapter in the NVIDIA GPU driver end user README for details.

## Release 535 Entries

### [535.113.01] 2023-09-21

#### Fixed

- Fixed building main against current centos stream 8 fails, [#550](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/550) by @airlied

### [535.104.05] 2023-08-22

### [535.98] 2023-08-08

### [535.86.10] 2023-07-31

### [535.86.05] 2023-07-18
README.md (44 changed lines)
@@ -1,7 +1,7 @@
# NVIDIA Linux Open GPU Kernel Module Source

This is the source release of the NVIDIA Linux open GPU kernel modules,
version 535.86.10.
version 545.23.08.

## How to Build

@@ -17,7 +17,7 @@ as root:

Note that the kernel modules built here must be used with GSP
firmware and user-space NVIDIA GPU driver components from a corresponding
535.86.10 driver release. This can be achieved by installing
545.23.08 driver release. This can be achieved by installing
the NVIDIA GPU driver from the .run file using the `--no-kernel-modules`
option. E.g.,

@@ -179,16 +179,16 @@ software applications.

## Compatible GPUs

The open-gpu-kernel-modules can be used on any Turing or later GPU
(see the table below). However, in the 535.86.10 release,
GeForce and Workstation support is still considered alpha-quality.
The NVIDIA open kernel modules can be used on any Turing or later GPU
(see the table below). However, in the __DRIVER_VERION__ release, GeForce and
Workstation support is considered to be Beta quality. The open kernel modules
are suitable for broad usage, and NVIDIA requests feedback on any issues
encountered specific to them.

To enable use of the open kernel modules on GeForce and Workstation GPUs,
set the "NVreg_OpenRmEnableUnsupportedGpus" nvidia.ko kernel module
parameter to 1. For more details, see the NVIDIA GPU driver end user
README here:
For details on feature support and limitations, see the NVIDIA GPU driver
end user README here:

https://us.download.nvidia.com/XFree86/Linux-x86_64/535.86.10/README/kernel_open.html
https://us.download.nvidia.com/XFree86/Linux-x86_64/545.23.08/README/kernel_open.html

In the below table, if three IDs are listed, the first is the PCI Device
ID, the second is the PCI Subsystem Vendor ID, and the third is the PCI
@@ -665,6 +665,7 @@ Subsystem Device ID.
|
||||
| NVIDIA PG506-232 | 20B6 10DE 1492 |
|
||||
| NVIDIA A30 | 20B7 10DE 1532 |
|
||||
| NVIDIA A30 | 20B7 10DE 1804 |
|
||||
| NVIDIA A800-SXM4-40GB | 20BD 10DE 17F4 |
|
||||
| NVIDIA A100-PCIE-40GB | 20F1 10DE 145F |
|
||||
| NVIDIA A800-SXM4-80GB | 20F3 10DE 179B |
|
||||
| NVIDIA A800-SXM4-80GB | 20F3 10DE 179C |
|
||||
@@ -676,6 +677,10 @@ Subsystem Device ID.
|
||||
| NVIDIA A800-SXM4-80GB | 20F3 10DE 17A2 |
|
||||
| NVIDIA A800 80GB PCIe | 20F5 10DE 1799 |
|
||||
| NVIDIA A800 80GB PCIe LC | 20F5 10DE 179A |
|
||||
| NVIDIA A800 40GB Active | 20F6 1028 180A |
|
||||
| NVIDIA A800 40GB Active | 20F6 103C 180A |
|
||||
| NVIDIA A800 40GB Active | 20F6 10DE 180A |
|
||||
| NVIDIA A800 40GB Active | 20F6 17AA 180A |
|
||||
| NVIDIA GeForce GTX 1660 Ti | 2182 |
|
||||
| NVIDIA GeForce GTX 1660 | 2184 |
|
||||
| NVIDIA GeForce GTX 1650 SUPER | 2187 |
|
||||
@@ -734,6 +739,7 @@ Subsystem Device ID.
|
||||
| NVIDIA A10 | 2236 10DE 1482 |
|
||||
| NVIDIA A10G | 2237 10DE 152F |
|
||||
| NVIDIA A10M | 2238 10DE 1677 |
|
||||
| NVIDIA H100 NVL | 2321 10DE 1839 |
|
||||
| NVIDIA H800 PCIe | 2322 10DE 17A4 |
|
||||
| NVIDIA H800 | 2324 10DE 17A6 |
|
||||
| NVIDIA H800 | 2324 10DE 17A8 |
|
||||
@@ -741,6 +747,7 @@ Subsystem Device ID.
|
||||
| NVIDIA H100 80GB HBM3 | 2330 10DE 16C1 |
|
||||
| NVIDIA H100 PCIe | 2331 10DE 1626 |
|
||||
| NVIDIA H100 | 2339 10DE 17FC |
|
||||
| NVIDIA H800 NVL | 233A 10DE 183A |
|
||||
| NVIDIA GeForce RTX 3060 Ti | 2414 |
|
||||
| NVIDIA GeForce RTX 3080 Ti Laptop GPU | 2420 |
|
||||
| NVIDIA RTX A5500 Laptop GPU | 2438 |
|
||||
@@ -829,12 +836,19 @@ Subsystem Device ID.
|
||||
| NVIDIA RTX 6000 Ada Generation | 26B1 103C 16A1 |
|
||||
| NVIDIA RTX 6000 Ada Generation | 26B1 10DE 16A1 |
|
||||
| NVIDIA RTX 6000 Ada Generation | 26B1 17AA 16A1 |
|
||||
| NVIDIA RTX 5000 Ada Generation | 26B2 1028 17FA |
|
||||
| NVIDIA RTX 5000 Ada Generation | 26B2 103C 17FA |
|
||||
| NVIDIA RTX 5000 Ada Generation | 26B2 10DE 17FA |
|
||||
| NVIDIA RTX 5000 Ada Generation | 26B2 17AA 17FA |
|
||||
| NVIDIA L40 | 26B5 10DE 169D |
|
||||
| NVIDIA L40 | 26B5 10DE 17DA |
|
||||
| NVIDIA L40S | 26B9 10DE 1851 |
|
||||
| NVIDIA L40S | 26B9 10DE 18CF |
|
||||
| NVIDIA GeForce RTX 4080 | 2704 |
|
||||
| NVIDIA GeForce RTX 4090 Laptop GPU | 2717 |
|
||||
| NVIDIA RTX 5000 Ada Generation Laptop GPU | 2730 |
|
||||
| NVIDIA GeForce RTX 4090 Laptop GPU | 2757 |
|
||||
| NVIDIA RTX 5000 Ada Generation Embedded GPU | 2770 |
|
||||
| NVIDIA GeForce RTX 4070 Ti | 2782 |
|
||||
| NVIDIA GeForce RTX 4070 | 2786 |
|
||||
| NVIDIA GeForce RTX 4080 Laptop GPU | 27A0 |
|
||||
@@ -842,11 +856,20 @@ Subsystem Device ID.
|
||||
| NVIDIA RTX 4000 SFF Ada Generation | 27B0 103C 16FA |
|
||||
| NVIDIA RTX 4000 SFF Ada Generation | 27B0 10DE 16FA |
|
||||
| NVIDIA RTX 4000 SFF Ada Generation | 27B0 17AA 16FA |
|
||||
| NVIDIA RTX 4500 Ada Generation | 27B1 1028 180C |
|
||||
| NVIDIA RTX 4500 Ada Generation | 27B1 103C 180C |
|
||||
| NVIDIA RTX 4500 Ada Generation | 27B1 10DE 180C |
|
||||
| NVIDIA RTX 4500 Ada Generation | 27B1 17AA 180C |
|
||||
| NVIDIA RTX 4000 Ada Generation | 27B2 1028 181B |
|
||||
| NVIDIA RTX 4000 Ada Generation | 27B2 103C 181B |
|
||||
| NVIDIA RTX 4000 Ada Generation | 27B2 10DE 181B |
|
||||
| NVIDIA RTX 4000 Ada Generation | 27B2 17AA 181B |
|
||||
| NVIDIA L4 | 27B8 10DE 16CA |
|
||||
| NVIDIA L4 | 27B8 10DE 16EE |
|
||||
| NVIDIA RTX 4000 Ada Generation Laptop GPU | 27BA |
|
||||
| NVIDIA RTX 3500 Ada Generation Laptop GPU | 27BB |
|
||||
| NVIDIA GeForce RTX 4080 Laptop GPU | 27E0 |
|
||||
| NVIDIA RTX 3500 Ada Generation Embedded GPU | 27FB |
|
||||
| NVIDIA GeForce RTX 4060 Ti | 2803 |
|
||||
| NVIDIA GeForce RTX 4060 Ti | 2805 |
|
||||
| NVIDIA GeForce RTX 4070 Laptop GPU | 2820 |
|
||||
@@ -858,3 +881,4 @@ Subsystem Device ID.
|
||||
| NVIDIA RTX 2000 Ada Generation Laptop GPU | 28B8 |
|
||||
| NVIDIA GeForce RTX 4060 Laptop GPU | 28E0 |
|
||||
| NVIDIA GeForce RTX 4050 Laptop GPU | 28E1 |
|
||||
| NVIDIA RTX 2000 Ada Generation Embedded GPU | 28F8 |
|
||||
|
||||
@@ -72,12 +72,24 @@ EXTRA_CFLAGS += -I$(src)/common/inc
|
||||
EXTRA_CFLAGS += -I$(src)
|
||||
EXTRA_CFLAGS += -Wall $(DEFINES) $(INCLUDES) -Wno-cast-qual -Wno-error -Wno-format-extra-args
|
||||
EXTRA_CFLAGS += -D__KERNEL__ -DMODULE -DNVRM
|
||||
EXTRA_CFLAGS += -DNV_VERSION_STRING=\"535.86.10\"
|
||||
EXTRA_CFLAGS += -DNV_VERSION_STRING=\"545.23.08\"
|
||||
|
||||
ifneq ($(SYSSRCHOST1X),)
|
||||
EXTRA_CFLAGS += -I$(SYSSRCHOST1X)
|
||||
endif
|
||||
|
||||
# Some Android kernels prohibit driver use of filesystem functions like
|
||||
# filp_open() and kernel_read(). Disable the NV_FILESYSTEM_ACCESS_AVAILABLE
|
||||
# functionality that uses those functions when building for Android.
|
||||
|
||||
PLATFORM_IS_ANDROID ?= 0
|
||||
|
||||
ifeq ($(PLATFORM_IS_ANDROID),1)
|
||||
EXTRA_CFLAGS += -DNV_FILESYSTEM_ACCESS_AVAILABLE=0
|
||||
else
|
||||
EXTRA_CFLAGS += -DNV_FILESYSTEM_ACCESS_AVAILABLE=1
|
||||
endif
|
||||
|
||||
EXTRA_CFLAGS += -Wno-unused-function
|
||||
|
||||
ifneq ($(NV_BUILD_TYPE),debug)
|
||||
@@ -92,7 +104,6 @@ endif
|
||||
|
||||
ifeq ($(NV_BUILD_TYPE),debug)
|
||||
EXTRA_CFLAGS += -g
|
||||
EXTRA_CFLAGS += $(call cc-option,-gsplit-dwarf,)
|
||||
endif
|
||||
|
||||
EXTRA_CFLAGS += -ffreestanding
|
||||
@@ -214,6 +225,7 @@ $(obj)/conftest/patches.h: $(NV_CONFTEST_SCRIPT)
|
||||
NV_HEADER_PRESENCE_TESTS = \
|
||||
asm/system.h \
|
||||
drm/drmP.h \
|
||||
drm/drm_aperture.h \
|
||||
drm/drm_auth.h \
|
||||
drm/drm_gem.h \
|
||||
drm/drm_crtc.h \
|
||||
@@ -224,6 +236,7 @@ NV_HEADER_PRESENCE_TESTS = \
|
||||
drm/drm_encoder.h \
|
||||
drm/drm_atomic_uapi.h \
|
||||
drm/drm_drv.h \
|
||||
drm/drm_fbdev_generic.h \
|
||||
drm/drm_framebuffer.h \
|
||||
drm/drm_connector.h \
|
||||
drm/drm_probe_helper.h \
|
||||
@@ -257,6 +270,7 @@ NV_HEADER_PRESENCE_TESTS = \
|
||||
linux/sched/task_stack.h \
|
||||
xen/ioemu.h \
|
||||
linux/fence.h \
|
||||
linux/dma-fence.h \
|
||||
linux/dma-resv.h \
|
||||
soc/tegra/chip-id.h \
|
||||
soc/tegra/fuse.h \
|
||||
@@ -275,6 +289,7 @@ NV_HEADER_PRESENCE_TESTS = \
|
||||
asm/opal-api.h \
|
||||
sound/hdaudio.h \
|
||||
asm/pgtable_types.h \
|
||||
asm/page.h \
|
||||
linux/stringhash.h \
|
||||
linux/dma-map-ops.h \
|
||||
rdma/peer_mem.h \
|
||||
@@ -300,7 +315,10 @@ NV_HEADER_PRESENCE_TESTS = \
|
||||
linux/vfio_pci_core.h \
|
||||
linux/mdev.h \
|
||||
soc/tegra/bpmp-abi.h \
|
||||
soc/tegra/bpmp.h
|
||||
soc/tegra/bpmp.h \
|
||||
linux/sync_file.h \
|
||||
linux/cc_platform.h \
|
||||
asm/cpufeature.h
|
||||
|
||||
# Filename to store the define for the header in $(1); this is only consumed by
|
||||
# the rule below that concatenates all of these together.
|
||||
|
||||
kernel-open/common/inc/nv-chardev-numbers.h (new file, 43 lines)
@@ -0,0 +1,43 @@
|
||||
/*
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
|
||||
* DEALINGS IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
|
||||
#ifndef _NV_CHARDEV_NUMBERS_H_
#define _NV_CHARDEV_NUMBERS_H_

// NVIDIA's reserved major character device number (Linux).
#define NV_MAJOR_DEVICE_NUMBER                 195

// Minor numbers 0 to 247 reserved for regular devices
#define NV_MINOR_DEVICE_NUMBER_REGULAR_MAX     247

// Minor numbers 248 to 253 currently unused

// Minor number 254 reserved for the modeset device (provided by NVKMS)
#define NV_MINOR_DEVICE_NUMBER_MODESET_DEVICE  254

// Minor number 255 reserved for the control device
#define NV_MINOR_DEVICE_NUMBER_CONTROL_DEVICE  255

#endif // _NV_CHARDEV_NUMBERS_H_
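The reserved numbers above describe the device nodes the kernel modules expose. As a quick illustration (not part of the driver source), the hedged userspace sketch below uses the standard POSIX `major()`/`minor()` macros to compare `/dev/nvidiactl` against the reserved major 195 and control minor 255; the device node path is the conventional one created by the driver installer and is an assumption here.

```c
/* Illustrative userspace check, assuming the conventional /dev/nvidiactl node. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

#define NV_MAJOR_DEVICE_NUMBER                 195
#define NV_MINOR_DEVICE_NUMBER_CONTROL_DEVICE  255

int main(void)
{
    struct stat st;

    if (stat("/dev/nvidiactl", &st) != 0) {
        perror("stat(/dev/nvidiactl)");
        return 1;
    }

    /* Compare the node's numbers against the reserved values above. */
    printf("major=%u minor=%u (expected %u/%u)\n",
           (unsigned)major(st.st_rdev), (unsigned)minor(st.st_rdev),
           (unsigned)NV_MAJOR_DEVICE_NUMBER,
           (unsigned)NV_MINOR_DEVICE_NUMBER_CONTROL_DEVICE);
    return 0;
}
```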
@@ -25,14 +25,12 @@
|
||||
#ifndef NV_IOCTL_NUMA_H
|
||||
#define NV_IOCTL_NUMA_H
|
||||
|
||||
#if defined(NV_LINUX)
|
||||
|
||||
#include <nv-ioctl-numbers.h>
|
||||
|
||||
#if defined(NV_KERNEL_INTERFACE_LAYER)
|
||||
|
||||
#if defined(NV_KERNEL_INTERFACE_LAYER) && defined(NV_LINUX)
|
||||
#include <linux/types.h>
|
||||
|
||||
#elif defined (NV_KERNEL_INTERFACE_LAYER) && defined(NV_BSD)
|
||||
#include <sys/stdint.h>
|
||||
#else
|
||||
|
||||
#include <stdint.h>
|
||||
@@ -81,5 +79,3 @@ typedef struct nv_ioctl_set_numa_status
|
||||
#define NV_IOCTL_NUMA_STATUS_OFFLINE_FAILED 6
|
||||
|
||||
#endif
|
||||
|
||||
#endif
|
||||
|
||||
kernel-open/common/inc/nv-kthread-q-os.h (new file, 62 lines)
@@ -0,0 +1,62 @@
|
||||
/*
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
|
||||
* DEALINGS IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#ifndef __NV_KTHREAD_QUEUE_OS_H__
|
||||
#define __NV_KTHREAD_QUEUE_OS_H__
|
||||
|
||||
#include <linux/types.h> // atomic_t
|
||||
#include <linux/list.h> // list
|
||||
#include <linux/sched.h> // task_struct
|
||||
#include <linux/numa.h> // NUMA_NO_NODE
|
||||
#include <linux/semaphore.h>
|
||||
|
||||
#include "conftest.h"
|
||||
|
||||
struct nv_kthread_q
{
    struct list_head q_list_head;
    spinlock_t q_lock;

    // This is a counting semaphore. It gets incremented and decremented
    // exactly once for each item that is added to the queue.
    struct semaphore q_sem;
    atomic_t main_loop_should_exit;

    struct task_struct *q_kthread;
};

struct nv_kthread_q_item
{
    struct list_head q_list_node;
    nv_q_func_t function_to_run;
    void *function_args;
};


#ifndef NUMA_NO_NODE
#define NUMA_NO_NODE (-1)
#endif

#define NV_KTHREAD_NO_NODE NUMA_NO_NODE

#endif
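The counting-semaphore pattern documented in `nv_kthread_q` above (one "up" per enqueued item, one "down" per item consumed by the queue's kthread) can be illustrated outside the kernel. The sketch below is a hedged userspace analogy using POSIX semaphores and pthreads; it is not the driver's nv_kthread_q API, and names such as `demo_q` and `worker` are invented for the example.

```c
/* Userspace analogy of the counting-semaphore work queue; not driver code. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define DEMO_ITEMS 4   /* work items; one extra post acts as a shutdown sentinel */

struct demo_q {
    sem_t items;           /* counts queued items, like q_sem in the driver */
    pthread_mutex_t lock;  /* protects shared state, like q_lock            */
    int next;              /* toy payload: index of the next item to run    */
};

static void *worker(void *arg)
{
    struct demo_q *q = arg;

    for (;;) {
        sem_wait(&q->items);                 /* one down per consumed item  */
        pthread_mutex_lock(&q->lock);
        int item = q->next++;
        pthread_mutex_unlock(&q->lock);

        if (item >= DEMO_ITEMS)              /* sentinel reached: exit      */
            break;                           /* (the driver uses an atomic  */
                                             /*  main_loop_should_exit)     */
        printf("worker ran item %d\n", item);
    }
    return NULL;
}

int main(void)
{
    struct demo_q q = { .next = 0 };
    pthread_t tid;

    sem_init(&q.items, 0, 0);
    pthread_mutex_init(&q.lock, NULL);
    pthread_create(&tid, NULL, worker, &q);

    /* One sem_post() per enqueued item, plus one for the shutdown sentinel. */
    for (int i = 0; i < DEMO_ITEMS + 1; i++)
        sem_post(&q.items);

    pthread_join(tid, NULL);
    sem_destroy(&q.items);
    pthread_mutex_destroy(&q.lock);
    return 0;
}
```

Build with `-pthread`; the single worker drains items in FIFO order, mirroring the "one queue, one kthread" behavior described in the header.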
@@ -24,13 +24,14 @@
|
||||
#ifndef __NV_KTHREAD_QUEUE_H__
|
||||
#define __NV_KTHREAD_QUEUE_H__
|
||||
|
||||
#include <linux/types.h> // atomic_t
|
||||
#include <linux/list.h> // list
|
||||
#include <linux/sched.h> // task_struct
|
||||
#include <linux/numa.h> // NUMA_NO_NODE
|
||||
#include <linux/semaphore.h>
|
||||
struct nv_kthread_q;
|
||||
struct nv_kthread_q_item;
|
||||
typedef struct nv_kthread_q nv_kthread_q_t;
|
||||
typedef struct nv_kthread_q_item nv_kthread_q_item_t;
|
||||
|
||||
#include "conftest.h"
|
||||
typedef void (*nv_q_func_t)(void *args);
|
||||
|
||||
#include "nv-kthread-q-os.h"
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
// nv_kthread_q:
|
||||
@@ -85,38 +86,6 @@
|
||||
//
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
typedef struct nv_kthread_q nv_kthread_q_t;
|
||||
typedef struct nv_kthread_q_item nv_kthread_q_item_t;
|
||||
|
||||
typedef void (*nv_q_func_t)(void *args);
|
||||
|
||||
struct nv_kthread_q
|
||||
{
|
||||
struct list_head q_list_head;
|
||||
spinlock_t q_lock;
|
||||
|
||||
// This is a counting semaphore. It gets incremented and decremented
|
||||
// exactly once for each item that is added to the queue.
|
||||
struct semaphore q_sem;
|
||||
atomic_t main_loop_should_exit;
|
||||
|
||||
struct task_struct *q_kthread;
|
||||
};
|
||||
|
||||
struct nv_kthread_q_item
|
||||
{
|
||||
struct list_head q_list_node;
|
||||
nv_q_func_t function_to_run;
|
||||
void *function_args;
|
||||
};
|
||||
|
||||
|
||||
#ifndef NUMA_NO_NODE
|
||||
#define NUMA_NO_NODE (-1)
|
||||
#endif
|
||||
|
||||
#define NV_KTHREAD_NO_NODE NUMA_NO_NODE
|
||||
|
||||
//
|
||||
// The queue must not be used before calling this routine.
|
||||
//
|
||||
@@ -155,10 +124,7 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q,
|
||||
// This routine is the same as nv_kthread_q_init_on_node() with the exception
|
||||
// that the queue stack will be allocated on the NUMA node of the caller.
|
||||
//
|
||||
static inline int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
|
||||
{
|
||||
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
|
||||
}
|
||||
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname);
|
||||
|
||||
//
|
||||
// The caller is responsible for stopping all queues, by calling this routine
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2001-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2001-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
@@ -248,7 +248,7 @@ NV_STATUS nvos_forward_error_to_cray(struct pci_dev *, NvU32,
|
||||
#undef NV_SET_PAGES_UC_PRESENT
|
||||
#endif
|
||||
|
||||
#if !defined(NVCPU_AARCH64) && !defined(NVCPU_PPC64LE)
|
||||
#if !defined(NVCPU_AARCH64) && !defined(NVCPU_PPC64LE) && !defined(NVCPU_RISCV64)
|
||||
#if !defined(NV_SET_MEMORY_UC_PRESENT) && !defined(NV_SET_PAGES_UC_PRESENT)
|
||||
#error "This driver requires the ability to change memory types!"
|
||||
#endif
|
||||
@@ -430,6 +430,11 @@ extern NvBool nvos_is_chipset_io_coherent(void);
|
||||
#define CACHE_FLUSH() asm volatile("sync; \n" \
|
||||
"isync; \n" ::: "memory")
|
||||
#define WRITE_COMBINE_FLUSH() CACHE_FLUSH()
|
||||
#elif defined(NVCPU_RISCV64)
|
||||
#define CACHE_FLUSH() mb()
|
||||
#define WRITE_COMBINE_FLUSH() CACHE_FLUSH()
|
||||
#else
|
||||
#error "CACHE_FLUSH() and WRITE_COMBINE_FLUSH() need to be defined for this architecture."
|
||||
#endif
|
||||
|
||||
typedef enum
|
||||
@@ -440,7 +445,7 @@ typedef enum
|
||||
NV_MEMORY_TYPE_DEVICE_MMIO, /* All kinds of MMIO referred by NVRM e.g. BARs and MCFG of device */
|
||||
} nv_memory_type_t;
|
||||
|
||||
#if defined(NVCPU_AARCH64) || defined(NVCPU_PPC64LE)
|
||||
#if defined(NVCPU_AARCH64) || defined(NVCPU_PPC64LE) || defined(NVCPU_RISCV64)
|
||||
#define NV_ALLOW_WRITE_COMBINING(mt) 1
|
||||
#elif defined(NVCPU_X86_64)
|
||||
#if defined(NV_ENABLE_PAT_SUPPORT)
|
||||
@@ -511,7 +516,11 @@ static inline void nv_vfree(void *ptr, NvU64 size)
|
||||
|
||||
static inline void *nv_ioremap(NvU64 phys, NvU64 size)
|
||||
{
|
||||
#if IS_ENABLED(CONFIG_INTEL_TDX_GUEST) && defined(NV_IOREMAP_DRIVER_HARDENED_PRESENT)
|
||||
void *ptr = ioremap_driver_hardened(phys, size);
|
||||
#else
|
||||
void *ptr = ioremap(phys, size);
|
||||
#endif
|
||||
if (ptr)
|
||||
NV_MEMDBG_ADD(ptr, size);
|
||||
return ptr;
|
||||
@@ -524,11 +533,11 @@ static inline void *nv_ioremap_nocache(NvU64 phys, NvU64 size)
|
||||
|
||||
static inline void *nv_ioremap_cache(NvU64 phys, NvU64 size)
|
||||
{
|
||||
#if defined(NV_IOREMAP_CACHE_PRESENT)
|
||||
void *ptr = ioremap_cache(phys, size);
|
||||
if (ptr)
|
||||
NV_MEMDBG_ADD(ptr, size);
|
||||
return ptr;
|
||||
void *ptr = NULL;
|
||||
#if IS_ENABLED(CONFIG_INTEL_TDX_GUEST) && defined(NV_IOREMAP_CACHE_SHARED_PRESENT)
|
||||
ptr = ioremap_cache_shared(phys, size);
|
||||
#elif defined(NV_IOREMAP_CACHE_PRESENT)
|
||||
ptr = ioremap_cache(phys, size);
|
||||
#elif defined(NVCPU_PPC64LE)
|
||||
//
|
||||
// ioremap_cache() has been only implemented correctly for ppc64le with
|
||||
@@ -543,25 +552,32 @@ static inline void *nv_ioremap_cache(NvU64 phys, NvU64 size)
|
||||
// (commit 40f1ce7fb7e8, kernel 3.0+) and that covers all kernels we
|
||||
// support on power.
|
||||
//
|
||||
void *ptr = ioremap_prot(phys, size, pgprot_val(PAGE_KERNEL));
|
||||
if (ptr)
|
||||
NV_MEMDBG_ADD(ptr, size);
|
||||
return ptr;
|
||||
ptr = ioremap_prot(phys, size, pgprot_val(PAGE_KERNEL));
|
||||
#else
|
||||
return nv_ioremap(phys, size);
|
||||
#endif
|
||||
|
||||
if (ptr)
|
||||
NV_MEMDBG_ADD(ptr, size);
|
||||
|
||||
return ptr;
|
||||
}
|
||||
|
||||
static inline void *nv_ioremap_wc(NvU64 phys, NvU64 size)
|
||||
{
|
||||
#if defined(NV_IOREMAP_WC_PRESENT)
|
||||
void *ptr = ioremap_wc(phys, size);
|
||||
if (ptr)
|
||||
NV_MEMDBG_ADD(ptr, size);
|
||||
return ptr;
|
||||
void *ptr = NULL;
|
||||
#if IS_ENABLED(CONFIG_INTEL_TDX_GUEST) && defined(NV_IOREMAP_DRIVER_HARDENED_WC_PRESENT)
|
||||
ptr = ioremap_driver_hardened_wc(phys, size);
|
||||
#elif defined(NV_IOREMAP_WC_PRESENT)
|
||||
ptr = ioremap_wc(phys, size);
|
||||
#else
|
||||
return nv_ioremap_nocache(phys, size);
|
||||
#endif
|
||||
|
||||
if (ptr)
|
||||
NV_MEMDBG_ADD(ptr, size);
|
||||
|
||||
return ptr;
|
||||
}
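The rewritten `nv_ioremap_wc()` above (like `nv_ioremap_cache()` before it) follows one pattern: prefer the TDX-hardened mapping helper when the conftest macros report it exists, fall back to the regular write-combined mapping, and otherwise degrade to an uncached mapping, with the `NV_MEMDBG_ADD` bookkeeping applied once after the chain. The stand-alone sketch below only illustrates that compile-time selection pattern; the `DEMO_*` macros and stub functions are invented for the example and are not part of the driver.

```c
/* Stand-alone sketch of the "preferred / standard / fallback" compile-time
 * selection used by nv_ioremap_wc(); every name here is an invented stub. */
#include <stdio.h>

/* Toggle these to emulate what the kernel conftest would detect. */
#define DEMO_HARDENED_WC_PRESENT 0
#define DEMO_WC_PRESENT          1

static char demo_region[64];   /* stands in for the mapped MMIO region */

static void *demo_map_hardened_wc(void) { puts("hardened WC mapping"); return demo_region; }
static void *demo_map_wc(void)          { puts("plain WC mapping");    return demo_region; }
static void *demo_map_nocache(void)     { puts("uncached fallback");   return demo_region; }

static void *demo_ioremap_wc(void)
{
    void *ptr = NULL;
#if DEMO_HARDENED_WC_PRESENT
    ptr = demo_map_hardened_wc();   /* preferred variant when available */
#elif DEMO_WC_PRESENT
    ptr = demo_map_wc();            /* standard write-combined mapping  */
#else
    ptr = demo_map_nocache();       /* last-resort fallback             */
#endif
    /* Bookkeeping (NV_MEMDBG_ADD in the driver) happens once, here. */
    if (ptr)
        puts("mapping recorded once, after the #if chain");
    return ptr;
}

int main(void)
{
    demo_ioremap_wc();
    return 0;
}
```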
static inline void nv_iounmap(void *ptr, NvU64 size)
|
||||
@@ -634,37 +650,24 @@ static NvBool nv_numa_node_has_memory(int node_id)
|
||||
free_pages(ptr, order); \
|
||||
}
|
||||
|
||||
extern NvU64 nv_shared_gpa_boundary;
|
||||
static inline pgprot_t nv_sme_clr(pgprot_t prot)
|
||||
{
|
||||
#if defined(__sme_clr)
|
||||
return __pgprot(__sme_clr(pgprot_val(prot)));
|
||||
#else
|
||||
return prot;
|
||||
#endif // __sme_clr
|
||||
}
|
||||
|
||||
static inline pgprot_t nv_adjust_pgprot(pgprot_t vm_prot, NvU32 extra)
|
||||
{
|
||||
pgprot_t prot = __pgprot(pgprot_val(vm_prot) | extra);
|
||||
#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(NV_PGPROT_DECRYPTED_PRESENT)
|
||||
/*
|
||||
* When AMD memory encryption is enabled, device memory mappings with the
|
||||
* C-bit set read as 0xFF, so ensure the bit is cleared for user mappings.
|
||||
*
|
||||
* If cc_mkdec() is present, then pgprot_decrypted() can't be used.
|
||||
*/
|
||||
#if defined(NV_CC_MKDEC_PRESENT)
|
||||
if (nv_shared_gpa_boundary != 0)
|
||||
{
|
||||
/*
|
||||
* By design, a VM using vTOM doesn't see the SEV setting and
|
||||
* for AMD with vTOM, *set* means decrypted.
|
||||
*/
|
||||
prot = __pgprot(nv_shared_gpa_boundary | (pgprot_val(vm_prot)));
|
||||
}
|
||||
else
|
||||
{
|
||||
prot = __pgprot(__sme_clr(pgprot_val(vm_prot)));
|
||||
}
|
||||
#else
|
||||
prot = pgprot_decrypted(prot);
|
||||
#endif
|
||||
#endif
|
||||
|
||||
return prot;
|
||||
#if defined(pgprot_decrypted)
|
||||
return pgprot_decrypted(prot);
|
||||
#else
|
||||
return nv_sme_clr(prot);
|
||||
#endif // pgprot_decrypted
|
||||
}
|
||||
|
||||
#if defined(PAGE_KERNEL_NOENC)
|
||||
@@ -755,7 +758,6 @@ static inline dma_addr_t nv_phys_to_dma(struct device *dev, NvU64 pa)
|
||||
#define NV_VMA_FILE(vma) ((vma)->vm_file)
|
||||
|
||||
#define NV_DEVICE_MINOR_NUMBER(x) minor((x)->i_rdev)
|
||||
#define NV_CONTROL_DEVICE_MINOR 255
|
||||
|
||||
#define NV_PCI_DISABLE_DEVICE(pci_dev) \
|
||||
{ \
|
||||
@@ -1324,7 +1326,7 @@ nv_dma_maps_swiotlb(struct device *dev)
|
||||
* SEV memory encryption") forces SWIOTLB to be enabled when AMD SEV
|
||||
* is active in all cases.
|
||||
*/
|
||||
if (os_sev_enabled)
|
||||
if (os_cc_enabled)
|
||||
swiotlb_in_use = NV_TRUE;
|
||||
#endif
|
||||
|
||||
@@ -1648,20 +1650,11 @@ typedef struct nvidia_event
|
||||
nv_event_t event;
|
||||
} nvidia_event_t;
|
||||
|
||||
typedef enum
|
||||
{
|
||||
NV_FOPS_STACK_INDEX_MMAP,
|
||||
NV_FOPS_STACK_INDEX_IOCTL,
|
||||
NV_FOPS_STACK_INDEX_COUNT
|
||||
} nvidia_entry_point_index_t;
|
||||
|
||||
typedef struct
|
||||
{
|
||||
nv_file_private_t nvfp;
|
||||
|
||||
nvidia_stack_t *sp;
|
||||
nvidia_stack_t *fops_sp[NV_FOPS_STACK_INDEX_COUNT];
|
||||
struct semaphore fops_sp_lock[NV_FOPS_STACK_INDEX_COUNT];
|
||||
nv_alloc_t *free_list;
|
||||
void *nvptr;
|
||||
nvidia_event_t *event_data_head, *event_data_tail;
|
||||
@@ -1691,28 +1684,6 @@ static inline nv_linux_file_private_t *nv_get_nvlfp_from_nvfp(nv_file_private_t
|
||||
|
||||
#define NV_STATE_PTR(nvl) &(((nv_linux_state_t *)(nvl))->nv_state)
|
||||
|
||||
static inline nvidia_stack_t *nv_nvlfp_get_sp(nv_linux_file_private_t *nvlfp, nvidia_entry_point_index_t which)
|
||||
{
|
||||
#if defined(NVCPU_X86_64)
|
||||
if (rm_is_altstack_in_use())
|
||||
{
|
||||
down(&nvlfp->fops_sp_lock[which]);
|
||||
return nvlfp->fops_sp[which];
|
||||
}
|
||||
#endif
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline void nv_nvlfp_put_sp(nv_linux_file_private_t *nvlfp, nvidia_entry_point_index_t which)
|
||||
{
|
||||
#if defined(NVCPU_X86_64)
|
||||
if (rm_is_altstack_in_use())
|
||||
{
|
||||
up(&nvlfp->fops_sp_lock[which]);
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
||||
#define NV_ATOMIC_READ(data) atomic_read(&(data))
|
||||
#define NV_ATOMIC_SET(data,val) atomic_set(&(data), (val))
|
||||
#define NV_ATOMIC_INC(data) atomic_inc(&(data))
|
||||
|
||||
@@ -119,6 +119,13 @@ static inline pgprot_t pgprot_modify_writecombine(pgprot_t old_prot)
|
||||
#define NV_PGPROT_WRITE_COMBINED(old_prot) old_prot
|
||||
#define NV_PGPROT_READ_ONLY(old_prot) \
|
||||
__pgprot(pgprot_val((old_prot)) & ~NV_PAGE_RW)
|
||||
#elif defined(NVCPU_RISCV64)
|
||||
#define NV_PGPROT_WRITE_COMBINED_DEVICE(old_prot) \
|
||||
pgprot_writecombine(old_prot)
|
||||
/* Don't attempt to mark sysmem pages as write combined on riscv */
|
||||
#define NV_PGPROT_WRITE_COMBINED(old_prot) old_prot
|
||||
#define NV_PGPROT_READ_ONLY(old_prot) \
|
||||
__pgprot(pgprot_val((old_prot)) & ~_PAGE_WRITE)
|
||||
#else
|
||||
/* Writecombine is not supported */
|
||||
#undef NV_PGPROT_WRITE_COMBINED_DEVICE(old_prot)
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*
|
||||
* SPDX-FileCopyrightText: Copyright (c) 1999-2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
@@ -25,10 +25,8 @@
|
||||
#define _NV_PROTO_H_
|
||||
|
||||
#include "nv-pci.h"
|
||||
#include "nv-register-module.h"
|
||||
|
||||
extern const char *nv_device_name;
|
||||
extern nvidia_module_t nv_fops;
|
||||
|
||||
void nv_acpi_register_notifier (nv_linux_state_t *);
|
||||
void nv_acpi_unregister_notifier (nv_linux_state_t *);
|
||||
@@ -86,7 +84,7 @@ void nv_shutdown_adapter(nvidia_stack_t *, nv_state_t *, nv_linux_state
|
||||
void nv_dev_free_stacks(nv_linux_state_t *);
|
||||
NvBool nv_lock_init_locks(nvidia_stack_t *, nv_state_t *);
|
||||
void nv_lock_destroy_locks(nvidia_stack_t *, nv_state_t *);
|
||||
void nv_linux_add_device_locked(nv_linux_state_t *);
|
||||
int nv_linux_add_device_locked(nv_linux_state_t *);
|
||||
void nv_linux_remove_device_locked(nv_linux_state_t *);
|
||||
NvBool nv_acpi_power_resource_method_present(struct pci_dev *);
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*
|
||||
* SPDX-FileCopyrightText: Copyright (c) 1999-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
@@ -42,6 +42,7 @@
|
||||
#include <nv-caps.h>
|
||||
#include <nv-firmware.h>
|
||||
#include <nv-ioctl.h>
|
||||
#include <nv-ioctl-numa.h>
|
||||
#include <nvmisc.h>
|
||||
|
||||
extern nv_cap_t *nvidia_caps_root;
|
||||
@@ -50,9 +51,6 @@ extern const NvBool nv_is_rm_firmware_supported_os;
|
||||
|
||||
#include <nv-kernel-interface-api.h>
|
||||
|
||||
/* NVIDIA's reserved major character device number (Linux). */
|
||||
#define NV_MAJOR_DEVICE_NUMBER 195
|
||||
|
||||
#define GPU_UUID_LEN (16)
|
||||
|
||||
/*
|
||||
@@ -478,8 +476,6 @@ typedef struct nv_state_t
|
||||
/* Bool to check if dma-buf is supported */
|
||||
NvBool dma_buf_supported;
|
||||
|
||||
NvBool printed_openrm_enable_unsupported_gpus_error;
|
||||
|
||||
/* Check if NVPCF DSM function is implemented under NVPCF or GPU device scope */
|
||||
NvBool nvpcf_dsm_in_gpu_scope;
|
||||
|
||||
@@ -505,6 +501,7 @@ struct nv_file_private_t
|
||||
NvHandle *handles;
|
||||
NvU16 maxHandles;
|
||||
NvU32 deviceInstance;
|
||||
NvU32 gpuInstanceId;
|
||||
NvU8 metadata[64];
|
||||
|
||||
nv_file_private_t *ctl_nvfp;
|
||||
@@ -765,7 +762,7 @@ nv_state_t* NV_API_CALL nv_get_ctl_state (void);
|
||||
void NV_API_CALL nv_set_dma_address_size (nv_state_t *, NvU32 );
|
||||
|
||||
NV_STATUS NV_API_CALL nv_alias_pages (nv_state_t *, NvU32, NvU32, NvU32, NvU64, NvU64 *, void **);
|
||||
NV_STATUS NV_API_CALL nv_alloc_pages (nv_state_t *, NvU32, NvBool, NvU32, NvBool, NvBool, NvS32, NvU64 *, void **);
|
||||
NV_STATUS NV_API_CALL nv_alloc_pages (nv_state_t *, NvU32, NvU64, NvBool, NvU32, NvBool, NvBool, NvS32, NvU64 *, void **);
|
||||
NV_STATUS NV_API_CALL nv_free_pages (nv_state_t *, NvU32, NvBool, NvU32, void *);
|
||||
|
||||
NV_STATUS NV_API_CALL nv_register_user_pages (nv_state_t *, NvU64, NvU64 *, void *, void **);
|
||||
@@ -981,7 +978,7 @@ NV_STATUS NV_API_CALL rm_dma_buf_dup_mem_handle (nvidia_stack_t *, nv_state_t
|
||||
void NV_API_CALL rm_dma_buf_undup_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle);
|
||||
NV_STATUS NV_API_CALL rm_dma_buf_map_mem_handle (nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvU64, NvU64, void *, nv_phys_addr_range_t **, NvU32 *);
|
||||
void NV_API_CALL rm_dma_buf_unmap_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvU64, nv_phys_addr_range_t **, NvU32);
|
||||
NV_STATUS NV_API_CALL rm_dma_buf_get_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle *, NvHandle *, NvHandle *, void **, NvBool *);
|
||||
NV_STATUS NV_API_CALL rm_dma_buf_get_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle *, NvHandle *, NvHandle *, void **, NvBool *);
|
||||
void NV_API_CALL rm_dma_buf_put_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle, void *);
|
||||
NV_STATUS NV_API_CALL rm_log_gpu_crash (nv_stack_t *, nv_state_t *);
|
||||
|
||||
@@ -993,7 +990,7 @@ NvBool NV_API_CALL rm_gpu_need_4k_page_isolation(nv_state_t *);
|
||||
NvBool NV_API_CALL rm_is_chipset_io_coherent(nv_stack_t *);
|
||||
NvBool NV_API_CALL rm_init_event_locks(nvidia_stack_t *, nv_state_t *);
|
||||
void NV_API_CALL rm_destroy_event_locks(nvidia_stack_t *, nv_state_t *);
|
||||
NV_STATUS NV_API_CALL rm_get_gpu_numa_info(nvidia_stack_t *, nv_state_t *, NvS32 *, NvU64 *, NvU64 *, NvU64 *, NvU32 *);
|
||||
NV_STATUS NV_API_CALL rm_get_gpu_numa_info(nvidia_stack_t *, nv_state_t *, nv_ioctl_numa_info_t *);
|
||||
NV_STATUS NV_API_CALL rm_gpu_numa_online(nvidia_stack_t *, nv_state_t *);
|
||||
NV_STATUS NV_API_CALL rm_gpu_numa_offline(nvidia_stack_t *, nv_state_t *);
|
||||
NvBool NV_API_CALL rm_is_device_sequestered(nvidia_stack_t *, nv_state_t *);
|
||||
@@ -1008,7 +1005,7 @@ void NV_API_CALL rm_cleanup_dynamic_power_management(nvidia_stack_t *, nv_
|
||||
void NV_API_CALL rm_enable_dynamic_power_management(nvidia_stack_t *, nv_state_t *);
|
||||
NV_STATUS NV_API_CALL rm_ref_dynamic_power(nvidia_stack_t *, nv_state_t *, nv_dynamic_power_mode_t);
|
||||
void NV_API_CALL rm_unref_dynamic_power(nvidia_stack_t *, nv_state_t *, nv_dynamic_power_mode_t);
|
||||
NV_STATUS NV_API_CALL rm_transition_dynamic_power(nvidia_stack_t *, nv_state_t *, NvBool);
|
||||
NV_STATUS NV_API_CALL rm_transition_dynamic_power(nvidia_stack_t *, nv_state_t *, NvBool, NvBool *);
|
||||
const char* NV_API_CALL rm_get_vidmem_power_status(nvidia_stack_t *, nv_state_t *);
|
||||
const char* NV_API_CALL rm_get_dynamic_power_management_status(nvidia_stack_t *, nv_state_t *);
|
||||
const char* NV_API_CALL rm_get_gpu_gcx_support(nvidia_stack_t *, nv_state_t *, NvBool);
|
||||
@@ -1023,7 +1020,8 @@ NV_STATUS NV_API_CALL nv_vgpu_create_request(nvidia_stack_t *, nv_state_t *, c
|
||||
NV_STATUS NV_API_CALL nv_vgpu_delete(nvidia_stack_t *, const NvU8 *, NvU16);
|
||||
NV_STATUS NV_API_CALL nv_vgpu_get_type_ids(nvidia_stack_t *, nv_state_t *, NvU32 *, NvU32 *, NvBool, NvU8, NvBool);
|
||||
NV_STATUS NV_API_CALL nv_vgpu_get_type_info(nvidia_stack_t *, nv_state_t *, NvU32, char *, int, NvU8);
|
||||
NV_STATUS NV_API_CALL nv_vgpu_get_bar_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU32, void *);
|
||||
NV_STATUS NV_API_CALL nv_vgpu_get_bar_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU32, void *, NvBool *);
|
||||
NV_STATUS NV_API_CALL nv_vgpu_get_hbm_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU64 *);
|
||||
NV_STATUS NV_API_CALL nv_vgpu_start(nvidia_stack_t *, const NvU8 *, void *, NvS32 *, NvU8 *, NvU32);
|
||||
NV_STATUS NV_API_CALL nv_vgpu_get_sparse_mmap(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 **, NvU64 **, NvU32 *);
|
||||
NV_STATUS NV_API_CALL nv_vgpu_process_vf_info(nvidia_stack_t *, nv_state_t *, NvU8, NvU32, NvU8, NvU8, NvU8, NvBool, void *);
|
||||
|
||||
@@ -86,7 +86,7 @@
|
||||
/* Not currently implemented for MSVC/ARM64. See bug 3366890. */
|
||||
# define nv_speculation_barrier()
|
||||
# define speculation_barrier() nv_speculation_barrier()
|
||||
#elif defined(NVCPU_NVRISCV64) && NVOS_IS_LIBOS
|
||||
#elif defined(NVCPU_IS_RISCV64)
|
||||
# define nv_speculation_barrier()
|
||||
#else
|
||||
#error "Unknown compiler/chip family"
|
||||
|
||||
@@ -104,6 +104,10 @@ typedef struct UvmGpuMemoryInfo_tag
|
||||
// Out: Set to TRUE, if the allocation is in sysmem.
|
||||
NvBool sysmem;
|
||||
|
||||
// Out: Set to TRUE, if this allocation is treated as EGM.
|
||||
// sysmem is also TRUE when egm is TRUE.
|
||||
NvBool egm;
|
||||
|
||||
// Out: Set to TRUE, if the allocation is a constructed
|
||||
// under a Device or Subdevice.
|
||||
// All permutations of sysmem and deviceDescendant are valid.
|
||||
@@ -125,6 +129,8 @@ typedef struct UvmGpuMemoryInfo_tag
|
||||
|
||||
// Out: Uuid of the GPU to which the allocation belongs.
|
||||
// This is only valid if deviceDescendant is NV_TRUE.
|
||||
// When egm is NV_TRUE, this is also the UUID of the GPU
|
||||
// for which EGM is local.
|
||||
// Note: If the allocation is owned by a device in
|
||||
// an SLI group and the allocation is broadcast
|
||||
// across the SLI group, this UUID will be any one
|
||||
@@ -321,10 +327,6 @@ typedef struct UvmGpuChannelAllocParams_tag
|
||||
// The next two fields store UVM_BUFFER_LOCATION values
|
||||
NvU32 gpFifoLoc;
|
||||
NvU32 gpPutLoc;
|
||||
|
||||
// Allocate the channel as secure. This flag should only be set when
|
||||
// Confidential Compute is enabled.
|
||||
NvBool secure;
|
||||
} UvmGpuChannelAllocParams;
|
||||
|
||||
typedef struct UvmGpuPagingChannelAllocParams_tag
|
||||
@@ -336,7 +338,7 @@ typedef struct UvmGpuPagingChannelAllocParams_tag
|
||||
|
||||
// The max number of Copy Engines supported by a GPU.
|
||||
// The gpu ops build has a static assert that this is the correct number.
|
||||
#define UVM_COPY_ENGINE_COUNT_MAX 10
|
||||
#define UVM_COPY_ENGINE_COUNT_MAX 64
|
||||
|
||||
typedef struct
|
||||
{
|
||||
@@ -368,9 +370,6 @@ typedef struct
|
||||
// True if the CE can be used for P2P transactions
|
||||
NvBool p2p:1;
|
||||
|
||||
// True if the CE supports encryption
|
||||
NvBool secure:1;
|
||||
|
||||
// Mask of physical CEs assigned to this LCE
|
||||
//
|
||||
// The value returned by RM for this field may change when a GPU is
|
||||
@@ -687,6 +686,10 @@ typedef struct UvmGpuInfo_tag
|
||||
// to NVSwitch peers.
|
||||
NvBool connectedToSwitch;
|
||||
NvU64 nvswitchMemoryWindowStart;
|
||||
|
||||
// local EGM properties
|
||||
NvBool egmEnabled;
|
||||
NvU8 egmPeerId;
|
||||
} UvmGpuInfo;
|
||||
|
||||
typedef struct UvmGpuFbInfo_tag
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2014-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2014-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
@@ -45,6 +45,11 @@
|
||||
|
||||
#define NVKMS_DEVICE_ID_TEGRA 0x0000ffff
|
||||
|
||||
#define NVKMS_MAX_SUPERFRAME_VIEWS 4
|
||||
|
||||
#define NVKMS_LOG2_LUT_ARRAY_SIZE 10
|
||||
#define NVKMS_LUT_ARRAY_SIZE (1 << NVKMS_LOG2_LUT_ARRAY_SIZE)
|
||||
|
||||
typedef NvU32 NvKmsDeviceHandle;
|
||||
typedef NvU32 NvKmsDispHandle;
|
||||
typedef NvU32 NvKmsConnectorHandle;
|
||||
@@ -179,6 +184,14 @@ enum NvKmsEventType {
|
||||
NVKMS_EVENT_TYPE_FLIP_OCCURRED,
|
||||
};
|
||||
|
||||
enum NvKmsFlipResult {
|
||||
NV_KMS_FLIP_RESULT_SUCCESS = 0, /* Success */
|
||||
NV_KMS_FLIP_RESULT_INVALID_PARAMS, /* Parameter validation failed */
|
||||
NV_KMS_FLIP_RESULT_IN_PROGRESS, /* Flip would fail because an outstanding
|
||||
flip containing changes that cannot be
|
||||
queued is in progress */
|
||||
};
|
||||
|
||||
typedef enum {
|
||||
NV_EVO_SCALER_1TAP = 0,
|
||||
NV_EVO_SCALER_2TAPS = 1,
|
||||
@@ -221,6 +234,16 @@ struct NvKmsUsageBounds {
|
||||
} layer[NVKMS_MAX_LAYERS_PER_HEAD];
|
||||
};
|
||||
|
||||
/*!
 * Per-component arrays of NvU16s describing the LUT; used for both the input
 * LUT and output LUT.
 */
struct NvKmsLutRamps {
    NvU16 red[NVKMS_LUT_ARRAY_SIZE];   /*! in */
    NvU16 green[NVKMS_LUT_ARRAY_SIZE]; /*! in */
    NvU16 blue[NVKMS_LUT_ARRAY_SIZE];  /*! in */
};
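With `NVKMS_LOG2_LUT_ARRAY_SIZE` set to 10 earlier in this header, each ramp holds 1024 `NvU16` entries per color component. The hedged stand-alone sketch below fills an identity (linear) ramp the way a client might before handing it to NVKMS; the local `NvU16` typedef and the fill logic are illustrative, not taken from the driver.

```c
/* Stand-alone illustration of filling an identity LUT ramp. */
#include <stdint.h>
#include <stdio.h>

typedef uint16_t NvU16;                       /* local stand-in for nvtypes */

#define NVKMS_LOG2_LUT_ARRAY_SIZE 10
#define NVKMS_LUT_ARRAY_SIZE      (1 << NVKMS_LOG2_LUT_ARRAY_SIZE)

struct NvKmsLutRamps {
    NvU16 red[NVKMS_LUT_ARRAY_SIZE];
    NvU16 green[NVKMS_LUT_ARRAY_SIZE];
    NvU16 blue[NVKMS_LUT_ARRAY_SIZE];
};

int main(void)
{
    static struct NvKmsLutRamps ramps;        /* static: keep it off the stack */

    /* Spread the 1024 entries evenly across the full 16-bit range. */
    for (int i = 0; i < NVKMS_LUT_ARRAY_SIZE; i++) {
        NvU16 v = (NvU16)((i * 0xFFFF) / (NVKMS_LUT_ARRAY_SIZE - 1));
        ramps.red[i] = ramps.green[i] = ramps.blue[i] = v;
    }

    printf("first=%d mid=%d last=%d\n",
           ramps.red[0], ramps.red[NVKMS_LUT_ARRAY_SIZE / 2],
           ramps.red[NVKMS_LUT_ARRAY_SIZE - 1]);
    return 0;
}
```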
/*
|
||||
* A 3x4 row-major colorspace conversion matrix.
|
||||
*
|
||||
@@ -531,6 +554,18 @@ typedef struct {
|
||||
NvBool noncoherent;
|
||||
} NvKmsDispIOCoherencyModes;
|
||||
|
||||
enum NvKmsInputColorRange {
|
||||
/*
|
||||
* If DEFAULT is provided, driver will assume full range for RGB formats
|
||||
* and limited range for YUV formats.
|
||||
*/
|
||||
NVKMS_INPUT_COLORRANGE_DEFAULT = 0,
|
||||
|
||||
NVKMS_INPUT_COLORRANGE_LIMITED = 1,
|
||||
|
||||
NVKMS_INPUT_COLORRANGE_FULL = 2,
|
||||
};
|
||||
|
||||
enum NvKmsInputColorSpace {
|
||||
/* Unknown colorspace; no de-gamma will be applied */
|
||||
NVKMS_INPUT_COLORSPACE_NONE = 0,
|
||||
@@ -542,6 +577,12 @@ enum NvKmsInputColorSpace {
|
||||
NVKMS_INPUT_COLORSPACE_BT2100_PQ = 2,
|
||||
};
|
||||
|
||||
enum NvKmsOutputColorimetry {
|
||||
NVKMS_OUTPUT_COLORIMETRY_DEFAULT = 0,
|
||||
|
||||
NVKMS_OUTPUT_COLORIMETRY_BT2100 = 1,
|
||||
};
|
||||
|
||||
enum NvKmsOutputTf {
|
||||
/*
|
||||
* NVKMS itself won't apply any OETF (clients are still
|
||||
@@ -552,6 +593,17 @@ enum NvKmsOutputTf {
|
||||
NVKMS_OUTPUT_TF_PQ = 2,
|
||||
};
|
||||
|
||||
/*!
|
||||
* EOTF Data Byte 1 as per CTA-861-G spec.
|
||||
* This is expected to match exactly with the spec.
|
||||
*/
|
||||
enum NvKmsInfoFrameEOTF {
|
||||
NVKMS_INFOFRAME_EOTF_SDR_GAMMA = 0,
|
||||
NVKMS_INFOFRAME_EOTF_HDR_GAMMA = 1,
|
||||
NVKMS_INFOFRAME_EOTF_ST2084 = 2,
|
||||
NVKMS_INFOFRAME_EOTF_HLG = 3,
|
||||
};
|
||||
|
||||
/*!
|
||||
* HDR Static Metadata Type1 Descriptor as per CEA-861.3 spec.
|
||||
* This is expected to match exactly with the spec.
|
||||
@@ -605,4 +657,29 @@ struct NvKmsHDRStaticMetadata {
|
||||
NvU16 maxFALL;
|
||||
};
|
||||
|
||||
/*!
 * A superframe is made of two or more video streams that are combined in
 * a specific way. A DP serializer (an external device connected to a Tegra
 * ARM SOC over DP or HDMI) can receive a video stream comprising multiple
 * videos combined into a single frame and then split it into multiple
 * video streams. The following structure describes the number of views
 * and dimensions of each view inside a superframe.
 */
struct NvKmsSuperframeInfo {
    NvU8 numViews;
    struct {
        /* x offset inside superframe at which this view starts */
        NvU16 x;

        /* y offset inside superframe at which this view starts */
        NvU16 y;

        /* Horizontal active width in pixels for this view */
        NvU16 width;

        /* Vertical active height in lines for this view */
        NvU16 height;
    } view[NVKMS_MAX_SUPERFRAME_VIEWS];
};
#endif /* NVKMS_API_TYPES_H */
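As a concrete illustration of the structure defined above (and of `NVKMS_MAX_SUPERFRAME_VIEWS`, which this diff sets to 4), the hedged sketch below describes a 3840x1080 superframe split into two 1920x1080 views placed side by side. The local typedefs and the example geometry are assumptions made for the sketch, not values taken from the driver.

```c
/* Stand-alone sketch: describing a two-view, side-by-side superframe. */
#include <stdint.h>
#include <stdio.h>

typedef uint8_t  NvU8;                 /* local stand-ins for nvtypes */
typedef uint16_t NvU16;

#define NVKMS_MAX_SUPERFRAME_VIEWS 4

struct NvKmsSuperframeInfo {
    NvU8 numViews;
    struct {
        NvU16 x, y;                    /* view origin inside the superframe */
        NvU16 width, height;           /* active size of this view          */
    } view[NVKMS_MAX_SUPERFRAME_VIEWS];
};

int main(void)
{
    /* A 3840x1080 frame carrying two 1920x1080 streams side by side. */
    struct NvKmsSuperframeInfo info = {
        .numViews = 2,
        .view = {
            { .x = 0,    .y = 0, .width = 1920, .height = 1080 },
            { .x = 1920, .y = 0, .width = 1920, .height = 1080 },
        },
    };

    for (int i = 0; i < info.numViews; i++)
        printf("view %d: %dx%d at (%d,%d)\n", i,
               info.view[i].width, info.view[i].height,
               info.view[i].x, info.view[i].y);
    return 0;
}
```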
@@ -49,6 +49,8 @@ struct NvKmsKapiDevice;
|
||||
struct NvKmsKapiMemory;
|
||||
struct NvKmsKapiSurface;
|
||||
struct NvKmsKapiChannelEvent;
|
||||
struct NvKmsKapiSemaphoreSurface;
|
||||
struct NvKmsKapiSemaphoreSurfaceCallback;
|
||||
|
||||
typedef NvU32 NvKmsKapiConnector;
|
||||
typedef NvU32 NvKmsKapiDisplay;
|
||||
@@ -67,6 +69,14 @@ typedef NvU32 NvKmsKapiDisplay;
|
||||
*/
|
||||
typedef void NvKmsChannelEventProc(void *dataPtr, NvU32 dataU32);
|
||||
|
||||
/*
|
||||
* Note: Same as above, this function must not call back into NVKMS-KAPI, nor
|
||||
* directly into RM. Doing so could cause deadlocks given the notification
|
||||
* function will most likely be called from within RM's interrupt handler
|
||||
* callchain.
|
||||
*/
|
||||
typedef void NvKmsSemaphoreSurfaceCallbackProc(void *pData);
|
||||
|
||||
/** @} */
|
||||
|
||||
/**
|
||||
@@ -126,6 +136,11 @@ struct NvKmsKapiDeviceResourcesInfo {
|
||||
NvU32 validCursorCompositionModes;
|
||||
NvU64 supportedCursorSurfaceMemoryFormats;
|
||||
|
||||
struct {
|
||||
NvU64 maxSubmittedOffset;
|
||||
NvU64 stride;
|
||||
} semsurf;
|
||||
|
||||
struct {
|
||||
NvU16 validRRTransforms;
|
||||
NvU32 validCompositionModes;
|
||||
@@ -218,8 +233,10 @@ struct NvKmsKapiLayerConfig {
|
||||
struct NvKmsRRParams rrParams;
|
||||
struct NvKmsKapiSyncpt syncptParams;
|
||||
|
||||
struct NvKmsHDRStaticMetadata hdrMetadata;
|
||||
NvBool hdrMetadataSpecified;
|
||||
struct {
|
||||
struct NvKmsHDRStaticMetadata val;
|
||||
NvBool enabled;
|
||||
} hdrMetadata;
|
||||
|
||||
enum NvKmsOutputTf tf;
|
||||
|
||||
@@ -233,16 +250,21 @@ struct NvKmsKapiLayerConfig {
|
||||
NvU16 dstWidth, dstHeight;
|
||||
|
||||
enum NvKmsInputColorSpace inputColorSpace;
|
||||
struct NvKmsCscMatrix csc;
|
||||
NvBool cscUseMain;
|
||||
};
|
||||
|
||||
struct NvKmsKapiLayerRequestedConfig {
|
||||
struct NvKmsKapiLayerConfig config;
|
||||
struct {
|
||||
NvBool surfaceChanged : 1;
|
||||
NvBool srcXYChanged : 1;
|
||||
NvBool srcWHChanged : 1;
|
||||
NvBool dstXYChanged : 1;
|
||||
NvBool dstWHChanged : 1;
|
||||
NvBool surfaceChanged : 1;
|
||||
NvBool srcXYChanged : 1;
|
||||
NvBool srcWHChanged : 1;
|
||||
NvBool dstXYChanged : 1;
|
||||
NvBool dstWHChanged : 1;
|
||||
NvBool cscChanged : 1;
|
||||
NvBool tfChanged : 1;
|
||||
NvBool hdrMetadataChanged : 1;
|
||||
} flags;
|
||||
};
|
||||
|
||||
@@ -286,14 +308,41 @@ struct NvKmsKapiHeadModeSetConfig {
|
||||
struct NvKmsKapiDisplayMode mode;
|
||||
|
||||
NvBool vrrEnabled;
|
||||
|
||||
struct {
|
||||
NvBool enabled;
|
||||
enum NvKmsInfoFrameEOTF eotf;
|
||||
struct NvKmsHDRStaticMetadata staticMetadata;
|
||||
} hdrInfoFrame;
|
||||
|
||||
enum NvKmsOutputColorimetry colorimetry;
|
||||
|
||||
struct {
|
||||
struct {
|
||||
NvBool specified;
|
||||
NvU32 depth;
|
||||
NvU32 start;
|
||||
NvU32 end;
|
||||
struct NvKmsLutRamps *pRamps;
|
||||
} input;
|
||||
|
||||
struct {
|
||||
NvBool specified;
|
||||
NvBool enabled;
|
||||
struct NvKmsLutRamps *pRamps;
|
||||
} output;
|
||||
} lut;
|
||||
};
|
||||
|
||||
struct NvKmsKapiHeadRequestedConfig {
|
||||
struct NvKmsKapiHeadModeSetConfig modeSetConfig;
|
||||
struct {
|
||||
NvBool activeChanged : 1;
|
||||
NvBool displaysChanged : 1;
|
||||
NvBool modeChanged : 1;
|
||||
NvBool activeChanged : 1;
|
||||
NvBool displaysChanged : 1;
|
||||
NvBool modeChanged : 1;
|
||||
NvBool hdrInfoFrameChanged : 1;
|
||||
NvBool colorimetryChanged : 1;
|
||||
NvBool lutChanged : 1;
|
||||
} flags;
|
||||
|
||||
struct NvKmsKapiCursorRequestedConfig cursorRequestedConfig;
|
||||
@@ -318,6 +367,7 @@ struct NvKmsKapiHeadReplyConfig {
|
||||
};
|
||||
|
||||
struct NvKmsKapiModeSetReplyConfig {
|
||||
enum NvKmsFlipResult flipResult;
|
||||
struct NvKmsKapiHeadReplyConfig
|
||||
headReplyConfig[NVKMS_KAPI_MAX_HEADS];
|
||||
};
|
||||
@@ -434,6 +484,12 @@ enum NvKmsKapiAllocationType {
|
||||
NVKMS_KAPI_ALLOCATION_TYPE_OFFSCREEN = 2,
|
||||
};
|
||||
|
||||
typedef enum NvKmsKapiRegisterWaiterResultRec {
|
||||
NVKMS_KAPI_REG_WAITER_FAILED,
|
||||
NVKMS_KAPI_REG_WAITER_SUCCESS,
|
||||
NVKMS_KAPI_REG_WAITER_ALREADY_SIGNALLED,
|
||||
} NvKmsKapiRegisterWaiterResult;
|
||||
|
||||
struct NvKmsKapiFunctionsTable {
|
||||
|
||||
/*!
|
||||
@@ -519,8 +575,8 @@ struct NvKmsKapiFunctionsTable {
|
||||
);
|
||||
|
||||
/*!
|
||||
* Revoke permissions previously granted. Only one (dispIndex, head,
|
||||
* display) is currently supported.
|
||||
* Revoke modeset permissions previously granted. Only one (dispIndex,
|
||||
* head, display) is currently supported.
|
||||
*
|
||||
* \param [in] device A device returned by allocateDevice().
|
||||
*
|
||||
@@ -537,6 +593,34 @@ struct NvKmsKapiFunctionsTable {
|
||||
NvKmsKapiDisplay display
|
||||
);
|
||||
|
||||
/*!
|
||||
* Grant modeset sub-owner permissions to fd. This is used by clients to
|
||||
* convert drm 'master' permissions into nvkms sub-owner permission.
|
||||
*
|
||||
* \param [in] fd fd from opening /dev/nvidia-modeset.
|
||||
*
|
||||
* \param [in] device A device returned by allocateDevice().
|
||||
*
|
||||
* \return NV_TRUE on success, NV_FALSE on failure.
|
||||
*/
|
||||
NvBool (*grantSubOwnership)
|
||||
(
|
||||
NvS32 fd,
|
||||
struct NvKmsKapiDevice *device
|
||||
);
|
||||
|
||||
/*!
|
||||
* Revoke sub-owner permissions previously granted.
|
||||
*
|
||||
* \param [in] device A device returned by allocateDevice().
|
||||
*
|
||||
* \return NV_TRUE on success, NV_FALSE on failure.
|
||||
*/
|
||||
NvBool (*revokeSubOwnership)
|
||||
(
|
||||
struct NvKmsKapiDevice *device
|
||||
);
|
||||
|
||||
/*!
|
||||
* Registers for notification, via
|
||||
* NvKmsKapiAllocateDeviceParams::eventCallback, of the events specified
|
||||
@@ -1122,6 +1206,199 @@ struct NvKmsKapiFunctionsTable {
|
||||
NvP64 dmaBuf,
|
||||
NvU32 limit);
|
||||
|
||||
/*!
|
||||
* Import a semaphore surface allocated elsewhere to NVKMS and return a
|
||||
* handle to the new object.
|
||||
*
|
||||
* \param [in] device A device allocated using allocateDevice().
|
||||
*
|
||||
* \param [in] nvKmsParamsUser Userspace pointer to driver-specific
|
||||
* parameters describing the semaphore
|
||||
* surface being imported.
|
||||
*
|
||||
* \param [in] nvKmsParamsSize Size of the driver-specific parameter
|
||||
* struct.
|
||||
*
|
||||
* \param [out] pSemaphoreMap Returns a CPU mapping of the semaphore
|
||||
* surface's semaphore memory to the client.
|
||||
*
|
||||
* \param [out] pMaxSubmittedMap Returns a CPU mapping of the semaphore
|
||||
* surface's semaphore memory to the client.
|
||||
*
|
||||
* \return struct NvKmsKapiSemaphoreSurface* on success, NULL on failure.
|
||||
*/
|
||||
struct NvKmsKapiSemaphoreSurface* (*importSemaphoreSurface)
|
||||
(
|
||||
struct NvKmsKapiDevice *device,
|
||||
NvU64 nvKmsParamsUser,
|
||||
NvU64 nvKmsParamsSize,
|
||||
void **pSemaphoreMap,
|
||||
void **pMaxSubmittedMap
|
||||
);
|
||||
|
||||
/*!
|
||||
* Free an imported semaphore surface.
|
||||
*
|
||||
* \param [in] device The device passed to
|
||||
* importSemaphoreSurface() when creating
|
||||
* semaphoreSurface.
|
||||
*
|
||||
* \param [in] semaphoreSurface A semaphore surface returned by
|
||||
* importSemaphoreSurface().
|
||||
*/
|
||||
void (*freeSemaphoreSurface)
|
||||
(
|
||||
struct NvKmsKapiDevice *device,
|
||||
struct NvKmsKapiSemaphoreSurface *semaphoreSurface
|
||||
);
|
||||
|
||||
/*!
|
||||
* Register a callback to be called when a semaphore reaches a value.
|
||||
*
|
||||
* The callback will be called when the semaphore at index in
|
||||
* semaphoreSurface reaches the value wait_value. The callback will
|
||||
* be called at most once and is automatically unregistered when called.
|
||||
* It may also be unregistered (i.e., cancelled) explicitly using the
|
||||
* unregisterSemaphoreSurfaceCallback() function. To avoid leaking the
|
||||
* memory used to track the registered callback, callers must ensure one
|
||||
* of these methods of unregistration is used for every successful
|
||||
* callback registration that returns a non-NULL pCallbackHandle.
|
||||
*
|
||||
* \param [in] device The device passed to
|
||||
* importSemaphoreSurface() when creating
|
||||
* semaphoreSurface.
|
||||
*
|
||||
* \param [in] semaphoreSurface A semaphore surface returned by
|
||||
* importSemaphoreSurface().
|
||||
*
|
||||
* \param [in] pCallback A pointer to the function to call when
|
||||
* the specified value is reached. NULL
|
||||
* means no callback.
|
||||
*
|
||||
* \param [in] pData Arbitrary data to be passed back to the
|
||||
* callback as its sole parameter.
|
||||
*
|
||||
* \param [in] index The index of the semaphore within
|
||||
* semaphoreSurface.
|
||||
*
|
||||
* \param [in] wait_value The value the semaphore must reach or
|
||||
* exceed before the callback is called.
|
||||
*
|
||||
* \param [in] new_value The value the semaphore will be set to
|
||||
* when it reaches or exceeds <wait_value>.
|
||||
* 0 means do not update the value.
|
||||
*
|
||||
* \param [out] pCallbackHandle On success, the value pointed to will
|
||||
* contain an opaque handle to the
|
||||
* registered callback that may be used to
|
||||
* cancel it if needed. Unused if pCallback
|
||||
* is NULL.
|
||||
*
|
||||
* \return NVKMS_KAPI_REG_WAITER_SUCCESS if the waiter was registered or if
|
||||
* no callback was requested and the semaphore at <index> has
|
||||
* already reached or exceeded <wait_value>
|
||||
*
|
||||
* NVKMS_KAPI_REG_WAITER_ALREADY_SIGNALLED if a callback was
|
||||
* requested and the semaphore at <index> has already reached or
|
||||
* exceeded <wait_value>
|
||||
*
|
||||
* NVKMS_KAPI_REG_WAITER_FAILED if waiter registration failed.
|
||||
*/
|
||||
NvKmsKapiRegisterWaiterResult
|
||||
(*registerSemaphoreSurfaceCallback)
|
||||
(
|
||||
struct NvKmsKapiDevice *device,
|
||||
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
|
||||
NvKmsSemaphoreSurfaceCallbackProc *pCallback,
|
||||
void *pData,
|
||||
NvU64 index,
|
||||
NvU64 wait_value,
|
||||
NvU64 new_value,
|
||||
struct NvKmsKapiSemaphoreSurfaceCallback **pCallbackHandle
|
||||
);
|
||||
|
||||
/*!
|
||||
* Unregister a callback registered via registerSemaphoreSurfaceCallback()
|
||||
*
|
||||
* If the callback has not yet been called, this function will cancel the
|
||||
* callback and free its associated resources.
|
||||
*
|
||||
* Note this function treats the callback handle as a pointer. While this
|
||||
* function does not dereference that pointer itself, the underlying call
|
||||
* to RM does within a properly guarded critical section that first ensures
|
||||
* it is not in the process of being used within a callback. This means
|
||||
* the callstack must take into consideration that pointers are not in
|
||||
* general unique handles if they may have been freed, since a subsequent
|
||||
* malloc could return the same pointer value at that point. This callchain
|
||||
* avoids that by leveraging the behavior of the underlying RM APIs:
|
||||
*
|
||||
* 1) A callback handle is referenced relative to its corresponding
|
||||
* (semaphore surface, index, wait_value) tuple here and within RM. It
|
||||
* is not a valid handle outside of that scope.
|
||||
*
|
||||
* 2) A callback can not be registered against an already-reached value
|
||||
* for a given semaphore surface index.
|
||||
*
|
||||
* 3) A given callback handle can not be registered twice against the same
|
||||
* (semaphore surface, index, wait_value) tuple, so unregistration will
|
||||
* never race with registration at the RM level, and would only race at
|
||||
* a higher level if used incorrectly. Since this is kernel code, we
|
||||
* can safely assume there won't be malicious clients purposely misuing
|
||||
* the API, but the burden is placed on the caller to ensure its usage
|
||||
* does not lead to races at higher levels.
|
||||
*
|
||||
* These factors considered together ensure any valid registered handle is
|
||||
* either still in the relevant waiter list and refers to the same event/
|
||||
* callback as when it was registered, or has been removed from the list
|
||||
* as part of a critical section that also destroys the list itself and
|
||||
* makes future lookups in that list impossible, and hence eliminates the
|
||||
* chance of comparing a stale handle with a new handle of the same value
|
||||
* as part of a lookup.
|
||||
*
|
||||
* \param [in] device The device passed to
|
||||
* importSemaphoreSurface() when creating
|
||||
* semaphoreSurface.
|
||||
*
|
||||
* \param [in] semaphoreSurface The semaphore surface passed to
|
||||
* registerSemaphoreSurfaceCallback() when
|
||||
* registering the callback.
|
||||
*
|
||||
* \param [in] index The index passed to
|
||||
* registerSemaphoreSurfaceCallback() when
|
||||
* registering the callback.
|
||||
*
|
||||
* \param [in] wait_value The wait_value passed to
|
||||
* registerSemaphoreSurfaceCallback() when
|
||||
* registering the callback.
|
||||
*
|
||||
* \param [in] callbackHandle The callback handle returned by
|
||||
* registerSemaphoreSurfaceCallback().
|
||||
*/
|
||||
NvBool
|
||||
(*unregisterSemaphoreSurfaceCallback)
|
||||
(
|
||||
struct NvKmsKapiDevice *device,
|
||||
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
|
||||
NvU64 index,
|
||||
NvU64 wait_value,
|
||||
struct NvKmsKapiSemaphoreSurfaceCallback *callbackHandle
|
||||
);
|
||||
|
||||
/*!
|
||||
* Update the value of a semaphore surface from the CPU.
|
||||
*
|
||||
* Update the semaphore value at the specified index from the CPU, then
|
||||
* wake up any pending CPU waiters associated with that index that are
|
||||
* waiting on it reaching a value <= the new value.
|
||||
*/
|
||||
NvBool
|
||||
(*setSemaphoreSurfaceValue)
|
||||
(
|
||||
struct NvKmsKapiDevice *device,
|
||||
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
|
||||
NvU64 index,
|
||||
NvU64 new_value
|
||||
);
|
||||
};
|
||||
|
||||
/** @} */
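The semaphore-surface entry points documented above reduce to a simple contract: a 64-bit payload lives in shared memory, a waiter registers a one-shot callback for "value reaches or exceeds wait_value", and setSemaphoreSurfaceValue() advances the payload and wakes any matching waiters. The hedged, single-threaded sketch below models only those semantics in plain C; it is an analogy, not a use of the NvKmsKapiFunctionsTable, and every name in it is invented.

```c
/* Single-threaded model of one-shot "value >= wait_value" callbacks. */
#include <stdint.h>
#include <stdio.h>

typedef void demo_callback_t(void *data);

struct demo_waiter {
    uint64_t         wait_value;   /* fire when the payload reaches this value */
    demo_callback_t *callback;     /* one-shot; cleared after it runs          */
    void            *data;
};

struct demo_sem_surface {
    uint64_t           payload;    /* the semaphore value itself               */
    struct demo_waiter waiter;     /* a real surface tracks many of these      */
};

/* Analogous to setSemaphoreSurfaceValue(): update, then notify waiters. */
static void demo_set_value(struct demo_sem_surface *s, uint64_t new_value)
{
    s->payload = new_value;
    if (s->waiter.callback && s->payload >= s->waiter.wait_value) {
        demo_callback_t *cb = s->waiter.callback;
        s->waiter.callback = NULL;          /* auto-unregister before calling */
        cb(s->waiter.data);
    }
}

static void on_reached(void *data)
{
    printf("semaphore reached %llu\n", (unsigned long long)(uintptr_t)data);
}

int main(void)
{
    struct demo_sem_surface s = { .payload = 0 };

    /* Analogous to registerSemaphoreSurfaceCallback() with wait_value 3. */
    s.waiter = (struct demo_waiter){ 3, on_reached, (void *)(uintptr_t)3 };

    demo_set_value(&s, 1);   /* below the threshold: nothing fires          */
    demo_set_value(&s, 3);   /* threshold reached: callback fires once      */
    demo_set_value(&s, 5);   /* already unregistered: nothing fires         */
    return 0;
}
```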
@@ -162,7 +162,7 @@ NvBool NV_API_CALL os_is_vgx_hyper (void);
|
||||
NV_STATUS NV_API_CALL os_inject_vgx_msi (NvU16, NvU64, NvU32);
|
||||
NvBool NV_API_CALL os_is_grid_supported (void);
|
||||
NvU32 NV_API_CALL os_get_grid_csp_support (void);
|
||||
void NV_API_CALL os_get_screen_info (NvU64 *, NvU16 *, NvU16 *, NvU16 *, NvU16 *, NvU64, NvU64);
|
||||
void NV_API_CALL os_get_screen_info (NvU64 *, NvU32 *, NvU32 *, NvU32 *, NvU32 *, NvU64, NvU64);
|
||||
void NV_API_CALL os_bug_check (NvU32, const char *);
|
||||
NV_STATUS NV_API_CALL os_lock_user_pages (void *, NvU64, void **, NvU32);
|
||||
NV_STATUS NV_API_CALL os_lookup_user_io_memory (void *, NvU64, NvU64 **, void**);
|
||||
@@ -207,15 +207,19 @@ enum os_pci_req_atomics_type {
|
||||
OS_INTF_PCIE_REQ_ATOMICS_128BIT
|
||||
};
|
||||
NV_STATUS NV_API_CALL os_enable_pci_req_atomics (void *, enum os_pci_req_atomics_type);
|
||||
NV_STATUS NV_API_CALL os_get_numa_node_memory_usage (NvS32, NvU64 *, NvU64 *);
|
||||
NV_STATUS NV_API_CALL os_numa_add_gpu_memory (void *, NvU64, NvU64, NvU32 *);
|
||||
NV_STATUS NV_API_CALL os_numa_remove_gpu_memory (void *, NvU64, NvU64, NvU32);
|
||||
NV_STATUS NV_API_CALL os_offline_page_at_address(NvU64 address);
|
||||
void* NV_API_CALL os_get_pid_info(void);
|
||||
void NV_API_CALL os_put_pid_info(void *pid_info);
|
||||
NV_STATUS NV_API_CALL os_find_ns_pid(void *pid_info, NvU32 *ns_pid);
|
||||
|
||||
extern NvU32 os_page_size;
|
||||
extern NvU64 os_page_mask;
|
||||
extern NvU8 os_page_shift;
|
||||
extern NvU32 os_sev_status;
|
||||
extern NvBool os_sev_enabled;
|
||||
extern NvBool os_cc_enabled;
|
||||
extern NvBool os_cc_tdx_enabled;
|
||||
extern NvBool os_dma_buf_enabled;
|
||||
|
||||
/*
|
||||
@@ -226,12 +230,14 @@ extern NvBool os_dma_buf_enabled;
|
||||
* ---------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
#define NV_DBG_INFO 0x0
|
||||
#define NV_DBG_SETUP 0x1
|
||||
#define NV_DBG_USERERRORS 0x2
|
||||
#define NV_DBG_INFO 0x1
|
||||
#define NV_DBG_SETUP 0x2
|
||||
#define NV_DBG_WARNINGS 0x3
|
||||
#define NV_DBG_ERRORS 0x4
|
||||
#define NV_DBG_HW_ERRORS 0x5
|
||||
#define NV_DBG_FATAL 0x6
|
||||
|
||||
#define NV_DBG_FORCE_LEVEL(level) ((level) | (1 << 8))
|
||||
|
||||
void NV_API_CALL out_string(const char *str);
|
||||
int NV_API_CALL nv_printf(NvU32 debuglevel, const char *printf_format, ...);
|
||||
|
||||
File diff suppressed because it is too large
334 kernel-open/nvidia-drm/nv-kthread-q.c Normal file
@@ -0,0 +1,334 @@
|
||||
/*
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
|
||||
* DEALINGS IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#include "nv-kthread-q.h"
|
||||
#include "nv-list-helpers.h"
|
||||
|
||||
#include <linux/kthread.h>
|
||||
#include <linux/interrupt.h>
|
||||
#include <linux/completion.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/mm.h>
|
||||
|
||||
#if defined(NV_LINUX_BUG_H_PRESENT)
|
||||
#include <linux/bug.h>
|
||||
#else
|
||||
#include <asm/bug.h>
|
||||
#endif
|
||||
|
||||
// Today's implementation is a little simpler and more limited than the
// API description allows for in nv-kthread-q.h. Details include:
//
// 1. Each nv_kthread_q instance is a first-in, first-out queue.
//
// 2. Each nv_kthread_q instance is serviced by exactly one kthread.
//
// You can create any number of queues, each of which gets its own
// named kernel thread (kthread). You can then insert arbitrary functions
// into the queue, and those functions will be run in the context of the
// queue's kthread.

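/*
 * Illustrative usage sketch (not part of this file): the typical lifecycle of
 * a queue built from the entry points defined below. The work function, the
 * queue name, and the argument string are made up for the example.
 */
#if 0
static void example_work(void *args)
{
    printk(KERN_INFO "nv_kthread_q example: %s\n", (char *)args);
}

static int example_q_usage(void)
{
    static nv_kthread_q_t q;
    static nv_kthread_q_item_t item;
    static char msg[] = "hello";
    int ret = nv_kthread_q_init(&q, "nv_example_q");

    if (ret != 0)
        return ret;

    nv_kthread_q_item_init(&item, example_work, msg);
    nv_kthread_q_schedule_q_item(&q, &item);

    // Wait for everything queued so far to run, then tear the queue down.
    nv_kthread_q_flush(&q);
    nv_kthread_q_stop(&q);

    return 0;
}
#endif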
#ifndef WARN
|
||||
// Only *really* old kernels (2.6.9) end up here. Just use a simple printk
|
||||
// to implement this, because such kernels won't be supported much longer.
|
||||
#define WARN(condition, format...) ({ \
|
||||
int __ret_warn_on = !!(condition); \
|
||||
if (unlikely(__ret_warn_on)) \
|
||||
printk(KERN_ERR format); \
|
||||
unlikely(__ret_warn_on); \
|
||||
})
|
||||
#endif
|
||||
|
||||
#define NVQ_WARN(fmt, ...) \
|
||||
do { \
|
||||
if (in_interrupt()) { \
|
||||
WARN(1, "nv_kthread_q: [in interrupt]: " fmt, \
|
||||
##__VA_ARGS__); \
|
||||
} \
|
||||
else { \
|
||||
WARN(1, "nv_kthread_q: task: %s: " fmt, \
|
||||
current->comm, \
|
||||
##__VA_ARGS__); \
|
||||
} \
|
||||
} while (0)
|
||||
|
||||
static int _main_loop(void *args)
|
||||
{
|
||||
nv_kthread_q_t *q = (nv_kthread_q_t *)args;
|
||||
nv_kthread_q_item_t *q_item = NULL;
|
||||
unsigned long flags;
|
||||
|
||||
while (1) {
|
||||
// Normally this thread is never interrupted. However,
|
||||
// down_interruptible (instead of down) is called here,
|
||||
// in order to avoid being classified as a potentially
|
||||
// hung task, by the kernel watchdog.
|
||||
while (down_interruptible(&q->q_sem))
|
||||
NVQ_WARN("Interrupted during semaphore wait\n");
|
||||
|
||||
if (atomic_read(&q->main_loop_should_exit))
|
||||
break;
|
||||
|
||||
spin_lock_irqsave(&q->q_lock, flags);
|
||||
|
||||
// The q_sem semaphore prevents us from getting here unless there is
|
||||
// at least one item in the list, so an empty list indicates a bug.
|
||||
if (unlikely(list_empty(&q->q_list_head))) {
|
||||
spin_unlock_irqrestore(&q->q_lock, flags);
|
||||
NVQ_WARN("_main_loop: Empty queue: q: 0x%p\n", q);
|
||||
continue;
|
||||
}
|
||||
|
||||
// Consume one item from the queue
|
||||
q_item = list_first_entry(&q->q_list_head,
|
||||
nv_kthread_q_item_t,
|
||||
q_list_node);
|
||||
|
||||
list_del_init(&q_item->q_list_node);
|
||||
|
||||
spin_unlock_irqrestore(&q->q_lock, flags);
|
||||
|
||||
// Run the item
|
||||
q_item->function_to_run(q_item->function_args);
|
||||
|
||||
// Make debugging a little simpler by clearing this between runs:
|
||||
q_item = NULL;
|
||||
}
|
||||
|
||||
while (!kthread_should_stop())
|
||||
schedule();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
void nv_kthread_q_stop(nv_kthread_q_t *q)
|
||||
{
|
||||
// check if queue has been properly initialized
|
||||
if (unlikely(!q->q_kthread))
|
||||
return;
|
||||
|
||||
nv_kthread_q_flush(q);
|
||||
|
||||
// If this assertion fires, then a caller likely either broke the API rules,
|
||||
// by adding items after calling nv_kthread_q_stop, or possibly messed up
|
||||
// with inadequate flushing of self-rescheduling q_items.
|
||||
if (unlikely(!list_empty(&q->q_list_head)))
|
||||
NVQ_WARN("list not empty after flushing\n");
|
||||
|
||||
if (likely(!atomic_read(&q->main_loop_should_exit))) {
|
||||
|
||||
atomic_set(&q->main_loop_should_exit, 1);
|
||||
|
||||
// Wake up the kthread so that it can see that it needs to stop:
|
||||
up(&q->q_sem);
|
||||
|
||||
kthread_stop(q->q_kthread);
|
||||
q->q_kthread = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
// When CONFIG_VMAP_STACK is defined, the kernel thread stack allocator used by
|
||||
// kthread_create_on_node relies on a 2 entry, per-core cache to minimize
|
||||
// vmalloc invocations. The cache is NUMA-unaware, so when there is a hit, the
|
||||
// stack location ends up being a function of the core assigned to the current
|
||||
// thread, instead of being a function of the specified NUMA node. The cache was
|
||||
// added to the kernel in commit ac496bf48d97f2503eaa353996a4dd5e4383eaf0
|
||||
// ("fork: Optimize task creation by caching two thread stacks per CPU if
|
||||
// CONFIG_VMAP_STACK=y")
|
||||
//
|
||||
// To work around the problematic cache, we create up to three kernel threads
|
||||
// -If the first thread's stack is resident on the preferred node, return this
|
||||
// thread.
|
||||
// -Otherwise, create a second thread. If its stack is resident on the
|
||||
// preferred node, stop the first thread and return this one.
|
||||
// -Otherwise, create a third thread. The stack allocator does not find a
|
||||
// cached stack, and so falls back to vmalloc, which takes the NUMA hint into
|
||||
// consideration. The first two threads are then stopped.
|
||||
//
|
||||
// When CONFIG_VMAP_STACK is not defined, the first kernel thread is returned.
|
||||
//
|
||||
// This function is never invoked when there is no NUMA preference (preferred
|
||||
// node is NUMA_NO_NODE).
|
||||
static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),
|
||||
nv_kthread_q_t *q,
|
||||
int preferred_node,
|
||||
const char *q_name)
|
||||
{
|
||||
|
||||
unsigned i, j;
|
||||
const static unsigned attempts = 3;
|
||||
struct task_struct *thread[3];
|
||||
|
||||
for (i = 0;; i++) {
|
||||
struct page *stack;
|
||||
|
||||
thread[i] = kthread_create_on_node(threadfn, q, preferred_node, q_name);
|
||||
|
||||
if (unlikely(IS_ERR(thread[i]))) {
|
||||
|
||||
// Instead of failing, pick the previous thread, even if its
|
||||
// stack is not allocated on the preferred node.
|
||||
if (i > 0)
|
||||
i--;
|
||||
|
||||
break;
|
||||
}
|
||||
|
||||
// vmalloc is not used to allocate the stack, so simply return the
|
||||
// thread, even if its stack may not be allocated on the preferred node
|
||||
if (!is_vmalloc_addr(thread[i]->stack))
|
||||
break;
|
||||
|
||||
// Ran out of attempts - return thread even if its stack may not be
|
||||
// allocated on the preferred node
|
||||
if ((i == (attempts - 1)))
|
||||
break;
|
||||
|
||||
// Get the NUMA node where the first page of the stack is resident. If
|
||||
// it is the preferred node, select this thread.
|
||||
stack = vmalloc_to_page(thread[i]->stack);
|
||||
if (page_to_nid(stack) == preferred_node)
|
||||
break;
|
||||
}
|
||||
|
||||
for (j = i; j > 0; j--)
|
||||
kthread_stop(thread[j - 1]);
|
||||
|
||||
return thread[i];
|
||||
}
|
||||
|
||||
int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferred_node)
|
||||
{
|
||||
memset(q, 0, sizeof(*q));
|
||||
|
||||
INIT_LIST_HEAD(&q->q_list_head);
|
||||
spin_lock_init(&q->q_lock);
|
||||
sema_init(&q->q_sem, 0);
|
||||
|
||||
if (preferred_node == NV_KTHREAD_NO_NODE) {
|
||||
q->q_kthread = kthread_create(_main_loop, q, q_name);
|
||||
}
|
||||
else {
|
||||
q->q_kthread = thread_create_on_node(_main_loop, q, preferred_node, q_name);
|
||||
}
|
||||
|
||||
if (IS_ERR(q->q_kthread)) {
|
||||
int err = PTR_ERR(q->q_kthread);
|
||||
|
||||
// Clear q_kthread before returning so that nv_kthread_q_stop() can be
|
||||
// safely called on it, making error handling easier.
|
||||
q->q_kthread = NULL;
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
wake_up_process(q->q_kthread);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
|
||||
{
|
||||
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
|
||||
}
|
||||
|
||||
// Returns true (non-zero) if the item was actually scheduled, and false if the
|
||||
// item was already pending in a queue.
|
||||
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)
|
||||
{
|
||||
unsigned long flags;
|
||||
int ret = 1;
|
||||
|
||||
spin_lock_irqsave(&q->q_lock, flags);
|
||||
|
||||
if (likely(list_empty(&q_item->q_list_node)))
|
||||
list_add_tail(&q_item->q_list_node, &q->q_list_head);
|
||||
else
|
||||
ret = 0;
|
||||
|
||||
spin_unlock_irqrestore(&q->q_lock, flags);
|
||||
|
||||
if (likely(ret))
|
||||
up(&q->q_sem);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
void nv_kthread_q_item_init(nv_kthread_q_item_t *q_item,
|
||||
nv_q_func_t function_to_run,
|
||||
void *function_args)
|
||||
{
|
||||
INIT_LIST_HEAD(&q_item->q_list_node);
|
||||
q_item->function_to_run = function_to_run;
|
||||
q_item->function_args = function_args;
|
||||
}
|
||||
|
||||
// Returns true (non-zero) if the q_item got scheduled, false otherwise.
|
||||
int nv_kthread_q_schedule_q_item(nv_kthread_q_t *q,
|
||||
nv_kthread_q_item_t *q_item)
|
||||
{
|
||||
if (unlikely(atomic_read(&q->main_loop_should_exit))) {
|
||||
NVQ_WARN("Not allowed: nv_kthread_q_schedule_q_item was "
|
||||
"called with a non-alive q: 0x%p\n", q);
|
||||
return 0;
|
||||
}
|
||||
|
||||
return _raw_q_schedule(q, q_item);
|
||||
}
|
||||
|
||||
static void _q_flush_function(void *args)
|
||||
{
|
||||
struct completion *completion = (struct completion *)args;
|
||||
complete(completion);
|
||||
}
|
||||
|
||||
|
||||
static void _raw_q_flush(nv_kthread_q_t *q)
|
||||
{
|
||||
nv_kthread_q_item_t q_item;
|
||||
DECLARE_COMPLETION_ONSTACK(completion);
|
||||
|
||||
nv_kthread_q_item_init(&q_item, _q_flush_function, &completion);
|
||||
|
||||
_raw_q_schedule(q, &q_item);
|
||||
|
||||
// Wait for the flush item to run. Once it has run, then all of the
|
||||
// previously queued items in front of it will have run, so that means
|
||||
// the flush is complete.
|
||||
wait_for_completion(&completion);
|
||||
}
|
||||
|
||||
void nv_kthread_q_flush(nv_kthread_q_t *q)
{
    if (unlikely(atomic_read(&q->main_loop_should_exit))) {
        NVQ_WARN("Not allowed: nv_kthread_q_flush was called after "
                 "nv_kthread_q_stop. q: 0x%p\n", q);
        return;
    }

    // This 2x flush is not a typing mistake. The queue really does have to be
    // flushed twice, in order to take care of the case of a q_item that
    // reschedules itself.
    _raw_q_flush(q);
    _raw_q_flush(q);
}
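/*
 * Illustrative sketch (not part of this file) of the self-rescheduling case
 * that the double flush above exists for: a q_item that re-queues itself once.
 * The first _raw_q_flush() only guarantees the first run finished; the
 * re-queued copy may still be pending, so a second flush is required before
 * the queue can be considered drained.
 */
#if 0
struct example_self_resched {
    nv_kthread_q_t *q;
    nv_kthread_q_item_t item;
    int runs;
};

static void example_self_resched_work(void *args)
{
    struct example_self_resched *s = args;

    if (s->runs++ == 0) {
        // Re-queue ourselves exactly once.
        nv_kthread_q_schedule_q_item(s->q, &s->item);
    }
}
#endif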
@@ -43,9 +43,13 @@
|
||||
#if defined(NV_LINUX_FENCE_H_PRESENT)
|
||||
typedef struct fence nv_dma_fence_t;
|
||||
typedef struct fence_ops nv_dma_fence_ops_t;
|
||||
typedef struct fence_cb nv_dma_fence_cb_t;
|
||||
typedef fence_func_t nv_dma_fence_func_t;
|
||||
#else
|
||||
typedef struct dma_fence nv_dma_fence_t;
|
||||
typedef struct dma_fence_ops nv_dma_fence_ops_t;
|
||||
typedef struct dma_fence_cb nv_dma_fence_cb_t;
|
||||
typedef dma_fence_func_t nv_dma_fence_func_t;
|
||||
#endif
|
||||
|
||||
#if defined(NV_LINUX_FENCE_H_PRESENT)
|
||||
@@ -97,6 +101,14 @@ static inline int nv_dma_fence_signal(nv_dma_fence_t *fence) {
|
||||
#endif
|
||||
}
|
||||
|
||||
static inline int nv_dma_fence_signal_locked(nv_dma_fence_t *fence) {
|
||||
#if defined(NV_LINUX_FENCE_H_PRESENT)
|
||||
return fence_signal_locked(fence);
|
||||
#else
|
||||
return dma_fence_signal_locked(fence);
|
||||
#endif
|
||||
}
|
||||
|
||||
static inline u64 nv_dma_fence_context_alloc(unsigned num) {
|
||||
#if defined(NV_LINUX_FENCE_H_PRESENT)
|
||||
return fence_context_alloc(num);
|
||||
@@ -108,7 +120,7 @@ static inline u64 nv_dma_fence_context_alloc(unsigned num) {
|
||||
static inline void
|
||||
nv_dma_fence_init(nv_dma_fence_t *fence,
|
||||
const nv_dma_fence_ops_t *ops,
|
||||
spinlock_t *lock, u64 context, unsigned seqno) {
|
||||
spinlock_t *lock, u64 context, uint64_t seqno) {
|
||||
#if defined(NV_LINUX_FENCE_H_PRESENT)
|
||||
fence_init(fence, ops, lock, context, seqno);
|
||||
#else
|
||||
@@ -116,6 +128,29 @@ nv_dma_fence_init(nv_dma_fence_t *fence,
|
||||
#endif
|
||||
}
|
||||
|
||||
static inline void
|
||||
nv_dma_fence_set_error(nv_dma_fence_t *fence,
|
||||
int error) {
|
||||
#if defined(NV_DMA_FENCE_SET_ERROR_PRESENT)
|
||||
return dma_fence_set_error(fence, error);
|
||||
#elif defined(NV_FENCE_SET_ERROR_PRESENT)
|
||||
return fence_set_error(fence, error);
|
||||
#else
|
||||
fence->status = error;
|
||||
#endif
|
||||
}
|
||||
|
||||
static inline int
|
||||
nv_dma_fence_add_callback(nv_dma_fence_t *fence,
|
||||
nv_dma_fence_cb_t *cb,
|
||||
nv_dma_fence_func_t func) {
|
||||
#if defined(NV_LINUX_FENCE_H_PRESENT)
|
||||
return fence_add_callback(fence, cb, func);
|
||||
#else
|
||||
return dma_fence_add_callback(fence, cb, func);
|
||||
#endif
|
||||
}
|
||||
|
||||
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
|
||||
|
||||
#endif /* __NVIDIA_DMA_FENCE_HELPER_H__ */
|
||||
|
||||
@@ -121,6 +121,20 @@ static inline void nv_dma_resv_add_excl_fence(nv_dma_resv_t *obj,
|
||||
#endif
|
||||
}
|
||||
|
||||
static inline void nv_dma_resv_add_shared_fence(nv_dma_resv_t *obj,
|
||||
nv_dma_fence_t *fence)
|
||||
{
|
||||
#if defined(NV_LINUX_DMA_RESV_H_PRESENT)
|
||||
#if defined(NV_DMA_RESV_ADD_FENCE_PRESENT)
|
||||
dma_resv_add_fence(obj, fence, DMA_RESV_USAGE_READ);
|
||||
#else
|
||||
dma_resv_add_shared_fence(obj, fence);
|
||||
#endif
|
||||
#else
|
||||
reservation_object_add_shared_fence(obj, fence);
|
||||
#endif
|
||||
}
|
||||
|
||||
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
|
||||
|
||||
#endif /* __NVIDIA_DMA_RESV_HELPER_H__ */
|
||||
|
||||
@@ -61,4 +61,15 @@
|
||||
#undef NV_DRM_FENCE_AVAILABLE
|
||||
#endif
|
||||
|
||||
/*
|
||||
* We can support color management if either drm_helper_crtc_enable_color_mgmt()
|
||||
* or drm_crtc_enable_color_mgmt() exist.
|
||||
*/
|
||||
#if defined(NV_DRM_HELPER_CRTC_ENABLE_COLOR_MGMT_PRESENT) || \
|
||||
defined(NV_DRM_CRTC_ENABLE_COLOR_MGMT_PRESENT)
|
||||
#define NV_DRM_COLOR_MGMT_AVAILABLE
|
||||
#else
|
||||
#undef NV_DRM_COLOR_MGMT_AVAILABLE
|
||||
#endif
|
||||
|
||||
#endif /* defined(__NVIDIA_DRM_CONFTEST_H__) */
|
||||
|
||||
@@ -349,10 +349,125 @@ nv_drm_connector_best_encoder(struct drm_connector *connector)
|
||||
return NULL;
|
||||
}
|
||||
|
||||
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
|
||||
static const NvU32 __nv_drm_connector_supported_colorspaces =
|
||||
BIT(DRM_MODE_COLORIMETRY_BT2020_RGB) |
|
||||
BIT(DRM_MODE_COLORIMETRY_BT2020_YCC);
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
|
||||
static int
|
||||
__nv_drm_connector_atomic_check(struct drm_connector *connector,
|
||||
struct drm_atomic_state *state)
|
||||
{
|
||||
struct drm_connector_state *new_connector_state =
|
||||
drm_atomic_get_new_connector_state(state, connector);
|
||||
struct drm_connector_state *old_connector_state =
|
||||
drm_atomic_get_old_connector_state(state, connector);
|
||||
struct nv_drm_device *nv_dev = to_nv_device(connector->dev);
|
||||
|
||||
struct drm_crtc *crtc = new_connector_state->crtc;
|
||||
struct drm_crtc_state *crtc_state;
|
||||
struct nv_drm_crtc_state *nv_crtc_state;
|
||||
struct NvKmsKapiHeadRequestedConfig *req_config;
|
||||
|
||||
if (!crtc) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
crtc_state = drm_atomic_get_new_crtc_state(state, crtc);
|
||||
nv_crtc_state = to_nv_crtc_state(crtc_state);
|
||||
req_config = &nv_crtc_state->req_config;
|
||||
|
||||
/*
|
||||
* Override metadata for the entire head instead of allowing NVKMS to derive
|
||||
* it from the layers' metadata.
|
||||
*
|
||||
* This is the metadata that will be sent to the display, and if applicable,
|
||||
* layers will be tone mapped to this metadata rather than that of the
|
||||
* display.
|
||||
*/
|
||||
req_config->flags.hdrInfoFrameChanged =
|
||||
!drm_connector_atomic_hdr_metadata_equal(old_connector_state,
|
||||
new_connector_state);
|
||||
if (new_connector_state->hdr_output_metadata &&
|
||||
new_connector_state->hdr_output_metadata->data) {
|
||||
|
||||
/*
|
||||
* Note that HDMI definitions are used here even though we might not
|
||||
* be using HDMI. While that seems odd, it is consistent with
|
||||
* upstream behavior.
|
||||
*/
|
||||
|
||||
struct hdr_output_metadata *hdr_metadata =
|
||||
new_connector_state->hdr_output_metadata->data;
|
||||
struct hdr_metadata_infoframe *info_frame =
|
||||
&hdr_metadata->hdmi_metadata_type1;
|
||||
unsigned int i;
|
||||
|
||||
if (hdr_metadata->metadata_type != HDMI_STATIC_METADATA_TYPE1) {
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(info_frame->display_primaries); i++) {
|
||||
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.displayPrimaries[i].x =
|
||||
info_frame->display_primaries[i].x;
|
||||
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.displayPrimaries[i].y =
|
||||
info_frame->display_primaries[i].y;
|
||||
}
|
||||
|
||||
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.whitePoint.x =
|
||||
info_frame->white_point.x;
|
||||
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.whitePoint.y =
|
||||
info_frame->white_point.y;
|
||||
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxDisplayMasteringLuminance =
|
||||
info_frame->max_display_mastering_luminance;
|
||||
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.minDisplayMasteringLuminance =
|
||||
info_frame->min_display_mastering_luminance;
|
||||
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxCLL =
|
||||
info_frame->max_cll;
|
||||
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxFALL =
|
||||
info_frame->max_fall;
|
||||
|
||||
req_config->modeSetConfig.hdrInfoFrame.eotf = info_frame->eotf;
|
||||
|
||||
req_config->modeSetConfig.hdrInfoFrame.enabled = NV_TRUE;
|
||||
} else {
|
||||
req_config->modeSetConfig.hdrInfoFrame.enabled = NV_FALSE;
|
||||
}
|
||||
|
||||
req_config->flags.colorimetryChanged =
|
||||
(old_connector_state->colorspace != new_connector_state->colorspace);
|
||||
// When adding a case here, also add to __nv_drm_connector_supported_colorspaces
|
||||
switch (new_connector_state->colorspace) {
|
||||
case DRM_MODE_COLORIMETRY_DEFAULT:
|
||||
req_config->modeSetConfig.colorimetry =
|
||||
NVKMS_OUTPUT_COLORIMETRY_DEFAULT;
|
||||
break;
|
||||
case DRM_MODE_COLORIMETRY_BT2020_RGB:
|
||||
case DRM_MODE_COLORIMETRY_BT2020_YCC:
|
||||
// Ignore RGB/YCC
|
||||
// See https://patchwork.freedesktop.org/patch/525496/?series=111865&rev=4
|
||||
req_config->modeSetConfig.colorimetry =
|
||||
NVKMS_OUTPUT_COLORIMETRY_BT2100;
|
||||
break;
|
||||
default:
|
||||
// XXX HDR TODO: Add support for more color spaces
|
||||
NV_DRM_DEV_LOG_ERR(nv_dev, "Unsupported color space");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
#endif /* defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT) */
|
||||
|
||||
static const struct drm_connector_helper_funcs nv_connector_helper_funcs = {
|
||||
.get_modes = nv_drm_connector_get_modes,
|
||||
.mode_valid = nv_drm_connector_mode_valid,
|
||||
.best_encoder = nv_drm_connector_best_encoder,
|
||||
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
|
||||
.atomic_check = __nv_drm_connector_atomic_check,
|
||||
#endif
|
||||
};
|
||||
|
||||
static struct drm_connector*
|
||||
@@ -405,6 +520,32 @@ nv_drm_connector_new(struct drm_device *dev,
|
||||
DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT;
|
||||
}
|
||||
|
||||
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
|
||||
if (nv_connector->type == NVKMS_CONNECTOR_TYPE_HDMI) {
|
||||
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
|
||||
if (drm_mode_create_hdmi_colorspace_property(
|
||||
&nv_connector->base,
|
||||
__nv_drm_connector_supported_colorspaces) == 0) {
|
||||
#else
|
||||
if (drm_mode_create_hdmi_colorspace_property(&nv_connector->base) == 0) {
|
||||
#endif
|
||||
drm_connector_attach_colorspace_property(&nv_connector->base);
|
||||
}
|
||||
drm_connector_attach_hdr_output_metadata_property(&nv_connector->base);
|
||||
} else if (nv_connector->type == NVKMS_CONNECTOR_TYPE_DP) {
|
||||
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
|
||||
if (drm_mode_create_dp_colorspace_property(
|
||||
&nv_connector->base,
|
||||
__nv_drm_connector_supported_colorspaces) == 0) {
|
||||
#else
|
||||
if (drm_mode_create_dp_colorspace_property(&nv_connector->base) == 0) {
|
||||
#endif
|
||||
drm_connector_attach_colorspace_property(&nv_connector->base);
|
||||
}
|
||||
drm_connector_attach_hdr_output_metadata_property(&nv_connector->base);
|
||||
}
|
||||
#endif /* defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT) */
|
||||
|
||||
/* Register connector with DRM subsystem */
|
||||
|
||||
ret = drm_connector_register(&nv_connector->base);
|
||||
|
||||
@@ -48,6 +48,11 @@
|
||||
#include <linux/host1x-next.h>
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_DRM_COLOR_MGMT_H_PRESENT)
|
||||
#include <drm/drm_color_mgmt.h>
|
||||
#endif
|
||||
|
||||
|
||||
#if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
|
||||
static int
|
||||
nv_drm_atomic_replace_property_blob_from_id(struct drm_device *dev,
|
||||
@@ -399,27 +404,25 @@ plane_req_config_update(struct drm_plane *plane,
|
||||
}
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(info_frame->display_primaries); i ++) {
|
||||
req_config->config.hdrMetadata.displayPrimaries[i].x =
|
||||
req_config->config.hdrMetadata.val.displayPrimaries[i].x =
|
||||
info_frame->display_primaries[i].x;
|
||||
req_config->config.hdrMetadata.displayPrimaries[i].y =
|
||||
req_config->config.hdrMetadata.val.displayPrimaries[i].y =
|
||||
info_frame->display_primaries[i].y;
|
||||
}
|
||||
|
||||
req_config->config.hdrMetadata.whitePoint.x =
|
||||
req_config->config.hdrMetadata.val.whitePoint.x =
|
||||
info_frame->white_point.x;
|
||||
req_config->config.hdrMetadata.whitePoint.y =
|
||||
req_config->config.hdrMetadata.val.whitePoint.y =
|
||||
info_frame->white_point.y;
|
||||
req_config->config.hdrMetadata.maxDisplayMasteringLuminance =
|
||||
req_config->config.hdrMetadata.val.maxDisplayMasteringLuminance =
|
||||
info_frame->max_display_mastering_luminance;
|
||||
req_config->config.hdrMetadata.minDisplayMasteringLuminance =
|
||||
req_config->config.hdrMetadata.val.minDisplayMasteringLuminance =
|
||||
info_frame->min_display_mastering_luminance;
|
||||
req_config->config.hdrMetadata.maxCLL =
|
||||
req_config->config.hdrMetadata.val.maxCLL =
|
||||
info_frame->max_cll;
|
||||
req_config->config.hdrMetadata.maxFALL =
|
||||
req_config->config.hdrMetadata.val.maxFALL =
|
||||
info_frame->max_fall;
|
||||
|
||||
req_config->config.hdrMetadataSpecified = true;
|
||||
|
||||
switch (info_frame->eotf) {
|
||||
case HDMI_EOTF_SMPTE_ST2084:
|
||||
req_config->config.tf = NVKMS_OUTPUT_TF_PQ;
|
||||
@@ -432,10 +435,21 @@ plane_req_config_update(struct drm_plane *plane,
|
||||
NV_DRM_DEV_LOG_ERR(nv_dev, "Unsupported EOTF");
|
||||
return -1;
|
||||
}
|
||||
|
||||
req_config->config.hdrMetadata.enabled = true;
|
||||
} else {
|
||||
req_config->config.hdrMetadataSpecified = false;
|
||||
req_config->config.hdrMetadata.enabled = false;
|
||||
req_config->config.tf = NVKMS_OUTPUT_TF_NONE;
|
||||
}
|
||||
|
||||
req_config->flags.hdrMetadataChanged =
|
||||
((old_config.hdrMetadata.enabled !=
|
||||
req_config->config.hdrMetadata.enabled) ||
|
||||
memcmp(&old_config.hdrMetadata.val,
|
||||
&req_config->config.hdrMetadata.val,
|
||||
sizeof(struct NvKmsHDRStaticMetadata)));
|
||||
|
||||
req_config->flags.tfChanged = (old_config.tf != req_config->config.tf);
|
||||
#endif
|
||||
|
||||
/*
|
||||
@@ -692,9 +706,11 @@ static inline void __nv_drm_plane_atomic_destroy_state(
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
|
||||
struct nv_drm_plane_state *nv_drm_plane_state =
|
||||
to_nv_drm_plane_state(state);
|
||||
drm_property_blob_put(nv_drm_plane_state->hdr_output_metadata);
|
||||
{
|
||||
struct nv_drm_plane_state *nv_drm_plane_state =
|
||||
to_nv_drm_plane_state(state);
|
||||
drm_property_blob_put(nv_drm_plane_state->hdr_output_metadata);
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
||||
@@ -800,6 +816,9 @@ nv_drm_atomic_crtc_duplicate_state(struct drm_crtc *crtc)
|
||||
&(to_nv_crtc_state(crtc->state)->req_config),
|
||||
&nv_state->req_config);
|
||||
|
||||
nv_state->ilut_ramps = NULL;
|
||||
nv_state->olut_ramps = NULL;
|
||||
|
||||
return &nv_state->base;
|
||||
}
|
||||
|
||||
@@ -823,6 +842,9 @@ static void nv_drm_atomic_crtc_destroy_state(struct drm_crtc *crtc,
|
||||
|
||||
__nv_drm_atomic_helper_crtc_destroy_state(crtc, &nv_state->base);
|
||||
|
||||
nv_drm_free(nv_state->ilut_ramps);
|
||||
nv_drm_free(nv_state->olut_ramps);
|
||||
|
||||
nv_drm_free(nv_state);
|
||||
}
|
||||
|
||||
@@ -833,6 +855,9 @@ static struct drm_crtc_funcs nv_crtc_funcs = {
|
||||
.destroy = nv_drm_crtc_destroy,
|
||||
.atomic_duplicate_state = nv_drm_atomic_crtc_duplicate_state,
|
||||
.atomic_destroy_state = nv_drm_atomic_crtc_destroy_state,
|
||||
#if defined(NV_DRM_ATOMIC_HELPER_LEGACY_GAMMA_SET_PRESENT)
|
||||
.gamma_set = drm_atomic_helper_legacy_gamma_set,
|
||||
#endif
|
||||
};
|
||||
|
||||
/*
|
||||
@@ -866,6 +891,198 @@ static int head_modeset_config_attach_connector(
|
||||
return 0;
|
||||
}
|
||||
|
||||
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
|
||||
static int color_mgmt_config_copy_lut(struct NvKmsLutRamps *nvkms_lut,
|
||||
struct drm_color_lut *drm_lut,
|
||||
uint64_t lut_len)
|
||||
{
|
||||
uint64_t i = 0;
|
||||
if (lut_len != NVKMS_LUT_ARRAY_SIZE) {
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/*
|
||||
* Both NvKms and drm LUT values are 16-bit linear values. NvKms LUT ramps
|
||||
* are in arrays in a single struct while drm LUT ramps are an array of
|
||||
* structs.
|
||||
*/
|
||||
for (i = 0; i < lut_len; i++) {
|
||||
nvkms_lut->red[i] = drm_lut[i].red;
|
||||
nvkms_lut->green[i] = drm_lut[i].green;
|
||||
nvkms_lut->blue[i] = drm_lut[i].blue;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void color_mgmt_config_ctm_to_csc(struct NvKmsCscMatrix *nvkms_csc,
|
||||
struct drm_color_ctm *drm_ctm)
|
||||
{
|
||||
int y;
|
||||
|
||||
/* CTM is a 3x3 matrix while ours is 3x4. Zero out the last column. */
|
||||
nvkms_csc->m[0][3] = nvkms_csc->m[1][3] = nvkms_csc->m[2][3] = 0;
|
||||
|
||||
for (y = 0; y < 3; y++) {
|
||||
int x;
|
||||
|
||||
for (x = 0; x < 3; x++) {
|
||||
/*
|
||||
* Values in the CTM are encoded in S31.32 sign-magnitude fixed-
|
||||
* point format, while NvKms CSC values are signed 2's-complement
|
||||
* S15.16 (Ssign-extend12-3.16?) fixed-point format.
|
||||
*/
|
||||
NvU64 ctmVal = drm_ctm->matrix[y*3 + x];
|
||||
NvU64 signBit = ctmVal & (1ULL << 63);
|
||||
NvU64 magnitude = ctmVal & ~signBit;
|
||||
|
||||
/*
|
||||
* Drop the low 16 bits of the fractional part and the high 17 bits
|
||||
* of the integral part. Drop 17 bits to avoid corner cases where
|
||||
* the highest resulting bit is a 1, causing the `cscVal = -cscVal`
|
||||
* line to result in a positive number.
|
||||
*/
|
||||
NvS32 cscVal = (magnitude >> 16) & ((1ULL << 31) - 1);
|
||||
if (signBit) {
|
||||
cscVal = -cscVal;
|
||||
}
|
||||
|
||||
nvkms_csc->m[y][x] = cscVal;
|
||||
}
|
||||
}
|
||||
}
|
||||
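/*
 * Worked example (added for clarity, not in the original source): in S31.32
 * sign-magnitude, 1.0 is 0x0000000100000000; dropping the low 16 fractional
 * bits yields 0x10000 == 65536 == 1.0 in S15.16. Similarly, -0.5 is the sign
 * bit ORed with 0x80000000; the magnitude shifts down to 0x8000 == 32768 and
 * is then negated, giving -32768 == -0.5 in S15.16.
 */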
|
||||
static int color_mgmt_config_set(struct nv_drm_crtc_state *nv_crtc_state,
|
||||
struct NvKmsKapiHeadRequestedConfig *req_config)
|
||||
{
|
||||
struct NvKmsKapiHeadModeSetConfig *modeset_config =
|
||||
&req_config->modeSetConfig;
|
||||
struct drm_crtc_state *crtc_state = &nv_crtc_state->base;
|
||||
int ret = 0;
|
||||
|
||||
struct drm_color_lut *degamma_lut = NULL;
|
||||
struct drm_color_ctm *ctm = NULL;
|
||||
struct drm_color_lut *gamma_lut = NULL;
|
||||
uint64_t degamma_len = 0;
|
||||
uint64_t gamma_len = 0;
|
||||
|
||||
int i;
|
||||
struct drm_plane *plane;
|
||||
struct drm_plane_state *plane_state;
|
||||
|
||||
/*
|
||||
* According to the comment in the Linux kernel's
|
||||
* drivers/gpu/drm/drm_color_mgmt.c, if any of these properties are NULL,
|
||||
* that LUT or CTM needs to be changed to a linear LUT or identity matrix
|
||||
* respectively.
|
||||
*/
|
||||
|
||||
req_config->flags.lutChanged = NV_TRUE;
|
||||
if (crtc_state->degamma_lut) {
|
||||
nv_crtc_state->ilut_ramps = nv_drm_calloc(1, sizeof(*nv_crtc_state->ilut_ramps));
|
||||
if (!nv_crtc_state->ilut_ramps) {
|
||||
ret = -ENOMEM;
|
||||
goto fail;
|
||||
}
|
||||
|
||||
degamma_lut = (struct drm_color_lut *)crtc_state->degamma_lut->data;
|
||||
degamma_len = crtc_state->degamma_lut->length /
|
||||
sizeof(struct drm_color_lut);
|
||||
|
||||
if ((ret = color_mgmt_config_copy_lut(nv_crtc_state->ilut_ramps,
|
||||
degamma_lut,
|
||||
degamma_len)) != 0) {
|
||||
goto fail;
|
||||
}
|
||||
|
||||
modeset_config->lut.input.specified = NV_TRUE;
|
||||
modeset_config->lut.input.depth = 30; /* specify the full LUT */
|
||||
modeset_config->lut.input.start = 0;
|
||||
modeset_config->lut.input.end = degamma_len - 1;
|
||||
modeset_config->lut.input.pRamps = nv_crtc_state->ilut_ramps;
|
||||
} else {
|
||||
/* setting input.end to 0 is equivalent to disabling the LUT, which
|
||||
* should be equivalent to a linear LUT */
|
||||
modeset_config->lut.input.specified = NV_TRUE;
|
||||
modeset_config->lut.input.depth = 30; /* specify the full LUT */
|
||||
modeset_config->lut.input.start = 0;
|
||||
modeset_config->lut.input.end = 0;
|
||||
modeset_config->lut.input.pRamps = NULL;
|
||||
}
|
||||
|
||||
nv_drm_for_each_new_plane_in_state(crtc_state->state, plane,
|
||||
plane_state, i) {
|
||||
struct nv_drm_plane *nv_plane = to_nv_plane(plane);
|
||||
uint32_t layer = nv_plane->layer_idx;
|
||||
struct NvKmsKapiLayerRequestedConfig *layer_config;
|
||||
|
||||
if (layer == NVKMS_KAPI_LAYER_INVALID_IDX || plane_state->crtc != crtc_state->crtc) {
|
||||
continue;
|
||||
}
|
||||
layer_config = &req_config->layerRequestedConfig[layer];
|
||||
|
||||
if (layer == NVKMS_KAPI_LAYER_PRIMARY_IDX && crtc_state->ctm) {
|
||||
ctm = (struct drm_color_ctm *)crtc_state->ctm->data;
|
||||
|
||||
color_mgmt_config_ctm_to_csc(&layer_config->config.csc, ctm);
|
||||
layer_config->config.cscUseMain = NV_FALSE;
|
||||
} else {
|
||||
/* When crtc_state->ctm is unset, this also sets the main layer to
|
||||
* the identity matrix.
|
||||
*/
|
||||
layer_config->config.csc = NVKMS_IDENTITY_CSC_MATRIX;
|
||||
}
|
||||
layer_config->flags.cscChanged = NV_TRUE;
|
||||
}
|
||||
|
||||
if (crtc_state->gamma_lut) {
|
||||
nv_crtc_state->olut_ramps = nv_drm_calloc(1, sizeof(*nv_crtc_state->olut_ramps));
|
||||
if (!nv_crtc_state->olut_ramps) {
|
||||
ret = -ENOMEM;
|
||||
goto fail;
|
||||
}
|
||||
|
||||
gamma_lut = (struct drm_color_lut *)crtc_state->gamma_lut->data;
|
||||
gamma_len = crtc_state->gamma_lut->length /
|
||||
sizeof(struct drm_color_lut);
|
||||
|
||||
if ((ret = color_mgmt_config_copy_lut(nv_crtc_state->olut_ramps,
|
||||
gamma_lut,
|
||||
gamma_len)) != 0) {
|
||||
goto fail;
|
||||
}
|
||||
|
||||
modeset_config->lut.output.specified = NV_TRUE;
|
||||
modeset_config->lut.output.enabled = NV_TRUE;
|
||||
modeset_config->lut.output.pRamps = nv_crtc_state->olut_ramps;
|
||||
} else {
|
||||
/* disabling the output LUT should be equivalent to setting a linear
|
||||
* LUT */
|
||||
modeset_config->lut.output.specified = NV_TRUE;
|
||||
modeset_config->lut.output.enabled = NV_FALSE;
|
||||
modeset_config->lut.output.pRamps = NULL;
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
fail:
|
||||
/* free allocated state */
|
||||
nv_drm_free(nv_crtc_state->ilut_ramps);
|
||||
nv_drm_free(nv_crtc_state->olut_ramps);
|
||||
|
||||
/* remove dangling pointers */
|
||||
nv_crtc_state->ilut_ramps = NULL;
|
||||
nv_crtc_state->olut_ramps = NULL;
|
||||
modeset_config->lut.input.pRamps = NULL;
|
||||
modeset_config->lut.output.pRamps = NULL;
|
||||
|
||||
/* prevent attempts at reading NULLs */
|
||||
modeset_config->lut.input.specified = NV_FALSE;
|
||||
modeset_config->lut.output.specified = NV_FALSE;
|
||||
|
||||
return ret;
|
||||
}
|
||||
#endif /* NV_DRM_COLOR_MGMT_AVAILABLE */
|
||||
|
||||
/**
|
||||
* nv_drm_crtc_atomic_check() can fail after it has modified
|
||||
* the 'nv_drm_crtc_state::req_config', that is fine because 'nv_drm_crtc_state'
|
||||
@@ -887,6 +1104,9 @@ static int nv_drm_crtc_atomic_check(struct drm_crtc *crtc,
|
||||
struct NvKmsKapiHeadRequestedConfig *req_config =
|
||||
&nv_crtc_state->req_config;
|
||||
int ret = 0;
|
||||
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
|
||||
struct nv_drm_device *nv_dev = to_nv_device(crtc_state->crtc->dev);
|
||||
#endif
|
||||
|
||||
if (crtc_state->mode_changed) {
|
||||
drm_mode_to_nvkms_display_mode(&crtc_state->mode,
|
||||
@@ -925,6 +1145,25 @@ static int nv_drm_crtc_atomic_check(struct drm_crtc *crtc,
|
||||
req_config->flags.activeChanged = NV_TRUE;
|
||||
}
|
||||
|
||||
#if defined(NV_DRM_CRTC_STATE_HAS_VRR_ENABLED)
|
||||
req_config->modeSetConfig.vrrEnabled = crtc_state->vrr_enabled;
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
|
||||
if (nv_dev->drmMasterChangedSinceLastAtomicCommit &&
|
||||
(crtc_state->degamma_lut ||
|
||||
crtc_state->ctm ||
|
||||
crtc_state->gamma_lut)) {
|
||||
|
||||
crtc_state->color_mgmt_changed = NV_TRUE;
|
||||
}
|
||||
if (crtc_state->color_mgmt_changed) {
|
||||
if ((ret = color_mgmt_config_set(nv_crtc_state, req_config)) != 0) {
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
@@ -1156,6 +1395,8 @@ nv_drm_plane_create(struct drm_device *dev,
|
||||
plane,
|
||||
validLayerRRTransforms);
|
||||
|
||||
nv_drm_free(formats);
|
||||
|
||||
return plane;
|
||||
|
||||
failed_plane_init:
|
||||
@@ -1220,6 +1461,22 @@ static struct drm_crtc *__nv_drm_crtc_create(struct nv_drm_device *nv_dev,
|
||||
|
||||
drm_crtc_helper_add(&nv_crtc->base, &nv_crtc_helper_funcs);
|
||||
|
||||
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
|
||||
#if defined(NV_DRM_CRTC_ENABLE_COLOR_MGMT_PRESENT)
|
||||
drm_crtc_enable_color_mgmt(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE, true,
|
||||
NVKMS_LUT_ARRAY_SIZE);
|
||||
#else
|
||||
drm_helper_crtc_enable_color_mgmt(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE,
|
||||
NVKMS_LUT_ARRAY_SIZE);
|
||||
#endif
|
||||
ret = drm_mode_crtc_set_gamma_size(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE);
|
||||
if (ret != 0) {
|
||||
NV_DRM_DEV_LOG_WARN(
|
||||
nv_dev,
|
||||
"Failed to initialize legacy gamma support for head %u", head);
|
||||
}
|
||||
#endif
|
||||
|
||||
return &nv_crtc->base;
|
||||
|
||||
failed_init_crtc:
|
||||
@@ -1328,10 +1585,16 @@ static void NvKmsKapiCrcsToDrm(const struct NvKmsKapiCrcs *crcs,
|
||||
{
|
||||
drmCrcs->outputCrc32.value = crcs->outputCrc32.value;
|
||||
drmCrcs->outputCrc32.supported = crcs->outputCrc32.supported;
|
||||
drmCrcs->outputCrc32.__pad0 = 0;
|
||||
drmCrcs->outputCrc32.__pad1 = 0;
|
||||
drmCrcs->rasterGeneratorCrc32.value = crcs->rasterGeneratorCrc32.value;
|
||||
drmCrcs->rasterGeneratorCrc32.supported = crcs->rasterGeneratorCrc32.supported;
|
||||
drmCrcs->rasterGeneratorCrc32.__pad0 = 0;
|
||||
drmCrcs->rasterGeneratorCrc32.__pad1 = 0;
|
||||
drmCrcs->compositorCrc32.value = crcs->compositorCrc32.value;
|
||||
drmCrcs->compositorCrc32.supported = crcs->compositorCrc32.supported;
|
||||
drmCrcs->compositorCrc32.__pad0 = 0;
|
||||
drmCrcs->compositorCrc32.__pad1 = 0;
|
||||
}
|
||||
|
||||
int nv_drm_get_crtc_crc32_v2_ioctl(struct drm_device *dev,
|
||||
|
||||
@@ -129,6 +129,9 @@ struct nv_drm_crtc_state {
|
||||
*/
|
||||
struct NvKmsKapiHeadRequestedConfig req_config;
|
||||
|
||||
struct NvKmsLutRamps *ilut_ramps;
|
||||
struct NvKmsLutRamps *olut_ramps;
|
||||
|
||||
/**
|
||||
* @nv_flip:
|
||||
*
|
||||
|
||||
@@ -44,6 +44,10 @@
|
||||
#include <drm/drmP.h>
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_DRM_ATOMIC_UAPI_H_PRESENT)
|
||||
#include <drm/drm_atomic_uapi.h>
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_DRM_VBLANK_H_PRESENT)
|
||||
#include <drm/drm_vblank.h>
|
||||
#endif
|
||||
@@ -60,6 +64,15 @@
|
||||
#include <drm/drm_ioctl.h>
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
|
||||
#include <drm/drm_aperture.h>
|
||||
#include <drm/drm_fb_helper.h>
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_DRM_FBDEV_GENERIC_H_PRESENT)
|
||||
#include <drm/drm_fbdev_generic.h>
|
||||
#endif
|
||||
|
||||
#include <linux/pci.h>
|
||||
|
||||
/*
|
||||
@@ -84,6 +97,11 @@
|
||||
#include <drm/drm_atomic_helper.h>
|
||||
#endif
|
||||
|
||||
static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
|
||||
struct drm_file *filep,
|
||||
NvU32 dpyId);
|
||||
static int nv_drm_revoke_sub_ownership(struct drm_device *dev);
|
||||
|
||||
static struct nv_drm_device *dev_list = NULL;
|
||||
|
||||
static const char* nv_get_input_colorspace_name(
|
||||
@@ -460,6 +478,11 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)
|
||||
|
||||
nv_dev->supportsSyncpts = resInfo.caps.supportsSyncpts;
|
||||
|
||||
nv_dev->semsurf_stride = resInfo.caps.semsurf.stride;
|
||||
|
||||
nv_dev->semsurf_max_submitted_offset =
|
||||
resInfo.caps.semsurf.maxSubmittedOffset;
|
||||
|
||||
#if defined(NV_DRM_FORMAT_MODIFIERS_PRESENT)
|
||||
gen = nv_dev->pageKindGeneration;
|
||||
kind = nv_dev->genericPageKind;
|
||||
@@ -546,6 +569,8 @@ static void __nv_drm_unload(struct drm_device *dev)
|
||||
|
||||
mutex_lock(&nv_dev->lock);
|
||||
|
||||
WARN_ON(nv_dev->subOwnershipGranted);
|
||||
|
||||
/* Disable event handling */
|
||||
|
||||
atomic_set(&nv_dev->enable_event_handling, false);
|
||||
@@ -595,9 +620,15 @@ static int __nv_drm_master_set(struct drm_device *dev,
|
||||
{
|
||||
struct nv_drm_device *nv_dev = to_nv_device(dev);
|
||||
|
||||
if (!nvKms->grabOwnership(nv_dev->pDevice)) {
|
||||
/*
|
||||
* If this device is driving a framebuffer, then nvidia-drm already has
|
||||
* modeset ownership. Otherwise, grab ownership now.
|
||||
*/
|
||||
if (!nv_dev->hasFramebufferConsole &&
|
||||
!nvKms->grabOwnership(nv_dev->pDevice)) {
|
||||
return -EINVAL;
|
||||
}
|
||||
nv_dev->drmMasterChangedSinceLastAtomicCommit = NV_TRUE;
|
||||
|
||||
return 0;
|
||||
}
|
||||
@@ -631,6 +662,9 @@ void nv_drm_master_drop(struct drm_device *dev, struct drm_file *file_priv)
|
||||
struct nv_drm_device *nv_dev = to_nv_device(dev);
|
||||
int err;
|
||||
|
||||
nv_drm_revoke_modeset_permission(dev, file_priv, 0);
|
||||
nv_drm_revoke_sub_ownership(dev);
|
||||
|
||||
/*
|
||||
* After dropping nvkms modeset ownership, it is not guaranteed that
|
||||
* drm and nvkms modeset state will remain in sync. Therefore, disable
|
||||
@@ -655,7 +689,9 @@ void nv_drm_master_drop(struct drm_device *dev, struct drm_file *file_priv)
|
||||
|
||||
drm_modeset_unlock_all(dev);
|
||||
|
||||
nvKms->releaseOwnership(nv_dev->pDevice);
|
||||
if (!nv_dev->hasFramebufferConsole) {
|
||||
nvKms->releaseOwnership(nv_dev->pDevice);
|
||||
}
|
||||
}
|
||||
#endif /* NV_DRM_ATOMIC_MODESET_AVAILABLE */
|
||||
|
||||
@@ -693,15 +729,24 @@ static int nv_drm_get_dev_info_ioctl(struct drm_device *dev,
|
||||
|
||||
params->gpu_id = nv_dev->gpu_info.gpu_id;
|
||||
params->primary_index = dev->primary->index;
|
||||
params->generic_page_kind = 0;
|
||||
params->page_kind_generation = 0;
|
||||
params->sector_layout = 0;
|
||||
params->supports_sync_fd = false;
|
||||
params->supports_semsurf = false;
|
||||
|
||||
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
|
||||
params->generic_page_kind = nv_dev->genericPageKind;
|
||||
params->page_kind_generation = nv_dev->pageKindGeneration;
|
||||
params->sector_layout = nv_dev->sectorLayout;
|
||||
#else
|
||||
params->generic_page_kind = 0;
|
||||
params->page_kind_generation = 0;
|
||||
params->sector_layout = 0;
|
||||
#endif
|
||||
/* Semaphore surfaces are only supported if the modeset = 1 parameter is set */
|
||||
if ((nv_dev->pDevice) != NULL && (nv_dev->semsurf_stride != 0)) {
|
||||
params->supports_semsurf = true;
|
||||
#if defined(NV_SYNC_FILE_GET_FENCE_PRESENT)
|
||||
params->supports_sync_fd = true;
|
||||
#endif /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
|
||||
}
|
||||
#endif /* defined(NV_DRM_ATOMIC_MODESET_AVAILABLE) */
|
||||
|
||||
return 0;
|
||||
}
|
||||
@@ -833,10 +878,10 @@ static NvU32 nv_drm_get_head_bit_from_connector(struct drm_connector *connector)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int nv_drm_grant_permission_ioctl(struct drm_device *dev, void *data,
|
||||
struct drm_file *filep)
|
||||
static int nv_drm_grant_modeset_permission(struct drm_device *dev,
|
||||
struct drm_nvidia_grant_permissions_params *params,
|
||||
struct drm_file *filep)
|
||||
{
|
||||
struct drm_nvidia_grant_permissions_params *params = data;
|
||||
struct nv_drm_device *nv_dev = to_nv_device(dev);
|
||||
struct nv_drm_connector *target_nv_connector = NULL;
|
||||
struct nv_drm_crtc *target_nv_crtc = NULL;
|
||||
@@ -958,26 +1003,102 @@ done:
|
||||
return ret;
|
||||
}
|
||||
|
||||
static bool nv_drm_revoke_connector(struct nv_drm_device *nv_dev,
|
||||
struct nv_drm_connector *nv_connector)
|
||||
static int nv_drm_grant_sub_ownership(struct drm_device *dev,
|
||||
struct drm_nvidia_grant_permissions_params *params)
|
||||
{
|
||||
bool ret = true;
|
||||
if (nv_connector->modeset_permission_crtc) {
|
||||
if (nv_connector->nv_detected_encoder) {
|
||||
ret = nvKms->revokePermissions(
|
||||
nv_dev->pDevice, nv_connector->modeset_permission_crtc->head,
|
||||
nv_connector->nv_detected_encoder->hDisplay);
|
||||
}
|
||||
nv_connector->modeset_permission_crtc->modeset_permission_filep = NULL;
|
||||
nv_connector->modeset_permission_crtc = NULL;
|
||||
int ret = -EINVAL;
|
||||
struct nv_drm_device *nv_dev = to_nv_device(dev);
|
||||
struct drm_modeset_acquire_ctx *pctx;
|
||||
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
|
||||
struct drm_modeset_acquire_ctx ctx;
|
||||
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
|
||||
ret);
|
||||
pctx = &ctx;
|
||||
#else
|
||||
mutex_lock(&dev->mode_config.mutex);
|
||||
pctx = dev->mode_config.acquire_ctx;
|
||||
#endif
|
||||
|
||||
if (nv_dev->subOwnershipGranted ||
|
||||
!nvKms->grantSubOwnership(params->fd, nv_dev->pDevice)) {
|
||||
goto done;
|
||||
}
|
||||
nv_connector->modeset_permission_filep = NULL;
|
||||
return ret;
|
||||
|
||||
/*
|
||||
* When creating an ownership grant, shut down all heads and disable flip
|
||||
* notifications.
|
||||
*/
|
||||
ret = nv_drm_atomic_helper_disable_all(dev, pctx);
|
||||
if (ret != 0) {
|
||||
NV_DRM_DEV_LOG_ERR(
|
||||
nv_dev,
|
||||
"nv_drm_atomic_helper_disable_all failed with error code %d!",
|
||||
ret);
|
||||
}
|
||||
|
||||
atomic_set(&nv_dev->enable_event_handling, false);
|
||||
nv_dev->subOwnershipGranted = NV_TRUE;
|
||||
|
||||
ret = 0;
|
||||
|
||||
done:
|
||||
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
|
||||
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
|
||||
#else
|
||||
mutex_unlock(&dev->mode_config.mutex);
|
||||
#endif
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int nv_drm_revoke_permission(struct drm_device *dev,
|
||||
struct drm_file *filep, NvU32 dpyId)
|
||||
static int nv_drm_grant_permission_ioctl(struct drm_device *dev, void *data,
|
||||
struct drm_file *filep)
|
||||
{
|
||||
struct drm_nvidia_grant_permissions_params *params = data;
|
||||
|
||||
if (params->type == NV_DRM_PERMISSIONS_TYPE_MODESET) {
|
||||
return nv_drm_grant_modeset_permission(dev, params, filep);
|
||||
} else if (params->type == NV_DRM_PERMISSIONS_TYPE_SUB_OWNER) {
|
||||
return nv_drm_grant_sub_ownership(dev, params);
|
||||
}
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
static int
|
||||
nv_drm_atomic_disable_connector(struct drm_atomic_state *state,
|
||||
struct nv_drm_connector *nv_connector)
|
||||
{
|
||||
struct drm_crtc_state *crtc_state;
|
||||
struct drm_connector_state *connector_state;
|
||||
int ret = 0;
|
||||
|
||||
if (nv_connector->modeset_permission_crtc) {
|
||||
crtc_state = drm_atomic_get_crtc_state(
|
||||
state, &nv_connector->modeset_permission_crtc->base);
|
||||
if (!crtc_state) {
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
crtc_state->active = false;
|
||||
ret = drm_atomic_set_mode_prop_for_crtc(crtc_state, NULL);
|
||||
if (ret < 0) {
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
|
||||
connector_state = drm_atomic_get_connector_state(state, &nv_connector->base);
|
||||
if (!connector_state) {
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
return drm_atomic_set_crtc_for_connector(connector_state, NULL);
|
||||
}
|
||||
|
||||
static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
|
||||
struct drm_file *filep, NvU32 dpyId)
|
||||
{
|
||||
struct drm_modeset_acquire_ctx *pctx;
|
||||
struct drm_atomic_state *state;
|
||||
struct drm_connector *connector;
|
||||
struct drm_crtc *crtc;
|
||||
int ret = 0;
|
||||
@@ -988,10 +1109,19 @@ static int nv_drm_revoke_permission(struct drm_device *dev,
|
||||
struct drm_modeset_acquire_ctx ctx;
|
||||
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
|
||||
ret);
|
||||
pctx = &ctx;
|
||||
#else
|
||||
mutex_lock(&dev->mode_config.mutex);
|
||||
pctx = dev->mode_config.acquire_ctx;
|
||||
#endif
|
||||
|
||||
state = drm_atomic_state_alloc(dev);
|
||||
if (!state) {
|
||||
ret = -ENOMEM;
|
||||
goto done;
|
||||
}
|
||||
state->acquire_ctx = pctx;
|
||||
|
||||
/*
|
||||
* If dpyId is set, only revoke those specific resources. Otherwise,
|
||||
* it is from closing the file so revoke all resources for that filep.
|
||||
@@ -1003,10 +1133,13 @@ static int nv_drm_revoke_permission(struct drm_device *dev,
|
||||
struct nv_drm_connector *nv_connector = to_nv_connector(connector);
|
||||
if (nv_connector->modeset_permission_filep == filep &&
|
||||
(!dpyId || nv_drm_connector_is_dpy_id(connector, dpyId))) {
|
||||
if (!nv_drm_connector_revoke_permissions(dev, nv_connector)) {
|
||||
ret = -EINVAL;
|
||||
// Continue trying to revoke as much as possible.
|
||||
ret = nv_drm_atomic_disable_connector(state, nv_connector);
|
||||
if (ret < 0) {
|
||||
goto done;
|
||||
}
|
||||
|
||||
// Continue trying to revoke as much as possible.
|
||||
nv_drm_connector_revoke_permissions(dev, nv_connector);
|
||||
}
|
||||
}
|
||||
#if defined(NV_DRM_CONNECTOR_LIST_ITER_PRESENT)
|
||||
@@ -1020,6 +1153,25 @@ static int nv_drm_revoke_permission(struct drm_device *dev,
|
||||
}
|
||||
}
|
||||
|
||||
ret = drm_atomic_commit(state);
|
||||
done:
|
||||
#if defined(NV_DRM_ATOMIC_STATE_REF_COUNTING_PRESENT)
|
||||
drm_atomic_state_put(state);
|
||||
#else
|
||||
if (ret != 0) {
|
||||
drm_atomic_state_free(state);
|
||||
} else {
|
||||
/*
|
||||
* In case of success, drm_atomic_commit() takes care to cleanup and
|
||||
* free @state.
|
||||
*
|
||||
* Comment placed above drm_atomic_commit() says: The caller must not
|
||||
* free or in any other way access @state. If the function fails then
|
||||
* the caller must clean up @state itself.
|
||||
*/
|
||||
}
|
||||
#endif
|
||||
|
||||
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
|
||||
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
|
||||
#else
|
||||
@@ -1029,14 +1181,55 @@ static int nv_drm_revoke_permission(struct drm_device *dev,
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int nv_drm_revoke_sub_ownership(struct drm_device *dev)
|
||||
{
|
||||
int ret = -EINVAL;
|
||||
struct nv_drm_device *nv_dev = to_nv_device(dev);
|
||||
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
|
||||
struct drm_modeset_acquire_ctx ctx;
|
||||
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
|
||||
ret);
|
||||
#else
|
||||
mutex_lock(&dev->mode_config.mutex);
|
||||
#endif
|
||||
|
||||
if (!nv_dev->subOwnershipGranted) {
|
||||
goto done;
|
||||
}
|
||||
|
||||
if (!nvKms->revokeSubOwnership(nv_dev->pDevice)) {
|
||||
NV_DRM_DEV_LOG_ERR(nv_dev, "Failed to revoke sub-ownership from NVKMS");
|
||||
goto done;
|
||||
}
|
||||
|
||||
nv_dev->subOwnershipGranted = NV_FALSE;
|
||||
atomic_set(&nv_dev->enable_event_handling, true);
|
||||
ret = 0;
|
||||
|
||||
done:
|
||||
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
|
||||
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
|
||||
#else
|
||||
mutex_unlock(&dev->mode_config.mutex);
|
||||
#endif
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int nv_drm_revoke_permission_ioctl(struct drm_device *dev, void *data,
|
||||
struct drm_file *filep)
|
||||
{
|
||||
struct drm_nvidia_revoke_permissions_params *params = data;
|
||||
if (!params->dpyId) {
|
||||
return -EINVAL;
|
||||
|
||||
if (params->type == NV_DRM_PERMISSIONS_TYPE_MODESET) {
|
||||
if (!params->dpyId) {
|
||||
return -EINVAL;
|
||||
}
|
||||
return nv_drm_revoke_modeset_permission(dev, filep, params->dpyId);
|
||||
} else if (params->type == NV_DRM_PERMISSIONS_TYPE_SUB_OWNER) {
|
||||
return nv_drm_revoke_sub_ownership(dev);
|
||||
}
|
||||
return nv_drm_revoke_permission(dev, filep, params->dpyId);
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
static void nv_drm_postclose(struct drm_device *dev, struct drm_file *filep)
|
||||
@@ -1051,7 +1244,7 @@ static void nv_drm_postclose(struct drm_device *dev, struct drm_file *filep)
|
||||
dev->mode_config.num_connector > 0 &&
|
||||
dev->mode_config.connector_list.next != NULL &&
|
||||
dev->mode_config.connector_list.prev != NULL) {
|
||||
nv_drm_revoke_permission(dev, filep, 0);
|
||||
nv_drm_revoke_modeset_permission(dev, filep, 0);
|
||||
}
|
||||
}
|
||||
#endif /* NV_DRM_ATOMIC_MODESET_AVAILABLE */
|
||||
@@ -1310,6 +1503,18 @@ static const struct drm_ioctl_desc nv_drm_ioctls[] = {
|
||||
DRM_IOCTL_DEF_DRV(NVIDIA_GEM_PRIME_FENCE_ATTACH,
|
||||
nv_drm_gem_prime_fence_attach_ioctl,
|
||||
DRM_RENDER_ALLOW|DRM_UNLOCKED),
|
||||
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_CTX_CREATE,
|
||||
nv_drm_semsurf_fence_ctx_create_ioctl,
|
||||
DRM_RENDER_ALLOW|DRM_UNLOCKED),
|
||||
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_CREATE,
|
||||
nv_drm_semsurf_fence_create_ioctl,
|
||||
DRM_RENDER_ALLOW|DRM_UNLOCKED),
|
||||
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_WAIT,
|
||||
nv_drm_semsurf_fence_wait_ioctl,
|
||||
DRM_RENDER_ALLOW|DRM_UNLOCKED),
|
||||
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_ATTACH,
|
||||
nv_drm_semsurf_fence_attach_ioctl,
|
||||
DRM_RENDER_ALLOW|DRM_UNLOCKED),
|
||||
#endif
|
||||
|
||||
DRM_IOCTL_DEF_DRV(NVIDIA_GET_CLIENT_CAPABILITY,
|
||||
@@ -1367,8 +1572,23 @@ static struct drm_driver nv_drm_driver = {
|
||||
.ioctls = nv_drm_ioctls,
|
||||
.num_ioctls = ARRAY_SIZE(nv_drm_ioctls),
|
||||
|
||||
/*
|
||||
* linux-next commit 71a7974ac701 ("drm/prime: Unexport helpers for fd/handle
|
||||
* conversion") unexports drm_gem_prime_handle_to_fd() and
|
||||
* drm_gem_prime_fd_to_handle().
|
||||
*
|
||||
* Prior linux-next commit 6b85aa68d9d5 ("drm: Enable PRIME import/export for
|
||||
* all drivers") made these helpers the default when .prime_handle_to_fd /
|
||||
* .prime_fd_to_handle are unspecified, so it's fine to just skip specifying
|
||||
* them if the helpers aren't present.
|
||||
*/
|
||||
#if NV_IS_EXPORT_SYMBOL_PRESENT_drm_gem_prime_handle_to_fd
|
||||
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
|
||||
#endif
|
||||
#if NV_IS_EXPORT_SYMBOL_PRESENT_drm_gem_prime_fd_to_handle
|
||||
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
|
||||
#endif
|
||||
|
||||
.gem_prime_import = nv_drm_gem_prime_import,
|
||||
.gem_prime_import_sg_table = nv_drm_gem_prime_import_sg_table,
|
||||
|
||||
@@ -1498,6 +1718,30 @@ static void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
|
||||
goto failed_drm_register;
|
||||
}
|
||||
|
||||
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
|
||||
if (nv_drm_fbdev_module_param &&
|
||||
drm_core_check_feature(dev, DRIVER_MODESET)) {
|
||||
|
||||
if (!nvKms->grabOwnership(nv_dev->pDevice)) {
|
||||
NV_DRM_DEV_LOG_ERR(nv_dev, "Failed to grab NVKMS modeset ownership");
|
||||
goto failed_grab_ownership;
|
||||
}
|
||||
|
||||
if (device->bus == &pci_bus_type) {
|
||||
struct pci_dev *pdev = to_pci_dev(device);
|
||||
|
||||
#if defined(NV_DRM_APERTURE_REMOVE_CONFLICTING_PCI_FRAMEBUFFERS_HAS_DRIVER_ARG)
|
||||
drm_aperture_remove_conflicting_pci_framebuffers(pdev, &nv_drm_driver);
|
||||
#else
|
||||
drm_aperture_remove_conflicting_pci_framebuffers(pdev, nv_drm_driver.name);
|
||||
#endif
|
||||
}
|
||||
drm_fbdev_generic_setup(dev, 32);
|
||||
|
||||
nv_dev->hasFramebufferConsole = NV_TRUE;
|
||||
}
|
||||
#endif /* defined(NV_DRM_FBDEV_GENERIC_AVAILABLE) */
|
||||
|
||||
/* Add NVIDIA-DRM device into list */
|
||||
|
||||
nv_dev->next = dev_list;
|
||||
@@ -1505,6 +1749,12 @@ static void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
|
||||
|
||||
return; /* Success */
|
||||
|
||||
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
|
||||
failed_grab_ownership:
|
||||
|
||||
drm_dev_unregister(dev);
|
||||
#endif
|
||||
|
||||
failed_drm_register:
|
||||
|
||||
nv_drm_dev_free(dev);
|
||||
@@ -1567,9 +1817,16 @@ void nv_drm_remove_devices(void)
|
||||
{
|
||||
while (dev_list != NULL) {
|
||||
struct nv_drm_device *next = dev_list->next;
|
||||
struct drm_device *dev = dev_list->dev;
|
||||
|
||||
drm_dev_unregister(dev_list->dev);
|
||||
nv_drm_dev_free(dev_list->dev);
|
||||
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
|
||||
if (dev_list->hasFramebufferConsole) {
|
||||
drm_atomic_helper_shutdown(dev);
|
||||
nvKms->releaseOwnership(dev_list->pDevice);
|
||||
}
|
||||
#endif
|
||||
drm_dev_unregister(dev);
|
||||
nv_drm_dev_free(dev);
|
||||
|
||||
nv_drm_free(dev_list);
|
||||
|
||||
|
||||
File diff suppressed because it is too large
@@ -41,6 +41,22 @@ int nv_drm_prime_fence_context_create_ioctl(struct drm_device *dev,
|
||||
int nv_drm_gem_prime_fence_attach_ioctl(struct drm_device *dev,
|
||||
void *data, struct drm_file *filep);
|
||||
|
||||
int nv_drm_semsurf_fence_ctx_create_ioctl(struct drm_device *dev,
|
||||
void *data,
|
||||
struct drm_file *filep);
|
||||
|
||||
int nv_drm_semsurf_fence_create_ioctl(struct drm_device *dev,
|
||||
void *data,
|
||||
struct drm_file *filep);
|
||||
|
||||
int nv_drm_semsurf_fence_wait_ioctl(struct drm_device *dev,
|
||||
void *data,
|
||||
struct drm_file *filep);
|
||||
|
||||
int nv_drm_semsurf_fence_attach_ioctl(struct drm_device *dev,
|
||||
void *data,
|
||||
struct drm_file *filep);
|
||||
|
||||
#endif /* NV_DRM_FENCE_AVAILABLE */
|
||||
|
||||
#endif /* NV_DRM_AVAILABLE */
|
||||
|
||||
@@ -465,7 +465,7 @@ int nv_drm_gem_alloc_nvkms_memory_ioctl(struct drm_device *dev,
|
||||
goto failed;
|
||||
}
|
||||
|
||||
if (p->__pad != 0) {
|
||||
if ((p->__pad0 != 0) || (p->__pad1 != 0)) {
|
||||
ret = -EINVAL;
|
||||
NV_DRM_DEV_LOG_ERR(nv_dev, "non-zero value in padding field");
|
||||
goto failed;
|
||||
|
||||
@@ -95,6 +95,16 @@ static inline struct nv_drm_gem_object *to_nv_gem_object(
|
||||
* 3e70fd160cf0b1945225eaa08dd2cb8544f21cb8 (2018-11-15).
|
||||
*/
|
||||
|
||||
static inline void
|
||||
nv_drm_gem_object_reference(struct nv_drm_gem_object *nv_gem)
|
||||
{
|
||||
#if defined(NV_DRM_GEM_OBJECT_GET_PRESENT)
|
||||
drm_gem_object_get(&nv_gem->base);
|
||||
#else
|
||||
drm_gem_object_reference(&nv_gem->base);
|
||||
#endif
|
||||
}
|
||||
|
||||
static inline void
|
||||
nv_drm_gem_object_unreference_unlocked(struct nv_drm_gem_object *nv_gem)
|
||||
{
|
||||
|
||||
@@ -306,6 +306,36 @@ int nv_drm_atomic_helper_disable_all(struct drm_device *dev,
|
||||
for_each_plane_in_state(__state, plane, plane_state, __i)
|
||||
#endif
|
||||
|
||||
/*
 * for_each_new_plane_in_state() was added by kernel commit
 * 581e49fe6b411f407102a7f2377648849e0fa37f which was Signed-off-by:
 * Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
 * Daniel Vetter <daniel.vetter@ffwll.ch>
 *
 * This commit also added the old_state and new_state pointers to
 * __drm_planes_state. Because of this, the best that can be done on kernel
 * versions without this macro is for_each_plane_in_state.
 */

/**
 * nv_drm_for_each_new_plane_in_state - iterate over all planes in an atomic update
 * @__state: &struct drm_atomic_state pointer
 * @plane: &struct drm_plane iteration cursor
 * @new_plane_state: &struct drm_plane_state iteration cursor for the new state
 * @__i: int iteration cursor, for macro-internal use
 *
 * This iterates over all planes in an atomic update, tracking only the new
 * state. This is useful in enable functions, where we need the new state the
 * hardware should be in when the atomic commit operation has completed.
 */
#if !defined(for_each_new_plane_in_state)
#define nv_drm_for_each_new_plane_in_state(__state, plane, new_plane_state, __i) \
    nv_drm_for_each_plane_in_state(__state, plane, new_plane_state, __i)
#else
#define nv_drm_for_each_new_plane_in_state(__state, plane, new_plane_state, __i) \
    for_each_new_plane_in_state(__state, plane, new_plane_state, __i)
#endif
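As a quick illustration of the wrapper above, a caller can iterate the new plane states the same way on every supported kernel. This is a minimal hedged sketch (the helper name is hypothetical, not part of the driver):

```c
#include <drm/drm_atomic.h>

/* Hypothetical helper: walk the new plane states of an atomic update using
 * the version-agnostic wrapper above. */
static void example_walk_new_planes(struct drm_atomic_state *state)
{
    struct drm_plane *plane;
    struct drm_plane_state *new_plane_state;
    int i;

    nv_drm_for_each_new_plane_in_state(state, plane, new_plane_state, i) {
        /* On kernels without for_each_new_plane_in_state() this degrades to
         * for_each_plane_in_state(), so new_plane_state is the best
         * approximation available. */
        (void)new_plane_state;
    }
}
```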
|
||||
static inline struct drm_connector *
|
||||
nv_drm_connector_lookup(struct drm_device *dev, struct drm_file *filep,
|
||||
uint32_t id)
|
||||
|
||||
@@ -48,6 +48,10 @@
|
||||
#define DRM_NVIDIA_GET_CONNECTOR_ID_FOR_DPY_ID 0x11
|
||||
#define DRM_NVIDIA_GRANT_PERMISSIONS 0x12
|
||||
#define DRM_NVIDIA_REVOKE_PERMISSIONS 0x13
|
||||
#define DRM_NVIDIA_SEMSURF_FENCE_CTX_CREATE 0x14
|
||||
#define DRM_NVIDIA_SEMSURF_FENCE_CREATE 0x15
|
||||
#define DRM_NVIDIA_SEMSURF_FENCE_WAIT 0x16
|
||||
#define DRM_NVIDIA_SEMSURF_FENCE_ATTACH 0x17
|
||||
|
||||
#define DRM_IOCTL_NVIDIA_GEM_IMPORT_NVKMS_MEMORY \
|
||||
DRM_IOWR((DRM_COMMAND_BASE + DRM_NVIDIA_GEM_IMPORT_NVKMS_MEMORY), \
|
||||
@@ -133,6 +137,26 @@
|
||||
DRM_IOWR((DRM_COMMAND_BASE + DRM_NVIDIA_REVOKE_PERMISSIONS), \
|
||||
struct drm_nvidia_revoke_permissions_params)
|
||||
|
||||
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_CTX_CREATE \
|
||||
DRM_IOWR((DRM_COMMAND_BASE + \
|
||||
DRM_NVIDIA_SEMSURF_FENCE_CTX_CREATE), \
|
||||
struct drm_nvidia_semsurf_fence_ctx_create_params)
|
||||
|
||||
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_CREATE \
|
||||
DRM_IOWR((DRM_COMMAND_BASE + \
|
||||
DRM_NVIDIA_SEMSURF_FENCE_CREATE), \
|
||||
struct drm_nvidia_semsurf_fence_create_params)
|
||||
|
||||
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_WAIT \
|
||||
DRM_IOW((DRM_COMMAND_BASE + \
|
||||
DRM_NVIDIA_SEMSURF_FENCE_WAIT), \
|
||||
struct drm_nvidia_semsurf_fence_wait_params)
|
||||
|
||||
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_ATTACH \
|
||||
DRM_IOW((DRM_COMMAND_BASE + \
|
||||
DRM_NVIDIA_SEMSURF_FENCE_ATTACH), \
|
||||
struct drm_nvidia_semsurf_fence_attach_params)
|
||||
|
||||
struct drm_nvidia_gem_import_nvkms_memory_params {
|
||||
uint64_t mem_size; /* IN */
|
||||
|
||||
@@ -158,6 +182,8 @@ struct drm_nvidia_get_dev_info_params {
|
||||
uint32_t generic_page_kind; /* OUT */
|
||||
uint32_t page_kind_generation; /* OUT */
|
||||
uint32_t sector_layout; /* OUT */
|
||||
uint32_t supports_sync_fd; /* OUT */
|
||||
uint32_t supports_semsurf; /* OUT */
|
||||
};
|
||||
|
||||
struct drm_nvidia_prime_fence_context_create_params {
|
||||
@@ -179,6 +205,7 @@ struct drm_nvidia_gem_prime_fence_attach_params {
|
||||
uint32_t handle; /* IN GEM handle to attach fence to */
|
||||
uint32_t fence_context_handle; /* IN GEM handle to fence context on which fence is run on */
|
||||
uint32_t sem_thresh; /* IN Semaphore value to reach before signal */
|
||||
uint32_t __pad;
|
||||
};
|
||||
|
||||
struct drm_nvidia_get_client_capability_params {
|
||||
@@ -190,6 +217,8 @@ struct drm_nvidia_get_client_capability_params {
|
||||
struct drm_nvidia_crtc_crc32 {
|
||||
uint32_t value; /* Read value, undefined if supported is false */
|
||||
uint8_t supported; /* Supported boolean, true if readable by hardware */
|
||||
uint8_t __pad0;
|
||||
uint16_t __pad1;
|
||||
};
|
||||
|
||||
struct drm_nvidia_crtc_crc32_v2_out {
|
||||
@@ -229,10 +258,11 @@ struct drm_nvidia_gem_alloc_nvkms_memory_params {
|
||||
uint32_t handle; /* OUT */
|
||||
uint8_t block_linear; /* IN */
|
||||
uint8_t compressible; /* IN/OUT */
|
||||
uint16_t __pad;
|
||||
uint16_t __pad0;
|
||||
|
||||
uint64_t memory_size; /* IN */
|
||||
uint32_t flags; /* IN */
|
||||
uint32_t __pad1;
|
||||
};
|
||||
|
||||
struct drm_nvidia_gem_export_dmabuf_memory_params {
|
||||
@@ -266,13 +296,90 @@ struct drm_nvidia_get_connector_id_for_dpy_id_params {
|
||||
uint32_t connectorId; /* OUT */
|
||||
};
|
||||
|
||||
enum drm_nvidia_permissions_type {
|
||||
NV_DRM_PERMISSIONS_TYPE_MODESET = 2,
|
||||
NV_DRM_PERMISSIONS_TYPE_SUB_OWNER = 3
|
||||
};
|
||||
|
||||
struct drm_nvidia_grant_permissions_params {
|
||||
int32_t fd; /* IN */
|
||||
uint32_t dpyId; /* IN */
|
||||
uint32_t type; /* IN */
|
||||
};
|
||||
|
||||
struct drm_nvidia_revoke_permissions_params {
|
||||
uint32_t dpyId; /* IN */
|
||||
uint32_t type; /* IN */
|
||||
};
|
||||
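To illustrate how the permission structures above are used from userspace, here is a minimal hedged sketch. The UAPI header name, the open DRM fd, and the helper name are assumptions for the example; drmIoctl() is the usual libdrm wrapper.

```c
#include <stdint.h>
#include <xf86drm.h>
#include "nvidia-drm-ioctl.h"   /* assumed UAPI header name */

/* Hypothetical: revoke a previously granted modeset permission for a dpy. */
static int example_revoke_modeset(int drm_fd, uint32_t dpy_id)
{
    struct drm_nvidia_revoke_permissions_params p = {
        .dpyId = dpy_id,
        .type  = NV_DRM_PERMISSIONS_TYPE_MODESET,
    };

    return drmIoctl(drm_fd, DRM_IOCTL_NVIDIA_REVOKE_PERMISSIONS, &p);
}
```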
|
||||
struct drm_nvidia_semsurf_fence_ctx_create_params {
|
||||
uint64_t index; /* IN Index of the desired semaphore in the
|
||||
* fence context's semaphore surface */
|
||||
|
||||
/* Params for importing userspace semaphore surface */
|
||||
uint64_t nvkms_params_ptr; /* IN */
|
||||
uint64_t nvkms_params_size; /* IN */
|
||||
|
||||
uint32_t handle; /* OUT GEM handle to fence context */
|
||||
uint32_t __pad;
|
||||
};
|
||||
|
||||
struct drm_nvidia_semsurf_fence_create_params {
|
||||
uint32_t fence_context_handle; /* IN GEM handle to fence context on which
|
||||
* fence is run on */
|
||||
|
||||
uint32_t timeout_value_ms; /* IN Timeout value in ms for the fence
|
||||
* after which the fence will be signaled
|
||||
* with its error status set to -ETIMEDOUT.
|
||||
* Default timeout value is 5000ms */
|
||||
|
||||
uint64_t wait_value; /* IN Semaphore value to reach before signal */
|
||||
|
||||
int32_t fd; /* OUT sync FD object representing the
|
||||
* semaphore at the specified index reaching
|
||||
* a value >= wait_value */
|
||||
uint32_t __pad;
|
||||
};
|
||||
|
||||
/*
 * Note there is no provision for timeouts in this ioctl. The kernel
 * documentation asserts timeouts should be handled by fence producers, and
 * that waiters should not second-guess their logic, as it is producers rather
 * than consumers that have better information when it comes to determining a
 * reasonable timeout for a given workload.
 */
struct drm_nvidia_semsurf_fence_wait_params {
|
||||
uint32_t fence_context_handle; /* IN GEM handle to fence context which will
|
||||
* be used to wait on the sync FD. Need not
|
||||
* be the fence context used to create the
|
||||
* sync FD. */
|
||||
|
||||
int32_t fd; /* IN sync FD object to wait on */
|
||||
|
||||
uint64_t pre_wait_value; /* IN Wait for the semaphore represented by
|
||||
* fence_context to reach this value before
|
||||
* waiting for the sync file. */
|
||||
|
||||
uint64_t post_wait_value; /* IN Signal the semaphore represented by
|
||||
* fence_context to this value after waiting
|
||||
* for the sync file */
|
||||
};
|
||||
|
||||
struct drm_nvidia_semsurf_fence_attach_params {
|
||||
uint32_t handle; /* IN GEM handle of buffer */
|
||||
|
||||
uint32_t fence_context_handle; /* IN GEM handle of fence context */
|
||||
|
||||
uint32_t timeout_value_ms; /* IN Timeout value in ms for the fence
|
||||
* after which the fence will be signaled
|
||||
* with its error status set to -ETIMEDOUT.
|
||||
* Default timeout value is 5000ms */
|
||||
|
||||
uint32_t shared; /* IN If true, fence will reserve shared
|
||||
* access to the buffer, otherwise it will
|
||||
* reserve exclusive access */
|
||||
|
||||
uint64_t wait_value; /* IN Semaphore value to reach before signal */
|
||||
};
|
||||
|
||||
#endif /* _UAPI_NVIDIA_DRM_IOCTL_H_ */
|
||||
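As a usage illustration of the semaphore-surface fence UAPI above, the sketch below creates a sync FD from an existing fence context. It is a hedged example, not driver code: the UAPI header name, the already-created context handle, and the choice of timeout are assumptions.

```c
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include "nvidia-drm-ioctl.h"   /* assumed UAPI header name */

/* Hypothetical: return a sync FD that signals once the semaphore backing
 * "ctx_handle" reaches "wait_value" (or the timeout expires). */
static int example_create_semsurf_fence(int drm_fd, uint32_t ctx_handle,
                                        uint64_t wait_value)
{
    struct drm_nvidia_semsurf_fence_create_params p;

    memset(&p, 0, sizeof(p));
    p.fence_context_handle = ctx_handle;
    p.timeout_value_ms = 5000;   /* matches the documented default */
    p.wait_value = wait_value;

    if (drmIoctl(drm_fd, DRM_IOCTL_NVIDIA_SEMSURF_FENCE_CREATE, &p) != 0) {
        return -1;
    }

    return p.fd;   /* OUT: sync FD for the >= wait_value condition */
}
```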
|
||||
@@ -35,7 +35,13 @@
|
||||
#include <drm/drmP.h>
|
||||
#endif
|
||||
|
||||
#if defined(NV_LINUX_SYNC_FILE_H_PRESENT)
|
||||
#include <linux/file.h>
|
||||
#include <linux/sync_file.h>
|
||||
#endif
|
||||
|
||||
#include <linux/vmalloc.h>
|
||||
#include <linux/sched.h>
|
||||
|
||||
#include "nv-mm.h"
|
||||
|
||||
@@ -45,6 +51,14 @@ MODULE_PARM_DESC(
|
||||
bool nv_drm_modeset_module_param = false;
|
||||
module_param_named(modeset, nv_drm_modeset_module_param, bool, 0400);
|
||||
|
||||
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
|
||||
MODULE_PARM_DESC(
|
||||
fbdev,
|
||||
"Create a framebuffer device (1 = enable, 0 = disable (default)) (EXPERIMENTAL)");
|
||||
bool nv_drm_fbdev_module_param = false;
|
||||
module_param_named(fbdev, nv_drm_fbdev_module_param, bool, 0400);
|
||||
#endif
|
||||
|
||||
void *nv_drm_calloc(size_t nmemb, size_t size)
|
||||
{
|
||||
size_t total_size = nmemb * size;
|
||||
@@ -81,14 +95,10 @@ char *nv_drm_asprintf(const char *fmt, ...)
|
||||
|
||||
#if defined(NVCPU_X86) || defined(NVCPU_X86_64)
|
||||
#define WRITE_COMBINE_FLUSH() asm volatile("sfence":::"memory")
|
||||
#elif defined(NVCPU_FAMILY_ARM)
|
||||
#if defined(NVCPU_ARM)
|
||||
#define WRITE_COMBINE_FLUSH() { dsb(); outer_sync(); }
|
||||
#elif defined(NVCPU_AARCH64)
|
||||
#define WRITE_COMBINE_FLUSH() mb()
|
||||
#endif
|
||||
#elif defined(NVCPU_PPC64LE)
|
||||
#define WRITE_COMBINE_FLUSH() asm volatile("sync":::"memory")
|
||||
#else
|
||||
#define WRITE_COMBINE_FLUSH() mb()
|
||||
#endif
|
||||
|
||||
void nv_drm_write_combine_flush(void)
|
||||
@@ -160,6 +170,122 @@ void nv_drm_vunmap(void *address)
|
||||
vunmap(address);
|
||||
}
|
||||
|
||||
bool nv_drm_workthread_init(nv_drm_workthread *worker, const char *name)
|
||||
{
|
||||
worker->shutting_down = false;
|
||||
if (nv_kthread_q_init(&worker->q, name)) {
|
||||
return false;
|
||||
}
|
||||
|
||||
spin_lock_init(&worker->lock);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
void nv_drm_workthread_shutdown(nv_drm_workthread *worker)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&worker->lock, flags);
|
||||
worker->shutting_down = true;
|
||||
spin_unlock_irqrestore(&worker->lock, flags);
|
||||
|
||||
nv_kthread_q_stop(&worker->q);
|
||||
}
|
||||
|
||||
void nv_drm_workthread_work_init(nv_drm_work *work,
|
||||
void (*callback)(void *),
|
||||
void *arg)
|
||||
{
|
||||
nv_kthread_q_item_init(work, callback, arg);
|
||||
}
|
||||
|
||||
int nv_drm_workthread_add_work(nv_drm_workthread *worker, nv_drm_work *work)
|
||||
{
|
||||
unsigned long flags;
|
||||
int ret = 0;
|
||||
|
||||
spin_lock_irqsave(&worker->lock, flags);
|
||||
if (!worker->shutting_down) {
|
||||
ret = nv_kthread_q_schedule_q_item(&worker->q, work);
|
||||
}
|
||||
spin_unlock_irqrestore(&worker->lock, flags);
|
||||
|
||||
return ret;
|
||||
}
|
||||
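The nv_drm_workthread helpers above wrap nv-kthread-q with a shutdown flag. A minimal kernel-context sketch of the intended call sequence follows; the callback, function names, and header include are assumptions for the example.

```c
#include <linux/errno.h>
#include "nvidia-drm-os-interface.h"   /* assumed header for the helpers */

static void example_cb(void *arg)
{
    int *counter = arg;
    (*counter)++;
}

static int example_workthread_roundtrip(void)
{
    nv_drm_workthread worker;
    nv_drm_work work;
    static int counter;

    if (!nv_drm_workthread_init(&worker, "nv_example_q"))
        return -ENOMEM;

    nv_drm_workthread_work_init(&work, example_cb, &counter);

    /* Non-zero return means the item was actually queued; zero means the
     * worker is already shutting down (or the item was already pending). */
    if (!nv_drm_workthread_add_work(&worker, &work)) {
        /* Nothing scheduled; fall through to shutdown. */
    }

    /* Flushes queued items and stops the underlying kthread. */
    nv_drm_workthread_shutdown(&worker);
    return 0;
}
```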
|
||||
void nv_drm_timer_setup(nv_drm_timer *timer, void (*callback)(nv_drm_timer *nv_drm_timer))
|
||||
{
|
||||
nv_timer_setup(timer, callback);
|
||||
}
|
||||
|
||||
void nv_drm_mod_timer(nv_drm_timer *timer, unsigned long timeout_native)
|
||||
{
|
||||
mod_timer(&timer->kernel_timer, timeout_native);
|
||||
}
|
||||
|
||||
unsigned long nv_drm_timer_now(void)
|
||||
{
|
||||
return jiffies;
|
||||
}
|
||||
|
||||
unsigned long nv_drm_timeout_from_ms(NvU64 relative_timeout_ms)
|
||||
{
|
||||
return jiffies + msecs_to_jiffies(relative_timeout_ms);
|
||||
}
|
||||
|
||||
bool nv_drm_del_timer_sync(nv_drm_timer *timer)
|
||||
{
|
||||
if (del_timer_sync(&timer->kernel_timer)) {
|
||||
return true;
|
||||
} else {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
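The timer wrappers above are thin shims over nv_timer/mod_timer. A hedged sketch of arming and cancelling a one-shot timer (hypothetical names, header include assumed):

```c
#include "nvidia-drm-os-interface.h"   /* assumed header for the wrappers */

static void example_timer_cb(nv_drm_timer *timer)
{
    /* Runs in timer context; keep this short and non-blocking. */
}

static void example_arm_timer(nv_drm_timer *timer)
{
    nv_drm_timer_setup(timer, example_timer_cb);

    /* nv_drm_timeout_from_ms() converts a relative ms value into the
     * absolute jiffies expiry that nv_drm_mod_timer() expects. */
    nv_drm_mod_timer(timer, nv_drm_timeout_from_ms(500));
}

static void example_cancel_timer(nv_drm_timer *timer)
{
    /* Returns true if the timer was still pending when cancelled. */
    (void)nv_drm_del_timer_sync(timer);
}
```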
|
||||
#if defined(NV_DRM_FENCE_AVAILABLE)
|
||||
int nv_drm_create_sync_file(nv_dma_fence_t *fence)
|
||||
{
|
||||
#if defined(NV_LINUX_SYNC_FILE_H_PRESENT)
|
||||
struct sync_file *sync;
|
||||
int fd = get_unused_fd_flags(O_CLOEXEC);
|
||||
|
||||
if (fd < 0) {
|
||||
return fd;
|
||||
}
|
||||
|
||||
/* sync_file_create() generates its own reference to the fence */
|
||||
sync = sync_file_create(fence);
|
||||
|
||||
if (IS_ERR(sync)) {
|
||||
put_unused_fd(fd);
|
||||
return PTR_ERR(sync);
|
||||
}
|
||||
|
||||
fd_install(fd, sync->file);
|
||||
|
||||
return fd;
|
||||
#else /* defined(NV_LINUX_SYNC_FILE_H_PRESENT) */
|
||||
return -EINVAL;
|
||||
#endif /* defined(NV_LINUX_SYNC_FILE_H_PRESENT) */
|
||||
}
|
||||
|
||||
nv_dma_fence_t *nv_drm_sync_file_get_fence(int fd)
|
||||
{
|
||||
#if defined(NV_SYNC_FILE_GET_FENCE_PRESENT)
|
||||
return sync_file_get_fence(fd);
|
||||
#else /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
|
||||
return NULL;
|
||||
#endif /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
|
||||
}
|
||||
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
|
||||
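The two helpers above convert between dma-fences and sync FDs. Below is a hedged kernel-context sketch of the round trip; nv_dma_fence_put() is assumed to be the matching put helper from nvidia-dma-fence-helper.h, and the function names are hypothetical.

```c
#include <linux/errno.h>
#include "nvidia-drm-os-interface.h"     /* assumed header for the helpers */
#include "nvidia-dma-fence-helper.h"

/* Hypothetical: export a fence to userspace as a sync FD. */
static int example_export_fence(nv_dma_fence_t *fence)
{
    int fd = nv_drm_create_sync_file(fence);

    /* On kernels without sync_file support this returns -EINVAL. */
    return fd;
}

/* Hypothetical: resolve a sync FD received from userspace back to a fence. */
static int example_import_fence(int fd)
{
    nv_dma_fence_t *fence = nv_drm_sync_file_get_fence(fd);

    if (fence == NULL)
        return -EINVAL;   /* bad FD, or sync_file_get_fence() unavailable */

    /* ... wait on or attach the fence ... */
    nv_dma_fence_put(fence);   /* assumed put helper */
    return 0;
}
```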
|
||||
void nv_drm_yield(void)
|
||||
{
|
||||
set_current_state(TASK_INTERRUPTIBLE);
|
||||
schedule_timeout(1);
|
||||
}
|
||||
|
||||
#endif /* NV_DRM_AVAILABLE */
|
||||
|
||||
/*************************************************************************
|
||||
|
||||
@@ -237,6 +237,14 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
|
||||
int i;
|
||||
int ret;
|
||||
|
||||
/*
|
||||
* If sub-owner permission was granted to another NVKMS client, disallow
|
||||
* modesets through the DRM interface.
|
||||
*/
|
||||
if (nv_dev->subOwnershipGranted) {
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
memset(requested_config, 0, sizeof(*requested_config));
|
||||
|
||||
/* Loop over affected crtcs and construct NvKmsKapiRequestedModeSetConfig */
|
||||
@@ -274,9 +282,6 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
|
||||
|
||||
nv_new_crtc_state->nv_flip = NULL;
|
||||
}
|
||||
#if defined(NV_DRM_CRTC_STATE_HAS_VRR_ENABLED)
|
||||
requested_config->headRequestedConfig[nv_crtc->head].modeSetConfig.vrrEnabled = new_crtc_state->vrr_enabled;
|
||||
#endif
|
||||
}
|
||||
}
|
||||
|
||||
@@ -292,7 +297,9 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
|
||||
requested_config,
|
||||
&reply_config,
|
||||
commit)) {
|
||||
return -EINVAL;
|
||||
if (commit || reply_config.flipResult != NV_KMS_FLIP_RESULT_IN_PROGRESS) {
|
||||
return -EINVAL;
|
||||
}
|
||||
}
|
||||
|
||||
if (commit && nv_dev->supportsSyncpts) {
|
||||
@@ -388,42 +395,56 @@ int nv_drm_atomic_commit(struct drm_device *dev,
|
||||
struct nv_drm_device *nv_dev = to_nv_device(dev);
|
||||
|
||||
/*
|
||||
* drm_mode_config_funcs::atomic_commit() mandates to return -EBUSY
|
||||
* for nonblocking commit if previous updates (commit tasks/flip event) are
|
||||
* pending. In case of blocking commits it mandates to wait for previous
|
||||
* updates to complete.
|
||||
* XXX: drm_mode_config_funcs::atomic_commit() mandates to return -EBUSY
|
||||
* for nonblocking commit if the commit would need to wait for previous
|
||||
* updates (commit tasks/flip event) to complete. In case of blocking
|
||||
* commits it mandates to wait for previous updates to complete. However,
|
||||
* the kernel DRM-KMS documentation does explicitly allow maintaining a
|
||||
* queue of outstanding commits.
|
||||
*
|
||||
* Our system already implements such a queue, but due to
|
||||
* bug 4054608, it is currently not used.
|
||||
*/
|
||||
if (nonblock) {
|
||||
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
|
||||
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
|
||||
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
|
||||
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
|
||||
|
||||
/*
|
||||
* Here you aren't required to hold nv_drm_crtc::flip_list_lock
|
||||
* because:
|
||||
*
|
||||
* The core DRM driver acquires lock for all affected crtcs before
|
||||
* calling into ->commit() hook, therefore it is not possible for
|
||||
* other threads to call into ->commit() hook affecting same crtcs
|
||||
* and enqueue flip objects into flip_list -
|
||||
*
|
||||
* nv_drm_atomic_commit_internal()
|
||||
* |-> nv_drm_atomic_apply_modeset_config(commit=true)
|
||||
* |-> nv_drm_crtc_enqueue_flip()
|
||||
*
|
||||
* Only possibility is list_empty check races with code path
|
||||
* dequeuing flip object -
|
||||
*
|
||||
* __nv_drm_handle_flip_event()
|
||||
* |-> nv_drm_crtc_dequeue_flip()
|
||||
*
|
||||
* But this race condition can't lead list_empty() to return
|
||||
* incorrect result. nv_drm_crtc_dequeue_flip() in the middle of
|
||||
* updating the list could not trick us into thinking the list is
|
||||
* empty when it isn't.
|
||||
*/
|
||||
/*
|
||||
* Here you aren't required to hold nv_drm_crtc::flip_list_lock
|
||||
* because:
|
||||
*
|
||||
* The core DRM driver acquires lock for all affected crtcs before
|
||||
* calling into ->commit() hook, therefore it is not possible for
|
||||
* other threads to call into ->commit() hook affecting same crtcs
|
||||
* and enqueue flip objects into flip_list -
|
||||
*
|
||||
* nv_drm_atomic_commit_internal()
|
||||
* |-> nv_drm_atomic_apply_modeset_config(commit=true)
|
||||
* |-> nv_drm_crtc_enqueue_flip()
|
||||
*
|
||||
* Only possibility is list_empty check races with code path
|
||||
* dequeuing flip object -
|
||||
*
|
||||
* __nv_drm_handle_flip_event()
|
||||
* |-> nv_drm_crtc_dequeue_flip()
|
||||
*
|
||||
* But this race condition can't lead list_empty() to return
|
||||
* incorrect result. nv_drm_crtc_dequeue_flip() in the middle of
|
||||
* updating the list could not trick us into thinking the list is
|
||||
* empty when it isn't.
|
||||
*/
|
||||
if (nonblock) {
|
||||
if (!list_empty(&nv_crtc->flip_list)) {
|
||||
return -EBUSY;
|
||||
}
|
||||
} else {
|
||||
if (wait_event_timeout(
|
||||
nv_dev->flip_event_wq,
|
||||
list_empty(&nv_crtc->flip_list),
|
||||
3 * HZ /* 3 second */) == 0) {
|
||||
NV_DRM_DEV_LOG_ERR(
|
||||
nv_dev,
|
||||
"Flip event timeout on head %u", nv_crtc->head);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -467,6 +488,7 @@ int nv_drm_atomic_commit(struct drm_device *dev,
|
||||
|
||||
goto done;
|
||||
}
|
||||
nv_dev->drmMasterChangedSinceLastAtomicCommit = NV_FALSE;
|
||||
|
||||
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
|
||||
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
|
||||
|
||||
@@ -29,10 +29,47 @@
|
||||
|
||||
#if defined(NV_DRM_AVAILABLE)
|
||||
|
||||
#if defined(NV_DRM_FENCE_AVAILABLE)
|
||||
#include "nvidia-dma-fence-helper.h"
|
||||
#endif
|
||||
|
||||
#if defined(NV_LINUX)
|
||||
#include "nv-kthread-q.h"
|
||||
#include "linux/spinlock.h"
|
||||
|
||||
typedef struct nv_drm_workthread {
|
||||
spinlock_t lock;
|
||||
struct nv_kthread_q q;
|
||||
bool shutting_down;
|
||||
} nv_drm_workthread;
|
||||
|
||||
typedef nv_kthread_q_item_t nv_drm_work;
|
||||
|
||||
#else /* defined(NV_LINUX) */
|
||||
#error "Need to define deferred work primitives for this OS"
|
||||
#endif /* else defined(NV_LINUX) */
|
||||
|
||||
#if defined(NV_LINUX)
|
||||
#include "nv-timer.h"
|
||||
|
||||
typedef struct nv_timer nv_drm_timer;
|
||||
|
||||
#else /* defined(NV_LINUX) */
|
||||
#error "Need to define kernel timer callback primitives for this OS"
|
||||
#endif /* else defined(NV_LINUX) */
|
||||
|
||||
#if defined(NV_DRM_FBDEV_GENERIC_SETUP_PRESENT) && defined(NV_DRM_APERTURE_REMOVE_CONFLICTING_PCI_FRAMEBUFFERS_PRESENT)
|
||||
#define NV_DRM_FBDEV_GENERIC_AVAILABLE
|
||||
#endif
|
||||
|
||||
struct page;
|
||||
|
||||
/* Set to true when the atomic modeset feature is enabled. */
|
||||
extern bool nv_drm_modeset_module_param;
|
||||
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
|
||||
/* Set to true when the nvidia-drm driver should install a framebuffer device */
|
||||
extern bool nv_drm_fbdev_module_param;
|
||||
#endif
|
||||
|
||||
void *nv_drm_calloc(size_t nmemb, size_t size);
|
||||
|
||||
@@ -51,6 +88,37 @@ void *nv_drm_vmap(struct page **pages, unsigned long pages_count);
|
||||
|
||||
void nv_drm_vunmap(void *address);
|
||||
|
||||
#endif
|
||||
bool nv_drm_workthread_init(nv_drm_workthread *worker, const char *name);
|
||||
|
||||
/* Can be called concurrently with nv_drm_workthread_add_work() */
|
||||
void nv_drm_workthread_shutdown(nv_drm_workthread *worker);
|
||||
|
||||
void nv_drm_workthread_work_init(nv_drm_work *work,
|
||||
void (*callback)(void *),
|
||||
void *arg);
|
||||
|
||||
/* Can be called concurrently with nv_drm_workthread_shutdown() */
|
||||
int nv_drm_workthread_add_work(nv_drm_workthread *worker, nv_drm_work *work);
|
||||
|
||||
void nv_drm_timer_setup(nv_drm_timer *timer,
|
||||
void (*callback)(nv_drm_timer *nv_drm_timer));
|
||||
|
||||
void nv_drm_mod_timer(nv_drm_timer *timer, unsigned long relative_timeout_ms);
|
||||
|
||||
bool nv_drm_del_timer_sync(nv_drm_timer *timer);
|
||||
|
||||
unsigned long nv_drm_timer_now(void);
|
||||
|
||||
unsigned long nv_drm_timeout_from_ms(NvU64 relative_timeout_ms);
|
||||
|
||||
#if defined(NV_DRM_FENCE_AVAILABLE)
|
||||
int nv_drm_create_sync_file(nv_dma_fence_t *fence);
|
||||
|
||||
nv_dma_fence_t *nv_drm_sync_file_get_fence(int fd);
|
||||
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
|
||||
|
||||
void nv_drm_yield(void);
|
||||
|
||||
#endif /* defined(NV_DRM_AVAILABLE) */
|
||||
|
||||
#endif /* __NVIDIA_DRM_OS_INTERFACE_H__ */
|
||||
|
||||
@@ -46,12 +46,33 @@
|
||||
#define NV_DRM_LOG_ERR(__fmt, ...) \
|
||||
DRM_ERROR("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
|
||||
|
||||
/*
|
||||
* DRM_WARN() was added in v4.9 by kernel commit
|
||||
* 30b0da8d556e65ff935a56cd82c05ba0516d3e4a
|
||||
*
|
||||
* Before this commit, only DRM_INFO and DRM_ERROR were defined and
|
||||
* DRM_INFO(fmt, ...) was defined as
|
||||
* printk(KERN_INFO "[" DRM_NAME "] " fmt, ##__VA_ARGS__). So, if
|
||||
* DRM_WARN is undefined this defines NV_DRM_LOG_WARN following the
|
||||
* same pattern as DRM_INFO.
|
||||
*/
|
||||
#ifdef DRM_WARN
|
||||
#define NV_DRM_LOG_WARN(__fmt, ...) \
|
||||
DRM_WARN("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
|
||||
#else
|
||||
#define NV_DRM_LOG_WARN(__fmt, ...) \
|
||||
printk(KERN_WARNING "[" DRM_NAME "] [nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
|
||||
#endif
|
||||
|
||||
#define NV_DRM_LOG_INFO(__fmt, ...) \
|
||||
DRM_INFO("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
|
||||
|
||||
#define NV_DRM_DEV_LOG_INFO(__dev, __fmt, ...) \
|
||||
NV_DRM_LOG_INFO("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
|
||||
|
||||
#define NV_DRM_DEV_LOG_WARN(__dev, __fmt, ...) \
|
||||
NV_DRM_LOG_WARN("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
|
||||
|
||||
#define NV_DRM_DEV_LOG_ERR(__dev, __fmt, ...) \
|
||||
NV_DRM_LOG_ERR("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
|
||||
|
||||
@@ -117,9 +138,26 @@ struct nv_drm_device {
|
||||
|
||||
#endif
|
||||
|
||||
#if defined(NV_DRM_FENCE_AVAILABLE)
|
||||
NvU64 semsurf_stride;
|
||||
NvU64 semsurf_max_submitted_offset;
|
||||
#endif
|
||||
|
||||
NvBool hasVideoMemory;
|
||||
|
||||
NvBool supportsSyncpts;
|
||||
NvBool subOwnershipGranted;
|
||||
NvBool hasFramebufferConsole;
|
||||
|
||||
/**
|
||||
* @drmMasterChangedSinceLastAtomicCommit:
|
||||
*
|
||||
* This flag is set in nv_drm_master_set and reset after a completed atomic
|
||||
* commit. It is used to restore or recommit state that is lost by the
|
||||
* NvKms modeset owner change, such as the CRTC color management
|
||||
* properties.
|
||||
*/
|
||||
NvBool drmMasterChangedSinceLastAtomicCommit;
|
||||
|
||||
struct drm_property *nv_out_fence_property;
|
||||
struct drm_property *nv_input_colorspace_property;
|
||||
|
||||
@@ -19,6 +19,7 @@ NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-modeset.c
|
||||
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-fence.c
|
||||
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-linux.c
|
||||
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-helper.c
|
||||
NVIDIA_DRM_SOURCES += nvidia-drm/nv-kthread-q.c
|
||||
NVIDIA_DRM_SOURCES += nvidia-drm/nv-pci-table.c
|
||||
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-nvkms-memory.c
|
||||
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-user-memory.c
|
||||
@@ -54,6 +55,8 @@ NV_CONFTEST_GENERIC_COMPILE_TESTS += drm_atomic_available
|
||||
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_gpl_refcount_inc
|
||||
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_gpl_refcount_dec_and_test
|
||||
NV_CONFTEST_GENERIC_COMPILE_TESTS += drm_alpha_blending_available
|
||||
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_present_drm_gem_prime_fd_to_handle
|
||||
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_present_drm_gem_prime_handle_to_fd
|
||||
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_dev_unref
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_reinit_primary_mode_group
|
||||
@@ -77,6 +80,17 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_rotation_available
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_vma_offset_exact_lookup_locked
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_object_put_unlocked
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += nvhost_dma_fence_unpack
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += list_is_first
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += timer_setup
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += dma_fence_set_error
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += fence_set_error
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += sync_file_get_fence
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_aperture_remove_conflicting_pci_framebuffers
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_fbdev_generic_setup
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_attach_hdr_output_metadata_property
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_helper_crtc_enable_color_mgmt
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_crtc_enable_color_mgmt
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_atomic_helper_legacy_gamma_set
|
||||
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_present
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_bus_type
|
||||
@@ -131,3 +145,6 @@ NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_lookup
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_put
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_area_struct_has_const_vm_flags
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_dumb_destroy
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += fence_ops_use_64bit_seqno
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_aperture_remove_conflicting_pci_framebuffers_has_driver_arg
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_mode_create_dp_colorspace_property_has_supported_colorspaces_arg
|
||||
|
||||
@@ -247,6 +247,11 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferr
|
||||
return 0;
|
||||
}
|
||||
|
||||
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
|
||||
{
|
||||
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
|
||||
}
|
||||
|
||||
// Returns true (non-zero) if the item was actually scheduled, and false if the
|
||||
// item was already pending in a queue.
|
||||
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2015-21 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-FileCopyrightText: Copyright (c) 2015-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
@@ -65,9 +65,15 @@
|
||||
static bool output_rounding_fix = true;
|
||||
module_param_named(output_rounding_fix, output_rounding_fix, bool, 0400);
|
||||
|
||||
static bool disable_hdmi_frl = false;
|
||||
module_param_named(disable_hdmi_frl, disable_hdmi_frl, bool, 0400);
|
||||
|
||||
static bool disable_vrr_memclk_switch = false;
|
||||
module_param_named(disable_vrr_memclk_switch, disable_vrr_memclk_switch, bool, 0400);
|
||||
|
||||
static bool hdmi_deepcolor = false;
|
||||
module_param_named(hdmi_deepcolor, hdmi_deepcolor, bool, 0400);
|
||||
|
||||
/* These parameters are used for fault injection tests. Normally the defaults
|
||||
* should be used. */
|
||||
MODULE_PARM_DESC(fail_malloc, "Fail the Nth call to nvkms_alloc");
|
||||
@@ -78,6 +84,7 @@ MODULE_PARM_DESC(malloc_verbose, "Report information about malloc calls on modul
|
||||
static bool malloc_verbose = false;
|
||||
module_param_named(malloc_verbose, malloc_verbose, bool, 0400);
|
||||
|
||||
#if NVKMS_CONFIG_FILE_SUPPORTED
|
||||
/* This parameter is used to find the dpy override conf file */
|
||||
#define NVKMS_CONF_FILE_SPECIFIED (nvkms_conf != NULL)
|
||||
|
||||
@@ -86,6 +93,7 @@ MODULE_PARM_DESC(config_file,
|
||||
"(default: disabled)");
|
||||
static char *nvkms_conf = NULL;
|
||||
module_param_named(config_file, nvkms_conf, charp, 0400);
|
||||
#endif
|
||||
|
||||
static atomic_t nvkms_alloc_called_count;
|
||||
|
||||
@@ -94,11 +102,21 @@ NvBool nvkms_output_rounding_fix(void)
|
||||
return output_rounding_fix;
|
||||
}
|
||||
|
||||
NvBool nvkms_disable_hdmi_frl(void)
|
||||
{
|
||||
return disable_hdmi_frl;
|
||||
}
|
||||
|
||||
NvBool nvkms_disable_vrr_memclk_switch(void)
|
||||
{
|
||||
return disable_vrr_memclk_switch;
|
||||
}
|
||||
|
||||
NvBool nvkms_hdmi_deepcolor(void)
|
||||
{
|
||||
return hdmi_deepcolor;
|
||||
}
|
||||
|
||||
#define NVKMS_SYNCPT_STUBS_NEEDED
|
||||
|
||||
/*************************************************************************
|
||||
@@ -335,7 +353,7 @@ NvU64 nvkms_get_usec(void)
|
||||
struct timespec64 ts;
|
||||
NvU64 ns;
|
||||
|
||||
ktime_get_real_ts64(&ts);
|
||||
ktime_get_raw_ts64(&ts);
|
||||
|
||||
ns = timespec64_to_ns(&ts);
|
||||
return ns / 1000;
|
||||
@@ -1382,6 +1400,7 @@ static void nvkms_proc_exit(void)
|
||||
/*************************************************************************
|
||||
* NVKMS Config File Read
|
||||
************************************************************************/
|
||||
#if NVKMS_CONFIG_FILE_SUPPORTED
|
||||
static NvBool nvkms_fs_mounted(void)
|
||||
{
|
||||
return current->fs != NULL;
|
||||
@@ -1489,6 +1508,11 @@ static void nvkms_read_config_file_locked(void)
|
||||
|
||||
nvkms_free(buffer, buf_size);
|
||||
}
|
||||
#else
|
||||
static void nvkms_read_config_file_locked(void)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
|
||||
/*************************************************************************
|
||||
* NVKMS KAPI functions
|
||||
|
||||
@@ -97,8 +97,9 @@ typedef struct {
|
||||
} NvKmsSyncPtOpParams;
|
||||
|
||||
NvBool nvkms_output_rounding_fix(void);
|
||||
|
||||
NvBool nvkms_disable_hdmi_frl(void);
|
||||
NvBool nvkms_disable_vrr_memclk_switch(void);
|
||||
NvBool nvkms_hdmi_deepcolor(void);
|
||||
|
||||
void nvkms_call_rm (void *ops);
|
||||
void* nvkms_alloc (size_t size,
|
||||
|
||||
@@ -58,6 +58,18 @@ nvidia-modeset-y += $(NVIDIA_MODESET_BINARY_OBJECT_O)
|
||||
NVIDIA_MODESET_CFLAGS += -I$(src)/nvidia-modeset
|
||||
NVIDIA_MODESET_CFLAGS += -UDEBUG -U_DEBUG -DNDEBUG -DNV_BUILD_MODULE_INSTANCES=0
|
||||
|
||||
# Some Android kernels prohibit driver use of filesystem functions like
|
||||
# filp_open() and kernel_read(). Disable the NVKMS_CONFIG_FILE_SUPPORTED
|
||||
# functionality that uses those functions when building for Android.
|
||||
|
||||
PLATFORM_IS_ANDROID ?= 0
|
||||
|
||||
ifeq ($(PLATFORM_IS_ANDROID),1)
|
||||
NVIDIA_MODESET_CFLAGS += -DNVKMS_CONFIG_FILE_SUPPORTED=0
|
||||
else
|
||||
NVIDIA_MODESET_CFLAGS += -DNVKMS_CONFIG_FILE_SUPPORTED=1
|
||||
endif
|
||||
|
||||
$(call ASSIGN_PER_OBJ_CFLAGS, $(NVIDIA_MODESET_OBJECTS), $(NVIDIA_MODESET_CFLAGS))
|
||||
|
||||
|
||||
|
||||
@@ -66,6 +66,8 @@ enum NvKmsClientType {
|
||||
NVKMS_CLIENT_KERNEL_SPACE,
|
||||
};
|
||||
|
||||
struct NvKmsPerOpenDev;
|
||||
|
||||
NvBool nvKmsIoctl(
|
||||
void *pOpenVoid,
|
||||
NvU32 cmd,
|
||||
@@ -104,4 +106,6 @@ NvBool nvKmsKapiGetFunctionsTableInternal
|
||||
NvBool nvKmsGetBacklight(NvU32 display_id, void *drv_priv, NvU32 *brightness);
|
||||
NvBool nvKmsSetBacklight(NvU32 display_id, void *drv_priv, NvU32 brightness);
|
||||
|
||||
NvBool nvKmsOpenDevHasSubOwnerPermissionOrBetter(const struct NvKmsPerOpenDev *pOpenDev);
|
||||
|
||||
#endif /* __NV_KMS_H__ */
|
||||
|
||||
@@ -249,8 +249,8 @@ static int nv_dma_map(struct sg_table *sg_head, void *context,
|
||||
nv_mem_context->sg_allocated = 1;
|
||||
for_each_sg(sg_head->sgl, sg, nv_mem_context->npages, i) {
|
||||
sg_set_page(sg, NULL, nv_mem_context->page_size, 0);
|
||||
sg->dma_address = dma_mapping->dma_addresses[i];
|
||||
sg->dma_length = nv_mem_context->page_size;
|
||||
sg_dma_address(sg) = dma_mapping->dma_addresses[i];
|
||||
sg_dma_len(sg) = nv_mem_context->page_size;
|
||||
}
|
||||
nv_mem_context->sg_head = *sg_head;
|
||||
*nmap = nv_mem_context->npages;
|
||||
@@ -304,8 +304,13 @@ static void nv_mem_put_pages_common(int nc,
|
||||
return;
|
||||
|
||||
if (nc) {
|
||||
#ifdef NVIDIA_P2P_CAP_GET_PAGES_PERSISTENT_API
|
||||
ret = nvidia_p2p_put_pages_persistent(nv_mem_context->page_virt_start,
|
||||
nv_mem_context->page_table, 0);
|
||||
#else
|
||||
ret = nvidia_p2p_put_pages(0, 0, nv_mem_context->page_virt_start,
|
||||
nv_mem_context->page_table);
|
||||
#endif
|
||||
} else {
|
||||
ret = nvidia_p2p_put_pages(0, 0, nv_mem_context->page_virt_start,
|
||||
nv_mem_context->page_table);
|
||||
@@ -412,9 +417,15 @@ static int nv_mem_get_pages_nc(unsigned long addr,
|
||||
nv_mem_context->core_context = core_context;
|
||||
nv_mem_context->page_size = GPU_PAGE_SIZE;
|
||||
|
||||
#ifdef NVIDIA_P2P_CAP_GET_PAGES_PERSISTENT_API
|
||||
ret = nvidia_p2p_get_pages_persistent(nv_mem_context->page_virt_start,
|
||||
nv_mem_context->mapped_size,
|
||||
&nv_mem_context->page_table, 0);
|
||||
#else
|
||||
ret = nvidia_p2p_get_pages(0, 0, nv_mem_context->page_virt_start, nv_mem_context->mapped_size,
|
||||
&nv_mem_context->page_table, NULL, NULL);
|
||||
#endif
|
||||
|
||||
if (ret < 0) {
|
||||
peer_err("error %d while calling nvidia_p2p_get_pages() with NULL callback\n", ret);
|
||||
return ret;
|
||||
@@ -459,8 +470,6 @@ static int __init nv_mem_client_init(void)
|
||||
}
|
||||
|
||||
#if defined (NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT)
|
||||
int status = 0;
|
||||
|
||||
    // Off by one, to leave space for the trailing '1' that flags
    // the new client type.
BUG_ON(strlen(DRV_NAME) > IB_PEER_MEMORY_NAME_MAX-1);
|
||||
@@ -489,7 +498,7 @@ static int __init nv_mem_client_init(void)
|
||||
&mem_invalidate_callback);
|
||||
if (!reg_handle) {
|
||||
peer_err("nv_mem_client_init -- error while registering traditional client\n");
|
||||
status = -EINVAL;
|
||||
rc = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
@@ -499,12 +508,12 @@ static int __init nv_mem_client_init(void)
|
||||
reg_handle_nc = ib_register_peer_memory_client(&nv_mem_client_nc, NULL);
|
||||
if (!reg_handle_nc) {
|
||||
peer_err("nv_mem_client_init -- error while registering nc client\n");
|
||||
status = -EINVAL;
|
||||
rc = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
out:
|
||||
if (status) {
|
||||
if (rc) {
|
||||
if (reg_handle) {
|
||||
ib_unregister_peer_memory_client(reg_handle);
|
||||
reg_handle = NULL;
|
||||
@@ -516,7 +525,7 @@ out:
|
||||
}
|
||||
}
|
||||
|
||||
return status;
|
||||
return rc;
|
||||
#else
|
||||
return -EINVAL;
|
||||
#endif
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2022 NVIDIA Corporation
|
||||
Copyright (c) 2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2022 NVIDIA Corporation
|
||||
Copyright (c) 2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
|
||||
@@ -247,6 +247,11 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferr
|
||||
return 0;
|
||||
}
|
||||
|
||||
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
|
||||
{
|
||||
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
|
||||
}
|
||||
|
||||
// Returns true (non-zero) if the item was actually scheduled, and false if the
|
||||
// item was already pending in a queue.
|
||||
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)
|
||||
|
||||
@@ -27,6 +27,7 @@ NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_rm_mem.c
|
||||
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_channel.c
|
||||
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_lock.c
|
||||
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_hal.c
|
||||
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_processors.c
|
||||
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_tree.c
|
||||
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_rb_tree.c
|
||||
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_allocator.c
|
||||
|
||||
@@ -81,12 +81,13 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += set_memory_uc
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += set_pages_uc
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += ktime_get_raw_ts64
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += ioasid_get
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mm_pasid_set
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mm_pasid_drop
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += migrate_vma_setup
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mmget_not_zero
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mmgrab
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += iommu_sva_bind_device_has_drvdata_arg
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += vm_fault_to_errno
|
||||
NV_CONFTEST_FUNCTION_COMPILE_TESTS += find_next_bit_wrap
|
||||
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += backing_dev_info
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += mm_context_t
|
||||
@@ -100,6 +101,7 @@ NV_CONFTEST_TYPE_COMPILE_TESTS += kmem_cache_has_kobj_remove_work
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += sysfs_slab_unlink
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_fault_t
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_notifier_ops_invalidate_range
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_notifier_ops_arch_invalidate_secondary_tlbs
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += proc_ops
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += timespec64
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += mm_has_mmap_lock
|
||||
@@ -110,5 +112,7 @@ NV_CONFTEST_TYPE_COMPILE_TESTS += handle_mm_fault_has_mm_arg
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += handle_mm_fault_has_pt_regs_arg
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += mempolicy_has_unified_nodes
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += mempolicy_has_home_node
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += mpol_preferred_many_present
|
||||
NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_interval_notifier
|
||||
|
||||
NV_CONFTEST_SYMBOL_COMPILE_TESTS += is_export_symbol_present_int_active_memcg
|
||||
|
||||
@@ -24,11 +24,11 @@
|
||||
#include "nvstatus.h"
|
||||
|
||||
#if !defined(NV_PRINTF_STRING_SECTION)
|
||||
#if defined(NVRM) && NVCPU_IS_RISCV64
|
||||
#if defined(NVRM) && NVOS_IS_LIBOS
|
||||
#define NV_PRINTF_STRING_SECTION __attribute__ ((section (".logging")))
|
||||
#else // defined(NVRM) && NVCPU_IS_RISCV64
|
||||
#else // defined(NVRM) && NVOS_IS_LIBOS
|
||||
#define NV_PRINTF_STRING_SECTION
|
||||
#endif // defined(NVRM) && NVCPU_IS_RISCV64
|
||||
#endif // defined(NVRM) && NVOS_IS_LIBOS
|
||||
#endif // !defined(NV_PRINTF_STRING_SECTION)
|
||||
|
||||
/*
|
||||
|
||||
@@ -571,7 +571,6 @@ static void uvm_vm_open_managed_entry(struct vm_area_struct *vma)
|
||||
static void uvm_vm_close_managed(struct vm_area_struct *vma)
|
||||
{
|
||||
uvm_va_space_t *va_space = uvm_va_space_get(vma->vm_file);
|
||||
uvm_processor_id_t gpu_id;
|
||||
bool make_zombie = false;
|
||||
|
||||
if (current->mm != NULL)
|
||||
@@ -606,12 +605,6 @@ static void uvm_vm_close_managed(struct vm_area_struct *vma)
|
||||
|
||||
uvm_destroy_vma_managed(vma, make_zombie);
|
||||
|
||||
// Notify GPU address spaces that the fault buffer needs to be flushed to
|
||||
// avoid finding stale entries that can be attributed to new VA ranges
|
||||
// reallocated at the same address.
|
||||
for_each_gpu_id_in_mask(gpu_id, &va_space->registered_gpu_va_spaces) {
|
||||
uvm_processor_mask_set_atomic(&va_space->needs_fault_buffer_flush, gpu_id);
|
||||
}
|
||||
uvm_va_space_up_write(va_space);
|
||||
|
||||
if (current->mm != NULL)
|
||||
|
||||
@@ -216,6 +216,10 @@ NV_STATUS UvmDeinitialize(void);
|
||||
// Note that it is not required to release VA ranges that were reserved with
|
||||
// UvmReserveVa().
|
||||
//
|
||||
// This is useful for per-process checkpoint and restore, where kernel-mode
|
||||
// state needs to be reconfigured to match the expectations of a pre-existing
|
||||
// user-mode process.
|
||||
//
|
||||
// UvmReopen() closes the open file returned by UvmGetFileDescriptor() and
|
||||
// replaces it with a new open file with the same name.
|
||||
//
|
||||
|
||||
@@ -121,6 +121,8 @@ bool uvm_hal_ampere_ce_memcopy_is_valid_c6b5(uvm_push_t *push, uvm_gpu_address_t
|
||||
return true;
|
||||
|
||||
if (uvm_channel_is_proxy(push->channel)) {
|
||||
uvm_pushbuffer_t *pushbuffer;
|
||||
|
||||
if (dst.is_virtual) {
|
||||
UVM_ERR_PRINT("Destination address of memcopy must be physical, not virtual\n");
|
||||
return false;
|
||||
@@ -142,7 +144,8 @@ bool uvm_hal_ampere_ce_memcopy_is_valid_c6b5(uvm_push_t *push, uvm_gpu_address_t
|
||||
return false;
|
||||
}
|
||||
|
||||
push_begin_gpu_va = uvm_pushbuffer_get_gpu_va_for_push(push->channel->pool->manager->pushbuffer, push);
|
||||
pushbuffer = uvm_channel_get_pushbuffer(push->channel);
|
||||
push_begin_gpu_va = uvm_pushbuffer_get_gpu_va_for_push(pushbuffer, push);
|
||||
|
||||
if ((src.address < push_begin_gpu_va) || (src.address >= push_begin_gpu_va + uvm_push_get_size(push))) {
|
||||
UVM_ERR_PRINT("Source address of memcopy must point to pushbuffer\n");
|
||||
@@ -177,10 +180,13 @@ bool uvm_hal_ampere_ce_memcopy_is_valid_c6b5(uvm_push_t *push, uvm_gpu_address_t
|
||||
// irrespective of the virtualization mode.
|
||||
void uvm_hal_ampere_ce_memcopy_patch_src_c6b5(uvm_push_t *push, uvm_gpu_address_t *src)
|
||||
{
|
||||
uvm_pushbuffer_t *pushbuffer;
|
||||
|
||||
if (!uvm_channel_is_proxy(push->channel))
|
||||
return;
|
||||
|
||||
src->address -= uvm_pushbuffer_get_gpu_va_for_push(push->channel->pool->manager->pushbuffer, push);
|
||||
pushbuffer = uvm_channel_get_pushbuffer(push->channel);
|
||||
src->address -= uvm_pushbuffer_get_gpu_va_for_push(pushbuffer, push);
|
||||
}
|
||||
|
||||
bool uvm_hal_ampere_ce_memset_is_valid_c6b5(uvm_push_t *push,
|
||||
|
||||
@@ -44,6 +44,8 @@ void uvm_ats_init(const UvmPlatformInfo *platform_info)
|
||||
|
||||
void uvm_ats_init_va_space(uvm_va_space_t *va_space)
|
||||
{
|
||||
uvm_init_rwsem(&va_space->ats.lock, UVM_LOCK_ORDER_LEAF);
|
||||
|
||||
if (UVM_ATS_IBM_SUPPORTED())
|
||||
uvm_ats_ibm_init_va_space(va_space);
|
||||
}
|
||||
|
||||
@@ -28,17 +28,32 @@
|
||||
#include "uvm_forward_decl.h"
|
||||
#include "uvm_ats_ibm.h"
|
||||
#include "nv_uvm_types.h"
|
||||
#include "uvm_lock.h"
|
||||
|
||||
#include "uvm_ats_sva.h"
|
||||
|
||||
#define UVM_ATS_SUPPORTED() (UVM_ATS_IBM_SUPPORTED() || UVM_ATS_SVA_SUPPORTED())
|
||||
|
||||
// ATS prefetcher uses hmm_range_fault() to query residency information.
|
||||
// hmm_range_fault() needs CONFIG_HMM_MIRROR. To detect racing CPU invalidates
|
||||
// of memory regions while hmm_range_fault() is being called, MMU interval
|
||||
// notifiers are needed.
|
||||
#if defined(CONFIG_HMM_MIRROR) && defined(NV_MMU_INTERVAL_NOTIFIER)
|
||||
#define UVM_ATS_PREFETCH_SUPPORTED() 1
|
||||
#else
|
||||
#define UVM_ATS_PREFETCH_SUPPORTED() 0
|
||||
#endif
|
||||
|
||||
typedef struct
|
||||
{
|
||||
// Mask of gpu_va_spaces which are registered for ATS access. The mask is
|
||||
// indexed by gpu->id. This mask is protected by the VA space lock.
|
||||
uvm_processor_mask_t registered_gpu_va_spaces;
|
||||
|
||||
// Protects racing invalidates in the VA space while hmm_range_fault() is
|
||||
// being called in ats_compute_residency_mask().
|
||||
uvm_rw_semaphore_t lock;
|
||||
|
||||
union
|
||||
{
|
||||
uvm_ibm_va_space_t ibm;
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2018 NVIDIA Corporation
|
||||
Copyright (c) 2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
@@ -20,60 +20,19 @@
|
||||
DEALINGS IN THE SOFTWARE.
|
||||
*******************************************************************************/
|
||||
|
||||
#include "uvm_api.h"
|
||||
#include "uvm_tools.h"
|
||||
#include "uvm_va_range.h"
|
||||
#include "uvm_ats.h"
|
||||
#include "uvm_ats_faults.h"
|
||||
#include "uvm_migrate_pageable.h"
|
||||
#include <linux/nodemask.h>
|
||||
#include <linux/mempolicy.h>
|
||||
#include <linux/mmu_notifier.h>
|
||||
|
||||
// TODO: Bug 2103669: Implement a real prefetching policy and remove or adapt
|
||||
// these experimental parameters. These are intended to help guide that policy.
|
||||
static unsigned int uvm_exp_perf_prefetch_ats_order_replayable = 0;
|
||||
module_param(uvm_exp_perf_prefetch_ats_order_replayable, uint, 0644);
|
||||
MODULE_PARM_DESC(uvm_exp_perf_prefetch_ats_order_replayable,
|
||||
"Max order of pages (2^N) to prefetch on replayable ATS faults");
|
||||
|
||||
static unsigned int uvm_exp_perf_prefetch_ats_order_non_replayable = 0;
|
||||
module_param(uvm_exp_perf_prefetch_ats_order_non_replayable, uint, 0644);
|
||||
MODULE_PARM_DESC(uvm_exp_perf_prefetch_ats_order_non_replayable,
|
||||
"Max order of pages (2^N) to prefetch on non-replayable ATS faults");
|
||||
|
||||
// Expand the fault region to the naturally-aligned region with order given by
|
||||
// the module parameters, clamped to the vma containing fault_addr (if any).
|
||||
// Note that this means the region contains fault_addr but may not begin at
|
||||
// fault_addr.
|
||||
static void expand_fault_region(struct vm_area_struct *vma,
|
||||
NvU64 start,
|
||||
size_t length,
|
||||
uvm_fault_client_type_t client_type,
|
||||
unsigned long *migrate_start,
|
||||
unsigned long *migrate_length)
|
||||
{
|
||||
unsigned int order;
|
||||
unsigned long outer, aligned_start, aligned_size;
|
||||
|
||||
*migrate_start = start;
|
||||
*migrate_length = length;
|
||||
|
||||
if (client_type == UVM_FAULT_CLIENT_TYPE_HUB)
|
||||
order = uvm_exp_perf_prefetch_ats_order_non_replayable;
|
||||
else
|
||||
order = uvm_exp_perf_prefetch_ats_order_replayable;
|
||||
|
||||
if (order == 0)
|
||||
return;
|
||||
|
||||
UVM_ASSERT(vma);
|
||||
UVM_ASSERT(order < BITS_PER_LONG - PAGE_SHIFT);
|
||||
|
||||
aligned_size = (1UL << order) * PAGE_SIZE;
|
||||
|
||||
aligned_start = start & ~(aligned_size - 1);
|
||||
|
||||
*migrate_start = max(vma->vm_start, aligned_start);
|
||||
outer = min(vma->vm_end, aligned_start + aligned_size);
|
||||
*migrate_length = outer - *migrate_start;
|
||||
}
|
||||
#if UVM_ATS_PREFETCH_SUPPORTED()
|
||||
#include <linux/hmm.h>
|
||||
#endif
|
||||
|
||||
static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
|
||||
struct vm_area_struct *vma,
|
||||
@@ -122,6 +81,8 @@ static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
|
||||
.mm = mm,
|
||||
.dst_id = ats_context->residency_id,
|
||||
.dst_node_id = ats_context->residency_node,
|
||||
.start = start,
|
||||
.length = length,
|
||||
.populate_permissions = write ? UVM_POPULATE_PERMISSIONS_WRITE : UVM_POPULATE_PERMISSIONS_ANY,
|
||||
.touch = true,
|
||||
.skip_mapped = true,
|
||||
@@ -132,13 +93,6 @@ static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
|
||||
|
||||
UVM_ASSERT(uvm_ats_can_service_faults(gpu_va_space, mm));
|
||||
|
||||
expand_fault_region(vma,
|
||||
start,
|
||||
length,
|
||||
ats_context->client_type,
|
||||
&uvm_migrate_args.start,
|
||||
&uvm_migrate_args.length);
|
||||
|
||||
// We are trying to use migrate_vma API in the kernel (if it exists) to
|
||||
// populate and map the faulting region on the GPU. We want to do this only
|
||||
// on the first touch. That is, pages which are not already mapped. So, we
|
||||
@@ -160,6 +114,8 @@ static void flush_tlb_write_faults(uvm_gpu_va_space_t *gpu_va_space,
|
||||
{
|
||||
uvm_ats_fault_invalidate_t *ats_invalidate;
|
||||
|
||||
uvm_ats_smmu_invalidate_tlbs(gpu_va_space, addr, size);
|
||||
|
||||
if (client_type == UVM_FAULT_CLIENT_TYPE_GPC)
|
||||
ats_invalidate = &gpu_va_space->gpu->parent->fault_buffer_info.replayable.ats_invalidate;
|
||||
else
|
||||
@@ -184,12 +140,22 @@ static void ats_batch_select_residency(uvm_gpu_va_space_t *gpu_va_space,
|
||||
struct mempolicy *vma_policy = vma_policy(vma);
|
||||
unsigned short mode;
|
||||
|
||||
ats_context->prefetch_state.has_preferred_location = false;
|
||||
|
||||
// It's safe to read vma_policy since the mmap_lock is held in at least read
|
||||
// mode in this path.
|
||||
uvm_assert_mmap_lock_locked(vma->vm_mm);
|
||||
|
||||
if (!vma_policy)
|
||||
goto done;
|
||||
|
||||
mode = vma_policy->mode;
|
||||
|
||||
if ((mode == MPOL_BIND) || (mode == MPOL_PREFERRED_MANY) || (mode == MPOL_PREFERRED)) {
|
||||
if ((mode == MPOL_BIND)
|
||||
#if defined(NV_MPOL_PREFERRED_MANY_PRESENT)
|
||||
|| (mode == MPOL_PREFERRED_MANY)
|
||||
#endif
|
||||
|| (mode == MPOL_PREFERRED)) {
|
||||
int home_node = NUMA_NO_NODE;
|
||||
|
||||
#if defined(NV_MEMPOLICY_HAS_HOME_NODE)
|
||||
@@ -212,6 +178,9 @@ static void ats_batch_select_residency(uvm_gpu_va_space_t *gpu_va_space,
|
||||
else
|
||||
residency = first_node(vma_policy->nodes);
|
||||
}
|
||||
|
||||
if (!nodes_empty(vma_policy->nodes))
|
||||
ats_context->prefetch_state.has_preferred_location = true;
|
||||
}
|
||||
|
||||
// Update gpu if residency is not the faulting gpu.
|
||||
@@ -219,12 +188,253 @@ static void ats_batch_select_residency(uvm_gpu_va_space_t *gpu_va_space,
|
||||
gpu = uvm_va_space_find_gpu_with_memory_node_id(gpu_va_space->va_space, residency);
|
||||
|
||||
done:
|
||||
#else
|
||||
ats_context->prefetch_state.has_preferred_location = false;
|
||||
#endif
|
||||
|
||||
ats_context->residency_id = gpu ? gpu->parent->id : UVM_ID_CPU;
|
||||
ats_context->residency_node = residency;
|
||||
}
|
||||
|
||||
static void get_range_in_vma(struct vm_area_struct *vma, NvU64 base, NvU64 *start, NvU64 *end)
|
||||
{
|
||||
*start = max(vma->vm_start, (unsigned long) base);
|
||||
*end = min(vma->vm_end, (unsigned long) (base + UVM_VA_BLOCK_SIZE));
|
||||
}
|
||||
|
||||
static uvm_page_index_t uvm_ats_cpu_page_index(NvU64 base, NvU64 addr)
|
||||
{
|
||||
UVM_ASSERT(addr >= base);
|
||||
UVM_ASSERT(addr <= (base + UVM_VA_BLOCK_SIZE));
|
||||
|
||||
return (addr - base) / PAGE_SIZE;
|
||||
}
|
||||
|
||||
// start and end must be aligned to PAGE_SIZE and must fall within
|
||||
// [base, base + UVM_VA_BLOCK_SIZE]
|
||||
static uvm_va_block_region_t uvm_ats_region_from_start_end(NvU64 start, NvU64 end)
|
||||
{
|
||||
// base can be greater than, less than or equal to the start of a VMA.
|
||||
NvU64 base = UVM_VA_BLOCK_ALIGN_DOWN(start);
|
||||
|
||||
UVM_ASSERT(start < end);
|
||||
UVM_ASSERT(PAGE_ALIGNED(start));
|
||||
UVM_ASSERT(PAGE_ALIGNED(end));
|
||||
UVM_ASSERT(IS_ALIGNED(base, UVM_VA_BLOCK_SIZE));
|
||||
|
||||
return uvm_va_block_region(uvm_ats_cpu_page_index(base, start), uvm_ats_cpu_page_index(base, end));
|
||||
}
|
||||
|
||||
static uvm_va_block_region_t uvm_ats_region_from_vma(struct vm_area_struct *vma, NvU64 base)
|
||||
{
|
||||
NvU64 start;
|
||||
NvU64 end;
|
||||
|
||||
get_range_in_vma(vma, base, &start, &end);
|
||||
|
||||
return uvm_ats_region_from_start_end(start, end);
|
||||
}
|
||||
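To make the index arithmetic above concrete, here is a small standalone sketch; the 2MB block size and 4KB page size are assumptions for the example, and ex_page_index() simply mirrors uvm_ats_cpu_page_index().

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE      4096ULL
#define EX_VA_BLOCK_SIZE  (2ULL * 1024 * 1024)   /* assumed UVM VA block size */

/* Mirrors uvm_ats_cpu_page_index(): page offset of addr within the block. */
static uint64_t ex_page_index(uint64_t base, uint64_t addr)
{
    return (addr - base) / EX_PAGE_SIZE;
}

int main(void)
{
    uint64_t base = 0x7f0000000000ULL;           /* block-aligned VA */

    assert(ex_page_index(base, base) == 0);
    assert(ex_page_index(base, base + 0x10000) == 16);
    assert(ex_page_index(base, base + EX_VA_BLOCK_SIZE) == 512);
    return 0;
}
```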
|
||||
#if UVM_ATS_PREFETCH_SUPPORTED()
|
||||
|
||||
static bool uvm_ats_invalidate_notifier(struct mmu_interval_notifier *mni, unsigned long cur_seq)
|
||||
{
|
||||
uvm_ats_fault_context_t *ats_context = container_of(mni, uvm_ats_fault_context_t, prefetch_state.notifier);
|
||||
uvm_va_space_t *va_space = ats_context->prefetch_state.va_space;
|
||||
|
||||
// The following write lock protects against concurrent invalidates while
|
||||
// hmm_range_fault() is being called in ats_compute_residency_mask().
|
||||
uvm_down_write(&va_space->ats.lock);
|
||||
|
||||
mmu_interval_set_seq(mni, cur_seq);
|
||||
|
||||
uvm_up_write(&va_space->ats.lock);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool uvm_ats_invalidate_notifier_entry(struct mmu_interval_notifier *mni,
|
||||
const struct mmu_notifier_range *range,
|
||||
unsigned long cur_seq)
|
||||
{
|
||||
UVM_ENTRY_RET(uvm_ats_invalidate_notifier(mni, cur_seq));
|
||||
}
|
||||
|
||||
static const struct mmu_interval_notifier_ops uvm_ats_notifier_ops =
|
||||
{
|
||||
.invalidate = uvm_ats_invalidate_notifier_entry,
|
||||
};
|
||||
|
||||
#endif
|
||||
|
||||
static NV_STATUS ats_compute_residency_mask(uvm_gpu_va_space_t *gpu_va_space,
|
||||
struct vm_area_struct *vma,
|
||||
NvU64 base,
|
||||
uvm_ats_fault_context_t *ats_context)
|
||||
{
|
||||
NV_STATUS status = NV_OK;
|
||||
|
||||
#if UVM_ATS_PREFETCH_SUPPORTED()
|
||||
int ret;
|
||||
NvU64 start;
|
||||
NvU64 end;
|
||||
uvm_page_mask_t *residency_mask = &ats_context->prefetch_state.residency_mask;
|
||||
struct hmm_range range;
|
||||
uvm_page_index_t page_index;
|
||||
uvm_va_block_region_t vma_region;
|
||||
uvm_va_space_t *va_space = gpu_va_space->va_space;
|
||||
struct mm_struct *mm = va_space->va_space_mm.mm;
|
||||
|
||||
uvm_assert_rwsem_locked_read(&va_space->lock);
|
||||
|
||||
ats_context->prefetch_state.first_touch = true;
|
||||
|
||||
uvm_page_mask_zero(residency_mask);
|
||||
|
||||
get_range_in_vma(vma, base, &start, &end);
|
||||
|
||||
vma_region = uvm_ats_region_from_start_end(start, end);
|
||||
|
||||
range.notifier = &ats_context->prefetch_state.notifier;
|
||||
range.start = start;
|
||||
range.end = end;
|
||||
range.hmm_pfns = ats_context->prefetch_state.pfns;
|
||||
range.default_flags = 0;
|
||||
range.pfn_flags_mask = 0;
|
||||
range.dev_private_owner = NULL;
|
||||
|
||||
ats_context->prefetch_state.va_space = va_space;
|
||||
|
||||
    // mmu_interval_notifier_insert() will try to acquire mmap_lock for write
    // and will deadlock since mmap_lock is already held for read in this path.
    // This is prevented by calling __mmu_notifier_register() during va_space
    // creation. See the comment in uvm_mmu_notifier_register() for more
    // details.
    ret = mmu_interval_notifier_insert(range.notifier, mm, start, end, &uvm_ats_notifier_ops);
    if (ret)
        return errno_to_nv_status(ret);
|
||||
while (true) {
|
||||
range.notifier_seq = mmu_interval_read_begin(range.notifier);
|
||||
ret = hmm_range_fault(&range);
|
||||
if (ret == -EBUSY)
|
||||
continue;
|
||||
if (ret) {
|
||||
status = errno_to_nv_status(ret);
|
||||
UVM_ASSERT(status != NV_OK);
|
||||
break;
|
||||
}
|
||||
|
||||
uvm_down_read(&va_space->ats.lock);
|
||||
|
||||
// Pages may have been freed or re-allocated after hmm_range_fault() is
|
||||
// called. So the PTE might point to a different page or nothing. In the
|
||||
// memory hot-unplug case it is not safe to call page_to_nid() on the
|
||||
// page as the struct page itself may have been freed. To protect
|
||||
// against these cases, uvm_ats_invalidate_entry() blocks on va_space
|
||||
// ATS write lock for concurrent invalidates since va_space ATS lock is
|
||||
// held for read in this path.
|
||||
if (!mmu_interval_read_retry(range.notifier, range.notifier_seq))
|
||||
break;
|
||||
|
||||
uvm_up_read(&va_space->ats.lock);
|
||||
}
|
||||
|
||||
if (status == NV_OK) {
|
||||
for_each_va_block_page_in_region(page_index, vma_region) {
|
||||
unsigned long pfn = ats_context->prefetch_state.pfns[page_index - vma_region.first];
|
||||
|
||||
if (pfn & HMM_PFN_VALID) {
|
||||
struct page *page = hmm_pfn_to_page(pfn);
|
||||
|
||||
if (page_to_nid(page) == ats_context->residency_node)
|
||||
uvm_page_mask_set(residency_mask, page_index);
|
||||
|
||||
ats_context->prefetch_state.first_touch = false;
|
||||
}
|
||||
}
|
||||
|
||||
uvm_up_read(&va_space->ats.lock);
|
||||
}
|
||||
|
||||
mmu_interval_notifier_remove(range.notifier);
|
||||
|
||||
#endif
|
||||
|
||||
return status;
|
||||
}
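
For readers unfamiliar with the residency-mask bookkeeping above, the following is a minimal, self-contained sketch (illustrative only, with made-up page and node values; it is not part of the driver) of how a per-page node check turns into a residency bitmask plus a first-touch flag.

/* Illustrative sketch: building a residency bitmask from per-page NUMA node
 * IDs, mirroring the page_to_nid() loop above. A node ID of -1 stands in for
 * a page that is not populated (no valid PFN). */
#include <stdio.h>

#define NUM_PAGES 8

int main(void)
{
    int page_node[NUM_PAGES] = {-1, 0, 1, -1, 0, 2, 0, 1};  /* example per-page nodes */
    int residency_node = 0;                                  /* node treated as resident */
    unsigned int residency_mask = 0;
    int first_touch = 1;
    int i;

    for (i = 0; i < NUM_PAGES; i++) {
        if (page_node[i] < 0)
            continue;                        /* page not populated, skip */

        first_touch = 0;                     /* at least one page already exists */

        if (page_node[i] == residency_node)
            residency_mask |= 1u << i;       /* mark page as resident */
    }

    printf("residency_mask = 0x%02x, first_touch = %d\n", residency_mask, first_touch);
    return 0;
}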

static void ats_expand_fault_region(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
uvm_ats_fault_context_t *ats_context,
uvm_va_block_region_t max_prefetch_region,
uvm_page_mask_t *faulted_mask)
{
uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
uvm_page_mask_t *residency_mask = &ats_context->prefetch_state.residency_mask;
uvm_page_mask_t *prefetch_mask = &ats_context->prefetch_state.prefetch_pages_mask;
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree = &ats_context->prefetch_state.bitmap_tree;

if (uvm_page_mask_empty(faulted_mask))
return;

uvm_perf_prefetch_compute_ats(gpu_va_space->va_space,
faulted_mask,
uvm_va_block_region_from_mask(NULL, faulted_mask),
max_prefetch_region,
residency_mask,
bitmap_tree,
prefetch_mask);

uvm_page_mask_or(read_fault_mask, read_fault_mask, prefetch_mask);

if (vma->vm_flags & VM_WRITE)
uvm_page_mask_or(write_fault_mask, write_fault_mask, prefetch_mask);
}

static NV_STATUS ats_fault_prefetch(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context)
{
NV_STATUS status = NV_OK;
uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
uvm_page_mask_t *faulted_mask = &ats_context->faulted_mask;
uvm_page_mask_t *prefetch_mask = &ats_context->prefetch_state.prefetch_pages_mask;
uvm_va_block_region_t max_prefetch_region = uvm_ats_region_from_vma(vma, base);

if (!uvm_perf_prefetch_enabled(gpu_va_space->va_space))
return status;

if (uvm_page_mask_empty(faulted_mask))
return status;

status = ats_compute_residency_mask(gpu_va_space, vma, base, ats_context);
if (status != NV_OK)
return status;

// Prefetch the entire region if none of the pages are resident on any node
// and if preferred_location is the faulting GPU.
if (ats_context->prefetch_state.has_preferred_location &&
ats_context->prefetch_state.first_touch &&
uvm_id_equal(ats_context->residency_id, gpu_va_space->gpu->parent->id)) {

uvm_page_mask_init_from_region(prefetch_mask, max_prefetch_region, NULL);
uvm_page_mask_or(read_fault_mask, read_fault_mask, prefetch_mask);

if (vma->vm_flags & VM_WRITE)
uvm_page_mask_or(write_fault_mask, write_fault_mask, prefetch_mask);

return status;
}

ats_expand_fault_region(gpu_va_space, vma, ats_context, max_prefetch_region, faulted_mask);

return status;
}

NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
@@ -263,10 +473,16 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
uvm_page_mask_and(write_fault_mask, write_fault_mask, read_fault_mask);
else
uvm_page_mask_zero(write_fault_mask);

// There are no pending faults beyond write faults to RO region.
if (uvm_page_mask_empty(read_fault_mask))
return status;
}

ats_batch_select_residency(gpu_va_space, vma, ats_context);

ats_fault_prefetch(gpu_va_space, vma, base, ats_context);

for_each_va_block_subregion_in_mask(subregion, write_fault_mask, region) {
NvU64 start = base + (subregion.first * PAGE_SIZE);
size_t length = uvm_va_block_region_num_pages(subregion) * PAGE_SIZE;
@@ -374,4 +590,3 @@ NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,

return status;
}


@@ -29,8 +29,12 @@
#include "uvm_va_space.h"
#include "uvm_va_space_mm.h"

#include <asm/io.h>
#include <linux/iommu.h>
#include <linux/mm_types.h>
#include <linux/acpi.h>
#include <linux/device.h>
#include <linux/mmu_context.h>

// linux/sched/mm.h is needed for mmget_not_zero and mmput to get the mm
// reference required for the iommu_sva_bind_device() call. This header is not
@@ -46,17 +50,271 @@
#define UVM_IOMMU_SVA_BIND_DEVICE(dev, mm) iommu_sva_bind_device(dev, mm)
#endif

// Base address of SMMU CMDQ-V for GSMMU0.
#define SMMU_CMDQV_BASE_ADDR(smmu_base) (smmu_base + 0x200000)
#define SMMU_CMDQV_BASE_LEN 0x00830000

// CMDQV configuration is done by firmware but we check status here.
#define SMMU_CMDQV_CONFIG 0x0
#define SMMU_CMDQV_CONFIG_CMDQV_EN BIT(0)

// Used to map a particular VCMDQ to a VINTF.
#define SMMU_CMDQV_CMDQ_ALLOC_MAP(vcmdq_id) (0x200 + 0x4 * (vcmdq_id))
#define SMMU_CMDQV_CMDQ_ALLOC_MAP_ALLOC BIT(0)

// Shift for the field containing the index of the virtual interface
// owning the VCMDQ.
#define SMMU_CMDQV_CMDQ_ALLOC_MAP_VIRT_INTF_INDX_SHIFT 15

// Base address for the VINTF registers.
#define SMMU_VINTF_BASE_ADDR(cmdqv_base_addr, vintf_id) (cmdqv_base_addr + 0x1000 + 0x100 * (vintf_id))

// Virtual interface (VINTF) configuration registers. The WAR only
// works on baremetal so we need to configure ourselves as the
// hypervisor owner.
#define SMMU_VINTF_CONFIG 0x0
#define SMMU_VINTF_CONFIG_ENABLE BIT(0)
#define SMMU_VINTF_CONFIG_HYP_OWN BIT(17)

#define SMMU_VINTF_STATUS 0x0
#define SMMU_VINTF_STATUS_ENABLED BIT(0)

// Calculates the base address for a particular VCMDQ instance.
#define SMMU_VCMDQ_BASE_ADDR(cmdqv_base_addr, vcmdq_id) (cmdqv_base_addr + 0x10000 + 0x80 * (vcmdq_id))

// SMMU command queue consumer index register. Updated by SMMU
// when commands are consumed.
#define SMMU_VCMDQ_CONS 0x0

// SMMU command queue producer index register. Updated by UVM when
// commands are added to the queue.
#define SMMU_VCMDQ_PROD 0x4

// Configuration register used to enable a VCMDQ.
#define SMMU_VCMDQ_CONFIG 0x8
#define SMMU_VCMDQ_CONFIG_ENABLE BIT(0)

// Status register used to check the VCMDQ is enabled.
#define SMMU_VCMDQ_STATUS 0xc
#define SMMU_VCMDQ_STATUS_ENABLED BIT(0)

// Base address offset for the VCMDQ registers.
#define SMMU_VCMDQ_CMDQ_BASE 0x10000

// Size of the command queue. Each command is 8 bytes and we can't
// have a command queue greater than one page.
#define SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE 9
#define SMMU_VCMDQ_CMDQ_ENTRIES (1UL << SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE)
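
As a reading aid, here is a minimal sketch of how a log2-sized queue index with an extra most-significant wrap bit behaves. The constants mirror the defines above, but the snippet is illustrative only and not part of the driver.

/* Illustrative sketch: producer index arithmetic for a 2^9-entry queue with
 * an extra wrap bit kept above the index bits, as used by the defines above. */
#include <stdio.h>

#define LOG2SIZE 9
#define ENTRIES  (1UL << LOG2SIZE)

int main(void)
{
    unsigned long prod = ENTRIES - 1;        /* last slot of the first pass      */

    prod++;                                  /* one more command is queued       */
    prod &= (1UL << (LOG2SIZE + 1)) - 1;     /* keep slot index plus wrap bit    */

    /* Prints "slot = 0, wrap = 1": the index wrapped and the wrap bit toggled. */
    printf("slot = %lu, wrap = %lu\n", prod % ENTRIES, prod >> LOG2SIZE);
    return 0;
}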

// We always use VINTF63 for the WAR
#define VINTF 63
static void smmu_vintf_write32(void __iomem *smmu_cmdqv_base, int reg, NvU32 val)
{
iowrite32(val, SMMU_VINTF_BASE_ADDR(smmu_cmdqv_base, VINTF) + reg);
}

static NvU32 smmu_vintf_read32(void __iomem *smmu_cmdqv_base, int reg)
{
return ioread32(SMMU_VINTF_BASE_ADDR(smmu_cmdqv_base, VINTF) + reg);
}

// We always use VCMDQ127 for the WAR
#define VCMDQ 127
void smmu_vcmdq_write32(void __iomem *smmu_cmdqv_base, int reg, NvU32 val)
{
iowrite32(val, SMMU_VCMDQ_BASE_ADDR(smmu_cmdqv_base, VCMDQ) + reg);
}

NvU32 smmu_vcmdq_read32(void __iomem *smmu_cmdqv_base, int reg)
{
return ioread32(SMMU_VCMDQ_BASE_ADDR(smmu_cmdqv_base, VCMDQ) + reg);
}

static void smmu_vcmdq_write64(void __iomem *smmu_cmdqv_base, int reg, NvU64 val)
{
iowrite64(val, SMMU_VCMDQ_BASE_ADDR(smmu_cmdqv_base, VCMDQ) + reg);
}

// Fix for Bug 4130089: [GH180][r535] WAR for kernel not issuing SMMU
// TLB invalidates on read-only to read-write upgrades
static NV_STATUS uvm_ats_smmu_war_init(uvm_parent_gpu_t *parent_gpu)
{
uvm_spin_loop_t spin;
NV_STATUS status;
unsigned long cmdqv_config;
void __iomem *smmu_cmdqv_base;
struct acpi_iort_node *node;
struct acpi_iort_smmu_v3 *iort_smmu;

node = *(struct acpi_iort_node **) dev_get_platdata(parent_gpu->pci_dev->dev.iommu->iommu_dev->dev->parent);
iort_smmu = (struct acpi_iort_smmu_v3 *) node->node_data;

smmu_cmdqv_base = ioremap(SMMU_CMDQV_BASE_ADDR(iort_smmu->base_address), SMMU_CMDQV_BASE_LEN);
if (!smmu_cmdqv_base)
return NV_ERR_NO_MEMORY;

parent_gpu->smmu_war.smmu_cmdqv_base = smmu_cmdqv_base;
cmdqv_config = ioread32(smmu_cmdqv_base + SMMU_CMDQV_CONFIG);
if (!(cmdqv_config & SMMU_CMDQV_CONFIG_CMDQV_EN)) {
status = NV_ERR_OBJECT_NOT_FOUND;
goto out;
}

// Allocate SMMU CMDQ pages for WAR
parent_gpu->smmu_war.smmu_cmdq = alloc_page(NV_UVM_GFP_FLAGS | __GFP_ZERO);
if (!parent_gpu->smmu_war.smmu_cmdq) {
status = NV_ERR_NO_MEMORY;
goto out;
}

// Initialise VINTF for the WAR
smmu_vintf_write32(smmu_cmdqv_base, SMMU_VINTF_CONFIG, SMMU_VINTF_CONFIG_ENABLE | SMMU_VINTF_CONFIG_HYP_OWN);
UVM_SPIN_WHILE(!(smmu_vintf_read32(smmu_cmdqv_base, SMMU_VINTF_STATUS) & SMMU_VINTF_STATUS_ENABLED), &spin);

// Allocate VCMDQ to VINTF
iowrite32((VINTF << SMMU_CMDQV_CMDQ_ALLOC_MAP_VIRT_INTF_INDX_SHIFT) | SMMU_CMDQV_CMDQ_ALLOC_MAP_ALLOC,
smmu_cmdqv_base + SMMU_CMDQV_CMDQ_ALLOC_MAP(VCMDQ));

BUILD_BUG_ON((SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE + 3) > PAGE_SHIFT);
smmu_vcmdq_write64(smmu_cmdqv_base, SMMU_VCMDQ_CMDQ_BASE,
page_to_phys(parent_gpu->smmu_war.smmu_cmdq) | SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE);
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_CONS, 0);
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_PROD, 0);
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_CONFIG, SMMU_VCMDQ_CONFIG_ENABLE);
UVM_SPIN_WHILE(!(smmu_vcmdq_read32(smmu_cmdqv_base, SMMU_VCMDQ_STATUS) & SMMU_VCMDQ_STATUS_ENABLED), &spin);

uvm_mutex_init(&parent_gpu->smmu_war.smmu_lock, UVM_LOCK_ORDER_LEAF);
parent_gpu->smmu_war.smmu_prod = 0;
parent_gpu->smmu_war.smmu_cons = 0;

return NV_OK;

out:
iounmap(parent_gpu->smmu_war.smmu_cmdqv_base);
parent_gpu->smmu_war.smmu_cmdqv_base = NULL;

return status;
}

static void uvm_ats_smmu_war_deinit(uvm_parent_gpu_t *parent_gpu)
{
void __iomem *smmu_cmdqv_base = parent_gpu->smmu_war.smmu_cmdqv_base;
NvU32 cmdq_alloc_map;

if (parent_gpu->smmu_war.smmu_cmdqv_base) {
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_CONFIG, 0);
cmdq_alloc_map = ioread32(smmu_cmdqv_base + SMMU_CMDQV_CMDQ_ALLOC_MAP(VCMDQ));
iowrite32(cmdq_alloc_map & SMMU_CMDQV_CMDQ_ALLOC_MAP_ALLOC, smmu_cmdqv_base + SMMU_CMDQV_CMDQ_ALLOC_MAP(VCMDQ));
smmu_vintf_write32(smmu_cmdqv_base, SMMU_VINTF_CONFIG, 0);
}

if (parent_gpu->smmu_war.smmu_cmdq)
__free_page(parent_gpu->smmu_war.smmu_cmdq);

if (parent_gpu->smmu_war.smmu_cmdqv_base)
iounmap(parent_gpu->smmu_war.smmu_cmdqv_base);
}

// The SMMU on ARM64 can run under different translation regimes depending on
// what features the OS and CPU variant support. The CPU for GH180 supports
// virtualisation extensions and starts the kernel at EL2 meaning SMMU operates
// under the NS-EL2-E2H translation regime. Therefore we need to use the
// TLBI_EL2_* commands which invalidate TLB entries created under this
// translation regime.
#define CMDQ_OP_TLBI_EL2_ASID 0x21;
#define CMDQ_OP_TLBI_EL2_VA 0x22;
#define CMDQ_OP_CMD_SYNC 0x46

// Use the same maximum as used for MAX_TLBI_OPS in the upstream
// kernel.
#define UVM_MAX_TLBI_OPS (1UL << (PAGE_SHIFT - 3))

#if UVM_ATS_SMMU_WAR_REQUIRED()
void uvm_ats_smmu_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space, NvU64 addr, size_t size)
{
struct mm_struct *mm = gpu_va_space->va_space->va_space_mm.mm;
uvm_parent_gpu_t *parent_gpu = gpu_va_space->gpu->parent;
struct {
NvU64 low;
NvU64 high;
} *vcmdq;
unsigned long vcmdq_prod;
NvU64 end;
uvm_spin_loop_t spin;
NvU16 asid;

if (!parent_gpu->smmu_war.smmu_cmdqv_base)
return;

asid = arm64_mm_context_get(mm);
vcmdq = kmap(parent_gpu->smmu_war.smmu_cmdq);
uvm_mutex_lock(&parent_gpu->smmu_war.smmu_lock);
vcmdq_prod = parent_gpu->smmu_war.smmu_prod;

// Our queue management is very simple. The mutex prevents multiple
// producers writing to the queue and all our commands require waiting for
// the queue to drain so we know it's empty. If we can't fit enough commands
// in the queue we just invalidate the whole ASID.
//
// The command queue is a circular buffer with the MSB representing a wrap
// bit that must toggle on each wrap. See the SMMU architecture
// specification for more details.
//
// SMMU_VCMDQ_CMDQ_ENTRIES - 1 because we need to leave space for the
// CMD_SYNC.
if ((size >> PAGE_SHIFT) > min(UVM_MAX_TLBI_OPS, SMMU_VCMDQ_CMDQ_ENTRIES - 1)) {
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low = CMDQ_OP_TLBI_EL2_ASID;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low |= (NvU64) asid << 48;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].high = 0;
vcmdq_prod++;
}
else {
for (end = addr + size; addr < end; addr += PAGE_SIZE) {
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low = CMDQ_OP_TLBI_EL2_VA;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low |= (NvU64) asid << 48;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].high = addr & ~((1UL << 12) - 1);
vcmdq_prod++;
}
}

vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low = CMDQ_OP_CMD_SYNC;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].high = 0x0;
vcmdq_prod++;

// MSB is the wrap bit
vcmdq_prod &= (1UL << (SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE + 1)) - 1;
parent_gpu->smmu_war.smmu_prod = vcmdq_prod;
smmu_vcmdq_write32(parent_gpu->smmu_war.smmu_cmdqv_base, SMMU_VCMDQ_PROD, parent_gpu->smmu_war.smmu_prod);

UVM_SPIN_WHILE(
(smmu_vcmdq_read32(parent_gpu->smmu_war.smmu_cmdqv_base, SMMU_VCMDQ_CONS) & GENMASK(19, 0)) != vcmdq_prod,
&spin);

uvm_mutex_unlock(&parent_gpu->smmu_war.smmu_lock);
kunmap(parent_gpu->smmu_war.smmu_cmdq);
arm64_mm_context_put(mm);
}
#endif
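
The sketch below is a standalone illustration (made-up ASID and address values; it is not part of the diff) of how the two 64-bit command words for a per-VA invalidate are packed, mirroring the loop in uvm_ats_smmu_invalidate_tlbs() above.

/* Illustrative sketch: packing one TLBI-by-VA command word pair the way the
 * invalidation loop above does. The opcode value matches CMDQ_OP_TLBI_EL2_VA;
 * the ASID and address are example values only. */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t opcode = 0x22;                   /* TLBI by VA, EL2 regime       */
    uint16_t asid   = 0x1234;                 /* example address space ID     */
    uint64_t addr   = 0x0000aaaabbbbc123ULL;  /* example faulting address     */

    uint64_t low  = opcode | ((uint64_t)asid << 48);
    uint64_t high = addr & ~UINT64_C(0xFFF);  /* drop the 4 KiB page offset   */

    printf("low = 0x%016" PRIx64 ", high = 0x%016" PRIx64 "\n", low, high);
    return 0;
}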

NV_STATUS uvm_ats_sva_add_gpu(uvm_parent_gpu_t *parent_gpu)
{
int ret;

ret = iommu_dev_enable_feature(&parent_gpu->pci_dev->dev, IOMMU_DEV_FEAT_SVA);
if (ret)
return errno_to_nv_status(ret);

return errno_to_nv_status(ret);
if (UVM_ATS_SMMU_WAR_REQUIRED())
return uvm_ats_smmu_war_init(parent_gpu);
else
return NV_OK;
}

void uvm_ats_sva_remove_gpu(uvm_parent_gpu_t *parent_gpu)
{
if (UVM_ATS_SMMU_WAR_REQUIRED())
uvm_ats_smmu_war_deinit(parent_gpu);

iommu_dev_disable_feature(&parent_gpu->pci_dev->dev, IOMMU_DEV_FEAT_SVA);
}


@@ -32,23 +32,38 @@
// For ATS support on aarch64, arm_smmu_sva_bind() is needed for
// iommu_sva_bind_device() calls. Unfortunately, arm_smmu_sva_bind() is not
// conftest-able. We instead look for the presence of ioasid_get() or
// mm_pasid_set(). ioasid_get() was added in the same patch series as
// arm_smmu_sva_bind() and removed in v6.0. mm_pasid_set() was added in the
// mm_pasid_drop(). ioasid_get() was added in the same patch series as
// arm_smmu_sva_bind() and removed in v6.0. mm_pasid_drop() was added in the
// same patch as the removal of ioasid_get(). We assume the presence of
// arm_smmu_sva_bind() if ioasid_get(v5.11 - v5.17) or mm_pasid_set(v5.18+) is
// arm_smmu_sva_bind() if ioasid_get(v5.11 - v5.17) or mm_pasid_drop(v5.18+) is
// present.
//
// arm_smmu_sva_bind() was added with commit
// 32784a9562fb0518b12e9797ee2aec52214adf6f and ioasid_get() was added with
// commit cb4789b0d19ff231ce9f73376a023341300aed96 (11/23/2020). Commit
// 701fac40384f07197b106136012804c3cae0b3de (02/15/2022) removed ioasid_get()
// and added mm_pasid_set().
#if UVM_CAN_USE_MMU_NOTIFIERS() && (defined(NV_IOASID_GET_PRESENT) || defined(NV_MM_PASID_SET_PRESENT))
#define UVM_ATS_SVA_SUPPORTED() 1
// and added mm_pasid_drop().
#if UVM_CAN_USE_MMU_NOTIFIERS() && (defined(NV_IOASID_GET_PRESENT) || defined(NV_MM_PASID_DROP_PRESENT))
#if defined(CONFIG_IOMMU_SVA)
#define UVM_ATS_SVA_SUPPORTED() 1
#else
#define UVM_ATS_SVA_SUPPORTED() 0
#endif
#else
#define UVM_ATS_SVA_SUPPORTED() 0
#endif

// If NV_ARCH_INVALIDATE_SECONDARY_TLBS is defined it means the upstream fix is
// in place so no need for the WAR from Bug 4130089: [GH180][r535] WAR for
// kernel not issuing SMMU TLB invalidates on read-only
#if defined(NV_ARCH_INVALIDATE_SECONDARY_TLBS)
#define UVM_ATS_SMMU_WAR_REQUIRED() 0
#elif NVCPU_IS_AARCH64
#define UVM_ATS_SMMU_WAR_REQUIRED() 1
#else
#define UVM_ATS_SMMU_WAR_REQUIRED() 0
#endif

typedef struct
{
int placeholder;
@@ -77,6 +92,17 @@ typedef struct

// LOCKING: None
void uvm_ats_sva_unregister_gpu_va_space(uvm_gpu_va_space_t *gpu_va_space);

// Fix for Bug 4130089: [GH180][r535] WAR for kernel not issuing SMMU
// TLB invalidates on read-only to read-write upgrades
#if UVM_ATS_SMMU_WAR_REQUIRED()
void uvm_ats_smmu_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space, NvU64 addr, size_t size);
#else
static void uvm_ats_smmu_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space, NvU64 addr, size_t size)
{

}
#endif
#else
static NV_STATUS uvm_ats_sva_add_gpu(uvm_parent_gpu_t *parent_gpu)
{
@@ -107,6 +133,11 @@ typedef struct
{

}

static void uvm_ats_smmu_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space, NvU64 addr, size_t size)
{

}
#endif // UVM_ATS_SVA_SUPPORTED

#endif // __UVM_ATS_SVA_H__

@@ -191,7 +191,7 @@ static NV_STATUS test_membar(uvm_gpu_t *gpu)

for (i = 0; i < REDUCTIONS; ++i) {
uvm_push_set_flag(&push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE);
gpu->parent->ce_hal->semaphore_reduction_inc(&push, host_mem_gpu_va, REDUCTIONS + 1);
gpu->parent->ce_hal->semaphore_reduction_inc(&push, host_mem_gpu_va, REDUCTIONS);
}

// Without a sys membar the channel tracking semaphore can and does complete
@@ -577,7 +577,7 @@ static NV_STATUS test_semaphore_reduction_inc(uvm_gpu_t *gpu)

for (i = 0; i < REDUCTIONS; i++) {
uvm_push_set_flag(&push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE);
gpu->parent->ce_hal->semaphore_reduction_inc(&push, gpu_va, i+1);
gpu->parent->ce_hal->semaphore_reduction_inc(&push, gpu_va, REDUCTIONS);
}

status = uvm_push_end_and_wait(&push);
@@ -760,7 +760,7 @@ static NV_STATUS alloc_vidmem_protected(uvm_gpu_t *gpu, uvm_mem_t **mem, size_t

*mem = NULL;

TEST_NV_CHECK_RET(uvm_mem_alloc_vidmem_protected(size, gpu, mem));
TEST_NV_CHECK_RET(uvm_mem_alloc_vidmem(size, gpu, mem));
TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_kernel(*mem, gpu), err);
TEST_NV_CHECK_GOTO(zero_vidmem(*mem), err);


@@ -272,19 +272,26 @@ static bool try_claim_channel(uvm_channel_t *channel, NvU32 num_gpfifo_entries)

static void unlock_channel_for_push(uvm_channel_t *channel)
{
if (uvm_channel_is_secure(channel)) {
NvU32 index = uvm_channel_index_in_pool(channel);
NvU32 index;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

uvm_channel_pool_assert_locked(channel->pool);
UVM_ASSERT(test_bit(index, channel->pool->push_locks));
__clear_bit(index, channel->pool->push_locks);
uvm_up_out_of_order(&channel->pool->push_sem);
}
if (!uvm_conf_computing_mode_enabled(gpu))
return;

index = uvm_channel_index_in_pool(channel);

uvm_channel_pool_assert_locked(channel->pool);
UVM_ASSERT(test_bit(index, channel->pool->push_locks));

__clear_bit(index, channel->pool->push_locks);
uvm_up_out_of_order(&channel->pool->push_sem);
}

static bool is_channel_locked_for_push(uvm_channel_t *channel)
{
if (uvm_channel_is_secure(channel))
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

if (uvm_conf_computing_mode_enabled(gpu))
return test_bit(uvm_channel_index_in_pool(channel), channel->pool->push_locks);

// For CE and proxy channels, we always return that the channel is locked,
@@ -295,25 +302,25 @@ static bool is_channel_locked_for_push(uvm_channel_t *channel)

static void lock_channel_for_push(uvm_channel_t *channel)
{
if (uvm_channel_is_secure(channel)) {
NvU32 index = uvm_channel_index_in_pool(channel);
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);
NvU32 index = uvm_channel_index_in_pool(channel);

uvm_channel_pool_assert_locked(channel->pool);
UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));
uvm_channel_pool_assert_locked(channel->pool);
UVM_ASSERT(!test_bit(index, channel->pool->push_locks));

UVM_ASSERT(!test_bit(index, channel->pool->push_locks));
__set_bit(index, channel->pool->push_locks);
}
__set_bit(index, channel->pool->push_locks);
}

static bool test_claim_and_lock_channel(uvm_channel_t *channel, NvU32 num_gpfifo_entries)
{
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);
NvU32 index = uvm_channel_index_in_pool(channel);

UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));
uvm_channel_pool_assert_locked(channel->pool);

if (uvm_channel_is_secure(channel) &&
!test_bit(index, channel->pool->push_locks) &&
try_claim_channel_locked(channel, num_gpfifo_entries)) {
if (!test_bit(index, channel->pool->push_locks) && try_claim_channel_locked(channel, num_gpfifo_entries)) {
lock_channel_for_push(channel);
return true;
}
@@ -321,57 +328,15 @@ static bool test_claim_and_lock_channel(uvm_channel_t *channel, NvU32 num_gpfifo
return false;
}

// Reserve a channel in the specified CE pool
static NV_STATUS channel_reserve_in_ce_pool(uvm_channel_pool_t *pool, uvm_channel_t **channel_out)
{
uvm_channel_t *channel;
uvm_spin_loop_t spin;

UVM_ASSERT(pool);
UVM_ASSERT(uvm_channel_pool_is_ce(pool));

uvm_for_each_channel_in_pool(channel, pool) {
// TODO: Bug 1764953: Prefer idle/less busy channels
if (try_claim_channel(channel, 1)) {
*channel_out = channel;
return NV_OK;
}
}

uvm_spin_loop_init(&spin);
while (1) {
uvm_for_each_channel_in_pool(channel, pool) {
NV_STATUS status;

uvm_channel_update_progress(channel);

if (try_claim_channel(channel, 1)) {
*channel_out = channel;

return NV_OK;
}

status = uvm_channel_check_errors(channel);
if (status != NV_OK)
return status;

UVM_SPIN_LOOP(&spin);
}
}

UVM_ASSERT_MSG(0, "Cannot get here?!\n");
return NV_ERR_GENERIC;
}

// Reserve a channel in the specified secure pool
static NV_STATUS channel_reserve_in_secure_pool(uvm_channel_pool_t *pool, uvm_channel_t **channel_out)
// Reserve a channel in the specified pool. The channel is locked until the push
// ends
static NV_STATUS channel_reserve_and_lock_in_pool(uvm_channel_pool_t *pool, uvm_channel_t **channel_out)
{
uvm_channel_t *channel;
uvm_spin_loop_t spin;
NvU32 index;

UVM_ASSERT(pool);
UVM_ASSERT(pool->secure);
UVM_ASSERT(uvm_conf_computing_mode_enabled(pool->manager->gpu));

// This semaphore is uvm_up() in unlock_channel_for_push() as part of the
@@ -426,6 +391,51 @@ done:
return NV_OK;
}

// Reserve a channel in the specified pool
static NV_STATUS channel_reserve_in_pool(uvm_channel_pool_t *pool, uvm_channel_t **channel_out)
{
uvm_channel_t *channel;
uvm_spin_loop_t spin;

UVM_ASSERT(pool);

if (uvm_conf_computing_mode_enabled(pool->manager->gpu))
return channel_reserve_and_lock_in_pool(pool, channel_out);

uvm_for_each_channel_in_pool(channel, pool) {
// TODO: Bug 1764953: Prefer idle/less busy channels
if (try_claim_channel(channel, 1)) {
*channel_out = channel;
return NV_OK;
}
}

uvm_spin_loop_init(&spin);
while (1) {
uvm_for_each_channel_in_pool(channel, pool) {
NV_STATUS status;

uvm_channel_update_progress(channel);

if (try_claim_channel(channel, 1)) {
*channel_out = channel;

return NV_OK;
}

status = uvm_channel_check_errors(channel);
if (status != NV_OK)
return status;

UVM_SPIN_LOOP(&spin);
}
}

UVM_ASSERT_MSG(0, "Cannot get here?!\n");

return NV_ERR_GENERIC;
}

NV_STATUS uvm_channel_reserve_type(uvm_channel_manager_t *manager, uvm_channel_type_t type, uvm_channel_t **channel_out)
{
uvm_channel_pool_t *pool = manager->pool_to_use.default_for_type[type];
@@ -433,10 +443,7 @@ NV_STATUS uvm_channel_reserve_type(uvm_channel_manager_t *manager, uvm_channel_t
UVM_ASSERT(pool != NULL);
UVM_ASSERT(type < UVM_CHANNEL_TYPE_COUNT);

if (pool->secure)
return channel_reserve_in_secure_pool(pool, channel_out);

return channel_reserve_in_ce_pool(pool, channel_out);
return channel_reserve_in_pool(pool, channel_out);
}

NV_STATUS uvm_channel_reserve_gpu_to_gpu(uvm_channel_manager_t *manager,
@@ -452,10 +459,7 @@ NV_STATUS uvm_channel_reserve_gpu_to_gpu(uvm_channel_manager_t *manager,

UVM_ASSERT(pool->pool_type == UVM_CHANNEL_POOL_TYPE_CE);

if (pool->secure)
return channel_reserve_in_secure_pool(pool, channel_out);

return channel_reserve_in_ce_pool(pool, channel_out);
return channel_reserve_in_pool(pool, channel_out);
}

NV_STATUS uvm_channel_manager_wait(uvm_channel_manager_t *manager)
@@ -491,7 +495,7 @@ static NvU32 channel_get_available_push_info_index(uvm_channel_t *channel)
return push_info - channel->push_infos;
}

static void channel_semaphore_gpu_encrypt_payload(uvm_push_t *push, uvm_channel_t *channel, NvU64 semaphore_va)
static void channel_semaphore_gpu_encrypt_payload(uvm_push_t *push, NvU64 semaphore_va)
{
NvU32 iv_index;
uvm_gpu_address_t notifier_gpu_va;
@@ -499,12 +503,14 @@ static void channel_semaphore_gpu_encrypt_payload(uvm_push_t *push, uvm_channel_
uvm_gpu_address_t semaphore_gpu_va;
uvm_gpu_address_t encrypted_payload_gpu_va;
uvm_gpu_t *gpu = push->gpu;
uvm_channel_t *channel = push->channel;
uvm_gpu_semaphore_t *semaphore = &channel->tracking_sem.semaphore;
UvmCslIv *iv_cpu_addr = semaphore->conf_computing.ivs;
NvU32 payload_size = sizeof(*semaphore->payload);
NvU32 *last_pushed_notifier = &semaphore->conf_computing.last_pushed_notifier;

UVM_ASSERT(uvm_channel_is_secure_ce(channel));
UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));
UVM_ASSERT(uvm_channel_is_ce(channel));

encrypted_payload_gpu_va = uvm_rm_mem_get_gpu_va(semaphore->conf_computing.encrypted_payload, gpu, false);
notifier_gpu_va = uvm_rm_mem_get_gpu_va(semaphore->conf_computing.notifier, gpu, false);
@@ -538,19 +544,21 @@ NV_STATUS uvm_channel_begin_push(uvm_channel_t *channel, uvm_push_t *push)
{
NV_STATUS status;
uvm_channel_manager_t *manager;
uvm_gpu_t *gpu;

UVM_ASSERT(channel);
UVM_ASSERT(push);

manager = channel->pool->manager;

gpu = uvm_channel_get_gpu(channel);

// Only SEC2 and WLC with set up fixed schedule can use direct push
// submission. All other cases (including WLC pre-schedule) need to
// reserve a launch channel that will be used to submit this push
// indirectly.
if (uvm_conf_computing_mode_enabled(uvm_channel_get_gpu(channel)) &&
!(uvm_channel_is_wlc(channel) && uvm_channel_manager_is_wlc_ready(manager)) &&
!uvm_channel_is_sec2(channel)) {
if (uvm_conf_computing_mode_enabled(gpu) && uvm_channel_is_ce(channel) &&
!(uvm_channel_is_wlc(channel) && uvm_channel_manager_is_wlc_ready(manager))) {
uvm_channel_type_t indirect_channel_type = uvm_channel_manager_is_wlc_ready(manager) ?
UVM_CHANNEL_TYPE_WLC :
UVM_CHANNEL_TYPE_SEC2;
@@ -559,9 +567,9 @@ NV_STATUS uvm_channel_begin_push(uvm_channel_t *channel, uvm_push_t *push)
return status;
}

// For secure channels, channel's lock should have been acquired in
// uvm_channel_reserve() or channel_reserve_in_secure_pool() before
// reaching here.
// When the Confidential Computing feature is enabled, the channel's lock
// should have already been acquired in uvm_channel_reserve() or
// channel_reserve_and_lock_in_pool().
UVM_ASSERT(is_channel_locked_for_push(channel));

push->channel = channel;
@@ -586,9 +594,8 @@ static void internal_channel_submit_work(uvm_push_t *push, NvU32 push_size, NvU3
NvU64 *gpfifo_entry;
NvU64 pushbuffer_va;
uvm_channel_t *channel = push->channel;
uvm_channel_manager_t *channel_manager = channel->pool->manager;
uvm_pushbuffer_t *pushbuffer = channel_manager->pushbuffer;
uvm_gpu_t *gpu = channel_manager->gpu;
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(channel);
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

BUILD_BUG_ON(sizeof(*gpfifo_entry) != NVB06F_GP_ENTRY__SIZE);
UVM_ASSERT(!uvm_channel_is_proxy(channel));
@@ -644,12 +651,11 @@ static void proxy_channel_submit_work(uvm_push_t *push, NvU32 push_size)
static void do_semaphore_release(uvm_push_t *push, NvU64 semaphore_va, NvU32 new_payload)
{
uvm_gpu_t *gpu = uvm_push_get_gpu(push);

if (uvm_channel_is_ce(push->channel))
gpu->parent->ce_hal->semaphore_release(push, semaphore_va, new_payload);
else if (uvm_channel_is_sec2(push->channel))
gpu->parent->sec2_hal->semaphore_release(push, semaphore_va, new_payload);
else
UVM_ASSERT_MSG(0, "Semaphore release on an unsupported channel.\n");
gpu->parent->sec2_hal->semaphore_release(push, semaphore_va, new_payload);
}

static void uvm_channel_tracking_semaphore_release(uvm_push_t *push, NvU64 semaphore_va, NvU32 new_payload)
@@ -668,8 +674,8 @@ static void uvm_channel_tracking_semaphore_release(uvm_push_t *push, NvU64 semap
// needs to be scheduled to get an encrypted shadow copy in unprotected
// sysmem. This allows UVM to later decrypt it and observe the new
// semaphore value.
if (uvm_channel_is_secure_ce(push->channel))
channel_semaphore_gpu_encrypt_payload(push, push->channel, semaphore_va);
if (uvm_conf_computing_mode_enabled(push->gpu) && uvm_channel_is_ce(push->channel))
channel_semaphore_gpu_encrypt_payload(push, semaphore_va);
}

static uvm_channel_t *get_paired_channel(uvm_channel_t *channel)
@@ -746,15 +752,12 @@ static void internal_channel_submit_work_wlc(uvm_push_t *push)
wmb();

// Ring the WLC doorbell to start processing the above push
UVM_GPU_WRITE_ONCE(*wlc_channel->channel_info.workSubmissionOffset,
wlc_channel->channel_info.workSubmissionToken);
UVM_GPU_WRITE_ONCE(*wlc_channel->channel_info.workSubmissionOffset, wlc_channel->channel_info.workSubmissionToken);
}

static void internal_channel_submit_work_indirect_wlc(uvm_push_t *push,
NvU32 old_cpu_put,
NvU32 new_gpu_put)
static void internal_channel_submit_work_indirect_wlc(uvm_push_t *push, NvU32 old_cpu_put, NvU32 new_gpu_put)
{
uvm_pushbuffer_t *pushbuffer = push->channel->pool->manager->pushbuffer;
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(push->channel);
uvm_gpu_t *gpu = uvm_push_get_gpu(push);

uvm_push_t indirect_push;
@@ -767,7 +770,7 @@ static void internal_channel_submit_work_indirect_wlc(uvm_push_t *push,
uvm_gpu_address_t push_enc_auth_tag_gpu;
NvU64 gpfifo_gpu_va = push->channel->channel_info.gpFifoGpuVa + old_cpu_put * sizeof(gpfifo_entry);

UVM_ASSERT(!uvm_channel_is_sec2(push->channel));
UVM_ASSERT(uvm_channel_is_ce(push->channel));
UVM_ASSERT(uvm_channel_is_wlc(push->launch_channel));

// WLC submissions are done under channel lock, so there should be no
@@ -848,8 +851,6 @@ static void update_gpput_via_sec2(uvm_push_t *sec2_push, uvm_channel_t *channel,
UVM_CONF_COMPUTING_AUTH_TAG_ALIGNMENT,
&gpput_auth_tag_gpu);

// Update GPPUT. The update needs 4B write to specific offset,
// however we can only do 16B aligned decrypt writes.
// A poison value is written to all other locations, this is ignored in
@@ -922,7 +923,7 @@ static void set_gpfifo_via_sec2(uvm_push_t *sec2_push, uvm_channel_t *channel, N
gpfifo_scratchpad[0] = previous_gpfifo->control_value;
}
else {
uvm_pushbuffer_t *pushbuffer = channel->pool->manager->pushbuffer;
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(channel);
NvU64 prev_pb_va = uvm_pushbuffer_get_gpu_va_base(pushbuffer) + previous_gpfifo->pushbuffer_offset;

// Reconstruct the previous gpfifo entry. UVM_GPFIFO_SYNC_WAIT is
@@ -951,11 +952,9 @@ static void set_gpfifo_via_sec2(uvm_push_t *sec2_push, uvm_channel_t *channel, N
gpfifo_auth_tag_gpu.address);
}

static NV_STATUS internal_channel_submit_work_indirect_sec2(uvm_push_t *push,
NvU32 old_cpu_put,
NvU32 new_gpu_put)
static NV_STATUS internal_channel_submit_work_indirect_sec2(uvm_push_t *push, NvU32 old_cpu_put, NvU32 new_gpu_put)
{
uvm_pushbuffer_t *pushbuffer = push->channel->pool->manager->pushbuffer;
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(push->channel);
uvm_gpu_t *gpu = uvm_push_get_gpu(push);

uvm_push_t indirect_push;
@@ -968,7 +967,7 @@ static NV_STATUS internal_channel_submit_work_indirect_sec2(uvm_push_t *push,
uvm_gpu_address_t push_auth_tag_gpu;
uvm_spin_loop_t spin;

UVM_ASSERT(!uvm_channel_is_sec2(push->channel));
UVM_ASSERT(uvm_channel_is_ce(push->channel));
UVM_ASSERT(uvm_channel_is_sec2(push->launch_channel));

// If the old_cpu_put is not equal to the last gpu put, other pushes are
@@ -1051,7 +1050,7 @@ static void encrypt_push(uvm_push_t *push)
uvm_gpu_t *gpu = uvm_push_get_gpu(push);
NvU32 push_size = uvm_push_get_size(push);
uvm_push_info_t *push_info = uvm_push_info_from_push(push);
uvm_pushbuffer_t *pushbuffer = channel->pool->manager->pushbuffer;
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(channel);
unsigned auth_tag_offset = UVM_CONF_COMPUTING_AUTH_TAG_SIZE * push->push_info_index;

if (!uvm_conf_computing_mode_enabled(gpu))
@@ -1098,6 +1097,7 @@ void uvm_channel_end_push(uvm_push_t *push)
NvU32 push_size;
NvU32 cpu_put;
NvU32 new_cpu_put;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);
bool needs_sec2_work_submit = false;

channel_pool_lock(channel->pool);
@@ -1112,7 +1112,7 @@ void uvm_channel_end_push(uvm_push_t *push)

if (uvm_channel_is_wlc(channel) && uvm_channel_manager_is_wlc_ready(channel_manager)) {
uvm_channel_t *paired_lcic = wlc_get_paired_lcic(channel);
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

gpu->parent->ce_hal->semaphore_reduction_inc(push,
paired_lcic->channel_info.gpPutGpuVa,
paired_lcic->num_gpfifo_entries - 1);
@@ -1126,7 +1126,7 @@ void uvm_channel_end_push(uvm_push_t *push)
// pushes. However, direct pushes to WLC can be smaller than this
// size. This is used e.g. by indirect submission of control
// gpfifo entries.
channel_manager->gpu->parent->host_hal->noop(push, UVM_MAX_WLC_PUSH_SIZE - uvm_push_get_size(push));
gpu->parent->host_hal->noop(push, UVM_MAX_WLC_PUSH_SIZE - uvm_push_get_size(push));
}
}

@@ -1144,7 +1144,7 @@ void uvm_channel_end_push(uvm_push_t *push)
// Indirect submission via SEC2/WLC needs pushes to be aligned for
// encryption/decryption. The pushbuffer_size of this push
// influences starting address of the next push.
if (uvm_conf_computing_mode_enabled(uvm_channel_get_gpu(channel)))
if (uvm_conf_computing_mode_enabled(gpu))
entry->pushbuffer_size = UVM_ALIGN_UP(push_size, UVM_CONF_COMPUTING_BUF_ALIGNMENT);
entry->push_info = &channel->push_infos[push->push_info_index];
entry->type = UVM_GPFIFO_ENTRY_TYPE_NORMAL;
@@ -1158,12 +1158,13 @@ void uvm_channel_end_push(uvm_push_t *push)
else if (uvm_channel_is_wlc(channel) && uvm_channel_manager_is_wlc_ready(channel_manager)) {
internal_channel_submit_work_wlc(push);
}
else if (uvm_conf_computing_mode_enabled(channel_manager->gpu) && !uvm_channel_is_sec2(channel)) {
else if (uvm_conf_computing_mode_enabled(gpu) && uvm_channel_is_ce(channel)) {
if (uvm_channel_manager_is_wlc_ready(channel_manager)) {
internal_channel_submit_work_indirect_wlc(push, cpu_put, new_cpu_put);
}
else {
// submitting via SEC2 starts a push, postpone until this push is ended
// submitting via SEC2 starts a push, postpone until this push is
// ended
needs_sec2_work_submit = true;
}
}
@@ -1202,12 +1203,13 @@ void uvm_channel_end_push(uvm_push_t *push)

static void submit_ctrl_gpfifo(uvm_channel_t *channel, uvm_gpfifo_entry_t *entry, NvU32 new_cpu_put)
{
uvm_gpu_t *gpu = channel->pool->manager->gpu;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);
NvU32 cpu_put = channel->cpu_put;
NvU64 *gpfifo_entry;

UVM_ASSERT(entry == &channel->gpfifo_entries[cpu_put]);
if (uvm_conf_computing_mode_enabled(gpu) && !uvm_channel_is_sec2(channel))

if (uvm_conf_computing_mode_enabled(gpu) && uvm_channel_is_ce(channel))
return;

gpfifo_entry = (NvU64*)channel->channel_info.gpFifoEntries + cpu_put;
@@ -1234,7 +1236,7 @@ static NV_STATUS submit_ctrl_gpfifo_indirect(uvm_channel_t *channel,
UVM_CHANNEL_TYPE_WLC :
UVM_CHANNEL_TYPE_SEC2;

UVM_ASSERT(!uvm_channel_is_sec2(channel));
UVM_ASSERT(uvm_channel_is_ce(channel));

// If the old_cpu_put is not equal to the last gpu put,
// Another push(es) is pending that needs to be submitted.
@@ -1290,6 +1292,7 @@ static void write_ctrl_gpfifo(uvm_channel_t *channel, NvU64 ctrl_fifo_entry_valu
NvU32 cpu_put;
NvU32 new_cpu_put;
bool needs_indirect_submit = false;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

channel_pool_lock(channel->pool);

@@ -1312,7 +1315,7 @@ static void write_ctrl_gpfifo(uvm_channel_t *channel, NvU64 ctrl_fifo_entry_valu
--channel->current_gpfifo_count;

submit_ctrl_gpfifo(channel, entry, new_cpu_put);
if (uvm_conf_computing_mode_enabled(channel->pool->manager->gpu) && !uvm_channel_is_sec2(channel))
if (uvm_conf_computing_mode_enabled(gpu) && uvm_channel_is_ce(channel))
needs_indirect_submit = true;

channel->cpu_put = new_cpu_put;
@@ -1385,16 +1388,15 @@ NV_STATUS uvm_channel_write_ctrl_gpfifo(uvm_channel_t *channel, NvU64 ctrl_fifo_
return NV_OK;
}

static NV_STATUS uvm_channel_reserve_secure(uvm_channel_t *channel, NvU32 num_gpfifo_entries)
static NV_STATUS channel_reserve_and_lock(uvm_channel_t *channel, NvU32 num_gpfifo_entries)
{
uvm_spin_loop_t spin;
NV_STATUS status = NV_OK;
uvm_channel_pool_t *pool = channel->pool;

// This semaphore is uvm_up() in unlock_channel_for_push() as part of the
// uvm_channel_end_push() routine. Note that different than in
// channel_reserve_in_secure_pool, we cannot pick an unlocked channel from
// the secure pool, even when there is one available and *channel is locked.
// channel_reserve_and_lock_in_pool, we cannot pick an unlocked channel from
// the pool, even when there is one available and *channel is locked.
// Not a concern given that uvm_channel_reserve() is not the common-case for
// channel reservation, and only used for channel initialization, GPFIFO
// control work submission, and testing.
@@ -1409,6 +1411,8 @@ static NV_STATUS uvm_channel_reserve_secure(uvm_channel_t *channel, NvU32 num_gp

uvm_spin_loop_init(&spin);
while (1) {
NV_STATUS status;

uvm_channel_update_progress(channel);

channel_pool_lock(pool);
@@ -1436,9 +1440,10 @@ NV_STATUS uvm_channel_reserve(uvm_channel_t *channel, NvU32 num_gpfifo_entries)
{
NV_STATUS status = NV_OK;
uvm_spin_loop_t spin;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

if (uvm_channel_is_secure(channel))
return uvm_channel_reserve_secure(channel, num_gpfifo_entries);
if (uvm_conf_computing_mode_enabled(gpu))
return channel_reserve_and_lock(channel, num_gpfifo_entries);

if (try_claim_channel(channel, num_gpfifo_entries))
return NV_OK;
@@ -1578,8 +1583,10 @@ NvU64 uvm_channel_update_completed_value(uvm_channel_t *channel)
static NV_STATUS csl_init(uvm_channel_t *channel)
{
NV_STATUS status;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));

UVM_ASSERT(uvm_channel_is_secure(channel));
uvm_mutex_init(&channel->csl.ctx_lock, UVM_LOCK_ORDER_LEAF);

status = uvm_rm_locked_call(nvUvmInterfaceCslInitContext(&channel->csl.ctx, channel->handle));
@@ -1589,7 +1596,7 @@ static NV_STATUS csl_init(uvm_channel_t *channel)
else {
UVM_DBG_PRINT("nvUvmInterfaceCslInitContext() failed: %s, GPU %s\n",
nvstatusToString(status),
uvm_gpu_name(channel->pool->manager->gpu));
uvm_gpu_name(gpu));
}

return status;
@@ -1609,7 +1616,10 @@ static void csl_destroy(uvm_channel_t *channel)

static void free_conf_computing_buffers(uvm_channel_t *channel)
{
UVM_ASSERT(uvm_channel_is_secure_ce(channel));
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));
UVM_ASSERT(uvm_channel_is_ce(channel));

uvm_rm_mem_free(channel->conf_computing.static_pb_protected_vidmem);
uvm_rm_mem_free(channel->conf_computing.static_pb_unprotected_sysmem);
@@ -1637,10 +1647,12 @@ static void free_conf_computing_buffers(uvm_channel_t *channel)
static NV_STATUS alloc_conf_computing_buffers_semaphore(uvm_channel_t *channel)
{
uvm_gpu_semaphore_t *semaphore = &channel->tracking_sem.semaphore;
uvm_gpu_t *gpu = channel->pool->manager->gpu;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);
NV_STATUS status;

UVM_ASSERT(uvm_channel_is_secure_ce(channel));
UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));
UVM_ASSERT(uvm_channel_is_ce(channel));

status = uvm_rm_mem_alloc_and_map_cpu(gpu,
UVM_RM_MEM_TYPE_SYS,
sizeof(semaphore->conf_computing.last_pushed_notifier),
@@ -1679,7 +1691,7 @@ static NV_STATUS alloc_conf_computing_buffers_semaphore(uvm_channel_t *channel)

static NV_STATUS alloc_conf_computing_buffers_wlc(uvm_channel_t *channel)
{
uvm_gpu_t *gpu = channel->pool->manager->gpu;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);
size_t aligned_wlc_push_size = UVM_ALIGN_UP(UVM_MAX_WLC_PUSH_SIZE, UVM_CONF_COMPUTING_AUTH_TAG_ALIGNMENT);
NV_STATUS status = uvm_rm_mem_alloc_and_map_cpu(gpu,
UVM_RM_MEM_TYPE_SYS,
@@ -1723,7 +1735,7 @@ static NV_STATUS alloc_conf_computing_buffers_wlc(uvm_channel_t *channel)

static NV_STATUS alloc_conf_computing_buffers_lcic(uvm_channel_t *channel)
{
uvm_gpu_t *gpu = channel->pool->manager->gpu;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);
const size_t notifier_size = sizeof(*channel->conf_computing.static_notifier_entry_unprotected_sysmem_cpu);
NV_STATUS status = uvm_rm_mem_alloc_and_map_cpu(gpu,
UVM_RM_MEM_TYPE_SYS,
@@ -1758,8 +1770,10 @@ static NV_STATUS alloc_conf_computing_buffers_lcic(uvm_channel_t *channel)
static NV_STATUS alloc_conf_computing_buffers(uvm_channel_t *channel)
{
NV_STATUS status;
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

UVM_ASSERT(uvm_channel_is_secure_ce(channel));
UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));
UVM_ASSERT(uvm_channel_is_ce(channel));

status = alloc_conf_computing_buffers_semaphore(channel);
if (status != NV_OK)
@@ -1772,7 +1786,6 @@ static NV_STATUS alloc_conf_computing_buffers(uvm_channel_t *channel)
status = alloc_conf_computing_buffers_lcic(channel);
}
else {
uvm_gpu_t *gpu = channel->pool->manager->gpu;
void *push_crypto_bundles = uvm_kvmalloc_zero(sizeof(*channel->conf_computing.push_crypto_bundles) *
channel->num_gpfifo_entries);

@@ -1793,6 +1806,8 @@ static NV_STATUS alloc_conf_computing_buffers(uvm_channel_t *channel)

static void channel_destroy(uvm_channel_pool_t *pool, uvm_channel_t *channel)
{
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

UVM_ASSERT(pool->num_channels > 0);

if (channel->tracking_sem.queued_value > 0) {
@@ -1816,9 +1831,10 @@ static void channel_destroy(uvm_channel_pool_t *pool, uvm_channel_t *channel)

uvm_kvfree(channel->gpfifo_entries);

if (uvm_channel_is_secure(channel)) {
if (uvm_conf_computing_mode_enabled(gpu)) {
csl_destroy(channel);
if (uvm_channel_is_secure_ce(channel))

if (uvm_channel_is_ce(channel))
free_conf_computing_buffers(channel);
}

@@ -1905,8 +1921,6 @@ static NV_STATUS internal_channel_create(uvm_channel_t *channel)
channel_alloc_params.gpPutLoc = UVM_BUFFER_LOCATION_SYS;
}

channel_alloc_params.secure = channel->pool->secure;

status = uvm_rm_locked_call(nvUvmInterfaceChannelAllocate(channel_get_tsg(channel),
&channel_alloc_params,
&channel->handle,
@@ -1928,8 +1942,7 @@ static NV_STATUS internal_channel_create(uvm_channel_t *channel)
channel_info->hwChannelId,
uvm_channel_is_sec2(channel) ? "SEC2" :
uvm_channel_is_wlc(channel) ? "WLC" :
uvm_channel_is_lcic(channel) ? "LCIC" :
uvm_channel_is_secure(channel) ? "CE (secure)" : "CE",
uvm_channel_is_lcic(channel) ? "LCIC" : "CE",
channel->pool->engine_index);

return NV_OK;
@@ -1981,7 +1994,7 @@ static NV_STATUS channel_create(uvm_channel_pool_t *pool, uvm_channel_t *channel
channel->tools.pending_event_count = 0;
INIT_LIST_HEAD(&channel->tools.channel_list_node);

if (uvm_conf_computing_mode_enabled(gpu) && !uvm_channel_is_sec2(channel))
if (uvm_conf_computing_mode_enabled(gpu) && uvm_channel_is_ce(channel))
semaphore_pool = gpu->secure_semaphore_pool;

status = uvm_gpu_tracking_semaphore_alloc(semaphore_pool, &channel->tracking_sem);
@@ -2007,7 +2020,7 @@ static NV_STATUS channel_create(uvm_channel_pool_t *pool, uvm_channel_t *channel
goto error;
}

if (uvm_channel_is_secure(channel)) {
if (uvm_conf_computing_mode_enabled(gpu)) {
status = csl_init(channel);
if (status != NV_OK)
goto error;
@@ -2075,7 +2088,7 @@ static NV_STATUS channel_init(uvm_channel_t *channel)

if (uvm_gpu_has_pushbuffer_segments(gpu)) {
NvU64 gpfifo_entry;
uvm_pushbuffer_t *pushbuffer = channel->pool->manager->pushbuffer;
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(channel);
NvU64 pb_base = uvm_pushbuffer_get_gpu_va_base(pushbuffer);

if (uvm_channel_is_sec2(channel))
@@ -2095,10 +2108,8 @@ static NV_STATUS channel_init(uvm_channel_t *channel)

if (uvm_channel_is_ce(channel))
gpu->parent->ce_hal->init(&push);
else if (uvm_channel_is_sec2(channel))
gpu->parent->sec2_hal->init(&push);
else
UVM_ASSERT_MSG(0, "Unknown channel type!");
gpu->parent->sec2_hal->init(&push);

gpu->parent->host_hal->init(&push);

@@ -2153,11 +2164,6 @@ static unsigned channel_pool_type_num_tsgs(uvm_channel_pool_type_t pool_type)
return 1;
}

static bool pool_type_is_valid(uvm_channel_pool_type_t pool_type)
{
return(is_power_of_2(pool_type) && (pool_type < UVM_CHANNEL_POOL_TYPE_MASK));
}

static UVM_GPU_CHANNEL_ENGINE_TYPE pool_type_to_engine_type(uvm_channel_pool_type_t pool_type)
{
if (pool_type == UVM_CHANNEL_POOL_TYPE_SEC2)
@@ -2229,7 +2235,7 @@ static NV_STATUS channel_pool_add(uvm_channel_manager_t *channel_manager,
unsigned num_tsgs;
uvm_channel_pool_t *pool;

UVM_ASSERT(pool_type_is_valid(pool_type));
UVM_ASSERT(uvm_pool_type_is_valid(pool_type));

pool = channel_manager->channel_pools + channel_manager->num_channel_pools;
channel_manager->num_channel_pools++;
@@ -2260,10 +2266,10 @@ static NV_STATUS channel_pool_add(uvm_channel_manager_t *channel_manager,
num_channels = channel_pool_type_num_channels(pool_type);
UVM_ASSERT(num_channels <= UVM_CHANNEL_MAX_NUM_CHANNELS_PER_POOL);

if (pool->secure) {
if (uvm_conf_computing_mode_enabled(channel_manager->gpu)) {
// Use different order lock for SEC2 and WLC channels.
// This allows reserving a SEC2 or WLC channel for indirect work
// submission while holding a reservation for a secure channel.
// submission while holding a reservation for a channel.
uvm_lock_order_t order = uvm_channel_pool_is_sec2(pool) ? UVM_LOCK_ORDER_CSL_SEC2_PUSH :
(uvm_channel_pool_is_wlc(pool) ? UVM_LOCK_ORDER_CSL_WLC_PUSH :
UVM_LOCK_ORDER_CSL_PUSH);
@@ -2297,23 +2303,6 @@ static NV_STATUS channel_pool_add(uvm_channel_manager_t *channel_manager,
return status;
}

static NV_STATUS channel_pool_add_secure(uvm_channel_manager_t *channel_manager,
uvm_channel_pool_type_t pool_type,
unsigned engine_index,
uvm_channel_pool_t **pool_out)
{
uvm_channel_pool_t *pool = channel_manager->channel_pools + channel_manager->num_channel_pools;

pool->secure = true;
return channel_pool_add(channel_manager, pool_type, engine_index, pool_out);
}

bool uvm_channel_type_requires_secure_pool(uvm_gpu_t *gpu, uvm_channel_type_t channel_type)
{
// For now, all channels are secure channels
return true;
}

static bool ce_usable_for_channel_type(uvm_channel_type_t type, const UvmGpuCopyEngineCaps *cap)
{
if (!cap->supported || cap->grce)
@@ -2461,13 +2450,6 @@ static NV_STATUS pick_ce_for_channel_type(uvm_channel_manager_t *manager,
if (!ce_usable_for_channel_type(type, cap))
continue;

if (uvm_conf_computing_mode_is_hcc(manager->gpu)) {
// All usable CEs are secure
UVM_ASSERT(cap->secure);

// Multi-PCE LCEs are disallowed
UVM_ASSERT(hweight32(cap->cePceMask) == 1);
}
__set_bit(i, manager->ce_mask);

if (best_ce == UVM_COPY_ENGINE_COUNT_MAX) {
@@ -2523,7 +2505,7 @@ out:
return status;
}
|
||||
|
||||
// Return the non-secure pool corresponding to the given CE index
|
||||
// Return the pool corresponding to the given CE index
|
||||
//
|
||||
// This function cannot be used to access the proxy pool in SR-IOV heavy.
|
||||
static uvm_channel_pool_t *channel_manager_ce_pool(uvm_channel_manager_t *manager, NvU32 ce)
|
||||
@@ -2701,7 +2683,7 @@ static void init_channel_manager_conf(uvm_channel_manager_t *manager)
|
||||
// caches vidmem (and sysmem), we place GPFIFO and GPPUT on sysmem to avoid
|
||||
// cache thrash. The memory access latency is reduced, despite the required
|
||||
// access through the bus, because no cache coherence message is exchanged.
|
||||
if (uvm_gpu_is_coherent(gpu->parent)) {
|
||||
if (uvm_parent_gpu_is_coherent(gpu->parent)) {
|
||||
manager->conf.gpfifo_loc = UVM_BUFFER_LOCATION_SYS;
|
||||
|
||||
// On GPUs with limited ESCHED addressing range, e.g., Volta on P9, RM
|
||||
@@ -2734,24 +2716,17 @@ static void init_channel_manager_conf(uvm_channel_manager_t *manager)
|
||||
static unsigned channel_manager_get_max_pools(uvm_channel_manager_t *manager)
|
||||
{
|
||||
unsigned num_channel_pools;
|
||||
unsigned num_used_ce = bitmap_weight(manager->ce_mask, UVM_COPY_ENGINE_COUNT_MAX);
|
||||
|
||||
// Create one CE channel pool per usable CE
|
||||
num_channel_pools = num_used_ce;
|
||||
num_channel_pools = bitmap_weight(manager->ce_mask, UVM_COPY_ENGINE_COUNT_MAX);
|
||||
|
||||
// CE proxy channel pool.
|
||||
if (uvm_gpu_uses_proxy_channel_pool(manager->gpu))
|
||||
num_channel_pools++;
|
||||
|
||||
if (uvm_conf_computing_mode_enabled(manager->gpu)) {
|
||||
|
||||
// Create one CE secure channel pool per usable CE
|
||||
if (uvm_conf_computing_mode_is_hcc(manager->gpu))
|
||||
num_channel_pools += num_used_ce;
|
||||
|
||||
// SEC2 pool, WLC pool, LCIC pool
|
||||
// SEC2 pool, WLC pool, LCIC pool
|
||||
if (uvm_conf_computing_mode_enabled(manager->gpu))
|
||||
num_channel_pools += 3;
|
||||
}
|
||||
|
||||
return num_channel_pools;
|
||||
}
|
||||
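Note: the counting rule in channel_manager_get_max_pools() is easiest to follow with a worked example. The helper name example_max_pools() and the CE count are made up for illustration only.

    // Sketch only: the new counting rule for a hypothetical GPU.
    static unsigned example_max_pools(unsigned num_usable_ce, bool needs_proxy, bool conf_computing)
    {
        unsigned n = num_usable_ce;   // one CE channel pool per usable CE

        if (needs_proxy)
            n += 1;                   // CE proxy pool (SR-IOV heavy)

        if (conf_computing)
            n += 3;                   // SEC2, WLC and LCIC pools

        // e.g. 4 usable CEs with Confidential Computing and no proxy: 4 + 3 = 7
        return n;
    }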
@@ -2783,38 +2758,6 @@ static NV_STATUS channel_manager_create_ce_pools(uvm_channel_manager_t *manager,
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
static NV_STATUS channel_manager_create_ce_secure_pools(uvm_channel_manager_t *manager, unsigned *preferred_ce)
|
||||
{
|
||||
unsigned ce;
|
||||
|
||||
if (!uvm_conf_computing_mode_is_hcc(manager->gpu))
|
||||
return NV_OK;
|
||||
|
||||
for_each_set_bit(ce, manager->ce_mask, UVM_COPY_ENGINE_COUNT_MAX) {
|
||||
NV_STATUS status;
|
||||
unsigned type;
|
||||
uvm_channel_pool_t *pool = NULL;
|
||||
|
||||
status = channel_pool_add_secure(manager, UVM_CHANNEL_POOL_TYPE_CE, ce, &pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
for (type = 0; type < UVM_CHANNEL_TYPE_CE_COUNT; type++) {
|
||||
unsigned preferred = preferred_ce[type];
|
||||
|
||||
if (preferred != ce)
|
||||
continue;
|
||||
|
||||
if (uvm_channel_type_requires_secure_pool(manager->gpu, type)) {
|
||||
UVM_ASSERT(manager->pool_to_use.default_for_type[type] == NULL);
|
||||
manager->pool_to_use.default_for_type[type] = pool;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
static NV_STATUS setup_wlc_schedule(uvm_channel_t *wlc)
|
||||
{
|
||||
uvm_gpu_t *gpu = uvm_channel_get_gpu(wlc);
|
||||
@@ -3142,6 +3085,64 @@ static NV_STATUS channel_manager_setup_wlc_lcic(uvm_channel_pool_t *wlc_pool, uv
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
static NV_STATUS channel_manager_create_conf_computing_pools(uvm_channel_manager_t *manager, unsigned *preferred_ce)
|
||||
{
|
||||
NV_STATUS status;
|
||||
unsigned wlc_lcic_ce_index;
|
||||
uvm_channel_pool_t *sec2_pool = NULL;
|
||||
uvm_channel_pool_t *wlc_pool = NULL;
|
||||
uvm_channel_pool_t *lcic_pool = NULL;
|
||||
|
||||
if (!uvm_conf_computing_mode_enabled(manager->gpu))
|
||||
return NV_OK;
|
||||
|
||||
status = uvm_rm_mem_alloc(manager->gpu,
|
||||
UVM_RM_MEM_TYPE_SYS,
|
||||
sizeof(UvmCslIv),
|
||||
UVM_CONF_COMPUTING_BUF_ALIGNMENT,
|
||||
&manager->gpu->conf_computing.iv_rm_mem);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
// Create SEC2 pool. This needs to be done first; initialization of
|
||||
// other channels needs SEC2.
|
||||
status = channel_pool_add(manager, UVM_CHANNEL_POOL_TYPE_SEC2, 0, &sec2_pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_SEC2] = sec2_pool;
|
||||
|
||||
// Use the same CE as CPU TO GPU channels for WLC/LCIC
|
||||
// Both need to use the same engine for the fixed schedule to work.
|
||||
// TODO: Bug 3981928: [hcc][uvm] Optimize parameters of WLC/LCIC secure
|
||||
// work launch
|
||||
// Find a metric to select the best CE to use
|
||||
wlc_lcic_ce_index = preferred_ce[UVM_CHANNEL_TYPE_CPU_TO_GPU];
|
||||
|
||||
// Create WLC/LCIC pools. This should be done early; CE channels use
|
||||
// them for secure launch. The WLC pool must be created before the LCIC.
|
||||
status = channel_pool_add(manager, UVM_CHANNEL_POOL_TYPE_WLC, wlc_lcic_ce_index, &wlc_pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_WLC] = wlc_pool;
|
||||
|
||||
status = channel_pool_add(manager, UVM_CHANNEL_POOL_TYPE_LCIC, wlc_lcic_ce_index, &lcic_pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
status = channel_manager_setup_wlc_lcic(wlc_pool, lcic_pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
// The LCIC pool must be assigned after the call to
|
||||
// channel_manager_setup_wlc_lcic(). It indicates that the WLC and LCIC channels
|
||||
// are ready to be used for secure work submission.
|
||||
manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_LCIC] = lcic_pool;
|
||||
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
static NV_STATUS channel_manager_create_pools(uvm_channel_manager_t *manager)
|
||||
{
|
||||
NV_STATUS status;
|
||||
@@ -3162,62 +3163,11 @@ static NV_STATUS channel_manager_create_pools(uvm_channel_manager_t *manager)
|
||||
if (!manager->channel_pools)
|
||||
return NV_ERR_NO_MEMORY;
|
||||
|
||||
if (uvm_conf_computing_mode_enabled(manager->gpu)) {
|
||||
uvm_channel_pool_t *sec2_pool = NULL;
|
||||
uvm_channel_pool_t *wlc_pool = NULL;
|
||||
uvm_channel_pool_t *lcic_pool = NULL;
|
||||
unsigned wlc_lcic_ce_index;
|
||||
|
||||
status = uvm_rm_mem_alloc(manager->gpu,
|
||||
UVM_RM_MEM_TYPE_SYS,
|
||||
sizeof(UvmCslIv),
|
||||
UVM_CONF_COMPUTING_BUF_ALIGNMENT,
|
||||
&manager->gpu->conf_computing.iv_rm_mem);
|
||||
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
// Create SEC2 pool. This needs to be done first; initialization of
|
||||
// other channels needs SEC2.
|
||||
status = channel_pool_add_secure(manager, UVM_CHANNEL_POOL_TYPE_SEC2, 0, &sec2_pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_SEC2] = sec2_pool;
|
||||
|
||||
// Use the same CE as CPU TO GPU channels for WLC/LCIC
|
||||
// Both need to use the same engine for the fixed schedule to work.
|
||||
// TODO: Bug 3981928: [hcc][uvm] Optimize parameters of WLC/LCIC secure
|
||||
// work launch
|
||||
// Find a metric to select the best CE to use
|
||||
wlc_lcic_ce_index = preferred_ce[UVM_CHANNEL_TYPE_CPU_TO_GPU];
|
||||
|
||||
// Create WLC/LCIC pools. This should be done early; CE channels use
|
||||
// them for secure launch. The WLC pool must be created before the LCIC.
|
||||
status = channel_pool_add_secure(manager, UVM_CHANNEL_POOL_TYPE_WLC, wlc_lcic_ce_index, &wlc_pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_WLC] = wlc_pool;
|
||||
|
||||
status = channel_pool_add_secure(manager, UVM_CHANNEL_POOL_TYPE_LCIC, wlc_lcic_ce_index, &lcic_pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
status = channel_manager_setup_wlc_lcic(wlc_pool, lcic_pool);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
// The LCIC pool must be assigned after the call to
|
||||
// channel_manager_setup_wlc_lcic(). It indicates that the WLC and LCIC channels
|
||||
// are ready to be used for secure work submission.
|
||||
manager->pool_to_use.default_for_type[UVM_CHANNEL_TYPE_LCIC] = lcic_pool;
|
||||
status = channel_manager_create_ce_secure_pools(manager, preferred_ce);
|
||||
}
|
||||
else {
|
||||
status = channel_manager_create_ce_pools(manager, preferred_ce);
|
||||
}
|
||||
status = channel_manager_create_conf_computing_pools(manager, preferred_ce);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
status = channel_manager_create_ce_pools(manager, preferred_ce);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
|
||||
@@ -104,16 +104,14 @@ typedef enum
|
||||
// ----------------------------------
|
||||
// Channel type with fixed schedules
|
||||
|
||||
// Work Launch Channel (WLC) is a specialized channel
|
||||
// for launching work on other channels when
|
||||
// Confidential Computing is enabled.
|
||||
// It is paired with LCIC (below)
|
||||
// Work Launch Channel (WLC) is a specialized channel for launching work on
|
||||
// other channels when the Confidential Computing feature is enabled. It is
|
||||
// paired with LCIC (below)
|
||||
UVM_CHANNEL_TYPE_WLC,
|
||||
|
||||
// Launch Confirmation Indicator Channel (LCIC) is a
|
||||
// specialized channel with fixed schedule. It gets
|
||||
// triggered by executing WLC work, and makes sure that
|
||||
// WLC get/put pointers are up-to-date.
|
||||
// Launch Confirmation Indicator Channel (LCIC) is a specialized channel
|
||||
// with fixed schedule. It gets triggered by executing WLC work, and makes
|
||||
// sure that WLC get/put pointers are up-to-date.
|
||||
UVM_CHANNEL_TYPE_LCIC,
|
||||
|
||||
UVM_CHANNEL_TYPE_COUNT,
|
||||
@@ -242,11 +240,9 @@ typedef struct
|
||||
DECLARE_BITMAP(push_locks, UVM_CHANNEL_MAX_NUM_CHANNELS_PER_POOL);
|
||||
|
||||
// Counting semaphore for available and unlocked channels, it must be
|
||||
// acquired before submitting work to a secure channel.
|
||||
// acquired before submitting work to a channel when the Confidential
|
||||
// Computing feature is enabled.
|
||||
uvm_semaphore_t push_sem;
|
||||
|
||||
// See uvm_channel_is_secure() documentation.
|
||||
bool secure;
|
||||
} uvm_channel_pool_t;
|
||||
|
||||
struct uvm_channel_struct
|
||||
@@ -304,8 +300,9 @@ struct uvm_channel_struct
|
||||
// its internal operation and each push may modify this state.
|
||||
uvm_mutex_t push_lock;
|
||||
|
||||
// Every secure channel has cryptographic state in HW, which is
|
||||
// mirrored here for CPU-side operations.
|
||||
// When the Confidential Computing feature is enabled, every channel has
|
||||
// cryptographic state in HW, which is mirrored here for CPU-side
|
||||
// operations.
|
||||
UvmCslContext ctx;
|
||||
bool is_ctx_initialized;
|
||||
|
||||
@@ -459,46 +456,28 @@ struct uvm_channel_manager_struct
|
||||
// Create a channel manager for the GPU
|
||||
NV_STATUS uvm_channel_manager_create(uvm_gpu_t *gpu, uvm_channel_manager_t **manager_out);
|
||||
|
||||
static bool uvm_channel_pool_is_ce(uvm_channel_pool_t *pool);
|
||||
|
||||
// A channel is secure if it has HW encryption capabilities.
|
||||
//
|
||||
// Secure channels are treated differently in the UVM driver. Each secure
|
||||
// channel has a unique CSL context associated with it, has relatively
|
||||
// restrictive reservation policies (in comparison with non-secure channels),
|
||||
// it is requested to be allocated differently by RM, etc.
|
||||
static bool uvm_channel_pool_is_secure(uvm_channel_pool_t *pool)
|
||||
static bool uvm_pool_type_is_valid(uvm_channel_pool_type_t pool_type)
|
||||
{
|
||||
return pool->secure;
|
||||
}
|
||||
|
||||
static bool uvm_channel_is_secure(uvm_channel_t *channel)
|
||||
{
|
||||
return uvm_channel_pool_is_secure(channel->pool);
|
||||
return (is_power_of_2(pool_type) && (pool_type < UVM_CHANNEL_POOL_TYPE_MASK));
|
||||
}
|
||||
|
||||
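Note: uvm_pool_type_is_valid() only accepts single-bit values below UVM_CHANNEL_POOL_TYPE_MASK. The enum below is invented purely to illustrate that shape; the real pool type values are defined elsewhere in the UVM headers and may differ.

    // Sketch only: hypothetical single-bit pool types that would satisfy the
    // validity check above.
    typedef enum
    {
        EXAMPLE_POOL_TYPE_CE       = (1 << 0),
        EXAMPLE_POOL_TYPE_CE_PROXY = (1 << 1),
        EXAMPLE_POOL_TYPE_SEC2     = (1 << 2),
        EXAMPLE_POOL_TYPE_WLC      = (1 << 3),
        EXAMPLE_POOL_TYPE_LCIC     = (1 << 4),
        EXAMPLE_POOL_TYPE_MASK     = (1 << 5) - 1,
    } example_pool_type_t;

    // is_power_of_2() rejects 0 and any multi-bit combination such as
    // (EXAMPLE_POOL_TYPE_CE | EXAMPLE_POOL_TYPE_SEC2), and the range check
    // rejects anything at or above the mask.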
static bool uvm_channel_pool_is_sec2(uvm_channel_pool_t *pool)
|
||||
{
|
||||
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
|
||||
UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
|
||||
|
||||
return (pool->pool_type == UVM_CHANNEL_POOL_TYPE_SEC2);
|
||||
}
|
||||
|
||||
static bool uvm_channel_pool_is_secure_ce(uvm_channel_pool_t *pool)
|
||||
{
|
||||
return uvm_channel_pool_is_secure(pool) && uvm_channel_pool_is_ce(pool);
|
||||
}
|
||||
|
||||
static bool uvm_channel_pool_is_wlc(uvm_channel_pool_t *pool)
|
||||
{
|
||||
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
|
||||
UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
|
||||
|
||||
return (pool->pool_type == UVM_CHANNEL_POOL_TYPE_WLC);
|
||||
}
|
||||
|
||||
static bool uvm_channel_pool_is_lcic(uvm_channel_pool_t *pool)
|
||||
{
|
||||
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
|
||||
UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
|
||||
|
||||
return (pool->pool_type == UVM_CHANNEL_POOL_TYPE_LCIC);
|
||||
}
|
||||
@@ -508,11 +487,6 @@ static bool uvm_channel_is_sec2(uvm_channel_t *channel)
|
||||
return uvm_channel_pool_is_sec2(channel->pool);
|
||||
}
|
||||
|
||||
static bool uvm_channel_is_secure_ce(uvm_channel_t *channel)
|
||||
{
|
||||
return uvm_channel_pool_is_secure_ce(channel->pool);
|
||||
}
|
||||
|
||||
static bool uvm_channel_is_wlc(uvm_channel_t *channel)
|
||||
{
|
||||
return uvm_channel_pool_is_wlc(channel->pool);
|
||||
@@ -523,12 +497,9 @@ static bool uvm_channel_is_lcic(uvm_channel_t *channel)
|
||||
return uvm_channel_pool_is_lcic(channel->pool);
|
||||
}
|
||||
|
||||
bool uvm_channel_type_requires_secure_pool(uvm_gpu_t *gpu, uvm_channel_type_t channel_type);
|
||||
NV_STATUS uvm_channel_secure_init(uvm_gpu_t *gpu, uvm_channel_t *channel);
|
||||
|
||||
static bool uvm_channel_pool_is_proxy(uvm_channel_pool_t *pool)
|
||||
{
|
||||
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
|
||||
UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
|
||||
|
||||
return pool->pool_type == UVM_CHANNEL_POOL_TYPE_CE_PROXY;
|
||||
}
|
||||
@@ -540,11 +511,7 @@ static bool uvm_channel_is_proxy(uvm_channel_t *channel)
|
||||
|
||||
static bool uvm_channel_pool_is_ce(uvm_channel_pool_t *pool)
|
||||
{
|
||||
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
|
||||
if (uvm_channel_pool_is_wlc(pool) || uvm_channel_pool_is_lcic(pool))
|
||||
return true;
|
||||
|
||||
return (pool->pool_type == UVM_CHANNEL_POOL_TYPE_CE) || uvm_channel_pool_is_proxy(pool);
|
||||
return !uvm_channel_pool_is_sec2(pool);
|
||||
}
|
||||
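Note: with SEC2 as the only non-CE pool type, the rewritten predicate is equivalent to the removed per-type special cases. A small sketch of that equivalence, using an invented helper and assuming the pool types visible in this header are the only ones defined.

    // Sketch only: invented consistency check, not driver code.
    static void example_check_is_ce_equivalence(uvm_channel_pool_t *pool)
    {
        bool old_style = uvm_channel_pool_is_wlc(pool) ||
                         uvm_channel_pool_is_lcic(pool) ||
                         uvm_channel_pool_is_proxy(pool) ||
                         (pool->pool_type == UVM_CHANNEL_POOL_TYPE_CE);

        // CE, CE proxy, WLC and LCIC pools are all copy-engine backed, so
        // "is a CE pool" collapses to "is not the SEC2 pool".
        UVM_ASSERT(old_style == uvm_channel_pool_is_ce(pool));
    }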
|
||||
static bool uvm_channel_is_ce(uvm_channel_t *channel)
|
||||
@@ -686,6 +653,11 @@ static uvm_gpu_t *uvm_channel_get_gpu(uvm_channel_t *channel)
|
||||
return channel->pool->manager->gpu;
|
||||
}
|
||||
|
||||
static uvm_pushbuffer_t *uvm_channel_get_pushbuffer(uvm_channel_t *channel)
|
||||
{
|
||||
return channel->pool->manager->pushbuffer;
|
||||
}
|
||||
|
||||
// Index of a channel within the owning pool
|
||||
static unsigned uvm_channel_index_in_pool(const uvm_channel_t *channel)
|
||||
{
|
||||
|
||||
@@ -681,9 +681,10 @@ done:
|
||||
}
|
||||
|
||||
// The following test is inspired by uvm_push_test.c:test_concurrent_pushes.
|
||||
// This test verifies that concurrent pushes using the same secure channel pool
|
||||
// select different channels.
|
||||
NV_STATUS test_secure_channel_selection(uvm_va_space_t *va_space)
|
||||
// This test verifies that concurrent pushes using the same channel pool
|
||||
// select different channels, when the Confidential Computing feature is
|
||||
// enabled.
|
||||
NV_STATUS test_conf_computing_channel_selection(uvm_va_space_t *va_space)
|
||||
{
|
||||
NV_STATUS status = NV_OK;
|
||||
uvm_channel_pool_t *pool;
|
||||
@@ -703,9 +704,6 @@ NV_STATUS test_secure_channel_selection(uvm_va_space_t *va_space)
|
||||
uvm_channel_type_t channel_type;
|
||||
|
||||
for (channel_type = 0; channel_type < UVM_CHANNEL_TYPE_COUNT; channel_type++) {
|
||||
if (!uvm_channel_type_requires_secure_pool(gpu, channel_type))
|
||||
continue;
|
||||
|
||||
pool = gpu->channel_manager->pool_to_use.default_for_type[channel_type];
|
||||
TEST_CHECK_RET(pool != NULL);
|
||||
|
||||
@@ -997,7 +995,7 @@ NV_STATUS uvm_test_channel_sanity(UVM_TEST_CHANNEL_SANITY_PARAMS *params, struct
|
||||
if (status != NV_OK)
|
||||
goto done;
|
||||
|
||||
status = test_secure_channel_selection(va_space);
|
||||
status = test_conf_computing_channel_selection(va_space);
|
||||
if (status != NV_OK)
|
||||
goto done;
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2013-2021 NVIDIA Corporation
|
||||
Copyright (c) 2013-2023 NVIDIA Corporation
|
||||
|
||||
This program is free software; you can redistribute it and/or
|
||||
modify it under the terms of the GNU General Public License
|
||||
@@ -233,18 +233,6 @@ unsigned uvm_get_stale_thread_id(void)
|
||||
return (unsigned)task_pid_vnr(current);
|
||||
}
|
||||
|
||||
//
|
||||
// A simple security rule for allowing access to UVM user space memory: if you
|
||||
// are the same user as the owner of the memory, or if you are root, then you
|
||||
// are granted access. The idea is to allow debuggers and profilers to work, but
|
||||
// without opening up any security holes.
|
||||
//
|
||||
NvBool uvm_user_id_security_check(uid_t euidTarget)
|
||||
{
|
||||
return (NV_CURRENT_EUID() == euidTarget) ||
|
||||
(UVM_ROOT_UID == euidTarget);
|
||||
}
|
||||
|
||||
void on_uvm_test_fail(void)
|
||||
{
|
||||
(void)NULL;
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2013-2021 NVIDIA Corporation
|
||||
Copyright (c) 2013-2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
@@ -282,9 +282,6 @@ static inline void kmem_cache_destroy_safe(struct kmem_cache **ppCache)
|
||||
}
|
||||
}
|
||||
|
||||
static const uid_t UVM_ROOT_UID = 0;
|
||||
|
||||
|
||||
typedef struct
|
||||
{
|
||||
NvU64 start_time_ns;
|
||||
@@ -335,7 +332,6 @@ NV_STATUS errno_to_nv_status(int errnoCode);
|
||||
int nv_status_to_errno(NV_STATUS status);
|
||||
unsigned uvm_get_stale_process_id(void);
|
||||
unsigned uvm_get_stale_thread_id(void);
|
||||
NvBool uvm_user_id_security_check(uid_t euidTarget);
|
||||
|
||||
extern int uvm_enable_builtin_tests;
|
||||
|
||||
|
||||
@@ -233,10 +233,8 @@ static uvm_gpu_t *uvm_gpu_get_by_processor_id(uvm_processor_id_t id)
|
||||
return gpu;
|
||||
}
|
||||
|
||||
static uvmGpuSessionHandle uvm_gpu_session_handle(uvm_gpu_t *gpu)
|
||||
static uvmGpuSessionHandle uvm_global_session_handle(void)
|
||||
{
|
||||
if (gpu->parent->smc.enabled)
|
||||
return gpu->smc.rm_session_handle;
|
||||
return g_uvm_global.rm_session_handle;
|
||||
}
|
||||
|
||||
|
||||
@@ -99,8 +99,8 @@ static void fill_gpu_info(uvm_parent_gpu_t *parent_gpu, const UvmGpuInfo *gpu_in
|
||||
parent_gpu->system_bus.link_rate_mbyte_per_s = gpu_info->sysmemLinkRateMBps;
|
||||
|
||||
if (gpu_info->systemMemoryWindowSize > 0) {
|
||||
// memory_window_end is inclusive but uvm_gpu_is_coherent() checks
|
||||
// memory_window_end > memory_window_start as its condition.
|
||||
// memory_window_end is inclusive but uvm_parent_gpu_is_coherent()
|
||||
// checks memory_window_end > memory_window_start as its condition.
|
||||
UVM_ASSERT(gpu_info->systemMemoryWindowSize > 1);
|
||||
parent_gpu->system_bus.memory_window_start = gpu_info->systemMemoryWindowStart;
|
||||
parent_gpu->system_bus.memory_window_end = gpu_info->systemMemoryWindowStart +
|
||||
@@ -136,12 +136,12 @@ static NV_STATUS get_gpu_caps(uvm_gpu_t *gpu)
|
||||
return status;
|
||||
|
||||
if (gpu_caps.numaEnabled) {
|
||||
UVM_ASSERT(uvm_gpu_is_coherent(gpu->parent));
|
||||
UVM_ASSERT(uvm_parent_gpu_is_coherent(gpu->parent));
|
||||
gpu->mem_info.numa.enabled = true;
|
||||
gpu->mem_info.numa.node_id = gpu_caps.numaNodeId;
|
||||
}
|
||||
else {
|
||||
UVM_ASSERT(!uvm_gpu_is_coherent(gpu->parent));
|
||||
UVM_ASSERT(!uvm_parent_gpu_is_coherent(gpu->parent));
|
||||
}
|
||||
|
||||
return NV_OK;
|
||||
@@ -1089,7 +1089,7 @@ static NV_STATUS init_parent_gpu(uvm_parent_gpu_t *parent_gpu,
|
||||
{
|
||||
NV_STATUS status;
|
||||
|
||||
status = uvm_rm_locked_call(nvUvmInterfaceDeviceCreate(g_uvm_global.rm_session_handle,
|
||||
status = uvm_rm_locked_call(nvUvmInterfaceDeviceCreate(uvm_global_session_handle(),
|
||||
gpu_info,
|
||||
gpu_uuid,
|
||||
&parent_gpu->rm_device,
|
||||
@@ -1166,19 +1166,8 @@ static NV_STATUS init_gpu(uvm_gpu_t *gpu, const UvmGpuInfo *gpu_info)
|
||||
{
|
||||
NV_STATUS status;
|
||||
|
||||
// Presently, an RM client can only subscribe to a single partition per
|
||||
// GPU. Therefore, UVM needs to create several RM clients. For simplicity,
|
||||
// and since P2P is not supported when SMC partitions are created, we
|
||||
// create a client (session) per GPU partition.
|
||||
if (gpu->parent->smc.enabled) {
|
||||
UvmPlatformInfo platform_info;
|
||||
status = uvm_rm_locked_call(nvUvmInterfaceSessionCreate(&gpu->smc.rm_session_handle, &platform_info));
|
||||
if (status != NV_OK) {
|
||||
UVM_ERR_PRINT("Creating RM session failed: %s\n", nvstatusToString(status));
|
||||
return status;
|
||||
}
|
||||
|
||||
status = uvm_rm_locked_call(nvUvmInterfaceDeviceCreate(uvm_gpu_session_handle(gpu),
|
||||
status = uvm_rm_locked_call(nvUvmInterfaceDeviceCreate(uvm_global_session_handle(),
|
||||
gpu_info,
|
||||
uvm_gpu_uuid(gpu),
|
||||
&gpu->smc.rm_device,
|
||||
@@ -1548,9 +1537,6 @@ static void deinit_gpu(uvm_gpu_t *gpu)
|
||||
if (gpu->parent->smc.enabled) {
|
||||
if (gpu->smc.rm_device != 0)
|
||||
uvm_rm_locked_call_void(nvUvmInterfaceDeviceDestroy(gpu->smc.rm_device));
|
||||
|
||||
if (gpu->smc.rm_session_handle != 0)
|
||||
uvm_rm_locked_call_void(nvUvmInterfaceSessionDestroy(gpu->smc.rm_session_handle));
|
||||
}
|
||||
|
||||
gpu->magic = 0;
|
||||
@@ -2580,7 +2566,7 @@ static void disable_peer_access(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1)
|
||||
uvm_mmu_destroy_peer_identity_mappings(gpu0, gpu1);
|
||||
uvm_mmu_destroy_peer_identity_mappings(gpu1, gpu0);
|
||||
|
||||
uvm_rm_locked_call_void(nvUvmInterfaceP2pObjectDestroy(uvm_gpu_session_handle(gpu0), p2p_handle));
|
||||
uvm_rm_locked_call_void(nvUvmInterfaceP2pObjectDestroy(uvm_global_session_handle(), p2p_handle));
|
||||
|
||||
UVM_ASSERT(uvm_gpu_get(gpu0->global_id) == gpu0);
|
||||
UVM_ASSERT(uvm_gpu_get(gpu1->global_id) == gpu1);
|
||||
@@ -2706,9 +2692,9 @@ uvm_processor_id_t uvm_gpu_get_processor_id_by_address(uvm_gpu_t *gpu, uvm_gpu_p
|
||||
return id;
|
||||
}
|
||||
|
||||
uvm_gpu_peer_t *uvm_gpu_index_peer_caps(const uvm_gpu_id_t gpu_id1, const uvm_gpu_id_t gpu_id2)
|
||||
uvm_gpu_peer_t *uvm_gpu_index_peer_caps(const uvm_gpu_id_t gpu_id0, const uvm_gpu_id_t gpu_id1)
|
||||
{
|
||||
NvU32 table_index = uvm_gpu_peer_table_index(gpu_id1, gpu_id2);
|
||||
NvU32 table_index = uvm_gpu_peer_table_index(gpu_id0, gpu_id1);
|
||||
return &g_uvm_global.peers[table_index];
|
||||
}
|
||||
|
||||
|
||||
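Note: uvm_gpu_index_peer_caps() maps an unordered pair of GPU ids to a single slot in g_uvm_global.peers[]. The sketch below shows one conventional triangular-indexing scheme for such pair tables; it only illustrates the idea and is not necessarily the formula uvm_gpu_peer_table_index() implements.

    // Sketch only: a symmetric pair index over 'count' ids, invented for
    // illustration. Swapping a and b yields the same slot, and distinct pairs
    // map to distinct slots in a count * (count - 1) / 2 entry table.
    static unsigned example_pair_index(unsigned a, unsigned b, unsigned count)
    {
        unsigned lo = (a < b) ? a : b;
        unsigned hi = (a < b) ? b : a;

        return lo * count - (lo * (lo + 1)) / 2 + (hi - lo - 1);
    }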
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2015-2022 NVIDIA Corporation
|
||||
Copyright (c) 2015-2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
@@ -46,6 +46,7 @@
|
||||
#include "uvm_rb_tree.h"
|
||||
#include "uvm_perf_prefetch.h"
|
||||
#include "nv-kthread-q.h"
|
||||
#include <linux/mmu_notifier.h>
|
||||
#include "uvm_conf_computing.h"
|
||||
|
||||
// Buffer length to store uvm gpu id, RM device name and gpu uuid.
|
||||
@@ -166,7 +167,7 @@ struct uvm_service_block_context_struct
|
||||
} per_processor_masks[UVM_ID_MAX_PROCESSORS];
|
||||
|
||||
// State used by the VA block routines called by the servicing routine
|
||||
uvm_va_block_context_t block_context;
|
||||
uvm_va_block_context_t *block_context;
|
||||
|
||||
// Prefetch state hint
|
||||
uvm_perf_prefetch_hint_t prefetch_hint;
|
||||
@@ -192,9 +193,9 @@ typedef struct
|
||||
// Mask of successfully serviced read faults on pages in write_fault_mask.
|
||||
uvm_page_mask_t reads_serviced_mask;
|
||||
|
||||
// Temporary mask used for uvm_page_mask_or_equal. This is used since
|
||||
// bitmap_or_equal() isn't present in all linux kernel versions.
|
||||
uvm_page_mask_t tmp_mask;
|
||||
// Mask of all faulted pages in a UVM_VA_BLOCK_SIZE aligned region of a
|
||||
// SAM VMA. This is used as input to the prefetcher.
|
||||
uvm_page_mask_t faulted_mask;
|
||||
|
||||
// Client type of the service requestor.
|
||||
uvm_fault_client_type_t client_type;
|
||||
@@ -204,6 +205,40 @@ typedef struct
|
||||
|
||||
// New residency NUMA node ID of the faulting region.
|
||||
int residency_node;
|
||||
|
||||
struct
|
||||
{
|
||||
// True if preferred_location was set on this faulting region.
|
||||
// UVM_VA_BLOCK_SIZE sized region in the faulting region bound by the
|
||||
// VMA is is prefetched if preferred_location was set and if first_touch
|
||||
// is true;
|
||||
bool has_preferred_location;
|
||||
|
||||
// True if the UVM_VA_BLOCK_SIZE sized region isn't resident on any
|
||||
// node. False if any page in the region is resident somewhere.
|
||||
bool first_touch;
|
||||
|
||||
// Mask of prefetched pages in a UVM_VA_BLOCK_SIZE aligned region of a
|
||||
// SAM VMA.
|
||||
uvm_page_mask_t prefetch_pages_mask;
|
||||
|
||||
// PFN info of the faulting region
|
||||
unsigned long pfns[PAGES_PER_UVM_VA_BLOCK];
|
||||
|
||||
// Faulting/preferred processor residency mask of the faulting region.
|
||||
uvm_page_mask_t residency_mask;
|
||||
|
||||
#if defined(NV_MMU_INTERVAL_NOTIFIER)
|
||||
// MMU notifier used to compute residency of this faulting region.
|
||||
struct mmu_interval_notifier notifier;
|
||||
#endif
|
||||
|
||||
uvm_va_space_t *va_space;
|
||||
|
||||
// Prefetch temporary state.
|
||||
uvm_perf_prefetch_bitmap_tree_t bitmap_tree;
|
||||
} prefetch_state;
|
||||
|
||||
} uvm_ats_fault_context_t;
|
||||
|
||||
struct uvm_fault_service_batch_context_struct
|
||||
@@ -228,7 +263,10 @@ struct uvm_fault_service_batch_context_struct
|
||||
|
||||
NvU32 num_coalesced_faults;
|
||||
|
||||
bool has_fatal_faults;
|
||||
// One of the VA spaces in this batch which had fatal faults. If NULL, no
|
||||
// faults were fatal. More than one VA space could have fatal faults, but we
|
||||
// pick one to be the target of the cancel sequence.
|
||||
uvm_va_space_t *fatal_va_space;
|
||||
|
||||
bool has_throttled_faults;
|
||||
|
||||
@@ -790,8 +828,6 @@ struct uvm_gpu_struct
|
||||
{
|
||||
NvU32 swizz_id;
|
||||
|
||||
uvmGpuSessionHandle rm_session_handle;
|
||||
|
||||
// RM device handle used in many of the UVM/RM APIs.
|
||||
//
|
||||
// Do not read this field directly, use uvm_gpu_device_handle instead.
|
||||
@@ -1127,6 +1163,16 @@ struct uvm_parent_gpu_struct
|
||||
NvU64 memory_window_start;
|
||||
NvU64 memory_window_end;
|
||||
} system_bus;
|
||||
|
||||
// WAR to issue ATS TLB invalidation commands ourselves.
|
||||
struct
|
||||
{
|
||||
uvm_mutex_t smmu_lock;
|
||||
struct page *smmu_cmdq;
|
||||
void __iomem *smmu_cmdqv_base;
|
||||
unsigned long smmu_prod;
|
||||
unsigned long smmu_cons;
|
||||
} smmu_war;
|
||||
};
|
||||
|
||||
static const char *uvm_gpu_name(uvm_gpu_t *gpu)
|
||||
@@ -1301,7 +1347,7 @@ static NvU64 uvm_gpu_retained_count(uvm_gpu_t *gpu)
|
||||
void uvm_parent_gpu_kref_put(uvm_parent_gpu_t *gpu);
|
||||
|
||||
// Calculates peer table index using GPU ids.
|
||||
NvU32 uvm_gpu_peer_table_index(uvm_gpu_id_t gpu_id1, uvm_gpu_id_t gpu_id2);
|
||||
NvU32 uvm_gpu_peer_table_index(const uvm_gpu_id_t gpu_id0, const uvm_gpu_id_t gpu_id1);
|
||||
|
||||
// Either retains an existing PCIe peer entry or creates a new one. In both
|
||||
// cases the two GPUs are also each retained.
|
||||
@@ -1320,7 +1366,7 @@ uvm_aperture_t uvm_gpu_peer_aperture(uvm_gpu_t *local_gpu, uvm_gpu_t *remote_gpu
|
||||
uvm_processor_id_t uvm_gpu_get_processor_id_by_address(uvm_gpu_t *gpu, uvm_gpu_phys_address_t addr);
|
||||
|
||||
// Get the P2P capabilities between the gpus with the given indexes
|
||||
uvm_gpu_peer_t *uvm_gpu_index_peer_caps(uvm_gpu_id_t gpu_id1, uvm_gpu_id_t gpu_id2);
|
||||
uvm_gpu_peer_t *uvm_gpu_index_peer_caps(const uvm_gpu_id_t gpu_id0, const uvm_gpu_id_t gpu_id1);
|
||||
|
||||
// Get the P2P capabilities between the given gpus
|
||||
static uvm_gpu_peer_t *uvm_gpu_peer_caps(const uvm_gpu_t *gpu0, const uvm_gpu_t *gpu1)
|
||||
@@ -1328,10 +1374,10 @@ static uvm_gpu_peer_t *uvm_gpu_peer_caps(const uvm_gpu_t *gpu0, const uvm_gpu_t
|
||||
return uvm_gpu_index_peer_caps(gpu0->id, gpu1->id);
|
||||
}
|
||||
|
||||
static bool uvm_gpus_are_nvswitch_connected(uvm_gpu_t *gpu1, uvm_gpu_t *gpu2)
|
||||
static bool uvm_gpus_are_nvswitch_connected(const uvm_gpu_t *gpu0, const uvm_gpu_t *gpu1)
|
||||
{
|
||||
if (gpu1->parent->nvswitch_info.is_nvswitch_connected && gpu2->parent->nvswitch_info.is_nvswitch_connected) {
|
||||
UVM_ASSERT(uvm_gpu_peer_caps(gpu1, gpu2)->link_type >= UVM_GPU_LINK_NVLINK_2);
|
||||
if (gpu0->parent->nvswitch_info.is_nvswitch_connected && gpu1->parent->nvswitch_info.is_nvswitch_connected) {
|
||||
UVM_ASSERT(uvm_gpu_peer_caps(gpu0, gpu1)->link_type >= UVM_GPU_LINK_NVLINK_2);
|
||||
return true;
|
||||
}
|
||||
|
||||
@@ -1476,7 +1522,7 @@ bool uvm_gpu_can_address_kernel(uvm_gpu_t *gpu, NvU64 addr, NvU64 size);
|
||||
// addresses.
|
||||
NvU64 uvm_parent_gpu_canonical_address(uvm_parent_gpu_t *parent_gpu, NvU64 addr);
|
||||
|
||||
static bool uvm_gpu_is_coherent(const uvm_parent_gpu_t *parent_gpu)
static bool uvm_parent_gpu_is_coherent(const uvm_parent_gpu_t *parent_gpu)
{
    return parent_gpu->system_bus.memory_window_end > parent_gpu->system_bus.memory_window_start;
}
|
||||
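Note: a short worked example of the coherence check, assuming memory_window_end is set to start + size - 1 as the "inclusive" comment in fill_gpu_info() suggests; the numbers are made up.

    // Sketch only: illustrative values.
    NvU64 window_start = 0x1000;
    NvU64 window_size  = 0x2000;                          // asserted to be > 1
    NvU64 window_end   = window_start + window_size - 1;  // 0x2FFF, inclusive

    // end > start holds, so uvm_parent_gpu_is_coherent() reports a coherent
    // system. With size 0 the window is never populated and with size 1 the
    // inclusive end equals the start, so both cases read as non-coherent,
    // which is why the size is asserted to be greater than 1.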
@@ -985,7 +985,7 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
|
||||
return NV_OK;
|
||||
|
||||
if (uvm_processor_mask_test(&va_block->resident, processor))
|
||||
residency_mask = uvm_va_block_resident_mask_get(va_block, processor);
|
||||
residency_mask = uvm_va_block_resident_mask_get(va_block, processor, NUMA_NO_NODE);
|
||||
else
|
||||
residency_mask = NULL;
|
||||
|
||||
@@ -1009,6 +1009,7 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
|
||||
NvU64 address = uvm_va_block_cpu_page_address(va_block, page_index);
|
||||
bool read_duplicate = false;
|
||||
uvm_processor_id_t new_residency;
|
||||
const uvm_va_policy_t *policy;
|
||||
|
||||
// Ensure that the migratability iterator covers the current address
|
||||
while (iter.end < address)
|
||||
@@ -1035,21 +1036,23 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
|
||||
|
||||
// If the underlying VMA is gone, skip HMM migrations.
|
||||
if (uvm_va_block_is_hmm(va_block)) {
|
||||
status = uvm_hmm_find_vma(&service_context->block_context, address);
|
||||
status = uvm_hmm_find_vma(service_context->block_context->mm,
|
||||
&service_context->block_context->hmm.vma,
|
||||
address);
|
||||
if (status == NV_ERR_INVALID_ADDRESS)
|
||||
continue;
|
||||
|
||||
UVM_ASSERT(status == NV_OK);
|
||||
}
|
||||
|
||||
service_context->block_context.policy = uvm_va_policy_get(va_block, address);
|
||||
policy = uvm_va_policy_get(va_block, address);
|
||||
|
||||
new_residency = uvm_va_block_select_residency(va_block,
|
||||
&service_context->block_context,
|
||||
service_context->block_context,
|
||||
page_index,
|
||||
processor,
|
||||
uvm_fault_access_type_mask_bit(UVM_FAULT_ACCESS_TYPE_PREFETCH),
|
||||
service_context->block_context.policy,
|
||||
policy,
|
||||
&thrashing_hint,
|
||||
UVM_SERVICE_OPERATION_ACCESS_COUNTERS,
|
||||
&read_duplicate);
|
||||
@@ -1080,7 +1083,7 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
|
||||
// Remove pages that are already resident in the destination processors
|
||||
for_each_id_in_mask(id, &update_processors) {
|
||||
bool migrate_pages;
|
||||
uvm_page_mask_t *residency_mask = uvm_va_block_resident_mask_get(va_block, id);
|
||||
uvm_page_mask_t *residency_mask = uvm_va_block_resident_mask_get(va_block, id, NUMA_NO_NODE);
|
||||
UVM_ASSERT(residency_mask);
|
||||
|
||||
migrate_pages = uvm_page_mask_andnot(&service_context->per_processor_masks[uvm_id_value(id)].new_residency,
|
||||
@@ -1094,12 +1097,17 @@ static NV_STATUS service_va_block_locked(uvm_processor_id_t processor,
|
||||
if (!uvm_processor_mask_empty(&service_context->resident_processors)) {
|
||||
while (first_page_index <= last_page_index) {
|
||||
uvm_page_index_t outer = last_page_index + 1;
|
||||
const uvm_va_policy_t *policy;
|
||||
|
||||
if (uvm_va_block_is_hmm(va_block)) {
|
||||
status = uvm_hmm_find_policy_vma_and_outer(va_block,
|
||||
&service_context->block_context,
|
||||
first_page_index,
|
||||
&outer);
|
||||
status = NV_ERR_INVALID_ADDRESS;
|
||||
if (service_context->block_context->mm) {
|
||||
status = uvm_hmm_find_policy_vma_and_outer(va_block,
|
||||
&service_context->block_context->hmm.vma,
|
||||
first_page_index,
|
||||
&policy,
|
||||
&outer);
|
||||
}
|
||||
if (status != NV_OK)
|
||||
break;
|
||||
}
|
||||
@@ -1198,7 +1206,7 @@ static NV_STATUS service_phys_single_va_block(uvm_gpu_t *gpu,
|
||||
|
||||
service_context->operation = UVM_SERVICE_OPERATION_ACCESS_COUNTERS;
|
||||
service_context->num_retries = 0;
|
||||
service_context->block_context.mm = mm;
|
||||
service_context->block_context->mm = mm;
|
||||
|
||||
if (uvm_va_block_is_hmm(va_block)) {
|
||||
uvm_hmm_service_context_init(service_context);
|
||||
|
||||
@@ -292,6 +292,7 @@ NV_STATUS uvm_gpu_init_isr(uvm_parent_gpu_t *parent_gpu)
|
||||
{
|
||||
NV_STATUS status = NV_OK;
|
||||
char kthread_name[TASK_COMM_LEN + 1];
|
||||
uvm_va_block_context_t *block_context;
|
||||
|
||||
if (parent_gpu->replayable_faults_supported) {
|
||||
status = uvm_gpu_fault_buffer_init(parent_gpu);
|
||||
@@ -311,6 +312,12 @@ NV_STATUS uvm_gpu_init_isr(uvm_parent_gpu_t *parent_gpu)
|
||||
if (!parent_gpu->isr.replayable_faults.stats.cpu_exec_count)
|
||||
return NV_ERR_NO_MEMORY;
|
||||
|
||||
block_context = uvm_va_block_context_alloc(NULL);
|
||||
if (!block_context)
|
||||
return NV_ERR_NO_MEMORY;
|
||||
|
||||
parent_gpu->fault_buffer_info.replayable.block_service_context.block_context = block_context;
|
||||
|
||||
parent_gpu->isr.replayable_faults.handling = true;
|
||||
|
||||
snprintf(kthread_name, sizeof(kthread_name), "UVM GPU%u BH", uvm_id_value(parent_gpu->id));
|
||||
@@ -333,6 +340,12 @@ NV_STATUS uvm_gpu_init_isr(uvm_parent_gpu_t *parent_gpu)
|
||||
if (!parent_gpu->isr.non_replayable_faults.stats.cpu_exec_count)
|
||||
return NV_ERR_NO_MEMORY;
|
||||
|
||||
block_context = uvm_va_block_context_alloc(NULL);
|
||||
if (!block_context)
|
||||
return NV_ERR_NO_MEMORY;
|
||||
|
||||
parent_gpu->fault_buffer_info.non_replayable.block_service_context.block_context = block_context;
|
||||
|
||||
parent_gpu->isr.non_replayable_faults.handling = true;
|
||||
|
||||
snprintf(kthread_name, sizeof(kthread_name), "UVM GPU%u KC", uvm_id_value(parent_gpu->id));
|
||||
@@ -356,6 +369,13 @@ NV_STATUS uvm_gpu_init_isr(uvm_parent_gpu_t *parent_gpu)
|
||||
return status;
|
||||
}
|
||||
|
||||
block_context = uvm_va_block_context_alloc(NULL);
|
||||
if (!block_context)
|
||||
return NV_ERR_NO_MEMORY;
|
||||
|
||||
parent_gpu->access_counter_buffer_info.batch_service_context.block_service_context.block_context =
|
||||
block_context;
|
||||
|
||||
nv_kthread_q_item_init(&parent_gpu->isr.access_counters.bottom_half_q_item,
|
||||
access_counters_isr_bottom_half_entry,
|
||||
parent_gpu);
|
||||
@@ -410,6 +430,8 @@ void uvm_gpu_disable_isr(uvm_parent_gpu_t *parent_gpu)
|
||||
|
||||
void uvm_gpu_deinit_isr(uvm_parent_gpu_t *parent_gpu)
|
||||
{
|
||||
uvm_va_block_context_t *block_context;
|
||||
|
||||
// Return ownership to RM:
|
||||
if (parent_gpu->isr.replayable_faults.was_handling) {
|
||||
// No user threads could have anything left on
|
||||
@@ -439,8 +461,18 @@ void uvm_gpu_deinit_isr(uvm_parent_gpu_t *parent_gpu)
|
||||
// It is safe to deinitialize access counters even if they have not been
|
||||
// successfully initialized.
|
||||
uvm_gpu_deinit_access_counters(parent_gpu);
|
||||
block_context =
|
||||
parent_gpu->access_counter_buffer_info.batch_service_context.block_service_context.block_context;
|
||||
uvm_va_block_context_free(block_context);
|
||||
}
|
||||
|
||||
if (parent_gpu->non_replayable_faults_supported) {
|
||||
block_context = parent_gpu->fault_buffer_info.non_replayable.block_service_context.block_context;
|
||||
uvm_va_block_context_free(block_context);
|
||||
}
|
||||
|
||||
block_context = parent_gpu->fault_buffer_info.replayable.block_service_context.block_context;
|
||||
uvm_va_block_context_free(block_context);
|
||||
uvm_kvfree(parent_gpu->isr.replayable_faults.stats.cpu_exec_count);
|
||||
uvm_kvfree(parent_gpu->isr.non_replayable_faults.stats.cpu_exec_count);
|
||||
uvm_kvfree(parent_gpu->isr.access_counters.stats.cpu_exec_count);
|
||||
|
||||
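Note: since uvm_service_block_context_t now holds a pointer to its VA block context rather than embedding it, each ISR path allocates that context at init time and frees it at teardown. A minimal sketch of the pairing; "owner" is an invented stand-in for a fault-buffer or access-counter service context.

    // Sketch only: the alloc/free pairing used by the ISR paths above.
    uvm_va_block_context_t *block_context = uvm_va_block_context_alloc(NULL);
    if (!block_context)
        return NV_ERR_NO_MEMORY;

    owner->block_service_context.block_context = block_context;

    // ...and on uvm_gpu_deinit_isr(), in reverse:
    uvm_va_block_context_free(owner->block_service_context.block_context);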
@@ -343,6 +343,7 @@ static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
|
||||
bool read_duplicate;
|
||||
uvm_va_space_t *va_space = uvm_va_block_get_va_space(va_block);
|
||||
uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &gpu->parent->fault_buffer_info.non_replayable;
|
||||
const uvm_va_policy_t *policy;
|
||||
|
||||
UVM_ASSERT(!fault_entry->is_fatal);
|
||||
|
||||
@@ -352,7 +353,7 @@ static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
|
||||
UVM_ASSERT(fault_entry->fault_address >= va_block->start);
|
||||
UVM_ASSERT(fault_entry->fault_address <= va_block->end);
|
||||
|
||||
service_context->block_context.policy = uvm_va_policy_get(va_block, fault_entry->fault_address);
|
||||
policy = uvm_va_policy_get(va_block, fault_entry->fault_address);
|
||||
|
||||
if (service_context->num_retries == 0) {
|
||||
// notify event to tools/performance heuristics. For now we use a
|
||||
@@ -361,7 +362,7 @@ static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
|
||||
uvm_perf_event_notify_gpu_fault(&va_space->perf_events,
|
||||
va_block,
|
||||
gpu->id,
|
||||
service_context->block_context.policy->preferred_location,
|
||||
policy->preferred_location,
|
||||
fault_entry,
|
||||
++non_replayable_faults->batch_id,
|
||||
false);
|
||||
@@ -369,7 +370,7 @@ static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
|
||||
|
||||
// Check logical permissions
|
||||
status = uvm_va_block_check_logical_permissions(va_block,
|
||||
&service_context->block_context,
|
||||
service_context->block_context,
|
||||
gpu->id,
|
||||
uvm_va_block_cpu_page_index(va_block,
|
||||
fault_entry->fault_address),
|
||||
@@ -392,11 +393,11 @@ static NV_STATUS service_managed_fault_in_block_locked(uvm_gpu_t *gpu,
|
||||
|
||||
// Compute new residency and update the masks
|
||||
new_residency = uvm_va_block_select_residency(va_block,
|
||||
&service_context->block_context,
|
||||
service_context->block_context,
|
||||
page_index,
|
||||
gpu->id,
|
||||
fault_entry->access_type_mask,
|
||||
service_context->block_context.policy,
|
||||
policy,
|
||||
&thrashing_hint,
|
||||
UVM_SERVICE_OPERATION_NON_REPLAYABLE_FAULTS,
|
||||
&read_duplicate);
|
||||
@@ -628,7 +629,7 @@ static NV_STATUS service_fault(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fault_e
|
||||
uvm_gpu_va_space_t *gpu_va_space;
|
||||
uvm_non_replayable_fault_buffer_info_t *non_replayable_faults = &gpu->parent->fault_buffer_info.non_replayable;
|
||||
uvm_va_block_context_t *va_block_context =
|
||||
&gpu->parent->fault_buffer_info.non_replayable.block_service_context.block_context;
|
||||
gpu->parent->fault_buffer_info.non_replayable.block_service_context.block_context;
|
||||
|
||||
status = uvm_gpu_fault_entry_to_va_space(gpu, fault_entry, &va_space);
|
||||
if (status != NV_OK) {
|
||||
@@ -654,7 +655,7 @@ static NV_STATUS service_fault(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fault_e
|
||||
// to remain valid until we release. If no mm is registered, we
|
||||
// can only service managed faults, not ATS/HMM faults.
|
||||
mm = uvm_va_space_mm_retain_lock(va_space);
|
||||
va_block_context->mm = mm;
|
||||
uvm_va_block_context_init(va_block_context, mm);
|
||||
|
||||
uvm_va_space_down_read(va_space);
|
||||
|
||||
@@ -678,10 +679,17 @@ static NV_STATUS service_fault(uvm_gpu_t *gpu, uvm_fault_buffer_entry_t *fault_e
|
||||
fault_entry->fault_source.channel_id = user_channel->hw_channel_id;
|
||||
|
||||
if (!fault_entry->is_fatal) {
|
||||
status = uvm_va_block_find_create(fault_entry->va_space,
|
||||
fault_entry->fault_address,
|
||||
va_block_context,
|
||||
&va_block);
|
||||
if (mm) {
|
||||
status = uvm_va_block_find_create(fault_entry->va_space,
|
||||
fault_entry->fault_address,
|
||||
&va_block_context->hmm.vma,
|
||||
&va_block);
|
||||
}
|
||||
else {
|
||||
status = uvm_va_block_find_create_managed(fault_entry->va_space,
|
||||
fault_entry->fault_address,
|
||||
&va_block);
|
||||
}
|
||||
if (status == NV_OK)
|
||||
status = service_managed_fault_in_block(gpu_va_space->gpu, va_block, fault_entry);
|
||||
else
|
||||
@@ -734,8 +742,6 @@ void uvm_gpu_service_non_replayable_fault_buffer(uvm_gpu_t *gpu)
|
||||
// Unlike replayable faults, we do not batch up and preprocess
|
||||
// non-replayable faults since getting multiple faults on the same
|
||||
// memory region is not very likely
|
||||
//
|
||||
// TODO: Bug 2103669: [UVM/ATS] Optimize ATS fault servicing
|
||||
for (i = 0; i < cached_faults; ++i) {
|
||||
status = service_fault(gpu, &gpu->parent->fault_buffer_info.non_replayable.fault_cache[i]);
|
||||
if (status != NV_OK)
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2015-2022 NVIDIA Corporation
|
||||
Copyright (c) 2015-2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
@@ -697,9 +697,6 @@ static inline int cmp_access_type(uvm_fault_access_type_t a, uvm_fault_access_ty
|
||||
|
||||
typedef enum
|
||||
{
|
||||
// Fetch a batch of faults from the buffer.
|
||||
FAULT_FETCH_MODE_BATCH_ALL,
|
||||
|
||||
// Fetch a batch of faults from the buffer. Stop at the first entry that is
|
||||
// not ready yet
|
||||
FAULT_FETCH_MODE_BATCH_READY,
|
||||
@@ -857,9 +854,7 @@ static NV_STATUS fetch_fault_buffer_entries(uvm_gpu_t *gpu,
|
||||
// written out of order
|
||||
UVM_SPIN_WHILE(!gpu->parent->fault_buffer_hal->entry_is_valid(gpu->parent, get), &spin) {
|
||||
// We have some entry to work on. Let's do the rest later.
|
||||
if (fetch_mode != FAULT_FETCH_MODE_ALL &&
|
||||
fetch_mode != FAULT_FETCH_MODE_BATCH_ALL &&
|
||||
fault_index > 0)
|
||||
if (fetch_mode == FAULT_FETCH_MODE_BATCH_READY && fault_index > 0)
|
||||
goto done;
|
||||
}
|
||||
|
||||
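Note: assuming FAULT_FETCH_MODE_ALL and FAULT_FETCH_MODE_BATCH_READY are the only fetch modes left once FAULT_FETCH_MODE_BATCH_ALL is dropped, the simplified test keeps the original behavior.

    // Sketch only: the old exclusion
    //
    //     fetch_mode != FAULT_FETCH_MODE_ALL &&
    //     fetch_mode != FAULT_FETCH_MODE_BATCH_ALL &&
    //     fault_index > 0
    //
    // is true exactly when fetch_mode is BATCH_READY and at least one fault
    // has already been cached, which is the new equality test
    //
    //     fetch_mode == FAULT_FETCH_MODE_BATCH_READY && fault_index > 0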
@@ -888,6 +883,7 @@ static NV_STATUS fetch_fault_buffer_entries(uvm_gpu_t *gpu,
|
||||
|
||||
current_entry->va_space = NULL;
|
||||
current_entry->filtered = false;
|
||||
current_entry->replayable.cancel_va_mode = UVM_FAULT_CANCEL_VA_MODE_ALL;
|
||||
|
||||
if (current_entry->fault_source.utlb_id > batch_context->max_utlb_id) {
|
||||
UVM_ASSERT(current_entry->fault_source.utlb_id < replayable_faults->utlb_count);
|
||||
@@ -1184,7 +1180,11 @@ static void mark_fault_fatal(uvm_fault_service_batch_context_t *batch_context,
|
||||
fault_entry->replayable.cancel_va_mode = cancel_va_mode;
|
||||
|
||||
utlb->has_fatal_faults = true;
|
||||
batch_context->has_fatal_faults = true;
|
||||
|
||||
if (!batch_context->fatal_va_space) {
|
||||
UVM_ASSERT(fault_entry->va_space);
|
||||
batch_context->fatal_va_space = fault_entry->va_space;
|
||||
}
|
||||
}
|
||||
|
||||
static void fault_entry_duplicate_flags(uvm_fault_service_batch_context_t *batch_context,
|
||||
@@ -1234,7 +1234,7 @@ static uvm_fault_access_type_t check_fault_access_permissions(uvm_gpu_t *gpu,
|
||||
UvmEventFatalReason fatal_reason;
|
||||
uvm_fault_cancel_va_mode_t cancel_va_mode;
|
||||
uvm_fault_access_type_t ret = UVM_FAULT_ACCESS_TYPE_COUNT;
|
||||
uvm_va_block_context_t *va_block_context = &service_block_context->block_context;
|
||||
uvm_va_block_context_t *va_block_context = service_block_context->block_context;
|
||||
|
||||
perm_status = uvm_va_block_check_logical_permissions(va_block,
|
||||
va_block_context,
|
||||
@@ -1322,6 +1322,7 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
|
||||
uvm_fault_buffer_entry_t **ordered_fault_cache = batch_context->ordered_fault_cache;
|
||||
uvm_service_block_context_t *block_context = &replayable_faults->block_service_context;
|
||||
uvm_va_space_t *va_space = uvm_va_block_get_va_space(va_block);
|
||||
const uvm_va_policy_t *policy;
|
||||
NvU64 end;
|
||||
|
||||
// Check that all uvm_fault_access_type_t values can fit into an NvU8
|
||||
@@ -1347,13 +1348,13 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
|
||||
UVM_ASSERT(ordered_fault_cache[first_fault_index]->fault_address <= va_block->end);
|
||||
|
||||
if (uvm_va_block_is_hmm(va_block)) {
|
||||
uvm_hmm_find_policy_end(va_block,
|
||||
&block_context->block_context,
|
||||
ordered_fault_cache[first_fault_index]->fault_address,
|
||||
&end);
|
||||
policy = uvm_hmm_find_policy_end(va_block,
|
||||
block_context->block_context->hmm.vma,
|
||||
ordered_fault_cache[first_fault_index]->fault_address,
|
||||
&end);
|
||||
}
|
||||
else {
|
||||
block_context->block_context.policy = uvm_va_range_get_policy(va_block->va_range);
|
||||
policy = uvm_va_range_get_policy(va_block->va_range);
|
||||
end = va_block->end;
|
||||
}
|
||||
|
||||
@@ -1377,7 +1378,10 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
|
||||
UVM_ASSERT(current_entry->fault_access_type ==
|
||||
uvm_fault_access_type_mask_highest(current_entry->access_type_mask));
|
||||
|
||||
current_entry->is_fatal = false;
|
||||
// Unserviceable faults were already skipped by the caller. There are no
|
||||
// unserviceable fault types that could be in the same VA block as a
|
||||
// serviceable fault.
|
||||
UVM_ASSERT(!current_entry->is_fatal);
|
||||
current_entry->is_throttled = false;
|
||||
current_entry->is_invalid_prefetch = false;
|
||||
|
||||
@@ -1393,7 +1397,7 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
|
||||
update_batch_and_notify_fault(gpu,
|
||||
batch_context,
|
||||
va_block,
|
||||
block_context->block_context.policy->preferred_location,
|
||||
policy->preferred_location,
|
||||
current_entry,
|
||||
is_duplicate);
|
||||
}
|
||||
@@ -1469,11 +1473,11 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
|
||||
|
||||
// Compute new residency and update the masks
|
||||
new_residency = uvm_va_block_select_residency(va_block,
|
||||
&block_context->block_context,
|
||||
block_context->block_context,
|
||||
page_index,
|
||||
gpu->id,
|
||||
service_access_type_mask,
|
||||
block_context->block_context.policy,
|
||||
policy,
|
||||
&thrashing_hint,
|
||||
UVM_SERVICE_OPERATION_REPLAYABLE_FAULTS,
|
||||
&read_duplicate);
|
||||
@@ -1511,8 +1515,8 @@ static NV_STATUS service_fault_batch_block_locked(uvm_gpu_t *gpu,
|
||||
|
||||
++block_context->num_retries;
|
||||
|
||||
if (status == NV_OK && batch_context->has_fatal_faults)
|
||||
status = uvm_va_block_set_cancel(va_block, &block_context->block_context, gpu);
|
||||
if (status == NV_OK && batch_context->fatal_va_space)
|
||||
status = uvm_va_block_set_cancel(va_block, block_context->block_context, gpu);
|
||||
|
||||
return status;
|
||||
}
|
||||
@@ -1625,21 +1629,25 @@ static NV_STATUS service_fault_batch_ats_sub_vma(uvm_gpu_va_space_t *gpu_va_spac
|
||||
uvm_ats_fault_context_t *ats_context = &batch_context->ats_context;
|
||||
const uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
|
||||
const uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
|
||||
const uvm_page_mask_t *faults_serviced_mask = &ats_context->faults_serviced_mask;
|
||||
const uvm_page_mask_t *reads_serviced_mask = &ats_context->reads_serviced_mask;
|
||||
uvm_page_mask_t *tmp_mask = &ats_context->tmp_mask;
|
||||
uvm_page_mask_t *faults_serviced_mask = &ats_context->faults_serviced_mask;
|
||||
uvm_page_mask_t *faulted_mask = &ats_context->faulted_mask;
|
||||
|
||||
UVM_ASSERT(vma);
|
||||
|
||||
ats_context->client_type = UVM_FAULT_CLIENT_TYPE_GPC;
|
||||
|
||||
uvm_page_mask_or(tmp_mask, write_fault_mask, read_fault_mask);
|
||||
uvm_page_mask_or(faulted_mask, write_fault_mask, read_fault_mask);
|
||||
|
||||
status = uvm_ats_service_faults(gpu_va_space, vma, base, &batch_context->ats_context);
|
||||
|
||||
UVM_ASSERT(uvm_page_mask_subset(faults_serviced_mask, tmp_mask));
|
||||
// Remove prefetched pages from the serviced mask since fault servicing
|
||||
// failures belonging to prefetch pages need to be ignored.
|
||||
uvm_page_mask_and(faults_serviced_mask, faults_serviced_mask, faulted_mask);
|
||||
|
||||
if ((status != NV_OK) || uvm_page_mask_equal(faults_serviced_mask, tmp_mask)) {
|
||||
UVM_ASSERT(uvm_page_mask_subset(faults_serviced_mask, faulted_mask));
|
||||
|
||||
if ((status != NV_OK) || uvm_page_mask_equal(faults_serviced_mask, faulted_mask)) {
|
||||
(*block_faults) += (fault_index_end - fault_index_start);
|
||||
return status;
|
||||
}
|
||||
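Note: the mask bookkeeping above can be followed with a toy example; the masks below are 8-bit stand-ins for the driver's uvm_page_mask_t bitmaps and the values are made up.

    // Sketch only: illustrative 8-bit masks.
    unsigned char read_faults  = 0x03;                        // pages 0-1 faulted for read
    unsigned char write_faults = 0x0C;                        // pages 2-3 faulted for write
    unsigned char faulted      = read_faults | write_faults;  // 0x0F

    // Suppose servicing also covered a prefetched page (bit 5) while another
    // prefetched page failed: the raw serviced mask is intersected with the
    // faulted mask, so prefetch outcomes are ignored either way.
    unsigned char raw_serviced = 0x2F;
    unsigned char serviced     = raw_serviced & faulted;      // 0x0F

    // serviced == faulted, so every page that actually faulted was handled
    // and the sub-VMA counts as fully serviced.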
@@ -1730,6 +1738,10 @@ static NV_STATUS service_fault_batch_ats_sub(uvm_gpu_va_space_t *gpu_va_space,
|
||||
uvm_fault_access_type_t access_type = current_entry->fault_access_type;
|
||||
bool is_duplicate = check_fault_entry_duplicate(current_entry, previous_entry);
|
||||
|
||||
// ATS faults can't be unserviceable, since unserviceable faults require
|
||||
// GMMU PTEs.
|
||||
UVM_ASSERT(!current_entry->is_fatal);
|
||||
|
||||
i++;
|
||||
|
||||
update_batch_and_notify_fault(gpu_va_space->gpu,
|
||||
@@ -1852,7 +1864,7 @@ static NV_STATUS service_fault_batch_dispatch(uvm_va_space_t *va_space,
|
||||
uvm_va_block_t *va_block;
|
||||
uvm_gpu_t *gpu = gpu_va_space->gpu;
|
||||
uvm_va_block_context_t *va_block_context =
|
||||
&gpu->parent->fault_buffer_info.replayable.block_service_context.block_context;
|
||||
gpu->parent->fault_buffer_info.replayable.block_service_context.block_context;
|
||||
uvm_fault_buffer_entry_t *current_entry = batch_context->ordered_fault_cache[fault_index];
|
||||
struct mm_struct *mm = va_block_context->mm;
|
||||
NvU64 fault_address = current_entry->fault_address;
|
||||
@@ -1867,7 +1879,13 @@ static NV_STATUS service_fault_batch_dispatch(uvm_va_space_t *va_space,
|
||||
va_range_next = uvm_va_space_iter_next(va_range_next, ~0ULL);
|
||||
}
|
||||
|
||||
status = uvm_va_block_find_create_in_range(va_space, va_range, fault_address, va_block_context, &va_block);
|
||||
if (va_range)
|
||||
status = uvm_va_block_find_create_in_range(va_space, va_range, fault_address, &va_block);
|
||||
else if (mm)
|
||||
status = uvm_hmm_va_block_find_create(va_space, fault_address, &va_block_context->hmm.vma, &va_block);
|
||||
else
|
||||
status = NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
if (status == NV_OK) {
|
||||
status = service_fault_batch_block(gpu, va_block, batch_context, fault_index, block_faults);
|
||||
}
|
||||
@@ -1923,14 +1941,198 @@ static NV_STATUS service_fault_batch_dispatch(uvm_va_space_t *va_space,
|
||||
return status;
|
||||
}
|
||||
|
||||
// Called when a fault in the batch has been marked fatal. Flush the buffer
|
||||
// under the VA and mmap locks to remove any potential stale fatal faults, then
|
||||
// service all new faults for just that VA space and cancel those which are
|
||||
// fatal. Faults in other VA spaces are replayed when done and will be processed
|
||||
// when normal fault servicing resumes.
|
||||
static NV_STATUS service_fault_batch_for_cancel(uvm_gpu_t *gpu, uvm_fault_service_batch_context_t *batch_context)
|
||||
{
|
||||
NV_STATUS status = NV_OK;
|
||||
NvU32 i;
|
||||
uvm_va_space_t *va_space = batch_context->fatal_va_space;
|
||||
uvm_gpu_va_space_t *gpu_va_space = NULL;
|
||||
struct mm_struct *mm;
|
||||
uvm_replayable_fault_buffer_info_t *replayable_faults = &gpu->parent->fault_buffer_info.replayable;
|
||||
uvm_service_block_context_t *service_context = &gpu->parent->fault_buffer_info.replayable.block_service_context;
|
||||
uvm_va_block_context_t *va_block_context = service_context->block_context;
|
||||
|
||||
UVM_ASSERT(gpu->parent->replayable_faults_supported);
|
||||
UVM_ASSERT(va_space);
|
||||
|
||||
// Perform the flush and re-fetch while holding the mmap_lock and the
|
||||
// VA space lock. This avoids stale faults because it prevents any vma
|
||||
// modifications (mmap, munmap, mprotect) from happening between the time HW
|
||||
// takes the fault and we cancel it.
|
||||
mm = uvm_va_space_mm_retain_lock(va_space);
|
||||
uvm_va_block_context_init(va_block_context, mm);
|
||||
uvm_va_space_down_read(va_space);
|
||||
|
||||
// We saw fatal faults in this VA space before. Flush while holding
|
||||
// mmap_lock to make sure those faults come back (aren't stale).
|
||||
//
|
||||
// We need to wait until all old fault messages have arrived before
|
||||
// flushing, hence UVM_GPU_BUFFER_FLUSH_MODE_WAIT_UPDATE_PUT.
|
||||
status = fault_buffer_flush_locked(gpu,
|
||||
UVM_GPU_BUFFER_FLUSH_MODE_WAIT_UPDATE_PUT,
|
||||
UVM_FAULT_REPLAY_TYPE_START,
|
||||
batch_context);
|
||||
if (status != NV_OK)
|
||||
goto done;
|
||||
|
||||
// Wait for the flush's replay to finish to give the legitimate faults a
|
||||
// chance to show up in the buffer again.
|
||||
status = uvm_tracker_wait(&replayable_faults->replay_tracker);
|
||||
if (status != NV_OK)
|
||||
goto done;
|
||||
|
||||
// We expect all replayed faults to have arrived in the buffer so we can re-
|
||||
// service them. The replay-and-wait sequence above will ensure they're all
|
||||
// in the HW buffer. When GSP owns the HW buffer, we also have to wait for
|
||||
// GSP to copy all available faults from the HW buffer into the shadow
|
||||
// buffer.
|
||||
//
|
||||
// TODO: Bug 2533557: This flush does not actually guarantee that GSP will
|
||||
// copy over all faults.
|
||||
status = hw_fault_buffer_flush_locked(gpu->parent);
|
||||
if (status != NV_OK)
|
||||
goto done;
|
||||
|
||||
// If there is no GPU VA space for the GPU, ignore all faults in the VA
|
||||
// space. This can happen if the GPU VA space has been destroyed since we
|
||||
// unlocked the VA space in service_fault_batch. That means the fatal faults
|
||||
// are stale, because unregistering the GPU VA space requires preempting the
|
||||
// context and detaching all channels in that VA space. Restart fault
|
||||
// servicing from the top.
|
||||
gpu_va_space = uvm_gpu_va_space_get_by_parent_gpu(va_space, gpu->parent);
|
||||
if (!gpu_va_space)
|
||||
goto done;
|
||||
|
||||
// Re-parse the new faults
|
||||
batch_context->num_invalid_prefetch_faults = 0;
|
||||
batch_context->num_duplicate_faults = 0;
|
||||
batch_context->num_replays = 0;
|
||||
batch_context->fatal_va_space = NULL;
|
||||
batch_context->has_throttled_faults = false;
|
||||
|
||||
status = fetch_fault_buffer_entries(gpu, batch_context, FAULT_FETCH_MODE_ALL);
|
||||
if (status != NV_OK)
|
||||
goto done;
|
||||
|
||||
// No more faults left. Either the previously-seen fatal entry was stale, or
|
||||
// RM killed the context underneath us.
|
||||
if (batch_context->num_cached_faults == 0)
|
||||
goto done;
|
||||
|
||||
++batch_context->batch_id;
|
||||
|
||||
status = preprocess_fault_batch(gpu, batch_context);
|
||||
if (status != NV_OK) {
|
||||
if (status == NV_WARN_MORE_PROCESSING_REQUIRED) {
|
||||
// Another flush happened due to stale faults or a context-fatal
|
||||
// error. The previously-seen fatal fault might not exist anymore,
|
||||
// so restart fault servicing from the top.
|
||||
status = NV_OK;
|
||||
}
|
||||
|
||||
goto done;
|
||||
}
|
||||
|
||||
// Search for the target VA space
|
||||
for (i = 0; i < batch_context->num_coalesced_faults; i++) {
|
||||
uvm_fault_buffer_entry_t *current_entry = batch_context->ordered_fault_cache[i];
|
||||
UVM_ASSERT(current_entry->va_space);
|
||||
if (current_entry->va_space == va_space)
|
||||
break;
|
||||
}
|
||||
|
||||
while (i < batch_context->num_coalesced_faults) {
|
||||
uvm_fault_buffer_entry_t *current_entry = batch_context->ordered_fault_cache[i];
|
||||
|
||||
if (current_entry->va_space != va_space)
|
||||
break;
|
||||
|
||||
// service_fault_batch_dispatch() doesn't expect unserviceable faults.
|
||||
// Just cancel them directly.
|
||||
if (current_entry->is_fatal) {
|
||||
status = cancel_fault_precise_va(gpu, current_entry, UVM_FAULT_CANCEL_VA_MODE_ALL);
|
||||
if (status != NV_OK)
|
||||
break;
|
||||
|
||||
++i;
|
||||
}
|
||||
else {
|
||||
uvm_ats_fault_invalidate_t *ats_invalidate = &gpu->parent->fault_buffer_info.replayable.ats_invalidate;
|
||||
NvU32 block_faults;
|
||||
|
||||
ats_invalidate->write_faults_in_batch = false;
|
||||
uvm_hmm_service_context_init(service_context);
|
||||
|
||||
// Service all the faults that we can. We only really need to search
|
||||
// for fatal faults, but attempting to service all is the easiest
|
||||
// way to do that.
|
||||
status = service_fault_batch_dispatch(va_space, gpu_va_space, batch_context, i, &block_faults, false);
|
||||
if (status != NV_OK) {
|
||||
// TODO: Bug 3900733: clean up locking in service_fault_batch().
|
||||
// We need to drop lock and retry. That means flushing and
|
||||
// starting over.
|
||||
if (status == NV_WARN_MORE_PROCESSING_REQUIRED)
|
||||
status = NV_OK;
|
||||
|
||||
break;
|
||||
}
|
||||
|
||||
// Invalidate TLBs before cancel to ensure that fatal faults don't
|
||||
// get stuck in HW behind non-fatal faults to the same line.
|
||||
status = uvm_ats_invalidate_tlbs(gpu_va_space, ats_invalidate, &batch_context->tracker);
|
||||
if (status != NV_OK)
|
||||
break;
|
||||
|
||||
while (block_faults-- > 0) {
|
||||
current_entry = batch_context->ordered_fault_cache[i];
|
||||
if (current_entry->is_fatal) {
|
||||
status = cancel_fault_precise_va(gpu, current_entry, current_entry->replayable.cancel_va_mode);
|
||||
if (status != NV_OK)
|
||||
break;
|
||||
}
|
||||
|
||||
++i;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
done:
|
||||
uvm_va_space_up_read(va_space);
|
||||
uvm_va_space_mm_release_unlock(va_space, mm);
|
||||
|
||||
if (status == NV_OK) {
|
||||
// There are two reasons to flush the fault buffer here.
//
// 1) Functional. We need to replay both the serviced non-fatal faults and
// the skipped faults in other VA spaces. The former need to be restarted
// and the latter need to be replayed so the normal fault service mechanism
// can fetch and process them.
//
// 2) Performance. After cancelling the fatal faults, a flush removes any
// duplicate faults that may have been added while processing this batch,
// and avoids unnecessary processing of the remaining faults, which are
// unlikely to survive a replay because the context is probably in the
// process of dying.
status = fault_buffer_flush_locked(gpu,
|
||||
UVM_GPU_BUFFER_FLUSH_MODE_UPDATE_PUT,
|
||||
UVM_FAULT_REPLAY_TYPE_START,
|
||||
batch_context);
|
||||
}
|
||||
|
||||
return status;
|
||||
}
|
||||
// Scan the ordered view of faults and group them by different va_blocks
// (managed faults), servicing the faults for each va_block in batch.
// Non-managed faults are serviced one at a time as they are encountered
// during the scan.
//
// This function returns NV_WARN_MORE_PROCESSING_REQUIRED if the fault buffer
// was flushed because the needs_fault_buffer_flush flag was set on some GPU
// VA space.
//
// Fatal faults are marked for later processing by the caller.
static NV_STATUS service_fault_batch(uvm_gpu_t *gpu,
|
||||
fault_service_mode_t service_mode,
|
||||
uvm_fault_service_batch_context_t *batch_context)
|
||||
@@ -1945,7 +2147,7 @@ static NV_STATUS service_fault_batch(uvm_gpu_t *gpu,
|
||||
gpu->parent->fault_buffer_info.replayable.replay_policy == UVM_PERF_FAULT_REPLAY_POLICY_BLOCK;
|
||||
uvm_service_block_context_t *service_context =
|
||||
&gpu->parent->fault_buffer_info.replayable.block_service_context;
|
||||
uvm_va_block_context_t *va_block_context = &service_context->block_context;
|
||||
uvm_va_block_context_t *va_block_context = service_context->block_context;
|
||||
|
||||
UVM_ASSERT(gpu->parent->replayable_faults_supported);
|
||||
|
||||
@@ -1981,41 +2183,28 @@ static NV_STATUS service_fault_batch(uvm_gpu_t *gpu,
|
||||
// to remain valid until we release. If no mm is registered, we
|
||||
// can only service managed faults, not ATS/HMM faults.
|
||||
mm = uvm_va_space_mm_retain_lock(va_space);
|
||||
va_block_context->mm = mm;
|
||||
uvm_va_block_context_init(va_block_context, mm);
|
||||
|
||||
uvm_va_space_down_read(va_space);
|
||||
|
||||
gpu_va_space = uvm_gpu_va_space_get_by_parent_gpu(va_space, gpu->parent);
|
||||
if (uvm_processor_mask_test_and_clear_atomic(&va_space->needs_fault_buffer_flush, gpu->id)) {
|
||||
status = fault_buffer_flush_locked(gpu,
|
||||
UVM_GPU_BUFFER_FLUSH_MODE_WAIT_UPDATE_PUT,
|
||||
UVM_FAULT_REPLAY_TYPE_START,
|
||||
batch_context);
|
||||
if (status == NV_OK)
|
||||
status = NV_WARN_MORE_PROCESSING_REQUIRED;
|
||||
|
||||
break;
|
||||
}
|
||||
|
||||
// The case where there is no valid GPU VA space for the GPU in this
|
||||
// VA space is handled next
|
||||
}
|
||||
|
||||
// Some faults could already be fatal if they cannot be handled by
// the UVM driver
if (current_entry->is_fatal) {
|
||||
++i;
|
||||
batch_context->has_fatal_faults = true;
|
||||
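// Only the first VA space seen with a fatal fault is recorded here. The
// cancel path re-services that VA space, and the flush/replay done afterwards
// lets fatal faults in other VA spaces be picked up by later batches.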
if (!batch_context->fatal_va_space)
|
||||
batch_context->fatal_va_space = va_space;
|
||||
|
||||
utlb->has_fatal_faults = true;
|
||||
UVM_ASSERT(utlb->num_pending_faults > 0);
|
||||
continue;
|
||||
}
|
||||
|
||||
if (!uvm_processor_mask_test(&va_space->registered_gpu_va_spaces, gpu->parent->id)) {
|
||||
if (!gpu_va_space) {
|
||||
// If there is no GPU VA space for the GPU, ignore the fault. This
|
||||
// can happen if a GPU VA space is destroyed without explicitly
|
||||
// freeing all memory ranges (destroying the VA range triggers a
|
||||
// flush of the fault buffer) and there are stale entries in the
|
||||
// freeing all memory ranges and there are stale entries in the
|
||||
// buffer that got fixed by the servicing in a previous batch.
|
||||
++i;
|
||||
continue;
|
||||
@@ -2033,15 +2222,17 @@ static NV_STATUS service_fault_batch(uvm_gpu_t *gpu,
|
||||
uvm_va_space_mm_release_unlock(va_space, mm);
|
||||
mm = NULL;
|
||||
va_space = NULL;
|
||||
status = NV_OK;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (status != NV_OK)
|
||||
goto fail;
|
||||
|
||||
i += block_faults;
|
||||
|
||||
// Don't issue replays in cancel mode
|
||||
if (replay_per_va_block && !batch_context->has_fatal_faults) {
|
||||
if (replay_per_va_block && !batch_context->fatal_va_space) {
|
||||
status = push_replay_on_gpu(gpu, UVM_FAULT_REPLAY_TYPE_START, batch_context);
|
||||
if (status != NV_OK)
|
||||
goto fail;
|
||||
@@ -2053,8 +2244,6 @@ static NV_STATUS service_fault_batch(uvm_gpu_t *gpu,
|
||||
}
|
||||
}
|
||||
|
||||
// Only clobber status if invalidate_status != NV_OK, since status may also
|
||||
// contain NV_WARN_MORE_PROCESSING_REQUIRED.
|
||||
if (va_space != NULL) {
|
||||
NV_STATUS invalidate_status = uvm_ats_invalidate_tlbs(gpu_va_space, ats_invalidate, &batch_context->tracker);
|
||||
if (invalidate_status != NV_OK)
|
||||
@@ -2262,77 +2451,48 @@ static NvU32 is_fatal_fault_in_buffer(uvm_fault_service_batch_context_t *batch_c
|
||||
return false;
|
||||
}
|
||||
|
||||
typedef enum
|
||||
{
|
||||
// Only cancel faults flagged as fatal
|
||||
FAULT_CANCEL_MODE_FATAL,
|
||||
|
||||
// Cancel all faults in the batch unconditionally
|
||||
FAULT_CANCEL_MODE_ALL,
|
||||
} fault_cancel_mode_t;
|
||||
|
||||
// Cancel faults in the given fault service batch context. The function provides
|
||||
// two different modes depending on the value of cancel_mode:
|
||||
// - If cancel_mode == FAULT_CANCEL_MODE_FATAL, only faults flagged as fatal
|
||||
// will be cancelled. In this case, the reason reported to tools is the one
|
||||
// contained in the fault entry itself.
|
||||
// - If cancel_mode == FAULT_CANCEL_MODE_ALL, all faults will be cancelled
|
||||
// unconditionally. In this case, the reason reported to tools for non-fatal
|
||||
// faults is the one passed to this function.
|
||||
static NV_STATUS cancel_faults_precise_va(uvm_gpu_t *gpu,
|
||||
uvm_fault_service_batch_context_t *batch_context,
|
||||
fault_cancel_mode_t cancel_mode,
|
||||
UvmEventFatalReason reason)
|
||||
// Cancel all faults in the given fault service batch context, even those not
|
||||
// marked as fatal.
|
||||
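// For entries that were not already fatal, the given reason is recorded in
// the entry and reported to tools.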
static NV_STATUS cancel_faults_all(uvm_gpu_t *gpu,
|
||||
uvm_fault_service_batch_context_t *batch_context,
|
||||
UvmEventFatalReason reason)
|
||||
{
|
||||
NV_STATUS status = NV_OK;
|
||||
NV_STATUS fault_status;
|
||||
uvm_va_space_t *va_space = NULL;
|
||||
NvU32 i;
|
||||
NvU32 i = 0;
|
||||
|
||||
UVM_ASSERT(gpu->parent->fault_cancel_va_supported);
|
||||
if (cancel_mode == FAULT_CANCEL_MODE_ALL)
|
||||
UVM_ASSERT(reason != UvmEventFatalReasonInvalid);
|
||||
UVM_ASSERT(reason != UvmEventFatalReasonInvalid);
|
||||
|
||||
for (i = 0; i < batch_context->num_coalesced_faults; ++i) {
|
||||
while (i < batch_context->num_coalesced_faults && status == NV_OK) {
|
||||
uvm_fault_buffer_entry_t *current_entry = batch_context->ordered_fault_cache[i];
|
||||
uvm_va_space_t *va_space = current_entry->va_space;
|
||||
bool skip_va_space;
|
||||
|
||||
UVM_ASSERT(current_entry->va_space);
|
||||
UVM_ASSERT(va_space);
|
||||
|
||||
if (current_entry->va_space != va_space) {
|
||||
// Fault on a different va_space, drop the lock of the old one...
|
||||
if (va_space != NULL)
|
||||
uvm_va_space_up_read(va_space);
|
||||
uvm_va_space_down_read(va_space);
|
||||
|
||||
va_space = current_entry->va_space;
|
||||
// If there is no GPU VA space for the GPU, ignore all faults in
|
||||
// that VA space. This can happen if the GPU VA space has been
|
||||
// destroyed since we unlocked the VA space in service_fault_batch.
|
||||
// Ignoring the fault avoids targeting a PDB that might have been
|
||||
// reused by another process.
|
||||
skip_va_space = !uvm_gpu_va_space_get_by_parent_gpu(va_space, gpu->parent);
|
||||
|
||||
// ... and take the lock of the new one
|
||||
uvm_va_space_down_read(va_space);
|
||||
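// Walk all coalesced faults belonging to this VA space in a single pass; the
// outer loop then resumes at the first entry of the next VA space.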
for (;
|
||||
i < batch_context->num_coalesced_faults && current_entry->va_space == va_space;
|
||||
current_entry = batch_context->ordered_fault_cache[++i]) {
|
||||
uvm_fault_cancel_va_mode_t cancel_va_mode;
|
||||
|
||||
// We don't need to check whether a buffer flush is required
|
||||
// (due to VA range destruction).
|
||||
// - For cancel_mode == FAULT_CANCEL_MODE_FATAL, once a fault is
|
||||
// flagged as fatal we need to cancel it, even if its VA range no
|
||||
// longer exists.
|
||||
// - For cancel_mode == FAULT_CANCEL_MODE_ALL we don't care about
|
||||
// any of this, we just want to trigger RC in RM.
|
||||
}
|
||||
if (skip_va_space)
|
||||
continue;
|
||||
|
||||
if (!uvm_processor_mask_test(&va_space->registered_gpu_va_spaces, gpu->parent->id)) {
|
||||
// If there is no GPU VA space for the GPU, ignore the fault.
|
||||
// This can happen if the GPU VA did not exist in
|
||||
// service_fault_batch(), or it was destroyed since then.
|
||||
// This is to avoid targeting a PDB that might have been reused
|
||||
// by another process.
|
||||
continue;
|
||||
}
|
||||
|
||||
// Cancel the fault
|
||||
if (cancel_mode == FAULT_CANCEL_MODE_ALL || current_entry->is_fatal) {
|
||||
uvm_fault_cancel_va_mode_t cancel_va_mode = current_entry->replayable.cancel_va_mode;
|
||||
|
||||
// If cancelling unconditionally and the fault was not fatal,
|
||||
// set the cancel reason passed to this function
|
||||
if (!current_entry->is_fatal) {
|
||||
if (current_entry->is_fatal) {
|
||||
UVM_ASSERT(current_entry->fatal_reason != UvmEventFatalReasonInvalid);
|
||||
cancel_va_mode = current_entry->replayable.cancel_va_mode;
|
||||
}
|
||||
else {
|
||||
current_entry->fatal_reason = reason;
|
||||
cancel_va_mode = UVM_FAULT_CANCEL_VA_MODE_ALL;
|
||||
}
|
||||
@@ -2341,17 +2501,13 @@ static NV_STATUS cancel_faults_precise_va(uvm_gpu_t *gpu,
|
||||
if (status != NV_OK)
|
||||
break;
|
||||
}
|
||||
|
||||
uvm_va_space_up_read(va_space);
|
||||
}
|
||||
|
||||
if (va_space != NULL)
|
||||
uvm_va_space_up_read(va_space);
|
||||
|
||||
// After cancelling the fatal faults, the fault buffer is flushed to remove
|
||||
// any potential duplicated fault that may have been added while processing
|
||||
// the faults in this batch. This flush also avoids doing unnecessary
|
||||
// processing after the fatal faults have been cancelled, so all the rest
|
||||
// are unlikely to remain after a replay because the context is probably in
|
||||
// the process of dying.
|
||||
// Because each cancel itself triggers a replay, there may be a large number
|
||||
// of new duplicated faults in the buffer after cancelling all the known
|
||||
// ones. Flushing the buffer discards them to avoid unnecessary processing.
|
||||
fault_status = fault_buffer_flush_locked(gpu,
|
||||
UVM_GPU_BUFFER_FLUSH_MODE_UPDATE_PUT,
|
||||
UVM_FAULT_REPLAY_TYPE_START,
|
||||
@@ -2399,12 +2555,12 @@ static void cancel_fault_batch(uvm_gpu_t *gpu,
|
||||
uvm_fault_service_batch_context_t *batch_context,
|
||||
UvmEventFatalReason reason)
|
||||
{
|
||||
if (gpu->parent->fault_cancel_va_supported) {
|
||||
cancel_faults_precise_va(gpu, batch_context, FAULT_CANCEL_MODE_ALL, reason);
|
||||
return;
|
||||
}
|
||||
|
||||
cancel_fault_batch_tlb(gpu, batch_context, reason);
|
||||
// Return code is ignored since we're on a global error path and wouldn't be
|
||||
// able to recover anyway.
|
||||
if (gpu->parent->fault_cancel_va_supported)
|
||||
cancel_faults_all(gpu, batch_context, reason);
|
||||
else
|
||||
cancel_fault_batch_tlb(gpu, batch_context, reason);
|
||||
}
@@ -2491,7 +2647,7 @@ static NV_STATUS cancel_faults_precise_tlb(uvm_gpu_t *gpu, uvm_fault_service_bat
|
||||
|
||||
batch_context->num_invalid_prefetch_faults = 0;
|
||||
batch_context->num_replays = 0;
|
||||
batch_context->has_fatal_faults = false;
|
||||
batch_context->fatal_va_space = NULL;
|
||||
batch_context->has_throttled_faults = false;
|
||||
|
||||
// 5) Fetch all faults from buffer
|
||||
@@ -2538,9 +2694,6 @@ static NV_STATUS cancel_faults_precise_tlb(uvm_gpu_t *gpu, uvm_fault_service_bat
|
||||
// 8) Service all non-fatal faults and mark all non-serviceable faults
|
||||
// as fatal
|
||||
status = service_fault_batch(gpu, FAULT_SERVICE_MODE_CANCEL, batch_context);
|
||||
if (status == NV_WARN_MORE_PROCESSING_REQUIRED)
|
||||
continue;
|
||||
|
||||
UVM_ASSERT(batch_context->num_replays == 0);
|
||||
if (status == NV_ERR_NO_MEMORY)
|
||||
continue;
|
||||
@@ -2548,7 +2701,7 @@ static NV_STATUS cancel_faults_precise_tlb(uvm_gpu_t *gpu, uvm_fault_service_bat
|
||||
break;
|
||||
|
||||
// No more fatal faults left; we are done
|
||||
if (!batch_context->has_fatal_faults)
|
||||
if (!batch_context->fatal_va_space)
|
||||
break;
|
||||
|
||||
// 9) Search for uTLBs that contain fatal faults and meet the
|
||||
@@ -2570,13 +2723,9 @@ static NV_STATUS cancel_faults_precise_tlb(uvm_gpu_t *gpu, uvm_fault_service_bat
|
||||
|
||||
static NV_STATUS cancel_faults_precise(uvm_gpu_t *gpu, uvm_fault_service_batch_context_t *batch_context)
|
||||
{
|
||||
UVM_ASSERT(batch_context->has_fatal_faults);
|
||||
if (gpu->parent->fault_cancel_va_supported) {
|
||||
return cancel_faults_precise_va(gpu,
|
||||
batch_context,
|
||||
FAULT_CANCEL_MODE_FATAL,
|
||||
UvmEventFatalReasonInvalid);
|
||||
}
|
||||
UVM_ASSERT(batch_context->fatal_va_space);
|
||||
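// On GPUs with VA-based cancel support, re-service the batch restricted to
// the VA space recorded in fatal_va_space and cancel its fatal faults
// precisely.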
if (gpu->parent->fault_cancel_va_supported)
|
||||
return service_fault_batch_for_cancel(gpu, batch_context);
|
||||
|
||||
return cancel_faults_precise_tlb(gpu, batch_context);
|
||||
}
|
||||
@@ -2632,7 +2781,7 @@ void uvm_gpu_service_replayable_faults(uvm_gpu_t *gpu)
|
||||
batch_context->num_invalid_prefetch_faults = 0;
|
||||
batch_context->num_duplicate_faults = 0;
|
||||
batch_context->num_replays = 0;
|
||||
batch_context->has_fatal_faults = false;
|
||||
batch_context->fatal_va_space = NULL;
|
||||
batch_context->has_throttled_faults = false;
|
||||
|
||||
status = fetch_fault_buffer_entries(gpu, batch_context, FAULT_FETCH_MODE_BATCH_READY);
|
||||
@@ -2660,9 +2809,6 @@ void uvm_gpu_service_replayable_faults(uvm_gpu_t *gpu)
|
||||
// was flushed
|
||||
num_replays += batch_context->num_replays;
|
||||
|
||||
if (status == NV_WARN_MORE_PROCESSING_REQUIRED)
|
||||
continue;
|
||||
|
||||
enable_disable_prefetch_faults(gpu->parent, batch_context);
|
||||
|
||||
if (status != NV_OK) {
|
||||
@@ -2676,10 +2822,17 @@ void uvm_gpu_service_replayable_faults(uvm_gpu_t *gpu)
|
||||
break;
|
||||
}
|
||||
|
||||
if (batch_context->has_fatal_faults) {
|
||||
if (batch_context->fatal_va_space) {
|
||||
status = uvm_tracker_wait(&batch_context->tracker);
|
||||
if (status == NV_OK)
|
||||
if (status == NV_OK) {
|
||||
status = cancel_faults_precise(gpu, batch_context);
|
||||
if (status == NV_OK) {
|
||||
// Cancel handling should've issued at least one replay
|
||||
UVM_ASSERT(batch_context->num_replays > 0);
|
||||
++num_batches;
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
break;
|
||||
}
|
||||
|
||||
@@ -579,8 +579,10 @@ static void uvm_gpu_semaphore_encrypted_payload_update(uvm_channel_t *channel, u
|
||||
void *auth_tag_cpu_addr = uvm_rm_mem_get_cpu_va(semaphore->conf_computing.auth_tag);
|
||||
NvU32 *gpu_notifier_cpu_addr = (NvU32 *)uvm_rm_mem_get_cpu_va(semaphore->conf_computing.notifier);
|
||||
NvU32 *payload_cpu_addr = (NvU32 *)uvm_rm_mem_get_cpu_va(semaphore->conf_computing.encrypted_payload);
|
||||
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);
|
||||
|
||||
UVM_ASSERT(uvm_channel_is_secure_ce(channel));
|
||||
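// The encrypted payload path only requires Confidential Computing to be
// enabled and a CE channel; it is no longer tied to a dedicated secure-CE
// channel type.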
UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));
|
||||
UVM_ASSERT(uvm_channel_is_ce(channel));
|
||||
|
||||
last_observed_notifier = semaphore->conf_computing.last_observed_notifier;
|
||||
gpu_notifier = UVM_READ_ONCE(*gpu_notifier_cpu_addr);
|
||||
|
||||
@@ -91,9 +91,9 @@ struct uvm_gpu_tracking_semaphore_struct
|
||||
// Create a semaphore pool for a GPU.
|
||||
NV_STATUS uvm_gpu_semaphore_pool_create(uvm_gpu_t *gpu, uvm_gpu_semaphore_pool_t **pool_out);
|
||||
|
||||
// When the Confidential Computing feature is enabled, pools associated with
|
||||
// secure CE channels are allocated in the CPR of vidmem and as such have
|
||||
// all the associated access restrictions. Because of this, they're called
|
||||
// When the Confidential Computing feature is enabled, semaphore pools
|
||||
// associated with CE channels are allocated in the CPR of vidmem and as such
|
||||
// have all the associated access restrictions. Because of this, they're called
|
||||
// secure pools and secure semaphores are allocated out of said secure pools.
|
||||
NV_STATUS uvm_gpu_semaphore_secure_pool_create(uvm_gpu_t *gpu, uvm_gpu_semaphore_pool_t **pool_out);
@@ -794,7 +794,7 @@ uvm_membar_t uvm_hal_downgrade_membar_type(uvm_gpu_t *gpu, bool is_local_vidmem)
|
||||
// memory, including those from other processors like the CPU or peer GPUs,
|
||||
// must come through this GPU's L2. In all current architectures, MEMBAR_GPU
|
||||
// is sufficient to resolve ordering at the L2 level.
|
||||
if (is_local_vidmem && !uvm_gpu_is_coherent(gpu->parent) && !uvm_downgrade_force_membar_sys)
|
||||
if (is_local_vidmem && !uvm_parent_gpu_is_coherent(gpu->parent) && !uvm_downgrade_force_membar_sys)
|
||||
return UVM_MEMBAR_GPU;
|
||||
|
||||
// If the mapped memory was remote, or if a coherence protocol can cache
|
||||
|
||||
@@ -60,6 +60,8 @@ module_param(uvm_disable_hmm, bool, 0444);
|
||||
#include "uvm_gpu.h"
|
||||
#include "uvm_pmm_gpu.h"
|
||||
#include "uvm_hal_types.h"
|
||||
#include "uvm_push.h"
|
||||
#include "uvm_hal.h"
|
||||
#include "uvm_va_block_types.h"
|
||||
#include "uvm_va_space_mm.h"
|
||||
#include "uvm_va_space.h"
|
||||
@@ -127,32 +129,111 @@ static uvm_va_block_t *hmm_va_block_from_node(uvm_range_tree_node_t *node)
|
||||
return container_of(node, uvm_va_block_t, hmm.node);
|
||||
}
|
||||
|
||||
NV_STATUS uvm_hmm_va_space_initialize(uvm_va_space_t *va_space)
|
||||
// Copies the contents of the source device-private page to the destination
// CPU page. This will invalidate mappings, so it cannot be called while
// holding any va_block locks.
static NV_STATUS uvm_hmm_copy_devmem_page(struct page *dst_page, struct page *src_page, uvm_tracker_t *tracker)
|
||||
{
|
||||
uvm_hmm_va_space_t *hmm_va_space = &va_space->hmm;
|
||||
struct mm_struct *mm = va_space->va_space_mm.mm;
|
||||
uvm_gpu_phys_address_t src_addr;
|
||||
uvm_gpu_phys_address_t dst_addr;
|
||||
uvm_gpu_chunk_t *gpu_chunk;
|
||||
NvU64 dma_addr;
|
||||
uvm_push_t push;
|
||||
NV_STATUS status = NV_OK;
|
||||
uvm_gpu_t *gpu;
|
||||
|
||||
// Holding a reference on the device-private page ensures the GPU is already
// retained. This is because when a GPU is unregistered, all device-private
// pages are migrated back to the CPU and freed before the GPU is released.
// Therefore, if we were able to take a reference on the page, the GPU must
// still be retained.
UVM_ASSERT(is_device_private_page(src_page) && page_count(src_page));
|
||||
gpu_chunk = uvm_pmm_devmem_page_to_chunk(src_page);
|
||||
gpu = uvm_gpu_chunk_get_gpu(gpu_chunk);
|
||||
status = uvm_mmu_chunk_map(gpu_chunk);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
status = uvm_gpu_map_cpu_pages(gpu->parent, dst_page, PAGE_SIZE, &dma_addr);
|
||||
if (status != NV_OK)
|
||||
goto out_unmap_gpu;
|
||||
|
||||
dst_addr = uvm_gpu_phys_address(UVM_APERTURE_SYS, dma_addr);
|
||||
src_addr = uvm_gpu_phys_address(UVM_APERTURE_VID, gpu_chunk->address);
|
||||
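// Push the copy on a GPU_TO_CPU CE channel: the source is the device-private
// vidmem chunk and the destination is the DMA-mapped system memory page.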
status = uvm_push_begin_acquire(gpu->channel_manager,
|
||||
UVM_CHANNEL_TYPE_GPU_TO_CPU,
|
||||
tracker,
|
||||
&push,
|
||||
"Copy for remote process fault");
|
||||
if (status != NV_OK)
|
||||
goto out_unmap_cpu;
|
||||
|
||||
gpu->parent->ce_hal->memcopy(&push,
|
||||
uvm_gpu_address_copy(gpu, dst_addr),
|
||||
uvm_gpu_address_copy(gpu, src_addr),
|
||||
PAGE_SIZE);
|
||||
uvm_push_end(&push);
|
||||
status = uvm_tracker_add_push_safe(tracker, &push);
|
||||
|
||||
out_unmap_cpu:
|
||||
uvm_gpu_unmap_cpu_pages(gpu->parent, dma_addr, PAGE_SIZE);
|
||||
|
||||
out_unmap_gpu:
|
||||
uvm_mmu_chunk_unmap(gpu_chunk, NULL);
|
||||
|
||||
return status;
|
||||
}
|
||||
|
||||
static NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn)
|
||||
{
|
||||
unsigned long src_pfn = 0;
|
||||
unsigned long dst_pfn = 0;
|
||||
struct page *dst_page;
|
||||
NV_STATUS status = NV_OK;
|
||||
int ret;
|
||||
|
||||
if (!uvm_hmm_is_enabled(va_space))
|
||||
return NV_OK;
|
||||
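// migrate_device_range() isolates the device-private page at this pfn and
// sets MIGRATE_PFN_MIGRATE in src_pfn if the page can be migrated.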
ret = migrate_device_range(&src_pfn, pfn, 1);
|
||||
if (ret)
|
||||
return errno_to_nv_status(ret);
|
||||
|
||||
uvm_assert_mmap_lock_locked_write(mm);
|
||||
uvm_assert_rwsem_locked_write(&va_space->lock);
|
||||
if (src_pfn & MIGRATE_PFN_MIGRATE) {
|
||||
uvm_tracker_t tracker = UVM_TRACKER_INIT();
|
||||
|
||||
dst_page = alloc_page(GFP_HIGHUSER_MOVABLE);
|
||||
if (!dst_page) {
|
||||
status = NV_ERR_NO_MEMORY;
|
||||
goto out;
|
||||
}
|
||||
|
||||
lock_page(dst_page);
|
||||
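// If the copy from vidmem fails there is no way to recover the contents, so
// zero-fill the destination page and let the migration complete anyway.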
if (WARN_ON(uvm_hmm_copy_devmem_page(dst_page, migrate_pfn_to_page(src_pfn), &tracker) != NV_OK))
|
||||
memzero_page(dst_page, 0, PAGE_SIZE);
|
||||
|
||||
dst_pfn = migrate_pfn(page_to_pfn(dst_page));
|
||||
migrate_device_pages(&src_pfn, &dst_pfn, 1);
|
||||
uvm_tracker_wait_deinit(&tracker);
|
||||
}
|
||||
|
||||
out:
|
||||
migrate_device_finalize(&src_pfn, &dst_pfn, 1);
|
||||
|
||||
if (!(src_pfn & MIGRATE_PFN_MIGRATE))
|
||||
status = NV_ERR_BUSY_RETRY;
|
||||
|
||||
return status;
|
||||
}
|
||||
|
||||
void uvm_hmm_va_space_initialize(uvm_va_space_t *va_space)
|
||||
{
|
||||
uvm_hmm_va_space_t *hmm_va_space = &va_space->hmm;
|
||||
|
||||
if (!uvm_hmm_is_enabled(va_space))
|
||||
return;
|
||||
|
||||
uvm_range_tree_init(&hmm_va_space->blocks);
|
||||
uvm_mutex_init(&hmm_va_space->blocks_lock, UVM_LOCK_ORDER_LEAF);
|
||||
|
||||
// Initialize MMU interval notifiers for this process. This allows
// mmu_interval_notifier_insert() to be called without holding the mmap_lock
// for write.
// Note: there is no __mmu_notifier_unregister(); this call just allocates
// memory which is attached to the mm_struct and freed when the mm_struct is
// freed.
ret = __mmu_notifier_register(NULL, mm);
|
||||
if (ret)
|
||||
return errno_to_nv_status(ret);
|
||||
|
||||
return NV_OK;
|
||||
return;
|
||||
}
|
||||
|
||||
void uvm_hmm_va_space_destroy(uvm_va_space_t *va_space)
|
||||
@@ -201,6 +282,9 @@ void uvm_hmm_unregister_gpu(uvm_va_space_t *va_space, uvm_gpu_t *gpu, struct mm_
|
||||
{
|
||||
uvm_range_tree_node_t *node;
|
||||
uvm_va_block_t *va_block;
|
||||
struct range range = gpu->pmm.devmem.pagemap.range;
|
||||
unsigned long pfn;
|
||||
bool retry;
|
||||
|
||||
if (!uvm_hmm_is_enabled(va_space))
|
||||
return;
|
||||
@@ -209,6 +293,29 @@ void uvm_hmm_unregister_gpu(uvm_va_space_t *va_space, uvm_gpu_t *gpu, struct mm_
|
||||
uvm_assert_mmap_lock_locked(mm);
|
||||
uvm_assert_rwsem_locked_write(&va_space->lock);
|
||||
|
||||
// There could be pages with page->zone_device_data pointing to this va_space,
// which may be about to be freed. Migrate those back to the CPU so we don't
// fault on them. Normally infinite retries are bad, but we don't have any
// other option here. Device-private pages can't be pinned, so migration
// should eventually succeed. Even if we did eventually bail out of the loop,
// we'd just stall in memunmap_pages() anyway.
do {
|
||||
retry = false;
|
||||
|
||||
for (pfn = __phys_to_pfn(range.start); pfn <= __phys_to_pfn(range.end); pfn++) {
|
||||
struct page *page = pfn_to_page(pfn);
|
||||
|
||||
UVM_ASSERT(is_device_private_page(page));
|
||||
|
||||
// This check is racy because nothing stops the page from being freed or even
// reused. That doesn't matter though: worst case the migration fails, and on
// the retry we find the va_space no longer matches.
if (page->zone_device_data == va_space)
|
||||
if (uvm_hmm_pmm_gpu_evict_pfn(pfn) != NV_OK)
|
||||
retry = true;
|
||||
}
|
||||
} while (retry);
|
||||
|
||||
uvm_range_tree_for_each(node, &va_space->hmm.blocks) {
|
||||
va_block = hmm_va_block_from_node(node);
|
||||
|
||||
@@ -325,7 +432,6 @@ static bool hmm_invalidate(uvm_va_block_t *va_block,
|
||||
region = uvm_va_block_region_from_start_end(va_block, start, end);
|
||||
|
||||
va_block_context->hmm.vma = NULL;
|
||||
va_block_context->policy = NULL;
|
||||
|
||||
// We only need to unmap GPUs since Linux handles the CPUs.
|
||||
for_each_gpu_id_in_mask(id, &va_block->mapped) {
|
||||
@@ -444,11 +550,11 @@ static void hmm_va_block_init(uvm_va_block_t *va_block,
|
||||
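// The VMA covering addr is now returned through vma_out instead of being
// cached in a uvm_va_block_context_t.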
static NV_STATUS hmm_va_block_find_create(uvm_va_space_t *va_space,
|
||||
NvU64 addr,
|
||||
bool allow_unreadable_vma,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct **vma_out,
|
||||
uvm_va_block_t **va_block_ptr)
|
||||
{
|
||||
struct mm_struct *mm = va_space->va_space_mm.mm;
|
||||
struct vm_area_struct *vma;
|
||||
struct mm_struct *mm;
|
||||
struct vm_area_struct *va_block_vma;
|
||||
uvm_va_block_t *va_block;
|
||||
NvU64 start, end;
|
||||
NV_STATUS status;
|
||||
@@ -457,15 +563,14 @@ static NV_STATUS hmm_va_block_find_create(uvm_va_space_t *va_space,
|
||||
if (!uvm_hmm_is_enabled(va_space))
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
UVM_ASSERT(mm);
|
||||
UVM_ASSERT(!va_block_context || va_block_context->mm == mm);
|
||||
mm = va_space->va_space_mm.mm;
|
||||
uvm_assert_mmap_lock_locked(mm);
|
||||
uvm_assert_rwsem_locked(&va_space->lock);
|
||||
UVM_ASSERT(PAGE_ALIGNED(addr));
|
||||
|
||||
// Note that we have to allow PROT_NONE VMAs so that policies can be set.
|
||||
vma = find_vma(mm, addr);
|
||||
if (!uvm_hmm_vma_is_valid(vma, addr, allow_unreadable_vma))
|
||||
va_block_vma = find_vma(mm, addr);
|
||||
if (!uvm_hmm_vma_is_valid(va_block_vma, addr, allow_unreadable_vma))
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
// Since we only hold the va_space read lock, there can be multiple
|
||||
@@ -517,8 +622,8 @@ static NV_STATUS hmm_va_block_find_create(uvm_va_space_t *va_space,
|
||||
|
||||
done:
|
||||
uvm_mutex_unlock(&va_space->hmm.blocks_lock);
|
||||
if (va_block_context)
|
||||
va_block_context->hmm.vma = vma;
|
||||
if (vma_out)
|
||||
*vma_out = va_block_vma;
|
||||
*va_block_ptr = va_block;
|
||||
return NV_OK;
|
||||
|
||||
@@ -532,43 +637,36 @@ err_unlock:
|
||||
|
||||
NV_STATUS uvm_hmm_va_block_find_create(uvm_va_space_t *va_space,
|
||||
NvU64 addr,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct **vma,
|
||||
uvm_va_block_t **va_block_ptr)
|
||||
{
|
||||
return hmm_va_block_find_create(va_space, addr, false, va_block_context, va_block_ptr);
|
||||
return hmm_va_block_find_create(va_space, addr, false, vma, va_block_ptr);
|
||||
}
|
||||
|
||||
NV_STATUS uvm_hmm_find_vma(uvm_va_block_context_t *va_block_context, NvU64 addr)
|
||||
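// Looks up the VMA containing addr in the given mm and returns it through
// vma_out; fails if no mm is available or the VMA is not usable by HMM.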
NV_STATUS uvm_hmm_find_vma(struct mm_struct *mm, struct vm_area_struct **vma_out, NvU64 addr)
|
||||
{
|
||||
struct mm_struct *mm = va_block_context->mm;
|
||||
struct vm_area_struct *vma;
|
||||
|
||||
if (!mm)
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
uvm_assert_mmap_lock_locked(mm);
|
||||
|
||||
vma = find_vma(mm, addr);
|
||||
if (!uvm_hmm_vma_is_valid(vma, addr, false))
|
||||
*vma_out = find_vma(mm, addr);
|
||||
if (!uvm_hmm_vma_is_valid(*vma_out, addr, false))
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
va_block_context->hmm.vma = vma;
|
||||
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
bool uvm_hmm_check_context_vma_is_valid(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
uvm_va_block_region_t region)
|
||||
{
|
||||
uvm_assert_mutex_locked(&va_block->lock);
|
||||
|
||||
if (uvm_va_block_is_hmm(va_block)) {
|
||||
struct vm_area_struct *vma = va_block_context->hmm.vma;
|
||||
|
||||
UVM_ASSERT(vma);
|
||||
UVM_ASSERT(va_block_context->mm == vma->vm_mm);
|
||||
uvm_assert_mmap_lock_locked(va_block_context->mm);
|
||||
UVM_ASSERT(va_block->hmm.va_space->va_space_mm.mm == vma->vm_mm);
|
||||
uvm_assert_mmap_lock_locked(va_block->hmm.va_space->va_space_mm.mm);
|
||||
UVM_ASSERT(vma->vm_start <= uvm_va_block_region_start(va_block, region));
|
||||
UVM_ASSERT(vma->vm_end > uvm_va_block_region_end(va_block, region));
|
||||
}
|
||||
@@ -579,7 +677,7 @@ bool uvm_hmm_check_context_vma_is_valid(uvm_va_block_t *va_block,
|
||||
void uvm_hmm_service_context_init(uvm_service_block_context_t *service_context)
|
||||
{
|
||||
// TODO: Bug 4050579: Remove this when swap cached pages can be migrated.
|
||||
service_context->block_context.hmm.swap_cached = false;
|
||||
service_context->block_context->hmm.swap_cached = false;
|
||||
}
|
||||
|
||||
NV_STATUS uvm_hmm_migrate_begin(uvm_va_block_t *va_block)
|
||||
@@ -619,8 +717,6 @@ static NV_STATUS hmm_migrate_range(uvm_va_block_t *va_block,
|
||||
uvm_mutex_lock(&va_block->lock);
|
||||
|
||||
uvm_for_each_va_policy_in(policy, va_block, start, end, node, region) {
|
||||
va_block_context->policy = policy;
|
||||
|
||||
// Even though UVM_VA_BLOCK_RETRY_LOCKED() may unlock and relock the
|
||||
// va_block lock, the policy remains valid because we hold the mmap
|
||||
// lock so munmap can't remove the policy, and the va_space lock so the
|
||||
@@ -644,48 +740,6 @@ static NV_STATUS hmm_migrate_range(uvm_va_block_t *va_block,
|
||||
return status;
|
||||
}
|
||||
|
||||
void uvm_hmm_evict_va_blocks(uvm_va_space_t *va_space)
|
||||
{
|
||||
// We can't use uvm_va_space_mm_retain(), because the va_space_mm
|
||||
// should already be dead by now.
|
||||
struct mm_struct *mm = va_space->va_space_mm.mm;
|
||||
uvm_hmm_va_space_t *hmm_va_space = &va_space->hmm;
|
||||
uvm_range_tree_node_t *node, *next;
|
||||
uvm_va_block_t *va_block;
|
||||
uvm_va_block_context_t *block_context;
|
||||
|
||||
uvm_down_read_mmap_lock(mm);
|
||||
uvm_va_space_down_write(va_space);
|
||||
|
||||
uvm_range_tree_for_each_safe(node, next, &hmm_va_space->blocks) {
|
||||
uvm_va_block_region_t region;
|
||||
struct vm_area_struct *vma;
|
||||
|
||||
va_block = hmm_va_block_from_node(node);
|
||||
block_context = uvm_va_space_block_context(va_space, mm);
|
||||
uvm_hmm_migrate_begin_wait(va_block);
|
||||
uvm_mutex_lock(&va_block->lock);
|
||||
for_each_va_block_vma_region(va_block, mm, vma, ®ion) {
|
||||
if (!uvm_hmm_vma_is_valid(vma, vma->vm_start, false))
|
||||
continue;
|
||||
|
||||
block_context->hmm.vma = vma;
|
||||
block_context->policy = &uvm_va_policy_default;
|
||||
uvm_hmm_va_block_migrate_locked(va_block,
|
||||
NULL,
|
||||
block_context,
|
||||
UVM_ID_CPU,
|
||||
region,
|
||||
UVM_MAKE_RESIDENT_CAUSE_API_MIGRATE);
|
||||
}
|
||||
uvm_mutex_unlock(&va_block->lock);
|
||||
uvm_hmm_migrate_finish(va_block);
|
||||
}
|
||||
|
||||
uvm_va_space_up_write(va_space);
|
||||
uvm_up_read_mmap_lock(mm);
|
||||
}
NV_STATUS uvm_hmm_test_va_block_inject_split_error(uvm_va_space_t *va_space, NvU64 addr)
|
||||
{
|
||||
uvm_va_block_test_t *block_test;
|
||||
@@ -1046,11 +1100,7 @@ static NV_STATUS hmm_set_preferred_location_locked(uvm_va_block_t *va_block,
|
||||
uvm_processor_mask_test(&old_policy->accessed_by, old_policy->preferred_location))
|
||||
uvm_processor_mask_set(&set_accessed_by_processors, old_policy->preferred_location);
|
||||
|
||||
va_block_context->policy = uvm_va_policy_set_preferred_location(va_block,
|
||||
region,
|
||||
preferred_location,
|
||||
old_policy);
|
||||
if (!va_block_context->policy)
|
||||
if (!uvm_va_policy_set_preferred_location(va_block, region, preferred_location, old_policy))
|
||||
return NV_ERR_NO_MEMORY;
|
||||
|
||||
// Establish new remote mappings if the old preferred location had
|
||||
@@ -1109,7 +1159,7 @@ NV_STATUS uvm_hmm_set_preferred_location(uvm_va_space_t *va_space,
|
||||
for (addr = base; addr < last_address; addr = va_block->end + 1) {
|
||||
NvU64 end;
|
||||
|
||||
status = hmm_va_block_find_create(va_space, addr, true, va_block_context, &va_block);
|
||||
status = hmm_va_block_find_create(va_space, addr, true, &va_block_context->hmm.vma, &va_block);
|
||||
if (status != NV_OK)
|
||||
break;
|
||||
|
||||
@@ -1151,7 +1201,6 @@ static NV_STATUS hmm_set_accessed_by_start_end_locked(uvm_va_block_t *va_block,
|
||||
if (uvm_va_policy_is_read_duplicate(&node->policy, va_space))
|
||||
continue;
|
||||
|
||||
va_block_context->policy = &node->policy;
|
||||
region = uvm_va_block_region_from_start_end(va_block,
|
||||
max(start, node->node.start),
|
||||
min(end, node->node.end));
|
||||
@@ -1196,7 +1245,7 @@ NV_STATUS uvm_hmm_set_accessed_by(uvm_va_space_t *va_space,
|
||||
for (addr = base; addr < last_address; addr = va_block->end + 1) {
|
||||
NvU64 end;
|
||||
|
||||
status = hmm_va_block_find_create(va_space, addr, true, va_block_context, &va_block);
|
||||
status = hmm_va_block_find_create(va_space, addr, true, &va_block_context->hmm.vma, &va_block);
|
||||
if (status != NV_OK)
|
||||
break;
|
||||
|
||||
@@ -1249,8 +1298,6 @@ void uvm_hmm_block_add_eviction_mappings(uvm_va_space_t *va_space,
|
||||
uvm_mutex_lock(&va_block->lock);
|
||||
|
||||
uvm_for_each_va_policy_node_in(node, va_block, va_block->start, va_block->end) {
|
||||
block_context->policy = &node->policy;
|
||||
|
||||
for_each_id_in_mask(id, &node->policy.accessed_by) {
|
||||
status = hmm_set_accessed_by_start_end_locked(va_block,
|
||||
block_context,
|
||||
@@ -1309,13 +1356,13 @@ void uvm_hmm_block_add_eviction_mappings(uvm_va_space_t *va_space,
|
||||
}
|
||||
}
|
||||
|
||||
void uvm_hmm_find_policy_end(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
unsigned long addr,
|
||||
NvU64 *endp)
|
||||
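// Returns the policy covering addr and clamps *endp to the end of that
// policy's range, instead of stashing the policy in the block context.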
const uvm_va_policy_t *uvm_hmm_find_policy_end(uvm_va_block_t *va_block,
|
||||
struct vm_area_struct *vma,
|
||||
unsigned long addr,
|
||||
NvU64 *endp)
|
||||
{
|
||||
struct vm_area_struct *vma = va_block_context->hmm.vma;
|
||||
const uvm_va_policy_node_t *node;
|
||||
const uvm_va_policy_t *policy;
|
||||
NvU64 end = va_block->end;
|
||||
|
||||
uvm_assert_mmap_lock_locked(vma->vm_mm);
|
||||
@@ -1326,40 +1373,45 @@ void uvm_hmm_find_policy_end(uvm_va_block_t *va_block,
|
||||
|
||||
node = uvm_va_policy_node_find(va_block, addr);
|
||||
if (node) {
|
||||
va_block_context->policy = &node->policy;
|
||||
policy = &node->policy;
|
||||
if (end > node->node.end)
|
||||
end = node->node.end;
|
||||
}
|
||||
else {
|
||||
va_block_context->policy = &uvm_va_policy_default;
|
||||
policy = &uvm_va_policy_default;
|
||||
}
|
||||
|
||||
*endp = end;
|
||||
|
||||
return policy;
|
||||
}
|
||||
|
||||
NV_STATUS uvm_hmm_find_policy_vma_and_outer(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct **vma_out,
|
||||
uvm_page_index_t page_index,
|
||||
const uvm_va_policy_t **policy,
|
||||
uvm_page_index_t *outerp)
|
||||
{
|
||||
struct vm_area_struct *vma;
|
||||
unsigned long addr;
|
||||
NvU64 end;
|
||||
uvm_page_index_t outer;
|
||||
uvm_va_space_t *va_space = uvm_va_block_get_va_space(va_block);
|
||||
struct mm_struct *mm = va_space->va_space_mm.mm;
|
||||
|
||||
if (!mm)
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
UVM_ASSERT(uvm_va_block_is_hmm(va_block));
|
||||
uvm_assert_mmap_lock_locked(va_block_context->mm);
|
||||
uvm_assert_mmap_lock_locked(mm);
|
||||
uvm_assert_mutex_locked(&va_block->lock);
|
||||
|
||||
addr = uvm_va_block_cpu_page_address(va_block, page_index);
|
||||
|
||||
vma = vma_lookup(va_block_context->mm, addr);
|
||||
if (!vma || !(vma->vm_flags & VM_READ))
|
||||
*vma_out = vma_lookup(mm, addr);
|
||||
if (!*vma_out || !((*vma_out)->vm_flags & VM_READ))
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
va_block_context->hmm.vma = vma;
|
||||
|
||||
uvm_hmm_find_policy_end(va_block, va_block_context, addr, &end);
|
||||
*policy = uvm_hmm_find_policy_end(va_block, *vma_out, addr, &end);
|
||||
|
||||
outer = uvm_va_block_cpu_page_index(va_block, end) + 1;
|
||||
if (*outerp > outer)
|
||||
@@ -1379,8 +1431,6 @@ static NV_STATUS hmm_clear_thrashing_policy(uvm_va_block_t *va_block,
|
||||
uvm_mutex_lock(&va_block->lock);
|
||||
|
||||
uvm_for_each_va_policy_in(policy, va_block, va_block->start, va_block->end, node, region) {
|
||||
block_context->policy = policy;
|
||||
|
||||
// Unmap may split PTEs and require a retry. Needs to be called
|
||||
// before the pinned pages information is destroyed.
|
||||
status = UVM_VA_BLOCK_RETRY_LOCKED(va_block,
|
||||
@@ -1424,11 +1474,10 @@ NV_STATUS uvm_hmm_clear_thrashing_policy(uvm_va_space_t *va_space)
|
||||
}
|
||||
|
||||
uvm_va_block_region_t uvm_hmm_get_prefetch_region(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
const uvm_va_policy_t *policy,
|
||||
NvU64 address)
|
||||
{
|
||||
struct vm_area_struct *vma = va_block_context->hmm.vma;
|
||||
const uvm_va_policy_t *policy = va_block_context->policy;
|
||||
NvU64 start, end;
|
||||
|
||||
UVM_ASSERT(uvm_va_block_is_hmm(va_block));
|
||||
@@ -1457,13 +1506,11 @@ uvm_va_block_region_t uvm_hmm_get_prefetch_region(uvm_va_block_t *va_block,
|
||||
}
|
||||
|
||||
uvm_prot_t uvm_hmm_compute_logical_prot(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
NvU64 addr)
|
||||
{
|
||||
struct vm_area_struct *vma = va_block_context->hmm.vma;
|
||||
|
||||
UVM_ASSERT(uvm_va_block_is_hmm(va_block));
|
||||
uvm_assert_mmap_lock_locked(va_block_context->mm);
|
||||
uvm_assert_mmap_lock_locked(va_block->hmm.va_space->va_space_mm.mm);
|
||||
UVM_ASSERT(vma && addr >= vma->vm_start && addr < vma->vm_end);
|
||||
|
||||
if (!(vma->vm_flags & VM_READ))
|
||||
@@ -1497,40 +1544,59 @@ static NV_STATUS hmm_va_block_cpu_page_populate(uvm_va_block_t *va_block,
|
||||
return status;
|
||||
}
|
||||
|
||||
status = uvm_va_block_map_cpu_chunk_on_gpus(va_block, page_index);
|
||||
status = uvm_va_block_map_cpu_chunk_on_gpus(va_block, chunk, page_index);
|
||||
if (status != NV_OK) {
|
||||
uvm_cpu_chunk_remove_from_block(va_block, page_index);
|
||||
uvm_cpu_chunk_remove_from_block(va_block, page_to_nid(page), page_index);
|
||||
uvm_cpu_chunk_free(chunk);
|
||||
}
|
||||
|
||||
return status;
|
||||
}
|
||||
|
||||
static void hmm_va_block_cpu_page_unpopulate(uvm_va_block_t *va_block,
|
||||
uvm_page_index_t page_index)
|
||||
static void hmm_va_block_cpu_unpopulate_chunk(uvm_va_block_t *va_block,
|
||||
uvm_cpu_chunk_t *chunk,
|
||||
int chunk_nid,
|
||||
uvm_page_index_t page_index)
|
||||
{
|
||||
uvm_cpu_chunk_t *chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, page_index);
|
||||
|
||||
UVM_ASSERT(uvm_va_block_is_hmm(va_block));
|
||||
|
||||
if (!chunk)
|
||||
return;
|
||||
|
||||
UVM_ASSERT(!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
|
||||
!uvm_page_mask_test(&va_block->cpu.resident, page_index));
|
||||
!uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
|
||||
UVM_ASSERT(uvm_cpu_chunk_get_size(chunk) == PAGE_SIZE);
|
||||
|
||||
uvm_cpu_chunk_remove_from_block(va_block, page_index);
|
||||
uvm_cpu_chunk_remove_from_block(va_block, chunk_nid, page_index);
|
||||
uvm_va_block_unmap_cpu_chunk_on_gpus(va_block, chunk, page_index);
|
||||
uvm_cpu_chunk_free(chunk);
|
||||
}
|
||||
|
||||
static void hmm_va_block_cpu_page_unpopulate(uvm_va_block_t *va_block, uvm_page_index_t page_index, struct page *page)
|
||||
{
|
||||
uvm_cpu_chunk_t *chunk;
|
||||
|
||||
UVM_ASSERT(uvm_va_block_is_hmm(va_block));
|
||||
|
||||
if (page) {
|
||||
chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, page_to_nid(page), page_index);
|
||||
hmm_va_block_cpu_unpopulate_chunk(va_block, chunk, page_to_nid(page), page_index);
|
||||
}
|
||||
else {
|
||||
int nid;
|
||||
|
||||
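// Without a page pointer the backing NUMA node is unknown, so check every
// possible node for a chunk at this page index.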
for_each_possible_uvm_node(nid) {
|
||||
chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, nid, page_index);
|
||||
hmm_va_block_cpu_unpopulate_chunk(va_block, chunk, nid, page_index);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static bool hmm_va_block_cpu_page_is_same(uvm_va_block_t *va_block,
|
||||
uvm_page_index_t page_index,
|
||||
struct page *page)
|
||||
{
|
||||
struct page *old_page = uvm_cpu_chunk_get_cpu_page(va_block, page_index);
|
||||
struct page *old_page = uvm_va_block_get_cpu_page(va_block, page_index);
|
||||
|
||||
UVM_ASSERT(uvm_cpu_chunk_is_hmm(uvm_cpu_chunk_get_chunk_for_page(va_block, page_index)));
|
||||
UVM_ASSERT(uvm_cpu_chunk_is_hmm(uvm_cpu_chunk_get_chunk_for_page(va_block, page_to_nid(page), page_index)));
|
||||
return old_page == page;
|
||||
}
|
||||
|
||||
@@ -1543,7 +1609,7 @@ static void clear_service_context_masks(uvm_service_block_context_t *service_con
|
||||
uvm_processor_id_t new_residency,
|
||||
uvm_page_index_t page_index)
|
||||
{
|
||||
uvm_page_mask_clear(&service_context->block_context.caller_page_mask, page_index);
|
||||
uvm_page_mask_clear(&service_context->block_context->caller_page_mask, page_index);
|
||||
|
||||
uvm_page_mask_clear(&service_context->per_processor_masks[uvm_id_value(new_residency)].new_residency,
|
||||
page_index);
|
||||
@@ -1570,7 +1636,6 @@ static void cpu_mapping_set(uvm_va_block_t *va_block,
|
||||
uvm_page_index_t page_index)
|
||||
{
|
||||
uvm_processor_mask_set(&va_block->mapped, UVM_ID_CPU);
|
||||
uvm_page_mask_set(&va_block->maybe_mapped_pages, page_index);
|
||||
uvm_page_mask_set(&va_block->cpu.pte_bits[UVM_PTE_BITS_CPU_READ], page_index);
|
||||
if (is_write)
|
||||
uvm_page_mask_set(&va_block->cpu.pte_bits[UVM_PTE_BITS_CPU_WRITE], page_index);
|
||||
@@ -1720,7 +1785,7 @@ static NV_STATUS sync_page_and_chunk_state(uvm_va_block_t *va_block,
|
||||
// migrate_vma_finalize() will release the reference so we should
|
||||
// clear our pointer to it.
|
||||
// TODO: Bug 3660922: Need to handle read duplication at some point.
|
||||
hmm_va_block_cpu_page_unpopulate(va_block, page_index);
|
||||
hmm_va_block_cpu_page_unpopulate(va_block, page_index, page);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1746,7 +1811,7 @@ static void clean_up_non_migrating_page(uvm_va_block_t *va_block,
|
||||
else {
|
||||
UVM_ASSERT(page_ref_count(dst_page) == 1);
|
||||
|
||||
hmm_va_block_cpu_page_unpopulate(va_block, page_index);
|
||||
hmm_va_block_cpu_page_unpopulate(va_block, page_index, dst_page);
|
||||
}
|
||||
|
||||
unlock_page(dst_page);
|
||||
@@ -1781,7 +1846,7 @@ static void lock_block_cpu_page(uvm_va_block_t *va_block,
|
||||
unsigned long *dst_pfns,
|
||||
uvm_page_mask_t *same_devmem_page_mask)
|
||||
{
|
||||
uvm_cpu_chunk_t *chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, page_index);
|
||||
uvm_cpu_chunk_t *chunk = uvm_cpu_chunk_get_chunk_for_page(va_block, page_to_nid(src_page), page_index);
|
||||
uvm_va_block_region_t chunk_region;
|
||||
struct page *dst_page;
|
||||
|
||||
@@ -1807,7 +1872,7 @@ static void lock_block_cpu_page(uvm_va_block_t *va_block,
|
||||
// hmm_va_block_cpu_page_unpopulate() or block_kill(). If the page
|
||||
// does not migrate, it will be freed though.
|
||||
UVM_ASSERT(!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
|
||||
!uvm_page_mask_test(&va_block->cpu.resident, page_index));
|
||||
!uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
|
||||
UVM_ASSERT(chunk->type == UVM_CPU_CHUNK_TYPE_PHYSICAL);
|
||||
UVM_ASSERT(page_ref_count(dst_page) == 1);
|
||||
uvm_cpu_chunk_make_hmm(chunk);
|
||||
@@ -1955,7 +2020,7 @@ static NV_STATUS alloc_and_copy_to_cpu(uvm_va_block_t *va_block,
|
||||
}
|
||||
|
||||
UVM_ASSERT(!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
|
||||
!uvm_page_mask_test(&va_block->cpu.resident, page_index));
|
||||
!uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
|
||||
|
||||
// Allocate a user system memory page for the destination.
|
||||
// This is the typical case since Linux will free the source page when
|
||||
@@ -2033,8 +2098,8 @@ static NV_STATUS uvm_hmm_devmem_fault_alloc_and_copy(uvm_hmm_devmem_fault_contex
|
||||
service_context = devmem_fault_context->service_context;
|
||||
va_block_retry = devmem_fault_context->va_block_retry;
|
||||
va_block = devmem_fault_context->va_block;
|
||||
src_pfns = service_context->block_context.hmm.src_pfns;
|
||||
dst_pfns = service_context->block_context.hmm.dst_pfns;
|
||||
src_pfns = service_context->block_context->hmm.src_pfns;
|
||||
dst_pfns = service_context->block_context->hmm.dst_pfns;
|
||||
|
||||
// Build the migration page mask.
|
||||
// Note that thrashing pinned pages and prefetch pages are already
|
||||
@@ -2043,7 +2108,7 @@ static NV_STATUS uvm_hmm_devmem_fault_alloc_and_copy(uvm_hmm_devmem_fault_contex
|
||||
uvm_page_mask_copy(page_mask, &service_context->per_processor_masks[UVM_ID_CPU_VALUE].new_residency);
|
||||
|
||||
status = alloc_and_copy_to_cpu(va_block,
|
||||
service_context->block_context.hmm.vma,
|
||||
service_context->block_context->hmm.vma,
|
||||
src_pfns,
|
||||
dst_pfns,
|
||||
service_context->region,
|
||||
@@ -2078,8 +2143,8 @@ static NV_STATUS uvm_hmm_devmem_fault_finalize_and_map(uvm_hmm_devmem_fault_cont
|
||||
prefetch_hint = &service_context->prefetch_hint;
|
||||
va_block = devmem_fault_context->va_block;
|
||||
va_block_retry = devmem_fault_context->va_block_retry;
|
||||
src_pfns = service_context->block_context.hmm.src_pfns;
|
||||
dst_pfns = service_context->block_context.hmm.dst_pfns;
|
||||
src_pfns = service_context->block_context->hmm.src_pfns;
|
||||
dst_pfns = service_context->block_context->hmm.dst_pfns;
|
||||
region = service_context->region;
|
||||
|
||||
page_mask = &devmem_fault_context->page_mask;
|
||||
@@ -2186,8 +2251,7 @@ static NV_STATUS populate_region(uvm_va_block_t *va_block,
|
||||
|
||||
// Since we have a stable snapshot of the CPU pages, we can
|
||||
// update the residency and protection information.
|
||||
uvm_processor_mask_set(&va_block->resident, UVM_ID_CPU);
|
||||
uvm_page_mask_set(&va_block->cpu.resident, page_index);
|
||||
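// CPU residency is now tracked per NUMA node, so record the node that
// actually backs this page.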
uvm_va_block_cpu_set_resident_page(va_block, page_to_nid(page), page_index);
|
||||
|
||||
cpu_mapping_set(va_block, pfns[page_index] & HMM_PFN_WRITE, page_index);
|
||||
}
|
||||
@@ -2274,7 +2338,7 @@ static void hmm_release_atomic_pages(uvm_va_block_t *va_block,
|
||||
uvm_page_index_t page_index;
|
||||
|
||||
for_each_va_block_page_in_region(page_index, region) {
|
||||
struct page *page = service_context->block_context.hmm.pages[page_index];
|
||||
struct page *page = service_context->block_context->hmm.pages[page_index];
|
||||
|
||||
if (!page)
|
||||
continue;
|
||||
@@ -2290,14 +2354,14 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
|
||||
uvm_service_block_context_t *service_context)
|
||||
{
|
||||
uvm_va_block_region_t region = service_context->region;
|
||||
struct page **pages = service_context->block_context.hmm.pages;
|
||||
struct page **pages = service_context->block_context->hmm.pages;
|
||||
int npages;
|
||||
uvm_page_index_t page_index;
|
||||
uvm_make_resident_cause_t cause;
|
||||
NV_STATUS status;
|
||||
|
||||
if (!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
|
||||
!uvm_page_mask_region_full(&va_block->cpu.resident, region)) {
|
||||
!uvm_va_block_cpu_is_region_resident_on(va_block, NUMA_NO_NODE, region)) {
|
||||
// There is an atomic GPU fault. We need to make sure no pages are
|
||||
// GPU resident so that make_device_exclusive_range() doesn't call
|
||||
// migrate_to_ram() and cause a va_space lock recursion problem.
|
||||
@@ -2310,7 +2374,7 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
|
||||
|
||||
status = uvm_hmm_va_block_migrate_locked(va_block,
|
||||
va_block_retry,
|
||||
&service_context->block_context,
|
||||
service_context->block_context,
|
||||
UVM_ID_CPU,
|
||||
region,
|
||||
cause);
|
||||
@@ -2320,7 +2384,7 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
|
||||
// make_device_exclusive_range() will try to call migrate_to_ram()
|
||||
// and deadlock with ourself if the data isn't CPU resident.
|
||||
if (!uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU) ||
|
||||
!uvm_page_mask_region_full(&va_block->cpu.resident, region)) {
|
||||
!uvm_va_block_cpu_is_region_resident_on(va_block, NUMA_NO_NODE, region)) {
|
||||
status = NV_WARN_MORE_PROCESSING_REQUIRED;
|
||||
goto done;
|
||||
}
|
||||
@@ -2330,7 +2394,7 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
|
||||
// mmap() files so we check for that here and report a fatal fault.
|
||||
// Otherwise with the current Linux 6.1 make_device_exclusive_range(),
|
||||
// it doesn't make the page exclusive and we end up in an endless loop.
|
||||
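// VM_HUGETLB mappings are rejected as well, presumably because
// make_device_exclusive_range() cannot make such pages exclusive either.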
if (service_context->block_context.hmm.vma->vm_flags & VM_SHARED) {
|
||||
if (service_context->block_context->hmm.vma->vm_flags & (VM_SHARED | VM_HUGETLB)) {
|
||||
status = NV_ERR_NOT_SUPPORTED;
|
||||
goto done;
|
||||
}
|
||||
@@ -2339,7 +2403,7 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
|
||||
|
||||
uvm_mutex_unlock(&va_block->lock);
|
||||
|
||||
npages = make_device_exclusive_range(service_context->block_context.mm,
|
||||
npages = make_device_exclusive_range(service_context->block_context->mm,
|
||||
uvm_va_block_cpu_page_address(va_block, region.first),
|
||||
uvm_va_block_cpu_page_address(va_block, region.outer - 1) + PAGE_SIZE,
|
||||
pages + region.first,
|
||||
@@ -2377,15 +2441,13 @@ static NV_STATUS hmm_block_atomic_fault_locked(uvm_processor_id_t processor_id,
|
||||
if (uvm_page_mask_test(&va_block->cpu.allocated, page_index)) {
|
||||
UVM_ASSERT(hmm_va_block_cpu_page_is_same(va_block, page_index, page));
|
||||
UVM_ASSERT(uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU));
|
||||
UVM_ASSERT(uvm_page_mask_test(&va_block->cpu.resident, page_index));
|
||||
UVM_ASSERT(uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
|
||||
}
|
||||
else {
|
||||
NV_STATUS s = hmm_va_block_cpu_page_populate(va_block, page_index, page);
|
||||
|
||||
if (s == NV_OK) {
|
||||
uvm_processor_mask_set(&va_block->resident, UVM_ID_CPU);
|
||||
uvm_page_mask_set(&va_block->cpu.resident, page_index);
|
||||
}
|
||||
if (s == NV_OK)
|
||||
uvm_va_block_cpu_set_resident_page(va_block, page_to_nid(page), page_index);
|
||||
}
|
||||
|
||||
cpu_mapping_clear(va_block, page_index);
|
||||
@@ -2440,7 +2502,7 @@ static NV_STATUS hmm_block_cpu_fault_locked(uvm_processor_id_t processor_id,
|
||||
uvm_service_block_context_t *service_context)
|
||||
{
|
||||
uvm_va_block_region_t region = service_context->region;
|
||||
struct migrate_vma *args = &service_context->block_context.hmm.migrate_vma_args;
|
||||
struct migrate_vma *args = &service_context->block_context->hmm.migrate_vma_args;
|
||||
NV_STATUS status;
|
||||
int ret;
|
||||
uvm_hmm_devmem_fault_context_t fault_context = {
|
||||
@@ -2474,8 +2536,8 @@ static NV_STATUS hmm_block_cpu_fault_locked(uvm_processor_id_t processor_id,
|
||||
}
|
||||
|
||||
status = hmm_make_resident_cpu(va_block,
|
||||
service_context->block_context.hmm.vma,
|
||||
service_context->block_context.hmm.src_pfns,
|
||||
service_context->block_context->hmm.vma,
|
||||
service_context->block_context->hmm.src_pfns,
|
||||
region,
|
||||
service_context->access_type,
|
||||
&fault_context.same_devmem_page_mask);
|
||||
@@ -2497,9 +2559,9 @@ static NV_STATUS hmm_block_cpu_fault_locked(uvm_processor_id_t processor_id,
|
||||
}
|
||||
}
|
||||
|
||||
args->vma = service_context->block_context.hmm.vma;
|
||||
args->src = service_context->block_context.hmm.src_pfns + region.first;
|
||||
args->dst = service_context->block_context.hmm.dst_pfns + region.first;
|
||||
args->vma = service_context->block_context->hmm.vma;
|
||||
args->src = service_context->block_context->hmm.src_pfns + region.first;
|
||||
args->dst = service_context->block_context->hmm.dst_pfns + region.first;
|
||||
args->start = uvm_va_block_region_start(va_block, region);
|
||||
args->end = uvm_va_block_region_end(va_block, region) + 1;
|
||||
args->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
|
||||
@@ -2579,7 +2641,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
|
||||
// TODO: Bug 4050579: Remove this when swap cached pages can be
|
||||
// migrated.
|
||||
if (service_context) {
|
||||
service_context->block_context.hmm.swap_cached = true;
|
||||
service_context->block_context->hmm.swap_cached = true;
|
||||
break;
|
||||
}
|
||||
|
||||
@@ -2595,7 +2657,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
|
||||
if (uvm_page_mask_test(&va_block->cpu.allocated, page_index)) {
|
||||
UVM_ASSERT(hmm_va_block_cpu_page_is_same(va_block, page_index, src_page));
|
||||
UVM_ASSERT(uvm_processor_mask_test(&va_block->resident, UVM_ID_CPU));
|
||||
UVM_ASSERT(uvm_page_mask_test(&va_block->cpu.resident, page_index));
|
||||
UVM_ASSERT(uvm_va_block_cpu_is_page_resident_on(va_block, NUMA_NO_NODE, page_index));
|
||||
}
|
||||
else {
|
||||
status = hmm_va_block_cpu_page_populate(va_block, page_index, src_page);
|
||||
@@ -2609,8 +2671,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
|
||||
|
||||
// migrate_vma_setup() was able to isolate and lock the page;
|
||||
// therefore, it is CPU resident and not mapped.
|
||||
uvm_processor_mask_set(&va_block->resident, UVM_ID_CPU);
|
||||
uvm_page_mask_set(&va_block->cpu.resident, page_index);
|
||||
uvm_va_block_cpu_set_resident_page(va_block, page_to_nid(src_page), page_index);
|
||||
}
|
||||
|
||||
// The call to migrate_vma_setup() will have inserted a migration
|
||||
@@ -2625,7 +2686,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
|
||||
if (uvm_page_mask_test(&va_block->cpu.allocated, page_index)) {
|
||||
UVM_ASSERT(!uvm_va_block_page_resident_processors_count(va_block, page_index));
|
||||
|
||||
hmm_va_block_cpu_page_unpopulate(va_block, page_index);
|
||||
hmm_va_block_cpu_page_unpopulate(va_block, page_index, NULL);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -2639,7 +2700,7 @@ static NV_STATUS dmamap_src_sysmem_pages(uvm_va_block_t *va_block,
|
||||
}
|
||||
|
||||
if (uvm_page_mask_empty(page_mask) ||
|
||||
(service_context && service_context->block_context.hmm.swap_cached))
|
||||
(service_context && service_context->block_context->hmm.swap_cached))
|
||||
status = NV_WARN_MORE_PROCESSING_REQUIRED;
|
||||
|
||||
if (status != NV_OK)
|
||||
@@ -2670,8 +2731,8 @@ static NV_STATUS uvm_hmm_gpu_fault_alloc_and_copy(struct vm_area_struct *vma,
|
||||
service_context = uvm_hmm_gpu_fault_event->service_context;
|
||||
region = service_context->region;
|
||||
prefetch_hint = &service_context->prefetch_hint;
|
||||
src_pfns = service_context->block_context.hmm.src_pfns;
|
||||
dst_pfns = service_context->block_context.hmm.dst_pfns;
|
||||
src_pfns = service_context->block_context->hmm.src_pfns;
|
||||
dst_pfns = service_context->block_context->hmm.dst_pfns;
|
||||
|
||||
// Build the migration mask.
|
||||
// Note that thrashing pinned pages are already accounted for in
|
||||
@@ -2729,8 +2790,8 @@ static NV_STATUS uvm_hmm_gpu_fault_finalize_and_map(uvm_hmm_gpu_fault_event_t *u
|
||||
va_block = uvm_hmm_gpu_fault_event->va_block;
|
||||
va_block_retry = uvm_hmm_gpu_fault_event->va_block_retry;
|
||||
service_context = uvm_hmm_gpu_fault_event->service_context;
|
||||
src_pfns = service_context->block_context.hmm.src_pfns;
|
||||
dst_pfns = service_context->block_context.hmm.dst_pfns;
|
||||
src_pfns = service_context->block_context->hmm.src_pfns;
|
||||
dst_pfns = service_context->block_context->hmm.dst_pfns;
|
||||
region = service_context->region;
|
||||
page_mask = &uvm_hmm_gpu_fault_event->page_mask;
|
||||
|
||||
@@ -2773,11 +2834,11 @@ NV_STATUS uvm_hmm_va_block_service_locked(uvm_processor_id_t processor_id,
|
||||
uvm_va_block_retry_t *va_block_retry,
|
||||
uvm_service_block_context_t *service_context)
|
||||
{
|
||||
struct mm_struct *mm = service_context->block_context.mm;
|
||||
struct vm_area_struct *vma = service_context->block_context.hmm.vma;
|
||||
struct mm_struct *mm = service_context->block_context->mm;
|
||||
struct vm_area_struct *vma = service_context->block_context->hmm.vma;
|
||||
uvm_va_block_region_t region = service_context->region;
|
||||
uvm_hmm_gpu_fault_event_t uvm_hmm_gpu_fault_event;
|
||||
struct migrate_vma *args = &service_context->block_context.hmm.migrate_vma_args;
|
||||
struct migrate_vma *args = &service_context->block_context->hmm.migrate_vma_args;
|
||||
int ret;
|
||||
NV_STATUS status = NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
@@ -2801,8 +2862,8 @@ NV_STATUS uvm_hmm_va_block_service_locked(uvm_processor_id_t processor_id,
|
||||
uvm_hmm_gpu_fault_event.service_context = service_context;
|
||||
|
||||
args->vma = vma;
|
||||
args->src = service_context->block_context.hmm.src_pfns + region.first;
|
||||
args->dst = service_context->block_context.hmm.dst_pfns + region.first;
|
||||
args->src = service_context->block_context->hmm.src_pfns + region.first;
|
||||
args->dst = service_context->block_context->hmm.dst_pfns + region.first;
|
||||
args->start = uvm_va_block_region_start(va_block, region);
|
||||
args->end = uvm_va_block_region_end(va_block, region) + 1;
|
||||
args->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE | MIGRATE_VMA_SELECT_SYSTEM;
|
||||
@@ -2836,8 +2897,8 @@ NV_STATUS uvm_hmm_va_block_service_locked(uvm_processor_id_t processor_id,
|
||||
// since migrate_vma_setup() would have reported that information.
|
||||
// Try to make it resident in system memory and retry the migration.
|
||||
status = hmm_make_resident_cpu(va_block,
|
||||
service_context->block_context.hmm.vma,
|
||||
service_context->block_context.hmm.src_pfns,
|
||||
service_context->block_context->hmm.vma,
|
||||
service_context->block_context->hmm.src_pfns,
|
||||
region,
|
||||
service_context->access_type,
|
||||
NULL);
|
||||
@@ -2907,8 +2968,6 @@ static NV_STATUS uvm_hmm_migrate_alloc_and_copy(struct vm_area_struct *vma,
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
UVM_ASSERT(!uvm_va_policy_is_read_duplicate(va_block_context->policy, va_block->hmm.va_space));
|
||||
|
||||
status = uvm_va_block_make_resident_copy(va_block,
|
||||
va_block_retry,
|
||||
va_block_context,
|
||||
@@ -2985,16 +3044,6 @@ static NV_STATUS uvm_hmm_migrate_finalize(uvm_hmm_migrate_event_t *uvm_hmm_migra
|
||||
&uvm_hmm_migrate_event->same_devmem_page_mask);
|
||||
}
|
||||
|
||||
static bool is_resident(uvm_va_block_t *va_block,
|
||||
uvm_processor_id_t dest_id,
|
||||
uvm_va_block_region_t region)
|
||||
{
|
||||
if (!uvm_processor_mask_test(&va_block->resident, dest_id))
|
||||
return false;
|
||||
|
||||
return uvm_page_mask_region_full(uvm_va_block_resident_mask_get(va_block, dest_id), region);
|
||||
}
|
||||
|
||||
// Note that migrate_vma_*() doesn't handle asynchronous migrations so the
|
||||
// migration flag UVM_MIGRATE_FLAG_SKIP_CPU_MAP doesn't have an effect.
|
||||
// TODO: Bug 3900785: investigate ways to implement async migration.
|
||||
@@ -3086,9 +3135,7 @@ NV_STATUS uvm_hmm_va_block_migrate_locked(uvm_va_block_t *va_block,
|
||||
uvm_page_mask_init_from_region(page_mask, region, NULL);
|
||||
|
||||
for_each_id_in_mask(id, &va_block->resident) {
|
||||
if (!uvm_page_mask_andnot(page_mask,
|
||||
page_mask,
|
||||
uvm_va_block_resident_mask_get(va_block, id)))
|
||||
if (!uvm_page_mask_andnot(page_mask, page_mask, uvm_va_block_resident_mask_get(va_block, id, NUMA_NO_NODE)))
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
@@ -3140,7 +3187,7 @@ NV_STATUS uvm_hmm_migrate_ranges(uvm_va_space_t *va_space,
|
||||
for (addr = base; addr < last_address; addr = end + 1) {
|
||||
struct vm_area_struct *vma;
|
||||
|
||||
status = hmm_va_block_find_create(va_space, addr, false, va_block_context, &va_block);
|
||||
status = hmm_va_block_find_create(va_space, addr, false, &va_block_context->hmm.vma, &va_block);
|
||||
if (status != NV_OK)
|
||||
return status;
|
||||
|
||||
@@ -3216,6 +3263,7 @@ static NV_STATUS hmm_va_block_evict_chunks(uvm_va_block_t *va_block,
|
||||
uvm_page_mask_t *page_mask = &uvm_hmm_migrate_event.page_mask;
|
||||
const uvm_va_policy_t *policy;
|
||||
uvm_va_policy_node_t *node;
|
||||
uvm_page_mask_t *cpu_resident_mask = uvm_va_block_resident_mask_get(va_block, UVM_ID_CPU, NUMA_NO_NODE);
|
||||
unsigned long npages;
|
||||
NV_STATUS status;
|
||||
|
||||
@@ -3232,14 +3280,13 @@ static NV_STATUS hmm_va_block_evict_chunks(uvm_va_block_t *va_block,
|
||||
uvm_for_each_va_policy_in(policy, va_block, start, end, node, region) {
|
||||
npages = uvm_va_block_region_num_pages(region);
|
||||
|
||||
va_block_context->policy = policy;
|
||||
if (out_accessed_by_set && uvm_processor_mask_get_count(&policy->accessed_by) > 0)
|
||||
*out_accessed_by_set = true;
|
||||
|
||||
// Pages resident on the GPU should not have a resident page in system
|
||||
// memory.
|
||||
// TODO: Bug 3660922: Need to handle read duplication at some point.
|
||||
UVM_ASSERT(uvm_page_mask_region_empty(&va_block->cpu.resident, region));
|
||||
UVM_ASSERT(uvm_page_mask_region_empty(cpu_resident_mask, region));
|
||||
|
||||
status = alloc_and_copy_to_cpu(va_block,
|
||||
NULL,
|
||||
@@ -3338,35 +3385,34 @@ NV_STATUS uvm_hmm_va_block_evict_pages_from_gpu(uvm_va_block_t *va_block,
|
||||
NULL);
|
||||
}
|
||||
|
||||
NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn)
|
||||
NV_STATUS uvm_hmm_remote_cpu_fault(struct vm_fault *vmf)
|
||||
{
|
||||
unsigned long src_pfn = 0;
|
||||
unsigned long dst_pfn = 0;
|
||||
struct page *dst_page;
|
||||
NV_STATUS status = NV_OK;
|
||||
unsigned long src_pfn;
|
||||
unsigned long dst_pfn;
|
||||
struct migrate_vma args;
|
||||
struct page *src_page = vmf->page;
|
||||
uvm_tracker_t tracker = UVM_TRACKER_INIT();
|
||||
int ret;
|
||||
|
||||
ret = migrate_device_range(&src_pfn, pfn, 1);
|
||||
if (ret)
|
||||
return errno_to_nv_status(ret);
|
||||
args.vma = vmf->vma;
|
||||
args.src = &src_pfn;
|
||||
args.dst = &dst_pfn;
|
||||
args.start = nv_page_fault_va(vmf);
|
||||
args.end = args.start + PAGE_SIZE;
|
||||
args.pgmap_owner = &g_uvm_global;
|
||||
args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
|
||||
args.fault_page = src_page;
|
||||
|
||||
    // We don't call migrate_vma_setup_locked() here because we don't
    // have a va_block and don't want to ignore invalidations.
    ret = migrate_vma_setup(&args);
    UVM_ASSERT(!ret);

    if (src_pfn & MIGRATE_PFN_MIGRATE) {
        // All the code for copying a vidmem page to sysmem relies on
        // having a va_block. However certain combinations of mremap()
        // and fork() can result in device-private pages being mapped
        // in a child process without a va_block.
        //
        // We don't expect the above to be a common occurrence so for
        // now we allocate a fresh zero page when evicting without a
        // va_block. However this results in child processes losing
        // data so make sure we warn about it. Ideally we would just
        // not migrate and SIGBUS the child if it tries to access the
        // page. However that would prevent unloading of the driver so
        // we're stuck with this until we fix the problem.
        // TODO: Bug 3902536: add code to migrate GPU memory without having a
        // va_block.
        WARN_ON(1);
|
||||
dst_page = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_ZERO);
|
||||
struct page *dst_page;
|
||||
|
||||
dst_page = alloc_page(GFP_HIGHUSER_MOVABLE);
|
||||
if (!dst_page) {
|
||||
status = NV_ERR_NO_MEMORY;
|
||||
goto out;
|
||||
@@ -3375,11 +3421,15 @@ NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn)
|
||||
lock_page(dst_page);
|
||||
dst_pfn = migrate_pfn(page_to_pfn(dst_page));
|
||||
|
||||
migrate_device_pages(&src_pfn, &dst_pfn, 1);
|
||||
status = uvm_hmm_copy_devmem_page(dst_page, src_page, &tracker);
|
||||
if (status == NV_OK)
|
||||
status = uvm_tracker_wait_deinit(&tracker);
|
||||
}
|
||||
|
||||
migrate_vma_pages(&args);
|
||||
|
||||
out:
|
||||
migrate_device_finalize(&src_pfn, &dst_pfn, 1);
|
||||
migrate_vma_finalize(&args);
|
||||
|
||||
return status;
|
||||
}
|
||||
@@ -3630,4 +3680,3 @@ bool uvm_hmm_must_use_sysmem(uvm_va_block_t *va_block,
|
||||
}
|
||||
|
||||
#endif // UVM_IS_CONFIG_HMM()
|
||||
|
||||
|
||||
@@ -49,9 +49,7 @@ typedef struct
|
||||
bool uvm_hmm_is_enabled_system_wide(void);
|
||||
|
||||
// Initialize HMM for the given va_space.
// Locking: the va_space->va_space_mm.mm mmap_lock must be write locked
// and the va_space lock must be held in write mode.
NV_STATUS uvm_hmm_va_space_initialize(uvm_va_space_t *va_space);
void uvm_hmm_va_space_initialize(uvm_va_space_t *va_space);
|
||||
|
||||
// Destroy any HMM state for the given va_space.
|
||||
// Locking: va_space lock must be held in write mode.
|
||||
@@ -90,31 +88,30 @@ typedef struct
|
||||
// address 'addr' or the VMA does not have at least PROT_READ permission.
// The caller is also responsible for checking that there is no UVM
// va_range covering the given address before calling this function.
// If va_block_context is not NULL, the VMA is cached in
// va_block_context->hmm.vma.
// The VMA is returned in vma_out if it's not NULL.
// Locking: This function must be called with mm retained and locked for
// at least read and the va_space lock at least for read.
NV_STATUS uvm_hmm_va_block_find_create(uvm_va_space_t *va_space,
                                       NvU64 addr,
                                       uvm_va_block_context_t *va_block_context,
                                       struct vm_area_struct **vma_out,
                                       uvm_va_block_t **va_block_ptr);
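
A minimal caller-side sketch of the prototype above (locking and cleanup elided; the surrounding variables are illustrative and not taken from the diff):

```c
struct vm_area_struct *vma;
uvm_va_block_t *va_block;
NV_STATUS status;

// Look up or create the HMM va_block covering 'addr' and capture its VMA.
status = uvm_hmm_va_block_find_create(va_space, addr, va_block_context, &vma, &va_block);
if (status != NV_OK)
    return status;
```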
|
||||
|
||||
// Find the VMA for the given address and set va_block_context->hmm.vma.
|
||||
// Return NV_ERR_INVALID_ADDRESS if va_block_context->mm is NULL or there
|
||||
// is no VMA associated with the address 'addr' or the VMA does not have at
|
||||
// least PROT_READ permission.
|
||||
// Find the VMA for the given address and return it in vma_out. Return
|
||||
// NV_ERR_INVALID_ADDRESS if mm is NULL or there is no VMA associated with
|
||||
// the address 'addr' or the VMA does not have at least PROT_READ
|
||||
// permission.
|
||||
// Locking: This function must be called with mm retained and locked for
|
||||
// at least read or mm equal to NULL.
|
||||
NV_STATUS uvm_hmm_find_vma(uvm_va_block_context_t *va_block_context, NvU64 addr);
|
||||
NV_STATUS uvm_hmm_find_vma(struct mm_struct *mm, struct vm_area_struct **vma_out, NvU64 addr);
|
||||
|
||||
// If va_block is a HMM va_block, check that va_block_context->hmm.vma is
|
||||
// not NULL and covers the given region. This always returns true and is
|
||||
// intended to only be used with UVM_ASSERT().
|
||||
// If va_block is a HMM va_block, check that vma is not NULL and covers the
|
||||
// given region. This always returns true and is intended to only be used
|
||||
// with UVM_ASSERT().
|
||||
// Locking: This function must be called with the va_block lock held and if
|
||||
// va_block is a HMM block, va_block_context->mm must be retained and
|
||||
// locked for at least read.
|
||||
// va_block is a HMM block, va_space->va_space_mm.mm->mmap_lock must be
|
||||
// retained and locked for at least read.
|
||||
bool uvm_hmm_check_context_vma_is_valid(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
uvm_va_block_region_t region);
|
||||
|
||||
// Initialize the HMM portion of the service_context.
|
||||
@@ -225,31 +222,29 @@ typedef struct
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
// This function assigns va_block_context->policy to the policy covering
|
||||
// the given address 'addr' and assigns the ending address '*endp' to the
|
||||
// minimum of va_block->end, va_block_context->hmm.vma->vm_end - 1, and the
|
||||
// ending address of the policy range. Note that va_block_context->hmm.vma
|
||||
// is expected to be initialized before calling this function.
|
||||
// Locking: This function must be called with
|
||||
// va_block_context->hmm.vma->vm_mm retained and locked for least read and
|
||||
// the va_block lock held.
|
||||
void uvm_hmm_find_policy_end(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
unsigned long addr,
|
||||
NvU64 *endp);
|
||||
// This function returns the policy covering the given address 'addr' and
// assigns the ending address '*endp' to the minimum of va_block->end,
// vma->vm_end - 1, and the ending address of the policy range.
// Locking: This function must be called with vma->vm_mm retained and locked
// for at least read and the va_block and va_space lock held.
const uvm_va_policy_t *uvm_hmm_find_policy_end(uvm_va_block_t *va_block,
                                               struct vm_area_struct *vma,
                                               unsigned long addr,
                                               NvU64 *endp);
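
As an illustration of how the returned policy and '*endp' are meant to be consumed together, here is a hedged sketch of walking successive policy ranges inside a VMA (the loop itself is not part of the diff; 'addr', 'vma' and 'va_block' are assumed to be set up by the caller):

```c
const uvm_va_policy_t *policy;
NvU64 end;

// Visit each policy range covering [addr, va_block->end] in turn.
for (; addr <= va_block->end; addr = end + 1) {
    policy = uvm_hmm_find_policy_end(va_block, vma, addr, &end);
    // ... act on 'policy' for the pages in [addr, end] ...
}
```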
|
||||
|
||||
// This function finds the VMA for the page index 'page_index' and assigns
|
||||
// it to va_block_context->vma, sets va_block_context->policy to the policy
|
||||
// covering the given address, and sets the ending page range '*outerp'
|
||||
// to the minimum of *outerp, va_block_context->hmm.vma->vm_end - 1, the
|
||||
// ending address of the policy range, and va_block->end.
|
||||
// Return NV_ERR_INVALID_ADDRESS if no VMA is found; otherwise, NV_OK.
|
||||
// Locking: This function must be called with
|
||||
// va_block_context->hmm.vma->vm_mm retained and locked for least read and
|
||||
// the va_block lock held.
|
||||
// This function finds the VMA for the page index 'page_index' and returns
|
||||
// it in vma_out which must not be NULL. Returns the policy covering the
|
||||
// given address, and sets the ending page range '*outerp' to the minimum of
|
||||
// *outerp, vma->vm_end - 1, the ending address of the policy range, and
|
||||
// va_block->end.
|
||||
// Return NV_ERR_INVALID_ADDRESS if no VMA is found; otherwise sets *vma
|
||||
// and returns NV_OK.
|
||||
// Locking: This function must be called with mm retained and locked for at
|
||||
// least read and the va_block and va_space lock held.
|
||||
NV_STATUS uvm_hmm_find_policy_vma_and_outer(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct **vma,
|
||||
uvm_page_index_t page_index,
|
||||
const uvm_va_policy_t **policy,
|
||||
uvm_page_index_t *outerp);
|
||||
|
||||
// Clear thrashing policy information from all HMM va_blocks.
|
||||
@@ -258,24 +253,21 @@ typedef struct
|
||||
|
||||
// Return the expanded region around 'address' limited to the intersection
|
||||
// of va_block start/end, vma start/end, and policy start/end.
|
||||
// va_block_context must not be NULL, va_block_context->hmm.vma must be
|
||||
// valid (this is usually set by uvm_hmm_va_block_find_create()), and
|
||||
// va_block_context->policy must be valid.
|
||||
// Locking: the caller must hold mm->mmap_lock in at least read mode, the
|
||||
// va_space lock must be held in at least read mode, and the va_block lock
|
||||
// held.
|
||||
// Locking: the caller must hold va_space->va_space_mm.mm->mmap_lock in at
|
||||
// least read mode, the va_space lock must be held in at least read mode,
|
||||
// and the va_block lock held.
|
||||
uvm_va_block_region_t uvm_hmm_get_prefetch_region(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
const uvm_va_policy_t *policy,
|
||||
NvU64 address);
|
||||
|
||||
// Return the logical protection allowed of a HMM va_block for the page at
|
||||
// the given address.
|
||||
// va_block_context must not be NULL and va_block_context->hmm.vma must be
|
||||
// valid (this is usually set by uvm_hmm_va_block_find_create()).
|
||||
// Locking: the caller must hold va_block_context->mm mmap_lock in at least
|
||||
// read mode.
|
||||
// the given address within the vma which must be valid. This is usually
|
||||
// obtained from uvm_hmm_va_block_find_create()).
|
||||
// Locking: the caller must hold va_space->va_space_mm.mm mmap_lock in at
|
||||
// least read mode.
|
||||
uvm_prot_t uvm_hmm_compute_logical_prot(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
NvU64 addr);
|
||||
|
||||
// This is called to service a GPU fault.
|
||||
@@ -288,9 +280,9 @@ typedef struct
|
||||
uvm_service_block_context_t *service_context);
|
||||
|
||||
// This is called to migrate a region within a HMM va_block.
|
||||
// va_block_context must not be NULL and va_block_context->policy and
|
||||
// va_block_context->hmm.vma must be valid.
|
||||
// Locking: the va_block_context->mm must be retained, mmap_lock must be
|
||||
// va_block_context must not be NULL and va_block_context->hmm.vma
|
||||
// must be valid.
|
||||
// Locking: the va_space->va_space_mm.mm must be retained, mmap_lock must be
|
||||
// locked, and the va_block lock held.
|
||||
NV_STATUS uvm_hmm_va_block_migrate_locked(uvm_va_block_t *va_block,
|
||||
uvm_va_block_retry_t *va_block_retry,
|
||||
@@ -303,7 +295,7 @@ typedef struct
|
||||
// UvmMigrate().
|
||||
//
|
||||
// va_block_context must not be NULL. The caller is not required to set
|
||||
// va_block_context->policy or va_block_context->hmm.vma.
|
||||
// va_block_context->hmm.vma.
|
||||
//
|
||||
// Locking: the va_space->va_space_mm.mm mmap_lock must be locked and
|
||||
// the va_space read lock must be held.
|
||||
@@ -315,10 +307,10 @@ typedef struct
|
||||
uvm_migrate_mode_t mode,
|
||||
uvm_tracker_t *out_tracker);
|
||||
|
||||
// Evicts all va_blocks in the va_space to the CPU. Unlike the
|
||||
// other va_block eviction functions this is based on virtual
|
||||
// address and therefore takes mmap_lock for read.
|
||||
void uvm_hmm_evict_va_blocks(uvm_va_space_t *va_space);
|
||||
// Handle a fault to a device-private page from a process other than the
// process which created the va_space that originally allocated the
// device-private page.
NV_STATUS uvm_hmm_remote_cpu_fault(struct vm_fault *vmf);
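
For context, a hedged sketch of the kind of caller this declaration implies: a dev_pagemap_ops migrate_to_ram handler forwarding the CPU fault. The callback name and the error mapping are illustrative, not taken from the diff.

```c
static vm_fault_t uvm_devmem_migrate_to_ram(struct vm_fault *vmf)
{
    // Forward the remote CPU fault on a device-private page to UVM and
    // translate the status into a fault result for the core MM.
    if (uvm_hmm_remote_cpu_fault(vmf) != NV_OK)
        return VM_FAULT_SIGBUS;

    return 0;
}
```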
|
||||
|
||||
// This sets the va_block_context->hmm.src_pfns[] to the ZONE_DEVICE private
|
||||
// PFN for the GPU chunk memory.
|
||||
@@ -351,14 +343,6 @@ typedef struct
|
||||
const uvm_page_mask_t *pages_to_evict,
|
||||
uvm_va_block_region_t region);
|
||||
|
||||
// Migrate a GPU device-private page to system memory. This is
|
||||
// called to remove CPU page table references to device private
|
||||
// struct pages for the given GPU after all other references in
|
||||
// va_blocks have been released and the GPU is in the process of
|
||||
// being removed/torn down. Note that there is no mm, VMA,
|
||||
// va_block or any user channel activity on this GPU.
|
||||
NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn);
|
||||
|
||||
// This returns what would be the intersection of va_block start/end and
|
||||
// VMA start/end-1 for the given 'lookup_address' if
|
||||
// uvm_hmm_va_block_find_create() was called.
|
||||
@@ -412,9 +396,8 @@ typedef struct
|
||||
return false;
|
||||
}
|
||||
|
||||
static NV_STATUS uvm_hmm_va_space_initialize(uvm_va_space_t *va_space)
|
||||
static void uvm_hmm_va_space_initialize(uvm_va_space_t *va_space)
|
||||
{
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
static void uvm_hmm_va_space_destroy(uvm_va_space_t *va_space)
|
||||
@@ -440,19 +423,19 @@ typedef struct
|
||||
|
||||
static NV_STATUS uvm_hmm_va_block_find_create(uvm_va_space_t *va_space,
|
||||
NvU64 addr,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct **vma,
|
||||
uvm_va_block_t **va_block_ptr)
|
||||
{
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
}
|
||||
|
||||
static NV_STATUS uvm_hmm_find_vma(uvm_va_block_context_t *va_block_context, NvU64 addr)
|
||||
static NV_STATUS uvm_hmm_find_vma(struct mm_struct *mm, struct vm_area_struct **vma, NvU64 addr)
|
||||
{
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
static bool uvm_hmm_check_context_vma_is_valid(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
uvm_va_block_region_t region)
|
||||
{
|
||||
return true;
|
||||
@@ -533,16 +516,19 @@ typedef struct
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
}
|
||||
|
||||
static void uvm_hmm_find_policy_end(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
unsigned long addr,
|
||||
NvU64 *endp)
|
||||
static const uvm_va_policy_t *uvm_hmm_find_policy_end(uvm_va_block_t *va_block,
|
||||
struct vm_area_struct *vma,
|
||||
unsigned long addr,
|
||||
NvU64 *endp)
|
||||
{
|
||||
UVM_ASSERT(0);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static NV_STATUS uvm_hmm_find_policy_vma_and_outer(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct **vma,
|
||||
uvm_page_index_t page_index,
|
||||
const uvm_va_policy_t **policy,
|
||||
uvm_page_index_t *outerp)
|
||||
{
|
||||
return NV_OK;
|
||||
@@ -554,14 +540,15 @@ typedef struct
|
||||
}
|
||||
|
||||
static uvm_va_block_region_t uvm_hmm_get_prefetch_region(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
const uvm_va_policy_t *policy,
|
||||
NvU64 address)
|
||||
{
|
||||
return (uvm_va_block_region_t){};
|
||||
}
|
||||
|
||||
static uvm_prot_t uvm_hmm_compute_logical_prot(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
struct vm_area_struct *vma,
|
||||
NvU64 addr)
|
||||
{
|
||||
return UVM_PROT_NONE;
|
||||
@@ -597,8 +584,10 @@ typedef struct
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
}
|
||||
|
||||
static void uvm_hmm_evict_va_blocks(uvm_va_space_t *va_space)
|
||||
static NV_STATUS uvm_hmm_remote_cpu_fault(struct vm_fault *vmf)
|
||||
{
|
||||
UVM_ASSERT(0);
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
}
|
||||
|
||||
static NV_STATUS uvm_hmm_va_block_evict_chunk_prep(uvm_va_block_t *va_block,
|
||||
@@ -627,11 +616,6 @@ typedef struct
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
static NV_STATUS uvm_hmm_pmm_gpu_evict_pfn(unsigned long pfn)
|
||||
{
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
static NV_STATUS uvm_hmm_va_block_range_bounds(uvm_va_space_t *va_space,
|
||||
struct mm_struct *mm,
|
||||
NvU64 lookup_address,
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2020-2022 NVIDIA Corporation
|
||||
Copyright (c) 2020-2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
@@ -59,9 +59,13 @@ void uvm_hal_hopper_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
|
||||
|
||||
// Physical CE writes to vidmem are non-coherent with respect to the CPU on
|
||||
// GH180.
|
||||
parent_gpu->ce_phys_vidmem_write_supported = !uvm_gpu_is_coherent(parent_gpu);
|
||||
parent_gpu->ce_phys_vidmem_write_supported = !uvm_parent_gpu_is_coherent(parent_gpu);
|
||||
|
||||
parent_gpu->peer_copy_mode = g_uvm_global.peer_copy_mode;
|
||||
// TODO: Bug 4174553: [HGX-SkinnyJoe][GH180] channel errors discussion/debug
|
||||
// portion for the uvm tests became nonresponsive after
|
||||
// some time and then failed even after reboot
|
||||
parent_gpu->peer_copy_mode = uvm_parent_gpu_is_coherent(parent_gpu) ?
|
||||
UVM_GPU_PEER_COPY_MODE_VIRTUAL : g_uvm_global.peer_copy_mode;
|
||||
|
||||
// All GR context buffers may be mapped to 57b wide VAs. All "compute" units
|
||||
// accessing GR context buffers support the 57-bit VA range.
|
||||
|
||||
@@ -491,7 +491,6 @@ void uvm_hal_hopper_ce_encrypt(uvm_push_t *push,
|
||||
uvm_gpu_t *gpu = uvm_push_get_gpu(push);
|
||||
|
||||
UVM_ASSERT(uvm_conf_computing_mode_is_hcc(gpu));
|
||||
UVM_ASSERT(uvm_push_is_fake(push) || uvm_channel_is_secure(push->channel));
|
||||
UVM_ASSERT(IS_ALIGNED(auth_tag.address, UVM_CONF_COMPUTING_AUTH_TAG_ALIGNMENT));
|
||||
|
||||
if (!src.is_virtual)
|
||||
@@ -540,7 +539,6 @@ void uvm_hal_hopper_ce_decrypt(uvm_push_t *push,
|
||||
uvm_gpu_t *gpu = uvm_push_get_gpu(push);
|
||||
|
||||
UVM_ASSERT(uvm_conf_computing_mode_is_hcc(gpu));
|
||||
UVM_ASSERT(!push->channel || uvm_channel_is_secure(push->channel));
|
||||
UVM_ASSERT(IS_ALIGNED(auth_tag.address, UVM_CONF_COMPUTING_AUTH_TAG_ALIGNMENT));
|
||||
|
||||
// The addressing mode (and aperture, if applicable) of the source and
|
||||
|
||||
@@ -128,8 +128,9 @@ static inline const struct cpumask *uvm_cpumask_of_node(int node)
|
||||
// present if we see the callback.
|
||||
//
|
||||
// The callback was added in commit 0f0a327fa12cd55de5e7f8c05a70ac3d047f405e,
|
||||
// v3.19 (2014-11-13).
|
||||
#if defined(NV_MMU_NOTIFIER_OPS_HAS_INVALIDATE_RANGE)
|
||||
// v3.19 (2014-11-13) and renamed in commit 1af5a8109904.
|
||||
#if defined(NV_MMU_NOTIFIER_OPS_HAS_INVALIDATE_RANGE) || \
|
||||
defined(NV_MMU_NOTIFIER_OPS_HAS_ARCH_INVALIDATE_SECONDARY_TLBS)
|
||||
#define UVM_CAN_USE_MMU_NOTIFIERS() 1
|
||||
#else
|
||||
#define UVM_CAN_USE_MMU_NOTIFIERS() 0
|
||||
@@ -153,10 +154,6 @@ static inline const struct cpumask *uvm_cpumask_of_node(int node)
|
||||
#define VM_MIXEDMAP 0x00000000
|
||||
#endif
|
||||
|
||||
#if !defined(MPOL_PREFERRED_MANY)
|
||||
#define MPOL_PREFERRED_MANY 5
|
||||
#endif
|
||||
|
||||
//
|
||||
// printk.h already defined pr_fmt, so we have to redefine it so the pr_*
|
||||
// routines pick up our version
|
||||
@@ -352,6 +349,47 @@ static inline NvU64 NV_GETTIME(void)
|
||||
(bit) = find_next_zero_bit((addr), (size), (bit) + 1))
|
||||
#endif
|
||||
|
||||
#if !defined(NV_FIND_NEXT_BIT_WRAP_PRESENT)
static inline unsigned long find_next_bit_wrap(const unsigned long *addr, unsigned long size, unsigned long offset)
{
    unsigned long bit = find_next_bit(addr, size, offset);

    if (bit < size)
        return bit;

    bit = find_first_bit(addr, offset);
    return bit < offset ? bit : size;
}
#endif

// for_each_set_bit_wrap and __for_each_wrap were introduced in v6.1-rc1
// by commit 4fe49b3b97c2640147c46519c2a6fdb06df34f5f
#if !defined(for_each_set_bit_wrap)
static inline unsigned long __for_each_wrap(const unsigned long *bitmap,
                                            unsigned long size,
                                            unsigned long start,
                                            unsigned long n)
{
    unsigned long bit;

    if (n > start) {
        bit = find_next_bit(bitmap, size, n);
        if (bit < size)
            return bit;

        n = 0;
    }

    bit = find_next_bit(bitmap, start, n);
    return bit < start ? bit : size;
}

#define for_each_set_bit_wrap(bit, addr, size, start) \
    for ((bit) = find_next_bit_wrap((addr), (size), (start)); \
         (bit) < (size); \
         (bit) = __for_each_wrap((addr), (size), (start), (bit) + 1))
#endif
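
A minimal usage sketch of the wrap-around iterator defined above (the bitmap contents and start offset are illustrative):

```c
#include <linux/bitmap.h>
#include <linux/printk.h>

static void example_wrap_scan(void)
{
    DECLARE_BITMAP(mask, 64);
    unsigned long bit;

    bitmap_zero(mask, 64);
    __set_bit(3, mask);
    __set_bit(60, mask);

    // Starting the scan at bit 32 visits bit 60 first, then wraps around
    // and visits bit 3 before terminating.
    for_each_set_bit_wrap(bit, mask, 64, 32)
        pr_info("visited bit %lu\n", bit);
}
```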
|
||||
|
||||
// Added in 2.6.24
|
||||
#ifndef ACCESS_ONCE
|
||||
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
|
||||
@@ -583,4 +621,5 @@ static inline pgprot_t uvm_pgprot_decrypted(pgprot_t prot)
|
||||
#include <asm/page.h>
|
||||
#define page_to_virt(x) __va(PFN_PHYS(page_to_pfn(x)))
|
||||
#endif
|
||||
|
||||
#endif // _UVM_LINUX_H
|
||||
|
||||
@@ -279,13 +279,14 @@
|
||||
// Operations not allowed while holding the lock:
|
||||
// - GPU memory allocation which can evict memory (would require nesting
|
||||
// block locks)
|
||||
//
|
||||
// - GPU DMA Allocation pool lock (gpu->conf_computing.dma_buffer_pool.lock)
|
||||
// Order: UVM_LOCK_ORDER_CONF_COMPUTING_DMA_BUFFER_POOL
|
||||
// Condition: The Confidential Computing feature is enabled
|
||||
// Exclusive lock (mutex)
|
||||
//
|
||||
// Protects:
|
||||
// - Protect the state of the uvm_conf_computing_dma_buffer_pool_t
|
||||
// when the Confidential Computing feature is enabled on the system.
|
||||
//
|
||||
// - Chunk mapping lock (gpu->root_chunk_mappings.bitlocks and
|
||||
// gpu->sysmem_mappings.bitlock)
|
||||
@@ -321,22 +322,25 @@
|
||||
// Operations not allowed while holding this lock
|
||||
// - GPU memory allocation which can evict
|
||||
//
|
||||
// - Secure channel CSL channel pool semaphore
|
||||
// - CE channel CSL channel pool semaphore
|
||||
// Order: UVM_LOCK_ORDER_CSL_PUSH
|
||||
// Semaphore per SEC2 channel pool
|
||||
// Condition: The Confidential Computing feature is enabled
|
||||
// Semaphore per CE channel pool
|
||||
//
|
||||
// The semaphore controls concurrent pushes to secure channels. Secure work
|
||||
// submission depends on channel availability in GPFIFO entries (as in any
|
||||
// other channel type) but also on channel locking. Each secure channel has a
|
||||
// lock to enforce ordering of pushes. The channel's CSL lock is taken on
|
||||
// channel reservation until uvm_push_end. Secure channels are stateful
|
||||
// channels and the CSL lock protects their CSL state/context.
|
||||
// The semaphore controls concurrent pushes to CE channels that are not WCL
|
||||
// channels. Secure work submission depends on channel availability in
|
||||
// GPFIFO entries (as in any other channel type) but also on channel
|
||||
// locking. Each channel has a lock to enforce ordering of pushes. The
|
||||
// channel's CSL lock is taken on channel reservation until uvm_push_end.
|
||||
// When the Confidential Computing feature is enabled, channels are
|
||||
// stateful, and the CSL lock protects their CSL state/context.
|
||||
//
|
||||
// Operations allowed while holding this lock
|
||||
// - Pushing work to CE secure channels
|
||||
// - Pushing work to CE channels (except for WLC channels)
|
||||
//
|
||||
// - WLC CSL channel pool semaphore
|
||||
// Order: UVM_LOCK_ORDER_CSL_WLC_PUSH
|
||||
// Condition: The Confidential Computing feature is enabled
|
||||
// Semaphore per WLC channel pool
|
||||
//
|
||||
// The semaphore controls concurrent pushes to WLC channels. WLC work
|
||||
@@ -346,8 +350,8 @@
|
||||
// channel reservation until uvm_push_end. SEC2 channels are stateful
|
||||
// channels and the CSL lock protects their CSL state/context.
|
||||
//
|
||||
// This lock ORDER is different and sits below generic secure channel CSL
|
||||
// lock and above SEC2 CSL lock. This reflects the dual nature of WLC
|
||||
// This lock ORDER is different and sits below the generic channel CSL
|
||||
// lock and above the SEC2 CSL lock. This reflects the dual nature of WLC
|
||||
// channels; they use SEC2 indirect work launch during initialization,
|
||||
// and after their schedule is initialized they provide indirect launch
|
||||
// functionality to other CE channels.
|
||||
@@ -357,6 +361,7 @@
|
||||
//
|
||||
// - SEC2 CSL channel pool semaphore
|
||||
// Order: UVM_LOCK_ORDER_SEC2_CSL_PUSH
|
||||
// Condition: The Confidential Computing feature is enabled
|
||||
// Semaphore per SEC2 channel pool
|
||||
//
|
||||
// The semaphore controls concurrent pushes to SEC2 channels. SEC2 work
|
||||
@@ -366,9 +371,9 @@
|
||||
// channel reservation until uvm_push_end. SEC2 channels are stateful
|
||||
// channels and the CSL lock protects their CSL state/context.
|
||||
//
|
||||
// This lock ORDER is different and lower than the generic secure channel
|
||||
// lock to allow secure work submission to use a SEC2 channel to submit
|
||||
// work before releasing the CSL lock of the originating secure channel.
|
||||
// This lock ORDER is different and lower than UVM_LOCK_ORDER_CSL_PUSH
|
||||
// to allow secure work submission to use a SEC2 channel to submit
|
||||
// work before releasing the CSL lock of the originating channel.
|
||||
//
|
||||
// Operations allowed while holding this lock
|
||||
// - Pushing work to SEC2 channels
|
||||
@@ -408,16 +413,18 @@
|
||||
//
|
||||
// - WLC Channel lock
|
||||
// Order: UVM_LOCK_ORDER_WLC_CHANNEL
|
||||
// Condition: The Confidential Computing feature is enabled
|
||||
// Spinlock (uvm_spinlock_t)
|
||||
//
|
||||
// Lock protecting the state of WLC channels in a channel pool. This lock
|
||||
// is separate from the above generic channel lock to allow for indirect
|
||||
// worklaunch pushes while holding the main channel lock.
|
||||
// (WLC pushes don't need any of the pushbuffer locks described above)
|
||||
// is separate from the generic channel lock (UVM_LOCK_ORDER_CHANNEL)
|
||||
// to allow for indirect worklaunch pushes while holding the main channel
|
||||
// lock (WLC pushes don't need any of the pushbuffer locks described
|
||||
// above)
|
||||
//
|
||||
// - Tools global VA space list lock (g_tools_va_space_list_lock)
|
||||
// Order: UVM_LOCK_ORDER_TOOLS_VA_SPACE_LIST
|
||||
// Reader/writer lock (rw_sempahore)
|
||||
// Reader/writer lock (rw_semaphore)
|
||||
//
|
||||
// This lock protects the list of VA spaces used when broadcasting
|
||||
// UVM profiling events.
|
||||
@@ -437,9 +444,10 @@
|
||||
//
|
||||
// - Tracking semaphores
|
||||
// Order: UVM_LOCK_ORDER_SECURE_SEMAPHORE
|
||||
// When the Confidential Computing feature is enabled, CE semaphores are
|
||||
// encrypted, and require to take the CSL lock (UVM_LOCK_ORDER_LEAF) to
|
||||
// decrypt the payload.
|
||||
// Condition: The Confidential Computing feature is enabled
|
||||
//
|
||||
// CE semaphore payloads are encrypted, and require to take the CSL lock
|
||||
// (UVM_LOCK_ORDER_LEAF) to decrypt the payload.
|
||||
//
|
||||
// - Leaf locks
|
||||
// Order: UVM_LOCK_ORDER_LEAF
|
||||
|
||||
@@ -355,6 +355,7 @@ static uvm_membar_t va_range_downgrade_membar(uvm_va_range_t *va_range, uvm_ext_
|
||||
if (!ext_gpu_map->mem_handle)
|
||||
return UVM_MEMBAR_GPU;
|
||||
|
||||
// EGM uses the same barriers as sysmem.
|
||||
return uvm_hal_downgrade_membar_type(ext_gpu_map->gpu,
|
||||
!ext_gpu_map->is_sysmem && ext_gpu_map->gpu == ext_gpu_map->owning_gpu);
|
||||
}
|
||||
@@ -633,6 +634,8 @@ static NV_STATUS set_ext_gpu_map_location(uvm_ext_gpu_map_t *ext_gpu_map,
|
||||
const UvmGpuMemoryInfo *mem_info)
|
||||
{
|
||||
uvm_gpu_t *owning_gpu;
|
||||
if (mem_info->egm)
|
||||
UVM_ASSERT(mem_info->sysmem);
|
||||
|
||||
if (!mem_info->deviceDescendant && !mem_info->sysmem) {
|
||||
ext_gpu_map->owning_gpu = NULL;
|
||||
@@ -641,6 +644,7 @@ static NV_STATUS set_ext_gpu_map_location(uvm_ext_gpu_map_t *ext_gpu_map,
|
||||
}
|
||||
// This is a local or peer allocation, so the owning GPU must have been
|
||||
// registered.
|
||||
// This also checks for if EGM owning GPU is registered.
|
||||
owning_gpu = uvm_va_space_get_gpu_by_uuid(va_space, &mem_info->uuid);
|
||||
if (!owning_gpu)
|
||||
return NV_ERR_INVALID_DEVICE;
|
||||
@@ -651,13 +655,10 @@ static NV_STATUS set_ext_gpu_map_location(uvm_ext_gpu_map_t *ext_gpu_map,
|
||||
// crashes when it's eventually freed.
|
||||
// TODO: Bug 1811006: Bug tracking the RM issue, its fix might change the
|
||||
// semantics of sysmem allocations.
|
||||
if (mem_info->sysmem) {
|
||||
ext_gpu_map->owning_gpu = owning_gpu;
|
||||
ext_gpu_map->is_sysmem = true;
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
if (owning_gpu != mapping_gpu) {
|
||||
// Check if peer access for peer memory is enabled.
|
||||
// This path also handles EGM allocations.
|
||||
if (owning_gpu != mapping_gpu && (!mem_info->sysmem || mem_info->egm)) {
|
||||
// TODO: Bug 1757136: In SLI, the returned UUID may be different but a
|
||||
// local mapping must be used. We need to query SLI groups to know
|
||||
// that.
|
||||
@@ -666,7 +667,9 @@ static NV_STATUS set_ext_gpu_map_location(uvm_ext_gpu_map_t *ext_gpu_map,
|
||||
}
|
||||
|
||||
ext_gpu_map->owning_gpu = owning_gpu;
|
||||
ext_gpu_map->is_sysmem = false;
|
||||
ext_gpu_map->is_sysmem = mem_info->sysmem;
|
||||
ext_gpu_map->is_egm = mem_info->egm;
|
||||
|
||||
return NV_OK;
|
||||
}
|
||||
|
||||
@@ -719,6 +722,7 @@ static NV_STATUS uvm_ext_gpu_map_split(uvm_range_tree_t *tree,
|
||||
new->gpu = existing_map->gpu;
|
||||
new->owning_gpu = existing_map->owning_gpu;
|
||||
new->is_sysmem = existing_map->is_sysmem;
|
||||
new->is_egm = existing_map->is_egm;
|
||||
|
||||
// Initialize the new ext_gpu_map tracker as a copy of the existing_map tracker.
|
||||
// This way, any operations on any of the two ext_gpu_maps will be able to
|
||||
|
||||
@@ -392,12 +392,6 @@ static NV_STATUS uvm_mem_alloc_vidmem(NvU64 size, uvm_gpu_t *gpu, uvm_mem_t **me
|
||||
return uvm_mem_alloc(¶ms, mem_out);
|
||||
}
|
||||
|
||||
// Helper for allocating protected vidmem with the default page size
|
||||
static NV_STATUS uvm_mem_alloc_vidmem_protected(NvU64 size, uvm_gpu_t *gpu, uvm_mem_t **mem_out)
|
||||
{
|
||||
return uvm_mem_alloc_vidmem(size, gpu, mem_out);
|
||||
}
|
||||
|
||||
// Helper for allocating sysmem and mapping it on the CPU
|
||||
static NV_STATUS uvm_mem_alloc_sysmem_and_map_cpu_kernel(NvU64 size, struct mm_struct *mm, uvm_mem_t **mem_out)
|
||||
{
|
||||
|
||||
@@ -130,9 +130,25 @@ static NV_STATUS block_migrate_map_unmapped_pages(uvm_va_block_t *va_block,
|
||||
NV_STATUS status = NV_OK;
|
||||
NV_STATUS tracker_status;
|
||||
|
||||
// Save the mask of unmapped pages because it will change after the
|
||||
// Get the mask of unmapped pages because it will change after the
|
||||
// first map operation
|
||||
uvm_page_mask_complement(&va_block_context->caller_page_mask, &va_block->maybe_mapped_pages);
|
||||
uvm_va_block_unmapped_pages_get(va_block, region, &va_block_context->caller_page_mask);
|
||||
|
||||
if (uvm_va_block_is_hmm(va_block) && !UVM_ID_IS_CPU(dest_id)) {
|
||||
// Do not map pages that are already resident on the CPU. This is in
|
||||
// order to avoid breaking system-wide atomic operations on HMM. HMM's
|
||||
// implementation of system-side atomic operations involves restricting
|
||||
// mappings to one processor (CPU or a GPU) at a time. If we were to
|
||||
// grant a GPU a mapping to system memory, this gets into trouble
|
||||
// because, on the CPU side, Linux can silently upgrade PTE permissions
|
||||
// (move from read-only, to read-write, without any MMU notifiers
|
||||
// firing), thus breaking the model by allowing simultaneous read-write
|
||||
// access from two separate processors. To avoid that, just don't map
|
||||
// such pages at all, when migrating.
|
||||
uvm_page_mask_andnot(&va_block_context->caller_page_mask,
|
||||
&va_block_context->caller_page_mask,
|
||||
uvm_va_block_resident_mask_get(va_block, UVM_ID_CPU, NUMA_NO_NODE));
|
||||
}
|
||||
|
||||
// Only map those pages that are not mapped anywhere else (likely due
|
||||
// to a first touch or a migration). We pass
|
||||
@@ -207,7 +223,7 @@ NV_STATUS uvm_va_block_migrate_locked(uvm_va_block_t *va_block,
|
||||
NV_STATUS status, tracker_status = NV_OK;
|
||||
|
||||
uvm_assert_mutex_locked(&va_block->lock);
|
||||
UVM_ASSERT(uvm_hmm_check_context_vma_is_valid(va_block, va_block_context, region));
|
||||
UVM_ASSERT(uvm_hmm_check_context_vma_is_valid(va_block, va_block_context->hmm.vma, region));
|
||||
|
||||
if (uvm_va_block_is_hmm(va_block)) {
|
||||
status = uvm_hmm_va_block_migrate_locked(va_block,
|
||||
@@ -218,9 +234,9 @@ NV_STATUS uvm_va_block_migrate_locked(uvm_va_block_t *va_block,
|
||||
UVM_MAKE_RESIDENT_CAUSE_API_MIGRATE);
|
||||
}
|
||||
else {
|
||||
va_block_context->policy = uvm_va_range_get_policy(va_block->va_range);
|
||||
uvm_va_policy_t *policy = uvm_va_range_get_policy(va_block->va_range);
|
||||
|
||||
if (uvm_va_policy_is_read_duplicate(va_block_context->policy, va_space)) {
|
||||
if (uvm_va_policy_is_read_duplicate(policy, va_space)) {
|
||||
status = uvm_va_block_make_resident_read_duplicate(va_block,
|
||||
va_block_retry,
|
||||
va_block_context,
|
||||
@@ -355,15 +371,13 @@ static bool va_block_should_do_cpu_preunmap(uvm_va_block_t *va_block,
|
||||
if (!va_block)
|
||||
return true;
|
||||
|
||||
UVM_ASSERT(va_range_should_do_cpu_preunmap(va_block_context->policy, uvm_va_block_get_va_space(va_block)));
|
||||
|
||||
region = uvm_va_block_region_from_start_end(va_block, max(start, va_block->start), min(end, va_block->end));
|
||||
|
||||
uvm_mutex_lock(&va_block->lock);
|
||||
|
||||
mapped_pages_cpu = uvm_va_block_map_mask_get(va_block, UVM_ID_CPU);
|
||||
if (uvm_processor_mask_test(&va_block->resident, dest_id)) {
|
||||
const uvm_page_mask_t *resident_pages_dest = uvm_va_block_resident_mask_get(va_block, dest_id);
|
||||
const uvm_page_mask_t *resident_pages_dest = uvm_va_block_resident_mask_get(va_block, dest_id, NUMA_NO_NODE);
|
||||
uvm_page_mask_t *do_not_unmap_pages = &va_block_context->scratch_page_mask;
|
||||
|
||||
// TODO: Bug 1877578
|
||||
@@ -480,11 +494,9 @@ static NV_STATUS uvm_va_range_migrate(uvm_va_range_t *va_range,
|
||||
uvm_tracker_t *out_tracker)
|
||||
{
|
||||
NvU64 preunmap_range_start = start;
|
||||
uvm_va_policy_t *policy = uvm_va_range_get_policy(va_range);
|
||||
|
||||
UVM_ASSERT(va_block_context->policy == uvm_va_range_get_policy(va_range));
|
||||
|
||||
should_do_cpu_preunmap = should_do_cpu_preunmap && va_range_should_do_cpu_preunmap(va_block_context->policy,
|
||||
va_range->va_space);
|
||||
should_do_cpu_preunmap = should_do_cpu_preunmap && va_range_should_do_cpu_preunmap(policy, va_range->va_space);
|
||||
|
||||
// Divide migrations into groups of contiguous VA blocks. This is to trigger
|
||||
// CPU unmaps for that region before the migration starts.
|
||||
@@ -561,8 +573,6 @@ static NV_STATUS uvm_migrate_ranges(uvm_va_space_t *va_space,
|
||||
break;
|
||||
}
|
||||
|
||||
va_block_context->policy = uvm_va_range_get_policy(va_range);
|
||||
|
||||
// For UVM-Lite GPUs, the CUDA driver may suballocate a single va_range
|
||||
// into many range groups. For this reason, we iterate over each va_range first
|
||||
// then through the range groups within.
|
||||
@@ -637,6 +647,8 @@ static NV_STATUS uvm_migrate(uvm_va_space_t *va_space,
|
||||
|
||||
if (mm)
|
||||
uvm_assert_mmap_lock_locked(mm);
|
||||
else if (!first_va_range)
|
||||
return NV_ERR_INVALID_ADDRESS;
|
||||
|
||||
va_block_context = uvm_va_block_context_alloc(mm);
|
||||
if (!va_block_context)
|
||||
|
||||
@@ -34,8 +34,8 @@ typedef struct
|
||||
{
|
||||
uvm_va_space_t *va_space;
|
||||
struct mm_struct *mm;
|
||||
unsigned long start;
|
||||
unsigned long length;
|
||||
const unsigned long start;
|
||||
const unsigned long length;
|
||||
uvm_processor_id_t dst_id;
|
||||
|
||||
// dst_node_id may be clobbered by uvm_migrate_pageable().
|
||||
|
||||
@@ -456,13 +456,13 @@ static void pde_fill_gpu(uvm_page_tree_t *tree,
|
||||
NvU32 max_inline_entries = UVM_PUSH_INLINE_DATA_MAX_SIZE / sizeof(pde_data);
|
||||
uvm_gpu_address_t inline_data_addr;
|
||||
uvm_push_inline_data_t inline_data;
|
||||
NvU32 membar_flag = 0;
|
||||
uvm_push_flag_t push_membar_flag = UVM_PUSH_FLAG_COUNT;
|
||||
NvU32 i;
|
||||
|
||||
if (uvm_push_get_and_reset_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE))
|
||||
membar_flag = UVM_PUSH_FLAG_NEXT_MEMBAR_NONE;
|
||||
push_membar_flag = UVM_PUSH_FLAG_NEXT_MEMBAR_NONE;
|
||||
else if (uvm_push_get_and_reset_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_GPU))
|
||||
membar_flag = UVM_PUSH_FLAG_NEXT_MEMBAR_GPU;
|
||||
push_membar_flag = UVM_PUSH_FLAG_NEXT_MEMBAR_GPU;
|
||||
|
||||
for (i = 0; i < pde_count;) {
|
||||
NvU32 j;
|
||||
@@ -482,8 +482,8 @@ static void pde_fill_gpu(uvm_page_tree_t *tree,
|
||||
// caller's membar flag.
|
||||
if (i + entry_count < pde_count)
|
||||
uvm_push_set_flag(push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE);
|
||||
else if (membar_flag)
|
||||
uvm_push_set_flag(push, membar_flag);
|
||||
else if (push_membar_flag != UVM_PUSH_FLAG_COUNT)
|
||||
uvm_push_set_flag(push, push_membar_flag);
|
||||
|
||||
tree->gpu->parent->ce_hal->memcopy(push, pde_entry_addr, inline_data_addr, entry_count * sizeof(pde_data));
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2016-2019 NVIDIA Corporation
|
||||
Copyright (c) 2016-2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
@@ -22,10 +22,7 @@
|
||||
*******************************************************************************/
|
||||
|
||||
#include "uvm_perf_events.h"
|
||||
#include "uvm_va_block.h"
|
||||
#include "uvm_va_range.h"
|
||||
#include "uvm_va_space.h"
|
||||
#include "uvm_kvmalloc.h"
|
||||
#include "uvm_test.h"
|
||||
|
||||
// Global variable used to check that callbacks are correctly executed
|
||||
@@ -46,10 +43,7 @@ static NV_STATUS test_events(uvm_va_space_t *va_space)
|
||||
NV_STATUS status;
|
||||
uvm_perf_event_data_t event_data;
|
||||
|
||||
uvm_va_block_t block;
|
||||
|
||||
test_data = 0;
|
||||
|
||||
memset(&event_data, 0, sizeof(event_data));
|
||||
|
||||
// Use CPU id to avoid triggering the GPU stats update code
|
||||
@@ -58,6 +52,7 @@ static NV_STATUS test_events(uvm_va_space_t *va_space)
|
||||
// Register a callback for page fault
|
||||
status = uvm_perf_register_event_callback(&va_space->perf_events, UVM_PERF_EVENT_FAULT, callback_inc_1);
|
||||
TEST_CHECK_GOTO(status == NV_OK, done);
|
||||
|
||||
// Register a callback for page fault
|
||||
status = uvm_perf_register_event_callback(&va_space->perf_events, UVM_PERF_EVENT_FAULT, callback_inc_2);
|
||||
TEST_CHECK_GOTO(status == NV_OK, done);
|
||||
@@ -65,13 +60,14 @@ static NV_STATUS test_events(uvm_va_space_t *va_space)
|
||||
// va_space read lock is required for page fault event notification
|
||||
uvm_va_space_down_read(va_space);
|
||||
|
||||
// Notify (fake) page fault. The two registered callbacks for this event increment the value of test_value
|
||||
event_data.fault.block = █
|
||||
// Notify (fake) page fault. The two registered callbacks for this event
|
||||
// increment the value of test_value
|
||||
uvm_perf_event_notify(&va_space->perf_events, UVM_PERF_EVENT_FAULT, &event_data);
|
||||
|
||||
uvm_va_space_up_read(va_space);
|
||||
|
||||
// test_data was initialized to zero. It should have been incremented by 1 and 2, respectively in the callbacks
|
||||
// test_data was initialized to zero. It should have been incremented by 1
|
||||
// and 2, respectively in the callbacks
|
||||
TEST_CHECK_GOTO(test_data == 3, done);
|
||||
|
||||
done:
|
||||
@@ -96,4 +92,3 @@ NV_STATUS uvm_test_perf_events_sanity(UVM_TEST_PERF_EVENTS_SANITY_PARAMS *params
|
||||
done:
|
||||
return status;
|
||||
}
|
||||
|
||||
|
||||
@@ -218,57 +218,11 @@ static void grow_fault_granularity(uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
|
||||
}
|
||||
}
|
||||
|
||||
// Within a block we only allow prefetching to a single processor. Therefore,
|
||||
// if two processors are accessing non-overlapping regions within the same
|
||||
// block they won't benefit from prefetching.
|
||||
//
|
||||
// TODO: Bug 1778034: [uvm] Explore prefetching to different processors within
|
||||
// a VA block.
|
||||
static NvU32 uvm_perf_prefetch_prenotify_fault_migrations(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
uvm_processor_id_t new_residency,
|
||||
const uvm_page_mask_t *faulted_pages,
|
||||
uvm_va_block_region_t faulted_region,
|
||||
uvm_page_mask_t *prefetch_pages,
|
||||
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree)
|
||||
static void init_bitmap_tree_from_region(uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
|
||||
uvm_va_block_region_t max_prefetch_region,
|
||||
const uvm_page_mask_t *resident_mask,
|
||||
const uvm_page_mask_t *faulted_pages)
|
||||
{
|
||||
uvm_page_index_t page_index;
|
||||
const uvm_page_mask_t *resident_mask = NULL;
|
||||
const uvm_page_mask_t *thrashing_pages = NULL;
|
||||
uvm_va_space_t *va_space = uvm_va_block_get_va_space(va_block);
|
||||
const uvm_va_policy_t *policy = va_block_context->policy;
|
||||
uvm_va_block_region_t max_prefetch_region;
|
||||
NvU32 big_page_size;
|
||||
uvm_va_block_region_t big_pages_region;
|
||||
|
||||
if (!uvm_id_equal(va_block->prefetch_info.last_migration_proc_id, new_residency)) {
|
||||
va_block->prefetch_info.last_migration_proc_id = new_residency;
|
||||
va_block->prefetch_info.fault_migrations_to_last_proc = 0;
|
||||
}
|
||||
|
||||
// Compute the expanded region that prefetching is allowed from.
|
||||
if (uvm_va_block_is_hmm(va_block)) {
|
||||
max_prefetch_region = uvm_hmm_get_prefetch_region(va_block,
|
||||
va_block_context,
|
||||
uvm_va_block_region_start(va_block, faulted_region));
|
||||
}
|
||||
else {
|
||||
max_prefetch_region = uvm_va_block_region_from_block(va_block);
|
||||
}
|
||||
|
||||
uvm_page_mask_zero(prefetch_pages);
|
||||
|
||||
if (UVM_ID_IS_CPU(new_residency) || va_block->gpus[uvm_id_gpu_index(new_residency)] != NULL)
|
||||
resident_mask = uvm_va_block_resident_mask_get(va_block, new_residency);
|
||||
|
||||
// If this is a first-touch fault and the destination processor is the
|
||||
// preferred location, populate the whole max_prefetch_region.
|
||||
if (uvm_processor_mask_empty(&va_block->resident) &&
|
||||
uvm_id_equal(new_residency, policy->preferred_location)) {
|
||||
uvm_page_mask_region_fill(prefetch_pages, max_prefetch_region);
|
||||
goto done;
|
||||
}
|
||||
|
||||
if (resident_mask)
|
||||
uvm_page_mask_or(&bitmap_tree->pages, resident_mask, faulted_pages);
|
||||
else
|
||||
@@ -277,6 +231,29 @@ static NvU32 uvm_perf_prefetch_prenotify_fault_migrations(uvm_va_block_t *va_blo
|
||||
// If we are using a subregion of the va_block, align bitmap_tree
|
||||
uvm_page_mask_shift_right(&bitmap_tree->pages, &bitmap_tree->pages, max_prefetch_region.first);
|
||||
|
||||
    bitmap_tree->offset = 0;
    bitmap_tree->leaf_count = uvm_va_block_region_num_pages(max_prefetch_region);
    bitmap_tree->level_count = ilog2(roundup_pow_of_two(bitmap_tree->leaf_count)) + 1;
}
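
A small worked example of the level-count arithmetic used above; the leaf counts are illustrative values, not taken from the diff:

```c
#include <linux/build_bug.h>
#include <linux/log2.h>

static inline void prefetch_tree_level_count_example(void)
{
    // A full 2MB VA block with 4KB pages has 512 leaves:
    // ilog2(roundup_pow_of_two(512)) + 1 == 9 + 1 == 10 levels.
    BUILD_BUG_ON(ilog2(roundup_pow_of_two(512)) + 1 != 10);

    // A 96-page subregion rounds up to 128 leaves:
    // ilog2(roundup_pow_of_two(96)) + 1 == 7 + 1 == 8 levels.
    BUILD_BUG_ON(ilog2(roundup_pow_of_two(96)) + 1 != 8);
}
```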
|
||||
|
||||
static void update_bitmap_tree_from_va_block(uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
|
||||
uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
uvm_processor_id_t new_residency,
|
||||
const uvm_page_mask_t *faulted_pages,
|
||||
uvm_va_block_region_t max_prefetch_region)
|
||||
|
||||
{
|
||||
NvU32 big_page_size;
|
||||
uvm_va_block_region_t big_pages_region;
|
||||
uvm_va_space_t *va_space;
|
||||
const uvm_page_mask_t *thrashing_pages;
|
||||
|
||||
UVM_ASSERT(va_block);
|
||||
UVM_ASSERT(va_block_context);
|
||||
|
||||
va_space = uvm_va_block_get_va_space(va_block);
|
||||
|
||||
// Get the big page size for the new residency.
|
||||
// Assume 64K size if the new residency is the CPU or no GPU va space is
|
||||
// registered in the current process for this GPU.
|
||||
@@ -302,13 +279,9 @@ static NvU32 uvm_perf_prefetch_prenotify_fault_migrations(uvm_va_block_t *va_blo
|
||||
UVM_ASSERT(bitmap_tree->leaf_count <= PAGES_PER_UVM_VA_BLOCK);
|
||||
|
||||
uvm_page_mask_shift_left(&bitmap_tree->pages, &bitmap_tree->pages, bitmap_tree->offset);
|
||||
}
|
||||
else {
|
||||
bitmap_tree->offset = 0;
|
||||
bitmap_tree->leaf_count = uvm_va_block_region_num_pages(max_prefetch_region);
|
||||
}
|
||||
|
||||
bitmap_tree->level_count = ilog2(roundup_pow_of_two(bitmap_tree->leaf_count)) + 1;
|
||||
bitmap_tree->level_count = ilog2(roundup_pow_of_two(bitmap_tree->leaf_count)) + 1;
|
||||
}
|
||||
|
||||
thrashing_pages = uvm_perf_thrashing_get_thrashing_pages(va_block);
|
||||
|
||||
@@ -320,25 +293,99 @@ static NvU32 uvm_perf_prefetch_prenotify_fault_migrations(uvm_va_block_t *va_blo
|
||||
max_prefetch_region,
|
||||
faulted_pages,
|
||||
thrashing_pages);
|
||||
}
|
||||
|
||||
// Do not compute prefetch regions with faults on pages that are thrashing
|
||||
if (thrashing_pages)
|
||||
uvm_page_mask_andnot(&va_block_context->scratch_page_mask, faulted_pages, thrashing_pages);
|
||||
else
|
||||
uvm_page_mask_copy(&va_block_context->scratch_page_mask, faulted_pages);
|
||||
static void compute_prefetch_mask(uvm_va_block_region_t faulted_region,
|
||||
uvm_va_block_region_t max_prefetch_region,
|
||||
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
|
||||
const uvm_page_mask_t *faulted_pages,
|
||||
uvm_page_mask_t *out_prefetch_mask)
|
||||
{
|
||||
uvm_page_index_t page_index;
|
||||
|
||||
// Update the tree using the scratch mask to compute the pages to prefetch
|
||||
for_each_va_block_page_in_region_mask(page_index, &va_block_context->scratch_page_mask, faulted_region) {
|
||||
uvm_page_mask_zero(out_prefetch_mask);
|
||||
|
||||
// Update the tree using the faulted mask to compute the pages to prefetch.
|
||||
for_each_va_block_page_in_region_mask(page_index, faulted_pages, faulted_region) {
|
||||
uvm_va_block_region_t region = compute_prefetch_region(page_index, bitmap_tree, max_prefetch_region);
|
||||
|
||||
uvm_page_mask_region_fill(prefetch_pages, region);
|
||||
uvm_page_mask_region_fill(out_prefetch_mask, region);
|
||||
|
||||
// Early out if we have already prefetched until the end of the VA block
|
||||
if (region.outer == max_prefetch_region.outer)
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Within a block we only allow prefetching to a single processor. Therefore,
|
||||
// if two processors are accessing non-overlapping regions within the same
|
||||
// block they won't benefit from prefetching.
|
||||
//
|
||||
// TODO: Bug 1778034: [uvm] Explore prefetching to different processors within
|
||||
// a VA block.
|
||||
static NvU32 uvm_perf_prefetch_prenotify_fault_migrations(uvm_va_block_t *va_block,
|
||||
uvm_va_block_context_t *va_block_context,
|
||||
uvm_processor_id_t new_residency,
|
||||
const uvm_page_mask_t *faulted_pages,
|
||||
uvm_va_block_region_t faulted_region,
|
||||
uvm_page_mask_t *prefetch_pages,
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree)
{
const uvm_page_mask_t *resident_mask = NULL;
const uvm_va_policy_t *policy = uvm_va_policy_get_region(va_block, faulted_region);
uvm_va_block_region_t max_prefetch_region;
const uvm_page_mask_t *thrashing_pages = uvm_perf_thrashing_get_thrashing_pages(va_block);

if (!uvm_id_equal(va_block->prefetch_info.last_migration_proc_id, new_residency)) {
va_block->prefetch_info.last_migration_proc_id = new_residency;
va_block->prefetch_info.fault_migrations_to_last_proc = 0;
}

// Compute the expanded region that prefetching is allowed from.
if (uvm_va_block_is_hmm(va_block)) {
max_prefetch_region = uvm_hmm_get_prefetch_region(va_block,
va_block_context->hmm.vma,
policy,
uvm_va_block_region_start(va_block, faulted_region));
}
else {
max_prefetch_region = uvm_va_block_region_from_block(va_block);
}

uvm_page_mask_zero(prefetch_pages);

if (UVM_ID_IS_CPU(new_residency) || va_block->gpus[uvm_id_gpu_index(new_residency)] != NULL)
resident_mask = uvm_va_block_resident_mask_get(va_block, new_residency, NUMA_NO_NODE);

// If this is a first-touch fault and the destination processor is the
// preferred location, populate the whole max_prefetch_region.
if (uvm_processor_mask_empty(&va_block->resident) &&
uvm_id_equal(new_residency, policy->preferred_location)) {
uvm_page_mask_region_fill(prefetch_pages, max_prefetch_region);
}
else {
init_bitmap_tree_from_region(bitmap_tree, max_prefetch_region, resident_mask, faulted_pages);

update_bitmap_tree_from_va_block(bitmap_tree,
va_block,
va_block_context,
new_residency,
faulted_pages,
max_prefetch_region);

// Do not compute prefetch regions with faults on pages that are thrashing
if (thrashing_pages)
uvm_page_mask_andnot(&va_block_context->scratch_page_mask, faulted_pages, thrashing_pages);
else
uvm_page_mask_copy(&va_block_context->scratch_page_mask, faulted_pages);

compute_prefetch_mask(faulted_region,
max_prefetch_region,
bitmap_tree,
&va_block_context->scratch_page_mask,
prefetch_pages);
}

done:
// Do not prefetch pages that are going to be migrated/populated due to a
// fault
uvm_page_mask_andnot(prefetch_pages, prefetch_pages, faulted_pages);
@@ -364,31 +411,58 @@ done:
return uvm_page_mask_weight(prefetch_pages);
}

void uvm_perf_prefetch_get_hint(uvm_va_block_t *va_block,
uvm_va_block_context_t *va_block_context,
uvm_processor_id_t new_residency,
const uvm_page_mask_t *faulted_pages,
uvm_va_block_region_t faulted_region,
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
uvm_perf_prefetch_hint_t *out_hint)
bool uvm_perf_prefetch_enabled(uvm_va_space_t *va_space)
{
if (!g_uvm_perf_prefetch_enable)
return false;

UVM_ASSERT(va_space);

return va_space->test.page_prefetch_enabled;
}

void uvm_perf_prefetch_compute_ats(uvm_va_space_t *va_space,
const uvm_page_mask_t *faulted_pages,
uvm_va_block_region_t faulted_region,
uvm_va_block_region_t max_prefetch_region,
const uvm_page_mask_t *residency_mask,
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
uvm_page_mask_t *out_prefetch_mask)
{
UVM_ASSERT(faulted_pages);
UVM_ASSERT(bitmap_tree);
UVM_ASSERT(out_prefetch_mask);

uvm_page_mask_zero(out_prefetch_mask);

if (!uvm_perf_prefetch_enabled(va_space))
return;

init_bitmap_tree_from_region(bitmap_tree, max_prefetch_region, residency_mask, faulted_pages);

compute_prefetch_mask(faulted_region, max_prefetch_region, bitmap_tree, faulted_pages, out_prefetch_mask);
}

void uvm_perf_prefetch_get_hint_va_block(uvm_va_block_t *va_block,
uvm_va_block_context_t *va_block_context,
uvm_processor_id_t new_residency,
const uvm_page_mask_t *faulted_pages,
uvm_va_block_region_t faulted_region,
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
uvm_perf_prefetch_hint_t *out_hint)
{
const uvm_va_policy_t *policy = va_block_context->policy;
uvm_va_space_t *va_space = uvm_va_block_get_va_space(va_block);
uvm_page_mask_t *prefetch_pages = &out_hint->prefetch_pages_mask;
NvU32 pending_prefetch_pages;

uvm_assert_rwsem_locked(&va_space->lock);
uvm_assert_mutex_locked(&va_block->lock);
UVM_ASSERT(uvm_va_block_check_policy_is_valid(va_block, policy, faulted_region));
UVM_ASSERT(uvm_hmm_check_context_vma_is_valid(va_block, va_block_context, faulted_region));
UVM_ASSERT(uvm_hmm_check_context_vma_is_valid(va_block, va_block_context->hmm.vma, faulted_region));

out_hint->residency = UVM_ID_INVALID;
uvm_page_mask_zero(prefetch_pages);

if (!g_uvm_perf_prefetch_enable)
return;

if (!va_space->test.page_prefetch_enabled)
if (!uvm_perf_prefetch_enabled(va_space))
return;

pending_prefetch_pages = uvm_perf_prefetch_prenotify_fault_migrations(va_block,
@@ -61,21 +61,41 @@ typedef struct
// Global initialization function (no clean up needed).
NV_STATUS uvm_perf_prefetch_init(void);

// Returns whether prefetching is enabled in the VA space.
// va_space cannot be NULL.
bool uvm_perf_prefetch_enabled(uvm_va_space_t *va_space);

// Return the prefetch mask with the pages that may be prefetched in an ATS
// block. An ATS block is a system allocated memory block with base aligned to
// UVM_VA_BLOCK_SIZE and a maximum size of UVM_VA_BLOCK_SIZE. The faulted_pages
// mask and faulted_region are the pages being faulted on the given residency.
//
// Only residency_mask can be NULL.
//
// Locking: The caller must hold the va_space lock.
void uvm_perf_prefetch_compute_ats(uvm_va_space_t *va_space,
const uvm_page_mask_t *faulted_pages,
uvm_va_block_region_t faulted_region,
uvm_va_block_region_t max_prefetch_region,
const uvm_page_mask_t *residency_mask,
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
uvm_page_mask_t *out_prefetch_mask);

// Return a hint with the pages that may be prefetched in the block.
// The faulted_pages mask and faulted_region are the pages being migrated to
// the given residency.
// va_block_context must not be NULL, va_block_context->policy must be valid,
// and if the va_block is a HMM block, va_block_context->hmm.vma must be valid
// which also means the va_block_context->mm is not NULL, retained, and locked
// for at least read.
// va_block_context must not be NULL, and if the va_block is a HMM
// block, va_block_context->hmm.vma must be valid which also means the
// va_block_context->mm is not NULL, retained, and locked for at least
// read.
// Locking: The caller must hold the va_space lock and va_block lock.
void uvm_perf_prefetch_get_hint(uvm_va_block_t *va_block,
uvm_va_block_context_t *va_block_context,
uvm_processor_id_t new_residency,
const uvm_page_mask_t *faulted_pages,
uvm_va_block_region_t faulted_region,
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
uvm_perf_prefetch_hint_t *out_hint);
void uvm_perf_prefetch_get_hint_va_block(uvm_va_block_t *va_block,
uvm_va_block_context_t *va_block_context,
uvm_processor_id_t new_residency,
const uvm_page_mask_t *faulted_pages,
uvm_va_block_region_t faulted_region,
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
uvm_perf_prefetch_hint_t *out_hint);

void uvm_perf_prefetch_bitmap_tree_iter_init(const uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
uvm_page_index_t page_index,
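The uvm_perf_prefetch_compute_ats() declaration in the hunk above fixes the call shape for ATS prefetch computation. As an illustrative sketch only (the caller name and surrounding fault-servicing plumbing are hypothetical, not part of this change), a user of the API could look roughly like this:

#include "uvm_perf_prefetch.h"

// Hypothetical ATS fault-servicing helper: compute which extra pages could be
// pulled in alongside the faulted ones. uvm_perf_prefetch_compute_ats()
// zeroes out_prefetch_mask and returns early when prefetching is disabled, so
// the caller only needs to hold the va_space lock and pass valid masks.
static NvU32 example_ats_prefetch_candidates(uvm_va_space_t *va_space,
                                             const uvm_page_mask_t *faulted_pages,
                                             uvm_va_block_region_t faulted_region,
                                             uvm_va_block_region_t max_prefetch_region,
                                             const uvm_page_mask_t *residency_mask,
                                             uvm_perf_prefetch_bitmap_tree_t *bitmap_tree,
                                             uvm_page_mask_t *out_prefetch_mask)
{
    uvm_perf_prefetch_compute_ats(va_space,
                                  faulted_pages,
                                  faulted_region,
                                  max_prefetch_region,
                                  residency_mask,
                                  bitmap_tree,
                                  out_prefetch_mask);

    // Number of pages the caller may choose to prefetch in addition to the
    // pages already being serviced by the fault.
    return uvm_page_mask_weight(out_prefetch_mask);
}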
@@ -164,7 +164,7 @@ typedef struct

uvm_spinlock_t lock;

uvm_va_block_context_t va_block_context;
uvm_va_block_context_t *va_block_context;

// Flag used to avoid scheduling delayed unpinning operations after
// uvm_perf_thrashing_stop has been called.
@@ -601,6 +601,14 @@ static va_space_thrashing_info_t *va_space_thrashing_info_create(uvm_va_space_t

va_space_thrashing = uvm_kvmalloc_zero(sizeof(*va_space_thrashing));
if (va_space_thrashing) {
uvm_va_block_context_t *block_context = uvm_va_block_context_alloc(NULL);

if (!block_context) {
uvm_kvfree(va_space_thrashing);
return NULL;
}

va_space_thrashing->pinned_pages.va_block_context = block_context;
va_space_thrashing->va_space = va_space;

va_space_thrashing_info_init_params(va_space_thrashing);
@@ -621,6 +629,7 @@ static void va_space_thrashing_info_destroy(uvm_va_space_t *va_space)

if (va_space_thrashing) {
uvm_perf_module_type_unset_data(va_space->perf_modules_data, UVM_PERF_MODULE_TYPE_THRASHING);
uvm_va_block_context_free(va_space_thrashing->pinned_pages.va_block_context);
uvm_kvfree(va_space_thrashing);
}
}
@@ -1095,7 +1104,7 @@ static NV_STATUS unmap_remote_pinned_pages(uvm_va_block_t *va_block,
NV_STATUS tracker_status;
uvm_tracker_t local_tracker = UVM_TRACKER_INIT();
uvm_processor_id_t processor_id;
const uvm_va_policy_t *policy = va_block_context->policy;
const uvm_va_policy_t *policy = uvm_va_policy_get(va_block, uvm_va_block_region_start(va_block, region));

uvm_assert_mutex_locked(&va_block->lock);

@@ -1104,7 +1113,7 @@ static NV_STATUS unmap_remote_pinned_pages(uvm_va_block_t *va_block,
!uvm_processor_mask_test(&policy->accessed_by, processor_id));

if (uvm_processor_mask_test(&va_block->resident, processor_id)) {
const uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, processor_id);
const uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, processor_id, NUMA_NO_NODE);

if (!uvm_page_mask_andnot(&va_block_context->caller_page_mask,
&block_thrashing->pinned_pages.mask,
@@ -1141,10 +1150,9 @@ NV_STATUS uvm_perf_thrashing_unmap_remote_pinned_pages_all(uvm_va_block_t *va_bl
{
block_thrashing_info_t *block_thrashing;
uvm_processor_mask_t unmap_processors;
const uvm_va_policy_t *policy = va_block_context->policy;
const uvm_va_policy_t *policy = uvm_va_policy_get_region(va_block, region);

uvm_assert_mutex_locked(&va_block->lock);
UVM_ASSERT(uvm_va_block_check_policy_is_valid(va_block, policy, region));

block_thrashing = thrashing_info_get(va_block);
if (!block_thrashing || !block_thrashing->pages)
@@ -1313,9 +1321,8 @@ void thrashing_event_cb(uvm_perf_event_t event_id, uvm_perf_event_data_t *event_

if (block_thrashing->last_time_stamp == 0 ||
uvm_id_equal(block_thrashing->last_processor, processor_id) ||
time_stamp - block_thrashing->last_time_stamp > va_space_thrashing->params.lapse_ns) {
time_stamp - block_thrashing->last_time_stamp > va_space_thrashing->params.lapse_ns)
goto done;
}

num_block_pages = uvm_va_block_size(va_block) / PAGE_SIZE;

@@ -1804,7 +1811,7 @@ static void thrashing_unpin_pages(struct work_struct *work)
struct delayed_work *dwork = to_delayed_work(work);
va_space_thrashing_info_t *va_space_thrashing = container_of(dwork, va_space_thrashing_info_t, pinned_pages.dwork);
uvm_va_space_t *va_space = va_space_thrashing->va_space;
uvm_va_block_context_t *va_block_context = &va_space_thrashing->pinned_pages.va_block_context;
uvm_va_block_context_t *va_block_context = va_space_thrashing->pinned_pages.va_block_context;

// Take the VA space lock so that VA blocks don't go away during this
// operation.
@@ -1867,8 +1874,6 @@ static void thrashing_unpin_pages(struct work_struct *work)
UVM_ASSERT(uvm_page_mask_test(&block_thrashing->pinned_pages.mask, page_index));

uvm_va_block_context_init(va_block_context, NULL);
va_block_context->policy =
uvm_va_policy_get(va_block, uvm_va_block_cpu_page_address(va_block, page_index));

uvm_perf_thrashing_unmap_remote_pinned_pages_all(va_block,
va_block_context,
@@ -1940,7 +1945,6 @@ void uvm_perf_thrashing_unload(uvm_va_space_t *va_space)

// Make sure that there are not pending work items
if (va_space_thrashing) {
UVM_ASSERT(va_space_thrashing->pinned_pages.in_va_space_teardown);
UVM_ASSERT(list_empty(&va_space_thrashing->pinned_pages.list));

va_space_thrashing_info_destroy(va_space);
@@ -2123,8 +2127,6 @@ NV_STATUS uvm_test_set_page_thrashing_policy(UVM_TEST_SET_PAGE_THRASHING_POLICY_
uvm_va_block_region_t va_block_region = uvm_va_block_region_from_block(va_block);
uvm_va_block_context_t *block_context = uvm_va_space_block_context(va_space, NULL);

block_context->policy = uvm_va_range_get_policy(va_range);

uvm_mutex_lock(&va_block->lock);

// Unmap may split PTEs and require a retry. Needs to be called

@@ -103,11 +103,11 @@ void uvm_perf_thrashing_unload(uvm_va_space_t *va_space);
// Destroy the thrashing detection struct for the given block.
void uvm_perf_thrashing_info_destroy(uvm_va_block_t *va_block);

// Unmap remote mappings from all processors on the pinned pages
// described by region and block_thrashing->pinned pages.
// va_block_context must not be NULL and va_block_context->policy must be valid.
// See the comments for uvm_va_block_check_policy_is_valid() in uvm_va_block.h.
// Locking: the va_block lock must be held.
// Unmap remote mappings from all processors on the pinned pages described by
// region and block_thrashing->pinned pages. va_block_context must not be NULL
// and policy for the region must match. See the comments for
// uvm_va_block_check_policy_is_valid() in uvm_va_block.h. Locking: the
// va_block lock must be held.
NV_STATUS uvm_perf_thrashing_unmap_remote_pinned_pages_all(uvm_va_block_t *va_block,
uvm_va_block_context_t *va_block_context,
uvm_va_block_region_t region);
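The uvm_perf_thrashing.c hunks above switch pinned_pages.va_block_context from an embedded struct to a pointer allocated and freed with the per-VA-space thrashing state. A condensed sketch of that ownership pattern, with error handling trimmed and the example_* wrapper names being illustrative rather than part of the change:

// Allocation path: the block context is created together with the thrashing
// info and freed with it; thrashing_unpin_pages() then uses the pointer
// directly instead of taking the address of an embedded member.
static va_space_thrashing_info_t *example_create(uvm_va_space_t *va_space)
{
    va_space_thrashing_info_t *va_space_thrashing = uvm_kvmalloc_zero(sizeof(*va_space_thrashing));
    uvm_va_block_context_t *block_context;

    if (!va_space_thrashing)
        return NULL;

    block_context = uvm_va_block_context_alloc(NULL);
    if (!block_context) {
        uvm_kvfree(va_space_thrashing);
        return NULL;
    }

    va_space_thrashing->pinned_pages.va_block_context = block_context;
    va_space_thrashing->va_space = va_space;
    return va_space_thrashing;
}

static void example_destroy(va_space_thrashing_info_t *va_space_thrashing)
{
    // Free the context before the containing structure, mirroring
    // va_space_thrashing_info_destroy() above.
    uvm_va_block_context_free(va_space_thrashing->pinned_pages.va_block_context);
    uvm_kvfree(va_space_thrashing);
}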
@@ -3377,76 +3377,47 @@ uvm_gpu_id_t uvm_pmm_devmem_page_to_gpu_id(struct page *page)
return gpu->id;
}

static void evict_orphan_pages(uvm_pmm_gpu_t *pmm, uvm_gpu_chunk_t *chunk)
{
NvU32 i;

UVM_ASSERT(chunk->state == UVM_PMM_GPU_CHUNK_STATE_IS_SPLIT);
UVM_ASSERT(chunk->suballoc);

for (i = 0; i < num_subchunks(chunk); i++) {
uvm_gpu_chunk_t *subchunk = chunk->suballoc->subchunks[i];

uvm_spin_lock(&pmm->list_lock);

if (subchunk->state == UVM_PMM_GPU_CHUNK_STATE_IS_SPLIT) {
uvm_spin_unlock(&pmm->list_lock);

evict_orphan_pages(pmm, subchunk);
continue;
}

if (subchunk->state == UVM_PMM_GPU_CHUNK_STATE_ALLOCATED && subchunk->is_referenced) {
unsigned long pfn = uvm_pmm_gpu_devmem_get_pfn(pmm, subchunk);

// TODO: Bug 3368756: add support for large GPU pages.
UVM_ASSERT(uvm_gpu_chunk_get_size(subchunk) == PAGE_SIZE);
uvm_spin_unlock(&pmm->list_lock);

// The above check for subchunk state is racy because the
// chunk may be freed after the lock is dropped. It is
// still safe to proceed in that case because the struct
// page reference will have dropped to zero and cannot
// have been re-allocated as this is only called during
// GPU teardown. Therefore migrate_device_range() will
// simply fail.
uvm_hmm_pmm_gpu_evict_pfn(pfn);
continue;
}

uvm_spin_unlock(&pmm->list_lock);
}
}

// Free any orphan pages.
// This should be called as part of removing a GPU: after all work is stopped
// and all va_blocks have been destroyed. There normally won't be any
// device private struct page references left but there can be cases after
// fork() where a child process still holds a reference. This function searches
// for pages that still have a reference and migrates the page to the GPU in
// order to release the reference in the CPU page table.
static void uvm_pmm_gpu_free_orphan_pages(uvm_pmm_gpu_t *pmm)
// Check there are no orphan pages. This should be only called as part of
// removing a GPU: after all work is stopped and all va_blocks have been
// destroyed. By now there should be no device-private page references left as
// there are no va_space's left on this GPU and orphan pages should be removed
// by va_space destruction or unregistration from the GPU.
static bool uvm_pmm_gpu_check_orphan_pages(uvm_pmm_gpu_t *pmm)
{
size_t i;
bool ret = true;
unsigned long pfn;
struct range range = pmm->devmem.pagemap.range;

if (!pmm->initialized)
return;

// This is only safe to call during GPU teardown where chunks
// cannot be re-allocated.
UVM_ASSERT(uvm_gpu_retained_count(uvm_pmm_to_gpu(pmm)) == 0);
if (!pmm->initialized || !uvm_hmm_is_enabled_system_wide())
return ret;

// Scan all the root chunks looking for subchunks which are still
// referenced. This is slow, but we only do this when unregistering a GPU
// and is not critical for performance.
// referenced.
for (i = 0; i < pmm->root_chunks.count; i++) {
uvm_gpu_root_chunk_t *root_chunk = &pmm->root_chunks.array[i];

root_chunk_lock(pmm, root_chunk);
if (root_chunk->chunk.state == UVM_PMM_GPU_CHUNK_STATE_IS_SPLIT)
evict_orphan_pages(pmm, &root_chunk->chunk);
ret = false;
root_chunk_unlock(pmm, root_chunk);
}

for (pfn = __phys_to_pfn(range.start); pfn <= __phys_to_pfn(range.end); pfn++) {
struct page *page = pfn_to_page(pfn);

if (!is_device_private_page(page)) {
ret = false;
break;
}

if (page_count(page)) {
ret = false;
break;
}
}

return ret;
}

static void devmem_page_free(struct page *page)
@@ -3479,7 +3450,7 @@ static vm_fault_t devmem_fault(struct vm_fault *vmf)
{
uvm_va_space_t *va_space = vmf->page->zone_device_data;

if (!va_space || va_space->va_space_mm.mm != vmf->vma->vm_mm)
if (!va_space)
return VM_FAULT_SIGBUS;

return uvm_va_space_cpu_fault_hmm(va_space, vmf->vma, vmf);
@@ -3568,8 +3539,9 @@ static void devmem_deinit(uvm_pmm_gpu_t *pmm)
{
}

static void uvm_pmm_gpu_free_orphan_pages(uvm_pmm_gpu_t *pmm)
static bool uvm_pmm_gpu_check_orphan_pages(uvm_pmm_gpu_t *pmm)
{
return true;
}
#endif // UVM_IS_CONFIG_HMM()

@@ -3744,7 +3716,7 @@ void uvm_pmm_gpu_deinit(uvm_pmm_gpu_t *pmm)

gpu = uvm_pmm_to_gpu(pmm);

uvm_pmm_gpu_free_orphan_pages(pmm);
UVM_ASSERT(uvm_pmm_gpu_check_orphan_pages(pmm));
nv_kthread_q_flush(&gpu->parent->lazy_free_q);
UVM_ASSERT(list_empty(&pmm->root_chunks.va_block_lazy_free));
release_free_root_chunks(pmm);
@@ -3820,18 +3792,11 @@ NV_STATUS uvm_test_evict_chunk(UVM_TEST_EVICT_CHUNK_PARAMS *params, struct file
// For virtual mode, look up and retain the block first so that eviction can
// be started without the VA space lock held.
if (params->eviction_mode == UvmTestEvictModeVirtual) {
uvm_va_block_context_t *block_context;
if (mm)
status = uvm_va_block_find_create(va_space, params->address, NULL, &block);
else
status = uvm_va_block_find_create_managed(va_space, params->address, &block);

block_context = uvm_va_block_context_alloc(mm);
if (!block_context) {
status = NV_ERR_NO_MEMORY;
uvm_va_space_up_read(va_space);
uvm_va_space_mm_release_unlock(va_space, mm);
goto out;
}

status = uvm_va_block_find_create(va_space, params->address, block_context, &block);
uvm_va_block_context_free(block_context);
if (status != NV_OK) {
uvm_va_space_up_read(va_space);
uvm_va_space_mm_or_current_release_unlock(va_space, mm);
@@ -749,6 +749,7 @@ NV_STATUS uvm_cpu_chunk_map_gpu(uvm_cpu_chunk_t *chunk, uvm_gpu_t *gpu)
}

static struct page *uvm_cpu_chunk_alloc_page(uvm_chunk_size_t alloc_size,
int nid,
uvm_cpu_chunk_alloc_flags_t alloc_flags)
{
gfp_t kernel_alloc_flags;
@@ -764,18 +765,27 @@ static struct page *uvm_cpu_chunk_alloc_page(uvm_chunk_size_t alloc_size,

kernel_alloc_flags |= GFP_HIGHUSER;

// For allocation sizes higher than PAGE_SIZE, use __GFP_NORETRY in
// order to avoid higher allocation latency from the kernel compacting
// memory to satisfy the request.
// For allocation sizes higher than PAGE_SIZE, use __GFP_NORETRY in order
// to avoid higher allocation latency from the kernel compacting memory to
// satisfy the request.
// Use __GFP_NOWARN to avoid printing allocation failure to the kernel log.
// High order allocation failures are handled gracefully by the caller.
if (alloc_size > PAGE_SIZE)
kernel_alloc_flags |= __GFP_COMP | __GFP_NORETRY;
kernel_alloc_flags |= __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN;

if (alloc_flags & UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO)
kernel_alloc_flags |= __GFP_ZERO;

page = alloc_pages(kernel_alloc_flags, get_order(alloc_size));
if (page && (alloc_flags & UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO))
SetPageDirty(page);
UVM_ASSERT(nid < num_online_nodes());
if (nid == NUMA_NO_NODE)
page = alloc_pages(kernel_alloc_flags, get_order(alloc_size));
else
page = alloc_pages_node(nid, kernel_alloc_flags, get_order(alloc_size));

if (page) {
if (alloc_flags & UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO)
SetPageDirty(page);
}

return page;
}
@@ -805,6 +815,7 @@ static uvm_cpu_physical_chunk_t *uvm_cpu_chunk_create(uvm_chunk_size_t alloc_siz

NV_STATUS uvm_cpu_chunk_alloc(uvm_chunk_size_t alloc_size,
uvm_cpu_chunk_alloc_flags_t alloc_flags,
int nid,
uvm_cpu_chunk_t **new_chunk)
{
uvm_cpu_physical_chunk_t *chunk;
@@ -812,7 +823,7 @@ NV_STATUS uvm_cpu_chunk_alloc(uvm_chunk_size_t alloc_size,

UVM_ASSERT(new_chunk);

page = uvm_cpu_chunk_alloc_page(alloc_size, alloc_flags);
page = uvm_cpu_chunk_alloc_page(alloc_size, nid, alloc_flags);
if (!page)
return NV_ERR_NO_MEMORY;

@@ -847,6 +858,13 @@ NV_STATUS uvm_cpu_chunk_alloc_hmm(struct page *page,
return NV_OK;
}

int uvm_cpu_chunk_get_numa_node(uvm_cpu_chunk_t *chunk)
{
UVM_ASSERT(chunk);
UVM_ASSERT(chunk->page);
return page_to_nid(chunk->page);
}

NV_STATUS uvm_cpu_chunk_split(uvm_cpu_chunk_t *chunk, uvm_cpu_chunk_t **new_chunks)
{
NV_STATUS status = NV_OK;

@@ -304,11 +304,24 @@ uvm_chunk_sizes_mask_t uvm_cpu_chunk_get_allocation_sizes(void);

// Allocate a physical CPU chunk of the specified size.
//
// The nid argument is used to indicate a memory node preference. If the
// value is a memory node ID, the chunk allocation will be attempted on
// that memory node. If the chunk cannot be allocated on that memory node,
// it will be allocated on any memory node allowed by the process's policy.
//
// If the value of nid is a memory node ID that is not in the set of
// current process's allowed memory nodes, it will be allocated on one of the
// nodes in the allowed set.
//
// If the value of nid is NUMA_NO_NODE, the chunk will be allocated from any
// of the allowed memory nodes by the process policy.
//
// If a CPU chunk allocation succeeds, NV_OK is returned. new_chunk will be set
// to point to the newly allocated chunk. On failure, NV_ERR_NO_MEMORY is
// returned.
NV_STATUS uvm_cpu_chunk_alloc(uvm_chunk_size_t alloc_size,
uvm_cpu_chunk_alloc_flags_t flags,
int nid,
uvm_cpu_chunk_t **new_chunk);

// Allocate a HMM CPU chunk.
@@ -375,6 +388,9 @@ static uvm_cpu_logical_chunk_t *uvm_cpu_chunk_to_logical(uvm_cpu_chunk_t *chunk)
return container_of((chunk), uvm_cpu_logical_chunk_t, common);
}

// Return the NUMA node ID of the physical page backing the chunk.
int uvm_cpu_chunk_get_numa_node(uvm_cpu_chunk_t *chunk);

// Free a CPU chunk.
// This may not result in the immediate freeing of the physical pages of the
// chunk if this is a logical chunk and there are other logical chunks holding
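A minimal usage sketch of the extended allocator, assuming only the declarations in the header hunk above (the caller and its fallback behavior are illustrative, not part of the change): request a chunk on a preferred NUMA node and check where it actually landed.

static NV_STATUS example_alloc_chunk_near_node(uvm_chunk_size_t size, int preferred_nid)
{
    uvm_cpu_chunk_t *chunk;
    NV_STATUS status;

    // NUMA_NO_NODE would let the allocator pick any node allowed by the
    // process policy; a specific nid is only a preference.
    status = uvm_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO, preferred_nid, &chunk);
    if (status != NV_OK)
        return status;

    if (uvm_cpu_chunk_get_numa_node(chunk) != preferred_nid) {
        // The kernel satisfied the request from another allowed node; the
        // chunk is still usable, just not local to preferred_nid.
    }

    uvm_cpu_chunk_free(chunk);
    return NV_OK;
}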
@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2017-2019 NVIDIA Corporation
Copyright (c) 2017-2023 NVIDIA Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -664,6 +664,7 @@ done:

static NV_STATUS test_cpu_chunk_alloc(uvm_chunk_size_t size,
uvm_cpu_chunk_alloc_flags_t flags,
int nid,
uvm_cpu_chunk_t **out_chunk)
{
uvm_cpu_chunk_t *chunk;
@@ -675,7 +676,7 @@ static NV_STATUS test_cpu_chunk_alloc(uvm_chunk_size_t size,
// It is possible that the allocation fails due to lack of large pages
// rather than an API issue, which will result in a false negative.
// However, that should be very rare.
TEST_NV_CHECK_RET(uvm_cpu_chunk_alloc(size, flags, &chunk));
TEST_NV_CHECK_RET(uvm_cpu_chunk_alloc(size, flags, nid, &chunk));

// Check general state of the chunk:
// - chunk should be a physical chunk,
@@ -685,6 +686,12 @@ static NV_STATUS test_cpu_chunk_alloc(uvm_chunk_size_t size,
TEST_CHECK_GOTO(uvm_cpu_chunk_get_size(chunk) == size, done);
TEST_CHECK_GOTO(uvm_cpu_chunk_num_pages(chunk) == size / PAGE_SIZE, done);

// It is possible for the kernel to allocate a chunk on a NUMA node other
// than the one requested. However, that should not be an issue with
// sufficient memory on each NUMA node.
if (nid != NUMA_NO_NODE)
TEST_CHECK_GOTO(uvm_cpu_chunk_get_numa_node(chunk) == nid, done);

if (flags & UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO) {
NvU64 *cpu_addr;

@@ -719,7 +726,7 @@ static NV_STATUS test_cpu_chunk_mapping_basic_verify(uvm_gpu_t *gpu,
NvU64 dma_addr;
NV_STATUS status = NV_OK;

TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, flags, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, flags, NUMA_NO_NODE, &chunk));
phys_chunk = uvm_cpu_chunk_to_physical(chunk);

// Check state of the physical chunk:
@@ -763,27 +770,27 @@ static NV_STATUS test_cpu_chunk_mapping_basic(uvm_gpu_t *gpu, uvm_cpu_chunk_allo
return NV_OK;
}

static NV_STATUS test_cpu_chunk_mapping_array(uvm_gpu_t *gpu1, uvm_gpu_t *gpu2, uvm_gpu_t *gpu3)
static NV_STATUS test_cpu_chunk_mapping_array(uvm_gpu_t *gpu0, uvm_gpu_t *gpu1, uvm_gpu_t *gpu2)
{
NV_STATUS status = NV_OK;
uvm_cpu_chunk_t *chunk;
uvm_cpu_physical_chunk_t *phys_chunk;
NvU64 dma_addr_gpu2;
NvU64 dma_addr_gpu1;

TEST_NV_CHECK_RET(test_cpu_chunk_alloc(PAGE_SIZE, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(PAGE_SIZE, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, NUMA_NO_NODE, &chunk));
phys_chunk = uvm_cpu_chunk_to_physical(chunk);

TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu3), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu3), done);
dma_addr_gpu2 = uvm_cpu_chunk_get_gpu_phys_addr(chunk, gpu2->parent);
uvm_cpu_chunk_unmap_gpu_phys(chunk, gpu3->parent);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu1), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu1), done);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu2), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu1), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu2), done);
dma_addr_gpu1 = uvm_cpu_chunk_get_gpu_phys_addr(chunk, gpu1->parent);
uvm_cpu_chunk_unmap_gpu_phys(chunk, gpu2->parent);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu1), done);
TEST_NV_CHECK_GOTO(uvm_cpu_chunk_map_gpu(chunk, gpu0), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu0), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_mapping_access(chunk, gpu1), done);

// DMA mapping addresses for different GPUs live in different IOMMU spaces,
// so it would be perfectly legal for them to have the same IOVA, and even
@@ -793,7 +800,7 @@ static NV_STATUS test_cpu_chunk_mapping_array(uvm_gpu_t *gpu1, uvm_gpu_t *gpu2,
// GPU1. It's true that we may get a false negative if both addresses
// happened to alias and we had a bug in how the addresses are shifted in
// the dense array, but that's better than intermittent failure.
TEST_CHECK_GOTO(uvm_cpu_chunk_get_gpu_phys_addr(chunk, gpu2->parent) == dma_addr_gpu2, done);
TEST_CHECK_GOTO(uvm_cpu_chunk_get_gpu_phys_addr(chunk, gpu1->parent) == dma_addr_gpu1, done);

done:
uvm_cpu_chunk_free(chunk);
@@ -911,7 +918,7 @@ static NV_STATUS test_cpu_chunk_split_and_merge(uvm_gpu_t *gpu)
uvm_cpu_chunk_t *chunk;
NV_STATUS status;

TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, NUMA_NO_NODE, &chunk));
status = do_test_cpu_chunk_split_and_merge(chunk, gpu);
uvm_cpu_chunk_free(chunk);

@@ -993,7 +1000,7 @@ static NV_STATUS test_cpu_chunk_dirty(uvm_gpu_t *gpu)
uvm_cpu_physical_chunk_t *phys_chunk;
size_t num_pages;

TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, NUMA_NO_NODE, &chunk));
phys_chunk = uvm_cpu_chunk_to_physical(chunk);
num_pages = uvm_cpu_chunk_num_pages(chunk);

@@ -1005,7 +1012,7 @@ static NV_STATUS test_cpu_chunk_dirty(uvm_gpu_t *gpu)

uvm_cpu_chunk_free(chunk);

TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_ZERO, NUMA_NO_NODE, &chunk));
phys_chunk = uvm_cpu_chunk_to_physical(chunk);
num_pages = uvm_cpu_chunk_num_pages(chunk);

@@ -1170,13 +1177,35 @@ NV_STATUS test_cpu_chunk_free(uvm_va_space_t *va_space, uvm_processor_mask_t *te
size_t size = uvm_chunk_find_next_size(alloc_sizes, PAGE_SIZE);

for_each_chunk_size_from(size, alloc_sizes) {
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, &chunk));
TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, NUMA_NO_NODE, &chunk));
TEST_NV_CHECK_RET(do_test_cpu_chunk_free(chunk, va_space, test_gpus));
}

return NV_OK;
}

static NV_STATUS test_cpu_chunk_numa_alloc(uvm_va_space_t *va_space)
{
uvm_cpu_chunk_t *chunk;
uvm_chunk_sizes_mask_t alloc_sizes = uvm_cpu_chunk_get_allocation_sizes();
size_t size;

for_each_chunk_size(size, alloc_sizes) {
int nid;

for_each_possible_uvm_node(nid) {
// Do not test CPU allocation on nodes that have no memory or CPU
if (!node_state(nid, N_MEMORY) || !node_state(nid, N_CPU))
continue;

TEST_NV_CHECK_RET(test_cpu_chunk_alloc(size, UVM_CPU_CHUNK_ALLOC_FLAGS_NONE, nid, &chunk));
uvm_cpu_chunk_free(chunk);
}
}

return NV_OK;
}

NV_STATUS uvm_test_cpu_chunk_api(UVM_TEST_CPU_CHUNK_API_PARAMS *params, struct file *filp)
{
uvm_va_space_t *va_space = uvm_va_space_get(filp);
@@ -1197,6 +1226,7 @@ NV_STATUS uvm_test_cpu_chunk_api(UVM_TEST_CPU_CHUNK_API_PARAMS *params, struct f
}

TEST_NV_CHECK_GOTO(test_cpu_chunk_free(va_space, &test_gpus), done);
TEST_NV_CHECK_GOTO(test_cpu_chunk_numa_alloc(va_space), done);

if (uvm_processor_mask_get_gpu_count(&test_gpus) >= 3) {
uvm_gpu_t *gpu2, *gpu3;
@@ -1068,7 +1068,7 @@ static NV_STATUS test_pmm_reverse_map_single(uvm_gpu_t *gpu, uvm_va_space_t *va_
uvm_mutex_lock(&va_block->lock);

is_resident = uvm_processor_mask_test(&va_block->resident, gpu->id) &&
uvm_page_mask_full(uvm_va_block_resident_mask_get(va_block, gpu->id));
uvm_page_mask_full(uvm_va_block_resident_mask_get(va_block, gpu->id, NUMA_NO_NODE));
if (is_resident)
phys_addr = uvm_va_block_gpu_phys_page_address(va_block, 0, gpu);

@@ -1154,7 +1154,7 @@ static NV_STATUS test_pmm_reverse_map_many_blocks(uvm_gpu_t *gpu, uvm_va_space_t
uvm_mutex_lock(&va_block->lock);

// Verify that all pages are populated on the GPU
is_resident = uvm_page_mask_region_full(uvm_va_block_resident_mask_get(va_block, gpu->id),
is_resident = uvm_page_mask_region_full(uvm_va_block_resident_mask_get(va_block, gpu->id, NUMA_NO_NODE),
reverse_mapping->region);

uvm_mutex_unlock(&va_block->lock);
@@ -160,7 +160,7 @@ static NV_STATUS preferred_location_unmap_remote_pages(uvm_va_block_t *va_block,
NV_STATUS status = NV_OK;
NV_STATUS tracker_status;
uvm_tracker_t local_tracker = UVM_TRACKER_INIT();
const uvm_va_policy_t *policy = va_block_context->policy;
const uvm_va_policy_t *policy = uvm_va_policy_get_region(va_block, region);
uvm_processor_id_t preferred_location = policy->preferred_location;
uvm_va_space_t *va_space = uvm_va_block_get_va_space(va_block);
const uvm_page_mask_t *mapped_mask;
@@ -176,7 +176,9 @@ static NV_STATUS preferred_location_unmap_remote_pages(uvm_va_block_t *va_block,
mapped_mask = uvm_va_block_map_mask_get(va_block, preferred_location);

if (uvm_processor_mask_test(&va_block->resident, preferred_location)) {
const uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, preferred_location);
const uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block,
preferred_location,
NUMA_NO_NODE);

if (!uvm_page_mask_andnot(&va_block_context->caller_page_mask, mapped_mask, resident_mask))
goto done;
@@ -279,6 +281,9 @@ static NV_STATUS preferred_location_set(uvm_va_space_t *va_space,
return NV_OK;
}

if (!mm)
return NV_ERR_INVALID_ADDRESS;

return uvm_hmm_set_preferred_location(va_space, preferred_location, base, last_address, out_tracker);
}

@@ -445,7 +450,6 @@ NV_STATUS uvm_va_block_set_accessed_by_locked(uvm_va_block_t *va_block,
NV_STATUS tracker_status;

uvm_assert_mutex_locked(&va_block->lock);
UVM_ASSERT(uvm_va_block_check_policy_is_valid(va_block, va_block_context->policy, region));

status = uvm_va_block_add_mappings(va_block,
va_block_context,
@@ -467,13 +471,13 @@ NV_STATUS uvm_va_block_set_accessed_by(uvm_va_block_t *va_block,
uvm_va_block_region_t region = uvm_va_block_region_from_block(va_block);
NV_STATUS status;
uvm_tracker_t local_tracker = UVM_TRACKER_INIT();
uvm_va_policy_t *policy = uvm_va_range_get_policy(va_block->va_range);

UVM_ASSERT(!uvm_va_block_is_hmm(va_block));
UVM_ASSERT(va_block_context->policy == uvm_va_range_get_policy(va_block->va_range));

// Read duplication takes precedence over SetAccessedBy. Do not add mappings
// if read duplication is enabled.
if (uvm_va_policy_is_read_duplicate(va_block_context->policy, va_space))
if (uvm_va_policy_is_read_duplicate(policy, va_space))
return NV_OK;

status = UVM_VA_BLOCK_LOCK_RETRY(va_block,
@@ -592,8 +596,15 @@ static NV_STATUS accessed_by_set(uvm_va_space_t *va_space,
UVM_ASSERT(va_range_last->node.end >= last_address);
}
else {
// NULL mm case already filtered by uvm_api_range_type_check()
UVM_ASSERT(mm);
UVM_ASSERT(type == UVM_API_RANGE_TYPE_HMM);
status = uvm_hmm_set_accessed_by(va_space, processor_id, set_bit, base, last_address, &local_tracker);
status = uvm_hmm_set_accessed_by(va_space,
processor_id,
set_bit,
base,
last_address,
&local_tracker);
}

done:
@@ -629,7 +640,7 @@ static NV_STATUS va_block_set_read_duplication_locked(uvm_va_block_t *va_block,

for_each_id_in_mask(src_id, &va_block->resident) {
NV_STATUS status;
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, src_id);
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, src_id, NUMA_NO_NODE);

// Calling uvm_va_block_make_resident_read_duplicate will break all
// SetAccessedBy and remote mappings
@@ -656,7 +667,6 @@ NV_STATUS uvm_va_block_set_read_duplication(uvm_va_block_t *va_block,

// TODO: Bug 3660922: need to implement HMM read duplication support.
UVM_ASSERT(!uvm_va_block_is_hmm(va_block));
UVM_ASSERT(va_block_context->policy == uvm_va_range_get_policy(va_block->va_range));

status = UVM_VA_BLOCK_LOCK_RETRY(va_block, &va_block_retry,
va_block_set_read_duplication_locked(va_block,
@@ -675,7 +685,7 @@ static NV_STATUS va_block_unset_read_duplication_locked(uvm_va_block_t *va_block
uvm_processor_id_t processor_id;
uvm_va_block_region_t block_region = uvm_va_block_region_from_block(va_block);
uvm_page_mask_t *break_read_duplication_pages = &va_block_context->caller_page_mask;
const uvm_va_policy_t *policy = va_block_context->policy;
const uvm_va_policy_t *policy = uvm_va_range_get_policy(va_block->va_range);
uvm_processor_id_t preferred_location = policy->preferred_location;
uvm_processor_mask_t accessed_by = policy->accessed_by;

@@ -687,7 +697,7 @@ static NV_STATUS va_block_unset_read_duplication_locked(uvm_va_block_t *va_block
// If preferred_location is set and has resident copies, give it preference
if (UVM_ID_IS_VALID(preferred_location) &&
uvm_processor_mask_test(&va_block->resident, preferred_location)) {
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, preferred_location);
uvm_page_mask_t *resident_mask = uvm_va_block_resident_mask_get(va_block, preferred_location, NUMA_NO_NODE);
bool is_mask_empty = !uvm_page_mask_and(break_read_duplication_pages,
&va_block->read_duplicated_pages,
resident_mask);
@@ -715,7 +725,7 @@ static NV_STATUS va_block_unset_read_duplication_locked(uvm_va_block_t *va_block
if (uvm_id_equal(processor_id, preferred_location))
continue;

resident_mask = uvm_va_block_resident_mask_get(va_block, processor_id);
resident_mask = uvm_va_block_resident_mask_get(va_block, processor_id, NUMA_NO_NODE);
is_mask_empty = !uvm_page_mask_and(break_read_duplication_pages,
&va_block->read_duplicated_pages,
resident_mask);
@@ -757,7 +767,6 @@ NV_STATUS uvm_va_block_unset_read_duplication(uvm_va_block_t *va_block,
uvm_tracker_t local_tracker = UVM_TRACKER_INIT();

UVM_ASSERT(!uvm_va_block_is_hmm(va_block));
UVM_ASSERT(va_block_context->policy == uvm_va_range_get_policy(va_block->va_range));

// Restore all SetAccessedBy mappings
status = UVM_VA_BLOCK_LOCK_RETRY(va_block, &va_block_retry,
@@ -915,7 +924,6 @@ static NV_STATUS system_wide_atomics_set(uvm_va_space_t *va_space, const NvProce
if (va_range->type != UVM_VA_RANGE_TYPE_MANAGED)
continue;

va_block_context->policy = uvm_va_range_get_policy(va_range);
for_each_va_block_in_va_range(va_range, va_block) {
uvm_page_mask_t *non_resident_pages = &va_block_context->caller_page_mask;
40
kernel-open/nvidia-uvm/uvm_processors.c
Normal file
@@ -0,0 +1,40 @@
/*******************************************************************************
Copyright (c) 2023 NVIDIA Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

*******************************************************************************/

#include "uvm_processors.h"

int uvm_find_closest_node_mask(int src, const nodemask_t *mask)
{
int nid;
int closest_nid = NUMA_NO_NODE;

if (node_isset(src, *mask))
return src;

for_each_set_bit(nid, mask->bits, MAX_NUMNODES) {
if (closest_nid == NUMA_NO_NODE || node_distance(src, nid) < node_distance(src, closest_nid))
closest_nid = nid;
}

return closest_nid;
}
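A short behavioral sketch of the new helper, with a hypothetical caller name: given a source node and a candidate mask, it returns src itself when src is in the mask, the node_distance()-closest member otherwise, and NUMA_NO_NODE for an empty mask.

static int example_pick_fallback_node(int src, const nodemask_t *candidates)
{
    int nid = uvm_find_closest_node_mask(src, candidates);

    // The fallback below is this example's policy, not UVM's.
    if (nid == NUMA_NO_NODE)
        nid = src;

    return nid;
}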
@@ -1,5 +1,5 @@
|
||||
/*******************************************************************************
|
||||
Copyright (c) 2016-2019 NVIDIA Corporation
|
||||
Copyright (c) 2016-2023 NVIDIA Corporation
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to
|
||||
@@ -26,6 +26,7 @@
|
||||
|
||||
#include "uvm_linux.h"
|
||||
#include "uvm_common.h"
|
||||
#include <linux/numa.h>
|
||||
|
||||
#define UVM_MAX_UNIQUE_GPU_PAIRS SUM_FROM_0_TO_N(UVM_MAX_GPUS - 1)
|
||||
|
||||
@@ -37,11 +38,11 @@
|
||||
// provide type safety, they are wrapped within the uvm_processor_id_t struct.
|
||||
// The range of valid identifiers needs to cover the maximum number of
|
||||
// supported GPUs on a system plus the CPU. CPU is assigned value 0, and GPUs
|
||||
// range: [1, UVM_ID_MAX_GPUS].
|
||||
// range: [1, UVM_PARENT_ID_MAX_GPUS].
|
||||
//
|
||||
// There are some functions that only expect GPU identifiers and, in order to
|
||||
// make it clearer, the uvm_gpu_id_t alias type is provided. However, as this
|
||||
// type is just a typedef of uvm_processor_id_t, there is no type checking
|
||||
// make it clearer, the uvm_parent_gpu_id_t alias type is provided. However, as
|
||||
// this type is just a typedef of uvm_processor_id_t, there is no type checking
|
||||
// performed by the compiler.
|
||||
//
|
||||
// Identifier value vs index
|
||||
@@ -60,22 +61,25 @@
|
||||
// the GPU within the GPU id space (basically id - 1).
|
||||
//
|
||||
// In the diagram below, MAX_SUB is used to abbreviate
|
||||
// UVM_ID_MAX_SUB_PROCESSORS.
|
||||
// UVM_PARENT_ID_MAX_SUB_PROCESSORS.
|
||||
//
|
||||
// |-------------------------- uvm_processor_id_t ----------------------|
|
||||
// | |
|
||||
// | |----------------------- uvm_gpu_id_t ------------------------||
|
||||
// | | ||
|
||||
// Proc type | CPU | GPU ... GPU ... GPU ||
|
||||
// | | ||
|
||||
// ID values | 0 | 1 ... i+1 ... UVM_ID_MAX_PROCESSORS-1 ||
|
||||
// TODO: Bug 4195538: uvm_parent_processor_id_t is currently but temporarily the
|
||||
// same as uvm_processor_id_t.
|
||||
//
|
||||
// GPU index 0 ... i ... UVM_ID_MAX_GPUS-1
|
||||
// |-------------------------- uvm_parent_processor_id_t ----------------------|
|
||||
// | |
|
||||
// | |----------------------- uvm_parent_gpu_id_t ------------------------||
|
||||
// | | ||
|
||||
// Proc type | CPU | GPU ... GPU ... GPU ||
|
||||
// | | ||
|
||||
// ID values | 0 | 1 ... i+1 ... UVM_PARENT_ID_MAX_PROCESSORS-1 ||
|
||||
//
|
||||
// GPU index 0 ... i ... UVM_PARENT_ID_MAX_GPUS-1
|
||||
// | | | |
|
||||
// | | | |
|
||||
// | |-------------| | |-----------------------------|
|
||||
// | | | |
|
||||
// | | | |
|
||||
// | |-------------| | |------------------------------------|
|
||||
// | | | |
|
||||
// | | | |
|
||||
// GPU index 0 ... MAX_SUB-1 ... i*MAX_SUB ... (i+1)*MAX_SUB-1 ... UVM_GLOBAL_ID_MAX_GPUS-1
|
||||
//
|
||||
// ID values | 0 | 1 ... MAX_SUB ... (i*MAX_SUB)+1 ... (i+1)*MAX_SUB ... UVM_GLOBAL_ID_MAX_PROCESSORS-1 ||
|
||||
@@ -210,7 +214,7 @@ static proc_id_t prefix_fn_mask##_find_first_id(const mask_t *mask)
|
||||
\
|
||||
static proc_id_t prefix_fn_mask##_find_first_gpu_id(const mask_t *mask) \
|
||||
{ \
|
||||
return proc_id_ctor(find_next_bit(mask->bitmap, (maxval), UVM_ID_GPU0_VALUE)); \
|
||||
return proc_id_ctor(find_next_bit(mask->bitmap, (maxval), UVM_PARENT_ID_GPU0_VALUE)); \
|
||||
} \
|
||||
\
|
||||
static proc_id_t prefix_fn_mask##_find_next_id(const mask_t *mask, proc_id_t min_id) \
|
||||
@@ -252,7 +256,7 @@ static NvU32 prefix_fn_mask##_get_gpu_count(const mask_t *mask)
|
||||
{ \
|
||||
NvU32 gpu_count = prefix_fn_mask##_get_count(mask); \
|
||||
\
|
||||
if (prefix_fn_mask##_test(mask, proc_id_ctor(UVM_ID_CPU_VALUE))) \
|
||||
if (prefix_fn_mask##_test(mask, proc_id_ctor(UVM_PARENT_ID_CPU_VALUE))) \
|
||||
--gpu_count; \
|
||||
\
|
||||
return gpu_count; \
|
||||
@@ -261,55 +265,55 @@ static NvU32 prefix_fn_mask##_get_gpu_count(const mask_t *mask)
|
||||
typedef struct
|
||||
{
|
||||
NvU32 val;
|
||||
} uvm_processor_id_t;
|
||||
} uvm_parent_processor_id_t;
|
||||
|
||||
typedef struct
|
||||
{
|
||||
NvU32 val;
|
||||
} uvm_global_processor_id_t;
|
||||
|
||||
typedef uvm_processor_id_t uvm_gpu_id_t;
|
||||
typedef uvm_parent_processor_id_t uvm_parent_gpu_id_t;
|
||||
typedef uvm_global_processor_id_t uvm_global_gpu_id_t;
|
||||
|
||||
// Static value assigned to the CPU
|
||||
#define UVM_ID_CPU_VALUE 0
|
||||
#define UVM_ID_GPU0_VALUE (UVM_ID_CPU_VALUE + 1)
|
||||
#define UVM_PARENT_ID_CPU_VALUE 0
|
||||
#define UVM_PARENT_ID_GPU0_VALUE (UVM_PARENT_ID_CPU_VALUE + 1)
|
||||
|
||||
// ID values for the CPU and first GPU, respectively; the values for both types
|
||||
// of IDs must match to enable sharing of UVM_PROCESSOR_MASK().
|
||||
#define UVM_GLOBAL_ID_CPU_VALUE UVM_ID_CPU_VALUE
|
||||
#define UVM_GLOBAL_ID_GPU0_VALUE UVM_ID_GPU0_VALUE
|
||||
#define UVM_GLOBAL_ID_CPU_VALUE UVM_PARENT_ID_CPU_VALUE
|
||||
#define UVM_GLOBAL_ID_GPU0_VALUE UVM_PARENT_ID_GPU0_VALUE
|
||||
|
||||
// Maximum number of GPUs/processors that can be represented with the id types
|
||||
#define UVM_ID_MAX_GPUS UVM_MAX_GPUS
|
||||
#define UVM_ID_MAX_PROCESSORS UVM_MAX_PROCESSORS
|
||||
#define UVM_PARENT_ID_MAX_GPUS UVM_MAX_GPUS
|
||||
#define UVM_PARENT_ID_MAX_PROCESSORS UVM_MAX_PROCESSORS
|
||||
|
||||
#define UVM_ID_MAX_SUB_PROCESSORS 8
|
||||
#define UVM_PARENT_ID_MAX_SUB_PROCESSORS 8
|
||||
|
||||
#define UVM_GLOBAL_ID_MAX_GPUS (UVM_MAX_GPUS * UVM_ID_MAX_SUB_PROCESSORS)
|
||||
#define UVM_GLOBAL_ID_MAX_GPUS (UVM_PARENT_ID_MAX_GPUS * UVM_PARENT_ID_MAX_SUB_PROCESSORS)
|
||||
#define UVM_GLOBAL_ID_MAX_PROCESSORS (UVM_GLOBAL_ID_MAX_GPUS + 1)
|
||||
|
||||
#define UVM_ID_CPU ((uvm_processor_id_t) { .val = UVM_ID_CPU_VALUE })
|
||||
#define UVM_ID_INVALID ((uvm_processor_id_t) { .val = UVM_ID_MAX_PROCESSORS })
|
||||
#define UVM_PARENT_ID_CPU ((uvm_parent_processor_id_t) { .val = UVM_PARENT_ID_CPU_VALUE })
|
||||
#define UVM_PARENT_ID_INVALID ((uvm_parent_processor_id_t) { .val = UVM_PARENT_ID_MAX_PROCESSORS })
|
||||
#define UVM_GLOBAL_ID_CPU ((uvm_global_processor_id_t) { .val = UVM_GLOBAL_ID_CPU_VALUE })
|
||||
#define UVM_GLOBAL_ID_INVALID ((uvm_global_processor_id_t) { .val = UVM_GLOBAL_ID_MAX_PROCESSORS })
|
||||
|
||||
#define UVM_ID_CHECK_BOUNDS(id) UVM_ASSERT_MSG(id.val <= UVM_ID_MAX_PROCESSORS, "id %u\n", id.val)
|
||||
#define UVM_PARENT_ID_CHECK_BOUNDS(id) UVM_ASSERT_MSG(id.val <= UVM_PARENT_ID_MAX_PROCESSORS, "id %u\n", id.val)
|
||||
|
||||
#define UVM_GLOBAL_ID_CHECK_BOUNDS(id) UVM_ASSERT_MSG(id.val <= UVM_GLOBAL_ID_MAX_PROCESSORS, "id %u\n", id.val)
|
||||
|
||||
static int uvm_id_cmp(uvm_processor_id_t id1, uvm_processor_id_t id2)
|
||||
static int uvm_parent_id_cmp(uvm_parent_processor_id_t id1, uvm_parent_processor_id_t id2)
|
||||
{
|
||||
UVM_ID_CHECK_BOUNDS(id1);
|
||||
UVM_ID_CHECK_BOUNDS(id2);
|
||||
UVM_PARENT_ID_CHECK_BOUNDS(id1);
|
||||
UVM_PARENT_ID_CHECK_BOUNDS(id2);
|
||||
|
||||
return UVM_CMP_DEFAULT(id1.val, id2.val);
|
||||
}
|
||||
|
||||
static bool uvm_id_equal(uvm_processor_id_t id1, uvm_processor_id_t id2)
|
||||
static bool uvm_parent_id_equal(uvm_parent_processor_id_t id1, uvm_parent_processor_id_t id2)
|
||||
{
|
||||
UVM_ID_CHECK_BOUNDS(id1);
|
||||
UVM_ID_CHECK_BOUNDS(id2);
|
||||
UVM_PARENT_ID_CHECK_BOUNDS(id1);
|
||||
UVM_PARENT_ID_CHECK_BOUNDS(id2);
|
||||
|
||||
return id1.val == id2.val;
|
||||
}
|
||||
@@ -330,30 +334,30 @@ static bool uvm_global_id_equal(uvm_global_processor_id_t id1, uvm_global_proces
|
||||
return id1.val == id2.val;
|
||||
}
|
||||
|
||||
#define UVM_ID_IS_CPU(id) uvm_id_equal(id, UVM_ID_CPU)
|
||||
#define UVM_ID_IS_INVALID(id) uvm_id_equal(id, UVM_ID_INVALID)
|
||||
#define UVM_ID_IS_VALID(id) (!UVM_ID_IS_INVALID(id))
|
||||
#define UVM_ID_IS_GPU(id) (!UVM_ID_IS_CPU(id) && !UVM_ID_IS_INVALID(id))
|
||||
#define UVM_PARENT_ID_IS_CPU(id) uvm_parent_id_equal(id, UVM_PARENT_ID_CPU)
|
||||
#define UVM_PARENT_ID_IS_INVALID(id) uvm_parent_id_equal(id, UVM_PARENT_ID_INVALID)
|
||||
#define UVM_PARENT_ID_IS_VALID(id) (!UVM_PARENT_ID_IS_INVALID(id))
|
||||
#define UVM_PARENT_ID_IS_GPU(id) (!UVM_PARENT_ID_IS_CPU(id) && !UVM_PARENT_ID_IS_INVALID(id))
|
||||
|
||||
#define UVM_GLOBAL_ID_IS_CPU(id) uvm_global_id_equal(id, UVM_GLOBAL_ID_CPU)
|
||||
#define UVM_GLOBAL_ID_IS_INVALID(id) uvm_global_id_equal(id, UVM_GLOBAL_ID_INVALID)
|
||||
#define UVM_GLOBAL_ID_IS_VALID(id) (!UVM_GLOBAL_ID_IS_INVALID(id))
|
||||
#define UVM_GLOBAL_ID_IS_GPU(id) (!UVM_GLOBAL_ID_IS_CPU(id) && !UVM_GLOBAL_ID_IS_INVALID(id))
|
||||
|
||||
static uvm_processor_id_t uvm_id_from_value(NvU32 val)
|
||||
static uvm_parent_processor_id_t uvm_parent_id_from_value(NvU32 val)
|
||||
{
|
||||
uvm_processor_id_t ret = { .val = val };
|
||||
uvm_parent_processor_id_t ret = { .val = val };
|
||||
|
||||
UVM_ID_CHECK_BOUNDS(ret);
|
||||
UVM_PARENT_ID_CHECK_BOUNDS(ret);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static uvm_gpu_id_t uvm_gpu_id_from_value(NvU32 val)
|
||||
static uvm_parent_gpu_id_t uvm_parent_gpu_id_from_value(NvU32 val)
|
||||
{
|
||||
uvm_gpu_id_t ret = uvm_id_from_value(val);
|
||||
uvm_parent_gpu_id_t ret = uvm_parent_id_from_value(val);
|
||||
|
||||
UVM_ASSERT(!UVM_ID_IS_CPU(ret));
|
||||
UVM_ASSERT(!UVM_PARENT_ID_IS_CPU(ret));
|
||||
|
||||
return ret;
|
||||
}
|
||||
@@ -376,34 +380,34 @@ static uvm_global_gpu_id_t uvm_global_gpu_id_from_value(NvU32 val)
|
||||
return ret;
|
||||
}
|
||||
|
||||
// Create a GPU id from the given GPU id index (previously obtained via
|
||||
// uvm_id_gpu_index)
|
||||
static uvm_gpu_id_t uvm_gpu_id_from_index(NvU32 index)
|
||||
// Create a parent GPU id from the given parent GPU id index (previously
|
||||
// obtained via uvm_parent_id_gpu_index)
|
||||
static uvm_parent_gpu_id_t uvm_parent_gpu_id_from_index(NvU32 index)
|
||||
{
|
||||
return uvm_gpu_id_from_value(index + UVM_ID_GPU0_VALUE);
|
||||
return uvm_parent_gpu_id_from_value(index + UVM_PARENT_ID_GPU0_VALUE);
|
||||
}
|
||||
|
||||
static uvm_processor_id_t uvm_id_next(uvm_processor_id_t id)
|
||||
static uvm_parent_processor_id_t uvm_parent_id_next(uvm_parent_processor_id_t id)
|
||||
{
|
||||
++id.val;
|
||||
|
||||
UVM_ID_CHECK_BOUNDS(id);
|
||||
UVM_PARENT_ID_CHECK_BOUNDS(id);
|
||||
|
||||
return id;
|
||||
}
|
||||
|
||||
static uvm_gpu_id_t uvm_gpu_id_next(uvm_gpu_id_t id)
|
||||
static uvm_parent_gpu_id_t uvm_parent_gpu_id_next(uvm_parent_gpu_id_t id)
|
||||
{
|
||||
UVM_ASSERT(UVM_ID_IS_GPU(id));
|
||||
UVM_ASSERT(UVM_PARENT_ID_IS_GPU(id));
|
||||
|
||||
++id.val;
|
||||
|
||||
UVM_ID_CHECK_BOUNDS(id);
|
||||
UVM_PARENT_ID_CHECK_BOUNDS(id);
|
||||
|
||||
return id;
|
||||
}
|
||||
|
||||
// Same as uvm_gpu_id_from_index but for uvm_global_processor_id_t
|
||||
// Same as uvm_parent_gpu_id_from_index but for uvm_global_processor_id_t
|
||||
static uvm_global_gpu_id_t uvm_global_gpu_id_from_index(NvU32 index)
|
||||
{
|
||||
return uvm_global_gpu_id_from_value(index + UVM_GLOBAL_ID_GPU0_VALUE);
|
||||
@@ -429,11 +433,11 @@ static uvm_global_gpu_id_t uvm_global_gpu_id_next(uvm_global_gpu_id_t id)
|
||||
return id;
|
||||
}
|
||||
|
||||
// This function returns the numerical value within [0, UVM_ID_MAX_PROCESSORS)
|
||||
// of the given processor id
|
||||
static NvU32 uvm_id_value(uvm_processor_id_t id)
|
||||
// This function returns the numerical value within
|
||||
// [0, UVM_PARENT_ID_MAX_PROCESSORS) of the given parent processor id.
|
||||
static NvU32 uvm_parent_id_value(uvm_parent_processor_id_t id)
|
||||
{
|
||||
UVM_ASSERT(UVM_ID_IS_VALID(id));
|
||||
UVM_ASSERT(UVM_PARENT_ID_IS_VALID(id));
|
||||
|
||||
return id.val;
|
||||
}
|
||||
@@ -448,12 +452,12 @@ static NvU32 uvm_global_id_value(uvm_global_processor_id_t id)
|
||||
}
|
||||
|
||||
// This function returns the index of the given GPU id within the GPU id space
|
||||
// [0, UVM_ID_MAX_GPUS)
|
||||
static NvU32 uvm_id_gpu_index(uvm_gpu_id_t id)
|
||||
// [0, UVM_PARENT_ID_MAX_GPUS)
|
||||
static NvU32 uvm_parent_id_gpu_index(uvm_parent_gpu_id_t id)
|
||||
{
|
||||
UVM_ASSERT(UVM_ID_IS_GPU(id));
|
||||
UVM_ASSERT(UVM_PARENT_ID_IS_GPU(id));
|
||||
|
||||
return id.val - UVM_ID_GPU0_VALUE;
|
||||
return id.val - UVM_PARENT_ID_GPU0_VALUE;
|
||||
}
|
||||
|
||||
// This function returns the index of the given GPU id within the GPU id space
|
||||
@@ -465,61 +469,61 @@ static NvU32 uvm_global_id_gpu_index(const uvm_global_gpu_id_t id)
|
||||
return id.val - UVM_GLOBAL_ID_GPU0_VALUE;
|
||||
}
|
||||
|
||||
static NvU32 uvm_global_id_gpu_index_from_gpu_id(const uvm_gpu_id_t id)
|
||||
static NvU32 uvm_global_id_gpu_index_from_parent_gpu_id(const uvm_parent_gpu_id_t id)
|
||||
{
|
||||
UVM_ASSERT(UVM_ID_IS_GPU(id));
|
||||
UVM_ASSERT(UVM_PARENT_ID_IS_GPU(id));
|
||||
|
||||
return uvm_id_gpu_index(id) * UVM_ID_MAX_SUB_PROCESSORS;
|
||||
return uvm_parent_id_gpu_index(id) * UVM_PARENT_ID_MAX_SUB_PROCESSORS;
|
||||
}
|
||||
|
||||
static NvU32 uvm_id_gpu_index_from_global_gpu_id(const uvm_global_gpu_id_t id)
|
||||
static NvU32 uvm_parent_id_gpu_index_from_global_gpu_id(const uvm_global_gpu_id_t id)
|
||||
{
|
||||
UVM_ASSERT(UVM_GLOBAL_ID_IS_GPU(id));
|
||||
|
||||
return uvm_global_id_gpu_index(id) / UVM_ID_MAX_SUB_PROCESSORS;
|
||||
return uvm_global_id_gpu_index(id) / UVM_PARENT_ID_MAX_SUB_PROCESSORS;
|
||||
}
|
||||
|
||||
static uvm_global_gpu_id_t uvm_global_gpu_id_from_gpu_id(const uvm_gpu_id_t id)
|
||||
static uvm_global_gpu_id_t uvm_global_gpu_id_from_parent_gpu_id(const uvm_parent_gpu_id_t id)
|
||||
{
|
||||
UVM_ASSERT(UVM_ID_IS_GPU(id));
|
||||
UVM_ASSERT(UVM_PARENT_ID_IS_GPU(id));
|
||||
|
||||
return uvm_global_gpu_id_from_index(uvm_global_id_gpu_index_from_gpu_id(id));
|
||||
return uvm_global_gpu_id_from_index(uvm_global_id_gpu_index_from_parent_gpu_id(id));
|
||||
}
|
||||
|
||||
static uvm_global_gpu_id_t uvm_global_gpu_id_from_parent_index(NvU32 index)
|
||||
{
|
||||
UVM_ASSERT(index < UVM_MAX_GPUS);
|
||||
UVM_ASSERT(index < UVM_PARENT_ID_MAX_GPUS);
|
||||
|
||||
return uvm_global_gpu_id_from_gpu_id(uvm_gpu_id_from_value(index + UVM_GLOBAL_ID_GPU0_VALUE));
|
||||
return uvm_global_gpu_id_from_parent_gpu_id(uvm_parent_gpu_id_from_value(index + UVM_GLOBAL_ID_GPU0_VALUE));
|
||||
}
|
||||
|
||||
static uvm_global_gpu_id_t uvm_global_gpu_id_from_sub_processor_index(const uvm_gpu_id_t id, NvU32 sub_index)
|
||||
static uvm_global_gpu_id_t uvm_global_gpu_id_from_sub_processor_index(const uvm_parent_gpu_id_t id, NvU32 sub_index)
|
||||
{
|
||||
NvU32 index;
|
||||
|
||||
UVM_ASSERT(sub_index < UVM_ID_MAX_SUB_PROCESSORS);
|
||||
UVM_ASSERT(sub_index < UVM_PARENT_ID_MAX_SUB_PROCESSORS);
|
||||
|
||||
index = uvm_global_id_gpu_index_from_gpu_id(id) + sub_index;
|
||||
index = uvm_global_id_gpu_index_from_parent_gpu_id(id) + sub_index;
|
||||
return uvm_global_gpu_id_from_index(index);
|
||||
}
|
||||
|
||||
static uvm_gpu_id_t uvm_gpu_id_from_global_gpu_id(const uvm_global_gpu_id_t id)
|
||||
static uvm_parent_gpu_id_t uvm_parent_gpu_id_from_global_gpu_id(const uvm_global_gpu_id_t id)
|
||||
{
|
||||
UVM_ASSERT(UVM_GLOBAL_ID_IS_GPU(id));
|
||||
|
||||
return uvm_gpu_id_from_index(uvm_id_gpu_index_from_global_gpu_id(id));
|
||||
return uvm_parent_gpu_id_from_index(uvm_parent_id_gpu_index_from_global_gpu_id(id));
|
||||
}
|
||||
|
||||
static NvU32 uvm_global_id_sub_processor_index(const uvm_global_gpu_id_t id)
|
||||
{
|
||||
return uvm_global_id_gpu_index(id) % UVM_ID_MAX_SUB_PROCESSORS;
|
||||
return uvm_global_id_gpu_index(id) % UVM_PARENT_ID_MAX_SUB_PROCESSORS;
|
||||
}

UVM_PROCESSOR_MASK(uvm_processor_mask_t, \
uvm_processor_mask, \
UVM_ID_MAX_PROCESSORS, \
uvm_processor_id_t, \
uvm_id_from_value)
UVM_PARENT_ID_MAX_PROCESSORS, \
uvm_parent_processor_id_t, \
uvm_parent_id_from_value)

UVM_PROCESSOR_MASK(uvm_global_processor_mask_t, \
uvm_global_processor_mask, \
@@ -533,19 +537,19 @@ static bool uvm_processor_mask_gpu_subset(const uvm_processor_mask_t *subset, co
{
uvm_processor_mask_t subset_gpus;
uvm_processor_mask_copy(&subset_gpus, subset);
uvm_processor_mask_clear(&subset_gpus, UVM_ID_CPU);
uvm_processor_mask_clear(&subset_gpus, UVM_PARENT_ID_CPU);
return uvm_processor_mask_subset(&subset_gpus, mask);
}

#define for_each_id_in_mask(id, mask) \
for ((id) = uvm_processor_mask_find_first_id(mask); \
UVM_ID_IS_VALID(id); \
(id) = uvm_processor_mask_find_next_id((mask), uvm_id_next(id)))
UVM_PARENT_ID_IS_VALID(id); \
(id) = uvm_processor_mask_find_next_id((mask), uvm_parent_id_next(id)))

#define for_each_gpu_id_in_mask(gpu_id, mask) \
for ((gpu_id) = uvm_processor_mask_find_first_gpu_id((mask)); \
UVM_ID_IS_VALID(gpu_id); \
(gpu_id) = uvm_processor_mask_find_next_id((mask), uvm_gpu_id_next(gpu_id)))
UVM_PARENT_ID_IS_VALID(gpu_id); \
(gpu_id) = uvm_processor_mask_find_next_id((mask), uvm_parent_gpu_id_next(gpu_id)))

#define for_each_global_id_in_mask(id, mask) \
for ((id) = uvm_global_processor_mask_find_first_id(mask); \
@@ -559,21 +563,36 @@ static bool uvm_processor_mask_gpu_subset(const uvm_processor_mask_t *subset, co

// Helper to iterate over all valid gpu ids
#define for_each_gpu_id(i) \
for (i = uvm_gpu_id_from_value(UVM_ID_GPU0_VALUE); UVM_ID_IS_VALID(i); i = uvm_gpu_id_next(i))
for (i = uvm_parent_gpu_id_from_value(UVM_PARENT_ID_GPU0_VALUE); UVM_PARENT_ID_IS_VALID(i); i = uvm_parent_gpu_id_next(i))
#define for_each_global_gpu_id(i) \
for (i = uvm_global_gpu_id_from_value(UVM_GLOBAL_ID_GPU0_VALUE); UVM_GLOBAL_ID_IS_VALID(i); i = uvm_global_gpu_id_next(i))

#define for_each_global_sub_processor_id_in_gpu(id, i) \
for (i = uvm_global_gpu_id_from_gpu_id(id); \
for (i = uvm_global_gpu_id_from_parent_gpu_id(id); \
UVM_GLOBAL_ID_IS_VALID(i) && \
(uvm_global_id_value(i) < uvm_global_id_value(uvm_global_gpu_id_from_gpu_id(id)) + UVM_ID_MAX_SUB_PROCESSORS); \
(uvm_global_id_value(i) < uvm_global_id_value(uvm_global_gpu_id_from_parent_gpu_id(id)) + UVM_PARENT_ID_MAX_SUB_PROCESSORS); \
i = uvm_global_gpu_id_next(i))

// Helper to iterate over all valid processor ids
#define for_each_processor_id(i) for (i = UVM_ID_CPU; UVM_ID_IS_VALID(i); i = uvm_id_next(i))
#define for_each_processor_id(i) for (i = UVM_PARENT_ID_CPU; UVM_PARENT_ID_IS_VALID(i); i = uvm_parent_id_next(i))

#define for_each_global_id(i) for (i = UVM_GLOBAL_ID_CPU; UVM_GLOBAL_ID_IS_VALID(i); i = uvm_global_id_next(i))

// Find the node in mask with the shortest distance (as returned by
// node_distance) for src.
// Note that the search is inclusive of src.
// If mask has no bits set, NUMA_NO_NODE is returned.
int uvm_find_closest_node_mask(int src, const nodemask_t *mask);

// Iterate over all nodes in mask with increasing distance from src.
// Note that this iterator is destructive of the mask.
#define for_each_closest_uvm_node(nid, src, mask) \
for ((nid) = uvm_find_closest_node_mask((src), &(mask)); \
(nid) != NUMA_NO_NODE; \
node_clear((nid), (mask)), (nid) = uvm_find_closest_node_mask((src), &(mask)))

#define for_each_possible_uvm_node(nid) for_each_node_mask((nid), node_possible_map)
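for_each_closest_uvm_node visits nodes in order of increasing distance from src and clears each visited bit, so callers that still need the mask afterwards should iterate over a copy. A usage sketch, assuming only the declarations above (example_pick_closest_node is hypothetical and not part of the change):

static int example_pick_closest_node(int src, const nodemask_t *candidates)
{
    nodemask_t scratch = *candidates;   // the iterator is destructive, so work on a copy
    int nid;

    // The first node visited is the closest one; the search includes src itself.
    for_each_closest_uvm_node(nid, src, scratch)
        return nid;

    // The loop body never ran, so no bits were set in the mask.
    return NUMA_NO_NODE;
}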

static bool uvm_processor_uuid_eq(const NvProcessorUuid *uuid1, const NvProcessorUuid *uuid2)
{
return memcmp(uuid1, uuid2, sizeof(*uuid1)) == 0;
@@ -585,4 +604,78 @@ static void uvm_processor_uuid_copy(NvProcessorUuid *dst, const NvProcessorUuid
memcpy(dst, src, sizeof(*dst));
}

// TODO: Bug 4195538: [uvm][multi-SMC] Get UVM internal data structures ready to
// meet multi-SMC requirements. Temporary aliases, they must be removed once
// the data structures are converted.
typedef uvm_parent_processor_id_t uvm_processor_id_t;
typedef uvm_parent_gpu_id_t uvm_gpu_id_t;

#define UVM_ID_CPU_VALUE UVM_PARENT_ID_CPU_VALUE
#define UVM_ID_GPU0_VALUE UVM_PARENT_ID_GPU0_VALUE
#define UVM_ID_MAX_GPUS UVM_PARENT_ID_MAX_GPUS
#define UVM_ID_MAX_PROCESSORS UVM_PARENT_ID_MAX_PROCESSORS
#define UVM_ID_MAX_SUB_PROCESSORS UVM_PARENT_ID_MAX_SUB_PROCESSORS
#define UVM_ID_CPU UVM_PARENT_ID_CPU
#define UVM_ID_INVALID UVM_PARENT_ID_INVALID

static int uvm_id_cmp(uvm_parent_processor_id_t id1, uvm_parent_processor_id_t id2)
{
return UVM_CMP_DEFAULT(id1.val, id2.val);
}

static bool uvm_id_equal(uvm_parent_processor_id_t id1, uvm_parent_processor_id_t id2)
{
return uvm_parent_id_equal(id1, id2);
}

#define UVM_ID_IS_CPU(id) uvm_id_equal(id, UVM_ID_CPU)
#define UVM_ID_IS_INVALID(id) uvm_id_equal(id, UVM_ID_INVALID)
#define UVM_ID_IS_VALID(id) (!UVM_ID_IS_INVALID(id))
#define UVM_ID_IS_GPU(id) (!UVM_ID_IS_CPU(id) && !UVM_ID_IS_INVALID(id))

static uvm_parent_gpu_id_t uvm_gpu_id_from_value(NvU32 val)
{
return uvm_parent_gpu_id_from_value(val);
}

static NvU32 uvm_id_value(uvm_parent_processor_id_t id)
{
return uvm_parent_id_value(id);
}

static NvU32 uvm_id_gpu_index(uvm_parent_gpu_id_t id)
{
return uvm_parent_id_gpu_index(id);
}

static NvU32 uvm_id_gpu_index_from_global_gpu_id(const uvm_global_gpu_id_t id)
{
return uvm_parent_id_gpu_index_from_global_gpu_id(id);
}

static uvm_parent_gpu_id_t uvm_gpu_id_from_index(NvU32 index)
{
return uvm_parent_gpu_id_from_index(index);
}

static uvm_parent_gpu_id_t uvm_gpu_id_next(uvm_parent_gpu_id_t id)
{
return uvm_parent_gpu_id_next(id);
}

static uvm_parent_gpu_id_t uvm_gpu_id_from_global_gpu_id(const uvm_global_gpu_id_t id)
{
return uvm_parent_gpu_id_from_global_gpu_id(id);
}

static NvU32 uvm_global_id_gpu_index_from_gpu_id(const uvm_parent_gpu_id_t id)
{
return uvm_global_id_gpu_index_from_parent_gpu_id(id);
}

static uvm_global_gpu_id_t uvm_global_gpu_id_from_gpu_id(const uvm_parent_gpu_id_t id)
{
return uvm_global_gpu_id_from_parent_gpu_id(id);
}

#endif
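The aliases above let existing call sites keep using the old uvm_gpu_id_t spellings while every old name forwards to its uvm_parent_* counterpart. A small caller sketch, assuming only the aliases in this hunk (example_gpu_index_via_aliases is hypothetical and not part of the change):

static NvU32 example_gpu_index_via_aliases(uvm_gpu_id_t id)
{
    // uvm_gpu_id_t is now a typedef of uvm_parent_gpu_id_t, so the old-style
    // check and index helper below resolve to the parent-ID implementations.
    UVM_ASSERT(UVM_ID_IS_GPU(id));
    return uvm_id_gpu_index(id);
}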

@@ -391,11 +391,13 @@ uvm_gpu_address_t uvm_push_inline_data_end(uvm_push_inline_data_t *data)
inline_data_address = (NvU64) (uintptr_t)(push->next + 1);
}
else {
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(channel);

// Offset of the inlined data within the push.
inline_data_address = (push->next - push->begin + 1) * UVM_METHOD_SIZE;

// Add GPU VA of the push begin
inline_data_address += uvm_pushbuffer_get_gpu_va_for_push(channel->pool->manager->pushbuffer, push);
inline_data_address += uvm_pushbuffer_get_gpu_va_for_push(pushbuffer, push);
}

// This will place a noop right before the inline data that was written.
@@ -438,10 +440,8 @@ NvU64 *uvm_push_timestamp(uvm_push_t *push)

if (uvm_channel_is_ce(push->channel))
gpu->parent->ce_hal->semaphore_timestamp(push, address.address);
else if (uvm_channel_is_sec2(push->channel))
gpu->parent->sec2_hal->semaphore_timestamp(push, address.address);
else
UVM_ASSERT_MSG(0, "Semaphore release timestamp on an unsupported channel.\n");
gpu->parent->sec2_hal->semaphore_timestamp(push, address.address);

return timestamp;
}

@@ -377,11 +377,6 @@ static bool uvm_push_has_space(uvm_push_t *push, NvU32 free_space)
NV_STATUS uvm_push_begin_fake(uvm_gpu_t *gpu, uvm_push_t *push);
void uvm_push_end_fake(uvm_push_t *push);

static bool uvm_push_is_fake(uvm_push_t *push)
{
return !push->channel;
}

// Begin an inline data fragment in the push
//
// The inline data will be ignored by the GPU, but can be referenced from

@@ -40,10 +40,9 @@

static NvU32 get_push_begin_size(uvm_channel_t *channel)
{
if (uvm_channel_is_sec2(channel)) {
// SEC2 channels allocate CSL signature buffer at the beginning.
// SEC2 channels allocate CSL signature buffer at the beginning.
if (uvm_channel_is_sec2(channel))
return UVM_CONF_COMPUTING_SIGN_BUF_MAX_SIZE + UVM_METHOD_SIZE;
}

return 0;
}
@@ -51,10 +50,14 @@ static NvU32 get_push_begin_size(uvm_channel_t *channel)
// This is the storage required by a semaphore release.
static NvU32 get_push_end_min_size(uvm_channel_t *channel)
{
if (uvm_channel_is_ce(channel)) {
if (uvm_channel_is_wlc(channel)) {
// Space (in bytes) used by uvm_push_end() on a Secure CE channel.
// Note that Secure CE semaphore release pushes two memset and one
uvm_gpu_t *gpu = uvm_channel_get_gpu(channel);

if (uvm_conf_computing_mode_enabled(gpu)) {
if (uvm_channel_is_ce(channel)) {
// Space (in bytes) used by uvm_push_end() on a CE channel when
// the Confidential Computing feature is enabled.
//
// Note that CE semaphore release pushes two memset and one
// encryption method on top of the regular release.
// Memset size
// -------------
@@ -75,43 +78,44 @@ static NvU32 get_push_end_min_size(uvm_channel_t *channel)
//
// TOTAL : 144 Bytes

// Same as CE + LCIC GPPut update + LCIC doorbell
return 24 + 144 + 24 + 24;
}
else if (uvm_channel_is_secure_ce(channel)) {
if (uvm_channel_is_wlc(channel)) {
// Same as CE + LCIC GPPut update + LCIC doorbell
return 24 + 144 + 24 + 24;
}

return 24 + 144;
}
// Space (in bytes) used by uvm_push_end() on a CE channel.
return 24;
}
else if (uvm_channel_is_sec2(channel)) {

UVM_ASSERT(uvm_channel_is_sec2(channel));

// A perfectly aligned inline buffer in SEC2 semaphore release.
// We add UVM_METHOD_SIZE because of the NOP method to reserve
// UVM_CSL_SIGN_AUTH_TAG_SIZE_BYTES (the inline buffer.)
return 48 + UVM_CSL_SIGN_AUTH_TAG_SIZE_BYTES + UVM_METHOD_SIZE;
}

return 0;
UVM_ASSERT(uvm_channel_is_ce(channel));

// Space (in bytes) used by uvm_push_end() on a CE channel.
return 24;
}

static NvU32 get_push_end_max_size(uvm_channel_t *channel)
{
if (uvm_channel_is_ce(channel)) {
if (uvm_channel_is_wlc(channel)) {
// WLC pushes are always padded to UVM_MAX_WLC_PUSH_SIZE
return UVM_MAX_WLC_PUSH_SIZE;
}
// Space (in bytes) used by uvm_push_end() on a CE channel.
return get_push_end_min_size(channel);
}
else if (uvm_channel_is_sec2(channel)) {
// Space (in bytes) used by uvm_push_end() on a SEC2 channel.
// Note that SEC2 semaphore release uses an inline buffer with alignment
// requirements. This is the "worst" case semaphore_release storage.
return 48 + UVM_CSL_SIGN_AUTH_TAG_SIZE_BYTES + UVM_CONF_COMPUTING_AUTH_TAG_ALIGNMENT;
}
// WLC pushes are always padded to UVM_MAX_WLC_PUSH_SIZE
if (uvm_channel_is_wlc(channel))
return UVM_MAX_WLC_PUSH_SIZE;

return 0;
// Space (in bytes) used by uvm_push_end() on a SEC2 channel.
// Note that SEC2 semaphore release uses an inline buffer with alignment
// requirements. This is the "worst" case semaphore_release storage.
if (uvm_channel_is_sec2(channel))
return 48 + UVM_CSL_SIGN_AUTH_TAG_SIZE_BYTES + UVM_CONF_COMPUTING_AUTH_TAG_ALIGNMENT;

UVM_ASSERT(uvm_channel_is_ce(channel));

// Space (in bytes) used by uvm_push_end() on a CE channel.
return get_push_end_min_size(channel);
}

static NV_STATUS test_push_end_size(uvm_va_space_t *va_space)
@@ -294,10 +298,19 @@ static NV_STATUS test_concurrent_pushes(uvm_va_space_t *va_space)
{
NV_STATUS status = NV_OK;
uvm_gpu_t *gpu;
NvU32 i;
uvm_push_t *pushes;
uvm_tracker_t tracker = UVM_TRACKER_INIT();
uvm_channel_type_t channel_type = UVM_CHANNEL_TYPE_GPU_INTERNAL;
uvm_tracker_t tracker;

// When the Confidential Computing feature is enabled, a channel reserved at
// the start of a push cannot be reserved again until that push ends. The
// test is waived, because the number of pushes it starts per pool exceeds
// the number of channels in the pool, so it would block indefinitely.
gpu = uvm_va_space_find_first_gpu(va_space);

if ((gpu != NULL) && uvm_conf_computing_mode_enabled(gpu))
return NV_OK;

uvm_tracker_init(&tracker);

// As noted above, this test does unsafe things that would be detected by
// lock tracking, opt-out.
@@ -310,16 +323,11 @@ static NV_STATUS test_concurrent_pushes(uvm_va_space_t *va_space)
}

for_each_va_space_gpu(gpu, va_space) {
NvU32 i;

// A secure channels reserved at the start of a push cannot be reserved
// again until that push ends. The test would block indefinitely
// if secure pools are not skipped, because the number of pushes started
// per pool exceeds the number of channels in the pool.
if (uvm_channel_type_requires_secure_pool(gpu, channel_type))
goto done;
for (i = 0; i < UVM_PUSH_MAX_CONCURRENT_PUSHES; ++i) {
uvm_push_t *push = &pushes[i];
status = uvm_push_begin(gpu->channel_manager, channel_type, push, "concurrent push %u", i);
status = uvm_push_begin(gpu->channel_manager, UVM_CHANNEL_TYPE_GPU_INTERNAL, push, "concurrent push %u", i);
TEST_CHECK_GOTO(status == NV_OK, done);
}
for (i = 0; i < UVM_PUSH_MAX_CONCURRENT_PUSHES; ++i) {

@@ -458,7 +458,7 @@ static void decrypt_push(uvm_channel_t *channel, uvm_gpfifo_entry_t *gpfifo)
void *push_unprotected_cpu_va;
NvU32 pushbuffer_offset = gpfifo->pushbuffer_offset;
NvU32 push_info_index = gpfifo->push_info - channel->push_infos;
uvm_pushbuffer_t *pushbuffer = channel->pool->manager->pushbuffer;
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(channel);
uvm_push_crypto_bundle_t *crypto_bundle = channel->conf_computing.push_crypto_bundles + push_info_index;

if (channel->conf_computing.push_crypto_bundles == NULL)
@@ -499,7 +499,7 @@ void uvm_pushbuffer_mark_completed(uvm_channel_t *channel, uvm_gpfifo_entry_t *g
uvm_pushbuffer_chunk_t *chunk;
bool need_to_update_chunk = false;
uvm_push_info_t *push_info = gpfifo->push_info;
uvm_pushbuffer_t *pushbuffer = channel->pool->manager->pushbuffer;
uvm_pushbuffer_t *pushbuffer = uvm_channel_get_pushbuffer(channel);

UVM_ASSERT(gpfifo->type == UVM_GPFIFO_ENTRY_TYPE_NORMAL);