Compare commits

..

23 Commits

Author SHA1 Message Date
Bernhard Stoeckner
1795a8bb20 555.58.02 2024-07-01 14:52:22 +02:00
Milos Tijanic
af77e083a2 555.58 2024-06-27 19:21:59 +02:00
Bernhard Stoeckner
78d807e001 555.52.04 2024-06-05 09:39:46 +02:00
Bernhard Stoeckner
5a1c474040 555.42.02 2024-05-21 15:12:40 +02:00
Bernhard Stoeckner
083cd9cf17 550.78 2024-04-25 16:24:58 +02:00
Bernhard Stoeckner
ea4c27fad6 550.76 2024-04-17 17:23:37 +02:00
Bernhard Stoeckner
3bf16b890c 550.67 2024-03-19 16:56:28 +01:00
Bernhard Stoeckner
12933b2d3c 550.54.15 2024-03-18 17:52:11 +01:00
Bernhard Stoeckner
476bd34534 550.54.14 2024-02-23 16:37:56 +01:00
Bernhard Stoeckner
91676d6628 550.40.07 2024-01-24 18:28:48 +01:00
Bernhard Stöckner
bb2dac1f20 Update 20_build_bug.yml 2024-01-23 15:30:14 +01:00
Maneet Singh
4c29105335 545.29.06 2023-11-21 13:38:23 -08:00
Andy Ritger
be3cd9abcb 545.29.02 2023-10-31 16:31:08 -07:00
Andy Ritger
a2f89d6b59 Revert "545.29.03"
This reverts commit f364378a65.

545.29.03 and 545.29.02 are functionally the same for purposes of
open-gpu-kernel-modules, but there was poor NVIDIA-internal communication
about which driver would actually be released.  Revert 545.29.03 so that
a subsequent commit can provide 545.29.02 cleanly.
2023-10-31 16:28:17 -07:00
Maneet Singh
f364378a65 545.29.03 2023-10-31 09:44:03 -07:00
Andy Ritger
b5bf85a8e3 545.23.06 2023-10-17 09:25:29 -07:00
Maneet Singh
f59818b751 535.113.01 2023-09-21 10:43:43 -07:00
Bernhard Stoeckner
a8e01be6b2 535.104.05 2023-08-22 15:09:37 +02:00
Bernhard Stoeckner
12c0739352 535.98 2023-08-08 18:28:38 +02:00
Bernhard Stoeckner
29f830f1bb 535.86.10 2023-07-31 18:17:14 +02:00
Bernhard Stoeckner
337e28efda 535.86.05 2023-07-18 16:00:22 +02:00
Bernhard Stoeckner
22a077c4fe issue template: be clearer about issues with prop driver 2023-07-10 15:58:02 +02:00
Andy Ritger
26458140be 535.54.03 2023-06-14 12:37:59 -07:00
1947 changed files with 419985 additions and 270352 deletions

View File

@@ -1,5 +1,8 @@
name: Report a functional bug 🐛
description: Functional bugs affect operation or stability of the driver and/or hardware.
description: |
Functional bugs affect operation or stability of the driver or hardware.
Bugs with the closed source driver must be reported on the forums (see link on New Issue page below).
labels:
- "bug"
body:
@@ -18,14 +21,12 @@ body:
description: "Which open-gpu-kernel-modules version are you running? Be as specific as possible: SHA is best when built from specific commit."
validations:
required: true
- type: dropdown
- type: checkboxes
id: sw_driver_proprietary
attributes:
label: "Does this happen with the proprietary driver (of the same version) as well?"
label: "Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver."
options:
- "Yes"
- "No"
- "I cannot test this"
- label: "I confirm that this does not happen with the proprietary driver package."
validations:
required: true
- type: input
@@ -42,6 +43,14 @@ body:
description: "Which kernel are you running? (output of `uname -a`, say if you built it yourself)"
validations:
required: true
- type: checkboxes
id: sw_host_kernel_stable
attributes:
label: "Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels."
options:
- label: "I am running on a stable kernel release."
validations:
required: true
- type: input
id: hw_gpu_type
attributes:
@@ -78,7 +87,10 @@ body:
id: bug_report_gz
attributes:
label: nvidia-bug-report.log.gz
description: "Please reproduce the problem, after that run `nvidia-bug-report.sh`, and attach the resulting nvidia-bug-report.log.gz here."
description: |
Please reproduce the problem, after that run `nvidia-bug-report.sh`, and attach the resulting nvidia-bug-report.log.gz here.
Reports without this file will be closed.
placeholder: You can usually just drag & drop the file into this textbox.
validations:
required: true

View File

@@ -32,6 +32,14 @@ body:
description: "Which kernel are you running? (output of `uname -a`, say if you built it yourself)."
validations:
required: true
- type: checkboxes
id: sw_host_kernel_stable
attributes:
label: "Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels."
options:
- label: "I am running on a stable kernel release."
validations:
required: true
- type: textarea
id: bug_description
attributes:

View File

@@ -1,14 +1,14 @@
blank_issues_enabled: false
contact_links:
- name: Report a bug with the proprietary driver
url: https://forums.developer.nvidia.com/c/gpu-graphics/linux/148
about: Bugs that aren't specific to the open source driver in this repository must be reported with the linked forums instead.
- name: Report a cosmetic issue
url: https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/categories/general
about: We are not currently accepting cosmetic-only changes such as whitespace, typos, or simple renames. You can still discuss and collect them on the boards.
- name: Ask a question
url: https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/categories/q-a
about: Unsure of what to click, where to go, what the process for your thing is? We're happy to help. Click to visit the discussion board and say hello!
- name: Report a bug with the proprietary driver
url: https://forums.developer.nvidia.com/c/gpu-graphics/linux/148
about: Bugs that aren't specific to the open source driver in this repository should be reported with the linked forums instead. If you are unsure on what kind of bug you have, feel free to open a thread in Discussions. We're here to help!
- name: Suggest a feature
url: https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/categories/ideas
about: Please do not open Issues for feature requests; instead, suggest and discuss new features on the Github discussion board. If you have a feature you worked on and want to PR it, please also open a discussion before doing so.

View File

@@ -1,7 +1,77 @@
# Changelog
## Release 555 Entries
### [555.58.02] 2024-07-01
### [555.58] 2024-06-27
### [555.52.04] 2024-06-05
### [555.42.02] 2024-05-21
## Release 550 Entries
### [550.78] 2024-04-25
### [550.76] 2024-04-17
### [550.67] 2024-03-19
### [550.54.15] 2024-03-18
### [550.54.14] 2024-02-23
#### Added
- Added vGPU Host and vGPU Guest support. For vGPU Host, please refer to the README.vgpu packaged in the vGPU Host Package for more details.
### [550.40.07] 2024-01-24
#### Fixed
- Set INSTALL_MOD_DIR only if it's not defined, [#570](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/570) by @keelung-yang
## Release 545 Entries
### [545.29.06] 2023-11-22
#### Fixed
- The brightness control of NVIDIA seems to be broken, [#573](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/573)
### [545.29.02] 2023-10-31
### [545.23.06] 2023-10-17
#### Fixed
- Fix always-false conditional, [#493](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/493) by @meme8383
#### Added
- Added beta-quality support for GeForce and Workstation GPUs. Please see the "Open Linux Kernel Modules" chapter in the NVIDIA GPU driver end user README for details.
## Release 535 Entries
### [535.129.03] 2023-10-31
### [535.113.01] 2023-09-21
#### Fixed
- Fixed building main against current centos stream 8 fails, [#550](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/550) by @airlied
### [535.104.05] 2023-08-22
### [535.98] 2023-08-08
### [535.86.10] 2023-07-31
### [535.86.05] 2023-07-18
### [535.54.03] 2023-06-14
### [535.43.02] 2023-05-30
#### Fixed
@@ -34,6 +104,14 @@
## Release 525 Entries
### [525.147.05] 2023-10-31
#### Fixed
- Fix nvidia_p2p_get_pages(): Fix double-free in register-callback error path, [#557](https://github.com/NVIDIA/open-gpu-kernel-modules/pull/557) by @BrendanCunningham
### [525.125.06] 2023-06-26
### [525.116.04] 2023-05-09
### [525.116.03] 2023-04-25

View File

@@ -1,7 +1,7 @@
# NVIDIA Linux Open GPU Kernel Module Source
This is the source release of the NVIDIA Linux open GPU kernel modules,
version 535.43.02.
version 555.58.02.
## How to Build
@@ -17,7 +17,7 @@ as root:
Note that the kernel modules built here must be used with GSP
firmware and user-space NVIDIA GPU driver components from a corresponding
535.43.02 driver release. This can be achieved by installing
555.58.02 driver release. This can be achieved by installing
the NVIDIA GPU driver from the .run file using the `--no-kernel-modules`
option. E.g.,
@@ -74,7 +74,7 @@ kernel.
The NVIDIA open kernel modules support the same range of Linux kernel
versions that are supported with the proprietary NVIDIA kernel modules.
This is currently Linux kernel 3.10 or newer.
This is currently Linux kernel 4.15 or newer.
## How to Contribute
@@ -179,16 +179,19 @@ software applications.
## Compatible GPUs
The open-gpu-kernel-modules can be used on any Turing or later GPU
(see the table below). However, in the 535.43.02 release,
GeForce and Workstation support is still considered alpha-quality.
The NVIDIA open kernel modules can be used on any Turing or later GPU
(see the table below). However, in the __DRIVER_VERION__ release, GeForce and
Workstation support is considered to be Beta quality. The open kernel modules
are suitable for broad usage, and NVIDIA requests feedback on any issues
encountered specific to them.
To enable use of the open kernel modules on GeForce and Workstation GPUs,
set the "NVreg_OpenRmEnableUnsupportedGpus" nvidia.ko kernel module
parameter to 1. For more details, see the NVIDIA GPU driver end user
README here:
For details on feature support and limitations, see the NVIDIA GPU driver
end user README here:
https://us.download.nvidia.com/XFree86/Linux-x86_64/535.43.02/README/kernel_open.html
https://us.download.nvidia.com/XFree86/Linux-x86_64/555.58.02/README/kernel_open.html
For vGPU support, please refer to the README.vgpu packaged in the vGPU Host
Package for more details.
In the below table, if three IDs are listed, the first is the PCI Device
ID, the second is the PCI Subsystem Vendor ID, and the third is the PCI
@@ -648,9 +651,12 @@ Subsystem Device ID.
| NVIDIA T1000 8GB | 1FF0 17AA 1612 |
| NVIDIA T400 4GB | 1FF2 1028 1613 |
| NVIDIA T400 4GB | 1FF2 103C 1613 |
| NVIDIA T400E | 1FF2 103C 18FF |
| NVIDIA T400 4GB | 1FF2 103C 8A80 |
| NVIDIA T400 4GB | 1FF2 10DE 1613 |
| NVIDIA T400E | 1FF2 10DE 18FF |
| NVIDIA T400 4GB | 1FF2 17AA 1613 |
| NVIDIA T400E | 1FF2 17AA 18FF |
| Quadro T1000 | 1FF9 |
| NVIDIA A100-SXM4-40GB | 20B0 |
| NVIDIA A100-PG509-200 | 20B0 10DE 1450 |
@@ -658,12 +664,16 @@ Subsystem Device ID.
| NVIDIA A100-SXM4-80GB | 20B2 10DE 147F |
| NVIDIA A100-SXM4-80GB | 20B2 10DE 1622 |
| NVIDIA A100-SXM4-80GB | 20B2 10DE 1623 |
| NVIDIA PG509-210 | 20B2 10DE 1625 |
| NVIDIA A100-SXM-64GB | 20B3 10DE 14A7 |
| NVIDIA A100-SXM-64GB | 20B3 10DE 14A8 |
| NVIDIA A100 80GB PCIe | 20B5 10DE 1533 |
| NVIDIA A100 80GB PCIe | 20B5 10DE 1642 |
| NVIDIA PG506-232 | 20B6 10DE 1492 |
| NVIDIA A30 | 20B7 10DE 1532 |
| NVIDIA A30 | 20B7 10DE 1804 |
| NVIDIA A30 | 20B7 10DE 1852 |
| NVIDIA A800-SXM4-40GB | 20BD 10DE 17F4 |
| NVIDIA A100-PCIE-40GB | 20F1 10DE 145F |
| NVIDIA A800-SXM4-80GB | 20F3 10DE 179B |
| NVIDIA A800-SXM4-80GB | 20F3 10DE 179C |
@@ -675,6 +685,11 @@ Subsystem Device ID.
| NVIDIA A800-SXM4-80GB | 20F3 10DE 17A2 |
| NVIDIA A800 80GB PCIe | 20F5 10DE 1799 |
| NVIDIA A800 80GB PCIe LC | 20F5 10DE 179A |
| NVIDIA A800 40GB Active | 20F6 1028 180A |
| NVIDIA A800 40GB Active | 20F6 103C 180A |
| NVIDIA A800 40GB Active | 20F6 10DE 180A |
| NVIDIA A800 40GB Active | 20F6 17AA 180A |
| NVIDIA AX800 | 20FD 10DE 17F8 |
| NVIDIA GeForce GTX 1660 Ti | 2182 |
| NVIDIA GeForce GTX 1660 | 2184 |
| NVIDIA GeForce GTX 1650 SUPER | 2187 |
@@ -733,13 +748,22 @@ Subsystem Device ID.
| NVIDIA A10 | 2236 10DE 1482 |
| NVIDIA A10G | 2237 10DE 152F |
| NVIDIA A10M | 2238 10DE 1677 |
| NVIDIA H100 NVL | 2321 10DE 1839 |
| NVIDIA H800 PCIe | 2322 10DE 17A4 |
| NVIDIA H800 | 2324 10DE 17A6 |
| NVIDIA H800 | 2324 10DE 17A8 |
| NVIDIA H20 | 2329 10DE 198B |
| NVIDIA H20 | 2329 10DE 198C |
| NVIDIA H100 80GB HBM3 | 2330 10DE 16C0 |
| NVIDIA H100 80GB HBM3 | 2330 10DE 16C1 |
| NVIDIA H100 PCIe | 2331 10DE 1626 |
| NVIDIA H200 | 2335 10DE 18BE |
| NVIDIA H200 | 2335 10DE 18BF |
| NVIDIA H100 | 2339 10DE 17FC |
| NVIDIA H800 NVL | 233A 10DE 183A |
| NVIDIA GH200 120GB | 2342 10DE 16EB |
| NVIDIA GH200 120GB | 2342 10DE 1805 |
| NVIDIA GH200 480GB | 2342 10DE 1809 |
| NVIDIA GeForce RTX 3060 Ti | 2414 |
| NVIDIA GeForce RTX 3080 Ti Laptop GPU | 2420 |
| NVIDIA RTX A5500 Laptop GPU | 2438 |
@@ -792,6 +816,7 @@ Subsystem Device ID.
| NVIDIA RTX A2000 12GB | 2571 10DE 1611 |
| NVIDIA RTX A2000 12GB | 2571 17AA 1611 |
| NVIDIA GeForce RTX 3050 | 2582 |
| NVIDIA GeForce RTX 3050 | 2584 |
| NVIDIA GeForce RTX 3050 Ti Laptop GPU | 25A0 |
| NVIDIA GeForce RTX 3050Ti Laptop GPU | 25A0 103C 8928 |
| NVIDIA GeForce RTX 3050Ti Laptop GPU | 25A0 103C 89F9 |
@@ -807,6 +832,14 @@ Subsystem Device ID.
| NVIDIA GeForce RTX 3050 4GB Laptop GPU | 25AB |
| NVIDIA GeForce RTX 3050 6GB Laptop GPU | 25AC |
| NVIDIA GeForce RTX 2050 | 25AD |
| NVIDIA RTX A1000 | 25B0 1028 1878 |
| NVIDIA RTX A1000 | 25B0 103C 1878 |
| NVIDIA RTX A1000 | 25B0 10DE 1878 |
| NVIDIA RTX A1000 | 25B0 17AA 1878 |
| NVIDIA RTX A400 | 25B2 1028 1879 |
| NVIDIA RTX A400 | 25B2 103C 1879 |
| NVIDIA RTX A400 | 25B2 10DE 1879 |
| NVIDIA RTX A400 | 25B2 17AA 1879 |
| NVIDIA A16 | 25B6 10DE 14A9 |
| NVIDIA A2 | 25B6 10DE 157E |
| NVIDIA RTX A2000 Laptop GPU | 25B8 |
@@ -824,34 +857,78 @@ Subsystem Device ID.
| NVIDIA RTX A2000 Embedded GPU | 25FA |
| NVIDIA RTX A500 Embedded GPU | 25FB |
| NVIDIA GeForce RTX 4090 | 2684 |
| NVIDIA GeForce RTX 4090 D | 2685 |
| NVIDIA GeForce RTX 4070 Ti SUPER | 2689 |
| NVIDIA RTX 6000 Ada Generation | 26B1 1028 16A1 |
| NVIDIA RTX 6000 Ada Generation | 26B1 103C 16A1 |
| NVIDIA RTX 6000 Ada Generation | 26B1 10DE 16A1 |
| NVIDIA RTX 6000 Ada Generation | 26B1 17AA 16A1 |
| NVIDIA RTX 5000 Ada Generation | 26B2 1028 17FA |
| NVIDIA RTX 5000 Ada Generation | 26B2 103C 17FA |
| NVIDIA RTX 5000 Ada Generation | 26B2 10DE 17FA |
| NVIDIA RTX 5000 Ada Generation | 26B2 17AA 17FA |
| NVIDIA RTX 5880 Ada Generation | 26B3 1028 1934 |
| NVIDIA RTX 5880 Ada Generation | 26B3 103C 1934 |
| NVIDIA RTX 5880 Ada Generation | 26B3 10DE 1934 |
| NVIDIA RTX 5880 Ada Generation | 26B3 17AA 1934 |
| NVIDIA L40 | 26B5 10DE 169D |
| NVIDIA L40 | 26B5 10DE 17DA |
| NVIDIA L40S | 26B9 10DE 1851 |
| NVIDIA L40S | 26B9 10DE 18CF |
| NVIDIA L20 | 26BA 10DE 1957 |
| NVIDIA L20 | 26BA 10DE 1990 |
| NVIDIA GeForce RTX 4080 SUPER | 2702 |
| NVIDIA GeForce RTX 4080 | 2704 |
| NVIDIA GeForce RTX 4070 Ti SUPER | 2705 |
| NVIDIA GeForce RTX 4070 | 2709 |
| NVIDIA GeForce RTX 4090 Laptop GPU | 2717 |
| NVIDIA RTX 5000 Ada Generation Laptop GPU | 2730 |
| NVIDIA GeForce RTX 4090 Laptop GPU | 2757 |
| NVIDIA RTX 5000 Ada Generation Embedded GPU | 2770 |
| NVIDIA GeForce RTX 4070 Ti | 2782 |
| NVIDIA GeForce RTX 4070 SUPER | 2783 |
| NVIDIA GeForce RTX 4070 | 2786 |
| NVIDIA GeForce RTX 4060 Ti | 2788 |
| NVIDIA GeForce RTX 4080 Laptop GPU | 27A0 |
| NVIDIA RTX 4000 SFF Ada Generation | 27B0 1028 16FA |
| NVIDIA RTX 4000 SFF Ada Generation | 27B0 103C 16FA |
| NVIDIA RTX 4000 SFF Ada Generation | 27B0 10DE 16FA |
| NVIDIA RTX 4000 SFF Ada Generation | 27B0 17AA 16FA |
| NVIDIA RTX 4500 Ada Generation | 27B1 1028 180C |
| NVIDIA RTX 4500 Ada Generation | 27B1 103C 180C |
| NVIDIA RTX 4500 Ada Generation | 27B1 10DE 180C |
| NVIDIA RTX 4500 Ada Generation | 27B1 17AA 180C |
| NVIDIA RTX 4000 Ada Generation | 27B2 1028 181B |
| NVIDIA RTX 4000 Ada Generation | 27B2 103C 181B |
| NVIDIA RTX 4000 Ada Generation | 27B2 10DE 181B |
| NVIDIA RTX 4000 Ada Generation | 27B2 17AA 181B |
| NVIDIA L2 | 27B6 10DE 1933 |
| NVIDIA L4 | 27B8 10DE 16CA |
| NVIDIA L4 | 27B8 10DE 16EE |
| NVIDIA RTX 4000 Ada Generation Laptop GPU | 27BA |
| NVIDIA RTX 3500 Ada Generation Laptop GPU | 27BB |
| NVIDIA GeForce RTX 4080 Laptop GPU | 27E0 |
| NVIDIA RTX 3500 Ada Generation Embedded GPU | 27FB |
| NVIDIA GeForce RTX 4060 Ti | 2803 |
| NVIDIA GeForce RTX 4060 Ti | 2805 |
| NVIDIA GeForce RTX 4060 | 2808 |
| NVIDIA GeForce RTX 4070 Laptop GPU | 2820 |
| NVIDIA RTX 3000 Ada Generation Laptop GPU | 2838 |
| NVIDIA GeForce RTX 4070 Laptop GPU | 2860 |
| NVIDIA GeForce RTX 4060 | 2882 |
| NVIDIA GeForce RTX 4060 Laptop GPU | 28A0 |
| NVIDIA GeForce RTX 4050 Laptop GPU | 28A1 |
| NVIDIA RTX 2000 Ada Generation | 28B0 1028 1870 |
| NVIDIA RTX 2000 Ada Generation | 28B0 103C 1870 |
| NVIDIA RTX 2000E Ada Generation | 28B0 103C 1871 |
| NVIDIA RTX 2000 Ada Generation | 28B0 10DE 1870 |
| NVIDIA RTX 2000E Ada Generation | 28B0 10DE 1871 |
| NVIDIA RTX 2000 Ada Generation | 28B0 17AA 1870 |
| NVIDIA RTX 2000E Ada Generation | 28B0 17AA 1871 |
| NVIDIA RTX 2000 Ada Generation Laptop GPU | 28B8 |
| NVIDIA RTX 1000 Ada Generation Laptop GPU | 28B9 |
| NVIDIA RTX 500 Ada Generation Laptop GPU | 28BA |
| NVIDIA RTX 500 Ada Generation Laptop GPU | 28BB |
| NVIDIA GeForce RTX 4060 Laptop GPU | 28E0 |
| NVIDIA GeForce RTX 4050 Laptop GPU | 28E1 |
| NVIDIA RTX 2000 Ada Generation Embedded GPU | 28F8 |

View File

@@ -70,14 +70,26 @@ $(foreach _module, $(NV_KERNEL_MODULES), \
EXTRA_CFLAGS += -I$(src)/common/inc
EXTRA_CFLAGS += -I$(src)
EXTRA_CFLAGS += -Wall $(DEFINES) $(INCLUDES) -Wno-cast-qual -Wno-error -Wno-format-extra-args
EXTRA_CFLAGS += -Wall $(DEFINES) $(INCLUDES) -Wno-cast-qual -Wno-format-extra-args
EXTRA_CFLAGS += -D__KERNEL__ -DMODULE -DNVRM
EXTRA_CFLAGS += -DNV_VERSION_STRING=\"535.43.02\"
EXTRA_CFLAGS += -DNV_VERSION_STRING=\"555.58.02\"
ifneq ($(SYSSRCHOST1X),)
EXTRA_CFLAGS += -I$(SYSSRCHOST1X)
endif
# Some Android kernels prohibit driver use of filesystem functions like
# filp_open() and kernel_read(). Disable the NV_FILESYSTEM_ACCESS_AVAILABLE
# functionality that uses those functions when building for Android.
PLATFORM_IS_ANDROID ?= 0
ifeq ($(PLATFORM_IS_ANDROID),1)
EXTRA_CFLAGS += -DNV_FILESYSTEM_ACCESS_AVAILABLE=0
else
EXTRA_CFLAGS += -DNV_FILESYSTEM_ACCESS_AVAILABLE=1
endif
EXTRA_CFLAGS += -Wno-unused-function
ifneq ($(NV_BUILD_TYPE),debug)
@@ -92,7 +104,6 @@ endif
ifeq ($(NV_BUILD_TYPE),debug)
EXTRA_CFLAGS += -g
EXTRA_CFLAGS += $(call cc-option,-gsplit-dwarf,)
endif
EXTRA_CFLAGS += -ffreestanding
@@ -107,7 +118,7 @@ ifeq ($(ARCH),x86_64)
endif
ifeq ($(ARCH),powerpc)
EXTRA_CFLAGS += -mlittle-endian -mno-strict-align -mno-altivec
EXTRA_CFLAGS += -mlittle-endian -mno-strict-align
endif
EXTRA_CFLAGS += -DNV_UVM_ENABLE
@@ -123,6 +134,16 @@ ifneq ($(wildcard /proc/sgi_uv),)
EXTRA_CFLAGS += -DNV_CONFIG_X86_UV
endif
ifdef VGX_FORCE_VFIO_PCI_CORE
EXTRA_CFLAGS += -DNV_VGPU_FORCE_VFIO_PCI_CORE
endif
WARNINGS_AS_ERRORS ?=
ifeq ($(WARNINGS_AS_ERRORS),1)
ccflags-y += -Werror
else
ccflags-y += -Wno-error
endif
#
# The conftest.sh script tests various aspects of the target kernel.
@@ -149,6 +170,10 @@ NV_CONFTEST_CMD := /bin/sh $(NV_CONFTEST_SCRIPT) \
NV_CFLAGS_FROM_CONFTEST := $(shell $(NV_CONFTEST_CMD) build_cflags)
NV_CONFTEST_CFLAGS = $(NV_CFLAGS_FROM_CONFTEST) $(EXTRA_CFLAGS) -fno-pie
NV_CONFTEST_CFLAGS += $(call cc-disable-warning,pointer-sign)
NV_CONFTEST_CFLAGS += $(call cc-option,-fshort-wchar,)
NV_CONFTEST_CFLAGS += $(call cc-option,-Werror=incompatible-pointer-types,)
NV_CONFTEST_CFLAGS += -Wno-error
NV_CONFTEST_COMPILE_TEST_HEADERS := $(obj)/conftest/macros.h
NV_CONFTEST_COMPILE_TEST_HEADERS += $(obj)/conftest/functions.h
@@ -208,99 +233,7 @@ $(obj)/conftest/patches.h: $(NV_CONFTEST_SCRIPT)
@mkdir -p $(obj)/conftest
@$(NV_CONFTEST_CMD) patch_check > $@
# Each of these headers is checked for presence with a test #include; a
# corresponding #define will be generated in conftest/headers.h.
NV_HEADER_PRESENCE_TESTS = \
asm/system.h \
drm/drmP.h \
drm/drm_auth.h \
drm/drm_gem.h \
drm/drm_crtc.h \
drm/drm_color_mgmt.h \
drm/drm_atomic.h \
drm/drm_atomic_helper.h \
drm/drm_atomic_state_helper.h \
drm/drm_encoder.h \
drm/drm_atomic_uapi.h \
drm/drm_drv.h \
drm/drm_framebuffer.h \
drm/drm_connector.h \
drm/drm_probe_helper.h \
drm/drm_blend.h \
drm/drm_fourcc.h \
drm/drm_prime.h \
drm/drm_plane.h \
drm/drm_vblank.h \
drm/drm_file.h \
drm/drm_ioctl.h \
drm/drm_device.h \
drm/drm_mode_config.h \
drm/drm_modeset_lock.h \
dt-bindings/interconnect/tegra_icc_id.h \
generated/autoconf.h \
generated/compile.h \
generated/utsrelease.h \
linux/efi.h \
linux/kconfig.h \
linux/platform/tegra/mc_utils.h \
linux/printk.h \
linux/ratelimit.h \
linux/prio_tree.h \
linux/log2.h \
linux/of.h \
linux/bug.h \
linux/sched.h \
linux/sched/mm.h \
linux/sched/signal.h \
linux/sched/task.h \
linux/sched/task_stack.h \
xen/ioemu.h \
linux/fence.h \
linux/dma-resv.h \
soc/tegra/chip-id.h \
soc/tegra/fuse.h \
soc/tegra/tegra_bpmp.h \
video/nv_internal.h \
linux/platform/tegra/dce/dce-client-ipc.h \
linux/nvhost.h \
linux/nvhost_t194.h \
linux/host1x-next.h \
asm/book3s/64/hash-64k.h \
asm/set_memory.h \
asm/prom.h \
asm/powernv.h \
linux/atomic.h \
asm/barrier.h \
asm/opal-api.h \
sound/hdaudio.h \
asm/pgtable_types.h \
linux/stringhash.h \
linux/dma-map-ops.h \
rdma/peer_mem.h \
sound/hda_codec.h \
linux/dma-buf.h \
linux/time.h \
linux/platform_device.h \
linux/mutex.h \
linux/reset.h \
linux/of_platform.h \
linux/of_device.h \
linux/of_gpio.h \
linux/gpio.h \
linux/gpio/consumer.h \
linux/interconnect.h \
linux/pm_runtime.h \
linux/clk.h \
linux/clk-provider.h \
linux/ioasid.h \
linux/stdarg.h \
linux/iosys-map.h \
asm/coco.h \
linux/vfio_pci_core.h \
linux/mdev.h \
soc/tegra/bpmp-abi.h \
soc/tegra/bpmp.h
include $(src)/header-presence-tests.mk
# Filename to store the define for the header in $(1); this is only consumed by
# the rule below that concatenates all of these together.

View File

@@ -28,7 +28,7 @@ else
else
KERNEL_UNAME ?= $(shell uname -r)
KERNEL_MODLIB := /lib/modules/$(KERNEL_UNAME)
KERNEL_SOURCES := $(shell test -d $(KERNEL_MODLIB)/source && echo $(KERNEL_MODLIB)/source || echo $(KERNEL_MODLIB)/build)
KERNEL_SOURCES := $(shell ((test -d $(KERNEL_MODLIB)/source && echo $(KERNEL_MODLIB)/source) || (test -d $(KERNEL_MODLIB)/build/source && echo $(KERNEL_MODLIB)/build/source)) || echo $(KERNEL_MODLIB)/build)
endif
KERNEL_OUTPUT := $(KERNEL_SOURCES)
@@ -42,7 +42,11 @@ else
else
KERNEL_UNAME ?= $(shell uname -r)
KERNEL_MODLIB := /lib/modules/$(KERNEL_UNAME)
ifeq ($(KERNEL_SOURCES), $(KERNEL_MODLIB)/source)
# $(filter patter...,text) - Returns all whitespace-separated words in text that
# do match any of the pattern words, removing any words that do not match.
# Set the KERNEL_OUTPUT only if either $(KERNEL_MODLIB)/source or
# $(KERNEL_MODLIB)/build/source path matches the KERNEL_SOURCES.
ifneq ($(filter $(KERNEL_SOURCES),$(KERNEL_MODLIB)/source $(KERNEL_MODLIB)/build/source),)
KERNEL_OUTPUT := $(KERNEL_MODLIB)/build
KBUILD_PARAMS := KBUILD_OUTPUT=$(KERNEL_OUTPUT)
endif
@@ -57,12 +61,15 @@ else
-e 's/armv[0-7]\w\+/arm/' \
-e 's/aarch64/arm64/' \
-e 's/ppc64le/powerpc/' \
-e 's/riscv64/riscv/' \
)
endif
NV_KERNEL_MODULES ?= $(wildcard nvidia nvidia-uvm nvidia-vgpu-vfio nvidia-modeset nvidia-drm nvidia-peermem)
NV_KERNEL_MODULES := $(filter-out $(NV_EXCLUDE_KERNEL_MODULES), \
$(NV_KERNEL_MODULES))
INSTALL_MOD_DIR ?= kernel/drivers/video
NV_VERBOSE ?=
SPECTRE_V2_RETPOLINE ?= 0
@@ -74,7 +81,7 @@ else
KBUILD_PARAMS += NV_KERNEL_SOURCES=$(KERNEL_SOURCES)
KBUILD_PARAMS += NV_KERNEL_OUTPUT=$(KERNEL_OUTPUT)
KBUILD_PARAMS += NV_KERNEL_MODULES="$(NV_KERNEL_MODULES)"
KBUILD_PARAMS += INSTALL_MOD_DIR=kernel/drivers/video
KBUILD_PARAMS += INSTALL_MOD_DIR="$(INSTALL_MOD_DIR)"
KBUILD_PARAMS += NV_SPECTRE_V2=$(SPECTRE_V2_RETPOLINE)
.PHONY: modules module clean clean_conftest modules_install

View File

@@ -0,0 +1,43 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
#ifndef _NV_CHARDEV_NUMBERS_H_
#define _NV_CHARDEV_NUMBERS_H_
// NVIDIA's reserved major character device number (Linux).
#define NV_MAJOR_DEVICE_NUMBER 195
// Minor numbers 0 to 247 reserved for regular devices
#define NV_MINOR_DEVICE_NUMBER_REGULAR_MAX 247
// Minor numbers 248 to 253 currently unused
// Minor number 254 reserved for the modeset device (provided by NVKMS)
#define NV_MINOR_DEVICE_NUMBER_MODESET_DEVICE 254
// Minor number 255 reserved for the control device
#define NV_MINOR_DEVICE_NUMBER_CONTROL_DEVICE 255
#endif // _NV_CHARDEV_NUMBERS_H_

View File

@@ -37,13 +37,11 @@ typedef enum _HYPERVISOR_TYPE
OS_HYPERVISOR_UNKNOWN
} HYPERVISOR_TYPE;
#define CMD_VGPU_VFIO_WAKE_WAIT_QUEUE 0
#define CMD_VGPU_VFIO_INJECT_INTERRUPT 1
#define CMD_VGPU_VFIO_REGISTER_MDEV 2
#define CMD_VGPU_VFIO_PRESENT 3
#define CMD_VFIO_PCI_CORE_PRESENT 4
#define CMD_VFIO_WAKE_REMOVE_GPU 1
#define CMD_VGPU_VFIO_PRESENT 2
#define CMD_VFIO_PCI_CORE_PRESENT 3
#define MAX_VF_COUNT_PER_GPU 64
#define MAX_VF_COUNT_PER_GPU 64
typedef enum _VGPU_TYPE_INFO
{
@@ -54,17 +52,11 @@ typedef enum _VGPU_TYPE_INFO
typedef struct
{
void *vgpuVfioRef;
void *waitQueue;
void *nv;
NvU32 *vgpuTypeIds;
NvU8 **vgpuNames;
NvU32 numVgpuTypes;
NvU32 domain;
NvU8 bus;
NvU8 slot;
NvU8 function;
NvBool is_virtfn;
NvU32 domain;
NvU32 bus;
NvU32 device;
NvU32 return_status;
} vgpu_vfio_info;
typedef struct

View File

@@ -25,14 +25,12 @@
#ifndef NV_IOCTL_NUMA_H
#define NV_IOCTL_NUMA_H
#if defined(NV_LINUX)
#include <nv-ioctl-numbers.h>
#if defined(NV_KERNEL_INTERFACE_LAYER)
#if defined(NV_KERNEL_INTERFACE_LAYER) && defined(NV_LINUX)
#include <linux/types.h>
#elif defined (NV_KERNEL_INTERFACE_LAYER) && defined(NV_BSD)
#include <sys/stdint.h>
#else
#include <stdint.h>
@@ -81,5 +79,3 @@ typedef struct nv_ioctl_set_numa_status
#define NV_IOCTL_NUMA_STATUS_OFFLINE_FAILED 6
#endif
#endif

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2020-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2020-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -39,5 +39,6 @@
#define NV_ESC_QUERY_DEVICE_INTR (NV_IOCTL_BASE + 13)
#define NV_ESC_SYS_PARAMS (NV_IOCTL_BASE + 14)
#define NV_ESC_EXPORT_TO_DMABUF_FD (NV_IOCTL_BASE + 17)
#define NV_ESC_WAIT_OPEN_COMPLETE (NV_IOCTL_BASE + 18)
#endif

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2020-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2020-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -142,4 +142,10 @@ typedef struct nv_ioctl_export_to_dma_buf_fd
NvU32 status;
} nv_ioctl_export_to_dma_buf_fd_t;
typedef struct nv_ioctl_wait_open_complete
{
int rc;
NvU32 adapterStatus;
} nv_ioctl_wait_open_complete_t;
#endif

View File

@@ -0,0 +1,62 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
#ifndef __NV_KTHREAD_QUEUE_OS_H__
#define __NV_KTHREAD_QUEUE_OS_H__
#include <linux/types.h> // atomic_t
#include <linux/list.h> // list
#include <linux/sched.h> // task_struct
#include <linux/numa.h> // NUMA_NO_NODE
#include <linux/semaphore.h>
#include "conftest.h"
struct nv_kthread_q
{
struct list_head q_list_head;
spinlock_t q_lock;
// This is a counting semaphore. It gets incremented and decremented
// exactly once for each item that is added to the queue.
struct semaphore q_sem;
atomic_t main_loop_should_exit;
struct task_struct *q_kthread;
};
struct nv_kthread_q_item
{
struct list_head q_list_node;
nv_q_func_t function_to_run;
void *function_args;
};
#ifndef NUMA_NO_NODE
#define NUMA_NO_NODE (-1)
#endif
#define NV_KTHREAD_NO_NODE NUMA_NO_NODE
#endif

View File

@@ -24,13 +24,14 @@
#ifndef __NV_KTHREAD_QUEUE_H__
#define __NV_KTHREAD_QUEUE_H__
#include <linux/types.h> // atomic_t
#include <linux/list.h> // list
#include <linux/sched.h> // task_struct
#include <linux/numa.h> // NUMA_NO_NODE
#include <linux/semaphore.h>
struct nv_kthread_q;
struct nv_kthread_q_item;
typedef struct nv_kthread_q nv_kthread_q_t;
typedef struct nv_kthread_q_item nv_kthread_q_item_t;
#include "conftest.h"
typedef void (*nv_q_func_t)(void *args);
#include "nv-kthread-q-os.h"
////////////////////////////////////////////////////////////////////////////////
// nv_kthread_q:
@@ -85,38 +86,6 @@
//
////////////////////////////////////////////////////////////////////////////////
typedef struct nv_kthread_q nv_kthread_q_t;
typedef struct nv_kthread_q_item nv_kthread_q_item_t;
typedef void (*nv_q_func_t)(void *args);
struct nv_kthread_q
{
struct list_head q_list_head;
spinlock_t q_lock;
// This is a counting semaphore. It gets incremented and decremented
// exactly once for each item that is added to the queue.
struct semaphore q_sem;
atomic_t main_loop_should_exit;
struct task_struct *q_kthread;
};
struct nv_kthread_q_item
{
struct list_head q_list_node;
nv_q_func_t function_to_run;
void *function_args;
};
#ifndef NUMA_NO_NODE
#define NUMA_NO_NODE (-1)
#endif
#define NV_KTHREAD_NO_NODE NUMA_NO_NODE
//
// The queue must not be used before calling this routine.
//
@@ -155,10 +124,7 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q,
// This routine is the same as nv_kthread_q_init_on_node() with the exception
// that the queue stack will be allocated on the NUMA node of the caller.
//
static inline int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
{
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
}
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname);
//
// The caller is responsible for stopping all queues, by calling this routine

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2001-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2001-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -35,6 +35,7 @@
#include "os-interface.h"
#include "nv-timer.h"
#include "nv-time.h"
#include "nv-chardev-numbers.h"
#define NV_KERNEL_NAME "Linux"
@@ -57,14 +58,10 @@
#include <linux/version.h>
#include <linux/utsname.h>
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32)
#error "This driver does not support kernels older than 2.6.32!"
#elif LINUX_VERSION_CODE < KERNEL_VERSION(2, 7, 0)
# define KERNEL_2_6
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 0, 0)
# define KERNEL_3
#else
#error "This driver does not support development kernels!"
#if LINUX_VERSION_CODE == KERNEL_VERSION(4, 4, 0)
// Version 4.4 is allowed, temporarily, although not officially supported.
#elif LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
#error "This driver does not support kernels older than Linux 4.15!"
#endif
#if defined (CONFIG_SMP) && !defined (__SMP__)
@@ -211,6 +208,7 @@
#include <linux/highmem.h>
#include <linux/nodemask.h>
#include <linux/memory.h>
#include <linux/workqueue.h> /* workqueue */
#include "nv-kthread-q.h" /* kthread based queue */
@@ -247,7 +245,7 @@ NV_STATUS nvos_forward_error_to_cray(struct pci_dev *, NvU32,
#undef NV_SET_PAGES_UC_PRESENT
#endif
#if !defined(NVCPU_AARCH64) && !defined(NVCPU_PPC64LE)
#if !defined(NVCPU_AARCH64) && !defined(NVCPU_PPC64LE) && !defined(NVCPU_RISCV64)
#if !defined(NV_SET_MEMORY_UC_PRESENT) && !defined(NV_SET_PAGES_UC_PRESENT)
#error "This driver requires the ability to change memory types!"
#endif
@@ -405,32 +403,6 @@ extern int nv_pat_mode;
#define NV_GFP_DMA32 (NV_GFP_KERNEL)
#endif
extern NvBool nvos_is_chipset_io_coherent(void);
#if defined(NVCPU_X86_64)
#define CACHE_FLUSH() asm volatile("wbinvd":::"memory")
#define WRITE_COMBINE_FLUSH() asm volatile("sfence":::"memory")
#elif defined(NVCPU_AARCH64)
static inline void nv_flush_cache_cpu(void *info)
{
if (!nvos_is_chipset_io_coherent())
{
#if defined(NV_FLUSH_CACHE_ALL_PRESENT)
flush_cache_all();
#else
WARN_ONCE(0, "NVRM: kernel does not support flush_cache_all()\n");
#endif
}
}
#define CACHE_FLUSH() nv_flush_cache_cpu(NULL)
#define CACHE_FLUSH_ALL() on_each_cpu(nv_flush_cache_cpu, NULL, 1)
#define WRITE_COMBINE_FLUSH() mb()
#elif defined(NVCPU_PPC64LE)
#define CACHE_FLUSH() asm volatile("sync; \n" \
"isync; \n" ::: "memory")
#define WRITE_COMBINE_FLUSH() CACHE_FLUSH()
#endif
typedef enum
{
NV_MEMORY_TYPE_SYSTEM, /* Memory mapped for ROM, SBIOS and physical RAM. */
@@ -439,7 +411,7 @@ typedef enum
NV_MEMORY_TYPE_DEVICE_MMIO, /* All kinds of MMIO referred by NVRM e.g. BARs and MCFG of device */
} nv_memory_type_t;
#if defined(NVCPU_AARCH64) || defined(NVCPU_PPC64LE)
#if defined(NVCPU_AARCH64) || defined(NVCPU_PPC64LE) || defined(NVCPU_RISCV64)
#define NV_ALLOW_WRITE_COMBINING(mt) 1
#elif defined(NVCPU_X86_64)
#if defined(NV_ENABLE_PAT_SUPPORT)
@@ -510,7 +482,11 @@ static inline void nv_vfree(void *ptr, NvU64 size)
static inline void *nv_ioremap(NvU64 phys, NvU64 size)
{
#if IS_ENABLED(CONFIG_INTEL_TDX_GUEST) && defined(NV_IOREMAP_DRIVER_HARDENED_PRESENT)
void *ptr = ioremap_driver_hardened(phys, size);
#else
void *ptr = ioremap(phys, size);
#endif
if (ptr)
NV_MEMDBG_ADD(ptr, size);
return ptr;
@@ -523,11 +499,11 @@ static inline void *nv_ioremap_nocache(NvU64 phys, NvU64 size)
static inline void *nv_ioremap_cache(NvU64 phys, NvU64 size)
{
#if defined(NV_IOREMAP_CACHE_PRESENT)
void *ptr = ioremap_cache(phys, size);
if (ptr)
NV_MEMDBG_ADD(ptr, size);
return ptr;
void *ptr = NULL;
#if IS_ENABLED(CONFIG_INTEL_TDX_GUEST) && defined(NV_IOREMAP_CACHE_SHARED_PRESENT)
ptr = ioremap_cache_shared(phys, size);
#elif defined(NV_IOREMAP_CACHE_PRESENT)
ptr = ioremap_cache(phys, size);
#elif defined(NVCPU_PPC64LE)
//
// ioremap_cache() has been only implemented correctly for ppc64le with
@@ -542,25 +518,32 @@ static inline void *nv_ioremap_cache(NvU64 phys, NvU64 size)
// (commit 40f1ce7fb7e8, kernel 3.0+) and that covers all kernels we
// support on power.
//
void *ptr = ioremap_prot(phys, size, pgprot_val(PAGE_KERNEL));
if (ptr)
NV_MEMDBG_ADD(ptr, size);
return ptr;
ptr = ioremap_prot(phys, size, pgprot_val(PAGE_KERNEL));
#else
return nv_ioremap(phys, size);
#endif
if (ptr)
NV_MEMDBG_ADD(ptr, size);
return ptr;
}
static inline void *nv_ioremap_wc(NvU64 phys, NvU64 size)
{
#if defined(NV_IOREMAP_WC_PRESENT)
void *ptr = ioremap_wc(phys, size);
if (ptr)
NV_MEMDBG_ADD(ptr, size);
return ptr;
void *ptr = NULL;
#if IS_ENABLED(CONFIG_INTEL_TDX_GUEST) && defined(NV_IOREMAP_DRIVER_HARDENED_WC_PRESENT)
ptr = ioremap_driver_hardened_wc(phys, size);
#elif defined(NV_IOREMAP_WC_PRESENT)
ptr = ioremap_wc(phys, size);
#else
return nv_ioremap_nocache(phys, size);
#endif
if (ptr)
NV_MEMDBG_ADD(ptr, size);
return ptr;
}
static inline void nv_iounmap(void *ptr, NvU64 size)
@@ -633,37 +616,24 @@ static NvBool nv_numa_node_has_memory(int node_id)
free_pages(ptr, order); \
}
extern NvU64 nv_shared_gpa_boundary;
static inline pgprot_t nv_sme_clr(pgprot_t prot)
{
#if defined(__sme_clr)
return __pgprot(__sme_clr(pgprot_val(prot)));
#else
return prot;
#endif // __sme_clr
}
static inline pgprot_t nv_adjust_pgprot(pgprot_t vm_prot, NvU32 extra)
{
pgprot_t prot = __pgprot(pgprot_val(vm_prot) | extra);
#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(NV_PGPROT_DECRYPTED_PRESENT)
/*
* When AMD memory encryption is enabled, device memory mappings with the
* C-bit set read as 0xFF, so ensure the bit is cleared for user mappings.
*
* If cc_mkdec() is present, then pgprot_decrypted() can't be used.
*/
#if defined(NV_CC_MKDEC_PRESENT)
if (nv_shared_gpa_boundary != 0)
{
/*
* By design, a VM using vTOM doesn't see the SEV setting and
* for AMD with vTOM, *set* means decrypted.
*/
prot = __pgprot(nv_shared_gpa_boundary | (pgprot_val(vm_prot)));
}
else
{
prot = __pgprot(__sme_clr(pgprot_val(vm_prot)));
}
#else
prot = pgprot_decrypted(prot);
#endif
#endif
return prot;
#if defined(pgprot_decrypted)
return pgprot_decrypted(prot);
#else
return nv_sme_clr(prot);
#endif // pgprot_decrypted
}
#if defined(PAGE_KERNEL_NOENC)
@@ -754,7 +724,6 @@ static inline dma_addr_t nv_phys_to_dma(struct device *dev, NvU64 pa)
#define NV_VMA_FILE(vma) ((vma)->vm_file)
#define NV_DEVICE_MINOR_NUMBER(x) minor((x)->i_rdev)
#define NV_CONTROL_DEVICE_MINOR 255
#define NV_PCI_DISABLE_DEVICE(pci_dev) \
{ \
@@ -863,16 +832,16 @@ static inline dma_addr_t nv_phys_to_dma(struct device *dev, NvU64 pa)
#define NV_PRINT_AT(nv_debug_level,at) \
{ \
nv_printf(nv_debug_level, \
"NVRM: VM: %s:%d: 0x%p, %d page(s), count = %d, flags = 0x%08x, " \
"NVRM: VM: %s:%d: 0x%p, %d page(s), count = %d, " \
"page_table = 0x%p\n", __FUNCTION__, __LINE__, at, \
at->num_pages, NV_ATOMIC_READ(at->usage_count), \
at->flags, at->page_table); \
at->page_table); \
}
#define NV_PRINT_VMA(nv_debug_level,vma) \
{ \
nv_printf(nv_debug_level, \
"NVRM: VM: %s:%d: 0x%lx - 0x%lx, 0x%08x bytes @ 0x%016llx, 0x%p, 0x%p\n", \
"NVRM: VM: %s:%d: 0x%lx - 0x%lx, 0x%08lx bytes @ 0x%016llx, 0x%p, 0x%p\n", \
__FUNCTION__, __LINE__, vma->vm_start, vma->vm_end, NV_VMA_SIZE(vma), \
NV_VMA_OFFSET(vma), NV_VMA_PRIVATE(vma), NV_VMA_FILE(vma)); \
}
@@ -1105,6 +1074,8 @@ static inline void nv_kmem_ctor_dummy(void *arg)
kmem_cache_destroy(kmem_cache); \
}
#define NV_KMEM_CACHE_ALLOC_ATOMIC(kmem_cache) \
kmem_cache_alloc(kmem_cache, GFP_ATOMIC)
#define NV_KMEM_CACHE_ALLOC(kmem_cache) \
kmem_cache_alloc(kmem_cache, GFP_KERNEL)
#define NV_KMEM_CACHE_FREE(ptr, kmem_cache) \
@@ -1131,6 +1102,23 @@ static inline void *nv_kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
#endif
}
static inline int nv_kmem_cache_alloc_stack_atomic(nvidia_stack_t **stack)
{
nvidia_stack_t *sp = NULL;
#if defined(NVCPU_X86_64)
if (rm_is_altstack_in_use())
{
sp = NV_KMEM_CACHE_ALLOC_ATOMIC(nvidia_stack_t_cache);
if (sp == NULL)
return -ENOMEM;
sp->size = sizeof(sp->stack);
sp->top = sp->stack + sp->size;
}
#endif
*stack = sp;
return 0;
}
static inline int nv_kmem_cache_alloc_stack(nvidia_stack_t **stack)
{
nvidia_stack_t *sp = NULL;
@@ -1323,7 +1311,7 @@ nv_dma_maps_swiotlb(struct device *dev)
* SEV memory encryption") forces SWIOTLB to be enabled when AMD SEV
* is active in all cases.
*/
if (os_sev_enabled)
if (os_cc_enabled)
swiotlb_in_use = NV_TRUE;
#endif
@@ -1377,7 +1365,19 @@ typedef struct nv_dma_map_s {
i < dm->mapping.discontig.submap_count; \
i++, sm = &dm->mapping.discontig.submaps[i])
/*
* On 4K ARM kernels, use max submap size a multiple of 64K to keep nv-p2p happy.
* Despite 4K OS pages, we still use 64K P2P pages due to dependent modules still using 64K.
* Instead of using (4G-4K), use max submap size as (4G-64K) since the mapped IOVA range
* must be aligned at 64K boundary.
*/
#if defined(CONFIG_ARM64_4K_PAGES)
#define NV_DMA_U32_MAX_4K_PAGES ((NvU32)((NV_U32_MAX >> PAGE_SHIFT) + 1))
#define NV_DMA_SUBMAP_MAX_PAGES ((NvU32)(NV_DMA_U32_MAX_4K_PAGES - 16))
#else
#define NV_DMA_SUBMAP_MAX_PAGES ((NvU32)(NV_U32_MAX >> PAGE_SHIFT))
#endif
#define NV_DMA_SUBMAP_IDX_TO_PAGE_IDX(s) (s * NV_DMA_SUBMAP_MAX_PAGES)
/*
@@ -1457,6 +1457,11 @@ typedef struct coherent_link_info_s {
* baremetal OS environment it is System Physical Address(SPA) and in the case
* of virutalized OS environment it is Intermediate Physical Address(IPA) */
NvU64 gpu_mem_pa;
/* Physical address of the reserved portion of the GPU memory, applicable
* only in Grace Hopper self hosted passthrough virtualizatioan platform. */
NvU64 rsvd_mem_pa;
/* Bitmap of NUMA node ids, corresponding to the reserved PXMs,
* available for adding GPU memory to the kernel as system RAM */
DECLARE_BITMAP(free_node_bitmap, MAX_NUMNODES);
@@ -1604,6 +1609,30 @@ typedef struct nv_linux_state_s {
struct nv_dma_device dma_dev;
struct nv_dma_device niso_dma_dev;
/*
* Background kthread for handling deferred open operations
* (e.g. from O_NONBLOCK).
*
* Adding to open_q and reading/writing is_accepting_opens
* are protected by nvl->open_q_lock (not nvl->ldata_lock).
* This allows new deferred open operations to be enqueued without
* blocking behind previous ones (which hold nvl->ldata_lock).
*
* Adding to open_q is only safe if is_accepting_opens is true.
* This prevents open operations from racing with device removal.
*
* Stopping open_q is only safe after setting is_accepting_opens to false.
* This ensures that the open_q (and the larger nvl structure) will
* outlive any of the open operations enqueued.
*/
nv_kthread_q_t open_q;
NvBool is_accepting_opens;
struct semaphore open_q_lock;
#if defined(NV_VGPU_KVM_BUILD)
wait_queue_head_t wait;
NvS32 return_status;
#endif
} nv_linux_state_t;
extern nv_linux_state_t *nv_linux_devices;
@@ -1647,22 +1676,13 @@ typedef struct nvidia_event
nv_event_t event;
} nvidia_event_t;
typedef enum
{
NV_FOPS_STACK_INDEX_MMAP,
NV_FOPS_STACK_INDEX_IOCTL,
NV_FOPS_STACK_INDEX_COUNT
} nvidia_entry_point_index_t;
typedef struct
{
nv_file_private_t nvfp;
nvidia_stack_t *sp;
nvidia_stack_t *fops_sp[NV_FOPS_STACK_INDEX_COUNT];
struct semaphore fops_sp_lock[NV_FOPS_STACK_INDEX_COUNT];
nv_alloc_t *free_list;
void *nvptr;
nv_linux_state_t *nvptr;
nvidia_event_t *event_data_head, *event_data_tail;
NvBool dataless_event_pending;
nv_spinlock_t fp_lock;
@@ -1673,6 +1693,12 @@ typedef struct
nv_alloc_mapping_context_t mmap_context;
struct address_space mapping;
nv_kthread_q_item_t open_q_item;
struct completion open_complete;
nv_linux_state_t *deferred_open_nvl;
int open_rc;
NV_STATUS adapter_status;
struct list_head entry;
} nv_linux_file_private_t;
@@ -1681,6 +1707,21 @@ static inline nv_linux_file_private_t *nv_get_nvlfp_from_nvfp(nv_file_private_t
return container_of(nvfp, nv_linux_file_private_t, nvfp);
}
static inline int nv_wait_open_complete_interruptible(nv_linux_file_private_t *nvlfp)
{
return wait_for_completion_interruptible(&nvlfp->open_complete);
}
static inline void nv_wait_open_complete(nv_linux_file_private_t *nvlfp)
{
wait_for_completion(&nvlfp->open_complete);
}
static inline NvBool nv_is_open_complete(nv_linux_file_private_t *nvlfp)
{
return completion_done(&nvlfp->open_complete);
}
#define NV_SET_FILE_PRIVATE(filep,data) ((filep)->private_data = (data))
#define NV_GET_LINUX_FILE_PRIVATE(filep) ((nv_linux_file_private_t *)(filep)->private_data)
@@ -1690,28 +1731,6 @@ static inline nv_linux_file_private_t *nv_get_nvlfp_from_nvfp(nv_file_private_t
#define NV_STATE_PTR(nvl) &(((nv_linux_state_t *)(nvl))->nv_state)
static inline nvidia_stack_t *nv_nvlfp_get_sp(nv_linux_file_private_t *nvlfp, nvidia_entry_point_index_t which)
{
#if defined(NVCPU_X86_64)
if (rm_is_altstack_in_use())
{
down(&nvlfp->fops_sp_lock[which]);
return nvlfp->fops_sp[which];
}
#endif
return NULL;
}
static inline void nv_nvlfp_put_sp(nv_linux_file_private_t *nvlfp, nvidia_entry_point_index_t which)
{
#if defined(NVCPU_X86_64)
if (rm_is_altstack_in_use())
{
up(&nvlfp->fops_sp_lock[which]);
}
#endif
}
#define NV_ATOMIC_READ(data) atomic_read(&(data))
#define NV_ATOMIC_SET(data,val) atomic_set(&(data), (val))
#define NV_ATOMIC_INC(data) atomic_inc(&(data))
@@ -1784,12 +1803,18 @@ static inline NV_STATUS nv_check_gpu_state(nv_state_t *nv)
extern NvU32 NVreg_EnableUserNUMAManagement;
extern NvU32 NVreg_RegisterPCIDriver;
extern NvU32 NVreg_EnableResizableBar;
extern NvU32 NVreg_EnableNonblockingOpen;
extern NvU32 num_probed_nv_devices;
extern NvU32 num_nv_devices;
#define NV_FILE_INODE(file) (file)->f_inode
static inline int nv_is_control_device(struct inode *inode)
{
return (minor((inode)->i_rdev) == NV_MINOR_DEVICE_NUMBER_CONTROL_DEVICE);
}
#if defined(NV_DOM0_KERNEL_PRESENT) || defined(NV_VGPU_KVM_BUILD)
#define NV_VGX_HYPER
#if defined(NV_XEN_IOEMU_INJECT_MSI)
@@ -1983,31 +2008,6 @@ static inline NvBool nv_platform_use_auto_online(nv_linux_state_t *nvl)
return nvl->numa_info.use_auto_online;
}
typedef struct {
NvU64 base;
NvU64 size;
NvU32 nodeId;
int ret;
} remove_numa_memory_info_t;
static void offline_numa_memory_callback
(
void *args
)
{
#ifdef NV_OFFLINE_AND_REMOVE_MEMORY_PRESENT
remove_numa_memory_info_t *pNumaInfo = (remove_numa_memory_info_t *)args;
#ifdef NV_REMOVE_MEMORY_HAS_NID_ARG
pNumaInfo->ret = offline_and_remove_memory(pNumaInfo->nodeId,
pNumaInfo->base,
pNumaInfo->size);
#else
pNumaInfo->ret = offline_and_remove_memory(pNumaInfo->base,
pNumaInfo->size);
#endif
#endif
}
typedef enum
{
NV_NUMA_STATUS_DISABLED = 0,
@@ -2068,4 +2068,7 @@ typedef enum
#include <linux/clk-provider.h>
#endif
#define NV_EXPORT_SYMBOL(symbol) EXPORT_SYMBOL_GPL(symbol)
#define NV_CHECK_EXPORT_SYMBOL(symbol) NV_IS_EXPORT_SYMBOL_PRESENT_##symbol
#endif /* _NV_LINUX_H_ */

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2017 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2017-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -37,6 +37,7 @@
#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_PREEMPT_RT_FULL)
typedef raw_spinlock_t nv_spinlock_t;
#define NV_DEFINE_SPINLOCK(lock) DEFINE_RAW_SPINLOCK(lock)
#define NV_SPIN_LOCK_INIT(lock) raw_spin_lock_init(lock)
#define NV_SPIN_LOCK_IRQ(lock) raw_spin_lock_irq(lock)
#define NV_SPIN_UNLOCK_IRQ(lock) raw_spin_unlock_irq(lock)
@@ -47,6 +48,7 @@ typedef raw_spinlock_t nv_spinlock_t;
#define NV_SPIN_UNLOCK_WAIT(lock) raw_spin_unlock_wait(lock)
#else
typedef spinlock_t nv_spinlock_t;
#define NV_DEFINE_SPINLOCK(lock) DEFINE_SPINLOCK(lock)
#define NV_SPIN_LOCK_INIT(lock) spin_lock_init(lock)
#define NV_SPIN_LOCK_IRQ(lock) spin_lock_irq(lock)
#define NV_SPIN_UNLOCK_IRQ(lock) spin_unlock_irq(lock)

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2016-2017 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2016-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -29,93 +29,125 @@
typedef int vm_fault_t;
#endif
/* pin_user_pages
/*
* pin_user_pages()
*
* Presence of pin_user_pages() also implies the presence of unpin-user_page().
* Both were added in the v5.6-rc1
* Both were added in the v5.6.
*
* pin_user_pages() was added by commit eddb1c228f7951d399240
* ("mm/gup: introduce pin_user_pages*() and FOLL_PIN") in v5.6-rc1 (2020-01-30)
* pin_user_pages() was added by commit eddb1c228f79
* ("mm/gup: introduce pin_user_pages*() and FOLL_PIN") in v5.6.
*
* Removed vmas parameter from pin_user_pages() by commit 4c630f307455
* ("mm/gup: remove vmas parameter from pin_user_pages()") in v6.5.
*/
#include <linux/mm.h>
#include <linux/sched.h>
#if defined(NV_PIN_USER_PAGES_PRESENT)
#define NV_PIN_USER_PAGES pin_user_pages
/*
* FreeBSD's pin_user_pages's conftest breaks since pin_user_pages is an inline
* function. Because it simply maps to get_user_pages, we can just replace
* NV_PIN_USER_PAGES with NV_GET_USER_PAGES on FreeBSD
*/
#if defined(NV_PIN_USER_PAGES_PRESENT) && !defined(NV_BSD)
#if defined(NV_PIN_USER_PAGES_HAS_ARGS_VMAS)
#define NV_PIN_USER_PAGES(start, nr_pages, gup_flags, pages) \
pin_user_pages(start, nr_pages, gup_flags, pages, NULL)
#else
#define NV_PIN_USER_PAGES pin_user_pages
#endif // NV_PIN_USER_PAGES_HAS_ARGS_VMAS
#define NV_UNPIN_USER_PAGE unpin_user_page
#else
#define NV_PIN_USER_PAGES NV_GET_USER_PAGES
#define NV_UNPIN_USER_PAGE put_page
#endif // NV_PIN_USER_PAGES_PRESENT
/* get_user_pages
/*
* get_user_pages()
*
* The 8-argument version of get_user_pages was deprecated by commit
* (2016 Feb 12: cde70140fed8429acf7a14e2e2cbd3e329036653)for the non-remote case
* The 8-argument version of get_user_pages() was deprecated by commit
* cde70140fed8 ("mm/gup: Overload get_user_pages() functions") in v4.6-rc1.
* (calling get_user_pages with current and current->mm).
*
* Completely moved to the 6 argument version of get_user_pages -
* 2016 Apr 4: c12d2da56d0e07d230968ee2305aaa86b93a6832
* Completely moved to the 6 argument version of get_user_pages() by
* commit c12d2da56d0e ("mm/gup: Remove the macro overload API migration
* helpers from the get_user*() APIs") in v4.6-rc4.
*
* write and force parameters were replaced with gup_flags by -
* 2016 Oct 12: 768ae309a96103ed02eb1e111e838c87854d8b51
* write and force parameters were replaced with gup_flags by
* commit 768ae309a961 ("mm: replace get_user_pages() write/force parameters
* with gup_flags") in v4.9.
*
* A 7-argument version of get_user_pages was introduced into linux-4.4.y by
* commit 8e50b8b07f462ab4b91bc1491b1c91bd75e4ad40 which cherry-picked the
* replacement of the write and force parameters with gup_flags
* commit 8e50b8b07f462 ("mm: replace get_user_pages() write/force parameters
* with gup_flags") which cherry-picked the replacement of the write and
* force parameters with gup_flags.
*
* Removed vmas parameter from get_user_pages() by commit 54d020692b34
* ("mm/gup: remove unused vmas parameter from get_user_pages()") in v6.5.
*
*/
#if defined(NV_GET_USER_PAGES_HAS_ARGS_FLAGS)
#define NV_GET_USER_PAGES get_user_pages
#elif defined(NV_GET_USER_PAGES_HAS_ARGS_TSK_FLAGS)
#define NV_GET_USER_PAGES(start, nr_pages, flags, pages, vmas) \
get_user_pages(current, current->mm, start, nr_pages, flags, pages, vmas)
#elif defined(NV_GET_USER_PAGES_HAS_ARGS_FLAGS_VMAS)
#define NV_GET_USER_PAGES(start, nr_pages, flags, pages) \
get_user_pages(start, nr_pages, flags, pages, NULL)
#elif defined(NV_GET_USER_PAGES_HAS_ARGS_TSK_FLAGS_VMAS)
#define NV_GET_USER_PAGES(start, nr_pages, flags, pages) \
get_user_pages(current, current->mm, start, nr_pages, flags, pages, NULL)
#else
static inline long NV_GET_USER_PAGES(unsigned long start,
unsigned long nr_pages,
unsigned int flags,
struct page **pages,
struct vm_area_struct **vmas)
struct page **pages)
{
int write = flags & FOLL_WRITE;
int force = flags & FOLL_FORCE;
#if defined(NV_GET_USER_PAGES_HAS_ARGS_WRITE_FORCE)
return get_user_pages(start, nr_pages, write, force, pages, vmas);
#if defined(NV_GET_USER_PAGES_HAS_ARGS_WRITE_FORCE_VMAS)
return get_user_pages(start, nr_pages, write, force, pages, NULL);
#else
// NV_GET_USER_PAGES_HAS_ARGS_TSK_WRITE_FORCE
// NV_GET_USER_PAGES_HAS_ARGS_TSK_WRITE_FORCE_VMAS
return get_user_pages(current, current->mm, start, nr_pages, write,
force, pages, vmas);
#endif // NV_GET_USER_PAGES_HAS_ARGS_WRITE_FORCE
force, pages, NULL);
#endif // NV_GET_USER_PAGES_HAS_ARGS_WRITE_FORCE_VMAS
}
#endif // NV_GET_USER_PAGES_HAS_ARGS_FLAGS
/* pin_user_pages_remote
/*
* pin_user_pages_remote()
*
* pin_user_pages_remote() was added by commit eddb1c228f7951d399240
* ("mm/gup: introduce pin_user_pages*() and FOLL_PIN") in v5.6 (2020-01-30)
* pin_user_pages_remote() was added by commit eddb1c228f79
* ("mm/gup: introduce pin_user_pages*() and FOLL_PIN") in v5.6.
*
* pin_user_pages_remote() removed 'tsk' parameter by commit
* 64019a2e467a ("mm/gup: remove task_struct pointer for all gup code")
* in v5.9-rc1 (2020-08-11). *
* 64019a2e467a ("mm/gup: remove task_struct pointer for all gup code")
* in v5.9.
*
* Removed unused vmas parameter from pin_user_pages_remote() by commit
* 0b295316b3a9 ("mm/gup: remove unused vmas parameter from
* pin_user_pages_remote()") in v6.5.
*
*/
#if defined(NV_PIN_USER_PAGES_REMOTE_PRESENT)
#if defined (NV_PIN_USER_PAGES_REMOTE_HAS_ARGS_TSK)
#define NV_PIN_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, vmas, locked) \
pin_user_pages_remote(NULL, mm, start, nr_pages, flags, pages, vmas, locked)
#if defined(NV_PIN_USER_PAGES_REMOTE_HAS_ARGS_TSK_VMAS)
#define NV_PIN_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, locked) \
pin_user_pages_remote(NULL, mm, start, nr_pages, flags, pages, NULL, locked)
#elif defined(NV_PIN_USER_PAGES_REMOTE_HAS_ARGS_VMAS)
#define NV_PIN_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, locked) \
pin_user_pages_remote(mm, start, nr_pages, flags, pages, NULL, locked)
#else
#define NV_PIN_USER_PAGES_REMOTE pin_user_pages_remote
#endif // NV_PIN_USER_PAGES_REMOTE_HAS_ARGS_TSK
#endif // NV_PIN_USER_PAGES_REMOTE_HAS_ARGS_TSK_VMAS
#else
#define NV_PIN_USER_PAGES_REMOTE NV_GET_USER_PAGES_REMOTE
#endif // NV_PIN_USER_PAGES_REMOTE_PRESENT
/*
* get_user_pages_remote() was added by commit 1e9877902dc7
* ("mm/gup: Introduce get_user_pages_remote()") in v4.6 (2016-02-12).
* ("mm/gup: Introduce get_user_pages_remote()") in v4.6.
*
* Note that get_user_pages_remote() requires the caller to hold a reference on
* the task_struct (if non-NULL and if this API has tsk argument) and the mm_struct.
@@ -125,15 +157,17 @@ typedef int vm_fault_t;
*
* get_user_pages_remote() write/force parameters were replaced
* with gup_flags by commit 9beae1ea8930 ("mm: replace get_user_pages_remote()
* write/force parameters with gup_flags") in v4.9 (2016-10-13).
* write/force parameters with gup_flags") in v4.9.
*
* get_user_pages_remote() added 'locked' parameter by commit 5b56d49fc31d
* ("mm: add locked parameter to get_user_pages_remote()") in
* v4.10 (2016-12-14).
* ("mm: add locked parameter to get_user_pages_remote()") in v4.10.
*
* get_user_pages_remote() removed 'tsk' parameter by
* commit 64019a2e467a ("mm/gup: remove task_struct pointer for
* all gup code") in v5.9-rc1 (2020-08-11).
* all gup code") in v5.9.
*
* Removed vmas parameter from get_user_pages_remote() by commit ca5e863233e8
* ("mm/gup: remove vmas parameter from get_user_pages_remote()") in v6.5.
*
*/
@@ -141,51 +175,53 @@ typedef int vm_fault_t;
#if defined(NV_GET_USER_PAGES_REMOTE_HAS_ARGS_FLAGS_LOCKED)
#define NV_GET_USER_PAGES_REMOTE get_user_pages_remote
#elif defined(NV_GET_USER_PAGES_REMOTE_HAS_ARGS_TSK_FLAGS_LOCKED)
#define NV_GET_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, vmas, locked) \
get_user_pages_remote(NULL, mm, start, nr_pages, flags, pages, vmas, locked)
#elif defined(NV_GET_USER_PAGES_REMOTE_HAS_ARGS_FLAGS_LOCKED_VMAS)
#define NV_GET_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, locked) \
get_user_pages_remote(mm, start, nr_pages, flags, pages, NULL, locked)
#elif defined(NV_GET_USER_PAGES_REMOTE_HAS_ARGS_TSK_FLAGS)
#define NV_GET_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, vmas, locked) \
get_user_pages_remote(NULL, mm, start, nr_pages, flags, pages, vmas)
#elif defined(NV_GET_USER_PAGES_REMOTE_HAS_ARGS_TSK_FLAGS_LOCKED_VMAS)
#define NV_GET_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, locked) \
get_user_pages_remote(NULL, mm, start, nr_pages, flags, pages, NULL, locked)
#elif defined(NV_GET_USER_PAGES_REMOTE_HAS_ARGS_TSK_FLAGS_VMAS)
#define NV_GET_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, locked) \
get_user_pages_remote(NULL, mm, start, nr_pages, flags, pages, NULL)
#else
// NV_GET_USER_PAGES_REMOTE_HAS_ARGS_TSK_WRITE_FORCE
// NV_GET_USER_PAGES_REMOTE_HAS_ARGS_TSK_WRITE_FORCE_VMAS
static inline long NV_GET_USER_PAGES_REMOTE(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
unsigned int flags,
struct page **pages,
struct vm_area_struct **vmas,
int *locked)
{
int write = flags & FOLL_WRITE;
int force = flags & FOLL_FORCE;
return get_user_pages_remote(NULL, mm, start, nr_pages, write, force,
pages, vmas);
pages, NULL);
}
#endif // NV_GET_USER_PAGES_REMOTE_HAS_ARGS_FLAGS_LOCKED
#else
#if defined(NV_GET_USER_PAGES_HAS_ARGS_TSK_WRITE_FORCE)
#if defined(NV_GET_USER_PAGES_HAS_ARGS_TSK_WRITE_FORCE_VMAS)
static inline long NV_GET_USER_PAGES_REMOTE(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
unsigned int flags,
struct page **pages,
struct vm_area_struct **vmas,
int *locked)
{
int write = flags & FOLL_WRITE;
int force = flags & FOLL_FORCE;
return get_user_pages(NULL, mm, start, nr_pages, write, force, pages, vmas);
return get_user_pages(NULL, mm, start, nr_pages, write, force, pages, NULL);
}
#else
#define NV_GET_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, vmas, locked) \
get_user_pages(NULL, mm, start, nr_pages, flags, pages, vmas)
#endif // NV_GET_USER_PAGES_HAS_ARGS_TSK_WRITE_FORCE
#define NV_GET_USER_PAGES_REMOTE(mm, start, nr_pages, flags, pages, locked) \
get_user_pages(NULL, mm, start, nr_pages, flags, pages, NULL)
#endif // NV_GET_USER_PAGES_HAS_ARGS_TSK_WRITE_FORCE_VMAS
#endif // NV_GET_USER_PAGES_REMOTE_PRESENT
/*

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2015 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2015-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -60,6 +60,7 @@ static inline pgprot_t pgprot_modify_writecombine(pgprot_t old_prot)
#endif /* !defined(NV_VMWARE) */
#if defined(NVCPU_AARCH64)
extern NvBool nvos_is_chipset_io_coherent(void);
/*
* Don't rely on the kernel's definition of pgprot_noncached(), as on 64-bit
* ARM that's not for system memory, but device memory instead. For I/O cache
@@ -119,6 +120,13 @@ static inline pgprot_t pgprot_modify_writecombine(pgprot_t old_prot)
#define NV_PGPROT_WRITE_COMBINED(old_prot) old_prot
#define NV_PGPROT_READ_ONLY(old_prot) \
__pgprot(pgprot_val((old_prot)) & ~NV_PAGE_RW)
#elif defined(NVCPU_RISCV64)
#define NV_PGPROT_WRITE_COMBINED_DEVICE(old_prot) \
pgprot_writecombine(old_prot)
/* Don't attempt to mark sysmem pages as write combined on riscv */
#define NV_PGPROT_WRITE_COMBINED(old_prot) old_prot
#define NV_PGPROT_READ_ONLY(old_prot) \
__pgprot(pgprot_val((old_prot)) & ~_PAGE_WRITE)
#else
/* Writecombine is not supported */
#undef NV_PGPROT_WRITE_COMBINED_DEVICE(old_prot)

View File

@@ -92,6 +92,24 @@ typedef struct file_operations nv_proc_ops_t;
#endif
#define NV_DEFINE_SINGLE_PROCFS_FILE_HELPER(name, lock) \
static ssize_t nv_procfs_read_lock_##name( \
struct file *file, \
char __user *buf, \
size_t size, \
loff_t *ppos \
) \
{ \
int ret; \
ret = nv_down_read_interruptible(&lock); \
if (ret < 0) \
{ \
return ret; \
} \
size = seq_read(file, buf, size, ppos); \
up_read(&lock); \
return size; \
} \
\
static int nv_procfs_open_##name( \
struct inode *inode, \
struct file *filep \
@@ -104,11 +122,6 @@ typedef struct file_operations nv_proc_ops_t;
{ \
return ret; \
} \
ret = nv_down_read_interruptible(&lock); \
if (ret < 0) \
{ \
single_release(inode, filep); \
} \
return ret; \
} \
\
@@ -117,7 +130,6 @@ typedef struct file_operations nv_proc_ops_t;
struct file *filep \
) \
{ \
up_read(&lock); \
return single_release(inode, filep); \
}
@@ -127,46 +139,7 @@ typedef struct file_operations nv_proc_ops_t;
static const nv_proc_ops_t nv_procfs_##name##_fops = { \
NV_PROC_OPS_SET_OWNER() \
.NV_PROC_OPS_OPEN = nv_procfs_open_##name, \
.NV_PROC_OPS_READ = seq_read, \
.NV_PROC_OPS_LSEEK = seq_lseek, \
.NV_PROC_OPS_RELEASE = nv_procfs_release_##name, \
};
#define NV_DEFINE_SINGLE_PROCFS_FILE_READ_WRITE(name, lock, \
write_callback) \
NV_DEFINE_SINGLE_PROCFS_FILE_HELPER(name, lock) \
\
static ssize_t nv_procfs_write_##name( \
struct file *file, \
const char __user *buf, \
size_t size, \
loff_t *ppos \
) \
{ \
ssize_t ret; \
struct seq_file *s; \
\
s = file->private_data; \
if (s == NULL) \
{ \
return -EIO; \
} \
\
ret = write_callback(s, buf + *ppos, size - *ppos); \
if (ret == 0) \
{ \
/* avoid infinite loop */ \
ret = -EIO; \
} \
return ret; \
} \
\
static const nv_proc_ops_t nv_procfs_##name##_fops = { \
NV_PROC_OPS_SET_OWNER() \
.NV_PROC_OPS_OPEN = nv_procfs_open_##name, \
.NV_PROC_OPS_READ = seq_read, \
.NV_PROC_OPS_WRITE = nv_procfs_write_##name, \
.NV_PROC_OPS_READ = nv_procfs_read_lock_##name, \
.NV_PROC_OPS_LSEEK = seq_lseek, \
.NV_PROC_OPS_RELEASE = nv_procfs_release_##name, \
};

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 1999-2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -25,10 +25,8 @@
#define _NV_PROTO_H_
#include "nv-pci.h"
#include "nv-register-module.h"
extern const char *nv_device_name;
extern nvidia_module_t nv_fops;
void nv_acpi_register_notifier (nv_linux_state_t *);
void nv_acpi_unregister_notifier (nv_linux_state_t *);
@@ -86,8 +84,11 @@ void nv_shutdown_adapter(nvidia_stack_t *, nv_state_t *, nv_linux_state
void nv_dev_free_stacks(nv_linux_state_t *);
NvBool nv_lock_init_locks(nvidia_stack_t *, nv_state_t *);
void nv_lock_destroy_locks(nvidia_stack_t *, nv_state_t *);
void nv_linux_add_device_locked(nv_linux_state_t *);
int nv_linux_add_device_locked(nv_linux_state_t *);
void nv_linux_remove_device_locked(nv_linux_state_t *);
NvBool nv_acpi_power_resource_method_present(struct pci_dev *);
int nv_linux_init_open_q(nv_linux_state_t *);
void nv_linux_stop_open_q(nv_linux_state_t *);
#endif /* _NV_PROTO_H_ */

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 1999-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 1999-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -42,6 +42,7 @@
#include <nv-caps.h>
#include <nv-firmware.h>
#include <nv-ioctl.h>
#include <nv-ioctl-numa.h>
#include <nvmisc.h>
extern nv_cap_t *nvidia_caps_root;
@@ -50,9 +51,6 @@ extern const NvBool nv_is_rm_firmware_supported_os;
#include <nv-kernel-interface-api.h>
/* NVIDIA's reserved major character device number (Linux). */
#define NV_MAJOR_DEVICE_NUMBER 195
#define GPU_UUID_LEN (16)
/*
@@ -223,7 +221,6 @@ typedef struct
#define NV_RM_PAGE_MASK (NV_RM_PAGE_SIZE - 1)
#define NV_RM_TO_OS_PAGE_SHIFT (os_page_shift - NV_RM_PAGE_SHIFT)
#define NV_RM_PAGES_PER_OS_PAGE (1U << NV_RM_TO_OS_PAGE_SHIFT)
#define NV_RM_PAGES_TO_OS_PAGES(count) \
((((NvUPtr)(count)) >> NV_RM_TO_OS_PAGE_SHIFT) + \
((((count) & ((1 << NV_RM_TO_OS_PAGE_SHIFT) - 1)) != 0) ? 1 : 0))
@@ -469,17 +466,9 @@ typedef struct nv_state_t
NvHandle hDisp;
} rmapi;
/* Bool to check if ISO iommu enabled */
NvBool iso_iommu_present;
/* Bool to check if NISO iommu enabled */
NvBool niso_iommu_present;
/* Bool to check if dma-buf is supported */
NvBool dma_buf_supported;
NvBool printed_openrm_enable_unsupported_gpus_error;
/* Check if NVPCF DSM function is implemented under NVPCF or GPU device scope */
NvBool nvpcf_dsm_in_gpu_scope;
@@ -488,6 +477,22 @@ typedef struct nv_state_t
/* Bool to check if the GPU has a coherent sysmem link */
NvBool coherent;
/*
* NUMA node ID of the CPU to which the GPU is attached.
* Holds NUMA_NO_NODE on platforms that don't support NUMA configuration.
*/
NvS32 cpu_numa_node_id;
struct {
/* Bool to check if ISO iommu enabled */
NvBool iso_iommu_present;
/* Bool to check if NISO iommu enabled */
NvBool niso_iommu_present;
/* Display SMMU Stream IDs */
NvU32 dispIsoStreamId;
NvU32 dispNisoStreamId;
} iommus;
} nv_state_t;
// These define need to be in sync with defines in system.h
@@ -505,11 +510,18 @@ struct nv_file_private_t
NvHandle *handles;
NvU16 maxHandles;
NvU32 deviceInstance;
NvU32 gpuInstanceId;
NvU8 metadata[64];
nv_file_private_t *ctl_nvfp;
void *ctl_nvfp_priv;
NvU32 register_or_refcount;
//
// True if a client or an event was ever allocated on this fd.
// If false, RMAPI cleanup is skipped.
//
NvBool bCleanupRmapi;
};
// Forward define the gpu ops structures
@@ -597,6 +609,15 @@ typedef enum
NV_POWER_STATE_RUNNING
} nv_power_state_t;
typedef struct
{
const char *vidmem_power_status;
const char *dynamic_power_status;
const char *gc6_support;
const char *gcoff_support;
const char *s0ix_status;
} nv_power_info_t;
#define NV_PRIMARY_VGA(nv) ((nv)->primary_vga)
#define NV_IS_CTL_DEVICE(nv) ((nv)->flags & NV_FLAG_CONTROL)
@@ -609,11 +630,19 @@ typedef enum
#define NV_IS_DEVICE_IN_SURPRISE_REMOVAL(nv) \
(((nv)->flags & NV_FLAG_IN_SURPRISE_REMOVAL) != 0)
/*
* For console setup by EFI GOP, the base address is BAR1.
* For console setup by VBIOS, the base address is BAR2 + 16MB.
*/
#define NV_IS_CONSOLE_MAPPED(nv, addr) \
(((addr) == (nv)->bars[NV_GPU_BAR_INDEX_FB].cpu_address) || \
((addr) == ((nv)->bars[NV_GPU_BAR_INDEX_IMEM].cpu_address + 0x1000000)))
#define NV_SOC_IS_ISO_IOMMU_PRESENT(nv) \
((nv)->iso_iommu_present)
((nv)->iommus.iso_iommu_present)
#define NV_SOC_IS_NISO_IOMMU_PRESENT(nv) \
((nv)->niso_iommu_present)
((nv)->iommus.niso_iommu_present)
/*
* GPU add/remove events
*/
@@ -758,8 +787,8 @@ nv_state_t* NV_API_CALL nv_get_ctl_state (void);
void NV_API_CALL nv_set_dma_address_size (nv_state_t *, NvU32 );
NV_STATUS NV_API_CALL nv_alias_pages (nv_state_t *, NvU32, NvU32, NvU32, NvU64, NvU64 *, void **);
NV_STATUS NV_API_CALL nv_alloc_pages (nv_state_t *, NvU32, NvBool, NvU32, NvBool, NvBool, NvS32, NvU64 *, void **);
NV_STATUS NV_API_CALL nv_alias_pages (nv_state_t *, NvU32, NvU64, NvU32, NvU32, NvU64, NvU64 *, void **);
NV_STATUS NV_API_CALL nv_alloc_pages (nv_state_t *, NvU32, NvU64, NvBool, NvU32, NvBool, NvBool, NvS32, NvU64 *, void **);
NV_STATUS NV_API_CALL nv_free_pages (nv_state_t *, NvU32, NvBool, NvU32, void *);
NV_STATUS NV_API_CALL nv_register_user_pages (nv_state_t *, NvU64, NvU64 *, void *, void **);
@@ -776,8 +805,6 @@ NV_STATUS NV_API_CALL nv_register_phys_pages (nv_state_t *, NvU64 *, NvU64,
void NV_API_CALL nv_unregister_phys_pages (nv_state_t *, void *);
NV_STATUS NV_API_CALL nv_dma_map_sgt (nv_dma_device_t *, NvU64, NvU64 *, NvU32, void **);
NV_STATUS NV_API_CALL nv_dma_map_pages (nv_dma_device_t *, NvU64, NvU64 *, NvBool, NvU32, void **);
NV_STATUS NV_API_CALL nv_dma_unmap_pages (nv_dma_device_t *, NvU64, NvU64 *, void **);
NV_STATUS NV_API_CALL nv_dma_map_alloc (nv_dma_device_t *, NvU64, NvU64 *, NvBool, void **);
NV_STATUS NV_API_CALL nv_dma_unmap_alloc (nv_dma_device_t *, NvU64, NvU64 *, void **);
@@ -804,6 +831,7 @@ void NV_API_CALL nv_acpi_methods_init (NvU32 *);
void NV_API_CALL nv_acpi_methods_uninit (void);
NV_STATUS NV_API_CALL nv_acpi_method (NvU32, NvU32, NvU32, void *, NvU16, NvU32 *, void *, NvU16 *);
NV_STATUS NV_API_CALL nv_acpi_d3cold_dsm_for_upstream_port (nv_state_t *, NvU8 *, NvU32, NvU32, NvU32 *);
NV_STATUS NV_API_CALL nv_acpi_dsm_method (nv_state_t *, NvU8 *, NvU32, NvBool, NvU32, void *, NvU16, NvU32 *, void *, NvU16 *);
NV_STATUS NV_API_CALL nv_acpi_ddc_method (nv_state_t *, void *, NvU32 *, NvBool);
NV_STATUS NV_API_CALL nv_acpi_dod_method (nv_state_t *, NvU32 *, NvU32 *);
@@ -827,7 +855,7 @@ void NV_API_CALL nv_put_firmware(const void *);
nv_file_private_t* NV_API_CALL nv_get_file_private(NvS32, NvBool, void **);
void NV_API_CALL nv_put_file_private(void *);
NV_STATUS NV_API_CALL nv_get_device_memory_config(nv_state_t *, NvU64 *, NvU64 *, NvU32 *, NvS32 *);
NV_STATUS NV_API_CALL nv_get_device_memory_config(nv_state_t *, NvU64 *, NvU64 *, NvU64 *, NvU32 *, NvS32 *);
NV_STATUS NV_API_CALL nv_get_egm_info(nv_state_t *, NvU64 *, NvU64 *, NvS32 *);
NV_STATUS NV_API_CALL nv_get_ibmnpu_genreg_info(nv_state_t *, NvU64 *, NvU64 *, void**);
@@ -868,15 +896,17 @@ NvBool NV_API_CALL nv_match_gpu_os_info(nv_state_t *, void *);
NvU32 NV_API_CALL nv_get_os_type(void);
void NV_API_CALL nv_get_updated_emu_seg(NvU32 *start, NvU32 *end);
void NV_API_CALL nv_get_screen_info(nv_state_t *, NvU64 *, NvU32 *, NvU32 *, NvU32 *, NvU32 *, NvU64 *);
struct dma_buf;
typedef struct nv_dma_buf nv_dma_buf_t;
struct drm_gem_object;
NV_STATUS NV_API_CALL nv_dma_import_sgt (nv_dma_device_t *, struct sg_table *, struct drm_gem_object *);
void NV_API_CALL nv_dma_release_sgt(struct sg_table *, struct drm_gem_object *);
NV_STATUS NV_API_CALL nv_dma_import_dma_buf (nv_dma_device_t *, struct dma_buf *, NvU32 *, void **, struct sg_table **, nv_dma_buf_t **);
NV_STATUS NV_API_CALL nv_dma_import_from_fd (nv_dma_device_t *, NvS32, NvU32 *, void **, struct sg_table **, nv_dma_buf_t **);
void NV_API_CALL nv_dma_release_dma_buf (void *, nv_dma_buf_t *);
NV_STATUS NV_API_CALL nv_dma_import_dma_buf (nv_dma_device_t *, struct dma_buf *, NvU32 *, struct sg_table **, nv_dma_buf_t **);
NV_STATUS NV_API_CALL nv_dma_import_from_fd (nv_dma_device_t *, NvS32, NvU32 *, struct sg_table **, nv_dma_buf_t **);
void NV_API_CALL nv_dma_release_dma_buf (nv_dma_buf_t *);
void NV_API_CALL nv_schedule_uvm_isr (nv_state_t *);
@@ -892,6 +922,8 @@ typedef void (*nvTegraDceClientIpcCallback)(NvU32, NvU32, NvU32, void *, void *)
NV_STATUS NV_API_CALL nv_get_num_phys_pages (void *, NvU32 *);
NV_STATUS NV_API_CALL nv_get_phys_pages (void *, void *, NvU32 *);
void NV_API_CALL nv_get_disp_smmu_stream_ids (nv_state_t *, NvU32 *, NvU32 *);
/*
* ---------------------------------------------------------------------------
*
@@ -918,6 +950,7 @@ NV_STATUS NV_API_CALL rm_ioctl (nvidia_stack_t *, nv_state_t *
NvBool NV_API_CALL rm_isr (nvidia_stack_t *, nv_state_t *, NvU32 *);
void NV_API_CALL rm_isr_bh (nvidia_stack_t *, nv_state_t *);
void NV_API_CALL rm_isr_bh_unlocked (nvidia_stack_t *, nv_state_t *);
NvBool NV_API_CALL rm_is_msix_allowed (nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_power_management (nvidia_stack_t *, nv_state_t *, nv_pm_action_t);
NV_STATUS NV_API_CALL rm_stop_user_channels (nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_restart_user_channels (nvidia_stack_t *, nv_state_t *);
@@ -937,6 +970,7 @@ void NV_API_CALL rm_parse_option_string (nvidia_stack_t *, const char *
char* NV_API_CALL rm_remove_spaces (const char *);
char* NV_API_CALL rm_string_token (char **, const char);
void NV_API_CALL rm_vgpu_vfio_set_driver_vm(nvidia_stack_t *, NvBool);
NV_STATUS NV_API_CALL rm_get_adapter_status_external(nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_run_rc_callback (nvidia_stack_t *, nv_state_t *);
void NV_API_CALL rm_execute_work_item (nvidia_stack_t *, void *);
@@ -959,21 +993,23 @@ NV_STATUS NV_API_CALL rm_perform_version_check (nvidia_stack_t *, void *, NvU
void NV_API_CALL rm_power_source_change_event (nvidia_stack_t *, NvU32);
void NV_API_CALL rm_request_dnotifier_state (nvidia_stack_t *, nv_state_t *);
void NV_API_CALL rm_disable_gpu_state_persistence (nvidia_stack_t *sp, nv_state_t *);
NV_STATUS NV_API_CALL rm_p2p_init_mapping (nvidia_stack_t *, NvU64, NvU64 *, NvU64 *, NvU64 *, NvU64 *, NvU64, NvU64, NvU64, NvU64, void (*)(void *), void *);
NV_STATUS NV_API_CALL rm_p2p_destroy_mapping (nvidia_stack_t *, NvU64);
NV_STATUS NV_API_CALL rm_p2p_get_pages (nvidia_stack_t *, NvU64, NvU32, NvU64, NvU64, NvU64 *, NvU32 *, NvU32 *, NvU32 *, NvU8 **, void *);
NV_STATUS NV_API_CALL rm_p2p_get_gpu_info (nvidia_stack_t *, NvU64, NvU64, NvU8 **, void **);
NV_STATUS NV_API_CALL rm_p2p_get_pages_persistent (nvidia_stack_t *, NvU64, NvU64, void **, NvU64 *, NvU32 *, void *, void *);
NV_STATUS NV_API_CALL rm_p2p_get_pages_persistent (nvidia_stack_t *, NvU64, NvU64, void **, NvU64 *, NvU32 *, void *, void *, void **);
NV_STATUS NV_API_CALL rm_p2p_register_callback (nvidia_stack_t *, NvU64, NvU64, NvU64, void *, void (*)(void *), void *);
NV_STATUS NV_API_CALL rm_p2p_put_pages (nvidia_stack_t *, NvU64, NvU32, NvU64, void *);
NV_STATUS NV_API_CALL rm_p2p_put_pages_persistent(nvidia_stack_t *, void *, void *);
NV_STATUS NV_API_CALL rm_p2p_put_pages_persistent(nvidia_stack_t *, void *, void *, void *);
NV_STATUS NV_API_CALL rm_p2p_dma_map_pages (nvidia_stack_t *, nv_dma_device_t *, NvU8 *, NvU64, NvU32, NvU64 *, void **);
NV_STATUS NV_API_CALL rm_dma_buf_dup_mem_handle (nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle, NvHandle, void *, NvHandle, NvU64, NvU64, NvHandle *, void **);
void NV_API_CALL rm_dma_buf_undup_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle);
NV_STATUS NV_API_CALL rm_dma_buf_map_mem_handle (nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvU64, NvU64, void *, nv_phys_addr_range_t **, NvU32 *);
void NV_API_CALL rm_dma_buf_unmap_mem_handle(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvU64, nv_phys_addr_range_t **, NvU32);
NV_STATUS NV_API_CALL rm_dma_buf_get_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle *, NvHandle *, NvHandle *, void **, NvBool *);
NV_STATUS NV_API_CALL rm_dma_buf_get_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle *, NvHandle *, NvHandle *, void **, NvBool *);
void NV_API_CALL rm_dma_buf_put_client_and_device(nvidia_stack_t *, nv_state_t *, NvHandle, NvHandle, NvHandle, void *);
NV_STATUS NV_API_CALL rm_log_gpu_crash (nv_stack_t *, nv_state_t *);
@@ -985,7 +1021,7 @@ NvBool NV_API_CALL rm_gpu_need_4k_page_isolation(nv_state_t *);
NvBool NV_API_CALL rm_is_chipset_io_coherent(nv_stack_t *);
NvBool NV_API_CALL rm_init_event_locks(nvidia_stack_t *, nv_state_t *);
void NV_API_CALL rm_destroy_event_locks(nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_get_gpu_numa_info(nvidia_stack_t *, nv_state_t *, NvS32 *, NvU64 *, NvU64 *, NvU64 *, NvU32 *);
NV_STATUS NV_API_CALL rm_get_gpu_numa_info(nvidia_stack_t *, nv_state_t *, nv_ioctl_numa_info_t *);
NV_STATUS NV_API_CALL rm_gpu_numa_online(nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_gpu_numa_offline(nvidia_stack_t *, nv_state_t *);
NvBool NV_API_CALL rm_is_device_sequestered(nvidia_stack_t *, nv_state_t *);
@@ -1000,10 +1036,8 @@ void NV_API_CALL rm_cleanup_dynamic_power_management(nvidia_stack_t *, nv_
void NV_API_CALL rm_enable_dynamic_power_management(nvidia_stack_t *, nv_state_t *);
NV_STATUS NV_API_CALL rm_ref_dynamic_power(nvidia_stack_t *, nv_state_t *, nv_dynamic_power_mode_t);
void NV_API_CALL rm_unref_dynamic_power(nvidia_stack_t *, nv_state_t *, nv_dynamic_power_mode_t);
NV_STATUS NV_API_CALL rm_transition_dynamic_power(nvidia_stack_t *, nv_state_t *, NvBool);
const char* NV_API_CALL rm_get_vidmem_power_status(nvidia_stack_t *, nv_state_t *);
const char* NV_API_CALL rm_get_dynamic_power_management_status(nvidia_stack_t *, nv_state_t *);
const char* NV_API_CALL rm_get_gpu_gcx_support(nvidia_stack_t *, nv_state_t *, NvBool);
NV_STATUS NV_API_CALL rm_transition_dynamic_power(nvidia_stack_t *, nv_state_t *, NvBool, NvBool *);
void NV_API_CALL rm_get_power_info(nvidia_stack_t *, nv_state_t *, nv_power_info_t *);
void NV_API_CALL rm_acpi_notify(nvidia_stack_t *, nv_state_t *, NvU32);
void NV_API_CALL rm_acpi_nvpcf_notify(nvidia_stack_t *);
@@ -1015,12 +1049,12 @@ NV_STATUS NV_API_CALL nv_vgpu_create_request(nvidia_stack_t *, nv_state_t *, c
NV_STATUS NV_API_CALL nv_vgpu_delete(nvidia_stack_t *, const NvU8 *, NvU16);
NV_STATUS NV_API_CALL nv_vgpu_get_type_ids(nvidia_stack_t *, nv_state_t *, NvU32 *, NvU32 *, NvBool, NvU8, NvBool);
NV_STATUS NV_API_CALL nv_vgpu_get_type_info(nvidia_stack_t *, nv_state_t *, NvU32, char *, int, NvU8);
NV_STATUS NV_API_CALL nv_vgpu_get_bar_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU32, void *);
NV_STATUS NV_API_CALL nv_vgpu_start(nvidia_stack_t *, const NvU8 *, void *, NvS32 *, NvU8 *, NvU32);
NV_STATUS NV_API_CALL nv_vgpu_get_sparse_mmap(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 **, NvU64 **, NvU32 *);
NV_STATUS NV_API_CALL nv_vgpu_get_bar_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *,
NvU64 *, NvU64 *, NvU32 *, NvBool *, NvU8 *);
NV_STATUS NV_API_CALL nv_vgpu_get_hbm_info(nvidia_stack_t *, nv_state_t *, const NvU8 *, NvU64 *, NvU64 *);
NV_STATUS NV_API_CALL nv_vgpu_process_vf_info(nvidia_stack_t *, nv_state_t *, NvU8, NvU32, NvU8, NvU8, NvU8, NvBool, void *);
NV_STATUS NV_API_CALL nv_vgpu_update_request(nvidia_stack_t *, const NvU8 *, NvU32, NvU64 *, NvU64 *, const char *);
NV_STATUS NV_API_CALL nv_gpu_bind_event(nvidia_stack_t *);
NV_STATUS NV_API_CALL nv_gpu_unbind_event(nvidia_stack_t *, NvU32, NvBool *);
NV_STATUS NV_API_CALL nv_get_usermap_access_params(nv_state_t*, nv_usermap_access_params_t*);
nv_soc_irq_type_t NV_API_CALL nv_get_current_irq_type(nv_state_t*);

View File

@@ -86,7 +86,7 @@
/* Not currently implemented for MSVC/ARM64. See bug 3366890. */
# define nv_speculation_barrier()
# define speculation_barrier() nv_speculation_barrier()
#elif defined(NVCPU_NVRISCV64) && NVOS_IS_LIBOS
#elif defined(NVCPU_IS_RISCV64)
# define nv_speculation_barrier()
#else
#error "Unknown compiler/chip family"

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2013-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2013-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -62,10 +62,10 @@ typedef struct
/*******************************************************************************
nvUvmInterfaceRegisterGpu
Registers the GPU with the provided UUID for use. A GPU must be registered
before its UUID can be used with any other API. This call is ref-counted so
every nvUvmInterfaceRegisterGpu must be paired with a corresponding
nvUvmInterfaceUnregisterGpu.
Registers the GPU with the provided physical UUID for use. A GPU must be
registered before its UUID can be used with any other API. This call is
ref-counted so every nvUvmInterfaceRegisterGpu must be paired with a
corresponding nvUvmInterfaceUnregisterGpu.
You don't need to call nvUvmInterfaceSessionCreate before calling this.
@@ -79,12 +79,13 @@ NV_STATUS nvUvmInterfaceRegisterGpu(const NvProcessorUuid *gpuUuid, UvmGpuPlatfo
/*******************************************************************************
nvUvmInterfaceUnregisterGpu
Unregisters the GPU with the provided UUID. This drops the ref count from
nvUvmInterfaceRegisterGpu. Once the reference count goes to 0 the device may
no longer be accessible until the next nvUvmInterfaceRegisterGpu call. No
automatic resource freeing is performed, so only make the last unregister
call after destroying all your allocations associated with that UUID (such
as those from nvUvmInterfaceAddressSpaceCreate).
Unregisters the GPU with the provided physical UUID. This drops the ref
count from nvUvmInterfaceRegisterGpu. Once the reference count goes to 0
the device may no longer be accessible until the next
nvUvmInterfaceRegisterGpu call. No automatic resource freeing is performed,
so only make the last unregister call after destroying all your allocations
associated with that UUID (such as those from
nvUvmInterfaceAddressSpaceCreate).
If the UUID is not found, no operation is performed.
*/
@@ -121,10 +122,10 @@ NV_STATUS nvUvmInterfaceSessionDestroy(uvmGpuSessionHandle session);
nvUvmInterfaceDeviceCreate
Creates a device object under the given session for the GPU with the given
UUID. Also creates a partition object for the device iff bCreateSmcPartition
is true and pGpuInfo->smcEnabled is true. pGpuInfo->smcUserClientInfo will
be used to determine the SMC partition in this case. A device handle is
returned in the device output parameter.
physical UUID. Also creates a partition object for the device iff
bCreateSmcPartition is true and pGpuInfo->smcEnabled is true.
pGpuInfo->smcUserClientInfo will be used to determine the SMC partition in
this case. A device handle is returned in the device output parameter.
Error codes:
NV_ERR_GENERIC
@@ -161,6 +162,7 @@ void nvUvmInterfaceDeviceDestroy(uvmGpuDeviceHandle device);
NV_STATUS nvUvmInterfaceAddressSpaceCreate(uvmGpuDeviceHandle device,
unsigned long long vaBase,
unsigned long long vaSize,
NvBool enableAts,
uvmGpuAddressSpaceHandle *vaSpace,
UvmGpuAddressSpaceInfo *vaSpaceInfo);
@@ -422,33 +424,6 @@ NV_STATUS nvUvmInterfacePmaPinPages(void *pPma,
NvU64 pageSize,
NvU32 flags);
/*******************************************************************************
nvUvmInterfacePmaUnpinPages
This function will unpin the physical memory allocated using PMA. The pages
passed as input must be already pinned, else this function will return an
error and rollback any change if any page is not previously marked "pinned".
Behaviour is undefined if any blacklisted pages are unpinned.
Arguments:
pPma[IN] - Pointer to PMA object.
pPages[IN] - Array of pointers, containing the PA base
address of each page to be unpinned.
pageCount [IN] - Number of pages required to be unpinned.
pageSize [IN] - Page size of each page to be unpinned.
Error codes:
NV_ERR_INVALID_ARGUMENT - Invalid input arguments.
NV_ERR_GENERIC - Unexpected error. We try hard to avoid
returning this error code as is not very
informative.
NV_ERR_NOT_SUPPORTED - Operation not supported on broken FB
*/
NV_STATUS nvUvmInterfacePmaUnpinPages(void *pPma,
NvU64 *pPages,
NvLength pageCount,
NvU64 pageSize);
/*******************************************************************************
nvUvmInterfaceMemoryFree
@@ -638,6 +613,8 @@ NV_STATUS nvUvmInterfaceQueryCopyEnginesCaps(uvmGpuDeviceHandle device,
nvUvmInterfaceGetGpuInfo
Return various gpu info, refer to the UvmGpuInfo struct for details.
The input UUID is for the physical GPU and the pGpuClientInfo identifies
the SMC partition if SMC is enabled and the partition exists.
If no gpu matching the uuid is found, an error will be returned.
On Ampere+ GPUs, pGpuClientInfo contains SMC information provided by the
@@ -645,6 +622,9 @@ NV_STATUS nvUvmInterfaceQueryCopyEnginesCaps(uvmGpuDeviceHandle device,
Error codes:
NV_ERR_GENERIC
NV_ERR_NO_MEMORY
NV_ERR_GPU_UUID_NOT_FOUND
NV_ERR_INSUFFICIENT_PERMISSIONS
NV_ERR_INSUFFICIENT_RESOURCES
*/
NV_STATUS nvUvmInterfaceGetGpuInfo(const NvProcessorUuid *gpuUuid,
@@ -857,7 +837,7 @@ NV_STATUS nvUvmInterfaceGetEccInfo(uvmGpuDeviceHandle device,
UVM GPU UNLOCK
Arguments:
gpuUuid[IN] - UUID of the GPU to operate on
device[IN] - Device handle associated with the gpu
bOwnInterrupts - Set to NV_TRUE for UVM to take ownership of the
replayable page fault interrupts. Set to NV_FALSE
to return ownership of the page fault interrupts
@@ -973,14 +953,45 @@ NV_STATUS nvUvmInterfaceGetNonReplayableFaults(UvmGpuFaultInfo *pFaultInfo,
NOTES:
- This function DOES NOT acquire the RM API or GPU locks. That is because
it is called during fault servicing, which could produce deadlocks.
- This function should not be called when interrupts are disabled.
Arguments:
device[IN] - Device handle associated with the gpu
pFaultInfo[IN] - information provided by RM for fault handling.
used for obtaining the device handle without locks.
bCopyAndFlush[IN] - Instructs RM to perform the flush in the Copy+Flush mode.
In this mode, RM will perform a copy of the packets from
the HW buffer to UVM's SW buffer as part of performing
the flush. This mode gives UVM the opportunity to observe
the packets contained within the HW buffer at the time
of issuing the call.
Error codes:
NV_ERR_INVALID_ARGUMENT
*/
NV_STATUS nvUvmInterfaceFlushReplayableFaultBuffer(uvmGpuDeviceHandle device);
NV_STATUS nvUvmInterfaceFlushReplayableFaultBuffer(UvmGpuFaultInfo *pFaultInfo,
NvBool bCopyAndFlush);
/*******************************************************************************
nvUvmInterfaceTogglePrefetchFaults
This function sends an RPC to GSP in order to toggle the prefetch fault PRI.
NOTES:
- This function DOES NOT acquire the RM API or GPU locks. That is because
it is called during fault servicing, which could produce deadlocks.
- This function should not be called when interrupts are disabled.
Arguments:
pFaultInfo[IN] - Information provided by RM for fault handling.
Used for obtaining the device handle without locks.
bEnable[IN] - Instructs RM whether to toggle generating faults on
prefetch on/off.
Error codes:
NV_ERR_INVALID_ARGUMENT
*/
NV_STATUS nvUvmInterfaceTogglePrefetchFaults(UvmGpuFaultInfo *pFaultInfo,
NvBool bEnable);
/*******************************************************************************
nvUvmInterfaceInitAccessCntrInfo
@@ -1087,7 +1098,8 @@ void nvUvmInterfaceDeRegisterUvmOps(void);
Error codes:
NV_ERR_INVALID_ARGUMENT
NV_ERR_OBJECT_NOT_FOUND : If device object associated with the uuids aren't found.
NV_ERR_OBJECT_NOT_FOUND : If device object associated with the device
handles isn't found.
*/
NV_STATUS nvUvmInterfaceP2pObjectCreate(uvmGpuDeviceHandle device1,
uvmGpuDeviceHandle device2,
@@ -1140,6 +1152,8 @@ void nvUvmInterfaceP2pObjectDestroy(uvmGpuSessionHandle session,
NV_ERR_NOT_READY - Returned when querying the PTEs requires a deferred setup
which has not yet completed. It is expected that the caller
will reattempt the call until a different code is returned.
As an example, multi-node systems which require querying
PTEs from the Fabric Manager may return this code.
*/
NV_STATUS nvUvmInterfaceGetExternalAllocPtes(uvmGpuAddressSpaceHandle vaSpace,
NvHandle hMemory,
@@ -1449,18 +1463,30 @@ NV_STATUS nvUvmInterfacePagingChannelPushStream(UvmGpuPagingChannelHandle channe
NvU32 methodStreamSize);
/*******************************************************************************
CSL Interface and Locking
nvUvmInterfaceKeyRotationChannelDisable
The following functions do not acquire the RM API or GPU locks and must not be called
concurrently with the same UvmCslContext parameter in different threads. The caller must
guarantee this exclusion.
This function notifies RM that the given channels are idle.
* nvUvmInterfaceCslLogDeviceEncryption
* nvUvmInterfaceCslRotateIv
* nvUvmInterfaceCslEncrypt
* nvUvmInterfaceCslDecrypt
* nvUvmInterfaceCslSign
* nvUvmInterfaceCslQueryMessagePool
This function is called after RM has notified UVM that keys need to be rotated.
When called RM will disable the channels, rotate their keys, and then re-enable
the channels.
Locking: This function acquires an API and GPU lock.
Memory : This function dynamically allocates memory.
Arguments:
channelList[IN] - An array of channel handles whose channels are idle.
channelListCount[IN] - Number of channels in channelList. Its value must be
greater than 0.
Error codes:
NV_ERR_INVALID_ARGUMENT - channelList is NULL or channeListCount is 0.
*/
NV_STATUS nvUvmInterfaceKeyRotationChannelDisable(uvmGpuChannelHandle channelList[],
NvU32 channeListCount);
/*******************************************************************************
Cryptography Services Library (CSL) Interface
*/
/*******************************************************************************
@@ -1471,8 +1497,11 @@ NV_STATUS nvUvmInterfacePagingChannelPushStream(UvmGpuPagingChannelHandle channe
The lifetime of the context is the same as the lifetime of the secure channel
it is paired with.
Locking: This function acquires an API lock.
Memory : This function dynamically allocates memory.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
uvmCslContext[IN/OUT] - The CSL context associated with a channel.
channel[IN] - Handle to a secure channel.
Error codes:
@@ -1490,86 +1519,71 @@ NV_STATUS nvUvmInterfaceCslInitContext(UvmCslContext *uvmCslContext,
If context is already deinitialized then function returns immediately.
Locking: This function does not acquire an API or GPU lock.
Memory : This function may free memory.
Arguments:
uvmCslContext[IN] - The CSL context.
uvmCslContext[IN] - The CSL context associated with a channel.
*/
void nvUvmInterfaceDeinitCslContext(UvmCslContext *uvmCslContext);
/*******************************************************************************
nvUvmInterfaceCslLogDeviceEncryption
nvUvmInterfaceCslUpdateContext
Returns an IV that can be later used in the nvUvmInterfaceCslEncrypt
method. The IV contains a "freshness bit" which value is set by this method
and subsequently dirtied by nvUvmInterfaceCslEncrypt to prevent
non-malicious reuse of the IV.
Updates contexts after a key rotation event and can only be called once per
key rotation event. Following a key rotation event, and before
nvUvmInterfaceCslUpdateContext is called, data encrypted by the GPU with the
previous key can be decrypted with nvUvmInterfaceCslDecrypt.
See "CSL Interface and Locking" for locking requirements.
This function does not perform dynamic memory allocation.
Locking: This function acquires an API lock.
Memory : This function does not dynamically allocate memory.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
encryptIv[OUT] - Parameter that is stored before a successful
device encryption. It is used as an input to
nvUvmInterfaceCslEncrypt.
contextList[IN/OUT] - An array of pointers to CSL contexts.
contextListCount[IN] - Number of CSL contexts in contextList. Its value
must be greater than 0.
Error codes:
NV_ERR_INSUFFICIENT_RESOURCES - New IV would cause a counter to overflow.
NV_ERR_INVALID_ARGUMENT - contextList is NULL or contextListCount is 0.
*/
NV_STATUS nvUvmInterfaceCslAcquireEncryptionIv(UvmCslContext *uvmCslContext,
UvmCslIv *encryptIv);
/*******************************************************************************
nvUvmInterfaceCslLogDeviceEncryption
Logs and checks information about device encryption.
See "CSL Interface and Locking" for locking requirements.
This function does not perform dynamic memory allocation.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
decryptIv[OUT] - Parameter that is stored before a successful
device encryption. It is used as an input to
nvUvmInterfaceCslDecrypt.
Error codes:
NV_ERR_INSUFFICIENT_RESOURCES - The device encryption would cause a counter
to overflow.
*/
NV_STATUS nvUvmInterfaceCslLogDeviceEncryption(UvmCslContext *uvmCslContext,
UvmCslIv *decryptIv);
NV_STATUS nvUvmInterfaceCslUpdateContext(UvmCslContext *contextList[],
NvU32 contextListCount);
/*******************************************************************************
nvUvmInterfaceCslRotateIv
Rotates the IV for a given channel and direction.
Rotates the IV for a given channel and operation.
This function will rotate the IV on both the CPU and the GPU.
Outstanding messages that have been encrypted by the GPU should first be
decrypted before calling this function with direction equal to
UVM_CSL_DIR_GPU_TO_CPU. Similiarly, outstanding messages that have been
decrypted before calling this function with operation equal to
UVM_CSL_OPERATION_DECRYPT. Similarly, outstanding messages that have been
encrypted by the CPU should first be decrypted before calling this function
with direction equal to UVM_CSL_DIR_CPU_TO_GPU. For a given direction
with operation equal to UVM_CSL_OPERATION_ENCRYPT. For a given operation
the channel must be idle before calling this function. This function can be
called regardless of the value of the IV's message counter.
See "CSL Interface and Locking" for locking requirements.
This function does not perform dynamic memory allocation.
Locking: This function attempts to acquire the GPU lock.
In case of failure to acquire the return code
is NV_ERR_STATE_IN_USE.
Memory : This function does not dynamically allocate memory.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
direction[IN] - Either
- UVM_CSL_DIR_CPU_TO_GPU
- UVM_CSL_DIR_GPU_TO_CPU
uvmCslContext[IN/OUT] - The CSL context associated with a channel.
operation[IN] - Either
- UVM_CSL_OPERATION_ENCRYPT
- UVM_CSL_OPERATION_DECRYPT
Error codes:
NV_ERR_INSUFFICIENT_RESOURCES - The rotate operation would cause a counter
to overflow.
NV_ERR_INVALID_ARGUMENT - Invalid value for direction.
NV_ERR_STATE_IN_USE - Unable to acquire lock / resource. Caller
can retry at a later time.
NV_ERR_INVALID_ARGUMENT - Invalid value for operation.
NV_ERR_GENERIC - A failure other than _STATE_IN_USE occurred
when attempting to acquire a lock.
*/
NV_STATUS nvUvmInterfaceCslRotateIv(UvmCslContext *uvmCslContext,
UvmCslDirection direction);
UvmCslOperation operation);
/*******************************************************************************
nvUvmInterfaceCslEncrypt
@@ -1580,14 +1594,16 @@ NV_STATUS nvUvmInterfaceCslRotateIv(UvmCslContext *uvmCslContext,
this function produces undefined behavior. Performance is typically
maximized when the input and output buffers are 16-byte aligned. This is
natural alignment for AES block.
The encryptIV can be obtained from nvUvmInterfaceCslAcquireEncryptionIv.
The encryptIV can be obtained from nvUvmInterfaceCslIncrementIv.
However, it is optional. If it is NULL, the next IV in line will be used.
See "CSL Interface and Locking" for locking requirements.
This function does not perform dynamic memory allocation.
Locking: This function does not acquire an API or GPU lock.
If called concurrently in different threads with the same UvmCslContext
the caller must guarantee exclusion.
Memory : This function does not dynamically allocate memory.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
uvmCslContext[IN/OUT] - The CSL context associated with a channel.
bufferSize[IN] - Size of the input and output buffers in
units of bytes. Value can range from 1 byte
to (2^32) - 1 bytes.
@@ -1598,8 +1614,9 @@ Arguments:
Its size is UVM_CSL_CRYPT_AUTH_TAG_SIZE_BYTES.
Error codes:
NV_ERR_INVALID_ARGUMENT - The size of the data is 0 bytes.
- The encryptIv has already been used.
NV_ERR_INVALID_ARGUMENT - The CSL context is not associated with a channel.
- The size of the data is 0 bytes.
- The encryptIv has already been used.
*/
NV_STATUS nvUvmInterfaceCslEncrypt(UvmCslContext *uvmCslContext,
NvU32 bufferSize,
@@ -1618,17 +1635,25 @@ NV_STATUS nvUvmInterfaceCslEncrypt(UvmCslContext *uvmCslContext,
maximized when the input and output buffers are 16-byte aligned. This is
natural alignment for AES block.
See "CSL Interface and Locking" for locking requirements.
This function does not perform dynamic memory allocation.
Locking: This function does not acquire an API or GPU lock.
If called concurrently in different threads with the same UvmCslContext
the caller must guarantee exclusion.
Memory : This function does not dynamically allocate memory.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
bufferSize[IN] - Size of the input and output buffers in
units of bytes. Value can range from 1 byte
to (2^32) - 1 bytes.
decryptIv[IN] - Parameter given by nvUvmInterfaceCslLogDeviceEncryption.
bufferSize[IN] - Size of the input and output buffers in units of bytes.
Value can range from 1 byte to (2^32) - 1 bytes.
decryptIv[IN] - IV used to decrypt the ciphertext. Its value can either be given by
nvUvmInterfaceCslIncrementIv, or, if NULL, the CSL context's
internal counter is used.
inputBuffer[IN] - Address of ciphertext input buffer.
outputBuffer[OUT] - Address of plaintext output buffer.
addAuthData[IN] - Address of the plaintext additional authenticated data used to
calculate the authentication tag. Can be NULL.
addAuthDataSize[IN] - Size of the additional authenticated data in units of bytes.
Value can range from 1 byte to (2^32) - 1 bytes.
This parameter is ignored if addAuthData is NULL.
authTagBuffer[IN] - Address of authentication tag buffer.
Its size is UVM_CSL_CRYPT_AUTH_TAG_SIZE_BYTES.
@@ -1643,6 +1668,8 @@ NV_STATUS nvUvmInterfaceCslDecrypt(UvmCslContext *uvmCslContext,
NvU8 const *inputBuffer,
UvmCslIv const *decryptIv,
NvU8 *outputBuffer,
NvU8 const *addAuthData,
NvU32 addAuthDataSize,
NvU8 const *authTagBuffer);
/*******************************************************************************
@@ -1653,11 +1680,13 @@ NV_STATUS nvUvmInterfaceCslDecrypt(UvmCslContext *uvmCslContext,
Auth and input buffers must not overlap. If they do then calling this function produces
undefined behavior.
See "CSL Interface and Locking" for locking requirements.
This function does not perform dynamic memory allocation.
Locking: This function does not acquire an API or GPU lock.
If called concurrently in different threads with the same UvmCslContext
the caller must guarantee exclusion.
Memory : This function does not dynamically allocate memory.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
uvmCslContext[IN/OUT] - The CSL context associated with a channel.
bufferSize[IN] - Size of the input buffer in units of bytes.
Value can range from 1 byte to (2^32) - 1 bytes.
inputBuffer[IN] - Address of plaintext input buffer.
@@ -1666,32 +1695,99 @@ NV_STATUS nvUvmInterfaceCslDecrypt(UvmCslContext *uvmCslContext,
Error codes:
NV_ERR_INSUFFICIENT_RESOURCES - The signing operation would cause a counter overflow to occur.
NV_ERR_INVALID_ARGUMENT - The size of the data is 0 bytes.
NV_ERR_INVALID_ARGUMENT - The CSL context is not associated with a channel.
- The size of the data is 0 bytes.
*/
NV_STATUS nvUvmInterfaceCslSign(UvmCslContext *uvmCslContext,
NvU32 bufferSize,
NvU8 const *inputBuffer,
NvU8 *authTagBuffer);
/*******************************************************************************
nvUvmInterfaceCslQueryMessagePool
Returns the number of messages that can be encrypted before the message counter will overflow.
See "CSL Interface and Locking" for locking requirements.
This function does not perform dynamic memory allocation.
Locking: This function does not acquire an API or GPU lock.
Memory : This function does not dynamically allocate memory.
If called concurrently in different threads with the same UvmCslContext
the caller must guarantee exclusion.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
direction[IN] - Either UVM_CSL_DIR_CPU_TO_GPU or UVM_CSL_DIR_GPU_TO_CPU.
operation[IN] - Either UVM_CSL_OPERATION_ENCRYPT or UVM_CSL_OPERATION_DECRYPT.
messageNum[OUT] - Number of messages left before overflow.
Error codes:
NV_ERR_INVALID_ARGUMENT - The value of the direction parameter is illegal.
NV_ERR_INVALID_ARGUMENT - The value of the operation parameter is illegal.
*/
NV_STATUS nvUvmInterfaceCslQueryMessagePool(UvmCslContext *uvmCslContext,
UvmCslDirection direction,
UvmCslOperation operation,
NvU64 *messageNum);
/*******************************************************************************
nvUvmInterfaceCslIncrementIv
Increments the message counter by the specified amount.
If iv is non-NULL then the incremented value is returned.
If operation is UVM_CSL_OPERATION_ENCRYPT then the returned IV's "freshness" bit is set and
can be used in nvUvmInterfaceCslEncrypt. If operation is UVM_CSL_OPERATION_DECRYPT then
the returned IV can be used in nvUvmInterfaceCslDecrypt.
Locking: This function does not acquire an API or GPU lock.
If called concurrently in different threads with the same UvmCslContext
the caller must guarantee exclusion.
Memory : This function does not dynamically allocate memory.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
operation[IN] - Either
- UVM_CSL_OPERATION_ENCRYPT
- UVM_CSL_OPERATION_DECRYPT
increment[IN] - The amount by which the IV is incremented. Can be 0.
iv[OUT] - If non-NULL, a buffer to store the incremented IV.
Error codes:
NV_ERR_INVALID_ARGUMENT - The value of the operation parameter is illegal.
NV_ERR_INSUFFICIENT_RESOURCES - Incrementing the message counter would result
in an overflow.
*/
NV_STATUS nvUvmInterfaceCslIncrementIv(UvmCslContext *uvmCslContext,
UvmCslOperation operation,
NvU64 increment,
UvmCslIv *iv);
/*******************************************************************************
nvUvmInterfaceCslLogExternalEncryption
Checks and logs information about non-CSL encryptions, such as those that
originate from the GPU.
For contexts associated with channels, this function does not modify elements of
the UvmCslContext and must be called for each external encryption invocation.
For the context associated with fault buffers, bufferSize can encompass multiple
encryption invocations, and the UvmCslContext will be updated following a key
rotation event.
In either case the IV remains unmodified after this function is called.
Locking: This function does not acquire an API or GPU lock.
Memory : This function does not dynamically allocate memory.
If called concurrently in different threads with the same UvmCslContext
the caller must guarantee exclusion.
Arguments:
uvmCslContext[IN/OUT] - The CSL context.
bufferSize[OUT] - The size of the buffer(s) encrypted by the
external entity in units of bytes.
Error codes:
NV_ERR_INSUFFICIENT_RESOURCES - The device encryption would cause a counter
to overflow.
*/
NV_STATUS nvUvmInterfaceCslLogExternalEncryption(UvmCslContext *uvmCslContext,
NvU32 bufferSize);
#endif // _NV_UVM_INTERFACE_H_

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2014-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2014-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -39,12 +39,12 @@
// are multiple BIG page sizes in RM. These defines are used as flags to "0"
// should be OK when user is not sure which pagesize allocation it wants
//
#define UVM_PAGE_SIZE_DEFAULT 0x0
#define UVM_PAGE_SIZE_4K 0x1000
#define UVM_PAGE_SIZE_64K 0x10000
#define UVM_PAGE_SIZE_128K 0x20000
#define UVM_PAGE_SIZE_2M 0x200000
#define UVM_PAGE_SIZE_512M 0x20000000
#define UVM_PAGE_SIZE_DEFAULT 0x0ULL
#define UVM_PAGE_SIZE_4K 0x1000ULL
#define UVM_PAGE_SIZE_64K 0x10000ULL
#define UVM_PAGE_SIZE_128K 0x20000ULL
#define UVM_PAGE_SIZE_2M 0x200000ULL
#define UVM_PAGE_SIZE_512M 0x20000000ULL
//
// When modifying flags, make sure they are compatible with the mirrored
@@ -104,6 +104,10 @@ typedef struct UvmGpuMemoryInfo_tag
// Out: Set to TRUE, if the allocation is in sysmem.
NvBool sysmem;
// Out: Set to TRUE, if this allocation is treated as EGM.
// sysmem is also TRUE when egm is TRUE.
NvBool egm;
// Out: Set to TRUE, if the allocation is a constructed
// under a Device or Subdevice.
// All permutations of sysmem and deviceDescendant are valid.
@@ -125,6 +129,10 @@ typedef struct UvmGpuMemoryInfo_tag
// Out: Uuid of the GPU to which the allocation belongs.
// This is only valid if deviceDescendant is NV_TRUE.
// When egm is NV_TRUE, this is also the UUID of the GPU
// for which EGM is local.
// If the GPU has SMC enabled, the UUID is the GI UUID.
// Otherwise, it is the UUID for the physical GPU.
// Note: If the allocation is owned by a device in
// an SLI group and the allocation is broadcast
// across the SLI group, this UUID will be any one
@@ -259,6 +267,7 @@ typedef struct UvmGpuChannelInfo_tag
// The errorNotifier is filled out when the channel hits an RC error.
NvNotification *errorNotifier;
NvNotification *keyRotationNotifier;
NvU32 hwRunlistId;
NvU32 hwChannelId;
@@ -284,12 +293,13 @@ typedef struct UvmGpuChannelInfo_tag
// GPU VAs of both GPFIFO and GPPUT are needed in Confidential Computing
// so a channel can be controlled via another channel (SEC2 or WLC/LCIC)
NvU64 gpFifoGpuVa;
NvU64 gpPutGpuVa;
NvU64 gpFifoGpuVa;
NvU64 gpPutGpuVa;
NvU64 gpGetGpuVa;
// GPU VA of work submission offset is needed in Confidential Computing
// so CE channels can ring doorbell of other channels as required for
// WLC/LCIC work submission
NvU64 workSubmissionOffsetGpuVa;
NvU64 workSubmissionOffsetGpuVa;
} UvmGpuChannelInfo;
typedef enum
@@ -320,10 +330,6 @@ typedef struct UvmGpuChannelAllocParams_tag
// The next two fields store UVM_BUFFER_LOCATION values
NvU32 gpFifoLoc;
NvU32 gpPutLoc;
// Allocate the channel as secure. This flag should only be set when
// Confidential Compute is enabled.
NvBool secure;
} UvmGpuChannelAllocParams;
typedef struct UvmGpuPagingChannelAllocParams_tag
@@ -335,7 +341,7 @@ typedef struct UvmGpuPagingChannelAllocParams_tag
// The max number of Copy Engines supported by a GPU.
// The gpu ops build has a static assert that this is the correct number.
#define UVM_COPY_ENGINE_COUNT_MAX 10
#define UVM_COPY_ENGINE_COUNT_MAX 64
typedef struct
{
@@ -367,9 +373,6 @@ typedef struct
// True if the CE can be used for P2P transactions
NvBool p2p:1;
// True if the CE supports encryption
NvBool secure:1;
// Mask of physical CEs assigned to this LCE
//
// The value returned by RM for this field may change when a GPU is
@@ -544,6 +547,10 @@ typedef struct UvmGpuP2PCapsParams_tag
// the GPUs are direct peers.
NvU32 peerIds[2];
// Out: peerId[i] contains gpu[i]'s EGM peer id of gpu[1 - i]. Only defined
// if the GPUs are direct peers and EGM enabled in the system.
NvU32 egmPeerIds[2];
// Out: UVM_LINK_TYPE
NvU32 p2pLink;
@@ -572,8 +579,11 @@ typedef struct UvmPlatformInfo_tag
// Out: ATS (Address Translation Services) is supported
NvBool atsSupported;
// Out: AMD SEV (Secure Encrypted Virtualization) is enabled
NvBool sevEnabled;
// Out: True if HW trusted execution, such as AMD's SEV-SNP or Intel's TDX,
// is enabled in the VM, indicating that Confidential Computing must be
// also enabled in the GPU(s); these two security features are either both
// enabled, or both disabled.
NvBool confComputingEnabled;
} UvmPlatformInfo;
typedef struct UvmGpuClientInfo_tag
@@ -604,7 +614,8 @@ typedef struct UvmGpuInfo_tag
// Printable gpu name
char name[UVM_GPU_NAME_LENGTH];
// Uuid of this gpu
// Uuid of the physical GPU or GI UUID if nvUvmInterfaceGetGpuInfo()
// requested information for a valid SMC partition.
NvProcessorUuid uuid;
// Gpu architecture; NV2080_CTRL_MC_ARCH_INFO_ARCHITECTURE_*
@@ -686,6 +697,16 @@ typedef struct UvmGpuInfo_tag
// to NVSwitch peers.
NvBool connectedToSwitch;
NvU64 nvswitchMemoryWindowStart;
// local EGM properties
// NV_TRUE if EGM is enabled
NvBool egmEnabled;
// Peer ID to reach local EGM when EGM is enabled
NvU8 egmPeerId;
// EGM base address to offset in the GMMU PTE entry for EGM mappings
NvU64 egmBaseAddr;
} UvmGpuInfo;
typedef struct UvmGpuFbInfo_tag
@@ -694,9 +715,10 @@ typedef struct UvmGpuFbInfo_tag
// RM regions that are not registered with PMA either.
NvU64 maxAllocatableAddress;
NvU32 heapSize; // RAM in KB available for user allocations
NvU32 reservedHeapSize; // RAM in KB reserved for internal RM allocation
NvBool bZeroFb; // Zero FB mode enabled.
NvU32 heapSize; // RAM in KB available for user allocations
NvU32 reservedHeapSize; // RAM in KB reserved for internal RM allocation
NvBool bZeroFb; // Zero FB mode enabled.
NvU64 maxVidmemPageSize; // Largest GPU page size to access vidmem.
} UvmGpuFbInfo;
typedef struct UvmGpuEccInfo_tag
@@ -774,14 +796,14 @@ typedef NV_STATUS (*uvmEventResume_t) (void);
/*******************************************************************************
uvmEventStartDevice
This function will be called by the GPU driver once it has finished its
initialization to tell the UVM driver that this GPU has come up.
initialization to tell the UVM driver that this physical GPU has come up.
*/
typedef NV_STATUS (*uvmEventStartDevice_t) (const NvProcessorUuid *pGpuUuidStruct);
/*******************************************************************************
uvmEventStopDevice
This function will be called by the GPU driver to let UVM know that a GPU
is going down.
This function will be called by the GPU driver to let UVM know that a
physical GPU is going down.
*/
typedef NV_STATUS (*uvmEventStopDevice_t) (const NvProcessorUuid *pGpuUuidStruct);
@@ -812,7 +834,7 @@ typedef NV_STATUS (*uvmEventServiceInterrupt_t) (void *pDeviceObject,
/*******************************************************************************
uvmEventIsrTopHalf_t
This function will be called by the GPU driver to let UVM know
that an interrupt has occurred.
that an interrupt has occurred on the given physical GPU.
Returns:
NV_OK if the UVM driver handled the interrupt
@@ -851,6 +873,14 @@ typedef union UvmFaultMetadataPacket_tag
NvU8 _padding[32];
} UvmFaultMetadataPacket;
// This struct shall not be accessed nor modified directly by UVM as it is
// entirely managed by the RM layer
typedef struct UvmCslContext_tag
{
struct ccslContext_t *ctx;
void *nvidia_stack;
} UvmCslContext;
typedef struct UvmGpuFaultInfo_tag
{
struct
@@ -908,10 +938,9 @@ typedef struct UvmGpuFaultInfo_tag
// Confidential Computing is disabled.
UvmFaultMetadataPacket *bufferMetadata;
// Indicates whether UVM owns the replayable fault buffer.
// The value of this field is always NV_TRUE When Confidential Computing
// is disabled.
NvBool bUvmOwnsHwFaultBuffer;
// CSL context used for performing decryption of replayable faults when
// Confidential Computing is enabled.
UvmCslContext cslCtx;
} replayable;
struct
{
@@ -1046,24 +1075,33 @@ typedef UvmGpuPagingChannelInfo gpuPagingChannelInfo;
typedef UvmGpuPagingChannelAllocParams gpuPagingChannelAllocParams;
typedef UvmPmaAllocationOptions gpuPmaAllocationOptions;
// This struct shall not be accessed nor modified directly by UVM as it is
// entirely managed by the RM layer
typedef struct UvmCslContext_tag
{
struct ccslContext_t *ctx;
void *nvidia_stack;
} UvmCslContext;
typedef struct UvmCslIv
{
NvU8 iv[12];
NvU8 fresh;
} UvmCslIv;
typedef enum UvmCslDirection
typedef enum UvmCslOperation
{
UVM_CSL_DIR_CPU_TO_GPU,
UVM_CSL_DIR_GPU_TO_CPU
} UvmCslDirection;
UVM_CSL_OPERATION_ENCRYPT,
UVM_CSL_OPERATION_DECRYPT
} UvmCslOperation;
typedef enum UVM_KEY_ROTATION_STATUS {
// Key rotation complete/not in progress
UVM_KEY_ROTATION_STATUS_IDLE = 0,
// RM is waiting for clients to report their channels are idle for key rotation
UVM_KEY_ROTATION_STATUS_PENDING = 1,
// Key rotation is in progress
UVM_KEY_ROTATION_STATUS_IN_PROGRESS = 2,
// Key rotation timeout failure, RM will RC non-idle channels.
// UVM should never see this status value.
UVM_KEY_ROTATION_STATUS_FAILED_TIMEOUT = 3,
// Key rotation failed because upper threshold was crossed, RM will RC non-idle channels
UVM_KEY_ROTATION_STATUS_FAILED_THRESHOLD = 4,
// Internal RM failure while rotating keys for a certain channel, RM will RC the channel.
UVM_KEY_ROTATION_STATUS_FAILED_ROTATION = 5,
UVM_KEY_ROTATION_STATUS_MAX_COUNT = 6,
} UVM_KEY_ROTATION_STATUS;
#endif // _NV_UVM_TYPES_H_

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2014-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2014-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -45,6 +45,11 @@
#define NVKMS_DEVICE_ID_TEGRA 0x0000ffff
#define NVKMS_MAX_SUPERFRAME_VIEWS 4
#define NVKMS_LOG2_LUT_ARRAY_SIZE 10
#define NVKMS_LUT_ARRAY_SIZE (1 << NVKMS_LOG2_LUT_ARRAY_SIZE)
typedef NvU32 NvKmsDeviceHandle;
typedef NvU32 NvKmsDispHandle;
typedef NvU32 NvKmsConnectorHandle;
@@ -53,6 +58,7 @@ typedef NvU32 NvKmsFrameLockHandle;
typedef NvU32 NvKmsDeferredRequestFifoHandle;
typedef NvU32 NvKmsSwapGroupHandle;
typedef NvU32 NvKmsVblankSyncObjectHandle;
typedef NvU32 NvKmsVblankSemControlHandle;
struct NvKmsSize {
NvU16 width;
@@ -179,6 +185,14 @@ enum NvKmsEventType {
NVKMS_EVENT_TYPE_FLIP_OCCURRED,
};
enum NvKmsFlipResult {
NV_KMS_FLIP_RESULT_SUCCESS = 0, /* Success */
NV_KMS_FLIP_RESULT_INVALID_PARAMS, /* Parameter validation failed */
NV_KMS_FLIP_RESULT_IN_PROGRESS, /* Flip would fail because an outstanding
flip containing changes that cannot be
queued is in progress */
};
typedef enum {
NV_EVO_SCALER_1TAP = 0,
NV_EVO_SCALER_2TAPS = 1,
@@ -221,6 +235,16 @@ struct NvKmsUsageBounds {
} layer[NVKMS_MAX_LAYERS_PER_HEAD];
};
/*!
* Per-component arrays of NvU16s describing the LUT; used for both the input
* LUT and output LUT.
*/
struct NvKmsLutRamps {
NvU16 red[NVKMS_LUT_ARRAY_SIZE]; /*! in */
NvU16 green[NVKMS_LUT_ARRAY_SIZE]; /*! in */
NvU16 blue[NVKMS_LUT_ARRAY_SIZE]; /*! in */
};
/*
* A 3x4 row-major colorspace conversion matrix.
*
@@ -531,6 +555,18 @@ typedef struct {
NvBool noncoherent;
} NvKmsDispIOCoherencyModes;
enum NvKmsInputColorRange {
/*
* If DEFAULT is provided, driver will assume full range for RGB formats
* and limited range for YUV formats.
*/
NVKMS_INPUT_COLORRANGE_DEFAULT = 0,
NVKMS_INPUT_COLORRANGE_LIMITED = 1,
NVKMS_INPUT_COLORRANGE_FULL = 2,
};
enum NvKmsInputColorSpace {
/* Unknown colorspace; no de-gamma will be applied */
NVKMS_INPUT_COLORSPACE_NONE = 0,
@@ -542,6 +578,12 @@ enum NvKmsInputColorSpace {
NVKMS_INPUT_COLORSPACE_BT2100_PQ = 2,
};
enum NvKmsOutputColorimetry {
NVKMS_OUTPUT_COLORIMETRY_DEFAULT = 0,
NVKMS_OUTPUT_COLORIMETRY_BT2100 = 1,
};
enum NvKmsOutputTf {
/*
* NVKMS itself won't apply any OETF (clients are still
@@ -552,6 +594,17 @@ enum NvKmsOutputTf {
NVKMS_OUTPUT_TF_PQ = 2,
};
/*!
* EOTF Data Byte 1 as per CTA-861-G spec.
* This is expected to match exactly with the spec.
*/
enum NvKmsInfoFrameEOTF {
NVKMS_INFOFRAME_EOTF_SDR_GAMMA = 0,
NVKMS_INFOFRAME_EOTF_HDR_GAMMA = 1,
NVKMS_INFOFRAME_EOTF_ST2084 = 2,
NVKMS_INFOFRAME_EOTF_HLG = 3,
};
/*!
* HDR Static Metadata Type1 Descriptor as per CEA-861.3 spec.
* This is expected to match exactly with the spec.
@@ -605,4 +658,29 @@ struct NvKmsHDRStaticMetadata {
NvU16 maxFALL;
};
/*!
* A superframe is made of two or more video streams that are combined in
* a specific way. A DP serializer (an external device connected to a Tegra
* ARM SOC over DP or HDMI) can receive a video stream comprising multiple
* videos combined into a single frame and then split it into multiple
* video streams. The following structure describes the number of views
* and dimensions of each view inside a superframe.
*/
struct NvKmsSuperframeInfo {
NvU8 numViews;
struct {
/* x offset inside superframe at which this view starts */
NvU16 x;
/* y offset inside superframe at which this view starts */
NvU16 y;
/* Horizontal active width in pixels for this view */
NvU16 width;
/* Vertical active height in lines for this view */
NvU16 height;
} view[NVKMS_MAX_SUPERFRAME_VIEWS];
};
#endif /* NVKMS_API_TYPES_H */

View File

@@ -49,6 +49,8 @@ struct NvKmsKapiDevice;
struct NvKmsKapiMemory;
struct NvKmsKapiSurface;
struct NvKmsKapiChannelEvent;
struct NvKmsKapiSemaphoreSurface;
struct NvKmsKapiSemaphoreSurfaceCallback;
typedef NvU32 NvKmsKapiConnector;
typedef NvU32 NvKmsKapiDisplay;
@@ -67,6 +69,14 @@ typedef NvU32 NvKmsKapiDisplay;
*/
typedef void NvKmsChannelEventProc(void *dataPtr, NvU32 dataU32);
/*
* Note: Same as above, this function must not call back into NVKMS-KAPI, nor
* directly into RM. Doing so could cause deadlocks given the notification
* function will most likely be called from within RM's interrupt handler
* callchain.
*/
typedef void NvKmsSemaphoreSurfaceCallbackProc(void *pData);
/** @} */
/**
@@ -126,6 +136,11 @@ struct NvKmsKapiDeviceResourcesInfo {
NvU32 validCursorCompositionModes;
NvU64 supportedCursorSurfaceMemoryFormats;
struct {
NvU64 maxSubmittedOffset;
NvU64 stride;
} semsurf;
struct {
NvU16 validRRTransforms;
NvU32 validCompositionModes;
@@ -218,8 +233,10 @@ struct NvKmsKapiLayerConfig {
struct NvKmsRRParams rrParams;
struct NvKmsKapiSyncpt syncptParams;
struct NvKmsHDRStaticMetadata hdrMetadata;
NvBool hdrMetadataSpecified;
struct {
struct NvKmsHDRStaticMetadata val;
NvBool enabled;
} hdrMetadata;
enum NvKmsOutputTf tf;
@@ -233,16 +250,21 @@ struct NvKmsKapiLayerConfig {
NvU16 dstWidth, dstHeight;
enum NvKmsInputColorSpace inputColorSpace;
struct NvKmsCscMatrix csc;
NvBool cscUseMain;
};
struct NvKmsKapiLayerRequestedConfig {
struct NvKmsKapiLayerConfig config;
struct {
NvBool surfaceChanged : 1;
NvBool srcXYChanged : 1;
NvBool srcWHChanged : 1;
NvBool dstXYChanged : 1;
NvBool dstWHChanged : 1;
NvBool surfaceChanged : 1;
NvBool srcXYChanged : 1;
NvBool srcWHChanged : 1;
NvBool dstXYChanged : 1;
NvBool dstWHChanged : 1;
NvBool cscChanged : 1;
NvBool tfChanged : 1;
NvBool hdrMetadataChanged : 1;
} flags;
};
@@ -286,14 +308,41 @@ struct NvKmsKapiHeadModeSetConfig {
struct NvKmsKapiDisplayMode mode;
NvBool vrrEnabled;
struct {
NvBool enabled;
enum NvKmsInfoFrameEOTF eotf;
struct NvKmsHDRStaticMetadata staticMetadata;
} hdrInfoFrame;
enum NvKmsOutputColorimetry colorimetry;
struct {
struct {
NvBool specified;
NvU32 depth;
NvU32 start;
NvU32 end;
struct NvKmsLutRamps *pRamps;
} input;
struct {
NvBool specified;
NvBool enabled;
struct NvKmsLutRamps *pRamps;
} output;
} lut;
};
struct NvKmsKapiHeadRequestedConfig {
struct NvKmsKapiHeadModeSetConfig modeSetConfig;
struct {
NvBool activeChanged : 1;
NvBool displaysChanged : 1;
NvBool modeChanged : 1;
NvBool activeChanged : 1;
NvBool displaysChanged : 1;
NvBool modeChanged : 1;
NvBool hdrInfoFrameChanged : 1;
NvBool colorimetryChanged : 1;
NvBool lutChanged : 1;
} flags;
struct NvKmsKapiCursorRequestedConfig cursorRequestedConfig;
@@ -318,6 +367,7 @@ struct NvKmsKapiHeadReplyConfig {
};
struct NvKmsKapiModeSetReplyConfig {
enum NvKmsFlipResult flipResult;
struct NvKmsKapiHeadReplyConfig
headReplyConfig[NVKMS_KAPI_MAX_HEADS];
};
@@ -434,6 +484,14 @@ enum NvKmsKapiAllocationType {
NVKMS_KAPI_ALLOCATION_TYPE_OFFSCREEN = 2,
};
typedef enum NvKmsKapiRegisterWaiterResultRec {
NVKMS_KAPI_REG_WAITER_FAILED,
NVKMS_KAPI_REG_WAITER_SUCCESS,
NVKMS_KAPI_REG_WAITER_ALREADY_SIGNALLED,
} NvKmsKapiRegisterWaiterResult;
typedef void NvKmsKapiSuspendResumeCallbackFunc(NvBool suspend);
struct NvKmsKapiFunctionsTable {
/*!
@@ -519,8 +577,8 @@ struct NvKmsKapiFunctionsTable {
);
/*!
* Revoke permissions previously granted. Only one (dispIndex, head,
* display) is currently supported.
* Revoke modeset permissions previously granted. Only one (dispIndex,
* head, display) is currently supported.
*
* \param [in] device A device returned by allocateDevice().
*
@@ -537,6 +595,34 @@ struct NvKmsKapiFunctionsTable {
NvKmsKapiDisplay display
);
/*!
* Grant modeset sub-owner permissions to fd. This is used by clients to
* convert drm 'master' permissions into nvkms sub-owner permission.
*
* \param [in] fd fd from opening /dev/nvidia-modeset.
*
* \param [in] device A device returned by allocateDevice().
*
* \return NV_TRUE on success, NV_FALSE on failure.
*/
NvBool (*grantSubOwnership)
(
NvS32 fd,
struct NvKmsKapiDevice *device
);
/*!
* Revoke sub-owner permissions previously granted.
*
* \param [in] device A device returned by allocateDevice().
*
* \return NV_TRUE on success, NV_FALSE on failure.
*/
NvBool (*revokeSubOwnership)
(
struct NvKmsKapiDevice *device
);
/*!
* Registers for notification, via
* NvKmsKapiAllocateDeviceParams::eventCallback, of the events specified
@@ -1122,6 +1208,208 @@ struct NvKmsKapiFunctionsTable {
NvP64 dmaBuf,
NvU32 limit);
/*!
* Import a semaphore surface allocated elsewhere to NVKMS and return a
* handle to the new object.
*
* \param [in] device A device allocated using allocateDevice().
*
* \param [in] nvKmsParamsUser Userspace pointer to driver-specific
* parameters describing the semaphore
* surface being imported.
*
* \param [in] nvKmsParamsSize Size of the driver-specific parameter
* struct.
*
* \param [out] pSemaphoreMap Returns a CPU mapping of the semaphore
* surface's semaphore memory to the client.
*
* \param [out] pMaxSubmittedMap Returns a CPU mapping of the semaphore
* surface's semaphore memory to the client.
*
* \return struct NvKmsKapiSemaphoreSurface* on success, NULL on failure.
*/
struct NvKmsKapiSemaphoreSurface* (*importSemaphoreSurface)
(
struct NvKmsKapiDevice *device,
NvU64 nvKmsParamsUser,
NvU64 nvKmsParamsSize,
void **pSemaphoreMap,
void **pMaxSubmittedMap
);
/*!
* Free an imported semaphore surface.
*
* \param [in] device The device passed to
* importSemaphoreSurface() when creating
* semaphoreSurface.
*
* \param [in] semaphoreSurface A semaphore surface returned by
* importSemaphoreSurface().
*/
void (*freeSemaphoreSurface)
(
struct NvKmsKapiDevice *device,
struct NvKmsKapiSemaphoreSurface *semaphoreSurface
);
/*!
* Register a callback to be called when a semaphore reaches a value.
*
* The callback will be called when the semaphore at index in
* semaphoreSurface reaches the value wait_value. The callback will
* be called at most once and is automatically unregistered when called.
* It may also be unregistered (i.e., cancelled) explicitly using the
* unregisterSemaphoreSurfaceCallback() function. To avoid leaking the
* memory used to track the registered callback, callers must ensure one
* of these methods of unregistration is used for every successful
* callback registration that returns a non-NULL pCallbackHandle.
*
* \param [in] device The device passed to
* importSemaphoreSurface() when creating
* semaphoreSurface.
*
* \param [in] semaphoreSurface A semaphore surface returned by
* importSemaphoreSurface().
*
* \param [in] pCallback A pointer to the function to call when
* the specified value is reached. NULL
* means no callback.
*
* \param [in] pData Arbitrary data to be passed back to the
* callback as its sole parameter.
*
* \param [in] index The index of the semaphore within
* semaphoreSurface.
*
* \param [in] wait_value The value the semaphore must reach or
* exceed before the callback is called.
*
* \param [in] new_value The value the semaphore will be set to
* when it reaches or exceeds <wait_value>.
* 0 means do not update the value.
*
* \param [out] pCallbackHandle On success, the value pointed to will
* contain an opaque handle to the
* registered callback that may be used to
* cancel it if needed. Unused if pCallback
* is NULL.
*
* \return NVKMS_KAPI_REG_WAITER_SUCCESS if the waiter was registered or if
* no callback was requested and the semaphore at <index> has
* already reached or exceeded <wait_value>
*
* NVKMS_KAPI_REG_WAITER_ALREADY_SIGNALLED if a callback was
* requested and the semaphore at <index> has already reached or
* exceeded <wait_value>
*
* NVKMS_KAPI_REG_WAITER_FAILED if waiter registration failed.
*/
NvKmsKapiRegisterWaiterResult
(*registerSemaphoreSurfaceCallback)
(
struct NvKmsKapiDevice *device,
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
NvKmsSemaphoreSurfaceCallbackProc *pCallback,
void *pData,
NvU64 index,
NvU64 wait_value,
NvU64 new_value,
struct NvKmsKapiSemaphoreSurfaceCallback **pCallbackHandle
);
/*!
* Unregister a callback registered via registerSemaphoreSurfaceCallback()
*
* If the callback has not yet been called, this function will cancel the
* callback and free its associated resources.
*
* Note this function treats the callback handle as a pointer. While this
* function does not dereference that pointer itself, the underlying call
* to RM does within a properly guarded critical section that first ensures
* it is not in the process of being used within a callback. This means
* the callstack must take into consideration that pointers are not in
* general unique handles if they may have been freed, since a subsequent
* malloc could return the same pointer value at that point. This callchain
* avoids that by leveraging the behavior of the underlying RM APIs:
*
* 1) A callback handle is referenced relative to its corresponding
* (semaphore surface, index, wait_value) tuple here and within RM. It
* is not a valid handle outside of that scope.
*
* 2) A callback can not be registered against an already-reached value
* for a given semaphore surface index.
*
* 3) A given callback handle can not be registered twice against the same
* (semaphore surface, index, wait_value) tuple, so unregistration will
* never race with registration at the RM level, and would only race at
* a higher level if used incorrectly. Since this is kernel code, we
* can safely assume there won't be malicious clients purposely misuing
* the API, but the burden is placed on the caller to ensure its usage
* does not lead to races at higher levels.
*
* These factors considered together ensure any valid registered handle is
* either still in the relevant waiter list and refers to the same event/
* callback as when it was registered, or has been removed from the list
* as part of a critical section that also destroys the list itself and
* makes future lookups in that list impossible, and hence eliminates the
* chance of comparing a stale handle with a new handle of the same value
* as part of a lookup.
*
* \param [in] device The device passed to
* importSemaphoreSurface() when creating
* semaphoreSurface.
*
* \param [in] semaphoreSurface The semaphore surface passed to
* registerSemaphoreSurfaceCallback() when
* registering the callback.
*
* \param [in] index The index passed to
* registerSemaphoreSurfaceCallback() when
* registering the callback.
*
* \param [in] wait_value The wait_value passed to
* registerSemaphoreSurfaceCallback() when
* registering the callback.
*
* \param [in] callbackHandle The callback handle returned by
* registerSemaphoreSurfaceCallback().
*/
NvBool
(*unregisterSemaphoreSurfaceCallback)
(
struct NvKmsKapiDevice *device,
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
NvU64 index,
NvU64 wait_value,
struct NvKmsKapiSemaphoreSurfaceCallback *callbackHandle
);
/*!
* Update the value of a semaphore surface from the CPU.
*
* Update the semaphore value at the specified index from the CPU, then
* wake up any pending CPU waiters associated with that index that are
* waiting on it reaching a value <= the new value.
*/
NvBool
(*setSemaphoreSurfaceValue)
(
struct NvKmsKapiDevice *device,
struct NvKmsKapiSemaphoreSurface *semaphoreSurface,
NvU64 index,
NvU64 new_value
);
/*!
* Set the callback function for suspending and resuming the display system.
*/
void
(*setSuspendResumeCallback)
(
NvKmsKapiSuspendResumeCallbackFunc *function
);
};
/** @} */

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 1993-2020 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -494,6 +494,23 @@ do \
//
#define NV_TWO_N_MINUS_ONE(n) (((1ULL<<(n/2))<<((n+1)/2))-1)
//
// Create a 64b bitmask with n bits set
// This is the same as ((1ULL<<n) - 1), but it doesn't overflow for n=64
//
// ...
// n=-1, 0x0000000000000000
// n=0, 0x0000000000000000
// n=1, 0x0000000000000001
// ...
// n=63, 0x7FFFFFFFFFFFFFFF
// n=64, 0xFFFFFFFFFFFFFFFF
// n=65, 0xFFFFFFFFFFFFFFFF
// n=66, 0xFFFFFFFFFFFFFFFF
// ...
//
#define NV_BITMASK64(n) ((n<1) ? 0ULL : (NV_U64_MAX>>((n>64) ? 0 : (64-n))))
#define DRF_READ_1WORD_BS(d,r,f,v) \
((DRF_EXTENT_MW(NV##d##r##f)<8)?DRF_READ_1BYTE_BS(NV##d##r##f,(v)): \
((DRF_EXTENT_MW(NV##d##r##f)<16)?DRF_READ_2BYTE_BS(NV##d##r##f,(v)): \
@@ -574,6 +591,13 @@ nvMaskPos32(const NvU32 mask, const NvU32 bitIdx)
n32 = BIT_IDX_32(LOWESTBIT(n32));\
}
// Destructive operation on n64
#define LOWESTBITIDX_64(n64) \
{ \
n64 = BIT_IDX_64(LOWESTBIT(n64));\
}
// Destructive operation on n32
#define HIGHESTBITIDX_32(n32) \
{ \
@@ -918,6 +942,14 @@ static NV_FORCEINLINE void *NV_NVUPTR_TO_PTR(NvUPtr address)
// Use (lo) if (b) is less than 64, and (hi) if >= 64.
//
#define NV_BIT_SET_128(b, lo, hi) { nvAssert( (b) < 128 ); if ( (b) < 64 ) (lo) |= NVBIT64(b); else (hi) |= NVBIT64( b & 0x3F ); }
//
// Clear the bit at pos (b) for U64 which is < 128.
// Use (lo) if (b) is less than 64, and (hi) if >= 64.
//
#define NV_BIT_CLEAR_128(b, lo, hi) { nvAssert( (b) < 128 ); if ( (b) < 64 ) (lo) &= ~NVBIT64(b); else (hi) &= ~NVBIT64( b & 0x3F ); }
// Get the number of elements the specified fixed-size array
#define NV_ARRAY_ELEMENTS(x) ((sizeof(x)/sizeof((x)[0])))
#ifdef __cplusplus
}

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2014-2020 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2014-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -150,6 +150,9 @@ NV_STATUS_CODE(NV_ERR_NVLINK_CONFIGURATION_ERROR, 0x00000078, "Nvlink Confi
NV_STATUS_CODE(NV_ERR_RISCV_ERROR, 0x00000079, "Generic RISC-V assert or halt")
NV_STATUS_CODE(NV_ERR_FABRIC_MANAGER_NOT_PRESENT, 0x0000007A, "Fabric Manager is not loaded")
NV_STATUS_CODE(NV_ERR_ALREADY_SIGNALLED, 0x0000007B, "Semaphore Surface value already >= requested wait value")
NV_STATUS_CODE(NV_ERR_QUEUE_TASK_SLOT_NOT_AVAILABLE, 0x0000007C, "PMU RPC error due to no queue slot available for this event")
NV_STATUS_CODE(NV_ERR_KEY_ROTATION_IN_PROGRESS, 0x0000007D, "Operation not allowed as key rotation is in progress")
NV_STATUS_CODE(NV_ERR_TEST_ONLY_CODE_NOT_ENABLED, 0x0000007E, "Test-only code path not enabled")
// Warnings:
NV_STATUS_CODE(NV_WARN_HOT_SWITCH, 0x00010001, "WARNING Hot switch")

View File

@@ -145,7 +145,18 @@ typedef signed short NvS16; /* -32768 to 32767 */
#endif
// Macro to build an NvU32 from four bytes, listed from msb to lsb
#define NvU32_BUILD(a, b, c, d) (((a) << 24) | ((b) << 16) | ((c) << 8) | (d))
#define NvU32_BUILD(a, b, c, d) \
((NvU32)( \
(((NvU32)(a) & 0xff) << 24) | \
(((NvU32)(b) & 0xff) << 16) | \
(((NvU32)(c) & 0xff) << 8) | \
(((NvU32)(d) & 0xff))))
// Macro to build an NvU64 from two DWORDS, listed from msb to lsb
#define NvU64_BUILD(a, b) \
((NvU64)( \
(((NvU64)(a) & ~0U) << 32) | \
(((NvU64)(b) & ~0U))))
#if NVTYPES_USE_STDINT
typedef uint32_t NvV32; /* "void": enumerated or multiple fields */

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 1999-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -67,7 +67,6 @@ typedef struct os_wait_queue os_wait_queue;
* ---------------------------------------------------------------------------
*/
NvU64 NV_API_CALL os_get_num_phys_pages (void);
NV_STATUS NV_API_CALL os_alloc_mem (void **, NvU64);
void NV_API_CALL os_free_mem (void *);
NV_STATUS NV_API_CALL os_get_current_time (NvU32 *, NvU32 *);
@@ -105,7 +104,6 @@ void* NV_API_CALL os_map_kernel_space (NvU64, NvU64, NvU32);
void NV_API_CALL os_unmap_kernel_space (void *, NvU64);
void* NV_API_CALL os_map_user_space (NvU64, NvU64, NvU32, NvU32, void **);
void NV_API_CALL os_unmap_user_space (void *, NvU64, void *);
NV_STATUS NV_API_CALL os_flush_cpu_cache (void);
NV_STATUS NV_API_CALL os_flush_cpu_cache_all (void);
NV_STATUS NV_API_CALL os_flush_user_cache (void);
void NV_API_CALL os_flush_cpu_write_combine_buffer(void);
@@ -162,10 +160,9 @@ NvBool NV_API_CALL os_is_vgx_hyper (void);
NV_STATUS NV_API_CALL os_inject_vgx_msi (NvU16, NvU64, NvU32);
NvBool NV_API_CALL os_is_grid_supported (void);
NvU32 NV_API_CALL os_get_grid_csp_support (void);
void NV_API_CALL os_get_screen_info (NvU64 *, NvU16 *, NvU16 *, NvU16 *, NvU16 *, NvU64, NvU64);
void NV_API_CALL os_bug_check (NvU32, const char *);
NV_STATUS NV_API_CALL os_lock_user_pages (void *, NvU64, void **, NvU32);
NV_STATUS NV_API_CALL os_lookup_user_io_memory (void *, NvU64, NvU64 **, void**);
NV_STATUS NV_API_CALL os_lookup_user_io_memory (void *, NvU64, NvU64 **);
NV_STATUS NV_API_CALL os_unlock_user_pages (NvU64, void *);
NV_STATUS NV_API_CALL os_match_mmap_offset (void *, NvU64, NvU64 *);
NV_STATUS NV_API_CALL os_get_euid (NvU32 *);
@@ -200,6 +197,8 @@ nv_cap_t* NV_API_CALL os_nv_cap_create_file_entry (nv_cap_t *, const char *,
void NV_API_CALL os_nv_cap_destroy_entry (nv_cap_t *);
int NV_API_CALL os_nv_cap_validate_and_dup_fd(const nv_cap_t *, int);
void NV_API_CALL os_nv_cap_close_fd (int);
NvS32 NV_API_CALL os_imex_channel_get (NvU64);
NvS32 NV_API_CALL os_imex_channel_count (void);
enum os_pci_req_atomics_type {
OS_INTF_PCIE_REQ_ATOMICS_32BIT,
@@ -207,16 +206,21 @@ enum os_pci_req_atomics_type {
OS_INTF_PCIE_REQ_ATOMICS_128BIT
};
NV_STATUS NV_API_CALL os_enable_pci_req_atomics (void *, enum os_pci_req_atomics_type);
NV_STATUS NV_API_CALL os_get_numa_node_memory_usage (NvS32, NvU64 *, NvU64 *);
NV_STATUS NV_API_CALL os_numa_add_gpu_memory (void *, NvU64, NvU64, NvU32 *);
NV_STATUS NV_API_CALL os_numa_remove_gpu_memory (void *, NvU64, NvU64, NvU32);
NV_STATUS NV_API_CALL os_offline_page_at_address(NvU64 address);
void* NV_API_CALL os_get_pid_info(void);
void NV_API_CALL os_put_pid_info(void *pid_info);
NV_STATUS NV_API_CALL os_find_ns_pid(void *pid_info, NvU32 *ns_pid);
extern NvU32 os_page_size;
extern NvU64 os_page_mask;
extern NvU8 os_page_shift;
extern NvU32 os_sev_status;
extern NvBool os_sev_enabled;
extern NvBool os_cc_enabled;
extern NvBool os_cc_tdx_enabled;
extern NvBool os_dma_buf_enabled;
extern NvBool os_imex_channel_is_supported;
/*
* ---------------------------------------------------------------------------

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 1999-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 1999-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -37,7 +37,7 @@ NV_STATUS NV_API_CALL rm_gpu_ops_create_session (nvidia_stack_t *, nvgpuSessio
NV_STATUS NV_API_CALL rm_gpu_ops_destroy_session (nvidia_stack_t *, nvgpuSessionHandle_t);
NV_STATUS NV_API_CALL rm_gpu_ops_device_create (nvidia_stack_t *, nvgpuSessionHandle_t, const nvgpuInfo_t *, const NvProcessorUuid *, nvgpuDeviceHandle_t *, NvBool);
NV_STATUS NV_API_CALL rm_gpu_ops_device_destroy (nvidia_stack_t *, nvgpuDeviceHandle_t);
NV_STATUS NV_API_CALL rm_gpu_ops_address_space_create(nvidia_stack_t *, nvgpuDeviceHandle_t, unsigned long long, unsigned long long, nvgpuAddressSpaceHandle_t *, nvgpuAddressSpaceInfo_t);
NV_STATUS NV_API_CALL rm_gpu_ops_address_space_create(nvidia_stack_t *, nvgpuDeviceHandle_t, unsigned long long, unsigned long long, NvBool, nvgpuAddressSpaceHandle_t *, nvgpuAddressSpaceInfo_t);
NV_STATUS NV_API_CALL rm_gpu_ops_dup_address_space(nvidia_stack_t *, nvgpuDeviceHandle_t, NvHandle, NvHandle, nvgpuAddressSpaceHandle_t *, nvgpuAddressSpaceInfo_t);
NV_STATUS NV_API_CALL rm_gpu_ops_address_space_destroy(nvidia_stack_t *, nvgpuAddressSpaceHandle_t);
NV_STATUS NV_API_CALL rm_gpu_ops_memory_alloc_fb(nvidia_stack_t *, nvgpuAddressSpaceHandle_t, NvLength, NvU64 *, nvgpuAllocInfo_t);
@@ -45,7 +45,6 @@ NV_STATUS NV_API_CALL rm_gpu_ops_memory_alloc_fb(nvidia_stack_t *, nvgpuAddres
NV_STATUS NV_API_CALL rm_gpu_ops_pma_alloc_pages(nvidia_stack_t *, void *, NvLength, NvU32 , nvgpuPmaAllocationOptions_t, NvU64 *);
NV_STATUS NV_API_CALL rm_gpu_ops_pma_free_pages(nvidia_stack_t *, void *, NvU64 *, NvLength , NvU32, NvU32);
NV_STATUS NV_API_CALL rm_gpu_ops_pma_pin_pages(nvidia_stack_t *, void *, NvU64 *, NvLength , NvU32, NvU32);
NV_STATUS NV_API_CALL rm_gpu_ops_pma_unpin_pages(nvidia_stack_t *, void *, NvU64 *, NvLength , NvU32);
NV_STATUS NV_API_CALL rm_gpu_ops_get_pma_object(nvidia_stack_t *, nvgpuDeviceHandle_t, void **, const nvgpuPmaStatistics_t *);
NV_STATUS NV_API_CALL rm_gpu_ops_pma_register_callbacks(nvidia_stack_t *sp, void *, nvPmaEvictPagesCallback, nvPmaEvictRangeCallback, void *);
void NV_API_CALL rm_gpu_ops_pma_unregister_callbacks(nvidia_stack_t *sp, void *);
@@ -76,7 +75,8 @@ NV_STATUS NV_API_CALL rm_gpu_ops_own_page_fault_intr(nvidia_stack_t *, nvgpuDevi
NV_STATUS NV_API_CALL rm_gpu_ops_init_fault_info(nvidia_stack_t *, nvgpuDeviceHandle_t, nvgpuFaultInfo_t);
NV_STATUS NV_API_CALL rm_gpu_ops_destroy_fault_info(nvidia_stack_t *, nvgpuDeviceHandle_t, nvgpuFaultInfo_t);
NV_STATUS NV_API_CALL rm_gpu_ops_get_non_replayable_faults(nvidia_stack_t *, nvgpuFaultInfo_t, void *, NvU32 *);
NV_STATUS NV_API_CALL rm_gpu_ops_flush_replayable_fault_buffer(nvidia_stack_t *, nvgpuDeviceHandle_t);
NV_STATUS NV_API_CALL rm_gpu_ops_flush_replayable_fault_buffer(nvidia_stack_t *, nvgpuFaultInfo_t, NvBool);
NV_STATUS NV_API_CALL rm_gpu_ops_toggle_prefetch_faults(nvidia_stack_t *, nvgpuFaultInfo_t, NvBool);
NV_STATUS NV_API_CALL rm_gpu_ops_has_pending_non_replayable_faults(nvidia_stack_t *, nvgpuFaultInfo_t, NvBool *);
NV_STATUS NV_API_CALL rm_gpu_ops_init_access_cntr_info(nvidia_stack_t *, nvgpuDeviceHandle_t, nvgpuAccessCntrInfo_t, NvU32);
NV_STATUS NV_API_CALL rm_gpu_ops_destroy_access_cntr_info(nvidia_stack_t *, nvgpuDeviceHandle_t, nvgpuAccessCntrInfo_t);
@@ -101,15 +101,17 @@ NV_STATUS NV_API_CALL rm_gpu_ops_paging_channels_map(nvidia_stack_t *, nvgpuAdd
void NV_API_CALL rm_gpu_ops_paging_channels_unmap(nvidia_stack_t *, nvgpuAddressSpaceHandle_t, NvU64, nvgpuDeviceHandle_t);
NV_STATUS NV_API_CALL rm_gpu_ops_paging_channel_push_stream(nvidia_stack_t *, nvgpuPagingChannelHandle_t, char *, NvU32);
NV_STATUS NV_API_CALL rm_gpu_ops_key_rotation_channel_disable(nvidia_stack_t *, nvgpuChannelHandle_t [], NvU32);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_context_init(nvidia_stack_t *, struct ccslContext_t **, nvgpuChannelHandle_t);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_context_clear(nvidia_stack_t *, struct ccslContext_t *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_log_device_encryption(nvidia_stack_t *, struct ccslContext_t *, NvU8 *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_context_update(nvidia_stack_t *, UvmCslContext *[], NvU32);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_rotate_iv(nvidia_stack_t *, struct ccslContext_t *, NvU8);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_acquire_encryption_iv(nvidia_stack_t *, struct ccslContext_t *, NvU8 *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_encrypt(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8 *, NvU8 *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_encrypt_with_iv(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8*, NvU8 *, NvU8 *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_decrypt(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8 const *, NvU8 *, NvU8 const *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_decrypt(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8 const *, NvU8 *, NvU8 const *, NvU32, NvU8 const *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_sign(nvidia_stack_t *, struct ccslContext_t *, NvU32, NvU8 const *, NvU8 *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_query_message_pool(nvidia_stack_t *, struct ccslContext_t *, NvU8, NvU64 *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_increment_iv(nvidia_stack_t *, struct ccslContext_t *, NvU8, NvU64, NvU8 *);
NV_STATUS NV_API_CALL rm_gpu_ops_ccsl_log_device_encryption(nvidia_stack_t *, struct ccslContext_t *, NvU32);
#endif

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,102 @@
# Each of these headers is checked for presence with a test #include; a
# corresponding #define will be generated in conftest/headers.h.
NV_HEADER_PRESENCE_TESTS = \
asm/system.h \
drm/drmP.h \
drm/drm_aperture.h \
drm/drm_auth.h \
drm/drm_gem.h \
drm/drm_crtc.h \
drm/drm_color_mgmt.h \
drm/drm_atomic.h \
drm/drm_atomic_helper.h \
drm/drm_atomic_state_helper.h \
drm/drm_encoder.h \
drm/drm_atomic_uapi.h \
drm/drm_drv.h \
drm/drm_fbdev_generic.h \
drm/drm_framebuffer.h \
drm/drm_connector.h \
drm/drm_probe_helper.h \
drm/drm_blend.h \
drm/drm_fourcc.h \
drm/drm_prime.h \
drm/drm_plane.h \
drm/drm_vblank.h \
drm/drm_file.h \
drm/drm_ioctl.h \
drm/drm_device.h \
drm/drm_mode_config.h \
drm/drm_modeset_lock.h \
dt-bindings/interconnect/tegra_icc_id.h \
generated/autoconf.h \
generated/compile.h \
generated/utsrelease.h \
linux/efi.h \
linux/kconfig.h \
linux/platform/tegra/mc_utils.h \
linux/printk.h \
linux/ratelimit.h \
linux/prio_tree.h \
linux/log2.h \
linux/of.h \
linux/bug.h \
linux/sched.h \
linux/sched/mm.h \
linux/sched/signal.h \
linux/sched/task.h \
linux/sched/task_stack.h \
xen/ioemu.h \
linux/fence.h \
linux/dma-fence.h \
linux/dma-resv.h \
soc/tegra/chip-id.h \
soc/tegra/fuse.h \
soc/tegra/fuse-helper.h \
soc/tegra/tegra_bpmp.h \
video/nv_internal.h \
linux/platform/tegra/dce/dce-client-ipc.h \
linux/nvhost.h \
linux/nvhost_t194.h \
linux/host1x-next.h \
asm/book3s/64/hash-64k.h \
asm/set_memory.h \
asm/prom.h \
asm/powernv.h \
linux/atomic.h \
asm/barrier.h \
asm/opal-api.h \
sound/hdaudio.h \
asm/pgtable_types.h \
asm/page.h \
linux/stringhash.h \
linux/dma-map-ops.h \
rdma/peer_mem.h \
sound/hda_codec.h \
linux/dma-buf.h \
linux/time.h \
linux/platform_device.h \
linux/mutex.h \
linux/reset.h \
linux/of_platform.h \
linux/of_device.h \
linux/of_gpio.h \
linux/gpio.h \
linux/gpio/consumer.h \
linux/interconnect.h \
linux/pm_runtime.h \
linux/clk.h \
linux/clk-provider.h \
linux/ioasid.h \
linux/stdarg.h \
linux/iosys-map.h \
asm/coco.h \
linux/vfio_pci_core.h \
linux/mdev.h \
soc/tegra/bpmp-abi.h \
soc/tegra/bpmp.h \
linux/sync_file.h \
linux/cc_platform.h \
asm/cpufeature.h \
linux/mpi.h

View File

@@ -0,0 +1,334 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
#include "nv-kthread-q.h"
#include "nv-list-helpers.h"
#include <linux/kthread.h>
#include <linux/interrupt.h>
#include <linux/completion.h>
#include <linux/module.h>
#include <linux/mm.h>
#if defined(NV_LINUX_BUG_H_PRESENT)
#include <linux/bug.h>
#else
#include <asm/bug.h>
#endif
// Today's implementation is a little simpler and more limited than the
// API description allows for in nv-kthread-q.h. Details include:
//
// 1. Each nv_kthread_q instance is a first-in, first-out queue.
//
// 2. Each nv_kthread_q instance is serviced by exactly one kthread.
//
// You can create any number of queues, each of which gets its own
// named kernel thread (kthread). You can then insert arbitrary functions
// into the queue, and those functions will be run in the context of the
// queue's kthread.
#ifndef WARN
// Only *really* old kernels (2.6.9) end up here. Just use a simple printk
// to implement this, because such kernels won't be supported much longer.
#define WARN(condition, format...) ({ \
int __ret_warn_on = !!(condition); \
if (unlikely(__ret_warn_on)) \
printk(KERN_ERR format); \
unlikely(__ret_warn_on); \
})
#endif
#define NVQ_WARN(fmt, ...) \
do { \
if (in_interrupt()) { \
WARN(1, "nv_kthread_q: [in interrupt]: " fmt, \
##__VA_ARGS__); \
} \
else { \
WARN(1, "nv_kthread_q: task: %s: " fmt, \
current->comm, \
##__VA_ARGS__); \
} \
} while (0)
static int _main_loop(void *args)
{
nv_kthread_q_t *q = (nv_kthread_q_t *)args;
nv_kthread_q_item_t *q_item = NULL;
unsigned long flags;
while (1) {
// Normally this thread is never interrupted. However,
// down_interruptible (instead of down) is called here,
// in order to avoid being classified as a potentially
// hung task, by the kernel watchdog.
while (down_interruptible(&q->q_sem))
NVQ_WARN("Interrupted during semaphore wait\n");
if (atomic_read(&q->main_loop_should_exit))
break;
spin_lock_irqsave(&q->q_lock, flags);
// The q_sem semaphore prevents us from getting here unless there is
// at least one item in the list, so an empty list indicates a bug.
if (unlikely(list_empty(&q->q_list_head))) {
spin_unlock_irqrestore(&q->q_lock, flags);
NVQ_WARN("_main_loop: Empty queue: q: 0x%p\n", q);
continue;
}
// Consume one item from the queue
q_item = list_first_entry(&q->q_list_head,
nv_kthread_q_item_t,
q_list_node);
list_del_init(&q_item->q_list_node);
spin_unlock_irqrestore(&q->q_lock, flags);
// Run the item
q_item->function_to_run(q_item->function_args);
// Make debugging a little simpler by clearing this between runs:
q_item = NULL;
}
while (!kthread_should_stop())
schedule();
return 0;
}
void nv_kthread_q_stop(nv_kthread_q_t *q)
{
// check if queue has been properly initialized
if (unlikely(!q->q_kthread))
return;
nv_kthread_q_flush(q);
// If this assertion fires, then a caller likely either broke the API rules,
// by adding items after calling nv_kthread_q_stop, or possibly messed up
// with inadequate flushing of self-rescheduling q_items.
if (unlikely(!list_empty(&q->q_list_head)))
NVQ_WARN("list not empty after flushing\n");
if (likely(!atomic_read(&q->main_loop_should_exit))) {
atomic_set(&q->main_loop_should_exit, 1);
// Wake up the kthread so that it can see that it needs to stop:
up(&q->q_sem);
kthread_stop(q->q_kthread);
q->q_kthread = NULL;
}
}
// When CONFIG_VMAP_STACK is defined, the kernel thread stack allocator used by
// kthread_create_on_node relies on a 2 entry, per-core cache to minimize
// vmalloc invocations. The cache is NUMA-unaware, so when there is a hit, the
// stack location ends up being a function of the core assigned to the current
// thread, instead of being a function of the specified NUMA node. The cache was
// added to the kernel in commit ac496bf48d97f2503eaa353996a4dd5e4383eaf0
// ("fork: Optimize task creation by caching two thread stacks per CPU if
// CONFIG_VMAP_STACK=y")
//
// To work around the problematic cache, we create up to three kernel threads
// -If the first thread's stack is resident on the preferred node, return this
// thread.
// -Otherwise, create a second thread. If its stack is resident on the
// preferred node, stop the first thread and return this one.
// -Otherwise, create a third thread. The stack allocator does not find a
// cached stack, and so falls back to vmalloc, which takes the NUMA hint into
// consideration. The first two threads are then stopped.
//
// When CONFIG_VMAP_STACK is not defined, the first kernel thread is returned.
//
// This function is never invoked when there is no NUMA preference (preferred
// node is NUMA_NO_NODE).
static struct task_struct *thread_create_on_node(int (*threadfn)(void *data),
nv_kthread_q_t *q,
int preferred_node,
const char *q_name)
{
unsigned i, j;
const static unsigned attempts = 3;
struct task_struct *thread[3];
for (i = 0;; i++) {
struct page *stack;
thread[i] = kthread_create_on_node(threadfn, q, preferred_node, q_name);
if (unlikely(IS_ERR(thread[i]))) {
// Instead of failing, pick the previous thread, even if its
// stack is not allocated on the preferred node.
if (i > 0)
i--;
break;
}
// vmalloc is not used to allocate the stack, so simply return the
// thread, even if its stack may not be allocated on the preferred node
if (!is_vmalloc_addr(thread[i]->stack))
break;
// Ran out of attempts - return thread even if its stack may not be
// allocated on the preferred node
if ((i == (attempts - 1)))
break;
// Get the NUMA node where the first page of the stack is resident. If
// it is the preferred node, select this thread.
stack = vmalloc_to_page(thread[i]->stack);
if (page_to_nid(stack) == preferred_node)
break;
}
for (j = i; j > 0; j--)
kthread_stop(thread[j - 1]);
return thread[i];
}
int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferred_node)
{
memset(q, 0, sizeof(*q));
INIT_LIST_HEAD(&q->q_list_head);
spin_lock_init(&q->q_lock);
sema_init(&q->q_sem, 0);
if (preferred_node == NV_KTHREAD_NO_NODE) {
q->q_kthread = kthread_create(_main_loop, q, q_name);
}
else {
q->q_kthread = thread_create_on_node(_main_loop, q, preferred_node, q_name);
}
if (IS_ERR(q->q_kthread)) {
int err = PTR_ERR(q->q_kthread);
// Clear q_kthread before returning so that nv_kthread_q_stop() can be
// safely called on it making error handling easier.
q->q_kthread = NULL;
return err;
}
wake_up_process(q->q_kthread);
return 0;
}
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
{
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
}
// Returns true (non-zero) if the item was actually scheduled, and false if the
// item was already pending in a queue.
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)
{
unsigned long flags;
int ret = 1;
spin_lock_irqsave(&q->q_lock, flags);
if (likely(list_empty(&q_item->q_list_node)))
list_add_tail(&q_item->q_list_node, &q->q_list_head);
else
ret = 0;
spin_unlock_irqrestore(&q->q_lock, flags);
if (likely(ret))
up(&q->q_sem);
return ret;
}
void nv_kthread_q_item_init(nv_kthread_q_item_t *q_item,
nv_q_func_t function_to_run,
void *function_args)
{
INIT_LIST_HEAD(&q_item->q_list_node);
q_item->function_to_run = function_to_run;
q_item->function_args = function_args;
}
// Returns true (non-zero) if the q_item got scheduled, false otherwise.
int nv_kthread_q_schedule_q_item(nv_kthread_q_t *q,
nv_kthread_q_item_t *q_item)
{
if (unlikely(atomic_read(&q->main_loop_should_exit))) {
NVQ_WARN("Not allowed: nv_kthread_q_schedule_q_item was "
"called with a non-alive q: 0x%p\n", q);
return 0;
}
return _raw_q_schedule(q, q_item);
}
static void _q_flush_function(void *args)
{
struct completion *completion = (struct completion *)args;
complete(completion);
}
static void _raw_q_flush(nv_kthread_q_t *q)
{
nv_kthread_q_item_t q_item;
DECLARE_COMPLETION_ONSTACK(completion);
nv_kthread_q_item_init(&q_item, _q_flush_function, &completion);
_raw_q_schedule(q, &q_item);
// Wait for the flush item to run. Once it has run, then all of the
// previously queued items in front of it will have run, so that means
// the flush is complete.
wait_for_completion(&completion);
}
void nv_kthread_q_flush(nv_kthread_q_t *q)
{
if (unlikely(atomic_read(&q->main_loop_should_exit))) {
NVQ_WARN("Not allowed: nv_kthread_q_flush was called after "
"nv_kthread_q_stop. q: 0x%p\n", q);
return;
}
// This 2x flush is not a typing mistake. The queue really does have to be
// flushed twice, in order to take care of the case of a q_item that
// reschedules itself.
_raw_q_flush(q);
_raw_q_flush(q);
}

View File

@@ -25,6 +25,15 @@
#include <linux/module.h>
#include "nv-pci-table.h"
#include "cpuopsys.h"
#if defined(NV_BSD)
/* Define PCI classes that FreeBSD's linuxkpi is missing */
#define PCI_VENDOR_ID_NVIDIA 0x10de
#define PCI_CLASS_DISPLAY_VGA 0x0300
#define PCI_CLASS_DISPLAY_3D 0x0302
#define PCI_CLASS_BRIDGE_OTHER 0x0680
#endif
/* Devices supported by RM */
struct pci_device_id nv_pci_table[] = {
@@ -48,7 +57,7 @@ struct pci_device_id nv_pci_table[] = {
};
/* Devices supported by all drivers in nvidia.ko */
struct pci_device_id nv_module_device_table[] = {
struct pci_device_id nv_module_device_table[4] = {
{
.vendor = PCI_VENDOR_ID_NVIDIA,
.device = PCI_ANY_ID,
@@ -76,4 +85,6 @@ struct pci_device_id nv_module_device_table[] = {
{ }
};
#if defined(NV_LINUX)
MODULE_DEVICE_TABLE(pci, nv_module_device_table);
#endif

View File

@@ -27,5 +27,6 @@
#include <linux/pci.h>
extern struct pci_device_id nv_pci_table[];
extern struct pci_device_id nv_module_device_table[4];
#endif /* _NV_PCI_TABLE_H_ */

View File

@@ -43,9 +43,13 @@
#if defined(NV_LINUX_FENCE_H_PRESENT)
typedef struct fence nv_dma_fence_t;
typedef struct fence_ops nv_dma_fence_ops_t;
typedef struct fence_cb nv_dma_fence_cb_t;
typedef fence_func_t nv_dma_fence_func_t;
#else
typedef struct dma_fence nv_dma_fence_t;
typedef struct dma_fence_ops nv_dma_fence_ops_t;
typedef struct dma_fence_cb nv_dma_fence_cb_t;
typedef dma_fence_func_t nv_dma_fence_func_t;
#endif
#if defined(NV_LINUX_FENCE_H_PRESENT)
@@ -97,6 +101,14 @@ static inline int nv_dma_fence_signal(nv_dma_fence_t *fence) {
#endif
}
static inline int nv_dma_fence_signal_locked(nv_dma_fence_t *fence) {
#if defined(NV_LINUX_FENCE_H_PRESENT)
return fence_signal_locked(fence);
#else
return dma_fence_signal_locked(fence);
#endif
}
static inline u64 nv_dma_fence_context_alloc(unsigned num) {
#if defined(NV_LINUX_FENCE_H_PRESENT)
return fence_context_alloc(num);
@@ -108,7 +120,7 @@ static inline u64 nv_dma_fence_context_alloc(unsigned num) {
static inline void
nv_dma_fence_init(nv_dma_fence_t *fence,
const nv_dma_fence_ops_t *ops,
spinlock_t *lock, u64 context, unsigned seqno) {
spinlock_t *lock, u64 context, uint64_t seqno) {
#if defined(NV_LINUX_FENCE_H_PRESENT)
fence_init(fence, ops, lock, context, seqno);
#else
@@ -116,6 +128,29 @@ nv_dma_fence_init(nv_dma_fence_t *fence,
#endif
}
static inline void
nv_dma_fence_set_error(nv_dma_fence_t *fence,
int error) {
#if defined(NV_DMA_FENCE_SET_ERROR_PRESENT)
return dma_fence_set_error(fence, error);
#elif defined(NV_FENCE_SET_ERROR_PRESENT)
return fence_set_error(fence, error);
#else
fence->status = error;
#endif
}
static inline int
nv_dma_fence_add_callback(nv_dma_fence_t *fence,
nv_dma_fence_cb_t *cb,
nv_dma_fence_func_t func) {
#if defined(NV_LINUX_FENCE_H_PRESENT)
return fence_add_callback(fence, cb, func);
#else
return dma_fence_add_callback(fence, cb, func);
#endif
}
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
#endif /* __NVIDIA_DMA_FENCE_HELPER_H__ */

View File

@@ -121,6 +121,20 @@ static inline void nv_dma_resv_add_excl_fence(nv_dma_resv_t *obj,
#endif
}
static inline void nv_dma_resv_add_shared_fence(nv_dma_resv_t *obj,
nv_dma_fence_t *fence)
{
#if defined(NV_LINUX_DMA_RESV_H_PRESENT)
#if defined(NV_DMA_RESV_ADD_FENCE_PRESENT)
dma_resv_add_fence(obj, fence, DMA_RESV_USAGE_READ);
#else
dma_resv_add_shared_fence(obj, fence);
#endif
#else
reservation_object_add_shared_fence(obj, fence);
#endif
}
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
#endif /* __NVIDIA_DMA_RESV_HELPER_H__ */

View File

@@ -24,6 +24,7 @@
#define __NVIDIA_DRM_CONFTEST_H__
#include "conftest.h"
#include "nvtypes.h"
/*
* NOTE: This file is expected to get included at the top before including any
@@ -61,4 +62,132 @@
#undef NV_DRM_FENCE_AVAILABLE
#endif
/*
* We can support color management if either drm_helper_crtc_enable_color_mgmt()
* or drm_crtc_enable_color_mgmt() exist.
*/
#if defined(NV_DRM_HELPER_CRTC_ENABLE_COLOR_MGMT_PRESENT) || \
defined(NV_DRM_CRTC_ENABLE_COLOR_MGMT_PRESENT)
#define NV_DRM_COLOR_MGMT_AVAILABLE
#else
#undef NV_DRM_COLOR_MGMT_AVAILABLE
#endif
/*
* Adapt to quirks in FreeBSD's Linux kernel compatibility layer.
*/
#if defined(NV_BSD)
#include <linux/rwsem.h>
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/sx.h>
/* For nv_drm_gem_prime_force_fence_signal */
#ifndef spin_is_locked
#define spin_is_locked(lock) mtx_owned(lock.m)
#endif
#ifndef rwsem_is_locked
#define rwsem_is_locked(sem) (((sem)->sx.sx_lock & (SX_LOCK_SHARED)) \
|| ((sem)->sx.sx_lock & ~(SX_LOCK_FLAGMASK & ~SX_LOCK_SHARED)))
#endif
/*
* FreeBSD does not define vm_flags_t in its linuxkpi, since there is already
* a FreeBSD vm_flags_t (of a different size) and they don't want the names to
* collide. Temporarily redefine it when including nv-mm.h
*/
#define vm_flags_t unsigned long
#include "nv-mm.h"
#undef vm_flags_t
/*
* sys/nv.h and nvidia/nv.h have the same header guard
* we need to clear it for nvlist_t to get loaded
*/
#undef _NV_H_
#include <sys/nv.h>
/*
* For now just use set_page_dirty as the lock variant
* is not ported for FreeBSD. (in progress). This calls
* vm_page_dirty. Used in nv-mm.h
*/
#define set_page_dirty_lock set_page_dirty
/*
* FreeBSD does not implement drm_atomic_state_free, simply
* default to drm_atomic_state_put
*/
#define drm_atomic_state_free drm_atomic_state_put
#if __FreeBSD_version < 1300000
/* redefine LIST_HEAD_INIT to the linux version */
#include <linux/list.h>
#define LIST_HEAD_INIT(name) LINUX_LIST_HEAD_INIT(name)
#endif
/*
* FreeBSD currently has only vmf_insert_pfn_prot defined, and it has a
* static assert warning not to use it since all of DRM's usages are in
* loops with the vm obj lock(s) held. Instead we should use the lkpi
* function itself directly. For us none of this applies so we can just
* wrap it in our own definition of vmf_insert_pfn
*/
#ifndef NV_VMF_INSERT_PFN_PRESENT
#define NV_VMF_INSERT_PFN_PRESENT 1
#if __FreeBSD_version < 1300000
#define VM_SHARED (1 << 17)
/* Not present in 12.2 */
static inline vm_fault_t
lkpi_vmf_insert_pfn_prot_locked(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, pgprot_t prot)
{
vm_object_t vm_obj = vma->vm_obj;
vm_page_t page;
vm_pindex_t pindex;
VM_OBJECT_ASSERT_WLOCKED(vm_obj);
pindex = OFF_TO_IDX(addr - vma->vm_start);
if (vma->vm_pfn_count == 0)
vma->vm_pfn_first = pindex;
MPASS(pindex <= OFF_TO_IDX(vma->vm_end));
page = vm_page_grab(vm_obj, pindex, VM_ALLOC_NORMAL);
if (page == NULL) {
page = PHYS_TO_VM_PAGE(IDX_TO_OFF(pfn));
vm_page_xbusy(page);
if (vm_page_insert(page, vm_obj, pindex)) {
vm_page_xunbusy(page);
return (VM_FAULT_OOM);
}
page->valid = VM_PAGE_BITS_ALL;
}
pmap_page_set_memattr(page, pgprot2cachemode(prot));
vma->vm_pfn_count++;
return (VM_FAULT_NOPAGE);
}
#endif
static inline vm_fault_t
vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
{
vm_fault_t ret;
VM_OBJECT_WLOCK(vma->vm_obj);
ret = lkpi_vmf_insert_pfn_prot_locked(vma, addr, pfn, vma->vm_page_prot);
VM_OBJECT_WUNLOCK(vma->vm_obj);
return (ret);
}
#endif
#endif /* defined(NV_BSD) */
#endif /* defined(__NVIDIA_DRM_CONFTEST_H__) */

View File

@@ -349,10 +349,125 @@ nv_drm_connector_best_encoder(struct drm_connector *connector)
return NULL;
}
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
static const NvU32 __nv_drm_connector_supported_colorspaces =
BIT(DRM_MODE_COLORIMETRY_BT2020_RGB) |
BIT(DRM_MODE_COLORIMETRY_BT2020_YCC);
#endif
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
static int
__nv_drm_connector_atomic_check(struct drm_connector *connector,
struct drm_atomic_state *state)
{
struct drm_connector_state *new_connector_state =
drm_atomic_get_new_connector_state(state, connector);
struct drm_connector_state *old_connector_state =
drm_atomic_get_old_connector_state(state, connector);
struct nv_drm_device *nv_dev = to_nv_device(connector->dev);
struct drm_crtc *crtc = new_connector_state->crtc;
struct drm_crtc_state *crtc_state;
struct nv_drm_crtc_state *nv_crtc_state;
struct NvKmsKapiHeadRequestedConfig *req_config;
if (!crtc) {
return 0;
}
crtc_state = drm_atomic_get_new_crtc_state(state, crtc);
nv_crtc_state = to_nv_crtc_state(crtc_state);
req_config = &nv_crtc_state->req_config;
/*
* Override metadata for the entire head instead of allowing NVKMS to derive
* it from the layers' metadata.
*
* This is the metadata that will sent to the display, and if applicable,
* layers will be tone mapped to this metadata rather than that of the
* display.
*/
req_config->flags.hdrInfoFrameChanged =
!drm_connector_atomic_hdr_metadata_equal(old_connector_state,
new_connector_state);
if (new_connector_state->hdr_output_metadata &&
new_connector_state->hdr_output_metadata->data) {
/*
* Note that HDMI definitions are used here even though we might not
* be using HDMI. While that seems odd, it is consistent with
* upstream behavior.
*/
struct hdr_output_metadata *hdr_metadata =
new_connector_state->hdr_output_metadata->data;
struct hdr_metadata_infoframe *info_frame =
&hdr_metadata->hdmi_metadata_type1;
unsigned int i;
if (hdr_metadata->metadata_type != HDMI_STATIC_METADATA_TYPE1) {
return -EINVAL;
}
for (i = 0; i < ARRAY_SIZE(info_frame->display_primaries); i++) {
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.displayPrimaries[i].x =
info_frame->display_primaries[i].x;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.displayPrimaries[i].y =
info_frame->display_primaries[i].y;
}
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.whitePoint.x =
info_frame->white_point.x;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.whitePoint.y =
info_frame->white_point.y;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxDisplayMasteringLuminance =
info_frame->max_display_mastering_luminance;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.minDisplayMasteringLuminance =
info_frame->min_display_mastering_luminance;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxCLL =
info_frame->max_cll;
req_config->modeSetConfig.hdrInfoFrame.staticMetadata.maxFALL =
info_frame->max_fall;
req_config->modeSetConfig.hdrInfoFrame.eotf = info_frame->eotf;
req_config->modeSetConfig.hdrInfoFrame.enabled = NV_TRUE;
} else {
req_config->modeSetConfig.hdrInfoFrame.enabled = NV_FALSE;
}
req_config->flags.colorimetryChanged =
(old_connector_state->colorspace != new_connector_state->colorspace);
// When adding a case here, also add to __nv_drm_connector_supported_colorspaces
switch (new_connector_state->colorspace) {
case DRM_MODE_COLORIMETRY_DEFAULT:
req_config->modeSetConfig.colorimetry =
NVKMS_OUTPUT_COLORIMETRY_DEFAULT;
break;
case DRM_MODE_COLORIMETRY_BT2020_RGB:
case DRM_MODE_COLORIMETRY_BT2020_YCC:
// Ignore RGB/YCC
// See https://patchwork.freedesktop.org/patch/525496/?series=111865&rev=4
req_config->modeSetConfig.colorimetry =
NVKMS_OUTPUT_COLORIMETRY_BT2100;
break;
default:
// XXX HDR TODO: Add support for more color spaces
NV_DRM_DEV_LOG_ERR(nv_dev, "Unsupported color space");
return -EINVAL;
}
return 0;
}
#endif /* defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT) */
static const struct drm_connector_helper_funcs nv_connector_helper_funcs = {
.get_modes = nv_drm_connector_get_modes,
.mode_valid = nv_drm_connector_mode_valid,
.best_encoder = nv_drm_connector_best_encoder,
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
.atomic_check = __nv_drm_connector_atomic_check,
#endif
};
static struct drm_connector*
@@ -405,6 +520,32 @@ nv_drm_connector_new(struct drm_device *dev,
DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT;
}
#if defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT)
if (nv_connector->type == NVKMS_CONNECTOR_TYPE_HDMI) {
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
if (drm_mode_create_hdmi_colorspace_property(
&nv_connector->base,
__nv_drm_connector_supported_colorspaces) == 0) {
#else
if (drm_mode_create_hdmi_colorspace_property(&nv_connector->base) == 0) {
#endif
drm_connector_attach_colorspace_property(&nv_connector->base);
}
drm_connector_attach_hdr_output_metadata_property(&nv_connector->base);
} else if (nv_connector->type == NVKMS_CONNECTOR_TYPE_DP) {
#if defined(NV_DRM_MODE_CREATE_DP_COLORSPACE_PROPERTY_HAS_SUPPORTED_COLORSPACES_ARG)
if (drm_mode_create_dp_colorspace_property(
&nv_connector->base,
__nv_drm_connector_supported_colorspaces) == 0) {
#else
if (drm_mode_create_dp_colorspace_property(&nv_connector->base) == 0) {
#endif
drm_connector_attach_colorspace_property(&nv_connector->base);
}
drm_connector_attach_hdr_output_metadata_property(&nv_connector->base);
}
#endif /* defined(NV_DRM_CONNECTOR_ATTACH_HDR_OUTPUT_METADATA_PROPERTY_PRESENT) */
/* Register connector with DRM subsystem */
ret = drm_connector_register(&nv_connector->base);

View File

@@ -48,6 +48,11 @@
#include <linux/host1x-next.h>
#endif
#if defined(NV_DRM_DRM_COLOR_MGMT_H_PRESENT)
#include <drm/drm_color_mgmt.h>
#endif
#if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
static int
nv_drm_atomic_replace_property_blob_from_id(struct drm_device *dev,
@@ -87,11 +92,22 @@ static void nv_drm_plane_destroy(struct drm_plane *plane)
nv_drm_free(nv_plane);
}
static inline void
plane_config_clear(struct NvKmsKapiLayerConfig *layerConfig)
{
if (layerConfig == NULL) {
return;
}
memset(layerConfig, 0, sizeof(*layerConfig));
layerConfig->csc = NVKMS_IDENTITY_CSC_MATRIX;
}
static inline void
plane_req_config_disable(struct NvKmsKapiLayerRequestedConfig *req_config)
{
/* Clear layer config */
memset(&req_config->config, 0, sizeof(req_config->config));
plane_config_clear(&req_config->config);
/* Set flags to get cleared layer config applied */
req_config->flags.surfaceChanged = NV_TRUE;
@@ -108,6 +124,45 @@ cursor_req_config_disable(struct NvKmsKapiCursorRequestedConfig *req_config)
req_config->flags.surfaceChanged = NV_TRUE;
}
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
static void color_mgmt_config_ctm_to_csc(struct NvKmsCscMatrix *nvkms_csc,
struct drm_color_ctm *drm_ctm)
{
int y;
/* CTM is a 3x3 matrix while ours is 3x4. Zero out the last column. */
nvkms_csc->m[0][3] = nvkms_csc->m[1][3] = nvkms_csc->m[2][3] = 0;
for (y = 0; y < 3; y++) {
int x;
for (x = 0; x < 3; x++) {
/*
* Values in the CTM are encoded in S31.32 sign-magnitude fixed-
* point format, while NvKms CSC values are signed 2's-complement
* S15.16 (Ssign-extend12-3.16?) fixed-point format.
*/
NvU64 ctmVal = drm_ctm->matrix[y*3 + x];
NvU64 signBit = ctmVal & (1ULL << 63);
NvU64 magnitude = ctmVal & ~signBit;
/*
* Drop the low 16 bits of the fractional part and the high 17 bits
* of the integral part. Drop 17 bits to avoid corner cases where
* the highest resulting bit is a 1, causing the `cscVal = -cscVal`
* line to result in a positive number.
*/
NvS32 cscVal = (magnitude >> 16) & ((1ULL << 31) - 1);
if (signBit) {
cscVal = -cscVal;
}
nvkms_csc->m[y][x] = cscVal;
}
}
}
#endif /* NV_DRM_COLOR_MGMT_AVAILABLE */
static void
cursor_plane_req_config_update(struct drm_plane *plane,
struct drm_plane_state *plane_state,
@@ -121,12 +176,10 @@ cursor_plane_req_config_update(struct drm_plane *plane,
return;
}
*req_config = (struct NvKmsKapiCursorRequestedConfig) {
.surface = to_nv_framebuffer(plane_state->fb)->pSurface,
.dstX = plane_state->crtc_x,
.dstY = plane_state->crtc_y,
};
memset(req_config, 0, sizeof(*req_config));
req_config->surface = to_nv_framebuffer(plane_state->fb)->pSurface;
req_config->dstX = plane_state->crtc_x;
req_config->dstY = plane_state->crtc_y;
#if defined(NV_DRM_ALPHA_BLENDING_AVAILABLE)
if (plane->blend_mode_property != NULL && plane->alpha_property != NULL) {
@@ -220,22 +273,22 @@ plane_req_config_update(struct drm_plane *plane,
return 0;
}
*req_config = (struct NvKmsKapiLayerRequestedConfig) {
.config = {
.surface = to_nv_framebuffer(plane_state->fb)->pSurface,
memset(req_config, 0, sizeof(*req_config));
/* Source values are 16.16 fixed point */
.srcX = plane_state->src_x >> 16,
.srcY = plane_state->src_y >> 16,
.srcWidth = plane_state->src_w >> 16,
.srcHeight = plane_state->src_h >> 16,
req_config->config.surface = to_nv_framebuffer(plane_state->fb)->pSurface;
.dstX = plane_state->crtc_x,
.dstY = plane_state->crtc_y,
.dstWidth = plane_state->crtc_w,
.dstHeight = plane_state->crtc_h,
},
};
/* Source values are 16.16 fixed point */
req_config->config.srcX = plane_state->src_x >> 16;
req_config->config.srcY = plane_state->src_y >> 16;
req_config->config.srcWidth = plane_state->src_w >> 16;
req_config->config.srcHeight = plane_state->src_h >> 16;
req_config->config.dstX = plane_state->crtc_x;
req_config->config.dstY = plane_state->crtc_y;
req_config->config.dstWidth = plane_state->crtc_w;
req_config->config.dstHeight = plane_state->crtc_h;
req_config->config.csc = old_config.csc;
#if defined(NV_DRM_ROTATION_AVAILABLE)
/*
@@ -399,27 +452,25 @@ plane_req_config_update(struct drm_plane *plane,
}
for (i = 0; i < ARRAY_SIZE(info_frame->display_primaries); i ++) {
req_config->config.hdrMetadata.displayPrimaries[i].x =
req_config->config.hdrMetadata.val.displayPrimaries[i].x =
info_frame->display_primaries[i].x;
req_config->config.hdrMetadata.displayPrimaries[i].y =
req_config->config.hdrMetadata.val.displayPrimaries[i].y =
info_frame->display_primaries[i].y;
}
req_config->config.hdrMetadata.whitePoint.x =
req_config->config.hdrMetadata.val.whitePoint.x =
info_frame->white_point.x;
req_config->config.hdrMetadata.whitePoint.y =
req_config->config.hdrMetadata.val.whitePoint.y =
info_frame->white_point.y;
req_config->config.hdrMetadata.maxDisplayMasteringLuminance =
req_config->config.hdrMetadata.val.maxDisplayMasteringLuminance =
info_frame->max_display_mastering_luminance;
req_config->config.hdrMetadata.minDisplayMasteringLuminance =
req_config->config.hdrMetadata.val.minDisplayMasteringLuminance =
info_frame->min_display_mastering_luminance;
req_config->config.hdrMetadata.maxCLL =
req_config->config.hdrMetadata.val.maxCLL =
info_frame->max_cll;
req_config->config.hdrMetadata.maxFALL =
req_config->config.hdrMetadata.val.maxFALL =
info_frame->max_fall;
req_config->config.hdrMetadataSpecified = true;
switch (info_frame->eotf) {
case HDMI_EOTF_SMPTE_ST2084:
req_config->config.tf = NVKMS_OUTPUT_TF_PQ;
@@ -432,10 +483,21 @@ plane_req_config_update(struct drm_plane *plane,
NV_DRM_DEV_LOG_ERR(nv_dev, "Unsupported EOTF");
return -1;
}
req_config->config.hdrMetadata.enabled = true;
} else {
req_config->config.hdrMetadataSpecified = false;
req_config->config.hdrMetadata.enabled = false;
req_config->config.tf = NVKMS_OUTPUT_TF_NONE;
}
req_config->flags.hdrMetadataChanged =
((old_config.hdrMetadata.enabled !=
req_config->config.hdrMetadata.enabled) ||
memcmp(&old_config.hdrMetadata.val,
&req_config->config.hdrMetadata.val,
sizeof(struct NvKmsHDRStaticMetadata)));
req_config->flags.tfChanged = (old_config.tf != req_config->config.tf);
#endif
/*
@@ -564,6 +626,24 @@ static int nv_drm_plane_atomic_check(struct drm_plane *plane,
return ret;
}
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
if (crtc_state->color_mgmt_changed) {
/*
* According to the comment in the Linux kernel's
* drivers/gpu/drm/drm_color_mgmt.c, if this property is NULL,
* the CTM needs to be changed to the identity matrix
*/
if (crtc_state->ctm) {
color_mgmt_config_ctm_to_csc(&plane_requested_config->config.csc,
(struct drm_color_ctm *)crtc_state->ctm->data);
} else {
plane_requested_config->config.csc = NVKMS_IDENTITY_CSC_MATRIX;
}
plane_requested_config->config.cscUseMain = NV_FALSE;
plane_requested_config->flags.cscChanged = NV_TRUE;
}
#endif /* NV_DRM_COLOR_MGMT_AVAILABLE */
if (__is_async_flip_requested(plane, crtc_state)) {
/*
* Async flip requests that the flip happen 'as soon as
@@ -604,9 +684,7 @@ static int nv_drm_plane_atomic_set_property(
to_nv_drm_plane_state(state);
if (property == nv_dev->nv_out_fence_property) {
#if defined(NV_LINUX_NVHOST_H_PRESENT) && defined(CONFIG_TEGRA_GRHOST)
nv_drm_plane_state->fd_user_ptr = u64_to_user_ptr(val);
#endif
nv_drm_plane_state->fd_user_ptr = (void __user *)(uintptr_t)(val);
return 0;
} else if (property == nv_dev->nv_input_colorspace_property) {
nv_drm_plane_state->input_colorspace = val;
@@ -654,6 +732,38 @@ static int nv_drm_plane_atomic_get_property(
return -EINVAL;
}
/**
* nv_drm_plane_atomic_reset - plane state reset hook
* @plane: DRM plane
*
* Allocate an empty DRM plane state.
*/
static void nv_drm_plane_atomic_reset(struct drm_plane *plane)
{
struct nv_drm_plane_state *nv_plane_state =
nv_drm_calloc(1, sizeof(*nv_plane_state));
if (!nv_plane_state) {
return;
}
drm_atomic_helper_plane_reset(plane);
/*
* The drm atomic helper function allocates a state object that is the wrong
* size. Copy its contents into the one we allocated above and replace the
* pointer.
*/
if (plane->state) {
nv_plane_state->base = *plane->state;
kfree(plane->state);
plane->state = &nv_plane_state->base;
} else {
kfree(nv_plane_state);
}
}
static struct drm_plane_state *
nv_drm_plane_atomic_duplicate_state(struct drm_plane *plane)
{
@@ -692,9 +802,11 @@ static inline void __nv_drm_plane_atomic_destroy_state(
#endif
#if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
struct nv_drm_plane_state *nv_drm_plane_state =
to_nv_drm_plane_state(state);
drm_property_blob_put(nv_drm_plane_state->hdr_output_metadata);
{
struct nv_drm_plane_state *nv_drm_plane_state =
to_nv_drm_plane_state(state);
drm_property_blob_put(nv_drm_plane_state->hdr_output_metadata);
}
#endif
}
@@ -711,7 +823,7 @@ static const struct drm_plane_funcs nv_plane_funcs = {
.update_plane = drm_atomic_helper_update_plane,
.disable_plane = drm_atomic_helper_disable_plane,
.destroy = nv_drm_plane_destroy,
.reset = drm_atomic_helper_plane_reset,
.reset = nv_drm_plane_atomic_reset,
.atomic_get_property = nv_drm_plane_atomic_get_property,
.atomic_set_property = nv_drm_plane_atomic_set_property,
.atomic_duplicate_state = nv_drm_plane_atomic_duplicate_state,
@@ -757,14 +869,58 @@ static inline void nv_drm_crtc_duplicate_req_head_modeset_config(
* there is no change in new configuration yet with respect
* to older one!
*/
*new = (struct NvKmsKapiHeadRequestedConfig) {
.modeSetConfig = old->modeSetConfig,
};
memset(new, 0, sizeof(*new));
new->modeSetConfig = old->modeSetConfig;
for (i = 0; i < ARRAY_SIZE(old->layerRequestedConfig); i++) {
new->layerRequestedConfig[i] = (struct NvKmsKapiLayerRequestedConfig) {
.config = old->layerRequestedConfig[i].config,
};
new->layerRequestedConfig[i].config =
old->layerRequestedConfig[i].config;
}
}
static inline struct nv_drm_crtc_state *nv_drm_crtc_state_alloc(void)
{
struct nv_drm_crtc_state *nv_state = nv_drm_calloc(1, sizeof(*nv_state));
int i;
if (nv_state == NULL) {
return NULL;
}
for (i = 0; i < ARRAY_SIZE(nv_state->req_config.layerRequestedConfig); i++) {
plane_config_clear(&nv_state->req_config.layerRequestedConfig[i].config);
}
return nv_state;
}
/**
* nv_drm_atomic_crtc_reset - crtc state reset hook
* @crtc: DRM crtc
*
* Allocate an empty DRM crtc state.
*/
static void nv_drm_atomic_crtc_reset(struct drm_crtc *crtc)
{
struct nv_drm_crtc_state *nv_state = nv_drm_crtc_state_alloc();
if (!nv_state) {
return;
}
drm_atomic_helper_crtc_reset(crtc);
/*
* The drm atomic helper function allocates a state object that is the wrong
* size. Copy its contents into the one we allocated above and replace the
* pointer.
*/
if (crtc->state) {
nv_state->base = *crtc->state;
kfree(crtc->state);
crtc->state = &nv_state->base;
} else {
kfree(nv_state);
}
}
@@ -779,7 +935,7 @@ static inline void nv_drm_crtc_duplicate_req_head_modeset_config(
static struct drm_crtc_state*
nv_drm_atomic_crtc_duplicate_state(struct drm_crtc *crtc)
{
struct nv_drm_crtc_state *nv_state = nv_drm_calloc(1, sizeof(*nv_state));
struct nv_drm_crtc_state *nv_state = nv_drm_crtc_state_alloc();
if (nv_state == NULL) {
return NULL;
@@ -800,6 +956,9 @@ nv_drm_atomic_crtc_duplicate_state(struct drm_crtc *crtc)
&(to_nv_crtc_state(crtc->state)->req_config),
&nv_state->req_config);
nv_state->ilut_ramps = NULL;
nv_state->olut_ramps = NULL;
return &nv_state->base;
}
@@ -823,16 +982,22 @@ static void nv_drm_atomic_crtc_destroy_state(struct drm_crtc *crtc,
__nv_drm_atomic_helper_crtc_destroy_state(crtc, &nv_state->base);
nv_drm_free(nv_state->ilut_ramps);
nv_drm_free(nv_state->olut_ramps);
nv_drm_free(nv_state);
}
static struct drm_crtc_funcs nv_crtc_funcs = {
.set_config = drm_atomic_helper_set_config,
.page_flip = drm_atomic_helper_page_flip,
.reset = drm_atomic_helper_crtc_reset,
.reset = nv_drm_atomic_crtc_reset,
.destroy = nv_drm_crtc_destroy,
.atomic_duplicate_state = nv_drm_atomic_crtc_duplicate_state,
.atomic_destroy_state = nv_drm_atomic_crtc_destroy_state,
#if defined(NV_DRM_ATOMIC_HELPER_LEGACY_GAMMA_SET_PRESENT)
.gamma_set = drm_atomic_helper_legacy_gamma_set,
#endif
};
/*
@@ -866,6 +1031,132 @@ static int head_modeset_config_attach_connector(
return 0;
}
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
static int color_mgmt_config_copy_lut(struct NvKmsLutRamps *nvkms_lut,
struct drm_color_lut *drm_lut,
uint64_t lut_len)
{
uint64_t i = 0;
if (lut_len != NVKMS_LUT_ARRAY_SIZE) {
return -EINVAL;
}
/*
* Both NvKms and drm LUT values are 16-bit linear values. NvKms LUT ramps
* are in arrays in a single struct while drm LUT ramps are an array of
* structs.
*/
for (i = 0; i < lut_len; i++) {
nvkms_lut->red[i] = drm_lut[i].red;
nvkms_lut->green[i] = drm_lut[i].green;
nvkms_lut->blue[i] = drm_lut[i].blue;
}
return 0;
}
static int color_mgmt_config_set_luts(struct nv_drm_crtc_state *nv_crtc_state,
struct NvKmsKapiHeadRequestedConfig *req_config)
{
struct NvKmsKapiHeadModeSetConfig *modeset_config =
&req_config->modeSetConfig;
struct drm_crtc_state *crtc_state = &nv_crtc_state->base;
int ret = 0;
/*
* According to the comment in the Linux kernel's
* drivers/gpu/drm/drm_color_mgmt.c, if either property is NULL, that LUT
* needs to be changed to a linear LUT
*/
req_config->flags.lutChanged = NV_TRUE;
if (crtc_state->degamma_lut) {
struct drm_color_lut *degamma_lut = NULL;
uint64_t degamma_len = 0;
nv_crtc_state->ilut_ramps = nv_drm_calloc(1, sizeof(*nv_crtc_state->ilut_ramps));
if (!nv_crtc_state->ilut_ramps) {
ret = -ENOMEM;
goto fail;
}
degamma_lut = (struct drm_color_lut *)crtc_state->degamma_lut->data;
degamma_len = crtc_state->degamma_lut->length /
sizeof(struct drm_color_lut);
if ((ret = color_mgmt_config_copy_lut(nv_crtc_state->ilut_ramps,
degamma_lut,
degamma_len)) != 0) {
goto fail;
}
modeset_config->lut.input.specified = NV_TRUE;
modeset_config->lut.input.depth = 30; /* specify the full LUT */
modeset_config->lut.input.start = 0;
modeset_config->lut.input.end = degamma_len - 1;
modeset_config->lut.input.pRamps = nv_crtc_state->ilut_ramps;
} else {
/* setting input.end to 0 is equivalent to disabling the LUT, which
* should be equivalent to a linear LUT */
modeset_config->lut.input.specified = NV_TRUE;
modeset_config->lut.input.depth = 30; /* specify the full LUT */
modeset_config->lut.input.start = 0;
modeset_config->lut.input.end = 0;
modeset_config->lut.input.pRamps = NULL;
}
if (crtc_state->gamma_lut) {
struct drm_color_lut *gamma_lut = NULL;
uint64_t gamma_len = 0;
nv_crtc_state->olut_ramps = nv_drm_calloc(1, sizeof(*nv_crtc_state->olut_ramps));
if (!nv_crtc_state->olut_ramps) {
ret = -ENOMEM;
goto fail;
}
gamma_lut = (struct drm_color_lut *)crtc_state->gamma_lut->data;
gamma_len = crtc_state->gamma_lut->length /
sizeof(struct drm_color_lut);
if ((ret = color_mgmt_config_copy_lut(nv_crtc_state->olut_ramps,
gamma_lut,
gamma_len)) != 0) {
goto fail;
}
modeset_config->lut.output.specified = NV_TRUE;
modeset_config->lut.output.enabled = NV_TRUE;
modeset_config->lut.output.pRamps = nv_crtc_state->olut_ramps;
} else {
/* disabling the output LUT should be equivalent to setting a linear
* LUT */
modeset_config->lut.output.specified = NV_TRUE;
modeset_config->lut.output.enabled = NV_FALSE;
modeset_config->lut.output.pRamps = NULL;
}
return 0;
fail:
/* free allocated state */
nv_drm_free(nv_crtc_state->ilut_ramps);
nv_drm_free(nv_crtc_state->olut_ramps);
/* remove dangling pointers */
nv_crtc_state->ilut_ramps = NULL;
nv_crtc_state->olut_ramps = NULL;
modeset_config->lut.input.pRamps = NULL;
modeset_config->lut.output.pRamps = NULL;
/* prevent attempts at reading NULLs */
modeset_config->lut.input.specified = NV_FALSE;
modeset_config->lut.output.specified = NV_FALSE;
return ret;
}
#endif /* NV_DRM_COLOR_MGMT_AVAILABLE */
/**
* nv_drm_crtc_atomic_check() can fail after it has modified
* the 'nv_drm_crtc_state::req_config', that is fine because 'nv_drm_crtc_state'
@@ -887,6 +1178,9 @@ static int nv_drm_crtc_atomic_check(struct drm_crtc *crtc,
struct NvKmsKapiHeadRequestedConfig *req_config =
&nv_crtc_state->req_config;
int ret = 0;
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
struct nv_drm_device *nv_dev = to_nv_device(crtc_state->crtc->dev);
#endif
if (crtc_state->mode_changed) {
drm_mode_to_nvkms_display_mode(&crtc_state->mode,
@@ -925,6 +1219,25 @@ static int nv_drm_crtc_atomic_check(struct drm_crtc *crtc,
req_config->flags.activeChanged = NV_TRUE;
}
#if defined(NV_DRM_CRTC_STATE_HAS_VRR_ENABLED)
req_config->modeSetConfig.vrrEnabled = crtc_state->vrr_enabled;
#endif
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
if (nv_dev->drmMasterChangedSinceLastAtomicCommit &&
(crtc_state->degamma_lut ||
crtc_state->ctm ||
crtc_state->gamma_lut)) {
crtc_state->color_mgmt_changed = NV_TRUE;
}
if (crtc_state->color_mgmt_changed) {
if ((ret = color_mgmt_config_set_luts(nv_crtc_state, req_config)) != 0) {
return ret;
}
}
#endif
return ret;
}
@@ -1156,6 +1469,8 @@ nv_drm_plane_create(struct drm_device *dev,
plane,
validLayerRRTransforms);
nv_drm_free(formats);
return plane;
failed_plane_init:
@@ -1187,7 +1502,7 @@ static struct drm_crtc *__nv_drm_crtc_create(struct nv_drm_device *nv_dev,
goto failed;
}
nv_state = nv_drm_calloc(1, sizeof(*nv_state));
nv_state = nv_drm_crtc_state_alloc();
if (nv_state == NULL) {
goto failed_state_alloc;
}
@@ -1220,6 +1535,22 @@ static struct drm_crtc *__nv_drm_crtc_create(struct nv_drm_device *nv_dev,
drm_crtc_helper_add(&nv_crtc->base, &nv_crtc_helper_funcs);
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
#if defined(NV_DRM_CRTC_ENABLE_COLOR_MGMT_PRESENT)
drm_crtc_enable_color_mgmt(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE, true,
NVKMS_LUT_ARRAY_SIZE);
#else
drm_helper_crtc_enable_color_mgmt(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE,
NVKMS_LUT_ARRAY_SIZE);
#endif
ret = drm_mode_crtc_set_gamma_size(&nv_crtc->base, NVKMS_LUT_ARRAY_SIZE);
if (ret != 0) {
NV_DRM_DEV_LOG_WARN(
nv_dev,
"Failed to initialize legacy gamma support for head %u", head);
}
#endif
return &nv_crtc->base;
failed_init_crtc:
@@ -1328,10 +1659,16 @@ static void NvKmsKapiCrcsToDrm(const struct NvKmsKapiCrcs *crcs,
{
drmCrcs->outputCrc32.value = crcs->outputCrc32.value;
drmCrcs->outputCrc32.supported = crcs->outputCrc32.supported;
drmCrcs->outputCrc32.__pad0 = 0;
drmCrcs->outputCrc32.__pad1 = 0;
drmCrcs->rasterGeneratorCrc32.value = crcs->rasterGeneratorCrc32.value;
drmCrcs->rasterGeneratorCrc32.supported = crcs->rasterGeneratorCrc32.supported;
drmCrcs->rasterGeneratorCrc32.__pad0 = 0;
drmCrcs->rasterGeneratorCrc32.__pad1 = 0;
drmCrcs->compositorCrc32.value = crcs->compositorCrc32.value;
drmCrcs->compositorCrc32.supported = crcs->compositorCrc32.supported;
drmCrcs->compositorCrc32.__pad0 = 0;
drmCrcs->compositorCrc32.__pad1 = 0;
}
int nv_drm_get_crtc_crc32_v2_ioctl(struct drm_device *dev,

View File

@@ -129,6 +129,9 @@ struct nv_drm_crtc_state {
*/
struct NvKmsKapiHeadRequestedConfig req_config;
struct NvKmsLutRamps *ilut_ramps;
struct NvKmsLutRamps *olut_ramps;
/**
* @nv_flip:
*

View File

@@ -44,6 +44,10 @@
#include <drm/drmP.h>
#endif
#if defined(NV_DRM_DRM_ATOMIC_UAPI_H_PRESENT)
#include <drm/drm_atomic_uapi.h>
#endif
#if defined(NV_DRM_DRM_VBLANK_H_PRESENT)
#include <drm/drm_vblank.h>
#endif
@@ -60,7 +64,17 @@
#include <drm/drm_ioctl.h>
#endif
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
#include <drm/drm_aperture.h>
#include <drm/drm_fb_helper.h>
#endif
#if defined(NV_DRM_DRM_FBDEV_GENERIC_H_PRESENT)
#include <drm/drm_fbdev_generic.h>
#endif
#include <linux/pci.h>
#include <linux/workqueue.h>
/*
* Commit fcd70cd36b9b ("drm: Split out drm_probe_helper.h")
@@ -84,6 +98,11 @@
#include <drm/drm_atomic_helper.h>
#endif
static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
struct drm_file *filep,
NvU32 dpyId);
static int nv_drm_revoke_sub_ownership(struct drm_device *dev);
static struct nv_drm_device *dev_list = NULL;
static const char* nv_get_input_colorspace_name(
@@ -354,19 +373,15 @@ static int nv_drm_create_properties(struct nv_drm_device *nv_dev)
len++;
}
#if defined(NV_LINUX_NVHOST_H_PRESENT) && defined(CONFIG_TEGRA_GRHOST)
if (!nv_dev->supportsSyncpts) {
return 0;
if (nv_dev->supportsSyncpts) {
nv_dev->nv_out_fence_property =
drm_property_create_range(nv_dev->dev, DRM_MODE_PROP_ATOMIC,
"NV_DRM_OUT_FENCE_PTR", 0, U64_MAX);
if (nv_dev->nv_out_fence_property == NULL) {
return -ENOMEM;
}
}
nv_dev->nv_out_fence_property =
drm_property_create_range(nv_dev->dev, DRM_MODE_PROP_ATOMIC,
"NV_DRM_OUT_FENCE_PTR", 0, U64_MAX);
if (nv_dev->nv_out_fence_property == NULL) {
return -ENOMEM;
}
#endif
nv_dev->nv_input_colorspace_property =
drm_property_create_enum(nv_dev->dev, 0, "NV_INPUT_COLORSPACE",
enum_list, len);
@@ -387,6 +402,27 @@ static int nv_drm_create_properties(struct nv_drm_device *nv_dev)
return 0;
}
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
/*
* We can't just call drm_kms_helper_hotplug_event directly because
* fbdev_generic may attempt to set a mode from inside the hotplug event
* handler. Because kapi event handling runs on nvkms_kthread_q, this blocks
* other event processing including the flip completion notifier expected by
* nv_drm_atomic_commit.
*
* Defer hotplug event handling to a work item so that nvkms_kthread_q can
* continue processing events while a DRM modeset is in progress.
*/
static void nv_drm_handle_hotplug_event(struct work_struct *work)
{
struct delayed_work *dwork = to_delayed_work(work);
struct nv_drm_device *nv_dev =
container_of(dwork, struct nv_drm_device, hotplug_event_work);
drm_kms_helper_hotplug_event(nv_dev->dev);
}
#endif
static int nv_drm_load(struct drm_device *dev, unsigned long flags)
{
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
@@ -440,6 +476,22 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)
return -ENODEV;
}
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
/*
* If fbdev is enabled, take modeset ownership now before other DRM clients
* can take master (and thus NVKMS ownership).
*/
if (nv_drm_fbdev_module_param) {
if (!nvKms->grabOwnership(pDevice)) {
nvKms->freeDevice(pDevice);
NV_DRM_DEV_LOG_ERR(nv_dev, "Failed to grab NVKMS modeset ownership");
return -EBUSY;
}
nv_dev->hasFramebufferConsole = NV_TRUE;
}
#endif
mutex_lock(&nv_dev->lock);
/* Set NvKmsKapiDevice */
@@ -460,6 +512,11 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)
nv_dev->supportsSyncpts = resInfo.caps.supportsSyncpts;
nv_dev->semsurf_stride = resInfo.caps.semsurf.stride;
nv_dev->semsurf_max_submitted_offset =
resInfo.caps.semsurf.maxSubmittedOffset;
#if defined(NV_DRM_FORMAT_MODIFIERS_PRESENT)
gen = nv_dev->pageKindGeneration;
kind = nv_dev->genericPageKind;
@@ -517,6 +574,7 @@ static int nv_drm_load(struct drm_device *dev, unsigned long flags)
/* Enable event handling */
INIT_DELAYED_WORK(&nv_dev->hotplug_event_work, nv_drm_handle_hotplug_event);
atomic_set(&nv_dev->enable_event_handling, true);
init_waitqueue_head(&nv_dev->flip_event_wq);
@@ -544,8 +602,20 @@ static void __nv_drm_unload(struct drm_device *dev)
return;
}
/* Release modeset ownership if fbdev is enabled */
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
if (nv_dev->hasFramebufferConsole) {
drm_atomic_helper_shutdown(dev);
nvKms->releaseOwnership(nv_dev->pDevice);
}
#endif
cancel_delayed_work_sync(&nv_dev->hotplug_event_work);
mutex_lock(&nv_dev->lock);
WARN_ON(nv_dev->subOwnershipGranted);
/* Disable event handling */
atomic_set(&nv_dev->enable_event_handling, false);
@@ -595,9 +665,15 @@ static int __nv_drm_master_set(struct drm_device *dev,
{
struct nv_drm_device *nv_dev = to_nv_device(dev);
if (!nvKms->grabOwnership(nv_dev->pDevice)) {
/*
* If this device is driving a framebuffer, then nvidia-drm already has
* modeset ownership. Otherwise, grab ownership now.
*/
if (!nv_dev->hasFramebufferConsole &&
!nvKms->grabOwnership(nv_dev->pDevice)) {
return -EINVAL;
}
nv_dev->drmMasterChangedSinceLastAtomicCommit = NV_TRUE;
return 0;
}
@@ -631,6 +707,9 @@ void nv_drm_master_drop(struct drm_device *dev, struct drm_file *file_priv)
struct nv_drm_device *nv_dev = to_nv_device(dev);
int err;
nv_drm_revoke_modeset_permission(dev, file_priv, 0);
nv_drm_revoke_sub_ownership(dev);
/*
* After dropping nvkms modeset onwership, it is not guaranteed that
* drm and nvkms modeset state will remain in sync. Therefore, disable
@@ -655,7 +734,9 @@ void nv_drm_master_drop(struct drm_device *dev, struct drm_file *file_priv)
drm_modeset_unlock_all(dev);
nvKms->releaseOwnership(nv_dev->pDevice);
if (!nv_dev->hasFramebufferConsole) {
nvKms->releaseOwnership(nv_dev->pDevice);
}
}
#endif /* NV_DRM_ATOMIC_MODESET_AVAILABLE */
@@ -693,16 +774,39 @@ static int nv_drm_get_dev_info_ioctl(struct drm_device *dev,
params->gpu_id = nv_dev->gpu_info.gpu_id;
params->primary_index = dev->primary->index;
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
params->generic_page_kind = nv_dev->genericPageKind;
params->page_kind_generation = nv_dev->pageKindGeneration;
params->sector_layout = nv_dev->sectorLayout;
#else
params->supports_alloc = false;
params->generic_page_kind = 0;
params->page_kind_generation = 0;
params->sector_layout = 0;
#endif
params->supports_sync_fd = false;
params->supports_semsurf = false;
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
/* Memory allocation and semaphore surfaces are only supported
* if the modeset = 1 parameter is set */
if (nv_dev->pDevice != NULL) {
params->supports_alloc = true;
params->generic_page_kind = nv_dev->genericPageKind;
params->page_kind_generation = nv_dev->pageKindGeneration;
params->sector_layout = nv_dev->sectorLayout;
if (nv_dev->semsurf_stride != 0) {
params->supports_semsurf = true;
#if defined(NV_SYNC_FILE_GET_FENCE_PRESENT)
params->supports_sync_fd = true;
#endif /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
}
}
#endif /* defined(NV_DRM_ATOMIC_MODESET_AVAILABLE) */
return 0;
}
static int nv_drm_get_drm_file_unique_id_ioctl(struct drm_device *dev,
void *data, struct drm_file *filep)
{
struct drm_nvidia_get_drm_file_unique_id_params *params = data;
params->id = (u64)(filep->driver_priv);
return 0;
}
@@ -833,10 +937,10 @@ static NvU32 nv_drm_get_head_bit_from_connector(struct drm_connector *connector)
return 0;
}
static int nv_drm_grant_permission_ioctl(struct drm_device *dev, void *data,
struct drm_file *filep)
static int nv_drm_grant_modeset_permission(struct drm_device *dev,
struct drm_nvidia_grant_permissions_params *params,
struct drm_file *filep)
{
struct drm_nvidia_grant_permissions_params *params = data;
struct nv_drm_device *nv_dev = to_nv_device(dev);
struct nv_drm_connector *target_nv_connector = NULL;
struct nv_drm_crtc *target_nv_crtc = NULL;
@@ -958,26 +1062,102 @@ done:
return ret;
}
static bool nv_drm_revoke_connector(struct nv_drm_device *nv_dev,
struct nv_drm_connector *nv_connector)
static int nv_drm_grant_sub_ownership(struct drm_device *dev,
struct drm_nvidia_grant_permissions_params *params)
{
bool ret = true;
if (nv_connector->modeset_permission_crtc) {
if (nv_connector->nv_detected_encoder) {
ret = nvKms->revokePermissions(
nv_dev->pDevice, nv_connector->modeset_permission_crtc->head,
nv_connector->nv_detected_encoder->hDisplay);
}
nv_connector->modeset_permission_crtc->modeset_permission_filep = NULL;
nv_connector->modeset_permission_crtc = NULL;
int ret = -EINVAL;
struct nv_drm_device *nv_dev = to_nv_device(dev);
struct drm_modeset_acquire_ctx *pctx;
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
struct drm_modeset_acquire_ctx ctx;
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
ret);
pctx = &ctx;
#else
mutex_lock(&dev->mode_config.mutex);
pctx = dev->mode_config.acquire_ctx;
#endif
if (nv_dev->subOwnershipGranted ||
!nvKms->grantSubOwnership(params->fd, nv_dev->pDevice)) {
goto done;
}
nv_connector->modeset_permission_filep = NULL;
return ret;
/*
* When creating an ownership grant, shut down all heads and disable flip
* notifications.
*/
ret = nv_drm_atomic_helper_disable_all(dev, pctx);
if (ret != 0) {
NV_DRM_DEV_LOG_ERR(
nv_dev,
"nv_drm_atomic_helper_disable_all failed with error code %d!",
ret);
}
atomic_set(&nv_dev->enable_event_handling, false);
nv_dev->subOwnershipGranted = NV_TRUE;
ret = 0;
done:
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
#else
mutex_unlock(&dev->mode_config.mutex);
#endif
return 0;
}
static int nv_drm_revoke_permission(struct drm_device *dev,
struct drm_file *filep, NvU32 dpyId)
static int nv_drm_grant_permission_ioctl(struct drm_device *dev, void *data,
struct drm_file *filep)
{
struct drm_nvidia_grant_permissions_params *params = data;
if (params->type == NV_DRM_PERMISSIONS_TYPE_MODESET) {
return nv_drm_grant_modeset_permission(dev, params, filep);
} else if (params->type == NV_DRM_PERMISSIONS_TYPE_SUB_OWNER) {
return nv_drm_grant_sub_ownership(dev, params);
}
return -EINVAL;
}
static int
nv_drm_atomic_disable_connector(struct drm_atomic_state *state,
struct nv_drm_connector *nv_connector)
{
struct drm_crtc_state *crtc_state;
struct drm_connector_state *connector_state;
int ret = 0;
if (nv_connector->modeset_permission_crtc) {
crtc_state = drm_atomic_get_crtc_state(
state, &nv_connector->modeset_permission_crtc->base);
if (!crtc_state) {
return -EINVAL;
}
crtc_state->active = false;
ret = drm_atomic_set_mode_prop_for_crtc(crtc_state, NULL);
if (ret < 0) {
return ret;
}
}
connector_state = drm_atomic_get_connector_state(state, &nv_connector->base);
if (!connector_state) {
return -EINVAL;
}
return drm_atomic_set_crtc_for_connector(connector_state, NULL);
}
static int nv_drm_revoke_modeset_permission(struct drm_device *dev,
struct drm_file *filep, NvU32 dpyId)
{
struct drm_modeset_acquire_ctx *pctx;
struct drm_atomic_state *state;
struct drm_connector *connector;
struct drm_crtc *crtc;
int ret = 0;
@@ -988,10 +1168,19 @@ static int nv_drm_revoke_permission(struct drm_device *dev,
struct drm_modeset_acquire_ctx ctx;
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
ret);
pctx = &ctx;
#else
mutex_lock(&dev->mode_config.mutex);
pctx = dev->mode_config.acquire_ctx;
#endif
state = drm_atomic_state_alloc(dev);
if (!state) {
ret = -ENOMEM;
goto done;
}
state->acquire_ctx = pctx;
/*
* If dpyId is set, only revoke those specific resources. Otherwise,
* it is from closing the file so revoke all resources for that filep.
@@ -1003,10 +1192,13 @@ static int nv_drm_revoke_permission(struct drm_device *dev,
struct nv_drm_connector *nv_connector = to_nv_connector(connector);
if (nv_connector->modeset_permission_filep == filep &&
(!dpyId || nv_drm_connector_is_dpy_id(connector, dpyId))) {
if (!nv_drm_connector_revoke_permissions(dev, nv_connector)) {
ret = -EINVAL;
// Continue trying to revoke as much as possible.
ret = nv_drm_atomic_disable_connector(state, nv_connector);
if (ret < 0) {
goto done;
}
// Continue trying to revoke as much as possible.
nv_drm_connector_revoke_permissions(dev, nv_connector);
}
}
#if defined(NV_DRM_CONNECTOR_LIST_ITER_PRESENT)
@@ -1020,6 +1212,25 @@ static int nv_drm_revoke_permission(struct drm_device *dev,
}
}
ret = drm_atomic_commit(state);
done:
#if defined(NV_DRM_ATOMIC_STATE_REF_COUNTING_PRESENT)
drm_atomic_state_put(state);
#else
if (ret != 0) {
drm_atomic_state_free(state);
} else {
/*
* In case of success, drm_atomic_commit() takes care to cleanup and
* free @state.
*
* Comment placed above drm_atomic_commit() says: The caller must not
* free or in any other way access @state. If the function fails then
* the caller must clean up @state itself.
*/
}
#endif
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
#else
@@ -1029,14 +1240,55 @@ static int nv_drm_revoke_permission(struct drm_device *dev,
return ret;
}
static int nv_drm_revoke_sub_ownership(struct drm_device *dev)
{
int ret = -EINVAL;
struct nv_drm_device *nv_dev = to_nv_device(dev);
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
struct drm_modeset_acquire_ctx ctx;
DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE,
ret);
#else
mutex_lock(&dev->mode_config.mutex);
#endif
if (!nv_dev->subOwnershipGranted) {
goto done;
}
if (!nvKms->revokeSubOwnership(nv_dev->pDevice)) {
NV_DRM_DEV_LOG_ERR(nv_dev, "Failed to revoke sub-ownership from NVKMS");
goto done;
}
nv_dev->subOwnershipGranted = NV_FALSE;
atomic_set(&nv_dev->enable_event_handling, true);
ret = 0;
done:
#if NV_DRM_MODESET_LOCK_ALL_END_ARGUMENT_COUNT == 3
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
#else
mutex_unlock(&dev->mode_config.mutex);
#endif
return ret;
}
static int nv_drm_revoke_permission_ioctl(struct drm_device *dev, void *data,
struct drm_file *filep)
{
struct drm_nvidia_revoke_permissions_params *params = data;
if (!params->dpyId) {
return -EINVAL;
if (params->type == NV_DRM_PERMISSIONS_TYPE_MODESET) {
if (!params->dpyId) {
return -EINVAL;
}
return nv_drm_revoke_modeset_permission(dev, filep, params->dpyId);
} else if (params->type == NV_DRM_PERMISSIONS_TYPE_SUB_OWNER) {
return nv_drm_revoke_sub_ownership(dev);
}
return nv_drm_revoke_permission(dev, filep, params->dpyId);
return -EINVAL;
}
static void nv_drm_postclose(struct drm_device *dev, struct drm_file *filep)
@@ -1051,11 +1303,22 @@ static void nv_drm_postclose(struct drm_device *dev, struct drm_file *filep)
dev->mode_config.num_connector > 0 &&
dev->mode_config.connector_list.next != NULL &&
dev->mode_config.connector_list.prev != NULL) {
nv_drm_revoke_permission(dev, filep, 0);
nv_drm_revoke_modeset_permission(dev, filep, 0);
}
}
#endif /* NV_DRM_ATOMIC_MODESET_AVAILABLE */
static int nv_drm_open(struct drm_device *dev, struct drm_file *filep)
{
_Static_assert(sizeof(filep->driver_priv) >= sizeof(u64),
"filep->driver_priv can not hold an u64");
static atomic64_t id = ATOMIC_INIT(0);
filep->driver_priv = (void *)atomic64_inc_return(&id);
return 0;
}
#if defined(NV_DRM_MASTER_HAS_LEASES)
static struct drm_master *nv_drm_find_lessee(struct drm_master *master,
int lessee_id)
@@ -1299,6 +1562,9 @@ static const struct drm_ioctl_desc nv_drm_ioctls[] = {
DRM_IOCTL_DEF_DRV(NVIDIA_GET_DEV_INFO,
nv_drm_get_dev_info_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_GET_DRM_FILE_UNIQUE_ID,
nv_drm_get_drm_file_unique_id_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
#if defined(NV_DRM_FENCE_AVAILABLE)
DRM_IOCTL_DEF_DRV(NVIDIA_FENCE_SUPPORTED,
@@ -1310,11 +1576,35 @@ static const struct drm_ioctl_desc nv_drm_ioctls[] = {
DRM_IOCTL_DEF_DRV(NVIDIA_GEM_PRIME_FENCE_ATTACH,
nv_drm_gem_prime_fence_attach_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_CTX_CREATE,
nv_drm_semsurf_fence_ctx_create_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_CREATE,
nv_drm_semsurf_fence_create_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_WAIT,
nv_drm_semsurf_fence_wait_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(NVIDIA_SEMSURF_FENCE_ATTACH,
nv_drm_semsurf_fence_attach_ioctl,
DRM_RENDER_ALLOW|DRM_UNLOCKED),
#endif
/*
* DRM_UNLOCKED is implicit for all non-legacy DRM driver IOCTLs since Linux
* v4.10 commit fa5386459f06 "drm: Used DRM_LEGACY for all legacy functions"
* (Linux v4.4 commit ea487835e887 "drm: Enforce unlocked ioctl operation
* for kms driver ioctls" previously did it only for drivers that set the
* DRM_MODESET flag), so this will race with SET_CLIENT_CAP. Linux v4.11
* commit dcf727ab5d17 "drm: setclientcap doesn't need the drm BKL" also
* removed locking from SET_CLIENT_CAP so there is no use attempting to lock
* manually. The latter commit acknowledges that this can expose userspace
* to inconsistent behavior when racing with itself, but accepts that risk.
*/
DRM_IOCTL_DEF_DRV(NVIDIA_GET_CLIENT_CAPABILITY,
nv_drm_get_client_capability_ioctl,
0),
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
DRM_IOCTL_DEF_DRV(NVIDIA_GET_CRTC_CRC32,
nv_drm_get_crtc_crc32_ioctl,
@@ -1357,6 +1647,9 @@ static struct drm_driver nv_drm_driver = {
.driver_features =
#if defined(NV_DRM_DRIVER_PRIME_FLAG_PRESENT)
DRIVER_PRIME |
#endif
#if defined(NV_DRM_SYNCOBJ_FEATURES_PRESENT)
DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE |
#endif
DRIVER_GEM | DRIVER_RENDER,
@@ -1367,8 +1660,23 @@ static struct drm_driver nv_drm_driver = {
.ioctls = nv_drm_ioctls,
.num_ioctls = ARRAY_SIZE(nv_drm_ioctls),
/*
* Linux kernel v6.6 commit 71a7974ac701 ("drm/prime: Unexport helpers
* for fd/handle conversion") unexports drm_gem_prime_handle_to_fd() and
* drm_gem_prime_fd_to_handle().
*
* Prior Linux kernel v6.6 commit 6b85aa68d9d5 ("drm: Enable PRIME
* import/export for all drivers") made these helpers the default when
* .prime_handle_to_fd / .prime_fd_to_handle are unspecified, so it's fine
* to just skip specifying them if the helpers aren't present.
*/
#if NV_IS_EXPORT_SYMBOL_PRESENT_drm_gem_prime_handle_to_fd
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
#endif
#if NV_IS_EXPORT_SYMBOL_PRESENT_drm_gem_prime_fd_to_handle
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
#endif
.gem_prime_import = nv_drm_gem_prime_import,
.gem_prime_import_sg_table = nv_drm_gem_prime_import_sg_table,
@@ -1394,6 +1702,7 @@ static struct drm_driver nv_drm_driver = {
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
.postclose = nv_drm_postclose,
#endif
.open = nv_drm_open,
.fops = &nv_drm_fops,
@@ -1421,7 +1730,7 @@ static struct drm_driver nv_drm_driver = {
* kernel supports atomic modeset and the 'modeset' kernel module
* parameter is true.
*/
static void nv_drm_update_drm_driver_features(void)
void nv_drm_update_drm_driver_features(void)
{
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
@@ -1447,11 +1756,12 @@ static void nv_drm_update_drm_driver_features(void)
/*
* Helper function for allocate/register DRM device for given NVIDIA GPU ID.
*/
static void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
{
struct nv_drm_device *nv_dev = NULL;
struct drm_device *dev = NULL;
struct device *device = gpu_info->os_device_ptr;
bool bus_is_pci;
DRM_DEBUG(
"Registering device for NVIDIA GPU ID 0x08%x",
@@ -1485,8 +1795,15 @@ static void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
dev->dev_private = nv_dev;
nv_dev->dev = dev;
bus_is_pci =
#if defined(NV_LINUX)
device->bus == &pci_bus_type;
#elif defined(NV_BSD)
devclass_find("pci");
#endif
#if defined(NV_DRM_DEVICE_HAS_PDEV)
if (device->bus == &pci_bus_type) {
if (bus_is_pci) {
dev->pdev = to_pci_dev(device);
}
#endif
@@ -1498,6 +1815,23 @@ static void nv_drm_register_drm_device(const nv_gpu_info_t *gpu_info)
goto failed_drm_register;
}
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
if (nv_drm_fbdev_module_param &&
drm_core_check_feature(dev, DRIVER_MODESET)) {
if (bus_is_pci) {
struct pci_dev *pdev = to_pci_dev(device);
#if defined(NV_DRM_APERTURE_REMOVE_CONFLICTING_PCI_FRAMEBUFFERS_HAS_DRIVER_ARG)
drm_aperture_remove_conflicting_pci_framebuffers(pdev, &nv_drm_driver);
#else
drm_aperture_remove_conflicting_pci_framebuffers(pdev, nv_drm_driver.name);
#endif
}
drm_fbdev_generic_setup(dev, 32);
}
#endif /* defined(NV_DRM_FBDEV_GENERIC_AVAILABLE) */
/* Add NVIDIA-DRM device into list */
nv_dev->next = dev_list;
@@ -1517,6 +1851,7 @@ failed_drm_alloc:
/*
* Enumerate NVIDIA GPUs and allocate/register DRM device for each of them.
*/
#if defined(NV_LINUX)
int nv_drm_probe_devices(void)
{
nv_gpu_info_t *gpu_info = NULL;
@@ -1559,6 +1894,7 @@ done:
return ret;
}
#endif
/*
* Unregister all NVIDIA DRM devices.
@@ -1567,9 +1903,10 @@ void nv_drm_remove_devices(void)
{
while (dev_list != NULL) {
struct nv_drm_device *next = dev_list->next;
struct drm_device *dev = dev_list->dev;
drm_dev_unregister(dev_list->dev);
nv_drm_dev_free(dev_list->dev);
drm_dev_unregister(dev);
nv_drm_dev_free(dev);
nv_drm_free(dev_list);
@@ -1577,4 +1914,79 @@ void nv_drm_remove_devices(void)
}
}
/*
* Handle system suspend and resume.
*
* Normally, a DRM driver would use drm_mode_config_helper_suspend() to save the
* current state on suspend and drm_mode_config_helper_resume() to restore it
* after resume. This works for upstream drivers because user-mode tasks are
* frozen before the suspend hook is called.
*
* In the case of nvidia-drm, the suspend hook is also called when 'suspend' is
* written to /proc/driver/nvidia/suspend, before user-mode tasks are frozen.
* However, we don't actually need to save and restore the display state because
* the driver requires a VT switch to an unused VT before suspending and a
* switch back to the application (or fbdev console) on resume. The DRM client
* (or fbdev helper functions) will restore the appropriate mode on resume.
*
*/
void nv_drm_suspend_resume(NvBool suspend)
{
static DEFINE_MUTEX(nv_drm_suspend_mutex);
static NvU32 nv_drm_suspend_count = 0;
struct nv_drm_device *nv_dev;
mutex_lock(&nv_drm_suspend_mutex);
/*
* Count the number of times the driver is asked to suspend. Suspend all DRM
* devices on the first suspend call and resume them on the last resume
* call. This is necessary because the kernel may call nvkms_suspend()
* simultaneously for each GPU, but NVKMS itself also suspends all GPUs on
* the first call.
*/
if (suspend) {
if (nv_drm_suspend_count++ > 0) {
goto done;
}
} else {
BUG_ON(nv_drm_suspend_count == 0);
if (--nv_drm_suspend_count > 0) {
goto done;
}
}
#if defined(NV_DRM_ATOMIC_MODESET_AVAILABLE)
nv_dev = dev_list;
/*
* NVKMS shuts down all heads on suspend. Update DRM state accordingly.
*/
for (nv_dev = dev_list; nv_dev; nv_dev = nv_dev->next) {
struct drm_device *dev = nv_dev->dev;
if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
continue;
}
if (suspend) {
drm_kms_helper_poll_disable(dev);
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
drm_fb_helper_set_suspend_unlocked(dev->fb_helper, 1);
#endif
drm_mode_config_reset(dev);
} else {
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
drm_fb_helper_set_suspend_unlocked(dev->fb_helper, 0);
#endif
drm_kms_helper_poll_enable(dev);
}
}
#endif /* NV_DRM_ATOMIC_MODESET_AVAILABLE */
done:
mutex_unlock(&nv_drm_suspend_mutex);
}
#endif /* NV_DRM_AVAILABLE */

View File

@@ -31,6 +31,12 @@ int nv_drm_probe_devices(void);
void nv_drm_remove_devices(void);
void nv_drm_suspend_resume(NvBool suspend);
void nv_drm_register_drm_device(const nv_gpu_info_t *);
void nv_drm_update_drm_driver_features(void);
#endif /* defined(NV_DRM_AVAILABLE) */
#endif /* __NVIDIA_DRM_DRV_H__ */

View File

@@ -300,7 +300,7 @@ void nv_drm_handle_display_change(struct nv_drm_device *nv_dev,
nv_drm_connector_mark_connection_status_dirty(nv_encoder->nv_connector);
drm_kms_helper_hotplug_event(dev);
schedule_delayed_work(&nv_dev->hotplug_event_work, 0);
}
void nv_drm_handle_dynamic_display_connected(struct nv_drm_device *nv_dev,
@@ -347,6 +347,6 @@ void nv_drm_handle_dynamic_display_connected(struct nv_drm_device *nv_dev,
drm_reinit_primary_mode_group(dev);
#endif
drm_kms_helper_hotplug_event(dev);
schedule_delayed_work(&nv_dev->hotplug_event_work, 0);
}
#endif

View File

@@ -240,7 +240,7 @@ struct drm_framebuffer *nv_drm_internal_framebuffer_create(
if (nv_dev->modifiers[i] == DRM_FORMAT_MOD_INVALID) {
NV_DRM_DEV_DEBUG_DRIVER(
nv_dev,
"Invalid format modifier for framebuffer object: 0x%016llx",
"Invalid format modifier for framebuffer object: 0x%016" NvU64_fmtx,
modifier);
return ERR_PTR(-EINVAL);
}

File diff suppressed because it is too large Load Diff

View File

@@ -41,6 +41,22 @@ int nv_drm_prime_fence_context_create_ioctl(struct drm_device *dev,
int nv_drm_gem_prime_fence_attach_ioctl(struct drm_device *dev,
void *data, struct drm_file *filep);
int nv_drm_semsurf_fence_ctx_create_ioctl(struct drm_device *dev,
void *data,
struct drm_file *filep);
int nv_drm_semsurf_fence_create_ioctl(struct drm_device *dev,
void *data,
struct drm_file *filep);
int nv_drm_semsurf_fence_wait_ioctl(struct drm_device *dev,
void *data,
struct drm_file *filep);
int nv_drm_semsurf_fence_attach_ioctl(struct drm_device *dev,
void *data,
struct drm_file *filep);
#endif /* NV_DRM_FENCE_AVAILABLE */
#endif /* NV_DRM_AVAILABLE */

View File

@@ -71,12 +71,42 @@ static int __nv_drm_gem_dma_buf_create_mmap_offset(
static int __nv_drm_gem_dma_buf_mmap(struct nv_drm_gem_object *nv_gem,
struct vm_area_struct *vma)
{
#if defined(NV_LINUX)
struct dma_buf_attachment *attach = nv_gem->base.import_attach;
struct dma_buf *dma_buf = attach->dmabuf;
#endif
struct file *old_file;
int ret;
/* check if buffer supports mmap */
#if defined(NV_BSD)
/*
* Most of the FreeBSD DRM code refers to struct file*, which is actually
* a struct linux_file*. The dmabuf code in FreeBSD is not actually plumbed
* through the same linuxkpi bits it seems (probably so it can be used
* elsewhere), so dma_buf->file really is a native FreeBSD struct file...
*/
if (!nv_gem->base.filp->f_op->mmap)
return -EINVAL;
/* readjust the vma */
get_file(nv_gem->base.filp);
old_file = vma->vm_file;
vma->vm_file = nv_gem->base.filp;
vma->vm_pgoff -= drm_vma_node_start(&nv_gem->base.vma_node);
ret = nv_gem->base.filp->f_op->mmap(nv_gem->base.filp, vma);
if (ret) {
/* restore old parameters on failure */
vma->vm_file = old_file;
vma->vm_pgoff += drm_vma_node_start(&nv_gem->base.vma_node);
fput(nv_gem->base.filp);
} else {
if (old_file)
fput(old_file);
}
#else
if (!dma_buf->file->f_op->mmap)
return -EINVAL;
@@ -84,18 +114,20 @@ static int __nv_drm_gem_dma_buf_mmap(struct nv_drm_gem_object *nv_gem,
get_file(dma_buf->file);
old_file = vma->vm_file;
vma->vm_file = dma_buf->file;
vma->vm_pgoff -= drm_vma_node_start(&nv_gem->base.vma_node);;
vma->vm_pgoff -= drm_vma_node_start(&nv_gem->base.vma_node);
ret = dma_buf->file->f_op->mmap(dma_buf->file, vma);
if (ret) {
/* restore old parameters on failure */
vma->vm_file = old_file;
vma->vm_pgoff += drm_vma_node_start(&nv_gem->base.vma_node);
fput(dma_buf->file);
} else {
if (old_file)
fput(old_file);
}
#endif
return ret;
}

View File

@@ -37,6 +37,9 @@
#endif
#include <linux/io.h>
#if defined(NV_BSD)
#include <vm/vm_pageout.h>
#endif
#include "nv-mm.h"
@@ -93,7 +96,17 @@ static vm_fault_t __nv_drm_gem_nvkms_handle_vma_fault(
if (nv_nvkms_memory->pages_count == 0) {
pfn = (unsigned long)(uintptr_t)nv_nvkms_memory->pPhysicalAddress;
pfn >>= PAGE_SHIFT;
#if defined(NV_LINUX)
/*
* FreeBSD doesn't set pgoff. We instead have pfn be the base physical
* address, and we will calculate the index pidx from the virtual address.
*
* This only works because linux_cdev_pager_populate passes the pidx as
* vmf->virtual_address. Then we turn the virtual address
* into a physical page number.
*/
pfn += page_offset;
#endif
} else {
BUG_ON(page_offset >= nv_nvkms_memory->pages_count);
pfn = page_to_pfn(nv_nvkms_memory->pages[page_offset]);
@@ -243,6 +256,15 @@ static int __nv_drm_nvkms_gem_obj_init(
NvU64 *pages = NULL;
NvU32 numPages = 0;
if ((size % PAGE_SIZE) != 0) {
NV_DRM_DEV_LOG_ERR(
nv_dev,
"NvKmsKapiMemory 0x%p size should be in a multiple of page size to "
"create a gem object",
pMemory);
return -EINVAL;
}
nv_nvkms_memory->pPhysicalAddress = NULL;
nv_nvkms_memory->pWriteCombinedIORemapAddress = NULL;
nv_nvkms_memory->physically_mapped = false;
@@ -314,7 +336,7 @@ int nv_drm_dumb_create(
ret = -ENOMEM;
NV_DRM_DEV_LOG_ERR(
nv_dev,
"Failed to allocate NvKmsKapiMemory for dumb object of size %llu",
"Failed to allocate NvKmsKapiMemory for dumb object of size %" NvU64_fmtu,
args->size);
goto nvkms_alloc_memory_failed;
}
@@ -465,7 +487,7 @@ int nv_drm_gem_alloc_nvkms_memory_ioctl(struct drm_device *dev,
goto failed;
}
if (p->__pad != 0) {
if ((p->__pad0 != 0) || (p->__pad1 != 0)) {
ret = -EINVAL;
NV_DRM_DEV_LOG_ERR(nv_dev, "non-zero value in padding field");
goto failed;
@@ -529,14 +551,12 @@ static struct drm_gem_object *__nv_drm_gem_nvkms_prime_dup(
{
struct nv_drm_device *nv_dev = to_nv_device(dev);
const struct nv_drm_device *nv_dev_src;
const struct nv_drm_gem_nvkms_memory *nv_nvkms_memory_src;
struct nv_drm_gem_nvkms_memory *nv_nvkms_memory;
struct NvKmsKapiMemory *pMemory;
BUG_ON(nv_gem_src == NULL || nv_gem_src->ops != &nv_gem_nvkms_memory_ops);
nv_dev_src = to_nv_device(nv_gem_src->base.dev);
nv_nvkms_memory_src = to_nv_nvkms_memory_const(nv_gem_src);
if ((nv_nvkms_memory =
nv_drm_calloc(1, sizeof(*nv_nvkms_memory))) == NULL) {

View File

@@ -36,6 +36,10 @@
#include "linux/mm.h"
#include "nv-mm.h"
#if defined(NV_BSD)
#include <vm/vm_pageout.h>
#endif
static inline
void __nv_drm_gem_user_memory_free(struct nv_drm_gem_object *nv_gem)
{
@@ -113,6 +117,10 @@ static vm_fault_t __nv_drm_gem_user_memory_handle_vma_fault(
page_offset = vmf->pgoff - drm_vma_node_start(&gem->vma_node);
BUG_ON(page_offset >= nv_user_memory->pages_count);
#if !defined(NV_LINUX)
ret = vmf_insert_pfn(vma, address, page_to_pfn(nv_user_memory->pages[page_offset]));
#else /* !defined(NV_LINUX) */
ret = vm_insert_page(vma, address, nv_user_memory->pages[page_offset]);
switch (ret) {
case 0:
@@ -131,6 +139,7 @@ static vm_fault_t __nv_drm_gem_user_memory_handle_vma_fault(
ret = VM_FAULT_SIGBUS;
break;
}
#endif /* !defined(NV_LINUX) */
return ret;
}
@@ -170,7 +179,7 @@ int nv_drm_gem_import_userspace_memory_ioctl(struct drm_device *dev,
if ((params->size % PAGE_SIZE) != 0) {
NV_DRM_DEV_LOG_ERR(
nv_dev,
"Userspace memory 0x%llx size should be in a multiple of page "
"Userspace memory 0x%" NvU64_fmtx " size should be in a multiple of page "
"size to create a gem object",
params->address);
return -EINVAL;
@@ -183,7 +192,7 @@ int nv_drm_gem_import_userspace_memory_ioctl(struct drm_device *dev,
if (ret != 0) {
NV_DRM_DEV_LOG_ERR(
nv_dev,
"Failed to lock user pages for address 0x%llx: %d",
"Failed to lock user pages for address 0x%" NvU64_fmtx ": %d",
params->address, ret);
return ret;
}

View File

@@ -95,6 +95,16 @@ static inline struct nv_drm_gem_object *to_nv_gem_object(
* 3e70fd160cf0b1945225eaa08dd2cb8544f21cb8 (2018-11-15).
*/
static inline void
nv_drm_gem_object_reference(struct nv_drm_gem_object *nv_gem)
{
#if defined(NV_DRM_GEM_OBJECT_GET_PRESENT)
drm_gem_object_get(&nv_gem->base);
#else
drm_gem_object_reference(&nv_gem->base);
#endif
}
static inline void
nv_drm_gem_object_unreference_unlocked(struct nv_drm_gem_object *nv_gem)
{
@@ -179,6 +189,7 @@ static inline int nv_drm_gem_handle_create(struct drm_file *filp,
return drm_gem_handle_create(filp, &nv_gem->base, handle);
}
#if defined(NV_DRM_FENCE_AVAILABLE)
static inline nv_dma_resv_t *nv_drm_gem_res_obj(struct nv_drm_gem_object *nv_gem)
{
#if defined(NV_DRM_GEM_OBJECT_HAS_RESV)
@@ -187,6 +198,7 @@ static inline nv_dma_resv_t *nv_drm_gem_res_obj(struct nv_drm_gem_object *nv_gem
return nv_gem->base.dma_buf ? nv_gem->base.dma_buf->resv : &nv_gem->resv;
#endif
}
#endif
void nv_drm_gem_object_init(struct nv_drm_device *nv_dev,
struct nv_drm_gem_object *nv_gem,

View File

@@ -45,8 +45,7 @@
/*
* The inclusion of drm_framebuffer.h was removed from drm_crtc.h by commit
* 720cf96d8fecde29b72e1101f8a567a0ce99594f ("drm: Drop drm_framebuffer.h from
* drm_crtc.h") in linux-next, expected in v5.19-rc7.
* 720cf96d8fec ("drm: Drop drm_framebuffer.h from drm_crtc.h") in v6.0.
*
* We only need drm_framebuffer.h for drm_framebuffer_put(), and it is always
* present (v4.9+) when drm_framebuffer_{put,get}() is present (v4.12+), so it

View File

@@ -306,6 +306,36 @@ int nv_drm_atomic_helper_disable_all(struct drm_device *dev,
for_each_plane_in_state(__state, plane, plane_state, __i)
#endif
/*
* for_each_new_plane_in_state() was added by kernel commit
* 581e49fe6b411f407102a7f2377648849e0fa37f which was Signed-off-by:
* Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* Daniel Vetter <daniel.vetter@ffwll.ch>
*
* This commit also added the old_state and new_state pointers to
* __drm_planes_state. Because of this, the best that can be done on kernel
* versions without this macro is for_each_plane_in_state.
*/
/**
* nv_drm_for_each_new_plane_in_state - iterate over all planes in an atomic update
* @__state: &struct drm_atomic_state pointer
* @plane: &struct drm_plane iteration cursor
* @new_plane_state: &struct drm_plane_state iteration cursor for the new state
* @__i: int iteration cursor, for macro-internal use
*
* This iterates over all planes in an atomic update, tracking only the new
* state. This is useful in enable functions, where we need the new state the
* hardware should be in when the atomic commit operation has completed.
*/
#if !defined(for_each_new_plane_in_state)
#define nv_drm_for_each_new_plane_in_state(__state, plane, new_plane_state, __i) \
nv_drm_for_each_plane_in_state(__state, plane, new_plane_state, __i)
#else
#define nv_drm_for_each_new_plane_in_state(__state, plane, new_plane_state, __i) \
for_each_new_plane_in_state(__state, plane, new_plane_state, __i)
#endif
static inline struct drm_connector *
nv_drm_connector_lookup(struct drm_device *dev, struct drm_file *filep,
uint32_t id)
@@ -582,6 +612,19 @@ static inline int nv_drm_format_num_planes(uint32_t format)
#endif /* defined(NV_DRM_FORMAT_MODIFIERS_PRESENT) */
/*
* DRM_UNLOCKED was removed with commit 2798ffcc1d6a ("drm: Remove locking for
* legacy ioctls and DRM_UNLOCKED") in v6.8, but it was previously made
* implicit for all non-legacy DRM driver IOCTLs since Linux v4.10 commit
* fa5386459f06 "drm: Used DRM_LEGACY for all legacy functions" (Linux v4.4
* commit ea487835e887 "drm: Enforce unlocked ioctl operation for kms driver
* ioctls" previously did it only for drivers that set the DRM_MODESET flag), so
* it was effectively a no-op anyway.
*/
#if !defined(NV_DRM_UNLOCKED_IOCTL_FLAG_PRESENT)
#define DRM_UNLOCKED 0
#endif
/*
* drm_vma_offset_exact_lookup_locked() were added
* by kernel commit 2225cfe46bcc which was Signed-off-by:

View File

@@ -48,6 +48,11 @@
#define DRM_NVIDIA_GET_CONNECTOR_ID_FOR_DPY_ID 0x11
#define DRM_NVIDIA_GRANT_PERMISSIONS 0x12
#define DRM_NVIDIA_REVOKE_PERMISSIONS 0x13
#define DRM_NVIDIA_SEMSURF_FENCE_CTX_CREATE 0x14
#define DRM_NVIDIA_SEMSURF_FENCE_CREATE 0x15
#define DRM_NVIDIA_SEMSURF_FENCE_WAIT 0x16
#define DRM_NVIDIA_SEMSURF_FENCE_ATTACH 0x17
#define DRM_NVIDIA_GET_DRM_FILE_UNIQUE_ID 0x18
#define DRM_IOCTL_NVIDIA_GEM_IMPORT_NVKMS_MEMORY \
DRM_IOWR((DRM_COMMAND_BASE + DRM_NVIDIA_GEM_IMPORT_NVKMS_MEMORY), \
@@ -67,7 +72,7 @@
*
* 'warning: suggest parentheses around arithmetic in operand of |'
*/
#if defined(NV_LINUX)
#if defined(NV_LINUX) || defined(NV_BSD)
#define DRM_IOCTL_NVIDIA_FENCE_SUPPORTED \
DRM_IO(DRM_COMMAND_BASE + DRM_NVIDIA_FENCE_SUPPORTED)
#define DRM_IOCTL_NVIDIA_DMABUF_SUPPORTED \
@@ -133,6 +138,31 @@
DRM_IOWR((DRM_COMMAND_BASE + DRM_NVIDIA_REVOKE_PERMISSIONS), \
struct drm_nvidia_revoke_permissions_params)
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_CTX_CREATE \
DRM_IOWR((DRM_COMMAND_BASE + \
DRM_NVIDIA_SEMSURF_FENCE_CTX_CREATE), \
struct drm_nvidia_semsurf_fence_ctx_create_params)
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_CREATE \
DRM_IOWR((DRM_COMMAND_BASE + \
DRM_NVIDIA_SEMSURF_FENCE_CREATE), \
struct drm_nvidia_semsurf_fence_create_params)
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_WAIT \
DRM_IOW((DRM_COMMAND_BASE + \
DRM_NVIDIA_SEMSURF_FENCE_WAIT), \
struct drm_nvidia_semsurf_fence_wait_params)
#define DRM_IOCTL_NVIDIA_SEMSURF_FENCE_ATTACH \
DRM_IOW((DRM_COMMAND_BASE + \
DRM_NVIDIA_SEMSURF_FENCE_ATTACH), \
struct drm_nvidia_semsurf_fence_attach_params)
#define DRM_IOCTL_NVIDIA_GET_DRM_FILE_UNIQUE_ID \
DRM_IOWR((DRM_COMMAND_BASE + \
DRM_NVIDIA_GET_DRM_FILE_UNIQUE_ID), \
struct drm_nvidia_get_drm_file_unique_id_params)
struct drm_nvidia_gem_import_nvkms_memory_params {
uint64_t mem_size; /* IN */
@@ -154,10 +184,15 @@ struct drm_nvidia_get_dev_info_params {
uint32_t gpu_id; /* OUT */
uint32_t primary_index; /* OUT; the "card%d" value */
/* See DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D definitions of these */
uint32_t supports_alloc; /* OUT */
/* The generic_page_kind, page_kind_generation, and sector_layout
* fields are only valid if supports_alloc is true.
* See DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D definitions of these. */
uint32_t generic_page_kind; /* OUT */
uint32_t page_kind_generation; /* OUT */
uint32_t sector_layout; /* OUT */
uint32_t supports_sync_fd; /* OUT */
uint32_t supports_semsurf; /* OUT */
};
struct drm_nvidia_prime_fence_context_create_params {
@@ -179,6 +214,7 @@ struct drm_nvidia_gem_prime_fence_attach_params {
uint32_t handle; /* IN GEM handle to attach fence to */
uint32_t fence_context_handle; /* IN GEM handle to fence context on which fence is run on */
uint32_t sem_thresh; /* IN Semaphore value to reach before signal */
uint32_t __pad;
};
struct drm_nvidia_get_client_capability_params {
@@ -190,6 +226,8 @@ struct drm_nvidia_get_client_capability_params {
struct drm_nvidia_crtc_crc32 {
uint32_t value; /* Read value, undefined if supported is false */
uint8_t supported; /* Supported boolean, true if readable by hardware */
uint8_t __pad0;
uint16_t __pad1;
};
struct drm_nvidia_crtc_crc32_v2_out {
@@ -229,10 +267,11 @@ struct drm_nvidia_gem_alloc_nvkms_memory_params {
uint32_t handle; /* OUT */
uint8_t block_linear; /* IN */
uint8_t compressible; /* IN/OUT */
uint16_t __pad;
uint16_t __pad0;
uint64_t memory_size; /* IN */
uint32_t flags; /* IN */
uint32_t __pad1;
};
struct drm_nvidia_gem_export_dmabuf_memory_params {
@@ -266,13 +305,94 @@ struct drm_nvidia_get_connector_id_for_dpy_id_params {
uint32_t connectorId; /* OUT */
};
enum drm_nvidia_permissions_type {
NV_DRM_PERMISSIONS_TYPE_MODESET = 2,
NV_DRM_PERMISSIONS_TYPE_SUB_OWNER = 3
};
struct drm_nvidia_grant_permissions_params {
int32_t fd; /* IN */
uint32_t dpyId; /* IN */
uint32_t type; /* IN */
};
struct drm_nvidia_revoke_permissions_params {
uint32_t dpyId; /* IN */
uint32_t type; /* IN */
};
struct drm_nvidia_semsurf_fence_ctx_create_params {
uint64_t index; /* IN Index of the desired semaphore in the
* fence context's semaphore surface */
/* Params for importing userspace semaphore surface */
uint64_t nvkms_params_ptr; /* IN */
uint64_t nvkms_params_size; /* IN */
uint32_t handle; /* OUT GEM handle to fence context */
uint32_t __pad;
};
struct drm_nvidia_semsurf_fence_create_params {
uint32_t fence_context_handle; /* IN GEM handle to fence context on which
* fence is run on */
uint32_t timeout_value_ms; /* IN Timeout value in ms for the fence
* after which the fence will be signaled
* with its error status set to -ETIMEDOUT.
* Default timeout value is 5000ms */
uint64_t wait_value; /* IN Semaphore value to reach before signal */
int32_t fd; /* OUT sync FD object representing the
* semaphore at the specified index reaching
* a value >= wait_value */
uint32_t __pad;
};
/*
* Note there is no provision for timeouts in this ioctl. The kernel
* documentation asserts timeouts should be handled by fence producers, and
* that waiters should not second-guess their logic, as it is producers rather
* than consumers that have better information when it comes to determining a
* reasonable timeout for a given workload.
*/
struct drm_nvidia_semsurf_fence_wait_params {
uint32_t fence_context_handle; /* IN GEM handle to fence context which will
* be used to wait on the sync FD. Need not
* be the fence context used to create the
* sync FD. */
int32_t fd; /* IN sync FD object to wait on */
uint64_t pre_wait_value; /* IN Wait for the semaphore represented by
* fence_context to reach this value before
* waiting for the sync file. */
uint64_t post_wait_value; /* IN Signal the semaphore represented by
* fence_context to this value after waiting
* for the sync file */
};
struct drm_nvidia_semsurf_fence_attach_params {
uint32_t handle; /* IN GEM handle of buffer */
uint32_t fence_context_handle; /* IN GEM handle of fence context */
uint32_t timeout_value_ms; /* IN Timeout value in ms for the fence
* after which the fence will be signaled
* with its error status set to -ETIMEDOUT.
* Default timeout value is 5000ms */
uint32_t shared; /* IN If true, fence will reserve shared
* access to the buffer, otherwise it will
* reserve exclusive access */
uint64_t wait_value; /* IN Semaphore value to reach before signal */
};
struct drm_nvidia_get_drm_file_unique_id_params {
uint64_t id; /* OUT Unique ID of the DRM file */
};
#endif /* _UAPI_NVIDIA_DRM_IOCTL_H_ */

View File

@@ -1,5 +1,5 @@
/*
* Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved.
* Copyright (c) 2015-2023, NVIDIA CORPORATION. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
@@ -21,8 +21,6 @@
*/
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/err.h>
#include "nvidia-drm-os-interface.h"
#include "nvidia-drm.h"
@@ -31,135 +29,18 @@
#if defined(NV_DRM_AVAILABLE)
#if defined(NV_DRM_DRMP_H_PRESENT)
#include <drm/drmP.h>
#endif
#include <linux/vmalloc.h>
#include "nv-mm.h"
MODULE_PARM_DESC(
modeset,
"Enable atomic kernel modesetting (1 = enable, 0 = disable (default))");
bool nv_drm_modeset_module_param = false;
module_param_named(modeset, nv_drm_modeset_module_param, bool, 0400);
void *nv_drm_calloc(size_t nmemb, size_t size)
{
size_t total_size = nmemb * size;
//
// Check for overflow.
//
if ((nmemb != 0) && ((total_size / nmemb) != size))
{
return NULL;
}
return kzalloc(nmemb * size, GFP_KERNEL);
}
void nv_drm_free(void *ptr)
{
if (IS_ERR(ptr)) {
return;
}
kfree(ptr);
}
char *nv_drm_asprintf(const char *fmt, ...)
{
va_list ap;
char *p;
va_start(ap, fmt);
p = kvasprintf(GFP_KERNEL, fmt, ap);
va_end(ap);
return p;
}
#if defined(NVCPU_X86) || defined(NVCPU_X86_64)
#define WRITE_COMBINE_FLUSH() asm volatile("sfence":::"memory")
#elif defined(NVCPU_FAMILY_ARM)
#if defined(NVCPU_ARM)
#define WRITE_COMBINE_FLUSH() { dsb(); outer_sync(); }
#elif defined(NVCPU_AARCH64)
#define WRITE_COMBINE_FLUSH() mb()
#endif
#elif defined(NVCPU_PPC64LE)
#define WRITE_COMBINE_FLUSH() asm volatile("sync":::"memory")
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
MODULE_PARM_DESC(
fbdev,
"Create a framebuffer device (1 = enable, 0 = disable (default)) (EXPERIMENTAL)");
module_param_named(fbdev, nv_drm_fbdev_module_param, bool, 0400);
#endif
void nv_drm_write_combine_flush(void)
{
WRITE_COMBINE_FLUSH();
}
int nv_drm_lock_user_pages(unsigned long address,
unsigned long pages_count, struct page ***pages)
{
struct mm_struct *mm = current->mm;
struct page **user_pages;
int pages_pinned;
user_pages = nv_drm_calloc(pages_count, sizeof(*user_pages));
if (user_pages == NULL) {
return -ENOMEM;
}
nv_mmap_read_lock(mm);
pages_pinned = NV_PIN_USER_PAGES(address, pages_count, FOLL_WRITE,
user_pages, NULL);
nv_mmap_read_unlock(mm);
if (pages_pinned < 0 || (unsigned)pages_pinned < pages_count) {
goto failed;
}
*pages = user_pages;
return 0;
failed:
if (pages_pinned > 0) {
int i;
for (i = 0; i < pages_pinned; i++) {
NV_UNPIN_USER_PAGE(user_pages[i]);
}
}
nv_drm_free(user_pages);
return (pages_pinned < 0) ? pages_pinned : -EINVAL;
}
void nv_drm_unlock_user_pages(unsigned long pages_count, struct page **pages)
{
unsigned long i;
for (i = 0; i < pages_count; i++) {
set_page_dirty_lock(pages[i]);
NV_UNPIN_USER_PAGE(pages[i]);
}
nv_drm_free(pages);
}
void *nv_drm_vmap(struct page **pages, unsigned long pages_count)
{
return vmap(pages, pages_count, VM_USERMAP, PAGE_KERNEL);
}
void nv_drm_vunmap(void *address)
{
vunmap(address);
}
#endif /* NV_DRM_AVAILABLE */
/*************************************************************************

View File

@@ -237,6 +237,14 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
int i;
int ret;
/*
* If sub-owner permission was granted to another NVKMS client, disallow
* modesets through the DRM interface.
*/
if (nv_dev->subOwnershipGranted) {
return -EINVAL;
}
memset(requested_config, 0, sizeof(*requested_config));
/* Loop over affected crtcs and construct NvKmsKapiRequestedModeSetConfig */
@@ -274,9 +282,6 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
nv_new_crtc_state->nv_flip = NULL;
}
#if defined(NV_DRM_CRTC_STATE_HAS_VRR_ENABLED)
requested_config->headRequestedConfig[nv_crtc->head].modeSetConfig.vrrEnabled = new_crtc_state->vrr_enabled;
#endif
}
}
@@ -292,7 +297,9 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
requested_config,
&reply_config,
commit)) {
return -EINVAL;
if (commit || reply_config.flipResult != NV_KMS_FLIP_RESULT_IN_PROGRESS) {
return -EINVAL;
}
}
if (commit && nv_dev->supportsSyncpts) {
@@ -314,6 +321,24 @@ int nv_drm_atomic_check(struct drm_device *dev,
{
int ret = 0;
#if defined(NV_DRM_COLOR_MGMT_AVAILABLE)
struct drm_crtc *crtc;
struct drm_crtc_state *crtc_state;
int i;
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
/*
* if the color management changed on the crtc, we need to update the
* crtc's plane's CSC matrices, so add the crtc's planes to the commit
*/
if (crtc_state->color_mgmt_changed) {
if ((ret = drm_atomic_add_affected_planes(state, crtc)) != 0) {
goto done;
}
}
}
#endif /* NV_DRM_COLOR_MGMT_AVAILABLE */
if ((ret = drm_atomic_helper_check(dev, state)) != 0) {
goto done;
}
@@ -388,42 +413,56 @@ int nv_drm_atomic_commit(struct drm_device *dev,
struct nv_drm_device *nv_dev = to_nv_device(dev);
/*
* drm_mode_config_funcs::atomic_commit() mandates to return -EBUSY
* for nonblocking commit if previous updates (commit tasks/flip event) are
* pending. In case of blocking commits it mandates to wait for previous
* updates to complete.
* XXX: drm_mode_config_funcs::atomic_commit() mandates to return -EBUSY
* for nonblocking commit if the commit would need to wait for previous
* updates (commit tasks/flip event) to complete. In case of blocking
* commits it mandates to wait for previous updates to complete. However,
* the kernel DRM-KMS documentation does explicitly allow maintaining a
* queue of outstanding commits.
*
* Our system already implements such a queue, but due to
* bug 4054608, it is currently not used.
*/
if (nonblock) {
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
/*
* Here you aren't required to hold nv_drm_crtc::flip_list_lock
* because:
*
* The core DRM driver acquires lock for all affected crtcs before
* calling into ->commit() hook, therefore it is not possible for
* other threads to call into ->commit() hook affecting same crtcs
* and enqueue flip objects into flip_list -
*
* nv_drm_atomic_commit_internal()
* |-> nv_drm_atomic_apply_modeset_config(commit=true)
* |-> nv_drm_crtc_enqueue_flip()
*
* Only possibility is list_empty check races with code path
* dequeuing flip object -
*
* __nv_drm_handle_flip_event()
* |-> nv_drm_crtc_dequeue_flip()
*
* But this race condition can't lead list_empty() to return
* incorrect result. nv_drm_crtc_dequeue_flip() in the middle of
* updating the list could not trick us into thinking the list is
* empty when it isn't.
*/
/*
* Here you aren't required to hold nv_drm_crtc::flip_list_lock
* because:
*
* The core DRM driver acquires lock for all affected crtcs before
* calling into ->commit() hook, therefore it is not possible for
* other threads to call into ->commit() hook affecting same crtcs
* and enqueue flip objects into flip_list -
*
* nv_drm_atomic_commit_internal()
* |-> nv_drm_atomic_apply_modeset_config(commit=true)
* |-> nv_drm_crtc_enqueue_flip()
*
* Only possibility is list_empty check races with code path
* dequeuing flip object -
*
* __nv_drm_handle_flip_event()
* |-> nv_drm_crtc_dequeue_flip()
*
* But this race condition can't lead list_empty() to return
* incorrect result. nv_drm_crtc_dequeue_flip() in the middle of
* updating the list could not trick us into thinking the list is
* empty when it isn't.
*/
if (nonblock) {
if (!list_empty(&nv_crtc->flip_list)) {
return -EBUSY;
}
} else {
if (wait_event_timeout(
nv_dev->flip_event_wq,
list_empty(&nv_crtc->flip_list),
3 * HZ /* 3 second */) == 0) {
NV_DRM_DEV_LOG_ERR(
nv_dev,
"Flip event timeout on head %u", nv_crtc->head);
}
}
}
@@ -467,6 +506,7 @@ int nv_drm_atomic_commit(struct drm_device *dev,
goto done;
}
nv_dev->drmMasterChangedSinceLastAtomicCommit = NV_FALSE;
nv_drm_for_each_crtc_in_state(state, crtc, crtc_state, i) {
struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
@@ -547,6 +587,9 @@ int nv_drm_atomic_commit(struct drm_device *dev,
NV_DRM_DEV_LOG_ERR(
nv_dev,
"Flip event timeout on head %u", nv_crtc->head);
while (!list_empty(&nv_crtc->flip_list)) {
__nv_drm_handle_flip_event(nv_crtc);
}
}
}
}

View File

@@ -0,0 +1,285 @@
/*
* Copyright (c) 2015-2023, NVIDIA CORPORATION. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
#include <linux/slab.h>
#include "nvidia-drm-os-interface.h"
#if defined(NV_DRM_AVAILABLE)
#if defined(NV_LINUX_SYNC_FILE_H_PRESENT)
#include <linux/file.h>
#include <linux/sync_file.h>
#endif
#include <linux/vmalloc.h>
#include <linux/sched.h>
#include <linux/device.h>
#include "nv-mm.h"
#if defined(NV_DRM_DRMP_H_PRESENT)
#include <drm/drmP.h>
#endif
bool nv_drm_modeset_module_param = false;
bool nv_drm_fbdev_module_param = false;
void *nv_drm_calloc(size_t nmemb, size_t size)
{
size_t total_size = nmemb * size;
//
// Check for overflow.
//
if ((nmemb != 0) && ((total_size / nmemb) != size))
{
return NULL;
}
return kzalloc(nmemb * size, GFP_KERNEL);
}
void nv_drm_free(void *ptr)
{
if (IS_ERR(ptr)) {
return;
}
kfree(ptr);
}
char *nv_drm_asprintf(const char *fmt, ...)
{
va_list ap;
char *p;
va_start(ap, fmt);
p = kvasprintf(GFP_KERNEL, fmt, ap);
va_end(ap);
return p;
}
#if defined(NVCPU_X86) || defined(NVCPU_X86_64)
#define WRITE_COMBINE_FLUSH() asm volatile("sfence":::"memory")
#elif defined(NVCPU_PPC64LE)
#define WRITE_COMBINE_FLUSH() asm volatile("sync":::"memory")
#else
#define WRITE_COMBINE_FLUSH() mb()
#endif
void nv_drm_write_combine_flush(void)
{
WRITE_COMBINE_FLUSH();
}
int nv_drm_lock_user_pages(unsigned long address,
unsigned long pages_count, struct page ***pages)
{
struct mm_struct *mm = current->mm;
struct page **user_pages;
int pages_pinned;
user_pages = nv_drm_calloc(pages_count, sizeof(*user_pages));
if (user_pages == NULL) {
return -ENOMEM;
}
nv_mmap_read_lock(mm);
pages_pinned = NV_PIN_USER_PAGES(address, pages_count, FOLL_WRITE,
user_pages);
nv_mmap_read_unlock(mm);
if (pages_pinned < 0 || (unsigned)pages_pinned < pages_count) {
goto failed;
}
*pages = user_pages;
return 0;
failed:
if (pages_pinned > 0) {
int i;
for (i = 0; i < pages_pinned; i++) {
NV_UNPIN_USER_PAGE(user_pages[i]);
}
}
nv_drm_free(user_pages);
return (pages_pinned < 0) ? pages_pinned : -EINVAL;
}
void nv_drm_unlock_user_pages(unsigned long pages_count, struct page **pages)
{
unsigned long i;
for (i = 0; i < pages_count; i++) {
set_page_dirty_lock(pages[i]);
NV_UNPIN_USER_PAGE(pages[i]);
}
nv_drm_free(pages);
}
/*
* linuxkpi vmap doesn't use the flags argument as it
* doesn't seem to be needed. Define VM_USERMAP to 0
* to make errors go away
*
* vmap: sys/compat/linuxkpi/common/src/linux_compat.c
*/
#if defined(NV_BSD)
#define VM_USERMAP 0
#endif
void *nv_drm_vmap(struct page **pages, unsigned long pages_count)
{
return vmap(pages, pages_count, VM_USERMAP, PAGE_KERNEL);
}
void nv_drm_vunmap(void *address)
{
vunmap(address);
}
bool nv_drm_workthread_init(nv_drm_workthread *worker, const char *name)
{
worker->shutting_down = false;
if (nv_kthread_q_init(&worker->q, name)) {
return false;
}
spin_lock_init(&worker->lock);
return true;
}
void nv_drm_workthread_shutdown(nv_drm_workthread *worker)
{
unsigned long flags;
spin_lock_irqsave(&worker->lock, flags);
worker->shutting_down = true;
spin_unlock_irqrestore(&worker->lock, flags);
nv_kthread_q_stop(&worker->q);
}
void nv_drm_workthread_work_init(nv_drm_work *work,
void (*callback)(void *),
void *arg)
{
nv_kthread_q_item_init(work, callback, arg);
}
int nv_drm_workthread_add_work(nv_drm_workthread *worker, nv_drm_work *work)
{
unsigned long flags;
int ret = 0;
spin_lock_irqsave(&worker->lock, flags);
if (!worker->shutting_down) {
ret = nv_kthread_q_schedule_q_item(&worker->q, work);
}
spin_unlock_irqrestore(&worker->lock, flags);
return ret;
}
void nv_drm_timer_setup(nv_drm_timer *timer, void (*callback)(nv_drm_timer *nv_drm_timer))
{
nv_timer_setup(timer, callback);
}
void nv_drm_mod_timer(nv_drm_timer *timer, unsigned long timeout_native)
{
mod_timer(&timer->kernel_timer, timeout_native);
}
unsigned long nv_drm_timer_now(void)
{
return jiffies;
}
unsigned long nv_drm_timeout_from_ms(NvU64 relative_timeout_ms)
{
return jiffies + msecs_to_jiffies(relative_timeout_ms);
}
bool nv_drm_del_timer_sync(nv_drm_timer *timer)
{
if (del_timer_sync(&timer->kernel_timer)) {
return true;
} else {
return false;
}
}
#if defined(NV_DRM_FENCE_AVAILABLE)
int nv_drm_create_sync_file(nv_dma_fence_t *fence)
{
#if defined(NV_LINUX_SYNC_FILE_H_PRESENT)
struct sync_file *sync;
int fd = get_unused_fd_flags(O_CLOEXEC);
if (fd < 0) {
return fd;
}
/* sync_file_create() generates its own reference to the fence */
sync = sync_file_create(fence);
if (IS_ERR(sync)) {
put_unused_fd(fd);
return PTR_ERR(sync);
}
fd_install(fd, sync->file);
return fd;
#else /* defined(NV_LINUX_SYNC_FILE_H_PRESENT) */
return -EINVAL;
#endif /* defined(NV_LINUX_SYNC_FILE_H_PRESENT) */
}
nv_dma_fence_t *nv_drm_sync_file_get_fence(int fd)
{
#if defined(NV_SYNC_FILE_GET_FENCE_PRESENT)
return sync_file_get_fence(fd);
#else /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
return NULL;
#endif /* defined(NV_SYNC_FILE_GET_FENCE_PRESENT) */
}
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
void nv_drm_yield(void)
{
set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(1);
}
#endif /* NV_DRM_AVAILABLE */

View File

@@ -29,10 +29,47 @@
#if defined(NV_DRM_AVAILABLE)
#if defined(NV_DRM_FENCE_AVAILABLE)
#include "nvidia-dma-fence-helper.h"
#endif
#if defined(NV_LINUX) || defined(NV_BSD)
#include "nv-kthread-q.h"
#include "linux/spinlock.h"
typedef struct nv_drm_workthread {
spinlock_t lock;
struct nv_kthread_q q;
bool shutting_down;
} nv_drm_workthread;
typedef nv_kthread_q_item_t nv_drm_work;
#else
#error "Need to define deferred work primitives for this OS"
#endif
#if defined(NV_LINUX) || defined(NV_BSD)
#include "nv-timer.h"
typedef struct nv_timer nv_drm_timer;
#else
#error "Need to define kernel timer callback primitives for this OS"
#endif
#if defined(NV_DRM_FBDEV_GENERIC_SETUP_PRESENT) && defined(NV_DRM_APERTURE_REMOVE_CONFLICTING_PCI_FRAMEBUFFERS_PRESENT)
#define NV_DRM_FBDEV_GENERIC_AVAILABLE
#endif
struct page;
/* Set to true when the atomic modeset feature is enabled. */
extern bool nv_drm_modeset_module_param;
#if defined(NV_DRM_FBDEV_GENERIC_AVAILABLE)
/* Set to true when the nvidia-drm driver should install a framebuffer device */
extern bool nv_drm_fbdev_module_param;
#endif
void *nv_drm_calloc(size_t nmemb, size_t size);
@@ -51,6 +88,37 @@ void *nv_drm_vmap(struct page **pages, unsigned long pages_count);
void nv_drm_vunmap(void *address);
#endif
bool nv_drm_workthread_init(nv_drm_workthread *worker, const char *name);
/* Can be called concurrently with nv_drm_workthread_add_work() */
void nv_drm_workthread_shutdown(nv_drm_workthread *worker);
void nv_drm_workthread_work_init(nv_drm_work *work,
void (*callback)(void *),
void *arg);
/* Can be called concurrently with nv_drm_workthread_shutdown() */
int nv_drm_workthread_add_work(nv_drm_workthread *worker, nv_drm_work *work);
void nv_drm_timer_setup(nv_drm_timer *timer,
void (*callback)(nv_drm_timer *nv_drm_timer));
void nv_drm_mod_timer(nv_drm_timer *timer, unsigned long relative_timeout_ms);
bool nv_drm_del_timer_sync(nv_drm_timer *timer);
unsigned long nv_drm_timer_now(void);
unsigned long nv_drm_timeout_from_ms(NvU64 relative_timeout_ms);
#if defined(NV_DRM_FENCE_AVAILABLE)
int nv_drm_create_sync_file(nv_dma_fence_t *fence);
nv_dma_fence_t *nv_drm_sync_file_get_fence(int fd);
#endif /* defined(NV_DRM_FENCE_AVAILABLE) */
void nv_drm_yield(void);
#endif /* defined(NV_DRM_AVAILABLE) */
#endif /* __NVIDIA_DRM_OS_INTERFACE_H__ */

View File

@@ -46,12 +46,33 @@
#define NV_DRM_LOG_ERR(__fmt, ...) \
DRM_ERROR("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
/*
* DRM_WARN() was added in v4.9 by kernel commit
* 30b0da8d556e65ff935a56cd82c05ba0516d3e4a
*
* Before this commit, only DRM_INFO and DRM_ERROR were defined and
* DRM_INFO(fmt, ...) was defined as
* printk(KERN_INFO "[" DRM_NAME "] " fmt, ##__VA_ARGS__). So, if
* DRM_WARN is undefined this defines NV_DRM_LOG_WARN following the
* same pattern as DRM_INFO.
*/
#ifdef DRM_WARN
#define NV_DRM_LOG_WARN(__fmt, ...) \
DRM_WARN("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
#else
#define NV_DRM_LOG_WARN(__fmt, ...) \
printk(KERN_WARNING "[" DRM_NAME "] [nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
#endif
#define NV_DRM_LOG_INFO(__fmt, ...) \
DRM_INFO("[nvidia-drm] " __fmt "\n", ##__VA_ARGS__)
#define NV_DRM_DEV_LOG_INFO(__dev, __fmt, ...) \
NV_DRM_LOG_INFO("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
#define NV_DRM_DEV_LOG_WARN(__dev, __fmt, ...) \
NV_DRM_LOG_WARN("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
#define NV_DRM_DEV_LOG_ERR(__dev, __fmt, ...) \
NV_DRM_LOG_ERR("[GPU ID 0x%08x] " __fmt, __dev->gpu_info.gpu_id, ##__VA_ARGS__)
@@ -105,6 +126,7 @@ struct nv_drm_device {
NvU64 modifiers[6 /* block linear */ + 1 /* linear */ + 1 /* terminator */];
#endif
struct delayed_work hotplug_event_work;
atomic_t enable_event_handling;
/**
@@ -117,9 +139,26 @@ struct nv_drm_device {
#endif
#if defined(NV_DRM_FENCE_AVAILABLE)
NvU64 semsurf_stride;
NvU64 semsurf_max_submitted_offset;
#endif
NvBool hasVideoMemory;
NvBool supportsSyncpts;
NvBool subOwnershipGranted;
NvBool hasFramebufferConsole;
/**
* @drmMasterChangedSinceLastAtomicCommit:
*
* This flag is set in nv_drm_master_set and reset after a completed atomic
* commit. It is used to restore or recommit state that is lost by the
* NvKms modeset owner change, such as the CRTC color management
* properties.
*/
NvBool drmMasterChangedSinceLastAtomicCommit;
struct drm_property *nv_out_fence_property;
struct drm_property *nv_input_colorspace_property;

View File

@@ -0,0 +1,132 @@
###########################################################################
# Kbuild fragment for nvidia-drm.ko
###########################################################################
#
# Define NVIDIA_DRM_SOURCES
#
NVIDIA_DRM_SOURCES =
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-drv.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-utils.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-crtc.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-encoder.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-connector.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-fb.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-modeset.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-fence.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-helper.c
NVIDIA_DRM_SOURCES += nvidia-drm/nv-kthread-q.c
NVIDIA_DRM_SOURCES += nvidia-drm/nv-pci-table.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-nvkms-memory.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-user-memory.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-dma-buf.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-format.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-os-interface.c
#
# Register the conftests needed by nvidia-drm.ko
#
NV_CONFTEST_GENERIC_COMPILE_TESTS += drm_available
NV_CONFTEST_GENERIC_COMPILE_TESTS += drm_atomic_available
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_gpl_refcount_inc
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_gpl_refcount_dec_and_test
NV_CONFTEST_GENERIC_COMPILE_TESTS += drm_alpha_blending_available
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_present_drm_gem_prime_fd_to_handle
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_present_drm_gem_prime_handle_to_fd
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_dev_unref
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_reinit_primary_mode_group
NV_CONFTEST_FUNCTION_COMPILE_TESTS += get_user_pages_remote
NV_CONFTEST_FUNCTION_COMPILE_TESTS += get_user_pages
NV_CONFTEST_FUNCTION_COMPILE_TESTS += pin_user_pages_remote
NV_CONFTEST_FUNCTION_COMPILE_TESTS += pin_user_pages
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_object_lookup
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_atomic_state_ref_counting
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_driver_has_gem_prime_res_obj
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_atomic_helper_connector_dpms
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_funcs_have_mode_in_name
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_has_vrr_capable_property
NV_CONFTEST_FUNCTION_COMPILE_TESTS += vmf_insert_pfn
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_framebuffer_get
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_object_get
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_dev_put
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_format_num_planes
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_for_each_possible_encoder
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_rotation_available
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_vma_offset_exact_lookup_locked
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_object_put_unlocked
NV_CONFTEST_FUNCTION_COMPILE_TESTS += nvhost_dma_fence_unpack
NV_CONFTEST_FUNCTION_COMPILE_TESTS += list_is_first
NV_CONFTEST_FUNCTION_COMPILE_TESTS += timer_setup
NV_CONFTEST_FUNCTION_COMPILE_TESTS += dma_fence_set_error
NV_CONFTEST_FUNCTION_COMPILE_TESTS += fence_set_error
NV_CONFTEST_FUNCTION_COMPILE_TESTS += sync_file_get_fence
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_aperture_remove_conflicting_pci_framebuffers
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_fbdev_generic_setup
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_attach_hdr_output_metadata_property
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_helper_crtc_enable_color_mgmt
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_crtc_enable_color_mgmt
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_atomic_helper_legacy_gamma_set
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_present
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_bus_type
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_get_irq
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_get_name
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_device_list
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_legacy_dev_list
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_set_busid
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_connectors_changed
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_init_function_args
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_helper_mode_fill_fb_struct
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_master_drop_has_from_release_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_unload_has_int_return_type
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_fault_has_address
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_ops_fault_removed_vma_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_atomic_helper_crtc_destroy_state_has_crtc_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_atomic_helper_plane_destroy_state_has_plane_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_mode_object_find_has_file_priv_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += dma_buf_owner
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_list_iter
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_atomic_helper_swap_state_has_stall_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_prime_flag_present
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_fault_t
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_gem_object_has_resv
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_async_flip
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_pageflip_flags
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_vrr_enabled
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_format_modifiers_present
NV_CONFTEST_TYPE_COMPILE_TESTS += mm_has_mmap_lock
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_vma_node_is_allowed_has_tag_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_vma_offset_node_has_readonly
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_display_mode_has_vrefresh
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_master_set_has_int_return_type
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_gem_free_object
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_prime_pages_to_sg_has_drm_device_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_gem_prime_callbacks
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_atomic_check_has_atomic_state_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_gem_object_vmap_has_map_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_plane_atomic_check_has_atomic_state_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_device_has_pdev
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_no_vblank
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_mode_config_has_allow_fb_modifiers
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_has_hdr_output_metadata
NV_CONFTEST_TYPE_COMPILE_TESTS += dma_resv_add_fence
NV_CONFTEST_TYPE_COMPILE_TESTS += dma_resv_reserve_fences
NV_CONFTEST_TYPE_COMPILE_TESTS += reservation_object_reserve_shared_has_num_fences_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_has_override_edid
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_master_has_leases
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_file_get_master
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_modeset_lock_all_end
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_lookup
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_put
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_area_struct_has_const_vm_flags
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_dumb_destroy
NV_CONFTEST_TYPE_COMPILE_TESTS += fence_ops_use_64bit_seqno
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_aperture_remove_conflicting_pci_framebuffers_has_driver_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_mode_create_dp_colorspace_property_has_supported_colorspaces_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_syncobj_features_present
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_unlocked_ioctl_flag_present

View File

@@ -2,29 +2,16 @@
# Kbuild fragment for nvidia-drm.ko
###########################################################################
# Get our source file list and conftest list from the common file
include $(src)/nvidia-drm/nvidia-drm-sources.mk
# Linux-specific sources
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-linux.c
#
# Define NVIDIA_DRM_{SOURCES,OBJECTS}
#
NVIDIA_DRM_SOURCES =
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-drv.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-utils.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-crtc.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-encoder.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-connector.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-fb.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-modeset.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-fence.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-linux.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-helper.c
NVIDIA_DRM_SOURCES += nvidia-drm/nv-pci-table.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-nvkms-memory.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-user-memory.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-gem-dma-buf.c
NVIDIA_DRM_SOURCES += nvidia-drm/nvidia-drm-format.c
NVIDIA_DRM_OBJECTS = $(patsubst %.c,%.o,$(NVIDIA_DRM_SOURCES))
obj-m += nvidia-drm.o
@@ -43,91 +30,4 @@ NVIDIA_DRM_CFLAGS += -UDEBUG -U_DEBUG -DNDEBUG -DNV_BUILD_MODULE_INSTANCES=0
$(call ASSIGN_PER_OBJ_CFLAGS, $(NVIDIA_DRM_OBJECTS), $(NVIDIA_DRM_CFLAGS))
#
# Register the conftests needed by nvidia-drm.ko
#
NV_OBJECTS_DEPEND_ON_CONFTEST += $(NVIDIA_DRM_OBJECTS)
NV_CONFTEST_GENERIC_COMPILE_TESTS += drm_available
NV_CONFTEST_GENERIC_COMPILE_TESTS += drm_atomic_available
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_gpl_refcount_inc
NV_CONFTEST_GENERIC_COMPILE_TESTS += is_export_symbol_gpl_refcount_dec_and_test
NV_CONFTEST_GENERIC_COMPILE_TESTS += drm_alpha_blending_available
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_dev_unref
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_reinit_primary_mode_group
NV_CONFTEST_FUNCTION_COMPILE_TESTS += get_user_pages_remote
NV_CONFTEST_FUNCTION_COMPILE_TESTS += get_user_pages
NV_CONFTEST_FUNCTION_COMPILE_TESTS += pin_user_pages_remote
NV_CONFTEST_FUNCTION_COMPILE_TESTS += pin_user_pages
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_object_lookup
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_atomic_state_ref_counting
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_driver_has_gem_prime_res_obj
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_atomic_helper_connector_dpms
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_funcs_have_mode_in_name
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_has_vrr_capable_property
NV_CONFTEST_FUNCTION_COMPILE_TESTS += vmf_insert_pfn
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_framebuffer_get
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_object_get
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_dev_put
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_format_num_planes
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_connector_for_each_possible_encoder
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_rotation_available
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_vma_offset_exact_lookup_locked
NV_CONFTEST_FUNCTION_COMPILE_TESTS += drm_gem_object_put_unlocked
NV_CONFTEST_FUNCTION_COMPILE_TESTS += nvhost_dma_fence_unpack
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_present
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_bus_type
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_get_irq
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_bus_has_get_name
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_device_list
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_legacy_dev_list
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_set_busid
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_connectors_changed
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_init_function_args
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_helper_mode_fill_fb_struct
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_master_drop_has_from_release_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_unload_has_int_return_type
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_fault_has_address
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_ops_fault_removed_vma_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_atomic_helper_crtc_destroy_state_has_crtc_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_atomic_helper_plane_destroy_state_has_plane_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_mode_object_find_has_file_priv_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += dma_buf_owner
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_list_iter
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_atomic_helper_swap_state_has_stall_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_prime_flag_present
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_fault_t
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_gem_object_has_resv
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_async_flip
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_pageflip_flags
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_vrr_enabled
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_format_modifiers_present
NV_CONFTEST_TYPE_COMPILE_TESTS += mm_has_mmap_lock
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_vma_node_is_allowed_has_tag_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_vma_offset_node_has_readonly
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_display_mode_has_vrefresh
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_master_set_has_int_return_type
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_gem_free_object
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_prime_pages_to_sg_has_drm_device_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_gem_prime_callbacks
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_atomic_check_has_atomic_state_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_gem_object_vmap_has_map_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_plane_atomic_check_has_atomic_state_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_device_has_pdev
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_crtc_state_has_no_vblank
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_mode_config_has_allow_fb_modifiers
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_has_hdr_output_metadata
NV_CONFTEST_TYPE_COMPILE_TESTS += dma_resv_add_fence
NV_CONFTEST_TYPE_COMPILE_TESTS += dma_resv_reserve_fences
NV_CONFTEST_TYPE_COMPILE_TESTS += reservation_object_reserve_shared_has_num_fences_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_has_override_edid
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_master_has_leases
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_file_get_master
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_modeset_lock_all_end
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_lookup
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_connector_put
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_area_struct_has_const_vm_flags
NV_CONFTEST_TYPE_COMPILE_TESTS += drm_driver_has_dumb_destroy

View File

@@ -45,6 +45,7 @@ int nv_drm_init(void)
return -EINVAL;
}
nvKms->setSuspendResumeCallback(nv_drm_suspend_resume);
return nv_drm_probe_devices();
#else
return 0;
@@ -54,6 +55,7 @@ int nv_drm_init(void)
void nv_drm_exit(void)
{
#if defined(NV_DRM_AVAILABLE)
nvKms->setSuspendResumeCallback(NULL);
nv_drm_remove_devices();
#endif
}

View File

@@ -247,6 +247,11 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferr
return 0;
}
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
{
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
}
// Returns true (non-zero) if the item was actually scheduled, and false if the
// item was already pending in a queue.
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)

View File

@@ -1,5 +1,5 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2015-21 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2015-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a
@@ -35,12 +35,13 @@
#include <linux/list.h>
#include <linux/rwsem.h>
#include <linux/freezer.h>
#include <linux/poll.h>
#include <linux/cdev.h>
#include <acpi/video.h>
#include "nvstatus.h"
#include "nv-register-module.h"
#include "nv-modeset-interface.h"
#include "nv-kref.h"
@@ -53,8 +54,13 @@
#include "nv-kthread-q.h"
#include "nv-time.h"
#include "nv-lock.h"
#include "nv-chardev-numbers.h"
#if !defined(CONFIG_RETPOLINE)
/*
* Commit aefb2f2e619b ("x86/bugs: Rename CONFIG_RETPOLINE =>
* CONFIG_MITIGATION_RETPOLINE) in v6.8 renamed CONFIG_RETPOLINE.
*/
#if !defined(CONFIG_RETPOLINE) && !defined(CONFIG_MITIGATION_RETPOLINE)
#include "nv-retpoline.h"
#endif
@@ -65,6 +71,21 @@
static bool output_rounding_fix = true;
module_param_named(output_rounding_fix, output_rounding_fix, bool, 0400);
static bool disable_hdmi_frl = false;
module_param_named(disable_hdmi_frl, disable_hdmi_frl, bool, 0400);
static bool disable_vrr_memclk_switch = false;
module_param_named(disable_vrr_memclk_switch, disable_vrr_memclk_switch, bool, 0400);
static bool hdmi_deepcolor = true;
module_param_named(hdmi_deepcolor, hdmi_deepcolor, bool, 0400);
static bool vblank_sem_control = true;
module_param_named(vblank_sem_control, vblank_sem_control, bool, 0400);
static bool opportunistic_display_sync = true;
module_param_named(opportunistic_display_sync, opportunistic_display_sync, bool, 0400);
/* These parameters are used for fault injection tests. Normally the defaults
* should be used. */
MODULE_PARM_DESC(fail_malloc, "Fail the Nth call to nvkms_alloc");
@@ -75,6 +96,7 @@ MODULE_PARM_DESC(malloc_verbose, "Report information about malloc calls on modul
static bool malloc_verbose = false;
module_param_named(malloc_verbose, malloc_verbose, bool, 0400);
#if NVKMS_CONFIG_FILE_SUPPORTED
/* This parameter is used to find the dpy override conf file */
#define NVKMS_CONF_FILE_SPECIFIED (nvkms_conf != NULL)
@@ -83,6 +105,7 @@ MODULE_PARM_DESC(config_file,
"(default: disabled)");
static char *nvkms_conf = NULL;
module_param_named(config_file, nvkms_conf, charp, 0400);
#endif
static atomic_t nvkms_alloc_called_count;
@@ -91,6 +114,45 @@ NvBool nvkms_output_rounding_fix(void)
return output_rounding_fix;
}
NvBool nvkms_disable_hdmi_frl(void)
{
return disable_hdmi_frl;
}
NvBool nvkms_disable_vrr_memclk_switch(void)
{
return disable_vrr_memclk_switch;
}
NvBool nvkms_hdmi_deepcolor(void)
{
return hdmi_deepcolor;
}
NvBool nvkms_vblank_sem_control(void)
{
return vblank_sem_control;
}
NvBool nvkms_opportunistic_display_sync(void)
{
return opportunistic_display_sync;
}
NvBool nvkms_kernel_supports_syncpts(void)
{
/*
* Note this only checks that the kernel has the prerequisite
* support for syncpts; callers must also check that the hardware
* supports syncpts.
*/
#if (defined(CONFIG_TEGRA_GRHOST) || defined(NV_LINUX_HOST1X_NEXT_H_PRESENT))
return NV_TRUE;
#else
return NV_FALSE;
#endif
}
#define NVKMS_SYNCPT_STUBS_NEEDED
/*************************************************************************
@@ -192,9 +254,23 @@ static inline int nvkms_read_trylock_pm_lock(void)
static inline void nvkms_read_lock_pm_lock(void)
{
while (!down_read_trylock(&nvkms_pm_lock)) {
try_to_freeze();
cond_resched();
if ((current->flags & PF_NOFREEZE)) {
/*
* Non-freezable tasks (i.e. kthreads in this case) don't have to worry
* about being frozen during system suspend, but do need to block so
* that the CPU can go idle during s2idle. Do a normal uninterruptible
* blocking wait for the PM lock.
*/
down_read(&nvkms_pm_lock);
} else {
/*
* For freezable tasks, make sure we give the kernel an opportunity to
* freeze if taking the PM lock fails.
*/
while (!down_read_trylock(&nvkms_pm_lock)) {
try_to_freeze();
cond_resched();
}
}
}
@@ -327,7 +403,7 @@ NvU64 nvkms_get_usec(void)
struct timespec64 ts;
NvU64 ns;
ktime_get_real_ts64(&ts);
ktime_get_raw_ts64(&ts);
ns = timespec64_to_ns(&ts);
return ns / 1000;
@@ -441,6 +517,8 @@ nvkms_event_queue_changed(nvkms_per_open_handle_t *pOpenKernel,
static void nvkms_suspend(NvU32 gpuId)
{
nvKmsKapiSuspendResume(NV_TRUE /* suspend */);
if (gpuId == 0) {
nvkms_write_lock_pm_lock();
}
@@ -459,6 +537,8 @@ static void nvkms_resume(NvU32 gpuId)
if (gpuId == 0) {
nvkms_write_unlock_pm_lock();
}
nvKmsKapiSuspendResume(NV_FALSE /* suspend */);
}
@@ -787,49 +867,6 @@ void nvkms_free_timer(nvkms_timer_handle_t *handle)
timer->cancel = NV_TRUE;
}
void* nvkms_get_per_open_data(int fd)
{
struct file *filp = fget(fd);
struct nvkms_per_open *popen = NULL;
dev_t rdev = 0;
void *data = NULL;
if (filp == NULL) {
return NULL;
}
if (filp->f_inode == NULL) {
goto done;
}
rdev = filp->f_inode->i_rdev;
if ((MAJOR(rdev) != NVKMS_MAJOR_DEVICE_NUMBER) ||
(MINOR(rdev) != NVKMS_MINOR_DEVICE_NUMBER)) {
goto done;
}
popen = filp->private_data;
if (popen == NULL) {
goto done;
}
data = popen->data;
done:
/*
* fget() incremented the struct file's reference count, which
* needs to be balanced with a call to fput(). It is safe to
* decrement the reference count before returning
* filp->private_data because core NVKMS is currently holding the
* nvkms_lock, which prevents the nvkms_close() => nvKmsClose()
* call chain from freeing the file out from under the caller of
* nvkms_get_per_open_data().
*/
fput(filp);
return data;
}
NvBool nvkms_fd_is_nvidia_chardev(int fd)
{
struct file *filp = fget(fd);
@@ -1211,6 +1248,26 @@ void nvkms_close_from_kapi(struct nvkms_per_open *popen)
nvkms_close_pm_unlocked(popen);
}
NvBool nvkms_ioctl_from_kapi_try_pmlock
(
struct nvkms_per_open *popen,
NvU32 cmd, void *params_address, const size_t param_size
)
{
NvBool ret;
if (nvkms_read_trylock_pm_lock()) {
return NV_FALSE;
}
ret = nvkms_ioctl_common(popen,
cmd,
(NvU64)(NvUPtr)params_address, param_size) == 0;
nvkms_read_unlock_pm_lock();
return ret;
}
NvBool nvkms_ioctl_from_kapi
(
struct nvkms_per_open *popen,
@@ -1374,6 +1431,7 @@ static void nvkms_proc_exit(void)
/*************************************************************************
* NVKMS Config File Read
************************************************************************/
#if NVKMS_CONFIG_FILE_SUPPORTED
static NvBool nvkms_fs_mounted(void)
{
return current->fs != NULL;
@@ -1481,6 +1539,11 @@ static void nvkms_read_config_file_locked(void)
nvkms_free(buffer, buf_size);
}
#else
static void nvkms_read_config_file_locked(void)
{
}
#endif
/*************************************************************************
* NVKMS KAPI functions
@@ -1575,6 +1638,12 @@ static int nvkms_ioctl(struct inode *inode, struct file *filp,
return status;
}
static long nvkms_unlocked_ioctl(struct file *filp, unsigned int cmd,
unsigned long arg)
{
return nvkms_ioctl(filp->f_inode, filp, cmd, arg);
}
static unsigned int nvkms_poll(struct file *filp, poll_table *wait)
{
unsigned int mask = 0;
@@ -1602,17 +1671,73 @@ static unsigned int nvkms_poll(struct file *filp, poll_table *wait)
* Module loading support code.
*************************************************************************/
static nvidia_module_t nvidia_modeset_module = {
#define NVKMS_RDEV (MKDEV(NV_MAJOR_DEVICE_NUMBER, \
NV_MINOR_DEVICE_NUMBER_MODESET_DEVICE))
static struct file_operations nvkms_fops = {
.owner = THIS_MODULE,
.module_name = "nvidia-modeset",
.instance = 1, /* minor number: 255-1=254 */
.open = nvkms_open,
.close = nvkms_close,
.mmap = nvkms_mmap,
.ioctl = nvkms_ioctl,
.poll = nvkms_poll,
.unlocked_ioctl = nvkms_unlocked_ioctl,
#if NVCPU_IS_X86_64 || NVCPU_IS_AARCH64
.compat_ioctl = nvkms_unlocked_ioctl,
#endif
.mmap = nvkms_mmap,
.open = nvkms_open,
.release = nvkms_close,
};
static struct cdev nvkms_device_cdev;
static int __init nvkms_register_chrdev(void)
{
int ret;
ret = register_chrdev_region(NVKMS_RDEV, 1, "nvidia-modeset");
if (ret < 0) {
return ret;
}
cdev_init(&nvkms_device_cdev, &nvkms_fops);
ret = cdev_add(&nvkms_device_cdev, NVKMS_RDEV, 1);
if (ret < 0) {
unregister_chrdev_region(NVKMS_RDEV, 1);
return ret;
}
return ret;
}
static void nvkms_unregister_chrdev(void)
{
cdev_del(&nvkms_device_cdev);
unregister_chrdev_region(NVKMS_RDEV, 1);
}
void* nvkms_get_per_open_data(int fd)
{
struct file *filp = fget(fd);
void *data = NULL;
if (filp) {
if (filp->f_op == &nvkms_fops && filp->private_data) {
struct nvkms_per_open *popen = filp->private_data;
data = popen->data;
}
/*
* fget() incremented the struct file's reference count, which needs to
* be balanced with a call to fput(). It is safe to decrement the
* reference count before returning filp->private_data because core
* NVKMS is currently holding the nvkms_lock, which prevents the
* nvkms_close() => nvKmsClose() call chain from freeing the file out
* from under the caller of nvkms_get_per_open_data().
*/
fput(filp);
}
return data;
}
static int __init nvkms_init(void)
{
int ret;
@@ -1643,10 +1768,9 @@ static int __init nvkms_init(void)
INIT_LIST_HEAD(&nvkms_timers.list);
spin_lock_init(&nvkms_timers.lock);
ret = nvidia_register_module(&nvidia_modeset_module);
ret = nvkms_register_chrdev();
if (ret != 0) {
goto fail_register_module;
goto fail_register_chrdev;
}
down(&nvkms_lock);
@@ -1665,8 +1789,8 @@ static int __init nvkms_init(void)
return 0;
fail_module_load:
nvidia_unregister_module(&nvidia_modeset_module);
fail_register_module:
nvkms_unregister_chrdev();
fail_register_chrdev:
nv_kthread_q_stop(&nvkms_deferred_close_kthread_q);
fail_deferred_close_kthread:
nv_kthread_q_stop(&nvkms_kthread_q);
@@ -1730,7 +1854,7 @@ restart:
nv_kthread_q_stop(&nvkms_deferred_close_kthread_q);
nv_kthread_q_stop(&nvkms_kthread_q);
nvidia_unregister_module(&nvidia_modeset_module);
nvkms_unregister_chrdev();
nvkms_free_rm();
if (malloc_verbose) {

View File

@@ -97,6 +97,11 @@ typedef struct {
} NvKmsSyncPtOpParams;
NvBool nvkms_output_rounding_fix(void);
NvBool nvkms_disable_hdmi_frl(void);
NvBool nvkms_disable_vrr_memclk_switch(void);
NvBool nvkms_hdmi_deepcolor(void);
NvBool nvkms_vblank_sem_control(void);
NvBool nvkms_opportunistic_display_sync(void);
void nvkms_call_rm (void *ops);
void* nvkms_alloc (size_t size,
@@ -299,6 +304,11 @@ NvU32 nvkms_enumerate_gpus(nv_gpu_info_t *gpu_info);
NvBool nvkms_allow_write_combining(void);
/*!
* Check if OS supports syncpoints.
*/
NvBool nvkms_kernel_supports_syncpts(void);
/*!
* Checks whether the fd is associated with an nvidia character device.
*/
@@ -323,6 +333,16 @@ NvBool nvkms_ioctl_from_kapi
NvU32 cmd, void *params_address, const size_t params_size
);
/*!
* Like nvkms_ioctl_from_kapi, but return NV_FALSE instead of waiting if the
* power management read lock cannot be acquired.
*/
NvBool nvkms_ioctl_from_kapi_try_pmlock
(
struct nvkms_per_open *popen,
NvU32 cmd, void *params_address, const size_t params_size
);
/*!
* APIs for locking.
*/

View File

@@ -58,6 +58,18 @@ nvidia-modeset-y += $(NVIDIA_MODESET_BINARY_OBJECT_O)
NVIDIA_MODESET_CFLAGS += -I$(src)/nvidia-modeset
NVIDIA_MODESET_CFLAGS += -UDEBUG -U_DEBUG -DNDEBUG -DNV_BUILD_MODULE_INSTANCES=0
# Some Android kernels prohibit driver use of filesystem functions like
# filp_open() and kernel_read(). Disable the NVKMS_CONFIG_FILE_SUPPORTED
# functionality that uses those functions when building for Android.
PLATFORM_IS_ANDROID ?= 0
ifeq ($(PLATFORM_IS_ANDROID),1)
NVIDIA_MODESET_CFLAGS += -DNVKMS_CONFIG_FILE_SUPPORTED=0
else
NVIDIA_MODESET_CFLAGS += -DNVKMS_CONFIG_FILE_SUPPORTED=1
endif
$(call ASSIGN_PER_OBJ_CFLAGS, $(NVIDIA_MODESET_OBJECTS), $(NVIDIA_MODESET_CFLAGS))
@@ -93,3 +105,4 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += list_is_first
NV_CONFTEST_FUNCTION_COMPILE_TESTS += ktime_get_real_ts64
NV_CONFTEST_FUNCTION_COMPILE_TESTS += ktime_get_raw_ts64
NV_CONFTEST_FUNCTION_COMPILE_TESTS += acpi_video_backlight_use_native
NV_CONFTEST_FUNCTION_COMPILE_TESTS += kernel_read_has_pointer_pos_arg

View File

@@ -66,6 +66,8 @@ enum NvKmsClientType {
NVKMS_CLIENT_KERNEL_SPACE,
};
struct NvKmsPerOpenDev;
NvBool nvKmsIoctl(
void *pOpenVoid,
NvU32 cmd,
@@ -101,7 +103,11 @@ NvBool nvKmsKapiGetFunctionsTableInternal
struct NvKmsKapiFunctionsTable *funcsTable
);
void nvKmsKapiSuspendResume(NvBool suspend);
NvBool nvKmsGetBacklight(NvU32 display_id, void *drv_priv, NvU32 *brightness);
NvBool nvKmsSetBacklight(NvU32 display_id, void *drv_priv, NvU32 brightness);
NvBool nvKmsOpenDevHasSubOwnerPermissionOrBetter(const struct NvKmsPerOpenDev *pOpenDev);
#endif /* __NV_KMS_H__ */

View File

@@ -1,20 +1,25 @@
/* SPDX-License-Identifier: Linux-OpenIB */
/*
* Copyright (c) 2006, 2007 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2007, 2008 Mellanox Technologies. All rights reserved.
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
@@ -43,7 +48,9 @@
MODULE_AUTHOR("Yishai Hadas");
MODULE_DESCRIPTION("NVIDIA GPU memory plug-in");
MODULE_LICENSE("Linux-OpenIB");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(DRV_VERSION);
enum {
NV_MEM_PEERDIRECT_SUPPORT_DEFAULT = 0,
@@ -53,7 +60,13 @@ static int peerdirect_support = NV_MEM_PEERDIRECT_SUPPORT_DEFAULT;
module_param(peerdirect_support, int, S_IRUGO);
MODULE_PARM_DESC(peerdirect_support, "Set level of support for Peer-direct, 0 [default] or 1 [legacy, for example MLNX_OFED 4.9 LTS]");
#define peer_err(FMT, ARGS...) printk(KERN_ERR "nvidia-peermem" " %s:%d " FMT, __FUNCTION__, __LINE__, ## ARGS)
#define peer_err(FMT, ARGS...) printk(KERN_ERR "nvidia-peermem" " %s:%d ERROR " FMT, __FUNCTION__, __LINE__, ## ARGS)
#ifdef NV_MEM_DEBUG
#define peer_trace(FMT, ARGS...) printk(KERN_DEBUG "nvidia-peermem" " %s:%d TRACE " FMT, __FUNCTION__, __LINE__, ## ARGS)
#else
#define peer_trace(FMT, ARGS...) do {} while (0)
#endif
#if defined(NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT)
@@ -74,7 +87,10 @@ invalidate_peer_memory mem_invalidate_callback;
static void *reg_handle = NULL;
static void *reg_handle_nc = NULL;
#define NV_MEM_CONTEXT_MAGIC ((u64)0xF1F4F1D0FEF0DAD0ULL)
struct nv_mem_context {
u64 pad1;
struct nvidia_p2p_page_table *page_table;
struct nvidia_p2p_dma_mapping *dma_mapping;
u64 core_context;
@@ -86,8 +102,22 @@ struct nv_mem_context {
struct task_struct *callback_task;
int sg_allocated;
struct sg_table sg_head;
u64 pad2;
};
#define NV_MEM_CONTEXT_CHECK_OK(MC) ({ \
struct nv_mem_context *mc = (MC); \
int rc = ((0 != mc) && \
(READ_ONCE(mc->pad1) == NV_MEM_CONTEXT_MAGIC) && \
(READ_ONCE(mc->pad2) == NV_MEM_CONTEXT_MAGIC)); \
if (!rc) { \
peer_trace("invalid nv_mem_context=%px pad1=%016llx pad2=%016llx\n", \
mc, \
mc?mc->pad1:0, \
mc?mc->pad2:0); \
} \
rc; \
})
static void nv_get_p2p_free_callback(void *data)
{
@@ -97,8 +127,9 @@ static void nv_get_p2p_free_callback(void *data)
struct nvidia_p2p_dma_mapping *dma_mapping = NULL;
__module_get(THIS_MODULE);
if (!nv_mem_context) {
peer_err("nv_get_p2p_free_callback -- invalid nv_mem_context\n");
if (!NV_MEM_CONTEXT_CHECK_OK(nv_mem_context)) {
peer_err("detected invalid context, skipping further processing\n");
goto out;
}
@@ -169,9 +200,11 @@ static int nv_mem_acquire(unsigned long addr, size_t size, void *peer_mem_privat
/* Error case handled as not mine */
return 0;
nv_mem_context->pad1 = NV_MEM_CONTEXT_MAGIC;
nv_mem_context->page_virt_start = addr & GPU_PAGE_MASK;
nv_mem_context->page_virt_end = (addr + size + GPU_PAGE_SIZE - 1) & GPU_PAGE_MASK;
nv_mem_context->mapped_size = nv_mem_context->page_virt_end - nv_mem_context->page_virt_start;
nv_mem_context->pad2 = NV_MEM_CONTEXT_MAGIC;
ret = nvidia_p2p_get_pages(0, 0, nv_mem_context->page_virt_start, nv_mem_context->mapped_size,
&nv_mem_context->page_table, nv_mem_dummy_callback, nv_mem_context);
@@ -195,6 +228,7 @@ static int nv_mem_acquire(unsigned long addr, size_t size, void *peer_mem_privat
return 1;
err:
memset(nv_mem_context, 0, sizeof(*nv_mem_context));
kfree(nv_mem_context);
/* Error case handled as not mine */
@@ -249,8 +283,8 @@ static int nv_dma_map(struct sg_table *sg_head, void *context,
nv_mem_context->sg_allocated = 1;
for_each_sg(sg_head->sgl, sg, nv_mem_context->npages, i) {
sg_set_page(sg, NULL, nv_mem_context->page_size, 0);
sg->dma_address = dma_mapping->dma_addresses[i];
sg->dma_length = nv_mem_context->page_size;
sg_dma_address(sg) = dma_mapping->dma_addresses[i];
sg_dma_len(sg) = nv_mem_context->page_size;
}
nv_mem_context->sg_head = *sg_head;
*nmap = nv_mem_context->npages;
@@ -304,8 +338,13 @@ static void nv_mem_put_pages_common(int nc,
return;
if (nc) {
#ifdef NVIDIA_P2P_CAP_GET_PAGES_PERSISTENT_API
ret = nvidia_p2p_put_pages_persistent(nv_mem_context->page_virt_start,
nv_mem_context->page_table, 0);
#else
ret = nvidia_p2p_put_pages(0, 0, nv_mem_context->page_virt_start,
nv_mem_context->page_table);
#endif
} else {
ret = nvidia_p2p_put_pages(0, 0, nv_mem_context->page_virt_start,
nv_mem_context->page_table);
@@ -342,6 +381,7 @@ static void nv_mem_release(void *context)
sg_free_table(&nv_mem_context->sg_head);
nv_mem_context->sg_allocated = 0;
}
memset(nv_mem_context, 0, sizeof(*nv_mem_context));
kfree(nv_mem_context);
module_put(THIS_MODULE);
return;
@@ -412,9 +452,15 @@ static int nv_mem_get_pages_nc(unsigned long addr,
nv_mem_context->core_context = core_context;
nv_mem_context->page_size = GPU_PAGE_SIZE;
#ifdef NVIDIA_P2P_CAP_GET_PAGES_PERSISTENT_API
ret = nvidia_p2p_get_pages_persistent(nv_mem_context->page_virt_start,
nv_mem_context->mapped_size,
&nv_mem_context->page_table, 0);
#else
ret = nvidia_p2p_get_pages(0, 0, nv_mem_context->page_virt_start, nv_mem_context->mapped_size,
&nv_mem_context->page_table, NULL, NULL);
#endif
if (ret < 0) {
peer_err("error %d while calling nvidia_p2p_get_pages() with NULL callback\n", ret);
return ret;
@@ -459,8 +505,6 @@ static int __init nv_mem_client_init(void)
}
#if defined (NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT)
int status = 0;
// off by one, to leave space for the trailing '1' which is flagging
// the new client type
BUG_ON(strlen(DRV_NAME) > IB_PEER_MEMORY_NAME_MAX-1);
@@ -489,7 +533,7 @@ static int __init nv_mem_client_init(void)
&mem_invalidate_callback);
if (!reg_handle) {
peer_err("nv_mem_client_init -- error while registering traditional client\n");
status = -EINVAL;
rc = -EINVAL;
goto out;
}
@@ -499,12 +543,12 @@ static int __init nv_mem_client_init(void)
reg_handle_nc = ib_register_peer_memory_client(&nv_mem_client_nc, NULL);
if (!reg_handle_nc) {
peer_err("nv_mem_client_init -- error while registering nc client\n");
status = -EINVAL;
rc = -EINVAL;
goto out;
}
out:
if (status) {
if (rc) {
if (reg_handle) {
ib_unregister_peer_memory_client(reg_handle);
reg_handle = NULL;
@@ -516,7 +560,7 @@ out:
}
}
return status;
return rc;
#else
return -EINVAL;
#endif

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2022 NVIDIA Corporation
Copyright (c) 2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2022 NVIDIA Corporation
Copyright (c) 2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2013-2022 NVIDIA Corporation
Copyright (c) 2013-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to

View File

@@ -247,6 +247,11 @@ int nv_kthread_q_init_on_node(nv_kthread_q_t *q, const char *q_name, int preferr
return 0;
}
int nv_kthread_q_init(nv_kthread_q_t *q, const char *qname)
{
return nv_kthread_q_init_on_node(q, qname, NV_KTHREAD_NO_NODE);
}
// Returns true (non-zero) if the item was actually scheduled, and false if the
// item was already pending in a queue.
static int _raw_q_schedule(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)

View File

@@ -27,6 +27,7 @@ NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_rm_mem.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_channel.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_lock.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_hal.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_processors.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_tree.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_rb_tree.c
NVIDIA_UVM_SOURCES += nvidia-uvm/uvm_range_allocator.c

View File

@@ -81,12 +81,13 @@ NV_CONFTEST_FUNCTION_COMPILE_TESTS += set_memory_uc
NV_CONFTEST_FUNCTION_COMPILE_TESTS += set_pages_uc
NV_CONFTEST_FUNCTION_COMPILE_TESTS += ktime_get_raw_ts64
NV_CONFTEST_FUNCTION_COMPILE_TESTS += ioasid_get
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mm_pasid_set
NV_CONFTEST_FUNCTION_COMPILE_TESTS += migrate_vma_setup
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mm_pasid_drop
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mmget_not_zero
NV_CONFTEST_FUNCTION_COMPILE_TESTS += mmgrab
NV_CONFTEST_FUNCTION_COMPILE_TESTS += iommu_sva_bind_device_has_drvdata_arg
NV_CONFTEST_FUNCTION_COMPILE_TESTS += vm_fault_to_errno
NV_CONFTEST_FUNCTION_COMPILE_TESTS += find_next_bit_wrap
NV_CONFTEST_FUNCTION_COMPILE_TESTS += iommu_is_dma_domain
NV_CONFTEST_TYPE_COMPILE_TESTS += backing_dev_info
NV_CONFTEST_TYPE_COMPILE_TESTS += mm_context_t
@@ -100,6 +101,7 @@ NV_CONFTEST_TYPE_COMPILE_TESTS += kmem_cache_has_kobj_remove_work
NV_CONFTEST_TYPE_COMPILE_TESTS += sysfs_slab_unlink
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_fault_t
NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_notifier_ops_invalidate_range
NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_notifier_ops_arch_invalidate_secondary_tlbs
NV_CONFTEST_TYPE_COMPILE_TESTS += proc_ops
NV_CONFTEST_TYPE_COMPILE_TESTS += timespec64
NV_CONFTEST_TYPE_COMPILE_TESTS += mm_has_mmap_lock
@@ -108,5 +110,10 @@ NV_CONFTEST_TYPE_COMPILE_TESTS += migrate_device_range
NV_CONFTEST_TYPE_COMPILE_TESTS += vm_area_struct_has_const_vm_flags
NV_CONFTEST_TYPE_COMPILE_TESTS += handle_mm_fault_has_mm_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += handle_mm_fault_has_pt_regs_arg
NV_CONFTEST_TYPE_COMPILE_TESTS += mempolicy_has_unified_nodes
NV_CONFTEST_TYPE_COMPILE_TESTS += mempolicy_has_home_node
NV_CONFTEST_TYPE_COMPILE_TESTS += mpol_preferred_many_present
NV_CONFTEST_TYPE_COMPILE_TESTS += mmu_interval_notifier
NV_CONFTEST_SYMBOL_COMPILE_TESTS += is_export_symbol_present_int_active_memcg
NV_CONFTEST_SYMBOL_COMPILE_TESTS += is_export_symbol_present_migrate_vma_setup

View File

@@ -24,16 +24,17 @@
#include "nvstatus.h"
#if !defined(NV_PRINTF_STRING_SECTION)
#if defined(NVRM) && NVCPU_IS_RISCV64
#define NV_PRINTF_STRING_SECTION __attribute__ ((section (".logging")))
#else // defined(NVRM) && NVCPU_IS_RISCV64
#if defined(NVRM) && NVOS_IS_LIBOS
#include "libos_log.h"
#define NV_PRINTF_STRING_SECTION LIBOS_SECTION_LOGGING
#else // defined(NVRM) && NVOS_IS_LIBOS
#define NV_PRINTF_STRING_SECTION
#endif // defined(NVRM) && NVCPU_IS_RISCV64
#endif // defined(NVRM) && NVOS_IS_LIBOS
#endif // !defined(NV_PRINTF_STRING_SECTION)
/*
* Include nvstatuscodes.h twice. Once for creating constant strings in the
* the NV_PRINTF_STRING_SECTION section of the ececutable, and once to build
* the NV_PRINTF_STRING_SECTION section of the executable, and once to build
* the g_StatusCodeList table.
*/
#undef NV_STATUS_CODE

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2015-2022 NVIDIA Corporation
Copyright (c) 2015-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -571,7 +571,6 @@ static void uvm_vm_open_managed_entry(struct vm_area_struct *vma)
static void uvm_vm_close_managed(struct vm_area_struct *vma)
{
uvm_va_space_t *va_space = uvm_va_space_get(vma->vm_file);
uvm_processor_id_t gpu_id;
bool make_zombie = false;
if (current->mm != NULL)
@@ -606,12 +605,6 @@ static void uvm_vm_close_managed(struct vm_area_struct *vma)
uvm_destroy_vma_managed(vma, make_zombie);
// Notify GPU address spaces that the fault buffer needs to be flushed to
// avoid finding stale entries that can be attributed to new VA ranges
// reallocated at the same address.
for_each_gpu_id_in_mask(gpu_id, &va_space->registered_gpu_va_spaces) {
uvm_processor_mask_set_atomic(&va_space->needs_fault_buffer_flush, gpu_id);
}
uvm_va_space_up_write(va_space);
if (current->mm != NULL)
@@ -1060,7 +1053,7 @@ NV_STATUS uvm_test_register_unload_state_buffer(UVM_TEST_REGISTER_UNLOAD_STATE_B
// are not used because unload_state_buf may be a managed memory pointer and
// therefore a locking assertion from the CPU fault handler could be fired.
nv_mmap_read_lock(current->mm);
ret = NV_PIN_USER_PAGES(params->unload_state_buf, 1, FOLL_WRITE, &page, NULL);
ret = NV_PIN_USER_PAGES(params->unload_state_buf, 1, FOLL_WRITE, &page);
nv_mmap_read_unlock(current->mm);
if (ret < 0)

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2013-2022 NVIDIA Corporation
Copyright (c) 2013-2024 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -45,16 +45,20 @@
// #endif
// 3) Do the same thing for the function definition, and for any structs that
// are taken as arguments to these functions.
// 4) Let this change propagate over to cuda_a, so that the CUDA driver can
// start using the new API by bumping up the API version number its using.
// This can be found in gpgpu/cuda/cuda.nvmk.
// 5) Once the cuda_a changes have made it back into chips_a, remove the old API
// declaration, definition, and any old structs that were in use.
// 4) Let this change propagate over to cuda_a and dev_a, so that the CUDA and
// nvidia-cfg libraries can start using the new API by bumping up the API
// version number it's using.
// Places where UVM_API_REVISION is defined are:
// drivers/gpgpu/cuda/cuda.nvmk (cuda_a)
// drivers/setup/linux/nvidia-cfg/makefile.nvmk (dev_a)
// 5) Once the dev_a and cuda_a changes have made it back into chips_a,
// remove the old API declaration, definition, and any old structs that were
// in use.
#ifndef _UVM_H_
#define _UVM_H_
#define UVM_API_LATEST_REVISION 8
#define UVM_API_LATEST_REVISION 11
#if !defined(UVM_API_REVISION)
#error "please define UVM_API_REVISION macro to a desired version number or UVM_API_LATEST_REVISION macro"
@@ -180,12 +184,8 @@ NV_STATUS UvmSetDriverVersion(NvU32 major, NvU32 changelist);
// because it is not very informative.
//
//------------------------------------------------------------------------------
#if UVM_API_REV_IS_AT_MOST(4)
NV_STATUS UvmInitialize(UvmFileDescriptor fd);
#else
NV_STATUS UvmInitialize(UvmFileDescriptor fd,
NvU64 flags);
#endif
//------------------------------------------------------------------------------
// UvmDeinitialize
@@ -216,6 +216,10 @@ NV_STATUS UvmDeinitialize(void);
// Note that it is not required to release VA ranges that were reserved with
// UvmReserveVa().
//
// This is useful for per-process checkpoint and restore, where kernel-mode
// state needs to be reconfigured to match the expectations of a pre-existing
// user-mode process.
//
// UvmReopen() closes the open file returned by UvmGetFileDescriptor() and
// replaces it with a new open file with the same name.
//
@@ -293,7 +297,9 @@ NV_STATUS UvmIsPageableMemoryAccessSupported(NvBool *pageableMemAccess);
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the GPU for which pageable memory access support is queried.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition for which
// pageable memory access support is queried.
//
// pageableMemAccess: (OUTPUT)
// Returns true (non-zero) if the GPU represented by gpuUuid supports
@@ -323,9 +329,19 @@ NV_STATUS UvmIsPageableMemoryAccessSupportedOnGpu(const NvProcessorUuid *gpuUuid
// usage. Calling UvmRegisterGpu multiple times on the same GPU from the same
// process results in an error.
//
// After successfully registering a GPU partition, all subsequent API calls
// which take a NvProcessorUuid argument (including UvmGpuMappingAttributes),
// must use the GI partition UUID which can be obtained with
// NvRmControl(NVC637_CTRL_CMD_GET_UUID). Otherwise, if the GPU is not SMC
// capable or SMC enabled, the physical GPU UUID must be used.
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the GPU to register.
// UUID of the physical GPU to register.
//
// platformParams: (INPUT)
// User handles identifying the GPU partition to register.
// This should be NULL if the GPU is not SMC capable or SMC enabled.
//
// Error codes:
// NV_ERR_NO_MEMORY:
@@ -360,27 +376,31 @@ NV_STATUS UvmIsPageableMemoryAccessSupportedOnGpu(const NvProcessorUuid *gpuUuid
// OS state required to register the GPU is not present.
//
// NV_ERR_INVALID_STATE:
// OS state required to register the GPU is malformed.
// OS state required to register the GPU is malformed, or the partition
// identified by the user handles or its configuration changed.
//
// NV_ERR_GENERIC:
// Unexpected error. We try hard to avoid returning this error code,
// because it is not very informative.
//
//------------------------------------------------------------------------------
#if UVM_API_REV_IS_AT_MOST(8)
NV_STATUS UvmRegisterGpu(const NvProcessorUuid *gpuUuid);
#else
NV_STATUS UvmRegisterGpu(const NvProcessorUuid *gpuUuid,
const UvmGpuPlatformParams *platformParams);
#endif
#if UVM_API_REV_IS_AT_MOST(8)
//------------------------------------------------------------------------------
// UvmRegisterGpuSmc
//
// The same as UvmRegisterGpu, but takes additional parameters to specify the
// GPU partition being registered if SMC is enabled.
//
// TODO: Bug 2844714: Merge UvmRegisterGpuSmc() with UvmRegisterGpu() once
// the initial SMC support is in place.
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the parent GPU of the SMC partition to register.
// UUID of the physical GPU of the SMC partition to register.
//
// platformParams: (INPUT)
// User handles identifying the partition to register.
@@ -393,6 +413,7 @@ NV_STATUS UvmRegisterGpu(const NvProcessorUuid *gpuUuid);
//
NV_STATUS UvmRegisterGpuSmc(const NvProcessorUuid *gpuUuid,
const UvmGpuPlatformParams *platformParams);
#endif
//------------------------------------------------------------------------------
// UvmUnregisterGpu
@@ -418,7 +439,8 @@ NV_STATUS UvmRegisterGpuSmc(const NvProcessorUuid *gpuUuid,
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the GPU to unregister.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition to unregister.
//
// Error codes:
// NV_ERR_INVALID_DEVICE:
@@ -476,7 +498,8 @@ NV_STATUS UvmUnregisterGpu(const NvProcessorUuid *gpuUuid);
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the GPU to register.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition to register.
//
// platformParams: (INPUT)
// On Linux: RM ctrl fd, hClient and hVaSpace.
@@ -547,7 +570,9 @@ NV_STATUS UvmRegisterGpuVaSpace(const NvProcessorUuid *gpuUuid,
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the GPU whose VA space should be unregistered.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition whose VA space
// should be unregistered.
//
// Error codes:
// NV_ERR_INVALID_DEVICE:
@@ -577,7 +602,7 @@ NV_STATUS UvmUnregisterGpuVaSpace(const NvProcessorUuid *gpuUuid);
//
// The two GPUs must be connected via PCIe. An error is returned if the GPUs are
// not connected or are connected over an interconnect different than PCIe
// (NVLink, for example).
// (NVLink or SMC partitions, for example).
//
// If both GPUs have GPU VA spaces registered for them, the two GPU VA spaces
// must support the same set of page sizes for GPU mappings.
@@ -590,10 +615,12 @@ NV_STATUS UvmUnregisterGpuVaSpace(const NvProcessorUuid *gpuUuid);
//
// Arguments:
// gpuUuidA: (INPUT)
// UUID of GPU A.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition A.
//
// gpuUuidB: (INPUT)
// UUID of GPU B.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition B.
//
// Error codes:
// NV_ERR_NO_MEMORY:
@@ -639,10 +666,12 @@ NV_STATUS UvmEnablePeerAccess(const NvProcessorUuid *gpuUuidA,
//
// Arguments:
// gpuUuidA: (INPUT)
// UUID of GPU A.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition A.
//
// gpuUuidB: (INPUT)
// UUID of GPU B.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition B.
//
// Error codes:
// NV_ERR_INVALID_DEVICE:
@@ -687,7 +716,9 @@ NV_STATUS UvmDisablePeerAccess(const NvProcessorUuid *gpuUuidA,
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the GPU that the channel is associated with.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition that the channel is
// associated with.
//
// platformParams: (INPUT)
// On Linux: RM ctrl fd, hClient and hChannel.
@@ -1126,11 +1157,14 @@ NV_STATUS UvmAllowMigrationRangeGroups(const NvU64 *rangeGroupIds,
// Length, in bytes, of the range.
//
// preferredLocationUuid: (INPUT)
// UUID of the preferred location for this VA range.
// UUID of the CPU, UUID of the physical GPU if the GPU is not SMC
// capable or SMC enabled, or the GPU instance UUID of the partition of
// the preferred location for this VA range.
//
// accessedByUuids: (INPUT)
// UUIDs of all processors that should have persistent mappings to this
// VA range.
// UUID of the CPU, UUID of the physical GPUs if the GPUs are not SMC
// capable or SMC enabled, or the GPU instance UUID of the partitions
// that should have persistent mappings to this VA range.
//
// accessedByCount: (INPUT)
// Number of elements in the accessedByUuids array.
@@ -1408,12 +1442,13 @@ NV_STATUS UvmAllocSemaphorePool(void *base,
// Length, in bytes, of the range.
//
// destinationUuid: (INPUT)
// UUID of the destination processor to migrate pages to.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, the GPU instance UUID of the partition, or the CPU UUID to
// migrate pages to.
//
// preferredCpuMemoryNode: (INPUT)
// Preferred CPU NUMA memory node used if the destination processor is
// the CPU. This argument is ignored if the given virtual address range
// corresponds to managed memory.
// the CPU.
//
// Error codes:
// NV_ERR_INVALID_ADDRESS:
@@ -1452,16 +1487,10 @@ NV_STATUS UvmAllocSemaphorePool(void *base,
// pages were associated with a non-migratable range group.
//
//------------------------------------------------------------------------------
#if UVM_API_REV_IS_AT_MOST(5)
NV_STATUS UvmMigrate(void *base,
NvLength length,
const NvProcessorUuid *destinationUuid);
#else
NV_STATUS UvmMigrate(void *base,
NvLength length,
const NvProcessorUuid *destinationUuid,
NvS32 preferredCpuMemoryNode);
#endif
//------------------------------------------------------------------------------
// UvmMigrateAsync
@@ -1493,7 +1522,9 @@ NV_STATUS UvmMigrate(void *base,
// Length, in bytes, of the range.
//
// destinationUuid: (INPUT)
// UUID of the destination processor to migrate pages to.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, the GPU instance UUID of the partition, or the CPU UUID to
// migrate pages to.
//
// preferredCpuMemoryNode: (INPUT)
// Preferred CPU NUMA memory node used if the destination processor is
@@ -1543,20 +1574,12 @@ NV_STATUS UvmMigrate(void *base,
// pages were associated with a non-migratable range group.
//
//------------------------------------------------------------------------------
#if UVM_API_REV_IS_AT_MOST(5)
NV_STATUS UvmMigrateAsync(void *base,
NvLength length,
const NvProcessorUuid *destinationUuid,
void *semaphoreAddress,
NvU32 semaphorePayload);
#else
NV_STATUS UvmMigrateAsync(void *base,
NvLength length,
const NvProcessorUuid *destinationUuid,
NvS32 preferredCpuMemoryNode,
void *semaphoreAddress,
NvU32 semaphorePayload);
#endif
//------------------------------------------------------------------------------
// UvmMigrateRangeGroup
@@ -1564,9 +1587,7 @@ NV_STATUS UvmMigrateAsync(void *base,
// Migrates the backing of all virtual address ranges associated with the given
// range group to the specified destination processor. The behavior of this API
// is equivalent to calling UvmMigrate on each VA range associated with this
// range group. The value for the preferredCpuMemoryNode is irrelevant in this
// case as it only applies to migrations of pageable address, which cannot be
// used to create range groups.
// range group.
//
// Any errors encountered during migration are returned immediately. No attempt
// is made to migrate the remaining unmigrated ranges and the ranges that are
@@ -1580,7 +1601,9 @@ NV_STATUS UvmMigrateAsync(void *base,
// Id of the range group whose associated VA ranges have to be migrated.
//
// destinationUuid: (INPUT)
// UUID of the destination processor to migrate pages to.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, the GPU instance UUID of the partition, or the CPU UUID to
// migrate pages to.
//
// Error codes:
// NV_ERR_OBJECT_NOT_FOUND:
@@ -1942,7 +1965,9 @@ NV_STATUS UvmMapExternalAllocation(void *base,
//
//
// gpuUuid: (INPUT)
// UUID of the GPU to map the sparse region on.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition to map the sparse
// region on.
//
// Errors:
// NV_ERR_INVALID_ADDRESS:
@@ -1999,7 +2024,9 @@ NV_STATUS UvmMapExternalSparse(void *base,
// The length of the virtual address range.
//
// gpuUuid: (INPUT)
// UUID of the GPU to unmap the VA range from.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition to unmap the VA
// range from.
//
// Errors:
// NV_ERR_INVALID_ADDRESS:
@@ -2066,7 +2093,9 @@ NV_STATUS UvmUnmapExternalAllocation(void *base,
// supported by the GPU.
//
// gpuUuid: (INPUT)
// UUID of the GPU to map the dynamic parallelism region on.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition to map the
// dynamic parallelism region on.
//
// Errors:
// NV_ERR_UVM_ADDRESS_IN_USE:
@@ -2297,15 +2326,14 @@ NV_STATUS UvmDisableReadDuplication(void *base,
// Length, in bytes, of the range.
//
// preferredLocationUuid: (INPUT)
// UUID of the preferred location.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, the GPU instance UUID of the partition, or the CPU UUID
// preferred location.
//
// preferredCpuNumaNode: (INPUT)
// preferredCpuMemoryNode: (INPUT)
// Preferred CPU NUMA memory node used if preferredLocationUuid is the
// UUID of the CPU. -1 is a special value which indicates all CPU nodes
// allowed by the global and thread memory policies. This argument is
// ignored if preferredLocationUuid refers to a GPU or the given virtual
// address range corresponds to managed memory. If NUMA is not enabled,
// only 0 or -1 is allowed.
// allowed by the global and thread memory policies.
//
// Errors:
// NV_ERR_INVALID_ADDRESS:
@@ -2335,10 +2363,11 @@ NV_STATUS UvmDisableReadDuplication(void *base,
//
// NV_ERR_INVALID_ARGUMENT:
// One of the following occured:
// - preferredLocationUuid is the UUID of a CPU and preferredCpuNumaNode
// refers to a registered GPU.
// - preferredCpuNumaNode is invalid and preferredLocationUuid is the
// UUID of the CPU.
// - preferredLocationUuid is the UUID of the CPU and
// preferredCpuMemoryNode is either:
// - not a valid NUMA node,
// - not a possible NUMA node, or
// - a NUMA node ID corresponding to a registered GPU.
//
// NV_ERR_NOT_SUPPORTED:
// The UVM file descriptor is associated with another process and the
@@ -2349,16 +2378,10 @@ NV_STATUS UvmDisableReadDuplication(void *base,
// because it is not very informative.
//
//------------------------------------------------------------------------------
#if UVM_API_REV_IS_AT_MOST(7)
NV_STATUS UvmSetPreferredLocation(void *base,
NvLength length,
const NvProcessorUuid *preferredLocationUuid);
#else
NV_STATUS UvmSetPreferredLocation(void *base,
NvLength length,
const NvProcessorUuid *preferredLocationUuid,
NvS32 preferredCpuNumaNode);
#endif
NvS32 preferredCpuMemoryNode);
//------------------------------------------------------------------------------
// UvmUnsetPreferredLocation
@@ -2481,8 +2504,9 @@ NV_STATUS UvmUnsetPreferredLocation(void *base,
// Length, in bytes, of the range.
//
// accessedByUuid: (INPUT)
// UUID of the processor that should have pages in the the VA range
// mapped when possible.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, the GPU instance UUID of the partition, or the CPU UUID
// that should have pages in the VA range mapped when possible.
//
// Errors:
// NV_ERR_INVALID_ADDRESS:
@@ -2550,8 +2574,10 @@ NV_STATUS UvmSetAccessedBy(void *base,
// Length, in bytes, of the range.
//
// accessedByUuid: (INPUT)
// UUID of the processor from which any policies set by
// UvmSetAccessedBy should be revoked for the given VA range.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, the GPU instance UUID of the partition, or the CPU UUID
// from which any policies set by UvmSetAccessedBy should be revoked
// for the given VA range.
//
// Errors:
// NV_ERR_INVALID_ADDRESS:
@@ -2609,7 +2635,9 @@ NV_STATUS UvmUnsetAccessedBy(void *base,
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the GPU to enable software-assisted system-wide atomics on.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition to enable
// software-assisted system-wide atomics on.
//
// Error codes:
// NV_ERR_NO_MEMORY:
@@ -2645,7 +2673,9 @@ NV_STATUS UvmEnableSystemWideAtomics(const NvProcessorUuid *gpuUuid);
//
// Arguments:
// gpuUuid: (INPUT)
// UUID of the GPU to disable software-assisted system-wide atomics on.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition to disable
// software-assisted system-wide atomics on.
//
// Error codes:
// NV_ERR_INVALID_DEVICE:
@@ -2874,7 +2904,9 @@ NV_STATUS UvmDebugCountersEnable(UvmDebugSession session,
// Name of the counter in that scope.
//
// gpu: (INPUT)
// Gpuid of the scoped GPU. This parameter is ignored in AllGpu scopes.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, or the GPU instance UUID of the partition of the scoped GPU.
// This parameter is ignored in AllGpu scopes.
//
// pCounterHandle: (OUTPUT)
// Handle to the counter address.
@@ -2928,7 +2960,7 @@ NV_STATUS UvmDebugGetCounterVal(UvmDebugSession session,
// UvmEventQueueCreate
//
// This call creates an event queue of the given size.
// No events are added in the queue till they are enabled by the user.
// No events are added in the queue until they are enabled by the user.
// Event queue data is visible to the user even after the target process dies
// if the session is active and queue is not freed.
//
@@ -2979,7 +3011,7 @@ NV_STATUS UvmEventQueueCreate(UvmDebugSession sessionHandle,
// UvmEventQueueDestroy
//
// This call frees all interal resources associated with the queue, including
// upinning of the memory associated with that queue. Freeing user buffer is
// unpinning of the memory associated with that queue. Freeing user buffer is
// responsibility of a caller. Event queue might be also destroyed as a side
// effect of destroying a session associated with this queue.
//
@@ -3163,9 +3195,9 @@ NV_STATUS UvmEventGetNotificationHandles(UvmEventQueueHandle *queueHandleArray,
// UvmEventGetGpuUuidTable
//
// Each migration event entry contains the gpu index to/from where data is
// migrated. This index maps to a corresponding gpu UUID in the gpuUuidTable.
// Using indices saves on the size of each event entry. This API provides the
// gpuIndex to gpuUuid relation to the user.
// migrated. This index maps to a corresponding physical gpu UUID in the
// gpuUuidTable. Using indices saves on the size of each event entry. This API
// provides the gpuIndex to gpuUuid relation to the user.
//
// This API does not access the queue state maintained in the user
// library and so the user doesn't need to acquire a lock to protect the
@@ -3173,9 +3205,9 @@ NV_STATUS UvmEventGetNotificationHandles(UvmEventQueueHandle *queueHandleArray,
//
// Arguments:
// gpuUuidTable: (OUTPUT)
// The return value is an array of UUIDs. The array index is the
// corresponding gpuIndex. There can be at max 32 gpus associated with
// UVM, so array size is 32.
// The return value is an array of physical GPU UUIDs. The array index
// is the corresponding gpuIndex. There can be at max 32 GPUs
// associated with UVM, so array size is 32.
//
// validCount: (OUTPUT)
// The system doesn't normally contain 32 GPUs. This field gives the
@@ -3234,7 +3266,7 @@ NV_STATUS UvmEventGetGpuUuidTable(NvProcessorUuid *gpuUuidTable,
//------------------------------------------------------------------------------
NV_STATUS UvmEventFetch(UvmDebugSession sessionHandle,
UvmEventQueueHandle queueHandle,
UvmEventEntry *pBuffer,
UvmEventEntry_V1 *pBuffer,
NvU64 *nEntries);
//------------------------------------------------------------------------------
@@ -3430,10 +3462,14 @@ NV_STATUS UvmToolsDestroySession(UvmToolsSessionHandle session);
// 4. Destroy event Queue using UvmToolsDestroyEventQueue
//
#if UVM_API_REV_IS_AT_MOST(10)
// This is deprecated and replaced by sizeof(UvmToolsEventControlData).
NvLength UvmToolsGetEventControlSize(void);
// This is deprecated and replaced by sizeof(UvmEventEntry_V1) or
// sizeof(UvmEventEntry_V2).
NvLength UvmToolsGetEventEntrySize(void);
#endif
NvLength UvmToolsGetNumberOfCounters(void);
@@ -3448,6 +3484,10 @@ NvLength UvmToolsGetNumberOfCounters(void);
// session: (INPUT)
// Handle to the tools session.
//
// version: (INPUT)
// Requested version for events or counters.
// See UvmEventEntry_V1 and UvmEventEntry_V2.
//
// event_buffer: (INPUT)
// User allocated buffer. Must be page-aligned. Must be large enough to
// hold at least event_buffer_size events. Gets pinned until queue is
@@ -3460,9 +3500,7 @@ NvLength UvmToolsGetNumberOfCounters(void);
// event_control (INPUT)
// User allocated buffer. Must be page-aligned. Must be large enough to
// hold UvmToolsEventControlData (although single page-size allocation
// should be more than enough). One could call
// UvmToolsGetEventControlSize() function to find out current size of
// UvmToolsEventControlData. Gets pinned until queue is destroyed.
// should be more than enough). Gets pinned until queue is destroyed.
//
// queue: (OUTPUT)
// Handle to the created queue.
@@ -3472,22 +3510,32 @@ NvLength UvmToolsGetNumberOfCounters(void);
// Session handle does not refer to a valid session
//
// NV_ERR_INVALID_ARGUMENT:
// The version is not UvmEventEntry_V1 or UvmEventEntry_V2.
// One of the parameters: event_buffer, event_buffer_size, event_control
// is not valid
//
// NV_ERR_INSUFFICIENT_RESOURCES:
// There could be multiple reasons for this error. One would be that it's
// not possible to allocate a queue of requested size. Another would be
// that either event_buffer or event_control memory couldn't be pinned
// (e.g. because of OS limitation of pinnable memory). Also it could not
// have been possible to create UvmToolsEventQueueDescriptor.
// There could be multiple reasons for this error. One would be that
// it's not possible to allocate a queue of requested size. Another
// would be either event_buffer or event_control memory couldn't be
// pinned (e.g. because of OS limitation of pinnable memory). Also it
// could not have been possible to create UvmToolsEventQueueDescriptor.
//
//------------------------------------------------------------------------------
#if UVM_API_REV_IS_AT_MOST(10)
NV_STATUS UvmToolsCreateEventQueue(UvmToolsSessionHandle session,
void *event_buffer,
NvLength event_buffer_size,
void *event_control,
UvmToolsEventQueueHandle *queue);
#else
NV_STATUS UvmToolsCreateEventQueue(UvmToolsSessionHandle session,
UvmToolsEventQueueVersion version,
void *event_buffer,
NvLength event_buffer_size,
void *event_control,
UvmToolsEventQueueHandle *queue);
#endif
UvmToolsEventQueueDescriptor UvmToolsGetEventQueueDescriptor(UvmToolsEventQueueHandle queue);
@@ -3524,7 +3572,7 @@ NV_STATUS UvmToolsSetNotificationThreshold(UvmToolsEventQueueHandle queue,
//------------------------------------------------------------------------------
// UvmToolsDestroyEventQueue
//
// Destroys all internal resources associated with the queue. It unpinns the
// Destroys all internal resources associated with the queue. It unpins the
// buffers provided in UvmToolsCreateEventQueue. Event Queue is also auto
// destroyed when corresponding session gets destroyed.
//
@@ -3546,7 +3594,7 @@ NV_STATUS UvmToolsDestroyEventQueue(UvmToolsEventQueueHandle queue);
// UvmEventQueueEnableEvents
//
// This call enables a particular event type in the event queue. All events are
// disabled by default. Any event type is considered listed if and only if it's
// disabled by default. Any event type is considered listed if and only if its
// corresponding value is equal to 1 (in other words, bit is set). Disabled
// events listed in eventTypeFlags are going to be enabled. Enabled events and
// events not listed in eventTypeFlags are not affected by this call.
@@ -3579,7 +3627,7 @@ NV_STATUS UvmToolsEventQueueEnableEvents(UvmToolsEventQueueHandle queue,
// UvmToolsEventQueueDisableEvents
//
// This call disables a particular event type in the event queue. Any event type
// is considered listed if and only if it's corresponding value is equal to 1
// is considered listed if and only if its corresponding value is equal to 1
// (in other words, bit is set). Enabled events listed in eventTypeFlags are
// going to be disabled. Disabled events and events not listed in eventTypeFlags
// are not affected by this call.
@@ -3617,7 +3665,7 @@ NV_STATUS UvmToolsEventQueueDisableEvents(UvmToolsEventQueueHandle queue,
//
// Counters position follows the layout of the memory that UVM driver decides to
// use. To obtain particular counter value, user should perform consecutive
// atomic reads at a a given buffer + offset address.
// atomic reads at a given buffer + offset address.
//
// It is not defined what is the initial value of a counter. User should rely on
// a difference between each snapshot.
@@ -3640,9 +3688,9 @@ NV_STATUS UvmToolsEventQueueDisableEvents(UvmToolsEventQueueHandle queue,
// Provided session is not valid
//
// NV_ERR_INSUFFICIENT_RESOURCES
// There could be multiple reasons for this error. One would be that it's
// not possible to allocate counters structure. Another would be that
// either event_buffer or event_control memory couldn't be pinned
// There could be multiple reasons for this error. One would be that
// it's not possible to allocate counters structure. Another would be
// that either event_buffer or event_control memory couldn't be pinned
// (e.g. because of OS limitation of pinnable memory)
//
//------------------------------------------------------------------------------
@@ -3653,12 +3701,12 @@ NV_STATUS UvmToolsCreateProcessAggregateCounters(UvmToolsSessionHandle session
//------------------------------------------------------------------------------
// UvmToolsCreateProcessorCounters
//
// Creates the counters structure for tracking per-process counters.
// Creates the counters structure for tracking per-processor counters.
// These counters are disabled by default.
//
// Counters position follows the layout of the memory that UVM driver decides to
// use. To obtain particular counter value, user should perform consecutive
// atomic reads at a a given buffer + offset address.
// atomic reads at a given buffer + offset address.
//
// It is not defined what is the initial value of a counter. User should rely on
// a difference between each snapshot.
@@ -3674,7 +3722,9 @@ NV_STATUS UvmToolsCreateProcessAggregateCounters(UvmToolsSessionHandle session
// counters are destroyed.
//
// processorUuid: (INPUT)
// UUID of the resource, for which counters will provide statistic data.
// UUID of the physical GPU if the GPU is not SMC capable or SMC
// enabled, the GPU instance UUID of the partition, or the CPU UUID of
// the resource, for which counters will provide statistic data.
//
// counters: (OUTPUT)
// Handle to the created counters.
@@ -3684,9 +3734,9 @@ NV_STATUS UvmToolsCreateProcessAggregateCounters(UvmToolsSessionHandle session
// session handle does not refer to a valid tools session
//
// NV_ERR_INSUFFICIENT_RESOURCES
// There could be multiple reasons for this error. One would be that it's
// not possible to allocate counters structure. Another would be that
// either event_buffer or event_control memory couldn't be pinned
// There could be multiple reasons for this error. One would be that
// it's not possible to allocate counters structure. Another would be
// that either event_buffer or event_control memory couldn't be pinned
// (e.g. because of OS limitation of pinnable memory)
//
// NV_ERR_INVALID_ARGUMENT
@@ -3702,7 +3752,7 @@ NV_STATUS UvmToolsCreateProcessorCounters(UvmToolsSessionHandle session,
// UvmToolsDestroyCounters
//
// Destroys all internal resources associated with this counters structure.
// It unpinns the buffer provided in UvmToolsCreate*Counters. Counters structure
// It unpins the buffer provided in UvmToolsCreate*Counters. Counters structure
// also gest destroyed when corresponding session is destroyed.
//
// Arguments:
@@ -3723,7 +3773,7 @@ NV_STATUS UvmToolsDestroyCounters(UvmToolsCountersHandle counters);
// UvmToolsEnableCounters
//
// This call enables certain counter types in the counters structure. Any
// counter type is considered listed if and only if it's corresponding value is
// counter type is considered listed if and only if its corresponding value is
// equal to 1 (in other words, bit is set). Disabled counter types listed in
// counterTypeFlags are going to be enabled. Already enabled counter types and
// counter types not listed in counterTypeFlags are not affected by this call.
@@ -3757,7 +3807,7 @@ NV_STATUS UvmToolsEnableCounters(UvmToolsCountersHandle counters,
// UvmToolsDisableCounters
//
// This call disables certain counter types in the counters structure. Any
// counter type is considered listed if and only if it's corresponding value is
// counter type is considered listed if and only if its corresponding value is
// equal to 1 (in other words, bit is set). Enabled counter types listed in
// counterTypeFlags are going to be disabled. Already disabled counter types and
// counter types not listed in counterTypeFlags are not affected by this call.
@@ -3902,32 +3952,72 @@ NV_STATUS UvmToolsWriteProcessMemory(UvmToolsSessionHandle session,
// UvmToolsGetProcessorUuidTable
//
// Populate a table with the UUIDs of all the currently registered processors
// in the target process. When a GPU is registered, it is added to the table.
// When a GPU is unregistered, it is removed. As long as a GPU remains registered,
// its index in the table does not change. New registrations obtain the first
// unused index.
// in the target process. When a GPU is registered, it is added to the table.
// When a GPU is unregistered, it is removed. As long as a GPU remains
// registered, its index in the table does not change.
// Note that the index in the table corresponds to the processor ID reported
// in UvmEventEntry event records and that the table is not contiguously packed
// with non-zero UUIDs even with no GPU unregistrations.
//
// Arguments:
// session: (INPUT)
// Handle to the tools session.
//
// version: (INPUT)
// Requested version for the UUID table returned. The version must
// match the requested version of the event queue created with
// UvmToolsCreateEventQueue().
// See UvmEventEntry_V1 and UvmEventEntry_V2.
//
// table: (OUTPUT)
// Array of processor UUIDs, including the CPU's UUID which is always
// at index zero. The srcIndex and dstIndex fields of the
// UvmEventMigrationInfo struct index this array. Unused indices will
// have a UUID of zero.
// have a UUID of zero. Version UvmEventEntry_V1 only uses GPU UUIDs
// for the UUID of the physical GPU and only supports a single SMC
// partition registered per process. Version UvmEventEntry_V2 supports
// multiple SMC partitions registered per process and uses physical GPU
// UUIDs if the GPU is not SMC capable or SMC enabled and GPU instance
// UUIDs for SMC partitions.
// The table pointer can be NULL in which case, the size of the table
// needed to hold all the UUIDs is returned in 'count'.
//
// table_size: (INPUT)
// The size of the table in number of array elements. This can be
// zero if the table pointer is NULL.
//
// count: (OUTPUT)
// Set by UVM to the number of UUIDs written, including any gaps in
// the table due to unregistered GPUs.
// On output, it is set by UVM to the number of UUIDs needed to hold
// all the UUIDs, including any gaps in the table due to unregistered
// GPUs.
//
// Error codes:
// NV_ERR_INVALID_ADDRESS:
// writing to table failed.
// writing to table failed or the count pointer was invalid.
//
// NV_ERR_INVALID_ARGUMENT:
// The version is not UvmEventEntry_V1 or UvmEventEntry_V2.
// The count pointer is NULL.
// See UvmToolsEventQueueVersion.
//
// NV_WARN_MISMATCHED_TARGET:
// The kernel returned a table suitable for UvmEventEntry_V1 events.
// (i.e., the kernel is older and doesn't support UvmEventEntry_V2).
//
// NV_ERR_NO_MEMORY:
// Internal memory allocation failed.
//------------------------------------------------------------------------------
#if UVM_API_REV_IS_AT_MOST(10)
NV_STATUS UvmToolsGetProcessorUuidTable(UvmToolsSessionHandle session,
NvProcessorUuid *table,
NvLength *count);
#else
NV_STATUS UvmToolsGetProcessorUuidTable(UvmToolsSessionHandle session,
UvmToolsEventQueueVersion version,
NvProcessorUuid *table,
NvLength table_size,
NvLength *count);
#endif
//------------------------------------------------------------------------------
// UvmToolsFlushEvents

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2021 NVIDIA Corporation
Copyright (c) 2021-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -79,6 +79,8 @@ void uvm_hal_ada_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->access_counters_supported = true;
parent_gpu->access_counters_can_use_physical_addresses = false;
parent_gpu->fault_cancel_va_supported = true;
parent_gpu->scoped_atomics_supported = true;
@@ -94,4 +96,6 @@ void uvm_hal_ada_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->map_remap_larger_page_promotion = false;
parent_gpu->plc_supported = true;
parent_gpu->no_ats_range_required = false;
}

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2018-20221 NVIDIA Corporation
Copyright (c) 2018-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -38,10 +38,12 @@ void uvm_hal_ampere_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->utlb_per_gpc_count = uvm_ampere_get_utlbs_per_gpc(parent_gpu);
parent_gpu->fault_buffer_info.replayable.utlb_count = parent_gpu->rm_info.maxGpcCount * parent_gpu->utlb_per_gpc_count;
parent_gpu->fault_buffer_info.replayable.utlb_count = parent_gpu->rm_info.maxGpcCount *
parent_gpu->utlb_per_gpc_count;
{
uvm_fault_buffer_entry_t *dummy;
UVM_ASSERT(parent_gpu->fault_buffer_info.replayable.utlb_count <= (1 << (sizeof(dummy->fault_source.utlb_id) * 8)));
UVM_ASSERT(parent_gpu->fault_buffer_info.replayable.utlb_count <= (1 <<
(sizeof(dummy->fault_source.utlb_id) * 8)));
}
// A single top level PDE on Ampere covers 128 TB and that's the minimum
@@ -53,7 +55,7 @@ void uvm_hal_ampere_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->uvm_mem_va_size = UVM_MEM_VA_SIZE;
// See uvm_mmu.h for mapping placement
parent_gpu->flat_vidmem_va_base = 136 * UVM_SIZE_1TB;
parent_gpu->flat_vidmem_va_base = 160 * UVM_SIZE_1TB;
parent_gpu->flat_sysmem_va_base = 256 * UVM_SIZE_1TB;
parent_gpu->ce_phys_vidmem_write_supported = true;
@@ -81,6 +83,8 @@ void uvm_hal_ampere_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->access_counters_supported = true;
parent_gpu->access_counters_can_use_physical_addresses = false;
parent_gpu->fault_cancel_va_supported = true;
parent_gpu->scoped_atomics_supported = true;
@@ -101,4 +105,6 @@ void uvm_hal_ampere_arch_init_properties(uvm_parent_gpu_t *parent_gpu)
parent_gpu->map_remap_larger_page_promotion = false;
parent_gpu->plc_supported = true;
parent_gpu->no_ats_range_required = false;
}

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2018-2022 NVIDIA Corporation
Copyright (c) 2018-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -117,10 +117,12 @@ bool uvm_hal_ampere_ce_memcopy_is_valid_c6b5(uvm_push_t *push, uvm_gpu_address_t
NvU64 push_begin_gpu_va;
uvm_gpu_t *gpu = uvm_push_get_gpu(push);
if (!uvm_gpu_is_virt_mode_sriov_heavy(gpu))
if (!uvm_parent_gpu_is_virt_mode_sriov_heavy(gpu->parent))
return true;
if (uvm_channel_is_proxy(push->channel)) {
uvm_pushbuffer_t *pushbuffer;
if (dst.is_virtual) {
UVM_ERR_PRINT("Destination address of memcopy must be physical, not virtual\n");
return false;
@@ -142,7 +144,8 @@ bool uvm_hal_ampere_ce_memcopy_is_valid_c6b5(uvm_push_t *push, uvm_gpu_address_t
return false;
}
push_begin_gpu_va = uvm_pushbuffer_get_gpu_va_for_push(push->channel->pool->manager->pushbuffer, push);
pushbuffer = uvm_channel_get_pushbuffer(push->channel);
push_begin_gpu_va = uvm_pushbuffer_get_gpu_va_for_push(pushbuffer, push);
if ((src.address < push_begin_gpu_va) || (src.address >= push_begin_gpu_va + uvm_push_get_size(push))) {
UVM_ERR_PRINT("Source address of memcopy must point to pushbuffer\n");
@@ -177,10 +180,13 @@ bool uvm_hal_ampere_ce_memcopy_is_valid_c6b5(uvm_push_t *push, uvm_gpu_address_t
// irrespective of the virtualization mode.
void uvm_hal_ampere_ce_memcopy_patch_src_c6b5(uvm_push_t *push, uvm_gpu_address_t *src)
{
uvm_pushbuffer_t *pushbuffer;
if (!uvm_channel_is_proxy(push->channel))
return;
src->address -= uvm_pushbuffer_get_gpu_va_for_push(push->channel->pool->manager->pushbuffer, push);
pushbuffer = uvm_channel_get_pushbuffer(push->channel);
src->address -= uvm_pushbuffer_get_gpu_va_for_push(pushbuffer, push);
}
bool uvm_hal_ampere_ce_memset_is_valid_c6b5(uvm_push_t *push,
@@ -190,7 +196,7 @@ bool uvm_hal_ampere_ce_memset_is_valid_c6b5(uvm_push_t *push,
{
uvm_gpu_t *gpu = uvm_push_get_gpu(push);
if (!uvm_gpu_is_virt_mode_sriov_heavy(gpu))
if (!uvm_parent_gpu_is_virt_mode_sriov_heavy(gpu->parent))
return true;
if (uvm_channel_is_proxy(push->channel)) {

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2018-2022 NVIDIA Corporation
Copyright (c) 2018-2024 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -33,7 +33,7 @@ bool uvm_hal_ampere_host_method_is_valid(uvm_push_t *push, NvU32 method_address,
{
uvm_gpu_t *gpu = uvm_push_get_gpu(push);
if (!uvm_gpu_is_virt_mode_sriov_heavy(gpu))
if (!uvm_parent_gpu_is_virt_mode_sriov_heavy(gpu->parent))
return true;
if (uvm_channel_is_privileged(push->channel)) {
@@ -205,17 +205,18 @@ void uvm_hal_ampere_host_clear_faulted_channel_sw_method(uvm_push_t *push,
CLEAR_FAULTED_B, HWVALUE(C076, CLEAR_FAULTED_B, INST_HI, instance_ptr_hi));
}
// Copy from Pascal, this version sets TLB_INVALIDATE_INVAL_SCOPE.
// Copy from Turing, this version sets TLB_INVALIDATE_INVAL_SCOPE.
void uvm_hal_ampere_host_tlb_invalidate_all(uvm_push_t *push,
uvm_gpu_phys_address_t pdb,
NvU32 depth,
uvm_membar_t membar)
uvm_gpu_phys_address_t pdb,
NvU32 depth,
uvm_membar_t membar)
{
NvU32 aperture_value;
NvU32 page_table_level;
NvU32 pdb_lo;
NvU32 pdb_hi;
NvU32 ack_value = 0;
NvU32 sysmembar_value = 0;
UVM_ASSERT_MSG(pdb.aperture == UVM_APERTURE_VID || pdb.aperture == UVM_APERTURE_SYS, "aperture: %u", pdb.aperture);
@@ -230,8 +231,8 @@ void uvm_hal_ampere_host_tlb_invalidate_all(uvm_push_t *push,
pdb_lo = pdb.address & HWMASK(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
pdb_hi = pdb.address >> HWSIZE(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
// PDE3 is the highest level on Pascal, see the comment in uvm_pascal_mmu.c
// for details.
// PDE3 is the highest level on Pascal-Ampere, see the comment in
// uvm_pascal_mmu.c for details.
UVM_ASSERT_MSG(depth < NVC56F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3, "depth %u", depth);
page_table_level = NVC56F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3 - depth;
@@ -242,7 +243,12 @@ void uvm_hal_ampere_host_tlb_invalidate_all(uvm_push_t *push,
ack_value = HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
}
NV_PUSH_4U(C56F, MEM_OP_A, HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS) |
if (membar == UVM_MEMBAR_SYS)
sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
else
sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
NV_PUSH_4U(C56F, MEM_OP_A, sysmembar_value |
HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
MEM_OP_B, 0,
MEM_OP_C, HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_PDB, ONE) |
@@ -255,16 +261,18 @@ void uvm_hal_ampere_host_tlb_invalidate_all(uvm_push_t *push,
MEM_OP_D, HWCONST(C56F, MEM_OP_D, OPERATION, MMU_TLB_INVALIDATE) |
HWVALUE(C56F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
uvm_hal_tlb_invalidate_membar(push, membar);
// GPU membar still requires an explicit membar method.
if (membar == UVM_MEMBAR_GPU)
uvm_push_get_gpu(push)->parent->host_hal->membar_gpu(push);
}
// Copy from Volta, this version sets TLB_INVALIDATE_INVAL_SCOPE.
// Copy from Turing, this version sets TLB_INVALIDATE_INVAL_SCOPE.
void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
uvm_gpu_phys_address_t pdb,
NvU32 depth,
NvU64 base,
NvU64 size,
NvU32 page_size,
NvU64 page_size,
uvm_membar_t membar)
{
NvU32 aperture_value;
@@ -272,6 +280,7 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
NvU32 pdb_lo;
NvU32 pdb_hi;
NvU32 ack_value = 0;
NvU32 sysmembar_value = 0;
NvU32 va_lo;
NvU32 va_hi;
NvU64 end;
@@ -281,9 +290,9 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
NvU32 log2_invalidation_size;
uvm_gpu_t *gpu = uvm_push_get_gpu(push);
UVM_ASSERT_MSG(IS_ALIGNED(page_size, 1 << 12), "page_size 0x%x\n", page_size);
UVM_ASSERT_MSG(IS_ALIGNED(base, page_size), "base 0x%llx page_size 0x%x\n", base, page_size);
UVM_ASSERT_MSG(IS_ALIGNED(size, page_size), "size 0x%llx page_size 0x%x\n", size, page_size);
UVM_ASSERT_MSG(IS_ALIGNED(page_size, 1 << 12), "page_size 0x%llx\n", page_size);
UVM_ASSERT_MSG(IS_ALIGNED(base, page_size), "base 0x%llx page_size 0x%llx\n", base, page_size);
UVM_ASSERT_MSG(IS_ALIGNED(size, page_size), "size 0x%llx page_size 0x%llx\n", size, page_size);
UVM_ASSERT_MSG(size > 0, "size 0x%llx\n", size);
// The invalidation size must be a power-of-two number of pages containing
@@ -325,7 +334,7 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
pdb_lo = pdb.address & HWMASK(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
pdb_hi = pdb.address >> HWSIZE(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
// PDE3 is the highest level on Pascal-Ampere , see the comment in
// PDE3 is the highest level on Pascal-Ampere, see the comment in
// uvm_pascal_mmu.c for details.
UVM_ASSERT_MSG(depth < NVC56F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3, "depth %u", depth);
page_table_level = NVC56F_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3 - depth;
@@ -337,10 +346,15 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
ack_value = HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
}
if (membar == UVM_MEMBAR_SYS)
sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
else
sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
NV_PUSH_4U(C56F, MEM_OP_A, HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_INVALIDATION_SIZE, log2_invalidation_size) |
HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS) |
HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo) |
HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS) |
sysmembar_value |
HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo),
MEM_OP_B, HWVALUE(C56F, MEM_OP_B, TLB_INVALIDATE_TARGET_ADDR_HI, va_hi),
MEM_OP_C, HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_PDB, ONE) |
HWVALUE(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO, pdb_lo) |
@@ -352,21 +366,23 @@ void uvm_hal_ampere_host_tlb_invalidate_va(uvm_push_t *push,
MEM_OP_D, HWCONST(C56F, MEM_OP_D, OPERATION, MMU_TLB_INVALIDATE_TARGETED) |
HWVALUE(C56F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
uvm_hal_tlb_invalidate_membar(push, membar);
// GPU membar still requires an explicit membar method.
if (membar == UVM_MEMBAR_GPU)
gpu->parent->host_hal->membar_gpu(push);
}
// Copy from Pascal, this version sets TLB_INVALIDATE_INVAL_SCOPE.
// Copy from Turing, this version sets TLB_INVALIDATE_INVAL_SCOPE.
void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
uvm_gpu_phys_address_t pdb,
UVM_TEST_INVALIDATE_TLB_PARAMS *params)
{
NvU32 ack_value = 0;
NvU32 sysmembar_value = 0;
NvU32 invalidate_gpc_value = 0;
NvU32 aperture_value = 0;
NvU32 pdb_lo = 0;
NvU32 pdb_hi = 0;
NvU32 page_table_level = 0;
uvm_membar_t membar;
UVM_ASSERT_MSG(pdb.aperture == UVM_APERTURE_VID || pdb.aperture == UVM_APERTURE_SYS, "aperture: %u", pdb.aperture);
if (pdb.aperture == UVM_APERTURE_VID)
@@ -381,7 +397,7 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
pdb_hi = pdb.address >> HWSIZE(C56F, MEM_OP_C, TLB_INVALIDATE_PDB_ADDR_LO);
if (params->page_table_level != UvmInvalidatePageTableLevelAll) {
// PDE3 is the highest level on Pascal, see the comment in
// PDE3 is the highest level on Pascal-Ampere, see the comment in
// uvm_pascal_mmu.c for details.
page_table_level = min((NvU32)UvmInvalidatePageTableLevelPde3, params->page_table_level) - 1;
}
@@ -393,6 +409,11 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
ack_value = HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_ACK_TYPE, GLOBALLY);
}
if (params->membar == UvmInvalidateTlbMemBarSys)
sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, EN);
else
sysmembar_value = HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS);
if (params->disable_gpc_invalidate)
invalidate_gpc_value = HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_GPC, DISABLE);
else
@@ -403,9 +424,9 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
NvU32 va_lo = va & HWMASK(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO);
NvU32 va_hi = va >> HWSIZE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO);
NV_PUSH_4U(C56F, MEM_OP_A, HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS) |
HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo) |
HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
NV_PUSH_4U(C56F, MEM_OP_A, sysmembar_value |
HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS) |
HWVALUE(C56F, MEM_OP_A, TLB_INVALIDATE_TARGET_ADDR_LO, va_lo),
MEM_OP_B, HWVALUE(C56F, MEM_OP_B, TLB_INVALIDATE_TARGET_ADDR_HI, va_hi),
MEM_OP_C, HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_REPLAY, NONE) |
HWVALUE(C56F, MEM_OP_C, TLB_INVALIDATE_PAGE_TABLE_LEVEL, page_table_level) |
@@ -418,7 +439,7 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
HWVALUE(C56F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
}
else {
NV_PUSH_4U(C56F, MEM_OP_A, HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_SYSMEMBAR, DIS) |
NV_PUSH_4U(C56F, MEM_OP_A, sysmembar_value |
HWCONST(C56F, MEM_OP_A, TLB_INVALIDATE_INVAL_SCOPE, NON_LINK_TLBS),
MEM_OP_B, 0,
MEM_OP_C, HWCONST(C56F, MEM_OP_C, TLB_INVALIDATE_REPLAY, NONE) |
@@ -432,12 +453,7 @@ void uvm_hal_ampere_host_tlb_invalidate_test(uvm_push_t *push,
HWVALUE(C56F, MEM_OP_D, TLB_INVALIDATE_PDB_ADDR_HI, pdb_hi));
}
if (params->membar == UvmInvalidateTlbMemBarSys)
membar = UVM_MEMBAR_SYS;
else if (params->membar == UvmInvalidateTlbMemBarLocal)
membar = UVM_MEMBAR_GPU;
else
membar = UVM_MEMBAR_NONE;
uvm_hal_tlb_invalidate_membar(push, membar);
// GPU membar still requires an explicit membar method.
if (params->membar == UvmInvalidateTlbMemBarLocal)
uvm_push_get_gpu(push)->parent->host_hal->membar_gpu(push);
}

View File

@@ -51,7 +51,7 @@ uvm_mmu_engine_type_t uvm_hal_ampere_mmu_engine_id_to_type(NvU16 mmu_engine_id)
return UVM_MMU_ENGINE_TYPE_GRAPHICS;
}
static NvU32 page_table_depth_ampere(NvU32 page_size)
static NvU32 page_table_depth_ampere(NvU64 page_size)
{
// The common-case is page_size == UVM_PAGE_SIZE_2M, hence the first check
if (page_size == UVM_PAGE_SIZE_2M)
@@ -62,14 +62,14 @@ static NvU32 page_table_depth_ampere(NvU32 page_size)
return 4;
}
static NvU32 page_sizes_ampere(void)
static NvU64 page_sizes_ampere(void)
{
return UVM_PAGE_SIZE_512M | UVM_PAGE_SIZE_2M | UVM_PAGE_SIZE_64K | UVM_PAGE_SIZE_4K;
}
static uvm_mmu_mode_hal_t ampere_mmu_mode_hal;
uvm_mmu_mode_hal_t *uvm_hal_mmu_mode_ampere(NvU32 big_page_size)
uvm_mmu_mode_hal_t *uvm_hal_mmu_mode_ampere(NvU64 big_page_size)
{
static bool initialized = false;

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2018-2021 NVIDIA Corporation
Copyright (c) 2018-2024 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -44,6 +44,8 @@ void uvm_ats_init(const UvmPlatformInfo *platform_info)
void uvm_ats_init_va_space(uvm_va_space_t *va_space)
{
uvm_init_rwsem(&va_space->ats.lock, UVM_LOCK_ORDER_LEAF);
if (UVM_ATS_IBM_SUPPORTED())
uvm_ats_ibm_init_va_space(va_space);
}

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2018-2021 NVIDIA Corporation
Copyright (c) 2018-2024 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -28,10 +28,10 @@
#include "uvm_forward_decl.h"
#include "uvm_ats_ibm.h"
#include "nv_uvm_types.h"
#include "uvm_lock.h"
#include "uvm_ats_sva.h"
#include "uvm_ats_sva.h"
#define UVM_ATS_SUPPORTED() (UVM_ATS_IBM_SUPPORTED() || UVM_ATS_SVA_SUPPORTED())
#define UVM_ATS_SUPPORTED() (UVM_ATS_IBM_SUPPORTED() || UVM_ATS_SVA_SUPPORTED())
typedef struct
{
@@ -39,6 +39,10 @@ typedef struct
// indexed by gpu->id. This mask is protected by the VA space lock.
uvm_processor_mask_t registered_gpu_va_spaces;
// Protects racing invalidates in the VA space while hmm_range_fault() is
// being called in ats_compute_residency_mask().
uvm_rw_semaphore_t lock;
union
{
uvm_ibm_va_space_t ibm;

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2018 NVIDIA Corporation
Copyright (c) 2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -20,73 +20,46 @@
DEALINGS IN THE SOFTWARE.
*******************************************************************************/
#include "uvm_api.h"
#include "uvm_tools.h"
#include "uvm_va_range.h"
#include "uvm_ats.h"
#include "uvm_ats_faults.h"
#include "uvm_migrate_pageable.h"
#include <linux/nodemask.h>
#include <linux/mempolicy.h>
#include <linux/mmu_notifier.h>
// TODO: Bug 2103669: Implement a real prefetching policy and remove or adapt
// these experimental parameters. These are intended to help guide that policy.
static unsigned int uvm_exp_perf_prefetch_ats_order_replayable = 0;
module_param(uvm_exp_perf_prefetch_ats_order_replayable, uint, 0644);
MODULE_PARM_DESC(uvm_exp_perf_prefetch_ats_order_replayable,
"Max order of pages (2^N) to prefetch on replayable ATS faults");
#if UVM_HMM_RANGE_FAULT_SUPPORTED()
#include <linux/hmm.h>
#endif
static unsigned int uvm_exp_perf_prefetch_ats_order_non_replayable = 0;
module_param(uvm_exp_perf_prefetch_ats_order_non_replayable, uint, 0644);
MODULE_PARM_DESC(uvm_exp_perf_prefetch_ats_order_non_replayable,
"Max order of pages (2^N) to prefetch on non-replayable ATS faults");
// Expand the fault region to the naturally-aligned region with order given by
// the module parameters, clamped to the vma containing fault_addr (if any).
// Note that this means the region contains fault_addr but may not begin at
// fault_addr.
static void expand_fault_region(struct vm_area_struct *vma,
NvU64 start,
size_t length,
uvm_fault_client_type_t client_type,
unsigned long *migrate_start,
unsigned long *migrate_length)
typedef enum
{
unsigned int order;
unsigned long outer, aligned_start, aligned_size;
UVM_ATS_SERVICE_TYPE_FAULTS = 0,
UVM_ATS_SERVICE_TYPE_ACCESS_COUNTERS,
UVM_ATS_SERVICE_TYPE_COUNT
} uvm_ats_service_type_t;
*migrate_start = start;
*migrate_length = length;
if (client_type == UVM_FAULT_CLIENT_TYPE_HUB)
order = uvm_exp_perf_prefetch_ats_order_non_replayable;
else
order = uvm_exp_perf_prefetch_ats_order_replayable;
if (order == 0)
return;
UVM_ASSERT(vma);
UVM_ASSERT(order < BITS_PER_LONG - PAGE_SHIFT);
aligned_size = (1UL << order) * PAGE_SIZE;
aligned_start = start & ~(aligned_size - 1);
*migrate_start = max(vma->vm_start, aligned_start);
outer = min(vma->vm_end, aligned_start + aligned_size);
*migrate_length = outer - *migrate_start;
}
static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 start,
size_t length,
uvm_fault_access_type_t access_type,
uvm_fault_client_type_t client_type)
static NV_STATUS service_ats_requests(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 start,
size_t length,
uvm_fault_access_type_t access_type,
uvm_ats_service_type_t service_type,
uvm_ats_fault_context_t *ats_context)
{
uvm_va_space_t *va_space = gpu_va_space->va_space;
struct mm_struct *mm = va_space->va_space_mm.mm;
bool write = (access_type >= UVM_FAULT_ACCESS_TYPE_WRITE);
NV_STATUS status;
NvU64 user_space_start;
NvU64 user_space_length;
bool write = (access_type >= UVM_FAULT_ACCESS_TYPE_WRITE);
bool fault_service_type = (service_type == UVM_ATS_SERVICE_TYPE_FAULTS);
uvm_populate_permissions_t populate_permissions = fault_service_type ?
(write ? UVM_POPULATE_PERMISSIONS_WRITE : UVM_POPULATE_PERMISSIONS_ANY) :
UVM_POPULATE_PERMISSIONS_INHERIT;
// Request uvm_migrate_pageable() to touch the corresponding page after
// population.
@@ -95,17 +68,18 @@ static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
// 2) guest physical -> host physical
//
// The overall ATS translation will fault if either of those translations is
// invalid. The get_user_pages() call above handles translation #1, but not
// #2. We don't know if we're running as a guest, but in case we are we can
// force that translation to be valid by touching the guest physical address
// from the CPU. If the translation is not valid then the access will cause
// a hypervisor fault. Note that dma_map_page() can't establish mappings
// used by GPU ATS SVA translations. GPU accesses to host physical addresses
// obtained as a result of the address translation request uses the CPU
// address space instead of the IOMMU address space since the translated
// host physical address isn't necessarily an IOMMU address. The only way to
// establish guest physical to host physical mapping in the CPU address
// space is to touch the page from the CPU.
// invalid. The pin_user_pages() call within uvm_migrate_pageable() call
// below handles translation #1, but not #2. We don't know if we're running
// as a guest, but in case we are we can force that translation to be valid
// by touching the guest physical address from the CPU. If the translation
// is not valid then the access will cause a hypervisor fault. Note that
// dma_map_page() can't establish mappings used by GPU ATS SVA translations.
// GPU accesses to host physical addresses obtained as a result of the
// address translation request uses the CPU address space instead of the
// IOMMU address space since the translated host physical address isn't
// necessarily an IOMMU address. The only way to establish guest physical to
// host physical mapping in the CPU address space is to touch the page from
// the CPU.
//
// We assume that the hypervisor mappings are all VM_PFNMAP, VM_SHARED, and
// VM_WRITE, meaning that the mappings are all granted write access on any
@@ -116,21 +90,22 @@ static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
uvm_migrate_args_t uvm_migrate_args =
{
.va_space = va_space,
.mm = mm,
.dst_id = gpu_va_space->gpu->parent->id,
.dst_node_id = -1,
.populate_permissions = write ? UVM_POPULATE_PERMISSIONS_WRITE : UVM_POPULATE_PERMISSIONS_ANY,
.touch = true,
.skip_mapped = true,
.user_space_start = &user_space_start,
.user_space_length = &user_space_length,
.va_space = va_space,
.mm = mm,
.dst_id = ats_context->residency_id,
.dst_node_id = ats_context->residency_node,
.start = start,
.length = length,
.populate_permissions = populate_permissions,
.touch = fault_service_type,
.skip_mapped = fault_service_type,
.populate_on_cpu_alloc_failures = fault_service_type,
.user_space_start = &user_space_start,
.user_space_length = &user_space_length,
};
UVM_ASSERT(uvm_ats_can_service_faults(gpu_va_space, mm));
expand_fault_region(vma, start, length, client_type, &uvm_migrate_args.start, &uvm_migrate_args.length);
// We are trying to use migrate_vma API in the kernel (if it exists) to
// populate and map the faulting region on the GPU. We want to do this only
// on the first touch. That is, pages which are not already mapped. So, we
@@ -145,10 +120,10 @@ static NV_STATUS service_ats_faults(uvm_gpu_va_space_t *gpu_va_space,
return status;
}
static void flush_tlb_write_faults(uvm_gpu_va_space_t *gpu_va_space,
NvU64 addr,
size_t size,
uvm_fault_client_type_t client_type)
static void flush_tlb_va_region(uvm_gpu_va_space_t *gpu_va_space,
NvU64 addr,
size_t size,
uvm_fault_client_type_t client_type)
{
uvm_ats_fault_invalidate_t *ats_invalidate;
@@ -157,12 +132,323 @@ static void flush_tlb_write_faults(uvm_gpu_va_space_t *gpu_va_space,
else
ats_invalidate = &gpu_va_space->gpu->parent->fault_buffer_info.non_replayable.ats_invalidate;
if (!ats_invalidate->write_faults_in_batch) {
uvm_tlb_batch_begin(&gpu_va_space->page_tables, &ats_invalidate->write_faults_tlb_batch);
ats_invalidate->write_faults_in_batch = true;
if (!ats_invalidate->tlb_batch_pending) {
uvm_tlb_batch_begin(&gpu_va_space->page_tables, &ats_invalidate->tlb_batch);
ats_invalidate->tlb_batch_pending = true;
}
uvm_tlb_batch_invalidate(&ats_invalidate->write_faults_tlb_batch, addr, size, PAGE_SIZE, UVM_MEMBAR_NONE);
uvm_tlb_batch_invalidate(&ats_invalidate->tlb_batch, addr, size, PAGE_SIZE, UVM_MEMBAR_NONE);
}
static void ats_batch_select_residency(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
uvm_ats_fault_context_t *ats_context)
{
uvm_gpu_t *gpu = gpu_va_space->gpu;
int residency = uvm_gpu_numa_node(gpu);
#if defined(NV_MEMPOLICY_HAS_UNIFIED_NODES)
struct mempolicy *vma_policy = vma_policy(vma);
unsigned short mode;
ats_context->prefetch_state.has_preferred_location = false;
// It's safe to read vma_policy since the mmap_lock is held in at least read
// mode in this path.
uvm_assert_mmap_lock_locked(vma->vm_mm);
if (!vma_policy)
goto done;
mode = vma_policy->mode;
if ((mode == MPOL_BIND)
#if defined(NV_MPOL_PREFERRED_MANY_PRESENT)
|| (mode == MPOL_PREFERRED_MANY)
#endif
|| (mode == MPOL_PREFERRED)) {
int home_node = NUMA_NO_NODE;
#if defined(NV_MEMPOLICY_HAS_HOME_NODE)
if ((mode != MPOL_PREFERRED) && (vma_policy->home_node != NUMA_NO_NODE))
home_node = vma_policy->home_node;
#endif
// Prefer home_node if set. Otherwise, prefer the faulting GPU if it's
// in the list of preferred nodes, else prefer the closest_cpu_numa_node
// to the GPU if closest_cpu_numa_node is in the list of preferred
// nodes. Fallback to the faulting GPU if all else fails.
if (home_node != NUMA_NO_NODE) {
residency = home_node;
}
else if (!node_isset(residency, vma_policy->nodes)) {
int closest_cpu_numa_node = gpu->parent->closest_cpu_numa_node;
if ((closest_cpu_numa_node != NUMA_NO_NODE) && node_isset(closest_cpu_numa_node, vma_policy->nodes))
residency = gpu->parent->closest_cpu_numa_node;
else
residency = first_node(vma_policy->nodes);
}
if (!nodes_empty(vma_policy->nodes))
ats_context->prefetch_state.has_preferred_location = true;
}
// Update gpu if residency is not the faulting gpu.
if (residency != uvm_gpu_numa_node(gpu))
gpu = uvm_va_space_find_gpu_with_memory_node_id(gpu_va_space->va_space, residency);
done:
#else
ats_context->prefetch_state.has_preferred_location = false;
#endif
ats_context->residency_id = gpu ? gpu->id : UVM_ID_CPU;
ats_context->residency_node = residency;
}
static void get_range_in_vma(struct vm_area_struct *vma, NvU64 base, NvU64 *start, NvU64 *end)
{
*start = max(vma->vm_start, (unsigned long) base);
*end = min(vma->vm_end, (unsigned long) (base + UVM_VA_BLOCK_SIZE));
}
static uvm_page_index_t uvm_ats_cpu_page_index(NvU64 base, NvU64 addr)
{
UVM_ASSERT(addr >= base);
UVM_ASSERT(addr <= (base + UVM_VA_BLOCK_SIZE));
return (addr - base) / PAGE_SIZE;
}
// start and end must be aligned to PAGE_SIZE and must fall within
// [base, base + UVM_VA_BLOCK_SIZE]
static uvm_va_block_region_t uvm_ats_region_from_start_end(NvU64 start, NvU64 end)
{
// base can be greater than, less than or equal to the start of a VMA.
NvU64 base = UVM_VA_BLOCK_ALIGN_DOWN(start);
UVM_ASSERT(start < end);
UVM_ASSERT(PAGE_ALIGNED(start));
UVM_ASSERT(PAGE_ALIGNED(end));
UVM_ASSERT(IS_ALIGNED(base, UVM_VA_BLOCK_SIZE));
return uvm_va_block_region(uvm_ats_cpu_page_index(base, start), uvm_ats_cpu_page_index(base, end));
}
static uvm_va_block_region_t uvm_ats_region_from_vma(struct vm_area_struct *vma, NvU64 base)
{
NvU64 start;
NvU64 end;
get_range_in_vma(vma, base, &start, &end);
return uvm_ats_region_from_start_end(start, end);
}
#if UVM_HMM_RANGE_FAULT_SUPPORTED()
static bool uvm_ats_invalidate_notifier(struct mmu_interval_notifier *mni, unsigned long cur_seq)
{
uvm_ats_fault_context_t *ats_context = container_of(mni, uvm_ats_fault_context_t, prefetch_state.notifier);
uvm_va_space_t *va_space = ats_context->prefetch_state.va_space;
// The following write lock protects against concurrent invalidates while
// hmm_range_fault() is being called in ats_compute_residency_mask().
uvm_down_write(&va_space->ats.lock);
mmu_interval_set_seq(mni, cur_seq);
uvm_up_write(&va_space->ats.lock);
return true;
}
static bool uvm_ats_invalidate_notifier_entry(struct mmu_interval_notifier *mni,
const struct mmu_notifier_range *range,
unsigned long cur_seq)
{
UVM_ENTRY_RET(uvm_ats_invalidate_notifier(mni, cur_seq));
}
static const struct mmu_interval_notifier_ops uvm_ats_notifier_ops =
{
.invalidate = uvm_ats_invalidate_notifier_entry,
};
#endif
static NV_STATUS ats_compute_residency_mask(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context)
{
NV_STATUS status = NV_OK;
uvm_page_mask_t *residency_mask = &ats_context->prefetch_state.residency_mask;
#if UVM_HMM_RANGE_FAULT_SUPPORTED()
int ret;
NvU64 start;
NvU64 end;
struct hmm_range range;
uvm_page_index_t page_index;
uvm_va_block_region_t vma_region;
uvm_va_space_t *va_space = gpu_va_space->va_space;
struct mm_struct *mm = va_space->va_space_mm.mm;
uvm_assert_rwsem_locked_read(&va_space->lock);
ats_context->prefetch_state.first_touch = true;
uvm_page_mask_zero(residency_mask);
get_range_in_vma(vma, base, &start, &end);
vma_region = uvm_ats_region_from_start_end(start, end);
range.notifier = &ats_context->prefetch_state.notifier;
range.start = start;
range.end = end;
range.hmm_pfns = ats_context->prefetch_state.pfns;
range.default_flags = 0;
range.pfn_flags_mask = 0;
range.dev_private_owner = NULL;
ats_context->prefetch_state.va_space = va_space;
// mmu_interval_notifier_insert() will try to acquire mmap_lock for write
// and will deadlock since mmap_lock is already held for read in this path.
// This is prevented by calling __mmu_notifier_register() during va_space
// creation. See the comment in uvm_mmu_notifier_register() for more
// details.
ret = mmu_interval_notifier_insert(range.notifier, mm, start, end, &uvm_ats_notifier_ops);
if (ret)
return errno_to_nv_status(ret);
while (true) {
range.notifier_seq = mmu_interval_read_begin(range.notifier);
ret = hmm_range_fault(&range);
if (ret == -EBUSY)
continue;
if (ret) {
status = errno_to_nv_status(ret);
UVM_ASSERT(status != NV_OK);
break;
}
uvm_down_read(&va_space->ats.lock);
// Pages may have been freed or re-allocated after hmm_range_fault() is
// called. So the PTE might point to a different page or nothing. In the
// memory hot-unplug case it is not safe to call page_to_nid() on the
// page as the struct page itself may have been freed. To protect
// against these cases, uvm_ats_invalidate_entry() blocks on va_space
// ATS write lock for concurrent invalidates since va_space ATS lock is
// held for read in this path.
if (!mmu_interval_read_retry(range.notifier, range.notifier_seq))
break;
uvm_up_read(&va_space->ats.lock);
}
if (status == NV_OK) {
for_each_va_block_page_in_region(page_index, vma_region) {
unsigned long pfn = ats_context->prefetch_state.pfns[page_index - vma_region.first];
if (pfn & HMM_PFN_VALID) {
struct page *page = hmm_pfn_to_page(pfn);
if (page_to_nid(page) == ats_context->residency_node)
uvm_page_mask_set(residency_mask, page_index);
ats_context->prefetch_state.first_touch = false;
}
}
uvm_up_read(&va_space->ats.lock);
}
mmu_interval_notifier_remove(range.notifier);
#else
uvm_page_mask_zero(residency_mask);
#endif
return status;
}
static void ats_compute_prefetch_mask(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
uvm_ats_fault_context_t *ats_context,
uvm_va_block_region_t max_prefetch_region)
{
uvm_page_mask_t *accessed_mask = &ats_context->accessed_mask;
uvm_page_mask_t *residency_mask = &ats_context->prefetch_state.residency_mask;
uvm_page_mask_t *prefetch_mask = &ats_context->prefetch_state.prefetch_pages_mask;
uvm_perf_prefetch_bitmap_tree_t *bitmap_tree = &ats_context->prefetch_state.bitmap_tree;
if (uvm_page_mask_empty(accessed_mask))
return;
uvm_perf_prefetch_compute_ats(gpu_va_space->va_space,
accessed_mask,
uvm_va_block_region_from_mask(NULL, accessed_mask),
max_prefetch_region,
residency_mask,
bitmap_tree,
prefetch_mask);
}
static NV_STATUS ats_compute_prefetch(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_service_type_t service_type,
uvm_ats_fault_context_t *ats_context)
{
NV_STATUS status;
uvm_page_mask_t *accessed_mask = &ats_context->accessed_mask;
uvm_page_mask_t *prefetch_mask = &ats_context->prefetch_state.prefetch_pages_mask;
uvm_va_block_region_t max_prefetch_region = uvm_ats_region_from_vma(vma, base);
// Residency mask needs to be computed even if prefetching is disabled since
// the residency information is also needed by access counters servicing in
// uvm_ats_service_access_counters()
status = ats_compute_residency_mask(gpu_va_space, vma, base, ats_context);
if (status != NV_OK)
return status;
if (!uvm_perf_prefetch_enabled(gpu_va_space->va_space))
return status;
if (uvm_page_mask_empty(accessed_mask))
return status;
// Prefetch the entire region if none of the pages are resident on any node
// and if preferred_location is the faulting GPU.
if (ats_context->prefetch_state.has_preferred_location &&
(ats_context->prefetch_state.first_touch || (service_type == UVM_ATS_SERVICE_TYPE_ACCESS_COUNTERS)) &&
uvm_id_equal(ats_context->residency_id, gpu_va_space->gpu->id)) {
uvm_page_mask_init_from_region(prefetch_mask, max_prefetch_region, NULL);
}
else {
ats_compute_prefetch_mask(gpu_va_space, vma, ats_context, max_prefetch_region);
}
if (service_type == UVM_ATS_SERVICE_TYPE_FAULTS) {
uvm_page_mask_t *read_fault_mask = &ats_context->read_fault_mask;
uvm_page_mask_t *write_fault_mask = &ats_context->write_fault_mask;
uvm_page_mask_or(read_fault_mask, read_fault_mask, prefetch_mask);
if (vma->vm_flags & VM_WRITE)
uvm_page_mask_or(write_fault_mask, write_fault_mask, prefetch_mask);
}
else {
uvm_page_mask_or(accessed_mask, accessed_mask, prefetch_mask);
}
return status;
}
NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
@@ -178,6 +464,7 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
uvm_page_mask_t *faults_serviced_mask = &ats_context->faults_serviced_mask;
uvm_page_mask_t *reads_serviced_mask = &ats_context->reads_serviced_mask;
uvm_fault_client_type_t client_type = ats_context->client_type;
uvm_ats_service_type_t service_type = UVM_ATS_SERVICE_TYPE_FAULTS;
UVM_ASSERT(vma);
UVM_ASSERT(IS_ALIGNED(base, UVM_VA_BLOCK_SIZE));
@@ -186,6 +473,9 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
UVM_ASSERT(gpu_va_space->ats.enabled);
UVM_ASSERT(uvm_gpu_va_space_state(gpu_va_space) == UVM_GPU_VA_SPACE_STATE_ACTIVE);
uvm_assert_mmap_lock_locked(vma->vm_mm);
uvm_assert_rwsem_locked(&gpu_va_space->va_space->lock);
uvm_page_mask_zero(faults_serviced_mask);
uvm_page_mask_zero(reads_serviced_mask);
@@ -203,8 +493,16 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
uvm_page_mask_and(write_fault_mask, write_fault_mask, read_fault_mask);
else
uvm_page_mask_zero(write_fault_mask);
// There are no pending faults beyond write faults to RO region.
if (uvm_page_mask_empty(read_fault_mask))
return status;
}
ats_batch_select_residency(gpu_va_space, vma, ats_context);
ats_compute_prefetch(gpu_va_space, vma, base, service_type, ats_context);
for_each_va_block_subregion_in_mask(subregion, write_fault_mask, region) {
NvU64 start = base + (subregion.first * PAGE_SIZE);
size_t length = uvm_va_block_region_num_pages(subregion) * PAGE_SIZE;
@@ -215,12 +513,13 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
UVM_ASSERT(start >= vma->vm_start);
UVM_ASSERT((start + length) <= vma->vm_end);
status = service_ats_faults(gpu_va_space, vma, start, length, access_type, client_type);
status = service_ats_requests(gpu_va_space, vma, start, length, access_type, service_type, ats_context);
if (status != NV_OK)
return status;
if (vma->vm_flags & VM_WRITE) {
uvm_page_mask_region_fill(faults_serviced_mask, subregion);
uvm_ats_smmu_invalidate_tlbs(gpu_va_space, start, length);
// The Linux kernel never invalidates TLB entries on mapping
// permission upgrade. This is a problem if the GPU has cached
@@ -231,7 +530,7 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
// infinite loop because we just forward the fault to the Linux
// kernel and it will see that the permissions in the page table are
// correct. Therefore, we flush TLB entries on ATS write faults.
flush_tlb_write_faults(gpu_va_space, start, length, client_type);
flush_tlb_va_region(gpu_va_space, start, length, client_type);
}
else {
uvm_page_mask_region_fill(reads_serviced_mask, subregion);
@@ -244,15 +543,25 @@ NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
for_each_va_block_subregion_in_mask(subregion, read_fault_mask, region) {
NvU64 start = base + (subregion.first * PAGE_SIZE);
size_t length = uvm_va_block_region_num_pages(subregion) * PAGE_SIZE;
uvm_fault_access_type_t access_type = UVM_FAULT_ACCESS_TYPE_READ;
UVM_ASSERT(start >= vma->vm_start);
UVM_ASSERT((start + length) <= vma->vm_end);
status = service_ats_faults(gpu_va_space, vma, start, length, UVM_FAULT_ACCESS_TYPE_READ, client_type);
status = service_ats_requests(gpu_va_space, vma, start, length, access_type, service_type, ats_context);
if (status != NV_OK)
return status;
uvm_page_mask_region_fill(faults_serviced_mask, subregion);
// Similarly to permission upgrade scenario, discussed above, GPU
// will not re-fetch the entry if the PTE is invalid and page size
// is 4K. To avoid infinite faulting loop, invalidate TLB for every
// new translation written explicitly like in the case of permission
// upgrade.
if (PAGE_SIZE == UVM_PAGE_SIZE_4K)
flush_tlb_va_region(gpu_va_space, start, length, client_type);
}
return status;
@@ -287,7 +596,7 @@ NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,
NV_STATUS status;
uvm_push_t push;
if (!ats_invalidate->write_faults_in_batch)
if (!ats_invalidate->tlb_batch_pending)
return NV_OK;
UVM_ASSERT(gpu_va_space);
@@ -299,7 +608,7 @@ NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,
"Invalidate ATS entries");
if (status == NV_OK) {
uvm_tlb_batch_end(&ats_invalidate->write_faults_tlb_batch, &push, UVM_MEMBAR_NONE);
uvm_tlb_batch_end(&ats_invalidate->tlb_batch, &push, UVM_MEMBAR_NONE);
uvm_push_end(&push);
// Add this push to the GPU's tracker so that fault replays/clears can
@@ -307,8 +616,57 @@ NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,
status = uvm_tracker_add_push_safe(out_tracker, &push);
}
ats_invalidate->write_faults_in_batch = false;
ats_invalidate->tlb_batch_pending = false;
return status;
}
NV_STATUS uvm_ats_service_access_counters(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context)
{
uvm_va_block_region_t subregion;
uvm_va_block_region_t region = uvm_va_block_region(0, PAGES_PER_UVM_VA_BLOCK);
uvm_ats_service_type_t service_type = UVM_ATS_SERVICE_TYPE_ACCESS_COUNTERS;
UVM_ASSERT(vma);
UVM_ASSERT(IS_ALIGNED(base, UVM_VA_BLOCK_SIZE));
UVM_ASSERT(g_uvm_global.ats.enabled);
UVM_ASSERT(gpu_va_space);
UVM_ASSERT(gpu_va_space->ats.enabled);
UVM_ASSERT(uvm_gpu_va_space_state(gpu_va_space) == UVM_GPU_VA_SPACE_STATE_ACTIVE);
uvm_assert_mmap_lock_locked(vma->vm_mm);
uvm_assert_rwsem_locked(&gpu_va_space->va_space->lock);
ats_batch_select_residency(gpu_va_space, vma, ats_context);
// Ignoring the return value of ats_compute_prefetch is ok since prefetching
// is just an optimization and servicing access counter migrations is still
// worthwhile even without any prefetching added. So, let servicing continue
// instead of returning early even if the prefetch computation fails.
ats_compute_prefetch(gpu_va_space, vma, base, service_type, ats_context);
// Remove pages which are already resident at the intended destination from
// the accessed_mask.
uvm_page_mask_andnot(&ats_context->accessed_mask,
&ats_context->accessed_mask,
&ats_context->prefetch_state.residency_mask);
for_each_va_block_subregion_in_mask(subregion, &ats_context->accessed_mask, region) {
NV_STATUS status;
NvU64 start = base + (subregion.first * PAGE_SIZE);
size_t length = uvm_va_block_region_num_pages(subregion) * PAGE_SIZE;
uvm_fault_access_type_t access_type = UVM_FAULT_ACCESS_TYPE_COUNT;
UVM_ASSERT(start >= vma->vm_start);
UVM_ASSERT((start + length) <= vma->vm_end);
status = service_ats_requests(gpu_va_space, vma, start, length, access_type, service_type, ats_context);
if (status != NV_OK)
return status;
}
return NV_OK;
}

View File

@@ -42,17 +42,37 @@
// corresponding bit in read_fault_mask. These returned masks are only valid if
// the return status is NV_OK. Status other than NV_OK indicate system global
// fault servicing failures.
//
// LOCKING: The caller must retain and hold the mmap_lock and hold the va_space
// lock.
NV_STATUS uvm_ats_service_faults(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context);
// Service access counter notifications on ATS regions in the range (base, base
// + UVM_VA_BLOCK_SIZE) for individual pages in the range requested by page_mask
// set in ats_context->accessed_mask. base must be aligned to UVM_VA_BLOCK_SIZE.
// The caller is responsible for ensuring that the addresses in the
// accessed_mask is completely covered by the VMA. The caller is also
// responsible for handling any errors returned by this function.
//
// Returns NV_OK if servicing was successful. Any other error indicates an error
// while servicing the range.
//
// LOCKING: The caller must retain and hold the mmap_lock and hold the va_space
// lock.
NV_STATUS uvm_ats_service_access_counters(uvm_gpu_va_space_t *gpu_va_space,
struct vm_area_struct *vma,
NvU64 base,
uvm_ats_fault_context_t *ats_context);
// Return whether there are any VA ranges (and thus GMMU mappings) within the
// UVM_GMMU_ATS_GRANULARITY-aligned region containing address.
bool uvm_ats_check_in_gmmu_region(uvm_va_space_t *va_space, NvU64 address, uvm_va_range_t *next);
// This function performs pending TLB invalidations for ATS and clears the
// ats_invalidate->write_faults_in_batch flag
// ats_invalidate->tlb_batch_pending flag
NV_STATUS uvm_ats_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space,
uvm_ats_fault_invalidate_t *ats_invalidate,
uvm_tracker_t *out_tracker);

View File

@@ -29,8 +29,13 @@
#include "uvm_va_space.h"
#include "uvm_va_space_mm.h"
#include <asm/io.h>
#include <linux/log2.h>
#include <linux/iommu.h>
#include <linux/mm_types.h>
#include <linux/acpi.h>
#include <linux/device.h>
#include <linux/mmu_context.h>
// linux/sched/mm.h is needed for mmget_not_zero and mmput to get the mm
// reference required for the iommu_sva_bind_device() call. This header is not
@@ -46,17 +51,276 @@
#define UVM_IOMMU_SVA_BIND_DEVICE(dev, mm) iommu_sva_bind_device(dev, mm)
#endif
// Type to represent a 128-bit SMMU command queue command.
struct smmu_cmd {
NvU64 low;
NvU64 high;
};
// Base address of SMMU CMDQ-V for GSMMU0.
#define SMMU_CMDQV_BASE_ADDR(smmu_base) (smmu_base + 0x200000)
#define SMMU_CMDQV_BASE_LEN 0x00830000
// CMDQV configuration is done by firmware but we check status here.
#define SMMU_CMDQV_CONFIG 0x0
#define SMMU_CMDQV_CONFIG_CMDQV_EN BIT(0)
// Used to map a particular VCMDQ to a VINTF.
#define SMMU_CMDQV_CMDQ_ALLOC_MAP(vcmdq_id) (0x200 + 0x4 * (vcmdq_id))
#define SMMU_CMDQV_CMDQ_ALLOC_MAP_ALLOC BIT(0)
// Shift for the field containing the index of the virtual interface
// owning the VCMDQ.
#define SMMU_CMDQV_CMDQ_ALLOC_MAP_VIRT_INTF_INDX_SHIFT 15
// Base address for the VINTF registers.
#define SMMU_VINTF_BASE_ADDR(cmdqv_base_addr, vintf_id) (cmdqv_base_addr + 0x1000 + 0x100 * (vintf_id))
// Virtual interface (VINTF) configuration registers. The WAR only
// works on baremetal so we need to configure ourselves as the
// hypervisor owner.
#define SMMU_VINTF_CONFIG 0x0
#define SMMU_VINTF_CONFIG_ENABLE BIT(0)
#define SMMU_VINTF_CONFIG_HYP_OWN BIT(17)
#define SMMU_VINTF_STATUS 0x0
#define SMMU_VINTF_STATUS_ENABLED BIT(0)
// Caclulates the base address for a particular VCMDQ instance.
#define SMMU_VCMDQ_BASE_ADDR(cmdqv_base_addr, vcmdq_id) (cmdqv_base_addr + 0x10000 + 0x80 * (vcmdq_id))
// SMMU command queue consumer index register. Updated by SMMU
// when commands are consumed.
#define SMMU_VCMDQ_CONS 0x0
// SMMU command queue producer index register. Updated by UVM when
// commands are added to the queue.
#define SMMU_VCMDQ_PROD 0x4
// Configuration register used to enable a VCMDQ.
#define SMMU_VCMDQ_CONFIG 0x8
#define SMMU_VCMDQ_CONFIG_ENABLE BIT(0)
// Status register used to check the VCMDQ is enabled.
#define SMMU_VCMDQ_STATUS 0xc
#define SMMU_VCMDQ_STATUS_ENABLED BIT(0)
// Base address offset for the VCMDQ registers.
#define SMMU_VCMDQ_CMDQ_BASE 0x10000
// Size of the command queue. Each command is 16 bytes and we can't
// have a command queue greater than one page in size.
#define SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE (PAGE_SHIFT - ilog2(sizeof(struct smmu_cmd)))
#define SMMU_VCMDQ_CMDQ_ENTRIES (1UL << SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE)
// We always use VINTF63 for the WAR
#define VINTF 63
static void smmu_vintf_write32(void __iomem *smmu_cmdqv_base, int reg, NvU32 val)
{
iowrite32(val, SMMU_VINTF_BASE_ADDR(smmu_cmdqv_base, VINTF) + reg);
}
static NvU32 smmu_vintf_read32(void __iomem *smmu_cmdqv_base, int reg)
{
return ioread32(SMMU_VINTF_BASE_ADDR(smmu_cmdqv_base, VINTF) + reg);
}
// We always use VCMDQ127 for the WAR
#define VCMDQ 127
void smmu_vcmdq_write32(void __iomem *smmu_cmdqv_base, int reg, NvU32 val)
{
iowrite32(val, SMMU_VCMDQ_BASE_ADDR(smmu_cmdqv_base, VCMDQ) + reg);
}
NvU32 smmu_vcmdq_read32(void __iomem *smmu_cmdqv_base, int reg)
{
return ioread32(SMMU_VCMDQ_BASE_ADDR(smmu_cmdqv_base, VCMDQ) + reg);
}
static void smmu_vcmdq_write64(void __iomem *smmu_cmdqv_base, int reg, NvU64 val)
{
iowrite64(val, SMMU_VCMDQ_BASE_ADDR(smmu_cmdqv_base, VCMDQ) + reg);
}
// Fix for Bug 4130089: [GH180][r535] WAR for kernel not issuing SMMU
// TLB invalidates on read-only to read-write upgrades
static NV_STATUS uvm_ats_smmu_war_init(uvm_parent_gpu_t *parent_gpu)
{
uvm_spin_loop_t spin;
NV_STATUS status;
unsigned long cmdqv_config;
void __iomem *smmu_cmdqv_base;
struct acpi_iort_node *node;
struct acpi_iort_smmu_v3 *iort_smmu;
node = *(struct acpi_iort_node **) dev_get_platdata(parent_gpu->pci_dev->dev.iommu->iommu_dev->dev->parent);
iort_smmu = (struct acpi_iort_smmu_v3 *) node->node_data;
smmu_cmdqv_base = ioremap(SMMU_CMDQV_BASE_ADDR(iort_smmu->base_address), SMMU_CMDQV_BASE_LEN);
if (!smmu_cmdqv_base)
return NV_ERR_NO_MEMORY;
parent_gpu->smmu_war.smmu_cmdqv_base = smmu_cmdqv_base;
cmdqv_config = ioread32(smmu_cmdqv_base + SMMU_CMDQV_CONFIG);
if (!(cmdqv_config & SMMU_CMDQV_CONFIG_CMDQV_EN)) {
status = NV_ERR_OBJECT_NOT_FOUND;
goto out;
}
// Allocate SMMU CMDQ pages for WAR
parent_gpu->smmu_war.smmu_cmdq = alloc_page(NV_UVM_GFP_FLAGS | __GFP_ZERO);
if (!parent_gpu->smmu_war.smmu_cmdq) {
status = NV_ERR_NO_MEMORY;
goto out;
}
// Initialise VINTF for the WAR
smmu_vintf_write32(smmu_cmdqv_base, SMMU_VINTF_CONFIG, SMMU_VINTF_CONFIG_ENABLE | SMMU_VINTF_CONFIG_HYP_OWN);
UVM_SPIN_WHILE(!(smmu_vintf_read32(smmu_cmdqv_base, SMMU_VINTF_STATUS) & SMMU_VINTF_STATUS_ENABLED), &spin);
// Allocate VCMDQ to VINTF
iowrite32((VINTF << SMMU_CMDQV_CMDQ_ALLOC_MAP_VIRT_INTF_INDX_SHIFT) | SMMU_CMDQV_CMDQ_ALLOC_MAP_ALLOC,
smmu_cmdqv_base + SMMU_CMDQV_CMDQ_ALLOC_MAP(VCMDQ));
smmu_vcmdq_write64(smmu_cmdqv_base, SMMU_VCMDQ_CMDQ_BASE,
page_to_phys(parent_gpu->smmu_war.smmu_cmdq) | SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE);
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_CONS, 0);
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_PROD, 0);
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_CONFIG, SMMU_VCMDQ_CONFIG_ENABLE);
UVM_SPIN_WHILE(!(smmu_vcmdq_read32(smmu_cmdqv_base, SMMU_VCMDQ_STATUS) & SMMU_VCMDQ_STATUS_ENABLED), &spin);
uvm_mutex_init(&parent_gpu->smmu_war.smmu_lock, UVM_LOCK_ORDER_LEAF);
parent_gpu->smmu_war.smmu_prod = 0;
parent_gpu->smmu_war.smmu_cons = 0;
return NV_OK;
out:
iounmap(parent_gpu->smmu_war.smmu_cmdqv_base);
parent_gpu->smmu_war.smmu_cmdqv_base = NULL;
return status;
}
static void uvm_ats_smmu_war_deinit(uvm_parent_gpu_t *parent_gpu)
{
void __iomem *smmu_cmdqv_base = parent_gpu->smmu_war.smmu_cmdqv_base;
NvU32 cmdq_alloc_map;
if (parent_gpu->smmu_war.smmu_cmdqv_base) {
smmu_vcmdq_write32(smmu_cmdqv_base, SMMU_VCMDQ_CONFIG, 0);
cmdq_alloc_map = ioread32(smmu_cmdqv_base + SMMU_CMDQV_CMDQ_ALLOC_MAP(VCMDQ));
iowrite32(cmdq_alloc_map & SMMU_CMDQV_CMDQ_ALLOC_MAP_ALLOC, smmu_cmdqv_base + SMMU_CMDQV_CMDQ_ALLOC_MAP(VCMDQ));
smmu_vintf_write32(smmu_cmdqv_base, SMMU_VINTF_CONFIG, 0);
}
if (parent_gpu->smmu_war.smmu_cmdq)
__free_page(parent_gpu->smmu_war.smmu_cmdq);
if (parent_gpu->smmu_war.smmu_cmdqv_base)
iounmap(parent_gpu->smmu_war.smmu_cmdqv_base);
}
// The SMMU on ARM64 can run under different translation regimes depending on
// what features the OS and CPU variant support. The CPU for GH180 supports
// virtualisation extensions and starts the kernel at EL2 meaning SMMU operates
// under the NS-EL2-E2H translation regime. Therefore we need to use the
// TLBI_EL2_* commands which invalidate TLB entries created under this
// translation regime.
#define CMDQ_OP_TLBI_EL2_ASID 0x21;
#define CMDQ_OP_TLBI_EL2_VA 0x22;
#define CMDQ_OP_CMD_SYNC 0x46
// Use the same maximum as used for MAX_TLBI_OPS in the upstream
// kernel.
#define UVM_MAX_TLBI_OPS (1UL << (PAGE_SHIFT - 3))
#if UVM_ATS_SMMU_WAR_REQUIRED()
void uvm_ats_smmu_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space, NvU64 addr, size_t size)
{
struct mm_struct *mm = gpu_va_space->va_space->va_space_mm.mm;
uvm_parent_gpu_t *parent_gpu = gpu_va_space->gpu->parent;
struct {
NvU64 low;
NvU64 high;
} *vcmdq;
unsigned long vcmdq_prod;
NvU64 end;
uvm_spin_loop_t spin;
NvU16 asid;
if (!parent_gpu->smmu_war.smmu_cmdqv_base)
return;
asid = arm64_mm_context_get(mm);
vcmdq = kmap(parent_gpu->smmu_war.smmu_cmdq);
uvm_mutex_lock(&parent_gpu->smmu_war.smmu_lock);
vcmdq_prod = parent_gpu->smmu_war.smmu_prod;
// Our queue management is very simple. The mutex prevents multiple
// producers writing to the queue and all our commands require waiting for
// the queue to drain so we know it's empty. If we can't fit enough commands
// in the queue we just invalidate the whole ASID.
//
// The command queue is a cirular buffer with the MSB representing a wrap
// bit that must toggle on each wrap. See the SMMU architecture
// specification for more details.
//
// SMMU_VCMDQ_CMDQ_ENTRIES - 1 because we need to leave space for the
// CMD_SYNC.
if ((size >> PAGE_SHIFT) > min(UVM_MAX_TLBI_OPS, SMMU_VCMDQ_CMDQ_ENTRIES - 1)) {
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low = CMDQ_OP_TLBI_EL2_ASID;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low |= (NvU64) asid << 48;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].high = 0;
vcmdq_prod++;
}
else {
for (end = addr + size; addr < end; addr += PAGE_SIZE) {
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low = CMDQ_OP_TLBI_EL2_VA;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low |= (NvU64) asid << 48;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].high = addr & ~((1UL << 12) - 1);
vcmdq_prod++;
}
}
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].low = CMDQ_OP_CMD_SYNC;
vcmdq[vcmdq_prod % SMMU_VCMDQ_CMDQ_ENTRIES].high = 0x0;
vcmdq_prod++;
// MSB is the wrap bit
vcmdq_prod &= (1UL << (SMMU_VCMDQ_CMDQ_BASE_LOG2SIZE + 1)) - 1;
parent_gpu->smmu_war.smmu_prod = vcmdq_prod;
smmu_vcmdq_write32(parent_gpu->smmu_war.smmu_cmdqv_base, SMMU_VCMDQ_PROD, parent_gpu->smmu_war.smmu_prod);
UVM_SPIN_WHILE(
(smmu_vcmdq_read32(parent_gpu->smmu_war.smmu_cmdqv_base, SMMU_VCMDQ_CONS) & GENMASK(19, 0)) != vcmdq_prod,
&spin);
uvm_mutex_unlock(&parent_gpu->smmu_war.smmu_lock);
kunmap(parent_gpu->smmu_war.smmu_cmdq);
arm64_mm_context_put(mm);
}
#endif
NV_STATUS uvm_ats_sva_add_gpu(uvm_parent_gpu_t *parent_gpu)
{
int ret;
ret = iommu_dev_enable_feature(&parent_gpu->pci_dev->dev, IOMMU_DEV_FEAT_SVA);
if (ret)
return errno_to_nv_status(ret);
return errno_to_nv_status(ret);
if (UVM_ATS_SMMU_WAR_REQUIRED())
return uvm_ats_smmu_war_init(parent_gpu);
else
return NV_OK;
}
void uvm_ats_sva_remove_gpu(uvm_parent_gpu_t *parent_gpu)
{
if (UVM_ATS_SMMU_WAR_REQUIRED())
uvm_ats_smmu_war_deinit(parent_gpu);
iommu_dev_disable_feature(&parent_gpu->pci_dev->dev, IOMMU_DEV_FEAT_SVA);
}

View File

@@ -32,23 +32,39 @@
// For ATS support on aarch64, arm_smmu_sva_bind() is needed for
// iommu_sva_bind_device() calls. Unfortunately, arm_smmu_sva_bind() is not
// conftest-able. We instead look for the presence of ioasid_get() or
// mm_pasid_set(). ioasid_get() was added in the same patch series as
// arm_smmu_sva_bind() and removed in v6.0. mm_pasid_set() was added in the
// mm_pasid_drop(). ioasid_get() was added in the same patch series as
// arm_smmu_sva_bind() and removed in v6.0. mm_pasid_drop() was added in the
// same patch as the removal of ioasid_get(). We assume the presence of
// arm_smmu_sva_bind() if ioasid_get(v5.11 - v5.17) or mm_pasid_set(v5.18+) is
// arm_smmu_sva_bind() if ioasid_get(v5.11 - v5.17) or mm_pasid_drop(v5.18+) is
// present.
//
// arm_smmu_sva_bind() was added with commit
// 32784a9562fb0518b12e9797ee2aec52214adf6f and ioasid_get() was added with
// commit cb4789b0d19ff231ce9f73376a023341300aed96 (11/23/2020). Commit
// 701fac40384f07197b106136012804c3cae0b3de (02/15/2022) removed ioasid_get()
// and added mm_pasid_set().
#if UVM_CAN_USE_MMU_NOTIFIERS() && (defined(NV_IOASID_GET_PRESENT) || defined(NV_MM_PASID_SET_PRESENT))
#define UVM_ATS_SVA_SUPPORTED() 1
// and added mm_pasid_drop().
#if UVM_CAN_USE_MMU_NOTIFIERS() && (defined(NV_IOASID_GET_PRESENT) || defined(NV_MM_PASID_DROP_PRESENT))
#if defined(CONFIG_IOMMU_SVA)
#define UVM_ATS_SVA_SUPPORTED() 1
#else
#define UVM_ATS_SVA_SUPPORTED() 0
#endif
#else
#define UVM_ATS_SVA_SUPPORTED() 0
#endif
// If NV_MMU_NOTIFIER_OPS_HAS_ARCH_INVALIDATE_SECONDARY_TLBS is defined it
// means the upstream fix is in place so no need for the WAR from
// Bug 4130089: [GH180][r535] WAR for kernel not issuing SMMU TLB
// invalidates on read-only
#if defined(NV_MMU_NOTIFIER_OPS_HAS_ARCH_INVALIDATE_SECONDARY_TLBS)
#define UVM_ATS_SMMU_WAR_REQUIRED() 0
#elif NVCPU_IS_AARCH64
#define UVM_ATS_SMMU_WAR_REQUIRED() 1
#else
#define UVM_ATS_SMMU_WAR_REQUIRED() 0
#endif
typedef struct
{
int placeholder;
@@ -77,6 +93,17 @@ typedef struct
// LOCKING: None
void uvm_ats_sva_unregister_gpu_va_space(uvm_gpu_va_space_t *gpu_va_space);
// Fix for Bug 4130089: [GH180][r535] WAR for kernel not issuing SMMU
// TLB invalidates on read-only to read-write upgrades
#if UVM_ATS_SMMU_WAR_REQUIRED()
void uvm_ats_smmu_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space, NvU64 addr, size_t size);
#else
static void uvm_ats_smmu_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space, NvU64 addr, size_t size)
{
}
#endif
#else
static NV_STATUS uvm_ats_sva_add_gpu(uvm_parent_gpu_t *parent_gpu)
{
@@ -107,6 +134,11 @@ typedef struct
{
}
static void uvm_ats_smmu_invalidate_tlbs(uvm_gpu_va_space_t *gpu_va_space, NvU64 addr, size_t size)
{
}
#endif // UVM_ATS_SVA_SUPPORTED
#endif // __UVM_ATS_SVA_H__

View File

@@ -56,7 +56,7 @@ static NV_STATUS test_non_pipelined(uvm_gpu_t *gpu)
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(gpu))
if (g_uvm_global.conf_computing_enabled)
return NV_OK;
status = uvm_rm_mem_alloc_and_map_cpu(gpu, UVM_RM_MEM_TYPE_SYS, CE_TEST_MEM_SIZE, 0, &host_mem);
@@ -176,7 +176,7 @@ static NV_STATUS test_membar(uvm_gpu_t *gpu)
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(gpu))
if (g_uvm_global.conf_computing_enabled)
return NV_OK;
status = uvm_rm_mem_alloc_and_map_cpu(gpu, UVM_RM_MEM_TYPE_SYS, sizeof(NvU32), 0, &host_mem);
@@ -191,7 +191,7 @@ static NV_STATUS test_membar(uvm_gpu_t *gpu)
for (i = 0; i < REDUCTIONS; ++i) {
uvm_push_set_flag(&push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE);
gpu->parent->ce_hal->semaphore_reduction_inc(&push, host_mem_gpu_va, REDUCTIONS + 1);
gpu->parent->ce_hal->semaphore_reduction_inc(&push, host_mem_gpu_va, REDUCTIONS);
}
// Without a sys membar the channel tracking semaphore can and does complete
@@ -338,11 +338,6 @@ static NV_STATUS test_memcpy_and_memset_inner(uvm_gpu_t *gpu,
return NV_OK;
}
if (!gpu->parent->ce_hal->memcopy_is_valid(&push, dst, src)) {
TEST_NV_CHECK_RET(uvm_push_end_and_wait(&push));
return NV_OK;
}
// The input virtual addresses exist in UVM's internal address space, not
// the proxy address space
if (uvm_channel_is_proxy(push.channel)) {
@@ -401,7 +396,7 @@ static NV_STATUS test_memcpy_and_memset_inner(uvm_gpu_t *gpu,
static NV_STATUS test_memcpy_and_memset(uvm_gpu_t *gpu)
{
NV_STATUS status = NV_OK;
bool is_proxy_va_space;
bool is_proxy_va_space = false;
uvm_gpu_address_t gpu_verif_addr;
void *cpu_verif_addr;
uvm_mem_t *verif_mem = NULL;
@@ -416,10 +411,11 @@ static NV_STATUS test_memcpy_and_memset(uvm_gpu_t *gpu)
size_t i, j, k, s;
uvm_mem_alloc_params_t mem_params = {0};
if (uvm_conf_computing_mode_enabled(gpu))
if (g_uvm_global.conf_computing_enabled)
TEST_NV_CHECK_GOTO(uvm_mem_alloc_sysmem_dma_and_map_cpu_kernel(size, gpu, current->mm, &verif_mem), done);
else
TEST_NV_CHECK_GOTO(uvm_mem_alloc_sysmem_and_map_cpu_kernel(size, current->mm, &verif_mem), done);
TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_kernel(verif_mem, gpu), done);
gpu_verif_addr = uvm_mem_gpu_address_virtual_kernel(verif_mem, gpu);
@@ -437,6 +433,34 @@ static NV_STATUS test_memcpy_and_memset(uvm_gpu_t *gpu)
}
}
// Virtual address (in UVM's internal address space) backed by sysmem
TEST_NV_CHECK_GOTO(uvm_rm_mem_alloc(gpu, UVM_RM_MEM_TYPE_SYS, size, 0, &sys_rm_mem), done);
gpu_addresses[0] = uvm_rm_mem_get_gpu_va(sys_rm_mem, gpu, is_proxy_va_space);
if (g_uvm_global.conf_computing_enabled) {
for (i = 0; i < iterations; ++i) {
for (s = 0; s < ARRAY_SIZE(element_sizes); s++) {
TEST_NV_CHECK_GOTO(test_memcpy_and_memset_inner(gpu,
gpu_addresses[0],
gpu_addresses[0],
size,
element_sizes[s],
gpu_verif_addr,
cpu_verif_addr,
i),
done);
}
}
// Because gpu_verif_addr is in sysmem, when the Confidential
// Computing feature is enabled, only the previous cases are valid.
// TODO: Bug 3839176: the test partially waived on Confidential
// Computing because it assumes that GPU can access system memory
// without using encryption.
goto done;
}
// Using a page size equal to the allocation size ensures that the UVM
// memories about to be allocated are physically contiguous. And since the
// size is a valid GPU page size, the memories can be virtually mapped on
@@ -448,37 +472,22 @@ static NV_STATUS test_memcpy_and_memset(uvm_gpu_t *gpu)
// Physical address in sysmem
TEST_NV_CHECK_GOTO(uvm_mem_alloc(&mem_params, &sys_uvm_mem), done);
TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_phys(sys_uvm_mem, gpu), done);
gpu_addresses[0] = uvm_mem_gpu_address_physical(sys_uvm_mem, gpu, 0, size);
gpu_addresses[1] = uvm_mem_gpu_address_physical(sys_uvm_mem, gpu, 0, size);
// Physical address in vidmem
mem_params.backing_gpu = gpu;
TEST_NV_CHECK_GOTO(uvm_mem_alloc(&mem_params, &gpu_uvm_mem), done);
gpu_addresses[1] = uvm_mem_gpu_address_physical(gpu_uvm_mem, gpu, 0, size);
gpu_addresses[2] = uvm_mem_gpu_address_physical(gpu_uvm_mem, gpu, 0, size);
// Virtual address (in UVM's internal address space) backed by vidmem
TEST_NV_CHECK_GOTO(uvm_rm_mem_alloc(gpu, UVM_RM_MEM_TYPE_GPU, size, 0, &gpu_rm_mem), done);
is_proxy_va_space = false;
gpu_addresses[2] = uvm_rm_mem_get_gpu_va(gpu_rm_mem, gpu, is_proxy_va_space);
gpu_addresses[3] = uvm_rm_mem_get_gpu_va(gpu_rm_mem, gpu, is_proxy_va_space);
// Virtual address (in UVM's internal address space) backed by sysmem
TEST_NV_CHECK_GOTO(uvm_rm_mem_alloc(gpu, UVM_RM_MEM_TYPE_SYS, size, 0, &sys_rm_mem), done);
gpu_addresses[3] = uvm_rm_mem_get_gpu_va(sys_rm_mem, gpu, is_proxy_va_space);
for (i = 0; i < iterations; ++i) {
for (j = 0; j < ARRAY_SIZE(gpu_addresses); ++j) {
for (k = 0; k < ARRAY_SIZE(gpu_addresses); ++k) {
for (s = 0; s < ARRAY_SIZE(element_sizes); s++) {
// Because gpu_verif_addr is in sysmem, when the Confidential
// Computing feature is enabled, only the following cases are
// valid.
//
// TODO: Bug 3839176: the test partially waived on
// Confidential Computing because it assumes that GPU can
// access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(gpu) &&
!(gpu_addresses[k].is_unprotected && gpu_addresses[j].is_unprotected)) {
continue;
}
TEST_NV_CHECK_GOTO(test_memcpy_and_memset_inner(gpu,
gpu_addresses[k],
gpu_addresses[j],
@@ -551,7 +560,7 @@ static NV_STATUS test_semaphore_reduction_inc(uvm_gpu_t *gpu)
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(gpu))
if (g_uvm_global.conf_computing_enabled)
return NV_OK;
status = test_semaphore_alloc_sem(gpu, size, &mem);
@@ -569,7 +578,7 @@ static NV_STATUS test_semaphore_reduction_inc(uvm_gpu_t *gpu)
for (i = 0; i < REDUCTIONS; i++) {
uvm_push_set_flag(&push, UVM_PUSH_FLAG_NEXT_MEMBAR_NONE);
gpu->parent->ce_hal->semaphore_reduction_inc(&push, gpu_va, i+1);
gpu->parent->ce_hal->semaphore_reduction_inc(&push, gpu_va, REDUCTIONS);
}
status = uvm_push_end_and_wait(&push);
@@ -603,7 +612,7 @@ static NV_STATUS test_semaphore_release(uvm_gpu_t *gpu)
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(gpu))
if (g_uvm_global.conf_computing_enabled)
return NV_OK;
status = test_semaphore_alloc_sem(gpu, size, &mem);
@@ -657,7 +666,7 @@ static NV_STATUS test_semaphore_timestamp(uvm_gpu_t *gpu)
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(gpu))
if (g_uvm_global.conf_computing_enabled)
return NV_OK;
status = test_semaphore_alloc_sem(gpu, size, &mem);
@@ -752,7 +761,7 @@ static NV_STATUS alloc_vidmem_protected(uvm_gpu_t *gpu, uvm_mem_t **mem, size_t
*mem = NULL;
TEST_NV_CHECK_RET(uvm_mem_alloc_vidmem_protected(size, gpu, mem));
TEST_NV_CHECK_RET(uvm_mem_alloc_vidmem(size, gpu, mem));
TEST_NV_CHECK_GOTO(uvm_mem_map_gpu_kernel(*mem, gpu), err);
TEST_NV_CHECK_GOTO(zero_vidmem(*mem), err);
@@ -1145,7 +1154,7 @@ static NV_STATUS test_encryption_decryption(uvm_gpu_t *gpu,
} small_sizes[] = {{1, 1}, {3, 1}, {8, 1}, {2, 2}, {8, 4}, {UVM_PAGE_SIZE_4K - 8, 8}, {UVM_PAGE_SIZE_4K + 8, 8}};
// Only Confidential Computing uses CE encryption/decryption
if (!uvm_conf_computing_mode_enabled(gpu))
if (!g_uvm_global.conf_computing_enabled)
return NV_OK;
// Use a size, and copy size, that are not a multiple of common page sizes.

File diff suppressed because it is too large Load Diff

View File

@@ -104,16 +104,14 @@ typedef enum
// ----------------------------------
// Channel type with fixed schedules
// Work Launch Channel (WLC) is a specialized channel
// for launching work on other channels when
// Confidential Computing is enabled.
// It is paired with LCIC (below)
// Work Launch Channel (WLC) is a specialized channel for launching work on
// other channels when the Confidential Computing is feature enabled. It is
// paired with LCIC (below)
UVM_CHANNEL_TYPE_WLC,
// Launch Confirmation Indicator Channel (LCIC) is a
// specialized channel with fixed schedule. It gets
// triggered by executing WLC work, and makes sure that
// WLC get/put pointers are up-to-date.
// Launch Confirmation Indicator Channel (LCIC) is a specialized channel
// with fixed schedule. It gets triggered by executing WLC work, and makes
// sure that WLC get/put pointers are up-to-date.
UVM_CHANNEL_TYPE_LCIC,
UVM_CHANNEL_TYPE_COUNT,
@@ -242,11 +240,9 @@ typedef struct
DECLARE_BITMAP(push_locks, UVM_CHANNEL_MAX_NUM_CHANNELS_PER_POOL);
// Counting semaphore for available and unlocked channels, it must be
// acquired before submitting work to a secure channel.
// acquired before submitting work to a channel when the Confidential
// Computing feature is enabled.
uvm_semaphore_t push_sem;
// See uvm_channel_is_secure() documentation.
bool secure;
} uvm_channel_pool_t;
struct uvm_channel_struct
@@ -304,8 +300,9 @@ struct uvm_channel_struct
// its internal operation and each push may modify this state.
uvm_mutex_t push_lock;
// Every secure channel has cryptographic state in HW, which is
// mirrored here for CPU-side operations.
// When the Confidential Computing feature is enabled, every channel has
// cryptographic state in HW, which is mirrored here for CPU-side
// operations.
UvmCslContext ctx;
bool is_ctx_initialized;
@@ -355,6 +352,13 @@ struct uvm_channel_struct
// Encryption auth tags have to be located in unprotected sysmem.
void *launch_auth_tag_cpu;
NvU64 launch_auth_tag_gpu_va;
// Used to decrypt the push back to protected sysmem.
// This happens when profilers register callbacks for migration data.
uvm_push_crypto_bundle_t *push_crypto_bundles;
// Accompanying authentication tags for the crypto bundles
uvm_rm_mem_t *push_crypto_bundle_auth_tags;
} conf_computing;
// RM channel information
@@ -452,46 +456,28 @@ struct uvm_channel_manager_struct
// Create a channel manager for the GPU
NV_STATUS uvm_channel_manager_create(uvm_gpu_t *gpu, uvm_channel_manager_t **manager_out);
static bool uvm_channel_pool_is_ce(uvm_channel_pool_t *pool);
// A channel is secure if it has HW encryption capabilities.
//
// Secure channels are treated differently in the UVM driver. Each secure
// channel has a unique CSL context associated with it, has relatively
// restrictive reservation policies (in comparison with non-secure channels),
// it is requested to be allocated differently by RM, etc.
static bool uvm_channel_pool_is_secure(uvm_channel_pool_t *pool)
static bool uvm_pool_type_is_valid(uvm_channel_pool_type_t pool_type)
{
return pool->secure;
}
static bool uvm_channel_is_secure(uvm_channel_t *channel)
{
return uvm_channel_pool_is_secure(channel->pool);
return (is_power_of_2(pool_type) && (pool_type < UVM_CHANNEL_POOL_TYPE_MASK));
}
static bool uvm_channel_pool_is_sec2(uvm_channel_pool_t *pool)
{
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
return (pool->pool_type == UVM_CHANNEL_POOL_TYPE_SEC2);
}
static bool uvm_channel_pool_is_secure_ce(uvm_channel_pool_t *pool)
{
return uvm_channel_pool_is_secure(pool) && uvm_channel_pool_is_ce(pool);
}
static bool uvm_channel_pool_is_wlc(uvm_channel_pool_t *pool)
{
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
return (pool->pool_type == UVM_CHANNEL_POOL_TYPE_WLC);
}
static bool uvm_channel_pool_is_lcic(uvm_channel_pool_t *pool)
{
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
return (pool->pool_type == UVM_CHANNEL_POOL_TYPE_LCIC);
}
@@ -501,11 +487,6 @@ static bool uvm_channel_is_sec2(uvm_channel_t *channel)
return uvm_channel_pool_is_sec2(channel->pool);
}
static bool uvm_channel_is_secure_ce(uvm_channel_t *channel)
{
return uvm_channel_pool_is_secure_ce(channel->pool);
}
static bool uvm_channel_is_wlc(uvm_channel_t *channel)
{
return uvm_channel_pool_is_wlc(channel->pool);
@@ -516,12 +497,13 @@ static bool uvm_channel_is_lcic(uvm_channel_t *channel)
return uvm_channel_pool_is_lcic(channel->pool);
}
bool uvm_channel_type_requires_secure_pool(uvm_gpu_t *gpu, uvm_channel_type_t channel_type);
NV_STATUS uvm_channel_secure_init(uvm_gpu_t *gpu, uvm_channel_t *channel);
uvm_channel_t *uvm_channel_lcic_get_paired_wlc(uvm_channel_t *lcic_channel);
uvm_channel_t *uvm_channel_wlc_get_paired_lcic(uvm_channel_t *wlc_channel);
static bool uvm_channel_pool_is_proxy(uvm_channel_pool_t *pool)
{
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
UVM_ASSERT(uvm_pool_type_is_valid(pool->pool_type));
return pool->pool_type == UVM_CHANNEL_POOL_TYPE_CE_PROXY;
}
@@ -533,11 +515,7 @@ static bool uvm_channel_is_proxy(uvm_channel_t *channel)
static bool uvm_channel_pool_is_ce(uvm_channel_pool_t *pool)
{
UVM_ASSERT(pool->pool_type < UVM_CHANNEL_POOL_TYPE_MASK);
if (uvm_channel_pool_is_wlc(pool) || uvm_channel_pool_is_lcic(pool))
return true;
return (pool->pool_type == UVM_CHANNEL_POOL_TYPE_CE) || uvm_channel_pool_is_proxy(pool);
return !uvm_channel_pool_is_sec2(pool);
}
static bool uvm_channel_is_ce(uvm_channel_t *channel)
@@ -629,6 +607,11 @@ bool uvm_channel_is_value_completed(uvm_channel_t *channel, NvU64 value);
// Update and get the latest completed value by the channel
NvU64 uvm_channel_update_completed_value(uvm_channel_t *channel);
// Wait for the channel to idle
// It waits for anything that is running, but doesn't prevent new work from
// beginning.
NV_STATUS uvm_channel_wait(uvm_channel_t *channel);
// Select and reserve a channel with the specified type for a push
NV_STATUS uvm_channel_reserve_type(uvm_channel_manager_t *manager,
uvm_channel_type_t type,
@@ -643,6 +626,9 @@ NV_STATUS uvm_channel_reserve_gpu_to_gpu(uvm_channel_manager_t *channel_manager,
// Reserve a specific channel for a push or for a control GPFIFO entry.
NV_STATUS uvm_channel_reserve(uvm_channel_t *channel, NvU32 num_gpfifo_entries);
// Release reservation on a specific channel
void uvm_channel_release(uvm_channel_t *channel, NvU32 num_gpfifo_entries);
// Set optimal CE for P2P transfers between manager->gpu and peer
void uvm_channel_manager_set_p2p_ce(uvm_channel_manager_t *manager, uvm_gpu_t *peer, NvU32 optimal_ce);
@@ -674,11 +660,18 @@ NvU32 uvm_channel_get_available_gpfifo_entries(uvm_channel_t *channel);
void uvm_channel_print_pending_pushes(uvm_channel_t *channel);
bool uvm_channel_is_locked_for_push(uvm_channel_t *channel);
static uvm_gpu_t *uvm_channel_get_gpu(uvm_channel_t *channel)
{
return channel->pool->manager->gpu;
}
static uvm_pushbuffer_t *uvm_channel_get_pushbuffer(uvm_channel_t *channel)
{
return channel->pool->manager->pushbuffer;
}
// Index of a channel within the owning pool
static unsigned uvm_channel_index_in_pool(const uvm_channel_t *channel)
{

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2015-2022 NVIDIA Corporation
Copyright (c) 2015-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -24,6 +24,7 @@
#include "uvm_global.h"
#include "uvm_channel.h"
#include "uvm_hal.h"
#include "uvm_mem.h"
#include "uvm_push.h"
#include "uvm_test.h"
#include "uvm_test_rng.h"
@@ -57,14 +58,14 @@ static NV_STATUS test_ordering(uvm_va_space_t *va_space)
const NvU32 values_count = iters_per_channel_type_per_gpu;
const size_t buffer_size = sizeof(NvU32) * values_count;
gpu = uvm_va_space_find_first_gpu(va_space);
TEST_CHECK_RET(gpu != NULL);
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(gpu))
if (g_uvm_global.conf_computing_enabled)
return NV_OK;
gpu = uvm_va_space_find_first_gpu(va_space);
TEST_CHECK_RET(gpu != NULL);
status = uvm_rm_mem_alloc_and_map_all(gpu, UVM_RM_MEM_TYPE_SYS, buffer_size, 0, &mem);
TEST_CHECK_GOTO(status == NV_OK, done);
@@ -84,7 +85,7 @@ static NV_STATUS test_ordering(uvm_va_space_t *va_space)
TEST_NV_CHECK_GOTO(uvm_tracker_add_push(&tracker, &push), done);
exclude_proxy_channel_type = uvm_gpu_uses_proxy_channel_pool(gpu);
exclude_proxy_channel_type = uvm_parent_gpu_needs_proxy_channel_pool(gpu->parent);
for (i = 0; i < iters_per_channel_type_per_gpu; ++i) {
for (j = 0; j < UVM_CHANNEL_TYPE_CE_COUNT; ++j) {
@@ -222,7 +223,7 @@ static NV_STATUS uvm_test_rc_for_gpu(uvm_gpu_t *gpu)
// Check RC on a proxy channel (SR-IOV heavy) or internal channel (any other
// mode). It is not allowed to use a virtual address in a memset pushed to
// a proxy channel, so we use a physical address instead.
if (uvm_gpu_uses_proxy_channel_pool(gpu)) {
if (uvm_parent_gpu_needs_proxy_channel_pool(gpu->parent)) {
uvm_gpu_address_t dst_address;
// Save the line number the push that's supposed to fail was started on
@@ -314,6 +315,110 @@ static NV_STATUS test_rc(uvm_va_space_t *va_space)
return NV_OK;
}
static NV_STATUS uvm_test_iommu_rc_for_gpu(uvm_gpu_t *gpu)
{
NV_STATUS status = NV_OK;
#if defined(NV_IOMMU_IS_DMA_DOMAIN_PRESENT) && defined(CONFIG_IOMMU_DEFAULT_DMA_STRICT)
// This test needs the DMA API to immediately invalidate IOMMU mappings on
// DMA unmap (as apposed to lazy invalidation). The policy can be changed
// on boot (e.g. iommu.strict=1), but there isn't a good way to check for
// the runtime setting. CONFIG_IOMMU_DEFAULT_DMA_STRICT checks for the
// default value.
uvm_push_t push;
uvm_mem_t *sysmem;
uvm_gpu_address_t sysmem_dma_addr;
char *cpu_ptr = NULL;
const size_t data_size = PAGE_SIZE;
size_t i;
struct device *dev = &gpu->parent->pci_dev->dev;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
// Check that the iommu domain is controlled by linux DMA API
if (!domain || !iommu_is_dma_domain(domain))
return NV_OK;
// Only run if ATS is enabled with 64kB base page.
// Otherwise the CE doesn't get response on writing to unmapped location.
if (!g_uvm_global.ats.enabled || PAGE_SIZE != UVM_PAGE_SIZE_64K)
return NV_OK;
status = uvm_mem_alloc_sysmem_and_map_cpu_kernel(data_size, NULL, &sysmem);
TEST_NV_CHECK_RET(status);
status = uvm_mem_map_gpu_phys(sysmem, gpu);
TEST_NV_CHECK_GOTO(status, done);
cpu_ptr = uvm_mem_get_cpu_addr_kernel(sysmem);
sysmem_dma_addr = uvm_mem_gpu_address_physical(sysmem, gpu, 0, data_size);
status = uvm_push_begin(gpu->channel_manager, UVM_CHANNEL_TYPE_GPU_TO_CPU, &push, "Test memset to IOMMU mapped sysmem");
TEST_NV_CHECK_GOTO(status, done);
gpu->parent->ce_hal->memset_8(&push, sysmem_dma_addr, 0, data_size);
status = uvm_push_end_and_wait(&push);
TEST_NV_CHECK_GOTO(status, done);
// Check that we have zeroed the memory
for (i = 0; i < data_size; ++i)
TEST_CHECK_GOTO(cpu_ptr[i] == 0, done);
// Unmap the buffer and try write again to the same address
uvm_mem_unmap_gpu_phys(sysmem, gpu);
status = uvm_push_begin(gpu->channel_manager, UVM_CHANNEL_TYPE_GPU_TO_CPU, &push, "Test memset after IOMMU unmap");
TEST_NV_CHECK_GOTO(status, done);
gpu->parent->ce_hal->memset_4(&push, sysmem_dma_addr, 0xffffffff, data_size);
status = uvm_push_end_and_wait(&push);
TEST_CHECK_GOTO(status == NV_ERR_RC_ERROR, done);
TEST_CHECK_GOTO(uvm_channel_get_status(push.channel) == NV_ERR_RC_ERROR, done);
TEST_CHECK_GOTO(uvm_global_reset_fatal_error() == NV_ERR_RC_ERROR, done);
// Check that writes after unmap did not succeed
for (i = 0; i < data_size; ++i)
TEST_CHECK_GOTO(cpu_ptr[i] == 0, done);
status = NV_OK;
done:
uvm_mem_free(sysmem);
#endif
return status;
}
static NV_STATUS test_iommu(uvm_va_space_t *va_space)
{
uvm_gpu_t *gpu;
uvm_assert_mutex_locked(&g_uvm_global.global_lock);
for_each_va_space_gpu(gpu, va_space) {
NV_STATUS test_status, create_status;
// The GPU channel manager is destroyed and then re-created after
// testing ATS RC fault, so this test requires exclusive access to the GPU.
TEST_CHECK_RET(uvm_gpu_retained_count(gpu) == 1);
g_uvm_global.disable_fatal_error_assert = true;
test_status = uvm_test_iommu_rc_for_gpu(gpu);
g_uvm_global.disable_fatal_error_assert = false;
uvm_channel_manager_destroy(gpu->channel_manager);
create_status = uvm_channel_manager_create(gpu, &gpu->channel_manager);
TEST_NV_CHECK_RET(test_status);
TEST_NV_CHECK_RET(create_status);
}
return NV_OK;
}
typedef struct
{
uvm_push_t push;
@@ -403,7 +508,7 @@ static uvm_channel_type_t random_ce_channel_type_except(uvm_test_rng_t *rng, uvm
static uvm_channel_type_t gpu_random_internal_ce_channel_type(uvm_gpu_t *gpu, uvm_test_rng_t *rng)
{
if (uvm_gpu_uses_proxy_channel_pool(gpu))
if (uvm_parent_gpu_needs_proxy_channel_pool(gpu->parent))
return random_ce_channel_type_except(rng, uvm_channel_proxy_channel_type());
return random_ce_channel_type(rng);
@@ -586,12 +691,16 @@ static NV_STATUS stress_test_all_gpus_in_va(uvm_va_space_t *va_space,
if (uvm_test_rng_range_32(&rng, 0, 1) == 0) {
NvU32 random_stream_index = uvm_test_rng_range_32(&rng, 0, num_streams - 1);
uvm_test_stream_t *random_stream = &streams[random_stream_index];
uvm_push_acquire_tracker(&stream->push, &random_stream->tracker);
snapshot_counter(&stream->push,
random_stream->counter_mem,
stream->other_stream_counter_snapshots_mem,
i,
random_stream->queued_counter_repeat);
if ((random_stream->push.gpu == gpu) || uvm_push_allow_dependencies_across_gpus()) {
uvm_push_acquire_tracker(&stream->push, &random_stream->tracker);
snapshot_counter(&stream->push,
random_stream->counter_mem,
stream->other_stream_counter_snapshots_mem,
i,
random_stream->queued_counter_repeat);
}
}
uvm_push_end(&stream->push);
@@ -681,9 +790,10 @@ done:
}
// The following test is inspired by uvm_push_test.c:test_concurrent_pushes.
// This test verifies that concurrent pushes using the same secure channel pool
// select different channels.
NV_STATUS test_secure_channel_selection(uvm_va_space_t *va_space)
// This test verifies that concurrent pushes using the same channel pool
// select different channels, when the Confidential Computing feature is
// enabled.
NV_STATUS test_conf_computing_channel_selection(uvm_va_space_t *va_space)
{
NV_STATUS status = NV_OK;
uvm_channel_pool_t *pool;
@@ -692,9 +802,7 @@ NV_STATUS test_secure_channel_selection(uvm_va_space_t *va_space)
NvU32 i;
NvU32 num_pushes;
gpu = uvm_va_space_find_first_gpu(va_space);
if (!uvm_conf_computing_mode_enabled(gpu))
if (!g_uvm_global.conf_computing_enabled)
return NV_OK;
uvm_thread_context_lock_disable_tracking();
@@ -703,9 +811,6 @@ NV_STATUS test_secure_channel_selection(uvm_va_space_t *va_space)
uvm_channel_type_t channel_type;
for (channel_type = 0; channel_type < UVM_CHANNEL_TYPE_COUNT; channel_type++) {
if (!uvm_channel_type_requires_secure_pool(gpu, channel_type))
continue;
pool = gpu->channel_manager->pool_to_use.default_for_type[channel_type];
TEST_CHECK_RET(pool != NULL);
@@ -748,6 +853,101 @@ error:
return status;
}
NV_STATUS test_channel_iv_rotation(uvm_va_space_t *va_space)
{
uvm_gpu_t *gpu;
if (!g_uvm_global.conf_computing_enabled)
return NV_OK;
for_each_va_space_gpu(gpu, va_space) {
uvm_channel_pool_t *pool;
uvm_for_each_pool(pool, gpu->channel_manager) {
NvU64 before_rotation_enc, before_rotation_dec, after_rotation_enc, after_rotation_dec;
NV_STATUS status = NV_OK;
// Check one (the first) channel per pool
uvm_channel_t *channel = pool->channels;
// Create a dummy encrypt/decrypt push to use few IVs.
// SEC2 used encrypt during initialization, no need to use a dummy
// push.
if (!uvm_channel_is_sec2(channel)) {
uvm_push_t push;
size_t data_size;
uvm_conf_computing_dma_buffer_t *cipher_text;
void *cipher_cpu_va, *plain_cpu_va, *tag_cpu_va;
uvm_gpu_address_t cipher_gpu_address, plain_gpu_address, tag_gpu_address;
uvm_channel_t *work_channel = uvm_channel_is_lcic(channel) ? uvm_channel_lcic_get_paired_wlc(channel) : channel;
plain_cpu_va = &status;
data_size = sizeof(status);
TEST_NV_CHECK_RET(uvm_conf_computing_dma_buffer_alloc(&gpu->conf_computing.dma_buffer_pool,
&cipher_text,
NULL));
cipher_cpu_va = uvm_mem_get_cpu_addr_kernel(cipher_text->alloc);
tag_cpu_va = uvm_mem_get_cpu_addr_kernel(cipher_text->auth_tag);
cipher_gpu_address = uvm_mem_gpu_address_virtual_kernel(cipher_text->alloc, gpu);
tag_gpu_address = uvm_mem_gpu_address_virtual_kernel(cipher_text->auth_tag, gpu);
TEST_NV_CHECK_GOTO(uvm_push_begin_on_channel(work_channel, &push, "Dummy push for IV rotation"), free);
(void)uvm_push_get_single_inline_buffer(&push,
data_size,
UVM_CONF_COMPUTING_BUF_ALIGNMENT,
&plain_gpu_address);
uvm_conf_computing_cpu_encrypt(work_channel, cipher_cpu_va, plain_cpu_va, NULL, data_size, tag_cpu_va);
gpu->parent->ce_hal->decrypt(&push, plain_gpu_address, cipher_gpu_address, data_size, tag_gpu_address);
TEST_NV_CHECK_GOTO(uvm_push_end_and_wait(&push), free);
free:
uvm_conf_computing_dma_buffer_free(&gpu->conf_computing.dma_buffer_pool, cipher_text, NULL);
if (status != NV_OK)
return status;
}
// Reserve a channel to hold the push lock during rotation
if (!uvm_channel_is_lcic(channel))
TEST_NV_CHECK_RET(uvm_channel_reserve(channel, 1));
uvm_conf_computing_query_message_pools(channel, &before_rotation_enc, &before_rotation_dec);
TEST_NV_CHECK_GOTO(uvm_conf_computing_rotate_channel_ivs_below_limit(channel, -1, true), release);
uvm_conf_computing_query_message_pools(channel, &after_rotation_enc, &after_rotation_dec);
release:
if (!uvm_channel_is_lcic(channel))
uvm_channel_release(channel, 1);
if (status != NV_OK)
return status;
// All channels except SEC2 used at least a single IV to release tracking.
// SEC2 doesn't support decrypt direction.
if (uvm_channel_is_sec2(channel))
TEST_CHECK_RET(before_rotation_dec == after_rotation_dec);
else
TEST_CHECK_RET(before_rotation_dec < after_rotation_dec);
// All channels used one CPU encrypt/GPU decrypt, either during
// initialization or in the push above, with the exception of LCIC.
// LCIC is used in tandem with WLC, but it never uses CPU encrypt/
// GPU decrypt ops.
if (uvm_channel_is_lcic(channel))
TEST_CHECK_RET(before_rotation_enc == after_rotation_enc);
else
TEST_CHECK_RET(before_rotation_enc < after_rotation_enc);
}
}
return NV_OK;
}
NV_STATUS test_write_ctrl_gpfifo_noop(uvm_va_space_t *va_space)
{
uvm_gpu_t *gpu;
@@ -847,11 +1047,9 @@ NV_STATUS test_write_ctrl_gpfifo_tight(uvm_va_space_t *va_space)
NvU64 entry;
uvm_push_t push;
gpu = uvm_va_space_find_first_gpu(va_space);
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(gpu))
if (g_uvm_global.conf_computing_enabled)
return NV_OK;
for_each_va_space_gpu(gpu, va_space) {
@@ -926,7 +1124,7 @@ static NV_STATUS test_channel_pushbuffer_extension_base(uvm_va_space_t *va_space
uvm_channel_manager_t *manager;
uvm_channel_pool_t *pool;
if (!uvm_gpu_has_pushbuffer_segments(gpu))
if (!uvm_parent_gpu_needs_pushbuffer_segments(gpu->parent))
continue;
// The GPU channel manager pushbuffer is destroyed and then re-created
@@ -997,7 +1195,11 @@ NV_STATUS uvm_test_channel_sanity(UVM_TEST_CHANNEL_SANITY_PARAMS *params, struct
if (status != NV_OK)
goto done;
status = test_secure_channel_selection(va_space);
status = test_conf_computing_channel_selection(va_space);
if (status != NV_OK)
goto done;
status = test_channel_iv_rotation(va_space);
if (status != NV_OK)
goto done;
@@ -1021,6 +1223,10 @@ NV_STATUS uvm_test_channel_sanity(UVM_TEST_CHANNEL_SANITY_PARAMS *params, struct
goto done;
}
status = test_iommu(va_space);
if (status != NV_OK)
goto done;
done:
uvm_va_space_up_read_rm(va_space);
uvm_mutex_unlock(&g_uvm_global.global_lock);
@@ -1036,23 +1242,22 @@ static NV_STATUS uvm_test_channel_stress_stream(uvm_va_space_t *va_space,
if (params->iterations == 0 || params->num_streams == 0)
return NV_ERR_INVALID_PARAMETER;
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (g_uvm_global.conf_computing_enabled)
return NV_OK;
// TODO: Bug 1764963: Rework the test to not rely on the global lock as that
// serializes all the threads calling this at the same time.
uvm_mutex_lock(&g_uvm_global.global_lock);
uvm_va_space_down_read_rm(va_space);
// TODO: Bug 3839176: the test is waived on Confidential Computing because
// it assumes that GPU can access system memory without using encryption.
if (uvm_conf_computing_mode_enabled(uvm_va_space_find_first_gpu(va_space)))
goto done;
status = stress_test_all_gpus_in_va(va_space,
params->num_streams,
params->iterations,
params->seed,
params->verbose);
done:
uvm_va_space_up_read_rm(va_space);
uvm_mutex_unlock(&g_uvm_global.global_lock);

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2013-2021 NVIDIA Corporation
Copyright (c) 2013-2023 NVIDIA Corporation
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
@@ -233,18 +233,6 @@ unsigned uvm_get_stale_thread_id(void)
return (unsigned)task_pid_vnr(current);
}
//
// A simple security rule for allowing access to UVM user space memory: if you
// are the same user as the owner of the memory, or if you are root, then you
// are granted access. The idea is to allow debuggers and profilers to work, but
// without opening up any security holes.
//
NvBool uvm_user_id_security_check(uid_t euidTarget)
{
return (NV_CURRENT_EUID() == euidTarget) ||
(UVM_ROOT_UID == euidTarget);
}
void on_uvm_test_fail(void)
{
(void)NULL;
@@ -330,10 +318,11 @@ int format_uuid_to_buffer(char *buffer, unsigned bufferLength, const NvProcessor
unsigned i;
unsigned dashMask = 1 << 4 | 1 << 6 | 1 << 8 | 1 << 10;
memcpy(buffer, "UVM-GPU-", 8);
if (bufferLength < (8 /*prefix*/+ 16 * 2 /*digits*/ + 4 * 1 /*dashes*/ + 1 /*null*/))
return *buffer = 0;
memcpy(buffer, "UVM-GPU-", 8);
for (i = 0; i < 16; i++) {
*str++ = uvm_digit_to_hex(pUuidStruct->uuid[i] >> 4);
*str++ = uvm_digit_to_hex(pUuidStruct->uuid[i] & 0xF);

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2013-2021 NVIDIA Corporation
Copyright (c) 2013-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -21,8 +21,8 @@
*******************************************************************************/
#ifndef _UVM_COMMON_H
#define _UVM_COMMON_H
#ifndef __UVM_COMMON_H__
#define __UVM_COMMON_H__
#ifdef DEBUG
#define UVM_IS_DEBUG() 1
@@ -204,13 +204,6 @@ extern bool uvm_release_asserts_set_global_error_for_tests;
#define UVM_ASSERT_MSG_RELEASE(expr, fmt, ...) _UVM_ASSERT_MSG_RELEASE(expr, #expr, ": " fmt, ##__VA_ARGS__)
#define UVM_ASSERT_RELEASE(expr) _UVM_ASSERT_MSG_RELEASE(expr, #expr, "\n")
// Provide a short form of UUID's, typically for use in debug printing:
#define ABBREV_UUID(uuid) (unsigned)(uuid)
static inline NvBool uvm_uuid_is_cpu(const NvProcessorUuid *uuid)
{
return memcmp(uuid, &NV_PROCESSOR_UUID_CPU_DEFAULT, sizeof(*uuid)) == 0;
}
#define UVM_SIZE_1KB (1024ULL)
#define UVM_SIZE_1MB (1024 * UVM_SIZE_1KB)
#define UVM_SIZE_1GB (1024 * UVM_SIZE_1MB)
@@ -282,9 +275,6 @@ static inline void kmem_cache_destroy_safe(struct kmem_cache **ppCache)
}
}
static const uid_t UVM_ROOT_UID = 0;
typedef struct
{
NvU64 start_time_ns;
@@ -335,7 +325,6 @@ NV_STATUS errno_to_nv_status(int errnoCode);
int nv_status_to_errno(NV_STATUS status);
unsigned uvm_get_stale_process_id(void);
unsigned uvm_get_stale_thread_id(void);
NvBool uvm_user_id_security_check(uid_t euidTarget);
extern int uvm_enable_builtin_tests;
@@ -413,4 +402,40 @@ static inline void uvm_touch_page(struct page *page)
// Return true if the VMA is one used by UVM managed allocations.
bool uvm_vma_is_managed(struct vm_area_struct *vma);
#endif /* _UVM_COMMON_H */
static bool uvm_platform_uses_canonical_form_address(void)
{
if (NVCPU_IS_PPC64LE)
return false;
return true;
}
// Similar to the GPU MMU HAL num_va_bits(), it returns the CPU's num_va_bits().
static NvU32 uvm_cpu_num_va_bits(void)
{
return fls64(TASK_SIZE - 1) + 1;
}
// Return the unaddressable range in a num_va_bits-wide VA space, [first, outer)
static void uvm_get_unaddressable_range(NvU32 num_va_bits, NvU64 *first, NvU64 *outer)
{
UVM_ASSERT(num_va_bits < 64);
UVM_ASSERT(first);
UVM_ASSERT(outer);
if (uvm_platform_uses_canonical_form_address()) {
*first = 1ULL << (num_va_bits - 1);
*outer = (NvU64)((NvS64)(1ULL << 63) >> (64 - num_va_bits));
}
else {
*first = 1ULL << num_va_bits;
*outer = ~0Ull;
}
}
static void uvm_cpu_get_unaddressable_range(NvU64 *first, NvU64 *outer)
{
return uvm_get_unaddressable_range(uvm_cpu_num_va_bits(), first, outer);
}
#endif /* __UVM_COMMON_H__ */

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2021 NVIDIA Corporation
Copyright (c) 2021-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -26,51 +26,62 @@
#include "uvm_conf_computing.h"
#include "uvm_kvmalloc.h"
#include "uvm_gpu.h"
#include "uvm_hal.h"
#include "uvm_mem.h"
#include "uvm_processors.h"
#include "uvm_tracker.h"
#include "nv_uvm_interface.h"
#include "uvm_va_block.h"
// The maximum number of secure operations per push is:
// UVM_MAX_PUSH_SIZE / min(CE encryption size, CE decryption size)
// + 1 (tracking semaphore) = 128 * 1024 / 56 + 1 = 2342
#define UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_MIN 2342lu
// Channels use 32-bit counters so the value after rotation is 0xffffffff.
// setting the limit to this value (or higher) will result in rotation
// on every check. However, pre-emptive rotation when submitting control
// GPFIFO entries relies on the fact that multiple successive checks after
// rotation do not trigger more rotations if there was no IV used in between.
#define UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_MAX 0xfffffffelu
// Attempt rotation when two billion IVs are left. IV rotation call can fail if
// the necessary locks are not available, so multiple attempts may be need for
// IV rotation to succeed.
#define UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_DEFAULT (1lu << 31)
// Start rotating after 500 encryption/decryptions when running tests.
#define UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_TESTS ((1lu << 32) - 500lu)
static ulong uvm_conf_computing_channel_iv_rotation_limit = UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_DEFAULT;
module_param(uvm_conf_computing_channel_iv_rotation_limit, ulong, S_IRUGO);
static UvmGpuConfComputeMode uvm_conf_computing_get_mode(const uvm_parent_gpu_t *parent)
{
return parent->rm_info.gpuConfComputeCaps.mode;
}
bool uvm_conf_computing_mode_enabled_parent(const uvm_parent_gpu_t *parent)
{
return uvm_conf_computing_get_mode(parent) != UVM_GPU_CONF_COMPUTE_MODE_NONE;
}
bool uvm_conf_computing_mode_enabled(const uvm_gpu_t *gpu)
{
return uvm_conf_computing_mode_enabled_parent(gpu->parent);
}
bool uvm_conf_computing_mode_is_hcc(const uvm_gpu_t *gpu)
{
return uvm_conf_computing_get_mode(gpu->parent) == UVM_GPU_CONF_COMPUTE_MODE_HCC;
}
NV_STATUS uvm_conf_computing_init_parent_gpu(const uvm_parent_gpu_t *parent)
void uvm_conf_computing_check_parent_gpu(const uvm_parent_gpu_t *parent)
{
UvmGpuConfComputeMode cc, sys_cc;
uvm_gpu_t *first;
uvm_parent_gpu_t *other_parent;
UvmGpuConfComputeMode parent_mode = uvm_conf_computing_get_mode(parent);
uvm_assert_mutex_locked(&g_uvm_global.global_lock);
// TODO: Bug 2844714.
// Since we have no routine to traverse parent gpus,
// find first child GPU and get its parent.
first = uvm_global_processor_mask_find_first_gpu(&g_uvm_global.retained_gpus);
if (!first)
return NV_OK;
// The Confidential Computing state of the GPU should match that of the
// system.
UVM_ASSERT((parent_mode != UVM_GPU_CONF_COMPUTE_MODE_NONE) == g_uvm_global.conf_computing_enabled);
sys_cc = uvm_conf_computing_get_mode(first->parent);
cc = uvm_conf_computing_get_mode(parent);
return cc == sys_cc ? NV_OK : NV_ERR_NOT_SUPPORTED;
// All GPUs derive Confidential Computing status from their parent. By
// current policy all parent GPUs have identical Confidential Computing
// status.
for_each_parent_gpu(other_parent)
UVM_ASSERT(parent_mode == uvm_conf_computing_get_mode(other_parent));
}
static void dma_buffer_destroy_locked(uvm_conf_computing_dma_buffer_pool_t *dma_buffer_pool,
@@ -184,15 +195,11 @@ static void dma_buffer_pool_add(uvm_conf_computing_dma_buffer_pool_t *dma_buffer
static NV_STATUS conf_computing_dma_buffer_pool_init(uvm_conf_computing_dma_buffer_pool_t *dma_buffer_pool)
{
size_t i;
uvm_gpu_t *gpu;
size_t num_dma_buffers = 32;
NV_STATUS status = NV_OK;
UVM_ASSERT(dma_buffer_pool->num_dma_buffers == 0);
gpu = dma_buffer_pool_to_gpu(dma_buffer_pool);
UVM_ASSERT(uvm_conf_computing_mode_enabled(gpu));
UVM_ASSERT(g_uvm_global.conf_computing_enabled);
INIT_LIST_HEAD(&dma_buffer_pool->free_dma_buffers);
uvm_mutex_init(&dma_buffer_pool->lock, UVM_LOCK_ORDER_CONF_COMPUTING_DMA_BUFFER_POOL);
@@ -349,7 +356,7 @@ NV_STATUS uvm_conf_computing_gpu_init(uvm_gpu_t *gpu)
{
NV_STATUS status;
if (!uvm_conf_computing_mode_enabled(gpu))
if (!g_uvm_global.conf_computing_enabled)
return NV_OK;
status = conf_computing_dma_buffer_pool_init(&gpu->conf_computing.dma_buffer_pool);
@@ -360,6 +367,20 @@ NV_STATUS uvm_conf_computing_gpu_init(uvm_gpu_t *gpu)
if (status != NV_OK)
goto error;
if (uvm_enable_builtin_tests && uvm_conf_computing_channel_iv_rotation_limit == UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_DEFAULT)
uvm_conf_computing_channel_iv_rotation_limit = UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_TESTS;
if (uvm_conf_computing_channel_iv_rotation_limit < UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_MIN ||
uvm_conf_computing_channel_iv_rotation_limit > UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_MAX) {
UVM_ERR_PRINT("Value of uvm_conf_computing_channel_iv_rotation_limit: %lu is outside of the safe "
"range: <%lu, %lu>. Using the default value instead (%lu)\n",
uvm_conf_computing_channel_iv_rotation_limit,
UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_MIN,
UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_MAX,
UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_DEFAULT);
uvm_conf_computing_channel_iv_rotation_limit = UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_DEFAULT;
}
return NV_OK;
error:
@@ -378,11 +399,11 @@ void uvm_conf_computing_log_gpu_encryption(uvm_channel_t *channel, UvmCslIv *iv)
NV_STATUS status;
uvm_mutex_lock(&channel->csl.ctx_lock);
status = nvUvmInterfaceCslLogDeviceEncryption(&channel->csl.ctx, iv);
status = nvUvmInterfaceCslIncrementIv(&channel->csl.ctx, UVM_CSL_OPERATION_DECRYPT, 1, iv);
uvm_mutex_unlock(&channel->csl.ctx_lock);
// nvUvmInterfaceLogDeviceEncryption fails when a 64-bit encryption counter
// overflows. This is not supposed to happen on CC.
// IV rotation is done preemptively as needed, so the above
// call cannot return failure.
UVM_ASSERT(status == NV_OK);
}
@@ -391,11 +412,11 @@ void uvm_conf_computing_acquire_encryption_iv(uvm_channel_t *channel, UvmCslIv *
NV_STATUS status;
uvm_mutex_lock(&channel->csl.ctx_lock);
status = nvUvmInterfaceCslAcquireEncryptionIv(&channel->csl.ctx, iv);
status = nvUvmInterfaceCslIncrementIv(&channel->csl.ctx, UVM_CSL_OPERATION_ENCRYPT, 1, iv);
uvm_mutex_unlock(&channel->csl.ctx_lock);
// nvUvmInterfaceLogDeviceEncryption fails when a 64-bit encryption counter
// overflows. This is not supposed to happen on CC.
// IV rotation is done preemptively as needed, so the above
// call cannot return failure.
UVM_ASSERT(status == NV_OK);
}
@@ -419,8 +440,8 @@ void uvm_conf_computing_cpu_encrypt(uvm_channel_t *channel,
(NvU8 *) auth_tag_buffer);
uvm_mutex_unlock(&channel->csl.ctx_lock);
// nvUvmInterfaceCslEncrypt fails when a 64-bit encryption counter
// overflows. This is not supposed to happen on CC.
// IV rotation is done preemptively as needed, so the above
// call cannot return failure.
UVM_ASSERT(status == NV_OK);
}
@@ -433,14 +454,174 @@ NV_STATUS uvm_conf_computing_cpu_decrypt(uvm_channel_t *channel,
{
NV_STATUS status;
// The CSL context associated with a channel can be used by multiple
// threads. The IV sequence is thus guaranteed only while the channel is
// "locked for push". The channel/push lock is released in
// "uvm_channel_end_push", and at that time the GPU encryption operations
// have not executed, yet. Therefore the caller has to use
// "uvm_conf_computing_log_gpu_encryption" to explicitly store IVs needed
// to perform CPU decryption and pass those IVs to this function after the
// push that did the encryption completes.
UVM_ASSERT(src_iv);
uvm_mutex_lock(&channel->csl.ctx_lock);
status = nvUvmInterfaceCslDecrypt(&channel->csl.ctx,
size,
(const NvU8 *) src_cipher,
src_iv,
(NvU8 *) dst_plain,
NULL,
0,
(const NvU8 *) auth_tag_buffer);
uvm_mutex_unlock(&channel->csl.ctx_lock);
return status;
}
NV_STATUS uvm_conf_computing_fault_decrypt(uvm_parent_gpu_t *parent_gpu,
void *dst_plain,
const void *src_cipher,
const void *auth_tag_buffer,
NvU8 valid)
{
NV_STATUS status;
// There is no dedicated lock for the CSL context associated with replayable
// faults. The mutual exclusion required by the RM CSL API is enforced by
// relying on the GPU replayable service lock (ISR lock), since fault
// decryption is invoked as part of fault servicing.
UVM_ASSERT(uvm_sem_is_locked(&parent_gpu->isr.replayable_faults.service_lock));
UVM_ASSERT(g_uvm_global.conf_computing_enabled);
status = nvUvmInterfaceCslDecrypt(&parent_gpu->fault_buffer_info.rm_info.replayable.cslCtx,
parent_gpu->fault_buffer_hal->entry_size(parent_gpu),
(const NvU8 *) src_cipher,
NULL,
(NvU8 *) dst_plain,
&valid,
sizeof(valid),
(const NvU8 *) auth_tag_buffer);
if (status != NV_OK)
UVM_ERR_PRINT("nvUvmInterfaceCslDecrypt() failed: %s, GPU %s\n",
nvstatusToString(status),
uvm_parent_gpu_name(parent_gpu));
return status;
}
void uvm_conf_computing_fault_increment_decrypt_iv(uvm_parent_gpu_t *parent_gpu, NvU64 increment)
{
NV_STATUS status;
// See comment in uvm_conf_computing_fault_decrypt
UVM_ASSERT(uvm_sem_is_locked(&parent_gpu->isr.replayable_faults.service_lock));
UVM_ASSERT(g_uvm_global.conf_computing_enabled);
status = nvUvmInterfaceCslIncrementIv(&parent_gpu->fault_buffer_info.rm_info.replayable.cslCtx,
UVM_CSL_OPERATION_DECRYPT,
increment,
NULL);
UVM_ASSERT(status == NV_OK);
}
void uvm_conf_computing_query_message_pools(uvm_channel_t *channel,
NvU64 *remaining_encryptions,
NvU64 *remaining_decryptions)
{
NV_STATUS status;
UVM_ASSERT(channel);
UVM_ASSERT(remaining_encryptions);
UVM_ASSERT(remaining_decryptions);
uvm_mutex_lock(&channel->csl.ctx_lock);
status = nvUvmInterfaceCslQueryMessagePool(&channel->csl.ctx, UVM_CSL_OPERATION_ENCRYPT, remaining_encryptions);
UVM_ASSERT(status == NV_OK);
UVM_ASSERT(*remaining_encryptions <= NV_U32_MAX);
status = nvUvmInterfaceCslQueryMessagePool(&channel->csl.ctx, UVM_CSL_OPERATION_DECRYPT, remaining_decryptions);
UVM_ASSERT(status == NV_OK);
UVM_ASSERT(*remaining_decryptions <= NV_U32_MAX);
// LCIC channels never use CPU encrypt/GPU decrypt
if (uvm_channel_is_lcic(channel))
UVM_ASSERT(*remaining_encryptions == NV_U32_MAX);
uvm_mutex_unlock(&channel->csl.ctx_lock);
}
static NV_STATUS uvm_conf_computing_rotate_channel_ivs_below_limit_internal(uvm_channel_t *channel, NvU64 limit)
{
NV_STATUS status = NV_OK;
NvU64 remaining_encryptions, remaining_decryptions;
bool rotate_encryption_iv, rotate_decryption_iv;
UVM_ASSERT(uvm_channel_is_locked_for_push(channel) ||
(uvm_channel_is_lcic(channel) && uvm_channel_manager_is_wlc_ready(channel->pool->manager)));
uvm_conf_computing_query_message_pools(channel, &remaining_encryptions, &remaining_decryptions);
// Ignore decryption limit for SEC2, only CE channels support
// GPU encrypt/CPU decrypt. However, RM reports _some_ decrementing
// value for SEC2 decryption counter.
rotate_decryption_iv = (remaining_decryptions <= limit) && uvm_channel_is_ce(channel);
rotate_encryption_iv = remaining_encryptions <= limit;
if (!rotate_encryption_iv && !rotate_decryption_iv)
return NV_OK;
// Wait for all in-flight pushes. The caller needs to guarantee that there
// are no concurrent pushes created, e.g. by only calling rotate after
// a channel is locked_for_push.
status = uvm_channel_wait(channel);
if (status != NV_OK)
return status;
uvm_mutex_lock(&channel->csl.ctx_lock);
if (rotate_encryption_iv)
status = nvUvmInterfaceCslRotateIv(&channel->csl.ctx, UVM_CSL_OPERATION_ENCRYPT);
if (status == NV_OK && rotate_decryption_iv)
status = nvUvmInterfaceCslRotateIv(&channel->csl.ctx, UVM_CSL_OPERATION_DECRYPT);
uvm_mutex_unlock(&channel->csl.ctx_lock);
// Change the error to out of resources if the available IVs are running
// too low
if (status == NV_ERR_STATE_IN_USE &&
(remaining_encryptions < UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_MIN ||
remaining_decryptions < UVM_CONF_COMPUTING_IV_REMAINING_LIMIT_MIN))
return NV_ERR_INSUFFICIENT_RESOURCES;
return status;
}
NV_STATUS uvm_conf_computing_rotate_channel_ivs_below_limit(uvm_channel_t *channel, NvU64 limit, bool retry_if_busy)
{
NV_STATUS status;
do {
status = uvm_conf_computing_rotate_channel_ivs_below_limit_internal(channel, limit);
} while (retry_if_busy && status == NV_ERR_STATE_IN_USE);
// Hide "busy" error. The rotation will be retried at the next opportunity.
if (!retry_if_busy && status == NV_ERR_STATE_IN_USE)
status = NV_OK;
return status;
}
NV_STATUS uvm_conf_computing_maybe_rotate_channel_ivs(uvm_channel_t *channel)
{
return uvm_conf_computing_rotate_channel_ivs_below_limit(channel, uvm_conf_computing_channel_iv_rotation_limit, false);
}
NV_STATUS uvm_conf_computing_maybe_rotate_channel_ivs_retry_busy(uvm_channel_t *channel)
{
return uvm_conf_computing_rotate_channel_ivs_below_limit(channel, uvm_conf_computing_channel_iv_rotation_limit, true);
}

View File

@@ -42,9 +42,11 @@
// Use sizeof(UvmCslIv) to refer to the IV size.
#define UVM_CONF_COMPUTING_IV_ALIGNMENT 16
// SEC2 decrypt operation buffers are required to be 16-bytes aligned. CE
// encrypt/decrypt can be unaligned if the buffer lies in a single 32B segment.
// Otherwise, they need to be 32B aligned.
// SEC2 decrypt operation buffers are required to be 16-bytes aligned.
#define UVM_CONF_COMPUTING_SEC2_BUF_ALIGNMENT 16
// CE encrypt/decrypt can be unaligned if the entire buffer lies in a single
// 32B segment. Otherwise, it needs to be 32B aligned.
#define UVM_CONF_COMPUTING_BUF_ALIGNMENT 32
#define UVM_CONF_COMPUTING_DMA_BUFFER_SIZE UVM_VA_BLOCK_SIZE
@@ -58,12 +60,8 @@
// UVM_METHOD_SIZE * 2 * 10 = 80.
#define UVM_CONF_COMPUTING_SIGN_BUF_MAX_SIZE 80
// All GPUs derive confidential computing status from their parent.
// By current policy all parent GPUs have identical confidential
// computing status.
NV_STATUS uvm_conf_computing_init_parent_gpu(const uvm_parent_gpu_t *parent);
bool uvm_conf_computing_mode_enabled_parent(const uvm_parent_gpu_t *parent);
bool uvm_conf_computing_mode_enabled(const uvm_gpu_t *gpu);
void uvm_conf_computing_check_parent_gpu(const uvm_parent_gpu_t *parent);
bool uvm_conf_computing_mode_is_hcc(const uvm_gpu_t *gpu);
typedef struct
@@ -175,4 +173,45 @@ NV_STATUS uvm_conf_computing_cpu_decrypt(uvm_channel_t *channel,
const UvmCslIv *src_iv,
size_t size,
const void *auth_tag_buffer);
// CPU decryption of a single replayable fault, encrypted by GSP-RM.
//
// Replayable fault decryption depends not only on the encrypted fault contents,
// and the authentication tag, but also on the plaintext valid bit associated
// with the fault.
//
// When decrypting data previously encrypted by the Copy Engine, use
// uvm_conf_computing_cpu_decrypt instead.
//
// Locking: this function must be invoked while holding the replayable ISR lock.
NV_STATUS uvm_conf_computing_fault_decrypt(uvm_parent_gpu_t *parent_gpu,
void *dst_plain,
const void *src_cipher,
const void *auth_tag_buffer,
NvU8 valid);
// Increment the CPU-side decrypt IV of the CSL context associated with
// replayable faults. The function is a no-op if the given increment is zero.
//
// The IV associated with a fault CSL context is a 64-bit counter.
//
// Locking: this function must be invoked while holding the replayable ISR lock.
void uvm_conf_computing_fault_increment_decrypt_iv(uvm_parent_gpu_t *parent_gpu, NvU64 increment);
// Query the number of remaining messages before IV needs to be rotated.
void uvm_conf_computing_query_message_pools(uvm_channel_t *channel,
NvU64 *remaining_encryptions,
NvU64 *remaining_decryptions);
// Check if there are more than uvm_conf_computing_channel_iv_rotation_limit
// messages available in the channel and try to rotate if not.
NV_STATUS uvm_conf_computing_maybe_rotate_channel_ivs(uvm_channel_t *channel);
// Check if there are more than uvm_conf_computing_channel_iv_rotation_limit
// messages available in the channel and rotate if not.
NV_STATUS uvm_conf_computing_maybe_rotate_channel_ivs_retry_busy(uvm_channel_t *channel);
// Check if there are fewer than 'limit' messages available in either direction
// and rotate if not.
NV_STATUS uvm_conf_computing_rotate_channel_ivs_below_limit(uvm_channel_t *channel, NvU64 limit, bool retry_if_busy);
#endif // __UVM_CONF_COMPUTING_H__

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2019 NVIDIA Corporation
Copyright (c) 2016-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -34,24 +34,27 @@ NV_STATUS uvm_test_fault_buffer_flush(UVM_TEST_FAULT_BUFFER_FLUSH_PARAMS *params
NV_STATUS status = NV_OK;
uvm_va_space_t *va_space = uvm_va_space_get(filp);
uvm_gpu_t *gpu;
uvm_global_processor_mask_t retained_gpus;
uvm_processor_mask_t *retained_gpus;
NvU64 i;
uvm_global_processor_mask_zero(&retained_gpus);
retained_gpus = uvm_processor_mask_cache_alloc();
if (!retained_gpus)
return NV_ERR_NO_MEMORY;
uvm_processor_mask_zero(retained_gpus);
uvm_va_space_down_read(va_space);
for_each_va_space_gpu(gpu, va_space) {
if (gpu->parent->replayable_faults_supported)
uvm_global_processor_mask_set(&retained_gpus, gpu->global_id);
}
uvm_processor_mask_and(retained_gpus, &va_space->faultable_processors, &va_space->registered_gpus);
uvm_global_mask_retain(&retained_gpus);
uvm_global_gpu_retain(retained_gpus);
uvm_va_space_up_read(va_space);
if (uvm_global_processor_mask_empty(&retained_gpus))
return NV_ERR_INVALID_DEVICE;
if (uvm_processor_mask_empty(retained_gpus)) {
status = NV_ERR_INVALID_DEVICE;
goto out;
}
for (i = 0; i < params->iterations; i++) {
if (fatal_signal_pending(current)) {
@@ -59,11 +62,12 @@ NV_STATUS uvm_test_fault_buffer_flush(UVM_TEST_FAULT_BUFFER_FLUSH_PARAMS *params
break;
}
for_each_global_gpu_in_mask(gpu, &retained_gpus)
for_each_gpu_in_mask(gpu, retained_gpus)
TEST_CHECK_GOTO(uvm_gpu_fault_buffer_flush(gpu) == NV_OK, out);
}
out:
uvm_global_mask_release(&retained_gpus);
uvm_global_gpu_release(retained_gpus);
uvm_processor_mask_cache_free(retained_gpus);
return status;
}

View File

@@ -50,6 +50,7 @@ typedef struct uvm_channel_struct uvm_channel_t;
typedef struct uvm_user_channel_struct uvm_user_channel_t;
typedef struct uvm_push_struct uvm_push_t;
typedef struct uvm_push_info_struct uvm_push_info_t;
typedef struct uvm_push_crypto_bundle_struct uvm_push_crypto_bundle_t;
typedef struct uvm_push_acquire_info_struct uvm_push_acquire_info_t;
typedef struct uvm_pushbuffer_struct uvm_pushbuffer_t;
typedef struct uvm_gpfifo_entry_struct uvm_gpfifo_entry_t;

View File

@@ -1,5 +1,5 @@
/*******************************************************************************
Copyright (c) 2016-2021 NVidia Corporation
Copyright (c) 2016-2023 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
@@ -168,7 +168,8 @@ static NV_STATUS test_get_rm_ptes_single_gpu(uvm_va_space_t *va_space, UVM_TEST_
client = params->hClient;
memory = params->hMemory;
// Note: This check is safe as single GPU test does not run on SLI enabled devices.
// Note: This check is safe as single GPU test does not run on SLI enabled
// devices.
memory_mapping_gpu = uvm_va_space_get_gpu_by_uuid_with_gpu_va_space(va_space, &params->gpu_uuid);
if (!memory_mapping_gpu)
return NV_ERR_INVALID_DEVICE;
@@ -180,7 +181,7 @@ static NV_STATUS test_get_rm_ptes_single_gpu(uvm_va_space_t *va_space, UVM_TEST_
if (status != NV_OK)
return status;
TEST_CHECK_GOTO(uvm_processor_uuid_eq(&memory_info.uuid, &params->gpu_uuid), done);
TEST_CHECK_GOTO(uvm_uuid_eq(&memory_info.uuid, &params->gpu_uuid), done);
TEST_CHECK_GOTO((memory_info.size == params->size), done);

Some files were not shown because too many files have changed in this diff Show More