forked from KhronosGroup/Vulkan-Samples
Arm Release v1.3.0 (KhronosGroup#24)
- Updated licenses and sample CMakeLists
- Updated sample tutorials to be more clear about Mali-specific content
- Ordering samples now removes duplicates from the order list
- Render granularity only displays the first time, and it is now calculated correctly
- Added missing headers to CMake, variant definitions are now OO, removed superfluous dynamic code
- Shader program removed and reverted back inside pipeline layout
- Removed FAQ to be added as a separate MR
- Added contents to memory limits document
- Updated depth format function and integrated it into all existing depth buffer images
1 parent
4772121
commit 2d31f78
Showing 99 changed files with 3,052 additions and 931 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
<!-- | ||
- Copyright (c) 2019-2020, Arm Limited and Contributors | ||
- | ||
- SPDX-License-Identifier: Apache-2.0 | ||
- | ||
- Licensed under the Apache License, Version 2.0 (the "License"); | ||
- you may not use this file except in compliance with the License. | ||
- You may obtain a copy of the License at | ||
- | ||
- http://www.apache.org/licenses/LICENSE-2.0 | ||
- | ||
- Unless required by applicable law or agreed to in writing, software | ||
- distributed under the License is distributed on an "AS IS" BASIS, | ||
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
- See the License for the specific language governing permissions and | ||
- limitations under the License. | ||
- | ||
--> | ||
|
||
# Memory limits with Vulkan <!-- omit in toc --> | ||
|
||
## Contents <!-- omit in toc --> | ||
|
||
- [Mali GPUs](#mali-gpus) | ||
|
||
## Mali GPUs | ||
|
||
This article covers situations in which a Vulkan application might trigger an out-of-memory (OOM) condition on Mali GPUs, resulting in a `VK_ERROR_DEVICE_LOST` error, even if the API usage is correct. The OOM condition that developers hit most often is caused by a very high vertex load, which might be relatively common when porting Vulkan applications from desktop to mobile. | ||
|
||
Mali GPUs have a memory region which is available to store the intermediate geometry output from a render pass. This memory is used to store all of the varying data generated by vertex, tessellation, and geometry shading prior to fragment shading. Exceeding the size of this region may result in a `VK_ERROR_DEVICE_LOST`. The limit is fixed at 180 MB on current Mali GPUs, but it may be increased or lifted altogether in future GPUs. | ||
|
||
The reasoning behind this limit is that tile-based renderers need to write out and then read back intermediate geometry output, so vertex load correlates directly with memory bandwidth. For a typical program using 64 bytes of varying data per vertex, the 180 MB of intermediate storage can contain over 2 million vertices (180 MB / 64 B is roughly 2.8 million), which we expect to be enough for normal mobile application usage. | ||
We will now cover the reasons why such a vertex load is unlikely to be sustainable, and possible mitigations if your application is hitting it.
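
The capacity figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming that "180 MB" means decimal megabytes; the names `kIntermediateLimitBytes` and `max_vertices` are hypothetical and not part of any Mali or Vulkan API.

```cpp
#include <cstdint>

// Assumption: the 180 MB limit is read as 180 * 10^6 bytes.
constexpr std::uint64_t kIntermediateLimitBytes = 180ULL * 1000ULL * 1000ULL;

// Number of whole vertices whose varying data fits in the intermediate
// geometry region, for a given varying size per vertex (must be non-zero).
constexpr std::uint64_t max_vertices(std::uint64_t varyingBytesPerVertex)
{
    return kIntermediateLimitBytes / varyingBytesPerVertex;
}
```

With 64 bytes of varying data per vertex this yields 2,812,500 vertices; doubling the varying size to 128 bytes halves the capacity.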
|
||
Let us consider a vertex-heavy application with a single render pass per frame that reaches the 180 MB limit. Since the GPU has to write the data out and read it back from memory, this results in 2 × 180 = 360 MB per render pass, which at 30 FPS brings memory bandwidth up to 30 × 360 MB/s = 10.8 GB/s. Memory bandwidth has a direct correlation with power consumption, which can be estimated as 100 mW/(GB/s). This means that an application using 180 MB of varying data will consume at least 1.08 W, and this does not consider further contributions to memory bandwidth and general GPU power consumption. A mobile GPU cannot sustain such power usage without overheating, which would in turn cause a reduction of GPU frequency and a performance drop. | ||
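
The estimate above can be written as a small helper for plugging in your own numbers. This is only a sketch of the arithmetic in this section: the function and constant names are hypothetical, and the 100 mW/(GB/s) figure is the rough estimate quoted in the text, not a measured value for any particular device.

```cpp
#include <cassert>

// Rough estimate from the text: 100 mW per GB/s of memory bandwidth.
constexpr double kWattsPerGBps = 0.1;

// Bandwidth (GB/s) spent on intermediate geometry: the varying data is
// written out once and read back once per render pass.
inline double geometry_bandwidth_gb_per_s(double varyingMbPerPass, double passesPerSecond)
{
    assert(varyingMbPerPass >= 0.0 && passesPerSecond >= 0.0);
    return 2.0 * varyingMbPerPass * passesPerSecond / 1000.0;
}

// Lower bound on power (W) attributable to that bandwidth alone.
inline double bandwidth_power_watts(double gbPerSecond)
{
    return gbPerSecond * kWattsPerGBps;
}
```

For 180 MB per pass at 30 passes per second, `geometry_bandwidth_gb_per_s(180.0, 30.0)` evaluates to about 10.8 GB/s, and `bandwidth_power_watts` of that value to about 1.08 W, matching the figures above.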
|
||
The only real solution to the issue is to keep the application’s vertex count below approximately 2 million, as derived above for an average of 64 bytes of varying data per vertex. In scenarios where the memory storage is exceeded and reducing the vertex load is not feasible, we recommend that the application split the render pass into multiple render passes, each using a safe amount of intermediate storage. Later render passes can use `loadOp = VK_ATTACHMENT_LOAD_OP_LOAD` to restore the content of the framebuffer and continue rendering on top of earlier rendering. This form of incremental rendering might impact performance, due to the write-out and subsequent read-back of the color image. | ||
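
One way to plan such a split is sketched below, assuming the application already has a per-draw estimate of intermediate storage in bytes. Everything here is hypothetical illustration: `LoadOp` is a local stand-in for `VkAttachmentLoadOp` so that the sketch compiles without the Vulkan headers, and `PassPlan` and `split_into_passes` are not part of the samples framework.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for VkAttachmentLoadOp (CLEAR / LOAD) in this sketch.
enum class LoadOp { Clear, Load };

struct PassPlan
{
    LoadOp      loadOp;     // Clear for the first pass, Load for later ones
    std::size_t firstDraw;  // index of the first draw recorded in this pass
    std::size_t drawCount;  // number of consecutive draws in this pass
};

// Greedily packs consecutive draws into passes so that each pass stays
// within budgetBytes. A single draw larger than the budget still gets a
// pass of its own; it has to be reduced or subdivided by other means.
std::vector<PassPlan> split_into_passes(const std::vector<std::uint64_t> &drawBytes,
                                        std::uint64_t                     budgetBytes)
{
    assert(budgetBytes > 0);
    std::vector<PassPlan> passes;
    std::uint64_t         used = 0;
    for (std::size_t i = 0; i < drawBytes.size(); ++i)
    {
        if (passes.empty() || used + drawBytes[i] > budgetBytes)
        {
            // Later passes load the framebuffer written by earlier ones.
            passes.push_back({passes.empty() ? LoadOp::Clear : LoadOp::Load, i, 0});
            used = 0;
        }
        passes.back().drawCount += 1;
        used += drawBytes[i];
    }
    return passes;
}
```

Draw order is preserved, so blending and depth results stay the same as in the single-pass version; the cost is the extra color write-out and read-back at every pass boundary.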
|
||
If your vertex load is unpredictable and you are hitting `VK_ERROR_DEVICE_LOST` issues in the field, you can set up a scheme for estimating memory consumption for each draw call in a render pass, and then perform incremental rendering if the limit is reached. You should keep in mind that memory is allocated for all vertex indices between the min and max index referenced by a draw call, and for all generated vertices for tessellation and geometry shading, even if they are subsequently culled by the clipping and culling pass. Such an estimate will be conservative, as the actual amount of memory allocated might be lower, so we don’t recommend adding a further safety margin to the 180 MB limit. |
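
A minimal sketch of such a per-draw estimate for indexed draws follows. It covers only the min-to-max index range rule described above; vertices generated by tessellation or geometry shading would have to be added separately. The function names are hypothetical, and the limit is assumed to be 180 × 10^6 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: the 180 MB limit is read as 180 * 10^6 bytes.
constexpr std::uint64_t kIntermediateLimitBytes = 180ULL * 1000ULL * 1000ULL;

// Conservative estimate for one indexed draw: storage is counted for every
// vertex between the smallest and largest index referenced, inclusive.
std::uint64_t estimate_indexed_draw_bytes(const std::vector<std::uint32_t> &indices,
                                          std::uint64_t                     varyingBytesPerVertex)
{
    assert(varyingBytesPerVertex > 0);
    if (indices.empty())
    {
        return 0;
    }
    const auto          range = std::minmax_element(indices.begin(), indices.end());
    const std::uint64_t span  = std::uint64_t(*range.second) - std::uint64_t(*range.first) + 1;
    return span * varyingBytesPerVertex;
}

// True if the draw can be added to the current render pass without the
// running estimate exceeding the limit (no extra safety margin applied).
bool fits_in_current_pass(std::uint64_t usedBytes, std::uint64_t drawBytes)
{
    return usedBytes + drawBytes <= kIntermediateLimitBytes;
}
```

Note that a draw referencing only indices 10 and 500 is charged for all 491 vertices in between, so keeping each draw's index range compact lowers the estimate as well as the actual allocation.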