This blog post discusses how HLSL matrices are translated into SPIR-V for Vulkan consumption in the SPIR-V CodeGen of DirectXShaderCompiler. It is part of the “HLSL for Vulkan” series.

Matrix types are native to high-level shading languages, but not to GPU ISAs, which only perform operations on scalars and vectors. Intermediate languages, albeit higher level than GPU ISAs, make their own decisions about whether to retain matrix types: DXIL lowers matrices into vectors, while SPIR-V keeps them.

Translation Guidelines

For driver

Having native matrix types in SPIR-V does not necessarily mean that we must translate HLSL matrices into SPIR-V matrices, though. We could still lower HLSL matrices into SPIR-V vectors, which would avoid some of the confusion discussed later, but it loses high-level information and adds complexity to the CodeGen (because we would need to emulate all the native SPIR-V matrix instructions, like OpMatrixTimesMatrix, using vectors). GPU drivers arguably prefer high-level information so that they can perform optimizations more tailored to their architectures. With only lower-level information, they sometimes need to perform analyses to rediscover the high-level information. Therefore, we decided to

Use SPIR-V’s matrix types and instructions when possible.

For developer

Of course, aiding the driver is only half of a compiler’s job; we also need to make sure developers do not have to switch to another mindset when using HLSL to program Vulkan shaders: HLSL should be written the way it is written for DirectX, and data should be passed into the shader from the application the way it is in DirectX. So, we should

Translate in an approach intuitive and transparent to developers.

Here we use the behavior of fxc.exe as the definition of HLSL and of what developers expect, since HLSL does not have a publicly available language specification.

HLSL Matrices

Regarding HLSL matrices, there are a few aspects that need to be covered:

  • Initialization: external vs. internal
  • Majorness: row vs. column
  • Element type: boolean vs. integer vs. float

Initialization

Depending on the visibility of the matrix, it can be initialized either by the application or within the shader. Let’s call the former externally initialized and the latter internally initialized. From HLSL’s perspective,

  • Externally initialized matrices are in cbuffers, tbuffers, structured buffers, the $Globals cbuffer (declared as non-static global variables), and the $Params cbuffer (declared as uniform entry function parameters);
  • Internally initialized matrices are the rest, declared as global static or local variables.

Note that HLSL for DirectX supports supplying initializers for cbuffer members, a feature for which Vulkan has no equivalent. cbuffer member initializers simply trigger warnings and are ignored by the compiler. So a matrix cannot be initialized in both ways.

We need to differentiate these two kinds of matrices because they need different handling regarding majorness.

Majorness

External initialization

For externally initialized matrices, conceptually, we need to read the initialization data from the GPU memory. Data backing the matrices in memory are stored as a sequence of elements. Majorness determines how we group these elements into vectors and then matrices. For a floatMxN matrix,

  • Row-major means consecutive numbers group into row-vectors and then the matrix. That is, the first N elements group into the first row-vector, the next N elements group into the second row-vector, and so on. We have M such row-vectors in total.
  • Column-major means consecutive numbers group into column-vectors and then the matrix. That is, the first M elements group into the first column-vector, the next M elements group into the second column-vector, and so on. We have N such column-vectors in total.

For ease of discussion, let’s say that the matrix in GPU memory is in storage form, and the matrix in the shader after initialization is in mathematical form.

Using float2x3 as an example:

// Data on GPU memory
{1, 2, 3, 4, 5, 6}

// -----

// Storage form for row-major float2x3
{{1, 2, 3}, {4, 5, 6}}

// Mathematical form for row-major float2x3
[ 1, 2, 3,
  4, 5, 6 ]

// -----

// Storage form for column-major float2x3
{{1, 2}, {3, 4}, {5, 6}}

// Mathematical form for column-major float2x3
[ 1, 3, 5,
  2, 4, 6 ]

Majorness only matters for externally initialized matrices, because it controls how they transform from the storage form to the mathematical form.
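
The grouping rules above can be modeled in a few lines of Python. This is only a sketch for illustration: the function name to_mathematical is hypothetical, and the data is assumed to be tightly packed (real buffers may have padding between rows or columns).

```python
def to_mathematical(data, rows, cols, majorness):
    """Group a flat element sequence into a rows x cols matrix (a list of rows).

    Assumption for this sketch: elements are tightly packed, with no padding.
    """
    if len(data) != rows * cols:
        raise ValueError("element count does not match matrix size")
    if majorness == "row":
        # Consecutive elements form row-vectors of length `cols`.
        return [list(data[r * cols:(r + 1) * cols]) for r in range(rows)]
    if majorness == "column":
        # Consecutive elements form column-vectors of length `rows`,
        # so element (r, c) sits at offset c * rows + r.
        return [[data[c * rows + r] for c in range(cols)] for r in range(rows)]
    raise ValueError("majorness must be 'row' or 'column'")


data = [1, 2, 3, 4, 5, 6]
print(to_mathematical(data, 2, 3, "row"))     # [[1, 2, 3], [4, 5, 6]]
print(to_mathematical(data, 2, 3, "column"))  # [[1, 3, 5], [2, 4, 6]]
```

The two results match the mathematical forms of the row-major and column-major float2x3 shown above.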

Internal initialization

For internally initialized matrices, the initialization data is already embedded in the shader. Conceptually, we don’t need to fetch the data from GPU memory again; instead, the values are placed in GPU registers as immediate values.

Internally initialized matrices are populated no differently from other structured types: element by element. For a floatMxN matrix mat, the first element in the initializer populates mat[0][0], the second element populates mat[0][1], the Nth element populates mat[0][N-1], the (N+1)th element populates mat[1][0], and so on.

Initializer

Sometimes it is the flexibility of HLSL syntax that causes confusion about majorness. The compiler is happy with an initializer as long as its total number of elements agrees with the matrix being initialized. With that, we can group elements in the initializer almost arbitrarily, even if it means further decomposing some element into components. For example,

static float4   vec  = {2, 3, 4, 5};
static float2x3 mat1 = {1, 2, 3, 4, 5, 6};
static float2x3 mat2 = {{1}, {2, 3, {4, 5}}, 6};
static float2x3 mat3 = {1, vec, 6};

fxc.exe accepts all of the above initializers. It is natural to ask which majorness we should use to initialize mat1. Should it be column-major, since column-major is the default for externally initialized matrices? Actually, majorness does not matter here, since this is just element-wise initialization. fxc.exe agrees. For example, for the following source code:

static column_major float2x2 mat1 = {1, 2, 3, 4};
static row_major    float2x2 mat2 = {1, 2, 3, 4};

void main(
  out float4 v1 : A,
  out float4 v2 : B
) {
  v1 = float4(mat1[0][0], mat1[0][1], mat1[1][0], mat1[1][1]);
  v2 = float4(mat2[0][0], mat2[0][1], mat2[1][0], mat2[1][1]);
}

The output of fxc.exe is

// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// A                        0   xyzw        0     NONE   float   xyzw
// B                        0   xyzw        1     NONE   float   xyzw
//
vs_5_1
dcl_globalFlags refactoringAllowed
dcl_output o0.xyzw
dcl_output o1.xyzw
mov o0.xyzw, l(1.000000,2.000000,3.000000,4.000000)
mov o1.xyzw, l(1.000000,2.000000,3.000000,4.000000)
ret

The row_major/column_major modifier is simply ignored. The same holds for the -Zpr/-Zpc command-line options.
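
Element-wise initialization can be modeled as flattening the initializer and then filling the matrix in index order. The Python sketch below is an illustration only (flatten and init_internal are hypothetical names, and Python lists and tuples stand in for HLSL braces and vectors); note that it takes no majorness parameter at all.

```python
def flatten(init):
    """Yield the scalar elements of a nested initializer in source order."""
    for item in init:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def init_internal(init, rows, cols):
    """Fill mat[0][0], mat[0][1], ... one element at a time."""
    flat = list(flatten(init))
    if len(flat) != rows * cols:
        raise ValueError("initializer element count mismatch")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


vec = (2, 3, 4, 5)
mat1 = init_internal([1, 2, 3, 4, 5, 6], 2, 3)
mat2 = init_internal([[1], [2, 3, [4, 5]], 6], 2, 3)
mat3 = init_internal([1, vec, 6], 2, 3)
print(mat1)                   # [[1, 2, 3], [4, 5, 6]]
print(mat1 == mat2 == mat3)   # True
```

All three initializers from the earlier float2x3 example produce the same matrix, because only the flattened element order matters.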

SPIR-V Matrices

SPIR-V OpTypeMatrix is column-oriented; it takes a “column type” and a “column count” as parameters:

  • Column Type is the type of each column in the matrix. It must be vector type.
  • Column Count is the number of columns in the new matrix type. It must be at least 2.
  • Matrix columns are numbered consecutively, starting with 0. This is true independently of any Decorations describing the memory layout of a matrix.

So with one-level indexing into a SPIR-V matrix, we get a column vector. This is fundamentally different from HLSL matrices, which are row-oriented: one-level indexing into an HLSL matrix gives us a row vector.

Because of these differences, translating HLSL matrices into SPIR-V ones is not straightforward.

Furthermore, the SPIR-V specification has a few rules regarding the use of OpTypeMatrix (in “2.16. Validation Rules”) that prevent us from having a single unified way of translating HLSL matrices into SPIR-V matrices.

Validation rules

  • Matrix types can only be parameterized with floating-point types.
  • Matrix types can only be parameterized as having only 2, 3, or 4 columns.
  • Composite objects in the StorageBuffer, UniformConstant, Uniform, and PushConstant Storage Classes must be explicitly laid out. The following apply to all the aggregate and matrix types describing such an object, recursively through their nested types:
    • Each structure-type member that is a matrix or array-of-matrices must be decorated with a MatrixStride Decoration and one of the RowMajor or ColMajor Decorations.

A few Vulkan/SPIR-V terms above deserve some explanation:

Storage class dictates the type of memory:

  • HLSL textures and samplers will be put in the UniformConstant storage class;
  • HLSL structured buffers will be put in the Uniform storage class.

(See the tables in the mapping doc for details of how all HLSL resource types are mapped to Vulkan resource types and their corresponding storage classes.)

Layout decorations describe how matrix data is arranged in memory:

  • RowMajor indicates that components within a row are contiguous in memory.
  • ColMajor indicates that components within a column are contiguous in memory.
  • MatrixStride specifies the stride of rows in a RowMajor-decorated matrix, or columns in a ColMajor-decorated matrix.

Element type

So, for float matrices, we can use SPIR-V OpTypeMatrix and all the matrix instructions like OpVectorTimesMatrix, OpMatrixTimesVector, OpMatrixTimesMatrix, etc. Unfortunately, boolean/integer matrices cannot enjoy such luxury because of the validation rules shown above. We translate them into arrays of vectors: an HLSL MxN matrix turns into an SPIR-V OpTypeArray of M N-component vectors. This means we also need to emulate all the nice matrix instructions ourselves.
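
To give a feel for what this emulation involves, here is a rough Python model of multiplying integer matrices stored as arrays of row vectors. It is a sketch of the idea only, not the instruction sequence the CodeGen actually emits, and the function names are hypothetical.

```python
def mul_matrix_vector(rows, vec):
    """An MxN matrix stored as M row vectors times an N-component vector.

    Each result component is the dot product of one row with the vector.
    """
    return [sum(a * b for a, b in zip(row, vec)) for row in rows]


def mul_matrix_matrix(lhs_rows, rhs_rows):
    """MxK times KxN, both stored as arrays of row vectors."""
    rhs_cols = list(zip(*rhs_rows))  # gather the columns of the right operand
    return [[sum(a * b for a, b in zip(row, col)) for col in rhs_cols]
            for row in lhs_rows]


print(mul_matrix_vector([[1, 2, 3], [4, 5, 6]], [1, 0, 2]))  # [7, 16]
print(mul_matrix_matrix([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

With OpTypeMatrix, each of these would be a single SPIR-V instruction; with arrays of vectors, the dot products have to be spelled out per row.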

Float matrix operations

For float matrices, although we can use OpTypeMatrix, the translation is not straightforward because of the differences explained in the beginning of this section.

The translation must be functionally correct. This means, for the same HLSL source code with the same data from the application, we should have the same behavior as fxc.exe. Then we need to get the following language features correct:

  • Matrix indexing
  • Matrix per-element operations
  • Matrix multiplication
  • Matrix majorness modifiers

Indexing

Among them, indexing is the most flexible one: we can have multiple forms/ways to index into a matrix, like ._mMN, ._MN, [M][N], [M].yyxx, etc. Thus it is more likely to cause problems for the CodeGen than others. So we chose our translation scheme to satisfy indexing correctness first.

We have two approaches to representing a floatMxN matrix mat in SPIR-V:

// 1st approach
%vec1 = OpTypeVector %float N // Column vector with N elements
%mat1 = OpTypeMatrix %vec1  M // M columns

// 2nd approach
%vec2 = OpTypeVector %float M // Column vector with M elements
%mat2 = OpTypeMatrix %vec2  N // N columns

The 2nd approach represents a floatMxN as a matrix of M rows and N columns. It’s nice that we have a consistent mathematical representation here, but unfortunately, it breaks indexing. Let’s say we are trying to get mat[i][j]. Clearly in the source code we have 0 <= i < M and 0 <= j < N. But for %mat2 in SPIR-V, we actually have 0 <= i < N and 0 <= j < M. Considering further that the source can copy the whole mat[i] vector and then reference its jth element behind complicated control flow, this approach is just unmanageable.

That leaves us with the first representation, which is essentially a transpose of the original matrix: %mat1 is a matrix of N rows and M columns. But we have the correct indexing behavior. Accessing mat[i][j] can be translated into indexing into %mat1 first by i and then by j, and we get the correct element if the matrix is initialized in the transposed manner (to be discussed later).
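
The difference between the two approaches can be seen in a small Python model where a SPIR-V matrix is a list of column vectors. The helper names are hypothetical and the model is only a sketch of the indexing behavior.

```python
def first_approach(hlsl_rows):
    """M columns of N elements: each HLSL row becomes one SPIR-V column."""
    return [list(row) for row in hlsl_rows]


def second_approach(hlsl_rows):
    """N columns of M elements: the mathematically consistent layout."""
    return [list(col) for col in zip(*hlsl_rows)]


def spirv_index(columns, i, j):
    """One-level indexing selects a column; the second index a component."""
    return columns[i][j]


# HLSL float2x3 in mathematical form; mat[1][2] is 6 in the source code.
mat = [[1, 2, 3],
       [4, 5, 6]]

m1 = first_approach(mat)   # [[1, 2, 3], [4, 5, 6]]
m2 = second_approach(mat)  # [[1, 4], [2, 5], [3, 6]]

print(spirv_index(m1, 1, 2))  # 6: the source indices can be used unchanged
print(spirv_index(m2, 2, 1))  # 6: the indices had to be swapped
```

With the second approach, using the source indices directly (column 1, component 2) would be out of range, because each column only has two components. Every access path would need its indices swapped.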

Per-element operations

Operations conducted in a per-element manner, like multiplying the matrix by a scalar, just work naturally, since the same operation is applied to each element.

Multiplication

Because an HLSL matrix mat is actually represented as transpose(mat) in SPIR-V, the HLSL matrix multiplication mul(mat1, mat2), mathematically mat1 * mat2, must swap its operands in SPIR-V: transpose(mat2) * transpose(mat1), which equals transpose(mat1 * mat2), exactly how mat1 * mat2 should be represented in SPIR-V.
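
The identity behind the operand swap is easy to check numerically. The following Python snippet is just a sanity check with arbitrary example values; transpose and matmul are small helpers written for this illustration.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain mathematical matrix product of two lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


mat1 = [[1, 2, 3],
        [4, 5, 6]]   # 2x3
mat2 = [[1, 0],
        [2, 1],
        [0, 3]]      # 3x2

swapped = matmul(transpose(mat2), transpose(mat1))
expected = transpose(matmul(mat1, mat2))
print(swapped)              # [[5, 14], [11, 23]]
print(swapped == expected)  # True
```

Multiplying the transposed operands in swapped order gives the transposed product, which is the form in which the result should be held in SPIR-V.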

Initialization

Initializing the matrix in the transposed manner is key to getting float matrix calculations correct.

For the SPIR-V matrix %mat from an internally initialized floatMxN matrix, we should initialize %mat[0][0] with the first element, %mat[0][N-1] with the Nth element, %mat[1][0] with the (N+1)th element, and so on. That means we need to group the first N elements as the first column-vector, the next N elements as the second column-vector, and so on.

// HLSL
static float2x3 mat = {1, 2, 3, 4, 5, 6};

// SPIR-V
%vec1 = OpCompositeConstruct %v3float %float_1 %float_2 %float_3
%vec2 = OpCompositeConstruct %v3float %float_4 %float_5 %float_6
%mat  = OpCompositeConstruct %mat2v3float %vec1 %vec2

The above is just nice and natural: it populates the elements in the matrix one by one following the “ᴎ” pattern; just what we want.

For externally initialized matrices, majorness is involved.

Majorness

As mentioned previously, majorness only matters for externally initialized matrices. This agrees with the SPIR-V validation rules, which require matrices inside shader resources to be explicitly laid out, but not shader-internal ones (inside the Function or Private storage class).

Float matrices

But as we need to swap the multiplication operands, we also need to flip the majorness decoration in SPIR-V to make sure matrices are initialized in the transposed manner:

  • HLSL row_major should be translated into SPIR-V ColMajor;
  • HLSL column_major should be translated into SPIR-V RowMajor.

An example will make it clear:

// Data on GPU memory
{1, 2, 3, 4, 5, 6}

// --- HLSL ---

// Storage form for row-major float2x3
{{1, 2, 3}, {4, 5, 6}}

// Mathematical form for row-major float2x3
[ 1, 2, 3,
  4, 5, 6 ]

// -----

// Storage form for column-major float2x3
{{1, 2}, {3, 4}, {5, 6}}

// Mathematical form for column-major float2x3
[ 1, 3, 5,
  2, 4, 6 ]

// --- SPIR-V ---

// Storage form for RowMajor %mat2v3float
{{1, 2}, {3, 4}, {5, 6}}

// Mathematical form for RowMajor %mat2v3float
[ 1, 2,
  3, 4,
  5, 6 ]

// -----

// Storage form for ColMajor %mat2v3float
{{1, 2, 3}, {4, 5, 6}}

// Mathematical form for ColMajor %mat2v3float
[ 1, 4,
  2, 5,
  3, 6 ]

It’s clear from the above that row_major float2x3 should be represented as ColMajor %mat2v3float, and column_major float2x3 should be represented as RowMajor %mat2v3float, to achieve transposed initialization.
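
The same conclusion can be checked with a small Python model of loading a matrix from memory. This is a sketch under the assumption of tightly packed data (no MatrixStride padding); the load helper is hypothetical and simply interprets a flat sequence according to a majorness.

```python
def load(data, rows, cols, row_major):
    """Interpret tightly packed `data` as a rows x cols matrix (list of rows)."""
    if row_major:
        return [[data[r * cols + c] for c in range(cols)] for r in range(rows)]
    return [[data[c * rows + r] for c in range(cols)] for r in range(rows)]


def transpose(m):
    return [list(col) for col in zip(*m)]


data = [1, 2, 3, 4, 5, 6]

# HLSL float2x3 has 2 rows and 3 columns.
hlsl_row_major = load(data, 2, 3, row_major=True)
hlsl_col_major = load(data, 2, 3, row_major=False)

# SPIR-V %mat2v3float has 2 columns of 3 components: 3 rows and 2 columns.
spirv_col_major = load(data, 3, 2, row_major=False)
spirv_row_major = load(data, 3, 2, row_major=True)

print(spirv_col_major == transpose(hlsl_row_major))  # True
print(spirv_row_major == transpose(hlsl_col_major))  # True
```

Flipping the decoration yields exactly the transpose of the HLSL mathematical form, which is what the transposed representation needs.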

Bool/integer matrices

The above is for float matrices, though. We don’t have the RowMajor/ColMajor decorations for boolean/integer matrices, since they are translated into arrays of vectors. For them, we need to handle the source code row_major/column_major modifiers ourselves, similarly to what the driver does for RowMajor/ColMajor decorations. Note that we do not want to perform transposed initialization here, since we are not using OpTypeMatrix.

So for an HLSL row_major MxN boolean/integer matrix, we just need to take the first N consecutive elements to initialize the first vector in the array, the next N consecutive elements to initialize the second vector in the array, and so on. Essentially, we don’t need to do anything special.

But for an HLSL column_major MxN boolean/integer matrix, we need to take the 1st, (M+1)th, (M*2+1)th, …, (M*(N-1)+1)th elements to compose the first vector, and similarly for the other vectors.
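
As a worked example of that gathering rule, the Python sketch below lists which elements (as 1-based positions in memory) make up each vector of the array for a column_major MxN matrix. The function name is hypothetical and the data is assumed to be tightly packed.

```python
def column_major_positions(m, n):
    """1-based memory positions of the elements forming each of the M vectors.

    Vector r (0-based) takes element r of every column, and column c
    starts at 0-based offset c * m.
    """
    return [[c * m + r + 1 for c in range(n)] for r in range(m)]


print(column_major_positions(2, 3))  # [[1, 3, 5], [2, 4, 6]]
print(column_major_positions(3, 4))  # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```

For a column_major 2x3 integer matrix, the first vector gathers the 1st, 3rd, and 5th elements, matching the formula above with M = 2 and N = 3.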

Summary

To summarize what was discussed in the above sections:

HLSL Element Type  HLSL Initialization  Majorness  SPIR-V Type            SPIR-V Decoration
-----------------  -------------------  ---------  ---------------------  -----------------
Float              Internal             Row        OpTypeMatrix           N/A
Float              Internal             Column     OpTypeMatrix           N/A
Float              External             Row        OpTypeMatrix           ColMajor
Float              External             Column     OpTypeMatrix           RowMajor
Bool/Integer       Internal             Row        Array of OpTypeVector  N/A
Bool/Integer       Internal             Column     Array of OpTypeVector  N/A
Bool/Integer       External             Row        Array of OpTypeVector  N/A
Bool/Integer       External             Column     Array of OpTypeVector  N/A

Takeaways

Due to the fundamental differences between HLSL matrices and SPIR-V matrices (HLSL matrices are row-oriented while SPIR-V matrices are column-oriented) and additional requirements over matrix types in SPIR-V, we don’t have a straightforward and unified translation scheme for HLSL matrices.

  • HLSL float matrices are translated into SPIR-V OpTypeMatrix types in a transposed manner, which requires corresponding special handling of matrix features:
    • Operands in matrix multiplication need to be swapped.
    • Majorness decorations need to be swapped.
  • HLSL boolean/integer matrices are translated into SPIR-V OpTypeArray types of OpTypeVector elements.

With the above translation scheme, we retain source code high-level information as much as we can, and the SPIR-V code should work transparently for developers.