# HLSL for Vulkan: Matrices

## Contents

This blog post discusses how HLSL matrices are translated into SPIR-V for Vulkan consumption in the SPIR-V CodeGen of DirectXShaderCompiler. It is part of the “HLSL for Vulkan” series.

Matrix types are native to high-level shading languages, but not to GPU ISAs, which only perform operations on scalars and vectors. Intermediate languages, although higher level than GPU ISAs, each decide for themselves whether to retain matrix types: DXIL lowers matrices into vectors, while SPIR-V keeps them.

# Translation Guidelines

## For driver

Having native matrix types in SPIR-V does not necessarily mean that we must
translate HLSL matrices into SPIR-V matrices, though. We can still lower
HLSL matrices into SPIR-V vectors, which helps to reduce some of the confusion
(to be discussed later), but loses high-level information and brings
complexity into the CodeGen (because we need to emulate all the native SPIR-V
matrix instructions like `OpMatrixTimesMatrix` using vectors). GPU drivers
arguably prefer high-level information so that they can perform optimizations
more tailored to their architectures. With only lower-level information, they
sometimes need to perform analyses to rediscover high-level information.
Therefore, we decided to

Use SPIR-V’s matrix types and instructions when possible.

## For developer

Of course, aiding the driver is only half of a compiler's job; we also need to make sure developers do not have to switch to another mindset when using HLSL to program Vulkan shaders: HLSL should be written the way it is written for DirectX, and data should be passed into the shader from the application the way it is in DirectX. So, we should

Translate in an approach intuitive and transparent to developers.

Here we use the behavior of `fxc.exe` as the definition of HLSL and what
developers will expect since HLSL does not have a language specification
publicly available.

# HLSL Matrices

Regarding HLSL matrices, there are a few aspects that need to be covered:

- Initialization: external vs. internal
- Majorness: row vs. column
- Element type: boolean vs. integer vs. float

## Initialization

Depending on the visibility of the matrix, it can be initialized either by
the application or within the shader. Let’s call the former *externally
initialized* and the latter *internally initialized*. From HLSL’s perspective,

- Externally initialized matrices are in `cbuffer`s, `tbuffer`s, structured buffers, the `$Globals` cbuffer (declared as non-`static` global variables), and the `$Params` cbuffer (declared as `uniform` entry function parameters);
- Internally initialized matrices are the rest, declared as global `static` or local variables.

Note that HLSL for DirectX supports supplying initializers for `cbuffer`
members, a feature for which Vulkan has no equivalent. `cbuffer` member
initializers will simply trigger warnings and be ignored by the compiler.
So, we cannot have matrices that can be initialized both ways.

We need to differentiate these two kinds of matrices because they need different handling regarding majorness.

## Majorness

### External initialization

For externally initialized matrices, **conceptually**, we need to read the
initialization data from the GPU memory. Data backing the matrices in memory
are stored as a sequence of elements. Majorness determines how we group these
elements into vectors and then matrices. For a `floatMxN` matrix,

- Row-major means consecutive numbers group into row-vectors and then the matrix. That is, the first `N` elements group into the first row-vector, the next `N` elements group into the second row-vector, and so on. We have `M` such row-vectors in total.
- Column-major means consecutive numbers group into column-vectors and then the matrix. That is, the first `M` elements group into the first column-vector, the next `M` elements group into the second column-vector, and so on. We have `N` such column-vectors in total.

For ease of discussion, let’s say the matrix in GPU memory is in
*storage form*, and the matrix in the shader after initialization is in
*mathematical form*.

Using `float2x3` as an example:

```
// Data on GPU memory
{1, 2, 3, 4, 5, 6}
// -----
// Storage form for row-major float2x3
{{1, 2, 3}, {4, 5, 6}}
// Mathematical form for row-major float2x3
[ 1, 2, 3,
  4, 5, 6 ]
// -----
// Storage form for column-major float2x3
{{1, 2}, {3, 4}, {5, 6}}
// Mathematical form for column-major float2x3
[ 1, 3, 5,
  2, 4, 6 ]
```

Majorness only matters for externally initialized matrices, because it controls how they transform from the storage form to the mathematical form.
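The grouping rule can be modeled outside the shader. The following is a minimal Python sketch, not compiler code; the helper name `to_mathematical_form` is made up for illustration, and the data is assumed to be tightly packed with no padding between vectors.

```python
def to_mathematical_form(data, rows, cols, row_major):
    """Return the matrix as a list of rows, read from a flat element sequence.

    Hypothetical helper for illustration; assumes tightly packed data.
    """
    assert len(data) == rows * cols
    if row_major:
        # Consecutive elements form rows: element (r, c) sits at r * cols + c.
        return [[data[r * cols + c] for c in range(cols)] for r in range(rows)]
    # Consecutive elements form columns: element (r, c) sits at c * rows + r.
    return [[data[c * rows + r] for c in range(cols)] for r in range(rows)]


data = [1, 2, 3, 4, 5, 6]
print(to_mathematical_form(data, 2, 3, row_major=True))   # [[1, 2, 3], [4, 5, 6]]
print(to_mathematical_form(data, 2, 3, row_major=False))  # [[1, 3, 5], [2, 4, 6]]
```

The two printed results match the mathematical forms of the row-major and column-major `float2x3` shown above.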

### Internal initialization

For internally initialized matrices, we have already embedded the matrix
initialization data in the shader. **Conceptually**, we don’t need to get the
data from GPU memory again; instead, it is placed in GPU registers as
immediate values.

Internally initialized matrices are populated no differently from other
structured types: element by element. For a `floatMxN` matrix `mat`, the first
element in the initializer populates `mat[0][0]`, the second element populates
`mat[0][1]`, the `N`th element populates `mat[0][N-1]`, the `N+1`th element
populates `mat[1][0]`, and so on.

### Initializer

It is sometimes the flexibility of HLSL syntax that causes confusion about majorness. The compiler is happy with an initializer as long as its total number of elements agrees with the matrix being initialized. With that, we can group elements in the initializer almost arbitrarily, even if it means further decomposing some elements into components. For example,

```
static float4 vec = {2, 3, 4, 5};
static float2x3 mat1 = {1, 2, 3, 4, 5, 6};
static float2x3 mat2 = {{1}, {2, 3, {4, 5}}, 6};
static float2x3 mat3 = {1, vec, 6};
```

`fxc.exe` accepts all of the above initializers. It is natural to ask
what majorness we should use to initialize `mat1`. Should it be column-major
since column-major is the default for externally initialized matrices? Actually,
majorness does not matter here since this is just element-wise initialization.
`fxc.exe` also agrees with that. For example, for the following source code:

```
static column_major float2x2 mat1 = {1, 2, 3, 4};
static row_major float2x2 mat2 = {1, 2, 3, 4};
void main(
    out float4 v1 : A,
    out float4 v2 : B
) {
    v1 = float4(mat1[0][0], mat1[0][1], mat1[1][0], mat1[1][1]);
    v2 = float4(mat2[0][0], mat2[0][1], mat2[1][0], mat2[1][1]);
}
```

The output of `fxc.exe` is

```
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// A                        0   xyzw        0     NONE   float   xyzw
// B                        0   xyzw        1     NONE   float   xyzw
//
vs_5_1
dcl_globalFlags refactoringAllowed
dcl_output o0.xyzw
dcl_output o1.xyzw
mov o0.xyzw, l(1.000000,2.000000,3.000000,4.000000)
mov o1.xyzw, l(1.000000,2.000000,3.000000,4.000000)
ret
```

The `row_major`/`column_major` modifier is just ignored. The same holds for the
`-Zpr`/`-Zpc` command-line options.
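The element-wise behavior described in this section can be modeled as "flatten the initializer, then fill the matrix row by row". The Python sketch below is only a model of that behavior, not how `fxc.exe` is implemented; `flatten` and `init_matrix` are hypothetical helpers, and nested lists stand in for HLSL braces and vector operands.

```python
def flatten(initializer):
    """Recursively flatten nested groups into a flat list of scalars."""
    flat = []
    for item in initializer:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def init_matrix(initializer, rows, cols):
    """Populate a rows x cols matrix element by element, ignoring majorness."""
    flat = flatten(initializer)
    if len(flat) != rows * cols:
        raise ValueError("initializer must supply exactly rows * cols elements")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


vec = [2, 3, 4, 5]
mat1 = init_matrix([1, 2, 3, 4, 5, 6], 2, 3)
mat2 = init_matrix([[1], [2, 3, [4, 5]], 6], 2, 3)
mat3 = init_matrix([1, vec, 6], 2, 3)
print(mat1 == mat2 == mat3)  # True
print(mat1)                  # [[1, 2, 3], [4, 5, 6]]
```

All three initializers from the earlier HLSL snippet produce the same matrix, because only the order of the scalars matters.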

# SPIR-V Matrices

SPIR-V `OpTypeMatrix` is *column-oriented*; it takes a “column type” and
a “column count” as parameters:

- Column Type is the type of each column in the matrix. It must be vector type.
- Column Count is the number of columns in the new matrix type. It must be at least 2.
- Matrix columns are numbered consecutively, starting with 0. This is true independently of any Decorations describing the memory layout of a matrix.

So with one-level indexing into a SPIR-V matrix, we get a column vector.
This is fundamentally different from HLSL matrices, which are *row-oriented*:
one-level indexing into an HLSL matrix gives us a row vector.

Because of these differences, translating HLSL matrices into SPIR-V ones is not straightforward.

Furthermore, the SPIR-V specification has a few rules on using `OpTypeMatrix`
(in “2.16. Validation Rules”) that prevent us from having a unified way of
translating HLSL matrices into SPIR-V matrices.

## Validation rules

- Matrix types can only be parameterized with floating-point types.
- Matrix types can only be parameterized as having only 2, 3, or 4 columns.
- Composite objects in the `StorageBuffer`, `UniformConstant`, `Uniform`, and `PushConstant` Storage Classes must be explicitly laid out. The following apply to all the aggregate and matrix types describing such an object, recursively through their nested types:
  - Each structure-type member that is a matrix or array-of-matrices must be decorated with a `MatrixStride` Decoration and one of the `RowMajor` or `ColMajor` Decorations.

There are a few Vulkan/SPIR-V terms in the above worth some explanations:

Storage class dictates the type of memory:

- HLSL textures and samplers will be put in the `UniformConstant` storage class;
- HLSL structured buffers will be put in the `Uniform` storage class.

(See the tables in the mapping doc for details of how all HLSL resource types map to Vulkan resource types and their corresponding storage classes.)

`RowMajor` indicates that components within a row are contiguous in memory. `ColMajor` indicates that components within a column are contiguous in memory. `MatrixStride` specifies the stride of rows in a `RowMajor`-decorated matrix, or columns in a `ColMajor`-decorated matrix.
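To make the three decorations concrete, here is a small Python sketch of how they determine the byte offset of one matrix element. The function name `element_offset`, the 4-byte float size, and the 16-byte stride are assumptions chosen for illustration. Rows and columns refer to the SPIR-V matrix itself.

```python
def element_offset(row, col, matrix_stride, elem_size, decoration):
    """Byte offset of element (row, col) from the start of the matrix.

    Hypothetical helper: models RowMajor/ColMajor plus MatrixStride only.
    """
    if decoration == "RowMajor":
        # Rows are contiguous; MatrixStride is the distance between rows.
        return row * matrix_stride + col * elem_size
    if decoration == "ColMajor":
        # Columns are contiguous; MatrixStride is the distance between columns.
        return col * matrix_stride + row * elem_size
    raise ValueError("decoration must be 'RowMajor' or 'ColMajor'")


# Assumed example values: 4-byte floats and a MatrixStride of 16 bytes.
print(element_offset(2, 1, 16, 4, "ColMajor"))  # 24
print(element_offset(2, 1, 16, 4, "RowMajor"))  # 36
```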

## Element type

So, for float matrices, we can use SPIR-V `OpTypeMatrix` and all the matrix
instructions like `OpVectorTimesMatrix`, `OpMatrixTimesVector`,
`OpMatrixTimesMatrix`, etc. Unfortunately, boolean/integer matrices cannot
enjoy such luxury because of the validation rules shown above.
We translate them into arrays of vectors: an HLSL `MxN` matrix turns into
a SPIR-V `OpTypeArray` of `M` `N`-component vectors. This means we need
to emulate all the nice matrix instructions ourselves, too.
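As a rough idea of what such emulation involves, the Python sketch below multiplies an integer matrix stored as an array of `M` row vectors by a vector, using only per-vector operations. It is an illustration under that representation, not the instruction sequence the CodeGen actually emits; all function names are made up.

```python
def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def matrix_times_vector(rows, vec):
    """(M x N matrix) * (N-vector): one dot product per stored row vector."""
    return [dot(row, vec) for row in rows]


def vector_times_matrix(vec, rows):
    """(M-vector) * (M x N matrix): scale each row vector and accumulate."""
    result = [0] * len(rows[0])
    for scale, row in zip(vec, rows):
        result = [acc + scale * x for acc, x in zip(result, row)]
    return result


int2x3 = [[1, 2, 3], [4, 5, 6]]  # array of M=2 vectors with N=3 components
print(matrix_times_vector(int2x3, [1, 0, 2]))  # [7, 16]
print(vector_times_matrix([1, 2], int2x3))     # [9, 12, 15]
```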

## Float matrix operations

For float matrices, although we can use `OpTypeMatrix`, the translation is
not straightforward because of the differences explained at the beginning
of this section.

The translation must be functionally correct. This means, for the same HLSL
source code with the same data from the application, we should have the same
behavior as `fxc.exe`. So we need to get the following language features
correct:

- Matrix indexing
- Matrix per-element operations
- Matrix multiplication
- Matrix majorness modifiers

### Indexing

Among them, indexing is the most flexible one: we can have multiple
forms/ways to index into a matrix, like `._mMN`, `._MN`, `[M][N]`, `[M].yyxx`,
etc. Thus it is more likely to cause problems for the CodeGen than the others.
So we chose our translation scheme to satisfy indexing correctness first.

We have two approaches to represent a `floatMxN` matrix `mat` in SPIR-V:

```
// 1st approach
%vec1 = OpTypeVector %float N // Column vector with N elements
%mat1 = OpTypeMatrix %vec1 M // M columns
// 2nd approach
%vec2 = OpTypeVector %float M // Column vector with M elements
%mat2 = OpTypeMatrix %vec2 N // N columns
```

The 2nd approach represents a `floatMxN` as a matrix of `M` rows and `N` columns.
It’s nice that we have a consistent mathematical representation here, but
unfortunately, it breaks indexing. Let’s say we are trying to get `mat[i][j]`.
Clearly in the source code we have `0 <= i < M` and `0 <= j < N`.
But for `%mat2` in SPIR-V, we actually have `0 <= i < N` and `0 <= j < M`.
Further considering that we can copy the whole `mat[i]` vector and then
reference the `j`th element in it behind complicated control flow, this
approach is just unmanageable.

That leaves us with the first representation, which is essentially a
transpose of the original matrix: `%mat1` is a matrix of `N` rows and `M`
columns. But we have the correct indexing behavior. Accessing `mat[i][j]`
can be translated into indexing into `%mat1` first by `i` and then by `j`,
and we get the correct element if the matrix is initialized in the
transposed manner (to be discussed later).
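The difference between the two representations can be checked with a small model. In the Python sketch below, a SPIR-V matrix is modeled as a list of column vectors, and `composite_extract` mimics two-level indexing (column first, then component). All names are hypothetical and the code only illustrates the indexing argument.

```python
def as_approach1(mat):
    """M column vectors of N components: SPIR-V column i holds HLSL row i."""
    return [list(row) for row in mat]


def as_approach2(mat):
    """N column vectors of M components: SPIR-V column j holds HLSL column j."""
    rows, cols = len(mat), len(mat[0])
    return [[mat[i][j] for i in range(rows)] for j in range(cols)]


def composite_extract(spirv_mat, column, component):
    """Two-level indexing into the modeled SPIR-V matrix."""
    return spirv_mat[column][component]


# HLSL float2x3 (M=2, N=3), with mat[i][j] encoded as 10 * i + j.
mat = [[10 * i + j for j in range(3)] for i in range(2)]
m1, m2 = as_approach1(mat), as_approach2(mat)
pairs = [(i, j) for i in range(2) for j in range(3)]

# Approach 1 keeps the source index order.
same_order = all(composite_extract(m1, i, j) == mat[i][j] for i, j in pairs)
# Approach 2 only works if every access swaps its indices.
swapped_order = all(composite_extract(m2, j, i) == mat[i][j] for i, j in pairs)
print(same_order, swapped_order)  # True True
```

With the first approach the CodeGen can forward `i` and `j` unchanged; with the second, every indexing form would have to be rewritten to swap them.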

### Per-element operations

Operations conducted in a per-element manner, like multiplying the matrix by a scalar, will just work naturally since we have the same operation on each element.

### Multiplication

Because we actually represent an HLSL matrix `mat` as `transpose(mat)` in
SPIR-V, the HLSL matrix multiplication `mat1 * mat2` (written
`mul(mat1, mat2)` in HLSL source) should swap the operands in SPIR-V:
`transpose(mat2) * transpose(mat1)`, which is then `transpose(mat1 * mat2)`:
just how we should represent `mat1 * mat2` in SPIR-V.
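The operand swap relies on the identity `transpose(A * B) = transpose(B) * transpose(A)`. A quick numeric check in Python, with plain nested lists standing in for matrices and arbitrarily chosen sample values:

```python
def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Mathematical matrix product of a (P x Q) and b (Q x R)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


mat1 = [[1, 2, 3], [4, 5, 6]]        # 2x3
mat2 = [[7, 8], [9, 10], [11, 12]]   # 3x2

# What SPIR-V computes on the transposed representations, operands swapped.
swapped = matmul(transpose(mat2), transpose(mat1))
print(swapped)                                   # [[58, 139], [64, 154]]
print(swapped == transpose(matmul(mat1, mat2)))  # True
```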

## Initialization

Initializing the matrix in the transposed manner is key to getting float matrix calculations correct.

For the SPIR-V matrix `%mat` from an internally initialized `floatMxN` matrix,
we should initialize `%mat[0][0]` with the first element, `%mat[0][N-1]` with
the `N`th element, `%mat[1][0]` with the `N+1`th element, and so on. That means
we need to group the first `N` elements as the first column-vector, the second
`N` elements as the second column-vector, and so on.

```
// HLSL
static float2x3 mat = {1, 2, 3, 4, 5, 6};
// SPIR-V
%vec1 = OpCompositeConstruct %v3float %float_1 %float_2 %float_3
%vec2 = OpCompositeConstruct %v3float %float_4 %float_5 %float_6
%mat = OpCompositeConstruct %mat2v3float %vec1 %vec2
```

The above is just nice and natural: it populates the elements in the matrix one by one following the “ᴎ” pattern; just what we want.

For externally initialized matrices, majorness is involved.

## Majorness

As said previously, majorness only matters for externally initialized matrices.
This agrees with the SPIR-V validation rules, which require matrices inside
shader resources to be explicitly laid out, but not
the shader-internal ones (inside the `Function` or `Private` storage class).

### Float matrices

But as we need to swap the multiplication operands, we also need to flip the majorness decoration in SPIR-V to make sure matrices are initialized in the transposed manner:

- HLSL `row_major` should be translated into SPIR-V `ColMajor`;
- HLSL `column_major` should be translated into SPIR-V `RowMajor`.

An example will make it clear:

```
// Data on GPU memory
{1, 2, 3, 4, 5, 6}
// --- HLSL ---
// Storage form for row-major float2x3
{{1, 2, 3}, {4, 5, 6}}
// Mathematical form for row-major float2x3
[ 1, 2, 3,
  4, 5, 6 ]
// -----
// Storage form for column-major float2x3
{{1, 2}, {3, 4}, {5, 6}}
// Mathematical form for column-major float2x3
[ 1, 3, 5,
  2, 4, 6 ]
// --- SPIR-V ---
// Storage form for RowMajor %mat2v3float
{{1, 2}, {3, 4}, {5, 6}}
// Mathematical form for RowMajor %mat2v3float
[ 1, 2,
  3, 4,
  5, 6 ]
// -----
// Storage form for ColMajor %mat2v3float
{{1, 2, 3}, {4, 5, 6}}
// Mathematical form for ColMajor %mat2v3float
[ 1, 4,
  2, 5,
  3, 6 ]
```

It’s clear from the above that a `row_major` `float2x3` should be represented as
a `ColMajor` `%mat2v3float`, and a `column_major` `float2x3` should be represented
as a `RowMajor` `%mat2v3float`, to achieve transposed initialization.
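The flip can also be verified mechanically. The Python sketch below reads the same flat data once with the HLSL majorness and once with the flipped SPIR-V decoration, then checks that the SPIR-V matrix is the transpose of the HLSL one. The helper names are invented for this illustration, and the data is assumed to be tightly packed (no `MatrixStride` padding).

```python
def read_matrix(data, rows, cols, row_major):
    """Mathematical form (list of rows) of tightly packed flat data."""
    if row_major:
        return [[data[r * cols + c] for c in range(cols)] for r in range(rows)]
    return [[data[c * rows + r] for c in range(cols)] for r in range(rows)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def spirv_decoration(hlsl_majorness):
    """The flipped mapping described above."""
    return {"row_major": "ColMajor", "column_major": "RowMajor"}[hlsl_majorness]


def flip_gives_transpose(data, m, n, hlsl_majorness):
    hlsl = read_matrix(data, m, n, row_major=(hlsl_majorness == "row_major"))
    decoration = spirv_decoration(hlsl_majorness)
    # The SPIR-V matrix has M columns of N components: N rows, M columns.
    spirv = read_matrix(data, n, m, row_major=(decoration == "RowMajor"))
    return spirv == transpose(hlsl)


data = [1, 2, 3, 4, 5, 6]
print(flip_gives_transpose(data, 2, 3, "row_major"))     # True
print(flip_gives_transpose(data, 2, 3, "column_major"))  # True
```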

### Bool/integer matrices

The above is for float matrices, though. We don’t have the `RowMajor`/`ColMajor`
decoration for boolean/integer matrices since they are translated
into an array of vectors. For them, we need to handle the source code
`row_major`/`column_major` modifier similarly to what the driver does for
`RowMajor`/`ColMajor` decorations. Note that we do not want to perform transposed
initialization here since we are not using `OpTypeMatrix`.

So for an HLSL `row_major` `MxN` matrix, we just need to take the first `N`
consecutive elements to initialize the first vector in the array, the second `N`
consecutive elements to initialize the second vector in the array, and so on.
Essentially we don’t need to do anything special.

But for an HLSL `column_major` `MxN` matrix, we need to take the 1st,
`M+1`th, `M*2+1`th, …, `M*(N-1)+1`th elements to compose the first vector,
and similarly for the other vectors.

## Summary

To summarize what was discussed in the above sections:

| HLSL Element Type | HLSL Initialization | Majorness | SPIR-V Type | SPIR-V Decoration |
|---|---|---|---|---|
| Float | Internal | Row | `OpTypeMatrix` | N/A |
| Float | Internal | Column | `OpTypeMatrix` | N/A |
| Float | External | Row | `OpTypeMatrix` | `ColMajor` |
| Float | External | Column | `OpTypeMatrix` | `RowMajor` |
| Bool/Integer | Internal | Row | Array of `OpTypeVector` | N/A |
| Bool/Integer | Internal | Column | Array of `OpTypeVector` | N/A |
| Bool/Integer | External | Row | Array of `OpTypeVector` | N/A |
| Bool/Integer | External | Column | Array of `OpTypeVector` | N/A |

# Takeaways

Due to the fundamental differences between HLSL matrices and SPIR-V matrices (HLSL matrices are row-oriented while SPIR-V matrices are column-oriented) and additional requirements over matrix types in SPIR-V, we don’t have a straightforward and unified translation scheme for HLSL matrices.

- HLSL float matrices are translated into SPIR-V `OpTypeMatrix`s in a transposed manner, which requires corresponding special handling of matrix features:
  - Operands in matrix multiplication need to be swapped.
  - Majorness decorations need to be swapped.
- HLSL boolean/integer matrices are translated into SPIR-V `OpTypeArray`s of `OpTypeVector`s.

With the above translation scheme, we retain source code high-level information as much as we can, and the SPIR-V code should work transparently for developers.

Author Lei Zhang

LastMod 2018-04-24