Commit e355479

Merge pull request #13 from MurrellGroup/main
Update v0.1.0 branch
2 parents 6cb4c4c + 4c27428 commit e355479

5 files changed, with 29 additions and 27 deletions.

README.md

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,7 @@
 [![Build Status](https://github.com/MurrellGroup/Microfloats.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/MurrellGroup/Microfloats.jl/actions/workflows/CI.yml?query=branch%3Amain)
 [![Coverage](https://codecov.io/gh/MurrellGroup/Microfloats.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/MurrellGroup/Microfloats.jl)
 
-Microfloats is a Julia package that implements floating point types and arithmetic for sub-8 bit floating point types, supporting arbitrary combinations of sign, exponent, and mantissa bits.
+Microfloats is a Julia package that implements floating point types and arithmetic (through wider intermediates) for sub-8 bit floating point types, supporting arbitrary combinations of sign, exponent, and mantissa (significand) bits.
 
 Instances of a sub-8 bit floating point type are still 8 bits wide in memory; the goal of `Microfloat` is to serve as a base for arithmetic operations and method dispatch, lending downstream packages a good abstraction for doing bitpacking and hardware acceleration.
 

@@ -30,12 +30,12 @@ const UFloat7_5 = Microfloat{0,5,2,IEEE_754_like}
 
 ### Microscaling (MX)
 
-Microfloats implements the E4M3, E5M2, E2M3, E3M2, E2M1, and E8M0 types from the [Open Compute Project Microscaling Formats (MX) Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf), with most of these using saturated arithmetic (no infinities), and different bit layouts for NaNs. These are exported as `MX_E4M3`, `MX_E5M2`, `MX_E2M3`, `MX_E3M2`, `MX_E2M1`, and `MX_E8M0`, respectively.
+Microfloats implements the E4M3, E5M2, E2M3, E3M2, E2M1, and E8M0 types from the [Open Compute Project Microscaling Formats (MX) Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf). These are exported as `MX_E4M3`, `MX_E5M2`, `MX_E2M3`, `MX_E3M2`, `MX_E2M1`, and `MX_E8M0`, respectively, with most of these using saturated arithmetic (no Inf or NaN), and a different encoding for the types that do have NaNs.
 
 For INT8, see `FixedPointNumbers.Q1f6`.
 
-> [!WARNING]
-> MX types may not yet be fully OCP compliant. See issues with the [![MX-compliance](https://img.shields.io/github/labels/MurrellGroup/Microfloats.jl/mx-compliance)](https://github.com/MurrellGroup/Microfloats.jl/labels/mx-compliance) label.
+> [!NOTE]
+> MX types may not be fully MX compliant, but efforts have been and continue to be made to adhere to the specification. See issues with the [![MX-compliance](https://img.shields.io/github/labels/MurrellGroup/Microfloats.jl/mx-compliance)](https://github.com/MurrellGroup/Microfloats.jl/labels/mx-compliance) label.
 
 Since Microfloats.jl only implements the primitive types, microscaling itself may be done with [Microscaling.jl](https://github.com/MurrellGroup/Microscaling.jl), which includes quantization and bitpacking.
 
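
Reviewer note: the README's description of "arbitrary combinations of sign, exponent, and mantissa bits" held in an 8-bit container can be illustrated with a small decoder. This is a minimal sketch, not the package's implementation: `decode_ieee_like` is a hypothetical helper, it assumes the conventional bias of 2^(E-1) - 1, and it ignores Inf and NaN encodings entirely.

```julia
# Hypothetical helper, NOT part of Microfloats.jl.
# Decodes an IEEE-754-like bit pattern with S sign bits, E exponent bits and
# M mantissa bits stored in the low S+E+M bits of a UInt8.
# Assumptions: bias = 2^(E-1) - 1; Inf/NaN encodings are not handled.
function decode_ieee_like(bits::UInt8, S::Int, E::Int, M::Int)
    @assert S in (0, 1) && E >= 1 && M >= 0 && S + E + M <= 8
    b = Int(bits)
    mant = b & ((1 << M) - 1)                  # low M bits
    expo = (b >> M) & ((1 << E) - 1)           # next E bits
    sign = (S == 1 && ((b >> (E + M)) & 1) == 1) ? -1.0 : 1.0
    bias = (1 << (E - 1)) - 1
    if expo == 0
        # Subnormal: no implicit leading one, fixed exponent 1 - bias.
        return sign * ldexp(mant / (1 << M), 1 - bias)
    end
    # Normal: implicit leading one.
    return sign * ldexp(1 + mant / (1 << M), expo - bias)
end

decode_ieee_like(0b0000_0111, 1, 2, 1)  # E2M1 layout, largest magnitude: 6.0
decode_ieee_like(0b0000_0001, 1, 2, 1)  # E2M1 layout, only subnormal: 0.5
decode_ieee_like(0b0111_1110, 1, 4, 3)  # E4M3 layout, pattern 0x7e: 448.0
```

Under these assumptions the value always fits in one byte regardless of how the bits are split, which matches the README's remark that instances remain 8 bits wide in memory.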

docs/src/assets/icon.svg

Lines changed: 15 additions & 14 deletions

docs/src/conversion.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 ## BFloat16
 
 Conversion to and from `Microfloat` uses `BFloat16` as an intermediate type,
-since BFloat16 has 1 sign bit, 8 exponent bits, and 7 significand bits,
+since BFloat16 has 1 sign bit, 8 exponent bits, and 7 significand (mantissa) bits,
 and is therefore able to represent all `Microfloat` types.
 
 ## Rounding
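
Reviewer note: the 1/8/7 bit split that the BFloat16 paragraph relies on can be inspected without any package, because a BFloat16 bit pattern is the top 16 bits of a `Float32`. The helpers below (`to_bf16_bits`, `from_bf16_bits`, `bf16_fields`) are hypothetical names for this sketch and are not the conversion routines used by Microfloats.jl; `to_bf16_bits` truncates rather than rounds.

```julia
# Hypothetical helpers, NOT the package's conversion code.
# A BFloat16 bit pattern equals the upper 16 bits of the matching Float32.
to_bf16_bits(x::Float32) = UInt16(reinterpret(UInt32, x) >> 16)   # truncates, no rounding
from_bf16_bits(b::UInt16) = reinterpret(Float32, UInt32(b) << 16)

# Split a BFloat16 bit pattern into 1 sign bit, 8 exponent bits, 7 significand bits.
bf16_fields(b::UInt16) = (sign = b >> 15,
                          exponent = (b >> 7) & 0x00ff,
                          significand = b & 0x007f)

b = to_bf16_bits(448f0)     # 448 = 1.75 * 2^8 needs only 2 significand bits
bf16_fields(b)              # sign 0, exponent 0x87 (127 + 8), significand 0x60
from_bf16_bits(b) == 448f0  # exact round trip, nothing was truncated away
```

A value such as `0.1f0`, which needs more than 7 significand bits, does not survive the same round trip, so the exactness shown here depends on the input already fitting the narrower format.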

src/Microfloats.jl

Lines changed: 3 additions & 3 deletions
@@ -41,7 +41,7 @@ include("random.jl")
 Microfloat{S,E,M,V}
 
 A `Microfloat` type has `S` sign bits (between 0 and 1),
-`E` exponent bits (between 1 and 8), and `M` significand bits (between 0 and 7).
+`E` exponent bits (between 1 and 8), and `M` mantissa bits (between 0 and 7).
 """
 Microfloat
 

@@ -70,8 +70,8 @@ for T in (
 - Has NaN: `$(hasnan($T))`
 - Max normal: `$(Float64(floatmax($T)))`
 - Min normal: `$(Float64(floatmin($T)))`
-- Max subnormal: `$(Float64(prevfloat(floatmin($T))))`
-- Min subnormal: `$(Float64(nextfloat(zero($T))))`
+- Max subnormal: `$(significand_bits($T) > 0 ? Float64(prevfloat(floatmin($T))) : "N/A")`
+- Min subnormal: `$(significand_bits($T) > 0 ? Float64(nextfloat(zero($T))) : "N/A")`
 """
 $T
 end

src/conversion.jl

Lines changed: 6 additions & 5 deletions
@@ -3,11 +3,12 @@ abstract type SAT <: OverflowPolicy end
 abstract type OVF <: OverflowPolicy end
 
 function rshift_round_to_even(x::UInt16, n::Int)
-    n <= 0 && return x << (-n)
-    lower = x & ((UInt16(1) << n) - UInt16(1))
-    half = UInt16(1) << (n - 1)
-    up = (lower > half) | ((lower == half) & (((x >> n) & UInt16(1)) == UInt16(1)))
-    (x >> n) + (up ? UInt16(1) : UInt16(0))
+    n <= 0 && return x >> n
+    x_32 = UInt32(x)
+    lower = x_32 & ((UInt32(1) << n) - UInt32(1))
+    half = UInt32(1) << (n - 1)
+    up = (lower > half) | ((lower == half) & (((x_32 >> n) & UInt32(1)) == UInt32(1)))
+    UInt16((x_32 >> n) + (up ? 1 : 0))
 end
 
 is_outside_floatmax(xb::BFloat16, ::Type{T}) where T<:Microfloat = reinterpret(Unsigned, abs(xb)) > reinterpret(Unsigned, BFloat16(floatmax(T)))
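
Reviewer note: the two bodies of `rshift_round_to_even` in this hunk can be compared directly. The sketch below copies them from the diff under the hypothetical names `rshift_rne_old` and `rshift_rne_new` so both can coexist in one session. It relies on Julia's documented behaviour that `x >> n` with negative `n` is the same as `x << -n`, which is why the two `n <= 0` branches are equivalent. One observable difference is at shift counts of 17 or more, where the 16-bit `half` shifts out to zero. Whether that case motivated the change is an assumption; the commit message does not say.

```julia
# Removed version, copied from the diff (hypothetical name).
function rshift_rne_old(x::UInt16, n::Int)
    n <= 0 && return x << (-n)
    lower = x & ((UInt16(1) << n) - UInt16(1))   # bits shifted out
    half = UInt16(1) << (n - 1)                  # becomes 0 once n - 1 >= 16
    up = (lower > half) | ((lower == half) & (((x >> n) & UInt16(1)) == UInt16(1)))
    (x >> n) + (up ? UInt16(1) : UInt16(0))
end

# Added version, copied from the diff (hypothetical name).
function rshift_rne_new(x::UInt16, n::Int)
    n <= 0 && return x >> n                      # negative n shifts left
    x_32 = UInt32(x)                             # widen before building masks
    lower = x_32 & ((UInt32(1) << n) - UInt32(1))
    half = UInt32(1) << (n - 1)
    up = (lower > half) | ((lower == half) & (((x_32 >> n) & UInt32(1)) == UInt32(1)))
    UInt16((x_32 >> n) + (up ? 1 : 0))
end

rshift_rne_new(0x0006, 2)    # 6/4 = 1.5, tie rounds to even: 0x0002
rshift_rne_new(0x000a, 2)    # 10/4 = 2.5, tie rounds to even: 0x0002
rshift_rne_new(0x000b, 2)    # 11/4 = 2.75, rounds up: 0x0003
rshift_rne_new(0x0003, -2)   # negative shift is a left shift: 0x000c

rshift_rne_old(0x0001, 17)   # 0x0001, although 1/2^17 is far below one half
rshift_rne_new(0x0001, 17)   # 0x0000
```

For shift counts from 0 through 16 the two versions return the same result for every `UInt16` input, so the widening only matters for larger shifts.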
