Skip to content

MurrellGroup/Microfloats.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

59 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Microfloats

Stable Dev Build Status Coverage

Microfloats is a Julia package that implements types and arithmetic (through wider intermediates) for sub-8 bit floating points, supporting arbitrary combinations of sign, exponent, and mantissa (significand) bits.

Instances of a sub-8 bit floating point type are still 8 bits wide in memory; the goal of Microfloat is to serve as a base for arithmetic operations and method dispatch, lending downstream packages a good abstraction for doing bitpacking and hardware acceleration.

Usage

Along with the types already exported by Microfloats, we can also create our own types by passing the number of sign, exponent, and mantissa bits to the Microfloat type constructor. For example, one can recreate the Float8 and Float8_4 types exported by Float8s.jl:

using Microfloats

const Float8 = Microfloat{1,3,4,IEEE_754_like}
const Float8_4 = Microfloat{1,4,3,IEEE_754_like}

# creating a sawed-off Float16 (BFloat8?) becomes trivial:
const Float8_5 = Microfloat{1,5,2,IEEE_754_like}

# unsigned variants:
const UFloat7 = Microfloat{0,3,4,IEEE_754_like}
const UFloat7_4 = Microfloat{0,4,3,IEEE_754_like}
const UFloat7_5 = Microfloat{0,5,2,IEEE_754_like}

Microscaling (MX)

Microfloats implements the E4M3, E5M2, E2M3, E3M2, E2M1, and E8M0 types from the Open Compute Project Microscaling Formats (MX) Specification. These are exported as MX_E4M3, MX_E5M2, MX_E2M3, MX_E3M2, MX_E2M1, and MX_E8M0, respectively, with most of these using saturated arithmetic (no Inf or NaN), and a different encoding for the types that do have NaNs.

For INT8, see FixedPointNumbers.Q1f6.

Note

MX types may not be fully MX compliant, but efforts have been and continue to be made to adhere to the specification. See issues with the MX-compliance label.

Since Microfloats.jl only implements the primitive types, microscaling itself may be done with Microscaling.jl, which includes quantization and bitpacking.

Installation

using Pkg
Pkg.Registry.add(url="https://github.com/MurrellGroup/MurrellGroupRegistry")
Pkg.add("Microfloats")

See also