Baked data: use VarULE to store data when specified #6133

Merged 49 commits on Feb 25, 2025

Commits
522e1ab
impl ULE for ()
sffc Feb 15, 2025
309d12d
Add MaybeAsVarULE and impl on all data structs
sffc Feb 15, 2025
8d45d49
Start integrating into BakedExporter
sffc Feb 15, 2025
5c34469
Implement on HelloWorld
sffc Feb 15, 2025
a814e3b
datagen
sffc Feb 15, 2025
781fe51
Almost done
sffc Feb 15, 2025
1eee363
Add the missing impl
sffc Feb 16, 2025
9b18826
fmt
sffc Feb 16, 2025
5bd8b18
Reduce diff; DRY
sffc Feb 16, 2025
255122a
clean datagen (not word break?)
sffc Feb 16, 2025
1eb0897
datagen for word break (it appears the same, it just doesn't format)
sffc Feb 16, 2025
c1e86b1
features
sffc Feb 16, 2025
6eea0dd
macro test
sffc Feb 16, 2025
fc564ea
clippy
sffc Feb 16, 2025
e73da50
Merge branch 'main' into maybe-as-varule
sffc Feb 17, 2025
81035bb
MaybeExportAsVarULE
sffc Feb 18, 2025
4cc5166
NeverVarULE
sffc Feb 18, 2025
2e09206
remove impl ULE for ()
sffc Feb 18, 2025
035abb3
Add MaybeExportAsVarULE::from_varule
sffc Feb 18, 2025
8c12687
Rename macro to data_struct
sffc Feb 18, 2025
d6dcd09
Docs for the struct
sffc Feb 18, 2025
8ffb284
Use macro to implement the trait for HelloWorld
sffc Feb 18, 2025
889a44f
fmt
sffc Feb 18, 2025
c3e539f
New three traits; move to `icu_provider::ule` module
sffc Feb 19, 2025
8eb18c5
Switch from box to reference for now (https://github.com/unicode-org/…
sffc Feb 19, 2025
dd381a4
Fix docs links
sffc Feb 19, 2025
a343f01
Revert "remove impl ULE for ()"
sffc Feb 19, 2025
8511175
Switch from NeverVarULE back to [()]
sffc Feb 19, 2025
9f42022
Add `TODO(#6164)`
sffc Feb 19, 2025
32b18c6
Try migrating to a bake that happens in DataPayload
sffc Feb 20, 2025
cd71e4a
Switch to baking VarZeroSlice, and datagen
sffc Feb 20, 2025
de923c3
tokenize_to_varzeroslice
sffc Feb 20, 2025
ef365c5
EncodedStruct
sffc Feb 21, 2025
b51cc77
Add issue number to TODO
sffc Feb 21, 2025
6eccf2e
data_struct_new!
sffc Feb 21, 2025
bf91bd4
Fix the types in the data_struct_new macro definition
sffc Feb 21, 2025
d5c1922
Do some magic tricks to use closures in the macro
sffc Feb 21, 2025
77062f3
Merge remote-tracking branch 'upstream/main' into maybe-as-varule
sffc Feb 21, 2025
510b2cf
tokenize_encoded_seq
sffc Feb 21, 2025
e35d9e2
Merge remote-tracking branch 'upstream/main' into maybe-as-varule
sffc Feb 23, 2025
bdab081
Catch up on bitrot; clippy
sffc Feb 23, 2025
af6e5a5
doc
sffc Feb 23, 2025
b3d00a2
features
sffc Feb 23, 2025
a8e2ce3
FromVarULE -> ZeroFrom
sffc Feb 23, 2025
ccf35ba
Safety doc tweaks
sffc Feb 23, 2025
bb86f4a
Update provider/core/src/varule_traits.rs
sffc Feb 24, 2025
0bcc8af
cfg(feature = "datagen")
sffc Feb 24, 2025
901cd21
TODO(#5230)
sffc Feb 24, 2025
e55933c
Remove the word "storage" from optimization remark
sffc Feb 24, 2025
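
The commit titles above suggest an opt-in design: every data struct implements a trait that may hand back a VarULE-encoded view of itself, and structs that never do so use `[()]` as a placeholder encoded type. The following is a minimal, self-contained sketch of that idea only. The names `VarUleLike`, `MaybeEncode`, `export_bytes`, and the field layouts are hypothetical and are not the `icu_provider` API; the real traits are not shown in this excerpt.

```rust
/// Hypothetical stand-in for an unsized, byte-addressable encoding
/// (the role that VarULE plays in the real code).
trait VarUleLike {
    fn encoded_bytes(&self) -> &[u8];
}

impl VarUleLike for str {
    fn encoded_bytes(&self) -> &[u8] {
        self.as_bytes()
    }
}

/// `[()]` serves as the "never encoded" placeholder type in this sketch.
impl VarUleLike for [()] {
    fn encoded_bytes(&self) -> &[u8] {
        &[]
    }
}

/// Hypothetical opt-in trait: `None` means "store the struct as usual".
trait MaybeEncode {
    type Encoded: ?Sized + VarUleLike;
    fn maybe_encode(&self) -> Option<&Self::Encoded>;
}

/// Opts in: the struct is fully described by one string.
struct HelloWorld {
    message: String,
}

impl MaybeEncode for HelloWorld {
    type Encoded = str;
    fn maybe_encode(&self) -> Option<&str> {
        Some(self.message.as_str())
    }
}

/// Opts out: keeps the regular struct representation.
#[allow(dead_code)]
struct WeekData {
    first_weekday: u8,
}

impl MaybeEncode for WeekData {
    type Encoded = [()];
    fn maybe_encode(&self) -> Option<&[()]> {
        None
    }
}

/// An exporter can branch on the result without knowing the concrete type.
fn export_bytes<T: MaybeEncode>(value: &T) -> Option<Vec<u8>> {
    value.maybe_encode().map(|e| e.encoded_bytes().to_vec())
}
```

In this sketch, `export_bytes` returns the raw bytes for `HelloWorld` and `None` for `WeekData`, so a caller can fall back to the regular struct representation whenever a type opts out.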
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

10 changes: 10 additions & 0 deletions components/calendar/src/provider.rs
@@ -136,6 +136,11 @@ pub struct JapaneseEras<'data> {
pub dates_to_eras: ZeroVec<'data, (EraStartDate, TinyStr16)>,
}

icu_provider::data_struct_new!(
JapaneseEras<'_>,
#[cfg(feature = "datagen")]
);

/// An ICU4X mapping to a subset of CLDR weekData.
/// See CLDR-JSON's weekData.json for more context.
///
@@ -159,6 +164,11 @@ pub struct WeekData {
pub weekend: WeekdaySet,
}

icu_provider::data_struct_new!(
WeekData,
#[cfg(feature = "datagen")]
);

/// Bitset representing weekdays.
//
// This Bitset uses an [u8] to represent the weekend, thus leaving one bit free.
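
Both invocations in this file pass a type followed by an attribute. The macro definition is not part of this excerpt, so the following is only a hypothetical sketch of how a `macro_rules!` macro can forward such an attribute onto the impl it generates. `MaybeEncode`, `opt_out_of_encoding!`, and the one-field `WeekData` are illustrative names, and `#[cfg(all())]` (always true) stands in for `#[cfg(feature = "datagen")]` so the snippet compiles without Cargo features.

```rust
/// Hypothetical trait that the generated impls target.
trait MaybeEncode {
    /// Returns the encoded form, or `None` to keep the plain struct.
    fn maybe_encode(&self) -> Option<&[u8]>;
}

/// Takes a type, then zero or more attributes, and applies the
/// attributes to the generated impl block.
macro_rules! opt_out_of_encoding {
    ($ty:ty, $(#[$attr:meta])*) => {
        $(#[$attr])*
        impl MaybeEncode for $ty {
            fn maybe_encode(&self) -> Option<&[u8]> {
                None
            }
        }
    };
}

#[allow(dead_code)]
struct WeekData {
    first_weekday: u8,
}

opt_out_of_encoding!(
    WeekData,
    #[cfg(all())] // always-true stand-in for #[cfg(feature = "datagen")]
);
```

Because the attribute lands on the whole `impl` block, a `cfg` that evaluates to false would remove the impl entirely, which is one plausible reason to gate it behind a feature.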
5 changes: 5 additions & 0 deletions components/calendar/src/provider/chinese_based.rs
@@ -52,6 +52,11 @@ pub struct ChineseBasedCache<'data> {
pub data: ZeroVec<'data, PackedChineseBasedYearInfo>,
}

icu_provider::data_struct_new!(
ChineseBasedCache<'_>,
#[cfg(feature = "datagen")]
);

impl ChineseBasedCache<'_> {
/// Compute this data for a range of years
#[cfg(feature = "datagen")]
5 changes: 5 additions & 0 deletions components/calendar/src/provider/islamic.rs
@@ -51,6 +51,11 @@ pub struct IslamicCache<'data> {
pub data: ZeroVec<'data, PackedIslamicYearInfo>,
}

icu_provider::data_struct_new!(
IslamicCache<'_>,
#[cfg(feature = "datagen")]
);

impl IslamicCache<'_> {
/// Compute this data for a range of years
#[cfg(feature = "datagen")]
5 changes: 5 additions & 0 deletions components/casemap/src/provider/mod.rs
@@ -100,6 +100,11 @@ pub struct CaseMap<'data> {
pub exceptions: CaseMapExceptions<'data>,
}

icu_provider::data_struct_new!(
CaseMap<'_>,
#[cfg(feature = "datagen")]
);

#[cfg(feature = "serde")]
impl<'de> serde::Deserialize<'de> for CaseMap<'de> {
fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
5 changes: 5 additions & 0 deletions components/casemap/src/provider/unfold.rs
@@ -29,6 +29,11 @@ pub struct CaseMapUnfold<'data> {
pub map: ZeroMap<'data, PotentialUtf8, str>,
}

icu_provider::data_struct_new!(
CaseMapUnfold<'_>,
#[cfg(feature = "datagen")]
);

impl CaseMapUnfold<'_> {
/// Creates a new CaseMapUnfold using data exported by the `icuexportdata` tool in ICU4C.
///
30 changes: 30 additions & 0 deletions components/collator/src/provider.rs
@@ -202,6 +202,11 @@ pub struct CollationData<'data> {
pub contexts: ZeroVec<'data, u16>,
}

icu_provider::data_struct_new!(
CollationData<'_>,
#[cfg(feature = "datagen")]
);

impl<'data> CollationData<'data> {
pub(crate) fn ce32_for_char(&self, c: char) -> CollationElement32 {
CollationElement32::new(self.trie.get32(c as u32))
@@ -303,6 +308,11 @@ pub struct CollationDiacritics<'data> {
pub secondaries: ZeroVec<'data, u16>,
}

icu_provider::data_struct_new!(
CollationDiacritics<'_>,
#[cfg(feature = "datagen")]
);

/// `CollationElement32`s for the Hangul Jamo Unicode Block
///
/// <div class="stab unstable">
@@ -321,6 +331,11 @@ pub struct CollationJamo<'data> {
pub ce32s: ZeroVec<'data, u32>,
}

icu_provider::data_struct_new!(
CollationJamo<'_>,
#[cfg(feature = "datagen")]
);

/// Script reordering data
///
/// <div class="stab unstable">
@@ -371,6 +386,11 @@ pub struct CollationReordering<'data> {
pub reorder_ranges: ZeroVec<'data, u32>,
}

icu_provider::data_struct_new!(
CollationReordering<'_>,
#[cfg(feature = "datagen")]
);

impl CollationReordering<'_> {
pub(crate) fn reorder(&self, primary: u32) -> u32 {
if let Some(b) = self.reorder_table.get((primary >> 24) as usize) {
@@ -429,6 +449,11 @@ pub struct CollationMetadata {
pub bits: u32,
}

icu_provider::data_struct_new!(
CollationMetadata,
#[cfg(feature = "datagen")]
);

impl CollationMetadata {
const MAX_VARIABLE_MASK: u32 = 0b11;
const TAILORED_MASK: u32 = 1 << 3;
@@ -518,6 +543,11 @@ pub struct CollationSpecialPrimaries<'data> {
pub numeric_primary: u8,
}

icu_provider::data_struct_new!(
CollationSpecialPrimaries<'_>,
#[cfg(feature = "datagen")]
);

impl CollationSpecialPrimaries<'_> {
#[allow(clippy::unwrap_used)]
pub(crate) fn last_primary_for_group(&self, max_variable: MaxVariable) -> u32 {
5 changes: 5 additions & 0 deletions components/decimal/src/provider.rs
@@ -237,6 +237,11 @@ pub struct DecimalSymbols<'data> {
pub grouping_sizes: GroupingSizes,
}

icu_provider::data_struct_new!(
DecimalSymbols<'_>,
#[cfg(feature = "datagen")]
);

impl DecimalSymbols<'_> {
/// Return (prefix, suffix) for the minus sign
pub fn minus_sign_affixes(&self) -> (&str, &str) {
5 changes: 5 additions & 0 deletions components/list/src/provider/mod.rs
@@ -69,6 +69,11 @@ data_marker!(
ListFormatterPatterns<'static>,
);

icu_provider::data_struct_new!(
ListFormatterPatterns<'_>,
#[cfg(feature = "datagen")]
);

/// Symbols and metadata required for [`ListFormatter`](crate::ListFormatter).
///
/// <div class="stab unstable">
35 changes: 35 additions & 0 deletions components/locale/src/provider.rs
@@ -284,6 +284,11 @@ pub struct Aliases<'data> {
pub subdivision: ZeroMap<'data, UnvalidatedSubdivision, SemivalidatedSubdivision>,
}

icu_provider::data_struct_new!(
Aliases<'_>,
#[cfg(feature = "datagen")]
);

#[derive(Debug, PartialEq, Clone, yoke::Yokeable, zerofrom::ZeroFrom)]
#[cfg_attr(feature = "datagen", derive(serde::Serialize, databake::Bake))]
#[cfg_attr(feature = "datagen", databake(path = icu_locale::provider))]
@@ -325,6 +330,11 @@ pub struct LikelySubtagsForLanguage<'data> {
pub und: (Language, Script, Region),
}

icu_provider::data_struct_new!(
LikelySubtagsForLanguage<'_>,
#[cfg(feature = "datagen")]
);

#[derive(Debug, PartialEq, Clone, yoke::Yokeable, zerofrom::ZeroFrom)]
#[cfg_attr(feature = "datagen", derive(serde::Serialize, databake::Bake))]
#[cfg_attr(feature = "datagen", databake(path = icu_locale::provider))]
@@ -364,6 +374,11 @@ pub struct LikelySubtagsForScriptRegion<'data> {
pub region: ZeroMap<'data, UnvalidatedRegion, (Language, Script)>,
}

icu_provider::data_struct_new!(
LikelySubtagsForScriptRegion<'_>,
#[cfg(feature = "datagen")]
);

#[derive(Debug, PartialEq, Clone, yoke::Yokeable, zerofrom::ZeroFrom)]
#[cfg_attr(feature = "datagen", derive(serde::Serialize, databake::Bake))]
#[cfg_attr(feature = "datagen", databake(path = icu_locale::provider))]
@@ -398,6 +413,11 @@ pub struct LikelySubtagsExtended<'data> {
pub region: ZeroMap<'data, UnvalidatedRegion, (Language, Script)>,
}

icu_provider::data_struct_new!(
LikelySubtagsExtended<'_>,
#[cfg(feature = "datagen")]
);

/// Locale fallback rules derived from CLDR parent locales data.
#[derive(Default, Clone, PartialEq, Debug, yoke::Yokeable, zerofrom::ZeroFrom)]
#[cfg_attr(feature = "datagen", derive(serde::Serialize, databake::Bake))]
@@ -411,6 +431,11 @@ pub struct Parents<'data> {
pub parents: ZeroMap<'data, PotentialUtf8, (Language, Option<Script>, Option<Region>)>,
}

icu_provider::data_struct_new!(
Parents<'_>,
#[cfg(feature = "datagen")]
);

#[derive(Debug, PartialEq, Clone, yoke::Yokeable, zerofrom::ZeroFrom)]
#[cfg_attr(feature = "datagen", derive(serde::Serialize, databake::Bake))]
#[cfg_attr(feature = "datagen", databake(path = icu_locale::provider))]
@@ -432,6 +457,11 @@ pub struct ScriptDirection<'data> {
pub ltr: ZeroVec<'data, UnvalidatedScript>,
}

icu_provider::data_struct_new!(
ScriptDirection<'_>,
#[cfg(feature = "datagen")]
);

/// A set of characters and strings which share a particular property value.
///
/// <div class="stab unstable">
@@ -449,3 +479,8 @@ pub struct ScriptDirection<'data> {
pub struct ExemplarCharactersData<'data>(
#[cfg_attr(feature = "serde", serde(borrow))] pub CodePointInversionListAndStringList<'data>,
);

icu_provider::data_struct_new!(
ExemplarCharactersData<'_>,
#[cfg(feature = "datagen")]
);
20 changes: 20 additions & 0 deletions components/normalizer/src/provider.rs
@@ -132,6 +132,11 @@ pub struct DecompositionData<'data> {
pub passthrough_cap: u16,
}

icu_provider::data_struct_new!(
DecompositionData<'_>,
#[cfg(feature = "datagen")]
);

/// The expansion tables for cases where the decomposition isn't
/// contained in the trie value
///
@@ -154,6 +159,11 @@ pub struct DecompositionTables<'data> {
pub scalars24: ZeroVec<'data, char>,
}

icu_provider::data_struct_new!(
DecompositionTables<'_>,
#[cfg(feature = "datagen")]
);

/// Non-Hangul canonical compositions
///
/// <div class="stab unstable">
@@ -173,6 +183,11 @@ pub struct CanonicalCompositions<'data> {
pub canonical_compositions: Char16Trie<'data>,
}

icu_provider::data_struct_new!(
CanonicalCompositions<'_>,
#[cfg(feature = "datagen")]
);

/// Non-recursive canonical decompositions that differ from
/// `DecompositionData`.
///
@@ -194,3 +209,8 @@ pub struct NonRecursiveDecompositionSupplement<'data> {
#[cfg_attr(feature = "serde", serde(borrow))]
pub scalars24: ZeroVec<'data, char>,
}

icu_provider::data_struct_new!(
NonRecursiveDecompositionSupplement<'_>,
#[cfg(feature = "datagen")]
);
10 changes: 10 additions & 0 deletions components/plurals/src/provider.rs
@@ -126,6 +126,11 @@ pub struct PluralRulesData<'data> {
pub many: Option<Rule<'data>>,
}

icu_provider::data_struct_new!(
PluralRulesData<'_>,
#[cfg(feature = "datagen")]
);

#[cfg(feature = "experimental")]
pub use ranges::*;

@@ -345,6 +350,11 @@ mod ranges {
#[cfg_attr(feature = "serde", serde(borrow))]
pub ranges: ZeroMap<'data, UnvalidatedPluralRange, RawPluralCategory>,
}

icu_provider::data_struct_new!(
PluralRanges<'_>,
#[cfg(feature = "datagen")]
);
}

/// A sized packed [`PluralElements`] suitable for use in data structs.
5 changes: 5 additions & 0 deletions components/properties/src/provider.rs
@@ -476,6 +476,11 @@ pub enum PropertyCodePointMap<'data, T: TrieValue> {
// https://docs.rs/serde/latest/serde/trait.Serializer.html#tymethod.serialize_unit_variant
}

icu_provider::data_struct_new!(
<T: TrieValue> PropertyCodePointMap<'_, T>,
#[cfg(feature = "datagen")]
);

macro_rules! data_struct_generic {
($(marker($marker:ident, $ty:ident, $path:literal),)+) => {
$(
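
The invocation in this file places a generic parameter list in front of the type. As a hedged illustration (the real macro definition is not shown here), one way a `macro_rules!` macro can accept both shapes is a dedicated arm that starts with a literal `<`, listed before the plain-type arm; otherwise the `ty` matcher would try to parse `<T: ...` as a type and fail. `HasEncoding`, `mark_unencoded!`, `TrieValueLike`, `CodePointMapLike`, and `Plain` are hypothetical names, and `#[cfg(all())]` is an always-true placeholder for the feature gate.

```rust
/// Hypothetical bound, standing in for `TrieValue`.
trait TrieValueLike {}
impl TrieValueLike for u8 {}

/// Hypothetical trait implemented by the macro.
trait HasEncoding {
    const ENCODED: bool;
}

macro_rules! mark_unencoded {
    // Generic arm first: it begins with a literal `<`.
    (<$generic:ident: $bound:ident> $ty:ty, $(#[$attr:meta])*) => {
        $(#[$attr])*
        impl<$generic: $bound> HasEncoding for $ty {
            const ENCODED: bool = false;
        }
    };
    // Plain arm for non-generic types.
    ($ty:ty, $(#[$attr:meta])*) => {
        $(#[$attr])*
        impl HasEncoding for $ty {
            const ENCODED: bool = false;
        }
    };
}

#[allow(dead_code)]
struct CodePointMapLike<'data, T: TrieValueLike>(&'data [T]);

#[allow(dead_code)]
struct Plain;

mark_unencoded!(
    <T: TrieValueLike> CodePointMapLike<'_, T>,
    #[cfg(all())]
);
mark_unencoded!(Plain, #[cfg(all())]);
```

The generic parameter is captured as an identifier and re-emitted in both the `impl<...>` header and the type, so the same name binds in both places.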
10 changes: 10 additions & 0 deletions components/time/src/provider/iana.rs
@@ -81,6 +81,11 @@ pub struct IanaToBcp47Map<'data> {
pub bcp47_ids: ZeroVec<'data, TimeZone>,
}

icu_provider::data_struct_new!(
IanaToBcp47Map<'_>,
#[cfg(feature = "datagen")]
);

/// A mapping from IANA time zone identifiers to BCP-47 time zone identifiers.
///
/// The BCP-47 time zone ID maps to the default IANA time zone ID according to the CLDR data.
@@ -105,3 +110,8 @@ pub struct IanaNames<'data> {
#[cfg_attr(feature = "serde", serde(borrow))]
pub normalized_iana_ids: VarZeroVec<'data, str>,
}

icu_provider::data_struct_new!(
IanaNames<'_>,
#[cfg(feature = "datagen")]
);
5 changes: 5 additions & 0 deletions components/time/src/provider/windows.rs
@@ -46,3 +46,8 @@ pub struct WindowsZonesToBcp47Map<'data> {
#[cfg_attr(feature = "serde", serde(borrow))]
pub bcp47_ids: ZeroVec<'data, TimeZone>,
}

icu_provider::data_struct_new!(
WindowsZonesToBcp47Map<'_>,
#[cfg(feature = "datagen")]
);
1 change: 1 addition & 0 deletions provider/baked/Cargo.toml
@@ -20,6 +20,7 @@ include.workspace = true
icu_provider = { workspace = true }
writeable = { workspace = true }
zerotrie = { workspace = true, features = ["alloc"] }
zerovec = { workspace = true }

crlify = { workspace = true, optional = true }
databake = { workspace = true, optional = true}
2 changes: 2 additions & 0 deletions provider/baked/src/binary_search.rs
@@ -3,6 +3,8 @@
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

//! Data stored as slices, looked up with binary search
//!
//! TODO(#6164): This code is stale; update it before use.

use icu_provider::prelude::*;
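
The module doc in this hunk describes data stored as slices and looked up with binary search. As a rough, self-contained illustration of that lookup pattern (not the code in this file, which the TODO marks as stale), the sketch below searches a slice sorted by key. The `lookup` function, the string keys, and the tuple layout are hypothetical.

```rust
/// Looks up `key` in `entries`, which must be sorted by key.
fn lookup<'a>(entries: &[(&str, &'a str)], key: &str) -> Option<&'a str> {
    entries
        // Compare each entry's key against the requested key.
        .binary_search_by(|(k, _)| k.cmp(&key))
        // `Err(_)` means the key is absent; discard the insertion index.
        .ok()
        .map(|index| entries[index].1)
}
```

The lookup is only correct when the slice is sorted by the same ordering that the comparison uses, which is why baked output of this kind would need to be emitted in sorted order.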
