Skip to content

Implement Quick Check#2769

Merged
P-E-P merged 1 commit intoRust-GCC:masterfrom
tamaroning:quick-check
Jan 29, 2024
Merged

Implement Quick Check#2769
P-E-P merged 1 commit intoRust-GCC:masterfrom
tamaroning:quick-check

Conversation

@tamaroning
Copy link
Contributor

@tamaroning tamaroning commented Dec 2, 2023

Addresses #2379

I implemented Unicode Quick Check.
This checks if strings (usually identifiers) should be normalized or not before performing Unicode normalization.
QuickCheckResult::Yes shows the given string is already normalized, saving time and memory.
Strings should be normalized if the result is QuickCheckResult::No or QuickCheckResult::Maybe.

Also, I added Unicode data files which are compatible with GPL.
they are used to generate rust-unicode-data.h
See https://www.gnu.org/licenses/license-list.en.html#Unicode

gcc/rust/ChangeLog:

	* rust-system.h: Add <algorithm>.
	* util/make-rust-unicode.py: Output NFC_Quick_Check table.
	* util/rust-codepoint.h (struct Codepoint): Add is_supplementary
	method.
	* util/rust-unicode-data.h: Generated.
	* util/rust-unicode.cc (lookup_cc): Modified to use
	std::lower_bound.
	(is_alphabetic): Likewise.
	(nfc_quick_check): New function.
	(nfc_normalize): Use nfc_quick_check.
	(is_nfc_qc_maybe): New function.
	(is_nfc_qc_no): New function.
	(rust_nfc_qc_test): New test.
	* util/rust-unicode.h (is_nfc_qc_no): New function.
	(is_nfc_qc_maybe): New function.
	(enum class): New enum class.
	(nfc_quick_check): New function.
	(rust_nfc_qc_test): New test.
	* util/DerivedCoreProperties.txt: New file.
	* util/DerivedNormalizationProps.txt: New file.
	* util/UnicodeData.txt: New file.

Signed-off-by: Raiki Tamura <tamaron1203@gmail.com>

# <http://www.gnu.org/licenses/>.

# Run this program as
# First, download the folloqing files from unicode.org
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# First, download the folloqing files from unicode.org
# First, download the following files from unicode.org

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks😆

@tamaroning tamaroning force-pushed the quick-check branch 2 times, most recently from 10a9f78 to 4a9fc52 Compare December 12, 2023 16:20
@tamaroning tamaroning marked this pull request as ready for review December 12, 2023 16:32
Copy link
Member

@CohenArthur CohenArthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great, thank you :) is it possible to maybe add some unit testing? otherwise this looks good to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so these files are not present in any other GCC frontend? this is surprising. I guess it'll be a nice thing to bring higher in the GCC file hierarchy, so that other languages can benefit from this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omg, I completely missed the fact that contrib/unicode contains these files 😮😮
https://github.com/tamaroning/gccrs/tree/2a72592cf111f8f3dcc024d1f34ccd7fe6d36878/contrib/unicode

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, no problem! better late than never :D well done finding them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed them and pushed just now

@tamaroning tamaroning force-pushed the quick-check branch 4 times, most recently from 37125bc to b6c3c3d Compare December 14, 2023 11:39
Comment on lines +345 to +349
void
rust_nfc_qc_test ()
{
ASSERT_EQ (Rust::nfc_quick_check ({0x1e0a /* NFC_QC_YES */}),
Rust::QuickCheckResult::YES);
ASSERT_EQ (Rust::nfc_quick_check (
{0x1e0a /* NFC_QC_YES */, 0x0323 /* NFC_QC_MAYBE */}),
Rust::QuickCheckResult::MAYBE);
ASSERT_EQ (Rust::nfc_quick_check ({0x0340 /* NFC_QC_NO */}),
Rust::QuickCheckResult::NO);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CohenArthur
Here are tests. (but very few...)
Sorry, I failed to commit gcc/rust/rust-lang.cc for some reason.

Copy link
Member

@CohenArthur CohenArthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great :) thanks @tamaroning, good work!


template <std::size_t SIZE>
int64_t
bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace this function with a direct call to std::binary_search instead ?

Copy link
Contributor Author

@tamaroning tamaroning Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes.
I will fix it later

gcc/rust/ChangeLog:

	* rust-lang.cc (run_rust_tests): Add test.
	* rust-system.h: Add <algorithm>.
	* util/make-rust-unicode.py: Output NFC_Quick_Check table.
	* util/rust-codepoint.h (struct Codepoint): Add is_supplementary
	method.
	* util/rust-unicode-data.h: Generated.
	* util/rust-unicode.cc (binary_search_sorted_array): Removed.
	(lookup_cc): Remove namespace.
	(is_alphabetic): Use std::binary_search
	(nfc_quick_check): New function.
	(nfc_normalize): Use nfc_quick_check.
	(is_nfc_qc_maybe): New function.
	(is_nfc_qc_no): New function.
	(rust_nfc_qc_test): New test.
	* util/rust-unicode.h (is_nfc_qc_no): New function.
	(is_nfc_qc_maybe): New function.
	(enum class): New enum class.
	(nfc_quick_check): New function.
	(rust_nfc_qc_test): New test.

Signed-off-by: Raiki Tamura <tamaron1203@gmail.com>
@tamaroning
Copy link
Contributor Author

tamaroning commented Jan 26, 2024

@P-E-P
Sorry for super late response!
I've been busy for my university exams
i fixed everything.

Copy link
Member

@P-E-P P-E-P left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@P-E-P P-E-P added this pull request to the merge queue Jan 29, 2024
Merged via the queue into Rust-GCC:master with commit d6e10d2 Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants