Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for new index hash implementation. #184

Merged
merged 2 commits into from
Feb 19, 2025

Conversation

UebelAndre
Copy link
Contributor

closes #182

@Byron
Copy link
Collaborator

Byron commented Feb 8, 2025

Thanks a lot for tackling this!

I took a look as well but couldn't see what the difference is. Cargo doesn't seem to do more than to switch out the hasher, so it's surprising that it doesn't work the same when that's done here.

Maybe the approach can be to find a test in Cargo where it would need the sparse-index path, and then figure out where it's actually generating the hash for the input path. Ideally one can get to a point where one can see exactly what's going into the stable-hash so it can be replicated here exactly.

@UebelAndre UebelAndre force-pushed the stable_hash branch 2 times, most recently from 9f63743 to 456b8e2 Compare February 9, 2025 05:06
@UebelAndre
Copy link
Contributor Author

I've implemented a hasher that matches what Cargo produces. I tested by adding the examples in the repo to @cargo//src/cargo/core/source_id.rs and ensuring the results were identical. What I'm not sure about is the canonical form of the URL which I've kept backward compatible but do not know what Cargo would produce for various registries with the new registry hashing strategy.

@UebelAndre UebelAndre force-pushed the stable_hash branch 3 times, most recently from 063fbdc to 5b5c665 Compare February 9, 2025 06:17
Copy link
Collaborator

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great that it works now!

It's probably on me to review and push into a draft PR, as these changes have now been overwritten. Also it's not visible anymore what the change was that made it work, but that would just have satisfied my own curiosity and isn't material to getting this PR merged.

Maybe you can integrate the changes I put on top the previous version. To my mind, the PR would then probably be ready for merging.

commit 285913dc50708ff27d577cce1bb1fc7371586336
Author: Sebastian Thiel <[email protected]>
Date:   Sat Feb 8 16:42:55 2025 +0100

    refactor
    
    - replace boolean with an enum to make purpose obvious
    - remove legacy hex implementation

diff --git a/src/dirs.rs b/src/dirs.rs
index 3d2df34..5a4760b 100644
--- a/src/dirs.rs
+++ b/src/dirs.rs
@@ -17,12 +17,12 @@ pub fn local_path_and_canonical_url(
     path.push("registry");
     path.push("index");
 
-    let (dir_name, canonical) = url_to_local_dir(url, true)?;
+    let (dir_name, canonical) = url_to_local_dir(url, HashKind::Stable)?;
     let canonical_url = if path.join(&dir_name).exists() {
         path.push(dir_name);
         canonical
     } else {
-        let (dir_name, canonical) = url_to_local_dir(url, false)?;
+        let (dir_name, canonical) = url_to_local_dir(url, HashKind::Legacy)?;
         path.push(dir_name);
         canonical
     };
@@ -77,37 +77,14 @@ pub(crate) fn crate_name_to_relative_path(crate_name: &str, separator: Option<ch
     Some(rel_path)
 }
 
+enum HashKind {
+    Stable,
+    Legacy,
+}
+
 /// Converts a full url, eg https://github.com/rust-lang/crates.io-index, into
 /// the root directory name where cargo itself will fetch it on disk
-fn url_to_local_dir(url: &str, use_stable_hash: bool) -> Result<(String, String), Error> {
-    fn legacy_to_hex(num: u64) -> String {
-        const CHARS: &[u8] = b"0123456789abcdef";
-
-        let bytes = &[
-            num as u8,
-            (num >> 8) as u8,
-            (num >> 16) as u8,
-            (num >> 24) as u8,
-            (num >> 32) as u8,
-            (num >> 40) as u8,
-            (num >> 48) as u8,
-            (num >> 56) as u8,
-        ];
-
-        let mut output = vec![0u8; 16];
-
-        let mut ind = 0;
-
-        for &byte in bytes {
-            output[ind] = CHARS[(byte >> 4) as usize];
-            output[ind + 1] = CHARS[(byte & 0xf) as usize];
-
-            ind += 2;
-        }
-
-        String::from_utf8(output).expect("valid utf-8 hex string")
-    }
-
+fn url_to_local_dir(url: &str, hash_kind: HashKind) -> Result<(String, String), Error> {
     #[allow(deprecated)]
     fn legacy_hash_u64(url: &str, registry_kind: u64) -> u64 {
         use std::hash::{Hash, Hasher, SipHasher};
@@ -120,7 +97,7 @@ fn url_to_local_dir(url: &str, use_stable_hash: bool) -> Result<(String, String)
         hasher.finish()
     }
 
-    fn stable_to_hex(num: u64) -> String {
+    fn to_hex(num: u64) -> String {
         hex::encode(num.to_le_bytes())
     }
 
@@ -137,18 +114,13 @@ fn url_to_local_dir(url: &str, use_stable_hash: bool) -> Result<(String, String)
         Hasher::finish(&hasher)
     }
 
-    let hash_u64 = if use_stable_hash {
+    let is_stable_hash = matches!(hash_kind, HashKind::Stable);
+    let hash_u64 = if is_stable_hash {
         stable_hash_u64
     } else {
         legacy_hash_u64
     };
 
-    let to_hex = if use_stable_hash {
-        stable_to_hex
-    } else {
-        legacy_to_hex
-    };
-
     // SourceKind::Registry
     let mut registry_kind = 2;
 
@@ -221,16 +193,17 @@ fn url_to_local_dir(url: &str, use_stable_hash: bool) -> Result<(String, String)
 
 #[cfg(test)]
 mod test {
+    use crate::dirs::HashKind;
 
     #[test]
     fn http_index_url_matches_cargo() {
         use crate::sparse::URL;
         assert_eq!(
-            super::url_to_local_dir(URL, false).unwrap(),
+            super::url_to_local_dir(dbg!(URL), HashKind::Legacy).unwrap(),
             ("index.crates.io-6f17d22bba15001f".to_owned(), URL.to_owned(),)
         );
         assert_eq!(
-            super::url_to_local_dir(URL, true).unwrap(),
+            super::url_to_local_dir(URL, HashKind::Stable).unwrap(),
             ("index.crates.io-1949cf8c6b5b557f".to_owned(), URL.to_owned(),)
         );
 
@@ -240,7 +213,7 @@ mod test {
         assert_eq!(
             super::url_to_local_dir(
                 "https://dl.cloudsmith.io/aBcW1234aBcW1234/embark/rust/cargo/index.git",
-                false
+                HashKind::Legacy
             )
             .unwrap(),
             (
@@ -251,7 +224,7 @@ mod test {
         assert_eq!(
             super::url_to_local_dir(
                 "https://dl.cloudsmith.io/aBcW1234aBcW1234/embark/rust/cargo/index.git",
-                true
+                HashKind::Stable
             )
             .unwrap(),
             (
@@ -266,22 +239,24 @@ mod test {
     fn git_url_matches_cargo() {
         use crate::git::URL;
         assert_eq!(
-            crate::dirs::url_to_local_dir(URL, false).unwrap(),
+            crate::dirs::url_to_local_dir(URL, HashKind::Legacy).unwrap(),
             ("github.com-1ecc6299db9ec823".to_owned(), URL.to_owned())
         );
         assert_eq!(
-            crate::dirs::url_to_local_dir(URL, true).unwrap(),
+            crate::dirs::url_to_local_dir(URL, HashKind::Stable).unwrap(),
             ("github.com-1ecc6299db9ec823".to_owned(), URL.to_owned())
         );
 
         // Ensure we actually strip off the irrelevant parts of a url, note that
         // the .git suffix is not part of the canonical url, but *is* used when hashing
         assert_eq!(
-            crate::dirs::url_to_local_dir(&format!("registry+{}.git?one=1&two=2#fragment", URL), false).unwrap(),
+            crate::dirs::url_to_local_dir(&format!("registry+{}.git?one=1&two=2#fragment", URL), HashKind::Legacy)
+                .unwrap(),
             ("github.com-c786010fb7ef2e6e".to_owned(), URL.to_owned())
         );
         assert_eq!(
-            crate::dirs::url_to_local_dir(&format!("registry+{}.git?one=1&two=2#fragment", URL), true).unwrap(),
+            crate::dirs::url_to_local_dir(&format!("registry+{}.git?one=1&two=2#fragment", URL), HashKind::Stable)
+                .unwrap(),
             ("github.com-c786010fb7ef2e6e".to_owned(), URL.to_owned())
         );
     }

What I'm not sure about is the canonical form of the URL which I've kept backward compatible but do not know what Cargo would produce for various registries with the new registry hashing strategy.

It's probably OK to merge this PR as it's seemingly working, and put a fix later once it's clearer how custom registry paths truly work now.

@UebelAndre
Copy link
Contributor Author

UebelAndre commented Feb 9, 2025

re #184 (review) @Byron

It's probably OK to merge this PR as it's seemingly working, and put a fix later once it's clearer how custom registry paths truly work now.

I've applied your patch and also added new functions suffixed with _with_hash_kind that enable users to choose what hashing strategy they use and avoid an interface change for to be less disruptive to folks.

It's probably on me to review and push into a draft PR, as these changes have now been overwritten. Also it's not visible anymore what the change was that made it work, but that would just have satisfied my own curiosity and isn't material to getting this PR merged.

Sorry! I saw you merged #185 so had fetched master and rebased my local branch without fetching 😞. I wasn't expecting a full review or I wouldn't have force pushed anything. The most notable changes were:

  1. Hashing registry kind as isize
  2. Updating the "canonical" url being hashed, which cargo would appear to now treat differently.

Again, for testing I used the test data from http_index_url_matches_cargo and put it in @cargo//src/cargo/core/source_id.rs and updated my changes until they produced the same output as Cargo.

@UebelAndre
Copy link
Contributor Author

@Byron also, whenever whatever form of this merges, would it be possible to get a new release on crates.io?

@UebelAndre UebelAndre requested a review from Byron February 10, 2025 17:04
@UebelAndre
Copy link
Contributor Author

@Byron friendly ping 😄

let me know if there’s anything more you think I should do for this change!

UebelAndre and others added 2 commits February 19, 2025 07:17
 This also adds `dirs::local_path_and_canonical_url_with_hash_kind()`
and `SparseIndex::with_path_and_hash_kind()` to control
which hash is used.
Copy link
Collaborator

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, this looks good now!

I will create a new release shortly.

@Byron Byron merged commit 1f3b4b0 into frewsxcv:master Feb 19, 2025
3 checks passed
github-merge-queue bot pushed a commit to bazelbuild/rules_rust that referenced this pull request Feb 20, 2025
This change uses frewsxcv/rust-crates-index#184
to fix issues around locating crate files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Registry index paths changed in Rust 1.85.0
2 participants