-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for new index hash implementation. #184
Conversation
fc491c2
to
99d9c77
Compare
Thanks a lot for tackling this! I took a look as well but couldn't see what the difference is. Cargo doesn't seem to do more than to switch out the hasher, so it's surprising that it doesn't work the same when that's done here. Maybe the approach can be to find a test in Cargo where it would need the sparse-index path, and then figure out where it's actually generating the hash for the input path. Ideally one can get to a point where one can see exactly what's going into the stable-hash so it can be replicated here exactly. |
9f63743
to
456b8e2
Compare
I've implemented a hasher that matches what Cargo produces. I tested by adding the examples in the repo to @cargo//src/cargo/core/source_id.rs and ensuring the results were identical. What I'm not sure about is the canonical form of the URL which I've kept backward compatible but do not know what Cargo would produce for various registries with the new registry hashing strategy. |
063fbdc
to
5b5c665
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great that it works now!
It's probably on me to review and push into a draft PR, as these changes have now been overwritten. Also it's not visible anymore what the change was that made it work, but that would just have satisfied my own curiosity and isn't material to getting this PR merged.
Maybe you can integrate the changes I put on top the previous version. To my mind, the PR would then probably be ready for merging.
commit 285913dc50708ff27d577cce1bb1fc7371586336
Author: Sebastian Thiel <[email protected]>
Date: Sat Feb 8 16:42:55 2025 +0100
refactor
- replace boolean with an enum to make purpose obvious
- remove legacy hex implementation
diff --git a/src/dirs.rs b/src/dirs.rs
index 3d2df34..5a4760b 100644
--- a/src/dirs.rs
+++ b/src/dirs.rs
@@ -17,12 +17,12 @@ pub fn local_path_and_canonical_url(
path.push("registry");
path.push("index");
- let (dir_name, canonical) = url_to_local_dir(url, true)?;
+ let (dir_name, canonical) = url_to_local_dir(url, HashKind::Stable)?;
let canonical_url = if path.join(&dir_name).exists() {
path.push(dir_name);
canonical
} else {
- let (dir_name, canonical) = url_to_local_dir(url, false)?;
+ let (dir_name, canonical) = url_to_local_dir(url, HashKind::Legacy)?;
path.push(dir_name);
canonical
};
@@ -77,37 +77,14 @@ pub(crate) fn crate_name_to_relative_path(crate_name: &str, separator: Option<ch
Some(rel_path)
}
+enum HashKind {
+ Stable,
+ Legacy,
+}
+
/// Converts a full url, eg https://github.com/rust-lang/crates.io-index, into
/// the root directory name where cargo itself will fetch it on disk
-fn url_to_local_dir(url: &str, use_stable_hash: bool) -> Result<(String, String), Error> {
- fn legacy_to_hex(num: u64) -> String {
- const CHARS: &[u8] = b"0123456789abcdef";
-
- let bytes = &[
- num as u8,
- (num >> 8) as u8,
- (num >> 16) as u8,
- (num >> 24) as u8,
- (num >> 32) as u8,
- (num >> 40) as u8,
- (num >> 48) as u8,
- (num >> 56) as u8,
- ];
-
- let mut output = vec![0u8; 16];
-
- let mut ind = 0;
-
- for &byte in bytes {
- output[ind] = CHARS[(byte >> 4) as usize];
- output[ind + 1] = CHARS[(byte & 0xf) as usize];
-
- ind += 2;
- }
-
- String::from_utf8(output).expect("valid utf-8 hex string")
- }
-
+fn url_to_local_dir(url: &str, hash_kind: HashKind) -> Result<(String, String), Error> {
#[allow(deprecated)]
fn legacy_hash_u64(url: &str, registry_kind: u64) -> u64 {
use std::hash::{Hash, Hasher, SipHasher};
@@ -120,7 +97,7 @@ fn url_to_local_dir(url: &str, use_stable_hash: bool) -> Result<(String, String)
hasher.finish()
}
- fn stable_to_hex(num: u64) -> String {
+ fn to_hex(num: u64) -> String {
hex::encode(num.to_le_bytes())
}
@@ -137,18 +114,13 @@ fn url_to_local_dir(url: &str, use_stable_hash: bool) -> Result<(String, String)
Hasher::finish(&hasher)
}
- let hash_u64 = if use_stable_hash {
+ let is_stable_hash = matches!(hash_kind, HashKind::Stable);
+ let hash_u64 = if is_stable_hash {
stable_hash_u64
} else {
legacy_hash_u64
};
- let to_hex = if use_stable_hash {
- stable_to_hex
- } else {
- legacy_to_hex
- };
-
// SourceKind::Registry
let mut registry_kind = 2;
@@ -221,16 +193,17 @@ fn url_to_local_dir(url: &str, use_stable_hash: bool) -> Result<(String, String)
#[cfg(test)]
mod test {
+ use crate::dirs::HashKind;
#[test]
fn http_index_url_matches_cargo() {
use crate::sparse::URL;
assert_eq!(
- super::url_to_local_dir(URL, false).unwrap(),
+ super::url_to_local_dir(dbg!(URL), HashKind::Legacy).unwrap(),
("index.crates.io-6f17d22bba15001f".to_owned(), URL.to_owned(),)
);
assert_eq!(
- super::url_to_local_dir(URL, true).unwrap(),
+ super::url_to_local_dir(URL, HashKind::Stable).unwrap(),
("index.crates.io-1949cf8c6b5b557f".to_owned(), URL.to_owned(),)
);
@@ -240,7 +213,7 @@ mod test {
assert_eq!(
super::url_to_local_dir(
"https://dl.cloudsmith.io/aBcW1234aBcW1234/embark/rust/cargo/index.git",
- false
+ HashKind::Legacy
)
.unwrap(),
(
@@ -251,7 +224,7 @@ mod test {
assert_eq!(
super::url_to_local_dir(
"https://dl.cloudsmith.io/aBcW1234aBcW1234/embark/rust/cargo/index.git",
- true
+ HashKind::Stable
)
.unwrap(),
(
@@ -266,22 +239,24 @@ mod test {
fn git_url_matches_cargo() {
use crate::git::URL;
assert_eq!(
- crate::dirs::url_to_local_dir(URL, false).unwrap(),
+ crate::dirs::url_to_local_dir(URL, HashKind::Legacy).unwrap(),
("github.com-1ecc6299db9ec823".to_owned(), URL.to_owned())
);
assert_eq!(
- crate::dirs::url_to_local_dir(URL, true).unwrap(),
+ crate::dirs::url_to_local_dir(URL, HashKind::Stable).unwrap(),
("github.com-1ecc6299db9ec823".to_owned(), URL.to_owned())
);
// Ensure we actually strip off the irrelevant parts of a url, note that
// the .git suffix is not part of the canonical url, but *is* used when hashing
assert_eq!(
- crate::dirs::url_to_local_dir(&format!("registry+{}.git?one=1&two=2#fragment", URL), false).unwrap(),
+ crate::dirs::url_to_local_dir(&format!("registry+{}.git?one=1&two=2#fragment", URL), HashKind::Legacy)
+ .unwrap(),
("github.com-c786010fb7ef2e6e".to_owned(), URL.to_owned())
);
assert_eq!(
- crate::dirs::url_to_local_dir(&format!("registry+{}.git?one=1&two=2#fragment", URL), true).unwrap(),
+ crate::dirs::url_to_local_dir(&format!("registry+{}.git?one=1&two=2#fragment", URL), HashKind::Stable)
+ .unwrap(),
("github.com-c786010fb7ef2e6e".to_owned(), URL.to_owned())
);
}
What I'm not sure about is the canonical form of the URL which I've kept backward compatible but do not know what Cargo would produce for various registries with the new registry hashing strategy.
It's probably OK to merge this PR as it's seemingly working, and put a fix later once it's clearer how custom registry paths truly work now.
I've applied your patch and also added new functions suffixed with
Sorry! I saw you merged #185 so had fetched
Again, for testing I used the test data from |
@Byron also, whenever whatever form of this merges, would it be possible to get a new release on crates.io? |
@Byron friendly ping 😄 let me know if there’s anything more you think I should do for this change! |
This also adds `dirs::local_path_and_canonical_url_with_hash_kind()` and `SparseIndex::with_path_and_hash_kind()` to control which hash is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, this looks good now!
I will create a new release shortly.
This change uses frewsxcv/rust-crates-index#184 to fix issues around locating crate files.
closes #182