-
Notifications
You must be signed in to change notification settings - Fork 140
Flatbuffers impl #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Flatbuffers impl #446
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust Benchmark
Benchmark suite | Current: 79b0b83 | Previous: 67b7b70 | Ratio |
---|---|---|---|
rule-match-browserlike/brave-list |
2191632901 ns/iter (± 16546585 ) |
1803081237 ns/iter (± 16195247 ) |
1.22 |
rule-match-first-request/brave-list |
1049815 ns/iter (± 25195 ) |
980016 ns/iter (± 5452 ) |
1.07 |
blocker_new/brave-list |
160335860 ns/iter (± 2521451 ) |
203961371 ns/iter (± 2004867 ) |
0.79 |
memory-usage/brave-list-initial |
21536659 ns/iter (± 3 ) |
41762172 ns/iter (± 3 ) |
0.52 |
memory-usage/brave-list-after-1000-requests |
24141128 ns/iter (± 3 ) |
44355700 ns/iter (± 3 ) |
0.54 |
This comment was automatically generated by workflow using github-action-benchmark.
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that from_raw_parts
results in issues if bytes.as_ptr()
isn't aligned to 2 bytes We need to assert this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert added
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
src/flat_network_filter_list.rs
Outdated
} | ||
|
||
let filters_list = | ||
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boocmp use root_as_network_filter_list()
+ .expect()
to remove unsafe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it runs a verification process, it'll degrade performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we build the flatbuffer just before using root_as_network_filter_list_unchecked
should be fine in terms of security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this check on deserialization
unsafe { | ||
self._tab | ||
.get::<flatbuffers::ForwardsUOffset<&str>>(NetworkFilter::VT_RAW_LINE, None) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few nits
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
let mut disabled_directives: HashSet<String> = HashSet::new(); | ||
let mut enabled_directives: HashSet<String> = HashSet::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not &str
anymore?
if self.filter_map.is_empty() { | ||
return None; | ||
} | ||
|
||
let filters_list = | ||
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
|
||
if self.filter_map.is_empty() { | ||
return filters; | ||
} | ||
|
||
let filters_list = | ||
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
0238db7
to
a40fa5e
Compare
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
src/engine_serializer.rs
Outdated
pub trait EngineSerializer { | ||
fn serialize(&self) -> Result<Vec<u8>, SerializeError>; | ||
fn deserialize(&mut self, serialized: &[u8]) -> Result<(), DeserializeError>; | ||
} | ||
|
||
impl EngineSerializer for Engine { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we leave these directly on impl Engine
? I see no reason to move it into a separate trait; just makes the API slightly more confusing for users
index: u32, | ||
owner: &'a NetworkFilterList, | ||
) -> Self { | ||
let list_address: *const NetworkFilterList = owner as *const NetworkFilterList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pointer casting often indicates some problematic design patterns in Rust. I think we can remove it in this instance?
- this
owner
field can increase memory usage since it's an additional address-sized field on each filter, even though they'd all have the same value for all filters in a list - I only see it used in
NetworkMatchable
to make it conform to the existing API. I'm not opposed to that API changing if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FlatNetworkFilter
is a wrapper over the fb::NetworkFilter
. We only create the instance of it when found the the fb::NetworkFilter
in the NetworkFilterList
so it does't increase the memory usage because it temporary object on the stack. For now, regex manager is designed to use unique key to find corresponding regex, previously it was an address of the NetworkFilter
, but now we don't have unique address for every filter because they stored in flatbuffer. To create unique key we use the list address combined with the filter's index. I agree it looks ugly but I don't know how to do it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's understand why we need this unique key
.
If it's only for regex manager, let's:
- try to move key calculation here
- add TODO to make keys unique only in scope of regex manager. I believe once that in one of the next steps we move from multiple flatbuffers to a just one that ables to contain multiple lists. That way we get 1:1 proportion (regex manager: flatbuffer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this could be done in follow up: this PR is already waiting for a long time.
src/filters/fb_network.rs
Outdated
patterns: patterns, | ||
modifier_option: modifier_option, | ||
opt_domains: opt_domains, | ||
opt_not_domains: opt_not_domains, | ||
hostname: hostname, | ||
tag: tag, | ||
raw_line: raw_line, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can all use shorthand field initialization
src/network_filter_list.rs
Outdated
#[derive(Serialize, Deserialize)] | ||
pub struct NetworkFilterList { | ||
pub(crate) flatbuffer_memory: Vec<u8>, | ||
pub(crate) filter_map: HashMap<Hash, Vec<u32>>, | ||
pub(crate) unique_domains_hashes_map: HashMap<Hash, u16>, | ||
} | ||
|
||
impl Default for NetworkFilterList { | ||
fn default() -> Self { | ||
Self { | ||
flatbuffer_memory: Default::default(), | ||
filter_map: Default::default(), | ||
unique_domains_hashes_map: Default::default(), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Serialize, Deserialize, Default)]
should automatically produce this Default
implementation already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason it doesn't, I think because of the new version of serde
src/network_filter_list.rs
Outdated
} | ||
|
||
#[derive(Serialize, Deserialize)] | ||
pub struct NetworkFilterList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep the original pub(crate)
visibility?
assert_eq!(added.err().unwrap(), BlockerError::BadFilterAddUnsupported); | ||
} | ||
|
||
#[test] | ||
#[ignore] | ||
fn filter_add_twice_handling_error() { | ||
{ | ||
// Not allow filter to be added twice hwn the engine is not optimised | ||
let blocker_options: BlockerOptions = BlockerOptions { | ||
enable_optimizations: false, | ||
}; | ||
|
||
let mut blocker = Blocker::new(Vec::new(), &blocker_options); | ||
|
||
let filter = NetworkFilter::parse("adv", true, Default::default()).unwrap(); | ||
blocker.add_filter(filter.clone()).unwrap(); | ||
assert!( | ||
blocker.filter_exists(&filter), | ||
"Expected filter to be inserted" | ||
); | ||
let added = blocker.add_filter(filter); | ||
assert!(added.is_err(), "Expected repeated insertion to fail"); | ||
assert_eq!( | ||
added.err().unwrap(), | ||
BlockerError::FilterExists, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BadFilterAddUnsupported
and FilterExists
error variants can be removed now
[puLL-Merge] - brave/adblock-rust@446 DescriptionThis PR changes the internal representation of network filters in the adblock-rust library, switching to using flatbuffers for serialization instead of the previous approach. The main goal appears to be improving performance and storage efficiency. The PR also refactors the serialization benchmarks into a separate file and makes flatbuffers a required dependency rather than an optional one. ChangesChanges
sequenceDiagram
participant C as Client
participant E as Engine
participant B as Blocker
participant FB as FlatNetworkFiltersListBuilder
participant NFList as NetworkFilterList
C->>E: create Engine
E->>B: create Blocker
B->>FB: new FlatNetworkFiltersListBuilder
loop For each network filter
B->>FB: add filter
FB-->>B: filter index
end
FB->>B: finish() (creates flatbuffer)
B-->>E: return Blocker with flatbuffer storage
C->>E: serialize()
E->>E: create SerializeFormat with blocker & cosmetic_cache
E-->>C: return binary data
C->>E: deserialize(binary)
E->>E: parse and rebuild internal state from binary
E-->>C: return success/error
C->>E: check_network_request(request)
E->>B: check(request, resources)
B->>NFList: match_rule(request, active_tags, regex_manager)
NFList->>FB: get network_filter by index
FB-->>NFList: return FlatNetworkFilter
NFList->>NFList: check if filter matches request
NFList-->>B: return CheckResult
B-->>E: return BlockerResult
E-->>C: return result
Possible Issues
Security HotspotsNo significant security concerns were identified. The PR mainly changes internal storage mechanisms rather than security-sensitive components. |
Added raw_line field to flatbuffers for testing.
NetworkFilterList
implementation is now based onFlatBuffers
.NetworkFilterList
creation has been removed. Since rebuilding the entireFlatBuffer
is costly, this functionality may be reintroduced in the future if there is a clear need for it.