Skip to content

Flatbuffers impl #446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

Flatbuffers impl #446

wants to merge 14 commits into from

Conversation

boocmp
Copy link
Collaborator

@boocmp boocmp commented Mar 28, 2025

  1. The NetworkFilterList implementation is now based on FlatBuffers.
  2. The serialization format has been updated to support this change.
  3. Serialization benchmarks have been moved to a separate file.
  4. Support for adding filters after NetworkFilterList creation has been removed. Since rebuilding the entire FlatBuffer is costly, this functionality may be reintroduced in the future if there is a clear need for it.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust Benchmark

Benchmark suite Current: 79b0b83 Previous: 67b7b70 Ratio
rule-match-browserlike/brave-list 2191632901 ns/iter (± 16546585) 1803081237 ns/iter (± 16195247) 1.22
rule-match-first-request/brave-list 1049815 ns/iter (± 25195) 980016 ns/iter (± 5452) 1.07
blocker_new/brave-list 160335860 ns/iter (± 2521451) 203961371 ns/iter (± 2004867) 0.79
memory-usage/brave-list-initial 21536659 ns/iter (± 3) 41762172 ns/iter (± 3) 0.52
memory-usage/brave-list-after-1000-requests 24141128 ns/iter (± 3) 44355700 ns/iter (± 3) 0.54

This comment was automatically generated by workflow using github-action-benchmark.

@boocmp boocmp force-pushed the flatbuffers_impl branch from 91c5ec7 to cacfcc6 Compare April 9, 2025 00:55
bytes.as_ptr() as *const u16,
bytes.len() / std::mem::size_of::<u16>(),
)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that from_raw_parts results in issues if bytes.as_ptr() isn't aligned to 2 bytes We need to assert this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert added

bytes.as_ptr() as *const u16,
bytes.len() / std::mem::size_of::<u16>(),
)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

}

let filters_list =
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@boocmp use root_as_network_filter_list() + .expect() to remove unsafe

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it runs a verification process, it'll degrade performance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we build the flatbuffer just before using root_as_network_filter_list_unchecked should be fine in terms of security.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this check on deserialization

unsafe {
self._tab
.get::<flatbuffers::ForwardsUOffset<&str>>(NetworkFilter::VT_RAW_LINE, None)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

@boocmp boocmp force-pushed the flatbuffers_impl branch from cacfcc6 to a3df20d Compare April 9, 2025 01:05
@boocmp boocmp marked this pull request as ready for review April 9, 2025 01:38
@boocmp boocmp requested review from atuchin-m and antonok-edm April 9, 2025 01:38
Copy link
Collaborator

@atuchin-m atuchin-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few nits

bytes.as_ptr() as *const u16,
bytes.len() / std::mem::size_of::<u16>(),
)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

bytes.as_ptr() as *const u16,
bytes.len() / std::mem::size_of::<u16>(),
)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

Comment on lines +392 to +371
let mut disabled_directives: HashSet<String> = HashSet::new();
let mut enabled_directives: HashSet<String> = HashSet::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not &str anymore?

if self.filter_map.is_empty() {
return None;
}

let filters_list =
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog


if self.filter_map.is_empty() {
return filters;
}

let filters_list =
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

@boocmp boocmp force-pushed the flatbuffers_impl branch 4 times, most recently from 0238db7 to a40fa5e Compare April 28, 2025 07:50
bytes.as_ptr() as *const u16,
bytes.len() / std::mem::size_of::<u16>(),
)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

bytes.as_ptr() as *const u16,
bytes.len() / std::mem::size_of::<u16>(),
)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

bytes.as_ptr() as *const u16,
bytes.len() / std::mem::size_of::<u16>(),
)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

Comment on lines 25 to 30
pub trait EngineSerializer {
fn serialize(&self) -> Result<Vec<u8>, SerializeError>;
fn deserialize(&mut self, serialized: &[u8]) -> Result<(), DeserializeError>;
}

impl EngineSerializer for Engine {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we leave these directly on impl Engine? I see no reason to move it into a separate trait; just makes the API slightly more confusing for users

index: u32,
owner: &'a NetworkFilterList,
) -> Self {
let list_address: *const NetworkFilterList = owner as *const NetworkFilterList;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointer casting often indicates some problematic design patterns in Rust. I think we can remove it in this instance?

  • this owner field can increase memory usage since it's an additional address-sized field on each filter, even though they'd all have the same value for all filters in a list
  • I only see it used in NetworkMatchable to make it conform to the existing API. I'm not opposed to that API changing if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FlatNetworkFilter is a wrapper over the fb::NetworkFilter. We only create the instance of it when found the the fb::NetworkFilter in the NetworkFilterList so it does't increase the memory usage because it temporary object on the stack. For now, regex manager is designed to use unique key to find corresponding regex, previously it was an address of the NetworkFilter, but now we don't have unique address for every filter because they stored in flatbuffer. To create unique key we use the list address combined with the filter's index. I agree it looks ugly but I don't know how to do it better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's understand why we need this unique key.
If it's only for regex manager, let's:

  1. try to move key calculation here
  2. add TODO to make keys unique only in scope of regex manager. I believe once that in one of the next steps we move from multiple flatbuffers to a just one that ables to contain multiple lists. That way we get 1:1 proportion (regex manager: flatbuffer).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this could be done in follow up: this PR is already waiting for a long time.

Comment on lines 104 to 110
patterns: patterns,
modifier_option: modifier_option,
opt_domains: opt_domains,
opt_not_domains: opt_not_domains,
hostname: hostname,
tag: tag,
raw_line: raw_line,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can all use shorthand field initialization

Comment on lines 48 to 62
#[derive(Serialize, Deserialize)]
pub struct NetworkFilterList {
pub(crate) flatbuffer_memory: Vec<u8>,
pub(crate) filter_map: HashMap<Hash, Vec<u32>>,
pub(crate) unique_domains_hashes_map: HashMap<Hash, u16>,
}

impl Default for NetworkFilterList {
fn default() -> Self {
Self {
flatbuffer_memory: Default::default(),
filter_map: Default::default(),
unique_domains_hashes_map: Default::default(),
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[derive(Serialize, Deserialize, Default)] should automatically produce this Default implementation already

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason it doesn't, I think because of the new version of serde

}

#[derive(Serialize, Deserialize)]
pub struct NetworkFilterList {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep the original pub(crate) visibility?

Comment on lines -1302 to -1326
assert_eq!(added.err().unwrap(), BlockerError::BadFilterAddUnsupported);
}

#[test]
#[ignore]
fn filter_add_twice_handling_error() {
{
// Not allow filter to be added twice hwn the engine is not optimised
let blocker_options: BlockerOptions = BlockerOptions {
enable_optimizations: false,
};

let mut blocker = Blocker::new(Vec::new(), &blocker_options);

let filter = NetworkFilter::parse("adv", true, Default::default()).unwrap();
blocker.add_filter(filter.clone()).unwrap();
assert!(
blocker.filter_exists(&filter),
"Expected filter to be inserted"
);
let added = blocker.add_filter(filter);
assert!(added.is_err(), "Expected repeated insertion to fail");
assert_eq!(
added.err().unwrap(),
BlockerError::FilterExists,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BadFilterAddUnsupported and FilterExists error variants can be removed now

Copy link

[puLL-Merge] - brave/adblock-rust@446

Description

This PR changes the internal representation of network filters in the adblock-rust library, switching to using flatbuffers for serialization instead of the previous approach. The main goal appears to be improving performance and storage efficiency. The PR also refactors the serialization benchmarks into a separate file and makes flatbuffers a required dependency rather than an optional one.

Changes

Changes

  1. Cargo.toml

    • Makes flatbuffers a required dependency (no longer optional)
    • Removes the optional flatbuffers-storage feature
    • Adds a new benchmark for serialization
  2. Benchmarks

    • Moves serialization/deserialization benchmarks from bench_matching.rs to a new file bench_serialization.rs
    • Updates serialization benchmark to use the new serialize() method instead of serialize_raw()
  3. FlatBuffers Implementation

    • Adds a new module fb_network.rs that implements flatbuffers-based network filter storage
    • Updates the flatbuffers schema to include a raw_line field
  4. Engine API

    • Renames serialize_raw() to serialize() and deserialize() remains the same
    • Makes blocker and cosmetic_cache fields public(crate)
    • Adjusts internal implementation to use the new flatbuffers approach
  5. Blocker Implementation

    • Removes methods optimize(), add_filter(), and filter_exists() from Blocker
    • Changes internal filter storage to use indices into the flatbuffer instead of Arc pointers
  6. Network Filter Matching

    • Refactors check_options, splitting it into separate functions for domain inclusion/exclusion
    • Adds versions of these functions that work with the new flatbuffers storage
  7. Examples and Tests

    • Updates examples to use the new API
    • Updates tests to work with the new implementation
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant B as Blocker
    participant FB as FlatNetworkFiltersListBuilder
    participant NFList as NetworkFilterList

    C->>E: create Engine
    E->>B: create Blocker
    B->>FB: new FlatNetworkFiltersListBuilder
    loop For each network filter
        B->>FB: add filter
        FB-->>B: filter index
    end
    FB->>B: finish() (creates flatbuffer)
    B-->>E: return Blocker with flatbuffer storage

    C->>E: serialize()
    E->>E: create SerializeFormat with blocker & cosmetic_cache
    E-->>C: return binary data

    C->>E: deserialize(binary)
    E->>E: parse and rebuild internal state from binary
    E-->>C: return success/error
    
    C->>E: check_network_request(request)
    E->>B: check(request, resources)
    B->>NFList: match_rule(request, active_tags, regex_manager)
    NFList->>FB: get network_filter by index
    FB-->>NFList: return FlatNetworkFilter
    NFList->>NFList: check if filter matches request
    NFList-->>B: return CheckResult
    B-->>E: return BlockerResult
    E-->>C: return result
Loading

Possible Issues

  1. Some methods have been removed from the Blocker class that may have been used by external code, such as optimize(), add_filter(), and filter_exists().

  2. The PR introduces a significant change in the internal storage format, which might result in compatibility issues with existing serialized engines.

  3. The error types and debug information have changed, which could affect error handling in client code.

Security Hotspots

No significant security concerns were identified. The PR mainly changes internal storage mechanisms rather than security-sensitive components.

@boocmp boocmp requested review from antonok-edm and atuchin-m May 16, 2025 08:00
@boocmp boocmp force-pushed the flatbuffers_impl branch from a541821 to 79b0b83 Compare May 16, 2025 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants