[enhancement]: Detect fake "addresses" in From headers #982

TaaviE · 2024-12-04T12:36:36Z

Which feature or improvement would you like to request?

It would be very nice if Stalwart had built-in rules to detect fake "addresses" in the comment section of From headers, such as [email protected] in the header content "Foo ([email protected])" <[email protected]>.

Any such fake address (fake because it's not actually functional) should ideally get a symbol indicating its existence, with a small score. After that, it should be checked whether the domain used aligns with DMARC/DKIM/SPF; if it doesn't, the message should get a new symbol with a higher score (but probably not as high as a full DMARC failure, for example, because there might be false positives).
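
To make the two-step scoring described above concrete, here is a minimal Python sketch. It is not Stalwart's implementation. The symbol names `FROM_NAME_HAS_ADDR` and `FROM_NAME_ADDR_UNALIGNED`, the sample addresses, and the `authenticated_domains` input (domains assumed to have already passed SPF/DKIM/DMARC checks upstream) are all hypothetical.

```python
import re
from email.utils import parseaddr

# Loose pattern for address-like text; illustrative only, not an RFC 5322 parser.
ADDR_LIKE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def from_symbols(from_header: str, authenticated_domains: set[str]) -> list[str]:
    """Return hypothetical symbols for address-like text in the display name."""
    display_name, _real_addr = parseaddr(from_header)
    embedded = [m.lower() for m in ADDR_LIKE.findall(display_name)]
    if not embedded:
        return []
    # Step 1 (small score): an address-like string exists in the display name.
    symbols = ["FROM_NAME_HAS_ADDR"]
    # Step 2 (higher score): at least one embedded domain did not authenticate.
    domains = {addr.rsplit("@", 1)[1] for addr in embedded}
    if not domains <= authenticated_domains:
        symbols.append("FROM_NAME_ADDR_UNALIGNED")
    return symbols


header = '"Foo (billing@bank.example)" <foo@mailer.example>'
print(from_symbols(header, {"mailer.example"}))
```

Keeping the two symbols separate lets an administrator score mere presence lightly and the unaligned case more heavily, as the request suggests.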

Is your feature request related to a problem?

Spammers are increasingly (ab)using the comment part of From headers by including a fake "address" in parentheses. This is likely because a lot of (IMHO very poorly designed) MUAs hide the actual sender if a comment exists in the From header, making the address in parentheses look like the actual one.

Stalwart should be able both to penalize the existence of such fake addresses in From headers and to detect when such addresses do not match the sender and/or do not align with SPF/DMARC/DKIM.

Additional context

In addition to "Foo ([email protected])" <[email protected]> I've also seen "Foo [email protected]" <[email protected]> and "Foo '[email protected]'" <[email protected]>.

Code of Conduct

I agree to follow this project's Code of Conduct


nomadturk · 2024-12-04T13:21:45Z

How about:

"Foo Bar '<[email protected]>' , <[email protected]>, [email protected]" <[email protected]>
or
Foo Bar "'<[email protected]>' , <[email protected]>, [email protected]" <[email protected]>

Commas and multiple quotes also create confusion.
But did you actually test to see if Stalwart displays the fake address @TaaviE ?

TaaviE · 2024-12-04T14:06:23Z

@nomadturk

> How about [...]

I haven't encountered such variations. I suspect the efficacy of such approaches declines rapidly with anything less common than parentheses. Still, Stalwart should pick out those addresses as well and penalize any forgeries or mismatches.

> But did you actually test to see if Stalwart displays the fake address @TaaviE ?

Stalwart is not a MUA that would be doing any displaying that matters in this case.

nomadturk · 2024-12-04T14:57:16Z

Well, that's the thing. Stalwart does pick them up properly and save the email, as intended.

I've tried it with SnappyMail, Roundcube and Twake Mail. There are no problems there; they were able to parse those emails properly as well. SnappyMail, for example, displays a tick if DKIM/SPF etc. are all valid.
So, as far as I can see, Stalwart does most of what it should and there are no problems here.

Since it's not clear, maybe the wording for your ask should be more like:
"Add another spam rule to check for emails inside the from address and punish them even further"

I don't think the server should dictate these rules for everyone though. The way it is now, we have no say over them and it's impossible to use external rulesets like KAM rules and so on.

I believe that will be possible once the new spam filter is written (#947).
We should be able to add our own SpamAssassin-compatible rules, update or modify rules, and add such a filter ourselves.

Hopefully, when that happens, we can create custom rules like below and achieve what you ask for by then.
(This rule probably won't work)


# Rule to detect email addresses embedded in the name portion of the From header
header   NAME_EMAIL_IN_NAME_FIELD From =~ /["']?[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}.*["']?\s*<.*>/
describe NAME_EMAIL_IN_NAME_FIELD Email address appears in name portion of From header
score    NAME_EMAIL_IN_NAME_FIELD 2.5

# Rule to detect suspicious formats in the From header
header   SUSPICIOUS_FROM_FORMAT From =~ /["'].*(<[^>]+>.*<[^>]+>|'.*<.*>)/i
describe SUSPICIOUS_FROM_FORMAT Suspicious multiple addresses or malformed From header
score    SUSPICIOUS_FROM_FORMAT 3.0

# Rule to detect extra characters around email addresses in the From header
header   EXTRA_CHARS_IN_FROM From =~ /['"<][^'">]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}['">],?/
describe EXTRA_CHARS_IN_FROM Extra characters around email address in From header
score    EXTRA_CHARS_IN_FROM 2.0

TaaviE · 2024-12-04T15:18:53Z

> Add another spam rule to check for emails inside the from address and punish them even further

I think it is fairly obvious given I have mentioned scoring and symbols.

> I don't think the server should dictate these rules for everyone though. The way it is now, we have no say over them and it's impossible to use external rulesets like KAM rules and so on.

If and how people assign (additional) score to those depends on those people. That's really not a reason not to improve built-in detections and rulesets.

> We should be able to add our own SpamAssassin-compatible rules, update or modify rules, and add such a filter ourselves.

Stalwart has so far followed Rspamd's symbol naming and logic. It would be beneficial to stay on that track, as SpamAssassin falls significantly short of Rspamd, from rule syntax to flexibility in general. An amalgam of both would be the least preferred option.

> Hopefully, when that happens, we can create custom rules like below and achieve what you ask for by then.

They don't. These rules cannot handle UTF-8, and they have no checks to avoid scoring allowed (albeit weird) usage: a display name containing the sender's own address, or an address from the same domain (or from an authenticated domain), should not be scored.
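
The two gaps named above (encoded display names and exemptions for legitimate usage) could be handled roughly as in the following Python sketch. It is an illustration under assumptions, not a proposed Stalwart or Rspamd rule: the function name, the sample addresses, and the `authenticated_domains` input (domains taken to have passed authentication earlier in the pipeline) are hypothetical, and error handling for malformed encoded words is omitted.

```python
import re
from email.header import decode_header, make_header
from email.utils import parseaddr

# \w is Unicode-aware for str patterns, so non-ASCII local parts also match.
ADDR_LIKE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def suspicious_embedded(from_header: str, authenticated_domains: set[str]) -> list[str]:
    """Return embedded addresses that are neither the sender's nor exempt."""
    raw_name, real_addr = parseaddr(from_header)
    # Decode RFC 2047 encoded words so UTF-8 display names are inspected too.
    name = str(make_header(decode_header(raw_name)))
    real_addr = real_addr.lower()
    allowed = {real_addr.rpartition("@")[2]}
    allowed |= {d.lower() for d in authenticated_domains}
    flagged = []
    for addr in ADDR_LIKE.findall(name):
        addr = addr.lower()
        if addr == real_addr:
            continue  # same address repeated in the name: weird but allowed
        if addr.rpartition("@")[2] in allowed:
            continue  # same domain as the sender, or an authenticated domain
        flagged.append(addr)
    return flagged


encoded = "=?UTF-8?Q?J=C3=BCrgen_=28ceo=40bank=2Eexample=29?= <foo@mailer.example>"
print(suspicious_embedded(encoded, set()))
```

Only addresses that survive both exemptions are returned, so a rule built on this would stay silent for senders who merely repeat their own address in the display name.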

nomadturk · 2024-12-04T16:17:46Z

> I think it is fairly obvious given I have mentioned scoring and symbols.

It's not.

> If and how people assign (additional) score to those depends on those people. That's really not a reason not to improve built-in detections and rulesets.

True. As long as rules themselves are editable.

> Stalwart has so far followed Rspamd's symbol naming and logic. It would be beneficial to stay on that track, as SpamAssassin falls significantly short of Rspamd, from rule syntax to flexibility in general. An amalgam of both would be the least preferred option.

Well, that doesn't make much of a difference. We are not using Rspamd as-is anyway.
As long as it works, whatever solution we have should support creating rules in Rspamd's Lua format, regex, or SpamAssassin format.

Falling short... That doesn't mean Rspamd is better, by the way. I'm especially curious to see a review of Rspamd vs SA v4 from a heavy, non-hobbyist user.
Just because it's newer and supposedly faster (according to their own comparison) doesn't automatically mean it is better.
Even the best piece of tech is of little use if it doesn't work as intended or lacks support, rules and so on.

Anyway, even Rspamd supports using some drop-in SpamAssassin rules, so Stalwart can support that as well.
Especially since most of these rules get updated through external channels, via curl or distro upgrades.

Ideally, it should be able to either:

- create, use and update any custom rules as you would in any other email software, or
- work with external SpamAssassin or Rspamd rules.

So, what you write here can be a rule.
Be it in Stalwart or in any of the other spam engine's config formats.

Do you have a rule suggestion for Rspamd that would work?

mdecimus · 2024-12-04T16:44:13Z

Thanks @TaaviE - this will be included in #947

@nomadturk - regarding the SA vs Rspamd comparison: although I haven't compared the effectiveness of both solutions, I did look at both codebases, and Rspamd is much faster and better designed than SA. SpamAssassin is written in Perl and relies heavily on regular expressions, which are much slower than hand-crafting a parser (what Rspamd does in its C code and Lua bindings). In addition, some of the regular expressions in SpamAssassin are poorly designed and result in heavy memory usage as well as performance degradation. I recommend that you avoid regular expressions in your custom filtering rules as much as possible. If what you need to filter can't be done with a simple 'contains', then try Sieve, and if that is not enough, MTA hooks.
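
As an illustration of the regex-free approach recommended above, here is a small Python sketch that finds address-like tokens in a display name using only character scanning and string methods. Python is used purely for illustration; the delimiter set, the function name, and the sample inputs are assumptions, and this is not how Stalwart or Rspamd actually parse headers.

```python
# Characters treated as token boundaries inside a display name (an assumption).
DELIMS = " \t\"'()<>,;:[]"


def address_like_tokens(display_name: str) -> list[str]:
    """Split on delimiters, then keep tokens shaped like local@domain.tld."""
    tokens, current = [], []
    for ch in display_name:
        if ch in DELIMS:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))

    found = []
    for tok in tokens:
        local, at, domain = tok.rpartition("@")
        domain = domain.strip(".")
        # Require a single '@', a non-empty local part, and a dotted domain.
        if at and local and "@" not in local and "." in domain:
            found.append(f"{local}@{domain}".lower())
    return found


print(address_like_tokens("Foo (billing@bank.example)"))
print(address_like_tokens("Foo Bar '<a@x.example>' , <b@y.example>, c@z.example"))
```

A single pass over the characters covers the parenthesized, quoted, and comma-separated variants mentioned earlier in the thread without any backtracking.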

nomadturk · 2024-12-04T17:18:34Z

@mdecimus :)

Even the poorly designed ones do wonders to stop spam if there are no better alternatives.
It's better to have them than not to have them. I guess that's why Rspamd supports them as well.

As long as you support adding custom rules and auto-updatable Rspamd/SA rules from third parties, we should be good.

Even for Rspamd, there is a lot of community, documentation, provider support and plugins in the wild.
I would like us to be able to make use of those with #947.

Otherwise, maybe running an external Rspamd cluster and offloading that workload from Stalwart to it would be better.

TaaviE · 2024-12-05T10:41:37Z

> Do you have a rule suggestion for Rspamd that would work?

I have an unpolished piece of Lua. Quite a lot of the complexity stems from having to check the alignment against any newly extracted domains, which is why having it as a built-in would be really beneficial.
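
The Lua mentioned above is not shown in the thread. As a rough idea of what an alignment check for an extracted domain involves, here is a hedged Python sketch of relaxed alignment in the DMARC sense. The function names and inputs are hypothetical: `spf_domain` stands for a MAIL FROM domain that passed SPF (or `None`), and `dkim_domains` for the `d=` domains of signatures that verified. The organizational-domain helper is deliberately naive (last two labels); real code would need the Public Suffix List.

```python
from typing import Iterable, Optional


def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels (assumption, no PSL)."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_aligned(embedded_domain: str,
               spf_domain: Optional[str],
               dkim_domains: Iterable[str]) -> bool:
    """Relaxed alignment: some authenticated domain shares the org domain."""
    target = org_domain(embedded_domain)
    candidates = ([spf_domain] if spf_domain else []) + list(dkim_domains)
    return any(org_domain(d) == target for d in candidates)


# An address at mail.bank.example in the display name, DKIM-signed by bank.example.
print(is_aligned("mail.bank.example", None, ["bank.example"]))
# The same embedded domain, but only mailer.example authenticated.
print(is_aligned("mail.bank.example", "mailer.example", []))
```

The check has to run once per extracted domain, against results the server has already computed for the message, which is part of why doing it outside the server is awkward.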

mdecimus · 2024-12-05T17:21:03Z

Migrated to #947

TaaviE added the enhancement New feature or request label Dec 4, 2024

mdecimus mentioned this issue Dec 5, 2024

[enhancement]: Spam filter improvements #947

Closed

1 task

mdecimus closed this as completed Dec 5, 2024

mdecimus added a commit that referenced this issue Dec 23, 2024

Improve SPOOF_DISPLAY_NAME detection (fixes #982)

1726d68
