Commit 904ac9d

feature: allocate message_id in MAIL FROM phase
Added EXPERIMENTAL_MAILFROM_MSGID to move generation of message_id from the DATA phase to the MAIL FROM phase. This allows errors logged during the RCPT phase to reference the message_id that the message would receive if accepted.
1 parent b273058 commit 904ac9d

6 files changed: +129 -108 lines changed

src/src/EDITME

+3
@@ -635,6 +635,9 @@ DISABLE_MAL_MKS=yes
 # Uncomment the following to include the fast-ramp two-phase-queue-run support
 # EXPERIMENTAL_QUEUE_RAMP=yes

+# Uncomment the following line to enable message_id generation in mail from:
+# EXPERIMENTAL_MAILFROM_MSGID=yes
+
 ###############################################################################
 # THESE ARE THINGS YOU MIGHT WANT TO SPECIFY #
 ###############################################################################

src/src/config.h.defaults

+1
@@ -202,6 +202,7 @@ Do not put spaces between # and the 'define'.
 #define EXPERIMENTAL_DCC
 #define EXPERIMENTAL_DSN_INFO
 #define EXPERIMENTAL_LMDB
+#define EXPERIMENTAL_MAILFROM_MSGID
 #define EXPERIMENTAL_QUEUE_RAMP
 #define EXPERIMENTAL_QUEUEFILE
 #define EXPERIMENTAL_SRS

src/src/functions.h

+1
@@ -220,6 +220,7 @@ extern uschar *event_raise(uschar *, const uschar *, uschar *);
 extern void msg_event_raise(const uschar *, const address_item *);
 #endif

+extern void generate_message_id(void);
 extern int exim_chown_failure(int, const uschar*, uid_t, gid_t);
 extern const uschar * exim_errstr(int);
 extern void exim_exit(int, const uschar *) NORETURN;

src/src/receive.c

+7-107
@@ -1653,7 +1653,6 @@ int error_rc = error_handling == ERRORS_SENDER
   ? errors_sender_rc : EXIT_FAILURE;
 int header_size = 256;
 int start, end, domain;
-int id_resolution = 0;
 int had_zero = 0;
 int prevlines_length = 0;


@@ -1742,7 +1741,9 @@ next->text = store_get(header_size, TRUE); /* tainted */
 header names list to be the normal list. Indicate there is no data file open
 yet, initialize the size and warning count, and deal with no size limit. */

+#ifndef EXPERIMENTAL_MAILFROM_MSGID
 message_id[0] = 0;
+#endif
 spool_data_file = NULL;
 data_fd = -1;
 spool_name = US"";
@@ -1775,18 +1776,6 @@ if (smtp_input && !smtp_batched_input && !f.dkim_disable_verify)
 if (sender_host_address) dmarc_init(); /* initialize libopendmarc */
 #endif

-/* Remember the time of reception. Exim uses time+pid for uniqueness of message
-ids, and fractions of a second are required. See the comments that precede the
-message id creation below. */
-
-(void)gettimeofday(&message_id_tv, NULL);
-
-/* For other uses of the received time we can operate with granularity of one
-second, and for that we use the global variable received_time. This is for
-things like ultimate message timeouts. */
-
-received_time = message_id_tv;
-
 /* If SMTP input, set the special handler for timeouts. The alarm() calls
 happen in the smtp_getc() function when it refills its buffer. */


@@ -2609,83 +2598,11 @@ if (extract_recip)

 }

-/* Now build the unique message id. This has changed several times over the
-lifetime of Exim. This description was rewritten for Exim 4.14 (February 2003).
-Retaining all the history in the comment has become too unwieldy - read
-previous release sources if you want it.
-
-The message ID has 3 parts: tttttt-pppppp-ss. Each part is a number in base 62.
-The first part is the current time, in seconds. The second part is the current
-pid. Both are large enough to hold 32-bit numbers in base 62. The third part
-can hold a number in the range 0-3843. It used to be a computed sequence
-number, but is now the fractional component of the current time in units of
-1/2000 of a second (i.e. a value in the range 0-1999). After a message has been
-received, Exim ensures that the timer has ticked at the appropriate level
-before proceeding, to avoid duplication if the pid happened to be re-used
-within the same time period. It seems likely that most messages will take at
-least half a millisecond to be received, so no delay will normally be
-necessary. At least for some time...
-
-There is a modification when localhost_number is set. Formerly this was allowed
-to be as large as 255. Now it is restricted to the range 0-16, and the final
-component of the message id becomes (localhost_number * 200) + fractional time
-in units of 1/200 of a second (i.e. a value in the range 0-3399).
-
-Some not-really-Unix operating systems use case-insensitive file names (Darwin,
-Cygwin). For these, we have to use base 36 instead of base 62. Luckily, this
-still allows the tttttt field to hold a large enough number to last for some
-more decades, and the final two-digit field can hold numbers up to 1295, which
-is enough for milliseconds (instead of 1/2000 of a second).
-
-However, the pppppp field cannot hold a 32-bit pid, but it can hold a 31-bit
-pid, so it is probably safe because pids have to be positive. The
-localhost_number is restricted to 0-10 for these hosts, and when it is set, the
-final field becomes (localhost_number * 100) + fractional time in centiseconds.
-
-Note that string_base62() returns its data in a static storage block, so it
-must be copied before calling string_base62() again. It always returns exactly
-6 characters.
-
-There doesn't seem to be anything in the RFC which requires a message id to
-start with a letter, but Smail was changed to ensure this. The external form of
-the message id (as supplied by string expansion) therefore starts with an
-additional leading 'E'. The spool file names do not include this leading
-letter and it is not used internally.
-
-NOTE: If ever the format of message ids is changed, the regular expression for
-checking that a string is in this format must be updated in a corresponding
-way. It appears in the initializing code in exim.c. The macro MESSAGE_ID_LENGTH
-must also be changed to reflect the correct string length. The queue-sort code
-needs to know the layout. Then, of course, other programs that rely on the
-message id format will need updating too. */
-
-Ustrncpy(message_id, string_base62((long int)(message_id_tv.tv_sec)), 6);
-message_id[6] = '-';
-Ustrncpy(message_id + 7, string_base62((long int)getpid()), 6);
-
-/* Deal with the case where the host number is set. The value of the number was
-checked when it was read, to ensure it isn't too big. The timing granularity is
-left in id_resolution so that an appropriate wait can be done after receiving
-the message, if necessary (we hope it won't be). */
-
-if (host_number_string)
-  {
-  id_resolution = BASE_62 == 62 ? 5000 : 10000;
-  sprintf(CS(message_id + MESSAGE_ID_LENGTH - 3), "-%2s",
-    string_base62((long int)(
-      host_number * (1000000/id_resolution) +
-        message_id_tv.tv_usec/id_resolution)) + 4);
-  }
-
-/* Host number not set: final field is just the fractional time at an
-appropriate resolution. */
-
-else
-  {
-  id_resolution = BASE_62 == 62 ? 500 : 1000;
-  sprintf(CS(message_id + MESSAGE_ID_LENGTH - 3), "-%2s",
-    string_base62((long int)(message_id_tv.tv_usec/id_resolution)) + 4);
-  }
+#ifdef EXPERIMENTAL_MAILFROM_MSGID
+if (!smtp_input || smtp_batched_input) generate_message_id();
+#else
+generate_message_id();
+#endif /* EXPERIMENTAL_MAILFROM_MSGID */

 /* Add the current message id onto the current process info string if
 it will fit. */
@@ -4303,23 +4220,6 @@ then we can think about properly declaring the message not-received. */


 TIDYUP:
-/* In SMTP sessions we may receive several messages in one connection. After
-each one, we wait for the clock to tick at the level of message-id granularity.
-This is so that the combination of time+pid is unique, even on systems where the
-pid can be re-used within our time interval. We can't shorten the interval
-without re-designing the message-id. See comments above where the message id is
-created. This is Something For The Future.
-Do this wait any time we have created a message-id, even if we rejected the
-message. This gives unique IDs for logging done by ACLs. */
-
-if (id_resolution != 0)
-  {
-  message_id_tv.tv_usec = (message_id_tv.tv_usec/id_resolution) * id_resolution;
-  exim_wait_tick(&message_id_tv, id_resolution);
-  id_resolution = 0;
-  }
-
-
 process_info[process_info_len] = 0; /* Remove message id */
 if (spool_data_file && cutthrough_done == NOT_TRIED)
   {
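
The block removed at TIDYUP rounds the stored timestamp down to the id granularity and then waits for the clock to move past that tick. A minimal sketch of that idea follows, under stated assumptions: the `sketch_*` names are hypothetical, the loop simply polls `gettimeofday()`, and nothing here is taken from Exim's actual `exim_wait_tick()`, whose implementation is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Round tv down to the start of its tick. resolution_usec is the tick length
   in microseconds (500 in the base-62 case with no host number). */
static struct timeval
sketch_tick_start(struct timeval tv, long resolution_usec)
{
  assert(resolution_usec > 0);
  tv.tv_usec = (tv.tv_usec / resolution_usec) * resolution_usec;
  return tv;
}

/* Nonzero when `now` falls in a later tick than `then`. */
static int
sketch_tick_passed(struct timeval then, struct timeval now,
                   long resolution_usec)
{
  then = sketch_tick_start(then, resolution_usec);
  now = sketch_tick_start(now, resolution_usec);
  if (now.tv_sec != then.tv_sec) return now.tv_sec > then.tv_sec;
  return now.tv_usec > then.tv_usec;
}

/* Poll the clock until it has left the tick containing `then`. This is an
   illustrative busy-wait, not how a production MTA would sleep. */
static void
sketch_wait_tick(struct timeval then, long resolution_usec)
{
  struct timeval now;
  do
    (void)gettimeofday(&now, NULL);
  while (!sketch_tick_passed(then, now, resolution_usec));
}
```

With a 500 microsecond tick, two ids built by the same pid cannot share a timestamp field once `sketch_wait_tick()` has returned for the first one.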

src/src/smtp_in.c

+5-1
@@ -2415,7 +2415,7 @@ TCP_SYN_RCV (as of 12.1) so no idea about data-use. */

 if (getsockopt(fileno(smtp_out), IPPROTO_TCP, TCP_FASTOPEN, &is_fastopen, &len) == 0)
   {
-  if (is_fastopen)
+  if (is_fastopen)
     {
     DEBUG(D_receive)
       debug_printf("TFO mode connection (TCP_FASTOPEN getsockopt)\n");
@@ -4900,6 +4900,9 @@ while (done <= 0)
 /* Apply an ACL check if one is defined, before responding. Afterwards,
 when pipelining is not advertised, do another sync check in case the ACL
 delayed and the client started sending in the meantime. */
+#ifdef EXPERIMENTAL_MAILFROM_MSGID
+generate_message_id();
+#endif

 if (acl_smtp_mail)
   {
@@ -4929,6 +4932,7 @@ while (done <= 0)
 user_msg = string_sprintf("%s%s", user_msg, US", PRDR Requested");
 #endif
 smtp_user_msg(US"250", user_msg);
+generate_message_id();
 }
 smtp_delay_rcpt = smtp_rlr_base;
 f.recipients_discarded = (rc == DISCARD);

src/src/string.c

+112
@@ -1822,4 +1822,116 @@ return 0;
 }
 #endif

+/* Now build the unique message id. This has changed several times over the
+lifetime of Exim. This description was rewritten for Exim 4.14 (February 2003).
+Retaining all the history in the comment has become too unwieldy - read
+previous release sources if you want it.
+
+The message ID has 3 parts: tttttt-pppppp-ss. Each part is a number in base 62.
+The first part is the current time, in seconds. The second part is the current
+pid. Both are large enough to hold 32-bit numbers in base 62. The third part
+can hold a number in the range 0-3843. It used to be a computed sequence
+number, but is now the fractional component of the current time in units of
+1/2000 of a second (i.e. a value in the range 0-1999). After a message has been
+received, Exim ensures that the timer has ticked at the appropriate level
+before proceeding, to avoid duplication if the pid happened to be re-used
+within the same time period. It seems likely that most messages will take at
+least half a millisecond to be received, so no delay will normally be
+necessary. At least for some time...
+
+There is a modification when localhost_number is set. Formerly this was allowed
+to be as large as 255. Now it is restricted to the range 0-16, and the final
+component of the message id becomes (localhost_number * 200) + fractional time
+in units of 1/200 of a second (i.e. a value in the range 0-3399).
+
+Some not-really-Unix operating systems use case-insensitive file names (Darwin,
+Cygwin). For these, we have to use base 36 instead of base 62. Luckily, this
+still allows the tttttt field to hold a large enough number to last for some
+more decades, and the final two-digit field can hold numbers up to 1295, which
+is enough for milliseconds (instead of 1/2000 of a second).
+
+However, the pppppp field cannot hold a 32-bit pid, but it can hold a 31-bit
+pid, so it is probably safe because pids have to be positive. The
+localhost_number is restricted to 0-10 for these hosts, and when it is set, the
+final field becomes (localhost_number * 100) + fractional time in centiseconds.
+
+Note that string_base62() returns its data in a static storage block, so it
+must be copied before calling string_base62() again. It always returns exactly
+6 characters.
+
+There doesn't seem to be anything in the RFC which requires a message id to
+start with a letter, but Smail was changed to ensure this. The external form of
+the message id (as supplied by string expansion) therefore starts with an
+additional leading 'E'. The spool file names do not include this leading
+letter and it is not used internally.
+
+NOTE: If ever the format of message ids is changed, the regular expression for
+checking that a string is in this format must be updated in a corresponding
+way. It appears in the initializing code in exim.c. The macro MESSAGE_ID_LENGTH
+must also be changed to reflect the correct string length. The queue-sort code
+needs to know the layout. Then, of course, other programs that rely on the
+message id format will need updating too. */
+
+void
+generate_message_id()
+{
+int id_resolution = 0;
+
+/* Remember the time of reception. Exim uses time+pid for uniqueness of message
+ids, and fractions of a second are required. See the comments that precede the
+message id creation below. */
+
+(void)gettimeofday(&message_id_tv, NULL);
+
+/* For other uses of the received time we can operate with granularity of one
+second, and for that we use the global variable received_time. This is for
+things like ultimate message timeouts. */
+
+received_time = message_id_tv;
+
+Ustrncpy(message_id, string_base62((long int)(message_id_tv.tv_sec)), 6);
+message_id[6] = '-';
+Ustrncpy(message_id + 7, string_base62((long int)getpid()), 6);
+
+/* Deal with the case where the host number is set. The value of the number was
+checked when it was read, to ensure it isn't too big. The timing granularity is
+left in id_resolution so that an appropriate wait can be done after receiving
+the message, if necessary (we hope it won't be). */
+
+if (host_number_string)
+  {
+  id_resolution = BASE_62 == 62 ? 5000 : 10000;
+  sprintf(CS(message_id + MESSAGE_ID_LENGTH - 3), "-%2s",
+    string_base62((long int)(
+      host_number * (1000000/id_resolution) +
+        message_id_tv.tv_usec/id_resolution)) + 4);
+  }
+
+/* Host number not set: final field is just the fractional time at an
+appropriate resolution. */
+
+else
+  {
+  id_resolution = BASE_62 == 62 ? 500 : 1000;
+  sprintf(CS(message_id + MESSAGE_ID_LENGTH - 3), "-%2s",
+    string_base62((long int)(message_id_tv.tv_usec/id_resolution)) + 4);
+  }
+
+/* In SMTP sessions we may receive several messages in one connection. After
+each one, we wait for the clock to tick at the level of message-id granularity.
+This is so that the combination of time+pid is unique, even on systems where the
+pid can be re-used within our time interval. We can't shorten the interval
+without re-designing the message-id. See comments above where the message id is
+created. This is Something For The Future.
+Do this wait any time we have created a message-id, even if we rejected the
+message. This gives unique IDs for logging done by ACLs. */
+
+if (id_resolution != 0)
+  {
+  message_id_tv.tv_usec = (message_id_tv.tv_usec/id_resolution) * id_resolution;
+  exim_wait_tick(&message_id_tv, id_resolution);
+  id_resolution = 0;
+  }
+}
+
 /* End of string.c */
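
To make the `tttttt-pppppp-ss` layout described in the comment concrete, here is a self-contained sketch of the base-62 case. It is an illustration under assumptions, not the code from this commit: `sketch_base62()` and `sketch_message_id()` are hypothetical names, the digit alphabet (digits, then upper case, then lower case) is assumed, and a negative `host_number` stands in for "localhost_number not set".

```c
#include <assert.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>

#define SKETCH_ID_LENGTH 16  /* strlen("tttttt-pppppp-ss") */

/* Write value as exactly `width` base-62 digits, most significant first.
   The alphabet order is an assumption made for this sketch. */
static void
sketch_base62(unsigned long value, char *out, int width)
{
  static const char digits[] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
  assert(width > 0);
  for (int i = width - 1; i >= 0; i--)
    {
    out[i] = digits[value % 62];
    value /= 62;
    }
}

/* Build "tttttt-pppppp-ss" from a timestamp and a pid. Pass host_number < 0
   for the plain case (fraction in 1/2000 s); otherwise the last field is
   host_number * 200 + fraction in 1/200 s, as the comment describes. */
static void
sketch_message_id(char id[SKETCH_ID_LENGTH + 1], const struct timeval *tv,
                  pid_t pid, int host_number)
{
  unsigned long last;

  /* Start from a template so the separators and the NUL are in place. */
  memcpy(id, "000000-000000-00", SKETCH_ID_LENGTH + 1);
  sketch_base62((unsigned long)tv->tv_sec, id, 6);
  sketch_base62((unsigned long)pid, id + 7, 6);

  if (host_number >= 0)
    last = (unsigned long)host_number * 200 + (unsigned long)tv->tv_usec / 5000;
  else
    last = (unsigned long)tv->tv_usec / 500;
  sketch_base62(last, id + 14, 2);
}
```

For example, a timestamp of 61 s and 999999 µs with pid 62 and no host number gives `00000z-000010-WF`; with host number 16 the last field becomes `sp`, the encoding of 3399, the top of the range quoted in the comment.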
