"SfR Fresh" - the SfR Freeware/Shareware Archive 
Member "amavisd-new-2.6.1/README_FILES/README.lookups" of archive amavisd-new-2.6.1.tar.gz:
LOOKUP MAPS (hash, SQL) AND ACCESS LISTS EXPLAINED
==================================================
  Updated: 2002-04, 2002-06, 2002-11, 2002-12,
           2003-03, 2003-05, 2003-06, 2003-09, 2003-12,
           2004-01, 2004-03, 2004-12,
           2005-01, 2005-03, 2005-05, 2005-08
  Mark Martinec <Mark.Martinec@ijs.si>

(applies to the semantics of amavisd.conf variables such as:
   %virus_lovers, %bypass_checks,
   @virus_lovers_acl, @bypass_checks_acl,
   $virus_lovers_re, $bypass_checks_re,
   %local_domains, @local_domains_acl, %mailto, ... )

NOTE:
  All lookups are performed with raw (rfc2821-unquoted
  and unbracketed) addresses as a key,
    i.e.:    Bob "Funny" Dude@example.com
    not:     "Bob \"Funny\" Dude"@example.com
    and not: <"Bob \"Funny\" Dude"@example.com>


Several configurable settings in amavisd are controlled through the use
of table lookups (hash/associative array), access control lists (array),
Perl-regexp-based access control lists, SQL or LDAP lookups.
The subroutine that does all the lookups is:

  sub lookup($$@) {
    my($get_all, $addr, @tables) = @_;

It performs a lookup for a key (usually a recipient e-mail envelope address,
unless otherwise noted) against one or more lookup tables / maps.
The set of maps used to be hard-wired into the program (but no longer is),
and the order chosen is: from specific to more general, and from faster
to slower, which is usually flexible enough. Thus the default sequence
of lookups: SQL, LDAP, hash, ACL, regexp, constant. The first that returns
a definitive answer (not undef/NULL) stops the search.

The SQL and LDAP are somewhat specific and are always consulted first.
There can only be one (or none) SQL and one (or none) LDAP lookup.
This is an implementation limitation, and might be lifted some day.

The lists of static lookup tables have been configurable since 20040701
(amavisd-new-2.0), and are controlled by array variables such as:

  @virus_admin_maps = (\%virus_admin, \$virus_admin);
  @viruses_that_fake_sender_maps = (\$viruses_that_fake_sender_re, 1);
  @spam_kill_level_maps = (\$sa_kill_level_deflt);
  @local_domains_maps =
    (\%local_domains, \@local_domains_acl, \$local_domains_re);
  @bypass_virus_checks_maps =
    (\%bypass_virus_checks, \@bypass_virus_checks_acl, \$bypass_virus_checks_re);
  @virus_lovers_maps =
    (\%virus_lovers, \@virus_lovers_acl, \$virus_lovers_re);

See amavisd.conf-default for a complete list of these @*_maps variables.
The above example shows that the default value of these arrays corresponds
exactly to the formerly hard-wired search order. Users are free to
leave these @*_maps variables at their default, referencing the legacy
variables, or the list can be replaced entirely. There may be any number
of lookup tables of any static type specified in these lists. Some restraint
is warranted nevertheless for efficiency reasons: one lookup into a larger
lookup table is often quicker than two lookups into smaller ones.
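
As an illustration of replacing a list entirely, the following amavisd.conf
fragment is a sketch only. The domain names, the recipient address and the
numeric value are placeholders chosen for this example, not recommended
settings.

```perl
# amavisd.conf fragment (sketch); all names and values are placeholders.

# Replace the default list with a single ACL. The legacy variables
# %local_domains, @local_domains_acl and $local_domains_re are then
# no longer consulted for this setting.
@local_domains_maps = ( [ '.example.com', '.example.net' ] );

# A per-recipient hash, followed by a constant as the last-resort default.
@spam_kill_level_maps = (
  { 'vip@example.com' => 10.0 },   # consulted first
  \$sa_kill_level_deflt,           # catchall, passed by reference
);
```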

Some lookup maps can only return a boolean result (e.g. ACL), other maps
may return any value, which can be interpreted as boolean, numeric, string
or possibly other. The result of some lookup maps (e.g. regexp) may include
pieces of the lookup key.

If a match is found, the subroutine lookup() returns whatever the map
returns; undef is returned if nothing matches (which for Perl is false
as well).
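
The "first definitive answer wins" rule can be modelled in a few lines.
This is a simplified sketch, not the amavisd code: the name lookup_sketch,
the sample table and the addresses are invented for illustration, the
$get_all argument is left out, and only hashes (exact key match) and
constants are handled.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of walking a list of lookup tables.
sub lookup_sketch {
    my ($addr, @tables) = @_;
    for my $t (@tables) {
        my $r = ref($t) eq 'HASH'   ? $t->{ lc($addr) }  # exact key only
              : ref($t) eq 'SCALAR' ? $$t                # constant by reference
              : !ref($t)            ? $t                 # plain constant
              :                       undef;             # other types skipped
        return $r if defined $r;   # a defined value (even 0) stops the search
    }
    return undef;                  # no table knew the answer
}

my %lovers  = ('postmaster@example.com' => 1, 'bob@example.com' => 0);
my $default = 1;
my @maps    = (\%lovers, \$default);

for my $addr ('postmaster@example.com', 'bob@example.com', 'eve@example.org') {
    print "$addr => ", lookup_sketch($addr, @maps), "\n";
}
```

Note that the value 0 stored for bob@example.com is a definitive answer,
so the constant at the end of the list is never reached for that key.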


A CONSTANT

Specifying a Perl scalar as an argument to lookup() is a degenerate
case of a lookup table: it matches any key, and the value of the
scalar is returned as the match value.

Specifying a scalar argument in a call to lookup() (e.g. as the last element
in @*_maps arrays) is useful as a last-resort (catchall, default) value.

One level of indirection is allowed, so the following three cases are
equivalent:
  $sa_kill_level_deflt = 6.0;
  @spam_kill_level_maps = (\%some_hash, \$sa_kill_level_deflt);

and:
  $sa_kill_level_deflt = 6.0;
  @spam_kill_level_maps = (\%some_hash, $sa_kill_level_deflt);

and:
  @spam_kill_level_maps = (\%some_hash, 6.0);

The first case allows for the value of a scalar variable to be assigned
even _after_ the assignment to @*_maps, so this still works as expected:
  @spam_kill_level_maps = (\%some_hash, \$sa_kill_level_deflt);
  $sa_kill_level_deflt = 6.0;

but the following does not (it uses the value held by the scalar variable
at the time of assignment to the list, which is most likely not 6.0):
  @spam_kill_level_maps = (\%some_hash, $sa_kill_level_deflt);
  $sa_kill_level_deflt = 6.0;
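
The difference between storing a reference and storing a value can be
checked with plain Perl, outside of amavisd. In this sketch the helper
resolve() is hypothetical and merely imitates one level of indirection.

```perl
use strict;
use warnings;

our $sa_kill_level_deflt;     # declared but not yet assigned
my %some_hash;

my @by_ref   = (\%some_hash, \$sa_kill_level_deflt);  # stores a reference
my @by_value = (\%some_hash,  $sa_kill_level_deflt);  # copies the value now

$sa_kill_level_deflt = 6.0;   # assigned after both lists were built

# Hypothetical helper: follow one level of scalar indirection.
sub resolve {
    my ($x) = @_;
    return ref($x) eq 'SCALAR' ? $$x : $x;
}

for my $list (\@by_ref, \@by_value) {
    my $v     = resolve($list->[-1]);
    my $shown = defined($v) ? $v : 'undef';
    print "default is $shown\n";
}
```

The list built with a reference sees the late assignment; the list built
with a plain value keeps whatever the variable held at that moment.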


HASH LOOKUPS (associative array lookups)

For arguments to subroutine lookup() of type hash-ref, the argument
is passed to subroutine lookup_hash(), which does a lookup into
a Perl hash.

Hash lookups (e.g. for user+foo@sub.example.com) are performed in the
following order:
 - lookup for user+foo@sub.example.com
 - lookup for user@sub.example.com (only if $recipient_delimiter is '+')
 - lookup for user+foo@
 - lookup for user@ (only if $recipient_delimiter is '+')
 - lookup for sub.example.com
 - lookup for .sub.example.com
 - lookup for .example.com
 - lookup for .com
 - lookup for .

The search sequence stops as soon as a match is found, and the value
of the matched entry determines the result.

The domain part is always matched case-insensitively, the localpart
is matched case-sensitively when $localpart_is_case_sensitive
is true (not case-sensitive by default).

A field value undef implies that the next lookup table (if there are more)
is to be tried. In plain words, undef means "this table does not know
the answer, try the next one". Further searching in this table
(for possibly more general defaults) is terminated.
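
The key sequence and the meaning of an undef value can be sketched as
follows. The names hash_keys_sketch and lookup_hash_sketch are invented
for this example; the code assumes the default case-insensitive localpart
and does not model the null reverse path.

```perl
use strict;
use warnings;

# Hypothetical helper: candidate keys in the documented order.
# $delim plays the role of $recipient_delimiter (may be undef).
sub hash_keys_sketch {
    my ($addr, $delim) = @_;
    my ($local, $domain) = ($addr =~ /^(.*)\@([^\@]*)\z/s);
    return () unless defined $domain;
    ($local, $domain) = (lc $local, lc $domain);
    my $base = $local;                        # localpart without extension
    $base = $1 if defined $delim && $delim ne ''
                  && $local =~ /^(.+?)\Q$delim\E/s;
    my @keys = ("$local\@$domain");
    push @keys, "$base\@$domain" if $base ne $local;
    push @keys, "$local\@";
    push @keys, "$base\@"        if $base ne $local;
    push @keys, $domain;
    my $d = ".$domain";
    while (1) {
        push @keys, $d;
        last if $d eq '.';
        $d =~ s/^\.[^.]*//;                   # drop the leftmost label
        $d = '.' if $d eq '';
    }
    return @keys;
}

# The first key that exists ends the search in this table,
# even when its value is undef ("try the next table").
sub lookup_hash_sketch {
    my ($addr, $hash, $delim) = @_;
    for my $k (hash_keys_sketch($addr, $delim)) {
        return $hash->{$k} if exists $hash->{$k};
    }
    return undef;
}

print "$_\n" for hash_keys_sketch('user+foo@sub.example.com', '+');
```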

NOTE: a null reverse path e-mail address used by MTA for delivery status
notifications (DSN) has an empty local part and an empty domain. As far as
the lookup is concerned (which uses raw, i.e. non-quoted and non-bracketed
address form), this address is @, i.e. a single character "@".
The lookup_hash for null address goes through the following sequence
of keys: "", "@", "." (double quotes added for clarity, they are not part
of the key).

There is a subroutine read_hash() available for use in amavisd.conf.
It can read keys from a plain text file, and load them into a Perl hash.
Format of the text file: one address per line, anything from '#' to the end
of line is treated as a comment, but '#' within correctly quoted rfc2821
addresses is not treated as a comment (e.g. a hash sign within
"strange # \"foo\" address"@example.com is valid). Leading and trailing
whitespace is discarded, empty lines (containing only whitespace and comment)
are ignored. Addresses are converted from quoted form into internal (raw)
form and inserted as keys into a given hash, with a value of 1 (true).
Each address can have an associated optional value (also known as the
'righthand side' or RHS) separated from the address by whitespace.
An absence of a value implies 1 (true). The $hashref argument is returned
for convenience, so that one can say for example:
  $per_recip_whitelist_sender_lookup_tables = {
    '.my1.example.com' => read_hash({},'/var/amavis/my1-example-com.wl'),
    '.my2.example.com' => read_hash({},'/var/amavis/my2-example-com.wl') };
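
The file format can be illustrated with a deliberately simplified reader.
This is not read_hash() itself: the name read_hash_sketch is invented, it
reads from an already opened filehandle, and it does not handle quoted
addresses (so a '#' or whitespace inside a quoted localpart would be
misparsed here).

```perl
use strict;
use warnings;

# Simplified stand-in for read_hash(); quoted addresses are NOT supported.
sub read_hash_sketch {
    my ($hashref, $fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/#.*\z//s;            # strip comment
        $line =~ s/^\s+|\s+\z//g;       # trim leading and trailing whitespace
        next if $line eq '';            # skip empty lines
        my ($addr, $value) = split ' ', $line, 2;
        $hashref->{$addr} = defined $value ? $value : 1;  # no RHS implies 1
    }
    return $hashref;                    # returned for convenience
}

# Sample input held in memory; the entries are placeholders.
my $text = "# whitelist\nfriend\@example.org\n.example.net  -3.0  # soft\n\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $h = read_hash_sketch({}, $fh);
close $fh;
print "$_ => $h->{$_}\n" for sort keys %$h;
```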



LIST LOOKUPS (ACL)

For arguments to subroutine lookup() of type array-ref, the argument
is passed to subroutine lookup_acl(), which does an access list lookup:

  sub lookup_acl($$) {
    my($addr, $acl_ref) = @_;

The supplied e-mail address is compared with each member of the
lookup list in turn, the first match wins (terminates the search),
and its value decides whether the result is true (yes, permit, pass)
or false (no, deny, drop). Falling through without a match
produces false (undef). Search is case-insensitive.

NOTE: lookup_acl is not aware of address extensions and they are
not handled specially!

If a list element contains a '@', the full e-mail address is compared,
otherwise if a list element has a leading dot, the domain name part is
matched only, and the domain as well as its subdomains can match. If there
is no leading dot, the domain must match exactly (subdomains do not match).

The presence of character '!' prepended to a list element decides
whether the result will be true (without a '!') or false (with '!')
in case this list element matches and terminates the search.

Because search stops at the first match, it only makes sense
to place more specific patterns before the more general ones.

Although not a special case, it is good to remember that '.' always matches,
so a '.' would stop the search and return true, whereas '!.' would stop the
search and return false (0).

Examples:

  given: @acl = qw( me.ac.uk !.ac.uk .uk )
    'u@me.ac.uk' matches me.ac.uk, returns true and search stops

  given: @acl = qw( me.ac.uk !.ac.uk .uk )
    'u@you.ac.uk' matches .ac.uk, returns false (because of '!'), search stops

  given: @acl = qw( me.ac.uk !.ac.uk .uk )
    'u@them.co.uk' matches .uk, returns true and search stops

  given: @acl = qw( me.ac.uk !.ac.uk .uk )
    'u@some.com' does not match anything, falls through and returns undef

  given: @acl = qw( me.ac.uk !.ac.uk .uk !. )
    'u@some.com' similar to the previous, except it returns 0 instead of undef,
    which would only make a difference if this ACL is not the last argument
    in a call to lookup()

  given: @acl = qw( me.ac.uk !.ac.uk .uk . )
    'u@some.com' matches catchall ".", and returns true

  more complex example: @acl = qw(
    !The.Boss@dept1.xxx.com .dept1.xxx.com
    .dept2.xxx.com .dept3.xxx.com lab.dept4.xxx.com
    sub.xxx.com !.sub.xxx.com
    me.d.aaa.com him.d.aaa.com !.d.aaa.com .aaa.com
  );
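
The matching rules above can be expressed as a short routine and checked
against the examples. This is a re-implementation written for illustration
under the rules as stated here; the name lookup_acl_sketch is hypothetical
and the real lookup_acl() may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the ACL matching rules described above.
sub lookup_acl_sketch {
    my ($addr, $acl) = @_;
    $addr = lc $addr;                               # search is case-insensitive
    my ($domain) = ($addr =~ /\@([^\@]*)\z/);
    $domain = '' unless defined $domain;
    for my $entry (@$acl) {
        my $pat    = lc $entry;
        my $result = ($pat =~ s/^!//) ? 0 : 1;      # leading '!' yields false
        my $hit;
        if ($pat =~ /\@/) {                         # full address comparison
            $hit = ($addr eq $pat);
        } elsif ($pat =~ /^\.(.*)\z/s) {            # domain or any subdomain
            my $suffix = $1;
            $hit = $suffix eq ''                    # a lone '.' always matches
                   || $domain =~ /(?:^|\.)\Q$suffix\E\z/;
        } else {                                    # exact domain only
            $hit = ($domain eq $pat);
        }
        return $result if $hit;                     # first match wins
    }
    return undef;                                   # fell through
}

my @acl = qw( me.ac.uk !.ac.uk .uk );
for my $a ('u@me.ac.uk', 'u@you.ac.uk', 'u@them.co.uk', 'u@some.com') {
    my $r = lookup_acl_sketch($a, \@acl);
    print "$a => ", (defined $r ? $r : 'undef'), "\n";
}
```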


Comparing hash (associative array) and ACL:

For smaller sets of keys and if only boolean results are needed,
both hash and ACL are appropriate.

- hash is still effective for lots of keys, ACL search is linear;
- hash can return any value, not just true or false;
- hash can strip away address extension, ACL can not;

- ACL appears simpler and more obvious for smaller sets;
- ACL can accommodate arbitrarily nested if-then-elseif-then-...-else cases
  whereas hash only follows a fixed order of stripping addresses;


ACL FOR IP ADDRESSES

A special type of lookup is an IP-matching access list implemented by
lookup_ip_acl(). It performs a lookup for an IP address against a list
or an associative array (a hash) of IPv4 or IPv6 networks. It is used
by amavisd for example to check if the SMTP client (normally your MTA)
is allowed to connect, which is why it is sometimes called 'access control
list' or ACL (the variable is @inet_acl).

The IP address is compared with each member of an access list in turn,
the first match wins (terminates the search), and its value decides
whether the result is true (yes, permit) or false (no, deny).
Falling through without a match produces false (undef).

The presence of character '!' prepended to a list member decides
whether the result will be true (without a '!') or false (with '!')
in case this list member matches and terminates the search.

Because search stops at the first match, it only makes sense
to place more specific patterns before the more general ones.

A network can be specified in classless notation n.n.n.n/k, or using
a mask n.n.n.n/m.m.m.m . A missing mask implies /32, i.e. a host address.

Although not a special case, it is good to remember that '::/0'
always matches any IPv4 or IPv6 address (even a syntactically invalid
address).

The '0/0' is equivalent to '::FFFF:0:0/96' and matches any syntactically
valid IPv4 address (including IPv4-mapped IPv6 addresses), but not other
IPv6 addresses!

Example
  given: @acl = qw( !192.168.1.12 172.16.3.3 !172.16.3.0/255.255.255.0
                    10.0.0.0/8 172.16.0.0/12 192.168.0.0/16
                    !0.0.0.0/8 !:: 127.0.0.0/8 ::1 );
matches rfc1918 private address space except host 192.168.1.12
and net 172.16.3/24 (but host 172.16.3.3 within 172.16.3/24 still matches).
In addition, the 'unspecified' (null, i.e. all zeros) IPv4 and IPv6
addresses return false, and IPv4 and IPv6 loopback addresses match
and return true.
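
The first-match logic for the IPv4 entries of such a list can be sketched
as below. This is an illustration only: the names ip4_to_int and
lookup_ip4_acl_sketch are invented, IPv6 entries and IPv6 lookup keys are
simply skipped, and abbreviated forms such as '0/0' are not parsed.

```perl
use strict;
use warnings;

# Convert dotted-quad IPv4 text to an integer; undef if not valid IPv4.
sub ip4_to_int {
    my ($ip) = @_;
    return undef unless defined $ip
        && $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/;
    my @o = map { $_ + 0 } ($1, $2, $3, $4);
    return undef if grep { $_ > 255 } @o;
    return ($o[0] << 24) | ($o[1] << 16) | ($o[2] << 8) | $o[3];
}

# Hypothetical IPv4-only model of the access list walk.
sub lookup_ip4_acl_sketch {
    my ($ip, $acl) = @_;
    my $n = ip4_to_int($ip);
    return undef unless defined $n;
    for my $entry (@$acl) {
        my $net    = $entry;
        my $result = ($net =~ s/^!//) ? 0 : 1;     # leading '!' yields false
        my ($addr, $mask) = split m{/}, $net, 2;
        my $base = ip4_to_int($addr);
        next unless defined $base;                 # IPv6 entries are skipped
        my $bits = 0xFFFFFFFF;                     # no mask: host address, /32
        if (defined $mask && $mask =~ /^\d+\z/) {  # classless n.n.n.n/k
            next if $mask > 32;
            $bits = $mask == 0 ? 0
                  : (0xFFFFFFFF << (32 - $mask)) & 0xFFFFFFFF;
        } elsif (defined $mask) {                  # mask form n.n.n.n/m.m.m.m
            $bits = ip4_to_int($mask);
            next unless defined $bits;
        }
        return $result if ($n & $bits) == ($base & $bits);  # first match wins
    }
    return undef;                                  # fell through
}

my @acl = qw( !192.168.1.12 172.16.3.3 !172.16.3.0/255.255.255.0
              10.0.0.0/8 172.16.0.0/12 192.168.0.0/16
              !0.0.0.0/8 !:: 127.0.0.0/8 ::1 );
for my $ip (qw( 192.168.1.12 172.16.3.3 172.16.3.4 10.1.2.3 8.8.8.8 )) {
    my $r = lookup_ip4_acl_sketch($ip, \@acl);
    print "$ip => ", (defined $r ? $r : 'undef'), "\n";
}
```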

If the supplied lookup table is a hash reference, match a canonical IP
address: dot-quad IPv4, or preferred IPv6 form, against hash keys. For IPv4
addresses a simple classful subnet specification is allowed in hash keys
by truncating trailing bytes from the looked up IPv4 address. A syntactically
invalid IP address can only match a hash entry with an undef key.

Besides looking up full CIDR-style IPv4 or IPv6 lists, later versions of
lookup_ip_acl() also make possible matching against a hash lookup table,
which only allows for full addresses in canonical form (dotted-quad IPv4
addresses without leading zeroes or IPv6 addresses in canonical preferred
form: x:x:x:x:x:x:x:x), or classful IPv4 subnets with truncated octets,
such as:
  ('10.11.12.13'=>1, '192.168.1.2'=>0, '192.168'=>1, '127'=>1, '10'=>1)
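
A sketch of the truncated-octet idea for IPv4 follows. It assumes that the
full address is tried first and that trailing octets are then dropped one
at a time; that ordering, the use of exists(), and the name
lookup_ip4_hash_sketch are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical model: try the full address, then ever shorter prefixes.
sub lookup_ip4_hash_sketch {
    my ($ip, $hash) = @_;
    my @octets = split /\./, $ip;
    while (@octets) {
        my $key = join '.', @octets;
        return $hash->{$key} if exists $hash->{$key};
        pop @octets;                       # truncate one trailing octet
    }
    return undef;                          # no key matched
}

my %nets = ('10.11.12.13'=>1, '192.168.1.2'=>0, '192.168'=>1, '127'=>1, '10'=>1);
for my $ip (qw( 192.168.1.2 192.168.1.3 127.0.0.1 8.8.8.8 )) {
    my $r = lookup_ip4_hash_sketch($ip, \%nets);
    print "$ip => ", (defined $r ? $r : 'undef'), "\n";
}
```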

A convenient method of reading CIDR-style IP lists or a hash from a file
is by calling the provided routines read_array or read_hash, e.g.:
  @mynetworks_maps = (read_array('/etc/amavisd-mynetworks'), \@mynetworks);
or:
  @mynetworks_maps = (read_hash('/etc/amavisd-mynetworks'), \@mynetworks);

More examples can be found in amavisd.conf-sample.


REGULAR EXPRESSION LOOKUPS

For arguments to subroutine lookup() of type Amavis::Lookup::RE
(objects), the object is passed to method lookup_re, which does a
lookup into a list of Perl regular expressions (regexp or RE for short).

The full unmodified e-mail address is always used, so splitting to localpart
and domain or lowercasing is NOT performed. The regexp is powerful enough
that this is unnecessary. The routine is useful for other RE tests, such as
looking for banned file names.

Each element of the list can be a ref to a pair, or directly a regexp
('Regexp' object created by qr operator, or just a (less efficient)
string containing a regular expression). If it is a pair, the first
element is treated as a regexp, and the second provides a return value
in case the regexp matches. If not a pair, the implied result value
of a match is 1.

The regular expression is taken as-is, no implicit anchoring or setting
case insensitivity is done, so do use a qr'(?i)^user@example\.com$',
and not a sloppy qr'user@example.com', which can easily backfire.
Also, if qr is used with a delimiter other than ' (apostrophe), make sure
to quote the @ and $ .
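
How the sloppy form backfires can be seen with a few test addresses.
The addresses below are made up for this demonstration.

```perl
use strict;
use warnings;

my $sloppy = qr'user@example.com';           # unanchored, '.' matches any char
my $strict = qr'(?i)^user@example\.com$';    # anchored, dot quoted, caseless

for my $addr ('user@example.com',
              'User@Example.COM',
              'abuser@exampleXcom.evil.org') {
    printf "%-28s sloppy=%d strict=%d\n", $addr,
           ($addr =~ $sloppy ? 1 : 0), ($addr =~ $strict ? 1 : 0);
}
```

The sloppy pattern accepts the third address, which belongs to an unrelated
domain, and rejects the second one, which differs only in letter case.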

The pattern allows for capturing of parenthesized substrings, which can
then be referenced from the result string using the $1, $2, ... notation,
as with the Perl m// operator. The number after a $ may be a multi-digit
decimal number. To avoid possible ambiguity the ${n} or $(n) form may be used.
Substring numbering starts with 1. Nonexistent references evaluate to empty
strings. If any substitution is done, the result inherits the taintedness
of the key. Keep in mind that $ and @ characters need to be backslash-quoted
in qq() strings. Example:
  $virus_quarantine_to = new_RE(
    [ qr'^(.*)@example\.com$'i => 'virus-${1}@example.com' ],
    [ qr'^(.*)(@[^@]*)?$'i => 'virus-${1}${2}' ] );

Example (equivalent to the example in lookup_acl):
  $acl_re = Amavis::Lookup::RE->new(
    qr'@me\.ac\.uk$'i,
    [ qr'[@.]ac\.uk$'i => 0 ],
    qr'\.uk$'i,
  );
  ($r,$k) = $acl_re->lookup_re('user@me.ac.uk');
or $r = lookup(0, 'user@me.ac.uk', $acl_re);

  'user@me.ac.uk' matches me.ac.uk, returns true and search stops
  'user@you.ac.uk' matches .ac.uk, returns false (because of =>0) and search stops
  'user@them.co.uk' matches .uk, returns true and search stops
  'user@some.com' does not match anything, falls through and returns false (undef)
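
The pair handling and the $n substitution can be modelled without amavisd.
The routine below is a sketch written for this purpose: lookup_re_sketch is
a hypothetical name, it takes a plain list instead of an
Amavis::Lookup::RE object, and it ignores taintedness.

```perl
use strict;
use warnings;

# Hypothetical model of a regexp list lookup with $1, ${1}, $(1) expansion.
sub lookup_re_sketch {
    my ($key, @list) = @_;
    for my $e (@list) {
        my ($re, $value) = ref($e) eq 'ARRAY' ? @$e : ($e, 1);
        next unless $key =~ $re;
        # Collect captured substrings; groups that did not match become ''.
        my @caps = map {
            defined $-[$_] ? substr($key, $-[$_], $+[$_] - $-[$_]) : ''
        } 1 .. $#+;
        return $value unless defined $value;
        $value =~ s/\$(?:\{(\d+)\}|\((\d+)\)|(\d+))/
                    my ($n) = grep { defined } $1, $2, $3;
                    $n >= 1 && $n <= @caps ? $caps[$n - 1] : ''/gex;
        return $value;                      # first match wins
    }
    return undef;                           # fell through
}

my @quarantine = (
    [ qr'^(.*)@example\.com$'i => 'virus-${1}@example.com' ],
    [ qr'^(.*)(@[^@]*)?$'i     => 'virus-${1}${2}' ],
);
my @acl_re = ( qr'@me\.ac\.uk$'i, [ qr'[@.]ac\.uk$'i => 0 ], qr'\.uk$'i );

print lookup_re_sketch('bob@example.com', @quarantine), "\n";
for my $a ('user@me.ac.uk', 'user@you.ac.uk', 'user@them.co.uk', 'user@some.com') {
    my $r = lookup_re_sketch($a, @acl_re);
    print "$a => ", (defined $r ? $r : 'undef'), "\n";
}
```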

NOTE: new_RE is a synonym (shorthand) for the
      internal subroutine Amavis::Lookup::RE::new

See Perl documentation (or search the Internet) for the description
of Perl regular expressions. They are just an enhanced version of Posix
regular expressions, i.e. what egrep, awk and sed thrive on. Here are the
most important constructs (simplified):

  .      Match any character                   inter..t
  |      Alternation                           alfa|beta|gamma
  ()     Grouping                              (pre|post)fix
  []     Set of characters (char. class)       [Aa]lfa[0-9]
  ^      Match the beginning of the string     ^MakeMoney
  $      Match the end of the string           com$
  \      Quote the next metacharacter          \.com$
                                               ^\$\$\$\+spam@\[127\.0\.0\.1\]$
  most other characters just match themselves

  quantifiers may be placed after the pattern to modify its meaning
  from 'match itself exactly once' into:
  *      Match 0 or more times                 ^alfa.*omega$
  +      Match 1 or more times                 alfa +beta
  ?      Match 1 or 0 times                    (first)?aid
  {n}    Match exactly n times                 0{6}
  {n,}   Match at least n times                !{3,}
  {n,m}  Match at least n but not more than m times


SQL LOOKUPS

For general SQL considerations and the interpretation of @lookup_sql_dsn
please see documentation in README.sql .

The amavisd.conf variable @lookup_sql_dsn controls access to the SQL
server (dsn = data source name). If the list @lookup_sql_dsn is empty,
no attempts to use SQL for lookups will be made, and no code to use DBI
will be loaded or compiled (if @storage_sql_dsn is empty as well).

For arguments to subroutine lookup() of type Amavis::Lookup::SQLfield
(objects), the object is passed to method lookup_sql_field, which does
a lookup into a SQL table by using Perl module DBI.

SQL 'select' requests all available fields from the specified tables,
and the result is cached (just for this mail message processing).
Individual fields can be extracted one at a time from this cache
very quickly, so there is no penalty in using several calls to lookup
for different fields (for the same key) in different parts of the program.

lookup_sql() performs a lookup for an e-mail address against a SQL map.
If a match is found it returns whatever the map returns (a reference
to a hash containing values of requested fields), otherwise returns undef.
A match aborts the rest of the fetching sequence.

lookup_sql_field() also performs a lookup for an e-mail address against
a SQL map. It first calls lookup_sql() if it hasn't been called yet for
this key, requesting it to return all matching records. Instead of returning
the whole record as lookup_sql does, it returns just a value of one particular
table field, the first one with a defined (non-NULL) value from the list
of matching records (or undef if there are none).

The lookup_sql_field() is the subroutine that gets called from lookup()
for arguments (objects) of type Amavis::Lookup::SQLfield.

The field value NULL is translated to Perl undef, which according
to lookup rules implies that the next lookup table (if there are more)
is to be tried. In plain words, NULL means "this table does not know
the answer, try the next one". Further searching in this table
(e.g. for more general defaults) is terminated.

Boolean fields are usually represented as a single character (instead of
an integer) to minimize storage. Characters N,n,F,f,0,NUL and SPACE
represent false (0), any other character represents true. Trailing blanks
are ignored. It is customary to use Y for true and N for false.
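
The boolean convention can be written down as a small helper. The name
sql_bool_sketch is hypothetical, and the sketch models only
single-character fields (after trailing blanks are removed), which is an
assumption about how longer values would be treated.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a single-character SQL boolean field.
sub sql_bool_sketch {
    my ($field) = @_;
    return undef unless defined $field;     # NULL stays undef
    $field =~ s/ +\z//;                     # trailing blanks are ignored
    # N, n, F, f, 0, NUL, or nothing left (a lone SPACE) mean false
    return $field =~ /^[NnFf0\x00]?\z/ ? 0 : 1;
}

for my $v ('Y', 'N', 'n  ', '0', ' ', '1') {
    print "'$v' => ", sql_bool_sketch($v), "\n";
}
```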

SQL lookups (e.g. for user+foo@example.com) are performed in the order
which is usually specified by 'ORDER BY...DESC' in the SELECT statement;
otherwise the order is unspecified, which is only useful if just specific
entries exist in a database (e.g. full address always, not only domain part
or mailbox part).

The following order (implemented by sorting on the 'priority' field
in DESCending order, zero is low priority) is recommended, to follow
the same specific-to-general principle as in other lookup tables:

 - lookup for user+foo@example.com
 - lookup for user@example.com (only if $recipient_delimiter is '+')
 - lookup for user+foo (only if domain part is local)
 - lookup for user (only local; only if $recipient_delimiter is '+')
 - lookup for @example.com
 - lookup for @.example.com
 - lookup for @.com
 - lookup for @. (catchall)

NOTE:
 this is different from hash and ACL lookups in two important aspects:
 - a key without '@' implies a mailbox name, not a domain name;
 - naked mailbox name lookups (without '@', e.g. 'user') are only
   performed when the mail address matches local_domains lookups.
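
The recommended key order can be generated as follows. This is a sketch:
the name sql_keys_sketch is invented, the $is_local flag stands in for the
result of a local_domains lookup, a case-insensitive localpart is assumed,
and the null reverse path is not modelled.

```perl
use strict;
use warnings;

# Hypothetical helper: SQL search keys, most specific first.
sub sql_keys_sketch {
    my ($addr, $delim, $is_local) = @_;
    my ($local, $domain) = ($addr =~ /^(.*)\@([^\@]*)\z/s);
    return () unless defined $domain;
    ($local, $domain) = (lc $local, lc $domain);
    my $base = $local;                        # localpart without extension
    $base = $1 if defined $delim && $delim ne ''
                  && $local =~ /^(.+?)\Q$delim\E/s;
    my @keys = ("$local\@$domain");
    push @keys, "$base\@$domain" if $base ne $local;
    if ($is_local) {                          # naked mailbox names: local only
        push @keys, $local;
        push @keys, $base if $base ne $local;
    }
    push @keys, "\@$domain";
    my $d = ".$domain";
    while (1) {
        push @keys, "\@$d";
        last if $d eq '.';                    # '@.' is the catchall
        $d =~ s/^\.[^.]*//;                   # drop the leftmost label
        $d = '.' if $d eq '';
    }
    return @keys;
}

print "$_\n" for sql_keys_sketch('user+foo@example.com', '+', 1);
```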

The domain part is always lowercased when constructing a key,
the localpart is not lowercased when $localpart_is_case_sensitive is true.

NOTE: a null reverse path e-mail address used by MTA for delivery status
notifications (DSN) has an empty local part and an empty domain. As far as
the lookup is concerned (which uses raw, i.e. non-quoted and non-bracketed
address form), this address is @, i.e. a single character "@".
The SQL lookup for null address goes through the following sequence
of keys: "", "@", "@." (double quotes added for clarity, they are not part
of the key).

Table names and field names as used for SQL lookups are hard-wired in the
routine prepare_sql_queries(). Please adjust it as needed. Field names should
be unique even without the table prefix. If they are not, the last one
in the SELECT field list prevails.

For an example schema that can be used with MySQL or PostgreSQL or SQLite
see README.sql.


Special handling of optional SQL field 'users.local'

A special shorthand is provided when SQL lookups are used: when a match
for recipient address (or domain) is found in SQL tables (regardless of
field values), the recipient is considered local, regardless of static
@local_domains_acl or %local_domains lookup tables. This simplifies
life when a large number of dynamically changing domains is hosted.
To overrule this behaviour, add an explicit boolean field 'local'
to table 'users' (missing field defaults to true, meaning record match
implies the recipient is local; a NULL field 'local' is not special,
it is interpreted as undef like other NULL fields, causing search
to continue into other lookup tables).

Since amavisd-new-20030616-p7:
  changed the default value for local_domains_sql lookup for the catchall
  key '@.' under conditions: when user record with key '@.' is present in
  the database and a field 'local' is not present. Previously it surprisingly
  defaulted to true, now it falls back to static lookup table defaults,
  the same as if the record '@.' were not present in the table or as if
  the field value 'local' was NULL.


Case sensitivity of string comparison

Amavisd-new expects string comparison to be case sensitive (but does not
mind if it isn't). When forming a SELECT clause it lowercases parts of
keys that are supposed to be case-insensitive, such as the domain name.
The local part of the e-mail address in SQL search keys is lowercased
if and only if the $localpart_is_case_sensitive variable is false (which
is the default).

This means that case-insensitive parts of e-mail addresses as kept in the
SQL database should be in lower case, otherwise a match may fail, depending
on SQL server behaviour and the use of BINARY prefix in string data types.

Since MySQL version 3.23.0 it is possible to declare a data type of
a column as BINARY, forcing string comparison to be case sensitive,
as it is in PostgreSQL. This is only required for sites that want to treat
localpart as case-sensitive and have $localpart_is_case_sensitive true.

509 Since MySQL version 3.23.0 it is possible to declare a data type of
510 a column as BINARY, forcing string comparision to be case sensitive,
511 as it is in PostgreSQL. This is only required for sites that want to treat
512 localpart as case-sentitive and have $localpart_is_case_sensitive true.
513