Tuesday, November 17, 2009

NAT-Filtering DNS With iptables/netfilter (Blocking Conficker)

Recently I have been looking at approaches to containing the activity of the Conficker (Downup, Downadup, Kido) worm, in particular by blocking the initial DNS queries for the large set of rendezvous domains that the worm uses to update its executable.

The existing techniques that I've read about involve creating nameservers that are authoritative for a huge list of zones, either resolving themselves or forwarding to distinct resolvers. But as anybody who has run BIND configured with a large set of authoritative zones will know, the resource requirements for such a configuration are excessive: high memory usage, long pauses during cache cleanup, slow startup times...

So I devised an entirely different filtering technique that appears to work very well for blocking a set of over 100,000 domains, has negligible resource requirements and does not require reconfiguration of an existing resolver infrastructure.

The essential principle is that both the UDP and TCP variants of the DNS protocol are trivial to proxy robustly at layer 4, i.e. using only simple NAT, without any application awareness. In fact, this is how many commodity broadband routers function - they perform NAT on DNS packets destined for their LAN-side IP address to redirect them to an ISP's resolvers and then relay the responses back to the requester.

This allows for the creation of an extremely lightweight set of DNS proxy servers that can be placed between the network whose hosts you wish to contain and your existing resolver infrastructure. These proxy servers may then be configured to perform content-based filtering of DNS requests by string matching at specific offsets within the DNS packets. Such a proxy server requires nothing more than low-spec hardware (or a modest virtual machine) and uses only the built-in features of a vanilla Linux 2.6 kernel - no nameserver software required.

What follows assumes a sound understanding of Linux networking and iptables. The firewall snippets are in "iptables-restore" format. It is intended to provide comprehensive details of the approach I am using but does not constitute anything like a complete HOWTO.

The following configures a NAT that rewrites DNS packets destined for the DNS_PROXY address to a set of DNS_RESOLVER_{1,2,3} addresses following a round-robin policy.

echo 1 > /proc/sys/net/ipv4/ip_forward
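# to persist forwarding across reboots, net.ipv4.ip_forward = 1 can be
# set in /etc/sysctl.conf (the exact location varies by distribution)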

*mangle
:PREROUTING ACCEPT [0:0]
-A PREROUTING -d [DNS_PROXY] -p udp -m udp --dport 53 -m statistic --mode nth --every 3 --packet 0 -m state --state NEW -j CONNMARK --set-mark 1
-A PREROUTING -d [DNS_PROXY] -p udp -m udp --dport 53 -m statistic --mode nth --every 3 --packet 1 -m state --state NEW -j CONNMARK --set-mark 2
-A PREROUTING -d [DNS_PROXY] -p udp -m udp --dport 53 -m statistic --mode nth --every 3 --packet 2 -m state --state NEW -j CONNMARK --set-mark 3
-A PREROUTING -d [DNS_PROXY] -p tcp -m tcp --dport 53 -m statistic --mode random --probability 0.333333 -m state --state NEW -j CONNMARK --set-mark 1
-A PREROUTING -d [DNS_PROXY] -p tcp -m tcp --dport 53 -m statistic --mode random --probability 0.5 -m state --state NEW -j CONNMARK --set-mark 2
-A PREROUTING -d [DNS_PROXY] -p tcp -m tcp --dport 53 -m state --state NEW -j CONNMARK --set-mark 3
COMMIT

*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A PREROUTING -m connmark --mark 1 -j DNAT --to-destination [DNS_RESOLVER_1]
-A PREROUTING -m connmark --mark 2 -j DNAT --to-destination [DNS_RESOLVER_2]
-A PREROUTING -m connmark --mark 3 -j DNAT --to-destination [DNS_RESOLVER_3]
-A POSTROUTING -m connmark --mark 1 -j SNAT --to-source [DNS_PROXY]
-A POSTROUTING -m connmark --mark 2 -j SNAT --to-source [DNS_PROXY]
-A POSTROUTING -m connmark --mark 3 -j SNAT --to-source [DNS_PROXY]
COMMIT
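
One quick way to sanity-check the round-robin behaviour is to issue a handful of queries and then inspect the connection tracking table - the DNAT'd destinations should cycle through the three resolvers:

# from a host behind the proxy:
for i in $(seq 1 6); do dig +short @[DNS_PROXY] example.com; done
# then on the proxy itself (on kernels using the newer conntrack
# infrastructure the table may instead be at /proc/net/nf_conntrack):
grep 'dport=53' /proc/net/ip_conntrack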

We want DNS packets that are queries, rather than responses, to be sent via the DNSCHECK chain. Other DNS packets are passed unchecked, whilst non-DNS protocols are blocked. We only accept DNS packets from a trusted SRC_NET so as to avoid acting as an open resolver.

*filter
:FORWARD ACCEPT [0:0]
-A FORWARD -s [SRC_NET] -p udp -m udp --dport 53 -m u32 --u32 "0>>22&0x3C@8>>15&0x01=0" -j DNSCHECK
-A FORWARD -s [SRC_NET] -p udp -m udp --dport 53 -j ACCEPT
-A FORWARD -s [SRC_NET] -p tcp -m tcp --dport 53 -j ACCEPT
-A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
-A FORWARD -j LOGDROP
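
For reference, here is my reading of the u32 expression used above (the u32 match loads 32-bit words from the packet and applies simple arithmetic to them):

# "0>>22&0x3C@8>>15&0x01=0"
#   0>>22&0x3C  load the first word of the IP header and extract the IHL
#               field multiplied by four, i.e. the IP header length in bytes
#   @           make that the new offset base (the start of the UDP header)
#   8           load the word at offset 8, spanning the DNS ID and flags
#   >>15&0x01   shift the QR flag down into the bottom bit
#   =0          match only when QR is 0, i.e. the packet is a query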

To reduce the number of (expensive) "string" match tests applied to each DNS query, we dispatch the packet to the relevant DNSCHECKxx chain based on the first two characters of the query name, using a (cheap) "u32" match at the correct offset within the packet.

:DNSCHECK - [0:0]
-A DNSCHECK -m u32 --u32 "0>>22&0x3C@19&0xffff=0x6161" -j DNSCHECKaa
-A DNSCHECK -m u32 --u32 "0>>22&0x3C@19&0xffff=0x6162" -j DNSCHECKab
...
-A DNSCHECK -m u32 --u32 "0>>22&0x3C@19&0xffff=0x7a7a" -j DNSCHECKzz
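
Again, my reading of the dispatch expression, which works in the same way:

# "0>>22&0x3C@19&0xffff=0x6161"
#   with the base at the start of the UDP header, the word at offset 19
#   spans the final byte of the DNS header, the length prefix of the
#   first label of the query name, and that label's first two characters
#   &0xffff     keep just those two characters
#   =0x6161     match names beginning "aa" (0x61 is ASCII 'a')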

In each DNSCHECKxx chain we perform the full string matches on the query data.

:DNSCHECKaa - [0:0]
-A DNSCHECKaa -m string --from 40 --to 53 --hex-string "|08|aaadokfn|03|net|00|" --algo bm -j LOGDROP
-A DNSCHECKaa -m string --from 40 --to 49 --hex-string "|05|aaakp|02|cc|00|" --algo bm -j LOGDROP
-A DNSCHECKaa -m string --from 40 --to 55 --hex-string "|0b|aaapcxqiqgg|02|ws|00|" --algo bm -j LOGDROP
...
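
The match patterns are the query names in DNS wire format: each label is preceded by a one-byte length and the name is terminated by a zero byte. The --from offset of 40 is the start of the query name (a 20-byte IP header without options, plus the 8-byte UDP header and 12-byte DNS header), and --to is set to the offset of the pattern's final byte, so the Boyer-Moore search collapses to a single comparison at exactly the expected position. The following helper - my own sketch, not one of the published scripts - produces the hex-string form from a plain domain name:

domain_to_hexstring() {
    # emit e.g. "aaadokfn.net" as "|08|aaadokfn|03|net|00|"
    local out="" label
    local IFS='.'
    for label in $1; do
        out="${out}|$(printf '%02x' "${#label}")|${label}"
    done
    printf '%s|00|\n' "${out}"
}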

Matches are logged and dropped, with rate limiting to prevent flooding of the logfiles.

:LOGDROP - [0:0]
-A LOGDROP -m limit --limit 1/second --limit-burst 100 -j LOG
-A LOGDROP -j DROP
COMMIT

The following sample scripts should be useful for generating and loading the daily iptables rulesets:

gen_domains.sh - http://pastebin.com/f240d06a1

Uses Downatool2 under Wine to create date-based files, for a number of days ahead, each containing a sorted list of rendezvous domains for each Conficker variant. It could be invoked as an unprivileged user from cron once per week.

build_iptables.pl - http://pastebin.com/f2c085ac4

Builds an iptables firewall policy, in iptables-restore format, that implements the NAT-filter for DNS described above from a list of files containing blacklisted domains.

load_iptables.sh - http://pastebin.com/f158d3ffd

Uses build_iptables.pl to create a firewall policy that filters the domains for the next couple of days and then loads it. It could be invoked as root from cron each night.
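
By way of illustration, the cron schedule might look something like the following (the installation paths and the unprivileged account are my assumptions):

# /etc/crontab fragment
# weekly domain list generation, as an unprivileged user
0 3 * * 0    [UNPRIV_USER]    /usr/local/bin/gen_domains.sh
# nightly rebuild and load of the filtering policy, as root
30 3 * * *   root             /usr/local/bin/load_iptables.sh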

Sunday, April 13, 2008

Catching up with Barcode Writer in Pure PostScript

It has been a while since I last wrote about Barcode Writer in Pure PostScript. The project has been far from dormant in recent months, so here is a chance to catch up with what's been keeping me busy in my “spare” time. (And even very busy at other times!)

It's hard to recall all of the improvements that have been made to the project over the last year, but thankfully the commit logs do not forget! There have been the usual bug fixes, some code optimisations and miscellaneous new features, but the main highlight has to be the inclusion of support for 2D barcodes, which went into the mix as follows:

MaxiCode (June to July 2007)

MaxiCode is an irregular matrix symbology whose symbols consist of a hexagonal grid of dots around a bullseye finder pattern. The sequencing of these dot positions does not follow any regular pattern, so unfortunately the mapping matrix must be hard-coded into the software. MaxiCode also has various different "modes" of operation, some of which impose a strict format on the initial part of the data, which makes the input encoding quite complicated.

PDF417 (Boxing Day to New Year's Day)

Technically speaking, you might refer to PDF417 as a "stacked-linear" symbology; however, BWIPP renders it using a grid of tall, rectangular cells. The worst thing about this symbology is that it requires a set of lookup tables that contain the "cluster sets" - three groups of 930 numbers used to convert from codewords to bar/space widths. The sequencing of the numbers within these sets appears to be quite random (if you know otherwise then please let me know), so they must be hard-coded into the software, which leads to a lot of uninteresting code – ouch!

Data Matrix (early- to mid-January)

This is a matrix symbology that can be rendered using a grid of squares through which the data zig-zags in eight-module, L-shaped clusters. Whilst the ordering of the modules within the grid is reasonably complicated, it can nevertheless be determined algorithmically at only a small computational cost, requiring just some minor tweaking to fix up the corner cases for matrices that do not contain a multiple of eight modules. So overall this symbology can be coded very nicely. We can generate both the standard square symbol types as well as the optional rectangular symbols.

Aztec Code (early- to mid-February)

This is a matrix symbology that can be rendered using a grid of squares, with the data wrapping clockwise in two-module-wide layers around a square finder pattern in the centre of the symbol. Whilst there are a few different types of symbols, it is possible to fold the implementation for each of these into a single, relatively sophisticated but direct algorithm that does not contain excessive branching. So again, this symbology can be coded quite cleanly.

QR Code (February to late-March)

This is a matrix symbology that can be rendered using a grid of squares, with the data meandering vertically in two-module-wide columns from right to left. With respect to implementation this symbology is quite hideous, its one saving grace being that it does not require the inclusion of hard-coded lookup tables for module placement. Firstly, in certain symbols the final data codeword is defined to be four bits wide rather than the usual eight, which results in an awkward bit shift having to be applied to the trailing codewords in order to avoid propagating the exceptional processing required for these special cases throughout the remainder of the symbol generation process. Secondly, the "drunken walk" algorithm for placing the modules within the symbol (whilst avoiding the pre-defined static feature placeholders) has an unexplained inconsistency in the way that the hop over the vertical timing pattern is performed. Thirdly, the format and version information functions are unnecessarily complicated; however, since their domain is very small it is possible to use a small set of pre-calculated lookup tables for these in order to avoid a significant amount of complex code. Finally, the worst aspect of this symbology is the optional, but recommended, process of applying eight distinct mask patterns to the candidate symbol in turn and then evaluating these in order to select the one that produces an output symbol with the fewest undesirable properties. Performing the evaluation algorithm as given by the specification turns out to be significantly more expensive than the entire remainder of the symbol generation process! So for the time being we always select one particular mask.

Going Forward

So, we presently support all major 2D barcode formats, but with one major caveat - the user (or application developer) working with BWIPP has to do some preparatory work to process the barcode data into the particular intermediate format required by each encoder, for which they need the corresponding specification. This is a small task compared to the sometimes sophisticated numeric manipulation involved in the remainder of the symbol generation process. However, it does involve extensive string manipulation, a task for which PostScript is definitely not well suited, whilst purpose-built application development languages (such as Perl and C++) have much better support for it, either natively or through libraries.

So the next major challenge on the BWIPP roadmap is to integrate, for each 2D symbology, the high-level encoding routines that convert from a user-supplied ASCII string to the intermediate format currently required by the encoders. The result will be that a novice user can simply enter the data that they want to place into a barcode, subject only to the minimal restrictions necessarily imposed by each symbology, and our code will derive an optimal encoding that produces the best symbol for the given data, thereby making the system much easier to use for the uninitiated.

[Update: Since July 2009, all of the encoders now feature (non-optimised) auto-encoding of the user-provided input data into an intermediate format required by that encoder.]

Last, but by no means least, an extremely useful component in the implementation of support for 2D barcode generation has been the extensive testing performed by Jean-François Barbeau. He has helped detect and fix a number of bugs, some obvious and some much more subtle, so that we can place much greater confidence in the correctness of the output – a big thank you on behalf of the PostScript barcoding community!

Friday, November 09, 2007

Apache accesslog to syslog

Apache allows its error logs to be written to the local syslog; however, it does not natively support directing its access logs to syslog. How frustrating!

It does, however, allow access logs to be written to a pipe, and I have seen a number of home-brew scripts that essentially redirect the Apache access log data from STDIN to syslog.

I've yet to see anything quite as simple as the following directive that I cooked up today:

CustomLog "|exec /usr/bin/logger -t apache -i -p local6.notice" combined

It pipes the access log data to the BSD logger(1) utility that is installed by default on almost any Unix system. No need for any more of those STDIN wrapper scripts!
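
The local6 facility can then be routed wherever suits. For example, a traditional syslog.conf entry along the following lines (the destination file is my choice) would collect the access log entries in a dedicated file:

# /etc/syslog.conf
local6.*    /var/log/apache-access.log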