The following little script wraps the p0f process and removes the redundant (duplicate) records from each 1000-record chunk of output:
#! /bin/sh
exec 3>&-
exec 2>&-
exec 1>&-
cd /
nohup p0f -i eth2 -u p0f -N -U -q -p -t -l 'src net 143.210.0.0/16' | \
sed -n -e 's/^<\([A-Za-z0-9: ]*\)> \([0-9.]\{7,15\}\):[0-9]\{1,5\} - \(.*\)/\2 \3/p' | \
gawk 'ORS=NR%1000?"\n":"\000"' | \
xargs -0 -i bash -c 'date +"*** %c ***"; echo "$0" | sort | uniq' {} >> /srv/p0f/os.log &
p0f aside, the interesting part boils down to this useful Unix shell programming paradigm:
$INPUT_CMD | gawk -v n="$BLOCK_LINES" 'ORS=NR%n?"\n":"\000"' | xargs -0 -i $PROCESS_CMD {}
It splits the streamed output of $INPUT_CMD into chunks of $BLOCK_LINES lines, each of which is processed immediately and independently by $PROCESS_CMD. It chunks the data by replacing the ordinary line separator after every $BLOCK_LINES-th line with an ASCII NUL character, which xargs -0 uses as the argument separator. The block size is passed in with -v so that the awk program itself can stay inside single quotes.
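To try the idiom without p0f, here is a minimal sketch wrapped in a shell function. The name chunk_lines is made up for illustration, seq and paste merely stand in for $INPUT_CMD and $PROCESS_CMD, and it assumes gawk (or an awk that can emit a NUL in ORS) plus an xargs that supports -0. It spells the replacement option as -I {}, which GNU xargs documents as the preferred form of the deprecated -i.

```shell
#!/bin/sh
# chunk_lines N CMD [ARGS...]  (hypothetical helper name)
# Reads lines on stdin, groups them into blocks of N lines, and runs
# CMD once per block with the whole block appended as one argument.
chunk_lines() {
    block_lines=$1
    shift
    # The post uses gawk; fall back to whatever awk is installed
    # (NUL handling in ORS may differ between awk implementations).
    awk_cmd=$(command -v gawk || command -v awk)
    # Every N-th record is terminated by NUL instead of newline, so
    # xargs -0 sees each N-line block as a single item.
    "$awk_cmd" -v n="$block_lines" 'ORS = NR % n ? "\n" : "\000"' |
        xargs -0 -I {} "$@" {}
}

# Demo: six input lines, blocks of three. Each block arrives in the
# inner shell as $0 and is joined with commas.
seq 1 6 | chunk_lines 3 sh -c 'printf "%s\n" "$0" | paste -sd, -'
# prints:
# 1,2,3
# 4,5,6
```

One detail to keep in mind: when the input length is not a multiple of the block size, the last record still ends in a newline rather than a NUL, so the final, shorter block reaches the command with a trailing newline attached.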