Night Hour

Reading under a cool night sky ... a quiet, contemplative night ...

Blocking Sensitive Content using Nginx and Docker


"I'm smart enough to know that I'm dumb." (Richard Feynman)


21 June 2018


Introduction

Web application firewalls (WAFs) are often deployed by security professionals to protect applications against malicious attacks. Some of them, like the popular open-source ModSecurity, can inspect both incoming requests and outgoing responses, allowing them to detect web attacks as well as information leakage. There are also cloud-based WAFs, such as those offered by Cloudflare and Sucuri, that make it easy to protect a web application or website.

Not all web application firewalls offer outgoing response inspection. Some WAFs focus solely on analyzing incoming requests to stop attacks before they reach the application. This article shows how to build a simple Nginx module that inspects the outgoing response body for sensitive data and blocks the response when such data is found. The module uses the PCRE regular expression library to inspect content and is based on a fork of Weibin Yao's Nginx substitution filter.

This module can be useful as an additional layer of defense against web attacks. It can complement a WAF that only analyzes incoming requests. The module will be compiled into Nginx and packaged as a Docker image.

Design and Approach

Weibin Yao's substitution module matches specific content in the HTTP response body using either a regular expression or a fixed string, and replaces the matches with specified values. This replacement functionality can already be used to "block" sensitive content. For example, a regular expression can match a Singapore identity card number (NRIC) and replace it with a single blank space.

However, it may be more convenient to prevent an entire HTML page from being displayed if it contains sensitive identity card numbers. This is relatively easy to do by modifying the original substitution module in a forked version.

The following diagram illustrates one of the ways that this module can be used with Nginx to block outbound content with sensitive information.

Fig 1. Nginx Reverse Proxy to Filter and Block Sensitive Content

Nginx is compiled with the content filter module and runs as a reverse proxy in front of a web application. It inspects the outgoing content body using regular expressions. If the number of matches for sensitive data reaches a specified threshold, the content is blocked and Nginx displays an empty page instead of the original response. Note that instead of a reverse proxy setup, the module can also be used directly on a website served by Nginx.

Nginx uses a chain of buffers to store the outgoing response. Weibin Yao's substitution module processes the outgoing response, looking for a linefeed character (\n). When a linefeed is found, the characters up to and including the linefeed are stored in a buffer variable, line_in. Matching and substitution are then performed on line_in, and the resulting string with replacements is copied to another buffer variable, ctx->out_buf.

ctx->out_buf is appended to the ctx->out chain of buffers whenever new memory is allocated for it. When all the lines of the response body have been processed, the ctx->out chain is passed to the next filter in Nginx. The modified content is eventually delivered to the client once it has passed through all the other Nginx filters.
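The ctx->out and ctx->last_out fields follow a common C idiom for building a singly linked list: keeping the address of the tail's next pointer makes each append constant-time without walking the chain. The following standalone sketch imitates that idiom outside Nginx; chain_t, out_ctx_t and chain_append() are hypothetical stand-ins for ngx_chain_t and the module's own append helper, and links are allocated with malloc() rather than from a request pool.

```c
#include <stdlib.h>

/* Hypothetical stand-in for ngx_chain_t: one link holding one buffer. */
typedef struct chain_s {
    const char     *buf;    /* payload, owned elsewhere */
    struct chain_s *next;
} chain_t;

/* Hypothetical stand-in for the out/last_out fields of ngx_http_ct_ctx_t. */
typedef struct {
    chain_t   *out;         /* head of the output chain */
    chain_t  **last_out;    /* address of the tail link's next field */
} out_ctx_t;

static void out_ctx_init(out_ctx_t *ctx)
{
    ctx->out = NULL;
    ctx->last_out = &ctx->out;   /* empty chain: the next append sets the head */
}

/* Append a buffer in O(1). Returns 0 on success, -1 if malloc fails. */
static int chain_append(out_ctx_t *ctx, const char *buf)
{
    chain_t *cl = malloc(sizeof(*cl));

    if (cl == NULL) {
        return -1;
    }
    cl->buf = buf;
    cl->next = NULL;

    *ctx->last_out = cl;         /* link after the current tail (or at head) */
    ctx->last_out = &cl->next;   /* the new tail's next field is the new slot */
    return 0;
}

/* Release every link and reset the context to an empty chain. */
static void chain_free(out_ctx_t *ctx)
{
    chain_t *cl = ctx->out;

    while (cl != NULL) {
        chain_t *next = cl->next;
        free(cl);
        cl = next;
    }
    out_ctx_init(ctx);
}
```

Nginx modules normally allocate chain links from the request's memory pool (for example with ngx_alloc_chain_link()), so the explicit free step above has no direct counterpart there.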

The new content filter module retains most of this logic. However, it doesn't need to do any replacements or substitutions. Instead, it keeps a count of the number of matches for each regular expression. Unlike the original substitution module, the content filter does not do fixed string matching; all matching is done through the PCRE regular expression engine, and only case-insensitive comparisons are performed.

If the number of matches for a particular regular expression equals or exceeds a specified threshold, the content is deemed sensitive and is blocked. Some of the original code in the substitution module has been refactored to make it clearer and easier to understand.

Implementation

This section runs through parts of the content filter implementation. The full source code is available from the GitHub link at the bottom of the article. It assumes that the reader understands how a basic Nginx module is structured and how it works. Refer to the earlier article "Writing an Nginx Response Body Filter Module" for a quick introduction to coding a simple Nginx filter. There are also links to other resources for developing Nginx modules.

The following snippet shows the configuration directives that the content filter accepts.

static ngx_command_t  ngx_http_ct_filter_commands[] = {

    { ngx_string("ct_filter"),
      NGX_HTTP_LOC_CONF|NGX_CONF_TAKE2,
      ngx_http_ct_filter,
      NGX_HTTP_LOC_CONF_OFFSET,
      0,
      NULL },

    /* on/off switch; NGX_CONF_FLAG is the argument type that
       ngx_conf_set_flag_slot expects */
    { ngx_string("ct_filter_logonly"),
      NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_FLAG,
      ngx_conf_set_flag_slot,
      NGX_HTTP_LOC_CONF_OFFSET,
      offsetof(ngx_http_ct_loc_conf_t, logonly),
      NULL },

    { ngx_string("ct_filter_types"),
      NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_1MORE,
      ngx_http_types_slot,
      NGX_HTTP_LOC_CONF_OFFSET,
      offsetof(ngx_http_ct_loc_conf_t, types_keys),
      &ngx_http_html_default_types[0] },

    { ngx_string("ct_line_buffer_size"),
      NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_TAKE1,
      ngx_conf_set_size_slot,
      NGX_HTTP_LOC_CONF_OFFSET,
      offsetof(ngx_http_ct_loc_conf_t, line_buffer_size),
      NULL },

    { ngx_string("ct_buffers"),
      NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_TAKE2,
      ngx_conf_set_bufs_slot,
      NGX_HTTP_LOC_CONF_OFFSET,
      offsetof(ngx_http_ct_loc_conf_t, bufs),
      NULL },

    ngx_null_command
};

The ct_filter directive takes two arguments and can only appear in a location block of the Nginx configuration. The first argument is the regular expression that each line of the response body is compared against. The second is the threshold for the number of matches: if the number of matches across the entire response body equals or exceeds this threshold, the content is flagged as sensitive.
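The directive handler ngx_http_ct_filter() is not listed in this article. As a rough idea of what it has to do with its two arguments, the standalone sketch below compiles the pattern case-insensitively and validates the threshold as a positive integer. It substitutes POSIX regcomp() for Nginx's PCRE wrapper, and ct_rule_t and ct_rule_parse() are hypothetical names modelled loosely on blk_pair_t.

```c
#include <errno.h>
#include <limits.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical standalone counterpart of blk_pair_t. */
typedef struct {
    regex_t      re;          /* compiled, case-insensitive pattern */
    unsigned int occurence;   /* threshold; spelling follows blk_pair_t */
    unsigned int matched;     /* matches counted so far in a response */
} ct_rule_t;

/* Validate the arguments of "ct_filter <regex> <threshold>".
   Returns 0 on success and -1 if either argument is invalid. */
static int ct_rule_parse(ct_rule_t *rule, const char *pattern,
                         const char *threshold)
{
    char          *end;
    unsigned long  n;

    /* reject empty strings, signs and leading whitespace up front */
    if (threshold[0] < '0' || threshold[0] > '9') {
        return -1;
    }

    errno = 0;
    n = strtoul(threshold, &end, 10);
    if (errno != 0 || *end != '\0' || n == 0 || n > UINT_MAX) {
        return -1;            /* overflow, trailing junk, or zero */
    }

    if (regcomp(&rule->re, pattern, REG_EXTENDED | REG_ICASE) != 0) {
        return -1;            /* pattern does not compile */
    }

    rule->occurence = (unsigned int) n;
    rule->matched = 0;
    return 0;
}
```

For example, ct_rule_parse(&rule, "[STFG][0-9]{7}[A-Z]", "3") accepts a simplified, illustrative NRIC-like pattern with a threshold of three matches.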

The ct_filter_logonly directive takes an on/off value and can appear in the main, server, or location blocks of the Nginx configuration file. It is off by default. When it is set to "on", the module does not block sensitive content; it only logs that such content has been detected. This is useful when tuning the regular expressions or troubleshooting issues.

The ct_filter_types directive specifies the MIME types of the responses that the module processes. The default is text/html. Additional types, such as text/plain or application/javascript, can be specified so that the module also inspects them for sensitive content.

The remaining directives, ct_line_buffer_size and ct_buffers, are for tuning the module. ct_line_buffer_size specifies the initial size of the buffer used to store a line; the default is 8 x the memory page size, which on most systems is 8 x 4096 = 32768 bytes. ct_buffers specifies the number of buffers used by the module and the size of each buffer. Note that the ct_buffers directive is currently not fully implemented.
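The default for ct_line_buffer_size scales with the memory page size. A minimal sketch of the same calculation in portable C, assuming POSIX sysconf() is available; ct_default_line_buffer_size() is a hypothetical helper, and inside Nginx the page size is available through the global ngx_pagesize instead.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: eight memory pages, the documented default for
   ct_line_buffer_size. Falls back to 4096-byte pages if sysconf()
   cannot report the page size. */
static size_t ct_default_line_buffer_size(void)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0) {
        page = 4096;
    }
    return 8 * (size_t) page;
}
```

With the common 4096-byte page this returns 32768.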

The following shows the code snippet for some of the data structures used by the Nginx content filter module.

typedef struct {
    ngx_str_t      match;
#if (NGX_PCRE)
    ngx_regex_t   *match_regex;
    int           *captures;
    ngx_int_t      ncaptures;
#endif
    unsigned int   occurence;
    unsigned int   matched;
} blk_pair_t;



typedef struct {
    ngx_array_t   *blk_pairs; /* array of blk_pair_t */
    ngx_flag_t    logonly;   /* flag to indicate logging only */
    ngx_chain_t   *in;

    /* the line input buffer before substitution */
    ngx_buf_t     *line_in;
   
    /* the last output buffer */
    ngx_buf_t     *out_buf;
    /* point to the last output chain's next chain */
    ngx_chain_t  **last_out;
    ngx_chain_t   *out;

    ngx_chain_t   *busy;

    /* the freed chain buffers. */
    ngx_chain_t   *free;

    ngx_int_t      bufs;

    unsigned       last;
    unsigned int    matched;

} ngx_http_ct_ctx_t;

blk_pair_t is a data structure that holds the compiled regular expression (match_regex) used for comparison, the threshold for the number of matches (occurence) that determines if the content is sensitive, and an integer variable (matched) that tracks the number of matches for the whole response body. ngx_http_ct_ctx_t is the request module context. It allows the module to track and maintain state per request.

The following shows the code snippet for ngx_http_ct_header_filter().

static ngx_int_t
ngx_http_ct_header_filter(ngx_http_request_t *r)
{
    ngx_http_ct_loc_conf_t  *slcf;

    slcf = ngx_http_get_module_loc_conf(r, ngx_http_ct_filter_module);

    if(slcf == NULL)
    {
        return ngx_http_next_header_filter(r);
    }

    if (slcf->blk_pairs == NULL
        || slcf->blk_pairs->nelts == 0
        || r->header_only
        || r->headers_out.content_type.len == 0
        || r->headers_out.content_length_n == 0)
    {
        return ngx_http_next_header_filter(r);
    }

    if (ngx_http_test_content_type(r, &slcf->types) == NULL) {
        return ngx_http_next_header_filter(r);
    }

    //Check for compressed content  
    if(ngx_test_ct_compression(r) != 0)
    {//Compression enabled, don't filter   
        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
                     "[Content filter]: ngx_http_ct_header_filter"
                     " compression enabled skipping");
        return ngx_http_next_header_filter(r);
    }

    #if CONTF_DEBUG
        ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                       "[Content filter]: http content filter header \"%V\"", &r->uri);
    #endif

    if (ngx_http_ct_init_context(r) == NGX_ERROR) {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                     "[Content filter]: ngx_http_ct_header_filter"
                     " cannot initialize request ctx");
        return NGX_ERROR;
    }

    r->filter_need_in_memory = 1;

    return ngx_http_next_header_filter(r);

}

This function handles the response headers and is called by Nginx for every response that it processes. It first checks that the module is configured for the location and that the response is not empty. If the response contains only headers (for example, when the request uses the HTTP HEAD method), it is not processed further. The function also checks the response's content type and whether it is compressed: compressed responses, and responses whose content type is not configured for the module, are passed through without inspection.
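The chain of early returns above amounts to a single yes/no decision. The sketch below restates it as a standalone predicate over a simplified, hypothetical resp_info_t. It assumes the content type has already been stripped of parameters such as charset, and it uses an exact string comparison where Nginx uses a hash lookup via ngx_http_test_content_type().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of the response fields the header
   filter inspects (the real code reads ngx_http_request_t). */
typedef struct {
    bool        header_only;      /* e.g. a HEAD request */
    const char *content_type;     /* "" when absent, no parameters */
    long long   content_length;   /* -1 when unknown, as in Nginx */
    bool        compressed;       /* body already content-encoded */
} resp_info_t;

/* Returns true when the response body should be inspected. */
static bool ct_should_filter(const resp_info_t *resp,
                             const char *const *types, size_t ntypes,
                             size_t nrules)
{
    if (nrules == 0 || resp->header_only
        || resp->content_type[0] == '\0'
        || resp->content_length == 0)
    {
        return false;             /* nothing configured or nothing to scan */
    }

    if (resp->compressed) {
        return false;             /* regexes cannot see into compressed data */
    }

    for (size_t i = 0; i < ntypes; i++) {
        if (strcmp(resp->content_type, types[i]) == 0) {
            return true;          /* type is on the ct_filter_types list */
        }
    }
    return false;
}
```

Note that an unknown length (-1) does not disable filtering; only an explicit zero-length body does.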

One of the differences between the original substitution filter and the code here is the use of chunked transfer encoding. The substitution filter uses chunked transfer encoding because the content may change after replacements and will therefore have a different length. Here there are no content replacements, although a blank page may be displayed if sensitive data is detected. The author (me) holds a somewhat biased view that chunked encoding can affect performance, and therefore uses a simple trick: keep-alive is disabled when the empty page is sent. Although the empty page is shorter than the length given in the Content-Length header, disabling keep-alive causes the connection to be closed once Nginx finishes sending the empty page, and no chunked encoding overhead is incurred. Note that under HTTP/1.1 a client that receives fewer bytes than Content-Length before the connection closes should treat the response as incomplete, so this trick relies on the browser tolerating the truncated response.

Another difference is the handling of the Last-Modified header. Again for performance, the module does not clear the Last-Modified header. This header is used by web caching mechanisms to determine whether fresh content needs to be fetched. Not clearing it means that pages can be served from caches. This improves performance but can sometimes lead to stale content being displayed; in such cases the caches may have to be cleared manually.

The following shows the ngx_http_ct_body_filter() function.

static ngx_int_t
ngx_http_ct_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{
    ngx_int_t    	           rc;
    ngx_log_t                 *log;
    ngx_chain_t               *cl;
    ngx_http_ct_ctx_t       *ctx;
    ngx_http_ct_loc_conf_t  *slcf;
    ngx_buf_t  *b;
    
    log = r->connection->log;

    slcf = ngx_http_get_module_loc_conf(r, ngx_http_ct_filter_module);
    if (slcf == NULL) {
        return ngx_http_next_body_filter(r, in);
    }

    ctx = ngx_http_get_module_ctx(r, ngx_http_ct_filter_module);
    if (ctx == NULL) {
        return ngx_http_next_body_filter(r, in);
    }

    #if CONTF_DEBUG
        ngx_log_debug1(NGX_LOG_DEBUG_HTTP, log, 0,
                       "[Content filter]: ngx_http_ct_body_filter http content filter \"%V\"", &r->uri);
    #endif

    if (in == NULL && ctx->busy == NULL) {
        return ngx_http_next_body_filter(r, in);
    }

    if (ngx_http_ct_body_filter_init_context(r, in) != NGX_OK){
        goto failed;
    }

    for (cl = ctx->in; cl; cl = cl->next) {

        if (cl->buf->last_buf || cl->buf->last_in_chain){
            ctx->last = 1;
        }
        
        
        /* 
           Process each buffer for sensitive content matching
        */
        if(!ctx->matched)
        {//no sensitive content is detected earlier
            rc = ngx_http_ct_body_filter_process_buffer(r, cl->buf);
            
            if (rc == NGX_ERROR) {
                ngx_log_error(NGX_LOG_ERR, log, 0,  "[Content filter]: "
                    "ngx_http_ct_body_filter error processing buffer"
                    " for sensitive content");
                goto failed;
            }
        }
        else if(ctx->logonly)
        {//sensitive content already detected
         //just copy remaining buffer to output chain
         //if logonly is enabled, otherwise can ignore
            if (ngx_http_ct_out_chain_append(r, ctx, 
                cl->buf)!= NGX_OK) 
            {
                ngx_log_error(NGX_LOG_ERR, log, 0,  "[Content filter]: "
                    "ngx_http_ct_body_filter cannot append to output chain");
                goto failed;
            }
        }
        
        
        if (ctx->last) 
        {//last buffer set the last_buf or last_in_chain flag 
         //for the last output buffer
            if (ctx->out_buf == NULL) {
                if (ngx_http_ct_get_chain_buf(r, ctx) != NGX_OK) {
                    ngx_log_error(NGX_LOG_ERR, log, 0,
                              "[Content filter]: ngx_http_ct_body_filter "
                              "cannot get buffer for out_buf");
                    return NGX_ERROR;
                }
            }
            
            if( ngx_buf_size(ctx->out_buf) == 0)
            {//last buffer is zero size
                 ctx->out_buf->sync = 1;
            }
            
            ctx->out_buf->last_buf = (r == r->main) ? 1 : 0;
            ctx->out_buf->last_in_chain = cl->buf->last_in_chain;
            break;
        }
        
    }

    /* It doesn't output anything, return */
    if ((ctx->out == NULL) && (ctx->busy == NULL)) {
        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0, 
                     "[Content filter]: ngx_http_ct_body_filter nothing to output");
        return NGX_OK;
    }
    
    /*If sensitive content is detected */
    if(ctx->matched && ctx->last)
    {
         ngx_log_error(NGX_LOG_ALERT, r->connection->log, 0,
                      "[Content filter]: Alert ! Sensitive content is detected !");
        
        if(!ctx->logonly)
        { //logonly is not enabled. Show empty page. 
    
            //Get a new buffer into ctx->out_buf        
            if (ngx_http_ct_get_chain_buf(r, ctx) != NGX_OK) {
                ngx_log_error(NGX_LOG_ERR, log, 0,  "[Content filter]: "
                    "ngx_http_ct_body_filter cannot allocate chain "
                    "for empty page");
                goto failed;
            }
            
            b = ctx->out_buf; 
            ngx_memzero(b, sizeof(ngx_buf_t));
            
            b->tag = (ngx_buf_tag_t) &ngx_http_ct_filter_module;
            b->memory=1;
            b->pos = empty_page;
            b->last = empty_page + ngx_strlen(empty_page);
            b->last_buf = 1;
            b->last_in_chain = 1;
            
            ctx->out->buf = b;
            ctx->out->next = NULL; 
            
            r->keepalive = 0;
            
        }
        
    }
    
    return ngx_http_ct_output(r, ctx, in);

failed:

    ngx_log_error(NGX_LOG_ERR, log, 0,
                  "[Content filter]: ngx_http_ct_body_filter error.");

    return NGX_ERROR;
}

This function is called by Nginx with the response body, which may arrive over several calls as chains of buffers. It goes through the buffer chain, processing each buffer with the ngx_http_ct_body_filter_process_buffer() function. If the number of matches for a regular expression equals or exceeds the configured threshold, the data is deemed sensitive and ctx->matched is set. If logonly is set to "on", the module allows the original content to be sent to the browser and logs an alert indicating that sensitive data has been detected. The default behavior is to log an alert and block the sensitive content by displaying a blank page; in practice the module sends a single space character (" ") in place of the original body.
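Once the last buffer has been scanned, the body filter chooses between three outcomes. The following is a simplified, hypothetical model of that choice: ct_decide(), ct_finish() and the keepalive flag stand in for the module's use of ctx->matched, ctx->logonly and r->keepalive, and the alert goes to stderr instead of the Nginx error log.

```c
#include <stdbool.h>
#include <stdio.h>

/* The replacement body; the article says the module sends a single space. */
static const char empty_page[] = " ";

typedef enum { CT_PASS, CT_LOG_ONLY, CT_BLOCK } ct_action_t;

/* Decide what happens once the whole body has been scanned. */
static ct_action_t ct_decide(bool matched, bool logonly)
{
    if (!matched) {
        return CT_PASS;
    }
    return logonly ? CT_LOG_ONLY : CT_BLOCK;
}

/* Returns the body to send. On a match an alert is logged; when blocking,
   keep-alive is switched off because the body no longer matches the
   Content-Length already sent in the headers. */
static const char *ct_finish(const char *original, bool matched,
                             bool logonly, bool *keepalive)
{
    ct_action_t action = ct_decide(matched, logonly);

    if (action != CT_PASS) {
        fprintf(stderr, "[Content filter]: Alert ! Sensitive content is detected !\n");
    }
    if (action == CT_BLOCK) {
        *keepalive = false;
        return empty_page;
    }
    return original;
}
```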

The following shows the ngx_http_ct_body_filter_process_buffer() function.

static ngx_int_t
ngx_http_ct_body_filter_process_buffer(ngx_http_request_t *r, ngx_buf_t *b)
{
    u_char               *p, *last, *linefeed;
    ngx_int_t             len, rc;
    ngx_http_ct_ctx_t  *ctx;

    ctx = ngx_http_get_module_ctx(r, ngx_http_ct_filter_module);

    if (b == NULL) {
        //Input buffer shouldn't be NULL 
        //If it is NULL, it is an error
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
            "[Content filter]: ngx_http_ct_body_filter_process_buffer "
            " input buffer is null");
        return NGX_ERROR;
    }

    p = b->pos;
    last = b->last;
    b->pos = b->last; //buffer is consumed

    #if CONTF_DEBUG
        ngx_log_debug4(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                       "[Content filter]: processing buffer: %p %uz, line_in buffer: %p %uz",
                       b, last - p,
                       ctx->line_in, ngx_buf_size(ctx->line_in));
    #endif

    if ((last - p) == 0 && ngx_buf_size(ctx->line_in) == 0){
        return NGX_OK;
    }

    if ((last - p) == 0 && ngx_buf_size(ctx->line_in) && ctx->last) {

        #if CONTF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                           "[Content filter]: the last zero buffer, try to do matching");
        #endif

        rc = ngx_http_ct_match(r, ctx);
        if (rc < 0) {
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                "[Content filter]: ngx_http_ct_body_filter_process_buffer"
                " regex matching for line fails");
            return NGX_ERROR;
        }

        return NGX_OK;
    }

    while (p < last) {

        linefeed = memchr(p, LF, last - p);

        #if CONTF_DEBUG
            ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0, "[Content filter]: find linefeed: %p",
                           linefeed);
        #endif

        if (linefeed == NULL) {

            if (ctx->last) {
              /* Last buffer with no linefeed. Treat the final byte as the
                 end of the line (linefeed = last - 1) so that the remaining
                 content is appended and matched in the block below.
                 last - 1 is always a valid position here because the
                 while condition guarantees p < last. */
                linefeed = last - 1;

                #if CONTF_DEBUG
                    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                                   "[Content filter]: the last buffer, not find linefeed");
                #endif
            }
            else {
                /* Not last buffer and no linefeed. Accumulate and wait for other buffers with linefeed*/
                if (buffer_append_string(ctx->line_in, p, last - p, r->pool)
                    == NULL) {
                    ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                        "[Content filter]: ngx_http_ct_body_filter_process_buffer"
                        " cannot append to string buffer");
                    return NGX_ERROR;
                }

                break;
            }
        }

        if (linefeed) {

            len = linefeed - p + 1;

            if (buffer_append_string(ctx->line_in, p, len, r->pool) == NULL) {
                ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                        "[Content filter]: ngx_http_ct_body_filter_process_buffer  "
                        " cannot append to string buffer");
                return NGX_ERROR;
            }

            p += len;

            rc = ngx_http_ct_match(r, ctx);
            if (rc < 0) {
                ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                     "[Content filter]: ngx_http_ct_body_filter_process_buffer"
                     " regex matching for line fails");
                return NGX_ERROR;
            }

        }
    }

    return NGX_OK;
}

The function goes through a buffer looking for linefeed characters, which mark the end of a line. It appends the characters of each line (including the linefeed) to ctx->line_in, and when a complete line is available it calls ngx_http_ct_match() to do the matching. If no linefeed is found in the current buffer, all its content is appended to ctx->line_in to wait for subsequent buffers that may contain a linefeed. If the current buffer is the last buffer and no linefeed is found, all of its content is appended to ctx->line_in and ngx_http_ct_match() is called. This ensures that the whole response body is processed whether or not it ends with a linefeed.
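The line-splitting logic can be exercised outside Nginx. This sketch assumes a plain heap buffer grown with realloc() in place of ctx->line_in and buffer_append_string(), and a callback in place of ngx_http_ct_match(); line_buf_t, line_cb and ct_feed_buffer() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable line buffer standing in for ctx->line_in. */
typedef struct {
    char   *data;
    size_t  len;
    size_t  cap;
} line_buf_t;

/* Called once per complete line (or the final partial line). */
typedef void (*line_cb)(const char *line, size_t len, void *arg);

static int line_append(line_buf_t *lb, const char *p, size_t n)
{
    if (lb->len + n > lb->cap) {
        size_t cap = lb->cap ? lb->cap : 64;
        while (cap < lb->len + n) {
            cap *= 2;
        }
        char *data = realloc(lb->data, cap);
        if (data == NULL) {
            return -1;
        }
        lb->data = data;
        lb->cap = cap;
    }
    memcpy(lb->data + lb->len, p, n);
    lb->len += n;
    return 0;
}

/* Feed one body buffer. Complete lines (including '\n') go to cb; a
   trailing partial line is kept until a later buffer completes it, or
   is flushed as-is when is_last is set. Returns 0, or -1 on OOM. */
static int ct_feed_buffer(line_buf_t *lb, const char *p, size_t n,
                          int is_last, line_cb cb, void *arg)
{
    const char *last = p + n;

    while (p < last) {
        const char *lf = memchr(p, '\n', (size_t) (last - p));
        size_t len = lf ? (size_t) (lf - p) + 1 : (size_t) (last - p);

        if (line_append(lb, p, len) != 0) {
            return -1;
        }
        p += len;

        if (lf == NULL && !is_last) {
            return 0;             /* wait for the rest of the line */
        }
        cb(lb->data, lb->len, arg);
        lb->len = 0;              /* reuse the buffer for the next line */
    }

    if (is_last && lb->len > 0) { /* empty last buffer: flush leftovers */
        cb(lb->data, lb->len, arg);
        lb->len = 0;
    }
    return 0;
}
```

Feeding "ab\ncd", then "e\nf", then an empty last buffer delivers the lines "ab\n", "cde\n" and "f", mirroring how a line split across buffers is reassembled before matching.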

The following shows the code for ngx_http_ct_match() function.

static ngx_int_t
ngx_http_ct_match(ngx_http_request_t *r, ngx_http_ct_ctx_t *ctx)
{

    ngx_log_t   *log;
    ngx_int_t    count, match_count;
    #if (NGX_PCRE)     
    ngx_buf_t   *src;
    ngx_uint_t   i;
    blk_pair_t  *pairs, *pair;
    ngx_str_t input;
    #endif

    match_count = 0;
    count = 0;

    log = r->connection->log;

    if(ngx_buf_size(ctx->line_in) <= 0)
    {
        return match_count;
    }


    #if (NGX_PCRE)   
    src = ctx->line_in;

    if(!ctx->matched)
    {//this block will not run if sensitive content is already detected

        pairs = (blk_pair_t *) ctx->blk_pairs->elts;
        for (i = 0; i < ctx->blk_pairs->nelts; i++) {

            pair = &pairs[i];
            input.data = src->pos;
            input.len = ngx_buf_size(src);

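            /* allocate the capture vector once per pair and line,
               instead of on every pass through the loop below */
            pair->ncaptures = (NGX_HTTP_MAX_CAPTURES + 1) * 3;
            pair->captures = ngx_pcalloc(r->pool, pair->ncaptures * sizeof(int));
            if (pair->captures == NULL) {
                goto failed;
            }

            while(input.len > 0)
            {
                /* regex matching */
                count = ngx_regex_exec(pair->match_regex, &input, pair->captures, pair->ncaptures);
                if (count >= 0) {
                    /* Regex matches */
                    match_count += count;

                    /* pair->matched tracks matches across the whole body */
                    pair->matched++;

                    /* continue after the end of this match; advance at least
                       one byte so an empty match is not counted again at
                       the same position */
                    if (pair->captures[1] > 0) {
                        input.data += pair->captures[1];
                        input.len -= pair->captures[1];
                    } else {
                        input.data++;
                        input.len--;
                    }

                    if(pair->matched >= pair->occurence)
                    {
                        ctx->matched++;
                        break;
                    }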

                } else if (count == NGX_REGEX_NO_MATCHED) {
                     //no match break out of while loop
                     break;

                } else {

                    ngx_log_error(NGX_LOG_ERR, log, 0,  "[Content filter]: ngx_http_ct_match"
                                                        " regexec failed: %i", count);
                    goto failed;
                }

            }


            if(ctx->matched)
            {//one of the regex pair has matched
             //exit the for loop
              break;
            }


        }
    }
    #endif


    if (ngx_http_ct_out_chain_append(r, ctx, ctx->line_in) != NGX_OK) {
        ngx_log_error(NGX_LOG_ERR, log, 0,  "[Content filter]: "
            "ngx_http_ct_match cannot append line to output buffer");
        goto failed;
    }


    ngx_buffer_init(ctx->line_in);

    #if CONTF_DEBUG
        ngx_log_debug1(NGX_LOG_DEBUG_HTTP, log, 0, "[Content filter]: match counts: %i", match_count);
    #endif

    return match_count;

failed:

    ngx_log_error(NGX_LOG_ERR, log, 0,
                  "[Content filter]: ngx_http_ct_match error.");

    return -1;
}

The regular expression matching is done in this function. It goes through the array of blk_pair_t (the regular expressions and their attributes) and matches each one against the line. When a match is found, it increments pair->matched, which tracks the number of matches for that regular expression, and matching continues on the remainder of the line after the previous match. If pair->matched equals or exceeds the threshold (pair->occurence), the flag ctx->matched is set to indicate that sensitive data has been detected. No further regular expression matching is done once sensitive data is detected.
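The per-line matching loop can be modelled with POSIX regular expressions, which are available in any C library. This sketch is an approximation, not the module's code: it uses regexec() with REG_NOTBOL instead of ngx_regex_exec(), works on NUL-terminated lines, and ct_rule_t and ct_match_line() are hypothetical names. Each match resumes the search after the previous match's end, and the function reports as soon as any rule reaches its threshold.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical standalone counterpart of blk_pair_t. */
typedef struct {
    regex_t      re;          /* compiled, case-insensitive pattern */
    unsigned int occurence;   /* threshold; spelling follows blk_pair_t */
    unsigned int matched;     /* matches counted so far in a response */
} ct_rule_t;

/* Scan one NUL-terminated line with every rule. Returns true as soon as
   a rule reaches its threshold; the caller then stops matching, the way
   ctx->matched does in the module. */
static bool ct_match_line(ct_rule_t *rules, size_t nrules, const char *line)
{
    for (size_t i = 0; i < nrules; i++) {
        ct_rule_t  *rule = &rules[i];
        const char *p = line;
        int         eflags = 0;
        regmatch_t  m;

        while (*p != '\0' && regexec(&rule->re, p, 1, &m, eflags) == 0) {
            rule->matched++;
            if (rule->matched >= rule->occurence) {
                return true;
            }
            /* resume after the match; step one byte on an empty match */
            p += (m.rm_eo > 0) ? m.rm_eo : 1;
            eflags = REG_NOTBOL;   /* p is no longer the start of the line */
        }
    }
    return false;
}
```

Because POSIX regexec() has no start-offset parameter, the sketch advances the string pointer and passes REG_NOTBOL so that ^ does not match in the middle of the line.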

Building the Docker Image

This section uses an Ubuntu Linux 18.04 LTS system with Docker Community Edition installed to build the Nginx image with the content filter module. Refer to Docker Installation for information on how to install and set up Docker.

We will use a Docker multi-stage build to create the Nginx content filter image. Create a working directory and change into it.

mkdir mynginx
cd mynginx

Enable Docker Content Trust to verify the base images that will be pulled from Docker Hub.

export DOCKER_CONTENT_TRUST=1

We will use Alpine Linux 3.10.2 as the base image for the Nginx application. Create a Dockerfile with the following content.

#Docker Image for building
FROM alpine:3.10.2 as builder
COPY build.sh /root
RUN cd /root &&\
    chmod 755 build.sh &&\
    ./build.sh


#Actual image to be created
FROM alpine:3.10.2
COPY --from=builder /usr/local/nginx /usr/local/nginx
RUN touch /usr/local/nginx/logs/access.log &&\
    touch /usr/local/nginx/logs/error.log &&\
    ln -sf /dev/stdout /usr/local/nginx/logs/access.log &&\
    ln -sf /dev/stderr /usr/local/nginx/logs/error.log &&\
    addgroup -g 8000 nginx &&\
    adduser -G nginx -u 8000 -D  -s /sbin/nologin nginx &&\
    mkdir /usr/local/nginx/tmp &&\
    chmod 1777 /usr/local/nginx/tmp

USER nginx
EXPOSE 8000/tcp

STOPSIGNAL SIGTERM

CMD ["/usr/local/nginx/sbin/nginx", "-g", "daemon off;"]

The Dockerfile is a multi-stage build. The first stage contains the instructions to create the builder image and compile Nginx with the content filter module; a script, build.sh, downloads the required sources and compiles Nginx. The second stage creates the Nginx image using the compiled binary produced by the builder.

The Nginx application runs as a normal user instead of root, and its logs are sent to stdout and stderr. A special temporary directory, /usr/local/nginx/tmp, is created so that it can be mounted using tmpfs. This allows the container to run with an immutable, read-only filesystem: all the temporary files used by Nginx are written to /usr/local/nginx/tmp, which is backed by the memory-based tmpfs filesystem.

Create the build.sh script with the following content.

#!/bin/sh
apk update
apk add wget gcc libc-dev make git g++ perl linux-headers gnupg
mkdir build
cd build
wget https://nginx.org/download/nginx-1.16.1.tar.gz
wget https://ftp.pcre.org/pub/pcre/pcre-8.43.tar.gz
wget https://www.zlib.net/zlib-1.2.11.tar.gz
wget https://www.openssl.org/source/openssl-1.1.1d.tar.gz
git clone https://github.com/ngchianglin/NginxContentFilter.git

nginx_sha256="f11c2a6dd1d3515736f0324857957db2de98be862461b5a542a3ac6188dbe32b"
pcre_sha256="0b8e7465dc5e98c757cc3650a20a7843ee4c3edf50aaf60bb33fd879690d2c73"
zlib_sha256="c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1"
openssl_sha256="1e3a91bc1f9dfce01af26026f856e064eab4c8ee0a8f457b5ae30b40b8b711f2"
content_filter_config="d20e9df127e9e3c87e175b7a2191021a9a3ffc0d94aff5e1dfbdbbaaea033074"
content_filter_module="299de1516e76be72a643825b9bfcf0feae84ae7294e7b851ec647e528a780199"

cksum()
{
  checksum=$1
  file=$2
  val="$(sha256sum "$file" | cut -d ' ' -f1)"

  if [ "$val" != "$checksum" ]
  then
      echo "Sha256 sum of package $file does not match !"
      exit 1
  else
      return 0
  fi
}

cksum $nginx_sha256 "nginx-1.16.1.tar.gz"
cksum $pcre_sha256 "pcre-8.43.tar.gz"
cksum $zlib_sha256 "zlib-1.2.11.tar.gz"
cksum $openssl_sha256 "openssl-1.1.1d.tar.gz"
cksum $content_filter_config "NginxContentFilter/config"
cksum $content_filter_module "NginxContentFilter/ngx_http_ct_filter_module.c"

tar -zxvf nginx-1.16.1.tar.gz
tar -zxvf pcre-8.43.tar.gz
tar -zxvf zlib-1.2.11.tar.gz
tar -zxvf openssl-1.1.1d.tar.gz


#
# Take note that alpine linux uses musl as the C library
# instead of glibc. musl at the moment doesn't
# support _FORTIFY_SOURCE, so this option has no
# effect
#
cd nginx-1.16.1
./configure --with-cc-opt="-Wextra -Wformat -Wformat-security -Wformat-y2k -Werror=format-security -fPIE -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all" --with-ld-opt="-pie -Wl,-z,relro -Wl,-z,now -Wl,--strip-all" --with-http_v2_module --with-http_ssl_module --without-http_uwsgi_module --without-http_fastcgi_module   --without-http_scgi_module --without-http_empty_gif_module --with-openssl=../openssl-1.1.1d --with-openssl-opt="no-ssl2 no-ssl3 no-comp no-weak-ssl-ciphers -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" --with-zlib=../zlib-1.2.11 --with-zlib-opt="-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" --with-pcre=../pcre-8.43 --with-pcre-opt="-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" --with-pcre-jit --add-module=../NginxContentFilter
make
make install
cat << EOF > /usr/local/nginx/conf/nginx.conf
worker_processes  1;
events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;
    keepalive_timeout  65;

    server {
        listen       8000;
        server_name  localhost;
        charset utf-8;

        location / {
                root   html;
                index  index.html index.htm;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

    }

}

EOF

The build.sh script is used by the builder to compile Nginx from source. Notice that it verifies every downloaded source package against SHA-256 checksums that are hard-coded in the script itself.
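When upgrading any of these packages, the pinned values at the top of build.sh have to be regenerated. The helper below is a hypothetical addition (it is not part of the original build.sh) that prints a line in the same name="hash" form the script uses. It only records whatever file is on disk, so each tarball should first be checked against the signatures the projects publish alongside their releases.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original build.sh): print a
# checksum variable in the same name="hash" form used at the top of the
# script, e.g. nginx_sha256="...".
print_checksum_line()
{
  var_name=$1
  file=$2

  if [ ! -f "$file" ]
  then
      echo "File $file not found !" >&2
      return 1
  fi

  # sha256sum prints "<hash>  <file>"; keep only the hash field
  hash="$(sha256sum "$file" | cut -d ' ' -f1)"
  printf '%s="%s"\n' "$var_name" "$hash"
}

# Example usage, run next to the freshly downloaded (and verified) tarball:
#   print_checksum_line nginx_sha256 nginx-1.16.1.tar.gz
```

Returning an error for a missing file, rather than printing an empty hash, avoids pinning a meaningless value into the script.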

Note that the _FORTIFY_SOURCE compiler option is not supported by musl, the C library used by Alpine Linux. This option therefore has no effect on the final compiled nginx binary.

Let's proceed to build the nginx docker image.

docker build -t mynginx .

A docker image with the tag mynginx will be created. This image contains Nginx compiled with the content filter module. The image comes with a default configuration for Nginx. To run the content filter, we shall use a custom configuration file.

Create an nginx.conf file inside a new directory called conf.

mkdir conf
cd conf
vim nginx.conf

Add the following to nginx.conf

worker_processes  4;
pid        /usr/local/nginx/tmp/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;
    
    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;
    keepalive_timeout  65;
    server_tokens off;
    gzip  on;
    
    proxy_cache_path /usr/local/nginx/tmp/cache levels=1:2 keys_zone=webcache:2m max_size=20m;
    proxy_cache_key "$scheme$request_method$host$request_uri$is_args$args";
    proxy_cache_valid 200 302 1d;
    proxy_cache_valid 404 1m;

    proxy_temp_path /usr/local/nginx/tmp/proxy_temp;
    client_body_temp_path /usr/local/nginx/tmp/client_body_temp; 
    

    map $sent_http_content_type $cachemap {
        default    no-store;
        ~text/html  "private, max-age=900";
        text/plain  "private, max-age=900";
        text/css    "private, max-age=7776000";
        application/javascript "private, max-age=7776000";
        ~image/    "private, max-age=7776000";
    }

    server {
        listen     8000;
        server_name  localhost;
        root   /usr/local/nginx/html/;
        charset utf-8;
        
        
        location / {
        
            proxy_cache webcache;
            proxy_cache_bypass $http_cache_control;
            
            proxy_set_header Accept-Encoding "";        
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://mamashop;
            add_header Cache-Control $cachemap;
            
          # ct_filter_types text/plain application/javascript;
          # ct_filter S\d\d\d\d\d\d\d[A-Z] 1;
          # ct_filter_logonly off;
            
            index  index.html index.htm;
        }

        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }

}

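The configuration sets up Nginx as a reverse proxy for http://mamashop, where the actual web application is running. The Accept-Encoding request header is cleared so that the upstream application returns uncompressed responses, which the filter's regular expressions can inspect. The options for the content filter are currently commented out; these can be enabled later.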

Testing the Nginx Content Filter

We will use the Vulnerable Mama Shop (VMS) application to test the Nginx Content Filter. VMS has a SQL injection vulnerability that allows user data to be dumped out. Refer to the article Learning SQL Injection using Vulnerable Mama Shop for more information on the application.

Issue the following commands to build VMS.

git clone https://github.com/ngchianglin/VulnerableMamaShop.git
cd VulnerableMamaShop
docker build -t mamashop .

We will create a bridge network for both the nginx content filter image and the mamashop image.

docker network create --driver bridge mynet

Start up mamashop using the following command

docker run -it --rm --disable-content-trust --name mamashop --network mynet mamashop

This starts the VMS application on the mynet network, where other docker containers on mynet can reach it at http://mamashop. Open another console and run the mynginx image using the following command.

docker run -it --rm --network mynet -p 8000:8000 --name mynginx -v [home dir]/conf/nginx.conf:/usr/local/nginx/conf/nginx.conf:ro --mount type=tmpfs,destination=/usr/local/nginx/tmp,tmpfs-size=52428800 --read-only mynginx

Note: replace [home dir] with the full path of the directory in which you created the conf directory earlier. The command mounts the custom conf/nginx.conf as a read-only file, replacing the default Nginx configuration in the docker image. It also maps port 8000 on the host to port 8000 in the nginx container, which in turn proxies traffic to the mamashop container. Notice that /usr/local/nginx/tmp is mounted as a 50 MiB tmpfs (52428800 bytes) and the container's root filesystem is set to read-only.

Visit http://[host ip]:8000 and you should see the Mama Shop application. Play around with its functionality.

Fig 2. Mama Shop Application through the Nginx Reverse Proxy

Let's launch an SQL injection to dump out the user information from the vulnerable application. Configure your browser to use the ZAP proxy to intercept requests sent to VMS. Refer to the article Learning SQL Injection using Vulnerable Mama Shop for more information on how to do this.

Intercept a request that queries the items in a category. Modify the value of the catid parameter as shown in Fig 3.

Fig 3. ZAP Proxy modify category id

Send the modified request to VMS. A list of users, including their email addresses and NRIC (National Registration Identity Card) numbers, will be dumped out. At this point, the nginx content filter has not been enabled yet.

Fig 4. List of Users dumped out

Modify the nginx.conf and enable the content filter by uncommenting the following lines (remove the # in front of them).

# ct_filter_types text/plain application/javascript;
# ct_filter S\d\d\d\d\d\d\d[A-Z] 1;
# ct_filter_logonly off;

The ct_filter directive sets up a regular expression to match NRIC numbers. Its threshold of 1 is strict: a single match flags the content as sensitive. ct_filter_logonly is set to off, so content deemed sensitive is blocked and a blank page is displayed instead. The ct_filter_types directive adds two more MIME types, text/plain and application/javascript, to be processed by the filter. By default, the filter processes text/html.
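The threshold behaviour described above can be tried out locally before editing nginx.conf. The sketch below is not the module's C code; it only mimics the described rule (flag the body once the number of matches reaches the threshold) with grep. It uses the POSIX ERE S[0-9]{7}[A-Z], which matches the same ASCII strings as the PCRE S\d\d\d\d\d\d\d[A-Z] in the ct_filter directive. The function names and sample strings are made up for illustration.

```shell
#!/bin/sh
# Sketch only: emulate "flag the body when the number of pattern matches
# reaches the threshold", as described for ct_filter. Not the module's code.
# POSIX ERE equivalent of the PCRE pattern S\d\d\d\d\d\d\d[A-Z]
nric_re='S[0-9]{7}[A-Z]'

count_matches()
{
  # -o prints every match on its own line, so wc -l counts matches rather
  # than matching lines; tr strips any padding some wc versions add
  printf '%s\n' "$1" | grep -oE "$2" | wc -l | tr -d ' '
}

is_sensitive()
{
  body=$1
  threshold=$2
  n=$(count_matches "$body" "$nric_re")
  if [ "$n" -ge "$threshold" ]
  then
      echo "blocked ($n match(es))"
  else
      echo "allowed ($n match(es))"
  fi
}

is_sensitive 'Alice alice@example.com S1234567D' 1   # blocked (1 match(es))
is_sensitive 'Welcome to Mama Shop' 1                  # allowed (0 match(es))
```

Raising the threshold to 2 in the first call would let a page with a single NRIC through, which is the trade-off the threshold parameter controls.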

At the console where mynginx is running, press Ctrl-C to stop the container. Start it again with the modified configuration file. Exploit the SQL injection vulnerability again; this time you should get a blank page.

Fig 5. Blank Page when content filter enabled

The nginx content filter has stopped the sensitive user list from being dumped out. The console where the Nginx container is running should also show the message "Alert ! Sensitive content is detected !"

You can play around with the filter by changing some of its configuration settings, such as setting ct_filter_logonly to on, changing the regular expression, or changing the threshold to some other value. To also match email addresses, simply add a new ct_filter directive with the relevant PCRE regular expression and threshold.
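Before adding an email ct_filter, it helps to see how often a candidate pattern fires on ordinary pages, since a threshold of 1 would block any page that merely shows a contact address. The sketch below counts matches of a deliberately simple email pattern (an assumption, far from a complete email grammar) with grep; the sample strings are invented.

```shell
#!/bin/sh
# Sketch: count how many addresses a simple, assumed email pattern finds
# in a page body. The pattern is illustrative, not a full email grammar.
email_re='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

count_emails()
{
  # one match per output line from -o; tr strips any wc padding
  printf '%s\n' "$1" | grep -oE "$email_re" | wc -l | tr -d ' '
}

contact_page='Questions? Write to help@mamashop.example'
dump='a@x.example b@y.example c@z.example d@w.example'

count_emails "$contact_page"   # 1
count_emails "$dump"           # 4
```

Counts like these suggest a threshold above 1 for email addresses, so that a contact page passes while a bulk dump is still flagged. On the nginx.conf side, note that nginx's configuration parser treats an unquoted { as the start of a block, so a pattern containing a quantifier such as {2,} has to be wrapped in double quotes, for example: ct_filter "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}" 3;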

Bypassing the Content Filter

The content filter serves as an additional layer of defense against web attacks, but it is not foolproof. An attacker can try to bypass the regular expression matching. For example, in the Vulnerable Mama Shop case, we can set the catid parameter to the following value:

1000 union select firstname, to_base64(nric), email from users LIMIT 7, 100

The SQL injection encodes the NRIC field in base64, which bypasses the regular expression configured to detect NRIC numbers.

Fig 6. Bypassing the Nginx Content Filter

Notice that in the screenshot, the NRIC numbers are now all base64 encoded and the content filter fails to block them. The attacker can easily convert the NRIC numbers from base64 back to their original alphanumeric values using widely available tools.
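The bypass is easy to reproduce with standard command-line tools. The sketch below uses an invented NRIC-like value and the base64 utility from GNU coreutils; the ERE S[0-9]{7}[A-Z] stands in for the PCRE pattern in the ct_filter directive. For a short value like this, the to_base64() function used in the injected query produces the same standard base64 output.

```shell
#!/bin/sh
# Reproduce the bypass: the encoded value no longer matches the NRIC
# pattern, yet decodes back trivially. The value is invented.
nric='S1234567D'
nric_re='S[0-9]{7}[A-Z]'   # ERE equivalent of S\d\d\d\d\d\d\d[A-Z]

encoded="$(printf '%s' "$nric" | base64)"
decoded="$(printf '%s\n' "$encoded" | base64 -d)"

echo "encoded: $encoded"    # encoded: UzEyMzQ1NjdE
echo "decoded: $decoded"    # decoded: S1234567D

# grep -c prints the number of matching lines: 1 for the plain value,
# 0 for the base64 form that the attacker's query returns
printf '%s\n' "$nric" | grep -cE "$nric_re"
printf '%s\n' "$encoded" | grep -cE "$nric_re"
```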

To counter this, we can try adding a regular expression that attempts to detect base64 encoding. However, it is hard to detect base64 with a regular expression without false positives, because base64 uses the same letters and digits as ordinary text. Even if we could formulate a suitable regular expression, attackers could bypass it too, for example by adding spaces to the data or by using hexadecimal representation instead of base64.
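The false-positive problem shows up quickly even with a plausible-looking pattern. The sketch below applies an assumed rule (a token of 12 or more base64-alphabet characters with optional = padding) to a few tokens; ordinary words of that length are flagged just like the encoded NRIC.

```shell
#!/bin/sh
# Sketch: a naive, assumed base64 detector (12+ chars from the base64
# alphabet, optional "=" padding) applied to single tokens.
b64_re='^[A-Za-z0-9+/]{12,}={0,2}$'

looks_like_base64()
{
  printf '%s\n' "$1" | grep -qE "$b64_re"
}

for token in UzEyMzQ1NjdE Administrator Confidential123 hello
do
  if looks_like_base64 "$token"
  then
      echo "$token: flagged"
  else
      echo "$token: not flagged"
  fi
done
# UzEyMzQ1NjdE: flagged
# Administrator: flagged        (false positive)
# Confidential123: flagged      (false positive)
# hello: not flagged
```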

Another useful technique to enhance the detection of sensitive information leakage is the use of dummy data. For example, we could insert dummy user records into the user list and set up corresponding regular expressions to detect them: one matching the original dummy data as is, one matching its base64-encoded form, one matching its hexadecimal-encoded form, and so on. This can help detect data leakage while reducing false positives. But it too is not perfect and can be bypassed.
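For the dummy-data approach, the match strings for each planted value can be precomputed. The sketch below derives the plain, base64 and hexadecimal forms of an invented dummy NRIC; each output line could then go into its own ct_filter directive with a threshold of 1. Two assumptions are worth stating: the base64 form only stays fixed when the value is encoded on its own (as to_base64(nric) does in the bypass query), and, assuming a MySQL-compatible backend, HEX() returns uppercase letters, so the hex pattern would need to be case-insensitive, for example with the PCRE inline flag (?i).

```shell
#!/bin/sh
# Sketch: derive literal match strings for a planted dummy record.
# The dummy NRIC is invented for illustration.
canary='S0000000Z'

canary_forms()
{
  value=$1
  # 1. as stored in the database
  printf '%s\n' "$value"
  # 2. standard base64, as produced by e.g. to_base64()
  printf '%s' "$value" | base64
  # 3. lowercase hex; HEX() emits uppercase, so match case-insensitively
  printf '%s' "$value" | od -An -tx1 | tr -d ' \n'
  echo
}

canary_forms "$canary"
# S0000000Z
# UzAwMDAwMDBa
# 53303030303030305a
```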

The Nginx content filter module is nonetheless useful as an additional layer of defense that can thwart simple attacks. When an application has a vulnerability, the best way to resolve it is to fix the bug directly. Additional protections such as web application firewalls (WAFs) and outgoing content monitoring can provide some mitigation, but more advanced attackers can bypass them.

Conclusion and Afterthought

The Nginx content filter module depends on PCRE for regular expression matching. A possible improvement is to use a stream-based, non-backtracking regex engine such as OpenResty's sregex. sregex is still under heavy development and its APIs may change without notice, but it may be worth looking into if high performance is required.

As web attacks continue to evolve, having some means to monitor and protect outgoing data can help to stop and prevent some attacks. The Nginx Content Filter module allows the inspection of outbound response bodies using PCRE regular expressions. While it is not perfect, it adds to the tools that security professionals and defenders have for defeating web attacks.

Useful References

The full source code for the Nginx Content Filter is available at the following Github link.
https://github.com/ngchianglin/NginxContentFilter

The scripts and Dockerfile for building the Nginx Content Filter docker image are available at the following Github link.
https://github.com/ngchianglin/Docker-Alpine-NginxContentFilter

If you have any feedback, comments, corrections or suggestions to improve this article, you can reach me via the contact/feedback link at the bottom of the page.

Article last updated on Oct 2019.