Night Hour

Reading under a cool night sky ... a quiet, contemplative night ...

Writing an Nginx Response Body Filter Module


By three methods we may learn wisdom: First, by reflection, which is noblest; Second, by imitation, which is easiest; and third by experience, which is bitterest. (Confucius, 孔子)


15 Dec 2017


Introduction

Nginx is a popular open source web and proxy server known for its performance and used by many websites. It supports third-party modules that provide additional functionality and customization. This article shows how to write a simple filter module that inserts a text string after the HTML <head> tag in an HTTP response body.

This can be useful, for instance, for inserting a monitoring script without modifying the existing web pages or web application. Nginx can serve as a reverse proxy that speeds up access to the website and, at the same time, inserts the monitoring script into the web content.

Design and Approach

We will write the Nginx module in C. Besides the boilerplate code for integrating with Nginx, we need a parser that can scan an input stream for HTML tags. The following is a syntax diagram of an HTML tag.

Fig 1. Syntax Diagram HTML Tag

A tag starts with an opening angle bracket < and ends with the corresponding closing bracket >. It has a tag name, an optional "/" and optional attributes. SP represents whitespace; there must be at least one space between the tag name and an attribute. Additional whitespace may be present between any of these tokens.

A simplified BNF (Backus–Naur form) for HTML tagname and its attributes may look like this.

Tagname :: alphabetic letters
Attribute :: AttributeName <opt space> = <opt space> <opt quotes> AttributeValue <opt quotes>
AttributeName :: alpha-numeric letters
AttributeValue :: alpha-numeric letters | EscapeSequences | empty
alphabetic letters :: a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
alpha-numeric letters :: 0|1|2|3|4|5|6|7|8|9 | alphabetic letters
EscapeSequences :: '\"' | '\'' | '\n' | '\r' | '\\' | '\t' | '\v' | '\f' | '\b' | '\a' | '\xhhhh' | '\uhhhh'
<opt space> :: Optional white spaces
<opt quotes> :: Optional quotes
Optional quotes :: " | ' | empty
Optional white spaces :: '  ' | '\r' | '\n' | '\t' | '\v' | '\f' | empty
empty :: ''

It does look complex. Parsing HTML into a syntax tree the way a web browser does is hard. Fortunately, our case is much simpler, and we can forget about the BNF listing above.

The parser only needs to focus on four key tokens: an opening angle bracket, a closing angle bracket, a single quote and a double quote.

< > ' "

A stack can be used to collect each HTML tag encountered in the input stream. When the parser encounters a start bracket, '<', it initializes an empty stack and pushes the start bracket onto it. Characters that come after the start bracket are pushed onto the stack as well. If a single or double quote is seen, a toggling flag is set. When the parser finally sees an end bracket, '>', it pushes it onto the stack and processes the complete HTML tag now held in the stack.

The toggling flags determine whether a '<' or '>' is a token or part of a quoted string. Any '<' or '>' encountered after the start bracket and an opening quotation mark is part of a string and is pushed onto the stack like any other character. When the corresponding closing quotation mark is seen, the relevant toggling flag is reset. Any '<' or '>' encountered afterwards is again interpreted as the start or end token of an HTML tag.

The same toggling mechanism applies to the quotation marks themselves. A single quote that appears after a start bracket and an opening double quote is part of a string, and the same holds for a double quote inside a single-quoted string. Any characters encountered before a start bracket, '<', are ignored; these are the content of the HTML document. We will look at the parser code later in the article.
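
The approach described above can be sketched in a few lines of standalone C. This is only an illustration, not the module's parser: the type tag_parser_t, the function tag_parser_feed() and the constant TAG_STACK_SZ are hypothetical names chosen for this sketch, and it assumes that an unquoted '<' always restarts collection.

```c
#include <stddef.h>
#include <string.h>

#define TAG_STACK_SZ 512   /* assumed limit, mirroring the 512 used in this article */

/* Hypothetical parser state; names are illustrative only. */
typedef struct {
    char   data[TAG_STACK_SZ];  /* the "stack" holding the current tag */
    size_t top;                 /* number of characters collected */
    int    in_tag;              /* set once an unquoted '<' is seen */
    int    dquote;              /* toggled by '"' inside a tag */
    int    squote;              /* toggled by '\'' inside a tag */
} tag_parser_t;

/*
 * Feed one character to the parser.
 * Returns 1 when a complete tag is available in p->data (NUL-terminated),
 * 0 when more input is needed, and -1 if the tag exceeds the stack size.
 */
static int
tag_parser_feed(tag_parser_t *p, char c)
{
    if (c == '<' && !p->dquote && !p->squote) {
        memset(p, 0, sizeof(*p));      /* start bracket: begin with an empty stack */
        p->in_tag = 1;
    } else if (!p->in_tag) {
        return 0;                      /* document content outside a tag is ignored */
    } else if (c == '"' && !p->squote) {
        p->dquote = !p->dquote;        /* a double quote inside '...' is plain text */
    } else if (c == '\'' && !p->dquote) {
        p->squote = !p->squote;        /* a single quote inside "..." is plain text */
    }

    if (p->top + 1 >= TAG_STACK_SZ) {  /* keep room for the terminating NUL */
        memset(p, 0, sizeof(*p));
        return -1;
    }
    p->data[p->top++] = c;

    if (c == '>' && !p->dquote && !p->squote) {
        p->data[p->top] = '\0';        /* unquoted end bracket completes the tag */
        p->in_tag = 0;
        return 1;
    }
    return 0;
}
```

Because the state lives in a struct, the same parser can be fed one buffer after another, which matters when a tag straddles two buffers.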

Nginx stores the content of the response body into a linked list of buffer chains (ngx_chain_t), each chain containing a buffer structure (ngx_buf_t) that holds part of the content. The final buffer has a special flag set, last_buf, which marks it as last. See the illustration below.

Fig 2. Nginx chain of buffers
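
To make the chain layout concrete, here is a minimal sketch of walking such a list. The types demo_buf_t and demo_chain_t are simplified stand-ins written for this example; they model only the pos, last and last_buf fields of ngx_buf_t and the buf and next fields of ngx_chain_t, and they are not the real Nginx definitions.

```c
#include <stddef.h>

/* Simplified stand-in for ngx_buf_t (assumption: only three fields modelled). */
typedef struct {
    unsigned char *pos;       /* start of the data in this buffer */
    unsigned char *last;      /* one past the end of the data */
    unsigned       last_buf;  /* set on the final buffer of a response */
} demo_buf_t;

/* Simplified stand-in for ngx_chain_t. */
typedef struct demo_chain_s demo_chain_t;
struct demo_chain_s {
    demo_buf_t   *buf;
    demo_chain_t *next;
};

/*
 * Walk a chain, summing the bytes it holds.
 * *done is set to 1 if a buffer flagged last_buf was seen, else 0.
 */
static size_t
demo_chain_size(const demo_chain_t *in, int *done)
{
    size_t total = 0;

    *done = 0;
    for (const demo_chain_t *cl = in; cl != NULL; cl = cl->next) {
        total += (size_t) (cl->buf->last - cl->buf->pos);
        if (cl->buf->last_buf) {
            *done = 1;
        }
    }
    return total;
}
```

A filter that sees done equal to 0 after one call should expect to be called again with further buffers for the same response.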

The parser has to process each buffer in the chain, extracting each HTML tag and examining it to see whether it is the <head> tag. Once the <head> tag is found, its end position must be in the current buffer. To insert our own text string, this buffer is split up and relinked with our text inserted in the middle. The following illustrates how an original buffer is split into 3 new buffers with our text in between.

Fig 3. Insertion of Text 1

If the original buffer doesn't contain any data after <head> tag, it can be split and linked as 2 parts.

Fig 4. Insertion of Text 2

This new set of buffers with the inserted text is linked up in the correct order with the other buffers in the Nginx chain.
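
The split-and-relink step can be sketched as follows. This is an illustration under simplifying assumptions: part_t is a hypothetical structure that folds a chain link and its buffer into one node, and memory comes from malloc() rather than an Nginx pool. It handles both the 3-part case and the 2-part case described above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node combining a chain link and its buffer. */
typedef struct part_s part_t;
struct part_s {
    const unsigned char *pos;   /* first byte of this part */
    const unsigned char *last;  /* one past the last byte */
    part_t              *next;
};

static part_t *
part_new(const unsigned char *pos, const unsigned char *last, part_t *next)
{
    part_t *p = malloc(sizeof(*p));

    if (p != NULL) {
        p->pos = pos;
        p->last = last;
        p->next = next;
    }
    return p;
}

/*
 * Split `cur` after `offset` bytes (the position just past the '>' of
 * <head>) and link a part holding `text` right behind it. Data after the
 * split point, if any, becomes a third part. Returns 0 on success, -1 on
 * a bad offset or allocation failure.
 */
static int
part_insert(part_t *cur, size_t offset, const unsigned char *text, size_t len)
{
    part_t *rest = NULL;
    part_t *ins;
    part_t *tail = cur->next;

    if (offset > (size_t) (cur->last - cur->pos)) {
        return -1;
    }

    if (cur->pos + offset < cur->last) {        /* data remains after <head> */
        rest = part_new(cur->pos + offset, cur->last, tail);
        if (rest == NULL) {
            return -1;
        }
        tail = rest;
    }

    ins = part_new(text, text + len, tail);     /* the inserted text */
    if (ins == NULL) {
        free(rest);
        return -1;
    }

    cur->last = cur->pos + offset;              /* original now ends at <head> */
    cur->next = ins;
    return 0;
}
```

No bytes are copied: the new parts only point into the original data and the inserted text, which is why the split is cheap.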

Another thing to note is that the full response body may not be stored in a single chain. Nginx can send a sequence of chains to the module at different times. The last buffer in the last chain will have the last_buf flag set. The module needs to check for this condition to determine if all content has been received.

The parser and text insertion described above are the heart of the filter module that will be implemented. The diagram below gives the big picture of how the filter module can be deployed together with Nginx.

Fig 5. Nginx Reverse Proxy Setup Architecture

Nginx and the web server are located on the same machine. The web server listens only on localhost (127.0.0.1) and accepts traffic from Nginx. Nginx is set up as a reverse proxy with the filter module installed. It forwards incoming client requests to the web server and modifies the responses by inserting the text (a monitoring script).

Nginx will handle TLS (Transport Layer Security, the protocol behind HTTPS) and serve as the TLS termination point for the web server. Caching will be enabled on Nginx to improve performance. There are a few other things the filter module has to handle. For example, if the original content from the web server is compressed (gzip or deflate), the filter lets the response pass through unmodified. The web server should therefore disable compression and let Nginx itself handle content compression.

The order of module loading in Nginx is important. The filter module needs to run before Nginx's gzip module; otherwise it cannot process content that gzip has already compressed. By default, the filter module will run before gzip. The filter module only handles the HTML content type. Other content types such as images, JavaScript, stylesheets or binary data are passed through unmodified.

The filter module checks the HTTP status code as well. If the status is not HTTP 200 OK, the content passes through unmodified. This means error pages will not have the text inserted. Finally, the filter needs to handle malformed HTML, such as pages without a <head> tag or with multiple <head> tags. The text is inserted only once, after the first <head> tag encountered. If no <head> tag is detected within the first 512 characters, a filter directive can be configured to "block" the page.

When the "block" directive is enabled, the filter will display a blank html page if the response doesn't have a valid <head> tag in the first 512 characters. A single html tag including its attributes cannot be more than 512 characters.
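
As a rough illustration of the pass-through rules above, the sketch below decides whether a response is eligible for filtering. The struct resp_info_t and the function should_filter() are hypothetical stand-ins written for this example; the real module reads these values from ngx_http_request_t. The sketch also assumes a lower-case content type and treats any Content-Encoding value as compressed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the response headers relevant to the filter. */
typedef struct {
    int         status;            /* HTTP status code */
    const char *content_type;      /* e.g. "text/html; charset=utf-8", or NULL */
    const char *content_encoding;  /* e.g. "gzip", or NULL when not compressed */
} resp_info_t;

/* Returns 1 if the text may be inserted, 0 if the response must pass through. */
static int
should_filter(const resp_info_t *r)
{
    if (r->status != 200) {
        return 0;                  /* error pages and redirects pass through */
    }

    if (r->content_type == NULL
        || strncmp(r->content_type, "text/html", 9) != 0)
    {
        return 0;                  /* images, scripts, stylesheets, binaries */
    }

    if (r->content_encoding != NULL && r->content_encoding[0] != '\0') {
        return 0;                  /* compressed upstream content cannot be parsed */
    }

    return 1;
}
```

Keeping these checks in one place makes the rule easy to audit: anything that is not an uncompressed HTML response with status 200 is left alone.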

Structure of an Nginx Filter Module

To write an Nginx module, you need to know the data types and function calls available in Nginx. The official Nginx Development Guide is a key document to read. It describes which header files to include, which return codes are supported, and the available Nginx data types such as ngx_str_t (string), arrays and lists. It also covers memory management, the Nginx cycle, Nginx events, and how connections and requests are defined.

The official guide is the main reference for learning how to develop Nginx modules. However, it is rather long, and multiple readings are probably required to absorb it. An easier introduction is Emiller's Guide To Nginx Module Development, a useful tutorial for beginners learning to write Nginx modules.

In this section, we briefly run through the key components of an Nginx filter module without going into too much detail. In the implementation section, we will walk through the source code of the filter module. There are 3 important Nginx data structures that modules rely on.

  1. Module Definition
  2. Module Context
  3. Module Directive Structure

The following table describes each item in more detail.

Data Structure Description Source Definition
ngx_module_t (Module Definition)

This structure is the module definition. It is a typedef of ngx_module_s, and each module declares one as a global variable. The top of the structure holds version information, which can be filled in with the macro NGX_MODULE_V1. The bottom of the struct has several unused fields reserved for future extensions, which can be filled in with NGX_MODULE_V1_PADDING.

Of the remaining fields, only 3 interest us. The rest are handlers that can be called at various points in the Nginx cycle; these are set to NULL. The 3 fields that concern us are as follows.

  • void *ctx;
    This takes the module context (ngx_http_module_t) which contains the function handlers for creating module configuration struct and merging module configuration. ngx_http_module_t is covered later in this table.
  • ngx_command_t *commands;
    This takes a pointer to an array of ngx_command_t. Each ngx_command_t defines a directive that the module takes. ngx_command_t is covered later in this table.
  • ngx_uint_t type;
    This defines the type of module (it lets Nginx know what is stored in ctx), such as NGX_CORE_MODULE or NGX_HTTP_MODULE.
Source Def
ngx_http_module_t (Module Context)

Module context, a static data structure that defines the handlers for the creation and initialization of a module's configuration struct.

A module can have its own configuration struct that contains the parameters it requires. The function handlers defined here create and merge the module configuration struct. There are separate pairs of function handlers for the module configuration that appears in Nginx's main config block (create and init), server config block and location block (create and merge). There are also two handlers that run before and after configuration (pre and post configuration).

For those handlers that are not needed, NULL can be specified. For example, if a module only has directives in Nginx's location block and it doesn't require merging values from higher levels, the function handler for creating a location configuration can be specified, while all others set to NULL.

Source Def
ngx_command_t (Module Directive Structure)

This is a typedef of ngx_command_s, used for defining a module directive. A static array of ngx_command_t containing the directives of a module is passed to Nginx. The array is terminated by ngx_null_command. ngx_command_t has the following fields.

  • Directive Name
    An ngx string for the name of the directive.
  • Bitmask
    Indicates where the directive may be configured (e.g. the HTTP, server or location block in the Nginx config file). The bitmask also indicates how many and what kind of arguments the directive takes.
  • Set Function pointer
    A set handler function for saving the directive arguments. Nginx has several pre-defined set functions for saving directive arguments of various types, such as boolean or string. A custom handler can also be specified.
  • Configuration Structure
    This specifies the configuration structure passed to the directive handler. If a module directive is configured in the server context/block of the Nginx config file, then the server context offset (NGX_HTTP_SRV_CONF_OFFSET) should be specified here. The handler function uses this information to locate the right module configuration.
  • Parameter offset
    This is where the parameter for the module configuration is located. The set handler function will save the directive argument here.
  • Post
    A secondary function pointer can be specified that will be called after the earlier set function handler has saved the directive argument. This field can also hold a default value that can be used by some of the Nginx pre-defined set functions.
Source Def
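
The "Parameter offset" field is easier to picture with a small standalone sketch. The code below is not Nginx code: demo_conf_t, demo_command_t, demo_set_str(), demo_set_flag() and demo_apply() are hypothetical names that mimic, in simplified form, how a generic set function can use offsetof() to store a directive argument in the right field of a configuration struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical module configuration. */
typedef struct {
    const char *insert_text;
    int         block;
} demo_conf_t;

/* Hypothetical, much reduced analogue of ngx_command_t. */
typedef struct {
    const char *name;                                        /* directive name */
    size_t      offset;                                      /* field offset in the conf */
    int       (*set)(void *conf, size_t offset, const char *arg);
} demo_command_t;

/* Generic setter: stores the argument in the string field at `offset`. */
static int
demo_set_str(void *conf, size_t offset, const char *arg)
{
    const char **field = (const char **) ((char *) conf + offset);

    *field = arg;
    return 0;
}

/* Generic setter: accepts "on" or "off" and stores 1 or 0 at `offset`. */
static int
demo_set_flag(void *conf, size_t offset, const char *arg)
{
    int *field = (int *) ((char *) conf + offset);

    if (strcmp(arg, "on") == 0) {
        *field = 1;
        return 0;
    }
    if (strcmp(arg, "off") == 0) {
        *field = 0;
        return 0;
    }
    return -1;
}

static const demo_command_t demo_commands[] = {
    { "html_head_filter", offsetof(demo_conf_t, insert_text), demo_set_str },
    { "html_head_filter_block", offsetof(demo_conf_t, block), demo_set_flag },
    { NULL, 0, NULL }   /* terminator, playing the role of ngx_null_command */
};

/* Look up a directive by name and run its setter. Returns -1 if unknown. */
static int
demo_apply(demo_conf_t *conf, const char *name, const char *arg)
{
    for (const demo_command_t *cmd = demo_commands; cmd->name != NULL; cmd++) {
        if (strcmp(cmd->name, name) == 0) {
            return cmd->set(conf, cmd->offset, arg);
        }
    }
    return -1;
}
```

The setters know nothing about demo_conf_t itself; the offset in the table is what ties each directive to its field, which is the role the parameter offset plays in a real directive definition.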

Nginx Module Filter Chain

Besides these 3 data structures, we need to know a bit about how Nginx handles filters. Nginx arranges filter modules in a chain: the first calls the second, the second calls the third, and so on until the last. There are two separate chains, one for HTTP headers and another for the HTTP response body. A filter module can register a handler for HTTP headers as well as a handler for the HTTP response body.

Registration can be done in an initialization function defined as the post configuration function in the module context (ngx_http_module_t) described earlier. The filter handlers take the arguments and return the values required by Nginx. For example, an HTTP header filter handler takes a pointer to ngx_http_request_t as its argument and returns ngx_int_t. This handler calls the next header filter in the chain when it is done.

The HTTP response body filter handler takes two arguments, a pointer to ngx_http_request_t and a pointer to ngx_chain_t, and returns an ngx_int_t. The second argument, ngx_chain_t *, is a linked list of input buffers, each storing part of the HTTP response body, as illustrated earlier in the Design and Approach section. Our filter module parses these buffers and inserts our text after the <head> tag. Once it is done, it calls the next response body filter.

Note that the response body filter handler function can be called many times in a single request. This is due to the nature of network data. Each filter is called once data is available and it will call the next filter when it has processed the current set of data buffers. The request though, may not have ended and there can be more data buffers coming.

ngx_http_top_header_filter is a global handler pointer for storing the first HTTP header filter handler function, to be called by Nginx. ngx_http_top_body_filter stores the first HTTP response body filter handler to be called. These are used when registering our own handlers.
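
The registration pattern can be simulated without Nginx. In the sketch below, demo_top_filter stands in for ngx_http_top_body_filter, and the filters only record their names in a trace string; all names here are hypothetical and the handler signature is simplified for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Simplified handler type; real Nginx handlers take a request and a chain. */
typedef int (*demo_filter_pt)(char *trace, size_t cap);

/* Stand-in for the global ngx_http_top_body_filter pointer. */
static demo_filter_pt demo_top_filter;

/* Append a filter name to the trace if it fits. */
static void
trace_add(char *trace, size_t cap, const char *name)
{
    size_t used = strlen(trace);

    if (used + strlen(name) + 1 <= cap) {
        strcpy(trace + used, name);
    }
}

/* Terminal filter that ends the chain. */
static int
write_filter(char *trace, size_t cap)
{
    trace_add(trace, cap, "write");
    return 0;
}

/* A filter registered first (think of a compression filter). */
static demo_filter_pt gzip_next;

static int
gzip_filter(char *trace, size_t cap)
{
    trace_add(trace, cap, "gzip>");
    return gzip_next(trace, cap);      /* hand over to the saved next filter */
}

static void
gzip_init(void)
{
    gzip_next = demo_top_filter;       /* remember the current head */
    demo_top_filter = gzip_filter;     /* become the new head */
}

/* A filter registered later (think of our head filter). */
static demo_filter_pt head_next;

static int
head_filter(char *trace, size_t cap)
{
    trace_add(trace, cap, "head>");
    return head_next(trace, cap);
}

static void
head_init(void)
{
    head_next = demo_top_filter;
    demo_top_filter = head_filter;
}
```

Starting from demo_top_filter = write_filter and calling gzip_init() followed by head_init(), a call through demo_top_filter visits head_filter, then gzip_filter, then write_filter. The filter registered last runs first, which is why module order determines whether a filter sees content before or after compression.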

Module Config Shell File

To tell Nginx about the filter module, a config file is required. This is just a regular shell file. It tells Nginx the module name, the module type and the location of the module source code. For more details on the config file and Nginx modules, refer to the Nginx Development Guide. The Nginx Wiki provides information on the config file as well.

Implementation of the Nginx Response Body Filter

Let's run through the key functions in the source code of the Html Head filter module. The full source is available at the GitHub link at the bottom of the article.

The following is the listing for the config file of the Html Head filter module. Note that the config file must be named "config". It specifies the type of the module, a name for the module and a single C source file that contains the module code.

ngx_module_type=HTTP_AUX_FILTER
ngx_module_name=ngx_http_html_head_filter_module
ngx_module_srcs="$ngx_addon_dir/ngx_http_html_head_filter_module.c"

. auto/module

ngx_addon_name=$ngx_module_name

ngx_http_html_head_filter_module.c is the filter source file. The 3 Nginx header files required for HTTP module development are included at the top. Two macros, HF_MAX_STACK_SZ and HF_MAX_CONTENT_SZ, are set to 512. These are the maximum stack size and the maximum number of characters to check for the <head> tag in an HTTP response. The stack and a struct for maintaining state for each HTTP request, ngx_http_html_head_filter_ctx_t, are also defined.

Warning:
This warning applies if you are using the blocking feature.

It is not recommended to increase HF_MAX_CONTENT_SZ. HF_MAX_CONTENT_SZ is deliberately set to a low value so that blocking can be effective. For performance, the filter module doesn't buffer the full output response. If HF_MAX_CONTENT_SZ is set to a large value, it is possible that blocking will not work and a partial response is sent to the client browser.

Try setting HF_MAX_CONTENT_SZ to a lower value if blocking doesn't work. A lower value will also help to improve performance.

The following code listing shows these macros and data structures.

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

#define HF_MAX_STACK_SZ 512
#define HF_MAX_CONTENT_SZ 512

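/*
stack for parsing html
*/
typedef struct 
{
    u_char    data[HF_MAX_STACK_SZ]; /* characters of the tag being collected */
    ngx_int_t top;                   /* top of the stack */
}
headfilter_stack_t;


/*
module data struct for maintaining
state per request
*/
typedef struct
{
    ngx_uint_t  last;      /* limit reached or all content processed */
    ngx_uint_t  count;     /* characters processed so far */
    ngx_uint_t  found;     /* <head> tag has been found */
    ngx_uint_t  index;     /* position of the closing '>' of <head> in the current buffer */
    ngx_uint_t  starttag;  /* flags used with the stack while parsing html */
    ngx_uint_t  tagquote;
    ngx_uint_t  tagsquote;
    headfilter_stack_t stack; /* stack holding the current html tag */
    ngx_chain_t  *free;    /* free and busy are used for buffer reuse */
    ngx_chain_t  *busy;
    ngx_chain_t  *out;     /* output chain passed to the next filter */
    ngx_chain_t  *in;      /* input chain still to be processed */
    ngx_chain_t  **last_out; /* where the next output link is attached */
}
ngx_http_html_head_filter_ctx_t;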

Nginx allows a module to keep state information per request through a request context data structure defined by the module. ngx_http_html_head_filter_ctx_t stores the module's state while it processes a response. For example, count is the number of characters that have been processed so far (count cannot exceed 512); last is a flag indicating that the 512-character limit has been reached or that all content has been processed; and index stores the position of the closing bracket, ">", of the <head> tag in the current buffer once the <head> tag is found.

Nginx offers two convenient macros, ngx_http_set_ctx(r, c, module) and ngx_http_get_module_ctx(r, module), for saving and retrieving the module's request context. Other fields in ngx_http_html_head_filter_ctx_t include the stack used for storing an HTML tag, the flags used with the stack for parsing HTML, a found flag that indicates that the <head> tag has been found, several ngx_chain_t pointers and a pointer to an ngx_chain_t pointer.

These ngx_chain_t pointers are for handling the input and output buffer chains in a request. The free and busy pointers are for buffer reuse. Refer to the Nginx Development Guide for more details on buffer reuse.

The following is the data structure for holding the arguments of the configuration directives.

/* 
Configuration struct for module
*/
typedef struct
{
    ngx_str_t  insert_text; /* text to insert after the <head> tag */
    ngx_flag_t block;       /* show a blank page if <head> is not found */
}
ngx_http_html_head_filter_loc_conf_t; 

static ngx_http_output_header_filter_pt  ngx_http_next_header_filter;
static ngx_http_output_body_filter_pt    ngx_http_next_body_filter;

The Html Head filter module directives can only be set in the location context of the Nginx configuration file. ngx_http_html_head_filter_loc_conf_t has a string field, insert_text, that holds the text to be inserted after the <head> tag. It has a block flag that indicates whether a blank html should be displayed if a <head> tag is not found in the first 512 characters of a response.

The two static variables ngx_http_next_header_filter and ngx_http_next_body_filter, are pointers for storing the next header filter and body filter in the Nginx chain of filters. These are set during initialization of Html Head filter module and are called when Html Head filter has done its work.

The following listing shows the module directives, declared as a static array of ngx_command_t.

/*
Module directives
*/
static ngx_command_t ngx_http_html_head_filter_commands[] =
{
   {
     ngx_string("html_head_filter"), //Module Directive name
     NGX_HTTP_LOC_CONF | NGX_CONF_1MORE, //Directive argument 
     ngx_conf_set_str_slot, //Handler function 
     NGX_HTTP_LOC_CONF_OFFSET, //Save to loc config 
     offsetof(ngx_http_html_head_filter_loc_conf_t, insert_text),//loc para
     NULL
   },
   
   {
     ngx_string("html_head_filter_block"), //Module Directive name
     NGX_HTTP_LOC_CONF | NGX_CONF_FLAG, //Directive argument
     ngx_conf_set_flag_slot, //Handler function 
     NGX_HTTP_LOC_CONF_OFFSET, //Save to loc config 
     offsetof(ngx_http_html_head_filter_loc_conf_t, block),//loc para
     NULL
   },
   
   ngx_null_command
};

ngx_http_html_head_filter_commands[] is an array of ngx_command_t. It holds the 2 directives of the module and is terminated by ngx_null_command. The directive "html_head_filter" takes a string argument (the text to be inserted). This directive is required to enable filtering.

The other directive, "html_head_filter_block", takes an on/off flag that determines whether to display a blank HTML page when a <head> tag is not found within the first 512 characters of the HTTP response. This directive is optional; by default, blocking is not enabled. Both directive arguments are read using Nginx's pre-defined set functions and saved into the module configuration structure, ngx_http_html_head_filter_loc_conf_t.

The module context, ngx_http_html_head_filter_ctx, defines two function handlers. ngx_http_html_head_init( ) is used for initializing the module after configuration is done and ngx_http_html_head_create_conf( ) is for creating the module configuration structure. The following shows the code listing.

/*
Module context 
*/
static ngx_http_module_t  ngx_http_html_head_filter_ctx =
{
    NULL, //Pre config
    ngx_http_html_head_init, //Post config
    NULL, //Create main config
    NULL, //Init main config
    NULL, //Create server config
    NULL, //Merge server config
    ngx_http_html_head_create_conf, //Create loc config
    NULL //Merge loc config
};

The ngx_http_html_head_init() function initializes the module and updates the filter chain. ngx_http_top_header_filter and ngx_http_top_body_filter are set to the module's header filter and body filter handlers respectively. Nginx will call these and hence invoke our filter module.

The original function handlers in these 2 function pointers are saved as ngx_http_next_header_filter and ngx_http_next_body_filter respectively. When our module completes its work, it will in turn call these saved functions. This establishes the Nginx filter chain, enabling one filter to call the next until the last in the chain.

The following is the code snippet for the ngx_http_html_head_create_conf() and ngx_http_html_head_init() functions.

/* Creates the module location config struct */
static void* 
ngx_http_html_head_create_conf(ngx_conf_t *cf)
{

    ngx_http_html_head_filter_loc_conf_t *conf;
    conf = ngx_pcalloc(cf->pool, sizeof(ngx_http_html_head_filter_loc_conf_t));
    if(conf == NULL)
    {
        /* Nginx treats a NULL return from a create-conf handler as failure */
        return NULL;
    }

    conf->block = NGX_CONF_UNSET;
    return conf;

}


/* Function to initialize the module */
static ngx_int_t
ngx_http_html_head_init(ngx_conf_t * cfg)
{

    ngx_http_next_header_filter = ngx_http_top_header_filter;
    ngx_http_top_header_filter = ngx_http_html_head_header_filter;

    ngx_http_next_body_filter = ngx_http_top_body_filter;
    ngx_http_top_body_filter = ngx_http_html_head_body_filter;
 
    return NGX_OK;

}

The array of module directives, the module context and module type are specified in the ngx_module_t structure. This is the module definition discussed in the earlier section. The following shows the code.

/*
Module definition
*/
ngx_module_t  ngx_http_html_head_filter_module = 
{
    NGX_MODULE_V1,
    &ngx_http_html_head_filter_ctx,     /* module context */
    ngx_http_html_head_filter_commands, /* module directives */
    NGX_HTTP_MODULE,                    /* module type */
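    NULL,                               /* init master */
    NULL,                               /* init module */
    NULL,                               /* init process */
    NULL,                               /* init thread */
    NULL,                               /* exit thread */
    NULL,                               /* exit process */
    NULL,                               /* exit master */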
    NGX_MODULE_V1_PADDING
};

The following shows the code listing for the ngx_http_html_head_header_filter() function. This is the handler registered earlier by the module initialization function. It processes the HTTP response headers, performs some checks and initializes the module request context for managing state. If any of the checks fails, for example if the "html_head_filter" directive is not set or the response is compressed, the response is passed unmodified to the next filter handler.

/*Module function handler to filter http response headers */
static ngx_int_t
ngx_http_html_head_header_filter(ngx_http_request_t *r )
{

    ngx_http_html_head_filter_loc_conf_t *slcf;
    ngx_http_html_head_filter_ctx_t *ctx;
    ngx_uint_t content_length=0; 

    slcf = ngx_http_get_module_loc_conf(r, ngx_http_html_head_filter_module);
    
    
    if(slcf == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter "
                "null configuration");
        #endif
       
        return ngx_http_next_header_filter(r);
    }
    

    if(slcf->insert_text.len == 0)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: empty configuration insert text");
        #endif
        
        return ngx_http_next_header_filter(r);
    }
    

    if(r->headers_out.content_type.len == 0 || 
        r->headers_out.content_length_n == 0 ||
        r->header_only )
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: empty content type or "
                "header only ");
        #endif
        
        return ngx_http_next_header_filter(r);
    }
    
     
    if(ngx_test_content_type(r) == 0) 
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: content type not html");
        #endif            
        
        return ngx_http_next_header_filter(r);
    }

    
    if(ngx_test_content_compression(r) != 0)
    {//Compression enabled, don't filter   
        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0, 
                     "[Html_head filter]: compression enabled");
                     
        return ngx_http_next_header_filter(r);
    }
 
    if(r->headers_out.status != NGX_HTTP_OK)
    {//Response is not HTTP 200   
        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0, 
                     "[Html_head filter]: http response is not 200");
                     
        return ngx_http_next_header_filter(r);
    }

    r->filter_need_in_memory = 1;

    if (r == r->main && r->headers_out.content_length_n > 0) 
    {//Main request with a known length: account for the inserted text
        content_length = r->headers_out.content_length_n + 
                         slcf->insert_text.len;
        r->headers_out.content_length_n = content_length;      
    }
    

    ctx = ngx_http_get_module_ctx(r, ngx_http_html_head_filter_module);
    if(ctx == NULL)
    {
        ctx = ngx_pcalloc(r->pool, 
                          sizeof(ngx_http_html_head_filter_ctx_t)); 
        
        if(ctx == NULL)
        {
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                          "[Html_head filter]: cannot allocate ctx"
                          " memory");
                          
            return ngx_http_next_header_filter(r);
        }
        
        ngx_http_set_ctx(r, ctx, ngx_http_html_head_filter_module);
    }
    
    
    return ngx_http_next_header_filter(r);
    
}

The following is the code listing for the ngx_http_html_head_body_filter() function. Like the header filter handler, this function is registered by the module initialization function.

/*
Module function handler to filter the html response body
and insert the text string
*/
static ngx_int_t
ngx_http_html_head_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{

    ngx_http_html_head_filter_loc_conf_t *slcf;
    ngx_http_html_head_filter_ctx_t *ctx;
    ngx_chain_t  *cl;
    ngx_buf_t  *b;
    ngx_int_t rc;
    u_char* empty_page = (u_char*)"<!DOCTYPE html><html><head>"
                                  "<meta charset=\"UTF-8\">"
                                  "<title></title></head><body>"
                                  "</body></html>";
                                  

    slcf = ngx_http_get_module_loc_conf(r, ngx_http_html_head_filter_module);
    ctx = ngx_http_get_module_ctx(r, ngx_http_html_head_filter_module);

    
    if(slcf == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_body_filter "
                "null configuration");
        #endif
       
        return ngx_http_next_body_filter(r, in);
    }


    if(ctx == NULL)
    {
       ngx_log_error(NGX_LOG_WARN, r->connection->log, 0, 
            "[Html_head filter]: ngx_http_html_head_body_filter " 
            "unable to get module ctx");
            
       return ngx_http_next_body_filter(r, in);
    }


    if(in == NULL)
    {
       ngx_log_error(NGX_LOG_WARN, r->connection->log, 0, 
            "[Html_head filter]: input chain is null");
                     
       return ngx_http_next_body_filter(r, in);
    }


    //Copy the incoming chain to ctx-in
    if (ngx_chain_add_copy(r->pool, &ctx->in, in) != NGX_OK) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: unable to copy"
            " input chain - in");
                     
        return NGX_ERROR;
    }

    ctx->last_out = &ctx->out;
   
    //Loop through all the incoming buffers
    while(ctx->in)
    {	
        ctx->index = 0; 
        if(ctx->found == 0 && ctx->last == 0)
        {		 
            rc = ngx_parse_buf_html(ctx, r);
            if(rc == NGX_OK)
            { //<head> is found
                ctx->found = 1; 
                rc=ngx_html_insert_output(ctx, r, slcf);
			   
                if(rc == NGX_ERROR)
                {
                    return rc; 
                }
            }
            else if(rc == NGX_ERROR)
            {
                ctx->last = 1;
            }	
        }	

        b = ctx->in->buf;

        if(b->last_buf || b->last_in_chain)
        {//Last buffer and <head> not found
         //even if content is less than 512 chars
           if(!ctx->found)
           {
              ctx->last = 1;
           }
        }		
    
        *ctx->last_out=ctx->in;
        ctx->last_out=&ctx->in->next;
        ctx->in = ctx->in->next;
    }

    *ctx->last_out = NULL;
	
    //If <head> is not found and block option is enabled
    if(ctx->last  && slcf->block == 1) 
    {

        ngx_log_error(NGX_LOG_ALERT, r->connection->log, 0,
                      "[Html_head filter]: cannot find <head> "
                      "blocking");

        cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
        if (cl == NULL) 
        {
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                "[Html_head filter]: ngx_http_html_head_body_filter "
                "unable to allocate output chain");
                
            return NGX_ERROR;
        }

        b=cl->buf;
        ngx_memzero(b, sizeof(ngx_buf_t));

        b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
        b->memory=1;
        b->pos = empty_page;
        b->last = empty_page + ngx_strlen(empty_page);

        if(r ==r->main)
        {		
            b->last_buf = 1;
        }
        else
        {
            b->last_in_chain = 1; 
        }
        
        ctx->out = cl; 
        ctx->out->next = NULL; 
        r->keepalive = 0;
		
    }
    
   
    rc=ngx_http_next_body_filter(r, ctx->out);

    ngx_chain_update_chains(r->pool, &ctx->free, &ctx->busy, &ctx->out,
                            (ngx_buf_tag_t)&ngx_http_html_head_filter_module);

    ctx->in = NULL; 

    return rc;
    
}

The while loop in the handler iterates through the incoming chain of buffers and calls ngx_parse_buf_html() to parse each buffer for the <head> tag. If <head> is found, the found flag in the module request context is set and ngx_html_insert_output() is called to insert our text after the <head> tag. The process for doing this is described in the earlier Design and Approach section. If <head> is not found within the first 512 characters, the last flag is set in the module request context.

If the html content is shorter than 512 characters and the <head> tag is not found, the last flag is set as well. This prevents a short response without a <head> tag from being sent to the user when the block option is enabled.

The found and last flags ensure that parsing is done only for the first 512 characters of the HTTP response. They also ensure that the text is inserted only once, after the first occurrence of the <head> tag, even if there are multiple <head> tags in a response. The while loop builds the output chain that will be passed to the next filter in the chain.
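The interplay of the two flags can be modelled outside Nginx. The following is a minimal sketch under stated assumptions, not the module's code: scan_state_t, feed_chunk() and should_block() are hypothetical names, the parser is reduced to a match on the literal string "<head>", and MAX_SCAN mirrors the 512 character limit.

```c
#include <stddef.h>

#define MAX_SCAN 512  /* mirrors the 512 character limit (assumed name) */

/* Hypothetical stand-in for the module's per-request context. */
typedef struct {
    size_t count;    /* characters examined so far, across all chunks */
    size_t matched;  /* how many characters of "<head>" are matched */
    int    found;    /* set once the first <head> has been seen */
    int    last;     /* set once the search has been given up */
} scan_state_t;

/*
 * Feed one chunk of the response. Returns 1 if this chunk completes the
 * first "<head>", otherwise 0. is_last_chunk marks the final chunk.
 */
static int
feed_chunk(scan_state_t *s, const char *data, size_t len, int is_last_chunk)
{
    static const char pat[] = "<head>";
    size_t i;
    int hit = 0;

    for (i = 0; i < len && !s->found && !s->last; i++) {
        if (s->count == MAX_SCAN) {
            s->last = 1;            /* limit reached: stop parsing */
            break;
        }
        if (data[i] == pat[s->matched]) {
            s->matched++;
        } else {
            /* '<' occurs only at the start of the pattern */
            s->matched = (data[i] == '<') ? 1 : 0;
        }
        if (s->matched == sizeof(pat) - 1) {
            s->found = 1;           /* later <head> tags are ignored */
            hit = 1;
        }
        s->count++;
    }

    if (is_last_chunk && !s->found) {
        s->last = 1;                /* short response without <head> */
    }
    return hit;
}

/* A blank page is served only when the search failed and blocking is on. */
static int
should_block(const scan_state_t *s, int block_enabled)
{
    return s->last && block_enabled;
}
```

Because the state lives in a structure that outlives each call, a tag that straddles two chunks is still recognised, and once found or last is set no further characters are examined.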

A blank html page is sent to the next filter in the chain if the last flag is set and the "html_head_filter_block" directive is set to on. The ngx_chain_update_chains() call at the end of the handler reuses buffers that have been consumed by adding them to the free chain in the module request context.
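The idea of moving consumed buffers onto a free chain can be sketched with a plain linked list. This is a simplified model and not the Nginx implementation: node_t and recycle_sent() are hypothetical names, and a node counts as consumed when its remaining byte count is zero.

```c
#include <stddef.h>

/* Hypothetical chain link; remaining stands in for last - pos of a buffer. */
typedef struct node_s node_t;
struct node_s {
    size_t  remaining;  /* bytes not yet sent downstream */
    node_t *next;
};

/*
 * Moves fully sent nodes from the head of *busy onto *free_list and
 * stops at the first node that still holds unsent data.
 * Returns the number of nodes moved.
 */
static int
recycle_sent(node_t **busy, node_t **free_list)
{
    int moved = 0;

    while (*busy != NULL && (*busy)->remaining == 0) {
        node_t *n = *busy;

        *busy = n->next;        /* unlink from the busy chain */
        n->next = *free_list;   /* push onto the free chain */
        *free_list = n;
        moved++;
    }
    return moved;
}
```

Nodes on the free chain can be handed out again on a later call instead of being allocated afresh, which is the effect the handler relies on when it asks for a free buffer.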

The following lists the code for the ngx_parse_buf_html() function.

/*
Parses the buffer to look for the <head> tag
Returns NGX_OK if found, 
        NGX_AGAIN if not found in this buffer,
        NGX_ERROR if an error occurs.
*/
static ngx_int_t 
ngx_parse_buf_html(ngx_http_html_head_filter_ctx_t *ctx, 
                   ngx_http_request_t *r)
{
    u_char *p, c;
    ngx_int_t rc;
    ngx_buf_t* buf;
	
    if(ctx->in == NULL)
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_parse_buf_html "
            "unable to parse html ctx->in is NULL");  
            
        return NGX_ERROR;
    }
		
    buf = ctx->in->buf; 

    for(p=buf->pos; p < buf->last; p++)
    {

        c = *p;
        if(ctx->count == HF_MAX_CONTENT_SZ)
        {
            ngx_log_error(NGX_LOG_WARN, 
               r->connection->log, 0, 
               "[Html_head filter]: ngx_parse_buf_html unable "
               "to find <head> tag within 512 "
               "characters");  
               
            return NGX_ERROR;
        } 
        
        switch(c)
        {
            case '<':

                ctx->starttag=1;
                if(!ctx->tagquote && ! ctx->tagsquote)
                {
                   ngx_init_stack(&ctx->stack);
                }

                if(push(c, &ctx->stack) == -1)
                {
                      ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                        "[Html_head filter]: ngx_parse_buf_html "
                        "parse stack is full");  
                         
                      return NGX_ERROR;
                }
                
                break;

            case '>':

                if(ctx->starttag)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html "
                            "parse stack is full");  
                            
                        return NGX_ERROR;
                    }

                    if(!ctx->tagquote && !ctx->tagsquote)
                    {    
                        ctx->starttag = 0; 
                        //Process the tag
                        rc = ngx_process_tag(ctx,r);

                        if(rc == NGX_OK)
                        {
                            return NGX_OK;
                        }
                        else if(rc == NGX_ERROR)
                        {
                            return NGX_ERROR; 
                        }
                
                    }
                }

                break;

            case '\"':

                if(ctx->starttag && ctx->tagsquote==0 && ctx->tagquote==0 )
                {
                    ctx->tagquote=1;
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html "
                            "parse stack is full");  
                            
                        return NGX_ERROR;
                    }
                }
                else if(ctx->starttag && ctx->tagsquote==0 && ctx->tagquote)
                {
                    ctx->tagquote=0; 
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
            
                }
                else if(ctx->starttag && ctx->tagsquote)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }
          
                break;

            case '\'':

                if(ctx->starttag && ctx->tagquote == 0 && ctx->tagsquote == 0)
                {
                    ctx->tagsquote = 1;
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }  
                }   
                else if(ctx->starttag && ctx->tagquote==0 && ctx->tagsquote)
                {
                    ctx->tagsquote = 0;
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                } 
                else if(ctx->starttag && ctx->tagquote)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }

                break;

            default:
         
                if(ctx->starttag)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }

        }

        ctx->count++;
        ctx->index++;
    }

    return NGX_AGAIN;
}

The function walks through the characters in a buffer and looks for four tokens: <, ", ' and >. The < token indicates the start of an html tag; the stack is initialized and the token is pushed onto it. Subsequent characters inside the tag are pushed onto the stack as well. When a double quote or single quote is encountered, the flag for that quote type is set. Any > that appears inside a quotation is not interpreted as the end of the tag, and any < that appears inside a quotation is not interpreted as the start of a new tag.

The relevant quotation flag is reset when the matching closing quote is encountered. A subsequent > is then treated as the end of the tag, and the parser calls ngx_process_tag() to check whether the html tag in the stack is <head>. Leading and trailing spaces in the tag are ignored and the check is case insensitive. However, the <head> tag cannot contain attributes.
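The quote handling can be isolated in a small standalone scanner. This is a sketch for illustration only: find_tag_end() is a hypothetical helper, and unlike the module it keeps no stack of the tag's characters and works on a single complete string rather than a chain of buffers.

```c
#include <stddef.h>

/*
 * Returns the index of the '>' that closes the first tag in s,
 * or -1 if no complete tag is present. A '>' inside a quoted
 * attribute value is not treated as the end of the tag.
 */
static long
find_tag_end(const char *s, size_t len)
{
    int in_tag = 0, dquote = 0, squote = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        char c = s[i];

        if (!in_tag) {
            if (c == '<') {
                in_tag = 1;          /* start of a tag */
            }
            continue;
        }

        switch (c) {
        case '"':
            if (!squote) {
                dquote = !dquote;    /* toggle only outside '...' */
            }
            break;
        case '\'':
            if (!dquote) {
                squote = !squote;    /* toggle only outside "..." */
            }
            break;
        case '>':
            if (!dquote && !squote) {
                return (long) i;     /* real end of the tag */
            }
            break;
        default:
            break;
        }
    }
    return -1;
}
```

For the input <a title="x>y">z the function skips the > at index 11, which sits inside the double quoted value, and reports index 14.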

Some examples will make this clearer. <   HeAD> is considered valid, while <Head id=1> is invalid. The parser function returns NGX_OK if a valid <head> tag is found, NGX_AGAIN to indicate that processing can continue with subsequent buffers, and NGX_ERROR if an error occurs (such as the parse stack becoming full, or <head> not being found within the first 512 characters).
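The matching rules can be expressed as a short standalone check. The code for ngx_process_tag() is not listed in this article, so the following is only a sketch of the stated rules under the assumption that the tag is passed in whole, including both angle brackets; is_head_tag() is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Returns 1 if tag (including '<' and '>') is a <head> tag:
 * case insensitive, surrounding spaces allowed, no attributes.
 */
static int
is_head_tag(const char *tag, size_t len)
{
    static const char name[] = "head";
    size_t i, k;

    if (len < 2 || tag[0] != '<' || tag[len - 1] != '>') {
        return 0;
    }

    i = 1;
    while (i < len - 1 && isspace((unsigned char) tag[i])) {
        i++;                                  /* leading spaces */
    }

    for (k = 0; k < sizeof(name) - 1; k++, i++) {
        if (i >= len - 1
            || tolower((unsigned char) tag[i]) != name[k])
        {
            return 0;                         /* not the name "head" */
        }
    }

    while (i < len - 1 && isspace((unsigned char) tag[i])) {
        i++;                                  /* trailing spaces */
    }

    return i == len - 1;                      /* nothing else before '>' */
}
```

With this check, <   HeAD> and <head > are accepted, while <Head id=1>, <header> and </head> are rejected.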

We will list one more function, ngx_html_insert_output(), which inserts our text into the buffer chain. The following is the code for ngx_html_insert_output().

/*
Insert the text into body response buffer
*/
static ngx_int_t 
ngx_html_insert_output(ngx_http_html_head_filter_ctx_t *ctx, 
                       ngx_http_request_t *r, 
                       ngx_http_html_head_filter_loc_conf_t *slcf)
{

    ngx_chain_t  *cl, *ctx_in_new, **ll;
    ngx_buf_t  *b;

    if(ctx->in == NULL)
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
             "[Html_head filter]: ngx_html_insert_output "
             "text Insertion ctx->in is NULL");
             
        return NGX_ERROR;
    }

				   
    ll = &ctx_in_new;	   
    b=ctx->in->buf;

    if(b->pos + ctx->index + 1 > b->last)
    {//Check that the head tag position does not exceed buffer
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output "
            "invalid input buffer at text insertion");
            
        return NGX_ERROR;          
    }

    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output "
            "unable to allocate output chain");
            
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));

    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos = ctx->in->buf->pos;
    b->last = b->pos + ctx->index + 1;
    b->recycled = ctx->in->buf->recycled;
    b->flush = ctx->in->buf->flush; 
       
    *ll = cl;  
    ll = &cl->next;
	

    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
             "[Html_head filter]: ngx_html_insert_output "
             "unable to allocate output chain");
             
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));
	 
    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos=slcf->insert_text.data;
    b->last=b->pos + slcf->insert_text.len;
    b->recycled = ctx->in->buf->recycled;
	 
    *ll = cl;
    ll = &cl->next;
	 

    if(ctx->in->buf->pos + ctx->index + 1 == ctx->in->buf->last )
    {//head tag is in last position of the buffer
   
        b->last_buf = ctx->in->buf->last_buf;
        b->last_in_chain = ctx->in->buf->last_in_chain;
		 
        *ll = ctx->in->next; 
		
	    if(ctx->in->buf->recycled)
	    {//consume existing buffer
	        ctx->in->buf->pos = ctx->in->buf->last;
	    }
	    ctx->in = ctx_in_new;
	    return NGX_OK;
		
    }
     
    
    //tag is within buffer last position, 
    //i.e. ctx->in->buf->pos + ctx->index + 1 < ctx->in->buf->last
    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output unable to allocate "
            "output chain");
            
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));

    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos = ctx->in->buf->pos + ctx->index + 1;
    b->last = ctx->in->buf->last;
    b->recycled = ctx->in->buf->recycled;
    b->last_buf = ctx->in->buf->last_buf;
    b->last_in_chain = ctx->in->buf->last_in_chain;

    *ll = cl;
    ll = &cl->next;
    *ll = ctx->in->next;
	 
    if(ctx->in->buf->recycled)
    {//consume existing buffer
        ctx->in->buf->pos = ctx->in->buf->last;	
    }
	 
    ctx->in = ctx_in_new; 
	   
    return NGX_OK;

}

The insert function splits the input buffer in which the <head> tag is found into either 2 or 3 buffers, with the text inserted. The process is illustrated in the earlier Design and Approach section. If the current input buffer has content only up to the <head> tag, our text can be inserted directly as a new buffer after it. In this case, the result is 2 buffers.

Alternatively, if the current input buffer has content after the <head> tag, the input buffer is split into 3 buffers. The first is the content up to and including the <head> tag, the second is our inserted text and the third is the content after the <head> tag.
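The 2 or 3 way split comes down to pointer arithmetic on one memory region; no data is copied. The sketch below illustrates this with a hypothetical slice_t type standing in for the pos and last pointers of a buffer. split_and_insert() is an illustrative name, and idx is assumed to be the offset of the closing > of <head> from the start of the buffer.

```c
#include <stddef.h>

/* Hypothetical view of a byte range: [pos, last). */
typedef struct {
    const char *pos;
    const char *last;
} slice_t;

/*
 * Splits buf just after the '>' of <head> (at offset idx) and places
 * text after it. Writes 2 or 3 slices into out and returns how many
 * were written, or -1 if idx lies outside the buffer.
 */
static int
split_and_insert(slice_t buf, size_t idx, slice_t text, slice_t out[3])
{
    const char *cut;

    if (buf.pos == NULL || idx >= (size_t) (buf.last - buf.pos)) {
        return -1;
    }

    cut = buf.pos + idx + 1;     /* one past the '>' */

    out[0].pos = buf.pos;        /* content up to and including <head> */
    out[0].last = cut;
    out[1] = text;               /* the inserted text */

    if (cut == buf.last) {
        return 2;                /* <head> ends the buffer */
    }

    out[2].pos = cut;            /* remaining content */
    out[2].last = buf.last;
    return 3;
}
```

All slices except the inserted text point into the original buffer, which is why the original must stay valid until the slices have been sent.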

The new set of buffers is then incorporated into the output chain by the while loop in the handler function, ngx_http_html_head_body_filter(). If the original input buffer is marked with the recycled flag, the ngx_html_insert_output() function will consume the buffer. It does this by setting the start position of the buffer content to be equal to its last content position. The recycled flag indicates that the buffer has to be consumed as soon as possible.
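Consuming a buffer in this sense is a one line operation. A minimal sketch, using a hypothetical buf_t with only the three fields involved:

```c
#include <stddef.h>

/* Hypothetical buffer with only the fields relevant here. */
typedef struct {
    const char *pos;       /* start of unprocessed content */
    const char *last;      /* end of content */
    int         recycled;  /* owner wants the buffer back quickly */
} buf_t;

/* Number of bytes still waiting to be processed. */
static size_t
buf_size(const buf_t *b)
{
    return (size_t) (b->last - b->pos);
}

/* Marks a recycled buffer as fully consumed; others are left alone. */
static void
consume_if_recycled(buf_t *b)
{
    if (b->recycled) {
        b->pos = b->last;  /* size becomes zero */
    }
}
```

Once pos equals last the buffer reports a size of zero, which is how its owner can tell that it is free to be reused.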

Compiling the Nginx Body Filter Module

Let's proceed to compile and test the html head filter module. Create a working directory "Build-Module" to hold the source files that are required. The filter module source code can be obtained from the GitHub repository. On an Ubuntu Linux system with git installed, the following commands can be used.

mkdir Build-Module
cd Build-Module
git clone https://github.com/ngchianglin/NginxHtmlHeadFilter.git

To verify the signature of the git download, refer to these instructions. Let's do a quick static analysis of the module's source code to make sure that there are no major vulnerabilities, such as buffer overflows. On Ubuntu, we can install cppcheck.

sudo apt-get install cppcheck
cd NginxHtmlHeadFilter
cppcheck --enable=warning ngx_http_html_head_filter_module.c

Good, our module code doesn't have any glaring issues that the cppcheck analyzer can find. We can proceed to download the other packages that are required. Change our directory back to Build-Module.

cd ..

The filter module works with Nginx 1.16.1, the latest stable release at the time of writing. Download the nginx source code from the official Nginx download page. We are going to download OpenSSL 1.1.1d, zlib 1.2.11 and pcre 8.43 as well.

Verify the integrity of the downloads with either the SHA-256 checksum or the gpg signature provided on each package's website. The following lists the SHA-256 checksums of the packages.

nginx-1.16.1.tar.gz
f11c2a6dd1d3515736f0324857957db2de98be862461b5a542a3ac6188dbe32b

openssl-1.1.1d.tar.gz
1e3a91bc1f9dfce01af26026f856e064eab4c8ee0a8f457b5ae30b40b8b711f2

zlib-1.2.11.tar.gz
c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1

pcre-8.43.tar.gz
0b8e7465dc5e98c757cc3650a20a7843ee4c3edf50aaf60bb33fd879690d2c73

Extract these tarballs in the Build-Module directory. Issue the following commands to configure Nginx. The options include hardening flags to produce a hardened binary.

cd nginx-1.16.1
./configure \
    --with-cc-opt="-Wextra -Wformat -Wformat-security -Wformat-y2k -Werror=format-security -fPIE -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all" \
    --with-ld-opt="-pie -Wl,-z,relro -Wl,-z,now -Wl,--strip-all" \
    --with-http_v2_module \
    --with-http_ssl_module \
    --without-http_uwsgi_module \
    --without-http_fastcgi_module \
    --without-http_scgi_module \
    --without-http_empty_gif_module \
    --with-openssl=../openssl-1.1.1d \
    --with-openssl-opt="no-ssl2 no-ssl3 no-comp no-weak-ssl-ciphers -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" \
    --with-zlib=../zlib-1.2.11 \
    --with-zlib-opt="-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" \
    --with-pcre=../pcre-8.43 \
    --with-pcre-opt="-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" \
    --with-pcre-jit \
    --add-module=../NginxHtmlHeadFilter

The configure command above will create a Makefile in the objs directory. Proceed to build the binary and install it into /usr/local/nginx.

make
sudo make install

We can pack the compiled nginx package into a gzipped tar archive and move it to our server machine for testing. As a security measure and best practice, the server doesn't have gcc or other compiler tools installed. We compile the code on a separate workstation that has the same architecture and OS as the server, and then copy the compiled package to the server using sftp or scp.

cd /usr/local
tar -czvf nginx-binary-package.tgz nginx
sftp -i /home/devuser1/keyloc/private_rsa user@myserver
put nginx-binary-package.tgz

Testing the Nginx Filter Module

On the server, extract the nginx binary package to /usr/local/nginx. Ensure that the ownership and permissions on this extracted nginx binary location are secure. The Apache web server serves the main website on this machine. It listens locally (127.0.0.1) on port 80 and does not accept any external network traffic.

Nginx will be configured as a reverse proxy in front of the Apache web server. Nginx accepts external network traffic and forwards it to the Apache web server. Refer to the earlier section, Design and Approach, for a big picture view of the deployment architecture.

Nginx is run using the nginx user and group. The following commands create the user and group, as well as the directories used by Nginx.

sudo mkdir /opt/nginx
sudo chmod 755 /opt/nginx
sudo groupadd -g 8800 nginx
sudo useradd -d /opt/nginx/home -m -u 8800 -g 8800 -s /bin/false nginx
sudo mkdir /var/log/nginx
sudo chown nginx: /var/log/nginx
sudo chmod 700 /var/log/nginx
sudo mkdir /opt/nginx/www
sudo chmod 755 /opt/nginx/www
sudo mkdir /opt/nginx/cache
sudo chown nginx: /opt/nginx/cache
sudo chmod 700 /opt/nginx/cache

Let's do some additional hardening of the /usr/local/nginx location.

sudo chown -R root:nginx /usr/local/nginx
sudo chmod 750 /usr/local/nginx
sudo chown -R root:root /usr/local/nginx/sbin
sudo chmod 700 /usr/local/nginx/sbin/nginx
sudo chown -R root:root /usr/local/nginx/conf
sudo chmod -R 600 /usr/local/nginx/conf/
sudo chmod 700 /usr/local/nginx/conf

Open the nginx configuration file located at /usr/local/nginx/conf/nginx.conf and fill in the following settings. Note that these configuration settings are for nighthour.sg. Replace the IP address, the server name, the SSL certificates and so on with settings that are relevant for your test environment. Testing should be done on a non-production system.

user  nginx nginx;
worker_processes  4;
error_log  /var/log/nginx/error.log warn;
pid        /var/log/nginx/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" "$gzip_ratio"';

    sendfile    on;
    keepalive_timeout  65;
    server_tokens off;
    
    proxy_cache_path /opt/nginx/cache levels=1:2 keys_zone=webcache:2m max_size=150m;
    proxy_cache_key "$scheme$request_method$host$request_uri$is_args$args";
    proxy_cache_valid 200 90d;
    proxy_cache_valid 404 1m;

    gzip  on;
    
    map $sent_http_content_type $cachemap {
        default    no-store;
        ~text/html  "private, max-age=900";
        text/plain  "private, max-age=900";
        text/css    "private, max-age=7776000";
        application/javascript "private, max-age=7776000";
        ~image/    "private, max-age=7776000";
    }

    server {
        listen       128.199.64.100:80;
        server_name  www.nighthour.sg nighthour.sg;
        root   /var/www/html;
        charset utf-8;
        access_log  /var/log/nginx/access.log  main;
        
        expires 900;
        add_header Cache-Control public;
        if ( $host ~* "nighthour.sg$" )
        {
           return 301 https://$host$request_uri;
        }

        return 400;

        location / {
            index  index.html index.htm;
        }

        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

    }

    # HTTPS server
    server {
        listen       128.199.64.100:443 ssl http2;
        server_name  www.nighthour.sg nighthour.sg;
        root /var/www/html;
        charset utf-8;

        ssl_certificate      /etc/letsencrypt/live/www.nighthour.sg/fullchain.pem;
        ssl_certificate_key  /etc/letsencrypt/live/www.nighthour.sg/privkey.pem;
 
        ssl_session_timeout 15m;
        ssl_session_cache shared:SSL:50m;
        ssl_session_tickets off;
        
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
        ssl_prefer_server_ciphers  on;
        
        ssl_stapling on;
        ssl_stapling_verify on;
        ssl_trusted_certificate /etc/letsencrypt/live/www.nighthour.sg/fullchain.pem;
        resolver 8.8.8.8 8.8.4.4 valid=300s;
        resolver_timeout 5s;
        
        add_header Strict-Transport-Security "max-age=31536000;includeSubDomains";
        access_log  /var/log/nginx/ssl_access.log  main;

        location / {
            
            index  index.html index.htm;
            
            html_head_filter "<script src=\"/scripts/mymonitor.js\" async></script>";
            html_head_filter_block on;
            
            proxy_cache webcache;
            proxy_cache_bypass $http_cache_control;
            
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://127.0.0.1;
            
            add_header Cache-Control $cachemap;
            add_header Strict-Transport-Security "max-age=31536000;includeSubDomains";
        }
   
        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
        
    }

}

The configuration above sets up Nginx to listen on the public IP address at ports 80 and 443. The server block at port 80 redirects HTTP requests to HTTPS at port 443. In the server block for port 443 (HTTPS), proxy_pass to http://127.0.0.1 is configured. http://127.0.0.1 is where the Apache web server is listening for traffic.

We also turn on the Html Head filter module by setting the html_head_filter directive, with its argument string, in the location block. This argument string is the text to be inserted after the <head> tag in the HTTP response from the Apache web server. Here it is a script tag referring to a monitoring JavaScript file, mymonitor.js.

html_head_filter "<script src=\"/scripts/mymonitor.js\" async></script>";
html_head_filter_block on;

The html_head_filter_block directive is set to on. This tells the Html Head filter module to return a blank html page for HTTP responses that do not contain a <head> tag within the first 512 characters.

Start up Nginx with the following command.

sudo /usr/local/nginx/sbin/nginx

Access a page on the website using your favourite web browser and view the page source. The monitoring script should be inserted.

Fig 6. Nginx Html Head filter module -- Script insertion

Create a test html page on the website that doesn't contain any <head> tag and is at least 512 characters.

echo "<html> Hello world" > testwithouthead.html
perl -e 'print "A" x 512' >> testwithouthead.html
echo "</html>" >> testwithouthead.html

Move the test html file into the document root of the Apache web server and try accessing it. A blank page should be displayed.

Fig 7. Nginx Html head filter module -- Blank page

Edit the Nginx configuration /usr/local/nginx/conf/nginx.conf and set html_head_filter_block to off.

html_head_filter_block off;

Send a HUP signal to Nginx to re-read the configuration.

sudo kill -HUP `ps -ef | grep "nginx: master process" | grep -v grep | awk '{print $2;}'`

Clear the browser cache and restart the browser. Access the page again. The page will be displayed without being "blocked".

Fig 8. Nginx Html head filter module -- Block off

Other useful tests include html pages with multiple <head> tags (the monitoring script should be inserted only once), <head> tags with leading or trailing spaces and a mix of upper and lower case, a PHP script that generates html content dynamically, and a 404 Not Found error page (the monitoring script should not be inserted). The Html Head filter module should handle all these cases properly.

When all the tests are done and the results meet expectations, the filter module can be deployed to production. The filter module is deployed on nighthour.sg, where it inserts the monitoring script into the web pages.

Conclusion and Afterthought

Nginx is a high performance web server and reverse proxy that is highly extensible. It can serve as a Web Application Firewall (WAF) through modules such as ModSecurity, or even act as an application server through projects such as OpenResty. Learning to write an Nginx module gives an IT professional a better understanding of the internals of this flexible and widely used web infrastructure.

The knowledge gained can benefit developers, infrastructure engineers, security engineers/professionals and even system administrators who code.

Useful References

The full source code for the Nginx Html Head Filter is available at the following Github link.
https://github.com/ngchianglin/NginxHtmlHeadFilter

If you have any feedback, comments, corrections or suggestions to improve this article, you can reach me via the contact/feedback link at the bottom of the page.

Article last updated on Oct 2019.