Night Hour

Reading under a cool night sky ... a quiet, contemplative night ...

Writing an Nginx Response Body Filter Module

Willow tranquility

By three methods we may learn wisdom: First, by reflection, which is noblest; Second, by imitation, which is easiest; and third by experience, which is bitterest. - Confucius (孔子)


15 Dec 2017


Introduction

Nginx is a popular open source web and proxy server that is known for its performance and is used by many websites. It supports third-party modules that provide additional functionality and customization. This article shows how to write a simple filter module that inserts a text string after the <head> element in an HTTP response body.

This can be useful in some cases, for instance, to insert a monitoring script without modifying the existing web pages or web application. Nginx can be used as a reverse proxy to speed up access to the website and, at the same time, insert the monitoring script into the web content.


Article last updated Nov 2020.

Table of Contents

  1. Design and Approach
    1. The Html Tag Parser
    2. Nginx Buffer Chains and Text Insertion
    3. A Big Picture View of the Filter Setup
    4. Logical Flow of the Filter Module
    5. Performance Considerations
    6. HTTP Chunked Transfer Encoding
  2. Structure of an Nginx HTTP Filter Module
    1. Components of Nginx Module
    2. Nginx Module Filter Chain
    3. Module Config Shell File
  3. Implementing Nginx Response Body Filter
    1. Nginx Per Request/Response Context
    2. Saving and Retrieving Per Request/Response Context
    3. Structure for Storing Module Configuration
    4. Module Directives
    5. Nginx Module Context
    6. The module initialization function
    7. The module configuration creation and merge functions
    8. Nginx Module Definition
    9. The response headers filter function
    10. The response body filter function
    11. Explaining ctx->in, ctx->out, ctx->last_out
    12. The html tag parser function
    13. The text insertion function
  4. Compiling the Nginx Body Filter Module
  5. Testing the Nginx Filter Module
  6. A note about previous versions
  7. Conclusion and Afterthought
  8. Useful References

Design and Approach

This section describes the design and approach taken to build the filter module. It shows how a simple parser can be built to find html tags. It explains how Nginx stores an HTTP response as chains of buffers and how text can be inserted into this output chain. It also touches on how the filter module can be deployed, some of its features, and the performance considerations.

Like many other Nginx modules, this filter module will be written using the C language.

The Html Tag Parser

In order to locate the <head> element, the filter needs to be able to parse an input stream for html tags or elements. To do this, let's take a look at the structure of an html element.

Html tag syntax diagram
Fig 1. Syntax Diagram HTML Tag

An html tag starts with an opening angle bracket < and ends with the corresponding closing bracket >. It has a tagname, an optional "/" and optional attributes. In the diagram above, SP represents whitespace. There must be at least one whitespace character between the tagname and an attribute. Additional whitespace may be present between these tokens.

The following shows some examples of html tags.

<div>
<p    class='style1' >
<span class="mystyle"  id="list1">
<head>
</footer>

A simplified BNF (Backus–Naur form) for HTML tagname and its attributes may look like this.

Tagname :: alphabetic letters
Attribute :: AttributeName <opt space> = <opt space> <opt quotes> AttributeValue <opt quotes>
AttributeName :: alpha-numeric letters
AttributeValue :: alpha-numeric letters | EscapeSequences | empty
alphabetic letters :: a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
alpha-numeric letters :: 0|1|2|3|4|5|6|7|8|9 | alphabetic letters
EscapeSequences :: '\"' | '\'' | '\n' | '\r' | '\\' | '\t' | '\v' | '\f' | '\b' | '\a' | '\xhhhh' | '\uhhhh'
<opt space> :: Optional white spaces
<opt quotes> :: Optional quotes
Optional quotes :: " | ' | empty
Optional white spaces :: '  ' | '\r' | '\n' | '\t' | '\v' | '\f' | empty
empty :: ''

The BNF does look complex and scary. Parsing html into a syntax tree the way a web browser does is hard. Fortunately, our case is not that difficult, and we can forget about the BNF listing above.

The parser just needs to focus on four key tokens. A starting angle bracket, closing angle bracket, single quote and double quote.

< > ' "

A stack can be used to collect the html tag encountered in an input stream. When the parser encounters a start bracket, '<', it initializes an empty stack and pushes the start bracket into the stack. Characters that come after the start bracket are pushed into the stack as well.

If a single or double quote is seen, a toggling flag is set to indicate the start of string content. A corresponding closing quotation mark is required to end the string content. When the parser finally sees an end bracket, '>', it pushes it into the stack and the complete html tag is now present on the stack.

Toggling flags are used to determine whether a bracket, '<' or '>', represents a token or is part of a string. Any '<' or '>' character encountered after the start bracket and a quotation mark is part of a string. It is treated as a normal character and pushed into the stack. When the corresponding closing quotation mark is seen, the relevant toggling flag is reset. Any '<' or '>' encountered afterwards is interpreted as the start or end token of an html element.

This toggling mechanism applies to the single and double quotation marks too. A single quote that appears after a start bracket and a double quote is part of a string. A double quote that appears after a start bracket and a single quote is part of a string.

Any characters encountered before a start bracket, '<', are ignored. These are the content of the html document. A fresh stack is initialized each time the starting bracket is encountered.

These simple rules are sufficient to extract an html element from an input stream. It is really not as complicated or scary as we first thought. We will look at the parser code later in the implementation section of this article.
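The rules above can be sketched in a few lines of plain C. This is only an illustration and not the module's parser, which is covered later: the names tag_scanner_t, tag_scanner_feed and first_tag are made up for this example, and the 512 byte stack simply mirrors the limit mentioned later in the article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_STACK_SZ 512   /* hypothetical limit for this sketch */

typedef struct {
    char   data[TAG_STACK_SZ];
    size_t top;        /* number of characters currently on the stack */
    int    in_tag;     /* set once a start bracket '<' has been seen */
    int    in_dquote;  /* toggled by '"' inside a tag */
    int    in_squote;  /* toggled by '\'' inside a tag */
} tag_scanner_t;

/*
 * Feed one character to the scanner. Returns 1 when a complete tag is on
 * the stack, 0 when more input is needed, -1 if the tag overflows the stack.
 */
static int tag_scanner_feed(tag_scanner_t *s, char c)
{
    assert(s != NULL);

    if (!s->in_tag) {
        if (c != '<') {
            return 0;                     /* document content: ignored */
        }
        s->top = 0;                       /* fresh stack for every tag */
        s->in_tag = 1;
        s->in_dquote = s->in_squote = 0;
    } else if (c == '"' && !s->in_squote) {
        s->in_dquote = !s->in_dquote;     /* open or close a "..." string */
    } else if (c == '\'' && !s->in_dquote) {
        s->in_squote = !s->in_squote;     /* open or close a '...' string */
    } else if (c == '<' && !s->in_dquote && !s->in_squote) {
        s->top = 0;                       /* a new start bracket restarts */
    }

    if (s->top >= TAG_STACK_SZ) {
        s->in_tag = 0;
        return -1;
    }
    s->data[s->top++] = c;

    /* A '>' outside of any quoted string ends the tag. */
    if (c == '>' && !s->in_dquote && !s->in_squote) {
        s->in_tag = 0;
        return 1;
    }
    return 0;
}

/* Copy the first complete tag of a NUL-terminated input into out. */
static int first_tag(const char *input, char *out, size_t outsz)
{
    tag_scanner_t s;

    memset(&s, 0, sizeof(s));
    for (; *input != '\0'; input++) {
        if (tag_scanner_feed(&s, *input) == 1) {
            if (s.top + 1 > outsz) {
                return 0;
            }
            memcpy(out, s.data, s.top);
            out[s.top] = '\0';
            return 1;
        }
    }
    return 0;
}
```

Because the scanner keeps all of its state in one structure and consumes a single character at a time, the input can arrive in pieces, which is how the response body reaches a filter.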

Nginx Buffer Chains and Text Insertion

Nginx stores the content of the HTTP response body into a linked list of buffers using chain links (ngx_chain_t). Each buffer structure (ngx_buf_t) in the linked list holds a part of the HTTP response body. The final buffer has a special flag, last_buf, configured. This marks it as the last buffer in the output.

Nginx buffer chain diagram
Fig 2. Nginx chain of buffers

More than one chain of buffers may be required to hold the entire content of an HTTP response body. Nginx passes each chain to the filter module as and when data is available.

The job of our html parser is to process each of these buffers, looking for the <head> tag. Each buffer (ngx_buf_t) has a pointer to a block of memory space holding the actual response content. The parser treats this memory block as an input stream starting with the first buffer.

When the <head> tag is found, its end position must be in the memory block held by the current buffer. To insert our own text string, this buffer will be split and relinked with our text in the middle. The following illustrates how an original buffer is split into 3 new buffers with the inserted text.

Nginx buffers Insertion of Text 1
Fig 3. Insertion of Text 1

If the original buffer doesn't contain any data after the <head> tag, our text can be linked directly to this buffer.

Nginx buffers Insertion of Text 2
Fig 4. Insertion of Text 2

The new set of buffers with the inserted text are linked up in the correct order with other buffers in the nginx output chain. This modified chain link is then passed to other filters in nginx for processing. The content will eventually be sent to the user.
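The splitting and relinking can be illustrated with a small standalone sketch. The buf_t and chain_t types below are simplified stand-ins for ngx_buf_t and ngx_chain_t, and new_link, insert_after and chain_to_string are hypothetical helpers written for this example. A real module would obtain its chain links and buffers from the request pool instead of calloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for ngx_buf_t and ngx_chain_t (not the real types). */
typedef struct {
    const char *pos;       /* start of the data in the memory block */
    const char *last;      /* one past the end of the data */
    int         last_buf;  /* set on the final buffer of the response */
} buf_t;

typedef struct chain_s chain_t;
struct chain_s {
    buf_t    buf;
    chain_t *next;
};

static chain_t *new_link(const char *pos, const char *last)
{
    chain_t *cl = calloc(1, sizeof(*cl));

    assert(cl != NULL);
    cl->buf.pos = pos;
    cl->buf.last = last;
    return cl;
}

/*
 * Insert text after offset index (the position of the '>' that closes
 * <head>) in the buffer of link cl. No response data is copied: the
 * original buffer is trimmed, and new links point at the inserted text
 * and at whatever data follows the tag.
 */
static void insert_after(chain_t *cl, size_t index, const char *text)
{
    const char *split = cl->buf.pos + index + 1;
    const char *end = cl->buf.last;
    chain_t    *rest = cl->next;
    chain_t    *ins;
    int         was_last = cl->buf.last_buf;

    assert(split <= end);

    ins = new_link(text, text + strlen(text));
    cl->buf.last = split;          /* first part: up to and including '>' */
    cl->buf.last_buf = 0;
    cl->next = ins;

    if (split < end) {             /* data follows the tag: three buffers */
        chain_t *tail = new_link(split, end);

        tail->buf.last_buf = was_last;
        tail->next = rest;
        ins->next = tail;
    } else {                       /* tag ends the buffer: two buffers */
        ins->buf.last_buf = was_last;
        ins->next = rest;
    }
}

/* Flatten a chain into a string so the result can be inspected. */
static size_t chain_to_string(const chain_t *cl, char *out, size_t outsz)
{
    size_t n = 0;

    for (; cl != NULL; cl = cl->next) {
        size_t len = (size_t) (cl->buf.last - cl->buf.pos);

        assert(n + len < outsz);
        memcpy(out + n, cl->buf.pos, len);
        n += len;
    }
    out[n] = '\0';
    return n;
}
```

Note how the last_buf flag moves to whichever buffer ends up last, so the chain still marks the end of the response correctly after the insertion.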

So far, there are 3 diagrams showing the structure of Nginx buffer chains but they are actually high level abstract views, meant to describe the concepts of text insertion.

The actual data structures are more like the following.

More detailed view Nginx buffers Chain
Fig 5. A more detailed view Nginx Buffers Chain

The diagram shows a single linked list of ngx_chain_t (chain links) containing ngx_buf_t (buffers) that point to blocks of memory holding the content of the HTTP response body. The final buffer in the link has the last_buf flag set to true. This indicates the end of output for the HTTP response.

Take note that the HTTP response can be stored in multiple sequential chain links. The filter module has to check the last_buf flag to determine the end of the HTTP response.

It is useful to keep the above diagram in mind; the filter module will be working on these chains of structures. It is easier to understand the source code when one can visualize these structures.

Refer to the official Nginx Development Guide for a detailed description of the ngx_chain_t and ngx_buf_t structures.

A Big Picture View of the Filter Setup

The earlier description about the html parser and text insertion is the core of the filter module that will be implemented. Here, we will show a big picture view of how this filter module can be deployed and used.

Nginx Reverse Proxy Setup Architecture
Fig 6. Nginx Reverse Proxy Setup Architecture

In the diagram above, Nginx and the web server are located on the same machine. The web server listens only on localhost (127.0.0.1) and accepts traffic only from Nginx. Nginx is set up as a reverse proxy with the filter module installed. Incoming client requests are forwarded to the web server. The outgoing response from the web server is intercepted by Nginx and modified with the inserted text (a monitoring script).

Nginx is configured with TLS (Transport Layer Security, the protocol underlying HTTPS) and serves as the TLS termination proxy for the web server. Caching will be enabled on Nginx to improve performance.

There are a few other things the filter module has to handle. For example, if the original content from the web server is compressed (gzip or deflate), the filter will let the compressed content pass through unmodified. The web server should therefore disable compression and let Nginx itself handle content compression.

The order of module loading in Nginx is important. The filter module needs to run before Nginx's gzip module; otherwise, it cannot process the content that is compressed by gzip. By default, the filter module will run before gzip. The filter module will only handle html content type. Other content types like images, javascript, stylesheets or binary will be passed through unmodified.

The filter module will check the HTTP status code as well. If the status is not HTTP 200, the content will pass through unmodified. This means error pages will not have the text inserted.
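The checks described so far (status code, content type and compression) boil down to a small decision function. The sketch below is a standalone illustration: response_info_t and should_filter are hypothetical names, and the real module reads these values from the nginx request structure in its response headers filter, which is shown later.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical summary of the response headers the filter cares about. */
typedef struct {
    int         status;            /* HTTP status code */
    const char *content_type;      /* may be NULL if the header is absent */
    const char *content_encoding;  /* NULL or "" when not compressed */
} response_info_t;

/* Case-insensitive test for "s starts with prefix". */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix)) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if the response body should be parsed, 0 to pass it through. */
static int should_filter(const response_info_t *r)
{
    assert(r != NULL);

    if (r->status != 200) {
        return 0;                  /* error pages are left untouched */
    }
    if (r->content_type == NULL
        || !has_prefix_nocase(r->content_type, "text/html"))
    {
        return 0;                  /* images, scripts, stylesheets ... */
    }
    if (r->content_encoding != NULL && r->content_encoding[0] != '\0') {
        return 0;                  /* already compressed by the upstream */
    }
    return 1;
}
```

The prefix comparison is a simplification so that a value such as "text/html; charset=utf-8" is accepted.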

Our filter also needs to handle malformed html, such as documents without a <head> tag or with multiple <head> tags. The text string is inserted only once, after the first <head> tag that is encountered.

The <head> tag has to be within the first 256 characters of the HTTP response body, because the filter module only processes the first 256 characters of an HTTP response. Most well-formed html content has the <head> tag right at the beginning of the document. The 256-character limit can be changed in the source code.

Another limit is that a single html tag, including its attributes, cannot be longer than 512 characters; the maximum stack size for the parser is set to 512. This limit should not be hit, as the 256-character limit will have been triggered much earlier.

Logical Flow of the Filter Module

The big picture view above has shown how the module can be deployed, along with some of its limits and features. We can work out the behaviour of the filter module using a logic flow diagram. This will provide more clarity when writing the module code.

The simple block diagram below shows the logical flow of the filter module.

Logic flow of the Nginx Filter Module
Fig 7. Logic Flow of Nginx Filter Module

The current buffer from the chain link is processed and there are two possible outcomes. The <head> tag is found within the first 256 characters of the current buffer or it is not found.

If the <head> tag is found, our text will be inserted as described earlier. The modified buffers will be linked to the other buffers in the chain link and eventually its new content will be sent to the user.

For the case where the <head> tag is not found, the filter module will log an alert in the nginx error log. The current buffer is already a part of the chain link of buffers and no modification is made. The chain link will be processed by Nginx and the unmodified content will eventually be sent to the user.

Performance Considerations

The filter module needs to be fast. An nginx setup may include many other modules; our module needs to do its work quickly and pass the output on to the other modules and nginx for processing.

The html parsing and text insertion are done in a single pass through the chain of buffers. The parser only processes the first 256 characters of the response body. Anything that comes after is not parsed. This avoids parsing the whole response body, which improves performance.

HTTP Chunked Transfer Encoding

A particular problem when modifying an HTTP response body is determining the new content length. In our case, we cannot tell whether a <head> tag is present until we have processed the content. Therefore, we cannot determine in advance the value of the Content-Length header that is to be sent.

The standard solution is to use HTTP chunked transfer encoding, which allows a response body of unknown size to be sent. Some tricks can be used to avoid chunked transfer encoding.

For example, we can add the length of the text string to the Content-Length header in advance. If the <head> tag is eventually not found, we can append blank padding to the output so that it matches the declared content length. If the <head> tag is found, the inserted text makes up the difference and the Content-Length header is correct.
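The arithmetic behind this padding trick is simple, and a short sketch makes it concrete. The function names below are made up for this illustration and are not taken from any version of the module.

```c
#include <assert.h>
#include <stddef.h>

/* Content-Length announced before the body has been seen. */
static size_t announced_length(size_t original_len, size_t insert_len)
{
    return original_len + insert_len;
}

/* Blank padding to append once the whole body has been processed. */
static size_t padding_needed(size_t insert_len, int head_found)
{
    return head_found ? 0 : insert_len;
}

/* Total bytes sent; this always equals the announced length. */
static size_t bytes_sent(size_t original_len, size_t insert_len,
                         int head_found)
{
    size_t sent = original_len
                  + (head_found ? insert_len : 0)
                  + padding_needed(insert_len, head_found);

    assert(sent == announced_length(original_len, insert_len));
    return sent;
}
```

For a 1000 byte page and a 40 byte script, 1040 bytes are announced. If <head> is found, the script accounts for the extra 40 bytes; if not, 40 bytes of padding do.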

For simplicity, our filter uses chunked transfer encoding. Earlier versions of the filter module actually implemented the padding to avoid chunked transfer encoding. There were also other features, such as blanking a page if <head> is not found.

All these additional features and tricks add complexity and can lead to bugs. Poor understanding of the module's behaviour can also lead to misconfiguration. In the end, I reverted to a simple design for this filter module. The aim is simplicity and performance.

In the future though, I may come up with another filter module that has mandatory blocking, as this can be useful in security. Readers who are interested in the earlier versions can refer to the Github link for the module at the end of this article. The README.md describes how to check out the version from before I reverted to this simple design.

Structure of an Nginx HTTP Filter Module

This section will briefly run through some of the components of an Nginx module. This will help in understanding how the filter module works when going through its source code later.

The official Nginx Development Guide is the main reference for learning about developing nginx modules. It provides detailed information on the header files to include, the return codes that are supported, the functions available, and the various Nginx data types such as ngx_str_t (String), arrays, lists and so on. It also contains many code examples that one can refer to.

The official guide is rather long, and multiple readings are probably required to understand the content. An easier introduction is Emiller's Guide To Nginx Module Development, a useful tutorial for beginners learning to write Nginx modules.

Components of Nginx Module

There are 3 important Nginx data structures that modules rely on.

  1. Module Definition
  2. Module Context
  3. Module Directive Structure

The following table describes each item in more detail. The source definition column provides the link to the actual nginx source code where the structure is defined.

Data Structure Description Source Definition
ngx_module_t
(Module Definition)

This structure is the module definition. It is a typedef of ngx_module_s and it defines the module. It is a global variable for each module. At the top of the structure is version information that can be filled in using the macro NGX_MODULE_V1. There are also several unused fields for future extensions at the bottom of the struct that can be filled with NGX_MODULE_V1_PADDING.

For the remaining fields, we are interested in only 3 of them. The rest are handlers that can be called at various points in the Nginx cycle. These are set to NULL. The 3 fields that concern us are as follows.

  • void *ctx;
    This takes the module context (ngx_http_module_t) which contains the function handlers for creating module configuration struct and merging module configuration. ngx_http_module_t is covered later in this table.
  • ngx_command_t *commands;
    This takes a pointer to an array of ngx_command_t. Each ngx_command_t defines a directive that the module takes. ngx_command_t is covered later in this table.
  • ngx_uint_t type;
    This defines the type of module (it lets Nginx know what is stored in ctx), such as NGX_CORE_MODULE or NGX_HTTP_MODULE.
Source Def
ngx_http_module_t
(Module Context)

Module context, a static data structure that defines the handlers for the creation and initialization of a module's configuration struct. It includes handlers that can run pre and post configuration.

A module can have its own configuration struct that contains the parameters it requires. The function handlers defined here are for the creation and merging of the module configuration struct. There are separate pairs of function handlers for the module configuration that appears in Nginx's main config block, server config block and location block.

For those handlers that are not needed, NULL can be specified. For example, if a module only has directives in Nginx's location block and it doesn't require merging values from higher levels, the function handler for creating a location configuration can be specified, while all others are set to NULL.

Source Def
ngx_command_t
(Module Directive Structure)

This is a typedef of ngx_command_s, for defining a module directive. A static array of ngx_command_t containing the directives of a module is passed to Nginx. The array is terminated by ngx_null_command. ngx_command_t has the following fields.

  • Directive Name
    An ngx string for the name of the directive.
  • Bitmask
    Indicates where the directive will be configured (eg. HTTP, server or location block in the Nginx config file). The bitmask also indicates how many and what arguments the directive takes.
  • Set Function pointer
    A set handler function for saving the directive arguments. Nginx has several pre-defined set functions for saving various directive arguments like boolean, string etc... A custom handler can also be specified.
  • Configuration Structure
    This specifies the configuration structure passed to the directive handler. If a module directive is configured in the server context/block of the Nginx config file, then the server context offset (NGX_HTTP_SRV_CONF_OFFSET) should be specified here. The handler function uses this information to locate the right module configuration.
  • Parameter offset
    This is where the parameter for the module configuration is located. The set handler function will save the directive argument here.
  • Post
    A secondary function pointer can be specified that will be called after the earlier set function handler has saved the directive argument. This field can also hold a default value that can be used by some of the Nginx pre-defined set functions.
Source Def

Nginx Module Filter Chain

Besides the 3 data structures described above, we need to know a bit about how Nginx handles http filter modules. Nginx treats http filter modules like a chain too. The first filter will call the second and the second calls the third and so on... until the last. There are two separate chains, one for handling HTTP response headers and another for the HTTP response body.

A filter module can register a handler for HTTP response headers, as well as a handler for HTTP response body.

Registration can be done in an initialization function defined as a post configuration function in the module context. The module context (ngx_http_module_t) is described in the table earlier.

The filter handlers take the arguments and return the values required by Nginx. For example, an HTTP response headers handler function takes a pointer to ngx_http_request_t as its argument and returns ngx_int_t. This handler function calls the next response headers handler in the chain when it is done.

The following is a function prototype of a filter handler for HTTP headers. The code is from our filter module.

static ngx_int_t ngx_http_html_head_header_filter(ngx_http_request_t *r );

The nginx request structure, ngx_http_request_t, contains much useful information, such as the HTTP status of the response, its content type and its content length. Refer to the Nginx Development Guide for the various fields stored in an ngx_http_request_t structure.

The HTTP response body filter handler takes two arguments, a pointer to ngx_http_request_t and a pointer to ngx_chain_t. It returns an ngx_int_t. The second argument, ngx_chain_t*, is a linked list of the output buffers. Each buffer stores part of the HTTP response body.

The following is the function prototype of the filter handler for the HTTP response body, taken from our filter module.

static ngx_int_t ngx_http_html_head_body_filter(ngx_http_request_t *r, ngx_chain_t *in);

Our filter module will parse the content blocks in the ngx_chain_t* linked list and insert our text after the <head> tag. Once it is done, it will call the next response body handler in the chain.

Note that the response body filter handler function can be called many times in a single request. This is due to the asynchronous, non-blocking I/O that enables nginx to achieve high performance. The filter handler is called whenever data is available for processing.

There are two global variables that are used by Nginx for registering the handler functions. The initialization function of our filter module sets these two variables when registering the handlers.

  • ngx_http_top_header_filter is a global pointer for storing the first HTTP response headers filter handler.

  • ngx_http_top_body_filter is a global pointer that stores the first HTTP response body filter handler.

We will see how these 2 variables are used when going through the source code.
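Before looking at the real code, the registration pattern can be illustrated with a standalone sketch. Here top_body_filter is a stand-in for ngx_http_top_body_filter, and the filters a, b and write, as well as the log string they append to, are invented for this example; real handlers have the signatures shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified handler type: every filter appends its name to a log string. */
typedef int (*body_filter_pt)(char *log, size_t logsz);

/* Stand-in for the global ngx_http_top_body_filter pointer. */
static body_filter_pt top_body_filter;

static void trace(char *log, size_t logsz, const char *name)
{
    assert(strlen(log) + strlen(name) + 2 <= logsz);
    strcat(log, name);
    strcat(log, " ");
}

/* The last filter in the chain: it has no next filter to call. */
static int write_filter(char *log, size_t logsz)
{
    trace(log, logsz, "write");
    return 0;
}

/* Each module keeps its own private "next" pointer. */
static body_filter_pt next_a;
static body_filter_pt next_b;

static int filter_a(char *log, size_t logsz)
{
    trace(log, logsz, "a");
    return next_a(log, logsz);     /* hand over to the next filter */
}

static int filter_b(char *log, size_t logsz)
{
    trace(log, logsz, "b");
    return next_b(log, logsz);
}

/* Registration: save the current top, then become the new top. */
static void init_a(void)
{
    next_a = top_body_filter;
    top_body_filter = filter_a;
}

static void init_b(void)
{
    next_b = top_body_filter;
    top_body_filter = filter_b;
}

static void run_demo(char *log, size_t logsz)
{
    log[0] = '\0';
    top_body_filter = write_filter;
    init_a();
    init_b();
    top_body_filter(log, logsz);   /* runs b, then a, then write */
}
```

A consequence of this pattern is that the filter registered last runs first, which is why the order in which modules are loaded matters.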

Module Config Shell File

To tell Nginx about the filter module, a config file is required. This is just a regular shell file. It tells Nginx the module name, the module type and the location of the module source code. For more details on the config file and Nginx modules, refer to the Nginx Development Guide. The Nginx Wiki provides information on the config file as well.

Let's proceed to the implementation of the filter module and hopefully these concepts will become clearer when going through actual source code.

Implementing Nginx Response Body Filter

This section runs through some of the functions and data structures in the source code for the Html Head filter module. The full source is available at the Github link at the bottom of the article.

The following is the listing for the config file of the Html Head filter module. Note that the filename of the config file is "config". It specifies the type of the module, a name for the module and a single C source file that contains the module code.

ngx_module_type=HTTP_AUX_FILTER
ngx_module_name=ngx_http_html_head_filter_module
ngx_module_srcs="$ngx_addon_dir/ngx_http_html_head_filter_module.c"

. auto/module

ngx_addon_name=$ngx_module_name

ngx_http_html_head_filter_module.c is the filter source file. The 3 Nginx header files required for HTTP module development are included at the top of the source file. Three macros are defined at the top as well.

The following code listing shows these macros and include files.

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

#define HF_MAX_STACK_SZ 512
#define HF_MAX_CHARACTERS 256
#define HF_LAST_SEARCH 54321

A brief explanation of each macro is given below.

  • HF_MAX_STACK_SZ defines the size of the parsing stack, currently set to 512.

  • HF_MAX_CHARACTERS defines the maximum number of characters in a response body that the parser will search for the <head> tag. It is currently set to 256 characters.

  • HF_LAST_SEARCH defines the return code of our parsing function if the <head> tag is not found within 256 characters.

Nginx Per Request/Response Context

Nginx allows a module to keep state information per HTTP request/response through a data structure defined by the module. We define a structure ngx_http_html_head_filter_ctx_t that stores the state of processing a response. It includes a stack, headfilter_stack_t, used by the parser.

There are also a number of other members like count, which tracks the number of characters processed by the parser so far. The filter module expects to find the <head> tag in the first 256 characters of the response body.

The following shows the code for the per request/response context structure and the parser stack.

/* Stack for parsing html */
typedef struct
{
    u_char    data[HF_MAX_STACK_SZ];
    ngx_int_t top;
}
headfilter_stack_t;


/*
 * Module data struct for maintaining
 * state per request/response
 */
typedef struct
{
    ngx_uint_t  last_search;
    ngx_uint_t  log_once;
    ngx_uint_t  last;
    ngx_uint_t  count;
    ngx_uint_t  index;
    ngx_uint_t  found;
    ngx_uint_t  starttag;
    ngx_uint_t  tagquote;
    ngx_uint_t  tagsquote;
    headfilter_stack_t stack;
    ngx_chain_t  *free;
    ngx_chain_t  *busy;
    ngx_chain_t  *out;
    ngx_chain_t  *in;
    ngx_chain_t  **last_out;
}
ngx_http_html_head_filter_ctx_t;

The index variable stores the current position in the memory block of a buffer that the parser is processing. If a <head> tag is found, index will point to the position of the closing bracket ">" in the memory block of the current buffer. This information will be used for splitting up the buffer and inserting our text.

Structure members like found, last_search and last are flags to indicate certain conditions. The variable found is set to true when the <head> tag is found. last_search is set when the characters limit of 256 is hit. last is set when the last buffer of the output is processed.

starttag, tagquote and tagsquote are used by the parser when parsing the content block.

The ngx_chain_t pointers, free, busy, out and in, are used together with the pointer to pointer, last_out, for handling the incoming and outgoing buffers chains. free and busy are required for buffer reuse. Refer to the Nginx Development Guide for more details on buffer reuse.

Saving and Retrieving Per Request/Response Context

Nginx offers two functions, ngx_http_set_ctx(r, ctx, module) and ngx_http_get_module_ctx(r, module) for saving and retrieving the module's per request/response context.

In our filter module implementation, ngx_http_set_ctx() function is called by the response headers filter handler when creating and initializing the per request/response context structure. The response body handler calls ngx_http_get_module_ctx() to retrieve the per request/response context structure.

If this structure is NULL, the response body handler skips processing and calls the next response body filter in the filter chain. The response headers filter handler does not create this context if certain checks fail, for example, if the content type is not "text/html". We shall see this later in the source code.
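The idea behind these two calls can be sketched without nginx. The sketch assumes that the per request context is stored in an array on the request, indexed by a module specific index, which is my understanding of how the nginx macros work. All of the types and names below (request_t, module_t, set_ctx, get_module_ctx and the two filter functions) are simplified stand-ins for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MODULES 8              /* hypothetical, for this sketch only */

typedef struct {
    size_t ctx_index;              /* slot assigned to the module */
} module_t;

typedef struct {
    void *ctx[MAX_MODULES];        /* one context slot per module */
} request_t;

/* Stand-ins for ngx_http_get_module_ctx() and ngx_http_set_ctx(). */
#define get_module_ctx(r, module)  ((r)->ctx[(module).ctx_index])
#define set_ctx(r, c, module)      ((r)->ctx[(module).ctx_index] = (c))

typedef struct {
    unsigned found;
    unsigned count;
} head_filter_ctx_t;

static module_t head_filter_module = { 3 };

/* The headers filter creates the context only for eligible responses. */
static void header_filter(request_t *r, head_filter_ctx_t *storage,
                          int eligible)
{
    assert(r != NULL);

    if (!eligible) {
        return;                    /* no context: the body filter skips */
    }
    storage->found = 0;
    storage->count = 0;
    set_ctx(r, storage, head_filter_module);
}

/* Returns 1 if the data was processed, 0 if it was passed through. */
static int body_filter(request_t *r, unsigned nbytes)
{
    head_filter_ctx_t *ctx = get_module_ctx(r, head_filter_module);

    if (ctx == NULL) {
        return 0;
    }
    ctx->count += nbytes;          /* state survives between calls */
    return 1;
}
```

Since the body filter may be called several times for one response, the context is what lets the parser remember how many characters it has already seen.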

Structure for Storing Module Configuration

The following is the data structure for storing the arguments of the configuration directives. When the nginx configuration file is processed, the arguments for our filter module directive will be stored into this structure.

/* Configuration struct for module */
typedef struct
{
    ngx_str_t insert_text;
}
ngx_http_html_head_filter_loc_conf_t;

static ngx_http_output_header_filter_pt  ngx_http_next_header_filter;
static ngx_http_output_body_filter_pt    ngx_http_next_body_filter;

ngx_http_html_head_filter_loc_conf_t has a string field, insert_text, that holds the text to be inserted after the <head> tag. This is the only configuration directive for our simple filter module.

The two static variables ngx_http_next_header_filter and ngx_http_next_body_filter, are pointers for storing the next header filter and body filter in the Nginx chain of filters. These are set during initialization of our filter module and are called when our module has done its work.

Module Directives

The following listing shows the directive that our filter module will take. The directives are declared as a static array of ngx_command_t structures.

/* Module directives */
static ngx_command_t ngx_http_html_head_filter_commands[] =
{
   {
     ngx_string("html_head_filter"),     /* Module Directive name */
     NGX_HTTP_LOC_CONF | NGX_CONF_1MORE, /* Directive location and argument */
     ngx_conf_set_str_slot,              /* Handler function */
     NGX_HTTP_LOC_CONF_OFFSET,           /* Save to loc config */ 
     offsetof(ngx_http_html_head_filter_loc_conf_t, insert_text), /* loc para */
     NULL
   },
      
   ngx_null_command
};

ngx_http_html_head_filter_commands[ ] is an array of ngx_command_t. It holds a single directive for our filter module and is terminated by ngx_null_command.

The directive that is defined is "html_head_filter". The following describes its individual fields.

  • Its first field is simply the directive name, an ngx_str_t, "html_head_filter".

  • The second field is a bitmask that defines where this directive can occur in the nginx configuration file (NGX_HTTP_LOC_CONF) and the number of arguments (NGX_CONF_1MORE) that it takes. In our case, we specify that this directive can occur in the location context of the nginx configuration file and takes one or more arguments. The argument is a string, the text to be inserted after the <head> tag. Note that ngx_conf_set_str_slot( ) saves only the first argument, so the text should be passed as a single quoted string.

  • The third field is the handler function that is called to read in our directive and set its argument. In this case, we use one of the set functions provided by Nginx. ngx_conf_set_str_slot( ) reads a string argument and saves it in our module configuration structure.

  • The fourth field, NGX_HTTP_LOC_CONF_OFFSET, tells the handler function that our module configuration structure is a location configuration.

  • The fifth field specifies the offset for saving the argument. In this case, the argument is saved in the insert_text member of our ngx_http_html_head_filter_loc_conf_t module configuration structure.

  • The sixth field allows a post handler to be specified for further processing of the directive argument. We do not use it and set it to NULL.

Note that the "html_head_filter" directive is required in order to enable the filter module. If this directive is not set in the nginx configuration, our filter module will skip processing.
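It may help to see why a generic set function needs the offset from the fifth field. The following standalone sketch imitates the idea with simplified stand-in types: str_t, loc_conf_t, command_t and set_str_slot are made up for this illustration and are not the nginx definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for ngx_str_t. */
typedef struct {
    size_t      len;
    const char *data;
} str_t;

/* Simplified stand-in for the module configuration struct. */
typedef struct {
    str_t insert_text;
} loc_conf_t;

/* A directive: its name and where its argument is to be stored. */
typedef struct {
    const char *name;
    size_t      offset;
} command_t;

/*
 * Generic string setter. It knows nothing about loc_conf_t: it only
 * adds the offset to the start of the configuration struct to find the
 * field to fill in. Returns 0 on success, -1 if the field is already set.
 */
static int set_str_slot(const command_t *cmd, void *conf, const char *arg)
{
    str_t *field = (str_t *) ((char *) conf + cmd->offset);

    assert(arg != NULL);

    if (field->data != NULL) {
        return -1;                 /* directive was given twice */
    }
    field->len = strlen(arg);
    field->data = arg;
    return 0;
}

static const command_t head_filter_cmd = {
    "html_head_filter",
    offsetof(loc_conf_t, insert_text)
};
```

This is why one set function can serve many modules: the directive table supplies the struct layout, and the function only does pointer arithmetic.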

Nginx Module Context

The module context, ngx_http_html_head_filter_ctx, sets three function handlers. The following shows the code listing.

/* Module context */
static ngx_http_module_t  ngx_http_html_head_filter_ctx =
{
    NULL,                             /* Pre config */
    ngx_http_html_head_init,          /* Post config */
    NULL,                             /* Create main config */
    NULL,                             /* Init main config */
    NULL,                             /* Create server config */
    NULL,                             /* Merge server config */
    ngx_http_html_head_create_conf,   /* Create loc config */
    ngx_http_html_head_merge_loc_conf /* Merge loc config */
};

ngx_http_html_head_init( ) is used for initializing the module after configuration is done and ngx_http_html_head_create_conf( ) is for creating the module configuration structure. ngx_http_html_head_merge_loc_conf( ) function is used for merging configuration directives from parent location contexts in the nginx configuration file.

More details of these 3 functions are provided below.

The module initialization function

The ngx_http_html_head_init( ) function initializes the module and registers our handlers in the filter chain. This function was set in the post configuration field of the module context shown earlier. Nginx calls it after the configuration has been read.

The module's header filter and body filter handler functions are assigned to the global ngx_http_top_header_filter and ngx_http_top_body_filter pointers respectively. Nginx will call these and hence invoke our filter handlers.

The original function handlers in these 2 global pointers are saved in ngx_http_next_header_filter and ngx_http_next_body_filter respectively. When our module completes its work, it will in turn call these saved function handlers. This establishes the Nginx filter chain, enabling one filter to call the next until the last in the filter chain.

The following shows the source code for the ngx_http_html_head_init( ) function.

/* Function to initialize the module */
static ngx_int_t
ngx_http_html_head_init(ngx_conf_t * cfg)
{

    ngx_http_next_header_filter = ngx_http_top_header_filter;
    ngx_http_top_header_filter = ngx_http_html_head_header_filter;

    ngx_http_next_body_filter = ngx_http_top_body_filter;
    ngx_http_top_body_filter = ngx_http_html_head_body_filter;

    return NGX_OK;

}
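The save-and-replace pattern used above can be tried out in isolation. The following is a minimal sketch with hypothetical names (demo_top_filter, demo_filter_a, demo_filter_b); each filter simply appends its name to a trace string so that the calling order becomes visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical filter signature: each filter appends its name to `trace`. */
typedef int (*demo_filter_pt)(char *trace, size_t cap);

/* Appends `s` to `trace` without overflowing the `cap`-byte array. */
static void
demo_trace(char *trace, size_t cap, const char *s)
{
    size_t used = strlen(trace);

    if (used + 1 < cap) {
        strncat(trace, s, cap - used - 1);
    }
}

/* Last handler in the chain; it has no "next" to call. */
static int
demo_write_filter(char *trace, size_t cap)
{
    demo_trace(trace, cap, "write");
    return 0;
}

/* Global head of the chain, analogous to ngx_http_top_body_filter. */
static demo_filter_pt demo_top_filter = demo_write_filter;

/* Each module keeps its own saved "next" pointer. */
static demo_filter_pt demo_next_a;
static demo_filter_pt demo_next_b;

static int
demo_filter_a(char *trace, size_t cap)
{
    demo_trace(trace, cap, "A>");
    return demo_next_a(trace, cap);
}

static int
demo_filter_b(char *trace, size_t cap)
{
    demo_trace(trace, cap, "B>");
    return demo_next_b(trace, cap);
}

/* Same two assignments as in ngx_http_html_head_init( ). */
static void
demo_init_a(void)
{
    demo_next_a = demo_top_filter;
    demo_top_filter = demo_filter_a;
}

static void
demo_init_b(void)
{
    demo_next_b = demo_top_filter;
    demo_top_filter = demo_filter_b;
}
```

After demo_init_a( ) and then demo_init_b( ), calling demo_top_filter( ) on an empty trace produces "B>A>write". The filter that registers last runs first, which is why the order in which modules are initialized matters for filters.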

The module configuration creation and merge functions

The following shows the code snippets for the ngx_http_html_head_create_conf( ) and ngx_http_html_head_merge_loc_conf( ) functions.

/* Creates the module location config struct */
static void* 
ngx_http_html_head_create_conf(ngx_conf_t *cf)
{

    ngx_http_html_head_filter_loc_conf_t *conf;
    conf = ngx_pcalloc(cf->pool, sizeof(ngx_http_html_head_filter_loc_conf_t));
    if(conf == NULL)
    {
        ngx_conf_log_error(NGX_LOG_EMERG, cf, 0,
            "[Html_head filter]: ngx_http_html_head_create_conf: "
            " cannot allocate memory for config");
        /* A create-conf handler signals failure by returning NULL */
        return NULL;
    }

    return conf;

}

/* Merges the module location config struct */
static char* 
ngx_http_html_head_merge_loc_conf(ngx_conf_t *cf,                 
    void *parent, void *child) 
{

    ngx_http_html_head_filter_loc_conf_t *prev = parent;
    ngx_http_html_head_filter_loc_conf_t *conf = child;

    /* The default must be a string literal; the macro uses sizeof on it */
    ngx_conf_merge_str_value(conf->insert_text, prev->insert_text, "");

   return NGX_CONF_OK;

}

The ngx_http_html_head_create_conf( ) function creates our module configuration structure for saving our directives. The ngx_http_html_head_merge_loc_conf( ) function merges directives that appear in parent locations with those appearing in child locations: a child location that does not set html_head_filter inherits the value of its parent.
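The inheritance rule applied by the merge step can be written out as a small standalone function. This is a sketch of the rule only, using a hypothetical demo_str_t type and demo_merge_str( ) helper rather than the Nginx macro itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ngx_str_t. */
typedef struct {
    size_t      len;
    const char *data;
} demo_str_t;

/*
 * Resolves the value seen by a child location:
 * its own setting if present, else the parent's, else the default.
 */
static void
demo_merge_str(demo_str_t *conf, const demo_str_t *prev, const char *dflt)
{
    if (conf->data != NULL) {
        return;                 /* child set the directive itself */
    }

    if (prev->data != NULL) {
        *conf = *prev;          /* inherit from the parent */
        return;
    }

    conf->data = dflt;          /* nobody set it, fall back to default */
    conf->len = strlen(dflt);
}
```

With an empty string as the default, a location where the directive was never set ends up with a length of zero, which a later check can use to skip filtering.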

Nginx Module Definition

The array of module directives, the module context and module type are specified in the ngx_module_t structure. This is the module definition discussed in the earlier section. The following shows the code.

/*
Module definition
*/
ngx_module_t  ngx_http_html_head_filter_module =
{
    NGX_MODULE_V1,
    &ngx_http_html_head_filter_ctx,     /* module context */
    ngx_http_html_head_filter_commands, /* module directives */
    NGX_HTTP_MODULE,                    /* module type */
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NGX_MODULE_V1_PADDING
};

The response headers filter function

The following shows the code listing for the ngx_http_html_head_header_filter() function. This is the handler that was registered earlier by the module initialization function. It processes the incoming HTTP response headers, performs some checks and initializes the module's per request/response context for managing state.

If any of the checks fails, the context is not created and the response headers are passed unmodified to the next header filter handler. Examples of failing checks include the "html_head_filter" directive not being set and the HTTP response being compressed.

/* Module function handler to filter http response headers */
static ngx_int_t
ngx_http_html_head_header_filter(ngx_http_request_t *r )
{

    ngx_http_html_head_filter_loc_conf_t *slcf;
    ngx_http_html_head_filter_ctx_t *ctx;

    slcf = ngx_http_get_module_loc_conf(r, ngx_http_html_head_filter_module);
    
    
    if(slcf == NULL || slcf->insert_text.data == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "null configuration");
        #endif
       
        return ngx_http_next_header_filter(r);
    }
    

    if(slcf->insert_text.len == 0)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                " empty configuration insert text");
        #endif
        
        return ngx_http_next_header_filter(r);
    }
    

    if(r->header_only || r->headers_out.content_length_n == 0)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "header only, invalid content length");
        #endif
        
        return ngx_http_next_header_filter(r);
    }
    
     
    if(ngx_test_content_type(r) == 0) 
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "content type not html");
        #endif            
        
        return ngx_http_next_header_filter(r);
    }

    
    if(ngx_test_content_compression(r) != 0)
    {/* Compression enabled, don't filter  */ 

        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "compression enabled");
        #endif    
                     
        return ngx_http_next_header_filter(r);
    }
 
    if(r->headers_out.status != NGX_HTTP_OK)
    {/* Response is not HTTP 200   */

        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "http response is not 200");
        #endif   
                     
        return ngx_http_next_header_filter(r);
    }

    r->filter_need_in_memory = 1;

    if (r == r->main) 
    {/* Main request */
        
         ngx_http_clear_content_length(r);
         ngx_http_weak_etag(r);
     
    }
    

    ctx = ngx_http_get_module_ctx(r, ngx_http_html_head_filter_module);
    if(ctx == NULL)
    {
        ctx = ngx_pcalloc(r->pool, 
                sizeof(ngx_http_html_head_filter_ctx_t)); 
        
        if(ctx == NULL)
        {
            #if HT_HEADF_DEBUG
                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "cannot allocate ctx memory");
            #endif 
                          
            return ngx_http_next_header_filter(r);
        }
        
        ngx_http_set_ctx(r, ctx, ngx_http_html_head_filter_module);
    }
    
    /* Initializes the last output chain */
    ctx->last_out = &ctx->out;
    
    return ngx_http_next_header_filter(r);
    
}
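The sequence of checks in the header filter amounts to a single yes/no decision. The sketch below restates it as a standalone function. demo_response_t and demo_should_filter( ) are hypothetical, and the content type and compression tests are assumptions (a case-insensitive "text/html" prefix and the presence of a content encoding), since ngx_test_content_type( ) and ngx_test_content_compression( ) are not listed in this article.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical summary of the response header fields that are checked. */
typedef struct {
    int         status;            /* HTTP status code */
    int         header_only;       /* no body will follow */
    long        content_length;    /* -1 when unknown */
    const char *content_type;      /* NULL when absent */
    const char *content_encoding;  /* NULL when not compressed */
} demo_response_t;

/* Case-insensitive test that `s` starts with `prefix`. */
static int
demo_has_prefix_nocase(const char *s, const char *prefix)
{
    for ( ; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix)) {
            return 0;
        }
    }

    return 1;
}

/* Returns 1 if the body should be filtered, 0 to pass it through. */
static int
demo_should_filter(const demo_response_t *res, const char *insert_text)
{
    if (insert_text == NULL || insert_text[0] == '\0') {
        return 0;   /* directive not set or empty */
    }

    if (res->header_only || res->content_length == 0) {
        return 0;   /* nothing to insert into */
    }

    if (res->content_type == NULL
        || !demo_has_prefix_nocase(res->content_type, "text/html"))
    {
        return 0;   /* not html (assumed test) */
    }

    if (res->content_encoding != NULL) {
        return 0;   /* compressed body cannot be parsed (assumed test) */
    }

    return res->status == 200;
}
```

Keeping the decision in one place like this makes it easy to see that every failing check leads to the same outcome: the response is passed on untouched.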

The response body filter function

The following is the code listing for the ngx_http_html_head_body_filter() function. Like the header filter handler, this function is registered by the module initialization function.

/*
 * Module function handler to filter the html response body
 * and insert the text string
 * 
*/
static ngx_int_t
ngx_http_html_head_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{

    ngx_int_t                               rc;
    ngx_http_html_head_filter_ctx_t         *ctx;
    ngx_http_html_head_filter_loc_conf_t    *slcf;
   
  
    slcf = ngx_http_get_module_loc_conf(r, ngx_http_html_head_filter_module);
    ctx = ngx_http_get_module_ctx(r, ngx_http_html_head_filter_module);

    
    if(slcf == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_body_filter: "
                "null configuration");
        #endif
       
        return ngx_http_next_body_filter(r, in);
    }


    if(ctx == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_body_filter: "
                "unable to get module ctx");
        #endif           
            
        return ngx_http_next_body_filter(r, in);
    }


    if(in == NULL && ctx->busy == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_body_filter: "
                "input and busy chain is null");
        #endif     
       
       return ngx_http_next_body_filter(r, in);
    }
	
   
    /* Copy the incoming chain to ctx->in */
    if (ngx_chain_add_copy(r->pool, &ctx->in, in) != NGX_OK) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_http_html_head_body_filter: "
            "unable to copy input chain - in");
                     
        return NGX_ERROR;
    }
    
    
    /* Loop through and process all the incoming buffers */
    while(ctx->in)
    {	
        ctx->index = 0; 
                
        if(ctx->found == 0 && ctx->last_search == 0)
        {		 
    
            rc = ngx_parse_buf_html(ctx, r);
            if(rc == NGX_OK)
            { /* <head> is found */
                ctx->found = 1; 
                rc=ngx_html_insert_output(ctx, r, slcf);
			   
                if(rc == NGX_ERROR)
                {
                    return rc; 
                }
            }
            else if(rc == HF_LAST_SEARCH)
            {
                ctx->last_search = 1;
            }
            else if(rc == NGX_ERROR)
            {
                return rc; 
            }	
            
        }	
        
        
        if(ctx->in->buf->last_buf || ctx->in->buf->last_in_chain)
        {/* Last buffer  */
           ctx->last = 1; 
        }	

	    
        *ctx->last_out=ctx->in;
        ctx->last_out=&ctx->in->next;
        ctx->in = ctx->in->next;
    }
    

    /* It doesn't output anything, return */
    if ((ctx->out == NULL) && (ctx->busy == NULL)) 
    {
        
        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
                     "[Html_head filter]: ngx_http_html_head_body_filter: "
                     "nothing to output");
                     
        return NGX_OK;
    }
    
    /* Log an alert indicating <head> tag is not found */
    if(ctx->last && !ctx->found && !ctx->log_once)
    {
        
        ngx_log_error(NGX_LOG_ALERT, r->connection->log, 0,
                      "[Html_head filter]: Cannot find <head> within "
                      "%ui characters limit", HF_MAX_CHARACTERS);
        
        ctx->log_once = 1;
        
    }
    
 
    rc = ngx_http_next_body_filter(r, ctx->out);
    ngx_chain_update_chains(r->pool, &ctx->free, &ctx->busy, &ctx->out,
                            (ngx_buf_tag_t)&ngx_http_html_head_filter_module);
                            
                            
    ctx->last_out = &ctx->out;
    ctx->in = NULL;
    
    return  rc;
    
}

Notice that the code follows the logical flow diagram shown earlier. Text insertion, though, is done as soon as the <head> tag is found while a buffer is being processed, so the buffers are modified in a single pass.

The while loop on line 67 of the listing, while(ctx->in), iterates through the incoming chain of buffers and calls the ngx_parse_buf_html( ) function to parse each buffer for the <head> tag. The <head> tag can be split over two or more consecutive buffers. The parser handles this because its stack and flags are kept in the per request/response context and therefore survive from one buffer to the next.

If the <head> tag is found, the found flag in the module per request/response context is set and the ngx_html_insert_output( ) function is called. ngx_html_insert_output( ) inserts our text after the <head> tag. The process for doing this is described in the earlier Design and Approach section.

If the <head> tag is not found within the first 256 characters, the last_search flag is set in the per request/response context. This stops ngx_parse_buf_html( ) from being called on subsequent buffers, which avoids parsing the rest of the response needlessly.

The found flag also prevents ngx_parse_buf_html( ) from being called on subsequent buffers once the <head> tag is found. It ensures that the text is inserted only once, after the first occurrence of the <head> tag, even if there are multiple <head> tags in a response body. The while loop builds the output chain that will be passed to the next nginx filter.

The ngx_http_next_body_filter() function is called once our filter has done its work.

Explaining ctx->in, ctx->out, ctx->last_out

Let's run through how the filter module actually handles the incoming buffers chain of the response body.

ctx->in and ctx->out are both pointers to ngx_chain_t. ctx->last_out is a pointer to a pointer to ngx_chain_t. When our response body handler, ngx_http_html_head_body_filter( ), is called, it is passed an incoming linked list of ngx_chain_t containing the buffers that store the response content. This linked list is copied to ctx->in. From that point on, our filter module works on its own linked list, ctx->in.

The copying is done because our filter module may replace entries in the linked list of ngx_chain_t. ngx_chain_add_copy( ) allocates new chain links that point to the same buffers, which helps ensure that the chain structures used by the previous module are not accidentally modified by our filter module. The input chain in ctx->in is then processed and placed in ctx->out. ctx->out points to the head of the linked list of ngx_chain_t containing the buffers to be sent out.

To facilitate the placement of processed buffers into ctx->out, the pointer to pointer ctx->last_out is used. ctx->last_out is initialized in the ngx_http_html_head_header_filter( ) function to the address of ctx->out, the head of the output list. As chain links are added to ctx->out, ctx->last_out is updated to the address of the next field of the most recently added link.

ctx->last_out therefore always points to the place where the next output chain link must be stored. When the output chain is sent out to the next filter, ctx->last_out is reset to the address of ctx->out. When new buffer chains arrive for our filter to process, ctx->last_out is ready to add them to ctx->out.
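The pointer to pointer idiom is easier to follow on a stripped-down list. Below is a minimal sketch with hypothetical types (demo_chain_t, demo_ctx_t) that keeps only the out and last_out members and leaves out the buffers entirely.

```c
#include <stddef.h>

/* Hypothetical chain link; `id` stands in for the buffer. */
typedef struct demo_chain_s {
    int                  id;
    struct demo_chain_s *next;
} demo_chain_t;

/* Only the two members discussed above. */
typedef struct {
    demo_chain_t  *out;        /* head of the output list */
    demo_chain_t **last_out;   /* where the next link must be stored */
} demo_ctx_t;

/* Empties the output list; also used after the list has been sent. */
static void
demo_ctx_reset(demo_ctx_t *ctx)
{
    ctx->out = NULL;
    ctx->last_out = &ctx->out;
}

/* Appends a link at the tail in constant time, without walking the list. */
static void
demo_chain_append(demo_ctx_t *ctx, demo_chain_t *link)
{
    link->next = NULL;
    *ctx->last_out = link;         /* fills ctx->out or the previous next */
    ctx->last_out = &link->next;   /* the next append lands after `link` */
}
```

Because last_out holds the address of either the head pointer or a next field, the empty list needs no special case: the first append writes to ctx->out and every later append writes to the tail link.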

The html tag parser function

The following lists the code for the ngx_parse_buf_html() function.

/*
 * Parses the buffer to look for the <head> tag
 * Returns NGX_OK if found, 
 * NGX_AGAIN if not found in this buffer,
 * NGX_ERROR if an error occurs.
 * HF_LAST_SEARCH if the maximum characters is reached
 * 
*/
static ngx_int_t 
ngx_parse_buf_html(ngx_http_html_head_filter_ctx_t *ctx, 
                   ngx_http_request_t *r)
{
    u_char *p, c;
    ngx_int_t rc;
    ngx_buf_t* buf;
	
    if(ctx->in == NULL)
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_parse_buf_html: "
            "ctx->in is NULL");  
            
        return NGX_ERROR;
    }
		
    buf = ctx->in->buf; 

    for(p=buf->pos; p < buf->last; p++)
    {

        c = *p;
        if(ctx->count == HF_MAX_CHARACTERS)
        {
            ngx_log_error(NGX_LOG_WARN, 
               r->connection->log, 0, 
               "[Html_head filter]: ngx_parse_buf_html: "
               "unable to find <head> tag within %ui characters",
               HF_MAX_CHARACTERS);
               
            return HF_LAST_SEARCH;
        } 
        
        switch(c)
        {
            case '<':

                ctx->starttag=1;
                if(!ctx->tagquote && !ctx->tagsquote)
                {
                   ngx_init_stack(&ctx->stack);
                }

                if(push(c, &ctx->stack) == -1)
                {
                      ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                        "[Html_head filter]: ngx_parse_buf_html: "
                        "parse stack is full");  
                         
                      return NGX_ERROR;
                }
                
                break;

            case '>':

                if(ctx->starttag)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");  
                            
                        return NGX_ERROR;
                    }

                    if(!ctx->tagquote && !ctx->tagsquote)
                    {    
                        ctx->starttag = 0; 
                        /* Process the tag */
                        rc = ngx_process_tag(ctx,r);

                        if(rc == NGX_OK)
                        {
                            return NGX_OK;
                        }
                        else if(rc == NGX_ERROR)
                        {
                            return NGX_ERROR; 
                        }
                
                    }
                }

                break;

            case '\"':

                if(ctx->starttag && ctx->tagsquote==0 && ctx->tagquote==0 )
                {
                    ctx->tagquote=1;
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");  
                            
                        return NGX_ERROR;
                    }
                }
                else if(ctx->starttag && ctx->tagsquote==0 && ctx->tagquote)
                {
                    ctx->tagquote=0; 
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
            
                }
                else if(ctx->starttag && ctx->tagsquote)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }
          
                break;

            case '\'':

                if(ctx->starttag && ctx->tagquote == 0 && ctx->tagsquote == 0)
                {
                    ctx->tagsquote = 1;
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }  
                }   
                else if(ctx->starttag && ctx->tagquote==0 && ctx->tagsquote)
                {
                    ctx->tagsquote = 0;
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                } 
                else if(ctx->starttag && ctx->tagquote)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }

                break;

            default:
         
                if(ctx->starttag)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }

        }

        ctx->count++;
        ctx->index++;
    }


    return NGX_AGAIN;
    
}

The function goes through the character stream in a buffer and looks for the four tokens <, ", ', >. The < token indicates the start of an html tag. The stack is initialized and the token is pushed onto the stack. Subsequent characters that are not tokens are pushed onto the stack as well. If a double quote or single quote is encountered, the flag for that quote type is set. Any > that comes after an opening quotation mark is not interpreted as the end of an html tag, and any < that comes after an opening quotation mark is not interpreted as the start of a tag.

The relevant quotation flag is reset when the matching closing double quote or single quote is encountered. A subsequent > is then treated as the end of the tag. The parser then calls the function ngx_process_tag() to check whether the html tag in the stack is a <head>. Leading and trailing spaces in the tag are ignored and the check is case insensitive. However, the <head> tag cannot contain attributes.

Some examples will make this clearer. <   HeAD> is considered valid, while <Head id=1> is invalid. The parser function returns NGX_OK if a valid <head> tag is found, NGX_AGAIN to indicate that processing can continue with subsequent buffers and NGX_ERROR if an error occurs. When the maximum characters limit of 256 is reached, the parser returns HF_LAST_SEARCH.
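The same tokenizing rules can be exercised outside Nginx with a much smaller parser. The following is a simplified sketch, not the module's code: demo_parser_t, demo_feed( ) and the stack size of 64 are hypothetical, the tag check is folded into demo_is_head( ), and the 256 character limit is left out.

```c
#include <ctype.h>
#include <stddef.h>

#define DEMO_STACK_MAX 64   /* hypothetical stack capacity */

/* State that must survive from one buffer to the next. */
typedef struct {
    char   tag[DEMO_STACK_MAX];  /* characters of the current tag */
    size_t len;
    int    in_tag;               /* a '<' was seen, its '>' not yet */
    char   quote;                /* 0, or the quote character that is open */
} demo_parser_t;

/* Returns 1 if the collected tag is <head>, ignoring case and spaces
 * around the name. Any attribute makes the match fail. */
static int
demo_is_head(const demo_parser_t *p)
{
    static const char name[] = "head";
    size_t i = 1, end = p->len - 1, k;   /* skip '<' and '>' */

    while (i < end && isspace((unsigned char) p->tag[i])) {
        i++;
    }
    while (end > i && isspace((unsigned char) p->tag[end - 1])) {
        end--;
    }
    if (end - i != 4) {
        return 0;
    }
    for (k = 0; k < 4; k++) {
        if (tolower((unsigned char) p->tag[i + k]) != name[k]) {
            return 0;
        }
    }
    return 1;
}

/* Feeds one buffer. Returns the offset of the '>' that closes <head>,
 * -1 if it is not in this buffer, -2 if the stack is full. */
static long
demo_feed(demo_parser_t *p, const char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        char c = buf[i];

        if (!p->in_tag) {
            if (c != '<') {
                continue;            /* text outside tags is skipped */
            }
            p->in_tag = 1;
            p->len = 0;
            p->quote = 0;
        } else if (c == '<' && p->quote == 0) {
            p->len = 0;              /* a new tag starts, restart the stack */
        }

        if (p->len == DEMO_STACK_MAX) {
            return -2;
        }
        p->tag[p->len++] = c;

        if (p->quote != 0) {
            if (c == p->quote) {
                p->quote = 0;        /* closing quote */
            }
        } else if (c == '"' || c == '\'') {
            p->quote = c;            /* opening quote */
        } else if (c == '>') {
            p->in_tag = 0;
            if (demo_is_head(p)) {
                return (long) i;
            }
        }
    }

    return -1;
}
```

Feeding "<html><he" and then "ad><title>" to the same demo_parser_t returns -1 for the first buffer and 2 for the second, the offset of the closing > within that buffer. This offset plays the role that ctx->index plays in the module.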

The text insertion function

We will list one more function, the ngx_html_insert_output( ) function that will insert our text into the buffer chains. The following is the code snippet for ngx_html_insert_output( ).

/* Insert the text into body response buffer */
static ngx_int_t 
ngx_html_insert_output(ngx_http_html_head_filter_ctx_t *ctx, 
                       ngx_http_request_t *r, 
                       ngx_http_html_head_filter_loc_conf_t *slcf)
{

    ngx_chain_t  *cl, *ctx_in_new, **ll;
    ngx_buf_t  *b;

    if(ctx->in == NULL)
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
             "[Html_head filter]: ngx_html_insert_output: "
             "ctx->in is NULL");
             
        return NGX_ERROR;
    }

				   
    ll = &ctx_in_new;				   
    b=ctx->in->buf;
   
    if(b->pos + ctx->index + 1 > b->last)
    {/* Check that the head tag position does not exceed buffer */
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output: "
            "invalid input buffer at text insertion");
            
        return NGX_ERROR;          
    }

    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output: "
            "unable to allocate output chain memory");
            
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));
   
    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos = ctx->in->buf->pos;
    b->last = b->pos + ctx->index + 1;
    b->start = ctx->in->buf->start;
    b->end = ctx->in->buf->end;
    b->recycled = 1;
    b->flush = ctx->in->buf->flush; 
       
    *ll = cl;  
    ll = &cl->next;
	

    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
             "[Html_head filter]: ngx_html_insert_output: "
             "unable to allocate output chain memory");
             
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));
   
    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos=slcf->insert_text.data;
    b->last=b->pos + slcf->insert_text.len;
    b->start = b->pos;
    b->end = b->last; 
    b->recycled = 1;
	 
    *ll = cl;
    ll = &cl->next;
	 

    if(ctx->in->buf->pos + ctx->index + 1 == ctx->in->buf->last )
    {/* head tag is in last position of the buffer */
   
        b->last_buf = ctx->in->buf->last_buf;
        b->last_in_chain = ctx->in->buf->last_in_chain;
		 
        *ll = ctx->in->next;
        
        if(ctx->in->buf->recycled)
        {/* consume existing buffer */
            ctx->in->buf->pos = ctx->in->buf->last;  
        }
		
	    ctx->in = ctx_in_new;
	    return NGX_OK;
    }
     
    
    /* 
     * tag is within buffer last position, 
     * i.e. ctx->in->buf->pos + ctx->index + 1 < ctx->in->buf->last
     * 
     */
     
    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output: "
            "unable to allocate output chain memory");
            
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));

    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos = ctx->in->buf->pos + ctx->index + 1;
    b->last = ctx->in->buf->last;
    b->start = ctx->in->buf->start;
    b->end = ctx->in->buf->end;
    b->recycled = 1;
    b->last_buf = ctx->in->buf->last_buf;
    b->last_in_chain = ctx->in->buf->last_in_chain;

    *ll = cl;
    ll = &cl->next;
    *ll = ctx->in->next;
    
    if(ctx->in->buf->recycled)
    {/* consume existing buffer */
        ctx->in->buf->pos = ctx->in->buf->last; 
    }
	  
    ctx->in = ctx_in_new; 
	   
    return NGX_OK;

}

The insert text function splits the input buffer in which the <head> tag is found into either 2 or 3 buffers, with the text inserted. The process was illustrated earlier in the Design and Approach section. If the current input buffer has content only up to the <head> tag, our text can be inserted directly as a new buffer after that content. In this case, the result is 2 buffers.

Alternatively, if the current input buffer has content after the <head> tag, the input buffer is split into 3 buffers. The first is the content up to and including the <head> tag, the second is our inserted text and the third is the content after the <head> tag.

The new set of buffers is then incorporated into the output chain by the while loop in the function handler, ngx_http_html_head_body_filter( ). If the original buffer is marked with the recycled flag, it is consumed. This is done by setting the start position of the buffer content equal to its last content position. The recycled flag indicates that the buffer has to be consumed as soon as possible, so that it can potentially be reused.
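The 2 or 3 way split involves no copying of the response content, only new pos and last pointers into existing memory. Here is a minimal sketch of that arithmetic, using a hypothetical demo_buf_t view and demo_split( ) function in place of ngx_buf_t and the chain handling.

```c
#include <stddef.h>

/* Hypothetical view into memory, like the pos/last pair of ngx_buf_t. */
typedef struct {
    const char *pos;    /* first byte of content */
    const char *last;   /* one past the last byte of content */
} demo_buf_t;

/*
 * Splits `in` right after offset `index` (the '>' of <head>) and puts
 * `text` in between. Writes 2 or 3 views to `out` and returns how many,
 * or -1 if `index` lies outside the buffer.
 */
static int
demo_split(const demo_buf_t *in, size_t index,
           const char *text, size_t text_len, demo_buf_t out[3])
{
    size_t      size = (size_t) (in->last - in->pos);
    const char *cut;
    int         n = 0;

    if (index >= size) {
        return -1;
    }

    cut = in->pos + index + 1;

    /* 1: content up to and including the <head> tag */
    out[n].pos = in->pos;
    out[n].last = cut;
    n++;

    /* 2: the inserted text, pointing at the configured string */
    out[n].pos = text;
    out[n].last = text + text_len;
    n++;

    /* 3: remaining content, only if there is any */
    if (cut < in->last) {
        out[n].pos = cut;
        out[n].last = in->last;
        n++;
    }

    return n;
}
```

For the content "<head><title>" with index 5, the function yields three views of 6, text_len and 7 bytes. For a buffer that ends exactly at "<head>", it yields two.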

A few other functions and code snippets are not covered in this implementation section, for example the functions for handling the parser stack and the ngx_process_tag( ) function. Refer to the github link below for the full source code.

Compiling the Nginx Body Filter Module

Let's proceed to compile and test the html head filter module. Create a working directory "Build-Module" to hold the source files that are required. The filter module source code can be obtained from the github repository. On an Ubuntu Linux system with git installed, the following commands can be used.

mkdir Build-Module
cd Build-Module
git clone https://github.com/ngchianglin/NginxHtmlHeadFilter.git

To verify the signature of the git download, refer to these instructions. Let's do a quick static analysis of the module's source code to make sure that there are no major vulnerabilities, such as buffer overflows. On Ubuntu, we can install cppcheck.

sudo apt-get install cppcheck
cd NginxHtmlHeadFilter
cppcheck --enable=warning ngx_http_html_head_filter_module.c

Good, our module code doesn't have any glaring issues that the cppcheck analyzer can find. We can proceed to download the other packages that are required. Change our directory back to Build-Module.

cd ..

The filter module works with Nginx 1.18.0, the latest stable release at the time of writing. Download its source code from the official Nginx download page. We are going to download OpenSSL 1.1.1h, zlib 1.2.11 and PCRE 8.44 as well.

Verify the integrity of the downloads with either the SHA-256 checksum or the gpg signature provided on each package's website. The following lists the sha256 checksums of the packages.

nginx-1.18.0.tar.gz
4c373e7ab5bf91d34a4f11a0c9496561061ba5eee6020db272a17a7228d35f99

openssl-1.1.1h.tar.gz
5c9ca8774bd7b03e5784f26ae9e9e6d749c9da2438545077e6b3d755a06595d9

zlib-1.2.11.tar.gz
c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1

pcre-8.44.tar.gz
aecafd4af3bd0f3935721af77b889d9024b2e01d96b58471bd91a3063fb47728
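
As one way to automate this check, the following is a minimal shell sketch that compares a file's SHA-256 digest against an expected value, using the sha256sum utility from GNU coreutils. The function name verify_sha256 is a hypothetical helper introduced here for illustration; it is not part of any of the packages.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    # sha256sum prints "<digest>  <file>"; keep only the digest column.
    actual=$(sha256sum "$1" | awk '{ print $1 }')
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1" >&2
        return 1
    fi
}
```

For example, the nginx tarball can be checked with the following command. The build should not continue if any package reports a mismatch.

verify_sha256 nginx-1.18.0.tar.gz 4c373e7ab5bf91d34a4f11a0c9496561061ba5eee6020db272a17a7228d35f99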

Extract these tarballs in the Build-Module directory. Issue the following commands to configure Nginx. The options include compiler and linker hardening flags.

cd nginx-1.18.0
./configure --with-cc-opt="-Wextra -Wformat -Wformat-security -Wformat-y2k -Werror=format-security -fPIE -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all" --with-ld-opt="-pie -Wl,-z,relro -Wl,-z,now -Wl,--strip-all" --with-http_v2_module --with-http_ssl_module --without-http_uwsgi_module --without-http_fastcgi_module --without-http_scgi_module --without-http_empty_gif_module --with-openssl=../openssl-1.1.1h --with-openssl-opt="no-ssl2 no-ssl3 no-comp no-weak-ssl-ciphers -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" --with-zlib=../zlib-1.2.11 --with-zlib-opt="-O2  -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" --with-pcre=../pcre-8.44 --with-pcre-opt="-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" --with-pcre-jit --add-module=../NginxHtmlHeadFilter

The configure command above will create a Makefile in the objs directory. Proceed to build the binary and install it into /usr/local/nginx.

make
sudo make install
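
To confirm that the filter module was compiled into the binary, the configure arguments recorded by the build can be inspected. The nginx -V option prints them to standard error. The sketch below wraps the check in a small function; the name module_built_in is an assumption made for illustration.

```shell
# Hypothetical helper: reads `nginx -V` output on stdin and succeeds if the
# configure arguments contain an --add-module entry for NginxHtmlHeadFilter.
module_built_in() {
    grep -q -- '--add-module=[^ ]*NginxHtmlHeadFilter'
}

# Example (nginx -V writes to stderr, hence the 2>&1):
#   /usr/local/nginx/sbin/nginx -V 2>&1 | module_built_in && echo "module present"
```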

We can archive the compiled nginx package with tar and gzip, then move it to our server machine for testing. As a security measure and best practice, the server doesn't have gcc or other compiler tools installed. We compile the code on a separate workstation that has the same architecture and OS as the server and then copy the compiled package to the server using sftp or scp.

cd /usr/local
tar -czvf nginx-binary-package.tgz nginx
sftp -i /home/devuser1/keyloc/private_rsa user@myserver
put nginx-binary-package.tgz

Testing the Nginx Filter Module

On the server, extract the nginx binary package to /usr/local/nginx. Ensure that the ownership and permission on this extracted nginx binary location are secure. The Apache web server shall serve the main website on this machine. It listens locally (127.0.0.1) on port 80 and will not accept any external network traffic.

Nginx will be configured as a reverse proxy in front of the Apache web server. Nginx accepts external network traffic and forwards it to the Apache web server. Refer to the earlier section, Design and Approach, for a big picture view of the deployment architecture.

Nginx is run using the nginx user and group. The following commands create the user and group, as well as the directories used by Nginx.

sudo mkdir /opt/nginx
sudo chmod 755 /opt/nginx
sudo groupadd -g 8800 nginx
sudo useradd -d /opt/nginx/home -m -u 8800 -g 8800 -s /bin/false nginx
sudo mkdir /var/log/nginx
sudo chown nginx: /var/log/nginx
sudo chmod 700 /var/log/nginx
sudo mkdir /opt/nginx/www
sudo chmod 755 /opt/nginx/www
sudo mkdir /opt/nginx/cache
sudo chown nginx: /opt/nginx/cache
sudo chmod 700 /opt/nginx/cache

Let's do some additional hardening of the /usr/local/nginx location.

sudo chown -R root:nginx /usr/local/nginx
sudo chmod 750 /usr/local/nginx
sudo chown -R root:root /usr/local/nginx/sbin
sudo chmod 700 /usr/local/nginx/sbin/nginx
sudo chown -R root:root /usr/local/nginx/conf
sudo chmod -R 600 /usr/local/nginx/conf/
sudo chmod 700 /usr/local/nginx/conf
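
The resulting permissions can be spot-checked afterwards. The following sketch assumes GNU stat, as shipped with Ubuntu, and uses a hypothetical helper named check_mode that is introduced here only for illustration.

```shell
# check_mode PATH EXPECTED_OCTAL_MODE
# Hypothetical helper: succeeds if PATH has exactly the expected permission bits.
# Assumes GNU stat, where -c '%a' prints the mode in octal.
check_mode() {
    actual=$(stat -c '%a' "$1") || return 1
    if [ "$actual" != "$2" ]; then
        echo "$1: mode is $actual, expected $2" >&2
        return 1
    fi
}

# Example, run as root because the directories are no longer world readable:
#   check_mode /usr/local/nginx 750
#   check_mode /usr/local/nginx/sbin/nginx 700
#   check_mode /usr/local/nginx/conf 700
```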

Open the nginx configuration file located at /usr/local/nginx/conf/nginx.conf and fill in the following settings. Note that these configuration settings are for nighthour.sg. Replace the IP address, the server name, the SSL certificates and so on with settings that are relevant to your test environment. Testing should be done on a non-production system.

user  nginx nginx;
worker_processes  4;
error_log  /var/log/nginx/error.log warn;
pid        /var/log/nginx/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" "$gzip_ratio"';

    
    sendfile        on;
    keepalive_timeout  65;
    server_tokens off;
    
    proxy_cache_path /opt/nginx/cache levels=1:2 keys_zone=webcache:2m max_size=150m inactive=10080m use_temp_path=off;
    proxy_cache_key "$scheme$request_method$host$request_uri$is_args$args";
    proxy_cache_valid 200 302 90d;
    proxy_cache_valid 404 1m;

    proxy_cache_lock on;
    proxy_cache_revalidate on;

    gzip  on;
    

    map $upstream_http_cache_control $cachemap {
        "~."    $upstream_http_cache_control;
        default    no-store; 
    }


    server {
        listen       128.199.64.100:80;
        server_name  www.nighthour.sg nighthour.sg;
        root   /var/www/html;
        
        charset utf-8;

        access_log  /var/log/nginx/access.log  main;
        
        expires 900;
        add_header Cache-Control public;
        if ( $host ~* "nighthour.sg$" )
        {
           return 301 https://$host$request_uri;
        }

        return 400;

        location / {
            index  index.html index.htm;
            
        }

        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

    }


    # HTTPS server
    #
    server {
        listen       128.199.64.100:443 ssl http2;
        server_name  www.nighthour.sg nighthour.sg;
        root   /opt/nginx/www;
        charset utf-8;

        ssl_certificate      /etc/letsencrypt/live/nighthour.sg/fullchain.pem;
        ssl_certificate_key  /etc/letsencrypt/live/nighthour.sg/privkey.pem;
 
        ssl_session_timeout 15m;
        ssl_session_cache shared:SSL:50m;
        ssl_session_tickets off;
        
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
        ssl_prefer_server_ciphers  on;
        
        ssl_stapling on;
        ssl_stapling_verify on;
        ssl_trusted_certificate /etc/letsencrypt/live/nighthour.sg/fullchain.pem;
        resolver 8.8.8.8 8.8.4.4 valid=300s;
        resolver_timeout 5s;
        
         
        add_header Strict-Transport-Security "max-age=31536000;includeSubDomains";
        
        access_log  /var/log/nginx/ssl_access.log  main;


        location / {
            
            index  index.html index.htm;
            
            html_head_filter "<script src=\"/scripts/mymonitor.js\"></script>";
            
            proxy_cache webcache;
            proxy_cache_bypass $http_cache_control;
            
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://127.0.0.1;
           
            proxy_hide_header Cache-Control;
 
            add_header Cache-Control $cachemap;
            add_header Strict-Transport-Security "max-age=31536000;includeSubDomains";
            
        }

   
        
        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
        
       
    }

}

The configuration above sets up Nginx to listen on the public IP address on ports 80 and 443. The server block for port 80 redirects HTTP requests to HTTPS on port 443. In the server block for port 443 (HTTPS), proxy_pass to http://127.0.0.1 is configured. http://127.0.0.1 is where the Apache web server is listening for traffic.
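
Once Nginx has been started later in this section, the port 80 redirect can be checked from the command line. The sketch below inspects response headers, such as those printed by curl -sI, for a 301 status and an https Location header. The function name redirects_to_https and the host name in the example are placeholders chosen for illustration.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds if
# the status is 301 and the Location header points at an https:// URL.
redirects_to_https() {
    awk '
        NR == 1 && $2 == "301"                            { status = 1 }
        tolower($1) == "location:" && $2 ~ /^https:\/\//  { loc = 1 }
        END { exit !(status && loc) }
    '
}

# Example (replace the placeholder host with your test server):
#   curl -sI http://your-test-host/ | redirects_to_https && echo "redirect ok"
```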

We also turn on the Html Head filter module by setting the directive html_head_filter with its argument string in the location block.

html_head_filter "<script src=\"/scripts/mymonitor.js\"></script>";

This argument string is the text to be inserted after the <head> tag in the HTTP response body from the Apache web server. Here it is a script tag that loads a monitoring javascript, mymonitor.js.

Start up Nginx with the following command.

sudo /usr/local/nginx/sbin/nginx
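
As a precaution, the configuration can be validated with the nginx -t option before the start command above is run. The following sketch combines both steps so that Nginx is started only when the configuration test passes. The function name start_if_valid is hypothetical.

```shell
# start_if_valid CMD [ARGS...]
# Hypothetical helper: runs "CMD ARGS -t" (the nginx configuration test) and
# starts nginx with "CMD ARGS" only when that test succeeds.
start_if_valid() {
    if "$@" -t; then
        "$@"
    else
        echo "configuration test failed; nginx not started" >&2
        return 1
    fi
}

# Example:
#   start_if_valid sudo /usr/local/nginx/sbin/nginx
```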

Access a page on the website using your favourite web browser and view the page source. The monitoring script should be inserted.

Fig 8. Nginx Html Head filter module -- Script insertion
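
The same check can be done without a browser. The sketch below reads a page on standard input and tests whether the configured text follows a plain <head> tag. It assumes the text is placed directly after the tag with nothing in between, and it does not handle head tags that contain spaces or attributes. The whole pattern, including the inserted text, is matched case-insensitively. The function name inserted_after_head and the host name are placeholders.

```shell
# inserted_after_head TEXT
# Hypothetical helper: reads HTML on stdin and succeeds if TEXT appears
# immediately after a plain <head> tag (case-insensitive match).
inserted_after_head() {
    # Join all lines first so a line break after <head> does not matter.
    tr -d '\r\n' | grep -qiF -- "<head>$1"
}

# Example (replace the placeholder host with your test server):
#   curl -s https://your-test-host/ |
#       inserted_after_head '<script src="/scripts/mymonitor.js"></script>'
```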

Other tests can include HTML pages with multiple <head> tags (the monitoring script should be inserted only once), head tags with leading or trailing spaces and a mix of upper and lower case, a PHP script that generates HTML content dynamically, and a 404 not found error page (the monitoring script should not be inserted). The Html Head filter module should handle all these cases properly.
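
Some of these cases can be scripted. The following sketch counts how often the inserted text occurs in a response and fetches the HTTP status code with curl. The helper names, the host name and the page paths are all hypothetical; the test pages would have to be created on your own test server.

```shell
# count_insertions TEXT
# Hypothetical helper: reads HTML on stdin and prints how many times TEXT occurs.
count_insertions() {
    # grep -o prints each match on its own line; wc -l counts them.
    grep -oF -- "$1" | wc -l | tr -d ' '
}

# http_status URL
# Hypothetical helper: prints only the HTTP status code of a GET request to URL.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Examples (host name and paths are placeholders):
#   curl -s https://your-test-host/two-heads.html |
#       count_insertions '<script src="/scripts/mymonitor.js"></script>'   # expect 1
#   http_status https://your-test-host/no-such-page                       # expect 404
#   curl -s https://your-test-host/no-such-page |
#       count_insertions '<script src="/scripts/mymonitor.js"></script>'   # expect 0
```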

When all the tests are done and the results meet expectations, the filter module can be deployed to production. The filter module is actually deployed on nighthour.sg, inserting the monitoring script into the web pages here.

A note about previous versions

There are previous versions of this filter with more features. For example, sending a blank page (blocking) when the <head> tag is not found, a logging mode that allows content to pass through unmodified, the avoidance of HTTP chunked transfer, size limit of 10MiB for static content etc...

Some of these features, such as sending a blank page when <head> is not found, can be useful. However, together they made the module complex and its behaviour harder to reason about. There are also issues with the trick of avoiding chunked transfer encoding. A simple module had become far more complicated than necessary.

A good program has to be as simple as possible but still get its job done. In this case, it is really about inserting a text string after the <head> tag. This is when I decided to throw away all these features and revert to this simple goal and function.

If features like blocking, additional content size limits or avoidance of chunked transfer encoding are needed, it is far better to implement them as separate customized versions, each built for a specific purpose. This avoids combining many different features into one module, making the behaviour of each version easier to grasp and reason about. It also improves performance and reduces bugs.

The blocking feature, though, is useful from a security perspective. For example, there can be cases where a monitoring script has to be present in all HTML pages. In this case, HTML pages that don't have a <head> tag can be blocked, since the monitoring script can't be inserted.

A customized version of this module that will send a blank page if the <head> tag is not found within the first 256 characters is available at the following github link.

Take note that the customized version should not be installed together with the non-blocking version on the same nginx instance.

Conclusion and Afterthought

This article runs through the design and implementation of a simple nginx filter module that inserts text into the HTTP response body, after the HTML <head> tag. The code doesn't exactly follow the nginx coding convention, though; it follows the author's random style.

Nginx has its own recommended coding convention. For those attempting to write nginx modules, it is good to follow the nginx coding convention. The coding convention is documented in the Nginx development guide. I may reformat this code again in the future to follow the nginx convention.

Nginx is a high performance web server and reverse proxy that is highly extensible. It can serve as a Web Application Firewall (WAF) through modules such as ModSecurity or NAXSI, or even act as an application server through projects such as OpenResty. Learning to write an Nginx module will allow an IT professional to know more about the internals of this flexible web infrastructure that is gaining wide usage.

The knowledge gained can benefit developers, infrastructure engineers, security engineers/professionals and even system administrators who code.

Useful References

The full source code for the Nginx Html Head Filter is available at the following Github link.
https://github.com/ngchianglin/NginxHtmlHeadFilter

A customized version that will send an empty page if the <head> tag is not found is available at
https://github.com/ngchianglin/NginxHtmlHeadBlankFilter

If you have any feedback, comments, corrections or suggestions to improve this article, you can reach me via the contact/feedback link at the bottom of the page.

Article last updated on Nov 2020.