Developing an Nginx URL Whitelisting Module
Premature optimization is the root of all evil.
Donald Knuth.
29 Oct 2019
Introduction
One of the challenges of securing web applications and websites is preventing the accidental exposure of sensitive parts of an application or website, such as administrative interfaces. A common technique is to blacklist an application path and block access to resources that start with or match that path. Other techniques include disabling unneeded administrative interfaces or removing unwanted features. This article shows how to develop an Nginx module that allows access only to whitelisted URLs or web resources. Any URL that is not in the whitelist is blocked.
A whitelist approach offers good protection because access to every resource is denied by default. A security administrator or web developer has to explicitly whitelist each web resource to enable access to it. Whitelisting can also mitigate the risk of application frameworks exposing sensitive interfaces by mistake, as in the case of vulnerable Spring Boot actuators. It can also mitigate the accidental uploading of sensitive files to a website.
Whitelisting may sound tedious, but for a web application or web-based API, the developers know exactly which web resources should be accessible to normal end users. The quality assurance testers will also be testing the known endpoints and the functionality that users can access.
It is therefore relatively easy to compile a list of the URLs to be whitelisted. A whitelisting approach may not suit every use case, but it is useful where security is essential, and it can greatly reduce the attack surface of a website or web application.
Design and Approach
The diagram below shows the URLs for a web application. It can be seen that the URLs that should be accessible by end users are a small subset of all the available URLs.
Some traditional enterprise middleware products and application frameworks are complex and expose a fair number of interfaces. Many of these interfaces are not meant for user access.
Applications can also have administrative consoles, management interfaces, status monitoring features and internal APIs that should not be publicly accessible. It is prudent not to expose URLs that users should not access.
Nginx can be configured as a reverse proxy with the URL whitelisting module enabled. Any URL that is not explicitly whitelisted is blocked. The following diagram illustrates this.
The root "/" on www.myapp.com and /store are whitelisted, so access is granted to these 2 URLs. When an access attempt is made for /admin, it is blocked and HTTP 404 (Not Found) is returned.
One of the things to consider when building the whitelisting module is the data structure for the whitelist. Nginx exposes an HTTP request structure (ngx_http_request_t) to modules. It contains a uri field (ngx_str_t) holding the URL path, starting from the web root. The whitelisted URLs could be stored as an array of Nginx strings (ngx_str_t) and compared against the uri field in a loop.
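To make the cost of the array approach concrete, here is a minimal sketch of it in plain C, outside Nginx. It assumes a hypothetical `whl_str_t` type standing in for `ngx_str_t` (a length plus a pointer to data that is not NUL-terminated) and a hypothetical `whl_array_contains()` helper; neither is part of the module. Every request costs up to one string comparison per whitelisted URL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ngx_str_t: data is not NUL-terminated. */
typedef struct {
    size_t      len;
    const char *data;
} whl_str_t;

/* Returns 1 if uri (of length len) exactly matches an entry in list.
 * Linear scan: up to n comparisons for every request. */
static int
whl_array_contains(const whl_str_t *list, size_t n,
                   const char *uri, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (list[i].len == len && memcmp(list[i].data, uri, len) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With a whitelist of "/" and "/store", a request for /admin is compared against every entry before it is rejected.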
The problem with an array is that if the number of whitelisted URLs is large, many comparisons are required for every request. A hash table would give faster lookups, but its memory requirements and hashing function have to be considered. Here, we will use a tree-like data structure (a trie), similar to a file directory tree. It should offer faster lookups than an array of URL strings, and we do not have to worry about a hashing function or about allocating memory for hash tables.
The following diagram illustrates this.
A URI or URL is broken into parts, starting from the root "/". The root's children are either subdirectories or files, and each subdirectory in turn has its own children, again either files or subdirectories. A subdirectory ends with a "/", for example "scripts/".
To whitelist http://www.nighthour.sg/articles/index.html, leave out the hostname portion: the syntax starts from the web root "/", followed by "articles/" and then "index.html". The module requires a URL syntax like this.
/articles/index.html
This will be further broken down into the following parts in the tree structure.
"/"  ->  "articles/"  ->  "index.html"
If a URL ends with a subdirectory, a trailing forward slash is required when specifying to the module that the subdirectory should be whitelisted and accessible. For example, to whitelist https://www.nighthour.sg/articles/, the URL syntax required by the module is the following.
/articles/
This will give the following parts in the directory tree
"/"  ->  "articles/"
If a web resource ends with a file, the trailing slash is not required. For example, to whitelist https://www.nighthour.sg/myapi/myapplication, the URL syntax for the module is the following.
/myapi/myapplication
This will give the following parts in the tree.
"/"  ->  "myapi/"  ->  "myapplication"
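The splitting rule above can be sketched as a small C function. This is an illustration only: `whl_split_path()`, `WHL_MAX_SEGS` and `WHL_MAX_SEGLEN` are hypothetical names and limits, not taken from the module. The root is emitted as "/", each directory segment keeps its trailing slash, a final file segment has none, and a lone extra "/" is skipped.

```c
#include <string.h>

#define WHL_MAX_SEGS   16   /* hypothetical limit on segments per path */
#define WHL_MAX_SEGLEN 256  /* hypothetical limit on one segment's length */

/* Splits a path such as "/myapi/myapplication" into "/", "myapi/" and
 * "myapplication". Returns the number of segments, or -1 if the path does
 * not start with '/' or a limit is exceeded. */
static int
whl_split_path(const char *path, char segs[][WHL_MAX_SEGLEN])
{
    size_t idx = 0;   /* write position inside the current segment */
    int    n;

    if (path == NULL || path[0] != '/') {
        return -1;
    }
    strcpy(segs[0], "/");              /* the root segment */
    n = 1;

    for (path++; *path != '\0'; path++) {
        if (n >= WHL_MAX_SEGS || idx + 1 >= WHL_MAX_SEGLEN) {
            return -1;
        }
        segs[n][idx++] = *path;
        if (*path != '/') {
            continue;
        }
        if (idx == 1) {                /* lone extra '/': skip it */
            idx = 0;
            continue;
        }
        segs[n][idx] = '\0';           /* directory keeps its '/' */
        idx = 0;
        n++;
    }
    if (idx > 0) {                     /* final file segment, no '/' */
        segs[n][idx] = '\0';
        n++;
    }
    return n;
}
```

For "/myapi/myapplication" it produces "/", "myapi/" and "myapplication"; for "/articles/" it produces "/" and "articles/".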
We will use a node structure to represent each part of the URL. A node contains a string holding its path segment, for example "/" or "scripts/", and it may have child nodes. Using these nodes, we can build a tree that represents all the whitelisted URLs of a website or web application.
For each HTTP request, the module compares the uri string against the tree, part by part. As soon as a part does not match, the URL is known not to be in the whitelist, so the directory-tree structure keeps the number of comparisons small. Conversely, if all parts match, the URL is in the whitelist and access is granted. The module returns an HTTP 404 (Not Found) error for URLs that are not whitelisted.
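The part-by-part lookup can be illustrated with a small, self-contained trie. This is a simplified sketch, not the module's code: it allocates nodes with calloc() instead of an Nginx memory pool, uses a fixed number of child slots per node, never frees nodes, and the names (`trie_node`, `trie_add()`, `trie_lookup()`) are hypothetical. Like the module, it only accepts a path ending in "/" (including the root) if that directory was itself whitelisted with a trailing slash.

```c
#include <stdlib.h>
#include <string.h>

#define TRIE_MAX_CHILD 8   /* hypothetical fixed number of child slots */
#define TRIE_SEG_MAX   64  /* hypothetical maximum segment length */

typedef struct trie_node {
    char              seg[TRIE_SEG_MAX]; /* "scripts/", "index.html", ... */
    int               end_slash_allowed; /* directory explicitly listed */
    size_t            nchild;
    struct trie_node *child[TRIE_MAX_CHILD];
} trie_node;

/* Copies the next segment of *p (up to and including the next '/') into buf,
 * skipping leading slashes. Returns its length, 0 at the end, -1 if too long. */
static int
next_seg(const char **p, char *buf)
{
    size_t n = 0;
    while (**p == '/') {
        (*p)++;
    }
    while (**p != '\0') {
        if (n + 1 >= TRIE_SEG_MAX) {
            return -1;
        }
        buf[n++] = *(*p)++;
        if (buf[n - 1] == '/') {
            break;
        }
    }
    buf[n] = '\0';
    return (int) n;
}

static trie_node *
find_child(const trie_node *parent, const char *seg)
{
    for (size_t i = 0; i < parent->nchild; i++) {
        if (strcmp(parent->child[i]->seg, seg) == 0) {
            return parent->child[i];
        }
    }
    return NULL;
}

/* The root and any segment ending in '/' are directories. */
static int
is_dir(const trie_node *root, const trie_node *node)
{
    return node == root || node->seg[strlen(node->seg) - 1] == '/';
}

/* Whitelists a path such as "/articles/index.html"; returns 1 on success. */
static int
trie_add(trie_node *root, const char *path)
{
    char       buf[TRIE_SEG_MAX];
    int        n;
    trie_node *node = root, *c;
    if (path[0] != '/') {
        return 0;
    }
    while ((n = next_seg(&path, buf)) > 0) {
        if ((c = find_child(node, buf)) == NULL) {
            if (node->nchild == TRIE_MAX_CHILD
                || (c = calloc(1, sizeof(*c))) == NULL) {
                return 0;
            }
            strcpy(c->seg, buf);
            node->child[node->nchild++] = c;
        }
        node = c;
    }
    if (n == 0 && is_dir(root, node)) {
        node->end_slash_allowed = 1;  /* the path itself ended with '/' */
    }
    return n == 0;                    /* n < 0: a segment was too long */
}

/* Returns 1 if path is whitelisted, 0 if not (the module would send 404). */
static int
trie_lookup(const trie_node *root, const char *path)
{
    char             buf[TRIE_SEG_MAX];
    int              n;
    const trie_node *node = root;
    if (path[0] != '/') {
        return 0;
    }
    while ((n = next_seg(&path, buf)) > 0) {
        if ((node = find_child(node, buf)) == NULL) {
            return 0;                 /* first mismatch ends the search */
        }
    }
    /* A directory needs its own entry; a matched file segment suffices. */
    return n == 0 && (is_dir(root, node) ? node->end_slash_allowed : 1);
}
```

After whitelisting "/articles/index.html" and "/store/", lookups for those two paths succeed, while "/articles/", "/" and "/admin" fail; those are the cases where the module would answer with 404.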
Note that the module does not compare against the query string (the query parameters) of a URL.
The portion of a URL starting from the question mark is the query string, and it is not used by the module when checking the whitelist. When specifying URLs for the module to whitelist, do not include the query string.
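Inside Nginx this split has already been done: the request's uri field holds only the path, while the query string is kept separately in the args field. The sketch below simply illustrates the rule for a raw request target; `whl_path_length()` is a hypothetical helper, and the example URL in the usage note is made up.

```c
#include <string.h>

/* Returns the length of the path part of a raw request target, i.e.
 * everything before the first '?'. The query string after it plays
 * no part in the whitelist check. */
static size_t
whl_path_length(const char *target)
{
    return strcspn(target, "?");
}
```

For the hypothetical target /store/item?id=42&sort=asc, only the first 11 characters, /store/item, would be compared against the whitelist.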
The URI whitelisting module can be used on a site hosted directly by Nginx or with Nginx configured as a reverse proxy. The reverse proxy option is particularly useful as an additional layer of protection for web applications or API endpoints.
Extensions Bypass Feature
A web application or website may have many static assets, such as images, that are meant for public access. If no sensitive images are present on the application or website, it may be convenient to grant access to all images, or to all files that end with a specific extension. The URL whitelisting module has a directive that caters for this.
This is the wh_list_bypass directive. It takes a list of file extensions such as "jpg", "png", "svg", "gif" and "webp" as arguments. Any web resource that ends with one of the specified extensions is granted access by the module.
This directive should be used carefully. Although it is a convenient way to grant access to URLs ending with specific extensions, it is also contrary to the strict whitelisting approach.
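A minimal sketch of the extension test follows. It assumes the extension is the text after the last "." in the final path segment, without the dot, which is how Nginx fills the request's exten field; `whl_get_exten()` and `whl_is_bypassed()` are hypothetical helpers, whereas the module itself compares r->exten with the entries of its bp_extens array. As in the module's handler, the comparison is case-sensitive, so "JPG" does not match "jpg".

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the extension of the last path segment (the text
 * after its last '.', without the dot) and stores its length in *len.
 * Returns NULL if the last segment has no dot. */
static const char *
whl_get_exten(const char *path, size_t plen, size_t *len)
{
    size_t i = plen;

    while (i > 0) {
        i--;
        if (path[i] == '/') {
            return NULL;               /* reached the segment start */
        }
        if (path[i] == '.') {
            *len = plen - i - 1;
            return path + i + 1;
        }
    }
    return NULL;
}

/* Returns 1 if path ends with one of the n bypass extensions, so the
 * whitelist check would be skipped; comparison is case-sensitive. */
static int
whl_is_bypassed(const char *path, const char *const *exts, size_t n)
{
    size_t      i, elen = 0;
    const char *ext = whl_get_exten(path, strlen(path), &elen);

    if (ext == NULL) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        if (strlen(exts[i]) == elen && memcmp(ext, exts[i], elen) == 0) {
            return 1;
        }
    }
    return 0;
}
```

A dot in a directory name does not count: /v1.jpg/admin has no extension and still goes through the whitelist check.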
Implementation
This section will run through the source code of the Nginx URL whitelisting Module. It will not explain the basics of writing Nginx modules. Refer to the Nginx Development Guide for details on Nginx development. Another good beginner resource is Emiller's Guide to Nginx Module Development.
The full source code of the module is available at the GitHub link at the end of the article.
The code snippet below shows a few macro constants and the node data structure used to build the URI tree. NGX_WHL_INIT_CHIDREN_SZ is the initial number of child slots for each node. The children array can be expanded when necessary, up to the maximum defined in NGX_WHL_MAX_CHILDREN.
NGX_WHL_MAXPATHSZ sets the maximum length of a URL. It is currently defined as 2048. A web administrator may want to reduce this number if he or she is sure that the web application or website has no URLs this long. For example, if it is set to 100, any URL that is 100 characters or longer is blocked by the module with an HTTP 404 error.
NGX_WHL_MAX_CHILDREN defines the maximum number of child nodes that a parent node can have. NGX_WHL_TH_BSEARCH defines when binary search is used to find a child node: if the number of children exceeds NGX_WHL_TH_BSEARCH, binary search is used; otherwise the code simply loops through all the children. The child nodes are sorted with qsort when the module loads the configuration.
NGX_WHL_MAX_NEST defines the maximum number of nested path segments. For example, /myapp/dir1/dir2/ has 3 nested path segments.
#define NGX_WHL_INIT_CHIDREN_SZ 8
#define NGX_WHL_MAXPATHSZ 2048
#define NGX_WHL_MAX_CHILDREN 65536
#define NGX_WHL_TH_BSEARCH 6
#define NGX_WHL_MAX_NEST 10

typedef struct ngx_whl_pnode_s ngx_whl_pnode_t;

struct ngx_whl_pnode_s
{
    ngx_str_t        *segment;
    size_t            num_child;
    ngx_whl_pnode_t **children;
    size_t            maxchild;
    size_t            end_slash_allowed;
};
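The child search controlled by NGX_WHL_TH_BSEARCH can be sketched independently of Nginx. In this sketch the children are plain C strings, sorted once with the standard qsort() after loading, and looked up with the standard bsearch() only when there are more than WHL_TH_BSEARCH of them. `WHL_TH_BSEARCH` mirrors NGX_WHL_TH_BSEARCH, but `whl_sort_children()` and `whl_find_child()` are hypothetical names, not functions from the module (its own search routine, ngx_http_wh_check_path_seg(), is not shown in this article).

```c
#include <stdlib.h>
#include <string.h>

#define WHL_TH_BSEARCH 6   /* mirrors NGX_WHL_TH_BSEARCH */

/* qsort/bsearch comparator for an array of C-string pointers. */
static int
whl_cmp_seg(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Sort once, after all children have been added (configuration time). */
static void
whl_sort_children(const char **children, size_t n)
{
    qsort(children, n, sizeof(children[0]), whl_cmp_seg);
}

/* Returns the index of seg in the sorted children array, or -1. */
static long
whl_find_child(const char **children, size_t n, const char *seg)
{
    size_t       i;
    const char **hit;

    if (n > WHL_TH_BSEARCH) {            /* many children: binary search */
        hit = bsearch(&seg, children, n, sizeof(children[0]), whl_cmp_seg);
        return hit ? (long) (hit - children) : -1;
    }
    for (i = 0; i < n; i++) {            /* few children: linear scan */
        if (strcmp(children[i], seg) == 0) {
            return (long) i;
        }
    }
    return -1;
}
```

Binary search needs roughly log2(n) comparisons, against up to n for a linear scan; for a handful of children the simple loop is usually just as fast, which is the point of the threshold.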
The following shows the configuration structure of the module. This structure is used by Nginx for storing the configuration options. The uri_tree variable holds the URL tree. This tree is built as Nginx reads in the configuration options.
bp_extens is an array containing the file extensions that will be bypassed by the module. A list of extensions such as jpg or gif can be provided in a bypass configuration option; the module skips the whitelist check for URLs with these extensions and allows access. The enabled flag sets whether the module is turned on or off.
/* Configuration struct */
typedef struct {
    ngx_flag_t        enabled;
    ngx_array_t      *bp_extens;
    ngx_whl_pnode_t  *uri_tree;
} ngx_http_uri_whitelist_loc_conf_t;
The following shows the code snippet for the module configuration directives. The wh_list directive can be set to on or off to enable or disable the module. The wh_list_uri directive takes a URL string starting with "/"; these are the URLs that will be whitelisted. wh_list_bypass specifies the extensions that will be bypassed by the module.
The functions that handle each directive in the configuration file are also specified in this ngx_command_t array. ngx_http_wh_list_cfg() processes each wh_list_uri directive and builds up the URI tree. ngx_http_wh_list_bypass_cfg() populates the bypass array with the file extensions that will be skipped by the module.
/* Module Directives */
static ngx_command_t ngx_http_uri_whitelist_commands[] = {

    { ngx_string("wh_list"),
      NGX_HTTP_LOC_CONF | NGX_CONF_FLAG,
      ngx_conf_set_flag_slot,
      NGX_HTTP_LOC_CONF_OFFSET,
      offsetof(ngx_http_uri_whitelist_loc_conf_t, enabled),
      NULL },

    { ngx_string("wh_list_uri"),
      NGX_HTTP_LOC_CONF | NGX_CONF_TAKE1,
      ngx_http_wh_list_cfg,
      NGX_HTTP_LOC_CONF_OFFSET,
      0,
      NULL },

    { ngx_string("wh_list_bypass"),
      NGX_HTTP_LOC_CONF | NGX_CONF_1MORE,
      ngx_http_wh_list_bypass_cfg,
      NGX_HTTP_LOC_CONF_OFFSET,
      0,
      NULL },

    ngx_null_command
};
The following are the code snippets for the Module context and Module definition. This article will not go into details on what these are. Refer to the earlier links on Nginx development for more information.
The ngx_http_uri_whitelist_init() function initializes the module after the configuration has been read. ngx_http_uri_whitelist_create_loc_conf() and ngx_http_uri_whitelist_merge_loc_conf() are for creating and merging the configuration structure.
/* Module Context */
static ngx_http_module_t ngx_http_uri_whitelist_module_ctx = {
    NULL,                                   /* preconfiguration */
    ngx_http_uri_whitelist_init,            /* postconfiguration */
    NULL,                                   /* create main configuration */
    NULL,                                   /* init main configuration */
    NULL,                                   /* create server configuration */
    NULL,                                   /* merge server configuration */
    ngx_http_uri_whitelist_create_loc_conf, /* create location configuration */
    ngx_http_uri_whitelist_merge_loc_conf   /* merge location configuration */
};

/* Module Definition */
ngx_module_t ngx_http_uri_whitelist_module = {
    NGX_MODULE_V1,
    &ngx_http_uri_whitelist_module_ctx,     /* module context */
    ngx_http_uri_whitelist_commands,        /* module directives */
    NGX_HTTP_MODULE,                        /* module type */
    NULL,                                   /* init master */
    NULL,                                   /* init module */
    NULL,                                   /* init process */
    NULL,                                   /* init thread */
    NULL,                                   /* exit thread */
    NULL,                                   /* exit process */
    NULL,                                   /* exit master */
    NGX_MODULE_V1_PADDING
};
The following is the code snippet for the ngx_http_uri_whitelist_init() function. It registers the module handler, ngx_http_uri_whitelist_handler(), in the Nginx HTTP access phase. In this phase, the handler can choose whether to accept or reject an HTTP request.
/* Module initialization */
static ngx_int_t
ngx_http_uri_whitelist_init(ngx_conf_t *cf)
{
    ngx_http_handler_pt        *h;
    ngx_http_core_main_conf_t  *cmcf;

    cmcf = ngx_http_conf_get_module_main_conf(cf, ngx_http_core_module);

    /* Add our module handler to the HTTP ACCESS phase */
    h = ngx_array_push(&cmcf->phases[NGX_HTTP_ACCESS_PHASE].handlers);
    if (h == NULL) {
        return NGX_ERROR;
    }

    *h = ngx_http_uri_whitelist_handler;

    return NGX_OK;
}
The following is the code snippet for the module handler, ngx_http_uri_whitelist_handler(). The handler first checks whether the whitelist module is enabled. If it is disabled, the handler returns NGX_DECLINED, passing control back to Nginx; otherwise it checks the URL against the bypass file extensions. If an extension matches, it also passes control back to Nginx.
The handler then calls ngx_http_wh_check_path_exists() to see whether the URL is in the whitelist URI tree. It returns an HTTP 404 error if the URL is not whitelisted; if the URL is whitelisted, control is passed back to Nginx.
/* Module Handler */
static ngx_int_t
ngx_http_uri_whitelist_handler(ngx_http_request_t *r)
{
    size_t                              i;
    ngx_str_t                          *ext;
    ngx_http_uri_whitelist_loc_conf_t  *slcf;

#if WHL_DEBUG
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                   "[URI_WHITELIST]: %V", &r->uri);
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                   "[URI_WHITELIST] extension: %V", &r->exten);
#endif

    if (r->uri.len == 0) {
        return NGX_HTTP_BAD_REQUEST;
    }

    slcf = ngx_http_get_module_loc_conf(r, ngx_http_uri_whitelist_module);
    if (slcf == NULL) {
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }

    if (slcf->enabled != 1) {
        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
                      "[URI_WHITELIST] : White list module disabled !");
        return NGX_DECLINED;
    }

    /* Check for extensions bypass */
    ext = slcf->bp_extens->elts;
    for (i = 0; i < slcf->bp_extens->nelts; i++) {
        if (r->exten.len == ext[i].len
            && ngx_strncmp(r->exten.data, ext[i].data, r->exten.len) == 0)
        {
            return NGX_DECLINED;
        }
    }

    if (!ngx_http_wh_check_path_exists(r->uri.data,
                                       r->uri.len, slcf->uri_tree))
    {
        /* If uri is not present in whitelist */
        ngx_log_error(NGX_LOG_ALERT, r->connection->log, 0,
                      "[URI_WHITELIST] : Access Denied for [ %V ] ", &r->uri);
        return NGX_HTTP_NOT_FOUND;
    }

    return NGX_DECLINED;
}
The following code snippet contains the functions for building up the URI tree. ngx_http_wh_create_node() creates a new node. ngx_http_wh_add_child() adds a child node to a parent. If the path passed in is a single "/", ngx_http_wh_add_child() returns the parent; this skips repeated "/" characters in the URL. Otherwise, ngx_http_wh_add_child() returns the child node, either because it already exists or because it has been added successfully to the parent node.
If the parent node runs out of space for child nodes, ngx_http_wh_add_child() calls ngx_http_wh_resize_children(), which doubles the capacity of the parent's children array each time. The maximum number of child nodes is limited to NGX_WHL_MAX_CHILDREN (65536), defined earlier in the source.
ngx_http_wh_add_path() adds a URL or URI to the URI tree. It loops through the URL string, breaking it into its constituent parts and adding each part to the tree.
/* Creates a path node based on a part of the uri */
static ngx_whl_pnode_t *
ngx_http_wh_create_node(const u_char* path, size_t plen, ngx_conf_t *cf)
{
    size_t            sz;
    ngx_str_t        *sgmt;
    ngx_whl_pnode_t  *node;

    if (path == NULL)
        return NULL;

    if (plen == 0 || plen >= NGX_WHL_MAXPATHSZ)
        return NULL;

    sgmt = ngx_pcalloc(cf->pool, sizeof(ngx_str_t));
    if (sgmt == NULL) {
        return NULL;
    }

    /* The extra zeroed byte keeps the segment NUL-terminated */
    sz = plen + 1;
    sgmt->data = ngx_pcalloc(cf->pool, sz * sizeof(u_char));
    if (sgmt->data == NULL) {
        return NULL;
    }

    /* Copy only plen bytes, so path need not be NUL-terminated */
    ngx_memcpy(sgmt->data, path, plen);
    sgmt->len = plen;

    node = ngx_pcalloc(cf->pool, sizeof(ngx_whl_pnode_t));
    if (node == NULL) {
        return NULL;
    }

    node->children = ngx_pcalloc(cf->pool,
                         NGX_WHL_INIT_CHIDREN_SZ * sizeof(ngx_whl_pnode_t *));
    if (node->children == NULL) {
        return NULL;
    }

    node->segment = sgmt;
    node->num_child = 0;
    node->maxchild = NGX_WHL_INIT_CHIDREN_SZ;
    node->end_slash_allowed = 0;

    return node;
}


/* Adds a uri path to the uri tree */
static ngx_whl_pnode_t *
ngx_http_wh_add_child(const u_char *path, ngx_whl_pnode_t *parent,
    ngx_conf_t *cf)
{
    size_t            plen, i;
    ngx_whl_pnode_t  *node;

    if (path == NULL || parent == NULL) {
        return NULL;
    }

    plen = ngx_strlen(path);
    if (plen == 0 || plen >= NGX_WHL_MAXPATHSZ) {
        return NULL;
    }

    /* Ignore additional '/' */
    if (plen == 1 && ngx_strncmp(path, "/", plen) == 0) {
        return parent;
    }

    for (i = 0; i < parent->num_child; i++) {
        /* check if segment path already exists */
        node = parent->children[i];
        if (node->segment->len == plen
            && ngx_strncmp(path, node->segment->data, plen) == 0)
        {
            return node;
        }
    }

    /* uri segment path does not exist, allocate new child */
    node = ngx_http_wh_create_node(path, plen, cf);
    if (node == NULL) {
        return NULL;
    }

    if (i >= parent->maxchild) {
        if (!ngx_http_wh_resize_children(parent, cf)) {
            return NULL;
        }
    }

    parent->children[i] = node;
    parent->num_child++;

    return node;
}


/* Resizes a node children array if original space is insufficient */
static size_t
ngx_http_wh_resize_children(ngx_whl_pnode_t *parent, ngx_conf_t *cf)
{
    size_t             new_sz, i;
    ngx_whl_pnode_t  **old, **new;

    if (parent == NULL) {
        return 0;
    }

    new_sz = parent->maxchild * 2;
    if (new_sz > NGX_WHL_MAX_CHILDREN) {
        return 0;
    }

    new = ngx_pcalloc(cf->pool, new_sz * sizeof(ngx_whl_pnode_t *));
    if (new == NULL) {
        return 0;
    }

    old = parent->children;
    for (i = 0; i < parent->num_child; i++) {
        new[i] = old[i];
    }

    parent->children = new;
    parent->maxchild = new_sz;
    old = NULL;

    return 1;
}


/* Adds a full URL to the uri tree */
static size_t
ngx_http_wh_add_path(u_char *path, ngx_whl_pnode_t *root, ngx_conf_t *cf)
{
    size_t            plen, last, index, nested;
    u_char           *p, c, tmp[NGX_WHL_MAXPATHSZ];
    ngx_whl_pnode_t  *node;

    if (path == NULL || root == NULL) {
        return 0;
    }

    plen = ngx_strlen(path);
    if (plen == 0 || plen >= NGX_WHL_MAXPATHSZ) {
        return 0;
    }

    p = path;
    index = last = nested = 0;
    node = root;

    while ((c = *p++) != '\0') {
        switch (c) {
        case '/':
            if (index + 1 >= NGX_WHL_MAXPATHSZ
                || nested > NGX_WHL_MAX_NEST)
            {
                return 0;
            }
            tmp[index] = c;
            index++;
            tmp[index] = '\0';
            node = ngx_http_wh_add_child(tmp, node, cf);
            if (node == NULL) {
                return 0;
            }
            nested++;
            index = last = 0;
            break;

        default:
            if (index >= NGX_WHL_MAXPATHSZ) {
                return 0;
            }
            tmp[index] = c;
            index++;
            last = 1;
        }
    }

    if (last) {
        if (index >= NGX_WHL_MAXPATHSZ
            || nested > NGX_WHL_MAX_NEST)
        {
            return 0;
        }
        tmp[index] = '\0';
        node = ngx_http_wh_add_child(tmp, node, cf);
        if (node == NULL) {
            return 0;
        }
    } else {
        /* node ends with '/' */
        node->end_slash_allowed = 1;
    }

    return 1;
}
The following is the code snippet for the ngx_http_wh_list_cfg() function. This function is called to process each wh_list_uri directive containing the URL to be whitelisted. It calls the ngx_http_wh_add_path() function to add each URL to the whitelist URI tree. It also creates the root node using ngx_http_wh_create_node() function, if it doesn't exist.
/* Process the white list uri configuration */
static char *
ngx_http_wh_list_cfg(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
{
    size_t                              len;
    u_char                             *uri;
    ngx_str_t                          *value;
    ngx_whl_pnode_t                    *root;
    ngx_http_uri_whitelist_loc_conf_t  *slcf;

    if (cf->args->nelts < 2) {
        return NGX_CONF_ERROR;
    }

    value = cf->args->elts;
    uri = value[1].data;
    len = value[1].len;

    if (uri[0] != '/') {
        ngx_log_error(NGX_LOG_EMERG, cf->log, 0, "[URI_WHITELIST]: "
                      "Error uri must start with '/'");
        return NGX_CONF_ERROR;
    }

    /* The backslash is escaped: an unescaped '\0' inside the string
     * literal would terminate the log message early */
    if (uri[len] != '\0') {
        ngx_log_error(NGX_LOG_EMERG, cf->log, 0, "[URI_WHITELIST]: "
                      "Error uri does not end with '\\0'");
        return NGX_CONF_ERROR;
    }

    slcf = conf;
    if (slcf->uri_tree == NULL) {
        slcf->uri_tree = ngx_http_wh_create_node((u_char *) "/", 1, cf);
        if (slcf->uri_tree == NULL) {
            return NGX_CONF_ERROR;
        }
    }

    root = slcf->uri_tree;
    if (!ngx_http_wh_add_path(uri, root, cf)) {
        ngx_log_error(NGX_LOG_EMERG, cf->log, 0, "[URI_WHITELIST]: "
                      "Error cannot add uri to whitelist");
        return NGX_CONF_ERROR;
    }

    return NGX_CONF_OK;
}
The ngx_http_wh_check_path_exists() function checks whether a URL string is present in the URI tree; its code snippet follows below. It breaks the URL string down into its parts, first checking that the URL begins with "/" (there must always be a root node). Then, for each subsequent part, it checks whether the current parent node has that part as a child.
If the node is the root node "/" or if the node ends with a slash like "scripts/", then the end_slash_allowed flag of the node is checked. When end_slash_allowed is set to 1, it means that the node (URL) is present, otherwise it is not. The end_slash_allowed flag is set only when there is an explicit whitelist directive (wh_list_uri) for a URL that ends with "/".
This is required because when a URL like "/mydirectory/myfile.php" is whitelisted, the nodes "/", "mydirectory/" and "myfile.php" are created in the URI tree. However, this does not mean that the URLs "/" or "/mydirectory/" should be accessible, since these 2 URLs are not whitelisted explicitly. To make "/" or "/mydirectory/" accessible, they must be specified explicitly using the whitelist directive.
Notice that the parsing code does not handle "./" or "../". This is not necessary here because Nginx normalizes the request URI before passing it to the module: in r->uri, dot segments are already resolved, percent-encoded characters are decoded and, with the default merge_slashes setting, repeated slashes are merged.
The ngx_http_wh_check_path_exists() function calls ngx_http_wh_check_path_seg() to check that a child node exists under a parent node. We will not go through the ngx_http_wh_check_path_seg() function. Refer to the Github link at the end of the article for the full module source code.
/* Checks if a uri path is present in the uri tree */
static size_t
ngx_http_wh_check_path_exists(u_char* path, size_t len, ngx_whl_pnode_t *root)
{
    size_t            plen, index, last;
    u_char            c, *p, tmp[NGX_WHL_MAXPATHSZ];
    ngx_whl_pnode_t  *node;

    if (path == NULL || root == NULL) {
        return 0;
    }

    if (len == 0 || len >= NGX_WHL_MAXPATHSZ) {
        return 0;
    }

    p = path;
    c = *p++;
    if (c != '/') {
        return 0;
    }

    plen = len - 1;
    node = root;
    index = last = 0;

    while (plen-- > 0) {
        c = *p++;
        switch (c) {
        case '/':
            if (index + 1 >= NGX_WHL_MAXPATHSZ) {
                return 0;
            }
            tmp[index] = c;
            index++;
            tmp[index] = '\0';
            node = ngx_http_wh_check_path_seg(tmp, index, node);
            if (node == NULL) {
                return 0;
            }
            index = last = 0;
            break;

        default:
            last = 1;
            if (index >= NGX_WHL_MAXPATHSZ) {
                return 0;
            }
            tmp[index] = c;
            index++;
        }
    }

    if (last) {
        if (index >= NGX_WHL_MAXPATHSZ) {
            return 0;
        }
        tmp[index] = '\0';
        node = ngx_http_wh_check_path_seg(tmp, index, node);
        if (node == NULL) {
            return 0;
        }
    } else {
        /* node ends with '/' */
        if (node->end_slash_allowed == 0) {
            return 0;
        }
    }

    return 1;
}
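For illustration, the kind of normalization Nginx applies before the handler runs (merging repeated slashes and resolving "." and ".." segments) can be sketched as follows. This is a simplified model, not Nginx's parser: Nginx also decodes percent-encoded characters, and it rejects a ".." that would climb above the root with 400 Bad Request, which the hypothetical `whl_normalize()` signals by returning -1.

```c
#include <string.h>

/* Normalizes an absolute path in place: merges repeated '/', drops "."
 * segments and resolves ".." segments. Returns the new length, or -1 if
 * the path is not absolute or ".." would climb above the root. */
static int
whl_normalize(char *path)
{
    const char *r = path;     /* read cursor */
    size_t      w = 0;        /* length of the normalized output so far */
    size_t      seglen;
    int         trailing = 0; /* output should end with '/' */

    if (path[0] != '/') {
        return -1;
    }
    while (*r != '\0') {
        while (*r == '/') {                 /* merge repeated slashes */
            r++;
        }
        if (*r == '\0') {                   /* path ended with '/' */
            trailing = 1;
            break;
        }
        seglen = strcspn(r, "/");
        if (seglen == 1 && r[0] == '.') {
            trailing = 1;                   /* "." : stay in this directory */
        } else if (seglen == 2 && r[0] == '.' && r[1] == '.') {
            if (w == 0) {
                return -1;                  /* above the root: reject */
            }
            do {                            /* drop the last output segment */
                w--;
            } while (path[w] != '/');
            trailing = 1;
        } else {
            path[w++] = '/';                /* output never overtakes input */
            memmove(path + w, r, seglen);
            w += seglen;
            trailing = 0;
        }
        r += seglen;
    }
    if (w == 0 || trailing) {
        path[w++] = '/';
    }
    path[w] = '\0';
    return (int) w;
}
```

A request for /mydirectory/../admin/ therefore reaches the whitelist check as /admin/, so it cannot slip past the tree through the mydirectory/ node.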
Installation and Testing
To install the module, obtain a copy of the module source code from GitHub.
To verify the integrity and signature of the module source code, refer to this link. Obtain a copy of my public key; follow the page instructions on how to import it and verify the git commit.
Download the latest stable Nginx source code from https://nginx.org and verify its integrity using the PGP signature.
The downloaded gzipped file should have the following SHA256 checksum.
Extract the nginx source and compile nginx with the URI whitelisting module. Install it into /usr/local/nginx.
cd nginx-1.18.0/
./configure --with-cc-opt="-Wextra -Wformat -Wformat-security -Wformat-y2k -Werror=format-security -fPIE -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all" --with-ld-opt="-pie -Wl,-z,relro -Wl,-z,now -Wl,--strip-all" --without-http_rewrite_module --add-module=../ngx_http_uri_whitelist_module
make
sudo make install
We can now test the URI whitelist module. It is assumed that an Apache website is already set up on the system, with Apache httpd listening on port 8080. We can configure Nginx as a reverse proxy for the Apache website; the module can also be used on a website hosted directly by Nginx.
Edit /usr/local/nginx/conf/nginx.conf with the following.
user nginx nginx;
worker_processes 1;

error_log /var/log/nginx/error.log warn;
pid /var/log/nginx/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" "$gzip_ratio"';

    sendfile on;
    keepalive_timeout 65;
    server_tokens off;

    proxy_cache_path /usr/local/nginx/cache levels=1:2 keys_zone=webcache:2m max_size=150m;
    proxy_cache_key "$scheme$request_method$host$request_uri$is_args$args";
    proxy_cache_valid 200 302 30m;
    proxy_cache_valid 404 1m;

    gzip on;

    map $sent_http_content_type $cachemap {
        default                 no-store;
        ~text/html              "private, max-age=900";
        text/plain              "private, max-age=900";
        text/css                "private, max-age=7776000";
        application/javascript  "private, max-age=7776000";
        ~image/                 "private, max-age=7776000";
    }

    server {
        listen 80;
        server_name localhost;
        root /opt/nginx/www;
        charset utf-8;
        access_log /var/log/nginx/access.log main;

        location / {
            index index.html index.htm;
            proxy_cache webcache;
            proxy_cache_bypass $http_cache_control;
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://127.0.0.1:8080;
            add_header Cache-Control $cachemap;
            wh_list off;
        }

        # redirect server error pages to the static page /50x.html
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}
Create a web root directory for nginx.
Make sure that the nginx user and group are present, otherwise create them.
sudo chmod 750 /opt/nginx/home
sudo groupadd -g 9870 nginx
sudo useradd -d /opt/nginx/home -u 9870 -g 9870 -s /bin/false nginx
On the Apache website, make sure that there is an index.html with some test content. Create another test file, test.txt, and put some test content in it as well. The Nginx URI whitelist module is currently turned off in nginx.conf, so these URLs should be accessible through the Nginx proxy. Make sure Apache httpd is running and listening on port 8080, then start Nginx.
Access http://localhost, http://localhost/index.html and http://localhost/test.txt. All three URLs should be accessible and return the right content.
devuser1@devmachine:~$ curl -i http://localhost
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 04:38:06 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 170
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:36:42 GMT
Vary: Accept-Encoding
Cache-Control: private, max-age=900
Accept-Ranges: bytes

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Testing html page</title>
</head>
<body>
<p>
This is a test for Nginx URI whitelisting !
</p>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/index.html
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 04:43:04 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 170
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:36:42 GMT
Vary: Accept-Encoding
Cache-Control: private, max-age=900
Accept-Ranges: bytes

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Testing html page</title>
</head>
<body>
<p>
This is a test for Nginx URI whitelisting !
</p>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/test.txt
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 04:43:50 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 56
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:37:23 GMT
Cache-Control: no-store
Accept-Ranges: bytes

This is a test text file
Testing Nginx URI whitelisting
Edit /usr/local/nginx/conf/nginx.conf and turn on the Nginx URI whitelisting module by changing wh_list off; in the location / block to the following.
wh_list on;
Reload nginx with the new configuration.
Access the 3 URLs again using curl. This time, access should be denied with HTTP 404 error.
devuser1@devmachine:~$ curl -i http://localhost
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 04:49:49 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/index.html
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 05:07:19 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/test.txt
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 04:49:40 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
Let's whitelist some of the URLs. Edit nginx.conf and add the following directives inside the location / block.
wh_list_uri /index.html;
wh_list_uri /test.txt;
Reload nginx.
These 2 URLs should now be accessible again due to the whitelist.
devuser1@devmachine:~$ curl -i http://localhost/index.html
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 05:13:50 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 170
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:36:42 GMT
Vary: Accept-Encoding
Cache-Control: private, max-age=900
Accept-Ranges: bytes

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Testing html page</title>
</head>
<body>
<p>
This is a test for Nginx URI whitelisting !
</p>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/test.txt
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 05:16:35 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 56
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:37:23 GMT
Cache-Control: no-store
Accept-Ranges: bytes

This is a test text file
Testing Nginx URI whitelisting
However, when we try to access http://localhost or http://localhost/, both show HTTP 404 error.
devuser1@devmachine:~$ curl -i http://localhost
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 05:17:43 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 05:17:49 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
This is because the root directory "/" has not been whitelisted. To allow access, add the following to nginx.conf.
wh_list_uri /;
Reload nginx and the root directory URL should be accessible again. The same applies to subdirectories: if the root URL of a subdirectory is to be accessible, it has to be whitelisted. For example,
Feel free to experiment with the Nginx whitelist module. It also provides an extensions bypass directive that lets files with certain extensions, such as jpg or gif, skip the whitelist check. This directive should be used with care. For the best protection, every web resource that is meant to be accessible, including static image files, should be whitelisted explicitly. The README.md in the module's GitHub repository documents the syntax of its directives.
To bypass the whitelist check for a file extension, for instance ".txt", add the following to nginx.conf.
Create a new text file, mytest.txt, with some content, and reload nginx. The new file is accessible without being explicitly whitelisted. In fact, every file ending in ".txt" is accessible, because the module skips its access check for that extension.
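The effect of the bypass can be modeled in Python as "whitelisted path OR bypassed extension". This is a sketch under stated assumptions, not the module's code: it assumes the extension is the text after the last dot of the final path segment, compared case-sensitively, and that dotfiles such as ".htaccess" have no extension. The module's README is the authority on the real rules. WHITELIST, BYPASS_EXTS and the function names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical configuration: an explicit whitelist plus a bypass for "txt".
WHITELIST = {"/", "/index.html"}
BYPASS_EXTS = {"txt"}


def extension_of(path: str) -> str:
    """Return the text after the last dot of the final path segment, or ""."""
    name = posixpath.basename(path)
    stem, dot, ext = name.rpartition(".")
    # Names without a dot, or dotfiles such as ".htaccess", count as no extension.
    return ext if dot and stem else ""


def is_allowed(target: str, whitelist=WHITELIST, bypass_exts=BYPASS_EXTS) -> bool:
    """Allow a request if its path is whitelisted or its extension is bypassed."""
    path = urlsplit(target).path or "/"
    return path in whitelist or extension_of(path) in bypass_exts


if __name__ == "__main__":
    for target in ["/mytest.txt", "/notes/old.txt", "/config.php", "/"]:
        print(target, is_allowed(target))
```

Every .txt file passes, including one uploaded by mistake, which is why the bypass should be reserved for file types that are safe to expose wholesale.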
devuser1@devmachine:~$ curl -i http://localhost/mytest.txt
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 31 Oct 2019 02:24:54 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 70
Connection: keep-alive
Last-Modified: Thu, 31 Oct 2019 02:23:38 GMT
Vary: Accept-Encoding
Cache-Control: no-store
Accept-Ranges: bytes

This is another test file for trying on extensions bypass directive.
The Nginx whitelist module writes warnings and alerts to the nginx error log. When access to a URL is denied, an alert is logged; when the module itself is turned off, a warning is logged. This is useful for security monitoring, since a security engineer or administrator will want to know about unauthorized access attempts or about the module being disabled.
Some examples from the error log.
2019/10/29 12:43:50 [warn] 6523#0: *6 [URI_WHITELIST] : White list module disabled !, client: 127.0.0.1, server: localhost, request: "GET /test.txt HTTP/1.1", host: "localhost"
2019/10/29 12:49:40 [alert] 6653#0: *8 [URI_WHITELIST] : Access Denied for [ /test.txt ] , client: 127.0.0.1, server: localhost, request: "GET /test.txt HTTP/1.1", host: "localhost"
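For monitoring, the module's entries can be picked out of the error log by their [URI_WHITELIST] tag. The Python sketch below parses lines in the format of the two samples above. The regular expression was written against those samples only, so other nginx versions or custom log setups may need adjustments; the names LOG_RE, scan and denied_by_client are hypothetical.

```python
import re
from collections import Counter

# Pattern written against the sample [URI_WHITELIST] error log lines.
LOG_RE = re.compile(
    r'^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>\w+)\] .*?\[URI_WHITELIST\] : (?P<message>.*?), '
    r'client: (?P<client>[^,]+),.*?request: "(?P<request>[^"]*)"'
)

SAMPLE = [
    '2019/10/29 12:43:50 [warn] 6523#0: *6 [URI_WHITELIST] : White list module '
    'disabled !, client: 127.0.0.1, server: localhost, request: "GET /test.txt '
    'HTTP/1.1", host: "localhost"',
    '2019/10/29 12:49:40 [alert] 6653#0: *8 [URI_WHITELIST] : Access Denied for '
    '[ /test.txt ] , client: 127.0.0.1, server: localhost, request: "GET /test.txt '
    'HTTP/1.1", host: "localhost"',
]


def scan(lines):
    """Return one dict per [URI_WHITELIST] line; other lines are ignored."""
    events = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            event = m.groupdict()
            event["message"] = event["message"].strip()
            events.append(event)
    return events


def denied_by_client(events):
    """Count access-denied alerts per client address."""
    return Counter(e["client"] for e in events if e["level"] == "alert")


if __name__ == "__main__":
    for e in scan(SAMPLE):
        print(e["time"], e["level"], e["client"], e["message"])
```

In practice an open error log file object can be passed to scan() in place of SAMPLE; where that log lives depends on the installation. A warn entry deserves as much attention as an alert, since it means the whitelist is no longer enforced.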
Generating a Whitelist Configuration using Python
To ease the whitelisting process, a Python script can generate the list of whitelisted URLs automatically. The following is a simple Python 3 script that traverses a web document root directory and generates the whitelist directives.
#!/usr/bin/python3
#
# Simple python script to traverse a web document root directory
# Generate a whitelist configuration for the files
# and directories
#
# Ng Chiang Lin
# Nov 2020
# https://www.nighthour.sg/articles/2019/developing-nginx-url-whitelisting-module.html
#
import os
import sys

rootdir = "HomePage"
ignore_exts = ['jpg', 'png', 'svg', 'gif', 'webp']
directive = "wh_list_uri"


def checkFile(filename):
    # hidden file or directory
    if filename.startswith('.'):
        return False

    parts = filename.split('.')
    length = len(parts)

    # filename doesn't have a dot extension
    if length < 2:
        return True

    # check that file extension is not in ignore list
    extension = parts[length - 1]
    for ext in ignore_exts:
        if extension == ext:
            return False

    return True


def formatRelativePath(path):
    # Drop the leading document root component, e.g.
    # "HomePage/css/site.css" -> "css/site.css".
    # Assumes rootdir is a single relative directory name.
    parts = path.split('/')
    length = len(parts)
    if length < 2:
        print("An error occurred: file path format is wrong")
        sys.exit(1)

    relativepath = ""
    for i in range(1, length):
        if i < length - 1:
            relativepath = relativepath + parts[i] + "/"
        else:
            relativepath = relativepath + parts[i]

    return relativepath


def listDir(directory):
    with os.scandir(directory) as it:
        for entry in it:
            entryname = ''
            if entry.is_file() and checkFile(entry.name):
                entryname = formatRelativePath(entry.path)
                print(directive, ' /', entryname, ' ;', sep='')
            elif entry.is_dir() and not entry.name.startswith('.'):
                # Whitelist both "/dir" and "/dir/", then recurse.
                # Hidden directories such as .git are skipped.
                entryname = formatRelativePath(entry.path)
                print(directive, ' /', entryname, ' ;', sep='')
                print(directive, ' /', entryname, '/ ;', sep='')
                listDir(entry.path)


if __name__ == "__main__":
    # Whitelist the root directory "/" first
    print(directive, ' / ;', sep='')
    listDir(rootdir)
The script prints the whitelist directives to standard output. The output can be redirected to a configuration file and pulled into the nginx configuration with an include directive.
The script assumes that the files and subdirectories in the web document root are not sensitive and should all be publicly accessible. It skips hidden files as well as files with the image extensions listed in ignore_exts ("jpg", "png", "svg", "gif" and "webp"). The rootdir variable sets the document root directory to traverse, and ignore_exts sets the extensions to skip.
The Python script only generates an initial list of whitelisted URLs. A web or security administrator should still review the whitelist and remove any files or directories that should not be accessible.
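Part of that review can be scripted, though it does not replace reading the list. The sketch below drops generated directives whose path matches a glob pattern, using Python's fnmatch module. The patterns in SENSITIVE_PATTERNS and the function name review are hypothetical examples; real patterns should come from the administrator's knowledge of the site.

```python
import fnmatch
import re

# Hypothetical glob patterns for paths that must never be whitelisted.
# In fnmatch, "*" also matches "/", so "*.sql" covers any directory depth.
SENSITIVE_PATTERNS = ["/admin", "/admin/*", "*.bak", "*.sql", "*/backup/*"]

# Matches lines such as "wh_list_uri /css/site.css ;" printed by the script.
DIRECTIVE_RE = re.compile(r"^\s*wh_list_uri\s+(\S+)\s*;\s*$")


def review(lines, patterns=SENSITIVE_PATTERNS):
    """Split lines into (kept, removed); non-directive lines are always kept."""
    kept, removed = [], []
    for line in lines:
        m = DIRECTIVE_RE.match(line)
        if m and any(fnmatch.fnmatchcase(m.group(1), p) for p in patterns):
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed


if __name__ == "__main__":
    generated = [
        "wh_list_uri / ;",
        "wh_list_uri /index.html ;",
        "wh_list_uri /admin ;",
        "wh_list_uri /admin/ ;",
        "wh_list_uri /db/site.sql ;",
    ]
    kept, removed = review(generated)
    for line in kept:
        print(line)
    for line in removed:
        print("# removed:", line)
```

Printing the removed entries as comments keeps an audit trail of what was filtered out, which can then be checked by hand.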
For the image extensions skipped by the Python script, the module's extensions bypass directive can grant access to those image types. If some images are sensitive and should not be accessible, their types should not be bypassed. Instead, an explicit whitelist entry should be generated for each image file that is not sensitive and can be publicly accessible.
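A sketch of that explicit approach: walk the document root with os.walk and emit a directive for every image file except those on a deny list, so no blanket bypass is needed. The image extensions reuse the script's ignore_exts values and the "HomePage" root; the function name image_directives and the sample deny-list entry are hypothetical.

```python
import os

# Image extensions skipped by the generator script (its ignore_exts list).
IMAGE_EXTS = {"jpg", "png", "svg", "gif", "webp"}


def image_directives(rootdir, sensitive=frozenset(), directive="wh_list_uri"):
    """Return whitelist directives for image files under rootdir.

    URIs are built relative to rootdir with "/" separators. Any URI listed
    in `sensitive` is left out, so it stays blocked by default.
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(rootdir):
        # Prune hidden directories in place and sort for stable output.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.startswith(".") or "." not in name:
                continue
            # Case-sensitive comparison, like the generator script.
            if name.rsplit(".", 1)[1] not in IMAGE_EXTS:
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), rootdir)
            uri = "/" + rel.replace(os.sep, "/")
            if uri not in sensitive:
                lines.append(f"{directive} {uri};")
    return lines


if __name__ == "__main__":
    # Hypothetical example: keep one private image out of the whitelist.
    for line in image_directives("HomePage", sensitive={"/img/private.jpg"}):
        print(line)
```

Because anything not listed is denied, forgetting to add an image to the deny list is the only way a sensitive image leaks. That is a narrower failure than bypassing the whole image type.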
Defense in Depth
There are many ways to protect administrative interfaces and consoles, API endpoints or other web resources that should not be publicly accessible. Security best practices stress defense in depth: multiple layers of defenses and mitigations.
Unnecessary administrative components should be removed or disabled in an application. Firewall rules should block public access to internal interfaces listening on ports that are not meant to be exposed. Administrative interfaces should require strong authentication and complex passwords.
IP address filtering can restrict access to administrative consoles and similar interfaces. Private APIs can also be secured with mutual TLS authentication, which ensures that only authorized clients holding the right certificates can access an application. Because both the client and the server certificates have to be verified and trusted, mutual TLS also helps prevent man-in-the-middle attacks.
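IP address filtering comes down to checking whether the client address falls inside an allowed network. In nginx this is done with the allow and deny directives of the access module; the Python sketch below only illustrates the underlying deny-by-default check using the standard ipaddress module. The networks in ADMIN_NETWORKS and the function name client_allowed are hypothetical.

```python
import ipaddress

# Hypothetical networks permitted to reach an administrative console.
ADMIN_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("::1/128"),
]


def client_allowed(addr: str, networks=ADMIN_NETWORKS) -> bool:
    """Return True only if addr lies in one of the allowed networks."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Malformed addresses are rejected instead of raising.
        return False
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version lists are safe to check this way.
    return any(ip in net for net in networks)


if __name__ == "__main__":
    for addr in ["10.1.2.3", "192.168.2.1", "::1", "203.0.113.5"]:
        print(addr, client_allowed(addr))
```

This mirrors the philosophy of the URL whitelist: only what is explicitly listed gets through.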
All these measures can be used together with URL whitelisting to secure an application and reduce its attack surface.
Conclusion and Afterthought
Whitelisting is a useful technique in information security. It can guard web applications against invalid user input, and it can prevent unauthorized applications from running on enterprise desktops and servers. Network firewalls and rate limiters also use whitelisting to stop malicious network traffic.
Whitelisting can also be applied to web URLs to control access to web resources. An Nginx module can enforce a whitelist of URLs, adding a layer of defense against web attacks, vulnerabilities in web application frameworks, misconfigurations and accidental uploads of sensitive files. Together with other security measures, URL whitelisting can reduce the attack surface of a website or web application.
Useful References
- Veracode Blog on Exploiting Spring actuators, A blog post that explains how vulnerable Spring Boot actuators can be exploited by attackers.
- Nginx Development Guide, The official nginx guide on Nginx development.
- Emiller's Guide to Nginx Module Development, Evan Miller's guide on how to develop and write Nginx modules. It is a useful guide for beginners learning to code an Nginx module.
- Nginx Auth Basic Module, A useful reference on how to code up an nginx module that controls access.
- Restricting Access using IP Addresses, Nginx documentation on how to restrict access to web resources using IP addresses.
- How to Set Up Mutual TLS Authentication to Protect Your Admin Console, An article on how to enable Mutual TLS to protect an admin console.
- mTLS with NGINX, A presentation on Mutual TLS using Nginx.
The full source code for the Nginx URI Whitelist Module is available at the following GitHub link.
https://github.com/ngchianglin/ngx_http_uri_whitelist_module
The Python script that generates an initial whitelist:
https://github.com/ngchianglin/VPS_MISC/blob/master/whitelist.py
If you have any feedback, comments, corrections or suggestions to improve this article, you can reach me via the contact/feedback link at the bottom of the page.
Article last updated on Nov 2020.