
Pu's mind space

An attempt at building a high-performance Node.js HTTP server

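My boss asked me to try building a high-TPS HTTP server with Node. The business logic doesn't matter; the goal is simply to measure how much better this technology, reputedly well suited to I/O, performs than a Java web container. The test results, written up in English: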

I tried several approaches to increase the TPS of a Node.js HTTP server, to check whether it is competitive enough to serve as an easy tool for some specific tasks.
I created a simple HTTP server based on Node.js's native http module. It receives HTTP requests, records their information in a (remote) MongoDB, then responds with ‘Okey’.
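A minimal sketch of such a server, assuming each request is recorded as its method, URL, headers and a timestamp. The remote MongoDB insert is replaced here by a hypothetical in-memory `recordRequest` stub (the original driver code is not shown), and `createAppServer` is likewise a made-up name:

```javascript
const http = require('http');

// Hypothetical stand-in for the remote MongoDB insert used in the test.
// It completes asynchronously, like a real database write would.
// Entries are kept in memory only, so this stub is not for long runs.
const recorded = [];
function recordRequest(info) {
  return new Promise((resolve) => {
    setImmediate(() => {
      recorded.push(info);
      resolve();
    });
  });
}

// Record the request's basic information, then answer with 'Okey'.
async function handleRequest(req, res) {
  const info = {
    method: req.method,
    url: req.url,
    headers: req.headers,
    receivedAt: new Date(),
  };
  try {
    await recordRequest(info);
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Okey');
  } catch (err) {
    res.writeHead(500, { 'Content-Type': 'text/plain' });
    res.end('Error');
  }
}

function createAppServer() {
  return http.createServer(handleRequest);
}

module.exports = { createAppServer, handleRequest, recorded };
```

To run it on the port used in the ab runs: `require('./server').createAppServer().listen(1337)`. Swapping the stub for a real MongoDB insert keeps the same shape, since the handler only needs a function that returns a Promise.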

The test tool is Apache Bench (ab), installed on the same host as the HTTP server: a “Dell OptiPlex 7010” desktop with an 8-core CPU and 8 GB of memory, running Oracle Linux Server 6.8.

Optimization approaches include:

  • Increasing the host's ‘open files’ limit with “ulimit -n 99999” (the default is 1024), and raising V8's old-generation heap limit in Node with “--max-old-space-size=2048” (the value is in MB)

    [root@pu ~]# ulimit -a
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 31197
    max locked memory (kbytes, -l) 64
    max memory size (kbytes, -m) unlimited
    open files (-n) 99999
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 10240
    cpu time (seconds, -t) unlimited
    max user processes (-u) 31197
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited
  • Reusing TCP connections for successive requests, i.e. using the Keep-Alive feature of HTTP/1.0
  • Running the HTTP server as a cluster to make use of more CPU cores
  • Changing the business logic to return immediately on receiving a request instead of waiting for the database to finish recording

OS level tuning

Increasing the maximum number of open files (and hence sockets) and the V8 heap limit did not improve performance, which means we had reached neither the limit on concurrent sockets nor the memory limit.
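To confirm that the heap flag actually took effect, and to see how far the process stays from that limit under load, a small helper can read V8's heap statistics. This is a sketch: `heapLimitMB` and `heapUsedMB` are hypothetical names, while `v8.getHeapStatistics()` and `process.memoryUsage()` are standard Node APIs:

```javascript
const v8 = require('v8');

const MB = 1024 * 1024;

// The heap limit V8 is actually using, in MB. Compare runs with and without
// --max-old-space-size=2048 to check that the flag was picked up.
function heapLimitMB() {
  const { heap_size_limit } = v8.getHeapStatistics();
  return Math.round(heap_size_limit / MB);
}

// Current heap usage of this process, in MB.
function heapUsedMB() {
  return Math.round(process.memoryUsage().heapUsed / MB);
}

module.exports = { heapLimitMB, heapUsedMB };
```

For example, `node --max-old-space-size=2048 -e "console.log(require('./heap').heapLimitMB())"` versus the same command without the flag. If heap usage during the benchmark stays far below the limit, memory was not the bottleneck, which matches the result above.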

Reuse connection

HTTP/1.0 needs the header ‘Connection: keep-alive’ to reuse a connection for further requests, whereas in HTTP/1.1 connections are persistent by default. Apache Bench uses HTTP/1.0, and with the “-k” option it adds the keep-alive header.
Since HTTP/1.0 cannot use ‘Transfer-Encoding: chunked’, ‘Content-Length’ is the only way for the client to tell where one response ends and the next begins on a single connection. The length is easy to know when serving a static file, but for a dynamic page we have to calculate it ourselves and set it in the response header, which is what we added to the Node.js HTTP server.
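A minimal sketch of setting ‘Content-Length’ by hand for a dynamic body. The helper `sendWithLength` and the server factory are hypothetical names, and the fixed ‘Okey’ body stands in for whatever the real handler produces. The header must carry the length in bytes (`Buffer.byteLength`), not the JavaScript string length, because the two differ for non-ASCII text:

```javascript
const http = require('http');

// Send a dynamic body with an explicit Content-Length so that an HTTP/1.0
// client using "Connection: keep-alive" (such as `ab -k`) can tell where
// this response ends and reuse the connection for the next request.
function sendWithLength(res, statusCode, body) {
  res.writeHead(statusCode, {
    'Content-Type': 'text/plain; charset=utf-8',
    // Byte length, not string length: they differ for non-ASCII text.
    'Content-Length': Buffer.byteLength(body, 'utf8'),
  });
  res.end(body);
}

function createKeepAliveServer() {
  return http.createServer((req, res) => {
    // Hypothetical dynamic part; the real body would be built per request.
    sendWithLength(res, 200, 'Okey');
  });
}

module.exports = { sendWithLength, createKeepAliveServer };
```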
With ‘Content-Length’ set and keep-alive enabled (“-k”), the throughput increased:

[root@pu ~]# ab -n 10000 -c 10 http://localhost:1337/
Concurrency Level: 10
Time taken for tests: 1.512 seconds
[root@pu ~]# ab -n 10000 -c 10 -k http://localhost:1337/
Concurrency Level: 10
Time taken for tests: 1.144 seconds
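The ab excerpts above show only the total time, so the TPS has to be derived from it. A small worked calculation for the two runs above (10,000 requests each); `tps` is just a helper name for this sketch:

```javascript
// Convert ab's "Time taken for tests" into requests per second (TPS).
function tps(requests, seconds) {
  return requests / seconds;
}

const REQUESTS = 10000; // -n 10000 in both runs
const withoutKeepAlive = tps(REQUESTS, 1.512); // ab -n 10000 -c 10
const withKeepAlive = tps(REQUESTS, 1.144); // ab -n 10000 -c 10 -k
const gainPercent = (withKeepAlive / withoutKeepAlive - 1) * 100;

console.log(`without -k: ${Math.round(withoutKeepAlive)} req/s`); // 6614
console.log(`with -k:    ${Math.round(withKeepAlive)} req/s`); // 8741
console.log(`gain:       ${gainPercent.toFixed(1)}%`); // 32.2%
```

By the same formula, the later cluster run in this post (0.794 s) works out to roughly 12,600 requests per second, and the run without the MongoDB write (0.251 s) to roughly 39,800.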

Introduce concurrency

Introducing concurrency has two aspects:
  • raising the concurrency level of the test client
  • raising the concurrency level of the HTTP server
Since the test client runs on the same machine as the HTTP server, the two compete for CPU and the bottleneck can shift between them. So raising the concurrency level blindly won't always improve performance.
Raising Apache Bench's concurrency is easy: just increase the value of “-c”. This increases TPS, but only within a certain range, roughly 1 to 50. Beyond that range the TPS neither increases nor decreases; for example, setting the concurrency level to an absurdly high value gives no more TPS than 50 does.
To raise the concurrency of the Node.js HTTP server, we use Node's built-in ‘cluster’ module, creating several worker processes that share a single port. After some tuning, I found that 4 workers gave the best performance. Unlike with Apache Bench, raising the server's concurrency beyond 4 workers decreased the total TPS, because the extra workers take CPU time that Apache Bench needs.

[root@pu ~]# ab -n 10000 -c 100 -k http://localhost:1337/
Concurrency Level: 100
Time taken for tests: 0.794 seconds
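A minimal sketch of the cluster setup described above, assuming the same ‘Okey’ response (without the MongoDB write, for brevity). The default of 4 workers reflects the tuning result on this particular machine, not a general rule; `startCluster` and `DEFAULT_WORKERS` are hypothetical names. By default on Linux, the primary process accepts connections and hands them to the workers round-robin:

```javascript
const cluster = require('cluster');
const http = require('http');

// 4 workers gave the best total TPS in the tests above, because ab ran on the
// same 8-core machine and needed CPU as well. Tune this for other setups.
const DEFAULT_WORKERS = 4;

function handler(req, res) {
  const body = 'Okey';
  res.writeHead(200, {
    'Content-Type': 'text/plain',
    'Content-Length': Buffer.byteLength(body),
  });
  res.end(body);
}

function startCluster(port, workers = DEFAULT_WORKERS) {
  // Newer Node versions expose cluster.isPrimary; older ones only isMaster.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    // The primary only forks workers and distributes connections to them.
    for (let i = 0; i < workers; i++) {
      cluster.fork();
    }
    cluster.on('exit', (worker, code, signal) => {
      console.log(`worker ${worker.process.pid} exited (${signal || code})`);
    });
  } else {
    // Every worker listens on the same port; the cluster module shares it.
    http.createServer(handler).listen(port);
  }
}

module.exports = { startCluster, handler, DEFAULT_WORKERS };
```

Put `require('./cluster-server').startCluster(1337);` in an entry script such as `main.js` and run `node main.js`: each forked worker re-executes that script, finds it is not the primary, and starts listening.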

It has been argued that several workers sharing the same port is less efficient than four workers each listening on a different port, with a reverse proxy such as Nginx in front to balance the load. I have not tried this approach yet.
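The untried alternative could look roughly like the following hypothetical sketch on the Node side: each worker gets its own port through an environment variable (`WORKER_PORT`, a name invented here) and binds to 127.0.0.1, so the workers are reachable only locally, through the reverse proxy. The Nginx configuration that would list these upstream ports is not shown:

```javascript
const cluster = require('cluster');
const http = require('http');

// Hypothetical upstream ports; the reverse proxy would balance across these.
function upstreamPorts(basePort, count) {
  return Array.from({ length: count }, (_, i) => basePort + i);
}

function startOnSeparatePorts(basePort, count, handler) {
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    // Pass each worker its own port through the environment.
    for (const port of upstreamPorts(basePort, count)) {
      cluster.fork({ WORKER_PORT: String(port) });
    }
  } else {
    const port = Number(process.env.WORKER_PORT);
    if (!Number.isInteger(port)) {
      throw new Error('WORKER_PORT is not set');
    }
    // Local-only bind: clients reach the workers via the proxy.
    http.createServer(handler).listen(port, '127.0.0.1');
  }
}

module.exports = { upstreamPorts, startOnSeparatePorts };
```

For example, `startOnSeparatePorts(1338, 4, handler)` would start workers on ports 1338 to 1341, with the proxy itself listening on the public port.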

Change business logic

I also tried removing the code that writes to MongoDB and reran the test. In this case, the Node.js server reached the same TPS as an Apache httpd server.

[root@pu ~]# ab -n 10000 -c 100 -k http://localhost:1337/
Concurrency Level: 100
Time taken for tests: 0.251 seconds

So for static pages Node.js is not especially powerful; its value lies in the fact that the TPS does not drop sharply once business logic is added.
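The business-logic change from the list of approaches (replying before the database write finishes) can be sketched by contrasting two handler styles. `record` is a hypothetical function returning a Promise, for example a wrapper around the MongoDB insert; the factory names are made up for this sketch:

```javascript
// Original behaviour: reply only after the write has finished.
function makeAwaitingHandler(record) {
  return async (req, res) => {
    try {
      await record({ url: req.url, at: Date.now() });
      res.end('Okey');
    } catch (err) {
      res.statusCode = 500;
      res.end('Error');
    }
  };
}

// Changed behaviour: reply at once and let the write finish in the
// background. Failures are only logged; the client never sees them.
function makeImmediateHandler(record) {
  return (req, res) => {
    res.end('Okey');
    record({ url: req.url, at: Date.now() }).catch((err) => {
      console.error('record failed:', err.message);
    });
  };
}

module.exports = { makeAwaitingHandler, makeImmediateHandler };
```

The trade-off: the immediate handler takes the database latency out of the response time, but the client gets ‘Okey’ even when the write later fails, and under sustained load pending writes can pile up in memory.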

Future test

Stability: the last time I ran this HTTP server, its TPS dropped periodically, probably because of V8's garbage collection; this needs to be investigated in more detail.
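One way to check whether the periodic TPS drops line up with garbage collection is to log GC pauses while the benchmark runs. A sketch using the GC entries from Node's `perf_hooks`; `startGcMonitor` and `summarizePauses` are hypothetical names, and the 5-second reporting interval is arbitrary. Starting Node with V8's `--trace-gc` flag is a simpler, more verbose alternative:

```javascript
const { PerformanceObserver } = require('perf_hooks');

// Summarize a list of GC pause durations (milliseconds).
function summarizePauses(durations) {
  return {
    count: durations.length,
    totalMs: durations.reduce((sum, d) => sum + d, 0),
    maxMs: durations.reduce((max, d) => Math.max(max, d), 0),
  };
}

// Collect GC pauses and print a summary every `intervalMs`, so periodic
// TPS drops can be compared with GC activity. Returns a stop function.
function startGcMonitor(intervalMs = 5000) {
  let pauses = [];
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      pauses.push(entry.duration);
    }
  });
  observer.observe({ entryTypes: ['gc'] });

  const timer = setInterval(() => {
    const s = summarizePauses(pauses);
    console.log(`GC: ${s.count} pauses, total ${s.totalMs.toFixed(1)} ms, max ${s.maxMs.toFixed(1)} ms`);
    pauses = [];
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for reporting

  return () => {
    clearInterval(timer);
    observer.disconnect();
  };
}

module.exports = { summarizePauses, startGcMonitor };
```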

Update