We recently ran into a problem where our node.js process was running out of memory very rapidly. We tracked it down to a couple of debug log lines that, when removed, reduced the memory problem considerably. Because all I/O in node.js is asynchronous, we were generating log messages faster than they could be written to the file system. They were being buffered, piling up until the process ran out of memory.
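
To make the failure mode concrete, here is a minimal sketch of it (assumed code, not our actual application): each fs.write call returns immediately and keeps its data in memory until the disk catches up, so a tight loop can queue data much faster than it gets flushed.

var fs = require('fs');

fs.open('busy.log', 'a', function(err, fd) {
  if (err) throw err;
  for (var i = 0; i < 1000000; i++) {
    // nothing waits for the previous write to finish, so every one of
    // these lines sits in memory until the disk catches up
    fs.write(fd, 'a debug log line\n', null, 'utf8', function() {});
  }
});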

In an attempt to ameliorate this, I wanted to see if changing log4js's fileAppender from using fs.write to using a WriteStream would make any difference. My test program looked like this:

var log4js = require('./lib/log4js')
  , logger
  , usage
  , i;

log4js.configure(
    {
        appenders: [
            {
                category: "memory-test"
              , type: "file"
              , filename: "memory-test.log"
            },
            {
                type: "console"
              , category: "memory-usage"
            },
            {
                type: "file"
              , filename: "memory-usage.log"
              , category: "memory-usage"
              , layout: {
                  type: "messagePassThrough"
              }
            }
        ]
    }
);
logger = log4js.getLogger("memory-test");
usage = log4js.getLogger("memory-usage");

for (i=0; i < 1000000; i++) {
    if ( (i % 5000) === 0) {
        usage.info("%d %d", i, process.memoryUsage().rss);
    }
    logger.info("Doing something.");
}

It writes one million log lines to a file, and every 5,000 lines it records the memory usage to another file for processing later. I ran two versions of log4js: one with the original fileAppender (using fs.write), and one with a rewritten fileAppender that used a WriteStream. The graph below shows the results.

[Graph: memory usage over time, fs.write vs. WriteStream]

It is pretty obvious that WriteStream is the better implementation for this application, and the unstable branch of log4js now contains the rewritten fileAppender. I hope to release this within the next couple of days.
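
For the curious, here is a simplified sketch of the WriteStream approach. This is not the actual log4js appender, just the shape of it (the fileAppender name and formattedMessage parameter are illustrative): open one stream per log file and write each formatted event to it, letting the stream manage its own buffering.

var fs = require('fs');

// illustrative only: one appender per file, sharing a single open stream
function fileAppender(filename) {
  var stream = fs.createWriteStream(filename, { flags: 'a', encoding: 'utf8' });
  return function(formattedMessage) {
    // write returns false when the stream's buffer is full, which gives
    // callers a chance to throttle
    stream.write(formattedMessage + '\n');
  };
}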

I came across a small problem with unit testing functions which take a callback. If you write your test like this:

"should callback with error on connection failed": function () {
  thing.doSomething(function(result) {
    Y.Assert.areEqual("error", result);
  });
},

That test will pass even when the callback is never called. Your code could silently swallow errors, but your test would always pass. The assert never gets called.

If you write the test like this instead:

"should callback with error on connection failed": function () {
  var result;
  thing.doSomething(function(res) {
    result = res;
  });
  Y.Assert.areEqual("error", result);
},

This means that your assert will always get called. You may run into problems if the doSomething function does something asynchronous, but that's where dependency injection and mocking can help you.

Dependency injection is a handy technique for writing testable code. If you’ve used Spring in the Java world, you’re probably familiar with it. Put simply, it means that code should not be responsible for instantiating the things it depends on – these dependencies should be passed in by the calling code. That means you can pass in mock versions for testing.

In node.js land, we use require to bring in external modules. For example:

var http = require('http');
exports.twitterData = function(callback) {
  http.createClient(... blah...);
}

This module is hard to test without needing internet access, which may not be available if your tests are running on a build server behind a corporate firewall. Besides, you don’t want your tests calling twitter all the time.

What you want to be able to do is replace http with a mock version, so that you can verify that your module makes all the right calls without actually needing access to the internet. There are a couple of ways to do this, like messing about with require.paths to bring in mock modules. Here’s the way we decided to do it.

First we write our modules in this style:

module.exports = function(http) {
  var http = http || require('http');
  // private functions and variables go here...

  //return the public functions
  return {
    twitterData: function(callback) {
     http.createClient(...etc...);
    }
  };
};

In normal use, we’d require our twitter module like this:

var twitter = require('twitter')();

In our tests, we’d require it like this:

var mockHttp = { createClient: function() { assert(something); } };
var twitter = require('twitter')(mockHttp);
//do some tests.
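
The mock can also record the calls it receives, so the test can verify that the module asked for the right things. A hypothetical sketch in the same style (the call-recording and assertion are made up for illustration):

var calls = [];
var mockHttp = {
  createClient: function(port, host) {
    // record each call so the test can assert on it afterwards
    calls.push({ port: port, host: host });
  }
};
var twitter = require('twitter')(mockHttp);
twitter.twitterData(function() {});
Y.Assert.areEqual(1, calls.length);

Depending on what twitterData does with the value returned from createClient, the mock may need to hand back a stub client object as well.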

It seems to be working fine for us so far, and having that first line of your module tell you explicitly what the dependencies are is quite handy as well. If you want some real-world examples of this, take a look at the log4js-node source.

One of our happy team of code monkeys wrote up his heroic efforts on getting node.js to build on Solaris x86.