Software Development

Node.js vs C++

While waiting for paid projects, I am learning new things and keeping my fingers moving. Some time ago I learned Node.js from the book Node.js 8 the Right Way.

In chapter 5 you learn how to generate a bulk import file (JSON format) for Elasticsearch from Project Gutenberg data. You do this by parsing each of the more than 58,000 RDF files that you can download from Project Gutenberg, extracting the Gutenberg ID, the book’s title, the list of authors and the list of subjects, and writing out an index line followed by the book info, each as a JSON line. This takes quite a while to process, and I wanted to become fluent in writing test-driven C++ code with JetBrains Rider and to try out a new C++ XML parser, in this case CMarkup. The Node.js version uses Cheerio, which finds the elements you are looking for with CSS selectors. At some point in the future I will implement a SAX-based C++ XML parser and a CSS selector solution.

Including my tests, the C++ source code has 608 lines in total (not counting the googletest and CMarkup sources):

% wc $(find inc src tests | egrep ".cpp$|.h$" | egrep -v "googletest|Mark")
      30      63     633 inc/Book.h
      73     323    2604 inc/KKKLogger.h
      40      67     668 inc/GutenbergParser.h
     118     281    3762 src/GutenbergParser.cpp
      60     163    1490 src/Book.cpp
      88     263    2593 src/main.cpp
      88     203    2544 tests/GutenbergParser_Tests.cpp
      24     125    1231 tests/TestFilePaths.h
      87     190    2865 tests/Book_Tests.cpp
     608    1678   18390 total

The Node.js version has only 33 lines in total:

% wc rdf-to-bulk.js lib/parse-rdf.js
      20      34     440 rdf-to-bulk.js
      13      27     497 lib/parse-rdf.js
      33      61     937 total

The C++ version has some extra functionality (selective logging, different output options), but that accounts for only about 100 lines, and the tests take about 200 lines, so in effect you have:

  • 300 lines for C++
  • 33 lines for Node.js

With Node.js 8, the C++ version is about 2.3 times faster in wall-clock time (61 s vs. 142 s) and needs less than a quarter of the user CPU time (27 s vs. 119 s). Runtime in my bash environment with Node.js 8:

$ node --version
$ time node rdf-to-bulk.js /Users/kubi/NodeJsProgs/data/cache/epub >bulk_node

real	2m21.932s
user	1m58.511s
sys	0m18.657s

Runtime in C++:

% time ./cmake-build-debug/parseGutenberg -bulk /Users/kubi/NodeJsProgs/data/cache/epub > bulk_cpp
./cmake-build-debug/parseGutenberg -bulk  > bulk_cpp  27,24s user 9,69s system 60% cpu 1:00,97 total

Later I installed Node.js fresh from Homebrew (after I switched my standard shell to csh, node was no longer available there). That installed Node.js 14.5, which is much faster:

% node
Welcome to Node.js v14.5.0.
% time node rdf-to-bulk.js /Users/kubi/NodeJsProgs/data/cache/epub >bulk_node.V14.5.0
node rdf-to-bulk.js /Users/kubi/NodeJsProgs/data/cache/epub >   94,72s user 21,25s system 116% cpu 1:39,59 total

The runtime varied by ±3 seconds when I ran the programs several times over the last two hours. I have MySQL, Jenkins, Docker and several other servers running on my MacBook Pro all the time, but when I watch top, only MySQL pops up from time to time with a CPU usage of 0.2% (the others may hide under kernel_task at around 2.6%), and the load average stays between 1.3 and 1.79.


With the newest Node.js environment, Node.js takes only about 60% longer than C++ (1:40 min vs. 1:01 min), while the C++ version needs about 9 times as many lines of code. That comparison is not entirely fair, because the CMarkup implementation cannot be compared directly with a CSS selector approach. More evaluation is needed: for example, I did not switch on the optimization flags for the C++ build, and I would have to use a CSS selector approach in C++ as well and reduce the code to a minimal version without logs and error handling, but then the line-count comparison becomes unrealistic. The code in the Node.js book has this minimal style so that it can concentrate on the important learning units. I will add the C++ project to my kubimtk GitHub account in the near future.
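As a quick worked check, the ratios discussed here follow from the measured wall-clock ("real"/"total") times, the user CPU times, and the line counts above (rounded):

```latex
\begin{aligned}
\frac{t_{\text{Node 8}}}{t_{\text{C++}}} &= \frac{141.9\,\text{s}}{61.0\,\text{s}} \approx 2.3 \\
\frac{t_{\text{Node 14.5}}}{t_{\text{C++}}} &= \frac{99.6\,\text{s}}{61.0\,\text{s}} \approx 1.63 \\
\frac{\text{user}_{\text{Node 8}}}{\text{user}_{\text{C++}}} &= \frac{118.5\,\text{s}}{27.2\,\text{s}} \approx 4.4 \\
\frac{\text{lines}_{\text{C++}}}{\text{lines}_{\text{Node}}} &\approx \frac{608 - 100 - 200}{33} = \frac{308}{33} \approx 9.3
\end{aligned}
```

The second line is the "about 60%" figure: Node.js 14.5 needs roughly 63% more wall-clock time than the unoptimized C++ debug build.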


My current situation

I am available for a freelance software development project.

I will update my situation here when it changes.