Categories
Software Development

ParseGutenberg C++ code now available on GitHub

In my "Node.js vs C++" blog post I promised to publish the source code on GitHub. It is now available at https://github.com/kubimtk/parseGutenberg

Categories
Software Development

Lucky 3D Buddha in Swift

Today my first Apple app became available in the App Store. It is difficult to find within the first 24 hours.

The colors and the material can be changed, and the Buddha can be moved around to whatever perspective you like.
The software was written in Swift (version 5 at the time). Two years ago I wrote a first, simple version in RubyMotion, but this time I wanted to improve my Swift skills and bring the app to the App Store as a simple skill reference. I also changed the movement from automatic to my own gesture-based movement control, and added material and color controls and a user preset.

The app is free at the moment, and it contains no means of earning money.

Categories
Software Development

Node.js vs C++

While waiting for paid projects, I am learning new things and keeping my fingers moving. Some time ago I learned Node.js from the book "Node.js 8 the Right Way".

In chapter 5 you learn how to generate a bulk import file (JSON format) for Elasticsearch from Project Gutenberg data. You do this by parsing each of the more than 58,000 RDF files you can download from http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.bz2, extracting the Gutenberg ID, the book's title, the list of authors and the list of subjects, and writing out an index line followed by the book info as JSON lines.

This takes quite a while to process, and I wanted to become fluent in writing test-driven C++ code using JetBrains Rider and to try out a new C++ parser, in this case CMarkup. The Node.js version uses Cheerio, which finds the elements you are looking for with CSS selectors. Sometime in the future I will implement a SAX-based C++ XML parser and a CSS selector solution.

Including my tests, the C++ source code has 608 lines in total:

% wc $(find inc src tests | egrep ".cpp$|.h$" | egrep -v "googletest|Mark")
      30      63     633 inc/Book.h
      73     323    2604 inc/KKKLogger.h
      40      67     668 inc/GutenbergParser.h
     118     281    3762 src/GutenbergParser.cpp
      60     163    1490 src/Book.cpp
      88     263    2593 src/main.cpp
      88     203    2544 tests/GutenbergParser_Tests.cpp
      24     125    1231 tests/TestFilePaths.h
      87     190    2865 tests/Book_Tests.cpp
     608    1678   18390 total

In Node.js there are only 33 lines in total:

% wc rdf-to-bulk.js lib/parse-rdf.js
      20      34     440 rdf-to-bulk.js
      13      27     497 lib/parse-rdf.js
      33      61     937 total

The C++ version has some additional functionality (selective logging, different output options), but that accounts for only about 100 lines, while the tests take about 200 lines. So in effect you have:

  • 300 lines for C++
  • 33 lines for Node.js
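Spelled out, the estimate uses the wc counts above: the three files under tests/ make up the test code, and roughly 100 lines are attributed to the extra logging and output options (an estimate, not a measured count):

```latex
\begin{aligned}
\text{tests} &= 88 + 24 + 87 = 199 \approx 200\\
\text{C++ core} &\approx 608 - 200 - 100 \approx 300\\
\frac{\text{C++ core}}{\text{Node.js}} &\approx \frac{300}{33} \approx 9
\end{aligned}
```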

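The C++ version is clearly faster than Node.js 8: it uses only about a quarter of the user CPU time (27 s vs. 119 s) and finishes about 2.3 times sooner (1:01 vs. 2:22 wall-clock time). Runtime of the Node.js version in my bash environment with Node.js 8: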

$ node --version
v8.0.0
$ time node rdf-to-bulk.js /Users/kubi/NodeJsProgs/data/cache/epub >bulk_node

real	2m21.932s
user	1m58.511s
sys	0m18.657s

Runtime in C++:

time ./cmake-build-debug/parseGutenberg -bulk /Users/kubi/NodeJsProgs/data/cache/epub > bulk_cpp
./cmake-build-debug/parseGutenberg -bulk  > bulk_cpp  27,24s user 9,69s system 60% cpu 1:00,97 total

Later I installed Node.js fresh from Homebrew (because after I made csh my default shell, node was no longer available there). That installed Node.js 14.5, which is much faster:

% node
Welcome to Node.js v14.5.0.
% time node rdf-to-bulk.js /Users/kubi/NodeJsProgs/data/cache/epub >bulk_node.V14.5.0
node rdf-to-bulk.js /Users/kubi/NodeJsProgs/data/cache/epub >   94,72s user 21,25s system 116% cpu 1:39,59 total
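For reference, the measurements above can be expressed as ratios of Node.js time to C++ time (wall clock taken from `real`/`total`, CPU from `user`, rounded):

```latex
\begin{aligned}
\text{Node.js 8 / C++ (wall clock)} &= 141.9\,\text{s} / 61.0\,\text{s} \approx 2.3\\
\text{Node.js 8 / C++ (user CPU)} &= 118.5\,\text{s} / 27.2\,\text{s} \approx 4.4\\
\text{Node.js 14.5 / C++ (wall clock)} &= 99.6\,\text{s} / 61.0\,\text{s} \approx 1.6\\
\text{Node.js 14.5 / C++ (user CPU)} &= 94.7\,\text{s} / 27.2\,\text{s} \approx 3.5
\end{aligned}
```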

The runtimes varied by about ±3 seconds when I ran the programs several times within the last two hours. I have MySQL, Jenkins, Docker and several other servers running on my MacBook Pro all the time, but in top only MySQL shows up from time to time, with a CPU usage of 0.2% (the others may hide under kernel_task at around 2.6%), and the load average was between 1.3 and 1.79.

Conclusion

With the newest Node.js version, the Node.js run takes only about 60% longer than the C++ run (1:40 min vs. 1:01 min), while the C++ version needs about 9 times more lines of code. The line counts are not really comparable, though, because the CMarkup-based implementation differs from the CSS selector approach. More evaluation is needed. For example, I did not enable compiler optimization for the C++ build (the binary was built in cmake-build-debug), and I would have to use a CSS selector approach in C++ as well. I could also reduce the C++ code to a minimalistic version without logging and error handling, but then the line-count comparison would become unrealistic. The code in the Node.js book is written in this minimalistic style to concentrate on the important learning units. I will add the C++ project to my kubimtk GitHub account in the near future.

Categories
General

My current situation

I am available for a freelance software development project.

I will update my situation here when it changes.

Categories
General

Why English?

Anyone who is interested in professional software development will also understand English. It is therefore not worthwhile for me to address a German-speaking audience in German as well.

Categories
General

Hello world!

Welcome to my blog!