I have run both simple and very complicated experiments in which I take serial code and put it into one thread (or several threads). The execution time roughly doubles after that. I would like to know the scientific explanation. As a result, two processors (cores) are not enough to speed up serial code, because moving the serial code into threads already costs a factor of two, and in some cases there is additional time for synchronization (by events).
In my experiments I used the QuickWin library and Digital Fortran to create the threads.
Or maybe I am mistaken?
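
For illustration, the kind of comparison described above can be sketched as follows. This is only a minimal Python sketch, not the original Digital Fortran/QuickWin code; the names `work`, `run_in_thread`, and `timed`, and the problem size, are hypothetical and chosen just to show the measurement: the same serial loop is timed once as a direct call and once inside a single worker thread.

```python
import threading
import time


def work(n):
    """Purely serial CPU-bound loop standing in for the serial code."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_thread(n):
    """Run the same serial work inside one worker thread and wait for it."""
    result = {}

    def target():
        result["value"] = work(n)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()  # the main thread blocks until the worker has finished
    return result["value"]


def timed(func, n):
    """Return (result, elapsed wall-clock seconds) for func(n)."""
    start = time.perf_counter()
    value = func(n)
    return value, time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # hypothetical problem size, not taken from the experiments
    direct_value, direct_time = timed(work, n)
    thread_value, thread_time = timed(run_in_thread, n)
    assert direct_value == thread_value  # both paths compute the same sum
    print(f"direct call: {direct_time:.3f} s")
    print(f"one thread:  {thread_time:.3f} s")
```

Both calls do identical work, so apart from measurement noise, any difference between the two timings comes from creating, scheduling, and joining the thread.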