Last update October 31, 2014

 

Welcome !

Hope you will enjoy my parallel libraries and my other libraries. Your comments , suggestions and feedback are welcome.

You can also download my libraries from here:

https://sites.google.com/site/aminer68/


Sincerely,
Amine Moulay Ramdane
aminer@videotron.ca

 Modules/Authors

 

Files




A web page that i wrote on how to analyze parallel applications with Petri Nets:
Petri Nets.

A web page that i wrote about a Jackson network problem:
Jackson network

You need the following 32 bit DLL for my libraries: 
msvcr110.dll

You need the following 64 bit DLL for my libraries: 
msvcr100.dll

 

Multiple linear regression 1.02  


Authors: Amine Moulay Ramdane


Click here to download the zip file: Zip (for FreePascal and Lazarus and Delphi 7 to 2010)


Description:


Description: Multiple linear regression that uses SIMD SSE2 instructions and that implements the following mathematical theorem:

If A is an m x n rank n matrix, then the least-squares solutions to the system A*vector(X) = vector(C) are the solutions to the system:

A*vector(X) = A*inverse(transpose(A)*A))*transpose(A)*vector(C).

This system has the following unique solution:

vector(X) = inverse(transpose(A)*A)*transpose(A)*vector(C)

I have also included a Matrix library called LinMath that uses SIMD SSE2 instructions and that is multithreaded, LinMath was derived fom mrmath Matrix library by Rabatscher Michael Matrix library and modified to become compatible with both FreePascal and Delphi.
LinMath is offered under the licence agreement described on:

http://www.mrsoft.org/


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux...

Required FPC switches: -O3 -Sd

-Sd for delphi mode....

Required Delphi switches:  -$H+

PArchiver 2.23  

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip 


Description: Parallel archiver using Parallel LZO, Parallel LZ4, Parallel Zlib, Parallel Bzip and Parallel LZMA compression algorithms.

Supported features:

- Opens and creates archives using my Parallel LZ4 or Parllel LZO or Parallel Zlib or Parllel Bzip or Paralle LZMA with different compression levels

- Wide range of Parallel compression algorithms: Parallel LZ4, Parallel LZO, Parallel ZLib, Parallel BZip and  Parallel 

LZMA with different compression levels

- Compiles into exe - no dll/ocx required.

- 64 bit supports - lets you create archive files over 4 GB ,
 
  supports archives up  to 2^63 bytes, compresses and 
  decompresses files up to 2^63 bytes.

- Now my Parallel Zlib gives 5% better performance than Pigz.

- Supports memory and file streams , 
adds compressed data 
  directly from streams and extracts archived files to streams 
  without creating temp files.

- Save/Load the archive from stream

- Supports in-memory archives

- You can use it as a hashtable from the hardisk or from the memory

- Fault tolerant to power failures etc..

-
Creates encrypted archives using Parallel AES encryption with 256 bit keys.

- Fastest compression levels are extremely fast

- Good balanced compression levels provide  both good compression ratio and high speed

- Maximum compression levels provide much better compression ratio than Zip, RAR and BZIP and the same as  7Zip with 8 megabytes dictionary.

- It supports both compression and decompression rate indicator

- You can test the integrity of your archive

- Easy object programming interface

- Full source codes available.

- Platform: Win32 , Win64


Please look at test_pzlib.pas , test_plzo.pas , test_plz4.pas ,  test_pbzip.pas and test_plzma.pas demos inside the zip file, compile and execute them.. -

I have tried to do a worst scalability prediction with an HDD drives and a Q6600 quad core for my parallel archiver  with Parallel LZMA, and i think it's good..

There is four things in my Parallel LZMA algorithm:

First we have to copy serially a stream from the hardisk to the memory and this will take in average 0.2  second and in the compression method we have to copy a stream to the memory and this will take in average 0.05 second and in the compression method you have to compress a stream to another stream in memory and this will take in average 13 seconds seconds and in the compression method you have to copy a compressed stream to a hardisk file and this will take in average 0.01 second.

So we have the serial part that is: 0.2 second + 0.01 second + 0.05 second = 0.26 second = 0.02%
and the parallel part will that is: 13 seconds = 0.98%

So the worst case scalability scenario using an HDD and using the Amdahl equation will give us: 1/0.02% + (0.98%/N) = 50X scalability  (N: is the number of cores)

So this will scale up to: 50X , so as you have noticed with an HDD drive this is a good scalability.

So what can we do to scale more parallel archiver using parallel LZMA ?

You can for example use a RAID 10 with a base configuration of 4 HDD drives, so this will cut in 4 the 0.2 second and the 0.01 second , so  this will give a scalability of 124X and this is better.. but to speed more the things we can use SSD drives that are 2X time faster than a HDD drives and  with  a RAID 10 configuration and this will give:  434X worst case scalability.

So as you have noticed if you are using only an HDD with a multicore system you will get a 50X scalability with my parallel archiver using parallel LZMA, and if you use RAID 10 with SSD drives you will get 434X scalability.

When you want to delete files inside the archive you have to call the DeleteFiles() method , the DeleteFiles() method will not delete the files, it will mark the files as deleted , when you want to delete completly the files , you have to call the DeletedItems() method to see how many files are marked deleted and after that you use the Clean() method to delete completly the files from the archive. I have implemented it like that, cause it's better in my opinion..

And my parallel archiver uses a hashtable to store the file names and there corresponding file positions so that you can direct access to files inside the archive when decompressing, and deleting etc. so it's very fast.

Please look at the test_pzlib.pas, test_plzo.pas, test_plz4.pas , test_pbzip.pas and test_plzma.pas demos inside the zip file to see how to use my Parallel archiver.

And please don't use directly the ParalleZlib.pas that i have included inside the Parallel archiver zip file, cause i have modified it to work correclty with my Parallel archiver.

If you want to use my ParallelZlib library just download it from my website, or download my other Parallel compression library.

You can now use my Parallel archiver as a hashtable from the hardisk with 0(1) access, you can for example
stream your database row with my ParallelVarFiler into a memory stream or into a string, and store it with my Parallel archiver into an  archive, and after that your can access your rows into the hardisk as a hashtable with O(1) access, you can use it like that as a database if you  have for example id keys that you want to map to database rows, that will be a good idea to
use my Parallel archiver as a hashtable.

Question:

What's your newest ideas behind your parallel archiver ?

Answer:

Of course my Parallel Archiver supports Parallel compression etc. but my newest ideas behind my Parallel Archiver are the following:


I have played with Winzip and 7Zip , but if you want to give some files to extract or to test there integrity, they both (Winzip and 7Zip) will use sequential access and that's bad i think, so i have decided to implement a O(1) access that is very fast for extraction and and for testing the integrity etc. into my Parallel Archiver and for that i have used an in-memory hashtable that maintains the files names and there correponding file positions , and my second idea is that my Parallel Archiver is fault tolerant to power failures and also if your hardisk is full and you get file corruption etc. so my Parallel Archiver is fault tolerant to this kind of problems , 7Zip and Winzip i think are not fault tolerant to those kind of problems. 

I have just played with 7Zip , and i have compressed 3 files into the archive and after than i have opened the archive with an editor and i have deleted some bytes and i have saved the file and after that when i have tried to open the archive, 7zip responded that the file is corrupted, so 7Zip is not fault tolerant, i think that with WinZip it's the same, but i have done the same test with  my Parallel archiver, and it's recovering from the file damage, so it's fault tolerant to this kind of damages, such as power failures and when also the disk is full and you get a file corruption etc. I have implemented this kind of  fault tolerancy into my Parallel archiver.

I have updated my Parallel archiver and i have added the Update() method, it's overloaded now in the first version you pass a key name and a TStream, and in the second version you pass a key name and a filename. Please look at the test_pzlib.pas demo inside the zip file to see how to use those methods.

So now you have all the methods to use my Parallel archiver as a Hashtable from the hardisk with direct access to the compressed and/or encrypted data with O(1) complexity and very fast acces to the data , the DeleteFiles() has a O(1) complexity the ExtractFiles() and Extract() have also O(1) complexity and  GetInfo() is also O(1) and of course the AddFiles() is also O(1), the Test() method is also O(1). So now it's extremely fast.

When you want to do solid compression with my Parallel archiver using  Bzip , you can use the same method as is using Tar , you can first archive your file with the compression level 0 and after that compress all your archive file using Bzip, and when you want to encrypt your data with Parallel AES encryption just give a password by setting the password property and when you don't want to encrypt just set the password property to a null string or don't set the password property , that's all.

Parallel archiver supports the storing and restoring of the following file attributes:

Hidden, Archive, System, and Read only attributes.

To store and restore them just set the AddAttributes property like this:

pzr.AddAttributes:=[ffArchive,ffReadOnly,ffHidden,ffSystem];

I have added the in-memory archives support, cause this way Parallel archiver will be much more faster than disk archives, and you will be able to lower much more the response time and to lower the load on your server.

If you want to use an in-memory archive,  pass an empty string to the file name in the constructor, like this:

pzr :=TPLZ4Archiver.Create('',1000,4);

And if you want to read your in-memory archive , read from the Stream property that is exposed(a TStream) like this:

pzr.stream.position:=0;
A_Memory_Stream.copyfrom(pzr.stream,pzr.stream.size)

You can also load your archive from a file or memory stream just by assigning your file or memory stream to the Stream property (a TStream).


I have overloaded the GetKeys() method , now you can use wildcards, you can pass the wildcard in the first argument and the TStringList in the second argument like this:  pzr.getkeys('*.pas',st);

and after that call the  ExtractFiles() method and pass it the TStringList.

As you have noticed,  the programming interface of  my Parallel archiver is very easy to use.

And read this:


"We're a video sharing site located in China. We rewrote the PHP memcached client extension by replacing zlib with QuickLZ. Then our server loads were dramatically reduced by up to 50%, the page response time was also boosted. Thanks for your great work!

Jiang Hong"

http://www.quicklz.com/testimonials.html

http://www.quicklz.com/

So as you have noticed , like QuickLZ or Qpress, i have implemented Parallel archiver to be very fast also.

By using my Parallel Zlib or my Parallel LZ4 or my Parallel LZO  compression algorithms my Parallel archiver will  be very very fast and as i have wrote in my webpage:


"So now you have all the methods to use my Parallel archiver as a Hashtable from the hardisk with direct access to the compressed and/or encrypted data with O(1) very fast acces to the data , the DeleteFiles() has a O(1) complexity the ExtractFiles() and Extract() have also O(1) complexity and  GetInfo() is also O(1) and of course the AddFiles() is also O(1), the Test() method is also O(1). So now it's extremely fast. "

You can even use  my Parallel archiver as a hash table database from the Harddisk to lower more the load on your server (from internet or intranet) and boost the response time.....

I have used solid compression like with the  tar.lzma format and i have found that my Parallel archiver, with maximum level compression that is clLZMAMax, compresses to the same size as 7Zip with maximum level compression and with a dictionary size of 8 MB and it compresses 13% better than WinRar with maximum level compression and it is muh  better than WinZip on compression ratio .

How to use solid compression with my Parallel archiver ?

Just archive your files with clLZMANone and  after that compress your archive with clLZMAMax, Parallel archiver will then compress to the same size as 7Zip with maximum level compression and with a dictionary size of 8 MB and it will compress 13% better than WinRar with maximum level compression and it will compress muh better than WinZip with maximum level compression .

I have updated my Parallel archiver to a new version  and i have decided to include Parallel LZ4 compression  algorithm (one of the fastest in the world) into my Parallel archiver,  so to compress bigger data such us Terabytes data you can use my Parallel LZO or my Parallel LZ4 compression algorithms with my Parallel archiver, i have also added the high compression mode to Parallel LZ4 compression algorithm, now for a fast mode  use clLZ4Fast and for the high compression mode use clLZ4Max. The Parallel LZ4 high compression mode is interresting also, it compresses much better than LZO and it is very very fast on decompression, faster than Parallel LZO. I have included a test_plz4.pas demo inside my Parallel archiver zip file to show you how to use Parallel LZ4 algorithm with  my Parallel archiver.

Here is the LZ4 website if you want to read about it:

http://code.google.com/p/lz4/


I have downloaded also the IHCA compression  algorithm from the following website:

http://objectegypt.com/

And i have wrote a Parallel IHCA and begin testing it against my Parallel LZO and my Parallel LZ4 , they say on the IHCA website that it  has the same performance as the LZO algorithm , but i have noticed on my benchmarks that Parallel IHCA(that i wrote) is much more slower than my Parallel LZO and my Parallel LZ4 , so i think the IHCA compressoin algorithm is a poor quality software that you must avoid, so please use my Parallel archiver and Parallel compression library cause with my Parallel LZO and my Parallel LZ4 they are now one of the fastest in the world.

I have also downloaded the following QuickLZ algorithm from:

http://www.quicklz.com/

and i have wrote a Parallel QuickLZ and i have tested it against  my Parallel LZO and Parallel LZ4 , and i have noticed that Parallel QuickLZ is slower than my Parallel LZ4 algorithm, other than that  with  QuickLZ  you have to pay for a commercial license , but with  my Parallel archiver and my Parallel compression library you have to pay 0$ for a commercial license.

My Parallel archiver was updated,  i have ported the Parallel LZ4 compression algorithm(one of the fastest in the world)  to the Windows 64 bit system, now Parallel LZ4 compression algorithm is working perfectly with Windows 32 bit and 64 bit, if you want to use Parallel LZ4 with Windows 64 bit just copy the lz4_2.dll inside the LZ4_64 directory (that you find inside the zip file) to your
current directory or to the c:\windows\system32 directory, and if you want to use the Parallel LZ4 with Windows 32 bit  use the lz4_2.dll inside the LZ4_32 directory.

Here is more information about my Parallel archiver:

Parallel LZO supports Windows 32 bit and 64 bit

Parallel Zlib supports Windows 32 bit and 64 bit

Parallel LZ4 supports Windows 32 bit and 64 bit

Parallel Bzip is Windows 32 bit only

Parallel LZMA is Windows 32 bit only

But even if Parallel LZMA and Parallel Bzip are windows 32 bit only , my Parallel archiver supports Terabytes files and your archive can grow to Terabytes  size even with 32 bit windows executables, and that's good.

And Look also at the prices of the XCEED products:

XCEED Streaming compression library:

http://xceed.com/Streaming_ActiveX_Intro.html

and the XCEED Zip compression library:

http://xceed.com/Zip_ActiveX_Intro.html

http://xceed.com/pages/TopMenu/Products/ProductSearch.aspx?Lang=EN-CA


I don't think the XCEED products supports parallel compression as does my Parallel archiver
and my Parallel compression library..

And just look also at the Easy compression library for example, if you have noticed also it's not a parallel compression library.

http://www.componentace.com/ecl_features.htm

And look at its pricing:

http://www.componentace.com/order/order_product.php?id=4


My Parallel archiver and parallel compression library costs you 0$ and they are parallel compression libraries, and they are very fast and very easy to use, and they  supports Parallel LZ , Parallel LZ4, Parallel LZO, Parallel Zlib,  Parallel Bzip and Parallel LZMA and they come with the source codes and much more...

Hope you will enjoy my Parallel archiver.

Here is the public methods that i have implemented:

Constructor Create(file1:string,size:integer;nbrprocs:integer);
- Creates a new TPZArchiver ready to use, size is the hashtable size for the index(Key file names and the corresponding file position ,and file1 is the file archive, nbrprocs is the number of cores you have specify to run Zlib , LZ4, LZO , Bzip and LZMA  in parallel.

Destructor Destroy;
- Destroys the TPZArchiver object and cleans up.

function AddFiles;
- Adds the files to the archive.

function AddStream;
-Adds the stream to the archive.

function DeleteFiles;
- Deletes the TStringList content from the archive.

function Erase;
   - Erases the data inside the archive and inside  the hashtable.

function Update;
- Updates the file or the stream inside the archive

function ExtractFiles;
- Extracts the TStringList content from the archive.

function ExtractAll;
- Extracts all the files from the archive.

function Extract;
-Extracts the file to the stream.

function Test;
- Tests the integrity of the files inside the archive.

function GetInfo;
- Gets the file info that is returned in a TZSearchRec record.

function ClearFile;
- Deletes all contents of the archive.

function Clean:boolean
- Cleans the marked deleted items from the file.

function DeletedItems:integer
- Returns the number of items marked deleted.

function LoadIndex:boolean
- Loads the the file names keys and there correponding file positions values from the file passed to the constructor into the hashtable.

function Exists(Name : String) : Boolean;
- Returns True if a file Name exists

procedure GetKeys(Strings : Tstrings);
- Fills up a TStrings descendant with all the file names.

function Count : Integer;
- Returns the number of files inside the archive.


PUBLIC PROPERTIES:

Indicator : boolean
- To show the compression and decompression indicator.
CompressionLevel;
- Sets and reads the compression level.
Overwrite:boolean
- To update and overwrite the file without asking .
Freshen: boolean
-Adds newer files to the archiver and extract newer files from the archive.
AddRecurse: boolean
- AddFiles() method will recurse on subdirectories. 
Stream:boolean
  - The archive is exposed as a TStream, use it for in-memory archive or disk archive.
AddAttributes: TAttrOptions
- FindFile attributes for the AddFiles() method, look inside FindFile component.

Language: FPC Pascal v2.2.0+ and Lazarus  / Delphi 7 to 2007: http://www.freepascal.org/

Operating Systems: Win32 and Win64 

And inside defines.inc you can use the following defines:

{$DEFINE CPU32} and {$DEFINE Win32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Win64} for 64 bit systems

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+ -DDelphi


Lightweight SemaCondvar version 1.21

Author: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2010)

Click here to download the zip file: Zip (for Delphi XE versions)

Description:

SemaCondvar and SemaMonitor are new and portable synchronization objects , SemaCondvar combines all the charateristics of a semaphore and a condition variable and SemaMonitor combines all charateristics of a semaphore and an eventcount , they only use an event object and and a very fast and very efficient and portable FIFO fair Lock , so they are fast and they are FIFO fair.

Now you can pass the SemaCondvar's initialcount  and SemaCondvar's MaximumCount to the construtor, it's like the Semaphore`s InitialCount and the Semaphore's MaximumCount.

Like this:

t:=TSemaMonitor.create(true,0,4);

When you pass True in the first parameter of the constructor the signal(s) will not be lost even if there is no waiting threads, if it's False the signal(s) will be lost if there is no waiting thread.

As you will notice i have used a scalable and efficient and FIFO fair Lock inside SemaCondVar.pas, i have made ,  but you will notice that this Lock is now portable to the x86 systems and to Linux(x86) , Mac OSX(x86) and Windows(x86), but if you want to port it to other systems than the x86, please change inside SemaCondVar.pas the TALOCK by the TCriticalSection of the SyncObjs unit and it will be portable to other systems than the x86. That's all. So enjoy SemaCondvar and SemaMonitor.

Here is the methods that i have implemented :

TSemaCondvar = class
   public

constructor
Create(m1:TCriticalSection;state1:boolean=false;InitialCount1:longword=0;MaximumCount1:longword=INFINITE);
     destructor Destroy; override;
     function  wait(mstime:longword=INFINITE):boolean;
     procedure signal();overload;
     procedure signal_all();
     procedure signal(nbr:integer);overload;
     function WaitersBlocked:integer;
   end;

TSemaMonitor = class
   public
     constructor
Create(state1:boolean=false;InitialCount1:longword=0;MaximumCount1:longword=INFINITE);
     destructor Destroy; override;
     function  wait(mstime:longword=INFINITE):boolean;
     procedure signal();overload;
     procedure signal_all();
     procedure signal(nbr:integer);overload;
     function WaitersBlocked:integer;
    end;

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... {$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

{$DEFINE OSX}  for Mac OSX on Delphi XE4 to XE5


Scalable MLock 1.2

Author: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2010)

Scalable MLock is a node based Lock that is scalable, FIFO fair
and starvation-free.

- Discovered by Amine Moulay Ramdane

- This lock is scalable

- It has the same space requirement as the scalable MCS lock

- Doesn't require a local "queue node" to be passed in as a parameter as is doing the MCS and CLH locks.

- Spins only on local locations on a cache-coherent machine

- And it's fast.

Please read this:

A bigger problem with the MCS lock is its API. It requires a second structure to be passed in addition to the address of the lock. The algorithm
uses this second structure to store the information which describes the queue of threads waiting for the lock. Unfortunately, most code written using spinlocks doesn't have this extra information, so the fact that the MCS algorithm isn't a drop-in replacement to a standard spin lock is a problem.

An IBM working group found a way to improve the MCS algorithm to remove the need to pass the extra structure as a parameter. Instead, on-stack information was used instead. The result is  the K42 lock algorithm:

Unfortunately, the K42 algorithm has another problem. It appears that it may be patented by IBM. Thus it cannot be used either. (Without perhaps paying royalties to IBM.)

So you have to know that my scalable MLock doesn't require a local "queue node" to be passed in as a parameter  as is doing the MCS and CLH locks, my scalable MLock doesn't require any parameter to be passed, just call the Enter() and Leave() method and that's all.

A very fast concurrent FIFO Queue version 1.2

Authors: Amine Moulay Ramdane 


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2007)


Description:

A very fast concurrent FIFO queue that satisfies many requirements: it has more parallelism than the two locks algorithm, it is FIFO fair , it's starvation-free and it has more parallelism than the two locks algorithm, it minimizes efficiently the cache-coherence traffic and it is energy efficient on the pop() side when you set the wait parameter to true in the construtor: when there is no items in the queue it will not spin-wait , but it will block wait on my SemaMonitor, and when the wait parameter of the constructor is set to false it uses only an atomic increment on the push() side and an atomic increment on the pop() side, so it's very fast. The number of threads on the push() side are limited by the length of the queue, and the number of threads on the pop() side are limited by the length of the queue, the length of the queue must be greater or equal to 2^10, i have set it like that.


You have 3 options for setting the kind of locks, just look inside defines.inc , if you want to set it for my array based lock called AMLock just uncomment the option AMLock inside defines.inc, if you want to set it for Ticket Spinlock just uncomment the option TicketSpinlock ,If you want to set it for Spinlock just uncomment the option Spinlock,  the Ticket Spinlock option scored 12.5 millions of transactions per second on my 2.4 GHz Quadcore.

The size of the queue must be passed to the constructor and it must be a power of 2.

Look here at my reasonning to prove to you that my algorithm is correct: Concurrent FIFO queue.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux (x86)...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


A very fast concurrent FIFO Queue version 1.1

Authors: Amine Moulay Ramdane 


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2007)


Description:

A very fast concurrent FIFO queue that satisfies many requirements: it has more parallelism than the two locks algorithm, it is FIFO fair, it minimizes efficiently the cache-coherence traffic and it is energy efficient on the pop() side when you set the wait parameter to true in the construtor: when there is no items in the queue it will not spin-wait , but it will block wait on my SemaMonitor, and when the wait parameter of the constructor is set to false it uses only an atomic increment on the push() side and a CAS on the pop() side, so it's very fast., this concurrent FIFO queue is limited to 1000 threads on the push() side, if you want to higher that, just modify the "margin" constant in the source code.

You have 3 options for setting the kind of locks, just look inside defines.inc , if you want to set it for my array based lock called AMLock just uncomment the option AMLock inside defines.inc, if you want to set it for Ticket Spinlock just uncomment the option TicketSpinlock ,If you want to set it for Spinlock just uncomment the option Spinlock, the Spinlock gives better performance under contention it scored 6.4 millions of transactions per second on my 2.4 GHz Quadcore, the Ticket Spinlock option scored 2.6 millions of transactions per second on my 2.4 GHz Quadcore, the Spinlock scaled better even if the number of threads are greater than the number of cores, the TicketSpinlock and AMLock don't scale when the number of threads are greater than the number of cores, the Ticket Spinlock
and scalable AMLock are optimal when the number of threads are equal to the number of cores, and when the wait parameter of the constructor is false it scales even if the number of threads are greater than the number of cores and when the wait parameter is set to false it scored 6.4 millions of transactions per second on my 2.4 GHz Quadcore, that's very fast.

Look inside the zipfile, you have to know that SFWQueue.pas is starvation-free on both the push() and the pop(), but WQueue.pas is only starvation-free on the pop().

The size of the queue must be passed to the constructor and it must be a power of 2.

Look here at my reasonning to prove to you that my algorithm is correct: Concurrent FIFO queue.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux (x86)...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


A fast concurrent FIFO Queue using the two locks algorithm version 1.04

Authors: Amine Moulay Ramdane 


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2007)


Description:

A concurrent FIFO queue that satisfies many requirements: it is FIFO fair, it minimizes efficiently the cache-coherence traffic and it is energy efficient on the pop() side: when there is no items in the queue it will not spin-wait , but it will block wait on my SemaMonitor.

You have 3 options for setting the kind of locks, just look inside defines.inc , if you want to set it for my array based lock called AMLock just uncomment the option AMLock inside defines.inc, if you want to set it for Ticket Spinlock just uncomment the option TicketSpinlock ,If you want to set it for Spinlock just uncomment the option Spinlock, the Spinlock gives better performance under contention it scored 12.5 millions of transactions per second on my 2.4 GHz Quadcore, the Ticket Spinlock option scored 3.2 millions of transactions per second on my 2.4 GHz Quadcore, the Spinlock scaled better even if the number of threads are greater than the number of cores, the TicketSpinlock and AMLock don't scale when the number of threads are greater than the number of cores, the Ticket Spinlock and scalable AMLock are optimal when the number of threads are equal to the number of cores.

You have to set the wait parameter of the constructor to "true" and set the option inside the file "defines.inc" to TicketSpinlock or AMLock, so that my concurrent FIFO queue becomes fully starvation-free. 

The size of the queue must be passed to the constructor and it must be a power of 2.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux  (x86)...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


A concurrent FIFO Queue version 2.03

Authors: Amine Moulay Ramdane and  Dariusz Mazur 


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2007)


Description:

A concurrent FIFO queue that satisfies many requirements: it is FIFO fair on the push() side and lockfree on the pop() side, it uses only an atomic increment on the push() side and s proportional backoff on the push() to optimize it more on the push() side and it uses a CAS on the pop() side , and it minimizes efficiently the cache-coherence traffic and it is energy efficient on the pop() side: when there is no items in the queue it will not spin-wait , but it will block and wait on my SemaMonitor, this concurrent FIFO queue is limited to 1000 threads on the push() side, if you want to higher that , just modify the "margin" constant in the source code. This concurrent FIFO queue  gives a throughput of 3.2 millions of transactions per second on my 2.4 GHz Quadcore, that's cool and that's  a decent throughtput .

You have 3 options for setting the kind of locks, just look inside defines.inc , if you want to set it for my scalable array based lock called AMLock just uncomment the option AMLock inside defines.inc, if you want to set it for Ticket Spinlock just uncomment the option TicketSpinlock ,If you want to set it for Spinlock just uncomment the option Spinlock, the Spinlock gives better performance under contention it scored 6.4 millions of transactions per second on my 2.4 GHz Quadcore, the Ticket Spinlock option scored 2.6 millions of transactions per second on my 2.4 GHz Quadcore, the Spinlock scaled even if the number of threads are greater than the number of cores, the TicketSpinlock and AMLock don't scale when the number of threads are greater than the number of cores, the Ticket Spinlock and scalable AMLock are optimal when the number of threads are equal to the number of cores.

The size of the queue must be passed to the constructor and it must be a power of 2.

Please look more information here: Concurrent FIFO queue.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


A scalable array based Lock version 1.03

Authors: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2007)


Description:

A scalable array based Lock that works accross processes and threads, it's FIFO fair and it reduces efficiently the cache-coherence traffic.

Note: You have to use a number of threads not greater than 1X times the number of cores so that my scalable AMLock will be optimal.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


A scalable FIFO fair lock version 1.01

Authors: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2007)


Description:

A scalable FIFO fair lock.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


Ticket Spinlock with proportional backoff 1.01

Authors: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2007)


Description:

Ticket Spinlock with proportional backoff, it's FIFO fair and it reduces the cache-coherence traffic.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

 

Scalable RWLock 3.1

Authors: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2010)

Click here to download the zip file: Zip (for Delphi XE1 to XE4)

Description:

A fast, and scalable and lightweight  Multiple-Readers Exclusive-Writer Lock that is portable called LW_RWLock that works accross processes and threads and a fast, and scalable and starvation-free and lightweight Multiple-Readers-Exclusive-Writer Lock that is portable called LW_RWLockX and that works across processes and threads and a fast a scalable Multiple-Readers-Exclusive-Writer Lock that is portable called RWLock and that don't use spin-wait but uses my SemaMonitor so  RWLock consumes less CPU than the lightweight version and also a fast and scalable and starvation-free and lightweight Multiple-Readers-Exclusive-Writer Lock that is portable called RWLockX  and that don't use spin-wait but uses my portable SemaMonitor and portable event objects.

As you have noticed i have invented four variants of my scalable RWLock, the ones that ends with an X in there names are starvation-free the others are not..

Why i have decided to come with scalable and starvation-free RWLocks, cause you can have frequent reads and infrequent writes but from time to time you can have frequent writes, so i think it is important to have a scalable and starvation-free RWLock , this is why i have come up with two variants of scalable and starvation-free RWLocks.

My lightweight variant and starvation-free RWLock called LW_RWLockX and also my LW_RWLOCK both scales even at 3% of writes, that's also very interresting to know...

And you have to know also that even if we are in a scenario with many more writes than 3% of writes and my scalable LW_RWLockX don't scale globally in the timing you have to know that LW_RWLockX will still be useful , cause even if it doen't scale globally in the timing ,  you have to understand that if from the time t1 to the time t2 there is writes and reads and from the time t2 to time t3 there is only reads, so even if the time from t1 to t2 will add more to the overall time cause there is writes threads, the perceived throughput from time t2 to t3 will be higher and the waiting time from t2 to t3 will be lower cause all my RWLocks will be scalable from t2 to t3 and this will make all my scalable RWLocks useful cause for example the database clients from internet or intranet waiting from t2 to t3 will be served more quickly and this is still useful and this will make my scalable RWLocks still useful even if it doesn't scale above 3% of writes.

A Read/Write Lock is a performance improvement over a standard mutex for cases where reads outnumber writes. with a Read/Write Lock multiple simultaneous read locks may be held, but write locks are exclusively held.

The exclusive writing lock ensures that race conditions do not occur, since if one client is writing to the data no other client  may read or write.

Also, the allowance for multiple simultaneous read locks decreases resource contention since multiple readers can safely use the shared data.


Click here to see the explanation of my scalable RWLock algorithm: Algorithm explanation

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

 

An enhanced version of the scalable Anderson  array based Lock version 1.03

Authors: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2007)


Description:

An enhanced version of the scalable Anderson array based Lock, it's FIFO fair and it reduces efficiently the cache-coherence traffic.

Note: You have to use a number of threads not greater than 1X times the number of cores so that the Anderson Lock will be optimal.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


Threadpool with priorities 1.55 (stable version)


Author: Amine Moulay Ramdane


Click here to download the zip file: Zip (for FreePascal and Lazarus and Delphi 7 to 2010)

Click here to download the zip file: Zip (for Delphi XE to XE4)


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).


Description:

Thread Pool Engine.

The following have been added:

- You can give the following priorities to jobs:

LOW_PRIORITY
NORMAL_PRIORITY
HIGH_PRIORITY

- Uses a FIFO queue
that satisfies many requirements: it is FIFO fair, it minimizes efficiently the cache-coherence traffic and it is energy efficient on the pop(): when there is no items in the queue it will not spin-wait , but it will wait on a portable manual event object..

- Enters in a wait state when there is no job in the queue - for more efficiency -

- You can distribute your jobs to the workers threads and call any method with the threadpool's execute() method.

- Uses O(1) complexity on enqueue and O(3) worst case complexity  on dequeue.

Look into defines.inc there is many options:

CPU32: for 32 bits architecture
CPU64: for 64 bits architecture

Please read an article that i wrote about my Threadpool engine: article.

Look at test1.pas demo inside the zip file...

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+

{$DEFINE CPU32} and {$DEFINE Win32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Win64} for 64 bit systems

Note: testpool.pas is a parallel program of a Matrix multiply by a vector that uses SSE+ and it requires Delphi 5+. test.pas and test_thread.pas works with both FreePascal and Delphi. 

Zip

RWLock 1.13

Authors: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2010)

Click here to download the zip file: Zip (for Delphi XE1 to XE4)

Description:

Description: A fast Multiple-Readers-Exclusive-Writer Lock that works across processes and threads.


A Read/Write Lock is a performance improvement over a standard mutex for cases where reads outnumber writes. with a Read/Write Lock multiple simultaneous read locks may be held, but write locks are exclusively held.

The exclusive writing lock ensures that race conditions do not occur,  since if one client is writing to the data no other client may read or write. Also, the allowance for multiple simultaneous read locks decreases  resource contention since multiple readers can safely use the shared data.  This increases performance over a standard mutex for the assumed usage pattern of frequent simultaneous reads and infrequent writes.

I have looked at another lockfree Read/Write Lock implementation called TOmniMREW that you will find in the OmniThread library(at
http://otl.17slon.com/), but my RWLock algorithm is faster and more efficient than TOmniMREW and it doesn't starve the writes. I have also included MSync library inside RWLock zip file,  this MSync library includes the Distributed RWLock by Dmitry Vyukov that i have enhanced and wrote in Object Pascal, this Distributed RWLock will use my RWLock and make it more scalable, please take a look at TDRWLock class inside the MSync library inside the RWLock zipfile.

Click here to see the explanation of my RWLock algorithm: Algorithm explanation

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

{$DEFINE CPU32} and {$DEFINE Win32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Win64} for 64 bit systems

 

SemaCondvar version 1.21

Author: Amine Moulay Ramdane.


Click here to download the zip file:
Zip (for FreePascal and Lazarus and Delphi 7 to 2010)

Click here to download the zip file: Zip (for Delphi XE1 to XE4)

Description:

SemaCondvar and SemaMonitor are new and portable synchronization objects , SemaCondvar combines all the charateristics of a semaphore and a condition variable and SemaMonitor combines all charateristics of a semaphore and an eventcount , they only use an event object and and a very fast and very efficient and portable FIFO fair Lock , so they are fast and they are FIFO fair.

Now you can pass the SemaCondvar's initialcount  and SemaCondvar's MaximumCount to the construtor, it's like the Semaphore`s InitialCount and the Semaphore's MaximumCount.

Like this:

t:=TSemaMonitor.create(true,0,4);

When you pass True in the first parameter of the constructor the signal(s) will not be lost even if there is no waiting threads, if it's False the signal(s) will be lost if there is no waiting thread.

As you will notice i have used a scalable and efficient and FIFO fair Lock inside SemaCondVar.pas, i have made ,  but you will notice that this Lock is now portable to the x86 systems and to Linux(x86) , Mac OSX(x86) and Windows(x86), but if you want to port it to other systems than the x86, please change inside SemaCondVar.pas the TALOCK by the TCriticalSection of the SyncObjs unit and it will be portable to other systems than the x86. That's all. So enjoy SemaCondvar and SemaMonitor.

Here is the methods that i have implemented :

TSemaCondvar = class
   public

constructor
Create(m1:TCriticalSection;state1:boolean=false;InitialCount1:longword=0;MaximumCount1:longword=INFINITE);
     destructor Destroy; override;
     function  wait(mstime:longword=INFINITE):boolean;
     procedure signal();overload;
     procedure signal_all();
     procedure signal(nbr:integer);overload;
     function WaitersBlocked:integer;
   end;

TSemaMonitor = class
   public
     constructor
Create(state1:boolean=false;InitialCount1:longword=0;MaximumCount1:longword=INFINITE);
     destructor Destroy; override;
     function  wait(mstime:longword=INFINITE):boolean;
     procedure signal();overload;
     procedure signal_all();
     procedure signal(nbr:integer);overload;
     function WaitersBlocked:integer;
     procedure setSignal;
     procedure  resetSignal;

   end;

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... {$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

{$DEFINE OSX}  for Mac OSX on Delphi XE4 to XE5

Zip



A scalable and relaxed MPMC and almost strict FIFO priority Queue version 1.07


Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).


Description:

A  scalable and relaxed MPMC priority Queue.

Relaxed means not a strict FIFO, but it's almost a strict FIFO, it processes the jobs in a FIFO order in each queue, and when there is no items in the queue the thread stills from the other queues in a round robin manner, that's good.

Where can my scalable and relaxed MPMC priority Queue be useful ?

When for example you want to do something like a threadpool that do parallel tasks, and you can find that in many applications that do parallel mathematical calculations or parallel mechanical or graphic calculations and  many parallel tasks... so you will reduce on those applications the S part in the Amdahl equation using my scalable and relaxed MPMC priority Queue.

The number of  consumer threads and the number of producers must be equal to the number of queues that you  pass to the constructor so that it become scalable, and you have to initialize first the Push() method with high(pqueue.long), look at how to do it inside the test1.pas example inside the zip file.   

The following have been added:

- You can give the following priorities to jobs:

LOW_PRIORITY
NORMAL_PRIORITY
HIGH_PRIORITY

- A queue for each worker thread and it uses work-stealing - for more efficiency -

- Enters in a wait state when there is no job in the queue - for more efficiency -

- Uses O(1) complexity on enqueue and O(3) worst case complexity on dequeue.

Look into defines.inc there is many options:

CPU32: for 32 bits architecture
CPU64: for 64 bits architecture

Look test.pas and  at test1.pas examples inside the zip file that shows you that my PQueue is scaling.

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+ 

 

Parallel Compression Library 3.31  

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

Parallel Compression Library implements Parallel Gzip , Parallel Bzip , Parallel LZMA and Parallel LZ , Parallel LZO and parallel LZ4 algorithms using my Thread Pool Engine.

- Now my ParallelGzip gives 5% better performance than Pigz.

- It supports memory streams, file streams and files

- 64 bit supports - lets you create archive files over 4 GB ,
 
  supports archives up  to 2^63 bytes, compresses and 
  decompresses files up to 2^63 bytes.


- Parallel compression and parallel decompression are extremely fast

- It supports both compression and decommpression rate indicator

- You can test the integrity of your compressed file

- Easy programming interface

- Full source codes available.


Just look at the Easy compression library for example,
if you have noticed it's not a parallel compression library.


http://www.componentace.com/ecl_features.htm

And look at its pricing:

http://www.componentace.com/order/order_product.php?id=4


My parallel compression library costs you 0$
and it's a parallel compression library..


Please see the benchmarks here: benchmarks.

Please look at test_pgzip.pas , test_pbzip.pas , test_plzma.pas , test_plz.pas , test_plzo.pas and tesT_plz4.pas demos inside the zip file, compile and execute them... -


Language: FPC Pascal v2.2.0+ and Lazarus / Delphi 7 to XE5: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Note: to be able to port to Linux and Mac OSX you have to compile the dynamic libraries...

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

-dUnix for Linux,MacOSX etc.

Required Delphi7 to Delphi 2007 switch: -DDelphi32
                                                                         
Required Delphi switch: -DDelphi32
                                                                          
Required Delphi XE-XE5 switches: -DXE   -$H+                   

Look into defines.inc there is many options:

CPU32: for 32 bits architecture
CPU64: for 64 bits architecture

Zip

ParallelVarFiler 1.4



Authors: Amine Moulay Ramdane

Click here to download the zip file: Zip


Parallel Variable filer and streamer for Delphi and Freepascal that can be used also as a parallel hashtable and that uses ParallelHashList (Parallel Hashtable) with O(1) best case and O(log(n)) worst case access that uses lock striping and lightweight MREWs(multiple-readers-exclusive-writer) , this allows multiple threads to write and read concurently. also ParallelHashList maintains an independant counter , that counts the number of entries , for each segment of the hashtable and uses a lock for each counter, this is also for better scalability.

Description:

ParallelVarFiler is a Parallel HashTable that can be saved automatically or manually to a file or to a stream or to a string and it can be restored from  a file or from a stream or from a string back to the Hashtable in memory, and it's fault tolerant to power failures etc.

You can collect different types of variables and save them into one file or stream. TParallelVarFiler reads and writes on files, streams and Strings. You can collect Strings, Integers and Dates, in fact everything that can be stored in a variant. In addition you can collect Streams.

You can use also ParallelVarFiler to send parameters over network. It can be used for example to share data by any IPC mechanism.

And please look at test.pas and test1.pas a parallel variable filer examples - compile and execute them...

Now you can use ParallelVarFiler as a small to medium database, i have added a Clean() and DeletedItems() methods and i have changed the constructor, now you can pass a file name to the constructor where the data will be wrote and updated and deleted... if you pass and empty file name the data will be updated and wrote and deleted only in memory. When you use a file name in the constructor, many readers and one writer can proceed concurrently. But when you pass an empty file name to the constructor, many writers and many readers can proceed concurrently in memory. and when you read a data item it only use the parallel hashtable in memory, hardisk is not used. 

And when you pass a file name to the constructor and delete or update the items, those items will be marked deleted from the file, and you can use DeletedItems() to see how many data items are marked deleted in the file, and use after that the Clean() method to delete completly those data items from the file.

How can you store mutiple data inside a database row and map it to a key ?

You can simply use Parallel Variable Filer for that and stream your mutiple data variants and map them to a key...

As you know ParallelVarFiler is using ParallelHashList (a parallel hashtable that scales on multicores), so when you are using the GetKeys() method , and you want to get the data of those keys don't forget to test if the variant is not empty by using the VarIsEmpty() function, and when you are using GetStream() you can test if the data exists by testing the return boolean value from GetStream().

ParallelVarFiler is easy to learn, i have documented all the methods , and please read inside ParallelVarFiler.pas about
them.

Now ParallelVarFiler is Fault tolerant to power failures etc. i have done a simulation of power failures and data file damages and ParallelVarFiler is recovering from power failures and damages of the data file ...

As you know ParallelVarFiler is using ParallelHashList (a parallel hashtable), but as you know ParallelHashList is cache unfriendly since it's using a hash mechanism etc. and as you know ParallelVarFiler and ParallelHashlist  are memory bound, so since ParallelHashList is cache unfriendly and  it's memory bound so i don't think ParallelVarFiler or ParallelHahsList will scale very well, to avoid this problem you have to use something like replication across computers using TCP/IP and using your database in each computer and load balance between computers, but even if you are using replication and load balancing this can make the memory and hardisk truly parallel but this will not avoid the problem with the network bandwidth limitation. But ParallelVarFiler and ParallelHashlist are scaling to a certain point and they are fast and they are still usefull.

Please see the test.pas example inside the zip file to see how i am using it...

But please read the following:

This software is provided on an "as-is" basis, with no warranties, express or implied. The entire risk and liability of using it is yours. Any damages resulting from the use or misuse of this software will be the responsibility of the user.

please look at the test.pas and test1.pas example inside the zip file to see how to use ParallelVarFiler...

Here is the methods that have been implemented:

PUBLIC METHODS:

constructor Create(file1:string,size,mrews:integer;casesensitive:boolean);

- Creates a new VarFiler ready to use, with size and with mrews number of mrewa and casesensitive

for case sensitive keys,the number of MREWS(multiple-readers-exclusive-writer) must be less or

equal to the Hashtable size and file1 is the the file to write to, if file1 is an empty string

the data will be wrote only to memory and use SaveToFile() to write to a file.

destructor Destroy;

- Destroys the ParallelVarFiler object and cleans up.

procedure Clear;

- Deletes all contents.

function Clean:boolean

- Cleans the deleted items from the file.

function DeletedItems:integer

- Returns the number of items marked deleted.

function LoadData:boolean

- Loads the data from the file passed to the constructor.

function Delete(Name : String):boolean;

- Deletes the variable with Name.

function Exists(Name : String) : Boolean;

- Returns True if a variable with Name exists

procedure GetKeys(Strings : TstringList);

- Fills up a TStringList descendant with all the keys.

function Count : Integer;

- Returns the number of variables

function Add(Name : String; Value : Variant ):boolean;

- Adds a new variable , given the Name

function AddStream( Name : String;Stream : TStream ):boolean;

- Adds a new Stream, given the Name

function Update(Name : String; Value : Variant):boolean;

- Updates the new variable , given the Name

function UpdateStream(Name : String; Stream : TStream ):boolean;

- Updates a new Stream , given the Name

function GetStream(Name : String;Stream : TStream):boolean;

- Fills up the stream with data

procedure SaveToStream(Stream : TStream);

- Saves all variables, streams and Graphics to a stream.

procedure SaveToFile(FileName : String);

- Saves all variables, streams and Graphics to a file.

procedure SaveToString(out S : String);

- Saves all variables, streams and Graphics to a string.

 procedure SaveToStreamAndDelete(Stream : TStream);

   - Saves all variables, streams and Graphics to a stream

     and delete after that all the data inside the hashtable
     and inside the file.
   
   procedure SaveToFileAndDelete(FileName : String);

   - Saves all variables, streams and Graphics to a file.

     and delete after that all the data inside the hashtable
     and inside the file.

   procedure SaveToStringAndDelete(out S : String);

   - Saves all variables, streams and Graphics to a string.

     and delete after that all the data inside the hashtable
     and inside the file.

function LoadFromStream(Stream : TStream):boolean;

- Loads all variables, streams and Graphics from a stream.

function LoadFromFile(FileName : String):boolean;

- Loads all variables, streams and Graphics from a File.

function LoadFromString(S : String):boolean;

- Loads all variables, streams and Graphics from a string.

function Stream2String(Stream1: TStream;var Mystring:string): boolean;

procedure String2Stream(var String2BeSaved:string; Stream1: TStream);

procedure Variant2Stream(var VR:variant; Stream1: TStream);

function Stream2Variant(Stream1: TStream;var VR:variant): boolean;


PUBLIC PROPERTIES:

Items : Variant

- Gets the value (indexed).


Language: FPC Pascal v2.2.0+ and Lazarus / Delphi 7 to 2007: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi use -DDelphi

Zip

MSync module version 2.2

Authors: Amine Moulay Ramdane, Dmitry Vyukov ,  Primoz Gabrijelcic, Marcel van Brakel , Olivier Sannier, Matthias Thoma, Florent Ouchet, Robert Marquardt.

Click here to download the zip file: Zip (for FreePascal and Lazarus and Delphi 7 to 2010)

Click here to download the zip file: Zip (for Delphi XE1 to XE4)



Description:

The following classes have been added to the MSync module: TCriticalSection,TMutex,TSimpleEvent,TSemaphore,TMREW (Multiple-Readers-Exclusive-Writer),  TMultiReadExclusiveWrite  and TDRWLock(Distributed Reader-Writer Mutex).


Language
: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

-dUnix for Linux , Unix and Mac OSX

-dWindows for windows

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi switch

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+

Zip

MREW 1.0



Authors:
Primoz Gabrijelcic and Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

MREW lock(multiple-readers-exclusive-writer lock) that works across processes and threads.


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi use -DDelphi

Zip

Parallel Sort Library 3.2

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip (for Delphi and FreePascal)


Description:

Parallel Sort Library that supports Parallel Quicksort, Parallel HeapSort and Parallel MergeSort on Multicores systems.

Parallel Sort Library uses my Thread Pool Engine and sort many array parts - of your array -  in parallel using Quicksort or HeapSort or MergeSort and after that it finally merge them - with the merge() procedure -

In the previous parallelsort version i have parallelized only the sort part, but in this new parallelsort version i have parallelized also the merge procedure part and it gives better performance.

I have implemented a Parallel hybrid divide-and-conquer merge algorithm that performs 0.9-5.8 times better than sequential merge, on a quad-core processor, with larger arrays outperforming by over 5 times. Parallel processing combined with a hybrid algorithm approach provides a powerful high performance result.

The best case complexity of ParallelSort using mergesort  is:

 ((n/p)* log(n/p)) + O(n/p)

p: is the number of cores

the ((n/p)* log(n/p)) is the complexity of the sorting part.

O(n/p) is the best case complexity of the merging part.

so the best case complexity  is:  ((n/p)* log(n/p))

The worst case complexity of parallel sort library using mergesort is:

 ((n/p)* log(n/p)) +  O(n)

the ((n/p)* log(n/p)) is the complexity of the sorting part.

O(n) is the worst  case complexity of the merging part.

so the worst case complexity of parallelsort using mergesort is approximatly: O(n)


I have done some tests with my ParallelSort library and i have noticed that it can give up to 3.8x scalability with strings, and it gives 3x scalability with integers on a quad cores.

So, why it scales to 3.8x with strings and only 3x with integers on a quad cores ?


I explain:

In the SequentialMerge() method and QSort() method inside Parallel Sort library, i am calling the Scompare() method and also in both of them i am copying to the memory system.

So when i am using strings the SCompare() method is more expensive, so the parallel part p in the Amdahl equation 1/ S + P/N (S: the serial part, P: parallel part and N: the number of cores) is bigger than with integers so the Amdahl equation will scale better, but when we are using integers the SCompare() method is less expensive than the SCompare() with strings, so the parallel part p in the Amdahl equation is less bigger than with strings. so this is why parallel sorting with strings scales better than with integers.


I have implemented mergsort and quicksort, but as you know the complexity of mergesort in the worst case is better than quicksort , and the mergesort that i have implemented is faster than quicksort, but mergesort takes more space..

The reccurence equation of the complexity of mergesort is:

T(n) = 2 * T(n/2) + n

cause it take O(n) for the merging part.

It gives:

T(n) = 2 * T(n/2) + n

= 2 (2T(n/4) +n/2) + n

=4T(n/4)+n+n

=4T(n/4)+2*n

=4 (2T(n/8) +n/4) + 2*n

=8T(n/8)+n+2n

=8T(n/8)+3*n

=2k T(n/2^k) + k*n

We want:

n/2^k = 1

n = 2^k

log n = k

so the reccurence equation gives:

= n*T(1) +n*log(n)

= n+ (n * log(n))

So the mergesort complexity in the best case and in the worst case is:

n * log(n)

But the complexity of the quicksort in the worst case is:

T(n)= n + T(n-1)

it gives:

T(n) = n + (n-1) + T(n-2)

T(n) = n + (n-1) + (n-2)+ T(n-3)

T(n) = 1 + 2+ 3+.+N

.T(n) = O(n^2)

And Quicksort in best case performance is:
 
T(n) = T(n/2) + T(n/2) + O(n)
       = 2T(n/2) + (n)

this gives: T(n) = (n lg n)

Parallelizing the Sorts

One way to parallelize the sorts is:

  1. Divide the data among the processors
  2. Sort the data on the individual processors.
  3. Merge the various data

Note that the merge operation is a reduction operation !

I have done some scalability tests on my parallelsort library and i have come to the conclusion that parallel heapsort is better on scalability than parallel quicksort cause the P part (of the Amdahl equation) is bigger in parallel heapsort
than in parallel quicksort, the parallel heapsort is doing more on the parallel part, it's why it scales better than parallel quicksort, but parallel quicksort is still faster than parallel heapsort on my tests on a quad core processor.

And please look at test.pas a parallel quicksort on many cores - compile and execute it...


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi use -DDelphi

Zip

Distributed Reader-Writer Mutex 1.13

Click here to download the zip file: Zip


Description:

Distributed Reader-Writer Mutex, based on the Dmitry Vyukov C++ Distributed Reader-Writer Mutex , that works across processe and threads, I have included the following Reader-Writer Mutexes inside this Distributed Reader-Writer mutex:

TMREW a light weight MREW lock (multiple-readers-exclusive-writer lock) that is very fast and that works across processes and threads and TMultiReadExclusiveWrite from JCL and now both of them can scale better, and i have modified the Dmitry Vyukov Distributed Reader-Writer Mutex, in the first version i have used GetCurrentProcessorNumber(), and i have also provided you with a second version that scales better, to be able to use the second version please use the version2 inside defines.inc, i have given you a test.pas example for the first version and test1.pas for the second version, but don't forget to use version2 inside defines.inc, to use the second version just uncomment the version2 inside defines.inc and comment version1. I have also done a cache line alignement in TOmniMREW, this has allowed Drwlock to scale better.

I have provided you with the source code, please take a look at the source code to understand better.The Object Pascal Distributed Reader-Writer Mutex  is based on the following C++ Distributed Reader-Writer Mutex by Dmitry Vyukov, read more here:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex

You have four methods:

procedure wlock; // same as EnterWriteLock of TOmniMREW
procedure wunlock; // same as ExitWriteLock
procedure rlock; // same as EnterReadLock
procedure runlock; // same as ExitReadLock

Here is some scalability numbers:

I have used TOmniMREW of the Omnithread library and used only EnterReadLock() and ExitReadLock() with four threads on a quad cores and TOmniMREW gave a negative scalability of -5.51x

And when i have used the second version of Distributed Reader-Writer Mutex using only rlock() and runlock() , it gave me +3.94x scalability with four threads on four cores. So now it's scaling.

And about the second version , don't forget to initialize the number  that you pass to rlock() and runlock()  to 0 before calling  rlock() and runlock() .

In the previous versions i have aligned the array elements on cache line bounderies like have done Dmitry Vyukov, and when i have tested the second version it didn't work correctly , so i have thought about that and after that i have decided to not align the array elements on cache line bounderied but just add a cache line padding to TOmniMREW and this time it has worked perfectly and now the second version is scaling perfectly..


Language
: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win and Linux (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi use -DDelphi

And inside defines.inc you can use the following defines:

{$DEFINE CPU32} for 32 bit systems

{$DEFINE CPU64} for 64 bit systems

{$DEFINE TOmniMREW} to use Omnithread MREW

{$DEFINE TMultiReadExclusiveWrite} to use the jcl TMultiReadExclusiveWrite

Zip

Parallel BucketSort 1.05

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

Parallel BucketSort.

Bucket sort is not a sorting algorithm itself, rather it is a procedure for partitioning data to be sorted using some given sorting algorithm—a ”meta-algorithm” so to speak. Bucket sort will be better, when elements are uniformly distributed over an interval [a, b]  and buckets have not significantly different number of elements.

bucketsort sequential best case computational complexity using quicksort is = p (n/p) log(n/p) = n log(n/p)

bucket sort parallel best case computational complexity using quicksort = (n/p) log(n/p)

Parallel BucketSort can scale better than my parallel quicksort and it can be much faster than my parallel quicksort. I have thought about my parallel quicksort , and i have found that
parallel quicksort is fine but the problem with it is that the partition function is not parallelizable , so i have thought about this and i have decided to write a parallel BucketSort,and this parallel bucketsort can give you much better performance than parallel quicksort , just test it yourself and see, parallel bucketsort can sort just strings at the moment, and it can use quicksort or mergesort, mergesort is faster.

And please look at test.pas a parallel bucketsort on many cores - compile and execute it...


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi use -DDelphi

Zip

Parallel Quicksort 1.09

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

Parallel Quicksort using the median-of-three..

In this new version i have changed the partition() function and also i have used the median of three method so that it avoids the worst case performance.

Parallel Quicksort is an implementation of the median-of-three that gives almost 10% better speed.

Parallel Quicksort gave me almost 3x scaling when sorting strings and integers on a quad cores, and you can use it also in an hybrid manner with mergsort, just by passing ctmergesort to the constructor it will give 10% better speed.

And as you know , Quicksort is a divide and conquer algorithm that have the following best case performance:

T(n) = T(n/2) + T(n/2) + O(n)
= 2T(n/2) + (n)

cause it take O(n) for the partition part.

It gives:

= 2 (2T(n/4) +n/2) + n
=4T(n/4)+n+n
=4T(n/4)+2*n
=4 (2T(n/8) +n/4) + 2*n
=8T(n/8)+n+2n
=8T(n/8)+3*n
=2k T(n/2^k) + k*n

We want:

n/2^k = 1
n = 2^k
log n = k

so the reccurence equation gives:

= n*T(1) +n*log(n)
= n+ (n * log(n))

So the quicksort complexity in the best case is:

n * log(n)

And please look at test.pas a parallel quicksort on many cores - compile and execute it...


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi use -DDelphi

Zip

Parallel implementation of Conjugate Gradient Linear System Solver 1.02

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

The Parallel implementation of Conjugate Gradient Linear System Solver that i programmed here is designed to be used to solve large sparse systems of linear equations where the direct methods can exceed available machine memory and/or be extremely time-consuming. for example the direct method of the Gauss algorithm takes O(n^2) in the back substitution process and is dominated by the O(n^3) forward elimination process, that means, if for example an operation takes 10^-9 second and we have 1000 equations , the elimination process in the Gauss algorithm will takes 0.7 second, but if we have 10000 equations in the system , the elimination process in the Gauss algorithm will take 11 minutes !. This is why i have develloped for you the Parallel implementation of Conjugate Gradient Linear System Solver in Object Pascal, that is very fast.

You have only one method to use that is Solve()

function TParallelConjugateGradient.Solve(var A: arrarrext;var B,X:VECT;var
RSQ:DOUBLE;nbr_iter:integer;show_iter:boolean):boolean;

The system: A*x = b

The important parameters in the Solve() method are:

A is the matrix , B is the b vector, X the initial vector x,

nbr_iter is the number of iterations that you want

and show_iter to show the number of iteration on the screen.

RSQ is the sum of the squares of the components of the residual vector
A.x - b.


I have got over 3X scalability on a quad core.

The Conjugate Gradient Method is the most prominent iterative method for solving sparse systems of linear equations. Unfortunately, many textbook treatments of the topic are written with neither illustrations nor intuition, and their victims can be found to this day babbling senselessly in the corners of dusty libraries. For this reason, a deep, geometric understanding of the method has been reserved for the elite brilliant few who have painstakingly decoded the mumblings of their forebears. Conjugate gradient is the most popular iterative method for solving large systems of linear equations. CG is effective for systems of the form A.x = b where x is an unknown vector,
b is a known vector, A is a known square, symmetric, positive-definite (or positive-indefinite) matrix. These systems arise in many important settings, such as finite difference and finite element methods for solving partial differential equations, structural analysis, circuit analysis, and math homework

The Conjugate gradient method can also be applied to non-linear problems, but with much less success since the non-linear functions have multiple minimums. The Conjugate gradient method will indeed find a minimum of such a nonlinear function, but it is in no way guaranteed to be a global minimum, or the minimum that is desired.
But the conjugate gradient method is great iterative method for solving large, sparse linear systems with a symmetric, positive, definite matrix.

In the method of conjugate gradients the residuals are not used as search directions, as in the steepest decent method, cause searching can require a large number of iterations as the residuals zig zag towards the minimum value for ill-conditioned matrices. But instead conjugate gradient method uses the residuals as a basis to form conjugate search directions . In this manner, the conjugated gradients (residuals) form a
basis of search directions to minimize the quadratic function f(x)=1/2*Transpose(x)*A*x + Transpose(b)*x and to achieve faster speed and result of dim(N) convergence.

Jacobi serial complexity is O(N^2) and Conjugate gradient serial complexity is O(N^3/2).

Please look at the test.pas example inside the zip file, compile and execute it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

-dUnix for Linux,MacOSX etc.

Required Delphi switches: -DMSWINDOWS -$H+ -DDelphi

And inside defines.inc you have two defines:

{$DEFINE CPU32} for 32 bit systems

{$DEFINE CPU64} for 64 bit systems

Zip
 


Parallel implementation of Jacobi with relaxation Linear System Solver version 1.01

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

The Parallel iterative with relaxation method that i programmed here is designed to be used to solve large sparse systems of linear equations where the direct methods can exceed available machine memory and/or be extremely time-consuming. for example the direct method of the Gauss algorithm takes O(n^2) in the back substitution process and is dominated by the O(n^3) forward elimination process, that means, if for example an operation takes 10^-9 second and we have 1000 equations , the elimination process in the Gauss algorithm will takes 0.7 second, but if we have 10000 equations in the system , the elimination process in the Gauss algorithm will take 11 minutes !. This is why i have develloped for you the Parallel Jacobi with relaxation iterative algorithm in Object Pascal, that is very fast.

Please read more here: ParallelJacobiWithRelaxation

Please look at test.pas example inside the zip file, compile and execute it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Note: to be able to port to Linux and Mac OSX you have to compile the dynamic libraries...

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

-dUnix for Linux,MacOSX etc.

Required Delphi switches: -DMSWINDOWS -$H+ -DDelphi

And inside defines.inc you can use the following defines:

{$DEFINE CPU32} and {$DEFINE Win32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Win64} for 64 bit systems

Zip

ParallelZlb Compression library 1.12

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

Parallel Compression library using the Zlib algorithm and my Thread Pool Engine.

- Now my ParallelZlb gives 5% better performance than Pigz.

- It supports memory streams, file streams and files

- 64 bit supports - lets you create archive files over 4 GB ,
 
  supports archives up  to 2^63 bytes, compresses and 
  decompresses files up to 2^63 bytes.


- Parallel compression and parallel decompression are extremely fast

- It supports both compression and decompression rate indicator

- You can test your compressed file

- Easy programming interface

- Full source codes available.


Please look at test_pzlib.pas inside the zip file, compile and execute it.. -


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Note: to be able to port to Linux and Mac OSX you have to compile the dynamic libraries...

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

-dUnix for Linux,MacOSX etc.

Required Delphi switches: -DMSWINDOWS -$H+ -DDelphi

Zip

Parallel Hashlist 1.48

Authors: Amine Moulay Ramdane(Parallel parts) and Barry Kelly (HashList).


Click here to download the zip file: Zip


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Description:

A parallel HashList with O(1) best case and O(log(n)) worst case access that uses lock striping and lightweight MREWs(multiple-readers-exclusive-writer) , this allows multiple threads to write and read concurently. also parallelhashlist maintains an independant counter , that counts the number of entries , for each segment of the hashtable and uses a lock for each counter, this is also for better scalability.

I have tested ParallelHashlist with four threads on a quad core and it's giving a perfect scaling on both reads and writes.

Note: When i have done those benchmarks , there was not enough/much items organized as a self-balancing tree in the individual chains of the hashtable, so , almost all the items was found and inserted in O(1) , so the parallel part in the Amdahl equation was not much bigger compared to to the serial part. But you will notice in pratice that as soon as you will have more items on the chains of the Hash table , organized as self-balancing tree, with a worst case log(n)  , the parallel part will become bigger in the Amdahl equation and you will have better scalability.

Parallel Hashlist was updated to version 1.44 , In the previous version i have set the number of lightweight MREWs(multiple-readers-exclusive-writer) to 128 but i have decided to change that in version 1.44  so that you can give a variable number of MREWS, this will scale better and now the constructor will look like this:

 hash:=TParallelHashList.create(trait, Hashsize, number_of_MREWS);

and the number_of_MREWS must be less or equal to the Hashsize

and please pass a hashsize and the number of mrews in power of 2 to the constructor by using the shl operation for example like this


trait:=TCaseinsensitiveTraits.create;;
hash1:=TParallelHashList.create(trait,1 shl 25,1 shl 25);

Why do you have to use a power of 2 ?

Please read this:

"Power-of-Two Hash Table Size

Any data structures 101 book will say choose a prime for the number of buckets, so that the bucket's index can easily be computed by h(k) = k mod m, where k is the key value and m is the bucket size. While this approach is straight-forward, there are a number of issues with it, including slow modulo performance. ConcurrentHashMap instead uses a power-of-two rule

http://work.tinou.com/2008/09/performance-optimization-in-concurrenthashmap.html "


I am using modulo functions inside parallelhashlist, so it's better to use hashsize and the number of MREWs(multiple-readers-exclusive-writer) in power of 2 ,  this will make the modulo function of the delphi and freepascal compilers 10X faster.

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005

Zip

Linear programming modeling examples in Object Pascal

Click here to download the zip file: Zip


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , (x86).

Description:

Linear programming modeling examples in Object Pascal. Please look inside the zip at the example SP.pas (shortest-path problem) in Object Pascal, i have provided you with the LPSolve library, but the examples that i have included do generate and MPS file , so you can use a faster software than LPSolve , example SCIP or CLP from COIN-OR , and just pass them the MPS ouput file that the LPSolve library does generate and they will execute them without any problem...

Please look more information here: Linear Programming.


Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+ -DDelphi

Zip

Lockfree MPMC and SPMC fifo queues version 1.13


Click here to download the zip file: Zip


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Description:

Lock-free SPMC algorithm to handle queue FIFO proposed by Amine Moulay Ramdane, use only single CAS on pop and no CAS on push. You can use it in a work-stealing manner with a local queue for each server thread to optimize for more throuput and minimize contention.

and

Lock-free MPMC algorithm to handle queue FIFO proposed by Dariusz Mazur (and modified by Amine Moulay Ramdane), use only single CAS on pop and single CAS on push.

I have done some acalability tests and it's not scaling cause when you are using a single thread some variable are updated on the L1  cache but using multiple threads those variables are loaded from the L2 cache and it's more expensive to load them from the L2 cache. But even though lockfree_mpmc is not scalable, you can   increase the P (parallel) part by doing more of the same: Increase the volume of data processed by the P part (and therefore the percentage p of time spent in computing). This is Gustafson's Law and you will get more scalability.

For example i have used the IntToStr() function on each of the four threads (on a quad core) on my lockfree_mpmc test programs to convert from and integer to a string, so i have increased the P (parallel) part and i have got more scalability, this is Gustafson's Law, and you have to remember Gustafson's Law , this is very important.

Please look more information here: Lock-free MPMC.


Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+ -DDelphi

Zip

Lockfree MPMC and SPMC priority FIFO queues version 1.1

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Description:

Lock-free MPMC and SPMC priority FIFO queues. They use 0(1) on enqueue and 0(3) worst case on dequeue


Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+

Zip

FIFO MPMC Queue


Click here to download the zip file: Zip


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Description:

Parallel FIFO MPMC queue algorithm that uses a spinlock and an exponential backoff.

Please look at the benchmarks against lockfree fifo queues: FIFO MPMC queue.


Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+


Zip

An M/M/n queuing model simulation with Object Pascal and my Thread Pool Engine - version 1.02

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

It's harder and sometimes impossible to get analytical results about waiting times and queue length for general interarrival and service distributions; so, it's important to be able to estimate these quantities by observing the results of simulation.

It's very easy in Object Pascal to simulate a sequence of arrival times with a given interarrival distribution.

Look at the examples MM1.pas( M/M/1 queuing model) and MMn.pas(M/M/n - n: number of servers -) inside the zip file:

---------------------------

InterArrivals:=TExponentialDistribution.Create(420623,1.0/3.0);

ServiceTimes:=TExponentialDistribution.Create(220623,1.0/4.0);

currtime:=0.0;

for i:=1 to simnumber

do

begin

obj:=TJob.create;

obj.simnumber:=simnumber;

obj.number:=i;

currtime:=currtime+InterArrivals.Sample;

obj.ArrivalTime:=currtime;

obj.Servicetime:= ServiceTimes.sample;

TP.execute(myobj.myproc1,pointer(obj),NORMAL_PRIORITY);

end;

-------------------------------------------

Here we have the InterArrivals object and ServiceTimes object and we are calling InterArrivals.Sample to get our samples from the Exponential Distribution.

After that we are calling myobj.myproc1 to simulate our M/M/1 queuing model...

If you look at MMn.pas , you will see that the arrival rate is: 3 and the service rate is 4 , so, this will give us a theoretical value of 1/(4-3) = 1 for one server, and the Object Pascal simulation gave me 1.02 for one server.

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+

Zip

WinMenus version 1.22

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

Drop-Down Menu widget using the Object Pascal CRT unit. Please look at the test.pas example inside the zip file.

Use the 'Delete' on the keyboard to delete the items
Use the 'Insert' on the keyboard to insert the items
and use the 'Up' and 'Down' and 'PageUp and 'PageDown' to scroll ..
and use the 'Tab' on the keyboard to switch between the Drop Down Menus
and 'Enter' to select an item..
and the 'Esc' on the keyboard to exit..
and the 'F1' on keyboard to delete all the items from the list

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

-dUnix for Linux , Unix and Mac OSX

-dWindows for windows

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi switch

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+

Zip
 

AWE version 1.3


Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Description:

Why the 2 GB limitation on the Win 32bits systems ?

My AWE object pascal module is here, it allows your application to use up to 64GB of RAM.

And here is the public interfaces of TAWEMem and TAWEStream classes, i have implemented the following methods:

TAWEStream = class

public

Memory:pointer;
allocated:boolean;
Size:longword;
position:longint;
PageSize:integer;

constructor Create(ulBytes: longint);
destructor Destroy(); override;

function HowManyPagesAllocated(): ulong;
function Seek(Offset: longint; Origin: Word): longint;
function CopyFrom(in_stream:TStream;count:longint):longint;
function CopyTo(out_stream:TStream;count:longint):longint;
function Read(var Buffer;count:longint):longint;
function Write(const Buffer;count:longint):longint;
procedure LoadFromStream(in_stream: TStream);
procedure LoadFromFile(const FileName: string);
function ToString:string;

published

property TotalMem:int64 read gettotalmem;
property AvailMem:int64 read getavailmem;
property MemLoad:int64 read getmemload;


end;

TAWEMem = class

public

memory:pointer;
allocated:boolean;
Size:lonword;
PageSize:integer;

constructor Create(ulBytes: ULONG);
destructor Destroy(); override;
function HowManyPagesAllocated(): ULONG;

end;

Note: To be able to use AWE you have to set the user rights correctly, so, go to:

Control Panel -> Administrative tools -> Local Security Policy -> User Rights Assignment and give 'Lock pages in memory' right to the user that wants to use AWE and reboot the system..

Every TAWEMem object can go up to 2GB of memory and every TAWEStream can go up to 2GB of memory, and you can go up to 64GB of memory !


And please look at the test examples 'test.pas' and 'test1.pas' inside the zipfile - compile and execute them... -


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows 2000 and up (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+

Zip

Threadpool 1.56 (stable version)

Author: Amine Moulay Ramdane


Click here to download the zip file: Zip


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).


Description:

Thread Pool Engine.

Please read an article that i wrote about my Threadpool engine: article.

The following have been added:

  • Lockfree ParallelQueue for less contention and more efficiency or it can use lockfree_mpmc - flqueue that i have modified, enhanced and improved... -
  • A lock-free queue for each worker thread and it uses work-stealing - for more efficiency -
  • Enters in a wait state when there is no job in the queue - for more efficiency -
  • You can distribute your jobs to the workers threads and call any method with the threadpool's execute() method.


Look into defines.inc there is many options:


Lockfree_MPMC: it uses lockfree_MPMC
RingBuffer: it uses lock-free RingBuffer
SINGLE_PRODUCER
: for a single producer (thread)
MUTIPLE_PRODUCER: mutiple producer (threads)
CPU32: for 32 bits architecture

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+


Please look at the examples test.pas,testpool.pas and test_thread.pas...

Note: testpool.pas is a parallel program of a Matrix multiply by a vector that uses SSE+ and it requires Delphi 5+. test.pas and test_thread.pas works with both FreePascal and Delphi.


Zip
 
   

Parallel Matrix demo


Author: Amine Moulay Ramdane


Click here to download the zip file: Zip



Language: Delphi 5+

Operating Systems: NT/2000/XP and above.

Description:

Parallel program of Matrix multiply by a vector that uses SSE+ and my Lock-free Threadpool. Please look at the example testpool.pas inside the zip file...

This demonstration does use SSE+ , so if you don't have SSE support it will not work. It does work with P4 and above.

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+


Zip
   

LockfreeUtils



Click here to download the zip file:
Zip


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: NT/2000/XP and above

Description:

TLFMemoryStream is like the Freepascal and Delphi TMemoryStream, but it's Lock-free: it mean that it's scalable and ready for multicores. Use the Lock-Free TLFMemoryStream to store data in a dynamic memory buffer that is enhanced with file-like access capabilities. TLFMemoryStream provides the general I/O capabilities of a stream object while introducing methods and properties to manage a dynamic memory buffer. Memory streams are useful as intermediary objects that can hold information as well as read it from or write it to another storage medium. They provide a useful format for comparing the contents of streams, or for manipulating data that is stored in a less accessible medium.

The zip file also include TAsyncFileStream - asynchronous file input/output in Windows NT. Please read AFS.txt inside the zip.


Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 5,6,7 use -DDelphi

For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+


Zip

API module (By Aldo Calpini)
version 0.58


Language: Perl: http://www.activestate.com

With this module you can import and call arbitrary functions from Win32's Dynamic Link Libraries (DLL).

API module for ActivePerl 5.10: Zip

Operating Systems: Windows 95/98/Me/NT/2000/XP

How to install:

Please read the readme file inside the zipfile.


.Zip

AdvNotify module (By Amine Moulay Ramdane)
version 1.32



Click here to download the zip file: Exe


Description:

With this object oriented Perl module you will be able to monitor the changes on your directories, this allows your Perl application to respond to changes in the file system, such as new file and/or directory creation, renaming, size, attribute, time, security-descriptor changes etc. Also, with AdvNotify you can run multiple threads from inside your Perl application to monitor the file system changes and you will be notified of wich file has been changed. The AdvNotify's interface has been carefully designed to be very easy to use.

Note: AdvNotify does use asynchronous completion to receive notification from the WinNT OS (no wasting CPU time "polling"), hence AdvNotify is very efficient. This module needs the API module, please download and install the API module before installing AdvNotify.

Perl: ActivePerl: 5.0050x, 5.6 or higher.

Operating Systems: Windows NT/2000/XP

How to install: Read the readme file inside the zip.

Documentation:

After installing AdvNotify, click on your \perl\html\index.html and scroll down to the Win32 links and you will find it there, or you can also access it directly from \perl\html\lib\win32\AdvNotify.html or from Start => Programs => ActivePerl (documentation).

Exe

Doc


NTFS module (By Amine Moulay Ramdane)
version 1.0


This object oriented Perl module implements a high level interface to the NTFS filesystem.

Please take a look at the documentation..

Perl: ActivePerl: 5.0050x, 5.6 or higher.

Operating System: Windows 2000/XP

Note: This module needs the API module, please download and install the API module before installing NTFS.

Zip

MemMap module (By Amine Moulay Ramdane)
version 2.12



Click here to download the zip file:
Zip


Description:


Shared memory is very important for client-server application developers and is also the fastest mechanism by wich server developers can etablish (local) interprocess communication between the cooperating tasks of an application program, often server applications are designed as a set of processes and threads, these tasks must use some kind of interprocess-communication mechanisms to communicate data and coordinate processing between each other, for example: TCP/IP,Pipes,Shared memory etc., but shared memory is the fastest one. This object oriented Perl module will allow you to map both memory and files into virtual address space, processes or threads can gain access to mapped memory via naming mechanism. If for example one process writes a data into a particular region of shared memory the other processes see it immediatly hence shared memory does avoid system space buffering and slow I/O operations, also shared memory will take benefit of the underlying VM paging mechanism and it will improve the memory access time (pages hits,locked memory regions). Any data structure you want can be embeded in a shared memory area and thus made accessible to multiple processes. This gives you the flexibility to create any communication structure you desire, both text and binary data are supported by this module.

Perl: ActivePerl: 5.0050x, 5.6 or higher.

Operating Systems: Windows 95/98/Me/NT/2000/XP

Note: This module needs the API module, please download and install the API module before installing MemMap.

Zip

Doc


ISync module (By Amine Moulay Ramdane)
version 1.21



Click here to download the zip file:
Zip


Description:

This object oriented Perl module implements the Win32 synchronisation mechanisms: Semaphore,Mutex, Event,Timer objects.

Perl: ActivePerl: 5.0050x, 5.6 or higher.

Operating Systems: Windows 95/98/Me/NT/2000/XP

Note: This module needs the API module, please download and install the Perl API module before installing ISync.

Zip

Doc

   

More to come,Thank you.