Database Idea and Theory


(David Coles-Dobay) #1

A while ago I was working on a very large database with 375 million records.

I used mysqld_multi to set up a group of 1000 listeners and segmented the database into 1000 data stores.

Other people wrote various scripts that interacted with the stack.

I had the idea to create socketless listeners and responders.

Never got around to writing the code but I am thinking of a small C++ daemon listener that would receive the query and remain busy on the client side.

The daemon would send a request to a server daemon instance than disconnect. The server daemon would query the db then forward the result back to the client daemon.

On single client server this has no benefit but on a cluster the ability for any node in the cluster to answer the query would free up the IO on the cluster.

So the client query to localhost would trigger the instance that attaches an identifier then queries a dispatch server that then does a presort based on the query forwards the query to the correct node and the node answers the correct result back to the daemon on the client. Asynchronously.

I saw a need for this when doing several million queries per second on the large DB cluster. It was running out of available sockets to accept new queries. They solved their issue by presorting their query sets and queuing the individual nodes instead.

I never did find if there has been a middleware that does this sort of thing. And no the cluster software does not do this. I also discovered you can not do this sort of thing with Cisco or windows on the network as it will trigger STP prevention and create a tremendous ARP storm.

If you know of software that can do asynchronous network database queries please reply.


#2

Maybe this can help: https://www.postgresql.org/docs/current/static/libpq-async.html

PostgreSQL outperforms MySQL on large clusters. Once you get used to PostgreSQL you never go back to MySQL unless you need specific features.


(David Coles-Dobay) #3

I am really not a big fan of postgres. I know it is popular and mariadb has a lot of the same issues I did not like about postgres. I use mariadb often with php7.0. The issue I found with postgres was from a few years ago I wrote a calendaring scheduling service salon management web app that had 9 stations feeding the db. The db queuing caused the calendar to double schedule. I pulled the postgres and used mysql and the problem was gone.

The shop was run on iMac G4 flat panels so it may have been just a Mac thing but never trusted postgres after that.

Unless there have been major improvements I think that there would still be an issue with the lock on write that Postgress had back then.

Thanks