database advice for simple functionality and a large number of records

LMHmedchem

New Member
Credits
28
Hello,

I am setting up a data processing project that will have a relatively large throughput. I am anticipating about 17,000,000,000 records in the largest database. There would be a table with a primary key and a record name. There would be a dozen or so other columns that could be in the same table as the primary key. These would be a mix of int, float, and string. There would be another table that would have about 3000 columns of float. There would also be a file associated with each record that would be from 1K to 15K or so in size. There would only be one user.

The task is to generate a large number of records, process the records to generate about 3000 floats per record, and then analyze the float data with other tools. I can think of allot of ways to do a task like this but most things are what databases do already, so I don't want to reinvent the wheel.

There are a number of issues here from the operating system, the file system, and the database technology to allow for billions of records in a database and billions of associated files. I would like to run this in Linux, but I am not sure what the best flavor is or if it matters.

Can anyone suggest a good OS to run such a platform? Will any 64-bit Linux be sufficient?

I have mostly stuck with CentOS and OpenSUSE. I have also used Cygwin on Windows, thought I would expect that to be overly slow. The OS would have to have reasonable native support for the database technology. I would guess the XFS would be the best filesystem to use but I don't know.

As far as the database technology, I have used SQLite several times but I don't know if trying to put all of this in one table is a good idea. The data could likely be split up over multiple databases if necessary. I think that it would be helpful if the database had native compression for the tables. There would probably be some streaming processes that would generate records, populate the database, process the records, and add the the database. This could involve decompression/compression.

At any rate, I am trying to develop a framework for this and I want to get together a list of components that I know can manage the individual parts before trying to put it all together.

Any suggestions would be greatly appreciated,

LMHmedchem
 


brickwizard

Well-Known Member
Credits
5,271
Can anyone suggest a good OS to run such a platform? Will any 64-bit Linux be sufficient?
If security of data base is a concern I suggest you read up on RHEL linux [non-free] Centos and Cubeos , if its going onto a dedicated server then look at one of the server builds of Linux, if the data base is for general use and security is not a concern, then almost any build of Linux can be used/adapted.
I am not up on what you are trying to achieve but this will give you something to start reading.
Bwiz
 

f33dm3bits

Gold Member
Gold Supporter
Credits
24,745
I would go for Rocky Linux, Cloud Linux or Alma Linux because CentOS 8 support end 31-12-2021 because CentOS is going Stream which means it won't be a RHEL clone anymore but Up-stream from RHEL. I don't know much about databases but for that many records sqllite doesn't sound the right tool for the job, so have a look at postgres, mariadb or mongodb and see which best fits your needs if you want to use an opensource database, that's just my two cents.
 
Last edited:
$100 Digital Ocean Credit
Get a free VM to test out Linux!


Latest posts

Top