What is the best method in bash to copy large amount of files

linux_NBcKTU

My question: I have a large number of files on a disk and I have three different methods to move/copy them:

1. using find:
```
find /disk -xdev -type f -iname "*.pdf" -exec cp -a "{}" /dest \;
```
2. using find print0 and grep -z:
```
find /disk -xdev -type f -iname "*" -print0 > ./alllist.txt
```
and then grepping with
```
grep -z "<some regex="" for="" pdf="" files"="" .="" alllist.txt=""> ./copylist.txt
```
and then using that copylist for the actual copying (see the sketch after this list)

3. using a single
```
rsync
```
expression which copies only the pdf files into a single folder
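
Concretely, the copy step in method 2 would be something like this (assuming GNU grep, xargs and coreutils; /dest is the same example destination as in method 1):
```
# keep only the pdf entries from the NUL-separated list
grep -z -i '\.pdf$' ./alllist.txt > ./copylist.txt
# copy every file on the filtered list into /dest, preserving attributes
xargs -0 -a ./copylist.txt cp -a -t /dest
```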

Which one would you all suggest? (I want less movement of the R/W head and not to change the disk structure too much.) Another question: what would be the right rsync command for that?
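
My best guess for the rsync variant is something like the following, but I am not sure it is right; as far as I can tell it keeps the directory structure under /dest instead of flattening everything into one folder:
```
# -x stays on one filesystem (like find's -xdev), -m prunes empty directories;
# the filter rules let directories and *.pdf files through and drop everything else
# (the pattern is case-sensitive, unlike -iname)
rsync -a -x -m --include='*/' --include='*.pdf' --exclude='*' /disk/ /dest/
```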
 


I have, for instance, web dev stuff at /srv/http. Inside http I have a few web dev directories, and each one has version control with a .git. I then need to get the files from the whole of, say, one web dev project to my Desktop, but I don't want certain directories like .git, because for distribution via SourceForge it's a case of zipping up a whole folder and uploading it.

So if it's a case of wanting to copy/back up just about all the files and exclude a couple of files or directories, then rsync works very well. Let me give you an example:

Code:
$ sudo rsync -avu --exclude .git  --exclude  node_modules --exclude vendor ads.com/   /home/andrew/Desktop/CI4-CMS/

Here I use rsync with the archive flag -a, verbose -v, and update -u; I exclude the vendor directory, node_modules, and .git.

My location in the terminal is inside http, and I copy everything inside ads.com to inside a directory on my Desktop called CI4-CMS.
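
As an aside, it is the trailing slash on ads.com/ that gives that "inside to inside" behaviour; roughly:

Code:
# with the slash: the contents of ads.com land directly in CI4-CMS/
rsync -av ads.com/ /home/andrew/Desktop/CI4-CMS/
# without it: rsync creates CI4-CMS/ads.com/ and copies into that instead
rsync -av ads.com  /home/andrew/Desktop/CI4-CMS/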

Everything gets copied as follows and retains permissions:

Code:
[andrew@darkstar:~/Desktop]$ tree -L 1 CI4-CMS
CI4-CMS
├── app
├── bootstrapCss
├── bootstrapS
├── builds
├── composer.json
├── composer.lock
├── env
├── fontawesome
├── Gruntfile.js
├── Gruntfile.js.bk
├── Gruntfile.js.save
├── gulpfile.js
├── license.txt
├── package.json
├── package-lock.json
├── PHPMailer
├── phpunit.xml.dist
├── public
├── README.md
├── scss
├── spark
├── tecnickcom
├── tests
└── writable


If you have a mixed bag then you might find it easier to use find; you can also pipe from one command into another.
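
For instance, something roughly like this should work (assuming GNU find and a reasonably recent rsync; the paths are just the ones from my example above):

Code:
cd /srv/http/ads.com
# list regular files, skipping the unwanted directories, NUL-separated,
# and hand the list straight to rsync on stdin
find . -type f -not -path './.git/*' -not -path './node_modules/*' -not -path './vendor/*' -print0 \
    | rsync -av --from0 --files-from=- . /home/andrew/Desktop/CI4-CMS/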
 
Really, thanks for the answer, but I meant copying no folders: transferring all the files into one single destination folder, e.g. given that I'm using Btrfs. The problem for me so far with find, and maybe also with rsync, is that these commands go looking through folders and move the R/W head of an HDD too much, deteriorating the disk... Therefore I came up with the idea of doing a single find that puts all found files into a text file, which is then worked through by another script using grep -z and a copy (roughly as sketched below). But I am seriously not sure, and I don't want to do a scientific test of which steps take more time and cause less disk damage...
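
The copy step I have in mind is roughly this (assuming GNU xargs and coreutils; /flat-dest is just a placeholder for the single destination folder). Because everything lands flat in one folder, files with the same name would clash, so the numbered backups keep them:
```
# copy every file from the filtered list into one flat folder;
# --backup=numbered renames a clashing existing file instead of overwriting it
xargs -0 -a ./copylist.txt cp --backup=numbered -t /flat-dest
```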
 
Huh, you sure you're not thinking too hard about this? Maybe I'm just not catching what you're throwing.

You want to copy a bunch of files of a certain extension, from multiple directories into a single directory? Why? Then you end up with a mishmash of files that may or may not be related, and harder to find what you're looking for. Or you want to copy all of them into a single text file, e.g. concatenate them? And you're worried about the wear on your hard drive.

Using a modern filesystem (ext*, btrfs) will reduce fragmentation, which potentially reduces drive wear. Rsync is designed for archiving/backups, and is about the best CLI tool for the purpose, in my opinion.
 
If I want to move a large number of files, I do it the easy way...I use "copy and paste".


If I wanted to move 80GB or more of whatever from my main drive to an external drive, be it HDD or SSD, it's best to move small amounts at a time, e.g. 20GB lots, which works just fine and doesn't cause any problems.


As for Fragmentation...you never need to defrag a Linux Distro because it doesn't happen...that's a windwoes problem.
 
"Huh, you sure you're not thinking too hard about this? Maybe I'm just not catching what you're throwing."
Yes, I agree; sometimes I think that too.
"You want to copy a bunch of files of a certain extension, from multiple directories into a single directory.<...> Then you end up with a mishmash of files that may or may not be related,"
Yes, they are all independent pdf files, and currently I am trying to use Btrfs as a kind of database (because most databases, e.g. MySQL and Oracle, use a B-tree structure under the hood, afaik). So I am thinking of dropping the overhead of an actual database and using e.g. something like recoll, because recovering a Btrfs filesystem is easier than hacking around in large database blobs... isn't it?
 
Thanks for any seriously meant answer, but my data are in many subfolders and I don't want to copy that folder structure... and fragmentation is already avoided in Btrfs (look at autodefrag).
 
You can follow the copy's progress by using the command df from another terminal. Try
Code:
watch df
or
Code:
watch ls -l /new-disk
to see a report updated every two seconds; press Ctrl-C to end the display. Be aware that running the watch program itself will slow down the copying. To copy all files run
Code:
cp -ax / /new-disk
The above is the simplest method, but it will only work if your original Linux system is on a single disk partition (the -x flag keeps cp from crossing into other filesystems, much like find's -xdev above).
 

