Copying Directories with Exceptions - possible?

Lucan

I want to copy a directory tree with the exception of one (or some) of its subdirectories. There does not seem to be anything in man cp to cover this. In effect I'm looking for an "--except" option, something like :-

cp -a --except=/home/BS,/home/fubar /home /backup/

... which would copy the whole home directory except for the BS and fubar subdirectories. Is there a way to do something like this, with an option I have missed, or some other way? I often find I need it, and it is tedious doing multiple cp's for each of the directories that I do want. Sometimes the destination directory is itself in the source directory tree but must obviously be excluded.
 


Use tar to create an archive to stdout, of the source tree, excluding the unwanted directories and pipe the result back through tar to extract the wanted stuff in the target directory.

Note the relative directory specs.

cd / ; tar cf - ./home --exclude=./home/BS --exclude=./home/fubar | (cd /backup; tar xf - )
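For anyone who wants to try this safely first, here is a minimal sketch of the same pipeline run against a throwaway tree. The temporary directories and file names are made up for the test, and -C stands in for the two cd calls. The --exclude options are placed before ./home as a cautious choice, since as far as I know the position of --exclude relative to the paths has mattered in at least some tar versions.

```shell
#!/bin/bash
# Throwaway source and destination trees (hypothetical names, for testing only).
src_root=$(mktemp -d)
dest_root=$(mktemp -d)
mkdir -p "$src_root/home/BS" "$src_root/home/fubar" "$src_root/home/lucan"
touch "$src_root/home/BS/skip.txt" "$src_root/home/fubar/skip.txt" "$src_root/home/lucan/keep.txt"

# Archive ./home to stdout, minus the excluded directories,
# and unpack the stream in the destination directory.
tar -C "$src_root" --exclude=./home/BS --exclude=./home/fubar -cf - ./home \
  | tar -C "$dest_root" -xf -

ls -R "$dest_root"
```

Note the exclude patterns are written relative to the same starting point as ./home, matching the "relative directory specs" remark above.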
 
G'day mate, you drew the short straw and got me first :confused::eek::rolleyes: and I am no expert in cp.

This will work though, I have just tried it

Nope, I am second apparently, welcome @Lyall :)

Code:
cp -r !(/home/BS|/home/fubar) /home/backup

There are also options with

Code:
rsync --exclude

... but I have chosen this one as you asked about cp.
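A note for anyone trying the cp line above: !(...) is bash's extended globbing syntax, which only works while the extglob shell option is on, and the names inside the brackets are matched against entries in the current directory, so they should be bare names rather than full paths. A minimal sketch under those assumptions, using a temporary tree with made-up names in place of /home and /backup:

```shell
#!/bin/bash
# extglob must be switched on before bash reads the line that uses !( ).
shopt -s extglob

src=$(mktemp -d)    # stands in for /home
dest=$(mktemp -d)   # stands in for /backup
mkdir -p "$src/BS" "$src/fubar" "$src/lucan"
touch "$src/BS/skip.txt" "$src/lucan/keep.txt"

cd "$src" || exit 1
# Copy every entry in the current directory except BS and fubar.
cp -r -- !(BS|fubar) "$dest"/

ls "$dest"
```

Like *, the pattern skips dot-files unless dotglob is also set, so hidden files in the top level would need separate handling.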

Cheers all

Chris Turner
wizardfromoz
 
Would you mind explaining !(/home/BS|/home/fubar)?
Are you using a specific shell?
I use Bash 4.412 on my Gentoo system and I tried the cp command you supplied and receive an event not found error.
Having been a bash user for 20-odd years, this confuses me somewhat, and I am keen to understand the mechanics of stuff that may have been introduced since the last time I did a deep dive.
:)
 
Seems The Jury is out on this one, so in the meantime I would advocate the OP going with @Lyall 's offering, unless he specifically wants to stay with cp.

Lucan, BTW, is on Devuan as I understand it.

@Lyall - Below screenshots are from my Linux Mint 19 'Tara' Cinnamon. In my case, to save time on privileges and permissions issues, I simply created a folder "share2thendelete" in ~/.local and then cd'ed to ~/.local/share . There, I picked 2 folders, nemo and vlc, to exclude.



[SCREENSHOT 1: xftTZ75.png]

The inxi output shows bash 4.4.191 being used


[SCREENSHOT 2: 4Mdpa8x.png]

In the above, the pane on the right was the target, the one on the left the source, and you can clearly see that the operation was successful. vlc and nemo folders have been excluded.

BUT (Wizard's but is never far behind him) - I went to my Calculate Linux (Gentoo-based) and experienced the same outcome as you, on I think the same bash.

So while my method works, it may be conditional, and in that case a more global solution is preferable :)

Cheers

Wizard
 
Yes, that worked for me, thanks. Not sure there is anything like that in "man cp" though.

The "!(/home/BS|/home/fubar)" part of the command posted by Wiz is called a regular expression, or a regex. It isn't a part of the cp command per se, so it won't be listed in the man page for cp.

I can't remember the exact man-page for the regex format. I think it might be:
Code:
man 7 regex
or you could use:
Code:
 apropos regex
To see what regex related man-pages are installed on your system.

You can also have a quick google for "bash regular expressions tutorial" and that should also yield some regex related tutorials, to help you familiarise yourself with them.
e.g.
http://www.aboutlinux.info/2006/01/learn-how-to-use-regular-expressions.html

When used as a parameter to a command, the shell will use the regular expression to determine the appropriate files and directories to act upon and will pass them as parameters to the cp command.

In this case, the pattern will match any files or directories that are NOT called /home/BS or /home/fubar.
The ! is the logical negation operator, or NOT. The pipe character represents a logical OR.
So this:
"!(/home/BS|/home/fubar)"
Translates to :
NOT ( /home/BS OR /home/fubar)

So when that regex is used to specify the input files/directories for cp to copy, it means that the names of ALL files and directories that are NOT called /home/BS OR /home/fubar will be passed to the cp command as a list of target files/directories to be copied to the specified destination.

So going back to Wiz's cp command:
Code:
cp -r !(/home/BS|/home/fubar) /home/backup

The -r option to cp is the recursive option, which means that the cp command will recurse into all sub-directories for any specified directories and copy all of their contents too.

So the entire command translates to:
Recursively copy any files or directories that are NOT called /home/BS OR /home/fubar to /home/backup

Does that make sense?
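To see for yourself that it is the shell, not cp, doing the matching, you can swap cp for printf and look at the argument list. A small sketch, assuming bash with the extglob option switched on and a scratch directory with invented names:

```shell
#!/bin/bash
shopt -s extglob

scratch=$(mktemp -d)
mkdir -p "$scratch/BS" "$scratch/fubar" "$scratch/music" "$scratch/docs"
cd "$scratch" || exit 1

# The shell expands the pattern into separate arguments before any command runs.
matched=( !(BS|fubar) )
printf 'argument: %s\n' "${matched[@]}"
```

Whatever command follows the pattern only ever sees the expanded names, which is why none of this appears in man cp.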
 
I understand it's a regular expression, but the problem is, ! is a reserved Bash word (negation), brackets (list) enclose a list which is executed in a subshell, both of these come directly from the Bash man page.
What you are describing is globbing, or filename expansion, which is a subset of regexp.
Every command line command expects parameters as input, cp being no exception.
They do not, however actually expand filename expressions, that is done by the shell, so one wildcard in a cp on the command line may result in hundreds or thousands of parameters to cp.
The supplied cp 'regexp' is not processed by cp, nor is it a valid filename expression for bash. If it was a valid expression for another shell, then it would be expanded and become parameters to the cp.
With regard to even typing the supplied cp on a bash command line, the 'readline' facility is jumping in and grabbing the ! and interpreting it.
I even tried running the cp as a script (#!/bin/bash), which results in an unexpected token '(' error, basically because there is no space after the !.

I completely fail to see how the supplied cp will work within a Bash environment.

Feel free to modify the following script so that it runs :)
#!/bin/bash
trap "rm -fr /tmp/cptest" EXIT
cd /tmp
mkdir -p {cptest,cptest/A,cptest/B,cptest/C}
cp -r !(/tmp/cptest/A|/tmp/cptest/B) /tmp/cptest/backup
ls -laR /tmp/cptest/backup

because, when I run it, I see this...
lyall@Lyalls-PC /tmp
$ bash -x ./cp.sh
+ trap 'rm -fr /tmp/cptest' EXIT
+ cd /tmp
+ mkdir -p cptest cptest/A cptest/B cptest/C
./cp.sh: line 5: syntax error near unexpected token `('
./cp.sh: line 5: `cp -r !(/tmp/cptest/A|/tmp/cptest/B) /tmp/cptest/backup'
+ rm -fr /tmp/cptest
lyall@Lyalls-PC /tmp
$

Maybe zsh or csh or some other shell, but I don't use them nor do I have them installed to fiddle with.
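For what it's worth, one way the test script might be made to run (a sketch, not necessarily what was intended): switch on extglob near the top, work from inside cptest so the pattern can use bare names, and leave backup itself out of the match. The mktemp location and the file inside C are my additions, and the cleanup trap is left out so the result can be inspected afterwards.

```shell
#!/bin/bash
shopt -s extglob                 # without this, !( ) is a syntax error in a script

work=$(mktemp -d)                # assumption: a private scratch dir instead of /tmp/cptest
mkdir -p "$work/cptest/A" "$work/cptest/B" "$work/cptest/C" "$work/cptest/backup"
touch "$work/cptest/C/file.txt"

cd "$work/cptest" || exit 1
cp -r -- !(A|B|backup) backup/   # the pattern expands to just: C
ls -laR backup
```

The shopt line has to sit on its own line before the cp, because bash parses a whole line before running any of it.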
 
I think the main perspective here should be that the method worked for the OP, desired outcome achieved, problem solved?

You could start a Thread, Lyall, on shell compliancy with POSIX, or similar, if you wish to pursue the matter. I would be interested to learn.

I typically run 70 - 80 Linux over 2 rigs (30 on this main one), and I have not the time to test them all, but I tried my method on a few, and with the ones that failed, it was for Event not Found. Bash versions are listed.

Ubuntu 18.04.1 MATE - 4.4.191 - FAILED

Linux Mint 19 Cinnamon - 4.4.191 - worked (same bash version)

Manjaro 4.4.23 - FAILED

MX-18 (Debian derivative) 4.4.12 - worked

Kali 4.4.23 - worked (but same bash version as Manjaro)

So, go figure. :confused:

I'm installing Fedora 29 Workstation this weekend, so will check that.

Friday here in Oz, so to all

Avagudweegend :D

Wizard

BTW Lucan - Rsync would work as well, likely - there is an "rsync --exclude" option which likely resembles what you were first looking for.;)
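Since rsync keeps coming up, here is roughly what that looks like. This is a sketch assuming rsync is installed; the directories are temporary stand-ins for /home and /backup, and the trailing slash on the source means "the contents of" rather than the directory itself.

```shell
#!/bin/bash
src=$(mktemp -d)    # stands in for /home
dest=$(mktemp -d)   # stands in for /backup
mkdir -p "$src/BS" "$src/fubar" "$src/lucan"
touch "$src/BS/skip.txt" "$src/lucan/keep.txt"

if command -v rsync >/dev/null 2>&1; then
    # -a preserves permissions, times and links, much like cp -a.
    # A leading / anchors each exclude at the top of the transfer (the source directory);
    # a trailing / makes the pattern match directories only.
    rsync -a --exclude='/BS/' --exclude='/fubar/' "$src"/ "$dest"/
    ls "$dest"
else
    echo "rsync is not installed here" >&2
fi
```

Without the leading slash, the exclude would also match any BS or fubar directory deeper in the tree.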
 
Just a postscript to the above - I have run the cp command on newly installed Fedora 29 WS (Cinnamon spin), works fine. Bash is 4.4.23, again, so 2 for, 1 against with that version.

@Lucan - if you get time, can you give me your Bash version from Devuan?

Code:
bash --version

#that's a double dash

Thanks.

@Lyall - FYI - I only use Bash. One exception in my stable may be Deepin - one of them runs on zsh but I have not got into that.

Cheers

Wizard
 
I am ok knowing that I am running an old version of bash and that this is a new feature which I don't have.
It is cool knowing that filename expressions can be full regexps, I look forward to checking the manpage for the new version of bash.
Being able to exclude files from a glob expression is cool, much better than a subshell running ls or find and piping output through egrep to filter the list :)

I do wonder if the new version has a setting to enable/disable this behaviour as it's definitely not POSIX compatible :)

Edit: Examining the release notes for Bash 4.4 to 5.0, there is no mention of full regex in globbing. Still keen to know how this works.
 
Sorry, I'll have to hold my hands up and admit - I didn't try the cp command in Wiz's example. I just assumed it worked because
A. Wiz posted it
and
B. Because Lucan said it worked for him.

And yes, you're correct, it does appear to be missing a space between the ! and the opening parenthesis "(". And perhaps I was slightly inaccurate with some of my wording. I wasn't sure if it was the shell, or readline, or what it was that did the evaluation. But going by the assumption that the command does work (which it does, but only on recent versions of bash it would seem), something would have obtained a list of files from the pattern when it was evaluated, and then it would feed cp a list of file-names, which is pretty much what I alluded to in my original post. In a round-about way!

I'm not fully up to speed with all of the jargon when it comes to writing shell scripts - I'm usually too busy writing scripts to be worried about the correct terminology for things - a bit of a failing on my part. Nobody is perfect. Especially not me! :)

I was on my phone yesterday (and today), not on a Linux PC, so I didn't have the means to test it myself. AFAIK it is quite new functionality in bash. So if you have a slightly older version of bash, it might not work for you even if you added a space after the !.

However, if wiz's solution doesn't work for you - there's always more than one way to skin a cat!

Another way of excluding those directories from your example would be to do something like this in your script:
Code:
#!/bin/bash
trap "rm -fr /tmp/cptest" EXIT
cd /tmp
mkdir -p {cptest,cptest/A,cptest/B,cptest/C,cptest/backup}
cd -
cp -r $( \find /tmp/cptest/* | \egrep -v "t/A$|t/B$|t/backup$" ) /tmp/cptest/backup/
ls -laR /tmp/cptest/backup

Which unlike Wiz's solution means using a sub-shell. But it does the job!

I'm sure Lyall understands what's going on in that script, but for the benefit of the non-scripters.

There are a few unusual things going on in there:
I have used \find and \egrep to escape/avoid any aliases that the user may have set up for find or egrep.

If the user has set up an alias for egrep which includes line numbers, or any fancy formatting - the output from egrep will not be suitable for us. We just want the plain file-names/directory names to feed to the cp command.

Similarly for find - the user may have an alias set up which sets certain options. We just want to use find, with no options and with a simple path/pattern.

So we have escaped any aliases by prepending a backslash to the find and egrep commands.

The cp command itself is fairly straightforward.
But to get our list of input files, we have our "\find | \egrep" command in a subshell.
e.g. this part of the line:
Code:
$( \find /tmp/cptest/* | \egrep -v "t/A$|t/B$|t/backup$" )
This will provide the cp command with the list of target files and directories to copy to the backup folder.

One caveat: find recurses into subdirectories by default. In this test the directories are empty, so it only lists the items in the top level of the search directory; with real data you would add -maxdepth 0 after the paths to stop it descending into them.
The reason for the * at the end of the search path, e.g. "/tmp/cptest/*", is that we don't want /tmp/cptest itself included in the results. We only want the items that are in /tmp/cptest/, NOT the entirety of /tmp/cptest. Otherwise we'd accidentally end up recursively copying everything in cptest/ into the cptest/backup/ directory - which would completely negate what we're trying to do.

So find simply lists everything in the top-level of /tmp/cptest/

After getting results from find, we pipe them to \egrep which filters out the files or directories that we want to exclude. The -v option will negate the results to yield only files/directories that DO NOT match the specified regex.

The directories we want to exclude are:
A, B and backup (because the backup directory is also in the main folder we're searching in, so we need to exclude that too)

Taking another look at the regex pattern:
"t/A$|t/B$|t/backup$"
Each exclusion in the regex is separated by a pipe character '|' - which represents a logical OR.

The "t/" part in each refers to the "t/" characters at the end of "cptest/" in the path. And the $ at the end of each one tells \egrep that the pattern is at the end of the line.
So we are looking to exclude any results from find that end in:
t/A
t/B
t/backup

Which will exclude /tmp/cptest/A, /tmp/cptest/B and /tmp/cptest/backup.
We could have put the full paths into our regex:
"/tmp/cptest/A|/tmp/cptest/B|/tmp/cptest/backup"
But I decided to whittle it down to t/A$ etc.

Getting back on point:
The only directories we have in /tmp/cptest/ are A, B, C and backup.

egrep has now filtered and excluded A, B and backup from the results from find - so the only directory that gets passed to the cp command and copied to the /tmp/cptest/backup/ directory is /tmp/cptest/C/

So at the end of the script, /tmp/cptest/backup/ will only contain the C/ directory.

You could also use ls and egrep instead of find in a subshell too. But the globbing from Wiz's post is the nicest solution - if you have a recent version of bash!

It should also be possible to do it as a one liner using find.
 
Actually after a quick trawl through the man page for find - to do it in a single find command is pretty easy!

The syntax is:
Code:
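find /path/to/search/* -maxdepth 0 \( -not -name "dirToExclude1" -not -name "dirToExclude2" \) -exec cp -r {} /path/to/backup \;

Where dirToExclude1 and dirToExclude2 are the names of two directories we want to exclude from our find and /path/to/backup is the path to the directory we are backing up to. The -maxdepth 0 stops find from descending into the directories it is given; without it, find would also visit (and copy) everything beneath them, including the contents of the excluded directories.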

So for each file or directory you want to explicitly exclude you need to add a '-not -name "nameOfFileOrDir"' somewhere inside the brackets.

The "-not -name" clauses are grouped together inside a pair of brackets \( \) to keep them visually separate from the -exec. Strictly speaking the brackets are optional here, because find joins consecutive tests with an implicit AND, so the -exec only runs for items that pass every clause either way. The grouping becomes essential if you combine clauses with -o (OR), since AND binds more tightly than OR.

So if you wanted to exclude a file called excludeme.txt you would add:
Code:
-not -name "excludeme.txt"
Somewhere inside the brackets.

Using Lyall's example:
Code:
find /tmp/cptest/* -maxdepth 0 \( -not -name "A" -not -name "B" -not -name "backup" \) -exec cp -r {} /tmp/cptest/backup \;
In the above we have specified three directories to exclude: A, B and backup. The -maxdepth 0 keeps find from descending into the directories it is handed.

Using the OP's example - they wanted to do this kind of thing with cp:
Code:
cp -a --except=/home/BS,/home/fubar /home /backup/
That obviously doesn't work because there is no --except option.

But using find:
Code:
find /home/* -maxdepth 0 \( -not -name "BS" -not -name "fubar" -not -name "backup" \) -exec cp -r {} /home/backup \;

Again, the OP was attempting to copy everything from /home (or should that be /home/username??) to /home/backup/, whilst excluding two specific directories.

But because their target directory "backup" is in /home as well, they will need to exclude that too - no point copying the backup directory into itself!
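Here is a runnable sketch of the find approach on a scratch tree with invented names. It starts find at the directory itself and uses -mindepth 1 -maxdepth 1 (present in both GNU and BSD find, as far as I know) so that only the direct children are considered and nothing beneath the excluded directories is visited.

```shell
#!/bin/bash
src=$(mktemp -d)    # stands in for /home
mkdir -p "$src/BS" "$src/fubar" "$src/lucan" "$src/backup"
touch "$src/BS/skip.txt" "$src/lucan/keep.txt"

# Visit only the direct children of $src, skip the unwanted names,
# and copy each remaining entry (recursively) into backup.
find "$src" -mindepth 1 -maxdepth 1 \
     ! -name BS ! -name fubar ! -name backup \
     -exec cp -r {} "$src/backup/" \;

ls -R "$src/backup"
```

Unlike starting from "$src"/*, this form also picks up dot-files in the top level, which may or may not be what you want.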
 
- I have run the cp command on newly installed Fedora 29 WS (Cinnamon spin), works fine. Bash is 4.4.23, again, so 2 for, 1 against with that version.

@Lucan - if you get time, can you give me your Bash version from Devuan?

It reports :-
GNU bash, version 4.3.30(1)-release (x86_64-pc-linux-gnu)

That came with Devuan 1.0.0. I'm bracing myself to upgrade to 2.0.0.
 
I use Bash 4.412 on my Gentoo system

Lyall, was that actually 4.4.12 ?

@Lucan , ta (pron. "tar" Aussie for thank you) for that info on 4.3.30 :)

...but for the benefit of the non-scripters.

... and that would include ...moi :confused::eek::rolleyes:

Thanks Jas, it's all clear now (:rolleyes:) but I follow a fair bit. For the benefit of Lucan and Lyall, my area of interest (I won't say expertise) is in multi-multi-booting and in recovery measures such as Timeshift and the like. Troubleshooting, &c.

Lucan, sorry if it seems we (I) have hijacked your Thread, we may have to start a Thread elsewhere on the subject, because I find this extraordinarily interesting, and want to learn more.

Cheers all

Wizard

BTW @JasKinasis , Mate are you on Sid, or other, and what is your bash version, please?
 
I'm running Debian Testing on my laptop.
Have been for a number of years.
But I haven't used my laptop for a couple of weeks. Next time I boot it up, I'll check which version of bash I'm running.
 
My bash version is 4.4.12 under Gentoo, which is pretty much up to date, except, at this time, for a minor blocking issue with mariadb and openssl.
 
Thanks, guys :)

Lucan's 4.3.30 dates back to 2014.

Lyall has 4.4.12 and it doesn't work, but 4.4.12 on my MX-18 does.

Most recent bash used was 4.4.23 which worked with my Fedora 29 WS and Kali, but didn't with my Manjaro.

4.4.191 - worked with Linux Mint 19 but didn't with Ubuntu 18.04, unusual.

We've covered Debian, Gentoo, Arch and RPM, all with mixed results, varying even with the same bash version.

I believe this to be more a case of how different Distros handle regex/p with regard to POSIX compliance or not. (But I am grappling with concepts I have not learned yet).

In any event, I don't expect we will solve it here, but if someone wants to start a Thread and steer it, I can take instructions as easily as give them, and try things on my different Distros.

Thanks for your input, folks, and thanks to Lucan - hope you got something out of it ;)

Wiz
 
Interesting that the bash 5.0 manual explicitly mentions the !(pattern) that we have been having difficulty with.
Even Bash, Version 2.05a mentions it.
It must be a compilation issue with regard to POSIX compliance. I think I may do some research to make my Gentoo bash non-POSIX compliant, to gain some of these features. :)
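As far as I can tell from the manual, this is a run-time setting rather than a compile-time one: !(pattern) is only recognised while the extglob shell option is on. It is off by default, but some distros' interactive startup files (the bash-completion package, for instance) turn it on, which might explain why the same bash version behaves differently from box to box. That last part is my guess rather than something tested in this thread. A quick way to check and switch it:

```shell
#!/bin/bash
# Report the current state; shopt -q prints nothing and only sets the exit status.
if shopt -q extglob; then echo "extglob is on"; else echo "extglob is off"; fi

shopt -s extglob     # switch it on for this shell (add to ~/.bashrc to keep it)
shopt -q extglob && state_after_set=on

shopt -u extglob     # and switch it off again
shopt -q extglob || state_after_unset=off

echo "after -s: $state_after_set, after -u: $state_after_unset"
```

With extglob off, an interactive shell treats the ! as the start of a history expansion, hence "event not found", while a script hits the syntax error near "(" instead.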
 
this command also works:
cp -r `ls -A | grep -v "dir2"` /destination/
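A word of caution on the backtick version: because the output of ls is split on whitespace, names containing spaces get broken apart, and grep -v "dir2" also drops anything that merely contains dir2 in its name. A sketch of a plain loop that avoids both, using invented names in a scratch directory:

```shell
#!/bin/bash
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/dir1" "$src/dir2" "$src/my dir2 notes"
touch "$src/dir1/a.txt" "$src/dir2/skip.txt" "$src/.hidden"

# Loop over visible entries and dot-files alike.
for item in "$src"/* "$src"/.[!.]*; do
    [ -e "$item" ] || continue            # skip a pattern that matched nothing
    case ${item##*/} in
        dir2) continue ;;                 # exact name to leave out
    esac
    cp -r -- "$item" "$dest"/
done

ls -A "$dest"
```

More names can be excluded by extending the case pattern, e.g. dir2|dir3.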
 
