Creating Shell script to compare two directories

tolong1

Member
Joined
Dec 15, 2017
Messages
134
Reaction score
16
I have a remote server on which I have some data.
I want to compare those files with local directories.

Assume the server's IP address is 168.aaa.bbb.ccc
I log in as root.

For copying I typically use this command:
rsync --protect-args -av root@168.aaa.bbb.ccc:/root/dl/2021\ Academy\ Registered/ /home/cathtan/Downloads/dl/2021_Academy_Registered

For comparing, I found that the following command does the job:

rsync -avun --delete ${TARGET}/ ${SOURCE} | sed -ne 's/^deleting *//p'


Now I want to create a shell script, cmpdir.sh, that compares two directories given on the command line using the command above.

For example, I will issue the following command:
./cmpdir root@168.aaa.bbb.ccc:/root/dl/2021\ Academy\ Registered/ /home/cathtan/Downloads/dl/2021_Academy_Registered

How do I implement the script?


Thanks
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,300

Simply use the command you have provided. I would create a file with the following contents

Bash:
#!/usr/bin/env bash

rsync -avun --delete "$1" "$2" | sed -ne 's/^deleting *//p'

Remember to give it execute permission, like this
Bash:
chmod u+x ./cmpdir

Then you can execute it as follows
Bash:
./cmpdir root@168.aaa.bbb.ccc:/root/dl/2021\ Academy\ Registered/ /home/cathtan/Downloads/dl/2021_Academy_Registered

If all you need to know is whether two directories contain identical data, your command is not quite sufficient, since by default rsync only compares file size and modification time. What if two files have the same modification time and size but different contents? This is a general caveat, though. If your intention is only to test for differences in modification time and size, that works too, as long as you know that is what you are getting.

If a checksum comparison is required, include the "--checksum" option in your rsync command.

Also, your question did not mention what you expect the command to output. With your current command, you are merely printing the files that would be deleted at the destination, since "--delete" on a dry run simply prints what would be done at the destination. Your "sed" command suppresses the lines for any files in the source that would be added to the destination, since you are using

Bash:
sed -ne 's/^deleting *//p'
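
To make the point above concrete, here is a minimal sketch of a comparison that reports more than deletions. It is only one possible approach: it assumes rsync's `--itemize-changes` output is acceptable in place of the `sed` filter, and the function name `cmpdir` simply mirrors the script name used in this thread. It drops `-u`, because that option skips files that are newer on the receiving side, and it uses `--checksum` so that contents are compared rather than only size and modification time.

```shell
#!/usr/bin/env bash
# cmpdir sketch: list every difference between SRC and DEST without changing either.

cmpdir() {
  # -r recurse, -c compare by checksum, -n dry run (nothing is modified),
  # --itemize-changes prints one line per file that differs or is missing,
  # --delete additionally reports files that exist only in DEST,
  # --protect-args keeps remote paths containing spaces intact.
  # No -u here: -u would skip files that are newer in DEST and hide them.
  rsync -rcn --delete --itemize-changes --protect-args "$1" "$2"
}

# Usage (the trailing slash on SRC compares the directory contents):
#   cmpdir root@168.aaa.bbb.ccc:/root/dl/source/ /home/cathtan/Downloads/dl/dest
```

In the output, lines starting with `>f` or `<f` are files that would be transferred (missing or different at the destination), and lines starting with `*deleting` are files that exist only at the destination. An empty output means no differences were found.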

For further discussion, let's fix the convention that SRC is your local directory and DEST is your remote directory accessed over SSH. This avoids confusion about the direction of the actions discussed.
 

tolong1

Thank you David for such a wonderful explanation and education. You really are a guru who wholeheartedly educates us without any expectation. Much appreciated.

Basically, what I am trying to do is as follows:
1) I have some data posted on a Linux server overseas. I copy it manually as and when required. During the copy process, I have experienced disconnections a couple of times due to internal technical and Wi-Fi issues.
2) I want to make sure the data I copied is accurate and that no files are missing or corrupted.
3) In the above example, if a file exists on the remote server but is missing in my local directory, will it be flagged?
4) Is there any better method to copy and ensure files are copied accurately?
I will add --checksum to the script.

The script is as follows
#!/usr/bin/env bash
rsync -avun --checksum --delete "$1" "$2" | sed -ne 's/^deleting *//p'


I typically use the following command to copy
rsync --protect-args -av root@168.aaa.bbb.ccc:/root/dl/2021\ Academy\ Registered\ /home/cathtan/Downloads/dl/2021_Academy_Registered


I tried the usual command to copy, but it gave an error
cathtan@pop-os:~/scr$ rsync --protect-args -av root@168.aaa.bbb.ccc:/root/dl/source\ /home/cathtan/Downloads/dl/dest
root@168.aaa.bbb.ccc's password:
receiving incremental file list
rsync: link_stat "/root/dl/source " failed: No such file or directory (2)

sent 8 bytes received 85 bytes 10.94 bytes/sec
total size is 0 speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1816) [Receiver=3.2.3]
rsync: [Receiver] write error: Broken pipe (32)

So I mounted the remote server locally using the following script
#!/bin/bash
sshfs root@168.aaa.bbb.ccc:/root/dl /home/cathtan/dl
Then I used the following command to copy
rsync --protect-args -av /home/cathtan/dl/source/ /home/cathtan/Downloads/dl/dest

Contents of the directories (currently difffile.txt does not have any difference; I will modify it and test)

Remote server mounted locally
cathtan@pop-os:~/dl$ pwd
/home/cathtan/dl
cathtan@pop-os:~/dl$ ls -ll source/
total 8
-rw-r--r-- 1 root root 32 Sep 3 11:10 difffile.txt
-rw-r--r-- 1 root root 19 Sep 3 11:09 samefile.txt
cathtan@pop-os:~/dl$

Local files (at present the files are the same; I am going to modify difffile.txt at dest, the local copy)
cathtan@pop-os:~/Downloads/dl/dest$ pwd
/home/cathtan/Downloads/dl/dest
cathtan@pop-os:~/Downloads/dl/dest$ ls -l
total 8
-rw-r--r-- 1 cathtan cathtan 32 Sep 3 11:10 difffile.txt
-rw-r--r-- 1 cathtan cathtan 19 Sep 3 11:09 samefile.txt
cathtan@pop-os:~/Downloads/dl/dest$

I have updated the difffile.txt
cathtan@pop-os:~/Downloads/dl/dest$ pwd
/home/cathtan/Downloads/dl/dest
cathtan@pop-os:~/Downloads/dl/dest$ ls -ll
total 8
-rw-r--r-- 1 cathtan cathtan 44 Sep 4 11:45 difffile.txt
-rw-r--r-- 1 cathtan cathtan 19 Sep 3 11:09 samefile.txt
cathtan@pop-os:~/Downloads/dl/dest$

I ran the cmpdir script
cathtan@pop-os:~/scr$ ./cmpdirdvd.sh root@168.aaa.bbb.ccc:/root/dl/source/ /home/cathtan/Downloads/dl/dest
root@168.aaa.bbb.ccc's password:
cathtan@pop-os:~/scr$

The difference in the file is not showing up.
What mistake am I making?

Thanks
 

tolong1

Thanks Lemon. I will try your reference.

So the better option is to create a separate user instead of using root, right?
 

davidktw


Let's go straight to what you want to achieve instead of rectifying what you are doing now. I assume that you are not really trying to synchronize directories, but simply trying to copy files over. rsync is not primarily meant for that, but like many tools, if you know what you are doing, you can use it for a variety of purposes.

Based on what you have described, the approach you have taken of comparing with rsync is unnecessary. Your thread starts with verification, not with the copying, so I will show you a way to make the transfer of data between servers reliable and consistent. This way you don't need to worry about reliability and resort to rsync to verify it.

The technique I propose will likely yield higher throughput, because the copy is one sequential stream, whereas rsync does more than just that. However, if you are indeed trying to sync directories rather than just copy files over reliably, the technique below may or may not be appropriate. I will share it anyway in case it fits your primary needs.

The steps are as follows

Requirements

1) As highlighted by @lemondrink, using the root user at the remote site/server is unsafe. It is better to create a new user (e.g. john) at your remote site/server
Bash:
sudo useradd -m john ## create new `john` user with home directory usually in `/home` dir, hence `/home/john`
sudo passwd john ## assign a new password for `john` user

I assume you are using `ssh` to access your remote site/server, so you will want to use public key authentication instead of a password, so that the copy process can be repeated without entering the password each time.
I will not go into details, as there are plenty of articles on this publicly. You may refer to https://kb.iu.edu/d/aews for more information


Steps

The actual copying technique. Here I assume your target user account is `john@remote.site` and your data should be copied to `/target/`. The local directory that you would like to copy over is `/source/*`

1) tar up your local directory
Bash:
pushd /source
tar -czf /tmp/payload.tar.gz .
popd

2) Create a checksum of the tar file
Bash:
pushd /tmp
sha1sum payload.tar.gz > ./payload.tar.gz.sha
popd

3) scp the file over to the remote site
Bash:
scp /tmp/payload.tar.gz /tmp/payload.tar.gz.sha john@remote.site:/tmp

4) Perform a checksum to ensure the file is copied over reliably
Bash:
ssh john@remote.site 'cd /tmp && sha1sum -c ./payload.tar.gz.sha'

5) Unpack the tar file
Bash:
ssh john@remote.site 'rm -rf /target && mkdir -p /target && cd /target && tar -xf /tmp/payload.tar.gz && rm -f /tmp/payload.tar.gz{,.sha}'

6) local housekeeping
Bash:
rm -f /tmp/payload.tar.gz{,.sha}

All you need to do is make sure every step above executes without errors; that ensures the transfer is reliable and consistent.

However, I have made some assumptions, since I do not know your exact requirements.
1) One major assumption is that the /target directory can be completely overwritten.
2) I assume your remote user has permission to read, write, and create directories and files at the `/target` path

All the Unix commands above return a status code that tells you whether the command executed successfully.
Take step (5): if the connection is interrupted halfway, the return code will be non-zero. If any of the commands listed in the `ssh` command does not execute successfully and completely on the remote side, the return code of SSH will also be non-zero.

In a Unix shell, a return code of zero means success. Anything else indicates a warning or an error. After each command, you can find out its return code using the `$?` variable.

Below is an example of a success case and an error case
Bash:
ls /
echo $?

ls /doesnotexist
echo $?

The first echo will print zero (0); the second should print 2 (with GNU ls).

With my shell commands you can do something like this, which I believe matters most for step (5)
Bash:
ssh john@remote.site 'rm -rf /target && mkdir -p /target && cd /target && tar -xf /tmp/payload.tar.gz && rm -f /tmp/payload.tar.gz{,.sha}'
# Use { ...; } rather than ( ... ): an exit inside parentheses would only leave the subshell.
[ $? -eq 0 ] || { echo "remote transfer unsuccessful" >&2; exit 1; }

If you have devised a script based on the steps above, you can examine its return code to know whether the entire script executed properly
Bash:
./transferfile || echo "transfer incomplete"
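
As an illustration only, steps (1) to (6) could be collected into two functions along the following lines. This is a rough sketch, not a finished tool: the function names `make_payload` and `send_payload` are made up here, and it assumes `tar`, `sha1sum`, `scp` and `ssh` are available and that the paths contain no single quotes.

```shell
#!/usr/bin/env bash
# Sketch of steps (1)-(6). Function names are made up for illustration.

# Steps 1-2: pack src_dir into payload and write payload.sha beside it.
# payload must live outside src_dir, otherwise tar would try to pack itself.
make_payload() {
  local src_dir=$1 payload=$2 dir name
  dir=$(dirname "$payload")
  name=$(basename "$payload")
  tar -czf "$payload" -C "$src_dir" . || return 1
  ( cd "$dir" && sha1sum "$name" > "$name.sha" )
}

# Steps 3-6: copy, verify remotely, unpack, clean up. Stops at the first failure.
# WARNING: like step (5), this wipes the target directory on the remote side.
send_payload() {
  local payload=$1 remote=$2 target=$3 name
  name=$(basename "$payload")
  scp "$payload" "$payload.sha" "$remote:/tmp" || return 1
  ssh "$remote" "cd /tmp && sha1sum -c '$name.sha'" || return 1
  ssh "$remote" "rm -rf '$target' && mkdir -p '$target' && cd '$target' \
    && tar -xf '/tmp/$name' && rm -f '/tmp/$name' '/tmp/$name.sha'" || return 1
  rm -f "$payload" "$payload.sha"
}

# Usage:
#   make_payload /source /tmp/payload.tar.gz &&
#     send_payload /tmp/payload.tar.gz john@remote.site /target ||
#     echo "transfer incomplete" >&2
```

Each function returns non-zero as soon as one step fails, so the caller only has to check a single return code.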

I hope this gives you a better understanding of how to devise reliable transfer techniques.
Of course, this is just ONE way. There are numerous other tools and techniques that achieve similar results. It depends on which tools are available to you, how simple you want the process to be, and how many dependencies you want it to have.

If you still have doubts or trouble writing your own shell script, just give a shout. :)
 

tolong1

Dear Davidktw

Thanks a lot for taking so much time to educate me. It is really nice of you and I appreciate it.

Linux is an interesting operating system with a lot to learn. I am learning every day.
I will create a user as per your suggestion and follow it strictly. Thanks a lot.

In my case I am going the other way: copying data from the remote server to my local Linux box, which sits at my place behind a router. I may have to configure a DMZ if I follow your script.

Currently I am using the following command to copy files from the remote server to the local Linux box

rsync --protect-args -av root@168.aaa.bbb.ccc:/root/dl/Sample\ Data/ /home/cathtan/Downloads/dl/Sample_Data


The directories I am copying from the remote server are in the range of 500 MB to 800 MB. These are mostly recorded educational material, so mostly mp4 files.

Your suggestion to compress first and create a checksum is a good idea, as it reduces the file size.

So is it possible to create a bash script on the remote box, for example cpydir.sh, and pass it the directory name to copy? Is it possible to pass a command line argument to the remote server while logging in?

Assume the remote user name I created is john
Example
ssh john@168.aaa.bbb.ccc /home/john/scripts/cpydir.sh /home/john/dl/Sample\ Data/

Upon login, ssh should execute cpydir.sh, which then takes the argument /home/john/dl/Sample\ Data/

Is this possible?

Or is there any other better way?
 

davidktw

@tolong1 Before we proceed with more suggestions, let's find out the exact requirements first. Different requirements may call for different approaches.

1) We know your local machine may not have a static IP. Does your remote machine have a static IP? rsync can transfer in either direction no matter which machine initiates the connection. Hence, if your remote machine has a static IP, you can connect with rsync from your local machine to your remote machine and sync from the remote back to your local machine
2) Are the media files that you want to download to your local machine different each time? When you are done downloading, do you remove them from the remote server?
3) How often do you perform this transfer? Is it ad hoc or periodic, such as daily, weekly, or monthly? How do you know there are new files on the remote machine?

Since your media files are MP4, there is no need to compress them; these files are not very compressible, and you would most likely be wasting time and resources.

rsync can compress with the -z option .. so maybe a rsync -acrvz --progress running inside a screen session
How would a screen session help with an rsync disconnection? It only keeps the shell session alive on the remote side.

But since you are looking to transfer mp4 files which are "educational material" and you have root on the remote machine, why don't you run a httpd or a ftpd with a password protected dir where your stuff is stored, then wget -c to your local machine?
@tolong1 lemondrink's suggestion is good too, provided what you are looking for is a manual download process. Since you are already using SSH today, you don't need to resort to an FTP daemon; SFTP will do. The OpenSSH server has an SFTP subsystem ready for your use, and it is secured over SSH. HTTPd is fine if you need it; you can also use WebDAV and have a secure https:// connection from your Windows file explorer. https://community.microstrategy.com...-Windows-for-multimedia-widget?language=en_US
 

tolong1


@davidktw @lemondrink Thank you for your inputs.

Basically, these are video files recorded at community events for kids. These can be outdoor or indoor events. An event will have multiple video files.

What we do is download these files, review the contents, edit where required, and send the edited files back to them.

1) We know your local machine may not have a static IP. Does your remote machine have a static IP? rsync can transfer in either direction no matter which machine initiates the connection. Hence, if your remote machine has a static IP, you can connect with rsync from your local machine to your remote machine and sync from the remote back to your local machine

Yes, the local machine does not have a static IP; the remote machine has a static IP.

2) Are the media files that you want to download to your local machine different each time? When you are done downloading, do you remove them from the remote server?
Yes, these files will be different every time. They conduct various sessions and record each session in one directory. Each directory may have multiple files, so we copy a directory at a time. We don't remove them. The customer removes them when there is a space constraint.

3) How often do you perform this transfer? Is it ad hoc or periodic, such as daily, weekly, or monthly? How do you know there are new files on the remote machine?

We copy once or twice a week. Most of the time the customer will drop us an email. In case they miss it, we check periodically. Each time we may have two to five directories, and each directory may have multiple files.

I think rsync will be sufficient for the time being. I will consider SFTP and HTTPd in the future, as they involve some learning curve. You have provided quite a lot of information for me to learn and apply in the future. I really appreciate the guidance you are providing.

For me the first thing is to master shell scripting. This will give me some confidence to move to the next level.
At present, whenever I log in to my Linux machine, I mount the remote Linux machine on a local mount point. Then I manually check the directory. Once I know the new directory name, I copy it. I have a standard shell script; I edit this script with the new directory name, save it, and run it.
The only constraint I have is that I need to enter the password for every directory copy.

See my script below


#!/bin/bash
tstart=$(date +"%T")
echo "Starting ... $tstart"
rsync --protect-args -av --progress cathysg@168.aaa.bbb.ccc:/home/cathysg/dl/2021jan\ grade6\ Human\ Body/ /home/cathtan/Downloads/dl/2021jan_grade6_Human_Body
rsync --protect-args -av --progress cathysg@168.aaa.bbb.ccc:/home/cathysg/dl/2021jan\ grade5\ Sun\ Science/ /home/cathtan/Downloads/dl/2021jan_grade5_Sun_Science
tend=$(date +"%T")
echo "Completed at ... $tend"

I did introduce --checksum in the above script as below, but it is suppressing the --progress display. I am not sure why.
rsync --protect-args -av --checksum --progress cathysg@168.aaa.bbb.ccc:/home/cathysg/dl/2021jan\ grade6\ Human\ Body/ /home/cathtan/Downloads/dl/2021jan_grade6_Human_Body

Future plan: currently these backup files are stored locally on an SSD. I am considering storing them in Amazon AWS storage. I need to learn AWS to do the backup.
 

davidktw


Okay, since your remote server has a static IP and you are initiating from your local machine, I think the process is rather simple. A single rsync will do, but of course you need to ensure the rsync completes without error.

To avoid repeatedly entering credentials during SSH, the most common method is public key authentication. I have already provided a link to a tutorial on how to set it up. Since you can already SSH to your client's system today, it should work.
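
For orientation, the one-time key setup usually looks roughly like the sketch below. It assumes OpenSSH's `ssh-keygen` and `ssh-copy-id` are installed locally; the function names `make_key` and `install_key` and the key path are placeholders chosen for this example, and the linked tutorial remains the fuller reference.

```shell
#!/usr/bin/env bash
# Sketch: one-time key setup so that ssh and rsync stop asking for a password.

# Create an ed25519 key pair at the given path unless one already exists.
# The empty passphrase (-N "") keeps unattended scripts simple, but it means
# the key file alone grants access, so keep it readable only by you.
make_key() {
  local key=$1
  [ -f "$key" ] || ssh-keygen -q -t ed25519 -N "" -f "$key"
}

# Append the public key to the remote account's authorized_keys file.
# This asks for the account password one last time.
install_key() {
  local key=$1 remote=$2
  ssh-copy-id -i "$key.pub" "$remote"
}

# Usage (placeholders):
#   make_key "$HOME/.ssh/id_ed25519"
#   install_key "$HOME/.ssh/id_ed25519" cathysg@168.aaa.bbb.ccc
#   ssh cathysg@168.aaa.bbb.ccc true    # should now succeed without a password prompt
```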

You can initiate an rsync from your local machine to the remote machine and transfer from the remote machine (SRC) back to your local machine (DEST). The script below also retries up to 5 times on error
Bash:
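#!/usr/bin/env bash

SRC="$1"
TGT="$2"

# Print an error message and the usage help to stderr, then exit with failure.
print_usage() {
  (
  echo -e "$1"
  echo
  ) >&2

  cat <<MSG >&2
USAGE:
  $0 <SRC DIR> <DEST DIR>

EG:
  $0 remote/relative/dir/ local/relative/dir/
  $0 /remote/absolute/dir/ local/relative/dir/
MSG
  exit 1
}

[ -z "$SRC" ] && print_usage "Error:\n  Missing source directory."
[ -z "$TGT" ] && print_usage "Error:\n  Missing target directory."

# Try up to 5 times; stop at the first rsync that exits with status 0.
# --protect-args keeps remote paths containing spaces intact.
cnt=5
while [ "$cnt" -gt 0 ]; do
  rsync -arc --protect-args "user@remote.host:$SRC" "$TGT" && exit 0
  cnt=$(( cnt - 1 ))
  sleep 1
done

# All attempts failed: report it through a non-zero exit code.
exit 1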
* The script above has a fair number of interesting constructs, so feel free to study them.

As long as the exit code is ZERO (0), your files should have been completely transferred. Whether you want to use `--partial` is up to you, and **don't** use the `--delete` option, so that if a file/dir is missing from the remote machine, rsync won't delete it from your local machine.
You can read https://linux.die.net/man/1/rsync to find out more about the tool.

How you obtain the source directory and where the target directory is, I will leave to you. :)
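
As one possible starting point, the sketch below loops over several directory names in a single run. It is only an illustration: `local_name` and `fetch_all` are names invented here, `./fetchdir.sh` stands for wherever the retry script above is saved, and the space-to-underscore mapping follows the naming seen in the commands earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch: fetch several remote directories in one run.

# Map a remote directory name to the local naming convention:
# spaces become underscores.
local_name() {
  printf '%s\n' "$1" | tr ' ' '_'
}

# Call a fetch command once per directory name; return non-zero if any failed.
# $1 = remote base dir, $2 = local base dir, $3 = fetch command, rest = names.
fetch_all() {
  local remote_base=$1 local_base=$2 fetch=$3 failed=0 name
  shift 3
  for name in "$@"; do
    if ! "$fetch" "$remote_base/$name/" "$local_base/$(local_name "$name")"; then
      echo "FAILED: $name" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage (./fetchdir.sh is a hypothetical file name for the retry script):
#   fetch_all /home/cathysg/dl /home/cathtan/Downloads/dl ./fetchdir.sh \
#     "2021jan grade6 Human Body" "2021jan grade5 Sun Science"
```

With public key authentication in place, this runs through every directory without a password prompt and still reports a non-zero status if any single directory failed.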
 