Thank you, David, for such a wonderful explanation. You really are a guru who wholeheartedly educates us without expecting anything in return. Much appreciated.
Basically, what I am trying to do is as follows:
1) I have some data posted on a Linux server overseas. I copy it manually as and when required. During the copy process, I have been disconnected a couple of times due to internal technical and wifi issues.
2) I want to make sure the data I copied is accurate and that no files are missing or corrupted.
3) In the above example, if a file is missing in my local directory but exists on the remote server, will it be flagged?
4) Is there a better method to copy the files and ensure they are copied accurately?
I will add --checksum in the script.
The script is as follows:
I typically use the following command to copy:
I tried the usual command to copy, but it gave an error,
so I have mounted the remote server locally using the following script:
So I used the following command to copy:
Contents of the directories (currently difffile.txt does not differ between them; I will modify it and test):
Remote server mounted locally:
Local files (at present the files are the same; I am going to modify difffile.txt at the destination, i.e. the local copy):
I have updated difffile.txt.
I ran the cmpdir script.
The difference in the file is not showing up.
What mistake am I making?
Thanks
Let's go straight to what you want to achieve instead of correcting what you are doing now. I assume that you are not trying to synchronise directories, only to copy files over. rsync is not meant to do that, but like many tools, if you know what you are doing, you can use it for a variety of purposes.
Based on what you have described, comparing directories with rsync is unnecessary. Your thread starts with verification rather than with the copying, so I will show you a way to make the transfer of data between servers reliable and consistent in the first place. That way you do not need to worry about reliability and resort to rsync for it.
The technique I propose will likely yield higher throughput, because the copy is one sequential stream, whereas rsync does more than just that. However, if you are in fact trying to sync directories rather than just copy files over reliably, the technique below may not be appropriate. I will share it anyway in case it matches your primary needs.
The steps are as follows
Requirements
1) As highlighted by @lemondrink, using the root user on the remote site/server is unsafe. It is better to create a new user (e.g. john) on your remote site/server:
Bash:
sudo useradd -m john ## create new `john` user with home directory usually in `/home` dir, hence `/home/john`
sudo passwd john ## assign a new password for `john` user
2) I assume you are using `ssh` to access your remote site/server, so you will want to use public key authentication instead of a password. That way the copy process can be repeated without entering a password each time.
I will not go into details, as there are plenty of articles on this publicly available. See https://kb.iu.edu/d/aews for more information.
Steps
Actual copying technique. Here I will assume your target user account is `john@remote.site` and the data should be copied to `/target/`. The local directory that you would like to copy over is `/source/*`.
1) tar up your local directory
Bash:
pushd /source
tar -czf /tmp/payload.tar.gz .
popd
2) Create a checksum of the tar file
Bash:
pushd /tmp
sha1sum payload.tar.gz > ./payload.tar.gz.sha
popd
3) scp the file over to the remote site
Bash:
scp /tmp/payload.tar.gz /tmp/payload.tar.gz.sha john@remote.site:/tmp
4) Perform a checksum to ensure the file is copied over reliably
Bash:
ssh john@remote.site 'cd /tmp && sha1sum -c ./payload.tar.gz.sha'
5) Unpack the tar file
Bash:
ssh john@remote.site 'rm -rf /target && mkdir -p /target && cd /target && tar -xf /tmp/payload.tar.gz && rm -f /tmp/payload.tar.gz{,.sha}'
6) local housekeeping
Bash:
rm -f /tmp/payload.tar.gz{,.sha}
All you need to do is make sure every step above runs without errors; that ensures the transfer is reliable and consistent.
However, I have made some assumptions, as I do not know your exact requirements.
1) The major assumption is that the /target directory can be completely overwritten (step 5 removes it before unpacking).
2) I assume your remote user has permission to read, write, and create directories and files at the `/target` path.
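To see what the checksum in step 4 protects against, here is a local-only sketch that needs no remote server. It packs a small directory, simulates one clean copy and one interrupted copy, and runs `sha1sum -c` on both. The temp directory layout and the variable names (`work`, `good_status`, `bad_status`) are made up for illustration, and truncating the file to 20 bytes is just an assumed stand-in for a dropped connection.

```shell
# Local-only sketch: every path lives in a throwaway temp directory.
work=$(mktemp -d)
mkdir "$work/src" "$work/good" "$work/bad"
printf 'some payload data\n' > "$work/src/file.txt"

# Steps 1 and 2: pack the directory and record its checksum.
tar -czf "$work/payload.tar.gz" -C "$work/src" .
( cd "$work" && sha1sum payload.tar.gz > payload.tar.gz.sha )

# Simulate a clean transfer and an interrupted one (only the first 20 bytes arrive).
cp "$work/payload.tar.gz" "$work/payload.tar.gz.sha" "$work/good/"
head -c 20 "$work/payload.tar.gz" > "$work/bad/payload.tar.gz"
cp "$work/payload.tar.gz.sha" "$work/bad/"

# Step 4: verify each copy and keep the exit status.
good_status=0
( cd "$work/good" && sha1sum -c payload.tar.gz.sha >/dev/null 2>&1 ) || good_status=$?
bad_status=0
( cd "$work/bad" && sha1sum -c payload.tar.gz.sha >/dev/null 2>&1 ) || bad_status=$?

echo "intact copy: $good_status, truncated copy: $bad_status"
rm -rf "$work"
```

The intact copy should report status 0 and the truncated copy a non-zero status, which is the signal to stop before unpacking.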
All the unix commands above return a status code that tells you whether they executed successfully.
Consider step (5). If the connection is interrupted halfway, the return code will be non-zero. If any of the commands chained inside the `ssh` command fails or does not complete on the remote side, the return code of `ssh` will also be non-zero.
In a unix shell, a return code of zero means success; anything else indicates a warning or an error. After each command, you can check the return code with the `$?` variable.
Below is an example of a success case and an error case:
Bash:
ls /
echo $?
ls /doesnotexist
echo $?
The first echo prints zero (0); the second should print 2 (with GNU `ls`; the exact non-zero value can differ on other systems).
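The same rule carries over to a chain of commands joined with `&&`, like the one passed to `ssh` in step 5: the chain stops at the first failure and its status becomes the status of the whole command. The sketch below uses `sh -c` as a local stand-in for `ssh john@remote.site` (an assumption made only so it runs without a server). `ssh` hands back the remote command's exit status in the same way, and returns 255 when the connection itself fails.

```shell
# `sh -c` stands in for `ssh user@host` here (illustration only):
# both run a command string and return the exit status of that string.
chain_status=0
sh -c 'true && false && echo "never printed"' || chain_status=$?
echo "failed chain: $chain_status"

ok_status=0
sh -c 'true && true' || ok_status=$?
echo "successful chain: $ok_status"
```

The failed chain reports 1 (the status of `false`) and never reaches the `echo`, while the successful chain reports 0.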
For my shell commands you can do something like this; I believe step (5) is the most crucial one to check:
Bash:
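ssh john@remote.site 'rm -rf /target && mkdir -p /target && cd /target && tar -xf /tmp/payload.tar.gz && rm -f /tmp/payload.tar.gz{,.sha}'
## braces, not parentheses: `exit` inside ( ... ) would only leave a subshell and the script would carry on
[ $? -eq 0 ] || { echo "remote transfer unsuccessful" >&2; exit 1; }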
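A detail worth knowing when writing a check like this: parentheses run their contents in a subshell, so an `exit` inside `( ... )` ends only that subshell, whereas an `exit` inside `{ ...; }` ends the script itself. The sketch below shows the difference locally. Each snippet runs in its own `sh -c`, which here plays the role of a script, and the variable names are made up for illustration.

```shell
# Parentheses: `exit 1` leaves only the subshell, so the "script" keeps going.
paren_out=$(sh -c 'false || ( echo "failed" && exit 1 ); echo "still running"')

# Braces: `exit 1` ends the "script" itself, so nothing after it runs.
brace_status=0
brace_out=$(sh -c 'false || { echo "failed"; exit 1; }; echo "still running"') || brace_status=$?

printf 'with ( ):\n%s\n' "$paren_out"
printf 'with { }:\n%s (status %s)\n' "$brace_out" "$brace_status"
```

With parentheses the output still contains "still running"; with braces the run stops after "failed" and reports status 1.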
If you have devised a script based on the steps above, you can examine its return code to know whether the entire script executed properly:
Bash:
./transferfile || echo "transfer incomplete"
I hope this gives you a better understanding of how to devise reliable transfer techniques.
Of course, this is just ONE way. There are numerous other tools and techniques that achieve similar results. It depends on which tools are available to you, how simple you want the process to be, and how many dependencies you are willing to introduce.
If you still have doubts or trouble writing your own shell script, just give a shout.
