How to Improve rsync Performance

I need to transfer 10TB of data from one machine to another. Those 10TB of files live on a large RAID that spans 7 disks. The target machine has another large RAID that spans 12 disks. It is not easy to copy those files locally, so I decided to copy them over the LAN.

There are four options popping up in my head: scp, rsync, rsyncd (rsync as a daemon) and netcat.

scp

scp is handy and easy to use, but it comes with two disadvantages: it is slow and not fault-tolerant. Since scp encrypts everything before the transfer, the extra encryption work slows down the overall performance (the encrypted stream is slightly larger, and encryption uses more CPU). If the transfer is interrupted, there is no easy way to resume the process other than transferring everything again. Here are some example commands:

#Source machine
#Typical speed is about 20 to 30MB/s
scp -r /data target_machine:/data

#Or you can enable compression on the fly
#Depending on your data: if it is already compressed, you may see little or even negative speed improvement
scp -rC /data target_machine:/data

rsync

rsync is similar to scp. It comes with encryption (via SSH) so the data is safe in transit. It also allows you to transfer only the newer files, which reduces the amount of data being transferred. However, it comes with a few disadvantages: a long decision time, encryption (which adds overhead), and extra computational work (data comparison, encryption and decryption, etc.). For example, if I use rsync to transfer 10TB of files from one machine to another (where the target directory is blank), it can easily take 5 hours to determine which files need to be transferred before the actual data transfer starts.

#Run on the target machine
rsync -avzr -e ssh --delete-after source_machine:/data/ /data/

#Use a less secure encryption algorithm to speed up the process
rsync -avzr --rsh="ssh -c blowfish" --delete-after source_machine:/data/ /data/

#Use an even less secure algorithm to get the top speed
rsync -avzr --rsh="ssh -c arcfour" --delete-after source_machine:/data/ /data/

#By default, rsync decides what to copy by comparing file size and modification time,
#and runs its delta-transfer algorithm on files that differ.
#--whole-file disables delta-transfer and copies changed files in full, which saves CPU on fast networks
rsync -avzr --rsh="ssh -c arcfour" --delete-after --whole-file source_machine:/data/ /data/

Anyway, no matter what you do, the top speed of rsync on a consumer-grade gigabit network is around 45MB/s. On average, the speed is around 25-35MB/s. Keep in mind that this number does not include the decision time, which can be a few hours.

rsyncd (rsync as a daemon)

Thanks to a reader's comment, I got a chance to investigate running rsync as a daemon. The idea is similar to plain rsync: on the server, we run rsync as a service/daemon and specify which directory we want to "export" to the clients (e.g., /usr/ports). Because the daemon keeps track of the exported tree, the decision time when the clients talk to the server is much faster. Here is how to set up an rsync server on FreeBSD:

sudo nano /usr/local/etc/rsyncd.conf

And this is my configuration file:

pid file = /var/run/rsyncd.pid

#Notice that I use derrick here instead of other systems users, such as nobody
#That's because nobody does not have permission to access the path, i.e., /data/
#Either you make the source directory available to "nobody", or you change the daemon user.
uid = derrick
gid = derrick
use chroot = no
max connections = 4
syslog facility = local5

[mydata]
   path = /data/
   comment = data

Don't forget to include the following in /etc/rc.conf, so that the service will be started automatically:

rsyncd_enable="YES"
#Let's start the rsync service:

sudo /usr/local/etc/rc.d/rsyncd start

To pull the files from the server to the clients, run the following:

rsync -av myserver::mydata /data/

#Or you can enable compression
rsync -avz myserver::mydata /data/

To my surprise, it works much better than running rsync alone. Here are some data I collected while transferring 10TB of files from ZFS to ZFS:

Bandwidth measured on the client machine: 70MB/s

zpool IO speed on the client side: 75MB/s

P.S. Initially, the speed was about 45-60MB/s; after I tweaked my zpool, I can get the top speed to 75-80MB/s.
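
If you want to watch the aggregate transfer rate on the client while pulling from the daemon, rsync can report it for you. A minimal sketch, assuming rsync >= 3.1 on the client and the [mydata] module from the configuration above:

#--info=progress2 prints a single overall progress line (bytes, %, current speed)
rsync -a --info=progress2 myserver::mydata /data/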

I noticed that the decision time is much faster than running rsync alone. The process is also much more stable; I saw none of the usual interruptions such as:

rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(521) [receiver=3.1.0]
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(632) [generator=3.1.0]
rsync: [receiver] write error: Broken pipe (32)

NetCat

NetCat is similar to cat, except that it works at the network level. I decided to use netcat for the initial transfer; if it is interrupted, I let rsync kick in and finish the job. Netcat does not encrypt the data, so the overhead is very small. If you transfer files within a local network and you don't care about security, netcat is a perfect choice.

There is only one disadvantage of using netcat: it can only handle one stream at a time. That doesn't mean you need to run netcat for every single file. Instead, we can tar the files before feeding them to netcat, and untar them at the receiving end. As long as we do not compress the files, the CPU usage stays small.

#Open two terminals, one for the source and another one for the target machine.

#On the target machine:
#Go to the directory, e.g., 
cd /data

#Run the following:
nc -l 9999 | tar xvfp -

#On the source machine:
#Go to the directory, e.g.,
cd /data

#Pick a port number that is not being used, e.g., 9999
tar -cf - . | nc target_machine 9999

Unlike rsync, the process starts right away, and the maximum speed is around 45 to 60MB/s in a gigabit network.
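
If the initial netcat seed does get interrupted, the follow-up is a plain rsync over the same paths, as mentioned above; it only transfers whatever is missing or differs. A minimal sketch, run on the target machine:

#Finish an interrupted netcat seed
rsync -av source_machine:/data/ /data/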

Conclusion

Candidate   Top Speed (w/o compression)   Top Speed (w/ compression)   Resume   Stability   Instant Start?
scp         40MB/s                        25MB/s                       No       Low         Instant
rsync       25MB/s                        50MB/s                       Yes      Medium      Long preparation
rsyncd      30MB/s                        70MB/s                       Yes      High        Short preparation
netcat      60MB/s (tar w/o -z)           40MB/s (tar w/ -z)           No       Very High   Instant

 

Choice   Command   Pros                                                     Cons
#1       scp       can be sped up by choosing a simpler cipher;
                   can recursively copy directories
#2       rsync     flexible and convenient for directory synchronization    possible, but not easy, to configure it NOT to use encryption
#3       sftp      can be sped up by choosing a simpler cipher              can't recursively copy directories

Notice: each of these tools consumes about 5-10% of CPU on both the sender and receiver machines, apparently for encryption/decryption.
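
You can verify that CPU figure yourself by sampling the transfer processes while a copy is running. A minimal sketch using pidstat from the sysstat package; the pgrep pattern is just an example, match whatever tool you are measuring:

#Report CPU usage of all ssh processes every 5 seconds
pidstat -u -p $(pgrep -d, ssh) 5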

TEST DETAILS

These are the actual commands and generated output by these tools.

rsync

default "-a" --archive mode
"-z" compress
rsync -a --progress DIR2REPLICATE root@10.10.10.2:/tmp
   411533312   0%   45.38MB/s    0:26:04
  1407877120   1%   44.41MB/s    0:26:16
  1716748288   2%   42.09MB/s    0:27:36
  2002550784   2%   46.47MB/s    0:24:54
  2382397440   3%   45.31MB/s    0:25:24
  2762407936   3%   45.34MB/s    0:25:15
rsync -az --progress DIR2REPLICATE root@10.10.10.2:/tmp
991383915 100%   13.67MB/s    0:01:09
   990955265 100%   14.02MB/s    0:01:07
   202624740 100%   15.42MB/s    0:00:12
   202771784 100%   15.87MB/s    0:00:12
    91676674 100%   12.86MB/s    0:00:06
    91628045 100%   11.76MB/s    0:00:07
  1082301721 100%   16.86MB/s    0:01:01
  1081744094 100%   17.14MB/s    0:01:00
   444531263 100%   13.06MB/s    0:00:32
   444311917 100%   12.97MB/s    0:00:32
    25956199 100%   11.99MB/s    0:00:02
    25387962 100%   16.94MB/s    0:00:01
    94059363 100%   15.51MB/s    0:00:05
    94189273 100%   14.61MB/s    0:00:06
   369550738 100%   16.31MB/s    0:00:21
   370924791 100%   15.96MB/s    0:00:22
   143659839 100%   14.75MB/s    0:00:09
   141681760 100%   14.58MB/s    0:00:09
    74662680 100%   14.45MB/s    0:00:04
    73882769 100%   12.73MB/s    0:00:05
     1809543 100%   13.59MB/s    0:00:00
###
### "-c arcfour" cipher is defined in RFC 4253; it is plain RC4 with a 128-bit key
###
rsync -a -P -e "ssh -T -c arcfour -o Compression=no -x" DIR2REPLICATE root@10.10.10.2:/tmp
  1081744094 100%   65.35MB/s    0:00:15
444531263 100%   56.34MB/s    0:00:07
444311917 100%   61.61MB/s    0:00:06
369550738 100%   53.94MB/s    0:00:06
370924791 100%   60.03MB/s    0:00:05
23319017231 100%   65.89MB/s    0:05:37
23308793162 100%   64.88MB/s    0:05:42
11951287020 100%   65.68MB/s    0:02:53
3453648896  28%   68.11MB/s    0:02:0

scp

default "-r" recursive
"-C" compress, "-r" recursive
scp -r DIR2REPLICATE root@10.10.10.2:/tmp
100%  193MB  64.4MB/s   00:03
100%  424MB  60.6MB/s   00:07
100%  945MB  63.0MB/s   00:15
100%  945MB  59.1MB/s   00:16
100% 1032MB  64.5MB/s   00:16
100% 1032MB  60.7MB/s   00:17
100%  749MB  53.5MB/s   00:14
100% 1253MB  62.6MB/s   00:20
18% 4615MB  62.6MB/s   05:18
scp -Cr DIR2REPLICATE root@10.10.10.2:/tmp
100%  193MB  16.1MB/s   00:12
100%  424MB  14.6MB/s   00:29
100%  945MB  15.0MB/s   01:03
100%  945MB  14.8MB/s   01:04
100%  424MB  14.1MB/s   00:30
100%  352MB  17.6MB/s   00:20
100%  193MB  17.6MB/s   00:11
100%  135MB  16.9MB/s   00:08
100% 1032MB  17.8MB/s   00:58
100% 1032MB  17.8MB/s   00:58
100%  354MB  17.7MB/s   00:20
100%  749MB  18.3MB/s   00:41
100% 1253MB  18.4MB/s   01:08
  6% 1518MB  17.7MB/s   21:43
"-c arcfour" cipher is defined in RFC 4253; it is plain RC4 with a 128-bit key.
scp -c arcfour -r DIR2REPLICATE root@10.10.10.2:/tmp
100%  424MB 141.3MB/s   00:03
100%  945MB 135.0MB/s   00:07
100%  945MB 189.1MB/s   00:05
100%  424MB 141.2MB/s   00:03
100%  352MB 117.5MB/s   00:03
100% 1032MB 147.4MB/s   00:07
100% 1032MB 147.5MB/s   00:07
100%  749MB 149.8MB/s   00:05
100% 1253MB 156.6MB/s   00:08
100%   24GB 142.0MB/s   02:53
100%  595MB 119.1MB/s   00:05
100%   82GB 138.3MB/s   10:09
51% 9099MB 141.3MB/s   01:01
sftp
default behavior
"-R" to increase request queue length (default is 64)
"-B" to increase read/write request size (default is 32 KB)
sftp  root@10.10.10.2:/tmp
10% 2363MB  57.8MB/s   05:43 ETA
15% 3349MB  58.1MB/s   05:25 ETA
32% 7311MB  59.3MB/s   04:11 ETA
35% 7803MB  60.6MB/s   03:58 ETA
43% 9594MB  62.1MB/s   03:23 ETA
69%   15GB  58.6MB/s   01:55 ETA
77%   17GB  62.1MB/s   01:20 ETA
sftp  -R 128 -B 65536 root@10.10.10.2:/tmp
  2%  551MB  58.9MB/s   06:08 ETA
  8% 1806MB  62.3MB/s   05:28 ETA
41% 9170MB  60.6MB/s   03:35 ETA
56%   12GB  62.6MB/s   02:32 ETA
100%   22GB  62.5MB/s   05:56
"-c arcfour" cipher is defined in RFC 4253; it is plain RC4 with a 128-bit key.
sftp -oCiphers=arcfour root@10.10.10.2:/tmp
3%  711MB 142.5MB/s   02:31 ETA
18% 4115MB 146.0MB/s   02:04 ETA
23% 5156MB 148.1MB/s   01:55 ETA
28% 6379MB 144.6MB/s   01:49 ETA
34% 7672MB 144.0MB/s   01:41 ETA
37% 8389MB 143.7MB/s   01:36 ETA
62%   14GB 143.8MB/s   00:58 ETA
85%   19GB 142.4MB/s   00:22 ETA
92%   20GB 142.3MB/s   00:12 ETA
100%   22GB 144.4MB/s   02:34

TEST ENVIRONMENT

The test was performed between two servers interconnected by a private 10 Gbit link with 9000 MTU "jumbo frames". The copied files were large (hundreds of GB) binary files.

iperf network bandwidth test between 10.10.10.2 and 10.10.10.1
Network interface configuration (10 Gbit, MTU 9000)
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
————————————————————
[  4] local 10.10.10.2 port 5001 connected with 10.10.10.1 port 57279
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  11.5 GBytes  9.89 Gbits/sec
[  5]  0.0-30.0 sec  34.6 GBytes  9.89 Gbits/sec
[  4]  0.0- 0.9 sec  1000 MBytes  9.80 Gbits/sec
[  5]  0.0- 8.8 sec  9.77 GBytes  9.53 Gbits/sec
[  4]  0.0- 8.7 sec  10.0 GBytes  9.89 Gbits/sec
[  5]  0.0-86.8 sec   100 GBytes  9.89 Gbits/sec
[root@10.10.10.2]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE="bond0"
BOOTPROTO=none
IPADDR=10.10.10.2
NETMASK=255.255.255.192
ONBOOT="yes"
USERCTL=no
TXQUEUELEN=100000
MTU=8912
BONDING_OPTS="mode=1 miimon=200 primary=eth0"

 

ALTERNATIVE METHODS TO MOVE DATA FAST

Copy directories via netcat: tar | nc. Renders speed ~251 MB/s (= ~1 TB/hr).

### On receiver ###
nc -v -l 5555 | tar -xvf -

### On Sender: test2del – large directory to move ###
time tar -cvf - test2del | nc -v 10.100.100.2 5555

### Output calculated ###
11GB in  27.465 s = 293 MB/s
42GB in 2m51.513s = 249 MB/s (~1 TB/hr)
42GB in 2m50.630s = 251 MB/s (~1 TB/hr)

Replacing IP Address in Apache2 config files with SED

Suppose I just mirrored my VPS machine with rsync (starting from a clone and then rsync-ing all needed files). Obviously I need to change the IP address contained in all the config files, but I'm lazy.
So let's use sed to do it all at once, with a single-line command.
I need to replace the IP address "192.168.100.5" with "192.168.100.4" in all files contained in /etc/apache2/*

Our command for one file should be:

$ sed -i 's/192.168.100.5/192.168.100.4/g' /etc/apache2/sites-available/default

We want to do it on a bunch of files that contain that string, and we want every file under the directory tree handled in place, so we run sed through a loop.
For this purpose we can use the Linux find utility and run sed within it. The command should look like this:

$ find /etc/apache2/sites-available/ -type f -exec sed -i 's/192\.168\.100\.5/192\.168\.100\.4/g' {} \;

And while we are here: if you built the backup machine from a clone of the production machine and you are maintaining the filesystem in sync, this is what you should do with the relevant files once you have finished syncing.

 

sed -i -r 's/192.168.1.35$/192.168.1.14/g'

find . -type f -exec sed -i 's/192\.168\.1\.35/192\.168\.1\.14/g' {} \;

rsync

There are many commands to copy a directory in Linux. The differences between them on a current Linux distribution are very small. All of them support links, times, ownership and sparse files.

I tested them by copying a Linux kernel source tree. I ran each command twice and kept the lower result.
The original directory size is 639660032 bytes. All methods generate the exact same size of 675446784 bytes without the sparse option.

        Non-sparse                           Sparse
rsync   rsync -a src /tmp                    rsync -a -S src /tmp
cpio    find src -depth | cpio -pdm /tmp     find src -depth | cpio -pdm --sparse /tmp
cp      cp -a --sparse=never src /tmp        cp -a --sparse=always src /tmp
tar     tar -c src | tar -x -C /tmp          tar -c -S src | tar -x -C /tmp
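
To confirm that two of these methods really produce identical copies, compare sizes and contents afterwards. A minimal sketch, assuming the source tree is ./src and the copy landed in /tmp/src:

#Apparent size in bytes (matches the byte counts quoted above)
du -sb src /tmp/src
#Byte-by-byte comparison; no output means the trees are identical
diff -r src /tmp/src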

SCP: Secure Copy

Secure Copy is just like the cp command, but secure. More importantly, it has the ability to send files to remote servers via SSH!

Copy a file to a remote server:

# Copy a file:
$ scp /path/to/source/file.ext username@hostname.com:/path/to/destination/file.ext

# Copy a directory:
$ scp -r /path/to/source/dir username@server-host.com:/path/to/destination

This will attempt to connect to hostname.com as user username. It will ask you for a password if there’s no SSH key setup (or if you don’t have a password-less SSH key setup between the two computers). If the connection is authenticated, the file will be copied to the remote server.

Since this works just like SSH (using SSH, in fact), we can add flags normally used with the SSH command as well. For example, you can add the -v and/or -vvv to get various levels of verbosity in output about the connection attempt and file transfer.

You can also use the -i (identity file) flag to specify an SSH identity file to use:

$ scp -i ~/.ssh/some_identity.pem /path/to/source/file.ext username@hostname:/path/to/destination/file.ext

Here are some other useful flags; a combined example follows the list:

  • -p (lowercase) – Preserves modification times, access times, and modes from the original file
  • -P – Choose an alternate port
  • -c (lowercase) – Choose a cipher other than the default AES-128 for encryption
  • -C – Compress files before copying, for faster upload speeds (already compressed files are not compressed further)
  • -l – Limit bandwidth used, in kilobits per second (8 bits to a byte!).
    • e.g. Limit to 50 KB/s: scp -l 400 ~/file.ext user@host.com:~/file.ext
  • -q – Quiet output
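
Several of these flags combine naturally. A minimal sketch; the port, bandwidth cap, and paths are placeholders:

# Preserve times/modes, connect on port 2222, compress, and cap bandwidth at ~100 KB/s (800 kilobits)
$ scp -p -P 2222 -C -l 800 ~/file.ext user@host.com:~/file.ext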

Rsync: Sync Files Across Hosts

Rsync is another secure way to transfer files. Rsync can detect file differences, giving it the opportunity to save bandwidth and time when transferring files.

Just like scp, rsync can use SSH to connect to remote hosts and send/receive files from them. The same (mostly) rules and SSH-related flags apply for rsync as well.

Copy files to a remote server:

# Copy a file
$ rsync /path/to/source/file.ext username@hostname.com:/path/to/destination/file.ext

# Copy a directory:
$ rsync -r /path/to/source/dir username@hostname.com:/path/to/destination/dir

To use a specific SSH identity file and/or SSH port, we need to do a little more work. We’ll use the -e flag, which lets us choose/modify the remote shell program used to send files.

# Send files over SSH on port 8888 using a specific identity file:
$ rsync -e 'ssh -p 8888 -i /home/username/.ssh/some_identity.pem' /source/file.ext username@hostname:/destination/file.ext

Here are some other common flags to use:

  • -v – Verbose output
  • -z – Compress files
  • -c – Compare files based on checksum instead of mod-time (create/modified timestamp) and size
  • -r – Recursive
  • -S – Handle sparse files efficiently
  • Symlinks:
    • -l – Copy symlinks as symlinks
    • -L – Transform symlink into referent file/dir (copy the actual file)
  • -p – Preserve permissions
  • -h – Output numbers in a human-readable format
  • --exclude="" – Files to exclude
    • e.g. Exclude the .git directory: --exclude=".git"

There are many other options as well – you can do a LOT with rsync!

Do a Dry-Run:

I often do a dry-run of rsync to preview what files will be copied over. This is useful for making sure your flags are correct and you won’t overwrite files you don’t wish to:

For this, we can use the -n or --dry-run flag:

# Copy the current directory
$ rsync -vzcrSLhp --dry-run ./ username@hostname.com:/var/www/some-site.com
#> building file list ... done
#> ... list of directories/files and some meta data here ...

Resume a Stalled Transfer:

Once in a while a large file transfer might stall or fail (while either using scp or rsync). We can actually use rsync to finish a file transfer!

For this, we can use the --partial flag, which tells rsync to keep partially transferred files rather than deleting them, and to attempt to finish the transfer on the next attempt:

$ rsync --partial --progress largefile.ext username@hostname:/path/to/largefile.ext

The Archive Option:

There’s also a -a or --archive option, which is a handy shortcut for the options -rlptgoD:

  • -r – Copy recursively
  • -l – Copy symlinks as symlinks
  • -p – Preserve permissions
  • -t – Preserve modification times
  • -g – Preserve group
  • -o – Preserve owner (User needs to have permission to change owner)
  • -D – Preserve special/device files. Same as --devices --specials. (User needs permissions to do so)
# Copy using the archive option and print some stats
$ rsync -a --stats /source/dir/path username@hostname:/destination/dir/path


1) technique

copy from source

tar -cf - /backup/ | pv | pigz | nc -l 8888

Destination

nc master.active.ai 8888 | pv | pigz -d | tar xf - -C /

2)
time tar -c /backup/ | pv | lz4 -B4 | ssh -c aes128-ctr root@192.168.1.73 "lz4 -d | tar -xC /backup"

3) copy files using netcat
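
No commands were recorded for this item; a minimal sketch of a plain netcat copy, assuming port 8888 and a /backup tree like the other examples:

#Receiver: listen and unpack at the filesystem root, matching the paths stored in the archive
nc -l 8888 | tar xf - -C /
#Sender: pack /backup and stream it
tar -cf - /backup/ | nc receiver_host 8888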

4) rsync

50 MB/s

rsync -aHAXWxv --numeric-ids --no-i-r --info=progress2 -e "ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x" /backup/ root@192.168.1.73:/backup/

time rsync -aHAXWxv --numeric-ids --no-i-r --info=progress2 -e "ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x" /backup/ root@192.168.1.73:/backup/


When copying to the local file system I always use the following rsync options:

# rsync -avhW --no-compress --progress /src/ /dst/

Here’s my reasoning:

-a is for archive, which preserves ownership, permissions etc.
-v is for verbose, so I can see what's happening (optional)
-h is for human-readable, so the transfer rate and file sizes are easier to read (optional)
-W is for copying whole files only, without delta-xfer algorithm which should reduce CPU load
--no-compress as there's no lack of bandwidth between local devices
--progress so I can see the progress of large files (optional)

70 MB/s
5) time tar cvf - /backup/* | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 "tar xf - -C /"

time tar cvf - /backup/* | pv | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 "tar xf - -C /"

time tar -cpSf - /backup/* | pv | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 "tar xf - -C /"

6)
tar cvf - ubuntu.iso | gzip -9 - | split -b 10M -d - ./disk/ubuntu.tar.gz.
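
To reassemble such a split archive on the other side, concatenate the chunks in order and reverse the pipeline. A minimal sketch, assuming the numbered chunks produced above in ./disk/:

cat ./disk/ubuntu.tar.gz.* | gzip -d | tar xvf -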



#!/bin/bash
# SETUP OPTIONS
export SRCDIR="/folder/path"
export DESTDIR="/folder2/path"
export THREADS="8"
# RSYNC DIRECTORY STRUCTURE
rsync -zr -f"+ */" -f"- *" $SRCDIR/ $DESTDIR/
# FOLLOWING MAYBE FASTER BUT NOT AS FLEXIBLE
# cd $SRCDIR; find . -type d -print0 | cpio -0pdm $DESTDIR/
# FIND ALL FILES AND PASS THEM TO MULTIPLE RSYNC PROCESSES
cd $SRCDIR  &&  find . ! -type d -print0 | xargs -0 -n1 -P$THREADS -I% rsync -az % $DESTDIR/%
# IF YOU WANT TO LIMIT THE IO PRIORITY, 
# PREPEND THE FOLLOWING TO THE rsync & cd/find COMMANDS ABOVE:
#   ionice -c2
rsync -zr -f"+ */" -f"- *" -e 'ssh -c arcfour' $SRCDIR/ remotehost:/$DESTDIR/ \
&& \
cd $SRCDIR  &&  find . ! -type d -print0 | xargs -0 -n1 -P$THREADS -I% rsync -az -e 'ssh -c arcfour' % remotehost:/$DESTDIR/% 

Parallelizing rsync

Last week I had a massive hardware failure on one of the GlusterFS storage nodes in the ILRI, Kenya Research Computing cluster: two drives failed simultaneously on the underlying RAID5. As RAID5 can only withstand one drive failure, the entire 31TB array was toast. FML.

After replacing the failed disks, rebuilding the array, and formatting my bricks, I decided I would use rsync to pre-seed my bricks from the good node before bringing glusterd back up.

tl;dr: rsync is amazing, but it’s single threaded and struggles when you tell it to sync large directory hierarchies. Here’s how you can speed it up.

rsync #fail

I figured syncing the brick hierarchy from the good node to the bad node was simple enough, so I stopped the glusterd service on the bad node and invoked:

# rsync -aAXv --delete --exclude=.glusterfs storage0:/path/to/bricks/homes/ storage1:/path/to/bricks/homes/

After a day or so I noticed I had only copied ~1.5TB (over 1 hop on a dedicated 10GbE switch!), and I realized something must be wrong. I attached to the rsync process with strace -p and saw a bunch of system calls in one particular user’s directory. I dug deeper:

# find /path/to/bricks/homes/ukenyatta/maker/genN_datastore/ -type d | wc -l
1398640

So this one particular directory in one user's home contained over a million other directories and $god knows how many files, and the find command above itself took several hours to finish! To make matters worse, careful trial-and-error inspection of other user home directories revealed more massive directory structures as well.

What we’ve learned:

  • rsync is single threaded
  • rsync generates a list of files to be synced before it starts the sync
  • MAKER creates a ton of output files/directories

It’s pretty clear (now) that a recursive rsync on my huge directory hierarchy is out of the question!

rsync #winning

I had a look around and saw lots of people complaining about rsync being “slow” and others suggesting tips to speed it up. One very promising strategy was described on this wiki and there’s a great discussion in the comments.

Basically, he describes a clever use of find and xargs to split up the problem set into smaller pieces that rsync can process more quickly.

sync_brick.sh

So here’s my adaptation of his script for the purpose of syncing failed GlusterFS bricks, sync_brick.sh:

#!/usr/bin/env bash
# borrowed / adapted from: https://wiki.ncsa.illinois.edu/display/~wglick/Parallel+Rsync

# RSYNC SETUP
RSYNC_PROG=/usr/bin/rsync
# note the important use of --relative to use relative paths so we don't have to specify the exact path on dest
RSYNC_OPTS="-aAXv --numeric-ids --progress --human-readable --delete --exclude=.glusterfs --relative"
export RSYNC_RSH="ssh -T -c arcfour -o Compression=no -x"

# ENV SETUP
SRCDIR=/path/to/good/brick
DESTDIR=/path/to/bad/brick
# Recommend to match # of CPUs
THREADS=4
BAD_NODE=server1

cd $SRCDIR

# COPY
# note the combination of -print0 and -0!
find . -mindepth 1 -maxdepth 1 -print0 | \
    xargs -0 -n1 -P$THREADS -I% \
        $RSYNC_PROG $RSYNC_OPTS "%" $BAD_NODE:$DESTDIR

Pay attention to the source/destination paths, the number of THREADS, and the BAD_NODE name, then you should be ready to roll.

The Magic, Explained

It’s a bit of magic, but here are the important parts:

  • The -aAXv options to rsync tell it to archive, preserve ACLs, and preserve eXtended attributes. Extended attributes are critically important in GlusterFS >= 3.3, and also if you’re using SELinux.
  • The --exclude=.glusterfs option to rsync tells it to ignore this directory at the root of the brick, as the self-heal daemon (glustershd) will rebuild it based on the files' extended attributes once we restart the glusterd service. (A quick way to spot-check those attributes follows this list.)
  • The --relative option to rsync is so we don’t have to bother constructing the destination path, as rsync will imply the path is relative to our destination’s top.
  • The RSYNC_RSH options influence rsync‘s use of SSH, basically telling it to use very weak encryption and disable any unnecessary features for non-interactive sessions (tty, X11, etc).
  • Using find with -mindepth 1 and -maxdepth 1 just means we concentrate on files/directories 1 level below each directory in our immediate hierarchy.
  • Using xargs with -n1 and -P tells it to use 1 argument per command line, and to launch $THREADS number of processes at a time.
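
Since extended attributes are what GlusterFS relies on to heal, it is worth spot-checking that they survived the sync. A minimal sketch using getfattr from the attr package; the brick path is a placeholder, run it on both nodes and compare the output:

# Dump all extended attributes of a file on the brick, hex-encoded
getfattr -m . -d -e hex /path/to/bricks/homes/somefile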

Migrating to Amazon Linux 2

AWS also announced that Amazon Linux 2018.03 is the last release for the current generation of Amazon Linux and will be supported until June 30, 2020. Therefore, you have to come up with a migration plan.

Amazon Linux 2 comes with the same benefits as Amazon Linux, but it adds some new capabilities:

  • long-term support: Amazon Linux 2 supports each LTS release for five years
  • on-premises support: virtual machine images for on-premises development and testing are available
  • systemd: replacing SysVinit
  • extras library: provides up-to-date versions of software bundles such as nginx

Let’s dive into some of the changes in more detail. At the end of the post, I will also outline some pitfalls I encountered when migrating our Free Templates for AWS CloudFormation to Amazon Linux 2.

Further reading: Release Notes, FAQs, AWS Blog Post, Announcement

Long-term support

Amazon Linux delivers a continuous flow of updates that allow you to roll from one version of the Amazon Linux AMI to the most recent. A yum update always moves your system to the latest Amazon Linux version. There were no fixed versions of Amazon Linux, only snapshots.

Amazon Linux 2 changes this. You will have Amazon Linux 2 versions that are supplied with updates for five years. Once a new Amazon Linux 2 LTS release becomes available, no breaking changes will be introduced by AWS for this release.

systemd

Amazon Linux uses SysVinit to bootstrap the Linux user space and to manage system processes after booting. This procedure is usually called init. One of the major drawbacks of SysVinit is that it starts tasks serially, waiting for each to finish loading before moving on to the next. This can result in long delays during boot.

Amazon Linux 2 uses systemd as the init system. systemd executes elements of its startup sequence in parallel, which is faster than the traditional serial approach from SysVinit. systemd can also ensure that a service is running (e.g., it restarts a service if it crashed).
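
For example, whether systemd restarts a crashed service is controlled by the unit's Restart= setting, which you can inspect without opening the unit file. A small sketch; sshd is just an example unit:

# Restart=on-failure means systemd relaunches the service whenever it exits with an error
systemctl show sshd.service -p Restart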

systemd is not just the name of the init system daemon but also refers to the entire software bundle around it, which includes:

  • journald: responsible for event logging (replaces syslog)
  • udevd: device manager for the Linux kernel, which handles the /dev directory and all user space actions when adding/removing devices
  • logind: manages user logins and seats in various ways.

I will not cover udevd and logind in this post; as a normal user you will rarely interact with them directly. Keep in mind that networking configuration is not controlled by networkd (also part of the systemd software bundle). Instead, networking is configured by cloud-init, which is triggered by systemd several times during boot. cloud-init handles the early initialization of an EC2 instance (and also works with other vendors).

Further reading: systemd man page

Reading logs from journald

To read all system logs (the journal, in journald terminology), starting with the oldest entry, run journalctl. The output is paged through less by default, which means you can scroll down/up one line with the DOWN/UP arrow keys, or a full page with the SPACE/b keys. Press q to quit. To reverse the order, run journalctl -r.

To show only the most recent journal entries, and continuously print new entries, run journalctl -f (like a tail -f).

There are many ways to filter the output. Based on priority, run journalctl -p err to get levels alert, crit, and err (using syslog log levels). Based on the unit, run journalctl -u sshd to get all entries for sshd. Check the further reading links for more information.
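
These filters can be combined in one call. A small sketch; sshd is just an example unit:

# Follow error-level (and worse) messages from sshd since the current boot
journalctl -u sshd -p err -b -f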

Keep in mind that some applications still write logs to /var/log. Journald also forwards logs to rsyslog which is configured (/etc/rsyslog.conf) to write some of them to files in /var/log.

Further reading: journalctl man page

Controlling systemd services

To start a service (unit in systemd terminology), you run:

systemctl start awslogsd.service

To make sure a service (unit in systemd terminology) is started during boot/reboot, you run:

systemctl enable awslogsd.service

There are many other commands. E.g., you can also reboot the system:

systemctl reboot

Further reading: systemctl man page

Extras Library

The Extras Library (aka Amazon Linux Extras Repository or Extras mechanism), provides a way to install up-to-date software bundles (topics in Amazon Linux 2 terminology) without impacting the stability of the rest of the operating system.

Extras Library is not covered by LTS!

To get a list of available topics, run:

$ amazon-linux-extras list
  0  ansible2                 available  [ =2.4.2 ]
  1  emacs                    available  [ =25.3 ]
  2  memcached1.5             available  [ =1.5.1 ]
  3  nginx1.12                available  [ =1.12.2 ]
  4  postgresql9.6            available  [ =9.6.6  =9.6.8 ]
  5  python3                  available  [ =3.6.2 ]
  6  redis4.0                 available  [ =4.0.5 ]
  7  R3.4                     available  [ =3.4.3 ]
  8  rust1                    available  [ =1.22.1  =1.26.0 ]
  9  vim                      available  [ =8.0 ]
 10  golang1.9                available  [ =1.9.2 ]
 11  ruby2.4                  available  [ =2.4.2  =2.4.4 ]
 12  nano                     available  [ =2.9.1 ]
 13  php7.2                   available  [ =7.2.0  =7.2.4  =7.2.5 ]
 14  lamp-mariadb10.2-php7.2  available  [ =10.2.10_7.2.0  =10.2.10_7.2.4  =10.2.10_7.2.5 ]
 15  libreoffice              available  [ =5.0.6.2_15 ]
 16  gimp                     available  [ =2.8.22 ]
 17  docker=latest            enabled    [ =17.12.1  =18.03.1 ]
 18  mate-desktop1.x          available  [ =1.19.0  =1.20.0 ]
 19  GraphicsMagick1.3        available  [ =1.3.29 ]
 20  tomcat8.5                available  [ =8.5.31 ]

To install a topic, run amazon-linux-extras install <topic> (e.g., amazon-linux-extras install ruby2.4).

If you install (or only enable) a topic, a new repository (plus two for sources and debuginfo) is configured in /etc/yum.repos.d/amzn2-extras.repo.

Pitfalls

I migrated Free Templates for AWS CloudFormation to Amazon Linux 2. In the following, I will outline the problems I was faced with and how I worked around them.

The awslogs agent was renamed

The awslogs agent was renamed to awslogsd but you still install it via yum install awslogs.

You can start (activate in systemd terminology) awslogs with systemctl start awslogsd.service (shortcut: systemctl start awslogsd).

The awslogs agent does not support journald

The awslogs agent cannot read logs directly from the journal. journald forwards all logs to rsyslog, which is configured (/etc/rsyslog.conf) to write some of them to files in /var/log, from where the awslogs agent can pick them up.

Where are the log files?

/var/log does not contain all system logs anymore.

If in doubt, you can access all system logs with journalctl.

Ruby is missing

Ruby is no longer installed by default. This breaks cfn-init if you want to install RubyGems.

You can install Ruby 2.0 with yum install ruby or Ruby 2.4 with amazon-linux-extras install ruby2.4.

netcat is missing

netcat (or nc) is no longer installed by default.

You can install ncat with yum install nmap-ncat, but this installs the nmap-based ncat, which behaves differently (e.g., there is no -z flag anymore).
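
If you only used nc -z for quick port checks, bash itself can stand in for it. A minimal sketch using bash's built-in /dev/tcp redirection; host and port are placeholders:

# Exit 0 (prints "open") if something accepts connections on port 22
timeout 1 bash -c '</dev/tcp/example-host/22' && echo open || echo closed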

Nginx package not available by default

nginx is no longer part of the default repository.

$ yum install nginx
Failed to set locale, defaulting to C
Loaded plugins: langpacks, update-motd
No package nginx available.
Error: Nothing to do

To install nginx, use the new Amazon Linux Extras Repository amazon-linux-extras install nginx1.12.

EPEL repository is missing

The EPEL repository (Extra Packages for Enterprise Linux) is no longer installed by default or available to install. The Extras Library replaces the EPEL repository but contains only a fraction of its packages, which may cause trouble during your migration.

NAT and ECS optimized AMIs are missing

NAT and ECS optimized AMIs are not available. You can replace your NAT instances with NAT Gateways to get around this problem, but for ECS workloads there is no easy workaround. I advise waiting for news from AWS regarding the ECS optimized AMI.

cfn-init is not integrated with the Extras Library

You cannot easily install packages from the Extras Library with the package mechanism in cfn-init. cfn-init is how you install software onto EC2 instances managed by CloudFormation.

You can either run amazon-linux-extras enable <topic> before running cfn-init, which can then install the package using the package mechanism. Or you can use two config sets: the first config set uses the command mechanism to enable the topic, and the second config set uses the package mechanism to install the now-enabled package. You need two config sets because commands normally run after package installation. Here is an example:

AutoScalingGroup:
  Type: 'AWS::AutoScaling::AutoScalingGroup'
  Properties:
    # [...]
LaunchConfiguration:
  Type: 'AWS::AutoScaling::LaunchConfiguration'
  Metadata:
    'AWS::CloudFormation::Init':
      configSets:
        default: [extras, config]
      extras:
        commands:
          a_enable_nginx:
            command: 'amazon-linux-extras enable nginx1.12'
            test: "[ ! grep -Fxq '[amzn2extra-nginx1.12]' /etc/yum.repos.d/amzn2-extras.repo ]"
      config:
        packages:
          yum:
            nginx: [] # will install nginx1.12
  Properties:
    # [...]
    UserData:
      'Fn::Base64': !Sub |
        #!/bin/bash -x
        /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource LaunchConfiguration --region ${AWS::Region}
        /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource AutoScalingGroup --region ${AWS::Region}

Summary

Amazon Linux 2 is the new default for running Linux workloads on AWS. Amazon Linux 2 benefits from systemd, LTS, and the new Extras Library. There are a few pain points when migrating, most notably the missing EPEL repository. Besides that, you should spend some time understanding how systemd works, because it is central to modern Linux operating systems.

It’s time to plan your migration from Amazon Linux now!

mysql dump issue utf8_unicode_520_ci

I use this in Linux:

sed -i 's/utf8mb4/utf8/g' your_file.sql
sed -i 's/utf8_unicode_ci/utf8_general_ci/g' your_file.sql
sed -i 's/utf8_unicode_520_ci/utf8_general_ci/g' your_file.sql

then restore your_file.sql

mysql -uyourdbuser -pyourdbpasswd yourdb < your_file.sql
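
If you prefer not to modify the dump on disk, the same substitutions can be applied on the fly while restoring. A minimal sketch; database name and credentials are placeholders:

sed -e 's/utf8mb4/utf8/g' -e 's/utf8_unicode_520_ci/utf8_general_ci/g' -e 's/utf8_unicode_ci/utf8_general_ci/g' your_file.sql | mysql -uyourdbuser -p yourdb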

Install LDAP-slapd.conf mode

Query whether to install

# rpm -qa openldap-servers

Remove ldap
# yum remove openldap

# yum remove openldap-servers

Installation Environment
Centos7
Apache/2.4.6 (CentOS)
PHP 7.1.11

Install LDAP Server

Openldap-servers-2.4.44-5.el7.x86_64

# yum install openldap-servers openldap-clients migrationtools

# rpm -qa | grep openldap

Delete all files in the slapd.d folder and recreate it empty
# rm -rvf /etc/openldap/slapd.d
# mkdir /etc/openldap/slapd.d

Generate password

# /sbin/slappasswd
New password:
Re-enter new password:

{SSHA}XXXXXXXXXXXXXXXXX….

Ldap-server configuration file
# cp /usr/share/openldap-servers/slapd.ldif /etc/openldap/slapd.conf

Modify the slapd.conf file
# vi /etc/openldap/slapd.conf

Put

olcAccess: to * by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" read by dn.base="cn=Manager,dc=my-domain,dc=com" read by * none
olcSuffix: dc=my-domain,dc=com
olcRootDN: cn=Manager,dc=my-domain,dc=com

Change to

olcAccess: to * by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" read by dn.base="cn=Manager,dc=rmohan,dc=com" read by * none
olcSuffix: dc=rmohan,dc=com
olcRootDN: cn=Manager,dc=rmohan,dc=com
olcRootPW: {SSHA}XXXXXXXXXXXXXXXXXXXXX

(for olcRootPW, paste the {SSHA} hash generated by slappasswd above)

Start converting configuration files
# rm -rvf /var/lib/ldap/*
# /usr/libexec/openldap/convert-config.sh
# head -20 /etc/openldap/slapd.d/cn\=config/olcDatabase\=\{[12]}*

Test if the LDAP configuration file is normal
# slaptest -u
Config file testing succeeded

Clear the database sample files and copy the default database profile
# cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG

# rm -rvf /etc/openldap/certs
# mkdir /etc/openldap/certs

Create a certs DB profile
# /usr/libexec/openldap/create-certdb.sh
Creating certificate database in ‘/etc/openldap/certs’.
# /usr/libexec/openldap/generate-server-cert.sh
Creating new server certificate in ‘/etc/openldap/certs’.
# chown ldap:ldap -R /var/lib/ldap/
# systemctl start slapd
# slaptest
Config file testing succeeded
# systemctl enable slapd
# firewall-cmd --permanent --zone=public --add-port=389/tcp
# firewall-cmd --reload
# ldapwhoami -WD cn=Manager,dc=rmohan,dc=com
Enter LDAP Password:
dn: cn=Manager,dc=rmohan,dc=com

Import schema
# ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/cosine.ldif
# ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/nis.ldif
# ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/inetorgperson.ldif

Edit root node

# vim base.ldif
dn: dc=rmohan,dc=com
objectClass: top
objectClass: dcObject
objectClass: organization
o: rmohan
dc: rmohan

dn: cn=Manager,dc=rmohan,dc=com
objectClass: organizationalRole
cn: Manager
description: Directory Manager

dn: ou=STU,dc=rmohan,dc=com
objectClass: organizationalUnit
ou: STU
description: student

dn: ou=TEA,dc=rmohan,dc=com
objectClass: organizationalUnit
ou: TEA
description: teacher

# /bin/ldapadd -x -D "cn=Manager,dc=rmohan,dc=com" -W -f base.ldif
Enter LDAP Password:
adding new entry "dc=rmohan,dc=com"
adding new entry "cn=Manager,dc=rmohan,dc=com"
adding new entry "ou=STU,dc=rmohan,dc=com"
adding new entry "ou=TEA,dc=rmohan,dc=com"

Firewall settings

# /bin/firewall-cmd --permanent --add-service=ldap
# /bin/firewall-cmd --reload

How to reset your root MySQL password

Stop the MySQL process
# service mysqld stop

Once MySQL has stopped, restart it with the --skip-grant-tables option
# mysqld_safe --skip-grant-tables &
or edit your /etc/my.cnf file to add the line
skip-grant-tables

Connect to MySQL using the root user.
mysql -u root

Once logged in, you should see the following prompt:
mysql>
Enter the following commands:
mysql> use mysql;
mysql> UPDATE user SET password=PASSWORD("YOUR NEW PASSWORD HERE") WHERE User='root';
mysql> flush privileges;
mysql> quit

Now stop MySQL again:
# service mysqld stop

If you edited your /etc/my.cnf file, delete the skip-grant-tables line.

Now restart MySQL and test your new login.
# service mysqld restart
# mysql -u root -p


 

MariaDB [(none)]> CREATE USER xxxx;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> CREATE USER xxxx@hostname ;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT ALL PRIVILEGES ON dbname.* TO 'xxxx'@'hostname' IDENTIFIED BY 'P@ssw0rd5768#';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> FLUSH PRIVILEGES;

GRANT ALL PRIVILEGES ON db.* TO 'usernam'@'localhost' IDENTIFIED BY 'P@ssw0rd5768#';
GRANT USAGE ON *.* TO 'usernam'@'localhost' IDENTIFIED BY 'P@ssw0rd5768#';

 

Reset root password

UPDATE user SET authentication_string=password('password') WHERE user='root';

update mysql.user set password_expired = 'N', authentication_string=PASSWORD('password') where user = 'root';

GRANT USAGE ON *.* TO 'root'@'localhost' IDENTIFIED BY 'password';

Reset SonarQube Admin User Password

By default sonar creates an admin account: user: admin, password: admin

Start root access to mysql database

mysql -uroot -p

Check if the sonar database exists.

show databases;
use sonar;

Reset admin user to admin password.

update users set crypted_password = '88c991e39bb88b94178123a849606905ebf440f5', salt='6522f3c5007ae910ad690bb1bdbf264a34884c6d' where login = 'admin';

git cmd

git pull # get updates from the remote repository
git add . # stage files in your current directory
git add [dirname]/* # stage files under a new directory
git commit -m "your comment" # commit your staged changes
git push # push your commits to the repository
git pull # pull down new repository updates

Git Fundamentals


Topics To be Covered

  1. Installation
  2. Setup
  3. Creating a Project
  4. Checking the status of the repository
  5. Making changes
  6. Staging the changes
  7. Staging and committing
  8. Committing the changes
  9. Changes, not files
  10. History
  11. Aliases
  12. Getting older versions
  13. Tagging versions
  14. Discarding local changes (before staging)
  15. Cancel staged changes (before committing)
  16. Cancelling commits


1. Installing Git

Installing on Linux

If you want to install the basic Git tools on Linux via a binary installer, you can generally do so through the basic package-management tool that comes with your distribution. If you’re on Fedora for example, you can use yum:

$ sudo yum install git-all

If you’re on a Debian-based distribution like Ubuntu, try apt-get:

$ sudo apt-get install git-all

For more options, there are instructions for installing on several different Unix flavors on the Git website, at http://git-scm.com/download/linux.

Installing on Mac

There are several ways to install Git on a Mac. The easiest is probably to install the Xcode Command Line Tools. On Mavericks (10.9) or above you can do this simply by trying to run git from the Terminal the very first time. If you don’t have it installed already, it will prompt you to install it.

If you want a more up-to-date version, you can also install it via a binary installer. An OS X Git installer is maintained and available for download at the Git website, at https://sourceforge.net/projects/git-osx-installer/

You can also install it as part of the GitHub for Mac install. Their GUI Git tool has an option to install command line tools as well. You can download that tool from the GitHub for Mac website, at http://mac.github.com.

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to https://git-for-windows.github.io/

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We’ll learn more about those things a little later, but suffice it to say they’re things you want. You can download this from the GitHub for Windows website, at http://windows.github.com

2. One time Git Environment Setup

Now that you have Git on your system, you’ll want to do a few things to customize your Git environment. You should have to do these things only once; they’ll stick around between upgrades. You can also change them at any time by running through the commands again.

Git comes with a tool called git config that lets you get and set configuration variables that control all aspects of how Git looks and operates. These variables can live at the system, global (per-user), or repository level; set your global identity and editor once:

$ git config --global user.name "Aravind G V"

$ git config --global user.email aravind_gv@intuit.com

$ git config --global core.editor "edit -w"

3. Creating a Project

Create a new repository

Create a new folder (or use an existing folder if you want to add it to version control):

mkdir training
cd training
touch hello.txt

Create a repository

So you have a directory that contains one file. Run git init to create a git repo from that directory.

RUN:

git init

RESULT:

$ git init
Initialized empty Git repository in /Users/agv/git-training

This creates a new subdirectory named .git that contains all of your necessary repository files, a Git repository skeleton. At this point, nothing in your project is tracked yet.

Add the page to the repository

Now let's add the hello.txt file to the repository.

RUN:

git add hello.txt
git commit -m "First Commit"

You will see …

RESULT:

$ git add hello.txt
$ git commit -m "First Commit"
[master (root-commit) 2fc4372] First Commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 hello.txt

Checking the status of the repository

Use the git status command, to check the current state of the repository.

RUN:

git status

RESULT:

$ git status
# On branch master
nothing to commit (working directory clean)

The command checks the status and reports that there’s nothing to commit, meaning the repository stores the current state of the working directory, and there are no changes to record.

We will use the git status, to keep monitoring the states of both the working directory and the repository.

Making changes

Let's add something to the text file:

vi hello.txt
Test status

Checking the status

Check the working directory’s status.

RUN:

git status

You will see …

RESULT:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
modified:   hello.txt
no changes added to commit (use "git add" and/or "git commit -a")

The first important aspect here is that git knows hello.txt file has been changed, but these changes are not yet committed to the repository.

Another aspect is that the status message hints about what to do next. If you want to add these changes to the repository, use git add. To undo the changes use git checkout.

6. Staging the changes

Adding changes

Now tell git to stage the changes, and check the status:

RUN:

git add hello.txt
git status

You will see …

RESULT:

$ git add hello.txt
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   hello.txt
#

Changes to the hello.txt have been staged. This means that git knows about the change, but it is not permanent in the repository. The next commit will include the changes staged.

Should you decide not to commit the change, the status command will remind you that you can use the git reset command to unstage these changes.

Staging and committing

A staging step in git allows you to continue making changes to the working directory and, when you decide you want to interact with version control, to record changes in small commits.

Suppose you have edited three files (a.html, b.html, and c.html). You want to commit all the changes so that the changes to a.html and b.html form a single commit, while the change to c.html, which is not logically associated with the first two, goes into a separate commit.

In theory you can do the following:

touch a.html b.html c.html
git add a.html
git add b.html
git commit -m "Changes for a and b"
git add c.html
git commit -m "Unrelated change to c"

By separating staging and committing, you get the chance to easily customize what goes into a commit.

8. Committing the changes

Well, enough about staging. Let’s commit the staged changes to the repository.

When you previously used git commit for committing the first version to the repository, you included the -m flag that gives a comment on the command line. The commit command also allows editing comments interactively. Now let's see how that works.

If you omit the -m flag from the command line, git will pop you into the editor of your choice from the list (in order of priority):

  • GIT_EDITOR environment variable
  • core.editor configuration setting
  • VISUAL environment variable
  • EDITOR environment variable

I have the EDITOR variable set to emacsclient (available for Linux and Mac).

Let us commit now and check the status.

RUN:

git commit

You will see the following in your editor:

RESULT:

|
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   hello.html
#

On the first line, enter the comment: "Added h1 tag". Save the file and exit the editor (in the default editor, press ESC, then type :wq and hit Enter). You should see …

RESULT:

git commit
Waiting for Emacs...
[master 569aa96] Added h1 tag
 1 files changed, 1 insertions(+), 1 deletions(-)

"Waiting for Emacs…" comes from the emacsclient program sending the file to a running emacs instance and waiting for it to be closed. The rest is the standard commit output.

Checking the status

At the end let us check the status.

RUN:

git status

You will see …

RESULT:

$ git status
# On branch master
nothing to commit (working directory clean)

The working directory is clean; you can continue working.

Changes, not files

Understanding that git works with the changes, not the files.

Most version control systems work with files. You add the file to source control and the system tracks changes from that moment on.

Git concentrates on the changes to a file, not the file itself. A git add file command does not tell git to add the file to the repository, but to note the current state of the file so that it can be committed later.

We will try to investigate the difference in this lesson.

First Change:

FILE: HELLO.txt

First Change

Add this change

Now add this change to the git staging.

RUN:

git add hello.txt

Second change:

FILE: HELLO.txt

Second Change

Check the current status

RUN:

git status

You will see …

RESULT:

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   hello.txt
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   hello.txt
#

Please note that hello.txt is listed in the status twice. The first change is staged and ready for a commit. The second change is unstaged. If you were making a commit right now, the second change would not be saved to the repository.

Let’s check.

Commit

Commit the staged changes (default values), then check the status one more time.

RUN:

git commit -m "Added second "
git status

You will see …

RESULT:

$ git commit -m "Added second"
[master 8c32287] Added second
 1 files changed, 3 insertions(+), 1 deletions(-)
$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   hello.txt
#
no changes added to commit (use "git add" and/or "git commit -a")

The status command suggests that hello.txt has unrecorded changes, but is no longer in the staging area.

Adding the second change

Add the second change to the staging area, after that run the git status command.

RUN:

git add .
git status

Note: The current directory ('.') will be our file to add. This is the most convenient way to add all the changes to the files of the current directory and its subfolders. But since it adds everything, it is a good idea to check the status prior to doing an add ., to make sure you don't add any file that should not be added.

I wanted you to see the "add ." trick; we will continue adding explicit files later on, just in case.

You will see …

RESULT:

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   hello.html
#

The second change has been staged and is ready for a commit.
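
By the way, if you are ever unsure what git add . would pick up, git can tell you without staging anything. A small sketch:

RUN:

git add --dry-run .

This lists the files that would be staged, without actually staging them.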

Commit the second change

RUN:

git commit -m "Added second change"

History

To learn to view the project’s history.

Getting a list of changes made is a function of the git log command.

RUN:

git log

You will see …

RESULT:

commit ef44671e9b7aef27027f9d2b438e366d3133102b
Author: Aravind G V <aravind_g@...>
Date: Wed Apr 6 13:09:55 2016 +0530

XIADMIN windows scripts

commit 70d87e94e7937f8a6ce89d4cd6001d99abfb4e77
Author: Neelam Malik <Neelam_Malik@...>
Date: Wed Apr 6 11:21:54 2016 +0530

added app_stop

commit 77ac1140018714a6b26c27535c6279eff40c87e9
Author: Neelam Malik <aravind_gv@...>
Date: Wed Apr 6 11:21:32 2016 +0530

added app_status

commit 36fb85091deabfe3e590a101aa1a1aca1adfa8c2
Author: Neelam Malik <...>
Date: Wed Apr 6 11:21:04 2016 +0530

added app_start

added app_stop

commit e4be69a36c3a37203e2b539f97eb77b3f253fe99
Author: Neelam Malik <aravind_gv@...>
Date: Wed Apr 6 11:18:42 2016 +0530

One line history

You have full control over what the log shows. I like the single-line format:

RUN:

git log --pretty=oneline

You will see …

RESULT:

$ git log --pretty=oneline
fa3c1411aa09441695a9e645d4371e8d749da1dc Added HTML header
8c3228730ed03116815a5cc682e8105e7d981928 Added standard HTML page tags
43628f779cb333dd30d78186499f93638107f70b Added h1 tag
911e8c91caeab8d30ad16d56746cbd6eef72dc4c First Commit

Controlling the display of entries

There are many options to choose which entries appear in the log. Play around with the following parameters:

git log --pretty=oneline --max-count=2
git log --pretty=oneline --since='5 minutes ago'
git log --pretty=oneline --until='5 minutes ago'
git log --pretty=oneline --author=<your name>
git log --pretty=oneline --all

Details are provided in the git-log documentation (man git-log).

Getting fancy

This is what I use to review the changes made within the last week. I will add --author=alex if I want to see only the changes made by me.

git log --all --pretty=format:"%h %cd %s (%an)" --since='7 days ago'

The ultimate format of the log

Over time, I found the following log format to be the most suitable.

RUN:

git log --pretty=format:"%h %ad | %s%d [%an]" --graph --date=short

It looks like this:

RESULT:

$ git log --pretty=format:"%h %ad | %s%d [%an]" --graph --date=short
* fa3c141 2011-03-09 | Added HTML header (HEAD, master) [Alexander Shvets]
* 8c32287 2011-03-09 | Added standard HTML page tags [Alexander Shvets]
* 43628f7 2011-03-09 | Added h1 tag [Alexander Shvets]
* 911e8c9 2011-03-09 | First Commit [Alexander Shvets]

Let’s look at it in detail:

  • --pretty="…" defines the output format.
  • %h is the abbreviated hash of the commit
  • %d commit decorations (e.g. branch heads or tags)
  • %ad is the commit date
  • %s is the comment
  • %an is the name of the author
  • --graph tells git to display the commit tree in the form of an ASCII graph layout
  • --date=short keeps the date format short and nice

So, every time you want to see the log, you’ll have to do a lot of typing. Fortunately, we will learn about git aliases in the next lesson.

Other tools

Both gitx (for Mac) and gitk (for any platform) can help you explore the log history.

Aliases

Command aliases (optional)

Common aliases

For Windows users:

RUN:

git config --global alias.co checkout
git config --global alias.ci commit
git config --global alias.st status
git config --global alias.br branch
git config --global alias.hist 'log --pretty=format:"%h %ad | %s%d [%an]" --graph --date=short'
git config --global alias.type 'cat-file -t'
git config --global alias.dump 'cat-file -p'

Also, for users of Unix/Mac:

git status, git add, git commit, and git checkout are such common commands that it is a good idea to have abbreviations for them.

Add the following to the .gitconfig file in your $HOME directory.

.GITCONFIG

[alias]
  co = checkout
  ci = commit
  st = status
  br = branch
  hist = log --pretty=format:\"%h %ad | %s%d [%an]\" --graph --date=short
  type = cat-file -t
  dump = cat-file -p

We’ve already talked about the commit and status commands, we covered the log command in the previous lesson, and we will get to know the checkout command very soon. The most important thing to learn from this lesson is that you can type git st wherever you previously had to type git status. Best of all, the git hist command will save you from the really long log command.

Go ahead and try using the new commands.
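For example, the following should now behave exactly like git status and the long log command; a quick check:

RUN:

git st
git hist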

If your shell supports aliases, or shortcuts, you can add aliases on this level, too. I use:

alias gs='git status '
alias ga='git add '
alias gb='git branch '
alias gc='git commit'
alias gd='git diff'
alias go='git checkout '
alias gk='gitk --all&'
alias gx='gitx --all'
alias got='git '
alias get='git '

The go abbreviation for git checkout is very useful, allowing me to type:

go <branch>

to check out a particular branch.

Also, I often mistype git as get or got so I created aliases for them too.

Getting older versions

To learn how to checkout any previous snapshot into the working directory.

Going back in history is very simple. The checkout command can copy any snapshot from the repo to the working directory.

Getting hashes for the previous versions

RUN:

git hist

Note: Do not forget to define hist in your .gitconfig file. If you do not remember how, review the lesson on aliases.

RESULT:

$ git hist
* 7358571 2016-06-23 | Second Test (HEAD -> master) [Aravind G V]
* 2fc4372 2016-06-23 | First Commit [Aravind G V]

Check the log data and find the hash for the first commit. You will find it in the last line of the git hist output. Use that hash (its first 7 characters are enough) in the command below. After that, check the contents of the hello.txt file.

RUN:

git checkout <hash>
cat hello.txt

Note: Many commands depend on hash values in the repository. Since my hash values will be different from yours, substitute the appropriate hash value for your repository every time you see <hash> or <treehash> in a command.

You will see …

RESULT:

$ git checkout 911e8c9
Note: checking out '911e8c9'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
  git checkout -b new_branch_name
HEAD is now at 911e8c9... First Commit
$ cat hello.txt
Hello, World

The output of the checkout command explains the situation clearly. Older git versions will complain about not being on a local branch, but you don’t need to worry about that right now.

Note that the content of the hello.txt file is the default content.

Returning to the latest version in the master branch

RUN:

git checkout master
cat hello.txt

‘master’ is the name of the default branch. By checking out a branch by name, you go to its latest version.

Tagging versions

To learn how to tag commits for future references

Let’s call the current version of the hello program version 1 (v1).

Creating a tag for the first version

RUN:

git tag v1

Now, the current version of the page is referred to as v1.

Tags for previous versions

Let’s tag the version prior to the current version with the name v1-beta. First of all, we will check out the previous version. Instead of looking up the hash, we are going to use the ^ notation, indicating “the parent of v1”.

If the v1^ notation causes trouble, try using v1~1, which references the same version. This notation means “the first version prior to v1”.
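Here are a few more examples of this relative notation; all of these are standard git revision syntax:

git checkout v1^     #the first parent of v1
git checkout v1~1    #the same commit: one step back from v1
git checkout v1~2    #two commits before v1
git checkout v1^^    #also two commits back, via the parent of the parent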

RUN:

git checkout v1^
cat hello.txt

RESULT:

$ git checkout v1^
Note: checking out 'v1^'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
  git checkout -b new_branch_name
HEAD is now at 8c32287... Added standard HTML page tags
$ cat hello.txt

RUN:

git tag v1-beta

Check out by the tag name

Now try switching back and forth between the two tagged versions.

RUN:

git checkout v1
git checkout v1-beta

RESULT:

$ git checkout v1
Previous HEAD position was 8c32287... Added standard HTML page tags
HEAD is now at fa3c141... Added HTML header
$ git checkout v1-beta
Previous HEAD position was fa3c141... Added HTML header
HEAD is now at 8c32287... Added standard HTML page tags

Viewing tags with the tag command

You can see the available tags using the git tag command.

RUN:

git tag

RESULT:

$ git tag
v1
v1-beta
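The tags we created here are lightweight tags. Git also supports annotated tags, which additionally store a message, the tagger, and a date; a minimal sketch (the tag name and message are just examples):

git tag -a v1.1 -m "Version 1.1 of the hello page"

Annotated tags show up in the git tag output just like lightweight ones.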

Viewing tags in logs

You can also check for tags in the log.

RUN:

git hist master --all

RESULT:

$ git hist master --all
* fa3c141 2011-03-09 | Added HTML header (v1, master) [Alexander Shvets]
* 8c32287 2011-03-09 | Added standard HTML page tags (HEAD, v1-beta) [Alexander Shvets]
* 43628f7 2011-03-09 | Added h1 tag [Alexander Shvets]
* 911e8c9 2011-03-09 | First Commit [Alexander Shvets]

You can see tags (v1 and v1-beta) listed in the log together with the name of the branch (master). The HEAD shows the commit you checked out (currently v1-beta).

Discarding local changes (before staging)

To learn how to discard the working directory changes

Checking out the Master branch

Make sure you are on the latest commit in the master branch before you continue.

RUN:

git checkout master

Change hello.txt

It happens that you modify a file in your local working directory and then wish to simply discard those changes. This is where the checkout command will help you.

Make changes to the hello.txt file in the form of an unwanted line.

FILE: HELLO.txt

test-1
test-2
no need of this line

Check the status

First of all, check the working directory’s status.

RUN:

git status

Undoing the changes in the working directory

Use the checkout command in order to check out the repository’s version of the hello.txt file.

RUN:

git checkout hello.txt
git status
cat hello.txt
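On newer git versions (2.23 and later), the same thing can be done with git restore, which is less overloaded than checkout; a sketch, assuming the same file:

git restore hello.txt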

Cancel Staged changes (before committing)

To learn how to undo changes that have been staged

Edit file and stage changes

Make changes to the hello.txt file in the form of an unwanted line.

FILE: HELLO.txt

test-1
test-2
no need of this line

Stage the modified file.

RUN:

git add hello.txt

Check the status

Check the status of the unwanted changes.

RUN:

git status

RESULT:

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   hello.txt
#

Status shows that the change has been staged and is ready to commit.

Reset the buffer zone

Fortunately, the displayed status shows us exactly what we should do to cancel staged changes.

RUN:

git reset HEAD hello.txt

RESULT:

$ git reset HEAD hello.txt
Unstaged changes after reset:
M   hello.txt

The reset command resets the buffer zone to HEAD. This clears the buffer zone of the changes we have just staged.

By default, the reset command does not change the working directory, so the working directory still contains the unwanted line. We can use the checkout command from the previous section to remove the unwanted change from the working directory.
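On git 2.23 and later, the same unstaging can also be done with git restore; a sketch:

git restore --staged hello.txt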

Switch to the committed version

RUN:

git checkout hello.txt
git status

RESULT:

$ git status
# On branch master
nothing to commit (working directory clean)

Our working directory is clean again.

Removing a commit from a branch

To learn to delete the branch’s latest commits

The revert command from the previous section cancels a commit by creating a new one, so both the original and the cancelling commit are visible in the branch history (when using the git log command).

Often, after a commit is made, we realize it was a mistake. It would be nice to have an undo command that removes the incorrect commit immediately, so the unwanted commit never appears in the git log history.

The reset command

We have already used reset command to match the buffer zone and the selected commit (HEAD commit was used in the previous lesson).

When a commit reference is given (i.e., a branch, hash, or tag name), the reset command will do the following (see the sketch after this list):

  1. Overwrite the current branch so it will point to the correct commit
  2. Optionally reset the buffer zone so it will comply with the specified commit
  3. Optionally reset the working directory so it will match the specified commit
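How far the reset goes is controlled by a mode flag; roughly, each mode stops after one of the steps above. A sketch, with <commit> standing in for any branch, hash, or tag:

git reset --soft <commit>     #step 1 only: move the branch pointer
git reset --mixed <commit>    #steps 1 and 2: also reset the buffer zone (this is the default)
git reset --hard <commit>     #steps 1, 2 and 3: also reset the working directory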

Check our history

Let us do a quick scan of our commit history.

RUN:

git hist

We see that the last two commits in this branch are “Oops” and “Revert Oops”. Let us remove them with the reset command.

Mark this branch first

Let us mark the last commit with a tag, so we can still find it after removing the commits.

RUN:

git tag oops

Reset to the commit before Oops

In the history log (see above), the commit tagged «v1» immediately precedes the wrong commit. Let us reset the branch to that point. Since that commit has a tag, we can use the tag name in the reset command (if it did not have a tag, we could use the hash value).

RUN:

git reset --hard v1
git hist

RESULT:

$ git reset --hard v1
HEAD is now at fa3c141 Added HTML header

Our master branch now points at the v1 commit, and the “Revert Oops” and “Oops” commits no longer exist in the branch. The --hard parameter indicates that the working directory must be updated to reflect the new branch head.

Nothing is ever lost

What happened to the wrong commits? They are still in the repository, and we can still refer to them. At the beginning of this lesson, we created the «oops» tag for the cancelled commit. Let us take a look at all the commits.

RUN:

git hist --all

We can see that the wrong commits haven’t disappeared. They are no longer listed in the master branch but still remain in the repository. They would still be in the repository even if we had not tagged them, but then we could reference them only by their hashes. Unreferenced commits remain in the repository until the garbage collector runs.
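Even without a tag, you can usually recover the hash of a removed commit from the reflog, git’s journal of where HEAD has been; a minimal sketch (the branch name is just an example):

RUN:

git reflog
git checkout -b recovered <hash>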

Reset dangers

Resets on local branches are usually harmless; the consequences of any “accident” can be undone by resetting again to the proper commit. However, if the branch is shared on a remote repository, resetting it can confuse other users who already have the old history.

Removing the tag

The oops tag has served its purpose. Let us remove it and permit the garbage collector to delete the commit it references.

RUN:

git tag -d oops
git hist --all
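Garbage collection normally runs on its own schedule. If you want to force it right away (rarely needed), something like the following works; a sketch, and use it with care, since it permanently deletes unreferenced commits:

git reflog expire --expire=now --all
git gc --prune=now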

Changing commits

Goals

  • To learn how to modify an already existing commit

Change the page and commit

Post an author comment on the page.

git add hello.txt
git commit -m "Add an author comment"

Change the previous commit

We do not want to create a separate commit for a small update. Let us amend the previous commit to include the change.

git add hello.txt
git commit --amend -m "Add an author/email comment"

View history

git hist

Moving files

Now we will create the structure of our repository. Let us move the page into the lib directory.

RUN:

mkdir lib
git mv hello.txt lib
git status

By moving files with git, we notify git about two things:

  1. The hello.txt file was deleted.
  2. The lib/hello.txt file was created.

Both facts are staged immediately and ready for a commit. The git status command reports that the file has been moved.

One more way to move files

A nice thing about git is that you don’t need to think about version control until the moment you need to commit code. What would happen if we used operating system commands instead of git commands to move the files?

The following set of commands is equivalent to our last actions; it just requires more work for the same result.

We can do:

mkdir lib
mv hello.txt lib
git add lib/hello.txt
git rm hello.txt

Commit new directory

Let us commit this move.

RUN:

git commit -m "Moved hello.html to lib"

Creating a Branch

Goals

  • To learn how to create a local branch in the repository

It is time to make our hello world more expressive. Since that may take some time, it is best to move these changes into a new branch to isolate them from changes in master.

Create a branch

Let us name our new branch «testbranch».

RUN:

git checkout -b testbranch
git status

Note: git checkout -b <branch name> is shorthand for git branch <branch name> followed by git checkout <branch name>.

Note that the git status command reports that you are on the testbranch branch. Make some changes and stage them.

Navigating Branches

Now your project has two branches:

RUN:

git hist --all

Switching to the Master branch

To switch between branches simply use the git checkout command.

RUN:

git checkout master
cat lib/hello.txt

Changes to master branch

To learn how to work with several branches with different (sometimes conflicting) changes.

While you are working in the testbranch branch, someone decides to change the master branch by adding a README file.

Create the README file, then commit it in the master branch.

RUN:

git checkout master
git add README
git commit -m "Added README"

View the different branches

Now we have a repository with two diverged branches. To view the branches and their differences, use the log command as follows.

RUN:

git hist --all

Here we can see the --graph option of git hist in action. Adding the --graph option to git log draws the commit tree with simple ASCII characters. We see both branches (testbranch and master) and that the current branch is master (HEAD). The commits before the branch point are shared by both branches.

The --all flag ensures that we see all the branches. By default, only the current branch is displayed.

Merging to a single branch

Merging brings changes from two branches into one. Let us go back to the testbranch branch and merge it with master.

RUN:

git checkout testbranch
git merge master
git hist --all

Creating and Resolving conflict

Return to the master and create conflict

Return to the master branch and make some changes:

git checkout master
vi lib/hello.txt
#add the lines "Test Status" and "Creating conflict" to the file
git add lib/hello.txt
git commit -m 'Life is great!'

(Warning: make sure you use single quotes to avoid problems with bash and the ! character.)

View branches

git hist --all

Merge the master branch with testbranch

Let us go back to the testbranch branch and merge in the new master changes.

git checkout testbranch
git merge master

If you open lib/hello.txt, you will see conflict markers.

Resolution of the conflict

You need to resolve the conflict manually. Edit lib/hello.txt, keep the lines you want, and remove the conflict markers.
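The conflicted region of lib/hello.txt will look something like the following; the exact lines depend on the edits made on each branch, so treat this as an illustrative sketch:

<<<<<<< HEAD
(your testbranch version of the lines)
=======
Test Status
Creating conflict
>>>>>>> master

Edit the file so only the lines you want remain, delete the marker lines, then stage and commit the resolution:

git add lib/hello.txt
git commit -m "Resolved merge conflict"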

Now that you’ve got a good handle on Git, let’s look at GitHub. I’m keen not to overwhelm you, so I’ve made an annotated screenshot of a GitHub project, so that you can quickly become familiar with the most common features. Yes, GitHub is more than simply a project repository, but that’s where you’re likely going to spend most of your time on the site.
