SED Command

  • sed stands for stream editor.
  • It can be used to find, replace, insert or delete text.
  • It can edit a file without opening it in an editor.

Syntax

sed OPTIONS... [SCRIPT] [INPUTFILE...] 
  • Print the contents of demo.txt file using sed
sed "" demo.txt
  • Replace (substitute) text in the file and print the result
sed "s/o/oo/" demo.txt
sed -i "s/o/oo/" demo.txt   # -i edits the file in place instead of printing the result

In the above command, s stands for substitute, o is the pattern to search for, and oo is the text that replaces it.

Note that the command above replaces only the first o on each line with oo; the remaining o characters on the same line are skipped. For example, in line 3 the o of Hello is replaced but the o of World is not.

Add the g flag at the end of the substitute command (s/o/oo/g) to replace every occurrence on each line.

  • Delete the lines that contain numbers
sed "/[0-9]/d" demo.txt
  • Replace and delete together in a single sed command: replace o with oo and delete all the lines that contain a number
sed "s/o/oo/g; /[0-9]/ d" demo.txt
  • Delete all the blank lines using sed
sed "/^$/d" demo.txt

In the above command, ^$ matches a blank line: the line start (^) is immediately followed by the line end ($), so the line is empty. The d command stands for delete.

AWK Command

What is AWK?

  • AWK is a powerful scripting language used to manipulate data and generate reports.
  • The awk utility scans a file line by line.
  • It splits each scanned line into fields separated by a delimiter.
  • It compares the scanned line or its fields with the given pattern.
  • It performs the given action on the lines that match.

Syntax of AWK

awk options 'selection_criteria { action }' input-file > output-file

We are going to run the searches below on the text file demo.txt:

cat demo.txt 
Tyson
Demo 332
Hello World
DemO
987 Democode
  • Display all the lines in the file demo.txt
awk '{ print }' demo.txt
  • Display the 1st column of every line in the file demo.txt
awk '{ print $1 }' demo.txt

$1 in the above command prints the first column. No delimiter is specified because by default the delimiter is whitespace. You can specify a different delimiter with the -F option; for example, awk -F: '{ print $1 }' uses a colon (:) as the delimiter.

You can also concatenate two columns by writing them next to each other, for example print $1 $2; AWK has no explicit concatenation operator.

  • Find every line in the file demo.txt that contains the word Demo, using a regex (the match is case-sensitive)
awk '/Demo/ { print }' demo.txt
  • Find the lines in the file demo.txt that start with a number
awk '/^[0-9]/ { print }' demo.txt
  • Find the lines whose second column contains the number 332
awk ' { if ( $2 ~ /332/)  print } ' demo.txt

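In the above example, $2 stands for the 2nd column and the tilde symbol ~ is the regex match operator: $2 ~ /332/ is true when the second column contains text matching 332. It is not an equality test; use == for that.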

  • Print lines 2 to 4 of demo.txt
awk 'NR==2,NR==4 {print} ' demo.txt
  • Perform arithmetic operations using AWK
echo 22 7 | awk '{ print $1/$2 }'
echo 22 7 | awk '{ print $1*$2 }'
echo 22 7 | awk '{ print $1+$2 }'
echo 22 7 | awk '{ print $1-$2 }'

Why Generics in Java

  • Generics were introduced in Java 1.5.
  • Generics let classes and interfaces take types as parameters when they are defined; these are called type parameters.
  • Type parameters provide a way to reuse the same code with different argument types.
  • Generics were introduced to reduce runtime exceptions in Java programs. Up to Java 1.4, collections did not take type parameters, so developers could not be sure which type of object a collection held. They had to take each object out of the collection, cast it and check it, and a wrong cast resulted in a runtime exception (ClassCastException). With generics we declare what kind of object the collection stores, which removes the casting effort and the possible runtime exception.
  • Thus, by declaring the type along with the collection, we can prevent a lot of runtime exceptions.
  • We need not worry about casting the object, since the compiler already knows what kind of object is stored in the collection.
  • We can implement sorting and searching algorithms in a generic way using generics.
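
The difference described above can be sketched as follows. This is an illustration only: the class name RawVersusGeneric and both method names are made up for this example. The first method uses a raw (pre-generics) list and fails only when the cast runs; the second declares the element type, so the wrong add() call would not even compile.

```java
import java.util.ArrayList;
import java.util.List;

public class RawVersusGeneric {

    // Pre-generics style: a raw List accepts any object, so a wrong
    // element is only discovered when the cast fails at run time.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsAtRuntime() {
        List names = new ArrayList();
        names.add("Tyson");
        names.add(332);                     // compiles, but is not a String
        try {
            for (Object o : names) {
                String name = (String) o;   // throws ClassCastException on 332
                name.length();
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Generic style: the element type is part of the declaration.
    static int totalLength() {
        List<String> names = new ArrayList<>();
        names.add("Tyson");
        names.add("Demo");
        // names.add(332);                  // would be rejected by the compiler
        int total = 0;
        for (String name : names) {         // no cast needed
            total += name.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(rawListFailsAtRuntime()); // true
        System.out.println(totalLength());           // 9
    }
}
```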

Advantages of Generics over non generic code:

  1. Stronger type check at compile time
    If violated then it results into a compile time error.
  2. We can eliminate casting
    // Without generics: get() returns Object, so a cast is required
    List nameList = new ArrayList();
    String name = (String)nameList.get(0);

    // With generics: get() returns String, so no cast is needed
    List<String> nameList = new ArrayList<>();
    String name = nameList.get(0);
  3. Generic algorithms such as searching and sorting can be written once and reused for different types.
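
As a small sketch of point 3, the method below finds the largest element of a list for any element type that implements Comparable. The class name GenericMax and the method max are hypothetical names chosen for this example, not part of the JDK.

```java
import java.util.Arrays;
import java.util.List;

public class GenericMax {

    // One implementation that works for Integer, String, Double, ...
    // The bound "T extends Comparable<T>" lets us call compareTo on the elements.
    static <T extends Comparable<T>> T max(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("list must not be empty");
        }
        T best = items.get(0);
        for (T item : items) {
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 9, 4)));                  // 9
        System.out.println(max(Arrays.asList("Tyson", "Demo", "Hello"))); // Tyson
    }
}
```

Passing a list whose elements are not comparable is a compile-time error, which ties back to point 1.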

How to write a Generic Class?

  • A generic type parameter is written inside angle brackets, for example <T>. It need not be T; any identifier works, but by convention a single uppercase letter is used. T is common in examples because it suggests Type.
  • If a class has data members (not member functions) of generic types, the class name should be followed by the names of those type parameters inside angle brackets.

Example :

class GenericClass<A,B,C>{
	A varA;
	B varB;
	C varC;
}

Example : How a single class can store different data types in its variable

package com.test;

public class Demo {
	public static void main(String[] args) {
		Test<String> stringTest = new Test<>();
		stringTest.setItem("Hello");
		System.out.println(stringTest);
		Test<Integer> intTest = new Test<>();
		intTest.setItem(50);
		System.out.println(intTest);
		Test<Double> doubleTest = new Test<>();
		doubleTest.setItem(50.55);
		System.out.println(doubleTest);
		
	}
}
class Test<T>{
	T item;

	public T getItem() {
		return item;
	}

	public void setItem(T item) {
		this.item = item;
	}
	@Override
	public String toString() {
		return this.item.toString();
	}
}

Example : Multiple Parameters

package com.test;

public class DemoHashTable {
	public static void main(String[] args) {
		Hashtable<String,Integer> h1 = new Hashtable<>("Ten",10);
		System.out.println(h1);
		
		Hashtable<String,String> h2 = new Hashtable<>("Ten","Ten");
		System.out.println(h2);
	}
}
class Hashtable<K,V>{
	private K k;
	private V v;
	Hashtable(K k,V v){
		this.k = k;
		this.v = v;
	}
	@Override
	public String toString(){
		return k.toString()+" - "+v.toString();
	}
}

Common Characters Used in Generics:

  • K – Key
  • E – Element
  • N – Number
  • T – Type
  • V – Value
  • S, U, V – Additional types (2nd, 3rd, 4th)
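
A short sketch of how these letters typically appear in code. Everything here (the class TypeParameterNames, the nested Entry class and the two methods) is invented for illustration; only the naming convention comes from the list above.

```java
import java.util.Arrays;
import java.util.List;

public class TypeParameterNames {

    // K and V: key and value types of a simple pair.
    static final class Entry<K, V> {
        final K key;
        final V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // N: a numeric type, enforced by the bound "extends Number".
    static <N extends Number> double sum(List<N> numbers) {
        double total = 0;
        for (N n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    // E: the element type of a collection.
    static <E> E first(List<E> elements) {
        return elements.get(0);
    }

    public static void main(String[] args) {
        Entry<String, Integer> ten = new Entry<>("Ten", 10);
        System.out.println(ten.key + " - " + ten.value);   // Ten - 10
        System.out.println(sum(Arrays.asList(1, 2, 3)));   // 6.0
        System.out.println(first(Arrays.asList("a", "b"))); // a
    }
}
```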

Comparing differences between files in Git (Meld GUI tool)

GIT diff

If the git status command is not enough for you and you want to find out exactly what content of the files has changed, you can do that using git diff.

The git diff command compares the files in your working directory with the staging area, so it shows the changes that are not yet staged. If you want to compare the staging area with the last commit, run git diff with the --staged option.

You can check what you have staged so far using the command git diff --cached (--staged and --cached are synonyms):

git diff --cached

There are many GUI tools in which you can view the differences. Run the command below to see the list of supported tools and which of them are installed on your system.
Note : Some of these Git diff tools are commercial products.

git difftool --tool-help

We are going to use Meld as the Git diff tool. On Windows you can download the exe directly from the Meld website; on Ubuntu you can run the command below.

sudo apt-get install meld

The commands below set up Meld as both the difftool and the mergetool in Git.

git config --global diff.tool meld
git config --global difftool.meld.path "/usr/bin/meld"
git config --global difftool.prompt false
git config --global difftool.meld.cmd 'meld "$LOCAL" "$REMOTE"'   # single quotes stop the shell from expanding $LOCAL and $REMOTE

git config --global merge.tool meld
git config --global mergetool.meld.path "/usr/bin/meld"
git config --global mergetool.prompt false

# confirm the path of meld using the command: which meld

Now you can compare the unstaged and committed data in Meld using the commands below:

git difftool
git difftool --staged # to compare the staged data with the committed data

You use git difftool in exactly the same way as you use git diff. e.g.

git difftool <COMMIT_HASH> file_name
git difftool <BRANCH_NAME> file_name
git difftool <COMMIT_HASH_1> <COMMIT_HASH_2> file_name

Inside .git directory

  • When a git repository is initialised (git init), a directory called .git is created.
  • The .git directory contains all the files that git needs to store and manipulate the repository.
  • If you want to back up or clone a repository, copying the .git directory will do the job.
  • Below are the files and directories inside the .git folder
    • branches directory
    • config file
    • description file
    • HEAD
    • hooks directory
    • info directory
    • objects directory
    • refs directory
    • index file (if data exist in staging area)
  • You may see other files as well within this directory, but the files and directories mentioned above are created when your git repository is initialised.
  • description file is used by the GitWeb program.
  • config file contains project-specific configuration.
  • info directory keeps a global exclude file for ignore patterns that you don’t want to track in a .gitignore file.
  • hooks directory contains client-side or server-side hook scripts.
  • objects directory stores all the content of your repository.
  • refs directory stores pointers to the commit objects (branches, tags and remotes).
  • HEAD file points to the currently checked out branch.
  • index file is where GIT staging information is stored.

Use Round brackets for Grouping

  • By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together.
  • This allows you to apply a regex operator, e.g. a repetition operator, to the entire group.
  • Note that only round brackets can be used for grouping. Square brackets define a character class, and curly braces are used by a special repetition operator.
  • Besides grouping part of a regular expression together, round brackets also create a “backreference”.
  • A backreference stores the part of the string matched by the part of the regular expression inside the parentheses.
  • That is, unless you use non-capturing parentheses. Remembering part of the regex match in a backreference slows down the regex engine because it has more work to do.
  • If you do not use the backreference, you can speed things up by using non-capturing parentheses, at the expense of making your regular expression slightly harder to read.
  • The regex «Set(Value)?» matches „Set” or „SetValue”. In the first case, the first backreference will be empty, because it did not match anything. In the second case, the first backreference will contain „Value”.
  • If you do not use the backreference, you can optimize this regular expression into «Set(?:Value)?».
  • The question mark and the colon after the opening round bracket are the special syntax that you can use to tell the regex engine that this pair of brackets should not create a backreference.
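
A minimal sketch of capturing versus non-capturing groups using Java's java.util.regex. The class and method names are hypothetical. One detail specific to Java: a group that did not take part in the match is reported as null rather than as an empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupingDemo {

    // Returns what group 1 of Set(Value)? captured, or null if the optional
    // group did not take part in the match (or the input did not match at all).
    static String capturedGroup(String input) {
        Matcher m = Pattern.compile("Set(Value)?").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Number of capturing groups (backreferences) that the regex defines.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(capturedGroup("Set"));             // null
        System.out.println(capturedGroup("SetValue"));        // Value
        System.out.println(capturingGroups("Set(Value)?"));   // 1
        System.out.println(capturingGroups("Set(?:Value)?")); // 0
    }
}
```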

Use Back reference in regex

  • A backreference reuses the text matched by an earlier group later in the same search.
  • We define a group by using round brackets
    Example : (\d)(\w) where (\d) is the first group and (\w) is the second group
  • We can refer to these groups by using \1 and \2, where \1 is the text matched by (\d), a digit, and \2 is the text matched by (\w), a word character.
  • Let us see an example to clarify it more

Q & A

Find a digit in the search text that is repeated 4 or more times in a row
Text :
123 13222234
3534534 214333332432
Search Result expected : 2222 and 33333
==================================================================
Regex : (\d)\1{3,}
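
The Q & A above can be checked with a few lines of Java. This is a sketch: the class RepeatedDigits and its method are made-up names, and the backslashes are doubled only because the regex sits inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedDigits {

    // (\d) captures one digit; \1{3,} requires that same digit at least
    // three more times, so the whole match is a run of 4 or more.
    static List<String> findRuns(String text) {
        Matcher m = Pattern.compile("(\\d)\\1{3,}").matcher(text);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String text = "123 13222234\n3534534 214333332432";
        System.out.println(findRuns(text)); // [2222, 33333]
    }
}
```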

Optional Items ? in Regex

The question mark makes the preceding token in the regular expression optional. E.g.: «colou?r» matches both „colour” and „color”.

You can make several tokens optional by grouping them together using round brackets, and placing the question mark after the closing bracket. E.g.: «Nov(ember)?» will match „Nov” and „November”.

Important Regex Concept: Greediness

The question mark gives the regex engine two choices: try to match the part the question mark applies to, or do not try to match it. The engine will always try to match that part. Only if this causes the entire regular expression to fail, will the engine try ignoring the part the question mark applies to.

The effect is that if you apply the regex «Feb 23(rd)?» to the string “Today is Feb 23rd, 2003”, the match will always be „Feb 23rd” and not „Feb 23”. You can make the question mark lazy (i.e. turn off the greediness) by putting a second question mark after the first.
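
The optional and lazy question marks can be compared side by side in Java. This is a sketch; OptionalDemo and firstMatch are hypothetical names, and the helper simply returns the first match of a regex in a text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalDemo {

    // Returns the first match of the regex in the text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Today is Feb 23rd, 2003";
        System.out.println(firstMatch("colou?r", "color"));       // color
        System.out.println(firstMatch("colou?r", "colour"));      // colour
        System.out.println(firstMatch("Nov(ember)?", "November")); // November
        System.out.println(firstMatch("Feb 23(rd)?", text));      // greedy: Feb 23rd
        System.out.println(firstMatch("Feb 23(rd)??", text));     // lazy:   Feb 23
    }
}
```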

Pipe Symbol in regex | (Alternation)

  • You can use alternation, written with the pipe symbol, to match any one of several regular expressions.
  • The pipe symbol works just like the OR operator in the Java language.
  • If you want to search for bikes or cars using the same regex, you can separate the bikes and cars tokens with a pipe symbol and search for both of them together.
  • Text : Tonight Bikes and Cars are going to have a race on highway
    Find : Text “Bikes” and “Cars” in the same regex.
    ==================================
    Regex : Bikes|Cars
  • Now suppose you want to apply an additional token such as \b to every alternative; then you need to group the alternatives using round brackets
    Example : \b(Bikes|Cars)\b
  • If we had omitted the round brackets, the regex engine would have searched for a word boundary followed by Bikes, or, Cars followed by a word boundary.
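
The effect of the round brackets can be seen with a second input where the words are only parts of longer words. This is a sketch; the class AlternationDemo, the helper findAll and the second test string are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {

    // Collects every match of the regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String race = "Tonight Bikes and Cars are going to have a race on highway";
        System.out.println(findAll("Bikes|Cars", race));          // [Bikes, Cars]
        System.out.println(findAll("\\b(Bikes|Cars)\\b", race));  // [Bikes, Cars]

        // Without the brackets, \b applies to only one side of each alternative.
        String partial = "Bikesheds and SuperCars";
        System.out.println(findAll("\\bBikes|Cars\\b", partial));    // [Bikes, Cars]
        System.out.println(findAll("\\b(Bikes|Cars)\\b", partial));  // []
    }
}
```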

Word Boundaries (Anchor)

  • The metacharacter \b is an anchor, just like the caret and the dollar symbol.
  • Note : Word characters here are the characters matched by the \w shorthand: letters, digits and the underscore
  • There are 4 positions that count as word boundaries :
    • Before the first character in the string, if the first character is a word character (FYI : spaces and special characters are not word characters)
    • After the last character in the string, if the last character is a word character
    • Between a word character and a non-word character (\W)
    • Between a non-word character (\W) and a word character
    • Note : The position between two word characters (\w) is not considered a boundary
  • Word boundaries help to search for a whole word using the regex \bword\b

Negation of \b is \B

  • \B matches exactly the opposite of what \b matches.
  • \B matches at every position where \b does not match, for example between two word characters or between two non-word characters.
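
A small sketch of \B in Java, reporting where each match starts (positions are counted from 0). The class NonBoundaryDemo and the helper matchStarts are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonBoundaryDemo {

    // Returns the start index of every match of the regex in the text.
    static List<Integer> matchStarts(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "This island is beautiful";
        // "is" that does not start a word: only the one inside "This".
        System.out.println(matchStarts("\\Bis", text)); // [2]
        // "is" that does not end a word: only the one inside "island".
        System.out.println(matchStarts("is\\B", text)); // [5]
    }
}
```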

Q & A

Text : This island is beautiful
Find : is (using \b)
==============================================================
Regex : \bis\b

How does the regex engine process the \b metacharacter?

Let’s see what happens when we apply the regex «\bis\b» to the string “This island is beautiful”. The engine starts with the first token «\b» at the first character “T”. Since this token is zero-length, the position before the character is inspected. «\b» matches here, because the T is a word character and the character before it is the void before the start of the string. The engine continues with the next token: the literal «i». The engine does not advance to the next character in the string, because the previous regex token was zero-width. «i» does not match “T”, so the engine retries the first token at the next character position.

«\b» cannot match at the position between the “T” and the “h”. It cannot match between the “h” and the “i” either, and neither between the “i” and the “s”.

The next character in the string is a space. «\b» matches here because the space is not a word character, and the preceding character is. Again, the engine continues with the «i» which does not match with the space.

Advancing a character and restarting with the first regex token, «\b» matches between the space and the second “i” in the string. Continuing, the regex engine finds that «i» matches „i” and «s» matches „s”. Now, the engine tries to match the second «\b» at the position before the “l”. This fails because this position is between two word characters. The engine reverts to the start of the regex and advances one character to the “s” in “island”. Again, the «\b» fails to match and continues to do so until the second space is reached. It matches there, but matching the «i» fails.

But «\b» matches at the position before the third “i” in the string. The engine continues, and finds that «i» matches „i” and «s» matches „s”. The last token in the regex, «\b», also matches at the position before the second space in the string because the space is not a word character, and the character before it is.

The engine has successfully matched the word „is” in our string, skipping the two earlier occurrences of the characters i and s. If we had used the regular expression «is», it would have matched the „is” in “This”.
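
The walkthrough above can be confirmed in Java by listing where each regex matches (positions counted from 0). This is a sketch; WordBoundaryDemo and matchStarts are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Returns the start index of every match of the regex in the text.
    static List<Integer> matchStarts(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "This island is beautiful";
        // Plain "is" also matches inside "This" and "island".
        System.out.println(matchStarts("is", text));       // [2, 5, 12]
        // With word boundaries only the standalone word is found.
        System.out.println(matchStarts("\\bis\\b", text)); // [12]
    }
}
```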

Start of String and End of String Anchors ^ and $

  • Unlike character sets, anchors do not match characters; they match the positions before, after and between characters.
  • They are used to “Anchor” the regex match at a certain position.
  • The caret “^” matches the position before the first character in a single line
    Example : In text “abc” regex “^a” matches the letter “a”
  • The dollar “$” matches the position after the last character in a single line
    Example : In text “xyz” regex “z$” matches the letter “z”
  • Note : Anchors match line by line and not word by word, hence they are great for validating single-value user input in applications, such as an email id or a number-only field.
  • Searching for caret “^” and dollar “$” in a multi-line text (Note : a Windows line break is CR LF, i.e. \r\n)
Since ^ is a position, Notepad++ is pointing to it
Since $ is a position, Notepad++ is pointing at it
  • Example 2:
    Input : Text should contain number inputs only “746746746”
    Regex : ^\d+$
  • Example 3:
    Input : Find the starting and ending spaces in paragraph
    Regex : ^\s+|\s+$
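
Examples 2 and 3 can be sketched in Java as follows. The class and method names are hypothetical. Note one Java-specific assumption: in java.util.regex, ^ and $ anchor to the whole input by default and only work line by line when the Pattern.MULTILINE flag is set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {

    // Example 2: the whole input must consist of digits only.
    static boolean isDigitsOnly(String input) {
        return input.matches("^\\d+$");
    }

    // Example 3: remove the leading and trailing whitespace.
    static String trimEnds(String text) {
        return text.replaceAll("^\\s+|\\s+$", "");
    }

    // Counts the lines that start with a digit; MULTILINE makes ^ match
    // at the start of every line instead of only at the start of the input.
    static int linesStartingWithDigit(String text) {
        Matcher m = Pattern.compile("^\\d", Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isDigitsOnly("746746746"));        // true
        System.out.println(isDigitsOnly("7467a"));            // false
        System.out.println("[" + trimEnds("  hello world \t") + "]"); // [hello world]
        String demo = "Tyson\nDemo 332\nHello World\nDemO\n987 Democode";
        System.out.println(linesStartingWithDigit(demo));     // 1
    }
}
```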

Permanent Start and End of String

  • As discussed above, “^” and “$” work line by line when the text has multiple lines. If you want to match only the start or the end of the whole text in a multi-line file, you can use “\A” and “\Z” instead.
  • \A matches the first position of the first line in a multi-line text. (Example Below)
  • \Z matches the last position of the last line in a multi-line text. (Example Below)
    Note : If the text ends with a line break “\n”, \Z matches before it; use “\z” to match the position after that final \n, the very end of the text.
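
A sketch of how these anchors behave in Java's java.util.regex on a two-line text that ends with a line break. The class PermanentAnchors and the helper findAll are invented names; Pattern.MULTILINE is switched on so that ^ and $ work line by line, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PermanentAnchors {

    // Collects every match of the regex in the text, with ^ and $ working per line.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "first\nsecond\n";
        System.out.println(findAll("^\\w+", text));    // [first, second]
        System.out.println(findAll("\\A\\w+", text));  // [first]
        System.out.println(findAll("\\w+$", text));    // [first, second]
        System.out.println(findAll("\\w+\\Z", text));  // [second]
        // \z is the very end of the text, after the final \n, so no word ends there.
        System.out.println(findAll("\\w+\\z", text));  // []
    }
}
```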