Regular Expression

This page covers the following topics about regular expressions:

a. What is a Regular Expression?
b. How can we use Regular Expressions in Java?

 

What is a Regular Expression?
Regular expressions are a language of string patterns built into most modern programming languages, where they are used for searching, extracting, and modifying text. By definition, a regular expression is a string pattern that describes text, and such a description can then be applied in many different ways. Regular expressions are supported by most programming languages, e.g., Java, Perl, and Groovy. Unfortunately, each language supports regular expressions slightly differently.

 The basic language constructs include character classes, quantifiers, and meta-characters.


1. Character Classes

Character classes define the content of the pattern, i.e. which characters the pattern should look for.
 

Expression Description
.   Dot, any character (does not match line terminators by default)
\d   A digit: [0-9]
\D   A non-digit: [^0-9]
\s   A whitespace character: [ \t\n\x0B\f\r]
\S   A non-whitespace character: [^\s]
\w   A word character: [a-zA-Z_0-9]
\W   A non-word character: [^\w]
[abc]   a, b, or c (simple class)
[^abc]   Any character except a, b, or c (negation)
[a-zA-Z]   a through z or A through Z, inclusive (range)
[a-d[m-p]]   a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]   d, e, or f (intersection)
[a-z&&[^bc]]   a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)

 
However, note that in a Java string literal each of these backslashes must itself be escaped (“double escaped”). For example:  String pattern = "\\d \\D \\W \\w \\S \\s";
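
As a minimal sketch of how the character classes behave once they are written as Java string literals (the class name CharacterClassDemo and its helper methods are made up for this illustration and are not part of any library):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class and method names are assumptions for this sketch.
class CharacterClassDemo
{
   // "\\d" in a Java string literal is the regular expression \d (one digit)
   private static final Pattern DIGIT = Pattern.compile( "\\d" );

   // count how many digits appear in the input
   static int countDigits( String input )
   {
      Matcher matcher = DIGIT.matcher( input );
      int count = 0;
      while ( matcher.find() )
         count++;
      return count;
   }

   // replace every run of whitespace characters (\s+) with a single space
   static String collapseWhitespace( String input )
   {
      return input.replaceAll( "\\s+", " " );
   }

   // remove every character that is not a lowercase vowel (negated custom class)
   static String keepVowels( String input )
   {
      return input.replaceAll( "[^aeiou]", "" );
   }

   public static void main( String[] args )
   {
      System.out.println( countDigits( "Room 42, floor 7" ) );     // 3
      System.out.println( collapseWhitespace( "a \t b\n\nc" ) );   // a b c
      System.out.println( keepVowels( "regular expression" ) );    // euaeeio
   }
}
```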

2. Quantifiers

Quantifiers specify how many times a part of a pattern should match or repeat. A quantifier binds to the expression or group to its immediate left.

Expression Description
*   Match 0 or more times
+   Match 1 or more times
?   Match 1 or 0 times
{n}   Match exactly n times
{n,}   Match at least n times
{n,m}   Match at least n but not more than m times
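
The effect of each quantifier, and of binding to the expression on its immediate left, can be sketched as follows. The class QuantifierDemo and its matches helper are hypothetical names for this illustration; Pattern.matches is the standard library call that tests whether the whole input matches.

```java
import java.util.regex.Pattern;

// Illustrative only: class and method names are assumptions for this sketch.
class QuantifierDemo
{
   // true when the WHOLE input matches the regular expression
   static boolean matches( String regex, String input )
   {
      return Pattern.matches( regex, input );
   }

   public static void main( String[] args )
   {
      System.out.println( matches( "ab*", "a" ) );          // true:  b may occur 0 times
      System.out.println( matches( "ab+", "a" ) );          // false: b must occur at least once
      System.out.println( matches( "colou?r", "color" ) );  // true:  u is optional
      System.out.println( matches( "\\d{3}", "1234" ) );    // false: exactly 3 digits required
      System.out.println( matches( "\\d{2,}", "1234" ) );   // true:  at least 2 digits
      System.out.println( matches( "(ab)+", "ababab" ) );   // true:  + binds to the group (ab)
      System.out.println( matches( "ab+", "ababab" ) );     // false: + binds only to b
   }
}
```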


3. Meta-characters

Meta-characters are used to group, divide, and perform special operations in patterns.

Expression Description
 \   Escape the next meta-character (it becomes a normal/literal character)
^   Match the beginning of the line
.   Match any character (except newline)
$   Match the end of the line (or before newline at the end)
|   Alternation (‘or’ statement)
( )   Grouping
[ ]   Custom character class
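
A small sketch combining several meta-characters: anchors, grouping, alternation, and an escaped dot. The class MetaCharacterDemo, the method imageExtension, and the file names are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class and method names are assumptions for this sketch.
class MetaCharacterDemo
{
   // ^ and $ anchor the match, ( ) groups, | separates alternatives,
   // and \\. is an escaped dot that matches only a literal '.'
   private static final Pattern IMAGE_FILE =
      Pattern.compile( "^(\\w+)\\.(png|jpg|gif)$" );

   // returns the extension of an image file name, or null if the name does not match
   static String imageExtension( String fileName )
   {
      Matcher matcher = IMAGE_FILE.matcher( fileName );
      return matcher.find() ? matcher.group( 2 ) : null;
   }

   public static void main( String[] args )
   {
      System.out.println( imageExtension( "logo.png" ) );   // png
      System.out.println( imageExtension( "logo.pdf" ) );   // null: pdf is not one of the alternatives
      System.out.println( imageExtension( "logoXpng" ) );   // null: the escaped dot needs a literal '.'
   }
}
```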


 

Let's look at some examples before we move on to the next topic.

 1. Regular expression for the following list of Email Ids

      JavaSeleniumWorld@gmail.com
      QtpWorld@yahoo.com

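      Expression : [a-zA-Z0-9\\._%+-]+@[a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4}

      The backslashes are doubled because the expression is written as a Java string literal. The dot before the last part is escaped (\\.) so that it matches only a literal dot; an unescaped dot would match any character.

 Email Id 1    JavaSeleniumWorld      @   gmail               .     com
 Email Id 2    QtpWorld               @   yahoo               .     com
 Expression    [a-zA-Z0-9\\._%+-]+    @   [a-zA-Z0-9\\.-]+    \\.   [a-zA-Z]{2,4}

 Description
 [a-zA-Z0-9\\._%+-]+   1 or more occurrences of alphanumeric characters or the symbols dot, _, %, +, -
 @                     The literal @ sign
 [a-zA-Z0-9\\.-]+      1 or more occurrences of alphanumeric characters or the symbols dot, -
 \\.                   A literal dot
 [a-zA-Z]{2,4}         2 to 4 occurrences of alphabetic characters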
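
As a hedged sketch, the email expression could be tried out in Java as shown here. The class EmailPatternDemo and the method looksLikeEmail are hypothetical names. The dot before the last part is written as \\. so that it matches a literal dot only. The pattern checks the general shape of an address; it is not a complete email validator.

```java
import java.util.regex.Pattern;

// Illustrative only: class and method names are assumptions for this sketch.
class EmailPatternDemo
{
   // the expression from the example, with the dot before the last part escaped
   private static final Pattern EMAIL =
      Pattern.compile( "[a-zA-Z0-9\\._%+-]+@[a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4}" );

   // true when the whole input has the shape described by the expression
   static boolean looksLikeEmail( String input )
   {
      return EMAIL.matcher( input ).matches();
   }

   public static void main( String[] args )
   {
      System.out.println( looksLikeEmail( "JavaSeleniumWorld@gmail.com" ) );  // true
      System.out.println( looksLikeEmail( "QtpWorld@yahoo.com" ) );           // true
      System.out.println( looksLikeEmail( "QtpWorld@yahoo" ) );               // false: no dot and 2 to 4 letters at the end
      System.out.println( looksLikeEmail( "QtpWorld.yahoo.com" ) );           // false: no @ sign
   }
}
```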

 
2. Regular expression for the following SSNs

     123-45-9869
     334-67-6789
     234-86-3456


     Expression : \\d{3}-\\d{2}-\\d{4}

SSN 1         123       -   45        -   9869
SSN 2         334       -   67        -   6789
SSN 3         234       -   86        -   3456
Expression    \\d{3}    -   \\d{2}    -   \\d{4}

Description
\\d{3}   Exactly 3 occurrences of a numeric digit
-        A literal hyphen
\\d{2}   Exactly 2 occurrences of a numeric digit
\\d{4}   Exactly 4 occurrences of a numeric digit
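
A minimal sketch of extracting SSN-shaped strings from a longer text with this expression. The class SsnPatternDemo, the method findSsns, and the sample sentence are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class and method names are assumptions for this sketch.
class SsnPatternDemo
{
   // 3 digits, hyphen, 2 digits, hyphen, 4 digits
   private static final Pattern SSN = Pattern.compile( "\\d{3}-\\d{2}-\\d{4}" );

   // collect every substring of the text that has the SSN format
   static List<String> findSsns( String text )
   {
      List<String> found = new ArrayList<>();
      Matcher matcher = SSN.matcher( text );
      while ( matcher.find() )
         found.add( matcher.group() );
      return found;
   }

   public static void main( String[] args )
   {
      String text = "123-45-9869, 334-67-6789 and 234-86-3456 fit the format; 12-345-6789 does not";
      System.out.println( findSsns( text ) );  // [123-45-9869, 334-67-6789, 234-86-3456]
   }
}
```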



How can we use Regular Expressions in Java?


Java provides the java.util.regex package for pattern matching with regular expressions.

The java.util.regex package primarily consists of the following three classes:

Pattern: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.

Matcher: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.

PatternSyntaxException: A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
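
The way the three classes work together can be sketched as follows. The class RegexClassesDemo and its two helper methods are hypothetical names for this illustration; Pattern.compile, Pattern.matcher, Matcher.find, and Matcher.group are the standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative only: class and method names are assumptions for this sketch.
class RegexClassesDemo
{
   // returns true if the regular expression compiles without a syntax error
   static boolean isValidRegex( String regex )
   {
      try
      {
         Pattern.compile( regex );
         return true;
      }
      catch ( PatternSyntaxException e )
      {
         // unchecked exception thrown by compile for a malformed pattern
         return false;
      }
   }

   // returns the first match of regex in input, or null when there is none
   static String firstMatch( String regex, String input )
   {
      Pattern pattern = Pattern.compile( regex );   // static factory method, no public constructor
      Matcher matcher = pattern.matcher( input );   // the Matcher is obtained from the Pattern
      return matcher.find() ? matcher.group() : null;
   }

   public static void main( String[] args )
   {
      System.out.println( firstMatch( "\\d+", "order 66 shipped" ) );  // 66
      System.out.println( isValidRegex( "[a-z]+" ) );                  // true
      System.out.println( isValidRegex( "[a-z" ) );                    // false: unclosed character class
   }
}
```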




Let's look at an example.

1. From a list of domain names, extract every entry that ends with "World.com".

Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression
{
   public static void main( String args[] )
   {
      // create regular expression; the dot is escaped (\\.) so that it
      // matches only a literal '.', not any character
      Pattern expression = Pattern.compile( ".*World\\.com" );
      
      String string1 = "JavaSeleniumWorld.com\n" +
         "QtpWorld.com\n" +
         "google.com\n" +
         "rediff.com\n" +
         "yahoo.com\n";

      // match regular expression to string and print matches
      Matcher matcher = expression.matcher( string1 );
        
      while ( matcher.find() )
         System.out.println( matcher.group() );
   }
}

Output

JavaSeleniumWorld.com
QtpWorld.com
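
The pattern in the example matches up to the first place on a line where World.com occurs, even if more text follows it on the same line. As a hedged variant, the sketch below anchors the pattern with $ and compiles it with the Pattern.MULTILINE flag, so that only lines that really end with World.com are reported. The class EndsWithDemo, the method matchingLines, and the extra entry QtpWorld.com.au are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class and method names are assumptions for this sketch.
class EndsWithDemo
{
   // MULTILINE makes $ match at the end of every line, not only at the end of the input
   private static final Pattern ENDS_WITH_WORLD =
      Pattern.compile( ".*World\\.com$", Pattern.MULTILINE );

   // collect every line of the text that ends with "World.com"
   static List<String> matchingLines( String text )
   {
      List<String> lines = new ArrayList<>();
      Matcher matcher = ENDS_WITH_WORLD.matcher( text );
      while ( matcher.find() )
         lines.add( matcher.group() );
      return lines;
   }

   public static void main( String[] args )
   {
      String text = "JavaSeleniumWorld.com\n" +
         "QtpWorld.com\n" +
         "QtpWorld.com.au\n" +
         "google.com\n";

      System.out.println( matchingLines( text ) );  // [JavaSeleniumWorld.com, QtpWorld.com]
   }
}
```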