Regular expression: Difference between revisions

Latest revision as of 17:59, 22 August 2016

This article or section is a stub. It does not yet contain enough information to be considered a real article. In other words, it is a short or insufficient piece of information and requires additions.

Draft

Definition

Regular expressions (regexps) provide a formalism for identifying patterns in text, including program code and other markup. For example, programmers use them to find and replace text in source code, scripts can use them to translate code from one form into another (e.g. HTML to Wiki markup), and JavaScript programs may use them to check whether the data a user entered in an HTML form is correct.
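As a minimal sketch of the HTML-to-Wiki use case, the Python function below rewrites `<b>` and `<i>` tags into MediaWiki bold (`'''`) and italic (`''`) markup. `html_to_wiki` is a hypothetical helper written for this illustration, not part of any library. It assumes simple, non-nested tags without attributes; real HTML needs a proper parser.

```python
import re

def html_to_wiki(html: str) -> str:
    """Hypothetical helper: convert simple <b>/<i> tags to MediaWiki markup."""
    # <b>...</b> becomes '''...'''; the non-greedy .*? stops at the first </b>
    text = re.sub(r"<b>(.*?)</b>", r"'''\1'''", html)
    # <i>...</i> becomes ''...''
    text = re.sub(r"<i>(.*?)</i>", r"''\1''", text)
    return text

print(html_to_wiki("A <b>bold</b> and <i>italic</i> word"))
# A '''bold''' and ''italic'' word
```

The non-greedy `.*?` matters when a line holds several tags: with a greedy `.*`, `<b>a</b> <b>b</b>` would be treated as a single bold span.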

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

The following examples illustrate a few specifications that could be expressed in a regular expression:

the sequence of characters "car" in any context, such as "car", "cartoon", or "bicarbonate"
the word "car" when it appears as an isolated word
the word "car" when preceded by the word "blue" or "red"
a dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits

Regular expressions can be much more complex than these examples.

(Wikipedia, retrieved 16:52, 29 August 2008 (UTC)).
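The four specifications listed above could be written roughly as follows with Python's `re` module. This is a sketch: the sample sentence is made up, and the third pattern assumes exactly one space between the colour and "car".

```python
import re

# Made-up sample text for this sketch.
text = "A red car, a green car and a blue car; a cartoon about bicarbonate costs $5 or $12.50."

# 1. "car" in any context, including inside "cartoon" and "bicarbonate"
anywhere = re.findall(r"car", text)

# 2. "car" as an isolated word: \b marks a word boundary
whole_word = re.findall(r"\bcar\b", text)

# 3. "car" preceded by the word "blue" or "red" (assumes a single space)
after_colour = re.findall(r"\b(?:blue|red) car\b", text)

# 4. "$", one or more digits, then optionally "." and exactly two digits
prices = re.findall(r"\$\d+(?:\.\d{2})?", text)

print(len(anywhere), whole_word, after_colour, prices)
# 5 ['car', 'car', 'car'] ['red car', 'blue car'] ['$5', '$12.50']
```

Note that `$` and `.` must be escaped because both are metacharacters.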

There exist several definitions, standards and implementations of regexps. They share a common core. The most popular ones are (see the Wikipedia article for details):

POSIX Basic Regular expressions (BRE)
POSIX Extended Regular expressions (ERE)
Perl-derivative regular expressions

Note: Regular expressions, although useful, are difficult to learn, and usually only computer programmers use them. However, HTML and XML coders may want to learn some. For example, if you plan to use JavaScript form validation code, it is a good idea to know the basics.

Examples

Removing HTML code

Identifies both img and a begin tags (source: StackOverflow, http://stackoverflow.com/questions/3790681/regular-expression-to-remove-html-tags):

<(img|a)[^>]*>

Identifies span begin tags (replace the matches with nothing to remove them):

<span[^>]*>
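A minimal Python sketch of removing these begin tags by replacing every match with an empty string. The sample HTML is made up. The stricter variant with `\b` is an addition for illustration: without it, `<(img|a)[^>]*>` also matches tags whose name merely starts with "a", such as `<abbr>`.

```python
import re

IMG_OR_A = re.compile(r"<(img|a)[^>]*>")           # pattern from the text
SPAN = re.compile(r"<span[^>]*>")                  # pattern from the text
STRICT_IMG_OR_A = re.compile(r"<(img|a)\b[^>]*>")  # \b: tag name must end after img or a

html = '<p><a href="x.html">Link</a> <img src="y.png"> <span class="z">text</span></p>'

# Replacing matches with "" removes the begin tags; end tags such as </a> remain.
cleaned = SPAN.sub("", IMG_OR_A.sub("", html))
print(cleaned)  # <p>Link</a>  text</span></p>

# Without \b, the first pattern also strips <abbr ...>.
abbr = '<abbr title="t">WHO</abbr>'
print(IMG_OR_A.sub("", abbr))         # WHO</abbr>
print(STRICT_IMG_OR_A.sub("", abbr))  # <abbr title="t">WHO</abbr>
```

Since only begin tags are matched, the corresponding end tags would need a separate pattern such as `</(img|a|span)>`.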

Zip code

The following matches a loosely valid Swiss zip code:

CH-[0-9]{4,}
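Swiss postal codes have exactly four digits, while `{4,}` means "four or more". The Python sketch below contrasts the pattern above, used with `search()` (the code may appear anywhere in the text), with a stricter variant used with `fullmatch()` (the whole string must be a code). The helper names `loose_ok` and `strict_ok` are hypothetical.

```python
import re

LOOSE = re.compile(r"CH-[0-9]{4,}")  # pattern from the text: four OR MORE digits
STRICT = re.compile(r"CH-[0-9]{4}")  # exactly four digits

def loose_ok(s: str) -> bool:
    # search(): the code may appear anywhere in the string
    return LOOSE.search(s) is not None

def strict_ok(s: str) -> bool:
    # fullmatch(): the whole string must be exactly one code
    return STRICT.fullmatch(s) is not None

for s in ["CH-1205", "CH-12345", "Address: CH-8001 Zurich"]:
    print(s, loose_ok(s), strict_ok(s))
```

Which one to use depends on whether you are searching running text or validating a single form field.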

The following matches most email addresses (example from http://www.regular-expressions.info/email.html). It only lists upper-case letters, so it must be used with case-insensitive matching:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
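A short Python sketch of the email pattern, assuming Python's `re.IGNORECASE` flag as the case-insensitive option (other tools use `/i` or an editor checkbox). The sample addresses are made up. It also shows the pattern's main limitation: `{2,4}` rejects top-level domains longer than four letters.

```python
import re

# Pattern from regular-expressions.info, compiled case-insensitively.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)
# The same pattern without the flag only finds upper-case addresses.
EMAIL_CASE_SENSITIVE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b")

text = "Write to info@example.org or John.Doe@mail.example.com, not to foo@bar."

print(EMAIL.findall(text))                 # ['info@example.org', 'John.Doe@mail.example.com']
print(EMAIL_CASE_SENSITIVE.findall(text))  # []
print(EMAIL.findall("x@example.museum"))   # [] because {2,4} rejects longer top-level domains
```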

Replacing Wiki text

Administrators of this wiki can use RegExps to make mass changes to pages.

Example one

The following example, taken from the French version of this page, shows how to replace

[[Flash CS4 - Composant bouton]]

by

[[Flash CS5 - Composant bouton]] ([[Flash CS4 - Composant bouton|CS4]])

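Note how the square brackets [[ ]] have to be escaped with a backslash, because [ and ] are regex metacharacters that delimit character classes: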

/\[\[Flash CS4 - Composant bouton\]\]/

(Screenshot: Extension:MassEditRegex)
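The same search and replace can be tried outside the wiki, for instance in Python, before running a mass edit. This sketch only illustrates the pattern; it does not reproduce how the wiki extension itself processes its input. Python patterns are not written between `/` delimiters, and `re.escape()` can build the escaped pattern from the literal link text.

```python
import re

# Search pattern from the text: [ and ] are escaped because they are metacharacters.
PATTERN = r"\[\[Flash CS4 - Composant bouton\]\]"
NEW = "[[Flash CS5 - Composant bouton]] ([[Flash CS4 - Composant bouton|CS4]])"

page = "See [[Flash CS4 - Composant bouton]] for details."  # made-up page text

# A function as replacement inserts NEW literally (no \1 or \g<...> processing).
updated = re.sub(PATTERN, lambda m: NEW, page)
print(updated)

# re.escape() builds an equivalent escaped pattern from the literal link text.
auto = re.escape("[[Flash CS4 - Composant bouton]]")
assert re.sub(auto, lambda m: NEW, page) == updated

# Running the edit twice changes nothing: "...bouton|CS4]]" does not match "...bouton]]".
assert re.sub(PATTERN, lambda m: NEW, updated) == updated
```

That last property is useful for mass edits: pages that were already converted are left alone on a second run.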

Example two

Removing a pageby tag together with its arguments

Search for:

/<pageby nominor="false" comments="false"\/>/

Search for (a bit dangerous, since the greedy .* can run past the end of the pageby tag up to the last /> on the same line):

/<pageby.*\/>/

Replace with:

<!--  -->
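The danger of the second search pattern can be shown with a small Python sketch on a made-up line. `[^>]*` is a swapped-in alternative, not from the original text: it cannot cross a `>`, so the match stops at the end of the pageby tag. In Python the `/` does not need the `\/` escape, because patterns are not written between `/` delimiters.

```python
import re

line = '<pageby nominor="false" comments="false"/> keep this <br/>'  # made-up sample
COMMENT = "<!--  -->"

exact = re.sub(r'<pageby nominor="false" comments="false"/>', COMMENT, line)
greedy = re.sub(r"<pageby.*/>", COMMENT, line)    # the "dangerous" pattern
safer = re.sub(r"<pageby[^>]*/>", COMMENT, line)  # stops at the first >

print(exact)   # <!--  --> keep this <br/>
print(greedy)  # <!--  -->   (the text and the <br/> tag are gone too)
print(safer)   # <!--  --> keep this <br/>
```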

Mass editing HTML files with Perl

Suppose you would like to add a line after each <body ...> tag.

Let's find all files that have body tags:

find . -type f -print | xargs grep -l "<body\(.*\)>"

Replace (that's more hairy)

find . -type f -print | xargs perl -i~ -pe "s:<body([^>]*)>:<body\\1> <p>Something new</p>:g"

Explanations:

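-pe: -p wraps the command in a loop that reads the input line by line and prints each line afterwards, and -e supplies the one-line Perl command itself (i.e. the expression within the " .....")
-i~ means to edit the original file in place, but first create a backup copy with "~" appended to its name
The pattern s/search regexp/replacement/g defines a search/replace pattern
Since the replacement contains a / (in the closing p tag), ":" is used as the separator instead
The final :g means to replace all occurrences on each line (not actually needed in our case).
\\1 (which the shell passes to Perl as \1) refers to the text captured by the parentheses, i.e. the attributes of the body tag.
The grouping parentheses () don't need to be escaped in Perl, unlike in the grep pattern above, where they are written \( \). I hate regexps :)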
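For comparison, here is a rough Python equivalent of the Perl edit. `add_after_body` and `edit_tree` are hypothetical helper names. Unlike the one-liner, this sketch only visits files ending in `.html`. It captures the attributes with `([^>]*)`, so the match cannot run past the end of the body tag the way a greedy `(.*)` can when the tag shares a line with other tags. Like `-i~`, it keeps a backup copy with "~" appended before changing a file.

```python
import re
import shutil
from pathlib import Path

BODY = re.compile(r"<body([^>]*)>")  # ([^>]*) captures the attributes, never past the first >
SNIPPET = "<p>Something new</p>"     # the line to add, as in the Perl example

def add_after_body(html: str) -> str:
    """Return html with SNIPPET inserted right after each <body ...> tag."""
    return BODY.sub(lambda m: f"<body{m.group(1)}> {SNIPPET}", html)

def edit_tree(root: str) -> None:
    """Edit every *.html file below root in place, keeping a backup ending in ~."""
    for path in Path(root).rglob("*.html"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        new_text = add_after_body(text)
        if new_text != text:
            shutil.copy2(path, path.with_name(path.name + "~"))  # backup, similar to perl -i~
            path.write_text(new_text, encoding="utf-8")

print(add_after_body('<body class="x"><div>hi</div></body>'))
# <body class="x"> <p>Something new</p><div>hi</div></body>
```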

Software

Most good text editors provide regexp support.
Most programming languages do too, including scripting languages such as PHP and JavaScript.

Links

Overviews

Regular expression (Wikipedia)
regular-expressions.info (includes a catalogue of the most popular ones)
expreg.com

Tutorials

Using Regular Expressions by Stephen Ramsay, Electronic Text Center, University of Virginia

The 30 Minute Regex Tutorial by Jim Hollenhorst

Programming languages

In JavaScript

Cheatsheets

Regular Expressions Cheat Sheet by LoveJackDaniels.com / Dave Child.

Online tools

regexpal.com (A testing tool for regexps)
