From PHP to Perl


BBCode and Regex

Posted in Programming by gosukiwi on November 23, 2006

BBCode is very popular on forums. Even if you have never heard the name, you have probably seen it.

BBCode looks like [b]text[/b] for bold, [url=url]linktitle[/url] for links, and so on.
I will show you how to convert it to HTML with Perl and PHP.
You need a basic knowledge of regular expressions, so I will explain the basics first.

What are Regular Expressions?

Basically, regular expressions (also called regex or regexp) are patterns that describe a set of strings. You test a string against the pattern to see whether it matches.
For example
In PHP:

<?php
$string = 'Hello! Im fedekiller';
// The i after the closing @ makes the match case insensitive.
if (preg_match('@fedekiller@i', $string))
{
    echo 'We found fedekiller in that string.';
}
else
{
    echo 'We could not find fedekiller in that string.';
}
?>

In Perl:

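#!/usr/bin/perl
use strict;
use warnings;
# CGI.pm is no longer bundled with Perl since 5.22; install it from CPAN if needed.
use CGI ':all';

# Print the HTTP header and the opening HTML tags.
print header, start_html();
my $string = 'Hello! im fedekiller';
# The i after the closing / makes the match case insensitive.
if ($string =~ /fedekiller/i)
{
    print 'We found fedekiller in that string.';
}
else
{
    print 'We could not find fedekiller in that string.';
}
print end_html;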

Both snippets look for fedekiller in the given string. In Perl the pattern goes between slashes, and the full form of the match operator is m/pattern/. The m is optional only when the delimiters are slashes; with other delimiters, for example m|pattern| or m{pattern}, you have to write it. We will stick to // here to keep things simple. The i after the closing delimiter makes the match case insensitive, in Perl after the / and in PHP after the @.
As you have probably realised, to match something in Perl we use $string =~ /pattern/;
And in PHP we use preg_match('@pattern@', $string);
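Here is a minimal sketch for trying the delimiters from the command line (no CGI needed). The variable names are only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'Hello! Im fedekiller';

# With slashes the leading "m" is optional.
my $with_slashes = ($string =~ /fedekiller/i)  ? 1 : 0;

# With any other delimiter the "m" is required.
my $with_braces  = ($string =~ m{FEDEKILLER}i) ? 1 : 0;
my $with_pipes   = ($string =~ m|fedekiller|i) ? 1 : 0;

print "slashes: $with_slashes, braces: $with_braces, pipes: $with_pipes\n";
```

All three forms are the same match operator, so they should agree with each other.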

Quantifiers
Quantifiers are used for repetition
* Matches 0 or more times
+ Matches 1 or more times
? Matches 0 or 1 time
{2} Matches exactly 2 times
{2,} Matches 2 or more
{2,5} Matches 2, 3, 4 or 5 times

Example:
Regex:

fe+dekil{2}er

Will Match:

feeeeeedekiller
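To check the quantifiers yourself, here is a small sketch in Perl. The candidate strings are made up for illustration, and the pattern is wrapped in ^ and $ (explained under Anchors) so the whole string has to fit:

```perl
use strict;
use warnings;

# Anchored with ^ and $ so the whole string has to fit the pattern.
my $pattern = qr/^fe+dekil{2}er$/;

my %results;
for my $candidate ('fedekiller', 'feeeeeedekiller', 'fdekiller', 'fedekiler') {
    $results{$candidate} = ($candidate =~ $pattern) ? 'match' : 'no match';
    print "$candidate: $results{$candidate}\n";
}
```

The last two candidates fail because e+ needs at least one e and l{2} needs exactly two l characters.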

Grouping
You can also group part of a pattern with “(” and “)”. A group can be repeated as a unit, and the text it matched is captured in a variable.
Example:
Regex:

(fede)+

Will Match:

fedefedefede

We can now access the content of the first () with the variable $1. For example, if we did something like
Regex:

\[b\](.+?)\[\/b\]

Will Match:

[b]Something[/b]

And the variable $1 will contain Something
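A short sketch of capturing in Perl, and of why the pattern uses .+? rather than .+. The sample text and variable names are my own; the g modifier used in the last match is described under Modifiers:

```perl
use strict;
use warnings;

my $text = 'Plain [b]Something[/b] and [b]more[/b]';

# In list context a successful match returns the captured groups.
my ($lazy) = $text =~ /\[b\](.+?)\[\/b\]/;

# Without the ?, .+ is greedy and runs on to the last [/b].
my ($greedy) = $text =~ /\[b\](.+)\[\/b\]/;

# With g in list context we get the capture from every match.
my @all = $text =~ /\[b\](.+?)\[\/b\]/g;

print "lazy:   $lazy\n";
print "greedy: $greedy\n";
print "all:    @all\n";
```

The greedy version swallows the closing tag of the first pair and the opening tag of the second, which is why the BBCode patterns use the ungreedy .+? form.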

Alternatives
You can also match one thing OR another, using the ‘|’ bar
Example:
Regex:

(fede|killer)+

Will Match:

killerkillerfedekillerfede
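The same alternation can be tried in Perl with the sketch below. The candidate strings are invented for the example, and the pattern is anchored so that partial matches do not count:

```perl
use strict;
use warnings;

my $pattern = qr/^(fede|killer)+$/;

my %results;
for my $candidate ('killerkillerfedekillerfede', 'killer', 'fedekill') {
    if ($candidate =~ $pattern) {
        # $1 holds what the group matched on its last repetition.
        $results{$candidate} = "match, last group: $1";
    }
    else {
        $results{$candidate} = 'no match';
    }
    print "$candidate: $results{$candidate}\n";
}
```

Note that a repeated group only keeps its last repetition in $1, not every one of them.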

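Modifiers
Modifiers go after the closing delimiter and change how the pattern is applied:
i – Case insensitive match
g – Global match: find or replace every occurrence, not only the first (Perl only; in PHP, preg_replace already replaces every occurrence and preg_match_all finds them all)
m – Multi-line mode: ^ and $ also match at the start and end of each line
s – Single-line mode: the dot also matches a newline
x – Allow comments and white space in the pattern
e – Evaluate the replacement as code (Perl substitutions; PHP's preg_replace no longer supports it, use preg_replace_callback instead)
U – Ungreedy pattern (PHP only)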
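The sketch below tries the Perl modifiers one by one on a two-line string. The sample text and variable names are assumptions made for the example:

```perl
use strict;
use warnings;

my $text = "first line\nSecond line\n";

# i: ignore case.
my $i_match = ($text =~ /second/i) ? 1 : 0;

# g: in list context, return every match instead of only the first.
my @words = $text =~ /(\w+)/g;

# m: ^ and $ also match at the start and end of each line.
my $without_m = ($text =~ /^Second/)  ? 1 : 0;
my $with_m    = ($text =~ /^Second/m) ? 1 : 0;

# s: the dot also matches a newline.
my $without_s = ($text =~ /first.+Second/)  ? 1 : 0;
my $with_s    = ($text =~ /first.+Second/s) ? 1 : 0;

# x: white space and comments inside the pattern are ignored.
my $with_x = ($text =~ / first \s line   # two words separated by white space
                       /x) ? 1 : 0;

# e: evaluate the replacement as Perl code (substitutions only).
my $doubled = 'price 21';
$doubled =~ s/(\d+)/$1 * 2/e;

print "i: $i_match, words: ", scalar(@words), "\n";
print "m: $without_m -> $with_m, s: $without_s -> $with_s, x: $with_x\n";
print "e: $doubled\n";
```

When a comment is used inside an x pattern, it must not contain the delimiter character, otherwise the pattern ends early.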

Escape Character
Some characters, such as (), {} and [], have a special meaning in a pattern. When we want to match them literally, we escape them with the \ character. For example, in BBCode we want to match [b][/b] literally, but unescaped square brackets define a character class, so we write \[b\]\[\/b\].
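A small Perl sketch of the difference escaping makes. The sample strings are invented; \Q...\E is Perl's way of escaping a whole interpolated variable:

```perl
use strict;
use warnings;

# Unescaped, [b] is a character class that matches a single letter b.
my $class_hits_bold = ('bold' =~ /[b]/) ? 1 : 0;

# Escaped, \[b\] only matches the three literal characters [b].
my $literal_hits_bold = ('bold'   =~ /\[b\]/) ? 1 : 0;
my $literal_hits_tag  = ('[b]old' =~ /\[b\]/) ? 1 : 0;

# \Q...\E escapes every special character in a string held in a variable.
my $tag = '[b]';
my $quoted_hits_tag = ('[b]old' =~ /\Q$tag\E/) ? 1 : 0;

print "class on 'bold': $class_hits_bold\n";
print "literal on 'bold': $literal_hits_bold, on '[b]old': $literal_hits_tag\n";
print "quoted on '[b]old': $quoted_hits_tag\n";
```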

Anchors
If you want something to match only at the beginning or the end of a string, you can use anchors
^ – Start of a string
$ – End of a string
\b – Word boundary
\B – Not word boundary
\< – Start of a word (not supported by Perl or PHP; use \b there)
\> – End of a word (not supported by Perl or PHP; use \b there)

What is a word boundary?
A word boundary is the position between a word character and a non-word character. It lets us match a string only as a complete word, so that for example we will not find cat in category.
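The cat and category case can be checked with this sketch in Perl. The phrases are made up for illustration:

```perl
use strict;
use warnings;

my @phrases = ('my cat sleeps', 'pick a category', 'concatenate strings');

# Without boundaries, cat is found inside longer words too.
my @plain = grep { /cat/ } @phrases;

# With \b on both sides, only the complete word cat matches.
my @whole_word = grep { /\bcat\b/ } @phrases;

print 'plain: ', scalar(@plain), ", whole word: ", scalar(@whole_word), "\n";
print "whole word match: $whole_word[0]\n";
```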

Character Classes
\cX – Control character (for example \cC is Ctrl+C)
\s – White space
\S – Not white space
\d – Digit (number)
\D – Not digit
\w – Word character (letter, digit or underscore)
\W – Not word character
[[:xdigit:]] – Hexadecimal digit
[0-7] – Octal digit
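A quick sketch of some of these classes in Perl. The sample line and variable names are assumptions for the example. Hexadecimal digits are matched with the POSIX class [[:xdigit:]] and octal digits with the range [0-7]:

```perl
use strict;
use warnings;

my $line = "Order 66\tshipped";

# \d+ picks out runs of digits, \w+ runs of word characters.
my @numbers = $line =~ /(\d+)/g;
my @words   = $line =~ /(\w+)/g;

# \s+ matches any run of white space, including the tab.
my @fields = split /\s+/, $line;

# Whole-string checks for hexadecimal and octal digits.
my $is_hex   = ('1aF9' =~ /^[[:xdigit:]]+$/) ? 1 : 0;
my $is_octal = ('0759' =~ /^[0-7]+$/)        ? 1 : 0;

print "numbers: @numbers\n";
print "words: @words\n";
print 'fields: ', scalar(@fields), "\n";
print "hex: $is_hex, octal: $is_octal\n";
```

The octal check fails on purpose here, because 9 is not an octal digit.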

Special Characters
\n – New line
\r – Carriage return
\t – Tab
\v – Vertical white space, which includes the vertical tab (Perl 5.10 and later, PCRE 7.2 and later)
\f – Form feed
\ooo – Octal character ooo
\xhh – Hexadecimal character hh

Well, I think that’s enough regex. We can now start with our BBCode function.
We want to match everything between the [b][/b] tags, and also the [i], [u], [img] and [url] tags.
In PHP:

function bbcode($content)
{
    // Patterns and replacements are single-quoted so PHP leaves \[ and $1 alone.
    $content = preg_replace('@\[b\](.+?)\[\/b\]@i', '<b>$1</b>', $content);
    $content = preg_replace('@\[i\](.+?)\[\/i\]@i', '<i>$1</i>', $content);
    $content = preg_replace('@\[u\](.+?)\[\/u\]@i', '<u>$1</u>', $content);
    $content = preg_replace('@\[img\](.+?)\[\/img\]@i', '<img src="$1" alt="" />', $content);
    $content = preg_replace('@\[url\](.+?)\[\/url\]@i', '<a href="$1">[Link]</a>', $content);
    $content = preg_replace('@\[url=(.+?)\](.+?)\[\/url\]@i', '<a href="$1">$2</a>', $content);
    // Line breaks become <br /> before the [code] tag is handled.
    $content = str_replace("\n", '<br />', $content);
    $content = preg_replace('@\[code\](.+?)\[\/code\]@i', '<span class="box">$1</span>', $content);
    return $content;
}

We use the i after the closing @ to make the match case insensitive. We also use grouping to capture everything between the tags; the captured text is available as $1 (and $2), and we put it inside the corresponding HTML tags.
We do exactly the same in Perl
In Perl:

sub bbcode
{
    # Work on a localized copy in $_ so the caller's $_ is left alone.
    local $_ = shift;
    # Line breaks become <br /> first, then each tag pair is replaced.
    s/\n/<br \/>/g;
    s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;
    s/\[i\](.+?)\[\/i\]/<i>$1<\/i>/gi;
    s/\[u\](.+?)\[\/u\]/<u>$1<\/u>/gi;
    s/\[url\](.+?)\[\/url\]/<a href="$1" target="_blank">$1<\/a>/gi;
    s/\[url=(.+?)\](.+?)\[\/url\]/<a href="$1" target="_blank">$2<\/a>/gi;
    s/\[img\](.+?)\[\/img\]/<img src="$1" \/>/gi;
    s/\[code\](.+?)\[\/code\]/<div class="code"><pre>$1<\/pre><\/div>/gi;
    s/\[quote\](.+?)\[\/quote\]/<div class="quote">$1<\/div>/gi;
    return $_;
}

In PHP, preg_replace is called like this: preg_replace(regex, replace_with, string)
In Perl:
$string =~ s/regex/replace_with/;
If we don’t specify the variable in Perl, it will automatically use $_, but we still have to say that we are replacing, with the letter s.
So it will look like
s/regex/replacement/;
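Both forms of the substitution can be compared with this sketch. The sample strings and variable names are only for illustration:

```perl
use strict;
use warnings;

my $post = '[b]one[/b] and [b]two[/b]';

# Explicit variable: copy $post, then bind the substitution with =~.
# Without g only the first occurrence is replaced.
(my $first_only = $post) =~ s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/i;
(my $all        = $post) =~ s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;

# Implicit variable: inside the loop each element is aliased to $_,
# so a bare s/// changes the array element in place.
my @lines = ('[i]one[/i]', 'plain', '[I]two[/I]');
for (@lines) {
    s/\[i\](.+?)\[\/i\]/<i>$1<\/i>/gi;
}

print "$first_only\n$all\n";
print join(', ', @lines), "\n";
```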

Well, that’s all. I hope you all find it useful 🙂

9 Responses to 'BBCode and Regex'

  1. Stephen said,

    Thanks for the concise and to-the-point regex info! It really helps.

  2. danang said,

    hi…
    i like the way you explain this regex stuff. but I have dificulties in using anchor [^] which means not matched. I tried to use ^[^abc]is equal to ^[^a|b|c].
    So what about if I want to define the the text is not started by ‘abc’ (not ‘a’ or ‘b’ or ‘c’) . thank in advance..

  3. assasiner said,

    Hmm if you want to see if the string starts with a,b or c it would be something like this

    #!/usr/bin/perl
    use strict;

    my @string = ('an elephant', 'before midnight', 'come to my house later', 'once uppon a time');

    if($string[3] =~ /^(a|b|c)/i)
    {
    print 'It starts with a, b or c';
    }
    else
    {
    print 'It does not start with a,b or c';
    }

    range from 0 to 3 to see the results

  4. danang said,

    thank you for your quick answer.
    but what I am asking is if string isn’t started by ‘abc’.
    so ‘abc is my brand’ is true, and ‘bca is my bank’ is false.

  5. assasiner said,

    (abc) instead of (a|b|c)

  6. standy said,

    fHfla5 bnnLst19hdY6llAd3fg6

  7. boingonline said,

    I have this code:
    —————————————-

       pre content 1
    
       pre content 2
    

    —————————————-

    how can i make:

    —————————————-

       pre content 1
    
      pre content 2
    

    —————————————-

  8. boingonline said,

    How to make:
    —————————————-
    <pre>
    pre content 1
    </pre>

    <pre>
    pre content 2
    </pre>
    —————————————-
    to this:
    —————————————-
    <pre>
    <b> pre content 1</b>
    </pre>

    <pre>
    <b> pre content 2 </b>
    </pre>
    —————————————-

  9. alfanak said,

    this helped me a lot
    thank you very match

