BBCode and Regex

Posted in Programming by gosukiwi on November 23, 2006

BBCode is very popular on forums. Even if you have never heard the name, you have probably seen it in use.

BBCode looks like [b]text[/b] for bold, [url=address]link title[/url] for links, and so on.
I will show you how to convert it to HTML with Perl and PHP.
You need a basic knowledge of regular expressions, so I will explain the basics first.

What are Regular Expressions?

Basically, regular expressions (also called regex or regexp) are patterns that describe a set of strings; a given string either matches the pattern or it does not.
For example, in PHP:

<?php
$string = 'Hello! I am fedekiller';
// @ is the delimiter; the trailing i makes the match case insensitive.
if (preg_match('@fedekiller@i', $string))
{
    echo 'We found fedekiller in that string.';
}
else
{
    echo 'We could not find fedekiller in that string.';
}
?>

In Perl:

#!/usr/bin/perl
use strict;
use warnings;
use CGI ':all';    # provides header, start_html and end_html

print header, start_html();
my $string = 'Hello! I am fedekiller';
if ($string =~ /fedekiller/i)
{
    print 'We found fedekiller in that string.';
}
else
{
    print 'We could not find fedekiller in that string.';
}
print end_html;

Both snippets look for fedekiller in the given string. In Perl we put the pattern between slashes. The full form of the match operator is m/pattern/, but with slashes the m is optional. If you want other delimiters, for example m|pattern| or m{pattern}, you have to write the m to say that you are matching. We will stick to // here to keep things simple. The i after the closing delimiter makes the match CaSe InSeNsItIvE; in PHP it goes after the closing @ in the same way.
As you have probably realised, to match something in Perl we use $string =~ /pattern/;
And in PHP we use preg_match('@pattern@', $string);
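
The point about delimiters applies to PHP as well. The sketch below is only an illustration (the sample string and variable names are made up for it): it runs the same case-insensitive search with three different delimiters and collects the results.

```php
<?php
// Sketch: the same search written with three different delimiters.
// PHP accepts most non-alphanumeric characters as a delimiter.
$string = 'Hello! I am FedeKiller';

$patterns = ['@fedekiller@i', '/fedekiller/i', '#fedekiller#i'];
$results = [];
foreach ($patterns as $pattern) {
    // preg_match returns 1 on a match, 0 on no match and false on error.
    $results[$pattern] = preg_match($pattern, $string);
}
print_r($results);
```

Whichever delimiter you pick, it has to be escaped if it also appears inside the pattern, which is one reason to pick @ when matching tags like [/b].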

Quantifiers
Quantifiers are used for repetition
* Matches 0 or more times
+ Matches 1 or more times
? Matches 0 or 1 time
{2} Matches exactly 2 times
{2,} Matches 2 or more
{2,5} Matches 2, 3, 4 or 5 times

Example:
Regex:

fe+dekil{2}er

Will Match:

feeeeeedekiller
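
To see which inputs the quantifiers accept and reject, here is a small PHP sketch. The candidate strings are invented for the example, and the ^ and $ anchors (explained further down) are added so the whole string has to match.

```php
<?php
// Sketch: e+ needs at least one e, l{2} needs exactly two l's.
$pattern = '/^fe+dekil{2}er$/';
$candidates = ['fedekiller', 'feeeeeedekiller', 'fdekiller', 'fedekiler'];

$matched = [];
foreach ($candidates as $candidate) {
    $matched[$candidate] = preg_match($pattern, $candidate) === 1;
}
var_export($matched);
// fedekiller and feeeeeedekiller match; fdekiller (no e after the f)
// and fedekiler (only one l) do not.
```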

Grouping
You can also group part of a pattern with "(" and ")". A group can be repeated as a unit, and the text it matched is captured so that you can use it afterwards.
Example:
Regex:

(fede)+

Will Match:

fedefedefede

We can now access the content of the first () through the variable $1. For example, if we did something like
Regex:

\[b\](.+?)\[\/b\]

Will Match:

[b]Something[/b]

And the variable $1 will contain Something. (In PHP's preg_replace, $1 works the same way inside the replacement string.)
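
In PHP, preg_match can also hand the captures back in an array passed as its third argument. The sketch below (sample text and variable names are assumptions for the example) reads the first capture and shows why the pattern uses the lazy .+? rather than a plain .+ when a string contains more than one tag.

```php
<?php
// Sketch: capture the text between [b] and [/b].
$text = '[b]one[/b] and [b]two[/b]';

// Lazy: .+? stops at the first closing tag it can reach.
preg_match('@\[b\](.+?)\[/b\]@', $text, $lazy);

// Greedy: .+ runs on to the last closing tag in the string.
preg_match('@\[b\](.+)\[/b\]@', $text, $greedy);

echo $lazy[1], "\n";    // one
echo $greedy[1], "\n";  // one[/b] and [b]two
```

Index 0 of the array holds the whole match; index 1 corresponds to $1, index 2 to $2, and so on.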

Alternatives
You can also match one thing OR another, using the | bar.
Example:
Regex:

(fede|killer)+

Will Match:

killerkillerfedekillerfede
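
A short PHP sketch of the same alternation, with ^ and $ added (an assumption of this example) so that the whole string must be made of the two words. It also shows a detail that is easy to miss: when a group repeats, the capture keeps only the last repetition.

```php
<?php
// Sketch: a repeated group of two alternatives.
$pattern = '/^(fede|killer)+$/';

$ok  = preg_match($pattern, 'killerkillerfedekillerfede', $m);  // 1
$bad = preg_match($pattern, 'fedekill');                         // 0

// The group matched five times; $m[1] holds the last one.
echo $m[1], "\n";  // fede
```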

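Modifiers
Modifiers go after the closing delimiter. The common ones are:
i – Makes the match CaSe InSeNsItIvE
g – Global match: find or replace every occurrence (Perl only; PHP's preg_replace already replaces every occurrence)
m – Multi-line: ^ and $ also match at the start and end of each line
s – Single line: the dot also matches newlines
x – Allow comments and white space in the pattern
e – Evaluate the replacement as code (Perl; PHP removed it from preg_replace in PHP 7)
U – Ungreedy: quantifiers are lazy by default (PHP/PCRE only)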
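
The following PHP sketch tries the modifiers that PHP supports, one at a time. The sample strings and variable names are made up for the illustration; the comments state what each call returns.

```php
<?php
// i: ignore case.
$i = preg_match('/bbcode/i', 'BBCode');                          // 1

// s: let the dot match a newline too.
$noS   = preg_match('/\[b\].+\[\/b\]/', "[b]two\nlines[/b]");    // 0
$withS = preg_match('/\[b\].+\[\/b\]/s', "[b]two\nlines[/b]");   // 1

// m: ^ and $ match at every line, not only at the ends of the string.
$lines = preg_match_all('/^\w+$/m', "one\ntwo", $found);         // 2

// U: quantifiers become lazy by default.
preg_match('/<.+>/', '<a><b>', $greedy);   // $greedy[0] is '<a><b>'
preg_match('/<.+>/U', '<a><b>', $lazy);    // $lazy[0] is '<a>'

// x: white space and # comments inside the pattern are ignored.
$commented = '@
    \[b\]    # opening tag
    (.+?)    # the text we want
    \[/b\]   # closing tag
@xi';
preg_match($commented, '[B]hi[/B]', $m);   // $m[1] is 'hi'
```

There is no g in PHP: use preg_match_all to find every match, and note that preg_replace replaces every occurrence unless you pass a limit.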

Escape Character
Sometimes we want to match a character such as ( ) or { } literally, without its special meaning. In BBCode, for example, we want to match [b][/b] but not use the special meaning of the []'s (they normally define a character class), so we escape them with the \ character.
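
Here is a small PHP sketch of what goes wrong without the backslashes, and of preg_quote, a built-in PHP function that adds them for you. The sample strings are invented for the example.

```php
<?php
// Unescaped, [b] is a character class that matches a single letter b.
$unescaped = preg_match('@[b]@', 'abc');          // 1, it found the b in abc

// Escaped by hand, only the literal tag matches.
$byHand1 = preg_match('@\[b\]@', 'abc');          // 0
$byHand2 = preg_match('@\[b\]@', 'a [b] tag');    // 1

// preg_quote() adds the backslashes; the second argument is the delimiter in use.
$open  = preg_quote('[b]', '@');                  // \[b\]
$close = preg_quote('[/b]', '@');                 // \[/b\]
preg_match('@' . $open . '(.+?)' . $close . '@i', 'some [b]bold[/b] text', $m);
echo $m[1], "\n";                                 // bold
```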

Anchors
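If you want something to match only at the beginning or the end of a string, you can use anchors
^ – Start of a string
$ – End of a string
\b – Word boundary
\B – Not word boundary
\< – Start of a word (only in some tools such as GNU grep; in Perl and PHP use \b)
\> – End of a word (only in some tools such as GNU grep; in Perl and PHP use \b)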

What is a word boundary?
A word boundary is the position between a word character and a non-word character. Wrapping a pattern in \b makes it match only as a complete word, so that, for example, we won't find cat in category.
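
The cat and category case can be checked directly. This is a minimal PHP sketch; the sentences used are assumptions for the example.

```php
<?php
// Without boundaries, cat is found inside category.
$plain    = preg_match('/cat/', 'category');         // 1

// With \b on both sides, only the complete word matches.
$bounded  = preg_match('/\bcat\b/', 'category');     // 0
$sentence = preg_match('/\bcat\b/', 'the cat sat');  // 1

// \B is the opposite: here cat must NOT start at a word boundary.
$inside   = preg_match('/\Bcat/', 'concatenate');    // 1
```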

Character Classes
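\cX – Control character (for example, \cC is Ctrl-C)
\s – White space
\S – Not white space
\d – Digit (number)
\D – Not digit
\w – Word character (letter, digit or underscore)
\W – Not word character
[[:xdigit:]] – Hexadecimal digit (\x on its own starts a hexadecimal escape, see below)
[0-7] – Octal digit (there is no \O class in Perl or PHP)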
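
A few of these classes in action, as a PHP sketch. The inputs are invented for the example, and [[:xdigit:]] is the POSIX class that Perl and PHP provide for hexadecimal digits.

```php
<?php
// \d with quantifiers: four digits, dash, two digits, dash, two digits.
$isDate = preg_match('/^\d{4}-\d{2}-\d{2}$/', '2006-11-23');   // 1

// \s+ as a separator: any run of spaces or tabs.
$words = preg_split('/\s+/', "one  two\tthree");   // ['one', 'two', 'three']

// \W+ replaces every run of non-word characters.
$clean = preg_replace('/\W+/', '_', 'BBCode & Regex!');   // 'BBCode_Regex_'

// POSIX class for hexadecimal digits.
$isHex = preg_match('/^[[:xdigit:]]+$/', 'ff00AA');       // 1
```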

Special Characters
\n – New Line
\r – Carriage Return
\t – Tab
\v – Vertical tab (in current Perl and PCRE, \v matches any vertical white space)
\f – Form feed
\xxx – Octal character xxx
\xhh – Hexadecimal character hh

Well, I think that's enough regex. We can now start on our BBCode function.
We want to match everything between the [b][/b] tags, and also the [i], [u], [img], [url] and [code] tags.
In PHP:

function bbcode($content)
{
    // (.+?) lazily captures the text between the tags; $1 and $2 insert the captures.
    $content = preg_replace('@\[b\](.+?)\[/b\]@i', '<b>$1</b>', $content);
    $content = preg_replace('@\[i\](.+?)\[/i\]@i', '<i>$1</i>', $content);
    $content = preg_replace('@\[u\](.+?)\[/u\]@i', '<u>$1</u>', $content);
    $content = preg_replace('@\[img\](.+?)\[/img\]@i', '<img src="$1" alt="" />', $content);
    $content = preg_replace('@\[url\](.+?)\[/url\]@i', '<a href="$1">[Link]</a>', $content);
    $content = preg_replace('@\[url=(.+?)\](.+?)\[/url\]@i', '<a href="$1">$2</a>', $content);
    $content = str_replace("\n", ' ', $content);
    // The [code] markup mirrors the Perl version below.
    $content = preg_replace('@\[code\](.+?)\[/code\]@i', '<div class="code"><pre>$1</pre></div>', $content);
    return $content;
}

We use the i after the closing @ to make the match case insensitive. We also use grouping to capture everything between the tags; the captured text is then available as $1, and we put it inside the corresponding HTML tags.
We do exactly the same in Perl
In Perl:

sub bbcode
{
    local $_ = $_[0];    # work on a copy of the first argument
    s/\n/ /g;
    s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;
    s/\[i\](.+?)\[\/i\]/<i>$1<\/i>/gi;
    s/\[u\](.+?)\[\/u\]/<u>$1<\/u>/gi;
    s/\[url\](.+?)\[\/url\]/<a href="$1" target="_blank">$1<\/a>/gi;
    s/\[url=(.+?)\](.+?)\[\/url\]/<a href="$1" target="_blank">$2<\/a>/gi;
    s/\[img\](.+?)\[\/img\]/<img src="$1" \/>/gi;
    s/\[code\](.+?)\[\/code\]/<div class="code"><pre>$1<\/pre><\/div>/gi;
    s/\[quote\](.+?)\[\/quote\]/<div class="quote">$1<\/div>/gi;
    return $_;
}

In PHP, preg_replace is called like this: preg_replace(regex, replace_with, string)
In Perl:
$string =~ s/regex/replace_with/;
If we don't specify the variable in Perl it will automatically use $_, but we still have to say that we are replacing, with the letter s.
So it will look like
s/regex/replacement/;
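
One thing neither function above does is escape HTML that the user typed in, so a post containing a script tag would be sent to the browser untouched. The sketch below is one possible way to handle that, not part of the original functions: bbcode_safe is a hypothetical name, it covers only three tags for brevity, and it relies on preg_replace accepting arrays of patterns and replacements.

```php
<?php
// Sketch (hypothetical helper): escape HTML first, then convert BBCode.
function bbcode_safe(string $content): string
{
    // Turn <, >, & and quotes into entities so typed-in HTML is shown as text.
    $content = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');

    // Pattern => replacement; the s modifier lets a tag span several lines.
    $rules = [
        '@\[b\](.+?)\[/b\]@is' => '<b>$1</b>',
        '@\[i\](.+?)\[/i\]@is' => '<i>$1</i>',
        '@\[u\](.+?)\[/u\]@is' => '<u>$1</u>',
    ];

    // With arrays, preg_replace applies each pair in order.
    return preg_replace(array_keys($rules), array_values($rules), $content);
}

$html = bbcode_safe('[b]bold[/b] <script>alert(1)</script>');
echo $html, "\n";
// <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Tags that take an address, such as [url] and [img], would need their own checks on top of this, which the sketch leaves out.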

Well, that's all. I hope you all find it useful 🙂

10 Responses to 'BBCode and Regex'

Stephen said,

on November 13, 2007 on 7:43 pm

Thanks for the concise and to-the-point regex info! It really helps.

danang said,

on September 29, 2008 on 12:55 pm

Hi,
I like the way you explain this regex stuff, but I have difficulties using [^], which means "not matched". I tried ^[^abc], which I think is equal to ^[^a|b|c].
So what if I want to say that the text does not start with 'abc' (rather than not 'a' or 'b' or 'c')? Thanks in advance.

assasiner said,

on September 29, 2008 on 1:55 pm

Hmm, if you want to see whether the string starts with a, b or c, it would be something like this

#!/usr/bin/perl
use strict;
use warnings;

my @string = ('an elephant', 'before midnight', 'come to my house later', 'once upon a time');

# ^ anchors the match at the start; (a|b|c) accepts any one of the three letters.
if ($string[3] =~ /^(a|b|c)/i)
{
    print 'It starts with a, b or c';
}
else
{
    print 'It does not start with a, b or c';
}

Change the index from 0 to 3 to see the different results.

danang said,

on September 29, 2008 on 3:16 pm

thank you for your quick answer.
but what I am asking is if string isn’t started by ‘abc’.
so ‘abc is my brand’ is true, and ‘bca is my bank’ is false.

assasiner said,

on September 29, 2008 on 4:02 pm

(abc) instead of (a|b|c)
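
For anyone reading along in PHP, here is a sketch of the three patterns discussed in this thread, run against the two example sentences from the question. The variable names are made up; (?!abc) is a negative lookahead, which is one way to express "does not start with abc".

```php
<?php
// Sketch: starts with abc, does not start with abc, first character not a/b/c.
$inputs = ['abc is my brand', 'bca is my bank'];

$report = [];
foreach ($inputs as $s) {
    $report[$s] = [
        'starts with abc'         => preg_match('/^abc/', $s) === 1,
        'does not start with abc' => preg_match('/^(?!abc)/', $s) === 1,
        // [^abc] is a negated class: ONE character that is not a, b or c.
        'first char not a, b, c'  => preg_match('/^[^abc]/', $s) === 1,
    ];
}
var_export($report);
```

The last pattern is false for both sentences because each of them starts with one of the three letters, which is why ^[^abc] does not mean "does not start with abc".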

boingonline said,

on October 15, 2010 on 10:36 am

How to make:
—————————————-
<pre>
pre content 1
</pre>

<pre>
pre content 2
</pre>
—————————————-
to this:
—————————————-
<pre>
 pre content 1
</pre>

<pre>
 pre content 2 
</pre>
—————————————-

alfanak said,

on September 17, 2011 on 2:56 pm

This helped me a lot.
Thank you very much.
