BBCode and Regex
BBCode is very popular on forums. Even if you have never heard of it, you have probably seen it without knowing its name.
BBCode looks like [b][/b] for bold, [url=url]linktitle[/url] for links, and so on.
I will show you how to convert it to HTML with Perl and PHP.
You need a basic knowledge of regular expressions. I will explain some basic stuff here.
What are Regular Expressions?
Basically, regular expressions (also called regex or regexp) are patterns that describe a set of strings; a string matches when it fits the pattern.
For example:
PHP:
<?php
$string = "Hello! I'm fedekiller";
// preg_match returns 1 when the pattern is found in the string.
if (preg_match('@fedekiller@i', $string))
{
    echo 'We found fedekiller in that string.';
}
else
{
    echo 'We could not find fedekiller in that string.';
}
?>
In Perl:
#!/usr/bin/perl
use strict;
use warnings;
use CGI ':all';
print header, start_html();
my $string = "Hello! I'm fedekiller";
# =~ binds the string to the pattern; i makes the match case-insensitive.
if ($string =~ /fedekiller/i)
{
    print 'We found fedekiller in that string.';
}
else
{
    print 'We could not find fedekiller in that string.';
}
print end_html;
Both scripts look for fedekiller in the given string. In Perl we put the pattern between slashes, which works because // is the default form of the match operator m//. If you want other delimiters, for example m|...| or m{...}, you have to write the m explicitly. We will stick to // here to keep things simple. The i after the closing / makes the match CaSe InSeNsItIvE; in PHP the same i goes after the closing @.
As you have probably realised, to match something in Perl we use $string =~ /pattern/;
And in PHP we use preg_match('@pattern@', $string);
Quantifiers
Quantifiers are used for repetition
* Matches 0 or more times
+ Matches 1 or more times
? Matches 0 or 1 time
{2} Matches exactly 2 times
{2,} Matches 2 or more times
{2,5} Matches 2, 3, 4 or 5 times
Example:
Regex:
fe+dekil{2}er
Will Match:
feeeeeedekiller
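To see the quantifiers at work, here is a minimal sketch that tries the pattern above against a few strings. The test strings and variable names are made up for illustration, and the pattern is wrapped in ^ and $ (anchors, covered further down) so that the whole string has to match.

```php
<?php
// Pattern from the example above, anchored so the whole string must match.
$pattern = '@^fe+dekil{2}er$@';

// Hypothetical test strings, chosen only for illustration.
$candidates = ['fedekiller', 'feeeeeedekiller', 'fdekiller', 'fedekiler'];

$results = [];
foreach ($candidates as $candidate) {
    // preg_match returns 1 on a match and 0 when there is none.
    $results[$candidate] = preg_match($pattern, $candidate);
    echo $candidate, ': ', $results[$candidate] ? 'match' : 'no match', "\n";
}
```

The last two strings fail because e+ needs at least one e after the f, and l{2} needs exactly two l characters.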
Grouping
You can also match by group, using “(” and “)”. The parentheses also capture the text they match, so you can use it later.
Example:
Regex:
(fede)+
Will Match:
fedefedefede
We can now access the content of the first () through the variable $1. For example, if we did something like
Regex:
\[b\](.+?)\[\/b\]
Will Match:
[b]Something[/b]
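And the variable $1 will contain Something. In PHP, $1 is available inside the replacement string of preg_replace; with preg_match you read the captured text from the optional third argument instead.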
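A short sketch of how a captured group can be read in PHP. The input string and variable names are assumptions made for this example.

```php
<?php
$text = 'Some [b]Something[/b] text'; // hypothetical input

// With preg_match, captured groups land in the third argument:
// $matches[0] is the whole match, $matches[1] is the first group.
if (preg_match('@\[b\](.+?)\[/b\]@i', $text, $matches)) {
    echo 'Whole match: ', $matches[0], "\n";
    echo 'Group 1: ', $matches[1], "\n";
}

// In a replacement string the same group is written as $1.
$html = preg_replace('@\[b\](.+?)\[/b\]@i', '<b>$1</b>', $text);
echo $html, "\n";
```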
Alternatives
You can also match one thing OR another, using the ‘|’ bar.
Example:
Regex:
(fede|killer)+
Will Match:
killerkillerfedekillerfede
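The alternation example can be checked with a few lines of PHP. This is only a sketch: the input strings are invented, and the pattern is anchored with ^ and $ so that partial matches do not count.

```php
<?php
// Either alternative may repeat, in any order, one or more times.
$pattern = '@^(fede|killer)+$@';

// Hypothetical inputs for illustration.
$inputs = ['killerkillerfedekillerfede', 'fede', 'fedekill'];

$matched = [];
foreach ($inputs as $input) {
    $matched[$input] = preg_match($pattern, $input) === 1;
    echo $input, ': ', $matched[$input] ? 'match' : 'no match', "\n";
}
```

The last string fails because kill is neither fede nor killer.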
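Modifiers
The modifiers are:
i – Makes the match CaSe InSeNsItIvE
g – Global match (Perl only; in PHP use preg_match_all, and preg_replace already replaces every match)
m – Multi-line mode: ^ and $ also match at the start and end of each line
s – Single-line mode: . also matches a newline
x – Allow comments and white space in the pattern
e – Evaluate the replacement as code (Perl; removed from PHP in version 7.0, where preg_replace_callback is used instead)
U – Ungreedy pattern (PHP only)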
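The effect of a modifier is easiest to see by running the same pattern with and without it. The following is a rough sketch in PHP; the sample text and variable names are invented for the example.

```php
<?php
$text = "First line\nsecond LINE"; // hypothetical two-line input

// i: ignore case.
$i = preg_match('@line@i', 'LINE');

// m: ^ and $ also match at the start and end of each line.
$withoutM = preg_match('@^second@', $text);
$withM    = preg_match('@^second@m', $text);

// s: the dot also matches a newline.
$withoutS = preg_match('@First.+LINE@', $text);
$withS    = preg_match('@First.+LINE@s', $text);

// x: white space and # comments inside the pattern are ignored.
$x = preg_match('@
    fede    # first half of the name
    killer  # second half of the name
@x', 'fedekiller');

// U: quantifiers become ungreedy by default.
$bb = '[b]one[/b] [b]two[/b]';
preg_match('@\[b\](.+)\[/b\]@', $bb, $greedy);
preg_match('@\[b\](.+)\[/b\]@U', $bb, $ungreedy);

echo "greedy:   {$greedy[1]}\n";
echo "ungreedy: {$ungreedy[1]}\n";
```

Without U, the greedy .+ swallows everything up to the last [/b]; with U it stops at the first one, which is the same effect the tutorial gets by writing .+? instead.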
Escape Character
Sometimes we want to match characters such as ( ) or { } literally, without triggering their special meaning. For example, in BBCode we want to match [b][/b], but [ and ] normally define a character class, so we escape them with the \ character: \[b\]
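Here is a small sketch of what escaping changes, using PHP's preg_quote to do the escaping for us. The variable names are assumptions for this example.

```php
<?php
// Unescaped, [b] is a character class that matches a single "b".
$unescaped = preg_match('@[b]@', 'abc');

// Escaped, \[b\] matches the literal text "[b]".
$escapedOnPlain = preg_match('@\[b\]@', 'abc');
$escapedOnTag   = preg_match('@\[b\]@', '[b]');

// preg_quote adds the backslashes automatically; the second argument
// names the delimiter so that it gets escaped as well.
$quoted = preg_quote('[b]', '@');
echo $quoted, "\n";
```

preg_quote is handy when the text to match comes from a variable rather than being typed directly into the pattern.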
Anchors
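If you want something to match only at the beginning or the end of a string, you can use anchors:
^ – Start of a string
$ – End of a string
\b – Word boundary
\B – Not word boundary
\< – Start of a word (supported by some tools such as GNU grep and Vim, but not by Perl or PHP; use \b there)
\> – End of a word (same restriction as \<)
What is a word boundary?
A word boundary is the position between a word character and a non-word character. Use it when you want to match a string only as a complete word, for example to make sure we won't find cat in category.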
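The cat and category case can be tried directly. This is a minimal sketch; the sample strings and variable names are invented for illustration.

```php
<?php
// Without anchors, cat is found inside category.
$plain = preg_match('@cat@', 'category');

// \b on both sides only accepts cat as a complete word.
$wordInCategory = preg_match('@\bcat\b@', 'category');
$wordInSentence = preg_match('@\bcat\b@', 'my cat sleeps');

// ^ and $ tie the match to the start or the end of the string.
$atStart = preg_match('@^cat@', 'category');
$atEnd   = preg_match('@cat$@', 'category');

// \B is the opposite of \b: here cat must NOT start a word.
$insideWord = preg_match('@\Bcat@', 'bobcat');

echo "plain: $plain, word in category: $wordInCategory, word in sentence: $wordInSentence\n";
echo "at start: $atStart, at end: $atEnd, inside word: $insideWord\n";
```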
Character Classes
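\c – Control character (written as \cX for Ctrl+X)
\s – White space
\S – Not white space
\d – Digit (number)
\D – Not digit
\w – Word character (letter, digit or underscore)
\W – Not a word character
[[:xdigit:]] – Hexadecimal digit (in Perl and PHP, \x on its own starts a hexadecimal escape, see below)
[0-7] – Octal digit (\O is not supported in Perl or PHP)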
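A few of these classes in action, as a sketch in PHP. The date format, sample strings and variable names are assumptions chosen for the example.

```php
<?php
// \d with quantifiers: a date written as four digits, two digits, two digits.
$isDate  = preg_match('@^\d{4}-\d{2}-\d{2}$@', '2007-11-13');
$notDate = preg_match('@^\d{4}-\d{2}-\d{2}$@', 'November 13');

// \s+ as a separator: split on any run of white space.
$words = preg_split('@\s+@', "one  two\tthree");

// \W+ removes everything that is not a word character.
$clean = preg_replace('@\W+@', '', 'Hello, world!');

// A character class for hexadecimal digits.
$isHex = preg_match('@^[[:xdigit:]]+$@', 'ff00aa');

echo implode('|', $words), "\n";
echo $clean, "\n";
```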
Special Characters
\n – New Line
\r – Carriage Return
\t – Tab
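\v – Vertical tab (in current Perl and PHP, \v matches any vertical white space character, including the vertical tab)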
\f – Form feed
\xxx – Octal character xxx
\xhh – Hexadecimal character hh
Well, I think that's enough regex; we can now start with our BBCode function.
So, we want to match everything between the [b][/b] tags, and also the [i], [u], [img] and [url] tags.
In PHP:
function bbcode($content)
{
    // Each tag pair becomes the matching HTML element; $1 is the text captured between the tags.
    $content = preg_replace('@\[b\](.+?)\[\/b\]@i', '<b>$1</b>', $content);
    $content = preg_replace('@\[i\](.+?)\[\/i\]@i', '<i>$1</i>', $content);
    $content = preg_replace('@\[u\](.+?)\[\/u\]@i', '<u>$1</u>', $content);
    $content = preg_replace('@\[img\](.+?)\[\/img\]@i', '<img src="$1" alt="" />', $content);
    $content = preg_replace('@\[url\](.+?)\[\/url\]@i', '<a href="$1">[Link]</a>', $content);
    // [url=...] has two groups: $1 is the address, $2 is the link title.
    $content = preg_replace('@\[url=(.+?)\](.+?)\[\/url\]@i', '<a href="$1">$2</a>', $content);
    // Turn line breaks into <br /> tags.
    $content = str_replace("\n", '<br />', $content);
    $content = preg_replace('@\[code\](.+?)\[\/code\]@i', '<span class="box">$1</span>', $content);
    return $content;
}
We use the i after the closing @ to make the match case-insensitive. We also use grouping to capture everything between the tags; the captured text is available as $1 (and $2 for a second group), and we put it inside the corresponding HTML tags.
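One thing worth considering before using a function like this on a real forum: the text comes from users, so any HTML they type would be passed through unchanged. The following is only a sketch of one possible approach, not part of the original function. The name bbcode_safe, the reduced tag set, and the rule that links must start with http:// or https:// are all assumptions made for this example.

```php
<?php
// Hypothetical variant of the bbcode idea; the name and rules are assumptions.
function bbcode_safe(string $content): string
{
    // Escape HTML first, so user-typed markup is displayed as text.
    $content = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');

    // The s modifier lets a tag pair span several lines.
    $content = preg_replace('@\[b\](.+?)\[/b\]@is', '<b>$1</b>', $content);
    $content = preg_replace('@\[i\](.+?)\[/i\]@is', '<i>$1</i>', $content);

    // Only accept addresses that start with http:// or https://.
    $content = preg_replace(
        '@\[url=(https?://[^\]\s]+)\](.+?)\[/url\]@is',
        '<a href="$1">$2</a>',
        $content
    );

    // nl2br inserts <br /> before each line break.
    return nl2br($content);
}

echo bbcode_safe('[b]Hi[/b] <script>'), "\n";
echo bbcode_safe('[url=http://example.com]site[/url]'), "\n";
echo bbcode_safe('[url=javascript:alert(1)]x[/url]'), "\n";
```

In this sketch, tags that do not fit the stricter patterns are simply left as plain text, as the last line shows.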
We do exactly the same in Perl
In Perl:
sub bbcode
{
    # Work on a local copy in $_ so the caller's $_ is not overwritten.
    local $_ = $_[0];
    s/\n/<br \/>/gi;
    s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;
    s/\[i\](.+?)\[\/i\]/<i>$1<\/i>/gi;
    s/\[u\](.+?)\[\/u\]/<u>$1<\/u>/gi;
    s/\[url\](.+?)\[\/url\]/<a href="$1" target="_blank">$1<\/a>/gi;
    s/\[url=(.+?)\](.+?)\[\/url\]/<a href="$1" target="_blank">$2<\/a>/gi;
    s/\[img\](.+?)\[\/img\]/<img src="$1" \/>/gi;
    s/\[code\](.+?)\[\/code\]/<div class="code"><pre>$1<\/pre><\/div>/gi;
    s/\[quote\](.+?)\[\/quote\]/<div class="quote">$1<\/div>/gi;
    return $_;
}
In PHP, preg_replace is called like this: preg_replace(regex, replace_with, string)
In Perl:
$string =~ s/regex/replace_with/;
If we don't specify the variable in Perl, it will automatically use $_, but we still have to say that we are replacing, with the letter s.
So it will look like
s/regex/replacement/;
Well, that's all. I hope you all find it useful 🙂
on November 13, 2007 on 7:43 pm
Thanks for the concise and to-the-point regex info! It really helps.
on September 29, 2008 on 12:55 pm
hi…
I like the way you explain this regex stuff, but I have difficulties using the anchor together with [^], which means not matched. I tried ^[^abc], which is equal to ^[^a|b|c].
So what if I want to say that the text does not start with ‘abc’ (rather than not with ‘a’ or ‘b’ or ‘c’)? Thanks in advance.
on September 29, 2008 on 1:55 pm
Hmm, if you want to see if the string starts with a, b or c, it would be something like this:
#!/usr/bin/perl
use strict;
my @string = ('an elephant', 'before midnight', 'come to my house later', 'once upon a time');
if ($string[3] =~ /^(a|b|c)/i)
{
    print 'It starts with a, b or c';
}
else
{
    print 'It does not start with a, b or c';
}
Change the index from 0 to 3 to see the results.
on September 29, 2008 on 3:16 pm
thank you for your quick answer.
but what I am asking is if string isn’t started by ‘abc’.
so ‘abc is my brand’ is true, and ‘bca is my bank’ is false.
on September 29, 2008 on 4:02 pm
(abc) instead of (a|b|c)
on October 15, 2010 on 10:36 am
How to make:
—————————————-
<pre>
pre content 1
</pre>
<pre>
pre content 2
</pre>
—————————————-
to this:
—————————————-
<pre>
<b> pre content 1</b>
</pre>
<pre>
<b> pre content 2 </b>
</pre>
—————————————-
on September 17, 2011 on 2:56 pm
this helped me a lot
thank you very match