Highlighting keywords in text using Regex.Replace (Perfect for SEO)

Why

I needed to take some text and bold certain keywords before returning it to the web browser, to improve Search Engine Optimization (SEO).

Example

The following example shows how I achieved this, using dummy data. I created a new C# 2005 console application and added the following to the Main method:

string keywords = "Cat, rabbit, dog,hound, fox";
string text = "The cat spoke to the dog and told him what the rabbit did to the fox while the hound was sleeping.";

Console.WriteLine(HighlightKeywords(keywords, text));
Console.ReadLine();

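Then I added the following using directive at the top of the file, and these static methods to the same class:

using System.Text.RegularExpressions;

private static string HighlightKeywords(string keywords, string text)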
{
    // Swap each ", " (or bare ",") for a pipe and wrap the list in parentheses
    Regex r = new Regex(@", ?");
    keywords = "(" + r.Replace(keywords, @"|") +  ")";

    // Get ready to replace the keywords
    r = new Regex(keywords, RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Do the replace
    return r.Replace(text, new MatchEvaluator(MatchEval));
}

private static string MatchEval(Match match)
{
    if (match.Groups[1].Success)
    {
        return "<b>" + match.ToString() + "</b>";
    }

    return match.ToString(); // no group match: leave the text unchanged
}

Result

The <b>cat</b> spoke to the <b>dog</b> and told him 
what the <b>rabbit</b> did to the <b>fox</b> while 
the <b>hound</b> was sleeping.

Explanation

First, I swap each comma-plus-space (or bare comma, as in "dog,hound") for the pipe character, which is the regex "or" operator, and wrap the whole list in parentheses so it forms a single group:

    // Swap each ", " (or bare ",") for a pipe and wrap the list in parentheses
    Regex r = new Regex(@", ?");
    keywords = "(" + r.Replace(keywords, @"|") +  ")";
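The conversion above passes each keyword into the pattern as-is, so a keyword containing a regex metacharacter (for example "C++" or "node.js") changes the pattern's meaning, and "cat" also matches inside "category". Here is a minimal sketch, assuming you want whole-word matches of literal keywords: split the list on commas, trim and escape each entry with Regex.Escape, join with pipes and add \b word boundaries. KeywordPattern.Build is a hypothetical name, and \b assumes each keyword starts and ends with a letter, digit or underscore ("C++" is escaped correctly, but the trailing \b would then stop it matching before a space).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original article.
public static class KeywordPattern
{
    // Turns "Cat, rabbit, dog,hound, fox" into \b(Cat|rabbit|dog|hound|fox)\b
    public static string Build(string keywords)
    {
        List<string> parts = new List<string>();
        foreach (string raw in keywords.Split(','))
        {
            string word = raw.Trim();
            if (word.Length > 0)
            {
                // Escape metacharacters so each keyword is matched literally
                parts.Add(Regex.Escape(word));
            }
        }

        // Group 1 still holds the keyword, so MatchEval works unchanged;
        // \b stops "cat" from matching inside "category"
        return @"\b(" + string.Join("|", parts.ToArray()) + @")\b";
    }
}
```

Because the keyword is still captured as group 1, the returned string could be passed to new Regex(...) in HighlightKeywords without changing MatchEval.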

I then prepared a Regex object for the main keyword replace:

    // Get ready to replace the keywords
    r = new Regex(keywords, RegexOptions.Singleline | RegexOptions.IgnoreCase);

You can see I chose to ignore case, so "Cat" in the keyword list also matches "cat" in the text. I also passed Singleline; that option only changes the dot (.) so that it matches newline characters as well, and since this pattern contains no dot it has no effect here.
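A quick way to see what Singleline actually does is to compare a pattern containing a dot with and without the option. This sketch (the SinglelineDemo class is hypothetical) checks whether "cat.dog" matches across a newline, and confirms that a dot-free keyword pattern gives the same result either way.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original article.
public static class SinglelineDemo
{
    // Does the dot in "cat.dog" match the newline in "cat\ndog"?
    public static bool DotMatchesNewline(RegexOptions options)
    {
        return Regex.IsMatch("cat\ndog", "cat.dog", options);
    }

    // The keyword pattern has no dot, so Singleline should not change the output
    public static string Highlight(string text, RegexOptions options)
    {
        return Regex.Replace(text, "(cat|dog)", "<b>$1</b>", options | RegexOptions.IgnoreCase);
    }
}
```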

Now comes the replace. Notice that I pass a MatchEvaluator into the Replace method: Regex.Replace calls it once for each match and substitutes whatever string it returns, which lets me choose what each match is replaced with.

    // Do the replace
    return r.Replace(text, new MatchEvaluator(MatchEval));
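In C# 2005 the evaluator has to be a named method (as here) or an anonymous method; from C# 3.0 on it can also be a lambda, because a lambda converts to the MatchEvaluator delegate. A sketch with a hypothetical class name and the keyword group hard-coded for brevity:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; requires C# 3.0 or later for the lambda.
public static class LambdaEvaluatorDemo
{
    public static string Highlight(string text)
    {
        Regex r = new Regex("(cat|rabbit|dog|hound|fox)", RegexOptions.IgnoreCase);

        // The lambda plays the role of MatchEval: it is called once per match
        // and its return value replaces the matched text
        return r.Replace(text, m => "<b>" + m.Value + "</b>");
    }
}
```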

The MatchEval method only checks whether the first group matched; that's all I need here. Had the master regular expression contained two groups, the MatchEval method would have needed a second if.
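To illustrate that second if, here is a sketch with a hypothetical two-group pattern: group 1 matches pets and group 2 matches other animals, and each group gets a different tag. The split into two groups and the choice of <b> and <i> are assumptions made only for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original article.
public static class TwoGroupDemo
{
    // Group 1: pets, group 2: other animals (an arbitrary split for illustration)
    private static readonly Regex Pattern =
        new Regex("(cat|dog|rabbit)|(fox|hound)", RegexOptions.IgnoreCase);

    public static string Highlight(string text)
    {
        return Pattern.Replace(text, new MatchEvaluator(MatchEval));
    }

    private static string MatchEval(Match match)
    {
        if (match.Groups[1].Success)
        {
            return "<b>" + match.Value + "</b>";
        }

        // The second if, for the second group
        if (match.Groups[2].Success)
        {
            return "<i>" + match.Value + "</i>";
        }

        return match.Value; // not reached with this pattern; leave text unchanged
    }
}
```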

Author Paul Hayman

Paul is the COO of kwiboo ltd and has more than 20 years of IT consultancy experience. He has consulted for a number of blue chip companies and has been exposed to the following sectors: Utilities, Telecommunications, Insurance, Media, Investment Banking, Leisure, Legal, CRM, Pharmaceuticals, Interactive Gaming, Mobile Communications, Online Services.

Paul is the COO and co-founder of kwiboo (http://www.kwiboo.com/) and is also the creator of GeekZilla.

Comments

Egil Hansen said:

Neat trick.

One question though. Doesn't the Regex let you supply an array of strings as keywords? That would make the code a lot more elegant imo.

Another thing. <b> and </b> are presentational tags, and as such should be avoided when doing proper xhtml. Instead I would recommend <strong></strong> or <em></em>, depending on what you want.

Regards, Egil.

19/Jun/2007 13:56 PM
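On the array question: the Regex constructors take a single pattern string, so an array of keywords has to be joined into one alternation first. A minimal sketch, assuming the keywords arrive as a string[] of literal words, that also uses <strong> as suggested (the ArrayKeywords class is hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original article.
public static class ArrayKeywords
{
    public static string Highlight(string[] keywords, string text)
    {
        // Escape each keyword so it is matched literally
        string[] escaped = new string[keywords.Length];
        for (int i = 0; i < keywords.Length; i++)
        {
            escaped[i] = Regex.Escape(keywords[i]);
        }

        // Join into one pattern string with a single capturing group
        string pattern = "(" + string.Join("|", escaped) + ")";

        return Regex.Replace(text, pattern, "<strong>$1</strong>", RegexOptions.IgnoreCase);
    }
}
```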

smitha said:

Hi all,

I am using a regular expression to highlight text, but it also highlights the text when it is inside a textbox or text area.

Here is my code; please help me get rid of the problem.

Regex re = new Regex("(<[^>]?.)(<span class='hl'>" + keywords + "<\\/span>(.*?>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

text = re.Replace(text, "$1$2$3");

(keywords holds the one word that has to be highlighted.)

So now I am trying to remove the span that has already been added to the text inside a textbox. This works only when there is a single occurrence, not when the same text occurs multiple times.

11/Sep/2007 13:42 PM

xbit said:

This is my input:

strInput = "This is a bunch of text in <b>BOLD</b>, <i>Italic</i>, <br> yadayaya…<br><br>"

And this is how I filter it:

objRegEx.Pattern = "<[^>]*>"
objRegEx.Global = true
strOutput = objRegEx.Replace(strInput, "")

What if I want to remove all the other tags except line breaks, i.e. <br> tags? How can I do that?

14/Jan/2008 00:08 AM

phayman said:

OK, what you need to do is add another match to the front of your list of matches, as the first group, like this: <.[^>]*>. Then, in the match evaluator, just return the matched string unchanged:

if (match.Groups[1].Success)
{
    return match.ToString();
}

14/Jan/2008 17:48 PM
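One way to apply that advice is sketched below: a first group that matches a whole tag, which the evaluator hands back untouched, in front of the keyword group. Because each tag is consumed in one piece, keywords inside attributes (such as an input's value) are not highlighted. The class name and keyword list are hypothetical; note that text between tags, such as the contents of a <textarea>, would still be highlighted.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original article.
public static class TagSafeHighlighter
{
    // Group 1: a whole HTML tag; group 2: the keywords (illustrative list)
    private static readonly Regex Pattern =
        new Regex("(<.[^>]*>)|(cat|dog)", RegexOptions.IgnoreCase);

    public static string Highlight(string html)
    {
        return Pattern.Replace(html, new MatchEvaluator(MatchEval));
    }

    private static string MatchEval(Match match)
    {
        if (match.Groups[1].Success)
        {
            // A tag: return it unchanged so attribute values are not touched
            return match.Value;
        }

        // Otherwise group 2 matched a keyword outside any tag
        return "<b>" + match.Value + "</b>";
    }
}
```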

phayman said:

xbit, use this expression:

        (<br(?:[ /]+?)?>)|(<[^>]*>)

and replace with $1.

14/Jan/2008 17:59 PM
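Here is a sketch of that expression in C# (the question used VBScript; the TagStripper class is hypothetical). Group 1 captures <br>, <br/> or <br />, and every other tag falls into group 2. Replacing with $1 therefore keeps the line breaks and removes everything else, because a group that did not take part in the match substitutes as an empty string. As written the pattern is case-sensitive, so <BR> would be removed unless RegexOptions.IgnoreCase is added.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original article.
public static class TagStripper
{
    // Group 1: <br>, <br/> or <br />; group 2: any other tag
    private const string Pattern = @"(<br(?:[ /]+?)?>)|(<[^>]*>)";

    public static string StripAllButBreaks(string html)
    {
        // $1 is the <br> tag when group 1 matched, and empty when group 2 matched
        return Regex.Replace(html, Pattern, "$1");
    }
}
```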

Omar said:

Good, but you could have simplified it to one line and removed the MatchEvaluator function.

Something like:

text = Regex.Replace(text, "(" + keywords + ")", "<span style=background:yellow>$1</span>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

09/Sep/2008 17:42 PM
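The one-liner works because the replacement string can refer to the captured group with $1, so no MatchEvaluator is needed when every match gets the same treatment. A sketch wrapped in a hypothetical class, assuming keywords is already in "cat|dog" form, and quoting the style value:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper, not part of the original comment.
public static class OneLineHighlighter
{
    public static string Highlight(string keywords, string text)
    {
        // keywords is assumed to be pipe-separated already, e.g. "cat|dog"
        return Regex.Replace(text, "(" + keywords + ")",
            "<span style=\"background:yellow\">$1</span>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }
}
```

The evaluator version still earns its keep when the replacement depends on which group matched, as with a tag-skipping first group.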

swap said:

Hi everybody, I have a problem and need help with one concept. I have JavaScript code that reads the main headline from any website. When I click on that link it should open the actual news page, which can be done using openURL, but I also want to highlight default keywords that I pass in using JavaScript. Can anyone help me with how to do this, and what to use?

28/Jan/2009 04:54 AM

Philip said:

Hi!

Nice one!

Is it possible to highlight the complete word when the keyword matches inside a word?

Keyword: football

Text: Yada yada england footballclub.

And then get <b>footballclub</b> not just <b>football</b>club ?

Thanks!

17/Sep/2009 13:03 PM
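One way to get that, sketched under the assumption that a "word" means a run of \w characters (letters, digits, underscore): let the pattern absorb any word characters on either side of the keyword with \w*, then wrap the whole match using $0. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original article.
public static class WholeWordHighlighter
{
    public static string Highlight(string keyword, string text)
    {
        // \w* on both sides extends the match to the whole containing word
        string pattern = @"\w*" + Regex.Escape(keyword) + @"\w*";

        // $0 substitutes the entire match, e.g. "footballclub"
        return Regex.Replace(text, pattern, "<b>$0</b>", RegexOptions.IgnoreCase);
    }
}
```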

Yitbarek said:

Thank you so much, that was neat and very helpful. Thanks!!

23/Oct/2009 20:08 PM
