GeekZilla
Highlighting keywords in text using Regex.Replace (Perfect for SEO)
Why
I needed to take some text and bold certain keywords in it before returning the data to the web browser, to help my search engine optimization.
Example
The following example shows how I achieved this although it does contain dummy data. I created a new C# 2005 Console App and added the following to the Main method:
string keywords = "Cat, rabbit, dog,hound, fox";
string text = "The cat spoke to the dog and told him what the rabbit did to the fox while the hound was sleeping.";
Console.WriteLine(HighlightKeywords(keywords, text));
Console.ReadLine();
Then I added the following static methods (the file also needs using System.Text.RegularExpressions; at the top):
private static string HighlightKeywords(string keywords, string text)
{
    // Swap out the ,<space> for pipes and add the braces
    Regex r = new Regex(@", ?");
    keywords = "(" + r.Replace(keywords, @"|") + ")";

    // Get ready to replace the keywords
    r = new Regex(keywords, RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Do the replace
    return r.Replace(text, new MatchEvaluator(MatchEval));
}

private static string MatchEval(Match match)
{
    if (match.Groups[1].Success)
    {
        return "<b>" + match.ToString() + "</b>";
    }

    return ""; //no match
}
Result
The <b>cat</b> spoke to the <b>dog</b> and told him what the <b>rabbit</b> did to the <b>fox</b> while the <b>hound</b> was sleeping.
Explanation
First of all, I needed to swap out the comma+space (or just comma in some cases) for the pipe character, which is the regex "or" operator, and wrap the result in parentheses so that it forms a single group:
// Swap out the ,<space> for pipes and add the braces
Regex r = new Regex(@", ?");
keywords = "(" + r.Replace(keywords, @"|") + ")";
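As a variation on the step above, here is a minimal sketch that builds the same alternation by splitting the list instead of using a regex replace. It assumes a newer C# than the 2005 version used in this article (it relies on LINQ), and the class and method names are hypothetical. It also trims stray whitespace and runs each keyword through Regex.Escape, so a keyword such as "C++" is treated as literal text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordPattern
{
    // Hypothetical helper, not part of the original article: turns a
    // comma-separated keyword list into a parenthesised regex alternation.
    public static string Build(string keywords)
    {
        var escaped = keywords
            .Split(',')                  // split on commas, with or without spaces
            .Select(k => k.Trim())       // drop whitespace around each keyword
            .Where(k => k.Length > 0)    // ignore empty entries such as "a,,b"
            .Select(Regex.Escape);       // treat regex metacharacters literally

        return "(" + string.Join("|", escaped) + ")";
    }

    public static void Main()
    {
        Console.WriteLine(Build("Cat, rabbit, dog,hound, fox")); // (Cat|rabbit|dog|hound|fox)
        Console.WriteLine(Build("C++, .NET"));                   // (C\+\+|\.NET)
    }
}
```

The resulting string can be passed to new Regex(...) in the same way as the keywords variable in the article.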
I then prepared a Regex object for the main keyword replace:
// Get ready to replace the keywords
r = new Regex(keywords, RegexOptions.Singleline | RegexOptions.IgnoreCase);
You can see I chose to ignore case and to set Singleline. Singleline changes the meaning of the dot (.) so that it matches every character, including newline. The keyword pattern contains no dot, so the option has no effect here and could be left out.
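For reference, a small sketch of what RegexOptions.Singleline does in .NET: it only affects the dot, letting it match a newline. The class and method names are hypothetical and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: does "cat.dog" match two words separated by a newline?
    public static bool DotMatchesNewline(RegexOptions options)
    {
        return Regex.IsMatch("cat\ndog", "cat.dog", options);
    }

    // The keyword-style pattern contains no dot, so the option is irrelevant to it.
    public static int CountKeywords(RegexOptions options)
    {
        return Regex.Matches("cat\ndog", "(cat|dog)", options).Count;
    }

    public static void Main()
    {
        Console.WriteLine(DotMatchesNewline(RegexOptions.None));       // False
        Console.WriteLine(DotMatchesNewline(RegexOptions.Singleline)); // True
        Console.WriteLine(CountKeywords(RegexOptions.None));           // 2
        Console.WriteLine(CountKeywords(RegexOptions.Singleline));     // 2
    }
}
```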
Now comes the replace. You'll notice that I pass a MatchEvaluator into the replace method. I use this to choose what to replace the match with.
// Do the replace
return r.Replace(text, new MatchEvaluator(MatchEval));
The MatchEval method only checks the first group, which is all I need. Had the master regular expression contained two groups, the MatchEval method would have required a second if.
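To illustrate the two-group case, here is a rough sketch. The pattern, the choice of italic for the second group, and the class name are all made up for this example; only the MatchEvaluator mechanism comes from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoGroupHighlighter
{
    // Hypothetical pattern: group 1 holds animals, group 2 holds verbs.
    public static string Highlight(string text)
    {
        Regex r = new Regex("(cat|dog)|(spoke|told)", RegexOptions.IgnoreCase);
        return r.Replace(text, new MatchEvaluator(MatchEval));
    }

    private static string MatchEval(Match match)
    {
        if (match.Groups[1].Success)
        {
            return "<b>" + match.Value + "</b>"; // first group: bold
        }

        if (match.Groups[2].Success)
        {
            return "<i>" + match.Value + "</i>"; // second group: italic
        }

        return match.Value; // anything else is left unchanged
    }

    public static void Main()
    {
        Console.WriteLine(Highlight("The cat spoke to the dog."));
        // prints: The <b>cat</b> <i>spoke</i> to the <b>dog</b>.
    }
}
```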
Paul is the COO of kwiboo ltd and has more than 20 years of IT consultancy experience. He has consulted for a number of blue chip companies and has been exposed to the following sectors: Utilities, Telecommunications, Insurance, Media, Investment Banking, Leisure, Legal, CRM, Pharmaceuticals, Interactive Gaming, Mobile Communications, Online Services.
Paul is the COO and co-founder of kwiboo (http://www.kwiboo.com/) and is also the creator of GeekZilla.
Comments
Egil Hansen
said:
Neat trick.
One question though. Doesn't the Regex let you supply an array of strings as keywords? That would make the code a lot more elegant imo.
Another thing. <b> and </b> are presentational tags, and as such should be avoided when doing proper xhtml. Instead I would recommend <strong></strong> or <em></em>, depending on what you want.
Regards, Egil.
smitha
said:
Hi all,
I am using a regular expression to highlight text, but it also highlights the text when it is inside a textbox or textarea.
Here is my code; please help me get rid of the problem.
Regex re = new Regex("(<[^>]?.)(<span class='hl'>"keywords"<\\/span>(.*?>
", RegexOptions.IgnoreCase | RegexOptions.Singleline);
text = re.Replace(text, "$1$2$3") ;
-- keywords holds the one word that has to be highlighted --
So now I am trying to remove the span that has already been put around the text inside a textbox. This works fine only when there is one occurrence, not for multiple occurrences of the same text.
xbit
said:
This is my expression
strInput = "This is a bunch of text in <b>BOLD</b>, <i>Italic</i>, <br> yadayaya…<br><br>"
And this is how I filter
objRegEx.Pattern = "<[^>]*>"
objRegEx.Global = true
strOutput = objRegEx.Replace(strInput, "")
What if I want to clean all other tags except line breaks, i.e. "<br>" tags? How can I do that?
phayman
said:
OK, what you need to do is add another match at the front of your list of matches, like this: <.[^>]*>. Then, in the match evaluator, just return the string unchanged, like so:
if (match.Groups[1].Success)
{
return match.ToString();
}
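Put together, the suggestion above might look like the following sketch. It assumes the tag pattern is wrapped in parentheses as group 1 and the keywords become group 2; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagAwareHighlighter
{
    // keywordGroup is expected to be a parenthesised alternation such as "(cat|dog)".
    public static string Highlight(string keywordGroup, string html)
    {
        // Group 1: any HTML tag. Group 2: the keywords.
        Regex r = new Regex("(<.[^>]*>)|" + keywordGroup, RegexOptions.IgnoreCase);
        return r.Replace(html, new MatchEvaluator(MatchEval));
    }

    private static string MatchEval(Match match)
    {
        if (match.Groups[1].Success)
        {
            return match.ToString(); // a tag: return it untouched
        }

        return "<b>" + match.ToString() + "</b>"; // a keyword outside any tag
    }

    public static void Main()
    {
        Console.WriteLine(Highlight("(cat|dog)", "<input value=\"cat\"> the cat"));
        // prints: <input value="cat"> the <b>cat</b>
    }
}
```

Note that this only protects text inside the tags themselves, such as attribute values. Text between an opening and closing textarea tag is still ordinary text to this pattern and would still be highlighted.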
phayman
said:
xbit, Use this expression :
(<br(?:[ /]+?)?>)|(<[^>]*>)
and replace with $1
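In C#, the same expression could be applied roughly like this. The class and method names are hypothetical, and the sketch matches case-sensitively, so an upper-case BR tag would be stripped along with the others unless RegexOptions.IgnoreCase is added.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Keeps <br>, <br/> and <br /> and removes every other tag.
    public static string StripTagsExceptBr(string input)
    {
        // Group 1 captures line-break tags; group 2 captures any other tag.
        // Replacing with $1 puts a line break back as it was and replaces
        // everything else with an empty string.
        return Regex.Replace(input, @"(<br(?:[ /]+?)?>)|(<[^>]*>)", "$1");
    }

    public static void Main()
    {
        string input = "Text in <b>BOLD</b>, <i>Italic</i>, <br> more<br/><br />";
        Console.WriteLine(StripTagsExceptBr(input));
        // prints: Text in BOLD, Italic, <br> more<br/><br />
    }
}
```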
Omar
said:
Good but you could have simplified it to one line and removed the MatchEvaluator functions
Something like:
text = Regex.Replace(text, "(" + keywords + ")", "<span style=background:yellow>$1</span>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
swap
said:
Hi everybody, I have a problem and need help with one concept. I have JavaScript code that reads the main headline from any website. When I click on that link, it should open the actual news link, which can be done using openURL, but I also want to highlight the default keywords that I pass in using JavaScript. Can anyone help me with how to do this, and with what?
Philip
said:
Hi!
Nice one!
Is it possible to highlight the complete word if there is a match inside the word?
Keyword: football
Text: Yada yada england footballclub.
And then get <b>footballclub</b> not just <b>football</b>club ?
Thanks!
Yitbarek
said:
Thank you so much, that was neat and very helpful- Thanks!!