Boltinghouse: Python Regex for parsing site

Friday, 27 September 2013

Python Regex for parsing site

I am trying to write a Python script that pulls data from a site and places it
into a JSON string.
The site is http://mtc.sri.com/live_data/attackers/.
I have Python pulling the page source, but I can't quite figure out the regex
portion.
When I test it in RegExr, this regex works:
</?table[^>]*>|</?tr[^>]*>|</?td[^>]*>|</?thead[^>]*>|</?tbody[^>]*>|</?font[^>]*>
But when I put it into the script, I get no match.
#!/usr/bin/python
import urllib2
import re

f = urllib2.urlopen("http://mtc.sri.com/live_data/attackers/")
out = f.read()
matchObj = re.match(
    r'</?table[^>]*>|</?tr[^>]*>|</?td[^>]*>|</?thead[^>]*>|</?tbody[^>]*>|</?font[^>]*>',
    out, re.M|re.I)
if matchObj:
    print "matchObj.group() : ", matchObj.group()
    print "matchObj.group(1) : ", matchObj.group(1)
    print "matchObj.group(2) : ", matchObj.group(2)
else:
    print "No match!!"
Any idea why I am not getting a match in the script?
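
One likely cause, sketched here under assumptions: re.match only tries the pattern at the very start of the string (even with re.M), so it returns None whenever the page does not begin with one of these tags, whereas a tester like RegExr scans the whole text. re.search and re.findall scan the whole string too. The sample string below is a hypothetical stand-in for the downloaded page, not the real site output, and the snippet uses Python 3 print() syntax rather than the Python 2 syntax of the script above. The pattern also has no capture groups, so group(1) and group(2) would raise IndexError even after a successful match.

```python
import re

# Same alternation as in the question, compiled once.
# It contains no parentheses, so only group() / group(0) is available.
TAG_RE = re.compile(
    r'</?table[^>]*>|</?tr[^>]*>|</?td[^>]*>|</?thead[^>]*>|</?tbody[^>]*>|</?font[^>]*>',
    re.I)

# Hypothetical stand-in for the page source (assumption, not real site data).
sample = (
    '<html><body>\n'
    '<table><tr><td>1.2.3.4</td><td><font color="red">CN</font></td></tr></table>\n'
    '</body></html>'
)

# match() anchors at position 0, where the text is "<html>", so it fails.
at_start = TAG_RE.match(sample)

# search() returns the first occurrence anywhere in the string.
first = TAG_RE.search(sample)

# findall() returns every non-overlapping occurrence, in order.
all_tags = TAG_RE.findall(sample)

print("match at start:", at_start)
print("first tag found:", first.group())
print("all tags:", all_tags)
```

If the goal is the cell contents rather than the tags themselves, an HTML parser is usually a sturdier choice than a regex, but the snippet above should at least show why the script prints "No match!!".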
