Python Regex for Numeric Pattern that has Dashes
An answer to this question on Stack Overflow.
Question
I have a column in a pandas data frame called sample_id. Each entry contains a string, and from this string I'd like to pull a numeric pattern that has one of two forms:
1-234-5-6789
or
123-4-5648
I'm having trouble defining the correct regex pattern for this. So far I have been experimenting with the following:
re.findall(pattern=r'\b2\w+', string=str(data['sample_id']))
But this only pulls values that start with 2, and only the first chunk of the numeric pattern. How do I express the above patterns with the dashes?
Answer
A vertical pipe | makes an OR in a regular expression, so you can use:
import re
# First alternative matches the 1-234-5-6789 form; the second matches the 123-4-5648 form
pattern = r'[0-9]-[0-9]{3}-[0-9]-[0-9]{4}|[0-9]{3}-[0-9]-[0-9]{4}'
test1 = '123-4-5648'
test2 = '1-234-5-6789'
print(re.findall(pattern, test1))  # ['123-4-5648']
print(re.findall(pattern, test2))  # ['1-234-5-6789']
[0-9] matches a single digit from 0 through 9 (inclusive), {3} and {4} require exactly three or four such digits in a row, - matches a literal hyphen, and | is an OR that separates the two patterns you mention.
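To apply this per entry rather than to a single test string, one option is a small helper like the sketch below. The names ID_PATTERN and extract_id are hypothetical, and the lookarounds are an added assumption (not part of the pattern above) so that a match cannot begin or end inside a longer run of digits and dashes. The two alternatives share the trailing -d-dddd part, so they are folded into one group here.

```python
import re

# Hypothetical helper for illustration; names and lookarounds are assumptions.
# (?<![0-9-]) and (?![0-9-]) reject matches embedded in a longer digit/dash run.
ID_PATTERN = re.compile(
    r'(?<![0-9-])(?:[0-9]-[0-9]{3}|[0-9]{3})-[0-9]-[0-9]{4}(?![0-9-])'
)

def extract_id(text):
    """Return the first ID found in text, or None if there is none."""
    match = ID_PATTERN.search(text)
    return match.group(0) if match else None

# Made-up sample strings standing in for entries of sample_id
samples = ['run 1-234-5-6789 batch A', 'lot_123-4-5648', 'no id here', '99123-4-56480']
print([extract_id(s) for s in samples])
# ['1-234-5-6789', '123-4-5648', None, None]
```

With a pandas column, something like data['sample_id'].map(extract_id) should apply the helper entry by entry, assuming every entry is a string. That also avoids str(data['sample_id']), which converts the printed representation of the whole column (possibly truncated for long columns) rather than the individual values.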