REGEX HELP!

sjx_samuel

Member
Joined
Jun 4, 2008
Messages
223
Reaction score
0
<a>
<b name1="323232" name2="343243243"></b>
<b name2="" name1=""></b>
<b name1="" name3=""></b>
</a>


I have the XML shown above.

I would like my regex to pass if these conditions are met.

First: tag <a> contains the specified child.
Second: the <b> element contains name1 and name2 only.

Side note: the order of the <b> elements is not fixed, and neither is the order of name1 and name2.

This is what I have done so far:
Code:
(?<=<b).*(?<=name1=).*(?=></b>)
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,301

I can give you a regex that works for the input above, but it will not be complete, because regex cannot do counting, at least not without certain hacks that I know can be done in Perl.

For the input above, you can try the following. Test it first, because I did not run it myself.

Code:
<a[^>]*>.*?<b\s+(?:name1\s*=\s*["'][^"']*["']\s+name2\s*=\s*["'][^"']*["']|name2\s*=\s*["'][^"']*["']\s+name1\s*=\s*["'][^"']*["'])\s*></b>.*?</a>

In any case, I don't recommend this at all, because the regex will not handle input such as this:

Code:
<a ..>
   <a ..>
     <b ..></b>
     <b ..></b>
   </a>
</a>

Should the above be accepted? The inner "a" container would be valid, but the outer "a" would be invalid, and your requirement does not say whether the "a" container can be nested.
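To make the nesting problem concrete, here is a rough sketch in Python. The pattern is a grouped variant of the regex above, written by me for illustration (the ATTR helper and the re.DOTALL flag are my assumptions, not something from the original post), so treat it as a demonstration rather than a recommended pattern.

```python
import re

# Hypothetical helper: matches  ="..."  or  ='...'  after an attribute name.
ATTR = r'''\s*=\s*["'][^"']*["']'''

# Grouped variant of the pattern discussed above (an illustration only).
PATTERN = re.compile(
    r"<a[^>]*>.*?<b\s+(?:name1" + ATTR + r"\s+name2" + ATTR
    + r"|name2" + ATTR + r"\s+name1" + ATTR + r")\s*></b>.*?</a>",
    re.DOTALL,  # let "." cross line breaks
)

nested = (
    '<a>'
    '<a>'
    '<b name1="1" name2="2"></b>'
    '<b name1="1" name3="3"></b>'
    '</a>'
    '</a>'
)

m = PATTERN.search(nested)
# The match starts at the OUTER <a> but stops at the INNER </a>.
print(m.group(0) == nested)                               # False
print(m.group(0).count("<a>"), m.group(0).count("</a>"))  # 2 1
```

The match pairs the outer opening tag with the inner closing tag, and it also passes even though the second <b> carries name3. Both are things the regex has no way to notice.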

There are also many ways in which a string could be invalid XML.

In cases like this, where your purpose is to validate the structure of an XML input, first use an XML parser to ensure the tree is well-formed. You can either use a SAX parser and perform the validation in your own handler code, or parse into a DOM and then validate the tree nodes. This approach is more likely to be correct and gives more maintainable code.
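As a minimal sketch of the parse-then-validate approach, assuming Python and its standard xml.etree.ElementTree module (the function names and the choice to report one result per <b> are my own assumptions, since the thread does not say whether every <b> must pass or just one):

```python
import xml.etree.ElementTree as ET

# Assumption taken from the stated requirement: exactly these two attributes.
REQUIRED_ATTRS = {"name1", "name2"}


def b_is_valid(elem):
    """True if the element carries exactly name1 and name2, in any order."""
    return set(elem.attrib) == REQUIRED_ATTRS


def check_a(xml_text):
    """Return one boolean per <b> child of the root <a>.

    Raises ET.ParseError if xml_text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "a":
        raise ValueError("root element is not <a>")
    return [b_is_valid(child) for child in root if child.tag == "b"]


sample = """<a>
<b name1="323232" name2="343243243"></b>
<b name2="" name1=""></b>
<b name1="" name3=""></b>
</a>"""

print(check_a(sample))  # [True, True, False]
```

Wrap the result in all(...) or any(...) depending on which rule you actually need. Attribute order stops being a concern because the parser hands the attributes over as a mapping.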
 

sjx_samuel

Member
Joined
Jun 4, 2008
Messages
223
Reaction score
0
Thanks davidktw, so basically I just provide all the possible combinations and group them together. Appreciated :D
 

sjx_samuel

Member
Joined
Jun 4, 2008
Messages
223
Reaction score
0
I found a way; it seems logically right.

Code:
(?=[\s\S]*?<b\s*((?P<combi1>name1=\"[^\"]*\" name2=\"[^\"]*\")|(?P<combi2>name2=\"[^\"]*\" name1=\"[^\"]*\"))[^>]*>)
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,301

Would you like to test it with the following input?

Code:
.......<b       name1="..." name2="..." x="1" y="2">

Is there any constraint that requires this to be done using regex? I have already told you, regex is not the right tool for this kind of exercise.

Without knowing what your possible inputs may be, and without resorting to an extremely complex regex, which would really defeat the purpose of using one, there are XML patterns that can easily break your regex. Remember that XML parsers, like general-purpose parsers, are built on grammars written in BNF, and such grammars are more expressive than regex. In any case, the choice is yours. :)
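For anyone who wants to try the test input, a short check in Python could look like this. It assumes Python's re module, which accepts the (?P<name>...) group syntax used in the posted pattern; the variable names are mine.

```python
import re

# The lookahead pattern from the earlier post, split over two lines only.
PATTERN = re.compile(
    r'(?=[\s\S]*?<b\s*((?P<combi1>name1=\"[^\"]*\" name2=\"[^\"]*\")'
    r'|(?P<combi2>name2=\"[^\"]*\" name1=\"[^\"]*\"))[^>]*>)'
)

test_input = '.......<b       name1="..." name2="..." x="1" y="2">'

match = PATTERN.search(test_input)
print(match is not None)          # True
if match:
    print(match.group("combi1"))  # name1="..." name2="..."
```

The pattern passes even though the element also carries x and y, because the trailing [^>]* swallows any further attributes. That contradicts the "name1 and name2 only" requirement.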
 

stan216

Senior Member
Joined
Jul 8, 2012
Messages
1,034
Reaction score
34

https://regexone.com/

Have you tried learning from this site? That helped me a lot. See what regex can or cannot do, then look again at what you can or cannot do with regex (with regards to what you need to do).
 

cwchong

Master Member
Joined
Jan 7, 2005
Messages
4,654
Reaction score
96
Looking at your sample, it is well-structured.

Why not parse it as XML and check for the existence of the child nodes and their attribute values, instead of reinventing the parser?

Offhand, I would think the regex already misses cases such as when an attribute value contains < or >.
 