Regex Unicode and Non-Unicode

Post by **Debugger** » Sun Feb 19, 2017 9:18 am

It does not work in Everything:

Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
or
\p{Han} for CJK ideographs

Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])

Post by **void** » Sun Feb 19, 2017 11:45 pm

Everything uses Perl Compatible Regular Expressions.

Please try:

\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]

Unicode is supported.
\p{...} is not supported.

Post by **Debugger** » Wed Mar 01, 2017 5:08 pm

void wrote: Please try:

\b (?>) Matches a word boundary (the start or end of a word).

Regex enabled:
\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]
It still does not work:
0 objects!

Post by **void** » Thu Mar 02, 2017 8:09 am

regex:"\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]" is working correctly here.

\b = word boundary (the start or end of a word)
\x0D = carriage return
\x0A = new line (line feed)
| = OR (all text before this is one search, all text after this another search)
[] = any single character in the set
\x{85} = next line (alternate new line)
\x{2028} = line separator
\x{2029} = paragraph separator

Combining them all together you get:
(a carriage return followed by a newline, starting at a word boundary) OR (a single character in the range 0x0A to 0x0D, which covers newline, vertical tab, form feed and carriage return, or the alternate newline 0x85, or Unicode separator 2028 or Unicode separator 2029)
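As a rough illustration of the breakdown above, here is a minimal Python sketch of the same pattern. It assumes Python's `re` module rather than the PCRE engine Everything uses, so the `\x{...}` escapes are rewritten as `\x85`, `\u2028` and `\u2029`; the helper name `first_match` and the sample strings are hypothetical.

```python
import re

# Same pattern as above, adapted to Python's escape syntax (an assumption:
# Python's re does not accept \x{...}, so \x85 / \u2028 / \u2029 are used).
PATTERN = re.compile(r"\b\x0D\x0A|[\x0A-\x0D\x85\u2028\u2029]")


def first_match(name):
    """Return the first substring of name that the pattern matches, or None."""
    m = PATTERN.search(name)
    return None if m is None else m.group(0)


# Hypothetical sample names, written with explicit escapes.
samples = ["report\r\n", " \r\n", "tab\x0bname", "a\u2028b", "plain.txt"]
for s in samples:
    print(repr(s), "->", repr(first_match(s)))
```

The first sample matches the left-hand alternative (CR LF at a word boundary). The second has no word boundary before the carriage return, so only the single-character set matches. A name without any of these characters gives no match, which is why an ordinary set of filenames returns 0 objects.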

What exactly are you trying to search for?

Please try without the word boundary:
regex:[\x0A-\x0D\x{85}\x{2028}\x{2029}]

Make sure regex is disabled from the Search menu if you use the regex: modifier.
Also if you use the regex: modifier, please make sure you escape | with double quotes.

You can also use the built-in macro to find Unicode characters, which should be faster. With regex disabled, search for:
#x0a:|#x0d:|#x85:|#x2028:|#x2029:

Post by **Debugger** » Fri Mar 03, 2017 6:52 am

0 objects

Post by **void** » Sat Mar 04, 2017 10:35 am

Are you certain you have a filename with one of the above characters?

Does the following search find any results:
#x0a:|#x0d:|#x85:|#x2028:|#x2029:

Post by **Debugger** » Mon Mar 06, 2017 1:35 pm

It does not work for me.
I want a correct regex that shows the names containing Unicode characters.
I want a correct regex that shows all names without Unicode characters.

Post by **void** » Wed Mar 08, 2017 4:27 am

I've tested creating filenames with 0x0a, 0x0d, U+2028 and U+2029 characters and the above searches would find them.

It's not clear what you are searching for.

To search for files with non-ASCII characters, search for:
regex:[^\x{00}-\x{7f}]

To search for files with only non-ASCII characters, search for:
!regex:[\x{00}-\x{7f}]

To search for files with only ASCII characters, search for:
regex:^[\x{00}-\x{7f}]*$
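The three searches above can be sketched in Python to show how they partition names. This is only an approximation under stated assumptions: it uses Python's `re` (so `\x{00}` becomes `\x00`), and the `classify` helper and sample names are hypothetical, not part of Everything.

```python
import re

# Python equivalents of the three searches (escape syntax adapted).
HAS_NON_ASCII = re.compile(r"[^\x00-\x7f]")   # regex:[^\x{00}-\x{7f}]
HAS_ASCII = re.compile(r"[\x00-\x7f]")        # negated by the leading ! in Everything
ASCII_ONLY = re.compile(r"^[\x00-\x7f]*$")    # regex:^[\x{00}-\x{7f}]*$


def classify(name):
    """Label a name as 'ascii only', 'non-ascii only' or 'mixed'."""
    if ASCII_ONLY.search(name):
        return "ascii only"
    if not HAS_ASCII.search(name):
        return "non-ascii only"
    return "mixed"


for name in ["Accelerator", "Mąka ćwikłowa", "極上スマイル", "極上スマイル(brz_"]:
    print(name, "->", classify(name))
```

Note that a name such as `Mąka ćwikłowa` is found by the first search but not by the second, because it mixes ASCII letters with non-ASCII ones.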

Post by **skribb** » Wed Mar 08, 2017 11:08 pm

Debugger wrote: It does not work in Everything:

Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
or
\p{Han} for CJK ideographs

Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])

I don't know anything about regex, but as far as I understand it, I don't see why those strings would find folders and file names containing characters from non-Latin character sets.

Post by **Debugger** » Fri Mar 10, 2017 9:32 am

regex:[^\x{00}-\x{7f}]

It works, but I do not want to include the Polish alphabet (my OS language is Polish).
https://en.wikipedia.org/wiki/Polish_alphabet

Show only English + Unicode.

Post by **void** » Sat Mar 11, 2017 1:21 pm

Debugger wrote: It works, but I do not want to include Polish alphabet (native OS Polish)

regex:[^\x{00}-\x{7f}\x{104}\x{106}\x{118}\x{141}\x{143}\x{d3}\x{15a}\x{179}\x{17b}\x{105}\x{107}\x{119}\x{142}\x{144}\x{f3}\x{15b}\x{17a}\x{17c}]
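For reference, the code points in that character class correspond to the nine Polish diacritic letters in upper and lower case. A small Python sketch can confirm the mapping and try the exclusion on a few names. It assumes Python's `re` instead of Everything's engine; the constant names and sample names are hypothetical.

```python
import re

# The 18 Polish letters outside ASCII, in the same order as the regex above.
POLISH = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

# Hex code points, for comparison with the \x{...} escapes in the search.
code_points = [f"{ord(c):x}" for c in POLISH]
print(code_points)

# Matches any character that is neither ASCII nor a Polish letter.
OUTSIDE_ASCII_AND_POLISH = re.compile(r"[^\x00-\x7f" + POLISH + "]")

for name in ["Accelerator", "Mąka ćwikłowa", "гвинея-спорт", "★ Hozda ★"]:
    found = OUTSIDE_ASCII_AND_POLISH.search(name) is not None
    print(name, "->", found)
```

With this class, names made only of ASCII and Polish letters are skipped, while Cyrillic, CJK or symbol characters still produce a match.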

Debugger wrote: Show only English + Unicode.

What do you mean by English? Does this include spaces? Numbers?
What do you mean by Unicode? I assume you mean characters with a code > 7f.

To search for names containing only the letters a-z, search for:
regex:^[a-zA-Z]*$
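Because the pattern is anchored at both ends, it only matches when every character of the name is a letter. A short Python sketch (using Python's `re` as a stand-in, with hypothetical sample names) shows that a dot, space or digit anywhere in the name prevents a match.

```python
import re

# Anchored pattern: the whole name must consist of a-z / A-Z letters only.
LETTERS_ONLY = re.compile(r"^[a-zA-Z]*$")


def is_letters_only(name):
    """True when the name contains nothing but ASCII letters."""
    return LETTERS_ONLY.search(name) is not None


for name in ["Accelerator", "Accelerator.txt", "Aa Cc Ee", "track01"]:
    print(name, "->", is_letters_only(name))
```

If spaces, digits or the extension dot should be allowed, they would need to be added to the set, for example `^[a-zA-Z0-9 .]*$`.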

Post by **Debugger** » Sat Mar 11, 2017 3:45 pm

English
Aa Cc Ee
Accelerator

-----------------------
Polish
AĄaą CĆcć EĘeę
Mąka ćwikłowa

------------------------
Unicode -> languages other than native Polish + special characters, e.g. ★ Hozda ★

¡ ¦

гвинея-спорт_олимпиада_мюнхен-72(1972)
極上スマイル(brz_

regex:^[a-zA-Z]*$
It does not show all the folders
It does not show all the files
