Search for file content with a search term that should appear more than once

If you are experiencing problems with "Everything", post here for assistance.
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

Hi,

Example:

ext:pdf content:searchword word-count:>=2
The search word should therefore appear at least twice in a PDF file.
word-count: is NOT an actual search function; I use it here only to illustrate
the kind of query the thread title describes.

Unfortunately, this search returns no results.
Could a search using the "Search Preprocessor" help here?
-----
Another question:
Is it possible to create a direct link to "word-count:"?

Currently I can only get as far as Search Functions, then press Ctrl+F in the browser and enter as a search text: word-count
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

There's no such function yet.

I have this on my TODO list.
Thank you for the suggestion.
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

void wrote: Fri Apr 01, 2022 9:38 am There's no such function yet.

I have this on my TODO list.
Thank you for the suggestion.
Thank you for your prompt reply!
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

For now, please try:

wildcards:content:*searchword*searchword*

regex:content:searchword.*searchword
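
For readers unsure what the two queries above test, here is a minimal Python sketch. It is only an approximation: Python's `re` and `fnmatch` modules stand in for Everything's own regex and wildcard matching, and the sample strings and the helper name `appears_at_least_twice` are made up for illustration.

```python
import fnmatch
import re


def appears_at_least_twice(text: str, term: str) -> bool:
    """Return True if `term` occurs at least twice in `text` (substring match)."""
    # Same idea as regex:content:searchword.*searchword
    pattern = re.escape(term) + ".*" + re.escape(term)
    return re.search(pattern, text) is not None


one_hit = "a searchword here"
two_hits = "a searchword here and a searchword there"

print(appears_at_least_twice(one_hit, "searchword"))   # False
print(appears_at_least_twice(two_hits, "searchword"))  # True

# Same idea as wildcards:content:*searchword*searchword*
print(fnmatch.fnmatchcase(two_hits, "*searchword*searchword*"))  # True
```

Both forms match substrings, so a longer word that merely contains the term also counts as an occurrence.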
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

Thank you for the solutions! :)

wildcards:content:*searchword*searchword*
works for me as desired.

As for the RegEx solution,
I am sorry to say that I have no knowledge of regular expressions and therefore cannot test it. :(
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

Just one more question please:
Could the search be narrowed down to exact: wfn:?
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

Could you please give an example of what you are trying to do?
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

void wrote: Fri Apr 01, 2022 10:19 am Could you please give an example of what you are trying to do?
Example:
ext:pdf wildcards:content:*Altbestand*Altbestand*

In the search result, I get a PDF file containing the text: Altbestandes and Altbestand.
Only the file with text: Altbestand should be found.
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

You would need to use regex to match word boundaries:

ext:pdf regex:content:\bAltbestand\b.*\bAltbestand\b


\b = match a word boundary.
.* = match any character any number of times (like a wildcard *).
Edit: . doesn't match newlines! Use dotall: for that.
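
To make the effect of \b concrete, here is a small Python sketch. Python's `re` module is used as a stand-in, on the assumption that Everything's regex engine treats \b the same way; the sample sentences are invented.

```python
import re

whole_word = re.compile(r"\bAltbestand\b")

# "Altbestand" stands alone, so both word boundaries are satisfied.
print(bool(whole_word.search("Der Altbestand ist vorhanden.")))  # True

# In "Altbestandes" the "d" is followed by "e", so there is no boundary there.
print(bool(whole_word.search("Liste des Altbestandes")))         # False

# Without \b the plain substring still matches inside the longer word.
print(bool(re.search(r"Altbestand", "Liste des Altbestandes")))  # True
```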
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

Unfortunately, this does not work.
Other words that contain "Altbestand" are also found.
In this case, the word "Altbestand" is found more than twice.
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

I'm not 100% sure about your search requirements.

Are you looking for only one instance of Altbestand,
where Altbestand is a whole word?

For example:

ww:content:Altbestand



Are you looking for files that contain the whole word Altbestand and where Altbestand must not exist as a part of other words?

For example:
ww:content:Altbestand !regex:content:Altbestand\B !regex:content:\BAltbestand
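
The logic of that second query can be sketched in Python as follows. This is an illustration only: `only_as_whole_word` is a hypothetical helper, the sample strings are invented, and Python's `re` is case-sensitive by default, which may differ from your Everything settings.

```python
import re


def only_as_whole_word(text: str, word: str) -> bool:
    """True if `word` occurs as a whole word and never as part of a longer word."""
    w = re.escape(word)
    has_whole_word = re.search(rf"\b{w}\b", text) is not None  # ww:content:Altbestand
    glued_after = re.search(rf"{w}\B", text) is not None       # e.g. "Altbestandes"
    glued_before = re.search(rf"\B{w}", text) is not None      # e.g. "XAltbestand"
    return has_whole_word and not glued_after and not glued_before


print(only_as_whole_word("Der Altbestand ist vorhanden.", "Altbestand"))  # True
print(only_as_whole_word("Altbestand und Altbestandes", "Altbestand"))    # False
```

\B is the opposite of \b: it matches only where there is no word boundary, so the two negated terms reject any file in which the word is attached to other letters.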
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

I am searching for PDF files in which the word "Altbestand" (as a single word, exact match)
occurs several times, e.g. exactly twice or exactly three times, etc.

ext:pdf wildcards:content:*Altbestand*Altbestand*
would be exactly what was wanted if it could only find the exact stand-alone term "Altbestand".
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

Please try:

ext:pdf regex:dotall:content:\bAltbestand\b.*\bAltbestand\b

I keep forgetting that . doesn't match everything,
and my test files didn't have newlines!
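
The difference dotall: makes can be reproduced in Python, where the `re.DOTALL` flag plays the same role. This is a sketch under the assumption that Everything's dotall: modifier behaves like that flag; the two-line sample text is made up.

```python
import re

# The two occurrences sit on different lines, as they usually do in a PDF.
text = "Altbestand in line one\nAltbestand in line two"
pattern = r"\bAltbestand\b.*\bAltbestand\b"

# By default "." stops at the newline, so the second word is never reached.
print(bool(re.search(pattern, text)))             # False

# With DOTALL, "." also matches "\n" and the pattern spans both lines.
print(bool(re.search(pattern, text, re.DOTALL)))  # True
```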
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

This search query returns:
Altbestandes (7x) and Altbestand (3x)
in one pdf file.
Last edited by tuska on Fri Apr 01, 2022 11:08 am, edited 1 time in total.
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

While there might be the word 'Altbestandes' in the content, are there at least two occurrences of Altbestand?
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

void wrote: Fri Apr 01, 2022 11:08 am While there might be the word 'Altbestandes' in the content, are there at least two occurrences of Altbestand?
Yes, Altbestand (3x). Please also see my post above yours, which I have edited.
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

I'm not 100% sure about your search requirements.

ext:pdf regex:dotall:content:\bAltbestand\b.*\bAltbestand\b
is finding content where the whole word Altbestand exists at least twice.


Should the file not match if it contains Altbestandes, even though there are two occurrences of Altbestand?
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

To match the whole word Altbestand exactly twice, please try:

ext:pdf regex:dotall:content:\bAltbestand\b.*\bAltbestand\b !regex:dotall:content:\bAltbestand\b.*\bAltbestand\b.*\bAltbestand\b
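
The "at least twice, but not at least three times" construction can be sketched in Python like this. The sketch assumes `re.DOTALL` corresponds to dotall:; the helper names are hypothetical, and the sample text is the test file content tuska posts later in this thread.

```python
import re

WORD = r"\bAltbestand\b"
AT_LEAST_TWO = re.compile(WORD + ".*" + WORD, re.DOTALL)
AT_LEAST_THREE = re.compile(WORD + ".*" + WORD + ".*" + WORD, re.DOTALL)


def exactly_twice(text: str) -> bool:
    """Mirror the query: match 'at least two' AND NOT 'at least three'."""
    return bool(AT_LEAST_TWO.search(text)) and not AT_LEAST_THREE.search(text)


def count_whole_words(text: str) -> int:
    """Alternative outside Everything: simply count the whole-word hits."""
    return len(re.findall(WORD, text))


sample = "Test mit Altbestand.\nAltbestandsliste\nAltbestand ist vorhanden."
print(exactly_twice(sample), count_whole_words(sample))  # True 2

three = sample + "\nNoch ein Altbestand."
print(exactly_twice(three), count_whole_words(three))    # False 3
```

The same pattern extends to other counts: for "exactly three times", require three occurrences and exclude four.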
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

void wrote: Fri Apr 01, 2022 11:14 am I'm not 100% sure about your search requirements.

ext:pdf regex:dotall:content:\bAltbestand\b.*\bAltbestand\b
is finding content where the whole word Altbestand exists at least twice.


Should the file not match if it contains Altbestandes, even though there are two occurrences of Altbestand?
Thanks for this solution - it works! :)
Otherwise, I wanted to ask the question about "Altbestand exists at least twice".
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

void wrote: Fri Apr 01, 2022 11:18 am To match the whole word Altbestand exactly twice, please try:

ext:pdf regex:dotall:content:\bAltbestand\b.*\bAltbestand\b !regex:dotall:content:\bAltbestand\b.*\bAltbestand\b.*\bAltbestand\b
Thank you very much!
This is the solution I wanted and it works for me as expected! :)
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Post by raccoon »

I think you should also have luck with this abbreviated version.

ext:pdf regex:dotall:content:(\bAltbestand\b).*?\1 !regex:dotall:content:(\bAltbestand\b).*?\1.*?\1

Test and compare to make sure the results match. I also changed .* to .*? to march forward instead of backward.
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Post by tuska »

raccoon wrote: Fri Apr 01, 2022 5:38 pm I think you should also have luck with this abbreviated version.
ext:pdf regex:dotall:content:(\bAltbestand\b).*?\1 !regex:dotall:content:(\bAltbestand\b).*?\1.*?\1
...
Thank you for your efforts.

With this search query I cannot find the pdf test file with the following content:
Test mit Altbestand.
Altbestandsliste
Altbestand ist vorhanden.
I am already very satisfied with the existing solutions from the author.
They already cover my requirements.

Nevertheless, thanks again for thinking along!
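
A likely explanation for the difference, sketched in Python: a backreference such as \1 repeats only the text that the group captured ("Altbestand"), not the \b assertions inside the group. The word inside "Altbestandsliste" is therefore counted as an occurrence, the "three or more" exclusion fires, and the test file is dropped. This assumes Everything's regex engine handles backreferences the way Python's `re` does; the helper name is hypothetical.

```python
import re

# Content of the PDF test file quoted in the post above.
text = "Test mit Altbestand.\nAltbestandsliste\nAltbestand ist vorhanden."


def exactly_twice(at_least_two: str, at_least_three: str) -> bool:
    """Match 'at least two' AND NOT 'at least three', as in both queries."""
    two = re.search(at_least_two, text, re.DOTALL) is not None
    three = re.search(at_least_three, text, re.DOTALL) is not None
    return two and not three


# Explicit version: every occurrence carries its own \b...\b.
explicit = exactly_twice(
    r"\bAltbestand\b.*\bAltbestand\b",
    r"\bAltbestand\b.*\bAltbestand\b.*\bAltbestand\b",
)

# Abbreviated version: \1 matches the bare text "Altbestand" anywhere.
backref = exactly_twice(
    r"(\bAltbestand\b).*?\1",
    r"(\bAltbestand\b).*?\1.*?\1",
)

print(explicit)  # True  -> the file is found
print(backref)   # False -> the file is excluded
```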