Quick Question regarding Everything and RegEx

General discussion related to "Everything".
hairypaulsack
Posts: 11
Joined: Wed Jun 10, 2015 11:37 pm

Quick Question regarding Everything and RegEx

Post by hairypaulsack »

I must admit that I am fairly new to RegEx, and while I have been mostly successful with my endeavors, I am absolutely clueless as to where to begin with something like this.

Simply Put:
I've had a horrible habit of extracting archives and leaving the remains behind. If this isn't a problem of epic proportions with respect to wasted space, it is certainly a contributing factor to the mass amounts of files that I believe to be a bottleneck for the de/fragmentation process.

I would like to know whether it is possible to track down pairs of archives and their extracted contents. I'm fairly confident that I've read about duplicate file finders that can parse some (or all) archive file types, but because of the way I extracted my archives, I think there may be a faster solution using the gift of Everything search. I am not necessarily seeking a solution, as I feel the path to learning is through self-discovery, but I wouldn't complain if one were provided.

Again, I do want to reiterate that I am merely seeking confirmation as to whether or not this is possible, as I've had my fair share of searching for things that do not exist. If it is not, maybe someone knows a duplicate file finder that could do this without having to parse within compressed archives (I assume that would require the archives to be temporarily extracted behind the scenes, which would dramatically increase the operation time).

Why I think there might be a solution with Everything Search:
In my thoughts, one possible benefit to the way that I did this is that 95% of the time I extracted the archive into a new folder based on the name of the archive.

E.g.

.\archive_example_1.zip would have been extracted into .\archive_example_1\

This seems like an easy process if I were seeking something like d3d8.dll from archive d3d8.zip, but when it comes to folders it is another story. Maybe I am just intimidated by my experiences in batch scripting and how folder operations always seem to differ from file operations, but I honestly don't even know where to begin.

Maybe something with wildcards to cover the prefix, or everything excluding the extension and then somehow tie this to the identical name of the folder using RegEx?

In case that was confusing: I'd like to find folders that share the same name as the archive from which they were extracted, because 95% of the time the folder name is identical to the name of the archive minus the extension.

EDIT 1: I don't care about the contents of the archive, as they are expected to be deleted. So they are of no concern to the search query.

EDIT 2: Is there a possible solution using the modifier folder: + wildcard and somehow having that match a file: *.zip | *.rar etc.? I'm not sure if this involves grouping, or if there is even a way to seek a match across two different modifiers.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Quick Question regarding Everything and RegEx

Post by NotNull »

Maybe someone will post a regex query for this, but I can think of 2 relatively easy ways to get this information:

Option 1:
a simple batch file (could be done in one line, but I made it a little easier to read) might be enough for what you need....

  • In Everything, query for folders:
  • Export the results as folders.txt in some folder, say c:\temp
  • In Everything, query for ext:zip
  • Export the results as zips.txt to the same folder
  • Save posted code as FindZipFolders.cmd in that same folder
  • Run FindZipFolders.cmd
  • Open found.csv in Notepad or Excel to see which ZIPs have a folder with the ZIP's name on the same directory level.
Some remarks:
  • Most of the time running this is spent on displaying Parsing .... on the screen. If you want to speed things up, change echo Parsing .... %1 to:
    REM echo Parsing .... %1
  • This script searches for a MyZipFile.zip file and a MyZipFile folder in the same directory.
    If you want to find *any* matching MyZipFile folder anywhere, replace (`findstr /e /i /c:"%~dpn1" "%FOLDERS%"`) with:
    (`findstr /e /i /c:"\%~n1" "%FOLDERS%"`)

Code: Select all

@echo off
setlocal

::_______________________________________________________________
::
::      Settings
::_______________________________________________________________
::

set OUTPUT=found.csv
set ZIPS=zips.txt
set FOLDERS=folders.txt

::_______________________________________________________________
::
::      Init ...
::_______________________________________________________________
::

echo ZIPFILE;FOLDER > "%OUTPUT%"

::_______________________________________________________________
::
::      Action!
::_______________________________________________________________
::

    for /f "usebackq delims=" %%x in ("%ZIPS%") DO call :THISZIP "%%x"

    echo.
    echo.
    echo.   Results are in "%OUTPUT%"
    echo.
    echo.
    pause

goto :EOF


::===============================================================
:THISZIP
::===============================================================

    echo Parsing  .... %1
    for /f "usebackq delims=" %%a in (`findstr /e /i /c:"%~dpn1" "%FOLDERS%"`) DO echo %1;"%%a" >> "%OUTPUT%"

goto :EOF
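
For anyone who would rather not do this in batch, the same pairing idea can be sketched in Python. This is only a rough equivalent of the script above, not a replacement for it: the function name `find_zip_folder_pairs` and the `same_directory` switch are made up for this illustration, and the sample paths are hypothetical.

```python
import ntpath  # Windows path rules, regardless of the OS this runs on


def find_zip_folder_pairs(zip_paths, folder_paths, same_directory=True):
    """Return (zip, folder) pairs where the folder is named like the zip minus its extension."""
    pairs = []
    for zip_path in zip_paths:
        # Roughly %~dpn1: full path without the extension, e.g. C:\TEST\MyFile
        stem = ntpath.splitext(zip_path)[0]
        if not same_directory:
            # Roughly \%~n1: just the name, so a folder anywhere can match
            stem = "\\" + ntpath.basename(stem)
        for folder in folder_paths:
            # Mirrors findstr /e /i: case-insensitive match at the end of the line
            if folder.lower().endswith(stem.lower()):
                pairs.append((zip_path, folder))
    return pairs


# Hypothetical sample data standing in for zips.txt and folders.txt
zips = [r"C:\TEST\MyFile.zip", r"C:\TEST\Other.zip"]
folders = [r"C:\TEST\MyFile", r"D:\Backup\other", r"C:\TEST"]

print(find_zip_folder_pairs(zips, folders))
print(find_zip_folder_pairs(zips, folders, same_directory=False))
```

To run it on the real exports, the two lists could be loaded with something like `open("zips.txt").read().splitlines()`, assuming one path per line as in the batch version.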

NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Quick Question regarding Everything and RegEx

Post by NotNull »

Option 2:
  • In Everything, query for ext:zip
  • Export the results as zips.txt to a folder
  • Open zips.txt in Notepad (or any other text editor)
  • Search and replace ".zip" with "" (nothing,empty)
  • Save zips.txt
  • In Everything, go to Menu: Search > Advanced search
  • In the search for a list of filenames: select zips.txt
  • Result: all the folders that have a matching .zip.

Remarks:
  • This is limited to folders in the same directory as the zip
  • If there is a .zip somewhere else in the file/folderpath, the matching folder will not be found
    Eg: Search/replace on "c:\My .zip folder\dummy.zip" would search for a (non-existent) "c:\My  folder\dummy" folder.
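
The pitfall in the last remark comes from replacing every ".zip" in the line. A small Python sketch of the difference, assuming you are willing to preprocess zips.txt with a script instead of a text editor (the helper name `strip_zip_extension` is made up for this example):

```python
import ntpath  # Windows path rules, regardless of the OS this runs on


def strip_zip_extension(path):
    """Remove a trailing .zip (any case) from the file name only; leave the rest of the path untouched."""
    root, ext = ntpath.splitext(path)
    return root if ext.lower() == ".zip" else path


path = r"c:\My .zip folder\dummy.zip"

# Blanket search/replace also hits the ".zip" inside the folder name
print(path.replace(".zip", ""))

# Only the trailing extension is removed
print(strip_zip_extension(path))
```

In a text editor that supports regex replace, anchoring the search to the end of the line should give a similar result, but that depends on the editor.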
hairypaulsack
Posts: 11
Joined: Wed Jun 10, 2015 11:37 pm

Re: Quick Question regarding Everything and RegEx

Post by hairypaulsack »

Wow thank you!

Hope you didn't think this was all a waste of time. I have often wondered about that myself when devoting time to a lengthy (or semi-lengthy) response, only to have the person never respond. It's nice to think (and find out) that others were able to use it, so there's always a reason to ease the thought of wasting time. I went away on vacation, and just as I was reminded of my horribly fragmented PC (should have had it work, then auto shut off) I thought, oh shit!, I need to go check Voidtools lol...

I came pretty close using similar methods, just got hung up on comparing the results. I'm embarrassed to describe how I went about the path modification to match file names, and more embarrassed to describe how I tried to compare the results.

I just got this so I'm going to test it out and try to figure out the parts of the script that I don't understand but again, thanks! I've already learned a thing or two before starting and have some other things to question. Didn't know you could use a command in the (set) of a FOR loop, and the ';' at the end where you have

Code: Select all

%1;"%%a"
is new to me.

From my experience:
The %1 is usually the (parameter??/argument??) that I drag onto a script file, or more accurately what I pass when I start or call a script in a CLI. But (again, I haven't used the script; these are simply first thoughts) from browsing the script and your instructions it seems that we are not passing anything, simply using a new command to compare the results of two files and output the results in a CSV. I understand the '%%a' alright. After writing the last sentences I realized that the ';' could be a delimiter, but more importantly that I should just go and try this out, as my questions might well be answered. So I'll stop here.

Also, is it easy to add a condition on, say, date or size using attributes? One thing I've never done is use attributes in batch scripting.

Thanks again,
Paul
Janus
Posts: 84
Joined: Mon Nov 07, 2016 7:33 pm

Re: Quick Question regarding Everything and RegEx

Post by Janus »

Just a random thought.

Do your search, whatever it is.
I would limit to one drive letter at a time though "C:\" for instance, then "D:\", etc.
Export to efu, which you then take into a spreadsheet program.
Then use Data:'text to columns', with a comma separator and quotes for delimiters.
This will give you filename, size, date created, date modified, attributes.
Then use Data:Sort expand to all, sort by filename, attributes, date.

As you look down the list, you will notice attributes=16, those are directories.
You can then compare them to the filenames above and below them for matches.
This will even show you directories with extensions.

You can then use other spreadsheet functions to filter from there.

I hope that helps.


Janus.
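
If the spreadsheet step gets tedious, the same comparison can be sketched in Python. This assumes the EFU export is plain CSV with a header row of Filename, Size, Date Modified, Date Created, Attributes; please check that against your own export first. The sample rows, the date values and the function name `archives_with_folders` are all made up for this example.

```python
import csv
import io
import ntpath

FILE_ATTRIBUTE_DIRECTORY = 0x10  # the "16" seen in the Attributes column

# Hypothetical EFU content; the column layout is an assumption
SAMPLE_EFU = r'''Filename,Size,Date Modified,Date Created,Attributes
"C:\TEST\MyFile.zip",1024,131500000000000000,131500000000000000,32
"C:\TEST\MyFile",,131500000000000000,131500000000000000,16
"C:\TEST\notes.txt",10,131500000000000000,131500000000000000,32
'''


def archives_with_folders(efu_text, archive_exts=(".zip", ".rar", ".7z")):
    """Return (archive, folder) pairs where a folder is named like the archive minus its extension."""
    rows = list(csv.DictReader(io.StringIO(efu_text)))

    def is_dir(row):
        return bool(int(row["Attributes"] or 0) & FILE_ATTRIBUTE_DIRECTORY)

    # Lower-cased folder paths for case-insensitive lookup
    folders = {row["Filename"].lower() for row in rows if is_dir(row)}

    matches = []
    for row in rows:
        if is_dir(row):
            continue  # skip directories that happen to carry an extension
        root, ext = ntpath.splitext(row["Filename"])
        if ext.lower() in archive_exts and root.lower() in folders:
            matches.append((row["Filename"], root))
    return matches


print(archives_with_folders(SAMPLE_EFU))
```

Because both lists come from the same export, this only finds folders that sit next to their archive, like the sorted spreadsheet view does.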
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Quick Question regarding Everything and RegEx

Post by NotNull »

hairypaulsack wrote: Hope you didn't think this was all a waste of time, I sure have wondered when devoting time to a lengthy (or semi) response only to have the person never respond. It's nice to think (and find out) that others were able to use it so there's always a reason ease the thought of wasting time..
No, I didn't give up on you :-)
Partly because you took some serious time to describe your issue, partly because it was indeed summer holiday (assuming you live in the northern hemisphere ...) and partly because the "hit-and-run" percentage on these forums is substantially lower than on a lot of other forums I visit. (I don't post on those forums because of that ..)

I do have to say that I really dislike it when people fail to have the decency to respond to their own threads.
I'm not very motivated to help those people again next time.
But as said before: most people here are friendly and polite.



On topic:
Were you able to try (one of) the suggestions?
hairypaulsack wrote:
From my experience:
The %1 is usually the passed (parameter??/argument??) that drag over a script file, or more accurately what I pass when I start or call a script in a CLI, but (again, haven't used script; these are simply first thoughts) from browsing the script and your instructions it seems that we are not passing anything, simply using a new command to compare the results of two files and output the results in a CSV. I understand the '%%a' alright. After writing the last sentences I just realized that the ';' could be a delimiter, but more importantly realized I should just go and try this out as my questions might as well be answered. So I'll stop here.
The two most important lines of the script:

Code: Select all

    for /f "usebackq delims=" %%x in ("%ZIPS%") DO call :THISZIP "%%x"
For each line in zips.txt (every line contains a zipfile): call the THISZIP routine with the current line (=zipfile) as parameter

Code: Select all

for /f "usebackq delims=" %%a in (`findstr /e /i /c:"%~dpn1" "%FOLDERS%"`) DO echo %1;"%%a" >> "%OUTPUT%"
%1 = the parameter used when THISZIP routine is called (like "C:\TEST\MyFile.zip")

findstr /e /i /c:"%~dpn1" "%FOLDERS%" :
Search the file that lists all the folders and find the lines that end with the zip file's full path minus the extension and the surrounding quotes (C:\TEST\MyFile).

If a matching folder is found (which means the ZIP was extracted into a folder of the same name in the same directory): write "name of the zipfile";"name of the found folder" ("C:\TEST\MyFile.zip";"C:\TEST\MyFile") to the output file.
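
To make the %~ modifiers a bit more tangible, here is a rough Python approximation of what they yield for a quoted, absolute path. It is a sketch only: cmd also resolves relative paths against the current directory, which this does not attempt, and the helper name `batch_modifiers` is made up for this example.

```python
import ntpath  # Windows path rules, regardless of the OS this runs on


def batch_modifiers(arg):
    """Approximate cmd's %~ modifiers for a quoted, absolute path argument."""
    unquoted = arg.strip('"')                      # %~1  : surrounding quotes removed
    drive, rest = ntpath.splitdrive(unquoted)      # %~d1 : drive letter with colon
    directory, filename = ntpath.split(rest)
    if not directory.endswith("\\"):
        directory += "\\"                          # %~p1 : path with trailing backslash
    name, ext = ntpath.splitext(filename)          # %~n1 and %~x1
    return {
        "~1": unquoted,
        "~d": drive,
        "~p": directory,
        "~n": name,
        "~x": ext,
        "~dpn": drive + directory + name,          # %~dpn1 : what findstr searches for
    }


parts = batch_modifiers('"C:\\TEST\\MyFile.zip"')
for key, value in parts.items():
    print(key, "=", value)
```

So the search string handed to findstr is the zip's location and name, just without ".zip" and without the quotes.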


";" is indeed the separator for the different fields. You could change that to anything you like, like a ","


hairypaulsack wrote: Also, is it easy to put a condition for say, date or size using attributes? One thing I've never done is use attributes in batch scripting
Yes, that is possible. You could also filter size and date attributes in Everything and export that.
But if you want to do this in batch, change the line containing findstr to:

Code: Select all

for /f "usebackq delims=" %%a in (`findstr /e /i /c:"%~dpn1" "%FOLDERS%"`) DO echo %1;%~z1;"%%a"; >> "%OUTPUT%"
That way it will also report filesize of the zipfile.
By including an IF condition you could filter out (for example) filesizes below a certain threshold.
Same goes for attributes. Or dates.
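
As an illustration of such a size condition, here is a Python sketch (not batch) that keeps only pairs whose zip is at least a given size. The function name `filter_by_min_size`, the threshold and the sample sizes are all hypothetical; on a real system the default `os.path.getsize` would be used instead of the fake lookup.

```python
import os


def filter_by_min_size(pairs, min_bytes, getsize=os.path.getsize):
    """Keep (zip, folder) pairs whose zip is at least min_bytes; report the size too."""
    kept = []
    for zip_path, folder in pairs:
        try:
            size = getsize(zip_path)
        except OSError:
            continue  # zip no longer exists or cannot be read: skip it
        if size >= min_bytes:
            kept.append((zip_path, size, folder))
    return kept


# Fake sizes so the example runs anywhere; these files are not expected to exist
fake_sizes = {r"C:\TEST\Big.zip": 5_000_000, r"C:\TEST\Tiny.zip": 120}
pairs = [
    (r"C:\TEST\Big.zip", r"C:\TEST\Big"),
    (r"C:\TEST\Tiny.zip", r"C:\TEST\Tiny"),
]

print(filter_by_min_size(pairs, 1_000_000, getsize=fake_sizes.__getitem__))
```

The same shape works for dates by swapping the size lookup for a modification-time lookup.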

HTH.
Last edited by NotNull on Sun Sep 24, 2017 11:28 am, edited 2 times in total.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Quick Question regarding Everything and RegEx

Post by NotNull »

Janus wrote: As you look down the list, you will notice attributes=16, those are directories.
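Filtering on attributes=16 only reports directories that have no further attributes set. A directory that is also Read-only (+1), Hidden (+2) or System (+4) has a different value (17, 18, 20 and so on) and will be missed.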
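
A more robust test is to check the directory bit instead of comparing the whole number. A minimal Python sketch using the standard Windows attribute values (the helper name `is_directory` is made up here):

```python
# Standard Windows file attribute bits
FILE_ATTRIBUTE_READONLY = 0x01
FILE_ATTRIBUTE_HIDDEN = 0x02
FILE_ATTRIBUTE_SYSTEM = 0x04
FILE_ATTRIBUTE_DIRECTORY = 0x10  # decimal 16


def is_directory(attributes):
    """True when the directory bit is set, whatever other bits are present."""
    return bool(attributes & FILE_ATTRIBUTE_DIRECTORY)


# value, "equals 16" test, bit test
for value in (16, 17, 18, 20, 32):
    print(value, value == 16, is_directory(value))
```

Only the bit test treats 17 (read-only directory), 18 (hidden directory) and 20 (system directory) as directories.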
Gisle Vanem
Posts: 34
Joined: Mon May 04, 2015 10:30 am

Re: Quick Question regarding Everything and RegEx

Post by Gisle Vanem »

I'm also not fluent in regex syntax. So it was easy for me to erroneously query like this:

Code: Select all

c:\\windows\\.*gcc*\.exe$
which returns this:

Code: Select all

 
c:\WINDOWS\sysnative\dnscacheugc.exe
c:\Windows\System32\netbtugc.exe
c:\Windows\System32\netiougc.exe
c:\Windows\System32\setupugc.exe
c:\Windows\SysWOW64\netbtugc.exe
c:\Windows\SysWOW64\netiougc.exe
I'd expect gcc not to match gc before the file extension (.exe).

But what I was supposed to query was:

Code: Select all

c:\\windows\\*.gcc*\.exe$
It's too easy to swap the . and *.
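
The reason the first query matches those files is that in regex `c*` means "zero or more c", so `gcc*` is satisfied by a plain `gc`. A small Python sketch of this, using Python's `re` module rather than Everything's own regex engine (the basic quantifier rules are the same). The `strict` pattern is just one way to require a literal "gcc" and is an assumption about what was intended; the two gcc paths are hypothetical.

```python
import re

paths = [
    r"c:\Windows\System32\netbtugc.exe",
    r"c:\Windows\System32\setupugc.exe",
    r"c:\Windows\tools\gcc.exe",      # hypothetical path for illustration
    r"c:\Windows\tools\gcc-ar.exe",   # hypothetical path for illustration
]

# "gcc*" = "gc" followed by zero or more "c", directly followed by ".exe"
loose = re.compile(r"c:\\windows\\.*gcc*\.exe$", re.IGNORECASE)

# "gcc.*" = literal "gcc" followed by anything, then ".exe"
strict = re.compile(r"c:\\windows\\.*gcc.*\.exe$", re.IGNORECASE)


def matching(pattern, candidates):
    """Return the candidates the pattern matches anywhere in the string."""
    return [p for p in candidates if pattern.search(p)]


print(matching(loose, paths))
print(matching(strict, paths))
```

In other words, in regex the `*` always applies to whatever comes right before it, unlike the wildcard `*` in a normal Everything search.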