The other day I found a file with a peculiar format, and later I needed to find it again. I ran into two problems:
- Size: It’s one of 20,000,000 XML files
- Search: I need to find a file containing a string “xocs:doi”, which is not interpreted as a single word (due to the colon character, of course)
My case involves searching through XML files, but the method is generic enough to work for any text files.
TL;DR: Use the power of findstr, Luke!
Searching through Windows Explorer is trivial: you go to the search bar and type your string. If you want to search for a phrase, you type it in double quotes like so:
"hello world"
...and click the search button.
However, the behaviour is inconsistent:
- Indexed directories: the result is a list of files containing the exact phrase you wanted to search
- Un-indexed directories: the result is a list of files containing all words in the phrase
Assuming we have two files in the same directory:
- test1.txt, containing the text: hello world
- test2.txt, containing the text: hello dear world
We type the text “hello world” (including the quotes) and launch the search in Explorer.
- If the directory is indexed, the result will contain only test1.txt
- If the directory has no index, the result will have both test1.txt and test2.txt
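To make the difference concrete, here is a minimal Python sketch of the two behaviours using the same two files. It only models what I observed, not how Explorer actually works internally; the function names exact_phrase_matches and all_words_matches are made up for this illustration.

```python
# Contents of the two example files from the text above.
files = {
    "test1.txt": "hello world",
    "test2.txt": "hello dear world",
}


def exact_phrase_matches(files, phrase):
    """Names of files whose text contains the phrase as one contiguous string."""
    return sorted(name for name, text in files.items() if phrase in text)


def all_words_matches(files, phrase):
    """Names of files whose text contains every word of the phrase, anywhere."""
    words = phrase.split()
    return sorted(
        name
        for name, text in files.items()
        if all(word in text.split() for word in words)
    )


# Models the result seen in an indexed directory.
indexed_like = exact_phrase_matches(files, "hello world")
# Models the result seen in an un-indexed directory.
unindexed_like = all_words_matches(files, "hello world")

print(indexed_like)    # ['test1.txt']
print(unindexed_like)  # ['test1.txt', 'test2.txt']
```

The second function is the source of the false positives: test2.txt has both words, just not next to each other.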
In my case, indexing 20M files is out of the question (time and disk space constraints), so searching via Explorer yields a lot of false positives.
The command line I use to find the string above in all files is:
findstr /s /m /c:"xocs:doi" *.*
What it does is:
/s - look through the current directory and all its subdirectories
/m - display only the names of files that contain a match
/c:"xocs:doi" - look for the exact literal string
*.* - check all files
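For anyone who wants the same behaviour outside of a Windows command prompt, here is a rough Python sketch of what those four pieces do together. It is an illustration under my own assumptions, not how findstr is implemented: the function name files_containing is hypothetical, it assumes the search string can be encoded as UTF-8, and it reads each file fully into memory, which is fine for small XML files but not for huge ones. Like findstr without /i, the comparison is case-sensitive.

```python
import os


def files_containing(root, needle, encoding="utf-8"):
    """Yield paths under root whose raw bytes contain needle as a literal string."""
    needle_bytes = needle.encode(encoding)
    # os.walk visits root and every subdirectory (the role of /s).
    for dirpath, _dirnames, filenames in os.walk(root):
        # Every file is checked, whatever its extension (the role of *.*).
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    # Plain substring test, no word splitting (the role of /c:).
                    if needle_bytes in handle.read():
                        # Report only the file name, not the matching line (the role of /m).
                        yield path
            except OSError:
                # Skip files that cannot be opened or read.
                continue
```

Calling files_containing(".", "xocs:doi") and printing each result gives a list of matching file paths for the current directory tree.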
In my experience, the most accurate way to look for exact arbitrary strings in Windows, at least without an index, is findstr on the command line.