I'm looking to "grep" a file looking for matches only if 2 or more of the keywords are found. So, for example, I want to search a file for blah and foo but they are on separate lines of the file. I only want to know if the file contains both keywords. I realize grep may not be the solution for this; may need to write a script of some sort but grep or some equivalent would be much faster. More Info: I have a file that contains:
I want to be able to search that file to see if both blah and foo appear in it.
asked 30 Dec '10, 17:12 Andy
EDIT TO ANSWER QUESTION IN COMMENT/REPLY SECTION

We kick things off with the cat command and give it the name of the file we want to examine. The cat command then passes the contents of our text file to tr. The tr command breaks up the file, putting each word on its own line, for easy access. (The '\n' after tr indicates we want to replace the spaces in our text with newline characters.)

We next filter our file through the sed command, which removes any empty lines. (The ^ immediately followed by the $ means we're looking for lines that effectively have nothing between the beginning of the line and the end. The "d" on the end of the sed command indicates we want to delete any such lines.)

The list of words is then sorted alphabetically and passed to the uniq command, which performs the actual count for us. Should we want to narrow things down so we just see the count for the word "love", we can append grep to the end of the command, as sketched below.
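A sketch of the pipeline being described, assuming the file is named text.txt, that words are separated by single spaces, and that the count comes from uniq's -c option:

    # split into one word per line, drop blank lines, sort, count each word
    cat text.txt | tr ' ' '\n' | sed '/^$/d' | sort | uniq -c

    # narrow the output to just the count for the word "love"
    cat text.txt | tr ' ' '\n' | sed '/^$/d' | sort | uniq -c | grep love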
answered 30 Dec '10, 17:40 Ron ♦

Ron, thanks for the tip! I must admit I'm not a great sed user yet, but I am learning. Where in the line would I add blah and foo? Or would it be in the tr section? Thanks!
(30 Dec '10, 18:35)
Andy
I'll post another answer since I need more room than is allotted in this reply space. Look there.
(30 Dec '10, 20:16)
Ron ♦
I love Python ... I couldn't resist writing you some code. It would be easy to generalize it to check all the files in a directory ... but I'm trying to hook you! I am sure it could be done in a more Pythonic way, but Python is great for string stuff -- way more readable than Perl. (Flame war alert!)
answered 29 Jan '11, 07:03 pcardout
If instead of words you use patterns, a simple chain of xargs greps will work, as sketched below.
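A sketch of what that chain might look like; the three pattern names and the use of the shell glob * for the files to search are placeholders:

    # keep only the files that match pattern1, then pattern2, then pattern3;
    # -l prints matching file names, -Z terminates each name with \0
    grep -lZ 'pattern1' * | xargs -0 grep -lZ 'pattern2' | xargs -0 grep -l 'pattern3'

    # for the blah/foo case from the question this reduces to two links:
    grep -lZ 'blah' * | xargs -0 grep -l 'foo'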
The -lZ flags tell grep to output the name of each file with a match, followed by a \0. xargs -0 runs the specified command with the file names it reads, separated by \0s, as its arguments. Using \0 makes sure all file names, even those containing spaces or newlines, are handled correctly. The final grep does not have the -Z flag, so its output is one file name per line.

The basic idea is to first scan all files for the first pattern and output the names of the files that had a match. This list is then fed to the second grep, which only outputs the names of those files that also matched the second pattern. This is repeated for each pattern, so the last grep outputs the names of the files that matched all of the patterns.

This is extremely efficient, especially if you order the patterns in ascending likelihood of being found in the files. In practice, ordering the patterns by length (longest first) works almost as well. Note that you can prepend find to the chain by using find's -print0 flag; a sketch follows.
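Assuming the search should cover all regular files under the current directory, the find-prefixed chain might look like:

    # find emits \0-separated file names; each xargs/grep stage narrows the list
    find . -type f -print0 | xargs -0 grep -lZ 'pattern1' | xargs -0 grep -lZ 'pattern2' | xargs -0 grep -l 'pattern3'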
answered 31 Dec '10, 17:55 Nominal Animal
    perl -e '$a=`cat file.txt`; if (grep(/foo/,$a) && grep(/blah/,$a)) {print "YES\n"} else {print "NO\n"}'

NOTE1: back-ticks do not show up on this site unless escaped. The command around file.txt should read BACK-TIC cat file.txt BACK-TIC. (Correction: back-ticks do show up if escaped.) NOTE2: file.txt is the file from the above example:
answered 10 Jan '11, 12:49 joe
Please accept an answer so the question/answer can be finished. Or provide more details so we can help.