That's absolutely not it. What you're describing is part of the UNIX philosophy: programs should do one thing and do it well, and they should function in a way that makes them very versatile and composable, etc.
And that part of the philosophy works GREAT when everything follows another part of the philosophy: everything should be based on flat text files.
But for a number of reasons, and regardless of whatever we all think of those reasons, we live in a world that has a lot of stuff that is NOT the kind of flat text file grep was made for. Binary formats, minified JS, etc. And so to make the tool more practical on a modern *nix workstation, suddenly more people want defaults that are going to work on their flat text files and transparently ignore things like .git.
It's just that you've shown up to a wildly unprincipled world armed with principles.
Alternately, maybe people's idea of what "one thing" is ends up being more subjective than it sounds (or at least depends on context). "Searching through my code" at least sounds like a reasonable idea of "one thing", and it's not crazy that someone might consider "don't search through the stuff that isn't my code, like my npm dependencies or my Rust build artifacts" to be part of "doing it well". Having to specify it every time would be annoying, so you might want to put it in a config file; but then if it ends up being identical to your gitignore, having to manually symlink or copy it each time you modify it is annoying, so it's also not crazy to just use the gitignore by default with a way to opt out of it. Now we're just back where we started: custom .ignore files, fallback to .gitignore, and a flag for when you want to skip that.
I first became aware of the phenomenon of an enlightened anti-UNIX bundle in ZFS, in particular how it unifies LVM, RAID, and the filesystem. While ZFS isn't universally loved, it seems that each hot new filesystem that comes out now adopts this strategy as well.
While this doesn't lead to immediate enlightenment about where the balance is, it does highlight an important aspect to consider: whether the whole is more than the sum of its parts. One way openzfs is more than the sum of its parts is that it closes the RAID write hole. The next step, whether it be stabilized in openzfs or otherwise, is to merge encryption into the stack: The current state of the art is to compose block encryption with zfs on top. But a better solution would be for zfs's object layer to encrypt its blocks itself. Because the blocks are not required to have a particular disk alignment or size, the filesystem can offer authenticated encryption without losing the random-access property, as well as granular keys, thus offering some clear advantages over the UNIXy composition method.
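For what it's worth, OpenZFS already ships native encryption at the dataset layer, with authenticated ciphers and per-dataset keys. A sketch of what that looks like (pool/secret is a hypothetical pool/dataset name; requires root and an existing pool):

```shell
# Integrated approach: the filesystem encrypts its own blocks with an
# authenticated cipher and per-dataset keys, instead of stacking a
# separate block-encryption layer (e.g. LUKS) underneath the pool.
zfs create -o encryption=aes-256-gcm \
           -o keyformat=passphrase \
           pool/secret
```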
Actually I'm not sure how strong an example ripgrep is by comparison. Could a `find` replacement do the ignore patterns just as well? OTOH, does ripgrep offer better I/O and compute parallelism than a naive xargs/parallel?
I'm not sure if you're asking rhetorically or not, but I genuinely don't know the answers to those questions, and I'd argue that's kind of the point. Pretty much any time I've ever had to do anything non-trivial with either find or xargs, I've had to look up how to do it. The most common way I've used xargs over the years by far is piping to it as a quick and dirty way to condense whitespace to a string I get out at the end of some one-liner.
The larger point I was trying to make is that "good experience out of the box" is in practice a legitimate reason that people will prefer one thing to another, even if it's equivalent to some other thing you might be able to throw together manually that's just as good from a technical perspective. There's certainly power in knowing how to use composable tools, but there's also power in being able to save time to put towards other things if you care about them more, and people will have different preferences about where to strike that balance for a given tool. The more precise point I was trying to make is that "this doesn't fit the UNIX philosophy" seems like fairly weak criticism; if that's the strongest argument that can be made against ripgrep, it makes a lot of sense why it was so successful.
No, you are correct; do not doubt yourself. Baked-in behavior catering to a completely separate tool is bad design. Git is the current version control software, but it's neither the first nor the last. Imagine if we move to another source control system and are still burdened with .gitignore files. No thanks.
The Unix tools are designed to be good and explicit at their individual jobs so they can be easily composed together to form more complex tools that cater to the task at hand.
Actually we are, because some utter idiot wrote:
> grep-like tools which read .gitignore violate POLA.
How about we keep it civilized?
Or some combination of --no-ignore (or -u/--unrestricted) with --ignore-file or --glob.
[0]https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#c...
LESS="-FQMR"
(no bell, more status, raw characters, exit if less than one page). Those are also completely reasonable to use, but they must be set consciously; otherwise they might give results that confuse the user.
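Spelled out, for a shell startup file:

```shell
# -F: quit immediately if the output fits on one screen
# -Q: never ring the terminal bell
# -M: verbose status line (file name, line numbers, percentage)
# -R: pass raw ANSI color escapes through to the terminal
export LESS="-FQMR"
```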
[1]: https://ugrep.com/
But that’s the kind of problem that only successful things have to worry about.
But it does make sense today.
I'd argue that modern computers do many astonishing and complicated and confusing things - for example, they synchronize with cloud storage through complex on-demand mechanisms that present a file as being on the users' computer, but only actually download it when it's opened by the user - and they attempt to do so as a transparent abstraction of a real file on the disk. But if ripgrep tried to traverse your corporate Google Drive or Dropbox or Onedrive, users might be "astonished" when that abstraction breaks down and a minor rg query takes an hour and 800 GB of bandwidth.
It used to be that one polymath engineer could have a decent understanding of the whole pyramid of complexity, from semiconductors through spinlocks on to SQL servers. Now that goal is unachievable for most, and tools ought to be sophisticated enough to help the user with corner cases and gotchas that make their task more difficult than they expected it to be.
[1]: https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...
--ignore-file=.ignore
--ignore-file=.gitignore
--ignore-file=.dockerignore
--ignore-file=.npmignore
etc
but then, assuming all those share the same "ignore file syntax/grammar"...
The comparison between jiff, chrono, time and hifitime is just as good of a read in my opinion: https://github.com/BurntSushi/jiff/blob/HEAD/COMPARE.md
(And they have also written interesting things on regex, non-regex string matching, etc.)
What was the motivation for Rust to invent its own regex flavor instead of sticking with PCRE or even RE2?
Other than that, it's pretty much standard regex syntax as far as I know. One library is pretty much like another; what differs is mostly the availability of some less-used extensions, and performance characteristics. If you accept untrusted regexes, you want one with linear time complexity. On top of that, the Rust regex engine has really good performance when it comes to the constants too (not just the big-O).
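A quick way to see the linear-time property in action (a sketch; the pattern is a classic backtracking bomb, and the input sizes are arbitrary):

```shell
# (a+)+$ against "aaa...ab" makes backtracking engines go exponential;
# ripgrep's finite-automaton engine stays fast regardless of input size.
awk 'BEGIN{for(i=0;i<5000;i++)printf "a"; print "b"}' > bomb.txt
time rg '(a+)+$' bomb.txt   # finishes near-instantly, finds no match
```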
There was this post from cursor https://cursor.com/blog/fast-regex-search today about building an index for agents due to them hitting a limit on ripgrep, but I’m not sure what codebase they are hitting that warrants it. Especially since they would have to be at 100-200 GB to be getting to 15s of runtime. Unless it’s all matches that is.
On a mid-size codebase, I fzf- and rg-ed through the code almost instantly, while watching my coworker's computer slow down to a crawl when Pycharm started reindexing the project.
Perhaps they run their software on operating or file systems that can't do it, or on hardware with different constraints than the workstation flavoured laptops I use.
And I was dead wrong. Overnight everyone uses rg (me included).
It’s fast even on a 300 MHz Octane.
SGUG tried hard to port newer packages for IRIX for several years but hit a wall with ABI mismatches leading to GOT corruption. This prevented a lot of larger packages from working or even building.
I picked up the effort again after wondering if LLMs would help. I ran into the ABI problems pretty quickly. This time though, I had Claude use Ghidra to RE the IRIX runtime linker daemon, which gave the LLM enough to understand that the memory structures I’d been using in LLVM were all wrong. See https://github.com/unxmaal/mogrix/blob/main/rules/methods/ir... .
After cracking that mystery I was able to quickly get “impossible” packages building, like WebKit, Qt5, and even small bits of Go and Rust.
I’m optimistic that we’ll see more useful applications built for this cool old OS.
I’m sort of thinking of AmigaOS/Workbench as well, although, perhaps because of what I assume was always a much larger user base than SGI had, it never quite went away like SGI and IRIX did.
It is great seeing these old platforms get a new lease of life.
Eventually I was considering rebuilding the machine completely, but for some reason, after a very long time digging deep into the rabbit hole, I tried plain old grep, and there was the data, exactly where it should have been.
It's a vague story, and it was a while back - I don't remember the specifics, but I sure recall the panic.
If it actually matched grep's contract with opt-in differences that'd be a gamechanger and actually let it become the default for people, but that ship seems to have sailed.
rg : Searches git tracked files
rg -u : Includes .gitignored files
rg -uu : Includes .gitignored + hidden files
rg -uuu : Includes .gitignored + hidden + binary files

Sometimes I forget that some of the config files I have for CI in a project are under a dot directory, and therefore ignored by rg by default, so I have to repeat the search giving the path to that config subdirectory if I want to see the results under it (or use some extra flags for rg to not ignore dot directories other than .git).
I still use it, but I've never trusted it fully since then; I double-check.
It's the reason I started using it. Got sick of grep returning results from node_modules etc.
> You could easily just alias a command with the right flag if the capability was opt-in.
I searched for a way to make grep honor .gitignore because `--exclude=...` got tedious, and there was ripgrep to answer my prayers.
Maintaining an alias would be more work than just `rg 'regex' .venv` (which is tab-completed after `.v`) the few times I'm looking for something in there. I like to keep my aliases clean and not have to use rg-all to turn off the setting I turned on. Like in your case, `alias rg='rg -u'`, now how do you turn it off?
To be clear, I was not suggesting an alias for grep, but for a hypothetical alternate ripgrep that searches everything by default but has a flag to skip ignored files. Something like
alias rgi='rg --skip-ignored'
or whatever. Or if it came with a short flag, that could work too, so you could use it without an alias easily.

> Like in your case, `alias rg='rg -u'`, now how do you turn it off?
You don't use the same name, you make a new alias. Like rgi or something. Bonus point is you find out immediately if it's missing.
> Or if it came with a short flag that could work too
It does: `-.` for hidden, and `-uu` for hidden + ignored.
I'm not sure you understood what I wrote? Those are opt-out. The entire discussion is about opt-in.
Depends on your perspective, to me you have them flipped, and enabling them is "opt-in", i.e: "now I would like to see the hidden files please".
But I don't think I misunderstood you. You're telling me I should prefer hidden files to be the default, and I disagree and give my arguments. It's not more complicated than that.
To me rg only follows the same principle as the rest of my tools: fd requires `-H/--hidden`, ls `-a` or `-A`, and so on. It is a big reason why I prefer rg and fd over grep and find. Which brings us back to your first comment:
>> You started using it because it had that capability I imagine, not because it is the default.
See https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#a... for the details.
I wouldn't want to use tools that straddle the two, unless they had a nice clear way of picking one or the other. ripgrep does have "--no-ignore", though I would prefer -a / --all (one could make their own with alias rga='rg --no-ignore')
I think ripgrep will not search UTF-16 files by default. At least I had some such issue once.
I ran into that with pt, and it definitely made me think I was going mad[0]. I can't fully remember if rg suffered from the same issue or not.
[0] https://github.com/monochromegane/the_platinum_searcher/issu...
There's also RGA (ripgrep-all) which searches binary files like PDFs, ebooks, doc files: https://github.com/phiresky/ripgrep-all
I suspect, in general, age has a fair amount to do with it (I certainly notice it in myself) but either way I think it's worth evaluating new things every so often.
Something like rg specifically can be really tricky to evaluate, because it does basically the same thing as the builtin grep, but sometimes just being faster crosses a threshold where you can use it in ways you couldn't previously.
E.g. some kind of find-as-you-type system: if it took 1s per letter it would be genuinely unusable, but 50ms might take it over the edge, so now it's an option. Stuff like that.
With 240 log files in various subfolders.
grep -q -r "22:02" --include="*.log" 4.15s user 0.09s system 99% cpu 4.269 total
grep -q -r "22:02" --include="*.log" 4.18s user 0.09s system 99% cpu 4.265 total
grep -q -r "22:02" --include="*.log" 4.31s user 0.09s system 99% cpu 4.401 total
rg -q "22:02" -t log 0.01s user 0.01s system 83% cpu 0.018 total
rg -q "22:02" -t log 0.01s user 0.01s system 93% cpu 0.017 total
rg -q "22:02" -t log 0.01s user 0.01s system 95% cpu 0.018 total
I really did not expect it to be that fast.
https://hwisnu.bearblog.dev/building-cgrep-using-safe_ch-cus...
It seems this was possible because ripgrep is inefficient in CPU usage when run multithreaded, using about 2x more CPU time than GNU grep.
https://hwisnu.bearblog.dev/levelized-cost-of-resources-in-b...
I don’t understand when people typeset some name in verbatim, lowercase, but then have another name for the actual command. That’s confusing to me.
Programmers are too enamored with lowercase names. Why not Ripgrep? Then I could surmise that there might not be a program ripgrep(1) (there might be a shorter version), since using capital letters is not traditional for CLI programs.
Look at Stacked Git:
https://stacked-git.github.io/
> Stacked Git, StGit for short, is an application for managing Git commits as a stack of patches.
> ... The `stg` command line tool ...
Now, I’ve been puzzled in the past when inputting `stgit` doesn’t work. But here they call it StGit for short, and the actual command is typeset in verbatim (stg(1) would have also worked).
alias g=grep
command -v rg >/dev/null 2>&1 && alias g=rg

You may be able to download ripgrep, and execute it (!), but god forbid you can create an alias in your shell in a persistent manner.
Really? "most" even? What CAN you do if you can't edit files in your own $HOME?
Most corporate machines are Windows boxes with ps and cmd.exe heavily restricted, no admin, and anti malware software surveilling I/O like a hawk.
You might get a git bash if you are lucky, but it's usually so slow it's completely unusable.
At one client I once tried to sneak in Clink. Flagged instantly by security and reported to HR.
It's easy to forget that life outside the HN bubble is still stuck there.
We are not talking about exceptions either. This is pretty standard stuff when you work outside of the IT-literate companies.
At one client, they provided me with a part-time tester but neglected to give him the permissions to install git. It took 3 weeks to fix.
The same client makes us develop on Windows machines but deploy to Linux pods. We can't directly test on Linux, nor connect to the pods, only deploy to them. In fact, we don't even have the specs of the pods; I had to create a whole API endpoint in the project just to be able to fetch them.
Other things I got to enjoy:
- CTO storing the passwords of all the servers in a LibreOffice file
- lead testing in prod, as root, by copying files through FTP. No version control.
- sysadmin with an interesting way of managing his servers: he remote-controlled one particular Windows machine via TeamViewer, which was the only one that could connect to them through ssh.
The list is quite long.
This makes you see the entire world with a whole new perspective.
I always thought that all devs should spend a year doing tech support for a variety of companies so that they get a reality check on what most humans actually have to deal with when working on a computer.
If you are on HN, you are the 1%.
The ".ignore" name was actually suggested by the author of ag (whereas the author of rg thought it was too generic): https://news.ycombinator.com/item?id=12568245
It's nice and everything, but I remember being happy with the tools before (I think I moved from grep to ack, then jumped to ag for perf, and to pt for unremembered reasons).
It took me a while, but I remembered I ran into an issue with pt incorrectly guessing the encoding of some files[0].
I can't remember whether rg suffered from the same issue or not, but I do know after switching to rg everything was plain sailing and I've been happy with it since.
[0] https://github.com/monochromegane/the_platinum_searcher/issu...
On a related note, it's now ten years since an everyday tool written in Rust was released and Rust is still seen as a scary new language that might turn out to be a quick fad.
TIL: rg uses Rust's regex library (incompatible with PCRE, incompatible with RE2).
// really hoping openai wouldn't now force him to work on some crappy codex stuff if he stays there / in astral.
https://reddit.com/r/rust/comments/1fvzfnb/gg_a_fast_more_li...
Also something-something about dependencies (a Rust staple): https://www.reddit.com/r/rust/comments/1fvzfnb/gg_a_fast_mor...
The TUI is great, and approximate matches are insanely useful.
Someone please make an awesome new sed and awk.
Is there any demand for it anyway?
A tool called sd exists that's close enough for sed, but I haven't seen anything similar for awk.
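For reference, sd's basic shape (a sketch; the names and file are illustrative - sd takes a find pattern and a replacement rather than sed's `s///` command language, and edits files in place when given file arguments):

```shell
# sed: sed -i 's/old_name/new_name/g' config.toml
# sd:  find-and-replace is the whole interface
sd 'old_name' 'new_name' config.toml
```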
Are you using "sd", or is there anything wrong with it as a replacement for sed?
I totally get that PCRE is a massive beast and might not be worth the effort. I would gladly settle for a smaller engine that can handle lookahead and lookbehind. Yeah, they're expensive, but they're powerful and convenient enough for me to still reach for them when I can.
Thanks for the consideration :-)
I thought "well I kinda want to do what rg does". Had a little glance and it was already nicely extracted into a separate crate that was a dream to use.
Thanks @BurntSushi!
Someone kinda did
I wonder how much the above reflects a dated and/or stereotyped view? Who here works with "sysadmins"? I mean... devops-all-the-places now, right? :P Share your experiences?: I'm curious.
If I were a sysadmin managing a fleet of disparate systems, I'd have some kind of "sanity check" script and run it automatically when I log on to a system, checking things like: Linux vs BSD, what tools are installed, load, any weirdness, etc. Heck, maybe even create aliases that abstract over all of it, as much as possible. Maybe copy over a helix binary for editing too, if that was kosher.
To be honest, I hate all the new Rust replacement tools; they introduce new behavior just for the sake of it, and it's annoying.