We’ve been building since 2020 (we were part of the YC W21 batch), iterating on the product, building out a team, etc. Back in 2020, one of our users asked if we could also show the carbon impact alongside costs.
It has been itching my brain ever since. The biggest challenge has always been the carbon data. Mapping carbon data to infrastructure is time-consuming, but it is possible, since we’ve already done it with cloud costs. But we need the raw carbon data first. The conversations I’ve had over the last few years finally led me to a company called Greenpixie in the UK. A few of our existing customers were using them already, so I immediately connected with the founder, John.
Greenpixie said they have the data (AHA!), and their data is verified (ISO-14064 and aligned with the Greenhouse Gas Protocol). As soon as I had talked to a few of their customers, I asked my team to see if we could finally do this and build it.
My thinking is this: some engineers will care, and some will not (or maybe some will love it and some will hate it!). For those who care, cost and carbon are actually linked: if you reduce carbon, you usually reduce cloud cost too. It can act as another motivating factor.
And now it is here, and I’d love your feedback. To try it out, go to https://dashboard.infracost.io/, create an account, set it up with the GitHub or GitLab app, and send a pull request with Terraform changes (you can use our example Terraform file). It will then show you the cost impact alongside the carbon impact, and how you can optimize it.
I’d especially love to hear whether carbon is a big driver for engineers within your teams, or for your company as a whole (i.e. is there anything top-down about carbon).
AMA - I’ll be monitoring the thread :)
Thanks
There is one implementation detail that I geek out about:
It is zero-config and has built-in leader nomination for running the web server and MCP server. When you start one `teemux` instance, it starts the web server; when you start a second and third instance, they join the first and start merging logs. If you kill the first instance, a new leader is nominated. This design lets you seamlessly add and remove nodes that share logs (something that historically would have required a central log aggregator).
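For the curious, here is the general shape of port-based leader nomination as a simplified Python sketch (this is not the actual teemux code, and the port and protocol are made up):

import socket
import time

PORT = 8377  # hypothetical well-known port; the real port/protocol will differ

def run_instance(serve, stream_logs):
    while True:
        try:
            # Whoever binds the well-known port first is the leader: it runs
            # the web server and merges logs coming from followers.
            server_sock = socket.create_server(("127.0.0.1", PORT))
        except OSError:
            # Port taken: another instance is already the leader. Join it as a
            # follower and forward our logs until that connection drops.
            try:
                with socket.create_connection(("127.0.0.1", PORT)) as conn:
                    stream_logs(conn)  # blocks until the leader goes away
            except OSError:
                pass
            time.sleep(0.2)  # brief backoff, then race to become the new leader
            continue
        with server_sock:
            serve(server_sock)  # blocks for the lifetime of this leader

The nice property of a scheme like this is that "leader" is just whoever holds the port, so recovery falls out of ordinary reconnect logic.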
A super quick demo:
npx teemux -- curl -N https://teemux.com/random-logs
Everything looked correct. OpenSSL 3 with the FIPS provider enabled. Ruby built against it. A simple pg connection worked.
The app failed once ActiveRecord was involved. The error came from libpq. It turned out the pg gem had pulled in a prebuilt native dependency that was linked against different crypto. That path was always there. It just was not exercised until ActiveRecord hit it.
Forcing a source build fixed the issue because the extension then linked against the OpenSSL in the image.
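For anyone hitting the same thing, this is the rough shape of one way to force the source build and then check the result, assuming the prebuilt piece came in as a precompiled platform gem via Bundler (force_ruby_platform needs a reasonably recent Bundler; a Linux image is assumed for ldd):

# Prefer source builds over precompiled platform gems for this project
bundle config set force_ruby_platform true
bundle install

# Then confirm what each native extension actually links against
find "$(gem env gemdir)" -name '*.so' -exec sh -c 'echo "$1"; ldd "$1" | grep -iE "libssl|libcrypto"' _ {} \;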
The takeaway is that a FIPS base image does not mean your dependency graph respects the same boundary once native code is involved.
Curious how others have seen this play out in Ruby, Python wheels, Go with CGO, or Node native addons.
He described how they measured atomic hand movements (reach, grasp, orient) in decimal seconds to balance the line. But he made a distinction that stuck with me:
Back then, the goal was Flow (smoothness), which inherently required some slack in the system. Today, he argued, the goal of modern management is Utilization (removing every micro-second of downtime).
His quote: "We deleted the 'waiting,' but we forgot that the waiting was the only time the human got to breathe."
I feel like I see this exact pattern in Software Engineering now. We treat developer idle time as a defect to be eliminated with JIRA tickets, rather than as the necessary slack required for thinking.
Ask HN: For those who have been in the industry for 20+ years, do you agree?
This post summarizes four runs of the same task (search → first product → add to cart → checkout on Amazon). The key comparison is Demo 0 (cloud baseline) vs Demo 3 (local autonomy); Demos 1–2 are intermediate controls.
More technical detail (architecture, code excerpts, additional log snippets):
https://www.sentienceapi.com/blog/verification-layer-amazon-...
Demo 0 vs Demo 3:
Demo 0 (cloud, GLM‑4.6 + structured snapshots)
  success: 1/1 run
  tokens: 19,956 (~43% reduction vs ~35k estimate)
  time: ~60,000ms
  cost: cloud API (varies)
  vision: not required

Demo 3 (local, DeepSeek R1 planner + Qwen ~3B executor)
  success: 7/7 steps (re-run)
  tokens: 11,114
  time: 405,740ms
  cost: $0.00 incremental (local inference)
  vision: not required
Latency note: the local stack is slower end-to-end here largely because inference runs on local hardware (Mac Studio with M4); the cloud baseline benefits from hosted inference, but has per-token API cost.
Architecture
This worked because we changed the control plane and added a verification loop.
1) Constrain what the model sees (DOM pruning). We don’t feed the entire DOM or screenshots. We collect raw elements, then run a WASM pass to produce a compact “semantic snapshot” (roles/text/geometry) and prune the rest (often on the order of ~95% of nodes).
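To make the idea concrete, here is a rough Python sketch of that filter (the real pass runs in WASM, and these field names are assumptions rather than the actual snapshot schema):

# Keep only nodes an agent could plausibly read or act on; drop the rest.
INTERACTIVE_ROLES = {"button", "link", "textbox", "combobox", "checkbox", "option"}

def semantic_snapshot(raw_elements):
    kept = []
    for el in raw_elements:  # el: dict produced by the raw DOM walk
        role = el.get("role", "")
        text = (el.get("text") or "").strip()
        w, h = el.get("width", 0), el.get("height", 0)
        if w <= 0 or h <= 0:
            continue  # prune hidden / zero-size nodes
        if role not in INTERACTIVE_ROLES and not text:
            continue  # prune purely structural wrappers
        kept.append({
            "id": el["id"],     # stable id the executor can target
            "role": role,
            "text": text[:80],  # cap text to keep the snapshot compact
            "box": (el.get("x", 0), el.get("y", 0), w, h),
        })
    return kept  # typically a small fraction of the original nodes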
2) Split reasoning from acting (planner vs executor).
Planner (reasoning): DeepSeek R1 (local) generates step intent + what must be true afterward.
Executor (action): Qwen ~3B (local) selects concrete DOM actions like CLICK(id) / TYPE(text).
3) Gate every step with Jest‑style verification. After each action, we assert state changes (URL changed, element exists/doesn’t exist, modal/drawer appeared). If a required assertion fails, the step fails with artifacts and bounded retries.
Minimal shape:
ok = await runtime.check(
    exists("role=textbox"),
    label="search_box_visible",
    required=True,
).eventually(timeout_s=10.0, poll_s=0.25, max_snapshot_attempts=3)
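Putting the pieces together, roughly how one gated step runs end to end (a sketch only; plan_step, choose_action, apply and save_artifacts are illustrative names, not the real API):

async def run_step(planner, executor, runtime, goal, max_retries=2):
    # Planner produces the intent plus the postcondition that must hold afterward,
    # e.g. "click the first result" + "URL changed to a product page".
    step = await planner.plan_step(goal)
    for attempt in range(max_retries + 1):
        # Executor picks a concrete action (CLICK(id) / TYPE(text)) from the snapshot.
        action = await executor.choose_action(step, runtime.snapshot())
        await runtime.apply(action)
        # Inline gate: assert the expected state change before moving on.
        ok = await runtime.check(
            step.postcondition,
            label=step.label,
            required=True,
        ).eventually(timeout_s=10.0, poll_s=0.25)
        if ok:
            return True  # progress proven; go to the next step
        runtime.save_artifacts(step.label, attempt)  # keep evidence for the failed attempt
    return False  # bounded retries exhausted: stop instead of drifting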
What changed between “agents that look smart” and agents that work
Two examples from the logs:
Deterministic override to enforce “first result” intent: “Executor decision … [override] first_product_link -> CLICK(1022)”
Drawer handling that verifies and forces the correct branch: “result: PASS | add_to_cart_verified_after_drawer”
The important point is that these are not post‑hoc analytics. They are inline gates: the system either proves it made progress or it stops and recovers.
Takeaway
If you’re trying to make browser agents reliable, the highest‑leverage move isn’t a bigger model. It’s constraining the state space and making success/failure explicit with per-step assertions.
Reliability in agents comes from verification (assertions on structured snapshots), not just scaling model size.