Primary Block Lists
Note: This document has been archived. The recommendations below were implemented as part of the 2025 rewrite. See the main README for current documentation.
Yep — and you can cut maintenance time dramatically by turning this repo into a config-driven build pipeline (instead of “hand-curated files + a pile of scripts”).
A couple things I can tell from the repo itself:
The big win now is to standardize the entire build around one pipeline + one source of truth.
Pick a single internal format as the “truth,” e.g.:
Everything else (hosts / dnsmasq / adguard / etc) becomes pure output rendering from that canonical set.
Why this saves time: format bugs and “why is it different between versions?” disappear because you only curate one dataset.
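As a rough illustration of what "one canonical dataset" could mean in practice, the sketch below reduces hosts-style, Adblock-style, and plain-domain lines to a single sorted list of lowercase domains. The function names (`parse_line`, `canonicalize`) and the validation regex are assumptions made for this example, not something taken from the repo.

```python
import re
from typing import Iterable, List, Optional

# Assumed canonical form: lowercase domain names, one per entry.
_DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9_](?:[a-z0-9_-]{0,61}[a-z0-9_])?\.)+[a-z0-9-]{2,63}$"
)


def parse_line(raw: str) -> Optional[str]:
    """Extract one domain from a hosts, Adblock-style, or plain line."""
    line = raw.strip().lower()
    if not line or line[0] in "#!":          # blank line or comment
        return None
    line = re.split(r"\s+#", line, maxsplit=1)[0]  # drop trailing comment
    if line.startswith("||"):                # Adblock-style: ||domain^
        candidate = line[2:].split("^", 1)[0]
    else:
        parts = line.split()
        if len(parts) >= 2 and parts[0] in ("0.0.0.0", "127.0.0.1"):
            candidate = parts[1]             # hosts-style: 0.0.0.0 domain
        elif len(parts) == 1:
            candidate = parts[0]             # plain domain
        else:
            return None
    candidate = candidate.rstrip(".")
    return candidate if _DOMAIN_RE.match(candidate) else None


def canonicalize(lines: Iterable[str]) -> List[str]:
    """Deduplicate and sort every parseable domain."""
    return sorted({d for d in map(parse_line, lines) if d})
```

With this shape, every upstream format goes through the same parser once, and the renderers never need to know where a domain came from.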
Create something like config/lists.yml:
local overrides:
Then the build system loops over the config.
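A minimal sketch of that loop, assuming the YAML has already been loaded into a dictionary. The keys (`lists`, `id`, `sources`, `allow`) and the source names are hypothetical placeholders; the real config/lists.yml may look different. The fetch function is passed in so the loop itself stays easy to test.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical shape of config/lists.yml after loading it with a YAML parser.
CONFIG = {
    "lists": [
        {"id": "ads", "sources": ["upstream-a", "upstream-b"], "allow": ["good.example"]},
        {"id": "tracking", "sources": ["upstream-c"], "allow": []},
    ]
}


def build_all(
    config: dict, fetch: Callable[[str], Iterable[str]]
) -> Dict[str, List[str]]:
    """Return {list_id: sorted domains} by looping over every configured list."""
    results: Dict[str, List[str]] = {}
    for entry in config["lists"]:
        domains = set()
        for source in entry["sources"]:
            # Normalize lightly here; a real build would run a full parser.
            domains.update(d.strip().lower() for d in fetch(source) if d.strip())
        # Local overrides: drop anything on this list's allowlist.
        domains -= set(entry.get("allow", []))
        results[entry["id"]] = sorted(domains)
    return results
```

Adding a list then means adding one more entry under `lists`; no new script is needed.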
Result: Adding a new list becomes “add 15 lines to YAML,” not “copy script X and hope it works.”
Most time waste in list projects is reprocessing everything from scratch.
Do this:
ETag/Last-Modified when possible
build/cache/<source_id>.txt and metadata json
apply:
sanity thresholds:
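The caching idea can be sketched with conditional HTTP requests from the standard library. This assumes the body is cached at build/cache/<source_id>.txt and that the validators sit next to it in <source_id>.json; the metadata file name and its keys (`etag`, `last_modified`) are assumptions for this example.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path

CACHE_DIR = Path("build/cache")


def conditional_headers(meta: dict) -> dict:
    """Turn stored validators into If-None-Match / If-Modified-Since headers."""
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def fetch_cached(source_id: str, url: str, cache_dir: Path = CACHE_DIR) -> str:
    """Download a source, reusing the cached copy when the server answers 304."""
    body_path = cache_dir / f"{source_id}.txt"
    meta_path = cache_dir / f"{source_id}.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    # Only send validators when there is a cached body to fall back on.
    headers = conditional_headers(meta) if body_path.exists() else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read().decode("utf-8", errors="replace")
            meta = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged upstream: reuse the cache
            return body_path.read_text(encoding="utf-8")
        raise
    cache_dir.mkdir(parents=True, exist_ok=True)
    body_path.write_text(body, encoding="utf-8")
    meta_path.write_text(json.dumps(meta))
    return body
```

Upstreams that send neither header simply get re-downloaded on every run, so the cache degrades gracefully.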
From canonical domains, generate:
0.0.0.0 domain
domain
server=/domain/
||domain^
…and any future formats.
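Since every output is just a per-line template over the canonical set, the renderers can be very small. The sketch below is one way to write them; the format keys (`hosts`, `plain`, `dnsmasq`, `adguard`) are labels chosen for this example.

```python
from typing import Iterable

# One line template per output format; {d} is the canonical domain.
TEMPLATES = {
    "hosts": "0.0.0.0 {d}",
    "plain": "{d}",
    "dnsmasq": "server=/{d}/",
    "adguard": "||{d}^",
}


def render(domains: Iterable[str], fmt: str) -> str:
    """Render the canonical domains in the requested output format."""
    template = TEMPLATES[fmt]
    lines = [template.format(d=d) for d in sorted(set(domains))]
    return "\n".join(lines) + "\n"
```

Supporting a future format would then be one more entry in `TEMPLATES`.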
manifest.json with counts + sources + build info
Instead of pushing straight to master, have actions open an automated PR:
This alone prevents “oops we shipped a bad upstream day.”
Here’s a minimal workflow skeleton:
name: Build Lists
on:
  schedule:
    - cron: "12 6 * * *" # daily
  workflow_dispatch: {}
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install deps
        run: |
          pip install -r scripts/requirements.txt
      - name: Build
        run: |
          python scripts/build.py --config config/lists.yml --out .
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v6
        with:
          title: "Automated list build"
          commit-message: "Automated list build"
          branch: "bot/list-build"
          labels: "automation"
(If you already publish releases via github-actions, this fits right into what you’re doing now.)
This is usually the real time sink.
Use GitHub Issue Forms for:
Required fields should include:
On issue open:
comment with:
This reduces back-and-forth to near zero.
Generate manifest.json (and optionally manifest.csv) with:
This makes debugging fast when users complain.
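A manifest along those lines could be assembled as below. The field names (`generated_at`, `commit`, `sources`, `files`, `entries`, `sha256`) are assumptions for this sketch, and the per-file checksum is an extra that was not part of the original suggestion.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List


def build_manifest(
    outputs: Dict[str, str], sources: List[str], commit: str = "unknown"
) -> dict:
    """Summarize generated files: entry counts, sources, and build info."""
    files = {}
    for name, text in outputs.items():
        # Count non-empty lines that are not '#' comments/headers.
        entries = sum(
            1 for line in text.splitlines() if line and not line.startswith("#")
        )
        files[name] = {
            "entries": entries,
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        }
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "commit": commit,
        "sources": sorted(sources),
        "files": files,
    }


def write_manifest(manifest: dict, path: str = "manifest.json") -> None:
    """Write the manifest as stable, diff-friendly JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
        handle.write("\n")
```

When a user reports a problem, the counts and checksums in the manifest show which build they actually have.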
It’s common for upstreams to get compromised or to ship garbage. Add build fails for:
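One possible shape for such a guard is sketched here. It fails the build when a list is suspiciously small or when its size swings too far from the previous build. Both conditions and both default thresholds (`min_entries`, `max_change`) are placeholders invented for this example and would need tuning per list.

```python
from typing import Optional


class SanityError(Exception):
    """Raised when a freshly built list looks wrong enough to block the build."""


def check_sanity(
    new_count: int,
    previous_count: Optional[int],
    min_entries: int = 1000,   # hypothetical floor
    max_change: float = 0.25,  # hypothetical 25% swing limit
) -> None:
    """Raise SanityError if the new list size is implausible."""
    if new_count < min_entries:
        raise SanityError(
            f"only {new_count} entries, expected at least {min_entries}"
        )
    if previous_count:
        change = abs(new_count - previous_count) / previous_count
        if change > max_change:
            raise SanityError(
                f"size changed by {change:.0%} "
                f"({previous_count} -> {new_count}), limit is {max_change:.0%}"
            )
```

Letting the exception propagate out of the build script gives a non-zero exit code, which is enough to stop the workflow before it opens a PR.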
At the top of each generated list:
This cuts repeated “is this still maintained?” questions. (You already get those.)
Right now, GitHub’s web view for that /scripts folder isn’t loading cleanly in my browsing tool (it returns GitHub’s “Uh oh! error”), so I can’t reliably read your current scripts line-by-line from that link in this chat.
If you paste the main build script(s) here (or upload them), I can:
build.py + lists.yml
If you only paste one file, paste the entrypoint script that orchestrates everything (the one you run to generate outputs).