Lists

Primary Block Lists


Archive: Optimization Recommendations

Note: This document has been archived. The recommendations below were implemented as part of the 2025 rewrite. See the main README for current documentation.


Yep — and you can cut maintenance time dramatically by turning this repo into a config-driven build pipeline (instead of “hand-curated files + a pile of scripts”).

A couple things I can tell from the repo itself:

The big win now is to standardize the entire build around one pipeline + one source of truth.


The “new architecture” that cuts maintenance the most

1) Make one canonical representation

Pick a single internal format as the “truth,” e.g.:

Everything else (hosts / dnsmasq / adguard / etc) becomes pure output rendering from that canonical set.

Why this saves time: format bugs and “why is it different between versions?” disappear because you only curate one dataset.
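As a rough illustration of "one canonical set, many renderers", the sketch below treats a plain set of domains as the source of truth and derives each output from it. The function names and the `RENDERERS` table are hypothetical, and the line syntaxes shown are the commonly used hosts, dnsmasq, and AdGuard forms, which may not match this repo's exact output.

```python
from typing import Callable, Dict, Iterable, List


def render_hosts(domains: Iterable[str]) -> List[str]:
    # Hosts-file style: sink address followed by the domain.
    return [f"0.0.0.0 {d}" for d in domains]


def render_dnsmasq(domains: Iterable[str]) -> List[str]:
    # dnsmasq style: answer the domain (and its subdomains) with 0.0.0.0.
    return [f"address=/{d}/0.0.0.0" for d in domains]


def render_adguard(domains: Iterable[str]) -> List[str]:
    # Adblock-style network rule.
    return [f"||{d}^" for d in domains]


# Hypothetical registry: adding a format means adding one entry here.
RENDERERS: Dict[str, Callable[[Iterable[str]], List[str]]] = {
    "hosts": render_hosts,
    "dnsmasq": render_dnsmasq,
    "adguard": render_adguard,
}


def render_all(domains: Iterable[str]) -> Dict[str, List[str]]:
    # Sort and dedupe once so every format is built from identical input.
    ordered = sorted(set(domains))
    return {name: render(ordered) for name, render in RENDERERS.items()}
```

Because every format is rendered from the same sorted set, two outputs can only differ in syntax, never in which domains they contain.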


2) Replace “scripts per list” with a declarative config

Create something like config/lists.yml:

Then the build system loops over the config.

Result: Adding a new list becomes “add 15 lines to YAML,” not “copy script X and hope it works.”
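To make the "build system loops over the config" idea concrete, here is a minimal sketch. The `CONFIG` dictionary stands in for whatever a YAML parser would return from `config/lists.yml`; its keys (`lists`, `sources`, `formats`), the example URLs, and the `ListSpec` type are all assumptions for illustration, not the repo's real schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical parsed form of config/lists.yml (schema is an assumption).
CONFIG = {
    "lists": {
        "ads": {
            "sources": ["https://example.invalid/ads.txt"],
            "formats": ["hosts", "adguard"],
        },
        "malware": {
            "sources": ["https://example.invalid/malware.txt"],
        },
    }
}


@dataclass(frozen=True)
class ListSpec:
    name: str
    sources: List[str]
    formats: List[str]


def load_specs(config: dict) -> List[ListSpec]:
    specs = []
    for name, entry in sorted(config["lists"].items()):
        if not entry.get("sources"):
            # Fail early on a misconfigured list instead of building nothing.
            raise ValueError(f"list {name!r} has no sources")
        formats = list(entry.get("formats", ["hosts"]))  # default is assumed
        specs.append(ListSpec(name, list(entry["sources"]), formats))
    return specs


def plan_build(specs: List[ListSpec]) -> List[Tuple[str, str]]:
    # Every (list, format) pair the build would produce.
    return [(spec.name, fmt) for spec in specs for fmt in spec.formats]
```

With this shape, adding a list is a data change only: a new entry under `lists` shows up in `plan_build` without touching any code.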


3) Make the pipeline incremental

Most wasted time in list projects comes from reprocessing everything from scratch on every run.

Do this:
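One common way to make a pipeline incremental, offered here as an assumption rather than a description of the implemented build, is to record a content hash per source and only reprocess sources whose hash changed. The function names and the cache file layout below are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def content_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def changed_sources(fetched: Dict[str, bytes], cache_file: Path) -> List[str]:
    """Return names of sources whose content differs from the last run.

    `fetched` maps a source name to its downloaded bytes. The cache file
    is a JSON object of name -> sha256 and is rewritten on every call.
    """
    previous = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    current = {name: content_digest(body) for name, body in fetched.items()}
    changed = sorted(n for n, digest in current.items() if previous.get(n) != digest)
    cache_file.write_text(json.dumps(current, indent=2, sort_keys=True))
    return changed
```

A build script could skip the later stages entirely when this returns an empty list.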


The pipeline stages (clean + boring + reliable)

Stage A — Fetch

Stage B — Normalize
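A normalization step could look roughly like the sketch below, which reduces a hosts-style line, an adblock-style rule, or a bare domain to one lowercase domain string. The set of accepted input shapes is an assumption; real upstreams may need more cases, and syntax validation is left to a later stage.

```python
from typing import Optional

# Sink addresses commonly seen at the start of hosts-file lines.
_SINK_ADDRESSES = ("0.0.0.0", "127.0.0.1")


def normalize_line(line: str) -> Optional[str]:
    """Return the domain on a source line, or None if there is none."""
    if line.lstrip().startswith("!"):
        return None  # adblock-style comment
    text = line.split("#", 1)[0].strip().lower()  # drop hosts-style comments
    if not text:
        return None
    parts = text.split()
    if len(parts) >= 2 and parts[0] in _SINK_ADDRESSES:
        text = parts[1]  # "0.0.0.0 example.com" -> "example.com"
    elif len(parts) == 1:
        text = parts[0]
    else:
        return None  # unrecognized shape; skip rather than guess
    if text.startswith("||") and text.endswith("^"):
        text = text[2:-1]  # "||example.com^" -> "example.com"
    text = text.rstrip(".")  # drop a trailing root dot
    return text or None
```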

Stage C — Merge + Dedup + Apply Overrides
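Merging, deduplicating, and applying overrides can be expressed as plain set operations. In this sketch the parameter names `allow` and `extra_block` are hypothetical, and the rule that an allow entry wins over everything else is an assumed precedence, not one stated in the original.

```python
from typing import AbstractSet, Iterable, List


def merge_domains(
    sources: Iterable[Iterable[str]],
    allow: AbstractSet[str] = frozenset(),
    extra_block: AbstractSet[str] = frozenset(),
) -> List[str]:
    merged = set(extra_block)   # manual additions
    for domains in sources:
        merged.update(domains)  # the set removes duplicates across sources
    merged -= set(allow)        # allow entries override every source
    return sorted(merged)       # stable order keeps diffs small
```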

Stage D — Validate
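For validation, a minimal sketch is a syntax check that separates well-formed domains from everything else so the build can report or reject the rest. The rules here (letters, digits, and hyphens only, at least one dot) are a deliberately strict assumption; a project that needs underscores or single-label names would have to relax them.

```python
import re
from typing import Iterable, List, Tuple

# One DNS label: 1-63 chars of a-z, 0-9, '-', not starting or ending with '-'.
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_valid_domain(domain: str) -> bool:
    if len(domain) > 253 or "." not in domain:
        return False
    return all(_LABEL.fullmatch(label) for label in domain.split("."))


def partition(domains: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split input into (valid, invalid), preserving input order."""
    valid: List[str] = []
    invalid: List[str] = []
    for domain in domains:
        (valid if is_valid_domain(domain) else invalid).append(domain)
    return valid, invalid
```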

Stage E — Render Outputs

From canonical domains, generate:

…and any future formats.

Stage F — Publish


GitHub Actions: stop “manual babysitting”

Instead of pushing straight to master, have actions open an automated PR:

This alone prevents “oops we shipped a bad upstream day.”

Here’s a minimal workflow skeleton:

name: Build Lists

on:
  schedule:
    - cron: "12 6 * * *" # daily at 06:12 UTC
  workflow_dispatch: {}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install deps
        run: |
          pip install -r scripts/requirements.txt

      - name: Build
        run: |
          python scripts/build.py --config config/lists.yml --out .

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v6
        with:
          title: "Automated list build"
          commit-message: "Automated list build"
          branch: "bot/list-build"
          labels: "automation"

(If you already publish releases via GitHub Actions, this fits right into what you’re doing now. (GitHub))


Automate 80% of issue/PR maintenance (add/remove requests)

This is usually the real time sink.

1) Issue forms (required fields)

Use GitHub Issue Forms for:

Required fields should include:

2) Bot triage comments automatically

On issue open:

This reduces back-and-forth to near zero.
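The core of such a triage bot is a small pure function that checks the requested domain against the current list and picks a reply. The sketch below assumes two request types, `add` and `remove`, and invents the reply wording; posting the comment through the GitHub API is left out.

```python
from typing import AbstractSet


def triage_comment(request_type: str, domain: str, blocked: AbstractSet[str]) -> str:
    """Return the text a bot could post on a newly opened request."""
    name = domain.strip().lower().rstrip(".")
    listed = name in blocked
    if request_type == "add":
        if listed:
            return f"`{name}` is already blocked, so no change is needed."
        return f"`{name}` is not on the list yet and has been queued for review."
    if request_type == "remove":
        if listed:
            return f"`{name}` is currently blocked and has been queued for review."
        return f"`{name}` is not on the list, so there is nothing to remove."
    raise ValueError(f"unknown request type: {request_type!r}")
```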


“Greatly update it” ideas that also reduce your workload

Add a machine-readable manifest

Generate manifest.json (and optionally manifest.csv) with:

This makes debugging fast when users complain.
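A manifest generator can be very small. In this sketch the fields (`generated_at`, plus per-file `entries` and `sha256`) are assumptions about what would be useful when debugging a user report, not the fields the original recommendation listed.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def build_manifest(outputs: Dict[str, str], generated_at: Optional[datetime] = None) -> dict:
    """`outputs` maps an output file name to its full text."""
    stamp = (generated_at or datetime.now(timezone.utc)).isoformat()
    files = {}
    for name, text in sorted(outputs.items()):
        lines = [ln for ln in text.splitlines() if ln.strip()]
        files[name] = {
            # Count non-blank lines that are not '#' comments.
            "entries": sum(1 for ln in lines if not ln.lstrip().startswith("#")),
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        }
    return {"generated_at": stamp, "files": files}


def manifest_json(manifest: dict) -> str:
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Comparing the `sha256` a user reports against the manifest tells you immediately whether they are on the current build.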

Add “poisoning protection”

It’s common for upstreams to get compromised or to ship garbage. Make the build fail on:
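Two checks that are often used for this kind of protection are sketched below: refusing a build that blocks a protected domain, and refusing one whose size changed too much since the last build. Both conditions, the 25% default threshold, and all names are assumptions chosen for illustration.

```python
from typing import AbstractSet, Iterable


class BuildGuardError(Exception):
    """Raised when a build looks poisoned and should not be published."""


def guard_build(
    previous_count: int,
    new_domains: Iterable[str],
    protected: AbstractSet[str],
    max_change: float = 0.25,
) -> int:
    current = set(new_domains)
    # Check 1: domains that must never be blocked.
    hits = sorted(current & set(protected))
    if hits:
        raise BuildGuardError(f"protected domains would be blocked: {hits}")
    # Check 2: sudden growth or shrinkage relative to the last build.
    if previous_count > 0:
        change = abs(len(current) - previous_count) / previous_count
        if change > max_change:
            raise BuildGuardError(
                f"entry count changed by {change:.0%} (limit {max_change:.0%})"
            )
    return len(current)
```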

Add provenance headers

At the top of each generated list:

This cuts repeated “is this still maintained?” questions. (You already get those. (GitHub))
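A provenance header can be produced by one helper that every renderer calls. The specific fields (title, generation time, entry count, sources) are assumptions, and the comment prefix is a parameter because formats differ, for example `#` for hosts files and `!` for adblock-style lists.

```python
from datetime import datetime, timezone
from typing import Iterable


def provenance_header(
    title: str,
    entry_count: int,
    sources: Iterable[str],
    generated_at: datetime,
    comment: str = "#",
) -> str:
    """Build a comment header; `generated_at` must be timezone-aware."""
    stamp = generated_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    lines = [
        f"{comment} Title: {title}",
        f"{comment} Generated: {stamp}",
        f"{comment} Entries: {entry_count}",
    ]
    lines += [f"{comment} Source: {src}" for src in sources]
    return "\n".join(lines) + "\n"
```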


What I can do next (without guessing)

Right now, GitHub’s web view for that /scripts folder isn’t loading cleanly in my browsing tool (it returns GitHub’s “Uh oh! error”), so I can’t reliably read your current scripts line-by-line from that link in this chat.

If you paste the main build script(s) here (or upload them), I can:

If you only paste one file, paste the entrypoint script that orchestrates everything (the one you run to generate outputs).