Run Generation

SWE-gen generation is driven by language-specific shell scripts in subblock/swegen/scripts/. Each script activates the environment, loads local runtime variables, configures Docker cache paths, and calls swegen create.

Collect PRs

For a quick validation run, collect a small Python pool first:

python repos/swegen/tools/collect_prs_wo_image.py \
  --languages python \
  --repo_num 2 \
  --max_prs_per_repo 10 \
  --max-candidate-repos 60 \
  --output_dir ./artifacts/collected_prs \
  --disable_progress_bar

For a production refresh, increase the collection target:

python repos/swegen/tools/collect_prs_wo_image.py \
  --languages python \
  --repo_num 100 \
  --max_prs_per_repo 50 \
  --output_dir ./artifacts/collected_prs

The output file is:

artifacts/collected_prs/python_pr_ids.txt

The production command can take a long time before writing the final PR ID file. Use the smaller command above for smoke tests and node validation. Repeat or schedule collection for all enabled languages when preparing production pools.

Run one language

Start with an explicit one-task command when validating a new node:

swegen create \
  --input-ids-file artifacts/collected_prs/python_pr_ids.txt \
  --max-pr 1 \
  --n-concurrent 1 \
  --output artifacts/swe_tasks/py-cc \
  --state-dir scripts/.swegen-py \
  --timeout 2400 \
  --cc-timeout 1800 \
  --no-require-issue \
  --min-source-files 1 \
  --max-source-files 10 \
  --docker-prune-batch 0

The command writes tasks to:

artifacts/swe_tasks/py-cc

For production Python generation, use the language script:

bash scripts/create_py.sh

The script reads runtime parameters from config.yaml, including concurrency. Setting N_CONCURRENT=1 in front of the script may not override the configured value. Script logs are written to:

artifacts/logs/swegen-create/cc_py_March.txt

Run all languages

After the smoke test is healthy:

bash scripts/create_all_bg.sh

The script launches the configured language create scripts in the background. Use tmux or a process supervisor for long production runs.

Tuned parameters

Each language has its own defaults:

N_CONCURRENT controls parallel PR cases.
--timeout controls the whole case timeout.
--cc-timeout controls the task completion model timeout.
--min-source-files and --max-source-files filter PR scope.
--docker-prune-batch controls Docker cleanup cadence.

You can override N_CONCURRENT at launch:

N_CONCURRENT=8 bash scripts/create_rust.sh

If the script loads concurrency from config.yaml, update the config value instead of relying only on a shell prefix.

LLM and Claude SDK notes

swegen create uses both OpenAI-compatible and Anthropic-compatible environment variables. On shared machines, clear stale values before a run, or set both provider families explicitly:

export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="$OPENAI_API_KEY"
export OPENAI_API_BASE_URL="https://your-openai-compatible-endpoint/v1"
export ANTHROPIC_BASE_URL="https://your-anthropic-compatible-endpoint"
export OPENAI_MODEL="..."
export ANTHROPIC_MODEL="..."

For reproducible task completion, prefer a clean Claude Code config directory:

export CLAUDE_CONFIG_DIR="$PWD/artifacts/claude-config/swegen-clean"
mkdir -p "$CLAUDE_CONFIG_DIR"

The current pinned CLI performs a small OpenAI-compatible preflight before generation. If your endpoint rejects that tiny response limit, first verify the same key, base URL, and model with a manual API call using a slightly larger max_tokens value, then use a compatible model/base combination for generation or update the pinned CLI before a production run.

Docker and local caches

Generation builds and validates Docker images. The scripts route Docker config, buildx state, and cloned repo cache away from shared filesystems when possible. These caches are performance state, not dataset state. They do not need to be committed or copied between nodes.

Resume behavior

Re-running a create script resumes from:

artifacts/swe_tasks/<lang>-cc/verifiable_tasks.txt
artifacts/swe_tasks/<lang>-cc/.swegen-create-batch/*.json
the optional .swegen-* task state directory

The resume logic reconciles successful batch entries against the verified task manifest, so stale success flags do not count unless the task files are present and the manifest lists the task ID.

Operational loop

A typical production loop is:

Keep PR pools full.
Run language create scripts.
Monitor verifiable_tasks.txt growth and batch-state failures.
Tune concurrency and timeouts.
Export verified tasks or let downstream blocks read manifests directly.

Open the Dashboard for the live progress view.

Run Generation

On this page