Run Generation
Prepare PR pools, run create scripts, and monitor generation
SWE-gen generation is driven by language-specific shell scripts in
subblock/swegen/scripts/. Each script activates the environment, loads local
runtime variables, configures Docker cache paths, and calls swegen create.
Collect PRs
For a quick validation run, collect a small Python pool first:
python repos/swegen/tools/collect_prs_wo_image.py \
--languages python \
--repo_num 2 \
--max_prs_per_repo 10 \
--max-candidate-repos 60 \
--output_dir ./artifacts/collected_prs \
--disable_progress_barFor a production refresh, increase the collection target:
python repos/swegen/tools/collect_prs_wo_image.py \
--languages python \
--repo_num 100 \
--max_prs_per_repo 50 \
--output_dir ./artifacts/collected_prsThe output file is:
artifacts/collected_prs/python_pr_ids.txtThe production command can take a long time before writing the final PR ID file. Use the smaller command above for smoke tests and node validation. Repeat or schedule collection for all enabled languages when preparing production pools.
Run one language
Start with an explicit one-task command when validating a new node:
swegen create \
--input-ids-file artifacts/collected_prs/python_pr_ids.txt \
--max-pr 1 \
--n-concurrent 1 \
--output artifacts/swe_tasks/py-cc \
--state-dir scripts/.swegen-py \
--timeout 2400 \
--cc-timeout 1800 \
--no-require-issue \
--min-source-files 1 \
--max-source-files 10 \
--docker-prune-batch 0The command writes tasks to:
artifacts/swe_tasks/py-ccFor production Python generation, use the language script:
bash scripts/create_py.shThe script reads runtime parameters from config.yaml, including concurrency.
Setting N_CONCURRENT=1 in front of the script may not override the configured
value. Script logs are written to:
artifacts/logs/swegen-create/cc_py_March.txtRun all languages
After the smoke test is healthy:
bash scripts/create_all_bg.shThe script launches the configured language create scripts in the background.
Use tmux or a process supervisor for long production runs.
Tuned parameters
Each language has its own defaults:
N_CONCURRENTcontrols parallel PR cases.--timeoutcontrols the whole case timeout.--cc-timeoutcontrols the task completion model timeout.--min-source-filesand--max-source-filesfilter PR scope.--docker-prune-batchcontrols Docker cleanup cadence.
You can override N_CONCURRENT at launch:
N_CONCURRENT=8 bash scripts/create_rust.shIf the script loads concurrency from config.yaml, update the config value
instead of relying only on a shell prefix.
LLM and Claude SDK notes
swegen create uses both OpenAI-compatible and Anthropic-compatible environment
variables. On shared machines, clear stale values before a run, or set both
provider families explicitly:
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="$OPENAI_API_KEY"
export OPENAI_API_BASE_URL="https://your-openai-compatible-endpoint/v1"
export ANTHROPIC_BASE_URL="https://your-anthropic-compatible-endpoint"
export OPENAI_MODEL="..."
export ANTHROPIC_MODEL="..."For reproducible task completion, prefer a clean Claude Code config directory:
export CLAUDE_CONFIG_DIR="$PWD/artifacts/claude-config/swegen-clean"
mkdir -p "$CLAUDE_CONFIG_DIR"The current pinned CLI performs a small OpenAI-compatible preflight before
generation. If your endpoint rejects that tiny response limit, first verify the
same key, base URL, and model with a manual API call using a slightly larger
max_tokens value, then use a compatible model/base combination for generation
or update the pinned CLI before a production run.
Docker and local caches
Generation builds and validates Docker images. The scripts route Docker config, buildx state, and cloned repo cache away from shared filesystems when possible. These caches are performance state, not dataset state. They do not need to be committed or copied between nodes.
Resume behavior
Re-running a create script resumes from:
artifacts/swe_tasks/<lang>-cc/verifiable_tasks.txtartifacts/swe_tasks/<lang>-cc/.swegen-create-batch/*.json- the optional
.swegen-*task state directory
The resume logic reconciles successful batch entries against the verified task manifest, so stale success flags do not count unless the task files are present and the manifest lists the task ID.
Operational loop
A typical production loop is:
- Keep PR pools full.
- Run language create scripts.
- Monitor
verifiable_tasks.txtgrowth and batch-state failures. - Tune concurrency and timeouts.
- Export verified tasks or let downstream blocks read manifests directly.
Open the Dashboard for the live progress view.