What is QA Prompt Composer?

QA Prompt Composer is a free web tool that builds structured AI prompts for software testers. It generates Gherkin test cases, bug reports with root-cause hypotheses, Playwright TypeScript specs, API test suites, regression checklists, exploratory charters, and synthetic test data — all structured for use with any AI assistant. No signup required.

What's the correct order to build a prompt in QA Prompt Composer?

Work top to bottom through the three steps: (1) Pick a Purpose — this sets sensible Role, Output Format, Reasoning Mode, and Coverage defaults automatically. (2) Add your content — paste the ticket, requirement, logs, or scenario into the main textarea. (3) Tune the selectors — only change a selector if you can name a specific reason. Then click Generate and copy the assembled prompt into your AI assistant.

I have a Jira ticket. Should I use 'Requirement & Jira' or 'Test case writing'?

Use 'Requirement & Jira' first when the ticket is vague, missing acceptance criteria, or has ambiguous behaviour. This purpose produces testable AC statements, a numbered ambiguity list for the PM/dev, and a risk map — it does not write test cases. Use 'Test case writing' directly when the ticket is already well-specified with clear, measurable acceptance criteria and you're ready to generate scenarios.

When should I use Gherkin vs a lightweight checklist vs a column-based suite?

Use a lightweight checklist for smoke sweeps, sanity checks, and tight deadlines where no formal test management is needed. Use BDD/Gherkin for Agile teams doing BDD/ATDD where scenarios need to be readable by non-technical stakeholders and feed Cucumber/SpecFlow automation directly. Use a column-based suite for enterprise, regulated, or compliance/audit environments with imports to Xray, Zephyr, or TestRail.

How do I generate a Playwright TypeScript spec from a manual test case?

First run Automation Strategy on your batch of manual test cases to decide what to automate and at which pyramid layer. Then switch Purpose to Automation (Playwright), paste a single manual test case with numbered steps, and generate. The output is a runnable TypeScript spec using Arrange-Act-Assert structure, role-based locators (getByRole, getByLabel, getByTestId), and web-first assertions — never waitForTimeout.

What guards should I always have on?

Two guards are safe defaults for almost every prompt: Grounding (prevents the model from inventing limits or selectors not in your input — on by default for most Tier 1 purposes) and No-preamble (makes the response start directly at the deliverable with no intro wrapper — useful when you'll copy output straight into a tool). Add Redact/Privacy when pasting logs with real tokens or PII.

My CI is failing intermittently. Which purpose handles flaky tests?

Use the Flake Triage purpose (Tier 2). Paste the failing test code and CI failure logs into the content field. The prompt applies a hypothesis-evidence ladder over the logs to root-cause non-determinism: race conditions, animation/network timing, shared or leaked state, ordering dependencies, and time/locale sensitivity. It proposes concrete fixes and recommends a quarantine policy.

Does QA Prompt Composer work with ChatGPT, Claude, Gemini, and other AI tools?

Yes. QA Prompt Composer assembles structured prompts that you copy and paste into any AI assistant — ChatGPT, Claude, Gemini, Copilot, or any other. It is not an AI itself; it structures your inputs and selectors into an optimised prompt that any LLM can process. No API keys or accounts are needed.

Quick Answers

Frequently Asked Questions

Q: How do I save and reuse my own prompt configurations?

Open the Presets menu, type a name in the 'Save current configuration' field, and click Save. The entire state — purpose, all selectors, and context fields — is saved to your browser's local storage. To share across browsers or with your team, use Export to save a JSON file, then Import on another machine.

Q: What's the difference between 'Data Generator' and 'Data Validation' purposes?

These work in opposite directions. Data Generator produces synthetic test data (CSV, JSON, SQL, XML, Faker.js) — use it when you need records to seed an environment or feed a test. Data Validation validates data that already exists (migration output, API response, ETL result) — use it when numbers look wrong, a migration just ran, or an API response shape is suspect.

Decision-making help for every selector and workflow. Click any question to expand.

Using the Composer

What does this tool actually do? +

QA Prompt Composer is a training and reference tool for understanding AI prompt structure. It shows you which options (purpose, role, coverage, guards) go into a quality QA prompt and how they combine — so you can build that awareness and apply it in your own AI agent workflow.

It also assembles a structured prompt you can copy and paste into an AI assistant (ChatGPT, Claude, Gemini, or any other). It does not call an AI itself.

The loop is:

Pick a Purpose (e.g. “Test case writing” or “Bug reporting”) to see the relevant selectors.
Optionally paste your raw material — a Jira ticket, steps to reproduce, an API spec. Leave it blank to explore the prompt structure without submitting content.
Adjust selectors if you have a specific reason to (most defaults are calibrated for each purpose).
Click Generate to see the assembled prompt structure.
Copy the prompt, update your content as needed, and paste it into your AI agent.

Content sent to the server is only used to assemble the prompt. It is not stored, logged, or used for training. No account, no API key, no cost.

What does a structured prompt add compared to a freeform one? +

A freeform prompt like “write Gherkin test cases for this feature” tends to produce generic output because the AI has no persona, no explicit format constraint, no coverage instruction, and no grounding guard. It fills in the gaps with guesses.

The Composer shows you what those components look like and assembles them systematically:

Without Composer

“Write Gherkin test cases for TEST-4821. It’s about adding a trim column to the inventory CSV export.”

No role · No format spec · No grounding · No output constraint → generic, inconsistent output

With Composer

Persona: senior Test Case Author / Manual QA
Task: Gherkin BDD, happy path + negative + boundary
Source: your ticket, isolated from the instructions
Guard: no preamble, grounded in stated AC only

→ Consistent, format-correct output grounded in your actual ticket

The difference is most visible for complex or format-sensitive outputs — Gherkin scenarios, column-based test suites, Playwright specs, and bug reports with root-cause hypotheses. Understanding these components is also what you take with you when prompting any AI agent directly.

What’s the correct order to build a prompt? +

Work top to bottom through the three steps:

Step 1 – Pick a Purpose. This sets sensible Role, Output, Reasoning, and Coverage defaults automatically. Most of the time you change nothing else.
Step 2 – Add your content. Paste the ticket, requirement, logs, or scenario into the main textarea. Fill in optional sources (logs, schema, house-style example) only when they’re needed.
Step 3 – Tune the selectors. Only change a selector if you can name a specific reason — wrong output format, need a different role, want to lock down tone. The defaults are calibrated; unnecessary changes add noise.

The prompt assembles live on the right panel. When it looks right, hit Copy prompt and paste it into your AI assistant.

When should I trust the defaults vs change the selectors? +

Trust the defaults as the starting point. Each Purpose was calibrated with the most useful combination. Change a selector only when you have a concrete reason:

Output format — your team uses column-based suites for audit evidence, not Gherkin
Role — the default role is close but missing a second failure dimension (e.g. data integrity on top of a flow test)
Coverage — the feature has complex role-based permissions that the default doesn’t cover
Guards — you’re pasting real customer data and need the Redact guard on

If you’re changing three or more selectors routinely for the same task type, save it as a preset so you don’t repeat that work.

What are built-in presets and when should I load one? +

Built-in presets are one-click configurations for the five most common daily QA workflows. Load them from the Presets ▾ menu in the left panel header.

Preset	Best for
Gherkin from Jira Ticket	A new ticket that needs BDD scenarios immediately
Bug Report + Root Cause	Something broke and you need a dev-ready report with a root-cause hypothesis
Regression Smoke Checklist	Before a release — confirming nothing broke across existing features
API Test Cases (Column Suite)	Testing a REST endpoint for status codes, payload shape, auth, errors
Playwright E2E Spec	Converting a manual scenario to a runnable TypeScript spec

All fields remain editable after loading. You can also save your own presets, export them as JSON, and share across browsers or team members.

How do I save and reuse my own configurations? +

Open the Presets ▾ menu, type a name in the “Save current configuration” field, and click Save. The entire state — purpose, all selectors, and context fields — is saved to your browser’s local storage.

Load: select from the dropdown and click Load
Delete: select and click Delete
Share across browsers / team: Export as a JSON file, then Import on another machine

Presets survive browser restarts but are browser-specific unless exported. Export regularly if the configuration matters.

Choosing a Purpose

I have a Jira ticket. “Requirement & Jira” or “Test case writing”? +

It depends on how well-specified the ticket is.

Use “Requirement & Jira” if:

The acceptance criteria are vague, missing, or written in business language with no measurable conditions.
You see words like “should work correctly”, “as expected”, or “user-friendly” with no concrete pass/fail condition.
There are open questions about edge cases or role-based behaviour.

Use “Test case writing” directly if:

The ticket has explicit, numbered acceptance criteria with observable outcomes.
Each AC item states a condition, an action, and an expected result.

The safest workflow

Always run Requirement & Jira first. It produces an ambiguity list — if the list is empty, the ticket is ready; if it has items, resolve them with the product owner before writing cases. Writing test cases against an ambiguous ticket wastes time.

What’s the difference between “Data generator” and “Data validation”? +

These are opposite directions:

Data Generator

Produces synthetic test data that does not yet exist.

You define the schema (field names, types, constraints) and the tool produces records you can use as test input. Use it when you need realistic, PII-safe, constraint-honouring test data for seeding databases, API fixtures, or import tests.

Data Validation

Checks data that already exists.

You paste the actual data (an API response, a migrated table export, an ETL result) and the tool checks it against type, range, format, nullability, uniqueness, referential integrity, and business invariants. Use it after a migration, after an ETL run, or when an API response shape looks wrong.

Practical sequence

Use Data Generator to create the input data → run your process → use Data Validation to verify the output.

When do I use “Automation strategy” instead of going straight to “Automation (Playwright)”? +

Use Automation Strategy first whenever you have more than one test case to automate.

Automation Strategy answers three questions the Playwright purpose cannot:

Should this case be automated at all? Some cases are cheaper to keep manual (one-off, visual, exploratory). Automating them creates maintenance debt with no ROI.
Which pyramid layer? A test that checks business logic belongs at the unit or API layer — not E2E. Putting it in Playwright makes it slower, flakier, and harder to debug.
In what order? High-frequency, high-stability cases first. New features under active development last.

Common mistake

Skipping this step is the most common cause of brittle, high-maintenance test suites. Run Automation Strategy on your batch, get the “automate now / automate later / keep manual” decision for each case, then run Automation (Playwright) one case at a time on the approved candidates.

My CI is failing intermittently. Which purpose handles that? +

Use the Flake Triage purpose (Tier 2). Paste the failing test code and CI failure logs into the content field. The prompt will:

Apply a hypothesis-evidence ladder over the logs
Root-cause non-determinism: race conditions, animation/network timing, shared or leaked state, test-ordering dependencies, time/locale sensitivity
Propose concrete fixes (e.g. replace hard waitForTimeout with web-first assertions, isolate setup/teardown)
Recommend a quarantine policy and flake budget

The more log context you paste — stack trace, timing, CI environment, frequency of failure — the better the root-cause output.

Roles & Output Formats

“The numbers are wrong” vs “the button doesn’t work” — which role? +

These are two different failure modes at two different layers, and the right role depends on which layer the defect lives at.

“The button doesn’t work”

UI/interaction failure → use Bug Reporter / Defect Analyst.

The defect is in behaviour: something is unresponsive, mis-labelled, or throws a UI error. Steps to reproduce focus on user actions and visual state.

“The numbers are wrong”

Data/calculation failure → use Bug Reporter / Defect Analyst as primary, but add Data Validation Specialist as a second role.

The defect is in a value: a calculation is incorrect, a field is truncated, a decimal is rounded wrong.

Practical test

If you can reproduce the defect without opening a browser (e.g. by querying the database or calling the API directly), it’s a data-layer bug — use the second-role pairing. If you need the UI to repro it, the bug is in the UI layer — use Bug Reporter alone.

When do I add a second role and what does it actually change? +

Add a second role only when the task genuinely spans two failure dimensions — not as a default.

What it changes: the first selected role sets the primary lens (vocabulary, coverage instincts, output structure). The second role adds an additional set of instincts to the same output — it does not produce two separate outputs.

Example: Test Case Author + Accessibility Test Engineer → produces test cases that include functional coverage AND WCAG 2.2 AA checks (keyboard, focus, ARIA, contrast) in the same output.

Useful pairings:

Test Case Author + Data Validation Specialist — feature involves data correctness (types, ranges, nulls)
Test Case Author + Security Test Engineer — feature handles auth, tokens, or PII
Bug Reporter + Data Validation Specialist — defect involves wrong values, not wrong behaviour
Playwright Engineer + Flake Triage — writing a new spec while diagnosing why an existing one is flaky

Checklist vs Gherkin vs Column-based suite — when does each make sense? +

Format	Effort	Traceability	Use when
Lightweight checklist	Low	Low	Smoke sweeps, sanity checks, tight deadlines, MVPs.
BDD / Gherkin	Medium	Medium-High	Agile BDD/ATDD teams. Readable by non-technical stakeholders, can feed Cucumber/SpecFlow.
Column-based suite	Highest	Highest	Enterprise, regulated, compliance/audit. Imports to Xray/Zephyr/TestRail.

The “Regression smoke checklist” preset locks Checklist + concise length. The “Gherkin from Jira Ticket” preset locks Gherkin. The “API Test Cases” preset locks Column-based suite.

What does “Runnable spec file” actually output, and is it safe to run as-is? +

The Runnable spec file output instructs the model to emit a single runnable file — imports, describe/test blocks, setup/teardown, and assertions — in the idioms of the framework set in the Framework context field.

How safe is it? Treat it as a strong first draft:

The locator names (e.g. getByRole('button', { name: 'Save' })) will match your input — verify they reflect the actual DOM
The API seeding calls will use placeholder endpoints — replace with real paths
Check the framework version — @playwright/test API changes between major versions
Run it in headed mode first (--headed) to watch what it actually does before adding it to CI

Guards & Coverage

Which guards should I always have on? +

Two guards are safe defaults for almost every prompt:

Grounding — prevents the model from inventing limits, causes, or selectors not present in your input. Turn it on whenever facts, logs, or numbers are involved. It’s on by default for most Tier 1 purposes.
No-preamble — makes the response start directly at the deliverable with no “Here is your test suite...” wrapper. Useful for output you’ll copy straight into a tool.

Add these situationally:

Redact / privacy — when pasting real logs or tickets that contain tokens, customer names, or PII.
PII / synthetic — auto-enabled for the Data Generator purpose. Keeps data fully synthetic.
Self-check — for structured output with fixed columns or ID sequences.

What does “Pairwise” coverage actually do, and how is it different from “Boundary”? +

Boundary targets individual field edges — min, min−1, max, max+1 values for a single field. Use it whenever constrained numeric or string fields exist.

Pairwise targets combinations of fields. When you have many inputs, an exhaustive cross-product of all combinations is too large to test. Pairwise (all-pairs) ensures every pair of field values appears in at least one test case. This gives ~90% defect detection at a fraction of the test count.

Example: a search form with 4 filters, each with 3 values = 81 combinations exhaustively. Pairwise reduces this to ~15 cases while still covering every value paired with every other value at least once.

Use both when a form has both range-constrained fields (needs Boundary) and multiple interacting inputs (needs Pairwise). They solve different coverage problems.

When should I fill in the “Roles under test” context field? +

Fill it in whenever you select Permission coverage. Without it, the model has no role names to vary across and produces generic “authorized/unauthorized” language instead of concrete cases.

Also fill it in for:

Features with visibility differences per role (e.g. pricing tiers, dashboard widgets)
Security testing where IDOR or broken access control is a concern
Multi-tenant systems where one tenant must not see another’s data

Format: comma-separated role names matching what your system actually uses — e.g. admin, dealer, customer, guest. The model will vary test data and assertions across those exact names.

Data Generator

What’s the difference between “Constant fields” and “Varying fields”? +

Constant fields are identical across every generated record — tenant ID, region code, account code. They give the dataset a shared context without wasting uniqueness budget on fields that don’t need to vary.

Varying fields must be unique per record — names, IDs, emails, amounts, dates. The type dropdown (Email, UUID, Decimal, Date, etc.) injects a format constraint into the prompt, telling the model exactly what shape each unique value should take.

Required

At least one Varying field is required. A dataset where every field is constant has no per-record uniqueness and is useless for most test scenarios.

How do I pick the right data format: CSV, JSON, SQL, XML, or Faker.js? +

Format	Use when
CSV / TSV	Importing into a tool, spreadsheet, or any system with a flat import feature.
JSON	API fixtures, frontend mock data, or any consumer that reads JSON.
SQL INSERT	Seeding a relational database. Set the dialect so escaping and date/boolean literals are correct.
XML	Structured feeds, EDI documents, or any legacy system that consumes XML.
Faker.js factory (TS)	When you need repeatable, version-controlled data generation in a TypeScript project.

Volume note

The model returns records inline — keep Quantity at or below 50 for reliable output. For larger volumes, generate a Faker.js factory or SQL script and run it yourself.

What does “Edge-case records” actually add beyond normal records? +

Toggling Edge-case records instructs the model to include deliberately invalid and boundary records, clearly flagged as such, alongside the normal records. Specifically it adds:

Boundary values per constrained field: MIN, MIN+1, MAX−1, MAX, and a mid-range value
Empty string and null variants for nullable fields
Max-length + 1 overflow strings (expect rejection)
Unicode / emoji / right-to-left (RTL) characters in string fields

These records are useful for negative testing — loading the dataset and verifying the system rejects or handles them gracefully. Combine with Scenario mapping if you want each edge-case record linked to a specific test case ID.