When to use each pipeline

The tables below give an overview of the four analysis pipelines and guidance on when to use each one.

Overview

	Thematic Analysis	Transcript Analysis	Key Term Extraction	Codeframe Classification
Purpose	Discover the themes and topics hidden in open-ended responses	Analyse long-form content such as interview transcripts, focus-group discussions, and multi-paragraph feedback	Extract and count the specific items (brands, products, features) that respondents mention	Classify responses against a predefined set of codes/categories provided by the user
Best for	Long, descriptive responses – sentences and paragraphs	Very long responses – multi-sentence and multi-paragraph (interviews, transcripts, focus groups)	Short responses – single words, brand names, product mentions	Any response length – when you already know the categories you want to code against
What it produces	A two-level hierarchy of parent themes and sub-themes	A two-level hierarchy of parent themes and sub-themes	A frequency-ranked list of normalised entities	A flat frequency distribution of responses across your predefined codes
How it classifies	Groups responses by meaning – responses about similar topics end up in the same theme	Splits each response into overlapping sentence windows, clusters the chunks by meaning, then aggregates back to document level	Reads each response and pulls out every named item mentioned	Matches each response to the most relevant code(s) from your uploaded codeframe
Multi-label support	Yes – primary theme plus secondary codes	Yes – primary theme plus secondary codes	Yes – multiple entities per response	Yes – multi-label is the default; most responses get multiple codes
Who defines the categories?	The system discovers them from the data	The system discovers them from the data (same as Thematic Analysis, but optimised for long text)	The data itself – entities are extracted as-is	You do – you upload a codeframe (codebook) of themes and descriptions
Hierarchy	Yes – parent themes and sub-themes	Yes – parent themes and sub-themes	Flat list	Flat list (no sub-themes)
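
The "How it classifies" row says that Transcript Analysis splits each response into overlapping sentence windows before clustering. The following is a minimal sketch of that chunking idea only, not the product's implementation. The function name `sentence_windows`, the regex sentence splitter, and the `size` and `stride` values are all assumptions chosen for illustration.

```python
import re


def sentence_windows(text, size=3, stride=2):
    """Return overlapping windows of `size` sentences, advancing `stride` sentences per step.

    `size` and `stride` are hypothetical defaults; windows overlap when stride < size.
    """
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be positive")

    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    if not sentences:
        return []
    if len(sentences) <= size:
        return [" ".join(sentences)]

    windows = []
    for start in range(0, len(sentences), stride):
        windows.append(" ".join(sentences[start:start + size]))
        # Stop once the current window already reaches the last sentence.
        if start + size >= len(sentences):
            break
    return windows


chunks = sentence_windows("A one. B two. C three. D four. E five.")
```

With five sentences, a window of three, and a stride of two, the middle sentence appears in both chunks, which is what lets meaning carry across chunk boundaries when the chunks are later aggregated back to the document level.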

Scenario Comparisons

Scenario	Thematic Analysis	Transcript Analysis	Key Term Extraction	Codeframe Classification
Responses are sentences or paragraphs	Best	Best		Good
Responses are single words or short phrases			Best	Good
You want to understand the topics people are talking about	Best		Not designed for this	Partial – only finds topics in your codeframe
You want to count how many times each brand/product was mentioned	Not designed for this	Not designed for this	Best	Possible if your codeframe lists the brands
You want a parent/sub-theme hierarchy	Yes	Yes	No – flat list only	No – flat structure only
You already have a codeframe (codebook) and need responses coded against it	Not designed for this	Not designed for this	Not designed for this	Best
You need consistent, repeatable coding categories across waves	Themes may vary slightly between runs	Themes may vary slightly between runs	Good – entities come from the data	Best – categories are locked to your codeframe
Grouped columns (e.g. brand_1, brand_2, brand_3)	Yes – with mention rank filtering	Yes – with mention rank filtering	Yes – with mention rank filtering	Yes – with mention rank filtering
Fewer than 50 responses	Not enough data to find reliable patterns	Not enough data to find reliable patterns	Works well – even small datasets produce useful frequency counts	Works well – predefined codes can be applied to samples of any size
You need to compare results across waves or markets	Good – but themes may vary slightly between runs	Good – but themes may vary slightly between runs	Good – entities are consistent because they come from the data itself	Best – same codeframe guarantees identical categories every time
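
The scenario table can be read as a short decision procedure. The sketch below encodes one possible reading of it. The function `recommend_pipeline`, its parameters, the `"short"`/`"long"`/`"transcript"` length labels, and the order in which the checks run are assumptions made for illustration; only the pipeline names, the 50-response threshold, and the caveat wording come from the tables above.

```python
from typing import NamedTuple, Optional


class Recommendation(NamedTuple):
    pipeline: str
    caveat: Optional[str] = None


# Threshold taken from the "Fewer than 50 responses" row.
MIN_RESPONSES_FOR_THEMES = 50


def recommend_pipeline(has_codeframe, wants_item_counts, response_length, n_responses):
    """Suggest a pipeline from the scenario table (hypothetical helper).

    response_length is one of "short", "long", "transcript" (labels assumed here).
    """
    if response_length not in ("short", "long", "transcript"):
        raise ValueError("response_length must be 'short', 'long' or 'transcript'")

    # An existing codeframe is the strongest signal: categories are already fixed.
    if has_codeframe:
        return Recommendation("Codeframe Classification")

    # Counting brand/product mentions, or single-word answers, points to extraction.
    if wants_item_counts or response_length == "short":
        return Recommendation("Key Term Extraction")

    # Otherwise the goal is theme discovery; choose by response length.
    pipeline = "Transcript Analysis" if response_length == "transcript" else "Thematic Analysis"
    caveat = None
    if n_responses < MIN_RESPONSES_FOR_THEMES:
        caveat = "Not enough data to find reliable patterns"
    return Recommendation(pipeline, caveat)


result = recommend_pipeline(
    has_codeframe=False,
    wants_item_counts=False,
    response_length="transcript",
    n_responses=20,
)
```

Here `result` names Transcript Analysis but carries the small-sample caveat, mirroring the "Fewer than 50 responses" row. Checking for a codeframe first reflects the table's point that locked categories give the most consistent results across waves and markets.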