The tables below give an overview of each analysis pipeline and guidance on when and how to use it.
Overview
| | Thematic Analysis | Transcript Analysis | Key Term Extraction | Codeframe Classification |
|---|---|---|---|---|
| Purpose | Discover the themes and topics hidden in open-ended responses | Analyse long-form content such as interview transcripts, focus-group discussions, and multi-paragraph feedback | Extract and count the specific items (brands, products, features) that respondents mention | Classify responses against a predefined set of codes/categories provided by the user |
| Best for | Long, descriptive responses – sentences and paragraphs | Very long responses – multi-sentence and multi-paragraph (interviews, transcripts, focus groups) | Short responses – single words, brand names, product mentions | Any response length – when you already know the categories you want to code against |
| What it produces | A two-level hierarchy of parent themes and sub-themes | A two-level hierarchy of parent themes and sub-themes | A frequency-ranked list of normalised entities | A flat frequency distribution of responses across your predefined codes |
| How it classifies | Groups responses by meaning – responses about similar topics end up in the same theme | Splits each response into overlapping sentence windows, clusters the chunks by meaning, then aggregates back to document level | Reads each response and pulls out every named item mentioned | Matches each response to the most relevant code(s) from your uploaded codeframe |
| Multi-label support | Yes – primary theme plus secondary codes | Yes – primary theme plus secondary codes | Yes – multiple entities per response | Yes – multi-label is the default; most responses get multiple codes |
| Who defines the categories? | The system discovers them from the data | The system discovers them from the data (same as Thematic but optimised for long text) | The data itself – entities are extracted as-is | You do – you upload a codebook of themes and descriptions |
| Hierarchy | Yes – parent themes and sub-themes | Yes – parent themes and sub-themes | Flat list | Flat list (no sub-themes) |
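
To make the "overlapping sentence windows" step of Transcript Analysis more concrete, here is a minimal sketch of the chunking stage only. The function name, the window size, the stride, and the regex-based sentence splitter are all illustrative assumptions and are not taken from the product; the clustering and document-level aggregation steps are not shown.

```python
import re


def sentence_windows(text, size=3, stride=2):
    """Split text into overlapping windows of `size` sentences.

    `size` and `stride` are hypothetical defaults chosen for illustration;
    a stride smaller than the size is what makes the windows overlap.
    """
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # a real pipeline would likely use a proper sentence tokenizer).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    if len(sentences) <= size:
        return [" ".join(sentences)]

    windows = []
    for start in range(0, len(sentences), stride):
        windows.append(" ".join(sentences[start:start + size]))
        # Stop once the current window has reached the final sentence.
        if start + size >= len(sentences):
            break
    return windows


chunks = sentence_windows("A one. B two. C three. D four. E five.")
```

With five sentences, a window of three and a stride of two, the sketch yields two chunks that share the middle sentence, so context that straddles a chunk boundary appears in both.
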
Scenario Comparisons
| Scenario | Thematic Analysis | Transcript Analysis | Key Term Extraction | Codeframe Classification |
|---|---|---|---|---|
| Responses are sentences or paragraphs | Best | Best | | Good |
| Responses are single words or short phrases | | | Best | Good |
| You want to understand the topics people are talking about | Best | | Not designed for this | Partial – only finds topics in your codeframe |
| You want to count how many times each brand/product was mentioned | Not designed for this | Not designed for this | Best | Possible if your codeframe lists the brands |
| You want a parent/sub-theme hierarchy | Yes | Yes | No – flat structure only | No – flat structure only |
| You already have a codebook and need responses coded against it | Not designed for this | Not designed for this | Not designed for this | Best |
| You need consistent, repeatable coding categories across waves | Themes may vary slightly between runs | Themes may vary slightly between runs | Good – entities come from the data | Best – categories are locked to your codeframe |
| Grouped columns (e.g. brand_1, brand_2, brand_3) | Yes – with mention rank filtering | Yes – with mention rank filtering | Yes – with mention rank filtering | Yes – with mention rank filtering |
| Fewer than 50 responses | Not enough data to find reliable patterns | Not enough data to find reliable patterns | Works well – even small datasets produce useful frequency counts | Works well – predefined codes don’t need large samples to apply |
| You need to compare results across waves or markets | Good – but themes may vary slightly between runs | Good – but themes may vary slightly between runs | Good – entities are consistent because they come from the data itself | Best – same codeframe guarantees identical categories every time |
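
As a rough illustration of how grouped columns and mention rank filtering could feed a frequency-ranked list, the sketch below counts entities across columns such as brand_1, brand_2 and brand_3. It assumes that the mention rank is the numeric suffix of the column name and that normalisation is simple trimming and lower-casing; both are assumptions for illustration, and `count_mentions`, `prefix` and `max_rank` are hypothetical names, not part of the product.

```python
from collections import Counter


def count_mentions(rows, prefix="brand_", max_rank=None):
    """Count normalised entities across grouped columns.

    Assumes the mention rank is the integer after `prefix`
    (brand_1 = first mention). `max_rank=None` keeps every rank.
    """
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if not column.startswith(prefix) or not value:
                continue
            rank = int(column[len(prefix):])
            if max_rank is not None and rank > max_rank:
                continue
            # Stand-in normalisation: trim whitespace and lower-case.
            entity = value.strip().lower()
            if entity:
                counts[entity] += 1
    # Most frequent entities first.
    return counts.most_common()


rows = [
    {"brand_1": "Acme", "brand_2": "Globex"},
    {"brand_1": "acme ", "brand_2": ""},
    {"brand_1": "Globex", "brand_2": "Acme", "brand_3": "Initech"},
]
all_mentions = count_mentions(rows)
first_mentions = count_mentions(rows, max_rank=1)
```

Restricting to `max_rank=1` keeps only first mentions, which is one plausible reading of rank filtering when the order of mention matters.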
