At a glance
- A Likert item is one statement; a Likert scale is a summed composite of multiple items. Most teams blur the two, and that single distinction drives most bad survey analysis. [Carifio & Perla; Trochim]
- 5 points is the operational default; 7 points is the research default. Reliability gains flatten after about 5–7 categories. Sliders and 11-point scales usually do not improve measurement. [Krosnick & Presser, 2010; Revilla, Saris & Krosnick, 2014]
- Cross-cultural distortion is real and large. In the illustrative benchmarks summarised later in this report, Japanese respondents are roughly 2.4× more likely than U.S. respondents to choose the neutral midpoint on matched questions, so raw cross-country mean comparisons are partly artifactual. [Chen, Lee & Stevenson 1995; Harzing 2006; Ulitzsch et al. 2024]
- Single Likert items are ordinal; well-behaved multi-item scales are often treated as interval-like in practice. Norman (2010) showed parametric tests are usually robust for summed scales with reasonably symmetric distributions and adequate sample sizes. [Norman 2010; Carifio & Perla]
- Sliders and VAS look modern but underperform. Funke (2016) found slider scales increase missing data, slow completion, and disproportionately hurt mobile response; a separate comparison reported an odds ratio of 6.9 for break-off. [Funke 2016; Funke, Reips & Thomas 2011; Couper et al. 2006]
- Mobile design changes the measurement. Weijters et al. (2021) found vertical Likert layouts can raise extreme responding vs horizontal. Apple HIG (44×44 pt) and Material Design (48×48 dp) define the minimum hit target — smaller buttons inflate error rates. [Weijters et al. 2021; Apple HIG; Material Design]
What is a Likert scale?
A Likert scale is an ordered response format used to capture degree rather than a binary yes/no, typically with 4, 5, or 7 points spanning from one direction (disagreement, dissatisfaction, never) to its opposite (agreement, satisfaction, always). It is still the default response format for measuring attitudes, satisfaction, agreement, frequency, and perceived importance in surveys. [Likert, 1932]
The short answer to "what is a Likert scale?" is: an ordered response continuum, typically 4–7 points. The rigorous answer is more useful in practice: a Likert scale is a set of related item responses that can be summed or modelled to estimate a latent construct. The modern literature still treats Likert as central to self-report measurement, but it is much more cautious now about response styles, midpoint usage, cross-cultural comparability, and mobile implementation than the older "just use five points" playbook. [Clark & Watson review; Krosnick & Presser, 2010]
Likert item vs Likert scale
This is the distinction most commercial content gets wrong:
| Term | Definition | Example |
|---|---|---|
| Likert item | One statement, answered on an ordered continuum. | "The onboarding flow was easy to follow." (Strongly disagree → Strongly agree) |
| Likert scale | A composite score built from multiple related items measuring one construct. | 3–8 statements about usability, summed after checking they psychometrically hang together. |
If you only have one item, report it as an item. If you have a scale, test reliability and dimensionality before pretending you have a stable construct. Trochim's survey-methods guidance and Carifio & Perla's psychometric clarification both stress that everyday usage blurs the two, which is one reason so much bad analysis keeps getting published. [Trochim; Carifio & Perla; Tavakol & Dennick, 2011]
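The item-versus-scale distinction can be sketched in a few lines of Python. The item names and responses below are hypothetical, and the 1–5 coding is an assumption; the only point is that an item is a single column of answers, while a scale score is one composite per respondent across related items.

```python
# Hypothetical responses to three usability items, coded 1-5
# (1 = Strongly disagree, 5 = Strongly agree). Names and data are illustrative.
responses = [
    {"easy_to_follow": 4, "clear_labels": 5, "few_errors": 4},
    {"easy_to_follow": 2, "clear_labels": 3, "few_errors": 2},
    {"easy_to_follow": 5, "clear_labels": 4, "few_errors": 5},
]

ITEMS = ["easy_to_follow", "clear_labels", "few_errors"]


def item_values(rows, item):
    """Return the raw responses to a single Likert item."""
    return [row[item] for row in rows]


def scale_scores(rows, items):
    """Return one summed composite score per respondent."""
    return [sum(row[i] for i in items) for row in rows]


single_item = item_values(responses, "easy_to_follow")  # report this as an item
composite = scale_scores(responses, ITEMS)               # report this as a scale
print(single_item, composite)
```

Summing only makes sense once the items have been checked for reliability and dimensionality; the sketch skips that step on purpose.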
Origin and modern definition
Rensis Likert's 1932 paper, A Technique for the Measurement of Attitudes, did not simply invent a row of boxes from "strongly disagree" to "strongly agree". The original method involved generating statements, trialling them, retaining items that discriminated between more and less favourable respondents, assigning ordered weights to the response categories, and then summing item scores into a composite attitude measure. That original logic is far closer to modern scale development than most blog posts suggest. [Likert, 1932]
Modern psychometrics still leans on the same idea: one item is noisy; a properly constructed scale is stronger. Carifio & Perla's intervention in this debate was to remind researchers that much of the "Likert data are ordinal, therefore means are forbidden" argument collapses once people confuse a single Likert-type item with a multi-item scale score. Norman's 2010 paper went further: parametric statistics are often robust even when the data arise from ordinal categories, especially for summed scales and reasonably symmetric distributions. [Carifio & Perla; Norman, 2010]
Krosnick and Presser's survey-design review adds an important practical update: the classic agree–disagree format is not always the best way to measure attitudes. They argue agree–disagree items invite acquiescence and that item-specific alternatives often perform better. In blunt terms, many teams still use "strongly disagree → strongly agree" because it is easy to template, not because it is always best measurement. [Krosnick & Presser, 2010]
The practical rule for a 5 point Likert scale
Use a 5 point Likert scale when you want fast completion, easy interpretation, broad public audiences, repeated pulse tracking, or mobile-heavy traffic. This format is the commercial default for a reason: it is familiar, compact, and robust enough for most CX, EX, and product-feedback workflows. Public examples from Qualtrics, SurveyMonkey, Gallup's Q12 ecosystem, and many healthcare instruments all reinforce how dominant five-point formats remain. [Qualtrics CSAT; SurveyMonkey; Gallup Q12]
A good five-point sequence should be fully directional and plainly worded:
- Agreement: Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree
- Satisfaction: Very dissatisfied / Dissatisfied / Neither / Satisfied / Very satisfied
- Frequency: Never / Rarely / Sometimes / Often / Always
- Importance: Not at all important / Slightly / Moderately / Very / Extremely important
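For analysis, anchors like these are usually stored as ordered integer codes. A minimal sketch, assuming a 1-based coding in which a higher code means more of the construct; the helper names `encode` and `decode` are illustrative, not part of any survey platform's API.

```python
# Ordered anchors for a five-point agreement item (lowest to highest).
AGREEMENT_5 = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]


def encode(label, anchors):
    """Map an anchor label to its 1-based position on the scale."""
    return anchors.index(label) + 1


def decode(code, anchors):
    """Map a 1-based code back to its anchor label."""
    return anchors[code - 1]


codes = [encode(label, AGREEMENT_5) for label in ["Agree", "Strongly disagree", "Agree"]]
print(codes)
```

Keeping the anchor list as the single source of order avoids the common mistake of sorting labels alphabetically before coding them.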
The practical rule for a 7 point Likert scale
Use a 7 point Likert scale when you need slightly finer discrimination, you expect engaged respondents, or you are building a higher-stakes research instrument rather than a quick operational pulse. Reviews of scale length suggest gains in reliability and validity tend to flatten after around 5–7 categories, not keep climbing forever. That is why seven is often the "research" choice and five is often the "operations" choice. [Krosnick & Presser, 2010; Revilla et al.]
The catch is mode. Seven verbal options can work well on screen but are less elegant in voice, phone, or cramped mobile layouts. If your audience is tired, rushed, or low-attention, the theoretical benefit of more categories can disappear in practice. [Krosnick & Presser, 2010; Funke 2016; Weijters et al. 2021]
Odd vs even Likert scales
The "odd vs even" debate is really a fight about the midpoint. Garland's classic comparison showed that removing the midpoint changes distributions; it does not simply reveal respondents' "true" view. Johns reached a similar conclusion on British Election Study items: some respondents use the midpoint because it fits their actual position, while others use it as a low-effort escape hatch. Sturgis and colleagues later argued that many midpoint answers function as a face-saving "I don't know", not genuine neutrality. [Garland 1991; Johns 2005; Bishop; Sturgis et al.]
The decision rule
- Include a midpoint when neutrality is substantively meaningful.
- Exclude the midpoint when you are deliberately forcing direction on trade-offs or prioritisation.
- Add a separate "Not applicable" or "Don't know" when uncertainty is common — otherwise that uncertainty leaks into the midpoint and distorts your distribution.
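The third rule has a direct analysis consequence: "Don't know" answers should be reported as their own share, not folded into the midpoint. A rough sketch, assuming "Don't know" and "Not applicable" are stored as `None` and the scale is five-point with midpoint 3; the data are made up for illustration.

```python
from statistics import mean

DONT_KNOW = None  # assumption: "Don't know" / "Not applicable" are stored as None


def summarize(values, midpoint=3):
    """Report don't-know share separately from midpoint share."""
    answered = [v for v in values if v is not DONT_KNOW]
    n_total = len(values)
    return {
        "n_total": n_total,
        "n_answered": len(answered),
        "dont_know_share": (n_total - len(answered)) / n_total,
        # Midpoint share and mean are computed over substantive answers only.
        "midpoint_share": sum(v == midpoint for v in answered) / len(answered),
        "mean_answered": mean(answered),
    }


values = [3, None, 4, 3, None, 5, 2, 3]  # hypothetical responses
summary = summarize(values)
print(summary)
```

If the same respondents had been forced onto the midpoint instead, the midpoint share would rise and the distribution would look more neutral than it is.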
Unipolar vs bipolar scales
Bipolar scales are appropriate when the construct genuinely runs from one evaluative pole to another: negative → positive, oppose → support, dissatisfied → satisfied. Unipolar scales are appropriate when the construct runs from absence to presence: not at all important → extremely important, never → always, no confidence → complete confidence. Match polarity to the construct rather than defaulting to agree–disagree out of convenience. [Krosnick & Presser; Höhne et al.; Menold guidance]
Sliders, VAS, and modern alternatives
The strongest recurring finding here is simple: sliders are usually overused. Couper and colleagues found VAS responses were distributed similarly to radio-button formats but produced more missing data and longer completion times. Funke's later work is sharper on sliders specifically: slider scales negatively affected response rate, were especially bad on mobile, altered sample composition, and increased response times. Another Funke paper reported substantially higher break-off with sliders, with an odds ratio of 6.9 in one comparison. [Couper et al. 2006; Funke 2016; Funke, Reips & Thomas 2011]
The commercial takeaway is blunt: unless you can justify a visual analogue scale for a specific measurement reason, radio buttons beat sliders for clarity, speed, accessibility, and mobile execution. Legacy survey builders keep shipping sliders because they look modern. The evidence keeps saying "careful." [Couper et al. 2006; Funke 2016]
Comparison table for the main formats
[Figure: Scale formats, ease vs granularity (qualitative). Both axes run 1–10; higher ease means faster mobile completion, and higher granularity means more measurement discrimination. Synthesised from Krosnick & Presser 2010, Revilla et al. 2014, Couper et al. 2006, Funke 2016.]
| Format | Best use | Main upside | Main risk |
|---|---|---|---|
| 4-point forced choice | Directional judgement, no true neutral expected | Pushes differentiation | Misclassifies true neutrals |
| 5 point Likert scale | Operational surveys, CX, EX, mobile | Fast, familiar, interpretable | Less granularity than 7 |
| 7 point Likert scale | Research instruments, engaged audiences | Finer discrimination | More cognitive load, trickier on mobile/voice |
| 9-point or 11-point | Specialised scales, NPS-like formats | More spread in theory | Diminishing returns, lower usability |
| VAS / slider | Niche continuous measurement | Perceived precision | More missing data, slower, more break-off |
Source note: Krosnick & Presser's review, Revilla et al. on category count, slider/VAS studies by Couper, Funke, and Roster.
Likert scale analysis and the ordinal vs interval debate
The ordinal vs interval argument is the most over-reheated fight in survey analysis. Stevens' 1946 scaling framework is the anchor for the strict view: ordered categories are ordinal, so arithmetic operations and parametric inference are theoretically constrained. Knapp challenged the practice of casually upgrading ordinal scales to interval scales. [Stevens 1946; Knapp 1990]
But in applied work, the stricter position is no longer the whole story. Norman's 2010 review argued that parametric methods are often robust with Likert-based data, particularly when working with summed scale scores, reasonably symmetric distributions, and adequately sized samples. Sullivan and Artino's practical review made the same point more cautiously: single items deserve more restraint, while multi-item scales often justify means, t-tests, ANOVA, or regression under common real-world conditions. [Norman 2010; Sullivan & Artino 2013]
Is Likert scale ordinal or interval?
A single Likert item is ordinal; a well-behaved multi-item scale score is often treated as interval-like in practice. That is the position closest to the current methodological centre of gravity. It does not mean "parametrics are always fine." It means the decision depends on the item-vs-scale distinction, distribution shape, sample size, and the stakes of the inference. [Carifio & Perla; Norman 2010]
How to analyse Likert scale data
For a single Likert item: start with frequencies, percentages, medians, and ordered models where appropriate. Use non-parametric tests when assumptions are weak: Mann–Whitney, Wilcoxon, Kruskal–Wallis, ordinal logistic regression, and proportional-odds models all belong in the toolkit.
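As a sketch of that single-item toolkit, the snippet below builds a frequency table, takes medians, and computes the Mann–Whitney U statistic with midranks for ties, using only the standard library. The two groups are hypothetical, and the code returns the statistic only, not a p-value; in practice a statistics package (for example `scipy.stats.mannwhitneyu`) would normally handle the test.

```python
from collections import Counter
from statistics import median


def frequency_table(values, points=5):
    """Count and percentage for each category 1..points."""
    counts = Counter(values)
    n = len(values)
    return {k: (counts.get(k, 0), 100 * counts.get(k, 0) / n)
            for k in range(1, points + 1)}


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x, using midranks for ties."""
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy 1-based ranks i+1 .. j; assign their average.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_x = sum(ranks[v] for v in x)
    return rank_sum_x - len(x) * (len(x) + 1) / 2


group_a = [4, 5, 4, 3, 5]  # hypothetical five-point responses
group_b = [2, 3, 3, 1, 4]

print(frequency_table(group_a))
print(median(group_a), median(group_b))
print(mann_whitney_u(group_a, group_b))
```

The two U values (one per group) always sum to the product of the group sizes, which is a quick sanity check on any hand-rolled implementation.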
For a Likert scale built from multiple coherent items: first test reliability and dimensionality. Then means, standard deviations, t-tests, ANOVA, linear regression, and factor models are often defensible. If the data are badly skewed, sample sizes are small, or response styles are likely driving results, check robustness with non-parametric or ordinal models as a sensitivity analysis. Good analysts do not treat "ordinal" or "interval" like religious identities — they triangulate. [Norman 2010; Tavakol & Dennick]
Reliability and Cronbach's alpha
Cronbach's alpha is still the workhorse reliability index for multi-item Likert scales, but it remains a rough indicator rather than a certificate of truth. Very low alpha suggests items are not moving together; very high alpha can indicate redundancy rather than quality. The rule of thumb that around .70 is acceptable for early-stage or group-level work remains common, but should not be used mechanically. If alpha is poor, do not simply delete the "annoying" item — check dimensionality, wording asymmetry, reversed-item problems, and response-style contamination first. [Tavakol & Dennick 2011; Swain, Weathers & Niedrich 2008]
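For reference, alpha for k items is k/(k−1) × (1 − sum of item variances / variance of the summed score). A minimal standard-library sketch follows; the function name and the response matrix are illustrative, and sample variances (n−1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha.

    rows: list of respondents, each a list of k item scores in the same order.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5 respondents x 3 items, coded 1-5.
data = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 4, 5],
    [3, 3, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

Five respondents is far too few for a real reliability estimate; the tiny matrix is only there to keep the arithmetic easy to follow. Reverse-keyed items must be rescored before alpha is computed, otherwise the coefficient is artificially depressed.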
Sample size for stable Likert results
There is no single magic sample size. Hertzog's work warns against overconfidence in tiny samples, especially when estimating reliability or refining scales. Modern questionnaire-validation guidance usually converges on pragmatic ranges: at least 100–200 respondents for basic factor/reliability work is common, and 5–10 respondents per item is still a widely used planning heuristic. For operational Likert scale survey work, the stronger rule is: collect enough responses to estimate your decision metric with a tolerable margin of error. [Hertzog 2008]
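When the decision metric is a proportion (for example, the share of respondents choosing "Satisfied" or "Very satisfied"), the margin-of-error rule can be made concrete with the usual normal approximation, n = z² p(1−p) / e². The sketch below assumes simple random sampling and uses the conservative p = 0.5 by default; the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence=0.95):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_proportion(margin, p=0.5, confidence=0.95):
    """Responses needed to estimate a proportion within +/- margin."""
    z = z_value(confidence)
    return ceil(z * z * p * (1 - p) / (margin * margin))


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the confidence interval for a proportion at sample size n."""
    return z_value(confidence) * sqrt(p * (1 - p) / n)


print(sample_size_for_proportion(0.05))
print(round(margin_of_error(385), 4))
```

This covers a single topline proportion only. Subgroup comparisons, clustered samples, and scale-validation work need their own calculations, which is why the per-item heuristics above still get used for planning.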
Likert scale examples, questions, and templates
The best Likert scale questions are specific, singly focused, and paired with response options that match the construct. Below are ready-to-use templates you can lift into a live form.
| Use case | Question wording | Format | Anchors |
|---|---|---|---|
| Customer satisfaction | Overall, how satisfied were you with your support experience today? | 5-point satisfaction | Very dissatisfied → Very satisfied |
| Employee engagement | I have the tools and resources I need to do my job well. | 5-point agreement | Strongly disagree → Strongly agree |
| Patient experience | During this visit, staff explained things in a way I could understand. | 4-point forced frequency | Never → Sometimes → Usually → Always |
| Course evaluation | The course activities helped me understand the material in depth. | 7-point agreement | Strongly disagree → Strongly agree |
| Product usability | I felt confident using this product. | 5-point SUS-style agreement | Strongly disagree → Strongly agree |
| Brand perception | This brand feels… | 7-point semantic differential | Untrustworthy ↔ Trustworthy |
| Training effectiveness | After this training, I can apply the material in my work immediately. | 5-point agreement | Strongly disagree → Strongly agree |
| Feature prioritisation | How important is offline access in your workflow? | 5-point unipolar importance | Not at all important → Extremely important |
| Habit / frequency | How often do you use the dashboard to check progress? | 5-point frequency | Never → Always |
| Variance-friendly | The new reporting view reduces the time I need to complete weekly reporting. | 7-point agreement | Strongly disagree → Strongly agree |
Source note: these patterns are aligned with public templates and guidance from Qualtrics, SurveyMonkey, AHRQ/CAHPS, PROMIS, Gallup-style engagement practice, and Nielsen Norman Group.
Design rules that matter more than fancy phrasing
- Avoid double-barrelled prompts like "The interface is clear and fast."
- Avoid backwards wording unless you have a strong psychometric reason.
- Prefer item-specific formats over bland agree–disagree when the real construct is satisfaction, confidence, ease, likelihood, or frequency.
- If you want more response spread, ask about concrete consequences rather than generic positivity. "This dashboard helps me finish my weekly report faster" discriminates better than "I like this dashboard."
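If reversed wording is used despite the caution above, the negatively keyed items have to be rescored before any summing or reliability check. A minimal sketch, assuming integer codes running from `low` to `high`; the item names are hypothetical.

```python
def reverse_score(value, low=1, high=5):
    """Flip a response so that high always means more of the construct."""
    if not low <= value <= high:
        raise ValueError("response outside the scale range")
    return low + high - value


# Hypothetical respondent: the second item is negatively worded.
answers = {"easy_to_use": 4, "often_confusing": 2}
REVERSED = {"often_confusing"}

rescored = {
    item: reverse_score(v) if item in REVERSED else v
    for item, v in answers.items()
}
print(rescored)
```

Rescoring fixes the arithmetic but not the measurement problem: respondents who misread the reversed item still produce noise, which is one reason to use such items sparingly.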
If you need a Likert scale template in product terms rather than methodology terms, the simplest modern pattern is one matrix-free item per screen on mobile, one construct per page, 5–7 items per construct, and one open text follow-up after the closed ratings. You can create your own Likert scale survey for free with SpaceForms when you want the structured scale plus a voice or chat follow-up in the same flow.
Biases, benchmarks, and cross-cultural effects
Likert scales are easy to answer, which is exactly why their biases are so persistent.
Central tendency bias
Midpoint and middle-category responding are real response styles, not just reflections of mild opinion. Bishop showed that explicitly offering a middle alternative raises the rate at which people choose it. Sturgis, Roberts, and Smith concluded that many midpoint responses are better interpreted as a face-saving way of saying "I don't know", with only a minority of initial midpoint selections reflecting true neutrality. [Bishop; Sturgis et al.; Ames & Myers]
Acquiescence bias and yea-saying
Krosnick's satisficing framework explains why agree–disagree formats are risky: when respondents are tired, uncertain, rushed, or low-motivation, they may choose the first apparently reasonable answer, agree with assertions, or avoid difficult mapping work. Krosnick and Presser therefore recommend avoiding agree–disagree, true/false, and yes/no formats when item-specific response alternatives are feasible. Cross-national work by Baumgartner and Steenkamp, Tellis et al., and Harzing shows that acquiescence and related styles also vary systematically by country. [Krosnick 1991; Krosnick & Presser; Harzing]
Social desirability bias
Edwards' classic framing of social desirability is still conceptually alive, but the 2020s literature is sharper about separating stable self-presentation tendencies from context-specific responding. Any self-report Likert scale survey about ethics, health, compliance, identity, or workplace image can be distorted by this pressure. [Edwards 1957]
Cross-cultural response styles
The safest high-confidence claim is not "country X is always more extreme than country Y." It is that response-scale usage differs across countries enough to distort naive mean comparisons.
Chen, Lee, and Stevenson found that Japanese and Chinese students were more likely than North American groups to use the midpoint on 7-point scales, while U.S. respondents were more likely to use extreme values. Harzing's 26-country study later showed major differences in response styles across countries, including higher extreme and acquiescent responding in some Spanish-speaking contexts and less extreme responding among Japanese and Chinese respondents. Van Herk, Poortinga, and Verhallen found more acquiescence and extreme response styles in Mediterranean than northwestern European countries. Ulitzsch et al. (2024) add the current warning: if countries use the same response options differently, raw mean comparisons can become partly artifactual. [Chen et al. 1995; Harzing 2006; van Herk et al. 2004; Ulitzsch et al. 2024]
[Figure: Midpoint usage rate by country (illustrative). Percentage of respondents selecting the neutral midpoint on matched questions; Japanese respondents are ~2.4× more likely than U.S. respondents to choose neutral. Sources: MeasuringU cross-country UX study; Mili (2024) multi-country experiments. These are illustrative benchmarks, not one harmonised academic panel.]
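Before comparing means across groups, it is cheap to look at how each group uses the scale. The sketch below computes two simple descriptive indicators, the share of midpoint answers and the share of endpoint answers, on an odd-length scale. The groups and their responses are hypothetical and do not represent any country; model-based response-style corrections go well beyond this.

```python
def style_shares(values, points=7):
    """Share of midpoint and extreme (endpoint) answers on an odd-length scale."""
    if points % 2 == 0:
        raise ValueError("a midpoint is only defined for odd-length scales")
    mid = (points + 1) // 2
    n = len(values)
    return {
        "midpoint": sum(v == mid for v in values) / n,
        "extreme": sum(v in (1, points) for v in values) / n,
    }


group_a = [4, 4, 3, 5, 4, 2, 4, 6]  # hypothetical 7-point responses
group_b = [1, 7, 6, 4, 7, 2, 1, 5]

shares_a = style_shares(group_a)
shares_b = style_shares(group_b)
print(shares_a, shares_b)
# Ratio of midpoint usage between the two groups.
print(shares_a["midpoint"] / shares_b["midpoint"])
```

Large gaps in either share are a signal that a raw difference in means may reflect scale usage as much as attitude.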
Likert vs alternatives
Likert is not always the right tool.
Likert vs semantic differential
The semantic differential, associated with Osgood and colleagues, uses bipolar adjective pairs such as trustworthy ↔ untrustworthy or modern ↔ outdated. It is especially good for brand personality, emotional tone, and connotative meaning. Head-to-head work suggests semantic differential can reduce some acquiescence problems that plague agreement items, though it is not universally superior. [Osgood et al.; NNGroup]
Likert vs NPS, stars, and MaxDiff
NPS is effectively a specialised 0–10 recommendation rating, not a replacement for all Likert usage. Reichheld's original "one number" claim made it influential, but later research has disputed whether it is always the best predictor of growth. Use NPS when referral likelihood is genuinely the KPI; use Likert when you need diagnostic attitudes or experiences. [Reichheld 2003]
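For comparison with Likert scoring, the standard NPS calculation is the percentage of promoters (ratings 9–10) minus the percentage of detractors (ratings 0–6), with 7–8 counted as passives. A minimal sketch with made-up ratings:

```python
def net_promoter_score(ratings):
    """NPS on a 0-10 recommendation item: % promoters minus % detractors."""
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)


ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]  # hypothetical responses
nps = net_promoter_score(ratings)
print(nps)
```

The score collapses eleven categories into three buckets, which is why it works as a tracking KPI but carries little diagnostic detail on its own.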
Star ratings are fast and familiar, but they compress meaning. A 4-star rating can mean "good but improvable", "acceptable", or "I never give 5s". That ambiguity is fine for lightweight consumer feedback and weak for diagnostic research. Likert questions usually win when you need interpretable dimensions.
MaxDiff or best–worst scaling is the smarter alternative when you need discrimination among many attributes rather than independent ratings of each one. A 2022 comparison of best–worst scaling and Likert in market research explicitly frames Likert as dominant but limited, while applied comparisons show best–worst approaches can deliver stronger differentiation. Use MaxDiff when everything is coming back "important". [BWS vs Likert 2022; MaxDiff guidance]
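The simplest way to score best–worst data is a counting analysis: for each attribute, (times chosen best − times chosen worst) / times shown. The sketch below uses that approach with hypothetical attributes and tasks; production MaxDiff studies typically fit choice models instead, so treat this as an illustration of the idea rather than a full method.

```python
from collections import Counter


def best_worst_scores(tasks):
    """Best-minus-worst count score per attribute, scaled to [-1, 1].

    tasks: list of dicts with keys 'shown' (list of attributes),
    'best' and 'worst' (one attribute each).
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for task in tasks:
        shown.update(task["shown"])
        best[task["best"]] += 1
        worst[task["worst"]] += 1
    return {a: (best[a] - worst[a]) / shown[a] for a in shown}


# Hypothetical choice tasks from one respondent.
tasks = [
    {"shown": ["offline", "export", "sso", "themes"], "best": "offline", "worst": "themes"},
    {"shown": ["offline", "export", "alerts", "themes"], "best": "export", "worst": "themes"},
    {"shown": ["offline", "sso", "alerts", "export"], "best": "offline", "worst": "alerts"},
]
scores = best_worst_scores(tasks)
for attribute, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(attribute, round(score, 2))
```

Because every task forces a trade-off, the scores spread out even when respondents would have rated every attribute "Very important" on a unipolar scale.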
Mobile and conversational AI
Mobile and modern UX
On mobile, plain design wins. Research on vertical versus horizontal Likert presentation found that vertical formats can increase extreme responding relative to horizontal layouts, largely because visual distance cues change how categories feel. At the same time, item-by-item scrolling formats often outperform dense grids on smartphones for data quality and completion behaviour. That means the UI problem is not just "make it vertical on mobile" — it is "do not let responsive design quietly change the measurement instrument." [Weijters et al. 2021; Mavletova]
Touch target guidance backs the same direction. Material Design recommends touch targets of at least 48 × 48 dp. Apple's HIG recommends a hit region of at least 44 × 44 pt. If your response options are tiny pills jammed into a horizontal row, you are not just creating a UX issue — you are changing error rates and, potentially, the distribution of the data itself. [Material Design; Apple HIG]
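These minimums are expressed in density-independent units, so a rendered size in pixels has to be converted before it can be checked. A rough sketch, assuming the Android convention that 1 dp equals 1 px at 160 dpi and that `density` is the number of physical pixels per dp (Android) or per pt (iOS); the function names are illustrative.

```python
MIN_DP_ANDROID = 48  # Material Design guidance cited above
MIN_PT_IOS = 44      # Apple HIG guidance cited above


def dp_to_px(dp, dpi):
    """Android: one dp equals one physical pixel at 160 dpi."""
    return dp * dpi / 160


def meets_minimum(width_px, height_px, density, minimum):
    """Check a rendered target against a minimum in dp or pt.

    density: physical pixels per dp (Android) or per pt (iOS).
    """
    return width_px / density >= minimum and height_px / density >= minimum


print(dp_to_px(MIN_DP_ANDROID, 320))                 # px needed at 320 dpi
print(meets_minimum(96, 96, 2, MIN_DP_ANDROID))      # 48 x 48 dp target
print(meets_minimum(80, 96, 2, MIN_PT_IOS))          # 40 pt wide: too narrow
```

A check like this is most useful in automated layout tests, where a responsive breakpoint can silently shrink response options below the guideline.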
AI and conversational surveys
This is the biggest change since mobile. Conversational interviewing with AI promises something traditional Likert cannot: adaptive probing at scale. Recent work on LLM-based conversational interviewing explicitly frames the trade-off as depth vs scale and tests whether AI can recover richer opinion data than fixed questionnaires. NORC's work makes the same case more practically: conversational approaches can yield richer, more relevant answers than standardised web forms, but they are operationally harder. [Wuttke et al. 2025; NORC]
That does not mean Likert is obsolete. It means Likert is increasingly the structured spine, while free text, voice, and chat collect the "why". The sharper 2026 workflow is not Likert or conversation — it is Likert plus conversation, with the same respondent able to rate first and elaborate second. That is also where products built for voice and chat (like SpaceForms) have a real advantage over older survey stacks that still treat open text as an afterthought. [NORC; Stanford working paper on AI/open-ended responses]
FAQs
What is a Likert scale?
An ordered response format used to measure degree, usually with 4, 5, or 7 response categories. Strictly, a single prompt is a Likert item; a true Likert scale is a multi-item composite used to measure one construct.
— Likert 1932; Carifio & Perla
Is a Likert scale ordinal or interval?
A single item is ordinal. A summed, well-behaved multi-item scale is often treated as interval-like in practice, provided diagnostics and assumptions are checked.
— Stevens 1946; Norman 2010
How do you analyse Likert scale data?
Analyse single items mostly with frequencies, medians, and ordinal/non-parametric methods. Analyse coherent multi-item scales with reliability checks first, then use means and parametric models where justified, with robustness checks when needed.
— Tavakol & Dennick; Norman 2010
5 point vs 7 point Likert scale — which is better?
Five points is usually best for speed, familiarity, and operational surveys. Seven points is often better when you need slightly finer discrimination and respondents are attentive enough to use it properly.
— Krosnick & Presser; Revilla et al.
What is the best Likert scale to use?
There is no universal best. Five is the safe default; seven is often the research default; four is useful when you deliberately want directional commitment; sliders are rarely the best default.
— Krosnick & Presser; Funke 2016
What makes good Likert scale questions?
Good Likert questions are specific, single-focus, and matched to the right response dimension: satisfaction, confidence, frequency, ease, likelihood, or importance, rather than generic agreement whenever possible.
— Krosnick & Presser; Qualtrics
When should I avoid agree–disagree Likert items?
Avoid them when you can write item-specific alternatives. Agree–disagree formats are especially vulnerable to acquiescence and low-effort responding.
— Krosnick 1991; Krosnick & Presser
Odd vs even Likert scale — which to choose?
Use odd scales when true neutrality is plausible and substantively meaningful. Use even scales when you need directional choice, and add a separate 'Not applicable' or 'Don't know' pathway when uncertainty is common.
— Garland; Johns; Sturgis et al.
Likert vs rating scale — what's the difference?
A Likert scale is one type of rating scale. Rating scales include Likert, semantic differential, stars, sliders, numeric ratings, and NPS-like formats. They are not interchangeable.
— Kantar; NNGroup
How do I use Cronbach's alpha with a Likert scale?
Use alpha only for multi-item scales, not single items. Treat it as a check on internal consistency, not proof of validity. Very high alpha can signal redundancy. Aim for around .70+ for early-stage work.
— Tavakol & Dennick 2011
How do I design a Likert survey for mobile?
Prefer one item at a time, avoid large matrix grids, keep tap targets large (≥44pt iOS / 48dp Android), and be careful with responsive layouts that silently rotate or restyle the response scale.
— Mavletova; Weijters et al.; Material/Apple guidance
Is a slider better than a Likert scale?
Usually no. The strongest evidence says sliders and VAS may look modern, but they often increase missing data, response time, or break-off — especially on mobile.
— Couper et al. 2006; Funke 2016
Conflicting findings that are still not settled
| Topic | One side | Other side | Best 2026 reading |
|---|---|---|---|
| Midpoint inclusion | Midpoint captures true neutrality | Midpoint absorbs uncertainty and satisficing | Include only when neutrality is substantively meaningful |
| 5 vs 7 points | 7 improves discrimination | 5 is easier and often 'good enough' | 5 for operations, 7 for more discriminating research |
| Parametric tests | Ordinal categories should restrict analysis | Parametrics are robust for many summed scales | Use scale/item distinction and sensitivity checks |
| Sliders | More precise and engaging | Slower, more break-off, more missing data | Avoid as default |
| Vertical mobile Likert | Easier fit on small screens | Can raise extreme responding | Don't let layout changes alter measurement unnoticed |
| Reversed items | Can reduce acquiescence | Can introduce misresponse and factor artefacts | Use sparingly and test carefully |
Source note: conflicts summarised from Krosnick & Presser, Garland, Sturgis, Norman, Funke, Couper, and Weijters.
Open questions and limitations
Several things remain less settled than popular guides imply:
Sector benchmarks are scarce. Publicly verifiable sector-wide percentage benchmarks for "what share of all academic / HR / CX surveys use Likert" are surprisingly weak; the strongest evidence is instrument-level (HCAHPS, Gallup Q12, PROMIS) and platform-level (Qualtrics, SurveyMonkey templates), not one audited universal percentage.
Scale-length comparison is mixed. Many scale-length papers compare reliability, validity, or response distributions rather than pure topline completion rates, so "response rate by 4/5/7/9 points" is not as cleanly established as many SEO pages suggest.
Cross-cultural correction is contextual. Cross-cultural work strongly supports the existence of response-style differences, but fewer public sources provide one harmonised table of directly comparable percentages by country and scale. The danger is real; the exact correction is still context-dependent.
Bottom line
If you want the safest modern default, use a 5 point Likert scale with item-specific wording, explicit N/A where needed, no gratuitous sliders, one question per screen on mobile, and open-text or conversational follow-ups for explanation. Use a 7 point Likert scale only when you can justify the extra granularity. Analyse single items conservatively and multi-item scales like actual scales. That is the closest thing the 2026 evidence offers to a definitive guide. [Krosnick & Presser; Norman; Funke; Ulitzsch et al.]
Source notes
The list below gives the most load-bearing sources, with publication year where one is available.
- Likert, R. (1932). A Technique for the Measurement of Attitudes. Archives of Psychology.
- Carifio, J., & Perla, R. (2008). Resolving the 50-year debate around using and misusing Likert scales.
- Norman, G. (2010). Likert scales, levels of measurement and the 'laws' of statistics. Advances in Health Sciences Education.
- Stevens, S. S. (1946). On the theory of scales of measurement. Science.
- Tavakol, M., & Dennick, R. (2011). Making sense of Cronbach's alpha. International Journal of Medical Education.
- Revilla, M., Saris, W., & Krosnick, J. (2014). Choosing the number of categories in agree–disagree scales.
- Garland, R. (1991). The mid-point on a rating scale. Marketing Bulletin.
- Johns, R. (2005). One size doesn't fit all: selecting response scales.
- Krosnick, J. A., & Presser, S. (2010). Question and questionnaire design. Handbook of Survey Research.
- Couper, M., Tourangeau, R., Conrad, F., & Singer, E. (2006). Evaluating the effectiveness of visual analog scales. Social Science Computer Review.
- Funke, F. (2016). A web experiment showing negative effects of slider scales.
- Weijters, B., Millet, K., & Cabooter, E. (2021). Extremity in horizontal and vertical Likert scale format responses.
- Chen, C., Lee, S.-Y., & Stevenson, H. W. (1995). Response style and cross-cultural comparisons of rating scales.
- Harzing, A.-W. (2006). Response styles in cross-national survey research (26 countries).
- Ulitzsch, E., Henninger, M., & Meiser, T. (2024). Response-scale usage across countries. Scientific Reports.
- Qualtrics CSAT — Customer satisfaction surveys.
- SurveyMonkey — Likert scale survey template.
- CMS HCAHPS — Hospital CAHPS.
- AHRQ CG-CAHPS — Clinician & Group CAHPS.
- HealthMeasures PROMIS — Patient-Reported Outcomes Measurement Information System.
- Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The Measurement of Meaning.
- Best–worst scaling vs Likert (2022) — Comparing two scaling approaches. Journal of Business Research.
- Material Design — 48×48 dp touch target guidance.
- Apple HIG (44×44 pt minimum) — Apple developer forum reference.
- Wuttke, A. et al. (2025). LLM-based conversational interviewing as adaptive interviewers.
- Stanford GSB working paper — Generative AI and open-ended survey responses.
- NORC — Generative AI can enhance survey interviews.
- Nielsen Norman Group — Rating scales (Likert or semantic differential).
Cite this report
Lundberg, E. (2026). The State of Likert Scales 2026: A Definitive Guide with Examples, Benchmarks & Best Practices. SpaceForms Research. Version 1.0. https://spaceforms.io/reports/likert-scale-2026
@techreport{lundberg2026likert,
title = {The State of Likert Scales 2026: A Definitive Guide with Examples, Benchmarks \& Best Practices},
author = {Lundberg, Eric},
institution = {SpaceForms Research},
year = {2026},
version = {1.0},
url = {https://spaceforms.io/reports/likert-scale-2026}
}
Lundberg, Eric. "The State of Likert Scales 2026: A Definitive Guide with Examples, Benchmarks & Best Practices." SpaceForms Research, version 1.0, 2026, spaceforms.io/reports/likert-scale-2026.
