This article is for Medical Professionals

Professional Reference articles are designed for health professionals to use. They are written by UK doctors and based on research evidence, UK and European Guidelines. You may find one of our health articles more useful.

Read COVID-19 guidance from NICE

Treatment of almost all medical conditions has been affected by the COVID-19 pandemic. NICE has issued rapid update guidelines in relation to many of these. This guidance is changing frequently. Please visit https://www.nice.org.uk/covid-19 to see if there is temporary guidance issued by NICE in relation to the management of this condition, which may vary from the information given below.

One definition of evidence-based medicine is this[1]: 'Evidence-based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients.'

Practising evidence-based medicine encourages clinicians to integrate valid and useful evidence with clinical expertise and each patient's unique features, enabling clinicians to apply evidence to the treatment of patients. There are five main steps to practising evidence-based medicine:

  • Identify knowledge gaps and formulate a clear clinical question.
  • Search the literature to identify relevant articles.
  • Critically appraise the articles for quality and the usefulness of results; always question whether the available evidence is valid, important and applicable to the individual patient.
  • Implement clinically useful findings into practice.
  • Evaluate performance using audit.

Healthcare professionals must always apply their general medical knowledge and clinical judgement, not only in assessing the importance of recommendations but also in applying them, since recommendations may not be appropriate in all circumstances. The following questions should be asked when deciding on the applicability of evidence to patients:

  • Is my patient so different from those in the study that results cannot be applied?
  • Is the treatment feasible in my setting?
  • What are my patient's likely benefits and harms from the therapy?
  • How will my patient's values influence the decision?
  • When looking for appropriate evidence:
    • Search for available guidelines - eg, National Institute for Health and Care Excellence (NICE), Health Information Resources, professional bodies (eg, a relevant specialist site such as the Royal College of Obstetricians and Gynaecologists (RCOG)).
    • If no guidelines are available, search for systematic reviews - eg, Cochrane database.
    • If no systematic reviews are available, look for primary research - eg, PubMed.
    • If no research is available, consider general internet searching (eg, Google), or discuss with a local specialist (at this level, beware of poor-quality information from the internet and of personal bias from even the most respected specialist).
  • The National Library for Health provides access to a range of medical search sites, including PubMed, Medline, EMBASE, Bandolier, the University of York's Centre for Reviews and Dissemination and the Cochrane database.
  • National guidelines and guidance sites include NICE and the Scottish Intercollegiate Guidelines Network (SIGN). Guidance on many topics is also available at the website for NICE Clinical Knowledge Summaries (NICE CKS) - formerly 'PRODIGY'.

Initial questions

  • The topic and conclusions: consider whether the message is important and believable, and whether it fits with existing knowledge and opinion (always look for other research, reviews and guidelines on the same topic).
  • Consider whether there are any obvious problems with the research and whether the research has been ethical.
  • Consider whether the objectives are clear and the precise nature of the hypothesis being considered.
  • Funding: drug companies might seek to publish studies that show their product in a favourable light while leaving negative studies unpublished.
  • Conflict of interest: consider whether the authenticity of the research can be relied upon.

Type of study

In general, the hierarchy of studies for obtaining evidence is:

  • Systematic reviews of randomised controlled trials (RCTs).
  • RCTs.
  • Controlled observational studies - cohort and case control studies.
  • Uncontrolled observational studies - case reports.

However, the hierarchy is dependent on the issue being researched. The Centre for Evidence-Based Medicine (CEBM) published a table to identify the different levels of evidence for different types of questions (eg, prognosis, treatment benefits), including[2]:

  • For issues of therapy or treatment, the highest possible level of evidence is a systematic review or meta-analysis of RCTs or an individual RCT.
  • For issues of prognosis, the highest possible level of evidence is a systematic review of inception cohort studies.

Expert opinion must not be confused with personal experience (sometimes called eminence-based medicine). Expert opinion is the lowest level of acceptable evidence but, in the absence of research evidence, may be the best guide available.

  • RCTs:
    • RCTs, especially those with double-blind placebo controls, are regarded as the gold standard of clinical research.
    • These studies work very well for certain interventions - eg, drug trials - but adequate blinding and control are much more difficult for other interventions, such as those requiring sham acupuncture or sham manipulation as the control.
  • Longitudinal or cohort studies:
    • A group of people is followed over many years to ascertain how variables such as smoking habits, exercise, occupation and geography may affect outcome.
    • Prospective studies are more highly rated than retrospective ones, although the former obviously take many years to perform. Retrospective studies are more likely to produce bias.
  • Meta-analysis:
    • The more data are pooled, the more valid the results but possibly the less relevant they become to individual patients. Meta-analysis can therefore be a useful tool but it has some important limitations.
    • A meta-analysis might take, for example, 10 trials of 100 patients each and combine the results as if they came from a single trial of 1,000 patients.
    • Although this technique rates highly, the methodology may not be identical in all studies and further errors may be introduced by publication bias. A good meta-analysis should include a funnel plot, with trim-and-fill analysis where appropriate, to assess the completeness of the published evidence.
    • A single large, well-conducted trial may therefore be more valuable than a meta-analysis of smaller, heterogeneous studies (a minimal pooling example follows this list).
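
To illustrate what pooling means in practice, the sketch below combines three hypothetical trial results using fixed-effect, inverse-variance weighting. The trial figures are invented for illustration, and a real meta-analysis would also assess heterogeneity and publication bias.

  # Minimal sketch of fixed-effect (inverse-variance) pooling - the basic
  # arithmetic behind combining trials in a meta-analysis. All numbers are
  # hypothetical.
  from math import sqrt

  # Each hypothetical trial reports an effect estimate (eg, a log odds ratio)
  # and its standard error.
  trials = [
      (-0.35, 0.20),
      (-0.10, 0.15),
      (-0.25, 0.30),
  ]

  weights = [1 / se ** 2 for _, se in trials]  # inverse-variance weights
  pooled = sum(w * est for (est, _), w in zip(trials, weights)) / sum(weights)
  pooled_se = sqrt(1 / sum(weights))

  print(f"Pooled estimate: {pooled:.3f}")
  print(f"95% CI: {pooled - 1.96 * pooled_se:.3f} to {pooled + 1.96 * pooled_se:.3f}")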

Method

  • Selection of subjects is very important; some diseases are difficult to define - eg, irritable bowel syndrome, chronic fatigue syndrome, fibromyalgia. For many diseases there is huge variation in severity - eg, asthma. If subjects have been paid for taking part in the study, this may introduce bias.
  • Questionnaires: assess the design of the questionnaires, whether they were piloted, whether the interviewers were properly trained and the interviews standardised.
  • Recall bias may be important. The timing of the questionnaire may be important, especially for seasonal illness such as hay fever. Minor events may easily be forgotten.
  • Setting and subjects:
    • The study population should be clearly defined, as should whether the whole population or a subset has been studied. Consider whether the sample size seems big enough, whether the duration of the study was long enough for the outcome measure to occur and whether there is any possible selection bias - eg, only patients treated in hospital have been selected.
    • Assess whether the control group was well matched and whether any exclusion criteria were valid.
    • Consider the relevance of any patients who have dropped out of the study, the reasons for dropping out and the relevance for the results and conclusions of the research.
  • Outcome measures: should be clearly defined, relevant to the objectives, reliable and reproducible, valid and consistent.

Results

  • Consider how convincing the results are, whether the statistics (eg, P value, confidence limits) are appropriate and impressive, and whether there are any possible alternative explanations for the results; a simple worked example follows this list.
  • Type of outcome: the results of a trial may be relatively simple to express in terms of numbers dying or surviving or may be much harder to quantify. The quality adjusted life years (QALY) index may be used for such parameters as pain, incontinence and disability.
  • The results should be clearly and objectively presented in sufficient detail (eg, age or gender breakdown of results). Consider whether there was an adequate response rate in a questionnaire study (ideally above 70%) and whether the numbers in any study add up.
  • Identify the rate of loss of follow-up during the study and how non-responders have been dealt with - eg, whether they have been considered as treatment failures or included separately in the analysis.
  • Assess whether the results are clinically relevant and whether the conclusions are supported by the results of the research study.
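
As a simple illustration of how counts of those dying or surviving relate to the statistics mentioned above, the sketch below uses invented numbers to derive an absolute risk reduction, a number needed to treat and an approximate 95% confidence interval from two trial arms.

  # Minimal sketch, with invented counts, of deriving an absolute risk
  # reduction (ARR), number needed to treat (NNT) and an approximate 95%
  # confidence interval from trial results expressed as deaths per arm.
  from math import sqrt

  deaths_control, n_control = 30, 150   # hypothetical control arm
  deaths_treated, n_treated = 18, 150   # hypothetical treatment arm

  risk_control = deaths_control / n_control
  risk_treated = deaths_treated / n_treated

  arr = risk_control - risk_treated     # absolute risk reduction
  nnt = 1 / arr                         # number needed to treat

  # Standard error of a difference in proportions (normal approximation)
  se = sqrt(risk_control * (1 - risk_control) / n_control
            + risk_treated * (1 - risk_treated) / n_treated)
  ci_low, ci_high = arr - 1.96 * se, arr + 1.96 * se

  print(f"ARR = {arr:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f}), NNT = {nnt:.1f}")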

Conclusions

  • Check that the conclusions relate to the stated aims and objectives of a study and whether any generalisations made from a study carried out in one population have been applied inappropriately to a different type of population.
  • Consider the possibility of any confounding variables - eg, age, social class, ethnicity, smoking, disease duration, comorbidity. Multiple regression analysis or strict matching of controls reduces this problem.
  • Bias may take many forms - eg, observer bias where blinding is inadequate, allocation bias where investigators try to ensure a particular patient receives the drug rather than the placebo, and contamination, where the intervention group passes information to the control group (a particular problem in health education intervention studies).
  • Annual and seasonal factors in the variation of disease may be important, especially for respiratory infections, rhinitis and asthma.

Discussion

  • The discussion should include whether the initial objectives have been met, whether the hypothesis has been proved or disproved, whether the data have been interpreted correctly and the conclusions justified.
  • The discussion should include all the results of the study and not just those that have supported the initial hypothesis.

A variety of grading systems for evidence and recommendations is currently in use. The system used is usually defined at the beginning of any guideline publication. The hierarchy of evidence and the recommendation gradings relate to the strength of the literature and not necessarily to clinical importance.

GRADE consensus (used by NICE and SIGN)

The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach provides a system for rating quality of evidence and strength of recommendations that is explicit, comprehensive, transparent, and pragmatic and is increasingly being adopted by organisations worldwide[3].

  • High-quality evidence that an intervention's desirable effects are clearly greater than its undesirable effects, or are clearly not, warrants a strong recommendation.

  • Uncertainty about the trade-offs (because of low-quality evidence or because the desirable and undesirable effects are closely balanced) warrants a weak recommendation.

  • Guidelines should inform clinicians what the quality of the underlying evidence is and whether recommendations are strong or weak.

Grading of evidence

  • Ia: systematic review or meta-analysis of RCTs.
  • Ib: at least one RCT.
  • IIa: at least one well-designed controlled study without randomisation.
  • IIb: at least one well-designed quasi-experimental study, such as a cohort study.
  • III: well-designed non-experimental descriptive studies, such as comparative studies, correlation studies, case-control studies and case series.
  • IV: expert committee reports, opinions and/or clinical experience of respected authorities.

Grading of recommendations

  • A: based on hierarchy I evidence.
  • B: based on hierarchy II evidence or extrapolated from hierarchy I evidence.
  • C: based on hierarchy III evidence or extrapolated from hierarchy I or II evidence.
  • D: directly based on hierarchy IV evidence or extrapolated from hierarchy I, II or III evidence.
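
Expressed as a simple lookup, the relationship between evidence level and recommendation grade in this scheme might be sketched as below. This is illustrative only; it ignores the 'extrapolated from' routes, which require judgement rather than a direct mapping.

  # Illustrative lookup from evidence level to recommendation grade, following
  # the hierarchy listed above. The "extrapolated from" routes to grades B-D
  # are deliberately omitted, as they depend on judgement.
  RECOMMENDATION_GRADE = {
      "Ia": "A", "Ib": "A",    # systematic review/meta-analysis of RCTs; at least one RCT
      "IIa": "B", "IIb": "B",  # controlled or quasi-experimental studies
      "III": "C",              # non-experimental descriptive studies
      "IV": "D",               # expert committee reports or opinion
  }

  print(RECOMMENDATION_GRADE["IIb"])  # -> B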

A similar system to GRADE is recommended by the Evidence-based Practice Center (EPC) program, established by the US Agency for Healthcare Research and Quality (AHRQ)[4].

Grade and Assess Predictive tools (GRASP)

This is a framework that can provide clinicians with a standardised, evidence-based system to support their search for and selection of efficient clinical predictive tools[5]. It grades predictive tools based on critical appraisal of the published evidence on their predictive performance before implementation, their potential effect and usability during implementation, and their post-implementation impact on healthcare. For example, the Ottawa knee rule received the highest grade, as it has demonstrated positive post-implementation impact on healthcare.

It has been argued that EBM has failed to answer the practising doctor's question of what the likely outcome would be when a given treatment is administered to a particular patient with their own distinctive biological and biographical (life experience) profile. Medicine-based evidence (MBE) is proposed to fill this gap. It is based on the profiles of individual patients as the evidence base for individualised or personalised medicine[6]. MBE builds an archive of patient profiles using data from all study types and data sources, and will include both clinical and socio-behavioural information. The clinician seeking guidance for the management of an individual patient will start with the patient's longitudinal profile and find approximate matches in the archive that describe how similar patients responded to a contemplated treatment and to alternative treatments.
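
As a rough illustration of the 'approximate matching' idea, the sketch below finds the archived patients whose profiles are closest to an index patient and reports their outcomes. The profile fields, distance measure and data are all invented; MBE is a proposed approach, not an existing library.

  # Toy sketch of approximate matching against an archive of patient profiles.
  # Fields, scaling and outcomes are invented for illustration only.
  from math import sqrt

  # Hypothetical archive entries: (id, age, systolic BP, smoker 0/1, outcome on treatment)
  archive = [
      ("p1", 54, 148, 1, "improved"),
      ("p2", 71, 162, 0, "no change"),
      ("p3", 58, 151, 1, "improved"),
      ("p4", 45, 130, 0, "improved"),
  ]

  index_patient = (56, 150, 1)  # age, systolic BP, smoker

  def distance(entry, patient):
      # Naive Euclidean distance over crudely scaled features
      age, bp, smoker = patient
      return sqrt(((entry[1] - age) / 10) ** 2
                  + ((entry[2] - bp) / 10) ** 2
                  + (entry[3] - smoker) ** 2)

  # Report the outcomes of the two closest archived patients
  for pid, *_, outcome in sorted(archive, key=lambda e: distance(e, index_patient))[:2]:
      print(pid, outcome)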


Further reading and references

  1. Sackett DL, Rosenberg WM, Gray JA, et al; Evidence based medicine: what it is and what it isn't. BMJ. 1996 Jan 13;312(7023):71-2. doi: 10.1136/bmj.312.7023.71.

  2. Levels of Evidence; Centre for Evidence-Based Medicine, University of Oxford.

  3. Guyatt GH, Oxman AD, Vist GE, et al; GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008 Apr 26;336(7650):924-6. doi: 10.1136/bmj.39489.470347.AD.

  4. Berkman ND, Lohr KN, Ansari MT, et al; Grading the strength of a body of evidence when assessing health care interventions: an EPC update. J Clin Epidemiol. 2015 Nov;68(11):1312-24. doi: 10.1016/j.jclinepi.2014.11.023. Epub 2014 Dec 20.

  5. Khalifa M, Magrabi F, Gallego B; Developing a framework for evidence-based grading and assessment of predictive tools for clinical decision support. BMC Med Inform Decis Mak. 2019 Oct 29;19(1):207. doi: 10.1186/s12911-019-0940-7.

  6. Horwitz RI, Charlson ME, Singer BH; Medicine based evidence and personalized care of patients. Eur J Clin Invest. 2018 Jul;48(7):e12945. doi: 10.1111/eci.12945. Epub 2018 Jun 4.
