Methodology

Last updated: 13 April 2026

Quick Answer

PredictJEE uses a 10-signal adaptive prediction framework (long-horizon frequency, short-horizon frequency, trend, cycle, gap-boost, over-exposure, recency, short-trend, short-cycle, and momentum), trained year by year on JEE Main 2015–2025 and validated on the held-out next year. April 2026 validation: 88% precision on the top-25 predictions.

PredictJEE is built on a single premise: the JEE Main examination follows statistically identifiable patterns. By analysing historical data systematically, we can estimate which topics and question types are most likely to appear in upcoming sessions.

We currently cover Physics, Mathematics, and Chemistry, each with its own tailored prediction engine. This page explains our general approach, the data we use, how we validate our predictions, and where our limitations lie.

1. Data Foundation

Our models are trained on 15,000+ real JEE Main questions scraped from official sources, spanning every session from 2010 to 2026. Each question is classified into a specific question type (internally called a pattern), a category more granular than a chapter.

For example, the chapter "Waves" in Physics contains distinct question types such as Standing Waves, Doppler Effect, and Superposition of Waves. The chapter "Matrices" in Mathematics contains Determinants, Matrix Inverse, and System of Linear Equations. Predicting at the question-type level is more actionable than predicting at the chapter level because it tells you exactly what type of problem to prepare for.
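As an illustration of this two-level classification, here is a minimal sketch of a subject/chapter to question-type mapping. The `TAXONOMY` structure and the `chapter_of` helper are hypothetical names chosen for this example; only the chapter and question-type labels come from the text above.

```python
# Hypothetical taxonomy structure; the labels are the examples given in the text.
TAXONOMY = {
    ("Physics", "Waves"): [
        "Standing Waves",
        "Doppler Effect",
        "Superposition of Waves",
    ],
    ("Mathematics", "Matrices"): [
        "Determinants",
        "Matrix Inverse",
        "System of Linear Equations",
    ],
}


def chapter_of(question_type):
    """Return the (subject, chapter) pair that contains a question type, or None."""
    for key, question_types in TAXONOMY.items():
        if question_type in question_types:
            return key
    return None


print(chapter_of("Doppler Effect"))
```

A prediction at this level names "Doppler Effect" rather than the whole "Waves" chapter, which is what makes it more specific to prepare for.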

  • 15,000+ questions analysed
  • 345 unique question types tracked
  • 17 years of exam history (2010–2026)

2. How the Prediction Engine Works

Each question type is scored by a proprietary multi-signal framework that analyses multiple dimensions of exam behaviour simultaneously. The signals are combined into a composite score that determines how likely a question type is to appear in the next session.

At a high level, the engine considers four categories of evidence:

Category 1: Historical Strength

How consistently has this question type appeared across past exams? Question types that show up reliably year after year are statistically more likely to appear again. The engine measures both raw frequency and consistency over time.
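One simple way to express "raw frequency" and "consistency" is sketched below. This is an illustrative assumption, not the engine's actual formula: the function name `historical_strength` and the sample counts are hypothetical.

```python
from statistics import mean


def historical_strength(counts_by_year):
    """Return (average questions per year, share of years with at least one question).

    Illustrative definitions only; the real signal definitions are not published.
    """
    counts = list(counts_by_year.values())
    frequency = mean(counts)
    consistency = sum(1 for c in counts if c > 0) / len(counts)
    return frequency, consistency


# Hypothetical yearly counts for one question type.
sample_counts = {2019: 2, 2020: 1, 2021: 0, 2022: 3, 2023: 2}
print(historical_strength(sample_counts))
```

Separating the two numbers matters because a question type asked eight times in one year and never again has the same raw frequency as one asked steadily, but much lower consistency.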

Category 2: Momentum and Trends

Is this question type becoming more or less common over recent years? The engine detects rising and falling trends to distinguish question types that are gaining prominence from those being phased out.
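A rising or falling trend can be approximated by the slope of a least-squares line through yearly counts. The sketch below assumes that approach purely for illustration; `trend_slope` is a hypothetical helper and the engine may measure trends differently.

```python
def trend_slope(counts_by_year):
    """Least-squares slope of question count against year (questions per year).

    Positive means the type is appearing more often; negative means less often.
    """
    years = sorted(counts_by_year)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years to estimate a trend")
    mean_x = sum(years) / n
    mean_y = sum(counts_by_year[y] for y in years) / n
    cov = sum((y - mean_x) * (counts_by_year[y] - mean_y) for y in years)
    var = sum((y - mean_x) ** 2 for y in years)
    return cov / var


# Hypothetical counts: one rising type and one falling type.
print(trend_slope({2021: 1, 2022: 2, 2023: 3}))
print(trend_slope({2021: 3, 2022: 2, 2023: 1}))
```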

Category 3: Gap and Cycle Analysis

When was this question type last tested, and does it follow a cyclical appearance schedule? NTA (the National Testing Agency) tends to rotate question types, and those absent for multiple years often make a comeback. The engine identifies these rotation cycles and predicts when a question type is due to return.
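To make the idea of a question type being "due" concrete, here is a minimal sketch that compares the current gap with the average interval between past appearances. The names `gap_and_cycle` and `is_due`, and the rule "gap at least as long as the average cycle", are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def gap_and_cycle(appearance_years: List[int], target_year: int) -> Tuple[int, Optional[float]]:
    """Return (years since last appearance, average interval between appearances).

    The cycle is None when there are fewer than two distinct appearance years.
    """
    years = sorted(set(appearance_years))
    gap = target_year - years[-1]
    if len(years) < 2:
        return gap, None
    intervals = [later - earlier for earlier, later in zip(years, years[1:])]
    return gap, sum(intervals) / len(intervals)


def is_due(gap: int, cycle: Optional[float]) -> bool:
    """Hypothetical rule: a type is due once its gap reaches its average cycle."""
    return cycle is not None and gap >= cycle


# Hypothetical question type seen every three years.
gap, cycle = gap_and_cycle([2016, 2019, 2022], target_year=2025)
print(gap, cycle, is_due(gap, cycle))
```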

Category 4: Saturation Detection

Has this question type been over-tested recently? NTA avoids repeating the same question types excessively. Those that have appeared many times in recent sessions are statistically less likely to appear again immediately.
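Over-testing can be sketched as a penalty that grows once recent appearances exceed a cap. The window of 4 sessions, the cap of 3 appearances, and the function name `saturation_penalty` are all hypothetical values picked for this example.

```python
def saturation_penalty(counts_by_session, window=4, cap=3):
    """Return a penalty in [0, 1] based on appearances in the most recent sessions.

    0.0 means no over-exposure; 1.0 means recent appearances are at least double the cap.
    The window and cap defaults are illustrative assumptions.
    """
    recent = sum(counts_by_session[-window:])
    excess = max(0, recent - cap)
    return min(1.0, excess / cap)


# Hypothetical per-session counts, oldest first.
print(saturation_penalty([0, 1, 0, 1]))          # lightly tested recently
print(saturation_penalty([1, 1, 1, 1, 2, 2, 2, 2]))  # heavily tested recently
```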

Signal weights are not fixed. They are calibrated through multi-year backtesting, allowing each subject's engine to adapt to the unique statistical behaviour of that discipline. The exact signal composition, weighting methodology, and calibration process are proprietary.
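Since the actual composition is proprietary, the following is only a generic sketch of how several signals could be combined into one composite score with a weighted sum. The signal names, the weights, and the `composite_score` function are hypothetical; a negative weight is used here to show how a saturation signal could lower the score.

```python
def composite_score(signals, weights):
    """Weighted sum of signal values, normalised by the total absolute weight.

    Signals missing from `signals` are treated as 0.0.
    """
    total_weight = sum(abs(w) for w in weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted = sum(weights[name] * signals.get(name, 0.0) for name in weights)
    return weighted / total_weight


# Hypothetical signal values (each scaled to 0..1) and hypothetical weights.
signals = {"frequency": 0.8, "trend": 0.5, "gap": 1.0, "saturation": 0.25}
weights = {"frequency": 0.4, "trend": 0.2, "gap": 0.2, "saturation": -0.2}
print(round(composite_score(signals, weights), 4))
```

In a set-up like this, calibrating per subject would amount to choosing a different `weights` dictionary for each subject based on backtest results.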

3. Data Integrity

Prediction accuracy depends on answer accuracy. We have run 10 independent verification passes on our question bank, including:

  • Blind re-solving by 5 independent AI models (cross-verified against each other)
  • Web scraping verification against official answer keys and trusted educational sources
  • Manual expert review of every disputed answer
  • Full re-scrape of all question options to eliminate scraper-introduced errors

After 10 verification passes, our estimated answer accuracy across all subjects is ~99.5%.
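The cross-verification step can be pictured as a vote among independent solvers, with low-agreement questions escalated to manual review. This sketch assumes a simple agreement threshold; `cross_verify` and the threshold of 4 out of 5 are hypothetical and not a description of the actual pipeline.

```python
from collections import Counter
from typing import List, Optional, Tuple


def cross_verify(answers: List[str], min_agreement: int = 4) -> Tuple[Optional[str], bool]:
    """Return (consensus answer or None, needs_manual_review).

    `answers` holds one answer per independent solver. The threshold is an
    illustrative assumption.
    """
    top_answer, votes = Counter(answers).most_common(1)[0]
    if votes >= min_agreement:
        return top_answer, False
    return None, True


# Hypothetical answers from five solvers for two questions.
print(cross_verify(["B", "B", "B", "B", "C"]))  # strong agreement
print(cross_verify(["A", "B", "B", "C", "A"]))  # disputed, goes to review
```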

4. Tier System

After scoring, all question types are ranked from highest to lowest composite score and assigned to one of three tiers.

Tier 1: Highest Confidence

The most statistically reliable predictions. Expect the vast majority of these question types to appear. These should form the core of your preparation.

Tier 2: Strong Candidates

Strong secondary targets with solid statistical support. These round out a thorough study plan.

Tier 3: Moderate Probability

Supplementary study material. These question types have lower statistical support but can still appear, particularly when NTA introduces rotation surprises.
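The ranking-then-tiering step can be sketched as follows. The tier cut-offs are not stated on this page, so the sizes used here (`tier1_size`, `tier2_size`) and the sample scores are hypothetical.

```python
def assign_tiers(scores, tier1_size=2, tier2_size=2):
    """Rank question types by composite score (highest first) and assign tiers 1 to 3.

    Tier sizes are illustrative assumptions, not the real cut-offs.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    tiers = {}
    for rank, name in enumerate(ranked):
        if rank < tier1_size:
            tiers[name] = 1
        elif rank < tier1_size + tier2_size:
            tiers[name] = 2
        else:
            tiers[name] = 3
    return tiers


# Hypothetical composite scores.
scores = {
    "Standing Waves": 0.91,
    "Doppler Effect": 0.78,
    "Determinants": 0.66,
    "Matrix Inverse": 0.52,
    "Superposition of Waves": 0.31,
}
print(assign_tiers(scores))
```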

5. Backtested Performance

Every prediction engine is validated through rigorous backtesting. We simulate predictions for past exam sessions using only data available before that session, then measure how many of our predictions actually appeared.
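The "only data available before that session" rule is the core of a walk-forward backtest. Below is a minimal sketch under stated assumptions: `walk_forward_backtest` and the baseline `most_frequent` predictor are hypothetical stand-ins, and the history is toy data rather than real exam records.

```python
from collections import Counter
from typing import Callable, Dict, List, Set


def walk_forward_backtest(
    history: Dict[int, Set[str]],
    predict: Callable[[Dict[int, Set[str]], int], List[str]],
    top_k: int,
    min_train_years: int = 3,
) -> Dict[int, float]:
    """For each test year, predict using earlier years only and record the hit share."""
    years = sorted(history)
    results = {}
    for i in range(min_train_years, len(years)):
        train = {y: history[y] for y in years[:i]}  # strictly before the test year
        predicted = predict(train, top_k)
        actual = history[years[i]]
        results[years[i]] = len(set(predicted) & actual) / max(1, len(predicted))
    return results


def most_frequent(train: Dict[int, Set[str]], top_k: int) -> List[str]:
    """Baseline predictor (an assumption): the most frequently seen question types."""
    counts = Counter(q for types in train.values() for q in types)
    return [q for q, _ in counts.most_common(top_k)]


# Toy history: question types that appeared in each year.
history = {2020: {"A", "B"}, 2021: {"A", "C"}, 2022: {"A", "B"}, 2023: {"A", "B"}}
print(walk_forward_backtest(history, most_frequent, top_k=2))
```

Because the training slice never includes the test year, the measured hit share cannot benefit from information the model would not have had at prediction time.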

Physics

  • Top 25 Coverage: 21/25 question types matched
  • Marks Coverage: 84% of 100 Physics marks
  • F1 Score: 87.7% (mature-year average)
  • Hit Rate: 94.6% (holdout accuracy)

Mathematics

  • Top 25 Coverage: 24/25 question types matched
  • Holdout Hit Rate: 97.9% (46/47 question types)
  • F1 Score: 81–84% (mature years, 2022–2025)
  • Answer Accuracy: 99.5% (10 verification passes)

Chemistry

  • Question Types: 75+ across all chapters
  • Top 25 Precision: 88% (backtested accuracy)
  • Questions Verified: 464 re-scraped and audited

F1 score is the harmonic mean of precision (the fraction of our predictions that were correct) and recall (the fraction of question types that actually appeared which we had predicted). Hit rate measures the proportion of our predicted question types that actually appeared in the exam.
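These metrics can be computed from two sets: the question types predicted and the question types that actually appeared. The sketch below uses the standard definitions; the function name `precision_recall_f1` and the sample sets are hypothetical.

```python
def precision_recall_f1(predicted, actual):
    """Return (precision, recall, F1) for sets of predicted and actual question types."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    if hits == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical session: 4 predictions, 5 question types appeared, 3 overlap.
predicted = {"A", "B", "C", "D"}
actual = {"A", "B", "C", "E", "F"}
p, r, f1 = precision_recall_f1(predicted, actual)
print(round(p, 3), round(r, 3), round(f1, 3))
```

By the same arithmetic, 21 matches out of 25 predictions corresponds to a precision of 21/25 = 0.84.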

6. Limitations and Disclaimer

Important: Read before relying on predictions

  • Past performance does not guarantee future results. Backtested accuracy is measured on historical data. Future sessions may deviate significantly from historical patterns.
  • NTA can introduce new question types at any time. Our model can only predict question types it has observed before. Entirely novel question types or topics added to the syllabus will not appear in our rankings.
  • Predictions are statistical, not deterministic. A Tier 1 ranking means high probability, not certainty. Some Tier 1 question types will not appear in every session, and some Tier 3 question types will.
  • Use predictions as a supplement, not a replacement. PredictJEE is designed to help you prioritise, not to narrow your syllabus. Comprehensive preparation remains the most reliable strategy.
  • We are not affiliated with NTA. PredictJEE is an independent platform. We have no access to unreleased examination content or insider information of any kind.

7. Questions

If you have questions about our methodology or want to understand how a specific question type was scored, contact us at [email protected].
