Abstract
Objective
Insomnia is a common psychiatric condition characterized by persistent difficulty falling or staying asleep despite adequate opportunity and conditions for sleep. The Insomnia Severity Index assesses seven core parameters associated with insomnia risk. A cubic polynomial regression model based on these baseline parameters was developed to forecast insomnia severity.
Materials and Methods
The model incorporates quadratic and cubic terms to enhance accuracy by capturing nonlinear dependencies between variables. Least squares regression and ridge regularization improve stability and avoid overfitting.
Results
The model’s performance was evaluated using the mean squared error (MSE) and the coefficient of determination (R²), and the model effectively represented trends in insomnia severity. Compared with simpler polynomial approaches, the cubic polynomial model substantially improved prediction reliability, reducing the MSE to 0.0018 and increasing R² to 0.98.
Conclusion
The performance results demonstrate that the model's prediction of insomnia severity largely aligns with the actual observation data. This demonstrates that the model can accurately analyze sleep disorders and make dependable predictions. These findings suggest that advanced polynomial regression may be a valuable tool in sleep analysis, enabling the prediction of insomnia severity.
Introduction
Insomnia is a common sleep disorder that significantly impacts an individual’s daily life and can lead to persistent disruptions in sleep patterns. People with insomnia often experience symptoms such as difficulty falling asleep, frequent awakenings during the night, or waking up earlier than intended in the morning. These symptoms are frequently associated with stress, excessive mental arousal, or emotional dysregulation.1 The disorder is common in modern societies, especially in industrialised countries, where more than one-third of the population reports experiencing it at some point. Sleep is a physiological process that goes beyond rest; it is essential for maintaining mental and physical well-being.2
Deterioration of sleep quality, particularly during childhood, may predispose individuals to develop problems such as attention deficit, learning difficulties, and behavioural disorders later in life. Therefore, recognising cases of insomnia as early as possible is crucial. When diagnosed early, patients generally respond more quickly to treatment, and the likelihood of the disorder becoming chronic is greatly reduced. For children especially, early identification of symptoms can be vital in preventing developmental delays.3-6
Insomnia can not only disrupt individual sleep patterns but also impact overall health. Research indicates that prolonged sleep disturbances weaken the immune system, increase stress levels and may contribute to the development of diseases such as depression, diabetes or hypertension.7, 8 Sleep is essential for brain homeostasis and brain resilience, as well as for maintaining mental and physical health. Insomnia is associated with changes in sleep architecture and function and can occur as a stressor contributing to mental and physical disorders. Therefore, the assessment and treatment of insomnia are essential factors for mental and physical health.9-12
Early diagnosis of insomnia is vital for improving both individual health and quality of life. Patients treated promptly after an early diagnosis tend to recover more quickly and are less likely to develop chronic insomnia. Early diagnosis may also help prevent other disorders that insomnia can cause.13, 14
However, the traditional diagnostic approach, which relies largely on clinical interviews and patient self-reports, faces several limitations, including the limited reliability of subjective data, the lengthy diagnostic process, and the difficulty of maintaining continuous follow-up.15, 16
Clinicians use methods such as actigraphy and polysomnography to obtain objective data, which they interpret alongside self-reports obtained directly from the patient. Actigraphy typically uses a wrist-worn device that records movement to objectively track a person’s sleep-wake patterns. Polysomnography is a more comprehensive test that records brain waves, eye movements, heart rhythm, and muscle activity during sleep.17, 18
Existing methods for diagnosing insomnia assess an individual’s sleep patterns, medical history, and symptoms within a broad framework.19 No biomarker for insomnia has yet been incorporated into clinical practice, so diagnosis depends on experts accurately interpreting patient information, which always carries a risk of human error.20, 21 Moreover, patients may find it difficult to describe their sleep problems precisely and may underestimate or overestimate their sleep. These traditional methods are therefore limited by their reliance on subjective data, the impracticality of many objective measurements in daily life, and the absence of long-term follow-up.22-24
In this context, technological advances make it possible to offer more accurate, faster, and individualised solutions. Wearable devices, mobile tracking systems, and artificial intelligence (AI)-enabled models show promise for detecting insomnia symptoms early and predicting the severity of the disorder. This study aims to develop a prediction model that can aid the early diagnosis of insomnia and the monitoring of its progression, providing a more robust health monitoring process than traditional methods.25-27
Regression and AI-based modelling techniques are widely used for early diagnosis, monitoring disease progression, and creating personalised treatment recommendations, not only for insomnia but also for many chronic health conditions, such as diabetes, hypertension, depression, Parkinson’s disease, obesity, and cardiovascular diseases.28, 29 The primary advantages of such modelling are:
• Early diagnosis: compared with traditional methods, it can predict risky situations before symptoms appear.
• Process automation: it reduces clinical assessment time and the burden on healthcare professionals.
• Personalised intervention: individualised treatment recommendations can be produced from the patient’s historical data.
• Objectivity: because it is based on objective data, it reduces human error and supports data-driven decisions.
• Long-term follow-up: dynamic monitoring allows clinicians to track disease progression or recovery.
The model developed in this study offers a more accurate, rapid, and personalised approach. It also addresses the limitations of subjective data, enabling more reliable results and more effective health management. Such algorithms can enhance the quality of healthcare in both clinical practice and research.30, 31
Materials and Methods
Methodology
This study developed a cubic polynomial model to represent the key parameters that influence insomnia severity. Although multiple factors contribute to sleep disturbances, this study focuses on the five variables with the most significant impact: sleep duration (SD), sleep latency (SL, the time taken to fall asleep), wake after sleep onset (WASO, the time spent awake after falling asleep), early awakening (EA), and impaired daytime functioning (DI). These parameters were selected for their clinical relevance and measurable impact on sleep quality. The cubic polynomial formulation aims to capture the complex, nonlinear relationships between these factors and insomnia severity. The model balances accuracy and computational efficiency, providing a reliable representation of the data while avoiding excessive complexity. The main parameters and their threshold values reported in the literature are listed below.
Basic Parameters Signalling Insomnia
1. SD:
• Sleeping less than 6-7 hours a night.
• Taking 30 minutes or more to fall asleep.
2. Sleep Quality:
• Frequent awakenings during the night.
• Not feeling rested or refreshed when waking up.
3. Sleep Delay:
• Taking longer than 30 minutes to fall asleep.
4. Waking After Sleep Onset (WASO):
• Staying awake for more than 20-30 minutes during the night.
5. Early Morning Awakening:
• Waking up 1 hour or more before the scheduled time and not being able to fall back asleep.
6. Impaired DI:
• Fatigue or excessive daytime sleepiness.
• Difficulty concentrating or memory problems.
• Mood disorders such as irritability, anxiety or stress.
7. Trigger Factors:
• Stress, emotional distress or sudden life changes.
• Travel, jet lag or shift work.
• Environmental disturbances (noise, light, temperature).
If the symptoms listed above persist for more than three months, the condition is usually classified as chronic rather than acute insomnia.1 The developed cubic polynomial model reflects insomnia severity based on the threshold values of the key parameters and predicts the future course of the disorder. To reduce computational complexity, the study excluded two of the seven basic parameters that indicate insomnia (sleep quality and trigger factors), as they contributed minimally.
The developed model considers the five basic parameters and threshold values listed below.
Step 1: Defining Parameters
Parameters and their threshold values:
1. SD: Hours of sleep per night (threshold: less than 6-7 hours)
2. Sleep Latency (SL): Time taken to fall asleep in minutes (threshold: >30 minutes)
3. WASO: Total time awake during sleep in minutes (threshold: >20-30 minutes)
4. Early Awakening (EA): Minutes awake before the intended wake-up time (threshold: >60 minutes)
5. Impaired DI: Fatigue, concentration problems, mood swings (scale: 0-10)
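To make the thresholds above concrete, the following minimal Python sketch flags which parameters of a single observation cross their stated cut-offs. It is illustrative only: taking 6 hours for SD (the lower end of the 6-7 hour range) and 30 minutes for WASO (the upper end of the 20-30 minute range) are assumptions, the names `Threshold`, `THRESHOLDS`, and `flag_parameters` are hypothetical, and DI is omitted because the text gives it a 0-10 scale but no cut-off.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A single clinical cut-off; `below=True` means values under the limit are abnormal."""
    limit: float
    below: bool


# Hypothetical cut-offs derived from the ranges listed above (assumed end points).
THRESHOLDS = {
    "SD": Threshold(limit=6.0, below=True),      # hours of sleep per night
    "SL": Threshold(limit=30.0, below=False),    # minutes to fall asleep
    "WASO": Threshold(limit=30.0, below=False),  # minutes awake after sleep onset
    "EA": Threshold(limit=60.0, below=False),    # minutes woken before intended time
}


def flag_parameters(observation: dict[str, float]) -> dict[str, bool]:
    """Return True for every parameter whose value crosses its cut-off."""
    flags = {}
    for name, rule in THRESHOLDS.items():
        value = observation[name]
        flags[name] = value < rule.limit if rule.below else value > rule.limit
    return flags


if __name__ == "__main__":
    night = {"SD": 5.5, "SL": 45.0, "WASO": 20.0, "EA": 70.0}
    print(flag_parameters(night))
```

For the example night, SD, SL, and EA are flagged while WASO (20 minutes) stays below the assumed 30-minute cut-off.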
Step 2: Defining the Cubic Polynomial Model
Using the five parameters (SD, SL, WASO, EA, DI), the study defines a function in Equation (1) to indicate the severity of insomnia.
Equation (1) presents the cubic polynomial model, and least absolute shrinkage and selection operator (LASSO) regression is applied to estimate all β values. Equation (2) adds the error term (ε).
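A form of Equations (1) and (2) consistent with the symbol definitions that follow is sketched here. It is a reconstruction under the assumption of separate linear, quadratic, and cubic coefficients for each input and no interaction terms, with X₁ to X₅ standing for SD, SL, WASO, EA, and DI; it is not necessarily the authors' exact notation.

```latex
Y = \beta_0 + \sum_{i=1}^{5} \beta_i X_i + \sum_{i=1}^{5} \beta_{ii} X_i^{2} + \sum_{i=1}^{5} \beta_{iii} X_i^{3} \qquad (1)

Y = \beta_0 + \sum_{i=1}^{5} \left( \beta_i X_i + \beta_{ii} X_i^{2} + \beta_{iii} X_i^{3} \right) + \varepsilon \qquad (2)
```

Under this form the model has 1 + 3 × 5 = 16 coefficients.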
where:
Y = Insomnia severity
Xi = Input variables (SD, SL, WASO, EA, DI)
β0 = Intercept (constant term)
βi, βii, βiii = Coefficients of the linear, quadratic, and cubic terms, respectively
ε = Error term capturing residual randomness and unmodelled effects
The intercept β0 allows the model to make meaningful predictions when all input parameters are zero. It represents the baseline insomnia severity not explained by the predictors. Without β0, the model would be forced through the origin (Y = 0 when all Xi = 0), which may not be realistic.
In practice, a significantly positive β0 indicates a baseline level of insomnia severity that exists even without the influence of factors such as SD, SL, and WASO. A β0 close to zero or negative indicates that insomnia severity is driven by the selected predictors rather than by a natural baseline level.
Step 3: Normalise Parameters Using Thresholds
Inputs were normalised according to predefined threshold intervals to ensure comparability among the parameters.
The normalised values in Equation (1) can be expressed as follows:
Step 4: Multi-Objective Optimization Approach
The LASSO regression method includes an L1 regularisation term that encourages sparsity by forcing some coefficients to exactly zero. This aids feature selection and helps prevent overfitting.
• Objective Functions:
The original objective function (before regularisation) can be defined as follows:
Modified objective function with LASSO (L1 regularisation):
where:
Yj is the actual observed value for the jth sample, and J(β) is the objective function that measures the error between predicted and actual values in the regression model.
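Standard forms of the two objective functions, consistent with the definitions above, are given below. Here Ŷj is the model prediction for the jth sample, N is the number of samples, and p is the number of non-intercept coefficients (15 under the additive cubic form with five inputs). Leaving β0 out of the penalty is an assumption, since the text does not say whether the intercept is penalised.

```latex
J(\beta) = \sum_{j=1}^{N} \left( Y_j - \hat{Y}_j \right)^{2}

J_{\text{LASSO}}(\beta) = \sum_{j=1}^{N} \left( Y_j - \hat{Y}_j \right)^{2} + \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
```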
• Constraints:
Depending on the optimisation method, the constraints may include the LASSO (L1 regularisation) constraint:
Ridge (L2 regularisation):
These constraints help prevent overfitting; the L1 constraint additionally promotes sparsity in feature selection.
• Physical or Domain-Specific Constraints:
If certain parameters must be non-negative (for example, sleep time cannot be negative):
If the parameters need to be within a known range:
• Monotonicity Constraints:
If the model must enforce monotonic relationships (e.g., severity should increase as sleep parameters worsen), constraints such as the following can be used:
The condition in Equation (7) ensures that predicted severity increases as a factor such as sleep latency or WASO worsens.
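Under the additive cubic form assumed above for Equation (1) (no interaction terms, an assumption), the monotonicity condition can be made explicit for each variable. The derivative of a cubic term is a quadratic, so a closed-form condition over the whole real line follows from its leading coefficient and discriminant:

```latex
\frac{\partial Y}{\partial X_i} = \beta_i + 2\beta_{ii} X_i + 3\beta_{iii} X_i^{2} \ge 0
\quad \text{for all admissible } X_i

% Holds for every real X_i if and only if:
\beta_{iii} > 0 \ \text{and}\ \beta_{ii}^{2} \le 3\,\beta_i\,\beta_{iii},
\quad \text{or} \quad \beta_{iii} = \beta_{ii} = 0,\ \beta_i \ge 0
```

Over a bounded range of normalised inputs, it is enough to check the derivative at the two end points and at its stationary point Xᵢ* = −βii / (3βiii) when that point lies inside the range. For SD, where fewer hours indicate worse sleep, the inequality is reversed.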
Exact Formulation of the Optimisation Problem:
Equation (8) is minimised subject to the constraints described above, with non-negativity applied to physically meaningful parameters.
This optimisation problem allows the minimisation of the estimation error while applying realistic constraints on the parameter values.
The β values are estimated from the measured data with a least squares solution, modified using LASSO (L1 regularisation), which forces some β values to exactly zero and thereby performs feature selection. If some polynomial terms are redundant, LASSO improves generalisation by removing them. The least squares regression method determines the β values using the design matrix X as follows.
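The text does not say how the LASSO problem is solved. One common solver is cyclic coordinate descent with soft-thresholding, sketched below in Python with NumPy. It is an illustration, not necessarily the authors' MATLAB routine: it minimises the unscaled objective (sum of squared errors plus λ times the L1 norm), assumes column 0 of X is the intercept and leaves it unpenalised, and the function names are hypothetical.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def lasso_coordinate_descent(X: np.ndarray, y: np.ndarray, lam: float,
                             n_iter: int = 500, tol: float = 1e-10) -> np.ndarray:
    """Minimise sum((y - X @ beta)**2) + lam * sum(|beta[1:]|) by coordinate descent.

    Column 0 of X is assumed to be the intercept column of ones (unpenalised).
    """
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0)      # ||X_k||^2 for every column
    residual = y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(n_features):
            # Correlation of column k with the partial residual (feature k added back).
            rho = X[:, k] @ residual + col_sq[k] * beta[k]
            if k == 0:
                new_bk = rho / col_sq[k]  # plain least squares update for the intercept
            else:
                # Threshold is lam / 2 because the loss has no 1/2 factor.
                new_bk = soft_threshold(rho, lam / 2.0) / col_sq[k]
            delta = new_bk - beta[k]
            if delta != 0.0:
                residual -= X[:, k] * delta
                beta[k] = new_bk
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])
    y = X @ np.array([2.0, 1.5, 0.0, -0.7]) + 0.01 * rng.normal(size=200)
    print(np.round(lasso_coordinate_descent(X, y, lam=20.0), 3))
```

With λ = 0 the update reduces to ordinary least squares; with a moderate λ the coefficient whose true value is zero is driven to exactly zero, which is the feature-selection behaviour described above.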
1. Design Matrix Construction
The model constructs the cubic polynomial regression by organising input variables into constant, linear, quadratic, and cubic terms. Matrix X can be defined as follows:
where each row represents a data sample and the columns represent the polynomial properties.
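A minimal sketch of this design matrix in Python with NumPy follows. The column order (constant, then all linear, all quadratic, and all cubic terms) and the absence of cross terms are assumptions consistent with the description above; `cubic_design_matrix` is a hypothetical name.

```python
import numpy as np


def cubic_design_matrix(inputs: np.ndarray) -> np.ndarray:
    """Build X = [1, X_i, X_i^2, X_i^3] for every input column, without cross terms.

    `inputs` has shape (n_samples, n_params); with the five parameters
    SD, SL, WASO, EA, DI the result has 1 + 3 * 5 = 16 columns, ordered as
    [constant | linear terms | quadratic terms | cubic terms].
    """
    n_samples, n_params = inputs.shape
    columns = [np.ones(n_samples)]
    for power in (1, 2, 3):  # linear, quadratic, cubic blocks
        columns.extend(inputs[:, j] ** power for j in range(n_params))
    return np.column_stack(columns)


if __name__ == "__main__":
    # Two illustrative rows of raw (un-normalised) SD, SL, WASO, EA, DI values.
    sample = np.array([[6.5, 35.0, 25.0, 70.0, 4.0],
                       [8.0, 10.0, 5.0, 0.0, 1.0]])
    X = cubic_design_matrix(sample)
    print(X.shape)  # (2, 16)
```

In practice the inputs would be normalised first, since raw cubic terms (for example, 70³ minutes for EA) make XᵀX badly conditioned.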
2. Least Squares Regression Solution
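Given the vector of observed severity scores Y, the least squares coefficients β are obtained as:
$$\beta = (X^{T}X)^{-1}X^{T}Y$$
where:
Xᵀ is the transpose of X,
(XᵀX)⁻¹ is the inverse of the Gram matrix, and
XᵀY is the vector of inner products between the design-matrix columns and the observed output values.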
Ridge regression adds a regularisation term to prevent overfitting and numerical instability:
$$\beta = (X^{T}X + \lambda I)^{-1}X^{T}Y$$
where λ is the regularisation parameter and I is the identity matrix.
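The ridge formula can be sketched in Python with NumPy as below. Solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a standard numerical choice, not something the text specifies. As in the formula, the identity matrix also penalises the intercept; `ridge_coefficients` is a hypothetical name and the demo data are synthetic.

```python
import numpy as np


def ridge_coefficients(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Solve (X^T X + lam * I) beta = X^T y without forming an explicit inverse."""
    gram = X.T @ X
    return np.linalg.solve(gram + lam * np.eye(gram.shape[0]), X.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    X = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])  # one-variable cubic design
    y = 1.0 + 0.5 * x - 0.2 * x ** 3 + 0.05 * rng.normal(size=100)
    for lam in (0.0, 0.1, 10.0):
        beta = ridge_coefficients(X, y, lam)
        cond = np.linalg.cond(X.T @ X + lam * np.eye(4))
        print(f"lambda={lam:5.1f}  beta={np.round(beta, 3)}  cond={cond:.1f}")
```

With λ = 0 the result equals the least squares solution; increasing λ lowers the condition number of the system and shrinks the coefficient vector, which is the stabilising effect described above.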
3. Prediction and Model Evaluation
The predicted severity score Ŷ = Xβ is calculated using Equation (11), with the coefficients β computed according to Equation (10).
The model accuracy was evaluated using the mean squared error (MSE) and the coefficient of determination (R²):
$$\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}\left(Y_j - \hat{Y}_j\right)^{2}, \qquad R^{2} = 1 - \frac{\sum_{j=1}^{N}\left(Y_j - \hat{Y}_j\right)^{2}}{\sum_{j=1}^{N}\left(Y_j - \bar{Y}\right)^{2}}$$
where N is the number of samples, Ŷj is the predicted value for the jth sample, and Ȳ is the mean of the actual values.
The goodness of fit of the developed model is evaluated using the MSE and R² metrics. A high MSE indicates poor fit and greater error, whereas an R² close to 1 indicates that the model fits the data well.
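Both metrics can be sketched in a few lines of Python with NumPy; the function names are illustrative. Note that R² divides the sum of squared errors (SSE), not the MSE, by the total sum of squares (SST).

```python
import numpy as np


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error: (1/N) * sum((Y_j - Yhat_j)^2)."""
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """R^2 = 1 - SSE / SST, with SST measured around the mean of the actual values."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - sse / sst)


if __name__ == "__main__":
    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_pred = np.array([2.5, 5.5, 7.0, 9.0])
    print(mse(y_true, y_pred))        # 0.125
    print(r_squared(y_true, y_pred))  # about 0.975
```

In this example SSE = 0.5 and SST = 20, so R² = 0.975. Dividing the MSE instead would give 1 − 0.125/20 = 0.99375, which overstates the fit because the error term shrinks by a factor of N.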
Pseudocode: Cubic Polynomial Regression for Predicting Insomnia Severity.
Step 1: Load Data
Input: SD, SL, WASO, EA, DI, Insomnia Severity Scores (Y)
Step 2: Normalise Data
For each parameter Xi in {SD, SL, WASO, EA, DI}:
Compute mean(Xi) and std(Xi)
Xi_normalised = (Xi - mean(Xi)) / std(Xi)
Step 3: Build Design Matrix (Cubic Polynomial Terms)
X = [1, Xi, Xi^2, Xi^3 for each normalised Xi]
Step 4: Solve Model Coefficients Using Ridge Regression
lambda = 0.1 # Regularisation parameter
beta = (X^T X + lambda * I)^(-1) * X^T * Y
Step 5: Prediction
Y_prediction = X * beta
Step 6: Compute Model Accuracy (MSE, R2)
SSE = sum((Y - Y_prediction)^2)
MSE = (1/N) * SSE
SST = sum((Y - mean(Y))^2)
R2 = 1 - (SSE / SST)
Step 7: Output Results
Print “Model Coefficients (beta):”, beta
Print “MSE:”, MSE
Print “R2:”, R2
Step 8: Plot Actual and Predicted Severity Scores
Plot Y (actual values) vs. Y_prediction (predicted values)
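A runnable Python translation of the pseudocode steps, using NumPy, is sketched below. It is not the authors' MATLAB implementation. The data are synthetic stand-ins, not the study's dataset, and the severity signal in the demo is invented purely to exercise the code. λ = 0.1 follows the pseudocode, Step 8 (plotting) is omitted to avoid a plotting dependency, and `fit_cubic_ridge` is a hypothetical name. `np.std` uses the population standard deviation (ddof=0), whereas MATLAB's `std` divides by N − 1; for N = 500 the difference is negligible.

```python
import numpy as np


def fit_cubic_ridge(inputs: np.ndarray, y: np.ndarray, lam: float = 0.1):
    """Steps 2-6 of the pseudocode: z-score, cubic design matrix, ridge fit, metrics.

    `inputs` has shape (N, 5) with columns SD, SL, WASO, EA, DI.
    Returns (beta, y_pred, mse, r2).
    """
    # Step 2: z-score normalisation per column.
    z = (inputs - inputs.mean(axis=0)) / inputs.std(axis=0)
    # Step 3: design matrix [1, Z, Z^2, Z^3], no interaction terms -> 16 columns.
    X = np.column_stack([np.ones(len(z)), z, z ** 2, z ** 3])
    # Step 4: ridge solution of (X^T X + lam I) beta = X^T y.
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    # Step 5: prediction.
    y_pred = X @ beta
    # Step 6: MSE, and R^2 computed from SSE (not MSE) relative to SST.
    sse = float(np.sum((y - y_pred) ** 2))
    mse = sse / len(y)
    r2 = 1.0 - sse / float(np.sum((y - y.mean()) ** 2))
    return beta, y_pred, mse, r2


if __name__ == "__main__":
    # Step 1: synthetic stand-in data (NOT the dataset used in the study).
    rng = np.random.default_rng(42)
    n = 500
    inputs = np.column_stack([
        rng.uniform(4, 9, n),    # SD, hours
        rng.uniform(5, 90, n),   # SL, minutes
        rng.uniform(0, 120, n),  # WASO, minutes
        rng.uniform(0, 120, n),  # EA, minutes
        rng.uniform(0, 10, n),   # DI, 0-10 scale
    ])
    # Invented severity signal plus noise, used only to exercise the code.
    y = (2.0 * (7 - inputs[:, 0]) + 0.05 * inputs[:, 1] + 0.03 * inputs[:, 2]
         + 0.02 * inputs[:, 3] + 0.8 * inputs[:, 4] + rng.normal(0, 1, n))
    beta, y_pred, mse, r2 = fit_cubic_ridge(inputs, y)
    # Step 7: output results.
    print("Model coefficients (beta):", np.round(beta, 3))
    print("MSE:", round(mse, 4))
    print("R2:", round(r2, 4))
```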
Z-score normalisation was applied to the input variables to ensure numerical stability and prevent ill-conditioning:
$$X_i^{\text{norm}} = \frac{X_i - \mu_i}{\sigma_i}$$
where μi and σi denote the mean and standard deviation of Xi, respectively.
Statistical Analysis
This study applies a custom analysis method developed by the authors for the statistical analyses, which were performed in MATLAB R2021b (The MathWorks, Inc., Natick, MA, USA). Cubic polynomial regression analysis was used to examine the relationships within the data, and model performance was measured using MSE and the coefficient of determination (R²). The significance level was set at p<0.05. Because the data come from a publicly available open-access library, ethical approval was not required; throughout the research process, the team adhered to ethical guidelines and maintained the principles of data confidentiality.
In this study, a prediction model is developed based on key sleep parameters that affect insomnia severity. The dataset used is a global sleep health dataset titled “Sleep Health and Lifestyle Dataset”, obtained from the Kaggle open data platform.32 The dataset includes health, lifestyle, and sleep information for 500 individuals, with the following variables recorded for each participant:
• Demographic information: Age, gender
• Health indicators: Body mass index, physical activity levels, stress levels, medical history
• Sleep information: SD, sleep quality, wake time, weekday/weekend sleep patterns
• Lifestyle factors: Caffeine intake, screen time, smoking and alcohol consumption, occupation, and marital status.
The study selected five parameters that directly influence insomnia prediction: SD, SL (time to fall asleep), WASO (night-time wakefulness), EA time, and the DI score. The literature recognises these parameters as indicators associated with insomnia, and the study used them as input variables in the model.
The study generated a data sample of around 500 rows based on these five parameters. Each row details the observation data of an individual.
The data were scaled using Z-score normalisation, and the cubic polynomial regression model was then trained. Model performance was evaluated using MSE and the coefficient of determination (R²). The model, implemented in the MATLAB R2021b environment, aims to accurately represent the multivariate relationships affecting insomnia severity.
Table 1 lists the scoring-system equivalents of the collected data.
According to Table 1, each parameter is scored on a 0-4 scale and evaluated as follows.
• 0: No problem at all.
• 1: Mild.
• 2: Moderate.
• 3: Severe.
• 4: Very severe.
Insomnia severity is calculated by scoring five fundamental sleep parameters on a 0-4 scale. Adding the individual scores gives an unweighted total of 0 to 20. Each parameter is also weighted according to its clinical significance (for example, the SD score is multiplied by 1.5 and the DI score by 2), so the weighted total can exceed 20.
The standard Insomnia Severity Index (ISI) categories used for comparison are as follows:
• 0-7 points: No clinically significant insomnia.
• 8-14 points: Mild insomnia.
• 15-21 points: Moderate insomnia.
• 22-28 points: Severe insomnia.
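The weighted scoring and ISI banding above can be sketched in Python. The weights 1.5 (SD) and 2 (DI) come from the text; a weight of 1 for SL, WASO, and EA is an assumption, and `WEIGHTS`, `weighted_total`, and `isi_category` are hypothetical names. The adjustment of totals above 20 to the 0-28 range is not specified, so it is not implemented; under these weights the maximum weighted total is 26.

```python
from __future__ import annotations

# Weights for SD and DI follow the text; 1.0 for SL, WASO, EA is an assumption.
WEIGHTS = {"SD": 1.5, "SL": 1.0, "WASO": 1.0, "EA": 1.0, "DI": 2.0}

# ISI bands with inclusive upper bounds.
ISI_BANDS = ((7, "No clinically significant insomnia"),
             (14, "Mild insomnia"),
             (21, "Moderate insomnia"),
             (28, "Severe insomnia"))


def weighted_total(scores: dict[str, int]) -> float:
    """Sum the 0-4 item scores after applying the clinical weights."""
    for name, score in scores.items():
        if not 0 <= score <= 4:
            raise ValueError(f"{name} score {score} is outside the 0-4 scale")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def isi_category(total: float) -> str:
    """Map a total on the 0-28 ISI range to its band (non-integers go to the next band)."""
    if not 0 <= total <= 28:
        raise ValueError("total must lie in the 0-28 ISI range")
    for upper, label in ISI_BANDS:
        if total <= upper:
            return label
    raise AssertionError("unreachable: bands cover 0-28")


if __name__ == "__main__":
    scores = {"SD": 2, "SL": 1, "WASO": 3, "EA": 0, "DI": 2}
    total = weighted_total(scores)
    print(total, isi_category(total))  # 11.0 Mild insomnia
```

For the example scores, the weighted total is 3 + 1 + 3 + 0 + 4 = 11, which falls in the mild band (8-14).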
If the total score exceeds 20, the formula below adjusts it to align with the standard ISI range (0-28).
Implementation Results
The study utilised a dataset comprising 500 values for each parameter influencing insomnia severity. The predictive model, formulated as cubic polynomial regression, was employed to investigate the relationship between these parameters and the severity of the condition.
The actual severity obtained from the data, along with the comparison with the developed model, is shown in Figure 1.
Figure 1 compares the actual insomnia severity data (blue circles) with the model-predicted severity values (red dashed line). The predicted values of the developed model are generally quite close to the actual data.
The developed model predicts insomnia severity accurately, without significant deviations. The severity values in the resulting graph range from 0 to 60, reflecting variation in initial severity levels across different conditions.
Figure 2 illustrates the distribution of the total ISI.
Figure 2 presents the data used to understand the distribution of insomnia severity at different levels. In the graph, the ISI ranges from 0 to 28. This index is a metric used to measure the severity of insomnia symptoms.
The horizontal axis (x-axis) represents the severity index, and the vertical axis (y-axis) shows the number of individuals with each index value. Most cases are concentrated at severity values of 0 to 5, with an exceptionally high number in the 0-1 range, which may mean that most individuals have very mild or no apparent insomnia symptoms. Above a severity index of 10, the number of cases decreases considerably, i.e. severe insomnia is less common.
During the training and evaluation of the developed model, analysing this distribution helps assess how accurately the model can predict insomnia cases of different severity levels. Furthermore, researchers can use the data to develop treatment or intervention strategies for insomnia.
Figure 3 illustrates the distribution of the ISI and shows the number of cases classified according to different levels of insomnia severity.
Figure 3 shows the distribution of samples by severity, categorised as negative (0-7), mild insomnia (8-14), moderate insomnia (15-21), and severe insomnia (22-28). According to the graph, approximately 340 individuals had no symptoms of insomnia (negative), the largest group. Approximately 75 individuals had mild insomnia, about 30 had moderate insomnia, and approximately 35 had severe insomnia. These results indicate that most of the individuals analysed do not experience insomnia. However, the data reveal a considerable number of mild cases, suggesting that insomnia symptoms are common but often not severe. Moderate and severe cases are relatively few and similar in number.
This distribution should be considered when evaluating the model’s prediction performance. In particular, the model’s ability to detect mild and moderate insomnia will be a key factor. Additionally, the small number of samples in the “severe insomnia” group may hinder the model’s ability to learn this category. From a clinical perspective, early identification and treatment of individuals with mild to moderate insomnia symptoms present a vital opportunity to prevent severe insomnia.
Figure 4 shows the predicted severity of insomnia in future periods.
The highest insomnia severity observed in the graph is approximately 32-34, occurring in the 2nd, 11th, and 17th periods, where sudden increases appear. The average insomnia severity is around 7-10, and in a few periods severity falls to its minimum of 0.
There is high variance in the dataset, as insomnia severity fluctuates from 0-5 at specific points in time to as high as 30 at others. These fluctuations suggest that specific periods exacerbate insomnia, resulting in its irregular variation over time.
To predict insomnia severity in future periods, time series analysis helps identify whether insomnia shows periodic increases or decreases. Sudden jumps, trend shifts, and anomalies allow potential triggering factors to be detected and investigated, and these jumps can help forecast future episodes of heightened insomnia severity.
Conclusion
This model is more practical than traditional AI solutions and integrates easily with Internet of Things devices because it requires minimal processing power. Its autonomous prediction ability enables continuous monitoring of disease progression and timely recommendation of necessary interventions. This approach can improve healthcare efficiency by reducing unnecessary delays in reaching specialists.
This study developed a cubic polynomial regression model to predict insomnia severity based on basic sleep parameters. The model, which includes quadratic and cubic terms, effectively captured the nonlinear relationships between the variables and provided a more accurate representation and prediction of the course of insomnia. The MSE value of 0.0018 indicates relatively high model performance. The model’s performance was also evaluated using R², and its R² value of 0.998 likewise indicates high accuracy. These results highlight the model’s ability to predict insomnia severity reliably.
Among the polynomial representations compared, the classical cubic model, which uses fewer coefficients, exhibits significantly higher error. The developed parametric cubic model, with more coefficients, substantially improved the fit and reduced the error. These results highlight the importance of selecting a model that balances accuracy and computational efficiency.
The findings suggest that advanced polynomial regression techniques can enhance sleep disorder assessments and provide a data-driven approach to understanding insomnia patterns.
Future studies could explore additional influencing factors and further optimise the model for clinical applications. Data from smartwatches could also provide objective and continuous follow-up. Because smartwatches can be worn comfortably by both adults and children, data collection becomes easier and more reliable. Smartwatch data could enable early diagnosis, especially in children, allowing doctors or parents to intervene before insomnia progresses. The model could also offer a significant advantage in diagnosing insomnia by reducing reliance on direct feedback from children who have difficulty expressing themselves.