Application of Gene Expression Programming (GEP) to Create a Committee Fuzzy Logic Model (CFLM) for Predicting Arsenic Concentration in Water Resources of the Sahand Dam Basin

Authors

Abstract

Consumption of arsenic-contaminated groundwater leads to numerous diseases and human deaths. When water contamination is non-point and controlled by geology, its spread cannot easily be prevented, so this type of contamination must be examined carefully. Several recent reports from the Sahand Dam study area indicate an arsenic anomaly with values above the WHO standard (0.01 mg/L). Previous research indicated that linear geostatistical models are unsuitable for predicting the total arsenic (III, V) concentration in the area, so artificial intelligence models such as gene expression programming (GEP) and fuzzy logic were used; these nature-inspired models can estimate the parameters of natural phenomena with considerable accuracy compared with other methods. To estimate total arsenic concentrations, the pH, sulfate, nitrate, fluoride, iron and arsenic values of the samples were used as inputs to the Mamdani (MFL), Larsen (LFL) and Sugeno (SFL) fuzzy models. Because the results of the three fuzzy models were suitable and similar, and in order to exploit the advantages of all three simultaneously, gene expression programming was used to produce a committee model that combines the results of the three individual fuzzy models. Given the advantages of gene expression programming and the results of the training and testing stages, with coefficients of determination (R²) of 0.967 and 0.924 and RMSE values of 0.072 and 0.096, respectively, this approach is able to provide a committee model that is more accurate than the three individual fuzzy models.

Keywords


Article Title [English]

Application of Genetic Expression Programming (GEP) to produce Intelligence Committee Fuzzy Logic model (CFLM) to predict arsenic concentration in water resources of the Sahand Dam Basin

Authors [English]

  • Fariba Sadeghi Aghdam
  • Ata Allah Nadiri
  • Asghar Asghari Moghaddam
  • Fereydoun Armanfar
Abstract [English]

Identifying and monitoring the quality of water resources in a basin is especially important for managing the quality of a dam reservoir. Most natural waters today are polluted, so monitoring how pollutants are distributed in surface water helps to control and reduce water pollution and its effects. Such information can only be obtained through analyses and pollution monitoring stations distributed across the study area. Arsenic is considered one of the most important pollutants because of its high toxicity. Natural water pollution of geological origin cannot easily be eliminated or prevented from spreading; therefore, it should be evaluated carefully. Various reports in recent years have indicated an arsenic anomaly, with concentrations above the international standard (0.01 mg/L), in the water resources of the Sahand Dam basin, which supplies the agricultural, industrial and drinking water demands of the area. Hence, the Geology Department of Tabriz University and the East Azerbaijan Regional Water Authority have sampled and chemically analyzed the surface water and groundwater resources. Groundwater models may be used for optimization with respect to one parameter or a combination of parameters, and for simulating and managing pollution. Previous research showed that linear geostatistical models are not adequate for predicting the total arsenic (III, V) concentration in the study area, so artificial intelligence models such as gene expression programming (GEP) and fuzzy logic (FL) were used; these nature-inspired models can estimate the parameters of natural phenomena with significant accuracy compared to other methods. The hydrochemical parameters with the highest correlation with arsenic were used, with 60 data points for training and 20 for testing.
These parameters, including pH, SO₄²⁻, NO₃⁻, F, Fe and As, were used as inputs to the Mamdani fuzzy logic (MFL), Larsen fuzzy logic (LFL) and Sugeno fuzzy logic (SFL) models to estimate the total arsenic concentration. A fuzzy system has three main stages: 1) fuzzification of the data by defining membership functions; 2) linking inputs to outputs through if-then rules; and 3) aggregation of the system results and defuzzification, using fuzzy operators such as or, and, and not. Each fuzzy model has its own advantages and uncertainty, and these individual benefits can be exploited together. Because the results of the three fuzzy models are similar, gene expression programming was used to produce a committee fuzzy logic model (CFLM). The underlying idea is that combining the results of several models achieves a better overall result. Several studies to date have applied different artificial intelligence methods and have demonstrated the strength of GEP. Like the genetic algorithm (GA) and genetic programming (GP), GEP works with a population of individuals, selects them according to fitness, and applies genetic changes to them using one or more genetic operators. The search starts from randomly generated candidates, which are expressed as expression trees. This process continues until the maximum number of generations or a specified value of the error function is reached.
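The three fuzzy stages described above can be illustrated with a minimal Python sketch. It assumes a single input (pH) and two rules; every center, width and coefficient is invented for illustration and none comes from the study. It also assumes the usual textbook distinction between the models: Mamdani clips each output set with `min`, Larsen scales it with a product, and Sugeno takes a weighted mean of linear rule outputs.

```python
import math

def gauss(x, center, sigma):
    """Gaussian membership function, returning a degree in (0, 1]."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

# Hypothetical two-rule base on a single input (pH). All centers, widths
# and coefficients are invented for illustration, not taken from the study.
RULES = [
    # IF pH is "low" THEN arsenic is "low"
    {"input": (6.5, 0.5), "output": (0.02, 0.02), "linear": (0.01, -0.045)},
    # IF pH is "high" THEN arsenic is "high"
    {"input": (8.0, 0.5), "output": (0.10, 0.02), "linear": (0.02, -0.06)},
]
GRID = [i * 0.001 for i in range(151)]  # assumed output universe, 0 to 0.15 mg/L

def firing_strengths(x):
    """Stage 1: fuzzify the crisp input against each rule's input set."""
    return [gauss(x, *rule["input"]) for rule in RULES]

def centroid_inference(x, implication):
    """Stages 2 and 3: apply each rule, aggregate with max, take the centroid."""
    strengths = firing_strengths(x)
    num = den = 0.0
    for y in GRID:
        mu = max(implication(w, gauss(y, *rule["output"]))
                 for w, rule in zip(strengths, RULES))
        num += y * mu
        den += mu
    return num / den

def mamdani(x):
    return centroid_inference(x, min)  # min implication clips each output set

def larsen(x):
    return centroid_inference(x, lambda w, mu: w * mu)  # product implication scales it

def sugeno(x):
    """Each rule outputs a*x + b; the result is the strength-weighted mean."""
    strengths = firing_strengths(x)
    outputs = [a * x + b for a, b in (rule["linear"] for rule in RULES)]
    return sum(w * o for w, o in zip(strengths, outputs)) / sum(strengths)
```

Calling `mamdani(7.2)`, `larsen(7.2)` and `sugeno(7.2)` on the same input gives three similar but not identical estimates, which is the situation a committee model is meant to exploit.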
The fuzzy model was built with an optimal radius of 0.4, selected on the basis of the lowest RMSE. The data were divided into 8 categories, and 7 if-then rules were determined. The fuzzy membership functions used for modeling the arsenic values were Gaussian, fitted to the classified data. The output membership function of the Sugeno model was linear and built from the inputs. The FCM (fuzzy c-means) clustering method was used in the Mamdani and Larsen models. For these models, an optimal number of 12 categories was determined based on minimum RMSE values of 0.11 and 0.12 mg/L, respectively, with Gaussian input and output membership functions. The R² values for the training stage of the Mamdani and Larsen models were 0.94 and 0.91, respectively. All three fuzzy models gave acceptable results, but the Mamdani results were somewhat better than those of the other two. Because each model has its own strengths, a committee model was used to exploit the advantages of all three simultaneously. The outputs of the three fuzzy models were used as inputs to the GEP model, and the data were selected so that the minimum and maximum values fell in the testing set. The initial population was generated with 20 chromosomes, a head size of 7, 3 genes, and 2 constants per gene. Addition (+) was selected as the linking function between subtrees. To compare results, three function sets were used as the main operators. The F3 function set, which includes the default operators, was selected as the main function set because it gave the best fit. By providing an explicit relationship between input and output, and with more accurate results in the training and testing stages (R² of 0.97 and 0.92, respectively), the GEP model was evaluated as the most appropriate model for estimating arsenic values in the region.
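The GEP settings quoted above (head size 7, 3 genes, + as the linking function) can be illustrated with a minimal Python sketch of how a chromosome is decoded and evaluated. Several things here are assumptions: the function set {+, -, *} stands in for F3, whose operators the abstract does not list; the terminal names `m`, `l`, `s` are hypothetical labels for the Mamdani, Larsen and Sugeno outputs; and the chromosome is invented, not the evolved model. The random constants and the evolutionary loop are omitted.

```python
import operator

# Assumed function set; the abstract does not list the operators in F3.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
ARITY = 2            # every function in this sketch takes two arguments
TERMINALS = "mls"    # hypothetical labels: Mamdani, Larsen, Sugeno outputs

HEAD = 7                          # head size reported in the abstract
TAIL = HEAD * (ARITY - 1) + 1     # standard GEP tail length, here 8
GENE_LENGTH = HEAD + TAIL         # 15 symbols per gene

def decode(gene):
    """Read a K-expression breadth-first into a nested [symbol, child, child] tree."""
    root = [gene[0]]
    queue = [root]
    pos = 1
    while queue:
        node = queue.pop(0)
        if node[0] in FUNCTIONS:
            for _ in range(ARITY):
                child = [gene[pos]]
                pos += 1
                node.append(child)
                queue.append(child)
    return root

def evaluate(node, values):
    """Recursively evaluate a decoded expression tree."""
    if node[0] in FUNCTIONS:
        left, right = (evaluate(child, values) for child in node[1:])
        return FUNCTIONS[node[0]](left, right)
    return values[node[0]]

def committee(chromosome, values):
    """Link the sub-expression trees of all genes with addition."""
    return sum(evaluate(decode(gene), values) for gene in chromosome)

# Invented 3-gene chromosome; each gene is a 7-symbol head plus an 8-symbol tail.
CHROMOSOME = [
    "+*mlsmm" + "mlsmlsml",   # decodes to (l * s) + m
    "-slmmls" + "smlsmlsm",   # decodes to s - l
    "*m-lsmm" + "lsmlsmls",   # decodes to m * (l - s)
]
```

A call such as `committee(CHROMOSOME, {"m": 0.05, "l": 0.04, "s": 0.06})` returns one combined estimate from the three fuzzy outputs. Symbols past the end of each decoded tree are unused, which is why the fixed-length tail guarantees that every gene decodes to a valid expression.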
In this study, GEP, with its practical features and its production of gene expression trees, made it possible to evaluate complicated, non-linear models. The genetic programming model also provided explicit, highly accurate solutions from which the relationship between input and output variables can be determined. Given the suitability and similarity of the three fuzzy model results, gene expression programming was used to produce a committee model from the results of the three individual models. Given the benefits of gene expression programming, this approach is able to provide a committee model that is more accurate than the three individual fuzzy models. Because spatial statistical models do not adequately estimate arsenic in the study area, the proposed model can be appropriate for determining it accurately.
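The accuracy measures quoted in this abstract, R² and RMSE, can be computed as in the following minimal sketch. It assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; the abstract does not state which definition was used, and some studies report the squared Pearson correlation instead. The function names are illustrative.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Applying both functions to the training and testing predictions of each fuzzy model and of the committee model gives the kind of side-by-side comparison the abstract reports.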

Keywords [English]

  • Sahand Dam
  • Genetic programming
  • Artificial Intelligence
  • Arsenic
  • Committee model
  • Fuzzy