In this paper, issues concerning feature patterns in terms of both feature composition and feature interdependence are discussed, and the concepts of typicality and diversity of an ensemble are formulated. The features of the specimens investigated are organized in a two-dimensional array, called an observation matrix, with each row vector representing the ordered set of features of a specimen. An algorithm (based upon the proposed measures and statistical screening) is implemented for extracting feature patterns. In the algorithm, schemes for feature patterns and specimen reweighting are proposed to optimize the utilization of available information in the array, and to minimize possible bias caused by the uneven sampling of the ensemble. Two sets of real-world data in the environmental and molecular biology areas are used to exemplify the physical meaning of the proposed measures as well as to demonstrate the operational feasibility and significance of this methodology in analyzing a homologous ensemble that is subject to variable degrees of diversity.

There are many applications in which the information to be encoded is expressed as an interval of values. This happens in particular when data are processed by prediction over a given time horizon, as in the study of the transient response of a control system or the analysis of a company's financial evolution. This work presents and develops the definition of a measure between intervals based on a Euclidean distance. This distance takes the size of the intervals and their relative position on the real line as its essential characteristics. Thanks to the close relationship between distance and kernel in a Hilbert space, it becomes possible to define an interval kernel on the space of open intervals of finite dimension on the real line, a space that originally has no working algebraic structure. The analysis of several practical examples demonstrates the performance of the tools presented here.

Discrete values have important roles in data mining and knowledge discovery. They are about intervals of numbers, which are more concise to represent and specify, and easier to use and comprehend, as they are closer to a knowledge-level representation than continuous values. Many studies show that induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable, and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms found in the literature require discrete features. All of this prompts researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. There are numerous discretization methods available in the literature. It is time to examine these seemingly different methods and find out how different they really are, what the key components of a discretization process are, and how the current level of research can be improved, both for new development and for the use of existing methods. This paper aims at a systematic study of discretization methods with their history of development, effect on classification, and trade-off between speed and accuracy. The contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework to categorize the existing methods and pave the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and some guidelines on how to choose a discretization method under various circumstances. We also identify some issues yet to be solved and future research directions for discretization.
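To make the idea of replacing continuous values with intervals concrete, here is a minimal sketch of one simple unsupervised scheme, equal-width binning. The text above does not single out any particular method, so the choice of equal-width binning and the function names `equal_width_cut_points` and `discretize` are assumptions made purely for illustration.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points that split the
    range [min(values), max(values)] into n_bins equal-width intervals."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.
    A value equal to a cut point is assigned to the interval on its right."""
    return [bisect_right(cut_points, v) for v in values]


# Illustrative data (hypothetical, not taken from any of the papers).
values = [1.0, 2.5, 4.0, 5.5, 7.0, 10.0]
cuts = equal_width_cut_points(values, 3)
print(cuts)                      # interior boundaries of the three intervals
print(discretize(values, cuts))  # interval index for each value
```

Each continuous value is replaced by the index of its interval, which is the kind of discrete feature that many induction algorithms expect. Supervised methods, which choose cut points using class labels, would need a different boundary-selection step.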