OFP_CLASS: a hybrid method to generate optimized fuzzy partitions for classification

Abstract  

The discretization of values plays a critical role in data mining and knowledge discovery. The representation of information
through intervals is more concise and easier to understand at certain levels of knowledge than the representation by means of
continuous values. In this paper, we propose a method for discretizing continuous attributes by means of fuzzy sets, which
constitute a fuzzy partition of the domains of these attributes. This method carries out a fuzzy discretization of continuous
attributes in two stages. A fuzzy decision tree is used in the first stage to propose an initial set of crisp intervals, while
a genetic algorithm is used in the second stage to define the membership functions and the cardinality of the partitions.
After defining the fuzzy partitions, we evaluate and compare them with previously existing ones in the literature.
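
As a rough illustration of what a fuzzy partition built from crisp intervals can look like, the sketch below turns a list of cut points into trapezoidal fuzzy sets whose memberships sum to 1 at every point of the domain. This is not the OFP_CLASS procedure itself: the function names and the fixed `overlap` parameter are assumptions made for the example, whereas in the paper the membership functions and the number of partitions are tuned by a genetic algorithm.

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoid with support [a, d] and core [b, c]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge


def fuzzy_partition(lo, hi, cuts, overlap):
    """Build one trapezoid per crisp interval (hypothetical helper).

    Each interior cut point t becomes a linear transition over
    [t - overlap, t + overlap]. Assumes the intervals are wide enough
    that neighbouring transitions do not collide.
    """
    edges = [lo] + sorted(cuts) + [hi]
    last = len(edges) - 2
    sets = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        a = left if i == 0 else left - overlap
        b = left if i == 0 else left + overlap
        c = right if i == last else right - overlap
        d = right if i == last else right + overlap
        sets.append((a, b, c, d))
    return sets


def memberships(x, sets):
    """Degree of membership of x in every fuzzy set of the partition."""
    return [trapezoid(x, *s) for s in sets]


if __name__ == "__main__":
    parts = fuzzy_partition(0.0, 10.0, [4.0], 1.0)
    for x in (2.0, 3.5, 4.0, 7.0):
        print(x, memberships(x, parts))
```

A value lying exactly on a cut point belongs to both neighbouring sets with degree 0.5, which is the behaviour a crisp discretization cannot express.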

  • Content Type Journal Article
  • Category Original Paper
  • Pages 1-16
  • DOI 10.1007/s00500-011-0778-0
  • Authors
    • Jose M. Cadenas, Dept. de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
    • M. Carmen Garrido, Dept. de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
    • Raquel Martínez, Dept. de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
    • Piero P. Bonissone, GE Global Research, One Research Circle, Niskayuna, NY 12309, USA

How landscape ruggedness influences the performance of real-coded algorithms: a comparative study

Abstract  

Ruggedness has a strong influence on the performance of algorithms, but it has been barely studied in real-coded optimization,
mainly because of the difficulty of isolating it from a number of involved topological properties. In this paper, we propose
a framework consisting of increasing ruggedness function sets built by a mechanism which generates multiple funnels. This
mechanism introduces different levels of sinusoidal distortion which can be controlled to isolate the singular influence of
some related features. Some commonly used measures of ruggedness have been applied to analyze these sets of functions, and
a numerical study to compare the performance of some representative algorithms has been carried out. The results confirm that
ruggedness has an influence on the performance of the algorithm, proving that it depends on the multi-funnel structure and
peak features, such as height and relative size of the global peak, and not on the number of peaks.

  • Content Type Journal Article
  • Category Original Paper
  • Pages 1-16
  • DOI 10.1007/s00500-011-0781-5
  • Authors
    • Jesús Marín, Department of Automatic Control (ESAII), Universitat Politècnica de Catalunya, EUETIB, Urgell 187, 08036 Barcelona, Spain

Towards interval-based non-additive deconvolution in signal processing

Abstract  

Reconstructing a signal from its observations via a sensor device is usually called “deconvolution”. Such reconstruction requires
perfect knowledge of the impulse response of the sensor involved in the signal measurement. The lower this knowledge, the
more biased the reconstruction. In this paper, we present a novel method for reconstructing a signal measured by a sensor
whose impulse response is imprecisely known. This technique is based on modeling the relationship between the measurement
and the signal via a concave capacity and extending the convolution concept to a concave set of impulse responses. The reconstructed
signal is interval-valued, thus reflecting the poor knowledge of the sensor impulse response.

  • Content Type Journal Article
  • Category Focus
  • Pages 1-12
  • DOI 10.1007/s00500-011-0771-7
  • Authors
    • Olivier Strauss, LIRMM Université Montpellier II, 161 rue Ada, 34392 Montpellier cedex 5, France
    • Agnès Rico, LIRMM Université Montpellier II, 161 rue Ada, 34392 Montpellier cedex 5, France

Partially supervised Independent Factor Analysis using soft labels elicited from multiple experts: application to railway track circuit diagnosis

Abstract  

Using a statistical model in a diagnosis task generally requires a large amount of labeled data. When ground truth information
is not available, too expensive or difficult to collect, one has to rely on expert knowledge. In this paper, it is proposed
to use partial information from domain experts expressed as belief functions. Expert opinions are combined in this framework
and used with measurement data to estimate the parameters of a statistical model using a variant of the EM algorithm. The
particular application investigated here concerns the diagnosis of railway track circuits. A noiseless Independent Factor
Analysis model is postulated, assuming the observed variables extracted from railway track inspection signals to be generated
by a linear mixture of independent latent variables linked to the system component states. Usually, learning with this statistical
model is performed in an unsupervised way using unlabeled examples only. In this paper, it is proposed to handle this learning
process in a soft-supervised way using imperfect information on the system component states. Fusing partially reliable information
about cluster membership is shown to significantly improve classification results.

  • Content Type Journal Article
  • Category Focus
  • Pages 1-14
  • DOI 10.1007/s00500-011-0766-4
  • Authors
    • Zohra L. Cherfi, GRETTIA, French Institute of Science and Technology for Transport, Development and Networks, Université Paris-Est, Marne-la-Vallée, France
    • Latifa Oukhellou, LISSI, Université Paris-Est Créteil, Créteil, France
    • Etienne Côme, GRETTIA, French Institute of Science and Technology for Transport, Development and Networks, Université Paris-Est, Marne-la-Vallée, France
    • Thierry Denœux, HEUDIASYC, Université de Technologie de Compiègne, UMR CNRS 6599, Compiègne, France
    • Patrice Aknin, GRETTIA, French Institute of Science and Technology for Transport, Development and Networks, Université Paris-Est, Marne-la-Vallée, France

Extending information processing in a Fuzzy Random Forest ensemble

Abstract  

Imperfect information inevitably appears in real situations for a variety of reasons. Although efforts have been made to incorporate
imperfect data into classification techniques, there are still many limitations as to the type of data, uncertainty, and imprecision
that can be handled. In this paper, we will present a Fuzzy Random Forest ensemble for classification and show its ability
to handle imperfect data in the learning and classification phases. Then, we will describe the types of imperfect data
it supports. We will devise an augmented ensemble that can operate with other types of imperfect data: crisp, missing, probabilistic
uncertainty, and imprecise (fuzzy and crisp) values. Additionally, we will perform experiments with imperfect datasets created
for this purpose and datasets used in other papers to show the advantage of being able to express the true nature of imperfect
information.

  • Content Type Journal Article
  • Category Focus
  • Pages 1-17
  • DOI 10.1007/s00500-011-0777-1
  • Authors
    • Jose M. Cadenas, Dept. Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
    • M. Carmen Garrido, Dept. Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
    • Raquel Martínez, Dept. Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
    • Piero P. Bonissone, GE Global Research, One Research Circle, Niskayuna, NY 12309, USA

Missing data imputation for fuzzy rule-based classification systems

Abstract  

Fuzzy rule-based classification systems (FRBCSs) are known for their ability to deal with low-quality data and obtain
good results in these scenarios. However, their application to problems with missing data is uncommon, even though real-life
data are frequently incomplete because of missing attribute values. Several
schemes have been studied to overcome the drawbacks produced by missing values in data mining tasks; one of the best
known is based on preprocessing, formally known as imputation. In this work, we focus on FRBCSs and present and analyze 14 different approaches
to the treatment of missing attribute values. The analysis involves three different methods, in which
we distinguish between Mamdani and TSK models. The results obtained show the convenience of using imputation methods for
FRBCSs with missing values. The analysis suggests that each type behaves differently, while the use of particular
missing-value imputation methods could improve the accuracy obtained for these methods. Thus, the use of particular imputation
methods conditioned to the type of FRBCS is required.

  • Content Type Journal Article
  • Category Focus
  • Pages 1-19
  • DOI 10.1007/s00500-011-0774-4
  • Authors
    • Julián Luengo, Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain
    • José A. Sáez, Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain
    • Francisco Herrera, Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain

Mining fuzzy association rules from low-quality data

Abstract  

Data mining is most commonly used in attempts to induce association rules from databases which can help decision-makers easily
analyze the data and make good decisions regarding the domains concerned. Different studies have proposed methods for mining
association rules from databases with crisp values. However, the data in many real-world applications have a certain degree
of imprecision. In this paper we address this problem, and propose a new data-mining algorithm for extracting interesting
knowledge from databases with imprecise data. The proposed algorithm integrates imprecise data concepts and the fuzzy apriori
mining algorithm to find interesting fuzzy association rules in given databases. Experiments for diagnosing dyslexia in early
childhood were made to verify the performance of the proposed algorithm.

  • Content Type Journal Article
  • Category Focus
  • Pages 1-19
  • DOI 10.1007/s00500-011-0775-3
  • Authors
    • A. M. Palacios, Department of Computer Science, University of Oviedo, 33204 Gijón, Spain
    • M. J. Gacto, Department of Computer Science, University of Jaén, 23071 Jaén, Spain
    • J. Alcalá-Fdez, Department of Computer Science and Artificial Intelligence, CITIC-UGR, University of Granada, 18071 Granada, Spain

A fuzzy regression model based on distances and random variables with crisp input and fuzzy output data: a case study in biomass production

Abstract  

The least-squares technique is well known and widely used to determine the coefficients of an explanatory model from observations
based on a concept of distance. Traditionally, the observations consist of pairs of numeric values. However, in many real-life
problems, the independent or explanatory variable can be observed precisely (for instance, the time) and the dependent or
response variable is usually described by approximate values, such as “about £300” or “approximately $500”, instead of exact values,
due to sources of uncertainty that may affect the response. In this paper,
we present a new technique to obtain fuzzy regression models that consider triangular fuzzy numbers in the response variable.
The procedure solves linear and non-linear problems, is easy to compute in practice, and may be applied in different contexts.
The usefulness of the proposed method is illustrated using simulated and real-life examples.
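
To make the setting concrete, here is a minimal sketch of least squares with crisp inputs and triangular fuzzy outputs. It assumes each response is a triple (left, peak, right) and measures the distance between two triangular numbers as the sum of squared differences of their three vertices; under that assumed distance, a linear model separates into three ordinary least-squares fits. The distance, the function names and the linear model are choices made for this example and are not taken from the paper.

```python
def ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_triangular(xs, ys):
    """Fit one line per vertex of the triangular responses.

    ys is a list of (left, peak, right) triples. Returns three
    (intercept, slope) pairs, one for each vertex.
    """
    return tuple(ols(xs, [y[k] for y in ys]) for k in range(3))


def predict(model, x):
    """Predicted triangular number (left, peak, right) at crisp input x."""
    return tuple(a + b * x for a, b in model)


if __name__ == "__main__":
    # Hypothetical data: "about 300", "about 320", ... observed at times 0..3.
    times = [0.0, 1.0, 2.0, 3.0]
    responses = [(290, 300, 315), (310, 320, 335), (330, 340, 355), (350, 360, 375)]
    model = fit_triangular(times, responses)
    print(predict(model, 4.0))
```

Because the three fits are independent, nothing in this sketch forces left ≤ peak ≤ right in a prediction; a practical method would need a constraint or a check for that.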

  • Content Type Journal Article
  • Category Focus
  • Pages 1-11
  • DOI 10.1007/s00500-011-0769-1
  • Authors
    • C. Roldán, Department of Statistics and Operations, University of Jaén, Las Lagunillas s/n, Jaén, Spain
    • A. Roldán, Department of Statistics and Operations, University of Jaén, Las Lagunillas s/n, Jaén, Spain
    • J. Martínez-Moreno, Department of Mathematics, University of Jaén, Las Lagunillas s/n, Jaén, Spain

A cooperative coevolutionary approach dealing with the skull–face overlay uncertainty in forensic identification by craniofacial superimposition

Abstract  

Craniofacial superimposition is a forensic process where photographs or video shots of a missing person are compared with
the skull that is found. By projecting both photographs on top of each other (or, even better, matching a scanned three-dimensional
skull model against the face photo/video shot), the forensic anthropologist can try to establish whether that is the same
person. The whole process is influenced by inherent uncertainty mainly because two objects of different nature (a skull and
a face) are involved. In previous work, we categorized the different sources of uncertainty and introduced the use of imprecise
landmarks to tackle most of them. In this paper, we propose a novel approach, a cooperative coevolutionary algorithm, to deal
with the use of imprecise cephalometric landmarks in the skull–face overlay process, the main task in craniofacial superimposition.
Following this approach we are able to look for both the best projection parameters and the best landmark locations at the
same time. Coevolutionary skull–face overlay results are compared with our previous fuzzy-evolutionary automatic method. Six
skull–face overlay problem instances corresponding to three real-world cases solved by the Physical Anthropology Lab at the
University of Granada (Spain) are considered. Promising results have been achieved, dramatically reducing the run time while
improving the accuracy and robustness.

  • Content Type Journal Article
  • Category Focus
  • Pages 1-12
  • DOI 10.1007/s00500-011-0770-8
  • Authors
    • O. Ibáñez, European Centre for Soft Computing, 33600 Mieres, Asturias, Spain
    • O. Cordón, European Centre for Soft Computing, 33600 Mieres, Asturias, Spain
    • S. Damas, European Centre for Soft Computing, 33600 Mieres, Asturias, Spain

New algorithms for finding approximate frequent item sets

Abstract  

In standard frequent item set mining a transaction supports an item set only if all items in the set are present. However,
in many cases this is too strict a requirement that can render it impossible to find certain relevant groups of items. By
relaxing the support definition, allowing for some items of a given set to be missing from a transaction, this drawback can
be amended. The resulting item sets have been called approximate, fault-tolerant or fuzzy item sets. In this paper we present
two new algorithms to find such item sets: the first is an extension of item set mining based on cover similarities and computes
and evaluates the subset size occurrence distribution with a scheme that is related to the Eclat algorithm. The second employs
a clustering-like approach, in which the distances are derived from the item covers with distance measures for sets or binary
vectors and which is initialized with a one-dimensional Sammon projection of the distance matrix. We demonstrate the benefits
of our algorithms by applying them to a concept detection task on the 2008/2009 Wikipedia Selection for schools and to the
neurobiological task of detecting neuron ensembles in (simulated) parallel spike trains.
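
The relaxed support definition described above can be sketched in a few lines. The function below counts a transaction as supporting an item set when at most `max_missing` of the set's items are absent; with `max_missing = 0` it reduces to standard support. It illustrates only the definition, not either of the two algorithms presented in the paper, and the function and parameter names are assumptions.

```python
def fault_tolerant_support(transactions, item_set, max_missing):
    """Count transactions that lack at most max_missing items of item_set."""
    items = set(item_set)
    return sum(1 for t in transactions if len(items - set(t)) <= max_missing)


if __name__ == "__main__":
    transactions = [
        {"a", "b", "c"},
        {"a", "b"},
        {"a"},
        {"b", "c", "d"},
    ]
    for k in range(3):
        print(k, fault_tolerant_support(transactions, {"a", "b", "c"}, k))
```

Scanning every transaction for every candidate set is only practical on small data, which is the kind of cost that dedicated mining schemes are designed to avoid.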

  • Content Type Journal Article
  • Category Focus
  • Pages 1-15
  • DOI 10.1007/s00500-011-0776-2
  • Authors
    • Christian Borgelt, European Centre for Soft Computing, c/ Gonzalo Gutiérrez Quirós s/n, 33600 Mieres (Asturias), Spain
    • Christian Braune, European Centre for Soft Computing, c/ Gonzalo Gutiérrez Quirós s/n, 33600 Mieres (Asturias), Spain
    • Tobias Kötter, Department of Computer Science, University of Konstanz, Box 712, 78457 Constance, Germany
    • Sonja Grün, RIKEN Brain Science Institute, Wako-Shi, Saitama 351-0198, Japan