
Accepted Papers

  • Application of Distributed Data Mining Techniques for Email Forensics
    Salhi Dhai Eddine1, Tari Abdelkamal2 and Kechadi M-Tahar1, 1University of Bejaia, Algeria, 2University College Dublin, Ireland
    ABSTRACT
    Nowadays, email has become one of the most popular means of daily communication accessible via the Internet. In our inboxes we receive malicious emails, but we do not always know how to recognize them.
    From this observation, the idea of building an automatic checking system becomes a necessity.
    To this end, in this paper we present a new method for processing emails in order to extract the bad emails from a mail server or a user's inbox, using distributed data mining techniques. This study reduces the risk of email users being hacked and also helps mail server administrators detect bad emails and make their servers more secure.
  • Visualization Of A Synthetic Representation Of Association Rules To Assist Expert Validation
    Amdouni Hamida1 and Gammoudi Mohamed Mohsen2, 1FST, University of Tunis El Manar, 2ISAMM, University of Manouba, Tunisia
    ABSTRACT
    In order to help the expert validate association rules, several quality measures have been proposed in the literature. We distinguish two categories: objective and subjective measures. The first depends on a fixed threshold and on the structure of the data from which the rules are extracted. The second has two subcategories: the first consists in providing the expert with a tool for interactive rule exploration, presenting the rules in textual form; the second includes the use of visualization systems to facilitate the rule mining task. However, this last subcategory assumes that experts have the statistical knowledge required to interpret and validate association rules. Furthermore, statistical methods lack semantic representation and cannot help the experts during the validation process. To solve this problem, we propose in this paper a method that presents to the expert a synthetic representation of association rules as a formal conceptual graph (FCG). The FCG represents the expert's area of interest and, thanks to its semantic richness, allows them to carry out the rule mining task easily.
  • Planning Based On Classification By Induction Graph
    Sofia Benbelkacem, Baghdad Atmani and Mohamed Benamina, University of Oran, Algeria
    ABSTRACT
    In Artificial Intelligence, planning refers to a research area that aims to develop systems able to automatically generate a set of actions, produced as an integrated decision-making system through a formal procedure, known as a plan. Instead of resorting to scheduling algorithms to generate plans, we propose to use automatic learning by decision tree in order to optimize time. In this paper, we propose to build a classification model by induction graph from a learning sample containing plans, each associated with a set of descriptors whose values change from one plan to another. This model is then used to classify new cases by assigning them the appropriate plan.
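The classification step described in this abstract can be illustrated with an off-the-shelf decision-tree learner. The sketch below is not the authors' induction-graph system; the descriptor names and plan labels are invented for the example.

```python
# Illustrative sketch only -- not the authors' induction-graph implementation.
# Descriptor names and plan labels below are invented for the example.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Each row is a case described by numeric descriptors; the target is the plan to apply.
X = [
    [2, 0, 5],   # hypothetical descriptors: resources, urgency, duration
    [1, 1, 3],
    [4, 0, 8],
    [3, 1, 2],
    [2, 1, 7],
    [5, 0, 1],
]
y = ["plan_A", "plan_B", "plan_A", "plan_B", "plan_A", "plan_C"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Train a decision tree that maps case descriptors to a plan label.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Classify a new case by assigning it the plan predicted by the tree.
print(clf.predict([[3, 0, 6]]))
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```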
  • Transformation Rules For Building OWL Ontologies From Relational Databases
    Mohammed Reda Chbihi Louhdi1, Hicham Behja2 and Said Ouatik El Alaoui1, 1Dhar El Mehraz, Fez, 2Ecole Nationale Superieure d'Electricite et de Mecanique, Casablanca, Morocco
    ABSTRACT
    Relational Databases (RDB) are used as the backend database of most information systems. RDBs encapsulate the conceptual model and the metadata needed for ontology construction. Schema mapping is the technique used by all existing approaches for building ontologies from RDBs. However, most of those methods use poor transformation rules that prevent advanced database mining for building rich ontologies. In this paper, we propose transformation rules for building OWL ontologies from RDBs. They allow transforming all possible cases in RDBs into ontological constructs. The proposed rules are enriched by analyzing the stored data to detect disjointness and totalness constraints in hierarchies, and by calculating the participation level of tables in n-ary relations. In addition, our technique is generic; hence it can be applied to any RDB. The proposed rules were evaluated using a normalized and open RDB. The obtained ontology is richer in terms of non-taxonomic relationships.
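To give a feel for what such schema-to-ontology rules produce, here is a hedged sketch using the Python rdflib library on an assumed two-table schema (Customer, and Order with a foreign key to Customer). It covers only three elementary mappings and is not the paper's full rule set.

```python
# Hedged sketch: maps an assumed relational schema (Customer, Order with a FK
# to Customer) to OWL constructs; this is NOT the paper's full rule set.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS, XSD

EX = Namespace("http://example.org/onto#")
g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

# Rule: each table becomes an OWL class.
for table in ("Customer", "Order"):
    g.add((EX[table], RDF.type, OWL.Class))

# Rule: each non-key column becomes a datatype property.
g.add((EX.customerName, RDF.type, OWL.DatatypeProperty))
g.add((EX.customerName, RDFS.domain, EX.Customer))
g.add((EX.customerName, RDFS.range, XSD.string))

# Rule: a foreign key (Order.customer_id -> Customer) becomes an object property.
g.add((EX.placedBy, RDF.type, OWL.ObjectProperty))
g.add((EX.placedBy, RDFS.domain, EX.Order))
g.add((EX.placedBy, RDFS.range, EX.Customer))

print(g.serialize(format="turtle"))
```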
  • Demand-Driven Asset Reutilization Analytics
    Abbas R. Ali, Dr. Pitipong J. Lin and Raul Zeng, IBM, United Kingdom
    ABSTRACT
    Manufacturers have long benefitted from reusing returned products and parts. This approach helps contain costs and allows the manufacturer to play a role in sustaining the environment. Reusing returned products and parts aids sustainability by reducing the use of raw materials, eliminating the energy needed to produce new parts, and minimizing waste. However, handling returns effectively and efficiently can be difficult if the processes and systems do not provide the visibility necessary to track, manage, and re-use the returns.
    This paper applies advanced analytics to procurement data in order to increase reutilization in new builds by optimizing the return of Equal-to-New (ETN) parts. This reduces the spend on newly bought parts for building new product units. The process involves forecasting returns and matching their supply to the demand for new builds. The complexity lies in this forecasting and matching while ensuring that a reutilization engineering process is available. The analysis also identifies high demand, value, and yield parts for Development Engineering to focus on.
  • Customer Relationship Management by Semi-Supervised Learning
    Siavash Emtiyaz and Shilan Rahmani Azar, Sardasht Branch, Islamic Azad University, Iran
    ABSTRACT
    With the increase of customer information and the rapid change of customer requirements, the need for automated intelligent systems is becoming more vital. An automated system reduces human intervention, improves the quality of the extracted information, and provides fast feedback for decision-making purposes. This study investigates the use of semi-supervised learning for the management and analysis of customer-related data warehouses and information. The idea of semi-supervised learning is to learn not only from the labeled training data, but also to exploit the structural information in additionally available unlabeled data. The proposed semi-supervised method builds a model by means of a feed-forward neural network (multi-layer perceptron) trained with the backpropagation algorithm in order to predict the category of an unknown customer (potential customer). In addition, this technique can be used with Rapid Miner tools for both labeled and unlabeled data.
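The idea of exploiting unlabeled customers can be sketched with a generic self-training wrapper around a multi-layer perceptron, as in the minimal example below; the data are synthetic and this is not the authors' Rapid Miner workflow.

```python
# Hedged sketch of the general idea (self-training an MLP on partially
# labelled customer data); not the authors' workflow, and the features and
# labels here are synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # e.g. customer attributes
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # e.g. "potential customer" flag

# Pretend only ~20% of customers are labelled; mark the rest as -1 (unlabelled).
y_partial = y.copy()
unlabelled = rng.random(len(y)) > 0.2
y_partial[unlabelled] = -1

# Feed-forward network (multi-layer perceptron) trained by backpropagation,
# wrapped in a self-training loop that pseudo-labels confident predictions.
base = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
model = SelfTrainingClassifier(base, threshold=0.8)
model.fit(X, y_partial)

print("accuracy against all (true) labels:", model.score(X, y))
```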
  • Membership Calculation Based on Dimension Hierarchical Division
    Jinlei Wang1,2, Ping Zhou1, Xiankai Chen2 and Guanjun Zhang2, 1Guilin University of Electronic Technology, Guilin, China, 2Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
    ABSTRACT
    Since datasets usually contain noise, it is very helpful to detect and remove it in a preprocessing step. Fuzzy membership can measure a sample's weight: the weight should be smaller for noisy samples and larger for important ones. Therefore, appropriate sample memberships are vital. In this paper, we propose a novel approach, Membership Calculation based on Hierarchical Division (MCHD), to calculate the membership of training samples. MCHD uses the concept of dimension similarity and develops a bottom-up clustering technique to calculate sample memberships iteratively, taking into account the membership weights computed at each iteration. Experiments indicate that MCHD can effectively detect noise and remove it from the dataset. A fuzzy support vector machine based on MCHD outperforms most recently published approaches and shows better generalization ability in handling noise.
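For readers unfamiliar with how fuzzy memberships enter SVM training, the sketch below shows a generic distance-to-centre weighting fed into a weighted SVM. It only illustrates the role memberships play; it does not implement MCHD's hierarchical, dimension-wise calculation.

```python
# Generic illustration of fuzzy memberships used as SVM sample weights
# (distance-to-class-centre heuristic); this is NOT the MCHD algorithm,
# which builds memberships by hierarchical, dimension-wise clustering.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Simple membership: samples far from their class centre (likely noise)
# receive smaller weights, samples near the centre receive larger weights.
weights = np.empty(len(y))
for c in np.unique(y):
    idx = np.where(y == c)[0]
    centre = X[idx].mean(axis=0)
    dist = np.linalg.norm(X[idx] - centre, axis=1)
    weights[idx] = 1.0 - dist / (dist.max() + 1e-6)

clf = SVC(kernel="rbf")
clf.fit(X, y, sample_weight=weights)
print("training accuracy with fuzzy weights:", clf.score(X, y))
```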
  • Clustering Methodology Applied in MANETs for Energy Awareness and Power Optimisation
    Sajal Kanta Das, Women's Polytechnic, Hapania, Agartala, India
    ABSTRACT
    The study of Mobile Ad hoc Networks (MANETs) remains attractive due to the desire to achieve better performance and scalability. MANETs are distributed systems consisting of mobile hosts connected by multi-hop wireless links. Such systems are self-organized and facilitate communication in the network without any centralized administration. MANETs exhibit battery power constraints and suffer from scalability issues; cluster formation is therefore expensive, owing to the large number of messages passed during the cluster formation process.
    Clustering has evolved as an imperative research domain that enhances system performance, such as throughput and delay, in MANETs in the presence of both mobility and a large number of mobile terminals. In this paper, we present a clustering scheme that minimizes message overhead and congestion for cluster formation and maintenance. The algorithm is devised to be independent of the MANET routing algorithm; depending upon the context, it may be implemented in the routing layer or in higher layers. The dynamic formation of clusters helps reduce data packet overhead, node complexity and power consumption. The simulation shows that the number of clusters formed is proportional to the number of nodes in the MANET.
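The abstract does not spell out the clustering rule, so the snippet below only illustrates the general idea of electing cluster heads from local neighbourhood information, using the classic lowest-ID heuristic; it is not the paper's low-overhead scheme.

```python
# Generic lowest-ID cluster-head election, shown only to illustrate how
# clusters form from one-hop neighbourhood information; the paper's own
# low-overhead scheme is not specified in the abstract.
def lowest_id_clustering(neighbours):
    """neighbours: dict mapping node id -> set of one-hop neighbour ids."""
    heads, membership = set(), {}
    for node in sorted(neighbours):               # lowest IDs decide first
        if node in membership:
            continue
        heads.add(node)                           # node becomes a cluster head
        membership[node] = node
        for nb in neighbours[node]:               # uncovered neighbours join it
            membership.setdefault(nb, node)
    return heads, membership

topology = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
heads, membership = lowest_id_clustering(topology)
print("cluster heads:", heads)       # e.g. {1, 4} for this topology
print("membership:", membership)
```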
  • Exploiting Context in Kernel-Mapping Recommender System Algorithms
    Mustansar Ali Ghazanfar1 and Adam Prugel-Bennett2, 1University of Engineering and Technology, Pakistan, 2University of Southampton, United Kingdom
    ABSTRACT
    Recommender systems apply machine learning techniques to filter unseen information and can predict whether a user would like a given item. Kernel Mapping Recommender (KMR) algorithms have been proposed that give state-of-the-art performance. In this paper, we show how context information can be added to KMR. We consider the trusted friends of a user as their social context and show how this information can be used to provide more personalised, refined, and trustworthy recommendations. The limited set of friends, however, restricts the amount of data available for creating useful recommendations. This paper sheds light on this issue, and specifically on the number of friends necessary to obtain satisfactory recommendations. Furthermore, we describe how the proposed system might be used to generate recommendations in a distributed way rather than the traditional centralised one.
  • Extraction of Features for Predicting Patterns of Heart Disease
    Iqra Basharat, Mamuna Fatima, Ali Raza Anjum and Shoab Ahmed Khan, National University of Sciences & Technology, Pakistan
    ABSTRACT
    There is a huge amount of 'knowledge-enriched data' in hospitals, which needs to be processed in order to extract useful information from it. That knowledge-enriched data is very useful for making valuable medical decisions. However, there is a lack of effective analysis tools to discover the hidden relationships in the data. The objective of this research is to analyze heart patients' data and extract useful information that helps doctors make wise decisions. We have a huge quantity of historical unstructured patient data in the form of medical reports along with unstructured doctors' remarks. In this research, the K-means clustering technique is used to extract features for predicting patterns of heart disease. Using patients' medical profiles such as age, sex, ECG, LVEF, EVS, blood pressure and previous history, significant features are extracted (for example, male patients above 60 years with high blood pressure and hypertension having TVCAD). Based on these extracted patterns, medical practitioners can make informed decisions. The results of this study could be very constructive for medical researchers and can help medical teams and doctors suggest the best diagnosis for a disease.
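A minimal sketch of the clustering step, assuming invented feature values: K-means applied to standardized patient profiles, with cluster means inspected afterwards to look for patterns of the kind quoted in the abstract. It is illustrative only, not the authors' pipeline.

```python
# Minimal K-means sketch on synthetic patient profiles; feature values and
# cluster interpretation are invented, and this is not the authors' pipeline.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Columns: age, sex (0/1), systolic blood pressure, LVEF (%)
patients = np.array([
    [65, 1, 160, 35],
    [70, 1, 155, 30],
    [45, 0, 120, 60],
    [50, 0, 118, 58],
    [68, 1, 165, 32],
    [40, 0, 115, 62],
])

# Standardise so that age and blood pressure do not dominate the distance.
X = StandardScaler().fit_transform(patients)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster labels:", kmeans.labels_)

# Inspect each cluster's mean profile to look for patterns such as
# "older patients with high blood pressure and low LVEF".
for c in range(2):
    print(c, patients[kmeans.labels_ == c].mean(axis=0))
```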
  • Improving Rule-Based Method for Arabic POS Tagging using HMM Technique
    Meryeme Hadni1, Said Alaoui Ouatik1 and Abdelmonaime Lachkar2, 1FSDM, University Sidi Mohamed Ben Abdellah (USMBA), Morocco, 2E.N.S.A, University Sidi Mohamed Ben Abdellah (USMBA), Morocco
    ABSTRACT
    A part-of-speech (POS) tagger plays an important role in natural language applications such as speech recognition, natural language parsing, information retrieval and multi-word term extraction. This study proposes the building of an efficient and accurate POS tagging technique for the Arabic language using a statistical approach. The Arabic rule-based method suffers from misclassified and unanalyzed words due to ambiguity. To overcome these two problems, we propose a Hidden Markov Model (HMM) integrated with the Arabic rule-based method. Our POS tagger generates a set of 4 POS tags: Noun, Verb, Particle, and Quranic Initial (INL). The proposed technique uses the contextual information of the words together with a variety of features that are helpful in predicting the various POS classes. To evaluate its accuracy, the proposed method has been trained and tested on the Quran Corpus containing 77,430 terms of undiacritized Classical Arabic. The experimental results demonstrate the efficiency of our method for Arabic POS tagging: the obtained accuracies are 97.6% for our method and 94.4% for the rule-based tagger.
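As background for the HMM component, a toy Viterbi decoder over the paper's four-tag set is sketched below; the words, transition and emission probabilities are made up for illustration and are not estimated from the Quran corpus.

```python
# Toy Viterbi decoder over the four-tag set mentioned above (Noun, Verb,
# Particle, INL); all probabilities are invented for illustration.
import math

tags = ["NOUN", "VERB", "PART", "INL"]
start = {"NOUN": 0.4, "VERB": 0.3, "PART": 0.2, "INL": 0.1}
trans = {t: {u: 0.25 for u in tags} for t in tags}          # uniform transitions
emit = {
    "NOUN": {"kitab": 0.6, "qala": 0.1, "fi": 0.1, "alm": 0.2},
    "VERB": {"kitab": 0.1, "qala": 0.7, "fi": 0.1, "alm": 0.1},
    "PART": {"kitab": 0.1, "qala": 0.1, "fi": 0.7, "alm": 0.1},
    "INL":  {"kitab": 0.1, "qala": 0.1, "fi": 0.1, "alm": 0.7},
}

def viterbi(words):
    # V[i][t] = best log-probability of a tag sequence ending in tag t at word i
    V = [{t: math.log(start[t]) + math.log(emit[t][words[0]]) for t in tags}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: V[-1][p] + math.log(trans[p][t]))
            col[t] = V[-1][best_prev] + math.log(trans[best_prev][t]) + math.log(emit[t][w])
            ptr[t] = best_prev
        V.append(col)
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    seq = [max(tags, key=lambda t: V[-1][t])]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))

print(viterbi(["alm", "kitab", "fi", "qala"]))
```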
  • An Efficient Approach To Improve Arabic Documents Clustering Based On A New Keyphrases Extraction Algorithm
    Hanane Froud, Issam Sahmoudi and Abdelmonaime Lachkar, L.S.I.S, E.N.S.A, University Sidi Mohamed Ben Abdellah (USMBA), Fez, Morocco
    ABSTRACT
    Document clustering algorithms group a set of documents into subsets or clusters. The algorithms' goal is to create clusters that are coherent internally but clearly different from each other. In other words, documents within a cluster should be as similar as possible, and documents in one cluster should be as dissimilar as possible from documents in other clusters. This task can be strongly affected by the documents' contents: the useful words in the documents are often accompanied by a large number of noise words. Therefore, it is necessary to eliminate the noise words and keep just the useful information in order to improve the performance of document clustering algorithms.
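A rough illustration of the preprocessing idea (keeping high-weight terms and dropping noise words) using TF-IDF scores on English stand-in documents; the paper's actual keyphrase extraction algorithm for Arabic is not reproduced here.

```python
# Hedged sketch of the preprocessing idea (keep only high-weight terms,
# drop noise words) using TF-IDF scores; not the paper's keyphrase
# extraction algorithm, and the documents are English stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the market prices of oil and gas rose sharply this quarter",
    "the football team won the championship match yesterday",
    "oil exports and gas production drive the regional economy",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)
terms = vec.get_feature_names_out()

# Keep the 3 highest-weighted terms of each document as its "key" terms.
for i, doc in enumerate(docs):
    row = X[i].toarray().ravel()
    top = row.argsort()[::-1][:3]
    print(i, [terms[j] for j in top])
```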
  • Cardiac Data Mining (CDM): Organization and Predictive Analytics on Biomedical (Cardiac) Data
    M. Musa Bilal, Masood Hussain, Iqra Basharat and Mamuna Fatima, College of E&ME, NUST, Pakistan
    ABSTRACT
    Data mining and data analytics have been of immense importance to many different fields as we witness the evolution of data science over recent years. Biostatistics and medical informatics have proved to be the foundation of many modern biological theories and analysis techniques. These are the fields that apply data mining practices along with statistical models to discover hidden trends in data from biological experiments or procedures on different entities. The objective of this research study is to develop a system for the efficient extraction, transformation and loading of such data from cardiologic procedure reports provided by the Armed Forces Institute of Cardiology. It also aims to devise a model for the predictive analysis and classification of this data into important classes as required by cardiologists around the world. This includes predicting patient impressions and other important features.
  • Decision Tree Clustering: A Column-Stores Tuple Reconstruction
    Tejaswini Apte1, Dr. Maya Ingle2 and Dr. A.K. Goyal2, 1Symbiosis Institute of Computer Studies and Research, India, 2Devi Ahilya VishwaVidyalaya, India
    ABSTRACT
    Column-stores have gained popularity as a promising physical design alternative for aggregate queries. However, for multi-attribute queries column-stores pay a performance penalty due to on-the-fly tuple reconstruction. This paper presents an adaptive approach for reducing tuple reconstruction time. Our approach exploits a decision tree algorithm to cluster attributes for each projection and also eliminates frequent database scanning. Experiments with TPC-H data show the effectiveness of the proposed technique.
  • The Application Of Improved Dynamic Decision Tree Based On Particle Swarm Optimization During Transportation Process
    LI Xin-hai and LI Li, Shijiazhuang University, China
    ABSTRACT
    Data mining applied during transport to the various environmental parameters and event variables yields a corresponding decision tree, with which we are able to predict the occurrence of events under certain conditions. Inspired by the No Free Lunch (NFL) theorem, the validity of the decision tree is improved using Particle Swarm Optimization. Comparison with the actual outcomes verifies the higher efficiency of the new algorithm.
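The combination of PSO with decision trees can take many forms; as one hedged illustration, the sketch below uses a small particle swarm to tune two decision-tree hyper-parameters by cross-validation on a public dataset. It is not the paper's dynamic decision-tree construction for transport data.

```python
# Minimal particle-swarm search over two decision-tree hyper-parameters,
# shown only to illustrate combining PSO with decision trees; the paper's
# dynamic decision tree for transport data is not reproduced here.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(pos):
    # A particle's position encodes (max_depth, min_samples_split).
    depth = int(round(np.clip(pos[0], 1, 15)))
    min_split = int(round(np.clip(pos[1], 2, 20)))
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_split=min_split, random_state=0)
    return cross_val_score(clf, X, y, cv=5).mean()

n_particles, n_iter = 10, 20
pos = rng.uniform([1, 2], [15, 20], size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("best (max_depth, min_samples_split):", np.round(gbest), "cv accuracy:", pbest_val.max())
```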
  • Mining Triadic Association Rules
    Sid Ali Selmane1, Rokia Missaoui2, Omar Boussaid1 and Fadila Bentayeb1, 1Pierre Mendes France, France, and 2rue Saint-Jean-Bosco, Gatineau (Quebec), Canada
    ABSTRACT
    The objective of this research is to extract triadic association rules from a triadic formal context K := (K1, K2, K3, Y), where K1, K2 and K3 respectively represent the sets of objects, properties (or attributes) and conditions, while Y is a ternary relation between these sets. Our approach consists in defining a procedure that maps a set of dyadic association rules into a set of triadic ones. The advantage of triadic rules over dyadic ones is that they are less numerous and more compact, and they convey a richer semantics of the data. Our approach is illustrated through an example of a ternary relation representing a set of Customers who purchase Products from Suppliers. The proposed algorithms and approach have been validated through experiments on large real datasets.
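As a hedged illustration of the setting, the snippet below flattens a tiny ternary relation (Customers, Products, Suppliers) into a dyadic context and mines simple dyadic rules by brute force; the paper's procedure for mapping those dyadic rules into compact triadic rules is not reproduced.

```python
# Illustrative only: flatten a small triadic relation (Customers, Products,
# Suppliers) into a dyadic context by pairing product and supplier, then mine
# single-item dyadic rules by brute force. The mapping from dyadic rules to
# compact triadic rules described in the paper is not implemented here.
from itertools import permutations
import pandas as pd

# Ternary relation Y as (customer, product, supplier) triples.
Y = [
    ("c1", "bread", "s1"), ("c1", "milk", "s1"),
    ("c2", "bread", "s1"), ("c2", "milk", "s1"),
    ("c3", "bread", "s2"), ("c3", "milk", "s1"),
]

# Dyadic context: rows are customers, columns are (product, supplier) pairs.
rows = sorted({c for c, _, _ in Y})
cols = sorted({f"{p}@{s}" for _, p, s in Y})
data = pd.DataFrame(False, index=rows, columns=cols)
for c, p, s in Y:
    data.loc[c, f"{p}@{s}"] = True

# Brute-force single-item dyadic rules a -> b with support and confidence.
min_support, min_confidence = 0.5, 0.7
n = len(data)
for a, b in permutations(data.columns, 2):
    supp = (data[a] & data[b]).sum() / n
    conf = supp / (data[a].sum() / n)
    if supp >= min_support and conf >= min_confidence:
        print(f"{a} -> {b}   support={supp:.2f} confidence={conf:.2f}")
```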