DKMP 2014


Accepted Papers

Catalog Integration Based on Taxonomy and Semi Supervised Algorithm
Bharathi B, GKM College of Engineering and Technology, India
ABSTRACT

Data integration is a major task for online e-commerce portals and commerce search engines. The central integration task these portals and search engines face is merging products from multiple providers into their own product catalogs. The main difficulty is categorizing products from a data provider into the master taxonomy while still making use of the provider's own taxonomy. To overcome this difficulty, we classify products with a text-based classifier and then apply a taxonomy-aware step that adjusts the classifier's output so that products that are close together in the provider taxonomy remain close in the master taxonomy. Traditional supervised classification algorithms require a large number of labeled examples to perform accurately. Semi-supervised classification algorithms attempt to overcome this limitation by also using unlabelled examples. Unlabelled examples have also been used to improve nearest neighbor text classification in a method called bridging. In this paper, we propose the use of bridging in a semi-supervised setting. We introduce a new bridging algorithm that can be used as a base classifier in any semi-supervised approach such as co-training or self-learning. We show empirically that classification performance increases when the semi-supervised algorithm becomes better at assigning correct labels to previously unlabelled data.
Automatic Document Categorization Based on Statistics Trait using Dirichlet Process Mixture Model
Baviya M, GKM College of Engineering and Technology, India
ABSTRACT
In data mining, retrieving the data that matches a user query is considered one of the major tasks. When the user enters a query, we first apply a stemming algorithm and remove the stop words from the query. The documents are then searched using the resulting keywords, and the documents related to those keywords are displayed to the user. Sometimes unknown documents contain data related to the keyword, but such documents are not displayed to the user. In this paper we propose a new document clustering approach using a Dirichlet Process Mixture Model with feature selection, which groups the documents into an optimal number of clusters, with the number of clusters K discovered automatically. The unknown documents are converted into known documents, and the documents are then clustered. Based on a scoring algorithm, the documents are assigned to their corresponding clusters. At the user's request, the corresponding document is returned to the user. We also retrieve the most relevant documents using a Top-K query for an effective and efficient data retrieval system.
An Efficient Approach for the Detection and Removal of Overlapping Clusters
Lavanya.C and R.Vanitha, Jerusalem College of Engineering, India
ABSTRACT

In this paper, the data elements of a stream are mapped into a kernel space, and cluster boundaries of arbitrary shape are constructed. To adapt to both dramatic and gradual changes, multiple spheres are dynamically maintained, each describing the corresponding data domain presented in the data stream. By allowing for bounded support vectors (BSVs), the SVStream algorithm is capable of identifying overlapping clusters. A BSV decaying mechanism is designed to automatically detect and remove outliers (noise). To this end, a multisphere representation is proposed, where multiple spheres are dynamically maintained in a sphere set. Experiments over synthetic and real data streams were performed, with the overlapping, evolving, and noise situations taken into consideration. Comparison results with state-of-the-art data stream clustering methods demonstrate the effectiveness and efficiency of the proposed method.
Extracting Significant Patterns for Oral Cancer Detection using Apriori Algorithm
Neha Sharma¹ and Hari Om², ¹Pad. Dr. D. Y. Patil Institute of MCA, India and ²Indian School of Mines Dhanbad, India
ABSTRACT

Presently, no effective tool exists for early diagnosis and treatment of oral cancer. Here, we describe an approach for cancer detection and prevention based on analysis using association rule mining. The data analyzed pertain to clinical symptoms, history of addiction, co-morbid conditions and survivability of the cancer patients. The extracted rules are useful for making clinical judgments and sound decisions related to the disease. The results shown here are promising and indicate the potential of this approach for the eventual development of diagnostic assays and treatment, with support and confidence sufficient for the detection of early-stage oral cancer.
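The support and confidence measures mentioned in this abstract can be illustrated with a minimal Apriori sketch. The records and attribute names below (`tobacco`, `ulcer`, `malignant`) are invented for illustration and are not taken from the paper's data; the function names and the `min_support` value are likewise assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets and prune any
        # candidate that has an infrequent k-subset.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical patient records (made up for this sketch).
records = [
    {"tobacco", "ulcer", "malignant"},
    {"tobacco", "ulcer", "malignant"},
    {"tobacco", "malignant"},
    {"ulcer"},
    {"tobacco", "ulcer"},
]
frequent = apriori(records, min_support=0.3)
rule_conf = confidence(frequent, {"tobacco", "ulcer"}, {"malignant"})
```

In this toy data the rule `{tobacco, ulcer} -> {malignant}` has support 0.4 and confidence of about 0.67, which shows how a minimum support and confidence pair filters candidate rules.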
A Database Sanitization Algorithm for Hiding Sensitive Multi-Level Association Rule Mining
Saad M. Darwish, Magda M. Madbouly and Mohamed A. Elhakeem, University of Alexandria, Egypt
ABSTRACT

The sharing of information has been proven to be beneficial for business partnerships in many application areas such as business planning or marketing. Today, association rule mining poses a threat to data sharing, since it may disclose patterns and various kinds of sensitive knowledge that are difficult to find otherwise. Such information must be protected against unauthorized access. The challenge is to protect actionable knowledge for strategic decisions without losing the great benefit of association rule mining. To address this challenge, a sanitization process transforms the source database into a released database from which the counterpart cannot extract sensitive rules. Unlike existing works that focus on hiding sensitive association rules at a single concept level, this paper focuses on building a sanitization algorithm for hiding association rules at multiple concept levels. Employing multi-level association rule mining may lead to the discovery of more specific and concrete knowledge from datasets. The proposed system uses a genetic algorithm as an optimization strategy for modifying multi-level items in the database in order to minimize the side effects of sanitization, such as non-sensitive rules falsely hidden and spurious rules falsely generated. The new approach is empirically tested and compared with other sanitization algorithms, showing considerable improvement in completely hiding any given multi-level rule, which in turn supports the security of the database while keeping the utility and certainty of the mined multi-level rules at the highest level.
Multiclass Emotion Extraction from Sentences
Bincy Thomas, Vinod P and Dhanya K A, SCMS School of Engineering & Technology, India
ABSTRACT

This paper investigates the extraction of different classes of emotion from sentences using a supervised machine learning technique, Multinomial Naive Bayes (MNB). A bag-of-words approach is used to capture the emotions. Unigrams are the main features, and bigrams and trigrams are used to capture lower order dependencies. The work is done on the ISEAR dataset [14]. Experiments with different feature sets selected using the Weighted log-likelihood score (WLLS) [12] show that the MNB classifier provides good results when the unigram feature set size is 450, giving an average accuracy of 76.96% across all emotion classes.
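As a rough illustration of Multinomial Naive Bayes over unigram bag-of-words features, the sketch below trains on four invented sentences. It is not the authors' implementation: the class name, the Laplace smoothing parameter `alpha`, and the toy sentences are assumptions, the ISEAR data is not used, and the WLLS feature selection step is omitted.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Minimal Multinomial Naive Bayes over whitespace-separated unigrams."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(sentences, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, sentence):
        # Words never seen in training are ignored.
        tokens = [w for w in sentence.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for w in tokens:
                score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training sentences, for illustration only.
train = [
    ("i felt so happy at the party", "joy"),
    ("winning the prize made me happy", "joy"),
    ("i was scared of the dark street", "fear"),
    ("the loud noise scared me", "fear"),
]
model = MultinomialNB().fit([s for s, _ in train], [l for _, l in train])
prediction = model.predict("she was happy about the prize")
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, which matters once the feature set grows to hundreds of unigrams.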
Big Data: Paving the Road to Improved Customer Support Efficiency
Ajay Parashar, Tata Consultancy Services, India
ABSTRACT

The organizational adage 'customer is king' is not new. With a significant number of organizational resources devoted to understanding the 'king's' needs and responding to them, this phrase, in today's competitive business arena, is an understatement. Customer delight is one area that organizations across industries must especially focus on, by understanding how customers interact with products and services, the features they use and like, and the informal feedback they provide through unconventional channels such as social media. This will help organizations not only to improve customer experience, but also to drive efficiency improvements in the processes that help create products and services.

The Big Data platform is emerging as the platform of choice to fuse these individual components to form a robust structure that represents the modern day customer support ecosystem. Each of these individual components has undergone a technology evolution that has enabled them to exist in the first place. A technology solution riding on the Big Data wave is just a catalyst that a flat-world organization today needs to re-energize its customer service effort and venture out to capture newer horizons. This white paper looks at the different components that make up the current customer support service environment and the challenges they pose to a uniform integration strategy; and finally it highlights how Big Data can be leveraged to achieve this strategy.
Finding Frequent and Maximal Periodic Patterns in Spatiotemporal Databases towards Big Data
O.Obulesu¹ and A. Rama Mohan Reddy², ¹SVEC, India and ²SVUCE, India
ABSTRACT

Data mining is used to find hidden knowledge in large databases. Periodic pattern mining is useful in weather forecasting, fraud detection and GIS applications. In general, the spatio-temporal pattern discovery process finds the partially ordered subsets of event-types whose instances are located together and occur serially for a given collection of Boolean spatio-temporal event-types. Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological and bio-medical sciences. In this paper, a new framework is proposed to find frequent and maximal spatiotemporal patterns in Big Data. Existing algorithms compute the necessary maximal patterns well, but become problematic when applied to Big Data. Big Data mining is a new trend for analysing datasets that, due to their large size and complexity, cannot be managed with current traditional algorithms or data mining software tools. It is the capability of extracting useful information from large datasets or streams of data that, due to their volume, variety, and velocity, could not be processed before. Finding frequent and maximal spatio-temporal patterns in Big Data is a major challenge and is becoming one of the most exciting opportunities for the coming years. The experimental results provide a broad overview of pattern mining algorithms and their significance in spatiotemporal databases, their current status and trade-offs, and a forecast of the future of Big Data pattern mining.

Study on Improving User Navigation by Reorganizing Web Structure Based on Link Mining
Deepshree A. Vadeyar and Yogish H K, East West Institute of Technology, India
ABSTRACT

Designing a website is an easy task, but helping users navigate it efficiently is a big challenge. One reason is that user behaviour keeps changing while web developers and designers do not design according to users' behaviour. To improve user navigability, a website can be reorganized through web transformation and web personalization. We propose web transformation using link mining. Links are categorized as 1 or 0 for presence or absence. To cluster the links, we consider each page as a node and each link as an edge, and clustering is then performed on weblog data such as time stamps and the number of clicks on a page.
Efficient Classification of Text Documents using Semantic and Machine Learning Approaches
Harshita P, Ankita Bhandary, Jyothi K V, Abhilashini A J and Anil Kumar K M, Sri Jayachamarajendra College of Engineering, India
ABSTRACT

Text categorization is the automatic sorting of documents into predefined categories from a set. With the expanding internet, there has been a boom in the number of documents available online. It is here that text classification gains importance. Text classification has wide applications in genre classification, email classification, language identification and sorting through digitized archives. Subjectivity or objectivity identification is a sub-domain of opinion mining, and is based on classifying a given text into one of two classes: objective or subjective. Subjective and objective text classification is widely used in product reviews, video reviews, social public opinion analysis and micro-blogging attitude analysis. This paper uses the words occurring in a document to categorize the text into predefined categories and also for subjectivity or objectivity identification. We explore various semantic and machine learning approaches that can aid in efficient classification of text.
Link-Based Classification for MultiRelational Database
Urvashi Mistry, Charotar University of Science and Technology, India
ABSTRACT

Classification is one of the most popular data mining tasks, with a wide range of applications. Because converting data from multiple relations into a single flat relation usually causes many problems, classification across multiple database relations is a challenging task. It is counterproductive to convert multi-relational data into a single flat table because such a conversion may generate a huge relation and lose essential semantic information. In this paper we propose two algorithms for Multi-Relational Classification (MRC). To take advantage of linkage relationships and to link the target table with the other tables, a semantic relationship graph (SRG) is used. In the first approach we use Naive Bayesian Combination to combine the results of heterogeneous classifiers and obtain the class label. This classifies instances accurately and efficiently. The second approach is Multi-Relational Classification using Decision Templates (DT). A decision profile is created to combine the outputs of the heterogeneous classifiers. The decision template and the decision profile are compared using a similarity measure to obtain the final output. DT takes the contribution of each classifier's output into account rather than being class-conscious, so classification accuracy is improved.
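One common formulation of a Naive Bayes combiner multiplies, for each candidate class, the class prior by the estimated probability that each classifier emits its observed label given that class. Whether the paper uses exactly this formulation is an assumption. The function name, the confusion counts, the Laplace smoothing, and the idea of one classifier per linked relation are all hypothetical here.

```python
def naive_bayes_combine(confusions, predictions):
    """Combine label outputs of several classifiers.

    confusions[i][k][s] is how often classifier i output label s on
    validation instances whose true class was k. All classifiers are
    assumed to be validated on the same instances.
    predictions[i] is the label classifier i outputs for the new instance.
    Returns normalized posterior scores per class.
    """
    classes = list(confusions[0].keys())
    class_totals = {k: sum(confusions[0][k].values()) for k in classes}
    n = sum(class_totals.values())
    scores = {}
    for k in classes:
        score = class_totals[k] / n  # class prior
        for cm, s in zip(confusions, predictions):
            # Laplace-smoothed estimate of P(classifier says s | true class k).
            score *= (cm[k].get(s, 0) + 1) / (class_totals[k] + len(classes))
        scores[k] = score
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

# Hypothetical validation confusion counts for three classifiers,
# e.g. one trained per relation linked to the target table.
confusions = [
    {"yes": {"yes": 8, "no": 2}, "no": {"yes": 3, "no": 7}},
    {"yes": {"yes": 6, "no": 4}, "no": {"yes": 5, "no": 5}},
    {"yes": {"yes": 9, "no": 1}, "no": {"yes": 1, "no": 9}},
]
posterior = naive_bayes_combine(confusions, ["no", "no", "yes"])
final_label = max(posterior, key=posterior.get)
```

In this toy case two of three classifiers vote `no`, yet the combiner returns `yes` because the third classifier is far more reliable on validation data, which is the kind of behaviour that distinguishes this combiner from plain majority voting.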
