Second International Conference of Database and Data Mining (DBDM 2014)

Venue: Coral Deira - Dubai, Deira, Dubai, UAE. Date: April 4-5, 2014

Accepted Papers

Models and Statistics Generated from Social Networks’ Data are Flawed: UAE Context
Mhamed Zineddine, ALHOSN University, UAE
ABSTRACT

The Internet and other innovative communication technologies have pervaded our lives. Social networks (SNs) have emerged as a viable platform for Internet users to communicate, re-establish relationships, share ideas, and exchange multimedia of all types. A considerable number of users join these networks daily. A massive amount of data and information is generated every day, which is claimed to be a gold mine for marketers, behavior researchers, and others. This study presents findings from a survey of 136 users of different social websites. Our results reveal that a considerable amount of the data generated by SN users is bogus. Because of security, privacy, and other concerns, users are afraid to reveal the truth about themselves and their opinions. Contrary to the belief of SN administrators and other stakeholders, marketing models, consumer behavior and profiling models, and other statistical results generated from this data are flawed and absolutely inaccurate. New methods and techniques are required to validate the data collected.
Using Relational Model to Store Owl Ontologies and Facts
Tarek Bourbia and Mahmoud Boufaida, University Constantine, Algeria
ABSTRACT

The storage and processing of OWL instances are important subjects in database modeling. Many research works have focused on managing OWL instances efficiently. Some systems store and manage OWL instances using relational models to ensure their persistence. Nevertheless, several approaches explicitly keep only RDF triples as instances in relational tables, and do not take into account how instances are structured as a graph or how links between concepts are kept. In this paper, we propose an architecture that lets relational tables behave as an OWL model by adapting relational tables to OWL instances and to an OWL hierarchy structure. Two kinds of tables are therefore used: relational tables of facts (instances) and an OWL table. The instance tables hold instances, and the OWL table holds a specification of how the concepts are structured. Instance tables must conform to the OWLtable to be valid. A mechanism for constructing the OWLtable and the instance tables is defined in order to enable and enhance inference and semantic querying of OWL in a relational model context.
Multi-Word Term Extraction Based on New Hybrid Approach for Arabic Language
Meryeme Hadni, Abdelmonaime Lachkar and Said Alaoui Ouatik, University Sidi Mohamed Ben Abdellah, Morocco
ABSTRACT

Arabic Multiword terms (AMWTs) are relevant strings of words in text documents. Once they are automatically extracted, they can be used to increase the performance of text mining applications such as Categorisation, Clustering, Information Retrieval, Machine Translation, and Summarization. This paper introduces our proposed Multiword term extraction system based on contextual information. We propose a new method based on a hybrid approach for Arabic Multiword term extraction. Like other methods based on a hybrid approach, our method is composed of two main steps: the Linguistic approach and the Statistical one. In the first step, the Linguistic approach uses a Part Of Speech (POS) Tagger (Taani’s Tagger) and the Sequence Identifier as patterns in order to extract the candidate AMWTs. In the second step, which includes our main contribution, the Statistical approach incorporates the contextual information by using a newly proposed association measure based on Termhood and Unithood for AMWT extraction. To evaluate the efficiency of our proposed method for AMWT extraction, it has been tested and compared using three different association measures: the proposed one, named NTC-Value, as well as NC-Value and C-Value. The experimental results, using Arabic texts taken from the environment domain, show that our hybrid method outperforms the other ones in terms of precision; in addition, it can deal correctly with tri-gram Arabic Multiword terms.
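
As a rough illustration of the C-Value baseline named in this abstract (not the authors' NTC-Value, whose definition is not given here), the sketch below scores candidate terms with the commonly cited simplified C-Value: log2 of the term length times its frequency, reduced by the average frequency of the longer candidates that contain it. The function names and the English stand-in candidates with their counts are hypothetical.

```python
import math

def is_nested(short, long_):
    """True if the word tuple `short` occurs contiguously inside the longer `long_`."""
    n, m = len(short), len(long_)
    return n < m and any(long_[i:i + n] == short for i in range(m - n + 1))

def c_value(freq):
    """Simplified C-Value; `freq` maps candidate terms (word tuples) to raw frequencies."""
    scores = {}
    for term, f in freq.items():
        # Frequencies of longer candidates that contain this term
        containers = [f_b for b, f_b in freq.items() if is_nested(term, b)]
        if containers:
            # Discount occurrences that are explained by the longer terms
            f = f - sum(containers) / len(containers)
        scores[term] = math.log2(len(term)) * f
    return scores

# Hypothetical candidate counts (English stand-ins for extracted candidates)
freq = {
    ("water", "pollution"): 8,
    ("air", "pollution"): 5,
    ("water", "pollution", "control"): 4,
}
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

Note that single-word candidates score zero under this formula, which is one reason the measure is normally applied only to multiword candidates.
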
A Link-Based Approach to Entity Resolution in Social Networks
Gergo Barta, Budapest University of Technology and Economics, Hungary
ABSTRACT

Social networks were initially places for people to contact each other and find friends or new acquaintances. As such, they have always proved interesting for machine-aided analysis. Recent developments, however, have pivoted social networks to being among the main fields of information exchange, opinion expression, and debate. As a result, there is growing interest in both analyzing and integrating social network services. In this environment, efficient information retrieval is hindered by the vast amount and varying quality of the user-generated content. Guiding users to relevant information is a valuable service and also a difficult task, where a crucial part of the process is accurately resolving duplicate entities to real-world ones. In this paper, we propose a novel approach that utilizes the principles of link mining to successfully extend the methodology of entity resolution to multitype problems. The proposed method is presented using an illustrative social network-based real-world example and validated by comprehensive evaluation of the results.
Extending Dynamic SOMs to Capture Incremental Changes in Data
Buddhima Wijeweera¹, Thushan Ganegedara¹, Ruwan Gunarathne¹, Lasindu Charith Vidana Pathiranage¹, Damminda Alahakoon² and Shehan Perera¹, ¹University of Moratuwa, Sri Lanka, ²Deakin University, Australia
ABSTRACT

Humans learn in an incremental manner, continuously refining their knowledge about the world with the experience they gain. Many attempts have been made in the machine learning area to employ incremental learning in computer systems. Incremental learning, in contrast to one-time learning, is far more useful and effective when data is not completely available at once. Here we look at an incremental learning algorithm known as the IKASL algorithm. We propose several modifications to the original algorithm that enhance its performance. Moreover, the paper reports the results of several experiments, conducted using several datasets, to assess the necessity and value of incremental learning in the real world.
Arabic Opinion Mining Using the Appraisal Linguistic Theory
Abidi Karima¹ and Guiassa Yamina Tlili², ¹École supérieur d’informatique Alger, Algeria and ²Université Badji Mokhtar-Annaba, Annaba, Algeria
ABSTRACT

Little work to date has addressed Arabic sentiment analysis (opinion mining), one of the most active research areas in natural language processing and one that is also widely studied in data mining, Web mining, and text mining. The task consists of locating the opinion-bearing passages in a text collection and classifying them as objective or subjective. Currently, much of the work is limited to the concept of polarity (positive, negative, and neutral). The main objective of our work is to identify Arabic text segments that carry opinion and, in particular, to classify them by type (affect, judgement, appreciation, ...) using the theory of Appraisal.
A Survey on Elliptic Curve Digital Signature Algorithm and its Variants
Greeshma Sarath, Devesh C Jinwala and Sankita J Patel, S V National Institute of Technology, Surat, India
ABSTRACT

The Elliptic Curve Digital Signature Algorithm (ECDSA) is an elliptic curve variant of the Digital Signature Algorithm (DSA). It gives cryptographically strong digital signatures by making use of the elliptic curve discrete logarithm problem. It uses arithmetic with much smaller numbers (160/256 bits, instead of the 1024/2048 bits used in RSA and DSA) while providing the same level of security. The ECDSA was accepted in 1999 as an ANSI standard, and in 2000 as IEEE and NIST standards. It was also accepted in 1998 as an ISO standard. Many cryptologists have studied the security aspects of ECDSA and proposed different variants. In this paper, we present a detailed analysis of the original ECDSA and all its available variants in terms of the security level and execution time of all the phases. To the best of our knowledge, this is a unique attempt to juxtapose and compare the ECDSA with all of its variants.
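
For readers unfamiliar with the phases compared in this survey (key generation, signing, verification), the following is a minimal sketch of textbook ECDSA, not of any variant discussed in the paper. The choice of the secp256k1 curve and SHA-256 is an assumption made for illustration, as are all function names; the code requires Python 3.8+ for modular inverses via `pow(x, -1, m)`, is not constant-time, and is not suitable for production use.

```python
import hashlib
import secrets

# Assumed domain parameters: secp256k1, the curve y^2 = x^3 + 7 over F_P
P = 2**256 - 2**32 - 977
A, B = 0, 7
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # order of G

def add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 is the negation of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P       # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

def digest(msg):
    """Hash the message to an integer modulo the group order."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

def sign(d, msg):
    """Sign `msg` with private key `d`; returns the pair (r, s)."""
    z = digest(msg)
    while True:
        k = secrets.randbelow(N - 1) + 1          # fresh per-signature nonce
        r = mul(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * d) % N
        if r and s:
            return r, s

def verify(q, msg, sig):
    """Check signature `sig` on `msg` against public key point `q`."""
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    pt = add(mul(digest(msg) * w % N, G), mul(r * w % N, q))
    return pt is not None and pt[0] % N == r

# Key generation, signing, and verification
d = secrets.randbelow(N - 1) + 1   # private key
Q = mul(d, G)                      # public key
sig = sign(d, b"hello")
print(verify(Q, b"hello", sig), verify(Q, b"tampered", sig))
```

The size claim in the abstract is visible here: every integer involved is at most 256 bits, whereas RSA or DSA at a comparable security level works with moduli of 1024 bits or more.
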
A Study of Clustering Techniques for Crop Prediction - A Survey
Utkarsha P. Narkhede and K. P. Adhiya, SSBT's College of Engineering and Technology, Bambhori, India
ABSTRACT

The farming community needs a well-organized system to predict and improve crops around the world. Predicting the best crops is complex largely because proper knowledge discovery is unavailable for crop knowledge bases, which affects the quality of prediction. Clustering is an important step in mining useful information. There are several families of clustering methods, such as partitioning, hierarchical, model-based, grid-based, and constraint-based, and the task is complicated by problems related to optimization and noise. This review paper presents a comparative study of clustering algorithms. Of these, BeeHive and the improved k-means clustering algorithm stand out in solving the optimization problem, which led to their selection for performance evaluation in order to obtain good-quality clusters for crop prediction.
Study and Review of Genetic Neural Approaches for Data Mining
Nilakshi Waghulde and Nilima Patil, North Maharashtra University, India
ABSTRACT

Data mining techniques are used to explore, analyse, and extract data using complex algorithms in order to discover unknown patterns. The neural network, one of the data mining techniques, has been proved to be a universal approximator. A neural network is able to learn a mapping between input and output nodes, while the hidden nodes and the weights between them contain the internal representation of the input; training the network in this way converges only locally. Because the initialization of neural network weights is a blind process and the neural network is slow to converge, it is difficult to find the globally optimal solution. A fixed neural network structure may not provide optimal performance within the training period, so the number of hidden layers and hidden nodes for a particular neural network also plays an important role. Hence, this paper presents a Genetic Neural Network technique that takes advantage of the global optimization of the genetic algorithm to initialize the neural network and to calculate the number of hidden nodes and hidden layers, so that the network is trained with a properly selected architecture.
Improved Neural Network Prediction Performances of Electricity Demand: Modifying Inputs Through Clustering
K.A.D. Deshani¹, Liwan Liyanage Hansen², M.D.T. Attygalle³, A. Karunaratne³, ¹University of Colombo, Sri Lanka, ²University of Western Sydney, Australia and ³University of Colombo, Sri Lanka
ABSTRACT

Accurate prediction of electricity demand can bring extensive benefits to any country, as the forecast values help the relevant authorities to make more appropriate decisions regarding electricity generation, transmission, and distribution. The literature reveals that, when compared to conventional time series techniques, improved artificial intelligence approaches provide better prediction accuracy. However, the accuracy of predictions using intelligent approaches such as neural networks is strongly influenced by the correct selection of inputs and the number of neuro-forecasters used for prediction. This research shows how a cluster analysis performed to group similar day types can contribute towards selecting a better set of neuro-forecasters in neural networks. Daily total electricity demands for five years were considered for the analysis, and each date was assigned to one of thirteen day-types, in a Sri Lankan context. As a stochastic trend could be seen over the years, the trend was removed by taking the first difference of the series prior to performing the k-means clustering. Three different clusters were found using Silhouette plots, and thus three neuro-forecasters were used for predictions. This paper illustrates the proposed modified neural network procedure using electricity demand data.
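
The preprocessing steps named in this abstract (first differencing, k-means, and choosing the number of clusters by silhouette) can be sketched roughly as follows. This is a simplified stand-in, not the authors' procedure: it clusters the differenced values directly in one dimension rather than grouping the thirteen day-types, uses a deterministic k-means initialisation for reproducibility, and runs on a short hypothetical demand series. All function names are illustrative.

```python
from statistics import mean

def first_difference(series):
    """Remove a stochastic trend by differencing consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]

def kmeans_1d(values, k, iters=50):
    """Plain k-means on scalars, started from k evenly spaced order statistics."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        # Assign every value to its nearest centre (ties go to the lower index)
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(mean(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    return labels

def mean_silhouette(values, labels):
    """Average silhouette width; points in singleton clusters score 0."""
    pairs = list(zip(values, labels))
    scores = []
    for i, (v, lab) in enumerate(pairs):
        own = [abs(v - w) for j, (w, m) in enumerate(pairs) if m == lab and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(mean([abs(v - w) for w, m in pairs if m == c])
                for c in set(labels) if c != lab)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return mean(scores)

# Hypothetical daily demand with an upward drift
demand = [100, 113, 115, 106, 120, 123, 116, 128, 133, 127, 142, 146, 138]
diffs = first_difference(demand)
sil = {k: mean_silhouette(diffs, kmeans_1d(diffs, k)) for k in range(2, 6)}
best_k = max(sil, key=sil.get)
print(best_k, {k: round(s, 3) for k, s in sil.items()})
```

In a workflow like the one described, the selected number of clusters would then determine how many separate neuro-forecasters are trained.
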
Pattern Based Sentiment Feature Extraction Methodology Using Target
Nasir Gul, University Malaysia Sarawak, Malaysia
ABSTRACT

Nowadays, human beings rely heavily on computers, and for a computer-based society to develop, computers should be smart enough to understand, recognize, and express emotions and opinions as humans require. The significance of the proposed work is the development of a pattern-based algorithm for extracting features from the tourism domain, i.e., hotel reviews. The feature extraction process is based on linguistic patterns associated with opinion words. The proposed work provides a framework that can understand text, extract opinions, and classify them. The proposed pattern-based feature extraction algorithm provides accurate results with improved frequency and recall scores.
Aspect-Based Opinion Mining from Customer Reviews
Amani K Samha, Yuefeng Li and Jinglan Zhang, Queensland University of Technology, Australia
ABSTRACT

Text is the main method of communicating information in the digital age. Messages, blogs, news articles, reviews, and opinionated information abound on the Internet. People commonly purchase products online and post their opinions about purchased items. This feedback is displayed publicly to assist others with their purchasing decisions, creating the need for a mechanism with which to extract and summarize useful information for enhancing the decision-making process. Our contribution is to improve the accuracy of extraction by combining different techniques from three major areas, namely Data Mining, Natural Language Processing, and Ontologies. The proposed model sequentially mines products’ aspects and users’ opinions, groups representative aspects by similarity, and generates an output summary. This paper focuses on the task of extracting product aspects and users’ opinions by extracting all possible aspects and opinions from reviews using natural language, ontology, and frequent “tag” sets. The proposed model, when compared to an existing baseline model, yielded promising results.
Instances-Base Ontology Alignment Approach: Populating Concepts By Qualia Structure
Abderrahmane Khiat, University of Oran Es-Senia, Algeria
ABSTRACT

The semantic web represents an infrastructure based on ontologies to allow the use and sharing of knowledge. However, this knowledge, formalized as ontologies, is often heterogeneous and distributed. Ontology alignment is the solution for bridging the semantic gap between these ontologies to ensure semantic interoperability. Instance-based ontology alignment is a very promising technique for identifying semantic correspondences between entities of different ontologies when they contain many instances. In this paper, we propose an approach for handling ontologies that contain disjoint instances. The approach populates the concepts with instances extracted from documents relevant to the two ontologies to be aligned, using the Qualia Structure of Generative Lexicon Theory.
Query Optimization in OODBMS: Decomposition of Query for Query Management
S.S. Dhande¹ and G. R. Bamnote², ¹Sipna’s College of Engg & Technology, India and ²Ram Meghe Institute of Research, India
ABSTRACT

This paper is based on a relatively new approach to query optimization in object databases, which uses query decomposition and cached query results to improve the execution of a query. The issues in focus are fast retrieval and high reuse of cached queries, the decomposition of a query into subqueries, and the decomposition of complex queries into smaller ones for fast retrieval of results.

We also try to address another open area of query caching: handling wider queries. This is done by using parts of cached results that are helpful for answering other (wider) queries and by combining many cached queries while producing the result.

Multiple experiments were performed to demonstrate the productivity of this newer way of optimizing a query. The limitation of this technique is that it is useful mainly in scenarios where the data manipulation rate is very low compared to the data retrieval rate.