|Up-to-date list of publications|
To see an up-to-date list of my publications please click here.
|Articles published in, or accepted by refereed journals|
A novel approach for leveraging co-occurrence to improve the false positive error in signature files
Pedram Ghodsnia, Kamran Tirdad, J. Ian Munro, Alejandro López-Ortiz
Abstract – The signature file is a well-studied method in information retrieval for indexing large text databases. Its small index size makes it a good candidate for environments where memory is scarce; this compactness, however, comes at the cost of a high false positive error rate. In this paper we address the high false positive error rate of signature files by introducing COCA filters, a new variation of Bloom filters that exploits the co-occurrence probability of words in documents to reduce the false positive error. We show experimentally that, on real document collections, this technique reduces the false positive error by up to 21 times for the same index size. In some extreme cases it even eliminates the false positive error completely. COCA filters can be considered a good replacement for Bloom filters wherever the co-occurrence of any two members of the universe is identifiable.
Journal of Discrete Algorithms, Volume 18, January 2013, Pages 63-74.
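As background for the data structure the paper builds on, the sketch below is a minimal classic Bloom filter. It does not reproduce the paper's co-occurrence-aware construction; the class and parameter names (`BloomFilter`, `m`, `k`) are illustrative choices, not from the paper.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m, k):
        assert k <= 8  # this toy scheme derives at most 8 positions from one digest
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (illustrative only;
        # real implementations typically use independent hash functions).
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))
```

The false positives this structure admits are exactly the errors that the COCA construction attacks, by letting co-occurring words share bit positions.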
A3CRank: An Adaptive Ranking Method Based on Connectivity, Content and Click-through Data
Ali Mohammad Zareh Bidoki, Pedram Ghodsnia, Nasser Yazdani, Farhad Oroumchian
Abstract – Due to the proliferation and abundance of information on the web, ranking algorithms play an important role in web search. Current ranking algorithms based on content and connectivity, such as PageRank and BM25, have low precision and are not always satisfying for users. In this paper, we propose an adaptive method based on the triple of content, connectivity and click-through data, called A3CRank. The aggregation idea of meta search engines is used to combine ranking algorithms such as PageRank, BM25 and TF-IDF. We use reinforcement learning to incorporate user behavior and find a measure of user satisfaction for each ranking algorithm. Furthermore, OWA, an aggregation operator, is used to merge the results of the various ranking algorithms. A3CRank adapts itself to user needs and uses user clicks to aggregate the results of the ranking algorithms; it is designed to overcome some of the shortcomings of existing ranking algorithms by combining them into an overall better ranking criterion. Experimental results indicate that A3CRank outperforms all other single ranking algorithms in P@n and NDCG measures. We used 130 queries on the University of California at Berkeley's web to train and evaluate our method.
Information Processing and Management, Volume 46, Issue 2, March 2010, Pages 159-169.
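The OWA (Ordered Weighted Averaging) operator mentioned in the abstract can be sketched in a few lines; the weights apply to the sorted scores rather than to fixed rankers. This is a generic OWA sketch, not the paper's tuned configuration, and the example weights are assumptions.

```python
def owa(scores, weights):
    """Ordered Weighted Averaging: sort the scores descending, then take a
    weighted sum where weights[i] applies to the i-th largest score."""
    assert len(scores) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-9  # weights form a convex combination
    return sum(w * s for w, s in zip(weights, sorted(scores, reverse=True)))
```

For one document, the per-ranker scores (e.g. from PageRank, BM25, TF-IDF) would be fed in as `scores`; choosing top-heavy weights emphasizes whichever ranker happens to score the document highest.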
FICA: A Novel Crawling Algorithm Based on Reinforcement Learning
Ali Mohammad Zareh Bidoki, Nasser Yazdani, Pedram Ghodsnia
Abstract – The web is a huge and highly dynamic environment whose content and structure are growing exponentially. No search engine can cover the whole web, so a crawler has to focus on the most valuable pages, and an efficient crawling algorithm for retrieving the most important pages remains a challenging issue. Several algorithms such as PageRank and OPIC have been proposed, but they have high time complexity. In this paper, an intelligent crawling algorithm based on reinforcement learning, called FICA, is proposed that models a real surfing user. The priority for crawling pages is based on a concept we call logarithmic distance. FICA is easy to implement and its time complexity is O(E log V), where V and E are the numbers of nodes and edges in the web graph, respectively. Comparison of FICA with other proposed algorithms shows that it outperforms them in discovering highly important pages. Furthermore, FICA computes the importance (ranking) of each page during the crawling process, so it can also be used as a ranking method for computing page importance. An adaptive version of FICA is also proposed that adjusts dynamically to changes in the web graph. We used the UK web graph for our experiments.
International Journal of Web Intelligence and Agent Systems, Vol. 7, No. 4, pp. 363-373, September 2009.
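A crawl ordered by a distance measure with O(E log V) cost can be sketched as a Dijkstra-style traversal. The sketch below assumes one plausible reading of "logarithmic distance" (sum of log out-degrees along the best path from the seed); the paper's exact definition and update rule may differ, and `fica_order` is a hypothetical name.

```python
import heapq
from math import log

def fica_order(graph, seed):
    """Visit pages in order of increasing 'logarithmic distance' from a seed.
    Edge u -> v is weighted log(outdegree(u)); a binary-heap Dijkstra over
    the web graph gives the O(E log V) bound quoted in the abstract."""
    dist = {seed: 0.0}
    heap = [(0.0, seed)]
    visited, order = set(), []
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        order.append(u)  # crawl u now; d is its logarithmic distance
        out = graph.get(u, [])
        w = log(len(out)) if out else 0.0
        for v in out:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return order
```

Pages reachable through low-out-degree chains surface early, mimicking a surfer who follows focused links rather than link farms.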
WPR: A Weighted Approach to PageRank
Pedram Ghodsnia, Ali Mohammad Zareh Bidoki, Nasser Yazdani
Abstract – The PageRank algorithm, used by Google as a successful ranking algorithm, can be interpreted as leveraging the recommendations of all the other page creators on the web about how important a page is. It does not, however, take advantage of the recommendations of page visitors. In PageRank, every page creator propagates the importance score of her/his page to its outgoing links uniformly. In this paper a revised version of PageRank called Weighted PageRank (WPR) is proposed, in which this uniform propagation is transformed into a weighted one. Weights are assigned to outgoing links based on the average opinion of page visitors about the importance of pages, recorded from search engine logs indicating which search results were clicked most. We demonstrate that our approach is readily applicable without any significant extra time or storage cost compared to PageRank.
International Review on Computers and Software (IRECOS), Vol. 3, No. 1, pp. 99-109, January 2008.
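The change WPR makes to PageRank's propagation step can be sketched directly: instead of splitting a page's score uniformly over its out-links, split it in proportion to click-derived weights. This is a simplified power iteration (dangling pages and the exact weight formula from the paper are omitted), and the function name and data shapes are assumptions.

```python
def weighted_pagerank(links, clicks, d=0.85, iters=50):
    """PageRank where each page splits its score over its out-links in
    proportion to click counts instead of uniformly.
    links:  {page: [outlinked pages]}
    clicks: {(src, dst): click count from search logs} (default weight 1)."""
    pages = set(links) | {v for outs in links.values() for v in outs}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / len(pages) for p in pages}
        for u, outs in links.items():
            total = sum(clicks.get((u, v), 1) for v in outs)
            for v in outs:
                # Weighted propagation: the only line that differs from PageRank.
                new[v] += d * rank[u] * clicks.get((u, v), 1) / total
        rank = new
    return rank
```

With uniform clicks the weights collapse to 1/outdegree and the sketch reduces to ordinary PageRank, which is why the extra time and storage cost is small.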
|Articles published in, or accepted by refereed conferences|
Parallel I/O Aware Query Optimization
Pedram Ghodsnia, Ivan T. Bowman, Anisoara Nica
Abstract – New trends in the storage industry suggest that in the near future a majority of hard disk drive-based storage subsystems will be replaced by solid state drives (SSDs). Database management systems can substantially benefit from the superior I/O performance of SSDs.
Although the impact of using SSDs in query processing has been studied in the past, exploiting the I/O parallelism of SSDs in query processing and optimization has not received enough attention. In this paper, we first show why the query optimizer needs to be aware of the benefit of I/O parallelism in solid state drives. We characterize the benefit of exploiting I/O parallelism in database scan operators in SAP SQL Anywhere and propose a novel, general I/O cost model that considers the impact of device I/O queue depth in I/O cost estimation. We show that using this model, the best plans found by the optimizer are much closer to optimal. The proposed model is implemented in SAP SQL Anywhere. The model, defined dynamically by a calibration process, summarizes the behavior of the I/O subsystem without any prior knowledge of the type and number of devices used in the storage subsystem.
To appear in Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2014, June 22-27, 2014, Snowbird, Utah, USA.
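The idea of a calibrated, device-agnostic I/O cost model can be sketched as follows: measure per-page cost at a few queue depths, then interpolate between the calibrated points at optimization time. This is a simplified illustration of the calibration approach, not SAP SQL Anywhere's actual model; all names and numbers are assumptions.

```python
def make_io_cost_model(calibration):
    """calibration: {queue_depth: measured ms per page} from a one-time
    calibration run against the live storage subsystem. Returns a cost
    function that linearly interpolates between calibrated depths, so no
    prior knowledge of the device type or count is needed."""
    depths = sorted(calibration)

    def cost(pages, queue_depth):
        # Clamp to the calibrated range, then interpolate per-page cost.
        q = max(depths[0], min(queue_depth, depths[-1]))
        for lo, hi in zip(depths, depths[1:]):
            if lo <= q <= hi:
                frac = (q - lo) / (hi - lo) if hi != lo else 0.0
                per_page = calibration[lo] + frac * (calibration[hi] - calibration[lo])
                return pages * per_page
        return pages * calibration[depths[0]]  # single calibration point

    return cost
```

A queue-depth-aware cost function like this lets the optimizer prefer plans whose scans issue enough concurrent I/Os to exploit an SSD's internal parallelism.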
An In-GPU-Memory Column-Oriented Database for Processing Analytical Workloads
Pedram Ghodsnia
Abstract – Due to the ever increasing demand for fast processing of large analytical workloads, main memory column-oriented databases have attracted a lot of attention in recent years. In-memory databases eliminate the disk I/O barrier by storing the data in memory. In addition, they utilize a column-oriented data layout to offer a multi-core-friendly and memory-bandwidth-efficient processing scheme. On the other hand, graphics processing units (GPUs) have recently emerged as powerful tools for general high-performance computing. GPUs are affordable and energy-efficient devices that deliver massive computational power by utilizing a large number of cores and a high memory bandwidth, and they can be used as co-processors for query acceleration in in-memory databases. One of the main bottlenecks in GPU acceleration of in-memory databases is the need for data to be transferred back and forth between GPU memory and RAM over a low-bandwidth PCIe bus. To address this problem, this study proposes a new generation of in-memory databases that, instead of keeping data in main memory, stores it in GPU device memory.
In Proceedings of the VLDB 2012 PhD Workshop, 27-31 August, 2012, Istanbul, Turkey.
COCA Filters: Co-Occurrence Aware Bloom Filters
Kamran Tirdad, Pedram Ghodsnia, J. Ian Munro, and Alejandro Lopez-Ortiz
Abstract – We propose an indexing data structure based on a novel variation of Bloom Filters. Signature Files have been proposed in the past as a method to index large text databases, though they suffer from a high false positive error problem. In this paper we introduce COCA Filters, a new type of Bloom Filters which exploits the co-occurrence probability of words in documents to reduce the false positive error. We show experimentally that by using this technique we can reduce the false positive error by up to 21 times for the same index size. Furthermore, Bloom Filters can be replaced by COCA Filters wherever the co-occurrence of any two members of the universe is identifiable.
In Proceedings of SPIRE 2011, October 17-21, 2011, Pisa, Italy.
(This paper won the Best Student Paper Award.)
Dynamic Data Allocation with Replication in Distributed Systems
Shahin Kamali, Pedram Ghodsnia, Khuzaima Daudjee
Abstract – Fragment allocation is an important problem in modern distributed database systems. The goal of fragment allocation is to allocate data fragments to a network of sites so as to minimize the overall data transmission cost incurred to satisfy queries. We consider the problem of fragment allocation and address both placement and replication issues in an integrated approach. While replication can improve system performance via increased locality, excessive replication can incur extra transmission cost to maintain consistency. A comprehensive model that takes into account network topology, fragment correlation and data access patterns is presented. Based on this model, two algorithms are proposed to find near-optimal dynamic allocation solutions. Experimental results show the efficacy of the proposed solutions.
In Proceedings of the IPCCC'11, November 17-19, 2011, Orlando, USA.
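The placement half of the problem can be sketched as a greedy baseline: given access frequencies and inter-site transmission costs, host each fragment at the site that minimizes its total access cost. This is a simplified non-replicated baseline for illustration, not one of the paper's two algorithms; the function name and data shapes are assumptions.

```python
def greedy_allocate(access, dist):
    """Greedy single-copy fragment placement.
    access[f][s]: how often site s accesses fragment f
    dist[s][t]:   per-unit transmission cost between sites s and t
    Returns {fragment: chosen host site}."""
    placement = {}
    for f, freqs in access.items():
        def total_cost(host):
            # Cost of serving all accesses to f from this candidate host.
            return sum(n * dist[host][s] for s, n in freqs.items())
        placement[f] = min(dist, key=total_cost)
    return placement
```

The paper's integrated approach goes further by also deciding when extra replicas pay for their consistency-maintenance cost, which this greedy sketch ignores.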
Statistical Evaluation of the Effect of Some Persian Language Characteristics on the Recall Factor of Search Engine Results
Pedram Ghodsnia, Ali Mohammad Zareh Bidoki, Nasser Yazdani
Abstract – With the growth of Farsi web content in recent years, Farsi has become the primary language of Iranians for communication, learning and the interchange of ideas on the Internet. Despite recent studies on the challenges and restrictions of the Farsi language and their effect on the quality of search results, there is no exact statistical analysis of the role of each problem in decreasing that quality. In this paper, using a real collection of Farsi web pages, we present experimental results on the role of some of the most important problems and restrictions of the Farsi language in decreasing the recall factor of search results in search engines.
In Proceedings of the 13th National CSI Computer Conference, March 9-11, 2008, Kish Island, Persian Gulf, Iran.
FICA: A Fast Intelligent Crawling Algorithm
Ali Mohammad Zareh Bidoki, Nasser Yazdani, Pedram Ghodsnia
Abstract – Due to the proliferation and highly dynamic nature of the web, an efficient crawling and ranking algorithm for retrieving the most important pages remains a challenging issue. Several algorithms such as PageRank and OPIC have been proposed, but they have high time complexity. In this paper, an intelligent crawling algorithm based on reinforcement learning, called FICA, is proposed that models a real surfing user. The priority for crawling pages is based on a concept we call logarithmic distance. FICA is easy to implement and its time complexity is O(E log V), where V and E are the numbers of nodes and edges in the web graph, respectively. Comparison of FICA with other proposed algorithms shows that it outperforms them in discovering highly important pages. Furthermore, FICA computes the importance (ranking) of each page during the crawling process, so it can also be used as a ranking method for computing page importance. We used the UK web graph for our experiments.
In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, November 2-5, 2007, Silicon Valley, USA.
A Punishment/Reward Based Approach to Ranking
Pedram Ghodsnia, Ali Mohammad Zareh Bidoki, Nasser Yazdani
Abstract – One of the important challenges in current search engines is dealing with the "rich get richer" problem. In popularity-based ranking algorithms like PageRank, which use the structure of the web as the measure for ranking pages, newly created but high-quality pages are effectively shut out and can take a very long time to become popular. In this paper we present a new punishment/reward based approach that adds a new dimension to the PageRank model to reduce the effect of the rich-get-richer problem using implicit feedback from visitors. In this approach, in addition to considering the structure of links as the page creator's point of view, we use the page visitor's view as an important parameter to improve the accuracy of the PageRank algorithm.
In Proceedings of the ACM Second International Conference on Scalable Information Systems (INFOSCALE 2007), June 6-8, 2007, Suzhou, China.
Novel Way of Determining the Optimal Location of a Fragment in a DDBS: BGBR
Ashkan Bayati, Pedram Ghodsnia, Masood Rahgozar, Reza Baseda
Abstract – This paper addresses the problem of determining the optimal location to place a fragment (object) in a distributed non-replicated database. The algorithm takes into consideration a changing environment with changing access patterns. The paper contributes by allocating data fragments to their optimal location in a distributed network based on the access patterns for each fragment. The mechanism for achieving this optimality relies on knowing the costs involved in moving data fragments from one site to another. Embedded in the algorithm is a mechanism to prioritize fragments, so that if there is limited space on a node where many fragments are to be allocated, the higher-priority fragments are placed before the lower-priority ones.
In Proceedings of the IEEE ICSNC'06, Tahiti, French Polynesia, 2006.
|Selected non-refereed articles|
UTUtd 2004 Team Description
Mostafa Hadian Dehkordi, Peyman Zarrineh, Pedram Ghodsnia, Farid AmirGhiasvand, Hesamaddin Torabi Dashtii
Abstract – In this paper, we briefly introduce the main ideas behind the development of UTUtd2004. The low-level abilities employed in designing our team are mainly based on the Tsinghuaeolus 2002 team. Different parts of the human brain are described in terms of the Balkenius model of learning, which has been presented as an emotional learning model. The model is used to simulate a soccer player who can predict and decide his next move or action. The paper further presents an approach toward GCGA and a brief solution. The player in this simulation is a trainable robot for a RoboCup team.
8th RoboCup International Competitions, 2004, Lisbon, Portugal.
Converting REO into SMV
Pedram Ghodsnia, Sadra Abedinzadeh, Ashkan Bayati, Alireza Vazifehdoost
Abstract – In recent years, significant gains have been made in formally verifying systems using symbolic model checking. Reo, an exogenous coordination language, provides the ability to formally specify the behavior and requirements of a great range of applications. Although Reo is an extremely powerful tool for modeling, it lacks the ability to verify a model's logic. In this paper we introduce a mechanism to establish properties of hardware or software designs that have been formally specified via Reo, through the use of a model checker. The model checker we have used is NuSMV.
University of Tehran, May 2006.
Efficient Packet Forwarding on a Multi-Processor Parallel and Distributed Shared Memory Router
Pedram Ghodsnia, Ashkan Bayati
Abstract – As link speeds continue to grow exponentially, packet forwarding at network switches and routers is becoming a bottleneck. Designing a high-performance router capable of meeting today's industrial needs requires close interaction between the design of the architecture and the operating system. Given the low speed of memory and the amount of I/O used in a router, the most important aspects of designing such an operating system are memory and I/O management. In this paper we combine an effective memory management and I/O management framework that reaps the benefits of the underlying architecture.
University of Tehran, May 2006.
TLA and TLA+: A Survey
Abstract – TLA+ is a formal specification language based on set theory, first-order logic, and TLA (the Temporal Logic of Actions). TLA is a version of temporal logic developed for describing and reasoning about concurrent and reactive systems. TLA+ includes modules and ways of combining them to form larger specifications. In this paper we provide a brief introduction to TLA and TLA+, present the currently available tools for model checking TLA+ specifications, and review some related work and case studies on the usage of TLA+ for formal verification of systems.
David Cheriton School of Computer Science, University of Waterloo, July 2009.
- Pedram Ghodsnia, “Parallel Gaussian Elimination”, project report, David Cheriton School of Computer Science, University of Waterloo, July 2009.
- Pedram Ghodsnia, “Performance improvement of Search Engines using analytical techniques of web graph”, Master’s thesis, School of Electrical and Computer Engineering, University of Tehran, 2008.
- Pedram Ghodsnia, “A survey on Spatial Methods”, Technical report, Database Research Group, School of Electrical and Computer Engineering, University of Tehran, 2006.
- Pedram Ghodsnia, "Learning Linux", Computer Javan Magazine, Volumes: 37, 38, 39 and 40, 2001.