Nowadays, many computer systems gather independent individuals, who cooperate by sharing their resources. Grids, peer-to-peer networks, web2.0 sites exist only thanks to contributing users. However, as such systems gather independent individuals, they face problems similar to the problems faced by modern democratic societies. Firstly, individuals are inherently selfish. Secondly, the access to shared resources should be equitable.
I am interested in modeling and analysis of such systems with precise, mathematical tools, such as game-theory and equitable optimization. Usually, a system with selfish members is less efficient than a fully centralized one. Resource management in distributed systems seems to be even more sensitive to decentralization. In grids, we proved that the drop of performance (the price of anarchy) depends linearly on the number of jobs. We have similar results in peer-to-peer storage systems.
These systems can achieve acceptable performance only through limiting the control the members have over their resources.
The following list is organized by areas of interest; if you need a list organized by publication type (journal, conference, etc.), get my CV.
To successfully complete a complex project, be it a construction of an airport or of a backbone IT system, agents (companies or individuals) must form a team having required competences and resources. A team can be formed either by the project issuer based on individual agents' offers (centralized formation); or by the agents themselves (decentralized formation) bidding for a project as a consortium-in that case many feasible teams compete for the contract. We investigate rational strategies of the agents (what salary should they ask? with whom should they team up?). We propose concepts to characterize the stability of the winning teams and study their computational complexity.
In geographically-distributed systems, communication latencies are non-negligible. The perceived processing time of a request is thus composed of the time needed to route the request to the server and the true processing time. Once a request reaches a target server, the processing time depends on the total load of that server; this dependency is described by a load function. We consider a broad class of load functions; we just require that they are convex and two times differentiable. In particular our model can be applied to heterogeneous systems in which every server has a different load function. We present optimization centralized and a decentralized algorithms for load balancing. We prove bounds on the algorithms' convergence. To the best of our knowledge these bounds were not known even for the special cases studied previously (queuing theory and batches of requests). Both algorithms are any-time and self-stabilizing algorithms.
In data centers, many tasks (services, virtual machines or computational jobs) share a single physical machine. We propose a new resource management model for such colocation. Our model uses two parameters of a task-its size and its type-to characterize how a task influences the performance of the other tasks allocated on the same machine. As typically a data center hosts many similar, recurring tasks (e.g.: a webserver, a database, a CPU-intensive computation), the resource manager should be able to construct these types and their performance interactions. Moreover, realistic variants of our model are polynomially-solvable, in contrast to the NP-hard vector packing used previously. In particular, we minimize the total cost in a model in which each task's cost is a function of the total sizes of tasks allocated on the same machine (each type is counted separately). We show that for a linear cost function the problem is strongly NP-hard, but polynomially-solvable in some particular cases. We propose an algorithm polynomial in the number of tasks (but exponential in the number of types and machines); and another algorithm polynomial in the number of tasks and machines (but exponential in the number of types and admissible sizes of tasks). When there is a single type, we give a polynomial time algorithm. We also prove that, even for a single type, the problem becomes NP-hard for convex costs.
Energy consumption has become one of the most important factors in High Performance Computing platforms. However, while there are various algorithmic and programming techniques to save energy, a user has currently no incentive to employ them, as they might result in worse performance. We propose to manage the energy budget of a supercomputer through EnergyFairShare (EFS), a FairShare-like scheduling algorithm. FairShare is a classic scheduling rule that prioritizes jobs belonging to users who were assigned small amount of CPU-second in the past. Similarly, EFS keeps track of users' consumption of Watt-seconds and prioritizes those whom jobs consumed less energy. Therefore, EFS incentives users to optimize their code for energy efficiency. Having higher priority, jobs have smaller queuing times and, thus, smaller turn-around time. To validate this principle, we implemented EFS in a scheduling simulator and processed workloads from various HPC centers. The results show that, by reducing her energy consumption, a user will reduce her stretch (slowdown), compared to increasing her energy consumption. To validate the general feasibility of our approach, we also implemented EFS as an extension for SLURM, a popular HPC resource and job management system. We validated our plugin both by emulating a large scale platform, and by experiments upon a real cluster with monitored energy consumption. We observed smaller waiting times for energy efficient users.
To successfully complete a complex project, be it a construction of an airport or of a backbone IT system or crowd-sourced projects, agents (companies or individuals) must form a team (a coalition) having required competences and resources. A team can be formed either by the project issuer based on individual agents' offers (centralized formation); or by the agents themselves (decentralized formation) bidding for a project as a consortium-in that case many feasible teams compete for the employment contract. In these models, we investigate rational strategies of the agents (what salary should they ask? with whom should they team up?) under different organizations of the market. We propose various concepts allowing to characterize the stability of the winning teams. We show that there may be no (rigorously) strongly winning coalition, but the weakly winning and the auction-winning coalitions are guaranteed to exist. In a general setting, with an oracle that decides whether a coalition is feasible, we show how to find winning coalitions with a polynomial number of calls to the oracle. We also determine the complexity of the problem in a special case in which a project is a set of independent tasks. Each task must be processed by a single agent, but processing speeds differ between agents and tasks.
Campaign Scheduling is characterized by multiple job submissions issued from multiple users over time. This model perfectly suits today's systems since most available parallel environments have multiple users sharing a common infrastructure. When scheduling individually the jobs submitted by various users, one crucial issue is to ensure fairness. This work presents a new fair scheduling algorithm called OStrich whose principle is to maintain a virtual time-sharing schedule in which the same amount of processors is assigned to each user. The completion times in the virtual schedule determine the execution order on the physical processors. Then, the campaigns are interleaved in a fair way by OStrich. For independent sequential jobs, we show that OStrich guarantees the stretch of a campaign to be proportional to campaign's size and the total number of users. The stretch is used for measuring by what factor a workload is slowed down relative to the time it takes on an unloaded system. The theoretical performance of our solution is assessed by simulating OStrich compared to the classical FCFS algorithm, issued from synthetic workload traces generated by two different user profiles. This is done to demonstrate how OStrich benefits both types of users, in contrast to FCFS.
We consider the problem of fair scheduling in a multi-organizational system in which organizations are contributing their own resources to the global pool and the jobs to be processed on the common resources. We consider on-line, non-clairvoyant scheduling of sequential jobs without preemption. The organizations must agree on the order of executing the jobs, i.e. on the scheduling algorithm. To ensure that the organizations are willing to cooperate the scheduling algorithm must be fair.
To characterize fairness, we use cooperative game theory approach. The contribution of an organization is computed based on how this organization influences the utility (which can be any metric, e.g., flow time, turnaround, resource allocation, etc.) of all organizations. Formally, the contribution of the organization is its Shapley value in the cooperative game. The scheduling algorithm should ensure that the contributions of the organizations are close to their utilities. Our previous work proves that this problem is NP-hard and hard to approximate.
In this paper we propose a heuristic scheduling algorithm for the fair scheduling problem. We experimentally evaluate the heuristic and compare its fairness to other algorithms (fair share, round robin) and the exact exponential algorithm. Our results show that fairness of the heuristic algorithm is close to the optimal. The difference between our heuristic and the fair share algorithm is more visible on longer traces with more organizations. We believe that these results prove that fair share might not be an optimal solution for multi-organizational systems.
We consider a multi-organizational system in which each organization contributes processors to the global pool but also jobs to be processed on the common resources. The fairness of the scheduling algorithm is essential for the stability and even for the existence of such systems (as organizations may refuse to join an unfair system).
We consider on-line, non-clairvoyant scheduling of sequential jobs. The started jobs cannot be stopped, canceled, preempted, or moved to other processors. We consider identical processors, but most of our results can be extended to related or unrelated processors.
We model the fair scheduling problem as a cooperative game and we use the Shapley value to determine the ideal fair schedule. In contrast to the current literature, we do not use money to assess the relative utilities of jobs. Instead, to calculate the contribution of an organization, we determine how the presence of this organization influences the performance of other organizations. Our approach can be used with arbitrary utility function (e.g., flow time, tardiness, resource utilization), but we argue that the utility function should be strategy resilient. The organizations should be discouraged from splitting, merging or delaying their jobs. We present the unique (to within a multiplicative and additive constants) strategy resilient utility function.
We show that the problem of fair scheduling is NP-hard and hard to approximate. However, for unit-size jobs, we present a fully polynomial-time randomized approximation scheme (FPRAS). We also show that the problem parametrized with the number of organizations is fixed parameter tractable (FPT). In cooperative game theory, the Shapley value is considered in many contexts as “the” fair solution. Our results show that, although the problem for the large number of organizations is computationally hard, this solution concept can be used in scheduling (for instance, as a benchmark for measuring fairness of heuristic algorithms).
We consider a geographically distributed request processing system composed of various organizations and their servers connected by the Internet. The latency a user observes is a sum of communication delays and the time needed to handle the request on a server. The handling time depends on the server congestion, i.e. the total number of requests a server must handle. We analyze the problem of balancing the load in a network of servers in order to minimize the total observed latency. We consider both cooperative and selfish organizations (each organization aiming to minimize the latency of the locally-produced requests). The problem can be generalized to the task scheduling in a distributed cloud; or to content delivery in an organizationally-distributed CDNs.
In a cooperative network, we show that the problem is polynomially solvable. We also present a distributed algorithm iteratively balancing the load. We show how to estimate the distance between the current solution and the optimum based on the amount of load exchanged by the algorithm. During the experimental evaluation, we show that the distributed algorithm is efficient, therefore it can be used in networks with dynamically changing loads.
In a network of selfish organizations, we prove that the price of anarchy (the worst-case loss of performance due to selfishness) is low when the network is homogeneous and the servers are loaded (the request handling time is high compared to the communication delay). After relaxing these assumptions, we assess the loss of performance caused by the selfishness experimentally, showing that it remains low.
Our results indicate that a set of servers handling requests, connected in a heterogeneous network, can be efficiently managed by a distributed algorithm. Additionally, even if the network is organizationally distributed, with individual organizations optimizing performance of their requests, the network remains efficient.
The distributed nature of new computing platforms results in the problem of scheduling parallel jobs produced by several independent organizations that have each their own rules. They have no direct control over the whole system, thus it is necessary to revisit classical scheduling with locality constraints. In this article, we consider distributed computing systems in which each organization has its own resources. Each organization aims at minimizing the execution times of its own jobs. We introduce a global centralized mechanism for designing a collaborative solution that improves the global performance of the system while respecting organizations' selfish objectives. The proposed algorithm is proved to have an approximation ratio equal to 3 over the global optimal makespan and this bound is shown to be asymptotically tight (when the number of organizations is large). Several variants of this problem are also studied. Then, we derive another algorithm that improves in practice these solutions by further balancing the schedules. Finally, we provide some experiments based on simulations that demonstrate a very good efficiency of this last algorithm on typical instances.
The distributed nature of the grid results in the problem of scheduling parallel jobs produced by several independent organizations that have partial control over the system. We consider systems in which each organization owns a cluster of processors. Each organization wants its tasks to be completed as soon as possible. In this paper, we model an off-line system consisting of N identical clusters of m processors. We show that it is always possible to produce a collaborative solution that respects participants' selfish goals, at the same time improving the global performance of the system. We propose an algorithm (called MOLBA) with a guaranteed worst-case performance ratio on the global makespan, equal to 4. Next, we show that a better bound (equal to 3) can be obtained in a specific case when the last completed job requires at most m-2 processors. Then, we derive another algorithm (called ILBA) that in practice improves the proposed, guaranteed solution by further balancing the schedules. Finally, by an extensive evaluation by simulation, we show that the algorithms are efficient on typical instances.
In distributed computing the recent paradigm shift from centrally-owned clusters to organizationally distributed computational grids introduces a number of new challenges in resource management and scheduling. In this work, we study the problem of Selfish Load Balancing which extends the well-known Load Balancing (LB) problem to scenarios in which each processor is concerned only with the performance of its local jobs. We propose a simple mathematical model for such systems and a novel function for computing the cost of the execution of foreign jobs. Then, we use the game-theoretic framework to analyze the model in order to compute the expected result of LB performed in a grid formed by two clusters. We show that, firstly, LB is a socially-optimal strategy, and secondly, for similarly loaded clusters, it is sufficient to collaborate during longer time periods in order to make LB the dominant strategy for each cluster. However, we show that if we allow clusters to make decisions depending on their current queue length, LB will never be performed. Then, we propose a LB algorithm which balances the load more equitably, even in the presence of overloaded clusters. Our algorithms do not use any external forms of compensation (such as money). The load is balanced only by considering the parameters of execution of jobs. This analysis is assessed experimentally by simulation, involving scenarios with multiple clusters and heterogeneous load.
We present a novel, generic model of the grid that emphasises the roles of individual organizations that form the system. The model allows us to study the global behaviour of the system without introducing external forms of recompense. Using game-theory and equitable multicriteria optimization, we study three diverse types of computational grids: an off-line system with dedicated uniprocessors, an on-line system with divisible load and an off-line system with parallel jobs. Results show that, unless strong assumptions are made, the complete decentralization leads to a significant loss of performance.
We study two problems directly resulting from organizational decentralization of the Grid. Firstly, the problem of fair scheduling in systems in which the grid scheduler has complete control of processors' schedules. Secondly, the problem of fair and feasible scheduling in decentralized case, in which the grid scheduler can only suggest a schedule, which can be later modified by a processor's owner. Using game theory, we show that scheduling in decentralized case is analogous to the Prisoner's Dilemma game. Moreover, the Nash equilibrium results in significant performance drop. Therefore, a strong community control is required to achieve acceptable performance.
We present a new method for determining placements of flexible reservation requests into a schedule. For each considered placement the what-if method inserts a placeholder into the schedule and simulates the processing of batch jobs currently known to the system. Each placement is evaluated wrt. well-known scheduling metrics. This information may be used by a Grid reservation service to choose the most likely successful placement of a reservation. According to the results of extensive simulations, the what-if method grants more reservations and improves the performance of local jobs compared to our previously used load method.
P2P architecture is a viable option for enterprise backup. In contrast to dedicated backup servers, nowadays a standard solution, making backups directly on organization's workstations should be cheaper as existing hardware is used; more efficient as there is no single bottleneck server; and more reliable as the machines can be geographically dispersed.
We present an architecture of a p2p backup system that uses pairwise replication contracts between a data owner and a replicator. In contrast to a standard p2p storage system using directly a DHT, the contracts allow our system to optimize replicas' placement depending on a specific optimization strategy, and so to take advantage of the heterogeneity of the machines and the network. Such optimization is particularly appealing in the context of backup: replicas can be geographically dispersed, the load sent over the network can be minimized, or the optimization goal can be to minimize the backup/restore time. However, managing the contracts, keeping them consistent and adjusting them in response to dynamically changing environment is challenging.
We built a scientific prototype and ran experiments on 150 workstations in our university's computer laboratories and, separately, on 50 PlanetLab nodes. We found out that the main factor affecting the performance of the system is the availability of the machines. Yet, our main conclusion is that it is possible to build an efficient and reliable backup system on highly unavailable machines, as our computers had just 13% average availability.
In a decentralized storage system, agents replicate each other's data in order to increase availability. Compared to organizationally-centralized solutions, such as cloud storage, a decentralized storage system requires less trust in the provider and may result in smaller monetary costs. Our system is based on reciprocal storage contracts which allow the agents to adopt to changes in their replication partners' availability (by dropping inefficient contracts and forming new contracts with other partners). The data availability provided by the system is a function of the participating agents' availability. However, a straightforward system in which agents' matching is decentralized uses the given agent availability inefficiently. As agents are autonomous, the highly-available agents form cliques replicating data between each other, which makes the system too hostile for the weakly-available newcomers. In contrast, a centralized, equitable matching is not incentive-compatible: it does not reward users for keeping their software running.
We solve this dilemma by a mixed solution: an “adoption” mechanism in which highly-available agents donate some replication space, which in turn is used to help the worst-off agents. We show that the adoption motivates agents to increase their availability (is incentive-compatible); but also that it is sufficient for acceptable data availability for weakly-available agents.
P2P architecture is a viable option for enterprise backup. In contrast to dedicated backup servers, nowadays a standard solution, making backups directly on organization's workstations should be cheaper (as existing hardware is used), more efficient (as there is no single bottleneck server) and more reliable (as the machines are geographically dispersed).
We present the architecture of a p2p backup system that uses pairwise replication contracts between a data owner and a replicator. In contrast to standard p2p storage systems using directly a DHT, the contracts allow our system to optimize replicas' placement depending on a specific optimization strategy, and so to take advantage of the heterogeneity of the machines and the network. Possible goals include geographical dispersion of the replicas, minimization of backup or restore time or the usage of long-range links. However, managing the contracts, keeping them consistent and adjusting them in response to dynamically changing environment is challenging.
We built a scientific prototype and ran the experiments on 150 workstations in the university's computer laboratories and, separately, on 50 PlanetLab nodes. We found out that the main factor affecting the quality of the system is the availability of the machines. Yet, our main conclusion is that it is possible to build an efficient and reliable backup system on highly unreliable machines (our computers had just 13% average availability).
Recent portable devices, from sophisticated mobile phones, to netbooks, thanks to wireless networking and powerful batteries, give hardware support for collaborative work on the go, even when the Internet connection is not available. Yet, current collaboration software, from Concurent Versions System (CVS) or Subversion (SVN) repositories to Google Docs and collaborative versions of commercial applications (CoWord, CoMaya), requires a dedicated server to synchronize clients, and thus a stable network connection. In this chapter, we present two tools that use peer-to-peer paradigm to build serverless collaboration networks. PBDMS enables users to share, search and review bibliographic databases. SharedMind provides collaborative document editing to FreeMind, popular, open source mind-mapping software. Both tools handle disconnections and network divisions, enabling users to continue their work and to synchronize with their reachable peers. Both tools have been implemented and tested in small scale. PBDMS is available for download at http://code.google.com/p/bibliographicsocialinfosys/; SharedMind is available at http://code.google.com/p/sharedmind . We believe that such seamless, flexible collaboration applications provide the degree of freedom promised by the recent portable devices, yet not fully used by the current applications.
In peer-to-peer storage systems, peers replicate each others' data in order to increase availability. If the matching is done centrally, the algorithm can optimize data availability in an equitable manner for all participants. However, if matching is decentralized, the peers' selfishness can greatly alter the results, leading to performance inequities that can render the system unreliable and thus ultimately unusable. We analyze the problem using both theoretical approaches (complexity analysis for the centralized system, game theory for the decentralized one) and simulation. We prove that the problem of optimizing availability in a centralized system is NP-hard. In decentralized settings, we show that the rational behavior of selfish peers will be to replicate only with similarly-available peers. Compared to the socially-optimal solution, highly available peers have their data availability increased at the expense of decreased data availability for less available peers. The price of anarchy is high: unbounded in one model, and linear with the number of time slots in the second model. We also propose centralized and decentralized heuristics that, according to our experiments, converge fast in the average case. The high price of anarchy means that a completely decentralized system could be too hostile for peers with low availability, who could never achieve satisfying replication parameters. Moreover, we experimentally show that even explicit consideration and exploitation of diurnal patterns of peer availability has a small effect on the data availability-except when the system has truly global scope. Yet a fully centralized system is infeasible, not only because of problems in information gathering, but also the complexity of optimizing availability. The solution to this dilemma is to create system-wide cooperation rules that allow a decentralized algorithm, but also limit the selfishness of the participants.
Current implementations of real-time collaborative applications rely on a dedicated infrastructure to carry out all synchronizing and communication functions, and require all end nodes to communicate directly with and through the central server. In this paper, we investigate scenarios, in which the most resource intensive functionality of continuous communication among collaborators to disseminate changes is decentralized, utilizing the end users as relays. We observe that communication characteristics of real-time collaboration makes use of existing multicast mechanisms unsuitable. As collaborative editing sessions are typically long, we are able to gather and then use additional parameters of nodes (their instabilities and frequency of sending updates) and communication links (latencies and average costs). We identify several criteria to determine the quality of a multicast tree: cost, latency and instability. We analyze the complexity of these problems and propose algorithms to optimize the communication topology. We also consider the multiobjective problem in which we search for a tree that results in a good trade-off between these measures. Validation of algorithms on numerous graphs shows that it is important to consider the multiobjective problem, as optimal solutions for one performance measure can be far from optimal values of the others. Finally, we briefly present an implementation of a communication library that uses the proposed algorithms to periodically adjust the dissemination tree.
Current implementations of real-time collaborative applications rely on a dedicated infrastructure to carry out all synchronizing and communication functions, and require all end nodes to communicate directly with and through the central server. In this paper, we investigate an architecture, in which the most resource intensive functionality of continuous communication among collaborators to disseminate changes is decentralized, utilizing the end users as relays. We observe that communication characteristics of real-time collaboration makes use of existing multicast mechanisms unsuitable. As collaborative editing sessions are typically long, we are able to gather and then use additional parameters of nodes (their instabilities and frequency of sending updates) and communication links (latencies and average costs). We identify several criteria to determine the quality of a multicast tree: cost, latency and instability. We analyze the complexity of these problems and propose algorithms to optimize the communication topology. We also consider the multiobjective problem in which we search for a tree that results in a good trade-off between these measures. Validation of algorithms on numerous graphs shows that it is important to consider the multiobjective problem, as optimal solutions for one performance measure can be far from optimal values of the others.
Models of computational trust support users in taking decisions. They are commonly used to guide users' judgements in online auction sites; or to determine quality of contributions in Web 2.0 sites. However, most existing systems require historical information about the past behavior of the specific agent being judged. In contrast, in real life, to anticipate and to predict a stranger's actions in absence of the knowledge of such behavioral history, we often use our "instinct"- essentially stereotypes developed from our past interactions with other "similar" persons. In this paper, we propose StereoTrust, a computational trust model inspired by stereotypes as used in real-life. A stereotype contains certain features of agents and an expected outcome of the transaction. When facing a stranger, an agent derives its trust by aggregating stereotypes matching the stranger's profile. Since stereotypes are formed locally, recommendations stem from the trustor's own personal experiences and perspective. Historical behavioral information, when available, can be used to refine the analysis. According to our experiments using Epinions.com dataset, StereoTrust compares favorably with existing trust models that use different kinds of information and more complete historical information.
Trust plays important roles in diverse decentralized environments, including our society at large. Computational trust models help to, for instance, guide users' judgements in online auction sites about other users; or determine quality of contributions in web 2.0 sites. Most of the existing trust models, however, require historical information about past behavior of a specific agent being evaluated - information that is not always available. In contrast, in real life interactions among users, in order to make the first guess about the trustworthiness of a stranger, we commonly use our "instinct" - essentially stereotypes developed from our past interactions with "similar" people. We propose StereoTrust, a computational trust model inspired by real life stereotypes. A user forms stereotypes using her previous transactions with other agents. A stereotype contains certain features of agents and an expected outcome of the transaction. These features can be taken from agents' profile information, or agents' observed behavior in the system. When facing a stranger, the stereotypes matching stranger's profile are aggregated to derive his expected trust. Additionally, when some information about stranger's previous transactions is available, StereoTrust uses it to refine the stereotype matching. According to our experiments, StereoTrust compares favorably with existing trust models that use different kind of information and more complete historical information. Moreover, because evaluation is done according to user's personal stereotypes, the system is completely distributed and the result obtained is personalized. StereoTrust can be used as a complimentary mechanism to provide the initial trust value for a stranger, especially when there is no trusted, common third parties.
generated by bibtex2html 1.96.