CIMAR, NIMAR, and LMMA: novel algorithms for thread and memory migrations in user space on NUMA systems using hardware counters

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Electrónica e Computación
dc.contributor.area: Área de Enxeñaría e Arquitectura
dc.contributor.author: Laso Rodríguez, Rubén
dc.contributor.author: García Lorenzo, Óscar
dc.contributor.author: Cabaleiro Domínguez, José Carlos
dc.contributor.author: Fernández Pena, Anselmo Tomás
dc.contributor.author: Lorenzo del Castillo, Juan Ángel
dc.contributor.author: Fernández Rivera, Francisco
dc.date.accessioned: 2022-04-04T12:17:51Z
dc.date.available: 2022-04-04T12:17:51Z
dc.date.issued: 2022
dc.description.abstract: This paper introduces two novel algorithms for thread migration, named CIMAR (Core-aware Interchange and Migration Algorithm with performance Record, or IMAR) and NIMAR (Node-aware IMAR), and a new algorithm for the migration of memory pages, LMMA (Latency-based Memory pages Migration Algorithm), in the context of Non-Uniform Memory Access (NUMA) systems. Such systems have complex memory hierarchies in which extracting the best possible performance is challenging, and thread and memory mapping play a critical role. The presented algorithms gather and process information provided by hardware counters to decide which migrations to perform, aiming to find the optimal mapping. They have been implemented as a user-space tool that seeks to improve system performance, particularly in, but not restricted to, scenarios where multiple programs with different characteristics are running. This approach has the advantage of requiring no modification to the target programs or the Linux kernel, while keeping a low overhead. Two different benchmark suites have been used to validate the algorithms: the NAS parallel benchmarks, mainly devoted to computational routines, and the LevelDB database benchmark, focused on read-write operations. These benchmarks illustrate the influence of the proposal on these two important types of codes. Note that these codes are state-of-the-art implementations of the routines, so little improvement could initially be expected. Experiments have been designed and conducted to emulate three different scenarios: a single program running in the system with full resources, an interactive server where multiple programs run concurrently with varying resource availability, and a queue of tasks where granted resources are limited. The proposed algorithms have been able to produce significant benefits, especially in systems with higher latency penalties for remote accesses. When more than one benchmark is executed simultaneously, performance improvements have been obtained, reducing execution times by up to 60%. In this kind of situation, the behaviour of the system is more critical, and the NUMA topology plays a more relevant role. Even in the worst case, when isolated benchmarks are executed using the whole system, that is, just one task at a time, performance is not degraded.
dc.description.peerreviewed: SI
dc.description.sponsorship: This research work has received financial support from the Ministerio de Ciencia e Innovación, Spain, within the project PID2019-104834GB-I00. It was also funded by the Consellería de Cultura, Educación e Ordenación Universitaria of Xunta de Galicia (accr. 2019–2022, ED431G 2019/04 and reference competitive group 2019–2021, ED431C 2018/19)
dc.identifier.citation: Future Generation Computer Systems 129 (2022) 18-32. https://doi.org/10.1016/j.future.2021.11.008
dc.identifier.doi: 10.1016/j.future.2021.11.008
dc.identifier.essn: 0167-739X
dc.identifier.uri: http://hdl.handle.net/10347/27894
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104834GB-I00/ES/COMPUTACION DE ALTAS PRESTACIONES Y CLOUD PARA APLICACIONES DE ALTO INTERES
dc.relation.publisherversion: https://doi.org/10.1016/j.future.2021.11.008
dc.rights: © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: NUMA
dc.subject: Scheduling
dc.subject: Thread migration
dc.subject: Memory migration
dc.subject: Hardware counters
dc.title: CIMAR, NIMAR, and LMMA: novel algorithms for thread and memory migrations in user space on NUMA systems using hardware counters
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 0faa7141-ea10-4a10-9414-45cd7b726fef
relation.isAuthorOfPublication: 1959c3e1-552e-4a0b-bc17-a5f9f687ad38
relation.isAuthorOfPublication: decb372f-b9cd-4237-8dda-2c0f5c40acbe
relation.isAuthorOfPublication: f905807b-c6bd-4e37-97d1-2e644fc5af62
relation.isAuthorOfPublication.latestForDiscovery: 0faa7141-ea10-4a10-9414-45cd7b726fef
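
The abstract above describes a user-space tool that migrates threads based on hardware-counter measurements without modifying the Linux kernel. As an illustrative sketch only (not the authors' CIMAR/NIMAR implementation), the following Python fragment shows the kind of user-space primitive such a tool can build on: Linux exposes thread placement through `os.sched_setaffinity`, and the `metric` callback here is a placeholder for per-thread hardware-counter data (e.g. remote-access latency obtained from perf events).

```python
import os

def migrate_worst_thread(tids, metric, target_cpu):
    """Pin the thread with the worst (highest) metric value to target_cpu.

    `metric` stands in for per-thread hardware-counter data; a real tool
    would sample it via the perf_event interface.  The migration itself
    needs no kernel changes: os.sched_setaffinity is a plain user-space
    call, and tid 0 refers to the calling thread.
    """
    worst = max(tids, key=metric)
    os.sched_setaffinity(worst, {target_cpu})
    return worst

# Demo: "migrate" the calling thread (tid 0) to one of its allowed CPUs.
target = min(os.sched_getaffinity(0))
moved = migrate_worst_thread([0], lambda tid: 1.0, target)
```

A full tool in the spirit of the paper would combine such calls with continuous counter sampling and NUMA-aware decision policies; for memory pages, the `move_pages(2)` system call plays the analogous user-space role to the LMMA-style page migrations the abstract mentions.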

Files

Original bundle

Name: 2022_fgcs_laso_cimar.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format
Description: Research article