CIMAR, NIMAR, and LMMA: novel algorithms for thread and memory migrations in user space on NUMA systems using hardware counters

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Electrónica e Computación
dc.contributor.area: Área de Enxeñaría e Arquitectura
dc.contributor.author: Laso Rodríguez, Rubén
dc.contributor.author: García Lorenzo, Óscar
dc.contributor.author: Cabaleiro Domínguez, José Carlos
dc.contributor.author: Fernández Pena, Anselmo Tomás
dc.contributor.author: Lorenzo del Castillo, Juan Ángel
dc.contributor.author: Fernández Rivera, Francisco
dc.date.accessioned: 2022-04-04T12:17:51Z
dc.date.available: 2022-04-04T12:17:51Z
dc.date.issued: 2022
dc.description.abstract: This paper introduces two novel algorithms for thread migration, named CIMAR (Core-aware Interchange and Migration Algorithm with performance Record, or IMAR) and NIMAR (Node-aware IMAR), and a new algorithm for the migration of memory pages, LMMA (Latency-based Memory pages Migration Algorithm), in the context of Non-Uniform Memory Access (NUMA) systems. Such systems have complex memory hierarchies in which extracting the best possible performance is challenging, and thread and memory mapping play a critical role. The presented algorithms gather and process information provided by hardware counters to decide which migrations to perform, aiming to find the optimal mapping. They have been implemented as a user-space tool that seeks to improve system performance, particularly in, but not restricted to, scenarios where multiple programs with different characteristics are running. This approach has the advantage of requiring no modification to the target programs or the Linux kernel, while keeping a low overhead. Two different benchmark suites have been used to validate the algorithms: the NAS parallel benchmarks, mainly devoted to computational routines, and the LevelDB database benchmark, focused on read-write operations. These benchmarks illustrate the influence of the proposal on these two important types of codes. Note that these codes are state-of-the-art implementations of the routines, so little improvement could initially be expected. Experiments have been designed and conducted to emulate three different scenarios: a single program running in the system with full resources, an interactive server where multiple programs run concurrently with varying resource availability, and a queue of tasks where granted resources are limited. The proposed algorithms have been able to produce significant benefits, especially in systems with higher latency penalties for remote accesses. When more than one benchmark is executed simultaneously, performance improvements have been obtained, reducing execution times by up to 60%. In this kind of situation, the behaviour of the system is more critical, and the NUMA topology plays a more relevant role. Even in the worst case, when isolated benchmarks are executed using the whole system, that is, just one task at a time, performance is not degraded.
dc.description.peerreviewed: SI
dc.description.sponsorship: This research work has received financial support from the Ministerio de Ciencia e Innovación, Spain, within the project PID2019-104834GB-I00. It was also funded by the Consellería de Cultura, Educación e Ordenación Universitaria of Xunta de Galicia (accr. 2019–2022, ED431G 2019/04 and reference competitive group 2019–2021, ED431C 2018/19)
dc.identifier.citation: Future Generation Computer Systems 129 (2022) 18-32. https://doi.org/10.1016/j.future.2021.11.008
dc.identifier.doi: 10.1016/j.future.2021.11.008
dc.identifier.essn: 0167-739X
dc.identifier.uri: http://hdl.handle.net/10347/27894
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104834GB-I00/ES/COMPUTACION DE ALTAS PRESTACIONES Y CLOUD PARA APLICACIONES DE ALTO INTERES
dc.relation.publisherversion: https://doi.org/10.1016/j.future.2021.11.008
dc.rights: © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: NUMA
dc.subject: Scheduling
dc.subject: Thread migration
dc.subject: Memory migration
dc.subject: Hardware counters
dc.title: CIMAR, NIMAR, and LMMA: novel algorithms for thread and memory migrations in user space on NUMA systems using hardware counters
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 0faa7141-ea10-4a10-9414-45cd7b726fef
relation.isAuthorOfPublication: 1959c3e1-552e-4a0b-bc17-a5f9f687ad38
relation.isAuthorOfPublication: decb372f-b9cd-4237-8dda-2c0f5c40acbe
relation.isAuthorOfPublication: f905807b-c6bd-4e37-97d1-2e644fc5af62
relation.isAuthorOfPublication.latestForDiscovery: 0faa7141-ea10-4a10-9414-45cd7b726fef
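
The abstract above describes a user-space tool that migrates threads based on hardware-counter measurements without modifying the Linux kernel. As an illustrative sketch only (not the authors' CIMAR/NIMAR implementation), the following Python fragment shows the kind of user-space primitive such a tool can build on: Linux exposes thread placement through `os.sched_setaffinity`, and the `metric` callback here is a placeholder for per-thread hardware-counter data (e.g. remote-access latency obtained from perf events).

```python
import os

def migrate_worst_thread(tids, metric, target_cpu):
    """Pin the thread with the worst (highest) metric value to target_cpu.

    `metric` stands in for per-thread hardware-counter data; a real tool
    would sample it via the perf_event interface.  The migration itself
    needs no kernel changes: os.sched_setaffinity is a plain user-space
    call, and tid 0 refers to the calling thread.
    """
    worst = max(tids, key=metric)
    os.sched_setaffinity(worst, {target_cpu})
    return worst

# Demo: "migrate" the calling thread (tid 0) to one of its allowed CPUs.
target = min(os.sched_getaffinity(0))
moved = migrate_worst_thread([0], lambda tid: 1.0, target)
```

A full tool in the spirit of the paper would combine such calls with continuous counter sampling and NUMA-aware decision policies; for memory pages, the `move_pages(2)` system call plays the analogous user-space role to the LMMA-style page migrations the abstract mentions.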

Files

Original bundle

Name: 2022_fgcs_laso_cimar.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format
Description: Research article