HAICORE.berlin

Profile

This service offers dedicated computing resources as part of the Helmholtz AI platform.

Access to computing resources is key for the Helmholtz AI community to accelerate innovative AI applications. The “Helmholtz AI COmputing REsources” (HAICORE) service at the Max Delbrück Center provides easy, low-barrier access to dedicated GPU resources for the Helmholtz Foundation Model Initiative (HFMI).

The service provides a total of 64 Nvidia H100 GPUs with 5.1 TB of high-bandwidth GPU RAM, 1760 CPU cores with 7.7 TB of system memory, and a low-latency interconnect for massively parallel AI training. Additionally, fast local scratch storage (NVMe) and large project storage (1.6 PB) are provided.

Basic information about other HAICORE sites can be found at helmholtz.ai.

Connect

You can access the system via a secure shell (ssh) connection. Before accessing the system you must prepare your account as explained below.

Connecting to the headnode of HAICORE.berlin

Please connect via ssh to login.haicore.berlin using your HAICORE.berlin account and the ssh private key matching the public key you uploaded in step 2 below.

Once connected, you can start using the workload manager Slurm by loading the respective module with the following command: module load slurm

If needed, you can learn more about Slurm here.
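From a Linux or macOS terminal, the connection steps above look like this (<your.alias> stands for your HAICORE.berlin user name, which is assigned during account creation, see step 1):

```shell
# Connect to the head node with the private key matching your uploaded public key:
ssh <your.alias>@login.haicore.berlin

# On the head node, make Slurm available and inspect the cluster:
module load slurm
sinfo
```

If your private key is not in the default location, add `-i <path-to-private-key>` to the ssh call.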

Preparations to connect to HAICORE.berlin

In brief:

  1. You have to authenticate with Helmholtz-AAI on sso.haicore.berlin
  2. You have to upload the public-key of an ssh-key-pair which you have created.
  3. You have to bind your HAICORE.berlin account to an existing HFMI project.
 
Step 1: Create an HAICORE.berlin account

You must use your own Helmholtz institution account to authenticate with HAICORE.berlin and create a local account for HAICORE.berlin.

Step 1.1: Connect to the HAICORE.berlin SSO-portal

Open the following URL with your web browser
https://sso.haicore.berlin/realms/Helmholz_AAI_Prod/account/
which forwards you to the login-page of “HAICORE.berlin”:

There, click the link at the bottom of the dialogue to sign in with Helmholtz AAI.

Step 1.2: Find your Helmholtz-organization

In the drop-down list provided, find the institution you want to authenticate with. Use the search field to avoid scrolling through the more than 6000 institutions that are currently members of Helmholtz AAI.

Tip for members of the Max Delbrück Center: Searching for “MDC” will fail - search for “Max” (Max Delbrück Center for Molecular Medicine) instead; neither “MDC” nor “Berlin” will match.

Step 1.3: Authenticate with your institutional credentials

Please ensure that the URL now shows your home institution's address - never enter your credentials on websites other than your home institution's.

Enter your home institution's username and password, plus your second factor if MFA is enabled at your institution.

In the next dialogue, “keycloak@haicore.berlin” asks for your permission to access your Helmholtz AAI information. Please click “Allow”.

Step 2: Store your HAICORE.berlin ssh-key

After authenticating with Helmholtz AAI (step 1.3), you are connected to the HAICORE.berlin Account Management System.

If you have no ssh key stored in the Helmholtz Cloud Portal, you have to paste an individual ssh public key for HAICORE.berlin into the ssh_key field and click “Save”.

Notice: Only one ssh key can be deposited in the Account Management System. Passwords and multi-factor information are ignored.

To create your ssh key pair, you can use, for example, the “PuTTY Key Generator” on Windows or the ssh-keygen command on the Linux command line:

PuTTY Key Generator

Linux command-line
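On the command line, the key pair can be created as follows; the file name id_haicore_berlin is just an example:

```shell
# Create an ed25519 key pair dedicated to HAICORE.berlin
# (you will be prompted for an optional passphrase):
ssh-keygen -t ed25519 -f ~/.ssh/id_haicore_berlin

# Print the public key; paste this single line into the ssh_key field:
cat ~/.ssh/id_haicore_berlin.pub
```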

Step 3: Bind your HAICORE.berlin account to an existing HFMI Project

Please send an e-mail to haicore-support@mdc-berlin.de

Please include the following information:

  • Your name
    The user name that your HAICORE.berlin account got.
    Usually <forename>.<familyname>
  • The HFMI project that you belong to
    (HFMI = Helmholtz Foundation Model Initiative)
    Currently, only the six named HFMI pilot projects get access to HAICORE.berlin.
  • The fundamental tools you need to start your research
    We provide basic software (git, gcc) and global access via the http, https and ssh protocols to download anything else.
    We cannot provide access to Anaconda.

The administrators of HAICORE.berlin will then grant you access to the named project if your name is on the list of pre-named project members. Otherwise, we will confirm with the HFMI project lead at your home institution. The result will be communicated to you by e-mail.

Once your account is confirmed, you can connect to HAICORE.berlin (see above).

Notice: Your account can also be bound to multiple HFMI projects if needed.

Compute

All computation is managed by our workload management system Slurm.

You must request the resources (runtime, GPUs, CPU cores, CPU memory) that your job needs. The following resources are available at most:

| Resource | Option | Default | Maximum per node | Maximum per cluster | Remarks |
|---|---|---|---|---|---|
| Runtime | --time=<hh>:<mm>:<ss> | 24 hours | 24 hours | 24 hours | Please use proper checkpointing for your jobs. Please specify when you expect a shorter runtime than one full day. |
| Nvidia H100 | --gpus=<ngpu> | 0 | 8 GPUs | 64 GPUs | 80 GB of high-bandwidth memory per GPU. |
| Xeon Platinum 8480+ | -c <ncpu> | 1 | 220 cores | 1760 cores | This is 27 CPU cores per GPU. (Four cores per node are reserved for the OS.) |
| Memory | --mem=<size><unit> | 4 GB | 985 GB | 7.7 TB | (Some RAM is consumed by the OS and caches.) |
| Infiniband | --mpi=<foo> | none | | | |

You can consume a total of 50,000 GPU-hours per HFMI project. The command <missing> reports how much you and your project mates have already consumed.
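The resource options above go into an ordinary Slurm batch script. The following is only a sketch - job name, script contents, and the requested numbers are placeholders, and any site-specific partition or account options are omitted:

```shell
#!/bin/bash
#SBATCH --job-name=train-demo   # placeholder name
#SBATCH --time=08:00:00         # request less than the 24 h maximum when possible
#SBATCH --gpus=4                # up to 8 per node, 64 per cluster
#SBATCH -c 108                  # about 27 CPU cores per requested GPU
#SBATCH --mem=400G              # up to 985 GB per node

srun python train.py            # placeholder workload
```

After `module load slurm`, submit the script with `sbatch <scriptname>` and monitor it with `squeue -u $USER`.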

Storage on HAICORE.berlin

We provide 100 GB of storage capacity in your home directory /fast/home/<your.alias> (or simply ~).

We provide 10 TB of storage capacity in your project directory /fast/project/hfmi_<name>.

Please use the df command to find out how much quota is consumed and how much is left for the current directory.
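For example, from inside your home or project directory:

```shell
# Report size, used, and available space of the file system
# that contains the current directory, in human-readable units:
df -h .
```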

You can transfer data to/from HAICORE.berlin using your ssh credentials and the scp or sftp application on your remote system.
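For example, run on your own machine to copy data into the project directory (paths and the project name follow the storage section above; file names are placeholders):

```shell
# Upload a local archive to your project directory:
scp data.tar.gz <your.alias>@login.haicore.berlin:/fast/project/hfmi_<name>/

# Download results back to the current local directory:
scp <your.alias>@login.haicore.berlin:/fast/project/hfmi_<name>/results.csv .
```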

For the runtime of your computations, you can use the local NVMe storage at /scratch (30 TB) for temporary files.
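A common pattern is to stage temporary data on /scratch at the start of a job and to copy results back before the job ends - a sketch with placeholder file names (whether /scratch is cleaned automatically is an assumption; check the site policy):

```shell
# Inside a Slurm job: use a job-specific directory on the local NVMe.
TMP=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$TMP"

# Stage input data onto the fast local disk:
cp /fast/project/hfmi_<name>/input.tar "$TMP"/
tar -xf "$TMP"/input.tar -C "$TMP"

# ... run your computation against the data in $TMP ...

# Copy results back to the persistent project storage and clean up:
cp "$TMP"/checkpoint.pt /fast/project/hfmi_<name>/
rm -rf "$TMP"
```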