During the Sensey Lancer project, we were faced with a question about defining the relation between an Image and a Kubernetes node.

Currently, this is our Image entity:

package com.sensey.lancer.macrocleaner.entity;

import java.time.Instant;
import java.util.List;
import java.util.UUID;

import com.sensey.lancer.macrocleaner.enums.ImageStatus;
import jakarta.persistence.*;
import lombok.*;

@Entity
@Table(name = "images")
@Data
@NoArgsConstructor
@AllArgsConstructor
@Builder
public class ImageEntity {
    
    //--------Atts----------------//
    
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "id", nullable = false, unique = true, updatable = false)
    private UUID id;
    
    /**
     * Image ID from Docker (?)
     * Example: 77af4d6b9913
     */
    @Column(nullable = false, unique = true)
    private String imageId;

    @Column(name = "name", nullable = false)
    private String name;

    @Column(name = "tag", nullable = false)
    private String tag;

    @Column(name = "size_mb", nullable = false)
    private Long sizeMB;

    /**
     * Instant represents a point on the UTC timeline, independent of any time zone,
     * which makes it a good fit for applications used across regions.
     */
    @Column(name = "created_at", nullable = false)
    private Instant createdAt;

    @Column(name = "pulled_at", nullable = false)
    private Instant pulledAt;

    @Column(name = "deleted_at", nullable = true)
    private Instant deletedAt;
    
    //--------Enums----------------//

    @Enumerated(EnumType.STRING)
    @Column(name = "status", nullable = false)
    private ImageStatus status;

    //--------Relations------------//

    /**
     * Many images can have 1 node. 
     */
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "node_id", nullable = false)
    private NodeEntity node;

    /**
     * An image can have many containers. 
     * When an image is removed, the containers must be kept intact. 
     * The containers need to be removed first, 
     * and then the image needs to be removed for a correct procedure. 
     */
    @OneToMany(mappedBy = "image", cascade = CascadeType.PERSIST, orphanRemoval = false)
    private List<ContainerEntity> containers;
}

In this troubleshooting session, we will focus on this part of the code:

    /**
     * Many images can have 1 node. 
     */
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "node_id", nullable = false)
    private NodeEntity node;

In the current situation, the relation is defined as many-to-one, meaning that many images can belong to one node, while each image record points to exactly one node.

However, after some discussion between the developers, a question was raised:

Should this not be a many-to-many relation? Since:

  1. A node can have many images
  2. But at the same time, the same image can be used by many nodes.

That is a many-to-many relation, according to the brainstorm of one developer - let’s call him X.

To understand the discussion better and to be able to answer this question, we decided to take a step back and analyze the situation by understanding how, in the name of Kubernetes, pulling images and scheduling truly work.

According to my current knowledge, the logic of image pulling in Kubernetes is like this:

  1. A Pod is a logical grouping of one or more containers that share the same network. Because these containers share the same network, they can talk to each other through localhost.
  2. In most cases, only a single container runs within a Pod. A second container usually appears only as a sidecar, for example to monitor the main container.
  3. A container pulls an image from a container registry: this can be Docker Hub, Azure Container Registry, etc.
  4. Kubernetes also has a setting called imagePullPolicy. Image pulling for a container is handled according to the imagePullPolicy defined in its configuration. The policy can be “Always”, “IfNotPresent” or “Never”.
  5. When the policy is Always, the image is pulled from the registry on every restart. When the policy is IfNotPresent, the image is only pulled from the container registry if it is not already present on the node’s storage. And Never .. yeah, I assume that the container will never pull the image, which is actually a very weird configuration to have.
  6. So, based on a policy like IfNotPresent, I can already deduce that the image is stored somewhere on the underlying disk storage of the node.
  7. But what if a Deployment has two replicas? That means two identical Pods will be created, with two containers that are going to use the same image. And let’s assume these Pods are going to be scheduled on different nodes, let’s say Node X and Node Y.
  8. This means, according to my logic, that the container on Node X will pull the image from the container registry (assuming imagePullPolicy = Always) and that this image will be stored on the disk storage of Node X. But the container scheduled on Node Y will also pull the image from the container registry, and store it on Node Y.
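
The steps above can be sketched as a small plain-Java simulation. This is only a model of the reasoning in the list, not of how Kubernetes is implemented: the class names, the node names and the image name nginx:1.27 are all hypothetical, and the Never policy is simplified to "no pull is attempted".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ImagePullSketch {

    /** Mirrors the three imagePullPolicy values named in the text. */
    enum PullPolicy { ALWAYS, IF_NOT_PRESENT, NEVER }

    /** Hypothetical stand-in for a node and the images stored on its local disk. */
    static final class Node {
        final String name;
        final Set<String> localImages = new HashSet<>();

        Node(String name) {
            this.name = name;
        }

        /**
         * Starts a container from the given image and reports whether the
         * registry was contacted. Assumption: NEVER simply skips the pull.
         */
        boolean startContainer(String image, PullPolicy policy) {
            boolean pull;
            switch (policy) {
                case ALWAYS:
                    pull = true;
                    break;
                case IF_NOT_PRESENT:
                    pull = !localImages.contains(image);
                    break;
                default: // NEVER
                    pull = false;
                    break;
            }
            if (pull) {
                localImages.add(image); // the pulled image lands on this node's disk
            }
            return pull;
        }
    }

    /** Starts one replica on every node and returns the names of the nodes holding the image. */
    static List<String> nodesHolding(String image, List<Node> nodes, PullPolicy policy) {
        for (Node node : nodes) {
            node.startContainer(image, policy);
        }
        return nodes.stream()
                .filter(n -> n.localImages.contains(image))
                .map(n -> n.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("node-x"), new Node("node-y"));
        // Two replicas of the same Deployment, one per node.
        System.out.println(nodesHolding("nginx:1.27", nodes, PullPolicy.ALWAYS));
        // prints [node-x, node-y]
    }
}
```

In this model the same image ends up on the disk of both nodes, which is exactly the situation described in steps 7 and 8.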

This indicates a many-to-many relation, since:

  1. An image can be used by many nodes
  2. A node can have many images.
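
To make these two statements concrete, here is a minimal plain-Java sketch (no JPA involved) that stores the relation as (node, image) pairs, the way a join table would. All names are hypothetical and not part of the project, and the second image ID a1b2c3d4e5f6 is made up for illustration. In JPA, this shape would typically be mapped with @ManyToMany and @JoinTable, or with an explicit join entity; the sketch only illustrates X's reasoning and does not settle which mapping is right for the project.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class NodeImageRelationSketch {

    /** One row of a hypothetical join table: "this image is present on this node". */
    static final class NodeImage {
        final String nodeName;
        final String imageId;

        NodeImage(String nodeName, String imageId) {
            this.nodeName = nodeName;
            this.imageId = imageId;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof NodeImage)) {
                return false;
            }
            NodeImage that = (NodeImage) other;
            return nodeName.equals(that.nodeName) && imageId.equals(that.imageId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(nodeName, imageId);
        }
    }

    // A set, so pulling the same image twice on one node still gives one row.
    private final Set<NodeImage> rows = new LinkedHashSet<>();

    void recordPull(String nodeName, String imageId) {
        rows.add(new NodeImage(nodeName, imageId));
    }

    /** Statement 1: an image can be used by many nodes. */
    Set<String> nodesOf(String imageId) {
        Set<String> result = new TreeSet<>();
        for (NodeImage row : rows) {
            if (row.imageId.equals(imageId)) {
                result.add(row.nodeName);
            }
        }
        return result;
    }

    /** Statement 2: a node can have many images. */
    Set<String> imagesOn(String nodeName) {
        Set<String> result = new TreeSet<>();
        for (NodeImage row : rows) {
            if (row.nodeName.equals(nodeName)) {
                result.add(row.imageId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NodeImageRelationSketch relation = new NodeImageRelationSketch();
        relation.recordPull("node-x", "77af4d6b9913");
        relation.recordPull("node-y", "77af4d6b9913");
        relation.recordPull("node-x", "a1b2c3d4e5f6");

        System.out.println(relation.nodesOf("77af4d6b9913")); // prints [node-x, node-y]
        System.out.println(relation.imagesOn("node-x"));      // prints [77af4d6b9913, a1b2c3d4e5f6]
    }
}
```

Note that with the current many-to-one mapping, the same situation would be stored as two separate image rows with the same Docker image ID, one per node, which conflicts with the unique constraint on imageId.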

Now, let’s verify with ChatGPT whether X’s knowledge is correct: