Troubleshooting: Compute Engine VM Fails to Start

A Compute Engine VM that won't start is a frustrating but common problem. The root causes fall into four main areas: resource conflicts, boot disk issues, resource availability, and permission problems.
Checking each area systematically lets you diagnose and fix the failure quickly.

1. Resource Conflicts & State Issues

If you try to start a VM and receive an error indicating the resource is "already being used" or the state prevents the action, check the following:

Boot Disk Operations in Progress

When an operation like snapshot creation or disk cloning is running on the VM's boot disk, the disk is temporarily locked, preventing the VM from starting.

  • Symptom: You see an error like: "Failed to start example-vm: The instance resource '...' is already being used by '.../disks/clone'".

  • Fix:

    1. Go to the VM instances page and check the Notifications pane or the Activity tab for pending operations.

    2. Wait a few minutes for the snapshot or clone to complete.

    3. Once the operation is finished, retry starting the VM.
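The same check can be made from the command line by listing operations that have not finished yet. This is a minimal sketch, assuming the gcloud CLI is installed and authenticated; the helper name pending_operations is hypothetical.

```bash
# Hypothetical helper: list Compute Engine operations in a zone that are
# not yet DONE (for example, a snapshot or clone still using the boot disk).
pending_operations() {
    zone="$1"
    gcloud compute operations list \
        --zones="$zone" \
        --filter="status!=DONE"
}

# Usage: pending_operations us-central1-a
```

An empty result suggests that no operation is still holding the disk, so the start can be retried.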

Incorrect Instance State

A VM can only be started if its status is TERMINATED. If it's in a transitional or erroneous state, a simple "Start" action may fail.

  • Symptom: The console shows a spinning wheel or you receive an error about being unable to start because it's "not stopped."

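  • Fix: If the VM is reported as RUNNING but is unresponsive, try a hard reset with the gcloud CLI, which can sometimes clear a bad state. Reset only applies to a running VM; if the VM is stuck in a transitional state such as STOPPING, wait for the transition to finish first.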

    Bash
    gcloud compute instances reset [VM_NAME] --zone=[ZONE]
    

    A reset behaves like a hard power cycle, and the VM normally stays in the RUNNING state while it reboots. If the VM ends up stopped instead, run the start command again.
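Before choosing between start and reset, it helps to read the VM's reported status. The following is a sketch under the assumption that the gcloud CLI is installed and authenticated; vm_status and suggest_action are hypothetical helper names, and the mapping from status to action is a simplification.

```bash
# Hypothetical helper: print the VM's current status (e.g. RUNNING, TERMINATED).
vm_status() {
    gcloud compute instances describe "$1" --zone="$2" --format="value(status)"
}

# Hypothetical helper: suggest an action based on the reported status.
suggest_action() {
    case "$(vm_status "$1" "$2")" in
        TERMINATED) echo "start" ;;   # stopped: a normal start should work
        RUNNING)    echo "reset" ;;   # running but unresponsive: hard reset
        *)          echo "wait"  ;;   # transitional state (e.g. STOPPING, STAGING)
    esac
}

# Usage: suggest_action [VM_NAME] [ZONE]
```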

2. Diagnosing Boot Disk Failures

If the VM successfully moves to the RUNNING state but you can't connect (e.g., SSH fails), the issue is likely within the operating system (OS) boot process.

Examine the Serial Console Output

The serial console is your best diagnostic tool, providing low-level boot messages from the BIOS, bootloader, and kernel.
You can view this output even if the VM is not fully booted.

  • Action:

    1. In the Google Cloud Console, navigate to Compute Engine > VM instances.

    2. Click on the problematic VM name.

    3. Go to the Serial port 1 (Console) tab.

    4. Review the output for errors such as kernel panics, filesystem errors, or messages about boot failure.
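The serial output can also be fetched with the gcloud CLI and filtered for likely failure messages. This is only a sketch: scan_boot_log is a hypothetical helper, and its pattern list is illustrative rather than exhaustive.

```bash
# Hypothetical helper: read boot output on stdin and print the lines that
# match a few common failure patterns (illustrative, not exhaustive).
scan_boot_log() {
    grep -Ei 'kernel panic|no space left on device|unable to mount root|i/o error|emergency mode'
}

# Usage (assumes gcloud is installed and authenticated):
# gcloud compute instances get-serial-port-output [VM_NAME] --zone=[ZONE] --port=1 | scan_boot_log
```

No matches does not prove the boot was clean, so the full output is still worth reading when the filter finds nothing.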

Boot Disk is Full or Corrupted

A completely full boot disk will prevent the OS from booting properly, as it can't write necessary temporary or log files.

  • Symptom: Serial console output shows errors related to mounting the root filesystem or running out of disk space.

  • Fix (Requires a Rescue VM):

    1. Stop the problematic VM.

    2. Create a snapshot of the boot disk for safety, then detach the boot disk.

    3. Create a new, temporary VM (your rescue VM) in the same zone.

    4. Attach the original VM's boot disk as a secondary disk to the rescue VM.

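    5. SSH into the rescue VM, identify the attached disk (for example, with lsblk), and do one of the following:

      • If full: Mount the disk and delete unnecessary files to free up space, or resize the disk (which you can do directly from the Compute Engine Disks page). After a resize, the partition and file system may also need to be extended if the OS does not do this automatically at boot.

      • If corrupted: Run a file-system check (fsck) on the disk's partition while it is unmounted to repair errors.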

    6. Unmount and detach the disk from the rescue VM.

    7. Attach the disk back as the boot disk to the original VM, and try to start it.
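The rescue flow above maps roughly onto the following gcloud commands. The sketch prints the commands instead of running them, so they can be reviewed and adjusted first. The helper name rescue_plan, the rescue VM name, and the snapshot name are all made up for this example, and the rescue VM is created with default settings.

```bash
# Hypothetical helper: print (not run) the gcloud commands for the rescue flow.
# Arguments: VM name, boot disk name, zone.
rescue_plan() {
    vm="$1"; disk="$2"; zone="$3"
    rescue="${vm}-rescue"   # made-up name for the temporary rescue VM
    cat <<EOF
gcloud compute instances stop $vm --zone=$zone
gcloud compute snapshots create ${disk}-pre-rescue --source-disk=$disk --source-disk-zone=$zone
gcloud compute instances detach-disk $vm --disk=$disk --zone=$zone
gcloud compute instances create $rescue --zone=$zone
gcloud compute instances attach-disk $rescue --disk=$disk --zone=$zone
# ... SSH into $rescue, then free space or run fsck on the attached disk ...
gcloud compute instances detach-disk $rescue --disk=$disk --zone=$zone
gcloud compute instances attach-disk $vm --disk=$disk --zone=$zone --boot
gcloud compute instances start $vm --zone=$zone
EOF
}

# Usage: rescue_plan [VM_NAME] [DISK_NAME] [ZONE]
```

Inside the rescue VM, the device name of the attached disk varies, so confirm it with lsblk before mounting it or running fsck.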

3. Resource & Zone Availability

Sometimes a VM fails to start because the selected zone is temporarily out of capacity.
This is more likely for large machine types or specialized hardware such as GPUs in busy zones.

  • Symptom: You receive an error like: "The zone 'projects/...' does not have enough resources available to fulfill the request. Try a different zone, or try again later."

  • Fix:

    • Wait and Retry: Resource availability is dynamic. Try starting the VM again after a few minutes.

    • Migrate the VM: Move the VM instance to a different zone within the same region where resources are available. You can use the gcloud compute instances move command or manually re-create the VM from its boot disk in a new zone.

    Bash
    gcloud compute instances move [VM_NAME] \
        --zone=[OLD_ZONE] \
        --destination-zone=[NEW_ZONE]
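The "Wait and Retry" option can be automated with a small loop that doubles the wait between attempts. This is a sketch, not a definitive implementation: start_with_retry is a hypothetical helper, the default attempt count and delay are arbitrary, and the loop retries on any failure of the start command, not only on capacity errors.

```bash
# Hypothetical helper: retry starting a VM, doubling the wait between attempts.
# Arguments: VM name, zone, max attempts (default 5), initial delay in seconds (default 30).
start_with_retry() {
    vm="$1"; zone="$2"; attempts="${3:-5}"; delay="${4:-30}"
    n=1
    while [ "$n" -le "$attempts" ]; do
        if gcloud compute instances start "$vm" --zone="$zone"; then
            return 0
        fi
        if [ "$n" -lt "$attempts" ]; then
            echo "Attempt $n failed; retrying in ${delay}s" >&2
            sleep "$delay"
            delay=$((delay * 2))
        fi
        n=$((n + 1))
    done
    return 1
}

# Usage: start_with_retry [VM_NAME] [ZONE] 5 30
```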
    

4. Permissions and IAM Issues

If you are using a service account or non-default credentials to start the VM, insufficient IAM permissions can prevent the action.

  • Symptom: The console displays an "Insufficient permissions" error or an error mentioning a lack of permission like compute.instances.start.

  • Fix:

    1. Verify that your user account or the service account being used has the necessary Compute Engine roles, such as Compute Instance Admin (v1) or a custom role that explicitly grants the compute.instances.start permission.

    2. If the VM's Service Account is involved, ensure it has the necessary Access scopes (e.g., "Compute Engine API access" set to Read/Write or Full).
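Role assignments can be inspected from the command line as well. The sketch below assumes the gcloud CLI is installed and authenticated; roles_for_member and role_can_start are hypothetical helper names. It only looks at project-level bindings and predefined roles, so roles granted at the folder or organization level, and custom roles, need a separate check.

```bash
# Hypothetical helper: list the project-level roles bound to a member.
# member looks like "user:alice@example.com" or "serviceAccount:NAME@PROJECT.iam.gserviceaccount.com".
roles_for_member() {
    project="$1"; member="$2"
    gcloud projects get-iam-policy "$project" \
        --flatten="bindings[].members" \
        --filter="bindings.members:$member" \
        --format="value(bindings.role)"
}

# Hypothetical helper: succeed if a predefined role includes the exact
# compute.instances.start permission (not just a longer name that starts with it).
role_can_start() {
    gcloud iam roles describe "$1" --format="value(includedPermissions)" \
        | grep -Eq '(^|[;,[:space:]])compute\.instances\.start($|[;,[:space:]])'
}

# Usage:
# roles_for_member [PROJECT_ID] user:alice@example.com
# role_can_start roles/compute.instanceAdmin.v1 && echo "role allows start"
```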