Pass the NVIDIA-Certified Professional NCP-AII Questions and answers with ExamsMirror

Viewing page 3 out of 4 pages
Viewing questions 21-30 out of questions
Questions # 21:

During BCM cluster setup, an engineer must configure bonded network interfaces on DGX nodes for high availability. Which cmsh command sequence properly configures a bond0 interface with two physical NICs?

Options:

A.

device use dgx001 ; interfaces add vlan vlan100 ; set parent bond0 ; set mode 1 ; set network internalnet

B.

device use dgx001 ; interfaces add bond bond0 ; append interfaces enp225s0f1np1 enp97s0f1np1 ; set mode 1 ; set network internalnet

C.

device use dgx001 ; interfaces set enp225s0f1np1 network internalnet ; interfaces set enp97s0f1np1 network internalnet

D.

device use dgx001 ; interfaces delete enp225s0f1np1 ; interfaces delete enp97s0f1np1

Questions # 22:

During HPL execution on a DGX cluster, the benchmark fails with “not enough memory” errors despite sufficient physical RAM. Which HPL.dat parameter adjustment is most effective?

Options:

A.

Disable double-buffering via BCAST parameter.

B.

Increase block size to 6144 to maximize GPU utilization.

C.

Reduce the problem size while maintaining the same block size.

D.

Set PMAP to 1 to enable process mapping.
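
As background for this question: the HPL matrix holds N x N double-precision values (8 bytes each), so memory use grows with the square of the problem size N. The following is a minimal Python sketch, not an official sizing tool. The helper name `max_problem_size` and the 80% memory budget are assumptions made for illustration. It estimates the largest N that fits in a given memory pool and rounds it down to a multiple of the block size NB.

```python
import math

def max_problem_size(total_mem_bytes: int, block_size: int, mem_percent: int = 80) -> int:
    """Estimate the largest HPL problem size N that fits in memory.

    mem_percent is an illustrative budget (assumption), leaving headroom
    for the OS, MPI buffers, and workspace.
    """
    usable_bytes = total_mem_bytes * mem_percent // 100
    # N*N doubles at 8 bytes each must fit in the usable budget.
    n = math.isqrt(usable_bytes // 8)
    # Round down so that N is a whole multiple of the block size NB.
    return (n // block_size) * block_size

# Hypothetical example: 640 GiB of memory in total, NB = 1024.
print(max_problem_size(640 * 2**30, 1024))  # 262144
```

If a run still reports insufficient memory, lowering `mem_percent` (and therefore N) while leaving NB unchanged is the adjustment this sketch models.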

Questions # 23:

Refer to the output:

~$ sudo nvsm show health
Info
----
Timestamp: Sat Dec 16 16:26:32 2017 -0800
Version: 17.12-5
Checks
------
BIOS Revision [5.11].........................
DGX Serial Number [YSY72800016]..................
Verify installed DIMM memory sticks........................Healthy
...[output truncated]
Verify Ethernet controllers...........................Healthy
Verify installed GPU's..............................Unhealthy
Checking output of 'lspci' for expected GPU's
Missing GPU at PCI address '07:00.0'
Verify installed InfiniBand controllers....................Healthy
Verify PCIe switches..................................Healthy
...[output truncated]

What insights can a system administrator gain regarding the DGX system's health?

Options:

A.

A GPU tray upgrade failed.

B.

A GPU is missing on the DGX system.

C.

A GPU driver upgrade has failed.

D.

The system has passed the hardware health check successfully.
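
Reports like the one in this question can be long, so it can help to filter them programmatically. The following is a minimal Python sketch, assuming each check is printed as a name, a run of dots, and a `Healthy` or `Unhealthy` status, as in the excerpt above. The function name `unhealthy_checks` and the sample text are illustrative only and are not part of NVSM.

```python
import re

# A check line looks like: "<name>......<Healthy|Unhealthy>"
CHECK_LINE = re.compile(r"^(?P<name>.+?)\.{3,}\s*(?P<status>Healthy|Unhealthy)\s*$")

def unhealthy_checks(report: str) -> list[str]:
    """Return the names of all checks whose status is Unhealthy."""
    failed = []
    for line in report.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match and match.group("status") == "Unhealthy":
            failed.append(match.group("name").strip())
    return failed

# Sample lines modelled on the excerpt in the question.
sample = """\
Verify installed DIMM memory sticks........................Healthy
Verify Ethernet controllers...........................Healthy
Verify installed GPU's..............................Unhealthy
Missing GPU at PCI address '07:00.0'
Verify installed InfiniBand controllers....................Healthy
"""

print(unhealthy_checks(sample))  # ["Verify installed GPU's"]
```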

Questions # 24:

An infrastructure engineer runs an NCCL burn-in on an eight-node GPU cluster. Over a 12-hour period, all GPUs are tested with repeated all-reduce collectives. Monitoring tools show the following observations:

Aggregate bandwidth remains within 5% of documented reference for the hardware on every run.

No errors or timeouts are reported in NCCL logs.

On three occasions, one GPU logged single-run bandwidth dips of 15–20% compared to its normal performance, but performance recovered on the next run and stayed stable afterward. System logs show no hardware or driver errors.

Two minor NCCL WARN-level messages about “unexpected latency spike” appear in system logs for separate nodes, but could not be reproduced.

Which course of action is best before releasing the cluster to production?

Options:

A.

Proceed, since all bandwidth targets are met, issues were transient and self-resolved, and there are no persistent errors or timeouts across repeated burn-ins.

B.

Recommend proactive maintenance, because any bandwidth drop, even if transient and unreproducible, shows the burn-in failed; clusters must not show performance variance above 10% for any GPU even once.

C.

Approve for AI workload use, but flag affected nodes for manual exclusion from distributed training jobs, as nodes showing any anomaly should be isolated whenever possible.
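
To reason about observations like these, it can help to quantify each dip against a baseline. The following is a minimal Python sketch, assuming per-run bandwidth figures have already been collected for one GPU. The function name `flag_dips`, the 10% threshold, and the sample numbers are hypothetical and are not taken from NCCL or its test tools.

```python
from statistics import median

def flag_dips(bandwidths: list[float], threshold_pct: float = 10.0) -> list[tuple[int, float]]:
    """Return (run index, drop in percent) for runs that fall below the median baseline."""
    baseline = median(bandwidths)
    dips = []
    for run, bandwidth in enumerate(bandwidths):
        drop_pct = (baseline - bandwidth) / baseline * 100.0
        if drop_pct > threshold_pct:
            dips.append((run, round(drop_pct, 1)))
    return dips

# Hypothetical per-run bandwidth for one GPU, in GB/s.
runs = [100.0, 101.0, 99.0, 83.0, 100.0, 100.0]
print(flag_dips(runs))  # [(3, 17.0)]
```

A single flagged run followed by normal values, as in this sample, matches the transient pattern described in the question. Repeated flags on the same GPU would point to a persistent fault instead.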

Questions # 25:

After configuring HA, the administrator runs cmha status and notices the secondary head node reports mysql [FAIL]. What is the most likely cause?

Options:

A.

The BCM license expired after HA configuration.

B.

Network connectivity issues between the primary and secondary head nodes.

C.

The secondary head node lacks NVIDIA GPU drivers.

D.

The cluster nodes are powered on during the HA configuration.

Questions # 26:

For an NVIDIA Enterprise AI Factory with 256 GPUs, which storage solution characteristic is most critical to validate during scaling tests?

Options:

A.

Consistent per-node throughput > 8 GiB/s.

B.

Single-node write performance during idle clusters.

C.

RAID rebuild times under disk failure.

D.

Maximum 4K random read IOPS exceeding 1 million.

Questions # 27:

An administrator is configuring node categories in BCM for a DGX BasePOD cluster. They need to group all NVIDIA DGX H200 nodes under a dedicated category for GPU-accelerated workloads. Which approach aligns with NVIDIA's recommended BCM practices?

Options:

A.

Assign nodes to the "login" category to simplify Slurm integration.

B.

Create a new "dgx-h200" category and assign all DGX H200 nodes to it.

C.

Use the existing "dgxnodes" category without modification, as it is preconfigured for all DGX systems.

D.

Avoid categories and configure each DGX node individually via CLI.

Questions # 28:

A team is installing the NVIDIA Run:ai control plane on a Kubernetes cluster. Which two (2) options are most critical to validate before proceeding? (Pick the 2 correct responses below)

Options:

A.

Helm is installed on the installer machine.

B.

Ensure Kubernetes is running on the cluster.

C.

All cluster nodes have NVIDIA GPUs installed.

D.

NTP is disabled to simplify time synchronization.

Questions # 29:

An engineer wants to verify that an NVIDIA GPU is accessible inside a Docker container for running deep learning workloads. The NVIDIA Container Toolkit is installed on a machine with working NVIDIA drivers. Which command demonstrates the correct way to run a container that can access all available GPUs?

Options:

A.

docker run --rm --runtime=docker nvidia/cuda nvidia-smi

B.

docker run --rm -it ubuntu:22.04 nvidia-smi

C.

docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

D.

docker run --rm nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
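
With the NVIDIA Container Toolkit installed, Docker's `--gpus` flag is the documented way to expose GPUs to a container. The following is a minimal Python sketch that only builds and prints such a command line without running it. The helper name `gpu_smoke_test_command` is an assumption for illustration, and the image tag is copied from the options above.

```python
import shlex

def gpu_smoke_test_command(image: str, gpus: str = "all") -> list[str]:
    """Build a docker invocation that runs nvidia-smi with GPUs exposed."""
    return ["docker", "run", "--rm", "--gpus", gpus, image, "nvidia-smi"]

cmd = gpu_smoke_test_command("nvidia/cuda:12.4.0-base-ubuntu22.04")
print(shlex.join(cmd))
# docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a host where Docker and the toolkit are available.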

Questions # 30:

A team is validating a DGX BasePOD deployment. Using cmsh, they run a command to check GPU health across all nodes. What indicates that the system is ready for AI workloads?

Options:

A.

The command output is ignored if the system powers on without errors.

B.

At least half of the GPUs report Status_Health = OK.

C.

All GPUs report Status_Health = OK and Health = OK for each device.

D.

Only the head node's GPUs need to be healthy.
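
A readiness check of this kind reduces to an "every device on every node" condition. The following is a minimal Python sketch, assuming the health results have already been gathered into a dictionary keyed by node name. The function name `cluster_ready`, the data layout, and the node names are hypothetical and do not reflect a cmsh output format.

```python
def cluster_ready(gpu_health: dict[str, list[str]]) -> bool:
    """Return True only if at least one node is present and every GPU reports OK."""
    if not gpu_health:
        return False
    return all(
        status == "OK"
        for statuses in gpu_health.values()
        for status in statuses
    )

# Hypothetical results for two nodes with two GPUs each.
print(cluster_ready({"dgx001": ["OK", "OK"], "dgx002": ["OK", "OK"]}))    # True
print(cluster_ready({"dgx001": ["OK", "OK"], "dgx002": ["OK", "FAIL"]}))  # False
```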
