Spring Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code = getmirror

Pass the NVIDIA-Certified Professional NCP-AII Questions and answers with ExamsMirror

Practice at least 50% of the questions to maximize your chances of passing.
Exam NCP-AII Premium Access

View all detail and faqs for the NCP-AII exam


441 Students Passed

87% Average Score

90% Same Questions
Viewing page 1 out of 3 pages
Viewing questions 1-10 out of questions
Questions # 1:

During East-West fabric validation on a 64-GPU cluster, an engineer runs all_reduce_perf and observes an algorithm bandwidth of 350 GB/s and bus bandwidth of 656 GB/s. What does this indicate about the fabric performance?

Options:

A.

Inconclusive; rerun with point-to-point tests.

B.

Optimal performance; bus bandwidth near theoretical peak for NDR InfiniBand.

C.

Critical failure; bus bandwidth exceeds hardware capabilities.

D.

Suboptimal performance; algorithm bandwidth should match bus bandwidth.

Questions # 2:

After NCCL burn-in reports "transport retry count exceeded," which corrective action addresses the underlying fabric issue?

Options:

A.

Switch from Ring to Tree algorithms via NCCL_ALGO=TREE

B.

Reduce message size to decrease network utilization

C.

Increase NCCL_IB_TIMEOUT to tolerate longer latencies

D.

Inspect InfiniBand link quality metrics (BER, symbol errors) and replace faulty cables

Questions # 3:

An engineer needs to validate NVLink Switch functionality on a DGX H100 system with 8 GPUs. Which NCCL command verifies intra-node NVLink bandwidth?

Options:

A.

broadcast_perf -b 8 -e 16G -f 2 -g 8 without split configuration

B.

all_reduce_perf -b 8 -e 16G -f 2 -g 4 with NCCL_TESTS_SPLIT="MOD 2"

C.

all_reduce_perf -b 8 -e 16G -f 2 -g 1 repeated 8 times

D.

all_reduce_perf -b 8 -e 16G -f 2 -g 8 with NCCL_TESTS_SPLIT="OR 0x7"

Questions # 4:

You are a network administrator responsible for configuring an East-West (E/W) Spectrum-X fabric using SuperNIC. The Bluefield-3 devices in your network should be set to NIC mode with RoCE enabled to optimize data flow between servers. You have access to the Spectrum-X management tools and the necessary documentation. You need to use specific configuration commands to achieve this setup. Which of the following steps and commands are necessary to configure the Bluefield-3 devices in NIC mode for the E/W Spectrum-X fabric using SuperNIC? (Pick the 2 correct responses below)

Options:

A.

Use the command sudo mlxconfig -d /dev/mst/ set LINK_TYPE_P1=2 to enable Ethernet on the Bluefield-3 devices.

B.

Use the command sudo mlxconfig -d /dev/mst/ set DISABLE_SPECTRUM_X=1 to reduce overhead.

C.

Use the command sudo mlxconfig -d /dev/mst/ set INTERNAL_CPU_OFFLOAD_ENGINE=1 to configure the SuperNIC to operate in NIC mode.

D.

Use the command sudo mlxconfig -d /dev/mst/ set DPU_MODE=1 to set up the Bluefield-3 devices in DPU mode.

Questions # 5:

A customer is designing an AI Factory for enterprise-scale deployments and wants to ensure redundancy and load balancing for the management and storage networks. Which feature should be implemented on the Ethernet switches?

Options:

A.

Implement redundant switches with spanning tree protocol.

B.

MLAG for bonded interfaces across redundant switches.

C.

Use only one switch for all management and storage traffic.

D.

Disable VLANs and use unmanaged switches.

Questions # 6:

A systems engineer is updating firmware across a large DGX cluster using automation. What is the best practice for minimizing risk and ensuring cluster health during and after the process?

Options:

A.

Drain nodes from the scheduler, run pre-update diagnostics, update firmware in batches, and verify health post-update before scaling to the next batch.

B.

To save time, simultaneously update all nodes in the cluster without draining or diagnostics.

C.

Update nodes that have reported faults, leaving others on older firmware.

D.

Drain nodes from the scheduler, update firmware in batches, skip diagnostics and verify health post-update before scaling to the next batch.

Questions # 7:

A cluster administrator is preparing to update the firmware on a DGX H100 system, including the GPU tray (baseboard). What is the correct sequence of steps to perform a safe and successful firmware upgrade?

Options:

A.

Update the BMC and skip the GPU tray and motherboard tray updates if the system appears healthy.

B.

Perform a cold reset, stop all GPU activity, update and reboot the BMC, update motherboard and tray components, and verify completion.

C.

Update the GPU tray first, then the motherboard tray, and reboot the BMC after all updates are complete.

D.

Stop all GPU activity, update and reboot the BMC, update motherboard and tray components, perform a cold reset, and verify completion.

Questions # 8:

A 24-hour HPL burn-in fails with "illegal value" errors during the first iteration. Which initial troubleshooting step resolves this without compromising burn-in validity?

Options:

A.

Switch from FP64 to FP32 precision.

B.

Disable GPU affinity.

C.

Reduce test duration to 12 hours.

D.

Verify the matrix size is divisible by block size.

Questions # 9:

During cluster validation, the Cable Validation Tool (CVT) reports "Underperforming (BER)" for an InfiniBand link. Which BER thresholds indicate a critical signal quality issue requiring cable replacement?

Options:

A.

Rx power variance > 3dB between lanes

B.

Effective BER > 0 during the first 125 minutes of link operation

C.

Raw BER > 1e-12 or Effective BER > 1.5E-254 for <6hr measurements

D.

Temperature > 85°C on transceiver module

Questions # 10:

A network engineer is tasked with configuring the management, storage, and compute networks for a new DGX BasePOD deployment. Which statement best describes the network segmentation required for optimal operation?

Options:

A.

A single VLAN for all types of network traffic.

B.

Two networks: one for management and one for compute.

C.

Four networks: compute, storage, out-of-band, and management.

Viewing page 1 out of 3 pages
Viewing questions 1-10 out of questions
TOP CODES

TOP CODES

Top selling exam codes in the certification world, popular, in demand and updated to help you pass on the first try.