NVIDIA-Certified Professional NCP-AII Questions and Answers (ExamsMirror)

Questions 31-40
Question #31:

A system administrator needs to install a GPU/DPU in a server. The server has a free PCIe slot, enough free PCIe lanes, and enough room for the card. Which procedure should be followed?

Options:

A.

Ensure the server has enough power. Verify that the cables are compatible with the server's platform. Make sure the server is shut down so cables can be removed safely. Do not wear an ESD bracelet.

B.

Ensure the server has enough power. Make sure the server is shut down so cables can be removed safely. Wear an ESD bracelet.

C.

Ensure the server has enough power. Make sure the server is up and running with attached cables. Wear an ESD bracelet.

D.

Ensure the server has enough power. Verify that the cables are compatible with the server's platform. Make sure the server is shut down so cables can be removed safely. Wear an ESD bracelet.

Question #32:

A system administrator has upgraded the DPU firmware. What will be the state of the firmware after the upgrade?

Options:

A.

The firmware is installed on the DPU.

B.

The firmware is deleted from the DPU.

C.

The firmware is copied to the DPU but not installed.

D.

The firmware is waiting for a reboot to become active.

Question #33:

You are an infrastructure engineer tasked with validating a new AI training cluster before releasing it to users. Your team wants to perform a NeMo burn-in to ensure both hardware and software are reliable and ready for production workloads. Which of the following actions are required as part of a proper NeMo burn-in process?

Select the two correct answers.

Options:

A.

Download a pre-trained NeMo model and use it for a quick accuracy check on a user dataset, then consider the burn-in complete if results are reasonable.

B.

Test inference using the NeMo API and approve the environment if the model outputs valid predictions.

C.

Run the configured NeMo training job repeatedly or for an extended duration, monitoring for errors, stalls, or performance drops across all GPUs and nodes.

D.

Configure a representative NeMo training or pretraining recipe and set up an executor to launch the job across intended nodes and GPUs.

Question #34:

During a multi-day NeMo burn-in, intermittent "GPU fell off bus" errors occur. Which diagnostic approach isolates hardware faults?

Options:

A.

Enable HPL_USE_NVSHMEM for alternative memory sharing.

B.

Run DCGM diagnostics alongside burn-in to monitor GPU health metrics.

C.

Switch from BERT to GPT models for simpler computations.

D.

Reduce blocksize to 500MB to lower memory pressure.

Question #35:

A system administrator noticed a failure on a DGX H100 server. After a reboot, only the BMC is available. What could be the reason for this behavior?

Options:

A.

The network card has no link / connection.

B.

A boot disk has failed.

C.

Multiple GPUs have failed.

D.

There are more than two failed power supplies.

Question #36:

Which of the following tests should be used to check for the lowest possible latency between two nodes in a fabric?

Options:

A.

ib_read_bw

B.

ib_read_lat

C.

ib_write_bw

D.

ib_write_lat

Question #37:

You are preparing a Spectrum-based NVIDIA switch for integration into a production AI cluster. To confirm that all modules are running approved firmware versions, you must use the appropriate command from the switch CLI. Which step most accurately meets best practices for ensuring firmware version consistency and cluster compliance?

Options:

A.

Use the show version command to check the overall system version and confirm all modules are updated if the system version matches the documentation.

B.

Use the show interfaces status command to verify all ports are up, and proceed with integration if no interface errors are shown.

C.

Use the show asic-version command to review firmware versions for all modules, then compare these against the documented approved versions.

D.

Use the show inventory command to display component details and serial numbers before proceeding, as this output will include all firmware versions for review.

Question #38:

During HPL execution on a DGX cluster, the benchmark fails with "not enough memory" errors despite sufficient physical RAM. Which HPL.dat parameter adjustment is most effective?

Options:

A.

Reduce the problem size while maintaining the same block size.

B.

Set PMAP to 1 to enable process mapping.

C.

Increase block size to 6144 to maximize GPU utilization.

D.

Disable double-buffering via the BCAST parameter.
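For context on the memory-related options above: HPL factors a dense N x N matrix of double-precision values, so the matrix alone needs roughly 8 * N^2 bytes spread across all processes, and N is usually chosen as a multiple of the block size NB. The sketch below is a rough estimate that ignores workspace and communication buffers; it picks the largest N that fits a given memory budget. The function names, the 80% memory fraction, and the 8 x 80 GiB figure in the example are illustrative assumptions, not values from the exam or from NVIDIA's HPL documentation.

```python
import math


def hpl_matrix_bytes(n: int) -> int:
    """Approximate storage for the N x N double-precision HPL matrix (8 bytes per element)."""
    return 8 * n * n


def max_problem_size(total_mem_bytes: int, nb: int, fraction_pct: int = 80) -> int:
    """Largest N, rounded down to a multiple of NB, whose matrix fits in
    fraction_pct percent of total_mem_bytes.

    fraction_pct is an assumed heuristic; real runs also need room for
    workspace and communication buffers.
    """
    budget = total_mem_bytes * fraction_pct // 100
    n = math.isqrt(budget // 8)  # solve 8 * N^2 <= budget for N
    return (n // nb) * nb


if __name__ == "__main__":
    # Hypothetical node: 8 GPUs with 80 GiB each, NB = 1024.
    total = 8 * 80 * 2**30
    n = max_problem_size(total, nb=1024)
    print(n, hpl_matrix_bytes(n) / 2**30, "GiB")
    # Halving N with the same NB cuts the matrix footprint to one quarter.
    print(hpl_matrix_bytes(n // 2) / hpl_matrix_bytes(n))
```

With these assumed numbers the sketch gives N = 262144, a 512 GiB matrix. Because the footprint grows with N squared, small changes in N have a large effect on memory use, while NB mainly affects performance and padding.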

Question #39:

A cluster administrator needs to validate transceiver firmware versions across 200 ports using UFM. Which GUI-based method provides a consolidated view?

Options:

A.

Navigate to "Devices" > select a switch > "Cables" tab to see ASIC firmware and transceiver versions.

B.

Use the "Topology" view to visually inspect cable icons.

C.

Run mlxlink -d lid-<LID> -m on each port manually.

D.

Export all switch logs and grep for "FW Version".

Question #40:

ClusterKit's NCCL bandwidth test shows 350 GB/s on a 400G InfiniBand fabric. How should this result be interpreted?

Options:

A.

Critical failure; expected bandwidth is greater than 390 GB/s for HDR InfiniBand.

B.

Suboptimal performance; requires FEC tuning to reach 380+ GB/s.

C.

Optimal performance, indicating healthy fabric and GPUDirect RDMA.

D.

Inconclusive; rerun with --stress=cpu to validate.
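When reading results like the one in Question #40, watch the units: InfiniBand link speeds such as 400G are quoted in gigabits per second, whereas NCCL bandwidth tests report GB/s (gigabytes per second). The small sketch below does the conversion, assuming 8 bits per byte and ignoring encoding and protocol overhead. The helper names and the count of eight 400G NICs per node are illustrative assumptions and are not stated in the question.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate in Gb/s to GB/s (8 bits per byte, no overhead)."""
    return gbit_per_s / 8


def aggregate_line_rate_gbyte(link_gbit: float, nics_per_node: int) -> float:
    """Raw per-node line rate in GB/s summed across all compute NICs."""
    return gbit_to_gbyte(link_gbit) * nics_per_node


def fraction_of_line_rate(measured_gbyte: float, line_rate_gbyte: float) -> float:
    """Measured bandwidth expressed as a fraction of the raw line rate."""
    return measured_gbyte / line_rate_gbyte


if __name__ == "__main__":
    per_link = gbit_to_gbyte(400)                 # one 400 Gb/s port
    per_node = aggregate_line_rate_gbyte(400, 8)  # hypothetical 8 NICs per node
    print(per_link, per_node, fraction_of_line_rate(350, per_node))
```

Under these assumptions the script prints 50.0 400.0 0.875. Whether a given fraction counts as healthy also depends on which figure is reported (for example, bus bandwidth versus algorithm bandwidth in nccl-tests), the message size, and the topology, so the arithmetic alone does not settle the question.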
