How can I quickly locate problems along the call chain in a service-oriented heterogeneous system?

Hi experts. I'm not sure the title is worded well. In short, our company's business is built on microservices that call one another to serve users, so following the data flow, services are divided into upstream and downstream.
For example, upstream of my service are the raw data provider department's service and the BI department's data analysis service, both of which my service calls over HTTP. Downstream of my service is the client department.
So downstream calls my service, my service calls its upstream services, and those may in turn call services further upstream. Together these form a call chain.
When my service misbehaves (say, 5xx responses or latency increase), how can I quickly tell whether the fault lies in an upstream interface or in my own service?
This is all rather theoretical and abstract. Could an expert give a concrete example?
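
To make the question concrete: one lightweight way to tell an upstream problem from a local one is to time each upstream call separately from the request as a whole. The following is only a sketch under assumptions, not a definitive implementation. `RequestTimeline`, `fetch_raw_data`, and `fetch_bi_report` are hypothetical names, the two fetch functions stand in for real HTTP calls, and upstream calls are assumed to run one after another rather than in parallel.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RequestTimeline:
    """Collects per-upstream timings and failures for one inbound request."""
    started: float = field(default_factory=time.monotonic)
    upstream_calls: list = field(default_factory=list)

    def call_upstream(self, name, func, *args, **kwargs):
        """Run one upstream call, recording its duration and whether it succeeded."""
        t0 = time.monotonic()
        ok = False
        try:
            result = func(*args, **kwargs)
            ok = True
            return result
        finally:
            # Recorded on both success and failure; exceptions still propagate.
            self.upstream_calls.append((name, time.monotonic() - t0, ok))

    def summary(self):
        """Split total request time into upstream time and our own time."""
        total = time.monotonic() - self.started
        # Assumption: upstream calls are sequential, so their durations add up.
        upstream = sum(duration for _, duration, _ in self.upstream_calls)
        failed = [name for name, _, ok in self.upstream_calls if not ok]
        return {
            "total_s": total,
            "upstream_s": upstream,
            "own_s": max(total - upstream, 0.0),
            "failed_upstreams": failed,
        }


def fetch_raw_data():
    """Hypothetical stand-in for an HTTP call to the raw data service."""
    time.sleep(0.05)
    return {"rows": 3}


def fetch_bi_report():
    """Hypothetical stand-in for a failing HTTP call to the BI service."""
    raise TimeoutError("BI service timed out")


timeline = RequestTimeline()
raw = timeline.call_upstream("raw-data", fetch_raw_data)
try:
    timeline.call_upstream("bi-analysis", fetch_bi_report)
except TimeoutError:
    pass  # a real handler would map this to an error response
report = timeline.summary()
print(report)
```

If `upstream_s` accounts for most of `total_s`, or `failed_upstreams` is non-empty, the problem more likely lies upstream; if `own_s` dominates, it more likely lies in the service itself. Logging this summary per request would let the two cases be compared when 5xx responses or latency rise.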

One expert suggested using a distributed tracing system such as Zipkin, but that means adopting an off-the-shelf solution wholesale, and the implementation workload is heavy.
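
For context, the core idea behind Zipkin-style tracing is small: every hop forwards the same trace ID, so logs from all services on the chain can be joined on it. The sketch below assumes the B3 header names used by Zipkin (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`). The helper names `new_id` and `outgoing_headers` are hypothetical, and the headers are treated as a plain case-sensitive dict, which a real HTTP framework may not guarantee.

```python
import secrets

B3_TRACE = "X-B3-TraceId"
B3_SPAN = "X-B3-SpanId"
B3_PARENT = "X-B3-ParentSpanId"


def new_id():
    """Return a 64-bit identifier as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def outgoing_headers(incoming):
    """Build tracing headers for an upstream call from the inbound headers."""
    # Reuse the caller's trace ID if present; otherwise this request starts a trace.
    trace_id = incoming.get(B3_TRACE) or new_id()
    headers = {B3_TRACE: trace_id, B3_SPAN: new_id()}
    # The inbound span becomes the parent of the span for the upstream call.
    parent = incoming.get(B3_SPAN)
    if parent:
        headers[B3_PARENT] = parent
    return headers


inbound = {B3_TRACE: "463ac35c9f6413ad", B3_SPAN: "a2fb4a1d1a96d312"}
out = outgoing_headers(inbound)
root = outgoing_headers({})
print(out)
```

Attaching these headers to every upstream HTTP call and writing the trace ID into each log line gives a way to follow one failing request across services, even without deploying a full tracing backend.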

Mar.03,2021