From randy@stat.tamu.edu Mon Mar 20 14:20:13 1995 >From randy@stat.tamu.edu Mon Mar 20 14:20:12 1995 Received: from picard.tamu.edu (PICARD.TAMU.EDU [165.91.112.177]) by linc.cis.upenn.edu (8.6.10/UPenn 1.4) with SMTP id OAA28364 for ; Mon, 20 Mar 1995 14:20:10 -0500 Posted-Date: Mon, 20 Mar 1995 14:20:10 -0500 Received-Date: Mon, 20 Mar 1995 14:20:10 -0500 Received: by picard.tamu.edu (AA01797); Mon, 20 Mar 95 13:18:20 CST Date: Mon, 20 Mar 95 13:18:20 CST From: randy@PICARD.tamu.edu (Randy Hutson) Message-Id: <9503201918.AA01797@picard.tamu.edu> To: benh@linc.cis.upenn.edu Subject: Re: Profiling results Status: R >It looks to me that "floor_grid()" *might* benefit from being made into >a macro, but I do not know how to tell the "speedup" of making a function >into a macro. I assume you lose all data on the contribution due to that >function. That is a problem, but it's still possible to get a good idea if a macro version is faster. For example, in one profile I did, floor_grid() was called 4169330 out of 6240772 times from update_view_aux(). These 4169330 calls tooks 4.56 CPU seconds. update_view_aux() was called 2084665 times for a total of 3.73 CPU seconds. Now if floor_grid() were rewritten as a macro, update_view_aux()'s execution time could be compared to 4.56 + 3.73 = 8.29 seconds. If it was noticeably less than 8.29 seconds, this is a good indication that a macro version of floor_grid() is faster. A similar comparison could be made to process_monsters() which is the second most frequent caller of floor_grid. >Unfortunately, there *is* a way to speed up "floor_grid()" a >*lot* and that is to make *two* versions of it, one which *assumes* that >the given location is valid. And then have the code call this "dangerous" >version when possible. Note that usually, about half of the function is >spent calling "in_bounds()". I noticed that, but I had no idea if that check could possibly be removed. >I am pretty sure that *some* of the calls to >"floor_grid()" can be replaced by the "dangerous" version. I was curious about this, so I added some information gathering code into the game. I played for 10 minutes or so, and none of the calls to floor_grid from make_move(), mon_move(), and process_monsters() were ever illegal. The update_view_aux functions did get a lot of out of bounds hits though. >For my edification, could you run tests 1 and 2 with the player doing something >like "resting" for 1000 turns. OK, I've done this. You might be able to profile the game and test it yourself if you have access to a BSDish system (Sun OS included). I'm using "gprof" which is a standard BSD utility. It's not difficult to get profiling data from gprof, although assimilating it into a useful form isn't always trivial. If you can find gprof, just compile and link the game with the "-pg" switch sent to the compiler (this works with gcc). Then run the game normally. When the program exits, a file named "gmon.out" is created. You then run "gprof angband", and it will write a lot of information to stdout (so redirect it to a file). >Your data seems to suggest that (at least when >an orc pit is nearby) the "pre-compute view" code is about as efficient as >not having it. I like that... :-) I'd say that using the pre-compute view option in the dungeon is to moderately more efficient than not using it. It seems to slow down the game noticeably in wide open areas such as the town though. (It was 13% faster when just resting though.) >Oh, note that I chose "resting" because "update_view()" is only called when >the player moves, so don't do 100 calls to "stay here" :-) >As always, thanks a *lot* for your help... You're welcome! Thanks for spending a lot more time on the game yourself. OK, here's the profiling results for resting 1000 turns below the orc pit with the pre-compute option on. As before, I've culled out (most) "system" functions and am just including those few functions which used 1% or more of CPU time: % time seconds cumsec calls function 18.27 3.05 3.05 9802 process_monsters 13.48 2.25 5.30 75019 make_move 6.59 1.10 6.40 358310 mmove 6.29 1.05 7.45 544841 floor_grid 3.54 0.59 9.02 75019 mon_move 3.00 0.50 9.52 74629 get_moves 2.40 0.40 9.92 74629 mon_will_run 2.10 0.35 10.27 126181 random_hack 2.04 0.34 10.61 158932 player_has_los 2.04 0.34 10.95 56525 QueueAttrChar 2.04 0.34 11.29 56514 map_info_aux 1.74 0.29 11.92 56521 map_info 1.32 0.22 12.14 1033 select 1.14 0.19 12.33 56270 lite_spot 1.08 0.18 12.51 74629 get_moves_aux 1.08 0.18 12.69 18976 update_mon 1.02 0.17 12.86 48612 sf_put 1.02 0.17 13.03 1828 string_make Total CPU time used by game code: 16.69 seconds And here's the profile without pre-computing: % time seconds cumsec calls function 17.16 3.29 3.29 9802 process_monsters 10.90 2.09 5.38 75019 make_move 10.12 1.94 7.32 1048688 floor_grid 7.04 1.35 8.67 188550 los 4.90 0.94 10.58 358310 mmove 2.87 0.55 11.13 75019 mon_move 2.45 0.47 11.60 74629 mon_will_run 2.40 0.46 12.06 74629 get_moves 2.30 0.44 12.50 158914 player_has_los 1.98 0.38 12.88 126181 random_hack 1.98 0.38 13.26 56292 map_info_aux 1.56 0.30 13.89 56296 map_info 1.36 0.26 14.15 192940 distance 1.20 0.23 14.38 1033 select 1.15 0.22 14.60 18976 update_mon 1.15 0.22 15.04 1828 string_make Total CPU time used by game code: 19.17 seconds Oh, I have no good explanation as to why make_move took 2.25 seconds with pre-compute and 2.09 seconds without. I ran these tests several times and make_move() was sometimes faster with pre-computing and sometimes it was slower. (It shouldn't make any difference!) I'd consider this an insignificant anomaly perhaps related to timing round off errors. (There are other anomalies, for example, string_make().) I think the point is, don't put much faith in the apparent preciseness of gprof's numbers. Randy