>>106958085
Llama I was running at Q4 on a rented 8xV100 server with some tensors offloaded, getting 0.7 tk/s at 60k context (I also tried Q3 on an 8xL40 server, which fit the whole model and got 3 tk/s); GLM I was consuming through the z-ai API.
But then again, I was using GLM at much longer contexts, so maybe if I had the memory to fit the same context length with Llama it'd start to hallucinate too, idk.
This is the expected syntax:
<tool>
<tool_name>edit_file</tool_name>
<parameters>
<filename>functions/replace_function_body.py</filename>
<old_text>matches = re.search(pattern, content, re.MULTILINE)</old_text>
<new_text>pattern = r'(?s)(?P<function_name>\w+)\s*\(.*?\)\s*\{([^}]*)\}'</new_text>
</parameters>
</tool>
GLM would instead do shit like this, dropping the <tool>/<tool_name>/<parameters> wrapper and using the tool name as the outer tag:
<edit_file>
<filename>functions/replace_function_body.py</filename>
<old_text>matches = re.search(pattern, content, re.MULTILINE)</old_text>
<new_text>pattern = r'(?s)(?P<function_name>\w+)\s*\(.*?\)\s*\{([^}]*)\}'</new_text>
</edit_file>
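If you wanted to paper over it client-side instead of fighting the model, here's a rough Python sketch of a lenient parser that accepts both forms. KNOWN_TOOLS and the tag layout are assumptions based on the two snippets above, not whatever the actual agent framework registers:

import re

# Assumed tool registry: tool name -> parameter tags it takes
KNOWN_TOOLS = {"edit_file": ("filename", "old_text", "new_text")}

def parse_tool_call(text: str) -> dict | None:
    """Accept both the canonical <tool> wrapper and the bare
    <edit_file>...</edit_file> shorthand GLM emits."""
    # Canonical form: <tool><tool_name>X</tool_name><parameters>...</parameters></tool>
    m = re.search(
        r"<tool>\s*<tool_name>(\w+)</tool_name>\s*"
        r"<parameters>(.*?)</parameters>\s*</tool>",
        text, re.DOTALL,
    )
    if m:
        name, body = m.group(1), m.group(2)
    else:
        # GLM-style shorthand: tool name used directly as the outer tag
        for name in KNOWN_TOOLS:
            m = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
            if m:
                body = m.group(1)
                break
        else:
            return None
    # Pull out whichever known parameter tags are present in the body
    params = {}
    for tag in KNOWN_TOOLS.get(name, ()):
        pm = re.search(rf"<{tag}>(.*?)</{tag}>", body, re.DOTALL)
        if pm:
            params[tag] = pm.group(1)
    return {"name": name, "params": params}

Feeding it either of the two snippets above yields the same {name, params} dict, so the rest of the pipeline doesn't have to care which format the model produced.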